14 fixes, based on c5e0666c5a3ccabdf16bb88451886cdf81849b66:

Subject: mm: thp: correct split_huge_pages file permission
Subject: mm: memcontrol: let v2 cgroups follow changes in system swappiness
Subject: rapidio/mport_cdev: fix uapi type definitions
Subject: huge pagecache: mmap_sem is unlocked when truncation splits pmd
Subject: mm: update min_free_kbytes from khugepaged after core initialization
Subject: mm, cma: prevent nr_isolated_* counters from going negative
Subject: MAINTAINERS: fix Rajendra Nayak's address
Subject: mm: thp: kvm: fix memory corruption in KVM with THP enabled
Subject: mm/zswap: provide unique zpool name
Subject: proc: prevent accessing /proc/<PID>/environ until it's ready
Subject: modpost: fix module autoloading for OF devices with generic compatible property
Subject: mm: fix kcompactd hang during memory offlining
Subject: lib/stackdepot: avoid to return 0 handle
Subject: byteswap: try to avoid __builtin_constant_p gcc bug

3 fixes, based on 44549e8f5eea4e0a41b487b63e616cb089922b99:

Subject: Revert "proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan""
Subject: zsmalloc: fix zs_can_compact() integer overflow
Subject: compiler-gcc: require gcc 4.8 for powerpc __builtin_bswap16()

4 fixes, based on 422ce5a97570cb8a37d016b6bc2021ae4dac5499:

Subject: ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang
Subject: ocfs2: fix posix_acl_create deadlock
Subject: ksm: fix conflict between mmput and scan_get_next_rmap_item
Subject: mm: thp: calculate the mapcount correctly for THP pages during WP faults

- fsnotify fix
- poll() timeout fix
- a few scripts/ tweaks
- debugobjects updates
- the (small) ocfs2 queue
- Minor fixes to kernel/padata.c
- Maybe half of the MM queue

117 patches, based on 2600a46ee0ed57c0e0a382c2a37ebac64d374d20:

Subject: fsnotify: avoid spurious EMFILE errors from inotify_init()
Subject: time: add missing implementation for timespec64_add_safe()
Subject: fs: poll/select/recvmmsg: use timespec64 for timeout events
Subject: time: remove timespec_add_safe()
Subject: scripts/decode_stacktrace.sh: handle symbols in modules
Subject: scripts/spelling.txt: add "fimware" misspelling
Subject: scripts/bloat-o-meter: print percent change
Subject: debugobjects: make fixup functions return bool instead of int
Subject: debugobjects: correct the usage of fixup call results
Subject: workqueue: update debugobjects fixup callbacks return type
Subject: timer: update debugobjects fixup callbacks return type
Subject: rcu: update debugobjects fixup callbacks return type
Subject: percpu_counter: update debugobjects fixup callbacks return type
Subject: Documentation: update debugobjects doc
Subject: debugobjects: insulate non-fixup logic related to static obj from fixup callbacks
Subject: ocfs2: fix comment in struct ocfs2_extended_slot
Subject: ocfs2: clean up an unused variable 'wants_rotate' in ocfs2_truncate_rec
Subject: ocfs2: clean up unused parameter 'count' in o2hb_read_block_input()
Subject: ocfs2: clean up an unneeded goto in ocfs2_put_slot()
Subject: kernel/padata.c: removed unused code
Subject: kernel/padata.c: hide unused functions
Subject: mm/slab: fix the theoretical race by holding proper lock
Subject: mm/slab: remove BAD_ALIEN_MAGIC again
Subject: mm/slab: drain the free slab as much as possible
Subject: mm/slab: factor out kmem_cache_node initialization code
Subject: mm/slab: clean-up kmem_cache_node setup
Subject: mm/slab: don't keep free slabs if free_objects exceeds free_limit
Subject: mm/slab: racy access/modify the slab color
Subject: mm/slab: make cache_grow() handle the page allocated on arbitrary node
Subject: mm/slab: separate cache_grow() to two parts
Subject: mm/slab: refill cpu cache through a new slab without holding a node lock
Subject: mm/slab: lockless decision to grow cache
Subject: mm/slub.c: replace kick_all_cpus_sync() with synchronize_sched() in kmem_cache_shrink()
Subject: mm: SLAB freelist randomization
Subject: mm: slab: remove ZONE_DMA_FLAG
Subject: mm/slub.c: fix sysfs filename in comment
Subject: mm/page_ref: use page_ref helper instead of direct modification of _count
Subject: mm: rename _count, field of the struct page, to _refcount
Subject: compiler.h: add support for malloc attribute
Subject: include/linux: apply __malloc attribute
Subject: include/linux/nodemask.h: create next_node_in() helper
Subject: mm/hugetlb: optimize minimum size (min_size) accounting
Subject: mm/hugetlb: introduce hugetlb_bad_size()
Subject: arm64: mm: use hugetlb_bad_size()
Subject: metag: mm: use hugetlb_bad_size()
Subject: powerpc: mm: use hugetlb_bad_size()
Subject: tile: mm: use hugetlb_bad_size()
Subject: x86: mm: use hugetlb_bad_size()
Subject: mm/hugetlb: is_vm_hugetlb_page() can return bool
Subject: mm/memory_hotplug: is_mem_section_removable() can return bool
Subject: mm/vmalloc.c: is_vmalloc_addr() can return bool
Subject: mm/mempolicy.c: vma_migratable() can return bool
Subject: mm/memcontrol.c:mem_cgroup_select_victim_node(): clarify comment
Subject: mm/page_alloc: remove useless parameter of __free_pages_boot_core
Subject: mm/hugetlb.c: use first_memory_node
Subject: mm/mempolicy.c:offset_il_node() document and clarify
Subject: mm/rmap: replace BUG_ON(anon_vma->degree) with VM_WARN_ON
Subject: mm, compaction: wrap calculating first and last pfn of pageblock
Subject: mm, compaction: reduce spurious pcplist drains
Subject: mm, compaction: skip blocks where isolation fails in async direct compaction
Subject: mm/highmem: simplify is_highmem()
Subject: mm: uninline page_mapped()
Subject: mm/hugetlb: add same zone check in pfn_range_valid_gigantic()
Subject: mm/memory_hotplug: add comment to some functions related to memory hotplug
Subject: mm/vmstat: add zone range overlapping check
Subject: mm/page_owner: add zone range overlapping check
Subject: power: add zone range overlapping check
Subject: mm/writeback: correct dirty page calculation for highmem
Subject: mm/page_alloc: correct highmem memory statistics
Subject: mm/highmem: make nr_free_highpages() handles all highmem zones by itself
Subject: mm/vmstat: make node_page_state() handles all zones by itself
Subject: mm/mmap: kill hook arch_rebalance_pgtables()
Subject: mm: update_lru_size warn and reset bad lru_size
Subject: mm: update_lru_size do the __mod_zone_page_state
Subject: mm: use __SetPageSwapBacked and dont ClearPageSwapBacked
Subject: tmpfs: preliminary minor tidyups
Subject: tmpfs: mem_cgroup charge fault to vm_mm not current mm
Subject: mm: /proc/sys/vm/stat_refresh to force vmstat update
Subject: huge mm: move_huge_pmd does not need new_vma
Subject: huge pagecache: extend mremap pmd rmap lockout to files
Subject: arch: fix has_transparent_hugepage()
Subject: memory_hotplug: introduce CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
Subject: memory_hotplug: introduce memhp_default_state= command line parameter
Subject: mm, oom: move GFP_NOFS check to out_of_memory
Subject: oom, oom_reaper: try to reap tasks which skip regular OOM killer path
Subject: mm, oom_reaper: clear TIF_MEMDIE for all tasks queued for oom_reaper
Subject: mm, page_alloc: only check PageCompound for high-order pages
Subject: mm, page_alloc: use new PageAnonHead helper in the free page fast path
Subject: mm, page_alloc: reduce branches in zone_statistics
Subject: mm, page_alloc: inline zone_statistics
Subject: mm, page_alloc: inline the fast path of the zonelist iterator
Subject: mm, page_alloc: use __dec_zone_state for order-0 page allocation
Subject: mm, page_alloc: avoid unnecessary zone lookups during pageblock operations
Subject: mm, page_alloc: convert alloc_flags to unsigned
Subject: mm, page_alloc: convert nr_fair_skipped to bool
Subject: mm, page_alloc: remove unnecessary local variable in get_page_from_freelist
Subject: mm, page_alloc: remove unnecessary initialisation in get_page_from_freelist
Subject: mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask()
Subject: mm, page_alloc: simplify last cpupid reset
Subject: mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath
Subject: mm, page_alloc: check once if a zone has isolated pageblocks
Subject: mm, page_alloc: shorten the page allocator fast path
Subject: mm, page_alloc: reduce cost of fair zone allocation policy retry
Subject: mm, page_alloc: shortcut watermark checks for order-0 pages
Subject: mm, page_alloc: avoid looking up the first zone in a zonelist twice
Subject: mm, page_alloc: remove field from alloc_context
Subject: mm, page_alloc: check multiple page fields with a single branch
Subject: mm, page_alloc: un-inline the bad part of free_pages_check
Subject: mm, page_alloc: pull out side effects from free_pages_check
Subject: mm, page_alloc: remove unnecessary variable from free_pcppages_bulk
Subject: mm, page_alloc: inline pageblock lookup in page free fast paths
Subject: cpuset: use static key better and convert to new API
Subject: mm, page_alloc: defer debugging checks of freed pages until a PCP drain
Subject: mm, page_alloc: defer debugging checks of pages allocated from the PCP
Subject: mm, page_alloc: don't duplicate code in free_pcp_prepare
Subject: mm, page_alloc: uninline the bad page part of check_new_page()
Subject: mm, page_alloc: restore the original nodemask if the fast path allocation failed

- the rest of MM
- KASAN updates
- procfs updates
- exit, fork updates
- printk updates
- lib/ updates
- radix-tree testsuite updates
- checkpatch updates
- kprobes updates
- a few other misc bits

162 patches, based on 6eb59af580dcffc6f6982ac8ef6d27a1a5f26b27

- Please have a think about Oleg's "wait/ptrace: assume __WALL if the
  child is traced". It's a kernel-based workaround for existing
  userspace issues and is a form of non-back-compatible change.
- A few hotfixes
- befs cleanups
- nilfs2 updates
- sys_wait() changes
- kexec updates
- kdump
- scripts/gdb updates
- the last of the MM queue
- a few other misc things

84 patches, based on 7639dad93a5564579987abded4ec05e3db13659d:

Subject: m32r: fix build failure
Subject: : ELF/MIPS build fix
Subject: mm: memcontrol: fix possible css ref leak on oom
Subject: fs/befs/datastream.c:befs_read_datastream(): remove unneeded initialization to NULL
Subject: fs/befs/datastream.c:befs_read_lsymlink(): remove unneeded initialization to NULL
Subject: fs/befs/datastream.c:befs_find_brun_dblindirect(): remove unneeded initializations to NULL
Subject: fs/befs/linuxvfs.c:befs_get_block(): remove unneeded initialization to NULL
Subject: fs/befs/linuxvfs.c:befs_iget(): remove unneeded initialization to NULL
Subject: fs/befs/linuxvfs.c:befs_iget(): remove unneeded raw_inode initialization to NULL
Subject: fs/befs/linuxvfs.c:befs_iget(): remove unneeded befs_nio initialization to NULL
Subject: fs/befs/io.c:befs_bread_iaddr(): remove unneeded initialization to NULL
Subject: fs/befs/io.c:befs_bread(): remove unneeded initialization to NULL
Subject: nilfs2: constify nilfs_sc_operations structures
Subject: nilfs2: fix white space issue in nilfs_mount()
Subject: nilfs2: remove space before comma
Subject: nilfs2: remove FSF mailing address from GPL notices
Subject: nilfs2: clean up old e-mail addresses
Subject: MAINTAINERS: add web link for nilfs project
Subject: nilfs2: clarify permission to replicate the design
Subject: nilfs2: get rid of nilfs_mdt_mark_block_dirty()
Subject: nilfs2: move cleanup code of metadata file from inode routines
Subject: nilfs2: replace __attribute__((packed)) with __packed
Subject: nilfs2: add missing line spacing
Subject: nilfs2: clean trailing semicolons in macros
Subject: nilfs2: do not emit extra newline on nilfs_warning() and nilfs_error()
Subject: nilfs2: remove space before semicolon
Subject: nilfs2: fix code indent coding style issue
Subject: nilfs2: avoid bare use of 'unsigned'
Subject: nilfs2: remove unnecessary else after return or break
Subject: nilfs2: remove loops of single statement macros
Subject: nilfs2: fix block comments
Subject: wait/ptrace: assume __WALL if the child is traced
Subject: wait: allow sys_waitid() to accept __WNOTHREAD/__WCLONE/__WALL
Subject: signal: make oom_flags a bool
Subject: kernel/signal.c: convert printk(KERN_<LEVEL> ...) to pr_<level>(...)
Subject: signal: move the "sig < SIGRTMIN" check into siginmask(sig)
Subject: kernek/fork.c: allocate idle task for a CPU always on its local node
Subject: exec: remove the no longer needed remove_arg_zero()->free_arg_page()
Subject: kexec: introduce a protection mechanism for the crashkernel reserved memory
Subject: kexec: provide arch_kexec_protect(unprotect)_crashkres()
Subject: kexec: make a pair of map/unmap reserved pages in error path
Subject: kexec: do a cleanup for function kexec_load
Subject: s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres()
Subject: kdump: fix gdb macros work work with newer and 64-bit kernels
Subject: rtsx_usb_ms: use schedule_timeout_idle() in polling loop
Subject: drivers/memstick/core/mspro_block: use kmemdup
Subject: arch/defconfig: remove CONFIG_RESOURCE_COUNTERS
Subject: scripts/gdb: Adjust module reference counter reported by lx-lsmod
Subject: scripts/gdb: provide linux constants
Subject: scripts/gdb: provide kernel list item generators
Subject: scripts/gdb: convert modules usage to lists functions
Subject: scripts/gdb: provide exception catching parser
Subject: scripts/gdb: support !CONFIG_MODULES gracefully
Subject: scripts/gdb: provide a dentry_name VFS path helper
Subject: scripts/gdb: add io resource readers
Subject: scripts/gdb: add mount point list command
Subject: scripts/gdb: add cpu iterators
Subject: scripts/gdb: cast CPU numbers to integer
Subject: scripts/gdb: add a Radix Tree Parser
Subject: scripts/gdb: add documentation example for radix tree
Subject: scripts/gdb: add lx_thread_info_by_pid helper
Subject: scripts/gdb: improve types abstraction for gdb python scripts
Subject: scripts/gdb: fix issue with dmesg.py and python 3.X
Subject: scripts/gdb: decode bytestream on dmesg for Python3
Subject: MAINTAINERS: add co-maintainer for scripts/gdb
Subject: mm: make mmap_sem for write waits killable for mm syscalls
Subject: mm: make vm_mmap killable
Subject: mm: make vm_munmap killable
Subject: mm, aout: handle vm_brk failures
Subject: mm, elf: handle vm_brk error
Subject: mm: make vm_brk killable
Subject: mm, proc: make clear_refs killable
Subject: mm, fork: make dup_mmap wait for mmap_sem for write killable
Subject: ipc, shm: make shmem attach/detach wait for mmap_sem killable
Subject: vdso: make arch_setup_additional_pages wait for mmap_sem for write killable
Subject: coredump: make coredump_wait wait for mmap_sem for write killable
Subject: aio: make aio_setup_ring killable
Subject: exec: make exec path waiting for mmap_sem killable
Subject: prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable
Subject: uprobes: wait for mmap_sem for write killable
Subject: drm/i915: make i915_gem_mmap_ioctl wait for mmap_sem killable
Subject: drm/radeon: make radeon_mn_get wait for mmap_sem killable
Subject: drm/amdgpu: make amdgpu_mn_get wait for mmap_sem killable
Subject: kgdb: depends on VT

10 fixes, based on ea8ea737c46cffa5d0ee74309f81e55a7e5e9c2a:

Subject: seqlock: fix raw_read_seqcount_latch()
Subject: mm: make CONFIG_DEFERRED_STRUCT_PAGE_INIT depends on !FLATMEM explicitly
Subject: mm: kasan: remove unused 'reserved' field from struct kasan_alloc_meta
Subject: mm: slub: remove unused virt_to_obj()
Subject: ocfs2: fix improper handling of return errno
Subject: memcg: fix mem_cgroup_out_of_memory() return value.
Subject: mm: oom_reaper: remove some bloat
Subject: dma-debug: avoid spinlock recursion when disabling dma-debug
Subject: update "mm/zsmalloc: don't fail if can't create debugfs info"
Subject: drivers/pinctrl/intel/pinctrl-baytrail.c: fix build with gcc-4.4

- late-breaking ocfs2 updates
- random bunch of fixes

19 patches, based on dc03c0f9d12d85286d5e3623aa96d5c2a271b8e6:

Subject: ocfs2: o2hb: add negotiate timer
Subject: ocfs2: o2hb: add NEGO_TIMEOUT message
Subject: ocfs2: o2hb: add NEGOTIATE_APPROVE message
Subject: ocfs2: o2hb: add some user/debug log
Subject: ocfs2: o2hb: don't negotiate if last hb fail
Subject: ocfs2: o2hb: fix hb hung time
Subject: ocfs2: bump up o2cb network protocol version
Subject: direct-io: fix direct write stale data exposure from concurrent buffered read
Subject: mm: oom: do not reap task if there are live threads in threadgroup
Subject: MAINTAINERS: add kexec_core.c and kexec_file.c
Subject: MAINTAINERS: Kdump maintainers update
Subject: mm: use early_pfn_to_nid in page_ext_init
Subject: mm: use early_pfn_to_nid in register_page_bootmem_info_node
Subject: oom_reaper: close race with exiting task
Subject: mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()
Subject: mm/cma: silence warnings due to max() usage
Subject: mm/memcontrol.c: fix the margin computation in mem_cgroup_margin()
Subject: mm/memcontrol.c: move comments for get_mctgt_type() to proper position
Subject: mm: disable DEFERRED_STRUCT_PAGE_INIT on !NO_BOOTMEM

11 fixes, based on 4340fa55298d17049e71c7a34e04647379c269f3:

Subject: mm: fix overflow in vm_map_ram()
Subject: kdump: fix dmesg gdbmacro to work with record based printk
Subject: mm: check the return value of lookup_page_ext for all call sites
Subject: reiserfs: avoid uninitialized variable use
Subject: memcg: add RCU locking around css_for_each_descendant_pre() in memcg_offline_kmem()
Subject: mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup
Subject: checkpatch: reduce git commit description style false positives
Subject: mm, page_alloc: prevent infinite loop in buffered_rmqueue()
Subject: mm, oom_reaper: do not use siglock in try_oom_reaper()
Subject: mm, page_alloc: reset zonelist iterator after resetting fair zone allocation policy
Subject: mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies

7 fixes, based on c8ae067f2635be0f8c7e5db1bb74b757d623e05b:

Subject: mm/hugetlb: fix huge page reserve accounting for private mappings
Subject: kasan: change memory hot-add error messages to info messages
Subject: revert "mm: memcontrol: fix possible css ref leak on oom"
Subject: mm: thp: broken page count after commit aa88b68c
Subject: kernel/relay.c: fix potential memory leak
Subject: mm: introduce dedicated WQ_MEM_RECLAIM workqueue to do lru_add_drain_all
Subject: mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEED

Two weeks' worth of fixes here.

41 fixes, based on 63c04ee7d3b7c8d8e2726cb7c5f8a5f6fcc1e3b2:

Subject: mm,oom_reaper: don't call mmput_async() without atomic_inc_not_zero()
Subject: oom_reaper: avoid pointless atomic_inc_not_zero usage.
Subject: selftests/vm/compaction_test: fix write to restore nr_hugepages
Subject: tmpfs: don't undo fallocate past its last page
Subject: tree wide: get rid of __GFP_REPEAT for order-0 allocations part I
Subject: x86: get rid of superfluous __GFP_REPEAT
Subject: x86/efi: get rid of superfluous __GFP_REPEAT
Subject: arm64: get rid of superfluous __GFP_REPEAT
Subject: arc: get rid of superfluous __GFP_REPEAT
Subject: mips: get rid of superfluous __GFP_REPEAT
Subject: nios2: get rid of superfluous __GFP_REPEAT
Subject: parisc: get rid of superfluous __GFP_REPEAT
Subject: score: get rid of superfluous __GFP_REPEAT
Subject: powerpc: get rid of superfluous __GFP_REPEAT
Subject: sparc: get rid of superfluous __GFP_REPEAT
Subject: s390: get rid of superfluous __GFP_REPEAT
Subject: sh: get rid of superfluous __GFP_REPEAT
Subject: tile: get rid of superfluous __GFP_REPEAT
Subject: unicore32: get rid of superfluous __GFP_REPEAT
Subject: jbd2: get rid of superfluous __GFP_REPEAT
Subject: MAINTAINERS: update Calgary IOMMU
Subject: mm: mempool: kasan: don't poot mempool objects in quarantine
Subject: mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
Subject: mailmap: add Antoine Tenart's email
Subject: mailmap: add Boris Brezillon's email
Subject: Revert "mm: make faultaround produce old ptes"
Subject: Revert "mm: disable fault around on emulated access bit architecture"
Subject: hugetlb: fix nr_pmds accounting with shared page tables
Subject: memcg: mem_cgroup_migrate() may be called with irq disabled
Subject: memcg: css_alloc should return an ERR_PTR value on error
Subject: mm/swap.c: flush lru pvecs on compound page arrival
Subject: mm/hugetlb: clear compound_mapcount when freeing gigantic pages
Subject: mm: prevent KASAN false positives in kmemleak
Subject: mm, compaction: abort free scanner if split fails
Subject: ocfs2: disable BUG assertions in reading blocks
Subject: oom, suspend: fix oom_reaper vs. oom_killer_disable race
Subject: fs/nilfs2: fix potential underflow in call to crc32_le
Subject: tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences"
Subject: mm/page_owner: avoid null pointer dereference
Subject: autofs: don't get stuck in a loop if vfs_write() returns an error
Subject: init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64

20 fixes, based on f97d10454e4da2aceb44dfa7c59bb43ba9f50199:

Subject: mm, compaction: prevent VM_BUG_ON when terminating freeing scanner
Subject: kasan: add newline to messages
Subject: scripts/gdb: silence 'nothing to do' message
Subject: scripts/gdb: rebuild constants.py on dependancy change
Subject: scripts/gdb: add constants.py to .gitignore
Subject: scripts/gdb: Perform path expansion to lx-symbol's arguments
Subject: Revert "scripts/gdb: add a Radix Tree Parser"
Subject: Revert "scripts/gdb: add documentation example for radix tree"
Subject: madvise_free, thp: fix madvise_free_huge_pmd return value after splitting
Subject: uapi: export lirc.h header
Subject: kasan/quarantine: fix bugs on qlist_move_cache()
Subject: mm, meminit: always return a valid node from early_pfn_to_nid
Subject: mm, meminit: ensure node is online before checking whether pages are uninitialised
Subject: gcov: add support for gcc version >= 6
Subject: vmlinux.lds: account for destructor sections
Subject: mm: thp: move pmd check inside ptl for freeze_page()
Subject: mm: rmap: call page_check_address() with sync enabled to avoid racy check
Subject: mm: thp: refix false positive BUG in page_move_anon_rmap()
Subject: mm: workingset: printk missing log level, use pr_info()
Subject: m32r: fix build warning about putc

5 fixes, based on 47ef4ad2684d380dd6d596140fb79395115c3950:

Subject: mm: memcontrol: fix cgroup creation failure after many small jobs
Subject: radix-tree: fix radix_tree_iter_retry() for tagged iterators.
Subject: testing/radix-tree: fix a macro expansion bug
Subject: tools/vm/slabinfo: fix an unintentional printf
Subject: pps: do not crash when failed to register

- a few misc bits
- ocfs2
- most(?) of MM

126 patches, based on e65805251f2db69c9f67ed8062ab82526be5a374:

Subject: arm: get rid of superfluous __GFP_REPEAT
Subject: dax: some small updates to dax.txt documentation
Subject: dax: remote unused fault wrappers
Subject: dma-debug: track bucket lock state for static checkers
Subject: fbmon: remove unused function argument
Subject: CFLAGS: add -Wunused-but-set-parameter
Subject: kbuild: abort build on bad stack protector flag
Subject: scripts/bloat-o-meter: fix percent on <1% changes
Subject: m32r: add __ucmpdi2 to fix build failure
Subject: debugobjects.h: fix trivial kernel doc warning
Subject: ocfs2: fix a redundant re-initialization
Subject: ocfs2: improve recovery performance
Subject: ocfs2: cleanup unneeded goto in ocfs2_create_new_inode_locks
Subject: ocfs2/dlm: fix memory leak of dlm_debug_ctxt
Subject: ocfs2: cleanup implemented prototypes
Subject: ocfs2: remove obscure BUG_ON in dlmglue
Subject: ocfs2/cluster: clean up unnecessary assignment for 'ret'
Subject: fs/fs-writeback.c: add a new writeback list for sync
Subject: fs/fs-writeback.c: inode writeback list tracking tracepoints
Subject: mm: reorganize SLAB freelist randomization
Subject: mm: SLUB freelist randomization
Subject: slab: make GFP_SLAB_BUG_MASK information more human readable
Subject: slab: do not panic on invalid gfp_mask
Subject: mm: faster kmalloc_array(), kcalloc()
Subject: mm/slab: use list_move instead of list_del/list_add
Subject: mm/memcontrol.c: remove the useless parameter for mc_handle_swap_pte
Subject: mm/init: fix zone boundary creation
Subject: memory-hotplug: add move_pfn_range()
Subject: memory-hotplug: more general validation of zone during online
Subject: memory-hotplug: use zone_can_shift() for sysfs valid_zones attribute
Subject: mm: zap ZONE_OOM_LOCKED
Subject: mm: oom: add memcg to oom_control
Subject: include/linux/mmdebug.h: add VM_WARN which maps to WARN()
Subject: powerpc/mm: check for irq disabled() only if DEBUG_VM is enabled
Subject: zram: rename zstrm find-release functions
Subject: zram: switch to crypto compress API
Subject: zram: use crypto api to check alg availability
Subject: zram: cosmetic: cleanup documentation
Subject: zram: delete custom lzo/lz4
Subject: zram: add more compression algorithms
Subject: zram: drop gfp_t from zcomp_strm_alloc()
Subject: mm: use put_page() to free page instead of putback_lru_page()
Subject: mm: migrate: support non-lru movable page migration
Subject: mm: balloon: use general non-lru movable page feature
Subject: zsmalloc: keep max_object in size_class
Subject: zsmalloc: use bit_spin_lock
Subject: zsmalloc: use accessor
Subject: zsmalloc: factor page chain functionality out
Subject: zsmalloc: introduce zspage structure
Subject: zsmalloc: separate free_zspage from putback_zspage
Subject: zsmalloc: use freeobj for index
Subject: zsmalloc: page migration support
Subject: zram: use __GFP_MOVABLE for memory allocation
Subject: zsmalloc: use OBJ_TAG_BIT for bit shifter
Subject: mm/compaction: split freepages without holding the zone lock
Subject: mm/page_owner: initialize page owner without holding the zone lock
Subject: mm/page_owner: copy last_migrate_reason in copy_page_owner()
Subject: mm/page_owner: introduce split_page_owner and replace manual handling
Subject: tools/vm/page_owner: increase temporary buffer size
Subject: mm/page_owner: use stackdepot to store stacktrace
Subject: mm/page_alloc: introduce post allocation processing on page allocator
Subject: mm/page_isolation: clean up confused code
Subject: mm: thp: check pmd_trans_unstable() after split_huge_pmd()
Subject: mm/hugetlb: simplify hugetlb unmap
Subject: mm: change the interface for __tlb_remove_page()
Subject: mm/mmu_gather: track page size with mmu gather and force flush if page size change
Subject: mm: remove pointless struct in struct page definition
Subject: mm: clean up non-standard page->_mapcount users
Subject: mm: memcontrol: cleanup kmem charge functions
Subject: mm: charge/uncharge kmemcg from generic page allocator paths
Subject: mm: memcontrol: teach uncharge_list to deal with kmem pages
Subject: arch: x86: charge page tables to kmemcg
Subject: pipe: account to kmemcg
Subject: af_unix: charge buffers to kmemcg
Subject: mm,oom: remove unused argument from oom_scan_process_thread().
Subject: mm, frontswap: convert frontswap_enabled to static key
Subject: mm: add NR_ZSMALLOC to vmstat
Subject: include/linux/memblock.h: Clean up code for several trivial details
Subject: mm, oom_reaper: make sure that mmput_async is called only when memory was reaped
Subject: mm, memcg: use consistent gfp flags during readahead
Subject: mm/memblock.c:memblock_add_range(): if nr_new is 0 just return
Subject: mm: make optimistic check for swapin readahead
Subject: mm: make swapin readahead to improve thp collapse rate
Subject: mm, thp: make swapin readahead under down_read of mmap_sem
Subject: mm, thp: fix locking inconsistency in collapse_huge_page
Subject: khugepaged: recheck pmd after mmap_sem re-acquired
Subject: thp, mlock: update unevictable-lru.txt
Subject: mm: do not pass mm_struct into handle_mm_fault
Subject: mm: introduce fault_env
Subject: mm: postpone page table allocation until we have page to map
Subject: rmap: support file thp
Subject: mm: introduce do_set_pmd()
Subject: thp, vmstats: add counters for huge file pages
Subject: thp: support file pages in zap_huge_pmd()
Subject: thp: handle file pages in split_huge_pmd()
Subject: thp: handle file COW faults
Subject: thp: skip file huge pmd on copy_huge_pmd()
Subject: thp: prepare change_huge_pmd() for file thp
Subject: thp: run vma_adjust_trans_huge() outside i_mmap_rwsem
Subject: thp: file pages support for split_huge_page()
Subject: thp, mlock: do not mlock PTE-mapped file huge pages
Subject: vmscan: split file huge pages before paging them out
Subject: page-flags: relax policy for PG_mappedtodisk and PG_reclaim
Subject: radix-tree: implement radix_tree_maybe_preload_order()
Subject: filemap: prepare find and delete operations for huge pages
Subject: truncate: handle file thp
Subject: mm, rmap: account shmem thp pages
Subject: shmem: prepare huge= mount option and sysfs knob
Subject: shmem: get_unmapped_area align huge page
Subject: shmem: add huge pages support
Subject: shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
Subject: thp: extract khugepaged from mm/huge_memory.c
Subject: khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
Subject: shmem: make shmem_inode_info::lock irq-safe
Subject: khugepaged: add support of collapse for tmpfs/shmem pages
Subject: thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
Subject: shmem: split huge pages beyond i_size under memory pressure
Subject: thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
Subject: mm, thp: fix comment inconsistency for swapin readahead functions
Subject: mm, thp: convert from optimistic swapin collapsing to conservative
Subject: mm: fix build warnings in <linux/compaction.h>
Subject: mm: memcontrol: remove BUG_ON in uncharge_list
Subject: mm: memcontrol: fix documentation for compound parameter
Subject: cgroup: fix idr leak for the first cgroup root
Subject: cgroup: remove unnecessary 0 check from css_from_id()
Subject: thp: fix comments of __pmd_trans_huge_lock()

- the rest of MM
101 patches, based on 194dc870a5890e855ecffb30f3b80ba7c88f96d6:
Subject: proc, oom: drop bogus task_lock and mm check
Subject: proc, oom: drop bogus sighand lock
Subject: proc, oom_adj: extract oom_score_adj setting into a helper
Subject: mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj
Subject: mm, oom: skip vforked tasks from being selected
Subject: mm, oom: kill all tasks sharing the mm
Subject: mm, oom: fortify task_will_free_mem()
Subject: mm, oom: task_will_free_mem should skip oom_reaped tasks
Subject: mm, oom_reaper: do not attempt to reap a task more than twice
Subject: mm, oom: hide mm which is shared with kthread or global init
Subject: mm, oom: tighten task_will_free_mem() locking
Subject: mm: update the comment in __isolate_free_page
Subject: mm: fix vm-scalability regression in cgroup-aware workingset code
Subject: mm/compaction: remove unnecessary order check in try_to_compact_pages()
Subject: freezer, oom: check TIF_MEMDIE on the correct task
Subject: cpuset, mm: fix TIF_MEMDIE check in cpuset_change_task_nodemask
Subject: mm, meminit: remove early_page_nid_uninitialised
Subject: mm, vmstat: add infrastructure for per-node vmstats
Subject: mm, vmscan: move lru_lock to the node
Subject: mm, vmscan: move LRU lists to node
Subject: mm, mmzone: clarify the usage of zone padding
Subject: mm, vmscan: begin reclaiming pages on a per-node basis
Subject: mm, vmscan: have kswapd only scan based on the highest requested zone
Subject: mm, vmscan: make kswapd reclaim in terms of nodes
Subject: mm, vmscan: remove balance gap
Subject: mm, vmscan: simplify the logic deciding whether kswapd sleeps
Subject: mm, vmscan: by default have direct reclaim only shrink once per node
Subject: mm, vmscan: remove duplicate logic clearing node congestion and dirty state
Subject: mm: vmscan: do not reclaim from kswapd if there is any eligible zone
Subject: mm, vmscan: make shrink_node decisions more node-centric
Subject: mm, memcg: move memcg limit enforcement from zones to nodes
Subject: mm, workingset: make working set detection node-aware
Subject: mm, page_alloc: consider dirtyable memory in terms of nodes
Subject: mm: move page mapped accounting to the node
Subject: mm: rename NR_ANON_PAGES to NR_ANON_MAPPED
Subject: mm: move most file-based accounting to the node
Subject: mm: move vmscan writes and file write accounting to the node
Subject: mm, vmscan: only wakeup kswapd once per node for the requested classzone
Subject: mm, page_alloc: wake kswapd based on the highest eligible zone
Subject: mm: convert zone_reclaim to node_reclaim
Subject: mm, vmscan: avoid passing in classzone_idx unnecessarily to shrink_node
Subject: mm, vmscan: avoid passing in classzone_idx unnecessarily to compaction_ready
Subject: mm, vmscan: avoid passing in `remaining' unnecessarily to prepare_kswapd_sleep()
Subject: mm, vmscan: Have kswapd reclaim from all zones if reclaiming and buffer_heads_over_limit
Subject: mm, vmscan: add classzone information to tracepoints
Subject: mm, page_alloc: remove fair zone allocation policy
Subject: mm: page_alloc: cache the last node whose dirty limit is reached
Subject: mm: vmstat: replace __count_zone_vm_events with a zone id equivalent
Subject: mm: vmstat: account per-zone stalls and pages skipped during reclaim
Subject: mm, vmstat: print node-based stats in zoneinfo file
Subject: mm, vmstat: remove zone and node double accounting by approximating retries
Subject: mm, page_alloc: fix dirtyable highmem calculation
Subject: mm, pagevec: release/reacquire lru_lock on pgdat change
Subject: mm: show node_pages_scanned per node, not zone
Subject: mm, vmscan: Update all zone LRU sizes before updating memcg
Subject: mm, vmscan: remove redundant check in shrink_zones()
Subject: mm, vmscan: release/reacquire lru_lock on pgdat change
Subject: mm: add per-zone lru list stat
Subject: mm, vmscan: remove highmem_file_pages
Subject: mm: remove reclaim and compaction retry approximations
Subject: mm: consider whether to decivate based on eligible zones inactive ratio
Subject: mm, vmscan: account for skipped pages as a partial scan
Subject: mm: bail out in shrink_inactive_list()
Subject: mm/zsmalloc: use obj_index to keep consistent with others
Subject: mm/zsmalloc: take obj index back from find_alloced_obj
Subject: mm/zsmalloc: use class->objs_per_zspage to get num of max objects
Subject: mm/zsmalloc: avoid calculate max objects of zspage twice
Subject: mm/zsmalloc: keep comments consistent with code
Subject: mm/zsmalloc: add __init,__exit attribute
Subject: mm/zsmalloc: use helper to clear page->flags bit
Subject: mm, THP: clean up return value of madvise_free_huge_pmd
Subject: memblock: include <asm/sections.h> instead of <asm-generic/sections.h>
Subject: mm: CONFIG_ZONE_DEVICE stop depending on CONFIG_EXPERT
Subject: mm: cleanup ifdef guards for vmem_altmap
Subject: mm: track NR_KERNEL_STACK in KiB instead of number of stacks
Subject: mm: fix memcg stack accounting for sub-page stacks
Subject: kdb: use task_cpu() instead of task_thread_info()->cpu
Subject: printk: when dumping regs, show the stack, not thread_info
Subject: mm/memblock.c: add new infrastructure to address the mem limit issue
Subject: arm64:acpi: fix the acpi alignment exception when 'mem=' specified
Subject: kmemleak: don't hang if user disables scanning early
Subject: make __section_nr() more efficient
Subject: mm: hwpoison: remove incorrect comments
Subject: mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode
Subject: Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"
Subject: mm: add cond_resched() to generic_swapfile_activate()
Subject: mm: optimize copy_page_to/from_iter_iovec
Subject: mem-hotplug: alloc new page from a nearest neighbor node when mem-offline
Subject: mm/memblock.c: fix index adjustment error in __next_mem_range_rev()
Subject: zsmalloc: Delete an unnecessary check before the function call "iput"
Subject: mm: fix use-after-free if memory allocation failed in vma_adjust()
Subject: mm, kasan: account for object redzone in SLUB's nearest_obj()
Subject: mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB
Subject: lib/stackdepot.c: use __GFP_NOWARN for stack allocations
Subject: mm, page_alloc: set alloc_flags only once in slowpath
Subject: mm, page_alloc: don't retry initial attempt in slowpath
Subject: mm, page_alloc: restructure direct compaction handling in slowpath
Subject: mm, page_alloc: make THP-specific decisions more generic
Subject: mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
Subject: mm, compaction: introduce direct compaction priority
Subject: mm, compaction: simplify contended compaction handling
- the rest of ocfs2
- various hotfixes, mainly MM
- quite a bit of misc stuff - drivers, fork, exec, signals, etc.
- printk updates
- firmware
- checkpatch
- nilfs2
- more kexec stuff than usual
- rapidio updates
- w1 things
111 patches, based on f7b32e4c021fd788f13f6785e17efbc3eb05b351:
Subject: ocfs2: ensure that dlm lockspace is created by kernel module
Subject: ocfs2: retry on ENOSPC if sufficient space in truncate log
Subject: ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF is cleared before dlm_deref_lockres_done_handler
Subject: ocfs2/dlm: solve a BUG when deref failed in dlm_drop_lockres_ref
Subject: ocfs2/dlm: continue to purge recovery lockres when recovery master goes down
Subject: mm: fail prefaulting if page table allocation fails
Subject: mm: move swap-in anonymous page into active list
Subject: tools/testing/radix-tree/linux/gfp.h: fix bitrotted value
Subject: mm/hugetlb: avoid soft lockup in set_max_huge_pages()
Subject: mm, hugetlb: fix huge_pte_alloc BUG_ON
Subject: memcg: put soft limit reclaim out of way if the excess tree is empty
Subject: mm/kasan: fix corruptions and false positive reports
Subject: mm/kasan: don't reduce quarantine in atomic contexts
Subject: mm/kasan, slub: don't disable interrupts when object leaves quarantine
Subject: mm/kasan: get rid of ->alloc_size in struct kasan_alloc_meta
Subject: mm/kasan: get rid of ->state in struct kasan_alloc_meta
Subject: kasan: improve double-free reports
Subject: kasan: avoid overflowing quarantine size on low memory systems
Subject: radix-tree: account nodes to memcg only if explicitly requested
Subject: mm: vmscan: fix memcg-aware shrinkers not called on global reclaim
Subject: sysv, ipc: fix security-layer leaking
Subject: UBSAN: fix typo in format string
Subject: cgroup: update cgroup's document path
Subject: MAINTAINERS: befs: add new maintainers
Subject: proc_oom_score: remove tasklist_lock and pid_alive()
Subject: procfs: avoid 32-bit time_t in /proc/*/stat
Subject: fs/proc/task_mmu.c: suppress compilation warnings with W=1
Subject: init/Kconfig: make COMPILE_TEST depend on !UML
Subject: memstick: don't allocate unused major for ms_block
Subject: treewide: replace obsolete _refok by __ref
Subject: uapi: move forward declarations of internal structures
Subject: mailmap: add Linus Lüssing
Subject: include: mman: use bool instead of int for the return value of arch_validate_prot
Subject: task_work: use READ_ONCE/lockless_dereference, avoid pi_lock if !task_works
Subject: dynamic_debug: only add header when used
Subject: printk: do not include interrupt.h
Subject: printk: create pr_<level> functions
Subject: printk: introduce suppress_message_printing()
Subject: printk: include <asm/sections.h> instead of <asm-generic/sections.h>
Subject: fbdev/bfin_adv7393fb: move DRIVER_NAME before its first use
Subject: ratelimit: extend to print suppressed messages on release
Subject: printk: add kernel parameter to control writes to /dev/kmsg
Subject: get_maintainer.pl: reduce need for command-line option -f
Subject: lib/iommu-helper: skip to next segment
Subject: crc32: use ktime_get_ns() for measurement
Subject: radix-tree: fix comment about "exceptional" bits
Subject: firmware: consolidate kmap/read/write logic
Subject: firmware: provide infrastructure to make fw caching optional
Subject: firmware: support loading into a pre-allocated buffer
Subject: checkpatch: skip long lines that use an EFI_GUID macro
Subject: checkpatch: allow c99 style // comments
Subject: checkpatch: yet another commit id improvement
Subject: checkpatch: don't complain about BIT macro in uapi
Subject: checkpatch: improve 'bare use of' signed/unsigned types warning
Subject: checkpatch: check signoff when reading stdin
Subject: checkpatch: if no filenames then read stdin
Subject: binfmt_elf: fix calculations for bss padding
Subject: mm: refuse wrapped vm_brk requests
Subject: fs/binfmt_em86.c: fix incompatible pointer type
Subject: nilfs2: hide function name argument from nilfs_error()
Subject: nilfs2: add nilfs_msg() message interface
Subject: nilfs2: embed a back pointer to super block instance in nilfs object
Subject: nilfs2: reduce bare use of printk() with nilfs_msg()
Subject: nilfs2: replace nilfs_warning() with nilfs_msg()
Subject: nilfs2: emit error message when I/O error is detected
Subject: nilfs2: do not use yield()
Subject: nilfs2: refactor parser of snapshot mount option
Subject: nilfs2: fix misuse of a semaphore in sysfs code
Subject: nilfs2: use BIT() macro
Subject: nilfs2: move ioctl interface and disk layout to uapi separately
Subject: reiserfs: fix "new_insert_key may be used uninitialized ..."
Subject: signal: consolidate {TS,TLF}_RESTORE_SIGMASK code
Subject: kernel/exit.c: quieten greatest stack depth printk
Subject: cpumask: fix code comment
Subject: kexec: return error number directly
Subject: ARM: kdump: advertise boot aliased crash kernel resource
Subject: ARM: kexec: advertise location of bootable RAM
Subject: kexec: don't invoke OOM-killer for control page allocation
Subject: kexec: ensure user memory sizes do not wrap
Subject: kdump: arrange for paddr_vmcoreinfo_note() to return phys_addr_t
Subject: kexec: allow architectures to override boot mapping
Subject: ARM: keystone: dts: add psci command definition
Subject: ARM: kexec: fix kexec for Keystone 2
Subject: kexec: use core_param for crash_kexec_post_notifiers boot option
Subject: kexec: add a kexec_crash_loaded() function
Subject: kexec: allow kdump with crash_kexec_post_notifiers
Subject: kexec: add restriction on kexec_load() segment sizes
Subject: rapidio: add RapidIO channelized messaging driver
Subject: rapidio: remove unnecessary 0x prefixes before %pa extension uses
Subject: rapidio/documentation: fix mangled paragraph in mport_cdev
Subject: rapidio: fix return value description for dma_prep functions
Subject: rapidio/tsi721_dma: add channel mask and queue size parameters
Subject: rapidio/tsi721: add PCIe MRRS override parameter
Subject: rapidio/tsi721: add messaging mbox selector parameter
Subject: rapidio/tsi721_dma: advance queue processing from transfer submit call
Subject: rapidio: fix error handling in mbox request/release functions
Subject: rapidio/idt_gen2: fix locking warning
Subject: rapidio: change inbound window size type to u64
Subject: rapidio: modify for rev.3 specification changes
Subject: powerpc/fsl_rio: apply changes for RIO spec rev 3
Subject: rapidio/switches: add driver for IDT gen3 switches
Subject: w1: remove need for ida and use PLATFORM_DEVID_AUTO
Subject: w1: add helper macro module_w1_family
Subject: w1:omap_hdq: fix regression
Subject: init: allow blacklisting of module_init functions
Subject: relay: add global mode support for buffer-only channels
Subject: init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig
Subject: config: add android config fragments
Subject: init/Kconfig: add clarification for out-of-tree modules
Subject: kcov: allow more fine-grained coverage instrumentation
Subject: ipc: delete "nr_ipc_ns"
- dma-mapping API cleanup
- a few cleanups and misc things
- use jump labels in dynamic-debug
18 patches, based on bf0f500bd0199aab613eb0ecb3412edd5472740d:
Subject: drivers/fpga/Kconfig: fix build failure
Subject: tree-wide: replace config_enabled() with IS_ENABLED()
Subject: include/linux/bitmap.h: cleanup
Subject: media: mtk-vcodec: remove unused dma_attrs
Subject: dma-mapping: use unsigned long for dma_attrs
Subject: samples/kprobe: convert the printk to pr_info/pr_err
Subject: samples/jprobe: convert the printk to pr_info/pr_err
Subject: samples/kretprobe: convert the printk to pr_info/pr_err
Subject: samples/kretprobe: fix the wrong type
Subject: block: remove BLK_DEV_DAX config option
Subject: MAINTAINERS: update email and list of Samsung HW driver maintainers
Subject: drivers/media/dvb-frontends/cxd2841er.c: avoid misleading gcc warning
Subject: powerpc: add explicit #include <asm/asm-compat.h> for jump label
Subject: sparc: support static_key usage in non-module __exit sections
Subject: tile: support static_key usage in non-module __exit sections
Subject: arm: jump label may reference text in __exit
Subject: jump_label: remove bug.h, atomic.h dependencies for HAVE_JUMP_LABEL
Subject: dynamic_debug: add jump label support
A few late-breaking fixes.
7 fixes, based on c1ece76719205690f4b448460d9b85c130e8021b:
Subject: mm: disable CONFIG_MEMORY_HOTPLUG when KASAN is enabled
Subject: mm/memblock: fix a typo in a comment
Subject: mm: initialise per_cpu_nodestats for all online pgdats at boot
Subject: powerpc/fsl_rio: fix a missing error code
Subject: slub: drop bogus inline for fixup_red_left()
Subject: MAINTAINERS: update cgroup's document path
Subject: mm/memblock.c: fix NULL dereference error
7 fixes, based on 85e97be32c6242c98dbbc7a241b4a78c1b93327b:
Subject: mm/hugetlb: fix incorrect hugepages count during mem hotplug
Subject: proc, meminfo: use correct helpers for calculating LRU sizes in meminfo
Subject: mm: memcontrol: fix swap counter leak on swapout from offline cgroup
Subject: mm: memcontrol: fix memcg id ref counter on swap charge move
Subject: kasan: remove the unnecessary WARN_ONCE from quarantine.c
Subject: mm, oom: fix uninitialized ret in task_will_free_mem()
Subject: mm/memory_hotplug.c: initialize per_cpu_nodestats for hotadded pgdats
12 fixes, based on 61c04572de404e52a655a36752e696bbcb483cf5:
Subject: byteswap: don't use __builtin_bswap*() with sparse
Subject: get_maintainer: quiet noisy implicit -f vcs_file_exists checking
Subject: sysctl: handle error writing UINT_MAX to u32 fields
Subject: stackdepot: fix mempolicy use-after-free
Subject: soft_dirty: fix soft_dirty during THP split
Subject: printk: fix parsing of "brl=" option
Subject: treewide: replace config_enabled() with IS_ENABLED() (2nd round)
Subject: mm: clarify COMPACTION Kconfig text
Subject: mm: memcontrol: avoid unused function warning
Subject: fs/seq_file: fix out-of-bounds read
Subject: dax: fix device-dax region base
Subject: mm: silently skip readahead for DAX inodes
14 fixes, based on 071e31e254e0e0c438eecba3dba1d6e2d0da36c2:
Subject: mm, oom: prevent premature OOM killer invocation for high order request
Subject: kexec: fix double-free when failing to relocate the purgatory
Subject: kconfig: tinyconfig: provide whole choice blocks to avoid warnings
Subject: lib/test_hash.c: fix warning in two-dimensional array init
Subject: lib/test_hash.c: fix warning in preprocessor symbol evaluation
Subject: mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator
Subject: drivers/scsi/wd719x.c: remove last declaration using DEFINE_PCI_DEVICE_TABLE
Subject: treewide: remove references to the now unnecessary DEFINE_PCI_DEVICE_TABLE
Subject: printk/nmi: avoid direct printk()-s from __printk_nmi_flush()
Subject: mm, mempolicy: task->mempolicy must be NULL before dropping final reference
Subject: MAINTAINERS: Vladimir has moved
Subject: kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
Subject: rapidio/documentation/mport_cdev: add missing parameter description
Subject: rapidio/tsi721: fix incorrect detection of address translation condition
20 fixes, based on 3be7988674ab33565700a37b210f502563d932e6:
Subject: mem-hotplug: don't clear the only node in new_node_page()
Subject: ocfs2/dlm: fix race between convert and migration
Subject: MAINTAINERS: Maik has moved
Subject: khugepaged: fix use-after-free in collapse_huge_page()
Subject: mm, thp: fix leaking mapped pte in __collapse_huge_page_swapin()
Subject: mm: avoid endless recursion in dump_page()
Subject: MAINTAINERS: update email for VLYNQ bus entry
Subject: autofs: use dentry flags to block walks during expire
Subject: mm: fix the page_swap_info() BUG_ON check
Subject: ipc/shm: fix crash if CONFIG_SHMEM is not set
Subject: ocfs2: fix trans extend while flush truncate log
Subject: ocfs2: fix trans extend while free cached blocks
Subject: fsnotify: add a way to stop queueing events on group shutdown
Subject: fanotify: fix list corruption in fanotify_get_response()
Subject: ocfs2: fix double unlock in case retry after free truncate log
Subject: mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting
Subject: cgroup: duplicate cgroup reference when cloning sockets
Subject: ocfs2: fix start offset to ocfs2_zero_range_for_truncate()
Subject: Revert "ocfs2: bump up o2cb network protocol version"
Subject: rapidio/rio_cm: avoid GFP_KERNEL in atomic context
5 fixes, based on 8ab293e3a1376574e11f9059c09cc0db212546cb:
Subject: mm,ksm: fix endless looping in allocating memory when ksm enable
Subject: dma-mapping.h: preserve unmap info for CONFIG_DMA_API_DEBUG
Subject: scripts/recordmcount.c: account for .softirqentry.text
Subject: mem-hotplug: use nodes that contain memory as mask in new_node_page()
Subject: MAINTAINERS: Mark has moved
4 fixes, based on e3b3656ca63e23b5755183718df36fb9ff518b02:
Subject: mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()
Subject: ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
Subject: include/linux/property.h: fix typo/compile error
Subject: MAINTAINERS: Javi has moved
- fsnotify updates
- ocfs2 updates
- all of MM
127 patches, based on 87840a2b7e048018d18d60bdac5c09224de85370:
Subject: fsnotify: drop notification_mutex before destroying event
Subject: fsnotify: convert notification_mutex to a spinlock
Subject: fanotify: use notification_lock instead of access_lock
Subject: fanotify: fix possible false warning when freeing events
Subject: fsnotify: clean up spinlock assertions
Subject: jiffies: add time comparison functions for 64 bit jiffies
Subject: fs/ocfs2/dlmfs: remove deprecated create_singlethread_workqueue()
Subject: fs/ocfs2/cluster: remove deprecated create_singlethread_workqueue()
Subject: fs/ocfs2/super: remove deprecated create_singlethread_workqueue()
Subject: fs/ocfs2/dlm: remove deprecated create_singlethread_workqueue()
Subject: ocfs2: fix undefined struct variable in inode.h
Subject: mm: oom: deduplicate victim selection code for memcg and global oom
Subject: mm/vmalloc.c: fix align value calculation error
Subject: mm: memcontrol: add sanity checks for memcg->id.ref on get/put
Subject: mm/oom_kill.c: fix task_will_free_mem() comment
Subject: mm, compaction: make whole_zone flag ignore cached scanner positions
Subject: mm, compaction: cleanup unused functions
Subject: mm, compaction: rename COMPACT_PARTIAL to COMPACT_SUCCESS
Subject: mm, compaction: don't recheck watermarks after COMPACT_SUCCESS
Subject: mm, compaction: add the ultimate direct compaction priority
Subject: mm, compaction: use correct watermark when checking compaction success
Subject: mm, compaction: create compact_gap wrapper
Subject: mm, compaction: use proper alloc_flags in __compaction_suitable()
Subject: mm, compaction: require only min watermarks for non-costly orders
Subject: mm, vmscan: make compaction_ready() more accurate and readable
Subject: mem-hotplug: fix node spanned pages when we have a movable node
Subject: mm: fix set pageblock migratetype in deferred struct page init
Subject: mm, vmscan: get rid of throttle_vm_writeout
Subject: mm/debug_pagealloc.c: clean-up guard page handling code
Subject: mm/debug_pagealloc.c: don't allocate page_ext if we don't use guard page
Subject: mm/page_owner: move page_owner specific function to page_owner.c
Subject: mm/page_ext: rename offset to index
Subject: mm/page_ext: support extra space allocation by page_ext user
Subject: mm/page_owner: don't define fields on struct page_ext by hard-coding
Subject: do_generic_file_read(): fail immediately if killed
Subject: mm: pagewalk: fix the comment for test_walk
Subject: mm: unrig VMA cache hit ratio
Subject: mm, swap: add swap_cluster_list
Subject: mm,oom_reaper: reduce find_lock_task_mm() usage
Subject: mm,oom_reaper: do not attempt to reap a task twice
Subject: oom: keep mm of the killed task available
Subject: kernel, oom: fix potential pgd_lock deadlock from __mmdrop
Subject: mm, oom: get rid of signal_struct::oom_victims
Subject: oom, suspend: fix oom_killer_disable vs. pm suspend properly
Subject: mm, oom: enforce exit_oom_victim on current task
Subject: mm: make sure that kthreads will not refault oom reaped memory
Subject: oom, oom_reaper: allow to reap mm shared by the kthreads
Subject: mm: use zonelist name instead of using hardcoded index
Subject: mm: introduce arch_reserved_kernel_pages()
Subject: mm/memblock.c: expose total reserved memory
Subject: powerpc: implement arch_reserved_kernel_pages
Subject: mm/nobootmem.c: remove duplicate macro ARCH_LOW_ADDRESS_LIMIT statements
Subject: mm/bootmem.c: replace kzalloc() by kzalloc_node()
Subject: mm: don't use radix tree writeback tags for pages in swap cache
Subject: oom: warn if we go OOM for higher order and compaction is disabled
Subject: mm: mlock: check against vma for actual mlock() size
Subject: mm: mlock: avoid increase mm->locked_vm on mlock() when already mlock2(,MLOCK_ONFAULT)
Subject: selftest: split mlock2_ funcs into separate mlock2.h
Subject: selftests/vm: add test for mlock() when areas are intersected
Subject: selftest: move seek_to_smaps_entry() out of mlock2-tests.c
Subject: selftests: expanding more mlock selftest
Subject: thp, dax: add thp_get_unmapped_area for pmd mappings
Subject: ext2/4, xfs: call thp_get_unmapped_area() for pmd mappings
Subject: cpu: fix node state for whether it contains CPU
Subject: fs/proc/task_mmu.c: make the task_mmu walk_page_range() limit in clear_refs_write() obvious
Subject: thp: reduce usage of huge zero page's atomic counter
Subject: mm/memcontrol.c: make the walk_page_range() limit obvious
Subject: memory-hotplug: fix store_mem_state() return value
Subject: mm: fix cache mode tracking in vm_insert_mixed()
Subject: mm, swap: use offset of swap entry as key of swap cache
Subject: mm: remove page_file_index
Subject: Revert "mm, oom: prevent premature OOM killer invocation for high order request"
Subject: mm, compaction: more reliably increase direct compaction priority
Subject: mm, compaction: restrict full priority to non-costly orders
Subject: mm, compaction: make full priority ignore pageblock suitability
Subject: mm, page_alloc: pull no_progress_loops update to should_reclaim_retry()
Subject: mm, compaction: ignore fragindex from compaction_zonelist_suitable()
Subject: mm, compaction: restrict fragindex to costly orders
Subject: mm: don't emit warning from pagefault_out_of_memory()
Subject: mm/page_io.c: replace some BUG_ON()s with VM_BUG_ON_PAGE()
Subject: mm: move phys_mem_access_prot_allowed() declaration to pgtable.h
Subject: mm: memcontrol: consolidate cgroup socket tracking
Subject: mm/shmem.c: constify anon_ops
Subject: mm: nobootmem: move the comment of free_all_bootmem
Subject: mm/hugetlb: fix memory offline with hugepage size > memory block size
Subject: mm/hugetlb: check for reserved hugepages during memory offline
Subject: mm/hugetlb: improve locking in dissolve_free_huge_pages()
Subject: mm/page_isolation: fix typo: "paes" -> "pages"
Subject: mm,ksm: add __GFP_HIGH to the allocation in alloc_stable_node()
Subject: mm: vm_page_prot: update with WRITE_ONCE/READ_ONCE
Subject: mm: vma_adjust: remove superfluous confusing update in remove_next == 1 case
Subject: mm: vma_merge: fix vm_page_prot SMP race condition against rmap_walk
Subject: mm: vma_adjust: remove superfluous check for next not NULL
Subject: mm: vma_adjust: minor comment correction
Subject: mm: vma_merge: correct false positive from __vma_unlink->validate_mm_rb
Subject: mm: clarify why we avoid page_mapcount() for slab pages in dump_page()
Subject: oom: print nodemask in the oom report
Subject: mm/hugetlb: introduce ARCH_HAS_GIGANTIC_PAGE
Subject: arm64 Kconfig: select gigantic page
Subject: vfs,mm: fix a dead loop in truncate_inode_pages_range()
Subject: mm: consolidate warn_alloc_failed users
Subject: mm: warn about allocations which stall for too long
Subject: mm: remove unnecessary condition in remove_inode_hugepages
Subject: linux/mm.h: canonicalize macro PAGE_ALIGNED() definition
Subject: ia64: implement atomic64_dec_if_positive
Subject: atomic64: no need for CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
Subject: proc: much faster /proc/vmstat
Subject: proc: faster /proc/*/status
Subject: seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char
Subject: meminfo: break apart a very long seq_printf with #ifdefs
Subject: proc: relax /proc/<tid>/timerslack_ns capability requirements
Subject: proc: add LSM hook checks to /proc/<tid>/timerslack_ns
Subject: proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self
Subject: mm, proc: fix region lost in /proc/self/smaps
Subject: Documentation/filesystems/proc.txt: add more description for maps/smaps
Subject: min/max: remove sparse warnings when they're nested
Subject: nmi_backtrace: add more trigger_*_cpu_backtrace() methods
Subject: nmi_backtrace: do a local dump_stack() instead of a self-NMI
Subject: arch/tile: adopt the new nmi_backtrace framework
Subject: nmi_backtrace: generate one-line reports for idle cpus
Subject: spelling.txt: "modeled" is spelt correctly
Subject: uprobes: remove function declarations from arch/{mips,s390}
Subject: .gitattributes: set git diff driver for C source code files
Subject: mailmap: add Johan Hovold
Subject: CREDITS: update Pavel's information, add GPG key, remove snail mail address
Subject: cred: simpler, 1D supplementary groups
Subject: console: don't prefer first registered if DT specifies stdout-path
- a few block updates that fell in my lap
- lib/ updates
- checkpatch
- autofs
- ipc
- A ton of misc other things
102 patches, based on 1689c73a739d094b544c680b0dfdebe52ffee8fb:
Subject: ocfs2: fix memory leak in dlm_migrate_request_handler()
Subject: block: invalidate the page cache when issuing BLKZEROOUT
Subject: block: require write_same and discard requests align to logical block size
Subject: block: implement (some of) fallocate for block devices
Subject: fs/select: add vmalloc fallback for select(2)
Subject: radix-tree: 'slot' can be NULL in radix_tree_next_slot()
Subject: radix-tree tests: add iteration test
Subject: radix-tree tests: properly initialize mutex
Subject: lib: harden strncpy_from_user
Subject: include/linux/ctype.h: make isdigit() table lookupless
Subject: lib/kstrtox.c: smaller _parse_integer()
Subject: lib/bitmap.c: enhance bitmap syntax
Subject: include/linux: provide a safe version of container_of()
Subject: llist: introduce llist_entry_safe()
Subject: checkpatch: see if modified files are marked obsolete in MAINTAINERS
Subject: checkpatch: look for symbolic permissions and suggest octal instead
Subject: checkpatch: test multiple line block comment alignment
Subject: checkpatch: don't test for prefer ether_addr_<foo>
Subject: checkpatch: externalize the structs that should be const
Subject: const_structs.checkpatch: add frequently used from Julia Lawall's list
Subject: checkpatch: speed up checking for filenames in sections marked obsolete
Subject: checkpatch: improve the block comment * alignment test
Subject: checkpatch: add --strict test for macro argument reuse
Subject: checkpatch: add --strict test for precedence challenged macro arguments
Subject: checkpatch: improve MACRO_ARG_PRECEDENCE test
Subject: checkpatch: add warning for unnamed function definition arguments
Subject: checkpatch: improve the octal permissions tests
Subject: kprobes: include <asm/sections.h> instead of <asm-generic/sections.h>
Subject: autofs: fix typos in Documentation/filesystems/autofs4.txt
Subject: autofs: drop unnecessary extern in autofs_i.h
Subject: autofs: test autofs versions first on sb initialization
Subject: autofs: fix autofs4_fill_super() error exit handling
Subject: autofs: add WARN_ON(1) for non dir/link inode case
Subject: autofs: remove ino free in autofs4_dir_symlink()
Subject: autofs: use autofs4_free_ino() to kfree dentry data
Subject: autofs: remove obsolete sb fields
Subject: autofs: don't fail to free_dev_ioctl(param)
Subject: autofs: remove AUTOFS_DEVID_LEN
Subject: autofs: fix Documentation regarding devid on ioctl
Subject: autofs: update struct autofs_dev_ioctl in Documentation
Subject: autofs: fix pr_debug() message
Subject: autofs: fix dev ioctl number range check
Subject: autofs: add autofs_dev_ioctl_version() for AUTOFS_DEV_IOCTL_VERSION_CMD
Subject: autofs: fix print format for ioctl warning message
Subject: autofs: move inclusion of linux/limits.h to uapi
Subject: autofs4: move linux/auto_dev-ioctl.h to uapi/linux
Subject: autofs: remove possibly misleading /* #define DEBUG */
Subject: autofs: refactor ioctl fn vector in iookup_dev_ioctl()
Subject: pipe: relocate round_pipe_size() above pipe_set_size()
Subject: pipe: move limit checking logic into pipe_set_size()
Subject: pipe: refactor argument for account_pipe_buffers()
Subject: pipe: fix limit checking in pipe_set_size()
Subject: pipe: simplify logic in alloc_pipe_info()
Subject: pipe: fix limit checking in alloc_pipe_info()
Subject: pipe: make account_pipe_buffers() return a value, and use it
Subject: pipe: cap initial pipe capacity according to pipe-max-size limit
Subject: ptrace: clear TIF_SYSCALL_TRACE on ptrace detach
Subject: rapidio/rio_cm: use memdup_user() instead of duplicating code
Subject: random: simplify API for random address requests
Subject: x86: use simpler API for random address requests
Subject: ARM: use simpler API for random address requests
Subject: arm64: use simpler API for random address requests
Subject: tile: use simpler API for random address requests
Subject: unicore32: use simpler API for random address requests
Subject: random: remove unused randomize_range()
Subject: dma-mapping: introduce the DMA_ATTR_NO_WARN attribute
Subject: powerpc: implement the DMA_ATTR_NO_WARN attribute
Subject: nvme: use the DMA_ATTR_NO_WARN attribute
Subject: x86/panic: replace smp_send_stop() with kdump friendly version in panic path
Subject: mips/panic: replace smp_send_stop() with kdump friendly version in panic path
Subject: pps: kc: fix non-tickless system config dependency
Subject: relay: Use irq_work instead of plain timer for deferred wakeup
Subject: config/android: Remove CONFIG_IPV6_PRIVACY
Subject: config: android: move device mapper options to recommended
Subject: config: android: set SELinux as default security mode
Subject: config: android: enable CONFIG_SECCOMP
Subject: kcov: do not instrument lib/stackdepot.c
Subject: ipc/sem.c: fix complex_count vs. simple op race
Subject: ipc/msg: implement lockless pipelined wakeups
Subject: ipc/msg: batch queue sender wakeups
Subject: ipc/msg: make ss_wakeup() kill arg boolean
Subject: ipc/msg: avoid waking sender upon full queue
Subject: ipc/sem.c: Add cond_resched in exit_sme
Subject: kdump, vmcoreinfo: report memory sections virtual addresses
Subject: mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping
Subject: scripts/tags.sh: enable code completion in VIM
Subject: kthread: rename probe_kthread_data() to kthread_probe_data()
Subject: kthread: kthread worker API cleanup
Subject: kthread/smpboot: do not park in kthread_create_on_cpu()
Subject: kthread: allow to call __kthread_create_on_node() with va_list args
Subject: kthread: add kthread_create_worker*()
Subject: kthread: add kthread_destroy_worker()
Subject: kthread: detect when a kthread work is used by more workers
Subject: kthread: initial support for delayed kthread work
Subject: kthread: allow to cancel kthread work
Subject: kthread: allow to modify delayed kthread work
Subject: kthread: better support freezable kthread workers
Subject: kthread: add kerneldoc for kthread_create()
Subject: hung_task: allow hung_task_panic when hung_task_warnings is 0
Subject: treewide: remove redundant #include <linux/kconfig.h>
Subject: fs: use mapping_set_error instead of opencoded set_bit
Subject: mm: split gfp_mask and mapping flags into separate fields
15 fixes, based on 27bcd37e0240bbe33f0efe244b5aad52104115b3: Subject: mm: remove extra newline from allocation stall warning Subject: mm, frontswap: make sure allocated frontswap map is assigned Subject: shmem: fix pageflags after swapping DMA32 object Subject: scripts/bloat-o-meter: fix SIGPIPE Subject: mm/cma.c: check the max limit for cma allocation Subject: swapfile: fix memory corruption via malformed swapfile Subject: mm: hwpoison: fix thp split handling in memory_failure() Subject: Revert "console: don't prefer first registered if DT specifies stdout-path" Subject: ocfs2: fix not enough credit panic Subject: mm/hugetlb: fix huge page reservation leak in private mapping error paths Subject: mm/filemap: don't allow partially uptodate page for pipes Subject: coredump: fix unfreezable coredumping task Subject: memcg: prevent memcg caches to be both OFF_SLAB & OBJFREELIST_SLAB Subject: mm: kmemleak: scan .data.ro_after_init Subject: lib/stackdepot: export save/fetch stack for drivers
7 fixes, based on ded6e842cf499ef04b0d611d92b859d5b846c497: Subject: mm, thp: propagation of conditional compilation in khugepaged.c Subject: thp: fix corner case of munlock() of PTE-mapped THPs Subject: zram: fix unbalanced idr management at hot removal Subject: lib/debugobjects: export for use in modules Subject: kasan: update kasan_global for gcc 7 Subject: kasan: support use-after-scope detection Subject: mm: fix false-positive WARN_ON() in truncate/invalidate for hugetlb
2 fixes, based on 8dc0f265d39a3933f4c1f846c7c694f12a2ab88a: Subject: mm: workingset: fix NULL ptr in count_shadow_nodes Subject: mm, vmscan: add cond_resched() into shrink_node_memcg()
3 fixes, based on ea5a9eff96fed8252f3a8c94a84959f981a93cae: Subject: zram: restrict add/remove attributes to root only Subject: radix tree test suite: fix compilation Subject: kcov: add missing #include <linux/sched.h>
- various misc bits - most of MM (quite a lot of MM material is awaiting the merge of linux-next dependencies) - kasan - printk updates - procfs updates - MAINTAINERS - /lib updates - checkpatch updates 123 patches, based on df5f0f0a028c9bf43949398a175dbaafaf513e14: Subject: kthread: add __printf attributes Subject: prctl: remove one-shot limitation for changing exe link Subject: scripts/bloat-o-meter: don't use readlines() Subject: scripts/bloat-o-meter: compile .NUMBER regex Subject: scripts/tags.sh: handle OMAP platforms properly Subject: m32r: add simple dma Subject: m32r: fix build warning Subject: drivers/pcmcia/m32r_pcc.c: check return from request_irq Subject: drivers/pcmcia/m32r_pcc.c: use common error path Subject: drivers/pcmcia/m32r_pcc.c: check return from add_pcc_socket Subject: ocfs2/dlm: clean up useless BUG_ON default case in dlm_finalize_reco_handler() Subject: ocfs2: delete redundant code and set the node bit into maybe_map directly Subject: ocfs2/dlm: clean up deadcode in dlm_master_request_handler() Subject: ocfs2: clean up unused 'page' parameter in ocfs2_write_end_nolock() Subject: ocfs2: fix double put of recount tree in ocfs2_lock_refcount_tree() Subject: ocfs2: use time64_t to represent orphan scan times Subject: ocfs2: replace CURRENT_TIME macro Subject: mm: memcontrol: use special workqueue for creating per-memcg caches Subject: slub: move synchronize_sched out of slab_mutex on shrink Subject: slub: avoid false-postive warning Subject: mm/slab_common.c: check kmem_create_cache flags are common Subject: mm, slab: faster active and free stats Subject: mm, slab: maintain total slab count instead of active count Subject: mm/mprotect.c: don't touch single threaded PTEs which are on the right node Subject: mm/vmscan.c: set correct defer count for shrinker Subject: mm/gup.c: make unnecessarily global vma_permits_fault() static Subject: mm/hugetlb.c: use the right pte val for compare in hugetlb_cow Subject: mm/hugetlb.c: use huge_pte_lock 
instead of opencoding the lock Subject: kmemleak: fix reference to Documentation Subject: mm: don't steal highatomic pageblock Subject: mm: prevent double decrease of nr_reserved_highatomic Subject: mm: try to exhaust highatomic reserve before the OOM Subject: mm: make unreserve highatomic functions reliable Subject: mm/vmalloc.c: simplify /proc/vmallocinfo implementation Subject: mm, thp: avoid unlikely branches for split_huge_pmd Subject: mm, mempolicy: clean up __GFP_THISNODE confusion in policy_zonelist Subject: mm, compaction: fix NR_ISOLATED_* stats for pfn based migration Subject: shmem: avoid maybe-uninitialized warning Subject: mm: use the correct page size when removing the page Subject: mm: update mmu_gather range correctly Subject: mm/hugetlb: add tlb_remove_hugetlb_entry for handling hugetlb pages Subject: mm: add tlb_remove_check_page_size_change to track page size change Subject: mm: remove the page size change check in tlb_remove_page Subject: mm: fix up get_user_pages* comments Subject: mm/mempolicy.c: forbid static or relative flags for local NUMA mode Subject: powerpc/mm: allow memory hotplug into a memoryless node Subject: mm: remove x86-only restriction of movable_node Subject: mm: enable CONFIG_MOVABLE_NODE on non-x86 arches Subject: of/fdt: mark hotpluggable memory Subject: dt: add documentation of "hotpluggable" memory property Subject: mm/pkeys: generate pkey system call code only if ARCH_HAS_PKEYS is selected Subject: mm: disable numa migration faults for dax vmas Subject: mm: cma: make linux/cma.h standalone includible Subject: mm/filemap.c: add comment for confusing logic in page_cache_tree_insert() Subject: fs/fs-writeback.c: remove redundant if check Subject: shmem: fix compilation warnings on unused functions Subject: mm: don't cap request size based on read-ahead setting Subject: include/linux/backing-dev-defs.h: shrink struct backing_dev_info Subject: mm: khugepaged: close use-after-free race during shmem collapsing Subject: mm: 
khugepaged: fix radix tree node leak in shmem collapse error path Subject: mm: workingset: turn shadow node shrinker bugs into warnings Subject: lib: radix-tree: native accounting of exceptional entries Subject: lib: radix-tree: check accounting of existing slot replacement users Subject: lib: radix-tree: add entry deletion support to __radix_tree_replace() Subject: lib: radix-tree: update callback for changing leaf nodes Subject: mm: workingset: move shadow entry tracking to radix tree exceptional tracking Subject: mm: workingset: restore refault tracking for single-page files Subject: mm: workingset: update shadow limit to reflect bigger active list Subject: mm: remove free_unmap_vmap_area_noflush() Subject: mm: remove free_unmap_vmap_area_addr() Subject: mm: refactor __purge_vmap_area_lazy() Subject: mm: add vfree_atomic() Subject: kernel/fork: use vfree_atomic() to free thread stack Subject: x86/ldt: use vfree_atomic() to free ldt entries Subject: mm: mark all calls into the vmalloc subsystem as potentially sleeping Subject: mm: turn vmap_purge_lock into a mutex Subject: mm: add preempt points into __purge_vmap_area_lazy() Subject: mm: move vma_is_anonymous check within pmd_move_must_withdraw Subject: mm: THP page cache support for ppc64 Subject: mm, debug: print raw struct page data in __dump_page() Subject: mm, rmap: handle anon_vma_prepare() common case inline Subject: mm, page_alloc: keep pcp count and list contents in sync if struct page is corrupted Subject: mm: add three more cond_resched() in swapoff Subject: mm: add cond_resched() in gather_pte_stats() Subject: mm: make transparent hugepage size public Subject: kasan: support panic_on_warn Subject: kasan: eliminate long stalls during quarantine reduction Subject: kasan: turn on -fsanitize-address-use-after-scope Subject: mm/percpu.c: fix panic triggered by BUG_ON() falsely Subject: proc: report no_new_privs state Subject: proc: make struct pid_entry::len unsigned Subject: proc: make struct struct 
map_files_info::len unsigned int Subject: proc: just list_del() struct pde_opener Subject: proc: fix type of struct pde_opener::closing field Subject: proc: kmalloc struct pde_opener Subject: proc: tweak comments about 2 stage open and everything Subject: fs/proc/array.c: slightly improve render_sigset_t Subject: fs/proc/base.c: save decrement during lookup/readdir in /proc/$PID Subject: fs/proc: calculate /proc/* and /proc/*/task/* nlink at init time Subject: hung_task: decrement sysctl_hung_task_warnings only if it is positive Subject: compiler-gcc.h: use "proved" instead of "proofed" Subject: printk/NMI: fix up handling of the full nmi log buffer Subject: printk/NMI: handle continuous lines and missing newline Subject: printk/kdb: handle more message headers Subject: printk/btrfs: handle more message headers Subject: printk/sound: handle more message headers Subject: printk: add Kconfig option to set default console loglevel Subject: get_maintainer: look for arbitrary letter prefixes in sections Subject: MAINTAINERS: add "B:" for URI where to file bugs Subject: MAINTAINERS: add drm and drm/i915 bug filing info Subject: MAINTAINERS: add "C:" for URI for chat where developers hang out Subject: MAINTAINERS: add drm and drm/i915 irc channels Subject: lib/Kconfig.debug: make CONFIG_STRICT_DEVMEM depend on CONFIG_DEVMEM Subject: lib/rbtree.c: fix typo in comment of ____rb_erase_color Subject: lib/ida: document locking requirements a bit better Subject: checkpatch: don't try to get maintained status when --no-tree is given Subject: scripts/checkpatch.pl: fix spelling Subject: checkpatch: don't check .pl files, improve absolute path commit log test Subject: checkpatch: avoid multiple line dereferences Subject: checkpatch: don't check c99 types like uint8_t under tools Subject: checkpatch: don't emit unified-diff error for rename-only patches Subject: binfmt_elf: use vmalloc() for allocation of vma_filesz Subject: init: reduce rootwait polling interval time to 5ms
- a few misc things - kexec updates - DMA-mapping updates to better support networking DMA operations - IPC updates - various MM changes to improve DAX fault handling - lots of radix-tree changes, mainly to the test suite. All leading up to reimplementing the IDA/IDR code to be a wrapper layer over the radix-tree. However the final trigger-pulling patch is held off for 4.11. 114 patches, based on 775a2e29c3bbcf853432f47d3caa9ff8808807ad: Subject: btrfs: better handle btrfs_printk() defaults Subject: kernel/watchdog: use nmi registers snapshot in hardlockup handler Subject: mm, compaction: allow compaction for GFP_NOFS requests Subject: signals: avoid unnecessary taking of sighand->siglock Subject: coredump: clarify "unsafe core_pattern" warning Subject: Revert "kdump, vmcoreinfo: report memory sections virtual addresses" Subject: kexec: export the value of phys_base instead of symbol address Subject: kexec: add cond_resched into kimage_alloc_crash_control_pages Subject: sysctl: add KERN_CONT to deprecated_sysctl_warning() Subject: arch/arc: add option to skip sync on DMA mapping Subject: arch/arm: add option to skip sync on DMA map and unmap Subject: arch/avr32: add option to skip sync on DMA map Subject: arch/blackfin: add option to skip sync on DMA map Subject: arch/c6x: add option to skip sync on DMA map and unmap Subject: arch/frv: add option to skip sync on DMA map Subject: arch/hexagon: Add option to skip DMA sync as a part of mapping Subject: arch/m68k: add option to skip DMA sync as a part of mapping Subject: arch/metag: add option to skip DMA sync as a part of map and unmap Subject: arch/microblaze: add option to skip DMA sync as a part of map and unmap Subject: arch/mips: add option to skip DMA sync as a part of map and unmap Subject: arch/nios2: add option to skip DMA sync as a part of map and unmap Subject: arch/openrisc: add option to skip DMA sync as a part of mapping Subject: arch/parisc: add option to skip DMA sync as a part of map and unmap 
Subject: arch/powerpc: add option to skip DMA sync as a part of mapping Subject: arch/sh: add option to skip DMA sync as a part of mapping Subject: arch/sparc: add option to skip DMA sync as a part of map and unmap Subject: arch/tile: add option to skip DMA sync as a part of map and unmap Subject: arch/xtensa: add option to skip DMA sync as a part of mapping Subject: dma: add calls for dma_map_page_attrs and dma_unmap_page_attrs Subject: mm: add support for releasing multiple instances of a page Subject: igb: update driver to make use of DMA_ATTR_SKIP_CPU_SYNC Subject: igb: update code to better handle incrementing page count Subject: relay: check array offset before using it Subject: Kconfig: lib/Kconfig.debug: fix references to Documenation Subject: Kconfig: lib/Kconfig.ubsan fix reference to ubsan documentation Subject: kcov: add more missing includes Subject: kernel/debug/debug_core.c: more properly delay for secondary CPUs Subject: kdb: remove unused kdb_event handling Subject: kdb: properly synchronize vkdb_printf() calls with other CPUs Subject: kdb: call vkdb_printf() from vprintk_default() only when wanted Subject: initramfs: select builtin initram compression algorithm on KConfig instead of Makefile Subject: initramfs: allow again choice of the embedded initram compression algorithm Subject: ipc: msg, make msgrcv work with LONG_MIN Subject: ipc/shm.c: coding style fixes Subject: posix-timers: give lazy compilers some help optimizing code away Subject: drivers/net/wireless/intel/iwlwifi/dvm/calib.c: simplfy min() expression Subject: ktest.pl: fix english Subject: kernel/watchdog.c: move shared definitions to nmi.h Subject: kernel/watchdog.c: move hardlockup detector to separate file Subject: sparc: implement watchdog_nmi_enable and watchdog_nmi_disable Subject: ipc/sem: do not call wake_sem_queue_do() prematurely Subject: ipc/sem: rework task wakeups Subject: ipc/sem: optimize perform_atomic_semop() Subject: ipc/sem: explicitly inline check_restart 
Subject: ipc/sem: use proper list api for pending_list wakeups Subject: ipc/sem: simplify wait-wake loop Subject: ipc/sem: avoid idr tree lookup for interrupted semop Subject: mm: add locked parameter to get_user_pages_remote() Subject: mm: unexport __get_user_pages_unlocked() Subject: mm: join struct fault_env and vm_fault Subject: mm: use vmf->address instead of of vmf->virtual_address Subject: mm: use pgoff in struct vm_fault instead of passing it separately Subject: mm: use passed vm_fault structure in __do_fault() Subject: mm: trim __do_fault() arguments Subject: mm: use passed vm_fault structure for in wp_pfn_shared() Subject: mm: add orig_pte field into vm_fault Subject: mm: allow full handling of COW faults in ->fault handlers Subject: mm: factor out functionality to finish page faults Subject: mm: move handling of COW faults into DAX code Subject: mm: factor out common parts of write fault handling Subject: mm: pass vm_fault structure into do_page_mkwrite() Subject: mm: use vmf->page during WP faults Subject: mm: move part of wp_page_reuse() into the single call site Subject: mm: provide helper for finishing mkwrite faults Subject: mm: change return values of finish_mkwrite_fault() Subject: mm: export follow_pte() Subject: dax: make cache flushing protected by entry lock Subject: dax: protect PTE modification on WP fault by radix tree entry lock Subject: dax: clear dirty entry tags on cache flush Subject: tools: add WARN_ON_ONCE Subject: radix tree test suite: allow GFP_ATOMIC allocations to fail Subject: radix tree test suite: track preempt_count Subject: radix tree test suite: free preallocated nodes Subject: radix tree test suite: make runs more reproducible Subject: radix tree test suite: iteration test misuses RCU Subject: radix tree test suite: benchmark for iterator Subject: radix tree test suite: use rcu_barrier Subject: radix tree test suite: handle exceptional entries Subject: radix tree test suite: record order in each item Subject: tools: add 
more bitmap functions Subject: radix tree test suite: use common find-bit code Subject: radix-tree: fix typo Subject: radix-tree: move rcu_head into a union with private_list Subject: radix-tree: create node_tag_set() Subject: radix-tree: make radix_tree_find_next_bit more useful Subject: radix-tree: improve dump output Subject: btrfs: fix race in btrfs_free_dummy_fs_info() Subject: radix-tree: improve multiorder iterators Subject: radix-tree: delete radix_tree_locate_item() Subject: radix-tree: delete radix_tree_range_tag_if_tagged() Subject: radix-tree: add radix_tree_join Subject: radix-tree: add radix_tree_split Subject: radix-tree: add radix_tree_split_preload() Subject: radix-tree: fix replacement for multiorder entries Subject: radix tree test suite: check multiorder iteration Subject: idr: add ida_is_empty Subject: tpm: use idr_find(), not idr_find_slowpath() Subject: rxrpc: abstract away knowledge of IDR internals Subject: idr: reduce the number of bits per level from 8 to 6 Subject: radix tree test suite: add some more functionality Subject: radix tree test suite: cache recently freed objects Subject: radix-tree: ensure counts are initialised Subject: radix tree test suite: add new tag check Subject: radix tree test suite: delete unused rcupdate.c
- a series to make IMA play better across kexec - a handful of random fixes 15 patches, based on e93b1cc8a8965da137ffea0b88e5f62fa1d2a9e6: Subject: powerpc: ima: get the kexec buffer passed by the previous kernel Subject: ima: on soft reboot, restore the measurement list Subject: ima: permit duplicate measurement list entries Subject: ima: maintain memory size needed for serializing the measurement list Subject: powerpc: ima: send the kexec buffer to the next kernel Subject: ima: on soft reboot, save the measurement list Subject: ima: store the builtin/custom template definitions in a list Subject: ima: support restoring multiple template formats Subject: ima: define a canonical binary_runtime_measurements list format Subject: ima: platform-independent hash value Subject: mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED Subject: arm64: setup: introduce kaslr_offset() Subject: kcov: make kcov work properly with KASLR enabled Subject: ratelimit: fix WARN_ON_RATELIMIT return value Subject: printk: fix typo in CONSOLE_LOGLEVEL_DEFAULT help text
27 fixes, based on bd5d7428f5e50cc10b98cf0abc13ccac391e1e33: The three patches "mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free", "mm: rename __page_frag functions to __page_frag_cache, drop order from drain" and "mm: add documentation for page fragment APIs" aren't actually fixes. They're simple function renamings which are nice-to-have in mainline, as ongoing net development depends on them. Subject: MAINTAINERS: remove duplicate bug filling description Subject: dax: fix deadlock with DAX 4k holes Subject: mm/thp/pagecache/collapse: free the pte page table on collapse for thp page cache. Subject: mm: add follow_pte_pmd() Subject: dax: wrprotect pmd_t in dax_mapping_entry_mkclean Subject: mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER Subject: bpf: do not use KMALLOC_SHIFT_MAX Subject: ocfs2: fix crash caused by stale lvb with fsdlm plugin Subject: mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done} Subject: mm: fix remote numa hits statistics Subject: mm: get rid of __GFP_OTHER_NODE Subject: lib/Kconfig.debug: fix frv build failure Subject: ipc/sem.c: fix incorrect sem_lock pairing Subject: mm: pmd dirty emulation in page fault handler Subject: signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
Subject: mailmap: add codeaurora.org names for nameless email commits Subject: mm: don't dereference struct page fields of invalid pages Subject: mm, memcg: fix the active list aging for lowmem requests when memcg is enabled Subject: mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free Subject: mm: rename __page_frag functions to __page_frag_cache, drop order from drain Subject: mm: add documentation for page fragment APIs Subject: mm: support anonymous stable page Subject: zram: revalidate disk under init_lock Subject: zram: support BDI_CAP_STABLE_WRITES Subject: mm/slab.c: fix SLAB freelist randomization duplicate entries Subject: mm/hugetlb.c: fix reservation race when freeing surplus pages Subject: timerfd: export defines to userspace
26 fixes, based on a4685d2f58e2230d4e27fb2ee581d7ea35e5d046: Subject: memory_hotplug: make zone_can_shift() return a boolean value Subject: mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp Subject: dax: fix build warnings with FS_DAX and !FS_IOMAP Subject: kernel/watchdog: prevent false hardlockup on overloaded system Subject: drivers/memstick/core/memstick.c: avoid -Wnonnull warning Subject: userfaultfd: fix SIGBUS resulting from false rwsem wakeups Subject: mm/slub.c: trace free objects at KERN_INFO Subject: mm: alloc_contig: re-allow CMA to compact FS pages Subject: proc: add a schedule point in proc_pid_readdir() Subject: mm, memcg: do not retry precharge charges Subject: Documentation/filesystems/proc.txt: add VmPin Subject: radix-tree: fix private list warnings Subject: mm/mempolicy.c: do not put mempolicy before using its nodemask Subject: frv: add atomic64_add_unless() Subject: fbdev: color map copying bounds checking Subject: kernel/panic.c: add missing \n Subject: mm, page_alloc: fix check for NULL preferred_zone Subject: mm, page_alloc: fix fast-path race with cpuset update or removal Subject: mm, page_alloc: move cpuset seqcount checking to slowpath Subject: mm, page_alloc: fix premature OOM when racing with cpuset mems update Subject: frv: add missing atomic64 operations Subject: romfs: use different way to generate fsid for BLOCK or MTD Subject: mn10300: fix build error of missing fpu_save() Subject: mm: do not export ioremap_page_range symbol for external module Subject: MAINTAINERS: add Dan Streetman to zswap maintainers Subject: MAINTAINERS: add Dan Streetman to zbud maintainers
4 fixes, based on 926af6273fc683cd98cd0ce7bf0d04a02eed6742: Subject: kernel/ucount.c: mark user_header with kmemleak_ignore() Subject: mm: avoid returning VM_FAULT_RETRY from ->page_mkwrite handlers Subject: cpumask: use nr_cpumask_bits for parsing functions Subject: mm/slub.c: fix random_seq offset destruction
1 fix, based on 2fe1e8a7b2f4dcac3fcb07ff06b0ae7396201fd6: Subject: printk: use rcuidle console tracepoint
142 patches, based on 37c85961c3f87f2141c84e53df31e59db072fd2e: - DAX updates - various misc bits - OCFS2 updates - most of MM Subject: tracing: add __print_flags_u64() Subject: dax: add tracepoint infrastructure, PMD tracing Subject: dax: update MAINTAINERS entries for FS DAX Subject: dax: add tracepoints to dax_pmd_load_hole() Subject: dax: add tracepoints to dax_pmd_insert_mapping() Subject: mm, dax: make pmd_fault() and friends be the same as fault() Subject: mm, dax: change pmd_fault() to take only vmf parameter Subject: dma-debug: add comment for failed to check map error Subject: tools/vm: add missing Makefile rules Subject: scripts/spelling.txt: add several more common spelling mistakes Subject: scripts/spelling.txt: fix incorrect typo-words Subject: scripts/Lindent: clean up and optimize Subject: scripts/checkstack.pl: add support for nios2 Subject: scripts/checkincludes.pl: add exit message for no duplicates found Subject: scripts/tags.sh: include arch/Kconfig* for tags generation Subject: m32r: use generic current.h Subject: m32r: fix build warning Subject: score: remove asm/current.h Subject: ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock Subject: ocfs2: fix deadlock issue when taking inode lock at vfs entry points Subject: parisc: use generic current.h Subject: block: use for_each_thread() in sys_ioprio_set()/sys_ioprio_get() Subject: 9p: fix a potential acl leak Subject: kernel/watchdog.c: do not hardcode CPU 0 as the initial thread Subject: slub: do not merge cache if slub_debug contains a never-merge flag Subject: mm/slub: add a dump_stack() to the unexpected GFP check Subject: mm, slab: rename kmalloc-node cache to kmalloc-<size> Subject: Revert "slub: move synchronize_sched out of slab_mutex on shrink" Subject: slub: separate out sysfs_slab_release() from sysfs_slab_remove() Subject: slab: remove synchronous rcu_barrier() call in memcg cache release path Subject: slab: reorganize memcg_cache_params Subject: slab: link memcg 
kmem_caches on their associated memory cgroup Subject: slab: implement slab_root_caches list Subject: slab: introduce __kmemcg_cache_deactivate() Subject: slab: remove synchronous synchronize_sched() from memcg cache deactivation path Subject: slab: remove slub sysfs interface files early for empty memcg caches Subject: slab: use memcg_kmem_cache_wq for slab destruction operations Subject: slub: make sysfs directories for memcg sub-caches optional Subject: tmpfs: change shmem_mapping() to test shmem_aops Subject: mm: throttle show_mem() from warn_alloc() Subject: mm, page_alloc: don't convert pfn to idx when merging Subject: mm, page_alloc: avoid page_to_pfn() when merging buddies Subject: mm/vmalloc.c: use rb_entry_safe Subject: mm, trace: extract COMPACTION_STATUS and ZONE_TYPE to a common header Subject: oom, trace: add oom detection tracepoints Subject: oom, trace: add compaction retry tracepoint Subject: userfaultfd: document _IOR/_IOW Subject: userfaultfd: correct comment about UFFD_FEATURE_PAGEFAULT_FLAG_WP Subject: userfaultfd: convert BUG() to WARN_ON_ONCE() Subject: userfaultfd: use vma_is_anonymous Subject: userfaultfd: non-cooperative: Split the find_userfault() routine Subject: userfaultfd: non-cooperative: add ability to report non-PF events from uffd descriptor Subject: userfaultfd: non-cooperative: report all available features to userland Subject: userfaultfd: non-cooperative: Add fork() event Subject: userfaultfd: non-cooperative: dup_userfaultfd: use mm_count instead of mm_users Subject: userfaultfd: non-cooperative: add mremap() event Subject: userfaultfd: non-cooperative: optimize mremap_userfaultfd_complete() Subject: userfaultfd: non-cooperative: add madvise() event for MADV_DONTNEED request Subject: userfaultfd: non-cooperative: avoid MADV_DONTNEED race condition Subject: userfaultfd: non-cooperative: wake userfaults after UFFDIO_UNREGISTER Subject: userfaultfd: hugetlbfs: add copy_huge_page_from_user for hugetlb userfaultfd support Subject: 
userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support Subject: userfaultfd: hugetlbfs: add __mcopy_atomic_hugetlb for huge page UFFDIO_COPY Subject: userfaultfd: hugetlbfs: fix __mcopy_atomic_hugetlb retry/error processing Subject: userfaultfd: hugetlbfs: add userfaultfd hugetlb hook Subject: userfaultfd: hugetlbfs: allow registration of ranges containing huge pages Subject: userfaultfd: hugetlbfs: add userfaultfd_hugetlb test Subject: userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges Subject: userfaultfd: hugetlbfs: gup: support VM_FAULT_RETRY Subject: userfaultfd: hugetlbfs: reserve count on error in __mcopy_atomic_hugetlb Subject: userfaultfd: hugetlbfs: UFFD_FEATURE_MISSING_HUGETLBFS Subject: userfaultfd: introduce vma_can_userfault Subject: userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support Subject: userfaultfd: shmem: introduce vma_is_shmem Subject: userfaultfd: shmem: add tlbflush.h header for microblaze Subject: userfaultfd: shmem: use shmem_mcopy_atomic_pte for shared memory Subject: userfaultfd: shmem: add userfaultfd hook for shared memory faults Subject: userfaultfd: shmem: allow registration of shared memory ranges Subject: userfaultfd: hugetlbfs: add UFFDIO_COPY support for shared mappings Subject: userfaultfd: shmem: add userfaultfd_shmem test Subject: userfaultfd: shmem: lock the page before adding it to pagecache Subject: userfaultfd: shmem: avoid a lockup resulting from corrupted page->flags Subject: userfaultfd: shmem: avoid leaking blocks and used blocks in UFFDIO_COPY Subject: userfaultfd: hugetlbfs: UFFD_FEATURE_MISSING_SHMEM Subject: userfaultfd: non-cooperative: selftest: introduce userfaultfd_open Subject: userfaultfd: non-cooperative: selftest: add ufd parameter to copy_page Subject: userfaultfd: non-cooperative: selftest: add test for FORK, MADVDONTNEED and REMAP events Subject: userfaultfd: selftest: test UFFDIO_ZEROPAGE on all memory types Subject: mm: mprotect: use 
pmd_trans_unstable instead of taking the pmd_lock Subject: mm, vmscan: remove unused mm_vmscan_memcg_isolate Subject: mm, vmscan: add active list aging tracepoint Subject: mm, vmscan: show the number of skipped pages in mm_vmscan_lru_isolate Subject: mm, vmscan: show LRU name in mm_vmscan_lru_isolate tracepoint Subject: mm, vmscan: extract shrink_page_list reclaim counters into a struct Subject: mm, vmscan: enhance mm_vmscan_lru_shrink_inactive tracepoint Subject: mm, vmscan: add mm_vmscan_inactive_list_is_low tracepoint Subject: trace-vmscan-postprocess: sync with tracepoints updates Subject: nfs: no PG_private waiters remain, remove waker Subject: mm: un-export wake_up_page functions Subject: mm: fix filemap.c kernel-doc warnings Subject: mm/mmzone.c: swap likely to unlikely as code logic is different for next_zones_zonelist() Subject: mm, compaction: add vmstats for kcompactd work Subject: mm: page_alloc: skip over regions of invalid pfns where possible Subject: mm,compaction: serialize waitqueue_active() checks Subject: mm/bootmem.c: cosmetic improvement of code readability Subject: mm: fix some typos in mm/zsmalloc.c Subject: mm/memblock.c: trivial code refine in memblock_is_region_memory() Subject: mm/memblock.c: check return value of memblock_reserve() in memblock_virt_alloc_internal() Subject: mm/sparse: use page_private() to get page->private value Subject: mm/memory_hotplug: set magic number to page->freelist instead of page->lru.next Subject: powerpc: do not make the entire heap executable Subject: mm/swap: fix kernel message in swap_info_get() Subject: mm/swap: add cluster lock Subject: mm/swap: split swap cache into 64MB trunks Subject: mm/swap: skip readahead for unreferenced swap slots Subject: mm/swap: allocate swap slots in batches Subject: mm/swap: free swap slots in batch Subject: mm/swap: add cache for swap slots allocation Subject: mm/swap: enable swap slots cache usage Subject: mm/swap: skip readahead only when swap slot cache is enabled 
Subject: mm, thp: add new defer+madvise defrag option Subject: mm/backing-dev.c: use rb_entry() Subject: mm, vmscan: do not count freed pages as PGDEACTIVATE Subject: mm, vmscan: cleanup lru size claculations Subject: mm, vmscan: consider eligible zones in get_scan_count Subject: Revert "mm: bail out in shrink_inactive_list()" Subject: mm, page_alloc: do not report all nodes in show_mem Subject: mm, page_alloc: warn_alloc print nodemask Subject: arch, mm: remove arch specific show_mem Subject: lib/show_mem.c: teach show_mem to work with the given nodemask Subject: mm: consolidate GFP_NOFAIL checks in the allocator slowpath Subject: mm, oom: do not enforce OOM killer for __GFP_NOFAIL automatically Subject: mm: help __GFP_NOFAIL allocations which do not trigger OOM killer Subject: mm, page_alloc: warn_alloc nodemask is NULL when cpusets are disabled Subject: mm: drop zap_details::ignore_dirty Subject: mm: drop zap_details::check_swap_entries Subject: mm: drop unused argument of zap_page_range() Subject: oom-reaper: use madvise_dontneed() logic to decide if unmap the VMA Subject: mm/memblock.c: remove unnecessary log and clean up Subject: zram: remove obsolete sysfs attrs Subject: mm: fix <linux/pagemap.h> stray kernel-doc notation Subject: mm/z3fold.c: limit first_num to the actual range of possible buddy indexes
- almost all of the rest of MM - misc bits - KASAN updates - procfs - lib/ updates - checkpatch updates 124 patches, based on f1ef09fde17f9b77ca1435a5b53a28b203afb81c: Subject: cris: use generic current.h Subject: mm/ksm: improve deduplication of zero pages with colouring Subject: mm, oom: header nodemask is NULL when cpusets are disabled Subject: mm, devm_memremap_pages: hold device_hotplug lock over mem_hotplug_{begin, done} Subject: mm: validate device_hotplug is held for memory hotplug Subject: mm/memory_hotplug.c: unexport __remove_pages() Subject: memblock: let memblock_type_name know about physmem type Subject: memblock: also dump physmem list within __memblock_dump_all Subject: memblock: embed memblock type name within struct memblock_type Subject: userfaultfd: non-cooperative: rename *EVENT_MADVDONTNEED to *EVENT_REMOVE Subject: userfaultfd: non-cooperative: add madvise() event for MADV_REMOVE request Subject: userfaultfd: non-cooperative: selftest: enable REMOVE event test for shmem Subject: mm: vmscan: scan dirty pages even in laptop mode Subject: mm: vmscan: kick flushers when we encounter dirty pages on the LRU Subject: mm: vmscan: remove old flusher wakeup from direct reclaim path Subject: mm: vmscan: only write dirty pages that the scanner has seen twice Subject: mm: vmscan: move dirty pages out of the way until they're flushed Subject: mm, page_alloc: split buffered_rmqueue() Subject: mm, page_alloc: split alloc_pages_nodemask() Subject: mm, page_alloc: drain per-cpu pages from workqueue context Subject: mm, page_alloc: do not depend on cpu hotplug locks inside the allocator Subject: mm, page_alloc: only use per-cpu allocator for irq-safe requests Subject: mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf Subject: mm: fix comments for mmap_init() Subject: zram: remove waitqueue for IO done Subject: mm, page_alloc: remove redundant checks from alloc fastpath Subject: mm, page_alloc: don't check cpuset allowed twice in fast-path 
Subject: mm, page_alloc: use static global work_struct for draining per-cpu pages Subject: mm,fs,dax: change ->pmd_fault to ->huge_fault Subject: mm, x86: add support for PUD-sized transparent hugepages Subject: dax: support for transparent PUD pages for device DAX Subject: mm: replace FAULT_FLAG_SIZE with parameter to huge_fault Subject: mm: fix get_user_pages() vs device-dax pud mappings Subject: z3fold: make pages_nr atomic Subject: z3fold: fix header size related issues Subject: z3fold: extend compaction function Subject: z3fold: use per-page spinlock Subject: z3fold: add kref refcounting Subject: mm/migration: make isolate_movable_page() return int type Subject: mm/migration: make isolate_movable_page always defined Subject: HWPOISON: soft offlining for non-lru movable page Subject: mm/hotplug: enable memory hotplug for non-lru movable pages Subject: uprobes: split THPs before trying to replace them Subject: mm: introduce page_vma_mapped_walk() Subject: mm: fix handling PTE-mapped THPs in page_referenced() Subject: mm: fix handling PTE-mapped THPs in page_idle_clear_pte_refs() Subject: mm, rmap: check all VMAs that PTE-mapped THP can be part of Subject: mm: convert page_mkclean_one() to use page_vma_mapped_walk() Subject: mm: convert try_to_unmap_one() to use page_vma_mapped_walk() Subject: mm, ksm: convert write_protect_page() to use page_vma_mapped_walk() Subject: mm, uprobes: convert __replace_page() to use page_vma_mapped_walk() Subject: mm: convert page_mapped_in_vma() to use page_vma_mapped_walk() Subject: mm: drop page_check_address{,_transhuge} Subject: mm: convert remove_migration_pte() to use page_vma_mapped_walk() Subject: mm: call vm_munmap in munmap syscall instead of using open coded version Subject: userfaultfd: non-cooperative: add event for memory unmaps Subject: userfaultfd: non-cooperative: add event for exit() notification Subject: userfaultfd: mcopy_atomic: return -ENOENT when no compatible VMA found Subject: userfaultfd_copy: return 
-ENOSPC in case mm has gone Subject: userfaultfd: documentation update Subject: mm: alloc_contig_range: allow to specify GFP mask Subject: mm: cma_alloc: allow to specify GFP mask Subject: mm: wire up GFP flag passing in dma_alloc_from_contiguous Subject: mm, madvise: fail with ENOMEM when splitting vma will hit max_map_count Subject: mm: cma: print allocation failure reason and bitmap status Subject: vmalloc: back off when the current task is killed Subject: mm/page_alloc.c: remove duplicate inclusion of page_ext.h Subject: mm/memory.c: use NULL instead of literal 0 Subject: mm: codgin-style fixes Subject: drm: remove unnecessary fault wrappers Subject: mm, vmscan: clear PGDAT_WRITEBACK when zone is balanced Subject: mm/shmem.c: fix unlikely() test of info->seals to test only for WRITE and GROW Subject: mm/autonuma: don't use set_pte_at when updating protnone ptes Subject: mm/autonuma: let architecture override how the write bit should be stashed in a protnone pte. Subject: mm/ksm: handle protnone saved writes when making page write protect Subject: powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write Subject: mm/page-writeback.c: place "not" inside of unlikely() statement in wb_domain_writeout_inc() Subject: zram: extend zero pages to same element pages Subject: mm/memory_hotplug.c: fix overflow in test_pages_in_a_zone() Subject: mm/page_alloc: fix nodes for reclaim in fast path Subject: mm: remove shmem_mapping() shmem_zero_setup() duplicates Subject: mm: vmpressure: fix sending wrong events on underflow Subject: mm/zsmalloc: remove redundant SetPagePrivate2 in create_page_chain Subject: mm/page_alloc.c: remove redundant init code for ZONE_MOVABLE Subject: mm/zsmalloc: fix comment in zsmalloc Subject: mm: cleanups for printing phys_addr_t and dma_addr_t Subject: mm/gup: check for protnone only if it is a PTE entry Subject: mm/thp/autonuma: use TNF flag instead of vm fault Subject: mm: do not access page->mapping directly on page_endio 
Subject: memory-hotplug: use dev_online for memhp_auto_online Subject: kasan: drain quarantine of memcg slab objects Subject: kasan: add memcg kmem_cache test Subject: arch/frv/mb93090-mb00/pci-frv.c: fix build warning Subject: alpha: use generic current.h Subject: proc: use rb_entry() Subject: proc: less code duplication in /proc/*/cmdline Subject: procfs: use an enum for possible hidepid values Subject: uapi: mqueue.h: add missing linux/types.h include Subject: include/linux/iopoll.h: include <linux/ktime.h> instead of <linux/hrtimer.h> Subject: compiler-gcc.h: add a new macro to wrap gcc attribute Subject: m68k: replace gcc specific macros with ones from compiler.h Subject: bug: switch data corruption check to __must_check Subject: mm balloon: umount balloon_mnt when removing vb device Subject: kernel/notifier.c: simplify expression Subject: kernel/ksysfs.c: add __ro_after_init to bin_attribute structure Subject: lib: add module support to crc32 tests Subject: lib: add module support to glob tests Subject: lib: add module support to atomic64 tests Subject: lib/find_bit.c: micro-optimise find_next_*_bit Subject: linux/kernel.h: fix DIV_ROUND_CLOSEST to support negative divisors Subject: rbtree: use designated initializers Subject: lib: add CONFIG_TEST_SORT to enable self-test of sort() Subject: lib/test_sort.c: make it explicitly non-modular Subject: lib: update LZ4 compressor module Subject: lib/decompress_unlz4: change module to work with new LZ4 module version Subject: crypto: change LZ4 modules to work with new LZ4 module version Subject: fs/pstore: fs/squashfs: change usage of LZ4 to work with new LZ4 version Subject: lib/lz4: remove back-compat wrappers Subject: checkpatch: warn on embedded function names Subject: checkpatch: warn on logging continuations Subject: checkpatch: update $logFunctions Subject: checkpatch: add another old address for the FSF Subject: checkpatch: notice unbalanced else braces in a patch Subject: checkpatch: remove false unbalanced 
braces warning
- a few MM remainders - misc things - autofs updates - signals - affs updates - ipc - nilfs2 - spelling.txt updates 78 patches, based on e5d56efc97f8240d0b5d66c03949382b6d7e5570: Subject: mm,fs,dax: mark dax_iomap_pmd_fault as const Subject: zswap: allow initialization at boot without pool Subject: zswap: clear compressor or zpool param if invalid at init Subject: zswap: don't param_set_charp while holding spinlock Subject: kprobes: move kprobe declarations to asm-generic/kprobes.h Subject: autofs: remove wrong comment Subject: autofs: fix typo in Documentation Subject: autofs: fix wrong ioctl documentation regarding devid Subject: autofs: update ioctl documentation regarding struct autofs_dev_ioctl Subject: autofs: add command enum/macros for root-dir ioctls Subject: autofs: remove duplicated AUTOFS_DEV_IOCTL_SIZE definition Subject: autofs: take more care to not update last_used on path walk Subject: hfsplus: atomically read inode size Subject: fs/reiserfs: atomically read inode size Subject: sigaltstack: support SS_AUTODISARM for CONFIG_COMPAT Subject: tools/testing/selftests/sigaltstack/sas.c: improve output of sigaltstack testcase Subject: /proc/kcore: update physical address for kcore ram and text Subject: rapidio: use get_user_pages_unlocked() Subject: include/linux/pid.h: use for_each_thread() in do_each_pid_thread() Subject: fs,eventpoll: Don't test for bitfield with stack value Subject: fs/affs: remove reference to affs_parent_ino() Subject: fs/affs: add validation block function Subject: fs/affs: make affs exportable Subject: fs/affs: use octal for permissions Subject: fs/affs: add prefix to some functions Subject: fs/affs/namei.c: forward declarations clean-up Subject: fs/affs: make export work with cold dcache Subject: config: android-recommended: disable aio support Subject: config: android-base: enable hardened usercopy and kernel ASLR Subject: lib/fonts/Kconfig: keep non-Sparc fonts listed together Subject: initramfs: finish fput() before accessing 
any binary from initramfs Subject: ipc/sem.c: avoid using spin_unlock_wait() Subject: ipc/sem: add hysteresis Subject: ipc/mqueue: add missing sparse annotation Subject: ipc/shm: Fix shmat mmap nil-page protection Subject: scatterlist: reorder compound boolean expression Subject: scatterlist: do not disable IRQs in sg_copy_buffer Subject: fs: add i_blocksize() Subject: nilfs2: use nilfs_btree_node_size() Subject: nilfs2: use i_blocksize() Subject: scripts/spelling.txt: add "swith" pattern and fix typo instances Subject: scripts/spelling.txt: add "swithc" pattern and fix typo instances Subject: scripts/spelling.txt: add "an user" pattern and fix typo instances Subject: scripts/spelling.txt: add "an union" pattern and fix typo instances Subject: scripts/spelling.txt: add "an one" pattern and fix typo instances Subject: scripts/spelling.txt: add "partiton" pattern and fix typo instances Subject: scripts/spelling.txt: add "aligment" pattern and fix typo instances Subject: scripts/spelling.txt: add "algined" pattern and fix typo instances Subject: scripts/spelling.txt: add "efective" pattern and fix typo instances Subject: scripts/spelling.txt: add "varible" pattern and fix typo instances Subject: scripts/spelling.txt: add "embeded" pattern and fix typo instances Subject: scripts/spelling.txt: add "againt" pattern and fix typo instances Subject: scripts/spelling.txt: add "neded" pattern and fix typo instances Subject: scripts/spelling.txt: add "unneded" pattern and fix typo instances Subject: scripts/spelling.txt: add "intialization" pattern and fix typo instances Subject: scripts/spelling.txt: add "initialiazation" pattern and fix typo instances Subject: scripts/spelling.txt: add "comsume(r)" pattern and fix typo instances Subject: scripts/spelling.txt: add "overrided" pattern and fix typo instances Subject: scripts/spelling.txt: add "configuartion" pattern and fix typo instances Subject: scripts/spelling.txt: add "applys" pattern and fix typo instances Subject: 
scripts/spelling.txt: add "explictely" pattern and fix typo instances Subject: scripts/spelling.txt: add "omited" pattern and fix typo instances Subject: scripts/spelling.txt: add "disassocation" pattern and fix typo instances Subject: scripts/spelling.txt: add "deintialize(d)" pattern and fix typo instances Subject: scripts/spelling.txt: add "overwritting" pattern and fix typo instances Subject: scripts/spelling.txt: add "overwriten" pattern and fix typo instances Subject: scripts/spelling.txt: add "therfore" pattern and fix typo instances Subject: scripts/spelling.txt: add "followings" pattern and fix typo instances Subject: scripts/spelling.txt: add some typo-words Subject: lib/vsprintf.c: remove %Z support Subject: checkpatch: warn when formats use %Z and suggest %z Subject: mm: add new mmgrab() helper Subject: mm: add new mmget() helper Subject: mm: use mmget_not_zero() helper Subject: mm: clarify mm_struct.mm_{users,count} documentation Subject: hfs: atomically read inode size Subject: mm: add arch-independent testcases for RODATA Subject: mm, x86: fix HIGHMEM64 && PARAVIRT build config for native_pud_clear()
26 fixes, based on ea6200e84182989a3cce9687cf79a23ac44ec4db: Subject: userfaultfd: shmem: __do_fault requires VM_FAULT_NOPAGE Subject: scripts/spelling.txt: add "disble(d)" pattern and fix typo instances Subject: scripts/spelling.txt: add "overide" pattern and fix typo instances Subject: powerpc/mm: handle protnone ptes on fork Subject: power/mm: update pte_write and pte_wrprotect to handle savedwrite Subject: x86, mm: fix gup_pte_range() vs DAX mappings Subject: x86, mm: unify exit paths in gup_pte_range() Subject: userfaultfd: non-cooperative: rollback userfaultfd_exit Subject: userfaultfd: non-cooperative: robustness check Subject: userfaultfd: non-cooperative: release all ctx in dup_userfaultfd_complete Subject: include/linux/fs.h: fix unsigned enum warning with gcc-4.2 Subject: mm/vmstats: add thp_split_pud event for clarity Subject: drivers/md/bcache/util.h: remove duplicate inclusion of blkdev.h Subject: mm/cgroup: avoid panic when init with low memory Subject: userfaultfd: non-cooperative: fix fork fctx->new memleak Subject: userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED Subject: userfaultfd: selftest: vm: allow to build in vm/ directory Subject: mm/memblock.c: fix memblock_next_valid_pfn() Subject: rmap: fix NULL-pointer dereference on THP munlocking Subject: thp: fix another corner case of munlock() vs. THPs Subject: mm: do not call mem_cgroup_free() from within mem_cgroup_alloc() Subject: kasan: resched in quarantine_remove_cache() Subject: kasan: fix races in quarantine_remove_cache() Subject: sh: cayman: IDE support fix Subject: fat: fix using uninitialized fields of fat_inode/fsinfo_inode Subject: userfaultfd: remove wrong comment from userfaultfd_ctx_get()
6 fixes, based on 69eea5a4ab9c705496e912b55a9d312325de19e6: Subject: z3fold: fix spinlock unlocking in page reclaim Subject: kasan: add a prototype of task_struct to avoid warning Subject: mm, x86: fix native_pud_clear build error Subject: mm: don't warn when vmalloc() fails due to a fatal signal Subject: mm: add private lock to serialize memory hotplug operations Subject: drivers core: remove assert_held_device_hotplug()
11 fixes, based on d4562267b995fa3917717cc7773dad9c1f1ca658: Subject: mm: migrate: fix remove_migration_pte() for ksm pages Subject: mm: move mm_percpu_wq initialization earlier Subject: mm: rmap: fix huge file mmap accounting in the memcg stats Subject: mm: workingset: fix premature shadow node shrinking with cgroups Subject: mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd() Subject: mm: fix section name for .data..ro_after_init Subject: hugetlbfs: initialize shared policy as part of inode allocation Subject: kasan: report only the first error by default Subject: mm/hugetlb.c: don't call region_abort if region_chg fails Subject: drivers/rapidio/devices/tsi721.c: make module parameter variable name unique Subject: kasan: do not sanitize kexec purgatory
10 fixes, based on 81d4bab4ce87228c37ab14a885438544af5c9ce6: Subject: mm: fix page_vma_mapped_walk() for ksm pages Subject: userfaultfd: report actual registered features in fdinfo Subject: mm/page_alloc.c: fix print order in show_free_areas() Subject: vmlinux.lds: add missing VMLINUX_SYMBOL macros Subject: ptrace: fix PTRACE_LISTEN race corrupting task->state Subject: mm, thp: fix setting of defer+madvise thp defrag mode Subject: dax: fix radix tree insertion race Subject: mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff() Subject: mailmap: update Yakir Yang email address Subject: mm: move pcp and lru-pcp draining into single wq
11 fixes, based on 2760078203a6b46b96307f4b06030ab0b801c97e: Subject: z3fold: fix page locking in z3fold_alloc() Subject: thp: reduce indentation level in change_huge_pmd() Subject: thp: fix MADV_DONTNEED vs. numa balancing race Subject: mm: drop unused pmdp_huge_get_and_clear_notify() Subject: thp: fix MADV_DONTNEED vs. MADV_FREE race Subject: thp: fix MADV_DONTNEED vs clear soft dirty race Subject: hugetlbfs: fix offset overflow in hugetlbfs mmap Subject: zram: fix operator precedence to get offset Subject: zram: do not use copy_page with non-page aligned address Subject: zsmalloc: expand class bit Subject: mailmap: add Martin Kepplinger's email The presence of "thp: reduce indentation level in change_huge_pmd()" is unfortunate. But the patchset had been decently reviewed and tested before we decided it was needed in -stable and I felt it best not to churn things at the last minute.
2 fixes, based on f61143c45077df4fa78e2f1ba455a00bbe1d5b8c: Subject: Revert "mm, page_alloc: only use per-cpu allocator for irq-safe requests" Subject: mm: prevent NR_ISOLATE_* stats from going negative
- a few misc things - most of MM - KASAN updates 102 patches, based on 46f0537b1ecf672052007c97f102a7e6bf0791e4: Subject: lib/dma-debug.c: make locking work for RT Subject: scripts/spelling.txt: add several more common spelling mistakes Subject: blackfin: bf609: let clk_disable() return immediately if clk is NULL Subject: fs/ocfs2/cluster: use setup_timer Subject: ocfs2: o2hb: revert hb threshold to keep compatible Subject: fs/ocfs2/cluster: use offset_in_page() macro Subject: slab: avoid IPIs when creating kmem caches Subject: mm: fix 100% CPU kswapd busyloop on unreclaimable nodes Subject: mm: fix check for reclaimable pages in PF_MEMALLOC reclaim throttling Subject: mm: remove seemingly spurious reclaimability check from laptop_mode gating Subject: mm: remove unnecessary reclaimability check from NUMA balancing target Subject: mm: don't avoid high-priority reclaim on unreclaimable nodes Subject: mm: don't avoid high-priority reclaim on memcg limit reclaim Subject: mm: delete NR_PAGES_SCANNED and pgdat_reclaimable() Subject: Revert "mm, vmscan: account for skipped pages as a partial scan" Subject: mm: remove unnecessary back-off function when retrying page reclaim Subject: mm/page-writeback.c: use setup_deferrable_timer Subject: mm: delete unnecessary TTU_* flags Subject: mm: don't assume anonymous pages have SwapBacked flag Subject: mm: move MADV_FREE pages into LRU_INACTIVE_FILE list Subject: mm: reclaim MADV_FREE pages Subject: mm: fix lazyfree BUG_ON check in try_to_unmap_one() Subject: mm: enable MADV_FREE for swapless system Subject: proc: show MADV_FREE pages info in smaps Subject: mm: memcontrol: provide shmem statistics Subject: mm, swap: Fix a race in free_swap_and_cache() Subject: mm: use is_migrate_highatomic() to simplify the code Subject: mm: use is_migrate_isolate_page() to simplify the code Subject: mm, vmstat: print non-populated zones in zoneinfo Subject: mm, vmstat: suppress pcp stats for unpopulated zones in zoneinfo Subject: lockdep: teach 
lockdep about memalloc_noio_save Subject: lockdep: allow to disable reclaim lockup detection Subject: xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS Subject: mm: introduce memalloc_nofs_{save,restore} API Subject: xfs: use memalloc_nofs_{save,restore} instead of memalloc_noio* Subject: jbd2: mark the transaction context with the scope GFP_NOFS context Subject: jbd2: make the whole kjournald2 kthread NOFS safe Subject: mm: tighten up the fault path a little Subject: mm: remove rodata_test_data export, add pr_fmt Subject: mm: do not use double negation for testing page flags Subject: mm, vmscan: fix zone balance check in prepare_kswapd_sleep Subject: mm, vmscan: only clear pgdat congested/dirty/writeback state when balanced Subject: mm, vmscan: prevent kswapd sleeping prematurely due to mismatched classzone_idx Subject: mm: page_alloc: __GFP_NOWARN shouldn't suppress stall warnings Subject: mm/sparse: refine usemap_size() a little Subject: mm/compaction: ignore block suitable after check large free page Subject: mm/vmscan: more restrictive condition for retry in do_try_to_free_pages Subject: mm: remove unncessary ret in page_referenced Subject: mm: remove SWAP_DIRTY in ttu Subject: mm: remove SWAP_MLOCK check for SWAP_SUCCESS in ttu Subject: mm: make try_to_munlock() return void Subject: mm: remove SWAP_MLOCK in ttu Subject: mm: remove SWAP_AGAIN in ttu Subject: mm: make ttu's return boolean Subject: mm: make rmap_walk() return void Subject: mm: make rmap_one boolean function Subject: mm: remove SWAP_[SUCCESS|AGAIN|FAIL] Subject: mm, swap: fix comment in __read_swap_cache_async Subject: mm, swap: improve readability via make spin_lock/unlock balanced Subject: mm, swap: avoid lock swap_avail_lock when held cluster lock Subject: mm: enable page poisoning early at boot Subject: include/linux/migrate.h: add arg names to prototype Subject: mm/swap_slots.c: add warning if swap slots cache failed to initialize Subject: mm: fix spelling error Subject: userfaultfd: selftest: 
combine all cases into a single executable Subject: oom: improve oom disable handling Subject: mm/mmap: replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff Subject: mm: vmscan: fix IO/refault regression in cache workingset transition Subject: mm: memcontrol: clean up memory.events counting function Subject: mm: memcontrol: re-use global VM event enum Subject: mm: memcontrol: re-use node VM page state enum Subject: mm: memcontrol: use node page state naming scheme for memcg Subject: mm, swap: remove unused function prototype Subject: Documentation: vm, add hugetlbfs reservation overview Subject: mm/madvise.c: clean up MADV_SOFT_OFFLINE and MADV_HWPOISON Subject: mm/madvise: move up the behavior parameter validation Subject: mm/memory-failure.c: add page flag description in error paths Subject: mm, page_alloc: remove debug_guardpage_minorder() test in warn_alloc() Subject: zram: handle multiple pages attached bio's bvec Subject: zram: partial IO refactoring Subject: zram: use zram_slot_lock instead of raw bit_spin_lock op Subject: zram: remove zram_meta structure Subject: zram: introduce zram data accessor Subject: zram: use zram_free_page instead of open-coded Subject: zram: reduce load operation in page_same_filled Subject: fs: fix data invalidation in the cleancache during direct IO Subject: fs/block_dev: always invalidate cleancache in invalidate_bdev() Subject: mm/truncate: bail out early from invalidate_inode_pages2_range() if mapping is empty Subject: mm/truncate: avoid pointless cleancache_invalidate_inode() calls. 
Subject: mm/gup.c: fix access_ok() argument type Subject: mm/swapfile.c: fix swap space leak in error path of swap_free_entries() Subject: mm: hwpoison: call shake_page() unconditionally Subject: mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page Subject: kasan: introduce helper functions for determining bug type Subject: kasan: unify report headers Subject: kasan: change allocation and freeing stack traces headers Subject: kasan: simplify address description logic Subject: kasan: change report header Subject: kasan: improve slab object description Subject: kasan: print page description after stacks Subject: kasan: improve double-free report format Subject: kasan: separate report parts by empty lines
- the rest of MM - various misc things - procfs updates - lib/ updates - checkpatch updates - kdump/kexec updates - add kvmalloc helpers, use them - time helper updates for Y2038 issues. We're almost ready to remove current_fs_time() but that awaits a btrfs merge. - add tracepoints to DAX. 114 patches, based on 13e0988140374123bead1dd27c287354cb95108e: Subject: mm, compaction: reorder fields in struct compact_control Subject: mm, compaction: remove redundant watermark check in compact_finished() Subject: mm, page_alloc: split smallest stolen page in fallback Subject: mm, page_alloc: count movable pages when stealing from pageblock Subject: mm, compaction: change migrate_async_suitable() to suitable_migration_source() Subject: mm, compaction: add migratetype to compact_control Subject: mm, compaction: restrict async compaction to pageblocks of same migratetype Subject: mm, compaction: finish whole pageblock to reduce fragmentation Subject: fs/proc/inode.c: remove cast from memory allocation Subject: proc/sysctl: fix the int overflow for jiffies conversion Subject: drivers/virt/fsl_hypervisor.c: use get_user_pages_unlocked() Subject: jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp Subject: make help: add tools help target Subject: kernel/hung_task.c: defer showing held locks Subject: drivers/misc/vmw_vmci/vmci_queue_pair.c: fix a couple integer overflow tests Subject: drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR() Subject: Revert "lib/test_sort.c: make it explicitly non-modular" Subject: lib: add module support to array-based sort tests Subject: lib: add module support to linked list sorting tests Subject: firmware/Makefile: force recompilation if makefile changes Subject: checkpatch: remove obsolete CONFIG_EXPERIMENTAL checks Subject: checkpatch: add ability to find bad uses of vsprintf %p<foo> extensions Subject: checkpatch: improve EMBEDDED_FUNCTION_NAME test Subject: checkpatch: allow space leading 
blank lines in email headers Subject: checkpatch: avoid suggesting struct definitions should be const Subject: checkpatch: improve MULTISTATEMENT_MACRO_USE_DO_WHILE test Subject: checkpatch: clarify the EMBEDDED_FUNCTION_NAME message Subject: checkpatch: special audit for revert commit line Subject: checkpatch: improve k.alloc with multiplication and sizeof test Subject: checkpatch: add --typedefsfile Subject: checkpatch: improve the embedded function name test for patch contexts Subject: checkpatch: improve the SUSPECT_CODE_INDENT test Subject: reiserfs: use designated initializers Subject: fork: free vmapped stacks in cache when cpus are offline Subject: cpumask: make "nr_cpumask_bits" unsigned Subject: crash: move crashkernel parsing and vmcore related code under CONFIG_CRASH_CORE Subject: ia64: reuse append_elf_note() and final_note() functions Subject: powerpc/fadump: remove dependency with CONFIG_KEXEC Subject: powerpc/fadump: reuse crashkernel parameter for fadump memory reservation Subject: powerpc/fadump: update documentation about crashkernel parameter reuse Subject: pidns: disable pid allocation if pid_ns_prepare_proc() is failed in alloc_pid() Subject: ns: allow ns_entries to have custom symlink content Subject: pidns: expose task pid_ns_for_children to userspace Subject: taskstats: add e/u/stime for TGID command Subject: kcov: simplify interrupt check Subject: lib/fault-inject.c: use correct check for interrupts Subject: lib/zlib_inflate/inftrees.c: fix potential buffer overflow Subject: initramfs: provide a way to ignore image provided by bootloader Subject: initramfs: use vfs_stat/lstat directly Subject: ipc/shm: some shmat cleanups Subject: sysv,ipc: cacheline align kern_ipc_perm Subject: mm: introduce kv[mz]alloc helpers Subject: mm, vmalloc: properly track vmalloc users Subject: mm: support __GFP_REPEAT in kvmalloc_node for >32kB Subject: lib/rhashtable.c: simplify a strange allocation pattern Subject: net/ipv6/ila/ila_xlat.c: simplify a strange 
allocation pattern Subject: fs/xattr.c: zero out memory copied to userspace in getxattr Subject: treewide: use kv[mz]alloc* rather than opencoded variants Subject: net: use kvmalloc with __GFP_REPEAT rather than open coded variant Subject: drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded variant Subject: drivers/md/bcache/super.c: use kvmalloc Subject: mm, swap: use kvzalloc to allocate some swap data structures Subject: mm, vmalloc: use __GFP_HIGHMEM implicitly Subject: scripts/spelling.txt: add "memory" pattern and fix typos Subject: scripts/spelling.txt: add regsiter -> register spelling mistake Subject: scripts/spelling.txt: add "intialise(d)" pattern and fix typo instances Subject: treewide: spelling: correct diffrent[iate] and banlance typos Subject: treewide: move set_memory_* functions away from cacheflush.h Subject: arm: use set_memory.h header Subject: arm64: use set_memory.h header Subject: s390: use set_memory.h header Subject: x86: use set_memory.h header Subject: agp: use set_memory.h header Subject: drm: use set_memory.h header Subject: drivers/hwtracing/intel_th/msu.c: use set_memory.h header Subject: drivers/watchdog/hpwdt.c: use set_memory.h header Subject: include/linux/filter.h: use set_memory.h header Subject: kernel/module.c: use set_memory.h header Subject: kernel/power/snapshot.c: use set_memory.h header Subject: alsa: use set_memory.h header Subject: drivers/misc/sram-exec.c: use set_memory.h header Subject: drivers/video/fbdev/vermilion/vermilion.c: use set_memory.h header Subject: drivers/staging/media/atomisp/pci/atomisp2: use set_memory.h Subject: treewide: decouple cacheflush.h and set_memory.h Subject: kref: remove WARN_ON for NULL release functions Subject: drivers/scsi/megaraid: remove expensive inline from megasas_return_cmd Subject: include/linux/uaccess.h: remove expensive WARN_ON in pagefault_disabled_dec Subject: fs: semove set but not checked AOP_FLAG_UNINTERRUPTIBLE flag Subject: Documentation/vm/transhuge.txt: fix 
trivial typos Subject: format-security: move static strings to const Subject: fs: f2fs: use ktime_get_real_seconds for sit_info times Subject: trace: make trace_hwlat timestamp y2038 safe Subject: fs: cifs: replace CURRENT_TIME by other appropriate apis Subject: fs: ceph: CURRENT_TIME with ktime_get_real_ts() Subject: fs: ufs: use ktime_get_real_ts64() for birthtime Subject: fs: ubifs: replace CURRENT_TIME_SEC with current_time Subject: lustre: replace CURRENT_TIME macro Subject: apparmorfs: replace CURRENT_TIME with current_time() Subject: gfs2: replace CURRENT_TIME with current_time Subject: time: delete CURRENT_TIME_SEC and CURRENT_TIME Subject: mm/huge_memory.c: use zap_deposited_table() more Subject: mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required Subject: mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC Subject: mm: introduce memalloc_noreclaim_{save,restore} Subject: treewide: convert PF_MEMALLOC manipulations to new helpers Subject: mtd: nand: nandsim: convert to memalloc_noreclaim_*() Subject: dax: add tracepoints to dax_iomap_pte_fault() Subject: dax: add tracepoints to dax_pfn_mkwrite() Subject: dax: add tracepoints to dax_load_hole() Subject: dax: add tracepoints to dax_writeback_mapping_range() Subject: dax: add tracepoint to dax_writeback_one() Subject: dax: add tracepoint to dax_insert_mapping() Subject: selftests/vm: add a test for virtual address range mapping Subject: drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4
15 fixes, based on deac8429d62ca19c1571853e2a18f60e760ee04c: Subject: hwpoison, memcg: forcibly uncharge LRU pages Subject: time: delete current_fs_time() Subject: mm, vmstat: Remove spurious WARN() during zoneinfo print Subject: gcov: support GCC 7.1 Subject: mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin Subject: mm, vmalloc: fix vmalloc users tracking properly Subject: Tigran has moved Subject: dax: prevent invalidation of mapped DAX entries Subject: mm: fix data corruption due to stale mmap reads Subject: ext4: return to starting transaction in ext4_dax_huge_fault() Subject: dax: fix data corruption when fault races with write Subject: dax: fix PMD data corruption when fault races with write Subject: mm, thp: copying user pages must schedule on collapse Subject: mm: vmscan: scan until it finds eligible pages Subject: mm, docs: update memory.stat description with workingset* entries
15 fixes, based on c531577bcdac51225f50033e0c89644873f4dc6d: Subject: ksm: prevent crash after write_protect_page fails Subject: include/linux/gfp.h: fix ___GFP_NOLOCKDEP value Subject: frv: declare jiffies to be located in the .data section Subject: mm: clarify why we want kmalloc before falling backto vmallock Subject: initramfs: fix disabling of initramfs (and its compression) Subject: slub/memcg: cure the brainless abuse of sysfs attributes Subject: pcmcia: remove left-over %Z format Subject: mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once Subject: mm: avoid spurious 'bad pmd' warning messages Subject: dax: fix race between colliding PMD & PTE entries Subject: mm/migrate: fix refcount handling when !hugepage_migration_supported() Subject: mlock: fix mlock count can not decrease in race condition Subject: mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified Subject: mm: consider memblock reservations for deferred memory initialization sizing Subject: scripts/gdb: make lx-dmesg command work (reliably)
5 fixes, based on ab2789b72df3cf7a01e30636ea86cbbf44ba2e99: Subject: mm/memory-failure.c: use compound_head() flags for huge pages Subject: swap: cond_resched in swap_cgroup_prepare() Subject: mm: numa: avoid waiting on freed migrated pages Subject: userfaultfd: shmem: handle coredumping in handle_userfault() Subject: mm: correct the comment when reclaimed pages exceed the scanned pages
8 fixes, based on a38371cba67539ce6a5d5324db34bc2ddaf66cc1: Subject: mm, thp: remove cond_resched from __collapse_huge_page_copy Subject: mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings Subject: autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL Subject: fs/dax.c: fix inefficiency in dax_writeback_mapping_range() Subject: lib/cmdline.c: fix get_options() overflow while parsing ranges Subject: slub: make sysfs file removal asynchronous Subject: ocfs2: fix deadlock caused by recursive locking in xattr Subject: fs/exec.c: account for argv/envp pointers
- a few hotfixes - various misc updates - ocfs2 updates - most of MM 108 patches, based on 9ced560b82606b35adb33a27012a148d418a4c1f: Subject: compiler, clang: always inline when CONFIG_OPTIMIZE_INLINING is disabled Subject: thp, mm: fix crash due race in MADV_FREE handling Subject: kernel/extable.c: mark core_kernel_text notrace Subject: mn10300: remove wrapper header for asm/device.h Subject: mn10300: use generic fb.h Subject: tile: provide default ioremap declaration Subject: scripts/gen_initramfs_list.sh: teach INITRAMFS_ROOT_UID and INITRAMFS_ROOT_GID that -1 means "current user". Subject: ramfs: clarify help text that compression applies to ramfs as well as legacy ramdisk. Subject: scripts/spelling.txt: add a bunch more spelling mistakes Subject: provide linux/set_memory.h Subject: kernel/power/snapshot.c: use linux/set_memory.h Subject: kernel/module.c: use linux/set_memory.h Subject: include/linux/filter.h: use linux/set_memory.h Subject: drivers/sh/intc/virq.c: delete an error message for a failed memory allocation in add_virq_to_pirq() Subject: ocfs2: fix a static checker warning Subject: ocfs2: use magic.h Subject: ocfs2: free 'dummy_sc' in sc_fop_release() to prevent memory leak Subject: ocfs2: constify attribute_group structures Subject: fs/file.c: replace alloc_fdmem() with kvmalloc() alternative Subject: mm/slub.c: remove a redundant assignment in ___slab_alloc() Subject: mm/slub: reset cpu_slab's pointer in deactivate_slab() Subject: mm/slub.c: pack red_left_pad with another int to save a word Subject: mm/slub.c: wrap cpu_slab->partial in CONFIG_SLUB_CPU_PARTIAL Subject: mm/slub.c: wrap kmem_cache->cpu_partial in config CONFIG_SLUB_CPU_PARTIAL Subject: mm/slab.c: replace open-coded round-up code with ALIGN Subject: mm: allow slab_nomerge to be set at build time Subject: mm, sparsemem: break out of loops early Subject: mm/mmap.c: mark protection_map as __ro_after_init Subject: mm/vmscan.c: fix unsequenced modification and access warning Subject: 
mm/nobootmem.c: return 0 when start_pfn equals end_pfn Subject: ksm: introduce ksm_max_page_sharing per page deduplication limit Subject: ksm: fix use after free with merge_across_nodes = 0 Subject: ksm: cleanup stable_node chain collapse case Subject: ksm: swap the two output parameters of chain/chain_prune Subject: ksm: optimize refile of stable_node_dup at the head of the chain Subject: zram: count same page write as page_stored Subject: mm/vmstat.c: standardize file operations variable names Subject: mm, THP, swap: delay splitting THP during swap out Subject: mm, THP, swap: unify swap slot free functions to put_swap_page Subject: mm, THP, swap: move anonymous THP split logic to vmscan Subject: mm, THP, swap: check whether THP can be split firstly Subject: mm, THP, swap: enable THP swap optimization only if has compound map Subject: mm: remove return value from init_currently_empty_zone Subject: mm, memory_hotplug: use node instead of zone in can_online_high_movable Subject: mm: drop page_initialized check from get_nid_for_pfn Subject: mm, memory_hotplug: get rid of is_zone_device_section Subject: mm, memory_hotplug: split up register_one_node() Subject: mm, memory_hotplug: consider offline memblocks removable Subject: mm: consider zone which is not fully populated to have holes Subject: mm, compaction: skip over holes in __reset_isolation_suitable Subject: mm: __first_valid_page skip over offline pages Subject: mm, vmstat: skip reporting offline pages in pagetypeinfo Subject: mm, memory_hotplug: do not associate hotadded memory to zones until online Subject: mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior Subject: mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone Subject: mm, memory_hotplug: replace for_device by want_memblock in arch_add_memory Subject: mm, memory_hotplug: fix the section mismatch warning Subject: mm, memory_hotplug: remove unused cruft after memory hotplug rework Subject: kernel/exit.c: don't include unused 
userfaultfd_k.h Subject: fs/userfaultfd.c: drop dead code Subject: mm/madvise: enable (soft|hard) offline of HugeTLB pages at PGD level Subject: mm/hugetlb/migration: use set_huge_pte_at instead of set_pte_at Subject: mm/follow_page_mask: split follow_page_mask to smaller functions. Subject: mm/hugetlb: export hugetlb_entry_migration helper Subject: mm/follow_page_mask: add support for hugetlb pgd entries Subject: mm/hugetlb: move default definition of hugepd_t earlier in the header Subject: mm/follow_page_mask: add support for hugepage directory entry Subject: powerpc/hugetlb: add follow_huge_pd implementation for ppc64 Subject: powerpc/mm/hugetlb: remove follow_huge_addr for powerpc Subject: powerpc/hugetlb: enable hugetlb migration for ppc64 Subject: mm: zero hash tables in allocator Subject: mm: update callers to use HASH_ZERO flag Subject: mm: adaptive hash table scaling Subject: mm/hugetlb: clean up ARCH_HAS_GIGANTIC_PAGE Subject: powerpc/mm/hugetlb: add support for 1G huge pages Subject: mm/page_alloc.c: mark bad_range() and meminit_pfn_in_nid() as __maybe_unused Subject: mm: drop NULL return check of pte_offset_map_lock() Subject: arm64: hugetlb: refactor find_num_contig() Subject: arm64: hugetlb: remove spurious calls to huge_ptep_offset() Subject: mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages Subject: mm, gup: ensure real head page is ref-counted when using hugepages Subject: mm/hugetlb: add size parameter to huge_pte_offset() Subject: mm/hugetlb: allow architectures to override huge_pte_clear() Subject: mm/hugetlb: introduce set_huge_swap_pte_at() helper Subject: mm: rmap: use correct helper when poisoning hugepages Subject: mm, page_alloc: fix more premature OOM due to race with cpuset update Subject: mm, mempolicy: stop adjusting current->il_next in mpol_rebind_nodemask() Subject: mm, page_alloc: pass preferred nid instead of zonelist to allocator Subject: mm, mempolicy: simplify rebinding mempolicies when updating cpusets Subject: 
mm, cpuset: always use seqlock when changing task's nodemask Subject: mm, mempolicy: don't check cpuset seqlock where it doesn't matter Subject: mm: kmemleak: slightly reduce the size of some structures on 64-bit architectures Subject: mm: kmemleak: factor object reference updating out of scan_block() Subject: mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects Subject: mm: per-cgroup memory reclaim stats Subject: mm/oom_kill: count global and memory cgroup oom kills Subject: mm/swapfile.c: sort swap entries before free Subject: mm/zswap.c: delete an error message for a failed memory allocation in zswap_pool_create() Subject: mm/zswap.c: improve a size determination in zswap_frontswap_init() Subject: mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare() Subject: mm: vmstat: move slab statistics from zone to node counters Subject: mm: memcontrol: use the node-native slab memory counters Subject: mm: memcontrol: use generic mod_memcg_page_state for kmem pages Subject: mm: memcontrol: per-lruvec stats infrastructure Subject: mm: memcontrol: account slab stats per lruvec Subject: mm, memory_hotplug: drop artificial restriction on online/offline Subject: mm, memory_hotplug: drop CONFIG_MOVABLE_NODE Subject: mm, memory_hotplug: move movable_node to the hotplug proper
- most of the rest of MM - KASAN updates - lib/ updates - checkpatch updates - some binfmt_elf changes - various misc bits 115 patches, based on 9eb788800510ae1a6bc419636a66071ee4deafd5: Subject: swap: add block io poll in swapin path Subject: mm, page_alloc: fallback to smallest page when not stealing whole pageblock Subject: mm/memory.c: convert to DEFINE_DEBUGFS_ATTRIBUTE Subject: mm, vmscan: avoid thrashing anon lru when free + file is low Subject: mm/memory_hotplug.c: add NULL check to avoid potential NULL pointer dereference Subject: mm/zsmalloc.c: fix -Wunneeded-internal-declaration warning Subject: fs/buffer.c: make bh_lru_install() more efficient Subject: mm: hugetlb: prevent reuse of hwpoisoned free hugepages Subject: mm: hugetlb: return immediately for hugetlb page in __delete_from_page_cache() Subject: mm: hwpoison: change PageHWPoison behavior on hugetlb pages Subject: mm: hugetlb: soft-offline: dissolve source hugepage after successful migration Subject: mm: soft-offline: dissolve free hugepage if soft-offlined Subject: mm: hwpoison: introduce memory_failure_hugetlb() Subject: mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error Subject: mm: hugetlb: delete dequeue_hwpoisoned_huge_page() Subject: mm: hwpoison: introduce idenfity_page_state Subject: mm, vmpressure: pass-through notification support Subject: mm: make PR_SET_THP_DISABLE immediately active Subject: mm/memcontrol: exclude @root from checks in mem_cgroup_low Subject: vmalloc: show lazy-purged vma info in vmallocinfo Subject: mm/cma.c: warn if the CMA area could not be activated Subject: mm/hugetlb.c: warn the user when issues arise on boot due to hugepages Subject: oom, trace: remove ENUM evaluation of COMPACTION_FEEDBACK Subject: mm: improve readability of transparent_hugepage_enabled() Subject: mm: always enable thp for dax mappings Subject: include/linux/page_ref.h: ensure page_ref_unfreeze is ordered against prior accesses Subject: mm/migrate.c: stabilise page count when 
migrating transparent hugepages Subject: zram: use __sysfs_match_string() helper Subject: mm, memory_hotplug: support movable_node for hotpluggable nodes Subject: mm, memory_hotplug: simplify empty node mask handling in new_node_page Subject: hugetlb, memory_hotplug: prefer to use reserved pages for migration Subject: mm: unify new_node_page and alloc_migrate_target Subject: mm, hugetlb: schedule when potentially allocating many hugepages Subject: mm, memcg: fix potential undefined behavior in mem_cgroup_event_ratelimit() Subject: mm/hugetlb.c: replace memfmt with string_get_size Subject: mm/truncate.c: fix THP handling in invalidate_mapping_pages() Subject: userfaultfd: non-cooperative: add madvise() event for MADV_FREE request Subject: mm/oom_kill.c: add tracepoints for oom reaper-related events Subject: mm, hugetlb: unclutter hugetlb allocation layers Subject: hugetlb: add support for preferred node to alloc_huge_page_nodemask Subject: mm, hugetlb, soft_offline: use new_page_nodemask for soft offline migration Subject: mm: avoid taking zone lock in pagetypeinfo_showmixed() Subject: mm: drop useless local parameters of __register_one_node() Subject: fs/proc/task_mmu.c: remove obsolete comment in show_map_vma() Subject: mm/page_alloc.c: eliminate unsigned confusion in __rmqueue_fallback Subject: mm/swap_slots.c: don't disable preemption while taking the per-CPU cache Subject: include/linux/mmzone.h: remove ancient/ambiguous comment Subject: include/linux/backing-dev.h: simplify wb_stat_sum Subject: mm: document highmem_is_dirtyable sysctl Subject: mm/memory_hotplug.c: remove unused local zone_type from __remove_zone() Subject: cma: fix calculation of aligned offset Subject: mm/balloon_compaction.c: enqueue zero page to balloon device Subject: mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack Subject: mm/mmap.c: expand_downwards: don't require the gap if !vm_prev Subject: mm/list_lru.c: fix list_lru_count_node() to be race free Subject: fs/dcache.c: 
fix spin lockup issue on nlru->lock Subject: mm: use dedicated helper to access rlimit value Subject: mm: swap: provide lru_add_drain_all_cpuslocked() Subject: mm/memory-hotplug: switch locking to a percpu rwsem Subject: mm: disallow early_pfn_to_nid on configurations which do not implement it Subject: zram: constify attribute_group structures. Subject: mm/zsmalloc: simplify zs_max_alloc_size handling Subject: mm/kasan/kasan_init.c: use kasan_zero_pud for p4d table Subject: mm/kasan: get rid of speculative shadow checks Subject: x86/kasan: don't allocate extra shadow memory Subject: arm64/kasan: don't allocate extra shadow memory Subject: mm/kasan: add support for memory hotplug Subject: mm/kasan/kasan.c: rename XXX_is_zero to XXX_is_nonzero Subject: kasan: make get_wild_bug_type() static Subject: frv: remove wrapper header for asm/device.h Subject: frv: use generic fb.h Subject: frv: cmpxchg: implement cmpxchg64() Subject: fs/proc/generic.c: switch to ida_simple_get/remove Subject: asm-generic/bug.h: declare struct pt_regs; before function prototype Subject: linux/bug.h: correct formatting of block comment Subject: linux/bug.h: correct "(foo*)" should be "(foo *)" Subject: linux/bug.h: correct "space required before that '-'" Subject: bug: split BUILD_BUG stuff out into <linux/build_bug.h> Subject: ARM: fix rd_size declaration Subject: kernel/ksysfs.c: constify attribute_group structures. 
Subject: kernel/groups.c: use sort library function Subject: kernel/kallsyms.c: replace all_var with IS_ENABLED(CONFIG_KALLSYMS_ALL) Subject: MAINTAINERS: give proc sysctl some maintainer love Subject: lib/test_bitmap.c: add optimisation tests Subject: bitmap: optimise bitmap_set and bitmap_clear of a single bit Subject: include/linux/bitmap.h: turn bitmap_set and bitmap_clear into memset when possible Subject: bitmap: use memcmp optimisation in more situations Subject: lib/kstrtox.c: delete end-of-string test Subject: lib/kstrtox.c: use "unsigned int" more Subject: lib/interval_tree_test.c: allow the module to be compiled-in Subject: lib/interval_tree_test.c: make test options module parameters Subject: lib/interval_tree_test.c: allow users to limit scope of endpoint Subject: lib/interval_tree_test.c: allow full tree search Subject: lib/rhashtable.c: use kvzalloc() in bucket_table_alloc() when possible Subject: lib/extable.c: use bsearch() library function in search_extable() Subject: lib/bsearch.c: micro-optimize pivot position calculation Subject: checkpatch: improve the unnecessary OOM message test Subject: checkpatch: warn when a MAINTAINERS entry isn't [A-Z]:\t Subject: checkpatch: [HLP]LIST_HEAD is also declaration Subject: checkpatch: fix stepping through statements with $stat and ctx_statement_block Subject: checkpatch: remove false warning for commit reference Subject: checkpatch: improve tests for multiple line function definitions Subject: checkpatch: silence perl 5.26.0 unescaped left brace warnings Subject: checkpatch: change format of --color argument to --color[=WHEN] Subject: checkpatch: improve macro reuse test Subject: checkpatch: improve multi-line alignment test Subject: fs, epoll: short circuit fetching events if thread has been killed Subject: binfmt_elf: use ELF_ET_DYN_BASE only for PIE Subject: arm: move ELF_ET_DYN_BASE to 4MB Subject: arm64: move ELF_ET_DYN_BASE to 4GB / 4MB Subject: powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB Subject: 
s390: reduce ELF_ET_DYN_BASE Subject: binfmt_elf: safely increment argv pointers Subject: kernel/signal.c: avoid undefined behaviour in kill_something_info Subject: kernel/exit.c: avoid undefined behaviour when calling wait4()
- various misc things - kexec updates - sysctl core updates - scripts/gdb updates - checkpoint-restart updates - ipc updates - kernel/watchdog updates - Kees's "rough equivalent to the glibc _FORTIFY_SOURCE=1 feature" - "stackprotector: ascii armor the stack canary" - more MM bits - checkpatch updates 96 patches, based on 235b84fc862ae2637dc0dabada18d97f1bfc18e1: Subject: include/linux/dcache.h: use unsigned chars in struct name_snapshot Subject: kernel.h: handle pointers to arrays better in container_of() Subject: mm/memory.c: mark create_huge_pmd() inline to prevent build failure Subject: kernel/fork.c: virtually mapped stacks: do not disable interrupts Subject: kexec: move vmcoreinfo out of the kernel's .bss section Subject: powerpc/fadump: use the correct VMCOREINFO_NOTE_SIZE for phdr Subject: kdump: protect vmcoreinfo data under the crash memory Subject: kexec/kdump: minor Documentation updates for arm64 and Image Subject: sysctl: fix lax sysctl_check_table() sanity check Subject: sysctl: kdoc'ify sysctl_writes_strict Subject: sysctl: fold sysctl_writes_strict checks into helper Subject: sysctl: simplify unsigned int support Subject: sysctl: add unsigned int range support Subject: test_sysctl: add dedicated proc sysctl test driver Subject: test_sysctl: add generic script to expand on tests Subject: test_sysctl: test against PAGE_SIZE for int Subject: test_sysctl: add simple proc_dointvec() case Subject: test_sysctl: add simple proc_douintvec() case Subject: test_sysctl: test against int proc_dointvec() array support Subject: kernel/sysctl_binary.c: check name array length in deprecated_sysctl_warning() Subject: random: do not ignore early device randomness Subject: bfs: fix sanity checks for empty files Subject: fs/Kconfig: kill CONFIG_PERCPU_RWSEM some more Subject: scripts/gdb: add lx-fdtdump command Subject: scripts/gdb: lx-dmesg: cast log_buf to void* for addr fetch Subject: scripts/gdb: lx-dmesg: use explicit encoding=utf8 errors=replace Subject: kfifo: 
clean up example to not use page_link Subject: procfs: fdinfo: extend information about epoll target files Subject: kcmp: add KCMP_EPOLL_TFD mode to compare epoll target files Subject: kcmp: fs/epoll: wrap kcmp code with CONFIG_CHECKPOINT_RESTORE Subject: fault-inject: support systematic fault injection Subject: ipc/sem.c: remove sem_base, embed struct sem Subject: ipc: merge ipc_rcu and kern_ipc_perm Subject: include/linux/sem.h: correctly document sem_ctime Subject: ipc: drop non-RCU allocation Subject: ipc/sem: do not use ipc_rcu_free() Subject: ipc/shm: do not use ipc_rcu_free() Subject: ipc/msg: do not use ipc_rcu_free() Subject: ipc/util: drop ipc_rcu_free() Subject: ipc/sem: avoid ipc_rcu_alloc() Subject: ipc/shm: avoid ipc_rcu_alloc() Subject: ipc/msg: avoid ipc_rcu_alloc() Subject: ipc/util: drop ipc_rcu_alloc() Subject: ipc/sem.c: avoid ipc_rcu_putref for failed ipc_addid() Subject: ipc/shm.c: avoid ipc_rcu_putref for failed ipc_addid() Subject: ipc/msg.c: avoid ipc_rcu_putref for failed ipc_addid() Subject: ipc: move atomic_set() to where it is needed Subject: ipc/shm: remove special shm_alloc/free Subject: ipc/msg: remove special msg_alloc/free Subject: ipc/sem: drop __sem_free() Subject: ipc/util.h: update documentation for ipc_getref() and ipc_putref() Subject: net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info() Subject: kernel/watchdog: remove unused declaration Subject: kernel/watchdog: introduce arch_touch_nmi_watchdog() Subject: kernel/watchdog: split up config options Subject: kernel/watchdog: provide watchdog_nmi_reconfigure() for arch watchdogs Subject: powerpc/64s: implement arch-specific hardlockup watchdog Subject: efi: avoid fortify checks in EFI stub Subject: kexec_file: adjust declaration of kexec_purgatory Subject: IB/rxe: do not copy extra stack memory to skb Subject: powerpc: don't fortify prom_init Subject: powerpc: make feature-fixup tests fortify-safe Subject: include/linux/string.h: add the option of fortified 
string.h functions Subject: sh: mark end of BUG() implementation as unreachable Subject: random,stackprotect: introduce get_random_canary function Subject: fork,random: use get_random_canary() to set tsk->stack_canary Subject: x86: ascii armor the x86_64 boot init stack canary Subject: arm64: ascii armor the arm64 boot init stack canary Subject: sh64: ascii armor the sh64 boot init stack canary Subject: x86/mmap: properly account for stack randomization in mmap_base Subject: arm64/mmap: properly account for stack randomization in mmap_base Subject: powerpc,mmap: properly account for stack randomization in mmap_base Subject: MIPS: do not use __GFP_REPEAT for order-0 request Subject: mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic Subject: xfs: map KM_MAYFAIL to __GFP_RETRY_MAYFAIL Subject: mm: kvmalloc support __GFP_RETRY_MAYFAIL for all sizes Subject: drm/i915: use __GFP_RETRY_MAYFAIL Subject: mm, migration: do not trigger OOM killer when migrating memory Subject: checkpatch: improve the STORAGE_CLASS test Subject: ARM: KVM: move asmlinkage before type Subject: ARM: HP Jornada 7XX: move inline before return type Subject: CRIS: gpio: move inline before return type Subject: FRV: tlbflush: move asmlinkage before return type Subject: ia64: move inline before return type Subject: ia64: sn: pci: move inline before type Subject: m68k: coldfire: move inline before return type Subject: MIPS: SMP: move asmlinkage before return type Subject: sh: move inline before return type Subject: x86/efi: move asmlinkage before return type Subject: drivers: s390: move static and inline before return type Subject: drivers: tty: serial: move inline before return type Subject: USB: serial: safe_serial: move __inline__ before return type Subject: video: fbdev: intelfb: move inline before return type Subject: video: fbdev: omap: move inline before return type Subject: ARM: samsung: usb-ohci: move inline before return type Subject: writeback: rework 
wb_[dec|inc]_stat family of functions
- a few leftovers - fault-injector rework - add a module loader test driver 13 patches, based on b86faee6d111294fa95a2e89b5f771b2da3c9782: Subject: mm: fix overflow check in expand_upwards() Subject: lib/atomic64_test.c: add a test that atomic64_inc_not_zero() returns an int Subject: MAINTAINERS: move the befs tree to kernel.org Subject: kernel/watchdog.c: use better pr_fmt prefix Subject: fault-inject: automatically detect the number base for fail-nth write interface Subject: fault-inject: parse as natural 1-based value for fail-nth write interface Subject: fault-inject: make fail-nth read/write interface symmetric Subject: fault-inject: simplify access check for fail-nth Subject: fault-inject: add /proc/<pid>/fail-nth Subject: xtensa: use generic fb.h Subject: MAINTAINERS: give kmod some maintainer love Subject: kmod: add test driver to stress test the module loader Subject: kmod: throttle kmod thread limit
16 fixes, based on 4d3f5d04d69e9479a3df88ceb0e2cd8188a49366: Subject: mm/hugetlb.c: __get_user_pages ignores certain follow_hugetlb_page errors Subject: pid: kill pidhash_size in pidhash_init() Subject: mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries Subject: userfaultfd: non-cooperative: notify about unmap of destination during mremap Subject: kasan: avoid -Wmaybe-uninitialized warning Subject: kthread: fix documentation build warning Subject: zram: do not free pool->size_class Subject: fortify: use WARN instead of BUG for now Subject: mm/page_io.c: fix oops during block io poll in swapin path Subject: mm: take memory hotplug lock within numa_zonelist_order_handler() Subject: userfaultfd_zeropage: return -ENOSPC in case mm has gone Subject: cpuset: fix a deadlock due to incomplete patching of cpusets_enabled() Subject: ipc: add missing container_of()s for randstruct Subject: userfaultfd: non-cooperative: flush event_wqh at release time Subject: mm: allow page_cache_get_speculative in interrupt context Subject: ocfs2: don't clear SGID when inheriting ACLs
21 fixes, based on 26273939ace935dd7553b31d279eab30b40f7b9a: Subject: mm: fix global NR_SLAB_.*CLAIMABLE counter reads Subject: mm: ratelimit PFNs busy info message Subject: userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case Subject: test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY" Subject: test_kmod: fix bug which allows negative values on two config options Subject: test_kmod: fix the lock in register_test_dev_kmod() Subject: test_kmod: fix small memory leak on filesystem tests Subject: fault-inject: fix wrong should_fail() decision in task context Subject: mm: migrate: prevent racy access to tlb_flush_pending Subject: mm: migrate: fix barriers around tlb_flush_pending Subject: Revert "mm: numa: defer TLB flush for THP migration as long as possible" Subject: mm: refactor TLB gathering API Subject: mm: make tlb_flush_pending global Subject: mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem Subject: mm: fix KSM data corruption Subject: MAINTAINERS: copy virtio on balloon_compaction.c Subject: mm/balloon_compaction.c: don't zero ballooned pages Subject: mm: fix list corruptions on shmem shrinklist Subject: rmap: do not call mmu_notifier_invalidate_page() under ptl Subject: zram: rework copy of compressor name in comp_algorithm_store() Subject: userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage
14 fixes, based on 039a8e38473323ed9f6c4415b4c3a36777efac34: Subject: mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback() Subject: kernel/watchdog: fix Kconfig constraints for perf hardlockup watchdog Subject: wait: add wait_event_killable_timeout() Subject: kmod: fix wait on recursive loop Subject: test_kmod: fix description for -s -and -c parameters Subject: mm: discard memblock data later Subject: slub: fix per memcg cache leak on css offline Subject: mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS Subject: mm, oom: fix potential data corruption when oom_reaper races with writer Subject: signal: don't remove SIGNAL_UNKILLABLE for traced tasks. Subject: mm/cma_debug.c: fix stack corruption due to sprintf usage Subject: mm/mempolicy: fix use after free when calling get_mempolicy Subject: mm/vmalloc.c: don't unconditonally use __GFP_HIGHMEM Subject: mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes
6 fixes, based on 90a6cd503982bfd33ce8c70eb49bd2dd33bc6325: Subject: PM/hibernate: touch NMI watchdog when creating snapshot Subject: mm, shmem: fix handling /sys/kernel/mm/transparent_hugepage/shmem_enabled Subject: dax: fix deadlock due to misaligned PMD faults Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE Subject: fork: fix incorrect fput of ->exe_file causing use-after-free Subject: mm/memblock.c: reversed logic in memblock_discard()
6 fixes, based on 42ff72cf27027fa28dd79acabe01d9196f1480a7: Subject: mm,page_alloc: don't call __node_reclaim() with oom_lock held. Subject: kernel/kthread.c: kthread_worker: don't hog the cpu Subject: mm, uprobes: fix multiple free of ->uprobes_state.xol_area Subject: mm, madvise: ensure poisoned pages are removed from per-cpu lists Subject: include/linux/compiler.h: don't perform compiletime_assert with -O0 Subject: scripts/dtc: fix '%zx' warning
- various misc bits - DAX updates - OCFS2 - most of MM 119 patches, based on e7d0c41ecc2e372a81741a30894f556afec24315: Subject: metag/numa: remove the unused parent_node() macro Subject: mm: add vm_insert_mixed_mkwrite() Subject: dax: relocate some dax functions Subject: dax: use common 4k zero page for dax mmap reads Subject: dax: remove DAX code from page_cache_tree_insert() Subject: dax: move all DAX radix tree defs to fs/dax.c Subject: dax: explain how read(2)/write(2) addresses are validated Subject: dax: use PG_PMD_COLOUR instead of open coding Subject: dax: initialize variable pfn before using it Subject: modpost: simplify sec_name() Subject: ocfs2: make ocfs2_set_acl() static Subject: ocfs2: clean up some dead code Subject: slub: tidy up initialization ordering Subject: mm: add SLUB free list pointer obfuscation Subject: mm/slub.c: add a naive detection of double free or corruption Subject: mm: track actual nr_scanned during shrink_slab() Subject: drm/i915: wire up shrinkctl->nr_scanned Subject: mm/memory_hotplug: just build zonelist for newly added node Subject: mm, memory_hotplug: display allowed zones in the preferred ordering Subject: mm, memory_hotplug: remove zone restrictions Subject: zram: clean up duplicated codes in __zram_bvec_write Subject: zram: inline zram_compress Subject: zram: rename zram_decompress_page to __zram_bvec_read Subject: zram: add interface to specif backing device Subject: zram: add free space management in backing device Subject: zram: identify asynchronous IO's return value Subject: zram: write incompressible pages to backing device Subject: zram: read page from backing device Subject: zram: add config and doc file for writeback feature Subject: mm, page_alloc: rip out ZONELIST_ORDER_ZONE Subject: mm, page_alloc: remove boot pageset initialization from memory hotplug Subject: mm, page_alloc: do not set_cpu_numa_mem on empty nodes initialization Subject: mm, memory_hotplug: drop zone from build_all_zonelists Subject: mm, 
memory_hotplug: remove explicit build_all_zonelists from try_online_node Subject: mm, page_alloc: simplify zonelist initialization Subject: mm, page_alloc: remove stop_machine from build_all_zonelists Subject: mm, memory_hotplug: get rid of zonelists_mutex Subject: mm, sparse, page_ext: drop ugly N_HIGH_MEMORY branches for allocations Subject: mm, page_owner: make init_pages_in_zone() faster Subject: mm, page_ext: periodically reschedule during page_ext_init() Subject: mm, page_owner: don't grab zone->lock for init_pages_in_zone() Subject: mm/mremap: fail map duplication attempts for private mappings Subject: mm/gup: make __gup_device_* require THP Subject: mm/hugetlb.c: make huge_pte_offset() consistent and document behaviour Subject: mm: always flush VMA ranges affected by zap_page_range Subject: zsmalloc: zs_page_migrate: skip unnecessary loops but not return -EBUSY if zspage is not inuse Subject: mm, vmscan: do not loop on too_many_isolated for ever Subject: fscache: remove unused ->now_uncached callback Subject: mm: make pagevec_lookup() update index Subject: mm: implement find_get_pages_range() Subject: fs: fix performance regression in clean_bdev_aliases() Subject: ext4: use pagevec_lookup_range() in ext4_find_unwritten_pgoff() Subject: ext4: use pagevec_lookup_range() in writeback code Subject: hugetlbfs: use pagevec_lookup_range() in remove_inode_hugepages() Subject: fs: use pagevec_lookup_range() in page_cache_seek_hole_data() Subject: mm: use find_get_pages_range() in filemap_range_has_page() Subject: mm: remove nr_pages argument from pagevec_lookup{,_range}() Subject: mm, memcg: reset memory.low during memcg offlining Subject: cgroup: revert fa06235b8eb0 ("cgroup: reset css on destruction") Subject: mm/ksm.c: constify attribute_group structures Subject: mm/slub.c: constify attribute_group structures Subject: mm/page_idle.c: constify attribute_group structures Subject: mm/huge_memory.c: constify attribute_group structures Subject: mm/hugetlb.c: constify 
attribute_group structures Subject: mm: memcontrol: use int for event/state parameter in several functions Subject: mm, THP, swap: support to clear swap cache flag for THP swapped out Subject: mm, THP, swap: support to reclaim swap space for THP swapped out Subject: mm, THP, swap: make reuse_swap_page() works for THP swapped out Subject: mm, THP, swap: don't allocate huge cluster for file backed swap device Subject: block, THP: make block_device_operations.rw_page support THP Subject: mm: test code to write THP to swap device as a whole Subject: mm, THP, swap: support splitting THP for THP swap out Subject: memcg, THP, swap: support move mem cgroup charge for THP swapped out Subject: memcg, THP, swap: avoid to duplicated charge THP in swap cache Subject: memcg, THP, swap: make mem_cgroup_swapout() support THP Subject: mm, THP, swap: delay splitting THP after swapped out Subject: mm, THP, swap: add THP swapping out fallback counting Subject: shmem: shmem_charge: verify max_block is not exceeded before inode update Subject: shmem: introduce shmem_inode_acct_block Subject: userfaultfd: shmem: add shmem_mfill_zeropage_pte for userfaultfd support Subject: userfaultfd: mcopy_atomic: introduce mfill_atomic_pte helper Subject: userfaultfd: shmem: wire up shmem_mfill_zeropage_pte Subject: userfaultfd: report UFFDIO_ZEROPAGE as available for shmem VMAs Subject: userfaultfd: selftest: enable testing of UFFDIO_ZEROPAGE for shmem Subject: fs/sync.c: remove unnecessary NULL f_mapping check in sync_file_range Subject: include/linux/fs.h: remove unneeded forward definition of mm_struct Subject: mm: hugetlb: define system call hugetlb size encodings in single file Subject: mm: arch: consolidate mmap hugetlb size encodings Subject: mm: shm: use new hugetlb size encoding definitions Subject: mm: rename global_page_state to global_zone_page_state Subject: mm: userfaultfd: add feature to request for a signal delivery Subject: userfaultfd: selftest: add tests for UFFD_FEATURE_SIGBUS 
feature Subject: userfaultfd: selftest: exercise UFFDIO_COPY/ZEROPAGE -EEXIST Subject: userfaultfd: selftest: explicit failure if the SIGBUS test failed Subject: userfaultfd: call userfaultfd_unmap_prep only if __split_vma succeeds Subject: userfaultfd: provide pid in userfault msg Subject: userfaultfd: provide pid in userfault msg - add feat union Subject: mm, hugetlb: do not allocate non-migrateable gigantic pages from movable zones Subject: mm/vmstat: fix divide error at __fragmentation_index Subject: mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas() Subject: mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups Subject: mm/shmem: add hugetlbfs support to memfd_create() Subject: selftests/memfd: add memfd_create hugetlbfs selftest Subject: mm/vmstat.c: fix wrong comment Subject: mm/vmalloc.c: don't reinvent the wheel but use existing llist API Subject: mm, swap: add swap readahead hit statistics Subject: mm, swap: fix swap readahead marking Subject: mm, swap: VMA based swap readahead Subject: mm, swap: add sysfs interface for VMA based swap readahead Subject: mm, swap: don't use VMA based swap readahead if HDD is used as swap Subject: z3fold: use per-cpu unbuddied lists Subject: mm, oom: do not rely on TIF_MEMDIE for memory reserves access Subject: mm: replace TIF_MEMDIE checks by tsk_is_oom_victim Subject: swap: choose swap device according to numa node Subject: mm: oom: let oom_reap_task and exit_mmap run concurrently Subject: mm: hugetlb: clear target sub-page last when clearing huge page Subject: mm: add /proc/pid/smaps_rollup Subject: x86,mpx: make mpx depend on x86-64 to free up VMA flag Subject: mm,fork: introduce MADV_WIPEONFORK
126 patches, based on 015a9e66b9b8c1f28097ed09bf9350708e26249a: - most of the rest of MM - a small number of misc things - lib/ updates - checkpatch - autofs updates - ipc/ updates Subject: mm: mempolicy: add queue_pages_required() Subject: mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1 Subject: mm: thp: introduce separate TTU flag for thp freezing Subject: mm: thp: introduce CONFIG_ARCH_ENABLE_THP_MIGRATION Subject: mm: thp: enable thp migration in generic path Subject: mm: thp: check pmd migration entry in common path Subject: mm: soft-dirty: keep soft-dirty bits over thp migration Subject: mm: mempolicy: mbind and migrate_pages support thp migration Subject: mm: migrate: move_pages() supports thp migration Subject: mm: memory_hotplug: memory hotremove supports thp migration Subject: hmm: heterogeneous memory management documentation Subject: mm/hmm: heterogeneous memory management (HMM for short) Subject: mm/hmm/mirror: mirror process address space on device with HMM helpers Subject: mm/hmm/mirror: helper to snapshot CPU page table Subject: mm/hmm/mirror: device page fault handler Subject: mm/memory_hotplug: introduce add_pages Subject: mm/ZONE_DEVICE: new type of ZONE_DEVICE for unaddressable memory Subject: mm/ZONE_DEVICE: special case put_page() for device private pages Subject: mm/memcontrol: allow to uncharge page without using page->lru field Subject: mm/memcontrol: support MEMORY_DEVICE_PRIVATE Subject: mm/hmm/devmem: device memory hotplug using ZONE_DEVICE Subject: mm/hmm/devmem: dummy HMM device for ZONE_DEVICE memory Subject: mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY Subject: mm/migrate: new memory migration helper for use with device memory Subject: mm/migrate: migrate_vma() unmap page from vma while collecting pages Subject: mm/migrate: support un-addressable ZONE_DEVICE page in migration Subject: mm/migrate: allow migrate_vma() to alloc new page on empty entry Subject: mm/device-public-memory: device memory cache coherent with CPU 
Subject: mm/hmm: add new helper to hotplug CDM memory region Subject: mm/hmm: avoid bloating arch that do not make use of HMM Subject: mm/hmm: fix build when HMM is disabled Subject: mm: remove useless vma parameter to offset_il_node Subject: userfaultfd: non-cooperative: closing the uffd without triggering SIGBUS Subject: mm/memory.c: remove reduntant check for write access Subject: mm: change the call sites of numa statistics items Subject: mm: update NUMA counter threshold size Subject: mm: consider the number in local CPUs when reading NUMA stats Subject: mm/mlock.c: use page_zone() instead of page_zone_id() Subject: mm/zsmalloc.c: change stat type parameter to int Subject: mm: fadvise: avoid fadvise for fs without backing device Subject: mm: memcontrol: use per-cpu stocks for socket memory uncharging Subject: mm/memory.c: fix mem_cgroup_oom_disable() call missing Subject: mm/sparse.c: fix typo in online_mem_sections Subject: tools/testing/selftests/kcmp/kcmp_test.c: add KCMP_EPOLL_TFD testing Subject: mm/page_alloc.c: apply gfp_allowed_mask before the first allocation attempt Subject: mm: kvfree the swap cluster info if the swap file is unsatisfactory Subject: mm/swapfile.c: fix swapon frontswap_map memory leak on error Subject: mm/mempolicy.c: remove BUG_ON() checks for VMA inside mpol_misplaced() Subject: fs, proc: remove priv argument from is_stack Subject: proc: uninline proc_create() Subject: fs, proc: unconditional cond_resched when reading smaps Subject: linux/kernel.h: move DIV_ROUND_DOWN_ULL() macro Subject: lib/string.c: add multibyte memset functions Subject: lib/string.c: add testcases for memset16/32/64 Subject: x86: implement memset16, memset32 & memset64 Subject: ARM: implement memset32 & memset64 Subject: alpha: add support for memset16 Subject: drivers/block/zram/zram_drv.c: convert to using memset_l Subject: drivers/scsi/sym53c8xx_2/sym_hipd.c: convert to use memset32 Subject: vga: optimise console scrolling Subject: treewide: make 
"nr_cpu_ids" unsigned Subject: arch: define CPU_BIG_ENDIAN for all fixed big endian archs Subject: arch/microblaze: add choice for endianness and update Makefile Subject: include: warn for inconsistent endian config definition Subject: bitops: avoid integer overflow in GENMASK(_ULL) Subject: rbtree: cache leftmost node internally Subject: rbtree: optimize root-check during rebalancing loop Subject: rbtree: add some additional comments for rebalancing cases Subject: lib/rbtree_test.c: make input module parameters Subject: lib/rbtree_test.c: add (inorder) traversal test Subject: lib/rbtree_test.c: support rb_root_cached Subject: sched/fair: replace cfs_rq->rb_leftmost Subject: sched/deadline: replace earliest dl and rq leftmost caching Subject: locking/rtmutex: replace top-waiter and pi_waiters leftmost caching Subject: block/cfq: replace cfq_rb_root leftmost caching Subject: lib/interval_tree: fast overlap detection Subject: lib/interval-tree: correct comment wrt generic flavor Subject: procfs: use faster rb_first_cached() Subject: fs/epoll: use faster rb_first_cached() Subject: mem/memcg: cache rightmost node Subject: block/cfq: cache rightmost rb_node Subject: lib/hexdump.c: return -EINVAL in case of error in hex2bin() Subject: lib: add test module for CONFIG_DEBUG_VIRTUAL Subject: lib/bitmap.c: make bitmap_parselist() thread-safe and much faster Subject: lib/test_bitmap.c: add test for bitmap_parselist() Subject: bitmap: introduce BITMAP_FROM_U64() Subject: lib/rhashtable: fix comment on locks_mul default value Subject: lib/string.c: check for kmalloc() failure Subject: lib/cmdline.c: remove meaningless comment Subject: radix-tree: must check __radix_tree_preload() return value Subject: lib/oid_registry.c: X.509: fix the buffer overflow in the utility function for OID string Subject: checkpatch: add --strict check for ifs with unnecessary parentheses Subject: checkpatch: fix typo in comment Subject: checkpatch: rename variables to avoid confusion Subject: 
checkpatch: add 6 missing types to --list-types Subject: binfmt_flat: delete two error messages for a failed memory allocation in decompress_exec() Subject: init: move stack canary initialization after setup_arch Subject: init/main.c: extract early boot entropy from the passed cmdline Subject: autofs: fix AT_NO_AUTOMOUNT not being honored Subject: autofs: make disc device user accessible Subject: autofs: make dev ioctl version and ismountpoint user accessible Subject: autofs: remove unused AUTOFS_IOC_EXPIRE_DIRECT/INDIRECT Subject: autofs: non functional header inclusion cleanup Subject: autofs: use AUTOFS_DEV_IOCTL_SIZE Subject: autofs: drop wrong comment Subject: autofs: use unsigned int/long instead of uint/ulong for ioctl args Subject: vfat: deduplicate hex2bin() Subject: test_kmod: remove paranoid UINT_MAX check on uint range processing Subject: test_kmod: flip INT checks to be consistent Subject: kmod: split out umh code into its own file Subject: MAINTAINERS: clarify kmod is just a kernel module loader Subject: kmod: split off umh headers into its own file Subject: kmod: move #ifdef CONFIG_MODULES wrapper to Makefile Subject: cpumask: make cpumask_next() out-of-line Subject: drivers/pps: aesthetic tweaks to PPS-related content Subject: drivers/pps: use surrounding "if PPS" to remove numerous dependency checks Subject: m32r: defconfig: cleanup from old Kconfig options Subject: mn10300: defconfig: cleanup from old Kconfig options Subject: sh: defconfig: cleanup from old Kconfig options Subject: kcov: support compat processes Subject: ipc: convert ipc_namespace.count from atomic_t to refcount_t Subject: ipc: convert sem_undo_list.refcnt from atomic_t to refcount_t Subject: ipc: convert kern_ipc_perm.refcount from atomic_t to refcount_t Subject: ipc/sem: drop sem_checkid helper Subject: ipc/sem: play nicer with large nsops allocations Subject: ipc: optimize semget/shmget/msgget for lots of keys
On Sat, 9 Sep 2017 10:40:21 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, Sep 8, 2017 at 6:27 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Git does all of this right. Your quilt scripts are garbage. Please
> > please start fixing this.
> >
> > I've worked around it by just editing the patch, but..
>
> .. and I just realized that your patches must obviously be ok in your
> tree, since you can apply them, and apparently Stephen can apply them
> in linux-next.
>
> I'm assuming Stephen applies them from your quilt series directly, and
> thus never saw the problem with bad locale conversion.
>
> Maybe we should just change the workflow, with you sending me a raw
> tar-ball of the quilt series (or whatever the equivalent quilt
> "bundle" is) as an attachment and we forego the traditional
> patch-bombing model?
>
> That would avoid the locale issues with email.
>
Leave it with me - I need to sit down and have a fiddle for a while. For
some reason I can't recall I had LOCALE=C set, and using en_US.UTF-8
changes things quite a lot.
And I need to figure out why the heck I did this:
iconv -f latin1 | mailx -s "$subject" "$all"
!
A few leftovers. Now with fixed up locale stuff, fingers crossed.
9 patches, based on 46c1e79fee417f151547aa46fae04ab06cb666f4:
Subject: idr: remove WARN_ON_ONCE() when trying to replace negative ID
Subject: drivers/media/cec/cec-adap.c: fix build with gcc-4.4.4
Subject: procfs: remove unused variable
Subject: lib/test_bitmap.c: use ULL suffix for 64-bit constants
Subject: fscache: fix fscache_objlist_show format processing
Subject: IB/mlx4: fix sprintf format warning
Subject: mm: treewide: remove GFP_TEMPORARY allocation flag
Subject: arm64: stacktrace: avoid listing stacktrace functions in stacktrace
Subject: mm, page_owner: skip unnecessary stack_trace entries
A lot of stuff, sorry about that. A week on a beach, then a bunch of time catching up then more time letting it bake in -next. Shan't do that again! 51 fixes, based on d81fa669e3de7eb8a631d7d95dac5fbcb2bf9d4e: Subject: alpha: fix build failures Subject: kernel/params.c: align add_sysfs_param documentation with code Subject: scripts/spelling.txt: add more spelling mistakes to spelling.txt Subject: include/linux/mm.h: fix typo in VM_MPX definition Subject: ksm: fix unlocked iteration over vmas in cmp_and_merge_page() Subject: mm, hugetlb, soft_offline: save compound page order before page migration Subject: sh: sh7722: remove nonexistent GPIO_PTQ7 to fix pinctrl registration Subject: sh: sh7757: remove nonexistent GPIO_PT[JLNQ]7_RESV to fix pinctrl registration Subject: sh: sh7264: remove nonexistent GPIO_PH[0-7] to fix pinctrl registration Subject: sh: sh7269: remove nonexistent GPIO_PH[0-7] to fix pinctrl registration Subject: z3fold: fix potential race in z3fold_reclaim_page Subject: mm, oom_reaper: skip mm structs with mmu notifiers Subject: mm, memcg: remove hotplug locking from try_charge Subject: mm/memcg: avoid page count check for zone device Subject: android: binder: drop lru lock in isolate callback Subject: mm,compaction: serialize waitqueue_active() checks (for real) Subject: z3fold: fix stale list handling Subject: mm: meminit: mark init_reserved_page as __meminit Subject: rapidio: remove global irq spinlocks from the subsystem Subject: mm: fix RODATA_TEST failure "rodata_test: test data was not read only" Subject: zram: fix null dereference of handle Subject: m32r: define CPU_BIG_ENDIAN Subject: mm: have filemap_check_and_advance_wb_err clear AS_EIO/AS_ENOSPC Subject: mm: avoid marking swap cached page as lazyfree Subject: mm: fix data corruption caused by lazyfree page Subject: mm/device-public-memory: fix edge case in _vm_normal_page() Subject: userfaultfd: non-cooperative: fix fork use after free Subject: exec: load_script: kill the onstack 
interp[BINPRM_BUF_SIZE] array Subject: exec: binfmt_misc: don't nullify Node->dentry in kill_node() Subject: exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode() Subject: exec: binfmt_misc: remove the confusing e->interp_file != NULL checks Subject: exec: binfmt_misc: fix race between load_misc_binary() and kill_node() Subject: exec: binfmt_misc: kill the onstack iname[BINPRM_BUF_SIZE] array Subject: lib/lz4: make arrays static const, reduces object code size Subject: include/linux/bitfield.h: remove 32bit from FIELD_GET comment block Subject: kernel/sysctl.c: remove duplicate UINT_MAX check on do_proc_douintvec_conv() Subject: mm: memcontrol: use vmalloc fallback for large kmem memcg arrays Subject: lib/idr.c: fix comment for idr_replace() Subject: mm, memory_hotplug: add scheduling point to __add_pages Subject: mm, page_alloc: add scheduling point to memmap_init_zone Subject: memremap: add scheduling point to devm_memremap_pages Subject: kernel/kcmp.c: drop branch leftover typo Subject: mm/memory_hotplug: change pfn_to_section_nr/section_nr_to_pfn macro to inline function Subject: mm/memory_hotplug: define find_{smallest|biggest}_section_pfn as unsigned long Subject: kernel/params.c: fix the maximum length in param_get_string Subject: kernel/params.c: fix an overflow in param_attr_show Subject: kernel/params.c: improve STANDARD_PARAM_DEF readability Subject: lib/ratelimit.c: use deferred printk() version Subject: m32r: fix build failure Subject: checkpatch: fix ignoring cover-letter logic Subject: include/linux/fs.h: fix comment about struct address_space
18 fixes, based on 997301a860fca1a05ab8e383a8039b65f8abeb1e:
Subject: mm/migrate: fix indexing bug (off by one) and avoid out of bound access
Subject: lib/Kconfig.debug: kernel hacking menu: runtime testing: keep tests together
Subject: mm/madvise.c: add description for MADV_WIPEONFORK and MADV_KEEPONFORK
Subject: include/linux/of.h: provide of_n_{addr,size}_cells wrappers for !CONFIG_OF
Subject: mm/mempolicy: fix NUMA_INTERLEAVE_HIT counter
Subject: mm: remove unnecessary WARN_ONCE in page_vma_mapped_walk().
Subject: mm: only display online cpus of the numa node
Subject: userfaultfd: selftest: exercise -EEXIST only in background transfer
Subject: scripts/kallsyms.c: ignore symbol type 'n'
Subject: mm/cma.c: take __GFP_NOWARN into account in cma_alloc()
Subject: Revert "vmalloc: back off when the current task is killed"
Subject: tty: fall back to N_NULL if switching to N_TTY fails during hangup
Subject: linux/kernel.h: add/correct kernel-doc notation
Subject: fs/mpage.c: fix mpage_writepage() for pages with buffers
Subject: fs/binfmt_misc.c: node could be NULL when evicting inode
Subject: kmemleak: clear stale pointers from task stacks
Subject: mm: page_vma_mapped: ensure pmd is loaded with READ_ONCE outside of lock
Subject: mm, swap: use page-cluster as max window of VMA based swap readahead
7 fixes, based on 5cb0512c02ecd7e6214e912e4c150f4219ac78e0:
Subject: userfaultfd: hugetlbfs: prevent UFFDIO_COPY to fill beyond the end of i_size
Subject: mm, /proc/pid/pagemap: fix soft dirty marking for PMD migration entry
Subject: ocfs2: fstrim: Fix start offset of first cluster group during fstrim
Subject: fs/hugetlbfs/inode.c: fix hwpoison reserve accounting
Subject: initramfs: fix initramfs rebuilds w/ compression after disabling
Subject: mm/huge_memory.c: deposit page table when copying a PMD migration entry
Subject: mm, swap: fix race between swap count continuation operations
2 fixes, based on 3fefc31843cfe2b5f072efe11ed9ccaf6a7a5092:
Subject: sysctl: add register_sysctl() dummy helper
Subject: MAINTAINERS: update TPM driver infrastructure changes
- a few misc bits - ocfs2 updates - almost all of MM 131 patches, based on c9b012e5f4a1d01dfa8abc6318211a67ba7d5db2: Subject: bloat-o-meter: provide 3 different arguments for data, function and All Subject: m32r: fix endianness constraints Subject: ocfs2: remove unused declaration ocfs2_publish_get_mount_state() Subject: ocfs2: no need flush workqueue before destroying it Subject: ocfs2: cleanup unused func declaration and assignment Subject: ocfs2: fix cluster hang after a node dies Subject: ocfs2: clean up some unused function declarations Subject: ocfs2: should wait dio before inode lock in ocfs2_setattr() Subject: ocfs2: ip_alloc_sem should be taken in ocfs2_get_block() Subject: ocfs2: subsystem.su_mutex is required while accessing the item->ci_parent Subject: ocfs2/dlm: get mle inuse only when it is initialized Subject: ocfs2: remove unneeded goto in ocfs2_reserve_cluster_bitmap_bits() Subject: tools: slabinfo: add "-U" option to show unreclaimable slabs only Subject: mm: slabinfo: remove CONFIG_SLABINFO Subject: mm: oom: show unreclaimable slab info when unreclaimable slabs > user memory Subject: mm/slob.c: remove an unnecessary check for __GFP_ZERO Subject: mm/slab.c: only set __GFP_RECLAIMABLE once Subject: slab, slub, slob: add slab_flags_t Subject: slab, slub, slob: convert slab_flags_t to 32-bit Subject: slub: fix sysfs duplicate filename creation when slub_debug=O Subject: include/linux/slab.h: add kmalloc_array_node() and kcalloc_node() Subject: block/blk-mq.c: use kmalloc_array_node() Subject: drivers/infiniband/hw/qib/qib_init.c: use kmalloc_array_node() Subject: drivers/infiniband/sw/rdmavt/qp.c: use kmalloc_array_node() Subject: mm/mempool.c: use kmalloc_array_node() Subject: net/rds/ib_fmr.c: use kmalloc_array_node() Subject: mm: update comments for struct page.mapping Subject: zram: set BDI_CAP_STABLE_WRITES once Subject: bdi: introduce BDI_CAP_SYNCHRONOUS_IO Subject: mm, swap: introduce SWP_SYNCHRONOUS_IO Subject: mm, swap: skip swapcache for 
swapin of synchronous device Subject: mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference Subject: mm, swap: fix false error message in __swp_swapcount() Subject: mm/page-writeback.c: remove unused parameter from balance_dirty_pages() Subject: mm: drop migrate type checks from has_unmovable_pages Subject: mm: distinguish CMA and MOVABLE isolation in has_unmovable_pages() Subject: mm, page_alloc: fail has_unmovable_pages when seeing reserved pages Subject: mm, memory_hotplug: do not fail offlining too early Subject: mm, memory_hotplug: remove timeout from __offline_memory Subject: mm/memblock.c: make the index explicit argument of for_each_memblock_type Subject: mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical Subject: zram: add zstd to the supported algorithms list Subject: zram: remove zlib from the list of recommended algorithms Subject: fs/hugetlbfs/inode.c: remove redundant -ENIVAL return from hugetlbfs_setattr() Subject: mm/hmm: constify hmm_devmem_page_get_drvdata() parameter Subject: zsmalloc: calling zs_map_object() from irq is a bug Subject: mm/mmu_notifier: avoid double notification when it is useless Subject: mm/mmu_notifier: avoid call to invalidate_range() in range_end() Subject: mm: remove unused pgdat->inactive_ratio Subject: mm/swap_slots.c: fix race conditions in swap_slots cache init Subject: mm, arch: remove empty_bad_page* Subject: mm/cma.c: change pr_info to pr_err for cma_alloc fail log Subject: mm/page_owner.c: reduce page_owner structure size Subject: mm: implement find_get_pages_range_tag() Subject: btrfs: use pagevec_lookup_range_tag() Subject: ceph: use pagevec_lookup_range_tag() Subject: ext4: use pagevec_lookup_range_tag() Subject: f2fs: use pagevec_lookup_range_tag() Subject: f2fs: simplify page iteration loops Subject: f2fs: use find_get_pages_tag() for looking up single page Subject: gfs2: use pagevec_lookup_range_tag() Subject: nilfs2: use pagevec_lookup_range_tag() 
Subject: mm: use pagevec_lookup_range_tag() in __filemap_fdatawait_range() Subject: mm: use pagevec_lookup_range_tag() in write_cache_pages() Subject: mm: add variant of pagevec_lookup_range_tag() taking number of pages Subject: ceph: use pagevec_lookup_range_nr_tag() Subject: mm: remove nr_pages argument from pagevec_lookup_{,range}_tag() Subject: afs: use find_get_pages_range_tag() Subject: cifs: use find_get_pages_range_tag() Subject: kmemleak: change /sys/kernel/debug/kmemleak permissions from 0444 to 0644 Subject: mm: account pud page tables Subject: mm: introduce wrappers to access mm->nr_ptes Subject: mm: consolidate page table accounting Subject: fs, mm: account filp cache to kmemcg Subject: mm/rmap.c: remove redundant variable cend Subject: kmemcheck: remove annotations Subject: kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACK Subject: kmemcheck: remove whats left of NOTRACK flags Subject: kmemcheck: rip it out Subject: mm/swap_state.c: declare a few variables as __read_mostly Subject: mm: deferred_init_memmap improvements Subject: x86/mm: set fields in deferred pages Subject: sparc64/mm: set fields in deferred pages Subject: sparc64: simplify vmemmap_populate Subject: mm: define memblock_virt_alloc_try_nid_raw Subject: mm: zero reserved and unavailable struct pages Subject: x86/mm/kasan: don't use vmemmap_populate() to initialize shadow Subject: arm64/mm/kasan: don't use vmemmap_populate() to initialize shadow Subject: mm: stop zeroing memory during allocation in vmemmap Subject: sparc64: optimize struct page zeroing Subject: mm/page_alloc: make sure __rmqueue() etc are always inline Subject: userfaultfd: use mmgrab instead of open-coded increment of mm_count Subject: mm, soft_offline: improve hugepage soft offlining error log Subject: mm/page-writeback.c: convert timers to use timer_setup() Subject: drivers/block/zram/zram_drv.c: make zram_page_end_io() static Subject: mm: speed up cancel_dirty_page() for clean pages Subject: mm: refactor 
truncate_complete_page() Subject: mm: factor out page cache page freeing into a separate function Subject: mm: move accounting updates before page_cache_tree_delete() Subject: mm: move clearing of page->mapping to page_cache_tree_delete() Subject: mm: factor out checks and accounting from __delete_from_page_cache() Subject: mm: batch radix tree operations when truncating pages Subject: mm, page_alloc: enable/disable IRQs once when freeing a list of pages Subject: mm, truncate: do not check mapping for every page being truncated Subject: mm, truncate: remove all exceptional entries from pagevec under one lock Subject: mm: only drain per-cpu pagevecs once per pagevec usage Subject: mm, pagevec: remove cold parameter for pagevecs Subject: mm: remove cold parameter for release_pages Subject: mm: remove cold parameter from free_hot_cold_page* Subject: mm: remove __GFP_COLD Subject: mm, page_alloc: simplify list handling in rmqueue_bulk() Subject: mm, pagevec: rename pagevec drained field Subject: Unify migrate_pages and move_pages access checks Subject: shmem: convert shmem_init_inodecache() to void Subject: mm, sysctl: make NUMA stats configurable Subject: mm: mlock: remove lru_add_drain_all() Subject: mm, page_alloc: fix potential false positive in __zone_watermark_ok Subject: fs: fuse: account fuse_inode slab memory as reclaimable Subject: mm: don't warn about allocations which stall for too long Subject: mm/page_alloc.c: broken deferred calculation Subject: mm/shmem.c: mark expected switch fall-through Subject: mm/list_lru.c: mark expected switch fall-through Subject: mm/hmm: remove redundant variable align_end Subject: mm, sparse: do not swamp log with huge vmemmap allocation failures Subject: mm: do not rely on preempt_count in print_vma_addr Subject: writeback: remove unused function parameter Subject: mm/page_ext.c: check if page_ext is not prepared Subject: mm,oom_reaper: remove pointless kthread_run() error check Subject: mm: simplify nodemask printing 
Subject: mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP Subject: memory hotplug: fix comments when adding section
- a bit more MM - procfs updates - dynamic-debug fixes - lib/ updates - checkpatch - epoll - nilfs2 - signals - rapidio - PID management cleanup and optimization - kcov updates - sysvipc updates - quite a few misc things all over the place 94 patches, based on a3841f94c7ecb3ede0f888d3fcfe8fb6368ddd7a: Subject: mm: fix nodemask printing Subject: mm/z3fold.c: use kref to prevent page free/compact race Subject: lib/dma-debug.c: fix incorrect pfn calculation Subject: mm: shmem: remove unused info variable Subject: mm, compaction: kcompactd should not ignore pageblock skip Subject: mm, compaction: persistently skip hugetlbfs pageblocks Subject: mm, compaction: extend pageblock_skip_persistent() to all compound pages Subject: mm, compaction: split off flag for not updating skip hints Subject: mm, compaction: remove unneeded pageblock_skip_persistent() checks Subject: proc, coredump: add CoreDumping flag to /proc/pid/status Subject: proc: : uninline name_to_int() Subject: proc: use do-while in name_to_int() Subject: spelling.txt: add "unnecessary" typo variants Subject: sh/boot: add static stack-protector to pre-kernel Subject: kernel debug: support resetting WARN*_ONCE Subject: kernel debug: support resetting WARN_ONCE for all architectures Subject: parse-maintainers: add ability to specify filenames Subject: iopoll: avoid -Wint-in-bool-context warning Subject: lkdtm: include WARN format string Subject: bug: define the "cut here" string in a single place Subject: bug: fix "cut here" location for __WARN_TAINT architectures Subject: include/linux/compiler-clang.h: handle randomizable anonymous structs Subject: kernel/umh.c: optimize 'proc_cap_handler()' Subject: dynamic-debug-howto: fix optional/omitted ending line number to be LARGE instead of 0 Subject: dynamic_debug documentation: minor fixes Subject: get_maintainer: add --self-test for internal consistency tests Subject: get_maintainer: add more --self-test options Subject: include/linux/bitfield.h: include 
<linux/build_bug.h> instead of <linux/bug.h> Subject: include/linux/radix-tree.h: remove unneeded #include <linux/bug.h> Subject: lib: add module support to string tests Subject: lib/test: delete five error messages for failed memory allocations Subject: lib/int_sqrt: optimize small argument Subject: lib/int_sqrt: optimize initial value compute Subject: lib/int_sqrt: adjust comments Subject: lib/genalloc.c: make the avail variable an atomic_long_t Subject: lib/nmi_backtrace.c: fix kernel text address leak Subject: tools/lib/traceevent/parse-filter.c: clean up clang build warning Subject: lib/rbtree-test: lower default params Subject: lib: test module for find_*_bit() functions Subject: checkpatch: support function pointers for unnamed function definition arguments Subject: scripts/checkpatch.pl: avoid false warning missing break Subject: checkpatch: printks always need a KERN_<LEVEL> Subject: checkpatch: allow DEFINE_PER_CPU definitions to exceed line length Subject: checkpatch: add TP_printk to list of logging functions Subject: checkpatch: add --strict test for lines ending in [ or ( Subject: checkpatch: do not check missing blank line before builtin_*_driver Subject: epoll: account epitem and eppoll_entry to kmemcg Subject: epoll: avoid calling ep_call_nested() from ep_poll_safewake() Subject: epoll: remove ep_call_nested() from ep_eventpoll_poll() Subject: init/version.c: include <linux/export.h> instead of <linux/module.h> Subject: autofs: don't fail mount for transient error Subject: pipe: match pipe_max_size data type with procfs Subject: pipe: avoid round_pipe_size() nr_pages overflow on 32-bit Subject: pipe: add proc_dopipe_max_size() to safely assign pipe_max_size Subject: sysctl: check for UINT_MAX before unsigned int min/max Subject: fs/nilfs2: convert timers to use timer_setup() Subject: nilfs2: fix race condition that causes file system corruption Subject: fs, nilfs: convert nilfs_root.count from atomic_t to refcount_t Subject: nilfs2: align block 
comments of nilfs_sufile_truncate_range() at * Subject: nilfs2: use octal for unreadable permission macro Subject: nilfs2: remove inode->i_version initialization Subject: hfs/hfsplus: clean up unused variables in bnode.c Subject: fat: remove redundant assignment of 0 to slots Subject: kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL Subject: kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals Subject: kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal() Subject: kdump: print a message in case parse_crashkernel_mem resulted in zero bytes Subject: rapidio: constify rio_device_id Subject: drivers/rapidio/devices/rio_mport_cdev.c: fix resource leak in error handling path in 'rio_dma_transfer()' Subject: drivers/rapidio/devices/rio_mport_cdev.c: fix error handling in 'rio_dma_transfer()' Subject: Documentation/sysctl/vm.txt: fix typo Subject: kernel/sysctl.c: code cleanups Subject: pid: replace pid bitmap implementation with IDR API Subject: pid: remove pidhash Subject: kernel/panic.c: add TAINT_AUX Subject: kcov: remove pointless current != NULL check Subject: kcov: support comparison operands collection Subject: Makefile: support flag -fsanitizer-coverage=trace-cmp Subject: kcov: update documentation Subject: kernel/reboot.c: add devm_register_reboot_notifier() Subject: drivers/watchdog: make use of devm_register_reboot_notifier() Subject: initramfs: use time64_t timestamps Subject: sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE Subject: sysvipc: duplicate lock comments wrt ipc_addid() Subject: sysvipc: properly name ipc_addid() limit parameter Subject: sysvipc: make get_maxid O(1) again Subject: mm: add infrastructure for get_user_pages_fast() benchmarking Subject: drivers/pcmcia/sa1111_badge4.c: avoid unused function warning Subject: arch/ia64/include/asm/topology.h: remove unused parent_node() macro Subject: arch/sh/include/asm/topology.h: remove unused 
parent_node() macro Subject: arch/sparc/include/asm/topology_64.h: remove unused parent_node() macro Subject: arch/tile/include/asm/topology.h: remove unused parent_node() macro Subject: include/asm-generic/topology.h: remove unused parent_node() macro Subject: EXPERT Kconfig menu: fix broken EXPERT menu
28 fixes, based on 43570f0383d6d5879ae585e6c3cf027ba321546f:

Subject: mm, memory_hotplug: do not back off draining pcp free pages from kworker context
Subject: mm, oom_reaper: gather each vma to prevent leaking TLB entry
Subject: mm/cma: fix alloc_contig_range ret code/potential leak
Subject: mm: fix device-dax pud write-faults triggered by get_user_pages()
Subject: mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITE
Subject: mm: replace pud_write with pud_access_permitted in fault + gup paths
Subject: mm: replace pmd_write with pmd_access_permitted in fault + gup paths
Subject: mm: replace pte_write with pte_access_permitted in fault + gup paths
Subject: scripts/faddr2line: extend usage on generic arch
Subject: mm, hugetlbfs: introduce ->split() to vm_operations_struct
Subject: device-dax: implement ->split() to catch invalid munmap attempts
Subject: mm: introduce get_user_pages_longterm
Subject: mm: fail get_vaddr_frames() for filesystem-dax mappings
Subject: v4l2: disable filesystem-dax mapping support
Subject: IB/core: disable memory registration of filesystem-dax vmas
Subject: exec: avoid RLIMIT_STACK races with prlimit()
Subject: mm/madvise.c: fix madvise() infinite loop under special circumstances
Subject: Revert "mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical"
Subject: fs/mbcache.c: make count_objects() more robust
Subject: scripts/bloat-o-meter: don't fail with division by 0
Subject: kmemleak: add scheduling point to kmemleak_scan()
Subject: mm: migrate: fix an incorrect call of prep_transhuge_page()
Subject: mm, memcg: fix mem_cgroup_swapout() for THPs
Subject: fs/fat/inode.c: fix sb_rdonly() change
Subject: autofs: revert "autofs: take more care to not update last_used on path walk"
Subject: autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
Subject: mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine
Subject: fs/hugetlbfs/inode.c: change put_page/unlock_page order in hugetlbfs_fallocate()
17 fixes, based on 7c5cac1bc7170bfc726a69eb64947c55658d16ad:

Subject: include/linux/idr.h: add #include <linux/bug.h>
Subject: lib/rbtree,drm/mm: add rbtree_replace_node_cached()
Subject: mm/kmemleak.c: make cond_resched() rate-limiting more efficient
Subject: string.h: workaround for increased stack usage
Subject: autofs: fix careless error in recent commit
Subject: exec: avoid gcc-8 warning for get_task_comm
Subject: Documentation/vm/zswap.txt: update with same-value filled page feature
Subject: scripts/faddr2line: fix CROSS_COMPILE unset error
Subject: mm/memory.c: mark wp_huge_pmd() inline to prevent build failure
Subject: mm/page_alloc.c: avoid excessive IRQ disabled times in free_unref_page_list()
Subject: mm/slab.c: do not hash pointers when debugging slab
Subject: kcov: fix comparison callback signature
Subject: tools/slabinfo-gnuplot: force to use bash shell
Subject: mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()'
Subject: kernel: make groups_sort calling a responsibility group_info allocators
Subject: mm, oom_reaper: fix memory corruption
Subject: arch: define weak abort()
9 fixes, based on e1915c8195b38393005be9b74bfa6a3a367c83b3:

Subject: mm: check pfn_valid first in zero_resv_unavail
Subject: kernel/acct.c: fix the acct->needcheck check in check_free_space()
Subject: mm/mprotect: add a cond_resched() inside change_pmd_range()
Subject: kernel/exit.c: export abort() to modules
Subject: mm/debug.c: provide useful debugging information for VM_BUG
Subject: mm/zsmalloc.c: include fs.h
Subject: mm/sparse.c: wrong allocation for mem_section
Subject: userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails
Subject: mailmap: update Mark Yao's email address
4 fixes, based on 1545dec46db3858bbce84c2065b579e2925706ab:

Subject: MAINTAINERS, nilfs2: change project home URLs
Subject: kmemleak: allow to coexist with fault injection
Subject: kdump: write correct address of mem_section into vmcoreinfo
Subject: tools/objtool/Makefile: don't assume sync-check.sh is executable
6 fixes, based on dda3e15231b35840fe6f0973f803cc70ddb86281:

Subject: mm/memory.c: release locked page in do_swap_page()
Subject: mm/page_owner.c: remove drain_all_pages from init_early_allocated_pages
Subject: scripts/decodecode: fix decoding for AArch64 (arm64) instructions
Subject: scripts/gdb/linux/tasks.py: fix get_thread_info
Subject: proc: fix coredump vs read /proc/*/stat race
Subject: sparse doesn't support struct randomization
- misc fixes - ocfs2 updates - most of MM 119 patches, based on 7b1cd95d65eb3b1e13f8a90eb757e0ea232c7899: Subject: fs/dax.c: release PMD lock even when there is no PMD support in DAX Subject: tools: fix cross-compile var clobbering Subject: scripts/decodecode: make it take multiline Code line Subject: scripts/tags.sh: change find_other_sources() for include directories Subject: m32r: remove abort() Subject: fs/ocfs2/dlm/dlmmaster.c: clean up dead code Subject: ocfs2/cluster: neaten a member of o2net_msg_handler Subject: ocfs2: give an obvious tip for mismatched cluster names Subject: ocfs2/cluster: close a race that fence can't be triggered Subject: ocfs2: use the OCFS2_XATTR_ROOT_SIZE macro in ocfs2_reflink_xattr_header() Subject: ocfs2: clean dead code in suballoc.c Subject: ocfs2: return -EROFS to mount.ocfs2 if inode block is invalid Subject: ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE Subject: ocfs2/xattr: assign errno to 'ret' in ocfs2_calc_xattr_init() Subject: ocfs2: clean up dead code in alloc.c Subject: ocfs2/acl: use 'ip_xattr_sem' to protect getting extended attribute Subject: ocfs2: make metadata estimation accurate and clear Subject: ocfs2: try to reuse extent block in dealloc without meta_alloc Subject: ocfs2: add trimfs dlm lock resource Subject: ocfs2: add trimfs lock to avoid duplicated trims in cluster Subject: ocfs2: add ocfs2_try_rw_lock() and ocfs2_try_inode_lock() Subject: ocfs2: add ocfs2_overwrite_io() Subject: ocfs2: nowait aio support Subject: ocfs2: unlock bh_state if bg check fails Subject: ocfs2: return error when we attempt to access a dirty bh in jbd2 Subject: mm/slab_common.c: make calculate_alignment() static Subject: mm/slab.c: remove redundant assignments for slab_state Subject: mm/slub.c: fix wrong address during slab padding restoration Subject: slub: remove obsolete comments of put_cpu_partial() Subject: include/linux/sched/mm.h: uninline mmdrop_async(), etc Subject: mm: kmemleak: remove unused hardirq.h 
Subject: zswap: same-filled pages handling Subject: mm: relax deferred struct page requirements Subject: mm/mempolicy: remove redundant check in get_nodes Subject: mm/mempolicy: fix the check of nodemask from user Subject: mm/mempolicy: add nodes_empty check in SYSC_migrate_pages Subject: mm: drop hotplug lock from lru_add_drain_all() Subject: mm: show total hugetlb memory consumption in /proc/meminfo Subject: mm: use sc->priority for slab shrink targets Subject: mm: split deferred_init_range into initializing and freeing parts Subject: mm/filemap.c: remove include of hardirq.h Subject: mm: memcontrol: eliminate raw access to stat and event counters Subject: mm: memcontrol: implement lruvec stat functions on top of each other Subject: mm: memcontrol: fix excessive complexity in memory.stat reporting Subject: mm/page_owner.c: use PTR_ERR_OR_ZERO() Subject: mm/page_alloc.c: fix comment in __get_free_pages() Subject: mm: do not stall register_shrinker() Subject: selftests/vm: move 128TB mmap boundary test to generic directory Subject: mm/interval_tree.c: use vma_pages() helper Subject: mm: remove unused pgdat_reclaimable_pages() Subject: mm, hugetlb: remove hugepages_treat_as_movable sysctl Subject: mm/memory_hotplug.c: remove unnecesary check from register_page_bootmem_info_section() Subject: mm: update comment describing tlb_gather_mmu Subject: fs/proc/task_mmu.c: do not show VmExe bigger than total executable virtual memory Subject: mm: memory_hotplug: remove second __nr_to_section in register_page_bootmem_info_section() Subject: mm/huge_memory.c: fix comment in __split_huge_pmd_locked Subject: mm, userfaultfd, THP: avoid waiting when PMD under THP migration Subject: mm: add unmap_mapping_pages() Subject: mm: get 7% more pages in a pagevec Subject: asm-generic: provide generic_pmdp_establish() Subject: arc: use generic_pmdp_establish as pmdp_establish Subject: arm/mm: provide pmdp_establish() helper Subject: arm64: provide pmdp_establish() helper Subject: mips: use 
generic_pmdp_establish as pmdp_establish Subject: powerpc/mm: update pmdp_invalidate to return old pmd value Subject: s390/mm: modify pmdp_invalidate to return old value. Subject: sparc64: update pmdp_invalidate() to return old pmd value Subject: x86/mm: provide pmdp_establish() helper Subject: mm: do not lose dirty and accessed bits in pmdp_invalidate() Subject: mm: use updated pmdp_invalidate() interface to track dirty/accessed bits Subject: mm/thp: remove pmd_huge_split_prepare() Subject: mm: thp: use down_read_trylock() in khugepaged to avoid long block Subject: mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks Subject: mm, oom: avoid reaping only for mm's with blockable invalidate callbacks Subject: mm/zsmalloc: simplify shrinker init/destroy Subject: mm: align struct page more aesthetically Subject: mm: de-indent struct page Subject: mm: remove misleading alignment claims Subject: mm: improve comment on page->mapping Subject: mm: introduce _slub_counter_t Subject: mm: store compound_dtor / compound_order as bytes Subject: mm: document how to use struct page Subject: mm: remove reference to PG_buddy Subject: shmem: unexport shmem_add_seals()/shmem_get_seals() Subject: shmem: rename functions that are memfd-related Subject: hugetlb: expose hugetlbfs_inode_info in header Subject: hugetlb: implement memfd sealing Subject: shmem: add sealing support to hugetlb-backed memfd Subject: memfd-test: test hugetlbfs sealing Subject: memfd-test: add 'memfd-hugetlb:' prefix when testing hugetlbfs Subject: memfd-test: move common code to a shared unit Subject: memfd-test: run fuse test on hugetlb backend memory Subject: userfaultfd: convert to use anon_inode_getfd() Subject: mm: pin address_space before dereferencing it while isolating an LRU page Subject: mm/fadvise: discard partial page if endbyte is also EOF Subject: zswap: only save zswap header when necessary Subject: memcg: refactor mem_cgroup_resize_limit() Subject: mm/page_alloc.c: fix 
typos in comments Subject: mm/page_owner.c: clean up init_pages_in_zone() Subject: zsmalloc: use U suffix for negative literals being shifted Subject: mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it Subject: mm/compaction.c: fix comment for try_to_compact_pages() Subject: include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer Subject: mm/hmm: fix uninitialized use of 'entry' in hmm_vma_walk_pmd() Subject: mm/memcontrol.c: make local symbol static Subject: mm/memcontrol.c: try harder to decrease [memory,memsw].limit_in_bytes Subject: mm, hugetlb: unify core page allocation accounting and initialization Subject: mm, hugetlb: integrate giga hugetlb more naturally to the allocation path Subject: mm, hugetlb: do not rely on overcommit limit during migration Subject: mm, hugetlb: get rid of surplus page accounting tricks Subject: mm, hugetlb: further simplify hugetlb allocation API Subject: hugetlb, mempolicy: fix the mbind hugetlb migration Subject: hugetlb, mbind: fall back to default policy if vma is NULL Subject: mm: numa: do not trap faults on shared data section pages. Subject: mm: correct comments regarding do_fault_around() Subject: mm, memory_hotplug: fix memmap initialization Subject: mm/swap.c: make functions and their kernel-doc agree Subject: tools, vm: new option to specify kpageflags file Subject: mm: remove PG_highmem description
And... [002/119] seems to have just disappeared. It was a standalone thing, I'll resend next time.
- kasan updates - procfs - lib/bitmap updates - other lib/ updates - checkpatch tweaks - rapidio - ubsan - pipe fixes and cleanups - lots of other misc bits 114 patches, based on e237f98a9c134c3d600353f21e07db915516875b: Subject: kasan: don't emit builtin calls when sanitization is off Subject: kasan: add compiler support for clang Subject: kasan/Makefile: support LLVM style asan parameters Subject: kasan: support alloca() poisoning Subject: kasan: add tests for alloca poisoning Subject: kasan: add functions for unpoisoning stack variables Subject: kasan: detect invalid frees for large objects Subject: kasan: don't use __builtin_return_address(1) Subject: kasan: detect invalid frees for large mempool objects Subject: kasan: unify code between kasan_slab_free() and kasan_poison_kfree() Subject: kasan: detect invalid frees Subject: kasan: fix prototype author email address Subject: kasan: clean up KASAN_SHADOW_SCALE_SHIFT usage Subject: kasan: remove redundant initialization of variable 'real_size' Subject: proc: use %u for pid printing and slightly less stack Subject: proc: don't use READ_ONCE/WRITE_ONCE for /proc/*/fail-nth Subject: proc: fix /proc/*/map_files lookup Subject: fs/proc/vmcore.c: simpler /proc/vmcore cleanup Subject: proc: less memory for /proc/*/map_files readdir Subject: fs/proc/array.c: delete children_seq_release() Subject: fs/proc/kcore.c: use probe_kernel_read() instead of memcpy() Subject: fs/proc/internal.h: rearrange struct proc_dir_entry Subject: fs/proc/internal.h: fix up comment Subject: fs/proc: use __ro_after_init Subject: proc: spread likely/unlikely a bit Subject: proc: rearrange args Subject: fs/proc/consoles.c: use seq_putc() in show_console_dev() Subject: Makefile: move stack-protector compiler breakage test earlier Subject: Makefile: move stack-protector availability out of Kconfig Subject: Makefile: introduce CONFIG_CC_STACKPROTECTOR_AUTO Subject: uuid: cleanup <uapi/linux/uuid.h> Subject: tools/lib/subcmd/pager.c: do not alias 
select() params Subject: kernel/async.c: revert "async: simplify lowest_in_progress()" Subject: MAINTAINERS: update sboyd's email address Subject: bitmap: new bitmap_copy_safe and bitmap_{from,to}_arr32 Subject: bitmap: replace bitmap_{from,to}_u32array Subject: lib/test_bitmap.c: add bitmap_zero()/bitmap_clear() test cases Subject: lib/test_bitmap.c: add bitmap_fill()/bitmap_set() test cases Subject: lib/test_bitmap.c: clean up test_zero_fill_copy() test case and rename Subject: include/linux/bitmap.h: make bitmap_fill() and bitmap_zero() consistent Subject: lib/stackdepot.c: use a non-instrumented version of memcmp() Subject: lib/test_find_bit.c: rename to find_bit_benchmark.c Subject: lib/find_bit_benchmark.c: improvements Subject: lib: optimize cpumask_next_and() Subject: lib/: make RUNTIME_TESTS a menuconfig to ease disabling it all Subject: lib/test_sort.c: add module unload support Subject: checkpatch: allow long lines containing URL Subject: checkpatch: ignore some octal permissions of 0 Subject: checkpatch: improve quoted string and line continuation test Subject: checkpatch: add a few DEVICE_ATTR style tests Subject: checkpatch: improve the TABSTOP test to include declarations Subject: checkpatch: exclude drivers/staging from if with unnecessary parentheses test Subject: checkpatch: avoid some false positives for TABSTOP declaration test Subject: checkpatch: improve OPEN_BRACE test Subject: elf: fix NT_FILE integer overflow Subject: kallsyms: let print_ip_sym() print raw addresses Subject: nilfs2: use time64_t internally Subject: hfsplus: honor setgid flag on directories Subject: <asm-generic/siginfo.h>: fix language in comments Subject: kernel/fork.c: check error and return early Subject: kernel/fork.c: add comment about usage of CLONE_FS flags and namespaces Subject: cpumask: make cpumask_size() return "unsigned int" Subject: rapidio: delete an error message for a failed memory allocation in rio_init_mports() Subject: rapidio: adjust 12 checks for null 
pointers Subject: rapidio: adjust five function calls together with a variable assignment Subject: rapidio: improve a size determination in five functions Subject: rapidio: delete an unnecessary variable initialisation in three functions Subject: rapidio: return an error code only as a constant in two functions Subject: rapidio: move 12 EXPORT_SYMBOL_GPL() calls to function implementations Subject: drivers/rapidio/devices/tsi721_dma.c: delete an error message for a failed memory allocation in tsi721_alloc_chan_resources() Subject: drivers/rapidio/devices/tsi721_dma.c: delete an unnecessary variable initialisation in tsi721_alloc_chan_resources() Subject: drivers/rapidio/devices/tsi721_dma.c: adjust six checks for null pointers Subject: pids: introduce find_get_task_by_vpid() helper Subject: pps: parport: use timespec64 instead of timespec Subject: kernel/relay.c: revert "kernel/relay.c: fix potential memory leak" Subject: kcov: detect double association with a single task Subject: include/linux/genl_magic_func.h: remove own BUILD_BUG_ON*() defines Subject: build_bug.h: remove BUILD_BUG_ON_NULL() Subject: lib/ubsan.c: s/missaligned/misaligned/ Subject: lib/ubsan: add type mismatch handler for new GCC/Clang Subject: lib/ubsan: remove returns-nonnull-attribute checks Subject: ipc: fix ipc data structures inconsistency Subject: ipc/mqueue.c: have RT tasks queue in by priority in wq_add() Subject: arch/score/kernel/setup.c: combine two seq_printf() calls into one call in show_cpuinfo() Subject: vfs: remove might_sleep() from clear_inode() Subject: mm/userfaultfd.c: remove duplicate include Subject: mm: remove unneeded kallsyms include Subject: hrtimer: remove unneeded kallsyms include Subject: genirq: remove unneeded kallsyms include Subject: mm/memblock: memblock_is_map/region_memory can be boolean Subject: lib/lockref: __lockref_is_dead can be boolean Subject: kernel/cpuset: current_cpuset_is_being_rebound can be boolean Subject: kernel/resource: iomem_is_exclusive 
can be boolean Subject: kernel/module: module_is_live can be boolean Subject: kernel/mutex: mutex_is_locked can be boolean Subject: crash_dump: is_kdump_kernel can be boolean Subject: kasan: rework Kconfig settings Subject: pipe, sysctl: drop 'min' parameter from pipe-max-size converter Subject: pipe, sysctl: remove pipe_proc_fn() Subject: pipe: actually allow root to exceed the pipe buffer limits Subject: pipe: fix off-by-one error when checking buffer limits Subject: pipe: reject F_SETPIPE_SZ with size over UINT_MAX Subject: pipe: simplify round_pipe_size() Subject: pipe: read buffer limits atomically Subject: mm: docs: fixup punctuation Subject: mm: docs: fix parameter names mismatch Subject: mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors Subject: MAINTAINERS: remove ANDROID ION pattern Subject: MAINTAINERS: remove ARM/CLKDEV SUPPORT file pattern Subject: MAINTAINERS: update Cortina/Gemini patterns Subject: MAINTAINERS: update "ARM/OXNAS platform support" patterns Subject: MAINTAINERS: update various PALM patterns Subject: MAINTAINERS: update ARM/QUALCOMM SUPPORT patterns Subject: Documentation/sysctl/user.txt: fix typo
11 MM fixes, based on b3a987b0264d3ddbb24293ebff10eddfc472f653:

Vlastimil Babka <vbabka@suse.cz>:
    mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations

David Hildenbrand <david@redhat.com>:
    mm/memory_hotplug: don't free usage map when removing a re-added early section

"Kirill A. Shutemov" <kirill@shutemov.name>:
  Patch series "Fix two above-47bit hint address vs. THP bugs":
    mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment
    mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment

Roman Gushchin <guro@fb.com>:
    mm: memcg/slab: fix percpu slab vmstats flushing

Vlastimil Babka <vbabka@suse.cz>:
    mm, debug_pagealloc: don't rely on static keys too early

Wen Yang <wenyang@linux.alibaba.com>:
  Patch series "use div64_ul() instead of div_u64() if the divisor is:
    mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio()
    mm/page-writeback.c: use div64_ul() for u64-by-unsigned-long divide
    mm/page-writeback.c: improve arithmetic divisions

Adrian Huang <ahuang12@lenovo.com>:
    mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid

Yang Shi <yang.shi@linux.alibaba.com>:
    mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATE

 include/linux/mm.h                 |   18 +++++++++-
 include/linux/mmzone.h             |    5 +--
 include/trace/events/huge_memory.h |    3 +
 init/main.c                        |    1
 mm/huge_memory.c                   |   38 ++++++++++++++---------
 mm/memcontrol.c                    |   37 +++++-----------------
 mm/mempolicy.c                     |   10 ++++--
 mm/page-writeback.c                |   10 +++---
 mm/page_alloc.c                    |   61 ++++++++++---------------------------
 mm/shmem.c                         |    7 ++--
 mm/slab.c                          |    4 +-
 mm/slab_common.c                   |    3 +
 mm/slub.c                          |    2 -
 mm/sparse.c                        |    9 ++++-
 mm/vmalloc.c                       |    4 +-
 15 files changed, 102 insertions(+), 110 deletions(-)
Most of -mm and quite a number of other subsystems. MM is fairly quiet this time. Holidays, I assume. 119 patches, based on 39bed42de2e7d74686a2d5a45638d6a5d7e7d473: Subsystems affected by this patch series: hotfixes scripts ocfs2 mm/slub mm/kmemleak mm/debug mm/pagecache mm/gup mm/swap mm/memcg mm/pagemap mm/tracing mm/kasan mm/initialization mm/pagealloc mm/vmscan mm/tools mm/memblock mm/oom-kill mm/hugetlb mm/migration mm/mmap mm/memory-hotplug mm/zswap mm/cleanups mm/zram misc lib binfmt init reiserfs exec dma-mapping kcov Subsystem: hotfixes Andy Shevchenko <andriy.shevchenko@linux.intel.com>: lib/test_bitmap: correct test data offsets for 32-bit "Theodore Ts'o" <tytso@mit.edu>: memcg: fix a crash in wb_workfn when a device disappears Dan Carpenter <dan.carpenter@oracle.com>: mm/mempolicy.c: fix out of bounds write in mpol_parse_str() Pingfan Liu <kernelfans@gmail.com>: mm/sparse.c: reset section's mem_map when fully deactivated Wei Yang <richardw.yang@linux.intel.com>: mm/migrate.c: also overwrite error when it is bigger than zero Dan Williams <dan.j.williams@intel.com>: mm/memory_hotplug: fix remove_memory() lockdep splat Wei Yang <richardw.yang@linux.intel.com>: mm: thp: don't need care deferred split queue in memcg charge move path Yang Shi <yang.shi@linux.alibaba.com>: mm: move_pages: report the number of non-attempted pages Subsystem: scripts Xiong <xndchn@gmail.com>: scripts/spelling.txt: add more spellings to spelling.txt Luca Ceresoli <luca@lucaceresoli.net>: scripts/spelling.txt: add "issus" typo Subsystem: ocfs2 Aditya Pakki <pakki001@umn.edu>: fs: ocfs: remove unnecessary assertion in dlm_migrate_lockres zhengbin <zhengbin13@huawei.com>: ocfs2: remove unneeded semicolons Masahiro Yamada <masahiroy@kernel.org>: ocfs2: make local header paths relative to C files Colin Ian King <colin.king@canonical.com>: ocfs2/dlm: remove redundant assignment to ret Andy Shevchenko <andriy.shevchenko@linux.intel.com>: ocfs2/dlm: move BITS_TO_BYTES() to bitops.h for 
wider use wangyan <wangyan122@huawei.com>: ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans() ocfs2: use ocfs2_update_inode_fsync_trans() to access t_tid in handle->h_transaction Subsystem: mm/slub Yu Zhao <yuzhao@google.com>: mm/slub.c: avoid slub allocation while holding list_lock Subsystem: mm/kmemleak He Zhe <zhe.he@windriver.com>: mm/kmemleak: turn kmemleak_lock and object->lock to raw_spinlock_t Subsystem: mm/debug Vlastimil Babka <vbabka@suse.cz>: mm/debug.c: always print flags in dump_page() Subsystem: mm/pagecache Ira Weiny <ira.weiny@intel.com>: mm/filemap.c: clean up filemap_write_and_wait() Subsystem: mm/gup Qiujun Huang <hqjagain@gmail.com>: mm: fix gup_pud_range Wei Yang <richardw.yang@linux.intel.com>: mm/gup.c: use is_vm_hugetlb_page() to check whether to follow huge John Hubbard <jhubbard@nvidia.com>: Patch series "mm/gup: prereqs to track dma-pinned pages: FOLL_PIN", v12: mm/gup: factor out duplicate code from four routines mm/gup: move try_get_compound_head() to top, fix minor issues Dan Williams <dan.j.williams@intel.com>: mm: Cleanup __put_devmap_managed_page() vs ->page_free() John Hubbard <jhubbard@nvidia.com>: mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages goldish_pipe: rename local pin_user_pages() routine mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call mm/gup: allow FOLL_FORCE for get_user_pages_fast() IB/umem: use get_user_pages_fast() to pin DMA pages media/v4l2-core: set pages dirty upon releasing DMA buffers mm/gup: introduce pin_user_pages*() and FOLL_PIN goldish_pipe: convert to pin_user_pages() and put_user_page() IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote() drm/via: set FOLL_PIN via pin_user_pages_fast() fs/io_uring: set FOLL_PIN via pin_user_pages() net/xdp: set FOLL_PIN via pin_user_pages() media/v4l2-core: pin_user_pages 
(FOLL_PIN) and put_user_page() conversion vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion powerpc: book3s64: convert to pin_user_pages() and put_user_page() mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding "1" mm, tree-wide: rename put_user_page*() to unpin_user_page*() Subsystem: mm/swap Vasily Averin <vvs@virtuozzo.com>: mm/swapfile.c: swap_next should increase position index Subsystem: mm/memcg Kaitao Cheng <pilgrimtao@gmail.com>: mm/memcontrol.c: cleanup some useless code Subsystem: mm/pagemap Li Xinhai <lixinhai.lxh@gmail.com>: mm/page_vma_mapped.c: explicitly compare pfn for normal, hugetlbfs and THP page Subsystem: mm/tracing Junyong Sun <sunjy516@gmail.com>: mm, tracing: print symbol name for kmem_alloc_node call_site events Subsystem: mm/kasan "Gustavo A. R. Silva" <gustavo@embeddedor.com>: lib/test_kasan.c: fix memory leak in kmalloc_oob_krealloc_more() Subsystem: mm/initialization Andy Shevchenko <andriy.shevchenko@linux.intel.com>: mm/early_ioremap.c: use %pa to print resource_size_t variables Subsystem: mm/pagealloc "Kirill A. 
Shutemov" <kirill@shutemov.name>: mm/page_alloc: skip non present sections on zone initialization David Hildenbrand <david@redhat.com>: mm: remove the memory isolate notifier mm: remove "count" parameter from has_unmovable_pages() Subsystem: mm/vmscan Liu Song <liu.song11@zte.com.cn>: mm/vmscan.c: remove unused return value of shrink_node Alex Shi <alex.shi@linux.alibaba.com>: mm/vmscan: remove prefetch_prev_lru_page mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE Subsystem: mm/tools Daniel Wagner <dwagner@suse.de>: tools/vm/slabinfo: fix sanity checks enabling Subsystem: mm/memblock Anshuman Khandual <anshuman.khandual@arm.com>: mm/memblock: define memblock_physmem_add() memblock: Use __func__ in remaining memblock_dbg() call sites Subsystem: mm/oom-kill David Rientjes <rientjes@google.com>: mm, oom: dump stack of victim when reaping failed Subsystem: mm/hugetlb Wei Yang <richardw.yang@linux.intel.com>: mm/huge_memory.c: use head to check huge zero page mm/huge_memory.c: use head to emphasize the purpose of page mm/huge_memory.c: reduce critical section protected by split_queue_lock Subsystem: mm/migration Ralph Campbell <rcampbell@nvidia.com>: mm/migrate: remove useless mask of start address mm/migrate: clean up some minor coding style mm/migrate: add stable check in migrate_vma_insert_page() David Rientjes <rientjes@google.com>: mm, thp: fix defrag setting if newline is not used Subsystem: mm/mmap Miaohe Lin <linmiaohe@huawei.com>: mm/mmap.c: get rid of odd jump labels in find_mergeable_anon_vma() Subsystem: mm/memory-hotplug David Hildenbrand <david@redhat.com>: Patch series "mm/memory_hotplug: pass in nid to online_pages()": mm/memory_hotplug: pass in nid to online_pages() Qian Cai <cai@lca.pw>: mm/hotplug: silence a lockdep splat with printk() mm/page_isolation: fix potential warning from user Subsystem: mm/zswap Vitaly Wool <vitaly.wool@konsulko.com>: mm/zswap.c: add allocation hysteresis if pool limit is hit Dan Carpenter <dan.carpenter@oracle.com>: 
zswap: potential NULL dereference on error in init_zswap() Subsystem: mm/cleanups Yu Zhao <yuzhao@google.com>: include/linux/mm.h: clean up obsolete check on space in page->flags Wei Yang <richardw.yang@linux.intel.com>: include/linux/mm.h: remove dead code totalram_pages_set() Anshuman Khandual <anshuman.khandual@arm.com>: include/linux/memory.h: drop fields 'hw' and 'phys_callback' from struct memory_block Hao Lee <haolee.swjtu@gmail.com>: mm: fix comments related to node reclaim Subsystem: mm/zram Taejoon Song <taejoon.song@lge.com>: zram: try to avoid worst-case scenario on same element pages Colin Ian King <colin.king@canonical.com>: drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store Subsystem: misc Akinobu Mita <akinobu.mita@gmail.com>: Patch series "add header file for kelvin to/from Celsius conversion: include/linux/units.h: add helpers for kelvin to/from Celsius conversion ACPI: thermal: switch to use <linux/units.h> helpers platform/x86: asus-wmi: switch to use <linux/units.h> helpers platform/x86: intel_menlow: switch to use <linux/units.h> helpers thermal: int340x: switch to use <linux/units.h> helpers thermal: intel_pch: switch to use <linux/units.h> helpers nvme: hwmon: switch to use <linux/units.h> helpers thermal: remove kelvin to/from Celsius conversion helpers from <linux/thermal.h> iwlegacy: use <linux/units.h> helpers iwlwifi: use <linux/units.h> helpers thermal: armada: remove unused TO_MCELSIUS macro iio: adc: qcom-vadc-common: use <linux/units.h> helpers Subsystem: lib Mikhail Zaslonko <zaslonko@linux.ibm.com>: Patch series "S390 hardware support for kernel zlib", v3: lib/zlib: add s390 hardware support for kernel zlib_deflate s390/boot: rename HEAP_SIZE due to name collision lib/zlib: add s390 hardware support for kernel zlib_inflate s390/boot: add dfltcc= kernel command line parameter lib/zlib: add zlib_deflate_dfltcc_enabled() function btrfs: use larger zlib buffer for s390 hardware compression 
Nathan Chancellor <natechancellor@gmail.com>: lib/scatterlist.c: adjust indentation in __sg_alloc_table Yury Norov <yury.norov@gmail.com>: uapi: rename ext2_swab() to swab() and share globally in swab.h lib/find_bit.c: join _find_next_bit{_le} lib/find_bit.c: uninline helper _find_next_bit() Subsystem: binfmt Alexey Dobriyan <adobriyan@gmail.com>: fs/binfmt_elf.c: smaller code generation around auxv vector fill fs/binfmt_elf.c: fix ->start_code calculation fs/binfmt_elf.c: don't copy ELF header around fs/binfmt_elf.c: better codegen around current->mm fs/binfmt_elf.c: make BAD_ADDR() unlikely fs/binfmt_elf.c: coredump: allocate core ELF header on stack fs/binfmt_elf.c: coredump: delete duplicated overflow check fs/binfmt_elf.c: coredump: allow process with empty address space to coredump Subsystem: init Arvind Sankar <nivedita@alum.mit.edu>: init/main.c: log arguments and environment passed to init init/main.c: remove unnecessary repair_env_string in do_initcall_level Patch series "init/main.c: minor cleanup/bugfix of envvar handling", v2: init/main.c: fix quoted value handling in unknown_bootoption Christophe Leroy <christophe.leroy@c-s.fr>: init/main.c: fix misleading "This architecture does not have kernel memory protection" message Subsystem: reiserfs Yunfeng Ye <yeyunfeng@huawei.com>: reiserfs: prevent NULL pointer dereference in reiserfs_insert_item() Subsystem: exec Alexey Dobriyan <adobriyan@gmail.com>: execve: warn if process starts with executable stack Subsystem: dma-mapping Andy Shevchenko <andriy.shevchenko@linux.intel.com>: include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc() Subsystem: kcov Dmitry Vyukov <dvyukov@google.com>: kcov: ignore fault-inject and stacktrace Documentation/admin-guide/kernel-parameters.txt | 12 Documentation/core-api/index.rst | 1 Documentation/core-api/pin_user_pages.rst | 234 +++++ Documentation/vm/zswap.rst | 13 arch/powerpc/mm/book3s64/iommu_api.c | 14 
arch/s390/boot/compressed/decompressor.c | 8 arch/s390/boot/ipl_parm.c | 14 arch/s390/include/asm/setup.h | 7 arch/s390/kernel/setup.c | 14 drivers/acpi/thermal.c | 34 drivers/base/memory.c | 25 drivers/block/zram/zram_drv.c | 10 drivers/gpu/drm/via/via_dmablit.c | 6 drivers/iio/adc/qcom-vadc-common.c | 6 drivers/iio/adc/qcom-vadc-common.h | 1 drivers/infiniband/core/umem.c | 21 drivers/infiniband/core/umem_odp.c | 13 drivers/infiniband/hw/hfi1/user_pages.c | 4 drivers/infiniband/hw/mthca/mthca_memfree.c | 8 drivers/infiniband/hw/qib/qib_user_pages.c | 4 drivers/infiniband/hw/qib/qib_user_sdma.c | 8 drivers/infiniband/hw/usnic/usnic_uiom.c | 4 drivers/infiniband/sw/siw/siw_mem.c | 4 drivers/media/v4l2-core/videobuf-dma-sg.c | 20 drivers/net/ethernet/broadcom/bnx2x/bnx2x_init.h | 1 drivers/net/wireless/intel/iwlegacy/4965-mac.c | 3 drivers/net/wireless/intel/iwlegacy/4965.c | 17 drivers/net/wireless/intel/iwlegacy/common.h | 3 drivers/net/wireless/intel/iwlwifi/dvm/dev.h | 5 drivers/net/wireless/intel/iwlwifi/dvm/devices.c | 6 drivers/nvdimm/pmem.c | 6 drivers/nvme/host/hwmon.c | 13 drivers/platform/goldfish/goldfish_pipe.c | 39 drivers/platform/x86/asus-wmi.c | 7 drivers/platform/x86/intel_menlow.c | 9 drivers/thermal/armada_thermal.c | 2 drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c | 7 drivers/thermal/intel/intel_pch_thermal.c | 3 drivers/vfio/vfio_iommu_type1.c | 39 fs/binfmt_elf.c | 154 +-- fs/btrfs/compression.c | 2 fs/btrfs/zlib.c | 135 ++ fs/exec.c | 5 fs/fs-writeback.c | 2 fs/io_uring.c | 6 fs/ocfs2/cluster/quorum.c | 2 fs/ocfs2/dlm/Makefile | 2 fs/ocfs2/dlm/dlmast.c | 8 fs/ocfs2/dlm/dlmcommon.h | 4 fs/ocfs2/dlm/dlmconvert.c | 8 fs/ocfs2/dlm/dlmdebug.c | 8 fs/ocfs2/dlm/dlmdomain.c | 8 fs/ocfs2/dlm/dlmlock.c | 8 fs/ocfs2/dlm/dlmmaster.c | 10 fs/ocfs2/dlm/dlmrecovery.c | 10 fs/ocfs2/dlm/dlmthread.c | 8 fs/ocfs2/dlm/dlmunlock.c | 8 fs/ocfs2/dlmfs/Makefile | 2 fs/ocfs2/dlmfs/dlmfs.c | 4 fs/ocfs2/dlmfs/userdlm.c | 6 fs/ocfs2/dlmglue.c | 2 
fs/ocfs2/journal.h | 8 fs/ocfs2/namei.c | 3 fs/reiserfs/stree.c | 3 include/linux/backing-dev.h | 10 include/linux/bitops.h | 1 include/linux/fs.h | 6 include/linux/io-mapping.h | 5 include/linux/memblock.h | 7 include/linux/memory.h | 29 include/linux/memory_hotplug.h | 3 include/linux/mm.h | 116 +- include/linux/mmzone.h | 2 include/linux/page-isolation.h | 8 include/linux/swab.h | 1 include/linux/thermal.h | 11 include/linux/units.h | 84 + include/linux/zlib.h | 6 include/trace/events/kmem.h | 4 include/trace/events/writeback.h | 37 include/uapi/linux/swab.h | 10 include/uapi/linux/sysctl.h | 2 init/main.c | 36 kernel/Makefile | 1 lib/Kconfig | 7 lib/Makefile | 2 lib/decompress_inflate.c | 13 lib/find_bit.c | 82 - lib/scatterlist.c | 2 lib/test_bitmap.c | 9 lib/test_kasan.c | 1 lib/zlib_deflate/deflate.c | 85 + lib/zlib_deflate/deflate_syms.c | 1 lib/zlib_deflate/deftree.c | 54 - lib/zlib_deflate/defutil.h | 134 ++ lib/zlib_dfltcc/Makefile | 13 lib/zlib_dfltcc/dfltcc.c | 57 + lib/zlib_dfltcc/dfltcc.h | 155 +++ lib/zlib_dfltcc/dfltcc_deflate.c | 280 ++++++ lib/zlib_dfltcc/dfltcc_inflate.c | 149 +++ lib/zlib_dfltcc/dfltcc_syms.c | 17 lib/zlib_dfltcc/dfltcc_util.h | 123 ++ lib/zlib_inflate/inflate.c | 32 lib/zlib_inflate/inflate.h | 8 lib/zlib_inflate/infutil.h | 18 mm/Makefile | 1 mm/backing-dev.c | 1 mm/debug.c | 18 mm/early_ioremap.c | 8 mm/filemap.c | 34 mm/gup.c | 503 ++++++----- mm/gup_benchmark.c | 9 mm/huge_memory.c | 44 mm/kmemleak.c | 112 +- mm/memblock.c | 22 mm/memcontrol.c | 25 mm/memory_hotplug.c | 24 mm/mempolicy.c | 6 mm/memremap.c | 95 -- mm/migrate.c | 77 + mm/mmap.c | 30 mm/oom_kill.c | 2 mm/page_alloc.c | 83 + mm/page_isolation.c | 69 - mm/page_vma_mapped.c | 12 mm/process_vm_access.c | 32 mm/slub.c | 88 + mm/sparse.c | 2 mm/swap.c | 27 mm/swapfile.c | 2 mm/vmscan.c | 24 mm/zswap.c | 88 + net/xdp/xdp_umem.c | 4 scripts/spelling.txt | 14 tools/testing/selftests/vm/gup_benchmark.c | 6 tools/vm/slabinfo.c | 4 136 files changed, 2790 insertions(+), 
1358 deletions(-)
The rest of MM and the rest of everything else. Subsystems affected by this patch series: hotfixes mm/pagealloc mm/memory-hotplug ipc misc mm/cleanups mm/pagemap procfs lib cleanups arm Subsystem: hotfixes Gang He <GHe@suse.com>: ocfs2: fix oops when writing cloned file David Hildenbrand <david@redhat.com>: Patch series "mm: fix max_pfn not falling on section boundary", v2: mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section fs/proc/page.c: allow inspection of last section and fix end detection mm/page_alloc.c: initialize memmap of unavailable memory directly Subsystem: mm/pagealloc David Hildenbrand <david@redhat.com>: mm/page_alloc: fix and rework pfn handling in memmap_init_zone() mm: factor out next_present_section_nr() Subsystem: mm/memory-hotplug "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>: Patch series "mm/memory_hotplug: Shrink zones before removing memory", v6: mm/memmap_init: update variable name in memmap_init_zone David Hildenbrand <david@redhat.com>: mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone() mm/memory_hotplug: we always have a zone in find_(smallest|biggest)_section_pfn mm/memory_hotplug: don't check for "all holes" in shrink_zone_span() mm/memory_hotplug: drop local variables in shrink_zone_span() mm/memory_hotplug: cleanup __remove_pages() mm/memory_hotplug: drop valid_start/valid_end from test_pages_in_a_zone() Subsystem: ipc Manfred Spraul <manfred@colorfullife.com>: smp_mb__{before,after}_atomic(): update Documentation Davidlohr Bueso <dave@stgolabs.net>: ipc/mqueue.c: remove duplicated code Manfred Spraul <manfred@colorfullife.com>: ipc/mqueue.c: update/document memory barriers ipc/msg.c: update and document memory barriers ipc/sem.c: document and update memory barriers Lu Shuaibing <shuaibinglu@126.com>: ipc/msg.c: consolidate all xxxctl_down() functions drivers/block/null_blk_main.c: fix layout Subsystem: misc Andrew Morton <akpm@linux-foundation.org>: drivers/block/null_blk_main.c: fix 
layout drivers/block/null_blk_main.c: fix uninitialized var warnings Randy Dunlap <rdunlap@infradead.org>: pinctrl: fix pxa2xx.c build warnings Subsystem: mm/cleanups Florian Westphal <fw@strlen.de>: mm: remove __krealloc Subsystem: mm/pagemap Steven Price <steven.price@arm.com>: Patch series "Generic page walk and ptdump", v17: mm: add generic p?d_leaf() macros arc: mm: add p?d_leaf() definitions arm: mm: add p?d_leaf() definitions arm64: mm: add p?d_leaf() definitions mips: mm: add p?d_leaf() definitions powerpc: mm: add p?d_leaf() definitions riscv: mm: add p?d_leaf() definitions s390: mm: add p?d_leaf() definitions sparc: mm: add p?d_leaf() definitions x86: mm: add p?d_leaf() definitions mm: pagewalk: add p4d_entry() and pgd_entry() mm: pagewalk: allow walking without vma mm: pagewalk: don't lock PTEs for walk_page_range_novma() mm: pagewalk: fix termination condition in walk_pte_range() mm: pagewalk: add 'depth' parameter to pte_hole x86: mm: point to struct seq_file from struct pg_state x86: mm+efi: convert ptdump_walk_pgd_level() to take a mm_struct x86: mm: convert ptdump_walk_pgd_level_debugfs() to take an mm_struct mm: add generic ptdump x86: mm: convert dump_pagetables to use walk_page_range arm64: mm: convert mm/dump.c to use walk_page_range() arm64: mm: display non-present entries in ptdump mm: ptdump: reduce level numbers by 1 in note_page() x86: mm: avoid allocating struct mm_struct on the stack "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>: Patch series "Fixup page directory freeing", v4: powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case Peter Zijlstra <peterz@infradead.org>: mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush asm-generic/tlb: avoid potential double flush asm-gemeric/tlb: remove stray function declarations asm-generic/tlb: add missing CONFIG symbol asm-generic/tlb: rename HAVE_RCU_TABLE_FREE asm-generic/tlb: rename HAVE_MMU_GATHER_PAGE_SIZE asm-generic/tlb: rename HAVE_MMU_GATHER_NO_GATHER 
asm-generic/tlb: provide MMU_GATHER_TABLE_FREE Subsystem: procfs Alexey Dobriyan <adobriyan@gmail.com>: proc: decouple proc from VFS with "struct proc_ops" proc: convert everything to "struct proc_ops" Subsystem: lib Yury Norov <yury.norov@gmail.com>: Patch series "lib: rework bitmap_parse", v5: lib/string: add strnchrnul() bitops: more BITS_TO_* macros lib: add test for bitmap_parse() lib: make bitmap_parse_user a wrapper on bitmap_parse lib: rework bitmap_parse() lib: new testcases for bitmap_parse{_user} include/linux/cpumask.h: don't calculate length of the input string Subsystem: cleanups Masahiro Yamada <masahiroy@kernel.org>: treewide: remove redundant IS_ERR() before error code check Subsystem: arm Chen-Yu Tsai <wens@csie.org>: ARM: dma-api: fix max_pfn off-by-one error in __dma_supported() Documentation/memory-barriers.txt | 14 arch/Kconfig | 17 arch/alpha/kernel/srm_env.c | 17 arch/arc/include/asm/pgtable.h | 1 arch/arm/Kconfig | 2 arch/arm/include/asm/pgtable-2level.h | 1 arch/arm/include/asm/pgtable-3level.h | 1 arch/arm/include/asm/tlb.h | 6 arch/arm/kernel/atags_proc.c | 8 arch/arm/mm/alignment.c | 14 arch/arm/mm/dma-mapping.c | 2 arch/arm64/Kconfig | 3 arch/arm64/Kconfig.debug | 19 arch/arm64/include/asm/pgtable.h | 2 arch/arm64/include/asm/ptdump.h | 8 arch/arm64/mm/Makefile | 4 arch/arm64/mm/dump.c | 152 ++---- arch/arm64/mm/mmu.c | 4 arch/arm64/mm/ptdump_debugfs.c | 2 arch/ia64/kernel/salinfo.c | 24 - arch/m68k/kernel/bootinfo_proc.c | 8 arch/mips/include/asm/pgtable.h | 5 arch/mips/lasat/picvue_proc.c | 31 - arch/powerpc/Kconfig | 7 arch/powerpc/include/asm/book3s/32/pgalloc.h | 8 arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 arch/powerpc/include/asm/book3s/64/pgtable.h | 3 arch/powerpc/include/asm/nohash/pgalloc.h | 8 arch/powerpc/include/asm/tlb.h | 11 arch/powerpc/kernel/proc_powerpc.c | 10 arch/powerpc/kernel/rtas-proc.c | 70 +-- arch/powerpc/kernel/rtas_flash.c | 34 - arch/powerpc/kernel/rtasd.c | 14 arch/powerpc/mm/book3s64/pgtable.c | 7 
arch/powerpc/mm/numa.c | 12 arch/powerpc/platforms/pseries/lpar.c | 24 - arch/powerpc/platforms/pseries/lparcfg.c | 14 arch/powerpc/platforms/pseries/reconfig.c | 8 arch/powerpc/platforms/pseries/scanlog.c | 15 arch/riscv/include/asm/pgtable-64.h | 7 arch/riscv/include/asm/pgtable.h | 7 arch/s390/Kconfig | 4 arch/s390/include/asm/pgtable.h | 2 arch/sh/mm/alignment.c | 17 arch/sparc/Kconfig | 3 arch/sparc/include/asm/pgtable_64.h | 2 arch/sparc/include/asm/tlb_64.h | 11 arch/sparc/kernel/led.c | 15 arch/um/drivers/mconsole_kern.c | 9 arch/um/kernel/exitcode.c | 15 arch/um/kernel/process.c | 15 arch/x86/Kconfig | 3 arch/x86/Kconfig.debug | 20 arch/x86/include/asm/pgtable.h | 10 arch/x86/include/asm/tlb.h | 4 arch/x86/kernel/cpu/mtrr/if.c | 21 arch/x86/mm/Makefile | 4 arch/x86/mm/debug_pagetables.c | 18 arch/x86/mm/dump_pagetables.c | 418 +++++------------- arch/x86/platform/efi/efi_32.c | 2 arch/x86/platform/efi/efi_64.c | 4 arch/x86/platform/uv/tlb_uv.c | 14 arch/xtensa/platforms/iss/simdisk.c | 10 crypto/af_alg.c | 2 drivers/acpi/battery.c | 15 drivers/acpi/proc.c | 15 drivers/acpi/scan.c | 2 drivers/base/memory.c | 9 drivers/block/null_blk_main.c | 58 +- drivers/char/hw_random/bcm2835-rng.c | 2 drivers/char/hw_random/omap-rng.c | 4 drivers/clk/clk.c | 2 drivers/dma/mv_xor_v2.c | 2 drivers/firmware/efi/arm-runtime.c | 2 drivers/gpio/gpiolib-devres.c | 2 drivers/gpio/gpiolib-of.c | 8 drivers/gpio/gpiolib.c | 2 drivers/hwmon/dell-smm-hwmon.c | 15 drivers/i2c/busses/i2c-mv64xxx.c | 5 drivers/i2c/busses/i2c-synquacer.c | 2 drivers/ide/ide-proc.c | 19 drivers/input/input.c | 28 - drivers/isdn/capi/kcapi_proc.c | 6 drivers/macintosh/via-pmu.c | 17 drivers/md/md.c | 15 drivers/misc/sgi-gru/gruprocfs.c | 42 - drivers/mtd/ubi/build.c | 2 drivers/net/wireless/cisco/airo.c | 126 ++--- drivers/net/wireless/intel/ipw2x00/libipw_module.c | 15 drivers/net/wireless/intersil/hostap/hostap_hw.c | 4 drivers/net/wireless/intersil/hostap/hostap_proc.c | 14 
drivers/net/wireless/intersil/hostap/hostap_wlan.h | 2 drivers/net/wireless/ray_cs.c | 20 drivers/of/device.c | 2 drivers/parisc/led.c | 17 drivers/pci/controller/pci-tegra.c | 2 drivers/pci/proc.c | 25 - drivers/phy/phy-core.c | 4 drivers/pinctrl/pxa/pinctrl-pxa2xx.c | 1 drivers/platform/x86/thinkpad_acpi.c | 15 drivers/platform/x86/toshiba_acpi.c | 60 +- drivers/pnp/isapnp/proc.c | 9 drivers/pnp/pnpbios/proc.c | 17 drivers/s390/block/dasd_proc.c | 15 drivers/s390/cio/blacklist.c | 14 drivers/s390/cio/css.c | 11 drivers/scsi/esas2r/esas2r_main.c | 9 drivers/scsi/scsi_devinfo.c | 15 drivers/scsi/scsi_proc.c | 29 - drivers/scsi/sg.c | 30 - drivers/spi/spi-orion.c | 3 drivers/staging/rtl8192u/ieee80211/ieee80211_module.c | 14 drivers/tty/sysrq.c | 8 drivers/usb/gadget/function/rndis.c | 17 drivers/video/fbdev/imxfb.c | 2 drivers/video/fbdev/via/viafbdev.c | 105 ++-- drivers/zorro/proc.c | 9 fs/cifs/cifs_debug.c | 108 ++-- fs/cifs/dfs_cache.c | 13 fs/cifs/dfs_cache.h | 2 fs/ext4/super.c | 2 fs/f2fs/node.c | 2 fs/fscache/internal.h | 2 fs/fscache/object-list.c | 11 fs/fscache/proc.c | 2 fs/jbd2/journal.c | 13 fs/jfs/jfs_debug.c | 14 fs/lockd/procfs.c | 12 fs/nfsd/nfsctl.c | 13 fs/nfsd/stats.c | 12 fs/ocfs2/file.c | 14 fs/ocfs2/suballoc.c | 2 fs/proc/cpuinfo.c | 12 fs/proc/generic.c | 38 - fs/proc/inode.c | 76 +-- fs/proc/internal.h | 5 fs/proc/kcore.c | 13 fs/proc/kmsg.c | 14 fs/proc/page.c | 54 +- fs/proc/proc_net.c | 32 - fs/proc/proc_sysctl.c | 2 fs/proc/root.c | 2 fs/proc/stat.c | 12 fs/proc/task_mmu.c | 4 fs/proc/vmcore.c | 10 fs/sysfs/group.c | 2 include/asm-generic/pgtable.h | 20 include/asm-generic/tlb.h | 138 +++-- include/linux/bitmap.h | 8 include/linux/bitops.h | 4 include/linux/cpumask.h | 4 include/linux/memory_hotplug.h | 4 include/linux/mm.h | 6 include/linux/mmzone.h | 10 include/linux/pagewalk.h | 49 +- include/linux/proc_fs.h | 23 include/linux/ptdump.h | 24 - include/linux/seq_file.h | 13 include/linux/slab.h | 1 include/linux/string.h | 1 
include/linux/sunrpc/stats.h | 4 ipc/mqueue.c | 123 ++++- ipc/msg.c | 62 +- ipc/sem.c | 66 +- ipc/util.c | 14 kernel/configs.c | 9 kernel/irq/proc.c | 42 - kernel/kallsyms.c | 12 kernel/latencytop.c | 14 kernel/locking/lockdep_proc.c | 15 kernel/module.c | 12 kernel/profile.c | 24 - kernel/sched/psi.c | 48 +- lib/bitmap.c | 195 ++++---- lib/string.c | 17 lib/test_bitmap.c | 105 ++++ mm/Kconfig.debug | 21 mm/Makefile | 1 mm/gup.c | 2 mm/hmm.c | 66 +- mm/memory_hotplug.c | 104 +--- mm/memremap.c | 2 mm/migrate.c | 5 mm/mincore.c | 1 mm/mmu_gather.c | 158 ++++-- mm/page_alloc.c | 75 +-- mm/pagewalk.c | 167 +++++-- mm/ptdump.c | 159 ++++++ mm/slab_common.c | 37 - mm/sparse.c | 10 mm/swapfile.c | 14 net/atm/mpoa_proc.c | 17 net/atm/proc.c | 8 net/core/dev.c | 2 net/core/filter.c | 2 net/core/pktgen.c | 44 - net/ipv4/ipconfig.c | 10 net/ipv4/netfilter/ipt_CLUSTERIP.c | 16 net/ipv4/route.c | 24 - net/netfilter/xt_recent.c | 17 net/sunrpc/auth_gss/svcauth_gss.c | 10 net/sunrpc/cache.c | 45 - net/sunrpc/stats.c | 21 net/xfrm/xfrm_policy.c | 2 samples/kfifo/bytestream-example.c | 11 samples/kfifo/inttype-example.c | 11 samples/kfifo/record-example.c | 11 scripts/coccinelle/free/devm_free.cocci | 4 sound/core/info.c | 34 - sound/soc/codecs/ak4104.c | 3 sound/soc/codecs/cs4270.c | 3 sound/soc/codecs/tlv320aic32x4.c | 6 sound/soc/sunxi/sun4i-spdif.c | 2 tools/include/linux/bitops.h | 9 214 files changed, 2589 insertions(+), 2227 deletions(-)
On Tue, 4 Feb 2020 02:27:48 +0000 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Tue, Feb 4, 2020 at 1:33 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > The rest of MM and the rest of everything else.
>
> What's the base? You've changed your scripts or something, and that
> information is no longer in your cover letter..
>
Crap, sorry, geriatric.
d4e9056daedca3891414fe3c91de3449a5dad0f2
- A few y2038 fixes which missed the merge window while dependencies
  in NFS were being sorted out.

- A bunch of fixes.  Some minor, some not.

Subsystems affected by this patch series:

Arnd Bergmann <arnd@arndb.de>:
    y2038: remove ktime to/from timespec/timeval conversion
    y2038: remove unused time32 interfaces
    y2038: hide timeval/timespec/itimerval/itimerspec types

Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>:
    Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"

Christian Borntraeger <borntraeger@de.ibm.com>:
    include/uapi/linux/swab.h: fix userspace breakage, use __BITS_PER_LONG for swap

SeongJae Park <sjpark@amazon.de>:
    selftests/vm: add missed tests in run_vmtests

Joe Perches <joe@perches.com>:
    get_maintainer: remove uses of P: for maintainer name

Douglas Anderson <dianders@chromium.org>:
    scripts/get_maintainer.pl: deprioritize old Fixes: addresses

Christoph Hellwig <hch@lst.de>:
    mm/swapfile.c: fix a comment in sys_swapon()

Vasily Averin <vvs@virtuozzo.com>:
    mm/memcontrol.c: lost css_put in memcg_expand_shrinker_maps()

Alexandru Ardelean <alexandru.ardelean@analog.com>:
    lib/string.c: update match_string() doc-strings with correct behavior

Gavin Shan <gshan@redhat.com>:
    mm/vmscan.c: don't round up scan size for online memory cgroup

Wei Yang <richardw.yang@linux.intel.com>:
    mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEM

Alexander Potapenko <glider@google.com>:
    lib/stackdepot.c: fix global out-of-bounds in stack_slabs

Randy Dunlap <rdunlap@infradead.org>:
    MAINTAINERS: use tabs for SAFESETID

 MAINTAINERS                            |    8 -
 include/linux/compat.h                 |   29 ------
 include/linux/ktime.h                  |   37 -------
 include/linux/time32.h                 |  154 ---------------------------------
 include/linux/timekeeping32.h          |   32 ------
 include/linux/types.h                  |    5 -
 include/uapi/asm-generic/posix_types.h |    2
 include/uapi/linux/swab.h              |    4
 include/uapi/linux/time.h              |   22 ++--
 ipc/sem.c                              |    6 -
 kernel/compat.c                        |   64 -------------
 kernel/time/time.c                     |   43 ---------
 lib/stackdepot.c                       |    8 +
 lib/string.c                           |   16 +++
 mm/memcontrol.c                        |    4
 mm/sparse.c                            |    2
 mm/swapfile.c                          |    2
 mm/vmscan.c                            |    9 +
 scripts/get_maintainer.pl              |   32 ------
 tools/testing/selftests/vm/run_vmtests |   33 +++++++
 20 files changed, 93 insertions(+), 419 deletions(-)
On Thu, 20 Feb 2020 20:00:30 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:
> - A few y2038 fixes which missed the merge window while dependencies
> in NFS were being sorted out.
>
> - A bunch of fixes. Some minor, some not.
15 patches, based on ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2
7 fixes, based on 9f65ed5fe41ce08ed1cb1f6a950f9ec694c142ad:

Mel Gorman <mgorman@techsingularity.net>:
    mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa

Huang Ying <ying.huang@intel.com>:
    mm: fix possible PMD dirty bit lost in set_pmd_migration_entry()

"Kirill A. Shutemov" <kirill@shutemov.name>:
    mm: avoid data corruption on CoW fault into PFN-mapped VMA

OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
    fat: fix uninit-memory access for partial initialized inode

Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
    mm/z3fold.c: do not include rwlock.h directly

Vlastimil Babka <vbabka@suse.cz>:
    mm, hotplug: fix page online with DEBUG_PAGEALLOC compiled but not enabled

Miroslav Benes <mbenes@suse.cz>:
    arch/Kconfig: update HAVE_RELIABLE_STACKTRACE description

 arch/Kconfig        |    5 +++--
 fs/fat/inode.c      |   19 +++++++------------
 include/linux/mm.h  |    4 ++++
 mm/huge_memory.c    |    3 +--
 mm/memory.c         |   35 +++++++++++++++++++++++++++--------
 mm/memory_hotplug.c |    8 +++++++-
 mm/mprotect.c       |   38 ++++++++++++++++++++++++++++++++++++--
 mm/z3fold.c         |    1 -
 8 files changed, 85 insertions(+), 28 deletions(-)
10 fixes, based on c63c50fc2ec9afc4de21ef9ead2eac64b178cce1:

Chunguang Xu <brookxu@tencent.com>:
    memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event

Baoquan He <bhe@redhat.com>:
    mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case

Qian Cai <cai@lca.pw>:
    page-flags: fix a crash at SetPageError(THP_SWAP)

Chris Down <chris@chrisdown.name>:
    mm, memcg: fix corruption on 64-bit divisor in memory.high throttling
    mm, memcg: throttle allocators based on ancestral memory.high

Michal Hocko <mhocko@suse.com>:
    mm: do not allow MADV_PAGEOUT for CoW pages

Roman Penyaev <rpenyaev@suse.de>:
    epoll: fix possible lost wakeup on epoll_ctl() path

Qian Cai <cai@lca.pw>:
    mm/mmu_notifier: silence PROVE_RCU_LIST warnings

Vlastimil Babka <vbabka@suse.cz>:
    mm, slub: prevent kmalloc_node crashes and memory leaks

Joerg Roedel <jroedel@suse.de>:
    x86/mm: split vmalloc_sync_all()

 arch/x86/mm/fault.c        |   26 ++++++++++-
 drivers/acpi/apei/ghes.c   |    2
 fs/eventpoll.c             |    8 +--
 include/linux/page-flags.h |    2
 include/linux/vmalloc.h    |    5 +-
 kernel/notifier.c          |    2
 mm/madvise.c               |   12 +++--
 mm/memcontrol.c            |  105 ++++++++++++++++++++++++++++-----------------
 mm/mmu_notifier.c          |   27 +++++++----
 mm/nommu.c                 |   10 +++-
 mm/slub.c                  |   26 +++++++----
 mm/sparse.c                |    8 ++-
 mm/vmalloc.c               |   11 +++-
 13 files changed, 165 insertions(+), 79 deletions(-)
5 fixes, based on 83fd69c93340177dcd66fd26ce6441fb581c1dbf:

Naohiro Aota <naohiro.aota@wdc.com>:
    mm/swapfile.c: move inode_lock out of claim_swapfile

David Hildenbrand <david@redhat.com>:
    drivers/base/memory.c: indicate all memory blocks as removable

Mina Almasry <almasrymina@google.com>:
    hugetlb_cgroup: fix illegal access to memory

Roman Gushchin <guro@fb.com>:
    mm: fork: fix kernel_stack memcg stats for various stack implementations

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
    mm/sparse: fix kernel crash with pfn_section_valid check

 drivers/base/memory.c      |   23 +++--------------------
 include/linux/memcontrol.h |   12 ++++++++++++
 kernel/fork.c              |    4 ++--
 mm/hugetlb_cgroup.c        |    3 +--
 mm/memcontrol.c            |   38 ++++++++++++++++++++++++++++++++++++++
 mm/sparse.c                |    6 ++++++
 mm/swapfile.c              |   41 ++++++++++++++++++++---------------------
 7 files changed, 82 insertions(+), 45 deletions(-)
A large amount of MM, plenty more to come. 155 patches, based on GIT 1a323ea5356edbb3073dc59d51b9e6b86908857d Subsystems affected by this patch series: tools kthread kbuild scripts ocfs2 vfs mm/slub mm/kmemleak mm/pagecache mm/gup mm/swap mm/memcg mm/pagemap mm/mremap mm/sparsemem mm/kasan mm/pagealloc mm/vmscan mm/compaction mm/mempolicy mm/hugetlbfs mm/hugetlb Subsystem: tools David Ahern <dsahern@kernel.org>: tools/accounting/getdelays.c: fix netlink attribute length Subsystem: kthread Petr Mladek <pmladek@suse.com>: kthread: mark timer used by delayed kthread works as IRQ safe Subsystem: kbuild Masahiro Yamada <masahiroy@kernel.org>: asm-generic: make more kernel-space headers mandatory Subsystem: scripts Jonathan Neuschäfer <j.neuschaefer@gmx.net>: scripts/spelling.txt: add syfs/sysfs pattern Colin Ian King <colin.king@canonical.com>: scripts/spelling.txt: add more spellings to spelling.txt Subsystem: ocfs2 Alex Shi <alex.shi@linux.alibaba.com>: ocfs2: remove FS_OCFS2_NM ocfs2: remove unused macros ocfs2: use OCFS2_SEC_BITS in macro ocfs2: remove dlm_lock_is_remote wangyan <wangyan122@huawei.com>: ocfs2: there is no need to log twice in several functions ocfs2: correct annotation from "l_next_rec" to "l_next_free_rec" Alex Shi <alex.shi@linux.alibaba.com>: ocfs2: remove useless err Jules Irenge <jbi.octave@gmail.com>: ocfs2: Add missing annotations for ocfs2_refcount_cache_lock() and ocfs2_refcount_cache_unlock() "Gustavo A. R. 
Silva" <gustavo@embeddedor.com>: ocfs2: replace zero-length array with flexible-array member ocfs2: cluster: replace zero-length array with flexible-array member ocfs2: dlm: replace zero-length array with flexible-array member ocfs2: ocfs2_fs.h: replace zero-length array with flexible-array member wangjian <wangjian161@huawei.com>: ocfs2: roll back the reference count modification of the parent directory if an error occurs Takashi Iwai <tiwai@suse.de>: ocfs2: use scnprintf() for avoiding potential buffer overflow "Matthew Wilcox (Oracle)" <willy@infradead.org>: ocfs2: use memalloc_nofs_save instead of memalloc_noio_save Subsystem: vfs Kees Cook <keescook@chromium.org>: fs_parse: Remove pr_notice() about each validation Subsystem: mm/slub chenqiwu <chenqiwu@xiaomi.com>: mm/slub.c: replace cpu_slab->partial with wrapped APIs mm/slub.c: replace kmem_cache->cpu_partial with wrapped APIs Kees Cook <keescook@chromium.org>: slub: improve bit diffusion for freelist ptr obfuscation slub: relocate freelist pointer to middle of object Vlastimil Babka <vbabka@suse.cz>: Revert "topology: add support for node_to_mem_node() to determine the fallback node" Subsystem: mm/kmemleak Nathan Chancellor <natechancellor@gmail.com>: mm/kmemleak.c: use address-of operator on section symbols Qian Cai <cai@lca.pw>: mm/Makefile: disable KCSAN for kmemleak Subsystem: mm/pagecache Jan Kara <jack@suse.cz>: mm/filemap.c: don't bother dropping mmap_sem for zero size readahead Mauricio Faria de Oliveira <mfo@canonical.com>: mm/page-writeback.c: write_cache_pages(): deduplicate identical checks Xianting Tian <xianting_tian@126.com>: mm/filemap.c: clear page error before actual read Souptick Joarder <jrdr.linux@gmail.com>: mm/filemap.c: remove unused argument from shrink_readahead_size_eio() "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm/filemap.c: use vm_fault error code directly include/linux/pagemap.h: rename arguments to find_subpage mm/page-writeback.c: use VM_BUG_ON_PAGE in 
clear_page_dirty_for_io mm/filemap.c: unexport find_get_entry mm/filemap.c: rewrite pagecache_get_page documentation Subsystem: mm/gup John Hubbard <jhubbard@nvidia.com>: Patch series "mm/gup: track FOLL_PIN pages", v6: mm/gup: split get_user_pages_remote() into two routines mm/gup: pass a flags arg to __gup_device_* functions mm: introduce page_ref_sub_return() mm/gup: pass gup flags to two more routines mm/gup: require FOLL_GET for get_user_pages_fast() mm/gup: track FOLL_PIN pages mm/gup: page->hpage_pinned_refcount: exact pin counts for huge pages mm/gup: /proc/vmstat: pin_user_pages (FOLL_PIN) reporting mm/gup_benchmark: support pin_user_pages() and related calls selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN coverage "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm: improve dump_page() for compound pages John Hubbard <jhubbard@nvidia.com>: mm: dump_page(): additional diagnostics for huge pinned pages Claudio Imbrenda <imbrenda@linux.ibm.com>: mm/gup/writeback: add callbacks for inaccessible pages Pingfan Liu <kernelfans@gmail.com>: mm/gup: rename nr as nr_pinned in get_user_pages_fast() mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path Subsystem: mm/swap Chen Wandun <chenwandun@huawei.com>: mm/swapfile.c: fix comments for swapcache_prepare Wei Yang <richardw.yang@linux.intel.com>: mm/swap.c: not necessary to export __pagevec_lru_add() Qian Cai <cai@lca.pw>: mm/swapfile: fix data races in try_to_unuse() Wei Yang <richard.weiyang@linux.alibaba.com>: mm/swap_slots.c: assign|reset cache slot by value directly Yang Shi <yang.shi@linux.alibaba.com>: mm: swap: make page_evictable() inline mm: swap: use smp_mb__after_atomic() to order LRU bit set Wei Yang <richard.weiyang@gmail.com>: mm/swap_state.c: use the same way to count page in [add_to|delete_from]_swap_cache Subsystem: mm/memcg Yafang Shao <laoar.shao@gmail.com>: mm, memcg: fix build error around the usage of kmem_caches Kirill Tkhai <ktkhai@virtuozzo.com>: 
mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node Roman Gushchin <guro@fb.com>: mm: memcg/slab: use mem_cgroup_from_obj() Patch series "mm: memcg: kmem API cleanup", v2: mm: kmem: cleanup (__)memcg_kmem_charge_memcg() arguments mm: kmem: cleanup memcg_kmem_uncharge_memcg() arguments mm: kmem: rename memcg_kmem_(un)charge() into memcg_kmem_(un)charge_page() mm: kmem: switch to nr_pages in (__)memcg_kmem_charge_memcg() mm: memcg/slab: cache page number in memcg_(un)charge_slab() mm: kmem: rename (__)memcg_kmem_(un)charge_memcg() to __memcg_kmem_(un)charge() Johannes Weiner <hannes@cmpxchg.org>: Patch series "mm: memcontrol: recursive memory.low protection", v3: mm: memcontrol: fix memory.low proportional distribution mm: memcontrol: clean up and document effective low/min calculations mm: memcontrol: recursive memory.low protection Shakeel Butt <shakeelb@google.com>: memcg: css_tryget_online cleanups Vincenzo Frascino <vincenzo.frascino@arm.com>: mm/memcontrol.c: make mem_cgroup_id_get_many() __maybe_unused Chris Down <chris@chrisdown.name>: mm, memcg: prevent memory.high load/store tearing mm, memcg: prevent memory.max load tearing mm, memcg: prevent memory.low load/store tearing mm, memcg: prevent memory.min load/store tearing mm, memcg: prevent memory.swap.max load tearing mm, memcg: prevent mem_cgroup_protected store tearing Roman Gushchin <guro@fb.com>: mm: memcg: make memory.oom.group tolerable to task migration Subsystem: mm/pagemap Thomas Hellstrom <thellstrom@vmware.com>: mm/mapping_dirty_helpers: Update huge page-table entry callbacks Anshuman Khandual <anshuman.khandual@arm.com>: Patch series "mm/vma: some more minor changes", v2: mm/vma: move VM_NO_KHUGEPAGED into generic header mm/vma: make vma_is_foreign() available for general use mm/vma: make is_vma_temporary_stack() available for general use "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm: add pagemap.h to the fine documentation Peter Xu <peterx@redhat.com>: Patch series "mm: Page 
fault enhancements", v6: mm/gup: rename "nonblocking" to "locked" where proper mm/gup: fix __get_user_pages() on fault retry of hugetlb mm: introduce fault_signal_pending() x86/mm: use helper fault_signal_pending() arc/mm: use helper fault_signal_pending() arm64/mm: use helper fault_signal_pending() powerpc/mm: use helper fault_signal_pending() sh/mm: use helper fault_signal_pending() mm: return faster for non-fatal signals in user mode faults userfaultfd: don't retake mmap_sem to emulate NOPAGE mm: introduce FAULT_FLAG_DEFAULT mm: introduce FAULT_FLAG_INTERRUPTIBLE mm: allow VM_FAULT_RETRY for multiple times mm/gup: allow VM_FAULT_RETRY for multiple times mm/gup: allow to react to fatal signals mm/userfaultfd: honor FAULT_FLAG_KILLABLE in fault path WANG Wenhu <wenhu.wang@vivo.com>: mm: clarify a confusing comment for remap_pfn_range() Wang Wenhu <wenhu.wang@vivo.com>: mm/memory.c: clarify a confusing comment for vm_iomap_memory Jaewon Kim <jaewon31.kim@samsung.com>: Patch series "mm: mmap: add mmap trace point", v3: mmap: remove inline of vm_unmapped_area mm: mmap: add trace point of vm_unmapped_area Subsystem: mm/mremap Brian Geffon <bgeffon@google.com>: mm/mremap: add MREMAP_DONTUNMAP to mremap() selftests: add MREMAP_DONTUNMAP selftest Subsystem: mm/sparsemem Wei Yang <richardw.yang@linux.intel.com>: mm/sparsemem: get address to page struct instead of address to pfn Pingfan Liu <kernelfans@gmail.com>: mm/sparse: rename pfn_present() to pfn_in_present_section() Baoquan He <bhe@redhat.com>: mm/sparse.c: use kvmalloc/kvfree to alloc/free memmap for the classic sparse mm/sparse.c: allocate memmap preferring the given node Subsystem: mm/kasan Walter Wu <walter-zh.wu@mediatek.com>: Patch series "fix the missing underflow in memory operation function", v4: kasan: detect negative size in memory operation function kasan: add test for invalid size in memmove Subsystem: mm/pagealloc Joel Savitz <jsavitz@redhat.com>: mm/page_alloc: increase default min_free_kbytes bound 
Mateusz Nosek <mateusznosek0@gmail.com>: mm, pagealloc: micro-optimisation: save two branches on hot page allocation path chenqiwu <chenqiwu@xiaomi.com>: mm/page_alloc.c: use free_area_empty() instead of open-coding Mateusz Nosek <mateusznosek0@gmail.com>: mm/page_alloc.c: micro-optimisation Remove unnecessary branch chenqiwu <chenqiwu@xiaomi.com>: mm/page_alloc: simplify page_is_buddy() for better code readability Subsystem: mm/vmscan Yang Shi <yang.shi@linux.alibaba.com>: mm: vmpressure: don't need call kfree if kstrndup fails mm: vmpressure: use mem_cgroup_is_root API mm: vmscan: replace open codings to NUMA_NO_NODE Wei Yang <richardw.yang@linux.intel.com>: mm/vmscan.c: remove cpu online notification for now Qian Cai <cai@lca.pw>: mm/vmscan.c: fix data races using kswapd_classzone_idx Mateusz Nosek <mateusznosek0@gmail.com>: mm/vmscan.c: Clean code by removing unnecessary assignment Kirill Tkhai <ktkhai@virtuozzo.com>: mm/vmscan.c: make may_enter_fs bool in shrink_page_list() Mateusz Nosek <mateusznosek0@gmail.com>: mm/vmscan.c: do_try_to_free_pages(): clean code by removing unnecessary assignment Michal Hocko <mhocko@suse.com>: selftests: vm: drop dependencies on page flags from mlock2 tests Subsystem: mm/compaction Rik van Riel <riel@surriel.com>: Patch series "fix THP migration for CMA allocations", v2: mm,compaction,cma: add alloc_contig flag to compact_control mm,thp,compaction,cma: allow THP migration for CMA allocations Vlastimil Babka <vbabka@suse.cz>: mm, compaction: fully assume capture is not NULL in compact_zone_order() Sebastian Andrzej Siewior <bigeasy@linutronix.de>: mm/compaction: really limit compact_unevictable_allowed to 0 and 1 mm/compaction: Disable compact_unevictable_allowed on RT Mateusz Nosek <mateusznosek0@gmail.com>: mm/compaction.c: clean code by removing unnecessary assignment Subsystem: mm/mempolicy Li Xinhai <lixinhai.lxh@gmail.com>: mm/mempolicy: support MPOL_MF_STRICT for huge page mapping mm/mempolicy: check hugepage migration 
is supported by arch in vma_migratable() Yang Shi <yang.shi@linux.alibaba.com>: mm: mempolicy: use VM_BUG_ON_VMA in queue_pages_test_walk() Randy Dunlap <rdunlap@infradead.org>: mm: mempolicy: require at least one nodeid for MPOL_PREFERRED Colin Ian King <colin.king@canonical.com>: mm/memblock.c: remove redundant assignment to variable max_addr Subsystem: mm/hugetlbfs Mike Kravetz <mike.kravetz@oracle.com>: Patch series "hugetlbfs: use i_mmap_rwsem for more synchronization", v2: hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race Subsystem: mm/hugetlb Mina Almasry <almasrymina@google.com>: hugetlb_cgroup: add hugetlb_cgroup reservation counter hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations mm/hugetlb_cgroup: fix hugetlb_cgroup migration hugetlb_cgroup: add reservation accounting for private mappings hugetlb: disable region_add file_region coalescing hugetlb_cgroup: add accounting for shared mappings hugetlb_cgroup: support noreserve mappings hugetlb: support file_region coalescing again hugetlb_cgroup: add hugetlb_cgroup reservation tests hugetlb_cgroup: add hugetlb_cgroup reservation docs Mateusz Nosek <mateusznosek0@gmail.com>: mm/hugetlb.c: clean code by removing unnecessary initialization Vlastimil Babka <vbabka@suse.cz>: mm/hugetlb: remove unnecessary memory fetch in PageHeadHuge() Christophe Leroy <christophe.leroy@c-s.fr>: selftests/vm: fix map_hugetlb length used for testing read and write mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS "Matthew Wilcox (Oracle)" <willy@infradead.org>: include/linux/huge_mm.h: check PageTail in hpage_nr_pages even when !THP Documentation/admin-guide/cgroup-v1/hugetlb.rst | 103 +- Documentation/admin-guide/cgroup-v2.rst | 11 Documentation/admin-guide/sysctl/vm.rst | 3 Documentation/core-api/mm-api.rst | 3 Documentation/core-api/pin_user_pages.rst | 86 + arch/alpha/include/asm/Kbuild | 11 
arch/alpha/mm/fault.c | 6 arch/arc/include/asm/Kbuild | 21 arch/arc/mm/fault.c | 37 arch/arm/include/asm/Kbuild | 12 arch/arm/mm/fault.c | 7 arch/arm64/include/asm/Kbuild | 18 arch/arm64/mm/fault.c | 26 arch/c6x/include/asm/Kbuild | 37 arch/csky/include/asm/Kbuild | 36 arch/h8300/include/asm/Kbuild | 46 arch/hexagon/include/asm/Kbuild | 33 arch/hexagon/mm/vm_fault.c | 5 arch/ia64/include/asm/Kbuild | 7 arch/ia64/mm/fault.c | 5 arch/m68k/include/asm/Kbuild | 24 arch/m68k/mm/fault.c | 7 arch/microblaze/include/asm/Kbuild | 29 arch/microblaze/mm/fault.c | 5 arch/mips/include/asm/Kbuild | 13 arch/mips/mm/fault.c | 5 arch/nds32/include/asm/Kbuild | 37 arch/nds32/mm/fault.c | 5 arch/nios2/include/asm/Kbuild | 38 arch/nios2/mm/fault.c | 7 arch/openrisc/include/asm/Kbuild | 36 arch/openrisc/mm/fault.c | 5 arch/parisc/include/asm/Kbuild | 18 arch/parisc/mm/fault.c | 8 arch/powerpc/include/asm/Kbuild | 4 arch/powerpc/mm/book3s64/pkeys.c | 12 arch/powerpc/mm/fault.c | 20 arch/powerpc/platforms/pseries/hotplug-memory.c | 2 arch/riscv/include/asm/Kbuild | 28 arch/riscv/mm/fault.c | 9 arch/s390/include/asm/Kbuild | 15 arch/s390/mm/fault.c | 10 arch/sh/include/asm/Kbuild | 16 arch/sh/mm/fault.c | 13 arch/sparc/include/asm/Kbuild | 14 arch/sparc/mm/fault_32.c | 5 arch/sparc/mm/fault_64.c | 5 arch/um/kernel/trap.c | 3 arch/unicore32/include/asm/Kbuild | 34 arch/unicore32/mm/fault.c | 8 arch/x86/include/asm/Kbuild | 2 arch/x86/include/asm/mmu_context.h | 15 arch/x86/mm/fault.c | 32 arch/xtensa/include/asm/Kbuild | 26 arch/xtensa/mm/fault.c | 5 drivers/base/node.c | 2 drivers/gpu/drm/ttm/ttm_bo_vm.c | 12 fs/fs_parser.c | 2 fs/hugetlbfs/inode.c | 30 fs/ocfs2/alloc.c | 3 fs/ocfs2/cluster/heartbeat.c | 12 fs/ocfs2/cluster/netdebug.c | 4 fs/ocfs2/cluster/tcp.c | 27 fs/ocfs2/cluster/tcp.h | 2 fs/ocfs2/dir.c | 4 fs/ocfs2/dlm/dlmcommon.h | 8 fs/ocfs2/dlm/dlmdebug.c | 100 - fs/ocfs2/dlm/dlmmaster.c | 2 fs/ocfs2/dlm/dlmthread.c | 3 fs/ocfs2/dlmglue.c | 2 fs/ocfs2/journal.c | 2 
fs/ocfs2/namei.c | 15 fs/ocfs2/ocfs2_fs.h | 18 fs/ocfs2/refcounttree.c | 2 fs/ocfs2/reservations.c | 3 fs/ocfs2/stackglue.c | 2 fs/ocfs2/suballoc.c | 5 fs/ocfs2/super.c | 46 fs/pipe.c | 2 fs/userfaultfd.c | 64 - include/asm-generic/Kbuild | 52 + include/linux/cgroup-defs.h | 5 include/linux/fs.h | 5 include/linux/gfp.h | 6 include/linux/huge_mm.h | 10 include/linux/hugetlb.h | 76 + include/linux/hugetlb_cgroup.h | 175 +++ include/linux/kasan.h | 2 include/linux/kthread.h | 3 include/linux/memcontrol.h | 66 - include/linux/mempolicy.h | 29 include/linux/mm.h | 243 +++- include/linux/mm_types.h | 7 include/linux/mmzone.h | 6 include/linux/page_ref.h | 9 include/linux/pagemap.h | 29 include/linux/sched/signal.h | 18 include/linux/swap.h | 1 include/linux/topology.h | 17 include/trace/events/mmap.h | 48 include/uapi/linux/mman.h | 5 kernel/cgroup/cgroup.c | 17 kernel/fork.c | 9 kernel/sysctl.c | 31 lib/test_kasan.c | 19 mm/Makefile | 1 mm/compaction.c | 31 mm/debug.c | 54 - mm/filemap.c | 77 - mm/gup.c | 682 ++++++++++--- mm/gup_benchmark.c | 71 + mm/huge_memory.c | 29 mm/hugetlb.c | 866 ++++++++++++----- mm/hugetlb_cgroup.c | 347 +++++- mm/internal.h | 32 mm/kasan/common.c | 26 mm/kasan/generic.c | 9 mm/kasan/generic_report.c | 11 mm/kasan/kasan.h | 2 mm/kasan/report.c | 5 mm/kasan/tags.c | 9 mm/kasan/tags_report.c | 11 mm/khugepaged.c | 4 mm/kmemleak.c | 2 mm/list_lru.c | 12 mm/mapping_dirty_helpers.c | 42 mm/memblock.c | 2 mm/memcontrol.c | 378 ++++--- mm/memory-failure.c | 29 mm/memory.c | 4 mm/mempolicy.c | 73 + mm/migrate.c | 25 mm/mmap.c | 32 mm/mremap.c | 92 + mm/page-writeback.c | 19 mm/page_alloc.c | 82 - mm/page_counter.c | 29 mm/page_ext.c | 2 mm/rmap.c | 39 mm/shuffle.c | 2 mm/slab.h | 32 mm/slab_common.c | 2 mm/slub.c | 27 mm/sparse.c | 33 mm/swap.c | 5 mm/swap_slots.c | 12 mm/swap_state.c | 2 mm/swapfile.c | 10 mm/userfaultfd.c | 11 mm/vmpressure.c | 8 mm/vmscan.c | 111 -- mm/vmstat.c | 2 scripts/spelling.txt | 21 tools/accounting/getdelays.c | 2 
tools/testing/selftests/vm/.gitignore | 1 tools/testing/selftests/vm/Makefile | 2 tools/testing/selftests/vm/charge_reserved_hugetlb.sh | 575 +++++++++++ tools/testing/selftests/vm/gup_benchmark.c | 15 tools/testing/selftests/vm/hugetlb_reparenting_test.sh | 244 ++++ tools/testing/selftests/vm/map_hugetlb.c | 14 tools/testing/selftests/vm/mlock2-tests.c | 233 ---- tools/testing/selftests/vm/mremap_dontunmap.c | 313 ++++++ tools/testing/selftests/vm/run_vmtests | 37 tools/testing/selftests/vm/write_hugetlb_memory.sh | 23 tools/testing/selftests/vm/write_to_hugetlbfs.c | 242 ++++ 165 files changed, 5020 insertions(+), 2376 deletions(-)
- a lot more of MM, quite a bit more yet to come. - various other subsystems 166 patches based on 7e63420847ae5f1036e4f7c42f0b3282e73efbc2. Subsystems affected by this patch series: mm/memcg mm/pagemap mm/vmalloc mm/pagealloc mm/migration mm/thp mm/ksm mm/madvise mm/virtio mm/userfaultfd mm/memory-hotplug mm/shmem mm/rmap mm/zswap mm/zsmalloc mm/cleanups procfs misc MAINTAINERS bitops lib checkpatch epoll binfmt kallsyms reiserfs kmod gcov kconfig kcov ubsan fault-injection ipc Subsystem: mm/memcg Chris Down <chris@chrisdown.name>: mm, memcg: bypass high reclaim iteration for cgroup hierarchy root Subsystem: mm/pagemap Li Xinhai <lixinhai.lxh@gmail.com>: Patch series "mm: Fix misuse of parent anon_vma in dup_mmap path": mm: don't prepare anon_vma if vma has VM_WIPEONFORK Revert "mm/rmap.c: reuse mergeable anon_vma as parent when fork" mm: set vm_next and vm_prev to NULL in vm_area_dup() Anshuman Khandual <anshuman.khandual@arm.com>: Patch series "mm/vma: Use all available wrappers when possible", v2: mm/vma: add missing VMA flag readable name for VM_SYNC mm/vma: make vma_is_accessible() available for general use mm/vma: replace all remaining open encodings with is_vm_hugetlb_page() mm/vma: replace all remaining open encodings with vma_is_anonymous() mm/vma: append unlikely() while testing VMA access permissions Subsystem: mm/vmalloc Qiujun Huang <hqjagain@gmail.com>: mm/vmalloc: fix a typo in comment Subsystem: mm/pagealloc Michal Hocko <mhocko@suse.com>: mm: make it clear that gfp reclaim modifiers are valid only for sleepable allocations Subsystem: mm/migration Wei Yang <richardw.yang@linux.intel.com>: Patch series "cleanup on do_pages_move()", v5: mm/migrate.c: no need to check for i > start in do_pages_move() mm/migrate.c: wrap do_move_pages_to_node() and store_status() mm/migrate.c: check pagelist in move_pages_and_store_status() mm/migrate.c: unify "not queued for migration" handling in do_pages_move() Yang Shi <yang.shi@linux.alibaba.com>: mm/migrate.c: 
migrate PG_readahead flag Subsystem: mm/thp David Rientjes <rientjes@google.com>: mm, shmem: add vmstat for hugepage fallback mm, thp: track fallbacks due to failed memcg charges separately "Matthew Wilcox (Oracle)" <willy@infradead.org>: include/linux/pagemap.h: optimise find_subpage for !THP mm: remove CONFIG_TRANSPARENT_HUGE_PAGECACHE Subsystem: mm/ksm Li Chen <chenli@uniontech.com>: mm/ksm.c: update get_user_pages() argument in comment Subsystem: mm/madvise Huang Ying <ying.huang@intel.com>: mm: code cleanup for MADV_FREE Subsystem: mm/virtio Alexander Duyck <alexander.h.duyck@linux.intel.com>: Patch series "mm / virtio: Provide support for free page reporting", v17: mm: adjust shuffle code to allow for future coalescing mm: use zone and order instead of free area in free_list manipulators mm: add function __putback_isolated_page mm: introduce Reported pages virtio-balloon: pull page poisoning config out of free page hinting virtio-balloon: add support for providing free page reports to host mm/page_reporting: rotate reported pages to the tail of the list mm/page_reporting: add budget limit on how many pages can be reported per pass mm/page_reporting: add free page reporting documentation David Hildenbrand <david@redhat.com>: virtio-balloon: switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM Subsystem: mm/userfaultfd Shaohua Li <shli@fb.com>: Patch series "userfaultfd: write protection support", v6: userfaultfd: wp: add helper for writeprotect check Andrea Arcangeli <aarcange@redhat.com>: userfaultfd: wp: hook userfault handler to write protection fault userfaultfd: wp: add WP pagetable tracking to x86 userfaultfd: wp: userfaultfd_pte/huge_pmd_wp() helpers userfaultfd: wp: add UFFDIO_COPY_MODE_WP Peter Xu <peterx@redhat.com>: mm: merge parameters for change_protection() userfaultfd: wp: apply _PAGE_UFFD_WP bit userfaultfd: wp: drop _PAGE_UFFD_WP properly when fork userfaultfd: wp: add pmd_swp_*uffd_wp() helpers userfaultfd: wp: support swap and page 
migration khugepaged: skip collapse if uffd-wp detected Shaohua Li <shli@fb.com>: userfaultfd: wp: support write protection for userfault vma range Andrea Arcangeli <aarcange@redhat.com>: userfaultfd: wp: add the writeprotect API to userfaultfd ioctl Shaohua Li <shli@fb.com>: userfaultfd: wp: enabled write protection in userfaultfd API Peter Xu <peterx@redhat.com>: userfaultfd: wp: don't wake up when doing write protect Martin Cracauer <cracauer@cons.org>: userfaultfd: wp: UFFDIO_REGISTER_MODE_WP documentation update Peter Xu <peterx@redhat.com>: userfaultfd: wp: declare _UFFDIO_WRITEPROTECT conditionally userfaultfd: selftests: refactor statistics userfaultfd: selftests: add write-protect test Subsystem: mm/memory-hotplug David Hildenbrand <david@redhat.com>: Patch series "mm: drop superfluous section checks when onlining/offlining": drivers/base/memory.c: drop section_count drivers/base/memory.c: drop pages_correctly_probed() mm/page_ext.c: drop pfn_present() check when onlining Baoquan He <bhe@redhat.com>: mm/memory_hotplug.c: only respect mem= parameter during boot stage David Hildenbrand <david@redhat.com>: mm/memory_hotplug.c: simplify calculation of number of pages in __remove_pages() mm/memory_hotplug.c: cleanup __add_pages() Baoquan He <bhe@redhat.com>: Patch series "mm/hotplug: Only use subsection map for VMEMMAP", v4: mm/sparse.c: introduce new function fill_subsection_map() mm/sparse.c: introduce a new function clear_subsection_map() mm/sparse.c: only use subsection map in VMEMMAP case mm/sparse.c: add note about only VMEMMAP supporting sub-section hotplug mm/sparse.c: move subsection_map related functions together David Hildenbrand <david@redhat.com>: Patch series "mm/memory_hotplug: allow to specify a default online_type", v3: drivers/base/memory: rename MMOP_ONLINE_KEEP to MMOP_ONLINE drivers/base/memory: map MMOP_OFFLINE to 0 drivers/base/memory: store mapping between MMOP_* and string in an array powernv/memtrace: always online added memory blocks 
hv_balloon: don't check for memhp_auto_online manually mm/memory_hotplug: unexport memhp_auto_online mm/memory_hotplug: convert memhp_auto_online to store an online_type mm/memory_hotplug: allow to specify a default online_type chenqiwu <chenqiwu@xiaomi.com>: mm/memory_hotplug.c: use __pfn_to_section() instead of open-coding Subsystem: mm/shmem Kees Cook <keescook@chromium.org>: mm/shmem.c: distribute switch variables for initialization Mateusz Nosek <mateusznosek0@gmail.com>: mm/shmem.c: clean code by removing unnecessary assignment Hugh Dickins <hughd@google.com>: mm: huge tmpfs: try to split_huge_page() when punching hole Subsystem: mm/rmap Palmer Dabbelt <palmerdabbelt@google.com>: mm: prevent a warning when casting void* -> enum Subsystem: mm/zswap "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>: mm/zswap: allow setting default status, compressor and allocator in Kconfig Subsystem: mm/zsmalloc Subsystem: mm/cleanups Jules Irenge <jbi.octave@gmail.com>: mm/compaction: add missing annotation for compact_lock_irqsave mm/hugetlb: add missing annotation for gather_surplus_pages() mm/mempolicy: add missing annotation for queue_pages_pmd() mm/slub: add missing annotation for get_map() mm/slub: add missing annotation for put_map() mm/zsmalloc: add missing annotation for migrate_read_lock() mm/zsmalloc: add missing annotation for migrate_read_unlock() mm/zsmalloc: add missing annotation for pin_tag() mm/zsmalloc: add missing annotation for unpin_tag() chenqiwu <chenqiwu@xiaomi.com>: mm: fix ambiguous comments for better code readability Mateusz Nosek <mateusznosek0@gmail.com>: mm/mm_init.c: clean code. 
Use BUILD_BUG_ON when comparing compile time constant Joe Perches <joe@perches.com>: mm: use fallthrough; Steven Price <steven.price@arm.com>: include/linux/swapops.h: correct guards for non_swap_entry() Ira Weiny <ira.weiny@intel.com>: include/linux/memremap.h: remove stale comments Mateusz Nosek <mateusznosek0@gmail.com>: mm/dmapool.c: micro-optimisation remove unnecessary branch Waiman Long <longman@redhat.com>: mm: remove dummy struct bootmem_data/bootmem_data_t Subsystem: procfs Jules Irenge <jbi.octave@gmail.com>: fs/proc/inode.c: annotate close_pdeo() for sparse Alexey Dobriyan <adobriyan@gmail.com>: proc: faster open/read/close with "permanent" files proc: speed up /proc/*/statm "Matthew Wilcox (Oracle)" <willy@infradead.org>: proc: inline vma_stop into m_stop proc: remove m_cache_vma proc: use ppos instead of m->version seq_file: remove m->version proc: inline m_next_vma into m_next Subsystem: misc Michal Simek <michal.simek@xilinx.com>: asm-generic: fix unistd_32.h generation format Nathan Chancellor <natechancellor@gmail.com>: kernel/extable.c: use address-of operator on section symbols Masahiro Yamada <masahiroy@kernel.org>: sparc,x86: vdso: remove meaningless undefining CONFIG_OPTIMIZE_INLINING compiler: remove CONFIG_OPTIMIZE_INLINING entirely Vegard Nossum <vegard.nossum@oracle.com>: compiler.h: fix error in BUILD_BUG_ON() reporting Subsystem: MAINTAINERS Joe Perches <joe@perches.com>: MAINTAINERS: list the section entries in the preferred order Subsystem: bitops Josh Poimboeuf <jpoimboe@redhat.com>: bitops: always inline sign extension helpers Subsystem: lib Konstantin Khlebnikov <khlebnikov@yandex-team.ru>: lib/test_lockup: test module to generate lockups Colin Ian King <colin.king@canonical.com>: lib/test_lockup.c: fix spelling mistake "iteraions" -> "iterations" Konstantin Khlebnikov <khlebnikov@yandex-team.ru>: lib/test_lockup.c: add parameters for locking generic vfs locks "Gustavo A. R. 
Silva" <gustavo@embeddedor.com>: lib/bch.c: replace zero-length array with flexible-array member lib/ts_bm.c: replace zero-length array with flexible-array member lib/ts_fsm.c: replace zero-length array with flexible-array member lib/ts_kmp.c: replace zero-length array with flexible-array member Geert Uytterhoeven <geert+renesas@glider.be>: lib/scatterlist: fix sg_copy_buffer() kerneldoc Kees Cook <keescook@chromium.org>: lib: test_stackinit.c: XFAIL switch variable init tests Alexander Potapenko <glider@google.com>: lib/stackdepot.c: check depot_index before accessing the stack slab lib/stackdepot.c: fix a condition in stack_depot_fetch() lib/stackdepot.c: build with -fno-builtin kasan: stackdepot: move filter_irq_stacks() to stackdepot.c Qian Cai <cai@lca.pw>: percpu_counter: fix a data race at vm_committed_as Andy Shevchenko <andriy.shevchenko@linux.intel.com>: lib/test_bitmap.c: make use of EXP2_IN_BITS chenqiwu <chenqiwu@xiaomi.com>: lib/rbtree: fix coding style of assignments Dan Carpenter <dan.carpenter@oracle.com>: lib/test_kmod.c: remove a NULL test Rikard Falkeborn <rikard.falkeborn@gmail.com>: linux/bits.h: add compile time sanity check of GENMASK inputs Chris Wilson <chris@chris-wilson.co.uk>: lib/list: prevent compiler reloads inside 'safe' list iteration Nathan Chancellor <natechancellor@gmail.com>: lib/dynamic_debug.c: use address-of operator on section symbols Subsystem: checkpatch Joe Perches <joe@perches.com>: checkpatch: remove email address comment from email address comparisons Lubomir Rintel <lkundrak@v3.sk>: checkpatch: check SPDX tags in YAML files John Hubbard <jhubbard@nvidia.com>: checkpatch: support "base-commit:" format Joe Perches <joe@perches.com>: checkpatch: prefer fallthrough; over fallthrough comments Antonio Borneo <borneo.antonio@gmail.com>: checkpatch: fix minor typo and mixed space+tab in indentation checkpatch: fix multiple const * types checkpatch: add command-line option for TAB size Joe Perches <joe@perches.com>: 
checkpatch: improve Gerrit Change-Id: test Lubomir Rintel <lkundrak@v3.sk>: checkpatch: check proper licensing of Devicetree bindings Joe Perches <joe@perches.com>: checkpatch: avoid warning about uninitialized_var() Subsystem: epoll Roman Penyaev <rpenyaev@suse.de>: kselftest: introduce new epoll test case Jason Baron <jbaron@akamai.com>: fs/epoll: make nesting accounting safe for -rt kernel Subsystem: binfmt Alexey Dobriyan <adobriyan@gmail.com>: fs/binfmt_elf.c: delete "loc" variable fs/binfmt_elf.c: allocate less for static executable fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path Subsystem: kallsyms Will Deacon <will@kernel.org>: Patch series "Unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()": samples/hw_breakpoint: drop HW_BREAKPOINT_R when reporting writes samples/hw_breakpoint: drop use of kallsyms_lookup_name() kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol() Subsystem: reiserfs Colin Ian King <colin.king@canonical.com>: reiserfs: clean up several indentation issues Subsystem: kmod Qiujun Huang <hqjagain@gmail.com>: kernel/kmod.c: fix a typo "assuems" -> "assumes" Subsystem: gcov "Gustavo A. R. 
Silva" <gustavo@embeddedor.com>: gcov: gcc_4_7: replace zero-length array with flexible-array member gcov: gcc_3_4: replace zero-length array with flexible-array member kernel/gcov/fs.c: replace zero-length array with flexible-array member Subsystem: kconfig Krzysztof Kozlowski <krzk@kernel.org>: init/Kconfig: clean up ANON_INODES and old IO schedulers options Subsystem: kcov Andrey Konovalov <andreyknvl@google.com>: Patch series "kcov: collect coverage from usb soft interrupts", v4: kcov: cleanup debug messages kcov: fix potential use-after-free in kcov_remote_start kcov: move t->kcov assignments into kcov_start/stop kcov: move t->kcov_sequence assignment kcov: use t->kcov_mode as enabled indicator kcov: collect coverage from interrupts usb: core: kcov: collect coverage from usb complete callback Subsystem: ubsan Kees Cook <keescook@chromium.org>: Patch series "ubsan: Split out bounds checker", v5: ubsan: add trap instrumentation option ubsan: split "bounds" checker from other options drivers/misc/lkdtm/bugs.c: add arithmetic overflow and array bounds checks ubsan: check panic_on_warn kasan: unset panic_on_warn before calling panic() ubsan: include bug type in report header Subsystem: fault-injection Qiujun Huang <hqjagain@gmail.com>: lib/Kconfig.debug: fix a typo "capabilitiy" -> "capability" Subsystem: ipc Somala Swaraj <somalaswaraj@gmail.com>: ipc/mqueue.c: fix a brace coding style issue Jason Yan <yanaijie@huawei.com>: ipc/shm.c: make compat_ksys_shmctl() static Documentation/admin-guide/kernel-parameters.txt | 13 Documentation/admin-guide/mm/transhuge.rst | 14 Documentation/admin-guide/mm/userfaultfd.rst | 51 Documentation/dev-tools/kcov.rst | 17 Documentation/vm/free_page_reporting.rst | 41 Documentation/vm/zswap.rst | 20 MAINTAINERS | 35 arch/alpha/include/asm/mmzone.h | 2 arch/alpha/kernel/syscalls/syscallhdr.sh | 2 arch/csky/mm/fault.c | 4 arch/ia64/kernel/syscalls/syscallhdr.sh | 2 arch/ia64/kernel/vmlinux.lds.S | 2 arch/m68k/mm/fault.c | 4 
arch/microblaze/kernel/syscalls/syscallhdr.sh | 2 arch/mips/kernel/syscalls/syscallhdr.sh | 3 arch/mips/mm/fault.c | 4 arch/nds32/kernel/vmlinux.lds.S | 1 arch/parisc/kernel/syscalls/syscallhdr.sh | 2 arch/powerpc/kernel/syscalls/syscallhdr.sh | 3 arch/powerpc/kvm/e500_mmu_host.c | 2 arch/powerpc/mm/fault.c | 2 arch/powerpc/platforms/powernv/memtrace.c | 14 arch/sh/kernel/syscalls/syscallhdr.sh | 2 arch/sh/mm/fault.c | 2 arch/sparc/kernel/syscalls/syscallhdr.sh | 2 arch/sparc/vdso/vdso32/vclock_gettime.c | 4 arch/x86/Kconfig | 1 arch/x86/configs/i386_defconfig | 1 arch/x86/configs/x86_64_defconfig | 1 arch/x86/entry/vdso/vdso32/vclock_gettime.c | 4 arch/x86/include/asm/pgtable.h | 67 + arch/x86/include/asm/pgtable_64.h | 8 arch/x86/include/asm/pgtable_types.h | 12 arch/x86/mm/fault.c | 2 arch/xtensa/kernel/syscalls/syscallhdr.sh | 2 drivers/base/memory.c | 138 -- drivers/hv/hv_balloon.c | 25 drivers/misc/lkdtm/bugs.c | 75 + drivers/misc/lkdtm/core.c | 3 drivers/misc/lkdtm/lkdtm.h | 3 drivers/usb/core/hcd.c | 3 drivers/virtio/Kconfig | 1 drivers/virtio/virtio_balloon.c | 190 ++- fs/binfmt_elf.c | 56 fs/eventpoll.c | 64 - fs/proc/array.c | 39 fs/proc/cpuinfo.c | 1 fs/proc/generic.c | 31 fs/proc/inode.c | 188 ++- fs/proc/internal.h | 6 fs/proc/kmsg.c | 1 fs/proc/stat.c | 1 fs/proc/task_mmu.c | 97 - fs/reiserfs/do_balan.c | 2 fs/reiserfs/ioctl.c | 11 fs/reiserfs/namei.c | 10 fs/seq_file.c | 28 fs/userfaultfd.c | 116 + include/asm-generic/pgtable.h | 1 include/asm-generic/pgtable_uffd.h | 66 + include/asm-generic/tlb.h | 3 include/linux/bitops.h | 4 include/linux/bits.h | 22 include/linux/compiler.h | 2 include/linux/compiler_types.h | 11 include/linux/gfp.h | 2 include/linux/huge_mm.h | 2 include/linux/list.h | 50 include/linux/memory.h | 1 include/linux/memory_hotplug.h | 13 include/linux/memremap.h | 2 include/linux/mm.h | 25 include/linux/mm_inline.h | 15 include/linux/mm_types.h | 4 include/linux/mmzone.h | 47 include/linux/page-flags.h | 16 
include/linux/page_reporting.h | 26 include/linux/pagemap.h | 4 include/linux/percpu_counter.h | 4 include/linux/proc_fs.h | 17 include/linux/sched.h | 3 include/linux/seq_file.h | 1 include/linux/shmem_fs.h | 10 include/linux/stackdepot.h | 2 include/linux/swapops.h | 5 include/linux/userfaultfd_k.h | 42 include/linux/vm_event_item.h | 5 include/trace/events/huge_memory.h | 1 include/trace/events/mmflags.h | 1 include/trace/events/vmscan.h | 2 include/uapi/linux/userfaultfd.h | 40 include/uapi/linux/virtio_balloon.h | 1 init/Kconfig | 8 ipc/mqueue.c | 5 ipc/shm.c | 2 ipc/util.c | 1 kernel/configs/tiny.config | 1 kernel/events/core.c | 3 kernel/extable.c | 3 kernel/fork.c | 10 kernel/gcov/fs.c | 2 kernel/gcov/gcc_3_4.c | 6 kernel/gcov/gcc_4_7.c | 2 kernel/kallsyms.c | 2 kernel/kcov.c | 282 +++- kernel/kmod.c | 2 kernel/module.c | 1 kernel/sched/fair.c | 2 lib/Kconfig.debug | 35 lib/Kconfig.ubsan | 51 lib/Makefile | 8 lib/bch.c | 2 lib/dynamic_debug.c | 2 lib/rbtree.c | 4 lib/scatterlist.c | 2 lib/stackdepot.c | 39 lib/test_bitmap.c | 2 lib/test_kmod.c | 2 lib/test_lockup.c | 601 +++++++++- lib/test_stackinit.c | 28 lib/ts_bm.c | 2 lib/ts_fsm.c | 2 lib/ts_kmp.c | 2 lib/ubsan.c | 47 mm/Kconfig | 135 ++ mm/Makefile | 1 mm/compaction.c | 3 mm/dmapool.c | 4 mm/filemap.c | 14 mm/gup.c | 9 mm/huge_memory.c | 36 mm/hugetlb.c | 1 mm/hugetlb_cgroup.c | 6 mm/internal.h | 2 mm/kasan/common.c | 23 mm/kasan/report.c | 10 mm/khugepaged.c | 39 mm/ksm.c | 5 mm/list_lru.c | 2 mm/memcontrol.c | 5 mm/memory-failure.c | 2 mm/memory.c | 42 mm/memory_hotplug.c | 53 mm/mempolicy.c | 11 mm/migrate.c | 122 +- mm/mm_init.c | 2 mm/mmap.c | 10 mm/mprotect.c | 76 - mm/page_alloc.c | 174 ++ mm/page_ext.c | 5 mm/page_isolation.c | 6 mm/page_reporting.c | 384 ++++++ mm/page_reporting.h | 54 mm/rmap.c | 23 mm/shmem.c | 168 +- mm/shuffle.c | 12 mm/shuffle.h | 6 mm/slab_common.c | 1 mm/slub.c | 3 mm/sparse.c | 236 ++- mm/swap.c | 20 mm/swapfile.c | 1 mm/userfaultfd.c | 98 + mm/vmalloc.c | 2 
mm/vmscan.c | 12 mm/vmstat.c | 3 mm/zsmalloc.c | 10 mm/zswap.c | 24 samples/hw_breakpoint/data_breakpoint.c | 11 scripts/Makefile.ubsan | 16 scripts/checkpatch.pl | 155 +- tools/lib/rbtree.c | 4 tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c | 67 + tools/testing/selftests/vm/userfaultfd.c | 233 +++ 174 files changed, 3990 insertions(+), 1399 deletions(-)
Almost all of the rest of MM. Various other things. 35 patches, based on c0cc271173b2e1c2d8d0ceaef14e4dfa79eefc0d. Subsystems affected by this patch series: hfs mm/memcg mm/slab-generic mm/slab mm/pagealloc mm/gup ocfs2 mm/hugetlb mm/pagemap mm/memremap kmod misc seqfile Subsystem: hfs Simon Gander <simon@tuxera.com>: hfsplus: fix crash and filesystem corruption when deleting files Subsystem: mm/memcg Jakub Kicinski <kuba@kernel.org>: mm, memcg: do not high throttle allocators based on wraparound Subsystem: mm/slab-generic Qiujun Huang <hqjagain@gmail.com>: mm, slab_common: fix a typo in comment "eariler"->"earlier" Subsystem: mm/slab Mauro Carvalho Chehab <mchehab+huawei@kernel.org>: docs: mm: slab.h: fix a broken cross-reference Subsystem: mm/pagealloc Randy Dunlap <rdunlap@infradead.org>: mm/page_alloc.c: fix kernel-doc warning Jason Yan <yanaijie@huawei.com>: mm/page_alloc: make pcpu_drain_mutex and pcpu_drain static Subsystem: mm/gup Miles Chen <miles.chen@mediatek.com>: mm/gup: fix null pointer dereference detected by coverity Subsystem: ocfs2 Changwei Ge <chge@linux.alibaba.com>: ocfs2: no need try to truncate file beyond i_size Subsystem: mm/hugetlb Aslan Bakirov <aslan@fb.com>: mm: cma: NUMA node interface Roman Gushchin <guro@fb.com>: mm: hugetlb: optionally allocate gigantic hugepages using cma Subsystem: mm/pagemap Jaewon Kim <jaewon31.kim@samsung.com>: mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area Arjun Roy <arjunroy@google.com>: mm/memory.c: refactor insert_page to prepare for batched-lock insert mm: bring sparc pte_index() semantics inline with other platforms mm: define pte_index as macro for x86 mm/memory.c: add vm_insert_pages() Anshuman Khandual <anshuman.khandual@arm.com>: mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS mm/vma: introduce VM_ACCESS_FLAGS mm/special: create generic fallbacks for pte_special() and pte_mkspecial() Subsystem: mm/memremap Logan Gunthorpe <logang@deltatee.com>: Patch series "Allow setting 
caching mode in arch_add_memory() for P2PDMA", v4: mm/memory_hotplug: drop the flags field from struct mhp_restrictions mm/memory_hotplug: rename mhp_restrictions to mhp_params x86/mm: thread pgprot_t through init_memory_mapping() x86/mm: introduce __set_memory_prot() powerpc/mm: thread pgprot_t through create_section_mapping() mm/memory_hotplug: add pgprot_t to mhp_params mm/memremap: set caching mode for PCI P2PDMA memory to WC Subsystem: kmod Eric Biggers <ebiggers@google.com>: Patch series "module autoloading fixes and cleanups", v5: kmod: make request_module() return an error when autoloading is disabled fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once() docs: admin-guide: document the kernel.modprobe sysctl selftests: kmod: fix handling test numbers above 9 selftests: kmod: test disabling module autoloading Subsystem: misc Pali Rohár <pali@kernel.org>: change email address for Pali Rohár kbuild test robot <lkp@intel.com>: drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings Subsystem: seqfile Vasily Averin <vvs@virtuozzo.com>: Patch series "seq_file .next functions should increase position index": fs/seq_file.c: seq_read(): add info message about buggy .next functions kernel/gcov/fs.c: gcov_seq_next() should increase position index ipc/util.c: sysvipc_find_ipc() should increase position index Documentation/ABI/testing/sysfs-platform-dell-laptop | 8 Documentation/admin-guide/kernel-parameters.txt | 8 Documentation/admin-guide/sysctl/kernel.rst | 21 ++ MAINTAINERS | 16 - arch/alpha/include/asm/page.h | 3 arch/alpha/include/asm/pgtable.h | 2 arch/arc/include/asm/page.h | 2 arch/arm/include/asm/page.h | 4 arch/arm/include/asm/pgtable-2level.h | 2 arch/arm/include/asm/pgtable.h | 15 - arch/arm/mach-omap2/omap-secure.c | 2 arch/arm/mach-omap2/omap-secure.h | 2 arch/arm/mach-omap2/omap-smc.S | 2 arch/arm/mm/fault.c | 2 arch/arm/mm/mmu.c | 14 + arch/arm64/include/asm/page.h | 4 arch/arm64/mm/fault.c | 2 arch/arm64/mm/init.c | 6 
arch/arm64/mm/mmu.c | 7 arch/c6x/include/asm/page.h | 5 arch/csky/include/asm/page.h | 3 arch/csky/include/asm/pgtable.h | 3 arch/h8300/include/asm/page.h | 2 arch/hexagon/include/asm/page.h | 3 arch/hexagon/include/asm/pgtable.h | 2 arch/ia64/include/asm/page.h | 5 arch/ia64/include/asm/pgtable.h | 2 arch/ia64/mm/init.c | 7 arch/m68k/include/asm/mcf_pgtable.h | 10 - arch/m68k/include/asm/motorola_pgtable.h | 2 arch/m68k/include/asm/page.h | 3 arch/m68k/include/asm/sun3_pgtable.h | 2 arch/microblaze/include/asm/page.h | 2 arch/microblaze/include/asm/pgtable.h | 4 arch/mips/include/asm/page.h | 5 arch/mips/include/asm/pgtable.h | 44 +++- arch/nds32/include/asm/page.h | 3 arch/nds32/include/asm/pgtable.h | 9 - arch/nds32/mm/fault.c | 2 arch/nios2/include/asm/page.h | 3 arch/nios2/include/asm/pgtable.h | 3 arch/openrisc/include/asm/page.h | 5 arch/openrisc/include/asm/pgtable.h | 2 arch/parisc/include/asm/page.h | 3 arch/parisc/include/asm/pgtable.h | 2 arch/powerpc/include/asm/book3s/64/hash.h | 3 arch/powerpc/include/asm/book3s/64/radix.h | 3 arch/powerpc/include/asm/page.h | 9 - arch/powerpc/include/asm/page_64.h | 7 arch/powerpc/include/asm/sparsemem.h | 3 arch/powerpc/mm/book3s64/hash_utils.c | 5 arch/powerpc/mm/book3s64/pgtable.c | 7 arch/powerpc/mm/book3s64/pkeys.c | 2 arch/powerpc/mm/book3s64/radix_pgtable.c | 18 +- arch/powerpc/mm/mem.c | 12 - arch/riscv/include/asm/page.h | 3 arch/s390/include/asm/page.h | 3 arch/s390/mm/fault.c | 2 arch/s390/mm/init.c | 9 - arch/sh/include/asm/page.h | 3 arch/sh/mm/init.c | 7 arch/sparc/include/asm/page_32.h | 3 arch/sparc/include/asm/page_64.h | 3 arch/sparc/include/asm/pgtable_32.h | 7 arch/sparc/include/asm/pgtable_64.h | 10 - arch/um/include/asm/pgtable.h | 10 - arch/unicore32/include/asm/page.h | 3 arch/unicore32/include/asm/pgtable.h | 3 arch/unicore32/mm/fault.c | 2 arch/x86/include/asm/page_types.h | 7 arch/x86/include/asm/pgtable.h | 6 arch/x86/include/asm/set_memory.h | 1 arch/x86/kernel/amd_gart_64.c | 3 
arch/x86/kernel/setup.c | 4 arch/x86/mm/init.c | 9 - arch/x86/mm/init_32.c | 19 +- arch/x86/mm/init_64.c | 42 ++-- arch/x86/mm/mm_internal.h | 3 arch/x86/mm/pat/set_memory.c | 13 + arch/x86/mm/pkeys.c | 2 arch/x86/platform/uv/bios_uv.c | 3 arch/x86/um/asm/vm-flags.h | 10 - arch/xtensa/include/asm/page.h | 3 arch/xtensa/include/asm/pgtable.h | 3 drivers/char/hw_random/omap3-rom-rng.c | 4 drivers/dma/tegra20-apb-dma.c | 1 drivers/hwmon/dell-smm-hwmon.c | 4 drivers/platform/x86/dell-laptop.c | 4 drivers/platform/x86/dell-rbtn.c | 4 drivers/platform/x86/dell-rbtn.h | 2 drivers/platform/x86/dell-smbios-base.c | 4 drivers/platform/x86/dell-smbios-smm.c | 2 drivers/platform/x86/dell-smbios.h | 2 drivers/platform/x86/dell-smo8800.c | 2 drivers/platform/x86/dell-wmi.c | 4 drivers/power/supply/bq2415x_charger.c | 4 drivers/power/supply/bq27xxx_battery.c | 2 drivers/power/supply/isp1704_charger.c | 2 drivers/power/supply/rx51_battery.c | 4 drivers/staging/gasket/gasket_core.c | 2 fs/filesystems.c | 4 fs/hfsplus/attributes.c | 4 fs/ocfs2/alloc.c | 4 fs/seq_file.c | 7 fs/udf/ecma_167.h | 2 fs/udf/osta_udf.h | 2 include/linux/cma.h | 14 + include/linux/hugetlb.h | 12 + include/linux/memblock.h | 3 include/linux/memory_hotplug.h | 21 +- include/linux/mm.h | 34 +++ include/linux/power/bq2415x_charger.h | 2 include/linux/slab.h | 2 ipc/util.c | 2 kernel/gcov/fs.c | 2 kernel/kmod.c | 4 mm/cma.c | 16 + mm/gup.c | 3 mm/hugetlb.c | 109 ++++++++++++ mm/memblock.c | 2 mm/memcontrol.c | 3 mm/memory.c | 168 +++++++++++++++++-- mm/memory_hotplug.c | 13 - mm/memremap.c | 17 + mm/mmap.c | 4 mm/mprotect.c | 4 mm/page_alloc.c | 5 mm/slab_common.c | 2 tools/laptop/freefall/freefall.c | 2 tools/testing/selftests/kmod/kmod.sh | 43 ++++ 130 files changed, 710 insertions(+), 370 deletions(-)
A straggler. This patch caused a lot of build errors on a lot of architectures for a long time, but Anshuman believes it's all fixed up now.

1 patch, based on GIT b032227c62939b5481bcd45442b36dfa263f4a7c.

    Anshuman Khandual <anshuman.khandual@arm.com>:
      mm/debug: add tests validating architecture page table helpers

 Documentation/features/debug/debug-vm-pgtable/arch-support.txt | 34
 arch/arc/Kconfig | 1
 arch/arm64/Kconfig | 1
 arch/powerpc/Kconfig | 1
 arch/s390/Kconfig | 1
 arch/x86/Kconfig | 1
 arch/x86/include/asm/pgtable_64.h | 6
 include/linux/mmdebug.h | 5
 init/main.c | 2
 lib/Kconfig.debug | 26
 mm/Makefile | 1
 mm/debug_vm_pgtable.c | 392 ++++++++++
 12 files changed, 471 insertions(+)
15 fixes, based on ae83d0b416db002fe95601e7f97f64b59514d936:

    Masahiro Yamada <masahiroy@kernel.org>:
      sh: fix build error in mm/init.c

    Kees Cook <keescook@chromium.org>:
      slub: avoid redzone when choosing freepointer location

    Peter Xu <peterx@redhat.com>:
      mm/userfaultfd: disable userfaultfd-wp on x86_32

    Bartosz Golaszewski <bgolaszewski@baylibre.com>:
      MAINTAINERS: add an entry for kfifo

    Longpeng <longpeng2@huawei.com>:
      mm/hugetlb: fix a addressing exception caused by huge_pte_offset

    Michal Hocko <mhocko@suse.com>:
      mm, gup: return EINTR when gup is interrupted by fatal signals

    Christophe JAILLET <christophe.jaillet@wanadoo.fr>:
      checkpatch: fix a typo in the regex for $allocFunctions

    George Burgess IV <gbiv@google.com>:
      tools/build: tweak unused value workaround

    Muchun Song <songmuchun@bytedance.com>:
      mm/ksm: fix NULL pointer dereference when KSM zero page is enabled

    Hugh Dickins <hughd@google.com>:
      mm/shmem: fix build without THP

    Jann Horn <jannh@google.com>:
      vmalloc: fix remap_vmalloc_range() bounds checks

    Hugh Dickins <hughd@google.com>:
      shmem: fix possible deadlocks on shmlock_user_lock

    Yang Shi <yang.shi@linux.alibaba.com>:
      mm: shmem: disable interrupt when acquiring info->lock in userfaultfd_copy path

    Sudip Mukherjee <sudipm.mukherjee@gmail.com>:
      coredump: fix null pointer dereference on coredump

    Lucas Stach <l.stach@pengutronix.de>:
      tools/vm: fix cross-compile build

 MAINTAINERS | 7 +++++++
 arch/sh/mm/init.c | 2 +-
 arch/x86/Kconfig | 2 +-
 fs/coredump.c | 2 ++
 fs/proc/vmcore.c | 5 +++--
 include/linux/vmalloc.h | 2 +-
 mm/gup.c | 2 +-
 mm/hugetlb.c | 14 ++++++++------
 mm/ksm.c | 12 ++++++++++--
 mm/shmem.c | 13 ++++++++-----
 mm/slub.c | 12 ++++++++++--
 mm/vmalloc.c | 16 +++++++++++++---
 samples/vfio-mdev/mdpy.c | 2 +-
 scripts/checkpatch.pl | 2 +-
 tools/build/feature/test-sync-compare-and-swap.c | 2 +-
 tools/vm/Makefile | 2 ++
 16 files changed, 70 insertions(+), 27 deletions(-)
14 fixes and one selftest to verify the ipc fixes herein.

15 patches, based on a811c1fa0a02c062555b54651065899437bacdbe:

    Oleg Nesterov <oleg@redhat.com>:
      ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()

    Yafang Shao <laoar.shao@gmail.com>:
      mm, memcg: fix error return value of mem_cgroup_css_alloc()

    David Hildenbrand <david@redhat.com>:
      mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()

    Maciej Grochowski <maciej.grochowski@pm.me>:
      kernel/kcov.c: fix typos in kcov_remote_start documentation

    Ivan Delalande <colona@arista.com>:
      scripts/decodecode: fix trapping instruction formatting

    Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>:
      arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()

    Khazhismel Kumykov <khazhy@google.com>:
      eventpoll: fix missing wakeup for ovflist in ep_poll_callback

    Aymeric Agon-Rambosson <aymeric.agon@yandex.com>:
      scripts/gdb: repair rb_first() and rb_last()

    Waiman Long <longman@redhat.com>:
      mm/slub: fix incorrect interpretation of s->offset

    Filipe Manana <fdmanana@suse.com>:
      percpu: make pcpu_alloc() aware of current gfp context

    Roman Penyaev <rpenyaev@suse.de>:
      kselftests: introduce new epoll60 testcase for catching lost wakeups
      epoll: atomically remove wait entry on wake up

    Qiwu Chen <qiwuchen55@gmail.com>:
      mm/vmscan: remove unnecessary argument description of isolate_lru_pages()

    Kees Cook <keescook@chromium.org>:
      ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST

    Henry Willard <henry.willard@oracle.com>:
      mm: limit boost_watermark on small zones

 arch/x86/kvm/svm/sev.c | 2
 fs/eventpoll.c | 61 ++--
 ipc/mqueue.c | 34 +-
 kernel/kcov.c | 4
 lib/Kconfig.ubsan | 15 -
 mm/memcontrol.c | 15 -
 mm/page_alloc.c | 9
 mm/percpu.c | 14
 mm/slub.c | 45 ++-
 mm/vmscan.c | 1
 scripts/decodecode | 2
 scripts/gdb/linux/rbtree.py | 4
 tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c | 146 ++++++++++
 tools/testing/selftests/wireguard/qemu/debug.config | 1
 14 files changed, 275 insertions(+), 78 deletions(-)
7 fixes, based on 24085f70a6e1b0cb647ec92623284641d8270637:

    Yafang Shao <laoar.shao@gmail.com>:
      mm, memcg: fix inconsistent oom event behavior

    Roman Penyaev <rpenyaev@suse.de>:
      epoll: call final ep_events_available() check under the lock

    Peter Xu <peterx@redhat.com>:
      mm/gup: fix fixup_user_fault() on multiple retries

    Brian Geffon <bgeffon@google.com>:
      userfaultfd: fix remap event with MREMAP_DONTUNMAP

    Vasily Averin <vvs@virtuozzo.com>:
      ipc/util.c: sysvipc_find_ipc() incorrectly updates position index

    Andrey Konovalov <andreyknvl@google.com>:
      kasan: consistently disable debugging features
      kasan: add missing functions declarations to kasan.h

 fs/eventpoll.c | 48 ++++++++++++++++++++++++++------------------
 include/linux/memcontrol.h | 2 +
 ipc/util.c | 12 +++++------
 mm/gup.c | 12 ++++++-----
 mm/kasan/Makefile | 15 +++++++++-----
 mm/kasan/kasan.h | 34 ++++++++++++++++++++++++++++++-
 mm/mremap.c | 2 -
 7 files changed, 86 insertions(+), 39 deletions(-)
5 fixes, based on 444fc5cde64330661bf59944c43844e7d4c2ccd8:

    Qian Cai <cai@lca.pw>:
      mm/z3fold: silence kmemleak false positives of slots

    Hugh Dickins <hughd@google.com>:
      mm,thp: stop leaking unreleased file pages

    Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
      mm: remove VM_BUG_ON(PageSlab()) from page_mapcount()

    Alexander Potapenko <glider@google.com>:
      fs/binfmt_elf.c: allocate initialized memory in fill_thread_core_info()

    Arnd Bergmann <arnd@arndb.de>:
      include/asm-generic/topology.h: guard cpumask_of_node() macro argument

 fs/binfmt_elf.c | 2 +-
 include/asm-generic/topology.h | 2 +-
 include/linux/mm.h | 19 +++++++++++++++----
 mm/khugepaged.c | 1 +
 mm/z3fold.c | 3 +++
 5 files changed, 21 insertions(+), 6 deletions(-)
On Thu, 28 May 2020 13:10:18 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Hmm..
>
> On Wed, May 27, 2020 at 10:20 PM Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > fs/binfmt_elf.c | 2 +-
> > include/asm-generic/topology.h | 2 +-
> > include/linux/mm.h | 19 +++++++++++++++----
> > mm/khugepaged.c | 1 +
> > mm/z3fold.c | 3 +++
> > 5 files changed, 21 insertions(+), 6 deletions(-)
>
> I wonder how you generate that diffstat.
>
> The change to <linux/mm.h> simply doesn't match what you sent me. The
> patch you sent me that changed mm.h had this:
>
> include/linux/mm.h | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> (note 15 lines changed: it's +13 and -2) but now suddenly in your
> overall diffstat you have that
>
> include/linux/mm.h | 19 +++++++++++++++----
>
> with +15/-4.
>
> So your diffstat simply doesn't match what you are sending. What's going on?
>
Bah. I got lazy (didn't want to interrupt an ongoing build) so I
generated the diffstat prior to folding two patches into a single one.
Evidently diffstat isn't as smart as I had assumed!
On Fri, 29 May 2020 13:38:35 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, May 29, 2020 at 1:31 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > Bah. I got lazy (didn't want to interrupt an ongoing build) so I
> > generated the diffstat prior to folding two patches into a single one.
> > Evidently diffstat isn't as smart as I had assumed!
>
> Ahh. Yes - given two patches, diffstat just adds up the line number
> counts for the individual diffs, it doesn't count some kind of
> "combined diff result" line counts.
Stupid diffstat. Means that basically all my diffstats are very wrong.
Thanks for spotting it.
I can fix that...
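
For anyone following along, the mismatch discussed above can be reproduced with a small sketch. It is only an illustration: it uses Python's difflib rather than the real diffstat tool, and the three file versions (v0, v1, v2) are made-up stand-ins, not the actual include/linux/mm.h change. Patch 1 adds two lines, patch 2 rewrites one of them. Adding up the per-patch counts gives a different total than diffing the original file against the folded result.

```python
import difflib


def count_changes(old, new):
    """Return (insertions, deletions) between two lists of lines."""
    ins = dels = 0
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            dels += i2 - i1  # lines that disappear from the old version
        if tag in ("replace", "insert"):
            ins += j2 - j1   # lines that appear in the new version
    return ins, dels


# Hypothetical file contents, one list entry per line.
v0 = ["a", "b", "c"]            # before either patch
v1 = ["a", "b", "x", "y", "c"]  # after patch 1: two lines added
v2 = ["a", "b", "x", "z", "c"]  # after patch 2: "y" rewritten to "z"

p1 = count_changes(v0, v1)
p2 = count_changes(v1, v2)

# What you get from feeding both patches to a per-diff counter.
summed = (p1[0] + p2[0], p1[1] + p2[1])

# What the single folded patch actually contains.
folded = count_changes(v0, v2)

print("summed per-patch counts: +%d/-%d" % summed)
print("folded patch counts:     +%d/-%d" % folded)
```

The line that patch 2 rewrites is counted once as an insertion in patch 1 and again as a deletion plus insertion in patch 2, so the summed figure overstates both columns. That is the same shape as the +15/-4 versus +13/-2 discrepancy above.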
A few little subsystems and a start of a lot of MM patches. 128 patches, based on 9bf9511e3d9f328c03f6f79bfb741c3d18f2f2c0: Subsystems affected by this patch series: squashfs ocfs2 parisc vfs mm/slab-generic mm/slub mm/debug mm/pagecache mm/gup mm/swap mm/memcg mm/pagemap mm/memory-failure mm/vmalloc mm/kasan Subsystem: squashfs Philippe Liard <pliard@google.com>: squashfs: migrate from ll_rw_block usage to BIO Subsystem: ocfs2 Jules Irenge <jbi.octave@gmail.com>: ocfs2: add missing annotation for dlm_empty_lockres() Gang He <ghe@suse.com>: ocfs2: mount shared volume without ha stack Subsystem: parisc Andrew Morton <akpm@linux-foundation.org>: arch/parisc/include/asm/pgtable.h: remove unused `old_pte' Subsystem: vfs Jeff Layton <jlayton@redhat.com>: Patch series "vfs: have syncfs() return error when there are writeback: vfs: track per-sb writeback errors and report them to syncfs fs/buffer.c: record blockdev write errors in super_block that it backs Subsystem: mm/slab-generic Vlastimil Babka <vbabka@suse.cz>: usercopy: mark dma-kmalloc caches as usercopy caches Subsystem: mm/slub Dongli Zhang <dongli.zhang@oracle.com>: mm/slub.c: fix corrupted freechain in deactivate_slab() Christoph Lameter <cl@linux.com>: slub: Remove userspace notifier for cache add/remove Christopher Lameter <cl@linux.com>: slub: remove kmalloc under list_lock from list_slab_objects() V2 Qian Cai <cai@lca.pw>: mm/slub: fix stack overruns with SLUB_STATS Andrew Morton <akpm@linux-foundation.org>: Documentation/vm/slub.rst: s/Toggle/Enable/ Subsystem: mm/debug Vlastimil Babka <vbabka@suse.cz>: mm, dump_page(): do not crash with invalid mapping pointer Subsystem: mm/pagecache "Matthew Wilcox (Oracle)" <willy@infradead.org>: Patch series "Change readahead API", v11: mm: move readahead prototypes from mm.h mm: return void from various readahead functions mm: ignore return value of ->readpages mm: move readahead nr_pages check into read_pages mm: add new readahead_control API mm: use 
readahead_control to pass arguments mm: rename various 'offset' parameters to 'index' mm: rename readahead loop variable to 'i' mm: remove 'page_offset' from readahead loop mm: put readahead pages in cache earlier mm: add readahead address space operation mm: move end_index check out of readahead loop mm: add page_cache_readahead_unbounded mm: document why we don't set PageReadahead mm: use memalloc_nofs_save in readahead path fs: convert mpage_readpages to mpage_readahead btrfs: convert from readpages to readahead erofs: convert uncompressed files from readpages to readahead erofs: convert compressed files from readpages to readahead ext4: convert from readpages to readahead ext4: pass the inode to ext4_mpage_readpages f2fs: convert from readpages to readahead f2fs: pass the inode to f2fs_mpage_readpages fuse: convert from readpages to readahead iomap: convert from readpages to readahead Guoqing Jiang <guoqing.jiang@cloud.ionos.com>: Patch series "Introduce attach/detach_page_private to cleanup code": include/linux/pagemap.h: introduce attach/detach_page_private md: remove __clear_page_buffers and use attach/detach_page_private btrfs: use attach/detach_page_private fs/buffer.c: use attach/detach_page_private f2fs: use attach/detach_page_private iomap: use attach/detach_page_private ntfs: replace attach_page_buffers with attach_page_private orangefs: use attach/detach_page_private buffer_head.h: remove attach_page_buffers mm/migrate.c: call detach_page_private to cleanup code mm_types.h: change set_page_private to inline function "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm/filemap.c: remove misleading comment Chao Yu <yuchao0@huawei.com>: mm/page-writeback.c: remove unused variable NeilBrown <neilb@suse.de>: mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE mm/writeback: discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead Subsystem: mm/gup Souptick Joarder <jrdr.linux@gmail.com>: mm/gup.c: update the documentation John Hubbard 
<jhubbard@nvidia.com>: mm/gup: introduce pin_user_pages_unlocked ivtv: convert get_user_pages() --> pin_user_pages() Miles Chen <miles.chen@mediatek.com>: mm/gup.c: further document vma_permits_fault() Subsystem: mm/swap chenqiwu <chenqiwu@xiaomi.com>: mm/swapfile: use list_{prev,next}_entry() instead of open-coding Qian Cai <cai@lca.pw>: mm/swap_state: fix a data race in swapin_nr_pages Andrea Righi <andrea.righi@canonical.com>: mm: swap: properly update readahead statistics in unuse_pte_range() Wei Yang <richard.weiyang@gmail.com>: mm/swapfile.c: offset is only used when there is more slots mm/swapfile.c: explicitly show ssd/non-ssd is handled mutually exclusive mm/swapfile.c: remove the unnecessary goto for SSD case mm/swapfile.c: simplify the calculation of n_goal mm/swapfile.c: remove the extra check in scan_swap_map_slots() mm/swapfile.c: found_free could be represented by (tmp < max) mm/swapfile.c: tmp is always smaller than max mm/swapfile.c: omit a duplicate code by compare tmp and max first Huang Ying <ying.huang@intel.com>: swap: try to scan more free slots even when fragmented Wei Yang <richard.weiyang@gmail.com>: mm/swapfile.c: classify SWAP_MAP_XXX to make it more readable mm/swapfile.c: __swap_entry_free() always free 1 entry Huang Ying <ying.huang@intel.com>: mm/swapfile.c: use prandom_u32_max() swap: reduce lock contention on swap cache from swap slots allocation Randy Dunlap <rdunlap@infradead.org>: mm: swapfile: fix /proc/swaps heading and Size/Used/Priority alignment Miaohe Lin <linmiaohe@huawei.com>: include/linux/swap.h: delete meaningless __add_to_swap_cache() declaration Subsystem: mm/memcg Yafang Shao <laoar.shao@gmail.com>: mm, memcg: add workingset_restore in memory.stat Kaixu Xia <kaixuxia@tencent.com>: mm: memcontrol: simplify value comparison between count and limit Shakeel Butt <shakeelb@google.com>: memcg: expose root cgroup's memory.stat Jakub Kicinski <kuba@kernel.org>: Patch series "memcg: Slow down swap allocation as the 
available space gets: mm/memcg: prepare for swap over-high accounting and penalty calculation mm/memcg: move penalty delay clamping out of calculate_high_delay() mm/memcg: move cgroup high memory limit setting into struct page_counter mm/memcg: automatically penalize tasks with high swap use Zefan Li <lizefan@huawei.com>: memcg: fix memcg_kmem_bypass() for remote memcg charging Subsystem: mm/pagemap Steven Price <steven.price@arm.com>: Patch series "Fix W+X debug feature on x86": x86: mm: ptdump: calculate effective permissions correctly mm: ptdump: expand type of 'val' in note_page() Huang Ying <ying.huang@intel.com>: /proc/PID/smaps: Add PMD migration entry parsing chenqiwu <chenqiwu@xiaomi.com>: mm/memory: remove unnecessary pte_devmap case in copy_one_pte() Subsystem: mm/memory-failure Wetp Zhang <wetp.zy@linux.alibaba.com>: mm, memory_failure: don't send BUS_MCEERR_AO for action required error Subsystem: mm/vmalloc Christoph Hellwig <hch@lst.de>: Patch series "decruft the vmalloc API", v2: x86/hyperv: use vmalloc_exec for the hypercall page x86: fix vmap arguments in map_irq_stack staging: android: ion: use vmap instead of vm_map_ram staging: media: ipu3: use vmap instead of reimplementing it dma-mapping: use vmap insted of reimplementing it powerpc: add an ioremap_phb helper powerpc: remove __ioremap_at and __iounmap_at mm: remove __get_vm_area mm: unexport unmap_kernel_range_noflush mm: rename CONFIG_PGTABLE_MAPPING to CONFIG_ZSMALLOC_PGTABLE_MAPPING mm: only allow page table mappings for built-in zsmalloc mm: pass addr as unsigned long to vb_free mm: remove vmap_page_range_noflush and vunmap_page_range mm: rename vmap_page_range to map_kernel_range mm: don't return the number of pages from map_kernel_range{,_noflush} mm: remove map_vm_range mm: remove unmap_vmap_area mm: remove the prot argument from vm_map_ram mm: enforce that vmap can't map pages executable gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc mm: remove the pgprot argument to __vmalloc 
mm: remove the prot argument to __vmalloc_node mm: remove both instances of __vmalloc_node_flags mm: remove __vmalloc_node_flags_caller mm: switch the test_vmalloc module to use __vmalloc_node mm: remove vmalloc_user_node_flags arm64: use __vmalloc_node in arch_alloc_vmap_stack powerpc: use __vmalloc_node in alloc_vm_stack s390: use __vmalloc_node in stack_alloc Joerg Roedel <jroedel@suse.de>: Patch series "mm: Get rid of vmalloc_sync_(un)mappings()", v3: mm: add functions to track page directory modifications mm/vmalloc: track which page-table levels were modified mm/ioremap: track which page-table levels were modified x86/mm/64: implement arch_sync_kernel_mappings() x86/mm/32: implement arch_sync_kernel_mappings() mm: remove vmalloc_sync_(un)mappings() x86/mm: remove vmalloc faulting Subsystem: mm/kasan Andrey Konovalov <andreyknvl@google.com>: kasan: fix clang compilation warning due to stack protector Kees Cook <keescook@chromium.org>: ubsan: entirely disable alignment checks under UBSAN_TRAP Jing Xia <jing.xia@unisoc.com>: mm/mm_init.c: report kasan-tag information stored in page->flags Andrey Konovalov <andreyknvl@google.com>: kasan: move kasan_report() into report.c Documentation/admin-guide/cgroup-v2.rst | 24 + Documentation/core-api/cachetlb.rst | 2 Documentation/filesystems/locking.rst | 6 Documentation/filesystems/proc.rst | 4 Documentation/filesystems/vfs.rst | 15 Documentation/vm/slub.rst | 2 arch/arm/configs/omap2plus_defconfig | 2 arch/arm64/include/asm/pgtable.h | 3 arch/arm64/include/asm/vmap_stack.h | 6 arch/arm64/mm/dump.c | 2 arch/parisc/include/asm/pgtable.h | 2 arch/powerpc/include/asm/io.h | 10 arch/powerpc/include/asm/pci-bridge.h | 2 arch/powerpc/kernel/irq.c | 5 arch/powerpc/kernel/isa-bridge.c | 28 + arch/powerpc/kernel/pci_64.c | 56 +- arch/powerpc/mm/ioremap_64.c | 50 -- arch/riscv/include/asm/pgtable.h | 4 arch/riscv/mm/ptdump.c | 2 arch/s390/kernel/setup.c | 9 arch/sh/kernel/cpu/sh4/sq.c | 3 arch/x86/hyperv/hv_init.c | 5 
arch/x86/include/asm/kvm_host.h | 3 arch/x86/include/asm/pgtable-2level_types.h | 2 arch/x86/include/asm/pgtable-3level_types.h | 2 arch/x86/include/asm/pgtable_64_types.h | 2 arch/x86/include/asm/pgtable_types.h | 8 arch/x86/include/asm/switch_to.h | 23 - arch/x86/kernel/irq_64.c | 2 arch/x86/kernel/setup_percpu.c | 6 arch/x86/kvm/svm/sev.c | 3 arch/x86/mm/dump_pagetables.c | 35 + arch/x86/mm/fault.c | 196 ---------- arch/x86/mm/init_64.c | 5 arch/x86/mm/pti.c | 8 arch/x86/mm/tlb.c | 37 - block/blk-core.c | 1 drivers/acpi/apei/ghes.c | 6 drivers/base/node.c | 2 drivers/block/drbd/drbd_bitmap.c | 4 drivers/block/loop.c | 2 drivers/dax/device.c | 1 drivers/gpu/drm/drm_scatter.c | 11 drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c | 2 drivers/lightnvm/pblk-init.c | 5 drivers/md/dm-bufio.c | 4 drivers/md/md-bitmap.c | 12 drivers/media/common/videobuf2/videobuf2-dma-sg.c | 3 drivers/media/common/videobuf2/videobuf2-vmalloc.c | 3 drivers/media/pci/ivtv/ivtv-udma.c | 19 - drivers/media/pci/ivtv/ivtv-yuv.c | 17 drivers/media/pci/ivtv/ivtvfb.c | 4 drivers/mtd/ubi/io.c | 4 drivers/pcmcia/electra_cf.c | 45 -- drivers/scsi/sd_zbc.c | 3 drivers/staging/android/ion/ion_heap.c | 4 drivers/staging/media/ipu3/ipu3-css-pool.h | 4 drivers/staging/media/ipu3/ipu3-dmamap.c | 30 - fs/block_dev.c | 7 fs/btrfs/disk-io.c | 4 fs/btrfs/extent_io.c | 64 --- fs/btrfs/extent_io.h | 3 fs/btrfs/inode.c | 39 -- fs/buffer.c | 23 - fs/erofs/data.c | 41 -- fs/erofs/decompressor.c | 2 fs/erofs/zdata.c | 31 - fs/exfat/inode.c | 7 fs/ext2/inode.c | 10 fs/ext4/ext4.h | 5 fs/ext4/inode.c | 25 - fs/ext4/readpage.c | 25 - fs/ext4/verity.c | 35 - fs/f2fs/data.c | 56 +- fs/f2fs/f2fs.h | 14 fs/f2fs/verity.c | 35 - fs/fat/inode.c | 7 fs/file_table.c | 1 fs/fs-writeback.c | 1 fs/fuse/file.c | 100 +---- fs/gfs2/aops.c | 23 - fs/gfs2/dir.c | 9 fs/gfs2/quota.c | 2 fs/hpfs/file.c | 7 fs/iomap/buffered-io.c | 113 +---- fs/iomap/trace.h | 2 fs/isofs/inode.c | 7 
fs/jfs/inode.c | 7 fs/mpage.c | 38 -- fs/nfs/blocklayout/extent_tree.c | 2 fs/nfs/internal.h | 10 fs/nfs/write.c | 4 fs/nfsd/vfs.c | 9 fs/nilfs2/inode.c | 15 fs/ntfs/aops.c | 2 fs/ntfs/malloc.h | 2 fs/ntfs/mft.c | 2 fs/ocfs2/aops.c | 34 - fs/ocfs2/dlm/dlmmaster.c | 1 fs/ocfs2/ocfs2.h | 4 fs/ocfs2/slot_map.c | 46 +- fs/ocfs2/super.c | 21 + fs/omfs/file.c | 7 fs/open.c | 3 fs/orangefs/inode.c | 32 - fs/proc/meminfo.c | 3 fs/proc/task_mmu.c | 16 fs/qnx6/inode.c | 7 fs/reiserfs/inode.c | 8 fs/squashfs/block.c | 273 +++++++------- fs/squashfs/decompressor.h | 5 fs/squashfs/decompressor_multi.c | 9 fs/squashfs/decompressor_multi_percpu.c | 17 fs/squashfs/decompressor_single.c | 9 fs/squashfs/lz4_wrapper.c | 17 fs/squashfs/lzo_wrapper.c | 17 fs/squashfs/squashfs.h | 4 fs/squashfs/xz_wrapper.c | 51 +- fs/squashfs/zlib_wrapper.c | 63 +-- fs/squashfs/zstd_wrapper.c | 62 +-- fs/sync.c | 6 fs/ubifs/debug.c | 2 fs/ubifs/lprops.c | 2 fs/ubifs/lpt_commit.c | 4 fs/ubifs/orphan.c | 2 fs/udf/inode.c | 7 fs/xfs/kmem.c | 2 fs/xfs/xfs_aops.c | 13 fs/xfs/xfs_buf.c | 2 fs/zonefs/super.c | 7 include/asm-generic/5level-fixup.h | 5 include/asm-generic/pgtable.h | 27 + include/linux/buffer_head.h | 8 include/linux/fs.h | 18 include/linux/iomap.h | 3 include/linux/memcontrol.h | 4 include/linux/mm.h | 67 ++- include/linux/mm_types.h | 6 include/linux/mmzone.h | 1 include/linux/mpage.h | 4 include/linux/page_counter.h | 8 include/linux/pagemap.h | 193 ++++++++++ include/linux/ptdump.h | 3 include/linux/sched.h | 3 include/linux/swap.h | 17 include/linux/vmalloc.h | 49 +- include/linux/zsmalloc.h | 2 include/trace/events/erofs.h | 6 include/trace/events/f2fs.h | 6 include/trace/events/writeback.h | 5 kernel/bpf/core.c | 6 kernel/bpf/syscall.c | 29 - kernel/dma/remap.c | 48 -- kernel/groups.c | 2 kernel/module.c | 3 kernel/notifier.c | 1 kernel/sys.c | 2 kernel/trace/trace.c | 12 lib/Kconfig.ubsan | 2 lib/ioremap.c | 46 +- lib/test_vmalloc.c | 26 - mm/Kconfig | 4 mm/debug.c | 56 ++ mm/fadvise.c 
| 6 mm/filemap.c | 1 mm/gup.c | 77 +++- mm/internal.h | 14 mm/kasan/Makefile | 21 - mm/kasan/common.c | 19 - mm/kasan/report.c | 22 + mm/memcontrol.c | 198 +++++++--- mm/memory-failure.c | 15 mm/memory.c | 2 mm/migrate.c | 9 mm/mm_init.c | 16 mm/nommu.c | 52 +- mm/page-writeback.c | 62 ++- mm/page_alloc.c | 7 mm/percpu.c | 2 mm/ptdump.c | 17 mm/readahead.c | 349 ++++++++++-------- mm/slab_common.c | 3 mm/slub.c | 67 ++- mm/swap_state.c | 5 mm/swapfile.c | 194 ++++++---- mm/util.c | 2 mm/vmalloc.c | 399 ++++++++------------- mm/vmscan.c | 4 mm/vmstat.c | 11 mm/zsmalloc.c | 12 net/bridge/netfilter/ebtables.c | 6 net/ceph/ceph_common.c | 3 sound/core/memalloc.c | 2 sound/core/pcm_memory.c | 2 195 files changed, 2292 insertions(+), 2288 deletions(-)
The local_lock merge made rather a mess of all of this. I'm cooking up a full resend of the same material.
A few little subsystems and a start of a lot of MM patches. 128 patches, based on f359287765c04711ff54fbd11645271d8e5ff763: Subsystems affected by this patch series: squashfs ocfs2 parisc vfs mm/slab-generic mm/slub mm/debug mm/pagecache mm/gup mm/swap mm/memcg mm/pagemap mm/memory-failure mm/vmalloc mm/kasan Subsystem: squashfs Philippe Liard <pliard@google.com>: squashfs: migrate from ll_rw_block usage to BIO Subsystem: ocfs2 Jules Irenge <jbi.octave@gmail.com>: ocfs2: add missing annotation for dlm_empty_lockres() Gang He <ghe@suse.com>: ocfs2: mount shared volume without ha stack Subsystem: parisc Andrew Morton <akpm@linux-foundation.org>: arch/parisc/include/asm/pgtable.h: remove unused `old_pte' Subsystem: vfs Jeff Layton <jlayton@redhat.com>: Patch series "vfs: have syncfs() return error when there are writeback: vfs: track per-sb writeback errors and report them to syncfs fs/buffer.c: record blockdev write errors in super_block that it backs Subsystem: mm/slab-generic Vlastimil Babka <vbabka@suse.cz>: usercopy: mark dma-kmalloc caches as usercopy caches Subsystem: mm/slub Dongli Zhang <dongli.zhang@oracle.com>: mm/slub.c: fix corrupted freechain in deactivate_slab() Christoph Lameter <cl@linux.com>: slub: Remove userspace notifier for cache add/remove Christopher Lameter <cl@linux.com>: slub: remove kmalloc under list_lock from list_slab_objects() V2 Qian Cai <cai@lca.pw>: mm/slub: fix stack overruns with SLUB_STATS Andrew Morton <akpm@linux-foundation.org>: Documentation/vm/slub.rst: s/Toggle/Enable/ Subsystem: mm/debug Vlastimil Babka <vbabka@suse.cz>: mm, dump_page(): do not crash with invalid mapping pointer Subsystem: mm/pagecache "Matthew Wilcox (Oracle)" <willy@infradead.org>: Patch series "Change readahead API", v11: mm: move readahead prototypes from mm.h mm: return void from various readahead functions mm: ignore return value of ->readpages mm: move readahead nr_pages check into read_pages mm: add new readahead_control API mm: use 
readahead_control to pass arguments mm: rename various 'offset' parameters to 'index' mm: rename readahead loop variable to 'i' mm: remove 'page_offset' from readahead loop mm: put readahead pages in cache earlier mm: add readahead address space operation mm: move end_index check out of readahead loop mm: add page_cache_readahead_unbounded mm: document why we don't set PageReadahead mm: use memalloc_nofs_save in readahead path fs: convert mpage_readpages to mpage_readahead btrfs: convert from readpages to readahead erofs: convert uncompressed files from readpages to readahead erofs: convert compressed files from readpages to readahead ext4: convert from readpages to readahead ext4: pass the inode to ext4_mpage_readpages f2fs: convert from readpages to readahead f2fs: pass the inode to f2fs_mpage_readpages fuse: convert from readpages to readahead iomap: convert from readpages to readahead Guoqing Jiang <guoqing.jiang@cloud.ionos.com>: Patch series "Introduce attach/detach_page_private to cleanup code": include/linux/pagemap.h: introduce attach/detach_page_private md: remove __clear_page_buffers and use attach/detach_page_private btrfs: use attach/detach_page_private fs/buffer.c: use attach/detach_page_private f2fs: use attach/detach_page_private iomap: use attach/detach_page_private ntfs: replace attach_page_buffers with attach_page_private orangefs: use attach/detach_page_private buffer_head.h: remove attach_page_buffers mm/migrate.c: call detach_page_private to cleanup code mm_types.h: change set_page_private to inline function "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm/filemap.c: remove misleading comment Chao Yu <yuchao0@huawei.com>: mm/page-writeback.c: remove unused variable NeilBrown <neilb@suse.de>: mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE mm/writeback: discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead Subsystem: mm/gup Souptick Joarder <jrdr.linux@gmail.com>: mm/gup.c: update the documentation John Hubbard 
<jhubbard@nvidia.com>: mm/gup: introduce pin_user_pages_unlocked ivtv: convert get_user_pages() --> pin_user_pages() Miles Chen <miles.chen@mediatek.com>: mm/gup.c: further document vma_permits_fault() Subsystem: mm/swap chenqiwu <chenqiwu@xiaomi.com>: mm/swapfile: use list_{prev,next}_entry() instead of open-coding Qian Cai <cai@lca.pw>: mm/swap_state: fix a data race in swapin_nr_pages Andrea Righi <andrea.righi@canonical.com>: mm: swap: properly update readahead statistics in unuse_pte_range() Wei Yang <richard.weiyang@gmail.com>: mm/swapfile.c: offset is only used when there is more slots mm/swapfile.c: explicitly show ssd/non-ssd is handled mutually exclusive mm/swapfile.c: remove the unnecessary goto for SSD case mm/swapfile.c: simplify the calculation of n_goal mm/swapfile.c: remove the extra check in scan_swap_map_slots() mm/swapfile.c: found_free could be represented by (tmp < max) mm/swapfile.c: tmp is always smaller than max mm/swapfile.c: omit a duplicate code by compare tmp and max first Huang Ying <ying.huang@intel.com>: swap: try to scan more free slots even when fragmented Wei Yang <richard.weiyang@gmail.com>: mm/swapfile.c: classify SWAP_MAP_XXX to make it more readable mm/swapfile.c: __swap_entry_free() always free 1 entry Huang Ying <ying.huang@intel.com>: mm/swapfile.c: use prandom_u32_max() swap: reduce lock contention on swap cache from swap slots allocation Randy Dunlap <rdunlap@infradead.org>: mm: swapfile: fix /proc/swaps heading and Size/Used/Priority alignment Miaohe Lin <linmiaohe@huawei.com>: include/linux/swap.h: delete meaningless __add_to_swap_cache() declaration Subsystem: mm/memcg Yafang Shao <laoar.shao@gmail.com>: mm, memcg: add workingset_restore in memory.stat Kaixu Xia <kaixuxia@tencent.com>: mm: memcontrol: simplify value comparison between count and limit Shakeel Butt <shakeelb@google.com>: memcg: expose root cgroup's memory.stat Jakub Kicinski <kuba@kernel.org>: Patch series "memcg: Slow down swap allocation as the 
available space gets: mm/memcg: prepare for swap over-high accounting and penalty calculation mm/memcg: move penalty delay clamping out of calculate_high_delay() mm/memcg: move cgroup high memory limit setting into struct page_counter mm/memcg: automatically penalize tasks with high swap use Zefan Li <lizefan@huawei.com>: memcg: fix memcg_kmem_bypass() for remote memcg charging Subsystem: mm/pagemap Steven Price <steven.price@arm.com>: Patch series "Fix W+X debug feature on x86": x86: mm: ptdump: calculate effective permissions correctly mm: ptdump: expand type of 'val' in note_page() Huang Ying <ying.huang@intel.com>: /proc/PID/smaps: Add PMD migration entry parsing chenqiwu <chenqiwu@xiaomi.com>: mm/memory: remove unnecessary pte_devmap case in copy_one_pte() Subsystem: mm/memory-failure Wetp Zhang <wetp.zy@linux.alibaba.com>: mm, memory_failure: don't send BUS_MCEERR_AO for action required error Subsystem: mm/vmalloc Christoph Hellwig <hch@lst.de>: Patch series "decruft the vmalloc API", v2: x86/hyperv: use vmalloc_exec for the hypercall page x86: fix vmap arguments in map_irq_stack staging: android: ion: use vmap instead of vm_map_ram staging: media: ipu3: use vmap instead of reimplementing it dma-mapping: use vmap insted of reimplementing it powerpc: add an ioremap_phb helper powerpc: remove __ioremap_at and __iounmap_at mm: remove __get_vm_area mm: unexport unmap_kernel_range_noflush mm: rename CONFIG_PGTABLE_MAPPING to CONFIG_ZSMALLOC_PGTABLE_MAPPING mm: only allow page table mappings for built-in zsmalloc mm: pass addr as unsigned long to vb_free mm: remove vmap_page_range_noflush and vunmap_page_range mm: rename vmap_page_range to map_kernel_range mm: don't return the number of pages from map_kernel_range{,_noflush} mm: remove map_vm_range mm: remove unmap_vmap_area mm: remove the prot argument from vm_map_ram mm: enforce that vmap can't map pages executable gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc mm: remove the pgprot argument to __vmalloc 
mm: remove the prot argument to __vmalloc_node mm: remove both instances of __vmalloc_node_flags mm: remove __vmalloc_node_flags_caller mm: switch the test_vmalloc module to use __vmalloc_node mm: remove vmalloc_user_node_flags arm64: use __vmalloc_node in arch_alloc_vmap_stack powerpc: use __vmalloc_node in alloc_vm_stack s390: use __vmalloc_node in stack_alloc Joerg Roedel <jroedel@suse.de>: Patch series "mm: Get rid of vmalloc_sync_(un)mappings()", v3: mm: add functions to track page directory modifications mm/vmalloc: track which page-table levels were modified mm/ioremap: track which page-table levels were modified x86/mm/64: implement arch_sync_kernel_mappings() x86/mm/32: implement arch_sync_kernel_mappings() mm: remove vmalloc_sync_(un)mappings() x86/mm: remove vmalloc faulting Subsystem: mm/kasan Andrey Konovalov <andreyknvl@google.com>: kasan: fix clang compilation warning due to stack protector Kees Cook <keescook@chromium.org>: ubsan: entirely disable alignment checks under UBSAN_TRAP Jing Xia <jing.xia@unisoc.com>: mm/mm_init.c: report kasan-tag information stored in page->flags Andrey Konovalov <andreyknvl@google.com>: kasan: move kasan_report() into report.c Documentation/admin-guide/cgroup-v2.rst | 24 + Documentation/core-api/cachetlb.rst | 2 Documentation/filesystems/locking.rst | 6 Documentation/filesystems/proc.rst | 4 Documentation/filesystems/vfs.rst | 15 Documentation/vm/slub.rst | 2 arch/arm/configs/omap2plus_defconfig | 2 arch/arm64/include/asm/pgtable.h | 3 arch/arm64/include/asm/vmap_stack.h | 6 arch/arm64/mm/dump.c | 2 arch/parisc/include/asm/pgtable.h | 2 arch/powerpc/include/asm/io.h | 10 arch/powerpc/include/asm/pci-bridge.h | 2 arch/powerpc/kernel/irq.c | 5 arch/powerpc/kernel/isa-bridge.c | 28 + arch/powerpc/kernel/pci_64.c | 56 +- arch/powerpc/mm/ioremap_64.c | 50 -- arch/riscv/include/asm/pgtable.h | 4 arch/riscv/mm/ptdump.c | 2 arch/s390/kernel/setup.c | 9 arch/sh/kernel/cpu/sh4/sq.c | 3 arch/x86/hyperv/hv_init.c | 5 
arch/x86/include/asm/kvm_host.h | 3 arch/x86/include/asm/pgtable-2level_types.h | 2 arch/x86/include/asm/pgtable-3level_types.h | 2 arch/x86/include/asm/pgtable_64_types.h | 2 arch/x86/include/asm/pgtable_types.h | 8 arch/x86/include/asm/switch_to.h | 23 - arch/x86/kernel/irq_64.c | 2 arch/x86/kernel/setup_percpu.c | 6 arch/x86/kvm/svm/sev.c | 3 arch/x86/mm/dump_pagetables.c | 35 + arch/x86/mm/fault.c | 196 ---------- arch/x86/mm/init_64.c | 5 arch/x86/mm/pti.c | 8 arch/x86/mm/tlb.c | 37 - block/blk-core.c | 1 drivers/acpi/apei/ghes.c | 6 drivers/base/node.c | 2 drivers/block/drbd/drbd_bitmap.c | 4 drivers/block/loop.c | 2 drivers/dax/device.c | 1 drivers/gpu/drm/drm_scatter.c | 11 drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c | 2 drivers/lightnvm/pblk-init.c | 5 drivers/md/dm-bufio.c | 4 drivers/md/md-bitmap.c | 12 drivers/media/common/videobuf2/videobuf2-dma-sg.c | 3 drivers/media/common/videobuf2/videobuf2-vmalloc.c | 3 drivers/media/pci/ivtv/ivtv-udma.c | 19 - drivers/media/pci/ivtv/ivtv-yuv.c | 17 drivers/media/pci/ivtv/ivtvfb.c | 4 drivers/mtd/ubi/io.c | 4 drivers/pcmcia/electra_cf.c | 45 -- drivers/scsi/sd_zbc.c | 3 drivers/staging/android/ion/ion_heap.c | 4 drivers/staging/media/ipu3/ipu3-css-pool.h | 4 drivers/staging/media/ipu3/ipu3-dmamap.c | 30 - fs/block_dev.c | 7 fs/btrfs/disk-io.c | 4 fs/btrfs/extent_io.c | 64 --- fs/btrfs/extent_io.h | 3 fs/btrfs/inode.c | 39 -- fs/buffer.c | 23 - fs/erofs/data.c | 41 -- fs/erofs/decompressor.c | 2 fs/erofs/zdata.c | 31 - fs/exfat/inode.c | 7 fs/ext2/inode.c | 10 fs/ext4/ext4.h | 5 fs/ext4/inode.c | 25 - fs/ext4/readpage.c | 25 - fs/ext4/verity.c | 35 - fs/f2fs/data.c | 56 +- fs/f2fs/f2fs.h | 14 fs/f2fs/verity.c | 35 - fs/fat/inode.c | 7 fs/file_table.c | 1 fs/fs-writeback.c | 1 fs/fuse/file.c | 100 +---- fs/gfs2/aops.c | 23 - fs/gfs2/dir.c | 9 fs/gfs2/quota.c | 2 fs/hpfs/file.c | 7 fs/iomap/buffered-io.c | 113 +---- fs/iomap/trace.h | 2 fs/isofs/inode.c | 7 
fs/jfs/inode.c | 7 fs/mpage.c | 38 -- fs/nfs/blocklayout/extent_tree.c | 2 fs/nfs/internal.h | 10 fs/nfs/write.c | 4 fs/nfsd/vfs.c | 9 fs/nilfs2/inode.c | 15 fs/ntfs/aops.c | 2 fs/ntfs/malloc.h | 2 fs/ntfs/mft.c | 2 fs/ocfs2/aops.c | 34 - fs/ocfs2/dlm/dlmmaster.c | 1 fs/ocfs2/ocfs2.h | 4 fs/ocfs2/slot_map.c | 46 +- fs/ocfs2/super.c | 21 + fs/omfs/file.c | 7 fs/open.c | 3 fs/orangefs/inode.c | 32 - fs/proc/meminfo.c | 3 fs/proc/task_mmu.c | 16 fs/qnx6/inode.c | 7 fs/reiserfs/inode.c | 8 fs/squashfs/block.c | 273 +++++++------- fs/squashfs/decompressor.h | 5 fs/squashfs/decompressor_multi.c | 9 fs/squashfs/decompressor_multi_percpu.c | 17 fs/squashfs/decompressor_single.c | 9 fs/squashfs/lz4_wrapper.c | 17 fs/squashfs/lzo_wrapper.c | 17 fs/squashfs/squashfs.h | 4 fs/squashfs/xz_wrapper.c | 51 +- fs/squashfs/zlib_wrapper.c | 63 +-- fs/squashfs/zstd_wrapper.c | 62 +-- fs/sync.c | 6 fs/ubifs/debug.c | 2 fs/ubifs/lprops.c | 2 fs/ubifs/lpt_commit.c | 4 fs/ubifs/orphan.c | 2 fs/udf/inode.c | 7 fs/xfs/kmem.c | 2 fs/xfs/xfs_aops.c | 13 fs/xfs/xfs_buf.c | 2 fs/zonefs/super.c | 7 include/asm-generic/5level-fixup.h | 5 include/asm-generic/pgtable.h | 27 + include/linux/buffer_head.h | 8 include/linux/fs.h | 18 include/linux/iomap.h | 3 include/linux/memcontrol.h | 4 include/linux/mm.h | 67 ++- include/linux/mm_types.h | 6 include/linux/mmzone.h | 1 include/linux/mpage.h | 4 include/linux/page_counter.h | 8 include/linux/pagemap.h | 193 ++++++++++ include/linux/ptdump.h | 3 include/linux/sched.h | 3 include/linux/swap.h | 17 include/linux/vmalloc.h | 49 +- include/linux/zsmalloc.h | 2 include/trace/events/erofs.h | 6 include/trace/events/f2fs.h | 6 include/trace/events/writeback.h | 5 kernel/bpf/core.c | 6 kernel/bpf/syscall.c | 29 - kernel/dma/remap.c | 48 -- kernel/groups.c | 2 kernel/module.c | 3 kernel/notifier.c | 1 kernel/sys.c | 2 kernel/trace/trace.c | 12 lib/Kconfig.ubsan | 2 lib/ioremap.c | 46 +- lib/test_vmalloc.c | 26 - mm/Kconfig | 4 mm/debug.c | 56 ++ mm/fadvise.c 
| 6 mm/filemap.c | 1 mm/gup.c | 77 +++- mm/internal.h | 14 mm/kasan/Makefile | 21 - mm/kasan/common.c | 19 - mm/kasan/report.c | 22 + mm/memcontrol.c | 198 +++++++--- mm/memory-failure.c | 15 mm/memory.c | 2 mm/migrate.c | 9 mm/mm_init.c | 16 mm/nommu.c | 52 +- mm/page-writeback.c | 62 ++- mm/page_alloc.c | 7 mm/percpu.c | 2 mm/ptdump.c | 17 mm/readahead.c | 349 ++++++++++-------- mm/slab_common.c | 3 mm/slub.c | 67 ++- mm/swap_state.c | 5 mm/swapfile.c | 194 ++++++---- mm/util.c | 2 mm/vmalloc.c | 399 ++++++++------------- mm/vmscan.c | 4 mm/vmstat.c | 11 mm/zsmalloc.c | 12 net/bridge/netfilter/ebtables.c | 6 net/ceph/ceph_common.c | 3 sound/core/memalloc.c | 2 sound/core/pcm_memory.c | 2 195 files changed, 2292 insertions(+), 2288 deletions(-)
On Tue, 2 Jun 2020 13:45:49 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Jun 2, 2020 at 1:08 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > The local_lock merge made rather a mess of all of this. I'm
> > cooking up a full resend of the same material.
>
> Hmm. I have no issues with conflicts, and already took your previous series.

Well that's odd.

> I've pushed it out now - does my tree match what you expect?

Yup, thanks.
More mm/ work, plenty more to come.

131 patches, based on d6f9469a03d832dcd17041ed67774ffb5f3e73b3.

Subsystems affected by this patch series:

  mm/slub
  mm/memcg
  mm/gup
  mm/kasan
  mm/pagealloc
  mm/hugetlb
  mm/vmscan
  mm/tools
  mm/mempolicy
  mm/memblock
  mm/hugetlbfs
  mm/thp
  mm/mmap
  mm/kconfig

Subsystem: mm/slub

    Wang Hai <wanghai38@huawei.com>:
      mm/slub: fix a memory leak in sysfs_slab_add()

Subsystem: mm/memcg

    Shakeel Butt <shakeelb@google.com>:
      mm/memcg: optimize memory.numa_stat like memory.stat

Subsystem: mm/gup

    John Hubbard <jhubbard@nvidia.com>:
    Patch series "mm/gup, drm/i915: refactor gup_fast, convert to pin_user_pages()", v2:
      mm/gup: move __get_user_pages_fast() down a few lines in gup.c
      mm/gup: refactor and de-duplicate gup_fast() code
      mm/gup: introduce pin_user_pages_fast_only()
      drm/i915: convert get_user_pages() --> pin_user_pages()
      mm/gup: might_lock_read(mmap_sem) in get_user_pages_fast()

Subsystem: mm/kasan

    Daniel Axtens <dja@axtens.net>:
    Patch series "Fix some incompatibilites between KASAN and FORTIFY_SOURCE", v4:
      kasan: stop tests being eliminated as dead code with FORTIFY_SOURCE
      string.h: fix incompatibility between FORTIFY_SOURCE and KASAN

Subsystem: mm/pagealloc

    Michal Hocko <mhocko@suse.com>:
      mm: clarify __GFP_MEMALLOC usage

    Mike Rapoport <rppt@linux.ibm.com>:
    Patch series "mm: rework free_area_init*() funcitons":
      mm: memblock: replace dereferences of memblock_region.nid with API calls
      mm: make early_pfn_to_nid() and related defintions close to each other
      mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP option
      mm: free_area_init: use maximal zone PFNs rather than zone sizes
      mm: use free_area_init() instead of free_area_init_nodes()
      alpha: simplify detection of memory zone boundaries
      arm: simplify detection of memory zone boundaries
      arm64: simplify detection of memory zone boundaries for UMA configs
      csky: simplify detection of memory zone boundaries
      m68k: mm: simplify detection of memory zone boundaries
      parisc: simplify detection of memory zone boundaries
      sparc32:
simplify detection of memory zone boundaries unicore32: simplify detection of memory zone boundaries xtensa: simplify detection of memory zone boundaries Baoquan He <bhe@redhat.com>: mm: memmap_init: iterate over memblock regions rather that check each PFN Mike Rapoport <rppt@linux.ibm.com>: mm: remove early_pfn_in_nid() and CONFIG_NODES_SPAN_OTHER_NODES mm: free_area_init: allow defining max_zone_pfn in descending order mm: rename free_area_init_node() to free_area_init_memoryless_node() mm: clean up free_area_init_node() and its helpers mm: simplify find_min_pfn_with_active_regions() docs/vm: update memory-models documentation Wei Yang <richard.weiyang@gmail.com>: Patch series "mm/page_alloc.c: cleanup on check page", v3: mm/page_alloc.c: bad_[reason|flags] is not necessary when PageHWPoison mm/page_alloc.c: bad_flags is not necessary for bad_page() mm/page_alloc.c: rename free_pages_check_bad() to check_free_page_bad() mm/page_alloc.c: rename free_pages_check() to check_free_page() mm/page_alloc.c: extract check_[new|free]_page_bad() common part to page_bad_reason() Roman Gushchin <guro@fb.com>: mm,page_alloc,cma: conditionally prefer cma pageblocks for movable allocations Baoquan He <bhe@redhat.com>: mm/page_alloc.c: remove unused free_bootmem_with_active_regions Patch series "improvements about lowmem_reserve and /proc/zoneinfo", v2: mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when changing it mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty mm/vmstat.c: do not show lowmem reserve protection information of empty zone Joonsoo Kim <iamjoonsoo.kim@lge.com>: Patch series "integrate classzone_idx and high_zoneidx", v5: mm/page_alloc: use ac->high_zoneidx for classzone_idx mm/page_alloc: integrate classzone_idx and high_zoneidx Wei Yang <richard.weiyang@gmail.com>: mm/page_alloc.c: use NODE_MASK_NONE in build_zonelists() mm: rename gfpflags_to_migratetype to gfp_migratetype for same convention Sandipan Das 
<sandipan@linux.ibm.com>: mm/page_alloc.c: reset numa stats for boot pagesets Charan Teja Reddy <charante@codeaurora.org>: mm, page_alloc: reset the zone->watermark_boost early Anshuman Khandual <anshuman.khandual@arm.com>: mm/page_alloc: restrict and formalize compound_page_dtors[] Daniel Jordan <daniel.m.jordan@oracle.com>: Patch series "initialize deferred pages with interrupts enabled", v4: mm/pagealloc.c: call touch_nmi_watchdog() on max order boundaries in deferred init Pavel Tatashin <pasha.tatashin@soleen.com>: mm: initialize deferred pages with interrupts enabled mm: call cond_resched() from deferred_init_memmap() Daniel Jordan <daniel.m.jordan@oracle.com>: Patch series "padata: parallelize deferred page init", v3: padata: remove exit routine padata: initialize earlier padata: allocate work structures for parallel jobs from a pool padata: add basic support for multithreaded jobs mm: don't track number of pages during deferred initialization mm: parallelize deferred_init_memmap() mm: make deferred init's max threads arch-specific padata: document multithreaded jobs Chen Tao <chentao107@huawei.com>: mm/page_alloc.c: add missing newline Subsystem: mm/hugetlb "Kirill A. 
Shutemov" <kirill.shutemov@linux.intel.com>: Patch series "thp/khugepaged improvements and CoW semantics", v4: khugepaged: add self test khugepaged: do not stop collapse if less than half PTEs are referenced khugepaged: drain all LRU caches before scanning pages khugepaged: drain LRU add pagevec after swapin khugepaged: allow to collapse a page shared across fork khugepaged: allow to collapse PTE-mapped compound pages thp: change CoW semantics for anon-THP khugepaged: introduce 'max_ptes_shared' tunable Mike Kravetz <mike.kravetz@oracle.com>: Patch series "Clean up hugetlb boot command line processing", v4: hugetlbfs: add arch_hugetlb_valid_size hugetlbfs: move hugepagesz= parsing to arch independent code hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate hugetlbfs: clean up command line processing hugetlbfs: fix changes to command line processing Li Xinhai <lixinhai.lxh@gmail.com>: mm/hugetlb: avoid unnecessary check on pud and pmd entry in huge_pte_offset Anshuman Khandual <anshuman.khandual@arm.com>: Patch series "mm/hugetlb: Add some new generic fallbacks", v3: arm64/mm: drop __HAVE_ARCH_HUGE_PTEP_GET mm/hugetlb: define a generic fallback for is_hugepage_only_range() mm/hugetlb: define a generic fallback for arch_clear_hugepage_flags() "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm: simplify calling a compound page destructor Subsystem: mm/vmscan Wei Yang <richard.weiyang@gmail.com>: mm/vmscan.c: use update_lru_size() in update_lru_sizes() Jaewon Kim <jaewon31.kim@samsung.com>: mm/vmscan: count layzfree pages and fix nr_isolated_* mismatch Maninder Singh <maninder1.s@samsung.com>: mm/vmscan.c: change prototype for shrink_page_list Qiwu Chen <qiwuchen55@gmail.com>: mm/vmscan: update the comment of should_continue_reclaim() Johannes Weiner <hannes@cmpxchg.org>: Patch series "mm: memcontrol: charge swapin pages on instantiation", v2: mm: fix NUMA node file count error in replace_page_cache() mm: memcontrol: fix stat-corrupting race in charge 
moving mm: memcontrol: drop @compound parameter from memcg charging API mm: shmem: remove rare optimization when swapin races with hole punching mm: memcontrol: move out cgroup swaprate throttling mm: memcontrol: convert page cache to a new mem_cgroup_charge() API mm: memcontrol: prepare uncharging for removal of private page type counters mm: memcontrol: prepare move_account for removal of private page type counters mm: memcontrol: prepare cgroup vmstat infrastructure for native anon counters mm: memcontrol: switch to native NR_FILE_PAGES and NR_SHMEM counters mm: memcontrol: switch to native NR_ANON_MAPPED counter mm: memcontrol: switch to native NR_ANON_THPS counter mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API mm: memcontrol: drop unused try/commit/cancel charge API mm: memcontrol: prepare swap controller setup for integration mm: memcontrol: make swap tracking an integral part of memory control mm: memcontrol: charge swapin pages on instantiation Alex Shi <alex.shi@linux.alibaba.com>: mm: memcontrol: document the new swap control behavior Johannes Weiner <hannes@cmpxchg.org>: mm: memcontrol: delete unused lrucare handling mm: memcontrol: update page->mem_cgroup stability rules mm: fix LRU balancing effect of new transparent huge pages mm: keep separate anon and file statistics on page reclaim activity mm: allow swappiness that prefers reclaiming anon over the file workingset mm: fold and remove lru_cache_add_anon() and lru_cache_add_file() mm: workingset: let cache workingset challenge anon mm: remove use-once cache bias from LRU balancing mm: vmscan: drop unnecessary div0 avoidance rounding in get_scan_count() mm: base LRU balancing on an explicit cost model mm: deactivations shouldn't bias the LRU balance mm: only count actual rotations as LRU reclaim cost mm: balance LRU lists based on relative thrashing mm: vmscan: determine anon/file pressure balance at the reclaim root mm: vmscan: reclaim writepage is IO cost mm: vmscan: limit 
the range of LRU type balancing Shakeel Butt <shakeelb@google.com>: mm: swap: fix vmstats for huge pages mm: swap: memcg: fix memcg stats for huge pages Subsystem: mm/tools Changhee Han <ch0.han@lge.com>: tools/vm/page_owner_sort.c: filter out unneeded line Subsystem: mm/mempolicy Michal Hocko <mhocko@suse.com>: mm, mempolicy: fix up gup usage in lookup_node Subsystem: mm/memblock chenqiwu <chenqiwu@xiaomi.com>: include/linux/memblock.h: fix minor typo and unclear comment Mike Rapoport <rppt@linux.ibm.com>: sparc32: register memory occupied by kernel as memblock.memory Subsystem: mm/hugetlbfs Shijie Hu <hushijie3@huawei.com>: hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs Subsystem: mm/thp Yang Shi <yang.shi@linux.alibaba.com>: mm: thp: don't need to drain lru cache when splitting and mlocking THP Anshuman Khandual <anshuman.khandual@arm.com>: Patch series "mm/thp: Rename pmd_mknotpresent() as pmd_mknotvalid()", v2: powerpc/mm: drop platform defined pmd_mknotpresent() mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid() Subsystem: mm/mmap Scott Cheloha <cheloha@linux.vnet.ibm.com>: drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup Subsystem: mm/kconfig Zong Li <zong.li@sifive.com>: Patch series "Extract DEBUG_WX to shared use": mm: add DEBUG_WX support riscv: support DEBUG_WX x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined Documentation/admin-guide/cgroup-v1/memory.rst | 19 Documentation/admin-guide/kernel-parameters.txt | 40 Documentation/admin-guide/mm/hugetlbpage.rst | 35 Documentation/admin-guide/mm/transhuge.rst | 7 Documentation/admin-guide/sysctl/vm.rst | 23 Documentation/core-api/padata.rst | 41 Documentation/features/vm/numa-memblock/arch-support.txt | 34 Documentation/vm/memory-model.rst | 9 Documentation/vm/page_owner.rst | 3 arch/alpha/mm/init.c | 16 arch/alpha/mm/numa.c | 22 arch/arc/include/asm/hugepage.h | 2 arch/arc/mm/init.c | 41 
arch/arm/include/asm/hugetlb.h | 7 arch/arm/include/asm/pgtable-3level.h | 2 arch/arm/mm/init.c | 66 arch/arm64/Kconfig | 2 arch/arm64/Kconfig.debug | 29 arch/arm64/include/asm/hugetlb.h | 13 arch/arm64/include/asm/pgtable.h | 2 arch/arm64/mm/hugetlbpage.c | 48 arch/arm64/mm/init.c | 56 arch/arm64/mm/numa.c | 9 arch/c6x/mm/init.c | 8 arch/csky/kernel/setup.c | 26 arch/h8300/mm/init.c | 6 arch/hexagon/mm/init.c | 6 arch/ia64/Kconfig | 1 arch/ia64/include/asm/hugetlb.h | 5 arch/ia64/mm/contig.c | 2 arch/ia64/mm/discontig.c | 2 arch/m68k/mm/init.c | 6 arch/m68k/mm/mcfmmu.c | 9 arch/m68k/mm/motorola.c | 15 arch/m68k/mm/sun3mmu.c | 10 arch/microblaze/Kconfig | 1 arch/microblaze/mm/init.c | 2 arch/mips/Kconfig | 1 arch/mips/include/asm/hugetlb.h | 11 arch/mips/include/asm/pgtable.h | 2 arch/mips/loongson64/numa.c | 2 arch/mips/mm/init.c | 2 arch/mips/sgi-ip27/ip27-memory.c | 2 arch/nds32/mm/init.c | 11 arch/nios2/mm/init.c | 8 arch/openrisc/mm/init.c | 9 arch/parisc/include/asm/hugetlb.h | 10 arch/parisc/mm/init.c | 22 arch/powerpc/Kconfig | 10 arch/powerpc/include/asm/book3s/64/pgtable.h | 4 arch/powerpc/include/asm/hugetlb.h | 5 arch/powerpc/mm/hugetlbpage.c | 38 arch/powerpc/mm/mem.c | 2 arch/riscv/Kconfig | 2 arch/riscv/include/asm/hugetlb.h | 10 arch/riscv/include/asm/ptdump.h | 11 arch/riscv/mm/hugetlbpage.c | 44 arch/riscv/mm/init.c | 5 arch/s390/Kconfig | 1 arch/s390/include/asm/hugetlb.h | 8 arch/s390/mm/hugetlbpage.c | 34 arch/s390/mm/init.c | 2 arch/sh/Kconfig | 1 arch/sh/include/asm/hugetlb.h | 7 arch/sh/mm/init.c | 2 arch/sparc/Kconfig | 10 arch/sparc/include/asm/hugetlb.h | 10 arch/sparc/mm/init_32.c | 1 arch/sparc/mm/init_64.c | 67 arch/sparc/mm/srmmu.c | 21 arch/um/kernel/mem.c | 12 arch/unicore32/include/asm/memory.h | 2 arch/unicore32/include/mach/memory.h | 6 arch/unicore32/kernel/pci.c | 14 arch/unicore32/mm/init.c | 43 arch/x86/Kconfig | 11 arch/x86/Kconfig.debug | 27 arch/x86/include/asm/hugetlb.h | 10 arch/x86/include/asm/pgtable.h | 2 
arch/x86/mm/hugetlbpage.c | 35 arch/x86/mm/init.c | 2 arch/x86/mm/init_64.c | 12 arch/x86/mm/kmmio.c | 2 arch/x86/mm/numa.c | 11 arch/xtensa/mm/init.c | 8 drivers/base/memory.c | 44 drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 22 fs/cifs/file.c | 10 fs/fuse/dev.c | 2 fs/hugetlbfs/inode.c | 67 include/asm-generic/hugetlb.h | 2 include/linux/compaction.h | 9 include/linux/gfp.h | 7 include/linux/hugetlb.h | 16 include/linux/memblock.h | 15 include/linux/memcontrol.h | 102 - include/linux/mm.h | 52 include/linux/mmzone.h | 46 include/linux/padata.h | 43 include/linux/string.h | 60 include/linux/swap.h | 17 include/linux/vm_event_item.h | 4 include/linux/vmstat.h | 2 include/trace/events/compaction.h | 22 include/trace/events/huge_memory.h | 3 include/trace/events/vmscan.h | 14 init/Kconfig | 17 init/main.c | 2 kernel/events/uprobes.c | 22 kernel/padata.c | 293 +++- kernel/sysctl.c | 3 lib/test_kasan.c | 29 mm/Kconfig | 9 mm/Kconfig.debug | 32 mm/compaction.c | 70 - mm/filemap.c | 55 mm/gup.c | 237 ++- mm/huge_memory.c | 282 ---- mm/hugetlb.c | 260 ++- mm/internal.h | 25 mm/khugepaged.c | 316 ++-- mm/memblock.c | 19 mm/memcontrol.c | 642 +++------ mm/memory.c | 103 - mm/memory_hotplug.c | 10 mm/mempolicy.c | 5 mm/migrate.c | 30 mm/oom_kill.c | 4 mm/page_alloc.c | 735 ++++------ mm/page_owner.c | 7 mm/pgtable-generic.c | 2 mm/rmap.c | 53 mm/shmem.c | 156 -- mm/slab.c | 4 mm/slub.c | 8 mm/swap.c | 199 +- mm/swap_cgroup.c | 10 mm/swap_state.c | 110 - mm/swapfile.c | 39 mm/userfaultfd.c | 15 mm/vmscan.c | 344 ++-- mm/vmstat.c | 16 mm/workingset.c | 23 tools/testing/selftests/vm/.gitignore | 1 tools/testing/selftests/vm/Makefile | 1 tools/testing/selftests/vm/khugepaged.c | 1035 +++++++++++++++ tools/vm/page_owner_sort.c | 5 147 files changed, 3876 insertions(+), 3108 deletions(-)
- More MM work. 100ish more to go. Mike's "mm: remove __ARCH_HAS_5LEVEL_HACK" series should fix the current ppc issue. - Various other little subsystems 127 patches, based on 6929f71e46bdddbf1c4d67c2728648176c67c555. Subsystems affected by this patch series: kcov mm/pagemap mm/vmalloc mm/kmap mm/util mm/memory-hotplug mm/cleanups mm/zram procfs core-kernel get_maintainer lib bitops checkpatch binfmt init fat seq_file exec rapidio relay selftests ubsan Subsystem: kcov Andrey Konovalov <andreyknvl@google.com>: Patch series "kcov: collect coverage from usb soft interrupts", v4: kcov: cleanup debug messages kcov: fix potential use-after-free in kcov_remote_start kcov: move t->kcov assignments into kcov_start/stop kcov: move t->kcov_sequence assignment kcov: use t->kcov_mode as enabled indicator kcov: collect coverage from interrupts usb: core: kcov: collect coverage from usb complete callback Subsystem: mm/pagemap Feng Tang <feng.tang@intel.com>: mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check Mike Rapoport <rppt@linux.ibm.com>: Patch series "mm: remove __ARCH_HAS_5LEVEL_HACK", v4: h8300: remove usage of __ARCH_USE_5LEVEL_HACK arm: add support for folded p4d page tables arm64: add support for folded p4d page tables hexagon: remove __ARCH_USE_5LEVEL_HACK ia64: add support for folded p4d page tables nios2: add support for folded p4d page tables openrisc: add support for folded p4d page tables powerpc: add support for folded p4d page tables Geert Uytterhoeven <geert+renesas@glider.be>: sh: fault: modernize printing of kernel messages Mike Rapoport <rppt@linux.ibm.com>: sh: drop __pXd_offset() macros that duplicate pXd_index() ones sh: add support for folded p4d page tables unicore32: remove __ARCH_USE_5LEVEL_HACK asm-generic: remove pgtable-nop4d-hack.h mm: remove __ARCH_HAS_5LEVEL_HACK and include/asm-generic/5level-fixup.h Anshuman Khandual <anshuman.khandual@arm.com>: Patch series "mm/debug: Add tests validating architecture page table: x86/mm: 
define mm_p4d_folded() mm/debug: add tests validating architecture page table helpers Subsystem: mm/vmalloc Jeongtae Park <jtp.park@samsung.com>: mm/vmalloc: fix a typo in comment Subsystem: mm/kmap Ira Weiny <ira.weiny@intel.com>: Patch series "Remove duplicated kmap code", v3: arch/kmap: remove BUG_ON() arch/xtensa: move kmap build bug out of the way arch/kmap: remove redundant arch specific kmaps arch/kunmap: remove duplicate kunmap implementations {x86,powerpc,microblaze}/kmap: move preempt disable arch/kmap_atomic: consolidate duplicate code arch/kunmap_atomic: consolidate duplicate code arch/kmap: ensure kmap_prot visibility arch/kmap: don't hard code kmap_prot values arch/kmap: define kmap_atomic_prot() for all arch's drm: remove drm specific kmap_atomic code kmap: remove kmap_atomic_to_page() parisc/kmap: remove duplicate kmap code sparc: remove unnecessary includes kmap: consolidate kmap_prot definitions Subsystem: mm/util Waiman Long <longman@redhat.com>: mm: add kvfree_sensitive() for freeing sensitive data objects Subsystem: mm/memory-hotplug Vishal Verma <vishal.l.verma@intel.com>: mm/memory_hotplug: refrain from adding memory into an impossible node David Hildenbrand <david@redhat.com>: powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable() mm/memory_hotplug: remove is_mem_section_removable() Patch series "mm/memory_hotplug: handle memblocks only with: mm/memory_hotplug: set node_start_pfn of hotadded pgdat to 0 mm/memory_hotplug: handle memblocks only with CONFIG_ARCH_KEEP_MEMBLOCK Patch series "mm/memory_hotplug: Interface to add driver-managed system: mm/memory_hotplug: introduce add_memory_driver_managed() kexec_file: don't place kexec images on IORESOURCE_MEM_DRIVER_MANAGED device-dax: add memory via add_memory_driver_managed() Michal Hocko <mhocko@kernel.org>: mm/memory_hotplug: disable the functionality for 32b Subsystem: mm/cleanups chenqiwu <chenqiwu@xiaomi.com>: mm: replace zero-length array with flexible-array member Ethon 
Paul <ethp@qq.com>: mm/memory_hotplug: fix a typo in comment "recoreded"->"recorded" mm: ksm: fix a typo in comment "alreaady"->"already" mm: mmap: fix a typo in comment "compatbility"->"compatibility" mm/hugetlb: fix a typos in comments mm/vmsan: fix some typos in comment mm/compaction: fix a typo in comment "pessemistic"->"pessimistic" mm/memblock: fix a typo in comment "implict"->"implicit" mm/list_lru: fix a typo in comment "numbesr"->"numbers" mm/filemap: fix a typo in comment "unneccssary"->"unnecessary" mm/frontswap: fix some typos in frontswap.c mm, memcg: fix some typos in memcontrol.c mm: fix a typo in comment "strucure"->"structure" mm/slub: fix a typo in comment "disambiguiation"->"disambiguation" mm/sparse: fix a typo in comment "convienence"->"convenience" mm/page-writeback: fix a typo in comment "effictive"->"effective" mm/memory: fix a typo in comment "attampt"->"attempt" Zou Wei <zou_wei@huawei.com>: mm: use false for bool variable Jason Yan <yanaijie@huawei.com>: include/linux/mm.h: return true in cpupid_pid_unset() Subsystem: mm/zram Andy Shevchenko <andriy.shevchenko@linux.intel.com>: zcomp: Use ARRAY_SIZE() for backends list Subsystem: procfs Alexey Dobriyan <adobriyan@gmail.com>: proc: rename "catch" function argument Subsystem: core-kernel Jason Yan <yanaijie@huawei.com>: user.c: make uidhash_table static Subsystem: get_maintainer Joe Perches <joe@perches.com>: get_maintainer: add email addresses from .yaml files get_maintainer: fix unexpected behavior for path/to//file (double slashes) Subsystem: lib Christophe JAILLET <christophe.jaillet@wanadoo.fr>: lib/math: avoid trailing newline hidden in pr_fmt() KP Singh <kpsingh@chromium.org>: lib: Add might_fault() to strncpy_from_user. 
Jason Yan <yanaijie@huawei.com>: lib/test_lockup.c: make test_inode static Jann Horn <jannh@google.com>: lib/zlib: remove outdated and incorrect pre-increment optimization Joe Perches <joe@perches.com>: lib/percpu-refcount.c: use a more common logging style Tan Hu <tan.hu@zte.com.cn>: lib/flex_proportions.c: cleanup __fprop_inc_percpu_max Jesse Brandeburg <jesse.brandeburg@intel.com>: lib: make a test module with set/clear bit Subsystem: bitops Arnd Bergmann <arnd@arndb.de>: include/linux/bitops.h: avoid clang shift-count-overflow warnings Subsystem: checkpatch Joe Perches <joe@perches.com>: checkpatch: additional MAINTAINER section entry ordering checks checkpatch: look for c99 comments in ctx_locate_comment checkpatch: disallow --git and --file/--fix Geert Uytterhoeven <geert+renesas@glider.be>: checkpatch: use patch subject when reading from stdin Subsystem: binfmt Anthony Iliopoulos <ailiop@suse.com>: fs/binfmt_elf: remove redundant elf_map ifndef Nick Desaulniers <ndesaulniers@google.com>: elfnote: mark all .note sections SHF_ALLOC Subsystem: init Chris Down <chris@chrisdown.name>: init: allow distribution configuration of default init Subsystem: fat OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>: fat: don't allow to mount if the FAT length == 0 fat: improve the readahead for FAT entries Subsystem: seq_file Joe Perches <joe@perches.com>: fs/seq_file.c: seq_read: Update pr_info_ratelimited Kefeng Wang <wangkefeng.wang@huawei.com>: Patch series "seq_file: Introduce DEFINE_SEQ_ATTRIBUTE() helper macro": include/linux/seq_file.h: introduce DEFINE_SEQ_ATTRIBUTE() helper macro mm/vmstat.c: convert to use DEFINE_SEQ_ATTRIBUTE macro kernel/kprobes.c: convert to use DEFINE_SEQ_ATTRIBUTE macro Subsystem: exec Christoph Hellwig <hch@lst.de>: exec: simplify the copy_strings_kernel calling convention exec: open code copy_string_kernel Subsystem: rapidio Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>: rapidio: avoid data race between file operation callbacks and 
mport_cdev_add(). John Hubbard <jhubbard@nvidia.com>: rapidio: convert get_user_pages() --> pin_user_pages() Subsystem: relay Daniel Axtens <dja@axtens.net>: kernel/relay.c: handle alloc_percpu returning NULL in relay_open Pengcheng Yang <yangpc@wangsu.com>: kernel/relay.c: fix read_pos error when multiple readers Subsystem: selftests Ram Pai <linuxram@us.ibm.com>: Patch series "selftests, powerpc, x86: Memory Protection Keys", v19: selftests/x86/pkeys: move selftests to arch-neutral directory selftests/vm/pkeys: rename all references to pkru to a generic name selftests/vm/pkeys: move generic definitions to header file Thiago Jung Bauermann <bauerman@linux.ibm.com>: selftests/vm/pkeys: move some definitions to arch-specific header selftests/vm/pkeys: make gcc check arguments of sigsafe_printf() Sandipan Das <sandipan@linux.ibm.com>: selftests: vm: pkeys: Use sane types for pkey register selftests: vm: pkeys: add helpers for pkey bits Ram Pai <linuxram@us.ibm.com>: selftests/vm/pkeys: fix pkey_disable_clear() selftests/vm/pkeys: fix assertion in pkey_disable_set/clear() selftests/vm/pkeys: fix alloc_random_pkey() to make it really random Sandipan Das <sandipan@linux.ibm.com>: selftests: vm: pkeys: use the correct huge page size Ram Pai <linuxram@us.ibm.com>: selftests/vm/pkeys: introduce generic pkey abstractions selftests/vm/pkeys: introduce powerpc support "Desnes A. 
Nunes do Rosario" <desnesn@linux.vnet.ibm.com>: selftests/vm/pkeys: fix number of reserved powerpc pkeys Ram Pai <linuxram@us.ibm.com>: selftests/vm/pkeys: fix assertion in test_pkey_alloc_exhaust() selftests/vm/pkeys: improve checks to determine pkey support selftests/vm/pkeys: associate key on a mapped page and detect access violation selftests/vm/pkeys: associate key on a mapped page and detect write violation selftests/vm/pkeys: detect write violation on a mapped access-denied-key page selftests/vm/pkeys: introduce a sub-page allocator selftests/vm/pkeys: test correct behaviour of pkey-0 selftests/vm/pkeys: override access right definitions on powerpc Sandipan Das <sandipan@linux.ibm.com>: selftests: vm: pkeys: use the correct page size on powerpc selftests: vm: pkeys: fix multilib builds for x86 Jagadeesh Pagadala <jagdsh.linux@gmail.com>: tools/testing/selftests/vm: remove duplicate headers Subsystem: ubsan Arnd Bergmann <arnd@arndb.de>: lib/ubsan.c: fix gcc-10 warnings Documentation/dev-tools/kcov.rst | 17 Documentation/features/debug/debug-vm-pgtable/arch-support.txt | 34 arch/arc/Kconfig | 1 arch/arc/include/asm/highmem.h | 20 arch/arc/mm/highmem.c | 34 arch/arm/include/asm/highmem.h | 9 arch/arm/include/asm/pgtable.h | 1 arch/arm/lib/uaccess_with_memcpy.c | 7 arch/arm/mach-sa1100/assabet.c | 2 arch/arm/mm/dump.c | 29 arch/arm/mm/fault-armv.c | 7 arch/arm/mm/fault.c | 22 arch/arm/mm/highmem.c | 41 arch/arm/mm/idmap.c | 3 arch/arm/mm/init.c | 2 arch/arm/mm/ioremap.c | 12 arch/arm/mm/mm.h | 2 arch/arm/mm/mmu.c | 35 arch/arm/mm/pgd.c | 40 arch/arm64/Kconfig | 1 arch/arm64/include/asm/kvm_mmu.h | 10 arch/arm64/include/asm/pgalloc.h | 10 arch/arm64/include/asm/pgtable-types.h | 5 arch/arm64/include/asm/pgtable.h | 37 arch/arm64/include/asm/stage2_pgtable.h | 48 arch/arm64/kernel/hibernate.c | 44 arch/arm64/kvm/mmu.c | 209 arch/arm64/mm/fault.c | 9 arch/arm64/mm/hugetlbpage.c | 15 arch/arm64/mm/kasan_init.c | 26 arch/arm64/mm/mmu.c | 52 arch/arm64/mm/pageattr.c 
| 7 arch/csky/include/asm/highmem.h | 12 arch/csky/mm/highmem.c | 64 arch/h8300/include/asm/pgtable.h | 1 arch/hexagon/include/asm/fixmap.h | 4 arch/hexagon/include/asm/pgtable.h | 1 arch/ia64/include/asm/pgalloc.h | 4 arch/ia64/include/asm/pgtable.h | 17 arch/ia64/mm/fault.c | 7 arch/ia64/mm/hugetlbpage.c | 18 arch/ia64/mm/init.c | 28 arch/microblaze/include/asm/highmem.h | 55 arch/microblaze/mm/highmem.c | 21 arch/microblaze/mm/init.c | 3 arch/mips/include/asm/highmem.h | 11 arch/mips/mm/cache.c | 6 arch/mips/mm/highmem.c | 62 arch/nds32/include/asm/highmem.h | 9 arch/nds32/mm/highmem.c | 49 arch/nios2/include/asm/pgtable.h | 3 arch/nios2/mm/fault.c | 9 arch/nios2/mm/ioremap.c | 6 arch/openrisc/include/asm/pgtable.h | 1 arch/openrisc/mm/fault.c | 10 arch/openrisc/mm/init.c | 4 arch/parisc/include/asm/cacheflush.h | 32 arch/powerpc/Kconfig | 1 arch/powerpc/include/asm/book3s/32/pgtable.h | 1 arch/powerpc/include/asm/book3s/64/hash.h | 4 arch/powerpc/include/asm/book3s/64/pgalloc.h | 4 arch/powerpc/include/asm/book3s/64/pgtable.h | 60 arch/powerpc/include/asm/book3s/64/radix.h | 6 arch/powerpc/include/asm/highmem.h | 56 arch/powerpc/include/asm/nohash/32/pgtable.h | 1 arch/powerpc/include/asm/nohash/64/pgalloc.h | 2 arch/powerpc/include/asm/nohash/64/pgtable-4k.h | 32 arch/powerpc/include/asm/nohash/64/pgtable.h | 6 arch/powerpc/include/asm/pgtable.h | 10 arch/powerpc/kvm/book3s_64_mmu_radix.c | 32 arch/powerpc/lib/code-patching.c | 7 arch/powerpc/mm/book3s64/hash_pgtable.c | 4 arch/powerpc/mm/book3s64/radix_pgtable.c | 26 arch/powerpc/mm/book3s64/subpage_prot.c | 6 arch/powerpc/mm/highmem.c | 26 arch/powerpc/mm/hugetlbpage.c | 28 arch/powerpc/mm/kasan/kasan_init_32.c | 2 arch/powerpc/mm/mem.c | 3 arch/powerpc/mm/nohash/book3e_pgtable.c | 15 arch/powerpc/mm/pgtable.c | 30 arch/powerpc/mm/pgtable_64.c | 10 arch/powerpc/mm/ptdump/hashpagetable.c | 20 arch/powerpc/mm/ptdump/ptdump.c | 12 arch/powerpc/platforms/pseries/hotplug-memory.c | 26 arch/powerpc/xmon/xmon.c | 
27 arch/s390/Kconfig | 1 arch/sh/include/asm/pgtable-2level.h | 1 arch/sh/include/asm/pgtable-3level.h | 1 arch/sh/include/asm/pgtable_32.h | 5 arch/sh/include/asm/pgtable_64.h | 5 arch/sh/kernel/io_trapped.c | 7 arch/sh/mm/cache-sh4.c | 4 arch/sh/mm/cache-sh5.c | 7 arch/sh/mm/fault.c | 64 arch/sh/mm/hugetlbpage.c | 28 arch/sh/mm/init.c | 15 arch/sh/mm/kmap.c | 2 arch/sh/mm/tlbex_32.c | 6 arch/sh/mm/tlbex_64.c | 7 arch/sparc/include/asm/highmem.h | 29 arch/sparc/mm/highmem.c | 31 arch/sparc/mm/io-unit.c | 1 arch/sparc/mm/iommu.c | 1 arch/unicore32/include/asm/pgtable.h | 1 arch/unicore32/kernel/hibernate.c | 4 arch/x86/Kconfig | 1 arch/x86/include/asm/fixmap.h | 1 arch/x86/include/asm/highmem.h | 37 arch/x86/include/asm/pgtable_64.h | 6 arch/x86/mm/highmem_32.c | 52 arch/xtensa/include/asm/highmem.h | 31 arch/xtensa/mm/highmem.c | 28 drivers/block/zram/zcomp.c | 7 drivers/dax/dax-private.h | 1 drivers/dax/kmem.c | 28 drivers/gpu/drm/ttm/ttm_bo_util.c | 56 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c | 17 drivers/rapidio/devices/rio_mport_cdev.c | 27 drivers/usb/core/hcd.c | 3 fs/binfmt_elf.c | 4 fs/binfmt_em86.c | 6 fs/binfmt_misc.c | 4 fs/binfmt_script.c | 6 fs/exec.c | 58 fs/fat/fatent.c | 103 fs/fat/inode.c | 6 fs/proc/array.c | 8 fs/seq_file.c | 7 include/asm-generic/5level-fixup.h | 59 include/asm-generic/pgtable-nop4d-hack.h | 64 include/asm-generic/pgtable-nopud.h | 4 include/drm/ttm/ttm_bo_api.h | 4 include/linux/binfmts.h | 3 include/linux/bitops.h | 2 include/linux/elfnote.h | 2 include/linux/highmem.h | 89 include/linux/ioport.h | 1 include/linux/memory_hotplug.h | 9 include/linux/mm.h | 12 include/linux/sched.h | 3 include/linux/seq_file.h | 19 init/Kconfig | 10 init/main.c | 10 kernel/kcov.c | 282 - kernel/kexec_file.c | 5 kernel/kprobes.c | 34 kernel/relay.c | 22 kernel/user.c | 2 lib/Kconfig.debug | 44 lib/Makefile | 2 lib/flex_proportions.c | 7 lib/math/prime_numbers.c | 10 lib/percpu-refcount.c | 6 lib/strncpy_from_user.c | 1 lib/test_bitops.c | 60 
lib/test_lockup.c | 2 lib/ubsan.c | 33 lib/zlib_inflate/inffast.c | 91 mm/Kconfig | 4 mm/Makefile | 1 mm/compaction.c | 2 mm/debug_vm_pgtable.c | 382 + mm/filemap.c | 2 mm/frontswap.c | 6 mm/huge_memory.c | 2 mm/hugetlb.c | 16 mm/internal.h | 2 mm/kasan/init.c | 11 mm/ksm.c | 10 mm/list_lru.c | 2 mm/memblock.c | 2 mm/memcontrol.c | 4 mm/memory.c | 10 mm/memory_hotplug.c | 179 mm/mmap.c | 2 mm/mremap.c | 2 mm/page-writeback.c | 2 mm/slub.c | 2 mm/sparse.c | 2 mm/util.c | 22 mm/vmalloc.c | 2 mm/vmscan.c | 6 mm/vmstat.c | 32 mm/zbud.c | 2 scripts/checkpatch.pl | 62 scripts/get_maintainer.pl | 46 security/keys/internal.h | 11 security/keys/keyctl.c | 16 tools/testing/selftests/lib/config | 1 tools/testing/selftests/vm/.gitignore | 1 tools/testing/selftests/vm/Makefile | 75 tools/testing/selftests/vm/mremap_dontunmap.c | 1 tools/testing/selftests/vm/pkey-helpers.h | 557 +- tools/testing/selftests/vm/pkey-powerpc.h | 153 tools/testing/selftests/vm/pkey-x86.h | 191 tools/testing/selftests/vm/protection_keys.c | 2370 ++++++++-- tools/testing/selftests/x86/.gitignore | 1 tools/testing/selftests/x86/Makefile | 2 tools/testing/selftests/x86/pkey-helpers.h | 219 tools/testing/selftests/x86/protection_keys.c | 1506 ------ 200 files changed, 5182 insertions(+), 4033 deletions(-)
Various trees.  Mainly those parts of MM whose linux-next dependents
are now merged.  I'm still sitting on ~160 patches which await merges
from -next.

54 patches, based on 9aa900c8094dba7a60dc805ecec1e9f720744ba1.

Subsystems affected by this patch series:

  mm/proc
  ipc
  dynamic-debug
  panic
  lib
  sysctl
  mm/gup
  mm/pagemap

Subsystem: mm/proc

    SeongJae Park <sjpark@amazon.de>:
      mm/page_idle.c: skip offline pages

Subsystem: ipc

    Jules Irenge <jbi.octave@gmail.com>:
      ipc/msg: add missing annotation for freeque()

    Giuseppe Scrivano <gscrivan@redhat.com>:
      ipc/namespace.c: use a work queue to free_ipc

Subsystem: dynamic-debug

    Orson Zhai <orson.zhai@unisoc.com>:
      dynamic_debug: add an option to enable dynamic debug for modules only

Subsystem: panic

    Rafael Aquini <aquini@redhat.com>:
      kernel: add panic_on_taint

Subsystem: lib

    Manfred Spraul <manfred@colorfullife.com>:
      xarray.h: correct return code documentation for xa_store_{bh,irq}()

Subsystem: sysctl

    Vlastimil Babka <vbabka@suse.cz>:
    Patch series "support setting sysctl parameters from kernel command line", v3:
      kernel/sysctl: support setting sysctl parameters from kernel command line
      kernel/sysctl: support handling command line aliases
      kernel/hung_task convert hung_task_panic boot parameter to sysctl
      tools/testing/selftests/sysctl/sysctl.sh: support CONFIG_TEST_SYSCTL=y
      lib/test_sysctl: support testing of sysctl. boot parameter

    "Guilherme G. Piccoli" <gpiccoli@canonical.com>:
      kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases
      kernel/hung_task.c: introduce sysctl to print all traces when a hung task is detected
      panic: add sysctl to dump all CPUs backtraces on oops event

    Rafael Aquini <aquini@redhat.com>:
      kernel/sysctl.c: ignore out-of-range taint bits introduced via kernel.tainted

Subsystem: mm/gup

    Souptick Joarder <jrdr.linux@gmail.com>:
      mm/gup.c: convert to use get_user_{page|pages}_fast_only()

    John Hubbard <jhubbard@nvidia.com>:
      mm/gup: update pin_user_pages.rst for "case 3" (mmu notifiers)
    Patch series "mm/gup: introduce pin_user_pages_locked(), use it in frame_vector.c", v2:
      mm/gup: introduce pin_user_pages_locked()
      mm/gup: frame_vector: convert get_user_pages() --> pin_user_pages()
      mm/gup: documentation fix for pin_user_pages*() APIs
    Patch series "vhost, docs: convert to pin_user_pages(), new "case 5"":
      docs: mm/gup: pin_user_pages.rst: add a "case 5"
      vhost: convert get_user_pages() --> pin_user_pages()

Subsystem: mm/pagemap

    Alexander Gordeev <agordeev@linux.ibm.com>:
      mm/mmap.c: add more sanity checks to get_unmapped_area()
      mm/mmap.c: do not allow mappings outside of allowed limits

    Christoph Hellwig <hch@lst.de>:
    Patch series "sort out the flush_icache_range mess", v2:
      arm: fix the flush_icache_range arguments in set_fiq_handler
      nds32: unexport flush_icache_page
      powerpc: unexport flush_icache_user_range
      unicore32: remove flush_cache_user_range
      asm-generic: fix the inclusion guards for cacheflush.h
      asm-generic: don't include <linux/mm.h> in cacheflush.h
      asm-generic: improve the flush_dcache_page stub
      alpha: use asm-generic/cacheflush.h
      arm64: use asm-generic/cacheflush.h
      c6x: use asm-generic/cacheflush.h
      hexagon: use asm-generic/cacheflush.h
      ia64: use asm-generic/cacheflush.h
      microblaze: use asm-generic/cacheflush.h
      m68knommu: use asm-generic/cacheflush.h
      openrisc: use asm-generic/cacheflush.h
      powerpc: use asm-generic/cacheflush.h
      riscv: use asm-generic/cacheflush.h
      arm,sparc,unicore32: remove flush_icache_user_range
      mm: rename flush_icache_user_range to flush_icache_user_page
      asm-generic: add a flush_icache_user_range stub
      sh: implement flush_icache_user_range
      xtensa: implement flush_icache_user_range
      arm: rename flush_cache_user_range to flush_icache_user_range
      m68k: implement flush_icache_user_range
      exec: only build read_code when needed
      exec: use flush_icache_user_range in read_code
      binfmt_flat: use flush_icache_user_range
      nommu: use flush_icache_user_range in brk and mmap
      module: move the set_fs hack for flush_icache_range to m68k

    Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
      doc: cgroup: update note about conditions when oom killer is invoked

Documentation/admin-guide/cgroup-v2.rst | 17 +- Documentation/admin-guide/dynamic-debug-howto.rst | 5 Documentation/admin-guide/kdump/kdump.rst | 8 + Documentation/admin-guide/kernel-parameters.txt | 34 +++- Documentation/admin-guide/sysctl/kernel.rst | 37 ++++ Documentation/core-api/pin_user_pages.rst | 47 ++++-- arch/alpha/include/asm/cacheflush.h | 38 +---- arch/alpha/kernel/smp.c | 2 arch/arm/include/asm/cacheflush.h | 7 arch/arm/kernel/fiq.c | 4 arch/arm/kernel/traps.c | 2 arch/arm64/include/asm/cacheflush.h | 46 ------ arch/c6x/include/asm/cacheflush.h | 19 -- arch/hexagon/include/asm/cacheflush.h | 19 -- arch/ia64/include/asm/cacheflush.h | 30 ---- arch/m68k/include/asm/cacheflush_mm.h | 6 arch/m68k/include/asm/cacheflush_no.h | 19 -- arch/m68k/mm/cache.c | 13 + arch/microblaze/include/asm/cacheflush.h | 29 --- arch/nds32/include/asm/cacheflush.h | 4 arch/nds32/mm/cacheflush.c | 3 arch/openrisc/include/asm/cacheflush.h | 33 ---- arch/powerpc/include/asm/cacheflush.h | 46 +----- arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 arch/powerpc/mm/mem.c | 3 arch/powerpc/perf/callchain_64.c | 4 arch/riscv/include/asm/cacheflush.h | 65 -------- arch/sh/include/asm/cacheflush.h | 1 arch/sparc/include/asm/cacheflush_32.h | 2
arch/sparc/include/asm/cacheflush_64.h | 1 arch/um/include/asm/tlb.h | 2 arch/unicore32/include/asm/cacheflush.h | 11 - arch/x86/include/asm/cacheflush.h | 2 arch/xtensa/include/asm/cacheflush.h | 2 drivers/media/platform/omap3isp/ispvideo.c | 2 drivers/nvdimm/pmem.c | 3 drivers/vhost/vhost.c | 5 fs/binfmt_flat.c | 2 fs/exec.c | 5 fs/proc/proc_sysctl.c | 163 ++++++++++++++++++++-- include/asm-generic/cacheflush.h | 25 +-- include/linux/dev_printk.h | 6 include/linux/dynamic_debug.h | 2 include/linux/ipc_namespace.h | 2 include/linux/kernel.h | 9 + include/linux/mm.h | 12 + include/linux/net.h | 3 include/linux/netdevice.h | 6 include/linux/printk.h | 9 - include/linux/sched/sysctl.h | 7 include/linux/sysctl.h | 4 include/linux/xarray.h | 4 include/rdma/ib_verbs.h | 6 init/main.c | 2 ipc/msg.c | 2 ipc/namespace.c | 24 ++- kernel/events/core.c | 4 kernel/events/uprobes.c | 2 kernel/hung_task.c | 30 ++-- kernel/module.c | 8 - kernel/panic.c | 45 ++++++ kernel/sysctl.c | 38 ++++- kernel/watchdog.c | 37 +--- lib/Kconfig.debug | 12 + lib/Makefile | 2 lib/dynamic_debug.c | 9 - lib/test_sysctl.c | 13 + mm/frame_vector.c | 7 mm/gup.c | 74 +++++++-- mm/mmap.c | 28 ++- mm/nommu.c | 4 mm/page_alloc.c | 9 - mm/page_idle.c | 7 tools/testing/selftests/sysctl/sysctl.sh | 44 +++++ virt/kvm/kvm_main.c | 8 - 76 files changed, 732 insertions(+), 517 deletions(-)
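
For readers unfamiliar with the "support setting sysctl parameters from kernel command line" series listed above, here is a rough userspace sketch of the naming idea only: a boot argument of the form sysctl.<name>=<value> is treated as if <value> were written to the matching file under /proc/sys. The helper name sysctl_arg_to_path() is hypothetical and this is not the kernel's parser; it assumes '.' is the only separator and ignores aliases and the other cases the real code handles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, NOT kernel code: translate a boot argument such as
 * "sysctl.vm.swappiness=40" into the /proc/sys path it corresponds to and
 * a pointer to the value string.
 *
 * Returns 0 on success, or -1 if the argument lacks the "sysctl." prefix,
 * has no '=', has an empty name, or the path does not fit in the buffer.
 */
int sysctl_arg_to_path(const char *arg, char *path, size_t path_len,
                       const char **value)
{
    static const char prefix[] = "sysctl.";
    static const char root[] = "/proc/sys/";
    const size_t prefix_len = sizeof(prefix) - 1;
    const size_t root_len = sizeof(root) - 1;
    const char *name, *eq;
    size_t name_len, i;

    assert(arg && path && value);

    /* Only arguments carrying the "sysctl." prefix are of interest. */
    if (strncmp(arg, prefix, prefix_len) != 0)
        return -1;

    name = arg + prefix_len;
    eq = strchr(name, '=');
    if (!eq || eq == name)
        return -1;

    name_len = (size_t)(eq - name);
    if (root_len + name_len + 1 > path_len)
        return -1;

    /* Build "/proc/sys/" followed by the name with '.' turned into '/'. */
    memcpy(path, root, root_len);
    for (i = 0; i < name_len; i++)
        path[root_len + i] = (name[i] == '.') ? '/' : name[i];
    path[root_len + name_len] = '\0';

    *value = eq + 1;
    return 0;
}
```

Called with "sysctl.vm.swappiness=40" and a 64-byte buffer, the helper fills the buffer with /proc/sys/vm/swappiness and points value at "40"; arguments without the prefix or without '=' are rejected.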
- a kernel-wide sweep of show_stack()

- pagetable cleanups

- abstract out accesses to mmap_sem - prep for mmap_sem scalability work

- hch's user access work

93 patches, based on abfbb29297c27e3f101f348dc9e467b0fe70f919:

Subsystems affected by this patch series:

  debug
  mm/pagemap
  mm/maccess
  mm/documentation

Subsystem: debug

    Dmitry Safonov <dima@arista.com>:
    Patch series "Add log level to show_stack()", v3:
      kallsyms/printk: add loglvl to print_ip_sym()
      alpha: add show_stack_loglvl()
      arc: add show_stack_loglvl()
      arm/asm: add loglvl to c_backtrace()
      arm: add loglvl to unwind_backtrace()
      arm: add loglvl to dump_backtrace()
      arm: wire up dump_backtrace_{entry,stm}
      arm: add show_stack_loglvl()
      arm64: add loglvl to dump_backtrace()
      arm64: add show_stack_loglvl()
      c6x: add show_stack_loglvl()
      csky: add show_stack_loglvl()
      h8300: add show_stack_loglvl()
      hexagon: add show_stack_loglvl()
      ia64: pass log level as arg into ia64_do_show_stack()
      ia64: add show_stack_loglvl()
      m68k: add show_stack_loglvl()
      microblaze: add loglvl to microblaze_unwind_inner()
      microblaze: add loglvl to microblaze_unwind()
      microblaze: add show_stack_loglvl()
      mips: add show_stack_loglvl()
      nds32: add show_stack_loglvl()
      nios2: add show_stack_loglvl()
      openrisc: add show_stack_loglvl()
      parisc: add show_stack_loglvl()
      powerpc: add show_stack_loglvl()
      riscv: add show_stack_loglvl()
      s390: add show_stack_loglvl()
      sh: add loglvl to dump_mem()
      sh: remove needless printk()
      sh: add loglvl to printk_address()
      sh: add loglvl to show_trace()
      sh: add show_stack_loglvl()
      sparc: add show_stack_loglvl()
      um/sysrq: remove needless variable sp
      um: add show_stack_loglvl()
      unicore32: remove unused pmode argument in c_backtrace()
      unicore32: add loglvl to c_backtrace()
      unicore32: add show_stack_loglvl()
      x86: add missing const qualifiers for log_lvl
      x86: add show_stack_loglvl()
      xtensa: add loglvl to show_trace()
      xtensa: add show_stack_loglvl()
      sysrq: use show_stack_loglvl()
      x86/amd_gart: print stacktrace for a leak with KERN_ERR
      power: use show_stack_loglvl()
      kdb: don't play with console_loglevel
      sched: print stack trace with KERN_INFO
      kernel: use show_stack_loglvl()
      kernel: rename show_stack_loglvl() => show_stack()

Subsystem: mm/pagemap

    Mike Rapoport <rppt@linux.ibm.com>:
    Patch series "mm: consolidate definitions of page table accessors", v2:
      mm: don't include asm/pgtable.h if linux/mm.h is already included
      mm: introduce include/linux/pgtable.h
      mm: reorder includes after introduction of linux/pgtable.h
      csky: replace definitions of __pXd_offset() with pXd_index()
      m68k/mm/motorola: move comment about page table allocation funcitons
      m68k/mm: move {cache,nocahe}_page() definitions close to their user
      x86/mm: simplify init_trampoline() and surrounding logic
      mm: pgtable: add shortcuts for accessing kernel PMD and PTE
      mm: consolidate pte_index() and pte_offset_*() definitions

    Michel Lespinasse <walken@google.com>:
      mmap locking API: initial implementation as rwsem wrappers
      MMU notifier: use the new mmap locking API
      DMA reservations: use the new mmap locking API
      mmap locking API: use coccinelle to convert mmap_sem rwsem call sites
      mmap locking API: convert mmap_sem call sites missed by coccinelle
      mmap locking API: convert nested write lock sites
      mmap locking API: add mmap_read_trylock_non_owner()
      mmap locking API: add MMAP_LOCK_INITIALIZER
      mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
      mmap locking API: rename mmap_sem to mmap_lock
      mmap locking API: convert mmap_sem API comments
      mmap locking API: convert mmap_sem comments

Subsystem: mm/maccess

    Christoph Hellwig <hch@lst.de>:
    Patch series "clean up and streamline probe_kernel_* and friends", v4:
      maccess: unexport probe_kernel_write()
      maccess: remove various unused weak aliases
      maccess: remove duplicate kerneldoc comments
      maccess: clarify kerneldoc comments
      maccess: update the top of file comment
      maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault
      maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault
      maccess: rename strnlen_unsafe_user to strnlen_user_nofault
      maccess: remove probe_read_common and probe_write_common
      maccess: unify the probe kernel arch hooks
      bpf: factor out a bpf_trace_copy_string helper
      bpf: handle the compat string in bpf_trace_copy_string better

    Andrew Morton <akpm@linux-foundation.org>:
      bpf:bpf_seq_printf(): handle potentially unsafe format string better

    Christoph Hellwig <hch@lst.de>:
      bpf: rework the compat kernel probe handling
      tracing/kprobes: handle mixed kernel/userspace probes better
      maccess: remove strncpy_from_unsafe
      maccess: always use strict semantics for probe_kernel_read
      maccess: move user access routines together
      maccess: allow architectures to provide kernel probing directly
      x86: use non-set_fs based maccess routines
      maccess: return -ERANGE when probe_kernel_read() fails

Subsystem: mm/documentation

    Luis Chamberlain <mcgrof@kernel.org>:
      include/linux/cache.h: expand documentation over __read_mostly

Documentation/admin-guide/mm/numa_memory_policy.rst | 10 Documentation/admin-guide/mm/userfaultfd.rst | 2 Documentation/filesystems/locking.rst | 2 Documentation/vm/hmm.rst | 6 Documentation/vm/transhuge.rst | 4 arch/alpha/boot/bootp.c | 1 arch/alpha/boot/bootpz.c | 1 arch/alpha/boot/main.c | 1 arch/alpha/include/asm/io.h | 1 arch/alpha/include/asm/pgtable.h | 16 arch/alpha/kernel/process.c | 1 arch/alpha/kernel/proto.h | 4 arch/alpha/kernel/ptrace.c | 1 arch/alpha/kernel/setup.c | 1 arch/alpha/kernel/smp.c | 1 arch/alpha/kernel/sys_alcor.c | 1 arch/alpha/kernel/sys_cabriolet.c | 1 arch/alpha/kernel/sys_dp264.c | 1 arch/alpha/kernel/sys_eb64p.c | 1 arch/alpha/kernel/sys_eiger.c | 1 arch/alpha/kernel/sys_jensen.c | 1 arch/alpha/kernel/sys_marvel.c | 1 arch/alpha/kernel/sys_miata.c | 1 arch/alpha/kernel/sys_mikasa.c | 1 arch/alpha/kernel/sys_nautilus.c | 1 arch/alpha/kernel/sys_noritake.c | 1 arch/alpha/kernel/sys_rawhide.c | 1 arch/alpha/kernel/sys_ruffian.c | 1 arch/alpha/kernel/sys_rx164.c | 1
arch/alpha/kernel/sys_sable.c | 1 arch/alpha/kernel/sys_sio.c | 1 arch/alpha/kernel/sys_sx164.c | 1 arch/alpha/kernel/sys_takara.c | 1 arch/alpha/kernel/sys_titan.c | 1 arch/alpha/kernel/sys_wildfire.c | 1 arch/alpha/kernel/traps.c | 40 arch/alpha/mm/fault.c | 12 arch/alpha/mm/init.c | 1 arch/arc/include/asm/bug.h | 3 arch/arc/include/asm/pgtable.h | 24 arch/arc/kernel/process.c | 4 arch/arc/kernel/stacktrace.c | 29 arch/arc/kernel/troubleshoot.c | 6 arch/arc/mm/fault.c | 6 arch/arc/mm/highmem.c | 14 arch/arc/mm/tlbex.S | 4 arch/arm/include/asm/bug.h | 3 arch/arm/include/asm/efi.h | 3 arch/arm/include/asm/fixmap.h | 4 arch/arm/include/asm/idmap.h | 2 arch/arm/include/asm/pgtable-2level.h | 1 arch/arm/include/asm/pgtable-3level.h | 7 arch/arm/include/asm/pgtable-nommu.h | 3 arch/arm/include/asm/pgtable.h | 25 arch/arm/include/asm/traps.h | 3 arch/arm/include/asm/unwind.h | 3 arch/arm/kernel/head.S | 4 arch/arm/kernel/machine_kexec.c | 1 arch/arm/kernel/module.c | 1 arch/arm/kernel/process.c | 4 arch/arm/kernel/ptrace.c | 1 arch/arm/kernel/smp.c | 1 arch/arm/kernel/suspend.c | 4 arch/arm/kernel/swp_emulate.c | 4 arch/arm/kernel/traps.c | 61 arch/arm/kernel/unwind.c | 7 arch/arm/kernel/vdso.c | 2 arch/arm/kernel/vmlinux.lds.S | 4 arch/arm/lib/backtrace-clang.S | 9 arch/arm/lib/backtrace.S | 14 arch/arm/lib/uaccess_with_memcpy.c | 16 arch/arm/mach-ebsa110/core.c | 1 arch/arm/mach-footbridge/common.c | 1 arch/arm/mach-imx/mm-imx21.c | 1 arch/arm/mach-imx/mm-imx27.c | 1 arch/arm/mach-imx/mm-imx3.c | 1 arch/arm/mach-integrator/core.c | 4 arch/arm/mach-iop32x/i2c.c | 1 arch/arm/mach-iop32x/iq31244.c | 1 arch/arm/mach-iop32x/iq80321.c | 1 arch/arm/mach-iop32x/n2100.c | 1 arch/arm/mach-ixp4xx/common.c | 1 arch/arm/mach-keystone/platsmp.c | 4 arch/arm/mach-sa1100/assabet.c | 3 arch/arm/mach-sa1100/hackkit.c | 4 arch/arm/mach-tegra/iomap.h | 2 arch/arm/mach-zynq/common.c | 4 arch/arm/mm/copypage-v4mc.c | 1 arch/arm/mm/copypage-v6.c | 1 arch/arm/mm/copypage-xscale.c | 1 
arch/arm/mm/dump.c | 1 arch/arm/mm/fault-armv.c | 1 arch/arm/mm/fault.c | 9 arch/arm/mm/highmem.c | 4 arch/arm/mm/idmap.c | 4 arch/arm/mm/ioremap.c | 31 arch/arm/mm/mm.h | 8 arch/arm/mm/mmu.c | 7 arch/arm/mm/pageattr.c | 1 arch/arm/mm/proc-arm1020.S | 4 arch/arm/mm/proc-arm1020e.S | 4 arch/arm/mm/proc-arm1022.S | 4 arch/arm/mm/proc-arm1026.S | 4 arch/arm/mm/proc-arm720.S | 4 arch/arm/mm/proc-arm740.S | 4 arch/arm/mm/proc-arm7tdmi.S | 4 arch/arm/mm/proc-arm920.S | 4 arch/arm/mm/proc-arm922.S | 4 arch/arm/mm/proc-arm925.S | 4 arch/arm/mm/proc-arm926.S | 4 arch/arm/mm/proc-arm940.S | 4 arch/arm/mm/proc-arm946.S | 4 arch/arm/mm/proc-arm9tdmi.S | 4 arch/arm/mm/proc-fa526.S | 4 arch/arm/mm/proc-feroceon.S | 4 arch/arm/mm/proc-mohawk.S | 4 arch/arm/mm/proc-sa110.S | 4 arch/arm/mm/proc-sa1100.S | 4 arch/arm/mm/proc-v6.S | 4 arch/arm/mm/proc-v7.S | 4 arch/arm/mm/proc-xsc3.S | 4 arch/arm/mm/proc-xscale.S | 4 arch/arm/mm/pv-fixup-asm.S | 4 arch/arm64/include/asm/io.h | 4 arch/arm64/include/asm/kernel-pgtable.h | 2 arch/arm64/include/asm/kvm_mmu.h | 4 arch/arm64/include/asm/mmu_context.h | 4 arch/arm64/include/asm/pgtable.h | 40 arch/arm64/include/asm/stacktrace.h | 3 arch/arm64/include/asm/stage2_pgtable.h | 2 arch/arm64/include/asm/vmap_stack.h | 4 arch/arm64/kernel/acpi.c | 4 arch/arm64/kernel/head.S | 4 arch/arm64/kernel/hibernate.c | 5 arch/arm64/kernel/kaslr.c | 4 arch/arm64/kernel/process.c | 2 arch/arm64/kernel/ptrace.c | 1 arch/arm64/kernel/smp.c | 1 arch/arm64/kernel/suspend.c | 4 arch/arm64/kernel/traps.c | 37 arch/arm64/kernel/vdso.c | 8 arch/arm64/kernel/vmlinux.lds.S | 3 arch/arm64/kvm/mmu.c | 14 arch/arm64/mm/dump.c | 1 arch/arm64/mm/fault.c | 9 arch/arm64/mm/kasan_init.c | 3 arch/arm64/mm/mmu.c | 8 arch/arm64/mm/pageattr.c | 1 arch/arm64/mm/proc.S | 4 arch/c6x/include/asm/pgtable.h | 3 arch/c6x/kernel/traps.c | 28 arch/csky/include/asm/io.h | 2 arch/csky/include/asm/pgtable.h | 37 arch/csky/kernel/module.c | 1 arch/csky/kernel/ptrace.c | 5 
arch/csky/kernel/stacktrace.c | 20 arch/csky/kernel/vdso.c | 4 arch/csky/mm/fault.c | 10 arch/csky/mm/highmem.c | 2 arch/csky/mm/init.c | 7 arch/csky/mm/tlb.c | 1 arch/h8300/include/asm/pgtable.h | 1 arch/h8300/kernel/process.c | 1 arch/h8300/kernel/setup.c | 1 arch/h8300/kernel/signal.c | 1 arch/h8300/kernel/traps.c | 26 arch/h8300/mm/fault.c | 1 arch/h8300/mm/init.c | 1 arch/h8300/mm/memory.c | 1 arch/hexagon/include/asm/fixmap.h | 4 arch/hexagon/include/asm/pgtable.h | 55 arch/hexagon/kernel/traps.c | 39 arch/hexagon/kernel/vdso.c | 4 arch/hexagon/mm/uaccess.c | 2 arch/hexagon/mm/vm_fault.c | 9 arch/ia64/include/asm/pgtable.h | 34 arch/ia64/include/asm/ptrace.h | 1 arch/ia64/include/asm/uaccess.h | 2 arch/ia64/kernel/efi.c | 1 arch/ia64/kernel/entry.S | 4 arch/ia64/kernel/head.S | 5 arch/ia64/kernel/irq_ia64.c | 4 arch/ia64/kernel/ivt.S | 4 arch/ia64/kernel/kprobes.c | 4 arch/ia64/kernel/mca.c | 2 arch/ia64/kernel/mca_asm.S | 4 arch/ia64/kernel/perfmon.c | 8 arch/ia64/kernel/process.c | 37 arch/ia64/kernel/ptrace.c | 1 arch/ia64/kernel/relocate_kernel.S | 6 arch/ia64/kernel/setup.c | 4 arch/ia64/kernel/smp.c | 1 arch/ia64/kernel/smpboot.c | 1 arch/ia64/kernel/uncached.c | 4 arch/ia64/kernel/vmlinux.lds.S | 4 arch/ia64/mm/contig.c | 1 arch/ia64/mm/fault.c | 17 arch/ia64/mm/init.c | 12 arch/m68k/68000/m68EZ328.c | 2 arch/m68k/68000/m68VZ328.c | 4 arch/m68k/68000/timers.c | 1 arch/m68k/amiga/config.c | 1 arch/m68k/apollo/config.c | 1 arch/m68k/atari/atasound.c | 1 arch/m68k/atari/stram.c | 1 arch/m68k/bvme6000/config.c | 1 arch/m68k/include/asm/mcf_pgtable.h | 63 arch/m68k/include/asm/motorola_pgalloc.h | 8 arch/m68k/include/asm/motorola_pgtable.h | 84 - arch/m68k/include/asm/pgtable_mm.h | 1 arch/m68k/include/asm/pgtable_no.h | 2 arch/m68k/include/asm/sun3_pgtable.h | 24 arch/m68k/include/asm/sun3xflop.h | 4 arch/m68k/kernel/head.S | 4 arch/m68k/kernel/process.c | 1 arch/m68k/kernel/ptrace.c | 1 arch/m68k/kernel/setup_no.c | 1 arch/m68k/kernel/signal.c | 1 
arch/m68k/kernel/sys_m68k.c | 14 arch/m68k/kernel/traps.c | 27 arch/m68k/kernel/uboot.c | 1 arch/m68k/mac/config.c | 1 arch/m68k/mm/fault.c | 10 arch/m68k/mm/init.c | 2 arch/m68k/mm/mcfmmu.c | 1 arch/m68k/mm/motorola.c | 65 arch/m68k/mm/sun3kmap.c | 1 arch/m68k/mm/sun3mmu.c | 1 arch/m68k/mvme147/config.c | 1 arch/m68k/mvme16x/config.c | 1 arch/m68k/q40/config.c | 1 arch/m68k/sun3/config.c | 1 arch/m68k/sun3/dvma.c | 1 arch/m68k/sun3/mmu_emu.c | 1 arch/m68k/sun3/sun3dvma.c | 1 arch/m68k/sun3x/dvma.c | 1 arch/m68k/sun3x/prom.c | 1 arch/microblaze/include/asm/pgalloc.h | 4 arch/microblaze/include/asm/pgtable.h | 23 arch/microblaze/include/asm/uaccess.h | 2 arch/microblaze/include/asm/unwind.h | 3 arch/microblaze/kernel/hw_exception_handler.S | 4 arch/microblaze/kernel/module.c | 4 arch/microblaze/kernel/setup.c | 4 arch/microblaze/kernel/signal.c | 9 arch/microblaze/kernel/stacktrace.c | 4 arch/microblaze/kernel/traps.c | 28 arch/microblaze/kernel/unwind.c | 46 arch/microblaze/mm/fault.c | 17 arch/microblaze/mm/init.c | 9 arch/microblaze/mm/pgtable.c | 4 arch/mips/fw/arc/memory.c | 1 arch/mips/include/asm/fixmap.h | 3 arch/mips/include/asm/mach-generic/floppy.h | 1 arch/mips/include/asm/mach-jazz/floppy.h | 1 arch/mips/include/asm/pgtable-32.h | 22 arch/mips/include/asm/pgtable-64.h | 32 arch/mips/include/asm/pgtable.h | 2 arch/mips/jazz/irq.c | 4 arch/mips/jazz/jazzdma.c | 1 arch/mips/jazz/setup.c | 4 arch/mips/kernel/module.c | 1 arch/mips/kernel/process.c | 1 arch/mips/kernel/ptrace.c | 1 arch/mips/kernel/ptrace32.c | 1 arch/mips/kernel/smp-bmips.c | 1 arch/mips/kernel/traps.c | 58 arch/mips/kernel/vdso.c | 4 arch/mips/kvm/mips.c | 4 arch/mips/kvm/mmu.c | 20 arch/mips/kvm/tlb.c | 1 arch/mips/kvm/trap_emul.c | 2 arch/mips/lib/dump_tlb.c | 1 arch/mips/lib/r3k_dump_tlb.c | 1 arch/mips/mm/c-octeon.c | 1 arch/mips/mm/c-r3k.c | 11 arch/mips/mm/c-r4k.c | 11 arch/mips/mm/c-tx39.c | 11 arch/mips/mm/fault.c | 12 arch/mips/mm/highmem.c | 2 arch/mips/mm/init.c | 1 
arch/mips/mm/page.c | 1 arch/mips/mm/pgtable-32.c | 1 arch/mips/mm/pgtable-64.c | 1 arch/mips/mm/sc-ip22.c | 1 arch/mips/mm/sc-mips.c | 1 arch/mips/mm/sc-r5k.c | 1 arch/mips/mm/tlb-r3k.c | 1 arch/mips/mm/tlb-r4k.c | 1 arch/mips/mm/tlbex.c | 4 arch/mips/sgi-ip27/ip27-init.c | 1 arch/mips/sgi-ip27/ip27-timer.c | 1 arch/mips/sgi-ip32/ip32-memory.c | 1 arch/nds32/include/asm/highmem.h | 3 arch/nds32/include/asm/pgtable.h | 22 arch/nds32/kernel/head.S | 4 arch/nds32/kernel/module.c | 2 arch/nds32/kernel/traps.c | 33 arch/nds32/kernel/vdso.c | 6 arch/nds32/mm/fault.c | 17 arch/nds32/mm/init.c | 13 arch/nds32/mm/proc.c | 7 arch/nios2/include/asm/pgtable.h | 24 arch/nios2/kernel/module.c | 1 arch/nios2/kernel/nios2_ksyms.c | 4 arch/nios2/kernel/traps.c | 35 arch/nios2/mm/fault.c | 14 arch/nios2/mm/init.c | 5 arch/nios2/mm/pgtable.c | 1 arch/nios2/mm/tlb.c | 1 arch/openrisc/include/asm/io.h | 3 arch/openrisc/include/asm/pgtable.h | 33 arch/openrisc/include/asm/tlbflush.h | 1 arch/openrisc/kernel/asm-offsets.c | 1 arch/openrisc/kernel/entry.S | 4 arch/openrisc/kernel/head.S | 4 arch/openrisc/kernel/or32_ksyms.c | 4 arch/openrisc/kernel/process.c | 1 arch/openrisc/kernel/ptrace.c | 1 arch/openrisc/kernel/setup.c | 1 arch/openrisc/kernel/traps.c | 27 arch/openrisc/mm/fault.c | 12 arch/openrisc/mm/init.c | 1 arch/openrisc/mm/ioremap.c | 4 arch/openrisc/mm/tlb.c | 1 arch/parisc/include/asm/io.h | 2 arch/parisc/include/asm/mmu_context.h | 1 arch/parisc/include/asm/pgtable.h | 33 arch/parisc/kernel/asm-offsets.c | 4 arch/parisc/kernel/entry.S | 4 arch/parisc/kernel/head.S | 4 arch/parisc/kernel/module.c | 1 arch/parisc/kernel/pacache.S | 4 arch/parisc/kernel/pci-dma.c | 2 arch/parisc/kernel/pdt.c | 4 arch/parisc/kernel/ptrace.c | 1 arch/parisc/kernel/smp.c | 1 arch/parisc/kernel/traps.c | 42 arch/parisc/lib/memcpy.c | 14 arch/parisc/mm/fault.c | 10 arch/parisc/mm/fixmap.c | 6 arch/parisc/mm/init.c | 1 arch/powerpc/include/asm/book3s/32/pgtable.h | 20 
arch/powerpc/include/asm/book3s/64/pgtable.h | 43 arch/powerpc/include/asm/fixmap.h | 4 arch/powerpc/include/asm/io.h | 1 arch/powerpc/include/asm/kup.h | 2 arch/powerpc/include/asm/nohash/32/pgtable.h | 17 arch/powerpc/include/asm/nohash/64/pgtable-4k.h | 4 arch/powerpc/include/asm/nohash/64/pgtable.h | 22 arch/powerpc/include/asm/nohash/pgtable.h | 2 arch/powerpc/include/asm/pgtable.h | 28 arch/powerpc/include/asm/pkeys.h | 2 arch/powerpc/include/asm/tlb.h | 2 arch/powerpc/kernel/asm-offsets.c | 1 arch/powerpc/kernel/btext.c | 4 arch/powerpc/kernel/fpu.S | 3 arch/powerpc/kernel/head_32.S | 4 arch/powerpc/kernel/head_40x.S | 4 arch/powerpc/kernel/head_44x.S | 4 arch/powerpc/kernel/head_8xx.S | 4 arch/powerpc/kernel/head_fsl_booke.S | 4 arch/powerpc/kernel/io-workarounds.c | 4 arch/powerpc/kernel/irq.c | 4 arch/powerpc/kernel/mce_power.c | 4 arch/powerpc/kernel/paca.c | 4 arch/powerpc/kernel/process.c | 30 arch/powerpc/kernel/prom.c | 4 arch/powerpc/kernel/prom_init.c | 4 arch/powerpc/kernel/rtas_pci.c | 4 arch/powerpc/kernel/setup-common.c | 4 arch/powerpc/kernel/setup_32.c | 4 arch/powerpc/kernel/setup_64.c | 4 arch/powerpc/kernel/signal_32.c | 1 arch/powerpc/kernel/signal_64.c | 1 arch/powerpc/kernel/smp.c | 4 arch/powerpc/kernel/stacktrace.c | 2 arch/powerpc/kernel/traps.c | 1 arch/powerpc/kernel/vdso.c | 7 arch/powerpc/kvm/book3s_64_mmu_radix.c | 4 arch/powerpc/kvm/book3s_hv.c | 6 arch/powerpc/kvm/book3s_hv_nested.c | 4 arch/powerpc/kvm/book3s_hv_rm_xics.c | 4 arch/powerpc/kvm/book3s_hv_rm_xive.c | 4 arch/powerpc/kvm/book3s_hv_uvmem.c | 18 arch/powerpc/kvm/e500_mmu_host.c | 4 arch/powerpc/kvm/fpu.S | 4 arch/powerpc/lib/code-patching.c | 1 arch/powerpc/mm/book3s32/hash_low.S | 4 arch/powerpc/mm/book3s32/mmu.c | 2 arch/powerpc/mm/book3s32/tlb.c | 6 arch/powerpc/mm/book3s64/hash_hugetlbpage.c | 1 arch/powerpc/mm/book3s64/hash_native.c | 4 arch/powerpc/mm/book3s64/hash_pgtable.c | 5 arch/powerpc/mm/book3s64/hash_utils.c | 4 arch/powerpc/mm/book3s64/iommu_api.c | 4 
arch/powerpc/mm/book3s64/radix_hugetlbpage.c | 1 arch/powerpc/mm/book3s64/radix_pgtable.c | 1 arch/powerpc/mm/book3s64/slb.c | 4 arch/powerpc/mm/book3s64/subpage_prot.c | 16 arch/powerpc/mm/copro_fault.c | 4 arch/powerpc/mm/fault.c | 23 arch/powerpc/mm/hugetlbpage.c | 1 arch/powerpc/mm/init-common.c | 4 arch/powerpc/mm/init_32.c | 1 arch/powerpc/mm/init_64.c | 1 arch/powerpc/mm/kasan/8xx.c | 4 arch/powerpc/mm/kasan/book3s_32.c | 2 arch/powerpc/mm/kasan/kasan_init_32.c | 8 arch/powerpc/mm/mem.c | 1 arch/powerpc/mm/nohash/40x.c | 5 arch/powerpc/mm/nohash/8xx.c | 2 arch/powerpc/mm/nohash/fsl_booke.c | 1 arch/powerpc/mm/nohash/tlb_low_64e.S | 4 arch/powerpc/mm/pgtable.c | 2 arch/powerpc/mm/pgtable_32.c | 5 arch/powerpc/mm/pgtable_64.c | 1 arch/powerpc/mm/ptdump/8xx.c | 2 arch/powerpc/mm/ptdump/bats.c | 4 arch/powerpc/mm/ptdump/book3s64.c | 2 arch/powerpc/mm/ptdump/hashpagetable.c | 1 arch/powerpc/mm/ptdump/ptdump.c | 1 arch/powerpc/mm/ptdump/shared.c | 2 arch/powerpc/oprofile/cell/spu_task_sync.c | 6 arch/powerpc/perf/callchain.c | 1 arch/powerpc/perf/callchain_32.c | 1 arch/powerpc/perf/callchain_64.c | 1 arch/powerpc/platforms/85xx/corenet_generic.c | 4 arch/powerpc/platforms/85xx/mpc85xx_cds.c | 4 arch/powerpc/platforms/85xx/qemu_e500.c | 4 arch/powerpc/platforms/85xx/sbc8548.c | 4 arch/powerpc/platforms/85xx/smp.c | 4 arch/powerpc/platforms/86xx/mpc86xx_smp.c | 4 arch/powerpc/platforms/8xx/cpm1.c | 1 arch/powerpc/platforms/8xx/micropatch.c | 1 arch/powerpc/platforms/cell/cbe_regs.c | 4 arch/powerpc/platforms/cell/interrupt.c | 4 arch/powerpc/platforms/cell/pervasive.c | 4 arch/powerpc/platforms/cell/setup.c | 1 arch/powerpc/platforms/cell/smp.c | 4 arch/powerpc/platforms/cell/spider-pic.c | 4 arch/powerpc/platforms/cell/spufs/file.c | 10 arch/powerpc/platforms/chrp/pci.c | 4 arch/powerpc/platforms/chrp/setup.c | 1 arch/powerpc/platforms/chrp/smp.c | 4 arch/powerpc/platforms/maple/setup.c | 1 arch/powerpc/platforms/maple/time.c | 1 
arch/powerpc/platforms/powermac/setup.c | 1 arch/powerpc/platforms/powermac/smp.c | 4 arch/powerpc/platforms/powermac/time.c | 1 arch/powerpc/platforms/pseries/lpar.c | 4 arch/powerpc/platforms/pseries/setup.c | 1 arch/powerpc/platforms/pseries/smp.c | 4 arch/powerpc/sysdev/cpm2.c | 1 arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 2 arch/powerpc/sysdev/mpic.c | 4 arch/powerpc/xmon/xmon.c | 1 arch/riscv/include/asm/fixmap.h | 4 arch/riscv/include/asm/io.h | 4 arch/riscv/include/asm/kasan.h | 4 arch/riscv/include/asm/pgtable-64.h | 7 arch/riscv/include/asm/pgtable.h | 22 arch/riscv/kernel/module.c | 2 arch/riscv/kernel/setup.c | 1 arch/riscv/kernel/soc.c | 2 arch/riscv/kernel/stacktrace.c | 23 arch/riscv/kernel/vdso.c | 4 arch/riscv/mm/cacheflush.c | 3 arch/riscv/mm/fault.c | 14 arch/riscv/mm/init.c | 31 arch/riscv/mm/kasan_init.c | 4 arch/riscv/mm/pageattr.c | 6 arch/riscv/mm/ptdump.c | 2 arch/s390/boot/ipl_parm.c | 4 arch/s390/boot/kaslr.c | 4 arch/s390/include/asm/hugetlb.h | 4 arch/s390/include/asm/kasan.h | 4 arch/s390/include/asm/pgtable.h | 15 arch/s390/include/asm/tlbflush.h | 1 arch/s390/kernel/asm-offsets.c | 4 arch/s390/kernel/dumpstack.c | 25 arch/s390/kernel/machine_kexec.c | 1 arch/s390/kernel/ptrace.c | 1 arch/s390/kernel/uv.c | 4 arch/s390/kernel/vdso.c | 5 arch/s390/kvm/gaccess.c | 8 arch/s390/kvm/interrupt.c | 4 arch/s390/kvm/kvm-s390.c | 32 arch/s390/kvm/priv.c | 38 arch/s390/mm/dump_pagetables.c | 1 arch/s390/mm/extmem.c | 4 arch/s390/mm/fault.c | 17 arch/s390/mm/gmap.c | 80 arch/s390/mm/init.c | 1 arch/s390/mm/kasan_init.c | 4 arch/s390/mm/pageattr.c | 13 arch/s390/mm/pgalloc.c | 2 arch/s390/mm/pgtable.c | 1 arch/s390/mm/vmem.c | 1 arch/s390/pci/pci_mmio.c | 4 arch/sh/include/asm/io.h | 2 arch/sh/include/asm/kdebug.h | 6 arch/sh/include/asm/pgtable-3level.h | 7 arch/sh/include/asm/pgtable.h | 2 arch/sh/include/asm/pgtable_32.h | 25 arch/sh/include/asm/processor_32.h | 2 arch/sh/kernel/dumpstack.c | 54 arch/sh/kernel/machine_kexec.c | 1 
arch/sh/kernel/process_32.c | 2 arch/sh/kernel/ptrace_32.c | 1 arch/sh/kernel/signal_32.c | 1 arch/sh/kernel/sys_sh.c | 6 arch/sh/kernel/traps.c | 4 arch/sh/kernel/vsyscall/vsyscall.c | 4 arch/sh/mm/cache-sh3.c | 1 arch/sh/mm/cache-sh4.c | 11 arch/sh/mm/cache-sh7705.c | 1 arch/sh/mm/fault.c | 16 arch/sh/mm/kmap.c | 5 arch/sh/mm/nommu.c | 1 arch/sh/mm/pmb.c | 4 arch/sparc/include/asm/floppy_32.h | 4 arch/sparc/include/asm/highmem.h | 4 arch/sparc/include/asm/ide.h | 2 arch/sparc/include/asm/io-unit.h | 4 arch/sparc/include/asm/pgalloc_32.h | 4 arch/sparc/include/asm/pgalloc_64.h | 2 arch/sparc/include/asm/pgtable_32.h | 34 arch/sparc/include/asm/pgtable_64.h | 32 arch/sparc/kernel/cpu.c | 4 arch/sparc/kernel/entry.S | 4 arch/sparc/kernel/head_64.S | 4 arch/sparc/kernel/ktlb.S | 4 arch/sparc/kernel/leon_smp.c | 1 arch/sparc/kernel/pci.c | 4 arch/sparc/kernel/process_32.c | 29 arch/sparc/kernel/process_64.c | 3 arch/sparc/kernel/ptrace_32.c | 1 arch/sparc/kernel/ptrace_64.c | 1 arch/sparc/kernel/setup_32.c | 1 arch/sparc/kernel/setup_64.c | 1 arch/sparc/kernel/signal32.c | 1 arch/sparc/kernel/signal_32.c | 1 arch/sparc/kernel/signal_64.c | 1 arch/sparc/kernel/smp_32.c | 1 arch/sparc/kernel/smp_64.c | 1 arch/sparc/kernel/sun4m_irq.c | 4 arch/sparc/kernel/trampoline_64.S | 4 arch/sparc/kernel/traps_32.c | 4 arch/sparc/kernel/traps_64.c | 24 arch/sparc/lib/clear_page.S | 4 arch/sparc/lib/copy_page.S | 2 arch/sparc/mm/fault_32.c | 21 arch/sparc/mm/fault_64.c | 17 arch/sparc/mm/highmem.c | 12 arch/sparc/mm/hugetlbpage.c | 1 arch/sparc/mm/init_32.c | 1 arch/sparc/mm/init_64.c | 7 arch/sparc/mm/io-unit.c | 11 arch/sparc/mm/iommu.c | 9 arch/sparc/mm/tlb.c | 1 arch/sparc/mm/tsb.c | 4 arch/sparc/mm/ultra.S | 4 arch/sparc/vdso/vma.c | 4 arch/um/drivers/mconsole_kern.c | 2 arch/um/include/asm/mmu_context.h | 5 arch/um/include/asm/pgtable-3level.h | 4 arch/um/include/asm/pgtable.h | 69 arch/um/kernel/maccess.c | 12 arch/um/kernel/mem.c | 10 arch/um/kernel/process.c | 1 
arch/um/kernel/skas/mmu.c | 3 arch/um/kernel/skas/uaccess.c | 1 arch/um/kernel/sysrq.c | 35 arch/um/kernel/tlb.c | 5 arch/um/kernel/trap.c | 15 arch/um/kernel/um_arch.c | 1 arch/unicore32/include/asm/pgtable.h | 19 arch/unicore32/kernel/hibernate.c | 4 arch/unicore32/kernel/hibernate_asm.S | 4 arch/unicore32/kernel/module.c | 1 arch/unicore32/kernel/setup.h | 4 arch/unicore32/kernel/traps.c | 50 arch/unicore32/lib/backtrace.S | 24 arch/unicore32/mm/alignment.c | 4 arch/unicore32/mm/fault.c | 9 arch/unicore32/mm/mm.h | 10 arch/unicore32/mm/proc-ucv2.S | 4 arch/x86/boot/compressed/kaslr_64.c | 4 arch/x86/entry/vdso/vma.c | 14 arch/x86/events/core.c | 4 arch/x86/include/asm/agp.h | 2 arch/x86/include/asm/asm-prototypes.h | 4 arch/x86/include/asm/efi.h | 4 arch/x86/include/asm/iomap.h | 1 arch/x86/include/asm/kaslr.h | 2 arch/x86/include/asm/mmu.h | 2 arch/x86/include/asm/pgtable-3level.h | 8 arch/x86/include/asm/pgtable.h | 89 - arch/x86/include/asm/pgtable_32.h | 11 arch/x86/include/asm/pgtable_64.h | 4 arch/x86/include/asm/setup.h | 12 arch/x86/include/asm/stacktrace.h | 2 arch/x86/include/asm/uaccess.h | 16 arch/x86/include/asm/xen/hypercall.h | 4 arch/x86/include/asm/xen/page.h | 1 arch/x86/kernel/acpi/boot.c | 4 arch/x86/kernel/acpi/sleep.c | 4 arch/x86/kernel/alternative.c | 1 arch/x86/kernel/amd_gart_64.c | 5 arch/x86/kernel/apic/apic_numachip.c | 4 arch/x86/kernel/cpu/bugs.c | 4 arch/x86/kernel/cpu/common.c | 4 arch/x86/kernel/cpu/intel.c | 4 arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 6 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 arch/x86/kernel/crash_core_32.c | 4 arch/x86/kernel/crash_core_64.c | 4 arch/x86/kernel/doublefault_32.c | 1 arch/x86/kernel/dumpstack.c | 21 arch/x86/kernel/early_printk.c | 4 arch/x86/kernel/espfix_64.c | 2 arch/x86/kernel/head64.c | 4 arch/x86/kernel/head_64.S | 4 arch/x86/kernel/i8259.c | 4 arch/x86/kernel/irqinit.c | 4 arch/x86/kernel/kprobes/core.c | 4 arch/x86/kernel/kprobes/opt.c | 4 arch/x86/kernel/ldt.c | 2 
arch/x86/kernel/machine_kexec_32.c | 1 arch/x86/kernel/machine_kexec_64.c | 1 arch/x86/kernel/module.c | 1 arch/x86/kernel/paravirt.c | 4 arch/x86/kernel/process_32.c | 1 arch/x86/kernel/process_64.c | 1 arch/x86/kernel/ptrace.c | 1 arch/x86/kernel/reboot.c | 4 arch/x86/kernel/smpboot.c | 4 arch/x86/kernel/tboot.c | 3 arch/x86/kernel/vm86_32.c | 4 arch/x86/kvm/mmu/paging_tmpl.h | 8 arch/x86/mm/cpu_entry_area.c | 4 arch/x86/mm/debug_pagetables.c | 2 arch/x86/mm/dump_pagetables.c | 1 arch/x86/mm/fault.c | 22 arch/x86/mm/init.c | 22 arch/x86/mm/init_32.c | 27 arch/x86/mm/init_64.c | 1 arch/x86/mm/ioremap.c | 4 arch/x86/mm/kasan_init_64.c | 1 arch/x86/mm/kaslr.c | 37 arch/x86/mm/maccess.c | 44 arch/x86/mm/mem_encrypt_boot.S | 2 arch/x86/mm/mmio-mod.c | 4 arch/x86/mm/pat/cpa-test.c | 1 arch/x86/mm/pat/memtype.c | 1 arch/x86/mm/pat/memtype_interval.c | 4 arch/x86/mm/pgtable.c | 1 arch/x86/mm/pgtable_32.c | 1 arch/x86/mm/pti.c | 1 arch/x86/mm/setup_nx.c | 4 arch/x86/platform/efi/efi_32.c | 4 arch/x86/platform/efi/efi_64.c | 1 arch/x86/platform/olpc/olpc_ofw.c | 4 arch/x86/power/cpu.c | 4 arch/x86/power/hibernate.c | 4 arch/x86/power/hibernate_32.c | 4 arch/x86/power/hibernate_64.c | 4 arch/x86/realmode/init.c | 4 arch/x86/um/vdso/vma.c | 4 arch/x86/xen/enlighten_pv.c | 1 arch/x86/xen/grant-table.c | 1 arch/x86/xen/mmu_pv.c | 4 arch/x86/xen/smp_pv.c | 2 arch/xtensa/include/asm/fixmap.h | 12 arch/xtensa/include/asm/highmem.h | 4 arch/xtensa/include/asm/initialize_mmu.h | 2 arch/xtensa/include/asm/mmu_context.h | 4 arch/xtensa/include/asm/pgtable.h | 20 arch/xtensa/kernel/entry.S | 4 arch/xtensa/kernel/process.c | 1 arch/xtensa/kernel/ptrace.c | 1 arch/xtensa/kernel/setup.c | 1 arch/xtensa/kernel/traps.c | 42 arch/xtensa/kernel/vectors.S | 4 arch/xtensa/mm/cache.c | 4 arch/xtensa/mm/fault.c | 12 arch/xtensa/mm/highmem.c | 2 arch/xtensa/mm/ioremap.c | 4 arch/xtensa/mm/kasan_init.c | 10 arch/xtensa/mm/misc.S | 4 arch/xtensa/mm/mmu.c | 5 drivers/acpi/scan.c | 3 
drivers/android/binder_alloc.c | 14 drivers/atm/fore200e.c | 4 drivers/base/power/main.c | 4 drivers/block/z2ram.c | 4 drivers/char/agp/frontend.c | 1 drivers/char/agp/generic.c | 1 drivers/char/bsr.c | 1 drivers/char/mspec.c | 3 drivers/dma-buf/dma-resv.c | 5 drivers/firmware/efi/arm-runtime.c | 4 drivers/firmware/efi/efi.c | 2 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 2 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 2 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 10 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 4 drivers/gpu/drm/drm_vm.c | 4 drivers/gpu/drm/etnaviv/etnaviv_gem.c | 2 drivers/gpu/drm/i915/gem/i915_gem_mman.c | 4 drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 14 drivers/gpu/drm/i915/i915_mm.c | 1 drivers/gpu/drm/i915/i915_perf.c | 2 drivers/gpu/drm/nouveau/nouveau_svm.c | 22 drivers/gpu/drm/radeon/radeon_cs.c | 4 drivers/gpu/drm/radeon/radeon_gem.c | 6 drivers/gpu/drm/ttm/ttm_bo_vm.c | 10 drivers/infiniband/core/umem_odp.c | 4 drivers/infiniband/core/uverbs_main.c | 6 drivers/infiniband/hw/hfi1/mmu_rb.c | 2 drivers/infiniband/hw/mlx4/mr.c | 4 drivers/infiniband/hw/qib/qib_file_ops.c | 4 drivers/infiniband/hw/qib/qib_user_pages.c | 6 drivers/infiniband/hw/usnic/usnic_uiom.c | 4 drivers/infiniband/sw/rdmavt/mmap.c | 1 drivers/infiniband/sw/rxe/rxe_mmap.c | 1 drivers/infiniband/sw/siw/siw_mem.c | 4 drivers/iommu/amd_iommu_v2.c | 4 drivers/iommu/intel-svm.c | 4 drivers/macintosh/macio-adb.c | 4 drivers/macintosh/mediabay.c | 4 drivers/macintosh/via-pmu.c | 4 drivers/media/pci/bt8xx/bt878.c | 4 drivers/media/pci/bt8xx/btcx-risc.c | 4 drivers/media/pci/bt8xx/bttv-risc.c | 4 drivers/media/platform/davinci/vpbe_display.c | 1 drivers/media/v4l2-core/v4l2-common.c | 1 drivers/media/v4l2-core/videobuf-core.c | 4 drivers/media/v4l2-core/videobuf-dma-contig.c | 4 drivers/media/v4l2-core/videobuf-dma-sg.c | 10 drivers/media/v4l2-core/videobuf-vmalloc.c | 4 
drivers/misc/cxl/cxllib.c | 9 drivers/misc/cxl/fault.c | 4 drivers/misc/genwqe/card_utils.c | 2 drivers/misc/sgi-gru/grufault.c | 25 drivers/misc/sgi-gru/grufile.c | 4 drivers/mtd/ubi/ubi.h | 2 drivers/net/ethernet/amd/7990.c | 4 drivers/net/ethernet/amd/hplance.c | 4 drivers/net/ethernet/amd/mvme147.c | 4 drivers/net/ethernet/amd/sun3lance.c | 4 drivers/net/ethernet/amd/sunlance.c | 4 drivers/net/ethernet/apple/bmac.c | 4 drivers/net/ethernet/apple/mace.c | 4 drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c | 4 drivers/net/ethernet/freescale/fs_enet/mac-fcc.c | 4 drivers/net/ethernet/freescale/fs_enet/mii-fec.c | 4 drivers/net/ethernet/i825xx/82596.c | 4 drivers/net/ethernet/korina.c | 4 drivers/net/ethernet/marvell/pxa168_eth.c | 4 drivers/net/ethernet/natsemi/jazzsonic.c | 4 drivers/net/ethernet/natsemi/macsonic.c | 4 drivers/net/ethernet/natsemi/xtsonic.c | 4 drivers/net/ethernet/sun/sunbmac.c | 4 drivers/net/ethernet/sun/sunhme.c | 1 drivers/net/ethernet/sun/sunqe.c | 4 drivers/oprofile/buffer_sync.c | 12 drivers/sbus/char/flash.c | 1 drivers/sbus/char/uctrl.c | 1 drivers/scsi/53c700.c | 4 drivers/scsi/a2091.c | 1 drivers/scsi/a3000.c | 1 drivers/scsi/arm/cumana_2.c | 4 drivers/scsi/arm/eesox.c | 4 drivers/scsi/arm/powertec.c | 4 drivers/scsi/dpt_i2o.c | 4 drivers/scsi/gvp11.c | 1 drivers/scsi/lasi700.c | 1 drivers/scsi/mac53c94.c | 4 drivers/scsi/mesh.c | 4 drivers/scsi/mvme147.c | 1 drivers/scsi/qlogicpti.c | 4 drivers/scsi/sni_53c710.c | 1 drivers/scsi/zorro_esp.c | 4 drivers/staging/android/ashmem.c | 4 drivers/staging/comedi/comedi_fops.c | 2 drivers/staging/kpc2000/kpc_dma/fileops.c | 4 drivers/staging/media/atomisp/pci/hmm/hmm_bo.c | 4 drivers/tee/optee/call.c | 4 drivers/tty/sysrq.c | 4 drivers/tty/vt/consolemap.c | 2 drivers/vfio/pci/vfio_pci.c | 22 drivers/vfio/vfio_iommu_type1.c | 8 drivers/vhost/vdpa.c | 4 drivers/video/console/newport_con.c | 1 drivers/video/fbdev/acornfb.c | 1 drivers/video/fbdev/atafb.c | 1 drivers/video/fbdev/cirrusfb.c | 
1 drivers/video/fbdev/cyber2000fb.c | 1 drivers/video/fbdev/fb-puv3.c | 1 drivers/video/fbdev/hitfb.c | 1 drivers/video/fbdev/neofb.c | 1 drivers/video/fbdev/q40fb.c | 1 drivers/video/fbdev/savage/savagefb_driver.c | 1 drivers/xen/balloon.c | 1 drivers/xen/gntdev.c | 6 drivers/xen/grant-table.c | 1 drivers/xen/privcmd.c | 15 drivers/xen/xenbus/xenbus_probe.c | 1 drivers/xen/xenbus/xenbus_probe_backend.c | 1 drivers/xen/xenbus/xenbus_probe_frontend.c | 1 fs/aio.c | 4 fs/coredump.c | 8 fs/exec.c | 18 fs/ext2/file.c | 2 fs/ext4/super.c | 6 fs/hugetlbfs/inode.c | 2 fs/io_uring.c | 4 fs/kernfs/file.c | 4 fs/proc/array.c | 1 fs/proc/base.c | 24 fs/proc/meminfo.c | 1 fs/proc/nommu.c | 1 fs/proc/task_mmu.c | 34 fs/proc/task_nommu.c | 18 fs/proc/vmcore.c | 1 fs/userfaultfd.c | 46 fs/xfs/xfs_file.c | 2 fs/xfs/xfs_inode.c | 14 fs/xfs/xfs_iops.c | 4 include/asm-generic/io.h | 2 include/asm-generic/pgtable-nopmd.h | 1 include/asm-generic/pgtable-nopud.h | 1 include/asm-generic/pgtable.h | 1322 ---------------- include/linux/cache.h | 10 include/linux/crash_dump.h | 3 include/linux/dax.h | 1 include/linux/dma-noncoherent.h | 2 include/linux/fs.h | 4 include/linux/hmm.h | 2 include/linux/huge_mm.h | 2 include/linux/hugetlb.h | 2 include/linux/io-mapping.h | 4 include/linux/kallsyms.h | 4 include/linux/kasan.h | 4 include/linux/mempolicy.h | 2 include/linux/mm.h | 15 include/linux/mm_types.h | 4 include/linux/mmap_lock.h | 128 + include/linux/mmu_notifier.h | 13 include/linux/pagemap.h | 2 include/linux/pgtable.h | 1444 +++++++++++++++++- include/linux/rmap.h | 2 include/linux/sched/debug.h | 7 include/linux/sched/mm.h | 10 include/linux/uaccess.h | 62 include/xen/arm/page.h | 4 init/init_task.c | 1 ipc/shm.c | 8 kernel/acct.c | 6 kernel/bpf/stackmap.c | 21 kernel/bpf/syscall.c | 2 kernel/cgroup/cpuset.c | 4 kernel/debug/kdb/kdb_bt.c | 17 kernel/events/core.c | 10 kernel/events/uprobes.c | 20 kernel/exit.c | 11 kernel/fork.c | 15 kernel/futex.c | 4 kernel/locking/lockdep.c | 4 
kernel/locking/rtmutex-debug.c | 4 kernel/power/snapshot.c | 1 kernel/relay.c | 2 kernel/sched/core.c | 10 kernel/sched/fair.c | 4 kernel/sys.c | 22 kernel/trace/bpf_trace.c | 176 +- kernel/trace/ftrace.c | 8 kernel/trace/trace_kprobe.c | 80 kernel/trace/trace_output.c | 4 lib/dump_stack.c | 4 lib/ioremap.c | 1 lib/test_hmm.c | 14 lib/test_lockup.c | 16 mm/debug.c | 10 mm/debug_vm_pgtable.c | 1 mm/filemap.c | 46 mm/frame_vector.c | 6 mm/gup.c | 73 mm/hmm.c | 2 mm/huge_memory.c | 8 mm/hugetlb.c | 3 mm/init-mm.c | 6 mm/internal.h | 6 mm/khugepaged.c | 72 mm/ksm.c | 48 mm/maccess.c | 496 +++--- mm/madvise.c | 40 mm/memcontrol.c | 10 mm/memory.c | 61 mm/mempolicy.c | 36 mm/migrate.c | 16 mm/mincore.c | 8 mm/mlock.c | 22 mm/mmap.c | 74 mm/mmu_gather.c | 2 mm/mmu_notifier.c | 22 mm/mprotect.c | 22 mm/mremap.c | 14 mm/msync.c | 8 mm/nommu.c | 22 mm/oom_kill.c | 14 mm/page_io.c | 1 mm/page_reporting.h | 2 mm/pagewalk.c | 12 mm/pgtable-generic.c | 6 mm/process_vm_access.c | 4 mm/ptdump.c | 4 mm/rmap.c | 12 mm/shmem.c | 5 mm/sparse-vmemmap.c | 1 mm/sparse.c | 1 mm/swap_state.c | 5 mm/swapfile.c | 5 mm/userfaultfd.c | 26 mm/util.c | 12 mm/vmacache.c | 1 mm/zsmalloc.c | 4 net/ipv4/tcp.c | 8 net/xdp/xdp_umem.c | 4 security/keys/keyctl.c | 2 sound/core/oss/pcm_oss.c | 2 sound/core/sgbuf.c | 1 sound/pci/hda/hda_intel.c | 4 sound/soc/intel/common/sst-firmware.c | 4 sound/soc/intel/haswell/sst-haswell-pcm.c | 4 tools/include/linux/kallsyms.h | 2 virt/kvm/async_pf.c | 4 virt/kvm/kvm_main.c | 9 942 files changed, 4580 insertions(+), 5662 deletions(-)
- various hotfixes and minor things - hch's use_mm/unuse_mm cleanups - new syscall process_madvise(): perform madvise() on a process other than self 25 patches, based on 6f630784cc0d92fb58ea326e2bc01aa056279ecb. Subsystems affected by this patch series: mm/hugetlb scripts kcov lib nilfs checkpatch lib mm/debug ocfs2 lib misc mm/madvise Subsystem: mm/hugetlb Dan Carpenter <dan.carpenter@oracle.com>: khugepaged: selftests: fix timeout condition in wait_for_scan() Subsystem: scripts SeongJae Park <sjpark@amazon.de>: scripts/spelling: add a few more typos Subsystem: kcov Andrey Konovalov <andreyknvl@google.com>: kcov: check kcov_softirq in kcov_remote_stop() Subsystem: lib Joe Perches <joe@perches.com>: lib/lz4/lz4_decompress.c: document deliberate use of `&' Subsystem: nilfs Ryusuke Konishi <konishi.ryusuke@gmail.com>: nilfs2: fix null pointer dereference at nilfs_segctor_do_construct() Subsystem: checkpatch Tim Froidcoeur <tim.froidcoeur@tessares.net>: checkpatch: correct check for kernel parameters doc Subsystem: lib Alexander Gordeev <agordeev@linux.ibm.com>: lib: fix bitmap_parse() on 64-bit big endian archs Subsystem: mm/debug "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>: mm/debug_vm_pgtable: fix kernel crash by checking for THP support Subsystem: ocfs2 Keyur Patel <iamkeyur96@gmail.com>: ocfs2: fix spelling mistake and grammar Ben Widawsky <ben.widawsky@intel.com>: mm: add comments on pglist_data zones Subsystem: lib Wei Yang <richard.weiyang@gmail.com>: lib: test get_count_order/long in test_bitops.c Subsystem: misc Walter Wu <walter-zh.wu@mediatek.com>: stacktrace: cleanup inconsistent variable type Christoph Hellwig <hch@lst.de>: Patch series "improve use_mm / unuse_mm", v2: kernel: move use_mm/unuse_mm to kthread.c kernel: move use_mm/unuse_mm to kthread.c kernel: better document the use_mm/unuse_mm API contract kernel: set USER_DS in kthread_use_mm Subsystem: mm/madvise Minchan Kim <minchan@kernel.org>: Patch series "introduce memory hinting API for 
external process", v7: mm/madvise: pass task and mm to do_madvise mm/madvise: introduce process_madvise() syscall: an external memory hinting API mm/madvise: check fatal signal pending of target process pid: move pidfd_get_pid() to pid.c mm/madvise: support both pid and pidfd for process_madvise Oleksandr Natalenko <oleksandr@redhat.com>: mm/madvise: allow KSM hints for remote API Minchan Kim <minchan@kernel.org>: mm: support vector address ranges for process_madvise mm: use only pidfd for process_madvise syscall YueHaibing <yuehaibing@huawei.com>: mm/madvise.c: remove duplicated include arch/alpha/kernel/syscalls/syscall.tbl | 1 arch/arm/tools/syscall.tbl | 1 arch/arm64/include/asm/unistd.h | 2 arch/arm64/include/asm/unistd32.h | 4 arch/ia64/kernel/syscalls/syscall.tbl | 1 arch/m68k/kernel/syscalls/syscall.tbl | 1 arch/microblaze/kernel/syscalls/syscall.tbl | 1 arch/mips/kernel/syscalls/syscall_n32.tbl | 3 arch/mips/kernel/syscalls/syscall_n64.tbl | 1 arch/mips/kernel/syscalls/syscall_o32.tbl | 3 arch/parisc/kernel/syscalls/syscall.tbl | 3 arch/powerpc/kernel/syscalls/syscall.tbl | 3 arch/powerpc/platforms/powernv/vas-fault.c | 4 arch/s390/kernel/syscalls/syscall.tbl | 3 arch/sh/kernel/syscalls/syscall.tbl | 1 arch/sparc/kernel/syscalls/syscall.tbl | 3 arch/x86/entry/syscalls/syscall_32.tbl | 3 arch/x86/entry/syscalls/syscall_64.tbl | 5 arch/xtensa/kernel/syscalls/syscall.tbl | 1 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 5 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 1 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 1 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 2 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 2 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 2 drivers/gpu/drm/i915/gvt/kvmgt.c | 2 drivers/usb/gadget/function/f_fs.c | 10 drivers/usb/gadget/legacy/inode.c | 6 drivers/vfio/vfio_iommu_type1.c | 6 drivers/vhost/vhost.c | 8 fs/aio.c | 1 fs/io-wq.c | 15 - fs/io_uring.c | 11 fs/nilfs2/segment.c | 2 fs/ocfs2/mmap.c | 2 
include/linux/compat.h | 10 include/linux/kthread.h | 9 include/linux/mm.h | 3 include/linux/mmu_context.h | 5 include/linux/mmzone.h | 14 include/linux/pid.h | 1 include/linux/stacktrace.h | 2 include/linux/syscalls.h | 16 - include/uapi/asm-generic/unistd.h | 7 kernel/exit.c | 17 - kernel/kcov.c | 26 + kernel/kthread.c | 95 +++++- kernel/pid.c | 17 + kernel/sys_ni.c | 2 lib/Kconfig.debug | 10 lib/bitmap.c | 9 lib/lz4/lz4_decompress.c | 3 lib/test_bitops.c | 53 +++ mm/Makefile | 2 mm/debug_vm_pgtable.c | 6 mm/madvise.c | 295 ++++++++++++++------ mm/mmu_context.c | 64 ---- mm/oom_kill.c | 6 mm/vmacache.c | 4 scripts/checkpatch.pl | 4 scripts/spelling.txt | 9 tools/testing/selftests/vm/khugepaged.c | 2 62 files changed, 526 insertions(+), 285 deletions(-)
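For readers unfamiliar with the process_madvise() syscall introduced by the series above, here is a minimal userspace sketch. It assumes the interface named by the final patches in the list (pidfd only, a vector of address ranges) and a zero flags argument. The helper name hint_remote_range() is hypothetical, and the fallback syscall numbers are an assumption that holds for asm-generic and x86-64 only; other ABIs may use different values.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Fallbacks for older headers (assumed values, see the note above). */
#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434
#endif
#ifndef __NR_process_madvise
#define __NR_process_madvise 440
#endif
#ifndef MADV_COLD
#define MADV_COLD 20
#endif

/*
 * Hypothetical helper: apply `advice` to one address range of process `pid`.
 * Returns the syscall's result (bytes advised) or -1 with errno set.
 */
static ssize_t hint_remote_range(pid_t pid, void *addr, size_t len, int advice)
{
	struct iovec range = { .iov_base = addr, .iov_len = len };
	ssize_t ret;
	int saved_errno;

	/* The target is named by a pidfd, not by a raw pid. */
	int pidfd = (int)syscall(__NR_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;

	/* One iovec entry here; the syscall accepts a vector of ranges. */
	ret = syscall(__NR_process_madvise, pidfd, &range, (size_t)1, advice, 0U);

	saved_errno = errno;
	close(pidfd);
	errno = saved_errno;
	return ret;
}
```

A caller would pass an address range that is valid in the target process, for example hint_remote_range(pid, addr, len, MADV_COLD). On kernels without the syscall the helper fails with errno set to ENOSYS.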
A few fixes and stragglers.

5 patches, based on 623f6dc593eaf98b91916836785278eddddaacf8.

Subsystems affected by this patch series:

  mm/memory-failure
  ocfs2
  lib/lzo
  misc

Subsystem: mm/memory-failure

    Naoya Horiguchi <nao.horiguchi@gmail.com>:
    Patch series "hwpoison: fixes signaling on memory error":
      mm/memory-failure: prioritize prctl(PR_MCE_KILL) over vm.memory_failure_early_kill
      mm/memory-failure: send SIGBUS(BUS_MCEERR_AR) only to current thread

Subsystem: ocfs2

    Tom Seewald <tseewald@gmail.com>:
      ocfs2: fix build failure when TCP/IP is disabled

Subsystem: lib/lzo

    Dave Rodgman <dave.rodgman@arm.com>:
      lib/lzo: fix ambiguous encoding bug in lzo-rle

Subsystem: misc

    Christoph Hellwig <hch@lst.de>:
      amdgpu: a NULL ->mm does not mean a thread is a kthread

 Documentation/lzo.txt                      |    8 ++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |    2 -
 fs/ocfs2/Kconfig                           |    2 -
 lib/lzo/lzo1x_compress.c                   |   13 ++++++++
 mm/memory-failure.c                        |   43 +++++++++++++++++------------
 5 files changed, 47 insertions(+), 21 deletions(-)
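The first mm/memory-failure patch above concerns the per-thread policy set through prctl(PR_MCE_KILL). As a rough illustration of that userspace interface, not of the kernel change itself, the sketch below opts the calling thread into early kill and reads the policy back. The function names are hypothetical; the prctl constants come from <sys/prctl.h>.

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper: ask for SIGBUS as soon as a memory error is detected
 * in a page this thread maps, rather than on first access.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int request_early_kill(void)
{
	/* The unused trailing arguments must be zero. */
	return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}

/*
 * Hypothetical helper: return the calling thread's current policy, one of
 * PR_MCE_KILL_EARLY, PR_MCE_KILL_LATE or PR_MCE_KILL_DEFAULT, or -1 on error.
 */
static int current_kill_policy(void)
{
	return prctl(PR_MCE_KILL_GET, 0, 0, 0, 0);
}
```

With PR_MCE_KILL_DEFAULT, the thread falls back to the system-wide vm.memory_failure_early_kill setting. Per the patch title, an explicit per-thread choice is meant to take priority over that sysctl.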
32 patches, based on 908f7d12d3ba51dfe0449b9723199b423f97ca9a. Subsystems affected by this patch series: hotfixes mm/pagealloc kexec ocfs2 lib misc mm/slab mm/slab mm/slub mm/swap mm/pagemap mm/vmalloc mm/memcg mm/gup mm/thp mm/vmscan x86 mm/memory-hotplug MAINTAINERS Subsystem: hotfixes Stafford Horne <shorne@gmail.com>: openrisc: fix boot oops when DEBUG_VM is enabled Michal Hocko <mhocko@suse.com>: mm: do_swap_page(): fix up the error code Subsystem: mm/pagealloc Vlastimil Babka <vbabka@suse.cz>: mm, compaction: make capture control handling safe wrt interrupts Subsystem: kexec Lianbo Jiang <lijiang@redhat.com>: kexec: do not verify the signature without the lockdown or mandatory signature Subsystem: ocfs2 Junxiao Bi <junxiao.bi@oracle.com>: Patch series "ocfs2: fix nfsd over ocfs2 issues", v2: ocfs2: avoid inode removal while nfsd is accessing it ocfs2: load global_inode_alloc ocfs2: fix panic on nfs server over ocfs2 ocfs2: fix value of OCFS2_INVALID_SLOT Subsystem: lib Randy Dunlap <rdunlap@infradead.org>: lib: fix test_hmm.c reference after free Subsystem: misc Rikard Falkeborn <rikard.falkeborn@gmail.com>: linux/bits.h: fix unsigned less than zero warnings Subsystem: mm/slab Waiman Long <longman@redhat.com>: mm, slab: fix sign conversion problem in memcg_uncharge_slab() Subsystem: mm/slab Waiman Long <longman@redhat.com>: mm/slab: use memzero_explicit() in kzfree() Subsystem: mm/slub Sebastian Andrzej Siewior <bigeasy@linutronix.de>: slub: cure list_slab_objects() from double fix Subsystem: mm/swap Hugh Dickins <hughd@google.com>: mm: fix swap cache node allocation mask Subsystem: mm/pagemap Arjun Roy <arjunroy@google.com>: mm/memory.c: properly pte_offset_map_lock/unlock in vm_insert_pages() Christophe Leroy <christophe.leroy@csgroup.eu>: mm/debug_vm_pgtable: fix build failure with powerpc 8xx Stephen Rothwell <sfr@canb.auug.org.au>: make asm-generic/cacheflush.h more standalone Nathan Chancellor <natechancellor@gmail.com>: media: omap3isp: remove 
cacheflush.h Subsystem: mm/vmalloc Masanari Iida <standby24x7@gmail.com>: mm/vmalloc.c: fix a warning while make xmldocs Subsystem: mm/memcg Johannes Weiner <hannes@cmpxchg.org>: mm: memcontrol: handle div0 crash race condition in memory.low Muchun Song <songmuchun@bytedance.com>: mm/memcontrol.c: add missed css_put() Chris Down <chris@chrisdown.name>: mm/memcontrol.c: prevent missed memory.low load tears Subsystem: mm/gup Souptick Joarder <jrdr.linux@gmail.com>: docs: mm/gup: minor documentation update Subsystem: mm/thp Yang Shi <yang.shi@linux.alibaba.com>: doc: THP CoW fault no longer allocate THP Subsystem: mm/vmscan Johannes Weiner <hannes@cmpxchg.org>: Patch series "fix for "mm: balance LRU lists based on relative thrashing" patchset": mm: workingset: age nonresident information alongside anonymous pages Joonsoo Kim <iamjoonsoo.kim@lge.com>: mm/swap: fix for "mm: workingset: age nonresident information alongside anonymous pages" mm/memory: fix IO cost for anonymous page Subsystem: x86 Christoph Hellwig <hch@lst.de>: Patch series "fix a hyperv W^X violation and remove vmalloc_exec": x86/hyperv: allocate the hypercall page with only read and execute bits arm64: use PAGE_KERNEL_ROX directly in alloc_insn_page mm: remove vmalloc_exec Subsystem: mm/memory-hotplug Ben Widawsky <ben.widawsky@intel.com>: mm/memory_hotplug.c: fix false softlockup during pfn range removal Subsystem: MAINTAINERS Luc Van Oostenryck <luc.vanoostenryck@gmail.com>: MAINTAINERS: update info for sparse Documentation/admin-guide/cgroup-v2.rst | 4 +- Documentation/admin-guide/mm/transhuge.rst | 3 - Documentation/core-api/pin_user_pages.rst | 2 - MAINTAINERS | 4 +- arch/arm64/kernel/probes/kprobes.c | 12 +------ arch/openrisc/kernel/dma.c | 5 +++ arch/x86/hyperv/hv_init.c | 4 +- arch/x86/include/asm/pgtable_types.h | 2 + drivers/media/platform/omap3isp/isp.c | 2 - drivers/media/platform/omap3isp/ispvideo.c | 1 fs/ocfs2/dlmglue.c | 17 ++++++++++ fs/ocfs2/ocfs2.h | 1 fs/ocfs2/ocfs2_fs.h | 4 +- 
fs/ocfs2/suballoc.c | 9 +++-- include/asm-generic/cacheflush.h | 5 +++ include/linux/bits.h | 3 + include/linux/mmzone.h | 4 +- include/linux/swap.h | 1 include/linux/vmalloc.h | 1 kernel/kexec_file.c | 36 ++++------------------ kernel/module.c | 4 +- lib/test_hmm.c | 3 - mm/compaction.c | 17 ++++++++-- mm/debug_vm_pgtable.c | 4 +- mm/memcontrol.c | 18 ++++++++--- mm/memory.c | 33 +++++++++++++------- mm/memory_hotplug.c | 13 ++++++-- mm/nommu.c | 17 ---------- mm/slab.h | 4 +- mm/slab_common.c | 2 - mm/slub.c | 19 ++--------- mm/swap.c | 3 - mm/swap_state.c | 4 +- mm/vmalloc.c | 21 ------------- mm/vmscan.c | 3 + mm/workingset.c | 46 +++++++++++++++++------------ 36 files changed, 168 insertions(+), 163 deletions(-)
5 patches, based on cdd3bb54332f82295ed90cd0c09c78cd0c0ee822.

Subsystems affected by this patch series:

  mm/hugetlb
  samples
  mm/cma
  mm/vmalloc
  mm/pagealloc

Subsystem: mm/hugetlb

    Mike Kravetz <mike.kravetz@oracle.com>:
      mm/hugetlb.c: fix pages per hugetlb calculation

Subsystem: samples

    Kees Cook <keescook@chromium.org>:
      samples/vfs: avoid warning in statx override

Subsystem: mm/cma

    Barry Song <song.bao.hua@hisilicon.com>:
      mm/cma.c: use exact_nid true to fix possible per-numa cma leak

Subsystem: mm/vmalloc

    Christoph Hellwig <hch@lst.de>:
      vmalloc: fix the owner argument for the new __vmalloc_node_range callers

Subsystem: mm/pagealloc

    Joel Savitz <jsavitz@redhat.com>:
      mm/page_alloc: fix documentation error

 arch/arm64/kernel/probes/kprobes.c |    2 +-
 arch/x86/hyperv/hv_init.c          |    3 ++-
 kernel/module.c                    |    2 +-
 mm/cma.c                           |    4 ++--
 mm/hugetlb.c                       |    2 +-
 mm/page_alloc.c                    |    2 +-
 samples/vfs/test-statx.c           |    2 ++
 7 files changed, 10 insertions(+), 7 deletions(-)
15 patches, based on f37e99aca03f63aa3f2bd13ceaf769455d12c4b0. Subsystems affected by this patch series: mm/pagemap mm/shmem mm/hotfixes mm/memcg mm/hugetlb mailmap squashfs scripts io-mapping MAINTAINERS gdb Subsystem: mm/pagemap Yang Shi <yang.shi@linux.alibaba.com>: mm/memory.c: avoid access flag update TLB flush for retried page fault "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>: mm/mmap.c: close race between munmap() and expand_upwards()/downwards() Subsystem: mm/shmem Chengguang Xu <cgxu519@mykernel.net>: vfs/xattr: mm/shmem: kernfs: release simple xattr entry in a right way Subsystem: mm/hotfixes Tom Rix <trix@redhat.com>: mm: initialize return of vm_insert_pages Bhupesh Sharma <bhsharma@redhat.com>: mm/memcontrol: fix OOPS inside mem_cgroup_get_nr_swap_pages() Subsystem: mm/memcg Hugh Dickins <hughd@google.com>: mm/memcg: fix refcount error while moving and swapping Muchun Song <songmuchun@bytedance.com>: mm: memcg/slab: fix memory leak at non-root kmem_cache destroy Subsystem: mm/hugetlb Barry Song <song.bao.hua@hisilicon.com>: mm/hugetlb: avoid hardcoding while checking if cma is enabled "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>: khugepaged: fix null-pointer dereference due to race Subsystem: mailmap Mike Rapoport <rppt@linux.ibm.com>: mailmap: add entry for Mike Rapoport Subsystem: squashfs Phillip Lougher <phillip@squashfs.org.uk>: squashfs: fix length field overlap check in metadata reading Subsystem: scripts Pi-Hsun Shih <pihsun@chromium.org>: scripts/decode_stacktrace: strip basepath from all paths Subsystem: io-mapping "Michael J. 
Ruhl" <michael.j.ruhl@intel.com>: io-mapping: indicate mapping failure Subsystem: MAINTAINERS Andrey Konovalov <andreyknvl@google.com>: MAINTAINERS: add KCOV section Subsystem: gdb Stefano Garzarella <sgarzare@redhat.com>: scripts/gdb: fix lx-symbols 'gdb.error' while loading modules .mailmap | 3 +++ MAINTAINERS | 11 +++++++++++ fs/squashfs/block.c | 2 +- include/linux/io-mapping.h | 5 ++++- include/linux/xattr.h | 3 ++- mm/hugetlb.c | 15 ++++++++++----- mm/khugepaged.c | 3 +++ mm/memcontrol.c | 13 ++++++++++--- mm/memory.c | 9 +++++++-- mm/mmap.c | 16 ++++++++++++++-- mm/shmem.c | 2 +- mm/slab_common.c | 35 ++++++++++++++++++++++++++++------- scripts/decode_stacktrace.sh | 4 ++-- scripts/gdb/linux/symbols.py | 2 +- 14 files changed, 97 insertions(+), 26 deletions(-)
- A few MM hotfixes - kthread, tools, scripts, ntfs and ocfs2 - Some of MM 163 patches, based on d6efb3ac3e6c19ab722b28bdb9252bae0b9676b6. Subsystems affected by this patch series: mm/pagemap mm/hofixes mm/pagealloc kthread tools scripts ntfs ocfs2 mm/slab-generic mm/slab mm/slub mm/kcsan mm/debug mm/pagecache mm/gup mm/swap mm/shmem mm/memcg mm/pagemap mm/mremap mm/mincore mm/sparsemem mm/vmalloc mm/kasan mm/pagealloc mm/hugetlb mm/vmscan Subsystem: mm/pagemap Yang Shi <yang.shi@linux.alibaba.com>: mm/memory.c: avoid access flag update TLB flush for retried page fault Subsystem: mm/hofixes Ralph Campbell <rcampbell@nvidia.com>: mm/migrate: fix migrate_pgmap_owner w/o CONFIG_MMU_NOTIFIER Subsystem: mm/pagealloc David Hildenbrand <david@redhat.com>: mm/shuffle: don't move pages between zones and don't read garbage memmaps Subsystem: kthread Peter Zijlstra <peterz@infradead.org>: mm: fix kthread_use_mm() vs TLB invalidate Ilias Stamatis <stamatis.iliass@gmail.com>: kthread: remove incorrect comment in kthread_create_on_cpu() Subsystem: tools "Alexander A. 
Klimov" <grandmaster@al2klimov.de>: tools/: replace HTTP links with HTTPS ones Gaurav Singh <gaurav1086@gmail.com>: tools/testing/selftests/cgroup/cgroup_util.c: cg_read_strcmp: fix null pointer dereference Subsystem: scripts Jialu Xu <xujialu@vimux.org>: scripts/tags.sh: collect compiled source precisely Nikolay Borisov <nborisov@suse.com>: scripts/bloat-o-meter: Support comparing library archives Konstantin Khlebnikov <khlebnikov@yandex-team.ru>: scripts/decode_stacktrace.sh: skip missing symbols scripts/decode_stacktrace.sh: guess basepath if not specified scripts/decode_stacktrace.sh: guess path to modules scripts/decode_stacktrace.sh: guess path to vmlinux by release name Joe Perches <joe@perches.com>: const_structs.checkpatch: add regulator_ops Colin Ian King <colin.king@canonical.com>: scripts/spelling.txt: add more spellings to spelling.txt Subsystem: ntfs Luca Stefani <luca.stefani.ge1@gmail.com>: ntfs: fix ntfs_test_inode and ntfs_init_locked_inode function type Subsystem: ocfs2 Gang He <ghe@suse.com>: ocfs2: fix remounting needed after setfacl command Randy Dunlap <rdunlap@infradead.org>: ocfs2: suballoc.h: delete a duplicated word Junxiao Bi <junxiao.bi@oracle.com>: ocfs2: change slot number type s16 to u16 "Alexander A. 
Klimov" <grandmaster@al2klimov.de>: ocfs2: replace HTTP links with HTTPS ones Pavel Machek <pavel@ucw.cz>: ocfs2: fix unbalanced locking Subsystem: mm/slab-generic Waiman Long <longman@redhat.com>: mm, treewide: rename kzfree() to kfree_sensitive() William Kucharski <william.kucharski@oracle.com>: mm: ksize() should silently accept a NULL pointer Subsystem: mm/slab Kees Cook <keescook@chromium.org>: Patch series "mm: Expand CONFIG_SLAB_FREELIST_HARDENED to include SLAB": mm/slab: expand CONFIG_SLAB_FREELIST_HARDENED to include SLAB mm/slab: add naive detection of double free Long Li <lonuxli.64@gmail.com>: mm, slab: check GFP_SLAB_BUG_MASK before alloc_pages in kmalloc_order Xiao Yang <yangx.jy@cn.fujitsu.com>: mm/slab.c: update outdated kmem_list3 in a comment Subsystem: mm/slub Vlastimil Babka <vbabka@suse.cz>: Patch series "slub_debug fixes and improvements": mm, slub: extend slub_debug syntax for multiple blocks mm, slub: make some slub_debug related attributes read-only mm, slub: remove runtime allocation order changes mm, slub: make remaining slub_debug related attributes read-only mm, slub: make reclaim_account attribute read-only mm, slub: introduce static key for slub_debug() mm, slub: introduce kmem_cache_debug_flags() mm, slub: extend checks guarded by slub_debug static key mm, slab/slub: move and improve cache_from_obj() mm, slab/slub: improve error reporting and overhead of cache_from_obj() Sebastian Andrzej Siewior <bigeasy@linutronix.de>: mm/slub.c: drop lockdep_assert_held() from put_map() Subsystem: mm/kcsan Marco Elver <elver@google.com>: mm, kcsan: instrument SLAB/SLUB free with "ASSERT_EXCLUSIVE_ACCESS" Subsystem: mm/debug Anshuman Khandual <anshuman.khandual@arm.com>: Patch series "mm/debug_vm_pgtable: Add some more tests", v5: mm/debug_vm_pgtable: add tests validating arch helpers for core MM features mm/debug_vm_pgtable: add tests validating advanced arch page table helpers mm/debug_vm_pgtable: add debug prints for individual tests 
Documentation/mm: add descriptions for arch page table helpers "Matthew Wilcox (Oracle)" <willy@infradead.org>: Patch series "Improvements for dump_page()", v2: mm/debug: handle page->mapping better in dump_page mm/debug: dump compound page information on a second line mm/debug: print head flags in dump_page mm/debug: switch dump_page to get_kernel_nofault mm/debug: print the inode number in dump_page mm/debug: print hashed address of struct page John Hubbard <jhubbard@nvidia.com>: mm, dump_page: do not crash with bad compound_mapcount() Subsystem: mm/pagecache Yang Shi <yang.shi@linux.alibaba.com>: mm: filemap: clear idle flag for writes mm: filemap: add missing FGP_ flags in kerneldoc comment for pagecache_get_page Subsystem: mm/gup Tang Yizhou <tangyizhou@huawei.com>: mm/gup.c: fix the comment of return value for populate_vma_page_range() Subsystem: mm/swap Zhen Lei <thunder.leizhen@huawei.com>: Patch series "clean up some functions in mm/swap_slots.c": mm/swap_slots.c: simplify alloc_swap_slot_cache() mm/swap_slots.c: simplify enable_swap_slots_cache() mm/swap_slots.c: remove redundant check for swap_slot_cache_initialized Krzysztof Kozlowski <krzk@kernel.org>: mm: swap: fix kerneldoc of swap_vma_readahead() Xianting Tian <xianting_tian@126.com>: mm/page_io.c: use blk_io_schedule() for avoiding task hung in sync io Subsystem: mm/shmem Chris Down <chris@chrisdown.name>: Patch series "tmpfs: inode: Reduce risk of inum overflow", v7: tmpfs: per-superblock i_ino support tmpfs: support 64-bit inums per-sb Subsystem: mm/memcg Roman Gushchin <guro@fb.com>: mm: kmem: make memcg_kmem_enabled() irreversible Patch series "The new cgroup slab memory controller", v7: mm: memcg: factor out memcg- and lruvec-level changes out of __mod_lruvec_state() mm: memcg: prepare for byte-sized vmstat items mm: memcg: convert vmstat slab counters to bytes mm: slub: implement SLUB version of obj_to_index() Johannes Weiner <hannes@cmpxchg.org>: mm: memcontrol: decouple reference counting 
from page accounting Roman Gushchin <guro@fb.com>: mm: memcg/slab: obj_cgroup API mm: memcg/slab: allocate obj_cgroups for non-root slab pages mm: memcg/slab: save obj_cgroup for non-root slab objects mm: memcg/slab: charge individual slab objects instead of pages mm: memcg/slab: deprecate memory.kmem.slabinfo mm: memcg/slab: move memcg_kmem_bypass() to memcontrol.h mm: memcg/slab: use a single set of kmem_caches for all accounted allocations mm: memcg/slab: simplify memcg cache creation mm: memcg/slab: remove memcg_kmem_get_cache() mm: memcg/slab: deprecate slab_root_caches mm: memcg/slab: remove redundant check in memcg_accumulate_slabinfo() mm: memcg/slab: use a single set of kmem_caches for all allocations kselftests: cgroup: add kernel memory accounting tests tools/cgroup: add memcg_slabinfo.py tool Shakeel Butt <shakeelb@google.com>: mm: memcontrol: account kernel stack per node Roman Gushchin <guro@fb.com>: mm: memcg/slab: remove unused argument by charge_slab_page() mm: slab: rename (un)charge_slab_page() to (un)account_slab_page() mm: kmem: switch to static_branch_likely() in memcg_kmem_enabled() mm: memcontrol: avoid workload stalls when lowering memory.high Chris Down <chris@chrisdown.name>: Patch series "mm, memcg: reclaim harder before high throttling", v2: mm, memcg: reclaim more aggressively before high allocator throttling mm, memcg: unify reclaim retry limits with page allocator Yafang Shao <laoar.shao@gmail.com>: Patch series "mm, memcg: memory.{low,min} reclaim fix & cleanup", v4: mm, memcg: avoid stale protection values when cgroup is above protection Chris Down <chris@chrisdown.name>: mm, memcg: decouple e{low,min} state mutations from protection checks Yafang Shao <laoar.shao@gmail.com>: memcg, oom: check memcg margin for parallel oom Johannes Weiner <hannes@cmpxchg.org>: mm: memcontrol: restore proper dirty throttling when memory.high changes mm: memcontrol: don't count limit-setting reclaim as memory pressure Michal Koutný 
<mkoutny@suse.com>: mm/page_counter.c: fix protection usage propagation Subsystem: mm/pagemap Ralph Campbell <rcampbell@nvidia.com>: mm: remove redundant check non_swap_entry() Alex Zhang <zhangalex@google.com>: mm/memory.c: make remap_pfn_range() reject unaligned addr Mike Rapoport <rppt@linux.ibm.com>: Patch series "mm: cleanup usage of <asm/pgalloc.h>": mm: remove unneeded includes of <asm/pgalloc.h> opeinrisc: switch to generic version of pte allocation xtensa: switch to generic version of pte allocation asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one() asm-generic: pgalloc: provide generic pud_alloc_one() and pud_free_one() asm-generic: pgalloc: provide generic pgd_free() mm: move lib/ioremap.c to mm/ Joerg Roedel <jroedel@suse.de>: mm: move p?d_alloc_track to separate header file Zhen Lei <thunder.leizhen@huawei.com>: mm/mmap: optimize a branch judgment in ksys_mmap_pgoff() Feng Tang <feng.tang@intel.com>: Patch series "make vm_committed_as_batch aware of vm overcommit policy", v6: proc/meminfo: avoid open coded reading of vm_committed_as mm/util.c: make vm_memory_committed() more accurate percpu_counter: add percpu_counter_sync() mm: adjust vm_committed_as_batch according to vm overcommit policy Anshuman Khandual <anshuman.khandual@arm.com>: Patch series "arm64: Enable vmemmap mapping from device memory", v4: mm/sparsemem: enable vmem_altmap support in vmemmap_populate_basepages() mm/sparsemem: enable vmem_altmap support in vmemmap_alloc_block_buf() arm64/mm: enable vmem_altmap support for vmemmap mappings Miaohe Lin <linmiaohe@huawei.com>: mm: mmap: merge vma after call_mmap() if possible Peter Collingbourne <pcc@google.com>: mm: remove unnecessary wrapper function do_mmap_pgoff() Subsystem: mm/mremap Wei Yang <richard.weiyang@linux.alibaba.com>: Patch series "mm/mremap: cleanup move_page_tables() a little", v5: mm/mremap: it is sure to have enough space when extent meets requirement mm/mremap: calculate extent in one place mm/mremap: 
start addresses are properly aligned Subsystem: mm/mincore Ricardo Cañuelo <ricardo.canuelo@collabora.com>: selftests: add mincore() tests Subsystem: mm/sparsemem Wei Yang <richard.weiyang@linux.alibaba.com>: mm/sparse: never partially remove memmap for early section mm/sparse: only sub-section aligned range would be populated Mike Rapoport <rppt@linux.ibm.com>: mm/sparse: cleanup the code surrounding memory_present() Subsystem: mm/vmalloc "Matthew Wilcox (Oracle)" <willy@infradead.org>: vmalloc: convert to XArray "Uladzislau Rezki (Sony)" <urezki@gmail.com>: mm/vmalloc: simplify merge_or_add_vmap_area() mm/vmalloc: simplify augment_tree_propagate_check() mm/vmalloc: switch to "propagate()" callback mm/vmalloc: update the header about KVA rework Mike Rapoport <rppt@linux.ibm.com>: mm: vmalloc: remove redundant assignment in unmap_kernel_range_noflush() "Uladzislau Rezki (Sony)" <urezki@gmail.com>: mm/vmalloc.c: remove BUG() from the find_va_links() Subsystem: mm/kasan Marco Elver <elver@google.com>: kasan: improve and simplify Kconfig.kasan kasan: update required compiler versions in documentation Walter Wu <walter-zh.wu@mediatek.com>: Patch series "kasan: memorize and print call_rcu stack", v8: rcu: kasan: record and print call_rcu() call stack kasan: record and print the free track kasan: add tests for call_rcu stack recording kasan: update documentation for generic kasan Vincenzo Frascino <vincenzo.frascino@arm.com>: kasan: remove kasan_unpoison_stack_above_sp_to() Walter Wu <walter-zh.wu@mediatek.com>: lib/test_kasan.c: fix KASAN unit tests for tag-based KASAN Andrey Konovalov <andreyknvl@google.com>: Patch series "kasan: support stack instrumentation for tag-based mode", v2: kasan: don't tag stacks allocated with pagealloc efi: provide empty efi_enter_virtual_mode implementation kasan, arm64: don't instrument functions that enable kasan kasan: allow enabling stack tagging for tag-based mode kasan: adjust kasan_stack_oob for tag-based mode Subsystem: 
mm/pagealloc Vlastimil Babka <vbabka@suse.cz>: mm, page_alloc: use unlikely() in task_capc() Jaewon Kim <jaewon31.kim@samsung.com>: page_alloc: consider highatomic reserve in watermark fast Charan Teja Reddy <charante@codeaurora.org>: mm, page_alloc: skip ->waternark_boost for atomic order-0 allocations David Hildenbrand <david@redhat.com>: mm: remove vm_total_pages mm/page_alloc: remove nr_free_pagecache_pages() mm/memory_hotplug: document why shuffle_zone() is relevant mm/shuffle: remove dynamic reconfiguration Wei Yang <richard.weiyang@linux.alibaba.com>: mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits mm/page_alloc.c: extract the common part in pfn_to_bitidx() mm/page_alloc.c: simplify pageblock bitmap access mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask() Qian Cai <cai@lca.pw>: mm/page_alloc: silence a KASAN false positive Wei Yang <richard.weiyang@linux.alibaba.com>: mm/page_alloc: fallbacks at most has 3 elements Muchun Song <songmuchun@bytedance.com>: mm/page_alloc.c: skip setting nodemask when we are in interrupt Joonsoo Kim <iamjoonsoo.kim@lge.com>: mm/page_alloc: fix memalloc_nocma_{save/restore} APIs Subsystem: mm/hugetlb "Alexander A. 
Klimov" <grandmaster@al2klimov.de>: mm: thp: replace HTTP links with HTTPS ones Peter Xu <peterx@redhat.com>: mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible Hugh Dickins <hughd@google.com>: khugepaged: collapse_pte_mapped_thp() flush the right range khugepaged: collapse_pte_mapped_thp() protect the pmd lock khugepaged: retract_page_tables() remember to test exit khugepaged: khugepaged_test_exit() check mmget_still_valid() Subsystem: mm/vmscan dylan-meiners <spacct.spacct@gmail.com>: mm/vmscan.c: fix typo Shakeel Butt <shakeelb@google.com>: mm: vmscan: consistent update to pgrefill Documentation/admin-guide/kernel-parameters.txt | 2 Documentation/dev-tools/kasan.rst | 10 Documentation/filesystems/dlmfs.rst | 2 Documentation/filesystems/ocfs2.rst | 2 Documentation/filesystems/tmpfs.rst | 18 Documentation/vm/arch_pgtable_helpers.rst | 258 +++++ Documentation/vm/memory-model.rst | 9 Documentation/vm/slub.rst | 51 - arch/alpha/include/asm/pgalloc.h | 21 arch/alpha/include/asm/tlbflush.h | 1 arch/alpha/kernel/core_irongate.c | 1 arch/alpha/kernel/core_marvel.c | 1 arch/alpha/kernel/core_titan.c | 1 arch/alpha/kernel/machvec_impl.h | 2 arch/alpha/kernel/smp.c | 1 arch/alpha/mm/numa.c | 1 arch/arc/mm/fault.c | 1 arch/arc/mm/init.c | 1 arch/arm/include/asm/pgalloc.h | 12 arch/arm/include/asm/tlb.h | 1 arch/arm/kernel/machine_kexec.c | 1 arch/arm/kernel/smp.c | 1 arch/arm/kernel/suspend.c | 1 arch/arm/mach-omap2/omap-mpuss-lowpower.c | 1 arch/arm/mm/hugetlbpage.c | 1 arch/arm/mm/init.c | 9 arch/arm/mm/mmu.c | 1 arch/arm64/include/asm/pgalloc.h | 39 arch/arm64/kernel/setup.c | 2 arch/arm64/kernel/smp.c | 1 arch/arm64/mm/hugetlbpage.c | 1 arch/arm64/mm/init.c | 6 arch/arm64/mm/ioremap.c | 1 arch/arm64/mm/mmu.c | 63 - arch/csky/include/asm/pgalloc.h | 7 arch/csky/kernel/smp.c | 1 arch/hexagon/include/asm/pgalloc.h | 7 arch/ia64/include/asm/pgalloc.h | 24 arch/ia64/include/asm/tlb.h | 1 arch/ia64/kernel/process.c | 1 arch/ia64/kernel/smp.c | 1 
arch/ia64/kernel/smpboot.c | 1 arch/ia64/mm/contig.c | 1 arch/ia64/mm/discontig.c | 4 arch/ia64/mm/hugetlbpage.c | 1 arch/ia64/mm/tlb.c | 1 arch/m68k/include/asm/mmu_context.h | 2 arch/m68k/include/asm/sun3_pgalloc.h | 7 arch/m68k/kernel/dma.c | 2 arch/m68k/kernel/traps.c | 3 arch/m68k/mm/cache.c | 2 arch/m68k/mm/fault.c | 1 arch/m68k/mm/kmap.c | 2 arch/m68k/mm/mcfmmu.c | 1 arch/m68k/mm/memory.c | 1 arch/m68k/sun3x/dvma.c | 2 arch/microblaze/include/asm/pgalloc.h | 6 arch/microblaze/include/asm/tlbflush.h | 1 arch/microblaze/kernel/process.c | 1 arch/microblaze/kernel/signal.c | 1 arch/microblaze/mm/init.c | 3 arch/mips/include/asm/pgalloc.h | 19 arch/mips/kernel/setup.c | 8 arch/mips/loongson64/numa.c | 1 arch/mips/sgi-ip27/ip27-memory.c | 2 arch/mips/sgi-ip32/ip32-memory.c | 1 arch/nds32/mm/mm-nds32.c | 2 arch/nios2/include/asm/pgalloc.h | 7 arch/openrisc/include/asm/pgalloc.h | 33 arch/openrisc/include/asm/tlbflush.h | 1 arch/openrisc/kernel/or32_ksyms.c | 1 arch/parisc/include/asm/mmu_context.h | 1 arch/parisc/include/asm/pgalloc.h | 12 arch/parisc/kernel/cache.c | 1 arch/parisc/kernel/pci-dma.c | 1 arch/parisc/kernel/process.c | 1 arch/parisc/kernel/signal.c | 1 arch/parisc/kernel/smp.c | 1 arch/parisc/mm/hugetlbpage.c | 1 arch/parisc/mm/init.c | 5 arch/parisc/mm/ioremap.c | 2 arch/powerpc/include/asm/tlb.h | 1 arch/powerpc/mm/book3s64/hash_hugetlbpage.c | 1 arch/powerpc/mm/book3s64/hash_pgtable.c | 1 arch/powerpc/mm/book3s64/hash_tlb.c | 1 arch/powerpc/mm/book3s64/radix_hugetlbpage.c | 1 arch/powerpc/mm/init_32.c | 1 arch/powerpc/mm/init_64.c | 4 arch/powerpc/mm/kasan/8xx.c | 1 arch/powerpc/mm/kasan/book3s_32.c | 1 arch/powerpc/mm/mem.c | 3 arch/powerpc/mm/nohash/40x.c | 1 arch/powerpc/mm/nohash/8xx.c | 1 arch/powerpc/mm/nohash/fsl_booke.c | 1 arch/powerpc/mm/nohash/kaslr_booke.c | 1 arch/powerpc/mm/nohash/tlb.c | 1 arch/powerpc/mm/numa.c | 1 arch/powerpc/mm/pgtable.c | 1 arch/powerpc/mm/pgtable_64.c | 1 arch/powerpc/mm/ptdump/hashpagetable.c | 2 
arch/powerpc/mm/ptdump/ptdump.c | 1 arch/powerpc/platforms/pseries/cmm.c | 1 arch/riscv/include/asm/pgalloc.h | 18 arch/riscv/mm/fault.c | 1 arch/riscv/mm/init.c | 3 arch/s390/crypto/prng.c | 4 arch/s390/include/asm/tlb.h | 1 arch/s390/include/asm/tlbflush.h | 1 arch/s390/kernel/machine_kexec.c | 1 arch/s390/kernel/ptrace.c | 1 arch/s390/kvm/diag.c | 1 arch/s390/kvm/priv.c | 1 arch/s390/kvm/pv.c | 1 arch/s390/mm/cmm.c | 1 arch/s390/mm/init.c | 1 arch/s390/mm/mmap.c | 1 arch/s390/mm/pgtable.c | 1 arch/sh/include/asm/pgalloc.h | 4 arch/sh/kernel/idle.c | 1 arch/sh/kernel/machine_kexec.c | 1 arch/sh/mm/cache-sh3.c | 1 arch/sh/mm/cache-sh7705.c | 1 arch/sh/mm/hugetlbpage.c | 1 arch/sh/mm/init.c | 7 arch/sh/mm/ioremap_fixed.c | 1 arch/sh/mm/numa.c | 3 arch/sh/mm/tlb-sh3.c | 1 arch/sparc/include/asm/ide.h | 1 arch/sparc/include/asm/tlb_64.h | 1 arch/sparc/kernel/leon_smp.c | 1 arch/sparc/kernel/process_32.c | 1 arch/sparc/kernel/signal_32.c | 1 arch/sparc/kernel/smp_32.c | 1 arch/sparc/kernel/smp_64.c | 1 arch/sparc/kernel/sun4m_irq.c | 1 arch/sparc/mm/highmem.c | 1 arch/sparc/mm/init_64.c | 1 arch/sparc/mm/io-unit.c | 1 arch/sparc/mm/iommu.c | 1 arch/sparc/mm/tlb.c | 1 arch/um/include/asm/pgalloc.h | 9 arch/um/include/asm/pgtable-3level.h | 3 arch/um/kernel/mem.c | 17 arch/x86/ia32/ia32_aout.c | 1 arch/x86/include/asm/mmu_context.h | 1 arch/x86/include/asm/pgalloc.h | 42 arch/x86/kernel/alternative.c | 1 arch/x86/kernel/apic/apic.c | 1 arch/x86/kernel/mpparse.c | 1 arch/x86/kernel/traps.c | 1 arch/x86/mm/fault.c | 1 arch/x86/mm/hugetlbpage.c | 1 arch/x86/mm/init_32.c | 2 arch/x86/mm/init_64.c | 12 arch/x86/mm/kaslr.c | 1 arch/x86/mm/pgtable_32.c | 1 arch/x86/mm/pti.c | 1 arch/x86/platform/uv/bios_uv.c | 1 arch/x86/power/hibernate.c | 2 arch/xtensa/include/asm/pgalloc.h | 46 arch/xtensa/kernel/xtensa_ksyms.c | 1 arch/xtensa/mm/cache.c | 1 arch/xtensa/mm/fault.c | 1 crypto/adiantum.c | 2 crypto/ahash.c | 4 crypto/api.c | 2 crypto/asymmetric_keys/verify_pefile.c | 4 
crypto/deflate.c | 2 crypto/drbg.c | 10 crypto/ecc.c | 8 crypto/ecdh.c | 2 crypto/gcm.c | 2 crypto/gf128mul.c | 4 crypto/jitterentropy-kcapi.c | 2 crypto/rng.c | 2 crypto/rsa-pkcs1pad.c | 6 crypto/seqiv.c | 2 crypto/shash.c | 2 crypto/skcipher.c | 2 crypto/testmgr.c | 6 crypto/zstd.c | 2 drivers/base/node.c | 10 drivers/block/xen-blkback/common.h | 1 drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c | 2 drivers/crypto/allwinner/sun8i-ss/sun8i-ss-cipher.c | 2 drivers/crypto/amlogic/amlogic-gxl-cipher.c | 4 drivers/crypto/atmel-ecc.c | 2 drivers/crypto/caam/caampkc.c | 28 drivers/crypto/cavium/cpt/cptvf_main.c | 6 drivers/crypto/cavium/cpt/cptvf_reqmanager.c | 12 drivers/crypto/cavium/nitrox/nitrox_lib.c | 4 drivers/crypto/cavium/zip/zip_crypto.c | 6 drivers/crypto/ccp/ccp-crypto-rsa.c | 6 drivers/crypto/ccree/cc_aead.c | 4 drivers/crypto/ccree/cc_buffer_mgr.c | 4 drivers/crypto/ccree/cc_cipher.c | 6 drivers/crypto/ccree/cc_hash.c | 8 drivers/crypto/ccree/cc_request_mgr.c | 2 drivers/crypto/marvell/cesa/hash.c | 2 drivers/crypto/marvell/octeontx/otx_cptvf_main.c | 6 drivers/crypto/marvell/octeontx/otx_cptvf_reqmgr.h | 2 drivers/crypto/nx/nx.c | 4 drivers/crypto/virtio/virtio_crypto_algs.c | 12 drivers/crypto/virtio/virtio_crypto_core.c | 2 drivers/iommu/ipmmu-vmsa.c | 1 drivers/md/dm-crypt.c | 32 drivers/md/dm-integrity.c | 6 drivers/misc/ibmvmc.c | 6 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 2 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 6 drivers/net/ppp/ppp_mppe.c | 6 drivers/net/wireguard/noise.c | 4 drivers/net/wireguard/peer.c | 2 drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 2 drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c | 6 drivers/net/wireless/intel/iwlwifi/pcie/tx.c | 6 drivers/net/wireless/intersil/orinoco/wext.c | 4 drivers/s390/crypto/ap_bus.h | 4 drivers/staging/ks7010/ks_hostif.c | 2 drivers/staging/rtl8723bs/core/rtw_security.c | 2 drivers/staging/wlan-ng/p80211netdev.c | 2 drivers/target/iscsi/iscsi_target_auth.c | 2 
drivers/xen/balloon.c | 1 drivers/xen/privcmd.c | 1 fs/Kconfig | 21 fs/aio.c | 6 fs/binfmt_elf_fdpic.c | 1 fs/cifs/cifsencrypt.c | 2 fs/cifs/connect.c | 10 fs/cifs/dfs_cache.c | 2 fs/cifs/misc.c | 8 fs/crypto/inline_crypt.c | 5 fs/crypto/keyring.c | 6 fs/crypto/keysetup_v1.c | 4 fs/ecryptfs/keystore.c | 4 fs/ecryptfs/messaging.c | 2 fs/hugetlbfs/inode.c | 2 fs/ntfs/dir.c | 2 fs/ntfs/inode.c | 27 fs/ntfs/inode.h | 4 fs/ntfs/mft.c | 4 fs/ocfs2/Kconfig | 6 fs/ocfs2/acl.c | 2 fs/ocfs2/blockcheck.c | 2 fs/ocfs2/dlmglue.c | 8 fs/ocfs2/ocfs2.h | 4 fs/ocfs2/suballoc.c | 4 fs/ocfs2/suballoc.h | 2 fs/ocfs2/super.c | 4 fs/proc/meminfo.c | 10 include/asm-generic/pgalloc.h | 80 + include/asm-generic/tlb.h | 1 include/crypto/aead.h | 2 include/crypto/akcipher.h | 2 include/crypto/gf128mul.h | 2 include/crypto/hash.h | 2 include/crypto/internal/acompress.h | 2 include/crypto/kpp.h | 2 include/crypto/skcipher.h | 2 include/linux/efi.h | 4 include/linux/fs.h | 17 include/linux/huge_mm.h | 2 include/linux/kasan.h | 4 include/linux/memcontrol.h | 209 +++- include/linux/mm.h | 86 - include/linux/mm_types.h | 5 include/linux/mman.h | 4 include/linux/mmu_notifier.h | 13 include/linux/mmzone.h | 54 - include/linux/pageblock-flags.h | 30 include/linux/percpu_counter.h | 4 include/linux/sched/mm.h | 8 include/linux/shmem_fs.h | 3 include/linux/slab.h | 11 include/linux/slab_def.h | 9 include/linux/slub_def.h | 31 include/linux/swap.h | 2 include/linux/vmstat.h | 14 init/Kconfig | 9 init/main.c | 2 ipc/shm.c | 2 kernel/fork.c | 54 - kernel/kthread.c | 8 kernel/power/snapshot.c | 2 kernel/rcu/tree.c | 2 kernel/scs.c | 2 kernel/sysctl.c | 2 lib/Kconfig.kasan | 39 lib/Makefile | 1 lib/ioremap.c | 287 ----- lib/mpi/mpiutil.c | 6 lib/percpu_counter.c | 19 lib/test_kasan.c | 87 + mm/Kconfig | 6 mm/Makefile | 2 mm/debug.c | 103 +- mm/debug_vm_pgtable.c | 666 +++++++++++++ mm/filemap.c | 9 mm/gup.c | 3 mm/huge_memory.c | 14 mm/hugetlb.c | 25 mm/ioremap.c | 289 +++++ mm/kasan/common.c | 41 
mm/kasan/generic.c | 43 mm/kasan/generic_report.c | 1 mm/kasan/kasan.h | 25 mm/kasan/quarantine.c | 1 mm/kasan/report.c | 54 - mm/kasan/tags.c | 37 mm/khugepaged.c | 75 - mm/memcontrol.c | 832 ++++++++++------- mm/memory.c | 15 mm/memory_hotplug.c | 11 mm/migrate.c | 6 mm/mm_init.c | 20 mm/mmap.c | 45 mm/mremap.c | 19 mm/nommu.c | 6 mm/oom_kill.c | 2 mm/page-writeback.c | 6 mm/page_alloc.c | 226 ++-- mm/page_counter.c | 6 mm/page_io.c | 2 mm/pgalloc-track.h | 51 + mm/shmem.c | 133 ++ mm/shuffle.c | 46 mm/shuffle.h | 17 mm/slab.c | 129 +- mm/slab.h | 755 ++++++--------- mm/slab_common.c | 829 ++-------------- mm/slob.c | 12 mm/slub.c | 680 ++++--------- mm/sparse-vmemmap.c | 62 - mm/sparse.c | 31 mm/swap_slots.c | 45 mm/swap_state.c | 2 mm/util.c | 52 + mm/vmalloc.c | 176 +-- mm/vmscan.c | 39 mm/vmstat.c | 38 mm/workingset.c | 6 net/atm/mpoa_caches.c | 4 net/bluetooth/ecdh_helper.c | 6 net/bluetooth/smp.c | 24 net/core/sock.c | 2 net/ipv4/tcp_fastopen.c | 2 net/mac80211/aead_api.c | 4 net/mac80211/aes_gmac.c | 2 net/mac80211/key.c | 2 net/mac802154/llsec.c | 20 net/sctp/auth.c | 2 net/sunrpc/auth_gss/gss_krb5_crypto.c | 4 net/sunrpc/auth_gss/gss_krb5_keys.c | 6 net/sunrpc/auth_gss/gss_krb5_mech.c | 2 net/tipc/crypto.c | 10 net/wireless/core.c | 2 net/wireless/ibss.c | 4 net/wireless/lib80211_crypt_tkip.c | 2 net/wireless/lib80211_crypt_wep.c | 2 net/wireless/nl80211.c | 24 net/wireless/sme.c | 6 net/wireless/util.c | 2 net/wireless/wext-sme.c | 2 scripts/Makefile.kasan | 3 scripts/bloat-o-meter | 2 scripts/coccinelle/free/devm_free.cocci | 4 scripts/coccinelle/free/ifnullfree.cocci | 4 scripts/coccinelle/free/kfree.cocci | 6 scripts/coccinelle/free/kfreeaddr.cocci | 2 scripts/const_structs.checkpatch | 1 scripts/decode_stacktrace.sh | 85 + scripts/spelling.txt | 19 scripts/tags.sh | 18 security/apparmor/domain.c | 4 security/apparmor/include/file.h | 2 security/apparmor/policy.c | 24 security/apparmor/policy_ns.c | 6 security/apparmor/policy_unpack.c | 14 
security/keys/big_key.c | 6 security/keys/dh.c | 14 security/keys/encrypted-keys/encrypted.c | 14 security/keys/trusted-keys/trusted_tpm1.c | 34 security/keys/user_defined.c | 6 tools/cgroup/memcg_slabinfo.py | 226 ++++ tools/include/linux/jhash.h | 2 tools/lib/rbtree.c | 2 tools/lib/traceevent/event-parse.h | 2 tools/testing/ktest/examples/README | 2 tools/testing/ktest/examples/crosstests.conf | 2 tools/testing/selftests/Makefile | 1 tools/testing/selftests/cgroup/.gitignore | 1 tools/testing/selftests/cgroup/Makefile | 2 tools/testing/selftests/cgroup/cgroup_util.c | 2 tools/testing/selftests/cgroup/test_kmem.c | 382 +++++++ tools/testing/selftests/mincore/.gitignore | 2 tools/testing/selftests/mincore/Makefile | 6 tools/testing/selftests/mincore/mincore_selftest.c | 361 +++++++ 397 files changed, 5547 insertions(+), 4072 deletions(-)
From: Yang Shi <yang.shi@linux.alibaba.com> Subject: mm/memory.c: avoid access flag update TLB flush for retried page fault Recently we found a regression when running the will_it_scale/page_fault3 test on ARM64: over 70% down for the multi-process cases and over 20% down for the multi-thread cases. It turns out the regression is caused by commit 89b15332af7c0312a41e50846819ca6613b58b4c ("mm: drop mmap_sem before calling balance_dirty_pages() in write fault"). The test mmaps a memory-sized file and then writes to the mapping. This makes all memory dirty and triggers dirty page throttling, at which point that upstream commit releases mmap_sem and retries the page fault. The retried page fault sees the correct PTEs installed by the first try, then updates the dirty bit, clears the read-only bit, and flushes TLBs on ARM. The regression is caused by the excessive TLB flushing. It is fine on x86, since x86 doesn't clear the read-only bit, so there is no need to flush the TLB in this case. A page fault is retried because of: 1. Waiting for page readahead 2. Waiting for the page to be swapped in 3. Waiting for dirty page throttling The first two cases don't have PTEs set up at all, so the retried page fault installs the PTEs and never reaches this code. But case #3 usually has PTEs installed, so the retried page fault reaches the dirty bit and read-only bit update. It does not seem necessary to modify those bits again for #3, since they should already have been set by the first page fault try. Of course a parallel page fault may set up PTEs, but we only need to care about write faults. If the parallel page fault set up a writable and dirty PTE, then the retried fault doesn't need to do anything extra. If the parallel page fault set up a clean read-only PTE, the retried fault should just call do_wp_page() and return, as the code snippet below shows: if (vmf->flags & FAULT_FLAG_WRITE) { if (!pte_write(entry)) return do_wp_page(vmf); } With this fix the test results return to normal. 
[yang.shi@linux.alibaba.com: incorporate comment from Will Deacon, update commit log per discussion] Link: http://lkml.kernel.org/r/1594848990-55657-1-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1594148072-91273-1-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reported-by: Xu Yu <xuyu@linux.alibaba.com> Debugged-by: Xu Yu <xuyu@linux.alibaba.com> Tested-by: Xu Yu <xuyu@linux.alibaba.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/memory.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) --- a/mm/memory.c~mm-avoid-access-flag-update-tlb-flush-for-retried-page-fault +++ a/mm/memory.c @@ -4241,8 +4241,14 @@ static vm_fault_t handle_pte_fault(struc if (vmf->flags & FAULT_FLAG_WRITE) { if (!pte_write(entry)) return do_wp_page(vmf); - entry = pte_mkdirty(entry); } + + if (vmf->flags & FAULT_FLAG_TRIED) + goto unlock; + + if (vmf->flags & FAULT_FLAG_WRITE) + entry = pte_mkdirty(entry); + entry = pte_mkyoung(entry); if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry, vmf->flags & FAULT_FLAG_WRITE)) { _
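The decision order that the hunk above introduces in handle_pte_fault() can be sketched as a small userspace model. This is only an illustration under stated assumptions: the `model_*` names, the `MODEL_FLAG_*` values and the three-field PTE are hypothetical stand-ins, not kernel definitions, and the real code goes on to call ptep_set_access_flags() rather than returning an enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for FAULT_FLAG_WRITE / FAULT_FLAG_TRIED. */
#define MODEL_FLAG_WRITE 0x1u
#define MODEL_FLAG_TRIED 0x2u

/* Simplified PTE holding only the bits the patch is concerned with. */
struct model_pte {
	bool write;
	bool dirty;
	bool young;
};

enum model_action {
	MODEL_DO_WP_PAGE,   /* write fault on a read-only PTE */
	MODEL_SKIP_UPDATE,  /* retried fault: leave the PTE alone */
	MODEL_UPDATE_FLAGS, /* first attempt: set dirty/young */
};

/* Mirrors the order of checks after the patch is applied. */
enum model_action model_pte_fault_tail(struct model_pte *pte,
				       unsigned int flags)
{
	assert(pte);

	/* A write fault on a read-only PTE is handled first, retried or not. */
	if ((flags & MODEL_FLAG_WRITE) && !pte->write)
		return MODEL_DO_WP_PAGE;

	/* A retried fault skips the access flag update (and its TLB flush). */
	if (flags & MODEL_FLAG_TRIED)
		return MODEL_SKIP_UPDATE;

	if (flags & MODEL_FLAG_WRITE)
		pte->dirty = true;
	pte->young = true;
	return MODEL_UPDATE_FLAGS;
}
```

In this model, a retried write fault on an already writable PTE returns MODEL_SKIP_UPDATE without touching the entry, which corresponds to the `goto unlock` path in the patch.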
From: Ralph Campbell <rcampbell@nvidia.com> Subject: mm/migrate: fix migrate_pgmap_owner w/o CONFIG_MMU_NOTIFIER On x86_64, when CONFIG_MMU_NOTIFIER is not set/enabled, there is a compiler error: ../mm/migrate.c: In function 'migrate_vma_collect': ../mm/migrate.c:2481:7: error: 'struct mmu_notifier_range' has no member named 'migrate_pgmap_owner' range.migrate_pgmap_owner = migrate->pgmap_owner; ^ Link: http://lkml.kernel.org/r/20200806193353.7124-1-rcampbell@nvidia.com Fixes: 998427b3ad2c ("mm/notifier: add migration invalidation type") Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Jason Gunthorpe" <jgg@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/mmu_notifier.h | 13 +++++++++++++ mm/migrate.c | 6 +++--- 2 files changed, 16 insertions(+), 3 deletions(-) --- a/include/linux/mmu_notifier.h~mm-migrate-fix-migrate_pgmap_owner-w-o-config_mmu_notifier +++ a/include/linux/mmu_notifier.h @@ -521,6 +521,16 @@ static inline void mmu_notifier_range_in range->flags = flags; } +static inline void mmu_notifier_range_init_migrate( + struct mmu_notifier_range *range, unsigned int flags, + struct vm_area_struct *vma, struct mm_struct *mm, + unsigned long start, unsigned long end, void *pgmap) +{ + mmu_notifier_range_init(range, MMU_NOTIFY_MIGRATE, flags, vma, mm, + start, end); + range->migrate_pgmap_owner = pgmap; +} + #define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ ({ \ int __young; \ @@ -645,6 +655,9 @@ static inline void _mmu_notifier_range_i #define mmu_notifier_range_init(range,event,flags,vma,mm,start,end) \ _mmu_notifier_range_init(range, start, end) +#define mmu_notifier_range_init_migrate(range, flags, vma, mm, start, end, \ + pgmap) \ + 
_mmu_notifier_range_init(range, start, end) static inline bool mmu_notifier_range_blockable(const struct mmu_notifier_range *range) --- a/mm/migrate.c~mm-migrate-fix-migrate_pgmap_owner-w-o-config_mmu_notifier +++ a/mm/migrate.c @@ -2386,9 +2386,9 @@ static void migrate_vma_collect(struct m * that the registered device driver can skip invalidating device * private page mappings that won't be migrated. */ - mmu_notifier_range_init(&range, MMU_NOTIFY_MIGRATE, 0, migrate->vma, - migrate->vma->vm_mm, migrate->start, migrate->end); - range.migrate_pgmap_owner = migrate->pgmap_owner; + mmu_notifier_range_init_migrate(&range, 0, migrate->vma, + migrate->vma->vm_mm, migrate->start, migrate->end, + migrate->pgmap_owner); mmu_notifier_invalidate_range_start(&range); walk_page_range(migrate->vma->vm_mm, migrate->start, migrate->end, _
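The build fix above relies on a common pattern: a field that exists only under a config option is never touched directly by callers, but only through a helper that has a real implementation when the option is on and a stub macro when it is off. A minimal userspace sketch of that pattern follows; `MODEL_CONFIG_NOTIFIER`, `struct model_range` and the `model_range_init_*` helpers are hypothetical names chosen for illustration and are not part of the kernel API.

```c
#include <assert.h>

struct model_range {
	unsigned long start;
	unsigned long end;
#ifdef MODEL_CONFIG_NOTIFIER
	void *owner; /* only present when the option is enabled */
#endif
};

static inline void model_range_init_basic(struct model_range *r,
					   unsigned long start,
					   unsigned long end)
{
	assert(r && start <= end);
	r->start = start;
	r->end = end;
}

#ifdef MODEL_CONFIG_NOTIFIER
/* Real helper: the only place that touches the optional field. */
static inline void model_range_init_migrate(struct model_range *r,
					     unsigned long start,
					     unsigned long end, void *owner)
{
	model_range_init_basic(r, start, end);
	r->owner = owner;
}
#else
/* Stub: the owner argument is dropped and never evaluated. */
#define model_range_init_migrate(r, start, end, owner) \
	model_range_init_basic(r, start, end)
#endif
```

With this layout a caller compiles unchanged whether or not MODEL_CONFIG_NOTIFIER is defined, which is the property the patch restores for migrate_vma_collect().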
From: David Hildenbrand <david@redhat.com> Subject: mm/shuffle: don't move pages between zones and don't read garbage memmaps Especially with memory hotplug, we can have offline sections (with a garbage memmap) and overlapping zones. We have to make sure to only touch initialized memmaps (online sections managed by the buddy) and that the zone matches, to not move pages between zones. To test if this can actually happen, I added a simple BUG_ON(page_zone(page_i) != page_zone(page_j)); right before the swap. When hotplugging a 256M DIMM to a 4G x86-64 VM and onlining the first memory block "online_movable" and the second memory block "online_kernel", it will trigger the BUG, as both zones (NORMAL and MOVABLE) overlap. This might result in all kinds of weird situations (e.g., double allocations, list corruptions, unmovable allocations ending up in the movable zone). Link: http://lkml.kernel.org/r/20200624094741.9918-2-david@redhat.com Fixes: e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> [5.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/shuffle.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) --- a/mm/shuffle.c~mm-shuffle-dont-move-pages-between-zones-and-dont-read-garbage-memmaps +++ a/mm/shuffle.c @@ -58,25 +58,25 @@ module_param_call(shuffle, shuffle_store * For two pages to be swapped in the shuffle, they must be free (on a * 'free_area' lru), have the same order, and have 
the same migratetype. */ -static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order) +static struct page * __meminit shuffle_valid_page(struct zone *zone, + unsigned long pfn, int order) { - struct page *page; + struct page *page = pfn_to_online_page(pfn); /* * Given we're dealing with randomly selected pfns in a zone we * need to ask questions like... */ - /* ...is the pfn even in the memmap? */ - if (!pfn_valid_within(pfn)) + /* ... is the page managed by the buddy? */ + if (!page) return NULL; - /* ...is the pfn in a present section or a hole? */ - if (!pfn_in_present_section(pfn)) + /* ... is the page assigned to the same zone? */ + if (page_zone(page) != zone) return NULL; /* ...is the page free and currently on a free_area list? */ - page = pfn_to_page(pfn); if (!PageBuddy(page)) return NULL; @@ -123,7 +123,7 @@ void __meminit __shuffle_zone(struct zon * page_j randomly selected in the span @zone_start_pfn to * @spanned_pages. */ - page_i = shuffle_valid_page(i, order); + page_i = shuffle_valid_page(z, i, order); if (!page_i) continue; @@ -137,7 +137,7 @@ void __meminit __shuffle_zone(struct zon j = z->zone_start_pfn + ALIGN_DOWN(get_random_long() % z->spanned_pages, order_pages); - page_j = shuffle_valid_page(j, order); + page_j = shuffle_valid_page(z, j, order); if (page_j && page_j != page_i) break; } _
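The candidate checks that the patched shuffle_valid_page() performs can be modelled in userspace roughly as below. This is a sketch under assumptions: the array-based memmap, `struct model_page` and the `model_*` functions are hypothetical, and the order comparison is included because the comment above the function requires swapped pages to have the same order. The lookup helper stands in for pfn_to_online_page(), which yields no page for offline sections.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page {
	bool online;  /* section is online, so the memmap entry is initialized */
	int zone_id;  /* zone the page is assigned to */
	bool buddy;   /* page is free and on a free list */
	int order;
};

/* Stand-in for pfn_to_online_page(): NULL for out-of-range or offline pfns. */
static const struct model_page *
model_pfn_to_online_page(const struct model_page *memmap, size_t nr, size_t pfn)
{
	if (pfn >= nr || !memmap[pfn].online)
		return NULL;
	return &memmap[pfn];
}

/* Returns the page if it may take part in a shuffle of zone_id, else NULL. */
const struct model_page *
model_shuffle_valid_page(const struct model_page *memmap, size_t nr,
			 int zone_id, size_t pfn, int order)
{
	const struct model_page *page;

	assert(memmap);
	page = model_pfn_to_online_page(memmap, nr, pfn);

	if (!page)                    /* offline section or hole */
		return NULL;
	if (page->zone_id != zone_id) /* overlapping zone: do not cross zones */
		return NULL;
	if (!page->buddy)             /* not free */
		return NULL;
	if (page->order != order)     /* different free list */
		return NULL;
	return page;
}
```

The zone comparison is what the BUG_ON() experiment in the changelog would have tripped over: a pfn inside the zone's span can still belong to a different, overlapping zone.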
From: Peter Zijlstra <peterz@infradead.org>
Subject: mm: fix kthread_use_mm() vs TLB invalidate

For SMP systems using IPI based TLB invalidation, looking at current->active_mm is entirely reasonable. This then presents the following race condition:

  CPU0                    CPU1

  flush_tlb_mm(mm)        use_mm(mm)
    <send-IPI>
                            tsk->active_mm = mm;
                          <IPI>
                            if (tsk->active_mm == mm)
                               // flush TLBs
                          </IPI>
                            switch_mm(old_mm,mm,tsk);

Where it is possible the IPI flushed the TLBs for @old_mm, not @mm, because the IPI lands before we actually switched.

Avoid this by disabling IRQs across changing ->active_mm and switch_mm().

Of the (SMP) architectures that have IPI based TLB invalidate:

  Alpha    - checks active_mm
  ARC      - ASID specific
  IA64     - checks active_mm
  MIPS     - ASID specific flush
  OpenRISC - shoots down world
  PARISC   - shoots down world
  SH       - ASID specific
  SPARC    - ASID specific
  x86      - N/A
  xtensa   - checks active_mm

So at the very least Alpha, IA64 and Xtensa are suspect.

On top of this, for scheduler consistency we need at least preemption disabled across changing tsk->mm and doing switch_mm(), which is currently provided by task_lock(), but that's not sufficient for PREEMPT_RT.
[akpm@linux-foundation.org: add comment] Link: http://lkml.kernel.org/r/20200721154106.GE10769@hirez.programming.kicks-ass.net Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reported-by: Andy Lutomirski <luto@amacapital.net> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kees Cook <keescook@chromium.org> Cc: Jann Horn <jannh@google.com> Cc: Will Deacon <will@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- kernel/kthread.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) --- a/kernel/kthread.c~mm-fix-kthread_use_mm-vs-tlb-invalidate +++ a/kernel/kthread.c @@ -1241,13 +1241,16 @@ void kthread_use_mm(struct mm_struct *mm WARN_ON_ONCE(tsk->mm); task_lock(tsk); + /* Hold off tlb flush IPIs while switching mm's */ + local_irq_disable(); active_mm = tsk->active_mm; if (active_mm != mm) { mmgrab(mm); tsk->active_mm = mm; } tsk->mm = mm; - switch_mm(active_mm, mm, tsk); + switch_mm_irqs_off(active_mm, mm, tsk); + local_irq_enable(); task_unlock(tsk); #ifdef finish_arch_post_lock_switch finish_arch_post_lock_switch(); @@ -1276,9 +1279,11 @@ void kthread_unuse_mm(struct mm_struct * task_lock(tsk); sync_mm_rss(mm); + local_irq_disable(); tsk->mm = NULL; /* active_mm is still 'mm' */ enter_lazy_tlb(mm, tsk); + local_irq_enable(); task_unlock(tsk); } EXPORT_SYMBOL_GPL(kthread_unuse_mm); _
From: Ilias Stamatis <stamatis.iliass@gmail.com> Subject: kthread: remove incorrect comment in kthread_create_on_cpu() Originally kthread_create_on_cpu() parked and woke up the new thread. However, since commit a65d40961dc7 ("kthread/smpboot: do not park in kthread_create_on_cpu()") this is no longer the case. This patch removes the comment that has been left behind and is now incorrect / stale. Link: http://lkml.kernel.org/r/20200611135920.240551-1-stamatis.iliass@gmail.com Fixes: a65d40961dc7 ("kthread/smpboot: do not park in kthread_create_on_cpu()") Signed-off-by: Ilias Stamatis <stamatis.iliass@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- kernel/kthread.c | 1 - 1 file changed, 1 deletion(-) --- a/kernel/kthread.c~kthread-remove-incorrect-comment-in-kthread_create_on_cpu +++ a/kernel/kthread.c @@ -480,7 +480,6 @@ EXPORT_SYMBOL(kthread_bind); * to "name.*%u". Code fills in cpu number. * * Description: This helper function creates and names a kernel thread - * The thread will be woken and put into park mode. */ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data), void *data, unsigned int cpu, _
From: "Alexander A. Klimov" <grandmaster@al2klimov.de> Subject: tools/: replace HTTP links with HTTPS ones Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Link: http://lkml.kernel.org/r/20200726120752.16768-1-grandmaster@al2klimov.de Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- tools/include/linux/jhash.h | 2 +- tools/lib/rbtree.c | 2 +- tools/lib/traceevent/event-parse.h | 2 +- tools/testing/ktest/examples/README | 2 +- tools/testing/ktest/examples/crosstests.conf | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) --- a/tools/include/linux/jhash.h~tools-replace-http-links-with-https-ones +++ a/tools/include/linux/jhash.h @@ -5,7 +5,7 @@ * * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net) * - * http://burtleburtle.net/bob/hash/ + * https://burtleburtle.net/bob/hash/ * * These are the credits from Bob's sources: * --- a/tools/lib/rbtree.c~tools-replace-http-links-with-https-ones +++ a/tools/lib/rbtree.c @@ -13,7 +13,7 @@ #include <linux/export.h> /* - * red-black trees properties: http://en.wikipedia.org/wiki/Rbtree + * red-black trees properties: https://en.wikipedia.org/wiki/Rbtree * * 1) A node is either red or black * 2) The root is black --- a/tools/lib/traceevent/event-parse.h~tools-replace-http-links-with-https-ones +++ a/tools/lib/traceevent/event-parse.h @@ -379,7 +379,7 @@ enum tep_errno { * errno since SUS requires the errno has distinct positive values. * See 'Issue 6' in the link below. 
* - * http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html + * https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html */ __TEP_ERRNO__START = -100000, --- a/tools/testing/ktest/examples/crosstests.conf~tools-replace-http-links-with-https-ones +++ a/tools/testing/ktest/examples/crosstests.conf @@ -3,7 +3,7 @@ # # In this config, it is expected that the tool chains from: # -# http://kernel.org/pub/tools/crosstool/files/bin/x86_64/ +# https://kernel.org/pub/tools/crosstool/files/bin/x86_64/ # # running on a x86_64 system have been downloaded and installed into: # --- a/tools/testing/ktest/examples/README~tools-replace-http-links-with-https-ones +++ a/tools/testing/ktest/examples/README @@ -11,7 +11,7 @@ crosstests.conf - this config shows an e lots of different architectures. It only does build tests, but makes it easy to compile test different archs. You can download the arch cross compilers from: - http://kernel.org/pub/tools/crosstool/files/bin/x86_64/ + https://kernel.org/pub/tools/crosstool/files/bin/x86_64/ test.conf - A generic example of a config. This is based on an actual config used to perform real testing. _
From: Gaurav Singh <gaurav1086@gmail.com>
Subject: tools/testing/selftests/cgroup/cgroup_util.c: cg_read_strcmp: fix null pointer dereference

Haven't reproduced this issue. This PR does a minor code cleanup.

Link: http://lkml.kernel.org/r/20200726013808.22242-1-gaurav1086@gmail.com
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 tools/testing/selftests/cgroup/cgroup_util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/tools/testing/selftests/cgroup/cgroup_util.c~cg_read_strcmp-fix-null-pointer-dereference
+++ a/tools/testing/selftests/cgroup/cgroup_util.c
@@ -106,7 +106,7 @@ int cg_read_strcmp(const char *cgroup, c
 
 	/* Handle the case of comparing against empty string */
 	if (!expected)
-		size = 32;
+		return -1;
 	else
 		size = strlen(expected) + 1;
 
_
From: Jialu Xu <xujialu@vimux.org> Subject: scripts/tags.sh: collect compiled source precisely Parse compiled source from *.cmd but don't 'find' too many files that are not related to compilation. [xujialu@vimux.org: don't expand symlinks by add option -s for realpath] Link: http://lkml.kernel.org/r/5efc5bfb.1c69fb81.41bf5.7131SMTPIN_ADDED_MISSING@mx.google.com Link: http://lkml.kernel.org/r/5ee5d8e3.1c69fb81.9b804.47b2SMTPIN_ADDED_MISSING@mx.google.com Signed-off-by: Jialu Xu <xujialu@vimux.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- scripts/tags.sh | 18 ++++-------------- 1 file changed, 4 insertions(+), 14 deletions(-) --- a/scripts/tags.sh~scripts-tagssh-collect-compiled-source-precisely +++ a/scripts/tags.sh @@ -91,20 +91,10 @@ all_sources() all_compiled_sources() { - for i in $(all_sources); do - case "$i" in - *.[cS]) - j=${i/\.[cS]/\.o} - j="${j#$tree}" - if [ -e $j ]; then - echo $i - fi - ;; - *) - echo $i - ;; - esac - done + realpath -es $([ -z "$KBUILD_ABS_SRCTREE" ] && echo --relative-to=.) \ + include/generated/autoconf.h $(find -name "*.cmd" -exec \ + grep -Poh '(?(?=^source_.* \K).*|(?=^ \K\S).*(?= \\))' {} \+ | + awk '!a[$0]++') | sort -u } all_target_sources() _
From: Nikolay Borisov <nborisov@suse.com> Subject: scripts/bloat-o-meter: Support comparing library archives Library archives (.a) usually contain multiple object files so their output of nm --size-sort contains lines like: <omitted for brevity> 00000000000003a8 t run_test extent-map-tests.o: <omitted for brevity> bloat-o-meter currently doesn't handle them which results in errors when calling .split() on them. Fix this by simply ignoring them. This enables diffing subsystems which generate built-in.a files. Link: http://lkml.kernel.org/r/20200603103513.3712-1-nborisov@suse.com Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- scripts/bloat-o-meter | 2 ++ 1 file changed, 2 insertions(+) --- a/scripts/bloat-o-meter~bloat-o-meter-support-comparing-library-archives +++ a/scripts/bloat-o-meter @@ -26,6 +26,8 @@ def getsizes(file, format): sym = {} with os.popen("nm --size-sort " + file) as f: for line in f: + if line.startswith("\n") or ":" in line: + continue size, type, name = line.split() if type in format: # strip generated symbols _
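Note (not part of the patch): a minimal Python sketch of the filter added above, run on a hard-coded sample rather than on real `nm --size-sort` output. The sample text and the helper name `parse_sizes` are assumptions made for this illustration, and the symbol-type filtering and generated-symbol stripping done by the real script are left out.

```python
# Hypothetical sample of what "nm --size-sort" prints for a .a archive:
# a blank line and an "<member>.o:" header precede each member's symbols.
SAMPLE_NM_OUTPUT = """\

extent-map-tests.o:
00000000000003a8 t run_test
0000000000000010 T setup_test

free-space-tests.o:
0000000000000020 t helper
"""


def parse_sizes(text):
    """Return {symbol name: size in bytes}, skipping archive member headers."""
    sizes = {}
    for line in text.splitlines(keepends=True):
        # Same test as the patch: blank lines and "member.o:" headers
        # cannot be split into (size, type, name), so ignore them.
        if line.startswith("\n") or ":" in line:
            continue
        size, sym_type, name = line.split()
        sizes[name] = sizes.get(name, 0) + int(size, 16)
    return sizes


sizes = parse_sizes(SAMPLE_NM_OUTPUT)
```

Without the two-line check, the first blank line would make the three-way unpacking of `line.split()` raise a ValueError, which matches the failure described in the changelog.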
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Subject: scripts/decode_stacktrace.sh: skip missing symbols

For now the script turns missing symbols into '0' and produces a bogus decode. Skip them instead. Also simplify parsing of the 'nm' output.

Before:

$ echo 'xxx+0x0/0x0' | ./scripts/decode_stacktrace.sh vmlinux ""
xxx (home/khlebnikov/src/linux/./arch/x86/include/asm/processor.h:398)

After:

$ echo 'xxx+0x0/0x0' | ./scripts/decode_stacktrace.sh vmlinux ""
xxx+0x0/0x0

Link: http://lkml.kernel.org/r/159282922499.248444.4883465570858385250.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/decode_stacktrace.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/scripts/decode_stacktrace.sh~scripts-decode_stacktrace-skip-missing-symbols
+++ a/scripts/decode_stacktrace.sh
@@ -56,7 +56,11 @@ parse_symbol() {
 	if [[ "${cache[$module,$name]+isset}" == "isset" ]]; then
 		local base_addr=${cache[$module,$name]}
 	else
-		local base_addr=$(nm "$objfile" | grep -i ' t ' | awk "/ $name\$/ {print \$1}" | head -n1)
+		local base_addr=$(nm "$objfile" | awk '$3 == "'$name'" && ($2 == "t" || $2 == "T") {print $1; exit}')
+		if [[ $base_addr == "" ]] ; then
+			# address not found
+			return
+		fi
 		cache[$module,$name]="$base_addr"
 	fi
 	# Let's start doing the math to get the exact address into the
_
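Note (not part of the patch): the new awk filter takes the first `nm` line whose third field equals the symbol name and whose type is `t` or `T`, and the caller returns early when nothing matches. A minimal Python sketch of that lookup follows; the sample addresses and the helper name `find_base_addr` are hypothetical.

```python
# Hypothetical "nm vmlinux" excerpt: address, type, name.
SAMPLE_NM_OUTPUT = """\
ffffffff82000000 D shared_name
ffffffff812a4b10 T vfs_open
ffffffff812a4c00 t shared_name
                 U undefined_symbol
"""


def find_base_addr(nm_output, name):
    """Return the address of the first text symbol called `name`, else None."""
    for line in nm_output.splitlines():
        fields = line.split()
        # Mirrors: $3 == name && ($2 == "t" || $2 == "T") {print $1; exit}
        if len(fields) == 3 and fields[2] == name and fields[1] in ("t", "T"):
            return fields[0]
    return None  # caller should leave the stack line undecoded


found = find_base_addr(SAMPLE_NM_OUTPUT, "vfs_open")
missing = find_base_addr(SAMPLE_NM_OUTPUT, "xxx")
```

Returning None for an unknown symbol corresponds to the early `return` in the shell code, so the input line is printed unchanged instead of being decoded against address 0.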
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Subject: scripts/decode_stacktrace.sh: guess basepath if not specified Guess path to kernel sources using known location of symbol "kernel_init". Make basepath argument optional. Before: $ echo 'vfs_open+0x0/0x0' | ./scripts/decode_stacktrace.sh vmlinux "" vfs_open (home/khlebnikov/src/linux/fs/open.c:912) After: $ echo 'vfs_open+0x0/0x0' | ./scripts/decode_stacktrace.sh vmlinux vfs_open (fs/open.c:912) Link: http://lkml.kernel.org/r/159282922803.248444.2379229451667913634.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- scripts/decode_stacktrace.sh | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) --- a/scripts/decode_stacktrace.sh~scripts-decode_stacktrace-guess-basepath-if-not-specified +++ a/scripts/decode_stacktrace.sh @@ -3,14 +3,14 @@ # (c) 2014, Sasha Levin <sasha.levin@oracle.com> #set -x -if [[ $# < 2 ]]; then +if [[ $# < 1 ]]; then echo "Usage:" - echo " $0 [vmlinux] [base path] [modules path]" + echo " $0 <vmlinux> [base path] [modules path]" exit 1 fi vmlinux=$1 -basepath=$2 +basepath=${2-auto} modpath=$3 declare -A cache declare -A modcache @@ -152,6 +152,14 @@ handle_line() { echo "${words[@]}" "$symbol $module" } +if [[ $basepath == "auto" ]] ; then + module="" + symbol="kernel_init+0x0/0x0" + parse_symbol + basepath=${symbol#kernel_init (} + basepath=${basepath%/init/main.c:*)} +fi + while read line; do # Let's see if we have an address in the line if [[ $line =~ \[\<([^]]+)\>\] ]] || _
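Note (not part of the patch): the two parameter expansions above strip the `kernel_init (` prefix and the shortest `/init/main.c:*)` suffix from the decoded symbol. A minimal Python sketch of the same string handling, assuming a made-up decoded line; the path, the line number and the helper name `guess_basepath` are hypothetical.

```python
def guess_basepath(decoded_symbol):
    """Emulate ${symbol#kernel_init (} followed by ${basepath%/init/main.c:*)}."""
    prefix = "kernel_init ("
    marker = "/init/main.c:"
    path = decoded_symbol
    # "#pattern" removes a matching prefix.
    if path.startswith(prefix):
        path = path[len(prefix):]
    # "%pattern" removes the shortest matching suffix, so cut at the
    # last occurrence of the marker, provided the string ends with ")".
    cut = path.rfind(marker)
    if cut != -1 and path.endswith(")"):
        path = path[:cut]
    return path


basepath = guess_basepath("kernel_init (/home/user/src/linux/init/main.c:1400)")
```

If `kernel_init` could not be decoded, neither pattern matches and the string comes back unchanged, which is also what the shell expansions do.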
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Subject: scripts/decode_stacktrace.sh: guess path to modules Try to find module in directory with vmlinux (for fresh build). Then try standard paths where debuginfo are usually placed. Pick first file which have elf section '.debug_line'. Before: $ echo 'tap_open+0x0/0x0 [tap]' | ./scripts/decode_stacktrace.sh /usr/lib/debug/boot/vmlinux-5.4.0-37-generic WARNING! Modules path isn't set, but is needed to parse this symbol tap_open+0x0/0x0 tap After: $ echo 'tap_open+0x0/0x0 [tap]' | ./scripts/decode_stacktrace.sh /usr/lib/debug/boot/vmlinux-5.4.0-37-generic tap_open (drivers/net/tap.c:502) tap Link: http://lkml.kernel.org/r/159282923068.248444.5461337458421616083.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- scripts/decode_stacktrace.sh | 36 ++++++++++++++++++++++++++++++--- 1 file changed, 33 insertions(+), 3 deletions(-) --- a/scripts/decode_stacktrace.sh~scripts-decode_stacktrace-guess-path-to-modules +++ a/scripts/decode_stacktrace.sh @@ -12,9 +12,40 @@ fi vmlinux=$1 basepath=${2-auto} modpath=$3 +release="" + declare -A cache declare -A modcache +find_module() { + if [[ "$modpath" != "" ]] ; then + for fn in $(find "$modpath" -name "${module//_/[-_]}.ko*") ; do + if readelf -WS "$fn" | grep -qwF .debug_line ; then + echo $fn + return + fi + done + return 1 + fi + + modpath=$(dirname "$vmlinux") + find_module && return + + if [[ $release == "" ]] ; then + release=$(gdb -ex 'print init_uts_ns.name.release' -ex 'quit' -quiet -batch "$vmlinux" | sed -n 's/\$1 = "\(.*\)".*/\1/p') + fi + + for dn in {/usr/lib/debug,}/lib/modules/$release ; do + if [ -e "$dn" ] ; then + modpath="$dn" + find_module && return + fi + done + + modpath="" + return 1 +} + parse_symbol() { # The structure of symbol at this point is: # ([name]+[offset]/[total length]) @@ -27,12 +58,11 @@ parse_symbol() { elif [[ 
"${modcache[$module]+isset}" == "isset" ]]; then local objfile=${modcache[$module]} else - if [[ $modpath == "" ]]; then + local objfile=$(find_module) + if [[ $objfile == "" ]] ; then echo "WARNING! Modules path isn't set, but is needed to parse this symbol" >&2 return fi - local objfile=$(find "$modpath" -name "${module//_/[-_]}.ko*" -print -quit) - [[ $objfile == "" ]] && return modcache[$module]=$objfile fi _
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Subject: scripts/decode_stacktrace.sh: guess path to vmlinux by release name Add option decode_stacktrace -r <release> to specify only release name. This is enough to guess standard paths to vmlinux and modules: $ echo -e 'schedule+0x0/0x0 tap_open+0x0/0x0 [tap]' | ./scripts/decode_stacktrace.sh -r 5.4.0-37-generic schedule (kernel/sched/core.c:4138) tap_open (drivers/net/tap.c:502) tap Link: http://lkml.kernel.org/r/159282923334.248444.2399153100007347838.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- scripts/decode_stacktrace.sh | 29 ++++++++++++++++++++++++----- 1 file changed, 24 insertions(+), 5 deletions(-) --- a/scripts/decode_stacktrace.sh~scripts-decode_stacktrace-guess-path-to-vmlinux-by-release-name +++ a/scripts/decode_stacktrace.sh @@ -5,14 +5,33 @@ if [[ $# < 1 ]]; then echo "Usage:" - echo " $0 <vmlinux> [base path] [modules path]" + echo " $0 -r <release> | <vmlinux> [base path] [modules path]" exit 1 fi -vmlinux=$1 -basepath=${2-auto} -modpath=$3 -release="" +if [[ $1 == "-r" ]] ; then + vmlinux="" + basepath="auto" + modpath="" + release=$2 + + for fn in {,/usr/lib/debug}/boot/vmlinux-$release{,.debug} /lib/modules/$release{,/build}/vmlinux ; do + if [ -e "$fn" ] ; then + vmlinux=$fn + break + fi + done + + if [[ $vmlinux == "" ]] ; then + echo "ERROR! vmlinux image for release $release is not found" >&2 + exit 2 + fi +else + vmlinux=$1 + basepath=${2-auto} + modpath=$3 + release="" +fi declare -A cache declare -A modcache _
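Note (not part of the patch): the brace expansion in the `for fn in ...` loop above yields six candidate paths, tried in order. A minimal Python sketch that spells them out; the helper names `vmlinux_candidates` and `pick_vmlinux` and the injectable `exists` callback are assumptions made for this illustration.

```python
import os


def vmlinux_candidates(release):
    """List the paths produced by the bash brace expansion, in the same order."""
    paths = []
    # {,/usr/lib/debug}/boot/vmlinux-$release{,.debug}
    for prefix in ("", "/usr/lib/debug"):
        for suffix in ("", ".debug"):
            paths.append(f"{prefix}/boot/vmlinux-{release}{suffix}")
    # /lib/modules/$release{,/build}/vmlinux
    for subdir in ("", "/build"):
        paths.append(f"/lib/modules/{release}{subdir}/vmlinux")
    return paths


def pick_vmlinux(release, exists=os.path.exists):
    """Return the first existing candidate, or None (the script exits with 2)."""
    for path in vmlinux_candidates(release):
        if exists(path):
            return path
    return None


candidates = vmlinux_candidates("5.4.0-37-generic")
```

Passing a fake `exists` function makes the search order easy to check without touching the file system.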
From: Joe Perches <joe@perches.com> Subject: const_structs.checkpatch: add regulator_ops Add regulator_ops to expected to be const list. Link: http://lkml.kernel.org/r/dab1ba1aa03a8236933cfb7a28937efb0b808f13.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Pi-Hsun Shih <pihsun@chromium.org> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Benson Leung <bleung@chromium.org> Cc: Enric Balletbo i Serra <enric.balletbo@collabora.com> Cc: Guenter Roeck <groeck@chromium.org> Cc: Rikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- scripts/const_structs.checkpatch | 1 + 1 file changed, 1 insertion(+) --- a/scripts/const_structs.checkpatch~const_structscheckpatch-add-regulator_ops +++ a/scripts/const_structs.checkpatch @@ -44,6 +44,7 @@ platform_hibernation_ops platform_suspend_ops proto_ops regmap_access_table +regulator_ops rpc_pipe_ops rtc_class_ops sd_desc _
From: Colin Ian King <colin.king@canonical.com>
Subject: scripts/spelling.txt: add more spellings to spelling.txt

Here are some of the more common spelling mistakes and typos that I've found while fixing up spelling mistakes in the kernel since April 2020.

Link: http://lkml.kernel.org/r/20200714092837.173796-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/spelling.txt | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

--- a/scripts/spelling.txt~scripts-spellingtxt-add-more-spellings-to-spellingtxt
+++ a/scripts/spelling.txt
@@ -149,6 +149,7 @@ arbitary||arbitrary
 architechture||architecture
 arguement||argument
 arguements||arguments
+arithmatic||arithmetic
 aritmetic||arithmetic
 arne't||aren't
 arraival||arrival
@@ -454,6 +455,7 @@ destorys||destroys
 destroied||destroyed
 detabase||database
 deteced||detected
+detectt||detect
 develope||develop
 developement||development
 developped||developed
@@ -545,6 +547,7 @@ entires||entries
 entites||entities
 entrys||entries
 enocded||encoded
+enought||enough
 enterily||entirely
 enviroiment||environment
 enviroment||environment
@@ -556,11 +559,14 @@ equivelant||equivalent
 equivilant||equivalent
 eror||error
 errorr||error
+errror||error
 estbalishment||establishment
 etsablishment||establishment
 etsbalishment||establishment
+evalution||evaluation
 excecutable||executable
 exceded||exceeded
+exceds||exceeds
 exceeed||exceed
 excellant||excellent
 execeeded||exceeded
@@ -583,6 +589,7 @@ explictly||explicitly
 expresion||expression
 exprimental||experimental
 extened||extended
+exteneded||extended
 extensability||extensibility
 extention||extension
 extenstion||extension
@@ -610,10 +617,12 @@ feautures||features
 fetaure||feature
 fetaures||features
 fileystem||filesystem
+fimrware||firmware
 fimware||firmware
 firmare||firmware
 firmaware||firmware
 firware||firmware
+firwmare||firmware
 finanize||finalize
 findn||find
 finilizes||finalizes
@@ -661,6 +670,7 @@ globel||global
 grabing||grabbing
 grahical||graphical
 grahpical||graphical
+granularty||granularity
 grapic||graphic
 grranted||granted
 guage||gauge
@@ -906,6 +916,7 @@ miximum||maximum
 mmnemonic||mnemonic
 mnay||many
 modfiy||modify
+modifer||modifier
 modulues||modules
 momery||memory
 memomry||memory
@@ -915,6 +926,7 @@ monochromo||monochrome
 monocrome||monochrome
 mopdule||module
 mroe||more
+multipler||multiplier
 mulitplied||multiplied
 multidimensionnal||multidimensional
 multipe||multiple
@@ -952,6 +964,7 @@ occassionally||occasionally
 occationally||occasionally
 occurance||occurrence
 occurances||occurrences
+occurd||occurred
 occured||occurred
 occurence||occurrence
 occure||occurred
@@ -1058,6 +1071,7 @@ precission||precision
 preemptable||preemptible
 prefered||preferred
 prefferably||preferably
+prefitler||prefilter
 premption||preemption
 prepaired||prepared
 preperation||preparation
@@ -1101,6 +1115,7 @@ pronunce||pronounce
 propery||property
 propigate||propagate
 propigation||propagation
+propogation||propagation
 propogate||propagate
 prosess||process
 protable||portable
@@ -1316,6 +1331,7 @@ sturcture||structure
 subdirectoires||subdirectories
 suble||subtle
 substract||subtract
+submited||submitted
 submition||submission
 suceed||succeed
 succesfully||successfully
@@ -1324,6 +1340,7 @@ successed||succeeded
 successfull||successful
 successfuly||successfully
 sucessfully||successfully
+sucessful||successful
 sucess||success
 superflous||superfluous
 superseeded||superseded
@@ -1409,6 +1426,7 @@ transormed||transformed
 trasfer||transfer
 trasmission||transmission
 treshold||threshold
+triggerd||triggered
 trigerred||triggered
 trigerring||triggering
 trun||turn
@@ -1421,6 +1439,7 @@ uknown||unknown
 usccess||success
 usupported||unsupported
 uncommited||uncommitted
+uncompatible||incompatible
 unconditionaly||unconditionally
 undeflow||underflow
 underun||underrun
_
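Note (not part of the patch): every entry in the hunks above has the form `mistake||correction`. A minimal Python sketch of a parser for that format, which rejects lines that do not contain exactly one `||` separator; the helper name `parse_spelling` and the handling of `#` comment lines and blank lines are assumptions made for this illustration.

```python
def parse_spelling(lines):
    """Map each misspelling to its correction; raise on malformed entries."""
    fixes = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumed: blank lines and comments carry no entry
        mistake, sep, correction = line.partition("||")
        # Exactly one separator, with text on both sides.
        if not sep or not mistake or not correction or "||" in correction:
            raise ValueError(f"line {lineno}: expected 'mistake||correction': {line!r}")
        fixes[mistake] = correction
    return fixes


fixes = parse_spelling([
    "# sample entries taken from the hunks above",
    "arithmatic||arithmetic",
    "occurd||occurred",
    "uncompatible||incompatible",
])
```

A check of this kind is a cheap way to validate a batch of new entries before they are sent out.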
From: Luca Stefani <luca.stefani.ge1@gmail.com> Subject: ntfs: fix ntfs_test_inode and ntfs_init_locked_inode function type Clang's Control Flow Integrity (CFI) is a security mechanism that can help prevent JOP chains, deployed extensively in downstream kernels used in Android. Its deployment is hindered by mismatches in function signatures. For this case, we make callbacks match their intended function signature, and cast parameters within them rather than casting the callback when passed as a parameter. When running `mount -t ntfs ...` we observe the following trace: Call trace: __cfi_check_fail+0x1c/0x24 name_to_dev_t+0x0/0x404 iget5_locked+0x594/0x5e8 ntfs_fill_super+0xbfc/0x43ec mount_bdev+0x30c/0x3cc ntfs_mount+0x18/0x24 mount_fs+0x1b0/0x380 vfs_kern_mount+0x90/0x398 do_mount+0x5d8/0x1a10 SyS_mount+0x108/0x144 el0_svc_naked+0x34/0x38 Link: http://lkml.kernel.org/r/20200718112513.533800-1-luca.stefani.ge1@gmail.com Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com> Tested-by: freak07 <michalechner92@googlemail.com> Acked-by: Anton Altaparmakov <anton@tuxera.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- fs/ntfs/dir.c | 2 +- fs/ntfs/inode.c | 27 ++++++++++++++------------- fs/ntfs/inode.h | 4 +--- fs/ntfs/mft.c | 4 ++-- 4 files changed, 18 insertions(+), 19 deletions(-) --- a/fs/ntfs/dir.c~ntfs-fix-ntfs_test_inode-and-ntfs_init_locked_inode-function-type +++ a/fs/ntfs/dir.c @@ -1504,7 +1504,7 @@ static int ntfs_dir_fsync(struct file *f na.type = AT_BITMAP; na.name = I30; na.name_len = 4; - bmp_vi = ilookup5(vi->i_sb, vi->i_ino, (test_t)ntfs_test_inode, &na); + bmp_vi = ilookup5(vi->i_sb, vi->i_ino, ntfs_test_inode, &na); if (bmp_vi) { write_inode_now(bmp_vi, !datasync); iput(bmp_vi); --- a/fs/ntfs/inode.c~ntfs-fix-ntfs_test_inode-and-ntfs_init_locked_inode-function-type +++ a/fs/ntfs/inode.c @@ -30,10 +30,10 @@ /** * 
ntfs_test_inode - compare two (possibly fake) inodes for equality * @vi: vfs inode which to test - * @na: ntfs attribute which is being tested with + * @data: data which is being tested with * * Compare the ntfs attribute embedded in the ntfs specific part of the vfs - * inode @vi for equality with the ntfs attribute @na. + * inode @vi for equality with the ntfs attribute @data. * * If searching for the normal file/directory inode, set @na->type to AT_UNUSED. * @na->name and @na->name_len are then ignored. @@ -43,8 +43,9 @@ * NOTE: This function runs with the inode_hash_lock spin lock held so it is not * allowed to sleep. */ -int ntfs_test_inode(struct inode *vi, ntfs_attr *na) +int ntfs_test_inode(struct inode *vi, void *data) { + ntfs_attr *na = (ntfs_attr *)data; ntfs_inode *ni; if (vi->i_ino != na->mft_no) @@ -72,9 +73,9 @@ int ntfs_test_inode(struct inode *vi, nt /** * ntfs_init_locked_inode - initialize an inode * @vi: vfs inode to initialize - * @na: ntfs attribute which to initialize @vi to + * @data: data which to initialize @vi to * - * Initialize the vfs inode @vi with the values from the ntfs attribute @na in + * Initialize the vfs inode @vi with the values from the ntfs attribute @data in * order to enable ntfs_test_inode() to do its work. * * If initializing the normal file/directory inode, set @na->type to AT_UNUSED. @@ -87,8 +88,9 @@ int ntfs_test_inode(struct inode *vi, nt * NOTE: This function runs with the inode->i_lock spin lock held so it is not * allowed to sleep. (Hence the GFP_ATOMIC allocation.) 
*/ -static int ntfs_init_locked_inode(struct inode *vi, ntfs_attr *na) +static int ntfs_init_locked_inode(struct inode *vi, void *data) { + ntfs_attr *na = (ntfs_attr *)data; ntfs_inode *ni = NTFS_I(vi); vi->i_ino = na->mft_no; @@ -131,7 +133,6 @@ static int ntfs_init_locked_inode(struct return 0; } -typedef int (*set_t)(struct inode *, void *); static int ntfs_read_locked_inode(struct inode *vi); static int ntfs_read_locked_attr_inode(struct inode *base_vi, struct inode *vi); static int ntfs_read_locked_index_inode(struct inode *base_vi, @@ -164,8 +165,8 @@ struct inode *ntfs_iget(struct super_blo na.name = NULL; na.name_len = 0; - vi = iget5_locked(sb, mft_no, (test_t)ntfs_test_inode, - (set_t)ntfs_init_locked_inode, &na); + vi = iget5_locked(sb, mft_no, ntfs_test_inode, + ntfs_init_locked_inode, &na); if (unlikely(!vi)) return ERR_PTR(-ENOMEM); @@ -225,8 +226,8 @@ struct inode *ntfs_attr_iget(struct inod na.name = name; na.name_len = name_len; - vi = iget5_locked(base_vi->i_sb, na.mft_no, (test_t)ntfs_test_inode, - (set_t)ntfs_init_locked_inode, &na); + vi = iget5_locked(base_vi->i_sb, na.mft_no, ntfs_test_inode, + ntfs_init_locked_inode, &na); if (unlikely(!vi)) return ERR_PTR(-ENOMEM); @@ -280,8 +281,8 @@ struct inode *ntfs_index_iget(struct ino na.name = name; na.name_len = name_len; - vi = iget5_locked(base_vi->i_sb, na.mft_no, (test_t)ntfs_test_inode, - (set_t)ntfs_init_locked_inode, &na); + vi = iget5_locked(base_vi->i_sb, na.mft_no, ntfs_test_inode, + ntfs_init_locked_inode, &na); if (unlikely(!vi)) return ERR_PTR(-ENOMEM); --- a/fs/ntfs/inode.h~ntfs-fix-ntfs_test_inode-and-ntfs_init_locked_inode-function-type +++ a/fs/ntfs/inode.h @@ -253,9 +253,7 @@ typedef struct { ATTR_TYPE type; } ntfs_attr; -typedef int (*test_t)(struct inode *, void *); - -extern int ntfs_test_inode(struct inode *vi, ntfs_attr *na); +extern int ntfs_test_inode(struct inode *vi, void *data); extern struct inode *ntfs_iget(struct super_block *sb, unsigned long mft_no); extern struct 
inode *ntfs_attr_iget(struct inode *base_vi, ATTR_TYPE type, --- a/fs/ntfs/mft.c~ntfs-fix-ntfs_test_inode-and-ntfs_init_locked_inode-function-type +++ a/fs/ntfs/mft.c @@ -958,7 +958,7 @@ bool ntfs_may_write_mft_record(ntfs_volu * dirty code path of the inode dirty code path when writing * $MFT occurs. */ - vi = ilookup5_nowait(sb, mft_no, (test_t)ntfs_test_inode, &na); + vi = ilookup5_nowait(sb, mft_no, ntfs_test_inode, &na); } if (vi) { ntfs_debug("Base inode 0x%lx is in icache.", mft_no); @@ -1019,7 +1019,7 @@ bool ntfs_may_write_mft_record(ntfs_volu vi = igrab(mft_vi); BUG_ON(vi != mft_vi); } else - vi = ilookup5_nowait(sb, na.mft_no, (test_t)ntfs_test_inode, + vi = ilookup5_nowait(sb, na.mft_no, ntfs_test_inode, &na); if (!vi) { /* _
From: Gang He <ghe@suse.com>
Subject: ocfs2: fix remounting needed after setfacl command

When the setfacl command is used to change a file's ACL, the user cannot get the latest ACL information from the file via the getfacl command until the file system is remounted.

e.g.
setfacl -m u:ivan:rw /ocfs2/ivan
getfacl /ocfs2/ivan
getfacl: Removing leading '/' from absolute path names
file: ocfs2/ivan
owner: root
group: root
user::rw-
group::r--
mask::r--
other::r--

The latest ACL record ("u:ivan:rw") cannot be returned via the getfacl command until remounting.

Link: http://lkml.kernel.org/r/20200717023751.9922-1-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/acl.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/fs/ocfs2/acl.c~ocfs2-fix-remounting-needed-after-setfacl-command
+++ a/fs/ocfs2/acl.c
@@ -256,6 +256,8 @@ static int ocfs2_set_acl(handle_t *handl
 		ret = ocfs2_xattr_set(inode, name_index, "", value, size, 0);
 
 	kfree(value);
+	if (!ret)
+		set_cached_acl(inode, type, acl);
 
 	return ret;
 }
_
From: Randy Dunlap <rdunlap@infradead.org> Subject: ocfs2: suballoc.h: delete a duplicated word Drop the repeated word "is" in a comment. Link: http://lkml.kernel.org/r/20200720001421.28823-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- fs/ocfs2/suballoc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/fs/ocfs2/suballoc.h~ocfs2-suballoch-delete-a-duplicated-word +++ a/fs/ocfs2/suballoc.h @@ -40,7 +40,7 @@ struct ocfs2_alloc_context { u64 ac_last_group; u64 ac_max_block; /* Highest block number to allocate. 0 is - is the same as ~0 - unlimited */ + the same as ~0 - unlimited */ int ac_find_loc_only; /* hack for reflink operation ordering */ struct ocfs2_suballoc_result *ac_find_loc_priv; /* */ _
From: Junxiao Bi <junxiao.bi@oracle.com> Subject: ocfs2: change slot number type s16 to u16 Dan Carpenter reported the following static checker warning. fs/ocfs2/super.c:1269 ocfs2_parse_options() warn: '(-1)' 65535 can't fit into 32767 'mopt->slot' fs/ocfs2/suballoc.c:859 ocfs2_init_inode_steal_slot() warn: '(-1)' 65535 can't fit into 32767 'osb->s_inode_steal_slot' fs/ocfs2/suballoc.c:867 ocfs2_init_meta_steal_slot() warn: '(-1)' 65535 can't fit into 32767 'osb->s_meta_steal_slot' That's because OCFS2_INVALID_SLOT is (u16)-1. Slot number in ocfs2 can be never negative, so change s16 to u16. Link: http://lkml.kernel.org/r/20200627001259.19757-1-junxiao.bi@oracle.com Fixes: 9277f8334ffc ("ocfs2: fix value of OCFS2_INVALID_SLOT") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Gang He <ghe@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- fs/ocfs2/ocfs2.h | 4 ++-- fs/ocfs2/suballoc.c | 4 ++-- fs/ocfs2/super.c | 4 ++-- 3 files changed, 6 insertions(+), 6 deletions(-) --- a/fs/ocfs2/ocfs2.h~ocfs2-change-slot-number-type-s16-to-u16 +++ a/fs/ocfs2/ocfs2.h @@ -327,8 +327,8 @@ struct ocfs2_super spinlock_t osb_lock; u32 s_next_generation; unsigned long osb_flags; - s16 s_inode_steal_slot; - s16 s_meta_steal_slot; + u16 s_inode_steal_slot; + u16 s_meta_steal_slot; atomic_t s_num_inodes_stolen; atomic_t s_num_meta_stolen; --- a/fs/ocfs2/suballoc.c~ocfs2-change-slot-number-type-s16-to-u16 +++ a/fs/ocfs2/suballoc.c @@ -879,9 +879,9 @@ static void __ocfs2_set_steal_slot(struc { spin_lock(&osb->osb_lock); if (type == INODE_ALLOC_SYSTEM_INODE) - osb->s_inode_steal_slot = slot; + osb->s_inode_steal_slot = (u16)slot; else if (type == 
EXTENT_ALLOC_SYSTEM_INODE) - osb->s_meta_steal_slot = slot; + osb->s_meta_steal_slot = (u16)slot; spin_unlock(&osb->osb_lock); } --- a/fs/ocfs2/super.c~ocfs2-change-slot-number-type-s16-to-u16 +++ a/fs/ocfs2/super.c @@ -78,7 +78,7 @@ struct mount_options unsigned long commit_interval; unsigned long mount_opt; unsigned int atime_quantum; - signed short slot; + unsigned short slot; int localalloc_opt; unsigned int resv_level; int dir_resv_level; @@ -1349,7 +1349,7 @@ static int ocfs2_parse_options(struct su goto bail; } if (option) - mopt->slot = (s16)option; + mopt->slot = (u16)option; break; case Opt_commit: if (match_int(&args[0], &option)) { _
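Note (not part of the patch): the static checker warning boils down to the fact that (u16)-1 is 65535, which cannot be stored in a signed 16-bit field. A minimal Python sketch of that arithmetic using ctypes fixed-width integers; Python ints stand in for the C values after integer promotion, and the variable names are chosen for this illustration only.

```python
import ctypes

# OCFS2_INVALID_SLOT is (u16)-1 according to the changelog above.
OCFS2_INVALID_SLOT = ctypes.c_uint16(-1).value

# Storing that value in the old s16 field wraps it to -1 ...
stored_in_s16 = ctypes.c_int16(OCFS2_INVALID_SLOT).value
# ... while the new u16 field keeps it intact.
stored_in_u16 = ctypes.c_uint16(OCFS2_INVALID_SLOT).value

# After promotion to int, -1 and 65535 no longer compare equal, so a test
# such as "slot == OCFS2_INVALID_SLOT" misfires with the s16 field.
s16_matches = stored_in_s16 == OCFS2_INVALID_SLOT
u16_matches = stored_in_u16 == OCFS2_INVALID_SLOT
```

With the signed field, a comparison against the invalid-slot constant evaluates to false; with the unsigned field it evaluates to true.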
From: "Alexander A. Klimov" <grandmaster@al2klimov.de> Subject: ocfs2: replace HTTP links with HTTPS ones Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `xmlns`: For each link, `http://[^# ]*(?:\w|/)`: If neither `gnu\.org/license`, nor `mozilla\.org/MPL`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. Link: http://lkml.kernel.org/r/20200713174456.36596-1-grandmaster@al2klimov.de Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- Documentation/filesystems/dlmfs.rst | 2 +- Documentation/filesystems/ocfs2.rst | 2 +- fs/ocfs2/Kconfig | 6 +++--- fs/ocfs2/blockcheck.c | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) --- a/Documentation/filesystems/dlmfs.rst~ocfs2-replace-http-links-with-https-ones +++ a/Documentation/filesystems/dlmfs.rst @@ -12,7 +12,7 @@ dlmfs is built with OCFS2 as it requires :Project web page: http://ocfs2.wiki.kernel.org :Tools web page: https://github.com/markfasheh/ocfs2-tools -:OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/ +:OCFS2 mailing lists: https://oss.oracle.com/projects/ocfs2/mailman/ All code copyright 2005 Oracle except when otherwise noted. --- a/Documentation/filesystems/ocfs2.rst~ocfs2-replace-http-links-with-https-ones +++ a/Documentation/filesystems/ocfs2.rst @@ -14,7 +14,7 @@ get "mount.ocfs2" and "ocfs2_hb_ctl". 
Project web page: http://ocfs2.wiki.kernel.org Tools git tree: https://github.com/markfasheh/ocfs2-tools -OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/ +OCFS2 mailing lists: https://oss.oracle.com/projects/ocfs2/mailman/ All code copyright 2005 Oracle except when otherwise noted. --- a/fs/ocfs2/blockcheck.c~ocfs2-replace-http-links-with-https-ones +++ a/fs/ocfs2/blockcheck.c @@ -124,7 +124,7 @@ u32 ocfs2_hamming_encode(u32 parity, voi * parity bits that are part of the bit number * representation. Huh? * - * <wikipedia href="http://en.wikipedia.org/wiki/Hamming_code"> + * <wikipedia href="https://en.wikipedia.org/wiki/Hamming_code"> * In other words, the parity bit at position 2^k * checks bits in positions having bit k set in * their binary representation. Conversely, for --- a/fs/ocfs2/Kconfig~ocfs2-replace-http-links-with-https-ones +++ a/fs/ocfs2/Kconfig @@ -16,9 +16,9 @@ config OCFS2_FS You'll want to install the ocfs2-tools package in order to at least get "mount.ocfs2". - Project web page: http://oss.oracle.com/projects/ocfs2 - Tools web page: http://oss.oracle.com/projects/ocfs2-tools - OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/ + Project web page: https://oss.oracle.com/projects/ocfs2 + Tools web page: https://oss.oracle.com/projects/ocfs2-tools + OCFS2 mailing lists: https://oss.oracle.com/projects/ocfs2/mailman/ For more information on OCFS2, see the file <file:Documentation/filesystems/ocfs2.rst>. _
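Note (not part of the patch): a minimal Python sketch of the per-line part of the "deterministic algorithm" quoted in the changelog. The two regular expressions are taken from the changelog; the helper name `upgrade_line` and the `same_content` callback, which stands in for the "both versions return 200 OK and serve the same content" check, are assumptions made for this illustration. The `.svg` file filter is left to the caller.

```python
import re

LINK_RE = re.compile(r"http://[^# ]*(?:\w|/)")
EXCLUDED_RE = re.compile(r"gnu\.org/license|mozilla\.org/MPL")


def upgrade_line(line, same_content):
    """Rewrite http:// links in one line to https:// where the rules allow it."""
    if "xmlns" in line:
        return line  # XML namespace identifiers must not be touched

    def replace(match):
        url = match.group(0)
        if EXCLUDED_RE.search(url):
            return url  # license URLs are kept verbatim
        https_url = "https://" + url[len("http://"):]
        # Hypothetical network check, injected so the sketch stays offline.
        return https_url if same_content(url, https_url) else url

    return LINK_RE.sub(replace, line)


demo = upgrade_line(
    ":OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/",
    lambda http_url, https_url: True,
)
```

A link whose HTTPS counterpart fails the content check is left as it is, which would explain why some `http://` links can survive a run of the algorithm.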
From: Pavel Machek <pavel@ucw.cz> Subject: ocfs2: fix unbalanced locking Based on what fails, function can return with nfs_sync_rwlock either locked or unlocked. That can not be right. Always return with lock unlocked on error. Link: http://lkml.kernel.org/r/20200724124443.GA28164@duo.ucw.cz Fixes: 4cd9973f9ff6 ("ocfs2: avoid inode removal while nfsd is accessing it") Signed-off-by: Pavel Machek (CIP) <pavel@denx.de> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- fs/ocfs2/dlmglue.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) --- a/fs/ocfs2/dlmglue.c~ocfs2-fix-unbalanced-locking +++ a/fs/ocfs2/dlmglue.c @@ -2871,9 +2871,15 @@ int ocfs2_nfs_sync_lock(struct ocfs2_sup status = ocfs2_cluster_lock(osb, lockres, ex ? LKM_EXMODE : LKM_PRMODE, 0, 0); - if (status < 0) + if (status < 0) { mlog(ML_ERROR, "lock on nfs sync lock failed %d\n", status); + if (ex) + up_write(&osb->nfs_sync_rwlock); + else + up_read(&osb->nfs_sync_rwlock); + } + return status; } _
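Note (not part of the patch): the rule enforced by the fix is that a failed call must not leave the local lock held. A minimal Python sketch of that invariant, assuming the local lock is taken before the cluster lock (that part is not visible in the hunk); `threading.Lock` stands in for the rw-semaphore and `take_cluster_lock` is a hypothetical callback.

```python
import threading


def nfs_sync_lock(local_lock, take_cluster_lock):
    """Take the local lock, then the cluster lock; undo the first on failure."""
    local_lock.acquire()
    status = take_cluster_lock()
    if status < 0:
        # Error path: return with the local lock released, so the caller
        # never has to guess whether it still owns it.
        local_lock.release()
    return status


lock = threading.Lock()
failed_status = nfs_sync_lock(lock, lambda: -5)
held_after_failure = lock.locked()

ok_status = nfs_sync_lock(lock, lambda: 0)
held_after_success = lock.locked()
```

On success the caller owns the lock and is responsible for the matching unlock; on failure there is nothing to undo.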
From: Waiman Long <longman@redhat.com>
Subject: mm, treewide: rename kzfree() to kfree_sensitive()

As said by Linus:

  A symmetric naming is only helpful if it implies symmetries in use.
  Otherwise it's actively misleading.

  In "kzalloc()", the z is meaningful and an important part of what the
  caller wants.

  In "kzfree()", the z is actively detrimental, because maybe in the
  future we really _might_ want to use that "memfill(0xdeadbeef)" or
  something. The "zero" part of the interface isn't even _relevant_.

The main reason that kzfree() exists is to clear sensitive information
that should not be leaked to other future users of the same memory
objects.

Rename kzfree() to kfree_sensitive() to follow the example of the recently
added kvfree_sensitive() and make the intention of the API more explicit.
In addition, memzero_explicit() is used to clear the memory to make sure
that it won't get optimized away by the compiler.

The renaming is done by using the command sequence:

  git grep -w --name-only kzfree |\
  xargs sed -i 's/kzfree/kfree_sensitive/'

followed by some editing of the kfree_sensitive() kerneldoc and adding
a kzfree backward compatibility macro in slab.h.

[akpm@linux-foundation.org: fs/crypto/inline_crypt.c needs linux/slab.h]
[akpm@linux-foundation.org: fix fs/crypto/inline_crypt.c some more]
Link: http://lkml.kernel.org/r/20200616154311.12314-3-longman@redhat.com
Suggested-by: Joe Perches <joe@perches.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Joe Perches <joe@perches.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: "Jason A . 
Donenfeld" <Jason@zx2c4.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/s390/crypto/prng.c | 4 - arch/x86/power/hibernate.c | 2 crypto/adiantum.c | 2 crypto/ahash.c | 4 - crypto/api.c | 2 crypto/asymmetric_keys/verify_pefile.c | 4 - crypto/deflate.c | 2 crypto/drbg.c | 10 +- crypto/ecc.c | 8 +- crypto/ecdh.c | 2 crypto/gcm.c | 2 crypto/gf128mul.c | 4 - crypto/jitterentropy-kcapi.c | 2 crypto/rng.c | 2 crypto/rsa-pkcs1pad.c | 6 - crypto/seqiv.c | 2 crypto/shash.c | 2 crypto/skcipher.c | 2 crypto/testmgr.c | 6 - crypto/zstd.c | 2 drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c | 2 drivers/crypto/allwinner/sun8i-ss/sun8i-ss-cipher.c | 2 drivers/crypto/amlogic/amlogic-gxl-cipher.c | 4 - drivers/crypto/atmel-ecc.c | 2 drivers/crypto/caam/caampkc.c | 28 ++++---- drivers/crypto/cavium/cpt/cptvf_main.c | 6 - drivers/crypto/cavium/cpt/cptvf_reqmanager.c | 12 +-- drivers/crypto/cavium/nitrox/nitrox_lib.c | 4 - drivers/crypto/cavium/zip/zip_crypto.c | 6 - drivers/crypto/ccp/ccp-crypto-rsa.c | 6 - drivers/crypto/ccree/cc_aead.c | 4 - drivers/crypto/ccree/cc_buffer_mgr.c | 4 - drivers/crypto/ccree/cc_cipher.c | 6 - drivers/crypto/ccree/cc_hash.c | 8 +- drivers/crypto/ccree/cc_request_mgr.c | 2 drivers/crypto/marvell/cesa/hash.c | 2 drivers/crypto/marvell/octeontx/otx_cptvf_main.c | 6 - drivers/crypto/marvell/octeontx/otx_cptvf_reqmgr.h | 2 drivers/crypto/nx/nx.c | 4 - drivers/crypto/virtio/virtio_crypto_algs.c | 12 +-- drivers/crypto/virtio/virtio_crypto_core.c | 2 drivers/md/dm-crypt.c | 32 ++++----- drivers/md/dm-integrity.c | 6 - drivers/misc/ibmvmc.c | 6 - drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 2 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 6 - drivers/net/ppp/ppp_mppe.c | 6 - drivers/net/wireguard/noise.c | 4 - drivers/net/wireguard/peer.c | 2 drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 2 drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c | 6 - drivers/net/wireless/intel/iwlwifi/pcie/tx.c | 6 - 
drivers/net/wireless/intersil/orinoco/wext.c | 4 - drivers/s390/crypto/ap_bus.h | 4 - drivers/staging/ks7010/ks_hostif.c | 2 drivers/staging/rtl8723bs/core/rtw_security.c | 2 drivers/staging/wlan-ng/p80211netdev.c | 2 drivers/target/iscsi/iscsi_target_auth.c | 2 fs/cifs/cifsencrypt.c | 2 fs/cifs/connect.c | 10 +- fs/cifs/dfs_cache.c | 2 fs/cifs/misc.c | 8 +- fs/crypto/inline_crypt.c | 5 - fs/crypto/keyring.c | 6 - fs/crypto/keysetup_v1.c | 4 - fs/ecryptfs/keystore.c | 4 - fs/ecryptfs/messaging.c | 2 include/crypto/aead.h | 2 include/crypto/akcipher.h | 2 include/crypto/gf128mul.h | 2 include/crypto/hash.h | 2 include/crypto/internal/acompress.h | 2 include/crypto/kpp.h | 2 include/crypto/skcipher.h | 2 include/linux/slab.h | 4 - lib/mpi/mpiutil.c | 6 - lib/test_kasan.c | 6 - mm/slab_common.c | 8 +- net/atm/mpoa_caches.c | 4 - net/bluetooth/ecdh_helper.c | 6 - net/bluetooth/smp.c | 24 +++---- net/core/sock.c | 2 net/ipv4/tcp_fastopen.c | 2 net/mac80211/aead_api.c | 4 - net/mac80211/aes_gmac.c | 2 net/mac80211/key.c | 2 net/mac802154/llsec.c | 20 ++--- net/sctp/auth.c | 2 net/sunrpc/auth_gss/gss_krb5_crypto.c | 4 - net/sunrpc/auth_gss/gss_krb5_keys.c | 6 - net/sunrpc/auth_gss/gss_krb5_mech.c | 2 net/tipc/crypto.c | 10 +- net/wireless/core.c | 2 net/wireless/ibss.c | 4 - net/wireless/lib80211_crypt_tkip.c | 2 net/wireless/lib80211_crypt_wep.c | 2 net/wireless/nl80211.c | 24 +++---- net/wireless/sme.c | 6 - net/wireless/util.c | 2 net/wireless/wext-sme.c | 2 scripts/coccinelle/free/devm_free.cocci | 4 - scripts/coccinelle/free/ifnullfree.cocci | 4 - scripts/coccinelle/free/kfree.cocci | 6 - scripts/coccinelle/free/kfreeaddr.cocci | 2 security/apparmor/domain.c | 4 - security/apparmor/include/file.h | 2 security/apparmor/policy.c | 24 +++---- security/apparmor/policy_ns.c | 6 - security/apparmor/policy_unpack.c | 14 ++-- security/keys/big_key.c | 6 - security/keys/dh.c | 14 ++-- security/keys/encrypted-keys/encrypted.c | 14 ++-- security/keys/trusted-keys/trusted_tpm1.c 
| 34 +++++----- security/keys/user_defined.c | 6 - 114 files changed, 323 insertions(+), 320 deletions(-) --- a/arch/s390/crypto/prng.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/arch/s390/crypto/prng.c @@ -249,7 +249,7 @@ static void prng_tdes_deinstantiate(void { pr_debug("The prng module stopped " "after running in triple DES mode\n"); - kzfree(prng_data); + kfree_sensitive(prng_data); } @@ -442,7 +442,7 @@ outfree: static void prng_sha512_deinstantiate(void) { pr_debug("The prng module stopped after running in SHA-512 mode\n"); - kzfree(prng_data); + kfree_sensitive(prng_data); } --- a/arch/x86/power/hibernate.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/arch/x86/power/hibernate.c @@ -98,7 +98,7 @@ static int get_e820_md5(struct e820_tabl if (crypto_shash_digest(desc, (u8 *)table, size, buf)) ret = -EINVAL; - kzfree(desc); + kfree_sensitive(desc); free_tfm: crypto_free_shash(tfm); --- a/crypto/adiantum.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/adiantum.c @@ -177,7 +177,7 @@ static int adiantum_setkey(struct crypto keyp += NHPOLY1305_KEY_SIZE; WARN_ON(keyp != &data->derived_keys[ARRAY_SIZE(data->derived_keys)]); out: - kzfree(data); + kfree_sensitive(data); return err; } --- a/crypto/ahash.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/ahash.c @@ -183,7 +183,7 @@ static int ahash_setkey_unaligned(struct alignbuffer = (u8 *)ALIGN((unsigned long)buffer, alignmask + 1); memcpy(alignbuffer, key, keylen); ret = tfm->setkey(tfm, alignbuffer, keylen); - kzfree(buffer); + kfree_sensitive(buffer); return ret; } @@ -302,7 +302,7 @@ static void ahash_restore_req(struct aha req->priv = NULL; /* Free the req->priv.priv from the ADJUSTED request. 
*/ - kzfree(priv); + kfree_sensitive(priv); } static void ahash_notify_einprogress(struct ahash_request *req) --- a/crypto/api.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/api.c @@ -571,7 +571,7 @@ void crypto_destroy_tfm(void *mem, struc alg->cra_exit(tfm); crypto_exit_ops(tfm); crypto_mod_put(alg); - kzfree(mem); + kfree_sensitive(mem); } EXPORT_SYMBOL_GPL(crypto_destroy_tfm); --- a/crypto/asymmetric_keys/verify_pefile.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/asymmetric_keys/verify_pefile.c @@ -376,7 +376,7 @@ static int pefile_digest_pe(const void * } error: - kzfree(desc); + kfree_sensitive(desc); error_no_desc: crypto_free_shash(tfm); kleave(" = %d", ret); @@ -447,6 +447,6 @@ int verify_pefile_signature(const void * ret = pefile_digest_pe(pebuf, pelen, &ctx); error: - kzfree(ctx.digest); + kfree_sensitive(ctx.digest); return ret; } --- a/crypto/deflate.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/deflate.c @@ -163,7 +163,7 @@ static void __deflate_exit(void *ctx) static void deflate_free_ctx(struct crypto_scomp *tfm, void *ctx) { __deflate_exit(ctx); - kzfree(ctx); + kfree_sensitive(ctx); } static void deflate_exit(struct crypto_tfm *tfm) --- a/crypto/drbg.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/drbg.c @@ -1218,19 +1218,19 @@ static inline void drbg_dealloc_state(st { if (!drbg) return; - kzfree(drbg->Vbuf); + kfree_sensitive(drbg->Vbuf); drbg->Vbuf = NULL; drbg->V = NULL; - kzfree(drbg->Cbuf); + kfree_sensitive(drbg->Cbuf); drbg->Cbuf = NULL; drbg->C = NULL; - kzfree(drbg->scratchpadbuf); + kfree_sensitive(drbg->scratchpadbuf); drbg->scratchpadbuf = NULL; drbg->reseed_ctr = 0; drbg->d_ops = NULL; drbg->core = NULL; if (IS_ENABLED(CONFIG_CRYPTO_FIPS)) { - kzfree(drbg->prev); + kfree_sensitive(drbg->prev); drbg->prev = NULL; drbg->fips_primed = false; } @@ -1701,7 +1701,7 @@ static int drbg_fini_hash_kernel(struct struct sdesc *sdesc = (struct sdesc *)drbg->priv_data; if (sdesc) { 
crypto_free_shash(sdesc->shash.tfm); - kzfree(sdesc); + kfree_sensitive(sdesc); } drbg->priv_data = NULL; return 0; --- a/crypto/ecc.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/ecc.c @@ -67,7 +67,7 @@ static u64 *ecc_alloc_digits_space(unsig static void ecc_free_digits_space(u64 *space) { - kzfree(space); + kfree_sensitive(space); } static struct ecc_point *ecc_alloc_point(unsigned int ndigits) @@ -101,9 +101,9 @@ static void ecc_free_point(struct ecc_po if (!p) return; - kzfree(p->x); - kzfree(p->y); - kzfree(p); + kfree_sensitive(p->x); + kfree_sensitive(p->y); + kfree_sensitive(p); } static void vli_clear(u64 *vli, unsigned int ndigits) --- a/crypto/ecdh.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/ecdh.c @@ -124,7 +124,7 @@ static int ecdh_compute_value(struct kpp /* fall through */ free_all: - kzfree(shared_secret); + kfree_sensitive(shared_secret); free_pubkey: kfree(public_key); return ret; --- a/crypto/gcm.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/gcm.c @@ -139,7 +139,7 @@ static int crypto_gcm_setkey(struct cryp CRYPTO_TFM_REQ_MASK); err = crypto_ahash_setkey(ghash, (u8 *)&data->hash, sizeof(be128)); out: - kzfree(data); + kfree_sensitive(data); return err; } --- a/crypto/gf128mul.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/gf128mul.c @@ -304,8 +304,8 @@ void gf128mul_free_64k(struct gf128mul_6 int i; for (i = 0; i < 16; i++) - kzfree(t->t[i]); - kzfree(t); + kfree_sensitive(t->t[i]); + kfree_sensitive(t); } EXPORT_SYMBOL(gf128mul_free_64k); --- a/crypto/jitterentropy-kcapi.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/jitterentropy-kcapi.c @@ -57,7 +57,7 @@ void *jent_zalloc(unsigned int len) void jent_zfree(void *ptr) { - kzfree(ptr); + kfree_sensitive(ptr); } int jent_fips_enabled(void) --- a/crypto/rng.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/rng.c @@ -53,7 +53,7 @@ int crypto_rng_reset(struct crypto_rng * err = crypto_rng_alg(tfm)->seed(tfm, seed, slen); 
crypto_stats_rng_seed(alg, err); out: - kzfree(buf); + kfree_sensitive(buf); return err; } EXPORT_SYMBOL_GPL(crypto_rng_reset); --- a/crypto/rsa-pkcs1pad.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/rsa-pkcs1pad.c @@ -199,7 +199,7 @@ static int pkcs1pad_encrypt_sign_complet sg_copy_from_buffer(req->dst, sg_nents_for_len(req->dst, ctx->key_size), out_buf, ctx->key_size); - kzfree(out_buf); + kfree_sensitive(out_buf); out: req->dst_len = ctx->key_size; @@ -322,7 +322,7 @@ static int pkcs1pad_decrypt_complete(str out_buf + pos, req->dst_len); done: - kzfree(req_ctx->out_buf); + kfree_sensitive(req_ctx->out_buf); return err; } @@ -500,7 +500,7 @@ static int pkcs1pad_verify_complete(stru req->dst_len) != 0) err = -EKEYREJECTED; done: - kzfree(req_ctx->out_buf); + kfree_sensitive(req_ctx->out_buf); return err; } --- a/crypto/seqiv.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/seqiv.c @@ -33,7 +33,7 @@ static void seqiv_aead_encrypt_complete2 memcpy(req->iv, subreq->iv, crypto_aead_ivsize(geniv)); out: - kzfree(subreq->iv); + kfree_sensitive(subreq->iv); } static void seqiv_aead_encrypt_complete(struct crypto_async_request *base, --- a/crypto/shash.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/shash.c @@ -44,7 +44,7 @@ static int shash_setkey_unaligned(struct alignbuffer = (u8 *)ALIGN((unsigned long)buffer, alignmask + 1); memcpy(alignbuffer, key, keylen); err = shash->setkey(tfm, alignbuffer, keylen); - kzfree(buffer); + kfree_sensitive(buffer); return err; } --- a/crypto/skcipher.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/skcipher.c @@ -592,7 +592,7 @@ static int skcipher_setkey_unaligned(str alignbuffer = (u8 *)ALIGN((unsigned long)buffer, alignmask + 1); memcpy(alignbuffer, key, keylen); ret = cipher->setkey(tfm, alignbuffer, keylen); - kzfree(buffer); + kfree_sensitive(buffer); return ret; } --- a/crypto/testmgr.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/testmgr.c @@ -1744,7 +1744,7 @@ out: 
kfree(vec.plaintext); kfree(vec.digest); crypto_free_shash(generic_tfm); - kzfree(generic_desc); + kfree_sensitive(generic_desc); return err; } #else /* !CONFIG_CRYPTO_MANAGER_EXTRA_TESTS */ @@ -3665,7 +3665,7 @@ static int drbg_cavs_test(const struct d if (IS_ERR(drng)) { printk(KERN_ERR "alg: drbg: could not allocate DRNG handle for " "%s\n", driver); - kzfree(buf); + kfree_sensitive(buf); return -ENOMEM; } @@ -3712,7 +3712,7 @@ static int drbg_cavs_test(const struct d outbuf: crypto_free_rng(drng); - kzfree(buf); + kfree_sensitive(buf); return ret; } --- a/crypto/zstd.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/crypto/zstd.c @@ -137,7 +137,7 @@ static void __zstd_exit(void *ctx) static void zstd_free_ctx(struct crypto_scomp *tfm, void *ctx) { __zstd_exit(ctx); - kzfree(ctx); + kfree_sensitive(ctx); } static void zstd_exit(struct crypto_tfm *tfm) --- a/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c @@ -254,7 +254,7 @@ theend_iv: offset = areq->cryptlen - ivsize; if (rctx->op_dir & CE_DECRYPTION) { memcpy(areq->iv, backup_iv, ivsize); - kzfree(backup_iv); + kfree_sensitive(backup_iv); } else { scatterwalk_map_and_copy(areq->iv, areq->dst, offset, ivsize, 0); --- a/drivers/crypto/allwinner/sun8i-ss/sun8i-ss-cipher.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/allwinner/sun8i-ss/sun8i-ss-cipher.c @@ -249,7 +249,7 @@ theend_iv: if (rctx->op_dir & SS_DECRYPTION) { memcpy(areq->iv, backup_iv, ivsize); memzero_explicit(backup_iv, ivsize); - kzfree(backup_iv); + kfree_sensitive(backup_iv); } else { scatterwalk_map_and_copy(areq->iv, areq->dst, offset, ivsize, 0); --- a/drivers/crypto/amlogic/amlogic-gxl-cipher.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/amlogic/amlogic-gxl-cipher.c @@ -252,8 +252,8 @@ static int meson_cipher(struct skcipher_ } } theend: - kzfree(bkeyiv); - kzfree(backup_iv); + 
kfree_sensitive(bkeyiv); + kfree_sensitive(backup_iv); return err; } --- a/drivers/crypto/atmel-ecc.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/atmel-ecc.c @@ -69,7 +69,7 @@ static void atmel_ecdh_done(struct atmel /* fall through */ free_work_data: - kzfree(work_data); + kfree_sensitive(work_data); kpp_request_complete(req, status); } --- a/drivers/crypto/caam/caampkc.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/caam/caampkc.c @@ -854,14 +854,14 @@ static int caam_rsa_dec(struct akcipher_ static void caam_rsa_free_key(struct caam_rsa_key *key) { - kzfree(key->d); - kzfree(key->p); - kzfree(key->q); - kzfree(key->dp); - kzfree(key->dq); - kzfree(key->qinv); - kzfree(key->tmp1); - kzfree(key->tmp2); + kfree_sensitive(key->d); + kfree_sensitive(key->p); + kfree_sensitive(key->q); + kfree_sensitive(key->dp); + kfree_sensitive(key->dq); + kfree_sensitive(key->qinv); + kfree_sensitive(key->tmp1); + kfree_sensitive(key->tmp2); kfree(key->e); kfree(key->n); memset(key, 0, sizeof(*key)); @@ -1018,17 +1018,17 @@ static void caam_rsa_set_priv_key_form(s return; free_dq: - kzfree(rsa_key->dq); + kfree_sensitive(rsa_key->dq); free_dp: - kzfree(rsa_key->dp); + kfree_sensitive(rsa_key->dp); free_tmp2: - kzfree(rsa_key->tmp2); + kfree_sensitive(rsa_key->tmp2); free_tmp1: - kzfree(rsa_key->tmp1); + kfree_sensitive(rsa_key->tmp1); free_q: - kzfree(rsa_key->q); + kfree_sensitive(rsa_key->q); free_p: - kzfree(rsa_key->p); + kfree_sensitive(rsa_key->p); } static int caam_rsa_set_priv_key(struct crypto_akcipher *tfm, const void *key, --- a/drivers/crypto/cavium/cpt/cptvf_main.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/cavium/cpt/cptvf_main.c @@ -74,7 +74,7 @@ static void cleanup_worker_threads(struc for (i = 0; i < cptvf->nr_queues; i++) tasklet_kill(&cwqe_info->vq_wqe[i].twork); - kzfree(cwqe_info); + kfree_sensitive(cwqe_info); cptvf->wqe_info = NULL; } @@ -88,7 +88,7 @@ static void free_pending_queues(struct p 
continue; /* free single queue */ - kzfree((queue->head)); + kfree_sensitive((queue->head)); queue->front = 0; queue->rear = 0; @@ -189,7 +189,7 @@ static void free_command_queues(struct c chunk->head = NULL; chunk->dma_addr = 0; hlist_del(&chunk->nextchunk); - kzfree(chunk); + kfree_sensitive(chunk); } queue->nchunks = 0; --- a/drivers/crypto/cavium/cpt/cptvf_reqmanager.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/cavium/cpt/cptvf_reqmanager.c @@ -305,12 +305,12 @@ static void do_request_cleanup(struct cp } } - kzfree(info->scatter_components); - kzfree(info->gather_components); - kzfree(info->out_buffer); - kzfree(info->in_buffer); - kzfree((void *)info->completion_addr); - kzfree(info); + kfree_sensitive(info->scatter_components); + kfree_sensitive(info->gather_components); + kfree_sensitive(info->out_buffer); + kfree_sensitive(info->in_buffer); + kfree_sensitive((void *)info->completion_addr); + kfree_sensitive(info); } static void do_post_process(struct cpt_vf *cptvf, struct cpt_info_buffer *info) --- a/drivers/crypto/cavium/nitrox/nitrox_lib.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/cavium/nitrox/nitrox_lib.c @@ -90,7 +90,7 @@ static void nitrox_free_aqm_queues(struc for (i = 0; i < ndev->nr_queues; i++) { nitrox_cmdq_cleanup(ndev->aqmq[i]); - kzfree(ndev->aqmq[i]); + kfree_sensitive(ndev->aqmq[i]); ndev->aqmq[i] = NULL; } } @@ -122,7 +122,7 @@ static int nitrox_alloc_aqm_queues(struc err = nitrox_cmdq_init(cmdq, AQM_Q_ALIGN_BYTES); if (err) { - kzfree(cmdq); + kfree_sensitive(cmdq); goto aqmq_fail; } ndev->aqmq[i] = cmdq; --- a/drivers/crypto/cavium/zip/zip_crypto.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/cavium/zip/zip_crypto.c @@ -260,7 +260,7 @@ void *zip_alloc_scomp_ctx_deflate(struct ret = zip_ctx_init(zip_ctx, 0); if (ret) { - kzfree(zip_ctx); + kfree_sensitive(zip_ctx); return ERR_PTR(ret); } @@ -279,7 +279,7 @@ void *zip_alloc_scomp_ctx_lzs(struct cry ret = 
zip_ctx_init(zip_ctx, 1); if (ret) { - kzfree(zip_ctx); + kfree_sensitive(zip_ctx); return ERR_PTR(ret); } @@ -291,7 +291,7 @@ void zip_free_scomp_ctx(struct crypto_sc struct zip_kernel_ctx *zip_ctx = ctx; zip_ctx_exit(zip_ctx); - kzfree(zip_ctx); + kfree_sensitive(zip_ctx); } int zip_scomp_compress(struct crypto_scomp *tfm, --- a/drivers/crypto/ccp/ccp-crypto-rsa.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/ccp/ccp-crypto-rsa.c @@ -112,13 +112,13 @@ static int ccp_check_key_length(unsigned static void ccp_rsa_free_key_bufs(struct ccp_ctx *ctx) { /* Clean up old key data */ - kzfree(ctx->u.rsa.e_buf); + kfree_sensitive(ctx->u.rsa.e_buf); ctx->u.rsa.e_buf = NULL; ctx->u.rsa.e_len = 0; - kzfree(ctx->u.rsa.n_buf); + kfree_sensitive(ctx->u.rsa.n_buf); ctx->u.rsa.n_buf = NULL; ctx->u.rsa.n_len = 0; - kzfree(ctx->u.rsa.d_buf); + kfree_sensitive(ctx->u.rsa.d_buf); ctx->u.rsa.d_buf = NULL; ctx->u.rsa.d_len = 0; } --- a/drivers/crypto/ccree/cc_aead.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/ccree/cc_aead.c @@ -448,7 +448,7 @@ static int cc_get_plain_hmac_key(struct if (dma_mapping_error(dev, key_dma_addr)) { dev_err(dev, "Mapping key va=0x%p len=%u for DMA failed\n", key, keylen); - kzfree(key); + kfree_sensitive(key); return -ENOMEM; } if (keylen > blocksize) { @@ -533,7 +533,7 @@ static int cc_get_plain_hmac_key(struct if (key_dma_addr) dma_unmap_single(dev, key_dma_addr, keylen, DMA_TO_DEVICE); - kzfree(key); + kfree_sensitive(key); return rc; } --- a/drivers/crypto/ccree/cc_buffer_mgr.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/ccree/cc_buffer_mgr.c @@ -488,7 +488,7 @@ void cc_unmap_aead_request(struct device if (areq_ctx->gen_ctx.iv_dma_addr) { dma_unmap_single(dev, areq_ctx->gen_ctx.iv_dma_addr, hw_iv_size, DMA_BIDIRECTIONAL); - kzfree(areq_ctx->gen_ctx.iv); + kfree_sensitive(areq_ctx->gen_ctx.iv); } /* Release pool */ @@ -559,7 +559,7 @@ static int cc_aead_chain_iv(struct cc_dr if 
(dma_mapping_error(dev, areq_ctx->gen_ctx.iv_dma_addr)) { dev_err(dev, "Mapping iv %u B at va=%pK for DMA failed\n", hw_iv_size, req->iv); - kzfree(areq_ctx->gen_ctx.iv); + kfree_sensitive(areq_ctx->gen_ctx.iv); areq_ctx->gen_ctx.iv = NULL; rc = -ENOMEM; goto chain_iv_exit; --- a/drivers/crypto/ccree/cc_cipher.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/ccree/cc_cipher.c @@ -257,7 +257,7 @@ static void cc_cipher_exit(struct crypto &ctx_p->user.key_dma_addr); /* Free key buffer in context */ - kzfree(ctx_p->user.key); + kfree_sensitive(ctx_p->user.key); dev_dbg(dev, "Free key buffer in context. key=@%p\n", ctx_p->user.key); } @@ -881,7 +881,7 @@ static void cc_cipher_complete(struct de /* Not a BACKLOG notification */ cc_unmap_cipher_request(dev, req_ctx, ivsize, src, dst); memcpy(req->iv, req_ctx->iv, ivsize); - kzfree(req_ctx->iv); + kfree_sensitive(req_ctx->iv); } skcipher_request_complete(req, err); @@ -994,7 +994,7 @@ static int cc_cipher_process(struct skci exit_process: if (rc != -EINPROGRESS && rc != -EBUSY) { - kzfree(req_ctx->iv); + kfree_sensitive(req_ctx->iv); } return rc; --- a/drivers/crypto/ccree/cc_hash.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/ccree/cc_hash.c @@ -764,7 +764,7 @@ static int cc_hash_setkey(struct crypto_ if (dma_mapping_error(dev, ctx->key_params.key_dma_addr)) { dev_err(dev, "Mapping key va=0x%p len=%u for DMA failed\n", ctx->key_params.key, keylen); - kzfree(ctx->key_params.key); + kfree_sensitive(ctx->key_params.key); return -ENOMEM; } dev_dbg(dev, "mapping key-buffer: key_dma_addr=%pad keylen=%u\n", @@ -913,7 +913,7 @@ out: &ctx->key_params.key_dma_addr, ctx->key_params.keylen); } - kzfree(ctx->key_params.key); + kfree_sensitive(ctx->key_params.key); return rc; } @@ -950,7 +950,7 @@ static int cc_xcbc_setkey(struct crypto_ if (dma_mapping_error(dev, ctx->key_params.key_dma_addr)) { dev_err(dev, "Mapping key va=0x%p len=%u for DMA failed\n", key, keylen); - 
kzfree(ctx->key_params.key); + kfree_sensitive(ctx->key_params.key); return -ENOMEM; } dev_dbg(dev, "mapping key-buffer: key_dma_addr=%pad keylen=%u\n", @@ -999,7 +999,7 @@ static int cc_xcbc_setkey(struct crypto_ dev_dbg(dev, "Unmapped key-buffer: key_dma_addr=%pad keylen=%u\n", &ctx->key_params.key_dma_addr, ctx->key_params.keylen); - kzfree(ctx->key_params.key); + kfree_sensitive(ctx->key_params.key); return rc; } --- a/drivers/crypto/ccree/cc_request_mgr.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/ccree/cc_request_mgr.c @@ -107,7 +107,7 @@ void cc_req_mgr_fini(struct cc_drvdata * /* Kill tasklet */ tasklet_kill(&req_mgr_h->comptask); #endif - kzfree(req_mgr_h); + kfree_sensitive(req_mgr_h); drvdata->request_mgr_handle = NULL; } --- a/drivers/crypto/marvell/cesa/hash.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/marvell/cesa/hash.c @@ -1157,7 +1157,7 @@ static int mv_cesa_ahmac_pad_init(struct } /* Set the memory region to 0 to avoid any leak. 
*/ - kzfree(keydup); + kfree_sensitive(keydup); if (ret) return ret; --- a/drivers/crypto/marvell/octeontx/otx_cptvf_main.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/marvell/octeontx/otx_cptvf_main.c @@ -68,7 +68,7 @@ static void cleanup_worker_threads(struc for (i = 0; i < cptvf->num_queues; i++) tasklet_kill(&cwqe_info->vq_wqe[i].twork); - kzfree(cwqe_info); + kfree_sensitive(cwqe_info); cptvf->wqe_info = NULL; } @@ -82,7 +82,7 @@ static void free_pending_queues(struct o continue; /* free single queue */ - kzfree((queue->head)); + kfree_sensitive((queue->head)); queue->front = 0; queue->rear = 0; queue->qlen = 0; @@ -176,7 +176,7 @@ static void free_command_queues(struct o chunk->head = NULL; chunk->dma_addr = 0; list_del(&chunk->nextchunk); - kzfree(chunk); + kfree_sensitive(chunk); } queue->num_chunks = 0; queue->idx = 0; --- a/drivers/crypto/marvell/octeontx/otx_cptvf_reqmgr.h~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/marvell/octeontx/otx_cptvf_reqmgr.h @@ -215,7 +215,7 @@ static inline void do_request_cleanup(st DMA_BIDIRECTIONAL); } } - kzfree(info); + kfree_sensitive(info); } struct otx_cptvf_wqe; --- a/drivers/crypto/nx/nx.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/nx/nx.c @@ -746,7 +746,7 @@ void nx_crypto_ctx_exit(struct crypto_tf { struct nx_crypto_ctx *nx_ctx = crypto_tfm_ctx(tfm); - kzfree(nx_ctx->kmem); + kfree_sensitive(nx_ctx->kmem); nx_ctx->csbcpb = NULL; nx_ctx->csbcpb_aead = NULL; nx_ctx->in_sg = NULL; @@ -762,7 +762,7 @@ void nx_crypto_ctx_aead_exit(struct cryp { struct nx_crypto_ctx *nx_ctx = crypto_aead_ctx(tfm); - kzfree(nx_ctx->kmem); + kfree_sensitive(nx_ctx->kmem); } static int nx_probe(struct vio_dev *viodev, const struct vio_device_id *id) --- a/drivers/crypto/virtio/virtio_crypto_algs.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/virtio/virtio_crypto_algs.c @@ -167,7 +167,7 @@ static int virtio_crypto_alg_skcipher_in num_in, vcrypto, 
GFP_ATOMIC); if (err < 0) { spin_unlock(&vcrypto->ctrl_lock); - kzfree(cipher_key); + kfree_sensitive(cipher_key); return err; } virtqueue_kick(vcrypto->ctrl_vq); @@ -184,7 +184,7 @@ static int virtio_crypto_alg_skcipher_in spin_unlock(&vcrypto->ctrl_lock); pr_err("virtio_crypto: Create session failed status: %u\n", le32_to_cpu(vcrypto->input.status)); - kzfree(cipher_key); + kfree_sensitive(cipher_key); return -EINVAL; } @@ -197,7 +197,7 @@ static int virtio_crypto_alg_skcipher_in spin_unlock(&vcrypto->ctrl_lock); - kzfree(cipher_key); + kfree_sensitive(cipher_key); return 0; } @@ -472,9 +472,9 @@ __virtio_crypto_skcipher_do_req(struct v return 0; free_iv: - kzfree(iv); + kfree_sensitive(iv); free: - kzfree(req_data); + kfree_sensitive(req_data); kfree(sgs); return err; } @@ -583,7 +583,7 @@ static void virtio_crypto_skcipher_final scatterwalk_map_and_copy(req->iv, req->dst, req->cryptlen - AES_BLOCK_SIZE, AES_BLOCK_SIZE, 0); - kzfree(vc_sym_req->iv); + kfree_sensitive(vc_sym_req->iv); virtcrypto_clear_request(&vc_sym_req->base); crypto_finalize_skcipher_request(vc_sym_req->base.dataq->engine, --- a/drivers/crypto/virtio/virtio_crypto_core.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/crypto/virtio/virtio_crypto_core.c @@ -17,7 +17,7 @@ void virtcrypto_clear_request(struct virtio_crypto_request *vc_req) { if (vc_req) { - kzfree(vc_req->req_data); + kfree_sensitive(vc_req->req_data); kfree(vc_req->sgs); } } --- a/drivers/md/dm-crypt.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/drivers/md/dm-crypt.c @@ -407,7 +407,7 @@ static void crypt_iv_lmk_dtr(struct cryp crypto_free_shash(lmk->hash_tfm); lmk->hash_tfm = NULL; - kzfree(lmk->seed); + kfree_sensitive(lmk->seed); lmk->seed = NULL; } @@ -558,9 +558,9 @@ static void crypt_iv_tcw_dtr(struct cryp { struct iv_tcw_private *tcw = &cc->iv_gen_private.tcw; - kzfree(tcw->iv_seed); + kfree_sensitive(tcw->iv_seed); tcw->iv_seed = NULL; - kzfree(tcw->whitening); + kfree_sensitive(tcw->whitening); 
tcw->whitening = NULL; if (tcw->crc32_tfm && !IS_ERR(tcw->crc32_tfm)) @@ -994,8 +994,8 @@ static int crypt_iv_elephant(struct cryp kunmap_atomic(data); out: - kzfree(ks); - kzfree(es); + kfree_sensitive(ks); + kfree_sensitive(es); skcipher_request_free(req); return r; } @@ -2294,7 +2294,7 @@ static int crypt_set_keyring_key(struct key = request_key(type, key_desc + 1, NULL); if (IS_ERR(key)) { - kzfree(new_key_string); + kfree_sensitive(new_key_string); return PTR_ERR(key); } @@ -2304,7 +2304,7 @@ static int crypt_set_keyring_key(struct if (ret < 0) { up_read(&key->sem); key_put(key); - kzfree(new_key_string); + kfree_sensitive(new_key_string); return ret; } @@ -2318,10 +2318,10 @@ static int crypt_set_keyring_key(struct if (!ret) { set_bit(DM_CRYPT_KEY_VALID, &cc->flags); - kzfree(cc->key_string); + kfree_sensitive(cc->key_string); cc->key_string = new_key_string; } else - kzfree(new_key_string); + kfree_sensitive(new_key_string); return ret; } @@ -2382,7 +2382,7 @@ static int crypt_set_key(struct crypt_co clear_bit(DM_CRYPT_KEY_VALID, &cc->flags); /* wipe references to any kernel keyring key */ - kzfree(cc->key_string); + kfree_sensitive(cc->key_string); cc->key_string = NULL; /* Decode key from its hex representation. 
 */
@@ -2414,7 +2414,7 @@ static int crypt_wipe_key(struct crypt_c
 		return r;
 	}

-	kzfree(cc->key_string);
+	kfree_sensitive(cc->key_string);
 	cc->key_string = NULL;
 	r = crypt_setkey(cc);
 	memset(&cc->key, 0, cc->key_size * sizeof(u8));
@@ -2493,15 +2493,15 @@ static void crypt_dtr(struct dm_target *
 	if (cc->dev)
 		dm_put_device(ti, cc->dev);

-	kzfree(cc->cipher_string);
-	kzfree(cc->key_string);
-	kzfree(cc->cipher_auth);
-	kzfree(cc->authenc_key);
+	kfree_sensitive(cc->cipher_string);
+	kfree_sensitive(cc->key_string);
+	kfree_sensitive(cc->cipher_auth);
+	kfree_sensitive(cc->authenc_key);

 	mutex_destroy(&cc->bio_alloc_lock);

 	/* Must zero key material before freeing */
-	kzfree(cc);
+	kfree_sensitive(cc);

 	spin_lock(&dm_crypt_clients_lock);
 	WARN_ON(!dm_crypt_clients_n);
--- a/drivers/md/dm-integrity.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/drivers/md/dm-integrity.c
@@ -3405,8 +3405,8 @@ static struct scatterlist **dm_integrity

 static void free_alg(struct alg_spec *a)
 {
-	kzfree(a->alg_string);
-	kzfree(a->key);
+	kfree_sensitive(a->alg_string);
+	kfree_sensitive(a->key);
 	memset(a, 0, sizeof *a);
 }

@@ -4337,7 +4337,7 @@ static void dm_integrity_dtr(struct dm_t
 		for (i = 0; i < ic->journal_sections; i++) {
 			struct skcipher_request *req = ic->sk_requests[i];
 			if (req) {
-				kzfree(req->iv);
+				kfree_sensitive(req->iv);
 				skcipher_request_free(req);
 			}
 		}
--- a/drivers/misc/ibmvmc.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/drivers/misc/ibmvmc.c
@@ -286,7 +286,7 @@ static void *alloc_dma_buffer(struct vio

 	if (dma_mapping_error(&vdev->dev, *dma_handle)) {
 		*dma_handle = 0;
-		kzfree(buffer);
+		kfree_sensitive(buffer);
 		return NULL;
 	}

@@ -310,7 +310,7 @@ static void free_dma_buffer(struct vio_d
 	dma_unmap_single(&vdev->dev, dma_handle, size, DMA_BIDIRECTIONAL);

 	/* deallocate memory */
-	kzfree(vaddr);
+	kfree_sensitive(vaddr);
 }

 /**
@@ -883,7 +883,7 @@ static int ibmvmc_close(struct inode *in
 		spin_unlock_irqrestore(&hmc->lock, flags);
 	}

-	kzfree(session);
+	kfree_sensitive(session);

 	return rc;
 }
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
@@ -137,7 +137,7 @@ static void hclge_free_vector_ring_chain

 	while (chain) {
 		chain_tmp = chain->next;
-		kzfree(chain);
+		kfree_sensitive(chain);
 		chain = chain_tmp;
 	}
 }
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -960,9 +960,9 @@ int ixgbe_ipsec_vf_add_sa(struct ixgbe_a
 	return 0;

 err_aead:
-	kzfree(xs->aead);
+	kfree_sensitive(xs->aead);
 err_xs:
-	kzfree(xs);
+	kfree_sensitive(xs);
 err_out:
 	msgbuf[1] = err;
 	return err;
@@ -1047,7 +1047,7 @@ int ixgbe_ipsec_vf_del_sa(struct ixgbe_a
 	ixgbe_ipsec_del_sa(xs);

 	/* remove the xs that was made-up in the add request */
-	kzfree(xs);
+	kfree_sensitive(xs);

 	return 0;
 }
--- a/drivers/net/ppp/ppp_mppe.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/drivers/net/ppp/ppp_mppe.c
@@ -222,7 +222,7 @@ out_free:
 	kfree(state->sha1_digest);
 	if (state->sha1) {
 		crypto_free_shash(state->sha1->tfm);
-		kzfree(state->sha1);
+		kfree_sensitive(state->sha1);
 	}
 	kfree(state);
 out:
@@ -238,8 +238,8 @@ static void mppe_free(void *arg)
 	if (state) {
 		kfree(state->sha1_digest);
 		crypto_free_shash(state->sha1->tfm);
-		kzfree(state->sha1);
-		kzfree(state);
+		kfree_sensitive(state->sha1);
+		kfree_sensitive(state);
 	}
 }

--- a/drivers/net/wireguard/noise.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/drivers/net/wireguard/noise.c
@@ -114,7 +114,7 @@ static struct noise_keypair *keypair_cre

 static void keypair_free_rcu(struct rcu_head *rcu)
 {
-	kzfree(container_of(rcu, struct noise_keypair, rcu));
+	kfree_sensitive(container_of(rcu, struct noise_keypair, rcu));
 }

 static void keypair_free_kref(struct kref *kref)
@@ -821,7 +821,7 @@ bool wg_noise_handshake_begin_session(st
 			handshake->entry.peer->device->index_hashtable,
 			&handshake->entry, &new_keypair->entry);
 	} else {
-		kzfree(new_keypair);
+		kfree_sensitive(new_keypair);
 	}
 	rcu_read_unlock_bh();

--- a/drivers/net/wireguard/peer.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/drivers/net/wireguard/peer.c
@@ -203,7 +203,7 @@ static void rcu_release(struct rcu_head
 	/* The final zeroing takes care of clearing any remaining handshake key
 	 * material and other potentially sensitive information.
 	 */
-	kzfree(peer);
+	kfree_sensitive(peer);
 }

 static void kref_release(struct kref *refcount)
--- a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
@@ -1369,7 +1369,7 @@ static void iwl_pcie_rx_handle_rb(struct
 					   &rxcb, rxq->id);

 		if (reclaim) {
-			kzfree(txq->entries[cmd_index].free_buf);
+			kfree_sensitive(txq->entries[cmd_index].free_buf);
 			txq->entries[cmd_index].free_buf = NULL;
 		}

--- a/drivers/net/wireless/intel/iwlwifi/pcie/tx.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/drivers/net/wireless/intel/iwlwifi/pcie/tx.c
@@ -721,8 +721,8 @@ static void iwl_pcie_txq_free(struct iwl
 	/* De-alloc array of command/tx buffers */
 	if (txq_id == trans->txqs.cmd.q_id)
 		for (i = 0; i < txq->n_window; i++) {
-			kzfree(txq->entries[i].cmd);
-			kzfree(txq->entries[i].free_buf);
+			kfree_sensitive(txq->entries[i].cmd);
+			kfree_sensitive(txq->entries[i].free_buf);
 		}

 	/* De-alloc circular buffer of TFDs */
@@ -1765,7 +1765,7 @@ static int iwl_pcie_enqueue_hcmd(struct
 	BUILD_BUG_ON(IWL_TFH_NUM_TBS > sizeof(out_meta->tbs) * BITS_PER_BYTE);
 	out_meta->flags = cmd->flags;
 	if (WARN_ON_ONCE(txq->entries[idx].free_buf))
-		kzfree(txq->entries[idx].free_buf);
+		kfree_sensitive(txq->entries[idx].free_buf);
 	txq->entries[idx].free_buf = dup_buf;

 	trace_iwlwifi_dev_hcmd(trans->dev, cmd, cmd_size, &out_cmd->hdr_wide);
--- a/drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c
@@ -1026,7 +1026,7 @@ static int iwl_pcie_gen2_enqueue_hcmd(st
 	BUILD_BUG_ON(IWL_TFH_NUM_TBS > sizeof(out_meta->tbs) * BITS_PER_BYTE);
 	out_meta->flags = cmd->flags;
 	if (WARN_ON_ONCE(txq->entries[idx].free_buf))
-		kzfree(txq->entries[idx].free_buf);
+		kfree_sensitive(txq->entries[idx].free_buf);
 	txq->entries[idx].free_buf = dup_buf;

 	trace_iwlwifi_dev_hcmd(trans->dev, cmd, cmd_size, &out_cmd->hdr_wide);
@@ -1257,8 +1257,8 @@ static void iwl_pcie_gen2_txq_free(struc
 	/* De-alloc array of command/tx buffers */
 	if (txq_id == trans->txqs.cmd.q_id)
 		for (i = 0; i < txq->n_window; i++) {
-			kzfree(txq->entries[i].cmd);
-			kzfree(txq->entries[i].free_buf);
+			kfree_sensitive(txq->entries[i].cmd);
+			kfree_sensitive(txq->entries[i].free_buf);
 		}
 	del_timer_sync(&txq->stuck_timer);

--- a/drivers/net/wireless/intersil/orinoco/wext.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/drivers/net/wireless/intersil/orinoco/wext.c
@@ -31,8 +31,8 @@ static int orinoco_set_key(struct orinoc
 			   enum orinoco_alg alg, const u8 *key, int key_len,
 			   const u8 *seq, int seq_len)
 {
-	kzfree(priv->keys[index].key);
-	kzfree(priv->keys[index].seq);
+	kfree_sensitive(priv->keys[index].key);
+	kfree_sensitive(priv->keys[index].seq);

 	if (key_len) {
 		priv->keys[index].key = kzalloc(key_len, GFP_ATOMIC);
--- a/drivers/s390/crypto/ap_bus.h~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/drivers/s390/crypto/ap_bus.h
@@ -219,8 +219,8 @@ static inline void ap_init_message(struc
  */
 static inline void ap_release_message(struct ap_message *ap_msg)
 {
-	kzfree(ap_msg->msg);
-	kzfree(ap_msg->private);
+	kfree_sensitive(ap_msg->msg);
+	kfree_sensitive(ap_msg->private);
 }

 /*
--- a/drivers/staging/ks7010/ks_hostif.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/drivers/staging/ks7010/ks_hostif.c
@@ -245,7 +245,7 @@ michael_mic(u8 *key, u8 *data, unsigned
 	ret = crypto_shash_finup(desc, data + 12, len - 12, result);

 err_free_desc:
-	kzfree(desc);
+	kfree_sensitive(desc);

 err_free_tfm:
 	crypto_free_shash(tfm);
--- a/drivers/staging/rtl8723bs/core/rtw_security.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/drivers/staging/rtl8723bs/core/rtw_security.c
@@ -2251,7 +2251,7 @@ static void gf_mulx(u8 *pad)

 static void aes_encrypt_deinit(void *ctx)
 {
-	kzfree(ctx);
+	kfree_sensitive(ctx);
 }


--- a/drivers/staging/wlan-ng/p80211netdev.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/drivers/staging/wlan-ng/p80211netdev.c
@@ -429,7 +429,7 @@ static netdev_tx_t p80211knetdev_hard_st
 failed:
 	/* Free up the WEP buffer if it's not the same as the skb */
 	if ((p80211_wep.data) && (p80211_wep.data != skb->data))
-		kzfree(p80211_wep.data);
+		kfree_sensitive(p80211_wep.data);

 	/* we always free the skb here, never in a lower level. */
 	if (!result)
--- a/drivers/target/iscsi/iscsi_target_auth.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/drivers/target/iscsi/iscsi_target_auth.c
@@ -484,7 +484,7 @@ static int chap_server_compute_hash(
 	pr_debug("[server] Sending CHAP_R=0x%s\n", response);
 	auth_ret = 0;
 out:
-	kzfree(desc);
+	kfree_sensitive(desc);
 	if (tfm)
 		crypto_free_shash(tfm);
 	kfree(initiatorchg);
--- a/fs/cifs/cifsencrypt.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/fs/cifs/cifsencrypt.c
@@ -797,7 +797,7 @@ calc_seckey(struct cifs_ses *ses)
 	ses->auth_key.len = CIFS_SESS_KEY_SIZE;

 	memzero_explicit(sec_key, CIFS_SESS_KEY_SIZE);
-	kzfree(ctx_arc4);
+	kfree_sensitive(ctx_arc4);
 	return 0;
 }

--- a/fs/cifs/connect.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/fs/cifs/connect.c
@@ -2182,7 +2182,7 @@ cifs_parse_mount_options(const char *mou
 			tmp_end++;
 			if (!(tmp_end < end && tmp_end[1] == delim)) {
 				/* No it is not. Set the password to NULL */
-				kzfree(vol->password);
+				kfree_sensitive(vol->password);
 				vol->password = NULL;
 				break;
 			}
@@ -2220,7 +2220,7 @@ cifs_parse_mount_options(const char *mou
 				options = end;
 			}

-			kzfree(vol->password);
+			kfree_sensitive(vol->password);
 			/* Now build new password string */
 			temp_len = strlen(value);
 			vol->password = kzalloc(temp_len+1, GFP_KERNEL);
@@ -3198,7 +3198,7 @@ cifs_set_cifscreds(struct smb_vol *vol,
 			rc = -ENOMEM;
 			kfree(vol->username);
 			vol->username = NULL;
-			kzfree(vol->password);
+			kfree_sensitive(vol->password);
 			vol->password = NULL;
 			goto out_key_put;
 		}
@@ -4219,7 +4219,7 @@ void
 cifs_cleanup_volume_info_contents(struct smb_vol *volume_info)
 {
 	kfree(volume_info->username);
-	kzfree(volume_info->password);
+	kfree_sensitive(volume_info->password);
 	kfree(volume_info->UNC);
 	kfree(volume_info->domainname);
 	kfree(volume_info->iocharset);
@@ -5345,7 +5345,7 @@ cifs_construct_tcon(struct cifs_sb_info

 out:
 	kfree(vol_info->username);
-	kzfree(vol_info->password);
+	kfree_sensitive(vol_info->password);
 	kfree(vol_info);

 	return tcon;
--- a/fs/cifs/dfs_cache.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/fs/cifs/dfs_cache.c
@@ -1131,7 +1131,7 @@ err_free_domainname:
 err_free_unc:
 	kfree(new->UNC);
 err_free_password:
-	kzfree(new->password);
+	kfree_sensitive(new->password);
 err_free_username:
 	kfree(new->username);
 	kfree(new);
--- a/fs/cifs/misc.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/fs/cifs/misc.c
@@ -103,12 +103,12 @@ sesInfoFree(struct cifs_ses *buf_to_free
 	kfree(buf_to_free->serverOS);
 	kfree(buf_to_free->serverDomain);
 	kfree(buf_to_free->serverNOS);
-	kzfree(buf_to_free->password);
+	kfree_sensitive(buf_to_free->password);
 	kfree(buf_to_free->user_name);
 	kfree(buf_to_free->domainName);
-	kzfree(buf_to_free->auth_key.response);
+	kfree_sensitive(buf_to_free->auth_key.response);
 	kfree(buf_to_free->iface_list);
-	kzfree(buf_to_free);
+	kfree_sensitive(buf_to_free);
 }

 struct cifs_tcon *
@@ -148,7 +148,7 @@ tconInfoFree(struct cifs_tcon *buf_to_fr
 	}
 	atomic_dec(&tconInfoAllocCount);
 	kfree(buf_to_free->nativeFileSystem);
-	kzfree(buf_to_free->password);
+	kfree_sensitive(buf_to_free->password);
 	kfree(buf_to_free->crfid.fid);
 #ifdef CONFIG_CIFS_DFS_UPCALL
 	kfree(buf_to_free->dfs_path);
--- a/fs/crypto/inline_crypt.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/fs/crypto/inline_crypt.c
@@ -16,6 +16,7 @@
 #include <linux/blkdev.h>
 #include <linux/buffer_head.h>
 #include <linux/sched/mm.h>
+#include <linux/slab.h>

 #include "fscrypt_private.h"

@@ -187,7 +188,7 @@ int fscrypt_prepare_inline_crypt_key(str
 fail:
 	for (i = 0; i < queue_refs; i++)
 		blk_put_queue(blk_key->devs[i]);
-	kzfree(blk_key);
+	kfree_sensitive(blk_key);
 	return err;
 }

@@ -201,7 +202,7 @@ void fscrypt_destroy_inline_crypt_key(st
 			blk_crypto_evict_key(blk_key->devs[i], &blk_key->base);
 			blk_put_queue(blk_key->devs[i]);
 		}
-		kzfree(blk_key);
+		kfree_sensitive(blk_key);
 	}
 }

--- a/fs/crypto/keyring.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/fs/crypto/keyring.c
@@ -51,7 +51,7 @@ static void free_master_key(struct fscry
 	}

 	key_put(mk->mk_users);
-	kzfree(mk);
+	kfree_sensitive(mk);
 }

 static inline bool valid_key_spec(const struct fscrypt_key_specifier *spec)
@@ -531,7 +531,7 @@ static int fscrypt_provisioning_key_prep
 static void fscrypt_provisioning_key_free_preparse(
 					struct key_preparsed_payload *prep)
 {
-	kzfree(prep->payload.data[0]);
+	kfree_sensitive(prep->payload.data[0]);
 }

 static void fscrypt_provisioning_key_describe(const struct key *key,
@@ -548,7 +548,7 @@ static void fscrypt_provisioning_key_des

 static void fscrypt_provisioning_key_destroy(struct key *key)
 {
-	kzfree(key->payload.data[0]);
+	kfree_sensitive(key->payload.data[0]);
 }

 static struct key_type key_type_fscrypt_provisioning = {
--- a/fs/crypto/keysetup_v1.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/fs/crypto/keysetup_v1.c
@@ -155,7 +155,7 @@ static void free_direct_key(struct fscry
 {
 	if (dk) {
 		fscrypt_destroy_prepared_key(&dk->dk_key);
-		kzfree(dk);
+		kfree_sensitive(dk);
 	}
 }

@@ -283,7 +283,7 @@ static int setup_v1_file_key_derived(str

 	err = fscrypt_set_per_file_enc_key(ci, derived_key);
 out:
-	kzfree(derived_key);
+	kfree_sensitive(derived_key);
 	return err;
 }

--- a/fs/ecryptfs/keystore.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/fs/ecryptfs/keystore.c
@@ -838,7 +838,7 @@ ecryptfs_write_tag_70_packet(char *dest,
 out_release_free_unlock:
 	crypto_free_shash(s->hash_tfm);
 out_free_unlock:
-	kzfree(s->block_aligned_filename);
+	kfree_sensitive(s->block_aligned_filename);
 out_unlock:
 	mutex_unlock(s->tfm_mutex);
 out:
@@ -847,7 +847,7 @@ out:
 		key_put(auth_tok_key);
 	}
 	skcipher_request_free(s->skcipher_req);
-	kzfree(s->hash_desc);
+	kfree_sensitive(s->hash_desc);
 	kfree(s);
 	return rc;
 }
--- a/fs/ecryptfs/messaging.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/fs/ecryptfs/messaging.c
@@ -175,7 +175,7 @@ int ecryptfs_exorcise_daemon(struct ecry
 	}
 	hlist_del(&daemon->euid_chain);
 	mutex_unlock(&daemon->mux);
-	kzfree(daemon);
+	kfree_sensitive(daemon);
 out:
 	return rc;
 }
--- a/include/crypto/aead.h~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/include/crypto/aead.h
@@ -425,7 +425,7 @@ static inline struct aead_request *aead_
  */
 static inline void aead_request_free(struct aead_request *req)
 {
-	kzfree(req);
+	kfree_sensitive(req);
 }

 /**
--- a/include/crypto/akcipher.h~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/include/crypto/akcipher.h
@@ -207,7 +207,7 @@ static inline struct akcipher_request *a
  */
 static inline void akcipher_request_free(struct akcipher_request *req)
 {
-	kzfree(req);
+	kfree_sensitive(req);
 }

 /**
--- a/include/crypto/gf128mul.h~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/include/crypto/gf128mul.h
@@ -230,7 +230,7 @@ void gf128mul_4k_bbe(be128 *a, const str
 void gf128mul_x8_ble(le128 *r, const le128 *x);
 static inline void gf128mul_free_4k(struct gf128mul_4k *t)
 {
-	kzfree(t);
+	kfree_sensitive(t);
 }


--- a/include/crypto/hash.h~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/include/crypto/hash.h
@@ -606,7 +606,7 @@ static inline struct ahash_request *ahas
  */
 static inline void ahash_request_free(struct ahash_request *req)
 {
-	kzfree(req);
+	kfree_sensitive(req);
 }

 static inline void ahash_request_zero(struct ahash_request *req)
--- a/include/crypto/internal/acompress.h~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/include/crypto/internal/acompress.h
@@ -46,7 +46,7 @@ static inline struct acomp_req *__acomp_

 static inline void __acomp_request_free(struct acomp_req *req)
 {
-	kzfree(req);
+	kfree_sensitive(req);
 }

 /**
--- a/include/crypto/kpp.h~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/include/crypto/kpp.h
@@ -187,7 +187,7 @@ static inline struct kpp_request *kpp_re
  */
 static inline void kpp_request_free(struct kpp_request *req)
 {
-	kzfree(req);
+	kfree_sensitive(req);
 }

 /**
--- a/include/crypto/skcipher.h~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/include/crypto/skcipher.h
@@ -508,7 +508,7 @@ static inline struct skcipher_request *s
  */
 static inline void skcipher_request_free(struct skcipher_request *req)
 {
-	kzfree(req);
+	kfree_sensitive(req);
 }

 static inline void skcipher_request_zero(struct skcipher_request *req)
--- a/include/linux/slab.h~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/include/linux/slab.h
@@ -186,10 +186,12 @@ void memcg_deactivate_kmem_caches(struct
  */
 void * __must_check krealloc(const void *, size_t, gfp_t);
 void kfree(const void *);
-void kzfree(const void *);
+void kfree_sensitive(const void *);
 size_t __ksize(const void *);
 size_t ksize(const void *);

+#define kzfree(x)	kfree_sensitive(x)	/* For backward compatibility */
+
 #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
 void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
 			bool to_user);
--- a/lib/mpi/mpiutil.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/lib/mpi/mpiutil.c
@@ -69,7 +69,7 @@ void mpi_free_limb_space(mpi_ptr_t a)
 	if (!a)
 		return;

-	kzfree(a);
+	kfree_sensitive(a);
 }

 void mpi_assign_limb_space(MPI a, mpi_ptr_t ap, unsigned nlimbs)
@@ -95,7 +95,7 @@ int mpi_resize(MPI a, unsigned nlimbs)
 		if (!p)
 			return -ENOMEM;
 		memcpy(p, a->d, a->alloced * sizeof(mpi_limb_t));
-		kzfree(a->d);
+		kfree_sensitive(a->d);
 		a->d = p;
 	} else {
 		a->d = kcalloc(nlimbs, sizeof(mpi_limb_t), GFP_KERNEL);
@@ -112,7 +112,7 @@ void mpi_free(MPI a)
 		return;

 	if (a->flags & 4)
-		kzfree(a->d);
+		kfree_sensitive(a->d);
 	else
 		mpi_free_limb_space(a->d);

--- a/lib/test_kasan.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/lib/test_kasan.c
@@ -766,15 +766,15 @@ static noinline void __init kmalloc_doub
 	char *ptr;
 	size_t size = 16;

-	pr_info("double-free (kzfree)\n");
+	pr_info("double-free (kfree_sensitive)\n");
 	ptr = kmalloc(size, GFP_KERNEL);
 	if (!ptr) {
 		pr_err("Allocation failed\n");
 		return;
 	}

-	kzfree(ptr);
-	kzfree(ptr);
+	kfree_sensitive(ptr);
+	kfree_sensitive(ptr);
 }

 #ifdef CONFIG_KASAN_VMALLOC
--- a/mm/slab_common.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/mm/slab_common.c
@@ -1729,17 +1729,17 @@ void *krealloc(const void *p, size_t new
 EXPORT_SYMBOL(krealloc);

 /**
- * kzfree - like kfree but zero memory
+ * kfree_sensitive - Clear sensitive information in memory before freeing
  * @p: object to free memory of
  *
  * The memory of the object @p points to is zeroed before freed.
- * If @p is %NULL, kzfree() does nothing.
+ * If @p is %NULL, kfree_sensitive() does nothing.
  *
  * Note: this function zeroes the whole allocated buffer which can be a good
  * deal bigger than the requested buffer size passed to kmalloc(). So be
  * careful when using this function in performance sensitive code.
  */
-void kzfree(const void *p)
+void kfree_sensitive(const void *p)
 {
 	size_t ks;
 	void *mem = (void *)p;
@@ -1750,7 +1750,7 @@ void kzfree(const void *p)
 	memzero_explicit(mem, ks);
 	kfree(mem);
 }
-EXPORT_SYMBOL(kzfree);
+EXPORT_SYMBOL(kfree_sensitive);

 /**
  * ksize - get the actual amount of memory allocated for a given object
--- a/net/atm/mpoa_caches.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/atm/mpoa_caches.c
@@ -180,7 +180,7 @@ static int cache_hit(in_cache_entry *ent
 static void in_cache_put(in_cache_entry *entry)
 {
 	if (refcount_dec_and_test(&entry->use)) {
-		kzfree(entry);
+		kfree_sensitive(entry);
 	}
 }

@@ -415,7 +415,7 @@ static eg_cache_entry *eg_cache_get_by_s
 static void eg_cache_put(eg_cache_entry *entry)
 {
 	if (refcount_dec_and_test(&entry->use)) {
-		kzfree(entry);
+		kfree_sensitive(entry);
 	}
 }

--- a/net/bluetooth/ecdh_helper.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/bluetooth/ecdh_helper.c
@@ -104,7 +104,7 @@ int compute_ecdh_secret(struct crypto_kp
 free_all:
 	kpp_request_free(req);
 free_tmp:
-	kzfree(tmp);
+	kfree_sensitive(tmp);
 	return err;
 }

@@ -151,9 +151,9 @@ int set_ecdh_privkey(struct crypto_kpp *
 	err = crypto_kpp_set_secret(tfm, buf, buf_len);
 	/* fall through */
 free_all:
-	kzfree(buf);
+	kfree_sensitive(buf);
 free_tmp:
-	kzfree(tmp);
+	kfree_sensitive(tmp);
 	return err;
 }

--- a/net/bluetooth/smp.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/bluetooth/smp.c
@@ -753,9 +753,9 @@ static void smp_chan_destroy(struct l2ca
 	complete = test_bit(SMP_FLAG_COMPLETE, &smp->flags);
 	mgmt_smp_complete(hcon, complete);

-	kzfree(smp->csrk);
-	kzfree(smp->slave_csrk);
-	kzfree(smp->link_key);
+	kfree_sensitive(smp->csrk);
+	kfree_sensitive(smp->slave_csrk);
+	kfree_sensitive(smp->link_key);

 	crypto_free_shash(smp->tfm_cmac);
 	crypto_free_kpp(smp->tfm_ecdh);
@@ -789,7 +789,7 @@ static void smp_chan_destroy(struct l2ca
 	}

 	chan->data = NULL;
-	kzfree(smp);
+	kfree_sensitive(smp);
 	hci_conn_drop(hcon);
 }

@@ -1156,7 +1156,7 @@ static void sc_generate_link_key(struct
 		const u8 salt[16] = { 0x31, 0x70, 0x6d, 0x74 };

 		if (smp_h7(smp->tfm_cmac, smp->tk, salt, smp->link_key)) {
-			kzfree(smp->link_key);
+			kfree_sensitive(smp->link_key);
 			smp->link_key = NULL;
 			return;
 		}
@@ -1165,14 +1165,14 @@ static void sc_generate_link_key(struct
 		const u8 tmp1[4] = { 0x31, 0x70, 0x6d, 0x74 };

 		if (smp_h6(smp->tfm_cmac, smp->tk, tmp1, smp->link_key)) {
-			kzfree(smp->link_key);
+			kfree_sensitive(smp->link_key);
 			smp->link_key = NULL;
 			return;
 		}
 	}

 	if (smp_h6(smp->tfm_cmac, smp->link_key, lebr, smp->link_key)) {
-		kzfree(smp->link_key);
+		kfree_sensitive(smp->link_key);
 		smp->link_key = NULL;
 		return;
 	}
@@ -1407,7 +1407,7 @@ static struct smp_chan *smp_chan_create(
 free_shash:
 	crypto_free_shash(smp->tfm_cmac);
 zfree_smp:
-	kzfree(smp);
+	kfree_sensitive(smp);
 	return NULL;
 }

@@ -3278,7 +3278,7 @@ static struct l2cap_chan *smp_add_cid(st
 	tfm_cmac = crypto_alloc_shash("cmac(aes)", 0, 0);
 	if (IS_ERR(tfm_cmac)) {
 		BT_ERR("Unable to create CMAC crypto context");
-		kzfree(smp);
+		kfree_sensitive(smp);
 		return ERR_CAST(tfm_cmac);
 	}

@@ -3286,7 +3286,7 @@ static struct l2cap_chan *smp_add_cid(st
 	if (IS_ERR(tfm_ecdh)) {
 		BT_ERR("Unable to create ECDH crypto context");
 		crypto_free_shash(tfm_cmac);
-		kzfree(smp);
+		kfree_sensitive(smp);
 		return ERR_CAST(tfm_ecdh);
 	}

@@ -3300,7 +3300,7 @@ create_chan:
 		if (smp) {
 			crypto_free_shash(smp->tfm_cmac);
 			crypto_free_kpp(smp->tfm_ecdh);
-			kzfree(smp);
+			kfree_sensitive(smp);
 		}
 		return ERR_PTR(-ENOMEM);
 	}
@@ -3347,7 +3347,7 @@ static void smp_del_chan(struct l2cap_ch
 		chan->data = NULL;
 		crypto_free_shash(smp->tfm_cmac);
 		crypto_free_kpp(smp->tfm_ecdh);
-		kzfree(smp);
+		kfree_sensitive(smp);
 	}

 	l2cap_chan_put(chan);
--- a/net/core/sock.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/core/sock.c
@@ -2257,7 +2257,7 @@ static inline void __sock_kfree_s(struct
 	if (WARN_ON_ONCE(!mem))
 		return;
 	if (nullify)
-		kzfree(mem);
+		kfree_sensitive(mem);
 	else
 		kfree(mem);
 	atomic_sub(size, &sk->sk_omem_alloc);
--- a/net/ipv4/tcp_fastopen.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/ipv4/tcp_fastopen.c
@@ -38,7 +38,7 @@ static void tcp_fastopen_ctx_free(struct
 	struct tcp_fastopen_context *ctx =
 	    container_of(head, struct tcp_fastopen_context, rcu);

-	kzfree(ctx);
+	kfree_sensitive(ctx);
 }

 void tcp_fastopen_destroy_cipher(struct sock *sk)
--- a/net/mac80211/aead_api.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/mac80211/aead_api.c
@@ -41,7 +41,7 @@ int aead_encrypt(struct crypto_aead *tfm
 	aead_request_set_ad(aead_req, sg[0].length);

 	crypto_aead_encrypt(aead_req);
-	kzfree(aead_req);
+	kfree_sensitive(aead_req);

 	return 0;
 }
@@ -76,7 +76,7 @@ int aead_decrypt(struct crypto_aead *tfm
 	aead_request_set_ad(aead_req, sg[0].length);

 	err = crypto_aead_decrypt(aead_req);
-	kzfree(aead_req);
+	kfree_sensitive(aead_req);

 	return err;
 }
--- a/net/mac80211/aes_gmac.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/mac80211/aes_gmac.c
@@ -60,7 +60,7 @@ int ieee80211_aes_gmac(struct crypto_aea
 	aead_request_set_ad(aead_req, GMAC_AAD_LEN + data_len);

 	crypto_aead_encrypt(aead_req);
-	kzfree(aead_req);
+	kfree_sensitive(aead_req);

 	return 0;
 }
--- a/net/mac80211/key.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/mac80211/key.c
@@ -732,7 +732,7 @@ static void ieee80211_key_free_common(st
 		ieee80211_aes_gcm_key_free(key->u.gcmp.tfm);
 		break;
 	}
-	kzfree(key);
+	kfree_sensitive(key);
 }

 static void __ieee80211_key_destroy(struct ieee80211_key *key,
--- a/net/mac802154/llsec.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/mac802154/llsec.c
@@ -49,7 +49,7 @@ void mac802154_llsec_destroy(struct mac8

 		msl = container_of(sl, struct mac802154_llsec_seclevel, level);
 		list_del(&sl->list);
-		kzfree(msl);
+		kfree_sensitive(msl);
 	}

 	list_for_each_entry_safe(dev, dn, &sec->table.devices, list) {
@@ -66,7 +66,7 @@ void mac802154_llsec_destroy(struct mac8
 		mkey = container_of(key->key, struct mac802154_llsec_key, key);
 		list_del(&key->list);
 		llsec_key_put(mkey);
-		kzfree(key);
+		kfree_sensitive(key);
 	}
 }

@@ -155,7 +155,7 @@ err_tfm:
 		if (key->tfm[i])
 			crypto_free_aead(key->tfm[i]);

-	kzfree(key);
+	kfree_sensitive(key);
 	return NULL;
 }

@@ -170,7 +170,7 @@ static void llsec_key_release(struct kre
 		crypto_free_aead(key->tfm[i]);

 	crypto_free_sync_skcipher(key->tfm0);
-	kzfree(key);
+	kfree_sensitive(key);
 }

 static struct mac802154_llsec_key*
@@ -261,7 +261,7 @@ int mac802154_llsec_key_add(struct mac80
 	return 0;

 fail:
-	kzfree(new);
+	kfree_sensitive(new);
 	return -ENOMEM;
 }

@@ -341,10 +341,10 @@ static void llsec_dev_free(struct mac802
 				      devkey);

 		list_del(&pos->list);
-		kzfree(devkey);
+		kfree_sensitive(devkey);
 	}

-	kzfree(dev);
+	kfree_sensitive(dev);
 }

 int mac802154_llsec_dev_add(struct mac802154_llsec *sec,
@@ -682,7 +682,7 @@ llsec_do_encrypt_auth(struct sk_buff *sk

 	rc = crypto_aead_encrypt(req);

-	kzfree(req);
+	kfree_sensitive(req);

 	return rc;
 }
@@ -886,7 +886,7 @@ llsec_do_decrypt_auth(struct sk_buff *sk

 	rc = crypto_aead_decrypt(req);

-	kzfree(req);
+	kfree_sensitive(req);
 	skb_trim(skb, skb->len - authlen);

 	return rc;
@@ -926,7 +926,7 @@ llsec_update_devkey_record(struct mac802
 	if (!devkey)
 		list_add_rcu(&next->devkey.list, &dev->dev.keys);
 	else
-		kzfree(next);
+		kfree_sensitive(next);

 	spin_unlock_bh(&dev->lock);
 }
--- a/net/sctp/auth.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/sctp/auth.c
@@ -49,7 +49,7 @@ void sctp_auth_key_put(struct sctp_auth_
 		return;

 	if (refcount_dec_and_test(&key->refcnt)) {
-		kzfree(key);
+		kfree_sensitive(key);
 		SCTP_DBG_OBJCNT_DEC(keys);
 	}
 }
--- a/net/sunrpc/auth_gss/gss_krb5_crypto.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/sunrpc/auth_gss/gss_krb5_crypto.c
@@ -1003,7 +1003,7 @@ krb5_rc4_setup_seq_key(struct krb5_ctx *
 	err = 0;

 out_err:
-	kzfree(desc);
+	kfree_sensitive(desc);
 	crypto_free_shash(hmac);
 	dprintk("%s: returning %d\n", __func__, err);
 	return err;
@@ -1079,7 +1079,7 @@ krb5_rc4_setup_enc_key(struct krb5_ctx *
 	err = 0;

 out_err:
-	kzfree(desc);
+	kfree_sensitive(desc);
 	crypto_free_shash(hmac);
 	dprintk("%s: returning %d\n", __func__, err);
 	return err;
--- a/net/sunrpc/auth_gss/gss_krb5_keys.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/sunrpc/auth_gss/gss_krb5_keys.c
@@ -228,11 +228,11 @@ u32 krb5_derive_key(const struct gss_krb
 	ret = 0;

 err_free_raw:
-	kzfree(rawkey);
+	kfree_sensitive(rawkey);
 err_free_out:
-	kzfree(outblockdata);
+	kfree_sensitive(outblockdata);
 err_free_in:
-	kzfree(inblockdata);
+	kfree_sensitive(inblockdata);
 err_free_cipher:
 	crypto_free_sync_skcipher(cipher);
 err_return:
--- a/net/sunrpc/auth_gss/gss_krb5_mech.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/sunrpc/auth_gss/gss_krb5_mech.c
@@ -443,7 +443,7 @@ context_derive_keys_rc4(struct krb5_ctx
 	desc->tfm = hmac;

 	err = crypto_shash_digest(desc, sigkeyconstant, slen, ctx->cksum);
-	kzfree(desc);
+	kfree_sensitive(desc);
 	if (err)
 		goto out_err_free_hmac;
 	/*
--- a/net/tipc/crypto.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/tipc/crypto.c
@@ -441,7 +441,7 @@ static int tipc_aead_init(struct tipc_ae
 	/* Allocate per-cpu TFM entry pointer */
 	tmp->tfm_entry = alloc_percpu(struct tipc_tfm *);
 	if (!tmp->tfm_entry) {
-		kzfree(tmp);
+		kfree_sensitive(tmp);
 		return -ENOMEM;
 	}

@@ -491,7 +491,7 @@ static int tipc_aead_init(struct tipc_ae
 	/* Not any TFM is allocated? */
 	if (!tfm_cnt) {
 		free_percpu(tmp->tfm_entry);
-		kzfree(tmp);
+		kfree_sensitive(tmp);
 		return err;
 	}

@@ -545,7 +545,7 @@ static int tipc_aead_clone(struct tipc_a

 	aead->tfm_entry = alloc_percpu_gfp(struct tipc_tfm *, GFP_ATOMIC);
 	if (unlikely(!aead->tfm_entry)) {
-		kzfree(aead);
+		kfree_sensitive(aead);
 		return -ENOMEM;
 	}

@@ -1352,7 +1352,7 @@ int tipc_crypto_start(struct tipc_crypto
 	/* Allocate statistic structure */
 	c->stats = alloc_percpu_gfp(struct tipc_crypto_stats, GFP_ATOMIC);
 	if (!c->stats) {
-		kzfree(c);
+		kfree_sensitive(c);
 		return -ENOMEM;
 	}

@@ -1408,7 +1408,7 @@ void tipc_crypto_stop(struct tipc_crypto
 	free_percpu(c->stats);

 	*crypto = NULL;
-	kzfree(c);
+	kfree_sensitive(c);
 }

 void tipc_crypto_timeout(struct tipc_crypto *rx)
--- a/net/wireless/core.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/wireless/core.c
@@ -1125,7 +1125,7 @@ static void __cfg80211_unregister_wdev(s
 	}

 #ifdef CONFIG_CFG80211_WEXT
-	kzfree(wdev->wext.keys);
+	kfree_sensitive(wdev->wext.keys);
 	wdev->wext.keys = NULL;
 #endif
 	/* only initialized if we have a netdev */
--- a/net/wireless/ibss.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/wireless/ibss.c
@@ -127,7 +127,7 @@ int __cfg80211_join_ibss(struct cfg80211
 		return -EINVAL;

 	if (WARN_ON(wdev->connect_keys))
-		kzfree(wdev->connect_keys);
+		kfree_sensitive(wdev->connect_keys);
 	wdev->connect_keys = connkeys;

 	wdev->ibss_fixed = params->channel_fixed;
@@ -161,7 +161,7 @@ static void __cfg80211_clear_ibss(struct

 	ASSERT_WDEV_LOCK(wdev);

-	kzfree(wdev->connect_keys);
+	kfree_sensitive(wdev->connect_keys);
 	wdev->connect_keys = NULL;

 	rdev_set_qos_map(rdev, dev, NULL);
--- a/net/wireless/lib80211_crypt_tkip.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/wireless/lib80211_crypt_tkip.c
@@ -131,7 +131,7 @@ static void lib80211_tkip_deinit(void *p
 		crypto_free_shash(_priv->tx_tfm_michael);
 		crypto_free_shash(_priv->rx_tfm_michael);
 	}
-	kzfree(priv);
+	kfree_sensitive(priv);
 }

 static inline u16 RotR1(u16 val)
--- a/net/wireless/lib80211_crypt_wep.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/wireless/lib80211_crypt_wep.c
@@ -56,7 +56,7 @@ static void *lib80211_wep_init(int keyid

 static void lib80211_wep_deinit(void *priv)
 {
-	kzfree(priv);
+	kfree_sensitive(priv);
 }

 /* Add WEP IV/key info to a frame that has at least 4 bytes of headroom */
--- a/net/wireless/nl80211.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/wireless/nl80211.c
@@ -9836,7 +9836,7 @@ static int nl80211_join_ibss(struct sk_b

 		if ((ibss.chandef.width != NL80211_CHAN_WIDTH_20_NOHT) &&
 		    no_ht) {
-			kzfree(connkeys);
+			kfree_sensitive(connkeys);
 			return -EINVAL;
 		}
 	}
@@ -9848,7 +9848,7 @@ static int nl80211_join_ibss(struct sk_b
 		int r = validate_pae_over_nl80211(rdev, info);

 		if (r < 0) {
-			kzfree(connkeys);
+			kfree_sensitive(connkeys);
 			return r;
 		}

@@ -9861,7 +9861,7 @@ static int nl80211_join_ibss(struct sk_b
 	wdev_lock(dev->ieee80211_ptr);
 	err = __cfg80211_join_ibss(rdev, dev, &ibss, connkeys);
 	if (err)
-		kzfree(connkeys);
+		kfree_sensitive(connkeys);
 	else if (info->attrs[NL80211_ATTR_SOCKET_OWNER])
 		dev->ieee80211_ptr->conn_owner_nlportid = info->snd_portid;
 	wdev_unlock(dev->ieee80211_ptr);
@@ -10289,7 +10289,7 @@ static int nl80211_connect(struct sk_buf

 	if (info->attrs[NL80211_ATTR_HT_CAPABILITY]) {
 		if (!info->attrs[NL80211_ATTR_HT_CAPABILITY_MASK]) {
-			kzfree(connkeys);
+			kfree_sensitive(connkeys);
 			return -EINVAL;
 		}
 		memcpy(&connect.ht_capa,
@@ -10307,7 +10307,7 @@ static int nl80211_connect(struct sk_buf

 	if (info->attrs[NL80211_ATTR_VHT_CAPABILITY]) {
 		if (!info->attrs[NL80211_ATTR_VHT_CAPABILITY_MASK]) {
-			kzfree(connkeys);
+			kfree_sensitive(connkeys);
 			return -EINVAL;
 		}
 		memcpy(&connect.vht_capa,
@@ -10321,7 +10321,7 @@ static int nl80211_connect(struct sk_buf
 		       (rdev->wiphy.features & NL80211_FEATURE_QUIET)) &&
 		    !wiphy_ext_feature_isset(&rdev->wiphy,
 					     NL80211_EXT_FEATURE_RRM)) {
-			kzfree(connkeys);
+			kfree_sensitive(connkeys);
 			return -EINVAL;
 		}
 		connect.flags |= ASSOC_REQ_USE_RRM;
@@ -10329,21 +10329,21 @@ static int nl80211_connect(struct sk_buf

 	connect.pbss = nla_get_flag(info->attrs[NL80211_ATTR_PBSS]);
 	if (connect.pbss && !rdev->wiphy.bands[NL80211_BAND_60GHZ]) {
-		kzfree(connkeys);
+		kfree_sensitive(connkeys);
 		return -EOPNOTSUPP;
 	}

 	if (info->attrs[NL80211_ATTR_BSS_SELECT]) {
 		/* bss selection makes no sense if bssid is set */
 		if (connect.bssid) {
-			kzfree(connkeys);
+			kfree_sensitive(connkeys);
 			return -EINVAL;
 		}

 		err = parse_bss_select(info->attrs[NL80211_ATTR_BSS_SELECT],
 				       wiphy, &connect.bss_select);
 		if (err) {
-			kzfree(connkeys);
+			kfree_sensitive(connkeys);
 			return err;
 		}
 	}
@@ -10373,13 +10373,13 @@ static int nl80211_connect(struct sk_buf
 		   info->attrs[NL80211_ATTR_FILS_ERP_REALM] ||
 		   info->attrs[NL80211_ATTR_FILS_ERP_NEXT_SEQ_NUM] ||
 		   info->attrs[NL80211_ATTR_FILS_ERP_RRK]) {
-		kzfree(connkeys);
+		kfree_sensitive(connkeys);
 		return -EINVAL;
 	}

 	if (nla_get_flag(info->attrs[NL80211_ATTR_EXTERNAL_AUTH_SUPPORT])) {
 		if (!info->attrs[NL80211_ATTR_SOCKET_OWNER]) {
-			kzfree(connkeys);
+			kfree_sensitive(connkeys);
 			GENL_SET_ERR_MSG(info,
 					 "external auth requires connection ownership");
 			return -EINVAL;
@@ -10392,7 +10392,7 @@ static int nl80211_connect(struct sk_buf
 	err = cfg80211_connect(rdev, dev, &connect, connkeys,
 			       connect.prev_bssid);
 	if (err)
-		kzfree(connkeys);
+		kfree_sensitive(connkeys);

 	if (!err && info->attrs[NL80211_ATTR_SOCKET_OWNER]) {
 		dev->ieee80211_ptr->conn_owner_nlportid = info->snd_portid;
--- a/net/wireless/sme.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/wireless/sme.c
@@ -742,7 +742,7 @@ void __cfg80211_connect_result(struct ne
 	}

 	if (cr->status != WLAN_STATUS_SUCCESS) {
-		kzfree(wdev->connect_keys);
+		kfree_sensitive(wdev->connect_keys);
 		wdev->connect_keys = NULL;
 		wdev->ssid_len = 0;
 		wdev->conn_owner_nlportid = 0;
@@ -1098,7 +1098,7 @@ void __cfg80211_disconnected(struct net_
 	wdev->current_bss = NULL;
 	wdev->ssid_len = 0;
 	wdev->conn_owner_nlportid = 0;
-	kzfree(wdev->connect_keys);
+	kfree_sensitive(wdev->connect_keys);
 	wdev->connect_keys = NULL;

 	nl80211_send_disconnected(rdev, dev, reason, ie, ie_len, from_ap);
@@ -1281,7 +1281,7 @@ int cfg80211_disconnect(struct cfg80211_

 	ASSERT_WDEV_LOCK(wdev);

-	kzfree(wdev->connect_keys);
+	kfree_sensitive(wdev->connect_keys);
 	wdev->connect_keys = NULL;
 	wdev->conn_owner_nlportid = 0;

--- a/net/wireless/util.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/wireless/util.c
@@ -871,7 +871,7 @@ void cfg80211_upload_connect_keys(struct
 		}
 	}

-	kzfree(wdev->connect_keys);
+	kfree_sensitive(wdev->connect_keys);
 	wdev->connect_keys = NULL;
 }

--- a/net/wireless/wext-sme.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/net/wireless/wext-sme.c
@@ -57,7 +57,7 @@ int cfg80211_mgd_wext_connect(struct cfg
 	err = cfg80211_connect(rdev, wdev->netdev,
 			       &wdev->wext.connect, ck, prev_bssid);
 	if (err)
-		kzfree(ck);
+		kfree_sensitive(ck);

 	return err;
 }
--- a/scripts/coccinelle/free/devm_free.cocci~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/scripts/coccinelle/free/devm_free.cocci
@@ -89,7 +89,7 @@ position p;
 (
  kfree@p(x)
 |
- kzfree@p(x)
+ kfree_sensitive@p(x)
 |
  krealloc@p(x, ...)
 |
@@ -112,7 +112,7 @@ position p != safe.p;
 (
 * kfree@p(x)
 |
-* kzfree@p(x)
+* kfree_sensitive@p(x)
 |
 * krealloc@p(x, ...)
 |
--- a/scripts/coccinelle/free/ifnullfree.cocci~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/scripts/coccinelle/free/ifnullfree.cocci
@@ -21,7 +21,7 @@ expression E;
 (
   kfree(E);
 |
-  kzfree(E);
+  kfree_sensitive(E);
 |
   debugfs_remove(E);
 |
@@ -42,7 +42,7 @@ position p;
 @@

 * if (E != NULL)
-*	\(kfree@p\|kzfree@p\|debugfs_remove@p\|debugfs_remove_recursive@p\|
+*	\(kfree@p\|kfree_sensitive@p\|debugfs_remove@p\|debugfs_remove_recursive@p\|
 *         usb_free_urb@p\|kmem_cache_destroy@p\|mempool_destroy@p\|
 *         dma_pool_destroy@p\)(E);

--- a/scripts/coccinelle/free/kfreeaddr.cocci~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/scripts/coccinelle/free/kfreeaddr.cocci
@@ -20,7 +20,7 @@ position p;
 (
 * kfree@p(&e->f)
 |
-* kzfree@p(&e->f)
+* kfree_sensitive@p(&e->f)
 )

 @script:python depends on org@
--- a/scripts/coccinelle/free/kfree.cocci~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/scripts/coccinelle/free/kfree.cocci
@@ -24,7 +24,7 @@ position p1;
 (
 * kfree@p1(E)
 |
-* kzfree@p1(E)
+* kfree_sensitive@p1(E)
 )

 @print expression@
@@ -68,7 +68,7 @@ while (1) { ...
 (
 * kfree@ok(E)
 |
-* kzfree@ok(E)
+* kfree_sensitive@ok(E)
 )
   ... when != break;
       when != goto l;
@@ -86,7 +86,7 @@ position free.p1!=loop.ok,p2!={print.p,s
 (
 * kfree@p1(E,...)
 |
-* kzfree@p1(E,...)
+* kfree_sensitive@p1(E,...)
 )
 ...
 (
--- a/security/apparmor/domain.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/security/apparmor/domain.c
@@ -40,8 +40,8 @@ void aa_free_domain_entries(struct aa_do
 			return;

 		for (i = 0; i < domain->size; i++)
-			kzfree(domain->table[i]);
-		kzfree(domain->table);
+			kfree_sensitive(domain->table[i]);
+		kfree_sensitive(domain->table);
 		domain->table = NULL;
 	}
 }
--- a/security/apparmor/include/file.h~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/security/apparmor/include/file.h
@@ -72,7 +72,7 @@ static inline void aa_free_file_ctx(stru
 {
 	if (ctx) {
 		aa_put_label(rcu_access_pointer(ctx->label));
-		kzfree(ctx);
+		kfree_sensitive(ctx);
 	}
 }

--- a/security/apparmor/policy.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/security/apparmor/policy.c
@@ -187,9 +187,9 @@ static void aa_free_data(void *ptr, void
 {
 	struct aa_data *data = ptr;

-	kzfree(data->data);
-	kzfree(data->key);
-	kzfree(data);
+	kfree_sensitive(data->data);
+	kfree_sensitive(data->key);
+	kfree_sensitive(data);
 }

 /**
@@ -217,19 +217,19 @@ void aa_free_profile(struct aa_profile *
 	aa_put_profile(rcu_access_pointer(profile->parent));

 	aa_put_ns(profile->ns);
-	kzfree(profile->rename);
+	kfree_sensitive(profile->rename);

 	aa_free_file_rules(&profile->file);
 	aa_free_cap_rules(&profile->caps);
 	aa_free_rlimit_rules(&profile->rlimits);

 	for (i = 0; i < profile->xattr_count; i++)
-		kzfree(profile->xattrs[i]);
-	kzfree(profile->xattrs);
+		kfree_sensitive(profile->xattrs[i]);
+	kfree_sensitive(profile->xattrs);
 	for (i = 0; i < profile->secmark_count; i++)
-		kzfree(profile->secmark[i].label);
-	kzfree(profile->secmark);
-	kzfree(profile->dirname);
+		kfree_sensitive(profile->secmark[i].label);
+	kfree_sensitive(profile->secmark);
+	kfree_sensitive(profile->dirname);
 	aa_put_dfa(profile->xmatch);
 	aa_put_dfa(profile->policy.dfa);

@@ -237,14 +237,14 @@ void aa_free_profile(struct aa_profile *
 		rht = profile->data;
 		profile->data = NULL;
 		rhashtable_free_and_destroy(rht, aa_free_data, NULL);
-		kzfree(rht);
+		kfree_sensitive(rht);
 	}

-	kzfree(profile->hash);
+	kfree_sensitive(profile->hash);
 	aa_put_loaddata(profile->rawdata);
 	aa_label_destroy(&profile->label);

-	kzfree(profile);
+	kfree_sensitive(profile);
 }

 /**
--- a/security/apparmor/policy_ns.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/security/apparmor/policy_ns.c
@@ -121,9 +121,9 @@ static struct aa_ns *alloc_ns(const char
 	return ns;

 fail_unconfined:
-	kzfree(ns->base.hname);
+	kfree_sensitive(ns->base.hname);
 fail_ns:
-	kzfree(ns);
+	kfree_sensitive(ns);
 	return NULL;
 }

@@ -145,7 +145,7 @@ void aa_free_ns(struct aa_ns *ns)

 	ns->unconfined->ns = NULL;
 	aa_free_profile(ns->unconfined);
-	kzfree(ns);
+	kfree_sensitive(ns);
 }

 /**
--- a/security/apparmor/policy_unpack.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/security/apparmor/policy_unpack.c
@@ -163,10 +163,10 @@ static void do_loaddata_free(struct work
 		aa_put_ns(ns);
 	}

-	kzfree(d->hash);
-	kzfree(d->name);
+	kfree_sensitive(d->hash);
+	kfree_sensitive(d->name);
 	kvfree(d->data);
-	kzfree(d);
+	kfree_sensitive(d);
 }

 void aa_loaddata_kref(struct kref *kref)
@@ -894,7 +894,7 @@ static struct aa_profile *unpack_profile
 		while (unpack_strdup(e, &key, NULL)) {
 			data = kzalloc(sizeof(*data), GFP_KERNEL);
 			if (!data) {
-				kzfree(key);
+				kfree_sensitive(key);
 				goto fail;
 			}

@@ -902,8 +902,8 @@ static struct aa_profile *unpack_profile
 			data->size = unpack_blob(e, &data->data, NULL);
 			data->data = kvmemdup(data->data, data->size);
 			if (data->size && !data->data) {
-				kzfree(data->key);
-				kzfree(data);
+				kfree_sensitive(data->key);
+				kfree_sensitive(data);
 				goto fail;
 			}

@@ -1037,7 +1037,7 @@ void aa_load_ent_free(struct aa_load_ent
 		aa_put_profile(ent->old);
 		aa_put_profile(ent->new);
 		kfree(ent->ns_name);
-		kzfree(ent);
+		kfree_sensitive(ent);
 	}
 }

--- a/security/keys/big_key.c~mm-treewide-rename-kzfree-to-kfree_sensitive
+++ a/security/keys/big_key.c
@@ -138,7 +138,7 @@ int big_key_preparse(struct key_preparse
 err_fput:
 	fput(file);
 err_enckey:
-	kzfree(enckey);
+
kfree_sensitive(enckey); error: memzero_explicit(buf, enclen); kvfree(buf); @@ -155,7 +155,7 @@ void big_key_free_preparse(struct key_pr path_put(path); } - kzfree(prep->payload.data[big_key_data]); + kfree_sensitive(prep->payload.data[big_key_data]); } /* @@ -187,7 +187,7 @@ void big_key_destroy(struct key *key) path->mnt = NULL; path->dentry = NULL; } - kzfree(key->payload.data[big_key_data]); + kfree_sensitive(key->payload.data[big_key_data]); key->payload.data[big_key_data] = NULL; } --- a/security/keys/dh.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/security/keys/dh.c @@ -58,9 +58,9 @@ error: static void dh_free_data(struct dh *dh) { - kzfree(dh->key); - kzfree(dh->p); - kzfree(dh->g); + kfree_sensitive(dh->key); + kfree_sensitive(dh->p); + kfree_sensitive(dh->g); } struct dh_completion { @@ -126,7 +126,7 @@ static void kdf_dealloc(struct kdf_sdesc if (sdesc->shash.tfm) crypto_free_shash(sdesc->shash.tfm); - kzfree(sdesc); + kfree_sensitive(sdesc); } /* @@ -220,7 +220,7 @@ static int keyctl_dh_compute_kdf(struct ret = -EFAULT; err: - kzfree(outbuf); + kfree_sensitive(outbuf); return ret; } @@ -395,11 +395,11 @@ long __keyctl_dh_compute(struct keyctl_d out6: kpp_request_free(req); out5: - kzfree(outbuf); + kfree_sensitive(outbuf); out4: crypto_free_kpp(tfm); out3: - kzfree(secret); + kfree_sensitive(secret); out2: dh_free_data(&dh_inputs); out1: --- a/security/keys/encrypted-keys/encrypted.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/security/keys/encrypted-keys/encrypted.c @@ -370,7 +370,7 @@ static int get_derived_key(u8 *derived_k master_keylen); ret = crypto_shash_tfm_digest(hash_tfm, derived_buf, derived_buf_len, derived_key); - kzfree(derived_buf); + kfree_sensitive(derived_buf); return ret; } @@ -812,13 +812,13 @@ static int encrypted_instantiate(struct ret = encrypted_init(epayload, key->description, format, master_desc, decrypted_datalen, hex_encoded_iv); if (ret < 0) { - kzfree(epayload); + kfree_sensitive(epayload); goto out; } 
rcu_assign_keypointer(key, epayload); out: - kzfree(datablob); + kfree_sensitive(datablob); return ret; } @@ -827,7 +827,7 @@ static void encrypted_rcu_free(struct rc struct encrypted_key_payload *epayload; epayload = container_of(rcu, struct encrypted_key_payload, rcu); - kzfree(epayload); + kfree_sensitive(epayload); } /* @@ -885,7 +885,7 @@ static int encrypted_update(struct key * rcu_assign_keypointer(key, new_epayload); call_rcu(&epayload->rcu, encrypted_rcu_free); out: - kzfree(buf); + kfree_sensitive(buf); return ret; } @@ -946,7 +946,7 @@ static long encrypted_read(const struct memzero_explicit(derived_key, sizeof(derived_key)); memcpy(buffer, ascii_buf, asciiblob_len); - kzfree(ascii_buf); + kfree_sensitive(ascii_buf); return asciiblob_len; out: @@ -961,7 +961,7 @@ out: */ static void encrypted_destroy(struct key *key) { - kzfree(key->payload.data[0]); + kfree_sensitive(key->payload.data[0]); } struct key_type key_type_encrypted = { --- a/security/keys/trusted-keys/trusted_tpm1.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/security/keys/trusted-keys/trusted_tpm1.c @@ -68,7 +68,7 @@ static int TSS_sha1(const unsigned char } ret = crypto_shash_digest(&sdesc->shash, data, datalen, digest); - kzfree(sdesc); + kfree_sensitive(sdesc); return ret; } @@ -112,7 +112,7 @@ static int TSS_rawhmac(unsigned char *di if (!ret) ret = crypto_shash_final(&sdesc->shash, digest); out: - kzfree(sdesc); + kfree_sensitive(sdesc); return ret; } @@ -166,7 +166,7 @@ int TSS_authhmac(unsigned char *digest, paramdigest, TPM_NONCE_SIZE, h1, TPM_NONCE_SIZE, h2, 1, &c, 0, 0); out: - kzfree(sdesc); + kfree_sensitive(sdesc); return ret; } EXPORT_SYMBOL_GPL(TSS_authhmac); @@ -251,7 +251,7 @@ int TSS_checkhmac1(unsigned char *buffer if (memcmp(testhmac, authdata, SHA1_DIGEST_SIZE)) ret = -EINVAL; out: - kzfree(sdesc); + kfree_sensitive(sdesc); return ret; } EXPORT_SYMBOL_GPL(TSS_checkhmac1); @@ -353,7 +353,7 @@ static int TSS_checkhmac2(unsigned char if (memcmp(testhmac2, authdata2, 
SHA1_DIGEST_SIZE)) ret = -EINVAL; out: - kzfree(sdesc); + kfree_sensitive(sdesc); return ret; } @@ -563,7 +563,7 @@ static int tpm_seal(struct tpm_buf *tb, *bloblen = storedsize; } out: - kzfree(td); + kfree_sensitive(td); return ret; } @@ -1031,12 +1031,12 @@ static int trusted_instantiate(struct ke if (!ret && options->pcrlock) ret = pcrlock(options->pcrlock); out: - kzfree(datablob); - kzfree(options); + kfree_sensitive(datablob); + kfree_sensitive(options); if (!ret) rcu_assign_keypointer(key, payload); else - kzfree(payload); + kfree_sensitive(payload); return ret; } @@ -1045,7 +1045,7 @@ static void trusted_rcu_free(struct rcu_ struct trusted_key_payload *p; p = container_of(rcu, struct trusted_key_payload, rcu); - kzfree(p); + kfree_sensitive(p); } /* @@ -1087,13 +1087,13 @@ static int trusted_update(struct key *ke ret = datablob_parse(datablob, new_p, new_o); if (ret != Opt_update) { ret = -EINVAL; - kzfree(new_p); + kfree_sensitive(new_p); goto out; } if (!new_o->keyhandle) { ret = -EINVAL; - kzfree(new_p); + kfree_sensitive(new_p); goto out; } @@ -1107,22 +1107,22 @@ static int trusted_update(struct key *ke ret = key_seal(new_p, new_o); if (ret < 0) { pr_info("trusted_key: key_seal failed (%d)\n", ret); - kzfree(new_p); + kfree_sensitive(new_p); goto out; } if (new_o->pcrlock) { ret = pcrlock(new_o->pcrlock); if (ret < 0) { pr_info("trusted_key: pcrlock failed (%d)\n", ret); - kzfree(new_p); + kfree_sensitive(new_p); goto out; } } rcu_assign_keypointer(key, new_p); call_rcu(&p->rcu, trusted_rcu_free); out: - kzfree(datablob); - kzfree(new_o); + kfree_sensitive(datablob); + kfree_sensitive(new_o); return ret; } @@ -1154,7 +1154,7 @@ static long trusted_read(const struct ke */ static void trusted_destroy(struct key *key) { - kzfree(key->payload.data[0]); + kfree_sensitive(key->payload.data[0]); } struct key_type key_type_trusted = { --- a/security/keys/user_defined.c~mm-treewide-rename-kzfree-to-kfree_sensitive +++ a/security/keys/user_defined.c @@ -82,7 
+82,7 @@ EXPORT_SYMBOL_GPL(user_preparse); */ void user_free_preparse(struct key_preparsed_payload *prep) { - kzfree(prep->payload.data[0]); + kfree_sensitive(prep->payload.data[0]); } EXPORT_SYMBOL_GPL(user_free_preparse); @@ -91,7 +91,7 @@ static void user_free_payload_rcu(struct struct user_key_payload *payload; payload = container_of(head, struct user_key_payload, rcu); - kzfree(payload); + kfree_sensitive(payload); } /* @@ -147,7 +147,7 @@ void user_destroy(struct key *key) { struct user_key_payload *upayload = key->payload.data[0]; - kzfree(upayload); + kfree_sensitive(upayload); } EXPORT_SYMBOL_GPL(user_destroy); _
From: William Kucharski <william.kucharski@oracle.com> Subject: mm: ksize() should silently accept a NULL pointer Other mm routines such as kfree() and kzfree() silently do the right thing if passed a NULL pointer, so ksize() should do the same. Link: http://lkml.kernel.org/r/20200616225409.4670-1-william.kucharski@oracle.com Signed-off-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/slab_common.c | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-) --- a/mm/slab_common.c~mm-ksize-should-silently-accept-a-null-pointer +++ a/mm/slab_common.c @@ -1681,10 +1681,9 @@ static __always_inline void *__do_kreall gfp_t flags) { void *ret; - size_t ks = 0; + size_t ks; - if (p) - ks = ksize(p); + ks = ksize(p); if (ks >= new_size) { p = kasan_krealloc((void *)p, new_size, flags); @@ -1744,10 +1743,9 @@ void kfree_sensitive(const void *p) size_t ks; void *mem = (void *)p; - if (unlikely(ZERO_OR_NULL_PTR(mem))) - return; ks = ksize(mem); - memzero_explicit(mem, ks); + if (ks) + memzero_explicit(mem, ks); kfree(mem); } EXPORT_SYMBOL(kfree_sensitive); @@ -1770,8 +1768,6 @@ size_t ksize(const void *objp) { size_t size; - if (WARN_ON_ONCE(!objp)) - return 0; /* * We need to check that the pointed to object is valid, and only then * unpoison the shadow memory below. We use __kasan_check_read(), to @@ -1785,7 +1781,7 @@ size_t ksize(const void *objp) * We want to perform the check before __ksize(), to avoid potentially * crashing in __ksize() due to accessing invalid metadata. 
 	 */
-	if (unlikely(objp == ZERO_SIZE_PTR) || !__kasan_check_read(objp, 1))
+	if (unlikely(ZERO_OR_NULL_PTR(objp)) || !__kasan_check_read(objp, 1))
 		return 0;
 
 	size = __ksize(objp);
_
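The effect of folding the NULL case into the existing ZERO_SIZE_PTR check can be illustrated outside the kernel. This is a minimal userspace sketch, not the kernel code: the two macros mirror the definitions in include/linux/slab.h (with uintptr_t in place of unsigned long), while sketch_ksize() and fake_usable_size() are hypothetical stand-ins for ksize() and __ksize().

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors include/linux/slab.h; uintptr_t replaces unsigned long here. */
#define ZERO_SIZE_PTR ((void *)16)
#define ZERO_OR_NULL_PTR(x) ((uintptr_t)(x) <= (uintptr_t)ZERO_SIZE_PTR)

/* Hypothetical stand-in for __ksize(): every real object is 32 bytes. */
static size_t fake_usable_size(const void *objp)
{
	(void)objp;
	return 32;
}

/*
 * Hypothetical model of ksize() after this patch: NULL and ZERO_SIZE_PTR
 * both yield 0 without a warning, so callers need no NULL check of their own.
 */
size_t sketch_ksize(const void *objp)
{
	if (ZERO_OR_NULL_PTR(objp))
		return 0;
	return fake_usable_size(objp);
}
```

With this shape, a caller such as kfree_sensitive() only has to test the returned size before zeroing, which is what the hunk above does with "if (ks)".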
From: Kees Cook <keescook@chromium.org> Subject: mm/slab: expand CONFIG_SLAB_FREELIST_HARDENED to include SLAB Patch series "mm: Expand CONFIG_SLAB_FREELIST_HARDENED to include SLAB" In reviewing Vlastimil Babka's latest slub debug series, I realized[1] that several checks under CONFIG_SLAB_FREELIST_HARDENED weren't being applied to SLAB. Fix this by expanding the Kconfig coverage, and adding a simple double-free test for SLAB. This patch (of 2): Include SLAB caches when performing kmem_cache pointer verification. A defense against such corruption[1] should be applied to all the allocators. With this added, the "SLAB_FREE_CROSS" and "SLAB_FREE_PAGE" LKDTM tests now pass on SLAB: lkdtm: Performing direct entry SLAB_FREE_CROSS lkdtm: Attempting cross-cache slab free ... ------------[ cut here ]------------ cache_from_obj: Wrong slab cache. lkdtm-heap-b but object is from lkdtm-heap-a WARNING: CPU: 2 PID: 2195 at mm/slab.h:530 kmem_cache_free+0x8d/0x1d0 ... lkdtm: Performing direct entry SLAB_FREE_PAGE lkdtm: Attempting non-Slab slab free ... ------------[ cut here ]------------ virt_to_cache: Object is not a Slab page! WARNING: CPU: 1 PID: 2202 at mm/slab.h:489 kmem_cache_free+0x196/0x1d0 Additionally clean up neighboring Kconfig entries for clarity, readability, and redundant option removal. 
[1] https://github.com/ThomasKing2014/slides/raw/master/Building%20universal%20Android%20rooting%20with%20a%20type%20confusion%20vulnerability.pdf Link: http://lkml.kernel.org/r/20200625215548.389774-1-keescook@chromium.org Link: http://lkml.kernel.org/r/20200625215548.389774-2-keescook@chromium.org Fixes: 598a0717a816 ("mm/slab: validate cache membership under freelist hardening") Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Popov <alex.popov@linux.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Jann Horn <jannh@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Garrett <mjg59@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- init/Kconfig | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) --- a/init/Kconfig~mm-expand-config_slab_freelist_hardened-to-include-slab +++ a/init/Kconfig @@ -1913,9 +1913,8 @@ config SLAB_MERGE_DEFAULT command line. config SLAB_FREELIST_RANDOM - default n + bool "Randomize slab freelist" depends on SLAB || SLUB - bool "SLAB freelist randomization" help Randomizes the freelist order used on creating new pages. This security feature reduces the predictability of the kernel slab @@ -1923,12 +1922,14 @@ config SLAB_FREELIST_RANDOM config SLAB_FREELIST_HARDENED bool "Harden slab freelist metadata" - depends on SLUB + depends on SLAB || SLUB help Many kernel heap attacks try to target slab cache metadata and other infrastructure. This options makes minor performance sacrifices to harden the kernel slab allocator against common - freelist exploit methods. + freelist exploit methods. Some slab implementations have more + sanity-checking than others. This option is most effective with + CONFIG_SLUB. 
 config SHUFFLE_PAGE_ALLOCATOR
 	bool "Page allocator randomization"
_
From: Kees Cook <keescook@chromium.org>
Subject: mm/slab: add naive detection of double free

Similar to commit ce6fa91b9363 ("mm/slub.c: add a naive detection of
double free or corruption"), add a very cheap double-free check for SLAB
under CONFIG_SLAB_FREELIST_HARDENED. With this added, the
"SLAB_FREE_DOUBLE" LKDTM test passes under SLAB:

  lkdtm: Performing direct entry SLAB_FREE_DOUBLE
  lkdtm: Attempting double slab free ...
  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 2193 at mm/slab.c:757 ___cache_free+0x325/0x390

[keescook@chromium.org: fix misplaced __free_one()]
Link: http://lkml.kernel.org/r/202006261306.0D82A2B@keescook
Link: https://lore.kernel.org/lkml/7ff248c7-d447-340c-a8e2-8c02972aca70@infradead.org
Link: http://lkml.kernel.org/r/20200625215548.389774-3-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Randy Dunlap <rdunlap@infradead.org> [build tested]
Cc: Roman Gushchin <guro@fb.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Matthew Garrett <mjg59@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slab.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

--- a/mm/slab.c~slab-add-naive-detection-of-double-free
+++ a/mm/slab.c
@@ -588,6 +588,16 @@ static int transfer_objects(struct array
 	return nr;
 }
 
+/* &alien->lock must be held by alien callers. */
+static __always_inline void __free_one(struct array_cache *ac, void *objp)
+{
+	/* Avoid trivial double-free. */
+	if (IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) &&
+	    WARN_ON_ONCE(ac->avail > 0 && ac->entry[ac->avail - 1] == objp))
+		return;
+	ac->entry[ac->avail++] = objp;
+}
+
 #ifndef CONFIG_NUMA
 
 #define drain_alien_cache(cachep, alien) do { } while (0)
@@ -767,7 +777,7 @@ static int __cache_free_alien(struct kme
 			STATS_INC_ACOVERFLOW(cachep);
 			__drain_alien_cache(cachep, ac, page_node, &list);
 		}
-		ac->entry[ac->avail++] = objp;
+		__free_one(ac, objp);
 		spin_unlock(&alien->lock);
 		slabs_destroy(cachep, &list);
 	} else {
@@ -3466,7 +3476,7 @@ void ___cache_free(struct kmem_cache *ca
 		}
 	}
 
-	ac->entry[ac->avail++] = objp;
+	__free_one(ac, objp);
 }
 
 /**
_
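Why the subject line calls this detection "naive" may be easier to see in a standalone model. The following is a userspace sketch only: struct sketch_array_cache and sketch_free_one() are hypothetical, cut-down stand-ins for the kernel's struct array_cache and __free_one(), and the WARN_ON_ONCE() is replaced by a boolean return value.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_AC_LIMIT 8

/* Hypothetical, cut-down model of the kernel's struct array_cache. */
struct sketch_array_cache {
	unsigned int avail;
	void *entry[SKETCH_AC_LIMIT];
};

/*
 * Model of __free_one(): refuse to push objp if it is already the most
 * recently freed entry. Returns true if pushed, false if rejected.
 * The caller must guarantee that entry[] still has room.
 */
bool sketch_free_one(struct sketch_array_cache *ac, void *objp)
{
	if (ac->avail > 0 && ac->entry[ac->avail - 1] == objp)
		return false;
	ac->entry[ac->avail++] = objp;
	return true;
}
```

Freeing the same pointer twice in a row is rejected, but freeing A, then B, then A again is accepted, because only the top entry is compared. That keeps the check at a single comparison per free, at the cost of missing any double free that is not back-to-back.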
From: Long Li <lonuxli.64@gmail.com> Subject: mm, slab: check GFP_SLAB_BUG_MASK before alloc_pages in kmalloc_order kmalloc cannot allocate memory from HIGHMEM. Allocating large amounts of memory currently bypasses the check and will simply leak the memory when page_address() returns NULL. To fix this, factor the GFP_SLAB_BUG_MASK check out of slab & slub, and call it from kmalloc_order() as well. In order to make the code clear, the warning message is put in one place. Link: http://lkml.kernel.org/r/20200704035027.GA62481@lilong Signed-off-by: Long Li <lonuxli.64@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/slab.c | 10 +++------- mm/slab.h | 1 + mm/slab_common.c | 17 +++++++++++++++++ mm/slub.c | 9 ++------- 4 files changed, 23 insertions(+), 14 deletions(-) --- a/mm/slab.c~mm-slab-check-gfp_slab_bug_mask-before-alloc_pages-in-kmalloc_order +++ a/mm/slab.c @@ -2589,13 +2589,9 @@ static struct page *cache_grow_begin(str * Be lazy and only check for valid flags here, keeping it out of the * critical path in kmem_cache_alloc(). */ - if (unlikely(flags & GFP_SLAB_BUG_MASK)) { - gfp_t invalid_mask = flags & GFP_SLAB_BUG_MASK; - flags &= ~GFP_SLAB_BUG_MASK; - pr_warn("Unexpected gfp: %#x (%pGg). Fixing up to gfp: %#x (%pGg). 
Fix your code!\n", - invalid_mask, &invalid_mask, flags, &flags); - dump_stack(); - } + if (unlikely(flags & GFP_SLAB_BUG_MASK)) + flags = kmalloc_fix_flags(flags); + WARN_ON_ONCE(cachep->ctor && (flags & __GFP_ZERO)); local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK); --- a/mm/slab_common.c~mm-slab-check-gfp_slab_bug_mask-before-alloc_pages-in-kmalloc_order +++ a/mm/slab_common.c @@ -26,6 +26,8 @@ #define CREATE_TRACE_POINTS #include <trace/events/kmem.h> +#include "internal.h" + #include "slab.h" enum slab_state slab_state; @@ -1332,6 +1334,18 @@ void __init create_kmalloc_caches(slab_f } #endif /* !CONFIG_SLOB */ +gfp_t kmalloc_fix_flags(gfp_t flags) +{ + gfp_t invalid_mask = flags & GFP_SLAB_BUG_MASK; + + flags &= ~GFP_SLAB_BUG_MASK; + pr_warn("Unexpected gfp: %#x (%pGg). Fixing up to gfp: %#x (%pGg). Fix your code!\n", + invalid_mask, &invalid_mask, flags, &flags); + dump_stack(); + + return flags; +} + /* * To avoid unnecessary overhead, we pass through large allocation requests * directly to the page allocator. 
We use __GFP_COMP, because we will need to @@ -1342,6 +1356,9 @@ void *kmalloc_order(size_t size, gfp_t f void *ret = NULL; struct page *page; + if (unlikely(flags & GFP_SLAB_BUG_MASK)) + flags = kmalloc_fix_flags(flags); + flags |= __GFP_COMP; page = alloc_pages(flags, order); if (likely(page)) { --- a/mm/slab.h~mm-slab-check-gfp_slab_bug_mask-before-alloc_pages-in-kmalloc_order +++ a/mm/slab.h @@ -152,6 +152,7 @@ void create_kmalloc_caches(slab_flags_t) struct kmem_cache *kmalloc_slab(size_t, gfp_t); #endif +gfp_t kmalloc_fix_flags(gfp_t flags); /* Functions provided by the slab allocators */ int __kmem_cache_create(struct kmem_cache *, slab_flags_t flags); --- a/mm/slub.c~mm-slab-check-gfp_slab_bug_mask-before-alloc_pages-in-kmalloc_order +++ a/mm/slub.c @@ -1745,13 +1745,8 @@ out: static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node) { - if (unlikely(flags & GFP_SLAB_BUG_MASK)) { - gfp_t invalid_mask = flags & GFP_SLAB_BUG_MASK; - flags &= ~GFP_SLAB_BUG_MASK; - pr_warn("Unexpected gfp: %#x (%pGg). Fixing up to gfp: %#x (%pGg). Fix your code!\n", - invalid_mask, &invalid_mask, flags, &flags); - dump_stack(); - } + if (unlikely(flags & GFP_SLAB_BUG_MASK)) + flags = kmalloc_fix_flags(flags); return allocate_slab(s, flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node); _
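The refactoring pattern in this patch, one shared fix-up helper called from every allocation entry point, can be sketched in userspace. Everything below is an assumption made for illustration: the flag values and the sketch_* names are hypothetical, and the real GFP_SLAB_BUG_MASK covers more bits than the two modelled here.

```c
#include <stdio.h>

typedef unsigned int sketch_gfp_t;

/* Hypothetical flag values; the real ones live in include/linux/gfp.h. */
#define SKETCH_GFP_HIGHMEM 0x02u
#define SKETCH_GFP_DMA32 0x04u
#define SKETCH_GFP_ZERO 0x100u
#define SKETCH_SLAB_BUG_MASK (SKETCH_GFP_HIGHMEM | SKETCH_GFP_DMA32)

/* Model of kmalloc_fix_flags(): warn in one place, return cleaned flags. */
sketch_gfp_t sketch_fix_flags(sketch_gfp_t flags)
{
	sketch_gfp_t invalid_mask = flags & SKETCH_SLAB_BUG_MASK;

	flags &= ~SKETCH_SLAB_BUG_MASK;
	fprintf(stderr, "Unexpected gfp: %#x. Fixing up to gfp: %#x. Fix your code!\n",
		invalid_mask, flags);
	return flags;
}

/* Model of a caller such as kmalloc_order(): slow path only when needed. */
sketch_gfp_t sketch_prepare_flags(sketch_gfp_t flags)
{
	if (flags & SKETCH_SLAB_BUG_MASK)
		flags = sketch_fix_flags(flags);
	return flags;
}
```

Valid flags pass through untouched and silently; a request carrying a masked bit is logged once per call and continues with the offending bits cleared, so the page allocator never sees a HIGHMEM request on behalf of kmalloc.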
From: Xiao Yang <yangx.jy@cn.fujitsu.com>
Subject: mm/slab.c: update outdated kmem_list3 in a comment

kmem_list3 was renamed to kmem_cache_node long ago, so update the comment.

References:
6744f087ba2a ("slab: Common name for the per node structures")
ce8eb6c424c7 ("slab: Rename list3/l3 to node")

Link: http://lkml.kernel.org/r/20200722033355.26908-1-yangx.jy@cn.fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slab.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/slab.c~mm-slabc-update-outdated-kmem_list3-in-a-comment
+++ a/mm/slab.c
@@ -1060,7 +1060,7 @@ int slab_prepare_cpu(unsigned int cpu)
  * offline.
  *
  * Even if all the cpus of a node are down, we don't free the
- * kmem_list3 of any cache. This to avoid a race between cpu_down, and
+ * kmem_cache_node of any cache. This to avoid a race between cpu_down, and
  * a kmalloc allocation from another cpu for memory from the node of
  * the cpu going down. The list3 structure is usually allocated from
  * kmem_cache_create() and gets destroyed at kmem_cache_destroy().
_
From: Vlastimil Babka <vbabka@suse.cz> Subject: mm, slub: extend slub_debug syntax for multiple blocks Patch series "slub_debug fixes and improvements". The slub_debug kernel boot parameter can either apply a single set of options to all caches or a list of caches. There is a use case where debugging is applied for all caches and then disabled at runtime for specific caches, for performance and memory consumption reasons [1]. As runtime changes are dangerous, extend the boot parameter syntax so that multiple blocks of either global or slab-specific options can be specified, with blocks delimited by ';'. This will also support the use case of [1] without runtime changes. For details see the updated Documentation/vm/slub.rst [1] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org [weiyongjun1@huawei.com: make parse_slub_debug_flags() static] Link: http://lkml.kernel.org/r/20200702150522.4940-1-weiyongjun1@huawei.com Link: http://lkml.kernel.org/r/20200610163135.17364-2-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Roman Gushchin <guro@fb.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- Documentation/admin-guide/kernel-parameters.txt | 2 Documentation/vm/slub.rst | 18 + mm/slub.c | 179 +++++++++----- 3 files changed, 146 insertions(+), 53 deletions(-) --- a/Documentation/admin-guide/kernel-parameters.txt~mm-slub-extend-slub_debug-syntax-for-multiple-blocks +++ a/Documentation/admin-guide/kernel-parameters.txt @@ -4689,7 +4689,7 @@ fragmentation. Defaults to 1 for systems with more than 32MB of RAM, 0 otherwise. 
- slub_debug[=options[,slabs]] [MM, SLUB] + slub_debug[=options[,slabs][;[options[,slabs]]...] [MM, SLUB] Enabling slub_debug allows one to determine the culprit if slab objects become corrupted. Enabling slub_debug can create guard zones around objects and --- a/Documentation/vm/slub.rst~mm-slub-extend-slub_debug-syntax-for-multiple-blocks +++ a/Documentation/vm/slub.rst @@ -41,6 +41,11 @@ slub_debug=<Debug-Options>,<slab name1>, Enable options only for select slabs (no spaces after a comma) +Multiple blocks of options for all slabs or selected slabs can be given, with +blocks of options delimited by ';'. The last of "all slabs" blocks is applied +to all slabs except those that match one of the "select slabs" block. Options +of the first "select slabs" blocks that matches the slab's name are applied. + Possible debug options are:: F Sanity checks on (enables SLAB_DEBUG_CONSISTENCY_CHECKS @@ -83,6 +88,19 @@ switch off debugging for such caches by slub_debug=O +You can apply different options to different list of slab names, using blocks +of options. This will enable red zoning for dentry and user tracking for +kmalloc. All other slabs will not get any debugging enabled:: + + slub_debug=Z,dentry;U,kmalloc-* + +You can also enable options (e.g. sanity checks and poisoning) for all caches +except some that are deemed too performance critical and don't need to be +debugged by specifying global debug options followed by a list of slab names +with "-" as options:: + + slub_debug=FZ;-,zs_handle,zspage + In case you forgot to enable debugging on the kernel command line: It is possible to enable debugging manually when the kernel is up. 
Look at the contents of:: --- a/mm/slub.c~mm-slub-extend-slub_debug-syntax-for-multiple-blocks +++ a/mm/slub.c @@ -499,7 +499,7 @@ static slab_flags_t slub_debug = DEBUG_D static slab_flags_t slub_debug; #endif -static char *slub_debug_slabs; +static char *slub_debug_string; static int disable_higher_order_debug; /* @@ -1262,68 +1262,132 @@ out: return ret; } -static int __init setup_slub_debug(char *str) +/* + * Parse a block of slub_debug options. Blocks are delimited by ';' + * + * @str: start of block + * @flags: returns parsed flags, or DEBUG_DEFAULT_FLAGS if none specified + * @slabs: return start of list of slabs, or NULL when there's no list + * @init: assume this is initial parsing and not per-kmem-create parsing + * + * returns the start of next block if there's any, or NULL + */ +static char * +parse_slub_debug_flags(char *str, slab_flags_t *flags, char **slabs, bool init) { - slub_debug = DEBUG_DEFAULT_FLAGS; - if (*str++ != '=' || !*str) - /* - * No options specified. Switch on full debugging. - */ - goto out; + bool higher_order_disable = false; - if (*str == ',') + /* Skip any completely empty blocks */ + while (*str && *str == ';') + str++; + + if (*str == ',') { /* * No options but restriction on slabs. This means full * debugging for slabs matching a pattern. */ + *flags = DEBUG_DEFAULT_FLAGS; goto check_slabs; + } + *flags = 0; - slub_debug = 0; - if (*str == '-') - /* - * Switch off all debugging measures. 
- */ - goto out; - - /* - * Determine which debug features should be switched on - */ - for (; *str && *str != ','; str++) { + /* Determine which debug features should be switched on */ + for (; *str && *str != ',' && *str != ';'; str++) { switch (tolower(*str)) { + case '-': + *flags = 0; + break; case 'f': - slub_debug |= SLAB_CONSISTENCY_CHECKS; + *flags |= SLAB_CONSISTENCY_CHECKS; break; case 'z': - slub_debug |= SLAB_RED_ZONE; + *flags |= SLAB_RED_ZONE; break; case 'p': - slub_debug |= SLAB_POISON; + *flags |= SLAB_POISON; break; case 'u': - slub_debug |= SLAB_STORE_USER; + *flags |= SLAB_STORE_USER; break; case 't': - slub_debug |= SLAB_TRACE; + *flags |= SLAB_TRACE; break; case 'a': - slub_debug |= SLAB_FAILSLAB; + *flags |= SLAB_FAILSLAB; break; case 'o': /* * Avoid enabling debugging on caches if its minimum * order would increase as a result. */ - disable_higher_order_debug = 1; + higher_order_disable = true; break; default: - pr_err("slub_debug option '%c' unknown. skipped\n", - *str); + if (init) + pr_err("slub_debug option '%c' unknown. skipped\n", *str); } } - check_slabs: if (*str == ',') - slub_debug_slabs = str + 1; + *slabs = ++str; + else + *slabs = NULL; + + /* Skip over the slab list */ + while (*str && *str != ';') + str++; + + /* Skip any completely empty blocks */ + while (*str && *str == ';') + str++; + + if (init && higher_order_disable) + disable_higher_order_debug = 1; + + if (*str) + return str; + else + return NULL; +} + +static int __init setup_slub_debug(char *str) +{ + slab_flags_t flags; + char *saved_str; + char *slab_list; + bool global_slub_debug_changed = false; + bool slab_list_specified = false; + + slub_debug = DEBUG_DEFAULT_FLAGS; + if (*str++ != '=' || !*str) + /* + * No options specified. Switch on full debugging. 
+ */ + goto out; + + saved_str = str; + while (str) { + str = parse_slub_debug_flags(str, &flags, &slab_list, true); + + if (!slab_list) { + slub_debug = flags; + global_slub_debug_changed = true; + } else { + slab_list_specified = true; + } + } + + /* + * For backwards compatibility, a single list of flags with list of + * slabs means debugging is only enabled for those slabs, so the global + * slub_debug should be 0. We can extended that to multiple lists as + * long as there is no option specifying flags without a slab list. + */ + if (slab_list_specified) { + if (!global_slub_debug_changed) + slub_debug = 0; + slub_debug_string = saved_str; + } out: if ((static_branch_unlikely(&init_on_alloc) || static_branch_unlikely(&init_on_free)) && @@ -1352,36 +1416,47 @@ slab_flags_t kmem_cache_flags(unsigned i { char *iter; size_t len; + char *next_block; + slab_flags_t block_flags; /* If slub_debug = 0, it folds into the if conditional. */ - if (!slub_debug_slabs) + if (!slub_debug_string) return flags | slub_debug; len = strlen(name); - iter = slub_debug_slabs; - while (*iter) { - char *end, *glob; - size_t cmplen; - - end = strchrnul(iter, ','); - - glob = strnchr(iter, end - iter, '*'); - if (glob) - cmplen = glob - iter; - else - cmplen = max_t(size_t, len, (end - iter)); + next_block = slub_debug_string; + /* Go through all blocks of debug options, see if any matches our slab's name */ + while (next_block) { + next_block = parse_slub_debug_flags(next_block, &block_flags, &iter, false); + if (!iter) + continue; + /* Found a block that has a slab list, search it */ + while (*iter) { + char *end, *glob; + size_t cmplen; + + end = strchrnul(iter, ','); + if (next_block && next_block < end) + end = next_block - 1; + + glob = strnchr(iter, end - iter, '*'); + if (glob) + cmplen = glob - iter; + else + cmplen = max_t(size_t, len, (end - iter)); - if (!strncmp(name, iter, cmplen)) { - flags |= slub_debug; - break; - } + if (!strncmp(name, iter, cmplen)) { + flags |= 
block_flags; + return flags; + } - if (!*end) - break; - iter = end + 1; + if (!*end || *end == ';') + break; + iter = end + 1; + } } - return flags; + return slub_debug; } #else /* !CONFIG_SLUB_DEBUG */ static inline void setup_object_debug(struct kmem_cache *s, _
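The block syntax handled by the patch above ("<flags>[,<slab>,...]" blocks separated by ';') can be sketched in userspace C. This is only an illustration, not the kernel code: the DBG_* bits and the names parse_block() and debug_flags_for() are hypothetical, the 'o' option and the "no flags means default flags" case are left out, and the first block whose slab list matches the name wins.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative flag bits only; the kernel's SLAB_* values differ. */
enum {
	DBG_CONSISTENCY = 1u << 0,	/* F */
	DBG_RED_ZONE    = 1u << 1,	/* Z */
	DBG_POISON      = 1u << 2,	/* P */
	DBG_STORE_USER  = 1u << 3,	/* U */
	DBG_TRACE       = 1u << 4,	/* T */
	DBG_FAILSLAB    = 1u << 5,	/* A */
};

/*
 * Parse one block starting at str. Stores the block's flags in *flags and
 * the start of its slab list (or NULL if there is none) in *slabs.
 * Returns the start of the next block, or NULL at the end of the string.
 */
const char *parse_block(const char *str, unsigned int *flags, const char **slabs)
{
	assert(str && flags && slabs);
	*flags = 0;
	for (; *str && *str != ',' && *str != ';'; str++) {
		switch (tolower((unsigned char)*str)) {
		case '-': *flags = 0; break;
		case 'f': *flags |= DBG_CONSISTENCY; break;
		case 'z': *flags |= DBG_RED_ZONE; break;
		case 'p': *flags |= DBG_POISON; break;
		case 'u': *flags |= DBG_STORE_USER; break;
		case 't': *flags |= DBG_TRACE; break;
		case 'a': *flags |= DBG_FAILSLAB; break;
		default: break;	/* unknown letters are skipped */
		}
	}
	*slabs = (*str == ',') ? str + 1 : NULL;
	while (*str && *str != ';')	/* skip over the slab list */
		str++;
	while (*str == ';')		/* skip empty blocks */
		str++;
	return *str ? str : NULL;
}

/* Match one slab-list entry; a '*' turns the entry into a prefix match. */
static int entry_matches(const char *entry, size_t entry_len, const char *name)
{
	const char *glob = memchr(entry, '*', entry_len);

	if (glob)
		return strncmp(name, entry, (size_t)(glob - entry)) == 0;
	return strlen(name) == entry_len && memcmp(name, entry, entry_len) == 0;
}

/* Flags for cache "name": first matching block, else the global flags. */
unsigned int debug_flags_for(const char *opts, const char *name)
{
	unsigned int global = 0, block = 0, result = 0;
	const char *next = opts, *list;
	int matched = 0;

	while (next) {
		next = parse_block(next, &block, &list);
		if (!list) {		/* block without slab list is global */
			global = block;
			continue;
		}
		while (!matched && *list && *list != ';') {
			size_t n = strcspn(list, ",;");

			if (entry_matches(list, n, name)) {
				result = block;
				matched = 1;
			}
			list += n;
			if (*list == ',')
				list++;
		}
	}
	return matched ? result : global;
}
```

With the documentation's example string "FZ;-,zs_handle,zspage", this sketch yields F and Z for an arbitrary cache and no flags for zs_handle and zspage.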
From: Vlastimil Babka <vbabka@suse.cz> Subject: mm, slub: make some slub_debug related attributes read-only SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can be read to check if the respective debugging options are enabled for given cache. The options can also be toggled at runtime by writing into the files. Some of those, namely red_zone, poison, and store_user can be toggled only when no objects yet exist in the cache. Vijayanand reports [1] that there is a problem with freelist randomization if changing the debugging option's state results in a different number of objects per page, and the random sequence cache thus needs to be recomputed. However, another problem is that the check for "no objects yet exist in the cache" is racy, as noted by Jann [2], and fixing that would add overhead or otherwise complicate the allocation/freeing paths. Thus it would be much simpler just to remove the runtime toggling support. The documentation describes it as "In case you forgot to enable debugging on the kernel command line", but the necessity of having no objects limits its usefulness anyway for many caches. Vijayanand describes a use case [3] where debugging is enabled for all but zram caches for memory overhead reasons, and using the runtime toggles was the only way to achieve such a configuration. After the previous patch it's now possible to do that directly from the kernel boot option, so we can remove the dangerous runtime toggles by making the /sys attribute files read-only. While updating it, also improve the documentation of the debugging /sys files. 
[1] https://lkml.kernel.org/r/1580379523-32272-1-git-send-email-vjitta@codeaurora.org [2] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com [3] https://lore.kernel.org/r/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org Link: http://lkml.kernel.org/r/20200610163135.17364-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Vijayanand Jitta <vjitta@codeaurora.org> Reported-by: Jann Horn <jannh@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- Documentation/vm/slub.rst | 26 ++++++++++++-------- mm/slub.c | 46 ++---------------------------------- 2 files changed, 19 insertions(+), 53 deletions(-) --- a/Documentation/vm/slub.rst~mm-slub-make-some-slub_debug-related-attributes-read-only +++ a/Documentation/vm/slub.rst @@ -101,20 +101,26 @@ with "-" as options:: slub_debug=FZ;-,zs_handle,zspage -In case you forgot to enable debugging on the kernel command line: It is -possible to enable debugging manually when the kernel is up. Look at the -contents of:: +The state of each debug option for a slab can be found in the respective files +under:: /sys/kernel/slab/<slab name>/ -Look at the writable files. Writing 1 to them will enable the -corresponding debug option. All options can be set on a slab that does -not contain objects. If the slab already contains objects then sanity checks -and tracing may only be enabled. The other options may cause the realignment -of objects. +If the file contains 1, the option is enabled, 0 means disabled. The debug +options from the ``slub_debug`` parameter translate to the following files:: -Careful with tracing: It may spew out lots of information and never stop if -used on the wrong slab. 
+ F sanity_checks + Z red_zone + P poison + U store_user + T trace + A failslab + +The sanity_checks, trace and failslab files are writable, so writing 1 or 0 +will enable or disable the option at runtime. The writes to trace and failslab +may return -EINVAL if the cache is subject to slab merging. Careful with +tracing: It may spew out lots of information and never stop if used on the +wrong slab. Slab merging ============ --- a/mm/slub.c~mm-slub-make-some-slub_debug-related-attributes-read-only +++ a/mm/slub.c @@ -5335,61 +5335,21 @@ static ssize_t red_zone_show(struct kmem return sprintf(buf, "%d\n", !!(s->flags & SLAB_RED_ZONE)); } -static ssize_t red_zone_store(struct kmem_cache *s, - const char *buf, size_t length) -{ - if (any_slab_objects(s)) - return -EBUSY; - - s->flags &= ~SLAB_RED_ZONE; - if (buf[0] == '1') { - s->flags |= SLAB_RED_ZONE; - } - calculate_sizes(s, -1); - return length; -} -SLAB_ATTR(red_zone); +SLAB_ATTR_RO(red_zone); static ssize_t poison_show(struct kmem_cache *s, char *buf) { return sprintf(buf, "%d\n", !!(s->flags & SLAB_POISON)); } -static ssize_t poison_store(struct kmem_cache *s, - const char *buf, size_t length) -{ - if (any_slab_objects(s)) - return -EBUSY; - - s->flags &= ~SLAB_POISON; - if (buf[0] == '1') { - s->flags |= SLAB_POISON; - } - calculate_sizes(s, -1); - return length; -} -SLAB_ATTR(poison); +SLAB_ATTR_RO(poison); static ssize_t store_user_show(struct kmem_cache *s, char *buf) { return sprintf(buf, "%d\n", !!(s->flags & SLAB_STORE_USER)); } -static ssize_t store_user_store(struct kmem_cache *s, - const char *buf, size_t length) -{ - if (any_slab_objects(s)) - return -EBUSY; - - s->flags &= ~SLAB_STORE_USER; - if (buf[0] == '1') { - s->flags &= ~__CMPXCHG_DOUBLE; - s->flags |= SLAB_STORE_USER; - } - calculate_sizes(s, -1); - return length; -} -SLAB_ATTR(store_user); +SLAB_ATTR_RO(store_user); static ssize_t validate_show(struct kmem_cache *s, char *buf) { _
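The letter-to-file table added to slub.rst in the patch above can be expressed as a small lookup. A minimal userspace sketch follows; the helper names slub_debug_attr_name() and slub_debug_attr_path() are hypothetical and exist only for illustration, while the file names and the /sys/kernel/slab/<slab name>/ location are taken from the documentation hunk.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Map a slub_debug option letter to its sysfs file name, NULL if unknown. */
const char *slub_debug_attr_name(char opt)
{
	switch (toupper((unsigned char)opt)) {
	case 'F': return "sanity_checks";
	case 'Z': return "red_zone";
	case 'P': return "poison";
	case 'U': return "store_user";
	case 'T': return "trace";
	case 'A': return "failslab";
	default:  return NULL;
	}
}

/*
 * Build "/sys/kernel/slab/<cache>/<file>" into buf.
 * Returns the snprintf() result, or -1 for an unknown letter or empty name.
 */
int slub_debug_attr_path(char *buf, size_t len, const char *cache, char opt)
{
	const char *file = slub_debug_attr_name(opt);

	if (!file || !cache || strlen(cache) == 0)
		return -1;
	return snprintf(buf, len, "/sys/kernel/slab/%s/%s", cache, file);
}
```

Reading the resulting file gives 1 if the option is enabled for that cache and 0 otherwise, as described in the documentation text.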
From: Vlastimil Babka <vbabka@suse.cz> Subject: mm, slub: remove runtime allocation order changes SLUB allows runtime changing of page allocation order by writing into the /sys/kernel/slab/<cache>/order file. Jann has reported [1] that this interface allows the order to be set too small, leading to crashes. While it's possible to fix the immediate issue, closer inspection reveals potential races. Storing the new order calls calculate_sizes(), which non-atomically updates a lot of kmem_cache fields while the cache is still in use. Unexpected behavior might occur even if the fields are set to the same value as they were. This could be fixed by splitting out the part of calculate_sizes() that depends on forced_order, so that we only update the kmem_cache.oo field. This could still race with init_cache_random_seq(), shuffle_freelist(), allocate_slab(). Perhaps it's possible to audit and e.g. add some READ_ONCE/WRITE_ONCE accesses, but it is easier just to remove the runtime order changes, which is what this patch does. If there are valid use cases for per-cache order setting, we could e.g. extend the boot parameters to do that. 
[1] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com Link: http://lkml.kernel.org/r/20200610163135.17364-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Christoph Lameter <cl@linux.com> Reported-by: Jann Horn <jannh@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/slub.c | 19 +------------------ 1 file changed, 1 insertion(+), 18 deletions(-) --- a/mm/slub.c~mm-slub-remove-runtime-allocation-order-changes +++ a/mm/slub.c @@ -5095,28 +5095,11 @@ static ssize_t objs_per_slab_show(struct } SLAB_ATTR_RO(objs_per_slab); -static ssize_t order_store(struct kmem_cache *s, - const char *buf, size_t length) -{ - unsigned int order; - int err; - - err = kstrtouint(buf, 10, &order); - if (err) - return err; - - if (order > slub_max_order || order < slub_min_order) - return -EINVAL; - - calculate_sizes(s, order); - return length; -} - static ssize_t order_show(struct kmem_cache *s, char *buf) { return sprintf(buf, "%u\n", oo_order(s->oo)); } -SLAB_ATTR(order); +SLAB_ATTR_RO(order); static ssize_t min_partial_show(struct kmem_cache *s, char *buf) { _
From: Vlastimil Babka <vbabka@suse.cz> Subject: mm, slub: make remaining slub_debug related attributes read-only SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can be read to check if the respective debugging options are enabled for given cache. Some options, namely sanity_checks, trace, and failslab can also be enabled and disabled at runtime by writing into the files. The runtime toggling is racy. Some options disable __CMPXCHG_DOUBLE when enabled, which means that in case of concurrent allocations, some can still use __CMPXCHG_DOUBLE and some not, leading to potential corruption. The s->flags field is also not updated or checked atomically. The simplest solution is to remove the runtime toggling. The extended slub_debug boot parameter syntax introduced by an earlier patch should allow fine-tuning the debugging configuration during boot with the same granularity. Link: http://lkml.kernel.org/r/20200610163135.17364-5-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- Documentation/vm/slub.rst | 7 +--- mm/slub.c | 62 +----------------------------------- 2 files changed, 5 insertions(+), 64 deletions(-) --- a/Documentation/vm/slub.rst~mm-slub-make-remaining-slub_debug-related-attributes-read-only +++ a/Documentation/vm/slub.rst @@ -116,11 +116,8 @@ options from the ``slub_debug`` paramete T trace A failslab -The sanity_checks, trace and failslab files are writable, so writing 1 or 0 -will enable or disable the option at runtime. The writes to trace and failslab -may return -EINVAL if the cache is subject to slab merging. 
Careful with -tracing: It may spew out lots of information and never stop if used on the -wrong slab. +Careful with tracing: It may spew out lots of information and never stop if +used on the wrong slab. Slab merging ============ --- a/mm/slub.c~mm-slub-make-remaining-slub_debug-related-attributes-read-only +++ a/mm/slub.c @@ -5040,20 +5040,6 @@ static ssize_t show_slab_objects(struct return x + sprintf(buf + x, "\n"); } -#ifdef CONFIG_SLUB_DEBUG -static int any_slab_objects(struct kmem_cache *s) -{ - int node; - struct kmem_cache_node *n; - - for_each_kmem_cache_node(s, node, n) - if (atomic_long_read(&n->total_objects)) - return 1; - - return 0; -} -#endif - #define to_slab_attr(n) container_of(n, struct slab_attribute, attr) #define to_slab(n) container_of(n, struct kmem_cache, kobj) @@ -5275,43 +5261,13 @@ static ssize_t sanity_checks_show(struct { return sprintf(buf, "%d\n", !!(s->flags & SLAB_CONSISTENCY_CHECKS)); } - -static ssize_t sanity_checks_store(struct kmem_cache *s, - const char *buf, size_t length) -{ - s->flags &= ~SLAB_CONSISTENCY_CHECKS; - if (buf[0] == '1') { - s->flags &= ~__CMPXCHG_DOUBLE; - s->flags |= SLAB_CONSISTENCY_CHECKS; - } - return length; -} -SLAB_ATTR(sanity_checks); +SLAB_ATTR_RO(sanity_checks); static ssize_t trace_show(struct kmem_cache *s, char *buf) { return sprintf(buf, "%d\n", !!(s->flags & SLAB_TRACE)); } - -static ssize_t trace_store(struct kmem_cache *s, const char *buf, - size_t length) -{ - /* - * Tracing a merged cache is going to give confusing results - * as well as cause other issues like converting a mergeable - * cache into an umergeable one. 
- */ - if (s->refcount > 1) - return -EINVAL; - - s->flags &= ~SLAB_TRACE; - if (buf[0] == '1') { - s->flags &= ~__CMPXCHG_DOUBLE; - s->flags |= SLAB_TRACE; - } - return length; -} -SLAB_ATTR(trace); +SLAB_ATTR_RO(trace); static ssize_t red_zone_show(struct kmem_cache *s, char *buf) { @@ -5375,19 +5331,7 @@ static ssize_t failslab_show(struct kmem { return sprintf(buf, "%d\n", !!(s->flags & SLAB_FAILSLAB)); } - -static ssize_t failslab_store(struct kmem_cache *s, const char *buf, - size_t length) -{ - if (s->refcount > 1) - return -EINVAL; - - s->flags &= ~SLAB_FAILSLAB; - if (buf[0] == '1') - s->flags |= SLAB_FAILSLAB; - return length; -} -SLAB_ATTR(failslab); +SLAB_ATTR_RO(failslab); #endif static ssize_t shrink_show(struct kmem_cache *s, char *buf) _
From: Vlastimil Babka <vbabka@suse.cz> Subject: mm, slub: make reclaim_account attribute read-only The attribute reflects the SLAB_RECLAIM_ACCOUNT cache flag. It's not clear why this attribute was writable in the first place, as it's tied to how the cache is used by its creator and is not a user tunable. Furthermore: - it affects slab merging, but that's not being checked while toggled - it affects whether __GFP_RECLAIMABLE flag is used to allocate page, but the runtime toggle doesn't update allocflags - it affects cache_vmstat_idx() so runtime toggling might lead to inconsistency of NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE Thus make it read-only. Link: http://lkml.kernel.org/r/20200610163135.17364-6-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/slub.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) --- a/mm/slub.c~mm-slub-make-reclaim_account-attribute-read-only +++ a/mm/slub.c @@ -5207,16 +5207,7 @@ static ssize_t reclaim_account_show(stru { return sprintf(buf, "%d\n", !!(s->flags & SLAB_RECLAIM_ACCOUNT)); } - -static ssize_t reclaim_account_store(struct kmem_cache *s, - const char *buf, size_t length) -{ - s->flags &= ~SLAB_RECLAIM_ACCOUNT; - if (buf[0] == '1') - s->flags |= SLAB_RECLAIM_ACCOUNT; - return length; -} -SLAB_ATTR(reclaim_account); +SLAB_ATTR_RO(reclaim_account); static ssize_t hwcache_align_show(struct kmem_cache *s, char *buf) { _
From: Vlastimil Babka <vbabka@suse.cz> Subject: mm, slub: introduce static key for slub_debug() One advantage of CONFIG_SLUB_DEBUG is that a generic distro kernel can be built with the option enabled, but it's inactive until simply enabled on boot, without rebuilding the kernel. With a static key, we can further eliminate the overhead of checking whether a cache has a particular debug flag enabled if we know that there are no such caches (slub_debug was not enabled during boot). We use the same mechanism also for e.g. page_owner, debug_pagealloc or kmemcg functionality. This patch introduces the static key and makes the general check for per-cache debug flags kmem_cache_debug() use it. This benefits several call sites, including (slow path but still rather frequent) __slab_free(). The next patches will add more uses. Link: http://lkml.kernel.org/r/20200610163135.17364-7-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Christoph Lameter <cl@linux.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Jann Horn <jannh@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/slub.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) --- a/mm/slub.c~mm-slub-introduce-static-key-for-slub_debug +++ a/mm/slub.c @@ -114,13 +114,21 @@ * the fast path and disables lockless freelists. 
*/ +#ifdef CONFIG_SLUB_DEBUG +#ifdef CONFIG_SLUB_DEBUG_ON +DEFINE_STATIC_KEY_TRUE(slub_debug_enabled); +#else +DEFINE_STATIC_KEY_FALSE(slub_debug_enabled); +#endif +#endif + static inline int kmem_cache_debug(struct kmem_cache *s) { #ifdef CONFIG_SLUB_DEBUG - return unlikely(s->flags & SLAB_DEBUG_FLAGS); -#else - return 0; + if (static_branch_unlikely(&slub_debug_enabled)) + return s->flags & SLAB_DEBUG_FLAGS; #endif + return 0; } void *fixup_red_left(struct kmem_cache *s, void *p) @@ -1389,6 +1397,8 @@ static int __init setup_slub_debug(char slub_debug_string = saved_str; } out: + if (slub_debug != 0 || slub_debug_string) + static_branch_enable(&slub_debug_enabled); if ((static_branch_unlikely(&init_on_alloc) || static_branch_unlikely(&init_on_free)) && (slub_debug & SLAB_POISON)) _
From: Vlastimil Babka <vbabka@suse.cz> Subject: mm, slub: introduce kmem_cache_debug_flags() There are a few places that call kmem_cache_debug(s) (which tests if any of the debug flags are enabled for a cache) immediately followed by a test for a specific flag. The compiler can probably eliminate the extra check, but we can make the code nicer by introducing kmem_cache_debug_flags() that works like kmem_cache_debug() (including the static key check) but tests for specific flag(s). The next patches will add more users. [vbabka@suse.cz: change return from int to bool, per Kees. Add VM_WARN_ON_ONCE() for invalid flags, per Roman] Link: http://lkml.kernel.org/r/949b90ed-e0f0-07d7-4d21-e30ec0958a7c@suse.cz Link: http://lkml.kernel.org/r/20200610163135.17364-8-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Jann Horn <jannh@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/slub.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) --- a/mm/slub.c~mm-slub-introduce-kmem_cache_debug_flags +++ a/mm/slub.c @@ -122,18 +122,29 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabl #endif #endif -static inline int kmem_cache_debug(struct kmem_cache *s) +/* + * Returns true if any of the specified slub_debug flags is enabled for the + * cache. Use only for flags parsed by setup_slub_debug() as it also enables + * the static key. 
+ */ +static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t flags) { + VM_WARN_ON_ONCE(!(flags & SLAB_DEBUG_FLAGS)); #ifdef CONFIG_SLUB_DEBUG if (static_branch_unlikely(&slub_debug_enabled)) - return s->flags & SLAB_DEBUG_FLAGS; + return s->flags & flags; #endif - return 0; + return false; +} + +static inline bool kmem_cache_debug(struct kmem_cache *s) +{ + return kmem_cache_debug_flags(s, SLAB_DEBUG_FLAGS); } void *fixup_red_left(struct kmem_cache *s, void *p) { - if (kmem_cache_debug(s) && s->flags & SLAB_RED_ZONE) + if (kmem_cache_debug_flags(s, SLAB_RED_ZONE)) p += s->red_left_pad; return p; @@ -4060,7 +4071,7 @@ void __check_heap_object(const void *ptr offset = (ptr - page_address(page)) % s->size; /* Adjust for redzone and reject if within the redzone. */ - if (kmem_cache_debug(s) && s->flags & SLAB_RED_ZONE) { + if (kmem_cache_debug_flags(s, SLAB_RED_ZONE)) { if (offset < s->red_left_pad) usercopy_abort("SLUB object in left red zone", s->name, to_user, offset, n); _
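The pattern introduced by the patch above, one global "debugging enabled" switch checked before any per-cache flag test, can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel implementation: a plain bool stands in for the static key (so there is no code patching, only the control flow is shown), assert() stands in for VM_WARN_ON_ONCE(), and struct cache, the DBG_* bits, cache_debug_flags() and debug_enable() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits only; the kernel's SLAB_* values differ. */
#define DBG_RED_ZONE	(1u << 0)
#define DBG_POISON	(1u << 1)
#define DBG_ALL		(DBG_RED_ZONE | DBG_POISON)

struct cache {
	unsigned int flags;
};

/* Stand-in for the slub_debug_enabled static key; false until "boot" enables it. */
static bool debug_enabled;

/* Corresponds to static_branch_enable() being called from the boot option parser. */
void debug_enable(void)
{
	debug_enabled = true;
}

/* True if any of the requested debug flags is set and debugging is enabled at all. */
bool cache_debug_flags(const struct cache *s, unsigned int flags)
{
	assert(flags & DBG_ALL);	/* callers must ask about debug flags only */
	if (debug_enabled)
		return (s->flags & flags) != 0;
	return false;
}

/* Convenience wrapper: is any debug flag enabled for this cache? */
bool cache_debug(const struct cache *s)
{
	return cache_debug_flags(s, DBG_ALL);
}
```

The point of the design is that when debugging was never enabled, the per-cache flags are not even read; in the kernel the static key makes that check nearly free.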
From: Vlastimil Babka <vbabka@suse.cz> Subject: mm, slub: extend checks guarded by slub_debug static key There are a few more places in SLUB that could benefit from reduced overhead of the static key introduced by a previous patch: - setup_object_debug() called on each object in newly allocated slab page - setup_page_debug() called on newly allocated slab page - __free_slab() called on freed slab page Link: http://lkml.kernel.org/r/20200610163135.17364-9-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/slub.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) --- a/mm/slub.c~mm-slub-extend-checks-guarded-by-slub_debug-static-key +++ a/mm/slub.c @@ -1131,7 +1131,7 @@ static inline void dec_slabs_node(struct static void setup_object_debug(struct kmem_cache *s, struct page *page, void *object) { - if (!(s->flags & (SLAB_STORE_USER|SLAB_RED_ZONE|__OBJECT_POISON))) + if (!kmem_cache_debug_flags(s, SLAB_STORE_USER|SLAB_RED_ZONE|__OBJECT_POISON)) return; init_object(s, object, SLUB_RED_INACTIVE); @@ -1141,7 +1141,7 @@ static void setup_object_debug(struct km static void setup_page_debug(struct kmem_cache *s, struct page *page, void *addr) { - if (!(s->flags & SLAB_POISON)) + if (!kmem_cache_debug_flags(s, SLAB_POISON)) return; metadata_access_enable(); @@ -1853,7 +1853,7 @@ static void __free_slab(struct kmem_cach int order = compound_order(page); int pages = 1 << order; - if (s->flags & SLAB_CONSISTENCY_CHECKS) { + if (kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS)) { void *p; slab_pad_check(s, page); _
From: Vlastimil Babka <vbabka@suse.cz> Subject: mm, slab/slub: move and improve cache_from_obj() The function cache_from_obj() was added by commit b9ce5ef49f00 ("sl[au]b: always get the cache from its page in kmem_cache_free()") to support kmemcg, where per-memcg cache can be different from the root one, so we can't use the kmem_cache pointer given to kmem_cache_free(). Prior to that commit, SLUB already had debugging check+warning that could be enabled to compare the given kmem_cache pointer to one referenced by the slab page where the object-to-be-freed resides. This check was moved to cache_from_obj(). Later the check was also enabled for SLAB_FREELIST_HARDENED configs by commit 598a0717a816 ("mm/slab: validate cache membership under freelist hardening"). These checks and warnings can be useful especially for the debugging, which can be improved. Commit 598a0717a816 changed the pr_err() with WARN_ON_ONCE() to WARN_ONCE() so only the first hit is now reported, others are silent. This patch changes it to WARN() so that all errors are reported. It's also useful to print SLUB allocation/free tracking info for the offending object, if tracking is enabled. We could export the SLUB print_tracking() function and provide an empty one for SLAB, or realize that both the debugging and hardening cases in cache_from_obj() are only supported by SLUB anyway. So this patch moves cache_from_obj() from slab.h to separate instances in slab.c and slub.c, where the SLAB version only does the kmemcg lookup and even could be completely removed once the kmemcg rework [1] is merged. The SLUB version can thus easily use the print_tracking() function. It can also use the kmem_cache_debug_flags() static key check for improved performance in kernels without the hardening and with debugging not enabled on boot. 
[1] https://lore.kernel.org/r/20200608230654.828134-18-guro@fb.com Link: http://lkml.kernel.org/r/20200610163135.17364-10-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/slab.c | 8 ++++++++ mm/slab.h | 23 ----------------------- mm/slub.c | 21 +++++++++++++++++++++ 3 files changed, 29 insertions(+), 23 deletions(-) --- a/mm/slab.c~mm-slab-slub-move-and-improve-cache_from_obj +++ a/mm/slab.c @@ -3678,6 +3678,14 @@ void *__kmalloc_track_caller(size_t size } EXPORT_SYMBOL(__kmalloc_track_caller); +static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x) +{ + if (memcg_kmem_enabled()) + return virt_to_cache(x); + else + return s; +} + /** * kmem_cache_free - Deallocate an object * @cachep: The cache the allocation was from. --- a/mm/slab.h~mm-slab-slub-move-and-improve-cache_from_obj +++ a/mm/slab.h @@ -504,29 +504,6 @@ static __always_inline void uncharge_sla memcg_uncharge_slab(page, order, s); } -static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x) -{ - struct kmem_cache *cachep; - - /* - * When kmemcg is not being used, both assignments should return the - * same value. but we don't want to pay the assignment price in that - * case. If it is not compiled in, the compiler should be smart enough - * to not do even the assignment. In that case, slab_equal_or_root - * will also be a constant. - */ - if (!memcg_kmem_enabled() && - !IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) && - !unlikely(s->flags & SLAB_CONSISTENCY_CHECKS)) - return s; - - cachep = virt_to_cache(x); - WARN_ONCE(cachep && !slab_equal_or_root(cachep, s), - "%s: Wrong slab cache. 
%s but object is from %s\n", - __func__, s->name, cachep->name); - return cachep; -} - static inline size_t slab_ksize(const struct kmem_cache *s) { #ifndef CONFIG_SLUB --- a/mm/slub.c~mm-slab-slub-move-and-improve-cache_from_obj +++ a/mm/slub.c @@ -1525,6 +1525,10 @@ static bool freelist_corrupted(struct km { return false; } + +static void print_tracking(struct kmem_cache *s, void *object) +{ +} #endif /* CONFIG_SLUB_DEBUG */ /* @@ -3171,6 +3175,23 @@ void ___cache_free(struct kmem_cache *ca } #endif +static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x) +{ + struct kmem_cache *cachep; + + if (!IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) && + !memcg_kmem_enabled() && + !kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS)) + return s; + + cachep = virt_to_cache(x); + if (WARN(cachep && !slab_equal_or_root(cachep, s), + "%s: Wrong slab cache. %s but object is from %s\n", + __func__, s->name, cachep->name)) + print_tracking(cachep, x); + return cachep; +} + void kmem_cache_free(struct kmem_cache *s, void *x) { s = cache_from_obj(s, x); _
From: Vlastimil Babka <vbabka@suse.cz> Subject: mm, slab/slub: improve error reporting and overhead of cache_from_obj() cache_from_obj() was added by commit b9ce5ef49f00 ("sl[au]b: always get the cache from its page in kmem_cache_free()") to support kmemcg, where per-memcg cache can be different from the root one, so we can't use the kmem_cache pointer given to kmem_cache_free(). Prior to that commit, SLUB already had debugging check+warning that could be enabled to compare the given kmem_cache pointer to one referenced by the slab page where the object-to-be-freed resides. This check was moved to cache_from_obj(). Later the check was also enabled for SLAB_FREELIST_HARDENED configs by commit 598a0717a816 ("mm/slab: validate cache membership under freelist hardening"). These checks and warnings can be useful especially for the debugging, which can be improved. Commit 598a0717a816 changed the pr_err() with WARN_ON_ONCE() to WARN_ONCE() so only the first hit is now reported, others are silent. This patch changes it to WARN() so that all errors are reported. It's also useful to print SLUB allocation/free tracking info for the offending object, if tracking is enabled. Thus, export the SLUB print_tracking() function and provide an empty one for SLAB. For SLUB we can also benefit from the static key check in kmem_cache_debug_flags(), but we need to move this function to slab.h and declare the static key there. 
[1] https://lore.kernel.org/r/20200608230654.828134-18-guro@fb.com [vbabka@suse.cz: avoid bogus WARN()] Link: https://lore.kernel.org/r/20200623090213.GW5535@shao2-debian Link: http://lkml.kernel.org/r/b33e0fa7-cd28-4788-9e54-5927846329ef@suse.cz Link: http://lkml.kernel.org/r/afeda7ac-748b-33d8-a905-56b708148ad5@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Roman Gushchin <guro@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Garrett <mjg59@google.com> Cc: Jann Horn <jannh@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/slab.c | 8 -------- mm/slab.h | 45 +++++++++++++++++++++++++++++++++++++++++++++ mm/slub.c | 38 +------------------------------------- 3 files changed, 46 insertions(+), 45 deletions(-) --- a/mm/slab.c~mm-slab-slub-improve-error-reporting-and-overhead-of-cache_from_obj +++ a/mm/slab.c @@ -3678,14 +3678,6 @@ void *__kmalloc_track_caller(size_t size } EXPORT_SYMBOL(__kmalloc_track_caller); -static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x) -{ - if (memcg_kmem_enabled()) - return virt_to_cache(x); - else - return s; -} - /** * kmem_cache_free - Deallocate an object * @cachep: The cache the allocation was from. 
--- a/mm/slab.h~mm-slab-slub-improve-error-reporting-and-overhead-of-cache_from_obj +++ a/mm/slab.h @@ -276,6 +276,34 @@ static inline int cache_vmstat_idx(struc NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE; } +#ifdef CONFIG_SLUB_DEBUG +#ifdef CONFIG_SLUB_DEBUG_ON +DECLARE_STATIC_KEY_TRUE(slub_debug_enabled); +#else +DECLARE_STATIC_KEY_FALSE(slub_debug_enabled); +#endif +extern void print_tracking(struct kmem_cache *s, void *object); +#else +static inline void print_tracking(struct kmem_cache *s, void *object) +{ +} +#endif + +/* + * Returns true if any of the specified slub_debug flags is enabled for the + * cache. Use only for flags parsed by setup_slub_debug() as it also enables + * the static key. + */ +static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t flags) +{ +#ifdef CONFIG_SLUB_DEBUG + VM_WARN_ON_ONCE(!(flags & SLAB_DEBUG_FLAGS)); + if (static_branch_unlikely(&slub_debug_enabled)) + return s->flags & flags; +#endif + return false; +} + #ifdef CONFIG_MEMCG_KMEM /* List of all root caches. */ @@ -504,6 +532,23 @@ static __always_inline void uncharge_sla memcg_uncharge_slab(page, order, s); } +static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x) +{ + struct kmem_cache *cachep; + + if (!IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) && + !memcg_kmem_enabled() && + !kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS)) + return s; + + cachep = virt_to_cache(x); + if (WARN(cachep && !slab_equal_or_root(cachep, s), + "%s: Wrong slab cache. %s but object is from %s\n", + __func__, s->name, cachep->name)) + print_tracking(cachep, x); + return cachep; +} + static inline size_t slab_ksize(const struct kmem_cache *s) { #ifndef CONFIG_SLUB --- a/mm/slub.c~mm-slab-slub-improve-error-reporting-and-overhead-of-cache_from_obj +++ a/mm/slub.c @@ -122,21 +122,6 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabl #endif #endif -/* - * Returns true if any of the specified slub_debug flags is enabled for the - * cache. 
Use only for flags parsed by setup_slub_debug() as it also enables - * the static key. - */ -static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t flags) -{ - VM_WARN_ON_ONCE(!(flags & SLAB_DEBUG_FLAGS)); -#ifdef CONFIG_SLUB_DEBUG - if (static_branch_unlikely(&slub_debug_enabled)) - return s->flags & flags; -#endif - return false; -} - static inline bool kmem_cache_debug(struct kmem_cache *s) { return kmem_cache_debug_flags(s, SLAB_DEBUG_FLAGS); @@ -653,7 +638,7 @@ static void print_track(const char *s, s #endif } -static void print_tracking(struct kmem_cache *s, void *object) +void print_tracking(struct kmem_cache *s, void *object) { unsigned long pr_time = jiffies; if (!(s->flags & SLAB_STORE_USER)) @@ -1525,10 +1510,6 @@ static bool freelist_corrupted(struct km { return false; } - -static void print_tracking(struct kmem_cache *s, void *object) -{ -} #endif /* CONFIG_SLUB_DEBUG */ /* @@ -3175,23 +3156,6 @@ void ___cache_free(struct kmem_cache *ca } #endif -static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x) -{ - struct kmem_cache *cachep; - - if (!IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) && - !memcg_kmem_enabled() && - !kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS)) - return s; - - cachep = virt_to_cache(x); - if (WARN(cachep && !slab_equal_or_root(cachep, s), - "%s: Wrong slab cache. %s but object is from %s\n", - __func__, s->name, cachep->name)) - print_tracking(cachep, x); - return cachep; -} - void kmem_cache_free(struct kmem_cache *s, void *x) { s = cache_from_obj(s, x); _
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: mm/slub.c: drop lockdep_assert_held() from put_map()

There is no point in using lockdep_assert_held() on a lock that is about to
be unlocked.  It works only with lockdep, and lockdep will complain anyway
if spin_unlock() is used on a lock that has not been locked.

Remove the superfluous lockdep_assert_held().

Link: http://lkml.kernel.org/r/20200618201234.795692-2-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slub.c |    2 --
 1 file changed, 2 deletions(-)

--- a/mm/slub.c~slub-drop-lockdep_assert_held-from-put_map
+++ a/mm/slub.c
@@ -473,8 +473,6 @@ static unsigned long *get_map(struct kme
 static void put_map(unsigned long *map) __releases(&object_map_lock)
 {
 	VM_BUG_ON(map != object_map);
-	lockdep_assert_held(&object_map_lock);
-
 	spin_unlock(&object_map_lock);
 }
 
_
From: Marco Elver <elver@google.com>
Subject: mm, kcsan: instrument SLAB/SLUB free with "ASSERT_EXCLUSIVE_ACCESS"

Provide the necessary KCSAN checks to assist with debugging racy
use-after-frees.  While KASAN is more reliable at generally catching such
use-after-frees (due to its use of a quarantine), it can be difficult to
debug racy use-after-frees.  If a reliable reproducer exists, KCSAN can
assist in debugging such issues.

Note: ASSERT_EXCLUSIVE_ACCESS is a convenience wrapper if the size is
simply sizeof(var).  Instead, here we just use __kcsan_check_access()
explicitly to pass the correct size.

Link: http://lkml.kernel.org/r/20200623072653.114563-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slab.c |    5 +++++
 mm/slub.c |    5 +++++
 2 files changed, 10 insertions(+)

--- a/mm/slab.c~mm-kcsan-instrument-slab-slub-free-with-assert_exclusive_access
+++ a/mm/slab.c
@@ -3432,6 +3432,11 @@ static __always_inline void __cache_free
 	if (kasan_slab_free(cachep, objp, _RET_IP_))
 		return;
 
+	/* Use KCSAN to help debug racy use-after-free. */
+	if (!(cachep->flags & SLAB_TYPESAFE_BY_RCU))
+		__kcsan_check_access(objp, cachep->object_size,
+				     KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ASSERT);
+
 	___cache_free(cachep, objp, caller);
 }
 
--- a/mm/slub.c~mm-kcsan-instrument-slab-slub-free-with-assert_exclusive_access
+++ a/mm/slub.c
@@ -1549,6 +1549,11 @@ static __always_inline bool slab_free_ho
 	if (!(s->flags & SLAB_DEBUG_OBJECTS))
 		debug_check_no_obj_freed(x, s->object_size);
 
+	/* Use KCSAN to help debug racy use-after-free. */
+	if (!(s->flags & SLAB_TYPESAFE_BY_RCU))
+		__kcsan_check_access(x, s->object_size,
+				     KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ASSERT);
+
 	/* KASAN might put x into memory quarantine, delaying its reuse */
 	return kasan_slab_free(s, x, _RET_IP_);
 }
_
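The note about passing the correct size can be illustrated with a small userspace sketch. It assumes the convenience wrapper boils down to a check over `sizeof(var)`, as the commit message says. `model_check_access()` and `MODEL_ASSERT_EXCLUSIVE()` are hypothetical stand-ins that only record the size they were given; they are not the KCSAN API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical recorder standing in for __kcsan_check_access(). */
static size_t model_last_size;

static void model_check_access(const void *ptr, size_t size)
{
	(void)ptr;
	model_last_size = size;
}

/* Convenience form: the checked size is always sizeof(var). */
#define MODEL_ASSERT_EXCLUSIVE(var) model_check_access(&(var), sizeof(var))

/* What the wrapper would check when handed the object pointer itself. */
static size_t model_size_via_wrapper(void *objp)
{
	MODEL_ASSERT_EXCLUSIVE(objp);
	return model_last_size;
}

/* What the explicit call checks: the whole object, as in the patch. */
static size_t model_size_via_explicit(void *objp, size_t object_size)
{
	model_check_access(objp, object_size);
	return model_last_size;
}
```

A slab free path only holds a `void *` to the object, so a `sizeof(var)`-based wrapper would cover the pointer variable rather than the `object_size` bytes being freed. That is why the patch calls the explicit form.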
From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: mm/debug_vm_pgtable: add tests validating arch helpers for core MM features

Patch series "mm/debug_vm_pgtable: Add some more tests", v5.

This series adds some more arch page table helper validation tests which
are related to core and advanced memory functions.  This also adds
documentation listing the expected semantics for all page table helpers,
as suggested by Mike Rapoport previously
(https://lkml.org/lkml/2020/1/30/40).

There are many TRANSPARENT_HUGEPAGE and ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD
ifdefs scattered across the test.  But consolidating all the fallback
stubs is not very straightforward because
ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD is not explicitly dependent on
ARCH_HAS_TRANSPARENT_HUGEPAGE.

Tested on arm64 and x86 platforms, but only build tested on all other
platforms enabled through ARCH_HAS_DEBUG_VM_PGTABLE, i.e. powerpc, arc,
s390.  The following failure on arm64, which was mentioned previously,
still exists.  It will be fixed with the upcoming THP migration on arm64
enablement series.

	WARNING .... mm/debug_vm_pgtable.c:860 debug_vm_pgtable+0x940/0xa54
	WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd))))

This patch (of 4):

This adds new tests validating arch page table helpers for the following
core memory features.  These tests create and test specific mapping types
at various page table levels.

1. SPECIAL mapping
2. PROTNONE mapping
3. DEVMAP mapping
4. SOFTDIRTY mapping
5. SWAP mapping
6. MIGRATION mapping
7. HUGETLB mapping
8. THP mapping

Link: http://lkml.kernel.org/r/1594610587-4172-1-git-send-email-anshuman.khandual@arm.com
Link: http://lkml.kernel.org/r/1593996516-7186-1-git-send-email-anshuman.khandual@arm.com
Link: http://lkml.kernel.org/r/1593996516-7186-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Tested-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A.
Shutemov <kirill@shutemov.name> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@kernel.org> Cc: Steven Price <steven.price@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/debug_vm_pgtable.c | 302 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 301 insertions(+), 1 deletion(-) --- a/mm/debug_vm_pgtable.c~mm-debug_vm_pgtable-add-tests-validating-arch-helpers-for-core-mm-features +++ a/mm/debug_vm_pgtable.c @@ -282,6 +282,278 @@ static void __init pmd_populate_tests(st WARN_ON(pmd_bad(pmd)); } +static void __init pte_special_tests(unsigned long pfn, pgprot_t prot) +{ + pte_t pte = pfn_pte(pfn, prot); + + if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) + return; + + WARN_ON(!pte_special(pte_mkspecial(pte))); +} + +static void __init pte_protnone_tests(unsigned long pfn, pgprot_t prot) +{ + pte_t pte = pfn_pte(pfn, prot); + + if (!IS_ENABLED(CONFIG_NUMA_BALANCING)) + return; + + WARN_ON(!pte_protnone(pte)); + WARN_ON(!pte_present(pte)); +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static void __init pmd_protnone_tests(unsigned long pfn, pgprot_t prot) +{ + pmd_t pmd = pmd_mkhuge(pfn_pmd(pfn, prot)); + + if (!IS_ENABLED(CONFIG_NUMA_BALANCING)) + return; + + WARN_ON(!pmd_protnone(pmd)); + WARN_ON(!pmd_present(pmd)); +} +#else /* !CONFIG_TRANSPARENT_HUGEPAGE */ +static void __init pmd_protnone_tests(unsigned long pfn, pgprot_t prot) { } +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ + +#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP +static void __init pte_devmap_tests(unsigned long pfn, pgprot_t prot) +{ + pte_t pte = pfn_pte(pfn, prot); + + WARN_ON(!pte_devmap(pte_mkdevmap(pte))); +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot) +{ + pmd_t pmd = pfn_pmd(pfn, prot); + + WARN_ON(!pmd_devmap(pmd_mkdevmap(pmd))); +} + +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD +static void __init 
pud_devmap_tests(unsigned long pfn, pgprot_t prot) +{ + pud_t pud = pfn_pud(pfn, prot); + + WARN_ON(!pud_devmap(pud_mkdevmap(pud))); +} +#else /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ +static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { } +#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ +#else /* CONFIG_TRANSPARENT_HUGEPAGE */ +static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot) { } +static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { } +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#else +static void __init pte_devmap_tests(unsigned long pfn, pgprot_t prot) { } +static void __init pmd_devmap_tests(unsigned long pfn, pgprot_t prot) { } +static void __init pud_devmap_tests(unsigned long pfn, pgprot_t prot) { } +#endif /* CONFIG_ARCH_HAS_PTE_DEVMAP */ + +static void __init pte_soft_dirty_tests(unsigned long pfn, pgprot_t prot) +{ + pte_t pte = pfn_pte(pfn, prot); + + if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)) + return; + + WARN_ON(!pte_soft_dirty(pte_mksoft_dirty(pte))); + WARN_ON(pte_soft_dirty(pte_clear_soft_dirty(pte))); +} + +static void __init pte_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot) +{ + pte_t pte = pfn_pte(pfn, prot); + + if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)) + return; + + WARN_ON(!pte_swp_soft_dirty(pte_swp_mksoft_dirty(pte))); + WARN_ON(pte_swp_soft_dirty(pte_swp_clear_soft_dirty(pte))); +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static void __init pmd_soft_dirty_tests(unsigned long pfn, pgprot_t prot) +{ + pmd_t pmd = pfn_pmd(pfn, prot); + + if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)) + return; + + WARN_ON(!pmd_soft_dirty(pmd_mksoft_dirty(pmd))); + WARN_ON(pmd_soft_dirty(pmd_clear_soft_dirty(pmd))); +} + +static void __init pmd_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot) +{ + pmd_t pmd = pfn_pmd(pfn, prot); + + if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) || + !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION)) + return; + + 
WARN_ON(!pmd_swp_soft_dirty(pmd_swp_mksoft_dirty(pmd))); + WARN_ON(pmd_swp_soft_dirty(pmd_swp_clear_soft_dirty(pmd))); +} +#else /* !CONFIG_ARCH_HAS_PTE_DEVMAP */ +static void __init pmd_soft_dirty_tests(unsigned long pfn, pgprot_t prot) { } +static void __init pmd_swap_soft_dirty_tests(unsigned long pfn, pgprot_t prot) +{ +} +#endif /* CONFIG_ARCH_HAS_PTE_DEVMAP */ + +static void __init pte_swap_tests(unsigned long pfn, pgprot_t prot) +{ + swp_entry_t swp; + pte_t pte; + + pte = pfn_pte(pfn, prot); + swp = __pte_to_swp_entry(pte); + pte = __swp_entry_to_pte(swp); + WARN_ON(pfn != pte_pfn(pte)); +} + +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION +static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot) +{ + swp_entry_t swp; + pmd_t pmd; + + pmd = pfn_pmd(pfn, prot); + swp = __pmd_to_swp_entry(pmd); + pmd = __swp_entry_to_pmd(swp); + WARN_ON(pfn != pmd_pfn(pmd)); +} +#else /* !CONFIG_ARCH_ENABLE_THP_MIGRATION */ +static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot) { } +#endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */ + +static void __init swap_migration_tests(void) +{ + struct page *page; + swp_entry_t swp; + + if (!IS_ENABLED(CONFIG_MIGRATION)) + return; + /* + * swap_migration_tests() requires a dedicated page as it needs to + * be locked before creating a migration entry from it. Locking the + * page that actually maps kernel text ('start_kernel') can be real + * problematic. Lets allocate a dedicated page explicitly for this + * purpose that will be freed subsequently. + */ + page = alloc_page(GFP_KERNEL); + if (!page) { + pr_err("page allocation failed\n"); + return; + } + + /* + * make_migration_entry() expects given page to be + * locked, otherwise it stumbles upon a BUG_ON(). 
+ */ + __SetPageLocked(page); + swp = make_migration_entry(page, 1); + WARN_ON(!is_migration_entry(swp)); + WARN_ON(!is_write_migration_entry(swp)); + + make_migration_entry_read(&swp); + WARN_ON(!is_migration_entry(swp)); + WARN_ON(is_write_migration_entry(swp)); + + swp = make_migration_entry(page, 0); + WARN_ON(!is_migration_entry(swp)); + WARN_ON(is_write_migration_entry(swp)); + __ClearPageLocked(page); + __free_page(page); +} + +#ifdef CONFIG_HUGETLB_PAGE +static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot) +{ + struct page *page; + pte_t pte; + + /* + * Accessing the page associated with the pfn is safe here, + * as it was previously derived from a real kernel symbol. + */ + page = pfn_to_page(pfn); + pte = mk_huge_pte(page, prot); + + WARN_ON(!huge_pte_dirty(huge_pte_mkdirty(pte))); + WARN_ON(!huge_pte_write(huge_pte_mkwrite(huge_pte_wrprotect(pte)))); + WARN_ON(huge_pte_write(huge_pte_wrprotect(huge_pte_mkwrite(pte)))); + +#ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB + pte = pfn_pte(pfn, prot); + + WARN_ON(!pte_huge(pte_mkhuge(pte))); +#endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */ +} +#else /* !CONFIG_HUGETLB_PAGE */ +static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot) { } +#endif /* CONFIG_HUGETLB_PAGE */ + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static void __init pmd_thp_tests(unsigned long pfn, pgprot_t prot) +{ + pmd_t pmd; + + if (!has_transparent_hugepage()) + return; + + /* + * pmd_trans_huge() and pmd_present() must return positive after + * MMU invalidation with pmd_mkinvalid(). This behavior is an + * optimization for transparent huge page. pmd_trans_huge() must + * be true if pmd_page() returns a valid THP to avoid taking the + * pmd_lock when others walk over non transhuge pmds (i.e. there + * are no THP allocated). Especially when splitting a THP and + * removing the present bit from the pmd, pmd_trans_huge() still + * needs to return true. 
pmd_present() should be true whenever + * pmd_trans_huge() returns true. + */ + pmd = pfn_pmd(pfn, prot); + WARN_ON(!pmd_trans_huge(pmd_mkhuge(pmd))); + +#ifndef __HAVE_ARCH_PMDP_INVALIDATE + WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd)))); + WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd)))); +#endif /* __HAVE_ARCH_PMDP_INVALIDATE */ +} + +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD +static void __init pud_thp_tests(unsigned long pfn, pgprot_t prot) +{ + pud_t pud; + + if (!has_transparent_hugepage()) + return; + + pud = pfn_pud(pfn, prot); + WARN_ON(!pud_trans_huge(pud_mkhuge(pud))); + + /* + * pud_mkinvalid() has been dropped for now. Enable back + * these tests when it comes back with a modified pud_present(). + * + * WARN_ON(!pud_trans_huge(pud_mkinvalid(pud_mkhuge(pud)))); + * WARN_ON(!pud_present(pud_mkinvalid(pud_mkhuge(pud)))); + */ +} +#else /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ +static void __init pud_thp_tests(unsigned long pfn, pgprot_t prot) { } +#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ +#else /* !CONFIG_TRANSPARENT_HUGEPAGE */ +static void __init pmd_thp_tests(unsigned long pfn, pgprot_t prot) { } +static void __init pud_thp_tests(unsigned long pfn, pgprot_t prot) { } +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ + static unsigned long __init get_random_vaddr(void) { unsigned long random_vaddr, random_pages, total_user_pages; @@ -303,7 +575,7 @@ static int __init debug_vm_pgtable(void) pmd_t *pmdp, *saved_pmdp, pmd; pte_t *ptep; pgtable_t saved_ptep; - pgprot_t prot; + pgprot_t prot, protnone; phys_addr_t paddr; unsigned long vaddr, pte_aligned, pmd_aligned; unsigned long pud_aligned, p4d_aligned, pgd_aligned; @@ -319,6 +591,12 @@ static int __init debug_vm_pgtable(void) } /* + * __P000 (or even __S000) will help create page table entries with + * PROT_NONE permission as required for pxx_protnone_tests(). 
+ */ + protnone = __P000; + + /* * PFN for mapping at PTE level is determined from a standard kernel * text symbol. But pfns for higher page table levels are derived by * masking lower bits of this real pfn. These derived pfns might not @@ -373,6 +651,28 @@ static int __init debug_vm_pgtable(void) p4d_populate_tests(mm, p4dp, saved_pudp); pgd_populate_tests(mm, pgdp, saved_p4dp); + pte_special_tests(pte_aligned, prot); + pte_protnone_tests(pte_aligned, protnone); + pmd_protnone_tests(pmd_aligned, protnone); + + pte_devmap_tests(pte_aligned, prot); + pmd_devmap_tests(pmd_aligned, prot); + pud_devmap_tests(pud_aligned, prot); + + pte_soft_dirty_tests(pte_aligned, prot); + pmd_soft_dirty_tests(pmd_aligned, prot); + pte_swap_soft_dirty_tests(pte_aligned, prot); + pmd_swap_soft_dirty_tests(pmd_aligned, prot); + + pte_swap_tests(pte_aligned, prot); + pmd_swap_tests(pmd_aligned, prot); + + swap_migration_tests(); + hugetlb_basic_tests(pte_aligned, prot); + + pmd_thp_tests(pmd_aligned, prot); + pud_thp_tests(pud_aligned, prot); + p4d_free(mm, saved_p4dp); pud_free(mm, saved_pudp); pmd_free(mm, saved_pmdp); _
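Most of the tests in the patch above follow one pattern: build an entry from a pfn, apply a helper, and check that the matching predicate (or the pfn round trip) holds. A minimal userspace sketch of that pattern follows. The 12-bit flag layout, the bit positions, and every `model_*` name are assumptions made up for illustration and do not match any real architecture; failures are counted where the kernel tests would `WARN_ON()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical entry layout: low 12 bits are flags, the rest is the PFN. */
typedef uint64_t model_pte_t;

#define MODEL_PFN_SHIFT      12
#define MODEL_BIT_SPECIAL    (UINT64_C(1) << 3)
#define MODEL_BIT_SOFT_DIRTY (UINT64_C(1) << 4)

static model_pte_t model_pfn_pte(uint64_t pfn, uint64_t prot)
{
	return (pfn << MODEL_PFN_SHIFT) | prot;
}

static uint64_t model_pte_pfn(model_pte_t pte)
{
	return pte >> MODEL_PFN_SHIFT;
}

static model_pte_t model_pte_mkspecial(model_pte_t pte)
{
	return pte | MODEL_BIT_SPECIAL;
}

static bool model_pte_special(model_pte_t pte)
{
	return (pte & MODEL_BIT_SPECIAL) != 0;
}

static model_pte_t model_pte_mksoft_dirty(model_pte_t pte)
{
	return pte | MODEL_BIT_SOFT_DIRTY;
}

static model_pte_t model_pte_clear_soft_dirty(model_pte_t pte)
{
	return pte & ~MODEL_BIT_SOFT_DIRTY;
}

static bool model_pte_soft_dirty(model_pte_t pte)
{
	return (pte & MODEL_BIT_SOFT_DIRTY) != 0;
}

/* Returns the number of checks that would have triggered a WARN_ON(). */
static int model_pte_tests(uint64_t pfn, uint64_t prot)
{
	model_pte_t pte = model_pfn_pte(pfn, prot);
	int failures = 0;

	failures += !model_pte_special(model_pte_mkspecial(pte));
	failures += !model_pte_soft_dirty(model_pte_mksoft_dirty(pte));
	failures += model_pte_soft_dirty(model_pte_clear_soft_dirty(pte));
	failures += model_pte_pfn(pte) != pfn;
	return failures;
}
```

The checks only make sense when `prot` stays within the flag bits, which is the same expectation the kernel tests place on the `pgprot_t` they are given.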
From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: mm/debug_vm_pgtable: add tests validating advanced arch page table helpers

This adds new tests validating the following advanced arch page table
helpers.  These tests create and test specific mapping types at various
page table levels.

1. pxxp_set_wrprotect()
2. pxxp_get_and_clear()
3. pxxp_set_access_flags()
4. pxxp_get_and_clear_full()
5. pxxp_test_and_clear_young()
6. pxx_leaf()
7. pxx_set_huge()
8. pxx_(clear|mk)_savedwrite()
9. huge_pxxp_xxx()

[anshuman.khandual@arm.com: drop RANDOM_ORVALUE from hugetlb_advanced_tests()]
  Link: http://lkml.kernel.org/r/1594610587-4172-3-git-send-email-anshuman.khandual@arm.com
Link: http://lkml.kernel.org/r/1593996516-7186-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Tested-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A.
Shutemov <kirill@shutemov.name> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@kernel.org> Cc: Steven Price <steven.price@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/debug_vm_pgtable.c | 312 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 312 insertions(+) --- a/mm/debug_vm_pgtable.c~mm-debug_vm_pgtable-add-tests-validating-advanced-arch-page-table-helpers +++ a/mm/debug_vm_pgtable.c @@ -21,6 +21,7 @@ #include <linux/module.h> #include <linux/pfn_t.h> #include <linux/printk.h> +#include <linux/pgtable.h> #include <linux/random.h> #include <linux/spinlock.h> #include <linux/swap.h> @@ -28,6 +29,7 @@ #include <linux/start_kernel.h> #include <linux/sched/mm.h> #include <asm/pgalloc.h> +#include <asm/tlbflush.h> #define VMFLAGS (VM_READ|VM_WRITE|VM_EXEC) @@ -55,6 +57,55 @@ static void __init pte_basic_tests(unsig WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte)))); } +static void __init pte_advanced_tests(struct mm_struct *mm, + struct vm_area_struct *vma, pte_t *ptep, + unsigned long pfn, unsigned long vaddr, + pgprot_t prot) +{ + pte_t pte = pfn_pte(pfn, prot); + + pte = pfn_pte(pfn, prot); + set_pte_at(mm, vaddr, ptep, pte); + ptep_set_wrprotect(mm, vaddr, ptep); + pte = ptep_get(ptep); + WARN_ON(pte_write(pte)); + + pte = pfn_pte(pfn, prot); + set_pte_at(mm, vaddr, ptep, pte); + ptep_get_and_clear(mm, vaddr, ptep); + pte = ptep_get(ptep); + WARN_ON(!pte_none(pte)); + + pte = pfn_pte(pfn, prot); + pte = pte_wrprotect(pte); + pte = pte_mkclean(pte); + set_pte_at(mm, vaddr, ptep, pte); + pte = pte_mkwrite(pte); + pte = pte_mkdirty(pte); + ptep_set_access_flags(vma, vaddr, ptep, pte, 1); + pte = ptep_get(ptep); + WARN_ON(!(pte_write(pte) && pte_dirty(pte))); + + pte = pfn_pte(pfn, prot); + set_pte_at(mm, vaddr, ptep, pte); + ptep_get_and_clear_full(mm, vaddr, ptep, 1); + pte = ptep_get(ptep); + WARN_ON(!pte_none(pte)); + + pte = 
pte_mkyoung(pte); + set_pte_at(mm, vaddr, ptep, pte); + ptep_test_and_clear_young(vma, vaddr, ptep); + pte = ptep_get(ptep); + WARN_ON(pte_young(pte)); +} + +static void __init pte_savedwrite_tests(unsigned long pfn, pgprot_t prot) +{ + pte_t pte = pfn_pte(pfn, prot); + + WARN_ON(!pte_savedwrite(pte_mk_savedwrite(pte_clear_savedwrite(pte)))); + WARN_ON(pte_savedwrite(pte_clear_savedwrite(pte_mk_savedwrite(pte)))); +} #ifdef CONFIG_TRANSPARENT_HUGEPAGE static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { @@ -77,6 +128,90 @@ static void __init pmd_basic_tests(unsig WARN_ON(!pmd_bad(pmd_mkhuge(pmd))); } +static void __init pmd_advanced_tests(struct mm_struct *mm, + struct vm_area_struct *vma, pmd_t *pmdp, + unsigned long pfn, unsigned long vaddr, + pgprot_t prot) +{ + pmd_t pmd = pfn_pmd(pfn, prot); + + if (!has_transparent_hugepage()) + return; + + /* Align the address wrt HPAGE_PMD_SIZE */ + vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE; + + pmd = pfn_pmd(pfn, prot); + set_pmd_at(mm, vaddr, pmdp, pmd); + pmdp_set_wrprotect(mm, vaddr, pmdp); + pmd = READ_ONCE(*pmdp); + WARN_ON(pmd_write(pmd)); + + pmd = pfn_pmd(pfn, prot); + set_pmd_at(mm, vaddr, pmdp, pmd); + pmdp_huge_get_and_clear(mm, vaddr, pmdp); + pmd = READ_ONCE(*pmdp); + WARN_ON(!pmd_none(pmd)); + + pmd = pfn_pmd(pfn, prot); + pmd = pmd_wrprotect(pmd); + pmd = pmd_mkclean(pmd); + set_pmd_at(mm, vaddr, pmdp, pmd); + pmd = pmd_mkwrite(pmd); + pmd = pmd_mkdirty(pmd); + pmdp_set_access_flags(vma, vaddr, pmdp, pmd, 1); + pmd = READ_ONCE(*pmdp); + WARN_ON(!(pmd_write(pmd) && pmd_dirty(pmd))); + + pmd = pmd_mkhuge(pfn_pmd(pfn, prot)); + set_pmd_at(mm, vaddr, pmdp, pmd); + pmdp_huge_get_and_clear_full(vma, vaddr, pmdp, 1); + pmd = READ_ONCE(*pmdp); + WARN_ON(!pmd_none(pmd)); + + pmd = pmd_mkyoung(pmd); + set_pmd_at(mm, vaddr, pmdp, pmd); + pmdp_test_and_clear_young(vma, vaddr, pmdp); + pmd = READ_ONCE(*pmdp); + WARN_ON(pmd_young(pmd)); +} + +static void __init pmd_leaf_tests(unsigned long pfn, 
pgprot_t prot) +{ + pmd_t pmd = pfn_pmd(pfn, prot); + + /* + * PMD based THP is a leaf entry. + */ + pmd = pmd_mkhuge(pmd); + WARN_ON(!pmd_leaf(pmd)); +} + +static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot) +{ + pmd_t pmd; + + if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP)) + return; + /* + * X86 defined pmd_set_huge() verifies that the given + * PMD is not a populated non-leaf entry. + */ + WRITE_ONCE(*pmdp, __pmd(0)); + WARN_ON(!pmd_set_huge(pmdp, __pfn_to_phys(pfn), prot)); + WARN_ON(!pmd_clear_huge(pmdp)); + pmd = READ_ONCE(*pmdp); + WARN_ON(!pmd_none(pmd)); +} + +static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot) +{ + pmd_t pmd = pfn_pmd(pfn, prot); + + WARN_ON(!pmd_savedwrite(pmd_mk_savedwrite(pmd_clear_savedwrite(pmd)))); + WARN_ON(pmd_savedwrite(pmd_clear_savedwrite(pmd_mk_savedwrite(pmd)))); +} + #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { @@ -100,12 +235,119 @@ static void __init pud_basic_tests(unsig */ WARN_ON(!pud_bad(pud_mkhuge(pud))); } + +static void __init pud_advanced_tests(struct mm_struct *mm, + struct vm_area_struct *vma, pud_t *pudp, + unsigned long pfn, unsigned long vaddr, + pgprot_t prot) +{ + pud_t pud = pfn_pud(pfn, prot); + + if (!has_transparent_hugepage()) + return; + + /* Align the address wrt HPAGE_PUD_SIZE */ + vaddr = (vaddr & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE; + + set_pud_at(mm, vaddr, pudp, pud); + pudp_set_wrprotect(mm, vaddr, pudp); + pud = READ_ONCE(*pudp); + WARN_ON(pud_write(pud)); + +#ifndef __PAGETABLE_PMD_FOLDED + pud = pfn_pud(pfn, prot); + set_pud_at(mm, vaddr, pudp, pud); + pudp_huge_get_and_clear(mm, vaddr, pudp); + pud = READ_ONCE(*pudp); + WARN_ON(!pud_none(pud)); + + pud = pfn_pud(pfn, prot); + set_pud_at(mm, vaddr, pudp, pud); + pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1); + pud = READ_ONCE(*pudp); + WARN_ON(!pud_none(pud)); +#endif /* __PAGETABLE_PMD_FOLDED */ + pud = pfn_pud(pfn, 
prot); + pud = pud_wrprotect(pud); + pud = pud_mkclean(pud); + set_pud_at(mm, vaddr, pudp, pud); + pud = pud_mkwrite(pud); + pud = pud_mkdirty(pud); + pudp_set_access_flags(vma, vaddr, pudp, pud, 1); + pud = READ_ONCE(*pudp); + WARN_ON(!(pud_write(pud) && pud_dirty(pud))); + + pud = pud_mkyoung(pud); + set_pud_at(mm, vaddr, pudp, pud); + pudp_test_and_clear_young(vma, vaddr, pudp); + pud = READ_ONCE(*pudp); + WARN_ON(pud_young(pud)); +} + +static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot) +{ + pud_t pud = pfn_pud(pfn, prot); + + /* + * PUD based THP is a leaf entry. + */ + pud = pud_mkhuge(pud); + WARN_ON(!pud_leaf(pud)); +} + +static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot) +{ + pud_t pud; + + if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP)) + return; + /* + * X86 defined pud_set_huge() verifies that the given + * PUD is not a populated non-leaf entry. + */ + WRITE_ONCE(*pudp, __pud(0)); + WARN_ON(!pud_set_huge(pudp, __pfn_to_phys(pfn), prot)); + WARN_ON(!pud_clear_huge(pudp)); + pud = READ_ONCE(*pudp); + WARN_ON(!pud_none(pud)); +} #else /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { } +static void __init pud_advanced_tests(struct mm_struct *mm, + struct vm_area_struct *vma, pud_t *pudp, + unsigned long pfn, unsigned long vaddr, + pgprot_t prot) +{ +} +static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot) { } +static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot) +{ +} #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ #else /* !CONFIG_TRANSPARENT_HUGEPAGE */ static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { } static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { } +static void __init pmd_advanced_tests(struct mm_struct *mm, + struct vm_area_struct *vma, pmd_t *pmdp, + unsigned long pfn, unsigned long vaddr, + pgprot_t prot) +{ +} +static void __init 
pud_advanced_tests(struct mm_struct *mm, + struct vm_area_struct *vma, pud_t *pudp, + unsigned long pfn, unsigned long vaddr, + pgprot_t prot) +{ +} +static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot) { } +static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot) { } +static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot) +{ +} +static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot) +{ +} +static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot) { } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ static void __init p4d_basic_tests(unsigned long pfn, pgprot_t prot) @@ -495,8 +737,56 @@ static void __init hugetlb_basic_tests(u WARN_ON(!pte_huge(pte_mkhuge(pte))); #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */ } + +static void __init hugetlb_advanced_tests(struct mm_struct *mm, + struct vm_area_struct *vma, + pte_t *ptep, unsigned long pfn, + unsigned long vaddr, pgprot_t prot) +{ + struct page *page = pfn_to_page(pfn); + pte_t pte = ptep_get(ptep); + unsigned long paddr = __pfn_to_phys(pfn) & PMD_MASK; + + pte = pte_mkhuge(mk_pte(pfn_to_page(PHYS_PFN(paddr)), prot)); + set_huge_pte_at(mm, vaddr, ptep, pte); + barrier(); + WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); + huge_pte_clear(mm, vaddr, ptep, PMD_SIZE); + pte = huge_ptep_get(ptep); + WARN_ON(!huge_pte_none(pte)); + + pte = mk_huge_pte(page, prot); + set_huge_pte_at(mm, vaddr, ptep, pte); + barrier(); + huge_ptep_set_wrprotect(mm, vaddr, ptep); + pte = huge_ptep_get(ptep); + WARN_ON(huge_pte_write(pte)); + + pte = mk_huge_pte(page, prot); + set_huge_pte_at(mm, vaddr, ptep, pte); + barrier(); + huge_ptep_get_and_clear(mm, vaddr, ptep); + pte = huge_ptep_get(ptep); + WARN_ON(!huge_pte_none(pte)); + + pte = mk_huge_pte(page, prot); + pte = huge_pte_wrprotect(pte); + set_huge_pte_at(mm, vaddr, ptep, pte); + barrier(); + pte = huge_pte_mkwrite(pte); + pte = huge_pte_mkdirty(pte); + huge_ptep_set_access_flags(vma, vaddr, ptep, 
pte, 1); + pte = huge_ptep_get(ptep); + WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte))); +} #else /* !CONFIG_HUGETLB_PAGE */ static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot) { } +static void __init hugetlb_advanced_tests(struct mm_struct *mm, + struct vm_area_struct *vma, + pte_t *ptep, unsigned long pfn, + unsigned long vaddr, pgprot_t prot) +{ +} #endif /* CONFIG_HUGETLB_PAGE */ #ifdef CONFIG_TRANSPARENT_HUGEPAGE @@ -568,6 +858,7 @@ static unsigned long __init get_random_v static int __init debug_vm_pgtable(void) { + struct vm_area_struct *vma; struct mm_struct *mm; pgd_t *pgdp; p4d_t *p4dp, *saved_p4dp; @@ -596,6 +887,12 @@ static int __init debug_vm_pgtable(void) */ protnone = __P000; + vma = vm_area_alloc(mm); + if (!vma) { + pr_err("vma allocation failed\n"); + return 1; + } + /* * PFN for mapping at PTE level is determined from a standard kernel * text symbol. But pfns for higher page table levels are derived by @@ -644,6 +941,20 @@ static int __init debug_vm_pgtable(void) p4d_clear_tests(mm, p4dp); pgd_clear_tests(mm, pgdp); + pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot); + pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot); + pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot); + hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot); + + pmd_leaf_tests(pmd_aligned, prot); + pud_leaf_tests(pud_aligned, prot); + + pmd_huge_tests(pmdp, pmd_aligned, prot); + pud_huge_tests(pudp, pud_aligned, prot); + + pte_savedwrite_tests(pte_aligned, prot); + pmd_savedwrite_tests(pmd_aligned, prot); + pte_unmap_unlock(ptep, ptl); pmd_populate_tests(mm, pmdp, saved_ptep); @@ -678,6 +989,7 @@ static int __init debug_vm_pgtable(void) pmd_free(mm, saved_pmdp); pte_free(mm, saved_ptep); + vm_area_free(vma); mm_dec_nr_puds(mm); mm_dec_nr_pmds(mm); mm_dec_nr_ptes(mm); _
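The address adjustment used by pmd_advanced_tests() and pud_advanced_tests() above, `(vaddr & MASK) + SIZE`, rounds down to a huge-page boundary and then steps one huge page up. A small worked sketch follows, assuming a 2 MiB PMD-sized huge page; the `MODEL_*` constants are hypothetical and the real HPAGE_PMD_SIZE depends on the architecture and page size.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration only: a 2 MiB PMD-sized huge page. */
#define MODEL_HPAGE_PMD_SHIFT 21
#define MODEL_HPAGE_PMD_SIZE  (UINT64_C(1) << MODEL_HPAGE_PMD_SHIFT)
#define MODEL_HPAGE_PMD_MASK  (~(MODEL_HPAGE_PMD_SIZE - 1))

/* Same expression as in pmd_advanced_tests(): round down, then step up. */
static uint64_t model_next_hpage(uint64_t vaddr)
{
	return (vaddr & MODEL_HPAGE_PMD_MASK) + MODEL_HPAGE_PMD_SIZE;
}
```

Under this assumption 0x201000 becomes 0x400000. An address that is already aligned, such as 0x200000, also moves to 0x400000, so the expression always yields the next boundary rather than the nearest one.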
From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: mm/debug_vm_pgtable: add debug prints for individual tests

This adds debug prints that list all tests executed on a given platform.
With dynamic debug enabled, the following information is printed during
boot.  For compactness, both the time stamp and the prefix (i.e.
debug_vm_pgtable) are dropped from this sample output.

	[debug_vm_pgtable      ]: Validating architecture page table helpers
	[pte_basic_tests       ]: Validating PTE basic
	[pmd_basic_tests       ]: Validating PMD basic
	[p4d_basic_tests       ]: Validating P4D basic
	[pgd_basic_tests       ]: Validating PGD basic
	[pte_clear_tests       ]: Validating PTE clear
	[pmd_clear_tests       ]: Validating PMD clear
	[pte_advanced_tests    ]: Validating PTE advanced
	[pmd_advanced_tests    ]: Validating PMD advanced
	[hugetlb_advanced_tests]: Validating HugeTLB advanced
	[pmd_leaf_tests        ]: Validating PMD leaf
	[pmd_huge_tests        ]: Validating PMD huge
	[pte_savedwrite_tests  ]: Validating PTE saved write
	[pmd_savedwrite_tests  ]: Validating PMD saved write
	[pmd_populate_tests    ]: Validating PMD populate
	[pte_special_tests     ]: Validating PTE special
	[pte_protnone_tests    ]: Validating PTE protnone
	[pmd_protnone_tests    ]: Validating PMD protnone
	[pte_devmap_tests      ]: Validating PTE devmap
	[pmd_devmap_tests      ]: Validating PMD devmap
	[pte_swap_tests        ]: Validating PTE swap
	[swap_migration_tests  ]: Validating swap migration
	[hugetlb_basic_tests   ]: Validating HugeTLB basic
	[pmd_thp_tests         ]: Validating PMD based THP

Link: http://lkml.kernel.org/r/1593996516-7186-4-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Tested-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc:
Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/debug_vm_pgtable.c | 46 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 45 insertions(+), 1 deletion(-) --- a/mm/debug_vm_pgtable.c~mm-debug_vm_pgtable-add-debug-prints-for-individual-tests +++ a/mm/debug_vm_pgtable.c @@ -8,7 +8,7 @@ * * Author: Anshuman Khandual <anshuman.khandual@arm.com> */ -#define pr_fmt(fmt) "debug_vm_pgtable: %s: " fmt, __func__ +#define pr_fmt(fmt) "debug_vm_pgtable: [%-25s]: " fmt, __func__ #include <linux/gfp.h> #include <linux/highmem.h> @@ -48,6 +48,7 @@ static void __init pte_basic_tests(unsig { pte_t pte = pfn_pte(pfn, prot); + pr_debug("Validating PTE basic\n"); WARN_ON(!pte_same(pte, pte)); WARN_ON(!pte_young(pte_mkyoung(pte_mkold(pte)))); WARN_ON(!pte_dirty(pte_mkdirty(pte_mkclean(pte)))); @@ -64,6 +65,7 @@ static void __init pte_advanced_tests(st { pte_t pte = pfn_pte(pfn, prot); + pr_debug("Validating PTE advanced\n"); pte = pfn_pte(pfn, prot); set_pte_at(mm, vaddr, ptep, pte); ptep_set_wrprotect(mm, vaddr, ptep); @@ -103,6 +105,7 @@ static void __init pte_savedwrite_tests( { pte_t pte = pfn_pte(pfn, prot); + pr_debug("Validating PTE saved write\n"); WARN_ON(!pte_savedwrite(pte_mk_savedwrite(pte_clear_savedwrite(pte)))); WARN_ON(pte_savedwrite(pte_clear_savedwrite(pte_mk_savedwrite(pte)))); } @@ -114,6 +117,7 @@ static void __init pmd_basic_tests(unsig if (!has_transparent_hugepage()) return; + pr_debug("Validating PMD basic\n"); 
WARN_ON(!pmd_same(pmd, pmd)); WARN_ON(!pmd_young(pmd_mkyoung(pmd_mkold(pmd)))); WARN_ON(!pmd_dirty(pmd_mkdirty(pmd_mkclean(pmd)))); @@ -138,6 +142,7 @@ static void __init pmd_advanced_tests(st if (!has_transparent_hugepage()) return; + pr_debug("Validating PMD advanced\n"); /* Align the address wrt HPAGE_PMD_SIZE */ vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE; @@ -180,6 +185,7 @@ static void __init pmd_leaf_tests(unsign { pmd_t pmd = pfn_pmd(pfn, prot); + pr_debug("Validating PMD leaf\n"); /* * PMD based THP is a leaf entry. */ @@ -193,6 +199,8 @@ static void __init pmd_huge_tests(pmd_t if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP)) return; + + pr_debug("Validating PMD huge\n"); /* * X86 defined pmd_set_huge() verifies that the given * PMD is not a populated non-leaf entry. @@ -208,6 +216,7 @@ static void __init pmd_savedwrite_tests( { pmd_t pmd = pfn_pmd(pfn, prot); + pr_debug("Validating PMD saved write\n"); WARN_ON(!pmd_savedwrite(pmd_mk_savedwrite(pmd_clear_savedwrite(pmd)))); WARN_ON(pmd_savedwrite(pmd_clear_savedwrite(pmd_mk_savedwrite(pmd)))); } @@ -220,6 +229,7 @@ static void __init pud_basic_tests(unsig if (!has_transparent_hugepage()) return; + pr_debug("Validating PUD basic\n"); WARN_ON(!pud_same(pud, pud)); WARN_ON(!pud_young(pud_mkyoung(pud_mkold(pud)))); WARN_ON(!pud_write(pud_mkwrite(pud_wrprotect(pud)))); @@ -246,6 +256,7 @@ static void __init pud_advanced_tests(st if (!has_transparent_hugepage()) return; + pr_debug("Validating PUD advanced\n"); /* Align the address wrt HPAGE_PUD_SIZE */ vaddr = (vaddr & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE; @@ -288,6 +299,7 @@ static void __init pud_leaf_tests(unsign { pud_t pud = pfn_pud(pfn, prot); + pr_debug("Validating PUD leaf\n"); /* * PUD based THP is a leaf entry. 
*/ @@ -301,6 +313,8 @@ static void __init pud_huge_tests(pud_t if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP)) return; + + pr_debug("Validating PUD huge\n"); /* * X86 defined pud_set_huge() verifies that the given * PUD is not a populated non-leaf entry. @@ -354,6 +368,7 @@ static void __init p4d_basic_tests(unsig { p4d_t p4d; + pr_debug("Validating P4D basic\n"); memset(&p4d, RANDOM_NZVALUE, sizeof(p4d_t)); WARN_ON(!p4d_same(p4d, p4d)); } @@ -362,6 +377,7 @@ static void __init pgd_basic_tests(unsig { pgd_t pgd; + pr_debug("Validating PGD basic\n"); memset(&pgd, RANDOM_NZVALUE, sizeof(pgd_t)); WARN_ON(!pgd_same(pgd, pgd)); } @@ -374,6 +390,7 @@ static void __init pud_clear_tests(struc if (mm_pmd_folded(mm)) return; + pr_debug("Validating PUD clear\n"); pud = __pud(pud_val(pud) | RANDOM_ORVALUE); WRITE_ONCE(*pudp, pud); pud_clear(pudp); @@ -388,6 +405,8 @@ static void __init pud_populate_tests(st if (mm_pmd_folded(mm)) return; + + pr_debug("Validating PUD populate\n"); /* * This entry points to next level page table page. * Hence this must not qualify as pud_bad(). @@ -414,6 +433,7 @@ static void __init p4d_clear_tests(struc if (mm_pud_folded(mm)) return; + pr_debug("Validating P4D clear\n"); p4d = __p4d(p4d_val(p4d) | RANDOM_ORVALUE); WRITE_ONCE(*p4dp, p4d); p4d_clear(p4dp); @@ -429,6 +449,7 @@ static void __init p4d_populate_tests(st if (mm_pud_folded(mm)) return; + pr_debug("Validating P4D populate\n"); /* * This entry points to next level page table page. * Hence this must not qualify as p4d_bad(). @@ -447,6 +468,7 @@ static void __init pgd_clear_tests(struc if (mm_p4d_folded(mm)) return; + pr_debug("Validating PGD clear\n"); pgd = __pgd(pgd_val(pgd) | RANDOM_ORVALUE); WRITE_ONCE(*pgdp, pgd); pgd_clear(pgdp); @@ -462,6 +484,7 @@ static void __init pgd_populate_tests(st if (mm_p4d_folded(mm)) return; + pr_debug("Validating PGD populate\n"); /* * This entry points to next level page table page. * Hence this must not qualify as pgd_bad(). 
@@ -490,6 +513,7 @@ static void __init pte_clear_tests(struc { pte_t pte = ptep_get(ptep); + pr_debug("Validating PTE clear\n"); pte = __pte(pte_val(pte) | RANDOM_ORVALUE); set_pte_at(mm, vaddr, ptep, pte); barrier(); @@ -502,6 +526,7 @@ static void __init pmd_clear_tests(struc { pmd_t pmd = READ_ONCE(*pmdp); + pr_debug("Validating PMD clear\n"); pmd = __pmd(pmd_val(pmd) | RANDOM_ORVALUE); WRITE_ONCE(*pmdp, pmd); pmd_clear(pmdp); @@ -514,6 +539,7 @@ static void __init pmd_populate_tests(st { pmd_t pmd; + pr_debug("Validating PMD populate\n"); /* * This entry points to next level page table page. * Hence this must not qualify as pmd_bad(). @@ -531,6 +557,7 @@ static void __init pte_special_tests(uns if (!IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) return; + pr_debug("Validating PTE special\n"); WARN_ON(!pte_special(pte_mkspecial(pte))); } @@ -541,6 +568,7 @@ static void __init pte_protnone_tests(un if (!IS_ENABLED(CONFIG_NUMA_BALANCING)) return; + pr_debug("Validating PTE protnone\n"); WARN_ON(!pte_protnone(pte)); WARN_ON(!pte_present(pte)); } @@ -553,6 +581,7 @@ static void __init pmd_protnone_tests(un if (!IS_ENABLED(CONFIG_NUMA_BALANCING)) return; + pr_debug("Validating PMD protnone\n"); WARN_ON(!pmd_protnone(pmd)); WARN_ON(!pmd_present(pmd)); } @@ -565,6 +594,7 @@ static void __init pte_devmap_tests(unsi { pte_t pte = pfn_pte(pfn, prot); + pr_debug("Validating PTE devmap\n"); WARN_ON(!pte_devmap(pte_mkdevmap(pte))); } @@ -573,6 +603,7 @@ static void __init pmd_devmap_tests(unsi { pmd_t pmd = pfn_pmd(pfn, prot); + pr_debug("Validating PMD devmap\n"); WARN_ON(!pmd_devmap(pmd_mkdevmap(pmd))); } @@ -581,6 +612,7 @@ static void __init pud_devmap_tests(unsi { pud_t pud = pfn_pud(pfn, prot); + pr_debug("Validating PUD devmap\n"); WARN_ON(!pud_devmap(pud_mkdevmap(pud))); } #else /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ @@ -603,6 +635,7 @@ static void __init pte_soft_dirty_tests( if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)) return; + pr_debug("Validating PTE soft 
dirty\n"); WARN_ON(!pte_soft_dirty(pte_mksoft_dirty(pte))); WARN_ON(pte_soft_dirty(pte_clear_soft_dirty(pte))); } @@ -614,6 +647,7 @@ static void __init pte_swap_soft_dirty_t if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)) return; + pr_debug("Validating PTE swap soft dirty\n"); WARN_ON(!pte_swp_soft_dirty(pte_swp_mksoft_dirty(pte))); WARN_ON(pte_swp_soft_dirty(pte_swp_clear_soft_dirty(pte))); } @@ -626,6 +660,7 @@ static void __init pmd_soft_dirty_tests( if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)) return; + pr_debug("Validating PMD soft dirty\n"); WARN_ON(!pmd_soft_dirty(pmd_mksoft_dirty(pmd))); WARN_ON(pmd_soft_dirty(pmd_clear_soft_dirty(pmd))); } @@ -638,6 +673,7 @@ static void __init pmd_swap_soft_dirty_t !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION)) return; + pr_debug("Validating PMD swap soft dirty\n"); WARN_ON(!pmd_swp_soft_dirty(pmd_swp_mksoft_dirty(pmd))); WARN_ON(pmd_swp_soft_dirty(pmd_swp_clear_soft_dirty(pmd))); } @@ -653,6 +689,7 @@ static void __init pte_swap_tests(unsign swp_entry_t swp; pte_t pte; + pr_debug("Validating PTE swap\n"); pte = pfn_pte(pfn, prot); swp = __pte_to_swp_entry(pte); pte = __swp_entry_to_pte(swp); @@ -665,6 +702,7 @@ static void __init pmd_swap_tests(unsign swp_entry_t swp; pmd_t pmd; + pr_debug("Validating PMD swap\n"); pmd = pfn_pmd(pfn, prot); swp = __pmd_to_swp_entry(pmd); pmd = __swp_entry_to_pmd(swp); @@ -681,6 +719,8 @@ static void __init swap_migration_tests( if (!IS_ENABLED(CONFIG_MIGRATION)) return; + + pr_debug("Validating swap migration\n"); /* * swap_migration_tests() requires a dedicated page as it needs to * be locked before creating a migration entry from it. Locking the @@ -720,6 +760,7 @@ static void __init hugetlb_basic_tests(u struct page *page; pte_t pte; + pr_debug("Validating HugeTLB basic\n"); /* * Accessing the page associated with the pfn is safe here, * as it was previously derived from a real kernel symbol. 
@@ -747,6 +788,7 @@ static void __init hugetlb_advanced_test pte_t pte = ptep_get(ptep); unsigned long paddr = __pfn_to_phys(pfn) & PMD_MASK; + pr_debug("Validating HugeTLB advanced\n"); pte = pte_mkhuge(mk_pte(pfn_to_page(PHYS_PFN(paddr)), prot)); set_huge_pte_at(mm, vaddr, ptep, pte); barrier(); @@ -797,6 +839,7 @@ static void __init pmd_thp_tests(unsigne if (!has_transparent_hugepage()) return; + pr_debug("Validating PMD based THP\n"); /* * pmd_trans_huge() and pmd_present() must return positive after * MMU invalidation with pmd_mkinvalid(). This behavior is an @@ -825,6 +868,7 @@ static void __init pud_thp_tests(unsigne if (!has_transparent_hugepage()) return; + pr_debug("Validating PUD based THP\n"); pud = pfn_pud(pfn, prot); WARN_ON(!pud_trans_huge(pud_mkhuge(pud))); _
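As an illustration of the pr_fmt() change in the patch above, here is a minimal userspace sketch, assuming ordinary snprintf() semantics, of how "[%-25s]" left-justifies the function name and pads it to 25 columns. format_debug_line() is a hypothetical helper and not part of the kernel; in the kernel, pr_debug() expands pr_fmt() itself and supplies __func__.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for the patch's
 *   #define pr_fmt(fmt) "debug_vm_pgtable: [%-25s]: " fmt, __func__
 * "%-25s" left-justifies the name and pads it with spaces to 25 columns,
 * so the closing bracket lines up for every test function.
 */
static int format_debug_line(char *buf, size_t len,
			     const char *func, const char *msg)
{
	int n = snprintf(buf, len, "debug_vm_pgtable: [%-25s]: %s", func, msg);

	/* snprintf() returns a negative value only on an output error. */
	assert(n >= 0);
	return n;
}
```

Calling format_debug_line(buf, sizeof(buf), "pte_basic_tests", "Validating PTE basic\n") fills buf with the prefix, the name padded to 25 columns inside the brackets, and the message.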
From: Anshuman Khandual <anshuman.khandual@arm.com> Subject: Documentation/mm: add descriptions for arch page table helpers This adds a specific description file for all arch page table helpers which is in sync with the semantics being tested via CONFIG_DEBUG_VM_PGTABLE. All future changes either to these descriptions here or the debug test should always remain in sync. [anshuman.khandual@arm.com: fold in Mike's patch for the rst document, fix typos in the rst document] Link: http://lkml.kernel.org/r/1594610587-4172-5-git-send-email-anshuman.khandual@arm.com Link: http://lkml.kernel.org/r/1593996516-7186-5-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Suggested-by: Mike Rapoport <rppt@kernel.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- Documentation/vm/arch_pgtable_helpers.rst | 258 ++++++++++++++++++++ mm/debug_vm_pgtable.c | 6 2 files changed, 264 insertions(+) --- /dev/null +++ a/Documentation/vm/arch_pgtable_helpers.rst @@ -0,0 +1,258 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. 
_arch_page_table_helpers: + +=============================== +Architecture Page Table Helpers +=============================== + +Generic MM expects architectures (with MMU) to provide helpers to create, access +and modify page table entries at various levels for different memory functions. +These page table helpers need to conform to common semantics across platforms. +The following tables describe the expected semantics, which can also be tested during +boot via the CONFIG_DEBUG_VM_PGTABLE option. All future changes in here or the debug +test need to be in sync. + +====================== +PTE Page Table Helpers +====================== + ++---------------------------+--------------------------------------------------+ +| pte_same | Tests whether both PTE entries are the same | ++---------------------------+--------------------------------------------------+ +| pte_bad | Tests a non-table mapped PTE | ++---------------------------+--------------------------------------------------+ +| pte_present | Tests a valid mapped PTE | ++---------------------------+--------------------------------------------------+ +| pte_young | Tests a young PTE | ++---------------------------+--------------------------------------------------+ +| pte_dirty | Tests a dirty PTE | ++---------------------------+--------------------------------------------------+ +| pte_write | Tests a writable PTE | ++---------------------------+--------------------------------------------------+ +| pte_special | Tests a special PTE | ++---------------------------+--------------------------------------------------+ +| pte_protnone | Tests a PROT_NONE PTE | ++---------------------------+--------------------------------------------------+ +| pte_devmap | Tests a ZONE_DEVICE mapped PTE | ++---------------------------+--------------------------------------------------+ +| pte_soft_dirty | Tests a soft dirty PTE | ++---------------------------+--------------------------------------------------+ +| pte_swp_soft_dirty | 
Tests a soft dirty swapped PTE | ++---------------------------+--------------------------------------------------+ +| pte_mkyoung | Creates a young PTE | ++---------------------------+--------------------------------------------------+ +| pte_mkold | Creates an old PTE | ++---------------------------+--------------------------------------------------+ +| pte_mkdirty | Creates a dirty PTE | ++---------------------------+--------------------------------------------------+ +| pte_mkclean | Creates a clean PTE | ++---------------------------+--------------------------------------------------+ +| pte_mkwrite | Creates a writable PTE | ++---------------------------+--------------------------------------------------+ +| pte_mkwrprotect | Creates a write protected PTE | ++---------------------------+--------------------------------------------------+ +| pte_mkspecial | Creates a special PTE | ++---------------------------+--------------------------------------------------+ +| pte_mkdevmap | Creates a ZONE_DEVICE mapped PTE | ++---------------------------+--------------------------------------------------+ +| pte_mksoft_dirty | Creates a soft dirty PTE | ++---------------------------+--------------------------------------------------+ +| pte_clear_soft_dirty | Clears a soft dirty PTE | ++---------------------------+--------------------------------------------------+ +| pte_swp_mksoft_dirty | Creates a soft dirty swapped PTE | ++---------------------------+--------------------------------------------------+ +| pte_swp_clear_soft_dirty | Clears a soft dirty swapped PTE | ++---------------------------+--------------------------------------------------+ +| pte_mknotpresent | Invalidates a mapped PTE | ++---------------------------+--------------------------------------------------+ +| ptep_get_and_clear | Clears a PTE | ++---------------------------+--------------------------------------------------+ +| ptep_get_and_clear_full | Clears a PTE | 
++---------------------------+--------------------------------------------------+ +| ptep_test_and_clear_young | Clears young from a PTE | ++---------------------------+--------------------------------------------------+ +| ptep_set_wrprotect | Converts into a write protected PTE | ++---------------------------+--------------------------------------------------+ +| ptep_set_access_flags | Converts into a more permissive PTE | ++---------------------------+--------------------------------------------------+ + +====================== +PMD Page Table Helpers +====================== + ++---------------------------+--------------------------------------------------+ +| pmd_same | Tests whether both PMD entries are the same | ++---------------------------+--------------------------------------------------+ +| pmd_bad | Tests a non-table mapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_leaf | Tests a leaf mapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_huge | Tests a HugeTLB mapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_trans_huge | Tests a Transparent Huge Page (THP) at PMD | ++---------------------------+--------------------------------------------------+ +| pmd_present | Tests a valid mapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_young | Tests a young PMD | ++---------------------------+--------------------------------------------------+ +| pmd_dirty | Tests a dirty PMD | ++---------------------------+--------------------------------------------------+ +| pmd_write | Tests a writable PMD | ++---------------------------+--------------------------------------------------+ +| pmd_special | Tests a special PMD | ++---------------------------+--------------------------------------------------+ +| pmd_protnone | Tests a PROT_NONE PMD | 
++---------------------------+--------------------------------------------------+ +| pmd_devmap | Tests a ZONE_DEVICE mapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_soft_dirty | Tests a soft dirty PMD | ++---------------------------+--------------------------------------------------+ +| pmd_swp_soft_dirty | Tests a soft dirty swapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkyoung | Creates a young PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkold | Creates an old PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkdirty | Creates a dirty PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkclean | Creates a clean PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkwrite | Creates a writable PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkwrprotect | Creates a write protected PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkspecial | Creates a special PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mkdevmap | Creates a ZONE_DEVICE mapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_mksoft_dirty | Creates a soft dirty PMD | ++---------------------------+--------------------------------------------------+ +| pmd_clear_soft_dirty | Clears a soft dirty PMD | ++---------------------------+--------------------------------------------------+ +| pmd_swp_mksoft_dirty | Creates a soft dirty swapped PMD | ++---------------------------+--------------------------------------------------+ +| pmd_swp_clear_soft_dirty | Clears a soft dirty swapped PMD | 
++---------------------------+--------------------------------------------------+ +| pmd_mkinvalid | Invalidates a mapped PMD [1] | ++---------------------------+--------------------------------------------------+ +| pmd_set_huge | Creates a PMD huge mapping | ++---------------------------+--------------------------------------------------+ +| pmd_clear_huge | Clears a PMD huge mapping | ++---------------------------+--------------------------------------------------+ +| pmdp_get_and_clear | Clears a PMD | ++---------------------------+--------------------------------------------------+ +| pmdp_get_and_clear_full | Clears a PMD | ++---------------------------+--------------------------------------------------+ +| pmdp_test_and_clear_young | Clears young from a PMD | ++---------------------------+--------------------------------------------------+ +| pmdp_set_wrprotect | Converts into a write protected PMD | ++---------------------------+--------------------------------------------------+ +| pmdp_set_access_flags | Converts into a more permissive PMD | ++---------------------------+--------------------------------------------------+ + +====================== +PUD Page Table Helpers +====================== + ++---------------------------+--------------------------------------------------+ +| pud_same | Tests whether both PUD entries are the same | ++---------------------------+--------------------------------------------------+ +| pud_bad | Tests a non-table mapped PUD | ++---------------------------+--------------------------------------------------+ +| pud_leaf | Tests a leaf mapped PUD | ++---------------------------+--------------------------------------------------+ +| pud_huge | Tests a HugeTLB mapped PUD | ++---------------------------+--------------------------------------------------+ +| pud_trans_huge | Tests a Transparent Huge Page (THP) at PUD | ++---------------------------+--------------------------------------------------+ +| pud_present | Tests a 
valid mapped PUD | ++---------------------------+--------------------------------------------------+ +| pud_young | Tests a young PUD | ++---------------------------+--------------------------------------------------+ +| pud_dirty | Tests a dirty PUD | ++---------------------------+--------------------------------------------------+ +| pud_write | Tests a writable PUD | ++---------------------------+--------------------------------------------------+ +| pud_devmap | Tests a ZONE_DEVICE mapped PUD | ++---------------------------+--------------------------------------------------+ +| pud_mkyoung | Creates a young PUD | ++---------------------------+--------------------------------------------------+ +| pud_mkold | Creates an old PUD | ++---------------------------+--------------------------------------------------+ +| pud_mkdirty | Creates a dirty PUD | ++---------------------------+--------------------------------------------------+ +| pud_mkclean | Creates a clean PUD | ++---------------------------+--------------------------------------------------+ +| pud_mkwrite | Creates a writable PUD | ++---------------------------+--------------------------------------------------+ +| pud_mkwrprotect | Creates a write protected PUD | ++---------------------------+--------------------------------------------------+ +| pud_mkdevmap | Creates a ZONE_DEVICE mapped PUD | ++---------------------------+--------------------------------------------------+ +| pud_mkinvalid | Invalidates a mapped PUD [1] | ++---------------------------+--------------------------------------------------+ +| pud_set_huge | Creates a PUD huge mapping | ++---------------------------+--------------------------------------------------+ +| pud_clear_huge | Clears a PUD huge mapping | ++---------------------------+--------------------------------------------------+ +| pudp_get_and_clear | Clears a PUD | ++---------------------------+--------------------------------------------------+ +| pudp_get_and_clear_full 
| Clears a PUD | ++---------------------------+--------------------------------------------------+ +| pudp_test_and_clear_young | Clears young from a PUD | ++---------------------------+--------------------------------------------------+ +| pudp_set_wrprotect | Converts into a write protected PUD | ++---------------------------+--------------------------------------------------+ +| pudp_set_access_flags | Converts into a more permissive PUD | ++---------------------------+--------------------------------------------------+ + +========================== +HugeTLB Page Table Helpers +========================== + ++---------------------------+--------------------------------------------------+ +| pte_huge | Tests a HugeTLB | ++---------------------------+--------------------------------------------------+ +| pte_mkhuge | Creates a HugeTLB | ++---------------------------+--------------------------------------------------+ +| huge_pte_dirty | Tests a dirty HugeTLB | ++---------------------------+--------------------------------------------------+ +| huge_pte_write | Tests a writable HugeTLB | ++---------------------------+--------------------------------------------------+ +| huge_pte_mkdirty | Creates a dirty HugeTLB | ++---------------------------+--------------------------------------------------+ +| huge_pte_mkwrite | Creates a writable HugeTLB | ++---------------------------+--------------------------------------------------+ +| huge_pte_mkwrprotect | Creates a write protected HugeTLB | ++---------------------------+--------------------------------------------------+ +| huge_ptep_get_and_clear | Clears a HugeTLB | ++---------------------------+--------------------------------------------------+ +| huge_ptep_set_wrprotect | Converts into a write protected HugeTLB | ++---------------------------+--------------------------------------------------+ +| huge_ptep_set_access_flags | Converts into a more permissive HugeTLB | 
++---------------------------+--------------------------------------------------+ + +======================== +SWAP Page Table Helpers +======================== + ++---------------------------+--------------------------------------------------+ +| __pte_to_swp_entry | Creates a swapped entry (arch) from a mapped PTE | ++---------------------------+--------------------------------------------------+ +| __swp_entry_to_pte | Creates a mapped PTE from a swapped entry (arch) | ++---------------------------+--------------------------------------------------+ +| __pmd_to_swp_entry | Creates a swapped entry (arch) from a mapped PMD | ++---------------------------+--------------------------------------------------+ +| __swp_entry_to_pmd | Creates a mapped PMD from a swapped entry (arch) | ++---------------------------+--------------------------------------------------+ +| is_migration_entry | Tests a migration (read or write) swapped entry | ++---------------------------+--------------------------------------------------+ +| is_write_migration_entry | Tests a write migration swapped entry | ++---------------------------+--------------------------------------------------+ +| make_migration_entry_read | Converts into read migration swapped entry | ++---------------------------+--------------------------------------------------+ +| make_migration_entry | Creates a migration swapped entry (read or write)| ++---------------------------+--------------------------------------------------+ + +[1] https://lore.kernel.org/linux-mm/20181017020930.GN30832@redhat.com/ --- a/mm/debug_vm_pgtable.c~documentation-mm-add-descriptions-for-arch-page-table-helpers +++ a/mm/debug_vm_pgtable.c @@ -31,6 +31,12 @@ #include <asm/pgalloc.h> #include <asm/tlbflush.h> +/* + * Please refer to Documentation/vm/arch_pgtable_helpers.rst for the semantics + * expectations that are being validated here. All future changes in here + * or the documentation need to be in sync. 
+ */ + #define VMFLAGS (VM_READ|VM_WRITE|VM_EXEC) /* _
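The "Tests ..." and "Creates ..." rows in the tables above pair up: a test helper is expected to report what the matching create helper set. Below is a minimal userspace sketch of that round-trip pattern, in the shape of the WARN_ON() checks in pte_basic_tests(). Everything prefixed toy_ or TOY_ is hypothetical, including the bit positions; real architectures define their own PTE layouts and helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy PTE; the bit positions are invented for illustration. */
typedef struct { uint64_t val; } toy_pte_t;

#define TOY_PTE_YOUNG	(UINT64_C(1) << 0)
#define TOY_PTE_DIRTY	(UINT64_C(1) << 1)

/* "Creates ..." helpers: return a modified copy, never touch the input. */
static toy_pte_t toy_pte_mkyoung(toy_pte_t pte) { pte.val |= TOY_PTE_YOUNG; return pte; }
static toy_pte_t toy_pte_mkold(toy_pte_t pte)   { pte.val &= ~TOY_PTE_YOUNG; return pte; }
static toy_pte_t toy_pte_mkdirty(toy_pte_t pte) { pte.val |= TOY_PTE_DIRTY; return pte; }
static toy_pte_t toy_pte_mkclean(toy_pte_t pte) { pte.val &= ~TOY_PTE_DIRTY; return pte; }

/* "Tests ..." helpers: report a single property of the entry. */
static bool toy_pte_young(toy_pte_t pte) { return (pte.val & TOY_PTE_YOUNG) != 0; }
static bool toy_pte_dirty(toy_pte_t pte) { return (pte.val & TOY_PTE_DIRTY) != 0; }
static bool toy_pte_same(toy_pte_t a, toy_pte_t b) { return a.val == b.val; }

/* Same shape as pte_basic_tests(), with assert() standing in for WARN_ON(). */
static void toy_pte_basic_tests(toy_pte_t pte)
{
	assert(toy_pte_same(pte, pte));
	assert(toy_pte_young(toy_pte_mkyoung(toy_pte_mkold(pte))));
	assert(toy_pte_dirty(toy_pte_mkdirty(toy_pte_mkclean(pte))));
	/* Converse checks, added here for illustration. */
	assert(!toy_pte_young(toy_pte_mkold(toy_pte_mkyoung(pte))));
	assert(!toy_pte_dirty(toy_pte_mkclean(toy_pte_mkdirty(pte))));
}
```

The nesting order matters: the outermost create helper decides the expected result, so each check also confirms that the helper overrides whatever state the inner call left behind.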
From: "Matthew Wilcox (Oracle)" <willy@infradead.org> Subject: mm/debug: handle page->mapping better in dump_page Patch series "Improvements for dump_page()", v2. Here's a sample dump of a pagecache tail page with all of the patches applied: page:000000006d1c49ca refcount:6 mapcount:0 mapping:00000000136b8d90 index:0x109 pfn:0x6c645 head:000000008bd38076 order:2 compound_mapcount:0 compound_pincount:0 aops:xfs_address_space_operations ino:800042 dentry name:"fd" flags: 0x4000000000012014(uptodate|lru|private|head) raw: 4000000000000000 ffffd46ac1b19101 ffffffff00000202 dead000000000004 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 head: 4000000000012014 ffffd46ac1b1bbc8 ffffd46ac1b1bc08 ffff91976f659560 head: 0000000000000108 ffff919773220680 00000006ffffffff 0000000000000000 page dumped because: testing This patch (of 6): If we can't call page_mapping() to get the page mapping, handle the anon/ksm/movable bits correctly. [akpm@linux-foundation.org: augmented code comment from John] Link: http://lkml.kernel.org/r/15cff11a-6762-8a6a-3f0e-dd227280cd6f@nvidia.com Link: http://lkml.kernel.org/r/20200709202117.7216-1-willy@infradead.org Link: http://lkml.kernel.org/r/20200709202117.7216-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: William Kucharski <william.kucharski@oracle.com> Cc: "Kirill A. 
Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/debug.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) --- a/mm/debug.c~mm-handle-page-mapping-better-in-dump_page +++ a/mm/debug.c @@ -69,8 +69,19 @@ void __dump_page(struct page *page, cons } if (page < head || (page >= head + MAX_ORDER_NR_PAGES)) { - /* Corrupt page, cannot call page_mapping */ - mapping = page->mapping; + /* + * Corrupt page, so we cannot call page_mapping. Instead, do a + * safe subset of the steps that page_mapping() does. Caution: + * this will be misleading for tail pages, PageSwapCache pages, + * and potentially other situations. (See the page_mapping() + * implementation for what's missing here.) + */ + unsigned long tmp = (unsigned long)page->mapping; + + if (tmp & PAGE_MAPPING_ANON) + mapping = NULL; + else + mapping = (void *)(tmp & ~PAGE_MAPPING_FLAGS); head = page; compound = false; } else { _
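The fallback added to __dump_page() above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel code: the PAGE_MAPPING_* values are assumed to mirror include/linux/page-flags.h of this era (the two low bits of page->mapping are flag bits, with KSM setting both), and decode_raw_mapping() is a hypothetical name for the logic the patch open-codes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror include/linux/page-flags.h of this era. */
#define PAGE_MAPPING_ANON	((uintptr_t)0x1)
#define PAGE_MAPPING_MOVABLE	((uintptr_t)0x2)
#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)

/*
 * Hypothetical helper: anon (and KSM) pages do not carry an address_space
 * pointer, so report NULL for them; otherwise strip the flag bits to
 * recover the pointer value.
 */
static void *decode_raw_mapping(uintptr_t raw)
{
	if (raw & PAGE_MAPPING_ANON)
		return NULL;
	return (void *)(raw & ~PAGE_MAPPING_FLAGS);
}

/* Spot checks; the address 0x1000 is made up. */
static void decode_raw_mapping_examples(void)
{
	void *plain = (void *)(uintptr_t)0x1000;

	assert(decode_raw_mapping(0x1000) == plain);
	assert(decode_raw_mapping(0x1000 | PAGE_MAPPING_MOVABLE) == plain);
	assert(decode_raw_mapping(0x1000 | PAGE_MAPPING_ANON) == NULL);
	assert(decode_raw_mapping(0x1000 | PAGE_MAPPING_FLAGS) == NULL);
}
```

As the patch comment cautions, this covers only a safe subset of what page_mapping() does, so the result can still mislead for tail pages and PageSwapCache pages.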
From: "Matthew Wilcox (Oracle)" <willy@infradead.org> Subject: mm/debug: dump compound page information on a second line Simplify both the implementation and the output by splitting all the compound page information onto a second line. Link: http://lkml.kernel.org/r/20200709202117.7216-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Tested-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/debug.c | 30 ++++++++++++------------------ 1 file changed, 12 insertions(+), 18 deletions(-) --- a/mm/debug.c~mm-dump-compound-page-information-on-a-second-line +++ a/mm/debug.c @@ -95,27 +95,21 @@ void __dump_page(struct page *page, cons */ mapcount = PageSlab(head) ? 
0 : page_mapcount(page); - if (compound) + pr_warn("page:%px refcount:%d mapcount:%d mapping:%p index:%#lx\n", + page, page_ref_count(head), mapcount, mapping, + page_to_pgoff(page)); + if (compound) { if (hpage_pincount_available(page)) { - pr_warn("page:%px refcount:%d mapcount:%d mapping:%p " - "index:%#lx head:%px order:%u " - "compound_mapcount:%d compound_pincount:%d\n", - page, page_ref_count(head), mapcount, - mapping, page_to_pgoff(page), head, - compound_order(head), compound_mapcount(page), - compound_pincount(page)); + pr_warn("head:%px order:%u compound_mapcount:%d compound_pincount:%d\n", + head, compound_order(head), + compound_mapcount(head), + compound_pincount(head)); } else { - pr_warn("page:%px refcount:%d mapcount:%d mapping:%p " - "index:%#lx head:%px order:%u " - "compound_mapcount:%d\n", - page, page_ref_count(head), mapcount, - mapping, page_to_pgoff(page), head, - compound_order(head), compound_mapcount(page)); + pr_warn("head:%px order:%u compound_mapcount:%d\n", + head, compound_order(head), + compound_mapcount(head)); } - else - pr_warn("page:%px refcount:%d mapcount:%d mapping:%p index:%#lx\n", - page, page_ref_count(page), mapcount, - mapping, page_to_pgoff(page)); + } if (PageKsm(page)) type = "ksm "; else if (PageAnon(page)) _
From: "Matthew Wilcox (Oracle)" <willy@infradead.org> Subject: mm/debug: print head flags in dump_page Tail page flags contain very little useful information. Print the head page's flags instead. While the flags will contain "head" for tail pages, this should not be too confusing as the previous line starts with the word "head:" and so the flags should be interpreted as belonging to the head page. Link: http://lkml.kernel.org/r/20200709202117.7216-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/debug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/debug.c~mm-print-head-flags-in-dump_page +++ a/mm/debug.c @@ -168,7 +168,7 @@ void __dump_page(struct page *page, cons out_mapping: BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); - pr_warn("%sflags: %#lx(%pGp)%s\n", type, page->flags, &page->flags, + pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags, page_cma ? " CMA" : ""); hex_only: _
From: "Matthew Wilcox (Oracle)" <willy@infradead.org> Subject: mm/debug: switch dump_page to get_kernel_nofault This is simpler to use than copy_from_kernel_nofault(). Also make some of the related error messages less verbose. Link: http://lkml.kernel.org/r/20200709202117.7216-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/debug.c | 36 ++++++++++++++++-------------------- 1 file changed, 16 insertions(+), 20 deletions(-) --- a/mm/debug.c~mm-switch-dump_page-to-get_kernel_nofault +++ a/mm/debug.c @@ -115,54 +115,50 @@ void __dump_page(struct page *page, cons else if (PageAnon(page)) type = "anon "; else if (mapping) { - const struct inode *host; + struct inode *host; const struct address_space_operations *a_ops; - const struct hlist_node *dentry_first; - const struct dentry *dentry_ptr; + struct hlist_node *dentry_first; + struct dentry *dentry_ptr; struct dentry dentry; /* * mapping can be invalid pointer and we don't want to crash * accessing it, so probe everything depending on it carefully */ - if (copy_from_kernel_nofault(&host, &mapping->host, - sizeof(struct inode *)) || - copy_from_kernel_nofault(&a_ops, &mapping->a_ops, - sizeof(struct address_space_operations *))) { - pr_warn("failed to read mapping->host or a_ops, mapping not a valid kernel address?\n"); + if (get_kernel_nofault(host, &mapping->host) || + get_kernel_nofault(a_ops, &mapping->a_ops)) { + pr_warn("failed to read mapping contents, not a valid kernel address?\n"); goto out_mapping; } if (!host) { - pr_warn("mapping->a_ops:%ps\n", a_ops); + pr_warn("aops:%ps\n", a_ops); goto out_mapping; } - if (copy_from_kernel_nofault(&dentry_first, - &host->i_dentry.first, sizeof(struct 
hlist_node *))) { - pr_warn("mapping->a_ops:%ps with invalid mapping->host inode address %px\n", - a_ops, host); + if (get_kernel_nofault(dentry_first, &host->i_dentry.first)) { + pr_warn("aops:%ps with invalid host inode %px\n", + a_ops, host); goto out_mapping; } if (!dentry_first) { - pr_warn("mapping->a_ops:%ps\n", a_ops); + pr_warn("aops:%ps\n", a_ops); goto out_mapping; } dentry_ptr = container_of(dentry_first, struct dentry, d_u.d_alias); - if (copy_from_kernel_nofault(&dentry, dentry_ptr, - sizeof(struct dentry))) { - pr_warn("mapping->aops:%ps with invalid mapping->host->i_dentry.first %px\n", - a_ops, dentry_ptr); + if (get_kernel_nofault(dentry, dentry_ptr)) { + pr_warn("aops:%ps with invalid dentry %px\n", a_ops, + dentry_ptr); } else { /* * if dentry is corrupted, the %pd handler may still * crash, but it's unlikely that we reach here with a * corrupted struct page */ - pr_warn("mapping->aops:%ps dentry name:\"%pd\"\n", - a_ops, &dentry); + pr_warn("aops:%ps dentry name:\"%pd\"\n", a_ops, + &dentry); } } out_mapping: _
From: "Matthew Wilcox (Oracle)" <willy@infradead.org> Subject: mm/debug: print the inode number in dump_page The inode number helps correlate this page with debug messages elsewhere in the kernel. Link: http://lkml.kernel.org/r/20200709202117.7216-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/debug.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) --- a/mm/debug.c~mm-print-the-inode-number-in-dump_page +++ a/mm/debug.c @@ -143,7 +143,7 @@ void __dump_page(struct page *page, cons } if (!dentry_first) { - pr_warn("aops:%ps\n", a_ops); + pr_warn("aops:%ps ino:%lx\n", a_ops, host->i_ino); goto out_mapping; } @@ -157,8 +157,8 @@ void __dump_page(struct page *page, cons * crash, but it's unlikely that we reach here with a * corrupted struct page */ - pr_warn("aops:%ps dentry name:\"%pd\"\n", a_ops, - &dentry); + pr_warn("aops:%ps ino:%lx dentry name:\"%pd\"\n", + a_ops, host->i_ino, &dentry); } } out_mapping: _
From: "Matthew Wilcox (Oracle)" <willy@infradead.org> Subject: mm/debug: print hashed address of struct page The actual address of the struct page isn't particularly helpful, while the hashed address helps match with other messages elsewhere. Add the PFN that the page refers to in order to help diagnose problems where the page is improperly aligned for the purpose. Link: http://lkml.kernel.org/r/20200709202117.7216-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/debug.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) --- a/mm/debug.c~mm-print-hashed-address-of-struct-page +++ a/mm/debug.c @@ -95,17 +95,17 @@ void __dump_page(struct page *page, cons */ mapcount = PageSlab(head) ? 0 : page_mapcount(page); - pr_warn("page:%px refcount:%d mapcount:%d mapping:%p index:%#lx\n", + pr_warn("page:%p refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n", page, page_ref_count(head), mapcount, mapping, - page_to_pgoff(page)); + page_to_pgoff(page), page_to_pfn(page)); if (compound) { if (hpage_pincount_available(page)) { - pr_warn("head:%px order:%u compound_mapcount:%d compound_pincount:%d\n", + pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n", head, compound_order(head), compound_mapcount(head), compound_pincount(head)); } else { - pr_warn("head:%px order:%u compound_mapcount:%d\n", + pr_warn("head:%p order:%u compound_mapcount:%d\n", head, compound_order(head), compound_mapcount(head)); } _
From: John Hubbard <jhubbard@nvidia.com> Subject: mm, dump_page: do not crash with bad compound_mapcount() If a compound page is being split while dump_page() is being run on that page, we can end up calling compound_mapcount() on a page that is no longer compound. This leads to a crash (already seen at least once in the field), due to the VM_BUG_ON_PAGE() assertion inside compound_mapcount(). (The above is from Matthew Wilcox's analysis of Qian Cai's bug report.) A similar problem is possible, via compound_pincount() instead of compound_mapcount(). In order to avoid this kind of crash, make dump_page() slightly more robust, by providing a pair of simpler routines that don't contain assertions: head_mapcount() and head_pincount(). For debug tools, we don't want to go *too* far in this direction, but this is a simple small fix, and the crash has already been seen, so it's a good trade-off. Link: http://lkml.kernel.org/r/20200804214807.169256-1-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reported-by: Qian Cai <cai@lca.pw> Suggested-by: Matthew Wilcox <willy@infradead.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/mm.h | 14 ++++++++++++-- mm/debug.c | 6 +++--- 2 files changed, 15 insertions(+), 5 deletions(-) --- a/include/linux/mm.h~mm-dump_page-do-not-crash-with-bad-compound_mapcount +++ a/include/linux/mm.h @@ -779,6 +779,11 @@ static inline void *kvcalloc(size_t n, s extern void kvfree(const void *addr); extern void kvfree_sensitive(const void *addr, size_t len); +static inline int head_mapcount(struct page *head) +{ + return atomic_read(compound_mapcount_ptr(head)) + 1; +} + /* * Mapcount of compound page as a whole, does not include mapped sub-pages. 
* @@ -788,7 +793,7 @@ static inline int compound_mapcount(stru { VM_BUG_ON_PAGE(!PageCompound(page), page); page = compound_head(page); - return atomic_read(compound_mapcount_ptr(page)) + 1; + return head_mapcount(page); } /* @@ -901,11 +906,16 @@ static inline bool hpage_pincount_availa return PageCompound(page) && compound_order(page) > 1; } +static inline int head_pincount(struct page *head) +{ + return atomic_read(compound_pincount_ptr(head)); +} + static inline int compound_pincount(struct page *page) { VM_BUG_ON_PAGE(!hpage_pincount_available(page), page); page = compound_head(page); - return atomic_read(compound_pincount_ptr(page)); + return head_pincount(page); } static inline void set_compound_order(struct page *page, unsigned int order) --- a/mm/debug.c~mm-dump_page-do-not-crash-with-bad-compound_mapcount +++ a/mm/debug.c @@ -102,12 +102,12 @@ void __dump_page(struct page *page, cons if (hpage_pincount_available(page)) { pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n", head, compound_order(head), - compound_mapcount(head), - compound_pincount(head)); + head_mapcount(head), + head_pincount(head)); } else { pr_warn("head:%p order:%u compound_mapcount:%d\n", head, compound_order(head), - compound_mapcount(head)); + head_mapcount(head)); } } if (PageKsm(page)) _
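The split between checked and assertion-free accessors in the patch above can be sketched in userspace C. This is only a model under stated assumptions: struct sketch_cpage and its fields are hypothetical stand-ins for struct page, assert() stands in for VM_BUG_ON_PAGE(), and the -1 bias of the stored counter is inferred from the "+ 1" in the diff.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel layout. */
struct sketch_cpage {
	bool compound;			/* models PageCompound() */
	atomic_int compound_mapcount;	/* assumed to be stored biased by -1 */
};

/* Assertion-free accessor: usable by a dumper even if the page is racing. */
static inline int head_mapcount(struct sketch_cpage *head)
{
	return atomic_load(&head->compound_mapcount) + 1;
}

/* Checked accessor: traps when the page is not compound. */
static inline int compound_mapcount(struct sketch_cpage *page)
{
	assert(page->compound);		/* models VM_BUG_ON_PAGE() */
	return head_mapcount(page);
}
```

A dump routine in this model would call head_mapcount() directly, so a page that stopped being compound mid-dump yields a possibly stale number instead of an assertion failure.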
From: Yang Shi <yang.shi@linux.alibaba.com> Subject: mm: filemap: clear idle flag for writes Since commit bbddabe2e436aa ("mm: filemap: only do access activations on reads"), mark_page_accessed() is called for reads only. But the idle flag is cleared by mark_page_accessed() so the idle flag won't get cleared if the page is write accessed only. Basically idle page tracking is used to estimate workingset size of workload, noticeable size of workingset might be missed if the idle flag is not maintained correctly. It seems good enough to just clear idle flag for write operations. Link: http://lkml.kernel.org/r/1593020612-13051-1-git-send-email-yang.shi@linux.alibaba.com Fixes: bbddabe2e436 ("mm: filemap: only do access activations on reads") Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reported-by: Gang Deng <gavin.dg@linux.alibaba.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/filemap.c | 6 ++++++ 1 file changed, 6 insertions(+) --- a/mm/filemap.c~mm-filemap-clear-idle-flag-for-writes +++ a/mm/filemap.c @@ -41,6 +41,7 @@ #include <linux/delayacct.h> #include <linux/psi.h> #include <linux/ramfs.h> +#include <linux/page_idle.h> #include "internal.h" #define CREATE_TRACE_POINTS @@ -1689,6 +1690,11 @@ repeat: if (fgp_flags & FGP_ACCESSED) mark_page_accessed(page); + else if (fgp_flags & FGP_WRITE) { + /* Clear idle flag for buffer write */ + if (page_is_idle(page)) + clear_page_idle(page); + } no_page: if (!page && (fgp_flags & FGP_CREAT)) { _
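The branch added above can be modelled in a few lines of userspace C. This is a sketch, not the kernel code: struct sketch_page, sketch_mark_accessed() and the SKETCH_FGP_* values are hypothetical, and the "referenced" field is a simplified stand-in for whatever activation mark_page_accessed() performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the kernel's FGP_* constants may differ. */
#define SKETCH_FGP_ACCESSED 0x1u
#define SKETCH_FGP_WRITE    0x2u

struct sketch_page {
	bool idle;		/* models the page idle flag */
	bool referenced;	/* simplified stand-in for activation */
};

/* Models mark_page_accessed(): activates the page and clears idle. */
static void sketch_mark_accessed(struct sketch_page *page)
{
	page->referenced = true;
	page->idle = false;
}

/* Reads take the full access path; writes only clear the idle flag. */
static void sketch_touch(struct sketch_page *page, unsigned int fgp_flags)
{
	if (fgp_flags & SKETCH_FGP_ACCESSED)
		sketch_mark_accessed(page);
	else if (fgp_flags & SKETCH_FGP_WRITE) {
		if (page->idle)
			page->idle = false;
	}
}
```

In this model a write-only page stops being idle without being activated, which matches the trade-off the commit message describes.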
From: Yang Shi <yang.shi@linux.alibaba.com> Subject: mm: filemap: add missing FGP_ flags in kerneldoc comment for pagecache_get_page FGP_{WRITE|NOFS|NOWAIT} were missed in pagecache_get_page's kerneldoc comment. Link: http://lkml.kernel.org/r/1593031747-4249-1-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: Gang Deng <gavin.dg@linux.alibaba.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/filemap.c | 3 +++ 1 file changed, 3 insertions(+) --- a/mm/filemap.c~mm-filemap-add-missing-fgp_-flags-in-kerneldoc-comment-for-pagecache_get_page +++ a/mm/filemap.c @@ -1649,6 +1649,9 @@ EXPORT_SYMBOL(find_lock_entry); * * %FGP_FOR_MMAP - The caller wants to do its own locking dance if the * page is already in cache. If the page was allocated, unlock it before * returning so the caller can do the same dance. + * * %FGP_WRITE - The page will be written + * * %FGP_NOFS - __GFP_FS will get cleared in gfp mask + * * %FGP_NOWAIT - Don't get blocked by page lock * * If %FGP_LOCK or %FGP_CREAT are specified then the function may sleep even * if the %GFP flags specified for %FGP_CREAT are atomic. _
From: Tang Yizhou <tangyizhou@huawei.com> Subject: mm/gup.c: fix the comment of return value for populate_vma_page_range() The return value of populate_vma_page_range() is consistent with __get_user_pages(), and so is the function comment of return value. Link: http://lkml.kernel.org/r/20200720034303.29920-1-tangyizhou@huawei.com Signed-off-by: Tang Yizhou <tangyizhou@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/gup.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/mm/gup.c~mm-gupc-fix-the-comment-of-return-value-for-populate_vma_page_range +++ a/mm/gup.c @@ -1404,7 +1404,8 @@ retry: * * This takes care of mlocking the pages too if VM_LOCKED is set. * - * return 0 on success, negative error code on error. + * Return either number of pages pinned in the vma, or a negative error + * code on error. * * vma->vm_mm->mmap_lock must be held. * _
From: Zhen Lei <thunder.leizhen@huawei.com> Subject: mm/swap_slots.c: simplify alloc_swap_slot_cache() Patch series "clean up some functions in mm/swap_slots.c". When I studied the code of mm/swap_slots.c, I found some places that can be improved. This patch (of 3): Both "slots" and "slots_ret" only need to be freed when the cache is already allocated. Move the frees next to that check, which makes this clearer. No functional change. Link: http://lkml.kernel.org/r/20200430061143.450-1-thunder.leizhen@huawei.com Link: http://lkml.kernel.org/r/20200430061143.450-2-thunder.leizhen@huawei.com Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/swap_slots.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) --- a/mm/swap_slots.c~mm-swap-simplify-alloc_swap_slot_cache +++ a/mm/swap_slots.c @@ -136,9 +136,16 @@ static int alloc_swap_slot_cache(unsigne mutex_lock(&swap_slots_cache_mutex); cache = &per_cpu(swp_slots, cpu); - if (cache->slots || cache->slots_ret) + if (cache->slots || cache->slots_ret) { /* cache already allocated */ - goto out; + mutex_unlock(&swap_slots_cache_mutex); + + kvfree(slots); + kvfree(slots_ret); + + return 0; + } + if (!cache->lock_initialized) { mutex_init(&cache->alloc_lock); spin_lock_init(&cache->free_lock); @@ -155,15 +162,8 @@ static int alloc_swap_slot_cache(unsigne */ mb(); cache->slots = slots; - slots = NULL; cache->slots_ret = slots_ret; - slots_ret = NULL; -out: mutex_unlock(&swap_slots_cache_mutex); - if (slots) - kvfree(slots); - if (slots_ret) - kvfree(slots_ret); return 0; } _
From: Zhen Lei <thunder.leizhen@huawei.com> Subject: mm/swap_slots.c: simplify enable_swap_slots_cache() Whether swap_slot_cache_initialized is true or false, __reenable_swap_slots_cache() is always called. To make this clear, leave only one call to __reenable_swap_slots_cache(). This also makes it clearer what extra work needs to be done when swap_slot_cache_initialized is false. No functional change. Link: http://lkml.kernel.org/r/20200430061143.450-3-thunder.leizhen@huawei.com Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/swap_slots.c | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-) --- a/mm/swap_slots.c~mm-swap-simplify-enable_swap_slots_cache +++ a/mm/swap_slots.c @@ -240,21 +240,19 @@ static int free_slot_cache(unsigned int int enable_swap_slots_cache(void) { - int ret = 0; - mutex_lock(&swap_slots_cache_enable_mutex); - if (swap_slot_cache_initialized) { - __reenable_swap_slots_cache(); - goto out_unlock; - } + if (!swap_slot_cache_initialized) { + int ret; - ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "swap_slots_cache", - alloc_swap_slot_cache, free_slot_cache); - if (WARN_ONCE(ret < 0, "Cache allocation failed (%s), operating " - "without swap slots cache.\n", __func__)) - goto out_unlock; + ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "swap_slots_cache", + alloc_swap_slot_cache, free_slot_cache); + if (WARN_ONCE(ret < 0, "Cache allocation failed (%s), operating " + "without swap slots cache.\n", __func__)) + goto out_unlock; + + swap_slot_cache_initialized = true; + } - swap_slot_cache_initialized = true; __reenable_swap_slots_cache(); out_unlock: mutex_unlock(&swap_slots_cache_enable_mutex); _
From: Zhen Lei <thunder.leizhen@huawei.com> Subject: mm/swap_slots.c: remove redundant check for swap_slot_cache_initialized swap_slot_cache_enabled can only become true via enable_swap_slots_cache(), and only after swap_slot_cache_initialized has become true. That means that whenever swap_slot_cache_enabled is true, swap_slot_cache_initialized is also true. So the condition: "swap_slot_cache_enabled && swap_slot_cache_initialized" can be reduced to "swap_slot_cache_enabled". And by De Morgan's law: "!swap_slot_cache_enabled || !swap_slot_cache_initialized" is equal to "!(swap_slot_cache_enabled && swap_slot_cache_initialized)", which likewise reduces to "!swap_slot_cache_enabled". So no functional change. Link: http://lkml.kernel.org/r/20200430061143.450-4-thunder.leizhen@huawei.com Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/swap_slots.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) --- a/mm/swap_slots.c~mm-swap-remove-redundant-check-for-swap_slot_cache_initialized +++ a/mm/swap_slots.c @@ -46,8 +46,7 @@ static void __drain_swap_slots_cache(uns static void deactivate_swap_slots_cache(void); static void reactivate_swap_slots_cache(void); -#define use_swap_slot_cache (swap_slot_cache_active && \ - swap_slot_cache_enabled && swap_slot_cache_initialized) +#define use_swap_slot_cache (swap_slot_cache_active && swap_slot_cache_enabled) #define SLOTS_CACHE 0x1 #define SLOTS_CACHE_RET 0x2 @@ -94,7 +93,7 @@ static bool check_cache_active(void) { long pages; - if (!swap_slot_cache_enabled || !swap_slot_cache_initialized) + if (!swap_slot_cache_enabled) return false; pages = get_nr_swap_pages(); _
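The "no functional change" claim above can be checked by brute force. The sketch below enumerates every combination of the three flags, skips the states the commit message rules out (enabled without initialized), and counts any disagreement between the old and new conditions. The function names are hypothetical; only the boolean expressions come from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Old form of use_swap_slot_cache, before the patch. */
static bool old_use(bool active, bool enabled, bool initialized)
{
	return active && enabled && initialized;
}

/* New form of use_swap_slot_cache, after the patch. */
static bool new_use(bool active, bool enabled)
{
	return active && enabled;
}

/* Count reachable states where the old and new conditions disagree. */
static int count_mismatches(void)
{
	int mismatches = 0;

	for (int bits = 0; bits < 8; bits++) {
		bool active = bits & 1;
		bool enabled = bits & 2;
		bool initialized = bits & 4;

		/* Invariant from the commit message: enabled implies initialized. */
		if (enabled && !initialized)
			continue;

		if (old_use(active, enabled, initialized) != new_use(active, enabled))
			mismatches++;
		/* The check_cache_active() early-return condition. */
		if ((!enabled || !initialized) != !enabled)
			mismatches++;
	}
	return mismatches;
}
```

If the invariant were dropped (the "continue" removed), the state enabled=1, initialized=0 would show up as a mismatch, which is why the reduction depends on it.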
From: Krzysztof Kozlowski <krzk@kernel.org> Subject: mm: swap: fix kerneldoc of swap_vma_readahead() Fix W=1 compile warnings (invalid kerneldoc): mm/swap_state.c:742: warning: Function parameter or member 'fentry' not described in 'swap_vma_readahead' mm/swap_state.c:742: warning: Excess function parameter 'entry' description in 'swap_vma_readahead' Link: http://lkml.kernel.org/r/20200728171109.28687-2-krzk@kernel.org Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/swap_state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/swap_state.c~mm-swap-fix-kerneldoc-of-swap_vma_readahead +++ a/mm/swap_state.c @@ -725,7 +725,7 @@ static void swap_ra_info(struct vm_fault /** * swap_vma_readahead - swap in pages in hope we need them soon - * @entry: swap entry of this memory + * @fentry: swap entry of this memory * @gfp_mask: memory allocation flags * @vmf: fault information * _
From: Xianting Tian <xianting_tian@126.com> Subject: mm/page_io.c: use blk_io_schedule() for avoiding task hung in sync io swap_readpage() does sync io for one page. The io is small and normally finishes quickly, but it may take a long time or wait forever in case of io failure or discard. This patch uses blk_io_schedule() instead of io_schedule() to avoid a hung task and a crash (when /proc/sys/kernel/hung_task_panic is set) when that happens. This is similar to the hung task avoidance in submit_bio_wait(), blk_execute_rq() and __blkdev_direct_IO(). Link: http://lkml.kernel.org/r/1596461807-21087-1-git-send-email-xianting_tian@126.com Signed-off-by: Xianting Tian <xianting_tian@126.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/page_io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/page_io.c~mm-use-blk_io_schedule-for-avoiding-task-hung-in-sync-io +++ a/mm/page_io.c @@ -441,7 +441,7 @@ int swap_readpage(struct page *page, boo break; if (!blk_poll(disk->queue, qc, true)) - io_schedule(); + blk_io_schedule(); } __set_current_state(TASK_RUNNING); bio_put(bio); _
From: Chris Down <chris@chrisdown.name> Subject: tmpfs: per-superblock i_ino support Patch series "tmpfs: inode: Reduce risk of inum overflow", v7. In Facebook production we are seeing heavy i_ino wraparounds on tmpfs. On affected tiers, in excess of 10% of hosts show multiple files with different content and the same inode number, with some servers even having as many as 150 duplicated inode numbers with differing file content. This causes actual, tangible problems in production. For example, we have complaints from those working on remote caches that their application is reporting cache corruptions because it uses (device, inodenum) to establish the identity of a particular cache object, but because it's not unique any more, the application refuses to continue and reports cache corruption. Even worse, sometimes applications may not even detect the corruption but may continue anyway, causing phantom and hard-to-debug behaviour. In general, userspace applications expect that (device, inodenum) should be enough to uniquely point to one inode, which seems fair enough. One might also need to check the generation, but in this case: 1. That's not currently exposed to userspace (ioctl(...FS_IOC_GETVERSION...) returns ENOTTY on tmpfs); 2. Even with generation, there shouldn't be two live inodes with the same inode number on one device. In order to mitigate this, we take a two-pronged approach: 1. Moving inum generation from being global to per-sb for tmpfs. This itself allows some reduction in i_ino churn. This works on both 64- and 32- bit machines. 2. Adding inode{64,32} for tmpfs. This fix is supported on machines with 64-bit ino_t only: we allow users to mount tmpfs with a new inode64 option that uses the full width of ino_t, or CONFIG_TMPFS_INODE64. 
You can see how this compares to previous related patches which didn't implement this per-superblock: - https://patchwork.kernel.org/patch/11254001/ - https://patchwork.kernel.org/patch/11023915/ This patch (of 2): get_next_ino has a number of problems: - It uses and returns a uint, which is susceptible to overflow if a lot of volatile inodes that use get_next_ino are created. - It's global, with no specificity per-sb or even per-filesystem. This means it's not that difficult to cause inode number wraparounds on a single device, which can result in having multiple distinct inodes with the same inode number. This patch adds a per-superblock counter that mitigates the second case. This design also allows us to later have a specific i_ino size per-device, for example, allowing users to choose whether to use 32- or 64-bit inodes for each tmpfs mount. This is implemented in the next commit. For internal shmem mounts which may be less tolerant to spinlock delays, we implement a percpu batching scheme which only takes the stat_lock at each batch boundary. 
Link: http://lkml.kernel.org/r/cover.1594661218.git.chris@chrisdown.name Link: http://lkml.kernel.org/r/1986b9d63b986f08ec07a4aa4b2275e718e47d8a.1594661218.git.chris@chrisdown.name Signed-off-by: Chris Down <chris@chrisdown.name> Acked-by: Hugh Dickins <hughd@google.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/fs.h | 15 ++++++++ include/linux/shmem_fs.h | 2 + mm/shmem.c | 66 ++++++++++++++++++++++++++++++++++--- 3 files changed, 78 insertions(+), 5 deletions(-) --- a/include/linux/fs.h~tmpfs-per-superblock-i_ino-support +++ a/include/linux/fs.h @@ -2946,6 +2946,21 @@ extern void discard_new_inode(struct ino extern unsigned int get_next_ino(void); extern void evict_inodes(struct super_block *sb); +/* + * Userspace may rely on the the inode number being non-zero. For example, glibc + * simply ignores files with zero i_ino in unlink() and other places. + * + * As an additional complication, if userspace was compiled with + * _FILE_OFFSET_BITS=32 on a 64-bit kernel we'll only end up reading out the + * lower 32 bits, so we need to check that those aren't zero explicitly. With + * _FILE_OFFSET_BITS=64, this may cause some harmless false-negatives, but + * better safe than sorry. 
+ */ +static inline bool is_zero_ino(ino_t ino) +{ + return (u32)ino == 0; +} + extern void __iget(struct inode * inode); extern void iget_failed(struct inode *); extern void clear_inode(struct inode *); --- a/include/linux/shmem_fs.h~tmpfs-per-superblock-i_ino-support +++ a/include/linux/shmem_fs.h @@ -36,6 +36,8 @@ struct shmem_sb_info { unsigned char huge; /* Whether to try for hugepages */ kuid_t uid; /* Mount uid for root directory */ kgid_t gid; /* Mount gid for root directory */ + ino_t next_ino; /* The next per-sb inode number to use */ + ino_t __percpu *ino_batch; /* The next per-cpu inode number to use */ struct mempolicy *mpol; /* default memory policy for mappings */ spinlock_t shrinklist_lock; /* Protects shrinklist */ struct list_head shrinklist; /* List of shinkable inodes */ --- a/mm/shmem.c~tmpfs-per-superblock-i_ino-support +++ a/mm/shmem.c @@ -260,18 +260,67 @@ bool vma_is_shmem(struct vm_area_struct static LIST_HEAD(shmem_swaplist); static DEFINE_MUTEX(shmem_swaplist_mutex); -static int shmem_reserve_inode(struct super_block *sb) +/* + * shmem_reserve_inode() performs bookkeeping to reserve a shmem inode, and + * produces a novel ino for the newly allocated inode. + * + * It may also be called when making a hard link to permit the space needed by + * each dentry. However, in that case, no new inode number is needed since that + * internally draws from another pool of inode numbers (currently global + * get_next_ino()). This case is indicated by passing NULL as inop. 
+ */ +#define SHMEM_INO_BATCH 1024 +static int shmem_reserve_inode(struct super_block *sb, ino_t *inop) { struct shmem_sb_info *sbinfo = SHMEM_SB(sb); - if (sbinfo->max_inodes) { + ino_t ino; + + if (!(sb->s_flags & SB_KERNMOUNT)) { spin_lock(&sbinfo->stat_lock); if (!sbinfo->free_inodes) { spin_unlock(&sbinfo->stat_lock); return -ENOSPC; } sbinfo->free_inodes--; + if (inop) { + ino = sbinfo->next_ino++; + if (unlikely(is_zero_ino(ino))) + ino = sbinfo->next_ino++; + if (unlikely(ino > UINT_MAX)) { + /* + * Emulate get_next_ino uint wraparound for + * compatibility + */ + ino = 1; + } + *inop = ino; + } spin_unlock(&sbinfo->stat_lock); + } else if (inop) { + /* + * __shmem_file_setup, one of our callers, is lock-free: it + * doesn't hold stat_lock in shmem_reserve_inode since + * max_inodes is always 0, and is called from potentially + * unknown contexts. As such, use a per-cpu batched allocator + * which doesn't require the per-sb stat_lock unless we are at + * the batch boundary. + */ + ino_t *next_ino; + next_ino = per_cpu_ptr(sbinfo->ino_batch, get_cpu()); + ino = *next_ino; + if (unlikely(ino % SHMEM_INO_BATCH == 0)) { + spin_lock(&sbinfo->stat_lock); + ino = sbinfo->next_ino; + sbinfo->next_ino += SHMEM_INO_BATCH; + spin_unlock(&sbinfo->stat_lock); + if (unlikely(is_zero_ino(ino))) + ino++; + } + *inop = ino; + *next_ino = ++ino; + put_cpu(); } + return 0; } @@ -2222,13 +2271,14 @@ static struct inode *shmem_get_inode(str struct inode *inode; struct shmem_inode_info *info; struct shmem_sb_info *sbinfo = SHMEM_SB(sb); + ino_t ino; - if (shmem_reserve_inode(sb)) + if (shmem_reserve_inode(sb, &ino)) return NULL; inode = new_inode(sb); if (inode) { - inode->i_ino = get_next_ino(); + inode->i_ino = ino; inode_init_owner(inode, dir, mode); inode->i_blocks = 0; inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode); @@ -2932,7 +2982,7 @@ static int shmem_link(struct dentry *old * first link must skip that, to get the accounting right. 
*/ if (inode->i_nlink) { - ret = shmem_reserve_inode(inode->i_sb); + ret = shmem_reserve_inode(inode->i_sb, NULL); if (ret) goto out; } @@ -3584,6 +3634,7 @@ static void shmem_put_super(struct super { struct shmem_sb_info *sbinfo = SHMEM_SB(sb); + free_percpu(sbinfo->ino_batch); percpu_counter_destroy(&sbinfo->used_blocks); mpol_put(sbinfo->mpol); kfree(sbinfo); @@ -3626,6 +3677,11 @@ static int shmem_fill_super(struct super #endif sbinfo->max_blocks = ctx->blocks; sbinfo->free_inodes = sbinfo->max_inodes = ctx->inodes; + if (sb->s_flags & SB_KERNMOUNT) { + sbinfo->ino_batch = alloc_percpu(ino_t); + if (!sbinfo->ino_batch) + goto failed; + } sbinfo->uid = ctx->uid; sbinfo->gid = ctx->gid; sbinfo->mode = ctx->mode; _
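The per-CPU batching path of shmem_reserve_inode() above can be modelled single-threaded in userspace C to see which numbers it hands out. This is a sketch under stated assumptions: struct sb_sketch and struct cpu_sketch are hypothetical stand-ins for shmem_sb_info and the percpu ino_batch slot, locking and preemption control are omitted, and uint64_t stands in for a 64-bit ino_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_INO_BATCH 1024u	/* mirrors SHMEM_INO_BATCH in the patch */

/* Shared counter; in the kernel this is protected by stat_lock. */
struct sb_sketch {
	uint64_t next_ino;
};

/* Per-CPU cursor into the batch that CPU currently owns. */
struct cpu_sketch {
	uint64_t next_ino;
};

/* An inode number is unusable if its low 32 bits are zero. */
static bool is_zero_ino(uint64_t ino)
{
	return (uint32_t)ino == 0;
}

static uint64_t alloc_ino_batched(struct sb_sketch *sb, struct cpu_sketch *cpu)
{
	uint64_t ino = cpu->next_ino;

	if (ino % SKETCH_INO_BATCH == 0) {
		/* Batch used up (or never filled): claim a fresh range. */
		ino = sb->next_ino;
		sb->next_ino += SKETCH_INO_BATCH;
		if (is_zero_ino(ino))
			ino++;
	}
	cpu->next_ino = ino + 1;
	return ino;
}
```

In this model the shared counter is touched only once per 1024 allocations on each CPU, and the first batch is one number short because 0 is skipped.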
From: Chris Down <chris@chrisdown.name> Subject: tmpfs: support 64-bit inums per-sb The default is still set to inode32 for backwards compatibility, but system administrators can opt in to the new 64-bit inode numbers by either: 1. Passing inode64 on the command line when mounting, or 2. Configuring the kernel with CONFIG_TMPFS_INODE64=y The inode64 and inode32 names are used based on existing precedent from XFS. [hughd@google.com: Kconfig fixes] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008011928010.13320@eggly.anvils Link: http://lkml.kernel.org/r/8b23758d0c66b5e2263e08baf9c4b6a7565cbd8f.1594661218.git.chris@chrisdown.name Signed-off-by: Chris Down <chris@chrisdown.name> Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- Documentation/filesystems/tmpfs.rst | 18 +++++++ fs/Kconfig | 21 ++++++++ include/linux/shmem_fs.h | 1 mm/shmem.c | 65 +++++++++++++++++++++++++- 4 files changed, 103 insertions(+), 2 deletions(-) --- a/Documentation/filesystems/tmpfs.rst~tmpfs-support-64-bit-inums-per-sb +++ a/Documentation/filesystems/tmpfs.rst @@ -150,6 +150,22 @@ These options do not have any effect on parameters with chmod(1), chown(1) and chgrp(1) on a mounted filesystem. +tmpfs has a mount option to select whether it will wrap at 32- or 64-bit inode +numbers: + +======= ======================== +inode64 Use 64-bit inode numbers +inode32 Use 32-bit inode numbers +======= ======================== + +On a 32-bit kernel, inode32 is implicit, and inode64 is refused at mount time. +On a 64-bit kernel, CONFIG_TMPFS_INODE64 sets the default. 
inode64 avoids the +possibility of multiple files with the same inode number on a single device; +but risks glibc failing with EOVERFLOW once 33-bit inode numbers are reached - +if a long-lived tmpfs is accessed by 32-bit applications so ancient that +opening a file larger than 2GiB fails with EINVAL. + + So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs' will give you tmpfs instance on /mytmpfs which can allocate 10GB RAM/SWAP in 10240 inodes and it is only accessible by root. @@ -161,3 +177,5 @@ RAM/SWAP in 10240 inodes and it is only Hugh Dickins, 4 June 2007 :Updated: KOSAKI Motohiro, 16 Mar 2010 +:Updated: + Chris Down, 13 July 2020 --- a/fs/Kconfig~tmpfs-support-64-bit-inums-per-sb +++ a/fs/Kconfig @@ -201,6 +201,27 @@ config TMPFS_XATTR If unsure, say N. +config TMPFS_INODE64 + bool "Use 64-bit ino_t by default in tmpfs" + depends on TMPFS && 64BIT + default n + help + tmpfs has historically used only inode numbers as wide as an unsigned + int. In some cases this can cause wraparound, potentially resulting + in multiple files with the same inode number on a single device. This + option makes tmpfs use the full width of ino_t by default, without + needing to specify the inode64 option when mounting. + + But if a long-lived tmpfs is to be accessed by 32-bit applications so + ancient that opening a file larger than 2GiB fails with EINVAL, then + the INODE64 config option and inode64 mount option risk operations + failing with EOVERFLOW once 33-bit inode numbers are reached. + + To override this configured default, use the inode32 or inode64 + option when mounting. + + If unsure, say N. 
+ config HUGETLBFS bool "HugeTLB file system support" depends on X86 || IA64 || SPARC64 || (S390 && 64BIT) || \ --- a/include/linux/shmem_fs.h~tmpfs-support-64-bit-inums-per-sb +++ a/include/linux/shmem_fs.h @@ -36,6 +36,7 @@ struct shmem_sb_info { unsigned char huge; /* Whether to try for hugepages */ kuid_t uid; /* Mount uid for root directory */ kgid_t gid; /* Mount gid for root directory */ + bool full_inums; /* If i_ino should be uint or ino_t */ ino_t next_ino; /* The next per-sb inode number to use */ ino_t __percpu *ino_batch; /* The next per-cpu inode number to use */ struct mempolicy *mpol; /* default memory policy for mappings */ --- a/mm/shmem.c~tmpfs-support-64-bit-inums-per-sb +++ a/mm/shmem.c @@ -114,11 +114,13 @@ struct shmem_options { kuid_t uid; kgid_t gid; umode_t mode; + bool full_inums; int huge; int seen; #define SHMEM_SEEN_BLOCKS 1 #define SHMEM_SEEN_INODES 2 #define SHMEM_SEEN_HUGE 4 +#define SHMEM_SEEN_INUMS 8 }; #ifdef CONFIG_TMPFS @@ -286,12 +288,17 @@ static int shmem_reserve_inode(struct su ino = sbinfo->next_ino++; if (unlikely(is_zero_ino(ino))) ino = sbinfo->next_ino++; - if (unlikely(ino > UINT_MAX)) { + if (unlikely(!sbinfo->full_inums && + ino > UINT_MAX)) { /* * Emulate get_next_ino uint wraparound for * compatibility */ - ino = 1; + if (IS_ENABLED(CONFIG_64BIT)) + pr_warn("%s: inode number overflow on device %d, consider using inode64 mount option\n", + __func__, MINOR(sb->s_dev)); + sbinfo->next_ino = 1; + ino = sbinfo->next_ino++; } *inop = ino; } @@ -304,6 +311,10 @@ static int shmem_reserve_inode(struct su * unknown contexts. As such, use a per-cpu batched allocator * which doesn't require the per-sb stat_lock unless we are at * the batch boundary. + * + * We don't need to worry about inode{32,64} since SB_KERNMOUNT + * shmem mounts are not exposed to userspace, so we don't need + * to worry about things like glibc compatibility. 
*/ ino_t *next_ino; next_ino = per_cpu_ptr(sbinfo->ino_batch, get_cpu()); @@ -3397,6 +3408,8 @@ enum shmem_param { Opt_nr_inodes, Opt_size, Opt_uid, + Opt_inode32, + Opt_inode64, }; static const struct constant_table shmem_param_enums_huge[] = { @@ -3416,6 +3429,8 @@ const struct fs_parameter_spec shmem_fs_ fsparam_string("nr_inodes", Opt_nr_inodes), fsparam_string("size", Opt_size), fsparam_u32 ("uid", Opt_uid), + fsparam_flag ("inode32", Opt_inode32), + fsparam_flag ("inode64", Opt_inode64), {} }; @@ -3487,6 +3502,18 @@ static int shmem_parse_one(struct fs_con break; } goto unsupported_parameter; + case Opt_inode32: + ctx->full_inums = false; + ctx->seen |= SHMEM_SEEN_INUMS; + break; + case Opt_inode64: + if (sizeof(ino_t) < 8) { + return invalfc(fc, + "Cannot use inode64 with <64bit inums in kernel\n"); + } + ctx->full_inums = true; + ctx->seen |= SHMEM_SEEN_INUMS; + break; } return 0; @@ -3578,8 +3605,16 @@ static int shmem_reconfigure(struct fs_c } } + if ((ctx->seen & SHMEM_SEEN_INUMS) && !ctx->full_inums && + sbinfo->next_ino > UINT_MAX) { + err = "Current inum too high to switch to 32-bit inums"; + goto out; + } + if (ctx->seen & SHMEM_SEEN_HUGE) sbinfo->huge = ctx->huge; + if (ctx->seen & SHMEM_SEEN_INUMS) + sbinfo->full_inums = ctx->full_inums; if (ctx->seen & SHMEM_SEEN_BLOCKS) sbinfo->max_blocks = ctx->blocks; if (ctx->seen & SHMEM_SEEN_INODES) { @@ -3619,6 +3654,29 @@ static int shmem_show_options(struct seq if (!gid_eq(sbinfo->gid, GLOBAL_ROOT_GID)) seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, sbinfo->gid)); + + /* + * Showing inode{64,32} might be useful even if it's the system default, + * since then people don't have to resort to checking both here and + * /proc/config.gz to confirm 64-bit inums were successfully applied + * (which may not even exist if IKCONFIG_PROC isn't enabled). 
+ * + * We hide it when inode64 isn't the default and we are using 32-bit + * inodes, since that probably just means the feature isn't even under + * consideration. + * + * As such: + * + * +-----------------+-----------------+ + * | TMPFS_INODE64=y | TMPFS_INODE64=n | + * +------------------+-----------------+-----------------+ + * | full_inums=true | show | show | + * | full_inums=false | show | hide | + * +------------------+-----------------+-----------------+ + * + */ + if (IS_ENABLED(CONFIG_TMPFS_INODE64) || sbinfo->full_inums) + seq_printf(seq, ",inode%d", (sbinfo->full_inums ? 64 : 32)); #ifdef CONFIG_TRANSPARENT_HUGEPAGE /* Rightly or wrongly, show huge mount option unmasked by shmem_huge */ if (sbinfo->huge) @@ -3667,6 +3725,8 @@ static int shmem_fill_super(struct super ctx->blocks = shmem_default_max_blocks(); if (!(ctx->seen & SHMEM_SEEN_INODES)) ctx->inodes = shmem_default_max_inodes(); + if (!(ctx->seen & SHMEM_SEEN_INUMS)) + ctx->full_inums = IS_ENABLED(CONFIG_TMPFS_INODE64); } else { sb->s_flags |= SB_NOUSER; } @@ -3684,6 +3744,7 @@ static int shmem_fill_super(struct super } sbinfo->uid = ctx->uid; sbinfo->gid = ctx->gid; + sbinfo->full_inums = ctx->full_inums; sbinfo->mode = ctx->mode; sbinfo->huge = ctx->huge; sbinfo->mpol = ctx->mpol; _
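The inode-number selection in the non-SB_KERNMOUNT branch of shmem_reserve_inode() above can be modelled in userspace roughly as follows. This is a minimal sketch, not the kernel code: struct inum_state and reserve_ino() are hypothetical names, locking and the pr_warn() are omitted, and the zero check on the low 32 bits is an assumption about is_zero_ino(), which this patch does not show.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the per-sb state used by the patch. */
struct inum_state {
	bool full_inums;	/* true: inode64, false: inode32 */
	uint64_t next_ino;	/* next per-sb inode number to hand out */
};

/* Assumption: mirrors is_zero_ino(), which is not part of this patch. */
static bool low32_is_zero(uint64_t ino)
{
	return (uint32_t)ino == 0;
}

/* Model of the per-sb allocation path after the patch (no locking). */
static uint64_t reserve_ino(struct inum_state *s)
{
	uint64_t ino = s->next_ino++;

	if (low32_is_zero(ino))
		ino = s->next_ino++;
	if (!s->full_inums && ino > UINT32_MAX) {
		/* inode32: emulate the historical unsigned int wraparound */
		s->next_ino = 1;
		ino = s->next_ino++;
	}
	return ino;
}
```

With full_inums unset the sequence wraps back to 1 once it passes UINT32_MAX, which is where duplicate inode numbers can appear on a long-lived mount; with full_inums set it simply keeps counting.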
From: Roman Gushchin <guro@fb.com> Subject: mm: kmem: make memcg_kmem_enabled() irreversible Historically, kernel memory accounting was an opt-in feature which could be enabled for individual cgroups. That is no longer true: it is on by default on both cgroup v1 and cgroup v2, and as long as a user has at least one non-root memory cgroup, kernel memory accounting is on. So in most setups it is either always on (if memory cgroups are in use and kmem accounting is not disabled) or always off (otherwise). memcg_kmem_enabled() is used in many places to guard the kernel memory accounting code. If memcg_kmem_enabled() can go from returning true back to returning false (as it can now), we can't rely on it on release paths and have to check whether it was on before. Making memcg_kmem_enabled() irreversible (always returning true once it has returned true for the first time) makes the general logic simpler and more robust. It also allows guarding some checks which would otherwise stay unguarded. Link: http://lkml.kernel.org/r/20200702180926.1330769-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/memcontrol.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) --- a/mm/memcontrol.c~mm-kmem-make-memcg_kmem_enabled-irreversible +++ a/mm/memcontrol.c @@ -3416,7 +3416,8 @@ static int memcg_online_kmem(struct mem_ if (memcg_id < 0) return memcg_id; - static_branch_inc(&memcg_kmem_enabled_key); + static_branch_enable(&memcg_kmem_enabled_key); + /* * A memory cgroup is considered kmem-online as soon as it gets * kmemcg_id.
Setting the id after enabling static branching will @@ -3486,11 +3487,6 @@ static void memcg_free_kmem(struct mem_c /* css_alloc() failed, offlining didn't happen */ if (unlikely(memcg->kmem_state == KMEM_ONLINE)) memcg_offline_kmem(memcg); - - if (memcg->kmem_state == KMEM_ALLOCATED) { - WARN_ON(!list_empty(&memcg->kmem_caches)); - static_branch_dec(&memcg_kmem_enabled_key); - } } #else static int memcg_online_kmem(struct mem_cgroup *memcg) _
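The behavioural difference between the old static_branch_inc()/static_branch_dec() pairing and the new static_branch_enable() call can be illustrated with a small userspace model. This is only a sketch: struct key_model and the key_*() helpers are hypothetical and stand in for the jump-label machinery, which is not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a static key; not the kernel API. */
struct key_model {
	int count;
};

/* Old scheme: each kmem-online memcg takes a reference ... */
static void key_inc(struct key_model *k)
{
	k->count++;
}

/* ... and drops it again on free, so the key can flip back to false. */
static void key_dec(struct key_model *k)
{
	k->count--;
}

/* New scheme: a one-way latch; repeated calls leave the key enabled. */
static void key_enable(struct key_model *k)
{
	if (k->count == 0)
		k->count = 1;
}

static bool key_enabled(const struct key_model *k)
{
	return k->count > 0;
}
```

Because nothing ever calls the decrement in the new scheme, a release path that sees key_enabled() return true can rely on it having been true for any earlier charge made after the first enable.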
From: Roman Gushchin <guro@fb.com> Subject: mm: memcg: factor out memcg- and lruvec-level changes out of __mod_lruvec_state() Patch series "The new cgroup slab memory controller", v7. The patchset moves the accounting from the page level to the object level. This allows slab pages to be shared between memory cgroups, which leads to a significant win in slab utilization (up to 45%) and a corresponding drop in the total kernel memory footprint. The reduced number of unmovable slab pages should also have a positive effect on memory fragmentation. The patchset makes the slab accounting code simpler: there is no longer any need for the complicated dynamic creation and destruction of per-cgroup slab caches, because all memory cgroups use a global set of shared slab caches. The lifetime of slab caches is no longer tied to the lifetime of memory cgroups. The more precise accounting does require more CPU; however, in practice the difference seems to be negligible. We've been using the new slab controller in Facebook production for several months with different workloads and haven't seen any noticeable regressions. What we have seen are memory savings on the order of 1 GB per host (this varied heavily depending on the actual workload, size of RAM, number of CPUs, memory pressure, etc). The third version of the patchset added yet another step towards the simplification of the code: sharing of slab caches between accounted and non-accounted allocations. It comes with significant upsides (most noticeably, the complete elimination of dynamic slab cache creation) but not without some regression risks, so this change sits on top of the patchset and is not completely merged in. In the unlikely event of a noticeable performance regression it can therefore be reverted separately. The slab memory accounting works in exactly the same way for SLAB and SLUB. With both allocators the new controller shows significant memory savings; with SLUB the difference is bigger.
On my 16-core desktop machine running Fedora 32 the size of the slab memory measured after the start of the system was lower by 58% and 38% with SLUB and SLAB correspondingly. As an estimation of a potential CPU overhead, below are results of slab_bulk_test01 test, kindly provided by Jesper D. Brouer. He also helped with the evaluation of results. The test can be found here: https://github.com/netoptimizer/prototype-kernel/ The smallest number in each row should be picked for a comparison. SLUB-patched - bulk-API - SLUB-patched : bulk_quick_reuse objects=1 : 187 - 90 - 224 cycles(tsc) - SLUB-patched : bulk_quick_reuse objects=2 : 110 - 53 - 133 cycles(tsc) - SLUB-patched : bulk_quick_reuse objects=3 : 88 - 95 - 42 cycles(tsc) - SLUB-patched : bulk_quick_reuse objects=4 : 91 - 85 - 36 cycles(tsc) - SLUB-patched : bulk_quick_reuse objects=8 : 32 - 66 - 32 cycles(tsc) SLUB-original - bulk-API - SLUB-original: bulk_quick_reuse objects=1 : 87 - 87 - 142 cycles(tsc) - SLUB-original: bulk_quick_reuse objects=2 : 52 - 53 - 53 cycles(tsc) - SLUB-original: bulk_quick_reuse objects=3 : 42 - 42 - 91 cycles(tsc) - SLUB-original: bulk_quick_reuse objects=4 : 91 - 37 - 37 cycles(tsc) - SLUB-original: bulk_quick_reuse objects=8 : 31 - 79 - 76 cycles(tsc) SLAB-patched - bulk-API - SLAB-patched : bulk_quick_reuse objects=1 : 67 - 67 - 140 cycles(tsc) - SLAB-patched : bulk_quick_reuse objects=2 : 55 - 46 - 46 cycles(tsc) - SLAB-patched : bulk_quick_reuse objects=3 : 93 - 94 - 39 cycles(tsc) - SLAB-patched : bulk_quick_reuse objects=4 : 35 - 88 - 85 cycles(tsc) - SLAB-patched : bulk_quick_reuse objects=8 : 30 - 30 - 30 cycles(tsc) SLAB-original- bulk-API - SLAB-original: bulk_quick_reuse objects=1 : 143 - 136 - 67 cycles(tsc) - SLAB-original: bulk_quick_reuse objects=2 : 45 - 46 - 46 cycles(tsc) - SLAB-original: bulk_quick_reuse objects=3 : 38 - 39 - 39 cycles(tsc) - SLAB-original: bulk_quick_reuse objects=4 : 35 - 87 - 87 cycles(tsc) - SLAB-original: bulk_quick_reuse objects=8 : 29 - 
66 - 30 cycles(tsc) This patch (of 19): To convert memcg and lruvec slab counters to bytes there must be a way to change these counters without touching node counters. Factor out __mod_memcg_lruvec_state() out of __mod_lruvec_state(). Link: http://lkml.kernel.org/r/20200623174037.3951353-1-guro@fb.com Link: http://lkml.kernel.org/r/20200623174037.3951353-2-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/memcontrol.h | 17 +++++++++++++ mm/memcontrol.c | 43 +++++++++++++++++++---------------- 2 files changed, 41 insertions(+), 19 deletions(-) --- a/include/linux/memcontrol.h~mm-memcg-factor-out-memcg-and-lruvec-level-changes-out-of-__mod_lruvec_state +++ a/include/linux/memcontrol.h @@ -679,11 +679,23 @@ static inline unsigned long lruvec_page_ return x; } +void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, + int val); void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, int val); void __mod_lruvec_slab_state(void *p, enum node_stat_item idx, int val); void mod_memcg_obj_state(void *p, int idx, int val); +static inline void mod_memcg_lruvec_state(struct lruvec *lruvec, + enum node_stat_item idx, int val) +{ + unsigned long flags; + + local_irq_save(flags); + __mod_memcg_lruvec_state(lruvec, idx, val); + local_irq_restore(flags); +} + static inline void mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, int val) { @@ -1057,6 +1069,11 @@ static inline unsigned long lruvec_page_ return node_page_state(lruvec_pgdat(lruvec), idx); } +static inline void __mod_memcg_lruvec_state(struct lruvec *lruvec, + enum node_stat_item idx, int val) +{ +} + static inline void __mod_lruvec_state(struct 
lruvec *lruvec, enum node_stat_item idx, int val) { --- a/mm/memcontrol.c~mm-memcg-factor-out-memcg-and-lruvec-level-changes-out-of-__mod_lruvec_state +++ a/mm/memcontrol.c @@ -713,30 +713,13 @@ parent_nodeinfo(struct mem_cgroup_per_no return mem_cgroup_nodeinfo(parent, nid); } -/** - * __mod_lruvec_state - update lruvec memory statistics - * @lruvec: the lruvec - * @idx: the stat item - * @val: delta to add to the counter, can be negative - * - * The lruvec is the intersection of the NUMA node and a cgroup. This - * function updates the all three counters that are affected by a - * change of state at this level: per-node, per-cgroup, per-lruvec. - */ -void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, - int val) +void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, + int val) { - pg_data_t *pgdat = lruvec_pgdat(lruvec); struct mem_cgroup_per_node *pn; struct mem_cgroup *memcg; long x; - /* Update node */ - __mod_node_page_state(pgdat, idx, val); - - if (mem_cgroup_disabled()) - return; - pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec); memcg = pn->memcg; @@ -748,6 +731,7 @@ void __mod_lruvec_state(struct lruvec *l x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]); if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) { + pg_data_t *pgdat = lruvec_pgdat(lruvec); struct mem_cgroup_per_node *pi; for (pi = pn; pi; pi = parent_nodeinfo(pi, pgdat->node_id)) @@ -757,6 +741,27 @@ void __mod_lruvec_state(struct lruvec *l __this_cpu_write(pn->lruvec_stat_cpu->count[idx], x); } +/** + * __mod_lruvec_state - update lruvec memory statistics + * @lruvec: the lruvec + * @idx: the stat item + * @val: delta to add to the counter, can be negative + * + * The lruvec is the intersection of the NUMA node and a cgroup. This + * function updates the all three counters that are affected by a + * change of state at this level: per-node, per-cgroup, per-lruvec. 
+ */ +void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, + int val) +{ + /* Update node */ + __mod_node_page_state(lruvec_pgdat(lruvec), idx, val); + + /* Update memcg and lruvec */ + if (!mem_cgroup_disabled()) + __mod_memcg_lruvec_state(lruvec, idx, val); +} + void __mod_lruvec_slab_state(void *p, enum node_stat_item idx, int val) { pg_data_t *pgdat = page_pgdat(virt_to_page(p)); _
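The effect of the refactoring above can be sketched with plain counters. The names below (struct counters, mod_memcg_lruvec(), mod_lruvec(), memcg_disabled) are hypothetical simplifications: the real functions operate on per-cpu batches and walk the cgroup hierarchy, none of which is modelled here.

```c
#include <stdbool.h>

/* Hypothetical flattened view of the three counter levels involved. */
struct counters {
	long node;	/* per-node counter */
	long memcg;	/* per-cgroup counter */
	long lruvec;	/* per-lruvec counter */
};

/* Stand-in for mem_cgroup_disabled(). */
static bool memcg_disabled;

/* Factored-out part: touches memcg and lruvec counters only. */
static void mod_memcg_lruvec(struct counters *c, long val)
{
	c->memcg += val;
	c->lruvec += val;
}

/* Original entry point: node counter first, then the factored-out part. */
static void mod_lruvec(struct counters *c, long val)
{
	c->node += val;
	if (!memcg_disabled)
		mod_memcg_lruvec(c, val);
}
```

Having the inner helper available separately is what lets the memcg and lruvec counters change without touching the node counter, as the commit message describes.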
From: Roman Gushchin <guro@fb.com> Subject: mm: memcg: prepare for byte-sized vmstat items To implement per-object slab memory accounting, we need to convert slab vmstat counters to bytes. Of the four levels of counters (global, per-node, per-memcg and per-lruvec), only the last two require byte-sized counters: global and per-node counters will count the number of slab pages, while per-memcg and per-lruvec counters will count the amount of memory taken by charged slab objects. Converting all vmstat counters to bytes, or even all slab counters to bytes, would introduce additional overhead. So instead store global and per-node counters in pages, and memcg and lruvec counters in bytes. To keep the API clean, all access helpers (on both the read and write sides) deal with bytes. To avoid back-and-forth conversions, a new flavor of read-side helpers is introduced which always returns values in pages: node_page_state_pages() and global_node_page_state_pages(). The new helpers just read the raw values. The old helpers are simple wrappers which warn on an attempt to read a byte-sized item, because at the moment no one actually needs bytes. Thanks to Johannes Weiner for the idea of having the byte-sized API on top of the page-sized internal storage.
Link: http://lkml.kernel.org/r/20200623174037.3951353-3-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- drivers/base/node.c | 2 +- include/linux/mmzone.h | 10 ++++++++++ include/linux/vmstat.h | 14 +++++++++++++- mm/memcontrol.c | 14 ++++++++++---- mm/vmstat.c | 30 ++++++++++++++++++++++++++---- 5 files changed, 60 insertions(+), 10 deletions(-) --- a/drivers/base/node.c~mm-memcg-prepare-for-byte-sized-vmstat-items +++ a/drivers/base/node.c @@ -513,7 +513,7 @@ static ssize_t node_read_vmstat(struct d for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) n += sprintf(buf+n, "%s %lu\n", node_stat_name(i), - node_page_state(pgdat, i)); + node_page_state_pages(pgdat, i)); return n; } --- a/include/linux/mmzone.h~mm-memcg-prepare-for-byte-sized-vmstat-items +++ a/include/linux/mmzone.h @@ -207,6 +207,16 @@ enum node_stat_item { }; /* + * Returns true if the value is measured in bytes (most vmstat values are + * measured in pages). This defines the API part, the internal representation + * might be different. 
+ */ +static __always_inline bool vmstat_item_in_bytes(int idx) +{ + return false; +} + +/* * We do arithmetic on the LRU lists in various places in the code, * so it is important to keep the active lists LRU_ACTIVE higher in * the array than the corresponding inactive lists, and to keep --- a/include/linux/vmstat.h~mm-memcg-prepare-for-byte-sized-vmstat-items +++ a/include/linux/vmstat.h @@ -8,6 +8,7 @@ #include <linux/vm_event_item.h> #include <linux/atomic.h> #include <linux/static_key.h> +#include <linux/mmdebug.h> extern int sysctl_stat_interval; @@ -192,7 +193,8 @@ static inline unsigned long global_zone_ return x; } -static inline unsigned long global_node_page_state(enum node_stat_item item) +static inline +unsigned long global_node_page_state_pages(enum node_stat_item item) { long x = atomic_long_read(&vm_node_stat[item]); #ifdef CONFIG_SMP @@ -202,6 +204,13 @@ static inline unsigned long global_node_ return x; } +static inline unsigned long global_node_page_state(enum node_stat_item item) +{ + VM_WARN_ON_ONCE(vmstat_item_in_bytes(item)); + + return global_node_page_state_pages(item); +} + static inline unsigned long zone_page_state(struct zone *zone, enum zone_stat_item item) { @@ -242,9 +251,12 @@ extern unsigned long sum_zone_node_page_ extern unsigned long sum_zone_numa_state(int node, enum numa_stat_item item); extern unsigned long node_page_state(struct pglist_data *pgdat, enum node_stat_item item); +extern unsigned long node_page_state_pages(struct pglist_data *pgdat, + enum node_stat_item item); #else #define sum_zone_node_page_state(node, item) global_zone_page_state(item) #define node_page_state(node, item) global_node_page_state(item) +#define node_page_state_pages(node, item) global_node_page_state_pages(item) #endif /* CONFIG_NUMA */ #ifdef CONFIG_SMP --- a/mm/memcontrol.c~mm-memcg-prepare-for-byte-sized-vmstat-items +++ a/mm/memcontrol.c @@ -681,13 +681,16 @@ mem_cgroup_largest_soft_limit_node(struc */ void __mod_memcg_state(struct mem_cgroup 
*memcg, int idx, int val) { - long x; + long x, threshold = MEMCG_CHARGE_BATCH; if (mem_cgroup_disabled()) return; + if (vmstat_item_in_bytes(idx)) + threshold <<= PAGE_SHIFT; + x = val + __this_cpu_read(memcg->vmstats_percpu->stat[idx]); - if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) { + if (unlikely(abs(x) > threshold)) { struct mem_cgroup *mi; /* @@ -718,7 +721,7 @@ void __mod_memcg_lruvec_state(struct lru { struct mem_cgroup_per_node *pn; struct mem_cgroup *memcg; - long x; + long x, threshold = MEMCG_CHARGE_BATCH; pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec); memcg = pn->memcg; @@ -729,8 +732,11 @@ void __mod_memcg_lruvec_state(struct lru /* Update lruvec */ __this_cpu_add(pn->lruvec_stat_local->count[idx], val); + if (vmstat_item_in_bytes(idx)) + threshold <<= PAGE_SHIFT; + x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]); - if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) { + if (unlikely(abs(x) > threshold)) { pg_data_t *pgdat = lruvec_pgdat(lruvec); struct mem_cgroup_per_node *pi; --- a/mm/vmstat.c~mm-memcg-prepare-for-byte-sized-vmstat-items +++ a/mm/vmstat.c @@ -341,6 +341,11 @@ void __mod_node_page_state(struct pglist long x; long t; + if (vmstat_item_in_bytes(item)) { + VM_WARN_ON_ONCE(delta & (PAGE_SIZE - 1)); + delta >>= PAGE_SHIFT; + } + x = delta + __this_cpu_read(*p); t = __this_cpu_read(pcp->stat_threshold); @@ -398,6 +403,8 @@ void __inc_node_state(struct pglist_data s8 __percpu *p = pcp->vm_node_stat_diff + item; s8 v, t; + VM_WARN_ON_ONCE(vmstat_item_in_bytes(item)); + v = __this_cpu_inc_return(*p); t = __this_cpu_read(pcp->stat_threshold); if (unlikely(v > t)) { @@ -442,6 +449,8 @@ void __dec_node_state(struct pglist_data s8 __percpu *p = pcp->vm_node_stat_diff + item; s8 v, t; + VM_WARN_ON_ONCE(vmstat_item_in_bytes(item)); + v = __this_cpu_dec_return(*p); t = __this_cpu_read(pcp->stat_threshold); if (unlikely(v < - t)) { @@ -541,6 +550,11 @@ static inline void mod_node_state(struct s8 __percpu *p = pcp->vm_node_stat_diff + 
item; long o, n, t, z; + if (vmstat_item_in_bytes(item)) { + VM_WARN_ON_ONCE(delta & (PAGE_SIZE - 1)); + delta >>= PAGE_SHIFT; + } + do { z = 0; /* overflow to node counters */ @@ -989,8 +1003,8 @@ unsigned long sum_zone_numa_state(int no /* * Determine the per node value of a stat item. */ -unsigned long node_page_state(struct pglist_data *pgdat, - enum node_stat_item item) +unsigned long node_page_state_pages(struct pglist_data *pgdat, + enum node_stat_item item) { long x = atomic_long_read(&pgdat->vm_stat[item]); #ifdef CONFIG_SMP @@ -999,6 +1013,14 @@ unsigned long node_page_state(struct pgl #endif return x; } + +unsigned long node_page_state(struct pglist_data *pgdat, + enum node_stat_item item) +{ + VM_WARN_ON_ONCE(vmstat_item_in_bytes(item)); + + return node_page_state_pages(pgdat, item); +} #endif #ifdef CONFIG_COMPACTION @@ -1577,7 +1599,7 @@ static void zoneinfo_show_print(struct s seq_printf(m, "\n per-node stats"); for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) { seq_printf(m, "\n %-12s %lu", node_stat_name(i), - node_page_state(pgdat, i)); + node_page_state_pages(pgdat, i)); } } seq_printf(m, @@ -1698,7 +1720,7 @@ static void *vmstat_start(struct seq_fil #endif for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) - v[i] = global_node_page_state(i); + v[i] = global_node_page_state_pages(i); v += NR_VM_NODE_STAT_ITEMS; global_dirty_limits(v + NR_DIRTY_BG_THRESHOLD, _
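The "byte-sized API on top of page-sized storage" idea from this patch can be sketched in userspace as below. This is a simplified model under stated assumptions: PAGE_SHIFT is taken to be 12, CHARGE_BATCH is an assumed stand-in for MEMCG_CHARGE_BATCH, and the item names and helper functions are hypothetical; per-cpu diffs and the VM_WARN_ON_ONCE() checks are left out.

```c
#include <stdbool.h>

#define PAGE_SHIFT	12		/* assumption: 4 KiB pages */
#define PAGE_SIZE	(1L << PAGE_SHIFT)
#define CHARGE_BATCH	32L		/* assumed stand-in for MEMCG_CHARGE_BATCH */

enum item { ITEM_PAGES, ITEM_BYTES };	/* hypothetical stat items */

/* Model of vmstat_item_in_bytes(): which items use the byte-sized API. */
static bool item_in_bytes(enum item idx)
{
	return idx == ITEM_BYTES;
}

static long node_stat[2];		/* internal storage: always pages */

/* Write side: callers pass bytes for byte-sized items, pages otherwise. */
static void mod_node_state(enum item idx, long delta)
{
	if (item_in_bytes(idx))
		delta >>= PAGE_SHIFT;	/* delta is expected to be page-aligned */
	node_stat[idx] += delta;
}

/* Read side: the "_pages" flavor returns the raw stored value. */
static long node_state_pages(enum item idx)
{
	return node_stat[idx];
}

/* memcg/lruvec flush threshold, scaled up for byte-sized items. */
static long flush_threshold(enum item idx)
{
	long threshold = CHARGE_BATCH;

	if (item_in_bytes(idx))
		threshold <<= PAGE_SHIFT;
	return threshold;
}
```

Scaling the threshold keeps the batching behaviour roughly the same in terms of memory: a byte-sized item is flushed after about as many pages' worth of change as a page-sized one.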
From: Roman Gushchin <guro@fb.com> Subject: mm: memcg: convert vmstat slab counters to bytes In order to prepare for per-object slab memory accounting, convert NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE vmstat items to bytes. To make it obvious, rename them to NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B (similar to NR_KERNEL_STACK_KB). Internally global and per-node counters are stored in pages, however memcg and lruvec counters are stored in bytes. This scheme may look weird, but only for now. As soon as slab pages will be shared between multiple cgroups, global and node counters will reflect the total number of slab pages. However memcg and lruvec counters will be used for per-memcg slab memory tracking, which will take separate kernel objects in the account. Keeping global and node counters in pages helps to avoid additional overhead. The size of slab memory shouldn't exceed 4Gb on 32-bit machines, so it will fit into atomic_long_t we use for vmstats. Link: http://lkml.kernel.org/r/20200623174037.3951353-4-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- drivers/base/node.c | 4 ++-- fs/proc/meminfo.c | 4 ++-- include/linux/mmzone.h | 16 +++++++++++++--- kernel/power/snapshot.c | 2 +- mm/memcontrol.c | 11 ++++------- mm/oom_kill.c | 2 +- mm/page_alloc.c | 8 ++++---- mm/slab.h | 15 ++++++++------- mm/slab_common.c | 4 ++-- mm/slob.c | 12 ++++++------ mm/slub.c | 8 ++++---- mm/vmscan.c | 3 ++- mm/workingset.c | 6 ++++-- 13 files changed, 53 insertions(+), 42 deletions(-) --- a/drivers/base/node.c~mm-memcg-convert-vmstat-slab-counters-to-bytes +++ a/drivers/base/node.c @@ -368,8 +368,8 @@ static ssize_t node_read_meminfo(struct unsigned long 
sreclaimable, sunreclaimable; si_meminfo_node(&i, nid); - sreclaimable = node_page_state(pgdat, NR_SLAB_RECLAIMABLE); - sunreclaimable = node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE); + sreclaimable = node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B); + sunreclaimable = node_page_state_pages(pgdat, NR_SLAB_UNRECLAIMABLE_B); n = sprintf(buf, "Node %d MemTotal: %8lu kB\n" "Node %d MemFree: %8lu kB\n" --- a/fs/proc/meminfo.c~mm-memcg-convert-vmstat-slab-counters-to-bytes +++ a/fs/proc/meminfo.c @@ -52,8 +52,8 @@ static int meminfo_proc_show(struct seq_ pages[lru] = global_node_page_state(NR_LRU_BASE + lru); available = si_mem_available(); - sreclaimable = global_node_page_state(NR_SLAB_RECLAIMABLE); - sunreclaim = global_node_page_state(NR_SLAB_UNRECLAIMABLE); + sreclaimable = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B); + sunreclaim = global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B); show_val_kb(m, "MemTotal: ", i.totalram); show_val_kb(m, "MemFree: ", i.freeram); --- a/include/linux/mmzone.h~mm-memcg-convert-vmstat-slab-counters-to-bytes +++ a/include/linux/mmzone.h @@ -174,8 +174,8 @@ enum node_stat_item { NR_INACTIVE_FILE, /* " " " " " */ NR_ACTIVE_FILE, /* " " " " " */ NR_UNEVICTABLE, /* " " " " " */ - NR_SLAB_RECLAIMABLE, - NR_SLAB_UNRECLAIMABLE, + NR_SLAB_RECLAIMABLE_B, + NR_SLAB_UNRECLAIMABLE_B, NR_ISOLATED_ANON, /* Temporary isolated pages from anon lru */ NR_ISOLATED_FILE, /* Temporary isolated pages from file lru */ WORKINGSET_NODES, @@ -213,7 +213,17 @@ enum node_stat_item { */ static __always_inline bool vmstat_item_in_bytes(int idx) { - return false; + /* + * Global and per-node slab counters track slab pages. + * It's expected that changes are multiples of PAGE_SIZE. + * Internally values are stored in pages. + * + * Per-memcg and per-lruvec counters track memory, consumed + * by individual slab objects. These counters are actually + * byte-precise. 
+ */ + return (idx == NR_SLAB_RECLAIMABLE_B || + idx == NR_SLAB_UNRECLAIMABLE_B); } /* --- a/kernel/power/snapshot.c~mm-memcg-convert-vmstat-slab-counters-to-bytes +++ a/kernel/power/snapshot.c @@ -1663,7 +1663,7 @@ static unsigned long minimum_image_size( { unsigned long size; - size = global_node_page_state(NR_SLAB_RECLAIMABLE) + size = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B) + global_node_page_state(NR_ACTIVE_ANON) + global_node_page_state(NR_INACTIVE_ANON) + global_node_page_state(NR_ACTIVE_FILE) --- a/mm/memcontrol.c~mm-memcg-convert-vmstat-slab-counters-to-bytes +++ a/mm/memcontrol.c @@ -1391,9 +1391,8 @@ static char *memory_stat_format(struct m (u64)memcg_page_state(memcg, MEMCG_KERNEL_STACK_KB) * 1024); seq_buf_printf(&s, "slab %llu\n", - (u64)(memcg_page_state(memcg, NR_SLAB_RECLAIMABLE) + - memcg_page_state(memcg, NR_SLAB_UNRECLAIMABLE)) * - PAGE_SIZE); + (u64)(memcg_page_state(memcg, NR_SLAB_RECLAIMABLE_B) + + memcg_page_state(memcg, NR_SLAB_UNRECLAIMABLE_B))); seq_buf_printf(&s, "sock %llu\n", (u64)memcg_page_state(memcg, MEMCG_SOCK) * PAGE_SIZE); @@ -1423,11 +1422,9 @@ static char *memory_stat_format(struct m PAGE_SIZE); seq_buf_printf(&s, "slab_reclaimable %llu\n", - (u64)memcg_page_state(memcg, NR_SLAB_RECLAIMABLE) * - PAGE_SIZE); + (u64)memcg_page_state(memcg, NR_SLAB_RECLAIMABLE_B)); seq_buf_printf(&s, "slab_unreclaimable %llu\n", - (u64)memcg_page_state(memcg, NR_SLAB_UNRECLAIMABLE) * - PAGE_SIZE); + (u64)memcg_page_state(memcg, NR_SLAB_UNRECLAIMABLE_B)); /* Accumulated memory events */ --- a/mm/oom_kill.c~mm-memcg-convert-vmstat-slab-counters-to-bytes +++ a/mm/oom_kill.c @@ -184,7 +184,7 @@ static bool is_dump_unreclaim_slabs(void global_node_page_state(NR_ISOLATED_FILE) + global_node_page_state(NR_UNEVICTABLE); - return (global_node_page_state(NR_SLAB_UNRECLAIMABLE) > nr_lru); + return (global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B) > nr_lru); } /** --- a/mm/page_alloc.c~mm-memcg-convert-vmstat-slab-counters-to-bytes +++ 
a/mm/page_alloc.c @@ -5220,8 +5220,8 @@ long si_mem_available(void) * items that are in use, and cannot be freed. Cap this estimate at the * low watermark. */ - reclaimable = global_node_page_state(NR_SLAB_RECLAIMABLE) + - global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE); + reclaimable = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B) + + global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE); available += reclaimable - min(reclaimable / 2, wmark_low); if (available < 0) @@ -5364,8 +5364,8 @@ void show_free_areas(unsigned int filter global_node_page_state(NR_UNEVICTABLE), global_node_page_state(NR_FILE_DIRTY), global_node_page_state(NR_WRITEBACK), - global_node_page_state(NR_SLAB_RECLAIMABLE), - global_node_page_state(NR_SLAB_UNRECLAIMABLE), + global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B), + global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B), global_node_page_state(NR_FILE_MAPPED), global_node_page_state(NR_SHMEM), global_zone_page_state(NR_PAGETABLE), --- a/mm/slab_common.c~mm-memcg-convert-vmstat-slab-counters-to-bytes +++ a/mm/slab_common.c @@ -1363,8 +1363,8 @@ void *kmalloc_order(size_t size, gfp_t f page = alloc_pages(flags, order); if (likely(page)) { ret = page_address(page); - mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE, - 1 << order); + mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE_B, + PAGE_SIZE << order); } ret = kasan_kmalloc_large(ret, size, flags); /* As ret might get tagged, call kmemleak hook after KASAN. */ --- a/mm/slab.h~mm-memcg-convert-vmstat-slab-counters-to-bytes +++ a/mm/slab.h @@ -273,7 +273,7 @@ int __kmem_cache_alloc_bulk(struct kmem_ static inline int cache_vmstat_idx(struct kmem_cache *s) { return (s->flags & SLAB_RECLAIM_ACCOUNT) ? 
- NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE; + NR_SLAB_RECLAIMABLE_B : NR_SLAB_UNRECLAIMABLE_B; } #ifdef CONFIG_SLUB_DEBUG @@ -390,7 +390,7 @@ static __always_inline int memcg_charge_ if (unlikely(!memcg || mem_cgroup_is_root(memcg))) { mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), - nr_pages); + nr_pages << PAGE_SHIFT); percpu_ref_get_many(&s->memcg_params.refcnt, nr_pages); return 0; } @@ -400,7 +400,7 @@ static __always_inline int memcg_charge_ goto out; lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page)); - mod_lruvec_state(lruvec, cache_vmstat_idx(s), nr_pages); + mod_lruvec_state(lruvec, cache_vmstat_idx(s), nr_pages << PAGE_SHIFT); /* transer try_charge() page references to kmem_cache */ percpu_ref_get_many(&s->memcg_params.refcnt, nr_pages); @@ -425,11 +425,12 @@ static __always_inline void memcg_unchar memcg = READ_ONCE(s->memcg_params.memcg); if (likely(!mem_cgroup_is_root(memcg))) { lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page)); - mod_lruvec_state(lruvec, cache_vmstat_idx(s), -nr_pages); + mod_lruvec_state(lruvec, cache_vmstat_idx(s), + -(nr_pages << PAGE_SHIFT)); memcg_kmem_uncharge(memcg, nr_pages); } else { mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), - -nr_pages); + -(nr_pages << PAGE_SHIFT)); } rcu_read_unlock(); @@ -513,7 +514,7 @@ static __always_inline int charge_slab_p { if (is_root_cache(s)) { mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), - 1 << order); + PAGE_SIZE << order); return 0; } @@ -525,7 +526,7 @@ static __always_inline void uncharge_sla { if (is_root_cache(s)) { mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), - -(1 << order)); + -(PAGE_SIZE << order)); return; } --- a/mm/slob.c~mm-memcg-convert-vmstat-slab-counters-to-bytes +++ a/mm/slob.c @@ -202,8 +202,8 @@ static void *slob_new_pages(gfp_t gfp, i if (!page) return NULL; - mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE, - 1 << order); + mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE_B, + PAGE_SIZE 
<< order); return page_address(page); } @@ -214,8 +214,8 @@ static void slob_free_pages(void *b, int if (current->reclaim_state) current->reclaim_state->reclaimed_slab += 1 << order; - mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE, - -(1 << order)); + mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B, + -(PAGE_SIZE << order)); __free_pages(sp, order); } @@ -552,8 +552,8 @@ void kfree(const void *block) slob_free(m, *m + align); } else { unsigned int order = compound_order(sp); - mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE, - -(1 << order)); + mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B, + -(PAGE_SIZE << order)); __free_pages(sp, order); } --- a/mm/slub.c~mm-memcg-convert-vmstat-slab-counters-to-bytes +++ a/mm/slub.c @@ -3991,8 +3991,8 @@ static void *kmalloc_large_node(size_t s page = alloc_pages_node(node, flags, order); if (page) { ptr = page_address(page); - mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE, - 1 << order); + mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE_B, + PAGE_SIZE << order); } return kmalloc_large_node_hook(ptr, size, flags); @@ -4123,8 +4123,8 @@ void kfree(const void *x) BUG_ON(!PageCompound(page)); kfree_hook(object); - mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE, - -(1 << order)); + mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE_B, + -(PAGE_SIZE << order)); __free_pages(page, order); return; } --- a/mm/vmscan.c~mm-memcg-convert-vmstat-slab-counters-to-bytes +++ a/mm/vmscan.c @@ -4222,7 +4222,8 @@ int node_reclaim(struct pglist_data *pgd * unmapped file backed pages. 
*/ if (node_pagecache_reclaimable(pgdat) <= pgdat->min_unmapped_pages && - node_page_state(pgdat, NR_SLAB_RECLAIMABLE) <= pgdat->min_slab_pages) + node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B) <= + pgdat->min_slab_pages) return NODE_RECLAIM_FULL; /* --- a/mm/workingset.c~mm-memcg-convert-vmstat-slab-counters-to-bytes +++ a/mm/workingset.c @@ -486,8 +486,10 @@ static unsigned long count_shadow_nodes( for (pages = 0, i = 0; i < NR_LRU_LISTS; i++) pages += lruvec_page_state_local(lruvec, NR_LRU_BASE + i); - pages += lruvec_page_state_local(lruvec, NR_SLAB_RECLAIMABLE); - pages += lruvec_page_state_local(lruvec, NR_SLAB_UNRECLAIMABLE); + pages += lruvec_page_state_local( + lruvec, NR_SLAB_RECLAIMABLE_B) >> PAGE_SHIFT; + pages += lruvec_page_state_local( + lruvec, NR_SLAB_UNRECLAIMABLE_B) >> PAGE_SHIFT; } else #endif pages = node_present_pages(sc->nid); _
From: Roman Gushchin <guro@fb.com>
Subject: mm: slub: implement SLUB version of obj_to_index()

This commit implements the SLUB version of the obj_to_index() function,
which will be required to calculate the offset of obj_cgroup in the
obj_cgroups vector to store/obtain the objcg ownership data.

To make it faster, let's repeat the SLAB's trick introduced by commit
6a2d7a955d8d ("SLAB: use a multiply instead of a divide in
obj_to_index()") and avoid an expensive division.

Vlastimil Babka noticed that SLUB already has a similar function called
slab_index(), which is defined only if SLUB_DEBUG is enabled. The function
does similar math, but with a division, and it also takes a page address
instead of a page pointer.

Let's remove slab_index() and replace it with the new helper
__obj_to_index(), which takes a page address. obj_to_index() will be a
simple wrapper taking a page pointer and passing page_address(page) into
__obj_to_index().

Link: http://lkml.kernel.org/r/20200623174037.3951353-5-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/slub_def.h | 16 ++++++++++++++++ mm/slub.c | 15 +++++---------- 2 files changed, 21 insertions(+), 10 deletions(-) --- a/include/linux/slub_def.h~mm-slub-implement-slub-version-of-obj_to_index +++ a/include/linux/slub_def.h @@ -8,6 +8,7 @@ * (C) 2007 SGI, Christoph Lameter */ #include <linux/kobject.h> +#include <linux/reciprocal_div.h> enum stat_item { ALLOC_FASTPATH, /* Allocation from cpu slab */ @@ -86,6 +87,7 @@ struct kmem_cache { unsigned long min_partial; unsigned int size; /* The size of an object including metadata */ unsigned int object_size;/* The size of an object without metadata */ + struct
reciprocal_value reciprocal_size; unsigned int offset; /* Free pointer offset */ #ifdef CONFIG_SLUB_CPU_PARTIAL /* Number of per cpu partial objects to keep around */ @@ -182,4 +184,18 @@ static inline void *nearest_obj(struct k return result; } +/* Determine object index from a given position */ +static inline unsigned int __obj_to_index(const struct kmem_cache *cache, + void *addr, void *obj) +{ + return reciprocal_divide(kasan_reset_tag(obj) - addr, + cache->reciprocal_size); +} + +static inline unsigned int obj_to_index(const struct kmem_cache *cache, + const struct page *page, void *obj) +{ + return __obj_to_index(cache, page_address(page), obj); +} + #endif /* _LINUX_SLUB_DEF_H */ --- a/mm/slub.c~mm-slub-implement-slub-version-of-obj_to_index +++ a/mm/slub.c @@ -317,12 +317,6 @@ static inline void set_freepointer(struc __p < (__addr) + (__objects) * (__s)->size; \ __p += (__s)->size) -/* Determine object index from a given position */ -static inline unsigned int slab_index(void *p, struct kmem_cache *s, void *addr) -{ - return (kasan_reset_tag(p) - addr) / s->size; -} - static inline unsigned int order_objects(unsigned int order, unsigned int size) { return ((unsigned int)PAGE_SIZE << order) / size; @@ -465,7 +459,7 @@ static unsigned long *get_map(struct kme bitmap_zero(object_map, page->objects); for (p = page->freelist; p; p = get_freepointer(s, p)) - set_bit(slab_index(p, s, addr), object_map); + set_bit(__obj_to_index(s, addr, p), object_map); return object_map; } @@ -3754,6 +3748,7 @@ static int calculate_sizes(struct kmem_c */ size = ALIGN(size, s->align); s->size = size; + s->reciprocal_size = reciprocal_value(size); if (forced_order >= 0) order = forced_order; else @@ -3858,7 +3853,7 @@ static void list_slab_objects(struct kme map = get_map(s, page); for_each_object(p, s, addr, page->objects) { - if (!test_bit(slab_index(p, s, addr), map)) { + if (!test_bit(__obj_to_index(s, addr, p), map)) { pr_err("INFO: Object 0x%p @offset=%tu\n", p, p - addr); 
print_tracking(s, p); } @@ -4574,7 +4569,7 @@ static void validate_slab(struct kmem_ca /* Now we know that a valid freelist exists */ map = get_map(s, page); for_each_object(p, s, addr, page->objects) { - u8 val = test_bit(slab_index(p, s, addr), map) ? + u8 val = test_bit(__obj_to_index(s, addr, p), map) ? SLUB_RED_INACTIVE : SLUB_RED_ACTIVE; if (!check_object(s, page, p, val)) @@ -4765,7 +4760,7 @@ static void process_slab(struct loc_trac map = get_map(s, page); for_each_object(p, s, addr, page->objects) - if (!test_bit(slab_index(p, s, addr), map)) + if (!test_bit(__obj_to_index(s, addr, p), map)) add_location(t, s, get_track(s, p, alloc)); put_map(map); } _
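For readers who want to look at the multiply-instead-of-divide trick from
the obj_to_index() patch above in isolation, here is a minimal user-space
sketch. It is modeled on the kernel's reciprocal_value() and
reciprocal_divide() helpers but is not the kernel code: struct recip,
fls32(), recip_value(), recip_divide() and obj_index() are hypothetical
names chosen for this illustration, and the slab is just a byte buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for a precomputed reciprocal. */
struct recip {
	uint32_t m;		/* magic multiplier */
	uint8_t sh1, sh2;	/* post-multiply shifts */
};

/* Position of the highest set bit, 1-based; 0 when x == 0. */
static inline int fls32(uint32_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Precompute the reciprocal once per divisor (e.g. once per cache). */
static inline struct recip recip_value(uint32_t d)
{
	struct recip r;
	int l = fls32(d - 1);
	uint64_t m = ((1ULL << 32) * ((1ULL << l) - d)) / d + 1;

	r.m = (uint32_t)m;
	r.sh1 = (uint8_t)(l < 1 ? l : 1);
	r.sh2 = (uint8_t)(l > 1 ? l - 1 : 0);
	return r;
}

/* a / d using one multiply and two shifts, no division instruction. */
static inline uint32_t recip_divide(uint32_t a, struct recip r)
{
	uint32_t t = (uint32_t)(((uint64_t)a * r.m) >> 32);

	return (t + ((a - t) >> r.sh1)) >> r.sh2;
}

/* Object index inside a slab: byte offset from the base / object size. */
static inline unsigned int obj_index(const char *slab_base, const void *obj,
				     struct recip size_recip)
{
	uint32_t offset = (uint32_t)((const char *)obj - slab_base);

	return recip_divide(offset, size_recip);
}
```

The expensive part (the one real division) happens once in recip_value(),
which corresponds to the s->reciprocal_size assignment in
calculate_sizes(); every later lookup only pays for a multiply and shifts.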
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: decouple reference counting from page accounting

The reference counting of a memcg is currently coupled directly to how
many 4k pages are charged to it. This doesn't work well with Roman's new
slab controller, which maintains pools of objects and doesn't want to keep
an extra balance sheet for the pages backing those objects.

This unusual refcounting design (reference counts usually track pointers
to an object) is only for historical reasons: memcg used to not take any
css references and simply stalled offlining until all charges had been
reparented and the page counters had dropped to zero. When we got rid of
the reparenting requirement, the simple mechanical translation was to take
a reference for every charge.

More historical context can be found in commit e8ea14cc6ead ("mm:
memcontrol: take a css reference for each charged page"), commit
64f219938941 ("mm: memcontrol: remove obsolete kmemcg pinning tricks") and
commit b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined
groups").

The new slab controller exposes the limitations in this scheme, so let's
switch it to a more idiomatic reference counting model based on actual
kernel pointers to the memcg:

- The per-cpu stock holds a reference to the memcg it's caching

- User pages hold a reference for their page->mem_cgroup. Transparent
  huge pages will no longer acquire tail references in advance, we'll
  get them if needed during the split.

- Kernel pages hold a reference for their page->mem_cgroup

- Pages allocated in the root cgroup will acquire and release css
  references for simplicity. css_get() and css_put() optimize that.

- The current memcg_charge_slab() already hacked around the per-charge
  references; this change gets rid of that as well.
- tcp accounting will handle reference in mem_cgroup_sk_{alloc,free} Roman: 1) Rebased on top of the current mm tree: added css_get() in mem_cgroup_charge(), dropped mem_cgroup_try_charge() part 2) I've reformatted commit references in the commit log to make checkpatch.pl happy. [hughd@google.com: remove css_put_many() from __mem_cgroup_clear_mc()] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2007302011450.2347@eggly.anvils Link: http://lkml.kernel.org/r/20200623174037.3951353-6-guro@fb.com Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Roman Gushchin <guro@fb.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Christoph Lameter <cl@linux.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/memcontrol.c | 39 +++++++++++++++++++++------------------ mm/slab.h | 2 -- 2 files changed, 21 insertions(+), 20 deletions(-) --- a/mm/memcontrol.c~mm-memcontrol-decouple-reference-counting-from-page-accounting +++ a/mm/memcontrol.c @@ -2094,13 +2094,17 @@ static void drain_stock(struct memcg_sto { struct mem_cgroup *old = stock->cached; + if (!old) + return; + if (stock->nr_pages) { page_counter_uncharge(&old->memory, stock->nr_pages); if (do_memsw_account()) page_counter_uncharge(&old->memsw, stock->nr_pages); - css_put_many(&old->css, stock->nr_pages); stock->nr_pages = 0; } + + css_put(&old->css); stock->cached = NULL; } @@ -2136,6 +2140,7 @@ static void refill_stock(struct mem_cgro stock = this_cpu_ptr(&memcg_stock); if (stock->cached != memcg) { /* reset if necessary */ drain_stock(stock); + css_get(&memcg->css); stock->cached = memcg; } stock->nr_pages += nr_pages; @@ -2594,12 +2599,10 @@ force: page_counter_charge(&memcg->memory, nr_pages); if (do_memsw_account()) page_counter_charge(&memcg->memsw, nr_pages); - css_get_many(&memcg->css, 
nr_pages); return 0; done_restock: - css_get_many(&memcg->css, batch); if (batch > nr_pages) refill_stock(memcg, batch - nr_pages); @@ -2657,8 +2660,6 @@ static void cancel_charge(struct mem_cgr page_counter_uncharge(&memcg->memory, nr_pages); if (do_memsw_account()) page_counter_uncharge(&memcg->memsw, nr_pages); - - css_put_many(&memcg->css, nr_pages); } #endif @@ -2966,6 +2967,7 @@ int __memcg_kmem_charge_page(struct page if (!ret) { page->mem_cgroup = memcg; __SetPageKmemcg(page); + return 0; } } css_put(&memcg->css); @@ -2988,12 +2990,11 @@ void __memcg_kmem_uncharge_page(struct p VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page); __memcg_kmem_uncharge(memcg, nr_pages); page->mem_cgroup = NULL; + css_put(&memcg->css); /* slab pages do not have PageKmemcg flag set */ if (PageKmemcg(page)) __ClearPageKmemcg(page); - - css_put_many(&memcg->css, nr_pages); } #endif /* CONFIG_MEMCG_KMEM */ @@ -3005,13 +3006,16 @@ void __memcg_kmem_uncharge_page(struct p */ void mem_cgroup_split_huge_fixup(struct page *head) { + struct mem_cgroup *memcg = head->mem_cgroup; int i; if (mem_cgroup_disabled()) return; - for (i = 1; i < HPAGE_PMD_NR; i++) - head[i].mem_cgroup = head->mem_cgroup; + for (i = 1; i < HPAGE_PMD_NR; i++) { + css_get(&memcg->css); + head[i].mem_cgroup = memcg; + } } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ @@ -5452,7 +5456,10 @@ static int mem_cgroup_move_account(struc */ smp_mb(); - page->mem_cgroup = to; /* caller should have done css_get */ + css_get(&to->css); + css_put(&from->css); + + page->mem_cgroup = to; __unlock_page_memcg(from); @@ -5673,8 +5680,6 @@ static void __mem_cgroup_clear_mc(void) if (!mem_cgroup_is_root(mc.to)) page_counter_uncharge(&mc.to->memory, mc.moved_swap); - css_put_many(&mc.to->css, mc.moved_swap); - mc.moved_swap = 0; } memcg_oom_recover(from); @@ -6502,6 +6507,7 @@ int mem_cgroup_charge(struct page *page, if (ret) goto out_put; + css_get(&memcg->css); commit_charge(page, memcg); local_irq_disable(); @@ -6556,9 +6562,6 @@ static 
void uncharge_batch(const struct __this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_pages); memcg_check_events(ug->memcg, ug->dummy_page); local_irq_restore(flags); - - if (!mem_cgroup_is_root(ug->memcg)) - css_put_many(&ug->memcg->css, ug->nr_pages); } static void uncharge_page(struct page *page, struct uncharge_gather *ug) @@ -6596,6 +6599,7 @@ static void uncharge_page(struct page *p ug->dummy_page = page; page->mem_cgroup = NULL; + css_put(&ug->memcg->css); } static void uncharge_list(struct list_head *page_list) @@ -6701,8 +6705,8 @@ void mem_cgroup_migrate(struct page *old page_counter_charge(&memcg->memory, nr_pages); if (do_memsw_account()) page_counter_charge(&memcg->memsw, nr_pages); - css_get_many(&memcg->css, nr_pages); + css_get(&memcg->css); commit_charge(newpage, memcg); local_irq_save(flags); @@ -6939,8 +6943,7 @@ void mem_cgroup_swapout(struct page *pag mem_cgroup_charge_statistics(memcg, page, -nr_entries); memcg_check_events(memcg, page); - if (!mem_cgroup_is_root(memcg)) - css_put_many(&memcg->css, nr_entries); + css_put(&memcg->css); } /** --- a/mm/slab.h~mm-memcontrol-decouple-reference-counting-from-page-accounting +++ a/mm/slab.h @@ -402,9 +402,7 @@ static __always_inline int memcg_charge_ lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page)); mod_lruvec_state(lruvec, cache_vmstat_idx(s), nr_pages << PAGE_SHIFT); - /* transer try_charge() page references to kmem_cache */ percpu_ref_get_many(&s->memcg_params.refcnt, nr_pages); - css_put_many(&memcg->css, nr_pages); out: css_put(&memcg->css); return ret; _
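The refcounting model that the patch above moves to (one reference per
pointer to the memcg, independent of how many pages are charged) can be
pictured with a small toy model. This is only a sketch under simplifying
assumptions: toy_memcg, toy_page, toy_charge(), toy_split() and
toy_uncharge() are hypothetical names, there is no locking, no per-cpu
stock and no css machinery, and TOY_HPAGE_NR merely stands in for
HPAGE_PMD_NR.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HPAGE_NR 512	/* pages per toy huge page (assumption) */

struct toy_memcg {
	long refs;		/* one per pointer to this memcg */
	long charged_pages;	/* page accounting, tracked separately */
};

struct toy_page {
	struct toy_memcg *memcg;
};

/* Charge nr_pages, but take only ONE reference for head->memcg. */
static inline void toy_charge(struct toy_page *head, struct toy_memcg *cg,
			      long nr_pages)
{
	cg->charged_pages += nr_pages;
	cg->refs++;
	head->memcg = cg;
}

/* On split every tail page gains its own pointer, hence its own ref. */
static inline void toy_split(struct toy_page *head)
{
	struct toy_memcg *cg = head->memcg;
	int i;

	for (i = 1; i < TOY_HPAGE_NR; i++) {
		cg->refs++;
		head[i].memcg = cg;
	}
}

/* Dropping the pointer drops exactly one reference. */
static inline void toy_uncharge(struct toy_page *page, long nr_pages)
{
	struct toy_memcg *cg = page->memcg;

	cg->charged_pages -= nr_pages;
	page->memcg = NULL;
	cg->refs--;
}
```

In this model a freshly charged huge page holds a single reference while
accounting for 512 pages; the extra 511 references only appear if and
when the page is split, which mirrors the css_get() loop added to
mem_cgroup_split_huge_fixup().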
From: Roman Gushchin <guro@fb.com>
Subject: mm: memcg/slab: obj_cgroup API

Obj_cgroup API provides an ability to account sub-page sized kernel
objects, which potentially outlive the original memory cgroup.

The top-level API consists of the following functions:
  bool obj_cgroup_tryget(struct obj_cgroup *objcg);
  void obj_cgroup_get(struct obj_cgroup *objcg);
  void obj_cgroup_put(struct obj_cgroup *objcg);
  int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size);
  void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size);
  struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg);
  struct obj_cgroup *get_obj_cgroup_from_current(void);

Object cgroup is basically a pointer to a memory cgroup with a per-cpu
reference counter. It substitutes a memory cgroup in places where it's
necessary to charge a custom amount of bytes instead of pages.

All charged memory rounded down to pages is charged to the corresponding
memory cgroup using __memcg_kmem_charge().

It implements reparenting: on memcg offlining it's getting reattached to
the parent memory cgroup. Each online memory cgroup has an associated
active object cgroup to handle new allocations and the list of all
attached object cgroups. On offlining of a cgroup this list is reparented
and for each object cgroup in the list the memcg pointer is swapped to the
parent memory cgroup. It prevents long-living objects from pinning the
original memory cgroup in the memory.

The implementation is based on byte-sized per-cpu stocks. A sub-page
sized leftover is stored in an atomic field, which is a part of obj_cgroup
object. So on cgroup offlining the leftover is automatically reparented.

memcg->objcg is rcu protected. objcg->memcg is a raw pointer, which is
always pointing at a memory cgroup, but can be atomically swapped to the
parent memory cgroup. So a user must ensure the lifetime of the cgroup,
e.g. grab rcu_read_lock or css_set_lock.
Link: http://lkml.kernel.org/r/20200623174037.3951353-7-guro@fb.com Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/memcontrol.h | 51 ++++++ mm/memcontrol.c | 288 ++++++++++++++++++++++++++++++++++- 2 files changed, 338 insertions(+), 1 deletion(-) --- a/include/linux/memcontrol.h~mm-memcg-slab-obj_cgroup-api +++ a/include/linux/memcontrol.h @@ -23,6 +23,7 @@ #include <linux/page-flags.h> struct mem_cgroup; +struct obj_cgroup; struct page; struct mm_struct; struct kmem_cache; @@ -193,6 +194,22 @@ struct memcg_cgwb_frn { }; /* + * Bucket for arbitrarily byte-sized objects charged to a memory + * cgroup. The bucket can be reparented in one piece when the cgroup + * is destroyed, without having to round up the individual references + * of all live memory objects in the wild. + */ +struct obj_cgroup { + struct percpu_ref refcnt; + struct mem_cgroup *memcg; + atomic_t nr_charged_bytes; + union { + struct list_head list; + struct rcu_head rcu; + }; +}; + +/* * The memory controller data structure. The memory controller controls both * page cache and RSS per cgroup. We would eventually like to provide * statistics based on the statistics developed by Rik Van Riel for clock-pro, @@ -301,6 +318,8 @@ struct mem_cgroup { int kmemcg_id; enum memcg_kmem_state kmem_state; struct list_head kmem_caches; + struct obj_cgroup __rcu *objcg; + struct list_head objcg_list; /* list of inherited objcgs */ #endif #ifdef CONFIG_CGROUP_WRITEBACK @@ -416,6 +435,33 @@ struct mem_cgroup *mem_cgroup_from_css(s return css ? 
container_of(css, struct mem_cgroup, css) : NULL; } +static inline bool obj_cgroup_tryget(struct obj_cgroup *objcg) +{ + return percpu_ref_tryget(&objcg->refcnt); +} + +static inline void obj_cgroup_get(struct obj_cgroup *objcg) +{ + percpu_ref_get(&objcg->refcnt); +} + +static inline void obj_cgroup_put(struct obj_cgroup *objcg) +{ + percpu_ref_put(&objcg->refcnt); +} + +/* + * After the initialization objcg->memcg is always pointing at + * a valid memcg, but can be atomically swapped to the parent memcg. + * + * The caller must ensure that the returned memcg won't be released: + * e.g. acquire the rcu_read_lock or css_set_lock. + */ +static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg) +{ + return READ_ONCE(objcg->memcg); +} + static inline void mem_cgroup_put(struct mem_cgroup *memcg) { if (memcg) @@ -1368,6 +1414,11 @@ void __memcg_kmem_uncharge(struct mem_cg int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order); void __memcg_kmem_uncharge_page(struct page *page, int order); +struct obj_cgroup *get_obj_cgroup_from_current(void); + +int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size); +void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size); + extern struct static_key_false memcg_kmem_enabled_key; extern struct workqueue_struct *memcg_kmem_cache_wq; --- a/mm/memcontrol.c~mm-memcg-slab-obj_cgroup-api +++ a/mm/memcontrol.c @@ -257,6 +257,98 @@ struct cgroup_subsys_state *vmpressure_t } #ifdef CONFIG_MEMCG_KMEM +extern spinlock_t css_set_lock; + +static void obj_cgroup_release(struct percpu_ref *ref) +{ + struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt); + struct mem_cgroup *memcg; + unsigned int nr_bytes; + unsigned int nr_pages; + unsigned long flags; + + /* + * At this point all allocated objects are freed, and + * objcg->nr_charged_bytes can't have an arbitrary byte value. + * However, it can be PAGE_SIZE or (x * PAGE_SIZE). 
+ * + * The following sequence can lead to it: + * 1) CPU0: objcg == stock->cached_objcg + * 2) CPU1: we do a small allocation (e.g. 92 bytes), + * PAGE_SIZE bytes are charged + * 3) CPU1: a process from another memcg is allocating something, + * the stock if flushed, + * objcg->nr_charged_bytes = PAGE_SIZE - 92 + * 5) CPU0: we do release this object, + * 92 bytes are added to stock->nr_bytes + * 6) CPU0: stock is flushed, + * 92 bytes are added to objcg->nr_charged_bytes + * + * In the result, nr_charged_bytes == PAGE_SIZE. + * This page will be uncharged in obj_cgroup_release(). + */ + nr_bytes = atomic_read(&objcg->nr_charged_bytes); + WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1)); + nr_pages = nr_bytes >> PAGE_SHIFT; + + spin_lock_irqsave(&css_set_lock, flags); + memcg = obj_cgroup_memcg(objcg); + if (nr_pages) + __memcg_kmem_uncharge(memcg, nr_pages); + list_del(&objcg->list); + mem_cgroup_put(memcg); + spin_unlock_irqrestore(&css_set_lock, flags); + + percpu_ref_exit(ref); + kfree_rcu(objcg, rcu); +} + +static struct obj_cgroup *obj_cgroup_alloc(void) +{ + struct obj_cgroup *objcg; + int ret; + + objcg = kzalloc(sizeof(struct obj_cgroup), GFP_KERNEL); + if (!objcg) + return NULL; + + ret = percpu_ref_init(&objcg->refcnt, obj_cgroup_release, 0, + GFP_KERNEL); + if (ret) { + kfree(objcg); + return NULL; + } + INIT_LIST_HEAD(&objcg->list); + return objcg; +} + +static void memcg_reparent_objcgs(struct mem_cgroup *memcg, + struct mem_cgroup *parent) +{ + struct obj_cgroup *objcg, *iter; + + objcg = rcu_replace_pointer(memcg->objcg, NULL, true); + + spin_lock_irq(&css_set_lock); + + /* Move active objcg to the parent's list */ + xchg(&objcg->memcg, parent); + css_get(&parent->css); + list_add(&objcg->list, &parent->objcg_list); + + /* Move already reparented objcgs to the parent's list */ + list_for_each_entry(iter, &memcg->objcg_list, list) { + css_get(&parent->css); + xchg(&iter->memcg, parent); + css_put(&memcg->css); + } + list_splice(&memcg->objcg_list, 
&parent->objcg_list); + + spin_unlock_irq(&css_set_lock); + + percpu_ref_kill(&objcg->refcnt); +} + /* * This will be the memcg's index in each cache's ->memcg_params.memcg_caches. * The main reason for not using cgroup id for this: @@ -2047,6 +2139,12 @@ EXPORT_SYMBOL(unlock_page_memcg); struct memcg_stock_pcp { struct mem_cgroup *cached; /* this never be root cgroup */ unsigned int nr_pages; + +#ifdef CONFIG_MEMCG_KMEM + struct obj_cgroup *cached_objcg; + unsigned int nr_bytes; +#endif + struct work_struct work; unsigned long flags; #define FLUSHING_CACHED_CHARGE 0 @@ -2054,6 +2152,22 @@ struct memcg_stock_pcp { static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock); static DEFINE_MUTEX(percpu_charge_mutex); +#ifdef CONFIG_MEMCG_KMEM +static void drain_obj_stock(struct memcg_stock_pcp *stock); +static bool obj_stock_flush_required(struct memcg_stock_pcp *stock, + struct mem_cgroup *root_memcg); + +#else +static inline void drain_obj_stock(struct memcg_stock_pcp *stock) +{ +} +static bool obj_stock_flush_required(struct memcg_stock_pcp *stock, + struct mem_cgroup *root_memcg) +{ + return false; +} +#endif + /** * consume_stock: Try to consume stocked charge on this cpu. * @memcg: memcg to consume from. 
@@ -2120,6 +2234,7 @@ static void drain_local_stock(struct wor local_irq_save(flags); stock = this_cpu_ptr(&memcg_stock); + drain_obj_stock(stock); drain_stock(stock); clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags); @@ -2179,6 +2294,8 @@ static void drain_all_stock(struct mem_c if (memcg && stock->nr_pages && mem_cgroup_is_descendant(memcg, root_memcg)) flush = true; + if (obj_stock_flush_required(stock, root_memcg)) + flush = true; rcu_read_unlock(); if (flush && @@ -2705,6 +2822,30 @@ struct mem_cgroup *mem_cgroup_from_obj(v return page->mem_cgroup; } +__always_inline struct obj_cgroup *get_obj_cgroup_from_current(void) +{ + struct obj_cgroup *objcg = NULL; + struct mem_cgroup *memcg; + + if (unlikely(!current->mm && !current->active_memcg)) + return NULL; + + rcu_read_lock(); + if (unlikely(current->active_memcg)) + memcg = rcu_dereference(current->active_memcg); + else + memcg = mem_cgroup_from_task(current); + + for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) { + objcg = rcu_dereference(memcg->objcg); + if (objcg && obj_cgroup_tryget(objcg)) + break; + } + rcu_read_unlock(); + + return objcg; +} + static int memcg_alloc_cache_id(void) { int id, size; @@ -2996,6 +3137,140 @@ void __memcg_kmem_uncharge_page(struct p if (PageKmemcg(page)) __ClearPageKmemcg(page); } + +static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes) +{ + struct memcg_stock_pcp *stock; + unsigned long flags; + bool ret = false; + + local_irq_save(flags); + + stock = this_cpu_ptr(&memcg_stock); + if (objcg == stock->cached_objcg && stock->nr_bytes >= nr_bytes) { + stock->nr_bytes -= nr_bytes; + ret = true; + } + + local_irq_restore(flags); + + return ret; +} + +static void drain_obj_stock(struct memcg_stock_pcp *stock) +{ + struct obj_cgroup *old = stock->cached_objcg; + + if (!old) + return; + + if (stock->nr_bytes) { + unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT; + unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1); + + if 
(nr_pages) { + rcu_read_lock(); + __memcg_kmem_uncharge(obj_cgroup_memcg(old), nr_pages); + rcu_read_unlock(); + } + + /* + * The leftover is flushed to the centralized per-memcg value. + * On the next attempt to refill obj stock it will be moved + * to a per-cpu stock (probably, on an other CPU), see + * refill_obj_stock(). + * + * How often it's flushed is a trade-off between the memory + * limit enforcement accuracy and potential CPU contention, + * so it might be changed in the future. + */ + atomic_add(nr_bytes, &old->nr_charged_bytes); + stock->nr_bytes = 0; + } + + obj_cgroup_put(old); + stock->cached_objcg = NULL; +} + +static bool obj_stock_flush_required(struct memcg_stock_pcp *stock, + struct mem_cgroup *root_memcg) +{ + struct mem_cgroup *memcg; + + if (stock->cached_objcg) { + memcg = obj_cgroup_memcg(stock->cached_objcg); + if (memcg && mem_cgroup_is_descendant(memcg, root_memcg)) + return true; + } + + return false; +} + +static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes) +{ + struct memcg_stock_pcp *stock; + unsigned long flags; + + local_irq_save(flags); + + stock = this_cpu_ptr(&memcg_stock); + if (stock->cached_objcg != objcg) { /* reset if necessary */ + drain_obj_stock(stock); + obj_cgroup_get(objcg); + stock->cached_objcg = objcg; + stock->nr_bytes = atomic_xchg(&objcg->nr_charged_bytes, 0); + } + stock->nr_bytes += nr_bytes; + + if (stock->nr_bytes > PAGE_SIZE) + drain_obj_stock(stock); + + local_irq_restore(flags); +} + +int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size) +{ + struct mem_cgroup *memcg; + unsigned int nr_pages, nr_bytes; + int ret; + + if (consume_obj_stock(objcg, size)) + return 0; + + /* + * In theory, memcg->nr_charged_bytes can have enough + * pre-charged bytes to satisfy the allocation. However, + * flushing memcg->nr_charged_bytes requires two atomic + * operations, and memcg->nr_charged_bytes can't be big, + * so it's better to ignore it and try grab some new pages. 
+ * memcg->nr_charged_bytes will be flushed in + * refill_obj_stock(), called from this function or + * independently later. + */ + rcu_read_lock(); + memcg = obj_cgroup_memcg(objcg); + css_get(&memcg->css); + rcu_read_unlock(); + + nr_pages = size >> PAGE_SHIFT; + nr_bytes = size & (PAGE_SIZE - 1); + + if (nr_bytes) + nr_pages += 1; + + ret = __memcg_kmem_charge(memcg, gfp, nr_pages); + if (!ret && nr_bytes) + refill_obj_stock(objcg, PAGE_SIZE - nr_bytes); + + css_put(&memcg->css); + return ret; +} + +void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size) +{ + refill_obj_stock(objcg, size); +} + #endif /* CONFIG_MEMCG_KMEM */ #ifdef CONFIG_TRANSPARENT_HUGEPAGE @@ -3416,6 +3691,7 @@ static void memcg_flush_percpu_vmevents( #ifdef CONFIG_MEMCG_KMEM static int memcg_online_kmem(struct mem_cgroup *memcg) { + struct obj_cgroup *objcg; int memcg_id; if (cgroup_memory_nokmem) @@ -3428,6 +3704,14 @@ static int memcg_online_kmem(struct mem_ if (memcg_id < 0) return memcg_id; + objcg = obj_cgroup_alloc(); + if (!objcg) { + memcg_free_cache_id(memcg_id); + return -ENOMEM; + } + objcg->memcg = memcg; + rcu_assign_pointer(memcg->objcg, objcg); + static_branch_enable(&memcg_kmem_enabled_key); /* @@ -3464,9 +3748,10 @@ static void memcg_offline_kmem(struct me parent = root_mem_cgroup; /* - * Deactivate and reparent kmem_caches. + * Deactivate and reparent kmem_caches and objcgs. */ memcg_deactivate_kmem_caches(memcg, parent); + memcg_reparent_objcgs(memcg, parent); kmemcg_id = memcg->kmemcg_id; BUG_ON(kmemcg_id < 0); @@ -5030,6 +5315,7 @@ static struct mem_cgroup *mem_cgroup_all memcg->socket_pressure = jiffies; #ifdef CONFIG_MEMCG_KMEM memcg->kmemcg_id = -1; + INIT_LIST_HEAD(&memcg->objcg_list); #endif #ifdef CONFIG_CGROUP_WRITEBACK INIT_LIST_HEAD(&memcg->cgwb_list); _
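The byte-sized stock logic in the obj_cgroup API patch above is easier to
follow with the kernel plumbing removed. The following is a single-CPU,
single-threaded toy model, offered as a sketch only: the toy_* names and
TOY_PAGE_* constants are hypothetical, charging never fails, and IRQ
disabling, RCU, reference counts and the atomic nr_charged_bytes field are
all reduced to plain integers.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE (1u << TOY_PAGE_SHIFT)

struct toy_objcg {
	unsigned long memcg_pages;	/* pages charged to the backing memcg */
	unsigned int nr_charged_bytes;	/* sub-page leftover kept on the objcg */
};

struct toy_stock {
	struct toy_objcg *cached;	/* objcg this stock currently serves */
	unsigned int nr_bytes;		/* pre-charged bytes ready to hand out */
};

/* Return whole pages to the memcg, park the remainder on the objcg. */
static inline void toy_drain(struct toy_stock *st)
{
	struct toy_objcg *old = st->cached;

	if (!old)
		return;
	old->memcg_pages -= st->nr_bytes >> TOY_PAGE_SHIFT;
	old->nr_charged_bytes += st->nr_bytes & (TOY_PAGE_SIZE - 1);
	st->nr_bytes = 0;
	st->cached = NULL;
}

/* Add bytes to the stock, switching objcg and draining when needed. */
static inline void toy_refill(struct toy_stock *st, struct toy_objcg *cg,
			      unsigned int nr_bytes)
{
	if (st->cached != cg) {
		toy_drain(st);
		st->cached = cg;
		st->nr_bytes = cg->nr_charged_bytes;	/* pick up leftover */
		cg->nr_charged_bytes = 0;
	}
	st->nr_bytes += nr_bytes;
	if (st->nr_bytes > TOY_PAGE_SIZE)
		toy_drain(st);
}

/* Charge size bytes: use the stock if possible, else charge new pages. */
static inline void toy_charge(struct toy_stock *st, struct toy_objcg *cg,
			      size_t size)
{
	unsigned int nr_pages, nr_bytes;

	if (st->cached == cg && st->nr_bytes >= size) {
		st->nr_bytes -= (unsigned int)size;
		return;
	}
	nr_pages = (unsigned int)(size >> TOY_PAGE_SHIFT);
	nr_bytes = (unsigned int)(size & (TOY_PAGE_SIZE - 1));
	if (nr_bytes)
		nr_pages += 1;
	cg->memcg_pages += nr_pages;
	if (nr_bytes)
		toy_refill(st, cg, TOY_PAGE_SIZE - nr_bytes);
}

/* Uncharging simply hands the bytes back to the stock. */
static inline void toy_uncharge(struct toy_stock *st, struct toy_objcg *cg,
				size_t size)
{
	toy_refill(st, cg, (unsigned int)size);
}
```

With this model a 92-byte charge costs one page on the memcg and leaves
PAGE_SIZE - 92 bytes in the stock, so subsequent small charges are served
without touching the page counter at all.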
From: Roman Gushchin <guro@fb.com>
Subject: mm: memcg/slab: allocate obj_cgroups for non-root slab pages

Allocate and release memory to store obj_cgroup pointers for each non-root
slab page. Reuse page->mem_cgroup pointer to store a pointer to the
allocated space.

This commit temporarily increases the memory footprint of the kernel
memory accounting. To store obj_cgroup pointers we'll need a place for an
objcg_pointer for each allocated object. However, the following patches
in the series will enable sharing of slab pages between memory cgroups,
which will dramatically increase the total slab utilization. And the
final memory footprint will be significantly smaller than before.

To distinguish between obj_cgroups and memcg pointers in case when it's
not obvious which one is used (as in page_cgroup_ino()), let's always set
the lowest bit in the obj_cgroup case.

The original obj_cgroups pointer is marked to be ignored by kmemleak,
which otherwise would report a memory leak for each allocated vector.
Link: http://lkml.kernel.org/r/20200623174037.3951353-8-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/mm_types.h | 5 ++- include/linux/slab_def.h | 6 ++++ include/linux/slub_def.h | 5 +++ mm/memcontrol.c | 17 +++++++++--- mm/slab.h | 52 +++++++++++++++++++++++++++++++++++++ 5 files changed, 81 insertions(+), 4 deletions(-) --- a/include/linux/mm_types.h~mm-memcg-slab-allocate-obj_cgroups-for-non-root-slab-pages +++ a/include/linux/mm_types.h @@ -198,7 +198,10 @@ struct page { atomic_t _refcount; #ifdef CONFIG_MEMCG - struct mem_cgroup *mem_cgroup; + union { + struct mem_cgroup *mem_cgroup; + struct obj_cgroup **obj_cgroups; + }; #endif /* --- a/include/linux/slab_def.h~mm-memcg-slab-allocate-obj_cgroups-for-non-root-slab-pages +++ a/include/linux/slab_def.h @@ -114,4 +114,10 @@ static inline unsigned int obj_to_index( return reciprocal_divide(offset, cache->reciprocal_buffer_size); } +static inline int objs_per_slab_page(const struct kmem_cache *cache, + const struct page *page) +{ + return cache->num; +} + #endif /* _LINUX_SLAB_DEF_H */ --- a/include/linux/slub_def.h~mm-memcg-slab-allocate-obj_cgroups-for-non-root-slab-pages +++ a/include/linux/slub_def.h @@ -198,4 +198,9 @@ static inline unsigned int obj_to_index( return __obj_to_index(cache, page_address(page), obj); } +static inline int objs_per_slab_page(const struct kmem_cache *cache, + const struct page *page) +{ + return page->objects; +} #endif /* _LINUX_SLUB_DEF_H */ --- a/mm/memcontrol.c~mm-memcg-slab-allocate-obj_cgroups-for-non-root-slab-pages +++ a/mm/memcontrol.c @@ -569,10 +569,21 @@ ino_t page_cgroup_ino(struct page *page) unsigned long ino = 0; rcu_read_lock(); - if 
(PageSlab(page) && !PageTail(page)) + if (PageSlab(page) && !PageTail(page)) { memcg = memcg_from_slab_page(page); - else - memcg = READ_ONCE(page->mem_cgroup); + } else { + memcg = page->mem_cgroup; + + /* + * The lowest bit set means that memcg isn't a valid + * memcg pointer, but a obj_cgroups pointer. + * In this case the page is shared and doesn't belong + * to any specific memory cgroup. + */ + if ((unsigned long) memcg & 0x1UL) + memcg = NULL; + } + while (memcg && !(memcg->css.flags & CSS_ONLINE)) memcg = parent_mem_cgroup(memcg); if (memcg) --- a/mm/slab.h~mm-memcg-slab-allocate-obj_cgroups-for-non-root-slab-pages +++ a/mm/slab.h @@ -109,6 +109,7 @@ struct memcg_cache_params { #include <linux/kmemleak.h> #include <linux/random.h> #include <linux/sched/mm.h> +#include <linux/kmemleak.h> /* * State of the slab allocator. @@ -348,6 +349,18 @@ static inline struct kmem_cache *memcg_r return s->memcg_params.root_cache; } +static inline struct obj_cgroup **page_obj_cgroups(struct page *page) +{ + /* + * page->mem_cgroup and page->obj_cgroups are sharing the same + * space. To distinguish between them in case we don't know for sure + * that the page is a slab page (e.g. page_cgroup_ino()), let's + * always set the lowest bit of obj_cgroups. + */ + return (struct obj_cgroup **) + ((unsigned long)page->obj_cgroups & ~0x1UL); +} + /* * Expects a pointer to a slab page. 
Please note, that PageSlab() check * isn't sufficient, as it returns true also for tail compound slab pages, @@ -435,6 +448,28 @@ static __always_inline void memcg_unchar percpu_ref_put_many(&s->memcg_params.refcnt, nr_pages); } +static inline int memcg_alloc_page_obj_cgroups(struct page *page, + struct kmem_cache *s, gfp_t gfp) +{ + unsigned int objects = objs_per_slab_page(s, page); + void *vec; + + vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp, + page_to_nid(page)); + if (!vec) + return -ENOMEM; + + kmemleak_not_leak(vec); + page->obj_cgroups = (struct obj_cgroup **) ((unsigned long)vec | 0x1UL); + return 0; +} + +static inline void memcg_free_page_obj_cgroups(struct page *page) +{ + kfree(page_obj_cgroups(page)); + page->obj_cgroups = NULL; +} + extern void slab_init_memcg_params(struct kmem_cache *); extern void memcg_link_cache(struct kmem_cache *s, struct mem_cgroup *memcg); @@ -484,6 +519,16 @@ static inline void memcg_uncharge_slab(s { } +static inline int memcg_alloc_page_obj_cgroups(struct page *page, + struct kmem_cache *s, gfp_t gfp) +{ + return 0; +} + +static inline void memcg_free_page_obj_cgroups(struct page *page) +{ +} + static inline void slab_init_memcg_params(struct kmem_cache *s) { } @@ -510,12 +555,18 @@ static __always_inline int charge_slab_p gfp_t gfp, int order, struct kmem_cache *s) { + int ret; + if (is_root_cache(s)) { mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), PAGE_SIZE << order); return 0; } + ret = memcg_alloc_page_obj_cgroups(page, s, gfp); + if (ret) + return ret; + return memcg_charge_slab(page, gfp, order, s); } @@ -528,6 +579,7 @@ static __always_inline void uncharge_sla return; } + memcg_free_page_obj_cgroups(page); memcg_uncharge_slab(page, order, s); } _
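The low-bit tagging used in the patch above can be illustrated outside the kernel. The following is only a user-space sketch under stated assumptions: `struct owner`, `struct slot` and the `slot_*` helpers are hypothetical stand-ins for `struct mem_cgroup`, `struct page` and the `page_obj_cgroups()` family, and `calloc()` replaces `kcalloc_node()`. It relies on the allocator returning memory aligned to at least two bytes, so the lowest bit of a real pointer is always zero and is free to carry the tag.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mem_cgroup / struct obj_cgroup. */
struct owner {
	int id;
};

/* Hypothetical stand-in for the union added to struct page. */
struct slot {
	union {
		struct owner *single;   /* untagged: one owner for the whole slot */
		struct owner **vector;  /* tagged: per-object owner vector */
	};
};

#define VEC_TAG ((uintptr_t)0x1)

/* Allocate a zeroed vector and store it with the lowest bit set. */
static int slot_set_vector(struct slot *s, size_t nr_objects)
{
	struct owner **vec = calloc(nr_objects, sizeof(*vec));

	if (!vec)
		return -1;
	s->vector = (struct owner **)((uintptr_t)vec | VEC_TAG);
	return 0;
}

/* The tag bit tells a reader which union member is in use. */
static int slot_has_vector(const struct slot *s)
{
	return ((uintptr_t)s->vector & VEC_TAG) != 0;
}

/* Strip the tag to recover the usable vector pointer. */
static struct owner **slot_vector(const struct slot *s)
{
	return (struct owner **)((uintptr_t)s->vector & ~VEC_TAG);
}

/* Mirrors the page_cgroup_ino() check: a tagged value has no single owner. */
static struct owner *slot_single_owner(const struct slot *s)
{
	return slot_has_vector(s) ? NULL : s->single;
}

/* Free the untagged pointer, then clear the slot. */
static void slot_free_vector(struct slot *s)
{
	free(slot_vector(s));
	s->vector = NULL;
}
```

A reader that does not know what the slot holds calls `slot_single_owner()` and gets NULL for a tagged value, which corresponds to the "page is shared" case in the diff.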
From: Roman Gushchin <guro@fb.com> Subject: mm: memcg/slab: save obj_cgroup for non-root slab objects Store the obj_cgroup pointer in the corresponding place of page->obj_cgroups for each allocated non-root slab object. Make sure that each allocated object holds a reference to obj_cgroup. Objcg pointer is obtained from the memcg->objcg dereferencing in memcg_kmem_get_cache() and passed from pre_alloc_hook to post_alloc_hook. Then in case of successful allocation(s) it's getting stored in the page->obj_cgroups vector. The objcg obtaining part look a bit bulky now, but it will be simplified by next commits in the series. Link: http://lkml.kernel.org/r/20200623174037.3951353-9-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/memcontrol.h | 3 + mm/memcontrol.c | 14 +++++++- mm/slab.c | 18 ++++++---- mm/slab.h | 60 +++++++++++++++++++++++++++++++---- mm/slub.c | 14 +++++--- 5 files changed, 88 insertions(+), 21 deletions(-) --- a/include/linux/memcontrol.h~mm-memcg-slab-save-obj_cgroup-for-non-root-slab-objects +++ a/include/linux/memcontrol.h @@ -1404,7 +1404,8 @@ static inline void memcg_set_shrinker_bi } #endif -struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep); +struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep, + struct obj_cgroup **objcgp); void memcg_kmem_put_cache(struct kmem_cache *cachep); #ifdef CONFIG_MEMCG_KMEM --- a/mm/memcontrol.c~mm-memcg-slab-save-obj_cgroup-for-non-root-slab-objects +++ a/mm/memcontrol.c @@ -2973,7 +2973,8 @@ static inline bool memcg_kmem_bypass(voi * done with it, memcg_kmem_put_cache() must be called to release the * reference. 
*/ -struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep) +struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep, + struct obj_cgroup **objcgp) { struct mem_cgroup *memcg; struct kmem_cache *memcg_cachep; @@ -3029,8 +3030,17 @@ struct kmem_cache *memcg_kmem_get_cache( */ if (unlikely(!memcg_cachep)) memcg_schedule_kmem_cache_create(memcg, cachep); - else if (percpu_ref_tryget(&memcg_cachep->memcg_params.refcnt)) + else if (percpu_ref_tryget(&memcg_cachep->memcg_params.refcnt)) { + struct obj_cgroup *objcg = rcu_dereference(memcg->objcg); + + if (!objcg || !obj_cgroup_tryget(objcg)) { + percpu_ref_put(&memcg_cachep->memcg_params.refcnt); + goto out_unlock; + } + + *objcgp = objcg; cachep = memcg_cachep; + } out_unlock: rcu_read_unlock(); return cachep; --- a/mm/slab.c~mm-memcg-slab-save-obj_cgroup-for-non-root-slab-objects +++ a/mm/slab.c @@ -3228,9 +3228,10 @@ slab_alloc_node(struct kmem_cache *cache unsigned long save_flags; void *ptr; int slab_node = numa_mem_id(); + struct obj_cgroup *objcg = NULL; flags &= gfp_allowed_mask; - cachep = slab_pre_alloc_hook(cachep, flags); + cachep = slab_pre_alloc_hook(cachep, &objcg, 1, flags); if (unlikely(!cachep)) return NULL; @@ -3266,7 +3267,7 @@ slab_alloc_node(struct kmem_cache *cache if (unlikely(slab_want_init_on_alloc(flags, cachep)) && ptr) memset(ptr, 0, cachep->object_size); - slab_post_alloc_hook(cachep, flags, 1, &ptr); + slab_post_alloc_hook(cachep, objcg, flags, 1, &ptr); return ptr; } @@ -3307,9 +3308,10 @@ slab_alloc(struct kmem_cache *cachep, gf { unsigned long save_flags; void *objp; + struct obj_cgroup *objcg = NULL; flags &= gfp_allowed_mask; - cachep = slab_pre_alloc_hook(cachep, flags); + cachep = slab_pre_alloc_hook(cachep, &objcg, 1, flags); if (unlikely(!cachep)) return NULL; @@ -3323,7 +3325,7 @@ slab_alloc(struct kmem_cache *cachep, gf if (unlikely(slab_want_init_on_alloc(flags, cachep)) && objp) memset(objp, 0, cachep->object_size); - slab_post_alloc_hook(cachep, flags, 1, 
&objp); + slab_post_alloc_hook(cachep, objcg, flags, 1, &objp); return objp; } @@ -3450,6 +3452,7 @@ void ___cache_free(struct kmem_cache *ca memset(objp, 0, cachep->object_size); kmemleak_free_recursive(objp, cachep->flags); objp = cache_free_debugcheck(cachep, objp, caller); + memcg_slab_free_hook(cachep, virt_to_head_page(objp), objp); /* * Skip calling cache_free_alien() when the platform is not numa. @@ -3515,8 +3518,9 @@ int kmem_cache_alloc_bulk(struct kmem_ca void **p) { size_t i; + struct obj_cgroup *objcg = NULL; - s = slab_pre_alloc_hook(s, flags); + s = slab_pre_alloc_hook(s, &objcg, size, flags); if (!s) return 0; @@ -3539,13 +3543,13 @@ int kmem_cache_alloc_bulk(struct kmem_ca for (i = 0; i < size; i++) memset(p[i], 0, s->object_size); - slab_post_alloc_hook(s, flags, size, p); + slab_post_alloc_hook(s, objcg, flags, size, p); /* FIXME: Trace call missing. Christoph would like a bulk variant */ return size; error: local_irq_enable(); cache_alloc_debugcheck_after_bulk(s, flags, i, p, _RET_IP_); - slab_post_alloc_hook(s, flags, i, p); + slab_post_alloc_hook(s, objcg, flags, i, p); __kmem_cache_free_bulk(s, i, p); return 0; } --- a/mm/slab.h~mm-memcg-slab-save-obj_cgroup-for-non-root-slab-objects +++ a/mm/slab.h @@ -470,6 +470,41 @@ static inline void memcg_free_page_obj_c page->obj_cgroups = NULL; } +static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s, + struct obj_cgroup *objcg, + size_t size, void **p) +{ + struct page *page; + unsigned long off; + size_t i; + + for (i = 0; i < size; i++) { + if (likely(p[i])) { + page = virt_to_head_page(p[i]); + off = obj_to_index(s, page, p[i]); + obj_cgroup_get(objcg); + page_obj_cgroups(page)[off] = objcg; + } + } + obj_cgroup_put(objcg); + memcg_kmem_put_cache(s); +} + +static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page, + void *p) +{ + struct obj_cgroup *objcg; + unsigned int off; + + if (!memcg_kmem_enabled() || is_root_cache(s)) + return; + + off = obj_to_index(s, 
page, p); + objcg = page_obj_cgroups(page)[off]; + page_obj_cgroups(page)[off] = NULL; + obj_cgroup_put(objcg); +} + extern void slab_init_memcg_params(struct kmem_cache *); extern void memcg_link_cache(struct kmem_cache *s, struct mem_cgroup *memcg); @@ -529,6 +564,17 @@ static inline void memcg_free_page_obj_c { } +static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s, + struct obj_cgroup *objcg, + size_t size, void **p) +{ +} + +static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page, + void *p) +{ +} + static inline void slab_init_memcg_params(struct kmem_cache *s) { } @@ -631,7 +677,8 @@ static inline size_t slab_ksize(const st } static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, - gfp_t flags) + struct obj_cgroup **objcgp, + size_t size, gfp_t flags) { flags &= gfp_allowed_mask; @@ -645,13 +692,14 @@ static inline struct kmem_cache *slab_pr if (memcg_kmem_enabled() && ((flags & __GFP_ACCOUNT) || (s->flags & SLAB_ACCOUNT))) - return memcg_kmem_get_cache(s); + return memcg_kmem_get_cache(s, objcgp); return s; } -static inline void slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags, - size_t size, void **p) +static inline void slab_post_alloc_hook(struct kmem_cache *s, + struct obj_cgroup *objcg, + gfp_t flags, size_t size, void **p) { size_t i; @@ -663,8 +711,8 @@ static inline void slab_post_alloc_hook( s->flags, flags); } - if (memcg_kmem_enabled()) - memcg_kmem_put_cache(s); + if (memcg_kmem_enabled() && !is_root_cache(s)) + memcg_slab_post_alloc_hook(s, objcg, size, p); } #ifndef CONFIG_SLOB --- a/mm/slub.c~mm-memcg-slab-save-obj_cgroup-for-non-root-slab-objects +++ a/mm/slub.c @@ -2817,8 +2817,9 @@ static __always_inline void *slab_alloc_ struct kmem_cache_cpu *c; struct page *page; unsigned long tid; + struct obj_cgroup *objcg = NULL; - s = slab_pre_alloc_hook(s, gfpflags); + s = slab_pre_alloc_hook(s, &objcg, 1, gfpflags); if (!s) return NULL; redo: @@ -2894,7 +2895,7 @@ redo: if 
(unlikely(slab_want_init_on_alloc(gfpflags, s)) && object) memset(object, 0, s->object_size); - slab_post_alloc_hook(s, gfpflags, 1, &object); + slab_post_alloc_hook(s, objcg, gfpflags, 1, &object); return object; } @@ -3099,6 +3100,8 @@ static __always_inline void do_slab_free void *tail_obj = tail ? : head; struct kmem_cache_cpu *c; unsigned long tid; + + memcg_slab_free_hook(s, page, head); redo: /* * Determine the currently cpus per cpu slab. @@ -3278,9 +3281,10 @@ int kmem_cache_alloc_bulk(struct kmem_ca { struct kmem_cache_cpu *c; int i; + struct obj_cgroup *objcg = NULL; /* memcg and kmem_cache debug support */ - s = slab_pre_alloc_hook(s, flags); + s = slab_pre_alloc_hook(s, &objcg, size, flags); if (unlikely(!s)) return false; /* @@ -3334,11 +3338,11 @@ int kmem_cache_alloc_bulk(struct kmem_ca } /* memcg and kmem_cache debug support */ - slab_post_alloc_hook(s, flags, size, p); + slab_post_alloc_hook(s, objcg, flags, size, p); return i; error: local_irq_enable(); - slab_post_alloc_hook(s, flags, i, p); + slab_post_alloc_hook(s, objcg, flags, i, p); __kmem_cache_free_bulk(s, i, p); return 0; } _
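The reference handling in the post-alloc and free hooks above can be modelled in a few lines. This is a simplified, single-threaded sketch, not kernel code: `struct objref` (a plain counter standing in for `obj_cgroup`), `struct slab_model`, and the `slab_*` helpers are all hypothetical names, and the index computation is a plain division rather than the kernel's `obj_to_index()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for obj_cgroup: just a reference counter. */
struct objref {
	long refs;
};

/* Hypothetical model of one slab page with its owner vector. */
struct slab_model {
	char *base;              /* address of the first object */
	size_t obj_size;         /* size of each object in bytes */
	struct objref **owners;  /* one slot per object, as in page->obj_cgroups */
};

/* Map an object address to its slot in the owner vector. */
static size_t slab_obj_index(const struct slab_model *s, const void *obj)
{
	return (size_t)((const char *)obj - s->base) / s->obj_size;
}

/*
 * The caller already holds one reference on @o (taken before allocating).
 * Each successfully allocated object takes its own reference, then the
 * caller's reference is dropped.
 */
static void slab_post_alloc(struct slab_model *s, struct objref *o,
			    void **objs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (objs[i]) {
			o->refs++;
			s->owners[slab_obj_index(s, objs[i])] = o;
		}
	}
	o->refs--;
}

/* On free, clear the slot and drop the reference the object held. */
static void slab_on_free(struct slab_model *s, void *obj)
{
	size_t idx = slab_obj_index(s, obj);
	struct objref *o = s->owners[idx];

	s->owners[idx] = NULL;
	o->refs--;
}
```

After a bulk allocation in which some entries failed, the owner's count equals the number of live objects that point at it, which is the invariant the patch establishes.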
From: Roman Gushchin <guro@fb.com> Subject: mm: memcg/slab: charge individual slab objects instead of pages Switch to per-object accounting of non-root slab objects. Charging is performed using obj_cgroup API in the pre_alloc hook. Obj_cgroup is charged with the size of the object and the size of metadata: as now it's the size of an obj_cgroup pointer. If the amount of memory has been charged successfully, the actual allocation code is executed. Otherwise, -ENOMEM is returned. In the post_alloc hook if the actual allocation succeeded, corresponding vmstats are bumped and the obj_cgroup pointer is saved. Otherwise, the charge is canceled. On the free path obj_cgroup pointer is obtained and used to uncharge the size of the releasing object. Memcg and lruvec counters are now representing only memory used by active slab objects and do not include the free space. The free space is shared and doesn't belong to any specific cgroup. Global per-node slab vmstats are still modified from (un)charge_slab_page() functions. The idea is to keep all slab pages accounted as slab pages on system level. Link: http://lkml.kernel.org/r/20200623174037.3951353-10-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/slab.h | 174 +++++++++++++++++++++++----------------------------- 1 file changed, 78 insertions(+), 96 deletions(-) --- a/mm/slab.h~mm-memcg-slab-charge-individual-slab-objects-instead-of-pages +++ a/mm/slab.h @@ -382,72 +382,6 @@ static inline struct mem_cgroup *memcg_f return NULL; } -/* - * Charge the slab page belonging to the non-root kmem_cache. - * Can be called for non-root kmem_caches only. 
- */ -static __always_inline int memcg_charge_slab(struct page *page, - gfp_t gfp, int order, - struct kmem_cache *s) -{ - int nr_pages = 1 << order; - struct mem_cgroup *memcg; - struct lruvec *lruvec; - int ret; - - rcu_read_lock(); - memcg = READ_ONCE(s->memcg_params.memcg); - while (memcg && !css_tryget_online(&memcg->css)) - memcg = parent_mem_cgroup(memcg); - rcu_read_unlock(); - - if (unlikely(!memcg || mem_cgroup_is_root(memcg))) { - mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), - nr_pages << PAGE_SHIFT); - percpu_ref_get_many(&s->memcg_params.refcnt, nr_pages); - return 0; - } - - ret = memcg_kmem_charge(memcg, gfp, nr_pages); - if (ret) - goto out; - - lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page)); - mod_lruvec_state(lruvec, cache_vmstat_idx(s), nr_pages << PAGE_SHIFT); - - percpu_ref_get_many(&s->memcg_params.refcnt, nr_pages); -out: - css_put(&memcg->css); - return ret; -} - -/* - * Uncharge a slab page belonging to a non-root kmem_cache. - * Can be called for non-root kmem_caches only. 
- */ -static __always_inline void memcg_uncharge_slab(struct page *page, int order, - struct kmem_cache *s) -{ - int nr_pages = 1 << order; - struct mem_cgroup *memcg; - struct lruvec *lruvec; - - rcu_read_lock(); - memcg = READ_ONCE(s->memcg_params.memcg); - if (likely(!mem_cgroup_is_root(memcg))) { - lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page)); - mod_lruvec_state(lruvec, cache_vmstat_idx(s), - -(nr_pages << PAGE_SHIFT)); - memcg_kmem_uncharge(memcg, nr_pages); - } else { - mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), - -(nr_pages << PAGE_SHIFT)); - } - rcu_read_unlock(); - - percpu_ref_put_many(&s->memcg_params.refcnt, nr_pages); -} - static inline int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s, gfp_t gfp) { @@ -470,6 +404,48 @@ static inline void memcg_free_page_obj_c page->obj_cgroups = NULL; } +static inline size_t obj_full_size(struct kmem_cache *s) +{ + /* + * For each accounted object there is an extra space which is used + * to store obj_cgroup membership. Charge it too. 
+ */ + return s->size + sizeof(struct obj_cgroup *); +} + +static inline struct kmem_cache *memcg_slab_pre_alloc_hook(struct kmem_cache *s, + struct obj_cgroup **objcgp, + size_t objects, gfp_t flags) +{ + struct kmem_cache *cachep; + + cachep = memcg_kmem_get_cache(s, objcgp); + if (is_root_cache(cachep)) + return s; + + if (obj_cgroup_charge(*objcgp, flags, objects * obj_full_size(s))) { + obj_cgroup_put(*objcgp); + memcg_kmem_put_cache(cachep); + cachep = NULL; + } + + return cachep; +} + +static inline void mod_objcg_state(struct obj_cgroup *objcg, + struct pglist_data *pgdat, + int idx, int nr) +{ + struct mem_cgroup *memcg; + struct lruvec *lruvec; + + rcu_read_lock(); + memcg = obj_cgroup_memcg(objcg); + lruvec = mem_cgroup_lruvec(memcg, pgdat); + mod_memcg_lruvec_state(lruvec, idx, nr); + rcu_read_unlock(); +} + static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg, size_t size, void **p) @@ -484,6 +460,10 @@ static inline void memcg_slab_post_alloc off = obj_to_index(s, page, p[i]); obj_cgroup_get(objcg); page_obj_cgroups(page)[off] = objcg; + mod_objcg_state(objcg, page_pgdat(page), + cache_vmstat_idx(s), obj_full_size(s)); + } else { + obj_cgroup_uncharge(objcg, obj_full_size(s)); } } obj_cgroup_put(objcg); @@ -502,6 +482,11 @@ static inline void memcg_slab_free_hook( off = obj_to_index(s, page, p); objcg = page_obj_cgroups(page)[off]; page_obj_cgroups(page)[off] = NULL; + + obj_cgroup_uncharge(objcg, obj_full_size(s)); + mod_objcg_state(objcg, page_pgdat(page), cache_vmstat_idx(s), + -obj_full_size(s)); + obj_cgroup_put(objcg); } @@ -543,17 +528,6 @@ static inline struct mem_cgroup *memcg_f return NULL; } -static inline int memcg_charge_slab(struct page *page, gfp_t gfp, int order, - struct kmem_cache *s) -{ - return 0; -} - -static inline void memcg_uncharge_slab(struct page *page, int order, - struct kmem_cache *s) -{ -} - static inline int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s, 
gfp_t gfp) { @@ -564,6 +538,13 @@ static inline void memcg_free_page_obj_c { } +static inline struct kmem_cache *memcg_slab_pre_alloc_hook(struct kmem_cache *s, + struct obj_cgroup **objcgp, + size_t objects, gfp_t flags) +{ + return NULL; +} + static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg, size_t size, void **p) @@ -601,32 +582,33 @@ static __always_inline int charge_slab_p gfp_t gfp, int order, struct kmem_cache *s) { - int ret; +#ifdef CONFIG_MEMCG_KMEM + if (memcg_kmem_enabled() && !is_root_cache(s)) { + int ret; + + ret = memcg_alloc_page_obj_cgroups(page, s, gfp); + if (ret) + return ret; - if (is_root_cache(s)) { - mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), - PAGE_SIZE << order); - return 0; + percpu_ref_get_many(&s->memcg_params.refcnt, 1 << order); } - - ret = memcg_alloc_page_obj_cgroups(page, s, gfp); - if (ret) - return ret; - - return memcg_charge_slab(page, gfp, order, s); +#endif + mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), + PAGE_SIZE << order); + return 0; } static __always_inline void uncharge_slab_page(struct page *page, int order, struct kmem_cache *s) { - if (is_root_cache(s)) { - mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), - -(PAGE_SIZE << order)); - return; +#ifdef CONFIG_MEMCG_KMEM + if (memcg_kmem_enabled() && !is_root_cache(s)) { + memcg_free_page_obj_cgroups(page); + percpu_ref_put_many(&s->memcg_params.refcnt, 1 << order); } - - memcg_free_page_obj_cgroups(page); - memcg_uncharge_slab(page, order, s); +#endif + mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), + -(PAGE_SIZE << order)); } static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x) @@ -692,7 +674,7 @@ static inline struct kmem_cache *slab_pr if (memcg_kmem_enabled() && ((flags & __GFP_ACCOUNT) || (s->flags & SLAB_ACCOUNT))) - return memcg_kmem_get_cache(s, objcgp); + return memcg_slab_pre_alloc_hook(s, objcgp, size, flags); return s; } _
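The charging arithmetic of the patch above can be worked through with a toy model. The sketch below is an assumption-laden illustration: `struct budget` and the `budget_*` helpers are hypothetical and replace the obj_cgroup charge API with a simple byte counter and limit. What it keeps from the patch is the shape of the accounting: charge `objects * (object size + one pointer)` up front, then return the charge for every object that was not actually allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte budget standing in for a cgroup's kmem charge. */
struct budget {
	size_t used;
	size_t limit;
};

/* Object size plus the per-object ownership pointer, as in obj_full_size(). */
static size_t full_size(size_t obj_size)
{
	return obj_size + sizeof(void *);
}

static int budget_charge(struct budget *b, size_t bytes)
{
	if (bytes > b->limit - b->used)
		return -1;  /* over the limit: the allocation must not proceed */
	b->used += bytes;
	return 0;
}

static void budget_uncharge(struct budget *b, size_t bytes)
{
	b->used -= bytes;
}

/* Pre-alloc step: charge for all requested objects at once. */
static int budget_pre_alloc(struct budget *b, size_t obj_size, size_t n)
{
	return budget_charge(b, n * full_size(obj_size));
}

/* Post-alloc step: cancel the charge for each object that failed. */
static void budget_post_alloc(struct budget *b, size_t obj_size,
			      void **objs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!objs[i])
			budget_uncharge(b, full_size(obj_size));
	}
}
```

For example, on a 64-bit build a request for three 100-byte objects charges 3 * 108 bytes; if one allocation fails, 108 bytes are returned and the budget reflects only the two live objects.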
From: Roman Gushchin <guro@fb.com> Subject: mm: memcg/slab: deprecate memory.kmem.slabinfo Deprecate memory.kmem.slabinfo. An empty file will be presented if corresponding config options are enabled. The interface is implementation dependent, isn't present in cgroup v2, and is generally useful only for core mm debugging purposes. In other words, it doesn't provide any value for the absolute majority of users. A drgn-based replacement can be found in tools/cgroup/memcg_slabinfo.py. It does support cgroup v1 and v2, mimics memory.kmem.slabinfo output and also allows to get any additional information without a need to recompile the kernel. If a drgn-based solution is too slow for a task, a bpf-based tracing tool can be used, which can easily keep track of all slab allocations belonging to a memory cgroup. Link: http://lkml.kernel.org/r/20200623174037.3951353-11-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/memcontrol.c | 3 --- mm/slab_common.c | 31 ++++--------------------------- 2 files changed, 4 insertions(+), 30 deletions(-) --- a/mm/memcontrol.c~mm-memcg-slab-deprecate-memorykmemslabinfo +++ a/mm/memcontrol.c @@ -5114,9 +5114,6 @@ static struct cftype mem_cgroup_legacy_f (defined(CONFIG_SLAB) || defined(CONFIG_SLUB_DEBUG)) { .name = "kmem.slabinfo", - .seq_start = memcg_slab_start, - .seq_next = memcg_slab_next, - .seq_stop = memcg_slab_stop, .seq_show = memcg_slab_show, }, #endif --- a/mm/slab_common.c~mm-memcg-slab-deprecate-memorykmemslabinfo +++ a/mm/slab_common.c @@ -1561,35 +1561,12 @@ void dump_unreclaimable_slab(void) } #if defined(CONFIG_MEMCG_KMEM) -void *memcg_slab_start(struct seq_file *m, loff_t *pos) -{ - struct mem_cgroup *memcg = 
mem_cgroup_from_seq(m); - - mutex_lock(&slab_mutex); - return seq_list_start(&memcg->kmem_caches, *pos); -} - -void *memcg_slab_next(struct seq_file *m, void *p, loff_t *pos) -{ - struct mem_cgroup *memcg = mem_cgroup_from_seq(m); - - return seq_list_next(p, &memcg->kmem_caches, pos); -} - -void memcg_slab_stop(struct seq_file *m, void *p) -{ - mutex_unlock(&slab_mutex); -} - int memcg_slab_show(struct seq_file *m, void *p) { - struct kmem_cache *s = list_entry(p, struct kmem_cache, - memcg_params.kmem_caches_node); - struct mem_cgroup *memcg = mem_cgroup_from_seq(m); - - if (p == memcg->kmem_caches.next) - print_slabinfo_header(m); - cache_show(s, m); + /* + * Deprecated. + * Please, take a look at tools/cgroup/slabinfo.py . + */ return 0; } #endif _
From: Roman Gushchin <guro@fb.com> Subject: mm: memcg/slab: move memcg_kmem_bypass() to memcontrol.h To make the memcg_kmem_bypass() function available outside of the memcontrol.c, let's move it to memcontrol.h. The function is small and nicely fits into static inline sort of functions. It will be used from the slab code. Link: http://lkml.kernel.org/r/20200623174037.3951353-12-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/memcontrol.h | 12 ++++++++++++ mm/memcontrol.c | 12 ------------ 2 files changed, 12 insertions(+), 12 deletions(-) --- a/include/linux/memcontrol.h~mm-memcg-slab-move-memcg_kmem_bypass-to-memcontrolh +++ a/include/linux/memcontrol.h @@ -1440,6 +1440,18 @@ static inline bool memcg_kmem_enabled(vo return static_branch_unlikely(&memcg_kmem_enabled_key); } +static inline bool memcg_kmem_bypass(void) +{ + if (in_interrupt()) + return true; + + /* Allow remote memcg charging in kthread contexts. */ + if ((!current->mm || (current->flags & PF_KTHREAD)) && + !current->active_memcg) + return true; + return false; +} + static inline int memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order) { --- a/mm/memcontrol.c~mm-memcg-slab-move-memcg_kmem_bypass-to-memcontrolh +++ a/mm/memcontrol.c @@ -2945,18 +2945,6 @@ static void memcg_schedule_kmem_cache_cr queue_work(memcg_kmem_cache_wq, &cw->work); } -static inline bool memcg_kmem_bypass(void) -{ - if (in_interrupt()) - return true; - - /* Allow remote memcg charging in kthread contexts. 
*/ - if ((!current->mm || (current->flags & PF_KTHREAD)) && - !current->active_memcg) - return true; - return false; -} - /** * memcg_kmem_get_cache: select the correct per-memcg cache for allocation * @cachep: the original global kmem cache _
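The condition moved by this patch is easy to misread, so a small truth-table style model may help. This is a user-space sketch only: `struct alloc_ctx` and `should_bypass()` are hypothetical names that model `in_interrupt()`, `current->mm`, `PF_KTHREAD` and `current->active_memcg` as plain fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the task state the kernel function consults. */
struct alloc_ctx {
	bool in_interrupt;         /* models in_interrupt() */
	bool has_mm;               /* models current->mm != NULL */
	bool is_kthread;           /* models current->flags & PF_KTHREAD */
	const void *active_memcg;  /* models current->active_memcg */
};

/* Same boolean structure as memcg_kmem_bypass() in the diff above. */
static bool should_bypass(const struct alloc_ctx *c)
{
	if (c->in_interrupt)
		return true;

	/* A kthread (or mm-less task) is accounted only with a remote memcg set. */
	if ((!c->has_mm || c->is_kthread) && !c->active_memcg)
		return true;

	return false;
}
```

In this model an ordinary user task is never bypassed outside interrupt context, while a kernel thread is bypassed unless a remote memcg has been set for it.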
From: Roman Gushchin <guro@fb.com>
Subject: mm: memcg/slab: use a single set of kmem_caches for all accounted allocations

This is a fairly big but mostly red patch, which makes all accounted slab allocations use a single set of kmem_caches instead of creating a separate set for each memory cgroup.

Because the number of non-root kmem_caches is now capped by the number of root kmem_caches, there is no need to shrink or destroy them prematurely. They can simply be destroyed together with their root counterparts. This dramatically simplifies the management of non-root kmem_caches and allows deleting a ton of code.

This patch performs the following changes:
1) introduces the memcg_params.memcg_cache pointer to represent the kmem_cache which will be used for all non-root allocations
2) reuses the existing memcg kmem_cache creation mechanism to create the memcg kmem_cache on the first allocation attempt
3) names memcg kmem_caches <kmemcache_name>-memcg, e.g. dentry-memcg
4) simplifies memcg_kmem_get_cache() to just return the memcg kmem_cache or schedule its creation and return the root cache
5) removes almost all non-root kmem_cache management code (separate refcounter, reparenting, shrinking, etc)
6) makes slab debugfs display the root_mem_cgroup css id and never show the :dead and :deact flags in the memcg_slabinfo attribute.

Following patches in the series will simplify the kmem_cache creation.

Link: http://lkml.kernel.org/r/20200623174037.3951353-13-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/memcontrol.h | 5 include/linux/slab.h | 5 mm/memcontrol.c | 165 ++---------- mm/slab.c | 16 - mm/slab.h | 146 +++-------- mm/slab_common.c | 461 +++-------------------------------- mm/slub.c | 38 -- 7 files changed, 136 insertions(+), 700 deletions(-) --- a/include/linux/memcontrol.h~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-accounted-allocations +++ a/include/linux/memcontrol.h @@ -317,7 +317,6 @@ struct mem_cgroup { /* Index in the kmem_cache->memcg_params.memcg_caches array */ int kmemcg_id; enum memcg_kmem_state kmem_state; - struct list_head kmem_caches; struct obj_cgroup __rcu *objcg; struct list_head objcg_list; /* list of inherited objcgs */ #endif @@ -1404,9 +1403,7 @@ static inline void memcg_set_shrinker_bi } #endif -struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep, - struct obj_cgroup **objcgp); -void memcg_kmem_put_cache(struct kmem_cache *cachep); +struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep); #ifdef CONFIG_MEMCG_KMEM int __memcg_kmem_charge(struct mem_cgroup *memcg, gfp_t gfp, --- a/include/linux/slab.h~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-accounted-allocations +++ a/include/linux/slab.h @@ -155,8 +155,7 @@ struct kmem_cache *kmem_cache_create_use void kmem_cache_destroy(struct kmem_cache *); int kmem_cache_shrink(struct kmem_cache *); -void memcg_create_kmem_cache(struct mem_cgroup *, struct kmem_cache *); -void memcg_deactivate_kmem_caches(struct mem_cgroup *, struct mem_cgroup *); +void memcg_create_kmem_cache(struct kmem_cache *cachep); /* * Please use this 
macro to create slab caches. Simply specify the @@ -580,8 +579,6 @@ static __always_inline void *kmalloc_nod return __kmalloc_node(size, flags, node); } -int memcg_update_all_caches(int num_memcgs); - /** * kmalloc_array - allocate memory for an array. * @n: number of elements. --- a/mm/memcontrol.c~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-accounted-allocations +++ a/mm/memcontrol.c @@ -350,7 +350,7 @@ static void memcg_reparent_objcgs(struct } /* - * This will be the memcg's index in each cache's ->memcg_params.memcg_caches. + * This will be used as a shrinker list's index. * The main reason for not using cgroup id for this: * this works better in sparse environments, where we have a lot of memcgs, * but only a few kmem-limited. Or also, if we have, for instance, 200 @@ -569,20 +569,16 @@ ino_t page_cgroup_ino(struct page *page) unsigned long ino = 0; rcu_read_lock(); - if (PageSlab(page) && !PageTail(page)) { - memcg = memcg_from_slab_page(page); - } else { - memcg = page->mem_cgroup; + memcg = page->mem_cgroup; - /* - * The lowest bit set means that memcg isn't a valid - * memcg pointer, but a obj_cgroups pointer. - * In this case the page is shared and doesn't belong - * to any specific memory cgroup. - */ - if ((unsigned long) memcg & 0x1UL) - memcg = NULL; - } + /* + * The lowest bit set means that memcg isn't a valid + * memcg pointer, but a obj_cgroups pointer. + * In this case the page is shared and doesn't belong + * to any specific memory cgroup. + */ + if ((unsigned long) memcg & 0x1UL) + memcg = NULL; while (memcg && !(memcg->css.flags & CSS_ONLINE)) memcg = parent_mem_cgroup(memcg); @@ -2822,12 +2818,18 @@ struct mem_cgroup *mem_cgroup_from_obj(v page = virt_to_head_page(p); /* - * Slab pages don't have page->mem_cgroup set because corresponding - * kmem caches can be reparented during the lifetime. That's why - * memcg_from_slab_page() should be used instead. 
- */ - if (PageSlab(page)) - return memcg_from_slab_page(page); + * Slab objects are accounted individually, not per-page. + * Memcg membership data for each individual object is saved in + * the page->obj_cgroups. + */ + if (page_has_obj_cgroups(page)) { + struct obj_cgroup *objcg; + unsigned int off; + + off = obj_to_index(page->slab_cache, page, p); + objcg = page_obj_cgroups(page)[off]; + return obj_cgroup_memcg(objcg); + } /* All other pages use page->mem_cgroup */ return page->mem_cgroup; @@ -2882,9 +2884,7 @@ static int memcg_alloc_cache_id(void) else if (size > MEMCG_CACHES_MAX_SIZE) size = MEMCG_CACHES_MAX_SIZE; - err = memcg_update_all_caches(size); - if (!err) - err = memcg_update_all_list_lrus(size); + err = memcg_update_all_list_lrus(size); if (!err) memcg_nr_cache_ids = size; @@ -2903,7 +2903,6 @@ static void memcg_free_cache_id(int id) } struct memcg_kmem_cache_create_work { - struct mem_cgroup *memcg; struct kmem_cache *cachep; struct work_struct work; }; @@ -2912,33 +2911,24 @@ static void memcg_kmem_cache_create_func { struct memcg_kmem_cache_create_work *cw = container_of(w, struct memcg_kmem_cache_create_work, work); - struct mem_cgroup *memcg = cw->memcg; struct kmem_cache *cachep = cw->cachep; - memcg_create_kmem_cache(memcg, cachep); + memcg_create_kmem_cache(cachep); - css_put(&memcg->css); kfree(cw); } /* * Enqueue the creation of a per-memcg kmem_cache. 
*/ -static void memcg_schedule_kmem_cache_create(struct mem_cgroup *memcg, - struct kmem_cache *cachep) +static void memcg_schedule_kmem_cache_create(struct kmem_cache *cachep) { struct memcg_kmem_cache_create_work *cw; - if (!css_tryget_online(&memcg->css)) - return; - cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN); - if (!cw) { - css_put(&memcg->css); + if (!cw) return; - } - cw->memcg = memcg; cw->cachep = cachep; INIT_WORK(&cw->work, memcg_kmem_cache_create_func); @@ -2946,102 +2936,26 @@ static void memcg_schedule_kmem_cache_cr } /** - * memcg_kmem_get_cache: select the correct per-memcg cache for allocation + * memcg_kmem_get_cache: select memcg or root cache for allocation * @cachep: the original global kmem cache * * Return the kmem_cache we're supposed to use for a slab allocation. - * We try to use the current memcg's version of the cache. * * If the cache does not exist yet, if we are the first user of it, we * create it asynchronously in a workqueue and let the current allocation * go through with the original cache. - * - * This function takes a reference to the cache it returns to assure it - * won't get destroyed while we are working with it. Once the caller is - * done with it, memcg_kmem_put_cache() must be called to release the - * reference. 
*/ -struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep, - struct obj_cgroup **objcgp) +struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep) { - struct mem_cgroup *memcg; struct kmem_cache *memcg_cachep; - struct memcg_cache_array *arr; - int kmemcg_id; - - VM_BUG_ON(!is_root_cache(cachep)); - if (memcg_kmem_bypass()) + memcg_cachep = READ_ONCE(cachep->memcg_params.memcg_cache); + if (unlikely(!memcg_cachep)) { + memcg_schedule_kmem_cache_create(cachep); return cachep; - - rcu_read_lock(); - - if (unlikely(current->active_memcg)) - memcg = current->active_memcg; - else - memcg = mem_cgroup_from_task(current); - - if (!memcg || memcg == root_mem_cgroup) - goto out_unlock; - - kmemcg_id = READ_ONCE(memcg->kmemcg_id); - if (kmemcg_id < 0) - goto out_unlock; - - arr = rcu_dereference(cachep->memcg_params.memcg_caches); - - /* - * Make sure we will access the up-to-date value. The code updating - * memcg_caches issues a write barrier to match the data dependency - * barrier inside READ_ONCE() (see memcg_create_kmem_cache()). - */ - memcg_cachep = READ_ONCE(arr->entries[kmemcg_id]); - - /* - * If we are in a safe context (can wait, and not in interrupt - * context), we could be be predictable and return right away. - * This would guarantee that the allocation being performed - * already belongs in the new cache. - * - * However, there are some clashes that can arrive from locking. - * For instance, because we acquire the slab_mutex while doing - * memcg_create_kmem_cache, this means no further allocation - * could happen with the slab_mutex held. So it's better to - * defer everything. - * - * If the memcg is dying or memcg_cache is about to be released, - * don't bother creating new kmem_caches. Because memcg_cachep - * is ZEROed as the fist step of kmem offlining, we don't need - * percpu_ref_tryget_live() here. css_tryget_online() check in - * memcg_schedule_kmem_cache_create() will prevent us from - * creation of a new kmem_cache. 
- */ - if (unlikely(!memcg_cachep)) - memcg_schedule_kmem_cache_create(memcg, cachep); - else if (percpu_ref_tryget(&memcg_cachep->memcg_params.refcnt)) { - struct obj_cgroup *objcg = rcu_dereference(memcg->objcg); - - if (!objcg || !obj_cgroup_tryget(objcg)) { - percpu_ref_put(&memcg_cachep->memcg_params.refcnt); - goto out_unlock; - } - - *objcgp = objcg; - cachep = memcg_cachep; } -out_unlock: - rcu_read_unlock(); - return cachep; -} -/** - * memcg_kmem_put_cache: drop reference taken by memcg_kmem_get_cache - * @cachep: the cache returned by memcg_kmem_get_cache - */ -void memcg_kmem_put_cache(struct kmem_cache *cachep) -{ - if (!is_root_cache(cachep)) - percpu_ref_put(&cachep->memcg_params.refcnt); + return memcg_cachep; } /** @@ -3731,7 +3645,6 @@ static int memcg_online_kmem(struct mem_ */ memcg->kmemcg_id = memcg_id; memcg->kmem_state = KMEM_ONLINE; - INIT_LIST_HEAD(&memcg->kmem_caches); return 0; } @@ -3744,22 +3657,13 @@ static void memcg_offline_kmem(struct me if (memcg->kmem_state != KMEM_ONLINE) return; - /* - * Clear the online state before clearing memcg_caches array - * entries. The slab_mutex in memcg_deactivate_kmem_caches() - * guarantees that no cache will be created for this cgroup - * after we are done (see memcg_create_kmem_cache()). - */ + memcg->kmem_state = KMEM_ALLOCATED; parent = parent_mem_cgroup(memcg); if (!parent) parent = root_mem_cgroup; - /* - * Deactivate and reparent kmem_caches and objcgs. 
- */ - memcg_deactivate_kmem_caches(memcg, parent); memcg_reparent_objcgs(memcg, parent); kmemcg_id = memcg->kmemcg_id; @@ -5384,9 +5288,6 @@ mem_cgroup_css_alloc(struct cgroup_subsy /* The following stuff does not apply to the root */ if (!parent) { -#ifdef CONFIG_MEMCG_KMEM - INIT_LIST_HEAD(&memcg->kmem_caches); -#endif root_mem_cgroup = memcg; return &memcg->css; } --- a/mm/slab.c~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-accounted-allocations +++ a/mm/slab.c @@ -1249,7 +1249,7 @@ void __init kmem_cache_init(void) nr_node_ids * sizeof(struct kmem_cache_node *), SLAB_HWCACHE_ALIGN, 0, 0); list_add(&kmem_cache->list, &slab_caches); - memcg_link_cache(kmem_cache, NULL); + memcg_link_cache(kmem_cache); slab_state = PARTIAL; /* @@ -2253,17 +2253,6 @@ int __kmem_cache_shrink(struct kmem_cach return (ret ? 1 : 0); } -#ifdef CONFIG_MEMCG -void __kmemcg_cache_deactivate(struct kmem_cache *cachep) -{ - __kmem_cache_shrink(cachep); -} - -void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s) -{ -} -#endif - int __kmem_cache_shutdown(struct kmem_cache *cachep) { return __kmem_cache_shrink(cachep); @@ -3872,7 +3861,8 @@ static int do_tune_cpucache(struct kmem_ return ret; lockdep_assert_held(&slab_mutex); - for_each_memcg_cache(c, cachep) { + c = memcg_cache(cachep); + if (c) { /* return value determined by the root cache only */ __do_tune_cpucache(c, limit, batchcount, shared, gfp); } --- a/mm/slab_common.c~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-accounted-allocations +++ a/mm/slab_common.c @@ -133,141 +133,36 @@ int __kmem_cache_alloc_bulk(struct kmem_ #ifdef CONFIG_MEMCG_KMEM LIST_HEAD(slab_root_caches); -static DEFINE_SPINLOCK(memcg_kmem_wq_lock); - -static void kmemcg_cache_shutdown(struct percpu_ref *percpu_ref); void slab_init_memcg_params(struct kmem_cache *s) { s->memcg_params.root_cache = NULL; - RCU_INIT_POINTER(s->memcg_params.memcg_caches, NULL); - INIT_LIST_HEAD(&s->memcg_params.children); - s->memcg_params.dying = false; -} - 
-static int init_memcg_params(struct kmem_cache *s, - struct kmem_cache *root_cache) -{ - struct memcg_cache_array *arr; - - if (root_cache) { - int ret = percpu_ref_init(&s->memcg_params.refcnt, - kmemcg_cache_shutdown, - 0, GFP_KERNEL); - if (ret) - return ret; - - s->memcg_params.root_cache = root_cache; - INIT_LIST_HEAD(&s->memcg_params.children_node); - INIT_LIST_HEAD(&s->memcg_params.kmem_caches_node); - return 0; - } - - slab_init_memcg_params(s); - - if (!memcg_nr_cache_ids) - return 0; - - arr = kvzalloc(sizeof(struct memcg_cache_array) + - memcg_nr_cache_ids * sizeof(void *), - GFP_KERNEL); - if (!arr) - return -ENOMEM; - - RCU_INIT_POINTER(s->memcg_params.memcg_caches, arr); - return 0; -} - -static void destroy_memcg_params(struct kmem_cache *s) -{ - if (is_root_cache(s)) { - kvfree(rcu_access_pointer(s->memcg_params.memcg_caches)); - } else { - mem_cgroup_put(s->memcg_params.memcg); - WRITE_ONCE(s->memcg_params.memcg, NULL); - percpu_ref_exit(&s->memcg_params.refcnt); - } + s->memcg_params.memcg_cache = NULL; } -static void free_memcg_params(struct rcu_head *rcu) +static void init_memcg_params(struct kmem_cache *s, + struct kmem_cache *root_cache) { - struct memcg_cache_array *old; - - old = container_of(rcu, struct memcg_cache_array, rcu); - kvfree(old); -} - -static int update_memcg_params(struct kmem_cache *s, int new_array_size) -{ - struct memcg_cache_array *old, *new; - - new = kvzalloc(sizeof(struct memcg_cache_array) + - new_array_size * sizeof(void *), GFP_KERNEL); - if (!new) - return -ENOMEM; - - old = rcu_dereference_protected(s->memcg_params.memcg_caches, - lockdep_is_held(&slab_mutex)); - if (old) - memcpy(new->entries, old->entries, - memcg_nr_cache_ids * sizeof(void *)); - - rcu_assign_pointer(s->memcg_params.memcg_caches, new); - if (old) - call_rcu(&old->rcu, free_memcg_params); - return 0; -} - -int memcg_update_all_caches(int num_memcgs) -{ - struct kmem_cache *s; - int ret = 0; - - mutex_lock(&slab_mutex); - list_for_each_entry(s, 
&slab_root_caches, root_caches_node) { - ret = update_memcg_params(s, num_memcgs); - /* - * Instead of freeing the memory, we'll just leave the caches - * up to this point in an updated state. - */ - if (ret) - break; - } - mutex_unlock(&slab_mutex); - return ret; + if (root_cache) + s->memcg_params.root_cache = root_cache; + else + slab_init_memcg_params(s); } -void memcg_link_cache(struct kmem_cache *s, struct mem_cgroup *memcg) +void memcg_link_cache(struct kmem_cache *s) { - if (is_root_cache(s)) { + if (is_root_cache(s)) list_add(&s->root_caches_node, &slab_root_caches); - } else { - css_get(&memcg->css); - s->memcg_params.memcg = memcg; - list_add(&s->memcg_params.children_node, - &s->memcg_params.root_cache->memcg_params.children); - list_add(&s->memcg_params.kmem_caches_node, - &s->memcg_params.memcg->kmem_caches); - } } static void memcg_unlink_cache(struct kmem_cache *s) { - if (is_root_cache(s)) { + if (is_root_cache(s)) list_del(&s->root_caches_node); - } else { - list_del(&s->memcg_params.children_node); - list_del(&s->memcg_params.kmem_caches_node); - } } #else -static inline int init_memcg_params(struct kmem_cache *s, - struct kmem_cache *root_cache) -{ - return 0; -} - -static inline void destroy_memcg_params(struct kmem_cache *s) +static inline void init_memcg_params(struct kmem_cache *s, + struct kmem_cache *root_cache) { } @@ -328,14 +223,6 @@ int slab_unmergeable(struct kmem_cache * if (s->refcount < 0) return 1; -#ifdef CONFIG_MEMCG_KMEM - /* - * Skip the dying kmem_cache. 
- */ - if (s->memcg_params.dying) - return 1; -#endif - return 0; } @@ -390,7 +277,7 @@ static struct kmem_cache *create_cache(c unsigned int object_size, unsigned int align, slab_flags_t flags, unsigned int useroffset, unsigned int usersize, void (*ctor)(void *), - struct mem_cgroup *memcg, struct kmem_cache *root_cache) + struct kmem_cache *root_cache) { struct kmem_cache *s; int err; @@ -410,24 +297,20 @@ static struct kmem_cache *create_cache(c s->useroffset = useroffset; s->usersize = usersize; - err = init_memcg_params(s, root_cache); - if (err) - goto out_free_cache; - + init_memcg_params(s, root_cache); err = __kmem_cache_create(s, flags); if (err) goto out_free_cache; s->refcount = 1; list_add(&s->list, &slab_caches); - memcg_link_cache(s, memcg); + memcg_link_cache(s); out: if (err) return ERR_PTR(err); return s; out_free_cache: - destroy_memcg_params(s); kmem_cache_free(kmem_cache, s); goto out; } @@ -514,7 +397,7 @@ kmem_cache_create_usercopy(const char *n s = create_cache(cache_name, size, calculate_alignment(flags, align, size), - flags, useroffset, usersize, ctor, NULL, NULL); + flags, useroffset, usersize, ctor, NULL); if (IS_ERR(s)) { err = PTR_ERR(s); kfree_const(cache_name); @@ -639,51 +522,27 @@ static int shutdown_cache(struct kmem_ca #ifdef CONFIG_MEMCG_KMEM /* - * memcg_create_kmem_cache - Create a cache for a memory cgroup. - * @memcg: The memory cgroup the new cache is for. + * memcg_create_kmem_cache - Create a cache for non-root memory cgroups. * @root_cache: The parent of the new cache. * * This function attempts to create a kmem cache that will serve allocation - * requests going from @memcg to @root_cache. The new cache inherits properties - * from its parent. + * requests going all non-root memory cgroups to @root_cache. The new cache + * inherits properties from its parent. 
*/ -void memcg_create_kmem_cache(struct mem_cgroup *memcg, - struct kmem_cache *root_cache) +void memcg_create_kmem_cache(struct kmem_cache *root_cache) { - static char memcg_name_buf[NAME_MAX + 1]; /* protected by slab_mutex */ - struct cgroup_subsys_state *css = &memcg->css; - struct memcg_cache_array *arr; struct kmem_cache *s = NULL; char *cache_name; - int idx; get_online_cpus(); get_online_mems(); mutex_lock(&slab_mutex); - /* - * The memory cgroup could have been offlined while the cache - * creation work was pending. - */ - if (memcg->kmem_state != KMEM_ONLINE) + if (root_cache->memcg_params.memcg_cache) goto out_unlock; - idx = memcg_cache_id(memcg); - arr = rcu_dereference_protected(root_cache->memcg_params.memcg_caches, - lockdep_is_held(&slab_mutex)); - - /* - * Since per-memcg caches are created asynchronously on first - * allocation (see memcg_kmem_get_cache()), several threads can try to - * create the same cache, but only one of them may succeed. - */ - if (arr->entries[idx]) - goto out_unlock; - - cgroup_name(css->cgroup, memcg_name_buf, sizeof(memcg_name_buf)); - cache_name = kasprintf(GFP_KERNEL, "%s(%llu:%s)", root_cache->name, - css->serial_nr, memcg_name_buf); + cache_name = kasprintf(GFP_KERNEL, "%s-memcg", root_cache->name); if (!cache_name) goto out_unlock; @@ -691,7 +550,7 @@ void memcg_create_kmem_cache(struct mem_ root_cache->align, root_cache->flags & CACHE_CREATE_MASK, root_cache->useroffset, root_cache->usersize, - root_cache->ctor, memcg, root_cache); + root_cache->ctor, root_cache); /* * If we could not create a memcg cache, do not complain, because * that's not critical at all as we can always proceed with the root @@ -708,7 +567,7 @@ void memcg_create_kmem_cache(struct mem_ * initialized. 
*/ smp_wmb(); - arr->entries[idx] = s; + root_cache->memcg_params.memcg_cache = s; out_unlock: mutex_unlock(&slab_mutex); @@ -717,231 +576,40 @@ out_unlock: put_online_cpus(); } -static void kmemcg_workfn(struct work_struct *work) -{ - struct kmem_cache *s = container_of(work, struct kmem_cache, - memcg_params.work); - - get_online_cpus(); - get_online_mems(); - - mutex_lock(&slab_mutex); - s->memcg_params.work_fn(s); - mutex_unlock(&slab_mutex); - - put_online_mems(); - put_online_cpus(); -} - -static void kmemcg_rcufn(struct rcu_head *head) -{ - struct kmem_cache *s = container_of(head, struct kmem_cache, - memcg_params.rcu_head); - - /* - * We need to grab blocking locks. Bounce to ->work. The - * work item shares the space with the RCU head and can't be - * initialized earlier. - */ - INIT_WORK(&s->memcg_params.work, kmemcg_workfn); - queue_work(memcg_kmem_cache_wq, &s->memcg_params.work); -} - -static void kmemcg_cache_shutdown_fn(struct kmem_cache *s) -{ - WARN_ON(shutdown_cache(s)); -} - -static void kmemcg_cache_shutdown(struct percpu_ref *percpu_ref) -{ - struct kmem_cache *s = container_of(percpu_ref, struct kmem_cache, - memcg_params.refcnt); - unsigned long flags; - - spin_lock_irqsave(&memcg_kmem_wq_lock, flags); - if (s->memcg_params.root_cache->memcg_params.dying) - goto unlock; - - s->memcg_params.work_fn = kmemcg_cache_shutdown_fn; - INIT_WORK(&s->memcg_params.work, kmemcg_workfn); - queue_work(memcg_kmem_cache_wq, &s->memcg_params.work); - -unlock: - spin_unlock_irqrestore(&memcg_kmem_wq_lock, flags); -} - -static void kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s) -{ - __kmemcg_cache_deactivate_after_rcu(s); - percpu_ref_kill(&s->memcg_params.refcnt); -} - -static void kmemcg_cache_deactivate(struct kmem_cache *s) -{ - if (WARN_ON_ONCE(is_root_cache(s))) - return; - - __kmemcg_cache_deactivate(s); - s->flags |= SLAB_DEACTIVATED; - - /* - * memcg_kmem_wq_lock is used to synchronize memcg_params.dying - * flag and make sure that no new 
kmem_cache deactivation tasks - * are queued (see flush_memcg_workqueue() ). - */ - spin_lock_irq(&memcg_kmem_wq_lock); - if (s->memcg_params.root_cache->memcg_params.dying) - goto unlock; - - s->memcg_params.work_fn = kmemcg_cache_deactivate_after_rcu; - call_rcu(&s->memcg_params.rcu_head, kmemcg_rcufn); -unlock: - spin_unlock_irq(&memcg_kmem_wq_lock); -} - -void memcg_deactivate_kmem_caches(struct mem_cgroup *memcg, - struct mem_cgroup *parent) -{ - int idx; - struct memcg_cache_array *arr; - struct kmem_cache *s, *c; - unsigned int nr_reparented; - - idx = memcg_cache_id(memcg); - - get_online_cpus(); - get_online_mems(); - - mutex_lock(&slab_mutex); - list_for_each_entry(s, &slab_root_caches, root_caches_node) { - arr = rcu_dereference_protected(s->memcg_params.memcg_caches, - lockdep_is_held(&slab_mutex)); - c = arr->entries[idx]; - if (!c) - continue; - - kmemcg_cache_deactivate(c); - arr->entries[idx] = NULL; - } - nr_reparented = 0; - list_for_each_entry(s, &memcg->kmem_caches, - memcg_params.kmem_caches_node) { - WRITE_ONCE(s->memcg_params.memcg, parent); - css_put(&memcg->css); - nr_reparented++; - } - if (nr_reparented) { - list_splice_init(&memcg->kmem_caches, - &parent->kmem_caches); - css_get_many(&parent->css, nr_reparented); - } - mutex_unlock(&slab_mutex); - - put_online_mems(); - put_online_cpus(); -} - static int shutdown_memcg_caches(struct kmem_cache *s) { - struct memcg_cache_array *arr; - struct kmem_cache *c, *c2; - LIST_HEAD(busy); - int i; - BUG_ON(!is_root_cache(s)); - /* - * First, shutdown active caches, i.e. caches that belong to online - * memory cgroups. - */ - arr = rcu_dereference_protected(s->memcg_params.memcg_caches, - lockdep_is_held(&slab_mutex)); - for_each_memcg_cache_index(i) { - c = arr->entries[i]; - if (!c) - continue; - if (shutdown_cache(c)) - /* - * The cache still has objects. Move it to a temporary - * list so as not to try to destroy it for a second - * time while iterating over inactive caches below. 
- */ - list_move(&c->memcg_params.children_node, &busy); - else - /* - * The cache is empty and will be destroyed soon. Clear - * the pointer to it in the memcg_caches array so that - * it will never be accessed even if the root cache - * stays alive. - */ - arr->entries[i] = NULL; - } - - /* - * Second, shutdown all caches left from memory cgroups that are now - * offline. - */ - list_for_each_entry_safe(c, c2, &s->memcg_params.children, - memcg_params.children_node) - shutdown_cache(c); - - list_splice(&busy, &s->memcg_params.children); + if (s->memcg_params.memcg_cache) + WARN_ON(shutdown_cache(s->memcg_params.memcg_cache)); - /* - * A cache being destroyed must be empty. In particular, this means - * that all per memcg caches attached to it must be empty too. - */ - if (!list_empty(&s->memcg_params.children)) - return -EBUSY; return 0; } -static void memcg_set_kmem_cache_dying(struct kmem_cache *s) -{ - spin_lock_irq(&memcg_kmem_wq_lock); - s->memcg_params.dying = true; - spin_unlock_irq(&memcg_kmem_wq_lock); -} - static void flush_memcg_workqueue(struct kmem_cache *s) { /* - * SLAB and SLUB deactivate the kmem_caches through call_rcu. Make - * sure all registered rcu callbacks have been invoked. - */ - rcu_barrier(); - - /* * SLAB and SLUB create memcg kmem_caches through workqueue and SLUB * deactivates the memcg kmem_caches through workqueue. Make sure all * previous workitems on workqueue are processed. */ if (likely(memcg_kmem_cache_wq)) flush_workqueue(memcg_kmem_cache_wq); - - /* - * If we're racing with children kmem_cache deactivation, it might - * take another rcu grace period to complete their destruction. - * At this moment the corresponding percpu_ref_kill() call should be - * done, but it might take another rcu grace period to complete - * switching to the atomic mode. - * Please, note that we check without grabbing the slab_mutex. It's safe - * because at this moment the children list can't grow. 
- */ - if (!list_empty(&s->memcg_params.children)) - rcu_barrier(); } #else static inline int shutdown_memcg_caches(struct kmem_cache *s) { return 0; } + +static inline void flush_memcg_workqueue(struct kmem_cache *s) +{ +} #endif /* CONFIG_MEMCG_KMEM */ void slab_kmem_cache_release(struct kmem_cache *s) { __kmem_cache_release(s); - destroy_memcg_params(s); kfree_const(s->name); kmem_cache_free(kmem_cache, s); } @@ -953,6 +621,8 @@ void kmem_cache_destroy(struct kmem_cach if (unlikely(!s)) return; + flush_memcg_workqueue(s); + get_online_cpus(); get_online_mems(); @@ -962,22 +632,6 @@ void kmem_cache_destroy(struct kmem_cach if (s->refcount) goto out_unlock; -#ifdef CONFIG_MEMCG_KMEM - memcg_set_kmem_cache_dying(s); - - mutex_unlock(&slab_mutex); - - put_online_mems(); - put_online_cpus(); - - flush_memcg_workqueue(s); - - get_online_cpus(); - get_online_mems(); - - mutex_lock(&slab_mutex); -#endif - err = shutdown_memcg_caches(s); if (!err) err = shutdown_cache(s); @@ -1019,7 +673,7 @@ int kmem_cache_shrink(struct kmem_cache EXPORT_SYMBOL(kmem_cache_shrink); /** - * kmem_cache_shrink_all - shrink a cache and all memcg caches for root cache + * kmem_cache_shrink_all - shrink root and memcg caches * @s: The cache pointer */ void kmem_cache_shrink_all(struct kmem_cache *s) @@ -1036,21 +690,11 @@ void kmem_cache_shrink_all(struct kmem_c kasan_cache_shrink(s); __kmem_cache_shrink(s); - /* - * We have to take the slab_mutex to protect from the memcg list - * modification. - */ - mutex_lock(&slab_mutex); - for_each_memcg_cache(c, s) { - /* - * Don't need to shrink deactivated memcg caches. 
- */ - if (s->flags & SLAB_DEACTIVATED) - continue; + c = memcg_cache(s); + if (c) { kasan_cache_shrink(c); __kmem_cache_shrink(c); } - mutex_unlock(&slab_mutex); put_online_mems(); put_online_cpus(); } @@ -1105,7 +749,7 @@ struct kmem_cache *__init create_kmalloc create_boot_cache(s, name, size, flags, useroffset, usersize); list_add(&s->list, &slab_caches); - memcg_link_cache(s, NULL); + memcg_link_cache(s); s->refcount = 1; return s; } @@ -1483,7 +1127,8 @@ memcg_accumulate_slabinfo(struct kmem_ca if (!is_root_cache(s)) return; - for_each_memcg_cache(c, s) { + c = memcg_cache(s); + if (c) { memset(&sinfo, 0, sizeof(sinfo)); get_slabinfo(c, &sinfo); @@ -1614,7 +1259,7 @@ module_init(slab_proc_init); #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_MEMCG_KMEM) /* - * Display information about kmem caches that have child memcg caches. + * Display information about kmem caches that have memcg cache. */ static int memcg_slabinfo_show(struct seq_file *m, void *unused) { @@ -1626,9 +1271,9 @@ static int memcg_slabinfo_show(struct se seq_puts(m, " <active_slabs> <num_slabs>\n"); list_for_each_entry(s, &slab_root_caches, root_caches_node) { /* - * Skip kmem caches that don't have any memcg children. + * Skip kmem caches that don't have the memcg cache. 
*/ - if (list_empty(&s->memcg_params.children)) + if (!s->memcg_params.memcg_cache) continue; memset(&sinfo, 0, sizeof(sinfo)); @@ -1637,23 +1282,13 @@ static int memcg_slabinfo_show(struct se cache_name(s), sinfo.active_objs, sinfo.num_objs, sinfo.active_slabs, sinfo.num_slabs); - for_each_memcg_cache(c, s) { - struct cgroup_subsys_state *css; - char *status = ""; - - css = &c->memcg_params.memcg->css; - if (!(css->flags & CSS_ONLINE)) - status = ":dead"; - else if (c->flags & SLAB_DEACTIVATED) - status = ":deact"; - - memset(&sinfo, 0, sizeof(sinfo)); - get_slabinfo(c, &sinfo); - seq_printf(m, "%-17s %4d%-6s %6lu %6lu %6lu %6lu\n", - cache_name(c), css->id, status, - sinfo.active_objs, sinfo.num_objs, - sinfo.active_slabs, sinfo.num_slabs); - } + c = s->memcg_params.memcg_cache; + memset(&sinfo, 0, sizeof(sinfo)); + get_slabinfo(c, &sinfo); + seq_printf(m, "%-17s %4d %6lu %6lu %6lu %6lu\n", + cache_name(c), root_mem_cgroup->css.id, + sinfo.active_objs, sinfo.num_objs, + sinfo.active_slabs, sinfo.num_slabs); } mutex_unlock(&slab_mutex); return 0; --- a/mm/slab.h~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-accounted-allocations +++ a/mm/slab.h @@ -32,66 +32,25 @@ struct kmem_cache { #else /* !CONFIG_SLOB */ -struct memcg_cache_array { - struct rcu_head rcu; - struct kmem_cache *entries[0]; -}; - /* * This is the main placeholder for memcg-related information in kmem caches. - * Both the root cache and the child caches will have it. For the root cache, - * this will hold a dynamically allocated array large enough to hold - * information about the currently limited memcgs in the system. To allow the - * array to be accessed without taking any locks, on relocation we free the old - * version only after a grace period. - * - * Root and child caches hold different metadata. + * Both the root cache and the child cache will have it. Some fields are used + * in both cases, other are specific to root caches. * * @root_cache: Common to root and child caches. 
NULL for root, pointer to * the root cache for children. * * The following fields are specific to root caches. * - * @memcg_caches: kmemcg ID indexed table of child caches. This table is - * used to index child cachces during allocation and cleared - * early during shutdown. - * - * @root_caches_node: List node for slab_root_caches list. - * - * @children: List of all child caches. While the child caches are also - * reachable through @memcg_caches, a child cache remains on - * this list until it is actually destroyed. - * - * The following fields are specific to child caches. - * - * @memcg: Pointer to the memcg this cache belongs to. - * - * @children_node: List node for @root_cache->children list. - * - * @kmem_caches_node: List node for @memcg->kmem_caches list. + * @memcg_cache: pointer to memcg kmem cache, used by all non-root memory + * cgroups. + * @root_caches_node: list node for slab_root_caches list. */ struct memcg_cache_params { struct kmem_cache *root_cache; - union { - struct { - struct memcg_cache_array __rcu *memcg_caches; - struct list_head __root_caches_node; - struct list_head children; - bool dying; - }; - struct { - struct mem_cgroup *memcg; - struct list_head children_node; - struct list_head kmem_caches_node; - struct percpu_ref refcnt; - - void (*work_fn)(struct kmem_cache *); - union { - struct rcu_head rcu_head; - struct work_struct work; - }; - }; - }; + + struct kmem_cache *memcg_cache; + struct list_head __root_caches_node; }; #endif /* CONFIG_SLOB */ @@ -236,8 +195,6 @@ bool __kmem_cache_empty(struct kmem_cach int __kmem_cache_shutdown(struct kmem_cache *); void __kmem_cache_release(struct kmem_cache *); int __kmem_cache_shrink(struct kmem_cache *); -void __kmemcg_cache_deactivate(struct kmem_cache *s); -void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s); void slab_kmem_cache_release(struct kmem_cache *); void kmem_cache_shrink_all(struct kmem_cache *s); @@ -311,14 +268,6 @@ static inline bool kmem_cache_debug_flag extern 
struct list_head slab_root_caches; #define root_caches_node memcg_params.__root_caches_node -/* - * Iterate over all memcg caches of the given root cache. The caller must hold - * slab_mutex. - */ -#define for_each_memcg_cache(iter, root) \ - list_for_each_entry(iter, &(root)->memcg_params.children, \ - memcg_params.children_node) - static inline bool is_root_cache(struct kmem_cache *s) { return !s->memcg_params.root_cache; @@ -349,6 +298,13 @@ static inline struct kmem_cache *memcg_r return s->memcg_params.root_cache; } +static inline struct kmem_cache *memcg_cache(struct kmem_cache *s) +{ + if (is_root_cache(s)) + return s->memcg_params.memcg_cache; + return NULL; +} + static inline struct obj_cgroup **page_obj_cgroups(struct page *page) { /* @@ -361,25 +317,9 @@ static inline struct obj_cgroup **page_o ((unsigned long)page->obj_cgroups & ~0x1UL); } -/* - * Expects a pointer to a slab page. Please note, that PageSlab() check - * isn't sufficient, as it returns true also for tail compound slab pages, - * which do not have slab_cache pointer set. - * So this function assumes that the page can pass PageSlab() && !PageTail() - * check. - * - * The kmem_cache can be reparented asynchronously. The caller must ensure - * the memcg lifetime, e.g. by taking rcu_read_lock() or cgroup_mutex. 
- */ -static inline struct mem_cgroup *memcg_from_slab_page(struct page *page) +static inline bool page_has_obj_cgroups(struct page *page) { - struct kmem_cache *s; - - s = READ_ONCE(page->slab_cache); - if (s && !is_root_cache(s)) - return READ_ONCE(s->memcg_params.memcg); - - return NULL; + return ((unsigned long)page->obj_cgroups & 0x1UL); } static inline int memcg_alloc_page_obj_cgroups(struct page *page, @@ -418,17 +358,25 @@ static inline struct kmem_cache *memcg_s size_t objects, gfp_t flags) { struct kmem_cache *cachep; + struct obj_cgroup *objcg; + + if (memcg_kmem_bypass()) + return s; - cachep = memcg_kmem_get_cache(s, objcgp); + cachep = memcg_kmem_get_cache(s); if (is_root_cache(cachep)) return s; - if (obj_cgroup_charge(*objcgp, flags, objects * obj_full_size(s))) { - obj_cgroup_put(*objcgp); - memcg_kmem_put_cache(cachep); + objcg = get_obj_cgroup_from_current(); + if (!objcg) + return s; + + if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s))) { + obj_cgroup_put(objcg); cachep = NULL; } + *objcgp = objcg; return cachep; } @@ -467,7 +415,6 @@ static inline void memcg_slab_post_alloc } } obj_cgroup_put(objcg); - memcg_kmem_put_cache(s); } static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page, @@ -491,7 +438,7 @@ static inline void memcg_slab_free_hook( } extern void slab_init_memcg_params(struct kmem_cache *); -extern void memcg_link_cache(struct kmem_cache *s, struct mem_cgroup *memcg); +extern void memcg_link_cache(struct kmem_cache *s); #else /* CONFIG_MEMCG_KMEM */ @@ -499,9 +446,6 @@ extern void memcg_link_cache(struct kmem #define slab_root_caches slab_caches #define root_caches_node list -#define for_each_memcg_cache(iter, root) \ - for ((void)(iter), (void)(root); 0; ) - static inline bool is_root_cache(struct kmem_cache *s) { return true; @@ -523,7 +467,17 @@ static inline struct kmem_cache *memcg_r return s; } -static inline struct mem_cgroup *memcg_from_slab_page(struct page *page) +static inline struct 
kmem_cache *memcg_cache(struct kmem_cache *s) +{ + return NULL; +} + +static inline bool page_has_obj_cgroups(struct page *page) +{ + return false; +} + +static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr) { return NULL; } @@ -560,8 +514,7 @@ static inline void slab_init_memcg_param { } -static inline void memcg_link_cache(struct kmem_cache *s, - struct mem_cgroup *memcg) +static inline void memcg_link_cache(struct kmem_cache *s) { } @@ -582,17 +535,14 @@ static __always_inline int charge_slab_p gfp_t gfp, int order, struct kmem_cache *s) { -#ifdef CONFIG_MEMCG_KMEM if (memcg_kmem_enabled() && !is_root_cache(s)) { int ret; ret = memcg_alloc_page_obj_cgroups(page, s, gfp); if (ret) return ret; - - percpu_ref_get_many(&s->memcg_params.refcnt, 1 << order); } -#endif + mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), PAGE_SIZE << order); return 0; @@ -601,12 +551,9 @@ static __always_inline int charge_slab_p static __always_inline void uncharge_slab_page(struct page *page, int order, struct kmem_cache *s) { -#ifdef CONFIG_MEMCG_KMEM - if (memcg_kmem_enabled() && !is_root_cache(s)) { + if (memcg_kmem_enabled() && !is_root_cache(s)) memcg_free_page_obj_cgroups(page); - percpu_ref_put_many(&s->memcg_params.refcnt, 1 << order); - } -#endif + mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), -(PAGE_SIZE << order)); } @@ -749,9 +696,6 @@ static inline struct kmem_cache_node *ge void *slab_start(struct seq_file *m, loff_t *pos); void *slab_next(struct seq_file *m, void *p, loff_t *pos); void slab_stop(struct seq_file *m, void *p); -void *memcg_slab_start(struct seq_file *m, loff_t *pos); -void *memcg_slab_next(struct seq_file *m, void *p, loff_t *pos); -void memcg_slab_stop(struct seq_file *m, void *p); int memcg_slab_show(struct seq_file *m, void *p); #if defined(CONFIG_SLAB) || defined(CONFIG_SLUB_DEBUG) --- a/mm/slub.c~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-accounted-allocations +++ a/mm/slub.c @@ -4204,36 +4204,6 @@ int 
__kmem_cache_shrink(struct kmem_cach return ret; } -#ifdef CONFIG_MEMCG -void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s) -{ - /* - * Called with all the locks held after a sched RCU grace period. - * Even if @s becomes empty after shrinking, we can't know that @s - * doesn't have allocations already in-flight and thus can't - * destroy @s until the associated memcg is released. - * - * However, let's remove the sysfs files for empty caches here. - * Each cache has a lot of interface files which aren't - * particularly useful for empty draining caches; otherwise, we can - * easily end up with millions of unnecessary sysfs files on - * systems which have a lot of memory and transient cgroups. - */ - if (!__kmem_cache_shrink(s)) - sysfs_slab_remove(s); -} - -void __kmemcg_cache_deactivate(struct kmem_cache *s) -{ - /* - * Disable empty slabs caching. Used to avoid pinning offline - * memory cgroups by kmem pages that can be freed. - */ - slub_set_cpu_partial(s, 0); - s->min_partial = 0; -} -#endif /* CONFIG_MEMCG */ - static int slab_mem_going_offline_callback(void *arg) { struct kmem_cache *s; @@ -4390,7 +4360,7 @@ static struct kmem_cache * __init bootst } slab_init_memcg_params(s); list_add(&s->list, &slab_caches); - memcg_link_cache(s, NULL); + memcg_link_cache(s); return s; } @@ -4458,7 +4428,8 @@ __kmem_cache_alias(const char *name, uns s->object_size = max(s->object_size, size); s->inuse = max(s->inuse, ALIGN(size, sizeof(void *))); - for_each_memcg_cache(c, s) { + c = memcg_cache(s); + if (c) { c->object_size = s->object_size; c->inuse = max(c->inuse, ALIGN(size, sizeof(void *))); } @@ -5591,7 +5562,8 @@ static ssize_t slab_attr_store(struct ko * directly either failed or succeeded, in which case we loop * through the descendants with best-effort propagation. */ - for_each_memcg_cache(c, s) + c = memcg_cache(s); + if (c) attribute->store(c, buf, len); mutex_unlock(&slab_mutex); } _
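
Note on the patch above: after this change each root kmem_cache carries at most one memcg child, reached through memcg_params.memcg_cache. The lookup in memcg_kmem_get_cache() falls back to the root cache and requests asynchronous creation when the child does not exist yet. The following is a minimal userspace sketch of that lookup-and-publish pattern, not kernel code. It assumes C11 atomics as a conservative stand-in for the kernel's READ_ONCE()/smp_wmb() pairing, and struct cache, schedule_create(), get_cache() and publish_child() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct kmem_cache; not kernel code. */
struct cache {
	const char *name;
	struct cache *root_cache;            /* NULL for a root cache */
	_Atomic(struct cache *) memcg_cache; /* single shared child, or NULL */
	int create_requests;                 /* counts deferred-creation requests */
};

/*
 * Stand-in for memcg_schedule_kmem_cache_create(): the kernel queues a
 * work item here; this sketch only records that a request was made.
 */
static void schedule_create(struct cache *root)
{
	root->create_requests++;
}

/*
 * Models memcg_kmem_get_cache() after the patch: return the shared child
 * if it has been published, otherwise request its creation and let the
 * current allocation proceed with the root cache.
 */
static struct cache *get_cache(struct cache *root)
{
	struct cache *child;

	/* Acquire load: assumed analogue of READ_ONCE() on the pointer. */
	child = atomic_load_explicit(&root->memcg_cache, memory_order_acquire);
	if (!child) {
		schedule_create(root);
		return root;
	}
	return child;
}

/*
 * Models the final step of memcg_create_kmem_cache(): fully initialize
 * the child first, then publish the pointer so that readers never see a
 * half-initialized cache.
 */
static void publish_child(struct cache *root, struct cache *child)
{
	assert(root->root_cache == NULL); /* only root caches own a child */

	child->root_cache = root;
	/* Release store: assumed analogue of smp_wmb() plus plain store. */
	atomic_store_explicit(&root->memcg_cache, child, memory_order_release);
}
```

In this model the first lookup returns the root cache and records one creation request. Once the child is published, later lookups return it without scheduling anything further.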
From: Roman Gushchin <guro@fb.com> Subject: mm: memcg/slab: simplify memcg cache creation Because the number of non-root kmem_caches doesn't depend on the number of memory cgroups anymore and is generally not very big, there is no more need for a dedicated workqueue. Also, as there is no more need to pass any arguments to the memcg_create_kmem_cache() except the root kmem_cache, it's possible to just embed the work structure into the kmem_cache and avoid the dynamic allocation of the work structure. This will also simplify the synchronization: for each root kmem_cache there is only one work. So there will be no more concurrent attempts to create a non-root kmem_cache for a root kmem_cache: the second and all following attempts to queue the work will fail. On the kmem_cache destruction path there is no more need to call the expensive flush_workqueue() and wait for all pending works to be finished. Instead, cancel_work_sync() can be used to cancel/wait for only one work. Link: http://lkml.kernel.org/r/20200623174037.3951353-14-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/memcontrol.h | 1 mm/memcontrol.c | 48 ----------------------------------- mm/slab.h | 2 + mm/slab_common.c | 22 ++++++++-------- 4 files changed, 15 insertions(+), 58 deletions(-) --- a/include/linux/memcontrol.h~mm-memcg-slab-simplify-memcg-cache-creation +++ a/include/linux/memcontrol.h @@ -1418,7 +1418,6 @@ int obj_cgroup_charge(struct obj_cgroup void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size); extern struct static_key_false memcg_kmem_enabled_key; -extern struct workqueue_struct *memcg_kmem_cache_wq; extern int memcg_nr_cache_ids; void memcg_get_cache_ids(void); 
--- a/mm/memcontrol.c~mm-memcg-slab-simplify-memcg-cache-creation +++ a/mm/memcontrol.c @@ -399,8 +399,6 @@ void memcg_put_cache_ids(void) */ DEFINE_STATIC_KEY_FALSE(memcg_kmem_enabled_key); EXPORT_SYMBOL(memcg_kmem_enabled_key); - -struct workqueue_struct *memcg_kmem_cache_wq; #endif static int memcg_shrinker_map_size; @@ -2902,39 +2900,6 @@ static void memcg_free_cache_id(int id) ida_simple_remove(&memcg_cache_ida, id); } -struct memcg_kmem_cache_create_work { - struct kmem_cache *cachep; - struct work_struct work; -}; - -static void memcg_kmem_cache_create_func(struct work_struct *w) -{ - struct memcg_kmem_cache_create_work *cw = - container_of(w, struct memcg_kmem_cache_create_work, work); - struct kmem_cache *cachep = cw->cachep; - - memcg_create_kmem_cache(cachep); - - kfree(cw); -} - -/* - * Enqueue the creation of a per-memcg kmem_cache. - */ -static void memcg_schedule_kmem_cache_create(struct kmem_cache *cachep) -{ - struct memcg_kmem_cache_create_work *cw; - - cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN); - if (!cw) - return; - - cw->cachep = cachep; - INIT_WORK(&cw->work, memcg_kmem_cache_create_func); - - queue_work(memcg_kmem_cache_wq, &cw->work); -} - /** * memcg_kmem_get_cache: select memcg or root cache for allocation * @cachep: the original global kmem cache @@ -2951,7 +2916,7 @@ struct kmem_cache *memcg_kmem_get_cache( memcg_cachep = READ_ONCE(cachep->memcg_params.memcg_cache); if (unlikely(!memcg_cachep)) { - memcg_schedule_kmem_cache_create(cachep); + queue_work(system_wq, &cachep->memcg_params.work); return cachep; } @@ -7022,17 +6987,6 @@ static int __init mem_cgroup_init(void) { int cpu, node; -#ifdef CONFIG_MEMCG_KMEM - /* - * Kmem cache creation is mostly done with the slab_mutex held, - * so use a workqueue with limited concurrency to avoid stalling - * all worker threads in case lots of cgroups are created and - * destroyed simultaneously. 
- */ - memcg_kmem_cache_wq = alloc_workqueue("memcg_kmem_cache", 0, 1); - BUG_ON(!memcg_kmem_cache_wq); -#endif - cpuhp_setup_state_nocalls(CPUHP_MM_MEMCQ_DEAD, "mm/memctrl:dead", NULL, memcg_hotplug_cpu_dead); --- a/mm/slab_common.c~mm-memcg-slab-simplify-memcg-cache-creation +++ a/mm/slab_common.c @@ -134,10 +134,18 @@ int __kmem_cache_alloc_bulk(struct kmem_ LIST_HEAD(slab_root_caches); +static void memcg_kmem_cache_create_func(struct work_struct *work) +{ + struct kmem_cache *cachep = container_of(work, struct kmem_cache, + memcg_params.work); + memcg_create_kmem_cache(cachep); +} + void slab_init_memcg_params(struct kmem_cache *s) { s->memcg_params.root_cache = NULL; s->memcg_params.memcg_cache = NULL; + INIT_WORK(&s->memcg_params.work, memcg_kmem_cache_create_func); } static void init_memcg_params(struct kmem_cache *s, @@ -586,15 +594,9 @@ static int shutdown_memcg_caches(struct return 0; } -static void flush_memcg_workqueue(struct kmem_cache *s) +static void cancel_memcg_cache_creation(struct kmem_cache *s) { - /* - * SLAB and SLUB create memcg kmem_caches through workqueue and SLUB - * deactivates the memcg kmem_caches through workqueue. Make sure all - * previous workitems on workqueue are processed. 
- */ - if (likely(memcg_kmem_cache_wq)) - flush_workqueue(memcg_kmem_cache_wq); + cancel_work_sync(&s->memcg_params.work); } #else static inline int shutdown_memcg_caches(struct kmem_cache *s) @@ -602,7 +604,7 @@ static inline int shutdown_memcg_caches( return 0; } -static inline void flush_memcg_workqueue(struct kmem_cache *s) +static inline void cancel_memcg_cache_creation(struct kmem_cache *s) { } #endif /* CONFIG_MEMCG_KMEM */ @@ -621,7 +623,7 @@ void kmem_cache_destroy(struct kmem_cach if (unlikely(!s)) return; - flush_memcg_workqueue(s); + cancel_memcg_cache_creation(s); get_online_cpus(); get_online_mems(); --- a/mm/slab.h~mm-memcg-slab-simplify-memcg-cache-creation +++ a/mm/slab.h @@ -45,12 +45,14 @@ struct kmem_cache { * @memcg_cache: pointer to memcg kmem cache, used by all non-root memory * cgroups. * @root_caches_node: list node for slab_root_caches list. + * @work: work struct used to create the non-root cache. */ struct memcg_cache_params { struct kmem_cache *root_cache; struct kmem_cache *memcg_cache; struct list_head __root_caches_node; + struct work_struct work; }; #endif /* CONFIG_SLOB */ _
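The changelog above describes embedding the work item in the cache so that each root cache has at most one pending creation request, and cancelling just that item on destruction. A minimal userspace sketch of that pattern follows. Every `toy_*` name is hypothetical and only models the behaviour described for `queue_work()`, `container_of()` and `cancel_work_sync()`; it is not kernel code and there is no real worker thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified model of a work item. */
struct toy_work {
	bool pending;
	void (*func)(struct toy_work *work);
};

/* Hypothetical root cache with its single work item embedded. */
struct toy_cache {
	const char *name;
	int clones_created;
	struct toy_work work;
};

/* Callback: recover the owning cache from the embedded work item. */
static void toy_create_func(struct toy_work *work)
{
	struct toy_cache *cache = toy_container_of(work, struct toy_cache, work);

	cache->clones_created++;
}

static void toy_cache_init(struct toy_cache *cache, const char *name)
{
	cache->name = name;
	cache->clones_created = 0;
	cache->work.pending = false;
	cache->work.func = toy_create_func;
}

/* Models queue_work(): refuses an item that is already pending. */
static bool toy_queue_work(struct toy_work *work)
{
	if (work->pending)
		return false;
	work->pending = true;
	return true;
}

/* Models a worker picking up the item; does nothing if it is not pending. */
static void toy_run_work(struct toy_work *work)
{
	if (!work->pending)
		return;
	assert(work->func != NULL);
	work->pending = false;
	work->func(work);
}

/* Models cancelling one pending item; returns whether it was pending. */
static bool toy_cancel_work(struct toy_work *work)
{
	bool was_pending = work->pending;

	work->pending = false;
	return was_pending;
}
```

Because the item lives inside the cache, no allocation is needed to schedule it, and a second `toy_queue_work()` call on the same cache is rejected while the first is still pending. That is the property the changelog relies on to rule out concurrent creation attempts.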
From: Roman Gushchin <guro@fb.com> Subject: mm: memcg/slab: remove memcg_kmem_get_cache() The memcg_kmem_get_cache() function has become trivial, so let's just inline it into its single call site: memcg_slab_pre_alloc_hook(). This makes the code less bulky and may also help the compiler generate better code. Link: http://lkml.kernel.org/r/20200623174037.3951353-15-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/memcontrol.h | 2 -- mm/memcontrol.c | 25 +------------------------ mm/slab.h | 11 +++++++++-- mm/slab_common.c | 2 +- 4 files changed, 11 insertions(+), 29 deletions(-) --- a/include/linux/memcontrol.h~mm-memcg-slab-remove-memcg_kmem_get_cache +++ a/include/linux/memcontrol.h @@ -1403,8 +1403,6 @@ static inline void memcg_set_shrinker_bi } #endif -struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep); - #ifdef CONFIG_MEMCG_KMEM int __memcg_kmem_charge(struct mem_cgroup *memcg, gfp_t gfp, unsigned int nr_pages); --- a/mm/memcontrol.c~mm-memcg-slab-remove-memcg_kmem_get_cache +++ a/mm/memcontrol.c @@ -393,7 +393,7 @@ void memcg_put_cache_ids(void) /* * A lot of the calls to the cache allocation functions are expected to be - * inlined by the compiler. Since the calls to memcg_kmem_get_cache are + * inlined by the compiler.
Since the calls to memcg_slab_pre_alloc_hook() are * conditional to this static branch, we'll have to allow modules that does * kmem_cache_alloc and the such to see this symbol as well */ @@ -2901,29 +2901,6 @@ static void memcg_free_cache_id(int id) } /** - * memcg_kmem_get_cache: select memcg or root cache for allocation - * @cachep: the original global kmem cache - * - * Return the kmem_cache we're supposed to use for a slab allocation. - * - * If the cache does not exist yet, if we are the first user of it, we - * create it asynchronously in a workqueue and let the current allocation - * go through with the original cache. - */ -struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep) -{ - struct kmem_cache *memcg_cachep; - - memcg_cachep = READ_ONCE(cachep->memcg_params.memcg_cache); - if (unlikely(!memcg_cachep)) { - queue_work(system_wq, &cachep->memcg_params.work); - return cachep; - } - - return memcg_cachep; -} - -/** * __memcg_kmem_charge: charge a number of kernel pages to a memcg * @memcg: memory cgroup to charge * @gfp: reclaim mode --- a/mm/slab_common.c~mm-memcg-slab-remove-memcg_kmem_get_cache +++ a/mm/slab_common.c @@ -570,7 +570,7 @@ void memcg_create_kmem_cache(struct kmem } /* - * Since readers won't lock (see memcg_kmem_get_cache()), we need a + * Since readers won't lock (see memcg_slab_pre_alloc_hook()), we need a * barrier here to ensure nobody will see the kmem_cache partially * initialized. */ --- a/mm/slab.h~mm-memcg-slab-remove-memcg_kmem_get_cache +++ a/mm/slab.h @@ -365,9 +365,16 @@ static inline struct kmem_cache *memcg_s if (memcg_kmem_bypass()) return s; - cachep = memcg_kmem_get_cache(s); - if (is_root_cache(cachep)) + cachep = READ_ONCE(s->memcg_params.memcg_cache); + if (unlikely(!cachep)) { + /* + * If memcg cache does not exist yet, we schedule it's + * asynchronous creation and let the current allocation + * go through with the root cache. 
+ */ + queue_work(system_wq, &s->memcg_params.work); return s; + } objcg = get_obj_cgroup_from_current(); if (!objcg) _
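The inlined fast path above reads the clone pointer without taking a lock, and falls back to the root cache while requesting asynchronous creation if the clone does not exist yet. The sketch below models that lookup in userspace. The `toy_*` names are hypothetical. C11 acquire/release atomics are used as a stand-in for the kernel's `READ_ONCE()` on the reader side and the `smp_wmb()` before publication on the writer side; that substitution is an assumption made for illustration, not a statement about how the kernel primitives are implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical root cache holding a lazily published clone pointer. */
struct toy_cache {
	const char *name;
	_Atomic(struct toy_cache *) memcg_cache;	/* NULL until published */
	bool creation_requested;	/* stands in for queue_work() */
};

/*
 * Lock-free lookup: use the clone if it has been published, otherwise
 * request its creation and let this allocation use the root cache.
 */
static struct toy_cache *toy_pre_alloc_hook(struct toy_cache *s)
{
	struct toy_cache *clone;

	assert(s != NULL);
	clone = atomic_load_explicit(&s->memcg_cache, memory_order_acquire);
	if (!clone) {
		s->creation_requested = true;
		return s;
	}
	return clone;
}

/*
 * Publication: the release store ensures a reader that observes the
 * pointer also observes the fully initialized clone.
 */
static void toy_publish_clone(struct toy_cache *root, struct toy_cache *clone)
{
	atomic_store_explicit(&root->memcg_cache, clone, memory_order_release);
}
```

The first caller for a given root cache therefore never blocks: it gets the root cache back immediately, and later callers pick up the clone once it has been published.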
From: Roman Gushchin <guro@fb.com> Subject: mm: memcg/slab: deprecate slab_root_caches Currently there are two lists of kmem_caches: 1) slab_caches, which contains all kmem_caches, 2) slab_root_caches, which contains only root kmem_caches. And there is some preprocessor magic to have a single list if CONFIG_MEMCG_KMEM isn't enabled. This was required earlier because the number of non-root kmem_caches was proportional to the number of memory cgroups and could reach very large values. Now that it cannot exceed the number of root kmem_caches, there is no reason to maintain two lists. We never iterate over the slab_root_caches list on any hot paths, so it's perfectly fine to iterate over slab_caches and filter out non-root kmem_caches. This allows removing a lot of config-dependent code and two pointers from the kmem_cache structure. Link: http://lkml.kernel.org/r/20200623174037.3951353-16-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/slab.c | 1 - mm/slab.h | 17 ----------------- mm/slab_common.c | 37 ++++++++----------------------------- mm/slub.c | 1 - 4 files changed, 8 insertions(+), 48 deletions(-) --- a/mm/slab.c~mm-memcg-slab-deprecate-slab_root_caches +++ a/mm/slab.c @@ -1249,7 +1249,6 @@ void __init kmem_cache_init(void) nr_node_ids * sizeof(struct kmem_cache_node *), SLAB_HWCACHE_ALIGN, 0, 0); list_add(&kmem_cache->list, &slab_caches); - memcg_link_cache(kmem_cache); slab_state = PARTIAL; /* --- a/mm/slab_common.c~mm-memcg-slab-deprecate-slab_root_caches +++ a/mm/slab_common.c @@ -131,9 +131,6 @@ int __kmem_cache_alloc_bulk(struct kmem_ } #ifdef CONFIG_MEMCG_KMEM - -LIST_HEAD(slab_root_caches); - static void memcg_kmem_cache_create_func(struct
work_struct *work) { struct kmem_cache *cachep = container_of(work, struct kmem_cache, @@ -156,27 +153,11 @@ static void init_memcg_params(struct kme else slab_init_memcg_params(s); } - -void memcg_link_cache(struct kmem_cache *s) -{ - if (is_root_cache(s)) - list_add(&s->root_caches_node, &slab_root_caches); -} - -static void memcg_unlink_cache(struct kmem_cache *s) -{ - if (is_root_cache(s)) - list_del(&s->root_caches_node); -} #else static inline void init_memcg_params(struct kmem_cache *s, struct kmem_cache *root_cache) { } - -static inline void memcg_unlink_cache(struct kmem_cache *s) -{ -} #endif /* CONFIG_MEMCG_KMEM */ /* @@ -253,7 +234,7 @@ struct kmem_cache *find_mergeable(unsign if (flags & SLAB_NEVER_MERGE) return NULL; - list_for_each_entry_reverse(s, &slab_root_caches, root_caches_node) { + list_for_each_entry_reverse(s, &slab_caches, list) { if (slab_unmergeable(s)) continue; @@ -312,7 +293,6 @@ static struct kmem_cache *create_cache(c s->refcount = 1; list_add(&s->list, &slab_caches); - memcg_link_cache(s); out: if (err) return ERR_PTR(err); @@ -507,7 +487,6 @@ static int shutdown_cache(struct kmem_ca if (__kmem_cache_shutdown(s) != 0) return -EBUSY; - memcg_unlink_cache(s); list_del(&s->list); if (s->flags & SLAB_TYPESAFE_BY_RCU) { @@ -751,7 +730,6 @@ struct kmem_cache *__init create_kmalloc create_boot_cache(s, name, size, flags, useroffset, usersize); list_add(&s->list, &slab_caches); - memcg_link_cache(s); s->refcount = 1; return s; } @@ -1107,12 +1085,12 @@ static void print_slabinfo_header(struct void *slab_start(struct seq_file *m, loff_t *pos) { mutex_lock(&slab_mutex); - return seq_list_start(&slab_root_caches, *pos); + return seq_list_start(&slab_caches, *pos); } void *slab_next(struct seq_file *m, void *p, loff_t *pos) { - return seq_list_next(p, &slab_root_caches, pos); + return seq_list_next(p, &slab_caches, pos); } void slab_stop(struct seq_file *m, void *p) @@ -1165,11 +1143,12 @@ static void cache_show(struct kmem_cache static int 
slab_show(struct seq_file *m, void *p) { - struct kmem_cache *s = list_entry(p, struct kmem_cache, root_caches_node); + struct kmem_cache *s = list_entry(p, struct kmem_cache, list); - if (p == slab_root_caches.next) + if (p == slab_caches.next) print_slabinfo_header(m); - cache_show(s, m); + if (is_root_cache(s)) + cache_show(s, m); return 0; } @@ -1271,7 +1250,7 @@ static int memcg_slabinfo_show(struct se mutex_lock(&slab_mutex); seq_puts(m, "# <name> <css_id[:dead|deact]> <active_objs> <num_objs>"); seq_puts(m, " <active_slabs> <num_slabs>\n"); - list_for_each_entry(s, &slab_root_caches, root_caches_node) { + list_for_each_entry(s, &slab_caches, list) { /* * Skip kmem caches that don't have the memcg cache. */ --- a/mm/slab.h~mm-memcg-slab-deprecate-slab_root_caches +++ a/mm/slab.h @@ -44,14 +44,12 @@ struct kmem_cache { * * @memcg_cache: pointer to memcg kmem cache, used by all non-root memory * cgroups. - * @root_caches_node: list node for slab_root_caches list. * @work: work struct used to create the non-root cache. */ struct memcg_cache_params { struct kmem_cache *root_cache; struct kmem_cache *memcg_cache; - struct list_head __root_caches_node; struct work_struct work; }; #endif /* CONFIG_SLOB */ @@ -265,11 +263,6 @@ static inline bool kmem_cache_debug_flag } #ifdef CONFIG_MEMCG_KMEM - -/* List of all root caches. */ -extern struct list_head slab_root_caches; -#define root_caches_node memcg_params.__root_caches_node - static inline bool is_root_cache(struct kmem_cache *s) { return !s->memcg_params.root_cache; @@ -447,14 +440,8 @@ static inline void memcg_slab_free_hook( } extern void slab_init_memcg_params(struct kmem_cache *); -extern void memcg_link_cache(struct kmem_cache *s); #else /* CONFIG_MEMCG_KMEM */ - -/* If !memcg, all caches are root. 
*/ -#define slab_root_caches slab_caches -#define root_caches_node list - static inline bool is_root_cache(struct kmem_cache *s) { return true; @@ -523,10 +510,6 @@ static inline void slab_init_memcg_param { } -static inline void memcg_link_cache(struct kmem_cache *s) -{ -} - #endif /* CONFIG_MEMCG_KMEM */ static inline struct kmem_cache *virt_to_cache(const void *obj) --- a/mm/slub.c~mm-memcg-slab-deprecate-slab_root_caches +++ a/mm/slub.c @@ -4360,7 +4360,6 @@ static struct kmem_cache * __init bootst } slab_init_memcg_params(s); list_add(&s->list, &slab_caches); - memcg_link_cache(s); return s; } _
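The patch above replaces the dedicated root-cache list with a walk over the single list of all caches that skips non-root entries, as slab_show() does with its is_root_cache() check. A small userspace sketch of that filtering follows. The `toy_*` names and the singly linked list are hypothetical simplifications of the kernel's `list_head` iteration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache entry on one global singly linked list. */
struct toy_cache {
	const char *name;
	struct toy_cache *root_cache;	/* NULL for a root cache */
	struct toy_cache *next;
};

/* Mirrors the idea of is_root_cache(): a root has no parent pointer. */
static bool toy_is_root_cache(const struct toy_cache *s)
{
	assert(s != NULL);
	return s->root_cache == NULL;
}

/* Walk the single list and count only root caches, skipping clones. */
static size_t toy_count_root_caches(const struct toy_cache *head)
{
	const struct toy_cache *s;
	size_t n = 0;

	for (s = head; s; s = s->next) {
		if (!toy_is_root_cache(s))
			continue;
		n++;
	}
	return n;
}
```

The filter costs one pointer comparison per entry, which is acceptable because, as the changelog notes, these walks are not on any hot path.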
From: Roman Gushchin <guro@fb.com> Subject: mm: memcg/slab: remove redundant check in memcg_accumulate_slabinfo() memcg_accumulate_slabinfo() is never called with a non-root kmem_cache as a first argument, so the is_root_cache(s) check is redundant and can be removed without any functional change. Link: http://lkml.kernel.org/r/20200623174037.3951353-17-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/slab_common.c | 3 --- 1 file changed, 3 deletions(-) --- a/mm/slab_common.c~mm-memcg-slab-remove-redundant-check-in-memcg_accumulate_slabinfo +++ a/mm/slab_common.c @@ -1104,9 +1104,6 @@ memcg_accumulate_slabinfo(struct kmem_ca struct kmem_cache *c; struct slabinfo sinfo; - if (!is_root_cache(s)) - return; - c = memcg_cache(s); if (c) { memset(&sinfo, 0, sizeof(sinfo)); _
From: Roman Gushchin <guro@fb.com> Subject: mm: memcg/slab: use a single set of kmem_caches for all allocations Instead of having two sets of kmem_caches, one for system-wide and non-accounted allocations and a second one shared by all accounted allocations, we can use just one. The idea is simple: space for obj_cgroup metadata can be allocated on demand and filled only for accounted allocations. This allows removing a bunch of code that was required to handle kmem_cache clones for accounted allocations. There is no more need to create them, accumulate statistics, propagate attributes, etc. It's quite a significant simplification. Also, because the total number of slab_caches is almost halved (not all kmem_caches have a memcg clone), some additional memory savings are expected. On my devvm it additionally saves about 3.5% of slab memory. [guro@fb.com: fix build on MIPS] Link: http://lkml.kernel.org/r/20200717214810.3733082-1-guro@fb.com Link: http://lkml.kernel.org/r/20200623174037.3951353-18-guro@fb.com Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/slab.h | 2 include/linux/slab_def.h | 3 include/linux/slub_def.h | 10 - mm/memcontrol.c | 25 +++- mm/slab.c | 41 ------ mm/slab.h | 196 ++++++------------------------- mm/slab_common.c | 230 ------------------------------------- mm/slub.c | 163 -------------------------- 8 files changed, 79 insertions(+), 591 deletions(-) --- a/include/linux/slab_def.h~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations +++ a/include/linux/slab_def.h @@ -72,9 +72,6 @@ struct kmem_cache { int obj_offset; #endif /* CONFIG_DEBUG_SLAB */ -#ifdef
CONFIG_MEMCG - struct memcg_cache_params memcg_params; -#endif #ifdef CONFIG_KASAN struct kasan_cache kasan_info; #endif --- a/include/linux/slab.h~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations +++ a/include/linux/slab.h @@ -155,8 +155,6 @@ struct kmem_cache *kmem_cache_create_use void kmem_cache_destroy(struct kmem_cache *); int kmem_cache_shrink(struct kmem_cache *); -void memcg_create_kmem_cache(struct kmem_cache *cachep); - /* * Please use this macro to create slab caches. Simply specify the * name of the structure and maybe some flags that are listed above. --- a/include/linux/slub_def.h~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations +++ a/include/linux/slub_def.h @@ -108,17 +108,7 @@ struct kmem_cache { struct list_head list; /* List of slab caches */ #ifdef CONFIG_SYSFS struct kobject kobj; /* For sysfs */ - struct work_struct kobj_remove_work; #endif -#ifdef CONFIG_MEMCG - struct memcg_cache_params memcg_params; - /* For propagation, maximum size of a stored attr */ - unsigned int max_attr_size; -#ifdef CONFIG_SYSFS - struct kset *memcg_kset; -#endif -#endif - #ifdef CONFIG_SLAB_FREELIST_HARDENED unsigned long random; #endif --- a/mm/memcontrol.c~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations +++ a/mm/memcontrol.c @@ -2800,6 +2800,26 @@ static void commit_charge(struct page *p } #ifdef CONFIG_MEMCG_KMEM +int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s, + gfp_t gfp) +{ + unsigned int objects = objs_per_slab_page(s, page); + void *vec; + + vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp, + page_to_nid(page)); + if (!vec) + return -ENOMEM; + + if (cmpxchg(&page->obj_cgroups, NULL, + (struct obj_cgroup **) ((unsigned long)vec | 0x1UL))) + kfree(vec); + else + kmemleak_not_leak(vec); + + return 0; +} + /* * Returns a pointer to the memory cgroup to which the kernel object is charged. 
* @@ -2826,7 +2846,10 @@ struct mem_cgroup *mem_cgroup_from_obj(v off = obj_to_index(page->slab_cache, page, p); objcg = page_obj_cgroups(page)[off]; - return obj_cgroup_memcg(objcg); + if (objcg) + return obj_cgroup_memcg(objcg); + + return NULL; } /* All other pages use page->mem_cgroup */ --- a/mm/slab.c~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations +++ a/mm/slab.c @@ -1379,11 +1379,7 @@ static struct page *kmem_getpages(struct return NULL; } - if (charge_slab_page(page, flags, cachep->gfporder, cachep)) { - __free_pages(page, cachep->gfporder); - return NULL; - } - + charge_slab_page(page, flags, cachep->gfporder, cachep); __SetPageSlab(page); /* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */ if (sk_memalloc_socks() && page_is_pfmemalloc(page)) @@ -3799,8 +3795,8 @@ fail: } /* Always called with the slab_mutex held */ -static int __do_tune_cpucache(struct kmem_cache *cachep, int limit, - int batchcount, int shared, gfp_t gfp) +static int do_tune_cpucache(struct kmem_cache *cachep, int limit, + int batchcount, int shared, gfp_t gfp) { struct array_cache __percpu *cpu_cache, *prev; int cpu; @@ -3845,30 +3841,6 @@ setup_node: return setup_kmem_cache_nodes(cachep, gfp); } -static int do_tune_cpucache(struct kmem_cache *cachep, int limit, - int batchcount, int shared, gfp_t gfp) -{ - int ret; - struct kmem_cache *c; - - ret = __do_tune_cpucache(cachep, limit, batchcount, shared, gfp); - - if (slab_state < FULL) - return ret; - - if ((ret < 0) || !is_root_cache(cachep)) - return ret; - - lockdep_assert_held(&slab_mutex); - c = memcg_cache(cachep); - if (c) { - /* return value determined by the root cache only */ - __do_tune_cpucache(c, limit, batchcount, shared, gfp); - } - - return ret; -} - /* Called with slab_mutex held always */ static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp) { @@ -3881,13 +3853,6 @@ static int enable_cpucache(struct kmem_c if (err) goto end; - if (!is_root_cache(cachep)) { - struct 
kmem_cache *root = memcg_root_cache(cachep); - limit = root->limit; - shared = root->shared; - batchcount = root->batchcount; - } - if (limit && shared && batchcount) goto skip_setup; /* --- a/mm/slab_common.c~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations +++ a/mm/slab_common.c @@ -130,36 +130,6 @@ int __kmem_cache_alloc_bulk(struct kmem_ return i; } -#ifdef CONFIG_MEMCG_KMEM -static void memcg_kmem_cache_create_func(struct work_struct *work) -{ - struct kmem_cache *cachep = container_of(work, struct kmem_cache, - memcg_params.work); - memcg_create_kmem_cache(cachep); -} - -void slab_init_memcg_params(struct kmem_cache *s) -{ - s->memcg_params.root_cache = NULL; - s->memcg_params.memcg_cache = NULL; - INIT_WORK(&s->memcg_params.work, memcg_kmem_cache_create_func); -} - -static void init_memcg_params(struct kmem_cache *s, - struct kmem_cache *root_cache) -{ - if (root_cache) - s->memcg_params.root_cache = root_cache; - else - slab_init_memcg_params(s); -} -#else -static inline void init_memcg_params(struct kmem_cache *s, - struct kmem_cache *root_cache) -{ -} -#endif /* CONFIG_MEMCG_KMEM */ - /* * Figure out what the alignment of the objects will be given a set of * flags, a user specified alignment and the size of the objects. 
@@ -197,9 +167,6 @@ int slab_unmergeable(struct kmem_cache * if (slab_nomerge || (s->flags & SLAB_NEVER_MERGE)) return 1; - if (!is_root_cache(s)) - return 1; - if (s->ctor) return 1; @@ -286,7 +253,6 @@ static struct kmem_cache *create_cache(c s->useroffset = useroffset; s->usersize = usersize; - init_memcg_params(s, root_cache); err = __kmem_cache_create(s, flags); if (err) goto out_free_cache; @@ -344,7 +310,6 @@ kmem_cache_create_usercopy(const char *n get_online_cpus(); get_online_mems(); - memcg_get_cache_ids(); mutex_lock(&slab_mutex); @@ -394,7 +359,6 @@ kmem_cache_create_usercopy(const char *n out_unlock: mutex_unlock(&slab_mutex); - memcg_put_cache_ids(); put_online_mems(); put_online_cpus(); @@ -507,87 +471,6 @@ static int shutdown_cache(struct kmem_ca return 0; } -#ifdef CONFIG_MEMCG_KMEM -/* - * memcg_create_kmem_cache - Create a cache for non-root memory cgroups. - * @root_cache: The parent of the new cache. - * - * This function attempts to create a kmem cache that will serve allocation - * requests going all non-root memory cgroups to @root_cache. The new cache - * inherits properties from its parent. - */ -void memcg_create_kmem_cache(struct kmem_cache *root_cache) -{ - struct kmem_cache *s = NULL; - char *cache_name; - - get_online_cpus(); - get_online_mems(); - - mutex_lock(&slab_mutex); - - if (root_cache->memcg_params.memcg_cache) - goto out_unlock; - - cache_name = kasprintf(GFP_KERNEL, "%s-memcg", root_cache->name); - if (!cache_name) - goto out_unlock; - - s = create_cache(cache_name, root_cache->object_size, - root_cache->align, - root_cache->flags & CACHE_CREATE_MASK, - root_cache->useroffset, root_cache->usersize, - root_cache->ctor, root_cache); - /* - * If we could not create a memcg cache, do not complain, because - * that's not critical at all as we can always proceed with the root - * cache. 
- */ - if (IS_ERR(s)) { - kfree(cache_name); - goto out_unlock; - } - - /* - * Since readers won't lock (see memcg_slab_pre_alloc_hook()), we need a - * barrier here to ensure nobody will see the kmem_cache partially - * initialized. - */ - smp_wmb(); - root_cache->memcg_params.memcg_cache = s; - -out_unlock: - mutex_unlock(&slab_mutex); - - put_online_mems(); - put_online_cpus(); -} - -static int shutdown_memcg_caches(struct kmem_cache *s) -{ - BUG_ON(!is_root_cache(s)); - - if (s->memcg_params.memcg_cache) - WARN_ON(shutdown_cache(s->memcg_params.memcg_cache)); - - return 0; -} - -static void cancel_memcg_cache_creation(struct kmem_cache *s) -{ - cancel_work_sync(&s->memcg_params.work); -} -#else -static inline int shutdown_memcg_caches(struct kmem_cache *s) -{ - return 0; -} - -static inline void cancel_memcg_cache_creation(struct kmem_cache *s) -{ -} -#endif /* CONFIG_MEMCG_KMEM */ - void slab_kmem_cache_release(struct kmem_cache *s) { __kmem_cache_release(s); @@ -602,8 +485,6 @@ void kmem_cache_destroy(struct kmem_cach if (unlikely(!s)) return; - cancel_memcg_cache_creation(s); - get_online_cpus(); get_online_mems(); @@ -613,10 +494,7 @@ void kmem_cache_destroy(struct kmem_cach if (s->refcount) goto out_unlock; - err = shutdown_memcg_caches(s); - if (!err) - err = shutdown_cache(s); - + err = shutdown_cache(s); if (err) { pr_err("kmem_cache_destroy %s: Slab cache still has objects\n", s->name); @@ -653,33 +531,6 @@ int kmem_cache_shrink(struct kmem_cache } EXPORT_SYMBOL(kmem_cache_shrink); -/** - * kmem_cache_shrink_all - shrink root and memcg caches - * @s: The cache pointer - */ -void kmem_cache_shrink_all(struct kmem_cache *s) -{ - struct kmem_cache *c; - - if (!IS_ENABLED(CONFIG_MEMCG_KMEM) || !is_root_cache(s)) { - kmem_cache_shrink(s); - return; - } - - get_online_cpus(); - get_online_mems(); - kasan_cache_shrink(s); - __kmem_cache_shrink(s); - - c = memcg_cache(s); - if (c) { - kasan_cache_shrink(c); - __kmem_cache_shrink(c); - } - put_online_mems(); - 
put_online_cpus(); -} - bool slab_is_available(void) { return slab_state >= UP; @@ -708,8 +559,6 @@ void __init create_boot_cache(struct kme s->useroffset = useroffset; s->usersize = usersize; - slab_init_memcg_params(s); - err = __kmem_cache_create(s, flags); if (err) @@ -1098,25 +947,6 @@ void slab_stop(struct seq_file *m, void mutex_unlock(&slab_mutex); } -static void -memcg_accumulate_slabinfo(struct kmem_cache *s, struct slabinfo *info) -{ - struct kmem_cache *c; - struct slabinfo sinfo; - - c = memcg_cache(s); - if (c) { - memset(&sinfo, 0, sizeof(sinfo)); - get_slabinfo(c, &sinfo); - - info->active_slabs += sinfo.active_slabs; - info->num_slabs += sinfo.num_slabs; - info->shared_avail += sinfo.shared_avail; - info->active_objs += sinfo.active_objs; - info->num_objs += sinfo.num_objs; - } -} - static void cache_show(struct kmem_cache *s, struct seq_file *m) { struct slabinfo sinfo; @@ -1124,10 +954,8 @@ static void cache_show(struct kmem_cache memset(&sinfo, 0, sizeof(sinfo)); get_slabinfo(s, &sinfo); - memcg_accumulate_slabinfo(s, &sinfo); - seq_printf(m, "%-17s %6lu %6lu %6u %4u %4d", - cache_name(s), sinfo.active_objs, sinfo.num_objs, s->size, + s->name, sinfo.active_objs, sinfo.num_objs, s->size, sinfo.objects_per_slab, (1 << sinfo.cache_order)); seq_printf(m, " : tunables %4u %4u %4u", @@ -1144,8 +972,7 @@ static int slab_show(struct seq_file *m, if (p == slab_caches.next) print_slabinfo_header(m); - if (is_root_cache(s)) - cache_show(s, m); + cache_show(s, m); return 0; } @@ -1170,13 +997,13 @@ void dump_unreclaimable_slab(void) pr_info("Name Used Total\n"); list_for_each_entry_safe(s, s2, &slab_caches, list) { - if (!is_root_cache(s) || (s->flags & SLAB_RECLAIM_ACCOUNT)) + if (s->flags & SLAB_RECLAIM_ACCOUNT) continue; get_slabinfo(s, &sinfo); if (sinfo.num_objs > 0) - pr_info("%-17s %10luKB %10luKB\n", cache_name(s), + pr_info("%-17s %10luKB %10luKB\n", s->name, (sinfo.active_objs * s->size) / 1024, (sinfo.num_objs * s->size) / 1024); } @@ -1235,53 
+1062,6 @@ static int __init slab_proc_init(void) } module_init(slab_proc_init); -#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_MEMCG_KMEM) -/* - * Display information about kmem caches that have memcg cache. - */ -static int memcg_slabinfo_show(struct seq_file *m, void *unused) -{ - struct kmem_cache *s, *c; - struct slabinfo sinfo; - - mutex_lock(&slab_mutex); - seq_puts(m, "# <name> <css_id[:dead|deact]> <active_objs> <num_objs>"); - seq_puts(m, " <active_slabs> <num_slabs>\n"); - list_for_each_entry(s, &slab_caches, list) { - /* - * Skip kmem caches that don't have the memcg cache. - */ - if (!s->memcg_params.memcg_cache) - continue; - - memset(&sinfo, 0, sizeof(sinfo)); - get_slabinfo(s, &sinfo); - seq_printf(m, "%-17s root %6lu %6lu %6lu %6lu\n", - cache_name(s), sinfo.active_objs, sinfo.num_objs, - sinfo.active_slabs, sinfo.num_slabs); - - c = s->memcg_params.memcg_cache; - memset(&sinfo, 0, sizeof(sinfo)); - get_slabinfo(c, &sinfo); - seq_printf(m, "%-17s %4d %6lu %6lu %6lu %6lu\n", - cache_name(c), root_mem_cgroup->css.id, - sinfo.active_objs, sinfo.num_objs, - sinfo.active_slabs, sinfo.num_slabs); - } - mutex_unlock(&slab_mutex); - return 0; -} -DEFINE_SHOW_ATTRIBUTE(memcg_slabinfo); - -static int __init memcg_slabinfo_init(void) -{ - debugfs_create_file("memcg_slabinfo", S_IFREG | S_IRUGO, - NULL, NULL, &memcg_slabinfo_fops); - return 0; -} - -late_initcall(memcg_slabinfo_init); -#endif /* CONFIG_DEBUG_FS && CONFIG_MEMCG_KMEM */ #endif /* CONFIG_SLAB || CONFIG_SLUB_DEBUG */ static __always_inline void *__do_krealloc(const void *p, size_t new_size, --- a/mm/slab.h~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations +++ a/mm/slab.h @@ -30,28 +30,6 @@ struct kmem_cache { struct list_head list; /* List of all slab caches on the system */ }; -#else /* !CONFIG_SLOB */ - -/* - * This is the main placeholder for memcg-related information in kmem caches. - * Both the root cache and the child cache will have it. 
Some fields are used - * in both cases, other are specific to root caches. - * - * @root_cache: Common to root and child caches. NULL for root, pointer to - * the root cache for children. - * - * The following fields are specific to root caches. - * - * @memcg_cache: pointer to memcg kmem cache, used by all non-root memory - * cgroups. - * @work: work struct used to create the non-root cache. - */ -struct memcg_cache_params { - struct kmem_cache *root_cache; - - struct kmem_cache *memcg_cache; - struct work_struct work; -}; #endif /* CONFIG_SLOB */ #ifdef CONFIG_SLAB @@ -196,7 +174,6 @@ int __kmem_cache_shutdown(struct kmem_ca void __kmem_cache_release(struct kmem_cache *); int __kmem_cache_shrink(struct kmem_cache *); void slab_kmem_cache_release(struct kmem_cache *); -void kmem_cache_shrink_all(struct kmem_cache *s); struct seq_file; struct file; @@ -263,43 +240,6 @@ static inline bool kmem_cache_debug_flag } #ifdef CONFIG_MEMCG_KMEM -static inline bool is_root_cache(struct kmem_cache *s) -{ - return !s->memcg_params.root_cache; -} - -static inline bool slab_equal_or_root(struct kmem_cache *s, - struct kmem_cache *p) -{ - return p == s || p == s->memcg_params.root_cache; -} - -/* - * We use suffixes to the name in memcg because we can't have caches - * created in the system with the same name. 
But when we print them - * locally, better refer to them with the base name - */ -static inline const char *cache_name(struct kmem_cache *s) -{ - if (!is_root_cache(s)) - s = s->memcg_params.root_cache; - return s->name; -} - -static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s) -{ - if (is_root_cache(s)) - return s; - return s->memcg_params.root_cache; -} - -static inline struct kmem_cache *memcg_cache(struct kmem_cache *s) -{ - if (is_root_cache(s)) - return s->memcg_params.memcg_cache; - return NULL; -} - static inline struct obj_cgroup **page_obj_cgroups(struct page *page) { /* @@ -317,21 +257,8 @@ static inline bool page_has_obj_cgroups( return ((unsigned long)page->obj_cgroups & 0x1UL); } -static inline int memcg_alloc_page_obj_cgroups(struct page *page, - struct kmem_cache *s, gfp_t gfp) -{ - unsigned int objects = objs_per_slab_page(s, page); - void *vec; - - vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp, - page_to_nid(page)); - if (!vec) - return -ENOMEM; - - kmemleak_not_leak(vec); - page->obj_cgroups = (struct obj_cgroup **) ((unsigned long)vec | 0x1UL); - return 0; -} +int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s, + gfp_t gfp); static inline void memcg_free_page_obj_cgroups(struct page *page) { @@ -348,38 +275,25 @@ static inline size_t obj_full_size(struc return s->size + sizeof(struct obj_cgroup *); } -static inline struct kmem_cache *memcg_slab_pre_alloc_hook(struct kmem_cache *s, - struct obj_cgroup **objcgp, - size_t objects, gfp_t flags) +static inline struct obj_cgroup *memcg_slab_pre_alloc_hook(struct kmem_cache *s, + size_t objects, + gfp_t flags) { - struct kmem_cache *cachep; struct obj_cgroup *objcg; if (memcg_kmem_bypass()) - return s; - - cachep = READ_ONCE(s->memcg_params.memcg_cache); - if (unlikely(!cachep)) { - /* - * If memcg cache does not exist yet, we schedule it's - * asynchronous creation and let the current allocation - * go through with the root cache. 
- */ - queue_work(system_wq, &s->memcg_params.work); - return s; - } + return NULL; objcg = get_obj_cgroup_from_current(); if (!objcg) - return s; + return NULL; if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s))) { obj_cgroup_put(objcg); - cachep = NULL; + return NULL; } - *objcgp = objcg; - return cachep; + return objcg; } static inline void mod_objcg_state(struct obj_cgroup *objcg, @@ -398,15 +312,27 @@ static inline void mod_objcg_state(struc static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg, - size_t size, void **p) + gfp_t flags, size_t size, + void **p) { struct page *page; unsigned long off; size_t i; + if (!objcg) + return; + + flags &= ~__GFP_ACCOUNT; for (i = 0; i < size; i++) { if (likely(p[i])) { page = virt_to_head_page(p[i]); + + if (!page_has_obj_cgroups(page) && + memcg_alloc_page_obj_cgroups(page, s, flags)) { + obj_cgroup_uncharge(objcg, obj_full_size(s)); + continue; + } + off = obj_to_index(s, page, p[i]); obj_cgroup_get(objcg); page_obj_cgroups(page)[off] = objcg; @@ -425,13 +351,19 @@ static inline void memcg_slab_free_hook( struct obj_cgroup *objcg; unsigned int off; - if (!memcg_kmem_enabled() || is_root_cache(s)) + if (!memcg_kmem_enabled()) + return; + + if (!page_has_obj_cgroups(page)) return; off = obj_to_index(s, page, p); objcg = page_obj_cgroups(page)[off]; page_obj_cgroups(page)[off] = NULL; + if (!objcg) + return; + obj_cgroup_uncharge(objcg, obj_full_size(s)); mod_objcg_state(objcg, page_pgdat(page), cache_vmstat_idx(s), -obj_full_size(s)); @@ -439,35 +371,7 @@ static inline void memcg_slab_free_hook( obj_cgroup_put(objcg); } -extern void slab_init_memcg_params(struct kmem_cache *); - #else /* CONFIG_MEMCG_KMEM */ -static inline bool is_root_cache(struct kmem_cache *s) -{ - return true; -} - -static inline bool slab_equal_or_root(struct kmem_cache *s, - struct kmem_cache *p) -{ - return s == p; -} - -static inline const char *cache_name(struct kmem_cache *s) -{ - return 
s->name; -} - -static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s) -{ - return s; -} - -static inline struct kmem_cache *memcg_cache(struct kmem_cache *s) -{ - return NULL; -} - static inline bool page_has_obj_cgroups(struct page *page) { return false; @@ -488,16 +392,17 @@ static inline void memcg_free_page_obj_c { } -static inline struct kmem_cache *memcg_slab_pre_alloc_hook(struct kmem_cache *s, - struct obj_cgroup **objcgp, - size_t objects, gfp_t flags) +static inline struct obj_cgroup *memcg_slab_pre_alloc_hook(struct kmem_cache *s, + size_t objects, + gfp_t flags) { return NULL; } static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg, - size_t size, void **p) + gfp_t flags, size_t size, + void **p) { } @@ -505,11 +410,6 @@ static inline void memcg_slab_free_hook( void *p) { } - -static inline void slab_init_memcg_params(struct kmem_cache *s) -{ -} - #endif /* CONFIG_MEMCG_KMEM */ static inline struct kmem_cache *virt_to_cache(const void *obj) @@ -523,27 +423,18 @@ static inline struct kmem_cache *virt_to return page->slab_cache; } -static __always_inline int charge_slab_page(struct page *page, - gfp_t gfp, int order, - struct kmem_cache *s) -{ - if (memcg_kmem_enabled() && !is_root_cache(s)) { - int ret; - - ret = memcg_alloc_page_obj_cgroups(page, s, gfp); - if (ret) - return ret; - } - +static __always_inline void charge_slab_page(struct page *page, + gfp_t gfp, int order, + struct kmem_cache *s) +{ mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), PAGE_SIZE << order); - return 0; } static __always_inline void uncharge_slab_page(struct page *page, int order, struct kmem_cache *s) { - if (memcg_kmem_enabled() && !is_root_cache(s)) + if (memcg_kmem_enabled()) memcg_free_page_obj_cgroups(page); mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), @@ -555,12 +446,11 @@ static inline struct kmem_cache *cache_f struct kmem_cache *cachep; if (!IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) && - 
!memcg_kmem_enabled() && !kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS)) return s; cachep = virt_to_cache(x); - if (WARN(cachep && !slab_equal_or_root(cachep, s), + if (WARN(cachep && cachep != s, "%s: Wrong slab cache. %s but object is from %s\n", __func__, s->name, cachep->name)) print_tracking(cachep, x); @@ -613,7 +503,7 @@ static inline struct kmem_cache *slab_pr if (memcg_kmem_enabled() && ((flags & __GFP_ACCOUNT) || (s->flags & SLAB_ACCOUNT))) - return memcg_slab_pre_alloc_hook(s, objcgp, size, flags); + *objcgp = memcg_slab_pre_alloc_hook(s, size, flags); return s; } @@ -632,8 +522,8 @@ static inline void slab_post_alloc_hook( s->flags, flags); } - if (memcg_kmem_enabled() && !is_root_cache(s)) - memcg_slab_post_alloc_hook(s, objcg, size, p); + if (memcg_kmem_enabled()) + memcg_slab_post_alloc_hook(s, objcg, flags, size, p); } #ifndef CONFIG_SLOB --- a/mm/slub.c~mm-memcg-slab-use-a-single-set-of-kmem_caches-for-all-allocations +++ a/mm/slub.c @@ -218,14 +218,10 @@ enum track_item { TRACK_ALLOC, TRACK_FRE #ifdef CONFIG_SYSFS static int sysfs_slab_add(struct kmem_cache *); static int sysfs_slab_alias(struct kmem_cache *, const char *); -static void memcg_propagate_slab_attrs(struct kmem_cache *s); -static void sysfs_slab_remove(struct kmem_cache *s); #else static inline int sysfs_slab_add(struct kmem_cache *s) { return 0; } static inline int sysfs_slab_alias(struct kmem_cache *s, const char *p) { return 0; } -static inline void memcg_propagate_slab_attrs(struct kmem_cache *s) { } -static inline void sysfs_slab_remove(struct kmem_cache *s) { } #endif static inline void stat(const struct kmem_cache *s, enum stat_item si) @@ -1624,10 +1620,8 @@ static inline struct page *alloc_slab_pa else page = __alloc_pages_node(node, flags, order); - if (page && charge_slab_page(page, flags, order, s)) { - __free_pages(page, order); - page = NULL; - } + if (page) + charge_slab_page(page, flags, order, s); return page; } @@ -3920,7 +3914,6 @@ int 
__kmem_cache_shutdown(struct kmem_ca if (n->nr_partial || slabs_node(s, node)) return 1; } - sysfs_slab_remove(s); return 0; } @@ -4358,7 +4351,6 @@ static struct kmem_cache * __init bootst p->slab_cache = s; #endif } - slab_init_memcg_params(s); list_add(&s->list, &slab_caches); return s; } @@ -4414,7 +4406,7 @@ struct kmem_cache * __kmem_cache_alias(const char *name, unsigned int size, unsigned int align, slab_flags_t flags, void (*ctor)(void *)) { - struct kmem_cache *s, *c; + struct kmem_cache *s; s = find_mergeable(size, align, flags, name, ctor); if (s) { @@ -4427,12 +4419,6 @@ __kmem_cache_alias(const char *name, uns s->object_size = max(s->object_size, size); s->inuse = max(s->inuse, ALIGN(size, sizeof(void *))); - c = memcg_cache(s); - if (c) { - c->object_size = s->object_size; - c->inuse = max(c->inuse, ALIGN(size, sizeof(void *))); - } - if (sysfs_slab_alias(s, name)) { s->refcount--; s = NULL; @@ -4454,7 +4440,6 @@ int __kmem_cache_create(struct kmem_cach if (slab_state <= UP) return 0; - memcg_propagate_slab_attrs(s); err = sysfs_slab_add(s); if (err) __kmem_cache_release(s); @@ -5312,7 +5297,7 @@ static ssize_t shrink_store(struct kmem_ const char *buf, size_t length) { if (buf[0] == '1') - kmem_cache_shrink_all(s); + kmem_cache_shrink(s); else return -EINVAL; return length; @@ -5536,99 +5521,9 @@ static ssize_t slab_attr_store(struct ko return -EIO; err = attribute->store(s, buf, len); -#ifdef CONFIG_MEMCG - if (slab_state >= FULL && err >= 0 && is_root_cache(s)) { - struct kmem_cache *c; - - mutex_lock(&slab_mutex); - if (s->max_attr_size < len) - s->max_attr_size = len; - - /* - * This is a best effort propagation, so this function's return - * value will be determined by the parent cache only. This is - * basically because not all attributes will have a well - * defined semantics for rollbacks - most of the actions will - * have permanent effects. 
- * - * Returning the error value of any of the children that fail - * is not 100 % defined, in the sense that users seeing the - * error code won't be able to know anything about the state of - * the cache. - * - * Only returning the error code for the parent cache at least - * has well defined semantics. The cache being written to - * directly either failed or succeeded, in which case we loop - * through the descendants with best-effort propagation. - */ - c = memcg_cache(s); - if (c) - attribute->store(c, buf, len); - mutex_unlock(&slab_mutex); - } -#endif return err; } -static void memcg_propagate_slab_attrs(struct kmem_cache *s) -{ -#ifdef CONFIG_MEMCG - int i; - char *buffer = NULL; - struct kmem_cache *root_cache; - - if (is_root_cache(s)) - return; - - root_cache = s->memcg_params.root_cache; - - /* - * This mean this cache had no attribute written. Therefore, no point - * in copying default values around - */ - if (!root_cache->max_attr_size) - return; - - for (i = 0; i < ARRAY_SIZE(slab_attrs); i++) { - char mbuf[64]; - char *buf; - struct slab_attribute *attr = to_slab_attr(slab_attrs[i]); - ssize_t len; - - if (!attr || !attr->store || !attr->show) - continue; - - /* - * It is really bad that we have to allocate here, so we will - * do it only as a fallback. If we actually allocate, though, - * we can just use the allocated buffer until the end. - * - * Most of the slub attributes will tend to be very small in - * size, but sysfs allows buffers up to a page, so they can - * theoretically happen. 
- */ - if (buffer) - buf = buffer; - else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf) && - !IS_ENABLED(CONFIG_SLUB_STATS)) - buf = mbuf; - else { - buffer = (char *) get_zeroed_page(GFP_KERNEL); - if (WARN_ON(!buffer)) - continue; - buf = buffer; - } - - len = attr->show(root_cache, buf); - if (len > 0) - attr->store(s, buf, len); - } - - if (buffer) - free_page((unsigned long)buffer); -#endif /* CONFIG_MEMCG */ -} - static void kmem_cache_release(struct kobject *k) { slab_kmem_cache_release(to_slab(k)); @@ -5648,10 +5543,6 @@ static struct kset *slab_kset; static inline struct kset *cache_kset(struct kmem_cache *s) { -#ifdef CONFIG_MEMCG - if (!is_root_cache(s)) - return s->memcg_params.root_cache->memcg_kset; -#endif return slab_kset; } @@ -5694,27 +5585,6 @@ static char *create_unique_id(struct kme return name; } -static void sysfs_slab_remove_workfn(struct work_struct *work) -{ - struct kmem_cache *s = - container_of(work, struct kmem_cache, kobj_remove_work); - - if (!s->kobj.state_in_sysfs) - /* - * For a memcg cache, this may be called during - * deactivation and again on shutdown. Remove only once. - * A cache is never shut down before deactivation is - * complete, so no need to worry about synchronization. 
- */ - goto out; - -#ifdef CONFIG_MEMCG - kset_unregister(s->memcg_kset); -#endif -out: - kobject_put(&s->kobj); -} - static int sysfs_slab_add(struct kmem_cache *s) { int err; @@ -5722,8 +5592,6 @@ static int sysfs_slab_add(struct kmem_ca struct kset *kset = cache_kset(s); int unmergeable = slab_unmergeable(s); - INIT_WORK(&s->kobj_remove_work, sysfs_slab_remove_workfn); - if (!kset) { kobject_init(&s->kobj, &slab_ktype); return 0; @@ -5760,16 +5628,6 @@ static int sysfs_slab_add(struct kmem_ca if (err) goto out_del_kobj; -#ifdef CONFIG_MEMCG - if (is_root_cache(s) && memcg_sysfs_enabled) { - s->memcg_kset = kset_create_and_add("cgroup", NULL, &s->kobj); - if (!s->memcg_kset) { - err = -ENOMEM; - goto out_del_kobj; - } - } -#endif - if (!unmergeable) { /* Setup first alias */ sysfs_slab_alias(s, s->name); @@ -5783,19 +5641,6 @@ out_del_kobj: goto out; } -static void sysfs_slab_remove(struct kmem_cache *s) -{ - if (slab_state < FULL) - /* - * Sysfs has not been setup yet so no need to remove the - * cache from sysfs. - */ - return; - - kobject_get(&s->kobj); - schedule_work(&s->kobj_remove_work); -} - void sysfs_slab_unlink(struct kmem_cache *s) { if (slab_state >= FULL) _
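The reworked hooks in mm/slab.h above can be read as a three-step flow: charge up front, lazily attach a per-page vector of obj_cgroup pointers, and uncharge on free. The following is a toy model of that flow in plain Python, not kernel code. The class names, the 8-byte pointer size, and the omission of the bypass and charge-failure paths are all assumptions made for illustration; only the function names and the obj_full_size() arithmetic are taken from the diff.

```python
# Toy model (plain Python, not kernel code) of the per-object accounting
# flow in the reworked slab hooks. ObjCgroup and SlabPage are hypothetical
# stand-ins for struct obj_cgroup and struct page.

POINTER_SIZE = 8  # assumed sizeof(struct obj_cgroup *) on a 64-bit build


class ObjCgroup:
    """Tracks how many bytes are currently charged to one cgroup."""

    def __init__(self):
        self.charged = 0

    def charge(self, nbytes):
        self.charged += nbytes

    def uncharge(self, nbytes):
        self.charged -= nbytes


class SlabPage:
    """A slab page whose obj_cgroups vector is allocated on first use."""

    def __init__(self, nr_objects):
        self.nr_objects = nr_objects
        self.obj_cgroups = None


def obj_full_size(obj_size):
    # Each accounted object also pays for its slot in the pointer vector.
    return obj_size + POINTER_SIZE


def pre_alloc_hook(objcg, obj_size, count):
    # Charge for all requested objects before they are allocated.
    objcg.charge(count * obj_full_size(obj_size))
    return objcg


def post_alloc_hook(objcg, page, index):
    # Nothing to record when the allocation is not accounted.
    if objcg is None:
        return
    # Lazily allocate the per-page vector, as the patch now does here
    # instead of at page allocation time.
    if page.obj_cgroups is None:
        page.obj_cgroups = [None] * page.nr_objects
    page.obj_cgroups[index] = objcg


def free_hook(page, index, obj_size):
    # Pages without a vector hold no accounted objects.
    if page.obj_cgroups is None:
        return
    objcg = page.obj_cgroups[index]
    page.obj_cgroups[index] = None
    # Unaccounted objects may share a page with accounted ones.
    if objcg is None:
        return
    objcg.uncharge(obj_full_size(obj_size))
```

The point of the model is the two extra NULL checks in the free path: once all cgroups share one set of caches, a page may carry no vector at all, or a vector with empty slots, and both cases must be skipped without uncharging.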
From: Roman Gushchin <guro@fb.com>
Subject: kselftests: cgroup: add kernel memory accounting tests

Add some tests to cover the kernel memory accounting functionality.  These
cover some issues (and changes) we had recently.

1) A test which allocates a lot of negative dentries, checks memcg slab
   statistics, creates memory pressure by setting memory.high to some low
   value and checks that some number of slabs was reclaimed.

2) A test which covers side effects of memcg destruction: it creates
   and destroys a large number of sub-cgroups, each containing a
   multi-threaded workload which allocates and releases some kernel
   memory.  Then it checks that the charge and memory.stat do add up on
   the parent level.

3) A test which reads /proc/kpagecgroup and implicitly checks that it
   doesn't crash the system.

4) A test which spawns a large number of threads and checks that the
   kernel stacks accounting works as expected.

5) A test which checks that living charged slab objects are not
   preventing the memory cgroup from being released after being deleted
   by a user.

Link: http://lkml.kernel.org/r/20200623174037.3951353-19-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- tools/testing/selftests/cgroup/.gitignore | 1 tools/testing/selftests/cgroup/Makefile | 2 tools/testing/selftests/cgroup/test_kmem.c | 382 +++++++++++++++++++ 3 files changed, 385 insertions(+) --- a/tools/testing/selftests/cgroup/.gitignore~kselftests-cgroup-add-kernel-memory-accounting-tests +++ a/tools/testing/selftests/cgroup/.gitignore @@ -2,3 +2,4 @@ test_memcontrol test_core test_freezer +test_kmem \ No newline at end of file --- a/tools/testing/selftests/cgroup/Makefile~kselftests-cgroup-add-kernel-memory-accounting-tests +++ a/tools/testing/selftests/cgroup/Makefile @@ -6,11 +6,13 @@ all: TEST_FILES := with_stress.sh TEST_PROGS := test_stress.sh TEST_GEN_PROGS = test_memcontrol +TEST_GEN_PROGS += test_kmem TEST_GEN_PROGS += test_core TEST_GEN_PROGS += test_freezer include ../lib.mk $(OUTPUT)/test_memcontrol: cgroup_util.c ../clone3/clone3_selftests.h +$(OUTPUT)/test_kmem: cgroup_util.c ../clone3/clone3_selftests.h $(OUTPUT)/test_core: cgroup_util.c ../clone3/clone3_selftests.h $(OUTPUT)/test_freezer: cgroup_util.c ../clone3/clone3_selftests.h --- /dev/null +++ a/tools/testing/selftests/cgroup/test_kmem.c @@ -0,0 +1,382 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE + +#include <linux/limits.h> +#include <fcntl.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/stat.h> +#include <sys/types.h> +#include <unistd.h> +#include <sys/wait.h> +#include <errno.h> +#include <sys/sysinfo.h> +#include <pthread.h> + +#include "../kselftest.h" +#include "cgroup_util.h" + + +static int alloc_dcache(const char *cgroup, void *arg) +{ + 
unsigned long i; + struct stat st; + char buf[128]; + + for (i = 0; i < (unsigned long)arg; i++) { + snprintf(buf, sizeof(buf), + "/something-non-existent-with-a-long-name-%64lu-%d", + i, getpid()); + stat(buf, &st); + } + + return 0; +} + +/* + * This test allocates 100000 of negative dentries with long names. + * Then it checks that "slab" in memory.stat is larger than 1M. + * Then it sets memory.high to 1M and checks that at least 1/2 + * of slab memory has been reclaimed. + */ +static int test_kmem_basic(const char *root) +{ + int ret = KSFT_FAIL; + char *cg = NULL; + long slab0, slab1, current; + + cg = cg_name(root, "kmem_basic_test"); + if (!cg) + goto cleanup; + + if (cg_create(cg)) + goto cleanup; + + if (cg_run(cg, alloc_dcache, (void *)100000)) + goto cleanup; + + slab0 = cg_read_key_long(cg, "memory.stat", "slab "); + if (slab0 < (1 << 20)) + goto cleanup; + + cg_write(cg, "memory.high", "1M"); + slab1 = cg_read_key_long(cg, "memory.stat", "slab "); + if (slab1 <= 0) + goto cleanup; + + current = cg_read_long(cg, "memory.current"); + if (current <= 0) + goto cleanup; + + if (slab1 < slab0 / 2 && current < slab0 / 2) + ret = KSFT_PASS; +cleanup: + cg_destroy(cg); + free(cg); + + return ret; +} + +static void *alloc_kmem_fn(void *arg) +{ + alloc_dcache(NULL, (void *)100); + return NULL; +} + +static int alloc_kmem_smp(const char *cgroup, void *arg) +{ + int nr_threads = 2 * get_nprocs(); + pthread_t *tinfo; + unsigned long i; + int ret = -1; + + tinfo = calloc(nr_threads, sizeof(pthread_t)); + if (tinfo == NULL) + return -1; + + for (i = 0; i < nr_threads; i++) { + if (pthread_create(&tinfo[i], NULL, &alloc_kmem_fn, + (void *)i)) { + free(tinfo); + return -1; + } + } + + for (i = 0; i < nr_threads; i++) { + ret = pthread_join(tinfo[i], NULL); + if (ret) + break; + } + + free(tinfo); + return ret; +} + +static int cg_run_in_subcgroups(const char *parent, + int (*fn)(const char *cgroup, void *arg), + void *arg, int times) +{ + char *child; + int i; + + for 
(i = 0; i < times; i++) { + child = cg_name_indexed(parent, "child", i); + if (!child) + return -1; + + if (cg_create(child)) { + cg_destroy(child); + free(child); + return -1; + } + + if (cg_run(child, fn, NULL)) { + cg_destroy(child); + free(child); + return -1; + } + + cg_destroy(child); + free(child); + } + + return 0; +} + +/* + * The test creates and destroys a large number of cgroups. In each cgroup it + * allocates some slab memory (mostly negative dentries) using 2 * NR_CPUS + * threads. Then it checks the sanity of numbers on the parent level: + * the total size of the cgroups should be roughly equal to + * anon + file + slab + kernel_stack. + */ +static int test_kmem_memcg_deletion(const char *root) +{ + long current, slab, anon, file, kernel_stack, sum; + int ret = KSFT_FAIL; + char *parent; + + parent = cg_name(root, "kmem_memcg_deletion_test"); + if (!parent) + goto cleanup; + + if (cg_create(parent)) + goto cleanup; + + if (cg_write(parent, "cgroup.subtree_control", "+memory")) + goto cleanup; + + if (cg_run_in_subcgroups(parent, alloc_kmem_smp, NULL, 100)) + goto cleanup; + + current = cg_read_long(parent, "memory.current"); + slab = cg_read_key_long(parent, "memory.stat", "slab "); + anon = cg_read_key_long(parent, "memory.stat", "anon "); + file = cg_read_key_long(parent, "memory.stat", "file "); + kernel_stack = cg_read_key_long(parent, "memory.stat", "kernel_stack "); + if (current < 0 || slab < 0 || anon < 0 || file < 0 || + kernel_stack < 0) + goto cleanup; + + sum = slab + anon + file + kernel_stack; + if (abs(sum - current) < 4096 * 32 * 2 * get_nprocs()) { + ret = KSFT_PASS; + } else { + printf("memory.current = %ld\n", current); + printf("slab + anon + file + kernel_stack = %ld\n", sum); + printf("slab = %ld\n", slab); + printf("anon = %ld\n", anon); + printf("file = %ld\n", file); + printf("kernel_stack = %ld\n", kernel_stack); + } + +cleanup: + cg_destroy(parent); + free(parent); + + return ret; +} + +/* + * The test reads the entire 
/proc/kpagecgroup. If the operation went + * successfully (and the kernel didn't panic), the test is treated as passed. + */ +static int test_kmem_proc_kpagecgroup(const char *root) +{ + unsigned long buf[128]; + int ret = KSFT_FAIL; + ssize_t len; + int fd; + + fd = open("/proc/kpagecgroup", O_RDONLY); + if (fd < 0) + return ret; + + do { + len = read(fd, buf, sizeof(buf)); + } while (len > 0); + + if (len == 0) + ret = KSFT_PASS; + + close(fd); + return ret; +} + +static void *pthread_wait_fn(void *arg) +{ + sleep(100); + return NULL; +} + +static int spawn_1000_threads(const char *cgroup, void *arg) +{ + int nr_threads = 1000; + pthread_t *tinfo; + unsigned long i; + long stack; + int ret = -1; + + tinfo = calloc(nr_threads, sizeof(pthread_t)); + if (tinfo == NULL) + return -1; + + for (i = 0; i < nr_threads; i++) { + if (pthread_create(&tinfo[i], NULL, &pthread_wait_fn, + (void *)i)) { + free(tinfo); + return(-1); + } + } + + stack = cg_read_key_long(cgroup, "memory.stat", "kernel_stack "); + if (stack >= 4096 * 1000) + ret = 0; + + free(tinfo); + return ret; +} + +/* + * The test spawns a process, which spawns 1000 threads. Then it checks + * that memory.stat's kernel_stack is at least 1000 pages large. + */ +static int test_kmem_kernel_stacks(const char *root) +{ + int ret = KSFT_FAIL; + char *cg = NULL; + + cg = cg_name(root, "kmem_kernel_stacks_test"); + if (!cg) + goto cleanup; + + if (cg_create(cg)) + goto cleanup; + + if (cg_run(cg, spawn_1000_threads, NULL)) + goto cleanup; + + ret = KSFT_PASS; +cleanup: + cg_destroy(cg); + free(cg); + + return ret; +} + +/* + * This test sequentionally creates 30 child cgroups, allocates some + * kernel memory in each of them, and deletes them. Then it checks + * that the number of dying cgroups on the parent level is 0. 
+ */ +static int test_kmem_dead_cgroups(const char *root) +{ + int ret = KSFT_FAIL; + char *parent; + long dead; + int i; + + parent = cg_name(root, "kmem_dead_cgroups_test"); + if (!parent) + goto cleanup; + + if (cg_create(parent)) + goto cleanup; + + if (cg_write(parent, "cgroup.subtree_control", "+memory")) + goto cleanup; + + if (cg_run_in_subcgroups(parent, alloc_dcache, (void *)100, 30)) + goto cleanup; + + for (i = 0; i < 5; i++) { + dead = cg_read_key_long(parent, "cgroup.stat", + "nr_dying_descendants "); + if (dead == 0) { + ret = KSFT_PASS; + break; + } + /* + * Reclaiming cgroups might take some time, + * let's wait a bit and repeat. + */ + sleep(1); + } + +cleanup: + cg_destroy(parent); + free(parent); + + return ret; +} + +#define T(x) { x, #x } +struct kmem_test { + int (*fn)(const char *root); + const char *name; +} tests[] = { + T(test_kmem_basic), + T(test_kmem_memcg_deletion), + T(test_kmem_proc_kpagecgroup), + T(test_kmem_kernel_stacks), + T(test_kmem_dead_cgroups), +}; +#undef T + +int main(int argc, char **argv) +{ + char root[PATH_MAX]; + int i, ret = EXIT_SUCCESS; + + if (cg_find_unified_root(root, sizeof(root))) + ksft_exit_skip("cgroup v2 isn't mounted\n"); + + /* + * Check that memory controller is available: + * memory is listed in cgroup.controllers + */ + if (cg_read_strstr(root, "cgroup.controllers", "memory")) + ksft_exit_skip("memory controller isn't available\n"); + + if (cg_read_strstr(root, "cgroup.subtree_control", "memory")) + if (cg_write(root, "cgroup.subtree_control", "+memory")) + ksft_exit_skip("Failed to set memory controller\n"); + + for (i = 0; i < ARRAY_SIZE(tests); i++) { + switch (tests[i].fn(root)) { + case KSFT_PASS: + ksft_test_result_pass("%s\n", tests[i].name); + break; + case KSFT_SKIP: + ksft_test_result_skip("%s\n", tests[i].name); + break; + default: + ret = EXIT_FAILURE; + ksft_test_result_fail("%s\n", tests[i].name); + break; + } + } + + return ret; +} _
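The sanity check in test_kmem_memcg_deletion() above compares memory.current with the sum of four memory.stat counters, within a tolerance that scales with the CPU count. Below is a minimal sketch of the same arithmetic in Python, working on the text of memory.stat rather than on a live cgroup. The helper names read_key() and charges_add_up() are hypothetical, and the exact-match key lookup is an assumption: cg_read_key_long() lives in cgroup_util.c, which is not part of this patch.

```python
# Sketch of the "charges add up" check from test_kmem_memcg_deletion().
# read_key() and charges_add_up() are illustrative helpers, not part of
# the selftest; the parsing rule (exact key match) is an assumption.

def read_key(stat_text, key):
    """Return the integer value for `key` in memory.stat-style text, or -1."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == key:
            return int(fields[1])
    return -1


def charges_add_up(stat_text, current, nr_cpus, page_size=4096):
    """Mirror the test's tolerance: 32 pages for each of 2 * nr_cpus threads."""
    keys = ("slab", "anon", "file", "kernel_stack")
    parts = [read_key(stat_text, key) for key in keys]
    # Any missing counter or a failed memory.current read fails the check.
    if current < 0 or min(parts) < 0:
        return False
    tolerance = page_size * 32 * 2 * nr_cpus
    return abs(sum(parts) - current) < tolerance
```

With made-up numbers, a memory.stat containing anon 1000, file 2000, kernel_stack 3000 and slab 4000 passes against a memory.current of 10000 on a single CPU, and fails once the difference reaches 4096 * 32 * 2 bytes.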
From: Roman Gushchin <guro@fb.com>
Subject: tools/cgroup: add memcg_slabinfo.py tool

Add a drgn-based tool to display slab information for a given memcg.  Can
replace cgroup v1 memory.kmem.slabinfo interface on cgroup v2, but in a
more flexible way.  Currently supports only SLUB configuration, but SLAB
can be trivially added later.

Output example:
$ sudo ./tools/cgroup/memcg_slabinfo.py /sys/fs/cgroup/user.slice/user-111017.slice/user\@111017.service
shmem_inode_cache     92     92    704   46    8 : tunables    0    0    0 : slabdata      2      2      0
eventpoll_pwq         56     56     72   56    1 : tunables    0    0    0 : slabdata      1      1      0
eventpoll_epi         32     32    128   32    1 : tunables    0    0    0 : slabdata      1      1      0
kmalloc-8              0      0      8  512    1 : tunables    0    0    0 : slabdata      0      0      0
kmalloc-96             0      0     96   42    1 : tunables    0    0    0 : slabdata      0      0      0
kmalloc-2048           0      0   2048   16    8 : tunables    0    0    0 : slabdata      0      0      0
kmalloc-64           128    128     64   64    1 : tunables    0    0    0 : slabdata      2      2      0
mm_struct            160    160   1024   32    8 : tunables    0    0    0 : slabdata      5      5      0
signal_cache          96     96   1024   32    8 : tunables    0    0    0 : slabdata      3      3      0
sighand_cache         45     45   2112   15    8 : tunables    0    0    0 : slabdata      3      3      0
files_cache          138    138    704   46    8 : tunables    0    0    0 : slabdata      3      3      0
task_delay_info      153    153     80   51    1 : tunables    0    0    0 : slabdata      3      3      0
task_struct           27     27   3520    9    8 : tunables    0    0    0 : slabdata      3      3      0
radix_tree_node       56     56    584   28    4 : tunables    0    0    0 : slabdata      2      2      0
btrfs_inode          140    140   1136   28    8 : tunables    0    0    0 : slabdata      5      5      0
kmalloc-1024          64     64   1024   32    8 : tunables    0    0    0 : slabdata      2      2      0
kmalloc-192           84     84    192   42    2 : tunables    0    0    0 : slabdata      2      2      0
inode_cache           54     54    600   27    4 : tunables    0    0    0 : slabdata      2      2      0
kmalloc-128            0      0    128   32    1 : tunables    0    0    0 : slabdata      0      0      0
kmalloc-512           32     32    512   32    4 : tunables    0    0    0 : slabdata      1      1      0
skbuff_head_cache     32     32    256   32    2 : tunables    0    0    0 : slabdata      1      1      0
sock_inode_cache      46     46    704   46    8 : tunables    0    0    0 : slabdata      1      1      0
cred_jar             378    378    192   42    2 : tunables    0    0    0 : slabdata      9      9      0
proc_inode_cache      96     96    672   24    4 : tunables    0    0    0 : slabdata      4      4      0
dentry               336    336    192   42    2 : tunables    0    0    0 :
slabdata 8 8 0 filp 697 864 256 32 2 : tunables 0 0 0 : slabdata 27 27 0 anon_vma 644 644 88 46 1 : tunables 0 0 0 : slabdata 14 14 0 pid 1408 1408 64 64 1 : tunables 0 0 0 : slabdata 22 22 0 vm_area_struct 1200 1200 200 40 2 : tunables 0 0 0 : slabdata 30 30 0 Link: http://lkml.kernel.org/r/20200623174037.3951353-20-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- tools/cgroup/memcg_slabinfo.py | 226 +++++++++++++++++++++++++++++++ 1 file changed, 226 insertions(+) --- /dev/null +++ a/tools/cgroup/memcg_slabinfo.py @@ -0,0 +1,226 @@ +#!/usr/bin/env drgn +# +# Copyright (C) 2020 Roman Gushchin <guro@fb.com> +# Copyright (C) 2020 Facebook + +from os import stat +import argparse +import sys + +from drgn.helpers.linux import list_for_each_entry, list_empty +from drgn.helpers.linux import for_each_page +from drgn.helpers.linux.cpumask import for_each_online_cpu +from drgn.helpers.linux.percpu import per_cpu_ptr +from drgn import container_of, FaultError, Object + + +DESC = """ +This is a drgn script to provide slab statistics for memory cgroups. +It supports cgroup v2 and v1 and can emulate memory.kmem.slabinfo +interface of cgroup v1. +For drgn, visit https://github.com/osandov/drgn. 
+""" + + +MEMCGS = {} + +OO_SHIFT = 16 +OO_MASK = ((1 << OO_SHIFT) - 1) + + +def err(s): + print('slabinfo.py: error: %s' % s, file=sys.stderr, flush=True) + sys.exit(1) + + +def find_memcg_ids(css=prog['root_mem_cgroup'].css, prefix=''): + if not list_empty(css.children.address_of_()): + for css in list_for_each_entry('struct cgroup_subsys_state', + css.children.address_of_(), + 'sibling'): + name = prefix + '/' + css.cgroup.kn.name.string_().decode('utf-8') + memcg = container_of(css, 'struct mem_cgroup', 'css') + MEMCGS[css.cgroup.kn.id.value_()] = memcg + find_memcg_ids(css, name) + + +def is_root_cache(s): + try: + return False if s.memcg_params.root_cache else True + except AttributeError: + return True + + +def cache_name(s): + if is_root_cache(s): + return s.name.string_().decode('utf-8') + else: + return s.memcg_params.root_cache.name.string_().decode('utf-8') + + +# SLUB + +def oo_order(s): + return s.oo.x >> OO_SHIFT + + +def oo_objects(s): + return s.oo.x & OO_MASK + + +def count_partial(n, fn): + nr_pages = 0 + for page in list_for_each_entry('struct page', n.partial.address_of_(), + 'lru'): + nr_pages += fn(page) + return nr_pages + + +def count_free(page): + return page.objects - page.inuse + + +def slub_get_slabinfo(s, cfg): + nr_slabs = 0 + nr_objs = 0 + nr_free = 0 + + for node in range(cfg['nr_nodes']): + n = s.node[node] + nr_slabs += n.nr_slabs.counter.value_() + nr_objs += n.total_objects.counter.value_() + nr_free += count_partial(n, count_free) + + return {'active_objs': nr_objs - nr_free, + 'num_objs': nr_objs, + 'active_slabs': nr_slabs, + 'num_slabs': nr_slabs, + 'objects_per_slab': oo_objects(s), + 'cache_order': oo_order(s), + 'limit': 0, + 'batchcount': 0, + 'shared': 0, + 'shared_avail': 0} + + +def cache_show(s, cfg, objs): + if cfg['allocator'] == 'SLUB': + sinfo = slub_get_slabinfo(s, cfg) + else: + err('SLAB isn\'t supported yet') + + if cfg['shared_slab_pages']: + sinfo['active_objs'] = objs + sinfo['num_objs'] = objs + + 
print('%-17s %6lu %6lu %6u %4u %4d' + ' : tunables %4u %4u %4u' + ' : slabdata %6lu %6lu %6lu' % ( + cache_name(s), sinfo['active_objs'], sinfo['num_objs'], + s.size, sinfo['objects_per_slab'], 1 << sinfo['cache_order'], + sinfo['limit'], sinfo['batchcount'], sinfo['shared'], + sinfo['active_slabs'], sinfo['num_slabs'], + sinfo['shared_avail'])) + + +def detect_kernel_config(): + cfg = {} + + cfg['nr_nodes'] = prog['nr_online_nodes'].value_() + + if prog.type('struct kmem_cache').members[1][1] == 'flags': + cfg['allocator'] = 'SLUB' + elif prog.type('struct kmem_cache').members[1][1] == 'batchcount': + cfg['allocator'] = 'SLAB' + else: + err('Can\'t determine the slab allocator') + + cfg['shared_slab_pages'] = False + try: + if prog.type('struct obj_cgroup'): + cfg['shared_slab_pages'] = True + except: + pass + + return cfg + + +def for_each_slab_page(prog): + PGSlab = 1 << prog.constant('PG_slab') + PGHead = 1 << prog.constant('PG_head') + + for page in for_each_page(prog): + try: + if page.flags.value_() & PGSlab: + yield page + except FaultError: + pass + + +def main(): + parser = argparse.ArgumentParser(description=DESC, + formatter_class= + argparse.RawTextHelpFormatter) + parser.add_argument('cgroup', metavar='CGROUP', + help='Target memory cgroup') + args = parser.parse_args() + + try: + cgroup_id = stat(args.cgroup).st_ino + find_memcg_ids() + memcg = MEMCGS[cgroup_id] + except KeyError: + err('Can\'t find the memory cgroup') + + cfg = detect_kernel_config() + + print('# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>' + ' : tunables <limit> <batchcount> <sharedfactor>' + ' : slabdata <active_slabs> <num_slabs> <sharedavail>') + + if cfg['shared_slab_pages']: + obj_cgroups = set() + stats = {} + caches = {} + + # find memcg pointers belonging to the specified cgroup + obj_cgroups.add(memcg.objcg.value_()) + for ptr in list_for_each_entry('struct obj_cgroup', + memcg.objcg_list.address_of_(), + 'list'): + obj_cgroups.add(ptr.value_()) + + 
# look over all slab pages, belonging to non-root memcgs + # and look for objects belonging to the given memory cgroup + for page in for_each_slab_page(prog): + objcg_vec_raw = page.obj_cgroups.value_() + if objcg_vec_raw == 0: + continue + cache = page.slab_cache + if not cache: + continue + addr = cache.value_() + caches[addr] = cache + # clear the lowest bit to get the true obj_cgroups + objcg_vec = Object(prog, page.obj_cgroups.type_, + value=objcg_vec_raw & ~1) + + if addr not in stats: + stats[addr] = 0 + + for i in range(oo_objects(cache)): + if objcg_vec[i].value_() in obj_cgroups: + stats[addr] += 1 + + for addr in caches: + if stats[addr] > 0: + cache_show(caches[addr], cfg, stats[addr]) + + else: + for s in list_for_each_entry('struct kmem_cache', + memcg.kmem_caches.address_of_(), + 'memcg_params.kmem_caches_node'): + cache_show(s, cfg, None) + + +main() _
From: Shakeel Butt <shakeelb@google.com> Subject: mm: memcontrol: account kernel stack per node Currently the kernel stack is being accounted per-zone. There is no need to do that. In addition due to being per-zone, memcg has to keep a separate MEMCG_KERNEL_STACK_KB. Make the stat per-node and deprecate MEMCG_KERNEL_STACK_KB as memcg_stat_item is an extension of node_stat_item. In addition localize the kernel stack stats updates to account_kernel_stack(). Link: http://lkml.kernel.org/r/20200630161539.1759185-1-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- drivers/base/node.c | 4 +- fs/proc/meminfo.c | 4 +- include/linux/memcontrol.h | 21 +++++++++++++- include/linux/mmzone.h | 8 ++--- kernel/fork.c | 51 +++++++++-------------------------- kernel/scs.c | 2 - mm/memcontrol.c | 2 - mm/page_alloc.c | 16 +++++----- mm/vmstat.c | 8 ++--- 9 files changed, 55 insertions(+), 61 deletions(-) --- a/drivers/base/node.c~mm-memcontrol-account-kernel-stack-per-node +++ a/drivers/base/node.c @@ -440,9 +440,9 @@ static ssize_t node_read_meminfo(struct nid, K(node_page_state(pgdat, NR_FILE_MAPPED)), nid, K(node_page_state(pgdat, NR_ANON_MAPPED)), nid, K(i.sharedram), - nid, sum_zone_node_page_state(nid, NR_KERNEL_STACK_KB), + nid, node_page_state(pgdat, NR_KERNEL_STACK_KB), #ifdef CONFIG_SHADOW_CALL_STACK - nid, sum_zone_node_page_state(nid, NR_KERNEL_SCS_KB), + nid, node_page_state(pgdat, NR_KERNEL_SCS_KB), #endif nid, K(sum_zone_node_page_state(nid, NR_PAGETABLE)), nid, 0UL, --- a/fs/proc/meminfo.c~mm-memcontrol-account-kernel-stack-per-node +++ a/fs/proc/meminfo.c @@ -101,10 +101,10 @@ static int meminfo_proc_show(struct seq_ show_val_kb(m, "SReclaimable: ", sreclaimable); show_val_kb(m, "SUnreclaim: ", sunreclaim); seq_printf(m, "KernelStack: %8lu kB\n", - 
global_zone_page_state(NR_KERNEL_STACK_KB)); + global_node_page_state(NR_KERNEL_STACK_KB)); #ifdef CONFIG_SHADOW_CALL_STACK seq_printf(m, "ShadowCallStack:%8lu kB\n", - global_zone_page_state(NR_KERNEL_SCS_KB)); + global_node_page_state(NR_KERNEL_SCS_KB)); #endif show_val_kb(m, "PageTables: ", global_zone_page_state(NR_PAGETABLE)); --- a/include/linux/memcontrol.h~mm-memcontrol-account-kernel-stack-per-node +++ a/include/linux/memcontrol.h @@ -32,8 +32,6 @@ struct kmem_cache; enum memcg_stat_item { MEMCG_SWAP = NR_VM_NODE_STAT_ITEMS, MEMCG_SOCK, - /* XXX: why are these zone and not node counters? */ - MEMCG_KERNEL_STACK_KB, MEMCG_NR_STAT, }; @@ -729,8 +727,19 @@ void __mod_memcg_lruvec_state(struct lru void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, int val); void __mod_lruvec_slab_state(void *p, enum node_stat_item idx, int val); + void mod_memcg_obj_state(void *p, int idx, int val); +static inline void mod_lruvec_slab_state(void *p, enum node_stat_item idx, + int val) +{ + unsigned long flags; + + local_irq_save(flags); + __mod_lruvec_slab_state(p, idx, val); + local_irq_restore(flags); +} + static inline void mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, int val) { @@ -1151,6 +1160,14 @@ static inline void __mod_lruvec_slab_sta __mod_node_page_state(page_pgdat(page), idx, val); } +static inline void mod_lruvec_slab_state(void *p, enum node_stat_item idx, + int val) +{ + struct page *page = virt_to_head_page(p); + + mod_node_page_state(page_pgdat(page), idx, val); +} + static inline void mod_memcg_obj_state(void *p, int idx, int val) { } --- a/include/linux/mmzone.h~mm-memcontrol-account-kernel-stack-per-node +++ a/include/linux/mmzone.h @@ -155,10 +155,6 @@ enum zone_stat_item { NR_ZONE_WRITE_PENDING, /* Count of dirty, writeback and unstable pages */ NR_MLOCK, /* mlock()ed pages found and moved off LRU */ NR_PAGETABLE, /* used for pagetables */ - NR_KERNEL_STACK_KB, /* measured in KiB */ -#if 
IS_ENABLED(CONFIG_SHADOW_CALL_STACK) - NR_KERNEL_SCS_KB, /* measured in KiB */ -#endif /* Second 128 byte cacheline */ NR_BOUNCE, #if IS_ENABLED(CONFIG_ZSMALLOC) @@ -203,6 +199,10 @@ enum node_stat_item { NR_KERNEL_MISC_RECLAIMABLE, /* reclaimable non-slab kernel pages */ NR_FOLL_PIN_ACQUIRED, /* via: pin_user_page(), gup flag: FOLL_PIN */ NR_FOLL_PIN_RELEASED, /* pages returned via unpin_user_page() */ + NR_KERNEL_STACK_KB, /* measured in KiB */ +#if IS_ENABLED(CONFIG_SHADOW_CALL_STACK) + NR_KERNEL_SCS_KB, /* measured in KiB */ +#endif NR_VM_NODE_STAT_ITEMS }; --- a/kernel/fork.c~mm-memcontrol-account-kernel-stack-per-node +++ a/kernel/fork.c @@ -276,13 +276,8 @@ static inline void free_thread_stack(str if (vm) { int i; - for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) { - mod_memcg_page_state(vm->pages[i], - MEMCG_KERNEL_STACK_KB, - -(int)(PAGE_SIZE / 1024)); - + for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) memcg_kmem_uncharge_page(vm->pages[i], 0); - } for (i = 0; i < NR_CACHED_STACKS; i++) { if (this_cpu_cmpxchg(cached_stacks[i], @@ -382,31 +377,14 @@ static void account_kernel_stack(struct void *stack = task_stack_page(tsk); struct vm_struct *vm = task_stack_vm_area(tsk); - BUILD_BUG_ON(IS_ENABLED(CONFIG_VMAP_STACK) && PAGE_SIZE % 1024 != 0); - if (vm) { - int i; - - BUG_ON(vm->nr_pages != THREAD_SIZE / PAGE_SIZE); - - for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) { - mod_zone_page_state(page_zone(vm->pages[i]), - NR_KERNEL_STACK_KB, - PAGE_SIZE / 1024 * account); - } - } else { - /* - * All stack pages are in the same zone and belong to the - * same memcg. - */ - struct page *first_page = virt_to_page(stack); - - mod_zone_page_state(page_zone(first_page), NR_KERNEL_STACK_KB, - THREAD_SIZE / 1024 * account); - - mod_memcg_obj_state(stack, MEMCG_KERNEL_STACK_KB, - account * (THREAD_SIZE / 1024)); - } + /* All stack pages are in the same node. 
*/ + if (vm) + mod_lruvec_page_state(vm->pages[0], NR_KERNEL_STACK_KB, + account * (THREAD_SIZE / 1024)); + else + mod_lruvec_slab_state(stack, NR_KERNEL_STACK_KB, + account * (THREAD_SIZE / 1024)); } static int memcg_charge_kernel_stack(struct task_struct *tsk) @@ -415,24 +393,23 @@ static int memcg_charge_kernel_stack(str struct vm_struct *vm = task_stack_vm_area(tsk); int ret; + BUILD_BUG_ON(IS_ENABLED(CONFIG_VMAP_STACK) && PAGE_SIZE % 1024 != 0); + if (vm) { int i; + BUG_ON(vm->nr_pages != THREAD_SIZE / PAGE_SIZE); + for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) { /* * If memcg_kmem_charge_page() fails, page->mem_cgroup - * pointer is NULL, and both memcg_kmem_uncharge_page() - * and mod_memcg_page_state() in free_thread_stack() - * will ignore this page. So it's safe. + * pointer is NULL, and memcg_kmem_uncharge_page() in + * free_thread_stack() will ignore this page. */ ret = memcg_kmem_charge_page(vm->pages[i], GFP_KERNEL, 0); if (ret) return ret; - - mod_memcg_page_state(vm->pages[i], - MEMCG_KERNEL_STACK_KB, - PAGE_SIZE / 1024); } } #endif --- a/kernel/scs.c~mm-memcontrol-account-kernel-stack-per-node +++ a/kernel/scs.c @@ -17,7 +17,7 @@ static void __scs_account(void *s, int a { struct page *scs_page = virt_to_page(s); - mod_zone_page_state(page_zone(scs_page), NR_KERNEL_SCS_KB, + mod_node_page_state(page_pgdat(scs_page), NR_KERNEL_SCS_KB, account * (SCS_SIZE / SZ_1K)); } --- a/mm/memcontrol.c~mm-memcontrol-account-kernel-stack-per-node +++ a/mm/memcontrol.c @@ -1485,7 +1485,7 @@ static char *memory_stat_format(struct m (u64)memcg_page_state(memcg, NR_FILE_PAGES) * PAGE_SIZE); seq_buf_printf(&s, "kernel_stack %llu\n", - (u64)memcg_page_state(memcg, MEMCG_KERNEL_STACK_KB) * + (u64)memcg_page_state(memcg, NR_KERNEL_STACK_KB) * 1024); seq_buf_printf(&s, "slab %llu\n", (u64)(memcg_page_state(memcg, NR_SLAB_RECLAIMABLE_B) + --- a/mm/page_alloc.c~mm-memcontrol-account-kernel-stack-per-node +++ a/mm/page_alloc.c @@ -5396,6 +5396,10 @@ void 
show_free_areas(unsigned int filter " anon_thp: %lukB" #endif " writeback_tmp:%lukB" + " kernel_stack:%lukB" +#ifdef CONFIG_SHADOW_CALL_STACK + " shadow_call_stack:%lukB" +#endif " all_unreclaimable? %s" "\n", pgdat->node_id, @@ -5417,6 +5421,10 @@ void show_free_areas(unsigned int filter K(node_page_state(pgdat, NR_ANON_THPS) * HPAGE_PMD_NR), #endif K(node_page_state(pgdat, NR_WRITEBACK_TEMP)), + node_page_state(pgdat, NR_KERNEL_STACK_KB), +#ifdef CONFIG_SHADOW_CALL_STACK + node_page_state(pgdat, NR_KERNEL_SCS_KB), +#endif pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES ? "yes" : "no"); } @@ -5448,10 +5456,6 @@ void show_free_areas(unsigned int filter " present:%lukB" " managed:%lukB" " mlocked:%lukB" - " kernel_stack:%lukB" -#ifdef CONFIG_SHADOW_CALL_STACK - " shadow_call_stack:%lukB" -#endif " pagetables:%lukB" " bounce:%lukB" " free_pcp:%lukB" @@ -5473,10 +5477,6 @@ void show_free_areas(unsigned int filter K(zone->present_pages), K(zone_managed_pages(zone)), K(zone_page_state(zone, NR_MLOCK)), - zone_page_state(zone, NR_KERNEL_STACK_KB), -#ifdef CONFIG_SHADOW_CALL_STACK - zone_page_state(zone, NR_KERNEL_SCS_KB), -#endif K(zone_page_state(zone, NR_PAGETABLE)), K(zone_page_state(zone, NR_BOUNCE)), K(free_pcp), --- a/mm/vmstat.c~mm-memcontrol-account-kernel-stack-per-node +++ a/mm/vmstat.c @@ -1140,10 +1140,6 @@ const char * const vmstat_text[] = { "nr_zone_write_pending", "nr_mlock", "nr_page_table_pages", - "nr_kernel_stack", -#if IS_ENABLED(CONFIG_SHADOW_CALL_STACK) - "nr_shadow_call_stack", -#endif "nr_bounce", #if IS_ENABLED(CONFIG_ZSMALLOC) "nr_zspages", @@ -1194,6 +1190,10 @@ const char * const vmstat_text[] = { "nr_kernel_misc_reclaimable", "nr_foll_pin_acquired", "nr_foll_pin_released", + "nr_kernel_stack", +#if IS_ENABLED(CONFIG_SHADOW_CALL_STACK) + "nr_shadow_call_stack", +#endif /* enum writeback_stat_item counters */ "nr_dirty_threshold", _
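The arithmetic behind the simplified account_kernel_stack() can be checked in isolation. The sketch below is a user-space illustration only, not kernel code: TOY_PAGE_SIZE and TOY_THREAD_SIZE are assumed values (4 KiB pages, 16 KiB stacks), and both helper names are hypothetical. It shows that a single update of THREAD_SIZE / 1024 KiB gives the same total as the old loop of per-page PAGE_SIZE / 1024 updates.

```c
#include <assert.h>

#define TOY_PAGE_SIZE	4096UL			/* assumed page size */
#define TOY_THREAD_SIZE	(4 * TOY_PAGE_SIZE)	/* assumed 16 KiB kernel stack */

/* Old scheme: one counter update per stack page, PAGE_SIZE / 1024 KiB each. */
static long toy_account_per_page(int account)
{
	long total_kb = 0;
	unsigned long i;

	for (i = 0; i < TOY_THREAD_SIZE / TOY_PAGE_SIZE; i++)
		total_kb += (long)(TOY_PAGE_SIZE / 1024) * account;

	return total_kb;
}

/* New scheme: a single update of THREAD_SIZE / 1024 KiB for the whole stack. */
static long toy_account_per_node(int account)
{
	/* Same precondition the patch keeps as a BUILD_BUG_ON for page size. */
	assert(TOY_PAGE_SIZE % 1024 == 0);

	return account * (long)(TOY_THREAD_SIZE / 1024);
}
```

Calling either helper with account = 1 models a stack being allocated and account = -1 models it being freed; the two schemes agree for both signs.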
From: Roman Gushchin <guro@fb.com> Subject: mm: memcg/slab: remove unused argument by charge_slab_page() charge_slab_page() is not using the gfp argument anymore, remove it. Link: http://lkml.kernel.org/r/20200707173612.124425-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/slab.c | 2 +- mm/slab.h | 3 +-- mm/slub.c | 2 +- 3 files changed, 3 insertions(+), 4 deletions(-) --- a/mm/slab.c~mm-memcg-slab-remove-unused-argument-by-charge_slab_page +++ a/mm/slab.c @@ -1379,7 +1379,7 @@ static struct page *kmem_getpages(struct return NULL; } - charge_slab_page(page, flags, cachep->gfporder, cachep); + charge_slab_page(page, cachep->gfporder, cachep); __SetPageSlab(page); /* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */ if (sk_memalloc_socks() && page_is_pfmemalloc(page)) --- a/mm/slab.h~mm-memcg-slab-remove-unused-argument-by-charge_slab_page +++ a/mm/slab.h @@ -423,8 +423,7 @@ static inline struct kmem_cache *virt_to return page->slab_cache; } -static __always_inline void charge_slab_page(struct page *page, - gfp_t gfp, int order, +static __always_inline void charge_slab_page(struct page *page, int order, struct kmem_cache *s) { mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), --- a/mm/slub.c~mm-memcg-slab-remove-unused-argument-by-charge_slab_page +++ a/mm/slub.c @@ -1621,7 +1621,7 @@ static inline struct page *alloc_slab_pa page = __alloc_pages_node(node, flags, order); if (page) - charge_slab_page(page, flags, order, s); + charge_slab_page(page, order, s); return page; } _
From: Roman Gushchin <guro@fb.com> Subject: mm: slab: rename (un)charge_slab_page() to (un)account_slab_page() charge_slab_page() and uncharge_slab_page() are not related anymore to memcg charging and uncharging. In order to make their names less confusing, let's rename them to account_slab_page() and unaccount_slab_page() respectively. Link: http://lkml.kernel.org/r/20200707173612.124425-2-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/slab.c | 4 ++-- mm/slab.h | 8 ++++---- mm/slub.c | 4 ++-- 3 files changed, 8 insertions(+), 8 deletions(-) --- a/mm/slab.c~mm-slab-rename-uncharge_slab_page-to-unaccount_slab_page +++ a/mm/slab.c @@ -1379,7 +1379,7 @@ static struct page *kmem_getpages(struct return NULL; } - charge_slab_page(page, cachep->gfporder, cachep); + account_slab_page(page, cachep->gfporder, cachep); __SetPageSlab(page); /* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */ if (sk_memalloc_socks() && page_is_pfmemalloc(page)) @@ -1403,7 +1403,7 @@ static void kmem_freepages(struct kmem_c if (current->reclaim_state) current->reclaim_state->reclaimed_slab += 1 << order; - uncharge_slab_page(page, order, cachep); + unaccount_slab_page(page, order, cachep); __free_pages(page, order); } --- a/mm/slab.h~mm-slab-rename-uncharge_slab_page-to-unaccount_slab_page +++ a/mm/slab.h @@ -423,15 +423,15 @@ static inline struct kmem_cache *virt_to return page->slab_cache; } -static __always_inline void charge_slab_page(struct page *page, int order, - struct kmem_cache *s) +static __always_inline void account_slab_page(struct page *page, int order, + struct kmem_cache 
*s) { mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s), PAGE_SIZE << order); } -static __always_inline void uncharge_slab_page(struct page *page, int order, - struct kmem_cache *s) +static __always_inline void unaccount_slab_page(struct page *page, int order, + struct kmem_cache *s) { if (memcg_kmem_enabled()) memcg_free_page_obj_cgroups(page); --- a/mm/slub.c~mm-slab-rename-uncharge_slab_page-to-unaccount_slab_page +++ a/mm/slub.c @@ -1621,7 +1621,7 @@ static inline struct page *alloc_slab_pa page = __alloc_pages_node(node, flags, order); if (page) - charge_slab_page(page, order, s); + account_slab_page(page, order, s); return page; } @@ -1844,7 +1844,7 @@ static void __free_slab(struct kmem_cach page->mapping = NULL; if (current->reclaim_state) current->reclaim_state->reclaimed_slab += pages; - uncharge_slab_page(page, order, s); + unaccount_slab_page(page, order, s); __free_pages(page, order); } _
From: Roman Gushchin <guro@fb.com> Subject: mm: kmem: switch to static_branch_likely() in memcg_kmem_enabled() Currently memcg_kmem_enabled() is optimized for kernel memory accounting being off. This has been the case for a long time, arguably because kernel memory accounting was initially an opt-in feature. However, it is now on by default on both cgroup v1 and cgroup v2, and it is on for all cgroups. So let's switch over to static_branch_likely() to reflect this fact. A significant performance difference is unlikely, as the cost of a memory allocation and its accounting significantly exceeds the cost of a jump. However, the conversion makes the code more logical. Link: http://lkml.kernel.org/r/20200707173612.124425-3-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/memcontrol.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/include/linux/memcontrol.h~mm-kmem-switch-to-static_branch_likely-in-memcg_kmem_enabled +++ a/include/linux/memcontrol.h @@ -1448,7 +1448,7 @@ void memcg_put_cache_ids(void); static inline bool memcg_kmem_enabled(void) { - return static_branch_unlikely(&memcg_kmem_enabled_key); + return static_branch_likely(&memcg_kmem_enabled_key); } static inline bool memcg_kmem_bypass(void) _
From: Roman Gushchin <guro@fb.com> Subject: mm: memcontrol: avoid workload stalls when lowering memory.high The memory.high limit is implemented in such a way that the kernel penalizes all threads which allocate memory over the limit. Forcing all threads into synchronous reclaim and adding some artificial delays slows down memory consumption and potentially gives userspace oom handlers/resource control agents some time to react. It works nicely if the memory usage is hitting the limit from below, however it works sub-optimally if a user adjusts memory.high to a value way below the current memory usage. It basically forces all workload threads (doing any memory allocations) into synchronous reclaim and sleep. This makes the workload completely unresponsive for a long period of time and can also lead to system-wide contention on lru locks. It can happen even if the workload is not actually tight on memory and has, for example, a ton of cold pagecache. In the current implementation, writing to memory.high causes an atomic update of the page counter's high value followed by an attempt to reclaim enough memory to fit into the new limit. To fix the problem described above, all we need is to change the order of execution: try to push the memory usage under the limit first, and only then set the new high limit.
Link: http://lkml.kernel.org/r/20200709194718.189231-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reported-by: Domas Mituzas <domas@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Chris Down <chris@chrisdown.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/memcontrol.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/mm/memcontrol.c~mm-memcontrol-avoid-workload-stalls-when-lowering-memoryhigh +++ a/mm/memcontrol.c @@ -6213,8 +6213,6 @@ static ssize_t memory_high_write(struct if (err) return err; - page_counter_set_high(&memcg->memory, high); - for (;;) { unsigned long nr_pages = page_counter_read(&memcg->memory); unsigned long reclaimed; @@ -6238,6 +6236,8 @@ static ssize_t memory_high_write(struct break; } + page_counter_set_high(&memcg->memory, high); + return nbytes; } _
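The effect of the reordering in memory_high_write() can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: struct toy_counter, toy_reclaim() and toy_high_write() are hypothetical names, reclaim always succeeds in fixed batches, and the retry, drain and signal handling of the real loop is left out. The function counts how many reclaim steps run while allocating threads would see usage above the visible high limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page_counter; not the kernel type. */
struct toy_counter {
	unsigned long usage;	/* pages currently charged */
	unsigned long high;	/* memory.high currently visible to allocators */
};

/* Allocating threads are pushed into synchronous reclaim when this is true. */
static bool toy_over_high(const struct toy_counter *c)
{
	return c->usage > c->high;
}

/* Hypothetical reclaim step: frees up to 'want' pages, at most 'batch'. */
static unsigned long toy_reclaim(struct toy_counter *c, unsigned long want,
				 unsigned long batch)
{
	unsigned long n = want < batch ? want : batch;

	if (n > c->usage)
		n = c->usage;
	c->usage -= n;
	return n;
}

/*
 * set_first == true models the old ordering (publish the limit, then
 * reclaim); set_first == false models the patched ordering (reclaim
 * first, publish last). Returns the number of reclaim steps during
 * which allocators were over the visible limit.
 */
static unsigned int toy_high_write(struct toy_counter *c,
				   unsigned long new_high,
				   unsigned long batch, bool set_first)
{
	unsigned int throttled_steps = 0;

	assert(c);

	if (set_first)
		c->high = new_high;

	while (c->usage > new_high) {
		if (toy_over_high(c))
			throttled_steps++;
		if (!toy_reclaim(c, c->usage - new_high, batch))
			break;	/* no progress: give up, as a toy policy */
	}

	c->high = new_high;
	return throttled_steps;
}
```

With usage at 1000 pages, an old limit of 2000 and a new limit of 500, the old ordering keeps allocators over the limit for every reclaim step, while the patched ordering never exposes them to the lower limit until usage already fits.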
From: Chris Down <chris@chrisdown.name> Subject: mm, memcg: reclaim more aggressively before high allocator throttling Patch series "mm, memcg: reclaim harder before high throttling", v2. This patch (of 2): In Facebook production, we've seen cases where cgroups have been put into allocator throttling even when they appear to have a lot of slack file caches which should be trivially reclaimable. Looking more closely, the problem is that we only try a single cgroup reclaim walk for each return to usermode before calculating whether or not we should throttle. This single attempt doesn't produce enough pressure to shrink for cgroups with a rapidly growing amount of file caches prior to entering allocator throttling. As an example, we see that threads in an affected cgroup are stuck in allocator throttling: # for i in $(cat cgroup.threads); do > grep over_high "/proc/$i/stack" > done [<0>] mem_cgroup_handle_over_high+0x10b/0x150 [<0>] mem_cgroup_handle_over_high+0x10b/0x150 [<0>] mem_cgroup_handle_over_high+0x10b/0x150 ...however, there is no I/O pressure reported by PSI, despite a lot of slack file pages: # cat memory.pressure some avg10=78.50 avg60=84.99 avg300=84.53 total=5702440903 full avg10=78.50 avg60=84.99 avg300=84.53 total=5702116959 # cat io.pressure some avg10=0.00 avg60=0.00 avg300=0.00 total=78051391 full avg10=0.00 avg60=0.00 avg300=0.00 total=78049640 # grep _file memory.stat inactive_file 1370939392 active_file 661635072 This patch changes the behaviour to retry reclaim either until the current task goes below the 10ms grace period, or we are making no reclaim progress at all. In the latter case, we enter reclaim throttling as before. To a user, there's no intuitive reason for the reclaim behaviour to differ from hitting memory.high as part of a new allocation, as opposed to hitting memory.high because someone lowered its value. As such this also brings an added benefit: it unifies the reclaim behaviour between the two. 
There's precedent for this behaviour: we already do reclaim retries when writing to memory.{high,max}, in max reclaim, and in the page allocator itself. Link: http://lkml.kernel.org/r/cover.1594640214.git.chris@chrisdown.name Link: http://lkml.kernel.org/r/a4e23b59e9ef499b575ae73a8120ee089b7d3373.1594640214.git.chris@chrisdown.name Signed-off-by: Chris Down <chris@chrisdown.name> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/memcontrol.c | 42 +++++++++++++++++++++++++++++++++++++----- 1 file changed, 37 insertions(+), 5 deletions(-) --- a/mm/memcontrol.c~mm-memcg-reclaim-more-aggressively-before-high-allocator-throttling +++ a/mm/memcontrol.c @@ -73,6 +73,7 @@ EXPORT_SYMBOL(memory_cgrp_subsys); struct mem_cgroup *root_mem_cgroup __read_mostly; +/* The number of times we should retry reclaim failures before giving up. */ #define MEM_CGROUP_RECLAIM_RETRIES 5 /* Socket memory accounting disabled? 
*/ @@ -2363,18 +2364,23 @@ static int memcg_hotplug_cpu_dead(unsign return 0; } -static void reclaim_high(struct mem_cgroup *memcg, - unsigned int nr_pages, - gfp_t gfp_mask) +static unsigned long reclaim_high(struct mem_cgroup *memcg, + unsigned int nr_pages, + gfp_t gfp_mask) { + unsigned long nr_reclaimed = 0; + do { if (page_counter_read(&memcg->memory) <= READ_ONCE(memcg->memory.high)) continue; memcg_memory_event(memcg, MEMCG_HIGH); - try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, true); + nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages, + gfp_mask, true); } while ((memcg = parent_mem_cgroup(memcg)) && !mem_cgroup_is_root(memcg)); + + return nr_reclaimed; } static void high_work_func(struct work_struct *work) @@ -2530,16 +2536,32 @@ void mem_cgroup_handle_over_high(void) { unsigned long penalty_jiffies; unsigned long pflags; + unsigned long nr_reclaimed; unsigned int nr_pages = current->memcg_nr_pages_over_high; + int nr_retries = MEM_CGROUP_RECLAIM_RETRIES; struct mem_cgroup *memcg; + bool in_retry = false; if (likely(!nr_pages)) return; memcg = get_mem_cgroup_from_mm(current->mm); - reclaim_high(memcg, nr_pages, GFP_KERNEL); current->memcg_nr_pages_over_high = 0; +retry_reclaim: + /* + * The allocating task should reclaim at least the batch size, but for + * subsequent retries we only want to do what's necessary to prevent oom + * or breaching resource isolation. + * + * This is distinct from memory.max or page allocator behaviour because + * memory.high is currently batched, whereas memory.max and the page + * allocator run every time an allocation is made. + */ + nr_reclaimed = reclaim_high(memcg, + in_retry ? SWAP_CLUSTER_MAX : nr_pages, + GFP_KERNEL); + /* * memory.high is breached and reclaim is unable to keep up. Throttle * allocators proactively to slow down excessive growth. 
@@ -2567,6 +2589,16 @@ void mem_cgroup_handle_over_high(void) goto out; /* + * If reclaim is making forward progress but we're still over + * memory.high, we want to encourage that rather than doing allocator + * throttling. + */ + if (nr_reclaimed || nr_retries--) { + in_retry = true; + goto retry_reclaim; + } + + /* * If we exit early, we're guaranteed to die (since * schedule_timeout_killable sets TASK_KILLABLE). This means we don't * need to account for any ill-begotten jiffies to pay them off later. _
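The retry decision added to mem_cgroup_handle_over_high() can be modelled in user space as follows. This is only a sketch: struct toy_memcg and both functions are hypothetical, TOY_SWAP_CLUSTER_MAX is assumed to match the kernel's SWAP_CLUSTER_MAX, and the "still over the limit" test is a simplification of the kernel's comparison of the computed penalty against the 10ms grace period. The point it illustrates is the short-circuit in `nr_reclaimed || nr_retries--`: a pass that reclaims something retries without spending a retry, and only passes with no progress count down.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RECLAIM_RETRIES	5	/* MEM_CGROUP_RECLAIM_RETRIES in the patch */
#define TOY_SWAP_CLUSTER_MAX	32UL	/* assumed to match SWAP_CLUSTER_MAX */

/* Toy cgroup state; field names are hypothetical. */
struct toy_memcg {
	unsigned long usage;		/* pages charged */
	unsigned long high;		/* memory.high in pages */
	unsigned long reclaimable;	/* pages that reclaim can still free */
};

/* Stand-in for reclaim_high(): frees up to nr pages, returns pages freed. */
static unsigned long toy_reclaim_high(struct toy_memcg *m, unsigned long nr)
{
	unsigned long n = nr < m->reclaimable ? nr : m->reclaimable;

	assert(m->reclaimable <= m->usage);
	m->reclaimable -= n;
	m->usage -= n;
	return n;
}

/*
 * Returns true if the task would be throttled, false if reclaim brought
 * the cgroup back under the limit. *attempts counts reclaim passes.
 */
static bool toy_handle_over_high(struct toy_memcg *m, unsigned long nr_pages,
				 unsigned int *attempts)
{
	int nr_retries = TOY_RECLAIM_RETRIES;
	bool in_retry = false;
	unsigned long nr_reclaimed;

	*attempts = 0;
	for (;;) {
		/* Full batch on the first pass, small steps on retries. */
		nr_reclaimed = toy_reclaim_high(m,
				in_retry ? TOY_SWAP_CLUSTER_MAX : nr_pages);
		(*attempts)++;

		/* Simplified exit: the kernel checks the penalty instead. */
		if (m->usage <= m->high)
			return false;

		/* Progress retries for free; no progress spends one retry. */
		if (nr_reclaimed || nr_retries--) {
			in_retry = true;
			continue;
		}
		return true;
	}
}
```

In this model a cgroup with plenty of reclaimable pages gets back under the limit after a few passes and is never throttled, while a cgroup with nothing to reclaim is throttled after one initial pass plus five retries.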
From: Chris Down <chris@chrisdown.name> Subject: mm, memcg: unify reclaim retry limits with page allocator Reclaim retries have been set to 5 since the beginning of time in commit 66e1707bc346 ("Memory controller: add per cgroup LRU and reclaim"). However, we now have a generally agreed-upon standard for page reclaim: MAX_RECLAIM_RETRIES (currently 16), added many years later in commit 0a0337e0d1d1 ("mm, oom: rework oom detection"). In the absence of a compelling reason to declare an OOM earlier in memcg context than page allocator context, it seems reasonable to supplant MEM_CGROUP_RECLAIM_RETRIES with MAX_RECLAIM_RETRIES, making the page allocator and memcg internals more similar in semantics when reclaim fails to produce results, avoiding premature OOMs or throttling. Link: http://lkml.kernel.org/r/da557856c9c7654308eaff4eedc1952a95e8df5f.1594640214.git.chris@chrisdown.name Signed-off-by: Chris Down <chris@chrisdown.name> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/memcontrol.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) --- a/mm/memcontrol.c~mm-memcg-unify-reclaim-retry-limits-with-page-allocator +++ a/mm/memcontrol.c @@ -73,9 +73,6 @@ EXPORT_SYMBOL(memory_cgrp_subsys); struct mem_cgroup *root_mem_cgroup __read_mostly; -/* The number of times we should retry reclaim failures before giving up. */ -#define MEM_CGROUP_RECLAIM_RETRIES 5 - /* Socket memory accounting disabled? 
*/ static bool cgroup_memory_nosocket; @@ -2538,7 +2535,7 @@ void mem_cgroup_handle_over_high(void) unsigned long pflags; unsigned long nr_reclaimed; unsigned int nr_pages = current->memcg_nr_pages_over_high; - int nr_retries = MEM_CGROUP_RECLAIM_RETRIES; + int nr_retries = MAX_RECLAIM_RETRIES; struct mem_cgroup *memcg; bool in_retry = false; @@ -2615,7 +2612,7 @@ static int try_charge(struct mem_cgroup unsigned int nr_pages) { unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages); - int nr_retries = MEM_CGROUP_RECLAIM_RETRIES; + int nr_retries = MAX_RECLAIM_RETRIES; struct mem_cgroup *mem_over_limit; struct page_counter *counter; unsigned long nr_reclaimed; @@ -2734,7 +2731,7 @@ retry: get_order(nr_pages * PAGE_SIZE)); switch (oom_status) { case OOM_SUCCESS: - nr_retries = MEM_CGROUP_RECLAIM_RETRIES; + nr_retries = MAX_RECLAIM_RETRIES; goto retry; case OOM_FAILED: goto force; @@ -3414,7 +3411,7 @@ static inline bool memcg_has_children(st */ static int mem_cgroup_force_empty(struct mem_cgroup *memcg) { - int nr_retries = MEM_CGROUP_RECLAIM_RETRIES; + int nr_retries = MAX_RECLAIM_RETRIES; /* we call try-to-free pages for make this cgroup empty */ lru_add_drain_all(); @@ -6235,7 +6232,7 @@ static ssize_t memory_high_write(struct char *buf, size_t nbytes, loff_t off) { struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); - unsigned int nr_retries = MEM_CGROUP_RECLAIM_RETRIES; + unsigned int nr_retries = MAX_RECLAIM_RETRIES; bool drained = false; unsigned long high; int err; @@ -6283,7 +6280,7 @@ static ssize_t memory_max_write(struct k char *buf, size_t nbytes, loff_t off) { struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); - unsigned int nr_reclaims = MEM_CGROUP_RECLAIM_RETRIES; + unsigned int nr_reclaims = MAX_RECLAIM_RETRIES; bool drained = false; unsigned long max; int err; _
From: Yafang Shao <laoar.shao@gmail.com> Subject: mm, memcg: avoid stale protection values when cgroup is above protection Patch series "mm, memcg: memory.{low,min} reclaim fix & cleanup", v4. This series contains a fix for an edge case in my earlier protection calculation patches, and a patch to make the area overall a little more robust to hopefully help avoid this in future. This patch (of 2): A cgroup can have both memory protection and a memory limit to isolate it from its siblings in both directions - for example, to prevent it from being shrunk below 2G under high pressure from outside, but also from growing beyond 4G under low pressure. Commit 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim") implemented proportional scan pressure so that multiple siblings in excess of their protection settings don't get reclaimed equally but instead in accordance with their unprotected portion. During limit reclaim, this proportionality shouldn't apply of course: there is no competition, all pressure is from within the cgroup and should be applied as such. Reclaim should operate at full efficiency. However, mem_cgroup_protected() never expected anybody to look at the effective protection values when it indicated that the cgroup is above its protection. As a result, a query during limit reclaim may return stale protection values that were calculated by a previous reclaim cycle in which the cgroup did have siblings. When this happens, reclaim is unnecessarily hesitant and potentially slow to meet the desired limit. In theory this could lead to premature OOM kills, although it's not obvious this has occurred in practice. Work around the problem by special casing reclaim roots in mem_cgroup_protection. These memcgs never participate in the reclaim protection because the reclaim is internal. We have to ignore effective protection values for reclaim roots because mem_cgroup_protected might be called from racing reclaim contexts with different roots.
The calculation relies on root -> leaf tree traversal, therefore top-down reclaim protection invariants should hold. The only exception is the reclaim root, which should have effective protection set to 0, but that would be problematic for the following setup:

 Let's have global and A's reclaim in parallel:
  |
  A (low=2G, usage = 3G, max = 3G, children_low_usage = 1.5G)
  |\
  | C (low = 1G, usage = 2.5G)
  B (low = 1G, usage = 0.5G)

 for A reclaim we have
 B.elow = B.low
 C.elow = C.low

 For the global reclaim
 A.elow = A.low
 B.elow = min(B.usage, B.low) because children_low_usage <= A.elow
 C.elow = min(C.usage, C.low)

 With the effective values resetting we have A reclaim
 A.elow = 0
 B.elow = B.low
 C.elow = C.low

 and global reclaim could see the above and then
 B.elow = C.elow = 0 because children_low_usage > A.elow

Which means that protected memcgs would get reclaimed.

In future we would like to make mem_cgroup_protected more robust against racing reclaim contexts, but that is likely a more complex solution than this simple workaround.
[hannes@cmpxchg.org - large part of the changelog] [mhocko@suse.com - workaround explanation] [chris@chrisdown.name - retitle] Link: http://lkml.kernel.org/r/cover.1594638158.git.chris@chrisdown.name Link: http://lkml.kernel.org/r/044fb8ecffd001c7905d27c0c2ad998069fdc396.1594638158.git.chris@chrisdown.name Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim") Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Chris Down <chris@chrisdown.name> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/memcontrol.h | 42 +++++++++++++++++++++++++++++++++-- mm/memcontrol.c | 8 ++++++ mm/vmscan.c | 3 +- 3 files changed, 50 insertions(+), 3 deletions(-) --- a/include/linux/memcontrol.h~mm-memcg-avoid-stale-protection-values-when-cgroup-is-above-protection +++ a/include/linux/memcontrol.h @@ -355,12 +355,49 @@ static inline bool mem_cgroup_disabled(v return !cgroup_subsys_enabled(memory_cgrp_subsys); } -static inline unsigned long mem_cgroup_protection(struct mem_cgroup *memcg, +static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root, + struct mem_cgroup *memcg, bool in_low_reclaim) { if (mem_cgroup_disabled()) return 0; + /* + * There is no reclaim protection applied to a targeted reclaim. + * We are special casing this specific case here because + * mem_cgroup_protected calculation is not robust enough to keep + * the protection invariant for calculated effective values for + * parallel reclaimers with different reclaim target. This is + * especially a problem for tail memcgs (as they have pages on LRU) + * which would want to have effective values 0 for targeted reclaim + * but a different value for external reclaim. 
+ * + * Example + * Let's have global and A's reclaim in parallel: + * | + * A (low=2G, usage = 3G, max = 3G, children_low_usage = 1.5G) + * |\ + * | C (low = 1G, usage = 2.5G) + * B (low = 1G, usage = 0.5G) + * + * For the global reclaim + * A.elow = A.low + * B.elow = min(B.usage, B.low) because children_low_usage <= A.elow + * C.elow = min(C.usage, C.low) + * + * With the effective values resetting we have A reclaim + * A.elow = 0 + * B.elow = B.low + * C.elow = C.low + * + * If the global reclaim races with A's reclaim then + * B.elow = C.elow = 0 because children_low_usage > A.elow) + * is possible and reclaiming B would be violating the protection. + * + */ + if (root == memcg) + return 0; + if (in_low_reclaim) return READ_ONCE(memcg->memory.emin); @@ -891,7 +928,8 @@ static inline void memcg_memory_event_mm { } -static inline unsigned long mem_cgroup_protection(struct mem_cgroup *memcg, +static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root, + struct mem_cgroup *memcg, bool in_low_reclaim) { return 0; --- a/mm/memcontrol.c~mm-memcg-avoid-stale-protection-values-when-cgroup-is-above-protection +++ a/mm/memcontrol.c @@ -6605,6 +6605,14 @@ enum mem_cgroup_protection mem_cgroup_pr if (!root) root = root_mem_cgroup; + + /* + * Effective values of the reclaim targets are ignored so they + * can be stale. Have a look at mem_cgroup_protection for more + * details. + * TODO: calculation should be more robust so that we do not need + * that special casing. + */ if (memcg == root) return MEMCG_PROT_NONE; --- a/mm/vmscan.c~mm-memcg-avoid-stale-protection-values-when-cgroup-is-above-protection +++ a/mm/vmscan.c @@ -2331,7 +2331,8 @@ out: unsigned long protection; lruvec_size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx); - protection = mem_cgroup_protection(memcg, + protection = mem_cgroup_protection(sc->target_mem_cgroup, + memcg, sc->memcg_low_reclaim); if (protection) { _
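The special case added to mem_cgroup_protection() above can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct toy_memcg` and `toy_protection()` are hypothetical stand-ins, not kernel APIs, and the mem_cgroup_disabled() check and READ_ONCE() accesses of the real function are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup: only effective values. */
struct toy_memcg {
	unsigned long emin;	/* effective memory.min, in pages */
	unsigned long elow;	/* effective memory.low, in pages */
};

/*
 * Model of the patched lookup: the reclaim target itself never gets
 * protection, so its possibly stale effective values are not consulted.
 * A NULL root models global reclaim (no target memcg).
 */
static unsigned long toy_protection(const struct toy_memcg *root,
				    const struct toy_memcg *memcg,
				    bool in_low_reclaim)
{
	assert(memcg != NULL);

	if (root == memcg)
		return 0;

	/* Once low protection is being ignored, only emin still counts. */
	if (in_low_reclaim)
		return memcg->emin;

	return memcg->emin > memcg->elow ? memcg->emin : memcg->elow;
}
```

In this model a cgroup that is the target of its own reclaim reports 0, while the same cgroup seen from global reclaim still reports its effective protection.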
From: Chris Down <chris@chrisdown.name> Subject: mm, memcg: decouple e{low,min} state mutations from protection checks mem_cgroup_protected currently is both used to set effective low and min and return a mem_cgroup_protection based on the result. As a user, this can be a little unexpected: it appears to be a simple predicate function, if not for the big warning in the comment above about the order in which it must be executed. This change makes it so that we separate the state mutations from the actual protection checks, which makes it more obvious where we need to be careful mutating internal state, and where we are simply checking and don't need to worry about that. [mhocko@suse.com - don't check protection on root memcgs] Link: http://lkml.kernel.org/r/ff3f915097fcee9f6d7041c084ef92d16aaeb56a.1594638158.git.chris@chrisdown.name Signed-off-by: Chris Down <chris@chrisdown.name> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/memcontrol.h | 53 +++++++++++++++++++++++++++-------- mm/memcontrol.c | 28 ++++-------------- mm/vmscan.c | 17 ++--------- 3 files changed, 53 insertions(+), 45 deletions(-) --- a/include/linux/memcontrol.h~mm-memcg-decouple-elowmin-state-mutations-from-protection-checks +++ a/include/linux/memcontrol.h @@ -47,12 +47,6 @@ enum memcg_memory_event { MEMCG_NR_MEMORY_EVENTS, }; -enum mem_cgroup_protection { - MEMCG_PROT_NONE, - MEMCG_PROT_LOW, - MEMCG_PROT_MIN, -}; - struct mem_cgroup_reclaim_cookie { pg_data_t *pgdat; unsigned int generation; @@ -405,8 +399,36 @@ static inline unsigned long mem_cgroup_p READ_ONCE(memcg->memory.elow)); } -enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root, - struct mem_cgroup *memcg); +void mem_cgroup_calculate_protection(struct mem_cgroup *root, + struct 
mem_cgroup *memcg); + +static inline bool mem_cgroup_supports_protection(struct mem_cgroup *memcg) +{ + /* + * The root memcg doesn't account charges, and doesn't support + * protection. + */ + return !mem_cgroup_disabled() && !mem_cgroup_is_root(memcg); + +} + +static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg) +{ + if (!mem_cgroup_supports_protection(memcg)) + return false; + + return READ_ONCE(memcg->memory.elow) >= + page_counter_read(&memcg->memory); +} + +static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg) +{ + if (!mem_cgroup_supports_protection(memcg)) + return false; + + return READ_ONCE(memcg->memory.emin) >= + page_counter_read(&memcg->memory); +} int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask); @@ -935,10 +957,19 @@ static inline unsigned long mem_cgroup_p return 0; } -static inline enum mem_cgroup_protection mem_cgroup_protected( - struct mem_cgroup *root, struct mem_cgroup *memcg) +static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root, + struct mem_cgroup *memcg) +{ +} + +static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg) +{ + return false; +} + +static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg) { - return MEMCG_PROT_NONE; + return false; } static inline int mem_cgroup_charge(struct page *page, struct mm_struct *mm, --- a/mm/memcontrol.c~mm-memcg-decouple-elowmin-state-mutations-from-protection-checks +++ a/mm/memcontrol.c @@ -6587,21 +6587,15 @@ static unsigned long effective_protectio * * WARNING: This function is not stateless! It can only be used as part * of a top-down tree iteration, not for isolated queries. - * - * Returns one of the following: - * MEMCG_PROT_NONE: cgroup memory is not protected - * MEMCG_PROT_LOW: cgroup memory is protected as long there is - * an unprotected supply of reclaimable memory from other cgroups. 
- * MEMCG_PROT_MIN: cgroup memory is protected */ -enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root, - struct mem_cgroup *memcg) +void mem_cgroup_calculate_protection(struct mem_cgroup *root, + struct mem_cgroup *memcg) { unsigned long usage, parent_usage; struct mem_cgroup *parent; if (mem_cgroup_disabled()) - return MEMCG_PROT_NONE; + return; if (!root) root = root_mem_cgroup; @@ -6614,21 +6608,21 @@ enum mem_cgroup_protection mem_cgroup_pr * that special casing. */ if (memcg == root) - return MEMCG_PROT_NONE; + return; usage = page_counter_read(&memcg->memory); if (!usage) - return MEMCG_PROT_NONE; + return; parent = parent_mem_cgroup(memcg); /* No parent means a non-hierarchical mode on v1 memcg */ if (!parent) - return MEMCG_PROT_NONE; + return; if (parent == root) { memcg->memory.emin = READ_ONCE(memcg->memory.min); memcg->memory.elow = READ_ONCE(memcg->memory.low); - goto out; + return; } parent_usage = page_counter_read(&parent->memory); @@ -6642,14 +6636,6 @@ enum mem_cgroup_protection mem_cgroup_pr READ_ONCE(memcg->memory.low), READ_ONCE(parent->memory.elow), atomic_long_read(&parent->memory.children_low_usage))); - -out: - if (usage <= memcg->memory.emin) - return MEMCG_PROT_MIN; - else if (usage <= memcg->memory.elow) - return MEMCG_PROT_LOW; - else - return MEMCG_PROT_NONE; } /** --- a/mm/vmscan.c~mm-memcg-decouple-elowmin-state-mutations-from-protection-checks +++ a/mm/vmscan.c @@ -2620,14 +2620,15 @@ static void shrink_node_memcgs(pg_data_t unsigned long reclaimed; unsigned long scanned; - switch (mem_cgroup_protected(target_memcg, memcg)) { - case MEMCG_PROT_MIN: + mem_cgroup_calculate_protection(target_memcg, memcg); + + if (mem_cgroup_below_min(memcg)) { /* * Hard protection. * If there is no reclaimable memory, OOM. */ continue; - case MEMCG_PROT_LOW: + } else if (mem_cgroup_below_low(memcg)) { /* * Soft protection. 
* Respect the protection only as long as @@ -2639,16 +2640,6 @@ static void shrink_node_memcgs(pg_data_t continue; } memcg_memory_event(memcg, MEMCG_LOW); - break; - case MEMCG_PROT_NONE: - /* - * All protection thresholds breached. We may - * still choose to vary the scan pressure - * applied based on by how much the cgroup in - * question has exceeded its protection - * thresholds (see get_scan_count). - */ - break; } reclaimed = sc->nr_reclaimed; _
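The split between the mutating step and the read-only predicates can be sketched in userspace as follows. This is a minimal model, not the kernel code: all `toy_` names are hypothetical, and the proportional distribution done by effective_protection() for deeper levels is replaced by a plain clamp to the parent's effective value, which is a simplification made only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup. */
struct toy_memcg {
	struct toy_memcg *parent;	/* NULL for the root */
	unsigned long usage;
	unsigned long min, low;		/* configured protection */
	unsigned long emin, elow;	/* effective protection */
};

static unsigned long toy_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* The only function that mutates state; call it top-down. */
static void toy_calculate_protection(const struct toy_memcg *root,
				     struct toy_memcg *memcg)
{
	assert(memcg != NULL);

	if (memcg == root || !memcg->usage || !memcg->parent)
		return;

	if (memcg->parent == root) {
		memcg->emin = memcg->min;
		memcg->elow = memcg->low;
		return;
	}

	/* Simplification: the kernel distributes protection proportionally. */
	memcg->emin = toy_min(memcg->min, memcg->parent->emin);
	memcg->elow = toy_min(memcg->low, memcg->parent->elow);
}

/* Pure predicates: they read state and never change it. */
static bool toy_below_low(const struct toy_memcg *memcg)
{
	if (!memcg->parent)	/* the root supports no protection */
		return false;
	return memcg->elow >= memcg->usage;
}

static bool toy_below_min(const struct toy_memcg *memcg)
{
	if (!memcg->parent)
		return false;
	return memcg->emin >= memcg->usage;
}
```

A caller in this model would run toy_calculate_protection() once per cgroup during the tree walk and then query toy_below_min() and toy_below_low() as often as needed, which mirrors the intent of the patch above.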
From: Yafang Shao <laoar.shao@gmail.com>
Subject: memcg, oom: check memcg margin for parallel oom

Memcg oom killer invocation is synchronized by the global oom_lock, and
tasks sleep on the lock while somebody is selecting the victim, or
potentially race with the oom_reaper while it is releasing the victim's
memory. This can result in a pointless oom killer invocation because a
waiter might be racing with the oom_reaper:

        P1              oom_reaper              P2
                        oom_reap_task           mutex_lock(oom_lock)
                                                out_of_memory # no victim because we have one already
                        __oom_reap_task_mm      mutex_unlock(oom_lock)
 mutex_lock(oom_lock)
                        set MMF_OOM_SKIP
 select_bad_process
 # finds a new victim

The page allocator prevents this race by trying to allocate after the lock
has been acquired (in __alloc_pages_may_oom), which acts as a last minute
check. Moreover, the page allocator doesn't block on the oom_lock and
simply retries the whole reclaim process.

The memcg oom killer should do the last minute check as well. Call
mem_cgroup_margin to do that. Trylock on the oom_lock could be done as
well, but this doesn't seem to be necessary at this stage.
[mhocko@kernel.org: commit log] Link: http://lkml.kernel.org/r/1594735034-19190-1-git-send-email-laoar.shao@gmail.com Suggested-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Chris Down <chris@chrisdown.name> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/memcontrol.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) --- a/mm/memcontrol.c~memcg-oom-check-memcg-margin-for-parallel-oom +++ a/mm/memcontrol.c @@ -1663,15 +1663,21 @@ static bool mem_cgroup_out_of_memory(str .gfp_mask = gfp_mask, .order = order, }; - bool ret; + bool ret = true; if (mutex_lock_killable(&oom_lock)) return true; + + if (mem_cgroup_margin(memcg) >= (1 << order)) + goto unlock; + /* * A few threads which were not waiting at mutex_lock_killable() can * fail to bail out. Therefore, check again after holding oom_lock. */ ret = should_force_charge() || out_of_memory(&oc); + +unlock: mutex_unlock(&oom_lock); return ret; } _
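The last minute check added above can be sketched as a userspace model. This is an illustration only: the `toy_` names are hypothetical, the oom_lock is represented by comments rather than a real mutex, and "killing a victim" is modelled by halving the usage, which is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg with a hard limit. */
struct toy_memcg {
	unsigned long max;		/* limit, in pages */
	unsigned long usage;		/* charged pages */
	unsigned int oom_kills;		/* victims selected so far */
};

/* Pages that can still be charged before hitting the limit. */
static unsigned long toy_margin(const struct toy_memcg *memcg)
{
	return memcg->max > memcg->usage ? memcg->max - memcg->usage : 0;
}

/* Assumption: a kill frees half of the charged memory. */
static bool toy_kill_victim(struct toy_memcg *memcg)
{
	if (!memcg->usage)
		return false;	/* nothing left to kill */
	memcg->usage /= 2;
	memcg->oom_kills++;
	return true;
}

/* Returns true when the caller should retry the charge. */
static bool toy_out_of_memory(struct toy_memcg *memcg, unsigned int order)
{
	bool ret = true;

	assert(memcg != NULL);

	/* oom_lock would be taken here; we may have slept on it. */

	/* Last minute check: did somebody free enough memory meanwhile? */
	if (toy_margin(memcg) >= (1UL << order))
		goto unlock;

	ret = toy_kill_victim(memcg);

unlock:
	/* oom_lock would be released here. */
	return ret;
}
```

In this model a second caller that arrives after memory has been freed returns without selecting another victim, which is the behavior the patch aims for.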
From: Johannes Weiner <hannes@cmpxchg.org> Subject: mm: memcontrol: restore proper dirty throttling when memory.high changes Commit 8c8c383c04f6 ("mm: memcontrol: try harder to set a new memory.high") inadvertently removed a callback to recalculate the writeback cache size in light of a newly configured memory.high limit. Without letting the writeback cache know about a potentially heavily reduced limit, it may permit too many dirty pages, which can cause unnecessary reclaim latencies or even avoidable OOM situations. This was spotted while reading the code, it hasn't knowingly caused any problems in practice so far. Link: http://lkml.kernel.org/r/20200728135210.379885-1-hannes@cmpxchg.org Fixes: 8c8c383c04f6 ("mm: memcontrol: try harder to set a new memory.high") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Chris Down <chris@chrisdown.name> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/memcontrol.c | 2 ++ 1 file changed, 2 insertions(+) --- a/mm/memcontrol.c~mm-memcontrol-restore-proper-dirty-throttling-when-memoryhigh-changes +++ a/mm/memcontrol.c @@ -6273,6 +6273,8 @@ static ssize_t memory_high_write(struct page_counter_set_high(&memcg->memory, high); + memcg_wb_domain_size_changed(memcg); + return nbytes; } _
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: don't count limit-setting reclaim as memory pressure

When an outside process lowers one of the memory limits of a cgroup (or
uses the force_empty knob in cgroup1), direct reclaim is performed in the
context of the write(), in order to directly enforce the new limit and
have it met by the time the write() returns.

Currently, this reclaim activity is accounted as memory pressure in the
cgroup that the writer(!) belongs to. This is unexpected. It specifically
causes problems for senpai (https://github.com/facebookincubator/senpai),
which is an agent that routinely adjusts the memory limits and performs
associated reclaim work in tens or even hundreds of cgroups running on the
host. The cgroup that senpai itself runs in will report elevated levels of
memory pressure, even though it is under no memory shortage or any other
sort of distress.

Move the psi annotation from the central cgroup reclaim function to
callsites in the allocation context, and thereby no longer count any
limit-setting reclaim as memory pressure. If the newly set limit pushes
the workload inside the cgroup into direct reclaim, that of course will
continue to count as memory pressure.
Link: http://lkml.kernel.org/r/20200728135210.379885-2-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/memcontrol.c | 11 ++++++++++- mm/vmscan.c | 6 ------ 2 files changed, 10 insertions(+), 7 deletions(-) --- a/mm/memcontrol.c~mm-memcontrol-dont-count-limit-setting-reclaim-as-memory-pressure +++ a/mm/memcontrol.c @@ -2374,12 +2374,18 @@ static unsigned long reclaim_high(struct unsigned long nr_reclaimed = 0; do { + unsigned long pflags; + if (page_counter_read(&memcg->memory) <= READ_ONCE(memcg->memory.high)) continue; + memcg_memory_event(memcg, MEMCG_HIGH); + + psi_memstall_enter(&pflags); nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, true); + psi_memstall_leave(&pflags); } while ((memcg = parent_mem_cgroup(memcg)) && !mem_cgroup_is_root(memcg)); @@ -2621,10 +2627,11 @@ static int try_charge(struct mem_cgroup int nr_retries = MAX_RECLAIM_RETRIES; struct mem_cgroup *mem_over_limit; struct page_counter *counter; + enum oom_status oom_status; unsigned long nr_reclaimed; bool may_swap = true; bool drained = false; - enum oom_status oom_status; + unsigned long pflags; if (mem_cgroup_is_root(memcg)) return 0; @@ -2684,8 +2691,10 @@ retry: memcg_memory_event(mem_over_limit, MEMCG_MAX); + psi_memstall_enter(&pflags); nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages, gfp_mask, may_swap); + psi_memstall_leave(&pflags); if (mem_cgroup_margin(mem_over_limit) >= nr_pages) goto retry; --- a/mm/vmscan.c~mm-memcontrol-dont-count-limit-setting-reclaim-as-memory-pressure +++ a/mm/vmscan.c @@ -3310,7 +3310,6 @@ unsigned long try_to_free_mem_cgroup_pag bool may_swap) { unsigned long nr_reclaimed; - unsigned long pflags; unsigned int noreclaim_flag; struct scan_control sc = { 
.nr_to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX), @@ -3331,17 +3330,12 @@ unsigned long try_to_free_mem_cgroup_pag struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask); set_task_reclaim_state(current, &sc.reclaim_state); - trace_mm_vmscan_memcg_reclaim_begin(0, sc.gfp_mask); - - psi_memstall_enter(&pflags); noreclaim_flag = memalloc_noreclaim_save(); nr_reclaimed = do_try_to_free_pages(zonelist, &sc); memalloc_noreclaim_restore(noreclaim_flag); - psi_memstall_leave(&pflags); - trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed); set_task_reclaim_state(current, NULL); _
From: Michal Koutný <mkoutny@suse.com>
Subject: mm/page_counter.c: fix protection usage propagation

When a workload runs in cgroups that aren't directly below the root cgroup
and their parent specifies reclaim protection, the protection may end up
ineffective.

The reason is that propagate_protected_usage() is not called all the way
up the hierarchy. All the protected usage is incorrectly accumulated in
the workload's parent. This means that siblings_low_usage is overestimated
and effective protection underestimated. Even though it is a transitional
phenomenon (the uncharge path does correct propagation and fixes the wrong
children_low_usage), it can undermine the intended protection
unexpectedly.

We noticed this problem when seeing swap out in a descendant of a
protected memcg (intermediate node) while the parent was comfortably under
its protection limit and the memory pressure was external to that
hierarchy. Michal has pinned this down to the wrong siblings_low_usage,
which led to the unwanted reclaim.

The fix is simply to update children_low_usage in the respective ancestors
in the charging path as well.
Link: http://lkml.kernel.org/r/20200803153231.15477-1-mhocko@kernel.org Fixes: 230671533d64 ("mm: memory.low hierarchical behavior") Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> [4.18+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/page_counter.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) --- a/mm/page_counter.c~mm-fix-protection-usage-propagation +++ a/mm/page_counter.c @@ -72,7 +72,7 @@ void page_counter_charge(struct page_cou long new; new = atomic_long_add_return(nr_pages, &c->usage); - propagate_protected_usage(counter, new); + propagate_protected_usage(c, new); /* * This is indeed racy, but we can live with some * inaccuracy in the watermark. @@ -116,7 +116,7 @@ bool page_counter_try_charge(struct page new = atomic_long_add_return(nr_pages, &c->usage); if (new > c->max) { atomic_long_sub(nr_pages, &c->usage); - propagate_protected_usage(counter, new); + propagate_protected_usage(c, new); /* * This is racy, but we can live with some * inaccuracy in the failcnt. @@ -125,7 +125,7 @@ bool page_counter_try_charge(struct page *fail = c; goto failed; } - propagate_protected_usage(counter, new); + propagate_protected_usage(c, new); /* * Just like with failcnt, we can live with some * inaccuracy in the watermark. _
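The effect of passing `counter` instead of `c` inside the ancestor walk can be reproduced with a small userspace model. This is a sketch under assumptions: the `toy_` names are hypothetical, only memory.low is modelled (the kernel handles memory.min the same way), and plain longs replace the atomics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page_counter, low protection only. */
struct toy_counter {
	struct toy_counter *parent;
	long usage;
	long low;			/* configured memory.low */
	long low_usage;			/* protected usage last propagated */
	long children_low_usage;	/* sum of the children's low_usage */
};

/* Publish c's protected usage to its parent. */
static void toy_propagate(struct toy_counter *c, long usage)
{
	long protected, delta;

	if (!c->parent)
		return;

	if (c->low || c->low_usage) {
		protected = usage < c->low ? usage : c->low;
		delta = protected - c->low_usage;
		c->low_usage = protected;
		if (delta)
			c->parent->children_low_usage += delta;
	}
}

/*
 * Charge nr_pages to counter and all of its ancestors. With buggy set,
 * the pre-fix behavior is modelled: the leaf is propagated at every
 * level, so the ancestors' own low_usage is never published.
 */
static void toy_charge(struct toy_counter *counter, long nr_pages, bool buggy)
{
	struct toy_counter *c;

	assert(counter != NULL);

	for (c = counter; c; c = c->parent) {
		c->usage += nr_pages;
		toy_propagate(buggy ? counter : c, c->usage);
	}
}
```

With a hierarchy root -> parent (low = 100) -> workload (low = 50) and a 30 page charge, the fixed walk leaves root.children_low_usage at 30, while the buggy walk leaves it at 0, so the parent's protection is invisible at the level where an external reclaimer would look for it.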
From: Ralph Campbell <rcampbell@nvidia.com> Subject: mm: remove redundant check non_swap_entry() In zap_pte_range(), the check for non_swap_entry() and is_device_private_entry() is unnecessary since the latter is sufficient to determine if the page is a device private page. Remove the test for non_swap_entry() to simplify the code and for clarity. Link: http://lkml.kernel.org/r/20200615175405.4613-1-rcampbell@nvidia.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/memory.c~mm-remove-redundant-check-non_swap_entry +++ a/mm/memory.c @@ -1098,7 +1098,7 @@ again: } entry = pte_to_swp_entry(ptent); - if (non_swap_entry(entry) && is_device_private_entry(entry)) { + if (is_device_private_entry(entry)) { struct page *page = device_private_entry_to_page(entry); if (unlikely(details && details->check_mapping)) { _
From: Alex Zhang <zhangalex@google.com> Subject: mm/memory.c: make remap_pfn_range() reject unaligned addr This function implicitly assumes that the addr passed in is page aligned. A non page aligned addr could ultimately cause a kernel bug in remap_pte_range as the exit condition in the logic loop may never be satisfied. This patch documents the need for the requirement, as well as explicitly adds a check for it. Link: http://lkml.kernel.org/r/20200617233512.177519-1-zhangalex@google.com Signed-off-by: Alex Zhang <zhangalex@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/memory.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) --- a/mm/memory.c~mm-memoryc-make-remap_pfn_range-reject-unaligned-addr +++ a/mm/memory.c @@ -2082,7 +2082,7 @@ static inline int remap_p4d_range(struct /** * remap_pfn_range - remap kernel memory to userspace * @vma: user vma to map to - * @addr: target user address to start at + * @addr: target page aligned user address to start at * @pfn: page frame number of kernel physical memory address * @size: size of mapping area * @prot: page protection flags for this mapping @@ -2101,6 +2101,9 @@ int remap_pfn_range(struct vm_area_struc unsigned long remap_pfn = pfn; int err; + if (WARN_ON_ONCE(!PAGE_ALIGNED(addr))) + return -EINVAL; + /* * Physically remapped pages are special. Tell the * rest of the world about it: _
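The alignment check added above can be illustrated with a userspace sketch. The names `TOY_PAGE_SIZE`, `TOY_PAGE_ALIGNED` and `toy_remap_range()` are hypothetical, a 4096 byte page is assumed, and the page table walk is reduced to counting pages.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_PAGE_SIZE		4096UL	/* assumed page size */
#define TOY_PAGE_ALIGNED(addr)	(((addr) & (TOY_PAGE_SIZE - 1)) == 0)
#define TOY_PAGE_ALIGN(size)	(((size) + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1))

/*
 * Walk [addr, addr + size) one page at a time and count the pages.
 * The loop exits on addr == end, a pattern that only terminates
 * reliably when addr and end share the same page alignment, hence
 * the up-front rejection of unaligned start addresses.
 */
static int toy_remap_range(unsigned long addr, unsigned long size,
			   unsigned long *pages_mapped)
{
	unsigned long end;

	assert(pages_mapped != NULL);

	if (!TOY_PAGE_ALIGNED(addr))
		return -EINVAL;

	end = addr + TOY_PAGE_ALIGN(size);
	*pages_mapped = 0;

	for (; addr != end; addr += TOY_PAGE_SIZE)
		(*pages_mapped)++;

	return 0;
}
```

In the kernel the inner loops compare against table boundaries that are always page aligned, so an unaligned start address would step past the boundary without ever matching it; rejecting such addresses early avoids that case.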
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: mm: remove unneeded includes of <asm/pgalloc.h>

Patch series "mm: cleanup usage of <asm/pgalloc.h>"

Most architectures have very similar versions of pXd_alloc_one() and
pXd_free_one() for intermediate levels of page table. These patches add
generic versions of these functions in <asm-generic/pgalloc.h> and enable
use of the generic functions where appropriate.

In addition, functions declared and defined in <asm/pgalloc.h> headers are
used mostly by core mm and early mm initialization in arch and there is no
actual reason to have the <asm/pgalloc.h> included all over the place.
The first patch in this series removes unneeded includes of
<asm/pgalloc.h>.

In the end it didn't work out as neatly as I hoped and moving
pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
unnecessary changes to arches that have custom page table allocations, so
I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
to mm/.

This patch (of 8):

In most cases <asm/pgalloc.h> header is required only for allocations of
page table memory. Most of the .c files that include that header do not
use symbols declared in <asm/pgalloc.h> and do not require that header.

As for the other header files that used to include <asm/pgalloc.h>, it is
possible to move that include into the .c file that actually uses symbols
from <asm/pgalloc.h> and drop the include from the header file.

The process was somewhat automated using

	sed -i -E '/[<"]asm\/pgalloc\.h/d' \
		$(grep -L -w -f /tmp/xx \
			$(git grep -E -l '[<"]asm/pgalloc\.h'))

where /tmp/xx contains all the symbols defined in
arch/*/include/asm/pgalloc.h.
[rppt@linux.ibm.com: fix powerpc warning] Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Joerg Roedel <joro@8bytes.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/alpha/include/asm/tlbflush.h | 1 - arch/alpha/kernel/core_irongate.c | 1 - arch/alpha/kernel/core_marvel.c | 1 - arch/alpha/kernel/core_titan.c | 1 - arch/alpha/kernel/machvec_impl.h | 2 -- arch/alpha/kernel/smp.c | 1 - arch/alpha/mm/numa.c | 1 - arch/arc/mm/fault.c | 1 - arch/arc/mm/init.c | 1 - arch/arm/include/asm/tlb.h | 1 - arch/arm/kernel/machine_kexec.c | 1 - arch/arm/kernel/smp.c | 1 - arch/arm/kernel/suspend.c | 1 - arch/arm/mach-omap2/omap-mpuss-lowpower.c | 1 - arch/arm/mm/hugetlbpage.c | 1 - arch/arm/mm/mmu.c | 1 + arch/arm64/kernel/smp.c | 1 - arch/arm64/mm/hugetlbpage.c | 1 - arch/arm64/mm/ioremap.c | 1 - arch/arm64/mm/mmu.c | 1 + arch/csky/kernel/smp.c | 1 - arch/ia64/include/asm/tlb.h | 1 - arch/ia64/kernel/process.c | 1 - arch/ia64/kernel/smp.c | 1 - arch/ia64/kernel/smpboot.c | 1 - arch/ia64/mm/contig.c | 1 - arch/ia64/mm/discontig.c | 1 - arch/ia64/mm/hugetlbpage.c | 1 - arch/ia64/mm/tlb.c | 1 - arch/m68k/include/asm/mmu_context.h | 2 +- arch/m68k/kernel/dma.c | 2 +- arch/m68k/kernel/traps.c | 3 +-- arch/m68k/mm/cache.c | 2 
+- arch/m68k/mm/fault.c | 1 - arch/m68k/mm/kmap.c | 2 +- arch/m68k/mm/mcfmmu.c | 1 + arch/m68k/mm/memory.c | 1 - arch/m68k/sun3x/dvma.c | 2 +- arch/microblaze/include/asm/tlbflush.h | 1 - arch/microblaze/kernel/process.c | 1 - arch/microblaze/kernel/signal.c | 1 - arch/mips/sgi-ip32/ip32-memory.c | 1 - arch/openrisc/include/asm/tlbflush.h | 1 - arch/openrisc/kernel/or32_ksyms.c | 1 - arch/parisc/include/asm/mmu_context.h | 1 - arch/parisc/kernel/cache.c | 1 - arch/parisc/kernel/pci-dma.c | 1 - arch/parisc/kernel/process.c | 1 - arch/parisc/kernel/signal.c | 1 - arch/parisc/kernel/smp.c | 1 - arch/parisc/mm/hugetlbpage.c | 1 - arch/parisc/mm/ioremap.c | 2 +- arch/powerpc/include/asm/tlb.h | 1 - arch/powerpc/mm/book3s64/hash_hugetlbpage.c | 1 - arch/powerpc/mm/book3s64/hash_pgtable.c | 1 - arch/powerpc/mm/book3s64/hash_tlb.c | 1 - arch/powerpc/mm/book3s64/radix_hugetlbpage.c | 1 - arch/powerpc/mm/init_32.c | 1 - arch/powerpc/mm/kasan/8xx.c | 1 - arch/powerpc/mm/kasan/book3s_32.c | 1 - arch/powerpc/mm/mem.c | 1 - arch/powerpc/mm/nohash/40x.c | 1 - arch/powerpc/mm/nohash/8xx.c | 1 - arch/powerpc/mm/nohash/fsl_booke.c | 1 - arch/powerpc/mm/nohash/kaslr_booke.c | 1 - arch/powerpc/mm/nohash/tlb.c | 1 + arch/powerpc/mm/pgtable.c | 1 - arch/powerpc/mm/pgtable_64.c | 1 - arch/powerpc/mm/ptdump/hashpagetable.c | 2 +- arch/powerpc/mm/ptdump/ptdump.c | 1 - arch/powerpc/platforms/pseries/cmm.c | 1 - arch/riscv/mm/fault.c | 1 - arch/s390/include/asm/tlb.h | 1 - arch/s390/include/asm/tlbflush.h | 1 - arch/s390/kernel/machine_kexec.c | 1 - arch/s390/kernel/ptrace.c | 1 - arch/s390/kvm/diag.c | 1 - arch/s390/kvm/priv.c | 1 - arch/s390/kvm/pv.c | 1 - arch/s390/mm/cmm.c | 1 - arch/s390/mm/mmap.c | 1 - arch/s390/mm/pgtable.c | 1 - arch/sh/kernel/idle.c | 1 - arch/sh/kernel/machine_kexec.c | 1 - arch/sh/mm/cache-sh3.c | 1 - arch/sh/mm/cache-sh7705.c | 1 - arch/sh/mm/hugetlbpage.c | 1 - arch/sh/mm/init.c | 1 + arch/sh/mm/ioremap_fixed.c | 1 - arch/sh/mm/tlb-sh3.c | 1 - 
arch/sparc/include/asm/ide.h | 1 - arch/sparc/include/asm/tlb_64.h | 1 - arch/sparc/kernel/leon_smp.c | 1 - arch/sparc/kernel/process_32.c | 1 - arch/sparc/kernel/signal_32.c | 1 - arch/sparc/kernel/smp_32.c | 1 - arch/sparc/kernel/smp_64.c | 1 + arch/sparc/kernel/sun4m_irq.c | 1 - arch/sparc/mm/highmem.c | 1 - arch/sparc/mm/io-unit.c | 1 - arch/sparc/mm/iommu.c | 1 - arch/sparc/mm/tlb.c | 1 - arch/x86/ia32/ia32_aout.c | 1 - arch/x86/include/asm/mmu_context.h | 1 - arch/x86/kernel/alternative.c | 1 + arch/x86/kernel/apic/apic.c | 1 - arch/x86/kernel/mpparse.c | 1 - arch/x86/kernel/traps.c | 1 - arch/x86/mm/fault.c | 1 - arch/x86/mm/hugetlbpage.c | 1 - arch/x86/mm/kaslr.c | 1 - arch/x86/mm/pgtable_32.c | 1 - arch/x86/mm/pti.c | 1 - arch/x86/platform/uv/bios_uv.c | 1 + arch/xtensa/kernel/xtensa_ksyms.c | 1 - arch/xtensa/mm/cache.c | 1 - arch/xtensa/mm/fault.c | 1 - drivers/block/xen-blkback/common.h | 1 - drivers/iommu/ipmmu-vmsa.c | 1 - drivers/xen/balloon.c | 1 - drivers/xen/privcmd.c | 1 - fs/binfmt_elf_fdpic.c | 1 - include/asm-generic/tlb.h | 1 - mm/hugetlb.c | 1 + mm/sparse.c | 1 - 125 files changed, 17 insertions(+), 118 deletions(-) --- a/arch/alpha/include/asm/tlbflush.h~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/alpha/include/asm/tlbflush.h @@ -5,7 +5,6 @@ #include <linux/mm.h> #include <linux/sched.h> #include <asm/compiler.h> -#include <asm/pgalloc.h> #ifndef __EXTERN_INLINE #define __EXTERN_INLINE extern inline --- a/arch/alpha/kernel/core_irongate.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/alpha/kernel/core_irongate.c @@ -302,7 +302,6 @@ irongate_init_arch(void) #include <linux/agp_backend.h> #include <linux/agpgart.h> #include <linux/export.h> -#include <asm/pgalloc.h> #define GET_PAGE_DIR_OFF(addr) (addr >> 22) #define GET_PAGE_DIR_IDX(addr) (GET_PAGE_DIR_OFF(addr)) --- a/arch/alpha/kernel/core_marvel.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/alpha/kernel/core_marvel.c @@ -23,7 +23,6 @@ #include <asm/ptrace.h> 
#include <asm/smp.h> #include <asm/gct.h> -#include <asm/pgalloc.h> #include <asm/tlbflush.h> #include <asm/vga.h> --- a/arch/alpha/kernel/core_titan.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/alpha/kernel/core_titan.c @@ -20,7 +20,6 @@ #include <asm/ptrace.h> #include <asm/smp.h> -#include <asm/pgalloc.h> #include <asm/tlbflush.h> #include <asm/vga.h> --- a/arch/alpha/kernel/machvec_impl.h~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/alpha/kernel/machvec_impl.h @@ -7,8 +7,6 @@ * This file has goodies to help simplify instantiation of machine vectors. */ -#include <asm/pgalloc.h> - /* Whee. These systems don't have an HAE: IRONGATE, MARVEL, POLARIS, TSUNAMI, TITAN, WILDFIRE Fix things up for the GENERIC kernel by defining the HAE address --- a/arch/alpha/kernel/smp.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/alpha/kernel/smp.c @@ -36,7 +36,6 @@ #include <asm/io.h> #include <asm/irq.h> -#include <asm/pgalloc.h> #include <asm/mmu_context.h> #include <asm/tlbflush.h> --- a/arch/alpha/mm/numa.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/alpha/mm/numa.c @@ -17,7 +17,6 @@ #include <linux/module.h> #include <asm/hwrpb.h> -#include <asm/pgalloc.h> #include <asm/sections.h> pg_data_t node_data[MAX_NUMNODES]; --- a/arch/arc/mm/fault.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/arc/mm/fault.c @@ -13,7 +13,6 @@ #include <linux/kdebug.h> #include <linux/perf_event.h> #include <linux/mm_types.h> -#include <asm/pgalloc.h> #include <asm/mmu.h> /* --- a/arch/arc/mm/init.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/arc/mm/init.c @@ -14,7 +14,6 @@ #include <linux/module.h> #include <linux/highmem.h> #include <asm/page.h> -#include <asm/pgalloc.h> #include <asm/sections.h> #include <asm/arcregs.h> --- a/arch/arm64/kernel/smp.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/arm64/kernel/smp.c @@ -43,7 +43,6 @@ #include <asm/kvm_mmu.h> #include <asm/mmu_context.h> #include <asm/numa.h> -#include 
<asm/pgalloc.h> #include <asm/processor.h> #include <asm/smp_plat.h> #include <asm/sections.h> --- a/arch/arm64/mm/hugetlbpage.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/arm64/mm/hugetlbpage.c @@ -17,7 +17,6 @@ #include <asm/mman.h> #include <asm/tlb.h> #include <asm/tlbflush.h> -#include <asm/pgalloc.h> /* * HugeTLB Support Matrix --- a/arch/arm64/mm/ioremap.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/arm64/mm/ioremap.c @@ -16,7 +16,6 @@ #include <asm/fixmap.h> #include <asm/tlbflush.h> -#include <asm/pgalloc.h> static void __iomem *__ioremap_caller(phys_addr_t phys_addr, size_t size, pgprot_t prot, void *caller) --- a/arch/arm64/mm/mmu.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/arm64/mm/mmu.c @@ -35,6 +35,7 @@ #include <asm/mmu_context.h> #include <asm/ptdump.h> #include <asm/tlbflush.h> +#include <asm/pgalloc.h> #define NO_BLOCK_MAPPINGS BIT(0) #define NO_CONT_MAPPINGS BIT(1) --- a/arch/arm/include/asm/tlb.h~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/arm/include/asm/tlb.h @@ -27,7 +27,6 @@ #else /* !CONFIG_MMU */ #include <linux/swap.h> -#include <asm/pgalloc.h> #include <asm/tlbflush.h> static inline void __tlb_remove_table(void *_table) --- a/arch/arm/kernel/machine_kexec.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/arm/kernel/machine_kexec.c @@ -11,7 +11,6 @@ #include <linux/irq.h> #include <linux/memblock.h> #include <linux/of_fdt.h> -#include <asm/pgalloc.h> #include <asm/mmu_context.h> #include <asm/cacheflush.h> #include <asm/fncpy.h> --- a/arch/arm/kernel/smp.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/arm/kernel/smp.c @@ -37,7 +37,6 @@ #include <asm/idmap.h> #include <asm/topology.h> #include <asm/mmu_context.h> -#include <asm/pgalloc.h> #include <asm/procinfo.h> #include <asm/processor.h> #include <asm/sections.h> --- a/arch/arm/kernel/suspend.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/arm/kernel/suspend.c @@ -7,7 +7,6 @@ #include <asm/bugs.h> #include 
<asm/cacheflush.h> #include <asm/idmap.h> -#include <asm/pgalloc.h> #include <asm/memory.h> #include <asm/smp_plat.h> #include <asm/suspend.h> --- a/arch/arm/mach-omap2/omap-mpuss-lowpower.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/arm/mach-omap2/omap-mpuss-lowpower.c @@ -42,7 +42,6 @@ #include <asm/cacheflush.h> #include <asm/tlbflush.h> #include <asm/smp_scu.h> -#include <asm/pgalloc.h> #include <asm/suspend.h> #include <asm/virt.h> #include <asm/hardware/cache-l2x0.h> --- a/arch/arm/mm/hugetlbpage.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/arm/mm/hugetlbpage.c @@ -17,7 +17,6 @@ #include <asm/mman.h> #include <asm/tlb.h> #include <asm/tlbflush.h> -#include <asm/pgalloc.h> /* * On ARM, huge pages are backed by pmd's rather than pte's, so we do a lot --- a/arch/arm/mm/mmu.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/arm/mm/mmu.c @@ -29,6 +29,7 @@ #include <asm/traps.h> #include <asm/procinfo.h> #include <asm/memory.h> +#include <asm/pgalloc.h> #include <asm/mach/arch.h> #include <asm/mach/map.h> --- a/arch/csky/kernel/smp.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/csky/kernel/smp.c @@ -23,7 +23,6 @@ #include <asm/traps.h> #include <asm/sections.h> #include <asm/mmu_context.h> -#include <asm/pgalloc.h> #ifdef CONFIG_CPU_HAS_FPU #include <abi/fpu.h> #endif --- a/arch/ia64/include/asm/tlb.h~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/ia64/include/asm/tlb.h @@ -42,7 +42,6 @@ #include <linux/pagemap.h> #include <linux/swap.h> -#include <asm/pgalloc.h> #include <asm/processor.h> #include <asm/tlbflush.h> --- a/arch/ia64/kernel/process.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/ia64/kernel/process.c @@ -40,7 +40,6 @@ #include <asm/elf.h> #include <asm/irq.h> #include <asm/kexec.h> -#include <asm/pgalloc.h> #include <asm/processor.h> #include <asm/sal.h> #include <asm/switch_to.h> --- a/arch/ia64/kernel/smpboot.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/ia64/kernel/smpboot.c @@ 
-49,7 +49,6 @@ #include <asm/irq.h> #include <asm/mca.h> #include <asm/page.h> -#include <asm/pgalloc.h> #include <asm/processor.h> #include <asm/ptrace.h> #include <asm/sal.h> --- a/arch/ia64/kernel/smp.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/ia64/kernel/smp.c @@ -39,7 +39,6 @@ #include <asm/io.h> #include <asm/irq.h> #include <asm/page.h> -#include <asm/pgalloc.h> #include <asm/processor.h> #include <asm/ptrace.h> #include <asm/sal.h> --- a/arch/ia64/mm/contig.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/ia64/mm/contig.c @@ -21,7 +21,6 @@ #include <linux/swap.h> #include <asm/meminit.h> -#include <asm/pgalloc.h> #include <asm/sections.h> #include <asm/mca.h> --- a/arch/ia64/mm/discontig.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/ia64/mm/discontig.c @@ -24,7 +24,6 @@ #include <linux/efi.h> #include <linux/nodemask.h> #include <linux/slab.h> -#include <asm/pgalloc.h> #include <asm/tlb.h> #include <asm/meminit.h> #include <asm/numa.h> --- a/arch/ia64/mm/hugetlbpage.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/ia64/mm/hugetlbpage.c @@ -18,7 +18,6 @@ #include <linux/sysctl.h> #include <linux/log2.h> #include <asm/mman.h> -#include <asm/pgalloc.h> #include <asm/tlb.h> #include <asm/tlbflush.h> --- a/arch/ia64/mm/tlb.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/ia64/mm/tlb.c @@ -27,7 +27,6 @@ #include <asm/delay.h> #include <asm/mmu_context.h> -#include <asm/pgalloc.h> #include <asm/pal.h> #include <asm/tlbflush.h> #include <asm/dma.h> --- a/arch/m68k/include/asm/mmu_context.h~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/m68k/include/asm/mmu_context.h @@ -222,7 +222,7 @@ static inline void activate_mm(struct mm #include <asm/setup.h> #include <asm/page.h> -#include <asm/pgalloc.h> +#include <asm/cacheflush.h> static inline int init_new_context(struct task_struct *tsk, struct mm_struct *mm) --- a/arch/m68k/kernel/dma.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/m68k/kernel/dma.c @@ 
-15,7 +15,7 @@ #include <linux/vmalloc.h> #include <linux/export.h> -#include <asm/pgalloc.h> +#include <asm/cacheflush.h> #if defined(CONFIG_MMU) && !defined(CONFIG_COLDFIRE) void arch_dma_prep_coherent(struct page *page, size_t size) --- a/arch/m68k/kernel/traps.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/m68k/kernel/traps.c @@ -35,10 +35,9 @@ #include <asm/fpu.h> #include <linux/uaccess.h> #include <asm/traps.h> -#include <asm/pgalloc.h> #include <asm/machdep.h> #include <asm/siginfo.h> - +#include <asm/tlbflush.h> static const char *vec_names[] = { [VEC_RESETSP] = "RESET SP", --- a/arch/m68k/mm/cache.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/m68k/mm/cache.c @@ -8,7 +8,7 @@ */ #include <linux/module.h> -#include <asm/pgalloc.h> +#include <asm/cacheflush.h> #include <asm/traps.h> --- a/arch/m68k/mm/fault.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/m68k/mm/fault.c @@ -15,7 +15,6 @@ #include <asm/setup.h> #include <asm/traps.h> -#include <asm/pgalloc.h> extern void die_if_kernel(char *, struct pt_regs *, long); --- a/arch/m68k/mm/kmap.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/m68k/mm/kmap.c @@ -19,8 +19,8 @@ #include <asm/setup.h> #include <asm/segment.h> #include <asm/page.h> -#include <asm/pgalloc.h> #include <asm/io.h> +#include <asm/tlbflush.h> #undef DEBUG --- a/arch/m68k/mm/mcfmmu.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/m68k/mm/mcfmmu.c @@ -20,6 +20,7 @@ #include <asm/mmu_context.h> #include <asm/mcf_pgalloc.h> #include <asm/tlbflush.h> +#include <asm/pgalloc.h> #define KMAPAREA(x) ((x >= VMALLOC_START) && (x < KMAP_END)) --- a/arch/m68k/mm/memory.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/m68k/mm/memory.c @@ -17,7 +17,6 @@ #include <asm/setup.h> #include <asm/segment.h> #include <asm/page.h> -#include <asm/pgalloc.h> #include <asm/traps.h> #include <asm/machdep.h> --- a/arch/m68k/sun3x/dvma.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/m68k/sun3x/dvma.c @@ -22,7 
+22,7 @@ #include <asm/dvma.h> #include <asm/io.h> #include <asm/page.h> -#include <asm/pgalloc.h> +#include <asm/tlbflush.h> /* IOMMU support */ --- a/arch/microblaze/include/asm/tlbflush.h~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/microblaze/include/asm/tlbflush.h @@ -15,7 +15,6 @@ #include <asm/processor.h> /* For TASK_SIZE */ #include <asm/mmu.h> #include <asm/page.h> -#include <asm/pgalloc.h> extern void _tlbie(unsigned long address); extern void _tlbia(void); --- a/arch/microblaze/kernel/process.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/microblaze/kernel/process.c @@ -18,7 +18,6 @@ #include <linux/tick.h> #include <linux/bitops.h> #include <linux/ptrace.h> -#include <asm/pgalloc.h> #include <linux/uaccess.h> /* for USER_DS macros */ #include <asm/cacheflush.h> --- a/arch/microblaze/kernel/signal.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/microblaze/kernel/signal.c @@ -35,7 +35,6 @@ #include <asm/entry.h> #include <asm/ucontext.h> #include <linux/uaccess.h> -#include <asm/pgalloc.h> #include <linux/syscalls.h> #include <asm/cacheflush.h> #include <asm/syscalls.h> --- a/arch/mips/sgi-ip32/ip32-memory.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/mips/sgi-ip32/ip32-memory.c @@ -14,7 +14,6 @@ #include <asm/ip32/crime.h> #include <asm/bootinfo.h> #include <asm/page.h> -#include <asm/pgalloc.h> extern void crime_init(void); --- a/arch/openrisc/include/asm/tlbflush.h~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/openrisc/include/asm/tlbflush.h @@ -17,7 +17,6 @@ #include <linux/mm.h> #include <asm/processor.h> -#include <asm/pgalloc.h> #include <asm/current.h> #include <linux/sched.h> --- a/arch/openrisc/kernel/or32_ksyms.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/openrisc/kernel/or32_ksyms.c @@ -26,7 +26,6 @@ #include <asm/io.h> #include <asm/hardirq.h> #include <asm/delay.h> -#include <asm/pgalloc.h> #define DECLARE_EXPORT(name) extern void name(void); EXPORT_SYMBOL(name) --- 
a/arch/parisc/include/asm/mmu_context.h~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/parisc/include/asm/mmu_context.h @@ -5,7 +5,6 @@ #include <linux/mm.h> #include <linux/sched.h> #include <linux/atomic.h> -#include <asm/pgalloc.h> #include <asm-generic/mm_hooks.h> static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk) --- a/arch/parisc/kernel/cache.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/parisc/kernel/cache.c @@ -24,7 +24,6 @@ #include <asm/cacheflush.h> #include <asm/tlbflush.h> #include <asm/page.h> -#include <asm/pgalloc.h> #include <asm/processor.h> #include <asm/sections.h> #include <asm/shmparam.h> --- a/arch/parisc/kernel/pci-dma.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/parisc/kernel/pci-dma.c @@ -32,7 +32,6 @@ #include <asm/dma.h> /* for DMA_CHUNK_SIZE */ #include <asm/io.h> #include <asm/page.h> /* get_order */ -#include <asm/pgalloc.h> #include <linux/uaccess.h> #include <asm/tlbflush.h> /* for purge_tlb_*() macros */ --- a/arch/parisc/kernel/process.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/parisc/kernel/process.c @@ -47,7 +47,6 @@ #include <asm/assembly.h> #include <asm/pdc.h> #include <asm/pdc_chassis.h> -#include <asm/pgalloc.h> #include <asm/unwind.h> #include <asm/sections.h> --- a/arch/parisc/kernel/signal.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/parisc/kernel/signal.c @@ -30,7 +30,6 @@ #include <asm/ucontext.h> #include <asm/rt_sigframe.h> #include <linux/uaccess.h> -#include <asm/pgalloc.h> #include <asm/cacheflush.h> #include <asm/asm-offsets.h> --- a/arch/parisc/kernel/smp.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/parisc/kernel/smp.c @@ -39,7 +39,6 @@ #include <asm/irq.h> /* for CPU_IRQ_REGION and friends */ #include <asm/mmu_context.h> #include <asm/page.h> -#include <asm/pgalloc.h> #include <asm/processor.h> #include <asm/ptrace.h> #include <asm/unistd.h> --- 
a/arch/parisc/mm/hugetlbpage.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/parisc/mm/hugetlbpage.c @@ -15,7 +15,6 @@ #include <linux/sysctl.h> #include <asm/mman.h> -#include <asm/pgalloc.h> #include <asm/tlb.h> #include <asm/tlbflush.h> #include <asm/cacheflush.h> --- a/arch/parisc/mm/ioremap.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/parisc/mm/ioremap.c @@ -11,7 +11,7 @@ #include <linux/errno.h> #include <linux/module.h> #include <linux/io.h> -#include <asm/pgalloc.h> +#include <linux/mm.h> /* * Generic mapping function (not visible outside): --- a/arch/powerpc/include/asm/tlb.h~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/include/asm/tlb.h @@ -12,7 +12,6 @@ #ifndef __powerpc64__ #include <linux/pgtable.h> #endif -#include <asm/pgalloc.h> #ifndef __powerpc64__ #include <asm/page.h> #include <asm/mmu.h> --- a/arch/powerpc/mm/book3s64/hash_hugetlbpage.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/book3s64/hash_hugetlbpage.c @@ -10,7 +10,6 @@ #include <linux/mm.h> #include <linux/hugetlb.h> -#include <asm/pgalloc.h> #include <asm/cacheflush.h> #include <asm/machdep.h> --- a/arch/powerpc/mm/book3s64/hash_pgtable.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/book3s64/hash_pgtable.c @@ -9,7 +9,6 @@ #include <linux/mm_types.h> #include <linux/mm.h> -#include <asm/pgalloc.h> #include <asm/sections.h> #include <asm/mmu.h> #include <asm/tlb.h> --- a/arch/powerpc/mm/book3s64/hash_tlb.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/book3s64/hash_tlb.c @@ -21,7 +21,6 @@ #include <linux/mm.h> #include <linux/percpu.h> #include <linux/hardirq.h> -#include <asm/pgalloc.h> #include <asm/tlbflush.h> #include <asm/tlb.h> #include <asm/bug.h> --- a/arch/powerpc/mm/book3s64/radix_hugetlbpage.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/book3s64/radix_hugetlbpage.c @@ -2,7 +2,6 @@ #include <linux/mm.h> #include <linux/hugetlb.h> #include <linux/security.h> 
-#include <asm/pgalloc.h> #include <asm/cacheflush.h> #include <asm/machdep.h> #include <asm/mman.h> --- a/arch/powerpc/mm/init_32.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/init_32.c @@ -29,7 +29,6 @@ #include <linux/slab.h> #include <linux/hugetlb.h> -#include <asm/pgalloc.h> #include <asm/prom.h> #include <asm/io.h> #include <asm/mmu.h> --- a/arch/powerpc/mm/kasan/8xx.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/kasan/8xx.c @@ -5,7 +5,6 @@ #include <linux/kasan.h> #include <linux/memblock.h> #include <linux/hugetlb.h> -#include <asm/pgalloc.h> static int __init kasan_init_shadow_8M(unsigned long k_start, unsigned long k_end, void *block) --- a/arch/powerpc/mm/kasan/book3s_32.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/kasan/book3s_32.c @@ -4,7 +4,6 @@ #include <linux/kasan.h> #include <linux/memblock.h> -#include <asm/pgalloc.h> #include <mm/mmu_decl.h> int __init kasan_init_region(void *start, size_t size) --- a/arch/powerpc/mm/mem.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/mem.c @@ -34,7 +34,6 @@ #include <linux/dma-direct.h> #include <linux/kprobes.h> -#include <asm/pgalloc.h> #include <asm/prom.h> #include <asm/io.h> #include <asm/mmu_context.h> --- a/arch/powerpc/mm/nohash/40x.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/nohash/40x.c @@ -32,7 +32,6 @@ #include <linux/highmem.h> #include <linux/memblock.h> -#include <asm/pgalloc.h> #include <asm/prom.h> #include <asm/io.h> #include <asm/mmu_context.h> --- a/arch/powerpc/mm/nohash/8xx.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/nohash/8xx.c @@ -13,7 +13,6 @@ #include <asm/fixmap.h> #include <asm/code-patching.h> #include <asm/inst.h> -#include <asm/pgalloc.h> #include <mm/mmu_decl.h> --- a/arch/powerpc/mm/nohash/fsl_booke.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/nohash/fsl_booke.c @@ -37,7 +37,6 @@ #include <linux/highmem.h> #include 
<linux/memblock.h> -#include <asm/pgalloc.h> #include <asm/prom.h> #include <asm/io.h> #include <asm/mmu_context.h> --- a/arch/powerpc/mm/nohash/kaslr_booke.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/nohash/kaslr_booke.c @@ -15,7 +15,6 @@ #include <linux/libfdt.h> #include <linux/crash_core.h> #include <asm/cacheflush.h> -#include <asm/pgalloc.h> #include <asm/prom.h> #include <asm/kdump.h> #include <mm/mmu_decl.h> --- a/arch/powerpc/mm/nohash/tlb.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/nohash/tlb.c @@ -34,6 +34,7 @@ #include <linux/of_fdt.h> #include <linux/hugetlb.h> +#include <asm/pgalloc.h> #include <asm/tlbflush.h> #include <asm/tlb.h> #include <asm/code-patching.h> --- a/arch/powerpc/mm/pgtable_64.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/pgtable_64.c @@ -31,7 +31,6 @@ #include <linux/slab.h> #include <linux/hugetlb.h> -#include <asm/pgalloc.h> #include <asm/page.h> #include <asm/prom.h> #include <asm/mmu_context.h> --- a/arch/powerpc/mm/pgtable.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/pgtable.c @@ -23,7 +23,6 @@ #include <linux/percpu.h> #include <linux/hardirq.h> #include <linux/hugetlb.h> -#include <asm/pgalloc.h> #include <asm/tlbflush.h> #include <asm/tlb.h> #include <asm/hugetlb.h> --- a/arch/powerpc/mm/ptdump/hashpagetable.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/ptdump/hashpagetable.c @@ -17,10 +17,10 @@ #include <linux/seq_file.h> #include <linux/const.h> #include <asm/page.h> -#include <asm/pgalloc.h> #include <asm/plpar_wrappers.h> #include <linux/memblock.h> #include <asm/firmware.h> +#include <asm/pgalloc.h> struct pg_state { struct seq_file *seq; --- a/arch/powerpc/mm/ptdump/ptdump.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/mm/ptdump/ptdump.c @@ -21,7 +21,6 @@ #include <asm/fixmap.h> #include <linux/const.h> #include <asm/page.h> -#include <asm/pgalloc.h> #include <asm/hugetlb.h> #include 
<mm/mmu_decl.h> --- a/arch/powerpc/platforms/pseries/cmm.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/powerpc/platforms/pseries/cmm.c @@ -26,7 +26,6 @@ #include <asm/firmware.h> #include <asm/hvcall.h> #include <asm/mmu.h> -#include <asm/pgalloc.h> #include <linux/uaccess.h> #include <linux/memory.h> #include <asm/plpar_wrappers.h> --- a/arch/riscv/mm/fault.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/riscv/mm/fault.c @@ -14,7 +14,6 @@ #include <linux/signal.h> #include <linux/uaccess.h> -#include <asm/pgalloc.h> #include <asm/ptrace.h> #include <asm/tlbflush.h> --- a/arch/s390/include/asm/tlbflush.h~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/s390/include/asm/tlbflush.h @@ -5,7 +5,6 @@ #include <linux/mm.h> #include <linux/sched.h> #include <asm/processor.h> -#include <asm/pgalloc.h> /* * Flush all TLB entries on the local CPU. --- a/arch/s390/include/asm/tlb.h~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/s390/include/asm/tlb.h @@ -36,7 +36,6 @@ static inline bool __tlb_remove_page_siz #define p4d_free_tlb p4d_free_tlb #define pud_free_tlb pud_free_tlb -#include <asm/pgalloc.h> #include <asm/tlbflush.h> #include <asm-generic/tlb.h> --- a/arch/s390/kernel/machine_kexec.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/s390/kernel/machine_kexec.c @@ -16,7 +16,6 @@ #include <linux/debug_locks.h> #include <asm/cio.h> #include <asm/setup.h> -#include <asm/pgalloc.h> #include <asm/smp.h> #include <asm/ipl.h> #include <asm/diag.h> --- a/arch/s390/kernel/ptrace.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/s390/kernel/ptrace.c @@ -25,7 +25,6 @@ #include <linux/compat.h> #include <trace/syscall.h> #include <asm/page.h> -#include <asm/pgalloc.h> #include <linux/uaccess.h> #include <asm/unistd.h> #include <asm/switch_to.h> --- a/arch/s390/kvm/diag.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/s390/kvm/diag.c @@ -10,7 +10,6 @@ #include <linux/kvm.h> #include <linux/kvm_host.h> -#include <asm/pgalloc.h> 
#include <asm/gmap.h> #include <asm/virtio-ccw.h> #include "kvm-s390.h" --- a/arch/s390/kvm/priv.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/s390/kvm/priv.c @@ -22,7 +22,6 @@ #include <asm/ebcdic.h> #include <asm/sysinfo.h> #include <asm/page-states.h> -#include <asm/pgalloc.h> #include <asm/gmap.h> #include <asm/io.h> #include <asm/ptrace.h> --- a/arch/s390/kvm/pv.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/s390/kvm/pv.c @@ -9,7 +9,6 @@ #include <linux/kvm_host.h> #include <linux/pagemap.h> #include <linux/sched/signal.h> -#include <asm/pgalloc.h> #include <asm/gmap.h> #include <asm/uv.h> #include <asm/mman.h> --- a/arch/s390/mm/cmm.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/s390/mm/cmm.c @@ -21,7 +21,6 @@ #include <linux/oom.h> #include <linux/uaccess.h> -#include <asm/pgalloc.h> #include <asm/diag.h> #ifdef CONFIG_CMM_IUCV --- a/arch/s390/mm/mmap.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/s390/mm/mmap.c @@ -17,7 +17,6 @@ #include <linux/random.h> #include <linux/compat.h> #include <linux/security.h> -#include <asm/pgalloc.h> #include <asm/elf.h> static unsigned long stack_maxrandom_size(void) --- a/arch/s390/mm/pgtable.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/s390/mm/pgtable.c @@ -19,7 +19,6 @@ #include <linux/ksm.h> #include <linux/mman.h> -#include <asm/pgalloc.h> #include <asm/tlb.h> #include <asm/tlbflush.h> #include <asm/mmu_context.h> --- a/arch/sh/kernel/idle.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sh/kernel/idle.c @@ -14,7 +14,6 @@ #include <linux/irqflags.h> #include <linux/smp.h> #include <linux/atomic.h> -#include <asm/pgalloc.h> #include <asm/smp.h> #include <asm/bl_bit.h> --- a/arch/sh/kernel/machine_kexec.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sh/kernel/machine_kexec.c @@ -14,7 +14,6 @@ #include <linux/ftrace.h> #include <linux/suspend.h> #include <linux/memblock.h> -#include <asm/pgalloc.h> #include <asm/mmu_context.h> #include <asm/io.h> 
#include <asm/cacheflush.h> --- a/arch/sh/mm/cache-sh3.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sh/mm/cache-sh3.c @@ -16,7 +16,6 @@ #include <asm/cache.h> #include <asm/io.h> #include <linux/uaccess.h> -#include <asm/pgalloc.h> #include <asm/mmu_context.h> #include <asm/cacheflush.h> --- a/arch/sh/mm/cache-sh7705.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sh/mm/cache-sh7705.c @@ -20,7 +20,6 @@ #include <asm/cache.h> #include <asm/io.h> #include <linux/uaccess.h> -#include <asm/pgalloc.h> #include <asm/mmu_context.h> #include <asm/cacheflush.h> --- a/arch/sh/mm/hugetlbpage.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sh/mm/hugetlbpage.c @@ -17,7 +17,6 @@ #include <linux/sysctl.h> #include <asm/mman.h> -#include <asm/pgalloc.h> #include <asm/tlb.h> #include <asm/tlbflush.h> #include <asm/cacheflush.h> --- a/arch/sh/mm/init.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sh/mm/init.c @@ -27,6 +27,7 @@ #include <asm/sections.h> #include <asm/setup.h> #include <asm/cache.h> +#include <asm/pgalloc.h> #include <linux/sizes.h> pgd_t swapper_pg_dir[PTRS_PER_PGD]; --- a/arch/sh/mm/ioremap_fixed.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sh/mm/ioremap_fixed.c @@ -18,7 +18,6 @@ #include <linux/proc_fs.h> #include <asm/fixmap.h> #include <asm/page.h> -#include <asm/pgalloc.h> #include <asm/addrspace.h> #include <asm/cacheflush.h> #include <asm/tlbflush.h> --- a/arch/sh/mm/tlb-sh3.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sh/mm/tlb-sh3.c @@ -21,7 +21,6 @@ #include <asm/io.h> #include <linux/uaccess.h> -#include <asm/pgalloc.h> #include <asm/mmu_context.h> #include <asm/cacheflush.h> --- a/arch/sparc/include/asm/ide.h~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sparc/include/asm/ide.h @@ -13,7 +13,6 @@ #include <asm/io.h> #ifdef CONFIG_SPARC64 -#include <asm/pgalloc.h> #include <asm/spitfire.h> #include <asm/cacheflush.h> #include <asm/page.h> --- 
a/arch/sparc/include/asm/tlb_64.h~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sparc/include/asm/tlb_64.h @@ -4,7 +4,6 @@ #include <linux/swap.h> #include <linux/pagemap.h> -#include <asm/pgalloc.h> #include <asm/tlbflush.h> #include <asm/mmu_context.h> --- a/arch/sparc/kernel/leon_smp.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sparc/kernel/leon_smp.c @@ -38,7 +38,6 @@ #include <asm/delay.h> #include <asm/irq.h> #include <asm/page.h> -#include <asm/pgalloc.h> #include <asm/oplib.h> #include <asm/cpudata.h> #include <asm/asi.h> --- a/arch/sparc/kernel/process_32.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sparc/kernel/process_32.c @@ -34,7 +34,6 @@ #include <asm/oplib.h> #include <linux/uaccess.h> #include <asm/page.h> -#include <asm/pgalloc.h> #include <asm/delay.h> #include <asm/processor.h> #include <asm/psr.h> --- a/arch/sparc/kernel/signal_32.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sparc/kernel/signal_32.c @@ -23,7 +23,6 @@ #include <linux/uaccess.h> #include <asm/ptrace.h> -#include <asm/pgalloc.h> #include <asm/cacheflush.h> /* flush_sig_insns */ #include <asm/switch_to.h> --- a/arch/sparc/kernel/smp_32.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sparc/kernel/smp_32.c @@ -29,7 +29,6 @@ #include <asm/irq.h> #include <asm/page.h> -#include <asm/pgalloc.h> #include <asm/oplib.h> #include <asm/cacheflush.h> #include <asm/tlbflush.h> --- a/arch/sparc/kernel/smp_64.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sparc/kernel/smp_64.c @@ -47,6 +47,7 @@ #include <linux/uaccess.h> #include <asm/starfire.h> #include <asm/tlb.h> +#include <asm/pgalloc.h> #include <asm/sections.h> #include <asm/prom.h> #include <asm/mdesc.h> --- a/arch/sparc/kernel/sun4m_irq.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sparc/kernel/sun4m_irq.c @@ -16,7 +16,6 @@ #include <asm/timer.h> #include <asm/traps.h> -#include <asm/pgalloc.h> #include <asm/irq.h> #include <asm/io.h> #include <asm/cacheflush.h> 
--- a/arch/sparc/mm/highmem.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sparc/mm/highmem.c @@ -29,7 +29,6 @@ #include <asm/cacheflush.h> #include <asm/tlbflush.h> -#include <asm/pgalloc.h> #include <asm/vaddrs.h> static pte_t *kmap_pte; --- a/arch/sparc/mm/iommu.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sparc/mm/iommu.c @@ -16,7 +16,6 @@ #include <linux/of.h> #include <linux/of_device.h> -#include <asm/pgalloc.h> #include <asm/io.h> #include <asm/mxcc.h> #include <asm/mbus.h> --- a/arch/sparc/mm/io-unit.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sparc/mm/io-unit.c @@ -15,7 +15,6 @@ #include <linux/of.h> #include <linux/of_device.h> -#include <asm/pgalloc.h> #include <asm/io.h> #include <asm/io-unit.h> #include <asm/mxcc.h> --- a/arch/sparc/mm/tlb.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/sparc/mm/tlb.c @@ -10,7 +10,6 @@ #include <linux/swap.h> #include <linux/preempt.h> -#include <asm/pgalloc.h> #include <asm/tlbflush.h> #include <asm/cacheflush.h> #include <asm/mmu_context.h> --- a/arch/x86/ia32/ia32_aout.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/x86/ia32/ia32_aout.c @@ -30,7 +30,6 @@ #include <linux/sched/task_stack.h> #include <linux/uaccess.h> -#include <asm/pgalloc.h> #include <asm/cacheflush.h> #include <asm/user32.h> #include <asm/ia32.h> --- a/arch/x86/include/asm/mmu_context.h~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/x86/include/asm/mmu_context.h @@ -9,7 +9,6 @@ #include <trace/events/tlb.h> -#include <asm/pgalloc.h> #include <asm/tlbflush.h> #include <asm/paravirt.h> #include <asm/debugreg.h> --- a/arch/x86/kernel/alternative.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/x86/kernel/alternative.c @@ -7,6 +7,7 @@ #include <linux/mutex.h> #include <linux/list.h> #include <linux/stringify.h> +#include <linux/highmem.h> #include <linux/mm.h> #include <linux/vmalloc.h> #include <linux/memory.h> --- 
a/arch/x86/kernel/apic/apic.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/x86/kernel/apic/apic.c @@ -40,7 +40,6 @@ #include <asm/irq_remapping.h> #include <asm/perf_event.h> #include <asm/x86_init.h> -#include <asm/pgalloc.h> #include <linux/atomic.h> #include <asm/mpspec.h> #include <asm/i8259.h> --- a/arch/x86/kernel/mpparse.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/x86/kernel/mpparse.c @@ -22,7 +22,6 @@ #include <asm/irqdomain.h> #include <asm/mtrr.h> #include <asm/mpspec.h> -#include <asm/pgalloc.h> #include <asm/io_apic.h> #include <asm/proto.h> #include <asm/bios_ebda.h> --- a/arch/x86/kernel/traps.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/x86/kernel/traps.c @@ -62,7 +62,6 @@ #ifdef CONFIG_X86_64 #include <asm/x86_init.h> -#include <asm/pgalloc.h> #include <asm/proto.h> #else #include <asm/processor-flags.h> --- a/arch/x86/mm/fault.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/x86/mm/fault.c @@ -21,7 +21,6 @@ #include <asm/cpufeature.h> /* boot_cpu_has, ... */ #include <asm/traps.h> /* dotraplinkage, ... */ -#include <asm/pgalloc.h> /* pgd_*(), ... 
*/ #include <asm/fixmap.h> /* VSYSCALL_ADDR */ #include <asm/vsyscall.h> /* emulate_vsyscall */ #include <asm/vm86.h> /* struct vm86 */ --- a/arch/x86/mm/hugetlbpage.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/x86/mm/hugetlbpage.c @@ -17,7 +17,6 @@ #include <asm/mman.h> #include <asm/tlb.h> #include <asm/tlbflush.h> -#include <asm/pgalloc.h> #include <asm/elf.h> #if 0 /* This is just for testing */ --- a/arch/x86/mm/kaslr.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/x86/mm/kaslr.c @@ -26,7 +26,6 @@ #include <linux/memblock.h> #include <linux/pgtable.h> -#include <asm/pgalloc.h> #include <asm/setup.h> #include <asm/kaslr.h> --- a/arch/x86/mm/pgtable_32.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/x86/mm/pgtable_32.c @@ -11,7 +11,6 @@ #include <linux/spinlock.h> #include <asm/cpu_entry_area.h> -#include <asm/pgalloc.h> #include <asm/fixmap.h> #include <asm/e820/api.h> #include <asm/tlb.h> --- a/arch/x86/mm/pti.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/x86/mm/pti.c @@ -34,7 +34,6 @@ #include <asm/vsyscall.h> #include <asm/cmdline.h> #include <asm/pti.h> -#include <asm/pgalloc.h> #include <asm/tlbflush.h> #include <asm/desc.h> #include <asm/sections.h> --- a/arch/x86/platform/uv/bios_uv.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/x86/platform/uv/bios_uv.c @@ -11,6 +11,7 @@ #include <linux/slab.h> #include <asm/efi.h> #include <linux/io.h> +#include <asm/pgalloc.h> #include <asm/uv/bios.h> #include <asm/uv/uv_hub.h> --- a/arch/xtensa/kernel/xtensa_ksyms.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/xtensa/kernel/xtensa_ksyms.c @@ -25,7 +25,6 @@ #include <asm/dma.h> #include <asm/io.h> #include <asm/page.h> -#include <asm/pgalloc.h> #include <asm/ftrace.h> #ifdef CONFIG_BLK_DEV_FD #include <asm/floppy.h> --- a/arch/xtensa/mm/cache.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/xtensa/mm/cache.c @@ -31,7 +31,6 @@ #include <asm/tlb.h> #include <asm/tlbflush.h> #include <asm/page.h> 
-#include <asm/pgalloc.h> /* * Note: --- a/arch/xtensa/mm/fault.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/arch/xtensa/mm/fault.c @@ -20,7 +20,6 @@ #include <asm/mmu_context.h> #include <asm/cacheflush.h> #include <asm/hardirq.h> -#include <asm/pgalloc.h> DEFINE_PER_CPU(unsigned long, asid_cache) = ASID_USER_FIRST; void bad_page_fault(struct pt_regs*, unsigned long, int); --- a/drivers/block/xen-blkback/common.h~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/drivers/block/xen-blkback/common.h @@ -36,7 +36,6 @@ #include <linux/io.h> #include <linux/rbtree.h> #include <asm/setup.h> -#include <asm/pgalloc.h> #include <asm/hypervisor.h> #include <xen/grant_table.h> #include <xen/page.h> --- a/drivers/iommu/ipmmu-vmsa.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/drivers/iommu/ipmmu-vmsa.c @@ -28,7 +28,6 @@ #if defined(CONFIG_ARM) && !defined(CONFIG_IOMMU_DMA) #include <asm/dma-iommu.h> -#include <asm/pgalloc.h> #else #define arm_iommu_create_mapping(...) NULL #define arm_iommu_attach_device(...) 
-ENODEV --- a/drivers/xen/balloon.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/drivers/xen/balloon.c @@ -58,7 +58,6 @@ #include <linux/sysctl.h> #include <asm/page.h> -#include <asm/pgalloc.h> #include <asm/tlb.h> #include <asm/xen/hypervisor.h> --- a/drivers/xen/privcmd.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/drivers/xen/privcmd.c @@ -25,7 +25,6 @@ #include <linux/miscdevice.h> #include <linux/moduleparam.h> -#include <asm/pgalloc.h> #include <asm/xen/hypervisor.h> #include <asm/xen/hypercall.h> --- a/fs/binfmt_elf_fdpic.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/fs/binfmt_elf_fdpic.c @@ -38,7 +38,6 @@ #include <linux/uaccess.h> #include <asm/param.h> -#include <asm/pgalloc.h> typedef char *elf_caddr_t; --- a/include/asm-generic/tlb.h~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/include/asm-generic/tlb.h @@ -14,7 +14,6 @@ #include <linux/mmu_notifier.h> #include <linux/swap.h> #include <linux/hugetlb_inline.h> -#include <asm/pgalloc.h> #include <asm/tlbflush.h> #include <asm/cacheflush.h> --- a/mm/hugetlb.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/mm/hugetlb.c @@ -31,6 +31,7 @@ #include <linux/cma.h> #include <asm/page.h> +#include <asm/pgalloc.h> #include <asm/tlb.h> #include <linux/io.h> --- a/mm/sparse.c~mm-remove-unneeded-includes-of-asm-pgalloch +++ a/mm/sparse.c @@ -16,7 +16,6 @@ #include "internal.h" #include <asm/dma.h> -#include <asm/pgalloc.h> /* * Permanent SPARSEMEM data: _
From: Mike Rapoport <rppt@linux.ibm.com> Subject: opeinrisc: switch to generic version of pte allocation Replace pte_alloc_one(), pte_free() and pte_free_kernel() with the generic implementation. The only actual functional change is the addition of __GFP_ACCOUT for the allocation of the user page tables. The pte_alloc_one_kernel() is kept back because its implementation on openrisc is different than the generic one. Link: http://lkml.kernel.org/r/20200627143453.31835-3-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Stafford Horne <shorne@gmail.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Joerg Roedel <joro@8bytes.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/openrisc/include/asm/pgalloc.h | 33 ++------------------------ 1 file changed, 3 insertions(+), 30 deletions(-) --- a/arch/openrisc/include/asm/pgalloc.h~opeinrisc-switch-to-generic-version-of-pte-allocation +++ a/arch/openrisc/include/asm/pgalloc.h @@ -20,6 +20,9 @@ #include <linux/mm.h> #include <linux/memblock.h> +#define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL +#include <asm-generic/pgalloc.h> + extern int mem_init_done; #define pmd_populate_kernel(mm, pmd, pte) \ @@ -61,38 +64,8 @@ extern inline pgd_t *pgd_alloc(struct mm } #endif -static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd) -{ - free_page((unsigned long)pgd); -} - extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm); -static inline struct page *pte_alloc_one(struct 
mm_struct *mm) -{ - struct page *pte; - pte = alloc_pages(GFP_KERNEL, 0); - if (!pte) - return NULL; - clear_page(page_address(pte)); - if (!pgtable_pte_page_ctor(pte)) { - __free_page(pte); - return NULL; - } - return pte; -} - -static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte) -{ - free_page((unsigned long)pte); -} - -static inline void pte_free(struct mm_struct *mm, struct page *pte) -{ - pgtable_pte_page_dtor(pte); - __free_page(pte); -} - #define __pte_free_tlb(tlb, pte, addr) \ do { \ pgtable_pte_page_dtor(pte); \ _
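The opt-out mechanism this patch relies on can be modelled in plain userspace C. The following is only a sketch: the SKETCH_ macros and sketch_ functions are hypothetical stand-ins for __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL and the real helpers, and the three sections of one file stand in for the arch header, asm-generic/pgalloc.h, and the arch header again.

```c
#include <assert.h>
#include <string.h>

/*
 * Arch header, part 1: announce an arch-specific kernel PTE allocator
 * before the generic definitions are pulled in. The real macro is
 * __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL; the SKETCH_ prefix marks a model.
 */
#define SKETCH_HAVE_ARCH_PTE_ALLOC_ONE_KERNEL

/* "Generic header": each helper is compiled unless the arch opted out. */
#ifndef SKETCH_HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
static inline const char *sketch_pte_alloc_one_kernel(void)
{
	return "generic";
}
#endif

#ifndef SKETCH_HAVE_ARCH_PTE_ALLOC_ONE
static inline const char *sketch_pte_alloc_one(void)
{
	return "generic";
}
#endif

/* Arch header, part 2: the one helper the arch keeps for itself. */
static inline const char *sketch_pte_alloc_one_kernel(void)
{
	return "arch";
}
```

With the define in place the generic sketch_pte_alloc_one_kernel() is never compiled, so the later arch definition does not clash with it. Removing the define would produce a redefinition error, which is the failure the guard is there to prevent.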
From: Mike Rapoport <rppt@linux.ibm.com> Subject: xtensa: switch to generic version of pte allocation xtensa clears PTEs during allocation of the page tables and pte_clear() sets the PTE to a non-zero value. Splitting ptes_clear() helper out of pte_alloc_one() and pte_alloc_one_kernel() allows reuse of base generic allocation methods (__pte_alloc_one() and __pte_alloc_one_kernel()) and the common GFP mask for page table allocations. The pte_free() and pte_free_kernel() implementations on xtensa are identical to the generic ones and can be dropped. [jcmvbkbc@gmail.com: xtensa: fix closing endif comment] Link: http://lkml.kernel.org/r/20200721024751.1257-1-jcmvbkbc@gmail.com Link: http://lkml.kernel.org/r/20200627143453.31835-4-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Joerg Roedel <joro@8bytes.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/xtensa/include/asm/pgalloc.h | 41 ++++++++++++---------------- 1 file changed, 19 insertions(+), 22 deletions(-) --- a/arch/xtensa/include/asm/pgalloc.h~xtensa-switch-to-generic-version-of-pte-allocation +++ a/arch/xtensa/include/asm/pgalloc.h @@ -8,9 +8,14 @@ #ifndef _XTENSA_PGALLOC_H #define _XTENSA_PGALLOC_H +#ifdef CONFIG_MMU #include <linux/highmem.h> #include <linux/slab.h> +#define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL +#define 
__HAVE_ARCH_PTE_ALLOC_ONE +#include <asm-generic/pgalloc.h> + /* * Allocating and freeing a pmd is trivial: the 1-entry pmd is * inside the pgd, so has no extra memory associated with it. @@ -33,45 +38,37 @@ static inline void pgd_free(struct mm_st free_page((unsigned long)pgd); } +static inline void ptes_clear(pte_t *ptep) +{ + int i; + + for (i = 0; i < PTRS_PER_PTE; i++) + pte_clear(NULL, 0, ptep + i); +} + static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm) { pte_t *ptep; - int i; - ptep = (pte_t *)__get_free_page(GFP_KERNEL); + ptep = (pte_t *)__pte_alloc_one_kernel(mm); if (!ptep) return NULL; - for (i = 0; i < 1024; i++) - pte_clear(NULL, 0, ptep + i); + ptes_clear(ptep); return ptep; } static inline pgtable_t pte_alloc_one(struct mm_struct *mm) { - pte_t *pte; struct page *page; - pte = pte_alloc_one_kernel(mm); - if (!pte) - return NULL; - page = virt_to_page(pte); - if (!pgtable_pte_page_ctor(page)) { - __free_page(page); + page = __pte_alloc_one(mm, GFP_PGTABLE_USER); + if (!page) return NULL; - } + ptes_clear(page_address(page)); return page; } -static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte) -{ - free_page((unsigned long)pte); -} - -static inline void pte_free(struct mm_struct *mm, pgtable_t pte) -{ - pgtable_pte_page_dtor(pte); - __free_page(pte); -} #define pmd_pgtable(pmd) pmd_page(pmd) +#endif /* CONFIG_MMU */ #endif /* _XTENSA_PGALLOC_H */ _
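Why a zeroed page is not enough on xtensa can be illustrated with a small userspace model. This is a sketch under stated assumptions: SKETCH_PTE_NONE is a made-up non-zero "empty" encoding, calloc() stands in for the generic zeroing allocator, and every sketch_ name is hypothetical. Only the table size of 1024 entries is taken from the loop the patch removes.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PTRS_PER_PTE 1024u       /* literal used by the removed loop */
#define SKETCH_PTE_NONE     0x00000300u /* hypothetical non-zero "empty" value */

typedef unsigned int sketch_pte_t;

/* Stand-in for pte_clear(): store the "no mapping" value into one slot. */
static void sketch_pte_clear(sketch_pte_t *ptep)
{
	*ptep = SKETCH_PTE_NONE;
}

/* Mirror of ptes_clear(): walk the whole table, one slot at a time. */
static void sketch_ptes_clear(sketch_pte_t *ptep)
{
	unsigned int i;

	for (i = 0; i < SKETCH_PTRS_PER_PTE; i++)
		sketch_pte_clear(ptep + i);
}

/* Allocate a zeroed table, as a generic allocator would, then clear it. */
static sketch_pte_t *sketch_pte_alloc_one_kernel(void)
{
	sketch_pte_t *ptep = calloc(SKETCH_PTRS_PER_PTE, sizeof(*ptep));

	if (!ptep)
		return NULL;
	sketch_ptes_clear(ptep);
	return ptep;
}

/* Count slots that still hold the all-zero pattern left by the allocator. */
static unsigned int sketch_count_zero_slots(const sketch_pte_t *ptep)
{
	unsigned int i, n = 0;

	for (i = 0; i < SKETCH_PTRS_PER_PTE; i++)
		if (ptep[i] == 0)
			n++;
	return n;
}
```

Factoring the loop into its own helper is what lets both allocators in the patch share it: each one obtains a page from the generic path and then applies the same clearing pass.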
From: Mike Rapoport <rppt@linux.ibm.com> Subject: asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one() For most architectures that support >2 levels of page tables, pmd_alloc_one() is a wrapper for __get_free_pages(), sometimes with __GFP_ZERO and sometimes followed by memset(0) instead. More elaborate versions on arm64 and x86 account memory for the user page tables and call pgtable_pmd_page_ctor() as part of PMD page initialization. Move the arm64 version to include/asm-generic/pgalloc.h and use the generic version on several architectures. The pgtable_pmd_page_ctor() is a NOP when ARCH_ENABLE_SPLIT_PMD_PTLOCK is not enabled, so there is no functional change for most architectures except for the addition of __GFP_ACCOUNT for allocation of user page tables. The pmd_free() is a wrapper for free_page() in all the cases, so no functional change here. Link: http://lkml.kernel.org/r/20200627143453.31835-5-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Joerg Roedel <joro@8bytes.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/alpha/include/asm/pgalloc.h | 15 -------- arch/arm/include/asm/pgalloc.h | 11 ------ arch/arm64/include/asm/pgalloc.h | 27 --------------- arch/ia64/include/asm/pgalloc.h | 10 ----- arch/mips/include/asm/pgalloc.h | 8 +--- arch/parisc/include/asm/pgalloc.h | 11 +----- 
arch/riscv/include/asm/pgalloc.h | 13 ------- arch/sh/include/asm/pgalloc.h | 3 + arch/um/include/asm/pgalloc.h | 8 ---- arch/um/include/asm/pgtable-3level.h | 3 - arch/um/kernel/mem.c | 12 ------ arch/x86/include/asm/pgalloc.h | 26 --------------- include/asm-generic/pgalloc.h | 43 +++++++++++++++++++++++++ 13 files changed, 55 insertions(+), 135 deletions(-) --- a/arch/alpha/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pmd_alloc_one-and-pmd_free_one +++ a/arch/alpha/include/asm/pgalloc.h @@ -5,7 +5,7 @@ #include <linux/mm.h> #include <linux/mmzone.h> -#include <asm-generic/pgalloc.h> /* for pte_{alloc,free}_one */ +#include <asm-generic/pgalloc.h> /* * Allocate and free page tables. The xxx_kernel() versions are @@ -40,17 +40,4 @@ pgd_free(struct mm_struct *mm, pgd_t *pg free_page((unsigned long)pgd); } -static inline pmd_t * -pmd_alloc_one(struct mm_struct *mm, unsigned long address) -{ - pmd_t *ret = (pmd_t *)__get_free_page(GFP_PGTABLE_USER); - return ret; -} - -static inline void -pmd_free(struct mm_struct *mm, pmd_t *pmd) -{ - free_page((unsigned long)pmd); -} - #endif /* _ALPHA_PGALLOC_H */ --- a/arch/arm64/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pmd_alloc_one-and-pmd_free_one +++ a/arch/arm64/include/asm/pgalloc.h @@ -13,37 +13,12 @@ #include <asm/cacheflush.h> #include <asm/tlbflush.h> -#include <asm-generic/pgalloc.h> /* for pte_{alloc,free}_one */ +#include <asm-generic/pgalloc.h> #define PGD_SIZE (PTRS_PER_PGD * sizeof(pgd_t)) #if CONFIG_PGTABLE_LEVELS > 2 -static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr) -{ - gfp_t gfp = GFP_PGTABLE_USER; - struct page *page; - - if (mm == &init_mm) - gfp = GFP_PGTABLE_KERNEL; - - page = alloc_page(gfp); - if (!page) - return NULL; - if (!pgtable_pmd_page_ctor(page)) { - __free_page(page); - return NULL; - } - return page_address(page); -} - -static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp) -{ - BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1)); - 
pgtable_pmd_page_dtor(virt_to_page(pmdp)); - free_page((unsigned long)pmdp); -} - static inline void __pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot) { set_pud(pudp, __pud(__phys_to_pud_val(pmdp) | prot)); --- a/arch/arm/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pmd_alloc_one-and-pmd_free_one +++ a/arch/arm/include/asm/pgalloc.h @@ -22,17 +22,6 @@ #ifdef CONFIG_ARM_LPAE -static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr) -{ - return (pmd_t *)get_zeroed_page(GFP_KERNEL); -} - -static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd) -{ - BUG_ON((unsigned long)pmd & (PAGE_SIZE-1)); - free_page((unsigned long)pmd); -} - static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd) { set_pud(pud, __pud(__pa(pmd) | PMD_TYPE_TABLE)); --- a/arch/ia64/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pmd_alloc_one-and-pmd_free_one +++ a/arch/ia64/include/asm/pgalloc.h @@ -59,16 +59,6 @@ pud_populate(struct mm_struct *mm, pud_t pud_val(*pud_entry) = __pa(pmd); } -static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr) -{ - return (pmd_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO); -} - -static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd) -{ - free_page((unsigned long)pmd); -} - #define __pmd_free_tlb(tlb, pmd, address) pmd_free((tlb)->mm, pmd) static inline void --- a/arch/mips/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pmd_alloc_one-and-pmd_free_one +++ a/arch/mips/include/asm/pgalloc.h @@ -13,7 +13,8 @@ #include <linux/mm.h> #include <linux/sched.h> -#include <asm-generic/pgalloc.h> /* for pte_{alloc,free}_one */ +#define __HAVE_ARCH_PMD_ALLOC_ONE +#include <asm-generic/pgalloc.h> static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, pte_t *pte) @@ -70,11 +71,6 @@ static inline pmd_t *pmd_alloc_one(struc return pmd; } -static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd) -{ - free_pages((unsigned long)pmd, PMD_ORDER); 
-} - #define __pmd_free_tlb(tlb, x, addr) pmd_free((tlb)->mm, x) #endif --- a/arch/parisc/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pmd_alloc_one-and-pmd_free_one +++ a/arch/parisc/include/asm/pgalloc.h @@ -10,7 +10,8 @@ #include <asm/cache.h> -#include <asm-generic/pgalloc.h> /* for pte_{alloc,free}_one */ +#define __HAVE_ARCH_PMD_FREE +#include <asm-generic/pgalloc.h> /* Allocate the top level pgd (page directory) * @@ -65,14 +66,6 @@ static inline void pud_populate(struct m (__u32)(__pa((unsigned long)pmd) >> PxD_VALUE_SHIFT))); } -static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address) -{ - pmd_t *pmd = (pmd_t *)__get_free_pages(GFP_KERNEL, PMD_ORDER); - if (pmd) - memset(pmd, 0, PAGE_SIZE<<PMD_ORDER); - return pmd; -} - static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd) { if (pmd_flag(*pmd) & PxD_FLAG_ATTACHED) { --- a/arch/riscv/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pmd_alloc_one-and-pmd_free_one +++ a/arch/riscv/include/asm/pgalloc.h @@ -11,7 +11,7 @@ #include <asm/tlb.h> #ifdef CONFIG_MMU -#include <asm-generic/pgalloc.h> /* for pte_{alloc,free}_one */ +#include <asm-generic/pgalloc.h> static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, pte_t *pte) @@ -62,17 +62,6 @@ static inline void pgd_free(struct mm_st #ifndef __PAGETABLE_PMD_FOLDED -static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr) -{ - return (pmd_t *)__get_free_page( - GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_ZERO); -} - -static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd) -{ - free_page((unsigned long)pmd); -} - #define __pmd_free_tlb(tlb, pmd, addr) pmd_free((tlb)->mm, pmd) #endif /* __PAGETABLE_PMD_FOLDED */ --- a/arch/sh/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pmd_alloc_one-and-pmd_free_one +++ a/arch/sh/include/asm/pgalloc.h @@ -3,6 +3,9 @@ #define __ASM_SH_PGALLOC_H #include <asm/page.h> + +#define __HAVE_ARCH_PMD_ALLOC_ONE +#define 
__HAVE_ARCH_PMD_FREE #include <asm-generic/pgalloc.h> extern pgd_t *pgd_alloc(struct mm_struct *); --- a/arch/um/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pmd_alloc_one-and-pmd_free_one +++ a/arch/um/include/asm/pgalloc.h @@ -10,7 +10,7 @@ #include <linux/mm.h> -#include <asm-generic/pgalloc.h> /* for pte_{alloc,free}_one */ +#include <asm-generic/pgalloc.h> #define pmd_populate_kernel(mm, pmd, pte) \ set_pmd(pmd, __pmd(_PAGE_TABLE + (unsigned long) __pa(pte))) @@ -34,12 +34,6 @@ do { \ } while (0) #ifdef CONFIG_3_LEVEL_PGTABLES - -static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd) -{ - free_page((unsigned long)pmd); -} - #define __pmd_free_tlb(tlb,x, address) tlb_remove_page((tlb),virt_to_page(x)) #endif --- a/arch/um/include/asm/pgtable-3level.h~asm-generic-pgalloc-provide-generic-pmd_alloc_one-and-pmd_free_one +++ a/arch/um/include/asm/pgtable-3level.h @@ -78,9 +78,6 @@ static inline void pgd_mkuptodate(pgd_t #define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval)) #endif -struct mm_struct; -extern pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address); - static inline void pud_clear (pud_t *pud) { set_pud(pud, __pud(_PAGE_NEWPAGE)); --- a/arch/um/kernel/mem.c~asm-generic-pgalloc-provide-generic-pmd_alloc_one-and-pmd_free_one +++ a/arch/um/kernel/mem.c @@ -201,18 +201,6 @@ void pgd_free(struct mm_struct *mm, pgd_ free_page((unsigned long) pgd); } -#ifdef CONFIG_3_LEVEL_PGTABLES -pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address) -{ - pmd_t *pmd = (pmd_t *) __get_free_page(GFP_KERNEL); - - if (pmd) - memset(pmd, 0, PAGE_SIZE); - - return pmd; -} -#endif - void *uml_kmalloc(int size, int flags) { return kmalloc(size, flags); --- a/arch/x86/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pmd_alloc_one-and-pmd_free_one +++ a/arch/x86/include/asm/pgalloc.h @@ -7,7 +7,7 @@ #include <linux/pagemap.h> #define __HAVE_ARCH_PTE_ALLOC_ONE -#include <asm-generic/pgalloc.h> /* for pte_{alloc,free}_one */ +#include 
<asm-generic/pgalloc.h> static inline int __paravirt_pgd_alloc(struct mm_struct *mm) { return 0; } @@ -86,30 +86,6 @@ static inline void pmd_populate(struct m #define pmd_pgtable(pmd) pmd_page(pmd) #if CONFIG_PGTABLE_LEVELS > 2 -static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr) -{ - struct page *page; - gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO; - - if (mm == &init_mm) - gfp &= ~__GFP_ACCOUNT; - page = alloc_pages(gfp, 0); - if (!page) - return NULL; - if (!pgtable_pmd_page_ctor(page)) { - __free_pages(page, 0); - return NULL; - } - return (pmd_t *)page_address(page); -} - -static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd) -{ - BUG_ON((unsigned long)pmd & (PAGE_SIZE-1)); - pgtable_pmd_page_dtor(virt_to_page(pmd)); - free_page((unsigned long)pmd); -} - extern void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd); static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd, --- a/include/asm-generic/pgalloc.h~asm-generic-pgalloc-provide-generic-pmd_alloc_one-and-pmd_free_one +++ a/include/asm-generic/pgalloc.h @@ -102,6 +102,49 @@ static inline void pte_free(struct mm_st __free_page(pte_page); } + +#if CONFIG_PGTABLE_LEVELS > 2 + +#ifndef __HAVE_ARCH_PMD_ALLOC_ONE +/** + * pmd_alloc_one - allocate a page for PMD-level page table + * @mm: the mm_struct of the current context + * + * Allocates a page and runs the pgtable_pmd_page_ctor(). + * Allocations use %GFP_PGTABLE_USER in user context and + * %GFP_PGTABLE_KERNEL in kernel context. 
+ * + * Return: pointer to the allocated memory or %NULL on error + */ +static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr) +{ + struct page *page; + gfp_t gfp = GFP_PGTABLE_USER; + + if (mm == &init_mm) + gfp = GFP_PGTABLE_KERNEL; + page = alloc_pages(gfp, 0); + if (!page) + return NULL; + if (!pgtable_pmd_page_ctor(page)) { + __free_pages(page, 0); + return NULL; + } + return (pmd_t *)page_address(page); +} +#endif + +#ifndef __HAVE_ARCH_PMD_FREE +static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd) +{ + BUG_ON((unsigned long)pmd & (PAGE_SIZE-1)); + pgtable_pmd_page_dtor(virt_to_page(pmd)); + free_page((unsigned long)pmd); +} +#endif + +#endif /* CONFIG_PGTABLE_LEVELS > 2 */ + #endif /* CONFIG_MMU */ #endif /* __ASM_GENERIC_PGALLOC_H */ _
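The control flow of the generic pmd_alloc_one() above (pick the flags from the mm, allocate, run the constructor, undo the allocation if the constructor fails) can be exercised in userspace. The sketch below is a model only: calloc()/free() replace the page allocator, the enum replaces the real GFP masks, and every sketch_ name, including the failure knob, is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u

enum sketch_gfp { SKETCH_GFP_PGTABLE_KERNEL, SKETCH_GFP_PGTABLE_USER };

struct sketch_mm { int unused; };

static struct sketch_mm sketch_init_mm;	/* stands in for init_mm */
static int sketch_pages_live;		/* pages allocated and not yet freed */
static bool sketch_ctor_fails;		/* knob: force the constructor to fail */
static enum sketch_gfp sketch_last_gfp;	/* flags seen by the last allocation */

static void *sketch_alloc_page(enum sketch_gfp gfp)
{
	void *page = calloc(1, SKETCH_PAGE_SIZE);

	sketch_last_gfp = gfp;
	if (page)
		sketch_pages_live++;
	return page;
}

static void sketch_free_page(void *page)
{
	free(page);
	sketch_pages_live--;
}

/* Stand-in for pgtable_pmd_page_ctor(): returns false on failure. */
static bool sketch_pmd_page_ctor(void *page)
{
	(void)page;
	return !sketch_ctor_fails;
}

/* Same shape as the generic pmd_alloc_one() in the patch. */
static void *sketch_pmd_alloc_one(struct sketch_mm *mm)
{
	enum sketch_gfp gfp = SKETCH_GFP_PGTABLE_USER;
	void *page;

	if (mm == &sketch_init_mm)
		gfp = SKETCH_GFP_PGTABLE_KERNEL;
	page = sketch_alloc_page(gfp);
	if (!page)
		return NULL;
	if (!sketch_pmd_page_ctor(page)) {
		/* Constructor failed: give the page back before reporting. */
		sketch_free_page(page);
		return NULL;
	}
	return page;
}
```

The point of the live-page counter is the error path: a failed constructor must leave no page behind, which is why the generic version frees the page before returning NULL.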
From: Mike Rapoport <rppt@linux.ibm.com> Subject: asm-generic: pgalloc: provide generic pud_alloc_one() and pud_free_one() Several architectures define pud_alloc_one() as a wrapper for __get_free_page() and pud_free() as a wrapper for free_page(). Provide a generic implementation in asm-generic/pgalloc.h and use it where appropriate. Link: http://lkml.kernel.org/r/20200627143453.31835-6-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Joerg Roedel <joro@8bytes.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/arm64/include/asm/pgalloc.h | 11 ---------- arch/ia64/include/asm/pgalloc.h | 9 -------- arch/mips/include/asm/pgalloc.h | 6 ----- arch/x86/include/asm/pgalloc.h | 15 -------------- include/asm-generic/pgalloc.h | 30 +++++++++++++++++++++++++++++ 5 files changed, 31 insertions(+), 40 deletions(-) --- a/arch/arm64/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pud_alloc_one-and-pud_free_one +++ a/arch/arm64/include/asm/pgalloc.h @@ -37,17 +37,6 @@ static inline void __pud_populate(pud_t #if CONFIG_PGTABLE_LEVELS > 3 -static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr) -{ - return (pud_t *)__get_free_page(GFP_PGTABLE_USER); -} - -static inline void pud_free(struct mm_struct *mm, pud_t *pudp) -{ - BUG_ON((unsigned long)pudp & (PAGE_SIZE-1)); - free_page((unsigned long)pudp); -} - static inline 
void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot) { set_p4d(p4dp, __p4d(__phys_to_p4d_val(pudp) | prot)); --- a/arch/ia64/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pud_alloc_one-and-pud_free_one +++ a/arch/ia64/include/asm/pgalloc.h @@ -41,15 +41,6 @@ p4d_populate(struct mm_struct *mm, p4d_t p4d_val(*p4d_entry) = __pa(pud); } -static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr) -{ - return (pud_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO); -} - -static inline void pud_free(struct mm_struct *mm, pud_t *pud) -{ - free_page((unsigned long)pud); -} #define __pud_free_tlb(tlb, pud, address) pud_free((tlb)->mm, pud) #endif /* CONFIG_PGTABLE_LEVELS == 4 */ --- a/arch/mips/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pud_alloc_one-and-pud_free_one +++ a/arch/mips/include/asm/pgalloc.h @@ -14,6 +14,7 @@ #include <linux/sched.h> #define __HAVE_ARCH_PMD_ALLOC_ONE +#define __HAVE_ARCH_PUD_ALLOC_ONE #include <asm-generic/pgalloc.h> static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, @@ -87,11 +88,6 @@ static inline pud_t *pud_alloc_one(struc return pud; } -static inline void pud_free(struct mm_struct *mm, pud_t *pud) -{ - free_pages((unsigned long)pud, PUD_ORDER); -} - static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud) { set_p4d(p4d, __p4d((unsigned long)pud)); --- a/arch/x86/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pud_alloc_one-and-pud_free_one +++ a/arch/x86/include/asm/pgalloc.h @@ -123,21 +123,6 @@ static inline void p4d_populate_safe(str set_p4d_safe(p4d, __p4d(_PAGE_TABLE | __pa(pud))); } -static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr) -{ - gfp_t gfp = GFP_KERNEL_ACCOUNT; - - if (mm == &init_mm) - gfp &= ~__GFP_ACCOUNT; - return (pud_t *)get_zeroed_page(gfp); -} - -static inline void pud_free(struct mm_struct *mm, pud_t *pud) -{ - BUG_ON((unsigned long)pud & (PAGE_SIZE-1)); - free_page((unsigned long)pud); -} 
- extern void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud); static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud, --- a/include/asm-generic/pgalloc.h~asm-generic-pgalloc-provide-generic-pud_alloc_one-and-pud_free_one +++ a/include/asm-generic/pgalloc.h @@ -145,6 +145,36 @@ static inline void pmd_free(struct mm_st #endif /* CONFIG_PGTABLE_LEVELS > 2 */ +#if CONFIG_PGTABLE_LEVELS > 3 + +#ifndef __HAVE_ARCH_PUD_FREE +/** + * pud_alloc_one - allocate a page for PUD-level page table + * @mm: the mm_struct of the current context + * + * Allocates a page using %GFP_PGTABLE_USER for user context and + * %GFP_PGTABLE_KERNEL for kernel context. + * + * Return: pointer to the allocated memory or %NULL on error + */ +static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr) +{ + gfp_t gfp = GFP_PGTABLE_USER; + + if (mm == &init_mm) + gfp = GFP_PGTABLE_KERNEL; + return (pud_t *)get_zeroed_page(gfp); +} +#endif + +static inline void pud_free(struct mm_struct *mm, pud_t *pud) +{ + BUG_ON((unsigned long)pud & (PAGE_SIZE-1)); + free_page((unsigned long)pud); +} + +#endif /* CONFIG_PGTABLE_LEVELS > 3 */ + #endif /* CONFIG_MMU */ #endif /* __ASM_GENERIC_PGALLOC_H */ _
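The claim that the generic pud_alloc_one() matches the removed x86 version can be checked with a little flag arithmetic. This is a sketch with assumptions stated up front: the bit values are placeholders, and the relationships between the masks (GFP_PGTABLE_KERNEL as GFP_KERNEL plus zeroing, GFP_PGTABLE_USER as that plus accounting, get_zeroed_page() adding the zeroing flag) reflect my reading of the kernel headers of that era rather than anything quoted in this series. All SK_ and sk_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values; only the relationships between masks matter. */
#define SK_GFP_KERNEL   0x1u
#define SK_GFP_ZERO     0x2u
#define SK_GFP_ACCOUNT  0x4u

#define SK_GFP_KERNEL_ACCOUNT  (SK_GFP_KERNEL | SK_GFP_ACCOUNT)
#define SK_GFP_PGTABLE_KERNEL  (SK_GFP_KERNEL | SK_GFP_ZERO)
#define SK_GFP_PGTABLE_USER    (SK_GFP_PGTABLE_KERNEL | SK_GFP_ACCOUNT)

/* Model of get_zeroed_page(): adds the zeroing flag to what it is given. */
static unsigned int sk_zeroed_page_flags(unsigned int gfp)
{
	return gfp | SK_GFP_ZERO;
}

/* Flags the removed x86 pud_alloc_one() ended up allocating with. */
static unsigned int sk_x86_old_pud_flags(bool is_init_mm)
{
	unsigned int gfp = SK_GFP_KERNEL_ACCOUNT;

	if (is_init_mm)
		gfp &= ~SK_GFP_ACCOUNT;
	return sk_zeroed_page_flags(gfp);
}

/* Flags the generic pud_alloc_one() ends up allocating with. */
static unsigned int sk_generic_pud_flags(bool is_init_mm)
{
	unsigned int gfp = SK_GFP_PGTABLE_USER;

	if (is_init_mm)
		gfp = SK_GFP_PGTABLE_KERNEL;
	return sk_zeroed_page_flags(gfp);
}
```

Under these assumptions both versions produce the same mask for user and kernel page tables. For an architecture whose removed helper used plain GFP_KERNEL | __GFP_ZERO, the only difference would be the accounting bit on user allocations.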
From: Mike Rapoport <rppt@linux.ibm.com> Subject: asm-generic: pgalloc: provide generic pgd_free() Most architectures define pgd_free() as a wrapper for free_page(). Provide a generic version in asm-generic/pgalloc.h and enable its use for most architectures. Link: http://lkml.kernel.org/r/20200627143453.31835-7-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Joerg Roedel <joro@8bytes.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/alpha/include/asm/pgalloc.h | 6 ------ arch/arm/include/asm/pgalloc.h | 1 + arch/arm64/include/asm/pgalloc.h | 1 + arch/csky/include/asm/pgalloc.h | 7 +------ arch/hexagon/include/asm/pgalloc.h | 7 +------ arch/ia64/include/asm/pgalloc.h | 5 ----- arch/m68k/include/asm/sun3_pgalloc.h | 7 +------ arch/microblaze/include/asm/pgalloc.h | 6 ------ arch/mips/include/asm/pgalloc.h | 5 ----- arch/nds32/mm/mm-nds32.c | 2 ++ arch/nios2/include/asm/pgalloc.h | 7 +------ arch/parisc/include/asm/pgalloc.h | 1 + arch/riscv/include/asm/pgalloc.h | 5 ----- arch/sh/include/asm/pgalloc.h | 1 + arch/um/include/asm/pgalloc.h | 1 - arch/um/kernel/mem.c | 5 ----- arch/x86/include/asm/pgalloc.h | 1 + arch/xtensa/include/asm/pgalloc.h | 5 ----- include/asm-generic/pgalloc.h | 7 +++++++ 19 files changed, 18 insertions(+), 62 deletions(-) --- 
a/arch/alpha/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/alpha/include/asm/pgalloc.h @@ -34,10 +34,4 @@ pud_populate(struct mm_struct *mm, pud_t extern pgd_t *pgd_alloc(struct mm_struct *mm); -static inline void -pgd_free(struct mm_struct *mm, pgd_t *pgd) -{ - free_page((unsigned long)pgd); -} - #endif /* _ALPHA_PGALLOC_H */ --- a/arch/arm64/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/arm64/include/asm/pgalloc.h @@ -13,6 +13,7 @@ #include <asm/cacheflush.h> #include <asm/tlbflush.h> +#define __HAVE_ARCH_PGD_FREE #include <asm-generic/pgalloc.h> #define PGD_SIZE (PTRS_PER_PGD * sizeof(pgd_t)) --- a/arch/arm/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/arm/include/asm/pgalloc.h @@ -65,6 +65,7 @@ static inline void clean_pte_table(pte_t #define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL #define __HAVE_ARCH_PTE_ALLOC_ONE +#define __HAVE_ARCH_PGD_FREE #include <asm-generic/pgalloc.h> static inline pte_t * --- a/arch/csky/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/csky/include/asm/pgalloc.h @@ -9,7 +9,7 @@ #include <linux/sched.h> #define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL -#include <asm-generic/pgalloc.h> /* for pte_{alloc,free}_one */ +#include <asm-generic/pgalloc.h> static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, pte_t *pte) @@ -42,11 +42,6 @@ static inline pte_t *pte_alloc_one_kerne return pte; } -static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd) -{ - free_pages((unsigned long)pgd, PGD_ORDER); -} - static inline pgd_t *pgd_alloc(struct mm_struct *mm) { pgd_t *ret; --- a/arch/hexagon/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/hexagon/include/asm/pgalloc.h @@ -11,7 +11,7 @@ #include <asm/mem-layout.h> #include <asm/atomic.h> -#include <asm-generic/pgalloc.h> /* for pte_{alloc,free}_one */ +#include <asm-generic/pgalloc.h> extern unsigned long long kmap_generation; @@ -41,11 
+41,6 @@ static inline pgd_t *pgd_alloc(struct mm return pgd; } -static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd) -{ - free_page((unsigned long) pgd); -} - static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd, pgtable_t pte) { --- a/arch/ia64/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/ia64/include/asm/pgalloc.h @@ -29,11 +29,6 @@ static inline pgd_t *pgd_alloc(struct mm return (pgd_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO); } -static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd) -{ - free_page((unsigned long)pgd); -} - #if CONFIG_PGTABLE_LEVELS == 4 static inline void p4d_populate(struct mm_struct *mm, p4d_t * p4d_entry, pud_t * pud) --- a/arch/m68k/include/asm/sun3_pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/m68k/include/asm/sun3_pgalloc.h @@ -13,7 +13,7 @@ #include <asm/tlb.h> -#include <asm-generic/pgalloc.h> /* for pte_{alloc,free}_one */ +#include <asm-generic/pgalloc.h> extern const char bad_pmd_string[]; @@ -40,11 +40,6 @@ static inline void pmd_populate(struct m */ #define pmd_free(mm, x) do { } while (0) -static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd) -{ - free_page((unsigned long) pgd); -} - static inline pgd_t * pgd_alloc(struct mm_struct *mm) { pgd_t *new_pgd; --- a/arch/microblaze/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/microblaze/include/asm/pgalloc.h @@ -28,12 +28,6 @@ static inline pgd_t *get_pgd(void) return (pgd_t *)__get_free_pages(GFP_KERNEL|__GFP_ZERO, 0); } -static inline void free_pgd(pgd_t *pgd) -{ - free_page((unsigned long)pgd); -} - -#define pgd_free(mm, pgd) free_pgd(pgd) #define pgd_alloc(mm) get_pgd() #define pmd_pgtable(pmd) pmd_page(pmd) --- a/arch/mips/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/mips/include/asm/pgalloc.h @@ -49,11 +49,6 @@ static inline void pud_populate(struct m extern void pgd_init(unsigned long page); extern pgd_t *pgd_alloc(struct 
mm_struct *mm); -static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd) -{ - free_pages((unsigned long)pgd, PGD_ORDER); -} - #define __pte_free_tlb(tlb,pte,address) \ do { \ pgtable_pte_page_dtor(pte); \ --- a/arch/nds32/mm/mm-nds32.c~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/nds32/mm/mm-nds32.c @@ -2,6 +2,8 @@ // Copyright (C) 2005-2017 Andes Technology Corporation #include <linux/init_task.h> + +#define __HAVE_ARCH_PGD_FREE #include <asm/pgalloc.h> #define FIRST_KERNEL_PGD_NR (USER_PTRS_PER_PGD) --- a/arch/nios2/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/nios2/include/asm/pgalloc.h @@ -12,7 +12,7 @@ #include <linux/mm.h> -#include <asm-generic/pgalloc.h> /* for pte_{alloc,free}_one */ +#include <asm-generic/pgalloc.h> static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, pte_t *pte) @@ -34,11 +34,6 @@ extern void pmd_init(unsigned long page, extern pgd_t *pgd_alloc(struct mm_struct *mm); -static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd) -{ - free_pages((unsigned long)pgd, PGD_ORDER); -} - #define __pte_free_tlb(tlb, pte, addr) \ do { \ pgtable_pte_page_dtor(pte); \ --- a/arch/parisc/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/parisc/include/asm/pgalloc.h @@ -11,6 +11,7 @@ #include <asm/cache.h> #define __HAVE_ARCH_PMD_FREE +#define __HAVE_ARCH_PGD_FREE #include <asm-generic/pgalloc.h> /* Allocate the top level pgd (page directory) --- a/arch/riscv/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/riscv/include/asm/pgalloc.h @@ -55,11 +55,6 @@ static inline pgd_t *pgd_alloc(struct mm return pgd; } -static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd) -{ - free_page((unsigned long)pgd); -} - #ifndef __PAGETABLE_PMD_FOLDED #define __pmd_free_tlb(tlb, pmd, addr) pmd_free((tlb)->mm, pmd) --- a/arch/sh/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/sh/include/asm/pgalloc.h @@ -6,6 
+6,7 @@ #define __HAVE_ARCH_PMD_ALLOC_ONE #define __HAVE_ARCH_PMD_FREE +#define __HAVE_ARCH_PGD_FREE #include <asm-generic/pgalloc.h> extern pgd_t *pgd_alloc(struct mm_struct *); --- a/arch/um/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/um/include/asm/pgalloc.h @@ -25,7 +25,6 @@ * Allocate and free page tables. */ extern pgd_t *pgd_alloc(struct mm_struct *); -extern void pgd_free(struct mm_struct *mm, pgd_t *pgd); #define __pte_free_tlb(tlb,pte, address) \ do { \ --- a/arch/um/kernel/mem.c~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/um/kernel/mem.c @@ -196,11 +196,6 @@ pgd_t *pgd_alloc(struct mm_struct *mm) return pgd; } -void pgd_free(struct mm_struct *mm, pgd_t *pgd) -{ - free_page((unsigned long) pgd); -} - void *uml_kmalloc(int size, int flags) { return kmalloc(size, flags); --- a/arch/x86/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/x86/include/asm/pgalloc.h @@ -7,6 +7,7 @@ #include <linux/pagemap.h> #define __HAVE_ARCH_PTE_ALLOC_ONE +#define __HAVE_ARCH_PGD_FREE #include <asm-generic/pgalloc.h> static inline int __paravirt_pgd_alloc(struct mm_struct *mm) { return 0; } --- a/arch/xtensa/include/asm/pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/arch/xtensa/include/asm/pgalloc.h @@ -33,11 +33,6 @@ pgd_alloc(struct mm_struct *mm) return (pgd_t*) __get_free_pages(GFP_KERNEL | __GFP_ZERO, PGD_ORDER); } -static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd) -{ - free_page((unsigned long)pgd); -} - static inline void ptes_clear(pte_t *ptep) { int i; --- a/include/asm-generic/pgalloc.h~asm-generic-pgalloc-provide-generic-pgd_free +++ a/include/asm-generic/pgalloc.h @@ -175,6 +175,13 @@ static inline void pud_free(struct mm_st #endif /* CONFIG_PGTABLE_LEVELS > 3 */ +#ifndef __HAVE_ARCH_PGD_FREE +static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd) +{ + free_page((unsigned long)pgd); +} +#endif + #endif /* CONFIG_MMU */ #endif /* __ASM_GENERIC_PGALLOC_H */ _
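The generic pgd_free() calls free_page(), while some of the removed arch versions called free_pages() with PGD_ORDER. As a rough sketch of how the two relate, the model below treats free_page() as the order-0 case of free_pages() and reports the number of bytes each form covers. The 4096-byte page size and all sk_ names are assumptions made for illustration; the sketch says nothing about the actual PGD_ORDER value on any architecture.

```c
#include <assert.h>
#include <stddef.h>

#define SK_PAGE_SIZE 4096u	/* assumed page size for the sketch */

/* Bytes covered by one allocation of the given order: PAGE_SIZE << order. */
static size_t sk_order_bytes(unsigned int order)
{
	return (size_t)SK_PAGE_SIZE << order;
}

/* Model of free_pages(addr, order): report how many bytes are released. */
static size_t sk_free_pages_bytes(unsigned int order)
{
	return sk_order_bytes(order);
}

/* Model of free_page(addr): the order-0 case of free_pages(). */
static size_t sk_free_page_bytes(void)
{
	return sk_free_pages_bytes(0);
}
```

In this model the two forms agree exactly when the order is 0, so an architecture whose PGD spans more than one page would need to keep its own pgd_free() by defining __HAVE_ARCH_PGD_FREE.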
From: Mike Rapoport <rppt@linux.ibm.com> Subject: mm: move lib/ioremap.c to mm/ The functionality in lib/ioremap.c deals with pagetables, vmalloc and caches, so it naturally belongs in mm/. Moving it there will also allow declaring the p?d_alloc_track functions in a header file inside mm/ rather than having those declarations in include/linux/mm.h. Link: http://lkml.kernel.org/r/20200627143453.31835-8-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Joerg Roedel <joro@8bytes.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- lib/Makefile | 1 lib/ioremap.c | 287 ------------------------------------------------ mm/Makefile | 2 mm/ioremap.c | 287 ++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 288 insertions(+), 289 deletions(-) --- a/lib/ioremap.c +++ /dev/null @@ -1,287 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * Re-map IO memory to kernel address space so that we can access it. 
- * This is needed for high PCI addresses that aren't mapped in the - * 640k-1MB IO memory area on PC's - * - * (C) Copyright 1995 1996 Linus Torvalds - */ -#include <linux/vmalloc.h> -#include <linux/mm.h> -#include <linux/sched.h> -#include <linux/io.h> -#include <linux/export.h> -#include <asm/cacheflush.h> - -#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP -static int __read_mostly ioremap_p4d_capable; -static int __read_mostly ioremap_pud_capable; -static int __read_mostly ioremap_pmd_capable; -static int __read_mostly ioremap_huge_disabled; - -static int __init set_nohugeiomap(char *str) -{ - ioremap_huge_disabled = 1; - return 0; -} -early_param("nohugeiomap", set_nohugeiomap); - -void __init ioremap_huge_init(void) -{ - if (!ioremap_huge_disabled) { - if (arch_ioremap_p4d_supported()) - ioremap_p4d_capable = 1; - if (arch_ioremap_pud_supported()) - ioremap_pud_capable = 1; - if (arch_ioremap_pmd_supported()) - ioremap_pmd_capable = 1; - } -} - -static inline int ioremap_p4d_enabled(void) -{ - return ioremap_p4d_capable; -} - -static inline int ioremap_pud_enabled(void) -{ - return ioremap_pud_capable; -} - -static inline int ioremap_pmd_enabled(void) -{ - return ioremap_pmd_capable; -} - -#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */ -static inline int ioremap_p4d_enabled(void) { return 0; } -static inline int ioremap_pud_enabled(void) { return 0; } -static inline int ioremap_pmd_enabled(void) { return 0; } -#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ - -static int ioremap_pte_range(pmd_t *pmd, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, pgprot_t prot, - pgtbl_mod_mask *mask) -{ - pte_t *pte; - u64 pfn; - - pfn = phys_addr >> PAGE_SHIFT; - pte = pte_alloc_kernel_track(pmd, addr, mask); - if (!pte) - return -ENOMEM; - do { - BUG_ON(!pte_none(*pte)); - set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot)); - pfn++; - } while (pte++, addr += PAGE_SIZE, addr != end); - *mask |= PGTBL_PTE_MODIFIED; - return 0; -} - -static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned 
long addr, - unsigned long end, phys_addr_t phys_addr, - pgprot_t prot) -{ - if (!ioremap_pmd_enabled()) - return 0; - - if ((end - addr) != PMD_SIZE) - return 0; - - if (!IS_ALIGNED(addr, PMD_SIZE)) - return 0; - - if (!IS_ALIGNED(phys_addr, PMD_SIZE)) - return 0; - - if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) - return 0; - - return pmd_set_huge(pmd, phys_addr, prot); -} - -static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, pgprot_t prot, - pgtbl_mod_mask *mask) -{ - pmd_t *pmd; - unsigned long next; - - pmd = pmd_alloc_track(&init_mm, pud, addr, mask); - if (!pmd) - return -ENOMEM; - do { - next = pmd_addr_end(addr, end); - - if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot)) { - *mask |= PGTBL_PMD_MODIFIED; - continue; - } - - if (ioremap_pte_range(pmd, addr, next, phys_addr, prot, mask)) - return -ENOMEM; - } while (pmd++, phys_addr += (next - addr), addr = next, addr != end); - return 0; -} - -static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, - pgprot_t prot) -{ - if (!ioremap_pud_enabled()) - return 0; - - if ((end - addr) != PUD_SIZE) - return 0; - - if (!IS_ALIGNED(addr, PUD_SIZE)) - return 0; - - if (!IS_ALIGNED(phys_addr, PUD_SIZE)) - return 0; - - if (pud_present(*pud) && !pud_free_pmd_page(pud, addr)) - return 0; - - return pud_set_huge(pud, phys_addr, prot); -} - -static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, pgprot_t prot, - pgtbl_mod_mask *mask) -{ - pud_t *pud; - unsigned long next; - - pud = pud_alloc_track(&init_mm, p4d, addr, mask); - if (!pud) - return -ENOMEM; - do { - next = pud_addr_end(addr, end); - - if (ioremap_try_huge_pud(pud, addr, next, phys_addr, prot)) { - *mask |= PGTBL_PUD_MODIFIED; - continue; - } - - if (ioremap_pmd_range(pud, addr, next, phys_addr, prot, mask)) - return -ENOMEM; - } while (pud++, phys_addr += (next - addr), 
addr = next, addr != end); - return 0; -} - -static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, - pgprot_t prot) -{ - if (!ioremap_p4d_enabled()) - return 0; - - if ((end - addr) != P4D_SIZE) - return 0; - - if (!IS_ALIGNED(addr, P4D_SIZE)) - return 0; - - if (!IS_ALIGNED(phys_addr, P4D_SIZE)) - return 0; - - if (p4d_present(*p4d) && !p4d_free_pud_page(p4d, addr)) - return 0; - - return p4d_set_huge(p4d, phys_addr, prot); -} - -static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr, - unsigned long end, phys_addr_t phys_addr, pgprot_t prot, - pgtbl_mod_mask *mask) -{ - p4d_t *p4d; - unsigned long next; - - p4d = p4d_alloc_track(&init_mm, pgd, addr, mask); - if (!p4d) - return -ENOMEM; - do { - next = p4d_addr_end(addr, end); - - if (ioremap_try_huge_p4d(p4d, addr, next, phys_addr, prot)) { - *mask |= PGTBL_P4D_MODIFIED; - continue; - } - - if (ioremap_pud_range(p4d, addr, next, phys_addr, prot, mask)) - return -ENOMEM; - } while (p4d++, phys_addr += (next - addr), addr = next, addr != end); - return 0; -} - -int ioremap_page_range(unsigned long addr, - unsigned long end, phys_addr_t phys_addr, pgprot_t prot) -{ - pgd_t *pgd; - unsigned long start; - unsigned long next; - int err; - pgtbl_mod_mask mask = 0; - - might_sleep(); - BUG_ON(addr >= end); - - start = addr; - pgd = pgd_offset_k(addr); - do { - next = pgd_addr_end(addr, end); - err = ioremap_p4d_range(pgd, addr, next, phys_addr, prot, - &mask); - if (err) - break; - } while (pgd++, phys_addr += (next - addr), addr = next, addr != end); - - flush_cache_vmap(start, end); - - if (mask & ARCH_PAGE_TABLE_SYNC_MASK) - arch_sync_kernel_mappings(start, end); - - return err; -} - -#ifdef CONFIG_GENERIC_IOREMAP -void __iomem *ioremap_prot(phys_addr_t addr, size_t size, unsigned long prot) -{ - unsigned long offset, vaddr; - phys_addr_t last_addr; - struct vm_struct *area; - - /* Disallow wrap-around or zero size */ - last_addr = addr + size - 1; - if 
(!size || last_addr < addr) - return NULL; - - /* Page-align mappings */ - offset = addr & (~PAGE_MASK); - addr -= offset; - size = PAGE_ALIGN(size + offset); - - area = get_vm_area_caller(size, VM_IOREMAP, - __builtin_return_address(0)); - if (!area) - return NULL; - vaddr = (unsigned long)area->addr; - - if (ioremap_page_range(vaddr, vaddr + size, addr, __pgprot(prot))) { - free_vm_area(area); - return NULL; - } - - return (void __iomem *)(vaddr + offset); -} -EXPORT_SYMBOL(ioremap_prot); - -void iounmap(volatile void __iomem *addr) -{ - vunmap((void *)((unsigned long)addr & PAGE_MASK)); -} -EXPORT_SYMBOL(iounmap); -#endif /* CONFIG_GENERIC_IOREMAP */ --- a/lib/Makefile~mm-move-lib-ioremapc-to-mm +++ a/lib/Makefile @@ -37,7 +37,6 @@ lib-y := ctype.o string.o vsprintf.o cmd nmi_backtrace.o nodemask.o win_minmax.o memcat_p.o lib-$(CONFIG_PRINTK) += dump_stack.o -lib-$(CONFIG_MMU) += ioremap.o lib-$(CONFIG_SMP) += cpumask.o lib-y += kobject.o klist.o --- /dev/null +++ a/mm/ioremap.c @@ -0,0 +1,287 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Re-map IO memory to kernel address space so that we can access it. 
+ * This is needed for high PCI addresses that aren't mapped in the + * 640k-1MB IO memory area on PC's + * + * (C) Copyright 1995 1996 Linus Torvalds + */ +#include <linux/vmalloc.h> +#include <linux/mm.h> +#include <linux/sched.h> +#include <linux/io.h> +#include <linux/export.h> +#include <asm/cacheflush.h> + +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP +static int __read_mostly ioremap_p4d_capable; +static int __read_mostly ioremap_pud_capable; +static int __read_mostly ioremap_pmd_capable; +static int __read_mostly ioremap_huge_disabled; + +static int __init set_nohugeiomap(char *str) +{ + ioremap_huge_disabled = 1; + return 0; +} +early_param("nohugeiomap", set_nohugeiomap); + +void __init ioremap_huge_init(void) +{ + if (!ioremap_huge_disabled) { + if (arch_ioremap_p4d_supported()) + ioremap_p4d_capable = 1; + if (arch_ioremap_pud_supported()) + ioremap_pud_capable = 1; + if (arch_ioremap_pmd_supported()) + ioremap_pmd_capable = 1; + } +} + +static inline int ioremap_p4d_enabled(void) +{ + return ioremap_p4d_capable; +} + +static inline int ioremap_pud_enabled(void) +{ + return ioremap_pud_capable; +} + +static inline int ioremap_pmd_enabled(void) +{ + return ioremap_pmd_capable; +} + +#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */ +static inline int ioremap_p4d_enabled(void) { return 0; } +static inline int ioremap_pud_enabled(void) { return 0; } +static inline int ioremap_pmd_enabled(void) { return 0; } +#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ + +static int ioremap_pte_range(pmd_t *pmd, unsigned long addr, + unsigned long end, phys_addr_t phys_addr, pgprot_t prot, + pgtbl_mod_mask *mask) +{ + pte_t *pte; + u64 pfn; + + pfn = phys_addr >> PAGE_SHIFT; + pte = pte_alloc_kernel_track(pmd, addr, mask); + if (!pte) + return -ENOMEM; + do { + BUG_ON(!pte_none(*pte)); + set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot)); + pfn++; + } while (pte++, addr += PAGE_SIZE, addr != end); + *mask |= PGTBL_PTE_MODIFIED; + return 0; +} + +static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned 
long addr, + unsigned long end, phys_addr_t phys_addr, + pgprot_t prot) +{ + if (!ioremap_pmd_enabled()) + return 0; + + if ((end - addr) != PMD_SIZE) + return 0; + + if (!IS_ALIGNED(addr, PMD_SIZE)) + return 0; + + if (!IS_ALIGNED(phys_addr, PMD_SIZE)) + return 0; + + if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr)) + return 0; + + return pmd_set_huge(pmd, phys_addr, prot); +} + +static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr, + unsigned long end, phys_addr_t phys_addr, pgprot_t prot, + pgtbl_mod_mask *mask) +{ + pmd_t *pmd; + unsigned long next; + + pmd = pmd_alloc_track(&init_mm, pud, addr, mask); + if (!pmd) + return -ENOMEM; + do { + next = pmd_addr_end(addr, end); + + if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot)) { + *mask |= PGTBL_PMD_MODIFIED; + continue; + } + + if (ioremap_pte_range(pmd, addr, next, phys_addr, prot, mask)) + return -ENOMEM; + } while (pmd++, phys_addr += (next - addr), addr = next, addr != end); + return 0; +} + +static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr, + unsigned long end, phys_addr_t phys_addr, + pgprot_t prot) +{ + if (!ioremap_pud_enabled()) + return 0; + + if ((end - addr) != PUD_SIZE) + return 0; + + if (!IS_ALIGNED(addr, PUD_SIZE)) + return 0; + + if (!IS_ALIGNED(phys_addr, PUD_SIZE)) + return 0; + + if (pud_present(*pud) && !pud_free_pmd_page(pud, addr)) + return 0; + + return pud_set_huge(pud, phys_addr, prot); +} + +static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr, + unsigned long end, phys_addr_t phys_addr, pgprot_t prot, + pgtbl_mod_mask *mask) +{ + pud_t *pud; + unsigned long next; + + pud = pud_alloc_track(&init_mm, p4d, addr, mask); + if (!pud) + return -ENOMEM; + do { + next = pud_addr_end(addr, end); + + if (ioremap_try_huge_pud(pud, addr, next, phys_addr, prot)) { + *mask |= PGTBL_PUD_MODIFIED; + continue; + } + + if (ioremap_pmd_range(pud, addr, next, phys_addr, prot, mask)) + return -ENOMEM; + } while (pud++, phys_addr += (next - addr), 
addr = next, addr != end); + return 0; +} + +static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long addr, + unsigned long end, phys_addr_t phys_addr, + pgprot_t prot) +{ + if (!ioremap_p4d_enabled()) + return 0; + + if ((end - addr) != P4D_SIZE) + return 0; + + if (!IS_ALIGNED(addr, P4D_SIZE)) + return 0; + + if (!IS_ALIGNED(phys_addr, P4D_SIZE)) + return 0; + + if (p4d_present(*p4d) && !p4d_free_pud_page(p4d, addr)) + return 0; + + return p4d_set_huge(p4d, phys_addr, prot); +} + +static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr, + unsigned long end, phys_addr_t phys_addr, pgprot_t prot, + pgtbl_mod_mask *mask) +{ + p4d_t *p4d; + unsigned long next; + + p4d = p4d_alloc_track(&init_mm, pgd, addr, mask); + if (!p4d) + return -ENOMEM; + do { + next = p4d_addr_end(addr, end); + + if (ioremap_try_huge_p4d(p4d, addr, next, phys_addr, prot)) { + *mask |= PGTBL_P4D_MODIFIED; + continue; + } + + if (ioremap_pud_range(p4d, addr, next, phys_addr, prot, mask)) + return -ENOMEM; + } while (p4d++, phys_addr += (next - addr), addr = next, addr != end); + return 0; +} + +int ioremap_page_range(unsigned long addr, + unsigned long end, phys_addr_t phys_addr, pgprot_t prot) +{ + pgd_t *pgd; + unsigned long start; + unsigned long next; + int err; + pgtbl_mod_mask mask = 0; + + might_sleep(); + BUG_ON(addr >= end); + + start = addr; + pgd = pgd_offset_k(addr); + do { + next = pgd_addr_end(addr, end); + err = ioremap_p4d_range(pgd, addr, next, phys_addr, prot, + &mask); + if (err) + break; + } while (pgd++, phys_addr += (next - addr), addr = next, addr != end); + + flush_cache_vmap(start, end); + + if (mask & ARCH_PAGE_TABLE_SYNC_MASK) + arch_sync_kernel_mappings(start, end); + + return err; +} + +#ifdef CONFIG_GENERIC_IOREMAP +void __iomem *ioremap_prot(phys_addr_t addr, size_t size, unsigned long prot) +{ + unsigned long offset, vaddr; + phys_addr_t last_addr; + struct vm_struct *area; + + /* Disallow wrap-around or zero size */ + last_addr = addr + size - 1; + if 
(!size || last_addr < addr) + return NULL; + + /* Page-align mappings */ + offset = addr & (~PAGE_MASK); + addr -= offset; + size = PAGE_ALIGN(size + offset); + + area = get_vm_area_caller(size, VM_IOREMAP, + __builtin_return_address(0)); + if (!area) + return NULL; + vaddr = (unsigned long)area->addr; + + if (ioremap_page_range(vaddr, vaddr + size, addr, __pgprot(prot))) { + free_vm_area(area); + return NULL; + } + + return (void __iomem *)(vaddr + offset); +} +EXPORT_SYMBOL(ioremap_prot); + +void iounmap(volatile void __iomem *addr) +{ + vunmap((void *)((unsigned long)addr & PAGE_MASK)); +} +EXPORT_SYMBOL(iounmap); +#endif /* CONFIG_GENERIC_IOREMAP */ --- a/mm/Makefile~mm-move-lib-ioremapc-to-mm +++ a/mm/Makefile @@ -38,7 +38,7 @@ mmu-y := nommu.o mmu-$(CONFIG_MMU) := highmem.o memory.o mincore.o \ mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \ msync.o page_vma_mapped.o pagewalk.o \ - pgtable-generic.o rmap.o vmalloc.o + pgtable-generic.o rmap.o vmalloc.o ioremap.o ifdef CONFIG_CROSS_MEMORY_ATTACH _
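The address arithmetic at the top of ioremap_prot() in the file moved above can be tried out in user space.  What follows is only a sketch under stated assumptions, not kernel code: the 4 KiB page size, the SKETCH_* macros, struct sketch_span and sketch_align_request() are all hypothetical names made up for illustration.  It mirrors the three steps of the original: reject zero size or wrap-around, split the address into a page-aligned base plus an offset, and round the length up to whole pages.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages (the kernel takes this from the arch). */
#define SKETCH_PAGE_SIZE	((uint64_t)4096)
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_PAGE_ALIGN(x)	(((x) + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK)

/* Hypothetical result type: what would be handed to the mapping code. */
struct sketch_span {
	uint64_t phys_base;	/* page-aligned physical start */
	uint64_t map_size;	/* length to map, a whole number of pages */
	uint64_t offset;	/* offset of the requested address in the first page */
};

/* Returns 0 on success, -1 for a zero-sized or wrapping request. */
static int sketch_align_request(uint64_t addr, uint64_t size,
				struct sketch_span *out)
{
	uint64_t last_addr = addr + size - 1;

	/* Disallow wrap-around or zero size, as ioremap_prot() does */
	if (!size || last_addr < addr)
		return -1;

	/* Page-align the mapping and remember the offset inside the page */
	out->offset = addr & ~SKETCH_PAGE_MASK;
	out->phys_base = addr - out->offset;
	out->map_size = SKETCH_PAGE_ALIGN(size + out->offset);
	return 0;
}
```

A request that starts near the end of a page and crosses into the next one ends up mapping two pages, which is why the offset is added to the size before rounding up.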
From: Joerg Roedel <jroedel@suse.de> Subject: mm: move p?d_alloc_track to separate header file The functions are only used in two source files, so there is no need for them to be in the global <linux/mm.h> header. Move them to the new <linux/pgalloc-track.h> header and include it only where needed. Link: http://lkml.kernel.org/r/20200609120533.25867-1-joro@8bytes.org Signed-off-by: Joerg Roedel <jroedel@suse.de> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/mm.h | 45 ------------------------------------- mm/ioremap.c | 2 + mm/pgalloc-track.h | 51 +++++++++++++++++++++++++++++++++++++++++++ mm/vmalloc.c | 1 4 files changed, 54 insertions(+), 45 deletions(-) --- a/include/linux/mm.h~mm-move-pd_alloc_track-to-separate-header-file +++ a/include/linux/mm.h @@ -2103,51 +2103,11 @@ static inline pud_t *pud_alloc(struct mm NULL : pud_offset(p4d, address); } -static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd, - unsigned long address, - pgtbl_mod_mask *mod_mask) - -{ - if (unlikely(pgd_none(*pgd))) { - if (__p4d_alloc(mm, pgd, address)) - return NULL; - *mod_mask |= PGTBL_PGD_MODIFIED; - } - - return p4d_offset(pgd, address); -} - -static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d, - unsigned long address, - pgtbl_mod_mask *mod_mask) -{ - if (unlikely(p4d_none(*p4d))) { - if (__pud_alloc(mm, p4d, 
address)) - return NULL; - *mod_mask |= PGTBL_P4D_MODIFIED; - } - - return pud_offset(p4d, address); -} - static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address) { return (unlikely(pud_none(*pud)) && __pmd_alloc(mm, pud, address))? NULL: pmd_offset(pud, address); } - -static inline pmd_t *pmd_alloc_track(struct mm_struct *mm, pud_t *pud, - unsigned long address, - pgtbl_mod_mask *mod_mask) -{ - if (unlikely(pud_none(*pud))) { - if (__pmd_alloc(mm, pud, address)) - return NULL; - *mod_mask |= PGTBL_PUD_MODIFIED; - } - - return pmd_offset(pud, address); -} #endif /* CONFIG_MMU */ #if USE_SPLIT_PTE_PTLOCKS @@ -2263,11 +2223,6 @@ static inline void pgtable_pte_page_dtor ((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd))? \ NULL: pte_offset_kernel(pmd, address)) -#define pte_alloc_kernel_track(pmd, address, mask) \ - ((unlikely(pmd_none(*(pmd))) && \ - (__pte_alloc_kernel(pmd) || ({*(mask)|=PGTBL_PMD_MODIFIED;0;})))?\ - NULL: pte_offset_kernel(pmd, address)) - #if USE_SPLIT_PMD_PTLOCKS static struct page *pmd_to_page(pmd_t *pmd) --- a/mm/ioremap.c~mm-move-pd_alloc_track-to-separate-header-file +++ a/mm/ioremap.c @@ -13,6 +13,8 @@ #include <linux/export.h> #include <asm/cacheflush.h> +#include "pgalloc-track.h" + #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP static int __read_mostly ioremap_p4d_capable; static int __read_mostly ioremap_pud_capable; --- /dev/null +++ a/mm/pgalloc-track.h @@ -0,0 +1,51 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_PGALLLC_TRACK_H +#define _LINUX_PGALLLC_TRACK_H + +#if defined(CONFIG_MMU) +static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd, + unsigned long address, + pgtbl_mod_mask *mod_mask) +{ + if (unlikely(pgd_none(*pgd))) { + if (__p4d_alloc(mm, pgd, address)) + return NULL; + *mod_mask |= PGTBL_PGD_MODIFIED; + } + + return p4d_offset(pgd, address); +} + +static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d, + unsigned long address, + pgtbl_mod_mask *mod_mask) 
+{ + if (unlikely(p4d_none(*p4d))) { + if (__pud_alloc(mm, p4d, address)) + return NULL; + *mod_mask |= PGTBL_P4D_MODIFIED; + } + + return pud_offset(p4d, address); +} + +static inline pmd_t *pmd_alloc_track(struct mm_struct *mm, pud_t *pud, + unsigned long address, + pgtbl_mod_mask *mod_mask) +{ + if (unlikely(pud_none(*pud))) { + if (__pmd_alloc(mm, pud, address)) + return NULL; + *mod_mask |= PGTBL_PUD_MODIFIED; + } + + return pmd_offset(pud, address); +} +#endif /* CONFIG_MMU */ + +#define pte_alloc_kernel_track(pmd, address, mask) \ + ((unlikely(pmd_none(*(pmd))) && \ + (__pte_alloc_kernel(pmd) || ({*(mask)|=PGTBL_PMD_MODIFIED;0;})))?\ + NULL: pte_offset_kernel(pmd, address)) + +#endif /* _LINUX_PGALLLC_TRACK_H */ --- a/mm/vmalloc.c~mm-move-pd_alloc_track-to-separate-header-file +++ a/mm/vmalloc.c @@ -41,6 +41,7 @@ #include <asm/shmparam.h> #include "internal.h" +#include "pgalloc-track.h" bool is_vmalloc_addr(const void *x) { _
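The p?d_alloc_track() helpers moved above share one pattern: allocate the next table level only if the entry is empty, and record in a caller-supplied mask that the upper level was modified.  A user-space sketch of that pattern, assuming a made-up two-level table (every sketch_* name here is hypothetical and nothing below is kernel API):

```c
#include <stdlib.h>

typedef unsigned int sketch_mod_mask;

/* Hypothetical flag: an entry in the upper table was populated. */
#define SKETCH_UPPER_MODIFIED	(1U << 0)
#define SKETCH_ENTRIES		4

struct sketch_lower {
	int slots[SKETCH_ENTRIES];
};

struct sketch_upper {
	struct sketch_lower *entries[SKETCH_ENTRIES];
};

/*
 * Return the lower table for @idx, allocating it on first use.  The mask is
 * only touched when an allocation actually changed the upper table, which is
 * what lets the caller decide later whether a sync step is needed.
 */
static struct sketch_lower *sketch_alloc_track(struct sketch_upper *up,
					       unsigned int idx,
					       sketch_mod_mask *mask)
{
	if (idx >= SKETCH_ENTRIES)
		return NULL;

	if (!up->entries[idx]) {
		struct sketch_lower *lo = calloc(1, sizeof(*lo));

		if (!lo)
			return NULL;
		up->entries[idx] = lo;
		*mask |= SKETCH_UPPER_MODIFIED;
	}

	return up->entries[idx];
}
```

Calling the helper twice for the same index sets the mask bit only the first time, which matches how ioremap_page_range() and vmalloc use the mask to skip arch_sync_kernel_mappings() when nothing at a tracked level changed.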
From: Zhen Lei <thunder.leizhen@huawei.com>
Subject: mm/mmap: optimize a branch judgment in ksys_mmap_pgoff()

Look at the pseudo code below.  The check "!is_file_hugepages(file)" at
3) duplicates the one at 1), so we can use "else if" to avoid it.  And the
assignment "retval = -EINVAL" at 2) is only needed by the branch at 3),
because "retval" will be overwritten at 4).

No functional change, but it reduces the code size and is arguably
clearer.

Before:
   text	   data	    bss	    dec	    hex	filename
  28733	   1590	      1	  30324	   7674	mm/mmap.o

After:
   text	   data	    bss	    dec	    hex	filename
  28701	   1590	      1	  30292	   7654	mm/mmap.o

====pseudo code====:
	if (!(flags & MAP_ANONYMOUS)) {
		...
1)		if (is_file_hugepages(file))
			len = ALIGN(len, huge_page_size(hstate_file(file)));
2)		retval = -EINVAL;
3)		if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
			goto out_fput;
	} else if (flags & MAP_HUGETLB) {
		...
	}
	...

4)	retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
out_fput:
	...
	return retval;

Link: http://lkml.kernel.org/r/20200705080112.1405-1-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mmap.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/mm/mmap.c~mm-mmap-optimize-a-branch-judgment-in-ksys_mmap_pgoff
+++ a/mm/mmap.c
@@ -1562,11 +1562,12 @@ unsigned long ksys_mmap_pgoff(unsigned l
 		file = fget(fd);
 		if (!file)
 			return -EBADF;
-		if (is_file_hugepages(file))
+		if (is_file_hugepages(file)) {
 			len = ALIGN(len, huge_page_size(hstate_file(file)));
-		retval = -EINVAL;
-		if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
+		} else if (unlikely(flags & MAP_HUGETLB)) {
+			retval = -EINVAL;
 			goto out_fput;
+		}
 	} else if (flags & MAP_HUGETLB) {
 		struct user_struct *user = NULL;
 		struct hstate *hs;
_
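The "no functional change" claim of the patch above can be checked outside the kernel by modelling both branch shapes and comparing them for every combination of inputs.  This is a sketch under assumptions: the sketch_* names are hypothetical, the flag value is only a stand-in for MAP_HUGETLB, and is_file_hugepages() is reduced to a boolean argument.

```c
#include <stdbool.h>

#define SKETCH_MAP_HUGETLB	0x40000u	/* stand-in value; only the bit test matters */
#define SKETCH_EINVAL		22

/* Round @len up to a multiple of @a, where @a is a power of two. */
static unsigned long sketch_align(unsigned long len, unsigned long a)
{
	return (len + a - 1) & ~(a - 1);
}

/* Branch shape before the patch: the hugepage test is evaluated twice. */
static int sketch_old(bool file_is_huge, unsigned int flags,
		      unsigned long huge_size, unsigned long *len)
{
	int retval;

	if (file_is_huge)
		*len = sketch_align(*len, huge_size);
	retval = -SKETCH_EINVAL;
	if ((flags & SKETCH_MAP_HUGETLB) && !file_is_huge)
		return retval;
	return 0;
}

/* Branch shape after the patch: a single test with an "else if". */
static int sketch_new(bool file_is_huge, unsigned int flags,
		      unsigned long huge_size, unsigned long *len)
{
	if (file_is_huge) {
		*len = sketch_align(*len, huge_size);
	} else if (flags & SKETCH_MAP_HUGETLB) {
		return -SKETCH_EINVAL;
	}
	return 0;
}

/* True when both shapes agree on the return value and the adjusted length. */
static bool sketch_same(bool file_is_huge, unsigned int flags)
{
	unsigned long len_old = 5000, len_new = 5000;
	int r_old = sketch_old(file_is_huge, flags, 1UL << 21, &len_old);
	int r_new = sketch_new(file_is_huge, flags, 1UL << 21, &len_new);

	return r_old == r_new && len_old == len_new;
}
```

Only two inputs drive the branch (whether the file is a hugetlbfs file and whether MAP_HUGETLB is set), so four calls to sketch_same() cover the whole input space.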
From: Feng Tang <feng.tang@intel.com>
Subject: proc/meminfo: avoid open coded reading of vm_committed_as

Patch series "make vm_committed_as_batch aware of vm overcommit policy", v6.

When checking a performance change for will-it-scale scalability mmap test
[1], we found very high lock contention for spinlock of percpu counter
'vm_committed_as':

    94.14%     0.35%  [kernel.kallsyms]         [k] _raw_spin_lock_irqsave
    48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
    45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

Actually this heavy lock contention is not always necessary.  The
'vm_committed_as' needs to be very precise when the strict
OVERCOMMIT_NEVER policy is set, which requires a rather small batch number
for the percpu counter.

So keep 'batch' number unchanged for strict OVERCOMMIT_NEVER policy, and
enlarge it for not-so-strict OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS
policies.

Benchmark with the same testcase in [1] shows 53% improvement on an 8C/16T
desktop, and 2097% (20X) on a 4S/72C/144T server.  And for that case,
whether it shows improvements depends on whether the test mmap size is
bigger than the batch number computed.

We tested 10+ platforms in 0day (server, desktop and laptop).  If we lift
it to 64X, 80%+ platforms show improvements, and for 16X lift, 1/3 of the
platforms will show improvements.

And generally it should help the mmap/unmap usage, as Michal Hocko
mentioned:

: I believe that there are non-synthetic workloads which would benefit
: from a larger batch.  E.g. large in memory databases which do large
: mmaps during startups from multiple threads.

Note: There are some style complaints from checkpatch for patch 4, as the
sysctl handler declaration follows the same format as its sibling
functions.

[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/

This patch (of 4):

Use the existing vm_memory_committed() instead, which is also convenient
for future changes.

Link: http://lkml.kernel.org/r/1594389708-60781-1-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1594389708-60781-2-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Qian Cai <cai@lca.pw>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: kernel test robot <rong.a.chen@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/proc/meminfo.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/proc/meminfo.c~proc-meminfo-avoid-open-coded-reading-of-vm_committed_as
+++ a/fs/proc/meminfo.c
@@ -41,7 +41,7 @@ static int meminfo_proc_show(struct seq_
 
 	si_meminfo(&i);
 	si_swapinfo(&i);
-	committed = percpu_counter_read_positive(&vm_committed_as);
+	committed = vm_memory_committed();
 
 	cached = global_node_page_state(NR_FILE_PAGES) -
 			total_swapcache_pages() - i.bufferram;
_
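The batch policy described in the cover letter above (keep the batch for OVERCOMMIT_NEVER, lift it 64X otherwise) reduces to a few lines of arithmetic.  The sketch below is a user-space model, not the kernel function: sketch_compute_batch() and the SKETCH_* names are hypothetical, and the divisors 256 and 4, the floor of max(2 * nr_cpus, 32) and the cap at the largest 32-bit signed value are taken from the mm_init.c hunk of the final patch in this series.

```c
#include <stdint.h>

/* Same numbering as the kernel's OVERCOMMIT_* policies. */
enum {
	SKETCH_OVERCOMMIT_GUESS  = 0,
	SKETCH_OVERCOMMIT_ALWAYS = 1,
	SKETCH_OVERCOMMIT_NEVER  = 2,
};

/*
 * Model of the batch computation: about 0.4% of (memory / #cpus) for the
 * strict policy, 25% for the relaxed ones, never below max(2 * cpus, 32).
 * @nr_cpus is assumed to be at least 1.
 */
static int32_t sketch_compute_batch(int policy, uint64_t ram_pages,
				    int32_t nr_cpus)
{
	int32_t floor_batch = nr_cpus * 2 > 32 ? nr_cpus * 2 : 32;
	uint64_t divisor = (policy == SKETCH_OVERCOMMIT_NEVER) ? 256 : 4;
	uint64_t memsized = ram_pages / (uint64_t)nr_cpus / divisor;

	if (memsized > INT32_MAX)
		memsized = INT32_MAX;

	return (int32_t)memsized > floor_batch ? (int32_t)memsized : floor_batch;
}
```

For a machine with 4194304 pages (16 GiB with 4 KiB pages) and 16 CPUs, the model gives a batch of 1024 pages under OVERCOMMIT_NEVER and 65536 pages under the other two policies, which is the 64X ratio quoted in the cover letter.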
From: Feng Tang <feng.tang@intel.com>
Subject: mm/util.c: make vm_memory_committed() more accurate

percpu_counter_sum_positive() will provide more accurate info.

With percpu_counter_read_positive(), the worst-case deviation can be
'batch * nr_cpus', which is totalram_pages/256 for now, and will be more
when the batch gets enlarged.

Its time cost is about 800 nanoseconds on a 2C/4T platform and 2~3
microseconds on a 2S/36C/72T Skylake server in the normal case.  In the
worst case, where vm_committed_as's spinlock is under severe contention,
it costs 30~40 microseconds on the 2S/36C/72T Skylake server, which should
be fine for its only two users: /proc/meminfo and HyperV balloon driver's
status trace per second.

Link: http://lkml.kernel.org/r/1592725000-73486-3-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1594389708-60781-3-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com> # for /proc/meminfo
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Qian Cai <cai@lca.pw>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: kernel test robot <rong.a.chen@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/util.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/mm/util.c~mm-utilc-make-vm_memory_committed-more-accurate
+++ a/mm/util.c
@@ -787,10 +787,15 @@ struct percpu_counter vm_committed_as __
  * balancing memory across competing virtual machines that are hosted.
  * Several metrics drive this policy engine including the guest reported
  * memory commitment.
+ *
+ * The time cost of this is very low for small platforms, and for big
+ * platform like a 2S/36C/72T Skylake server, in worst case where
+ * vm_committed_as's spinlock is under severe contention, the time cost
+ * could be about 30~40 microseconds.
  */
 unsigned long vm_memory_committed(void)
 {
-	return percpu_counter_read_positive(&vm_committed_as);
+	return percpu_counter_sum_positive(&vm_committed_as);
 }
 EXPORT_SYMBOL_GPL(vm_memory_committed);
 
_
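The difference between the "read" and "sum" flavours that the patch above switches between can be seen in a small single-threaded model.  This is a sketch, not the kernel implementation: there is no locking and no real per-CPU storage, the CPU count is fixed at four, and every sketch_* name is hypothetical.  It only illustrates why a cheap read can lag the true value by up to batch * nr_cpus while a sum cannot.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS	4

struct sketch_counter {
	int64_t count;			/* shared, approximate total */
	int32_t local[SKETCH_NR_CPUS];	/* per-CPU deltas not yet folded in */
};

/* Add @amount on @cpu; fold into the shared total once |delta| reaches @batch. */
static void sketch_add(struct sketch_counter *c, int cpu, int32_t amount,
		       int32_t batch)
{
	int64_t pending = (int64_t)c->local[cpu] + amount;

	if (pending >= batch || pending <= -batch) {
		c->count += pending;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = (int32_t)pending;
	}
}

/* Cheap read: looks at the shared total only, so it can lag behind. */
static int64_t sketch_read_positive(const struct sketch_counter *c)
{
	return c->count > 0 ? c->count : 0;
}

/* Accurate read: walks every per-CPU delta as well. */
static int64_t sketch_sum_positive(const struct sketch_counter *c)
{
	int64_t sum = c->count;
	int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sum += c->local[cpu];

	return sum > 0 ? sum : 0;
}
```

With a batch of 32, adding 31 on each of the four CPUs leaves the cheap read at 0 while the sum reports 124, just under the batch * nr_cpus bound of 128 mentioned in the changelog.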
From: Feng Tang <feng.tang@intel.com>
Subject: percpu_counter: add percpu_counter_sync()

percpu_counter's accuracy is related to its batch size.  For a
percpu_counter with a big batch, its deviation could be big, so when the
counter's batch is changed at runtime to a smaller value for better
accuracy, there may also be a requirement to reduce the deviation that
has already accumulated.

So add a percpu-counter sync function to be run on each CPU.

Link: http://lkml.kernel.org/r/1594389708-60781-4-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reported-by: kernel test robot <rong.a.chen@intel.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tim Chen <tim.c.chen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/percpu_counter.h |    4 ++++
 lib/percpu_counter.c           |   19 +++++++++++++++++++
 2 files changed, 23 insertions(+)

--- a/include/linux/percpu_counter.h~percpu_counter-add-percpu_counter_sync
+++ a/include/linux/percpu_counter.h
@@ -44,6 +44,7 @@ void percpu_counter_add_batch(struct per
 			      s32 batch);
 s64 __percpu_counter_sum(struct percpu_counter *fbc);
 int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch);
+void percpu_counter_sync(struct percpu_counter *fbc);
 
 static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
 {
@@ -172,6 +173,9 @@ static inline bool percpu_counter_initia
 	return true;
 }
 
+static inline void percpu_counter_sync(struct percpu_counter *fbc)
+{
+}
 #endif	/* CONFIG_SMP */
 
 static inline void percpu_counter_inc(struct percpu_counter *fbc)
--- a/lib/percpu_counter.c~percpu_counter-add-percpu_counter_sync
+++ a/lib/percpu_counter.c
@@ -99,6 +99,25 @@ void percpu_counter_add_batch(struct per
 EXPORT_SYMBOL(percpu_counter_add_batch);
 
 /*
+ * For percpu_counter with a big batch, the deviation of its count could
+ * be big, and there is requirement to reduce the deviation, like when the
+ * counter's batch could be runtime decreased to get a better accuracy,
+ * which can be achieved by running this sync function on each CPU.
+ */
+void percpu_counter_sync(struct percpu_counter *fbc)
+{
+	unsigned long flags;
+	s64 count;
+
+	raw_spin_lock_irqsave(&fbc->lock, flags);
+	count = __this_cpu_read(*fbc->counters);
+	fbc->count += count;
+	__this_cpu_sub(*fbc->counters, count);
+	raw_spin_unlock_irqrestore(&fbc->lock, flags);
+}
+EXPORT_SYMBOL(percpu_counter_sync);
+
+/*
  * Add up all the per-cpu counts, return the result.  This is a more accurate
  * but much slower version of percpu_counter_read_positive()
  */
_
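What percpu_counter_sync() achieves when it is run on every CPU can be modelled in user space as follows.  This is a sketch under assumptions: the model is single-threaded, the spinlock is omitted, the "CPUs" are just array slots, and all sketch_* names are hypothetical.  Each call folds one CPU's pending delta into the shared total and clears it, so after all CPUs have run it the shared total is exact.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS	4

struct sketch_counter {
	int64_t count;			/* shared total */
	int32_t local[SKETCH_NR_CPUS];	/* per-CPU pending deltas */
};

/* Model of one CPU running the sync: fold its delta into the total. */
static void sketch_sync_cpu(struct sketch_counter *c, int cpu)
{
	int32_t pending = c->local[cpu];

	c->count += pending;
	c->local[cpu] -= pending;
}

/* Stand-in for "run the sync function on each CPU". */
static void sketch_sync_all(struct sketch_counter *c)
{
	int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sketch_sync_cpu(c, cpu);
}
```

The point of doing this after shrinking the batch is that deltas accumulated under the old, larger batch could otherwise stay below any folding threshold that was in effect when they were added, leaving the shared total off by more than the new batch would allow.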
From: Feng Tang <feng.tang@intel.com>
Subject: mm: adjust vm_committed_as_batch according to vm overcommit policy

When checking a performance change for will-it-scale scalability mmap test
[1], we found very high lock contention for spinlock of percpu counter
'vm_committed_as':

    94.14%     0.35%  [kernel.kallsyms]         [k] _raw_spin_lock_irqsave
    48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
    45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

Actually this heavy lock contention is not always necessary.  The
'vm_committed_as' needs to be very precise when the strict
OVERCOMMIT_NEVER policy is set, which requires a rather small batch number
for the percpu counter.

So keep 'batch' number unchanged for strict OVERCOMMIT_NEVER policy, and
lift it to 64X for OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS policies.  Also
add a sysctl handler to adjust it when the policy is reconfigured.

Benchmark with the same testcase in [1] shows 53% improvement on an 8C/16T
desktop, and 2097% (20X) on a 4S/72C/144T server.  We tested with test
platforms in 0day (server, desktop and laptop), and 80%+ platforms show
improvements with that test.  And whether it shows improvements depends on
whether the test mmap size is bigger than the batch number computed.

And if the lift is 16X, 1/3 of the platforms will show improvements,
though it should help the mmap/unmap usage generally, as Michal Hocko
mentioned:

: I believe that there are non-synthetic workloads which would benefit from
: a larger batch.  E.g. large in memory databases which do large mmaps
: during startups from multiple threads.

[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/ Link: http://lkml.kernel.org/r/1589611660-89854-4-git-send-email-feng.tang@intel.com Link: http://lkml.kernel.org/r/1592725000-73486-4-git-send-email-feng.tang@intel.com Link: http://lkml.kernel.org/r/1594389708-60781-5-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Qian Cai <cai@lca.pw> Cc: Kees Cook <keescook@chromium.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: kernel test robot <rong.a.chen@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/mm.h | 2 ++ include/linux/mman.h | 4 ++++ kernel/sysctl.c | 2 +- mm/mm_init.c | 20 +++++++++++++++----- mm/util.c | 41 +++++++++++++++++++++++++++++++++++++++++ 5 files changed, 63 insertions(+), 6 deletions(-) --- a/include/linux/mman.h~mm-adjust-vm_committed_as_batch-according-to-vm-overcommit-policy +++ a/include/linux/mman.h @@ -57,8 +57,12 @@ extern struct percpu_counter vm_committe #ifdef CONFIG_SMP extern s32 vm_committed_as_batch; +extern void mm_compute_batch(int overcommit_policy); #else #define vm_committed_as_batch 0 +static inline void mm_compute_batch(int overcommit_policy) +{ +} #endif unsigned long vm_memory_committed(void); --- a/include/linux/mm.h~mm-adjust-vm_committed_as_batch-according-to-vm-overcommit-policy +++ a/include/linux/mm.h @@ -206,6 +206,8 @@ int overcommit_ratio_handler(struct ctl_ loff_t *); int overcommit_kbytes_handler(struct ctl_table *, int, void *, size_t *, loff_t *); +int 
overcommit_policy_handler(struct ctl_table *, int, void *, size_t *, + loff_t *); #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n)) --- a/kernel/sysctl.c~mm-adjust-vm_committed_as_batch-according-to-vm-overcommit-policy +++ a/kernel/sysctl.c @@ -2671,7 +2671,7 @@ static struct ctl_table vm_table[] = { .data = &sysctl_overcommit_memory, .maxlen = sizeof(sysctl_overcommit_memory), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = overcommit_policy_handler, .extra1 = SYSCTL_ZERO, .extra2 = &two, }, --- a/mm/mm_init.c~mm-adjust-vm_committed_as_batch-according-to-vm-overcommit-policy +++ a/mm/mm_init.c @@ -13,6 +13,7 @@ #include <linux/memory.h> #include <linux/notifier.h> #include <linux/sched.h> +#include <linux/mman.h> #include "internal.h" #ifdef CONFIG_DEBUG_MEMORY_INIT @@ -144,14 +145,23 @@ EXPORT_SYMBOL_GPL(mm_kobj); #ifdef CONFIG_SMP s32 vm_committed_as_batch = 32; -static void __meminit mm_compute_batch(void) +void mm_compute_batch(int overcommit_policy) { u64 memsized_batch; s32 nr = num_present_cpus(); s32 batch = max_t(s32, nr*2, 32); + unsigned long ram_pages = totalram_pages(); - /* batch size set to 0.4% of (total memory/#cpus), or max int32 */ - memsized_batch = min_t(u64, (totalram_pages()/nr)/256, 0x7fffffff); + /* + * For policy OVERCOMMIT_NEVER, set batch size to 0.4% of + * (total memory/#cpus), and lift it to 25% for other policies + * to easy the possible lock contention for percpu_counter + * vm_committed_as, while the max limit is INT_MAX + */ + if (overcommit_policy == OVERCOMMIT_NEVER) + memsized_batch = min_t(u64, ram_pages/nr/256, INT_MAX); + else + memsized_batch = min_t(u64, ram_pages/nr/4, INT_MAX); vm_committed_as_batch = max_t(s32, memsized_batch, batch); } @@ -162,7 +172,7 @@ static int __meminit mm_compute_batch_no switch (action) { case MEM_ONLINE: case MEM_OFFLINE: - mm_compute_batch(); + mm_compute_batch(sysctl_overcommit_memory); default: break; } @@ -176,7 +186,7 @@ static struct notifier_block 
compute_bat static int __init mm_compute_batch_init(void) { - mm_compute_batch(); + mm_compute_batch(sysctl_overcommit_memory); register_hotmemory_notifier(&compute_batch_nb); return 0; --- a/mm/util.c~mm-adjust-vm_committed_as_batch-according-to-vm-overcommit-policy +++ a/mm/util.c @@ -746,6 +746,47 @@ int overcommit_ratio_handler(struct ctl_ return ret; } +static void sync_overcommit_as(struct work_struct *dummy) +{ + percpu_counter_sync(&vm_committed_as); +} + +int overcommit_policy_handler(struct ctl_table *table, int write, void *buffer, + size_t *lenp, loff_t *ppos) +{ + struct ctl_table t; + int new_policy; + int ret; + + /* + * The deviation of sync_overcommit_as could be big with loose policy + * like OVERCOMMIT_ALWAYS/OVERCOMMIT_GUESS. When changing policy to + * strict OVERCOMMIT_NEVER, we need to reduce the deviation to comply + * with the strict "NEVER", and to avoid possible race condtion (even + * though user usually won't too frequently do the switching to policy + * OVERCOMMIT_NEVER), the switch is done in the following order: + * 1. changing the batch + * 2. sync percpu count on each CPU + * 3. switch the policy + */ + if (write) { + t = *table; + t.data = &new_policy; + ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos); + if (ret) + return ret; + + mm_compute_batch(new_policy); + if (new_policy == OVERCOMMIT_NEVER) + schedule_on_each_cpu(sync_overcommit_as); + sysctl_overcommit_memory = new_policy; + } else { + ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); + } + + return ret; +} + int overcommit_kbytes_handler(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { _
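
The batch calculation in mm_compute_batch() above can be checked outside the kernel. The following is a minimal user-space sketch, not kernel code: model_compute_batch() and the MODEL_OVERCOMMIT_* names are hypothetical, the function returns the batch instead of storing it in vm_committed_as_batch, and the caller passes in the page and CPU counts that the kernel reads from totalram_pages() and num_present_cpus(). The MODEL_OVERCOMMIT_* values are assumed to mirror the kernel's OVERCOMMIT_GUESS/ALWAYS/NEVER numbering.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins for OVERCOMMIT_GUESS/ALWAYS/NEVER. */
enum {
	MODEL_OVERCOMMIT_GUESS  = 0,
	MODEL_OVERCOMMIT_ALWAYS = 1,
	MODEL_OVERCOMMIT_NEVER  = 2,
};

/*
 * Model of the patched mm_compute_batch(): the per-CPU share of RAM is
 * divided by 256 (about 0.4%) for the strict policy and by 4 (25%) for
 * the loose ones, capped at INT_MAX, with a floor of max(2 * cpus, 32).
 */
int32_t model_compute_batch(uint64_t ram_pages, int32_t nr_cpus, int policy)
{
	int32_t floor_batch;
	uint64_t divisor, memsized_batch;

	assert(nr_cpus > 0);	/* avoid dividing by zero in the model */

	floor_batch = nr_cpus * 2 > 32 ? nr_cpus * 2 : 32;
	divisor = (policy == MODEL_OVERCOMMIT_NEVER) ? 256 : 4;

	memsized_batch = ram_pages / (uint64_t)nr_cpus / divisor;
	if (memsized_batch > INT_MAX)
		memsized_batch = INT_MAX;

	return (int32_t)memsized_batch > floor_batch ?
	       (int32_t)memsized_batch : floor_batch;
}
```

For 4194304 pages (16 GiB of 4 KiB pages) on 16 CPUs, the model yields 1024 for the strict policy and 65536 for the other two, which is the 64X lift described in the changelog.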
From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: mm/sparsemem: enable vmem_altmap support in vmemmap_populate_basepages()

Patch series "arm64: Enable vmemmap mapping from device memory", v4.

This series enables vmemmap backing memory allocation from device memory ranges on arm64. But before that, it enables vmemmap_populate_basepages() and vmemmap_alloc_block_buf() to accommodate struct vmem_altmap based allocation requests.

This patch (of 3):

vmemmap_populate_basepages() is used across platforms to allocate backing memory for the vmemmap mapping. It is used as the standard default choice or as a fallback when an intended huge page allocation fails. It simply creates the entire vmemmap mapping with base pages (PAGE_SIZE).

On arm64 platforms, vmemmap_populate_basepages() is called instead of the platform specific vmemmap_populate() when ARM64_SWAPPER_USES_SECTION_MAPS is not enabled, as is the case for the ARM64_16K_PAGES and ARM64_64K_PAGES configs.

At present vmemmap_populate_basepages() does not support allocating from a driver defined struct vmem_altmap while trying to create the vmemmap mapping for a device memory range. This prevents the ARM64_16K_PAGES and ARM64_64K_PAGES configs on arm64 from supporting device memory with a vmem_altmap request.

This enables vmem_altmap support in vmemmap_populate_basepages(), unlocking device memory allocation for the vmemmap mapping on arm64 platforms with 16K or 64K base page configs.

Each architecture should evaluate and decide on subscribing to device memory based base page allocation through vmemmap_populate_basepages(). Hence let's keep it disabled on all archs in order to preserve the existing semantics. A subsequent patch enables it on arm64.

Link: http://lkml.kernel.org/r/1594004178-8861-1-git-send-email-anshuman.khandual@arm.com Link: http://lkml.kernel.org/r/1594004178-8861-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Tested-by: Jia He <justin.he@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. 
Peter Anvin" <hpa@zytor.com> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/arm64/mm/mmu.c | 2 +- arch/ia64/mm/discontig.c | 2 +- arch/riscv/mm/init.c | 2 +- arch/x86/mm/init_64.c | 6 +++--- include/linux/mm.h | 5 +++-- mm/sparse-vmemmap.c | 16 +++++++++++----- 6 files changed, 20 insertions(+), 13 deletions(-) --- a/arch/arm64/mm/mmu.c~mm-sparsemem-enable-vmem_altmap-support-in-vmemmap_populate_basepages +++ a/arch/arm64/mm/mmu.c @@ -1070,7 +1070,7 @@ static void free_empty_tables(unsigned l int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, struct vmem_altmap *altmap) { - return vmemmap_populate_basepages(start, end, node); + return vmemmap_populate_basepages(start, end, node, NULL); } #else /* !ARM64_SWAPPER_USES_SECTION_MAPS */ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, --- a/arch/ia64/mm/discontig.c~mm-sparsemem-enable-vmem_altmap-support-in-vmemmap_populate_basepages +++ a/arch/ia64/mm/discontig.c @@ -655,7 +655,7 @@ void arch_refresh_nodedata(int update_no int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, struct vmem_altmap *altmap) { - return vmemmap_populate_basepages(start, end, node); + return vmemmap_populate_basepages(start, end, node, NULL); } void vmemmap_free(unsigned long start, unsigned long end, --- a/arch/riscv/mm/init.c~mm-sparsemem-enable-vmem_altmap-support-in-vmemmap_populate_basepages +++ a/arch/riscv/mm/init.c @@ -554,6 +554,6 @@ void __init paging_init(void) int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, struct vmem_altmap *altmap) { - return vmemmap_populate_basepages(start, end, node); + return 
vmemmap_populate_basepages(start, end, node, NULL); } #endif --- a/arch/x86/mm/init_64.c~mm-sparsemem-enable-vmem_altmap-support-in-vmemmap_populate_basepages +++ a/arch/x86/mm/init_64.c @@ -1545,7 +1545,7 @@ static int __meminit vmemmap_populate_hu vmemmap_verify((pte_t *)pmd, node, addr, next); continue; } - if (vmemmap_populate_basepages(addr, next, node)) + if (vmemmap_populate_basepages(addr, next, node, NULL)) return -ENOMEM; } return 0; @@ -1557,7 +1557,7 @@ int __meminit vmemmap_populate(unsigned int err; if (end - start < PAGES_PER_SECTION * sizeof(struct page)) - err = vmemmap_populate_basepages(start, end, node); + err = vmemmap_populate_basepages(start, end, node, NULL); else if (boot_cpu_has(X86_FEATURE_PSE)) err = vmemmap_populate_hugepages(start, end, node, altmap); else if (altmap) { @@ -1565,7 +1565,7 @@ int __meminit vmemmap_populate(unsigned __func__); err = -ENOMEM; } else - err = vmemmap_populate_basepages(start, end, node); + err = vmemmap_populate_basepages(start, end, node, NULL); if (!err) sync_global_pgds(start, end - 1); return err; --- a/include/linux/mm.h~mm-sparsemem-enable-vmem_altmap-support-in-vmemmap_populate_basepages +++ a/include/linux/mm.h @@ -2978,14 +2978,15 @@ pgd_t *vmemmap_pgd_populate(unsigned lon p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node); pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node); pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node); -pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node); +pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node, + struct vmem_altmap *altmap); void *vmemmap_alloc_block(unsigned long size, int node); struct vmem_altmap; void *vmemmap_alloc_block_buf(unsigned long size, int node); void *altmap_alloc_block_buf(unsigned long size, struct vmem_altmap *altmap); void vmemmap_verify(pte_t *, int, unsigned long, unsigned long); int vmemmap_populate_basepages(unsigned long start, unsigned long end, 
- int node); + int node, struct vmem_altmap *altmap); int vmemmap_populate(unsigned long start, unsigned long end, int node, struct vmem_altmap *altmap); void vmemmap_populate_print_last(void); --- a/mm/sparse-vmemmap.c~mm-sparsemem-enable-vmem_altmap-support-in-vmemmap_populate_basepages +++ a/mm/sparse-vmemmap.c @@ -139,12 +139,18 @@ void __meminit vmemmap_verify(pte_t *pte start, end - 1); } -pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node) +pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node, + struct vmem_altmap *altmap) { pte_t *pte = pte_offset_kernel(pmd, addr); if (pte_none(*pte)) { pte_t entry; - void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node); + void *p; + + if (altmap) + p = altmap_alloc_block_buf(PAGE_SIZE, altmap); + else + p = vmemmap_alloc_block_buf(PAGE_SIZE, node); if (!p) return NULL; entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL); @@ -212,8 +218,8 @@ pgd_t * __meminit vmemmap_pgd_populate(u return pgd; } -int __meminit vmemmap_populate_basepages(unsigned long start, - unsigned long end, int node) +int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long end, + int node, struct vmem_altmap *altmap) { unsigned long addr = start; pgd_t *pgd; @@ -235,7 +241,7 @@ int __meminit vmemmap_populate_basepages pmd = vmemmap_pmd_populate(pud, addr, node); if (!pmd) return -ENOMEM; - pte = vmemmap_pte_populate(pmd, addr, node); + pte = vmemmap_pte_populate(pmd, addr, node, altmap); if (!pte) return -ENOMEM; vmemmap_verify(pte, node, addr, addr + PAGE_SIZE); _
From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: mm/sparsemem: enable vmem_altmap support in vmemmap_alloc_block_buf()

There are many instances where vmemmap allocation is switched between regular memory and device memory just based on whether altmap is available or not. vmemmap_alloc_block_buf() is used on various platforms to allocate vmemmap mappings. Let's also enable it to handle altmap based device memory allocation along with the existing regular memory allocations. This will help in avoiding the altmap based allocation switch in many places. To summarize, there are two different ways to call vmemmap_alloc_block_buf():

    vmemmap_alloc_block_buf(size, node, NULL)   /* Allocate from system RAM */
    vmemmap_alloc_block_buf(size, node, altmap) /* Allocate from altmap */

This converts altmap_alloc_block_buf() into a static function, drops its entry from the header and updates Documentation/vm/memory-model.rst.

Link: http://lkml.kernel.org/r/1594004178-8861-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Jia He <justin.he@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: "Kirill A.
Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- Documentation/vm/memory-model.rst | 2 +- arch/arm64/mm/mmu.c | 2 +- arch/powerpc/mm/init_64.c | 4 ++-- arch/x86/mm/init_64.c | 5 +---- include/linux/mm.h | 4 ++-- mm/sparse-vmemmap.c | 28 +++++++++++++--------------- 6 files changed, 20 insertions(+), 25 deletions(-) --- a/arch/arm64/mm/mmu.c~mm-sparsemem-enable-vmem_altmap-support-in-vmemmap_alloc_block_buf +++ a/arch/arm64/mm/mmu.c @@ -1102,7 +1102,7 @@ int __meminit vmemmap_populate(unsigned if (pmd_none(READ_ONCE(*pmdp))) { void *p = NULL; - p = vmemmap_alloc_block_buf(PMD_SIZE, node); + p = vmemmap_alloc_block_buf(PMD_SIZE, node, NULL); if (!p) return -ENOMEM; --- a/arch/powerpc/mm/init_64.c~mm-sparsemem-enable-vmem_altmap-support-in-vmemmap_alloc_block_buf +++ a/arch/powerpc/mm/init_64.c @@ -225,12 +225,12 @@ int __meminit vmemmap_populate(unsigned * fall back to system memory if the altmap allocation fail. 
*/ if (altmap && !altmap_cross_boundary(altmap, start, page_size)) { - p = altmap_alloc_block_buf(page_size, altmap); + p = vmemmap_alloc_block_buf(page_size, node, altmap); if (!p) pr_debug("altmap block allocation failed, falling back to system memory"); } if (!p) - p = vmemmap_alloc_block_buf(page_size, node); + p = vmemmap_alloc_block_buf(page_size, node, NULL); if (!p) return -ENOMEM; --- a/arch/x86/mm/init_64.c~mm-sparsemem-enable-vmem_altmap-support-in-vmemmap_alloc_block_buf +++ a/arch/x86/mm/init_64.c @@ -1515,10 +1515,7 @@ static int __meminit vmemmap_populate_hu if (pmd_none(*pmd)) { void *p; - if (altmap) - p = altmap_alloc_block_buf(PMD_SIZE, altmap); - else - p = vmemmap_alloc_block_buf(PMD_SIZE, node); + p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap); if (p) { pte_t entry; --- a/Documentation/vm/memory-model.rst~mm-sparsemem-enable-vmem_altmap-support-in-vmemmap_alloc_block_buf +++ a/Documentation/vm/memory-model.rst @@ -178,7 +178,7 @@ for persistent memory devices in pre-all devices. This storage is represented with :c:type:`struct vmem_altmap` that is eventually passed to vmemmap_populate() through a long chain of function calls. The vmemmap_populate() implementation may use the -`vmem_altmap` along with :c:func:`altmap_alloc_block_buf` helper to +`vmem_altmap` along with :c:func:`vmemmap_alloc_block_buf` helper to allocate memory map on the persistent memory device. 
ZONE_DEVICE --- a/include/linux/mm.h~mm-sparsemem-enable-vmem_altmap-support-in-vmemmap_alloc_block_buf +++ a/include/linux/mm.h @@ -2982,8 +2982,8 @@ pte_t *vmemmap_pte_populate(pmd_t *pmd, struct vmem_altmap *altmap); void *vmemmap_alloc_block(unsigned long size, int node); struct vmem_altmap; -void *vmemmap_alloc_block_buf(unsigned long size, int node); -void *altmap_alloc_block_buf(unsigned long size, struct vmem_altmap *altmap); +void *vmemmap_alloc_block_buf(unsigned long size, int node, + struct vmem_altmap *altmap); void vmemmap_verify(pte_t *, int, unsigned long, unsigned long); int vmemmap_populate_basepages(unsigned long start, unsigned long end, int node, struct vmem_altmap *altmap); --- a/mm/sparse-vmemmap.c~mm-sparsemem-enable-vmem_altmap-support-in-vmemmap_alloc_block_buf +++ a/mm/sparse-vmemmap.c @@ -69,11 +69,19 @@ void * __meminit vmemmap_alloc_block(uns __pa(MAX_DMA_ADDRESS)); } +static void * __meminit altmap_alloc_block_buf(unsigned long size, + struct vmem_altmap *altmap); + /* need to make sure size is all the same during early stage */ -void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node) +void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node, + struct vmem_altmap *altmap) { - void *ptr = sparse_buffer_alloc(size); + void *ptr; + + if (altmap) + return altmap_alloc_block_buf(size, altmap); + ptr = sparse_buffer_alloc(size); if (!ptr) ptr = vmemmap_alloc_block(size, node); return ptr; @@ -94,15 +102,8 @@ static unsigned long __meminit vmem_altm return 0; } -/** - * altmap_alloc_block_buf - allocate pages from the device page map - * @altmap: device page map - * @size: size (in bytes) of the allocation - * - * Allocations are aligned to the size of the request. 
- */ -void * __meminit altmap_alloc_block_buf(unsigned long size, - struct vmem_altmap *altmap) +static void * __meminit altmap_alloc_block_buf(unsigned long size, + struct vmem_altmap *altmap) { unsigned long pfn, nr_pfns, nr_align; @@ -147,10 +148,7 @@ pte_t * __meminit vmemmap_pte_populate(p pte_t entry; void *p; - if (altmap) - p = altmap_alloc_block_buf(PAGE_SIZE, altmap); - else - p = vmemmap_alloc_block_buf(PAGE_SIZE, node); + p = vmemmap_alloc_block_buf(PAGE_SIZE, node, altmap); if (!p) return NULL; entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL); _
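
The dispatch that vmemmap_alloc_block_buf() gains in the patch above can be sketched in user space. This is a simplified model under stated assumptions, not the kernel implementation: every model_* name and MODEL_PAGE_SIZE is hypothetical, the altmap is reduced to a page budget with no alignment handling, and "system RAM" is just a page counter. It illustrates one point visible in the diff: when an altmap is passed, a failed altmap allocation is reported to the caller and is not silently replaced by a system memory allocation.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size for the model */

enum model_source { MODEL_SRC_NONE, MODEL_SRC_SYSTEM, MODEL_SRC_ALTMAP };

/* Reduced stand-in for struct vmem_altmap: a budget and a usage count. */
struct model_altmap {
	unsigned long free;	/* pages set aside in the device range */
	unsigned long alloc;	/* pages handed out so far */
};

/* Allocate from the device-provided budget; fail when it is exhausted. */
enum model_source model_altmap_alloc(unsigned long size,
				     struct model_altmap *altmap)
{
	unsigned long nr_pages = size / MODEL_PAGE_SIZE;

	assert(altmap->alloc <= altmap->free);
	if (size % MODEL_PAGE_SIZE)
		return MODEL_SRC_NONE;	/* only whole pages are supported */
	if (nr_pages > altmap->free - altmap->alloc)
		return MODEL_SRC_NONE;
	altmap->alloc += nr_pages;
	return MODEL_SRC_ALTMAP;
}

/*
 * Model of the unified entry point: a non-NULL altmap selects the device
 * budget, NULL selects system memory (tracked in *system_pages).
 */
enum model_source model_alloc_block_buf(unsigned long size,
					struct model_altmap *altmap,
					unsigned long *system_pages)
{
	if (altmap)
		return model_altmap_alloc(size, altmap);

	*system_pages += size / MODEL_PAGE_SIZE;
	return MODEL_SRC_SYSTEM;
}
```

A caller that wants a fallback has to ask for it explicitly by retrying with a NULL altmap, which is what the powerpc hunk in this patch does.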
From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: arm64/mm: enable vmem_altmap support for vmemmap mappings

Device memory ranges, when hot added into ZONE_DEVICE, might require their vmemmap mapping's backing memory to be allocated from their own range instead of consuming system memory. This prevents large system memory usage for potentially large device memory ranges. The device driver communicates this request via the vmem_altmap structure. The architecture needs to take this request into account while creating and tearing down vmemmap mappings.

This enables vmem_altmap support in vmemmap_populate() and vmemmap_free(), which includes vmemmap_populate_basepages() used for the ARM64_16K_PAGES and ARM64_64K_PAGES configs.

Link: http://lkml.kernel.org/r/1594004178-8861-4-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Jia He <justin.he@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Kirill A.
Shutemov" <kirill.shutemov@linux.intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/arm64/mm/mmu.c | 58 +++++++++++++++++++++++++++--------------- 1 file changed, 38 insertions(+), 20 deletions(-) --- a/arch/arm64/mm/mmu.c~arm64-mm-enable-vmem_altmap-support-for-vmemmap-mappings +++ a/arch/arm64/mm/mmu.c @@ -761,15 +761,20 @@ int kern_addr_valid(unsigned long addr) } #ifdef CONFIG_MEMORY_HOTPLUG -static void free_hotplug_page_range(struct page *page, size_t size) +static void free_hotplug_page_range(struct page *page, size_t size, + struct vmem_altmap *altmap) { - WARN_ON(PageReserved(page)); - free_pages((unsigned long)page_address(page), get_order(size)); + if (altmap) { + vmem_altmap_free(altmap, size >> PAGE_SHIFT); + } else { + WARN_ON(PageReserved(page)); + free_pages((unsigned long)page_address(page), get_order(size)); + } } static void free_hotplug_pgtable_page(struct page *page) { - free_hotplug_page_range(page, PAGE_SIZE); + free_hotplug_page_range(page, PAGE_SIZE, NULL); } static bool pgtable_range_aligned(unsigned long start, unsigned long end, @@ -792,7 +797,8 @@ static bool pgtable_range_aligned(unsign } static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr, - unsigned long end, bool free_mapped) + unsigned long end, bool free_mapped, + struct vmem_altmap *altmap) { pte_t *ptep, pte; @@ -806,12 +812,14 @@ static void unmap_hotplug_pte_range(pmd_ pte_clear(&init_mm, addr, ptep); flush_tlb_kernel_range(addr, addr + PAGE_SIZE); if (free_mapped) - 
free_hotplug_page_range(pte_page(pte), PAGE_SIZE); + free_hotplug_page_range(pte_page(pte), + PAGE_SIZE, altmap); } while (addr += PAGE_SIZE, addr < end); } static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr, - unsigned long end, bool free_mapped) + unsigned long end, bool free_mapped, + struct vmem_altmap *altmap) { unsigned long next; pmd_t *pmdp, pmd; @@ -834,16 +842,17 @@ static void unmap_hotplug_pmd_range(pud_ flush_tlb_kernel_range(addr, addr + PAGE_SIZE); if (free_mapped) free_hotplug_page_range(pmd_page(pmd), - PMD_SIZE); + PMD_SIZE, altmap); continue; } WARN_ON(!pmd_table(pmd)); - unmap_hotplug_pte_range(pmdp, addr, next, free_mapped); + unmap_hotplug_pte_range(pmdp, addr, next, free_mapped, altmap); } while (addr = next, addr < end); } static void unmap_hotplug_pud_range(p4d_t *p4dp, unsigned long addr, - unsigned long end, bool free_mapped) + unsigned long end, bool free_mapped, + struct vmem_altmap *altmap) { unsigned long next; pud_t *pudp, pud; @@ -866,16 +875,17 @@ static void unmap_hotplug_pud_range(p4d_ flush_tlb_kernel_range(addr, addr + PAGE_SIZE); if (free_mapped) free_hotplug_page_range(pud_page(pud), - PUD_SIZE); + PUD_SIZE, altmap); continue; } WARN_ON(!pud_table(pud)); - unmap_hotplug_pmd_range(pudp, addr, next, free_mapped); + unmap_hotplug_pmd_range(pudp, addr, next, free_mapped, altmap); } while (addr = next, addr < end); } static void unmap_hotplug_p4d_range(pgd_t *pgdp, unsigned long addr, - unsigned long end, bool free_mapped) + unsigned long end, bool free_mapped, + struct vmem_altmap *altmap) { unsigned long next; p4d_t *p4dp, p4d; @@ -888,16 +898,24 @@ static void unmap_hotplug_p4d_range(pgd_ continue; WARN_ON(!p4d_present(p4d)); - unmap_hotplug_pud_range(p4dp, addr, next, free_mapped); + unmap_hotplug_pud_range(p4dp, addr, next, free_mapped, altmap); } while (addr = next, addr < end); } static void unmap_hotplug_range(unsigned long addr, unsigned long end, - bool free_mapped) + bool free_mapped, struct vmem_altmap 
*altmap) { unsigned long next; pgd_t *pgdp, pgd; + /* + * altmap can only be used as vmemmap mapping backing memory. + * In case the backing memory itself is not being freed, then + * altmap is irrelevant. Warn about this inconsistency when + * encountered. + */ + WARN_ON(!free_mapped && altmap); + do { next = pgd_addr_end(addr, end); pgdp = pgd_offset_k(addr); @@ -906,7 +924,7 @@ static void unmap_hotplug_range(unsigned continue; WARN_ON(!pgd_present(pgd)); - unmap_hotplug_p4d_range(pgdp, addr, next, free_mapped); + unmap_hotplug_p4d_range(pgdp, addr, next, free_mapped, altmap); } while (addr = next, addr < end); } @@ -1070,7 +1088,7 @@ static void free_empty_tables(unsigned l int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, struct vmem_altmap *altmap) { - return vmemmap_populate_basepages(start, end, node, NULL); + return vmemmap_populate_basepages(start, end, node, altmap); } #else /* !ARM64_SWAPPER_USES_SECTION_MAPS */ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node, @@ -1102,7 +1120,7 @@ int __meminit vmemmap_populate(unsigned if (pmd_none(READ_ONCE(*pmdp))) { void *p = NULL; - p = vmemmap_alloc_block_buf(PMD_SIZE, node, NULL); + p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap); if (!p) return -ENOMEM; @@ -1120,7 +1138,7 @@ void vmemmap_free(unsigned long start, u #ifdef CONFIG_MEMORY_HOTPLUG WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END)); - unmap_hotplug_range(start, end, true); + unmap_hotplug_range(start, end, true, altmap); free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END); #endif } @@ -1411,7 +1429,7 @@ static void __remove_pgd_mapping(pgd_t * WARN_ON(pgdir != init_mm.pgd); WARN_ON((start < PAGE_OFFSET) || (end > PAGE_END)); - unmap_hotplug_range(start, end, false); + unmap_hotplug_range(start, end, false, NULL); free_empty_tables(start, end, PAGE_OFFSET, PAGE_END); } _
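
The teardown side of the arm64 patch above can be modeled the same way. The sketch below is a user-space illustration only: the model_* names, MODEL_PAGE_SHIFT and the statistics structure are hypothetical, the page table walk is collapsed into a loop over base pages, and the kernel's WARN_ON() is replaced by a counter. It shows the two rules the diff introduces: pages backed by an altmap are returned to the altmap accounting rather than to the page allocator, and passing an altmap without free_mapped is flagged as inconsistent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Reduced stand-in for struct vmem_altmap: only the usage count. */
struct model_altmap {
	unsigned long alloc;
};

struct model_stats {
	unsigned long system_pages_freed;	/* pages given back to RAM */
	unsigned long warnings;			/* stand-in for WARN_ON() hits */
};

/* Model of free_hotplug_page_range(): pick the owner of the pages. */
void model_free_page_range(size_t size, struct model_altmap *altmap,
			   struct model_stats *st)
{
	unsigned long nr_pages = size >> MODEL_PAGE_SHIFT;

	if (altmap) {
		assert(altmap->alloc >= nr_pages);
		altmap->alloc -= nr_pages;
	} else {
		st->system_pages_freed += nr_pages;
	}
}

/* Model of unmap_hotplug_range() with the walk flattened to base pages. */
void model_unmap_range(unsigned long start, unsigned long end,
		       bool free_mapped, struct model_altmap *altmap,
		       struct model_stats *st)
{
	unsigned long addr;

	/* An altmap only matters when the backing memory is being freed. */
	if (!free_mapped && altmap)
		st->warnings++;

	for (addr = start; addr < end; addr += MODEL_PAGE_SIZE)
		if (free_mapped)
			model_free_page_range(MODEL_PAGE_SIZE, altmap, st);
}
```

In the patch itself, vmemmap_free() is the caller that passes both free_mapped and the altmap, while __remove_pgd_mapping() passes false and NULL.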
From: Miaohe Lin <linmiaohe@huawei.com> Subject: mm: mmap: merge vma after call_mmap() if possible The vm_flags may be changed after call_mmap() because drivers may set some flags for their own purpose. As a result, we failed to merge the adjacent vma due to the different vm_flags as userspace can't pass in the same one. Try to merge vma after call_mmap() to fix this issue. Link: http://lkml.kernel.org/r/1594954065-23733-1-git-send-email-linmiaohe@huawei.com Signed-off-by: Hongxiang Lou <louhongxiang@huawei.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/mmap.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) --- a/mm/mmap.c~mm-mmap-merge-vma-after-call_mmap-if-possible +++ a/mm/mmap.c @@ -1690,7 +1690,7 @@ unsigned long mmap_region(struct file *f struct list_head *uf) { struct mm_struct *mm = current->mm; - struct vm_area_struct *vma, *prev; + struct vm_area_struct *vma, *prev, *merge; int error; struct rb_node **rb_link, *rb_parent; unsigned long charged = 0; @@ -1774,6 +1774,25 @@ unsigned long mmap_region(struct file *f if (error) goto unmap_and_free_vma; + /* If vm_flags changed after call_mmap(), we should try merge vma again + * as we may succeed this time. + */ + if (unlikely(vm_flags != vma->vm_flags && prev)) { + merge = vma_merge(mm, prev, vma->vm_start, vma->vm_end, vma->vm_flags, + NULL, vma->vm_file, vma->vm_pgoff, NULL, NULL_VM_UFFD_CTX); + if (merge) { + fput(file); + vm_area_free(vma); + vma = merge; + /* Update vm_flags and possible addr to pick up the change. We don't + * warn here if addr changed as the vma is not linked by vma_link(). + */ + addr = vma->vm_start; + vm_flags = vma->vm_flags; + goto unmap_writable; + } + } + /* Can addr have changed?? 
* * Answer: Yes, several device drivers can do it in their @@ -1796,6 +1815,7 @@ unsigned long mmap_region(struct file *f vma_link(mm, vma, prev, rb_link, rb_parent); /* Once vma denies write, undo our temporary denial count */ if (file) { +unmap_writable: if (vm_flags & VM_SHARED) mapping_unmap_writable(file->f_mapping); if (vm_flags & VM_DENYWRITE) _
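
The retry added to mmap_region() in the patch above can be illustrated with a toy model. This is a sketch under loud assumptions: the model_* names are hypothetical, a VMA is reduced to an address range plus a flags word, and the merge test only compares adjacency and flags, whereas the kernel's vma_merge() also checks the file, page offset, anon_vma and more. The driver's mmap callback is modeled as OR-ing extra bits into the flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-in for struct vm_area_struct. */
struct model_vma {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/* Simplified merge test: adjacent to prev and identical flags. */
bool model_can_merge(const struct model_vma *prev, unsigned long start,
		     unsigned long flags)
{
	return prev && prev->end == start && prev->flags == flags;
}

/*
 * Returns true when the new range was absorbed into prev. The first
 * attempt uses the flags derived from the request; the second attempt
 * only runs if the (modeled) driver callback changed the flags.
 */
bool model_mmap_region(struct model_vma *prev, struct model_vma *vma,
		       unsigned long requested_flags,
		       unsigned long driver_flags)
{
	assert(vma->start < vma->end);

	vma->flags = requested_flags;
	if (model_can_merge(prev, vma->start, vma->flags)) {
		prev->end = vma->end;
		return true;
	}

	vma->flags |= driver_flags;	/* stand-in for call_mmap() */

	if (vma->flags != requested_flags &&
	    model_can_merge(prev, vma->start, vma->flags)) {
		prev->end = vma->end;
		return true;
	}
	return false;
}
```

Without the second attempt, a neighbouring mapping created by the same driver could never be merged in this model, because userspace has no way to request the driver-set bits up front.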
From: Peter Collingbourne <pcc@google.com> Subject: mm: remove unnecessary wrapper function do_mmap_pgoff() The current split between do_mmap() and do_mmap_pgoff() was introduced in commit 1fcfd8db7f82 ("mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff()") to support MPX. The wrapper function do_mmap_pgoff() always passed 0 as the value of the vm_flags argument to do_mmap(). However, MPX support has subsequently been removed from the kernel and there were no more direct callers of do_mmap(); all calls were going via do_mmap_pgoff(). Simplify the code by removing do_mmap_pgoff() and changing all callers to directly call do_mmap(), which now no longer takes a vm_flags argument. Link: http://lkml.kernel.org/r/20200727194109.1371462-1-pcc@google.com Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- fs/aio.c | 6 +++--- fs/hugetlbfs/inode.c | 2 +- include/linux/fs.h | 2 +- include/linux/mm.h | 12 +----------- ipc/shm.c | 2 +- mm/mmap.c | 16 ++++++++-------- mm/nommu.c | 6 +++--- mm/shmem.c | 2 +- mm/util.c | 4 ++-- 9 files changed, 21 insertions(+), 31 deletions(-) --- a/fs/aio.c~mm-remove-unnecessary-wrapper-function-do_mmap_pgoff +++ a/fs/aio.c @@ -525,9 +525,9 @@ static int aio_setup_ring(struct kioctx return -EINTR; } - ctx->mmap_base = do_mmap_pgoff(ctx->aio_ring_file, 0, ctx->mmap_size, - PROT_READ | PROT_WRITE, - MAP_SHARED, 0, &unused, NULL); + ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size, + PROT_READ | PROT_WRITE, + MAP_SHARED, 0, &unused, NULL); mmap_write_unlock(mm); if (IS_ERR((void *)ctx->mmap_base)) { ctx->mmap_size = 0; --- a/fs/hugetlbfs/inode.c~mm-remove-unnecessary-wrapper-function-do_mmap_pgoff +++ a/fs/hugetlbfs/inode.c @@ -140,7 +140,7 @@ static int hugetlbfs_file_mmap(struct fi * already been checked by prepare_hugepage_range. 
If you add * any error returns here, do so after setting VM_HUGETLB, so * is_vm_hugetlb_page tests below unmap_region go the right - * way when do_mmap_pgoff unwinds (may be important on powerpc + * way when do_mmap unwinds (may be important on powerpc * and ia64). */ vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND; --- a/include/linux/fs.h~mm-remove-unnecessary-wrapper-function-do_mmap_pgoff +++ a/include/linux/fs.h @@ -528,7 +528,7 @@ static inline int mapping_mapped(struct /* * Might pages of this file have been modified in userspace? - * Note that i_mmap_writable counts all VM_SHARED vmas: do_mmap_pgoff + * Note that i_mmap_writable counts all VM_SHARED vmas: do_mmap * marks vma as VM_SHARED if it is shared, and the file was opened for * writing i.e. vma may be mprotected writable even if now readonly. * --- a/include/linux/mm.h~mm-remove-unnecessary-wrapper-function-do_mmap_pgoff +++ a/include/linux/mm.h @@ -2546,23 +2546,13 @@ extern unsigned long mmap_region(struct struct list_head *uf); extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, - vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, - struct list_head *uf); + unsigned long pgoff, unsigned long *populate, struct list_head *uf); extern int __do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf, bool downgrade); extern int do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf); extern int do_madvise(unsigned long start, size_t len_in, int behavior); -static inline unsigned long -do_mmap_pgoff(struct file *file, unsigned long addr, - unsigned long len, unsigned long prot, unsigned long flags, - unsigned long pgoff, unsigned long *populate, - struct list_head *uf) -{ - return do_mmap(file, addr, len, prot, flags, 0, pgoff, populate, uf); -} - #ifdef CONFIG_MMU extern int __mm_populate(unsigned long addr, unsigned long len, int ignore_errors); --- 
a/ipc/shm.c~mm-remove-unnecessary-wrapper-function-do_mmap_pgoff +++ a/ipc/shm.c @@ -1558,7 +1558,7 @@ long do_shmat(int shmid, char __user *sh goto invalid; } - addr = do_mmap_pgoff(file, addr, size, prot, flags, 0, &populate, NULL); + addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL); *raddr = addr; err = 0; if (IS_ERR_VALUE(addr)) --- a/mm/mmap.c~mm-remove-unnecessary-wrapper-function-do_mmap_pgoff +++ a/mm/mmap.c @@ -1030,7 +1030,7 @@ static inline int is_mergeable_anon_vma( * anon_vmas, nor if same anon_vma is assigned but offsets incompatible. * * We don't check here for the merged mmap wrapping around the end of pagecache - * indices (16TB on ia32) because do_mmap_pgoff() does not permit mmap's which + * indices (16TB on ia32) because do_mmap() does not permit mmap's which * wrap, nor mmaps which cover the final page at index -1UL. */ static int @@ -1365,11 +1365,11 @@ static inline bool file_mmap_ok(struct f */ unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, - unsigned long flags, vm_flags_t vm_flags, - unsigned long pgoff, unsigned long *populate, - struct list_head *uf) + unsigned long flags, unsigned long pgoff, + unsigned long *populate, struct list_head *uf) { struct mm_struct *mm = current->mm; + vm_flags_t vm_flags; int pkey = 0; *populate = 0; @@ -1431,7 +1431,7 @@ unsigned long do_mmap(struct file *file, * to. we assume access permissions have been handled by the open * of the memory object, so we don't do any here. */ - vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | + vm_flags = calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; if (flags & MAP_LOCKED) @@ -2230,7 +2230,7 @@ get_unmapped_area(struct file *file, uns /* * mmap_region() will call shmem_zero_setup() to create a file, * so use shmem's get_unmapped_area in case it can be huge. - * do_mmap_pgoff() will clear pgoff, so match alignment. 
+ * do_mmap() will clear pgoff, so match alignment. */ pgoff = 0; get_area = shmem_get_unmapped_area; @@ -3003,7 +3003,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsign } file = get_file(vma->vm_file); - ret = do_mmap_pgoff(vma->vm_file, start, size, + ret = do_mmap(vma->vm_file, start, size, prot, flags, pgoff, &populate, NULL); fput(file); out: @@ -3223,7 +3223,7 @@ int insert_vm_struct(struct mm_struct *m * By setting it to reflect the virtual start address of the * vma, merges and splits can happen in a seamless way, just * using the existing file pgoff checks and manipulations. - * Similarly in do_mmap_pgoff and in do_brk. + * Similarly in do_mmap and in do_brk. */ if (vma_is_anonymous(vma)) { BUG_ON(vma->anon_vma); --- a/mm/nommu.c~mm-remove-unnecessary-wrapper-function-do_mmap_pgoff +++ a/mm/nommu.c @@ -1078,7 +1078,6 @@ unsigned long do_mmap(struct file *file, unsigned long len, unsigned long prot, unsigned long flags, - vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf) @@ -1086,6 +1085,7 @@ unsigned long do_mmap(struct file *file, struct vm_area_struct *vma; struct vm_region *region; struct rb_node *rb; + vm_flags_t vm_flags; unsigned long capabilities, result; int ret; @@ -1104,7 +1104,7 @@ unsigned long do_mmap(struct file *file, /* we've determined that we can make the mapping, now translate what we * now know into VMA flags */ - vm_flags |= determine_vm_flags(file, prot, flags, capabilities); + vm_flags = determine_vm_flags(file, prot, flags, capabilities); /* we're going to need to record the mapping */ region = kmem_cache_zalloc(vm_region_jar, GFP_KERNEL); @@ -1763,7 +1763,7 @@ EXPORT_SYMBOL_GPL(access_process_vm); * * Check the shared mappings on an inode on behalf of a shrinking truncate to * make sure that that any outstanding VMAs aren't broken and then shrink the - * vm_regions that extend that beyond so that do_mmap_pgoff() doesn't + * vm_regions that extend that beyond so that do_mmap() doesn't * 
automatically grant mappings that are too large. */ int nommu_shrink_inode_mappings(struct inode *inode, size_t size, --- a/mm/shmem.c~mm-remove-unnecessary-wrapper-function-do_mmap_pgoff +++ a/mm/shmem.c @@ -4245,7 +4245,7 @@ EXPORT_SYMBOL_GPL(shmem_file_setup_with_ /** * shmem_zero_setup - setup a shared anonymous mapping - * @vma: the vma to be mmapped is prepared by do_mmap_pgoff + * @vma: the vma to be mmapped is prepared by do_mmap */ int shmem_zero_setup(struct vm_area_struct *vma) { --- a/mm/util.c~mm-remove-unnecessary-wrapper-function-do_mmap_pgoff +++ a/mm/util.c @@ -503,8 +503,8 @@ unsigned long vm_mmap_pgoff(struct file if (!ret) { if (mmap_write_lock_killable(mm)) return -EINTR; - ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff, - &populate, &uf); + ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate, + &uf); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); if (populate) _
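The shape of the do_mmap_pgoff() cleanup above can be modelled in plain user-space C. This is a minimal sketch, not kernel code: every model_* name and MODEL_* value is a hypothetical stand-in. Only the pattern mirrors the patch: a wrapper that always forwarded 0 is dropped, and "vm_flags |=" becomes "vm_flags =" because vm_flags is now an uninitialised local rather than a parameter.

```c
#include <assert.h>

/* Hypothetical stand-ins for vm_flags_t and a few VM_* bits. */
typedef unsigned long model_vm_flags_t;
#define MODEL_VM_READ    0x1UL
#define MODEL_VM_WRITE   0x2UL
#define MODEL_VM_MAYREAD 0x10UL

/* Hypothetical stand-in for calc_vm_prot_bits(). */
model_vm_flags_t model_prot_bits(unsigned long prot)
{
	return prot & (MODEL_VM_READ | MODEL_VM_WRITE);
}

/* Before: the caller passes vm_flags in and more bits are ORed on top. */
model_vm_flags_t model_do_mmap_old(unsigned long prot, model_vm_flags_t vm_flags)
{
	vm_flags |= model_prot_bits(prot) | MODEL_VM_MAYREAD;
	return vm_flags;
}

/* The removed wrapper: it only ever forwarded vm_flags == 0. */
model_vm_flags_t model_do_mmap_pgoff(unsigned long prot)
{
	return model_do_mmap_old(prot, 0);
}

/* After: vm_flags is a local, so it has to be assigned, not ORed into. */
model_vm_flags_t model_do_mmap_new(unsigned long prot)
{
	model_vm_flags_t vm_flags;

	vm_flags = model_prot_bits(prot) | MODEL_VM_MAYREAD;
	return vm_flags;
}

/* The old wrapper and the new function agree for every prot value tried. */
void model_self_check(void)
{
	unsigned long prot;

	for (prot = 0; prot < 8; prot++)
		assert(model_do_mmap_pgoff(prot) == model_do_mmap_new(prot));
}
```

Keeping "|=" in the new function would read an indeterminate value, which is why the patch changes the operator in both mm/mmap.c and mm/nommu.c.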
From: Wei Yang <richard.weiyang@linux.alibaba.com> Subject: mm/mremap: it is sure to have enough space when extent meets requirement Patch series "mm/mremap: cleanup move_page_tables() a little", v5. move_page_tables() tries to move page tables by PMD or by PTE. To move a whole PMD, both the old and the new range must be PMD aligned. However, the current code calculates the old range and the new range separately, which leads to some redundant checks and calculations. This cleanup consolidates the range check in one place to reduce the extra range handling. This patch (of 3): old_end is passed to move_huge_pmd() and move_normal_pmd() to check whether there is enough space to do the move, but this check is already done before invoking these functions. They are only invoked when extent meets the requirement, and there is one check before invoking them: if (extent > old_end - old_addr) extent = old_end - old_addr; This implies (old_end - old_addr) won't fail the check in these two functions. Link: http://lkml.kernel.org/r/20200710092835.56368-1-richard.weiyang@linux.alibaba.com Link: http://lkml.kernel.org/r/20200710092835.56368-2-richard.weiyang@linux.alibaba.com Link: http://lkml.kernel.org/r/20200708095028.41706-1-richard.weiyang@linux.alibaba.com Link: http://lkml.kernel.org/r/20200708095028.41706-2-richard.weiyang@linux.alibaba.com Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Tested-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Kirill A. 
Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Peter Xu <peterx@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/huge_mm.h | 2 +- mm/huge_memory.c | 7 ++----- mm/mremap.c | 10 ++++------ 3 files changed, 7 insertions(+), 12 deletions(-) --- a/include/linux/huge_mm.h~mm-mremap-it-is-sure-to-have-enough-space-when-extent-meets-requirement +++ a/include/linux/huge_mm.h @@ -42,7 +42,7 @@ extern int mincore_huge_pmd(struct vm_ar unsigned long addr, unsigned long end, unsigned char *vec); extern bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, - unsigned long new_addr, unsigned long old_end, + unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd); extern int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, pgprot_t newprot, --- a/mm/huge_memory.c~mm-mremap-it-is-sure-to-have-enough-space-when-extent-meets-requirement +++ a/mm/huge_memory.c @@ -1722,17 +1722,14 @@ static pmd_t move_soft_dirty_pmd(pmd_t p } bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, - unsigned long new_addr, unsigned long old_end, - pmd_t *old_pmd, pmd_t *new_pmd) + unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd) { spinlock_t *old_ptl, *new_ptl; pmd_t pmd; struct mm_struct *mm = vma->vm_mm; bool force_flush = false; - if ((old_addr & ~HPAGE_PMD_MASK) || - (new_addr & ~HPAGE_PMD_MASK) || - old_end - old_addr < HPAGE_PMD_SIZE) + if ((old_addr & ~HPAGE_PMD_MASK) || (new_addr & ~HPAGE_PMD_MASK)) return false; /* --- 
a/mm/mremap.c~mm-mremap-it-is-sure-to-have-enough-space-when-extent-meets-requirement +++ a/mm/mremap.c @@ -193,15 +193,13 @@ static void move_ptes(struct vm_area_str #ifdef CONFIG_HAVE_MOVE_PMD static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr, - unsigned long new_addr, unsigned long old_end, - pmd_t *old_pmd, pmd_t *new_pmd) + unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd) { spinlock_t *old_ptl, *new_ptl; struct mm_struct *mm = vma->vm_mm; pmd_t pmd; - if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK) - || old_end - old_addr < PMD_SIZE) + if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)) return false; /* @@ -292,7 +290,7 @@ unsigned long move_page_tables(struct vm if (need_rmap_locks) take_rmap_locks(vma); moved = move_huge_pmd(vma, old_addr, new_addr, - old_end, old_pmd, new_pmd); + old_pmd, new_pmd); if (need_rmap_locks) drop_rmap_locks(vma); if (moved) @@ -312,7 +310,7 @@ unsigned long move_page_tables(struct vm if (need_rmap_locks) take_rmap_locks(vma); moved = move_normal_pmd(vma, old_addr, new_addr, - old_end, old_pmd, new_pmd); + old_pmd, new_pmd); if (need_rmap_locks) drop_rmap_locks(vma); if (moved) _
From: Wei Yang <richard.weiyang@linux.alibaba.com> Subject: mm/mremap: calculate extent in one place Page tables are moved on a PMD basis, which requires both the source and the destination range to meet the alignment requirement. The current code works because move_huge_pmd() and move_normal_pmd() check old_addr and new_addr again and fall back to move_ptes() if either of them is not aligned. Instead of calculating the extent separately, it is better to calculate it in one place, so we know up front when it is not necessary to try moving a PMD. This makes the logic a little clearer. Link: http://lkml.kernel.org/r/20200708095028.41706-3-richard.weiyang@linux.alibaba.com Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Tested-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/mremap.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) --- a/mm/mremap.c~mm-mremap-calculate-extent-in-one-place +++ a/mm/mremap.c @@ -277,6 +277,9 @@ unsigned long move_page_tables(struct vm extent = next - old_addr; if (extent > old_end - old_addr) extent = old_end - old_addr; + next = (new_addr + PMD_SIZE) & PMD_MASK; + if (extent > next - new_addr) + extent = next - new_addr; old_pmd = get_old_pmd(vma->vm_mm, old_addr); if (!old_pmd) continue; @@ -320,9 +323,6 @@ unsigned long move_page_tables(struct vm if (pte_alloc(new_vma->vm_mm, new_pmd)) break; - next = (new_addr + PMD_SIZE) & PMD_MASK; - if (extent > next - new_addr) - extent = next - new_addr; move_ptes(vma, 
old_pmd, old_addr, old_addr + extent, new_vma, new_pmd, new_addr, need_rmap_locks); } _
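The consolidated extent calculation in the patch above can be tried out in user space. The following is a minimal sketch under stated assumptions: SKETCH_PMD_SIZE is fixed at 2 MiB (the x86-64 value with 4 KiB pages), and sketch_step_extent() is a hypothetical helper that mirrors only the arithmetic of the loop body in move_page_tables(), not its locking or page-table handling.

```c
#include <assert.h>

/* Assumed for illustration: 2 MiB PMDs, as on x86-64 with 4 KiB pages. */
#define SKETCH_PMD_SIZE (1UL << 21)
#define SKETCH_PMD_MASK (~(SKETCH_PMD_SIZE - 1))

/*
 * Largest step that stays inside the remaining source range and does not
 * cross a PMD boundary on either the source or the destination side.
 */
unsigned long sketch_step_extent(unsigned long old_addr, unsigned long old_end,
				 unsigned long new_addr)
{
	unsigned long next, extent;

	assert(old_addr < old_end);

	/* Distance to the next PMD boundary on the source side. */
	next = (old_addr + SKETCH_PMD_SIZE) & SKETCH_PMD_MASK;
	extent = next - old_addr;
	if (extent > old_end - old_addr)
		extent = old_end - old_addr;

	/* The same clamp on the destination side, now done up front. */
	next = (new_addr + SKETCH_PMD_SIZE) & SKETCH_PMD_MASK;
	if (extent > next - new_addr)
		extent = next - new_addr;

	return extent;
}
```

With both clamps applied before the PMD move is attempted, the helper returns a full SKETCH_PMD_SIZE only when old_addr and new_addr are both PMD aligned and at least one PMD of source range remains. Any misalignment on either side yields a smaller step, so the PMD fast path is not tried at all.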
From: Wei Yang <richard.weiyang@linux.alibaba.com> Subject: mm/mremap: start addresses are properly aligned After previous cleanup, extent is the minimal step for both source and destination. This means when extent is HPAGE_PMD_SIZE or PMD_SIZE, old_addr and new_addr are properly aligned too. Since these two functions are only invoked in move_page_tables, it is safe to remove the check now. Link: http://lkml.kernel.org/r/20200708095028.41706-4-richard.weiyang@linux.alibaba.com Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Tested-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/huge_memory.c | 3 --- mm/mremap.c | 3 --- 2 files changed, 6 deletions(-) --- a/mm/huge_memory.c~mm-mremap-start-addresses-are-properly-aligned +++ a/mm/huge_memory.c @@ -1729,9 +1729,6 @@ bool move_huge_pmd(struct vm_area_struct struct mm_struct *mm = vma->vm_mm; bool force_flush = false; - if ((old_addr & ~HPAGE_PMD_MASK) || (new_addr & ~HPAGE_PMD_MASK)) - return false; - /* * The destination pmd shouldn't be established, free_pgtables() * should have release it. --- a/mm/mremap.c~mm-mremap-start-addresses-are-properly-aligned +++ a/mm/mremap.c @@ -199,9 +199,6 @@ static bool move_normal_pmd(struct vm_ar struct mm_struct *mm = vma->vm_mm; pmd_t pmd; - if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)) - return false; - /* * The destination pmd shouldn't be established, free_pgtables() * should have released it. _
From: Ricardo Cañuelo <ricardo.canuelo@collabora.com> Subject: selftests: add mincore() tests Add a test suite for the mincore() syscall. It tests most of its use cases as well as its interface. Tests implemented: - basic interface test - behavior on anonymous mappings - behavior on anonymous mappings with huge tlb pages - file-backed mapping with a regular file - file-backed mapping with a tmpfs file Link: http://lkml.kernel.org/r/20200728100450.4065-1-ricardo.canuelo@collabora.com Signed-off-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- tools/testing/selftests/Makefile | 1 tools/testing/selftests/mincore/.gitignore | 2 tools/testing/selftests/mincore/Makefile | 6 tools/testing/selftests/mincore/mincore_selftest.c | 361 +++++++++++ 4 files changed, 370 insertions(+) --- a/tools/testing/selftests/Makefile~selftests-add-mincore-tests +++ a/tools/testing/selftests/Makefile @@ -32,6 +32,7 @@ TARGETS += lkdtm TARGETS += membarrier TARGETS += memfd TARGETS += memory-hotplug +TARGETS += mincore TARGETS += mount TARGETS += mqueue TARGETS += net --- /dev/null +++ a/tools/testing/selftests/mincore/.gitignore @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0+ +mincore_selftest --- /dev/null +++ a/tools/testing/selftests/mincore/Makefile @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0+ + +CFLAGS += -Wall + +TEST_GEN_PROGS := mincore_selftest +include ../lib.mk --- /dev/null +++ a/tools/testing/selftests/mincore/mincore_selftest.c @@ -0,0 +1,361 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * kselftest suite for mincore(). + * + * Copyright (C) 2020 Collabora, Ltd. 
+ */ + +#define _GNU_SOURCE + +#include <stdio.h> +#include <errno.h> +#include <unistd.h> +#include <stdlib.h> +#include <sys/mman.h> +#include <string.h> +#include <fcntl.h> +#include <string.h> + +#include "../kselftest.h" +#include "../kselftest_harness.h" + +/* Default test file size: 4MB */ +#define MB (1UL << 20) +#define FILE_SIZE (4 * MB) + + +/* + * Tests the user interface. This test triggers most of the documented + * error conditions in mincore(). + */ +TEST(basic_interface) +{ + int retval; + int page_size; + unsigned char vec[1]; + char *addr; + + page_size = sysconf(_SC_PAGESIZE); + + /* Query a 0 byte sized range */ + retval = mincore(0, 0, vec); + EXPECT_EQ(0, retval); + + /* Addresses in the specified range are invalid or unmapped */ + errno = 0; + retval = mincore(NULL, page_size, vec); + EXPECT_EQ(-1, retval); + EXPECT_EQ(ENOMEM, errno); + + errno = 0; + addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_ANONYMOUS, -1, 0); + ASSERT_NE(MAP_FAILED, addr) { + TH_LOG("mmap error: %s", strerror(errno)); + } + + /* <addr> argument is not page-aligned */ + errno = 0; + retval = mincore(addr + 1, page_size, vec); + EXPECT_EQ(-1, retval); + EXPECT_EQ(EINVAL, errno); + + /* <length> argument is too large */ + errno = 0; + retval = mincore(addr, -1, vec); + EXPECT_EQ(-1, retval); + EXPECT_EQ(ENOMEM, errno); + + /* <vec> argument points to an illegal address */ + errno = 0; + retval = mincore(addr, page_size, NULL); + EXPECT_EQ(-1, retval); + EXPECT_EQ(EFAULT, errno); + munmap(addr, page_size); +} + + +/* + * Test mincore() behavior on a private anonymous page mapping. + * Check that the page is not loaded into memory right after the mapping + * but after accessing it (on-demand allocation). + * Then free the page and check that it's not memory-resident. 
+ */ +TEST(check_anonymous_locked_pages) +{ + unsigned char vec[1]; + char *addr; + int retval; + int page_size; + + page_size = sysconf(_SC_PAGESIZE); + + /* Map one page and check it's not memory-resident */ + errno = 0; + addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + ASSERT_NE(MAP_FAILED, addr) { + TH_LOG("mmap error: %s", strerror(errno)); + } + retval = mincore(addr, page_size, vec); + ASSERT_EQ(0, retval); + ASSERT_EQ(0, vec[0]) { + TH_LOG("Page found in memory before use"); + } + + /* Touch the page and check again. It should now be in memory */ + addr[0] = 1; + mlock(addr, page_size); + retval = mincore(addr, page_size, vec); + ASSERT_EQ(0, retval); + ASSERT_EQ(1, vec[0]) { + TH_LOG("Page not found in memory after use"); + } + + /* + * It shouldn't be memory-resident after unlocking it and + * marking it as unneeded. + */ + munlock(addr, page_size); + madvise(addr, page_size, MADV_DONTNEED); + retval = mincore(addr, page_size, vec); + ASSERT_EQ(0, retval); + ASSERT_EQ(0, vec[0]) { + TH_LOG("Page in memory after being zapped"); + } + munmap(addr, page_size); +} + + +/* + * Check mincore() behavior on huge pages. + * This test will be skipped if the mapping fails (ie. if there are no + * huge pages available). + * + * Make sure the system has at least one free huge page, check + * "HugePages_Free" in /proc/meminfo. + * Increment /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages if + * needed. 
+ */ +TEST(check_huge_pages) +{ + unsigned char vec[1]; + char *addr; + int retval; + int page_size; + + page_size = sysconf(_SC_PAGESIZE); + + errno = 0; + addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, + -1, 0); + if (addr == MAP_FAILED) { + if (errno == ENOMEM) + SKIP(return, "No huge pages available."); + else + TH_LOG("mmap error: %s", strerror(errno)); + } + retval = mincore(addr, page_size, vec); + ASSERT_EQ(0, retval); + ASSERT_EQ(0, vec[0]) { + TH_LOG("Page found in memory before use"); + } + + addr[0] = 1; + mlock(addr, page_size); + retval = mincore(addr, page_size, vec); + ASSERT_EQ(0, retval); + ASSERT_EQ(1, vec[0]) { + TH_LOG("Page not found in memory after use"); + } + + munlock(addr, page_size); + munmap(addr, page_size); +} + + +/* + * Test mincore() behavior on a file-backed page. + * No pages should be loaded into memory right after the mapping. Then, + * accessing any address in the mapping range should load the page + * containing the address and a number of subsequent pages (readahead). + * + * The actual readahead settings depend on the test environment, so we + * can't make a lot of assumptions about that. This test covers the most + * general cases. 
+ */ +TEST(check_file_mmap) +{ + unsigned char *vec; + int vec_size; + char *addr; + int retval; + int page_size; + int fd; + int i; + int ra_pages = 0; + + page_size = sysconf(_SC_PAGESIZE); + vec_size = FILE_SIZE / page_size; + if (FILE_SIZE % page_size) + vec_size++; + + vec = calloc(vec_size, sizeof(unsigned char)); + ASSERT_NE(NULL, vec) { + TH_LOG("Can't allocate array"); + } + + errno = 0; + fd = open(".", O_TMPFILE | O_RDWR, 0600); + ASSERT_NE(-1, fd) { + TH_LOG("Can't create temporary file: %s", + strerror(errno)); + } + errno = 0; + retval = fallocate(fd, 0, 0, FILE_SIZE); + ASSERT_EQ(0, retval) { + TH_LOG("Error allocating space for the temporary file: %s", + strerror(errno)); + } + + /* + * Map the whole file, the pages shouldn't be fetched yet. + */ + errno = 0; + addr = mmap(NULL, FILE_SIZE, PROT_READ | PROT_WRITE, + MAP_SHARED, fd, 0); + ASSERT_NE(MAP_FAILED, addr) { + TH_LOG("mmap error: %s", strerror(errno)); + } + retval = mincore(addr, FILE_SIZE, vec); + ASSERT_EQ(0, retval); + for (i = 0; i < vec_size; i++) { + ASSERT_EQ(0, vec[i]) { + TH_LOG("Unexpected page in memory"); + } + } + + /* + * Touch a page in the middle of the mapping. We expect the next + * few pages (the readahead window) to be populated too. + */ + addr[FILE_SIZE / 2] = 1; + retval = mincore(addr, FILE_SIZE, vec); + ASSERT_EQ(0, retval); + ASSERT_EQ(1, vec[FILE_SIZE / 2 / page_size]) { + TH_LOG("Page not found in memory after use"); + } + + i = FILE_SIZE / 2 / page_size + 1; + while (i < vec_size && vec[i]) { + ra_pages++; + i++; + } + EXPECT_GT(ra_pages, 0) { + TH_LOG("No read-ahead pages found in memory"); + } + + EXPECT_LT(i, vec_size) { + TH_LOG("Read-ahead pages reached the end of the file"); + } + /* + * End of the readahead window. The rest of the pages shouldn't + * be in memory. 
+ */ + if (i < vec_size) { + while (i < vec_size && !vec[i]) + i++; + EXPECT_EQ(vec_size, i) { + TH_LOG("Unexpected page in memory beyond readahead window"); + } + } + + munmap(addr, FILE_SIZE); + close(fd); + free(vec); +} + + +/* + * Test mincore() behavior on a page backed by a tmpfs file. This test + * performs the same steps as the previous one. However, we don't expect + * any readahead in this case. + */ +TEST(check_tmpfs_mmap) +{ + unsigned char *vec; + int vec_size; + char *addr; + int retval; + int page_size; + int fd; + int i; + int ra_pages = 0; + + page_size = sysconf(_SC_PAGESIZE); + vec_size = FILE_SIZE / page_size; + if (FILE_SIZE % page_size) + vec_size++; + + vec = calloc(vec_size, sizeof(unsigned char)); + ASSERT_NE(NULL, vec) { + TH_LOG("Can't allocate array"); + } + + errno = 0; + fd = open("/dev/shm", O_TMPFILE | O_RDWR, 0600); + ASSERT_NE(-1, fd) { + TH_LOG("Can't create temporary file: %s", + strerror(errno)); + } + errno = 0; + retval = fallocate(fd, 0, 0, FILE_SIZE); + ASSERT_EQ(0, retval) { + TH_LOG("Error allocating space for the temporary file: %s", + strerror(errno)); + } + + /* + * Map the whole file, the pages shouldn't be fetched yet. + */ + errno = 0; + addr = mmap(NULL, FILE_SIZE, PROT_READ | PROT_WRITE, + MAP_SHARED, fd, 0); + ASSERT_NE(MAP_FAILED, addr) { + TH_LOG("mmap error: %s", strerror(errno)); + } + retval = mincore(addr, FILE_SIZE, vec); + ASSERT_EQ(0, retval); + for (i = 0; i < vec_size; i++) { + ASSERT_EQ(0, vec[i]) { + TH_LOG("Unexpected page in memory"); + } + } + + /* + * Touch a page in the middle of the mapping. We expect only + * that page to be fetched into memory. 
+ */ + addr[FILE_SIZE / 2] = 1; + retval = mincore(addr, FILE_SIZE, vec); + ASSERT_EQ(0, retval); + ASSERT_EQ(1, vec[FILE_SIZE / 2 / page_size]) { + TH_LOG("Page not found in memory after use"); + } + + i = FILE_SIZE / 2 / page_size + 1; + while (i < vec_size && vec[i]) { + ra_pages++; + i++; + } + ASSERT_EQ(ra_pages, 0) { + TH_LOG("Read-ahead pages found in memory"); + } + + munmap(addr, FILE_SIZE); + close(fd); + free(vec); +} + +TEST_HARNESS_MAIN _
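Outside the kselftest harness, the residency vector that these tests inspect can be summarised with a small helper. This is a minimal sketch, assuming Linux and glibc: sketch_resident_pages() is a hypothetical name, and it relies only on mincore(), sysconf(), calloc() and free() as documented in their man pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Count the pages of [addr, addr + len) that mincore() reports as
 * resident. Returns -1 if the vector cannot be allocated or if
 * mincore() fails (for example because addr is not page aligned).
 */
long sketch_resident_pages(void *addr, size_t len)
{
	long page_size = sysconf(_SC_PAGESIZE);
	size_t nr_pages, i;
	unsigned char *vec;
	long resident = 0;

	assert(page_size > 0);

	/* mincore() fills one byte per page, rounding len up to a page. */
	nr_pages = (len + (size_t)page_size - 1) / (size_t)page_size;
	vec = calloc(nr_pages ? nr_pages : 1, sizeof(*vec));
	if (!vec)
		return -1;

	if (mincore(addr, len, vec) != 0) {
		free(vec);
		return -1;
	}

	/* Only the least significant bit of each byte is defined. */
	for (i = 0; i < nr_pages; i++)
		if (vec[i] & 1)
			resident++;

	free(vec);
	return resident;
}
```

A caller would typically touch part of a mapping and compare the count before and after. As the test comments above note, the exact number for file-backed mappings depends on the readahead settings of the environment, so only loose bounds are safe to assert.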
From: Wei Yang <richard.weiyang@linux.alibaba.com> Subject: mm/sparse: never partially remove memmap for early section For early sections, the memmap is handled specially even when sub-section hotplug is enabled: the memmap can only be populated as a whole. Quoted from the comment of section_activate(): * The early init code does not consider partially populated * initial sections, it simply assumes that memory will never be * referenced. If we hot-add memory into such a section then we * do not need to populate the memmap and can simply reuse what * is already there. However, the current section_deactivate() breaks this rule. When a sub-section is hot-removed, section_deactivate() depopulates its memmap. As a consequence, if this sub-section is hot-added again, its memmap never gets properly populated. The problem can be reproduced with the following steps: 1. Hack qemu to allow a sub-section early section : diff --git a/hw/i386/pc.c b/hw/i386/pc.c : index 51b3050d01..c6a78d83c0 100644 : --- a/hw/i386/pc.c : +++ b/hw/i386/pc.c : @@ -1010,7 +1010,7 @@ void pc_memory_init(PCMachineState *pcms, : } : : machine->device_memory->base = : - ROUND_UP(0x100000000ULL + x86ms->above_4g_mem_size, 1 * GiB); : + 0x100000000ULL + x86ms->above_4g_mem_size; : : if (pcmc->enforce_aligned_dimm) { : /* size device region assuming 1G page max alignment per slot */ 2. Boot qemu with PSE disabled and a sub-section aligned memory size. Part of the qemu command would look like this: sudo x86_64-softmmu/qemu-system-x86_64 \ --enable-kvm -cpu host,pse=off \ -m 4160M,maxmem=20G,slots=1 \ -smp sockets=2,cores=16 \ -numa node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 \ -machine pc,nvdimm \ -nographic \ -object memory-backend-ram,id=mem0,size=8G \ -device nvdimm,id=vm0,memdev=mem0,node=0,addr=0x144000000,label-size=128k 3. 
Re-config a pmem device with sub-section size in guest ndctl create-namespace --force --reconfig=namespace0.0 --mode=devdax --size=16M Then you would see the following call trace: pmem0: detected capacity change from 0 to 16777216 BUG: unable to handle page fault for address: ffffec73c51000b4 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 81ff8067 P4D 81ff8067 PUD 81ff7067 PMD 1437cb067 PTE 0 Oops: 0002 [#1] SMP NOPTI CPU: 16 PID: 1348 Comm: ndctl Kdump: loaded Tainted: G W 5.8.0-rc2+ #24 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.4 RIP: 0010:memmap_init_zone+0x154/0x1c2 Code: 77 16 f6 40 10 02 74 10 48 03 48 08 48 89 cb 48 c1 eb 0c e9 3a ff ff ff 48 89 df 48 c1 e7 06 48f RSP: 0018:ffffbdc7011a39b0 EFLAGS: 00010282 RAX: ffffec73c5100088 RBX: 0000000000144002 RCX: 0000000000144000 RDX: 0000000000000004 RSI: 007ffe0000000000 RDI: ffffec73c5100080 RBP: 027ffe0000000000 R08: 0000000000000001 R09: ffff9f8d38f6d708 R10: ffffec73c0000000 R11: 0000000000000000 R12: 0000000000000004 R13: 0000000000000001 R14: 0000000000144200 R15: 0000000000000000 FS: 00007efe6b65d780(0000) GS:ffff9f8d3f780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffec73c51000b4 CR3: 000000007d718000 CR4: 0000000000340ee0 Call Trace: move_pfn_range_to_zone+0x128/0x150 memremap_pages+0x4e4/0x5a0 devm_memremap_pages+0x1e/0x60 dev_dax_probe+0x69/0x160 [device_dax] really_probe+0x298/0x3c0 driver_probe_device+0xe1/0x150 ? 
driver_allows_async_probing+0x50/0x50 bus_for_each_drv+0x7e/0xc0 __device_attach+0xdf/0x160 bus_probe_device+0x8e/0xa0 device_add+0x3b9/0x740 __devm_create_dev_dax+0x127/0x1c0 __dax_pmem_probe+0x1f2/0x219 [dax_pmem_core] dax_pmem_probe+0xc/0x1b [dax_pmem] nvdimm_bus_probe+0x69/0x1c0 [libnvdimm] really_probe+0x147/0x3c0 driver_probe_device+0xe1/0x150 device_driver_attach+0x53/0x60 bind_store+0xd1/0x110 kernfs_fop_write+0xce/0x1b0 vfs_write+0xb6/0x1a0 ksys_write+0x5f/0xe0 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Link: http://lkml.kernel.org/r/20200625223534.18024-1-richard.weiyang@linux.alibaba.com Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/sparse.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) --- a/mm/sparse.c~mm-sparse-never-partially-remove-memmap-for-early-section +++ a/mm/sparse.c @@ -824,10 +824,14 @@ static void section_deactivate(unsigned ms->section_mem_map &= ~SECTION_HAS_MEM_MAP; } - if (section_is_early && memmap) - free_map_bootmem(memmap); - else + /* + * The memmap of early sections is always fully populated. See + * section_activate() and pfn_valid() . + */ + if (!section_is_early) depopulate_section_memmap(pfn, nr_pages, altmap); + else if (memmap) + free_map_bootmem(memmap); if (empty) ms->section_mem_map = (unsigned long)NULL; _
From: Wei Yang <richard.weiyang@linux.alibaba.com> Subject: mm/sparse: only sub-section aligned range would be populated There are two code paths that invoke __populate_section_memmap(): * sparse_init_nid() * sparse_add_section() In both cases, we are sure the memory range is sub-section aligned: * we pass PAGES_PER_SECTION to sparse_init_nid() * we check the range with check_pfn_span() before calling sparse_add_section() Also, in the counterpart of __populate_section_memmap(), we don't do such a calculation and check, since the range is already checked by check_pfn_span() in __remove_pages(). Remove the calculation and check to keep the function simple and consistent with its counterpart. Link: http://lkml.kernel.org/r/20200703031828.14645-1-richard.weiyang@linux.alibaba.com Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/sparse-vmemmap.c | 18 +++++------------- 1 file changed, 5 insertions(+), 13 deletions(-) --- a/mm/sparse-vmemmap.c~mm-sparse-only-sub-section-aligned-range-would-be-populated +++ a/mm/sparse-vmemmap.c @@ -251,20 +251,12 @@ int __meminit vmemmap_populate_basepages struct page * __meminit __populate_section_memmap(unsigned long pfn, unsigned long nr_pages, int nid, struct vmem_altmap *altmap) { - unsigned long start; - unsigned long end; + unsigned long start = (unsigned long) pfn_to_page(pfn); + unsigned long end = start + nr_pages * sizeof(struct page); - /* - * The minimum granularity of memmap extensions is - * PAGES_PER_SUBSECTION as allocations are tracked in the - * 'subsection_map' bitmap of the section. 
- */ - end = ALIGN(pfn + nr_pages, PAGES_PER_SUBSECTION); - pfn &= PAGE_SUBSECTION_MASK; - nr_pages = end - pfn; - - start = (unsigned long) pfn_to_page(pfn); - end = start + nr_pages * sizeof(struct page); + if (WARN_ON_ONCE(!IS_ALIGNED(pfn, PAGES_PER_SUBSECTION) || + !IS_ALIGNED(nr_pages, PAGES_PER_SUBSECTION))) + return NULL; if (vmemmap_populate(start, end, nid, altmap)) return NULL; _
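The difference between the removed rounding and the new alignment check in __populate_section_memmap() can be illustrated in user space. This is a minimal sketch under stated assumptions: SKETCH_PAGES_PER_SUBSECTION is taken as 512 (2 MiB sub-sections with 4 KiB pages), and the sketch_* helpers are hypothetical names that reproduce only the arithmetic, not the vmemmap population itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed for illustration: 2 MiB sub-sections with 4 KiB pages. */
#define SKETCH_PAGES_PER_SUBSECTION 512UL
#define SKETCH_SUBSECTION_MASK (~(SKETCH_PAGES_PER_SUBSECTION - 1))

struct sketch_pfn_range {
	unsigned long pfn;
	unsigned long nr_pages;
};

/* Old behaviour: silently widen the range outwards to sub-section bounds. */
struct sketch_pfn_range sketch_widen(unsigned long pfn, unsigned long nr_pages)
{
	struct sketch_pfn_range r;
	unsigned long end;

	assert(nr_pages > 0);

	/* Round the end up and the start down, as the removed lines did. */
	end = (pfn + nr_pages + SKETCH_PAGES_PER_SUBSECTION - 1) &
	      SKETCH_SUBSECTION_MASK;
	r.pfn = pfn & SKETCH_SUBSECTION_MASK;
	r.nr_pages = end - r.pfn;
	return r;
}

/* New behaviour: the caller must already pass an aligned range. */
bool sketch_is_subsection_aligned(unsigned long pfn, unsigned long nr_pages)
{
	return (pfn & ~SKETCH_SUBSECTION_MASK) == 0 &&
	       (nr_pages & ~SKETCH_SUBSECTION_MASK) == 0;
}
```

For an already aligned range the old widening is a no-op, which is why the patch can replace it with a WARN_ON_ONCE() guard: both callers are expected to pass aligned ranges, and a misaligned one now fails loudly instead of being adjusted.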
From: Mike Rapoport <rppt@linux.ibm.com> Subject: mm/sparse: cleanup the code surrounding memory_present() After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP we have two equivalent functions that call memory_present() for each region in memblock.memory: sparse_memory_present_with_active_regions() and memblocks_present(). Moreover, all architectures have a call to either of these functions preceding the call to sparse_init() and in most cases they are called one after the other. Mark the regions from memblock.memory as present during sparse_init() by making sparse_init() call memblocks_present(), make memblocks_present() and memory_present() functions static and remove redundant sparse_memory_present_with_active_regions() function. Also remove no longer required HAVE_MEMORY_PRESENT configuration option. Link: http://lkml.kernel.org/r/20200712083130.22919-1-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- Documentation/vm/memory-model.rst | 7 ++----- arch/arm/mm/init.c | 9 ++------- arch/arm64/mm/init.c | 6 ++---- arch/ia64/mm/discontig.c | 1 - arch/microblaze/mm/init.c | 3 --- arch/mips/kernel/setup.c | 8 -------- arch/mips/loongson64/numa.c | 1 - arch/mips/sgi-ip27/ip27-memory.c | 2 -- arch/parisc/mm/init.c | 5 ----- arch/powerpc/mm/mem.c | 2 -- arch/powerpc/mm/numa.c | 1 - arch/riscv/mm/init.c | 1 - arch/s390/mm/init.c | 1 - arch/sh/mm/init.c | 6 ------ arch/sh/mm/numa.c | 3 --- arch/sparc/mm/init_64.c | 1 - arch/x86/mm/init_32.c | 2 -- arch/x86/mm/init_64.c | 1 - include/linux/mm.h | 4 ---- include/linux/mmzone.h | 14 -------------- mm/Kconfig | 6 +----- mm/page_alloc.c | 16 ---------------- mm/sparse.c | 20 ++++++++++++-------- 23 files changed, 19 insertions(+), 101 deletions(-) --- a/arch/arm64/mm/init.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/arm64/mm/init.c @@ -430,11 +430,9 @@ void __init bootmem_init(void) #endif /* - * Sparsemem tries to allocate bootmem 
in memory_present(), so must be - * done after the fixed reservations. + * sparse_init() tries to allocate memory from memblock, so must be + * done after the fixed reservations */ - memblocks_present(); - sparse_init(); zone_sizes_init(min, max); --- a/arch/arm/mm/init.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/arm/mm/init.c @@ -243,13 +243,8 @@ void __init bootmem_init(void) (phys_addr_t)max_low_pfn << PAGE_SHIFT); /* - * Sparsemem tries to allocate bootmem in memory_present(), - * so must be done after the fixed reservations - */ - memblocks_present(); - - /* - * sparse_init() needs the bootmem allocator up and running. + * sparse_init() tries to allocate memory from memblock, so must be + * done after the fixed reservations */ sparse_init(); --- a/arch/ia64/mm/discontig.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/ia64/mm/discontig.c @@ -600,7 +600,6 @@ void __init paging_init(void) max_dma = virt_to_phys((void *) MAX_DMA_ADDRESS) >> PAGE_SHIFT; - sparse_memory_present_with_active_regions(MAX_NUMNODES); sparse_init(); #ifdef CONFIG_VIRTUAL_MEM_MAP --- a/arch/microblaze/mm/init.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/microblaze/mm/init.c @@ -172,9 +172,6 @@ void __init setup_memory(void) &memblock.memory, 0); } - /* XXX need to clip this if using highmem? */ - sparse_memory_present_with_active_regions(0); - paging_init(); } --- a/arch/mips/kernel/setup.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/mips/kernel/setup.c @@ -371,14 +371,6 @@ static void __init bootmem_init(void) #endif } - - /* - * In any case the added to the memblock memory regions - * (highmem/lowmem, available/reserved, etc) are considered - * as present, so inform sparsemem about them. - */ - memblocks_present(); - /* * Reserve initrd memory if needed. 
*/ --- a/arch/mips/loongson64/numa.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/mips/loongson64/numa.c @@ -220,7 +220,6 @@ static __init void prom_meminit(void) cpumask_clear(&__node_cpumask[node]); } } - memblocks_present(); max_low_pfn = PHYS_PFN(memblock_end_of_DRAM()); for (cpu = 0; cpu < loongson_sysconf.nr_cpus; cpu++) { --- a/arch/mips/sgi-ip27/ip27-memory.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/mips/sgi-ip27/ip27-memory.c @@ -402,8 +402,6 @@ void __init prom_meminit(void) } __node_data[node] = &null_node; } - - memblocks_present(); } void __init prom_free_prom_memory(void) --- a/arch/parisc/mm/init.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/parisc/mm/init.c @@ -689,11 +689,6 @@ void __init paging_init(void) flush_cache_all_local(); /* start with known state */ flush_tlb_all_local(NULL); - /* - * Mark all memblocks as present for sparsemem using - * memory_present() and then initialize sparsemem. - */ - memblocks_present(); sparse_init(); parisc_bootmem_free(); } --- a/arch/powerpc/mm/mem.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/powerpc/mm/mem.c @@ -183,8 +183,6 @@ void __init mem_topology_setup(void) void __init initmem_init(void) { - /* XXX need to clip this if using highmem? 
*/ - sparse_memory_present_with_active_regions(0); sparse_init(); } --- a/arch/powerpc/mm/numa.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/powerpc/mm/numa.c @@ -949,7 +949,6 @@ void __init initmem_init(void) get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); setup_node_data(nid, start_pfn, end_pfn); - sparse_memory_present_with_active_regions(nid); } sparse_init(); --- a/arch/riscv/mm/init.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/riscv/mm/init.c @@ -544,7 +544,6 @@ void mark_rodata_ro(void) void __init paging_init(void) { setup_vm_final(); - memblocks_present(); sparse_init(); setup_zero_page(); zone_sizes_init(); --- a/arch/s390/mm/init.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/s390/mm/init.c @@ -115,7 +115,6 @@ void __init paging_init(void) __load_psw_mask(psw.mask); kasan_free_early_identity(); - sparse_memory_present_with_active_regions(MAX_NUMNODES); sparse_init(); zone_dma_bits = 31; memset(max_zone_pfns, 0, sizeof(max_zone_pfns)); --- a/arch/sh/mm/init.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/sh/mm/init.c @@ -241,12 +241,6 @@ static void __init do_init_bootmem(void) plat_mem_setup(); - for_each_memblock(memory, reg) { - int nid = memblock_get_region_node(reg); - - memory_present(nid, memblock_region_memory_base_pfn(reg), - memblock_region_memory_end_pfn(reg)); - } sparse_init(); } --- a/arch/sh/mm/numa.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/sh/mm/numa.c @@ -53,7 +53,4 @@ void __init setup_bootmem_node(int nid, /* It's up */ node_set_online(nid); - - /* Kick sparsemem */ - sparse_memory_present_with_active_regions(nid); } --- a/arch/sparc/mm/init_64.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/sparc/mm/init_64.c @@ -1610,7 +1610,6 @@ static unsigned long __init bootmem_init /* XXX cpu notifier XXX */ - sparse_memory_present_with_active_regions(MAX_NUMNODES); sparse_init(); return end_pfn; --- 
a/arch/x86/mm/init_32.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/x86/mm/init_32.c @@ -678,7 +678,6 @@ void __init initmem_init(void) #endif memblock_set_node(0, PHYS_ADDR_MAX, &memblock.memory, 0); - sparse_memory_present_with_active_regions(0); #ifdef CONFIG_FLATMEM max_mapnr = IS_ENABLED(CONFIG_HIGHMEM) ? highend_pfn : max_low_pfn; @@ -718,7 +717,6 @@ void __init paging_init(void) * NOTE: at this point the bootmem allocator is fully available. */ olpc_dt_build_devicetree(); - sparse_memory_present_with_active_regions(MAX_NUMNODES); sparse_init(); zone_sizes_init(); } --- a/arch/x86/mm/init_64.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/arch/x86/mm/init_64.c @@ -817,7 +817,6 @@ void __init initmem_init(void) void __init paging_init(void) { - sparse_memory_present_with_active_regions(MAX_NUMNODES); sparse_init(); /* --- a/Documentation/vm/memory-model.rst~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/Documentation/vm/memory-model.rst @@ -141,11 +141,8 @@ sections: `mem_section` objects and the number of rows is calculated to fit all the memory sections. -The architecture setup code should call :c:func:`memory_present` for -each active memory range or use :c:func:`memblocks_present` or -:c:func:`sparse_memory_present_with_active_regions` wrappers to -initialize the memory sections. Next, the actual memory maps should be -set up using :c:func:`sparse_init`. +The architecture setup code should call sparse_init() to +initialize the memory sections and the memory maps. 
With SPARSEMEM there are two possible ways to convert a PFN to the corresponding `struct page` - a "classic sparse" and "sparse --- a/include/linux/mm.h~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/include/linux/mm.h @@ -2382,9 +2382,6 @@ static inline unsigned long get_num_phys * for_each_valid_physical_page_range() * memblock_add_node(base, size, nid) * free_area_init(max_zone_pfns); - * - * sparse_memory_present_with_active_regions() calls memory_present() for - * each range when SPARSEMEM is enabled. */ void free_area_init(unsigned long *max_zone_pfn); unsigned long node_map_pfn_alignment(void); @@ -2395,7 +2392,6 @@ extern unsigned long absent_pages_in_ran extern void get_pfn_range_for_nid(unsigned int nid, unsigned long *start_pfn, unsigned long *end_pfn); extern unsigned long find_min_pfn_with_active_regions(void); -extern void sparse_memory_present_with_active_regions(int nid); #ifndef CONFIG_NEED_MULTIPLE_NODES static inline int early_pfn_to_nid(unsigned long pfn) --- a/include/linux/mmzone.h~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/include/linux/mmzone.h @@ -839,18 +839,6 @@ static inline struct pglist_data *lruvec extern unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx); -#ifdef CONFIG_HAVE_MEMORY_PRESENT -void memory_present(int nid, unsigned long start, unsigned long end); -#else -static inline void memory_present(int nid, unsigned long start, unsigned long end) {} -#endif - -#if defined(CONFIG_SPARSEMEM) -void memblocks_present(void); -#else -static inline void memblocks_present(void) {} -#endif - #ifdef CONFIG_HAVE_MEMORYLESS_NODES int local_memory_node(int node_id); #else @@ -1407,8 +1395,6 @@ struct mminit_pfnnid_cache { #define early_pfn_valid(pfn) (1) #endif -void memory_present(int nid, unsigned long start, unsigned long end); - /* * If it is possible to have holes within a MAX_ORDER_NR_PAGES, then we * need to check pfn validity within that MAX_ORDER_NR_PAGES block. 
--- a/mm/Kconfig~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/mm/Kconfig @@ -88,13 +88,9 @@ config NEED_MULTIPLE_NODES def_bool y depends on DISCONTIGMEM || NUMA -config HAVE_MEMORY_PRESENT - def_bool y - depends on ARCH_HAVE_MEMORY_PRESENT || SPARSEMEM - # # SPARSEMEM_EXTREME (which is the default) does some bootmem -# allocations when memory_present() is called. If this cannot +# allocations when sparse_init() is called. If this cannot # be done on your architecture, select this option. However, # statically allocating the mem_section[] array can potentially # consume vast quantities of .bss, so be careful. --- a/mm/page_alloc.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/mm/page_alloc.c @@ -6325,22 +6325,6 @@ void __meminit init_currently_empty_zone } /** - * sparse_memory_present_with_active_regions - Call memory_present for each active range - * @nid: The node to call memory_present for. If MAX_NUMNODES, all nodes will be used. - * - * If an architecture guarantees that all ranges registered contain no holes and may - * be freed, this function may be used instead of calling memory_present() manually. - */ -void __init sparse_memory_present_with_active_regions(int nid) -{ - unsigned long start_pfn, end_pfn; - int i, this_nid; - - for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, &this_nid) - memory_present(this_nid, start_pfn, end_pfn); -} - -/** * get_pfn_range_for_nid - Return the start and end page frames for a node * @nid: The nid to return the range for. If MAX_NUMNODES, the min and max PFN are returned. * @start_pfn: Passed by reference. On return, it will have the node start_pfn. --- a/mm/sparse.c~mm-sparse-cleanup-the-code-surrounding-memory_present +++ a/mm/sparse.c @@ -249,7 +249,7 @@ void __init subsection_map_init(unsigned #endif /* Record a memory area against a node. 
*/ -void __init memory_present(int nid, unsigned long start, unsigned long end) +static void __init memory_present(int nid, unsigned long start, unsigned long end) { unsigned long pfn; @@ -285,11 +285,11 @@ void __init memory_present(int nid, unsi } /* - * Mark all memblocks as present using memory_present(). This is a - * convenience function that is useful for a number of arches - * to mark all of the systems memory as present during initialization. + * Mark all memblocks as present using memory_present(). + * This is a convenience function that is useful to mark all of the systems + * memory as present during initialization. */ -void __init memblocks_present(void) +static void __init memblocks_present(void) { struct memblock_region *reg; @@ -574,9 +574,13 @@ failed: */ void __init sparse_init(void) { - unsigned long pnum_begin = first_present_section_nr(); - int nid_begin = sparse_early_nid(__nr_to_section(pnum_begin)); - unsigned long pnum_end, map_count = 1; + unsigned long pnum_end, pnum_begin, map_count = 1; + int nid_begin; + + memblocks_present(); + + pnum_begin = first_present_section_nr(); + nid_begin = sparse_early_nid(__nr_to_section(pnum_begin)); /* Setup pageblock_order for HUGETLB_PAGE_SIZE_VARIABLE */ set_pageblock_order(); _
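The call order that the patch above establishes can be sketched in userspace C. This is only a toy model, not kernel code: the `toy_` names, the section size, and the array of "present" flags are hypothetical stand-ins. The one idea taken from the patch is that the init routine itself marks every registered region as present before it looks for the first present section, so callers no longer do that step themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy geometry: 16 sections of 2^4 = 16 page frames each. */
#define TOY_SECTION_SHIFT 4
#define TOY_NR_SECTIONS   16

struct toy_region {
	unsigned long start_pfn;	/* first page frame, inclusive */
	unsigned long end_pfn;		/* last page frame, exclusive */
};

static bool toy_section_present[TOY_NR_SECTIONS];

/* Plays the role of memory_present(): mark sections covering [start, end). */
static void toy_memory_present(unsigned long start, unsigned long end)
{
	unsigned long nr, last;

	if (start >= end)
		return;

	last = (end - 1) >> TOY_SECTION_SHIFT;
	for (nr = start >> TOY_SECTION_SHIFT;
	     nr <= last && nr < TOY_NR_SECTIONS; nr++)
		toy_section_present[nr] = true;
}

/* Plays the role of memblocks_present(): walk every registered region. */
static void toy_memblocks_present(const struct toy_region *regs, size_t n)
{
	size_t i;

	assert(regs != NULL || n == 0);

	for (i = 0; i < n; i++)
		toy_memory_present(regs[i].start_pfn, regs[i].end_pfn);
}

/*
 * Plays the role of sparse_init(): the regions are marked present here,
 * first thing, and only then is the first present section looked up.
 * Returns the index of the first present section, or -1 if none.
 */
static int toy_sparse_init(const struct toy_region *regs, size_t n)
{
	int nr;

	toy_memblocks_present(regs, n);

	for (nr = 0; nr < TOY_NR_SECTIONS; nr++)
		if (toy_section_present[nr])
			return nr;

	return -1;
}
```

In this model an architecture-like caller only invokes `toy_sparse_init()`; the two helpers are `static` and have no other users, which is the shape the patch gives to the real functions.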
From: "Matthew Wilcox (Oracle)" <willy@infradead.org> Subject: vmalloc: convert to XArray The radix tree of vmap blocks is simpler to express as an XArray. Reduces both the text and data sizes of the object file and eliminates a user of the radix tree preload API. Link: http://lkml.kernel.org/r/20200603171448.5894-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/vmalloc.c | 40 +++++++++++----------------------------- 1 file changed, 11 insertions(+), 29 deletions(-) --- a/mm/vmalloc.c~vmalloc-convert-to-xarray +++ a/mm/vmalloc.c @@ -25,7 +25,7 @@ #include <linux/list.h> #include <linux/notifier.h> #include <linux/rbtree.h> -#include <linux/radix-tree.h> +#include <linux/xarray.h> #include <linux/rcupdate.h> #include <linux/pfn.h> #include <linux/kmemleak.h> @@ -1514,12 +1514,11 @@ struct vmap_block { static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue); /* - * Radix tree of vmap blocks, indexed by address, to quickly find a vmap block + * XArray of vmap blocks, indexed by address, to quickly find a vmap block * in the free path. Could get rid of this if we change the API to return a * "cookie" from alloc, to be passed to free. But no big deal yet. 
*/ -static DEFINE_SPINLOCK(vmap_block_tree_lock); -static RADIX_TREE(vmap_block_tree, GFP_ATOMIC); +static DEFINE_XARRAY(vmap_blocks); /* * We should probably have a fallback mechanism to allocate virtual memory @@ -1576,13 +1575,6 @@ static void *new_vmap_block(unsigned int return ERR_CAST(va); } - err = radix_tree_preload(gfp_mask); - if (unlikely(err)) { - kfree(vb); - free_vmap_area(va); - return ERR_PTR(err); - } - vaddr = vmap_block_vaddr(va->va_start, 0); spin_lock_init(&vb->lock); vb->va = va; @@ -1595,11 +1587,12 @@ static void *new_vmap_block(unsigned int INIT_LIST_HEAD(&vb->free_list); vb_idx = addr_to_vb_idx(va->va_start); - spin_lock(&vmap_block_tree_lock); - err = radix_tree_insert(&vmap_block_tree, vb_idx, vb); - spin_unlock(&vmap_block_tree_lock); - BUG_ON(err); - radix_tree_preload_end(); + err = xa_insert(&vmap_blocks, vb_idx, vb, gfp_mask); + if (err) { + kfree(vb); + free_vmap_area(va); + return ERR_PTR(err); + } vbq = &get_cpu_var(vmap_block_queue); spin_lock(&vbq->lock); @@ -1613,12 +1606,8 @@ static void *new_vmap_block(unsigned int static void free_vmap_block(struct vmap_block *vb) { struct vmap_block *tmp; - unsigned long vb_idx; - vb_idx = addr_to_vb_idx(vb->va->va_start); - spin_lock(&vmap_block_tree_lock); - tmp = radix_tree_delete(&vmap_block_tree, vb_idx); - spin_unlock(&vmap_block_tree_lock); + tmp = xa_erase(&vmap_blocks, addr_to_vb_idx(vb->va->va_start)); BUG_ON(tmp != vb); free_vmap_area_noflush(vb->va); @@ -1724,7 +1713,6 @@ static void *vb_alloc(unsigned long size static void vb_free(unsigned long addr, unsigned long size) { unsigned long offset; - unsigned long vb_idx; unsigned int order; struct vmap_block *vb; @@ -1734,14 +1722,8 @@ static void vb_free(unsigned long addr, flush_cache_vunmap(addr, addr + size); order = get_order(size); - offset = (addr & (VMAP_BLOCK_SIZE - 1)) >> PAGE_SHIFT; - - vb_idx = addr_to_vb_idx(addr); - rcu_read_lock(); - vb = radix_tree_lookup(&vmap_block_tree, vb_idx); - rcu_read_unlock(); - 
BUG_ON(!vb); + vb = xa_load(&vmap_blocks, addr_to_vb_idx(addr)); unmap_kernel_range_noflush(addr, size); _
From: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Subject: mm/vmalloc: simplify merge_or_add_vmap_area() Currently when a VA is deallocated and is about to be placed back to the tree, it can be either: merged with next/prev neighbors or inserted if not coalesced. On those steps the tree can be populated several times. For example when both neighbors are merged. It can be avoided and simplified in fact. Therefore do it only once when VA points to final merged area, after all manipulations: merging/removing/inserting. Link: http://lkml.kernel.org/r/20200527205054.1696-1-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/vmalloc.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) --- a/mm/vmalloc.c~mm-vmalloc-simplify-merge_or_add_vmap_area-func +++ a/mm/vmalloc.c @@ -797,9 +797,6 @@ merge_or_add_vmap_area(struct vmap_area if (sibling->va_start == va->va_end) { sibling->va_start = va->va_start; - /* Check and update the tree if needed. */ - augment_tree_propagate_from(sibling); - /* Free vmap_area object. */ kmem_cache_free(vmap_area_cachep, va); @@ -819,14 +816,18 @@ merge_or_add_vmap_area(struct vmap_area if (next->prev != head) { sibling = list_entry(next->prev, struct vmap_area, list); if (sibling->va_end == va->va_start) { - sibling->va_end = va->va_end; - - /* Check and update the tree if needed. */ - augment_tree_propagate_from(sibling); - + /* + * If both neighbors are coalesced, it is important + * to unlink the "next" node first, followed by merging + * with "previous" one. Otherwise the tree might not be + * fully populated if a sibling's augmented value is + * "normalized" because of rotation operations. + */ if (merged) unlink_va(va, root); + sibling->va_end = va->va_end; + /* Free vmap_area object. 
*/ kmem_cache_free(vmap_area_cachep, va); @@ -837,11 +838,13 @@ merge_or_add_vmap_area(struct vmap_area } insert: - if (!merged) { + if (!merged) link_va(va, root, parent, link, head); - augment_tree_propagate_from(va); - } + /* + * Last step is to check and update the tree. + */ + augment_tree_propagate_from(va); return va; } _
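The idea of coalescing first and updating the bookkeeping only once at the end can be sketched as follows. This is a userspace toy under stated assumptions: a sorted array stands in for the kernel's rb-tree plus list, the `toy_` names are hypothetical, and the `propagations` counter merely stands in for a call to augment_tree_propagate_from(). Ranges passed in are assumed not to overlap existing ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 16

struct toy_range {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
};

struct toy_free_list {
	struct toy_range r[TOY_MAX];	/* sorted, non-overlapping */
	size_t n;
	unsigned int propagations;	/* stand-in for tree updates */
};

/*
 * Put [start, end) back into the free list, merging with the next
 * and/or previous neighbor when they are adjacent. Returns the index
 * of the final (possibly merged) range, or -1 if the array is full.
 */
static int toy_merge_or_add(struct toy_free_list *fl,
			    unsigned long start, unsigned long end)
{
	size_t i = 0, pos;
	int merged = 0;

	assert(start < end);

	/* i becomes the index of the "next" neighbor, if there is one. */
	while (i < fl->n && fl->r[i].start < start)
		i++;
	pos = i;

	/* Merge with the next neighbor: |<-- new -->|<-- next -->| */
	if (i < fl->n && fl->r[i].start == end) {
		fl->r[i].start = start;
		merged = 1;
	}

	/* Merge with the previous neighbor: |<-- prev -->|<-- new -->| */
	if (i > 0 && fl->r[i - 1].end == start) {
		if (merged) {
			/* Both sides coalesce: unlink "next", grow "prev". */
			fl->r[i - 1].end = fl->r[i].end;
			memmove(&fl->r[i], &fl->r[i + 1],
				(fl->n - i - 1) * sizeof(fl->r[0]));
			fl->n--;
		} else {
			fl->r[i - 1].end = end;
		}
		merged = 1;
		pos = i - 1;
	}

	if (!merged) {
		if (fl->n == TOY_MAX)
			return -1;
		memmove(&fl->r[i + 1], &fl->r[i],
			(fl->n - i) * sizeof(fl->r[0]));
		fl->r[i].start = start;
		fl->r[i].end = end;
		fl->n++;
	}

	/* Last step, done exactly once per call, whatever path was taken. */
	fl->propagations++;

	return (int)pos;
}
```

Whichever of the three paths runs (merge with one side, merge with both, plain insert), the update step is reached a single time, on the range that survives.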
From: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Subject: mm/vmalloc: simplify augment_tree_propagate_check() This function is for debugging purposes only. Currently it uses recursion for tree traversal, checking the augmented value of each node to find out whether it is valid. The recursion can corrupt the stack because the tree can be huge if synthetic tests are applied. To prevent it, navigate the tree from the bottom to the upper levels using a regular list instead, because the nodes are also linked to each other. It is faster and avoids recursion. Link: http://lkml.kernel.org/r/20200527205054.1696-2-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/vmalloc.c | 42 ++++++++---------------------------------- 1 file changed, 8 insertions(+), 34 deletions(-) --- a/mm/vmalloc.c~mm-vmalloc-simplify-augment_tree_propagate_check-func +++ a/mm/vmalloc.c @@ -633,43 +633,17 @@ unlink_va(struct vmap_area *va, struct r #if DEBUG_AUGMENT_PROPAGATE_CHECK static void -augment_tree_propagate_check(struct rb_node *n) +augment_tree_propagate_check(void) { struct vmap_area *va; - struct rb_node *node; - unsigned long size; - bool found = false; - - if (n == NULL) - return; - - va = rb_entry(n, struct vmap_area, rb_node); - size = va->subtree_max_size; - node = n; + unsigned long computed_size; - while (node) { - va = rb_entry(node, struct vmap_area, rb_node); - - if (get_subtree_max_size(node->rb_left) == size) { - node = node->rb_left; - } else { - if (va_size(va) == size) { - found = true; - break; - } - - node = node->rb_right; - } + list_for_each_entry(va, &free_vmap_area_list, list) { + computed_size = compute_subtree_max_size(va); + if (computed_size != va->subtree_max_size) + pr_emerg("tree is corrupted: %lu, %lu\n", + va_size(va), va->subtree_max_size); } - - if (!found) { - va = rb_entry(n, struct vmap_area, rb_node); - pr_emerg("tree is corrupted: %lu, %lu\n", - va_size(va), 
va->subtree_max_size); - } - - augment_tree_propagate_check(n->rb_left); - augment_tree_propagate_check(n->rb_right); } #endif @@ -724,7 +698,7 @@ augment_tree_propagate_from(struct vmap_ } #if DEBUG_AUGMENT_PROPAGATE_CHECK - augment_tree_propagate_check(free_vmap_area_root.rb_node); + augment_tree_propagate_check(); #endif } _
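A non-recursive check of this kind can be sketched in userspace C. The structure and the `toy_` names below are hypothetical; the sketch only assumes, as the commit message says, that every tree node is also reachable through a flat list. Each node's stored maximum is compared against the value recomputed from its own size and its children's stored maxima.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
	unsigned long size;		/* size of this free area */
	unsigned long subtree_max_size;	/* cached max over the subtree */
	struct toy_node *left, *right;	/* tree links */
	struct toy_node *next;		/* flat list linking all nodes */
};

static unsigned long toy_subtree_max(const struct toy_node *n)
{
	return n ? n->subtree_max_size : 0;
}

/* Recompute what the cached value of one node should be. */
static unsigned long toy_compute_subtree_max(const struct toy_node *n)
{
	unsigned long max, l, r;

	assert(n != NULL);

	max = n->size;
	l = toy_subtree_max(n->left);
	r = toy_subtree_max(n->right);

	if (l > max)
		max = l;
	if (r > max)
		max = r;

	return max;
}

/*
 * Walk the flat list instead of recursing over the tree, so the stack
 * depth stays constant however large the tree is. Returns the number
 * of nodes whose cached value disagrees with the recomputed one.
 */
static unsigned int toy_propagate_check(const struct toy_node *head)
{
	const struct toy_node *n;
	unsigned int bad = 0;

	for (n = head; n; n = n->next)
		if (toy_compute_subtree_max(n) != n->subtree_max_size)
			bad++;

	return bad;
}
```

Because each node is checked only against its direct children, the walk order does not matter, which is why a plain list traversal is enough.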
From: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Subject: mm/vmalloc: switch to "propagate()" callback The augment_tree_propagate_from() function uses its own implementation that populates the tree from the specified node toward the root node. On the other hand, the RB_DECLARE_CALLBACKS_MAX macro provides the "propagate()" callback that does exactly the same. Having two similar functions is redundant. Reuse the functionality built into the macro, which also reduces the code size. Link: http://lkml.kernel.org/r/20200527205054.1696-3-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/vmalloc.c | 25 ++++++------------------- 1 file changed, 6 insertions(+), 19 deletions(-) --- a/mm/vmalloc.c~mm-vmalloc-switch-to-propagate-callback +++ a/mm/vmalloc.c @@ -677,25 +677,12 @@ augment_tree_propagate_check(void) static __always_inline void augment_tree_propagate_from(struct vmap_area *va) { - struct rb_node *node = &va->rb_node; - unsigned long new_va_sub_max_size; - - while (node) { - va = rb_entry(node, struct vmap_area, rb_node); - new_va_sub_max_size = compute_subtree_max_size(va); - - /* - * If the newly calculated maximum available size of the - * subtree is equal to the current one, then it means that - * the tree is propagated correctly. So we have to stop at - * this point to save cycles. - */ - if (va->subtree_max_size == new_va_sub_max_size) - break; - - va->subtree_max_size = new_va_sub_max_size; - node = rb_parent(&va->rb_node); - } + /* + * Populate the tree from bottom towards the root until + * the calculated maximum available size of checked node + * is equal to its current one. + */ + free_vmap_area_rb_augment_cb_propagate(&va->rb_node, NULL); #if DEBUG_AUGMENT_PROPAGATE_CHECK augment_tree_propagate_check(); _
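The bottom-up propagation with early exit that both the removed loop and the generated callback perform can be sketched in userspace C. The node layout and `toy_` names are hypothetical, and a plain parent pointer replaces the kernel's rb_node; the sketch only illustrates the stop condition described in the comments of the patch.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
	unsigned long size;		/* size of this free area */
	unsigned long subtree_max_size;	/* cached max over the subtree */
	struct toy_node *left, *right, *parent;
};

static unsigned long toy_subtree_max(const struct toy_node *n)
{
	return n ? n->subtree_max_size : 0;
}

static unsigned long toy_compute_subtree_max(const struct toy_node *n)
{
	unsigned long max, l, r;

	assert(n != NULL);

	max = n->size;
	l = toy_subtree_max(n->left);
	r = toy_subtree_max(n->right);

	if (l > max)
		max = l;
	if (r > max)
		max = r;

	return max;
}

/*
 * Walk from the given node toward the root, refreshing the cached
 * maximum. As soon as a node's cached value already equals the
 * recomputed one, nothing above it can change, so the walk stops.
 */
static void toy_propagate_from(struct toy_node *n)
{
	while (n) {
		unsigned long new_max = toy_compute_subtree_max(n);

		if (n->subtree_max_size == new_max)
			break;

		n->subtree_max_size = new_max;
		n = n->parent;
	}
}
```

The early exit is what keeps the update cheap: only the ancestors whose cached value actually changes are touched.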
From: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Subject: mm/vmalloc: update the header about KVA rework Reflect information about the author, date and year when the KVA rework was done. Link: http://lkml.kernel.org/r/20200622195821.4796-1-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/vmalloc.c | 1 + 1 file changed, 1 insertion(+) --- a/mm/vmalloc.c~mm-vmalloc-update-the-header-about-kva-rework +++ a/mm/vmalloc.c @@ -7,6 +7,7 @@ * SMP-safe vmalloc/vfree/ioremap, Tigran Aivazian <tigran@veritas.com>, May 2000 * Major rework to support vmap/vunmap, Christoph Hellwig, SGI, August 2002 * Numa awareness, Christoph Lameter, SGI, June 2005 + * Improving global KVA allocator, Uladzislau Rezki, Sony, May 2019 */ #include <linux/vmalloc.h> _
From: Mike Rapoport <rppt@linux.ibm.com> Subject: mm: vmalloc: remove redundant assignment in unmap_kernel_range_noflush() 'addr' is set to 'start' and then a few lines afterwards 'start' is set to 'addr'. Remove the second assignment. Link: http://lkml.kernel.org/r/20200707163226.374685-1-rppt@kernel.org Fixes: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/vmalloc.c | 1 - 1 file changed, 1 deletion(-) --- a/mm/vmalloc.c~mm-vmalloc-remove-redundant-asignmnet-in-unmap_kernel_range_noflush +++ a/mm/vmalloc.c @@ -175,7 +175,6 @@ void unmap_kernel_range_noflush(unsigned pgtbl_mod_mask mask = 0; BUG_ON(addr >= end); - start = addr; pgd = pgd_offset_k(addr); do { next = pgd_addr_end(addr, end); _
From: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Subject: mm/vmalloc.c: remove BUG() from the find_va_links() Get rid of the BUG() macro, which should be used only when a critical situation happens and the system is not able to function anymore. Replace it with the WARN() macro instead, and dump some extra information about the start/end addresses of both VAs which overlap. Such overlap data can help to figure out what happened, making further analysis easier. For example, if both areas are identical it could mean a double free. The recovery process consists of declining all further steps regarding insertion of the conflicting overlapping range. In that sense find_va_links() can now return NULL, so its return value has to be checked by callers. A side effect of such a process is that it can leak memory, but that is better than just killing a machine for no good reason. Apart from that, debugging can be done on a live system. Link: http://lkml.kernel.org/r/20200711104531.12242-1-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/vmalloc.c | 41 ++++++++++++++++++++++++++++++++--------- 1 file changed, 32 insertions(+), 9 deletions(-) --- a/mm/vmalloc.c~mm-vmallocc-remove-bug-from-the-find_va_links +++ a/mm/vmalloc.c @@ -512,6 +512,10 @@ static struct vmap_area *__find_vmap_are /* * This function returns back addresses of parent node * and its left or right link for further processing. + * + * Otherwise NULL is returned. In that case all further + * steps regarding inserting of conflicting overlap range + * have to be declined and actually considered as a bug. 
*/ static __always_inline struct rb_node ** find_va_links(struct vmap_area *va, @@ -550,8 +554,12 @@ find_va_links(struct vmap_area *va, else if (va->va_end > tmp_va->va_start && va->va_start >= tmp_va->va_end) link = &(*link)->rb_right; - else - BUG(); + else { + WARN(1, "vmalloc bug: 0x%lx-0x%lx overlaps with 0x%lx-0x%lx\n", + va->va_start, va->va_end, tmp_va->va_start, tmp_va->va_end); + + return NULL; + } } while (*link); *parent = &tmp_va->rb_node; @@ -697,7 +705,8 @@ insert_vmap_area(struct vmap_area *va, struct rb_node *parent; link = find_va_links(va, root, NULL, &parent); - link_va(va, root, parent, link, head); + if (link) + link_va(va, root, parent, link, head); } static void @@ -713,8 +722,10 @@ insert_vmap_area_augment(struct vmap_are else link = find_va_links(va, root, NULL, &parent); - link_va(va, root, parent, link, head); - augment_tree_propagate_from(va); + if (link) { + link_va(va, root, parent, link, head); + augment_tree_propagate_from(va); + } } /* @@ -722,6 +733,11 @@ insert_vmap_area_augment(struct vmap_are * and next free blocks. If coalesce is not done a new * free area is inserted. If VA has been merged, it is * freed. + * + * Please note, it can return NULL in case of overlap + * ranges, followed by WARN() report. Despite it is a + * buggy behaviour, a system can be alive and keep + * ongoing. */ static __always_inline struct vmap_area * merge_or_add_vmap_area(struct vmap_area *va, @@ -738,6 +754,8 @@ merge_or_add_vmap_area(struct vmap_area * inserted, unless it is merged with its sibling/siblings. */ link = find_va_links(va, root, NULL, &parent); + if (!link) + return NULL; /* * Get next node of VA to check if merging can be done. 
@@ -1346,6 +1364,9 @@ static bool __purge_vmap_area_lazy(unsig va = merge_or_add_vmap_area(va, &free_vmap_area_root, &free_vmap_area_list); + if (!va) + continue; + if (is_vmalloc_or_module_addr((void *)orig_start)) kasan_release_vmalloc(orig_start, orig_end, va->va_start, va->va_end); @@ -3330,8 +3351,9 @@ recovery: orig_end = vas[area]->va_end; va = merge_or_add_vmap_area(vas[area], &free_vmap_area_root, &free_vmap_area_list); - kasan_release_vmalloc(orig_start, orig_end, - va->va_start, va->va_end); + if (va) + kasan_release_vmalloc(orig_start, orig_end, + va->va_start, va->va_end); vas[area] = NULL; } @@ -3379,8 +3401,9 @@ err_free_shadow: orig_end = vas[area]->va_end; va = merge_or_add_vmap_area(vas[area], &free_vmap_area_root, &free_vmap_area_list); - kasan_release_vmalloc(orig_start, orig_end, - va->va_start, va->va_end); + if (va) + kasan_release_vmalloc(orig_start, orig_end, + va->va_start, va->va_end); vas[area] = NULL; kfree(vms[area]); } _
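The "warn, return NULL, and let the caller decline the insert" pattern from the patch above can be sketched in userspace C. This is a toy model: an unbalanced binary search tree stands in for the rb-tree, `fprintf(stderr, ...)` stands in for WARN(), and all `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct toy_va {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	struct toy_va *left, *right;
};

/*
 * Find the link where va should be attached. Returns NULL, after
 * printing a report, if va overlaps a range that is already in the
 * tree; the caller must then skip the insertion.
 */
static struct toy_va **toy_find_links(const struct toy_va *va,
				      struct toy_va **root)
{
	struct toy_va **link = root;

	while (*link) {
		struct toy_va *tmp = *link;

		if (va->end <= tmp->start) {
			link = &tmp->left;
		} else if (va->start >= tmp->end) {
			link = &tmp->right;
		} else {
			/* Stand-in for WARN(): report and carry on. */
			fprintf(stderr,
				"toy bug: 0x%lx-0x%lx overlaps with 0x%lx-0x%lx\n",
				va->start, va->end, tmp->start, tmp->end);
			return NULL;
		}
	}

	return link;
}

/* Returns 0 on success, -1 if the range was declined due to overlap. */
static int toy_insert(struct toy_va *va, struct toy_va **root)
{
	struct toy_va **link;

	assert(va->start < va->end);

	link = toy_find_links(va, root);
	if (!link)
		return -1;

	va->left = NULL;
	va->right = NULL;
	*link = va;

	return 0;
}
```

A declined range is simply left out of the tree, which matches the trade-off named in the commit message: the range may leak, but the existing tree stays consistent and the program keeps running.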
From: Marco Elver <elver@google.com> Subject: kasan: improve and simplify Kconfig.kasan Turn 'KASAN' into a menuconfig, to avoid cluttering its parent menu with the suboptions if enabled. Use 'if KASAN ... endif' instead of having to 'depend on KASAN' for each entry. Link: http://lkml.kernel.org/r/20200629104157.3242503-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Walter Wu <walter-zh.wu@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- lib/Kconfig.kasan | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) --- a/lib/Kconfig.kasan~kasan-improve-and-simplify-kconfigkasan +++ a/lib/Kconfig.kasan @@ -18,7 +18,7 @@ config CC_HAS_KASAN_SW_TAGS config CC_HAS_WORKING_NOSANITIZE_ADDRESS def_bool !CC_IS_GCC || GCC_VERSION >= 80300 -config KASAN +menuconfig KASAN bool "KASAN: runtime memory debugger" depends on (HAVE_ARCH_KASAN && CC_HAS_KASAN_GENERIC) || \ (HAVE_ARCH_KASAN_SW_TAGS && CC_HAS_KASAN_SW_TAGS) @@ -29,9 +29,10 @@ config KASAN designed to find out-of-bounds accesses and use-after-free bugs. See Documentation/dev-tools/kasan.rst for details. 
+if KASAN + choice prompt "KASAN mode" - depends on KASAN default KASAN_GENERIC help KASAN has two modes: generic KASAN (similar to userspace ASan, @@ -88,7 +89,6 @@ endchoice choice prompt "Instrumentation type" - depends on KASAN default KASAN_OUTLINE config KASAN_OUTLINE @@ -113,7 +113,6 @@ endchoice config KASAN_STACK_ENABLE bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && !COMPILE_TEST - depends on KASAN help The LLVM stack address sanitizer has a know problem that causes excessive stack usage in a lot of functions, see @@ -134,7 +133,7 @@ config KASAN_STACK config KASAN_S390_4_LEVEL_PAGING bool "KASan: use 4-level paging" - depends on KASAN && S390 + depends on S390 help Compiling the kernel with KASan disables automatic 3-level vs 4-level paging selection. 3-level paging is used by default (up @@ -151,7 +150,7 @@ config KASAN_SW_TAGS_IDENTIFY config KASAN_VMALLOC bool "Back mappings in vmalloc space with real shadow memory" - depends on KASAN && HAVE_ARCH_KASAN_VMALLOC + depends on HAVE_ARCH_KASAN_VMALLOC help By default, the shadow region for vmalloc space is the read-only zero page. This means that KASAN cannot detect errors involving @@ -164,8 +163,10 @@ config KASAN_VMALLOC config TEST_KASAN tristate "Module for testing KASAN for bug detection" - depends on m && KASAN + depends on m help This is a test module doing various nasty things like out of bounds accesses, use after free. It is useful for testing kernel debugging features like KASAN. + +endif # KASAN _
From: Marco Elver <elver@google.com> Subject: kasan: update required compiler versions in documentation Updates the recently changed compiler requirements for KASAN. In particular, we require GCC >= 8.3.0, and add a note that Clang 11 supports OOB detection of globals. Link: http://lkml.kernel.org/r/20200629104157.3242503-2-elver@google.com Fixes: 7b861a53e46b ("kasan: Bump required compiler version") Fixes: acf7b0bf7dcf ("kasan: Fix required compiler version") Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Walter Wu <walter-zh.wu@mediatek.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Axtens <dja@axtens.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- Documentation/dev-tools/kasan.rst | 7 ++----- lib/Kconfig.kasan | 24 +++++++++++++++--------- 2 files changed, 17 insertions(+), 14 deletions(-) --- a/Documentation/dev-tools/kasan.rst~kasan-update-required-compiler-versions-in-documentation +++ a/Documentation/dev-tools/kasan.rst @@ -13,11 +13,8 @@ KASAN uses compile-time instrumentation memory access, and therefore requires a compiler version that supports that. Generic KASAN is supported in both GCC and Clang. With GCC it requires version -4.9.2 or later for basic support and version 5.0 or later for detection of -out-of-bounds accesses for stack and global variables and for inline -instrumentation mode (see the Usage section). With Clang it requires version -7.0.0 or later and it doesn't support detection of out-of-bounds accesses for -global variables yet. +8.3.0 or later. With Clang it requires version 7.0.0 or later, but detection of +out-of-bounds accesses for global variables is only supported since Clang 11. Tag-based KASAN is only supported in Clang and requires version 7.0.0 or later. 
--- a/lib/Kconfig.kasan~kasan-update-required-compiler-versions-in-documentation +++ a/lib/Kconfig.kasan @@ -40,6 +40,7 @@ choice software tag-based KASAN (a version based on software memory tagging, arm64 only, similar to userspace HWASan, enabled with CONFIG_KASAN_SW_TAGS). + Both generic and tag-based KASAN are strictly debugging features. config KASAN_GENERIC @@ -51,16 +52,18 @@ config KASAN_GENERIC select STACKDEPOT help Enables generic KASAN mode. - Supported in both GCC and Clang. With GCC it requires version 4.9.2 - or later for basic support and version 5.0 or later for detection of - out-of-bounds accesses for stack and global variables and for inline - instrumentation mode (CONFIG_KASAN_INLINE). With Clang it requires - version 3.7.0 or later and it doesn't support detection of - out-of-bounds accesses for global variables yet. + + This mode is supported in both GCC and Clang. With GCC it requires + version 8.3.0 or later. With Clang it requires version 7.0.0 or + later, but detection of out-of-bounds accesses for global variables + is supported only since Clang 11. + This mode consumes about 1/8th of available memory at kernel start and introduces an overhead of ~x1.5 for the rest of the allocations. The performance slowdown is ~x3. + For better error detection enable CONFIG_STACKTRACE. + Currently CONFIG_KASAN_GENERIC doesn't work with CONFIG_DEBUG_SLAB (the resulting kernel does not boot). @@ -73,15 +76,19 @@ config KASAN_SW_TAGS select STACKDEPOT help Enables software tag-based KASAN mode. + This mode requires Top Byte Ignore support by the CPU and therefore - is only supported for arm64. - This mode requires Clang version 7.0.0 or later. + is only supported for arm64. This mode requires Clang version 7.0.0 + or later. + This mode consumes about 1/16th of available memory at kernel start and introduces an overhead of ~20% for the rest of the allocations. 
This mode may potentially introduce problems relating to pointer casting and comparison, as it embeds tags into the top byte of each pointer. + For better error detection enable CONFIG_STACKTRACE. + Currently CONFIG_KASAN_SW_TAGS doesn't work with CONFIG_DEBUG_SLAB (the resulting kernel does not boot). @@ -107,7 +114,6 @@ config KASAN_INLINE memory accesses. This is faster than outline (in some workloads it gives about x2 boost over outline instrumentation), but make kernel's .text size much bigger. - For CONFIG_KASAN_GENERIC this requires GCC 5.0 or later. endchoice _
From: Walter Wu <walter-zh.wu@mediatek.com> Subject: rcu: kasan: record and print call_rcu() call stack Patch series "kasan: memorize and print call_rcu stack", v8. This patchset improves KASAN reports by adding call_rcu() call stack information to them. This helps programmers debug use-after-free and double-free memory issues. The KASAN report looks as follows (cleaned up slightly): BUG: KASAN: use-after-free in kasan_rcu_reclaim+0x58/0x60 Freed by task 0: kasan_save_stack+0x24/0x50 kasan_set_track+0x24/0x38 kasan_set_free_info+0x18/0x20 __kasan_slab_free+0x10c/0x170 kasan_slab_free+0x10/0x18 kfree+0x98/0x270 kasan_rcu_reclaim+0x1c/0x60 Last call_rcu(): kasan_save_stack+0x24/0x50 kasan_record_aux_stack+0xbc/0xd0 call_rcu+0x8c/0x580 kasan_rcu_uaf+0xf4/0xf8 Generic KASAN will record the last two call_rcu() call stacks and print up to 2 call_rcu() call stacks in the KASAN report. This is only supported for generic KASAN. The feature takes the size of struct kasan_alloc_meta and struct kasan_free_meta into account: the structure layout and size are optimized to reduce memory consumption. [1]https://bugzilla.kernel.org/show_bug.cgi?id=198437 [2]https://groups.google.com/forum/#!searchin/kasan-dev/better$20stack$20traces$20for$20rcu%7Csort:date/kasan-dev/KQsjT_88hDE/7rNUZprRBgAJ This patch (of 4): This feature records the last two call_rcu() call stacks and prints up to 2 call_rcu() call stacks in the KASAN report. When call_rcu() is called, the call_rcu() call stack is stored in the slub alloc meta-data, so that the KASAN report can print the RCU stack. 
[1]https://bugzilla.kernel.org/show_bug.cgi?id=198437 [2]https://groups.google.com/forum/#!searchin/kasan-dev/better$20stack$20traces$20for$20rcu%7Csort:date/kasan-dev/KQsjT_88hDE/7rNUZprRBgAJ [walter-zh.wu@mediatek.com: build fix] Link: http://lkml.kernel.org/r/20200710162401.23816-1-walter-zh.wu@mediatek.com Link: http://lkml.kernel.org/r/20200710162123.23713-1-walter-zh.wu@mediatek.com Link: http://lkml.kernel.org/r/20200601050847.1096-1-walter-zh.wu@mediatek.com Link: http://lkml.kernel.org/r/20200601050927.1153-1-walter-zh.wu@mediatek.com Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com> Suggested-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/kasan.h | 2 ++ kernel/rcu/tree.c | 2 ++ mm/kasan/common.c | 4 ++-- mm/kasan/generic.c | 21 +++++++++++++++++++++ mm/kasan/kasan.h | 9 +++++++++ mm/kasan/report.c | 28 +++++++++++++++++++++++----- 6 files changed, 59 insertions(+), 7 deletions(-) --- a/include/linux/kasan.h~rcu-kasan-record-and-print-call_rcu-call-stack +++ a/include/linux/kasan.h @@ -174,11 +174,13 @@ static inline size_t kasan_metadata_size void kasan_cache_shrink(struct kmem_cache *cache); void kasan_cache_shutdown(struct kmem_cache *cache); +void kasan_record_aux_stack(void *ptr); #else /* CONFIG_KASAN_GENERIC */ static inline void kasan_cache_shrink(struct kmem_cache *cache) {} static inline void kasan_cache_shutdown(struct kmem_cache 
*cache) {} +static inline void kasan_record_aux_stack(void *ptr) {} #endif /* CONFIG_KASAN_GENERIC */ --- a/kernel/rcu/tree.c~rcu-kasan-record-and-print-call_rcu-call-stack +++ a/kernel/rcu/tree.c @@ -59,6 +59,7 @@ #include <linux/sched/clock.h> #include <linux/vmalloc.h> #include <linux/mm.h> +#include <linux/kasan.h> #include "../time/tick-internal.h" #include "tree.h" @@ -2890,6 +2891,7 @@ __call_rcu(struct rcu_head *head, rcu_ca head->func = func; head->next = NULL; local_irq_save(flags); + kasan_record_aux_stack(head); rdp = this_cpu_ptr(&rcu_data); /* Add the callback to our list. */ --- a/mm/kasan/common.c~rcu-kasan-record-and-print-call_rcu-call-stack +++ a/mm/kasan/common.c @@ -40,7 +40,7 @@ #include "kasan.h" #include "../slab.h" -static inline depot_stack_handle_t save_stack(gfp_t flags) +depot_stack_handle_t kasan_save_stack(gfp_t flags) { unsigned long entries[KASAN_STACK_DEPTH]; unsigned int nr_entries; @@ -53,7 +53,7 @@ static inline depot_stack_handle_t save_ static inline void set_track(struct kasan_track *track, gfp_t flags) { track->pid = current->pid; - track->stack = save_stack(flags); + track->stack = kasan_save_stack(flags); } void kasan_enable_current(void) --- a/mm/kasan/generic.c~rcu-kasan-record-and-print-call_rcu-call-stack +++ a/mm/kasan/generic.c @@ -324,3 +324,24 @@ DEFINE_ASAN_SET_SHADOW(f2); DEFINE_ASAN_SET_SHADOW(f3); DEFINE_ASAN_SET_SHADOW(f5); DEFINE_ASAN_SET_SHADOW(f8); + +void kasan_record_aux_stack(void *addr) +{ + struct page *page = kasan_addr_to_page(addr); + struct kmem_cache *cache; + struct kasan_alloc_meta *alloc_info; + void *object; + + if (!(page && PageSlab(page))) + return; + + cache = page->slab_cache; + object = nearest_obj(cache, page, addr); + alloc_info = get_alloc_info(cache, object); + + /* + * record the last two call_rcu() call stacks. 
+ */ + alloc_info->aux_stack[1] = alloc_info->aux_stack[0]; + alloc_info->aux_stack[0] = kasan_save_stack(GFP_NOWAIT); +} --- a/mm/kasan/kasan.h~rcu-kasan-record-and-print-call_rcu-call-stack +++ a/mm/kasan/kasan.h @@ -104,6 +104,13 @@ struct kasan_track { struct kasan_alloc_meta { struct kasan_track alloc_track; +#ifdef CONFIG_KASAN_GENERIC + /* + * call_rcu() call stack is stored into struct kasan_alloc_meta. + * The free stack is stored into struct kasan_free_meta. + */ + depot_stack_handle_t aux_stack[2]; +#endif struct kasan_track free_track[KASAN_NR_FREE_STACKS]; #ifdef CONFIG_KASAN_SW_TAGS_IDENTIFY u8 free_pointer_tag[KASAN_NR_FREE_STACKS]; @@ -159,6 +166,8 @@ void kasan_report_invalid_free(void *obj struct page *kasan_addr_to_page(const void *addr); +depot_stack_handle_t kasan_save_stack(gfp_t flags); + #if defined(CONFIG_KASAN_GENERIC) && \ (defined(CONFIG_SLAB) || defined(CONFIG_SLUB)) void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache); --- a/mm/kasan/report.c~rcu-kasan-record-and-print-call_rcu-call-stack +++ a/mm/kasan/report.c @@ -106,15 +106,20 @@ static void end_report(unsigned long *fl kasan_enable_current(); } +static void print_stack(depot_stack_handle_t stack) +{ + unsigned long *entries; + unsigned int nr_entries; + + nr_entries = stack_depot_fetch(stack, &entries); + stack_trace_print(entries, nr_entries, 0); +} + static void print_track(struct kasan_track *track, const char *prefix) { pr_err("%s by task %u:\n", prefix, track->pid); if (track->stack) { - unsigned long *entries; - unsigned int nr_entries; - - nr_entries = stack_depot_fetch(track->stack, &entries); - stack_trace_print(entries, nr_entries, 0); + print_stack(track->stack); } else { pr_err("(stack is not available)\n"); } @@ -193,6 +198,19 @@ static void describe_object(struct kmem_ free_track = kasan_get_free_track(cache, object, tag); print_track(free_track, "Freed"); pr_err("\n"); + +#ifdef CONFIG_KASAN_GENERIC + if (alloc_info->aux_stack[0]) { + 
pr_err("Last call_rcu():\n"); + print_stack(alloc_info->aux_stack[0]); + pr_err("\n"); + } + if (alloc_info->aux_stack[1]) { + pr_err("Second to last call_rcu():\n"); + print_stack(alloc_info->aux_stack[1]); + pr_err("\n"); + } +#endif } describe_object_addr(cache, object, addr); _
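The two-slot history kept by kasan_record_aux_stack() can be modelled in userspace roughly as follows. This is only a sketch under assumptions: `stack_handle_t`, `struct aux_history`, `record_aux_stack()` and `printable_stacks()` are hypothetical names invented for illustration, with a plain integer standing in for depot_stack_handle_t; only the slot-shifting order is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for depot_stack_handle_t; 0 means "no stack saved". */
typedef uint32_t stack_handle_t;

/* Hypothetical userspace model of the two aux_stack slots in the patch. */
struct aux_history {
	stack_handle_t aux_stack[2];
};

/* Record a new handle: the previous "last" entry becomes "second to last". */
static void record_aux_stack(struct aux_history *h, stack_handle_t handle)
{
	assert(h != NULL);
	h->aux_stack[1] = h->aux_stack[0];
	h->aux_stack[0] = handle;
}

/* Number of call stacks a report would print for this object (0, 1 or 2). */
static int printable_stacks(const struct aux_history *h)
{
	assert(h != NULL);
	return (h->aux_stack[0] != 0) + (h->aux_stack[1] != 0);
}
```

Shifting slot 0 into slot 1 before overwriting it keeps the model at a fixed two entries per object, which matches the "last" and "second to last" sections printed by describe_object() in the patch.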
From: Walter Wu <walter-zh.wu@mediatek.com> Subject: kasan: record and print the free track Move the free track from kasan_alloc_meta to kasan_free_meta so that struct kasan_alloc_meta and struct kasan_free_meta are both 16 bytes in size. That is a good size because it is the minimal redzone size and a convenient alignment. For the free track, we make the following modifications: 1) Remove the free_track from struct kasan_alloc_meta. 2) Add the free_track into struct kasan_free_meta. 3) Add a macro KASAN_KMALLOC_FREETRACK to check whether the free stack can be printed in the KASAN report. [1]https://bugzilla.kernel.org/show_bug.cgi?id=198437 [walter-zh.wu@mediatek.com: build fix] Link: http://lkml.kernel.org/r/20200710162440.23887-1-walter-zh.wu@mediatek.com Link: http://lkml.kernel.org/r/20200601051022.1230-1-walter-zh.wu@mediatek.com Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com> Suggested-by: Dmitry Vyukov <dvyukov@google.com> Co-developed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: "Paul E . 
McKenney" <paulmck@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/kasan/common.c | 22 +-------------------- mm/kasan/generic.c | 22 +++++++++++++++++++++ mm/kasan/generic_report.c | 1 mm/kasan/kasan.h | 16 ++++++++++++--- mm/kasan/quarantine.c | 1 mm/kasan/report.c | 26 +++---------------------- mm/kasan/tags.c | 37 ++++++++++++++++++++++++++++++++++++ 7 files changed, 80 insertions(+), 45 deletions(-) --- a/mm/kasan/common.c~kasan-record-and-print-the-free-track +++ a/mm/kasan/common.c @@ -50,7 +50,7 @@ depot_stack_handle_t kasan_save_stack(gf return stack_depot_save(entries, nr_entries, flags); } -static inline void set_track(struct kasan_track *track, gfp_t flags) +void kasan_set_track(struct kasan_track *track, gfp_t flags) { track->pid = current->pid; track->stack = kasan_save_stack(flags); @@ -298,24 +298,6 @@ struct kasan_free_meta *get_free_info(st return (void *)object + cache->kasan_info.free_meta_offset; } - -static void kasan_set_free_info(struct kmem_cache *cache, - void *object, u8 tag) -{ - struct kasan_alloc_meta *alloc_meta; - u8 idx = 0; - - alloc_meta = get_alloc_info(cache, object); - -#ifdef CONFIG_KASAN_SW_TAGS_IDENTIFY - idx = alloc_meta->free_track_idx; - alloc_meta->free_pointer_tag[idx] = tag; - alloc_meta->free_track_idx = (idx + 1) % KASAN_NR_FREE_STACKS; -#endif - - set_track(&alloc_meta->free_track[idx], GFP_NOWAIT); -} - void kasan_poison_slab(struct page *page) { unsigned long i; @@ -491,7 +473,7 @@ static void *__kasan_kmalloc(struct kmem KASAN_KMALLOC_REDZONE); if (cache->flags & SLAB_KASAN) - set_track(&get_alloc_info(cache, object)->alloc_track, flags); + kasan_set_track(&get_alloc_info(cache, object)->alloc_track, flags); return set_tag(object, tag); } --- a/mm/kasan/generic.c~kasan-record-and-print-the-free-track +++ a/mm/kasan/generic.c @@ -345,3 +345,25 @@ void kasan_record_aux_stack(void *addr) alloc_info->aux_stack[1] = alloc_info->aux_stack[0]; alloc_info->aux_stack[0] = 
kasan_save_stack(GFP_NOWAIT); } + +void kasan_set_free_info(struct kmem_cache *cache, + void *object, u8 tag) +{ + struct kasan_free_meta *free_meta; + + free_meta = get_free_info(cache, object); + kasan_set_track(&free_meta->free_track, GFP_NOWAIT); + + /* + * the object was freed and has free track set + */ + *(u8 *)kasan_mem_to_shadow(object) = KASAN_KMALLOC_FREETRACK; +} + +struct kasan_track *kasan_get_free_track(struct kmem_cache *cache, + void *object, u8 tag) +{ + if (*(u8 *)kasan_mem_to_shadow(object) != KASAN_KMALLOC_FREETRACK) + return NULL; + return &get_free_info(cache, object)->free_track; +} --- a/mm/kasan/generic_report.c~kasan-record-and-print-the-free-track +++ a/mm/kasan/generic_report.c @@ -80,6 +80,7 @@ static const char *get_shadow_bug_type(s break; case KASAN_FREE_PAGE: case KASAN_KMALLOC_FREE: + case KASAN_KMALLOC_FREETRACK: bug_type = "use-after-free"; break; case KASAN_ALLOCA_LEFT: --- a/mm/kasan/kasan.h~kasan-record-and-print-the-free-track +++ a/mm/kasan/kasan.h @@ -17,15 +17,17 @@ #define KASAN_PAGE_REDZONE 0xFE /* redzone for kmalloc_large allocations */ #define KASAN_KMALLOC_REDZONE 0xFC /* redzone inside slub object */ #define KASAN_KMALLOC_FREE 0xFB /* object was freed (kmem_cache_free/kfree) */ +#define KASAN_KMALLOC_FREETRACK 0xFA /* object was freed and has free track set */ #else #define KASAN_FREE_PAGE KASAN_TAG_INVALID #define KASAN_PAGE_REDZONE KASAN_TAG_INVALID #define KASAN_KMALLOC_REDZONE KASAN_TAG_INVALID #define KASAN_KMALLOC_FREE KASAN_TAG_INVALID +#define KASAN_KMALLOC_FREETRACK KASAN_TAG_INVALID #endif -#define KASAN_GLOBAL_REDZONE 0xFA /* redzone for global variable */ -#define KASAN_VMALLOC_INVALID 0xF9 /* unallocated space in vmapped page */ +#define KASAN_GLOBAL_REDZONE 0xF9 /* redzone for global variable */ +#define KASAN_VMALLOC_INVALID 0xF8 /* unallocated space in vmapped page */ /* * Stack redzone shadow values @@ -110,8 +112,9 @@ struct kasan_alloc_meta { * The free stack is stored into struct 
kasan_free_meta. */ depot_stack_handle_t aux_stack[2]; -#endif +#else struct kasan_track free_track[KASAN_NR_FREE_STACKS]; +#endif #ifdef CONFIG_KASAN_SW_TAGS_IDENTIFY u8 free_pointer_tag[KASAN_NR_FREE_STACKS]; u8 free_track_idx; @@ -126,6 +129,9 @@ struct kasan_free_meta { * Otherwise it might be used for the allocator freelist. */ struct qlist_node quarantine_link; +#ifdef CONFIG_KASAN_GENERIC + struct kasan_track free_track; +#endif }; struct kasan_alloc_meta *get_alloc_info(struct kmem_cache *cache, @@ -167,6 +173,10 @@ void kasan_report_invalid_free(void *obj struct page *kasan_addr_to_page(const void *addr); depot_stack_handle_t kasan_save_stack(gfp_t flags); +void kasan_set_track(struct kasan_track *track, gfp_t flags); +void kasan_set_free_info(struct kmem_cache *cache, void *object, u8 tag); +struct kasan_track *kasan_get_free_track(struct kmem_cache *cache, + void *object, u8 tag); #if defined(CONFIG_KASAN_GENERIC) && \ (defined(CONFIG_SLAB) || defined(CONFIG_SLUB)) --- a/mm/kasan/quarantine.c~kasan-record-and-print-the-free-track +++ a/mm/kasan/quarantine.c @@ -145,6 +145,7 @@ static void qlink_free(struct qlist_node if (IS_ENABLED(CONFIG_SLAB)) local_irq_save(flags); + *(u8 *)kasan_mem_to_shadow(object) = KASAN_KMALLOC_FREE; ___cache_free(cache, object, _THIS_IP_); if (IS_ENABLED(CONFIG_SLAB)) --- a/mm/kasan/report.c~kasan-record-and-print-the-free-track +++ a/mm/kasan/report.c @@ -165,26 +165,6 @@ static void describe_object_addr(struct (void *)(object_addr + cache->object_size)); } -static struct kasan_track *kasan_get_free_track(struct kmem_cache *cache, - void *object, u8 tag) -{ - struct kasan_alloc_meta *alloc_meta; - int i = 0; - - alloc_meta = get_alloc_info(cache, object); - -#ifdef CONFIG_KASAN_SW_TAGS_IDENTIFY - for (i = 0; i < KASAN_NR_FREE_STACKS; i++) { - if (alloc_meta->free_pointer_tag[i] == tag) - break; - } - if (i == KASAN_NR_FREE_STACKS) - i = alloc_meta->free_track_idx; -#endif - - return &alloc_meta->free_track[i]; -} - static void 
describe_object(struct kmem_cache *cache, void *object, const void *addr, u8 tag) { @@ -196,8 +176,10 @@ static void describe_object(struct kmem_ print_track(&alloc_info->alloc_track, "Allocated"); pr_err("\n"); free_track = kasan_get_free_track(cache, object, tag); - print_track(free_track, "Freed"); - pr_err("\n"); + if (free_track) { + print_track(free_track, "Freed"); + pr_err("\n"); + } #ifdef CONFIG_KASAN_GENERIC if (alloc_info->aux_stack[0]) { --- a/mm/kasan/tags.c~kasan-record-and-print-the-free-track +++ a/mm/kasan/tags.c @@ -161,3 +161,40 @@ void __hwasan_tag_memory(unsigned long a kasan_poison_shadow((void *)addr, size, tag); } EXPORT_SYMBOL(__hwasan_tag_memory); + +void kasan_set_free_info(struct kmem_cache *cache, + void *object, u8 tag) +{ + struct kasan_alloc_meta *alloc_meta; + u8 idx = 0; + + alloc_meta = get_alloc_info(cache, object); + +#ifdef CONFIG_KASAN_SW_TAGS_IDENTIFY + idx = alloc_meta->free_track_idx; + alloc_meta->free_pointer_tag[idx] = tag; + alloc_meta->free_track_idx = (idx + 1) % KASAN_NR_FREE_STACKS; +#endif + + kasan_set_track(&alloc_meta->free_track[idx], GFP_NOWAIT); +} + +struct kasan_track *kasan_get_free_track(struct kmem_cache *cache, + void *object, u8 tag) +{ + struct kasan_alloc_meta *alloc_meta; + int i = 0; + + alloc_meta = get_alloc_info(cache, object); + +#ifdef CONFIG_KASAN_SW_TAGS_IDENTIFY + for (i = 0; i < KASAN_NR_FREE_STACKS; i++) { + if (alloc_meta->free_pointer_tag[i] == tag) + break; + } + if (i == KASAN_NR_FREE_STACKS) + i = alloc_meta->free_track_idx; +#endif + + return &alloc_meta->free_track[i]; +} _
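For generic KASAN, the patch above gates the free track on the object's shadow byte. A minimal userspace sketch of that gating might look like the code below. The `model_*` names and the single-object layout are assumptions made for illustration; only the two shadow values and the compare-then-return logic come from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shadow values as defined in mm/kasan/kasan.h by the patch above. */
#define KASAN_KMALLOC_FREE      0xFB
#define KASAN_KMALLOC_FREETRACK 0xFA

/* Hypothetical model: one object, one shadow byte, one free track. */
struct model_track {
	unsigned int pid;
	uint32_t stack;		/* stand-in for a stack depot handle */
};

struct model_object {
	uint8_t shadow;
	struct model_track free_track;
};

/* Mirrors kasan_set_free_info(): store the track, then mark the shadow. */
static void model_set_free_info(struct model_object *obj,
				unsigned int pid, uint32_t stack)
{
	obj->free_track.pid = pid;
	obj->free_track.stack = stack;
	obj->shadow = KASAN_KMALLOC_FREETRACK;
}

/* Mirrors the qlink_free() hunk: the shadow goes back to plain "freed". */
static void model_leave_quarantine(struct model_object *obj)
{
	obj->shadow = KASAN_KMALLOC_FREE;
}

/* Mirrors kasan_get_free_track(): NULL unless the shadow says the track is set. */
static const struct model_track *
model_get_free_track(const struct model_object *obj)
{
	if (obj->shadow != KASAN_KMALLOC_FREETRACK)
		return NULL;
	return &obj->free_track;
}
```

In this model a report only prints a "Freed" section while the shadow byte still reads KASAN_KMALLOC_FREETRACK, which is why describe_object() now checks the returned pointer before calling print_track().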
From: Walter Wu <walter-zh.wu@mediatek.com> Subject: kasan: add tests for call_rcu stack recording Test call_rcu() call stack recording and verify that it is printed correctly in the KASAN report. Link: http://lkml.kernel.org/r/20200601051045.1294-1-walter-zh.wu@mediatek.com Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- lib/test_kasan.c | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) --- a/lib/test_kasan.c~kasan-add-tests-for-call_rcu-stack-recording +++ a/lib/test_kasan.c @@ -801,6 +801,35 @@ static noinline void __init vmalloc_oob( static void __init vmalloc_oob(void) {} #endif +static struct kasan_rcu_info { + int i; + struct rcu_head rcu; +} *global_rcu_ptr; + +static noinline void __init kasan_rcu_reclaim(struct rcu_head *rp) +{ + struct kasan_rcu_info *fp = container_of(rp, + struct kasan_rcu_info, rcu); + + kfree(fp); + fp->i = 1; +} + +static noinline void __init kasan_rcu_uaf(void) +{ + struct kasan_rcu_info *ptr; + + pr_info("use-after-free in kasan_rcu_reclaim\n"); + ptr = kmalloc(sizeof(struct kasan_rcu_info), GFP_KERNEL); + if (!ptr) { + pr_err("Allocation failed\n"); + return; + } + + global_rcu_ptr = rcu_dereference_protected(ptr, NULL); + call_rcu(&global_rcu_ptr->rcu, kasan_rcu_reclaim); +} + static int __init kmalloc_tests_init(void) { /* @@ -848,6 +877,7 @@ static int __init kmalloc_tests_init(voi kasan_bitops(); 
kmalloc_double_kzfree(); vmalloc_oob(); + kasan_rcu_uaf(); kasan_restore_multi_shot(multishot); _
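The test's reclaim callback recovers the enclosing object from the embedded rcu_head via container_of(). A userspace sketch of that pointer arithmetic is shown below. `model_container_of`, `struct model_head` and `struct model_info` are hypothetical names for illustration; this is not the kernel's macro, which adds type checking on top of the same offset calculation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct rcu_head and struct kasan_rcu_info. */
struct model_head {
	int dummy;
};

struct model_info {
	int i;
	struct model_head rcu;
};

/* Simplified container_of: step back from the member to the enclosing struct. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* What a callback does when handed only the embedded head. */
static struct model_info *info_from_head(struct model_head *hp)
{
	assert(hp != NULL);
	return model_container_of(hp, struct model_info, rcu);
}
```

Because the callback is given only the embedded member, the write through the recovered pointer after kfree() in kasan_rcu_reclaim() is what triggers the use-after-free report exercised by the test.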
From: Walter Wu <walter-zh.wu@mediatek.com> Subject: kasan: update documentation for generic kasan Generic KASAN now records the last two call_rcu() call stacks and prints them in the KASAN report, so update the documentation accordingly. Link: http://lkml.kernel.org/r/20200601051111.1359-1-walter-zh.wu@mediatek.com Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- Documentation/dev-tools/kasan.rst | 3 +++ 1 file changed, 3 insertions(+) --- a/Documentation/dev-tools/kasan.rst~kasan-update-documentation-for-generic-kasan +++ a/Documentation/dev-tools/kasan.rst @@ -190,6 +190,9 @@ function calls GCC directly inserts the This option significantly enlarges kernel but it gives x1.1-x2 performance boost over outline instrumented kernel. +Generic KASAN prints up to 2 call_rcu() call stacks in reports, the last one +and the second to last. + Software tag-based KASAN ~~~~~~~~~~~~~~~~~~~~~~~~ _
From: Vincenzo Frascino <vincenzo.frascino@arm.com> Subject: kasan: remove kasan_unpoison_stack_above_sp_to() kasan_unpoison_stack_above_sp_to() is defined in kasan code but never used. The function was introduced as part of the commit: commit 9f7d416c36124667 ("kprobes: Unpoison stack in jprobe_return() for KASAN") ... where it was necessary because x86's jprobe_return() would leave stale shadow on the stack, and was an oddity in that regard. Since then, jprobes were removed entirely, and as of commit: commit 80006dbee674f9fa ("kprobes/x86: Remove jprobe implementation") ... there have been no callers of this function. Remove the declaration and the implementation. Link: http://lkml.kernel.org/r/20200706143505.23299-1-vincenzo.frascino@arm.com Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/kasan.h | 2 -- mm/kasan/common.c | 15 --------------- 2 files changed, 17 deletions(-) --- a/include/linux/kasan.h~kasan-remove-kasan_unpoison_stack_above_sp_to +++ a/include/linux/kasan.h @@ -38,7 +38,6 @@ extern void kasan_disable_current(void); void kasan_unpoison_shadow(const void *address, size_t size); void kasan_unpoison_task_stack(struct task_struct *task); -void kasan_unpoison_stack_above_sp_to(const void *watermark); void kasan_alloc_pages(struct page *page, unsigned int order); void kasan_free_pages(struct page *page, unsigned int order); @@ -101,7 +100,6 @@ void kasan_restore_multi_shot(bool enabl static inline void kasan_unpoison_shadow(const void *address, size_t size) {} static inline void kasan_unpoison_task_stack(struct task_struct *task) {} -static inline void kasan_unpoison_stack_above_sp_to(const void *watermark) {} static inline void 
kasan_enable_current(void) {} static inline void kasan_disable_current(void) {} --- a/mm/kasan/common.c~kasan-remove-kasan_unpoison_stack_above_sp_to +++ a/mm/kasan/common.c @@ -180,21 +180,6 @@ asmlinkage void kasan_unpoison_task_stac kasan_unpoison_shadow(base, watermark - base); } -/* - * Clear all poison for the region between the current SP and a provided - * watermark value, as is sometimes required prior to hand-crafted asm function - * returns in the middle of functions. - */ -void kasan_unpoison_stack_above_sp_to(const void *watermark) -{ - const void *sp = __builtin_frame_address(0); - size_t size = watermark - sp; - - if (WARN_ON(sp > watermark)) - return; - kasan_unpoison_shadow(sp, size); -} - void kasan_alloc_pages(struct page *page, unsigned int order) { u8 tag; _
From: Walter Wu <walter-zh.wu@mediatek.com> Subject: lib/test_kasan.c: fix KASAN unit tests for tag-based KASAN When tag-based KASAN is used, the KASAN unit tests don't detect out-of-bounds memory accesses. They need to be fixed. With tag-based KASAN, the state of each 16 aligned bytes of memory is encoded in one shadow byte, and the shadow value is the tag of the pointer. An access just past the object can still fall within the 16 bytes covered by a shadow byte that matches the pointer tag, so the tests need to access the next shadow byte, whose value differs from the pointer tag, for tag-based KASAN to detect the out-of-bounds access. [walter-zh.wu@mediatek.com: use KASAN_SHADOW_SCALE_SIZE instead of 13] Link: http://lkml.kernel.org/r/20200708132524.11688-1-walter-zh.wu@mediatek.com Link: http://lkml.kernel.org/r/20200706115039.16750-1-walter-zh.wu@mediatek.com Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com> Suggested-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- lib/test_kasan.c | 49 +++++++++++++++++++++++++++++---------------- 1 file changed, 32 insertions(+), 17 deletions(-) --- a/lib/test_kasan.c~kasan-fix-kasan-unit-tests-for-tag-based-kasan +++ a/lib/test_kasan.c @@ -23,6 +23,10 @@ #include <asm/page.h> +#include "../mm/kasan/kasan.h" + +#define OOB_TAG_OFF (IS_ENABLED(CONFIG_KASAN_GENERIC) ? 0 : KASAN_SHADOW_SCALE_SIZE) + /* * We assign some test results to these globals to make sure the tests * are not eliminated as dead code. 
@@ -48,7 +52,8 @@ static noinline void __init kmalloc_oob_ return; } - ptr[size] = 'x'; + ptr[size + OOB_TAG_OFF] = 'x'; + kfree(ptr); } @@ -100,7 +105,8 @@ static noinline void __init kmalloc_page return; } - ptr[size] = 0; + ptr[size + OOB_TAG_OFF] = 0; + kfree(ptr); } @@ -170,7 +176,8 @@ static noinline void __init kmalloc_oob_ return; } - ptr2[size2] = 'x'; + ptr2[size2 + OOB_TAG_OFF] = 'x'; + kfree(ptr2); } @@ -188,7 +195,9 @@ static noinline void __init kmalloc_oob_ kfree(ptr1); return; } - ptr2[size2] = 'x'; + + ptr2[size2 + OOB_TAG_OFF] = 'x'; + kfree(ptr2); } @@ -224,7 +233,8 @@ static noinline void __init kmalloc_oob_ return; } - memset(ptr+7, 0, 2); + memset(ptr + 7 + OOB_TAG_OFF, 0, 2); + kfree(ptr); } @@ -240,7 +250,8 @@ static noinline void __init kmalloc_oob_ return; } - memset(ptr+5, 0, 4); + memset(ptr + 5 + OOB_TAG_OFF, 0, 4); + kfree(ptr); } @@ -257,7 +268,8 @@ static noinline void __init kmalloc_oob_ return; } - memset(ptr+1, 0, 8); + memset(ptr + 1 + OOB_TAG_OFF, 0, 8); + kfree(ptr); } @@ -273,7 +285,8 @@ static noinline void __init kmalloc_oob_ return; } - memset(ptr+1, 0, 16); + memset(ptr + 1 + OOB_TAG_OFF, 0, 16); + kfree(ptr); } @@ -289,7 +302,8 @@ static noinline void __init kmalloc_oob_ return; } - memset(ptr, 0, size+5); + memset(ptr, 0, size + 5 + OOB_TAG_OFF); + kfree(ptr); } @@ -423,7 +437,8 @@ static noinline void __init kmem_cache_o return; } - *p = p[size]; + *p = p[size + OOB_TAG_OFF]; + kmem_cache_free(cache, p); kmem_cache_destroy(cache); } @@ -520,25 +535,25 @@ static noinline void __init copy_user_te } pr_info("out-of-bounds in copy_from_user()\n"); - unused = copy_from_user(kmem, usermem, size + 1); + unused = copy_from_user(kmem, usermem, size + 1 + OOB_TAG_OFF); pr_info("out-of-bounds in copy_to_user()\n"); - unused = copy_to_user(usermem, kmem, size + 1); + unused = copy_to_user(usermem, kmem, size + 1 + OOB_TAG_OFF); pr_info("out-of-bounds in __copy_from_user()\n"); - unused = __copy_from_user(kmem, usermem, size + 1); + 
unused = __copy_from_user(kmem, usermem, size + 1 + OOB_TAG_OFF); pr_info("out-of-bounds in __copy_to_user()\n"); - unused = __copy_to_user(usermem, kmem, size + 1); + unused = __copy_to_user(usermem, kmem, size + 1 + OOB_TAG_OFF); pr_info("out-of-bounds in __copy_from_user_inatomic()\n"); - unused = __copy_from_user_inatomic(kmem, usermem, size + 1); + unused = __copy_from_user_inatomic(kmem, usermem, size + 1 + OOB_TAG_OFF); pr_info("out-of-bounds in __copy_to_user_inatomic()\n"); - unused = __copy_to_user_inatomic(usermem, kmem, size + 1); + unused = __copy_to_user_inatomic(usermem, kmem, size + 1 + OOB_TAG_OFF); pr_info("out-of-bounds in strncpy_from_user()\n"); - unused = strncpy_from_user(kmem, usermem, size + 1); + unused = strncpy_from_user(kmem, usermem, size + 1 + OOB_TAG_OFF); vm_munmap((unsigned long)usermem, PAGE_SIZE); kfree(kmem); _
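The reason for OOB_TAG_OFF can be illustrated with a small granule calculation. The sketch below is a deliberately simplified model, not kernel code: it assumes the object starts on a 16-byte boundary and that tag-based mode can only flag an access whose shadow byte lies past the object's last shadow byte. `TAGS_GRANULE`, `shadow_index()` and `tag_mode_can_detect()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* 16 bytes of memory per shadow byte in tag-based mode, per the commit message. */
#define TAGS_GRANULE 16

/* Index of the shadow byte covering a byte offset from the object start. */
static size_t shadow_index(size_t offset)
{
	return offset / TAGS_GRANULE;
}

/*
 * Simplified assumption: an access is only detectable when it falls in a
 * granule beyond the last granule that belongs to the object.
 */
static int tag_mode_can_detect(size_t size, size_t access_offset)
{
	assert(size > 0);
	return shadow_index(access_offset) > shadow_index(size - 1);
}
```

Under these assumptions, an access at `ptr[size]` for an object whose size is not a multiple of 16 stays in the object's last granule and goes unnoticed, while `ptr[size + 16]` always lands in a later granule. That is the effect of adding KASAN_SHADOW_SCALE_SIZE in the tests.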
From: Andrey Konovalov <andreyknvl@google.com> Subject: kasan: don't tag stacks allocated with pagealloc Patch series "kasan: support stack instrumentation for tag-based mode", v2. This patch (of 5): Prepare Software Tag-Based KASAN for stack tagging support. With Tag-Based KASAN when kernel stacks are allocated via pagealloc (which happens when CONFIG_VMAP_STACK is not enabled), they get tagged. KASAN instrumentation doesn't expect the sp register to be tagged, and this leads to false-positive reports. Fix by resetting the tag of kernel stack pointers after allocation. Link: http://lkml.kernel.org/r/cover.1596199677.git.andreyknvl@google.com Link: http://lkml.kernel.org/r/cover.1596544734.git.andreyknvl@google.com Link: http://lkml.kernel.org/r/12d8c678869268dd0884b01271ab592f30792abf.1596544734.git.andreyknvl@google.com Link: http://lkml.kernel.org/r/01c678b877755bcf29009176592402cdf6f2cb15.1596199677.git.andreyknvl@google.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=203497 Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Walter Wu <walter-zh.wu@mediatek.com> Cc: Elena Petrova <lenaptr@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- kernel/fork.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/kernel/fork.c~kasan-dont-tag-stacks-allocated-with-pagealloc +++ a/kernel/fork.c @@ -261,7 +261,7 @@ static unsigned long *alloc_thread_stack THREAD_SIZE_ORDER); if (likely(page)) { - tsk->stack = page_address(page); + tsk->stack = kasan_reset_tag(page_address(page)); return tsk->stack; } return NULL; @@ -302,6 +302,7 @@ static unsigned long *alloc_thread_stack { unsigned long *stack; stack = 
kmem_cache_alloc_node(thread_stack_cache, THREADINFO_GFP, node); + stack = kasan_reset_tag(stack); tsk->stack = stack; return stack; } _
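A note on what resetting a tag means in practice. The sketch below is a user-space model, not the kernel's kasan_reset_tag(): it assumes an arm64-style layout with the tag in the top byte of a 64-bit address and assumes 0xff is the match-all kernel tag. All names in it (TAG_SHIFT, TAG_KERNEL, addr_get_tag(), addr_set_tag(), addr_reset_tag()) are made up for illustration.

```c
#include <stdint.h>

/* Assumption: the tag lives in the top byte of a 64-bit address. */
#define TAG_SHIFT  56
/* Assumption: 0xff is the match-all tag carried by untagged kernel pointers. */
#define TAG_KERNEL 0xffULL

/* Return the tag stored in the top byte of the address. */
static inline uint8_t addr_get_tag(uint64_t addr)
{
	return (uint8_t)(addr >> TAG_SHIFT);
}

/* Replace the top byte with 'tag', leaving the low 56 bits untouched. */
static inline uint64_t addr_set_tag(uint64_t addr, uint8_t tag)
{
	return (addr & ~(0xffULL << TAG_SHIFT)) | ((uint64_t)tag << TAG_SHIFT);
}

/* Model of a tag reset: put the match-all tag back into the top byte. */
static inline uint64_t addr_reset_tag(uint64_t addr)
{
	return addr_set_tag(addr, TAG_KERNEL);
}
```

Under these assumptions, a stack base that came back from the allocator with a random tag is turned back into a plain kernel address before it is stored in tsk->stack, so sp never carries a tag that the instrumentation does not expect.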
From: Andrey Konovalov <andreyknvl@google.com> Subject: efi: provide empty efi_enter_virtual_mode implementation When CONFIG_EFI is not enabled, we might get an "undefined reference to efi_enter_virtual_mode()" error if the efi_enabled() call isn't inlined into start_kernel(). This happens in particular if start_kernel() is annotated with __no_sanitize_address. Link: http://lkml.kernel.org/r/6514652d3a32d3ed33d6eb5c91d0af63bf0d1a0c.1596544734.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Elena Petrova <lenaptr@google.com> Cc: Marco Elver <elver@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Walter Wu <walter-zh.wu@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/efi.h | 4 ++++ 1 file changed, 4 insertions(+) --- a/include/linux/efi.h~efi-provide-empty-efi_enter_virtual_mode-implementation +++ a/include/linux/efi.h @@ -606,7 +606,11 @@ extern void *efi_get_pal_addr (void); extern void efi_map_pal_code (void); extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg); extern void efi_gettimeofday (struct timespec64 *ts); +#ifdef CONFIG_EFI extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, if possible */ +#else +static inline void efi_enter_virtual_mode (void) {} +#endif #ifdef CONFIG_X86 extern efi_status_t efi_query_variable_store(u32 attributes, unsigned long size, _
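The stub pattern used by the patch above can be shown in isolation. This is a minimal sketch with hypothetical names (CONFIG_DEMO_FEATURE, demo_enabled(), demo_enter_mode(), demo_boot(), boot_steps); none of them exist in the kernel. With the feature compiled out, the call site still links because the header supplies an empty inline body, whether or not the compiler manages to prove the branch dead.

```c
#include <stdbool.h>

/* Hypothetical switch standing in for a Kconfig symbol; left undefined here. */
/* #define CONFIG_DEMO_FEATURE 1 */

#ifdef CONFIG_DEMO_FEATURE
void demo_enter_mode(void);	/* real implementation lives in another file */
static inline bool demo_enabled(void) { return true; }
#else
/* Empty stub: the call below always resolves, even if it is not optimized out. */
static inline void demo_enter_mode(void) {}
static inline bool demo_enabled(void) { return false; }
#endif

static int boot_steps;	/* counts the steps executed by demo_boot() */

/* Caller that does not need its own #ifdef around the optional call. */
static inline void demo_boot(void)
{
	boot_steps++;
	if (demo_enabled())
		demo_enter_mode();
	boot_steps++;
}
```

Without the #else branch, the build would depend on demo_enabled() being inlined and folded to false at the call site, which is the situation the changelog describes for an uninstrumented start_kernel().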
From: Andrey Konovalov <andreyknvl@google.com> Subject: kasan, arm64: don't instrument functions that enable kasan This patch prepares Software Tag-Based KASAN for stack tagging support. With stack tagging enabled, KASAN tags the stack variables of each function in its prologue. In start_kernel(), stack variables get tagged before KASAN is enabled via setup_arch()->kasan_init(). As a result, the tags for start_kernel()'s stack variables end up in the temporary shadow memory. Later, when KASAN gets enabled, switches to the normal shadow, and starts checking tags, this leads to false-positive reports, as the proper tags are missing from the normal shadow. Disable KASAN instrumentation for start_kernel(). Also disable it for arm64's setup_arch() as a precaution (it doesn't have any stack variables right now). [andreyknvl@google.com: reorder attributes for start_kernel()] Link: http://lkml.kernel.org/r/26fb6165a17abcf61222eda5184c030fb6b133d1.1596544734.git.andreyknvl@google.com Link: http://lkml.kernel.org/r/55d432671a92e931ab8234b03dc36b14d4c21bfb.1596199677.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Elena Petrova <lenaptr@google.com> Cc: Marco Elver <elver@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Walter Wu <walter-zh.wu@mediatek.com> Cc: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/arm64/kernel/setup.c | 2 +- init/main.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) --- a/arch/arm64/kernel/setup.c~kasan-arm64-dont-instrument-functions-that-enable-kasan +++ a/arch/arm64/kernel/setup.c @@ -276,7 +276,7 @@ arch_initcall(reserve_memblock_reserved_ u64 __cpu_logical_map[NR_CPUS] = { [0 ... 
NR_CPUS-1] = INVALID_HWID }; -void __init setup_arch(char **cmdline_p) +void __init __no_sanitize_address setup_arch(char **cmdline_p) { init_mm.start_code = (unsigned long) _text; init_mm.end_code = (unsigned long) _etext; --- a/init/main.c~kasan-arm64-dont-instrument-functions-that-enable-kasan +++ a/init/main.c @@ -829,7 +829,7 @@ void __init __weak arch_call_rest_init(v rest_init(); } -asmlinkage __visible void __init start_kernel(void) +asmlinkage __visible void __init __no_sanitize_address start_kernel(void) { char *command_line; char *after_dashes; _
From: Andrey Konovalov <andreyknvl@google.com> Subject: kasan: allow enabling stack tagging for tag-based mode Use CONFIG_KASAN_STACK to enable stack tagging. Note, that HWASAN short granules [1] are disabled. Supporting those will require more kernel changes. [1] https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html Link: http://lkml.kernel.org/r/e7febb907b539c3730780df587ce0b38dc558c3d.1596199677.git.andreyknvl@google.com Link: http://lkml.kernel.org/r/99f7d90a4237431bf5988599fb41358e92876eb0.1596544734.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Elena Petrova <lenaptr@google.com> Cc: Marco Elver <elver@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Walter Wu <walter-zh.wu@mediatek.com> Cc: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- scripts/Makefile.kasan | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/scripts/Makefile.kasan~kasan-allow-enabling-stack-tagging-for-tag-based-mode +++ a/scripts/Makefile.kasan @@ -44,7 +44,8 @@ else endif CFLAGS_KASAN := -fsanitize=kernel-hwaddress \ - -mllvm -hwasan-instrument-stack=0 \ + -mllvm -hwasan-instrument-stack=$(CONFIG_KASAN_STACK) \ + -mllvm -hwasan-use-short-granules=0 \ $(instrumentation_flags) endif # CONFIG_KASAN_SW_TAGS _
From: Andrey Konovalov <andreyknvl@google.com> Subject: kasan: adjust kasan_stack_oob for tag-based mode Use OOB_TAG_OFF as access offset to land the access into the next granule. Link: http://lkml.kernel.org/r/403b259f1de49a7a3694531c851ac28326a586a8.1596199677.git.andreyknvl@google.com Link: http://lkml.kernel.org/r/3063ab1411e92bce36061a96e25b651212e70ba6.1596544734.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Suggested-by: Walter Wu <walter-zh.wu@mediatek.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Elena Petrova <lenaptr@google.com> Cc: Marco Elver <elver@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- lib/test_kasan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/lib/test_kasan.c~kasan-adjust-kasan_stack_oob-for-tag-based-mode +++ a/lib/test_kasan.c @@ -488,7 +488,7 @@ static noinline void __init kasan_global static noinline void __init kasan_stack_oob(void) { char stack_array[10]; - volatile int i = 0; + volatile int i = OOB_TAG_OFF; char *p = &stack_array[ARRAY_SIZE(stack_array) + i]; pr_info("out-of-bounds on stack\n"); _
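Why an extra offset is needed for the stack test can be worked out with granule arithmetic. The sketch below assumes a 16-byte tag granule and an object that starts on a granule boundary; both are assumptions for illustration, as are the names GRANULE_SIZE, granule_of() and oob_leaves_last_granule(). The actual value of OOB_TAG_OFF is not shown in this patch and is not assumed here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: one tag covers 16 bytes of memory. */
#define GRANULE_SIZE 16

/* Index of the granule that byte offset 'off' falls into. */
static inline size_t granule_of(size_t off)
{
	return off / GRANULE_SIZE;
}

/*
 * For an object of 'size' bytes (size >= 1) that starts on a granule
 * boundary, report whether an access at byte 'size + extra' falls outside
 * the granule holding the object's last valid byte. Only such an access
 * can land on a granule with a different tag.
 */
static inline bool oob_leaves_last_granule(size_t size, size_t extra)
{
	return granule_of(size + extra) > granule_of(size - 1);
}
```

For the 10-byte stack_array in the test, an access at index 10 (extra = 0) stays inside the first granule under these assumptions, so a tag check cannot flag it; the access has to be pushed to index 16 or beyond (extra >= 6) to reach the next granule.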
From: Vlastimil Babka <vbabka@suse.cz> Subject: mm, page_alloc: use unlikely() in task_capc() Hugh noted that task_capc() could use unlikely(), as most of the time there is no capture in progress and we are in the page freeing hot path. Indeed, adding unlikely() produces assembly that better matches the assumption and moves all the tests away from the hot path. I have also noticed that we don't need to test for cc->direct_compaction, as the only place we set current->capture_control is compact_zone_order(), which also always sets cc->direct_compaction to true. Link: http://lkml.kernel.org/r/4a24f7af-3aa5-6e80-4ae6-8f253b562039@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Suggested-by: Hugh Dickins <hughd@google.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Li Wang <liwang@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/page_alloc.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) --- a/mm/page_alloc.c~mm-page_alloc-use-unlikely-in-task_capc +++ a/mm/page_alloc.c @@ -813,11 +813,10 @@ static inline struct capture_control *ta { struct capture_control *capc = current->capture_control; - return capc && + return unlikely(capc) && !(current->flags & PF_KTHREAD) && !capc->page && - capc->cc->zone == zone && - capc->cc->direct_compaction ? capc : NULL; + capc->cc->zone == zone ? capc : NULL; } static inline bool _
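For readers unfamiliar with the hint, here is a stand-alone sketch of the idiom. It assumes the usual GCC/Clang definition of unlikely() via __builtin_expect; struct capture and pick_capture() are hypothetical stand-ins for the kernel's capture_control handling, not copies of it.

```c
#include <stddef.h>

/* Assumed definitions, following the common GCC/Clang idiom. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for a capture request. */
struct capture {
	int zone;	/* zone the request is interested in */
	void *page;	/* non-NULL once a page has been captured */
};

/*
 * Return 'c' only if a capture is pending for 'zone'. The first test is
 * marked unlikely(), so the compiler lays out the "no capture" case as
 * the straight-line path and moves the remaining tests out of the way.
 */
static inline struct capture *pick_capture(struct capture *c, int zone)
{
	return unlikely(c) && !c->page && c->zone == zone ? c : NULL;
}
```

The hint changes code layout only. The function returns the same result with or without it, which is why the patch can add it without any functional risk.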
From: Jaewon Kim <jaewon31.kim@samsung.com> Subject: page_alloc: consider highatomic reserve in watermark fast zone_watermark_fast() was introduced by commit 48ee5f3696f6 ("mm, page_alloc: shortcut watermark checks for order-0 pages"). That commit simply checks whether the number of free pages exceeds the watermark, without additional calculations such as reducing the watermark. It accounts for free CMA pages but not for the highatomic reserve. This can exhaust all free pages other than the high-order atomic free pages. Assume that the reserved_highatomic pageblocks are bigger than the min watermark and that only a few free pages exist apart from the high-order atomic free pages. Because zone_watermark_fast() passes the allocation without considering the high-order atomic free pages, normal reclaimable allocations such as GFP_HIGHUSER will consume all the remaining free pages. An order-0 atomic allocation may then fail. This means the min watermark is not protected against non-atomic allocations: an order-0 atomic allocation with ALLOC_HARDER can fail unexpectedly, and a __GFP_MEMALLOC allocation with ALLOC_NO_WATERMARKS can fail as well. To avoid the problem, zone_watermark_fast() should consider the highatomic reserve. If the actual number of free high-order atomic pages were tracked accurately, as it is for free CMA pages, we could use that; this patch just uses nr_reserved_highatomic. Additionally, introduce __zone_watermark_unusable_free() to factor out the common parts of zone_watermark_fast() and __zone_watermark_ok(). This is an example of an ALLOC_HARDER allocation failure on a v4.19-based kernel. 
Binder:9343_3: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null) Call trace: [<ffffff8008f40f8c>] dump_stack+0xb8/0xf0 [<ffffff8008223320>] warn_alloc+0xd8/0x12c [<ffffff80082245e4>] __alloc_pages_nodemask+0x120c/0x1250 [<ffffff800827f6e8>] new_slab+0x128/0x604 [<ffffff800827b0cc>] ___slab_alloc+0x508/0x670 [<ffffff800827ba00>] __kmalloc+0x2f8/0x310 [<ffffff80084ac3e0>] context_struct_to_string+0x104/0x1cc [<ffffff80084ad8fc>] security_sid_to_context_core+0x74/0x144 [<ffffff80084ad880>] security_sid_to_context+0x10/0x18 [<ffffff800849bd80>] selinux_secid_to_secctx+0x20/0x28 [<ffffff800849109c>] security_secid_to_secctx+0x3c/0x70 [<ffffff8008bfe118>] binder_transaction+0xe68/0x454c Mem-Info: active_anon:102061 inactive_anon:81551 isolated_anon:0 active_file:59102 inactive_file:68924 isolated_file:64 unevictable:611 dirty:63 writeback:0 unstable:0 slab_reclaimable:13324 slab_unreclaimable:44354 mapped:83015 shmem:4858 pagetables:26316 bounce:0 free:2727 free_pcp:1035 free_cma:178 Node 0 active_anon:408244kB inactive_anon:326204kB active_file:236408kB inactive_file:275696kB unevictable:2444kB isolated(anon):0kB isolated(file):256kB mapped:332060kB dirty:252kB writeback:0kB shmem:19432kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no Normal free:10908kB min:6192kB low:44388kB high:47060kB active_anon:409160kB inactive_anon:325924kB active_file:235820kB inactive_file:276628kB unevictable:2444kB writepending:252kB present:3076096kB managed:2673676kB mlocked:2444kB kernel_stack:62512kB pagetables:105264kB bounce:0kB free_pcp:4140kB local_pcp:40kB free_cma:712kB lowmem_reserve[]: 0 0 Normal: 505*4kB (H) 357*8kB (H) 201*16kB (H) 65*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 10236kB 138826 total pagecache pages 5460 pages in swap cache Swap cache stats: add 8273090, delete 8267506, find 1004381/4060142 This is an example of ALLOC_NO_WATERMARKS allocation failure using v4.14 based kernel. 
kswapd0: page allocation failure: order:0, mode:0x140000a(GFP_NOIO|__GFP_HIGHMEM|__GFP_MOVABLE), nodemask=(null) kswapd0 cpuset=/ mems_allowed=0 CPU: 4 PID: 1221 Comm: kswapd0 Not tainted 4.14.113-18770262-userdebug #1 Call trace: [<0000000000000000>] dump_backtrace+0x0/0x248 [<0000000000000000>] show_stack+0x18/0x20 [<0000000000000000>] __dump_stack+0x20/0x28 [<0000000000000000>] dump_stack+0x68/0x90 [<0000000000000000>] warn_alloc+0x104/0x198 [<0000000000000000>] __alloc_pages_nodemask+0xdc0/0xdf0 [<0000000000000000>] zs_malloc+0x148/0x3d0 [<0000000000000000>] zram_bvec_rw+0x410/0x798 [<0000000000000000>] zram_rw_page+0x88/0xdc [<0000000000000000>] bdev_write_page+0x70/0xbc [<0000000000000000>] __swap_writepage+0x58/0x37c [<0000000000000000>] swap_writepage+0x40/0x4c [<0000000000000000>] shrink_page_list+0xc30/0xf48 [<0000000000000000>] shrink_inactive_list+0x2b0/0x61c [<0000000000000000>] shrink_node_memcg+0x23c/0x618 [<0000000000000000>] shrink_node+0x1c8/0x304 [<0000000000000000>] kswapd+0x680/0x7c4 [<0000000000000000>] kthread+0x110/0x120 [<0000000000000000>] ret_from_fork+0x10/0x18 Mem-Info: active_anon:111826 inactive_anon:65557 isolated_anon:0 active_file:44260 inactive_file:83422 isolated_file:0 unevictable:4158 dirty:117 writeback:0 unstable:0 slab_reclaimable:13943 slab_unreclaimable:43315 mapped:102511 shmem:3299 pagetables:19566 bounce:0 free:3510 free_pcp:553 free_cma:0 Node 0 active_anon:447304kB inactive_anon:262228kB active_file:177040kB inactive_file:333688kB unevictable:16632kB isolated(anon):0kB isolated(file):0kB mapped:410044kB dirty:468kB writeback:0kB shmem:13196kB writeback_tmp:0kB unstable:0kB all_unreclaimable? 
no Normal free:14040kB min:7440kB low:94500kB high:98136kB reserved_highatomic:32768KB active_anon:447336kB inactive_anon:261668kB active_file:177572kB inactive_file:333768kB unevictable:16632kB writepending:480kB present:4081664kB managed:3637088kB mlocked:16632kB kernel_stack:47072kB pagetables:78264kB bounce:0kB free_pcp:2280kB local_pcp:720kB free_cma:0kB [ 4738.329607] lowmem_reserve[]: 0 0 Normal: 860*4kB (H) 453*8kB (H) 180*16kB (H) 26*32kB (H) 34*64kB (H) 6*128kB (H) 2*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 14232kB This is a trace log which shows GFP_HIGHUSER allocations consuming free pages right before the ALLOC_NO_WATERMARKS failure. <...>-22275 [006] .... 889.213383: mm_page_alloc: page=00000000d2be5665 pfn=970744 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213385: mm_page_alloc: page=000000004b2335c2 pfn=970745 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213387: mm_page_alloc: page=00000000017272e1 pfn=970278 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213389: mm_page_alloc: page=00000000c4be79fb pfn=970279 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213391: mm_page_alloc: page=00000000f8a51d4f pfn=970260 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213393: mm_page_alloc: page=000000006ba8f5ac pfn=970261 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213395: mm_page_alloc: page=00000000819f1cd3 pfn=970196 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 
889.213396: mm_page_alloc: page=00000000f6b72a64 pfn=970197 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO kswapd0-1207 [005] ...1 889.213398: mm_page_alloc: page= (null) pfn=0 order=0 migratetype=1 nr_free=3650 gfp_flags=GFP_NOWAIT|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_MOVABLE [jaewon31.kim@samsung.com: remove redundant code for high-order] Link: http://lkml.kernel.org/r/20200623035242.27232-1-jaewon31.kim@samsung.com Link: http://lkml.kernel.org/r/20200619235958.11283-1-jaewon31.kim@samsung.com Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com> Reported-by: Yong-Taek Lee <ytk.lee@samsung.com> Suggested-by: Minchan Kim <minchan@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Yong-Taek Lee <ytk.lee@samsung.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/page_alloc.c | 66 +++++++++++++++++++++++++--------------------- 1 file changed, 36 insertions(+), 30 deletions(-) --- a/mm/page_alloc.c~page_alloc-consider-highatomic-reserve-in-watermark-fast +++ a/mm/page_alloc.c @@ -3486,6 +3486,29 @@ static noinline bool should_fail_alloc_p } ALLOW_ERROR_INJECTION(should_fail_alloc_page, TRUE); +static inline long __zone_watermark_unusable_free(struct zone *z, + unsigned int order, unsigned int alloc_flags) +{ + const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM)); + long unusable_free = (1 << order) - 1; + + /* + * If the caller does not have rights to ALLOC_HARDER then subtract + * the high-atomic reserves. This will over-estimate the size of the + * atomic reserve but it avoids a search. 
+ */ + if (likely(!alloc_harder)) + unusable_free += z->nr_reserved_highatomic; + +#ifdef CONFIG_CMA + /* If allocation can't use CMA areas don't use free CMA pages */ + if (!(alloc_flags & ALLOC_CMA)) + unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES); +#endif + + return unusable_free; +} + /* * Return true if free base pages are above 'mark'. For high-order checks it * will return true of the order-0 watermark is reached and there is at least @@ -3501,19 +3524,12 @@ bool __zone_watermark_ok(struct zone *z, const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM)); /* free_pages may go negative - that's OK */ - free_pages -= (1 << order) - 1; + free_pages -= __zone_watermark_unusable_free(z, order, alloc_flags); if (alloc_flags & ALLOC_HIGH) min -= min / 2; - /* - * If the caller does not have rights to ALLOC_HARDER then subtract - * the high-atomic reserves. This will over-estimate the size of the - * atomic reserve but it avoids a search. - */ - if (likely(!alloc_harder)) { - free_pages -= z->nr_reserved_highatomic; - } else { + if (unlikely(alloc_harder)) { /* * OOM victims can try even harder than normal ALLOC_HARDER * users on the grounds that it's definitely going to be in @@ -3526,13 +3542,6 @@ bool __zone_watermark_ok(struct zone *z, min -= min / 4; } - -#ifdef CONFIG_CMA - /* If allocation can't use CMA areas don't use free CMA pages */ - if (!(alloc_flags & ALLOC_CMA)) - free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES); -#endif - /* * Check watermarks for an order-0 allocation request. 
If these * are not met, then a high-order request also cannot go ahead @@ -3581,25 +3590,22 @@ static inline bool zone_watermark_fast(s unsigned long mark, int highest_zoneidx, unsigned int alloc_flags) { - long free_pages = zone_page_state(z, NR_FREE_PAGES); - long cma_pages = 0; + long free_pages; -#ifdef CONFIG_CMA - /* If allocation can't use CMA areas don't use free CMA pages */ - if (!(alloc_flags & ALLOC_CMA)) - cma_pages = zone_page_state(z, NR_FREE_CMA_PAGES); -#endif + free_pages = zone_page_state(z, NR_FREE_PAGES); /* * Fast check for order-0 only. If this fails then the reserves - * need to be calculated. There is a corner case where the check - * passes but only the high-order atomic reserve are free. If - * the caller is !atomic then it'll uselessly search the free - * list. That corner case is then slower but it is harmless. + * need to be calculated. */ - if (!order && (free_pages - cma_pages) > - mark + z->lowmem_reserve[highest_zoneidx]) - return true; + if (!order) { + long fast_free; + + fast_free = free_pages; + fast_free -= __zone_watermark_unusable_free(z, 0, alloc_flags); + if (fast_free > mark + z->lowmem_reserve[highest_zoneidx]) + return true; + } return __zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags, free_pages); _
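The effect of the new accounting on the order-0 fast path can be checked with a small user-space model. This is a sketch, not the kernel code: struct zone_model, the F_HARDER and F_CMA flag bits, and both function names are hypothetical stand-ins. The page counts used in the note after the code come from the v4.14 report quoted in the changelog, converted assuming 4kB pages.

```c
#include <stdbool.h>

/* Hypothetical flag bits; they only mirror the roles of ALLOC_HARDER/ALLOC_CMA. */
#define F_HARDER 0x1u
#define F_CMA    0x2u

/* Hypothetical snapshot of the per-zone counters involved, in pages. */
struct zone_model {
	long nr_free;
	long nr_free_cma;
	long nr_reserved_highatomic;
	long lowmem_reserve;
};

/* Pages that are counted as free but that this request may not use. */
static inline long unusable_free(const struct zone_model *z,
				 unsigned int order, unsigned int flags)
{
	long unusable = (1L << order) - 1;

	/* Callers without "harder" rights must leave the highatomic reserve alone. */
	if (!(flags & F_HARDER))
		unusable += z->nr_reserved_highatomic;
	/* Callers that cannot use CMA areas must not count free CMA pages. */
	if (!(flags & F_CMA))
		unusable += z->nr_free_cma;

	return unusable;
}

/* Order-0 fast check: usable free pages must exceed mark plus lowmem reserve. */
static inline bool fast_check_order0(const struct zone_model *z, long mark,
				     unsigned int flags)
{
	long usable = z->nr_free - unusable_free(z, 0, flags);

	return usable > mark + z->lowmem_reserve;
}
```

With nr_free = 3510, nr_reserved_highatomic = 8192 and mark = 1860 (14040kB, 32768kB and 7440kB in the report), a plain request now fails the fast check because 3510 - 8192 is far below 1860, whereas comparing the raw free count against the mark would have let it through. A request with F_HARDER still passes, which is the protection the reserve is meant to provide.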
From: Charan Teja Reddy <charante@codeaurora.org> Subject: mm, page_alloc: skip ->waternark_boost for atomic order-0 allocations When boosting is enabled, the rate of atomic order-0 allocation failures is observed to be high because free levels in the system are checked with the ->watermark_boost offset. This is not a problem for sleepable allocations, but for atomic allocations it looks like a regression. The problem is seen frequently on an Android kernel running on Snapdragon hardware with 4GB of RAM. When no extfrag event has occurred in the system, the ->watermark_boost factor is zero, so the watermark configuration is: _watermark = ( [WMARK_MIN] = 1272, --> ~5MB [WMARK_LOW] = 9067, --> ~36MB [WMARK_HIGH] = 9385), --> ~38MB watermark_boost = 0 After launching some memory-hungry applications in Android, extfrag events can occur to the extent that ->watermark_boost is set to its maximum, i.e. the default boost factor raises it to 150% of the high watermark: _watermark = ( [WMARK_MIN] = 1272, --> ~5MB [WMARK_LOW] = 9067, --> ~36MB [WMARK_HIGH] = 9385), --> ~38MB watermark_boost = 14077, -->~57MB With the default system configuration, ~2MB of free memory suffices for an atomic order-0 allocation to succeed. Boosting raises the min watermark to ~61MB, so a successful atomic order-0 allocation needs at least ~23MB of free memory (from the calculations in zone_watermark_ok(), min = 3/4(min/2)). As a result, failures are observed even though the system has ~20MB of free memory. In testing, this is reproducible as early as the first 300 seconds after boot, and with even lower RAM configurations (<2GB) as early as the first 150 seconds after boot. These failures can be avoided by excluding ->watermark_boost from the watermark calculations for atomic order-0 allocations. 
[akpm@linux-foundation.org: fix comment grammar, reflow comment] [charante@codeaurora.org: fix suggested by Mel Gorman] Link: http://lkml.kernel.org/r/31556793-57b1-1c21-1a9d-22674d9bd938@codeaurora.org Link: http://lkml.kernel.org/r/1589882284-21010-1-git-send-email-charante@codeaurora.org Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Vinayak Menon <vinmenon@codeaurora.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/page_alloc.c | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-) --- a/mm/page_alloc.c~mm-page_alloc-skip-waternark_boost-for-atomic-order-0-allocations +++ a/mm/page_alloc.c @@ -3588,7 +3588,7 @@ bool zone_watermark_ok(struct zone *z, u static inline bool zone_watermark_fast(struct zone *z, unsigned int order, unsigned long mark, int highest_zoneidx, - unsigned int alloc_flags) + unsigned int alloc_flags, gfp_t gfp_mask) { long free_pages; @@ -3607,8 +3607,23 @@ static inline bool zone_watermark_fast(s return true; } - return __zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags, - free_pages); + if (__zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags, + free_pages)) + return true; + /* + * Ignore watermark boosting for GFP_ATOMIC order-0 allocations + * when checking the min watermark. The min watermark is the + * point where boosting is ignored so that kswapd is woken up + * when below the low watermark. 
+ */ + if (unlikely(!order && (gfp_mask & __GFP_ATOMIC) && z->watermark_boost + && ((alloc_flags & ALLOC_WMARK_MASK) == WMARK_MIN))) { + mark = z->_watermark[WMARK_MIN]; + return __zone_watermark_ok(z, order, mark, highest_zoneidx, + alloc_flags, free_pages); + } + + return false; } bool zone_watermark_ok_safe(struct zone *z, unsigned int order, @@ -3752,7 +3767,8 @@ retry: mark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK); if (!zone_watermark_fast(zone, order, mark, - ac->highest_zoneidx, alloc_flags)) { + ac->highest_zoneidx, alloc_flags, + gfp_mask)) { int ret; #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT _
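The figures in the changelog (~2MB, ~23MB) follow from the min = 3/4(min/2) reduction and can be reproduced with a short model. This is a sketch under stated assumptions: F_HIGH and F_HARDER are hypothetical flag bits standing in for ALLOC_HIGH and ALLOC_HARDER, the lowmem reserve is taken as zero, pages are 4kB, and the function names are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical flag bits; they only mirror the roles of ALLOC_HIGH/ALLOC_HARDER. */
#define F_HIGH   0x1u
#define F_HARDER 0x2u

/* Apply the reductions an atomic request gets: halve, then take off a quarter. */
static inline long effective_min(long mark, unsigned int flags)
{
	long min = mark;

	if (flags & F_HIGH)
		min -= min / 2;
	if (flags & F_HARDER)
		min -= min / 4;

	return min;
}

/*
 * Order-0 atomic check against the min watermark: try the boosted mark
 * first and, if that fails while a boost is active, retry against the
 * unboosted mark, which is the fallback the patch adds.
 */
static inline bool atomic_order0_ok(long free_pages, long wmark_min,
				    long boost, unsigned int flags)
{
	if (free_pages > effective_min(wmark_min + boost, flags))
		return true;

	return boost && free_pages > effective_min(wmark_min, flags);
}
```

With WMARK_MIN = 1272 and watermark_boost = 14077, the unboosted threshold is 477 pages (about 2MB) and the boosted one is 5757 pages (about 23MB). A system with 5120 free pages (20MB) therefore fails the boosted check but passes the retry.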
From: David Hildenbrand <david@redhat.com> Subject: mm: remove vm_total_pages The global variable "vm_total_pages" is a relic from older days. There is only a single user that reads the variable - build_all_zonelists() - and the first thing it does is update it. Use a local variable in build_all_zonelists() instead and remove the global variable. Link: http://lkml.kernel.org/r/20200619132410.23859-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/swap.h | 1 - mm/memory_hotplug.c | 3 --- mm/page-writeback.c | 6 ++---- mm/page_alloc.c | 2 ++ mm/vmscan.c | 5 ----- 5 files changed, 4 insertions(+), 13 deletions(-) --- a/include/linux/swap.h~mm-drop-vm_total_pages +++ a/include/linux/swap.h @@ -372,7 +372,6 @@ extern unsigned long mem_cgroup_shrink_n extern unsigned long shrink_all_memory(unsigned long nr_pages); extern int vm_swappiness; extern int remove_mapping(struct address_space *mapping, struct page *page); -extern unsigned long vm_total_pages; extern unsigned long reclaim_pages(struct list_head *page_list); #ifdef CONFIG_NUMA --- a/mm/memory_hotplug.c~mm-drop-vm_total_pages +++ a/mm/memory_hotplug.c @@ -844,8 +844,6 @@ int __ref online_pages(unsigned long pfn kswapd_run(nid); kcompactd_run(nid); - vm_total_pages = nr_free_pagecache_pages(); - writeback_set_ratelimit(); memory_notify(MEM_ONLINE, &arg); @@ -1595,7 +1593,6 @@ static int __ref __offline_pages(unsigne kcompactd_stop(node); } - vm_total_pages = nr_free_pagecache_pages(); writeback_set_ratelimit(); memory_notify(MEM_OFFLINE, &arg); --- a/mm/page_alloc.c~mm-drop-vm_total_pages +++ a/mm/page_alloc.c @@ 
-5912,6 +5912,8 @@ build_all_zonelists_init(void) */ void __ref build_all_zonelists(pg_data_t *pgdat) { + unsigned long vm_total_pages; + if (system_state == SYSTEM_BOOTING) { build_all_zonelists_init(); } else { --- a/mm/page-writeback.c~mm-drop-vm_total_pages +++ a/mm/page-writeback.c @@ -2076,13 +2076,11 @@ static int page_writeback_cpu_online(uns * Called early on to tune the page writeback dirty limits. * * We used to scale dirty pages according to how total memory - * related to pages that could be allocated for buffers (by - * comparing nr_free_buffer_pages() to vm_total_pages. + * related to pages that could be allocated for buffers. * * However, that was when we used "dirty_ratio" to scale with * all memory, and we don't do that any more. "dirty_ratio" - * is now applied to total non-HIGHPAGE memory (by subtracting - * totalhigh_pages from vm_total_pages), and as such we can't + * is now applied to total non-HIGHPAGE memory, and as such we can't * get into the old insane situation any more where we had * large amounts of dirty pages compared to a small amount of * non-HIGHMEM memory. --- a/mm/vmscan.c~mm-drop-vm_total_pages +++ a/mm/vmscan.c @@ -170,11 +170,6 @@ struct scan_control { * From 0 .. 200. Higher means more swappy. */ int vm_swappiness = 60; -/* - * The total number of pages which are beyond the high watermark within all - * zones. - */ -unsigned long vm_total_pages; static void set_task_reclaim_state(struct task_struct *task, struct reclaim_state *rs) _
From: David Hildenbrand <david@redhat.com> Subject: mm/page_alloc: remove nr_free_pagecache_pages() nr_free_pagecache_pages() isn't used outside page_alloc.c anymore - and the name does not really help to understand what's going on. Let's open-code it instead and add a comment. Link: http://lkml.kernel.org/r/20200619132410.23859-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/swap.h | 1 - mm/page_alloc.c | 16 ++-------------- 2 files changed, 2 insertions(+), 15 deletions(-) --- a/include/linux/swap.h~mm-page_alloc-drop-nr_free_pagecache_pages +++ a/include/linux/swap.h @@ -328,7 +328,6 @@ void workingset_update_node(struct xa_no /* linux/mm/page_alloc.c */ extern unsigned long totalreserve_pages; extern unsigned long nr_free_buffer_pages(void); -extern unsigned long nr_free_pagecache_pages(void); /* Definition of global_zone_page_state not available yet */ #define nr_free_pages() global_zone_page_state(NR_FREE_PAGES) --- a/mm/page_alloc.c~mm-page_alloc-drop-nr_free_pagecache_pages +++ a/mm/page_alloc.c @@ -5186,19 +5186,6 @@ unsigned long nr_free_buffer_pages(void) } EXPORT_SYMBOL_GPL(nr_free_buffer_pages); -/** - * nr_free_pagecache_pages - count number of pages beyond high watermark - * - * nr_free_pagecache_pages() counts the number of pages which are beyond the - * high watermark within all zones. - * - * Return: number of pages beyond high watermark within all zones. 
- */ -unsigned long nr_free_pagecache_pages(void) -{ - return nr_free_zone_pages(gfp_zone(GFP_HIGHUSER_MOVABLE)); -} - static inline void show_node(struct zone *zone) { if (IS_ENABLED(CONFIG_NUMA)) @@ -5920,7 +5907,8 @@ void __ref build_all_zonelists(pg_data_t __build_all_zonelists(pgdat); /* cpuset refresh routine should be here */ } - vm_total_pages = nr_free_pagecache_pages(); + /* Get the number of free pages beyond high watermark in all zones. */ + vm_total_pages = nr_free_zone_pages(gfp_zone(GFP_HIGHUSER_MOVABLE)); /* * Disable grouping by mobility if the number of pages in the * system is too low to allow the mechanism to work. It would be _
From: David Hildenbrand <david@redhat.com> Subject: mm/memory_hotplug: document why shuffle_zone() is relevant It's not completely obvious why we have to shuffle the complete zone - introduced in commit e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization") - because some sort of shuffling is already performed when onlining pages via __free_one_page(), placing MAX_ORDER-1 pages either to the head or the tail of the freelist. Let's document why we have to shuffle the complete zone when exposing larger, contiguous physical memory areas to the buddy. Link: http://lkml.kernel.org/r/20200624094741.9918-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/memory_hotplug.c | 8 ++++++++ 1 file changed, 8 insertions(+) --- a/mm/memory_hotplug.c~mm-memory_hotplug-document-why-shuffle_zone-is-relevant +++ a/mm/memory_hotplug.c @@ -831,6 +831,14 @@ int __ref online_pages(unsigned long pfn zone->zone_pgdat->node_present_pages += onlined_pages; pgdat_resize_unlock(zone->zone_pgdat, &flags); + /* + * When exposing larger, physically contiguous memory areas to the + * buddy, shuffling in the buddy (when freeing onlined pages, putting + * them either to the head or the tail of the freelist) is only helpful + * for maintaining the shuffle, but not for creating the initial + * shuffle. Shuffle the whole zone to make sure the just onlined pages + * are properly distributed across the whole freelist. + */ shuffle_zone(zone); node_states_set_node(nid, &arg); _
From: David Hildenbrand <david@redhat.com> Subject: mm/shuffle: remove dynamic reconfiguration Commit e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization") promised "autodetection of a memory-side-cache (to be added in a follow-on patch)" over a year ago. The original series included patches [1], however, they were dropped during review [2] to be followed-up later. Due to lack of platforms that publish an HMAT, autodetection is currently not implemented. However, manual activation is actively used [3]. Let's simplify for now and re-add when really (ever?) needed. [1] https://lkml.kernel.org/r/154510700291.1941238.817190985966612531.stgit@dwillia2-desk3.amr.corp.intel.com [2] https://lkml.kernel.org/r/154690326478.676627.103843791978176914.stgit@dwillia2-desk3.amr.corp.intel.com [3] https://lkml.kernel.org/r/CAPcyv4irwGUU2x+c6b4L=KbB1dnasNKaaZd6oSpYjL9kfsnROQ@mail.gmail.com Link: http://lkml.kernel.org/r/20200624094741.9918-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/shuffle.c | 28 ++-------------------------- mm/shuffle.h | 17 ----------------- 2 files changed, 2 insertions(+), 43 deletions(-) --- a/mm/shuffle.c~mm-shuffle-remove-dynamic-reconfiguration +++ a/mm/shuffle.c @@ -10,33 +10,11 @@ #include "shuffle.h" DEFINE_STATIC_KEY_FALSE(page_alloc_shuffle_key); -static unsigned long shuffle_state __ro_after_init; - -/* - * Depending on the architecture, module parameter parsing may run - * before, or after the cache 
detection. SHUFFLE_FORCE_DISABLE prevents, - * or reverts the enabling of the shuffle implementation. SHUFFLE_ENABLE - * attempts to turn on the implementation, but aborts if it finds - * SHUFFLE_FORCE_DISABLE already set. - */ -__meminit void page_alloc_shuffle(enum mm_shuffle_ctl ctl) -{ - if (ctl == SHUFFLE_FORCE_DISABLE) - set_bit(SHUFFLE_FORCE_DISABLE, &shuffle_state); - - if (test_bit(SHUFFLE_FORCE_DISABLE, &shuffle_state)) { - if (test_and_clear_bit(SHUFFLE_ENABLE, &shuffle_state)) - static_branch_disable(&page_alloc_shuffle_key); - } else if (ctl == SHUFFLE_ENABLE - && !test_and_set_bit(SHUFFLE_ENABLE, &shuffle_state)) - static_branch_enable(&page_alloc_shuffle_key); -} static bool shuffle_param; static int shuffle_show(char *buffer, const struct kernel_param *kp) { - return sprintf(buffer, "%c\n", test_bit(SHUFFLE_ENABLE, &shuffle_state) - ? 'Y' : 'N'); + return sprintf(buffer, "%c\n", shuffle_param ? 'Y' : 'N'); } static __meminit int shuffle_store(const char *val, @@ -47,9 +25,7 @@ static __meminit int shuffle_store(const if (rc < 0) return rc; if (shuffle_param) - page_alloc_shuffle(SHUFFLE_ENABLE); - else - page_alloc_shuffle(SHUFFLE_FORCE_DISABLE); + static_branch_enable(&page_alloc_shuffle_key); return 0; } module_param_call(shuffle, shuffle_store, shuffle_show, &shuffle_param, 0400); --- a/mm/shuffle.h~mm-shuffle-remove-dynamic-reconfiguration +++ a/mm/shuffle.h @@ -4,23 +4,10 @@ #define _MM_SHUFFLE_H #include <linux/jump_label.h> -/* - * SHUFFLE_ENABLE is called from the command line enabling path, or by - * platform-firmware enabling that indicates the presence of a - * direct-mapped memory-side-cache. SHUFFLE_FORCE_DISABLE is called from - * the command line path and overrides any previous or future - * SHUFFLE_ENABLE. 
- */ -enum mm_shuffle_ctl { - SHUFFLE_ENABLE, - SHUFFLE_FORCE_DISABLE, -}; - #define SHUFFLE_ORDER (MAX_ORDER-1) #ifdef CONFIG_SHUFFLE_PAGE_ALLOCATOR DECLARE_STATIC_KEY_FALSE(page_alloc_shuffle_key); -extern void page_alloc_shuffle(enum mm_shuffle_ctl ctl); extern void __shuffle_free_memory(pg_data_t *pgdat); extern bool shuffle_pick_tail(void); static inline void shuffle_free_memory(pg_data_t *pgdat) @@ -58,10 +45,6 @@ static inline void shuffle_zone(struct z { } -static inline void page_alloc_shuffle(enum mm_shuffle_ctl ctl) -{ -} - static inline bool is_shuffle_order(int order) { return false; _
From: Wei Yang <richard.weiyang@linux.alibaba.com> Subject: mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits We already have the definition of PB_migratetype_bits and current NR_MIGRATETYPE_BITS looks like a cyclic definition. Just use PB_migratetype_bits is enough. Link: http://lkml.kernel.org/r/20200623124201.8199-1-richard.weiyang@linux.alibaba.com Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/mmzone.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/include/linux/mmzone.h~mm-page_allocc-replace-the-definition-of-nr_migratetype_bits-with-pb_migratetype_bits +++ a/include/linux/mmzone.h @@ -88,8 +88,7 @@ static inline bool is_migrate_movable(in extern int page_group_by_mobility_disabled; -#define NR_MIGRATETYPE_BITS (PB_migrate_end - PB_migrate + 1) -#define MIGRATETYPE_MASK ((1UL << NR_MIGRATETYPE_BITS) - 1) +#define MIGRATETYPE_MASK ((1UL << PB_migratetype_bits) - 1) #define get_pageblock_migratetype(page) \ get_pfnblock_flags_mask(page, page_to_pfn(page), \ _
From: Wei Yang <richard.weiyang@linux.alibaba.com> Subject: mm/page_alloc.c: extract the common part in pfn_to_bitidx() The return value calculation is the same both for SPARSEMEM or not. Just take it out. Link: http://lkml.kernel.org/r/20200623124201.8199-2-richard.weiyang@linux.alibaba.com Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/page_alloc.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/mm/page_alloc.c~mm-page_allocc-extract-the-common-part-in-pfn_to_bitidx +++ a/mm/page_alloc.c @@ -459,11 +459,10 @@ static inline int pfn_to_bitidx(struct p { #ifdef CONFIG_SPARSEMEM pfn &= (PAGES_PER_SECTION-1); - return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS; #else pfn = pfn - round_down(page_zone(page)->zone_start_pfn, pageblock_nr_pages); - return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS; #endif /* CONFIG_SPARSEMEM */ + return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS; } /** _
From: Wei Yang <richard.weiyang@linux.alibaba.com> Subject: mm/page_alloc.c: simplify pageblock bitmap access Due to commit e58469bafd05 ("mm: page_alloc: use word-based accesses for get/set pageblock bitmaps"), pageblock bitmap is accessed with word-based access. This operation could be simplified a little. Intuitively, if we want to get a bit range [start_idx, end_idx] in a word, we can do like this: mask = (1 << (end_bitidx - start_bitidx + 1)) - 1; ret = (word >> start_idx) & mask; And also if we want to set a bit range [start_idx, end_idx] with flags, we can do the same by just shift start_bitidx. By doing so we reduce some instructions for these two helper functions: Before Patched set_pfnblock_flags_mask 209 198(-5%) get_pfnblock_flags_mask 101 87(-13%) Since the syntax is changed a little, we need to check the whole 4-bit migrate_type instead of part of it. Link: http://lkml.kernel.org/r/20200623124201.8199-3-richard.weiyang@linux.alibaba.com Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/pageblock-flags.h | 22 +++++++--------------- mm/page_alloc.c | 13 ++++++------- 2 files changed, 13 insertions(+), 22 deletions(-) --- a/include/linux/pageblock-flags.h~mm-page_allocc-simplify-pageblock-bitmap-access +++ a/include/linux/pageblock-flags.h @@ -66,25 +66,17 @@ void set_pfnblock_flags_mask(struct page unsigned long mask); /* Declarations for getting and setting flags. 
See mm/page_alloc.c */ -#define get_pageblock_flags_group(page, start_bitidx, end_bitidx) \ - get_pfnblock_flags_mask(page, page_to_pfn(page), \ - end_bitidx, \ - (1 << (end_bitidx - start_bitidx + 1)) - 1) -#define set_pageblock_flags_group(page, flags, start_bitidx, end_bitidx) \ - set_pfnblock_flags_mask(page, flags, page_to_pfn(page), \ - end_bitidx, \ - (1 << (end_bitidx - start_bitidx + 1)) - 1) - #ifdef CONFIG_COMPACTION #define get_pageblock_skip(page) \ - get_pageblock_flags_group(page, PB_migrate_skip, \ - PB_migrate_skip) + get_pfnblock_flags_mask(page, page_to_pfn(page), \ + PB_migrate_skip, (1 << (PB_migrate_skip))) #define clear_pageblock_skip(page) \ - set_pageblock_flags_group(page, 0, PB_migrate_skip, \ - PB_migrate_skip) + set_pfnblock_flags_mask(page, 0, page_to_pfn(page), \ + PB_migrate_skip, (1 << PB_migrate_skip)) #define set_pageblock_skip(page) \ - set_pageblock_flags_group(page, 1, PB_migrate_skip, \ - PB_migrate_skip) + set_pfnblock_flags_mask(page, (1 << PB_migrate_skip), \ + page_to_pfn(page), \ + PB_migrate_skip, (1 << PB_migrate_skip)) #else static inline bool get_pageblock_skip(struct page *page) { --- a/mm/page_alloc.c~mm-page_allocc-simplify-pageblock-bitmap-access +++ a/mm/page_alloc.c @@ -489,8 +489,7 @@ static __always_inline unsigned long __g bitidx &= (BITS_PER_LONG-1); word = bitmap[word_bitidx]; - bitidx += end_bitidx; - return (word >> (BITS_PER_LONG - bitidx - 1)) & mask; + return (word >> bitidx) & mask; } unsigned long get_pfnblock_flags_mask(struct page *page, unsigned long pfn, @@ -532,9 +531,8 @@ void set_pfnblock_flags_mask(struct page VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page); - bitidx += end_bitidx; - mask <<= (BITS_PER_LONG - bitidx - 1); - flags <<= (BITS_PER_LONG - bitidx - 1); + mask <<= bitidx; + flags <<= bitidx; word = READ_ONCE(bitmap[word_bitidx]); for (;;) { @@ -551,8 +549,9 @@ void set_pageblock_migratetype(struct pa migratetype < MIGRATE_PCPTYPES)) migratetype = MIGRATE_UNMOVABLE; - 
set_pageblock_flags_group(page, (unsigned long)migratetype, - PB_migrate, PB_migrate_end); + set_pfnblock_flags_mask(page, (unsigned long)migratetype, + page_to_pfn(page), PB_migrate_end, + MIGRATETYPE_MASK); } #ifdef CONFIG_DEBUG_VM _
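The bit-range arithmetic described in the changelog above can be sketched in user space roughly as follows. This is only an illustration under stated assumptions, not the kernel code: range_mask(), get_bit_range() and set_bit_range() are hypothetical helper names, and the update here is a plain read-modify-write, whereas set_pfnblock_flags_mask() updates the bitmap word in a retry loop.

```c
#include <assert.h>
#include <limits.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Right-aligned mask as wide as the inclusive range [start_idx, end_idx]. */
static unsigned long range_mask(unsigned int start_idx, unsigned int end_idx)
{
	unsigned int width = end_idx - start_idx + 1;

	assert(start_idx <= end_idx && end_idx < WORD_BITS);
	/* Shifting by the full word width is undefined, so special-case it. */
	return width == WORD_BITS ? ~0UL : (1UL << width) - 1;
}

/* Extract bits [start_idx, end_idx] of word, shifted down to bit 0. */
static unsigned long get_bit_range(unsigned long word,
				   unsigned int start_idx,
				   unsigned int end_idx)
{
	return (word >> start_idx) & range_mask(start_idx, end_idx);
}

/*
 * Return word with bits [start_idx, end_idx] replaced by flags.
 * Both the mask and the flags are simply shifted up by start_idx.
 */
static unsigned long set_bit_range(unsigned long word, unsigned long flags,
				   unsigned int start_idx,
				   unsigned int end_idx)
{
	unsigned long mask = range_mask(start_idx, end_idx) << start_idx;

	return (word & ~mask) | ((flags << start_idx) & mask);
}
```

For instance, get_bit_range(0xF0, 4, 7) yields 0xF, and set_bit_range(0xFF, 0x2, 0, 2) yields 0xFA: only the three low bits change.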
From: Wei Yang <richard.weiyang@linux.alibaba.com> Subject: mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask() After previous cleanup, the end_bitidx is not necessary any more. Link: http://lkml.kernel.org/r/20200623124201.8199-4-richard.weiyang@linux.alibaba.com Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/mmzone.h | 3 +-- include/linux/pageblock-flags.h | 8 +++----- mm/page_alloc.c | 15 +++++---------- 3 files changed, 9 insertions(+), 17 deletions(-) --- a/include/linux/mmzone.h~mm-page_allocc-remove-unnecessary-end_bitidx-for-_pfnblock_flags_mask +++ a/include/linux/mmzone.h @@ -91,8 +91,7 @@ extern int page_group_by_mobility_disabl #define MIGRATETYPE_MASK ((1UL << PB_migratetype_bits) - 1) #define get_pageblock_migratetype(page) \ - get_pfnblock_flags_mask(page, page_to_pfn(page), \ - PB_migrate_end, MIGRATETYPE_MASK) + get_pfnblock_flags_mask(page, page_to_pfn(page), MIGRATETYPE_MASK) struct free_area { struct list_head free_list[MIGRATE_TYPES]; --- a/include/linux/pageblock-flags.h~mm-page_allocc-remove-unnecessary-end_bitidx-for-_pfnblock_flags_mask +++ a/include/linux/pageblock-flags.h @@ -56,27 +56,25 @@ struct page; unsigned long get_pfnblock_flags_mask(struct page *page, unsigned long pfn, - unsigned long end_bitidx, unsigned long mask); void set_pfnblock_flags_mask(struct page *page, unsigned long flags, unsigned long pfn, - unsigned long end_bitidx, unsigned long mask); /* Declarations for getting and setting flags. 
See mm/page_alloc.c */ #ifdef CONFIG_COMPACTION #define get_pageblock_skip(page) \ get_pfnblock_flags_mask(page, page_to_pfn(page), \ - PB_migrate_skip, (1 << (PB_migrate_skip))) + (1 << (PB_migrate_skip))) #define clear_pageblock_skip(page) \ set_pfnblock_flags_mask(page, 0, page_to_pfn(page), \ - PB_migrate_skip, (1 << PB_migrate_skip)) + (1 << PB_migrate_skip)) #define set_pageblock_skip(page) \ set_pfnblock_flags_mask(page, (1 << PB_migrate_skip), \ page_to_pfn(page), \ - PB_migrate_skip, (1 << PB_migrate_skip)) + (1 << PB_migrate_skip)) #else static inline bool get_pageblock_skip(struct page *page) { --- a/mm/page_alloc.c~mm-page_allocc-remove-unnecessary-end_bitidx-for-_pfnblock_flags_mask +++ a/mm/page_alloc.c @@ -469,14 +469,13 @@ static inline int pfn_to_bitidx(struct p * get_pfnblock_flags_mask - Return the requested group of flags for the pageblock_nr_pages block of pages * @page: The page within the block of interest * @pfn: The target page frame number - * @end_bitidx: The last bit of interest to retrieve * @mask: mask of bits that the caller is interested in * * Return: pageblock_bits flags */ -static __always_inline unsigned long __get_pfnblock_flags_mask(struct page *page, +static __always_inline +unsigned long __get_pfnblock_flags_mask(struct page *page, unsigned long pfn, - unsigned long end_bitidx, unsigned long mask) { unsigned long *bitmap; @@ -493,15 +492,14 @@ static __always_inline unsigned long __g } unsigned long get_pfnblock_flags_mask(struct page *page, unsigned long pfn, - unsigned long end_bitidx, unsigned long mask) { - return __get_pfnblock_flags_mask(page, pfn, end_bitidx, mask); + return __get_pfnblock_flags_mask(page, pfn, mask); } static __always_inline int get_pfnblock_migratetype(struct page *page, unsigned long pfn) { - return __get_pfnblock_flags_mask(page, pfn, PB_migrate_end, MIGRATETYPE_MASK); + return __get_pfnblock_flags_mask(page, pfn, MIGRATETYPE_MASK); } /** @@ -509,12 +507,10 @@ static __always_inline int 
get_pfnblock_ * @page: The page within the block of interest * @flags: The flags to set * @pfn: The target page frame number - * @end_bitidx: The last bit of interest * @mask: mask of bits that the caller is interested in */ void set_pfnblock_flags_mask(struct page *page, unsigned long flags, unsigned long pfn, - unsigned long end_bitidx, unsigned long mask) { unsigned long *bitmap; @@ -550,8 +546,7 @@ void set_pageblock_migratetype(struct pa migratetype = MIGRATE_UNMOVABLE; set_pfnblock_flags_mask(page, (unsigned long)migratetype, - page_to_pfn(page), PB_migrate_end, - MIGRATETYPE_MASK); + page_to_pfn(page), MIGRATETYPE_MASK); } #ifdef CONFIG_DEBUG_VM _
From: Qian Cai <cai@lca.pw> Subject: mm/page_alloc: silence a KASAN false positive kernel_init_free_pages() will use memset() on s390 to clear all pages from kmalloc_order() which will override KASAN redzones because a redzone was setup from the end of the allocation size to the end of the last page. Silence it by not reporting it there. An example of the report is, BUG: KASAN: slab-out-of-bounds in __free_pages_ok Write of size 4096 at addr 000000014beaa000 Call Trace: show_stack+0x152/0x210 dump_stack+0x1f8/0x248 print_address_description.isra.13+0x5e/0x4d0 kasan_report+0x130/0x178 check_memory_region+0x190/0x218 memset+0x34/0x60 __free_pages_ok+0x894/0x12f0 kfree+0x4f2/0x5e0 unpack_to_rootfs+0x60e/0x650 populate_rootfs+0x56/0x358 do_one_initcall+0x1f4/0xa20 kernel_init_freeable+0x758/0x7e8 kernel_init+0x1c/0x170 ret_from_fork+0x24/0x28 Memory state around the buggy address: 000000014bea9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000014bea9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >000000014beaa000: 03 fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe ^ 000000014beaa080: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe 000000014beaa100: fe fe fe fe fe fe fe fe fe fe fe fe fe fe Link: http://lkml.kernel.org/r/20200610052154.5180-1-cai@lca.pw Fixes: 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Tested-by: Vasily Gorbik <gor@linux.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/page_alloc.c | 3 +++ 1 file changed, 3 insertions(+) --- a/mm/page_alloc.c~mm-page_alloc-silence-a-kasan-false-positive +++ a/mm/page_alloc.c @@ -1156,8 +1156,11 @@ static void kernel_init_free_pages(struc { int 
i; + /* s390's use of memset() could override KASAN redzones. */ + kasan_disable_current(); for (i = 0; i < numpages; i++) clear_highpage(page + i); + kasan_enable_current(); } static __always_inline bool free_pages_prepare(struct page *page, _
From: Wei Yang <richard.weiyang@linux.alibaba.com> Subject: mm/page_alloc: fallbacks at most has 3 elements MIGRATE_TYPES is used as the end marker, and each one-dimensional array holds at most 3 elements. Reduce the second dimension to 3 to save a little memory. Link: http://lkml.kernel.org/r/20200625231022.18784-1-richard.weiyang@linux.alibaba.com Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/page_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/page_alloc.c~mm-page_alloc-fallbacks-at-most-has-3-elements +++ a/mm/page_alloc.c @@ -2268,7 +2268,7 @@ struct page *__rmqueue_smallest(struct z * This array describes the order lists are fallen back to when * the free lists for the desirable migrate type are depleted */ -static int fallbacks[MIGRATE_TYPES][4] = { +static int fallbacks[MIGRATE_TYPES][3] = { [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_TYPES }, [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES }, [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_TYPES }, _
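Why three slots suffice can be seen in a small user-space sketch. Each row holds two real fallback types plus the end marker, and a walk stops at the marker. The DEMO_ names and demo_count_fallbacks() are hypothetical stand-ins that mirror only the three rows shown in the diff.

```c
#include <assert.h>

/* Hypothetical stand-ins for the migrate types used in the diff above. */
enum demo_migratetype {
	DEMO_MIGRATE_UNMOVABLE,
	DEMO_MIGRATE_MOVABLE,
	DEMO_MIGRATE_RECLAIMABLE,
	DEMO_MIGRATE_TYPES,		/* doubles as the end-of-row marker */
};

/* Two fallbacks plus the marker: three elements per row are enough. */
static const int demo_fallbacks[DEMO_MIGRATE_TYPES][3] = {
	[DEMO_MIGRATE_UNMOVABLE]   = { DEMO_MIGRATE_RECLAIMABLE, DEMO_MIGRATE_MOVABLE,   DEMO_MIGRATE_TYPES },
	[DEMO_MIGRATE_MOVABLE]     = { DEMO_MIGRATE_RECLAIMABLE, DEMO_MIGRATE_UNMOVABLE, DEMO_MIGRATE_TYPES },
	[DEMO_MIGRATE_RECLAIMABLE] = { DEMO_MIGRATE_UNMOVABLE,   DEMO_MIGRATE_MOVABLE,   DEMO_MIGRATE_TYPES },
};

/* Count the usable fallbacks of a type by walking up to the marker. */
static int demo_count_fallbacks(int migratetype)
{
	int i;

	assert(migratetype >= 0 && migratetype < DEMO_MIGRATE_TYPES);
	for (i = 0; demo_fallbacks[migratetype][i] != DEMO_MIGRATE_TYPES; i++)
		;
	return i;
}
```

Every row in this sketch reports two fallbacks, so the walk never reads past index 2.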
From: Muchun Song <songmuchun@bytedance.com> Subject: mm/page_alloc.c: skip setting nodemask when we are in interrupt In interrupt context, the allocation is unrelated to the current task. If we use the current task's mems_allowed, the fast path may fail to allocate pages and fall back to the slow path when the current node (that is, the one in the current task's mems_allowed) does not have enough memory. This slows down memory allocation in interrupt context. So skip setting the nodemask, allowing any node to satisfy the allocation, so that the fast path allocation can succeed. Link: http://lkml.kernel.org/r/20200706025921.53683-1-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/page_alloc.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) --- a/mm/page_alloc.c~mm-page_alloc-skip-setting-nodemask-when-we-are-in-interrupt +++ a/mm/page_alloc.c @@ -4788,7 +4788,11 @@ static inline bool prepare_alloc_pages(g if (cpusets_enabled()) { *alloc_mask |= __GFP_HARDWALL; - if (!ac->nodemask) + /* + * When we are in the interrupt context, it is irrelevant + * to the current task context. It means that any node ok. + */ + if (!in_interrupt() && !ac->nodemask) ac->nodemask = &cpuset_current_mems_allowed; else *alloc_flags |= ALLOC_CPUSET; _
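The effect of the new condition can be modelled in user space as below. This is a simplified sketch, not the kernel function: struct demo_alloc_context, demo_prepare(), the DEMO_ALLOC_CPUSET value and demo_mems_allowed are all made up for illustration, the two predicates are passed in as booleans, and the __GFP_HARDWALL handling is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ALLOC_CPUSET 0x40u		/* arbitrary value for the sketch */

struct demo_alloc_context {
	const unsigned long *nodemask;	/* NULL means "any node" */
};

/* Pretend the current task may only use node 0. */
static const unsigned long demo_mems_allowed = 0x1UL;

/*
 * Mirror the patched branch: only a task-context allocation without an
 * explicit nodemask inherits the task's mems_allowed. Every other case,
 * including interrupt context, takes the else branch instead.
 */
static unsigned int demo_prepare(struct demo_alloc_context *ac,
				 bool cpusets_enabled, bool in_irq,
				 unsigned int alloc_flags)
{
	if (cpusets_enabled) {
		if (!in_irq && !ac->nodemask)
			ac->nodemask = &demo_mems_allowed;
		else
			alloc_flags |= DEMO_ALLOC_CPUSET;
	}
	return alloc_flags;
}
```

In this model, an interrupt-context call leaves ac->nodemask as NULL, so the fast path is not confined to the interrupted task's nodes.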
From: Joonsoo Kim <iamjoonsoo.kim@lge.com> Subject: mm/page_alloc: fix memalloc_nocma_{save/restore} APIs Currently, the memalloc_nocma_{save/restore} API, which prevents allocating pages from the CMA area, is implemented using current_gfp_context(). However, there are two problems with this implementation. First, it doesn't work for the allocation fastpath. In the fastpath, the original gfp_mask is used, since current_gfp_context() was introduced to control reclaim and is only applied on the slowpath. So, the CMA area can still be allocated from through the allocation fastpath even if the memalloc_nocma_{save/restore} APIs are used. Currently, there is just one user of these APIs, and it has a fallback method to prevent an actual problem. Second, clearing __GFP_MOVABLE in current_gfp_context() has the side effect of excluding memory in ZONE_MOVABLE as an allocation target. To fix these problems, this patch changes the implementation that excludes the CMA area from page allocation. The main point of this change is to use alloc_flags. alloc_flags is mainly used to control allocation, so it is a good fit for excluding the CMA area from allocation. Link: http://lkml.kernel.org/r/1595468942-29687-1-git-send-email-iamjoonsoo.kim@lge.com Fixes: d7fefcc8de91 (mm/cma: add PF flag to force non cma alloc) Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Cc: Roman Gushchin <guro@fb.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Aneesh Kumar K . 
V" <aneesh.kumar@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/sched/mm.h | 8 +------- mm/page_alloc.c | 31 +++++++++++++++++++++---------- 2 files changed, 22 insertions(+), 17 deletions(-) --- a/include/linux/sched/mm.h~mm-page_alloc-fix-memalloc_nocma_save-restore-apis +++ a/include/linux/sched/mm.h @@ -175,12 +175,10 @@ static inline bool in_vfork(struct task_ * Applies per-task gfp context to the given allocation flags. * PF_MEMALLOC_NOIO implies GFP_NOIO * PF_MEMALLOC_NOFS implies GFP_NOFS - * PF_MEMALLOC_NOCMA implies no allocation from CMA region. */ static inline gfp_t current_gfp_context(gfp_t flags) { - if (unlikely(current->flags & - (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_NOCMA))) { + if (unlikely(current->flags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS))) { /* * NOIO implies both NOIO and NOFS and it is a weaker context * so always make sure it makes precedence @@ -189,10 +187,6 @@ static inline gfp_t current_gfp_context( flags &= ~(__GFP_IO | __GFP_FS); else if (current->flags & PF_MEMALLOC_NOFS) flags &= ~__GFP_FS; -#ifdef CONFIG_CMA - if (current->flags & PF_MEMALLOC_NOCMA) - flags &= ~__GFP_MOVABLE; -#endif } return flags; } --- a/mm/page_alloc.c~mm-page_alloc-fix-memalloc_nocma_save-restore-apis +++ a/mm/page_alloc.c @@ -2785,7 +2785,7 @@ __rmqueue(struct zone *zone, unsigned in * allocating from CMA when over half of the zone's free memory * is in the CMA area. 
*/ - if (migratetype == MIGRATE_MOVABLE && + if (alloc_flags & ALLOC_CMA && zone_page_state(zone, NR_FREE_CMA_PAGES) > zone_page_state(zone, NR_FREE_PAGES) / 2) { page = __rmqueue_cma_fallback(zone, order); @@ -2796,7 +2796,7 @@ __rmqueue(struct zone *zone, unsigned in retry: page = __rmqueue_smallest(zone, order, migratetype); if (unlikely(!page)) { - if (migratetype == MIGRATE_MOVABLE) + if (alloc_flags & ALLOC_CMA) page = __rmqueue_cma_fallback(zone, order); if (!page && __rmqueue_fallback(zone, order, migratetype, @@ -3687,6 +3687,20 @@ alloc_flags_nofragment(struct zone *zone return alloc_flags; } +static inline unsigned int current_alloc_flags(gfp_t gfp_mask, + unsigned int alloc_flags) +{ +#ifdef CONFIG_CMA + unsigned int pflags = current->flags; + + if (!(pflags & PF_MEMALLOC_NOCMA) && + gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE) + alloc_flags |= ALLOC_CMA; + +#endif + return alloc_flags; +} + /* * get_page_from_freelist goes through the zonelist trying to allocate * a page. @@ -4333,10 +4347,8 @@ gfp_to_alloc_flags(gfp_t gfp_mask) } else if (unlikely(rt_task(current)) && !in_interrupt()) alloc_flags |= ALLOC_HARDER; -#ifdef CONFIG_CMA - if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE) - alloc_flags |= ALLOC_CMA; -#endif + alloc_flags = current_alloc_flags(gfp_mask, alloc_flags); + return alloc_flags; } @@ -4637,7 +4649,7 @@ retry: reserve_flags = __gfp_pfmemalloc_flags(gfp_mask); if (reserve_flags) - alloc_flags = reserve_flags; + alloc_flags = current_alloc_flags(gfp_mask, reserve_flags); /* * Reset the nodemask and zonelist iterators if memory policies can be @@ -4714,7 +4726,7 @@ retry: /* Avoid allocations with no watermarks from looping endlessly */ if (tsk_is_oom_victim(current) && - (alloc_flags == ALLOC_OOM || + (alloc_flags & ALLOC_OOM || (gfp_mask & __GFP_NOMEMALLOC))) goto nopage; @@ -4806,8 +4818,7 @@ static inline bool prepare_alloc_pages(g if (should_fail_alloc_page(gfp_mask, order)) return false; - if (IS_ENABLED(CONFIG_CMA) && 
ac->migratetype == MIGRATE_MOVABLE) - *alloc_flags |= ALLOC_CMA; + *alloc_flags = current_alloc_flags(gfp_mask, *alloc_flags); return true; } _
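The decision made by the new current_alloc_flags() helper can be modelled in user space as follows. This is a sketch under assumptions: the DEMO_ flag values are arbitrary, demo_current_alloc_flags() is a hypothetical name, the task flags are passed in rather than read from current, and a boolean stands in for the gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE test.

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary values, chosen only for this sketch. */
#define DEMO_PF_MEMALLOC_NOCMA	0x1u
#define DEMO_ALLOC_HARDER	0x10u
#define DEMO_ALLOC_CMA		0x80u

/*
 * CMA is allowed only for movable allocations made by a task that has
 * not entered a memalloc_nocma scope. All other alloc_flags bits pass
 * through untouched, so the same helper can serve the fastpath and the
 * slowpath alike.
 */
static unsigned int demo_current_alloc_flags(unsigned int task_flags,
					     bool movable,
					     unsigned int alloc_flags)
{
	if (!(task_flags & DEMO_PF_MEMALLOC_NOCMA) && movable)
		alloc_flags |= DEMO_ALLOC_CMA;
	return alloc_flags;
}
```

Because the request's movability is never altered in this model, a NOCMA scope only withholds the CMA bit. This corresponds to the second problem in the changelog: the allocation can still target ZONE_MOVABLE.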
From: "Alexander A. Klimov" <grandmaster@al2klimov.de> Subject: mm: thp: replace HTTP links with HTTPS ones Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `xmlns`: For each link, `http://[^# ]*(?:\w|/)`: If neither `gnu\.org/license`, nor `mozilla\.org/MPL`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. [akpm@linux-foundation.org: fix amd.com URL, per Vlastimil] Link: http://lkml.kernel.org/r/20200713164345.36088-1-grandmaster@al2klimov.de Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/huge_memory.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/mm/huge_memory.c~mm-thp-replace-http-links-with-https-ones +++ a/mm/huge_memory.c @@ -2063,8 +2063,8 @@ static void __split_huge_pmd_locked(stru * free), userland could trigger a small page size TLB miss on the * small sized TLB while the hugepage TLB entry is still established in * the huge TLB. Some CPU doesn't like that. - * See http://support.amd.com/us/Processor_TechDocs/41322.pdf, Erratum - * 383 on page 93. Intel should be safe but is also warns that it's + * See http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum + * 383 on page 105. Intel should be safe but is also warns that it's * only safe if the permission and cache attributes of the two entries * loaded in the two TLB is identical (which should be the case here). * But it is generally safer to never allow small and huge TLB entries _
From: Peter Xu <peterx@redhat.com> Subject: mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible This is found by code observation only. Firstly, the worst case scenario should assume the whole range was covered by pmd sharing. The old algorithm might not work as expected for ranges like (1g-2m, 1g+2m), where the adjusted range is (0, 1g+2m) but the expected range is (0, 2g). While at it, remove the loop since it should not be required. With that, the new code should be faster too when the invalidating range is huge. Mike said: : With range (1g-2m, 1g+2m) within a vma (0, 2g) the existing code will only : adjust to (0, 1g+2m) which is incorrect. : : We should cc stable. The original reason for adjusting the range was to : prevent data corruption (getting wrong page). Since the range is not : always adjusted correctly, the potential for corruption still exists. : : However, I am fairly confident that adjust_range_if_pmd_sharing_possible : is only going to be called in two cases: : : 1) for a single page : 2) for range == entire vma : : In those cases, the current code should produce the correct results. : : To be safe, let's just cc stable. 
Link: http://lkml.kernel.org/r/20200730201636.74778-1-peterx@redhat.com Fixes: 017b1660df89 ("mm: migration: fix migration of huge PMD shared pages") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/hugetlb.c | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-) --- a/mm/hugetlb.c~mm-hugetlb-fix-calculation-of-adjust_range_if_pmd_sharing_possible +++ a/mm/hugetlb.c @@ -5314,25 +5314,21 @@ static bool vma_shareable(struct vm_area void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, unsigned long *start, unsigned long *end) { - unsigned long check_addr; + unsigned long a_start, a_end; if (!(vma->vm_flags & VM_MAYSHARE)) return; - for (check_addr = *start; check_addr < *end; check_addr += PUD_SIZE) { - unsigned long a_start = check_addr & PUD_MASK; - unsigned long a_end = a_start + PUD_SIZE; + /* Extend the range to be PUD aligned for a worst case scenario */ + a_start = ALIGN_DOWN(*start, PUD_SIZE); + a_end = ALIGN(*end, PUD_SIZE); - /* - * If sharing is possible, adjust start/end if necessary. - */ - if (range_in_vma(vma, a_start, a_end)) { - if (a_start < *start) - *start = a_start; - if (a_end > *end) - *end = a_end; - } - } + /* + * Intersect the range with the vma range, since pmd sharing won't be + * across vma after all + */ + *start = max(vma->vm_start, a_start); + *end = min(vma->vm_end, a_end); } /* _
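The (1g-2m, 1g+2m) example from the changelog can be reproduced with a user-space model of both versions. This is a sketch under assumptions: PUD_SIZE is taken to be 1 GiB, the VM_MAYSHARE check is omitted, and every demo_ name is hypothetical.

```c
#include <assert.h>

typedef unsigned long long demo_addr_t;

#define DEMO_MB		(1ULL << 20)
#define DEMO_GB		(1ULL << 30)
#define DEMO_PUD_SIZE	DEMO_GB		/* assumption: 1 GiB PUD */
#define DEMO_PUD_MASK	(~(DEMO_PUD_SIZE - 1))

struct demo_range {
	demo_addr_t start, end;
};

/* Old behaviour: probe PUD-sized windows starting from the range start. */
static struct demo_range demo_adjust_old(demo_addr_t vm_start,
					 demo_addr_t vm_end,
					 demo_addr_t start, demo_addr_t end)
{
	struct demo_range r = { start, end };
	demo_addr_t check_addr;

	for (check_addr = start; check_addr < r.end;
	     check_addr += DEMO_PUD_SIZE) {
		demo_addr_t a_start = check_addr & DEMO_PUD_MASK;
		demo_addr_t a_end = a_start + DEMO_PUD_SIZE;

		/* Extend only if the whole window lies inside the vma. */
		if (vm_start <= a_start && a_end <= vm_end) {
			if (a_start < r.start)
				r.start = a_start;
			if (a_end > r.end)
				r.end = a_end;
		}
	}
	return r;
}

/* New behaviour: PUD-align both ends, then intersect with the vma. */
static struct demo_range demo_adjust_new(demo_addr_t vm_start,
					 demo_addr_t vm_end,
					 demo_addr_t start, demo_addr_t end)
{
	demo_addr_t a_start = start & DEMO_PUD_MASK;
	demo_addr_t a_end = (end + DEMO_PUD_SIZE - 1) & DEMO_PUD_MASK;
	struct demo_range r;

	r.start = vm_start > a_start ? vm_start : a_start;
	r.end = vm_end < a_end ? vm_end : a_end;
	return r;
}
```

With a vma of (0, 2g), the old model stops after probing the first window and returns (0, 1g+2m), because the next probe address, 2g-2m, is already past the end of the range. The new model returns (0, 2g).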
From: Hugh Dickins <hughd@google.com> Subject: khugepaged: collapse_pte_mapped_thp() flush the right range pmdp_collapse_flush() should be given the start address at which the huge page is mapped, haddr: it was given addr, which at that point has been used as a local variable, incremented to the end address of the extent. Found by source inspection while chasing a hugepage locking bug, which I then could not explain by this. At first I thought this was very bad; then saw that all of the page translations that were not flushed would actually still point to the right pages afterwards, so harmless; then realized that I know nothing of how different architectures and models cache intermediate paging structures, so maybe it matters after all - particularly since the page table concerned is immediately freed. Much easier to fix than to think about. Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021204390.27773@eggly.anvils Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Song Liu <songliubraving@fb.com> Cc: <stable@vger.kernel.org> [5.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/khugepaged.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/khugepaged.c~khugepaged-collapse_pte_mapped_thp-flush-the-right-range +++ a/mm/khugepaged.c @@ -1502,7 +1502,7 @@ void collapse_pte_mapped_thp(struct mm_s /* step 4: collapse pmd */ ptl = pmd_lock(vma->vm_mm, pmd); - _pmd = pmdp_collapse_flush(vma, addr, pmd); + _pmd = pmdp_collapse_flush(vma, haddr, pmd); spin_unlock(ptl); mm_dec_nr_ptes(mm); pte_free(mm, pmd_pgtable(_pmd)); _
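How addr ends up at the end of the extent can be shown with a small user-space sketch. It assumes 4 KiB base pages and a 2 MiB PMD-sized huge page; the DEMO_ constants and demo_addr_after_walk() are hypothetical, and the empty loop body stands in for the per-pte work.

```c
#include <assert.h>

/* Assumptions for the sketch: 4 KiB pages, 512 ptes per huge pmd. */
#define DEMO_PAGE_SIZE		4096UL
#define DEMO_HPAGE_PMD_NR	512UL
#define DEMO_HPAGE_PMD_SIZE	(DEMO_PAGE_SIZE * DEMO_HPAGE_PMD_NR)
#define DEMO_HPAGE_PMD_MASK	(~(DEMO_HPAGE_PMD_SIZE - 1))

/*
 * Reuse addr as the loop cursor over every pte of the huge page.
 * Returns the value left in addr after the walk and stores the huge
 * page start address in *haddr_out.
 */
static unsigned long demo_addr_after_walk(unsigned long addr,
					  unsigned long *haddr_out)
{
	unsigned long haddr = addr & DEMO_HPAGE_PMD_MASK;
	unsigned long i;

	for (i = 0, addr = haddr; i < DEMO_HPAGE_PMD_NR;
	     i++, addr += DEMO_PAGE_SIZE)
		;	/* per-pte work would go here */

	*haddr_out = haddr;
	return addr;
}
```

For an input of 0x201234, haddr is 0x200000 while addr finishes at 0x400000, one full huge page further on. Passing addr to the flush would therefore target the following extent rather than the one just collapsed.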
From: Hugh Dickins <hughd@google.com>
Subject: khugepaged: collapse_pte_mapped_thp() protect the pmd lock

When retract_page_tables() removes a page table to make way for a huge
pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
pmd lock; but when collapse_pte_mapped_thp() does the same (to handle the
case when the original mmap_write_trylock had failed), only
mmap_write_trylock and pmd lock are held.  That's not enough.

One machine has twice crashed under load, with "BUG: spinlock bad magic"
and GPF on 6b6b6b6b6b6b6b6b.  Examining the second crash,
page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl (serving
page_referenced() on a file THP, that had found a page table at *pmd)
discovers that the page table page and its lock have already been freed by
the time it comes to unlock.

Follow the example of retract_page_tables(), but we only need one of huge
page lock or i_mmap_lock_write to secure against this: because it's the
narrower lock, and because it simplifies collapse_pte_mapped_thp() to know
the hpage earlier, choose to rely on huge page lock here.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021213070.27773@eggly.anvils
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org>	[5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/khugepaged.c | 44 +++++++++++++++++++-------------------------
 1 file changed, 19 insertions(+), 25 deletions(-)

--- a/mm/khugepaged.c~khugepaged-collapse_pte_mapped_thp-protect-the-pmd-lock
+++ a/mm/khugepaged.c
@@ -1412,7 +1412,7 @@ void collapse_pte_mapped_thp(struct mm_s
 {
 	unsigned long haddr = addr & HPAGE_PMD_MASK;
 	struct vm_area_struct *vma = find_vma(mm, haddr);
-	struct page *hpage = NULL;
+	struct page *hpage;
 	pte_t *start_pte, *pte;
 	pmd_t *pmd, _pmd;
 	spinlock_t *ptl;
@@ -1432,9 +1432,17 @@ void collapse_pte_mapped_thp(struct mm_s
 	if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
 		return;
 
+	hpage = find_lock_page(vma->vm_file->f_mapping,
+			       linear_page_index(vma, haddr));
+	if (!hpage)
+		return;
+
+	if (!PageHead(hpage))
+		goto drop_hpage;
+
 	pmd = mm_find_pmd(mm, haddr);
 	if (!pmd)
-		return;
+		goto drop_hpage;
 
 	start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
 
@@ -1453,30 +1461,11 @@ void collapse_pte_mapped_thp(struct mm_s
 
 		page = vm_normal_page(vma, addr, *pte);
 
-		if (!page || !PageCompound(page))
-			goto abort;
-
-		if (!hpage) {
-			hpage = compound_head(page);
-			/*
-			 * The mapping of the THP should not change.
-			 *
-			 * Note that uprobe, debugger, or MAP_PRIVATE may
-			 * change the page table, but the new page will
-			 * not pass PageCompound() check.
-			 */
-			if (WARN_ON(hpage->mapping != vma->vm_file->f_mapping))
-				goto abort;
-		}
-
 		/*
-		 * Confirm the page maps to the correct subpage.
-		 *
-		 * Note that uprobe, debugger, or MAP_PRIVATE may change
-		 * the page table, but the new page will not pass
-		 * PageCompound() check.
+		 * Note that uprobe, debugger, or MAP_PRIVATE may change the
+		 * page table, but the new page will not be a subpage of hpage.
 		 */
-		if (WARN_ON(hpage + i != page))
+		if (hpage + i != page)
 			goto abort;
 		count++;
 	}
@@ -1495,7 +1484,7 @@ void collapse_pte_mapped_thp(struct mm_s
 	pte_unmap_unlock(start_pte, ptl);
 
 	/* step 3: set proper refcount and mm_counters. */
-	if (hpage) {
+	if (count) {
 		page_ref_sub(hpage, count);
 		add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count);
 	}
@@ -1506,10 +1495,15 @@ void collapse_pte_mapped_thp(struct mm_s
 	spin_unlock(ptl);
 	mm_dec_nr_ptes(mm);
 	pte_free(mm, pmd_pgtable(_pmd));
+
+drop_hpage:
+	unlock_page(hpage);
+	put_page(hpage);
 	return;
 
 abort:
 	pte_unmap_unlock(start_pte, ptl);
+	goto drop_hpage;
 }
 
 static int khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
_
From: Hugh Dickins <hughd@google.com>
Subject: khugepaged: retract_page_tables() remember to test exit

Only once have I seen this scenario (and forgot even to notice what forced
the eventual crash): a sequence of "BUG: Bad page map" alerts from
vm_normal_page(), from zap_pte_range() servicing exit_mmap();
pmd:00000000, pte values corresponding to data in physical page 0.

The pte mappings being zapped in this case were supposed to be from a huge
page of ext4 text (but could as well have been shmem): my belief is that
it was racing with collapse_file()'s retract_page_tables(), found *pmd
pointing to a page table, locked it, but *pmd had become 0 by the time
start_pte was decided.

In most cases, that possibility is excluded by holding mmap lock; but
exit_mmap() proceeds without mmap lock.  Most of what's run by khugepaged
checks khugepaged_test_exit() after acquiring mmap lock:
khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate() do so,
for example.  But retract_page_tables() did not: fix that.

The fix is for retract_page_tables() to check khugepaged_test_exit(),
after acquiring mmap lock, before doing anything to the page table.
Getting the mmap lock serializes with __mmput(), which briefly takes and
drops it in __khugepaged_exit(); then the khugepaged_test_exit() check on
mm_users makes sure we don't touch the page table once exit_mmap() might
reach it, since exit_mmap() will be proceeding without mmap lock, not
expecting anyone to be racing with it.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021215400.27773@eggly.anvils
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/khugepaged.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

--- a/mm/khugepaged.c~khugepaged-retract_page_tables-remember-to-test-exit
+++ a/mm/khugepaged.c
@@ -1532,6 +1532,7 @@ out:
 static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 {
 	struct vm_area_struct *vma;
+	struct mm_struct *mm;
 	unsigned long addr;
 	pmd_t *pmd, _pmd;
 
@@ -1560,7 +1561,8 @@ static void retract_page_tables(struct a
 			continue;
 		if (vma->vm_end < addr + HPAGE_PMD_SIZE)
 			continue;
-		pmd = mm_find_pmd(vma->vm_mm, addr);
+		mm = vma->vm_mm;
+		pmd = mm_find_pmd(mm, addr);
 		if (!pmd)
 			continue;
 		/*
@@ -1570,17 +1572,19 @@ static void retract_page_tables(struct a
 		 * mmap_lock while holding page lock. Fault path does it in
 		 * reverse order. Trylock is a way to avoid deadlock.
 		 */
-		if (mmap_write_trylock(vma->vm_mm)) {
-			spinlock_t *ptl = pmd_lock(vma->vm_mm, pmd);
-			/* assume page table is clear */
-			_pmd = pmdp_collapse_flush(vma, addr, pmd);
-			spin_unlock(ptl);
-			mmap_write_unlock(vma->vm_mm);
-			mm_dec_nr_ptes(vma->vm_mm);
-			pte_free(vma->vm_mm, pmd_pgtable(_pmd));
+		if (mmap_write_trylock(mm)) {
+			if (!khugepaged_test_exit(mm)) {
+				spinlock_t *ptl = pmd_lock(mm, pmd);
+				/* assume page table is clear */
+				_pmd = pmdp_collapse_flush(vma, addr, pmd);
+				spin_unlock(ptl);
+				mm_dec_nr_ptes(mm);
+				pte_free(mm, pmd_pgtable(_pmd));
+			}
+			mmap_write_unlock(mm);
 		} else {
 			/* Try again later */
-			khugepaged_add_pte_mapped_thp(vma->vm_mm, addr);
+			khugepaged_add_pte_mapped_thp(mm, addr);
 		}
 	}
 	i_mmap_unlock_write(mapping);
_
From: Hugh Dickins <hughd@google.com>
Subject: khugepaged: khugepaged_test_exit() check mmget_still_valid()

Move collapse_huge_page()'s mmget_still_valid() check into
khugepaged_test_exit() itself.  collapse_huge_page() is used for anon THP
only, and earned its mmget_still_valid() check because it inserts a huge
pmd entry in place of the page table's pmd entry; whereas
collapse_file()'s retract_page_tables() or collapse_pte_mapped_thp()
merely clears the page table's pmd entry.  But core dumping without mmap
lock must have been as open to mistaking a racily cleared pmd entry for a
page table at physical page 0, as exit_mmap() was.  And we certainly have
no interest in mapping as a THP once dumping core.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021217020.27773@eggly.anvils
Fixes: 59ea6d06cfa9 ("coredump: fix race condition between collapse_huge_page() and core dumping")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/khugepaged.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/mm/khugepaged.c~khugepaged-khugepaged_test_exit-check-mmget_still_valid
+++ a/mm/khugepaged.c
@@ -431,7 +431,7 @@ static void insert_to_mm_slots_hash(stru
 
 static inline int khugepaged_test_exit(struct mm_struct *mm)
 {
-	return atomic_read(&mm->mm_users) == 0;
+	return atomic_read(&mm->mm_users) == 0 || !mmget_still_valid(mm);
 }
 
 static bool hugepage_vma_check(struct vm_area_struct *vma,
@@ -1100,9 +1100,6 @@ static void collapse_huge_page(struct mm
 	 * handled by the anon_vma lock + PG_lock.
 	 */
 	mmap_write_lock(mm);
-	result = SCAN_ANY_PROCESS;
-	if (!mmget_still_valid(mm))
-		goto out;
 	result = hugepage_vma_revalidate(mm, address, &vma);
 	if (result)
 		goto out;
_
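The effect of folding the second condition into khugepaged_test_exit() can be modelled in userspace roughly as below. This is only a sketch under stated assumptions, not kernel code: `struct mm_model`, its `still_valid` flag and `test_exit_model()` are hypothetical stand-ins for `struct mm_struct`, the result of mmget_still_valid(), and khugepaged_test_exit() respectively.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the state the check reads. */
struct mm_model {
	atomic_int mm_users;	/* models mm->mm_users */
	bool still_valid;	/* models the result of mmget_still_valid(mm) */
};

/*
 * Models the patched check: back off when the last user has gone,
 * or when the mm is no longer considered valid (e.g. dumping core).
 */
static bool test_exit_model(struct mm_model *mm)
{
	return atomic_load(&mm->mm_users) == 0 || !mm->still_valid;
}
```

With this shape every caller of the check picks up the extra condition at once, which is why the separate test in collapse_huge_page() can go away.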
From: dylan-meiners <spacct.spacct@gmail.com>
Subject: mm/vmscan.c: fix typo

Change "optizimation" to "optimization".

Link: http://lkml.kernel.org/r/20200609185144.10049-1-spacct.spacct@gmail.com
Signed-off-by: dylan-meiners <spacct.spacct@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/vmscan.c~mm-vmscanc-fixed-typo
+++ a/mm/vmscan.c
@@ -910,7 +910,7 @@ static int __remove_mapping(struct addre
 		 * order to detect refaults, thus thrashing, later on.
 		 *
 		 * But don't store shadows in an address space that is
-		 * already exiting. This is not just an optizimation,
+		 * already exiting. This is not just an optimization,
 		 * inode reclaim needs to empty out the radix tree or
 		 * the nodes are lost. Don't plant shadows behind its
 		 * back.
_
From: Shakeel Butt <shakeelb@google.com>
Subject: mm: vmscan: consistent update to pgrefill

The vmstat pgrefill is useful together with pgscan and pgsteal stats to
measure the reclaim efficiency.  However vmstat's pgrefill is not updated
consistently at system level.  It gets updated for both global and memcg
reclaim however pgscan and pgsteal are updated for only global reclaim.
So, update pgrefill only for global reclaim.  If someone is interested in
the stats representing both system level as well as memcg level reclaim,
then consult the root memcg's memory.stat instead of /proc/vmstat.

Link: http://lkml.kernel.org/r/20200711011459.1159929-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Chris Down <chris@chrisdown.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmscan.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/vmscan.c~mm-vmscan-consistent-update-to-pgrefill
+++ a/mm/vmscan.c
@@ -2030,7 +2030,8 @@ static void shrink_active_list(unsigned

 	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken);

-	__count_vm_events(PGREFILL, nr_scanned);
+	if (!cgroup_reclaim(sc))
+		__count_vm_events(PGREFILL, nr_scanned);
 	__count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned);

 	spin_unlock_irq(&pgdat->lru_lock);
_
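The accounting rule this patch establishes can be sketched in plain C as follows. It is a simplified model under stated assumptions, not the kernel implementation: `struct refill_stats` and `account_pgrefill()` are hypothetical names, and the two counters stand in for the system-wide vmstat event and the per-memcg event.

```c
#include <stdbool.h>

/* Hypothetical model of the two PGREFILL counters involved. */
struct refill_stats {
	unsigned long vmstat_pgrefill;	/* system-wide, as in /proc/vmstat */
	unsigned long memcg_pgrefill;	/* per-memcg, as in memory.stat */
};

/*
 * Models the patched behaviour: the system-wide counter moves only for
 * global reclaim, while the memcg counter moves for every reclaim.
 */
static void account_pgrefill(struct refill_stats *st, bool cgroup_reclaim,
			     unsigned long nr_scanned)
{
	if (!cgroup_reclaim)
		st->vmstat_pgrefill += nr_scanned;
	st->memcg_pgrefill += nr_scanned;
}
```

In this model the system-wide counter only ever reflects global reclaim, matching how the changelog describes pgscan and pgsteal, while the memcg counter keeps covering both kinds of reclaim.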
The patch titled
     Subject: mm/memory.c: avoid access flag update TLB flush for retried page fault
has been removed from the -mm tree.  Its filename was
     mm-avoid-access-flag-update-tlb-flush-for-retried-page-fault.patch

This patch was dropped because it was nacked

------------------------------------------------------
From: Yang Shi <yang.shi@linux.alibaba.com>
Subject: mm/memory.c: avoid access flag update TLB flush for retried page fault

Recently we found a regression when running the will_it_scale/page_fault3
test on ARM64: over 70% down for the multi-process cases and over 20%
down for the multi-thread cases.  It turns out the regression is caused by
commit 89b15332af7c0312a41e50846819ca6613b58b4c ("mm: drop mmap_sem before
calling balance_dirty_pages() in write fault").

The test mmaps a memory-sized file and then writes to the mapping.  This
makes all memory dirty and triggers dirty page throttling; that upstream
commit then releases mmap_sem and retries the page fault.  The retried
page fault sees the correct PTEs installed by the first try, then updates
the dirty bit, clears the read-only bit and flushes TLBs on ARM.  The
regression is caused by the excessive TLB flushes.  It is fine on x86,
since x86 doesn't clear the read-only bit, so there is no need to flush
the TLB in this case.

A page fault may be retried due to:
1. Waiting for page readahead
2. Waiting for page swapped in
3. Waiting for dirty pages throttling

The first two cases don't have PTEs set up at all, so the retried page
fault installs the PTEs and never reaches this code.  But case #3 usually
has PTEs installed, so the retried page fault reaches the dirty bit and
read-only bit update.  It does not seem necessary to modify those bits
again for #3, since they should already have been set by the first page
fault try.

Of course a parallel page fault may set up PTEs, but we only need to care
about write faults.  If the parallel page fault set up a writable and
dirty PTE, the retried fault doesn't need to do anything extra.  If the
parallel page fault set up a clean read-only PTE, the retried fault should
just call do_wp_page() and then return, as the code snippet below shows:

	if (vmf->flags & FAULT_FLAG_WRITE) {
		if (!pte_write(entry))
			return do_wp_page(vmf);
	}

With this fix the test results get back to normal.

[yang.shi@linux.alibaba.com: incorporate comment from Will Deacon, update commit log per discussion]
  Link: http://lkml.kernel.org/r/1594848990-55657-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1594148072-91273-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reported-by: Xu Yu <xuyu@linux.alibaba.com>
Debugged-by: Xu Yu <xuyu@linux.alibaba.com>
Tested-by: Xu Yu <xuyu@linux.alibaba.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/mm/memory.c~mm-avoid-access-flag-update-tlb-flush-for-retried-page-fault
+++ a/mm/memory.c
@@ -4241,8 +4241,14 @@ static vm_fault_t handle_pte_fault(struc
 	if (vmf->flags & FAULT_FLAG_WRITE) {
 		if (!pte_write(entry))
 			return do_wp_page(vmf);
-		entry = pte_mkdirty(entry);
 	}
+
+	if (vmf->flags & FAULT_FLAG_TRIED)
+		goto unlock;
+
+	if (vmf->flags & FAULT_FLAG_WRITE)
+		entry = pte_mkdirty(entry);
+
 	entry = pte_mkyoung(entry);
 	if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,
 				vmf->flags & FAULT_FLAG_WRITE)) {
_

Patches currently in -mm which might be from yang.shi@linux.alibaba.com are

mm-filemap-clear-idle-flag-for-writes.patch
mm-filemap-add-missing-fgp_-flags-in-kerneldoc-comment-for-pagecache_get_page.patch
mm-thp-remove-debug_cow-switch.patch
The patch titled
     Subject: include/linux/mempolicy.h: fix typo
has been added to the -mm tree.  Its filename is
     mempolicyh-fix-typo.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mempolicyh-fix-typo.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mempolicyh-fix-typo.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Yanfei Xu <yanfei.xu@windriver.com>
Subject: include/linux/mempolicy.h: fix typo

Change "interlave" to "interleave".

Link: http://lkml.kernel.org/r/20200810063454.9357-1-yanfei.xu@windriver.com
Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mempolicy.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/mempolicy.h~mempolicyh-fix-typo
+++ a/include/linux/mempolicy.h
@@ -28,7 +28,7 @@ struct mm_struct;
  * the process policy is used. Interrupts ignore the memory policy
  * of the current process.
  *
- * Locking policy for interlave:
+ * Locking policy for interleave:
  * In process context there is no locking because only the process accesses
  * its own state. All vma manipulation is somewhat protected by a down_read on
  * mmap_lock.
_

Patches currently in -mm which might be from yanfei.xu@windriver.com are

mempolicyh-fix-typo.patch
The patch titled
     Subject: mm/vunmap: add cond_resched() in vunmap_pmd_range
has been added to the -mm tree.  Its filename is
     mm-vunmap-add-cond_resched-in-vunmap_pmd_range.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-vunmap-add-cond_resched-in-vunmap_pmd_range.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-vunmap-add-cond_resched-in-vunmap_pmd_range.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Subject: mm/vunmap: add cond_resched() in vunmap_pmd_range

Like zap_pte_range(), add cond_resched() so that we can avoid softlockups
as reported below.  On a non-preemptible kernel with a large I/O map
region (like the one we get when using persistent memory with sector
mode), an unmap of the namespace can report the softlockup below.

[22724.027334] watchdog: BUG: soft lockup - CPU#49 stuck for 23s! [ndctl:50777]
NIP [c0000000000dc224] plpar_hcall+0x38/0x58
LR [c0000000000d8898] pSeries_lpar_hpte_invalidate+0x68/0xb0
Call Trace:
[c0000004e87a7780] [c0000004fb197c00] 0xc0000004fb197c00 (unreliable)
[c0000004e87a7810] [c00000000007f4e4] flush_hash_page+0x114/0x200
[c0000004e87a7890] [c0000000000833cc] hpte_need_flush+0x2dc/0x540
[c0000004e87a7950] [c0000000003f5798] vunmap_page_range+0x538/0x6f0
[c0000004e87a7a70] [c0000000003f76d0] free_unmap_vmap_area+0x30/0x70
[c0000004e87a7aa0] [c0000000003f7a6c] remove_vm_area+0xfc/0x140
[c0000004e87a7ad0] [c0000000003f7dd8] __vunmap+0x68/0x270
[c0000004e87a7b50] [c000000000079de4] __iounmap.part.0+0x34/0x60
[c0000004e87a7bb0] [c000000000376394] memunmap+0x54/0x70
[c0000004e87a7bd0] [c000000000881d7c] release_nodes+0x28c/0x300
[c0000004e87a7c40] [c00000000087a65c] device_release_driver_internal+0x16c/0x280
[c0000004e87a7c80] [c000000000876fc4] unbind_store+0x124/0x170
[c0000004e87a7cd0] [c000000000875be4] drv_attr_store+0x44/0x60
[c0000004e87a7cf0] [c00000000057c734] sysfs_kf_write+0x64/0x90
[c0000004e87a7d10] [c00000000057bc10] kernfs_fop_write+0x1b0/0x290
[c0000004e87a7d60] [c000000000488e6c] __vfs_write+0x3c/0x70
[c0000004e87a7d80] [c00000000048c868] vfs_write+0xd8/0x260
[c0000004e87a7dd0] [c00000000048ccac] ksys_write+0xdc/0x130
[c0000004e87a7e20] [c00000000000b588] system_call+0x5c/0x70

Link: http://lkml.kernel.org/r/20200807075933.310240-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reported-by: Harish Sriram <harish@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmalloc.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/mm/vmalloc.c~mm-vunmap-add-cond_resched-in-vunmap_pmd_range
+++ a/mm/vmalloc.c
@@ -104,6 +104,8 @@ static void vunmap_pmd_range(pud_t *pud,
 		if (pmd_none_or_clear_bad(pmd))
 			continue;
 		vunmap_pte_range(pmd, addr, next, mask);
+
+		cond_resched();
 	} while (pmd++, addr = next, addr != end);
 }

_

Patches currently in -mm which might be from aneesh.kumar@linux.ibm.com are

mm-vunmap-add-cond_resched-in-vunmap_pmd_range.patch
The patch titled
     Subject: mm-memcg-charge-memcg-percpu-memory-to-the-parent-cgroup-fix
has been added to the -mm tree.  Its filename is
     mm-memcg-charge-memcg-percpu-memory-to-the-parent-cgroup-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcg-charge-memcg-percpu-memory-to-the-parent-cgroup-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcg-charge-memcg-percpu-memory-to-the-parent-cgroup-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Roman Gushchin <guro@fb.com>
Subject: mm-memcg-charge-memcg-percpu-memory-to-the-parent-cgroup-fix

add WARN_ON_ONCE()s, per Johannes

Link: http://lkml.kernel.org/r/20200811170611.GB1507044@carbon.DHCP.thefacebook.com
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c | 6 ++++++
 1 file changed, 6 insertions(+)

--- a/mm/memcontrol.c~mm-memcg-charge-memcg-percpu-memory-to-the-parent-cgroup-fix
+++ a/mm/memcontrol.c
@@ -5131,6 +5131,9 @@ static int alloc_mem_cgroup_per_node_inf
 	if (!pn)
 		return 1;

+	/* We charge the parent cgroup, never the current task */
+	WARN_ON_ONCE(!current->active_memcg);
+
 	pn->lruvec_stat_local = alloc_percpu_gfp(struct lruvec_stat,
 						 GFP_KERNEL_ACCOUNT);
 	if (!pn->lruvec_stat_local) {
@@ -5213,6 +5216,9 @@ static struct mem_cgroup *mem_cgroup_all
 		goto fail;
 	}

+	/* We charge the parent cgroup, never the current task */
+	WARN_ON_ONCE(!current->active_memcg);
+
 	memcg->vmstats_local = alloc_percpu_gfp(struct memcg_vmstats_percpu,
 						GFP_KERNEL_ACCOUNT);
 	if (!memcg->vmstats_local)
_

Patches currently in -mm which might be from guro@fb.com are

percpu-return-number-of-released-bytes-from-pcpu_free_area.patch
mm-memcg-percpu-account-percpu-memory-to-memory-cgroups.patch
mm-memcg-percpu-per-memcg-percpu-memory-statistics.patch
mm-memcg-percpu-per-memcg-percpu-memory-statistics-v3.patch
mm-memcg-charge-memcg-percpu-memory-to-the-parent-cgroup.patch
mm-memcg-charge-memcg-percpu-memory-to-the-parent-cgroup-fix.patch
kselftests-cgroup-add-perpcu-memory-accounting-test.patch
mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings.patch
mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings-fix.patch
The patch titled
     Subject: mm: slub: fix conversion of freelist_corrupted()
has been added to the -mm tree.  Its filename is
     mm-slub-fix-conversion-of-freelist_corrupted.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-slub-fix-conversion-of-freelist_corrupted.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-slub-fix-conversion-of-freelist_corrupted.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Eugeniu Rosca <erosca@de.adit-jv.com>
Subject: mm: slub: fix conversion of freelist_corrupted()

Commit 52f23478081ae0 ("mm/slub.c: fix corrupted freechain in
deactivate_slab()") suffered an update when picked up from LKML [1].

Specifically, relocating 'freelist = NULL' into 'freelist_corrupted()'
created a no-op statement.  Fix it by sticking to the behavior intended in
the original patch [1].  Prefer the lowest-line-count solution.

[1] https://lore.kernel.org/linux-mm/20200331031450.12182-1-dongli.zhang@oracle.com/

Link: http://lkml.kernel.org/r/20200811124656.10308-1-erosca@de.adit-jv.com
Fixes: 52f23478081ae0 ("mm/slub.c: fix corrupted freechain in deactivate_slab()")
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slub.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/mm/slub.c~mm-slub-fix-conversion-of-freelist_corrupted
+++ a/mm/slub.c
@@ -677,7 +677,6 @@ static bool freelist_corrupted(struct km
 	if ((s->flags & SLAB_CONSISTENCY_CHECKS) &&
 	    !check_valid_pointer(s, page, nextfree)) {
 		object_err(s, page, freelist, "Freechain corrupt");
-		freelist = NULL;
 		slab_fix(s, "Isolate corrupted freechain");
 		return true;
 	}
@@ -2184,8 +2183,10 @@ static void deactivate_slab(struct kmem_
 		 * 'freelist' is already corrupted. So isolate all objects
 		 * starting at 'freelist'.
 		 */
-		if (freelist_corrupted(s, page, freelist, nextfree))
+		if (freelist_corrupted(s, page, freelist, nextfree)) {
+			freelist = NULL;
 			break;
+		}

 		do {
 			prior = page->freelist;
_

Patches currently in -mm which might be from erosca@de.adit-jv.com are

mm-slub-fix-conversion-of-freelist_corrupted.patch
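Why the relocated assignment was a no-op can be shown with a small standalone C sketch. It assumes only what the changelog states, namely that the callee received 'freelist' by value; the function names below are hypothetical and nothing here is the slub code itself.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Broken shape: 'freelist' is a by-value copy of the caller's pointer,
 * so clearing it here never reaches the caller.
 */
static bool corrupted_clears_copy(void *freelist, bool corrupt)
{
	if (corrupt) {
		freelist = NULL;	/* only the local copy changes */
		(void)freelist;		/* silence set-but-unused warnings */
		return true;
	}
	return false;
}

/* Hypothetical stand-in for a check that only reports corruption. */
static bool corrupted_reports_only(bool corrupt)
{
	return corrupt;
}

/* Fixed shape: the caller resets its own variable after the check. */
static void *caller_isolates(void *freelist, bool corrupt)
{
	if (corrupted_reports_only(corrupt))
		freelist = NULL;
	return freelist;
}
```

Calling `corrupted_clears_copy(p, true)` leaves the caller's `p` untouched, whereas `caller_isolates(p, true)` yields NULL, which mirrors the move of the assignment into deactivate_slab().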
The patch titled
     Subject: Revert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone"
has been added to the -mm tree.  Its filename is
     revert-mm-vmstatc-do-not-show-lowmem-reserve-protection-information-of-empty-zone.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/revert-mm-vmstatc-do-not-show-lowmem-reserve-protection-information-of-empty-zone.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/revert-mm-vmstatc-do-not-show-lowmem-reserve-protection-information-of-empty-zone.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Baoquan He <bhe@redhat.com>
Subject: Revert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone"

This reverts commit 26e7deadaae175.

Sonny reported that one of their tests started failing on the latest
kernel on their Chrome OS platform.  The root cause is that the above
commit removed the protection line of empty zone, while the parser used in
the test relies on the protection line to mark the end of each zone.

Let's revert it to avoid breaking userspace testing or applications.

Link: http://lkml.kernel.org/r/20200811075412.12872-1-bhe@redhat.com
Fixes: 26e7deadaae175 ("mm/vmstat.c: do not show lowmem reserve protection information of empty zone")
Signed-off-by: Baoquan He <bhe@redhat.com>
Reported-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>	[5.8.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmstat.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/mm/vmstat.c~revert-mm-vmstatc-do-not-show-lowmem-reserve-protection-information-of-empty-zone
+++ a/mm/vmstat.c
@@ -1618,12 +1618,6 @@ static void zoneinfo_show_print(struct s
 		   zone->present_pages,
 		   zone_managed_pages(zone));

-	/* If unpopulated, no other information is useful */
-	if (!populated_zone(zone)) {
-		seq_putc(m, '\n');
-		return;
-	}
-
 	seq_printf(m,
 		   "\n protection: (%ld",
 		   zone->lowmem_reserve[0]);
@@ -1631,6 +1625,12 @@ static void zoneinfo_show_print(struct s
 		seq_printf(m, ", %ld", zone->lowmem_reserve[i]);
 	seq_putc(m, ')');

+	/* If unpopulated, no other information is useful */
+	if (!populated_zone(zone)) {
+		seq_putc(m, '\n');
+		return;
+	}
+
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 		seq_printf(m, "\n %-12s %lu", zone_stat_name(i),
 			   zone_page_state(zone, i));
_

Patches currently in -mm which might be from bhe@redhat.com are

revert-mm-vmstatc-do-not-show-lowmem-reserve-protection-information-of-empty-zone.patch
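To illustrate the kind of userspace dependency described above, here is a hedged sketch of a parser that treats the "protection:" line as the end-of-zone marker. It is not the Chrome OS test's parser, which is not shown in this thread; `count_zone_records()` and any sample input fed to it are hypothetical.

```c
#include <string.h>

/*
 * Hypothetical parser: counts zone records in zoneinfo-like text by
 * counting "protection:" lines, i.e. it assumes every zone ends with one.
 */
static int count_zone_records(const char *zoneinfo)
{
	static const char marker[] = "protection:";
	int records = 0;
	const char *p = zoneinfo;

	while ((p = strstr(p, marker)) != NULL) {
		records++;
		p += sizeof(marker) - 1;	/* skip past this marker */
	}
	return records;
}
```

A parser written this way undercounts as soon as an empty zone is printed without its protection line, which is the class of breakage the revert avoids.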
The patch titled
     Subject: ROMFS: support inode blocks calculation
has been added to the -mm tree.  Its filename is
     romfs-support-inode-blocks-calculation.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/romfs-support-inode-blocks-calculation.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/romfs-support-inode-blocks-calculation.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Libing Zhou <libing.zhou@nokia-sbell.com>
Subject: ROMFS: support inode blocks calculation

When the 'stat' tool is used to display file status, the 'Blocks' field
is always '0'.  This is a problem for tools such as 'du' (e.g. busybox
'du'), which always report a size of '0' for files under ROMFS because
they compute sizes from the number of 512B blocks.

This patch calculates the approximate number of 512B blocks based on the
inode size.

Link: http://lkml.kernel.org/r/20200811052606.4243-1-libing.zhou@nokia-sbell.com
Signed-off-by: Libing Zhou <libing.zhou@nokia-sbell.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/romfs/super.c | 1 +
 1 file changed, 1 insertion(+)

--- a/fs/romfs/super.c~romfs-support-inode-blocks-calculation
+++ a/fs/romfs/super.c
@@ -356,6 +356,7 @@ static struct inode *romfs_iget(struct s
 	}

 	i->i_mode = mode;
+	i->i_blocks = (i->i_size + 511) >> 9;

 	unlock_new_inode(i);
 	return i;
_

Patches currently in -mm which might be from libing.zhou@nokia-sbell.com are

romfs-support-inode-blocks-calculation.patch
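The rounding in `(i->i_size + 511) >> 9` can be checked in isolation with the sketch below. `blocks_512()` is a hypothetical userspace helper that takes the size as a plain unsigned 64-bit value; it only mirrors the arithmetic of the one-line change.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of 512-byte blocks needed to hold 'size'
 * bytes.  Adding 511 before shifting right by 9 (dividing by 512)
 * rounds any partial block up to a whole one.
 */
static inline uint64_t blocks_512(uint64_t size)
{
	return (size + 511) >> 9;
}
```

For example, a 1-byte file and a 512-byte file both occupy one block in this scheme, a 513-byte file occupies two, and an empty file stays at zero.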
The patch titled
     Subject: mm-vmstat-add-events-for-thp-migration-without-split-v4
has been added to the -mm tree.  Its filename is
     mm-vmstat-add-events-for-thp-migration-without-split-v4.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-vmstat-add-events-for-thp-migration-without-split-v4.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-vmstat-add-events-for-thp-migration-without-split-v4.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: mm-vmstat-add-events-for-thp-migration-without-split-v4

s/thp_nr_pages/hpage_nr_pages/

Link: http://lkml.kernel.org/r/1594287583-16568-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/migrate.c~mm-vmstat-add-events-for-thp-migration-without-split-v4
+++ a/mm/migrate.c
@@ -1446,7 +1446,7 @@ retry:
 			 * during migration.
 			 */
 			is_thp = PageTransHuge(page);
-			nr_subpages = thp_nr_pages(page);
+			nr_subpages = hpage_nr_pages(page);
 			cond_resched();

 			if (PageHuge(page))
_

Patches currently in -mm which might be from anshuman.khandual@arm.com are

mm-vmstat-add-events-for-thp-migration-without-split.patch
mm-vmstat-add-events-for-thp-migration-without-split-v4.patch
- Most of the rest of MM

- various other subsystems

165 patches, based on 00e4db51259a5f936fec1424b884f029479d3981.

Subsystems affected by this patch series:

  mm/memcg
  mm/hugetlb
  mm/vmscan
  mm/proc
  mm/compaction
  mm/mempolicy
  mm/oom-kill
  mm/hugetlbfs
  mm/migration
  mm/thp
  mm/cma
  mm/util
  mm/memory-hotplug
  mm/cleanups
  mm/uaccess
  alpha
  misc
  sparse
  bitmap
  lib
  lz4
  bitops
  checkpatch
  autofs
  minix
  nilfs
  ufs
  fat
  signals
  kmod
  coredump
  exec
  kdump
  rapidio
  panic
  kcov
  kgdb
  ipc
  mm/migration
  mm/gup
  mm/pagemap

Subsystem: mm/memcg

    Roman Gushchin <guro@fb.com>:
    Patch series "mm: memcg accounting of percpu memory", v3:
      percpu: return number of released bytes from pcpu_free_area()
      mm: memcg/percpu: account percpu memory to memory cgroups
      mm: memcg/percpu: per-memcg percpu memory statistics
      mm: memcg: charge memcg percpu memory to the parent cgroup
      kselftests: cgroup: add perpcu memory accounting test

Subsystem: mm/hugetlb

    Muchun Song <songmuchun@bytedance.com>:
      mm/hugetlb: add mempolicy check in the reservation routine

Subsystem: mm/vmscan

    Joonsoo Kim <iamjoonsoo.kim@lge.com>:
    Patch series "workingset protection/detection on the anonymous LRU list", v7:
      mm/vmscan: make active/inactive ratio as 1:1 for anon lru
      mm/vmscan: protect the workingset on anonymous LRU
      mm/workingset: prepare the workingset detection infrastructure for anon LRU
      mm/swapcache: support to handle the shadow entries
      mm/swap: implement workingset detection for anonymous LRU
      mm/vmscan: restore active/inactive ratio for anonymous LRU

Subsystem: mm/proc

    Michal Koutný <mkoutny@suse.com>:
      /proc/PID/smaps: consistent whitespace output format

Subsystem: mm/compaction

    Nitin Gupta <nigupta@nvidia.com>:
      mm: proactive compaction
      mm: fix compile error due to COMPACTION_HPAGE_ORDER
      mm: use unsigned types for fragmentation score

    Alex Shi <alex.shi@linux.alibaba.com>:
      mm/compaction: correct the comments of compact_defer_shift

Subsystem: mm/mempolicy

    Krzysztof Kozlowski <krzk@kernel.org>:
      mm: mempolicy: fix kerneldoc of numa_map_to_online_node()

    Wenchao Hao <haowenchao22@gmail.com>:
      mm/mempolicy.c: check parameters first in kernel_get_mempolicy

    Yanfei Xu <yanfei.xu@windriver.com>:
      include/linux/mempolicy.h: fix typo

Subsystem: mm/oom-kill

    Yafang Shao <laoar.shao@gmail.com>:
      mm, oom: make the calculation of oom badness more accurate

    Michal Hocko <mhocko@suse.com>:
      doc, mm: sync up oom_score_adj documentation
      doc, mm: clarify /proc/<pid>/oom_score value range

    Yafang Shao <laoar.shao@gmail.com>:
      mm, oom: show process exiting information in __oom_kill_process()

Subsystem: mm/hugetlbfs

    Mike Kravetz <mike.kravetz@oracle.com>:
      hugetlbfs: prevent filesystem stacking of hugetlbfs
      hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem

Subsystem: mm/migration

    Ralph Campbell <rcampbell@nvidia.com>:
    Patch series "mm/migrate: optimize migrate_vma_setup() for holes":
      mm/migrate: optimize migrate_vma_setup() for holes
      mm/migrate: add migrate-shared test for migrate_vma_*()

Subsystem: mm/thp

    Yang Shi <yang.shi@linux.alibaba.com>:
      mm: thp: remove debug_cow switch

    Anshuman Khandual <anshuman.khandual@arm.com>:
      mm/vmstat: add events for THP migration without split

Subsystem: mm/cma

    Jianqun Xu <jay.xu@rock-chips.com>:
      mm/cma.c: fix NULL pointer dereference when cma could not be activated

    Barry Song <song.bao.hua@hisilicon.com>:
    Patch series "mm: fix the names of general cma and hugetlb cma", v2:
      mm: cma: fix the name of CMA areas
      mm: hugetlb: fix the name of hugetlb CMA

    Mike Kravetz <mike.kravetz@oracle.com>:
      cma: don't quit at first error when activating reserved areas

Subsystem: mm/util

    Waiman Long <longman@redhat.com>:
      include/linux/sched/mm.h: optimize current_gfp_context()

    Krzysztof Kozlowski <krzk@kernel.org>:
      mm: mmu_notifier: fix and extend kerneldoc

Subsystem: mm/memory-hotplug

    Daniel Jordan <daniel.m.jordan@oracle.com>:
      x86/mm: use max memory block size on bare metal

    Jia He <justin.he@arm.com>:
      mm/memory_hotplug: introduce default dummy memory_add_physaddr_to_nid()
      mm/memory_hotplug: fix unpaired mem_hotplug_begin/done

    Charan Teja Reddy <charante@codeaurora.org>:
      mm, memory_hotplug: update pcp lists everytime onlining a memory block

Subsystem: mm/cleanups

    Randy Dunlap <rdunlap@infradead.org>:
      mm: drop duplicated words in <linux/pgtable.h>
      mm: drop duplicated words in <linux/mm.h>
      include/linux/highmem.h: fix duplicated words in a comment
      include/linux/frontswap.h: drop duplicated word in a comment
      include/linux/memcontrol.h: drop duplicate word and fix spello

    Arvind Sankar <nivedita@alum.mit.edu>:
      sh/mm: drop unused MAX_PHYSADDR_BITS
      sparc: drop unused MAX_PHYSADDR_BITS

    Randy Dunlap <rdunlap@infradead.org>:
      mm/compaction.c: delete duplicated word
      mm/filemap.c: delete duplicated word
      mm/hmm.c: delete duplicated word
      mm/hugetlb.c: delete duplicated words
      mm/memcontrol.c: delete duplicated words
      mm/memory.c: delete duplicated words
      mm/migrate.c: delete duplicated word
      mm/nommu.c: delete duplicated words
      mm/page_alloc.c: delete or fix duplicated words
      mm/shmem.c: delete duplicated word
      mm/slab_common.c: delete duplicated word
      mm/usercopy.c: delete duplicated word
      mm/vmscan.c: delete or fix duplicated words
      mm/zpool.c: delete duplicated word and fix grammar
      mm/zsmalloc.c: fix duplicated words

Subsystem: mm/uaccess

    Christoph Hellwig <hch@lst.de>:
    Patch series "clean up address limit helpers", v2:
      syscalls: use uaccess_kernel in addr_limit_user_check
      nds32: use uaccess_kernel in show_regs
      riscv: include <asm/pgtable.h> in <asm/uaccess.h>
      uaccess: remove segment_eq
      uaccess: add force_uaccess_{begin,end} helpers
      exec: use force_uaccess_begin during exec and exit

Subsystem: alpha

    Luc Van Oostenryck <luc.vanoostenryck@gmail.com>:
      alpha: fix annotation of io{read,write}{16,32}be()

Subsystem: misc

    Randy Dunlap <rdunlap@infradead.org>:
      include/linux/compiler-clang.h: drop duplicated word in a comment
      include/linux/exportfs.h: drop duplicated word in a comment
      include/linux/async_tx.h: drop duplicated word in a comment
      include/linux/xz.h: drop duplicated word

    Christoph Hellwig <hch@lst.de>:
      kernel: add a kernel_wait helper

    Feng Tang <feng.tang@intel.com>:
      ./Makefile: add debug option to enable function aligned on 32 bytes

    Arvind Sankar <nivedita@alum.mit.edu>:
      kernel.h: remove duplicate include of asm/div64.h

    "Alexander A. Klimov" <grandmaster@al2klimov.de>:
      include/: replace HTTP links with HTTPS ones

    Matthew Wilcox <willy@infradead.org>:
      include/linux/poison.h: remove obsolete comment

Subsystem: sparse

    Luc Van Oostenryck <luc.vanoostenryck@gmail.com>:
      sparse: group the defines by functionality

Subsystem: bitmap

    Stefano Brivio <sbrivio@redhat.com>:
    Patch series "lib: Fix bitmap_cut() for overlaps, add test":
      lib/bitmap.c: fix bitmap_cut() for partial overlapping case
      lib/test_bitmap.c: add test for bitmap_cut()

Subsystem: lib

    Luc Van Oostenryck <luc.vanoostenryck@gmail.com>:
      lib/generic-radix-tree.c: remove unneeded __rcu

    Geert Uytterhoeven <geert@linux-m68k.org>:
      lib/test_bitops: do the full test during module init

    Wei Yongjun <weiyongjun1@huawei.com>:
      lib/test_lockup.c: make symbol 'test_works' static

    Tiezhu Yang <yangtiezhu@loongson.cn>:
      lib/Kconfig.debug: make TEST_LOCKUP depend on module
      lib/test_lockup.c: fix return value of test_lockup_init()

    "Alexander A. Klimov" <grandmaster@al2klimov.de>:
      lib/: replace HTTP links with HTTPS ones

    "Kars Mulder" <kerneldev@karsmulder.nl>:
      kstrto*: correct documentation references to simple_strto*()
      kstrto*: do not describe simple_strto*() as obsolete/replaced

Subsystem: lz4

    Nick Terrell <terrelln@fb.com>:
      lz4: fix kernel decompression speed

Subsystem: bitops

    Rikard Falkeborn <rikard.falkeborn@gmail.com>:
      lib/test_bits.c: add tests of GENMASK

Subsystem: checkpatch

    Joe Perches <joe@perches.com>:
      checkpatch: add test for possible misuse of IS_ENABLED() without CONFIG_
      checkpatch: add --fix option for ASSIGN_IN_IF

    Quentin Monnet <quentin@isovalent.com>:
      checkpatch: fix CONST_STRUCT when const_structs.checkpatch is missing

    Joe Perches <joe@perches.com>:
      checkpatch: add test for repeated words
      checkpatch: remove missing switch/case break test

Subsystem: autofs

    Randy Dunlap <rdunlap@infradead.org>:
      autofs: fix doubled word

Subsystem: minix

    Eric Biggers <ebiggers@google.com>:
    Patch series "fs/minix: fix syzbot bugs and set s_maxbytes":
      fs/minix: check return value of sb_getblk()
      fs/minix: don't allow getting deleted inodes
      fs/minix: reject too-large maximum file size
      fs/minix: set s_maxbytes correctly
      fs/minix: fix block limit check for V1 filesystems
      fs/minix: remove expected error message in block_to_path()

Subsystem: nilfs

    Eric Biggers <ebiggers@google.com>:
    Patch series "nilfs2 updates":
      nilfs2: only call unlock_new_inode() if I_NEW

    Joe Perches <joe@perches.com>:
      nilfs2: convert __nilfs_msg to integrate the level and format
      nilfs2: use a more common logging style

Subsystem: ufs

    Colin Ian King <colin.king@canonical.com>:
      fs/ufs: avoid potential u32 multiplication overflow

Subsystem: fat

    Yubo Feng <fengyubo3@huawei.com>:
      fatfs: switch write_lock to read_lock in fat_ioctl_get_attributes

    "Alexander A. Klimov" <grandmaster@al2klimov.de>:
      VFAT/FAT/MSDOS FILESYSTEM: replace HTTP links with HTTPS ones

    OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
      fat: fix fat_ra_init() for data clusters == 0

Subsystem: signals

    Helge Deller <deller@gmx.de>:
      fs/signalfd.c: fix inconsistent return codes for signalfd4

Subsystem: kmod

    Tiezhu Yang <yangtiezhu@loongson.cn>:
    Patch series "kmod/umh: a few fixes":
      selftests: kmod: use variable NAME in kmod_test_0001()
      kmod: remove redundant "be an" in the comment
      test_kmod: avoid potential double free in trigger_config_run_type()

Subsystem: coredump

    Lepton Wu <ytht.net@gmail.com>:
      coredump: add %f for executable filename

Subsystem: exec

    Kees Cook <keescook@chromium.org>:
    Patch series "Relocate execve() sanity checks", v2:
      exec: change uselib(2) IS_SREG() failure to EACCES
      exec: move S_ISREG() check earlier
      exec: move path_noexec() check earlier

Subsystem: kdump

    Vijay Balakrishna <vijayb@linux.microsoft.com>:
      kdump: append kernel build-id string to VMCOREINFO

Subsystem: rapidio

    "Gustavo A. R. Silva" <gustavoars@kernel.org>:
      drivers/rapidio/devices/rio_mport_cdev.c: use struct_size() helper
      drivers/rapidio/rio-scan.c: use struct_size() helper
      rapidio/rio_mport_cdev: use array_size() helper in copy_{from,to}_user()

Subsystem: panic

    Tiezhu Yang <yangtiezhu@loongson.cn>:
      kernel/panic.c: make oops_may_print() return bool
      lib/Kconfig.debug: fix typo in the help text of CONFIG_PANIC_TIMEOUT

    Yue Hu <huyue2@yulong.com>:
      panic: make print_oops_end_marker() static

Subsystem: kcov

    Marco Elver <elver@google.com>:
      kcov: unconditionally add -fno-stack-protector to compiler options

    Wei Yongjun <weiyongjun1@huawei.com>:
      kcov: make some symbols static

Subsystem: kgdb

    Nick Desaulniers <ndesaulniers@google.com>:
      scripts/gdb: fix python 3.8 SyntaxWarning

Subsystem: ipc

    Alexey Dobriyan <adobriyan@gmail.com>:
      ipc: uninline functions

    Liao Pingfang <liao.pingfang@zte.com.cn>:
      ipc/shm.c: remove the superfluous break

Subsystem: mm/migration

    Joonsoo Kim <iamjoonsoo.kim@lge.com>:
    Patch series "clean-up the migration target allocation functions", v5:
      mm/page_isolation: prefer the node of the source page
      mm/migrate: move migration helper from .h to .c
      mm/hugetlb: unify migration callbacks
      mm/migrate: clear __GFP_RECLAIM to make the migration callback consistent with regular THP allocations
      mm/migrate: introduce a standard migration target allocation function
      mm/mempolicy: use a standard migration target allocation callback
      mm/page_alloc: remove a wrapper for alloc_migration_target()

Subsystem: mm/gup

    Joonsoo Kim <iamjoonsoo.kim@lge.com>:
      mm/gup: restrict CMA region by using allocation scope API
      mm/hugetlb: make hugetlb migration callback CMA aware
      mm/gup: use a standard migration target allocation callback

Subsystem: mm/pagemap

    Peter Xu <peterx@redhat.com>:
    Patch series "mm: Page fault accounting cleanups", v5:
      mm: do page fault accounting in handle_mm_fault
      mm/alpha: use general page fault accounting
      mm/arc: use general page fault accounting
      mm/arm: use general page fault accounting
      mm/arm64: use general page fault accounting
      mm/csky: use general page fault accounting
      mm/hexagon: use general page fault accounting
      mm/ia64: use general page fault accounting
      mm/m68k: use general page fault accounting
      mm/microblaze: use general page fault accounting
      mm/mips: use general page fault accounting
      mm/nds32: use general page fault accounting
      mm/nios2: use general page fault accounting
      mm/openrisc: use general page fault accounting
      mm/parisc: use general page fault accounting
      mm/powerpc: use general page fault accounting
      mm/riscv: use general page fault accounting
      mm/s390: use general page fault accounting
      mm/sh: use general page fault accounting
      mm/sparc32: use general page fault accounting
      mm/sparc64: use general page fault accounting
      mm/x86: use general page fault accounting
      mm/xtensa: use general page fault accounting
      mm: clean up the last pieces of page fault accountings
      mm/gup: remove task_struct pointer for all gup code

 Documentation/admin-guide/cgroup-v2.rst | 4
 Documentation/admin-guide/sysctl/kernel.rst | 3
 Documentation/admin-guide/sysctl/vm.rst | 15 +
 Documentation/filesystems/proc.rst | 11 -
 Documentation/vm/page_migration.rst | 27 +++
 Makefile | 4
 arch/alpha/include/asm/io.h | 8
 arch/alpha/include/asm/uaccess.h | 2
 arch/alpha/mm/fault.c | 10 -
 arch/arc/include/asm/segment.h | 3
 arch/arc/kernel/process.c | 2
 arch/arc/mm/fault.c | 20 --
 arch/arm/include/asm/uaccess.h | 4
 arch/arm/kernel/signal.c | 2
 arch/arm/mm/fault.c | 27 ---
 arch/arm64/include/asm/uaccess.h | 2
 arch/arm64/kernel/sdei.c | 2
 arch/arm64/mm/fault.c | 31 ---
 arch/arm64/mm/numa.c | 10 -
 arch/csky/include/asm/segment.h | 2
 arch/csky/mm/fault.c | 15 -
 arch/h8300/include/asm/segment.h | 2
 arch/hexagon/mm/vm_fault.c | 11 -
 arch/ia64/include/asm/uaccess.h | 2
 arch/ia64/mm/fault.c | 11 -
 arch/ia64/mm/numa.c | 2
 arch/m68k/include/asm/segment.h | 2
 arch/m68k/include/asm/tlbflush.h | 6
 arch/m68k/mm/fault.c | 16 -
 arch/microblaze/include/asm/uaccess.h | 2
 arch/microblaze/mm/fault.c | 11 -
 arch/mips/include/asm/uaccess.h | 2
 arch/mips/kernel/unaligned.c | 27 +--
 arch/mips/mm/fault.c | 16 -
 arch/nds32/include/asm/uaccess.h | 2
 arch/nds32/kernel/process.c | 2
 arch/nds32/mm/alignment.c | 7
 arch/nds32/mm/fault.c | 21 --
 arch/nios2/include/asm/uaccess.h | 2
 arch/nios2/mm/fault.c | 16 -
 arch/openrisc/include/asm/uaccess.h | 2
 arch/openrisc/mm/fault.c | 11 -
 arch/parisc/include/asm/uaccess.h | 2
 arch/parisc/mm/fault.c | 10 -
 arch/powerpc/include/asm/uaccess.h | 3
 arch/powerpc/mm/copro_fault.c | 7
 arch/powerpc/mm/fault.c | 13 -
 arch/riscv/include/asm/uaccess.h | 6
 arch/riscv/mm/fault.c | 18 --
 arch/s390/include/asm/uaccess.h | 2
 arch/s390/kvm/interrupt.c | 2
 arch/s390/kvm/kvm-s390.c | 2
 arch/s390/kvm/priv.c | 8
 arch/s390/mm/fault.c | 18 --
 arch/s390/mm/gmap.c | 4
 arch/sh/include/asm/segment.h | 3
 arch/sh/include/asm/sparsemem.h | 4
 arch/sh/kernel/traps_32.c | 12 -
 arch/sh/mm/fault.c | 13 -
 arch/sh/mm/init.c | 9 -
 arch/sparc/include/asm/sparsemem.h | 1
 arch/sparc/include/asm/uaccess_32.h | 2
 arch/sparc/include/asm/uaccess_64.h | 2
 arch/sparc/mm/fault_32.c | 15 -
 arch/sparc/mm/fault_64.c | 13 -
 arch/um/kernel/trap.c | 6
 arch/x86/include/asm/uaccess.h | 2
 arch/x86/mm/fault.c | 19 --
 arch/x86/mm/init_64.c | 9 +
 arch/x86/mm/numa.c | 1
 arch/xtensa/include/asm/uaccess.h | 2
 arch/xtensa/mm/fault.c | 17 -
 drivers/firmware/arm_sdei.c | 5
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 2
 drivers/infiniband/core/umem_odp.c | 2
 drivers/iommu/amd/iommu_v2.c | 2
 drivers/iommu/intel/svm.c | 3
 drivers/rapidio/devices/rio_mport_cdev.c | 7
 drivers/rapidio/rio-scan.c | 8
 drivers/vfio/vfio_iommu_type1.c | 4
 fs/coredump.c | 17 +
 fs/exec.c | 38 ++--
 fs/fat/Kconfig | 2
 fs/fat/fatent.c | 3
 fs/fat/file.c | 4
 fs/hugetlbfs/inode.c | 6
 fs/minix/inode.c | 48 ++++-
 fs/minix/itree_common.c | 8
 fs/minix/itree_v1.c | 16 -
 fs/minix/itree_v2.c | 15 -
 fs/minix/minix.h | 1
 fs/namei.c | 10 -
 fs/nilfs2/alloc.c | 38 ++--
 fs/nilfs2/btree.c | 42 ++--
 fs/nilfs2/cpfile.c | 10 -
 fs/nilfs2/dat.c | 14 -
 fs/nilfs2/direct.c | 14 -
 fs/nilfs2/gcinode.c | 2
 fs/nilfs2/ifile.c | 4
 fs/nilfs2/inode.c | 32 +--
 fs/nilfs2/ioctl.c | 37 ++--
 fs/nilfs2/mdt.c | 2
 fs/nilfs2/namei.c | 6
 fs/nilfs2/nilfs.h | 18 +-
 fs/nilfs2/page.c | 11 -
 fs/nilfs2/recovery.c | 32 +--
 fs/nilfs2/segbuf.c | 2
 fs/nilfs2/segment.c | 38 ++--
 fs/nilfs2/sufile.c | 29 +--
 fs/nilfs2/super.c | 73 ++++----
 fs/nilfs2/sysfs.c | 29 +--
 fs/nilfs2/the_nilfs.c | 85 ++++-----
 fs/open.c | 6
 fs/proc/base.c | 11 +
 fs/proc/task_mmu.c | 4
 fs/signalfd.c | 10 -
 fs/ufs/super.c | 2
 include/asm-generic/uaccess.h | 4
 include/clocksource/timer-ti-dm.h | 2
 include/linux/async_tx.h | 2
 include/linux/btree.h | 2
 include/linux/compaction.h | 6
 include/linux/compiler-clang.h | 2
 include/linux/compiler_types.h | 44 ++---
 include/linux/crash_core.h | 6
 include/linux/delay.h | 2
 include/linux/dma/k3-psil.h | 2
 include/linux/dma/k3-udma-glue.h | 2
 include/linux/dma/ti-cppi5.h | 2
 include/linux/exportfs.h | 2
 include/linux/frontswap.h | 2
 include/linux/fs.h | 10 +
 include/linux/generic-radix-tree.h | 2
 include/linux/highmem.h | 2
 include/linux/huge_mm.h | 7
 include/linux/hugetlb.h | 53 ++++--
 include/linux/irqchip/irq-omap-intc.h | 2
 include/linux/jhash.h | 2
 include/linux/kernel.h | 12 -
 include/linux/leds-ti-lmu-common.h | 2
 include/linux/memcontrol.h | 12 +
 include/linux/mempolicy.h | 18 +-
 include/linux/migrate.h | 42 +---
 include/linux/mm.h | 20 +-
 include/linux/mmzone.h | 17 +
 include/linux/oom.h | 4
 include/linux/pgtable.h | 12 -
 include/linux/platform_data/davinci-cpufreq.h | 2
 include/linux/platform_data/davinci_asp.h | 2
 include/linux/platform_data/elm.h | 2
 include/linux/platform_data/gpio-davinci.h | 2
 include/linux/platform_data/gpmc-omap.h | 2
 include/linux/platform_data/mtd-davinci-aemif.h | 2
 include/linux/platform_data/omap-twl4030.h | 2
 include/linux/platform_data/uio_pruss.h | 2
 include/linux/platform_data/usb-omap.h | 2
 include/linux/poison.h | 4
 include/linux/sched/mm.h | 8
 include/linux/sched/task.h | 1
 include/linux/soc/ti/k3-ringacc.h | 2
 include/linux/soc/ti/knav_qmss.h | 2
 include/linux/soc/ti/ti-msgmgr.h | 2
 include/linux/swap.h | 25 ++
 include/linux/syscalls.h | 2
 include/linux/uaccess.h | 20 ++
 include/linux/vm_event_item.h | 3
 include/linux/wkup_m3_ipc.h | 2
 include/linux/xxhash.h | 2
 include/linux/xz.h | 4
 include/linux/zlib.h | 2
 include/soc/arc/aux.h | 2
 include/trace/events/migrate.h | 17 +
 include/uapi/linux/auto_dev-ioctl.h | 2
 include/uapi/linux/elf.h | 2
 include/uapi/linux/map_to_7segment.h | 2
 include/uapi/linux/types.h | 2
 include/uapi/linux/usb/ch9.h | 2
 ipc/sem.c | 3
 ipc/shm.c | 4
 kernel/Makefile | 2
 kernel/crash_core.c | 50 +++++
 kernel/events/callchain.c | 5
 kernel/events/core.c | 5
 kernel/events/uprobes.c | 8
 kernel/exit.c | 18 +-
 kernel/futex.c | 2
 kernel/kcov.c | 6
 kernel/kmod.c | 5
 kernel/kthread.c | 5
 kernel/panic.c | 4
 kernel/stacktrace.c | 5
 kernel/sysctl.c | 11 +
 kernel/umh.c | 29 ---
 lib/Kconfig.debug | 27 ++-
 lib/Makefile | 1
 lib/bitmap.c | 4
 lib/crc64.c | 2
 lib/decompress_bunzip2.c | 2
 lib/decompress_unlzma.c | 6
 lib/kstrtox.c | 20 --
 lib/lz4/lz4_compress.c | 4
 lib/lz4/lz4_decompress.c | 18 +-
 lib/lz4/lz4defs.h | 10 +
 lib/lz4/lz4hc_compress.c | 2
 lib/math/rational.c | 2
 lib/rbtree.c | 2
 lib/test_bitmap.c | 58 ++++++
 lib/test_bitops.c | 18 +-
 lib/test_bits.c | 75 ++++++++
 lib/test_kmod.c | 2
 lib/test_lockup.c | 6
 lib/ts_bm.c | 2
 lib/xxhash.c | 2
 lib/xz/xz_crc32.c | 2
 lib/xz/xz_dec_bcj.c | 2
 lib/xz/xz_dec_lzma2.c | 2
 lib/xz/xz_lzma2.h | 2
 lib/xz/xz_stream.h | 2
 mm/cma.c | 40 +---
 mm/cma.h | 4
 mm/compaction.c | 207 +++++++++++++++++++++--
 mm/filemap.c | 2
 mm/gup.c | 195 ++++++----------------
 mm/hmm.c | 5
 mm/huge_memory.c | 23 --
 mm/hugetlb.c | 93 ++++------
 mm/internal.h | 9 -
 mm/khugepaged.c | 2
 mm/ksm.c | 3
 mm/maccess.c | 22 +-
 mm/memcontrol.c | 42 +++-
 mm/memory-failure.c | 7
 mm/memory.c | 107 +++++++++---
 mm/memory_hotplug.c | 30 ++-
 mm/mempolicy.c | 49 +----
 mm/migrate.c | 151 ++++++++++++++---
 mm/mmu_notifier.c | 9 -
 mm/nommu.c | 4
 mm/oom_kill.c | 24 +-
 mm/page_alloc.c | 14 +
 mm/page_isolation.c | 21 --
 mm/percpu-internal.h | 55 ++++++
 mm/percpu-km.c | 5
 mm/percpu-stats.c | 36 ++--
 mm/percpu-vm.c | 5
 mm/percpu.c | 208 +++++++++++++++++++++---
 mm/process_vm_access.c | 2
 mm/rmap.c | 2
 mm/shmem.c | 5
 mm/slab_common.c | 2
 mm/swap.c | 13 -
 mm/swap_state.c | 80 +++++++--
 mm/swapfile.c | 4
 mm/usercopy.c | 2
 mm/userfaultfd.c | 2
 mm/vmscan.c | 36 ++--
 mm/vmstat.c | 32 +++
 mm/workingset.c | 23 +-
 mm/zpool.c | 8
 mm/zsmalloc.c | 2
 scripts/checkpatch.pl | 116 +++++++++----
 scripts/gdb/linux/rbtree.py | 4
 security/tomoyo/domain.c | 2
 tools/testing/selftests/cgroup/test_kmem.c | 70 +++++++-
 tools/testing/selftests/kmod/kmod.sh | 4
 tools/testing/selftests/vm/hmm-tests.c | 35 ++++
 virt/kvm/async_pf.c | 2
 virt/kvm/kvm_main.c | 2
 268 files changed, 2481 insertions(+), 1551 deletions(-)
39 patches, based on b923f1247b72fc100b87792fd2129d026bb10e66.

Subsystems affected by this patch series:

  mm/hotfixes
  lz4
  exec
  mailmap
  mm/thp
  autofs
  mm/madvise
  sysctl
  mm/kmemleak
  mm/misc
  lib

Subsystem: mm/hotfixes

    Mike Rapoport <rppt@linux.ibm.com>:
      asm-generic: pgalloc.h: use correct #ifdef to enable pud_alloc_one()

    Baoquan He <bhe@redhat.com>:
      Revert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone"

Subsystem: lz4

    Nick Terrell <terrelln@fb.com>:
      lz4: fix kernel decompression speed

Subsystem: exec

    Kees Cook <keescook@chromium.org>:
    Patch series "Fix S_ISDIR execve() errno":
      exec: restore EACCES of S_ISDIR execve()
      selftests/exec: add file type errno tests

Subsystem: mailmap

    Greg Kurz <groug@kaod.org>:
      mailmap: add entry for Greg Kurz

Subsystem: mm/thp

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
    Patch series "THP prep patches":
      mm: store compound_nr as well as compound_order
      mm: move page-flags include to top of file
      mm: add thp_order
      mm: add thp_size
      mm: replace hpage_nr_pages with thp_nr_pages
      mm: add thp_head
      mm: introduce offset_in_thp

Subsystem: autofs

    Randy Dunlap <rdunlap@infradead.org>:
      fs: autofs: delete repeated words in comments

Subsystem: mm/madvise

    Minchan Kim <minchan@kernel.org>:
    Patch series "introduce memory hinting API for external process", v8:
      mm/madvise: pass task and mm to do_madvise
      pid: move pidfd_get_pid() to pid.c
      mm/madvise: introduce process_madvise() syscall: an external memory hinting API
      mm/madvise: check fatal signal pending of target process

Subsystem: sysctl

    Xiaoming Ni <nixiaoming@huawei.com>:
      all arch: remove system call sys_sysctl

Subsystem: mm/kmemleak

    Qian Cai <cai@lca.pw>:
      mm/kmemleak: silence KCSAN splats in checksum

Subsystem: mm/misc

    Qian Cai <cai@lca.pw>:
      mm/frontswap: mark various intentional data races
      mm/page_io: mark various intentional data races
      mm/swap_state: mark various intentional data races

    Kirill A. Shutemov <kirill@shutemov.name>:
      mm/filemap.c: fix a data race in filemap_fault()

    Qian Cai <cai@lca.pw>:
      mm/swapfile: fix and annotate various data races
      mm/page_counter: fix various data races at memsw
      mm/memcontrol: fix a data race in scan count
      mm/list_lru: fix a data race in list_lru_count_one
      mm/mempool: fix a data race in mempool_free()
      mm/rmap: annotate a data race at tlb_flush_batched
      mm/swap.c: annotate data races for lru_rotate_pvecs
      mm: annotate a data race in page_zonenum()

    Romain Naour <romain.naour@gmail.com>:
      include/asm-generic/vmlinux.lds.h: align ro_after_init

    Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>:
      sh: clkfwk: remove r8/r16/r32
      sh: use generic strncpy()

Subsystem: lib

    Krzysztof Kozlowski <krzk@kernel.org>:
    Patch series "iomap: Constify ioreadX() iomem argument", v3:
      iomap: constify ioreadX() iomem argument (as in generic implementation)
      rtl818x: constify ioreadX() iomem argument (as in generic implementation)
      ntb: intel: constify ioreadX() iomem argument (as in generic implementation)
      virtio: pci: constify ioreadX() iomem argument (as in generic implementation)

 .mailmap | 1
 arch/alpha/include/asm/core_apecs.h | 6
 arch/alpha/include/asm/core_cia.h | 6
 arch/alpha/include/asm/core_lca.h | 6
 arch/alpha/include/asm/core_marvel.h | 4
 arch/alpha/include/asm/core_mcpcia.h | 6
 arch/alpha/include/asm/core_t2.h | 2
 arch/alpha/include/asm/io.h | 12 -
 arch/alpha/include/asm/io_trivial.h | 16 -
 arch/alpha/include/asm/jensen.h | 2
 arch/alpha/include/asm/machvec.h | 6
 arch/alpha/kernel/core_marvel.c | 2
 arch/alpha/kernel/io.c | 12 -
 arch/alpha/kernel/syscalls/syscall.tbl | 3
 arch/arm/configs/am200epdkit_defconfig | 1
 arch/arm/tools/syscall.tbl | 3
 arch/arm64/include/asm/unistd.h | 2
 arch/arm64/include/asm/unistd32.h | 6
 arch/ia64/kernel/syscalls/syscall.tbl | 3
 arch/m68k/kernel/syscalls/syscall.tbl | 3
 arch/microblaze/kernel/syscalls/syscall.tbl | 3
 arch/mips/configs/cu1000-neo_defconfig | 1
 arch/mips/kernel/syscalls/syscall_n32.tbl | 3
 arch/mips/kernel/syscalls/syscall_n64.tbl | 3
 arch/mips/kernel/syscalls/syscall_o32.tbl | 3
 arch/parisc/include/asm/io.h | 4
 arch/parisc/kernel/syscalls/syscall.tbl | 3
 arch/parisc/lib/iomap.c | 72 +++---
 arch/powerpc/kernel/iomap.c | 28 +-
 arch/powerpc/kernel/syscalls/syscall.tbl | 3
 arch/s390/kernel/syscalls/syscall.tbl | 3
 arch/sh/configs/dreamcast_defconfig | 1
 arch/sh/configs/espt_defconfig | 1
 arch/sh/configs/hp6xx_defconfig | 1
 arch/sh/configs/landisk_defconfig | 1
 arch/sh/configs/lboxre2_defconfig | 1
 arch/sh/configs/microdev_defconfig | 1
 arch/sh/configs/migor_defconfig | 1
 arch/sh/configs/r7780mp_defconfig | 1
 arch/sh/configs/r7785rp_defconfig | 1
 arch/sh/configs/rts7751r2d1_defconfig | 1
 arch/sh/configs/rts7751r2dplus_defconfig | 1
 arch/sh/configs/se7206_defconfig | 1
 arch/sh/configs/se7343_defconfig | 1
 arch/sh/configs/se7619_defconfig | 1
 arch/sh/configs/se7705_defconfig | 1
 arch/sh/configs/se7750_defconfig | 1
 arch/sh/configs/se7751_defconfig | 1
 arch/sh/configs/secureedge5410_defconfig | 1
 arch/sh/configs/sh03_defconfig | 1
 arch/sh/configs/sh7710voipgw_defconfig | 1
 arch/sh/configs/sh7757lcr_defconfig | 1
 arch/sh/configs/sh7763rdp_defconfig | 1
 arch/sh/configs/shmin_defconfig | 1
 arch/sh/configs/titan_defconfig | 1
 arch/sh/include/asm/string_32.h | 26 --
 arch/sh/kernel/iomap.c | 22 -
 arch/sh/kernel/syscalls/syscall.tbl | 3
 arch/sparc/kernel/syscalls/syscall.tbl | 3
 arch/x86/entry/syscalls/syscall_32.tbl | 3
 arch/x86/entry/syscalls/syscall_64.tbl | 4
 arch/xtensa/kernel/syscalls/syscall.tbl | 3
 drivers/mailbox/bcm-pdc-mailbox.c | 2
 drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h | 6
 drivers/ntb/hw/intel/ntb_hw_gen1.c | 2
 drivers/ntb/hw/intel/ntb_hw_gen3.h | 2
 drivers/ntb/hw/intel/ntb_hw_intel.h | 2
 drivers/nvdimm/btt.c | 4
 drivers/nvdimm/pmem.c | 6
 drivers/sh/clk/cpg.c | 25 --
 drivers/virtio/virtio_pci_modern.c | 6
 fs/autofs/dev-ioctl.c | 4
 fs/io_uring.c | 2
 fs/namei.c | 4
 include/asm-generic/iomap.h | 28 +-
 include/asm-generic/pgalloc.h | 2
 include/asm-generic/vmlinux.lds.h | 1
 include/linux/compat.h | 5
 include/linux/huge_mm.h | 58 ++++-
 include/linux/io-64-nonatomic-hi-lo.h | 4
 include/linux/io-64-nonatomic-lo-hi.h | 4
 include/linux/memcontrol.h | 2
 include/linux/mm.h | 16 -
 include/linux/mm_inline.h | 6
 include/linux/mm_types.h | 1
 include/linux/pagemap.h | 6
 include/linux/pid.h | 1
 include/linux/syscalls.h | 4
 include/linux/sysctl.h | 6
 include/uapi/asm-generic/unistd.h | 4
 kernel/Makefile | 2
 kernel/exit.c | 17 -
 kernel/pid.c | 17 +
 kernel/sys_ni.c | 3
 kernel/sysctl_binary.c | 171 --------------
 lib/iomap.c | 30 +-
 lib/lz4/lz4_compress.c | 4
 lib/lz4/lz4_decompress.c | 18 -
 lib/lz4/lz4defs.h | 10
 lib/lz4/lz4hc_compress.c | 2
 mm/compaction.c | 2
 mm/filemap.c | 22 +
 mm/frontswap.c | 8
 mm/gup.c | 2
 mm/internal.h | 4
 mm/kmemleak.c | 2
 mm/list_lru.c | 2
 mm/madvise.c | 190 ++++++++++++++--
 mm/memcontrol.c | 10
 mm/memory.c | 4
 mm/memory_hotplug.c | 7
 mm/mempolicy.c | 2
 mm/mempool.c | 2
 mm/migrate.c | 18 -
 mm/mlock.c | 9
 mm/page_alloc.c | 5
 mm/page_counter.c | 13 -
 mm/page_io.c | 12 -
 mm/page_vma_mapped.c | 6
 mm/rmap.c | 10
 mm/swap.c | 21 -
 mm/swap_state.c | 10
 mm/swapfile.c | 33 +-
 mm/vmscan.c | 6
 mm/vmstat.c | 12 -
 mm/workingset.c | 6
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl | 2
 tools/perf/arch/s390/entry/syscalls/syscall.tbl | 2
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl | 2
 tools/testing/selftests/exec/.gitignore | 1
 tools/testing/selftests/exec/Makefile | 5
 tools/testing/selftests/exec/non-regular.c | 196 +++++++++++++++++
 132 files changed, 815 insertions(+), 614 deletions(-)
11 patches, based on 7eac66d0456fe12a462e5c14c68e97c7460989da.

Subsystems affected by this patch series:

  misc
  mm/hugetlb
  mm/vmalloc
  mm/misc
  romfs
  relay
  uprobes
  squashfs
  mm/cma
  mm/pagealloc

Subsystem: misc

    Nick Desaulniers <ndesaulniers@google.com>:
      mailmap: add Andi Kleen

Subsystem: mm/hugetlb

    Xu Wang <vulab@iscas.ac.cn>:
      hugetlb_cgroup: convert comma to semicolon

    Hugh Dickins <hughd@google.com>:
      khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()

Subsystem: mm/vmalloc

    "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
      mm/vunmap: add cond_resched() in vunmap_pmd_range

Subsystem: mm/misc

    Leon Romanovsky <leonro@nvidia.com>:
      mm/rodata_test.c: fix missing function declaration

Subsystem: romfs

    Jann Horn <jannh@google.com>:
      romfs: fix uninitialized memory leak in romfs_dev_read()

Subsystem: relay

    Wei Yongjun <weiyongjun1@huawei.com>:
      kernel/relay.c: fix memleak on destroy relay channel

Subsystem: uprobes

    Hugh Dickins <hughd@google.com>:
      uprobes: __replace_page() avoid BUG in munlock_vma_page()

Subsystem: squashfs

    Phillip Lougher <phillip@squashfs.org.uk>:
      squashfs: avoid bio_alloc() failure with 1Mbyte blocks

Subsystem: mm/cma

    Doug Berger <opendmb@gmail.com>:
      mm: include CMA pages in lowmem_reserve at boot

Subsystem: mm/pagealloc

    Charan Teja Reddy <charante@codeaurora.org>:
      mm, page_alloc: fix core hung in free_pcppages_bulk()

 .mailmap | 1 +
 fs/romfs/storage.c | 4 +---
 fs/squashfs/block.c | 6 +++++-
 kernel/events/uprobes.c | 2 +-
 kernel/relay.c | 1 +
 mm/hugetlb_cgroup.c | 4 ++--
 mm/khugepaged.c | 2 +-
 mm/page_alloc.c | 7 ++++++-
 mm/rodata_test.c | 1 +
 mm/vmalloc.c | 2 ++
 10 files changed, 21 insertions(+), 9 deletions(-)
19 patches, based on 59126901f200f5fc907153468b03c64e0081b6e6.

Subsystems affected by this patch series:

  mm/memcg
  mm/slub
  MAINTAINERS
  mm/pagemap
  ipc
  fork
  checkpatch
  mm/madvise
  mm/migration
  mm/hugetlb
  lib

Subsystem: mm/memcg

    Michal Hocko <mhocko@suse.com>:
      memcg: fix use-after-free in uncharge_batch

    Xunlei Pang <xlpang@linux.alibaba.com>:
      mm: memcg: fix memcg reclaim soft lockup

Subsystem: mm/slub

    Eugeniu Rosca <erosca@de.adit-jv.com>:
      mm: slub: fix conversion of freelist_corrupted()

Subsystem: MAINTAINERS

    Robert Richter <rric@kernel.org>:
      MAINTAINERS: update Cavium/Marvell entries

    Nick Desaulniers <ndesaulniers@google.com>:
      MAINTAINERS: add LLVM maintainers

    Randy Dunlap <rdunlap@infradead.org>:
      MAINTAINERS: IA64: mark Status as Odd Fixes only

Subsystem: mm/pagemap

    Joerg Roedel <jroedel@suse.de>:
      mm: track page table modifications in __apply_to_page_range()

Subsystem: ipc

    Tobias Klauser <tklauser@distanz.ch>:
      ipc: adjust proc_ipc_sem_dointvec definition to match prototype

Subsystem: fork

    Tobias Klauser <tklauser@distanz.ch>:
      fork: adjust sysctl_max_threads definition to match prototype

Subsystem: checkpatch

    Mrinal Pandey <mrinalmni@gmail.com>:
      checkpatch: fix the usage of capture group ( ... )

Subsystem: mm/madvise

    Yang Shi <shy828301@gmail.com>:
      mm: madvise: fix vma user-after-free

Subsystem: mm/migration

    Alistair Popple <alistair@popple.id.au>:
      mm/migrate: fixup setting UFFD_WP flag
      mm/rmap: fixup copying of soft dirty and uffd ptes

    Ralph Campbell <rcampbell@nvidia.com>:
    Patch series "mm/migrate: preserve soft dirty in remove_migration_pte()":
      mm/migrate: remove unnecessary is_zone_device_page() check
      mm/migrate: preserve soft dirty in remove_migration_pte()

Subsystem: mm/hugetlb

    Li Xinhai <lixinhai.lxh@gmail.com>:
      mm/hugetlb: try preferred node first when alloc gigantic page from cma

    Muchun Song <songmuchun@bytedance.com>:
      mm/hugetlb: fix a race between hugetlb sysctl handlers

    David Howells <dhowells@redhat.com>:
      mm/khugepaged.c: fix khugepaged's request size in collapse_file

Subsystem: lib

    Jason Gunthorpe <jgg@nvidia.com>:
      include/linux/log2.h: add missing () around n in roundup_pow_of_two()

 MAINTAINERS | 32 ++++++++++++++++----------------
 include/linux/log2.h | 2 +-
 ipc/ipc_sysctl.c | 2 +-
 kernel/fork.c | 2 +-
 mm/hugetlb.c | 49 +++++++++++++++++++++++++++++++++++++------------
 mm/khugepaged.c | 2 +-
 mm/madvise.c | 2 +-
 mm/memcontrol.c | 6 ++++++
 mm/memory.c | 37 ++++++++++++++++++++++++-------------
 mm/migrate.c | 31 +++++++++++++++++++------------
 mm/rmap.c | 9 +++++++--
 mm/slub.c | 12 ++++++------
 mm/vmscan.c | 8 ++++++++
 scripts/checkpatch.pl | 4 ++--
 14 files changed, 130 insertions(+), 68 deletions(-)
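The log2.h entry above ("add missing () around n in roundup_pow_of_two()") belongs to a familiar class of macro bug. What follows is a minimal user-space sketch of that class, not the kernel's macro: IS_ONE_UNSAFE, IS_ONE_SAFE and the two wrapper functions are hypothetical names invented for illustration. It only shows how an unparenthesized macro argument can change the parse when the caller passes an expression.

```c
/* Hypothetical macros for illustration only; these are not the kernel's
 * definitions from include/linux/log2.h. */
#define IS_ONE_UNSAFE(n) (n == 1)    /* argument not parenthesized */
#define IS_ONE_SAFE(n)   ((n) == 1)  /* argument parenthesized */

/* Expands to (c ? 2 : 1 == 1), which the compiler parses as
 * c ? 2 : (1 == 1), because == binds tighter than ?: */
int unsafe_is_one(int c)
{
	return IS_ONE_UNSAFE(c ? 2 : 1);
}

/* Expands to ((c ? 2 : 1) == 1): the whole argument is compared to 1. */
int safe_is_one(int c)
{
	return IS_ONE_SAFE(c ? 2 : 1);
}
```

With a non-zero c, unsafe_is_one() evaluates to 2 (truthy) even though the argument's value, 2, is not equal to 1, while safe_is_one() evaluates to 0 as intended.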
15 patches, based on 92ab97adeefccf375de7ebaad9d5b75d4125fe8b.

Subsystems affected by this patch series:

  mailmap
  mm/hotfixes
  mm/thp
  mm/memory-hotplug
  misc
  kcsan

Subsystem: mailmap

    Kees Cook <keescook@chromium.org>:
      mailmap: add older email addresses for Kees Cook

Subsystem: mm/hotfixes

    Hugh Dickins <hughd@google.com>:
    Patch series "mm: fixes to past from future testing":
      ksm: reinstate memcg charge on copied pages
      mm: migration of hugetlbfs page skip memcg
      shmem: shmem_writepage() split unlikely i915 THP
      mm: fix check_move_unevictable_pages() on THP
      mlock: fix unevictable_pgs event counts on THP

    Byron Stanoszek <gandalf@winds.org>:
      tmpfs: restore functionality of nr_inodes=0

    Muchun Song <songmuchun@bytedance.com>:
      kprobes: fix kill kprobe which has been marked as gone

Subsystem: mm/thp

    Ralph Campbell <rcampbell@nvidia.com>:
      mm/thp: fix __split_huge_pmd_locked() for migration PMD

    Christophe Leroy <christophe.leroy@csgroup.eu>:
      selftests/vm: fix display of page size in map_hugetlb

Subsystem: mm/memory-hotplug

    Pavel Tatashin <pasha.tatashin@soleen.com>:
      mm/memory_hotplug: drain per-cpu pages again during memory offline

Subsystem: misc

    Tobias Klauser <tklauser@distanz.ch>:
      ftrace: let ftrace_enable_sysctl take a kernel pointer buffer
      stackleak: let stack_erasing_sysctl take a kernel pointer buffer
      fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match prototype

Subsystem: kcsan

    Changbin Du <changbin.du@gmail.com>:
      kcsan: kconfig: move to menu 'Generic Kernel Debugging Instruments'

 .mailmap | 4 ++
 fs/fs-writeback.c | 2 -
 include/linux/ftrace.h | 3 --
 include/linux/stackleak.h | 2 -
 kernel/kprobes.c | 9 +++++-
 kernel/stackleak.c | 2 -
 kernel/trace/ftrace.c | 3 --
 lib/Kconfig.debug | 4 --
 mm/huge_memory.c | 42 ++++++++++++++++---------------
 mm/ksm.c | 4 ++
 mm/memory_hotplug.c | 14 ++++++++++
 mm/migrate.c | 3 +-
 mm/mlock.c | 24 +++++++++++------
 mm/page_isolation.c | 8 +++++
 mm/shmem.c | 20 +++++++++++---
 mm/swap.c | 6 ++--
 mm/vmscan.c | 10 +++++--
 tools/testing/selftests/vm/map_hugetlb.c | 2 -
 18 files changed, 111 insertions(+), 51 deletions(-)
9 patches, based on 7c7ec3226f5f33f9c050d85ec20f18419c622ad6.

Subsystems affected by this patch series:

  mm/thp mm/memcg mm/gup mm/migration lib x86 mm/memory-hotplug

Subsystem: mm/thp

    Gao Xiang <hsiangkao@redhat.com>:
      mm, THP, swap: fix allocating cluster for swapfile by mistake

Subsystem: mm/memcg

    Muchun Song <songmuchun@bytedance.com>:
      mm: memcontrol: fix missing suffix of workingset_restore

Subsystem: mm/gup

    Vasily Gorbik <gor@linux.ibm.com>:
      mm/gup: fix gup_fast with dynamic page table folding

Subsystem: mm/migration

    Zi Yan <ziy@nvidia.com>:
      mm/migrate: correct thp migration stats

Subsystem: lib

    Nick Desaulniers <ndesaulniers@google.com>:
      lib/string.c: implement stpcpy

    Jason Yan <yanaijie@huawei.com>:
      lib/memregion.c: include memregion.h

Subsystem: x86

    Mikulas Patocka <mpatocka@redhat.com>:
      arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback

Subsystem: mm/memory-hotplug

    Laurent Dufour <ldufour@linux.ibm.com>:
    Patch series "mm: fix memory to node bad links in sysfs", v3:
      mm: replace memmap_context by meminit_context
      mm: don't rely on system state to detect hot-plug operations

 Documentation/admin-guide/cgroup-v2.rst | 25 ++++++---
 arch/ia64/mm/init.c                     | 6 +-
 arch/s390/include/asm/pgtable.h         | 42 +++++++++++----
 arch/x86/lib/usercopy_64.c              | 2
 drivers/base/node.c                     | 85 ++++++++++++++++++++------------
 include/linux/mm.h                      | 2
 include/linux/mmzone.h                  | 11 +++-
 include/linux/node.h                    | 11 ++--
 include/linux/pgtable.h                 | 10 +++
 lib/memregion.c                         | 1
 lib/string.c                            | 24 +++++++++
 mm/gup.c                                | 18 +++---
 mm/memcontrol.c                         | 4 -
 mm/memory_hotplug.c                     | 5 +
 mm/migrate.c                            | 7 +-
 mm/page_alloc.c                         | 10 +--
 mm/swapfile.c                           | 2
 17 files changed, 181 insertions(+), 84 deletions(-)
3 patches, based on d3d45f8220d60a0b2aaaacf8fb2be4e6ffd9008e.

Subsystems affected by this patch series:

  mm/slub mm/cma scripts

Subsystem: mm/slub

    Eric Farman <farman@linux.ibm.com>:
      mm, slub: restore initial kmem_cache flags

Subsystem: mm/cma

    Joonsoo Kim <iamjoonsoo.kim@lge.com>:
      mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

Subsystem: scripts

    Eric Biggers <ebiggers@google.com>:
      scripts/spelling.txt: fix malformed entry

 mm/page_alloc.c      | 19 ++++++++++++++++---
 mm/slub.c            | 6 +-----
 scripts/spelling.txt | 2 +-
 3 files changed, 18 insertions(+), 9 deletions(-)
5 patches, based on da690031a5d6d50a361e3f19f3eeabd086a6f20d.

Subsystems affected by this patch series:

  MAINTAINERS mm/pagemap mm/swap mm/hugetlb

Subsystem: MAINTAINERS

    Kees Cook <keescook@chromium.org>:
      MAINTAINERS: change hardening mailing list

    Antoine Tenart <atenart@kernel.org>:
      MAINTAINERS: Antoine Tenart's email address

Subsystem: mm/pagemap

    Miaohe Lin <linmiaohe@huawei.com>:
      mm: mmap: Fix general protection fault in unlink_file_vma()

Subsystem: mm/swap

    Minchan Kim <minchan@kernel.org>:
      mm: validate inode in mapping_set_error()

Subsystem: mm/hugetlb

    Vijay Balakrishna <vijayb@linux.microsoft.com>:
      mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged

 .mailmap                   | 4 +++-
 MAINTAINERS                | 8 ++++----
 include/linux/khugepaged.h | 5 +++++
 include/linux/pagemap.h    | 3 ++-
 mm/khugepaged.c            | 13 +++++++++++--
 mm/mmap.c                  | 6 +++++-
 mm/page_alloc.c            | 3 +++
 7 files changed, 33 insertions(+), 9 deletions(-)
181 patches, based on 029f56db6ac248769f2c260bfaf3c3c0e23e904c. Subsystems affected by this patch series: kbuild scripts ntfs ocfs2 vfs mm/slab mm/slub mm/kmemleak mm/dax mm/debug mm/pagecache mm/fadvise mm/gup mm/swap mm/memremap mm/memcg mm/selftests mm/pagemap mm/mincore mm/hmm mm/dma mm/memory-failure mm/vmalloc mm/documentation mm/kasan mm/pagealloc mm/hugetlb mm/vmscan mm/z3fold mm/zbud mm/compaction mm/mempolicy mm/mempool mm/memblock mm/oom-kill mm/migration Subsystem: kbuild Nick Desaulniers <ndesaulniers@google.com>: Patch series "set clang minimum version to 10.0.1", v3: compiler-clang: add build check for clang 10.0.1 Revert "kbuild: disable clang's default use of -fmerge-all-constants" Revert "arm64: bti: Require clang >= 10.0.1 for in-kernel BTI support" Revert "arm64: vdso: Fix compilation with clang older than 8" Partially revert "ARM: 8905/1: Emit __gnu_mcount_nc when using Clang 10.0.0 or newer" Marco Elver <elver@google.com>: kasan: remove mentions of unsupported Clang versions Nick Desaulniers <ndesaulniers@google.com>: compiler-gcc: improve version error compiler.h: avoid escaped section names export.h: fix section name for CONFIG_TRIM_UNUSED_KSYMS for Clang Lukas Bulwahn <lukas.bulwahn@gmail.com>: kbuild: doc: describe proper script invocation Subsystem: scripts Wang Qing <wangqing@vivo.com>: scripts/spelling.txt: increase error-prone spell checking Naoki Hayama <naoki.hayama@lineo.co.jp>: scripts/spelling.txt: add "arbitrary" typo Borislav Petkov <bp@suse.de>: scripts/decodecode: add the capability to supply the program counter Subsystem: ntfs Rustam Kovhaev <rkovhaev@gmail.com>: ntfs: add check for mft record size in superblock Subsystem: ocfs2 Randy Dunlap <rdunlap@infradead.org>: ocfs2: delete repeated words in comments Gang He <ghe@suse.com>: ocfs2: fix potential soft lockup during fstrim Subsystem: vfs Randy Dunlap <rdunlap@infradead.org>: fs/xattr.c: fix kernel-doc warnings for setxattr & removexattr Luo Jiaxing <luojiaxing@huawei.com>: 
fs_parse: mark fs_param_bad_value() as static Subsystem: mm/slab Mateusz Nosek <mateusznosek0@gmail.com>: mm/slab.c: clean code by removing redundant if condition tangjianqiang <wyqt1985@gmail.com>: include/linux/slab.h: fix a typo error in comment Subsystem: mm/slub Abel Wu <wuyun.wu@huawei.com>: mm/slub.c: branch optimization in free slowpath mm/slub: fix missing ALLOC_SLOWPATH stat when bulk alloc mm/slub: make add_full() condition more explicit Subsystem: mm/kmemleak Davidlohr Bueso <dave@stgolabs.net>: mm/kmemleak: rely on rcu for task stack scanning Hui Su <sh_def@163.com>: mm,kmemleak-test.c: move kmemleak-test.c to samples dir Subsystem: mm/dax Dan Williams <dan.j.williams@intel.com>: Patch series "device-dax: Support sub-dividing soft-reserved ranges", v5: x86/numa: cleanup configuration dependent command-line options x86/numa: add 'nohmat' option efi/fake_mem: arrange for a resource entry per efi_fake_mem instance ACPI: HMAT: refactor hmat_register_target_device to hmem_register_device resource: report parent to walk_iomem_res_desc() callback mm/memory_hotplug: introduce default phys_to_target_node() implementation ACPI: HMAT: attach a device for each soft-reserved range device-dax: drop the dax_region.pfn_flags attribute device-dax: move instance creation parameters to 'struct dev_dax_data' device-dax: make pgmap optional for instance creation device-dax/kmem: introduce dax_kmem_range() device-dax/kmem: move resource name tracking to drvdata device-dax/kmem: replace release_resource() with release_mem_region() device-dax: add an allocation interface for device-dax instances device-dax: introduce 'struct dev_dax' typed-driver operations device-dax: introduce 'seed' devices drivers/base: make device_find_child_by_name() compatible with sysfs inputs device-dax: add resize support mm/memremap_pages: convert to 'struct range' mm/memremap_pages: support multiple ranges per invocation device-dax: add dis-contiguous resource support device-dax: introduce 
'mapping' devices Joao Martins <joao.m.martins@oracle.com>: device-dax: make align a per-device property Dan Williams <dan.j.williams@intel.com>: device-dax: add an 'align' attribute Joao Martins <joao.m.martins@oracle.com>: dax/hmem: introduce dax_hmem.region_idle parameter device-dax: add a range mapping allocation attribute Subsystem: mm/debug "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm/debug.c: do not dereference i_ino blindly John Hubbard <jhubbard@nvidia.com>: mm, dump_page: rename head_mapcount() --> head_compound_mapcount() Subsystem: mm/pagecache "Matthew Wilcox (Oracle)" <willy@infradead.org>: Patch series "Return head pages from find_*_entry", v2: mm: factor find_get_incore_page out of mincore_page mm: use find_get_incore_page in memcontrol mm: optimise madvise WILLNEED proc: optimise smaps for shmem entries i915: use find_lock_page instead of find_lock_entry mm: convert find_get_entry to return the head page mm/shmem: return head page from find_lock_entry mm: add find_lock_head mm/filemap: fix filemap_map_pages for THP Subsystem: mm/fadvise Yafang Shao <laoar.shao@gmail.com>: mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED Subsystem: mm/gup Barry Song <song.bao.hua@hisilicon.com>: mm/gup_benchmark: update the documentation in Kconfig mm/gup_benchmark: use pin_user_pages for FOLL_LONGTERM flag mm/gup: don't permit users to call get_user_pages with FOLL_LONGTERM John Hubbard <jhubbard@nvidia.com>: mm/gup: protect unpin_user_pages() against npages==-ERRNO Subsystem: mm/swap Gao Xiang <hsiangkao@redhat.com>: swap: rename SWP_FS to SWAP_FS_OPS to avoid ambiguity Yu Zhao <yuzhao@google.com>: mm: remove activate_page() from unuse_pte() mm: remove superfluous __ClearPageActive() Miaohe Lin <linmiaohe@huawei.com>: mm/swap.c: fix confusing comment in release_pages() mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache() mm/page_io.c: remove useless out label in __swap_writepage() 
mm/swap.c: fix incomplete comment in lru_cache_add_inactive_or_unevictable() mm/swapfile.c: remove unnecessary goto out in _swap_info_get() mm/swapfile.c: fix potential memory leak in sys_swapon Subsystem: mm/memremap Ira Weiny <ira.weiny@intel.com>: mm/memremap.c: convert devmap static branch to {inc,dec} Subsystem: mm/memcg "Gustavo A. R. Silva" <gustavoars@kernel.org>: mm: memcontrol: use flex_array_size() helper in memcpy() mm: memcontrol: use the preferred form for passing the size of a structure type Roman Gushchin <guro@fb.com>: mm: memcg/slab: fix racy access to page->mem_cgroup in mem_cgroup_from_obj() Miaohe Lin <linmiaohe@huawei.com>: mm: memcontrol: correct the comment of mem_cgroup_iter() Waiman Long <longman@redhat.com>: Patch series "mm/memcg: Miscellaneous cleanups and streamlining", v2: mm/memcg: clean up obsolete enum charge_type mm/memcg: simplify mem_cgroup_get_max() mm/memcg: unify swap and memsw page counters Muchun Song <songmuchun@bytedance.com>: mm: memcontrol: add the missing numa_stat interface for cgroup v2 Miaohe Lin <linmiaohe@huawei.com>: mm/page_counter: correct the obsolete func name in the comment of page_counter_try_charge() mm: memcontrol: reword obsolete comment of mem_cgroup_unmark_under_oom() Bharata B Rao <bharata@linux.ibm.com>: mm: memcg/slab: uncharge during kmem_cache_free_bulk() Ralph Campbell <rcampbell@nvidia.com>: mm/memcg: fix device private memcg accounting Subsystem: mm/selftests John Hubbard <jhubbard@nvidia.com>: Patch series "selftests/vm: fix some minor aggravating factors in the Makefile": selftests/vm: fix false build success on the second and later attempts selftests/vm: fix incorrect gcc invocation in some cases Subsystem: mm/pagemap Matthew Wilcox <willy@infradead.org>: mm: account PMD tables like PTE tables Yanfei Xu <yanfei.xu@windriver.com>: mm/memory.c: fix typo in __do_fault() comment mm/memory.c: replace vmf->vma with variable vma Wei Yang <richard.weiyang@linux.alibaba.com>: mm/mmap: rename 
__vma_unlink_common() to __vma_unlink() mm/mmap: leverage vma_rb_erase_ignore() to implement vma_rb_erase() Chinwen Chang <chinwen.chang@mediatek.com>: Patch series "Try to release mmap_lock temporarily in smaps_rollup", v4: mmap locking API: add mmap_lock_is_contended() mm: smaps*: extend smap_gather_stats to support specified beginning mm: proc: smaps_rollup: do not stall write attempts on mmap_lock "Matthew Wilcox (Oracle)" <willy@infradead.org>: Patch series "Fix PageDoubleMap": mm: move PageDoubleMap bit mm: simplify PageDoubleMap with PF_SECOND policy Wei Yang <richard.weiyang@linux.alibaba.com>: mm/mmap: leave adjust_next as virtual address instead of page frame number Randy Dunlap <rdunlap@infradead.org>: mm/memory.c: fix spello of "function" Wei Yang <richard.weiyang@linux.alibaba.com>: mm/mmap: not necessary to check mapping separately mm/mmap: check on file instead of the rb_root_cached of its address_space Miaohe Lin <linmiaohe@huawei.com>: mm: use helper function mapping_allow_writable() mm/mmap.c: use helper function allow_write_access() in __remove_shared_vm_struct() Liao Pingfang <liao.pingfang@zte.com.cn>: mm/mmap.c: replace do_brk with do_brk_flags in comment of insert_vm_struct() Peter Xu <peterx@redhat.com>: mm: remove src/dst mm parameter in copy_page_range() Subsystem: mm/mincore yuleixzhang <yulei.kernel@gmail.com>: include/linux/huge_mm.h: remove mincore_huge_pmd declaration Subsystem: mm/hmm Ralph Campbell <rcampbell@nvidia.com>: tools/testing/selftests/vm/hmm-tests.c: use the new SKIP() macro lib/test_hmm.c: remove unused dmirror_zero_page Subsystem: mm/dma Andy Shevchenko <andriy.shevchenko@linux.intel.com>: mm/dmapool.c: replace open-coded list_for_each_entry_safe() mm/dmapool.c: replace hard coded function name with __func__ Subsystem: mm/memory-failure Xianting Tian <tian.xianting@h3c.com>: mm/memory-failure: do pgoff calculation before for_each_process() Alex Shi <alex.shi@linux.alibaba.com>: mm/memory-failure.c: remove unused macro 
`writeback' Subsystem: mm/vmalloc Hui Su <sh_def@163.com>: mm/vmalloc.c: update the comment in __vmalloc_area_node() mm/vmalloc.c: fix the comment of find_vm_area Subsystem: mm/documentation Alexander Gordeev <agordeev@linux.ibm.com>: docs/vm: fix 'mm_count' vs 'mm_users' counter confusion Subsystem: mm/kasan Patricia Alfonso <trishalfonso@google.com>: Patch series "KASAN-KUnit Integration", v14: kasan/kunit: add KUnit Struct to Current Task KUnit: KASAN Integration KASAN: port KASAN Tests to KUnit KASAN: Testing Documentation David Gow <davidgow@google.com>: mm: kasan: do not panic if both panic_on_warn and kasan_multishot set Subsystem: mm/pagealloc David Hildenbrand <david@redhat.com>: Patch series "mm / virtio-mem: support ZONE_MOVABLE", v5: mm/page_alloc: tweak comments in has_unmovable_pages() mm/page_isolation: exit early when pageblock is isolated in set_migratetype_isolate() mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate() mm/page_isolation: cleanup set_migratetype_isolate() virtio-mem: don't special-case ZONE_MOVABLE mm: document semantics of ZONE_MOVABLE Li Xinhai <lixinhai.lxh@gmail.com>: mm, isolation: avoid checking unmovable pages across pageblock boundary Mateusz Nosek <mateusznosek0@gmail.com>: mm/page_alloc.c: clean code by removing unnecessary initialization mm/page_alloc.c: micro-optimization remove unnecessary branch mm/page_alloc.c: fix early params garbage value accesses mm/page_alloc.c: clean code by merging two functions Yanfei Xu <yanfei.xu@windriver.com>: mm/page_alloc.c: __perform_reclaim should return 'unsigned long' Mateusz Nosek <mateusznosek0@gmail.com>: mmzone: clean code by removing unused macro parameter Ralph Campbell <rcampbell@nvidia.com>: mm: move call to compound_head() in release_pages() "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm/page_alloc.c: fix freeing non-compound pages Michal Hocko <mhocko@suse.com>: include/linux/gfp.h: clarify usage of GFP_ATOMIC in !preemptible contexts Subsystem: 
mm/hugetlb Baoquan He <bhe@redhat.com>: Patch series "mm/hugetlb: Small cleanup and improvement", v2: mm/hugetlb.c: make is_hugetlb_entry_hwpoisoned return bool mm/hugetlb.c: remove the unnecessary non_swap_entry() doc/vm: fix typo in the hugetlb admin documentation Wei Yang <richard.weiyang@linux.alibaba.com>: Patch series "mm/hugetlb: code refine and simplification", v4: mm/hugetlb: not necessary to coalesce regions recursively mm/hugetlb: remove VM_BUG_ON(!nrg) in get_file_region_entry_from_cache() mm/hugetlb: use list_splice to merge two list at once mm/hugetlb: count file_region to be added when regions_needed != NULL mm/hugetlb: a page from buddy is not on any list mm/hugetlb: narrow the hugetlb_lock protection area during preparing huge page mm/hugetlb: take the free hpage during the iteration directly Mike Kravetz <mike.kravetz@oracle.com>: hugetlb: add lockdep check for i_mmap_rwsem held in huge_pmd_share Subsystem: mm/vmscan Chunxin Zang <zangchunxin@bytedance.com>: mm/vmscan: fix infinite loop in drop_slab_node Hui Su <sh_def@163.com>: mm/vmscan: fix comments for isolate_lru_page() Subsystem: mm/z3fold Hui Su <sh_def@163.com>: mm/z3fold.c: use xx_zalloc instead xx_alloc and memset Subsystem: mm/zbud Xiang Chen <chenxiang66@hisilicon.com>: mm/zbud: remove redundant initialization Subsystem: mm/compaction Mateusz Nosek <mateusznosek0@gmail.com>: mm/compaction.c: micro-optimization remove unnecessary branch include/linux/compaction.h: clean code by removing unused enum value John Hubbard <jhubbard@nvidia.com>: selftests/vm: 8x compaction_test speedup Subsystem: mm/mempolicy Wei Yang <richard.weiyang@linux.alibaba.com>: mm/mempolicy: remove or narrow the lock on current mm: remove unused alloc_page_vma_node() Subsystem: mm/mempool Miaohe Lin <linmiaohe@huawei.com>: mm/mempool: add 'else' to split mutually exclusive case Subsystem: mm/memblock Mike Rapoport <rppt@linux.ibm.com>: Patch series "memblock: seasonal cleaning^w cleanup", v3: KVM: PPC: Book3S HV: 
simplify kvm_cma_reserve() dma-contiguous: simplify cma_early_percent_memory() arm, xtensa: simplify initialization of high memory pages arm64: numa: simplify dummy_numa_init() h8300, nds32, openrisc: simplify detection of memory extents riscv: drop unneeded node initialization mircoblaze: drop unneeded NUMA and sparsemem initializations memblock: make for_each_memblock_type() iterator private memblock: make memblock_debug and related functionality private memblock: reduce number of parameters in for_each_mem_range() arch, mm: replace for_each_memblock() with for_each_mem_pfn_range() arch, drivers: replace for_each_membock() with for_each_mem_range() x86/setup: simplify initrd relocation and reservation x86/setup: simplify reserve_crashkernel() memblock: remove unused memblock_mem_size() memblock: implement for_each_reserved_mem_region() using __next_mem_region() memblock: use separate iterators for memory and reserved regions Subsystem: mm/oom-kill Suren Baghdasaryan <surenb@google.com>: mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary Subsystem: mm/migration Ralph Campbell <rcampbell@nvidia.com>: mm/migrate: remove cpages-- in migrate_vma_finalize() mm/migrate: remove obsolete comment about device public .clang-format | 7 Documentation/admin-guide/cgroup-v2.rst | 69 + Documentation/admin-guide/mm/hugetlbpage.rst | 2 Documentation/dev-tools/kasan.rst | 74 + Documentation/dev-tools/kmemleak.rst | 2 Documentation/kbuild/makefiles.rst | 20 Documentation/vm/active_mm.rst | 2 Documentation/x86/x86_64/boot-options.rst | 4 MAINTAINERS | 2 Makefile | 9 arch/arm/Kconfig | 2 arch/arm/include/asm/tlb.h | 1 arch/arm/kernel/setup.c | 18 arch/arm/mm/init.c | 59 - arch/arm/mm/mmu.c | 39 arch/arm/mm/pmsa-v7.c | 23 arch/arm/mm/pmsa-v8.c | 17 arch/arm/xen/mm.c | 7 arch/arm64/Kconfig | 2 arch/arm64/kernel/machine_kexec_file.c | 6 arch/arm64/kernel/setup.c | 4 arch/arm64/kernel/vdso/Makefile | 7 arch/arm64/mm/init.c | 11 arch/arm64/mm/kasan_init.c | 10 
arch/arm64/mm/mmu.c | 11 arch/arm64/mm/numa.c | 15 arch/c6x/kernel/setup.c | 9 arch/h8300/kernel/setup.c | 8 arch/microblaze/mm/init.c | 23 arch/mips/cavium-octeon/dma-octeon.c | 14 arch/mips/kernel/setup.c | 31 arch/mips/netlogic/xlp/setup.c | 2 arch/nds32/kernel/setup.c | 8 arch/openrisc/kernel/setup.c | 9 arch/openrisc/mm/init.c | 8 arch/powerpc/kernel/fadump.c | 61 - arch/powerpc/kexec/file_load_64.c | 16 arch/powerpc/kvm/book3s_hv_builtin.c | 12 arch/powerpc/kvm/book3s_hv_uvmem.c | 14 arch/powerpc/mm/book3s64/hash_utils.c | 16 arch/powerpc/mm/book3s64/radix_pgtable.c | 10 arch/powerpc/mm/kasan/kasan_init_32.c | 8 arch/powerpc/mm/mem.c | 31 arch/powerpc/mm/numa.c | 7 arch/powerpc/mm/pgtable_32.c | 8 arch/riscv/mm/init.c | 36 arch/riscv/mm/kasan_init.c | 10 arch/s390/kernel/setup.c | 27 arch/s390/mm/page-states.c | 6 arch/s390/mm/vmem.c | 7 arch/sh/mm/init.c | 9 arch/sparc/mm/init_64.c | 12 arch/x86/include/asm/numa.h | 8 arch/x86/kernel/e820.c | 16 arch/x86/kernel/setup.c | 56 - arch/x86/mm/numa.c | 13 arch/x86/mm/numa_emulation.c | 3 arch/x86/xen/enlighten_pv.c | 2 arch/xtensa/mm/init.c | 55 - drivers/acpi/numa/hmat.c | 76 - drivers/acpi/numa/srat.c | 9 drivers/base/core.c | 2 drivers/bus/mvebu-mbus.c | 12 drivers/dax/Kconfig | 6 drivers/dax/Makefile | 3 drivers/dax/bus.c | 1237 +++++++++++++++++++++++---- drivers/dax/bus.h | 34 drivers/dax/dax-private.h | 74 + drivers/dax/device.c | 164 +-- drivers/dax/hmem.c | 56 - drivers/dax/hmem/Makefile | 8 drivers/dax/hmem/device.c | 100 ++ drivers/dax/hmem/hmem.c | 93 +- drivers/dax/kmem.c | 236 ++--- drivers/dax/pmem/compat.c | 2 drivers/dax/pmem/core.c | 36 drivers/firmware/efi/x86_fake_mem.c | 12 drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 drivers/gpu/drm/nouveau/nouveau_dmem.c | 15 drivers/irqchip/irq-gic-v3-its.c | 2 drivers/nvdimm/badrange.c | 26 drivers/nvdimm/claim.c | 13 drivers/nvdimm/nd.h | 3 drivers/nvdimm/pfn_devs.c | 13 drivers/nvdimm/pmem.c | 27 drivers/nvdimm/region.c | 21 drivers/pci/p2pdma.c | 12 
drivers/virtio/virtio_mem.c | 47 - drivers/xen/unpopulated-alloc.c | 45 fs/fs_parser.c | 2 fs/ntfs/inode.c | 6 fs/ocfs2/alloc.c | 6 fs/ocfs2/localalloc.c | 2 fs/proc/base.c | 3 fs/proc/task_mmu.c | 104 +- fs/xattr.c | 22 include/acpi/acpi_numa.h | 14 include/kunit/test.h | 5 include/linux/acpi.h | 2 include/linux/compaction.h | 3 include/linux/compiler-clang.h | 8 include/linux/compiler-gcc.h | 2 include/linux/compiler.h | 2 include/linux/dax.h | 8 include/linux/export.h | 2 include/linux/fs.h | 4 include/linux/gfp.h | 6 include/linux/huge_mm.h | 3 include/linux/kasan.h | 6 include/linux/memblock.h | 90 + include/linux/memcontrol.h | 13 include/linux/memory_hotplug.h | 23 include/linux/memremap.h | 15 include/linux/mm.h | 36 include/linux/mmap_lock.h | 5 include/linux/mmzone.h | 37 include/linux/numa.h | 11 include/linux/oom.h | 1 include/linux/page-flags.h | 42 include/linux/pagemap.h | 43 include/linux/range.h | 6 include/linux/sched.h | 4 include/linux/sched/coredump.h | 1 include/linux/slab.h | 2 include/linux/swap.h | 10 include/linux/swap_slots.h | 2 kernel/dma/contiguous.c | 11 kernel/fork.c | 25 kernel/resource.c | 11 lib/Kconfig.debug | 9 lib/Kconfig.kasan | 31 lib/Makefile | 5 lib/kunit/test.c | 13 lib/test_free_pages.c | 42 lib/test_hmm.c | 65 - lib/test_kasan.c | 732 ++++++--------- lib/test_kasan_module.c | 111 ++ mm/Kconfig | 4 mm/Makefile | 1 mm/compaction.c | 5 mm/debug.c | 18 mm/dmapool.c | 46 - mm/fadvise.c | 9 mm/filemap.c | 78 - mm/gup.c | 44 mm/gup_benchmark.c | 23 mm/huge_memory.c | 4 mm/hugetlb.c | 100 +- mm/internal.h | 3 mm/kasan/report.c | 34 mm/kmemleak-test.c | 99 -- mm/kmemleak.c | 8 mm/madvise.c | 21 mm/memblock.c | 102 -- mm/memcontrol.c | 262 +++-- mm/memory-failure.c | 5 mm/memory.c | 147 +-- mm/memory_hotplug.c | 10 mm/mempolicy.c | 8 mm/mempool.c | 18 mm/memremap.c | 344 ++++--- mm/migrate.c | 3 mm/mincore.c | 28 mm/mmap.c | 45 mm/oom_kill.c | 2 mm/page_alloc.c | 82 - mm/page_counter.c | 2 mm/page_io.c | 14 mm/page_isolation.c | 
41 mm/shmem.c | 19 mm/slab.c | 4 mm/slab.h | 50 - mm/slub.c | 33 mm/sparse.c | 10 mm/swap.c | 14 mm/swap_slots.c | 3 mm/swap_state.c | 38 mm/swapfile.c | 12 mm/truncate.c | 58 - mm/vmalloc.c | 6 mm/vmscan.c | 5 mm/z3fold.c | 3 mm/zbud.c | 1 samples/Makefile | 1 samples/kmemleak/Makefile | 3 samples/kmemleak/kmemleak-test.c | 99 ++ scripts/decodecode | 29 scripts/spelling.txt | 4 tools/testing/nvdimm/dax-dev.c | 28 tools/testing/nvdimm/test/iomap.c | 2 tools/testing/selftests/vm/Makefile | 17 tools/testing/selftests/vm/compaction_test.c | 11 tools/testing/selftests/vm/gup_benchmark.c | 14 tools/testing/selftests/vm/hmm-tests.c | 4 194 files changed, 4273 insertions(+), 2777 deletions(-)
- most of the rest of mm/

- various other subsystems

156 patches, based on 578a7155c5a1894a789d4ece181abf9d25dc6b0d.

Subsystems affected by this patch series:

  mm/dax mm/debug mm/thp mm/readahead mm/page-poison mm/util
  mm/memory-hotplug mm/zram mm/cleanups misc core-kernel get_maintainer
  MAINTAINERS lib bitops checkpatch binfmt ramfs autofs nilfs rapidio
  panic relay kgdb ubsan romfs fault-injection

Subsystem: mm/dax

    Dan Williams <dan.j.williams@intel.com>:
      device-dax/kmem: fix resource release

Subsystem: mm/debug

    "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
    Patch series "mm/debug_vm_pgtable fixes", v4:
      powerpc/mm: add DEBUG_VM WARN for pmd_clear
      powerpc/mm: move setting pte specific flags to pfn_pte
      mm/debug_vm_pgtable/ppc64: avoid setting top bits in radom value
      mm/debug_vm_pgtables/hugevmap: use the arch helper to identify huge vmap support.
      mm/debug_vm_pgtable/savedwrite: enable savedwrite test with CONFIG_NUMA_BALANCING
      mm/debug_vm_pgtable/THP: mark the pte entry huge before using set_pmd/pud_at
      mm/debug_vm_pgtable/set_pte/pmd/pud: don't use set_*_at to update an existing pte entry
      mm/debug_vm_pgtable/locks: move non page table modifying test together
      mm/debug_vm_pgtable/locks: take correct page table lock
      mm/debug_vm_pgtable/thp: use page table depost/withdraw with THP
      mm/debug_vm_pgtable/pmd_clear: don't use pmd/pud_clear on pte entries
      mm/debug_vm_pgtable/hugetlb: disable hugetlb test on ppc64
      mm/debug_vm_pgtable: avoid none pte in pte_clear_test
      mm/debug_vm_pgtable: avoid doing memory allocation with pgtable_t mapped.
Subsystem: mm/thp "Matthew Wilcox (Oracle)" <willy@infradead.org>: Patch series "Fix read-only THP for non-tmpfs filesystems": XArray: add xa_get_order XArray: add xas_split mm/filemap: fix storing to a THP shadow entry Patch series "Remove assumptions of THP size": mm/filemap: fix page cache removal for arbitrary sized THPs mm/memory: remove page fault assumption of compound page size mm/page_owner: change split_page_owner to take a count "Kirill A. Shutemov" <kirill@shutemov.name>: mm/huge_memory: fix total_mapcount assumption of page size mm/huge_memory: fix split assumption of page size "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm/huge_memory: fix page_trans_huge_mapcount assumption of THP size mm/huge_memory: fix can_split_huge_page assumption of THP size mm/rmap: fix assumptions of THP size mm/truncate: fix truncation for pages of arbitrary size mm/page-writeback: support tail pages in wait_for_stable_page mm/vmscan: allow arbitrary sized pages to be paged out fs: add a filesystem flag for THPs fs: do not update nr_thps for mappings which support THPs Huang Ying <ying.huang@intel.com>: mm: fix a race during THP splitting Subsystem: mm/readahead "Matthew Wilcox (Oracle)" <willy@infradead.org>: Patch series "Readahead patches for 5.9/5.10": mm/readahead: add DEFINE_READAHEAD mm/readahead: make page_cache_ra_unbounded take a readahead_control mm/readahead: make do_page_cache_ra take a readahead_control David Howells <dhowells@redhat.com>: mm/readahead: make ondemand_readahead take a readahead_control mm/readahead: pass readahead_control to force_page_cache_ra "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm/readahead: add page_cache_sync_ra and page_cache_async_ra David Howells <dhowells@redhat.com>: mm/filemap: fold ra_submit into do_sync_mmap_readahead mm/readahead: pass a file_ra_state into force_page_cache_ra Subsystem: mm/page-poison Naoya Horiguchi <naoya.horiguchi@nec.com>: Patch series "HWPOISON: soft offline rework", v7: mm,hwpoison: cleanup 
unused PageHuge() check mm, hwpoison: remove recalculating hpage mm,hwpoison-inject: don't pin for hwpoison_filter Oscar Salvador <osalvador@suse.de>: mm,hwpoison: unexport get_hwpoison_page and make it static mm,hwpoison: refactor madvise_inject_error mm,hwpoison: kill put_hwpoison_page mm,hwpoison: unify THP handling for hard and soft offline mm,hwpoison: rework soft offline for free pages mm,hwpoison: rework soft offline for in-use pages mm,hwpoison: refactor soft_offline_huge_page and __soft_offline_page mm,hwpoison: return 0 if the page is already poisoned in soft-offline Naoya Horiguchi <naoya.horiguchi@nec.com>: mm,hwpoison: introduce MF_MSG_UNSPLIT_THP mm,hwpoison: double-check page count in __get_any_page() Oscar Salvador <osalvador@suse.de>: mm,hwpoison: try to narrow window race for free pages Mateusz Nosek <mateusznosek0@gmail.com>: mm/page_poison.c: replace bool variable with static key Miaohe Lin <linmiaohe@huawei.com>: mm/vmstat.c: use helper macro abs() Subsystem: mm/util Bartosz Golaszewski <bgolaszewski@baylibre.com>: mm/util.c: update the kerneldoc for kstrdup_const() Jann Horn <jannh@google.com>: mm/mmu_notifier: fix mmget() assert in __mmu_interval_notifier_insert Subsystem: mm/memory-hotplug David Hildenbrand <david@redhat.com>: Patch series "mm/memory_hotplug: online_pages()/offline_pages() cleanups", v2: mm/memory_hotplug: inline __offline_pages() into offline_pages() mm/memory_hotplug: enforce section granularity when onlining/offlining mm/memory_hotplug: simplify page offlining mm/page_alloc: simplify __offline_isolated_pages() mm/memory_hotplug: drop nr_isolate_pageblock in offline_pages() mm/page_isolation: simplify return value of start_isolate_page_range() mm/memory_hotplug: simplify page onlining mm/page_alloc: drop stale pageblock comment in memmap_init_zone*() mm: pass migratetype into memmap_init_zone() and move_pfn_range_to_zone() mm/memory_hotplug: mark pageblocks MIGRATE_ISOLATE while onlining memory Patch series "selective 
merging of system ram resources", v4: kernel/resource: make release_mem_region_adjustable() never fail kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED mm/memory_hotplug: guard more declarations by CONFIG_MEMORY_HOTPLUG mm/memory_hotplug: prepare passing flags to add_memory() and friends mm/memory_hotplug: MEMHP_MERGE_RESOURCE to specify merging of System RAM resources virtio-mem: try to merge system ram resources xen/balloon: try to merge system ram resources hv_balloon: try to merge system ram resources kernel/resource: make iomem_resource implicit in release_mem_region_adjustable() Laurent Dufour <ldufour@linux.ibm.com>: mm: don't panic when links can't be created in sysfs David Hildenbrand <david@redhat.com>: Patch series "mm: place pages to the freelist tail when onlining and undoing isolation", v2: mm/page_alloc: convert "report" flag of __free_one_page() to a proper flag mm/page_alloc: place pages to tail in __putback_isolated_page() mm/page_alloc: move pages to tail in move_to_free_list() mm/page_alloc: place pages to tail in __free_pages_core() mm/memory_hotplug: update comment regarding zone shuffling Subsystem: mm/zram Douglas Anderson <dianders@chromium.org>: zram: failing to decompress is WARN_ON worthy Subsystem: mm/cleanups YueHaibing <yuehaibing@huawei.com>: mm/slab.h: remove duplicate include Wei Yang <richard.weiyang@linux.alibaba.com>: mm/page_reporting.c: drop stale list head check in page_reporting_cycle Ira Weiny <ira.weiny@intel.com>: mm/highmem.c: clean up endif comments Yu Zhao <yuzhao@google.com>: mm: use self-explanatory macros rather than "2" Miaohe Lin <linmiaohe@huawei.com>: mm: fix some broken comments Chen Tao <chentao3@hotmail.com>: mm: fix some comments formatting Xiaofei Tan <tanxiaofei@huawei.com>: mm/workingset.c: fix some doc warnings Miaohe Lin <linmiaohe@huawei.com>: mm: use helper function put_write_access() Mike Rapoport <rppt@linux.ibm.com>: include/linux/mmzone.h: remove unused early_pfn_valid() "Matthew 
Wilcox (Oracle)" <willy@infradead.org>: mm: rename page_order() to buddy_order() Subsystem: misc Randy Dunlap <rdunlap@infradead.org>: fs: configfs: delete repeated words in comments Andy Shevchenko <andriy.shevchenko@linux.intel.com>: kernel.h: split out min()/max() et al. helpers Subsystem: core-kernel Liao Pingfang <liao.pingfang@zte.com.cn>: kernel/sys.c: replace do_brk with do_brk_flags in comment of prctl_set_mm_map() Randy Dunlap <rdunlap@infradead.org>: kernel/: fix repeated words in comments kernel: acct.c: fix some kernel-doc nits Subsystem: get_maintainer Joe Perches <joe@perches.com>: get_maintainer: add test for file in VCS Subsystem: MAINTAINERS Joe Perches <joe@perches.com>: get_maintainer: exclude MAINTAINERS file(s) from --git-fallback Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>: MAINTAINERS: jarkko.sakkinen@linux.intel.com -> jarkko@kernel.org Subsystem: lib Randy Dunlap <rdunlap@infradead.org>: lib: bitmap: delete duplicated words lib: libcrc32c: delete duplicated words lib: decompress_bunzip2: delete duplicated words lib: dynamic_queue_limits: delete duplicated words + fix typo lib: earlycpio: delete duplicated words lib: radix-tree: delete duplicated words lib: syscall: delete duplicated words lib: test_sysctl: delete duplicated words lib/mpi/mpi-bit.c: fix spello of "functions" Stephen Boyd <swboyd@chromium.org>: lib/idr.c: document calling context for IDA APIs mustn't use locks lib/idr.c: document that ida_simple_{get,remove}() are deprecated Christophe JAILLET <christophe.jaillet@wanadoo.fr>: lib/scatterlist.c: avoid a double memset Miaohe Lin <linmiaohe@huawei.com>: lib/percpu_counter.c: use helper macro abs() Andy Shevchenko <andriy.shevchenko@linux.intel.com>: include/linux/list.h: add a macro to test if entry is pointing to the head Dan Carpenter <dan.carpenter@oracle.com>: lib/test_hmm.c: fix an error code in dmirror_allocate_chunk() Tobias Jordan <kernel@cdqe.de>: lib/crc32.c: fix trivial typo in preprocessor condition Subsystem: 
bitops Wei Yang <richard.weiyang@linux.alibaba.com>: bitops: simplify get_count_order_long() bitops: use the same mechanism for get_count_order[_long] Subsystem: checkpatch Jerome Forissier <jerome@forissier.org>: checkpatch: add --kconfig-prefix Joe Perches <joe@perches.com>: checkpatch: move repeated word test checkpatch: add test for comma use that should be semicolon Rikard Falkeborn <rikard.falkeborn@gmail.com>: const_structs.checkpatch: add phy_ops Nicolas Boichat <drinkcat@chromium.org>: checkpatch: warn if trace_printk and friends are called Rikard Falkeborn <rikard.falkeborn@gmail.com>: const_structs.checkpatch: add pinctrl_ops and pinmux_ops Joe Perches <joe@perches.com>: checkpatch: warn on self-assignments checkpatch: allow not using -f with files that are in git Dwaipayan Ray <dwaipayanray1@gmail.com>: checkpatch: extend author Signed-off-by check for split From: header Joe Perches <joe@perches.com>: checkpatch: emit a warning on embedded filenames Dwaipayan Ray <dwaipayanray1@gmail.com>: checkpatch: fix multi-statement macro checks for while blocks. Łukasz Stelmach <l.stelmach@samsung.com>: checkpatch: fix false positive on empty block comment lines Dwaipayan Ray <dwaipayanray1@gmail.com>: checkpatch: add new warnings to author signoff checks. 
Subsystem: binfmt Chris Kennelly <ckennelly@google.com>: Patch series "Selecting Load Addresses According to p_align", v3: fs/binfmt_elf: use PT_LOAD p_align values for suitable start address tools/testing/selftests: add self-test for verifying load alignment Jann Horn <jannh@google.com>: Patch series "Fix ELF / FDPIC ELF core dumping, and use mmap_lock properly in there", v5: binfmt_elf_fdpic: stop using dump_emit() on user pointers on !MMU coredump: let dump_emit() bail out on short writes coredump: refactor page range dumping into common helper coredump: rework elf/elf_fdpic vma_dump_size() into common helper binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot mm/gup: take mmap_lock in get_dump_page() mm: remove the now-unnecessary mmget_still_valid() hack Subsystem: ramfs Matthew Wilcox (Oracle) <willy@infradead.org>: ramfs: fix nommu mmap with gaps in the page cache Subsystem: autofs Matthew Wilcox <willy@infradead.org>: autofs: harden ioctl table Subsystem: nilfs Wang Hai <wanghai38@huawei.com>: nilfs2: fix some kernel-doc warnings for nilfs2 Subsystem: rapidio Souptick Joarder <jrdr.linux@gmail.com>: rapidio: fix error handling path Jing Xiangfeng <jingxiangfeng@huawei.com>: rapidio: fix the missed put_device() for rio_mport_add_riodev Subsystem: panic Alexey Kardashevskiy <aik@ozlabs.ru>: panic: dump registers on panic_on_warn Subsystem: relay Sudip Mukherjee <sudipm.mukherjee@gmail.com>: kernel/relay.c: drop unneeded initialization Subsystem: kgdb Ritesh Harjani <riteshh@linux.ibm.com>: scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts command scripts/gdb/tasks: add headers and improve spacing format Subsystem: ubsan Elena Petrova <lenaptr@google.com>: sched.h: drop in_ubsan field when UBSAN is in trap mode George Popescu <georgepope@android.com>: ubsan: introduce CONFIG_UBSAN_LOCAL_BOUNDS for Clang Subsystem: romfs Libing Zhou <libing.zhou@nokia-sbell.com>: ROMFS: support inode blocks calculation Subsystem: fault-injection Albert 
van der Linde <alinde@google.com>: Patch series "add fault injection to user memory access", v3: lib, include/linux: add usercopy failure capability lib, uaccess: add failure injection to usercopy functions .mailmap | 1 Documentation/admin-guide/kernel-parameters.txt | 1 Documentation/core-api/xarray.rst | 14 Documentation/fault-injection/fault-injection.rst | 7 MAINTAINERS | 6 arch/ia64/mm/init.c | 4 arch/powerpc/include/asm/book3s/64/pgtable.h | 29 + arch/powerpc/include/asm/nohash/pgtable.h | 5 arch/powerpc/mm/pgtable.c | 5 arch/powerpc/platforms/powernv/memtrace.c | 2 arch/powerpc/platforms/pseries/hotplug-memory.c | 2 drivers/acpi/acpi_memhotplug.c | 3 drivers/base/memory.c | 3 drivers/base/node.c | 33 +- drivers/block/zram/zram_drv.c | 2 drivers/dax/kmem.c | 50 ++- drivers/hv/hv_balloon.c | 4 drivers/infiniband/core/uverbs_main.c | 3 drivers/rapidio/devices/rio_mport_cdev.c | 18 - drivers/s390/char/sclp_cmd.c | 2 drivers/vfio/pci/vfio_pci.c | 38 +- drivers/virtio/virtio_mem.c | 5 drivers/xen/balloon.c | 4 fs/autofs/dev-ioctl.c | 8 fs/binfmt_elf.c | 267 +++------------- fs/binfmt_elf_fdpic.c | 176 ++-------- fs/configfs/dir.c | 2 fs/configfs/file.c | 2 fs/coredump.c | 238 +++++++++++++- fs/ext4/verity.c | 4 fs/f2fs/verity.c | 4 fs/inode.c | 2 fs/nilfs2/bmap.c | 2 fs/nilfs2/cpfile.c | 6 fs/nilfs2/page.c | 1 fs/nilfs2/sufile.c | 4 fs/proc/task_mmu.c | 18 - fs/ramfs/file-nommu.c | 2 fs/romfs/super.c | 1 fs/userfaultfd.c | 28 - include/linux/bitops.h | 13 include/linux/blkdev.h | 1 include/linux/bvec.h | 6 include/linux/coredump.h | 13 include/linux/fault-inject-usercopy.h | 22 + include/linux/fs.h | 28 - include/linux/idr.h | 13 include/linux/ioport.h | 15 include/linux/jiffies.h | 3 include/linux/kernel.h | 150 --------- include/linux/list.h | 29 + include/linux/memory_hotplug.h | 42 +- include/linux/minmax.h | 153 +++++++++ include/linux/mm.h | 5 include/linux/mmzone.h | 17 - include/linux/node.h | 16 include/linux/nodemask.h | 2 include/linux/page-flags.h | 6 
include/linux/page_owner.h | 6 include/linux/pagemap.h | 111 ++++++ include/linux/sched.h | 2 include/linux/sched/mm.h | 25 - include/linux/uaccess.h | 12 include/linux/vmstat.h | 2 include/linux/xarray.h | 22 + include/ras/ras_event.h | 3 kernel/acct.c | 10 kernel/cgroup/cpuset.c | 2 kernel/dma/direct.c | 2 kernel/fork.c | 4 kernel/futex.c | 2 kernel/irq/timings.c | 2 kernel/jump_label.c | 2 kernel/kcsan/encoding.h | 2 kernel/kexec_core.c | 2 kernel/kexec_file.c | 2 kernel/kthread.c | 2 kernel/livepatch/state.c | 2 kernel/panic.c | 12 kernel/pid_namespace.c | 2 kernel/power/snapshot.c | 2 kernel/range.c | 3 kernel/relay.c | 2 kernel/resource.c | 114 +++++-- kernel/smp.c | 2 kernel/sys.c | 2 kernel/user_namespace.c | 2 lib/Kconfig.debug | 7 lib/Kconfig.ubsan | 14 lib/Makefile | 1 lib/bitmap.c | 2 lib/crc32.c | 2 lib/decompress_bunzip2.c | 2 lib/dynamic_queue_limits.c | 4 lib/earlycpio.c | 2 lib/fault-inject-usercopy.c | 39 ++ lib/find_bit.c | 1 lib/hexdump.c | 1 lib/idr.c | 9 lib/iov_iter.c | 5 lib/libcrc32c.c | 2 lib/math/rational.c | 2 lib/math/reciprocal_div.c | 1 lib/mpi/mpi-bit.c | 2 lib/percpu_counter.c | 2 lib/radix-tree.c | 2 lib/scatterlist.c | 2 lib/strncpy_from_user.c | 3 lib/syscall.c | 2 lib/test_hmm.c | 2 lib/test_sysctl.c | 2 lib/test_xarray.c | 65 ++++ lib/usercopy.c | 5 lib/xarray.c | 208 ++++++++++++ mm/Kconfig | 2 mm/compaction.c | 6 mm/debug_vm_pgtable.c | 267 ++++++++-------- mm/filemap.c | 58 ++- mm/gup.c | 73 ++-- mm/highmem.c | 4 mm/huge_memory.c | 47 +- mm/hwpoison-inject.c | 18 - mm/internal.h | 47 +- mm/khugepaged.c | 2 mm/madvise.c | 52 --- mm/memory-failure.c | 357 ++++++++++------------ mm/memory.c | 7 mm/memory_hotplug.c | 223 +++++-------- mm/memremap.c | 3 mm/migrate.c | 11 mm/mmap.c | 7 mm/mmu_notifier.c | 2 mm/page-writeback.c | 1 mm/page_alloc.c | 289 +++++++++++------ mm/page_isolation.c | 16 mm/page_owner.c | 10 mm/page_poison.c | 20 - mm/page_reporting.c | 4 mm/readahead.c | 174 ++++------ mm/rmap.c | 10 mm/shmem.c | 2 
mm/shuffle.c | 2 mm/slab.c | 2 mm/slab.h | 1 mm/slub.c | 2 mm/sparse.c | 2 mm/swap_state.c | 2 mm/truncate.c | 6 mm/util.c | 3 mm/vmscan.c | 5 mm/vmstat.c | 8 mm/workingset.c | 2 scripts/Makefile.ubsan | 10 scripts/checkpatch.pl | 238 ++++++++++---- scripts/const_structs.checkpatch | 3 scripts/gdb/linux/proc.py | 15 scripts/gdb/linux/tasks.py | 9 scripts/get_maintainer.pl | 9 tools/testing/selftests/exec/.gitignore | 1 tools/testing/selftests/exec/Makefile | 9 tools/testing/selftests/exec/load_address.c | 68 ++++ 161 files changed, 2532 insertions(+), 1864 deletions(-)
And... I forgot to set in-reply-to :( Shall resend, omitting linux-mm.
40 patches, based on 9d9af1007bc08971953ae915d88dc9bb21344b53. Subsystems affected by this patch series: ia64 mm/memcg mm/migration mm/pagemap mm/gup mm/madvise mm/vmalloc misc Subsystem: ia64 Krzysztof Kozlowski <krzk@kernel.org>: ia64: fix build error with !COREDUMP Subsystem: mm/memcg Roman Gushchin <guro@fb.com>: mm, memcg: rework remote charging API to support nesting Patch series "mm: kmem: kernel memory accounting in an interrupt context": mm: kmem: move memcg_kmem_bypass() calls to get_mem/obj_cgroup_from_current() mm: kmem: remove redundant checks from get_obj_cgroup_from_current() mm: kmem: prepare remote memcg charging infra for interrupt contexts mm: kmem: enable kernel memcg accounting from interrupt contexts Subsystem: mm/migration Joonsoo Kim <iamjoonsoo.kim@lge.com>: mm/memory-failure: remove a wrapper for alloc_migration_target() mm/memory_hotplug: remove a wrapper for alloc_migration_target() Miaohe Lin <linmiaohe@huawei.com>: mm/migrate: avoid possible unnecessary process right check in kernel_move_pages() Subsystem: mm/pagemap "Liam R. 
Howlett" <Liam.Howlett@Oracle.com>: mm/mmap: add inline vma_next() for readability of mmap code mm/mmap: add inline munmap_vma_range() for code readability Subsystem: mm/gup Jann Horn <jannh@google.com>: mm/gup_benchmark: take the mmap lock around GUP binfmt_elf: take the mmap lock around find_extend_vma() mm/gup: assert that the mmap lock is held in __get_user_pages() John Hubbard <jhubbard@nvidia.com>: Patch series "selftests/vm: gup_test, hmm-tests, assorted improvements", v2: mm/gup_benchmark: rename to mm/gup_test selftests/vm: use a common gup_test.h selftests/vm: rename run_vmtests --> run_vmtests.sh selftests/vm: minor cleanup: Makefile and gup_test.c selftests/vm: only some gup_test items are really benchmarks selftests/vm: gup_test: introduce the dump_pages() sub-test selftests/vm: run_vmtests.sh: update and clean up gup_test invocation selftests/vm: hmm-tests: remove the libhugetlbfs dependency selftests/vm: 10x speedup for hmm-tests Subsystem: mm/madvise Minchan Kim <minchan@kernel.org>: Patch series "introduce memory hinting API for external process", v9: mm/madvise: pass mm to do_madvise pid: move pidfd_get_pid() to pid.c mm/madvise: introduce process_madvise() syscall: an external memory hinting API Subsystem: mm/vmalloc "Matthew Wilcox (Oracle)" <willy@infradead.org>: Patch series "remove alloc_vm_area", v4: mm: update the documentation for vfree Christoph Hellwig <hch@lst.de>: mm: add a VM_MAP_PUT_PAGES flag for vmap mm: add a vmap_pfn function mm: allow a NULL fn callback in apply_to_page_range zsmalloc: switch from alloc_vm_area to get_vm_area drm/i915: use vmap in shmem_pin_map drm/i915: stop using kmap in i915_gem_object_map drm/i915: use vmap in i915_gem_object_map xen/xenbus: use apply_to_page_range directly in xenbus_map_ring_pv x86/xen: open code alloc_vm_area in arch_gnttab_valloc mm: remove alloc_vm_area Patch series "two small vmalloc cleanups": mm: cleanup the gfp_mask handling in __vmalloc_area_node mm: remove the filename in the top 
of file comment in vmalloc.c Subsystem: misc Tian Tao <tiantao6@hisilicon.com>: mm: remove duplicate include statement in mmu.c Documentation/core-api/pin_user_pages.rst | 8 arch/alpha/kernel/syscalls/syscall.tbl | 1 arch/arm/mm/mmu.c | 1 arch/arm/tools/syscall.tbl | 1 arch/arm64/include/asm/unistd.h | 2 arch/arm64/include/asm/unistd32.h | 2 arch/ia64/kernel/Makefile | 2 arch/ia64/kernel/syscalls/syscall.tbl | 1 arch/m68k/kernel/syscalls/syscall.tbl | 1 arch/microblaze/kernel/syscalls/syscall.tbl | 1 arch/mips/kernel/syscalls/syscall_n32.tbl | 1 arch/mips/kernel/syscalls/syscall_n64.tbl | 1 arch/mips/kernel/syscalls/syscall_o32.tbl | 1 arch/parisc/kernel/syscalls/syscall.tbl | 1 arch/powerpc/kernel/syscalls/syscall.tbl | 1 arch/s390/configs/debug_defconfig | 2 arch/s390/configs/defconfig | 2 arch/s390/kernel/syscalls/syscall.tbl | 1 arch/sh/kernel/syscalls/syscall.tbl | 1 arch/sparc/kernel/syscalls/syscall.tbl | 1 arch/x86/entry/syscalls/syscall_32.tbl | 1 arch/x86/entry/syscalls/syscall_64.tbl | 1 arch/x86/xen/grant-table.c | 27 +- arch/xtensa/kernel/syscalls/syscall.tbl | 1 drivers/gpu/drm/i915/Kconfig | 1 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 136 ++++------ drivers/gpu/drm/i915/gt/shmem_utils.c | 78 +----- drivers/xen/xenbus/xenbus_client.c | 30 +- fs/binfmt_elf.c | 3 fs/buffer.c | 6 fs/io_uring.c | 2 fs/notify/fanotify/fanotify.c | 5 fs/notify/inotify/inotify_fsnotify.c | 5 include/linux/memcontrol.h | 12 include/linux/mm.h | 2 include/linux/pid.h | 1 include/linux/sched/mm.h | 43 +-- include/linux/syscalls.h | 2 include/linux/vmalloc.h | 7 include/uapi/asm-generic/unistd.h | 4 kernel/exit.c | 19 - kernel/pid.c | 19 + kernel/sys_ni.c | 1 mm/Kconfig | 24 + mm/Makefile | 2 mm/gup.c | 2 mm/gup_benchmark.c | 225 ------------------ mm/gup_test.c | 295 +++++++++++++++++++++-- mm/gup_test.h | 40 ++- mm/madvise.c | 125 ++++++++-- mm/memcontrol.c | 83 ++++-- mm/memory-failure.c | 18 - mm/memory.c | 16 - mm/memory_hotplug.c | 46 +-- mm/migrate.c | 71 +++-- 
mm/mmap.c | 74 ++++- mm/nommu.c | 7 mm/percpu.c | 3 mm/slab.h | 3 mm/vmalloc.c | 147 +++++------ mm/zsmalloc.c | 10 tools/testing/selftests/vm/.gitignore | 3 tools/testing/selftests/vm/Makefile | 40 ++- tools/testing/selftests/vm/check_config.sh | 31 ++ tools/testing/selftests/vm/config | 2 tools/testing/selftests/vm/gup_benchmark.c | 143 ----------- tools/testing/selftests/vm/gup_test.c | 260 ++++++++++++++++++-- tools/testing/selftests/vm/hmm-tests.c | 12 tools/testing/selftests/vm/run_vmtests | 334 -------------------------- tools/testing/selftests/vm/run_vmtests.sh | 350 +++++++++++++++++++++++++++- 70 files changed, 1580 insertions(+), 1224 deletions(-)
15 patches, based on 3cea11cd5e3b00d91caf0b4730194039b45c5891. Subsystems affected by this patch series: mm/memremap mm/memcg mm/slab-generic mm/kasan mm/mempolicy signals lib mm/pagecache kthread mm/oom-kill mm/pagemap epoll core-kernel Subsystem: mm/memremap Ralph Campbell <rcampbell@nvidia.com>: mm/mremap_pages: fix static key devmap_managed_key updates Subsystem: mm/memcg Mike Kravetz <mike.kravetz@oracle.com>: hugetlb_cgroup: fix reservation accounting zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>: mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg Roman Gushchin <guro@fb.com>: mm: memcg: link page counters to root if use_hierarchy is false Subsystem: mm/slab-generic Subsystem: mm/kasan Andrey Konovalov <andreyknvl@google.com>: kasan: adopt KUNIT tests to SW_TAGS mode Subsystem: mm/mempolicy Shijie Luo <luoshijie1@huawei.com>: mm: mempolicy: fix potential pte_unmap_unlock pte error Subsystem: signals Oleg Nesterov <oleg@redhat.com>: ptrace: fix task_join_group_stop() for the case when current is traced Subsystem: lib Vasily Gorbik <gor@linux.ibm.com>: lib/crc32test: remove extra local_irq_disable/enable Subsystem: mm/pagecache Jason Yan <yanaijie@huawei.com>: mm/truncate.c: make __invalidate_mapping_pages() static Subsystem: kthread Zqiang <qiang.zhang@windriver.com>: kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled Subsystem: mm/oom-kill Charles Haithcock <chaithco@redhat.com>: mm, oom: keep oom_adj under or at upper limit when printing Subsystem: mm/pagemap Jason Gunthorpe <jgg@nvidia.com>: mm: always have io_remap_pfn_range() set pgprot_decrypted() Subsystem: epoll Soheil Hassas Yeganeh <soheil@google.com>: epoll: check ep_events_available() upon timeout epoll: add a selftest for epoll timeout race Subsystem: core-kernel Lukas Bulwahn <lukas.bulwahn@gmail.com>: kernel/hung_task.c: make type annotations consistent fs/eventpoll.c | 16 + fs/proc/base.c | 2 include/linux/mm.h | 9 
include/linux/pgtable.h | 4 kernel/hung_task.c | 3 kernel/kthread.c | 3 kernel/signal.c | 19 - lib/crc32test.c | 4 lib/test_kasan.c | 149 +++++++--- mm/hugetlb.c | 20 - mm/memcontrol.c | 25 + mm/mempolicy.c | 6 mm/memremap.c | 39 +- mm/truncate.c | 2 tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c | 95 ++++++ 15 files changed, 290 insertions(+), 106 deletions(-)
14 patches, based on 9e6a39eae450b81c8b2c8cbbfbdf8218e9b40c81. Subsystems affected by this patch series: mm/migration mm/vmscan mailmap mm/slub mm/gup kbuild reboot kernel/watchdog mm/memcg mm/hugetlbfs panic ocfs2 Subsystem: mm/migration Zi Yan <ziy@nvidia.com>: mm/compaction: count pages and stop correctly during page isolation mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate Subsystem: mm/vmscan Nicholas Piggin <npiggin@gmail.com>: mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit Subsystem: mailmap Dmitry Baryshkov <dbaryshkov@gmail.com>: mailmap: fix entry for Dmitry Baryshkov/Eremin-Solenikov Subsystem: mm/slub Laurent Dufour <ldufour@linux.ibm.com>: mm/slub: fix panic in slab_alloc_node() Subsystem: mm/gup Jason Gunthorpe <jgg@nvidia.com>: mm/gup: use unpin_user_pages() in __gup_longterm_locked() Subsystem: kbuild Arvind Sankar <nivedita@alum.mit.edu>: compiler.h: fix barrier_data() on clang Subsystem: reboot Matteo Croce <mcroce@microsoft.com>: Patch series "fix parsing of reboot= cmdline", v3: Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint" reboot: fix overflow parsing reboot cpu number Subsystem: kernel/watchdog Santosh Sivaraj <santosh@fossix.org>: kernel/watchdog: fix watchdog_allowed_mask not used warning Subsystem: mm/memcg Muchun Song <songmuchun@bytedance.com>: mm: memcontrol: fix missing wakeup polling thread Subsystem: mm/hugetlbfs Mike Kravetz <mike.kravetz@oracle.com>: hugetlbfs: fix anon huge page migration race Subsystem: panic Christophe Leroy <christophe.leroy@csgroup.eu>: panic: don't dump stack twice on warn Subsystem: ocfs2 Wengang Wang <wen.gang.wang@oracle.com>: ocfs2: initialize ip_next_orphan .mailmap | 5 +- fs/ocfs2/super.c | 1 include/asm-generic/barrier.h | 1 include/linux/compiler-clang.h | 6 -- include/linux/compiler-gcc.h | 19 -------- include/linux/compiler.h | 18 +++++++- include/linux/memcontrol.h | 11 ++++- kernel/panic.c | 3 - kernel/reboot.c | 28 ++++++------ 
kernel/watchdog.c | 4 - mm/compaction.c | 12 +++-- mm/gup.c | 14 ++++-- mm/hugetlb.c | 90 ++--------------------------------------- mm/memory-failure.c | 36 +++++++--------- mm/migrate.c | 46 +++++++++++--------- mm/rmap.c | 5 -- mm/slub.c | 2 mm/vmscan.c | 5 +- 18 files changed, 119 insertions(+), 187 deletions(-)
8 patches, based on a349e4c659609fd20e4beea89e5c4a4038e33a95. Subsystems affected by this patch series: mm/madvise kbuild mm/pagemap mm/readahead mm/memcg mm/userfaultfd vfs-akpm mm/madvise Subsystem: mm/madvise Eric Dumazet <edumazet@google.com>: mm/madvise: fix memory leak from process_madvise Subsystem: kbuild Nick Desaulniers <ndesaulniers@google.com>: compiler-clang: remove version check for BPF Tracing Subsystem: mm/pagemap Dan Williams <dan.j.williams@intel.com>: mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports Subsystem: mm/readahead "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm: fix readahead_page_batch for retry entries Subsystem: mm/memcg Muchun Song <songmuchun@bytedance.com>: mm: memcg/slab: fix root memcg vmstats Subsystem: mm/userfaultfd Gerald Schaefer <gerald.schaefer@linux.ibm.com>: mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault() Subsystem: vfs-akpm Yicong Yang <yangyicong@hisilicon.com>: libfs: fix error cast of negative value in simple_attr_write() Subsystem: mm/madvise "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm: fix madvise WILLNEED performance problem arch/ia64/include/asm/sparsemem.h | 6 ++++++ arch/powerpc/include/asm/mmzone.h | 5 +++++ arch/powerpc/include/asm/sparsemem.h | 5 ++--- arch/powerpc/mm/mem.c | 1 + arch/x86/include/asm/sparsemem.h | 10 ++++++++++ arch/x86/mm/numa.c | 2 ++ drivers/dax/Kconfig | 1 - fs/libfs.c | 6 ++++-- include/linux/compiler-clang.h | 2 ++ include/linux/memory_hotplug.h | 14 -------------- include/linux/numa.h | 30 +++++++++++++++++++++++++++++- include/linux/pagemap.h | 2 ++ mm/huge_memory.c | 9 ++++----- mm/madvise.c | 4 +--- mm/memcontrol.c | 9 +++++++-- mm/memory_hotplug.c | 18 ------------------ 16 files changed, 75 insertions(+), 49 deletions(-)
12 patches, based on 33256ce194110874d4bc90078b577c59f9076c59. Subsystems affected by this patch series: lib coredump mm/memcg mm/zsmalloc mm/swap mailmap mm/selftests mm/pagecache mm/hugetlb mm/pagemap Subsystem: lib Randy Dunlap <rdunlap@infradead.org>: zlib: export S390 symbols for zlib modules Subsystem: coredump Menglong Dong <dong.menglong@zte.com.cn>: coredump: fix core_pattern parse error Subsystem: mm/memcg Roman Gushchin <guro@fb.com>: mm: memcg/slab: fix obj_cgroup_charge() return value handling Yang Shi <shy828301@gmail.com>: mm: list_lru: set shrinker map bit when child nr_items is not zero Subsystem: mm/zsmalloc Minchan Kim <minchan@kernel.org>: mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING Subsystem: mm/swap Qian Cai <qcai@redhat.com>: mm/swapfile: do not sleep with a spin lock held Subsystem: mailmap Uwe Kleine-König <u.kleine-koenig@pengutronix.de>: mailmap: add two more addresses of Uwe Kleine-König Subsystem: mm/selftests Xingxing Su <suxingxing@loongson.cn>: tools/testing/selftests/vm: fix build error Axel Rasmussen <axelrasmussen@google.com>: userfaultfd: selftests: fix SIGSEGV if huge mmap fails Subsystem: mm/pagecache Alex Shi <alex.shi@linux.alibaba.com>: mm/filemap: add static for function __add_to_page_cache_locked Subsystem: mm/hugetlb Mike Kravetz <mike.kravetz@oracle.com>: hugetlb_cgroup: fix offline of hugetlb cgroup with reservations Subsystem: mm/pagemap Liu Zixian <liuzixian4@huawei.com>: mm/mmap.c: fix mmap return value when vma is merged after call_mmap() .mailmap | 2 + arch/arm/configs/omap2plus_defconfig | 1 fs/coredump.c | 3 + include/linux/zsmalloc.h | 1 lib/zlib_dfltcc/dfltcc_inflate.c | 3 + mm/Kconfig | 13 ------- mm/filemap.c | 2 - mm/hugetlb_cgroup.c | 8 +--- mm/list_lru.c | 10 ++--- mm/mmap.c | 26 ++++++-------- mm/slab.h | 40 +++++++++++++--------- mm/swapfile.c | 4 +- mm/zsmalloc.c | 54 ------------------------------- tools/testing/selftests/vm/Makefile | 4 ++ tools/testing/selftests/vm/userfaultfd.c | 25 +++++++++----- 
15 files changed, 75 insertions(+), 121 deletions(-)
8 patches, based on 33dc9614dc208291d0c4bcdeb5d30d481dcd2c4c. Subsystems affected by this patch series: mm/pagecache proc selftests kbuild mm/kasan mm/hugetlb Subsystem: mm/pagecache Andrew Morton <akpm@linux-foundation.org>: revert "mm/filemap: add static for function __add_to_page_cache_locked" Subsystem: proc Miles Chen <miles.chen@mediatek.com>: proc: use untagged_addr() for pagemap_read addresses Subsystem: selftests Arnd Bergmann <arnd@arndb.de>: selftest/fpu: avoid clang warning Subsystem: kbuild Arnd Bergmann <arnd@arndb.de>: kbuild: avoid static_assert for genksyms initramfs: fix clang build failure elfcore: fix building with clang Subsystem: mm/kasan Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>: kasan: fix object remaining in offline per-cpu quarantine Subsystem: mm/hugetlb Gerald Schaefer <gerald.schaefer@linux.ibm.com>: mm/hugetlb: clear compound_nr before freeing gigantic pages fs/proc/task_mmu.c | 8 ++++++-- include/linux/build_bug.h | 5 +++++ include/linux/elfcore.h | 22 ++++++++++++++++++++++ init/initramfs.c | 2 +- kernel/Makefile | 1 - kernel/elfcore.c | 26 -------------------------- lib/Makefile | 3 ++- mm/filemap.c | 2 +- mm/hugetlb.c | 1 + mm/kasan/quarantine.c | 39 +++++++++++++++++++++++++++++++++++++++ 10 files changed, 77 insertions(+), 32 deletions(-)
- a few random little subsystems - almost all of the MM patches which are staged ahead of linux-next material. I'll trickle the post-linux-next work in as the dependents get merged up. 200 patches, based on 2c85ebc57b3e1817b6ce1a6b703928e113a90442. Subsystems affected by this patch series: kthread kbuild ide ntfs ocfs2 arch mm/slab-generic mm/slab mm/slub mm/dax mm/debug mm/pagecache mm/gup mm/swap mm/shmem mm/memcg mm/pagemap mm/mremap mm/hmm mm/vmalloc mm/documentation mm/kasan mm/pagealloc mm/memory-failure mm/hugetlb mm/vmscan mm/z3fold mm/compaction mm/oom-kill mm/migration mm/cma mm/page-poison mm/userfaultfd mm/zswap mm/zsmalloc mm/uaccess mm/zram mm/cleanups Subsystem: kthread Rob Clark <robdclark@chromium.org>: kthread: add kthread_work tracepoints Petr Mladek <pmladek@suse.com>: kthread_worker: document CPU hotplug handling Subsystem: kbuild Petr Vorel <petr.vorel@gmail.com>: uapi: move constants from <linux/kernel.h> to <linux/const.h> Subsystem: ide Sebastian Andrzej Siewior <bigeasy@linutronix.de>: ide/falcon: remove in_interrupt() usage ide: remove BUG_ON(in_interrupt() || irqs_disabled()) from ide_unregister() Subsystem: ntfs Alex Shi <alex.shi@linux.alibaba.com>: fs/ntfs: remove unused varibles fs/ntfs: remove unused variable attr_len Subsystem: ocfs2 Tom Rix <trix@redhat.com>: fs/ocfs2/cluster/tcp.c: remove unneeded break Mauricio Faria de Oliveira <mfo@canonical.com>: ocfs2: ratelimit the 'max lookup times reached' notice Subsystem: arch Colin Ian King <colin.king@canonical.com>: arch/Kconfig: fix spelling mistakes Subsystem: mm/slab-generic Hui Su <sh_def@163.com>: mm/slab_common.c: use list_for_each_entry in dump_unreclaimable_slab() Bartosz Golaszewski <bgolaszewski@baylibre.com>: Patch series "slab: provide and use krealloc_array()", v3: mm: slab: clarify krealloc()'s behavior with __GFP_ZERO mm: slab: provide krealloc_array() ALSA: pcm: use krealloc_array() vhost: vringh: use krealloc_array() pinctrl: use krealloc_array() edac: ghes: use 
krealloc_array() drm: atomic: use krealloc_array() hwtracing: intel: use krealloc_array() dma-buf: use krealloc_array() Vlastimil Babka <vbabka@suse.cz>: mm, slab, slub: clear the slab_cache field when freeing page Subsystem: mm/slab Alexander Popov <alex.popov@linux.com>: mm/slab: rerform init_on_free earlier Subsystem: mm/slub Vlastimil Babka <vbabka@suse.cz>: mm, slub: use kmem_cache_debug_flags() in deactivate_slab() Bharata B Rao <bharata@linux.ibm.com>: mm/slub: let number of online CPUs determine the slub page order Subsystem: mm/dax Dan Williams <dan.j.williams@intel.com>: device-dax/kmem: use struct_size() Subsystem: mm/debug Zhenhua Huang <zhenhuah@codeaurora.org>: mm: fix page_owner initializing issue for arm32 Liam Mark <lmark@codeaurora.org>: mm/page_owner: record timestamp and pid Subsystem: mm/pagecache Kent Overstreet <kent.overstreet@gmail.com>: Patch series "generic_file_buffered_read() improvements", v2: mm/filemap/c: break generic_file_buffered_read up into multiple functions mm/filemap.c: generic_file_buffered_read() now uses find_get_pages_contig Alex Shi <alex.shi@linux.alibaba.com>: mm/truncate: add parameter explanation for invalidate_mapping_pagevec Hailong Liu <carver4lio@163.com>: mm/filemap.c: remove else after a return Subsystem: mm/gup John Hubbard <jhubbard@nvidia.com>: Patch series "selftests/vm: gup_test, hmm-tests, assorted improvements", v3: mm/gup_benchmark: rename to mm/gup_test selftests/vm: use a common gup_test.h selftests/vm: rename run_vmtests --> run_vmtests.sh selftests/vm: minor cleanup: Makefile and gup_test.c selftests/vm: only some gup_test items are really benchmarks selftests/vm: gup_test: introduce the dump_pages() sub-test selftests/vm: run_vmtests.sh: update and clean up gup_test invocation selftests/vm: hmm-tests: remove the libhugetlbfs dependency selftests/vm: 2x speedup for run_vmtests.sh Barry Song <song.bao.hua@hisilicon.com>: mm/gup_test.c: mark gup_test_init as __init function mm/gup_test: GUP_TEST 
depends on DEBUG_FS Jason Gunthorpe <jgg@nvidia.com>: Patch series "Add a seqcount between gup_fast and copy_page_range()", v4: mm/gup: reorganize internal_get_user_pages_fast() mm/gup: prevent gup_fast from racing with COW during fork mm/gup: remove the vma allocation from gup_longterm_locked() mm/gup: combine put_compound_head() and unpin_user_page() Subsystem: mm/swap Ralph Campbell <rcampbell@nvidia.com>: mm: handle zone device pages in release_pages() Miaohe Lin <linmiaohe@huawei.com>: mm/swapfile.c: use helper function swap_count() in add_swap_count_continuation() mm/swap_state: skip meaningless swap cache readahead when ra_info.win == 0 mm/swapfile.c: remove unnecessary out label in __swap_duplicate() mm/swapfile.c: use memset to fill the swap_map with SWAP_HAS_CACHE Jeff Layton <jlayton@kernel.org>: mm: remove pagevec_lookup_range_nr_tag() Subsystem: mm/shmem Hui Su <sh_def@163.com>: mm/shmem.c: make shmem_mapping() inline Randy Dunlap <rdunlap@infradead.org>: tmpfs: fix Documentation nits Subsystem: mm/memcg Johannes Weiner <hannes@cmpxchg.org>: mm: memcontrol: add file_thp, shmem_thp to memory.stat Muchun Song <songmuchun@bytedance.com>: mm: memcontrol: remove unused mod_memcg_obj_state() Miaohe Lin <linmiaohe@huawei.com>: mm: memcontrol: eliminate redundant check in __mem_cgroup_insert_exceeded() Muchun Song <songmuchun@bytedance.com>: mm: memcg/slab: fix return of child memcg objcg for root memcg mm: memcg/slab: fix use after free in obj_cgroup_charge Shakeel Butt <shakeelb@google.com>: mm/rmap: always do TTU_IGNORE_ACCESS Alex Shi <alex.shi@linux.alibaba.com>: mm/memcg: update page struct member in comments Roman Gushchin <guro@fb.com>: mm: memcg: fix obsolete code comments Patch series "mm: memcg: deprecate cgroup v1 non-hierarchical mode", v1: mm: memcg: deprecate the non-hierarchical mode docs: cgroup-v1: reflect the deprecation of the non-hierarchical mode cgroup: remove obsoleted broken_hierarchy and warned_broken_hierarchy Hui Su 
<sh_def@163.com>: mm/page_counter: use page_counter_read in page_counter_set_max Lukas Bulwahn <lukas.bulwahn@gmail.com>: mm: memcg: remove obsolete memcg_has_children() Muchun Song <songmuchun@bytedance.com>: mm: memcg/slab: rename *_lruvec_slab_state to *_lruvec_kmem_state Kaixu Xia <kaixuxia@tencent.com>: mm: memcontrol: sssign boolean values to a bool variable Alex Shi <alex.shi@linux.alibaba.com>: mm/memcg: remove incorrect comment Shakeel Butt <shakeelb@google.com>: Patch series "memcg: add pagetable comsumption to memory.stat", v2: mm: move lruvec stats update functions to vmstat.h mm: memcontrol: account pagetables per node Subsystem: mm/pagemap Dan Williams <dan.j.williams@intel.com>: xen/unpopulated-alloc: consolidate pgmap manipulation Kalesh Singh <kaleshsingh@google.com>: Patch series "Speed up mremap on large regions", v4: kselftests: vm: add mremap tests mm: speedup mremap on 1GB or larger regions arm64: mremap speedup - enable HAVE_MOVE_PUD x86: mremap speedup - Enable HAVE_MOVE_PUD John Hubbard <jhubbard@nvidia.com>: mm: cleanup: remove unused tsk arg from __access_remote_vm Alex Shi <alex.shi@linux.alibaba.com>: mm/mapping_dirty_helpers: enhance the kernel-doc markups mm/page_vma_mapped.c: add colon to fix kernel-doc markups error for check_pte Axel Rasmussen <axelrasmussen@google.com>: mm: mmap_lock: add tracepoints around lock acquisition "Matthew Wilcox (Oracle)" <willy@infradead.org>: sparc: fix handling of page table constructor failure mm: move free_unref_page to mm/internal.h Subsystem: mm/mremap Dmitry Safonov <dima@arista.com>: Patch series "mremap: move_vma() fixes": mm/mremap: account memory on do_munmap() failure mm/mremap: for MREMAP_DONTUNMAP check security_vm_enough_memory_mm() mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio vm_ops: rename .split() callback to .may_split() mremap: check if it's possible to split original vma mm: forbid splitting special mappings Subsystem: mm/hmm Daniel Vetter 
<daniel.vetter@ffwll.ch>: mm: track mmu notifiers in fs_reclaim_acquire/release mm: extract might_alloc() debug check locking/selftests: add testcases for fs_reclaim Subsystem: mm/vmalloc Andrew Morton <akpm@linux-foundation.org>: mm/vmalloc.c:__vmalloc_area_node(): avoid 32-bit overflow "Uladzislau Rezki (Sony)" <urezki@gmail.com>: mm/vmalloc: use free_vm_area() if an allocation fails mm/vmalloc: rework the drain logic Alex Shi <alex.shi@linux.alibaba.com>: mm/vmalloc: add 'align' parameter explanation for pvm_determine_end_from_reverse Baolin Wang <baolin.wang@linux.alibaba.com>: mm/vmalloc.c: remove unnecessary return statement Waiman Long <longman@redhat.com>: mm/vmalloc: Fix unlock order in s_stop() Subsystem: mm/documentation Alex Shi <alex.shi@linux.alibaba.com>: docs/vm: remove unused 3 items explanation for /proc/vmstat Subsystem: mm/kasan Vincenzo Frascino <vincenzo.frascino@arm.com>: mm/vmalloc.c: fix kasan shadow poisoning size Walter Wu <walter-zh.wu@mediatek.com>: Patch series "kasan: add workqueue stack for generic KASAN", v5: workqueue: kasan: record workqueue stack kasan: print workqueue stack lib/test_kasan.c: add workqueue test case kasan: update documentation for generic kasan Marco Elver <elver@google.com>: lkdtm: disable KASAN for rodata.o Subsystem: mm/pagealloc Mike Rapoport <rppt@linux.ibm.com>: Patch series "arch, mm: deprecate DISCONTIGMEM", v2: alpha: switch from DISCONTIGMEM to SPARSEMEM ia64: remove custom __early_pfn_to_nid() ia64: remove 'ifdef CONFIG_ZONE_DMA32' statements ia64: discontig: paging_init(): remove local max_pfn calculation ia64: split virtual map initialization out of paging_init() ia64: forbid using VIRTUAL_MEM_MAP with FLATMEM ia64: make SPARSEMEM default and disable DISCONTIGMEM arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL arm, arm64: move free_unused_memmap() to generic mm arc: use FLATMEM with freeing of unused memory map instead of DISCONTIGMEM m68k/mm: make node data and node setup depend on CONFIG_DISCONTIGMEM 
m68k/mm: enable use of generic memory_model.h for !DISCONTIGMEM m68k: deprecate DISCONTIGMEM Patch series "arch, mm: improve robustness of direct map manipulation", v7: mm: introduce debug_pagealloc_{map,unmap}_pages() helpers PM: hibernate: make direct map manipulations more explicit arch, mm: restore dependency of __kernel_map_pages() on DEBUG_PAGEALLOC arch, mm: make kernel_page_present() always available Vlastimil Babka <vbabka@suse.cz>: Patch series "disable pcplists during memory offline", v3: mm, page_alloc: clean up pageset high and batch update mm, page_alloc: calculate pageset high and batch once per zone mm, page_alloc: remove setup_pageset() mm, page_alloc: simplify pageset_update() mm, page_alloc: cache pageset high and batch in struct zone mm, page_alloc: move draining pcplists to page isolation users mm, page_alloc: disable pcplists during memory offline Miaohe Lin <linmiaohe@huawei.com>: include/linux/page-flags.h: remove unused __[Set|Clear]PagePrivate "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm/page-flags: fix comment mm/page_alloc: add __free_pages() documentation Zou Wei <zou_wei@huawei.com>: mm/page_alloc: mark some symbols with static keyword David Hildenbrand <david@redhat.com>: mm/page_alloc: clear all pages in post_alloc_hook() with init_on_alloc=1 Lin Feng <linf@wangsu.com>: init/main: fix broken buffer_init when DEFERRED_STRUCT_PAGE_INIT set Lorenzo Stoakes <lstoakes@gmail.com>: mm: page_alloc: refactor setup_per_zone_lowmem_reserve() Muchun Song <songmuchun@bytedance.com>: mm/page_alloc: speed up the iteration of max_order Subsystem: mm/memory-failure Oscar Salvador <osalvador@suse.de>: Patch series "HWpoison: further fixes and cleanups", v5: mm,hwpoison: drain pcplists before bailing out for non-buddy zero-refcount page mm,hwpoison: take free pages off the buddy freelists mm,hwpoison: drop unneeded pcplist draining Patch series "HWPoison: Refactor get page interface", v2: mm,hwpoison: refactor get_any_page mm,hwpoison: disable 
pcplists before grabbing a refcount mm,hwpoison: remove drain_all_pages from shake_page mm,memory_failure: always pin the page in madvise_inject_error mm,hwpoison: return -EBUSY when migration fails Subsystem: mm/hugetlb Hui Su <sh_def@163.com>: mm/hugetlb.c: just use put_page_testzero() instead of page_count() Ralph Campbell <rcampbell@nvidia.com>: include/linux/huge_mm.h: remove extern keyword Alex Shi <alex.shi@linux.alibaba.com>: khugepaged: add parameter explanations for kernel-doc markup Liu Xiang <liu.xiang@zlingsmart.com>: mm: hugetlb: fix type of delta parameter and related local variables in gather_surplus_pages() Oscar Salvador <osalvador@suse.de>: mm,hugetlb: remove unneeded initialization Dan Carpenter <dan.carpenter@oracle.com>: hugetlb: fix an error code in hugetlb_reserve_pages() Subsystem: mm/vmscan Johannes Weiner <hannes@cmpxchg.org>: mm: don't wake kswapd prematurely when watermark boosting is disabled Lukas Bulwahn <lukas.bulwahn@gmail.com>: mm/vmscan: drop unneeded assignment in kswapd() "logic.yu" <hymmsx.yu@gmail.com>: mm/vmscan.c: remove the filename in the top of file comment Muchun Song <songmuchun@bytedance.com>: mm/page_isolation: do not isolate the max order page Subsystem: mm/z3fold Vitaly Wool <vitaly.wool@konsulko.com>: Patch series "z3fold: stability / rt fixes": z3fold: simplify freeing slots z3fold: stricter locking and more careful reclaim z3fold: remove preempt disabled sections for RT Subsystem: mm/compaction Yanfei Xu <yanfei.xu@windriver.com>: mm/compaction: rename 'start_pfn' to 'iteration_start_pfn' in compact_zone() Hui Su <sh_def@163.com>: mm/compaction: move compaction_suitable's comment to right place mm/compaction: make defer_compaction and compaction_deferred static Subsystem: mm/oom-kill Hui Su <sh_def@163.com>: mm/oom_kill: change comment and rename is_dump_unreclaim_slabs() Subsystem: mm/migration Long Li <lonuxli.64@gmail.com>: mm/migrate.c: fix comment spelling Ralph Campbell <rcampbell@nvidia.com>: 
mm/migrate.c: optimize migrate_vma_pages() mmu notifier "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm: support THPs in zero_user_segments Yang Shi <shy828301@gmail.com>: Patch series "mm: misc migrate cleanup and improvement", v3: mm: truncate_complete_page() does not exist any more mm: migrate: simplify the logic for handling permanent failure mm: migrate: skip shared exec THP for NUMA balancing mm: migrate: clean up migrate_prep{_local} mm: migrate: return -ENOSYS if THP migration is unsupported Stephen Zhang <starzhangzsd@gmail.com>: mm: migrate: remove unused parameter in migrate_vma_insert_page() Subsystem: mm/cma Lecopzer Chen <lecopzer.chen@mediatek.com>: mm/cma.c: remove redundant cma_mutex lock Charan Teja Reddy <charante@codeaurora.org>: mm: cma: improve pr_debug log in cma_release() Subsystem: mm/page-poison Vlastimil Babka <vbabka@suse.cz>: Patch series "cleanup page poisoning", v3: mm, page_alloc: do not rely on the order of page_poison and init_on_alloc/free parameters mm, page_poison: use static key more efficiently kernel/power: allow hibernation with page_poison sanity checking mm, page_poison: remove CONFIG_PAGE_POISONING_NO_SANITY mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO Subsystem: mm/userfaultfd Lokesh Gidra <lokeshgidra@google.com>: Patch series "Control over userfaultfd kernel-fault handling", v6: userfaultfd: add UFFD_USER_MODE_ONLY userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob Axel Rasmussen <axelrasmussen@google.com>: userfaultfd: selftests: make __{s,u}64 format specifiers portable Peter Xu <peterx@redhat.com>: Patch series "userfaultfd: selftests: Small fixes": userfaultfd/selftests: always dump something in modes userfaultfd/selftests: fix retval check for userfaultfd_open() userfaultfd/selftests: hint the test runner on required privilege Subsystem: mm/zswap Joe Perches <joe@perches.com>: mm/zswap: make struct kernel_param_ops definitions const YueHaibing <yuehaibing@huawei.com>: 
mm/zswap: fix passing zero to 'PTR_ERR' warning Barry Song <song.bao.hua@hisilicon.com>: mm/zswap: move to use crypto_acomp API for hardware acceleration Subsystem: mm/zsmalloc Miaohe Lin <linmiaohe@huawei.com>: mm/zsmalloc.c: rework the list_add code in insert_zspage() Subsystem: mm/uaccess Colin Ian King <colin.king@canonical.com>: mm/process_vm_access: remove redundant initialization of iov_r Subsystem: mm/zram Minchan Kim <minchan@kernel.org>: zram: support page writeback zram: add stat to gather incompressible pages since zram set up Rui Salvaterra <rsalvaterra@gmail.com>: zram: break the strict dependency from lzo Subsystem: mm/cleanups Mauro Carvalho Chehab <mchehab+huawei@kernel.org>: mm: fix kernel-doc markups Joe Perches <joe@perches.com>: Patch series "mm: Convert sysfs sprintf family to sysfs_emit", v2: mm: use sysfs_emit for struct kobject * uses mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening mm:backing-dev: use sysfs_emit in macro defining functions mm: shmem: convert shmem_enabled_show to use sysfs_emit_at mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at "Gustavo A. R. 
Silva" <gustavoars@kernel.org>: mm: fix fall-through warnings for Clang Alexey Dobriyan <adobriyan@gmail.com>: mm: cleanup kstrto*() usage /mmap_lock.h | 107 ++ a/Documentation/admin-guide/blockdev/zram.rst | 6 a/Documentation/admin-guide/cgroup-v1/memcg_test.rst | 8 a/Documentation/admin-guide/cgroup-v1/memory.rst | 42 a/Documentation/admin-guide/cgroup-v2.rst | 11 a/Documentation/admin-guide/mm/transhuge.rst | 15 a/Documentation/admin-guide/sysctl/vm.rst | 15 a/Documentation/core-api/memory-allocation.rst | 4 a/Documentation/core-api/pin_user_pages.rst | 8 a/Documentation/dev-tools/kasan.rst | 5 a/Documentation/filesystems/tmpfs.rst | 8 a/Documentation/vm/memory-model.rst | 3 a/Documentation/vm/page_owner.rst | 12 a/arch/Kconfig | 21 a/arch/alpha/Kconfig | 8 a/arch/alpha/include/asm/mmzone.h | 14 a/arch/alpha/include/asm/page.h | 7 a/arch/alpha/include/asm/pgtable.h | 12 a/arch/alpha/include/asm/sparsemem.h | 18 a/arch/alpha/kernel/setup.c | 1 a/arch/arc/Kconfig | 3 a/arch/arc/include/asm/page.h | 20 a/arch/arc/mm/init.c | 29 a/arch/arm/Kconfig | 12 a/arch/arm/kernel/vdso.c | 9 a/arch/arm/mach-bcm/Kconfig | 1 a/arch/arm/mach-davinci/Kconfig | 1 a/arch/arm/mach-exynos/Kconfig | 1 a/arch/arm/mach-highbank/Kconfig | 1 a/arch/arm/mach-omap2/Kconfig | 1 a/arch/arm/mach-s5pv210/Kconfig | 1 a/arch/arm/mach-tango/Kconfig | 1 a/arch/arm/mm/init.c | 78 - a/arch/arm64/Kconfig | 9 a/arch/arm64/include/asm/cacheflush.h | 1 a/arch/arm64/include/asm/pgtable.h | 1 a/arch/arm64/kernel/vdso.c | 41 a/arch/arm64/mm/init.c | 68 - a/arch/arm64/mm/pageattr.c | 12 a/arch/ia64/Kconfig | 11 a/arch/ia64/include/asm/meminit.h | 2 a/arch/ia64/mm/contig.c | 88 -- a/arch/ia64/mm/discontig.c | 44 - a/arch/ia64/mm/init.c | 14 a/arch/ia64/mm/numa.c | 30 a/arch/m68k/Kconfig.cpu | 31 a/arch/m68k/include/asm/page.h | 2 a/arch/m68k/include/asm/page_mm.h | 7 a/arch/m68k/include/asm/virtconvert.h | 7 a/arch/m68k/mm/init.c | 10 a/arch/mips/vdso/genvdso.c | 4 a/arch/nds32/mm/mm-nds32.c | 6 
a/arch/powerpc/Kconfig | 5 a/arch/riscv/Kconfig | 4 a/arch/riscv/include/asm/pgtable.h | 2 a/arch/riscv/include/asm/set_memory.h | 1 a/arch/riscv/mm/pageattr.c | 31 a/arch/s390/Kconfig | 4 a/arch/s390/configs/debug_defconfig | 2 a/arch/s390/configs/defconfig | 2 a/arch/s390/kernel/vdso.c | 11 a/arch/sparc/Kconfig | 4 a/arch/sparc/mm/init_64.c | 2 a/arch/x86/Kconfig | 5 a/arch/x86/entry/vdso/vma.c | 17 a/arch/x86/include/asm/set_memory.h | 1 a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 a/arch/x86/kernel/tboot.c | 1 a/arch/x86/mm/pat/set_memory.c | 6 a/drivers/base/node.c | 2 a/drivers/block/zram/Kconfig | 42 a/drivers/block/zram/zcomp.c | 2 a/drivers/block/zram/zram_drv.c | 29 a/drivers/block/zram/zram_drv.h | 1 a/drivers/dax/device.c | 4 a/drivers/dax/kmem.c | 2 a/drivers/dma-buf/sync_file.c | 3 a/drivers/edac/ghes_edac.c | 4 a/drivers/firmware/efi/efi.c | 1 a/drivers/gpu/drm/drm_atomic.c | 3 a/drivers/hwtracing/intel_th/msu.c | 2 a/drivers/ide/falconide.c | 2 a/drivers/ide/ide-probe.c | 3 a/drivers/misc/lkdtm/Makefile | 1 a/drivers/pinctrl/pinctrl-utils.c | 2 a/drivers/vhost/vringh.c | 3 a/drivers/virtio/virtio_balloon.c | 6 a/drivers/xen/unpopulated-alloc.c | 14 a/fs/aio.c | 5 a/fs/ntfs/file.c | 5 a/fs/ntfs/inode.c | 2 a/fs/ntfs/logfile.c | 3 a/fs/ocfs2/cluster/tcp.c | 1 a/fs/ocfs2/namei.c | 4 a/fs/proc/kcore.c | 2 a/fs/proc/meminfo.c | 2 a/fs/userfaultfd.c | 20 a/include/linux/cgroup-defs.h | 15 a/include/linux/compaction.h | 12 a/include/linux/fs.h | 2 a/include/linux/gfp.h | 2 a/include/linux/highmem.h | 19 a/include/linux/huge_mm.h | 93 -- a/include/linux/memcontrol.h | 148 --- a/include/linux/migrate.h | 4 a/include/linux/mm.h | 118 +- a/include/linux/mm_types.h | 8 a/include/linux/mmap_lock.h | 94 ++ a/include/linux/mmzone.h | 50 - a/include/linux/page-flags.h | 6 a/include/linux/page_ext.h | 8 a/include/linux/pagevec.h | 3 a/include/linux/poison.h | 4 a/include/linux/rmap.h | 1 a/include/linux/sched/mm.h | 16 a/include/linux/set_memory.h | 5 
a/include/linux/shmem_fs.h | 6 a/include/linux/slab.h | 18 a/include/linux/vmalloc.h | 8 a/include/linux/vmstat.h | 104 ++ a/include/trace/events/sched.h | 84 + a/include/uapi/linux/const.h | 5 a/include/uapi/linux/ethtool.h | 2 a/include/uapi/linux/kernel.h | 9 a/include/uapi/linux/lightnvm.h | 2 a/include/uapi/linux/mroute6.h | 2 a/include/uapi/linux/netfilter/x_tables.h | 2 a/include/uapi/linux/netlink.h | 2 a/include/uapi/linux/sysctl.h | 2 a/include/uapi/linux/userfaultfd.h | 9 a/init/main.c | 6 a/ipc/shm.c | 8 a/kernel/cgroup/cgroup.c | 12 a/kernel/fork.c | 3 a/kernel/kthread.c | 29 a/kernel/power/hibernate.c | 2 a/kernel/power/power.h | 2 a/kernel/power/snapshot.c | 52 + a/kernel/ptrace.c | 2 a/kernel/workqueue.c | 3 a/lib/locking-selftest.c | 47 + a/lib/test_kasan_module.c | 29 a/mm/Kconfig | 25 a/mm/Kconfig.debug | 28 a/mm/Makefile | 4 a/mm/backing-dev.c | 8 a/mm/cma.c | 6 a/mm/compaction.c | 29 a/mm/filemap.c | 823 ++++++++++--------- a/mm/gup.c | 329 ++----- a/mm/gup_benchmark.c | 210 ---- a/mm/gup_test.c | 299 ++++++ a/mm/gup_test.h | 40 a/mm/highmem.c | 52 + a/mm/huge_memory.c | 86 + a/mm/hugetlb.c | 28 a/mm/init-mm.c | 1 a/mm/internal.h | 5 a/mm/kasan/generic.c | 3 a/mm/kasan/report.c | 4 a/mm/khugepaged.c | 58 - a/mm/ksm.c | 50 - a/mm/madvise.c | 14 a/mm/mapping_dirty_helpers.c | 6 a/mm/memblock.c | 80 + a/mm/memcontrol.c | 170 +-- a/mm/memory-failure.c | 322 +++---- a/mm/memory.c | 24 a/mm/memory_hotplug.c | 44 - a/mm/mempolicy.c | 8 a/mm/migrate.c | 183 ++-- a/mm/mm_init.c | 1 a/mm/mmap.c | 22 a/mm/mmap_lock.c | 230 +++++ a/mm/mmu_notifier.c | 7 a/mm/mmzone.c | 14 a/mm/mremap.c | 282 ++++-- a/mm/nommu.c | 8 a/mm/oom_kill.c | 14 a/mm/page_alloc.c | 517 ++++++----- a/mm/page_counter.c | 4 a/mm/page_ext.c | 10 a/mm/page_isolation.c | 18 a/mm/page_owner.c | 17 a/mm/page_poison.c | 56 - a/mm/page_vma_mapped.c | 9 a/mm/process_vm_access.c | 2 a/mm/rmap.c | 9 a/mm/shmem.c | 39 a/mm/slab.c | 10 a/mm/slab.h | 9 a/mm/slab_common.c | 10 a/mm/slob.c | 6 
a/mm/slub.c | 156 +-- a/mm/swap.c | 12 a/mm/swap_state.c | 7 a/mm/swapfile.c | 14 a/mm/truncate.c | 18 a/mm/vmalloc.c | 105 +- a/mm/vmscan.c | 21 a/mm/vmstat.c | 6 a/mm/workingset.c | 8 a/mm/z3fold.c | 215 ++-- a/mm/zsmalloc.c | 11 a/mm/zswap.c | 193 +++- a/sound/core/pcm_lib.c | 4 a/tools/include/linux/poison.h | 6 a/tools/testing/selftests/vm/.gitignore | 4 a/tools/testing/selftests/vm/Makefile | 41 a/tools/testing/selftests/vm/check_config.sh | 31 a/tools/testing/selftests/vm/config | 2 a/tools/testing/selftests/vm/gup_benchmark.c | 143 --- a/tools/testing/selftests/vm/gup_test.c | 258 +++++ a/tools/testing/selftests/vm/hmm-tests.c | 10 a/tools/testing/selftests/vm/mremap_test.c | 344 +++++++ a/tools/testing/selftests/vm/run_vmtests | 51 - a/tools/testing/selftests/vm/userfaultfd.c | 94 -- 217 files changed, 4817 insertions(+), 3369 deletions(-)
On Mon, Dec 14, 2020 at 7:02 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 200 patches, based on 2c85ebc57b3e1817b6ce1a6b703928e113a90442.
I haven't actually processed the patches yet, but I have a question
for Konstantin wrt b4.
All the patches except for _one_ get a nice little green check-mark
next to them when I use 'git am' on this series.
The one that did not was [patch 192/200].
I have no idea why - and it doesn't matter a lot to me, it just stood
out as being different. I'm assuming Andrew has started doing patch
attestation, and that patch failed. But if so, maybe Konstantin wants
to know what went wrong.
Konstantin?
Linus
On Mon, Dec 14, 2020 at 7:25 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> All the patches except for _one_ get a nice little green check-mark
> next to them when I use 'git am' on this series.
>
> The one that did not was [patch 192/200].
>
> I have no idea why
Hmm. It looks like that patch is the only one in the series with the
">From" marker in the commit message, from the silly "clarify that
this isn't the first line in a new message in mbox format".
And "b4 am" has turned the single ">" into two, making the stupid
marker worse, and actually corrupting the end result.
Coincidence? Or cause?
Linus
On Mon, Dec 14, 2020 at 07:30:54PM -0800, Linus Torvalds wrote:
> > All the patches except for _one_ get a nice little green check-mark
> > next to them when I use 'git am' on this series.
> >
> > The one that did not was [patch 192/200].
> >
> > I have no idea why
>
> Hmm. It looks like that patch is the only one in the series with the
> ">From" marker in the commit message, from the silly "clarify that
> this isn't the first line in a new message in mbox format".
>
> And "b4 am" has turned the single ">" into two, making the stupid
> marker worse, and actually corrupting the end result.
It's a bug in b4 that I overlooked. Public-inbox emits mboxrd-formatted
.mbox files, while Python's mailbox.mbox consumes mboxo only. The main
distinction between the two is precisely that mboxrd will convert
">From " into ">>From " in an attempt to avoid corruption during
escape/unescape (it didn't end up fixing the problem 100% and mostly
introduced incompatibilities like this one).
I have a fix in master/stable-0.6.y and I'll release a 0.6.2 before the
end of the week.
Thanks for the report.
-K
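The escaping mismatch described above can be sketched in a few lines of Python. This is only an illustration, not the actual b4 or public-inbox code: the function names are invented for the example, and the "mboxo-style" reader is an assumed naive one that strips a single ">" only from lines that begin with exactly ">From ".

```python
import re

# mboxrd: on write, every line matching ^>*From<space> gains one more '>';
# on read, exactly one '>' is stripped from such lines, so the round trip is lossless.
_MBOXRD_ESCAPE = re.compile(r"^(>*From )", re.MULTILINE)
_MBOXRD_UNESCAPE = re.compile(r"^>(>*From )", re.MULTILINE)

# Assumed naive mboxo-style reader: it only knows about ">From " at line start.
_MBOXO_UNESCAPE = re.compile(r"^>(From )", re.MULTILINE)


def mboxrd_escape(body: str) -> str:
    """Quote a message body the way an mboxrd writer would."""
    return _MBOXRD_ESCAPE.sub(r">\1", body)


def mboxrd_unescape(body: str) -> str:
    """Undo mboxrd quoting by removing one '>' from each quoted line."""
    return _MBOXRD_UNESCAPE.sub(r"\1", body)


def naive_mboxo_unescape(body: str) -> str:
    """Hypothetical mboxo-style reader: leaves '>>From ' lines untouched."""
    return _MBOXO_UNESCAPE.sub(r"\1", body)


if __name__ == "__main__":
    # A commit message that already contains a '>From ' line, as in patch 192/200.
    original = "Intro\n>From the silly marker\nFrom here on\n"
    on_disk = mboxrd_escape(original)
    print(repr(on_disk))
    print(mboxrd_unescape(on_disk) == original)       # lossless round trip
    print(naive_mboxo_unescape(on_disk) == original)  # the '>>From ' line survives
```

With a matching mboxrd reader the body comes back unchanged; with the mboxo-style reader the line that started as ">From " is left as ">>From ", which is the kind of corruption Linus noticed.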
- more MM work: a memcg scalability improvement

19 patches, based on 148842c98a24e508aecb929718818fbf4c2a6ff3.

Subsystems affected by this patch series:

Alex Shi <alex.shi@linux.alibaba.com>:
    Patch series "per memcg lru lock", v21:
      mm/thp: move lru_add_page_tail() to huge_memory.c
      mm/thp: use head for head page in lru_add_page_tail()
      mm/thp: simplify lru_add_page_tail()
      mm/thp: narrow lru locking
      mm/vmscan: remove unnecessary lruvec adding
      mm/rmap: stop store reordering issue on page->mapping

Hugh Dickins <hughd@google.com>:
      mm: page_idle_get_page() does not need lru_lock

Alex Shi <alex.shi@linux.alibaba.com>:
      mm/memcg: add debug checking in lock_page_memcg
      mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn
      mm/lru: move lock into lru_note_cost
      mm/vmscan: remove lruvec reget in move_pages_to_lru
      mm/mlock: remove lru_lock on TestClearPageMlocked
      mm/mlock: remove __munlock_isolate_lru_page()
      mm/lru: introduce TestClearPageLRU()
      mm/compaction: do page isolation first in compaction
      mm/swap.c: serialize memcg changes in pagevec_lru_move_fn
      mm/lru: replace pgdat lru_lock with lruvec lock

Alexander Duyck <alexander.h.duyck@linux.intel.com>:
      mm/lru: introduce relock_page_lruvec()

Hugh Dickins <hughd@google.com>:
      mm/lru: revise the comments of lru_lock

 Documentation/admin-guide/cgroup-v1/memcg_test.rst |   15 -
 Documentation/admin-guide/cgroup-v1/memory.rst     |   23 -
 Documentation/trace/events-kmem.rst                |    2
 Documentation/vm/unevictable-lru.rst               |   22 -
 include/linux/memcontrol.h                         |  110 +++++++
 include/linux/mm_types.h                           |    2
 include/linux/mmzone.h                             |    6
 include/linux/page-flags.h                         |    1
 include/linux/swap.h                               |    4
 mm/compaction.c                                    |   98 ++++---
 mm/filemap.c                                       |    4
 mm/huge_memory.c                                   |  109 ++++---
 mm/memcontrol.c                                    |   84 +++++-
 mm/mlock.c                                         |   93 ++----
 mm/mmzone.c                                        |    1
 mm/page_alloc.c                                    |    1
 mm/page_idle.c                                     |    4
 mm/rmap.c                                          |   12
 mm/swap.c                                          |  292 ++++++++-------------
 mm/vmscan.c                                        |  239 ++++++++---------
 mm/workingset.c                                    |    2
 21 files changed, 644 insertions(+), 480 deletions(-)
On Tue, Dec 15, 2020 at 12:32 PM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> - more MM work: a memcg scalability improvement
>
> 19 patches, based on 148842c98a24e508aecb929718818fbf4c2a6ff3.
I'm not seeing patch 10/19 at all.
And patch 19/19 is corrupted and has an attachment with a '^P'
character in it. I could fix it up, but with the missing patch in the
middle I'm not going to even try. 'b4' is also very unhappy about that
patch 19/19.
I don't know what went wrong, but I'll ignore this send - please
re-send the series at your leisure, ok?
Linus
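A gap like the missing 10/19 can be spotted before anything is applied by scanning the subject lines of the received mails. The following is a minimal sketch, assuming subjects carry a "[patch NN/MM]" tag as they do in this thread; `missing_patches` is a name made up for the example and is not part of b4 or git.

```python
import re
from typing import Iterable, List

# Matches tags such as "[patch 04/19]" anywhere in a subject line.
SERIES_TAG = re.compile(r"\[patch (\d+)/(\d+)\]", re.IGNORECASE)


def missing_patches(subjects: Iterable[str]) -> List[int]:
    """Return the patch numbers of a series that never showed up."""
    seen = set()
    total = 0
    for subject in subjects:
        match = SERIES_TAG.search(subject)
        if match is None:
            continue  # cover letters or unrelated mail
        seen.add(int(match.group(1)))
        total = max(total, int(match.group(2)))
    return [n for n in range(1, total + 1) if n not in seen]


if __name__ == "__main__":
    # Illustrative subjects only: a 19-patch series with number 10 absent.
    received = [f"[patch {n:02d}/19] example subject" for n in range(1, 20) if n != 10]
    print(missing_patches(received))
```

An empty result means every numbered patch arrived; it says nothing about whether an individual patch is intact, so a corrupted 19/19 would still need a separate check.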
On Tue, Dec 15, 2020 at 12:32 PM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> - more MM work: a memcg scalability improvement
>
> 19 patches, based on 148842c98a24e508aecb929718818fbf4c2a6ff3.
With your re-send, I get all patches, but they don't actually apply cleanly.
Is that base correct?
I get
error: patch failed: mm/huge_memory.c:2750
error: mm/huge_memory.c: patch does not apply
Patch failed at 0004 mm/thp: narrow lru locking
for that patch "[patch 04/19] mm/thp: narrow lru locking", and that's
definitely true: the patch fragment has
    @@ -2750,7 +2751,7 @@ int split_huge_page_to_list(struct page
                            __dec_lruvec_page_state(head, NR_FILE_THPS);
                    }

    -               __split_huge_page(page, list, end, flags);
    +               __split_huge_page(page, list, end);
                    ret = 0;
            } else {
                    if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
but that __dec_lruvec_page_state() conversion was done by your
previous commit series.
So I have the feeling that what you mean by "base" isn't really the
base for that series at all.
I will try to apply it on top of my merge of your previous series instead.
Linus
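Why a wrong base produces "patch does not apply" can be shown with a toy checker: a hunk only applies if its pre-image (context lines plus removed lines) is present in the target file. This is a deliberately simplified sketch, not how git implements it; it checks only the exact position named in the hunk header, whereas git can also search at an offset. The helper names `old_helper` and `new_helper` and the file contents are hypothetical stand-ins for the `__dec_lruvec_page_state()` conversion mentioned above.

```python
import re
from typing import List

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def hunk_applies(hunk: str, file_lines: List[str]) -> bool:
    """Check whether a unified-diff hunk's pre-image matches the file exactly."""
    header, *body = hunk.splitlines()
    match = HUNK_HEADER.match(header)
    if match is None:
        raise ValueError("not a unified diff hunk header")
    old_start = int(match.group(1))
    # Pre-image: context lines (' ') and removed lines ('-'), marker stripped.
    pre_image = [line[1:] for line in body if line[:1] in (" ", "-")]
    window = file_lines[old_start - 1 : old_start - 1 + len(pre_image)]
    return window == pre_image


# Hypothetical hunk whose context already contains a change from an earlier series.
HUNK = "\n".join([
    "@@ -3,3 +3,3 @@",
    " \tnew_helper(head);",
    "-\tsplit(page, list, end, flags);",
    "+\tsplit(page, list, end);",
    " \tret = 0;",
])

# Tree with the earlier series merged: the context line is there.
SERIES_BASE = ["a", "b", "\tnew_helper(head);", "\tsplit(page, list, end, flags);", "\tret = 0;"]
# Tree at the stated "based on" commit: the earlier conversion is absent.
STATED_BASE = ["a", "b", "\told_helper(head);", "\tsplit(page, list, end, flags);", "\tret = 0;"]

if __name__ == "__main__":
    print(hunk_applies(HUNK, SERIES_BASE))
    print(hunk_applies(HUNK, STATED_BASE))
```

The line being changed is identical in both trees; the hunk is rejected on the stated base only because a neighbouring context line comes from the earlier series.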
On Tue, Dec 15, 2020 at 2:48 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I will try to apply it on top of my merge of your previous series instead.
Yes, then it applies cleanly. So apparently we just have different
concepts of what really constitutes a "base" for applying your series.
Linus
On Tue, 15 Dec 2020 14:49:24 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Tue, Dec 15, 2020 at 2:48 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I will try to apply it on top of my merge of your previous series instead.
>
> Yes, then it applies cleanly. So apparently we just have different
> concepts of what really constitutes a "base" for applying your series.
>
Oops, sorry, yes, the "based on" thing was wrong because I had two
series in flight simultaneously. I've never tried that before.
- lots of little subsystems - a few post-linux-next MM material. Most of this awaits more merging of other trees. 95 patches, based on 489e9fea66f31086f85d9a18e61e4791d94a56a4. Subsystems affected by this patch series: mm/swap mm/memory-hotplug alpha procfs misc core-kernel bitmap lib lz4 bitops checkpatch nilfs kdump rapidio gcov bfs relay resource ubsan reboot fault-injection lzo apparmor mm/pagemap mm/cleanups mm/gup Subsystem: mm/swap Zhaoyang Huang <huangzhaoyang@gmail.com>: mm: fix a race on nr_swap_pages Subsystem: mm/memory-hotplug Laurent Dufour <ldufour@linux.ibm.com>: mm/memory_hotplug: quieting offline operation Subsystem: alpha Thomas Gleixner <tglx@linutronix.de>: alpha: replace bogus in_interrupt() Subsystem: procfs Randy Dunlap <rdunlap@infradead.org>: procfs: delete duplicated words + other fixes Anand K Mistry <amistry@google.com>: proc: provide details on indirect branch speculation Alexey Dobriyan <adobriyan@gmail.com>: proc: fix lookup in /proc/net subdirectories after setns(2) Hui Su <sh_def@163.com>: fs/proc: make pde_get() return nothing Subsystem: misc Christophe Leroy <christophe.leroy@csgroup.eu>: asm-generic: force inlining of get_order() to work around gcc10 poor decision Andy Shevchenko <andriy.shevchenko@linux.intel.com>: kernel.h: split out mathematical helpers Subsystem: core-kernel Hui Su <sh_def@163.com>: kernel/acct.c: use #elif instead of #end and #elif Subsystem: bitmap Andy Shevchenko <andriy.shevchenko@linux.intel.com>: include/linux/bitmap.h: convert bitmap_empty() / bitmap_full() to return boolean "Ma, Jianpeng" <jianpeng.ma@intel.com>: bitmap: remove unused function declaration Subsystem: lib Geert Uytterhoeven <geert@linux-m68k.org>: lib/test_free_pages.c: add basic progress indicators "Gustavo A. R. 
Silva" <gustavoars@kernel.org>: Patch series "] lib/stackdepot.c: Replace one-element array with flexible-array member": lib/stackdepot.c: replace one-element array with flexible-array member lib/stackdepot.c: use flex_array_size() helper in memcpy() lib/stackdepot.c: use array_size() helper in jhash2() Sebastian Andrzej Siewior <bigeasy@linutronix.de>: lib/test_lockup.c: minimum fix to get it compiled on PREEMPT_RT Andy Shevchenko <andriy.shevchenko@linux.intel.com>: lib/list_kunit: follow new file name convention for KUnit tests lib/linear_ranges_kunit: follow new file name convention for KUnit tests lib/bits_kunit: follow new file name convention for KUnit tests lib/cmdline: fix get_option() for strings starting with hyphen lib/cmdline: allow NULL to be an output for get_option() lib/cmdline_kunit: add a new test suite for cmdline API Jakub Jelinek <jakub@redhat.com>: ilog2: improve ilog2 for constant arguments Nick Desaulniers <ndesaulniers@google.com>: lib/string: remove unnecessary #undefs Daniel Axtens <dja@axtens.net>: Patch series "Fortify strscpy()", v7: lib: string.h: detect intra-object overflow in fortified string functions lkdtm: tests for FORTIFY_SOURCE Francis Laniel <laniel_francis@privacyrequired.com>: string.h: add FORTIFY coverage for strscpy() drivers/misc/lkdtm: add new file in LKDTM to test fortified strscpy drivers/misc/lkdtm/lkdtm.h: correct wrong filenames in comment Alexey Dobriyan <adobriyan@gmail.com>: lib: cleanup kstrto*() usage Subsystem: lz4 Gao Xiang <hsiangkao@redhat.com>: lib/lz4: explicitly support in-place decompression Subsystem: bitops Syed Nayyar Waris <syednwaris@gmail.com>: Patch series "Introduce the for_each_set_clump macro", v12: bitops: introduce the for_each_set_clump macro lib/test_bitmap.c: add for_each_set_clump test cases gpio: thunderx: utilize for_each_set_clump macro gpio: xilinx: utilize generic bitmap_get_value and _set_value Subsystem: checkpatch Dwaipayan Ray <dwaipayanray1@gmail.com>: checkpatch: add new 
exception to repeated word check Aditya Srivastava <yashsri421@gmail.com>: checkpatch: fix false positives in REPEATED_WORD warning Łukasz Stelmach <l.stelmach@samsung.com>: checkpatch: ignore generated CamelCase defines and enum values Joe Perches <joe@perches.com>: checkpatch: prefer static const declarations checkpatch: allow --fix removal of unnecessary break statements Dwaipayan Ray <dwaipayanray1@gmail.com>: checkpatch: extend attributes check to handle more patterns Tom Rix <trix@redhat.com>: checkpatch: add a fixer for missing newline at eof Joe Perches <joe@perches.com>: checkpatch: update __attribute__((section("name"))) quote removal Aditya Srivastava <yashsri421@gmail.com>: checkpatch: add fix option for GERRIT_CHANGE_ID Joe Perches <joe@perches.com>: checkpatch: add __alias and __weak to suggested __attribute__ conversions Dwaipayan Ray <dwaipayanray1@gmail.com>: checkpatch: improve email parsing checkpatch: fix spelling errors and remove repeated word Aditya Srivastava <yashsri421@gmail.com>: checkpatch: avoid COMMIT_LOG_LONG_LINE warning for signature tags Dwaipayan Ray <dwaipayanray1@gmail.com>: checkpatch: fix unescaped left brace Aditya Srivastava <yashsri421@gmail.com>: checkpatch: add fix option for ASSIGNMENT_CONTINUATIONS checkpatch: add fix option for LOGICAL_CONTINUATIONS checkpatch: add fix and improve warning msg for non-standard signature Dwaipayan Ray <dwaipayanray1@gmail.com>: checkpatch: add warning for unnecessary use of %h[xudi] and %hh[xudi] checkpatch: add warning for lines starting with a '#' in commit log checkpatch: fix TYPO_SPELLING check for words with apostrophe Joe Perches <joe@perches.com>: checkpatch: add printk_once and printk_ratelimit to prefer pr_<level> warning Subsystem: nilfs Alex Shi <alex.shi@linux.alibaba.com>: fs/nilfs2: remove some unused macros to tame gcc Subsystem: kdump Alexander Egorenkov <egorenar@linux.ibm.com>: kdump: append uts_namespace.name offset to VMCOREINFO Subsystem: rapidio Sebastian Andrzej 
Siewior <bigeasy@linutronix.de>: rapidio: remove unused rio_get_asm() and rio_get_device() Subsystem: gcov Nick Desaulniers <ndesaulniers@google.com>: gcov: remove support for GCC < 4.9 Alex Shi <alex.shi@linux.alibaba.com>: gcov: fix kernel-doc markup issue Subsystem: bfs Randy Dunlap <rdunlap@infradead.org>: bfs: don't use WARNING: string when it's just info. Subsystem: relay Jani Nikula <jani.nikula@intel.com>: Patch series "relay: cleanup and const callbacks", v2: relay: remove unused buf_mapped and buf_unmapped callbacks relay: require non-NULL callbacks in relay_open() relay: make create_buf_file and remove_buf_file callbacks mandatory relay: allow the use of const callback structs drm/i915: make relay callbacks const ath10k: make relay callbacks const ath11k: make relay callbacks const ath9k: make relay callbacks const blktrace: make relay callbacks const Subsystem: resource Mauro Carvalho Chehab <mchehab+huawei@kernel.org>: kernel/resource.c: fix kernel-doc markups Subsystem: ubsan Kees Cook <keescook@chromium.org>: Patch series "Clean up UBSAN Makefile", v2: ubsan: remove redundant -Wno-maybe-uninitialized ubsan: move cc-option tests into Kconfig ubsan: disable object-size sanitizer under GCC ubsan: disable UBSAN_TRAP for all*config ubsan: enable for all*config builds ubsan: remove UBSAN_MISC in favor of individual options ubsan: expand tests and reporting Dmitry Vyukov <dvyukov@google.com>: kcov: don't instrument with UBSAN Zou Wei <zou_wei@huawei.com>: lib/ubsan.c: mark type_check_kinds with static keyword Subsystem: reboot Matteo Croce <mcroce@microsoft.com>: reboot: refactor and comment the cpu selection code reboot: allow to specify reboot mode via sysfs reboot: remove cf9_safe from allowed types and rename cf9_force Patch series "reboot: sysfs improvements": reboot: allow to override reboot type if quirks are found reboot: hide from sysfs not applicable settings Subsystem: fault-injection Barnabás Pőcze <pobrn@protonmail.com>: fault-injection: handle 
EI_ETYPE_TRUE Subsystem: lzo Jason Yan <yanaijie@huawei.com>: lib/lzo/lzo1x_compress.c: make lzogeneric1x_1_compress() static Subsystem: apparmor Andy Shevchenko <andriy.shevchenko@linux.intel.com>: apparmor: remove duplicate macro list_entry_is_head() Subsystem: mm/pagemap Christoph Hellwig <hch@lst.de>: Patch series "simplify follow_pte a bit": mm: unexport follow_pte_pmd mm: simplify follow_pte{,pmd} Subsystem: mm/cleanups Haitao Shi <shihaitao1@huawei.com>: mm: fix some spelling mistakes in comments Subsystem: mm/gup Jann Horn <jannh@google.com>: mmap locking API: don't check locking if the mm isn't live yet mm/gup: assert that the mmap lock is held in __get_user_pages() Documentation/ABI/testing/sysfs-kernel-reboot | 32 Documentation/admin-guide/kdump/vmcoreinfo.rst | 6 Documentation/dev-tools/ubsan.rst | 1 Documentation/filesystems/proc.rst | 2 MAINTAINERS | 5 arch/alpha/kernel/process.c | 2 arch/powerpc/kernel/vmlinux.lds.S | 4 arch/s390/pci/pci_mmio.c | 4 drivers/gpio/gpio-thunderx.c | 11 drivers/gpio/gpio-xilinx.c | 61 - drivers/gpu/drm/i915/gt/uc/intel_guc_log.c | 2 drivers/misc/lkdtm/Makefile | 1 drivers/misc/lkdtm/bugs.c | 50 + drivers/misc/lkdtm/core.c | 3 drivers/misc/lkdtm/fortify.c | 82 ++ drivers/misc/lkdtm/lkdtm.h | 19 drivers/net/wireless/ath/ath10k/spectral.c | 2 drivers/net/wireless/ath/ath11k/spectral.c | 2 drivers/net/wireless/ath/ath9k/common-spectral.c | 2 drivers/rapidio/rio.c | 81 -- fs/bfs/inode.c | 2 fs/dax.c | 9 fs/exec.c | 8 fs/nfs/callback_proc.c | 5 fs/nilfs2/segment.c | 5 fs/proc/array.c | 28 fs/proc/base.c | 2 fs/proc/generic.c | 24 fs/proc/internal.h | 10 fs/proc/proc_net.c | 20 include/asm-generic/bitops/find.h | 19 include/asm-generic/getorder.h | 2 include/linux/bitmap.h | 67 +- include/linux/bitops.h | 24 include/linux/dcache.h | 1 include/linux/iommu-helper.h | 4 include/linux/kernel.h | 173 ----- include/linux/log2.h | 3 include/linux/math.h | 177 +++++ include/linux/mm.h | 6 include/linux/mm_types.h | 10 
include/linux/mmap_lock.h | 16 include/linux/proc_fs.h | 8 include/linux/rcu_node_tree.h | 2 include/linux/relay.h | 29 include/linux/rio_drv.h | 3 include/linux/string.h | 75 +- include/linux/units.h | 2 kernel/Makefile | 3 kernel/acct.c | 7 kernel/crash_core.c | 1 kernel/fail_function.c | 6 kernel/gcov/gcc_4_7.c | 10 kernel/reboot.c | 308 ++++++++- kernel/relay.c | 111 --- kernel/resource.c | 24 kernel/trace/blktrace.c | 2 lib/Kconfig.debug | 11 lib/Kconfig.ubsan | 154 +++- lib/Makefile | 7 lib/bits_kunit.c | 75 ++ lib/cmdline.c | 20 lib/cmdline_kunit.c | 100 +++ lib/errname.c | 1 lib/error-inject.c | 2 lib/errseq.c | 1 lib/find_bit.c | 17 lib/linear_ranges_kunit.c | 228 +++++++ lib/list-test.c | 748 ----------------------- lib/list_kunit.c | 748 +++++++++++++++++++++++ lib/lz4/lz4_decompress.c | 6 lib/lz4/lz4defs.h | 1 lib/lzo/lzo1x_compress.c | 2 lib/math/div64.c | 4 lib/math/int_pow.c | 2 lib/math/int_sqrt.c | 3 lib/math/reciprocal_div.c | 9 lib/stackdepot.c | 11 lib/string.c | 4 lib/test_bitmap.c | 143 ++++ lib/test_bits.c | 75 -- lib/test_firmware.c | 9 lib/test_free_pages.c | 5 lib/test_kmod.c | 26 lib/test_linear_ranges.c | 228 ------- lib/test_lockup.c | 16 lib/test_ubsan.c | 74 ++ lib/ubsan.c | 2 mm/filemap.c | 2 mm/gup.c | 2 mm/huge_memory.c | 2 mm/khugepaged.c | 2 mm/memblock.c | 2 mm/memory.c | 36 - mm/memory_hotplug.c | 2 mm/migrate.c | 2 mm/page_ext.c | 2 mm/swapfile.c | 11 scripts/Makefile.ubsan | 49 - scripts/checkpatch.pl | 495 +++++++++++---- security/apparmor/apparmorfs.c | 3 tools/testing/selftests/lkdtm/tests.txt | 1 102 files changed, 3022 insertions(+), 1899 deletions(-)
78 patches, based on a409ed156a90093a03fe6a93721ddf4c591eac87. Subsystems affected by this patch series: mm/memcg epoll mm/kasan mm/cleanups epoll Subsystem: mm/memcg Alex Shi <alex.shi@linux.alibaba.com>: Patch series "bail out early for memcg disable": mm/memcg: bail early from swap accounting if memcg disabled mm/memcg: warning on !memcg after readahead page charged Wei Yang <richard.weiyang@gmail.com>: mm/memcg: remove unused definitions Shakeel Butt <shakeelb@google.com>: mm, kvm: account kvm_vcpu_mmap to kmemcg Hui Su <sh_def@163.com>: mm/memcontrol:rewrite mem_cgroup_page_lruvec() Subsystem: epoll Soheil Hassas Yeganeh <soheil@google.com>: Patch series "simplify ep_poll": epoll: check for events when removing a timed out thread from the wait queue epoll: simplify signal handling epoll: pull fatal signal checks into ep_send_events() epoll: move eavail next to the list_empty_careful check epoll: simplify and optimize busy loop logic epoll: pull all code between fetch_events and send_event into the loop epoll: replace gotos with a proper loop epoll: eliminate unnecessary lock for zero timeout Subsystem: mm/kasan Andrey Konovalov <andreyknvl@google.com>: Patch series "kasan: add hardware tag-based mode for arm64", v11: kasan: drop unnecessary GPL text from comment headers kasan: KASAN_VMALLOC depends on KASAN_GENERIC kasan: group vmalloc code kasan: shadow declarations only for software modes kasan: rename (un)poison_shadow to (un)poison_range kasan: rename KASAN_SHADOW_* to KASAN_GRANULE_* kasan: only build init.c for software modes kasan: split out shadow.c from common.c kasan: define KASAN_MEMORY_PER_SHADOW_PAGE kasan: rename report and tags files kasan: don't duplicate config dependencies kasan: hide invalid free check implementation kasan: decode stack frame only with KASAN_STACK_ENABLE kasan, arm64: only init shadow for software modes kasan, arm64: only use kasan_depth for software modes kasan, arm64: move initialization message kasan, arm64: rename 
kasan_init_tags and mark as __init kasan: rename addr_has_shadow to addr_has_metadata kasan: rename print_shadow_for_address to print_memory_metadata kasan: rename SHADOW layout macros to META kasan: separate metadata_fetch_row for each mode kasan: introduce CONFIG_KASAN_HW_TAGS Vincenzo Frascino <vincenzo.frascino@arm.com>: arm64: enable armv8.5-a asm-arch option arm64: mte: add in-kernel MTE helpers arm64: mte: reset the page tag in page->flags arm64: mte: add in-kernel tag fault handler arm64: kasan: allow enabling in-kernel MTE arm64: mte: convert gcr_user into an exclude mask arm64: mte: switch GCR_EL1 in kernel entry and exit kasan, mm: untag page address in free_reserved_area Andrey Konovalov <andreyknvl@google.com>: arm64: kasan: align allocations for HW_TAGS arm64: kasan: add arch layer for memory tagging helpers kasan: define KASAN_GRANULE_SIZE for HW_TAGS kasan, x86, s390: update undef CONFIG_KASAN kasan, arm64: expand CONFIG_KASAN checks kasan, arm64: implement HW_TAGS runtime kasan, arm64: print report from tag fault handler kasan, mm: reset tags when accessing metadata kasan, arm64: enable CONFIG_KASAN_HW_TAGS kasan: add documentation for hardware tag-based mode Vincenzo Frascino <vincenzo.frascino@arm.com>: kselftest/arm64: check GCR_EL1 after context switch Andrey Konovalov <andreyknvl@google.com>: Patch series "kasan: boot parameters for hardware tag-based mode", v4: kasan: simplify quarantine_put call site kasan: rename get_alloc/free_info kasan: introduce set_alloc_info kasan, arm64: unpoison stack only with CONFIG_KASAN_STACK kasan: allow VMAP_STACK for HW_TAGS mode kasan: remove __kasan_unpoison_stack kasan: inline kasan_reset_tag for tag-based modes kasan: inline random_tag for HW_TAGS kasan: open-code kasan_unpoison_slab kasan: inline (un)poison_range and check_invalid_free kasan: add and integrate kasan boot parameters kasan, mm: check kasan_enabled in annotations kasan, mm: rename kasan_poison_kfree kasan: don't round_up too much kasan: 
simplify assign_tag and set_tag calls kasan: clarify comment in __kasan_kfree_large kasan: sanitize objects when metadata doesn't fit kasan, mm: allow cache merging with no metadata kasan: update documentation Subsystem: mm/cleanups Colin Ian King <colin.king@canonical.com>: mm/Kconfig: fix spelling mistake "whats" -> "what's" Subsystem: epoll Willem de Bruijn <willemb@google.com>: Patch series "add epoll_pwait2 syscall", v4: epoll: convert internal api to timespec64 epoll: add syscall epoll_pwait2 epoll: wire up syscall epoll_pwait2 selftests/filesystems: expand epoll with epoll_pwait2 Documentation/dev-tools/kasan.rst | 274 +- arch/Kconfig | 8 arch/alpha/kernel/syscalls/syscall.tbl | 1 arch/arm/tools/syscall.tbl | 1 arch/arm64/Kconfig | 9 arch/arm64/Makefile | 7 arch/arm64/include/asm/assembler.h | 2 arch/arm64/include/asm/cache.h | 3 arch/arm64/include/asm/esr.h | 1 arch/arm64/include/asm/kasan.h | 17 arch/arm64/include/asm/memory.h | 15 arch/arm64/include/asm/mte-def.h | 16 arch/arm64/include/asm/mte-kasan.h | 67 arch/arm64/include/asm/mte.h | 22 arch/arm64/include/asm/processor.h | 2 arch/arm64/include/asm/string.h | 5 arch/arm64/include/asm/uaccess.h | 23 arch/arm64/include/asm/unistd.h | 2 arch/arm64/include/asm/unistd32.h | 2 arch/arm64/kernel/asm-offsets.c | 3 arch/arm64/kernel/cpufeature.c | 3 arch/arm64/kernel/entry.S | 41 arch/arm64/kernel/head.S | 2 arch/arm64/kernel/hibernate.c | 5 arch/arm64/kernel/image-vars.h | 2 arch/arm64/kernel/kaslr.c | 3 arch/arm64/kernel/module.c | 6 arch/arm64/kernel/mte.c | 124 + arch/arm64/kernel/setup.c | 2 arch/arm64/kernel/sleep.S | 2 arch/arm64/kernel/smp.c | 2 arch/arm64/lib/mte.S | 16 arch/arm64/mm/copypage.c | 9 arch/arm64/mm/fault.c | 59 arch/arm64/mm/kasan_init.c | 41 arch/arm64/mm/mteswap.c | 9 arch/arm64/mm/proc.S | 23 arch/arm64/mm/ptdump.c | 6 arch/ia64/kernel/syscalls/syscall.tbl | 1 arch/m68k/kernel/syscalls/syscall.tbl | 1 arch/microblaze/kernel/syscalls/syscall.tbl | 1 
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 arch/mips/kernel/syscalls/syscall_n64.tbl | 1 arch/mips/kernel/syscalls/syscall_o32.tbl | 1 arch/parisc/kernel/syscalls/syscall.tbl | 1 arch/powerpc/kernel/syscalls/syscall.tbl | 1 arch/s390/boot/string.c | 1 arch/s390/kernel/syscalls/syscall.tbl | 1 arch/sh/kernel/syscalls/syscall.tbl | 1 arch/sparc/kernel/syscalls/syscall.tbl | 1 arch/x86/boot/compressed/misc.h | 1 arch/x86/entry/syscalls/syscall_32.tbl | 1 arch/x86/entry/syscalls/syscall_64.tbl | 1 arch/x86/kernel/acpi/wakeup_64.S | 2 arch/x86/kvm/x86.c | 2 arch/xtensa/kernel/syscalls/syscall.tbl | 1 fs/eventpoll.c | 359 ++- include/linux/compat.h | 6 include/linux/kasan-checks.h | 2 include/linux/kasan.h | 423 ++-- include/linux/memcontrol.h | 137 - include/linux/mm.h | 24 include/linux/mmdebug.h | 13 include/linux/moduleloader.h | 3 include/linux/page-flags-layout.h | 2 include/linux/sched.h | 2 include/linux/string.h | 2 include/linux/syscalls.h | 5 include/uapi/asm-generic/unistd.h | 4 init/init_task.c | 2 kernel/fork.c | 4 kernel/sys_ni.c | 2 lib/Kconfig.kasan | 71 lib/test_kasan.c | 2 lib/test_kasan_module.c | 2 mm/Kconfig | 2 mm/kasan/Makefile | 33 mm/kasan/common.c | 1006 ++-------- mm/kasan/generic.c | 72 mm/kasan/generic_report.c | 13 mm/kasan/hw_tags.c | 294 ++ mm/kasan/init.c | 25 mm/kasan/kasan.h | 204 +- mm/kasan/quarantine.c | 35 mm/kasan/report.c | 363 +-- mm/kasan/report_generic.c | 169 + mm/kasan/report_hw_tags.c | 44 mm/kasan/report_sw_tags.c | 22 mm/kasan/shadow.c | 541 +++++ mm/kasan/sw_tags.c | 34 mm/kasan/tags.c | 7 mm/kasan/tags_report.c | 7 mm/memcontrol.c | 53 mm/mempool.c | 4 mm/page_alloc.c | 9 mm/page_poison.c | 2 mm/ptdump.c | 13 mm/slab_common.c | 5 mm/slub.c | 29 scripts/Makefile.lib | 2 tools/testing/selftests/arm64/mte/Makefile | 2 tools/testing/selftests/arm64/mte/check_gcr_el1_cswitch.c | 155 + tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c | 72 virt/kvm/coalesced_mmio.c | 2 virt/kvm/kvm_main.c | 2 105 files changed, 
3268 insertions(+), 1873 deletions(-)
60 patches, based on 8653b778e454a7708847aeafe689bce07aeeb94e. Subsystems affected by this patch series: mm/kasan Subsystem: mm/kasan Andrey Konovalov <andreyknvl@google.com>: Patch series "kasan: add hardware tag-based mode for arm64", v11: kasan: drop unnecessary GPL text from comment headers kasan: KASAN_VMALLOC depends on KASAN_GENERIC kasan: group vmalloc code kasan: shadow declarations only for software modes kasan: rename (un)poison_shadow to (un)poison_range kasan: rename KASAN_SHADOW_* to KASAN_GRANULE_* kasan: only build init.c for software modes kasan: split out shadow.c from common.c kasan: define KASAN_MEMORY_PER_SHADOW_PAGE kasan: rename report and tags files kasan: don't duplicate config dependencies kasan: hide invalid free check implementation kasan: decode stack frame only with KASAN_STACK_ENABLE kasan, arm64: only init shadow for software modes kasan, arm64: only use kasan_depth for software modes kasan, arm64: move initialization message kasan, arm64: rename kasan_init_tags and mark as __init kasan: rename addr_has_shadow to addr_has_metadata kasan: rename print_shadow_for_address to print_memory_metadata kasan: rename SHADOW layout macros to META kasan: separate metadata_fetch_row for each mode kasan: introduce CONFIG_KASAN_HW_TAGS Vincenzo Frascino <vincenzo.frascino@arm.com>: arm64: enable armv8.5-a asm-arch option arm64: mte: add in-kernel MTE helpers arm64: mte: reset the page tag in page->flags arm64: mte: add in-kernel tag fault handler arm64: kasan: allow enabling in-kernel MTE arm64: mte: convert gcr_user into an exclude mask arm64: mte: switch GCR_EL1 in kernel entry and exit kasan, mm: untag page address in free_reserved_area Andrey Konovalov <andreyknvl@google.com>: arm64: kasan: align allocations for HW_TAGS arm64: kasan: add arch layer for memory tagging helpers kasan: define KASAN_GRANULE_SIZE for HW_TAGS kasan, x86, s390: update undef CONFIG_KASAN kasan, arm64: expand CONFIG_KASAN checks kasan, arm64: implement HW_TAGS runtime 
kasan, arm64: print report from tag fault handler kasan, mm: reset tags when accessing metadata kasan, arm64: enable CONFIG_KASAN_HW_TAGS kasan: add documentation for hardware tag-based mode Vincenzo Frascino <vincenzo.frascino@arm.com>: kselftest/arm64: check GCR_EL1 after context switch Andrey Konovalov <andreyknvl@google.com>: Patch series "kasan: boot parameters for hardware tag-based mode", v4: kasan: simplify quarantine_put call site kasan: rename get_alloc/free_info kasan: introduce set_alloc_info kasan, arm64: unpoison stack only with CONFIG_KASAN_STACK kasan: allow VMAP_STACK for HW_TAGS mode kasan: remove __kasan_unpoison_stack kasan: inline kasan_reset_tag for tag-based modes kasan: inline random_tag for HW_TAGS kasan: open-code kasan_unpoison_slab kasan: inline (un)poison_range and check_invalid_free kasan: add and integrate kasan boot parameters kasan, mm: check kasan_enabled in annotations kasan, mm: rename kasan_poison_kfree kasan: don't round_up too much kasan: simplify assign_tag and set_tag calls kasan: clarify comment in __kasan_kfree_large kasan: sanitize objects when metadata doesn't fit kasan, mm: allow cache merging with no metadata kasan: update documentation Documentation/dev-tools/kasan.rst | 274 ++- arch/Kconfig | 8 arch/arm64/Kconfig | 9 arch/arm64/Makefile | 7 arch/arm64/include/asm/assembler.h | 2 arch/arm64/include/asm/cache.h | 3 arch/arm64/include/asm/esr.h | 1 arch/arm64/include/asm/kasan.h | 17 arch/arm64/include/asm/memory.h | 15 arch/arm64/include/asm/mte-def.h | 16 arch/arm64/include/asm/mte-kasan.h | 67 arch/arm64/include/asm/mte.h | 22 arch/arm64/include/asm/processor.h | 2 arch/arm64/include/asm/string.h | 5 arch/arm64/include/asm/uaccess.h | 23 arch/arm64/kernel/asm-offsets.c | 3 arch/arm64/kernel/cpufeature.c | 3 arch/arm64/kernel/entry.S | 41 arch/arm64/kernel/head.S | 2 arch/arm64/kernel/hibernate.c | 5 arch/arm64/kernel/image-vars.h | 2 arch/arm64/kernel/kaslr.c | 3 arch/arm64/kernel/module.c | 6 arch/arm64/kernel/mte.c 
| 124 + arch/arm64/kernel/setup.c | 2 arch/arm64/kernel/sleep.S | 2 arch/arm64/kernel/smp.c | 2 arch/arm64/lib/mte.S | 16 arch/arm64/mm/copypage.c | 9 arch/arm64/mm/fault.c | 59 arch/arm64/mm/kasan_init.c | 41 arch/arm64/mm/mteswap.c | 9 arch/arm64/mm/proc.S | 23 arch/arm64/mm/ptdump.c | 6 arch/s390/boot/string.c | 1 arch/x86/boot/compressed/misc.h | 1 arch/x86/kernel/acpi/wakeup_64.S | 2 include/linux/kasan-checks.h | 2 include/linux/kasan.h | 423 ++++- include/linux/mm.h | 24 include/linux/moduleloader.h | 3 include/linux/page-flags-layout.h | 2 include/linux/sched.h | 2 include/linux/string.h | 2 init/init_task.c | 2 kernel/fork.c | 4 lib/Kconfig.kasan | 71 lib/test_kasan.c | 2 lib/test_kasan_module.c | 2 mm/kasan/Makefile | 33 mm/kasan/common.c | 1006 +++----------- mm/kasan/generic.c | 72 - mm/kasan/generic_report.c | 13 mm/kasan/hw_tags.c | 276 +++ mm/kasan/init.c | 25 mm/kasan/kasan.h | 195 ++ mm/kasan/quarantine.c | 35 mm/kasan/report.c | 363 +---- mm/kasan/report_generic.c | 169 ++ mm/kasan/report_hw_tags.c | 44 mm/kasan/report_sw_tags.c | 22 mm/kasan/shadow.c | 528 +++++++ mm/kasan/sw_tags.c | 34 mm/kasan/tags.c | 7 mm/kasan/tags_report.c | 7 mm/mempool.c | 4 mm/page_alloc.c | 9 mm/page_poison.c | 2 mm/ptdump.c | 13 mm/slab_common.c | 5 mm/slub.c | 29 scripts/Makefile.lib | 2 tools/testing/selftests/arm64/mte/Makefile | 2 tools/testing/selftests/arm64/mte/check_gcr_el1_cswitch.c | 155 ++ 74 files changed, 2869 insertions(+), 1553 deletions(-)
On Tue, Dec 22, 2020 at 11:58 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 60 patches, based on 8653b778e454a7708847aeafe689bce07aeeb94e.
I see that you enabled renaming in the patches. Lovely.
Can you also enable it in the diffstat?
>  74 files changed, 2869 insertions(+), 1553 deletions(-)
With -M in the diffstat, you should have seen
   72 files changed, 2775 insertions(+), 1460 deletions(-)
and if you add "--summary", you'll also see the rename part of the file
create/delete summary:
   rename mm/kasan/{tags_report.c => report_sw_tags.c} (78%)
which is often nice to see in addition to the line stats.
Linus
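The difference between a diffstat with and without rename detection can be reproduced in a scratch repository. What follows is a minimal sketch, not the script used to generate the series: the repository, file contents and commit messages are invented for illustration, and only the two file names are borrowed from the rename quoted in the reply. It compares `--no-renames` against `-M --summary` for a file that is renamed and lightly edited in the same commit.

```shell
#!/bin/sh
# Sketch: effect of rename detection (-M) and --summary on a diffstat.
# Assumption: a throwaway repository with made-up contents, not the kernel tree.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
git config commit.gpgsign false

# Commit a ten-line file under its old name.
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "line $i of the report code"
done > tags_report.c
git add tags_report.c
git commit -q -m "add tags_report.c"

# Rename the file and edit one line, so the rename is less than 100% similar.
git mv tags_report.c report_sw_tags.c
sed 's/^line 5 .*/line 5 was edited after the rename/' report_sw_tags.c > edited.tmp
mv edited.tmp report_sw_tags.c
git commit -q -a -m "rename and edit"

# Without rename detection: one file deleted, one created, every line counted.
no_renames=$(git diff --no-renames --stat HEAD~1 HEAD)

# With -M: a single renamed file, and only the edited line is counted.
# --summary appends the "rename old => new (NN%)" line.
with_renames=$(git diff -M --stat --summary HEAD~1 HEAD)

printf '%s\n\n%s\n' "$no_renames" "$with_renames"
```

With rename detection the ten deleted and ten re-added lines collapse to a single changed line, which is the same kind of shrinkage seen between the two totals quoted in the reply.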
16 patches, based on dea8dcf2a9fa8cc540136a6cd885c3beece16ec3.
Subsystems affected by this patch series:
  mm/selftests mm/hugetlb kbuild checkpatch mm/pagecache mm/mremap
  mm/kasan misc lib mm/slub
Subsystem: mm/selftests
    Harish <harish@linux.ibm.com>:
      selftests/vm: fix building protection keys test
Subsystem: mm/hugetlb
    Mike Kravetz <mike.kravetz@oracle.com>:
      mm/hugetlb: fix deadlock in hugetlb_cow error path
Subsystem: kbuild
    Masahiro Yamada <masahiroy@kernel.org>:
      Revert "kbuild: avoid static_assert for genksyms"
Subsystem: checkpatch
    Joe Perches <joe@perches.com>:
      checkpatch: prefer strscpy to strlcpy
Subsystem: mm/pagecache
    Souptick Joarder <jrdr.linux@gmail.com>:
      mm: add prototype for __add_to_page_cache_locked()
    Baoquan He <bhe@redhat.com>:
      mm: memmap defer init doesn't work as expected
Subsystem: mm/mremap
    Kalesh Singh <kaleshsingh@google.com>:
      mm/mremap.c: fix extent calculation
    Nicholas Piggin <npiggin@gmail.com>:
      mm: generalise COW SMC TLB flushing race comment
Subsystem: mm/kasan
    Walter Wu <walter-zh.wu@mediatek.com>:
      kasan: fix null pointer dereference in kasan_record_aux_stack
Subsystem: misc
    Randy Dunlap <rdunlap@infradead.org>:
      local64.h: make <asm/local64.h> mandatory
    Huang Shijie <sjhuang@iluvatar.ai>:
      sizes.h: add SZ_8G/SZ_16G/SZ_32G macros
    Josh Poimboeuf <jpoimboe@redhat.com>:
      kdev_t: always inline major/minor helper functions
Subsystem: lib
    Huang Shijie <sjhuang@iluvatar.ai>:
      lib/genalloc: fix the overflow when size is too big
    Ilya Leoshkevich <iii@linux.ibm.com>:
      lib/zlib: fix inflating zlib streams on s390
    Randy Dunlap <rdunlap@infradead.org>:
      zlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c
Subsystem: mm/slub
    Roman Gushchin <guro@fb.com>:
      mm: slub: call account_slab_page() after slab page initialization
 arch/alpha/include/asm/local64.h    |  1 -
 arch/arc/include/asm/Kbuild         |  1 -
 arch/arm/include/asm/Kbuild         |  1 -
 arch/arm64/include/asm/Kbuild       |  1 -
 arch/csky/include/asm/Kbuild        |  1 -
 arch/h8300/include/asm/Kbuild       |  1 -
 arch/hexagon/include/asm/Kbuild     |  1 -
 arch/ia64/include/asm/local64.h     |  1 -
 arch/ia64/mm/init.c                 |  4 ++--
 arch/m68k/include/asm/Kbuild        |  1 -
 arch/microblaze/include/asm/Kbuild  |  1 -
 arch/mips/include/asm/Kbuild        |  1 -
 arch/nds32/include/asm/Kbuild       |  1 -
 arch/openrisc/include/asm/Kbuild    |  1 -
 arch/parisc/include/asm/Kbuild      |  1 -
 arch/powerpc/include/asm/Kbuild     |  1 -
 arch/riscv/include/asm/Kbuild       |  1 -
 arch/s390/include/asm/Kbuild        |  1 -
 arch/sh/include/asm/Kbuild          |  1 -
 arch/sparc/include/asm/Kbuild       |  1 -
 arch/x86/include/asm/local64.h      |  1 -
 arch/xtensa/include/asm/Kbuild      |  1 -
 include/asm-generic/Kbuild          |  1 +
 include/linux/build_bug.h           |  5 -----
 include/linux/kdev_t.h              | 22 +++++++++++-----------
 include/linux/mm.h                  | 12 ++++++++++--
 include/linux/sizes.h               |  3 +++
 lib/genalloc.c                      | 25 +++++++++++++------------
 lib/zlib_dfltcc/Makefile            |  2 +-
 lib/zlib_dfltcc/dfltcc.c            |  6 +++++-
 lib/zlib_dfltcc/dfltcc_deflate.c    |  3 +++
 lib/zlib_dfltcc/dfltcc_inflate.c    |  4 ++--
 lib/zlib_dfltcc/dfltcc_syms.c       | 17 -----------------
 mm/hugetlb.c                        | 22 +++++++++++++++++++++-
 mm/kasan/generic.c                  |  2 ++
 mm/memory.c                         |  8 +++++---
 mm/memory_hotplug.c                 |  2 +-
 mm/mremap.c                         |  4 +++-
 mm/page_alloc.c                     |  8 +++++---
 mm/slub.c                           |  5 ++---
 scripts/checkpatch.pl               |  6 ++++++
 tools/testing/selftests/vm/Makefile | 10 +++++-----
 42 files changed, 101 insertions(+), 91 deletions(-)
10 patches, based on e609571b5ffa3528bf85292de1ceaddac342bc1c.
Subsystems affected by this patch series:
  mm/slub mm/pagealloc mm/memcg mm/kasan mm/vmalloc mm/migration
  mm/hugetlb MAINTAINERS mm/memory-failure mm/process_vm_access
Subsystem: mm/slub
    Jann Horn <jannh@google.com>:
      mm, slub: consider rest of partial list if acquire_slab() fails
Subsystem: mm/pagealloc
    Hailong liu <liu.hailong6@zte.com.cn>:
      mm/page_alloc: add a missing mm_page_alloc_zone_locked() tracepoint
Subsystem: mm/memcg
    Hugh Dickins <hughd@google.com>:
      mm/memcontrol: fix warning in mem_cgroup_page_lruvec()
Subsystem: mm/kasan
    Hailong Liu <liu.hailong6@zte.com.cn>:
      arm/kasan: fix the array size of kasan_early_shadow_pte[]
Subsystem: mm/vmalloc
    Miaohe Lin <linmiaohe@huawei.com>:
      mm/vmalloc.c: fix potential memory leak
Subsystem: mm/migration
    Jan Stancek <jstancek@redhat.com>:
      mm: migrate: initialize err in do_migrate_pages
Subsystem: mm/hugetlb
    Miaohe Lin <linmiaohe@huawei.com>:
      mm/hugetlb: fix potential missing huge page size info
Subsystem: MAINTAINERS
    Vlastimil Babka <vbabka@suse.cz>:
      MAINTAINERS: add Vlastimil as slab allocators maintainer
Subsystem: mm/memory-failure
    Oscar Salvador <osalvador@suse.de>:
      mm,hwpoison: fix printing of page flags
Subsystem: mm/process_vm_access
    Andrew Morton <akpm@linux-foundation.org>:
      mm/process_vm_access.c: include compat.h
 MAINTAINERS                |  1 +
 include/linux/kasan.h      |  6 +++++-
 include/linux/memcontrol.h |  2 +-
 mm/hugetlb.c               |  2 +-
 mm/kasan/init.c            |  3 ++-
 mm/memory-failure.c        |  2 +-
 mm/mempolicy.c             |  2 +-
 mm/page_alloc.c            | 31 ++++++++++++++++---------------
 mm/process_vm_access.c     |  1 +
 mm/slub.c                  |  2 +-
 mm/vmalloc.c               |  4 +++-
 11 files changed, 33 insertions(+), 23 deletions(-)
On Tue, Jan 12, 2021 at 3:48 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 10 patches, based on e609571b5ffa3528bf85292de1ceaddac342bc1c.
Whee. I had completely dropped the ball on this - I had built my usual
"akpm" branch with the patches, but then had completely forgotten
about it after doing my basic build tests.
I tend to leave it for a while to see if people send belated ACK/NAK's
for the patches, but that "for a while" is typically "overnight", not
several days.
So if you ever notice that I haven't merged your patch submission, and
you haven't seen me comment on them, feel free to ping me to remind
me.
Because it might just have gotten lost in the shuffle for some random
reason. Admittedly it's rare - I think this is the first time I just
randomly noticed three days later that I'd never done the actual merge
of the patch-series.
Linus
19 patches, based on e1ae4b0be15891faf46d390e9f3dc9bd71a8cae1.
Subsystems affected by this patch series:
  mm/pagealloc mm/memcg mm/kasan ubsan mm/memory-failure mm/highmem
  proc MAINTAINERS
Subsystem: mm/pagealloc
    Mike Rapoport <rppt@linux.ibm.com>:
    Patch series "mm: fix initialization of struct page for holes in memory layout", v3:
      x86/setup: don't remove E820_TYPE_RAM for pfn 0
      mm: fix initialization of struct page for holes in memory layout
Subsystem: mm/memcg
    Roman Gushchin <guro@fb.com>:
      mm: memcg/slab: optimize objcg stock draining
    Shakeel Butt <shakeelb@google.com>:
      mm: memcg: fix memcg file_dirty numa stat
      mm: fix numa stats for thp migration
    Johannes Weiner <hannes@cmpxchg.org>:
      mm: memcontrol: prevent starvation when writing memory.high
Subsystem: mm/kasan
    Lecopzer Chen <lecopzer@gmail.com>:
      kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow
      kasan: fix incorrect arguments passing in kasan_add_zero_shadow
    Andrey Konovalov <andreyknvl@google.com>:
      kasan: fix HW_TAGS boot parameters
      kasan, mm: fix conflicts with init_on_alloc/free
      kasan, mm: fix resetting page_alloc tags for HW_TAGS
Subsystem: ubsan
    Arnd Bergmann <arnd@arndb.de>:
      ubsan: disable unsigned-overflow check for i386
Subsystem: mm/memory-failure
    Dan Williams <dan.j.williams@intel.com>:
      mm: fix page reference leak in soft_offline_page()
Subsystem: mm/highmem
    Thomas Gleixner <tglx@linutronix.de>:
    Patch series "mm/highmem: Fix fallout from generic kmap_local conversions":
      sparc/mm/highmem: flush cache and TLB
      mm/highmem: prepare for overriding set_pte_at()
      mips/mm/highmem: use set_pte() for kmap_local()
      powerpc/mm/highmem: use __set_pte_at() for kmap_local()
Subsystem: proc
    Xiaoming Ni <nixiaoming@huawei.com>:
      proc_sysctl: fix oops caused by incorrect command parameters
Subsystem: MAINTAINERS
    Nathan Chancellor <natechancellor@gmail.com>:
      MAINTAINERS: add a couple more files to the Clang/LLVM section
 Documentation/dev-tools/kasan.rst  | 27 ++---------
 MAINTAINERS                        |  2
 arch/mips/include/asm/highmem.h    |  1
 arch/powerpc/include/asm/highmem.h |  2
 arch/sparc/include/asm/highmem.h   |  9 ++-
 arch/x86/kernel/setup.c            | 20 +++-----
 fs/proc/proc_sysctl.c              |  7 ++-
 lib/Kconfig.ubsan                  |  1
 mm/highmem.c                       |  7 ++-
 mm/kasan/hw_tags.c                 | 77 +++++++++++++--------------------
 mm/kasan/init.c                    | 23 +++++----
 mm/memcontrol.c                    | 11 +---
 mm/memory-failure.c                | 20 ++++++--
 mm/migrate.c                       | 27 ++++++-----
 mm/page_alloc.c                    | 86 ++++++++++++++++++++++---------------
 mm/slub.c                          |  7 +--
 16 files changed, 173 insertions(+), 154 deletions(-)
18 patches, based on 5c279c4cf206e03995e04fd3404fa95ffd243a97.
Subsystems affected by this patch series:
  mm/hugetlb mm/compaction mm/vmalloc gcov mm/shmem mm/memblock mailmap
  mm/pagecache mm/kasan ubsan mm/hugetlb MAINTAINERS
Subsystem: mm/hugetlb
    Muchun Song <songmuchun@bytedance.com>:
      mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
      mm: hugetlb: fix a race between freeing and dissolving the page
      mm: hugetlb: fix a race between isolating and freeing page
      mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
      mm: migrate: do not migrate HugeTLB page whose refcount is one
Subsystem: mm/compaction
    Rokudo Yan <wu-yan@tcl.com>:
      mm, compaction: move high_pfn to the for loop scope
Subsystem: mm/vmalloc
    Rick Edgecombe <rick.p.edgecombe@intel.com>:
      mm/vmalloc: separate put pages and flush VM flags
Subsystem: gcov
    Johannes Berg <johannes.berg@intel.com>:
      init/gcov: allow CONFIG_CONSTRUCTORS on UML to fix module gcov
Subsystem: mm/shmem
    Hugh Dickins <hughd@google.com>:
      mm: thp: fix MADV_REMOVE deadlock on shmem THP
Subsystem: mm/memblock
    Roman Gushchin <guro@fb.com>:
      memblock: do not start bottom-up allocations with kernel_end
Subsystem: mailmap
    Viresh Kumar <viresh.kumar@linaro.org>:
      mailmap: fix name/email for Viresh Kumar
    Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>:
      mailmap: add entries for Manivannan Sadhasivam
Subsystem: mm/pagecache
    Waiman Long <longman@redhat.com>:
      mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
Subsystem: mm/kasan
    Vincenzo Frascino <vincenzo.frascino@arm.com>:
    Patch series "kasan: Fix metadata detection for KASAN_HW_TAGS", v5:
      kasan: add explicit preconditions to kasan_report()
      kasan: make addr_has_metadata() return true for valid addresses
Subsystem: ubsan
    Nathan Chancellor <nathan@kernel.org>:
      ubsan: implement __ubsan_handle_alignment_assumption
Subsystem: mm/hugetlb
    Muchun Song <songmuchun@bytedance.com>:
      mm: hugetlb: fix missing put_page in gather_surplus_pages()
Subsystem: MAINTAINERS
    Nathan Chancellor <nathan@kernel.org>:
      MAINTAINERS/.mailmap: use my @kernel.org address
 .mailmap                |  5 ++++
 MAINTAINERS             |  2 -
 fs/hugetlbfs/inode.c    |  3 +-
 include/linux/hugetlb.h |  2 +
 include/linux/kasan.h   |  7 ++++++
 include/linux/vmalloc.h |  9 +-------
 init/Kconfig            |  1
 init/main.c             |  8 ++++++-
 kernel/gcov/Kconfig     |  2 -
 lib/ubsan.c             | 31 ++++++++++++++++++++++++++++
 lib/ubsan.h             |  6 +++++
 mm/compaction.c         |  3 +-
 mm/filemap.c            |  4 +++
 mm/huge_memory.c        | 37 ++++++++++++++++++++-------------
 mm/hugetlb.c            | 53 ++++++++++++++++++++++++++++++++++++++++++------
 mm/kasan/kasan.h        |  2 -
 mm/memblock.c           | 49 +++++---------------------------------------
 mm/migrate.c            |  6 +++++
 18 files changed, 153 insertions(+), 77 deletions(-)
14 patches, based on e0756cfc7d7cd08c98a53b6009c091a3f6a50be6.
Subsystems affected by this patch series:
  squashfs mm/kasan firmware mm/mremap mm/tmpfs mm/selftests MAINTAINERS
  mm/memcg mm/slub nilfs2
Subsystem: squashfs
    Phillip Lougher <phillip@squashfs.org.uk>:
    Patch series "Squashfs: fix BIO migration regression and add sanity checks":
      squashfs: avoid out of bounds writes in decompressors
      squashfs: add more sanity checks in id lookup
      squashfs: add more sanity checks in inode lookup
      squashfs: add more sanity checks in xattr id lookup
Subsystem: mm/kasan
    Andrey Konovalov <andreyknvl@google.com>:
      kasan: fix stack traces dependency for HW_TAGS
Subsystem: firmware
    Fangrui Song <maskray@google.com>:
      firmware_loader: align .builtin_fw to 8
Subsystem: mm/mremap
    Arnd Bergmann <arnd@arndb.de>:
      mm/mremap: fix BUILD_BUG_ON() error in get_extent
Subsystem: mm/tmpfs
    Seth Forshee <seth.forshee@canonical.com>:
      tmpfs: disallow CONFIG_TMPFS_INODE64 on s390
      tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
Subsystem: mm/selftests
    Rong Chen <rong.a.chen@intel.com>:
      selftests/vm: rename file run_vmtests to run_vmtests.sh
Subsystem: MAINTAINERS
    Andrey Ryabinin <ryabinin.a.a@gmail.com>:
      MAINTAINERS: update Andrey Ryabinin's email address
Subsystem: mm/memcg
    Johannes Weiner <hannes@cmpxchg.org>:
      Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"
Subsystem: mm/slub
    Vlastimil Babka <vbabka@suse.cz>:
      mm, slub: better heuristic for number of cpus when calculating slab order
Subsystem: nilfs2
    Joachim Henke <joachim.henke@t-systems.com>:
      nilfs2: make splice write available again
 .mailmap                          |  1
 Documentation/dev-tools/kasan.rst |  3 -
 MAINTAINERS                       |  2 -
 fs/Kconfig                        |  4 +-
 fs/nilfs2/file.c                  |  1
 fs/squashfs/block.c               |  8 ++++
 fs/squashfs/export.c              | 41 +++++++++++++++++++----
 fs/squashfs/id.c                  | 40 ++++++++++++++++++-----
 fs/squashfs/squashfs_fs_sb.h      |  1
 fs/squashfs/super.c               |  6 +--
 fs/squashfs/xattr.h               | 10 +++++
 fs/squashfs/xattr_id.c            | 66 ++++++++++++++++++++++++++++++++------
 include/asm-generic/vmlinux.lds.h |  2 -
 mm/kasan/hw_tags.c                |  8 +---
 mm/memcontrol.c                   |  5 +-
 mm/mremap.c                       |  5 +-
 mm/slub.c                         | 18 +++++++++-
 17 files changed, 172 insertions(+), 49 deletions(-)
Hah. This series shows a small deficiency in your scripting wrt the diffstat:
On Tue, Feb 9, 2021 at 1:41 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
>  .mailmap                          |  1
...
>  mm/slub.c                         | 18 +++++++++-
>  17 files changed, 172 insertions(+), 49 deletions(-)
It actually has 18 files changed, but one of them is a pure rename (no
change to the content), and apparently your diffstat tool can't handle
that case.
It *should* have ended with
  ...
   mm/slub.c                                        | 18 +++++-
   .../selftests/vm/{run_vmtests => run_vmtests.sh} |  0
   18 files changed, 172 insertions(+), 49 deletions(-)
   rename tools/testing/selftests/vm/{run_vmtests => run_vmtests.sh} (100%)
if you'd done a proper "git diff -M --stat --summary" of the series.
[ Ok, by default git would actually have said
     18 files changed, 171 insertions(+), 48 deletions(-)
  but it looks like you use the patience diff option, which gives that
  extra insertion/deletion line because it generates the diff a bit
  differently ]
Not a big deal, but it made me briefly wonder "why doesn't my diffstat
match yours".
Linus
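The pure-rename case can be illustrated the same way. The sketch below is an assumption-laden toy, not the series itself: it builds a scratch repository with a hypothetical `vm/run_vmtests` script and a `slub.c` stand-in, renames the script without touching its contents, and then runs the `git diff -M --stat --summary` invocation named in the reply. The renamed file is still counted in the "files changed" total even though it contributes zero lines.

```shell
#!/bin/sh
# Sketch: a content-free rename still counts as a changed file in
# "git diff -M --stat --summary". Paths and contents are hypothetical.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
git config commit.gpgsign false

# Base commit: a script plus one ordinary source file.
mkdir vm
printf '#!/bin/sh\necho "running vm tests"\n' > vm/run_vmtests
printf 'int slub;\n' > slub.c
git add .
git commit -q -m "base"

# Second commit: rename the script unchanged, add one line to slub.c.
git mv vm/run_vmtests vm/run_vmtests.sh
printf 'int slub;\nint order;\n' > slub.c
git commit -q -a -m "rename script, touch slub.c"

# The renamed file appears with "| 0" and is included in the file count;
# --summary adds the "rename ... (100%)" line.
series_stat=$(git diff -M --stat --summary HEAD~1 HEAD)
printf '%s\n' "$series_stat"
```

A tool that only counts files with added or removed lines would report one file here, which is the same off-by-one the reply points out (17 versus 18 files).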
6 patches, based on dcc0b49040c70ad827a7f3d58a21b01fdb14e749.
Subsystems affected by this patch series:
  mm/pagemap scripts MAINTAINERS h8300
Subsystem: mm/pagemap
    Mike Rapoport <rppt@linux.ibm.com>:
      m68k: make __pfn_to_phys() and __phys_to_pfn() available for !MMU
Subsystem: scripts
    Rong Chen <rong.a.chen@intel.com>:
      scripts/recordmcount.pl: support big endian for ARCH sh
Subsystem: MAINTAINERS
    Andrey Konovalov <andreyknvl@google.com>:
      MAINTAINERS: update KASAN file list
      MAINTAINERS: update Andrey Konovalov's email address
      MAINTAINERS: add Andrey Konovalov to KASAN reviewers
Subsystem: h8300
    Randy Dunlap <rdunlap@infradead.org>:
      h8300: fix PREEMPTION build, TI_PRE_COUNT undefined
 MAINTAINERS                     | 8 +++++---
 arch/h8300/kernel/asm-offsets.c | 3 +++
 arch/m68k/include/asm/page.h    | 2 +-
 scripts/recordmcount.pl         | 6 +++++-
 4 files changed, 14 insertions(+), 5 deletions(-)
A few small subsystems and some of MM. 173 patches, based on c03c21ba6f4e95e406a1a7b4c34ef334b977c194. Subsystems affected by this patch series: hexagon scripts ntfs ocfs2 vfs mm/slab-generic mm/slab mm/slub mm/debug mm/pagecache mm/swap mm/memcg mm/pagemap mm/mprotect mm/mremap mm/page-reporting mm/vmalloc mm/kasan mm/pagealloc mm/memory-failure mm/hugetlb mm/vmscan mm/z3fold mm/compaction mm/mempolicy mm/oom-kill mm/hugetlbfs mm/migration Subsystem: hexagon Randy Dunlap <rdunlap@infradead.org>: hexagon: remove CONFIG_EXPERIMENTAL from defconfigs Subsystem: scripts tangchunyou <tangchunyou@yulong.com>: scripts/spelling.txt: increase error-prone spell checking zuoqilin <zuoqilin@yulong.com>: scripts/spelling.txt: check for "exeeds" dingsenjie <dingsenjie@yulong.com>: scripts/spelling.txt: add "allocted" and "exeeds" typo Colin Ian King <colin.king@canonical.com>: scripts/spelling.txt: add more spellings to spelling.txt Subsystem: ntfs Randy Dunlap <rdunlap@infradead.org>: ntfs: layout.h: delete duplicated words Rustam Kovhaev <rkovhaev@gmail.com>: ntfs: check for valid standard information attribute Subsystem: ocfs2 Yi Li <yili@winhong.com>: ocfs2: remove redundant conditional before iput guozh <guozh88@chinatelecom.cn>: ocfs2: clean up some definitions which are not used any more Dan Carpenter <dan.carpenter@oracle.com>: ocfs2: fix a use after free on error Jiapeng Chong <jiapeng.chong@linux.alibaba.com>: ocfs2: simplify the calculation of variables Subsystem: vfs Randy Dunlap <rdunlap@infradead.org>: fs: delete repeated words in comments Alexey Dobriyan <adobriyan@gmail.com>: ramfs: support O_TMPFILE Subsystem: mm/slab-generic Jacob Wen <jian.w.wen@oracle.com>: mm, tracing: record slab name for kmem_cache_free() Nikolay Borisov <nborisov@suse.com>: mm/sl?b.c: remove ctor argument from kmem_cache_flags Subsystem: mm/slab Zhiyuan Dai <daizhiyuan@phytium.com.cn>: mm/slab: minor coding style tweaks Subsystem: mm/slub Johannes Berg <johannes.berg@intel.com>: mm/slub: 
disable user tracing for kmemleak caches by default

    Vlastimil Babka <vbabka@suse.cz>:
    Patch series "mm, slab, slub: remove cpu and memory hotplug locks":
      mm, slub: stop freeing kmem_cache_node structures on node offline
      mm, slab, slub: stop taking memory hotplug lock
      mm, slab, slub: stop taking cpu hotplug lock
      mm, slub: splice cpu and page freelists in deactivate_slab()
      mm, slub: remove slub_memcg_sysfs boot param and CONFIG_SLUB_MEMCG_SYSFS_ON

    Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
      mm/slub: minor coding style tweaks

Subsystem: mm/debug

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm/debug: improve memcg debugging

    Anshuman Khandual <anshuman.khandual@arm.com>:
    Patch series "mm/debug_vm_pgtable: Some minor updates", v3:
      mm/debug_vm_pgtable/basic: add validation for dirtiness after write protect
      mm/debug_vm_pgtable/basic: iterate over entire protection_map[]

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/page_owner: use helper function zone_end_pfn() to get end_pfn

Subsystem: mm/pagecache

    Baolin Wang <baolin.wang@linux.alibaba.com>:
      mm/filemap: remove unused parameter and change to void type for replace_page_cache_page()

    Pavel Begunkov <asml.silence@gmail.com>:
      mm/filemap: don't revert iter on -EIOCBQUEUED

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
    Patch series "Refactor generic_file_buffered_read", v5:
      mm/filemap: rename generic_file_buffered_read subfunctions
      mm/filemap: remove dynamically allocated array from filemap_read
      mm/filemap: convert filemap_get_pages to take a pagevec
      mm/filemap: use head pages in generic_file_buffered_read
      mm/filemap: pass a sleep state to put_and_wait_on_page_locked
      mm/filemap: support readpage splitting a page
      mm/filemap: inline __wait_on_page_locked_async into caller
      mm/filemap: don't call ->readpage if IOCB_WAITQ is set
      mm/filemap: change filemap_read_page calling conventions
      mm/filemap: change filemap_create_page calling conventions
      mm/filemap: convert filemap_update_page to return an errno
      mm/filemap: move the iocb checks into filemap_update_page
      mm/filemap: add filemap_range_uptodate
      mm/filemap: split filemap_readahead out of filemap_get_pages
      mm/filemap: restructure filemap_get_pages
      mm/filemap: don't relock the page after calling readpage

    Christoph Hellwig <hch@lst.de>:
      mm/filemap: rename generic_file_buffered_read to filemap_read
      mm/filemap: simplify generic_file_read_iter

    Yang Guo <guoyang2@huawei.com>:
      fs/buffer.c: add checking buffer head stat before clear

    Baolin Wang <baolin.wang@linux.alibaba.com>:
      mm: backing-dev: Remove duplicated macro definition

Subsystem: mm/swap

    Yang Li <abaci-bugfix@linux.alibaba.com>:
      mm/swap_slots.c: remove redundant NULL check

    Stephen Zhang <stephenzhangzsd@gmail.com>:
      mm/swapfile.c: fix debugging information problem

    Georgi Djakov <georgi.djakov@linaro.org>:
      mm/page_io: use pr_alert_ratelimited for swap read/write errors

    Rikard Falkeborn <rikard.falkeborn@gmail.com>:
      mm/swap_state: constify static struct attribute_group

    Yu Zhao <yuzhao@google.com>:
      mm/swap: don't SetPageWorkingset unconditionally during swapin

Subsystem: mm/memcg

    Roman Gushchin <guro@fb.com>:
      mm: memcg/slab: pre-allocate obj_cgroups for slab caches with SLAB_ACCOUNT

    Muchun Song <songmuchun@bytedance.com>:
      mm: memcontrol: optimize per-lruvec stats counter memory usage
    Patch series "Convert all THP vmstat counters to pages", v6:
      mm: memcontrol: fix NR_ANON_THPS accounting in charge moving
      mm: memcontrol: convert NR_ANON_THPS account to pages
      mm: memcontrol: convert NR_FILE_THPS account to pages
      mm: memcontrol: convert NR_SHMEM_THPS account to pages
      mm: memcontrol: convert NR_SHMEM_PMDMAPPED account to pages
      mm: memcontrol: convert NR_FILE_PMDMAPPED account to pages
      mm: memcontrol: make the slab calculation consistent

    Alex Shi <alex.shi@linux.alibaba.com>:
      mm/memcg: revise the using condition of lock_page_lruvec function series
      mm/memcg: remove rcu locking for lock_page_lruvec function series

    Shakeel Butt <shakeelb@google.com>:
      mm: memcg: add swapcache stat for memcg v2

    Roman Gushchin <guro@fb.com>:
      mm: kmem: make __memcg_kmem_(un)charge static

    Feng Tang <feng.tang@intel.com>:
      mm: page_counter: re-layout structure to reduce false sharing

    Yang Li <abaci-bugfix@linux.alibaba.com>:
      mm/memcontrol: remove redundant NULL check

    Muchun Song <songmuchun@bytedance.com>:
      mm: memcontrol: replace the loop with a list_for_each_entry()

    Shakeel Butt <shakeelb@google.com>:
      mm/list_lru.c: remove kvfree_rcu_local()

    Johannes Weiner <hannes@cmpxchg.org>:
      fs: buffer: use raw page_memcg() on locked page

    Muchun Song <songmuchun@bytedance.com>:
      mm: memcontrol: fix swap undercounting in cgroup2
      mm: memcontrol: fix get_active_memcg return value
      mm: memcontrol: fix slub memory accounting

Subsystem: mm/pagemap

    Adrian Huang <ahuang12@lenovo.com>:
      mm/mmap.c: remove unnecessary local variable

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/memory.c: fix potential pte_unmap_unlock pte error
      mm/pgtable-generic.c: simplify the VM_BUG_ON condition in pmdp_huge_clear_flush()
      mm/pgtable-generic.c: optimize the VM_BUG_ON condition in pmdp_huge_clear_flush()
      mm/memory.c: fix potential pte_unmap_unlock pte error

Subsystem: mm/mprotect

    Tianjia Zhang <tianjia.zhang@linux.alibaba.com>:
      mm/mprotect.c: optimize error detection in do_mprotect_pkey()

Subsystem: mm/mremap

    Li Xinhai <lixinhai.lxh@gmail.com>:
      mm: rmap: explicitly reset vma->anon_vma in unlink_anon_vmas()
      mm: mremap: unlink anon_vmas when mremap with MREMAP_DONTUNMAP success

Subsystem: mm/page-reporting

    sh <sh_def@163.com>:
      mm/page_reporting: use list_entry_is_head() in page_reporting_cycle()

Subsystem: mm/vmalloc

    Yang Li <abaci-bugfix@linux.alibaba.com>:
      vmalloc: remove redundant NULL check

Subsystem: mm/kasan

    Andrey Konovalov <andreyknvl@google.com>:
    Patch series "kasan: HW_TAGS tests support and fixes", v4:
      kasan: prefix global functions with kasan_
      kasan: clarify HW_TAGS impact on TBI
      kasan: clean up comments in tests
      kasan: add macros to simplify checking test constraints
      kasan: add match-all tag tests
      kasan, arm64: allow using KUnit tests with HW_TAGS mode
      kasan: rename CONFIG_TEST_KASAN_MODULE
      kasan: add compiler barriers to KUNIT_EXPECT_KASAN_FAIL
      kasan: adapt kmalloc_uaf2 test to HW_TAGS mode
      kasan: fix memory corruption in kasan_bitops_tags test
      kasan: move _RET_IP_ to inline wrappers
      kasan: fix bug detection via ksize for HW_TAGS mode
      kasan: add proper page allocator tests
      kasan: add a test for kmem_cache_alloc/free_bulk
      kasan: don't run tests when KASAN is not enabled

    Walter Wu <walter-zh.wu@mediatek.com>:
      kasan: remove redundant config option

Subsystem: mm/pagealloc

    Baoquan He <bhe@redhat.com>:
    Patch series "mm: clean up names and parameters of memmap_init_xxxx functions", v5:
      mm: fix prototype warning from kernel test robot
      mm: rename memmap_init() and memmap_init_zone()
      mm: simplify parater of function memmap_init_zone()
      mm: simplify parameter of setup_usemap()
      mm: remove unneeded local variable in free_area_init_core

    David Hildenbrand <david@redhat.com>:
    Patch series "mm: simplify free_highmem_page() and free_reserved_page()":
      video: fbdev: acornfb: remove free_unused_pages()
      mm: simplify free_highmem_page() and free_reserved_page()

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm/gfp: add kernel-doc for gfp_t

Subsystem: mm/memory-failure

    Aili Yao <yaoaili@kingsoft.com>:
      mm,hwpoison: send SIGBUS to PF_MCE_EARLY processes on action required events

Subsystem: mm/hugetlb

    Bibo Mao <maobibo@loongson.cn>:
      mm/huge_memory.c: update tlb entry if pmd is changed
      MIPS: do not call flush_tlb_all when setting pmd entry

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/hugetlb: fix potential double free in hugetlb_register_node() error path

    Li Xinhai <lixinhai.lxh@gmail.com>:
      mm/hugetlb.c: fix unnecessary address expansion of pmd sharing

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/hugetlb: avoid unnecessary hugetlb_acct_memory() call
      mm/hugetlb: use helper huge_page_order and pages_per_huge_page
      mm/hugetlb: fix use after free when subpool max_hpages accounting is not enabled

    Jiapeng Zhong <abaci-bugfix@linux.alibaba.com>:
      mm/hugetlb: simplify the calculation of variables

    Joao Martins <joao.m.martins@oracle.com>:
    Patch series "mm/hugetlb: follow_hugetlb_page() improvements", v2:
      mm/hugetlb: grab head page refcount once for group of subpages
      mm/hugetlb: refactor subpage recording

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/hugetlb: fix some comment typos

    Yanfei Xu <yanfei.xu@windriver.com>:
      mm/hugetlb: remove redundant check in preparing and destroying gigantic page

    Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
      mm/hugetlb.c: fix typos in comments

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/huge_memory.c: remove unused return value of set_huge_zero_page()

    "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
      mm/pmem: avoid inserting hugepage PTE entry with fsdax if hugepage support is disabled

    Miaohe Lin <linmiaohe@huawei.com>:
      hugetlb_cgroup: use helper pages_per_huge_page() in hugetlb_cgroup
      mm/hugetlb: use helper function range_in_vma() in page_table_shareable()
      mm/hugetlb: remove unnecessary VM_BUG_ON_PAGE on putback_active_hugepage()
      mm/hugetlb: use helper huge_page_size() to get hugepage size

    Mike Kravetz <mike.kravetz@oracle.com>:
      hugetlb: fix update_and_free_page contig page struct assumption
      hugetlb: fix copy_huge_page_from_user contig page struct assumption

    Chen Wandun <chenwandun@huawei.com>:
      mm/hugetlb: suppress wrong warning info when alloc gigantic page

Subsystem: mm/vmscan

    Alex Shi <alex.shi@linux.alibaba.com>:
      mm/vmscan: __isolate_lru_page_prepare() cleanup

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/workingset.c: avoid unnecessary max_nodes estimation in count_shadow_nodes()

    Yu Zhao <yuzhao@google.com>:
    Patch series "mm: lru related cleanups", v2:
      mm/vmscan.c: use add_page_to_lru_list()
      include/linux/mm_inline.h: shuffle lru list addition and deletion functions
      mm: don't pass "enum lru_list" to lru list addition functions
      mm/swap.c: don't pass "enum lru_list" to trace_mm_lru_insertion()
      mm/swap.c: don't pass "enum lru_list" to del_page_from_lru_list()
      mm: add __clear_page_lru_flags() to replace page_off_lru()
      mm: VM_BUG_ON lru page flags
      include/linux/mm_inline.h: fold page_lru_base_type() into its sole caller
      include/linux/mm_inline.h: fold __update_lru_size() into its sole caller
      mm/vmscan.c: make lruvec_lru_size() static

    Oscar Salvador <osalvador@suse.de>:
      mm: workingset: clarify eviction order and distance calculation

    Mike Kravetz <mike.kravetz@oracle.com>:
    Patch series "create hugetlb flags to consolidate state", v3:
      hugetlb: use page.private for hugetlb specific page flags
      hugetlb: convert page_huge_active() HPageMigratable flag
      hugetlb: convert PageHugeTemporary() to HPageTemporary flag
      hugetlb: convert PageHugeFreed to HPageFreed flag
      include/linux/hugetlb.h: add synchronization information for new hugetlb specific flags
      hugetlb: fix uninitialized subpool pointer

    Dave Hansen <dave.hansen@linux.intel.com>:
      mm/vmscan: restore zone_reclaim_mode ABI

Subsystem: mm/z3fold

    Miaohe Lin <linmiaohe@huawei.com>:
      z3fold: remove unused attribute for release_z3fold_page
      z3fold: simplify the zhdr initialization code in init_z3fold_page()

Subsystem: mm/compaction

    Alex Shi <alex.shi@linux.alibaba.com>:
      mm/compaction: remove rcu_read_lock during page compaction

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/compaction: remove duplicated VM_BUG_ON_PAGE !PageLocked

    Charan Teja Reddy <charante@codeaurora.org>:
      mm/compaction: correct deferral logic for proactive compaction

    Wonhyuk Yang <vvghjk1234@gmail.com>:
      mm/compaction: fix misbehaviors of fast_find_migrateblock()

    Vlastimil Babka <vbabka@suse.cz>:
      mm, compaction: make fast_isolate_freepages() stay within zone

Subsystem: mm/mempolicy

    Huang Ying <ying.huang@intel.com>:
      numa balancing: migrate on fault among multiple bound nodes

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/mempolicy: use helper range_in_vma() in queue_pages_test_walk()

Subsystem: mm/oom-kill

    Tang Yizhou <tangyizhou@huawei.com>:
      mm, oom: fix a comment in dump_task()

Subsystem: mm/hugetlbfs

    Mike Kravetz <mike.kravetz@oracle.com>:
      mm/hugetlb: change hugetlb_reserve_pages() to type bool
      hugetlbfs: remove special hugetlbfs_set_page_dirty()

    Miaohe Lin <linmiaohe@huawei.com>:
      hugetlbfs: remove useless BUG_ON(!inode) in hugetlbfs_setattr()
      hugetlbfs: use helper macro default_hstate in init_hugetlbfs_fs
      hugetlbfs: correct obsolete function name in hugetlbfs_read_iter()
      hugetlbfs: remove meaningless variable avoid_reserve
      hugetlbfs: make hugepage size conversion more readable
      hugetlbfs: correct some obsolete comments about inode i_mutex
      hugetlbfs: fix some comment typos
      hugetlbfs: remove unneeded return value of hugetlb_vmtruncate()

Subsystem: mm/migration

    Chengyang Fan <cy.fan@huawei.com>:
      mm/migrate: remove unneeded semicolons

 Documentation/admin-guide/cgroup-v2.rst | 4
 Documentation/admin-guide/kernel-parameters.txt | 8
 Documentation/admin-guide/sysctl/vm.rst | 10
 Documentation/core-api/mm-api.rst | 7
 Documentation/dev-tools/kasan.rst | 24
 Documentation/vm/arch_pgtable_helpers.rst | 8
 arch/arm64/include/asm/memory.h | 1
 arch/arm64/include/asm/mte-kasan.h | 12
 arch/arm64/kernel/mte.c | 12
 arch/arm64/kernel/sleep.S | 2
 arch/arm64/mm/fault.c | 20
 arch/hexagon/configs/comet_defconfig | 1
 arch/ia64/include/asm/pgtable.h | 6
 arch/ia64/mm/init.c | 18
 arch/mips/mm/pgtable-32.c | 1
 arch/mips/mm/pgtable-64.c | 1
 arch/x86/kernel/acpi/wakeup_64.S | 2
 drivers/base/node.c | 33
 drivers/video/fbdev/acornfb.c | 34
 fs/block_dev.c | 2
 fs/btrfs/file.c | 2
 fs/buffer.c | 7
 fs/dcache.c | 4
 fs/direct-io.c | 4
 fs/exec.c | 4
 fs/fhandle.c | 2
 fs/fuse/dev.c | 6
 fs/hugetlbfs/inode.c | 72 --
 fs/ntfs/inode.c | 6
 fs/ntfs/layout.h | 4
 fs/ocfs2/cluster/heartbeat.c | 8
 fs/ocfs2/dlm/dlmast.c | 10
 fs/ocfs2/dlm/dlmcommon.h | 4
 fs/ocfs2/refcounttree.c | 2
 fs/ocfs2/super.c | 2
 fs/pipe.c | 2
 fs/proc/meminfo.c | 10
 fs/proc/vmcore.c | 7
 fs/ramfs/inode.c | 13
 include/linux/fs.h | 4
 include/linux/gfp.h | 14
 include/linux/highmem-internal.h | 5
 include/linux/huge_mm.h | 15
 include/linux/hugetlb.h | 98 ++
 include/linux/kasan-checks.h | 6
 include/linux/kasan.h | 39 -
 include/linux/memcontrol.h | 43 -
 include/linux/migrate.h | 2
 include/linux/mm.h | 28
 include/linux/mm_inline.h | 123 +--
 include/linux/mmzone.h | 30
 include/linux/page-flags.h | 6
 include/linux/page_counter.h | 9
 include/linux/pagemap.h | 5
 include/linux/swap.h | 8
 include/trace/events/kmem.h | 24
 include/trace/events/pagemap.h | 11
 include/uapi/linux/mempolicy.h | 4
 init/Kconfig | 14
 lib/Kconfig.kasan | 14
 lib/Makefile | 2
 lib/test_kasan.c | 446 ++++++++----
 lib/test_kasan_module.c | 5
 mm/backing-dev.c | 6
 mm/compaction.c | 73 +-
 mm/debug.c | 10
 mm/debug_vm_pgtable.c | 86 ++
 mm/filemap.c | 859 +++++++++++-------------
 mm/gup.c | 5
 mm/huge_memory.c | 28
 mm/hugetlb.c | 376 ++++------
 mm/hugetlb_cgroup.c | 6
 mm/kasan/common.c | 60 -
 mm/kasan/generic.c | 40 -
 mm/kasan/hw_tags.c | 16
 mm/kasan/kasan.h | 87 +-
 mm/kasan/quarantine.c | 22
 mm/kasan/report.c | 15
 mm/kasan/report_generic.c | 10
 mm/kasan/report_hw_tags.c | 8
 mm/kasan/report_sw_tags.c | 8
 mm/kasan/shadow.c | 27
 mm/kasan/sw_tags.c | 22
 mm/khugepaged.c | 6
 mm/list_lru.c | 12
 mm/memcontrol.c | 309 ++++----
 mm/memory-failure.c | 34
 mm/memory.c | 24
 mm/memory_hotplug.c | 11
 mm/mempolicy.c | 18
 mm/mempool.c | 2
 mm/migrate.c | 10
 mm/mlock.c | 3
 mm/mmap.c | 4
 mm/mprotect.c | 7
 mm/mremap.c | 8
 mm/oom_kill.c | 5
 mm/page_alloc.c | 70 -
 mm/page_io.c | 12
 mm/page_owner.c | 4
 mm/page_reporting.c | 2
 mm/pgtable-generic.c | 9
 mm/rmap.c | 35
 mm/shmem.c | 2
 mm/slab.c | 21
 mm/slab.h | 20
 mm/slab_common.c | 40 -
 mm/slob.c | 2
 mm/slub.c | 169 ++--
 mm/swap.c | 54 -
 mm/swap_slots.c | 3
 mm/swap_state.c | 31
 mm/swapfile.c | 8
 mm/vmscan.c | 100 +-
 mm/vmstat.c | 14
 mm/workingset.c | 7
 mm/z3fold.c | 11
 scripts/Makefile.kasan | 10
 scripts/spelling.txt | 30
 tools/objtool/check.c | 2
 120 files changed, 2249 insertions(+), 1954 deletions(-)
On Wed, Feb 24, 2021 at 11:58 AM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> A few small subsystems and some of MM.
Hmm. I haven't bisected things yet, but I suspect it's something with
the KASAN patches. With this all applied, I get:
lib/crypto/curve25519-hacl64.c: In function ‘ladder_cmult.constprop’:
lib/crypto/curve25519-hacl64.c:601:1: warning: the frame size of
2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
and
lib/bitfield_kunit.c: In function ‘test_bitfields_constants’:
lib/bitfield_kunit.c:93:1: warning: the frame size of 11200 bytes is
larger than 2048 bytes [-Wframe-larger-than=]
which is obviously not really acceptable. A 11kB stack frame _will_
cause issues.
Linus
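To put the two warnings above in perspective, here is a rough back-of-the-envelope sketch. The frame sizes and the 2048-byte limit are copied from the warnings; the 16 KiB per-task kernel stack is an assumption for illustration only (the real size depends on architecture and configuration and is not stated in this thread), and `frame_report` is a hypothetical helper name.

```python
# Frame sizes and the limit are copied from the two warnings quoted above.
FRAME_LIMIT_BYTES = 2048
FRAME_SIZES = {
    "ladder_cmult.constprop": 2288,
    "test_bitfields_constants": 11200,
}

# ASSUMPTION (not stated in the thread): a 16 KiB kernel stack per task.
# The real size depends on the architecture and kernel configuration.
ASSUMED_STACK_BYTES = 16 * 1024


def frame_report(sizes, limit, stack):
    """Return (function, bytes over limit, percent of assumed stack) rows."""
    rows = []
    for func, size in sizes.items():
        over = size - limit
        pct_of_stack = round(100 * size / stack, 1)
        rows.append((func, over, pct_of_stack))
    return rows


if __name__ == "__main__":
    for func, over, pct in frame_report(
        FRAME_SIZES, FRAME_LIMIT_BYTES, ASSUMED_STACK_BYTES
    ):
        print(f"{func}: {over} bytes over the limit, {pct}% of the assumed stack")
```

Under that assumed stack size, the second function alone would use roughly two thirds of the stack in a single frame, which is why the larger warning matters far more than the first.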
On Wed, Feb 24, 2021 at 1:30 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Hmm. I haven't bisected things yet, but I suspect it's something with
> the KASAN patches. With this all applied, I get:
>
> lib/crypto/curve25519-hacl64.c: In function ‘ladder_cmult.constprop’:
> lib/crypto/curve25519-hacl64.c:601:1: warning: the frame size of
> 2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
>
> and
>
> lib/bitfield_kunit.c: In function ‘test_bitfields_constants’:
> lib/bitfield_kunit.c:93:1: warning: the frame size of 11200 bytes is
> larger than 2048 bytes [-Wframe-larger-than=]
>
> which is obviously not really acceptable. A 11kB stack frame _will_
> cause issues.
A quick bisect shows that this was introduced by "[patch 101/173]
kasan: remove redundant config option".
I didn't check what part of that patch screws up, but it's definitely
doing something bad.
I will drop that patch.
Linus
On Wed, Feb 24, 2021 at 10:37 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, Feb 24, 2021 at 1:30 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Hmm. I haven't bisected things yet, but I suspect it's something with
> > the KASAN patches. With this all applied, I get:
> >
> > lib/crypto/curve25519-hacl64.c: In function ‘ladder_cmult.constprop’:
> > lib/crypto/curve25519-hacl64.c:601:1: warning: the frame size of
> > 2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> >
> > and
> >
> > lib/bitfield_kunit.c: In function ‘test_bitfields_constants’:
> > lib/bitfield_kunit.c:93:1: warning: the frame size of 11200 bytes is
> > larger than 2048 bytes [-Wframe-larger-than=]
> >
> > which is obviously not really acceptable. A 11kB stack frame _will_
> > cause issues.
>
> A quick bisect shoes that this was introduced by "[patch 101/173]
> kasan: remove redundant config option".
>
> I didn't check what part of that patch screws up, but it's definitely
> doing something bad.
I'm not sure why that patch surfaced the bug, but it's worth pointing
out that the underlying problem is asan-stack in combination
with the structleak plugin. This will happen for every user of kunit.
I sent a series[1] out earlier this year to turn off the structleak
plugin as an alternative workaround, but need to follow up on
the remaining patches. Someone suggested adding a more
generic way to turn off the plugin for a file instead of open-coding
the CFLAGS_REMOVE_*.o Makefile bit, which would help.
I am also still hoping that someone can come up with a way to
make kunit work better with the structleak plugin, as there shouldn't
be a fundamental reason why it can't work, just that the code
pattern triggers a particularly bad case in the compiler.
Arnd
[1] https://lore.kernel.org/lkml/20210125124533.101339-1-arnd@kernel.org/
On Thu, Feb 25, 2021 at 11:53 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Wed, Feb 24, 2021 at 10:37 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Wed, Feb 24, 2021 at 1:30 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > Hmm. I haven't bisected things yet, but I suspect it's something with
> > > the KASAN patches. With this all applied, I get:
> > >
> > > lib/crypto/curve25519-hacl64.c: In function ‘ladder_cmult.constprop’:
> > > lib/crypto/curve25519-hacl64.c:601:1: warning: the frame size of
> > > 2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> > >
> > > and
> > >
> > > lib/bitfield_kunit.c: In function ‘test_bitfields_constants’:
> > > lib/bitfield_kunit.c:93:1: warning: the frame size of 11200 bytes is
> > > larger than 2048 bytes [-Wframe-larger-than=]
> > >
> > > which is obviously not really acceptable. A 11kB stack frame _will_
> > > cause issues.
> >
> > A quick bisect shoes that this was introduced by "[patch 101/173]
> > kasan: remove redundant config option".
> >
> > I didn't check what part of that patch screws up, but it's definitely
> > doing something bad.
>
> I'm not sure why that patch surfaced the bug, but it's worth pointing
> out that the underlying problem is asan-stack in combination
> with the structleak plugin. This will happen for every user of kunit.
>
The patch didn't update the KASAN_STACK dependency in Kconfig:
    config GCC_PLUGIN_STRUCTLEAK_BYREF
    ....
        depends on !(KASAN && KASAN_STACK=1)
This 'depends on' stopped working with the patch.
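The failure mode described above can be illustrated with a toy model. This is only a sketch and not the real Kconfig evaluator: it assumes that the patch changed KASAN_STACK from an integer-valued symbol (0 or 1) into a boolean one (n or y), which is an inference from the patch title rather than something spelled out in this thread, and `structleak_byref_allowed` is a hypothetical name. The point is that a comparison against the constant 1 can never match a symbol whose value is "y", so the negated dependency is always satisfied.

```python
def structleak_byref_allowed(kasan: str, kasan_stack: str) -> bool:
    """Toy evaluation of: depends on !(KASAN && KASAN_STACK=1).

    Symbol values are modelled as strings; "=1" is modelled as a plain
    string comparison against "1". ASSUMPTION: this mirrors the real
    Kconfig semantics closely enough for illustration only.
    """
    kasan_on = kasan == "y"
    stack_equals_one = kasan_stack == "1"
    return not (kasan_on and stack_equals_one)


# Before the patch (assumed): KASAN_STACK is an int symbol holding "1".
before = structleak_byref_allowed(kasan="y", kasan_stack="1")

# After the patch (assumed): KASAN_STACK is a bool symbol holding "y".
after = structleak_byref_allowed(kasan="y", kasan_stack="y")

if __name__ == "__main__":
    print("plugin option selectable before the patch:", before)
    print("plugin option selectable after the patch: ", after)
```

In this model the option is blocked before the patch and selectable after it, even though stack instrumentation is enabled in both cases, which matches the observation that the 'depends on' line stopped having any effect.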
Hi Andrey,
On Thu, 2021-02-25 at 12:12 +0300, Andrey Ryabinin wrote:
> On Thu, Feb 25, 2021 at 11:53 AM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > On Wed, Feb 24, 2021 at 10:37 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > On Wed, Feb 24, 2021 at 1:30 PM Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > > >
> > > > Hmm. I haven't bisected things yet, but I suspect it's something with
> > > > the KASAN patches. With this all applied, I get:
> > > >
> > > > lib/crypto/curve25519-hacl64.c: In function ‘ladder_cmult.constprop’:
> > > > lib/crypto/curve25519-hacl64.c:601:1: warning: the frame size of
> > > > 2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> > > >
> > > > and
> > > >
> > > > lib/bitfield_kunit.c: In function ‘test_bitfields_constants’:
> > > > lib/bitfield_kunit.c:93:1: warning: the frame size of 11200 bytes is
> > > > larger than 2048 bytes [-Wframe-larger-than=]
> > > >
> > > > which is obviously not really acceptable. A 11kB stack frame _will_
> > > > cause issues.
> > >
> > > A quick bisect shoes that this was introduced by "[patch 101/173]
> > > kasan: remove redundant config option".
> > >
> > > I didn't check what part of that patch screws up, but it's definitely
> > > doing something bad.
> >
> > I'm not sure why that patch surfaced the bug, but it's worth pointing
> > out that the underlying problem is asan-stack in combination
> > with the structleak plugin. This will happen for every user of kunit.
> >
>
> The patch didn't update KASAN_STACK dependency in kconfig:
> config GCC_PLUGIN_STRUCTLEAK_BYREF
> ....
> depends on !(KASAN && KASAN_STACK=1)
>
> This 'depends on' stopped working with the patch
Thanks for pointing out this problem. I will re-send that patch.
Walter
- The rest of MM. Includes kfence - another runtime memory validator.
  Not as thorough as KASAN, but it has unmeasurable overhead and is
  intended to be usable in production builds.

- Everything else

118 patches, based on 6fbd6cf85a3be127454a1ad58525a3adcf8612ab.

Subsystems affected by this patch series:

  mm/thp mm/cma mm/vmstat mm/memory-hotplug mm/mlock mm/rmap mm/zswap
  mm/zsmalloc mm/cleanups mm/kfence mm/kasan2 alpha procfs sysctl misc
  core-kernel MAINTAINERS lib bitops checkpatch init coredump seq_file
  gdb ubsan initramfs mm/pagemap2

Subsystem: mm/thp

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
    Patch series "Overhaul multi-page lookups for THP", v4:
      mm: make pagecache tagged lookups return only head pages
      mm/shmem: use pagevec_lookup in shmem_unlock_mapping
      mm/swap: optimise get_shadow_from_swap_cache
      mm: add FGP_ENTRY
      mm/filemap: rename find_get_entry to mapping_get_entry
      mm/filemap: add helper for finding pages
      mm/filemap: add mapping_seek_hole_data
      iomap: use mapping_seek_hole_data
      mm: add and use find_lock_entries
      mm: add an 'end' parameter to find_get_entries
      mm: add an 'end' parameter to pagevec_lookup_entries
      mm: remove nr_entries parameter from pagevec_lookup_entries
      mm: pass pvec directly to find_get_entries
      mm: remove pagevec_lookup_entries

    Rik van Riel <riel@surriel.com>:
    Patch series "mm,thp,shm: limit shmem THP alloc gfp_mask", v6:
      mm,thp,shmem: limit shmem THP alloc gfp_mask
      mm,thp,shm: limit gfp mask to no more than specified
      mm,thp,shmem: make khugepaged obey tmpfs mount flags
      mm,shmem,thp: limit shmem THP allocations to requested zones

Subsystem: mm/cma

    Roman Gushchin <guro@fb.com>:
      mm: cma: allocate cma areas bottom-up

    David Hildenbrand <david@redhat.com>:
      mm/cma: expose all pages to the buddy if activation of an area fails
      mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo

    Patrick Daly <pdaly@codeaurora.org>:
      mm: cma: print region name on failure

Subsystem: mm/vmstat

    Johannes Weiner <hannes@cmpxchg.org>:
      mm: vmstat: fix NOHZ wakeups for node stat changes
      mm: vmstat: add some comments on internal storage of byte items

    Jiang Biao <benbjiang@tencent.com>:
      mm/vmstat.c: erase latency in vmstat_shepherd

Subsystem: mm/memory-hotplug

    Dan Williams <dan.j.williams@intel.com>:
    Patch series "mm: Fix pfn_to_online_page() with respect to ZONE_DEVICE", v4:
      mm: move pfn_to_online_page() out of line
      mm: teach pfn_to_online_page() to consider subsection validity
      mm: teach pfn_to_online_page() about ZONE_DEVICE section collisions
      mm: fix memory_failure() handling of dax-namespace metadata

    Anshuman Khandual <anshuman.khandual@arm.com>:
      mm/memory_hotplug: rename all existing 'memhp' into 'mhp'

    David Hildenbrand <david@redhat.com>:
      mm/memory_hotplug: MEMHP_MERGE_RESOURCE -> MHP_MERGE_RESOURCE

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/memory_hotplug: use helper function zone_end_pfn() to get end_pfn

    David Hildenbrand <david@redhat.com>:
      drivers/base/memory: don't store phys_device in memory blocks
      Documentation: sysfs/memory: clarify some memory block device properties

    Anshuman Khandual <anshuman.khandual@arm.com>:
    Patch series "mm/memory_hotplug: Pre-validate the address range with platform", v5:
      mm/memory_hotplug: prevalidate the address range being added with platform
      arm64/mm: define arch_get_mappable_range()
      s390/mm: define arch_get_mappable_range()

    David Hildenbrand <david@redhat.com>:
      virtio-mem: check against mhp_get_pluggable_range() which memory we can hotplug

Subsystem: mm/mlock

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/mlock: stop counting mlocked pages when none vma is found

Subsystem: mm/rmap

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/rmap: correct some obsolete comments of anon_vma
      mm/rmap: remove unneeded semicolon in page_not_mapped()
      mm/rmap: fix obsolete comment in __page_check_anon_rmap()
      mm/rmap: use page_not_mapped in try_to_unmap()
      mm/rmap: correct obsolete comment of page_get_anon_vma()
      mm/rmap: fix potential pte_unmap on an not mapped pte

Subsystem: mm/zswap

    Randy Dunlap <rdunlap@infradead.org>:
      mm: zswap: clean up confusing comment

    Tian Tao <tiantao6@hisilicon.com>:
    Patch series "Fix the compatibility of zsmalloc and zswap":
      mm/zswap: add the flag can_sleep_mapped
      mm: set the sleep_mapped to true for zbud and z3fold

Subsystem: mm/zsmalloc

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/zsmalloc.c: convert to use kmem_cache_zalloc in cache_alloc_zspage()

    Rokudo Yan <wu-yan@tcl.com>:
      zsmalloc: account the number of compacted pages correctly

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/zsmalloc.c: use page_private() to access page->private

Subsystem: mm/cleanups

    Guo Ren <guoren@linux.alibaba.com>:
      mm: page-flags.h: Typo fix (It -> If)

    Daniel Vetter <daniel.vetter@ffwll.ch>:
      mm/dmapool: use might_alloc()
      mm/backing-dev.c: use might_alloc()

    Stephen Zhang <stephenzhangzsd@gmail.com>:
      mm/early_ioremap.c: use __func__ instead of function name

Subsystem: mm/kfence

    Alexander Potapenko <glider@google.com>:
    Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7:
      mm: add Kernel Electric-Fence infrastructure
      x86, kfence: enable KFENCE for x86

    Marco Elver <elver@google.com>:
      arm64, kfence: enable KFENCE for ARM64
      kfence: use pt_regs to generate stack trace on faults

    Alexander Potapenko <glider@google.com>:
      mm, kfence: insert KFENCE hooks for SLAB
      mm, kfence: insert KFENCE hooks for SLUB
      kfence, kasan: make KFENCE compatible with KASAN

    Marco Elver <elver@google.com>:
      kfence, Documentation: add KFENCE documentation
      kfence: add test suite
      MAINTAINERS: add entry for KFENCE
      kfence: report sensitive information based on no_hash_pointers

    Alexander Potapenko <glider@google.com>:
    Patch series "Add error_report_end tracepoint to KFENCE and KASAN", v3:
      tracing: add error_report_end trace point
      kfence: use error_report_end tracepoint
      kasan: use error_report_end tracepoint

Subsystem: mm/kasan2

    Andrey Konovalov <andreyknvl@google.com>:
    Patch series "kasan: optimizations and fixes for HW_TAGS", v4:
      kasan, mm: don't save alloc stacks twice
      kasan, mm: optimize kmalloc poisoning
      kasan: optimize large kmalloc poisoning
      kasan: clean up setting free info in kasan_slab_free
      kasan: unify large kfree checks
      kasan: rework krealloc tests
      kasan, mm: fail krealloc on freed objects
      kasan, mm: optimize krealloc poisoning
      kasan: ensure poisoning size alignment
      arm64: kasan: simplify and inline MTE functions
      kasan: inline HW_TAGS helper functions
      kasan: clarify that only first bug is reported in HW_TAGS

Subsystem: alpha

    Randy Dunlap <rdunlap@infradead.org>:
      alpha: remove CONFIG_EXPERIMENTAL from defconfigs

Subsystem: procfs

    Helge Deller <deller@gmx.de>:
      proc/wchan: use printk format instead of lookup_symbol_name()

    Josef Bacik <josef@toxicpanda.com>:
      proc: use kvzalloc for our kernel buffer

Subsystem: sysctl

    Lin Feng <linf@wangsu.com>:
      sysctl.c: fix underflow value setting risk in vm_table

Subsystem: misc

    Randy Dunlap <rdunlap@infradead.org>:
      include/linux: remove repeated words

    Miguel Ojeda <ojeda@kernel.org>:
      treewide: Miguel has moved

Subsystem: core-kernel

    Hubert Jasudowicz <hubert.jasudowicz@gmail.com>:
      groups: use flexible-array member in struct group_info
      groups: simplify struct group_info allocation

    Randy Dunlap <rdunlap@infradead.org>:
      kernel: delete repeated words in comments

Subsystem: MAINTAINERS

    Vlastimil Babka <vbabka@suse.cz>:
      MAINTAINERS: add uapi directories to API/ABI section

Subsystem: lib

    Huang Shijie <sjhuang@iluvatar.ai>:
      lib/genalloc.c: change return type to unsigned long for bitmap_set_ll

    Francis Laniel <laniel_francis@privacyrequired.com>:
      string.h: move fortified functions definitions in a dedicated header.

    Yogesh Lal <ylal@codeaurora.org>:
      lib: stackdepot: add support to configure STACK_HASH_SIZE

    Vijayanand Jitta <vjitta@codeaurora.org>:
      lib: stackdepot: add support to disable stack depot
      lib: stackdepot: fix ignoring return value warning

    Masahiro Yamada <masahiroy@kernel.org>:
      lib/cmdline: remove an unneeded local variable in next_arg()

Subsystem: bitops

    Geert Uytterhoeven <geert+renesas@glider.be>:
      include/linux/bitops.h: spelling s/synomyn/synonym/

Subsystem: checkpatch

    Joe Perches <joe@perches.com>:
      checkpatch: improve blank line after declaration test

    Peng Wang <rocking@linux.alibaba.com>:
      checkpatch: ignore warning designated initializers using NR_CPUS

    Dwaipayan Ray <dwaipayanray1@gmail.com>:
      checkpatch: trivial style fixes

    Joe Perches <joe@perches.com>:
      checkpatch: prefer ftrace over function entry/exit printks
      checkpatch: improve TYPECAST_INT_CONSTANT test message

    Aditya Srivastava <yashsri421@gmail.com>:
      checkpatch: add warning for avoiding .L prefix symbols in assembly files

    Joe Perches <joe@perches.com>:
      checkpatch: add kmalloc_array_node to unnecessary OOM message check

    Chris Down <chris@chrisdown.name>:
      checkpatch: don't warn about colon termination in linker scripts

    Song Liu <songliubraving@fb.com>:
      checkpatch: do not apply "initialise globals to 0" check to BPF progs

Subsystem: init

    Masahiro Yamada <masahiroy@kernel.org>:
      init/version.c: remove Version_<LINUX_VERSION_CODE> symbol
      init: clean up early_param_on_off() macro

    Bhaskar Chowdhury <unixbhaskar@gmail.com>:
      init/Kconfig: fix a typo in CC_VERSION_TEXT help text

Subsystem: coredump

    Ira Weiny <ira.weiny@intel.com>:
      fs/coredump: use kmap_local_page()

Subsystem: seq_file

    NeilBrown <neilb@suse.de>:
    Patch series "Fix some seq_file users that were recently broken":
      seq_file: document how per-entry resources are managed.
      x86: fix seq_file iteration for pat/memtype.c

Subsystem: gdb

    George Prekas <prekageo@amazon.com>:
      scripts/gdb: fix list_for_each

    Sumit Garg <sumit.garg@linaro.org>:
      kgdb: fix to kill breakpoints on initmem after boot

Subsystem: ubsan

    Andrey Ryabinin <ryabinin.a.a@gmail.com>:
      ubsan: remove overflow checks

Subsystem: initramfs

    Florian Fainelli <f.fainelli@gmail.com>:
      initramfs: panic with memory information

Subsystem: mm/pagemap2

    Huang Pei <huangpei@loongson.cn>:
      MIPS: make userspace mapping young by default

 .mailmap | 1
 CREDITS | 9
 Documentation/ABI/testing/sysfs-devices-memory | 58 -
 Documentation/admin-guide/auxdisplay/cfag12864b.rst | 2
 Documentation/admin-guide/auxdisplay/ks0108.rst | 2
 Documentation/admin-guide/kernel-parameters.txt | 6
 Documentation/admin-guide/mm/memory-hotplug.rst | 20
 Documentation/dev-tools/index.rst | 1
 Documentation/dev-tools/kasan.rst | 8
 Documentation/dev-tools/kfence.rst | 318 +++++++
 Documentation/filesystems/seq_file.rst | 6
 MAINTAINERS | 26
 arch/alpha/configs/defconfig | 1
 arch/arm64/Kconfig | 1
 arch/arm64/include/asm/cache.h | 1
 arch/arm64/include/asm/kasan.h | 1
 arch/arm64/include/asm/kfence.h | 26
 arch/arm64/include/asm/mte-def.h | 2
 arch/arm64/include/asm/mte-kasan.h | 65 +
 arch/arm64/include/asm/mte.h | 2
 arch/arm64/kernel/mte.c | 46 -
 arch/arm64/lib/mte.S | 16
 arch/arm64/mm/fault.c | 8
 arch/arm64/mm/mmu.c | 23
 arch/mips/mm/cache.c | 30
 arch/s390/mm/init.c | 1
 arch/s390/mm/vmem.c | 14
 arch/x86/Kconfig | 1
 arch/x86/include/asm/kfence.h | 76 +
 arch/x86/mm/fault.c | 10
 arch/x86/mm/pat/memtype.c | 4
 drivers/auxdisplay/cfag12864b.c | 4
 drivers/auxdisplay/cfag12864bfb.c | 4
 drivers/auxdisplay/ks0108.c | 4
 drivers/base/memory.c | 35
 drivers/block/zram/zram_drv.c | 2
 drivers/hv/hv_balloon.c | 2
 drivers/virtio/virtio_mem.c | 43
 drivers/xen/balloon.c | 2
 fs/coredump.c | 4
 fs/iomap/seek.c | 125 --
 fs/proc/base.c | 21
 fs/proc/proc_sysctl.c | 4
 include/linux/bitops.h | 2
 include/linux/cfag12864b.h | 2
 include/linux/cred.h | 2
 include/linux/fortify-string.h | 302 ++++++
 include/linux/gfp.h | 2
 include/linux/init.h | 4
 include/linux/kasan.h | 25
 include/linux/kfence.h | 230 +++++
 include/linux/kgdb.h | 2
 include/linux/khugepaged.h | 2
 include/linux/ks0108.h | 2
 include/linux/mdev.h | 2
 include/linux/memory.h | 3
 include/linux/memory_hotplug.h | 33
 include/linux/memremap.h | 6
 include/linux/mmzone.h | 49 -
 include/linux/page-flags.h | 4
 include/linux/pagemap.h | 10
 include/linux/pagevec.h | 10
 include/linux/pgtable.h | 8
 include/linux/ptrace.h | 2
 include/linux/rmap.h | 3
 include/linux/slab_def.h | 3
 include/linux/slub_def.h | 3
 include/linux/stackdepot.h | 9
 include/linux/string.h | 282 ------
 include/linux/vmstat.h | 6
 include/linux/zpool.h | 3
 include/linux/zsmalloc.h | 2
 include/trace/events/error_report.h | 74 +
 include/uapi/linux/firewire-cdev.h | 2
 include/uapi/linux/input.h | 2
 init/Kconfig | 2
 init/initramfs.c | 19
 init/main.c | 6
 init/version.c | 8
 kernel/debug/debug_core.c | 11
 kernel/events/core.c | 8
 kernel/events/uprobes.c | 2
 kernel/groups.c | 7
 kernel/locking/rtmutex.c | 4
 kernel/locking/rwsem.c | 2
 kernel/locking/semaphore.c | 2
 kernel/sched/fair.c | 2
 kernel/sched/membarrier.c | 2
 kernel/sysctl.c | 8
 kernel/trace/Makefile | 1
 kernel/trace/error_report-traces.c | 12
 lib/Kconfig | 9
 lib/Kconfig.debug | 1
 lib/Kconfig.kfence | 84 +
 lib/Kconfig.ubsan | 17
 lib/cmdline.c | 7
 lib/genalloc.c | 3
 lib/stackdepot.c | 41
 lib/test_kasan.c | 111 ++
 lib/test_ubsan.c | 49 -
 lib/ubsan.c | 68 -
 mm/Makefile | 1
 mm/backing-dev.c | 3
 mm/cma.c | 64 -
 mm/dmapool.c | 3
 mm/early_ioremap.c | 12
 mm/filemap.c | 361 +++++---
 mm/huge_memory.c | 6
 mm/internal.h | 6
 mm/kasan/common.c | 213 +++-
 mm/kasan/generic.c | 3
 mm/kasan/hw_tags.c | 2
 mm/kasan/kasan.h | 97 +-
 mm/kasan/report.c | 8
 mm/kasan/shadow.c | 78 +
 mm/kfence/Makefile | 6
 mm/kfence/core.c | 875 +++++++++++++++++++-
 mm/kfence/kfence.h | 126 ++
 mm/kfence/kfence_test.c | 860 +++++++++++++++++++
 mm/kfence/report.c | 350 ++++++--
 mm/khugepaged.c |
22 mm/memory-failure.c | 6 mm/memory.c | 4 mm/memory_hotplug.c | 178 +++- mm/memremap.c | 23 mm/mlock.c | 2 mm/page_alloc.c | 1 mm/rmap.c | 24 mm/shmem.c | 160 +-- mm/slab.c | 38 mm/slab_common.c | 29 mm/slub.c | 63 + mm/swap.c | 54 - mm/swap_state.c | 7 mm/truncate.c | 141 --- mm/vmstat.c | 35 mm/z3fold.c | 1 mm/zbud.c | 1 mm/zpool.c | 13 mm/zsmalloc.c | 22 mm/zswap.c | 57 + samples/auxdisplay/cfag12864b-example.c | 2 scripts/Makefile.ubsan | 2 scripts/checkpatch.pl | 152 ++- scripts/gdb/linux/lists.py | 5 145 files changed, 5046 insertions(+), 1682 deletions(-)
On Thu, Feb 25, 2021 at 5:14 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> - The rest of MM.
>
> Includes kfence - another runtime memory validator. Not as
> thorough as KASAN, but it has unmeasurable overhead and is intended
> to be usable in production builds.
>
> - Everything else
Just to clarify: you have nothing else really pending?
I'm hoping to just do -rc1 this weekend after all - despite my late
start due to loss of power for several days.
I'll allow late stragglers with good reason through, but the fewer of
those there are, the better, of course.
Thanks,
Linus
On Fri, 26 Feb 2021 09:55:27 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, Feb 25, 2021 at 5:14 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > - The rest of MM.
> >
> > Includes kfence - another runtime memory validator. Not as
> > thorough as KASAN, but it has unmeasurable overhead and is intended
> > to be usable in production builds.
> >
> > - Everything else
>
> Just to clarify: you have nothing else really pending?
Yes, that's it from me for -rc1.
29 patches, based on f78d76e72a4671ea52d12752d92077788b4f5d50. Subsystems affected by this patch series: mm/memblock core-kernel kconfig mm/pagealloc fork mm/hugetlb mm/highmem binfmt MAINTAINERS kbuild mm/kfence mm/oom-kill mm/madvise mm/kasan mm/userfaultfd mm/memory-failure ia64 mm/memcg mm/zram Subsystem: mm/memblock Arnd Bergmann <arnd@arndb.de>: memblock: fix section mismatch warning Subsystem: core-kernel Arnd Bergmann <arnd@arndb.de>: stop_machine: mark helpers __always_inline Subsystem: kconfig Masahiro Yamada <masahiroy@kernel.org>: init/Kconfig: make COMPILE_TEST depend on HAS_IOMEM Subsystem: mm/pagealloc Mike Rapoport <rppt@linux.ibm.com>: mm/page_alloc.c: refactor initialization of struct page for holes in memory layout Subsystem: fork Fenghua Yu <fenghua.yu@intel.com>: mm/fork: clear PASID for new mm Subsystem: mm/hugetlb Peter Xu <peterx@redhat.com>: Patch series "mm/hugetlb: Early cow on fork, and a few cleanups", v5: hugetlb: dedup the code to add a new file_region hugetlb: break earlier in add_reservation_in_range() when we can mm: introduce page_needs_cow_for_dma() for deciding whether cow mm: use is_cow_mapping() across tree where proper hugetlb: do early cow when page pinned on src mm Subsystem: mm/highmem OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>: mm/highmem.c: fix zero_user_segments() with start > end Subsystem: binfmt Lior Ribak <liorribak@gmail.com>: binfmt_misc: fix possible deadlock in bm_register_write Subsystem: MAINTAINERS Vlastimil Babka <vbabka@suse.cz>: MAINTAINERS: exclude uapi directories in API/ABI section Subsystem: kbuild Arnd Bergmann <arnd@arndb.de>: linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP* Subsystem: mm/kfence Marco Elver <elver@google.com>: kfence: fix printk format for ptrdiff_t kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations kfence: fix reports if constant function prefixes exist Subsystem: mm/oom-kill "Matthew Wilcox (Oracle)" <willy@infradead.org>: include/linux/sched/mm.h: use 
rcu_dereference in in_vfork() Subsystem: mm/madvise Suren Baghdasaryan <surenb@google.com>: mm/madvise: replace ptrace attach requirement for process_madvise Subsystem: mm/kasan Andrey Konovalov <andreyknvl@google.com>: kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC kasan: fix KASAN_STACK dependency for HW_TAGS Subsystem: mm/userfaultfd Nadav Amit <namit@vmware.com>: mm/userfaultfd: fix memory corruption due to writeprotect Subsystem: mm/memory-failure Naoya Horiguchi <naoya.horiguchi@nec.com>: mm, hwpoison: do not lock page again when me_huge_page() successfully recovers Subsystem: ia64 Sergei Trofimovich <slyfox@gentoo.org>: ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign Subsystem: mm/memcg Zhou Guanghui <zhouguanghui1@huawei.com>: mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument mm/memcg: set memcg when splitting page Subsystem: mm/zram Minchan Kim <minchan@kernel.org>: zram: fix return value on writeback_store zram: fix broken page writeback MAINTAINERS | 4 arch/ia64/include/asm/syscall.h | 2 arch/ia64/kernel/ptrace.c | 24 +++- drivers/block/zram/zram_drv.c | 17 +- drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c | 4 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 2 fs/binfmt_misc.c | 29 ++--- fs/proc/task_mmu.c | 2 include/linux/compiler-clang.h | 6 + include/linux/memblock.h | 4 include/linux/memcontrol.h | 6 - include/linux/mm.h | 21 +++ include/linux/mm_types.h | 1 include/linux/sched/mm.h | 3 include/linux/stop_machine.h | 11 + init/Kconfig | 3 kernel/fork.c | 8 + lib/Kconfig.kasan | 1 mm/highmem.c | 17 ++ mm/huge_memory.c | 10 - mm/hugetlb.c | 123 +++++++++++++++------ mm/internal.h | 5 mm/kfence/report.c | 30 +++-- mm/madvise.c | 13 ++ mm/memcontrol.c | 15 +- mm/memory-failure.c | 4 mm/memory.c | 16 +- mm/page_alloc.c | 167 ++++++++++++++--------------- mm/slab.c | 2 29 files changed, 334 insertions(+), 216 deletions(-)
14 patches, based on 7acac4b3196caee5e21fb5ea53f8bc124e6a16fc. Subsystems affected by this patch series: mm/hugetlb mm/kasan mm/gup mm/selftests mm/z3fold squashfs ia64 gcov mm/kfence mm/memblock mm/highmem mailmap Subsystem: mm/hugetlb Miaohe Lin <linmiaohe@huawei.com>: hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings Subsystem: mm/kasan Andrey Konovalov <andreyknvl@google.com>: kasan: fix per-page tags for non-page_alloc pages Subsystem: mm/gup Sean Christopherson <seanjc@google.com>: mm/mmu_notifiers: ensure range_end() is paired with range_start() Subsystem: mm/selftests Rong Chen <rong.a.chen@intel.com>: selftests/vm: fix out-of-tree build Subsystem: mm/z3fold Thomas Hebb <tommyhebb@gmail.com>: z3fold: prevent reclaim/free race for headless pages Subsystem: squashfs Sean Nyekjaer <sean@geanix.com>: squashfs: fix inode lookup sanity checks Phillip Lougher <phillip@squashfs.org.uk>: squashfs: fix xattr id and id lookup sanity checks Subsystem: ia64 Sergei Trofimovich <slyfox@gentoo.org>: ia64: mca: allocate early mca with GFP_ATOMIC ia64: fix format strings for err_inject Subsystem: gcov Nick Desaulniers <ndesaulniers@google.com>: gcov: fix clang-11+ support Subsystem: mm/kfence Marco Elver <elver@google.com>: kfence: make compatible with kmemleak Subsystem: mm/memblock Mike Rapoport <rppt@linux.ibm.com>: mm: memblock: fix section mismatch warning again Subsystem: mm/highmem Ira Weiny <ira.weiny@intel.com>: mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP Subsystem: mailmap Andrey Konovalov <andreyknvl@google.com>: mailmap: update Andrey Konovalov's email address .mailmap | 1 arch/ia64/kernel/err_inject.c | 22 +++++------ arch/ia64/kernel/mca.c | 2 - fs/squashfs/export.c | 8 +++- fs/squashfs/id.c | 6 ++- fs/squashfs/squashfs_fs.h | 1 fs/squashfs/xattr_id.c | 6 ++- include/linux/hugetlb_cgroup.h | 15 ++++++- include/linux/memblock.h | 4 +- include/linux/mm.h | 18 +++++++-- include/linux/mmu_notifier.h | 10 ++--- kernel/gcov/clang.c | 
69 ++++++++++++++++++++++++++++++++++++ mm/highmem.c | 4 +- mm/hugetlb.c | 41 +++++++++++++++++++-- mm/hugetlb_cgroup.c | 10 ++++- mm/kfence/core.c | 9 ++++ mm/kmemleak.c | 3 + mm/mmu_notifier.c | 23 ++++++++++++ mm/z3fold.c | 16 +++++++- tools/testing/selftests/vm/Makefile | 4 +- 20 files changed, 230 insertions(+), 42 deletions(-)
16 patches, based on 17e7124aad766b3f158943acb51467f86220afe9. Subsystems affected by this patch series: MAINTAINERS mailmap mm/kasan mm/gup nds32 gcov ocfs2 ia64 mm/pagecache mm/kasan mm/kfence lib Subsystem: MAINTAINERS Marek Behún <kabel@kernel.org>: MAINTAINERS: update CZ.NIC's Turris information treewide: change my e-mail address, fix my name Subsystem: mailmap Jordan Crouse <jordan@cosmicpenguin.net>: mailmap: update email address for Jordan Crouse Matthew Wilcox <willy@infradead.org>: .mailmap: fix old email addresses Subsystem: mm/kasan Arnd Bergmann <arnd@arndb.de>: kasan: fix hwasan build for gcc Walter Wu <walter-zh.wu@mediatek.com>: kasan: remove redundant config option Subsystem: mm/gup Aili Yao <yaoaili@kingsoft.com>: mm/gup: check page posion status for coredump. Subsystem: nds32 Mike Rapoport <rppt@linux.ibm.com>: nds32: flush_dcache_page: use page_mapping_file to avoid races with swapoff Subsystem: gcov Nick Desaulniers <ndesaulniers@google.com>: gcov: re-fix clang-11+ support Subsystem: ocfs2 Wengang Wang <wen.gang.wang@oracle.com>: ocfs2: fix deadlock between setattr and dio_end_io_write Subsystem: ia64 Sergei Trofimovich <slyfox@gentoo.org>: ia64: fix user_stack_pointer() for ptrace() Subsystem: mm/pagecache Jack Qiu <jack.qiu@huawei.com>: fs: direct-io: fix missing sdio->boundary Subsystem: mm/kasan Andrey Konovalov <andreyknvl@google.com>: kasan: fix conflict with page poisoning Andrew Morton <akpm@linux-foundation.org>: lib/test_kasan_module.c: suppress unused var warning Subsystem: mm/kfence Marco Elver <elver@google.com>: kfence, x86: fix preemptible warning on KPTI-enabled systems Subsystem: lib Julian Braha <julianbraha@gmail.com>: lib: fix kconfig dependency on ARCH_WANT_FRAME_POINTERS .mailmap | 7 ++ Documentation/ABI/testing/debugfs-moxtet | 4 - Documentation/ABI/testing/debugfs-turris-mox-rwtm | 2 Documentation/ABI/testing/sysfs-bus-moxtet-devices | 6 +- Documentation/ABI/testing/sysfs-class-led-driver-turris-omnia | 2 
Documentation/ABI/testing/sysfs-firmware-turris-mox-rwtm | 10 +-- Documentation/devicetree/bindings/leds/cznic,turris-omnia-leds.yaml | 2 MAINTAINERS | 13 +++- arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts | 2 arch/arm64/kernel/sleep.S | 2 arch/ia64/include/asm/ptrace.h | 8 -- arch/nds32/mm/cacheflush.c | 2 arch/x86/include/asm/kfence.h | 7 ++ arch/x86/kernel/acpi/wakeup_64.S | 2 drivers/bus/moxtet.c | 4 - drivers/firmware/turris-mox-rwtm.c | 4 - drivers/gpio/gpio-moxtet.c | 4 - drivers/leds/leds-turris-omnia.c | 4 - drivers/mailbox/armada-37xx-rwtm-mailbox.c | 4 - drivers/watchdog/armada_37xx_wdt.c | 4 - fs/direct-io.c | 5 + fs/ocfs2/aops.c | 11 --- fs/ocfs2/file.c | 8 ++ include/dt-bindings/bus/moxtet.h | 2 include/linux/armada-37xx-rwtm-mailbox.h | 2 include/linux/kasan.h | 2 include/linux/moxtet.h | 2 kernel/gcov/clang.c | 29 ++++++---- lib/Kconfig.debug | 6 +- lib/Kconfig.kasan | 9 --- lib/test_kasan_module.c | 2 mm/gup.c | 4 + mm/internal.h | 20 ++++++ mm/kasan/common.c | 2 mm/kasan/kasan.h | 2 mm/kasan/report_generic.c | 2 mm/page_poison.c | 4 + scripts/Makefile.kasan | 18 ++++-- security/Kconfig.hardening | 4 - 39 files changed, 136 insertions(+), 91 deletions(-)
12 patches, based on 06c2aac4014c38247256fe49c61b7f55890271e7. Subsystems affected by this patch series: mm/documentation mm/kasan csky ia64 mm/pagemap gcov lib Subsystem: mm/documentation Randy Dunlap <rdunlap@infradead.org>: mm: eliminate "expecting prototype" kernel-doc warnings Subsystem: mm/kasan Arnd Bergmann <arnd@arndb.de>: kasan: fix hwasan build for gcc Walter Wu <walter-zh.wu@mediatek.com>: kasan: remove redundant config option Subsystem: csky Randy Dunlap <rdunlap@infradead.org>: csky: change a Kconfig symbol name to fix e1000 build error Subsystem: ia64 Randy Dunlap <rdunlap@infradead.org>: ia64: remove duplicate entries in generic_defconfig ia64: fix discontig.c section mismatches John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>: ia64: tools: remove inclusion of ia64-specific version of errno.h header John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>: ia64: tools: remove duplicate definition of ia64_mf() on ia64 Subsystem: mm/pagemap Zack Rusin <zackr@vmware.com>: mm/mapping_dirty_helpers: guard hugepage pud's usage Christophe Leroy <christophe.leroy@csgroup.eu>: mm: ptdump: fix build failure Subsystem: gcov Johannes Berg <johannes.berg@intel.com>: gcov: clang: fix clang-11+ build Subsystem: lib Randy Dunlap <rdunlap@infradead.org>: lib: remove "expecting prototype" kernel-doc warnings arch/arm64/kernel/sleep.S | 2 +- arch/csky/Kconfig | 2 +- arch/csky/include/asm/page.h | 2 +- arch/ia64/configs/generic_defconfig | 2 -- arch/ia64/mm/discontig.c | 6 +++--- arch/x86/kernel/acpi/wakeup_64.S | 2 +- include/linux/kasan.h | 2 +- kernel/gcov/clang.c | 2 +- lib/Kconfig.kasan | 9 ++------- lib/earlycpio.c | 4 ++-- lib/lru_cache.c | 3 ++- lib/parman.c | 4 ++-- lib/radix-tree.c | 11 ++++++----- mm/kasan/common.c | 2 +- mm/kasan/kasan.h | 2 +- mm/kasan/report_generic.c | 2 +- mm/mapping_dirty_helpers.c | 2 ++ mm/mmu_gather.c | 29 +++++++++++++++++++---------- mm/oom_kill.c | 2 +- mm/ptdump.c | 2 +- mm/shuffle.c | 4 ++--
scripts/Makefile.kasan | 22 ++++++++++++++-------- security/Kconfig.hardening | 4 ++-- tools/arch/ia64/include/asm/barrier.h | 3 --- tools/include/uapi/asm/errno.h | 2 -- 25 files changed, 67 insertions(+), 60 deletions(-)
5 patches, based on 5bfc75d92efd494db37f5c4c173d3639d4772966. Subsystems affected by this patch series: coda overlayfs mm/pagecache mm/memcg Subsystem: coda Christian König <christian.koenig@amd.com>: coda: fix reference counting in coda_file_mmap error path Subsystem: overlayfs Christian König <christian.koenig@amd.com>: ovl: fix reference counting in ovl_mmap error path Subsystem: mm/pagecache Hugh Dickins <hughd@google.com>: mm/filemap: fix find_lock_entries hang on 32-bit THP mm/filemap: fix mapping_seek_hole_data on THP & 32-bit Subsystem: mm/memcg Vasily Averin <vvs@virtuozzo.com>: tools/cgroup/slabinfo.py: updated to work on current kernel fs/coda/file.c | 6 +++--- fs/overlayfs/file.c | 11 +---------- mm/filemap.c | 31 +++++++++++++++++++------------ tools/cgroup/memcg_slabinfo.py | 8 ++++---- 4 files changed, 27 insertions(+), 29 deletions(-)
A few misc subsystems and some of MM. 178 patches, based on 8ca5297e7e38f2dc8c753d33a5092e7be181fff0. Subsystems affected by this patch series: ia64 kbuild scripts sh ocfs2 kfifo vfs kernel/watchdog mm/slab-generic mm/slub mm/kmemleak mm/debug mm/pagecache mm/msync mm/gup mm/memremap mm/memcg mm/pagemap mm/mremap mm/dma mm/sparsemem mm/vmalloc mm/documentation mm/kasan mm/initialization mm/pagealloc mm/memory-failure Subsystem: ia64 Zhang Yunkai <zhang.yunkai@zte.com.cn>: arch/ia64/kernel/head.S: remove duplicate include Bhaskar Chowdhury <unixbhaskar@gmail.com>: arch/ia64/kernel/fsys.S: fix typos arch/ia64/include/asm/pgtable.h: minor typo fixes Valentin Schneider <valentin.schneider@arm.com>: ia64: ensure proper NUMA distance and possible map initialization Sergei Trofimovich <slyfox@gentoo.org>: ia64: drop unused IA64_FW_EMU ifdef ia64: simplify code flow around swiotlb init Bhaskar Chowdhury <unixbhaskar@gmail.com>: ia64: trivial spelling fixes Sergei Trofimovich <slyfox@gentoo.org>: ia64: fix EFI_DEBUG build ia64: mca: always make IA64_MCA_DEBUG an expression ia64: drop marked broken DISCONTIGMEM and VIRTUAL_MEM_MAP ia64: module: fix symbolizer crash on fdescr Subsystem: kbuild Luc Van Oostenryck <luc.vanoostenryck@gmail.com>: include/linux/compiler-gcc.h: sparse can do constant folding of __builtin_bswap*() Subsystem: scripts Tom Saeger <tom.saeger@oracle.com>: scripts/spelling.txt: add entries for recent discoveries Wan Jiabing <wanjiabing@vivo.com>: scripts: a new script for checking duplicate struct declaration Subsystem: sh Zhang Yunkai <zhang.yunkai@zte.com.cn>: arch/sh/include/asm/tlb.h: remove duplicate include Subsystem: ocfs2 Yang Li <yang.lee@linux.alibaba.com>: ocfs2: replace DEFINE_SIMPLE_ATTRIBUTE with DEFINE_DEBUGFS_ATTRIBUTE Joseph Qi <joseph.qi@linux.alibaba.com>: ocfs2: map flags directly in flags_to_o2dlm() Bhaskar Chowdhury <unixbhaskar@gmail.com>: ocfs2: fix a typo Jiapeng Chong <jiapeng.chong@linux.alibaba.com>: ocfs2/dlm: remove unused 
function Subsystem: kfifo Dan Carpenter <dan.carpenter@oracle.com>: kfifo: fix ternary sign extension bugs Subsystem: vfs Randy Dunlap <rdunlap@infradead.org>: vfs: fs_parser: clean up kernel-doc warnings Subsystem: kernel/watchdog Petr Mladek <pmladek@suse.com>: Patch series "watchdog/softlockup: Report overall time and some cleanup", v2: watchdog: rename __touch_watchdog() to a better descriptive name watchdog: explicitly update timestamp when reporting softlockup watchdog/softlockup: report the overall time of softlockups watchdog/softlockup: remove logic that tried to prevent repeated reports watchdog: fix barriers when printing backtraces from all CPUs watchdog: cleanup handling of false positives Subsystem: mm/slab-generic Rafael Aquini <aquini@redhat.com>: mm/slab_common: provide "slab_merge" option for !IS_ENABLED(CONFIG_SLAB_MERGE_DEFAULT) builds Subsystem: mm/slub Vlastimil Babka <vbabka@suse.cz>: mm, slub: enable slub_debug static key when creating cache with explicit debug flags Oliver Glitta <glittao@gmail.com>: kunit: add a KUnit test for SLUB debugging functionality slub: remove resiliency_test() function Bhaskar Chowdhury <unixbhaskar@gmail.com>: mm/slub.c: trivial typo fixes Subsystem: mm/kmemleak Bhaskar Chowdhury <unixbhaskar@gmail.com>: mm/kmemleak.c: fix a typo Subsystem: mm/debug Georgi Djakov <georgi.djakov@linaro.org>: mm/page_owner: record the timestamp of all pages during free zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>: mm, page_owner: remove unused parameter in __set_page_owner_handle Sergei Trofimovich <slyfox@gentoo.org>: mm: page_owner: fetch backtrace only for tracked pages mm: page_owner: use kstrtobool() to parse bool option mm: page_owner: detect page_owner recursion via task_struct mm: page_poison: print page info when corruption is caught Anshuman Khandual <anshuman.khandual@arm.com>: mm/memtest: add ARCH_USE_MEMTEST Subsystem: mm/pagecache Jens Axboe <axboe@kernel.dk>: Patch series "Improve IOCB_NOWAIT O_DIRECT reads", 
v3: mm: provide filemap_range_needs_writeback() helper mm: use filemap_range_needs_writeback() for O_DIRECT reads iomap: use filemap_range_needs_writeback() for O_DIRECT reads "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm/filemap: use filemap_read_page in filemap_fault mm/filemap: drop check for truncated page after I/O Johannes Weiner <hannes@cmpxchg.org>: mm: page-writeback: simplify memcg handling in test_clear_page_writeback() "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm: move page_mapping_file to pagemap.h Rui Sun <sunrui26@huawei.com>: mm/filemap: update stale comment Subsystem: mm/msync Nikita Ermakov <sh1r4s3@mail.si-head.nl>: mm/msync: exit early when the flags is an MS_ASYNC and start < vm_start Subsystem: mm/gup Joao Martins <joao.m.martins@oracle.com>: Patch series "mm/gup: page unpining improvements", v4: mm/gup: add compound page list iterator mm/gup: decrement head page once for group of subpages mm/gup: add a range variant of unpin_user_pages_dirty_lock() RDMA/umem: batch page unpin in __ib_umem_release() Yang Shi <shy828301@gmail.com>: mm: gup: remove FOLL_SPLIT Subsystem: mm/memremap Zhiyuan Dai <daizhiyuan@phytium.com.cn>: mm/memremap.c: fix improper SPDX comment style Subsystem: mm/memcg Muchun Song <songmuchun@bytedance.com>: mm: memcontrol: fix kernel stack account Shakeel Butt <shakeelb@google.com>: memcg: cleanup root memcg checks memcg: enable memcg oom-kill for __GFP_NOFAIL Johannes Weiner <hannes@cmpxchg.org>: Patch series "mm: memcontrol: switch to rstat", v3: mm: memcontrol: fix cpuhotplug statistics flushing mm: memcontrol: kill mem_cgroup_nodeinfo() mm: memcontrol: privatize memcg_page_state query functions cgroup: rstat: support cgroup1 cgroup: rstat: punt root-level optimization to individual controllers mm: memcontrol: switch to rstat mm: memcontrol: consolidate lruvec stat flushing kselftests: cgroup: update kmem test for new vmstat implementation Shakeel Butt <shakeelb@google.com>: memcg: charge before adding to 
swapcache on swapin Muchun Song <songmuchun@bytedance.com>: Patch series "Use obj_cgroup APIs to charge kmem pages", v5: mm: memcontrol: slab: fix obtain a reference to a freeing memcg mm: memcontrol: introduce obj_cgroup_{un}charge_pages mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c mm: memcontrol: change ug->dummy_page only if memcg changed mm: memcontrol: use obj_cgroup APIs to charge kmem pages mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages() mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM Wan Jiabing <wanjiabing@vivo.com>: linux/memcontrol.h: remove duplicate struct declaration Johannes Weiner <hannes@cmpxchg.org>: mm: page_counter: mitigate consequences of a page_counter underflow Subsystem: mm/pagemap Wang Qing <wangqing@vivo.com>: mm/memory.c: do_numa_page(): delete bool "migrated" Zhiyuan Dai <daizhiyuan@phytium.com.cn>: mm/interval_tree: add comments to improve code readability Oscar Salvador <osalvador@suse.de>: Patch series "Cleanup and fixups for vmemmap handling", v6: x86/vmemmap: drop handling of 4K unaligned vmemmap range x86/vmemmap: drop handling of 1GB vmemmap ranges x86/vmemmap: handle unpopulated sub-pmd ranges x86/vmemmap: optimize for consecutive sections in partial populated PMDs Ovidiu Panait <ovidiu.panait@windriver.com>: mm, tracing: improve rss_stat tracepoint message Christoph Hellwig <hch@lst.de>: Patch series "add remap_pfn_range_notrack instead of reinventing it in i915", v2: mm: add remap_pfn_range_notrack mm: add a io_mapping_map_user helper i915: use io_mapping_map_user i915: fix remap_io_sg to verify the pgprot Huang Ying <ying.huang@intel.com>: NUMA balancing: reduce TLB flush via delaying mapping on hint page fault Subsystem: mm/mremap Brian Geffon <bgeffon@google.com>: Patch series "mm: Extend MREMAP_DONTUNMAP to non-anonymous mappings", v5: mm: extend MREMAP_DONTUNMAP to non-anonymous mappings Revert "mremap: don't allow MREMAP_DONTUNMAP on 
special_mappings and aio" selftests: add a MREMAP_DONTUNMAP selftest for shmem Subsystem: mm/dma Zhiyuan Dai <daizhiyuan@phytium.com.cn>: mm/dmapool: switch from strlcpy to strscpy Subsystem: mm/sparsemem Wang Wensheng <wangwensheng4@huawei.com>: mm/sparse: add the missing sparse_buffer_fini() in error branch Subsystem: mm/vmalloc Christoph Hellwig <hch@lst.de>: Patch series "remap_vmalloc_range cleanups": samples/vfio-mdev/mdpy: use remap_vmalloc_range mm: unexport remap_vmalloc_range_partial Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>: mm/vmalloc: use rb_tree instead of list for vread() lookups Nicholas Piggin <npiggin@gmail.com>: Patch series "huge vmalloc mappings", v13: ARM: mm: add missing pud_page define to 2-level page tables mm/vmalloc: fix HUGE_VMAP regression by enabling huge pages in vmalloc_to_page mm: apply_to_pte_range warn and fail if a large pte is encountered mm/vmalloc: rename vmap_*_range vmap_pages_*_range mm/ioremap: rename ioremap_*_range to vmap_*_range mm: HUGE_VMAP arch support cleanup powerpc: inline huge vmap supported functions arm64: inline huge vmap supported functions x86: inline huge vmap supported functions mm/vmalloc: provide fallback arch huge vmap support functions mm: move vmap_range from mm/ioremap.c to mm/vmalloc.c mm/vmalloc: add vmap_range_noflush variant mm/vmalloc: hugepage vmalloc mappings Patch series "mm/vmalloc: cleanup after hugepage series", v2: mm/vmalloc: remove map_kernel_range kernel/dma: remove unnecessary unmap_kernel_range powerpc/xive: remove unnecessary unmap_kernel_range mm/vmalloc: remove unmap_kernel_range mm/vmalloc: improve allocation failure error messages Vijayanand Jitta <vjitta@codeaurora.org>: mm: vmalloc: prevent use after free in _vm_unmap_aliases "Uladzislau Rezki (Sony)" <urezki@gmail.com>: lib/test_vmalloc.c: remove two kvfree_rcu() tests lib/test_vmalloc.c: add a new 'nr_threads' parameter vm/test_vmalloc.sh: adapt for updated driver interface mm/vmalloc: refactor the preloading 
loagic mm/vmalloc: remove an empty line Subsystem: mm/documentation "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm/doc: fix fault_flag_allow_retry_first kerneldoc mm/doc: fix page_maybe_dma_pinned kerneldoc mm/doc: turn fault flags into an enum mm/doc: add mm.h and mm_types.h to the mm-api document Lukas Bulwahn <lukas.bulwahn@gmail.com>: Patch series "kernel-doc and MAINTAINERS clean-up": MAINTAINERS: assign pagewalk.h to MEMORY MANAGEMENT pagewalk: prefix struct kernel-doc descriptions Subsystem: mm/kasan Zhiyuan Dai <daizhiyuan@phytium.com.cn>: mm/kasan: switch from strlcpy to strscpy Peter Collingbourne <pcc@google.com>: kasan: fix kasan_byte_accessible() to be consistent with actual checks Andrey Konovalov <andreyknvl@google.com>: kasan: initialize shadow to TAG_INVALID for SW_TAGS mm, kasan: don't poison boot memory with tag-based modes Patch series "kasan: integrate with init_on_alloc/free", v3: arm64: kasan: allow to init memory when setting tags kasan: init memory in kasan_(un)poison for HW_TAGS kasan, mm: integrate page_alloc init with HW_TAGS kasan, mm: integrate slab init_on_alloc with HW_TAGS kasan, mm: integrate slab init_on_free with HW_TAGS kasan: docs: clean up sections kasan: docs: update overview section kasan: docs: update usage section kasan: docs: update error reports section kasan: docs: update boot parameters section kasan: docs: update GENERIC implementation details section kasan: docs: update SW_TAGS implementation details section kasan: docs: update HW_TAGS implementation details section kasan: docs: update shadow memory section kasan: docs: update ignoring accesses section kasan: docs: update tests section Walter Wu <walter-zh.wu@mediatek.com>: kasan: record task_work_add() call stack Andrey Konovalov <andreyknvl@google.com>: kasan: detect false-positives in tests Zqiang <qiang.zhang@windriver.com>: irq_work: record irq_work_queue() call stack Subsystem: mm/initialization Kefeng Wang <wangkefeng.wang@huawei.com>: mm: move 
mem_init_print_info() into mm_init() Subsystem: mm/pagealloc David Hildenbrand <david@redhat.com>: mm/page_alloc: drop pr_info_ratelimited() in alloc_contig_range() Minchan Kim <minchan@kernel.org>: mm: remove lru_add_drain_all in alloc_contig_range Yu Zhao <yuzhao@google.com>: include/linux/page-flags-layout.h: correctly determine LAST_CPUPID_WIDTH include/linux/page-flags-layout.h: cleanups "Matthew Wilcox (Oracle)" <willy@infradead.org>: Patch series "Rationalise __alloc_pages wrappers", v3: mm/page_alloc: rename alloc_mask to alloc_gfp mm/page_alloc: rename gfp_mask to gfp mm/page_alloc: combine __alloc_pages and __alloc_pages_nodemask mm/mempolicy: rename alloc_pages_current to alloc_pages mm/mempolicy: rewrite alloc_pages documentation mm/mempolicy: rewrite alloc_pages_vma documentation mm/mempolicy: fix mpol_misplaced kernel-doc Minchan Kim <minchan@kernel.org>: mm: page_alloc: dump migrate-failed pages Geert Uytterhoeven <geert@linux-m68k.org>: mm/Kconfig: remove default DISCONTIGMEM_MANUAL Kefeng Wang <wangkefeng.wang@huawei.com>: mm, page_alloc: avoid page_to_pfn() in move_freepages() zhouchuangao <zhouchuangao@vivo.com>: mm/page_alloc: duplicate include linux/vmalloc.h Mel Gorman <mgorman@techsingularity.net>: Patch series "Introduce a bulk order-0 page allocator with two in-tree users", v6: mm/page_alloc: rename alloced to allocated mm/page_alloc: add a bulk page allocator mm/page_alloc: add an array-based interface to the bulk page allocator Jesper Dangaard Brouer <brouer@redhat.com>: mm/page_alloc: optimize code layout for __alloc_pages_bulk mm/page_alloc: inline __rmqueue_pcplist Chuck Lever <chuck.lever@oracle.com>: Patch series "SUNRPC consumer for the bulk page allocator": SUNRPC: set rq_page_end differently SUNRPC: refresh rq_pages using a bulk page allocator Jesper Dangaard Brouer <brouer@redhat.com>: net: page_pool: refactor dma_map into own function page_pool_dma_map net: page_pool: use alloc_pages_bulk in refill code path Sergei Trofimovich 
<slyfox@gentoo.org>: mm: page_alloc: ignore init_on_free=1 for debug_pagealloc=1 huxiang <huxiang@uniontech.com>: mm/page_alloc: redundant definition variables of pfn in for loop Mike Rapoport <rppt@linux.ibm.com>: mm/mmzone.h: fix existing kernel-doc comments and link them to core-api Subsystem: mm/memory-failure Jane Chu <jane.chu@oracle.com>: mm/memory-failure: unnecessary amount of unmapping Documentation/admin-guide/kernel-parameters.txt | 7 Documentation/admin-guide/mm/transhuge.rst | 2 Documentation/core-api/cachetlb.rst | 4 Documentation/core-api/mm-api.rst | 6 Documentation/dev-tools/kasan.rst | 355 +++++----- Documentation/vm/page_owner.rst | 2 Documentation/vm/transhuge.rst | 5 MAINTAINERS | 1 arch/Kconfig | 11 arch/alpha/mm/init.c | 1 arch/arc/mm/init.c | 1 arch/arm/Kconfig | 1 arch/arm/include/asm/pgtable-3level.h | 2 arch/arm/include/asm/pgtable.h | 3 arch/arm/mm/copypage-v4mc.c | 1 arch/arm/mm/copypage-v6.c | 1 arch/arm/mm/copypage-xscale.c | 1 arch/arm/mm/init.c | 2 arch/arm64/Kconfig | 1 arch/arm64/include/asm/memory.h | 4 arch/arm64/include/asm/mte-kasan.h | 39 - arch/arm64/include/asm/vmalloc.h | 38 - arch/arm64/mm/init.c | 4 arch/arm64/mm/mmu.c | 36 - arch/csky/abiv1/cacheflush.c | 1 arch/csky/mm/init.c | 1 arch/h8300/mm/init.c | 2 arch/hexagon/mm/init.c | 1 arch/ia64/Kconfig | 23 arch/ia64/configs/bigsur_defconfig | 1 arch/ia64/include/asm/meminit.h | 11 arch/ia64/include/asm/module.h | 6 arch/ia64/include/asm/page.h | 25 arch/ia64/include/asm/pgtable.h | 7 arch/ia64/kernel/Makefile | 2 arch/ia64/kernel/acpi.c | 7 arch/ia64/kernel/efi.c | 11 arch/ia64/kernel/fsys.S | 4 arch/ia64/kernel/head.S | 6 arch/ia64/kernel/ia64_ksyms.c | 12 arch/ia64/kernel/machine_kexec.c | 2 arch/ia64/kernel/mca.c | 4 arch/ia64/kernel/module.c | 29 arch/ia64/kernel/pal.S | 6 arch/ia64/mm/Makefile | 1 arch/ia64/mm/contig.c | 4 arch/ia64/mm/discontig.c | 21 arch/ia64/mm/fault.c | 15 arch/ia64/mm/init.c | 221 ------ arch/m68k/mm/init.c | 1 arch/microblaze/mm/init.c | 1 
arch/mips/Kconfig | 1 arch/mips/loongson64/numa.c | 1 arch/mips/mm/cache.c | 1 arch/mips/mm/init.c | 1 arch/mips/sgi-ip27/ip27-memory.c | 1 arch/nds32/mm/init.c | 1 arch/nios2/mm/cacheflush.c | 1 arch/nios2/mm/init.c | 1 arch/openrisc/mm/init.c | 2 arch/parisc/mm/init.c | 2 arch/powerpc/Kconfig | 1 arch/powerpc/include/asm/vmalloc.h | 34 - arch/powerpc/kernel/isa-bridge.c | 4 arch/powerpc/kernel/pci_64.c | 2 arch/powerpc/mm/book3s64/radix_pgtable.c | 29 arch/powerpc/mm/ioremap.c | 2 arch/powerpc/mm/mem.c | 1 arch/powerpc/sysdev/xive/common.c | 4 arch/riscv/mm/init.c | 1 arch/s390/mm/init.c | 2 arch/sh/include/asm/tlb.h | 10 arch/sh/mm/cache-sh4.c | 1 arch/sh/mm/cache-sh7705.c | 1 arch/sh/mm/init.c | 1 arch/sparc/include/asm/pgtable_32.h | 3 arch/sparc/mm/init_32.c | 2 arch/sparc/mm/init_64.c | 1 arch/sparc/mm/tlb.c | 1 arch/um/kernel/mem.c | 1 arch/x86/Kconfig | 1 arch/x86/include/asm/vmalloc.h | 42 - arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 arch/x86/mm/init_32.c | 2 arch/x86/mm/init_64.c | 222 ++++-- arch/x86/mm/ioremap.c | 33 arch/x86/mm/pgtable.c | 13 arch/xtensa/Kconfig | 1 arch/xtensa/mm/init.c | 1 block/blk-cgroup.c | 17 drivers/gpu/drm/i915/Kconfig | 1 drivers/gpu/drm/i915/gem/i915_gem_mman.c | 9 drivers/gpu/drm/i915/i915_drv.h | 3 drivers/gpu/drm/i915/i915_mm.c | 117 --- drivers/infiniband/core/umem.c | 12 drivers/pci/pci.c | 2 fs/aio.c | 5 fs/fs_parser.c | 2 fs/iomap/direct-io.c | 24 fs/ocfs2/blockcheck.c | 2 fs/ocfs2/dlm/dlmrecovery.c | 7 fs/ocfs2/stack_o2cb.c | 36 - fs/ocfs2/stackglue.c | 2 include/linux/compiler-gcc.h | 8 include/linux/fs.h | 2 include/linux/gfp.h | 45 - include/linux/io-mapping.h | 3 include/linux/io.h | 9 include/linux/kasan.h | 51 + include/linux/memcontrol.h | 271 ++++---- include/linux/mm.h | 50 - include/linux/mmzone.h | 43 - include/linux/page-flags-layout.h | 64 - include/linux/pagemap.h | 10 include/linux/pagewalk.h | 4 include/linux/sched.h | 4 include/linux/slab.h | 2 include/linux/slub_def.h | 2 include/linux/vmalloc.h 
| 73 +- include/linux/vmstat.h | 24 include/net/page_pool.h | 2 include/trace/events/kmem.h | 24 init/main.c | 2 kernel/cgroup/cgroup.c | 34 - kernel/cgroup/rstat.c | 61 + kernel/dma/remap.c | 1 kernel/fork.c | 13 kernel/irq_work.c | 7 kernel/task_work.c | 3 kernel/watchdog.c | 102 +-- lib/Kconfig.debug | 14 lib/Makefile | 1 lib/test_kasan.c | 59 - lib/test_slub.c | 124 +++ lib/test_vmalloc.c | 128 +-- mm/Kconfig | 4 mm/Makefile | 1 mm/debug_vm_pgtable.c | 4 mm/dmapool.c | 2 mm/filemap.c | 61 + mm/gup.c | 145 +++- mm/hugetlb.c | 2 mm/internal.h | 25 mm/interval_tree.c | 2 mm/io-mapping.c | 29 mm/ioremap.c | 361 ++-------- mm/kasan/common.c | 53 - mm/kasan/generic.c | 12 mm/kasan/kasan.h | 28 mm/kasan/report_generic.c | 2 mm/kasan/shadow.c | 10 mm/kasan/sw_tags.c | 12 mm/kmemleak.c | 2 mm/memcontrol.c | 798 ++++++++++++------------ mm/memory-failure.c | 2 mm/memory.c | 191 +++-- mm/mempolicy.c | 78 -- mm/mempool.c | 4 mm/memremap.c | 2 mm/migrate.c | 2 mm/mm_init.c | 4 mm/mmap.c | 6 mm/mremap.c | 6 mm/msync.c | 6 mm/page-writeback.c | 9 mm/page_alloc.c | 430 +++++++++--- mm/page_counter.c | 8 mm/page_owner.c | 68 -- mm/page_poison.c | 6 mm/percpu-vm.c | 7 mm/slab.c | 43 - mm/slab.h | 24 mm/slab_common.c | 10 mm/slub.c | 215 ++---- mm/sparse.c | 1 mm/swap_state.c | 13 mm/util.c | 10 mm/vmalloc.c | 728 ++++++++++++++++----- net/core/page_pool.c | 127 ++- net/sunrpc/svc_xprt.c | 38 - samples/kfifo/bytestream-example.c | 8 samples/kfifo/inttype-example.c | 8 samples/kfifo/record-example.c | 8 samples/vfio-mdev/mdpy.c | 4 scripts/checkdeclares.pl | 53 + scripts/spelling.txt | 26 tools/testing/selftests/cgroup/test_kmem.c | 22 tools/testing/selftests/vm/mremap_dontunmap.c | 52 + tools/testing/selftests/vm/test_vmalloc.sh | 21 189 files changed, 3642 insertions(+), 3013 deletions(-)
The remainder of the main mm/ queue. 143 patches, based on 8ca5297e7e38f2dc8c753d33a5092e7be181fff0, plus previously sent patches. Subsystems affected by this patch series: mm/pagecache mm/hugetlb mm/userfaultfd mm/vmscan mm/compaction mm/migration mm/cma mm/ksm mm/vmstat mm/mmap mm/kconfig mm/util mm/memory-hotplug mm/zswap mm/zsmalloc mm/highmem mm/cleanups mm/kfence Subsystem: mm/pagecache "Matthew Wilcox (Oracle)" <willy@infradead.org>: Patch series "Remove nrexceptional tracking", v2: mm: introduce and use mapping_empty() mm: stop accounting shadow entries dax: account DAX entries as nrpages mm: remove nrexceptional from inode Hugh Dickins <hughd@google.com>: mm: remove nrexceptional from inode: remove BUG_ON Subsystem: mm/hugetlb Peter Xu <peterx@redhat.com>: Patch series "hugetlb: Disable huge pmd unshare for uffd-wp", v4: hugetlb: pass vma into huge_pte_alloc() and huge_pmd_share() hugetlb/userfaultfd: forbid huge pmd sharing when uffd enabled mm/hugetlb: move flush_hugetlb_tlb_range() into hugetlb.h hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp Miaohe Lin <linmiaohe@huawei.com>: mm/hugetlb: remove redundant reservation check condition in alloc_huge_page() Anshuman Khandual <anshuman.khandual@arm.com>: mm: generalize HUGETLB_PAGE_SIZE_VARIABLE Miaohe Lin <linmiaohe@huawei.com>: Patch series "Some cleanups for hugetlb": mm/hugetlb: use some helper functions to cleanup code mm/hugetlb: optimize the surplus state transfer code in move_hugetlb_state() mm/hugetlb_cgroup: remove unnecessary VM_BUG_ON_PAGE in hugetlb_cgroup_migrate() mm/hugetlb: simplify the code when alloc_huge_page() failed in hugetlb_no_page() mm/hugetlb: avoid calculating fault_mutex_hash in truncate_op case Patch series "Cleanup and fixup for khugepaged", v2: khugepaged: remove unneeded return value of khugepaged_collapse_pte_mapped_thps() khugepaged: reuse the smp_wmb() inside __SetPageUptodate() khugepaged: use helper khugepaged_test_exit() in __khugepaged_enter() 
khugepaged: fix wrong result value for trace_mm_collapse_huge_page_isolate() mm/huge_memory.c: remove unnecessary local variable ret2 Patch series "Some cleanups for huge_memory", v3: mm/huge_memory.c: rework the function vma_adjust_trans_huge() mm/huge_memory.c: make get_huge_zero_page() return bool mm/huge_memory.c: rework the function do_huge_pmd_numa_page() slightly mm/huge_memory.c: remove redundant PageCompound() check mm/huge_memory.c: remove unused macro TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG mm/huge_memory.c: use helper function migration_entry_to_page() Yanfei Xu <yanfei.xu@windriver.com>: mm/khugepaged.c: replace barrier() with READ_ONCE() for a selective variable Miaohe Lin <linmiaohe@huawei.com>: Patch series "Cleanup for khugepaged": khugepaged: use helper function range_in_vma() in collapse_pte_mapped_thp() khugepaged: remove unnecessary out label in collapse_huge_page() khugepaged: remove meaningless !pte_present() check in khugepaged_scan_pmd() Zi Yan <ziy@nvidia.com>: mm: huge_memory: a new debugfs interface for splitting THP tests mm: huge_memory: debugfs for file-backed THP split Miaohe Lin <linmiaohe@huawei.com>: Patch series "Cleanup and fixup for hugetlb", v2: mm/hugeltb: remove redundant VM_BUG_ON() in region_add() mm/hugeltb: simplify the return code of __vma_reservation_common() mm/hugeltb: clarify (chg - freed) won't go negative in hugetlb_unreserve_pages() mm/hugeltb: handle the error case in hugetlb_fix_reserve_counts() mm/hugetlb: remove unused variable pseudo_vma in remove_inode_hugepages() Mike Kravetz <mike.kravetz@oracle.com>: Patch series "make hugetlb put_page safe for all calling contexts", v5: mm/cma: change cma mutex to irq safe spinlock hugetlb: no need to drop hugetlb_lock to call cma_release hugetlb: add per-hstate mutex to synchronize user adjustments hugetlb: create remove_hugetlb_page() to separate functionality hugetlb: call update_and_free_page without hugetlb_lock hugetlb: change free_pool_huge_page to 
remove_pool_huge_page hugetlb: make free_huge_page irq safe hugetlb: add lockdep_assert_held() calls for hugetlb_lock Oscar Salvador <osalvador@suse.de>: Patch series "Make alloc_contig_range handle Hugetlb pages", v10: mm,page_alloc: bail out earlier on -ENOMEM in alloc_contig_migrate_range mm,compaction: let isolate_migratepages_{range,block} return error codes mm,hugetlb: drop clearing of flag from prep_new_huge_page mm,hugetlb: split prep_new_huge_page functionality mm: make alloc_contig_range handle free hugetlb pages mm: make alloc_contig_range handle in-use hugetlb pages mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig Subsystem: mm/userfaultfd Axel Rasmussen <axelrasmussen@google.com>: Patch series "userfaultfd: add minor fault handling", v9: userfaultfd: add minor fault registration mode userfaultfd: disable huge PMD sharing for MINOR registered VMAs userfaultfd: hugetlbfs: only compile UFFD helpers if config enabled userfaultfd: add UFFDIO_CONTINUE ioctl userfaultfd: update documentation to describe minor fault handling userfaultfd/selftests: add test exercising minor fault handling Subsystem: mm/vmscan Dave Hansen <dave.hansen@linux.intel.com>: mm/vmscan: move RECLAIM* bits to uapi header mm/vmscan: replace implicit RECLAIM_ZONE checks with explicit checks Yang Shi <shy828301@gmail.com>: Patch series "Make shrinker's nr_deferred memcg aware", v10: mm: vmscan: use nid from shrink_control for tracepoint mm: vmscan: consolidate shrinker_maps handling code mm: vmscan: use shrinker_rwsem to protect shrinker_maps allocation mm: vmscan: remove memcg_shrinker_map_size mm: vmscan: use kvfree_rcu instead of call_rcu mm: memcontrol: rename shrinker_map to shrinker_info mm: vmscan: add shrinker_info_protected() helper mm: vmscan: use a new flag to indicate shrinker is registered mm: vmscan: add per memcg shrinker nr_deferred mm: vmscan: use per memcg nr_deferred of shrinker mm: vmscan: don't need allocate shrinker->nr_deferred for memcg aware 
shrinkers mm: memcontrol: reparent nr_deferred when memcg offline mm: vmscan: shrink deferred objects proportional to priority Subsystem: mm/compaction Pintu Kumar <pintu@codeaurora.org>: mm/compaction: remove unused variable sysctl_compact_memory Charan Teja Reddy <charante@codeaurora.org>: mm: compaction: update the COMPACT[STALL|FAIL] events properly Subsystem: mm/migration Minchan Kim <minchan@kernel.org>: mm: disable LRU pagevec during the migration temporarily mm: replace migrate_[prep|finish] with lru_cache_[disable|enable] mm: fs: invalidate BH LRU during page migration Miaohe Lin <linmiaohe@huawei.com>: Patch series "Cleanup and fixup for mm/migrate.c", v3: mm/migrate.c: make putback_movable_page() static mm/migrate.c: remove unnecessary rc != MIGRATEPAGE_SUCCESS check in 'else' case mm/migrate.c: fix potential indeterminate pte entry in migrate_vma_insert_page() mm/migrate.c: use helper migrate_vma_collect_skip() in migrate_vma_collect_hole() Revert "mm: migrate: skip shared exec THP for NUMA balancing" Subsystem: mm/cma Minchan Kim <minchan@kernel.org>: mm: vmstat: add cma statistics Baolin Wang <baolin.wang@linux.alibaba.com>: mm: cma: use pr_err_ratelimited for CMA warning Liam Mark <lmark@codeaurora.org>: mm: cma: add trace events for CMA alloc perf testing Minchan Kim <minchan@kernel.org>: mm: cma: support sysfs mm: cma: add the CMA instance name to cma trace events mm: use proper type for cma_[alloc|release] Subsystem: mm/ksm Miaohe Lin <linmiaohe@huawei.com>: Patch series "Cleanup and fixup for ksm": ksm: remove redundant VM_BUG_ON_PAGE() on stable_tree_search() ksm: use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree() ksm: remove dedicated macro KSM_FLAG_MASK ksm: fix potential missing rmap_item for stable_node Chengyang Fan <cy.fan@huawei.com>: mm/ksm: remove unused parameter from remove_trailing_rmap_items() Subsystem: mm/vmstat Hugh Dickins <hughd@google.com>: mm: restore node stat checking in /proc/sys/vm/stat_refresh mm: no 
more EINVAL from /proc/sys/vm/stat_refresh mm: /proc/sys/vm/stat_refresh skip checking known negative stats mm: /proc/sys/vm/stat_refresh stop checking monotonic numa stats Saravanan D <saravanand@fb.com>: x86/mm: track linear mapping split events Subsystem: mm/mmap Liam Howlett <liam.howlett@oracle.com>: mm/mmap.c: don't unlock VMAs in remap_file_pages() Subsystem: mm/kconfig Anshuman Khandual <anshuman.khandual@arm.com>: Patch series "mm: some config cleanups", v2: mm: generalize ARCH_HAS_CACHE_LINE_SIZE mm: generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS) mm: generalize ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE] mm: drop redundant ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION mm: drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK mm: drop redundant HAVE_ARCH_TRANSPARENT_HUGEPAGE Subsystem: mm/util Joe Perches <joe@perches.com>: mm/util.c: reduce mem_dump_obj() object size Bhaskar Chowdhury <unixbhaskar@gmail.com>: mm/util.c: fix typo Subsystem: mm/memory-hotplug Pavel Tatashin <pasha.tatashin@soleen.com>: Patch series "prohibit pinning pages in ZONE_MOVABLE", v11: mm/gup: don't pin migrated cma pages in movable zone mm/gup: check every subpage of a compound page during isolation mm/gup: return an error on migration failure mm/gup: check for isolation errors mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN mm: apply per-task gfp constraints in fast path mm: honor PF_MEMALLOC_PIN for all movable pages mm/gup: do not migrate zero page mm/gup: migrate pinned pages out of movable zone memory-hotplug.rst: add a note about ZONE_MOVABLE and page pinning mm/gup: change index type to long as it counts pages mm/gup: longterm pin migration cleanup selftests/vm: gup_test: fix test flag selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages Mel Gorman <mgorman@techsingularity.net>: mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove Oscar Salvador <osalvador@suse.de>: Patch series "Allocate memmap from hotadded memory (per 
device)", v10: drivers/base/memory: introduce memory_block_{online,offline} mm,memory_hotplug: relax fully spanned sections check David Hildenbrand <david@redhat.com>: mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count() Oscar Salvador <osalvador@suse.de>: mm,memory_hotplug: allocate memmap from the added memory range acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported mm,memory_hotplug: add kernel boot option to enable memmap_on_memory x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE Subsystem: mm/zswap Zhiyuan Dai <daizhiyuan@phytium.com.cn>: mm/zswap.c: switch from strlcpy to strscpy Subsystem: mm/zsmalloc zhouchuangao <zhouchuangao@vivo.com>: mm/zsmalloc: use BUG_ON instead of if condition followed by BUG. Subsystem: mm/highmem Ira Weiny <ira.weiny@intel.com>: Patch series "btrfs: Convert kmap/memset/kunmap to memzero_user()": iov_iter: lift memzero_page() to highmem.h btrfs: use memzero_page() instead of open coded kmap pattern songqiang <songqiang@uniontech.com>: mm/highmem.c: fix coding style issue Subsystem: mm/cleanups Zhiyuan Dai <daizhiyuan@phytium.com.cn>: mm/mempool: minor coding style tweaks Zhang Yunkai <zhang.yunkai@zte.com.cn>: mm/process_vm_access.c: remove duplicate include Subsystem: mm/kfence Marco Elver <elver@google.com>: kfence: zero guard page after out-of-bounds access Patch series "kfence: optimize timer scheduling", v2: kfence: await for allocation using wait_event kfence: maximize allocation wait timeout duration kfence: use power-efficient work queue to run delayed work Documentation/ABI/testing/sysfs-kernel-mm-cma | 25 Documentation/admin-guide/kernel-parameters.txt | 17 Documentation/admin-guide/mm/memory-hotplug.rst | 9 Documentation/admin-guide/mm/userfaultfd.rst | 105 +- arch/arc/Kconfig | 9 arch/arm/Kconfig | 10 arch/arm64/Kconfig | 34 arch/arm64/mm/hugetlbpage.c | 7 arch/ia64/Kconfig | 14 arch/ia64/mm/hugetlbpage.c | 3 
arch/mips/Kconfig | 6 arch/mips/mm/hugetlbpage.c | 4 arch/parisc/Kconfig | 5 arch/parisc/mm/hugetlbpage.c | 2 arch/powerpc/Kconfig | 17 arch/powerpc/mm/hugetlbpage.c | 3 arch/powerpc/platforms/Kconfig.cputype | 16 arch/riscv/Kconfig | 5 arch/s390/Kconfig | 12 arch/s390/mm/hugetlbpage.c | 2 arch/sh/Kconfig | 7 arch/sh/mm/Kconfig | 8 arch/sh/mm/hugetlbpage.c | 2 arch/sparc/mm/hugetlbpage.c | 2 arch/x86/Kconfig | 33 arch/x86/mm/pat/set_memory.c | 8 drivers/acpi/acpi_memhotplug.c | 5 drivers/base/memory.c | 105 ++ fs/Kconfig | 5 fs/block_dev.c | 2 fs/btrfs/compression.c | 5 fs/btrfs/extent_io.c | 22 fs/btrfs/inode.c | 33 fs/btrfs/reflink.c | 6 fs/btrfs/zlib.c | 5 fs/btrfs/zstd.c | 5 fs/buffer.c | 36 fs/dax.c | 8 fs/gfs2/glock.c | 3 fs/hugetlbfs/inode.c | 9 fs/inode.c | 11 fs/proc/task_mmu.c | 3 fs/userfaultfd.c | 149 +++ include/linux/buffer_head.h | 4 include/linux/cma.h | 4 include/linux/compaction.h | 1 include/linux/fs.h | 2 include/linux/gfp.h | 2 include/linux/highmem.h | 7 include/linux/huge_mm.h | 3 include/linux/hugetlb.h | 37 include/linux/memcontrol.h | 27 include/linux/memory.h | 8 include/linux/memory_hotplug.h | 15 include/linux/memremap.h | 2 include/linux/migrate.h | 11 include/linux/mm.h | 28 include/linux/mmzone.h | 20 include/linux/pagemap.h | 5 include/linux/pgtable.h | 12 include/linux/sched.h | 2 include/linux/sched/mm.h | 27 include/linux/shrinker.h | 7 include/linux/swap.h | 21 include/linux/userfaultfd_k.h | 55 + include/linux/vm_event_item.h | 8 include/trace/events/cma.h | 92 +- include/trace/events/migrate.h | 25 include/trace/events/mmflags.h | 7 include/uapi/linux/mempolicy.h | 7 include/uapi/linux/userfaultfd.h | 36 init/Kconfig | 5 kernel/sysctl.c | 2 lib/Kconfig.kfence | 1 lib/iov_iter.c | 8 mm/Kconfig | 28 mm/Makefile | 6 mm/cma.c | 70 + mm/cma.h | 25 mm/cma_debug.c | 8 mm/cma_sysfs.c | 112 ++ mm/compaction.c | 113 ++ mm/filemap.c | 24 mm/frontswap.c | 12 mm/gup.c | 264 +++--- mm/gup_test.c | 29 mm/gup_test.h | 3 mm/highmem.c | 11 
mm/huge_memory.c | 326 +++++++- mm/hugetlb.c | 843 ++++++++++++++-------- mm/hugetlb_cgroup.c | 9 mm/internal.h | 10 mm/kfence/core.c | 61 + mm/khugepaged.c | 63 - mm/ksm.c | 17 mm/list_lru.c | 6 mm/memcontrol.c | 137 --- mm/memory_hotplug.c | 220 +++++ mm/mempolicy.c | 16 mm/mempool.c | 2 mm/migrate.c | 103 -- mm/mlock.c | 4 mm/mmap.c | 18 mm/oom_kill.c | 2 mm/page_alloc.c | 83 +- mm/process_vm_access.c | 1 mm/shmem.c | 2 mm/sparse.c | 4 mm/swap.c | 69 + mm/swap_state.c | 4 mm/swapfile.c | 4 mm/truncate.c | 19 mm/userfaultfd.c | 39 - mm/util.c | 26 mm/vmalloc.c | 2 mm/vmscan.c | 543 +++++++++----- mm/vmstat.c | 45 - mm/workingset.c | 1 mm/zsmalloc.c | 6 mm/zswap.c | 2 tools/testing/selftests/vm/.gitignore | 1 tools/testing/selftests/vm/Makefile | 1 tools/testing/selftests/vm/gup_test.c | 38 tools/testing/selftests/vm/split_huge_page_test.c | 400 ++++++++++ tools/testing/selftests/vm/userfaultfd.c | 164 ++++ 125 files changed, 3596 insertions(+), 1668 deletions(-)
On Tue, May 4, 2021 at 6:32 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 143 patches
Hmm. Only 140 seem to have made it to the list, with 103, 106 and 107 missing.
Maybe just some mail delay? But at least right now
https://lore.kernel.org/mm-commits/
doesn't show them (and thus 'b4' doesn't work).
I'll check again later.
Linus
On Tue, 4 May 2021 18:47:19 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Tue, May 4, 2021 at 6:32 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > 143 patches
>
> Hmm. Only 140 seem to have made it to the list, with 103, 106 and 107 missing.
>
> Maybe just some mail delay? But at least right now
>
> https://lore.kernel.org/mm-commits/
>
> doesn't show them (and thus 'b4' doesn't work).
>
> I'll check again later.
>
Well that's strange. I see all three via cc:me, but not on linux-mm or
mm-commits.
Let me resend right now with the same in-reply-to. Hopefully they will
land in the correct place.
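For readers unfamiliar with how a resend lands in the same thread: archives such as lore group messages by the In-Reply-To and References headers. The sketch below shows one way to build such a message with Python's standard email library. It is not how the resend was actually done; the function name build_resend, the body text, and the recipient list are illustrative assumptions, while the subject and the cover-letter Message-ID are copied from the attached message later in this thread.

```python
from email.message import EmailMessage


def build_resend(subject, body, in_reply_to, sender, recipients):
    """Build a message that threads under an existing Message-ID.

    Hypothetical helper for illustration only.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    # Both headers point at the cover letter so that threaded archives
    # file the resend under the original series.
    msg["In-Reply-To"] = in_reply_to
    msg["References"] = in_reply_to
    msg.set_content(body)
    return msg


# Message-ID of the "143 patches" cover letter, as it appears in the
# In-Reply-To header of the attached message.
cover_id = "<20210504183219.a3cc46aee4013d77402276c5@linux-foundation.org>"

msg = build_resend(
    subject="[patch 103/143] mm: generalize SYS_SUPPORTS_HUGETLBFS "
            "(rename as ARCH_SUPPORTS_HUGETLBFS)",
    body="(patch text goes here)\n",  # placeholder body, not the real patch
    in_reply_to=cover_id,
    sender="akpm@linux-foundation.org",
    recipients=["mm-commits@vger.kernel.org", "linux-mm@kvack.org"],
)
print(msg["In-Reply-To"])
```

Threading only helps if the list server accepts the message in the first place, which is the failure discussed in the rest of this thread.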
On Tue, May 4, 2021 at 8:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Let me resend right now with the same in-reply-to. Hopefully they will
> land in the correct place.
Well, you re-sent it twice, and I have three copies in my own mailbox,
but they still don't show up on the mm-commits mailing list.
So the list hates them for some odd reason.
I've picked them up locally, but adding Konstantin to the participants
to see if he can see what's up.
Konstantin: patches 103/106/107 are missing on lore out of Andrew's
series of 143. Odd.
Linus
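The gap check described above (140 of 143 patches archived, with 103, 106 and 107 absent) can be reproduced mechanically from the Subject lines. The following is a minimal sketch, assuming subjects of the form "[patch NNN/143] ..." as seen in the attached message; the function name missing_patches and the sample data are hypothetical, and this is not how 'b4' itself works.

```python
import re

# Matches subjects such as "[patch 103/143] mm: ..." and captures the
# patch index and the series total.
SUBJECT_RE = re.compile(r"\[patch\s+(\d+)/(\d+)\]")


def missing_patches(subjects):
    """Return the sorted patch numbers absent from a list of Subject lines."""
    seen = set()
    total = 0
    for subject in subjects:
        match = SUBJECT_RE.search(subject)
        if match:
            seen.add(int(match.group(1)))
            total = max(total, int(match.group(2)))
    return [n for n in range(1, total + 1) if n not in seen]


# Hypothetical archive contents: every patch of a 143-patch series
# except the three reported missing.
archived = [
    f"[patch {n:03d}/143] example subject"
    for n in range(1, 144)
    if n not in (103, 106, 107)
]
print(missing_patches(archived))  # prints [103, 106, 107]
```

The series total is taken from the subjects themselves, so the check would not notice a case where no patch of the series reached the archive at all.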
[-- Attachment #1: Type: text/plain, Size: 1387 bytes --]
On Wed, 5 May 2021 10:10:33 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Tue, May 4, 2021 at 8:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > Let me resend right now with the same in-reply-to. Hopefully they will
> > land in the correct place.
>
> Well, you re-sent it twice, and I have three copies in my own mailbox,
> but they still don't show up on the mm-commits mailing list.
>
> So the list hates them for some odd reason.
>
> I've picked them up locally, but adding Konstantin to the participants
> to see if he can see what's up.
>
> Konstantin: patches 103/106/107 are missing on lore out of Andrew's
> series of 143. Odd.
It's weird. They don't turn up on linux-mm either, and that's running
at kvack.org, also majordomo. They don't get through when sent with
either heirloom-mailx or with sylpheed.
Also, it seems that when Anshuman originally sent the patch, linux-mm
and linux-kernel didn't send it back out. So perhaps a spam filter
triggered?
I'm seeing
https://lore.kernel.org/linux-arm-kernel/1615278790-18053-3-git-send-email-anshuman.khandual@arm.com/
which is via linux-arm-kernel@lists.infradead.org but the linux-kernel
server massacred that patch series. Searching
https://lkml.org/lkml/2021/3/9 for "anshuman" only shows 3 of the 7
email series.
One of the emails (as sent by me) is attached, if that helps.
[-- Attachment #2: x.txt --] [-- Type: text/plain, Size: 21048 bytes --] Return-Path: <akpm@linux-foundation.org> X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on y X-Spam-Level: (none) X-Spam-Status: No, score=-101.5 required=2.5 tests=BAYES_00,T_DKIM_INVALID, USER_IN_WHITELIST autolearn=ham autolearn_force=no version=3.4.1 Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by localhost.localdomain (8.15.2/8.15.2/Debian-8ubuntu1) with ESMTP id 1453H2fk032202 for <akpm@localhost>; Tue, 4 May 2021 20:17:03 -0700 Received: from imap.fastmail.com [66.111.4.135] by localhost.localdomain with IMAP (fetchmail-6.3.26) for <akpm@localhost> (single-drop); Tue, 04 May 2021 20:17:03 -0700 (PDT) Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by sloti11d1t06 (Cyrus 3.5.0-alpha0-442-g5daca166b9-fm-20210428.001-g5daca166) with LMTPA; Tue, 04 May 2021 23:16:31 -0400 X-Cyrus-Session-Id: sloti11d1t06-1620184591-1699471-2-6359664467419938249 X-Sieve: CMU Sieve 3.0 X-Resolved-to: akpm@mbx.kernel.org X-Delivered-to: akpm@mbx.kernel.org X-Mail-from: akpm@linux-foundation.org Received: from mx6 ([10.202.2.205]) by compute1.internal (LMTPProxy); Tue, 04 May 2021 23:16:31 -0400 Received: from mx6.messagingengine.com (localhost [127.0.0.1]) by mailmx.nyi.internal (Postfix) with ESMTP id 40796C800E1 for <akpm@mbx.kernel.org>; Tue, 4 May 2021 23:16:31 -0400 (EDT) Received: from mx6.messagingengine.com (localhost [127.0.0.1]) by mx6.messagingengine.com (Authentication Milter) with ESMTP id 14870833D7F; Tue, 4 May 2021 23:16:31 -0400 ARC-Seal: i=2; a=rsa-sha256; cv=pass; d=messagingengine.com; s=fm2; t= 1620184591; b=FBo7Gf3JFN+4QYg5Byan0oNm6RESv+sIf5HcaslVNsUd9SOTGS yI0+IsXr1CUpGH783hE6fmgEq9SyfOwQVZjdikLaJS1+7u0JtfAYQFU3RORCtXlr djJWrScfjVa8nAHX4rQCtzvtPYuzx5w7cTgGgeILgoJMxgLj7EC9xcT8BIf68+9W Lw+ohAmcuiKhL2ez+de4SMuwdh3dh2FwAIHQOsSjEU1/NV+WGxMLwYbxWgTrqQGH RQIzFNdq30qslW9huK47+e80uHOX2tXwxtshwbThFEn458bdV5LL6Y8Oh4ZWMbv1 
tFgTt515DVedonZknxc07XsXtAjaJyB8bfHw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=date:from:to:subject:message-id :in-reply-to; s=fm2; t=1620184591; bh=LuH7mbm3+zp863vKBEqKeoZtnp uFxYpIb5oTVwf56Es=; b=m5E1fbz2b+an/X406oY3BuG0Zm4/W05vWAki8Lsnud gPCc1LfPUFSuXaMppcEDPbLKprp4hH3T52itK4pivXMQCLEOyme7kVStaLMVTiky Xxqh5ZdhOWvygBfda/GjfuLBSbbj2gfm8HPKpbL7CA5foelknIBhJHDzGkJyxetZ YagZfVvtdo2OEwnC1mmjUCpKPO5+m5kaZO0ol6rPdl+TV0MKGhjLg+/i6Ia+0nFp zDwV4VeACvVcGb2xY7KG5Z+BtqVxeVFn+w5JcqpWUtxEKoSBR4bWARzjwHg6eouh 7psOOKPTt/NzDKk+3f49lso5KlPiTF2xEU/+5SIttCkQ== ARC-Authentication-Results: i=2; mx6.messagingengine.com; arc=pass (as.1.google.com=pass, ams.1.google.com=pass) smtp.remote-ip=209.85.215.198; bimi=skipped (DMARC did not pass); dkim=pass (1024-bit rsa key sha256) header.d=linux-foundation.org header.i=@linux-foundation.org header.b=Gdz/3wY9 header.a=rsa-sha256 header.s=korg x-bits=1024; dmarc=none policy.published-domain-policy=none policy.applied-disposition=none policy.evaluated-disposition=none (p=none,d=none,d.eval=none) policy.policy-from=p header.from=linux-foundation.org; iprev=pass smtp.remote-ip=209.85.215.198 (mail-pg1-f198.google.com); spf=pass smtp.mailfrom=akpm@linux-foundation.org smtp.helo=mail-pg1-f198.google.com; x-aligned-from=pass (Address match); x-arc-spf=pass (google.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org x-arc-instance=1 x-arc-domain=google.com (Trusted from aar.1.google.com); x-csa=none; x-google-dkim=fail (message has been altered, 2048-bit rsa key) header.d=1e100.net header.i=@1e100.net header.b=VZuDOxUf; x-me-sender=none; x-ptr=pass smtp.helo=mail-pg1-f198.google.com policy.ptr=mail-pg1-f198.google.com; x-return-mx=pass header.domain=linux-foundation.org policy.is_org=yes (MX Records found: ASPMX.L.GOOGLE.COM,ALT1.ASPMX.L.GOOGLE.COM,ALT2.ASPMX.L.GOOGLE.COM,ALT3.ASPMX.L.GOOGLE.COM,ALT4.ASPMX.L.GOOGLE.COM); 
x-return-mx=pass smtp.domain=linux-foundation.org policy.is_org=yes (MX Records found: ASPMX.L.GOOGLE.COM,ALT1.ASPMX.L.GOOGLE.COM,ALT2.ASPMX.L.GOOGLE.COM,ALT3.ASPMX.L.GOOGLE.COM,ALT4.ASPMX.L.GOOGLE.COM); x-tls=pass smtp.version=TLSv1.3 smtp.cipher=TLS_AES_256_GCM_SHA384 smtp.bits=256/256; x-vs=clean score=40 state=0 Authentication-Results: mx6.messagingengine.com; arc=pass (as.1.google.com=pass, ams.1.google.com=pass) smtp.remote-ip=209.85.215.198; bimi=skipped (DMARC did not pass); dkim=pass (1024-bit rsa key sha256) header.d=linux-foundation.org header.i=@linux-foundation.org header.b=Gdz/3wY9 header.a=rsa-sha256 header.s=korg x-bits=1024; dmarc=none policy.published-domain-policy=none policy.applied-disposition=none policy.evaluated-disposition=none (p=none,d=none,d.eval=none) policy.policy-from=p header.from=linux-foundation.org; iprev=pass smtp.remote-ip=209.85.215.198 (mail-pg1-f198.google.com); spf=pass smtp.mailfrom=akpm@linux-foundation.org smtp.helo=mail-pg1-f198.google.com; x-aligned-from=pass (Address match); x-arc-spf=pass (google.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org x-arc-instance=1 x-arc-domain=google.com (Trusted from aar.1.google.com); x-csa=none; x-google-dkim=fail (message has been altered, 2048-bit rsa key) header.d=1e100.net header.i=@1e100.net header.b=VZuDOxUf; x-me-sender=none; x-ptr=pass smtp.helo=mail-pg1-f198.google.com policy.ptr=mail-pg1-f198.google.com; x-return-mx=pass header.domain=linux-foundation.org policy.is_org=yes (MX Records found: ASPMX.L.GOOGLE.COM,ALT1.ASPMX.L.GOOGLE.COM,ALT2.ASPMX.L.GOOGLE.COM,ALT3.ASPMX.L.GOOGLE.COM,ALT4.ASPMX.L.GOOGLE.COM); x-return-mx=pass smtp.domain=linux-foundation.org policy.is_org=yes (MX Records found: ASPMX.L.GOOGLE.COM,ALT1.ASPMX.L.GOOGLE.COM,ALT2.ASPMX.L.GOOGLE.COM,ALT3.ASPMX.L.GOOGLE.COM,ALT4.ASPMX.L.GOOGLE.COM); x-tls=pass smtp.version=TLSv1.3 smtp.cipher=TLS_AES_256_GCM_SHA384 smtp.bits=256/256; 
x-vs=clean score=40 state=0 X-ME-VSCause: gggruggvucftvghtrhhoucdtuddrgeduledrvdefjedgieegucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggvpdfu rfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucgoufhorhhtvggutfgvtg hiphdvucdlgedtmdenucfjughrpeffhffvuffkjggfsedttdertddtredtnecuhfhrohhm peetnhgurhgvficuofhorhhtohhnuceorghkphhmsehlihhnuhigqdhfohhunhgurghtih honhdrohhrgheqnecuggftrfgrthhtvghrnhepjeevfeduveffvddvudetkefhgeduveeu geevvdfhhfevhfekkedtieefgfduheeinecuffhomhgrihhnpehkvghrnhgvlhdrohhrgh enucfkphepvddtledrkeehrddvudehrdduleekpdduleekrddugeehrddvledrleelnecu uegrugftvghpuhhtkfhppeduleekrddugeehrddvledrleelnecuvehluhhsthgvrhfuih iivgeptdenucfrrghrrghmpehinhgvthepvddtledrkeehrddvudehrdduleekpdhhvghl ohepmhgrihhlqdhpghduqdhfudelkedrghhoohhglhgvrdgtohhmpdhmrghilhhfrhhomh epoegrkhhpmheslhhinhhugidqfhhouhhnuggrthhiohhnrdhorhhgqe X-ME-VSScore: 40 X-ME-VSCategory: clean X-ME-CSA: none Received-SPF: pass (linux-foundation.org: Sender is authorized to use 'akpm@linux-foundation.org' in 'mfrom' identity (mechanism 'include:_spf.google.com' matched)) receiver=mx6.messagingengine.com; identity=mailfrom; envelope-from="akpm@linux-foundation.org"; helo=mail-pg1-f198.google.com; client-ip=209.85.215.198 Received: from mail-pg1-f198.google.com (mail-pg1-f198.google.com [209.85.215.198]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx6.messagingengine.com (Postfix) with ESMTPS for <akpm@mbx.kernel.org>; Tue, 4 May 2021 23:16:31 -0400 (EDT) Received: by mail-pg1-f198.google.com with SMTP id g5-20020a63f4050000b02901f6c7b9a6d0so593624pgi.5 for <akpm@mbx.kernel.org>; Tue, 04 May 2021 20:16:30 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:dkim-signature:date:from:to:subject:message-id :in-reply-to:user-agent; 
bh=LuH7mbm3+zp863vKBEqKeoZtnpuFxYpIb5oTVwf56Es=; b=VZuDOxUfeHXJz1/CiFfcxuMVHkmW5RznvqYS+Py8Ub6nHHXprQJGE9Ze3WgH+1ylSe NJLEC7xgv15SR9A+e/MT4RTj3OVOwtd1Zi2vPav39a9K4tP+2uL2Ei+5d7FtT3LLZsjo feek/DqCGSkJ/EC5woLyU9BBkfLUuQ9/2HiDCk10BMetEfWdor69Slb39NOXES8br02X 25Btabu9ZCWroyjQj7W5gwGr5Z6Hs2nbnnfAb+e92FalcUD/4ql77lNzRcWGi4/9TT8s ntqI2g46Xv+k5LURaRH5CRBpxkkKgzcrioRPYFUHkEgOEWy1hPzg9QPk8ZO35Xm9R9d2 vl3Q== X-Gm-Message-State: AOAM531IlYUTVWcMrsTunnxZWB7SKeeOmoZj5mZ1A5tl7N/JlZUueN8L tvyRKnvxHr6a5mDaGHN9Tb1N/iCzT0U5oQgRVTxTnj1qFGibRa9+leLQNKX0aGlNg9JiaMfromb xyOlCUpVXOlVvchuwTUSTn7rXum+Hh3PWQZm5II/EX+0AkzKqez62Z8U= X-Received: by 2002:a17:90a:a581:: with SMTP id b1mr32203271pjq.53.1620184589161; Tue, 04 May 2021 20:16:29 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxffoGdRqAjUagWoMVD5p/Lk1KTEDftEhkWh8ewatgDmZLlxh0lO1hxYIdYYwoO5dsJ/i0z X-Received: by 2002:a17:90a:a581:: with SMTP id b1mr32203198pjq.53.1620184588109; Tue, 04 May 2021 20:16:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1620184588; cv=none; d=google.com; s=arc-20160816; b=Fr2b2AMXJr6OeNpSql45tq1korkuDOunp7t+DpARuEBnwvQnKfagyipQ93jywsRf/c /i/mP2eTmJwOLWNORClh1MGF/0VfBx1ULoB9W4CI3LpVgGFXGGFis8LTcvUYD5yvhlsV 50rm2j34iS9lyo04FB/hbhGkwLtUhz2PGkLGuqHspTd+pUpUCf5SLxGJbZC5uCcUEsbO 8WSDBWyvaCPjFzJQZK60gK70ticKW+fCG1xHtOG4qsFCbqEpFKBy8eVK83OBazo/dQDr DOheWNWyw2o/WMP4GpZMvZuj30dx3j8xnBahIpnMIQJaog6wLMcVX9pkQ8UJym3/PGNm pO/g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=user-agent:in-reply-to:message-id:subject:to:from:date :dkim-signature; bh=LuH7mbm3+zp863vKBEqKeoZtnpuFxYpIb5oTVwf56Es=; b=vVN16NPMKjoxSJQ6b36VXFCkZqnmG7wABfilgE069txZqmHpEMyZb8lRStkHy557LM Kn7UfJFP3xwsP8ZTCipVDZ6tpFW/hYFU9o4th9G8asWs+MOf9xpWX2LQZ1FTmaao2Fg5 uCHypz39cnAh0Z1EJfNsTcaTGIrkbBd6zje+mtBgs8hnfH8HcWBYTPCHCCx950Z928tb XOPd/Igs7yzD1ioBiGXZj/ciwPbWVTaZXBg4JOZSApxkDMfuMyfyLLOs++EVkyxJHUme TmgwvLkixcwEtKF7gIeqEhwvOUSVvilLuJLFVaLumwTcjJ1amVfGcJhBE7LIM9C3SMpA rOOg== ARC-Authentication-Results: i=1; mx.google.com; 
dkim=pass header.i=@linux-foundation.org header.s=korg header.b="Gdz/3wY9"; spf=pass (google.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org Received: from mail.kernel.org (mail.kernel.org. [198.145.29.99]) by mx.google.com with ESMTPS id c85si20173199pfb.8.2021.05.04.20.16.27 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 04 May 2021 20:16:28 -0700 (PDT) Received-SPF: pass (google.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) client-ip=198.145.29.99; Authentication-Results: mx.google.com; dkim=pass header.i=@linux-foundation.org header.s=korg header.b="Gdz/3wY9"; spf=pass (google.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org Received: by mail.kernel.org (Postfix) with ESMTPSA id A4DB4610D2; Wed, 5 May 2021 03:16:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1620184587; bh=TxN4wgKcKf2UUem+5pL09m9GL/7U592mEalo2U6vwAU=; h=Date:From:To:Subject:In-Reply-To:From; b=Gdz/3wY9ktH3hOmn2DAOkfh0JXwPdMJ8xsNQFa9eI25K39Z3iHdRGo9jX3QtMDtog D4Zakt52CQCYsV91c9oCai8KnCTkkAjJq/Ez7p8UHpz97Go3yYYxqg6DDl6d8HCQvN H47dTaZAgeH2sw29bjB9fRzNuTx7k4RAPlqZIpiE= Date: Tue, 04 May 2021 20:16:26 -0700 From: Andrew Morton <akpm@linux-foundation.org> To: akpm@linux-foundation.org, anshuman.khandual@arm.com, aou@eecs.berkeley.edu, arnd@arndb.de, benh@kernel.crashing.org, borntraeger@de.ibm.com, bp@alien8.de, catalin.marinas@arm.com, dalias@libc.org, deller@gmx.de, gor@linux.ibm.com, hca@linux.ibm.com, hpa@zytor.com, James.Bottomley@HansenPartnership.com, linux-mm@kvack.org, linux@armlinux.org.uk, mingo@redhat.com, mm-commits@vger.kernel.org, mpe@ellerman.id.au, palmerdabbelt@google.com, paul.walmsley@sifive.com, paulus@samba.org, tglx@linutronix.de, torvalds@linux-foundation.org, tsbogend@alpha.franken.de, 
vgupta@synopsys.com, viro@zeniv.linux.org.uk, will@kernel.org, ysato@users.osdn.me Subject: [patch 103/143] mm: generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS) Message-ID: <20210505031626.c8o4WL7KE%akpm@linux-foundation.org> In-Reply-To: <20210504183219.a3cc46aee4013d77402276c5@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Gm-Original-To: akpm@linux-foundation.org From: Anshuman Khandual <anshuman.khandual@arm.com> Subject: mm: generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS) SYS_SUPPORTS_HUGETLBFS config has duplicate definitions on platforms that subscribe it. Instead, just make it a generic option which can be selected on applicable platforms. Also rename it as ARCH_SUPPORTS_HUGETLBFS instead. This reduces code duplication and makes it cleaner. Link: https://lkml.kernel.org/r/1617259448-22529-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> [riscv] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. 
Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/arm/Kconfig | 5 +---- arch/arm64/Kconfig | 4 +--- arch/mips/Kconfig | 6 +----- arch/parisc/Kconfig | 5 +---- arch/powerpc/Kconfig | 3 --- arch/powerpc/platforms/Kconfig.cputype | 6 +++--- arch/riscv/Kconfig | 5 +---- arch/sh/Kconfig | 5 +---- fs/Kconfig | 5 ++++- 9 files changed, 13 insertions(+), 31 deletions(-) --- a/arch/arm64/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs +++ a/arch/arm64/Kconfig @@ -73,6 +73,7 @@ config ARM64 select ARCH_USE_QUEUED_SPINLOCKS select ARCH_USE_SYM_ANNOTATIONS select ARCH_SUPPORTS_DEBUG_PAGEALLOC + select ARCH_SUPPORTS_HUGETLBFS select ARCH_SUPPORTS_MEMORY_FAILURE select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK select ARCH_SUPPORTS_LTO_CLANG if CPU_LITTLE_ENDIAN @@ -1072,9 +1073,6 @@ config HW_PERF_EVENTS def_bool y depends on ARM_PMU -config SYS_SUPPORTS_HUGETLBFS - def_bool y - config ARCH_HAS_FILTER_PGPROT def_bool y --- a/arch/arm/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs +++ a/arch/arm/Kconfig @@ -31,6 +31,7 @@ config ARM select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7 select ARCH_SUPPORTS_ATOMIC_RMW + select ARCH_SUPPORTS_HUGETLBFS if ARM_LPAE select ARCH_USE_BUILTIN_BSWAP select ARCH_USE_CMPXCHG_LOCKREF select ARCH_USE_MEMTEST @@ -1511,10 +1512,6 @@ config HW_PERF_EVENTS def_bool y depends on ARM_PMU -config SYS_SUPPORTS_HUGETLBFS - def_bool y - depends on ARM_LPAE - config HAVE_ARCH_TRANSPARENT_HUGEPAGE def_bool y depends on ARM_LPAE --- a/arch/mips/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs +++ a/arch/mips/Kconfig @@ -19,6 +19,7 @@ config MIPS select ARCH_USE_MEMTEST select 
ARCH_USE_QUEUED_RWLOCKS select ARCH_USE_QUEUED_SPINLOCKS + select ARCH_SUPPORTS_HUGETLBFS if CPU_SUPPORTS_HUGEPAGES select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU select ARCH_WANT_IPC_PARSE_VERSION select ARCH_WANT_LD_ORPHAN_WARN @@ -1287,11 +1288,6 @@ config SYS_SUPPORTS_BIG_ENDIAN config SYS_SUPPORTS_LITTLE_ENDIAN bool -config SYS_SUPPORTS_HUGETLBFS - bool - depends on CPU_SUPPORTS_HUGEPAGES - default y - config MIPS_HUGE_TLB_SUPPORT def_bool HUGETLB_PAGE || TRANSPARENT_HUGEPAGE --- a/arch/parisc/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs +++ a/arch/parisc/Kconfig @@ -12,6 +12,7 @@ config PARISC select ARCH_HAS_STRICT_KERNEL_RWX select ARCH_HAS_UBSAN_SANITIZE_ALL select ARCH_NO_SG_CHAIN + select ARCH_SUPPORTS_HUGETLBFS if PA20 select ARCH_SUPPORTS_MEMORY_FAILURE select DMA_OPS select RTC_CLASS @@ -138,10 +139,6 @@ config PGTABLE_LEVELS default 3 if 64BIT && PARISC_PAGE_SIZE_4KB default 2 -config SYS_SUPPORTS_HUGETLBFS - def_bool y if PA20 - - menu "Processor type and features" choice --- a/arch/powerpc/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs +++ a/arch/powerpc/Kconfig @@ -697,9 +697,6 @@ config ARCH_SPARSEMEM_DEFAULT def_bool y depends on PPC_BOOK3S_64 -config SYS_SUPPORTS_HUGETLBFS - bool - config ILLEGAL_POINTER_VALUE hex # This is roughly half way between the top of user space and the bottom --- a/arch/powerpc/platforms/Kconfig.cputype~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs +++ a/arch/powerpc/platforms/Kconfig.cputype @@ -40,8 +40,8 @@ config PPC_85xx config PPC_8xx bool "Freescale 8xx" + select ARCH_SUPPORTS_HUGETLBFS select FSL_SOC - select SYS_SUPPORTS_HUGETLBFS select PPC_HAVE_KUEP select PPC_HAVE_KUAP select HAVE_ARCH_VMAP_STACK @@ -95,9 +95,9 @@ config PPC_BOOK3S_64 bool "Server processors" select PPC_FPU select PPC_HAVE_PMU_SUPPORT - select SYS_SUPPORTS_HUGETLBFS select HAVE_ARCH_TRANSPARENT_HUGEPAGE select ARCH_ENABLE_THP_MIGRATION if 
TRANSPARENT_HUGEPAGE + select ARCH_SUPPORTS_HUGETLBFS select ARCH_SUPPORTS_NUMA_BALANCING select IRQ_WORK select PPC_MM_SLICES @@ -278,9 +278,9 @@ config FSL_BOOKE # this is for common code between PPC32 & PPC64 FSL BOOKE config PPC_FSL_BOOK3E bool + select ARCH_SUPPORTS_HUGETLBFS if PHYS_64BIT || PPC64 select FSL_EMB_PERFMON select PPC_SMP_MUXED_IPI - select SYS_SUPPORTS_HUGETLBFS if PHYS_64BIT || PPC64 select PPC_DOORBELL default y if FSL_BOOKE --- a/arch/riscv/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs +++ a/arch/riscv/Kconfig @@ -30,6 +30,7 @@ config RISCV select ARCH_HAS_STRICT_KERNEL_RWX if MMU select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT + select ARCH_SUPPORTS_HUGETLBFS if MMU select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU select ARCH_WANT_FRAME_POINTERS select ARCH_WANT_HUGE_PMD_SHARE if 64BIT @@ -165,10 +166,6 @@ config ARCH_WANT_GENERAL_HUGETLB config ARCH_SUPPORTS_UPROBES def_bool y -config SYS_SUPPORTS_HUGETLBFS - depends on MMU - def_bool y - config STACKTRACE_SUPPORT def_bool y --- a/arch/sh/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs +++ a/arch/sh/Kconfig @@ -101,9 +101,6 @@ config SYS_SUPPORTS_APM_EMULATION bool select ARCH_SUSPEND_POSSIBLE -config SYS_SUPPORTS_HUGETLBFS - bool - config SYS_SUPPORTS_SMP bool @@ -175,12 +172,12 @@ config CPU_SH3 config CPU_SH4 bool + select ARCH_SUPPORTS_HUGETLBFS if MMU select CPU_HAS_INTEVT select CPU_HAS_SR_RB select CPU_HAS_FPU if !CPU_SH4AL_DSP select SH_INTC select SYS_SUPPORTS_SH_TMU - select SYS_SUPPORTS_HUGETLBFS if MMU config CPU_SH4A bool --- a/fs/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs +++ a/fs/Kconfig @@ -223,10 +223,13 @@ config TMPFS_INODE64 If unsure, say N. 
+config ARCH_SUPPORTS_HUGETLBFS + def_bool n + config HUGETLBFS bool "HugeTLB file system support" depends on X86 || IA64 || SPARC64 || (S390 && 64BIT) || \ - SYS_SUPPORTS_HUGETLBFS || BROKEN + ARCH_SUPPORTS_HUGETLBFS || BROKEN help hugetlbfs is a filesystem backing for HugeTLB pages, based on ramfs. For architectures that support it, say Y here and read _
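A rename such as SYS_SUPPORTS_HUGETLBFS -> ARCH_SUPPORTS_HUGETLBFS can be checked mechanically for leftovers. The following is only a minimal sketch, assuming a checked-out source tree; `find_stale_symbol` is a hypothetical helper written for illustration and is not part of the kernel tree or of this patch.

```python
import os
import re


def find_stale_symbol(root, symbol):
    """Yield (path, line_number, line) for Kconfig lines mentioning `symbol`.

    Only files whose name starts with "Kconfig" are scanned (Kconfig,
    Kconfig.cputype, ...). The match is on whole words, so a longer
    symbol that merely contains `symbol` is not reported.
    """
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.startswith("Kconfig"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if pattern.search(line):
                        yield path, number, line.rstrip("\n")
```

Calling `list(find_stale_symbol(".", "SYS_SUPPORTS_HUGETLBFS"))` from the top of a tree with the patch applied would be expected to return an empty list if every user was converted; that expectation is an assumption about the tree, not something this sketch verifies.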
On 5/5/21 11:14 PM, Andrew Morton wrote:
> On Wed, 5 May 2021 10:10:33 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>> On Tue, May 4, 2021 at 8:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>>> Let me resend right now with the same in-reply-to. Hopefully they will
>>> land in the correct place.
>> Well, you re-sent it twice, and I have three copies in my own mailbox,
>> but they still don't show up on the mm-commits mailing list.
>>
>> So the list hates them for some odd reason.
>>
>> I've picked them up locally, but adding Konstantin to the participants
>> to see if he can see what's up.
>>
>> Konstantin: patches 103/106/107 are missing on lore out of Andrew's
>> series of 143. Odd.
> It's weird. They don't turn up on linux-mm either, and that's running
> at kvack.org, also majordomo. They don't get through when sent with
> either heirloom-mailx or with sylpheed.
>
> Also, it seems that when Anshuman originally sent the patch, linux-mm
> and linux-kernel didn't send it back out. So perhaps a spam filter
> triggered?
>
> I'm seeing
>
> https://lore.kernel.org/linux-arm-kernel/1615278790-18053-3-git-send-email-anshuman.khandual@arm.com/
>
> which is via linux-arm-kernel@lists.infradead.org but the linux-kernel
> server massacred that patch series. Searching
> https://lkml.org/lkml/2021/3/9 for "anshuman" only shows 3 of the 7
> email series.
Yeah, these patches have had problems getting onto the MM/LKML
lists from the very beginning, for some strange reason.
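The gap described in this thread (patches 103/106/107 absent from a series of 143) can be spotted from the Subject lines alone. A minimal sketch, assuming the subjects carry a "[patch i/N]" tag as in this series; `missing_patches` is a hypothetical helper for illustration and does not query lore or any mail server.

```python
import re

# Matches subjects such as "[patch 103/143] mm: ..." (the tag format used in this series).
PATCH_TAG = re.compile(r"\[patch\s+(\d+)/(\d+)\]", re.IGNORECASE)


def missing_patches(subjects):
    """Return the sorted patch numbers absent from an N-patch series.

    `subjects` is any iterable of Subject strings. Lines without a
    "[patch i/N]" tag are ignored. The series length N is taken from the
    first tagged subject; tags with a different N are skipped.
    """
    seen = set()
    total = None
    for subject in subjects:
        match = PATCH_TAG.search(subject)
        if not match:
            continue
        index, count = int(match.group(1)), int(match.group(2))
        if total is None:
            total = count
        if count != total:
            continue
        seen.add(index)
    if total is None:
        return []
    return [n for n in range(1, total + 1) if n not in seen]
```

Fed the subjects that actually reached a list archive, the function returns the patch numbers that never arrived, which is the comparison done by hand above.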
This is everything else from -mm for this merge window, with the possible exception of Mike Rapoport's "secretmem" syscall patch series (https://lkml.kernel.org/r/20210303162209.8609-1-rppt@kernel.org). I've been wobbly about the secretmem patches due to doubts about whether the feature is sufficiently useful to justify inclusion, but developers are now weighing in with helpful information and I've asked Mike for an extensively updated [0/n] changelog. This will take a few days to play out so it is possible that I will prevail upon you for a post-rc1 merge. If that's a problem, there's always 5.13-rc1. 91 patches, based on 8ca5297e7e38f2dc8c753d33a5092e7be181fff0, plus previously sent patches. Thanks. Subsystems affected by this patch series: alpha procfs sysctl misc core-kernel bitmap lib compat checkpatch epoll isofs nilfs2 hpfs exit fork kexec gcov panic delayacct gdb resource selftests async initramfs ipc mm/cleanups drivers/char mm/slub spelling Subsystem: alpha Randy Dunlap <rdunlap@infradead.org>: alpha: eliminate old-style function definitions alpha: csum_partial_copy.c: add function prototypes from <net/checksum.h> Subsystem: procfs Colin Ian King <colin.king@canonical.com>: fs/proc/generic.c: fix incorrect pde_is_permanent check Alexey Dobriyan <adobriyan@gmail.com>: proc: save LOC in __xlate_proc_name() proc: mandate ->proc_lseek in "struct proc_ops" proc: delete redundant subset=pid check selftests: proc: test subset=pid Subsystem: sysctl zhouchuangao <zhouchuangao@vivo.com>: proc/sysctl: fix function name error in comments Subsystem: misc "Matthew Wilcox (Oracle)" <willy@infradead.org>: include: remove pagemap.h from blkdev.h Andy Shevchenko <andriy.shevchenko@linux.intel.com>: kernel.h: drop inclusion in bitmap.h Wan Jiabing <wanjiabing@vivo.com>: linux/profile.h: remove unnecessary declaration Subsystem: core-kernel Rasmus Villemoes <linux@rasmusvillemoes.dk>: kernel/async.c: fix pr_debug statement kernel/cred.c: make init_groups static Subsystem: 
bitmap Yury Norov <yury.norov@gmail.com>: Patch series "lib/find_bit: fast path for small bitmaps", v6: tools: disable -Wno-type-limits tools: bitmap: sync function declarations with the kernel tools: sync BITMAP_LAST_WORD_MASK() macro with the kernel arch: rearrange headers inclusion order in asm/bitops for m68k, sh and h8300 lib: extend the scope of small_const_nbits() macro tools: sync small_const_nbits() macro with the kernel lib: inline _find_next_bit() wrappers tools: sync find_next_bit implementation lib: add fast path for find_next_*_bit() lib: add fast path for find_first_*_bit() and find_last_bit() tools: sync lib/find_bit implementation MAINTAINERS: add entry for the bitmap API Subsystem: lib Bhaskar Chowdhury <unixbhaskar@gmail.com>: lib/bch.c: fix a typo in the file bch.c Wang Qing <wangqing@vivo.com>: lib: fix inconsistent indenting in process_bit1() ToastC <mrtoastcheng@gmail.com>: lib/list_sort.c: fix typo in function description Bhaskar Chowdhury <unixbhaskar@gmail.com>: lib/genalloc.c: Fix a typo Richard Fitzgerald <rf@opensource.cirrus.com>: lib: crc8: pointer to data block should be const Zqiang <qiang.zhang@windriver.com>: lib: stackdepot: turn depot_lock spinlock to raw_spinlock Alex Shi <alexs@kernel.org>: lib/percpu_counter: tame kernel-doc compile warning lib/genalloc: add parameter description to fix doc compile warning Randy Dunlap <rdunlap@infradead.org>: lib: parser: clean up kernel-doc Subsystem: compat Masahiro Yamada <masahiroy@kernel.org>: include/linux/compat.h: remove unneeded declaration from COMPAT_SYSCALL_DEFINEx() Subsystem: checkpatch Joe Perches <joe@perches.com>: checkpatch: warn when missing newline in return sysfs_emit() formats Vincent Mailhol <mailhol.vincent@wanadoo.fr>: checkpatch: exclude four preprocessor sub-expressions from MACRO_ARG_REUSE Christophe JAILLET <christophe.jaillet@wanadoo.fr>: checkpatch: improve ALLOC_ARRAY_ARGS test Subsystem: epoll Davidlohr Bueso <dave@stgolabs.net>: Patch series "fs/epoll: 
restore user-visible behavior upon event ready": kselftest: introduce new epoll test case fs/epoll: restore waking from ep_done_scan() Subsystem: isofs "Gustavo A. R. Silva" <gustavoars@kernel.org>: isofs: fix fall-through warnings for Clang Subsystem: nilfs2 Liu xuzhi <liu.xuzhi@zte.com.cn>: fs/nilfs2: fix misspellings using codespell tool Lu Jialin <lujialin4@huawei.com>: nilfs2: fix typos in comments Subsystem: hpfs "Gustavo A. R. Silva" <gustavoars@kernel.org>: hpfs: replace one-element array with flexible-array member Subsystem: exit Jim Newsome <jnewsome@torproject.org>: do_wait: make PIDTYPE_PID case O(1) instead of O(n) Subsystem: fork Rolf Eike Beer <eb@emlix.com>: kernel/fork.c: simplify copy_mm() Xiaofeng Cao <cxfcosmos@gmail.com>: kernel/fork.c: fix typos Subsystem: kexec Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>: kernel/crash_core: add crashkernel=auto for vmcore creation Joe LeVeque <jolevequ@microsoft.com>: kexec: Add kexec reboot string Jia-Ju Bai <baijiaju1990@gmail.com>: kernel: kexec_file: fix error return code of kexec_calculate_store_digests() Pavel Tatashin <pasha.tatashin@soleen.com>: kexec: dump kmessage before machine_kexec Subsystem: gcov Johannes Berg <johannes.berg@intel.com>: gcov: combine common code gcov: simplify buffer allocation gcov: use kvmalloc() Nick Desaulniers <ndesaulniers@google.com>: gcov: clang: drop support for clang-10 and older Subsystem: panic He Ying <heying24@huawei.com>: smp: kernel/panic.c - silence warnings Subsystem: delayacct Yafang Shao <laoar.shao@gmail.com>: delayacct: clear right task's flag after blkio completes Subsystem: gdb Johannes Berg <johannes.berg@intel.com>: gdb: lx-symbols: store the abspath() Barry Song <song.bao.hua@hisilicon.com>: Patch series "scripts/gdb: clarify the platforms supporting lx_current and add arm64 support", v2: scripts/gdb: document lx_current is only supported by x86 scripts/gdb: add lx_current support for arm64 Subsystem: resource David Hildenbrand 
<david@redhat.com>: Patch series "kernel/resource: make walk_system_ram_res() and walk_mem_res() search the whole tree", v2: kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources kernel/resource: remove first_lvl / siblings_only logic Alistair Popple <apopple@nvidia.com>: kernel/resource: allow region_intersects users to hold resource_lock kernel/resource: refactor __request_region to allow external locking kernel/resource: fix locking in request_free_mem_region Subsystem: selftests Zhang Yunkai <zhang.yunkai@zte.com.cn>: selftests: remove duplicate include Subsystem: async Rasmus Villemoes <linux@rasmusvillemoes.dk>: kernel/async.c: stop guarding pr_debug() statements kernel/async.c: remove async_unregister_domain() Subsystem: initramfs Rasmus Villemoes <linux@rasmusvillemoes.dk>: Patch series "background initramfs unpacking, and CONFIG_MODPROBE_PATH", v3: init/initramfs.c: do unpacking asynchronously modules: add CONFIG_MODPROBE_PATH Subsystem: ipc Bhaskar Chowdhury <unixbhaskar@gmail.com>: ipc/sem.c: mundane typo fixes Subsystem: mm/cleanups Shijie Luo <luoshijie1@huawei.com>: mm: fix some typos and code style problems Subsystem: drivers/char David Hildenbrand <david@redhat.com>: Patch series "drivers/char: remove /dev/kmem for good": drivers/char: remove /dev/kmem for good mm: remove xlate_dev_kmem_ptr() mm/vmalloc: remove vwrite() Subsystem: mm/slub Maninder Singh <maninder1.s@samsung.com>: arm: print alloc free paths for address in registers Subsystem: spelling Drew Fustini <drew@beagleboard.org>: scripts/spelling.txt: add "overlfow" zuoqilin <zuoqilin@yulong.com>: scripts/spelling.txt: Add "diabled" typo Drew Fustini <drew@beagleboard.org>: scripts/spelling.txt: add "overflw" Colin Ian King <colin.king@canonical.com>: mm/slab.c: fix spelling mistake "disired" -> "desired" Bhaskar Chowdhury <unixbhaskar@gmail.com>: include/linux/pgtable.h: few 
spelling fixes zhouchuangao <zhouchuangao@vivo.com>: kernel/umh.c: fix some spelling mistakes Xiaofeng Cao <cxfcosmos@gmail.com>: kernel/user_namespace.c: fix typos Bhaskar Chowdhury <unixbhaskar@gmail.com>: kernel/up.c: fix typo Xiaofeng Cao <caoxiaofeng@yulong.com>: kernel/sys.c: fix typo dingsenjie <dingsenjie@yulong.com>: fs: fat: fix spelling typo of values Bhaskar Chowdhury <unixbhaskar@gmail.com>: ipc/sem.c: spelling fix Masahiro Yamada <masahiroy@kernel.org>: treewide: remove editor modelines and cruft Ingo Molnar <mingo@kernel.org>: mm: fix typos in comments Lu Jialin <lujialin4@huawei.com>: mm: fix typos in comments Documentation/admin-guide/devices.txt | 2 Documentation/admin-guide/kdump/kdump.rst | 3 Documentation/admin-guide/kernel-parameters.txt | 18 Documentation/dev-tools/gdb-kernel-debugging.rst | 4 MAINTAINERS | 16 arch/Kconfig | 20 arch/alpha/include/asm/io.h | 5 arch/alpha/kernel/pc873xx.c | 4 arch/alpha/lib/csum_partial_copy.c | 1 arch/arm/configs/dove_defconfig | 1 arch/arm/configs/magician_defconfig | 1 arch/arm/configs/moxart_defconfig | 1 arch/arm/configs/mps2_defconfig | 1 arch/arm/configs/mvebu_v5_defconfig | 1 arch/arm/configs/xcep_defconfig | 1 arch/arm/include/asm/bug.h | 1 arch/arm/include/asm/io.h | 5 arch/arm/kernel/process.c | 11 arch/arm/kernel/traps.c | 1 arch/h8300/include/asm/bitops.h | 8 arch/hexagon/configs/comet_defconfig | 1 arch/hexagon/include/asm/io.h | 1 arch/ia64/include/asm/io.h | 1 arch/ia64/include/asm/uaccess.h | 18 arch/m68k/atari/time.c | 7 arch/m68k/configs/amcore_defconfig | 1 arch/m68k/include/asm/bitops.h | 6 arch/m68k/include/asm/io_mm.h | 5 arch/mips/include/asm/io.h | 5 arch/openrisc/configs/or1ksim_defconfig | 1 arch/parisc/include/asm/io.h | 5 arch/parisc/include/asm/pdc_chassis.h | 1 arch/powerpc/include/asm/io.h | 5 arch/s390/include/asm/io.h | 5 arch/sh/configs/edosk7705_defconfig | 1 arch/sh/configs/se7206_defconfig | 1 arch/sh/configs/sh2007_defconfig | 1 arch/sh/configs/sh7724_generic_defconfig | 1 
arch/sh/configs/sh7770_generic_defconfig | 1 arch/sh/configs/sh7785lcr_32bit_defconfig | 1 arch/sh/include/asm/bitops.h | 5 arch/sh/include/asm/io.h | 5 arch/sparc/configs/sparc64_defconfig | 1 arch/sparc/include/asm/io_64.h | 5 arch/um/drivers/cow.h | 7 arch/xtensa/configs/xip_kc705_defconfig | 1 block/blk-settings.c | 1 drivers/auxdisplay/panel.c | 7 drivers/base/firmware_loader/main.c | 2 drivers/block/brd.c | 1 drivers/block/loop.c | 1 drivers/char/Kconfig | 10 drivers/char/mem.c | 231 -------- drivers/gpu/drm/qxl/qxl_drv.c | 1 drivers/isdn/capi/kcapi_proc.c | 1 drivers/md/bcache/super.c | 1 drivers/media/usb/pwc/pwc-uncompress.c | 3 drivers/net/ethernet/adaptec/starfire.c | 8 drivers/net/ethernet/amd/atarilance.c | 8 drivers/net/ethernet/amd/pcnet32.c | 7 drivers/net/wireless/intersil/hostap/hostap_proc.c | 1 drivers/net/wireless/intersil/orinoco/orinoco_nortel.c | 8 drivers/net/wireless/intersil/orinoco/orinoco_pci.c | 8 drivers/net/wireless/intersil/orinoco/orinoco_plx.c | 8 drivers/net/wireless/intersil/orinoco/orinoco_tmd.c | 8 drivers/nvdimm/btt.c | 1 drivers/nvdimm/pmem.c | 1 drivers/parport/parport_ip32.c | 12 drivers/platform/x86/dell/dell_rbu.c | 3 drivers/scsi/53c700.c | 1 drivers/scsi/53c700.h | 1 drivers/scsi/ch.c | 6 drivers/scsi/esas2r/esas2r_main.c | 1 drivers/scsi/ips.c | 20 drivers/scsi/ips.h | 20 drivers/scsi/lasi700.c | 1 drivers/scsi/megaraid/mbox_defs.h | 2 drivers/scsi/megaraid/mega_common.h | 2 drivers/scsi/megaraid/megaraid_mbox.c | 2 drivers/scsi/megaraid/megaraid_mbox.h | 2 drivers/scsi/qla1280.c | 12 drivers/scsi/scsicam.c | 1 drivers/scsi/sni_53c710.c | 1 drivers/video/fbdev/matrox/matroxfb_base.c | 9 drivers/video/fbdev/vga16fb.c | 10 fs/configfs/configfs_internal.h | 4 fs/configfs/dir.c | 4 fs/configfs/file.c | 4 fs/configfs/inode.c | 4 fs/configfs/item.c | 4 fs/configfs/mount.c | 4 fs/configfs/symlink.c | 4 fs/eventpoll.c | 6 fs/fat/fatent.c | 2 fs/hpfs/hpfs.h | 3 fs/isofs/rock.c | 1 fs/nfs/dir.c | 7 fs/nfs/nfs4proc.c | 6 
fs/nfs/nfs4renewd.c | 6 fs/nfs/nfs4state.c | 6 fs/nfs/nfs4xdr.c | 6 fs/nfsd/nfs4proc.c | 6 fs/nfsd/nfs4xdr.c | 6 fs/nfsd/xdr4.h | 6 fs/nilfs2/cpfile.c | 2 fs/nilfs2/ioctl.c | 4 fs/nilfs2/segment.c | 4 fs/nilfs2/the_nilfs.c | 2 fs/ocfs2/acl.c | 4 fs/ocfs2/acl.h | 4 fs/ocfs2/alloc.c | 4 fs/ocfs2/alloc.h | 4 fs/ocfs2/aops.c | 4 fs/ocfs2/aops.h | 4 fs/ocfs2/blockcheck.c | 4 fs/ocfs2/blockcheck.h | 4 fs/ocfs2/buffer_head_io.c | 4 fs/ocfs2/buffer_head_io.h | 4 fs/ocfs2/cluster/heartbeat.c | 4 fs/ocfs2/cluster/heartbeat.h | 4 fs/ocfs2/cluster/masklog.c | 4 fs/ocfs2/cluster/masklog.h | 4 fs/ocfs2/cluster/netdebug.c | 4 fs/ocfs2/cluster/nodemanager.c | 4 fs/ocfs2/cluster/nodemanager.h | 4 fs/ocfs2/cluster/ocfs2_heartbeat.h | 4 fs/ocfs2/cluster/ocfs2_nodemanager.h | 4 fs/ocfs2/cluster/quorum.c | 4 fs/ocfs2/cluster/quorum.h | 4 fs/ocfs2/cluster/sys.c | 4 fs/ocfs2/cluster/sys.h | 4 fs/ocfs2/cluster/tcp.c | 4 fs/ocfs2/cluster/tcp.h | 4 fs/ocfs2/cluster/tcp_internal.h | 4 fs/ocfs2/dcache.c | 4 fs/ocfs2/dcache.h | 4 fs/ocfs2/dir.c | 4 fs/ocfs2/dir.h | 4 fs/ocfs2/dlm/dlmapi.h | 4 fs/ocfs2/dlm/dlmast.c | 4 fs/ocfs2/dlm/dlmcommon.h | 4 fs/ocfs2/dlm/dlmconvert.c | 4 fs/ocfs2/dlm/dlmconvert.h | 4 fs/ocfs2/dlm/dlmdebug.c | 4 fs/ocfs2/dlm/dlmdebug.h | 4 fs/ocfs2/dlm/dlmdomain.c | 4 fs/ocfs2/dlm/dlmdomain.h | 4 fs/ocfs2/dlm/dlmlock.c | 4 fs/ocfs2/dlm/dlmmaster.c | 4 fs/ocfs2/dlm/dlmrecovery.c | 4 fs/ocfs2/dlm/dlmthread.c | 4 fs/ocfs2/dlm/dlmunlock.c | 4 fs/ocfs2/dlmfs/dlmfs.c | 4 fs/ocfs2/dlmfs/userdlm.c | 4 fs/ocfs2/dlmfs/userdlm.h | 4 fs/ocfs2/dlmglue.c | 4 fs/ocfs2/dlmglue.h | 4 fs/ocfs2/export.c | 4 fs/ocfs2/export.h | 4 fs/ocfs2/extent_map.c | 4 fs/ocfs2/extent_map.h | 4 fs/ocfs2/file.c | 4 fs/ocfs2/file.h | 4 fs/ocfs2/filecheck.c | 4 fs/ocfs2/filecheck.h | 4 fs/ocfs2/heartbeat.c | 4 fs/ocfs2/heartbeat.h | 4 fs/ocfs2/inode.c | 4 fs/ocfs2/inode.h | 4 fs/ocfs2/journal.c | 4 fs/ocfs2/journal.h | 4 fs/ocfs2/localalloc.c | 4 fs/ocfs2/localalloc.h | 4 fs/ocfs2/locks.c | 4 fs/ocfs2/locks.h 
| 4 fs/ocfs2/mmap.c | 4 fs/ocfs2/move_extents.c | 4 fs/ocfs2/move_extents.h | 4 fs/ocfs2/namei.c | 4 fs/ocfs2/namei.h | 4 fs/ocfs2/ocfs1_fs_compat.h | 4 fs/ocfs2/ocfs2.h | 4 fs/ocfs2/ocfs2_fs.h | 4 fs/ocfs2/ocfs2_ioctl.h | 4 fs/ocfs2/ocfs2_lockid.h | 4 fs/ocfs2/ocfs2_lockingver.h | 4 fs/ocfs2/refcounttree.c | 4 fs/ocfs2/refcounttree.h | 4 fs/ocfs2/reservations.c | 4 fs/ocfs2/reservations.h | 4 fs/ocfs2/resize.c | 4 fs/ocfs2/resize.h | 4 fs/ocfs2/slot_map.c | 4 fs/ocfs2/slot_map.h | 4 fs/ocfs2/stack_o2cb.c | 4 fs/ocfs2/stack_user.c | 4 fs/ocfs2/stackglue.c | 4 fs/ocfs2/stackglue.h | 4 fs/ocfs2/suballoc.c | 4 fs/ocfs2/suballoc.h | 4 fs/ocfs2/super.c | 4 fs/ocfs2/super.h | 4 fs/ocfs2/symlink.c | 4 fs/ocfs2/symlink.h | 4 fs/ocfs2/sysfile.c | 4 fs/ocfs2/sysfile.h | 4 fs/ocfs2/uptodate.c | 4 fs/ocfs2/uptodate.h | 4 fs/ocfs2/xattr.c | 4 fs/ocfs2/xattr.h | 4 fs/proc/generic.c | 13 fs/proc/inode.c | 18 fs/proc/proc_sysctl.c | 2 fs/reiserfs/procfs.c | 10 include/asm-generic/bitops/find.h | 108 +++ include/asm-generic/bitops/le.h | 38 + include/asm-generic/bitsperlong.h | 12 include/asm-generic/io.h | 11 include/linux/align.h | 15 include/linux/async.h | 1 include/linux/bitmap.h | 11 include/linux/bitops.h | 12 include/linux/blkdev.h | 1 include/linux/compat.h | 1 include/linux/configfs.h | 4 include/linux/crc8.h | 2 include/linux/cred.h | 1 include/linux/delayacct.h | 20 include/linux/fs.h | 2 include/linux/genl_magic_func.h | 1 include/linux/genl_magic_struct.h | 1 include/linux/gfp.h | 2 include/linux/init_task.h | 1 include/linux/initrd.h | 2 include/linux/kernel.h | 9 include/linux/mm.h | 2 include/linux/mmzone.h | 2 include/linux/pgtable.h | 10 include/linux/proc_fs.h | 1 include/linux/profile.h | 3 include/linux/smp.h | 8 include/linux/swap.h | 1 include/linux/vmalloc.h | 7 include/uapi/linux/if_bonding.h | 11 include/uapi/linux/nfs4.h | 6 include/xen/interface/elfnote.h | 10 include/xen/interface/hvm/hvm_vcpu.h | 10 include/xen/interface/io/xenbus.h | 10 init/Kconfig 
| 12 init/initramfs.c | 38 + init/main.c | 1 ipc/sem.c | 12 kernel/async.c | 68 -- kernel/configs/android-base.config | 1 kernel/crash_core.c | 7 kernel/cred.c | 2 kernel/exit.c | 67 ++ kernel/fork.c | 23 kernel/gcov/Kconfig | 1 kernel/gcov/base.c | 49 + kernel/gcov/clang.c | 282 ---------- kernel/gcov/fs.c | 146 ++++- kernel/gcov/gcc_4_7.c | 173 ------ kernel/gcov/gcov.h | 14 kernel/kexec_core.c | 4 kernel/kexec_file.c | 4 kernel/kmod.c | 2 kernel/resource.c | 198 ++++--- kernel/sys.c | 14 kernel/umh.c | 8 kernel/up.c | 2 kernel/user_namespace.c | 6 lib/bch.c | 2 lib/crc8.c | 2 lib/decompress_unlzma.c | 2 lib/find_bit.c | 68 -- lib/genalloc.c | 7 lib/list_sort.c | 2 lib/parser.c | 61 +- lib/percpu_counter.c | 2 lib/stackdepot.c | 6 mm/balloon_compaction.c | 4 mm/compaction.c | 4 mm/filemap.c | 2 mm/gup.c | 2 mm/highmem.c | 2 mm/huge_memory.c | 6 mm/hugetlb.c | 6 mm/internal.h | 2 mm/kasan/kasan.h | 8 mm/kasan/quarantine.c | 4 mm/kasan/shadow.c | 4 mm/kfence/report.c | 2 mm/khugepaged.c | 2 mm/ksm.c | 6 mm/madvise.c | 4 mm/memcontrol.c | 18 mm/memory-failure.c | 2 mm/memory.c | 18 mm/mempolicy.c | 6 mm/migrate.c | 8 mm/mmap.c | 4 mm/mprotect.c | 2 mm/mremap.c | 2 mm/nommu.c | 10 mm/oom_kill.c | 2 mm/page-writeback.c | 4 mm/page_alloc.c | 16 mm/page_owner.c | 2 mm/page_vma_mapped.c | 2 mm/percpu-internal.h | 2 mm/percpu.c | 2 mm/pgalloc-track.h | 6 mm/rmap.c | 2 mm/slab.c | 8 mm/slub.c | 2 mm/swap.c | 4 mm/swap_slots.c | 2 mm/swap_state.c | 2 mm/vmalloc.c | 124 ---- mm/vmstat.c | 2 mm/z3fold.c | 2 mm/zpool.c | 2 mm/zsmalloc.c | 6 samples/configfs/configfs_sample.c | 2 scripts/checkpatch.pl | 15 scripts/gdb/linux/cpus.py | 23 scripts/gdb/linux/symbols.py | 3 scripts/spelling.txt | 3 tools/include/asm-generic/bitops/find.h | 85 ++- tools/include/asm-generic/bitsperlong.h | 3 tools/include/linux/bitmap.h | 18 tools/lib/bitmap.c | 4 tools/lib/find_bit.c | 56 - tools/scripts/Makefile.include | 1 tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c | 44 + 
tools/testing/selftests/kvm/lib/sparsebit.c | 1 tools/testing/selftests/mincore/mincore_selftest.c | 1 tools/testing/selftests/powerpc/mm/tlbie_test.c | 1 tools/testing/selftests/proc/Makefile | 1 tools/testing/selftests/proc/proc-subset-pid.c | 121 ++++ tools/testing/selftests/proc/read.c | 4 tools/usb/hcd-tests.sh | 2 343 files changed, 1383 insertions(+), 2119 deletions(-)
On Thu, May 6, 2021 at 6:01 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> I've been wobbly about the secretmem patches due to doubts about
> whether the feature is sufficiently useful to justify inclusion, but
> developers are now weighing in with helpful information and I've asked Mike
> for an extensively updated [0/n] changelog. This will take a few days
> to play out so it is possible that I will prevail upon you for a post-rc1
> merge.

Oh, much too late for this release by now.

> If that's a problem, there's always 5.13-rc1.

5.13-rc1 is two days from now, it would be for 5.14-rc1.. How time -
and version numbers - fly.

Linus
13 patches, based on bd3c9cdb21a2674dd0db70199df884828e37abd4.

Subsystems affected by this patch series:

  mm/hugetlb
  mm/slub
  resource
  squashfs
  mm/userfaultfd
  mm/ksm
  mm/pagealloc
  mm/kasan
  mm/pagemap
  hfsplus
  modprobe
  mm/ioremap

Subsystem: mm/hugetlb

    Peter Xu <peterx@redhat.com>:
    Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2:
      mm/hugetlb: fix F_SEAL_FUTURE_WRITE
      mm/hugetlb: fix cow where page writtable in child

Subsystem: mm/slub

    Vlastimil Babka <vbabka@suse.cz>:
      mm, slub: move slub_debug static key enabling outside slab_mutex

Subsystem: resource

    Alistair Popple <apopple@nvidia.com>:
      kernel/resource: fix return code check in __request_free_mem_region

Subsystem: squashfs

    Phillip Lougher <phillip@squashfs.org.uk>:
      squashfs: fix divide error in calculate_skip()

Subsystem: mm/userfaultfd

    Axel Rasmussen <axelrasmussen@google.com>:
      userfaultfd: release page in error path to avoid BUG_ON

Subsystem: mm/ksm

    Hugh Dickins <hughd@google.com>:
      ksm: revert "use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()"

Subsystem: mm/pagealloc

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm: fix struct page layout on 32-bit systems

Subsystem: mm/kasan

    Peter Collingbourne <pcc@google.com>:
      kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled

Subsystem: mm/pagemap

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm/filemap: fix readahead return types

Subsystem: hfsplus

    Jouni Roivas <jouni.roivas@tuxera.com>:
      hfsplus: prevent corruption in shrinking truncate

Subsystem: modprobe

    Rasmus Villemoes <linux@rasmusvillemoes.dk>:
      docs: admin-guide: update description for kernel.modprobe sysctl

Subsystem: mm/ioremap

    Christophe Leroy <christophe.leroy@csgroup.eu>:
      mm/ioremap: fix iomap_max_page_shift

 Documentation/admin-guide/sysctl/kernel.rst |    9 ++++---
 fs/hfsplus/extents.c                        |    7 +++--
 fs/hugetlbfs/inode.c                        |    5 ++++
 fs/iomap/buffered-io.c                      |    4 +--
 fs/squashfs/file.c                          |    6 ++--
 include/linux/mm.h                          |   32 ++++++++++++++++++++++++++
 include/linux/mm_types.h                    |    4 +--
 include/linux/pagemap.h                     |    6 ++--
 include/net/page_pool.h                     |   12 +++++++++
 kernel/resource.c                           |    2 -
 lib/test_kasan.c                            |   29 ++++++++++++++++++-----
 mm/hugetlb.c                                |    1
 mm/ioremap.c                                |    6 ++--
 mm/ksm.c                                    |    3 +-
 mm/shmem.c                                  |   34 ++++++++++++----------------
 mm/slab_common.c                            |   10 ++++++++
 mm/slub.c                                   |    9 -------
 net/core/page_pool.c                        |   12 +++++----
 18 files changed, 129 insertions(+), 62 deletions(-)
10 patches, based on 4ff2473bdb4cf2bb7d208ccf4418d3d7e6b1652c.

Subsystems affected by this patch series:

  mm/pagealloc
  mm/gup
  ipc
  selftests
  mm/kasan
  kernel/watchdog
  bitmap
  procfs
  lib
  mm/userfaultfd

Subsystem: mm/pagealloc

    Arnd Bergmann <arnd@arndb.de>:
      mm/shuffle: fix section mismatch warning

Subsystem: mm/gup

    Michal Hocko <mhocko@suse.com>:
      Revert "mm/gup: check page posion status for coredump."

Subsystem: ipc

    Varad Gautam <varad.gautam@suse.com>:
      ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry

Subsystem: selftests

    Yang Yingliang <yangyingliang@huawei.com>:
      tools/testing/selftests/exec: fix link error

Subsystem: mm/kasan

    Alexander Potapenko <glider@google.com>:
      kasan: slab: always reset the tag in get_freepointer_safe()

Subsystem: kernel/watchdog

    Petr Mladek <pmladek@suse.com>:
      watchdog: reliable handling of timestamps

Subsystem: bitmap

    Rikard Falkeborn <rikard.falkeborn@gmail.com>:
      linux/bits.h: fix compilation error with GENMASK

Subsystem: procfs

    Alexey Dobriyan <adobriyan@gmail.com>:
      proc: remove Alexey from MAINTAINERS

Subsystem: lib

    Zhen Lei <thunder.leizhen@huawei.com>:
      lib: kunit: suppress a compilation warning of frame size

Subsystem: mm/userfaultfd

    Mike Kravetz <mike.kravetz@oracle.com>:
      userfaultfd: hugetlbfs: fix new flag usage in error path

 MAINTAINERS                           |    1 -
 fs/hugetlbfs/inode.c                  |    2 +-
 include/linux/bits.h                  |    2 +-
 include/linux/const.h                 |    8 ++++++++
 include/linux/minmax.h                |   10 ++--------
 ipc/mqueue.c                          |    6 ++++--
 ipc/msg.c                             |    6 ++++--
 ipc/sem.c                             |    6 ++++--
 kernel/watchdog.c                     |   34 ++++++++++++++++++++--------------
 lib/Makefile                          |    1 +
 mm/gup.c                              |    4 ----
 mm/internal.h                         |   20 --------------------
 mm/shuffle.h                          |    4 ++--
 mm/slub.c                             |    1 +
 mm/userfaultfd.c                      |   28 ++++++++++++++--------------
 tools/include/linux/bits.h            |    2 +-
 tools/include/linux/const.h           |    8 ++++++++
 tools/testing/selftests/exec/Makefile |    6 +++---
 18 files changed, 74 insertions(+), 75 deletions(-)
13 patches, based on 16f0596fc1d78a1f3ae4628cff962bb297dc908c. Subsystems affected by this patch series: mips mm/kfence init mm/debug mm/pagealloc mm/memory-hotplug mm/hugetlb proc mm/kasan mm/hugetlb lib ocfs2 mailmap Subsystem: mips Thomas Bogendoerfer <tsbogend@alpha.franken.de>: Revert "MIPS: make userspace mapping young by default" Subsystem: mm/kfence Marco Elver <elver@google.com>: kfence: use TASK_IDLE when awaiting allocation Subsystem: init Mark Rutland <mark.rutland@arm.com>: pid: take a reference when initializing `cad_pid` Subsystem: mm/debug Gerald Schaefer <gerald.schaefer@linux.ibm.com>: mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests() Subsystem: mm/pagealloc Ding Hui <dinghui@sangfor.com.cn>: mm/page_alloc: fix counting of free pages after take off from buddy Subsystem: mm/memory-hotplug David Hildenbrand <david@redhat.com>: drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64 Subsystem: mm/hugetlb Naoya Horiguchi <naoya.horiguchi@nec.com>: hugetlb: pass head page to remove_hugetlb_page() Subsystem: proc David Matlack <dmatlack@google.com>: proc: add .gitignore for proc-subset-pid selftest Subsystem: mm/kasan Yu Kuai <yukuai3@huawei.com>: mm/kasan/init.c: fix doc warning Subsystem: mm/hugetlb Mina Almasry <almasrymina@google.com>: mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY Subsystem: lib YueHaibing <yuehaibing@huawei.com>: lib: crc64: fix kernel-doc warning Subsystem: ocfs2 Junxiao Bi <junxiao.bi@oracle.com>: ocfs2: fix data corruption by fallocate Subsystem: mailmap Michel Lespinasse <michel@lespinasse.org>: mailmap: use private address for Michel Lespinasse .mailmap | 3 + arch/mips/mm/cache.c | 30 ++++++++--------- drivers/base/memory.c | 6 +-- fs/ocfs2/file.c | 55 +++++++++++++++++++++++++++++--- include/linux/pgtable.h | 8 ++++ init/main.c | 2 - lib/crc64.c | 2 - mm/debug_vm_pgtable.c | 4 +- mm/hugetlb.c | 16 +++++++-- mm/kasan/init.c | 4 +- mm/kfence/core.c | 6 +-- 
mm/memory.c | 4 ++ mm/page_alloc.c | 2 + tools/testing/selftests/proc/.gitignore | 1 14 files changed, 107 insertions(+), 36 deletions(-)
18 patches, based on 94f0b2d4a1d0c52035aef425da5e022bd2cb1c71. Subsystems affected by this patch series: mm/memory-failure mm/swap mm/slub mm/hugetlb mm/memory-failure coredump mm/slub mm/thp mm/sparsemem Subsystem: mm/memory-failure Naoya Horiguchi <naoya.horiguchi@nec.com>: mm,hwpoison: fix race with hugetlb page allocation Subsystem: mm/swap Peter Xu <peterx@redhat.com>: mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare Subsystem: mm/slub Kees Cook <keescook@chromium.org>: Patch series "Actually fix freelist pointer vs redzoning", v4: mm/slub: clarify verification reporting mm/slub: fix redzoning for small allocations mm/slub: actually fix freelist pointer vs redzoning Subsystem: mm/hugetlb Mike Kravetz <mike.kravetz@oracle.com>: mm/hugetlb: expand restore_reserve_on_error functionality Subsystem: mm/memory-failure yangerkun <yangerkun@huawei.com>: mm/memory-failure: make sure wait for page writeback in memory_failure Subsystem: coredump Pingfan Liu <kernelfans@gmail.com>: crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo Subsystem: mm/slub Andrew Morton <akpm@linux-foundation.org>: mm/slub.c: include swab.h Subsystem: mm/thp Xu Yu <xuyu@linux.alibaba.com>: mm, thp: use head page in __migration_entry_wait() Hugh Dickins <hughd@google.com>: Patch series "mm/thp: fix THP splitting unmap BUGs and related", v10: mm/thp: fix __split_huge_pmd_locked() on shmem migration entry mm/thp: make is_huge_zero_pmd() safe and quicker mm/thp: try_to_unmap() use TTU_SYNC for safe splitting mm/thp: fix vma_address() if virtual address below file offset Jue Wang <juew@google.com>: mm/thp: fix page_address_in_vma() on file THP tails Hugh Dickins <hughd@google.com>: mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() Yang Shi <shy828301@gmail.com>: mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split Subsystem: mm/sparsemem Miles Chen <miles.chen@mediatek.com>: mm/sparse: fix check_usemap_section_nr warnings 
Documentation/vm/slub.rst | 10 +-- fs/hugetlbfs/inode.c | 1 include/linux/huge_mm.h | 8 ++ include/linux/hugetlb.h | 8 ++ include/linux/mm.h | 3 + include/linux/rmap.h | 1 include/linux/swapops.h | 15 +++-- kernel/crash_core.c | 1 mm/huge_memory.c | 58 ++++++++++--------- mm/hugetlb.c | 137 +++++++++++++++++++++++++++++++++++++--------- mm/internal.h | 51 ++++++++++++----- mm/memory-failure.c | 36 +++++++++++- mm/memory.c | 41 +++++++++++++ mm/migrate.c | 1 mm/page_vma_mapped.c | 27 +++++---- mm/pgtable-generic.c | 5 - mm/rmap.c | 41 +++++++++---- mm/slab_common.c | 3 - mm/slub.c | 37 +++++------- mm/sparse.c | 13 +++- mm/swapfile.c | 2 mm/truncate.c | 43 ++++++-------- 22 files changed, 388 insertions(+), 154 deletions(-)
24 patches, based on 4a09d388f2ab382f217a764e6a152b3f614246f6. Subsystems affected by this patch series: mm/thp nilfs2 mm/vmalloc kthread mm/hugetlb mm/memory-failure mm/pagealloc MAINTAINERS mailmap Subsystem: mm/thp Hugh Dickins <hughd@google.com>: Patch series "mm: page_vma_mapped_walk() cleanup and THP fixes": mm: page_vma_mapped_walk(): use page for pvmw->page mm: page_vma_mapped_walk(): settle PageHuge on entry mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block mm: page_vma_mapped_walk(): crossing page table boundary mm: page_vma_mapped_walk(): add a level of indentation mm: page_vma_mapped_walk(): use goto instead of while (1) mm: page_vma_mapped_walk(): get vma_address_end() earlier mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() Subsystem: nilfs2 Pavel Skripkin <paskripkin@gmail.com>: nilfs2: fix memory leak in nilfs_sysfs_delete_device_group Subsystem: mm/vmalloc Claudio Imbrenda <imbrenda@linux.ibm.com>: Patch series "mm: add vmalloc_no_huge and use it", v4: mm/vmalloc: add vmalloc_no_huge KVM: s390: prepare for hugepage vmalloc Daniel Axtens <dja@axtens.net>: mm/vmalloc: unbreak kasan vmalloc support Subsystem: kthread Petr Mladek <pmladek@suse.com>: Patch series "kthread_worker: Fix race between kthread_mod_delayed_work(): kthread_worker: split code for canceling the delayed work timer kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() Subsystem: mm/hugetlb Hugh Dickins <hughd@google.com>: mm, futex: fix shared futex pgoff on shmem huge page Subsystem: mm/memory-failure Tony Luck <tony.luck@intel.com>: Patch series "mm,hwpoison: fix sending SIGBUS for Action Required MCE", v5: mm/memory-failure: use a mutex to avoid memory_failure() races Aili Yao <yaoaili@kingsoft.com>: mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned Naoya Horiguchi 
<naoya.horiguchi@nec.com>: mm/hwpoison: do not lock page again when me_huge_page() successfully recovers Subsystem: mm/pagealloc Rasmus Villemoes <linux@rasmusvillemoes.dk>: mm/page_alloc: __alloc_pages_bulk(): do bounds check before accessing array Mel Gorman <mgorman@techsingularity.net>: mm/page_alloc: do bulk array bounds check after checking populated elements Subsystem: MAINTAINERS Marek Behún <kabel@kernel.org>: MAINTAINERS: fix Marek's identity again Subsystem: mailmap Marek Behún <kabel@kernel.org>: mailmap: add Marek's other e-mail address and identity without diacritics .mailmap | 2 MAINTAINERS | 4 arch/s390/kvm/pv.c | 7 + fs/nilfs2/sysfs.c | 1 include/linux/hugetlb.h | 16 --- include/linux/pagemap.h | 13 +- include/linux/vmalloc.h | 1 kernel/futex.c | 3 kernel/kthread.c | 81 ++++++++++------ mm/hugetlb.c | 5 - mm/memory-failure.c | 83 +++++++++++------ mm/page_alloc.c | 6 + mm/page_vma_mapped.c | 233 +++++++++++++++++++++++++++--------------------- mm/vmalloc.c | 41 ++++++-- 14 files changed, 297 insertions(+), 199 deletions(-)
192 patches, based on 7cf3dead1ad70c72edb03e2d98e1f3dcd332cdb2. Subsystems affected by this patch series: mm/gup mm/pagealloc kthread ia64 scripts ntfs squashfs ocfs2 z kernel/watchdog mm/slab mm/slub mm/kmemleak mm/dax mm/debug mm/pagecache mm/gup mm/swap mm/memcg mm/pagemap mm/mprotect mm/bootmem mm/dma mm/tracing mm/vmalloc mm/kasan mm/initialization mm/pagealloc mm/memory-failure Subsystem: mm/gup Jann Horn <jannh@google.com>: mm/gup: fix try_grab_compound_head() race with split_huge_page() Subsystem: mm/pagealloc Mike Rapoport <rppt@linux.ibm.com>: mm/page_alloc: fix memory map initialization for descending nodes Mel Gorman <mgorman@techsingularity.net>: mm/page_alloc: correct return value of populated elements if bulk array is populated Subsystem: kthread Jonathan Neuschäfer <j.neuschaefer@gmx.net>: kthread: switch to new kerneldoc syntax for named variable macro argument Petr Mladek <pmladek@suse.com>: kthread_worker: fix return value when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() Subsystem: ia64 Randy Dunlap <rdunlap@infradead.org>: ia64: headers: drop duplicated words Arnd Bergmann <arnd@arndb.de>: ia64: mca_drv: fix incorrect array size calculation Subsystem: scripts "Steven Rostedt (VMware)" <rostedt@goodmis.org>: Patch series "streamline_config.pl: Fix Perl spacing": streamline_config.pl: make spacing consistent streamline_config.pl: add softtabstop=4 for vim users Colin Ian King <colin.king@canonical.com>: scripts/spelling.txt: add more spellings to spelling.txt Subsystem: ntfs Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>: ntfs: fix validity check for file name attribute Subsystem: squashfs Vincent Whitchurch <vincent.whitchurch@axis.com>: squashfs: add option to panic on errors Subsystem: ocfs2 Yang Yingliang <yangyingliang@huawei.com>: ocfs2: remove unnecessary INIT_LIST_HEAD() Subsystem: z Dan Carpenter <dan.carpenter@oracle.com>: ocfs2: fix snprintf() checking Colin Ian King <colin.king@canonical.com>: ocfs2: 
remove redundant assignment to pointer queue Wan Jiabing <wanjiabing@vivo.com>: ocfs2: remove repeated uptodate check for buffer Chen Huang <chenhuang5@huawei.com>: ocfs2: replace simple_strtoull() with kstrtoull() Colin Ian King <colin.king@canonical.com>: ocfs2: remove redundant initialization of variable ret Subsystem: kernel/watchdog Wang Qing <wangqing@vivo.com>: kernel: watchdog: modify the explanation related to watchdog thread doc: watchdog: modify the explanation related to watchdog thread doc: watchdog: modify the doc related to "watchdog/%u" Subsystem: mm/slab gumingtao <gumingtao1225@gmail.com>: slab: use __func__ to trace function name Subsystem: mm/slub Vlastimil Babka <vbabka@suse.cz>: kunit: make test->lock irq safe Oliver Glitta <glittao@gmail.com>: mm/slub, kunit: add a KUnit test for SLUB debugging functionality slub: remove resiliency_test() function Hyeonggon Yoo <42.hyeyoo@gmail.com>: mm, slub: change run-time assertion in kmalloc_index() to compile-time Stephen Boyd <swboyd@chromium.org>: slub: restore slub_debug=- behavior slub: actually use 'message' in restore_bytes() Joe Perches <joe@perches.com>: slub: indicate slab_fix() uses printf formats Stephen Boyd <swboyd@chromium.org>: slub: force on no_hash_pointers when slub_debug is enabled Faiyaz Mohammed <faiyazm@codeaurora.org>: mm: slub: move sysfs slab alloc/free interfaces to debugfs Georgi Djakov <quic_c_gdjako@quicinc.com>: mm/slub: add taint after the errors are printed Subsystem: mm/kmemleak Yanfei Xu <yanfei.xu@windriver.com>: mm/kmemleak: fix possible wrong memory scanning period Subsystem: mm/dax Jan Kara <jack@suse.cz>: dax: fix ENOMEM handling in grab_mapping_entry() Subsystem: mm/debug Tang Bin <tangbin@cmss.chinamobile.com>: tools/vm/page_owner_sort.c: check malloc() return Anshuman Khandual <anshuman.khandual@arm.com>: mm/debug_vm_pgtable: ensure THP availability via has_transparent_hugepage() Nicolas Saenz Julienne <nsaenzju@redhat.com>: mm: mmap_lock: use local locks 
instead of disabling preemption Gavin Shan <gshan@redhat.com>: Patch series "mm/page_reporting: Make page reporting work on arm64 with 64KB page size", v4: mm/page_reporting: fix code style in __page_reporting_request() mm/page_reporting: export reporting order as module parameter mm/page_reporting: allow driver to specify reporting order virtio_balloon: specify page reporting order if needed Subsystem: mm/pagecache Kefeng Wang <wangkefeng.wang@huawei.com>: mm: page-writeback: kill get_writeback_state() comments Chi Wu <wuchi.zero@gmail.com>: mm/page-writeback: Fix performance when BDI's share of ratio is 0. mm/page-writeback: update the comment of Dirty position control mm/page-writeback: use __this_cpu_inc() in account_page_dirtied() Roman Gushchin <guro@fb.com>: Patch series "cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups", v9: writeback, cgroup: do not switch inodes with I_WILL_FREE flag writeback, cgroup: add smp_mb() to cgroup_writeback_umount() writeback, cgroup: increment isw_nr_in_flight before grabbing an inode writeback, cgroup: switch to rcu_work API in inode_switch_wbs() writeback, cgroup: keep list of inodes attached to bdi_writeback writeback, cgroup: split out the functional part of inode_switch_wbs_work_fn() writeback, cgroup: support switching multiple inodes at once writeback, cgroup: release dying cgwbs by switching attached inodes Christoph Hellwig <hch@lst.de>: Patch series "remove the implicit .set_page_dirty default": fs: unexport __set_page_dirty fs: move ramfs_aops to libfs mm: require ->set_page_dirty to be explicitly wired up "Matthew Wilcox (Oracle)" <willy@infradead.org>: Patch series "Further set_page_dirty cleanups": mm/writeback: move __set_page_dirty() to core mm mm/writeback: use __set_page_dirty in __set_page_dirty_nobuffers iomap: use __set_page_dirty_nobuffers fs: remove anon_set_page_dirty() fs: remove noop_set_page_dirty() mm: move page dirtying prototypes from mm.h Subsystem: mm/gup Peter Xu 
<peterx@redhat.com>: Patch series "mm/gup: Fix pin page write cache bouncing on has_pinned", v2: mm/gup_benchmark: support threading Andrea Arcangeli <aarcange@redhat.com>: mm: gup: allow FOLL_PIN to scale in SMP mm: gup: pack has_pinned in MMF_HAS_PINNED Christophe Leroy <christophe.leroy@csgroup.eu>: mm: pagewalk: fix walk for hugepage tables Subsystem: mm/swap Miaohe Lin <linmiaohe@huawei.com>: Patch series "close various race windows for swap", v6: mm/swapfile: use percpu_ref to serialize against concurrent swapoff swap: fix do_swap_page() race with swapoff mm/swap: remove confusing checking for non_swap_entry() in swap_ra_info() mm/shmem: fix shmem_swapin() race with swapoff Patch series "Cleanups for swap", v2: mm/swapfile: move get_swap_page_of_type() under CONFIG_HIBERNATION mm/swap: remove unused local variable nr_shadows mm/swap_slots.c: delete meaningless forward declarations Huang Ying <ying.huang@intel.com>: mm, swap: remove unnecessary smp_rmb() in swap_type_to_swap_info() mm: free idle swap cache page after COW swap: check mapping_empty() for swap cache before being freed Subsystem: mm/memcg Waiman Long <longman@redhat.com>: Patch series "mm/memcg: Reduce kmemcache memory accounting overhead", v6: mm/memcg: move mod_objcg_state() to memcontrol.c mm/memcg: cache vmstat data in percpu memcg_stock_pcp mm/memcg: improve refill_obj_stock() performance mm/memcg: optimize user context object stock access Patch series "mm: memcg/slab: Fix objcg pointer array handling problem", v4: mm: memcg/slab: properly set up gfp flags for objcg pointer array mm: memcg/slab: create a new set of kmalloc-cg-<n> caches mm: memcg/slab: disable cache merging for KMALLOC_NORMAL caches Muchun Song <songmuchun@bytedance.com>: mm: memcontrol: fix root_mem_cgroup charging Patch series "memcontrol code cleanup and simplification", v3: mm: memcontrol: fix page charging in page replacement mm: memcontrol: bail out early when !mm in get_mem_cgroup_from_mm mm: memcontrol: remove the 
pgdata parameter of mem_cgroup_page_lruvec mm: memcontrol: simplify lruvec_holds_page_lru_lock mm: memcontrol: rename lruvec_holds_page_lru_lock to page_matches_lruvec mm: memcontrol: simplify the logic of objcg pinning memcg mm: memcontrol: move obj_cgroup_uncharge_pages() out of css_set_lock mm: vmscan: remove noinline_for_stack wenhuizhang <wenhui@gwmail.gwu.edu>: memcontrol: use flexible-array member Dan Schatzberg <schatzberg.dan@gmail.com>: Patch series "Charge loop device i/o to issuing cgroup", v14: loop: use worker per cgroup instead of kworker mm: charge active memcg when no mm is set loop: charge i/o to mem and blk cg Huilong Deng <denghuilong@cdjrlc.com>: mm: memcontrol: remove trailing semicolon in macros Subsystem: mm/pagemap David Hildenbrand <david@redhat.com>: Patch series "perf/binfmt/mm: remove in-tree usage of MAP_EXECUTABLE": perf: MAP_EXECUTABLE does not indicate VM_MAYEXEC binfmt: remove in-tree usage of MAP_EXECUTABLE mm: ignore MAP_EXECUTABLE in ksys_mmap_pgoff() Gonzalo Matias Juarez Tello <gmjuareztello@gmail.com>: mm/mmap.c: logic of find_vma_intersection repeated in __do_munmap Liam Howlett <liam.howlett@oracle.com>: mm/mmap: introduce unlock_range() for code cleanup mm/mmap: use find_vma_intersection() in do_mmap() for overlap Liu Xiang <liu.xiang@zlingsmart.com>: mm/memory.c: fix comment of finish_mkwrite_fault() Liam Howlett <liam.howlett@oracle.com>: Patch series "mm: Add vma_lookup()", v2: mm: add vma_lookup(), update find_vma_intersection() comments drm/i915/selftests: use vma_lookup() in __igt_mmap() arch/arc/kernel/troubleshoot: use vma_lookup() instead of find_vma() arch/arm64/kvm: use vma_lookup() instead of find_vma_intersection() arch/powerpc/kvm/book3s_hv_uvmem: use vma_lookup() instead of find_vma_intersection() arch/powerpc/kvm/book3s: use vma_lookup() in kvmppc_hv_setup_htab_rma() arch/mips/kernel/traps: use vma_lookup() instead of find_vma() arch/m68k/kernel/sys_m68k: use vma_lookup() in sys_cacheflush() x86/sgx: use 
vma_lookup() in sgx_encl_find() virt/kvm: use vma_lookup() instead of find_vma_intersection() vfio: use vma_lookup() instead of find_vma_intersection() net/ipv4/tcp: use vma_lookup() in tcp_zerocopy_receive() drm/amdgpu: use vma_lookup() in amdgpu_ttm_tt_get_user_pages() media: videobuf2: use vma_lookup() in get_vaddr_frames() misc/sgi-gru/grufault: use vma_lookup() in gru_find_vma() kernel/events/uprobes: use vma_lookup() in find_active_uprobe() lib/test_hmm: use vma_lookup() in dmirror_migrate() mm/ksm: use vma_lookup() in find_mergeable_vma() mm/migrate: use vma_lookup() in do_pages_stat_array() mm/mremap: use vma_lookup() in vma_to_resize() mm/memory.c: use vma_lookup() in __access_remote_vm() mm/mempolicy: use vma_lookup() in __access_remote_vm() Chen Li <chenli@uniontech.com>: mm: update legacy flush_tlb_* to use vma Subsystem: mm/mprotect Peter Collingbourne <pcc@google.com>: mm: improve mprotect(R|W) efficiency on pages referenced once Subsystem: mm/bootmem Souptick Joarder <jrdr.linux@gmail.com>: h8300: remove unused variable Subsystem: mm/dma YueHaibing <yuehaibing@huawei.com>: mm/dmapool: use DEVICE_ATTR_RO macro Subsystem: mm/tracing Vincent Whitchurch <vincent.whitchurch@axis.com>: mm, tracing: unify PFN format strings Subsystem: mm/vmalloc "Uladzislau Rezki (Sony)" <urezki@gmail.com>: Patch series "vmalloc() vs bulk allocator", v2: mm/page_alloc: add an alloc_pages_bulk_array_node() helper mm/vmalloc: switch to bulk allocator in __vmalloc_area_node() mm/vmalloc: print a warning message first on failure mm/vmalloc: remove quoted strings split across lines Uladzislau Rezki <urezki@gmail.com>: mm/vmalloc: fallback to a single page allocator Rafael Aquini <aquini@redhat.com>: mm: vmalloc: add cond_resched() in __vunmap() Subsystem: mm/kasan Alexander Potapenko <glider@google.com>: printk: introduce dump_stack_lvl() kasan: use dump_stack_lvl(KERN_ERR) to print stacks David Gow <davidgow@google.com>: kasan: test: improve failure message in
KUNIT_EXPECT_KASAN_FAIL() Daniel Axtens <dja@axtens.net>: Patch series "KASAN core changes for ppc64 radix KASAN", v16: kasan: allow an architecture to disable inline instrumentation kasan: allow architectures to provide an outline readiness check mm: define default MAX_PTRS_PER_* in include/pgtable.h kasan: use MAX_PTRS_PER_* for early shadow tables Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>: Patch series "kasan: add memory corruption identification support for hw tag-based kasan", v4: kasan: rename CONFIG_KASAN_SW_TAGS_IDENTIFY to CONFIG_KASAN_TAGS_IDENTIFY kasan: integrate the common part of two KASAN tag-based modes kasan: add memory corruption identification support for hardware tag-based mode Subsystem: mm/initialization Jungseung Lee <js07.lee@samsung.com>: mm: report which part of mem is being freed on initmem case Subsystem: mm/pagealloc Mike Rapoport <rppt@linux.ibm.com>: mm/mmzone.h: simplify is_highmem_idx() "Matthew Wilcox (Oracle)" <willy@infradead.org>: Patch series "Constify struct page arguments": mm: make __dump_page static Aaron Tomlin <atomlin@redhat.com>: mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm/debug: factor PagePoisoned out of __dump_page mm/page_owner: constify dump_page_owner mm: make compound_head const-preserving mm: constify get_pfnblock_flags_mask and get_pfnblock_migratetype mm: constify page_count and page_ref_count mm: optimise nth_page for contiguous memmap Heiner Kallweit <hkallweit1@gmail.com>: mm/page_alloc: switch to pr_debug Andrii Nakryiko <andrii@kernel.org>: kbuild: skip per-CPU BTF generation for pahole v1.18-v1.21 Mel Gorman <mgorman@techsingularity.net>: mm/page_alloc: split per cpu page lists and zone stats mm/page_alloc: convert per-cpu list protection to local_lock mm/vmstat: convert NUMA statistics to basic NUMA counters mm/vmstat: inline NUMA event counter updates mm/page_alloc: batch the accounting updates in the bulk 
allocator mm/page_alloc: reduce duration that IRQs are disabled for VM counters mm/page_alloc: explicitly acquire the zone lock in __free_pages_ok mm/page_alloc: avoid conflating IRQs disabled with zone->lock mm/page_alloc: update PGFREE outside the zone lock in __free_pages_ok Minchan Kim <minchan@kernel.org>: mm: page_alloc: dump migrate-failed pages only at -EBUSY Mel Gorman <mgorman@techsingularity.net>: Patch series "Calculate pcp->high based on zone sizes and active CPUs", v2: mm/page_alloc: delete vm.percpu_pagelist_fraction mm/page_alloc: disassociate the pcp->high from pcp->batch mm/page_alloc: adjust pcp->high after CPU hotplug events mm/page_alloc: scale the number of pages that are batch freed mm/page_alloc: limit the number of pages on PCP lists when reclaim is active mm/page_alloc: introduce vm.percpu_pagelist_high_fraction Dong Aisheng <aisheng.dong@nxp.com>: mm: drop SECTION_SHIFT in code comments mm/page_alloc: improve memmap_pages dbg msg Liu Shixin <liushixin2@huawei.com>: mm/page_alloc: fix counting of managed_pages Mel Gorman <mgorman@techsingularity.net>: Patch series "Allow high order pages to be stored on PCP", v2: mm/page_alloc: move free_the_page Mike Rapoport <rppt@linux.ibm.com>: Patch series "Remove DISCONTIGMEM memory model", v3: alpha: remove DISCONTIGMEM and NUMA arc: update comment about HIGHMEM implementation arc: remove support for DISCONTIGMEM m68k: remove support for DISCONTIGMEM mm: remove CONFIG_DISCONTIGMEM arch, mm: remove stale mentions of DISCONIGMEM docs: remove description of DISCONTIGMEM mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM Mel Gorman <mgorman@techsingularity.net>: mm/page_alloc: allow high-order pages to be stored on the per-cpu lists mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes Subsystem: mm/memory-failure Naoya Horiguchi <naoya.horiguchi@nec.com>: mm,hwpoison: send SIGBUS with error virutal address mm,hwpoison: make 
get_hwpoison_page() call get_any_page() Documentation/admin-guide/kernel-parameters.txt | 6 Documentation/admin-guide/lockup-watchdogs.rst | 4 Documentation/admin-guide/sysctl/kernel.rst | 10 Documentation/admin-guide/sysctl/vm.rst | 52 - Documentation/dev-tools/kasan.rst | 9 Documentation/vm/memory-model.rst | 45 arch/alpha/Kconfig | 22 arch/alpha/include/asm/machvec.h | 6 arch/alpha/include/asm/mmzone.h | 100 -- arch/alpha/include/asm/pgtable.h | 4 arch/alpha/include/asm/topology.h | 39 arch/alpha/kernel/core_marvel.c | 53 - arch/alpha/kernel/core_wildfire.c | 29 arch/alpha/kernel/pci_iommu.c | 29 arch/alpha/kernel/proto.h | 8 arch/alpha/kernel/setup.c | 16 arch/alpha/kernel/sys_marvel.c | 5 arch/alpha/kernel/sys_wildfire.c | 5 arch/alpha/mm/Makefile | 2 arch/alpha/mm/init.c | 3 arch/alpha/mm/numa.c | 223 ---- arch/arc/Kconfig | 13 arch/arc/include/asm/mmzone.h | 40 arch/arc/kernel/troubleshoot.c | 8 arch/arc/mm/init.c | 21 arch/arm/include/asm/tlbflush.h | 13 arch/arm/mm/tlb-v6.S | 2 arch/arm/mm/tlb-v7.S | 2 arch/arm64/Kconfig | 2 arch/arm64/kvm/mmu.c | 2 arch/h8300/kernel/setup.c | 2 arch/ia64/Kconfig | 2 arch/ia64/include/asm/pal.h | 2 arch/ia64/include/asm/spinlock.h | 2 arch/ia64/include/asm/uv/uv_hub.h | 2 arch/ia64/kernel/efi_stub.S | 2 arch/ia64/kernel/mca_drv.c | 2 arch/ia64/kernel/topology.c | 5 arch/ia64/mm/numa.c | 5 arch/m68k/Kconfig.cpu | 10 arch/m68k/include/asm/mmzone.h | 10 arch/m68k/include/asm/page.h | 2 arch/m68k/include/asm/page_mm.h | 35 arch/m68k/include/asm/tlbflush.h | 2 arch/m68k/kernel/sys_m68k.c | 4 arch/m68k/mm/init.c | 20 arch/mips/Kconfig | 2 arch/mips/include/asm/mmzone.h | 8 arch/mips/include/asm/page.h | 2 arch/mips/kernel/traps.c | 4 arch/mips/mm/init.c | 7 arch/nds32/include/asm/memory.h | 6 arch/openrisc/include/asm/tlbflush.h | 2 arch/powerpc/Kconfig | 2 arch/powerpc/include/asm/mmzone.h | 4 arch/powerpc/kernel/setup_64.c | 2 arch/powerpc/kernel/smp.c | 2 arch/powerpc/kexec/core.c | 4 arch/powerpc/kvm/book3s_hv.c | 4 
arch/powerpc/kvm/book3s_hv_uvmem.c | 2 arch/powerpc/mm/Makefile | 2 arch/powerpc/mm/mem.c | 4 arch/riscv/Kconfig | 2 arch/s390/Kconfig | 2 arch/s390/include/asm/pgtable.h | 2 arch/sh/include/asm/mmzone.h | 4 arch/sh/kernel/topology.c | 2 arch/sh/mm/Kconfig | 2 arch/sh/mm/init.c | 2 arch/sparc/Kconfig | 2 arch/sparc/include/asm/mmzone.h | 4 arch/sparc/kernel/smp_64.c | 2 arch/sparc/mm/init_64.c | 12 arch/x86/Kconfig | 2 arch/x86/ia32/ia32_aout.c | 4 arch/x86/kernel/cpu/mce/core.c | 13 arch/x86/kernel/cpu/sgx/encl.h | 4 arch/x86/kernel/setup_percpu.c | 6 arch/x86/mm/init_32.c | 4 arch/xtensa/include/asm/page.h | 4 arch/xtensa/include/asm/tlbflush.h | 4 drivers/base/node.c | 18 drivers/block/loop.c | 270 ++++- drivers/block/loop.h | 15 drivers/dax/device.c | 2 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 2 drivers/media/common/videobuf2/frame_vector.c | 2 drivers/misc/sgi-gru/grufault.c | 4 drivers/vfio/vfio_iommu_type1.c | 2 drivers/virtio/virtio_balloon.c | 17 fs/adfs/inode.c | 1 fs/affs/file.c | 2 fs/bfs/file.c | 1 fs/binfmt_aout.c | 4 fs/binfmt_elf.c | 2 fs/binfmt_elf_fdpic.c | 11 fs/binfmt_flat.c | 2 fs/block_dev.c | 1 fs/buffer.c | 25 fs/configfs/inode.c | 8 fs/dax.c | 3 fs/ecryptfs/mmap.c | 13 fs/exfat/inode.c | 1 fs/ext2/inode.c | 4 fs/ext4/inode.c | 2 fs/fat/inode.c | 1 fs/fs-writeback.c | 366 +++++--- fs/fuse/dax.c | 3 fs/gfs2/aops.c | 2 fs/gfs2/meta_io.c | 2 fs/hfs/inode.c | 2 fs/hfsplus/inode.c | 2 fs/hpfs/file.c | 1 fs/iomap/buffered-io.c | 27 fs/jfs/inode.c | 1 fs/kernfs/inode.c | 8 fs/libfs.c | 44 fs/minix/inode.c | 1 fs/nilfs2/mdt.c | 1 fs/ntfs/inode.c | 2 fs/ocfs2/aops.c | 4 fs/ocfs2/cluster/heartbeat.c | 7 fs/ocfs2/cluster/nodemanager.c | 2 fs/ocfs2/dlm/dlmmaster.c | 2 fs/ocfs2/filecheck.c | 6 fs/ocfs2/stackglue.c | 8 fs/omfs/file.c | 1 fs/proc/task_mmu.c | 2 fs/ramfs/inode.c | 9 fs/squashfs/block.c | 5 fs/squashfs/squashfs_fs_sb.h | 1 fs/squashfs/super.c | 86 + fs/sysv/itree.c | 1 fs/udf/file.c | 1 
fs/udf/inode.c | 1 fs/ufs/inode.c | 1 fs/xfs/xfs_aops.c | 4 fs/zonefs/super.c | 4 include/asm-generic/memory_model.h | 37 include/asm-generic/pgtable-nop4d.h | 1 include/asm-generic/topology.h | 2 include/kunit/test.h | 5 include/linux/backing-dev-defs.h | 20 include/linux/cpuhotplug.h | 2 include/linux/fs.h | 6 include/linux/gfp.h | 13 include/linux/iomap.h | 1 include/linux/kasan.h | 7 include/linux/kernel.h | 2 include/linux/kthread.h | 2 include/linux/memblock.h | 6 include/linux/memcontrol.h | 60 - include/linux/mm.h | 53 - include/linux/mm_types.h | 10 include/linux/mman.h | 2 include/linux/mmdebug.h | 3 include/linux/mmzone.h | 96 +- include/linux/page-flags.h | 10 include/linux/page_owner.h | 6 include/linux/page_ref.h | 4 include/linux/page_reporting.h | 3 include/linux/pageblock-flags.h | 2 include/linux/pagemap.h | 4 include/linux/pgtable.h | 22 include/linux/printk.h | 5 include/linux/sched/coredump.h | 8 include/linux/slab.h | 59 + include/linux/swap.h | 19 include/linux/swapops.h | 5 include/linux/vmstat.h | 69 - include/linux/writeback.h | 1 include/trace/events/cma.h | 4 include/trace/events/filemap.h | 2 include/trace/events/kmem.h | 12 include/trace/events/page_pool.h | 4 include/trace/events/pagemap.h | 4 include/trace/events/vmscan.h | 2 kernel/cgroup/cgroup.c | 1 kernel/crash_core.c | 4 kernel/events/core.c | 2 kernel/events/uprobes.c | 4 kernel/fork.c | 1 kernel/kthread.c | 19 kernel/sysctl.c | 16 kernel/watchdog.c | 12 lib/Kconfig.debug | 15 lib/Kconfig.kasan | 16 lib/Makefile | 1 lib/dump_stack.c | 20 lib/kunit/test.c | 18 lib/slub_kunit.c | 152 +++ lib/test_hmm.c | 5 lib/test_kasan.c | 11 lib/vsprintf.c | 2 mm/Kconfig | 38 mm/backing-dev.c | 66 + mm/compaction.c | 2 mm/debug.c | 27 mm/debug_vm_pgtable.c | 63 + mm/dmapool.c | 5 mm/filemap.c | 2 mm/gup.c | 81 + mm/hugetlb.c | 2 mm/internal.h | 9 mm/kasan/Makefile | 4 mm/kasan/common.c | 6 mm/kasan/generic.c | 3 mm/kasan/hw_tags.c | 22 mm/kasan/init.c | 6 mm/kasan/kasan.h | 12 
mm/kasan/report.c | 6 mm/kasan/report_hw_tags.c | 5 mm/kasan/report_sw_tags.c | 45 mm/kasan/report_tags.c | 51 + mm/kasan/shadow.c | 6 mm/kasan/sw_tags.c | 45 mm/kasan/tags.c | 59 + mm/kfence/kfence_test.c | 5 mm/kmemleak.c | 18 mm/ksm.c | 6 mm/memblock.c | 8 mm/memcontrol.c | 385 ++++++-- mm/memory-failure.c | 344 +++++-- mm/memory.c | 22 mm/memory_hotplug.c | 6 mm/mempolicy.c | 4 mm/migrate.c | 4 mm/mmap.c | 54 - mm/mmap_lock.c | 33 mm/mprotect.c | 52 + mm/mremap.c | 5 mm/nommu.c | 2 mm/page-writeback.c | 89 + mm/page_alloc.c | 950 +++++++++++++-------- mm/page_ext.c | 2 mm/page_owner.c | 2 mm/page_reporting.c | 19 mm/page_reporting.h | 5 mm/pagewalk.c | 58 + mm/shmem.c | 18 mm/slab.h | 24 mm/slab_common.c | 60 - mm/slub.c | 420 +++++---- mm/sparse.c | 2 mm/swap.c | 4 mm/swap_slots.c | 2 mm/swap_state.c | 20 mm/swapfile.c | 177 +-- mm/vmalloc.c | 181 ++-- mm/vmscan.c | 43 mm/vmstat.c | 282 ++---- mm/workingset.c | 2 net/ipv4/tcp.c | 4 scripts/kconfig/streamline_config.pl | 76 - scripts/link-vmlinux.sh | 4 scripts/spelling.txt | 16 tools/testing/selftests/vm/gup_test.c | 96 +- tools/vm/page_owner_sort.c | 4 virt/kvm/kvm_main.c | 2 260 files changed, 3989 insertions(+), 2996 deletions(-)
This is the rest of the -mm tree, less 66 patches which are dependent on
things which are (or were recently) in linux-next. I'll trickle that
material over next week.

192 patches, based on 7cf3dead1ad70c72edb03e2d98e1f3dcd332cdb2 plus the
June 28 sendings.

Subsystems affected by this patch series:

  mm/hugetlb
  mm/userfaultfd
  mm/vmscan
  mm/kconfig
  mm/proc
  mm/z3fold
  mm/zbud
  mm/ras
  mm/mempolicy
  mm/memblock
  mm/migration
  mm/thp
  mm/nommu
  mm/kconfig
  mm/madvise
  mm/memory-hotplug
  mm/zswap
  mm/zsmalloc
  mm/zram
  mm/cleanups
  mm/kfence
  mm/hmm
  procfs
  sysctl
  misc
  core-kernel
  lib
  lz4
  checkpatch
  init
  kprobes
  nilfs2
  hfs
  signals
  exec
  kcov
  selftests
  compress/decompress
  ipc

Subsystem: mm/hugetlb

    Muchun Song <songmuchun@bytedance.com>:
    Patch series "Free some vmemmap pages of HugeTLB page", v23:
      mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c
      mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP
      mm: hugetlb: gather discrete indexes of tail page
      mm: hugetlb: free the vmemmap pages associated with each HugeTLB page
      mm: hugetlb: defer freeing of HugeTLB pages
      mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
      mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap
      mm: memory_hotplug: disable memmap_on_memory when hugetlb_free_vmemmap enabled
      mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate

    Shixin Liu <liushixin2@huawei.com>:
      mm/debug_vm_pgtable: move {pmd/pud}_huge_tests out of CONFIG_TRANSPARENT_HUGEPAGE
      mm/debug_vm_pgtable: remove redundant pfn_{pmd/pte}() and fix one comment mistake

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "Cleanup and fixup for huge_memory", v3:
      mm/huge_memory.c: remove dedicated macro HPAGE_CACHE_INDEX_MASK
      mm/huge_memory.c: use page->deferred_list
      mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled()
      mm/huge_memory.c: remove unnecessary tlb_remove_page_size() for huge zero pmd
      mm/huge_memory.c: don't discard hugepage if other processes are mapping it

    Christophe Leroy <christophe.leroy@csgroup.eu>:
    Patch series "Subject: [PATCH v2 0/5] Implement huge VMAP and VMALLOC on powerpc 8xx", v2:
      mm/hugetlb: change parameters of arch_make_huge_pte()
      mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge
      mm/vmalloc: enable mapping of huge pages at pte level in vmap
      mm/vmalloc: enable mapping of huge pages at pte level in vmalloc
      powerpc/8xx: add support for huge pages on VMAP and VMALLOC

    Nanyong Sun <sunnanyong@huawei.com>:
      khugepaged: selftests: remove debug_cow

    Mina Almasry <almasrymina@google.com>:
      mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY

    Muchun Song <songmuchun@bytedance.com>:
    Patch series "Split huge PMD mapping of vmemmap pages", v4:
      mm: sparsemem: split the huge PMD mapping of vmemmap pages
      mm: sparsemem: use huge PMD mapping for vmemmap pages
      mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON

    Mike Kravetz <mike.kravetz@oracle.com>:
    Patch series "Fix prep_compound_gigantic_page ref count adjustment":
      hugetlb: remove prep_compound_huge_page cleanup
      hugetlb: address ref count racing in prep_compound_gigantic_page

    Naoya Horiguchi <naoya.horiguchi@nec.com>:
      mm/hwpoison: disable pcp for page_handle_poison()

Subsystem: mm/userfaultfd

    Peter Xu <peterx@redhat.com>:
    Patch series "userfaultfd/selftests: A few cleanups", v2:
      userfaultfd/selftests: use user mode only
      userfaultfd/selftests: remove the time() check on delayed uffd
      userfaultfd/selftests: dropping VERIFY check in locking_thread
      userfaultfd/selftests: only dump counts if mode enabled
      userfaultfd/selftests: unify error handling
    Patch series "mm/uffd: Misc fix for uffd-wp and one more test":
      mm/thp: simplify copying of huge zero page pmd when fork
      mm/userfaultfd: fix uffd-wp special cases for fork()
      mm/userfaultfd: fail uffd-wp registration if not supported
      mm/pagemap: export uffd-wp protection information
      userfaultfd/selftests: add pagemap uffd-wp test

    Axel Rasmussen <axelrasmussen@google.com>:
    Patch series "userfaultfd: add minor fault handling for shmem", v6:
      userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
      userfaultfd/shmem: support minor fault registration for shmem
      userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
      userfaultfd/shmem: advertise shmem minor fault support
      userfaultfd/shmem: modify shmem_mfill_atomic_pte to use install_pte()
      userfaultfd/selftests: use memfd_create for shmem test type
      userfaultfd/selftests: create alias mappings in the shmem test
      userfaultfd/selftests: reinitialize test context in each test
      userfaultfd/selftests: exercise minor fault handling shmem support

Subsystem: mm/vmscan

    Yu Zhao <yuzhao@google.com>:
      mm/vmscan.c: fix potential deadlock in reclaim_pages()
      include/trace/events/vmscan.h: remove mm_vmscan_inactive_list_is_low

    Miaohe Lin <linmiaohe@huawei.com>:
      mm: workingset: define macro WORKINGSET_SHIFT

Subsystem: mm/kconfig

    Kefeng Wang <wangkefeng.wang@huawei.com>:
      mm/kconfig: move HOLES_IN_ZONE into mm

Subsystem: mm/proc

    Mike Rapoport <rppt@linux.ibm.com>:
      docs: proc.rst: meminfo: briefly describe gaps in memory accounting

    David Hildenbrand <david@redhat.com>:
    Patch series "fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages", v3:
      fs/proc/kcore: drop KCORE_REMAP and KCORE_OTHER
      fs/proc/kcore: pfn_is_ram check only applies to KCORE_RAM
      fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages
      mm: introduce page_offline_(begin|end|freeze|thaw) to synchronize setting PageOffline()
      virtio-mem: use page_offline_(start|end) when setting PageOffline()
      fs/proc/kcore: use page_offline_(freeze|thaw)

Subsystem: mm/z3fold

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "Cleanup and fixup for z3fold":
      mm/z3fold: define macro NCHUNKS as TOTAL_CHUNKS - ZHDR_CHUNKS
      mm/z3fold: avoid possible underflow in z3fold_alloc()
      mm/z3fold: remove magic number in z3fold_create_pool()
      mm/z3fold: remove unused function handle_to_z3fold_header()
      mm/z3fold: fix potential memory leak in z3fold_destroy_pool()
      mm/z3fold: use release_z3fold_page_locked() to release locked z3fold page

Subsystem: mm/zbud

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "Cleanups for zbud", v2:
      mm/zbud: reuse unbuddied[0] as buddied in zbud_pool
      mm/zbud: don't export any zbud API

Subsystem: mm/ras

    YueHaibing <yuehaibing@huawei.com>:
      mm/compaction: use DEVICE_ATTR_WO macro

    Liu Xiang <liu.xiang@zlingsmart.com>:
      mm: compaction: remove duplicate !list_empty(&sublist) check

    Wonhyuk Yang <vvghjk1234@gmail.com>:
      mm/compaction: fix 'limit' in fast_isolate_freepages

Subsystem: mm/mempolicy

    Feng Tang <feng.tang@intel.com>:
    Patch series "mm/mempolicy: some fix and semantics cleanup", v4:
      mm/mempolicy: cleanup nodemask intersection check for oom
      mm/mempolicy: don't handle MPOL_LOCAL like a fake MPOL_PREFERRED policy
      mm/mempolicy: unify the parameter sanity check for mbind and set_mempolicy

    Yang Shi <shy828301@gmail.com>:
      mm: mempolicy: don't have to split pmd for huge zero page

    Ben Widawsky <ben.widawsky@intel.com>:
      mm/mempolicy: use unified 'nodes' for bind/interleave/prefer policies

Subsystem: mm/memblock

    Mike Rapoport <rppt@linux.ibm.com>:
    Patch series "arm64: drop pfn_valid_within() and simplify pfn_valid()", v4:
      include/linux/mmzone.h: add documentation for pfn_valid()
      memblock: update initialization of reserved pages
      arm64: decouple check whether pfn is in linear map from pfn_valid()
      arm64: drop pfn_valid_within() and simplify pfn_valid()

    Anshuman Khandual <anshuman.khandual@arm.com>:
      arm64/mm: drop HAVE_ARCH_PFN_VALID

Subsystem: mm/migration

    Muchun Song <songmuchun@bytedance.com>:
      mm: migrate: fix missing update page_private to hugetlb_page_subpool

Subsystem: mm/thp

    Collin Fijalkovich <cfijalkovich@google.com>:
      mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs

    Yang Shi <shy828301@gmail.com>:
      mm: memory: add orig_pmd to struct vm_fault
      mm: memory: make numa_migrate_prep() non-static
      mm: thp: refactor NUMA fault handling
      mm: migrate: account THP NUMA migration counters correctly
      mm: migrate: don't split THP for misplaced NUMA page
      mm: migrate: check mapcount for THP instead of refcount
      mm: thp: skip make PMD PROT_NONE if THP migration is not supported

    Anshuman Khandual <anshuman.khandual@arm.com>:
      mm/thp: make ARCH_ENABLE_SPLIT_PMD_PTLOCK dependent on PGTABLE_LEVELS > 2

    Yang Shi <shy828301@gmail.com>:
      mm: rmap: make try_to_unmap() void function

    Hugh Dickins <hughd@google.com>:
      mm/thp: remap_page() is only needed on anonymous THP
      mm: hwpoison_user_mappings() try_to_unmap() with TTU_SYNC

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm/thp: fix strncpy warning

Subsystem: mm/nommu

    Chen Li <chenli@uniontech.com>:
      nommu: remove __GFP_HIGHMEM in vmalloc/vzalloc

    Liam Howlett <liam.howlett@oracle.com>:
      mm/nommu: unexport do_munmap()

Subsystem: mm/kconfig

    Kefeng Wang <wangkefeng.wang@huawei.com>:
      mm: generalize ZONE_[DMA|DMA32]

Subsystem: mm/madvise

    David Hildenbrand <david@redhat.com>:
    Patch series "mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables", v2:
      mm: make variable names for populate_vma_page_range() consistent
      mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables
      MAINTAINERS: add tools/testing/selftests/vm/ to MEMORY MANAGEMENT
      selftests/vm: add protection_keys_32 / protection_keys_64 to gitignore
      selftests/vm: add test for MADV_POPULATE_(READ|WRITE)

Subsystem: mm/memory-hotplug

    Liam Mark <lmark@codeaurora.org>:
      mm/memory_hotplug: rate limit page migration warnings

    Oscar Salvador <osalvador@suse.de>:
      mm,memory_hotplug: drop unneeded locking

Subsystem: mm/zswap

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "Cleanup and fixup for zswap":
      mm/zswap.c: remove unused function zswap_debugfs_exit()
      mm/zswap.c: avoid unnecessary copy-in at map time
      mm/zswap.c: fix two bugs in zswap_writeback_entry()

Subsystem: mm/zsmalloc

    Zhaoyang Huang <zhaoyang.huang@unisoc.com>:
      mm: zram: amend SLAB_RECLAIM_ACCOUNT on zspage_cachep

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "Cleanup for zsmalloc":
      mm/zsmalloc.c: remove confusing code in obj_free()
      mm/zsmalloc.c: improve readability for async_free_zspage()

Subsystem: mm/zram

    Yue Hu <huyue2@yulong.com>:
      zram: move backing_dev under macro CONFIG_ZRAM_WRITEBACK

Subsystem: mm/cleanups

    Hyeonggon Yoo <42.hyeyoo@gmail.com>:
      mm: fix typos and grammar error in comments

    Anshuman Khandual <anshuman.khandual@arm.com>:
      mm: define default value for FIRST_USER_ADDRESS

    Zhen Lei <thunder.leizhen@huawei.com>:
      mm: fix spelling mistakes

    Mel Gorman <mgorman@techsingularity.net>:
    Patch series "Clean W=1 build warnings for mm/":
      mm/vmscan: remove kerneldoc-like comment from isolate_lru_pages
      mm/vmalloc: include header for prototype of set_iounmap_nonlazy
      mm/page_alloc: make should_fail_alloc_page() static
      mm/mapping_dirty_helpers: remove double Note in kerneldoc
      mm/memcontrol.c: fix kerneldoc comment for mem_cgroup_calculate_protection
      mm/memory_hotplug: fix kerneldoc comment for __try_online_node
      mm/memory_hotplug: fix kerneldoc comment for __remove_memory
      mm/zbud: add kerneldoc fields for zbud_pool
      mm/z3fold: add kerneldoc fields for z3fold_pool
      mm/swap: make swap_address_space an inline function
      mm/mmap_lock: remove dead code for !CONFIG_TRACING configurations
      mm/page_alloc: move prototype for find_suitable_fallback
      mm/swap: make NODE_DATA an inline function on CONFIG_FLATMEM

    Anshuman Khandual <anshuman.khandual@arm.com>:
      mm/thp: define default pmd_pgtable()

Subsystem: mm/kfence

    Marco Elver <elver@google.com>:
      kfence: unconditionally use unbound work queue

Subsystem: mm/hmm

    Alistair Popple <apopple@nvidia.com>:
    Patch series "Add support for SVM atomics in Nouveau", v11:
      mm: remove special swap entry functions
      mm/swapops: rework swap entry manipulation code
      mm/rmap: split try_to_munlock from try_to_unmap
      mm/rmap: split migration into its own function
      mm: rename migrate_pgmap_owner
      mm/memory.c: allow different return codes for copy_nonpresent_pte()
      mm: device exclusive memory access
      mm: selftests for exclusive device memory
      nouveau/svm: refactor nouveau_range_fault
      nouveau/svm: implement atomic SVM access

Subsystem: procfs

    Marcelo Henrique Cerri <marcelo.cerri@canonical.com>:
      proc: Avoid mixing integer types in mem_rw()

    ZHOUFENG <zhoufeng.zf@bytedance.com>:
      fs/proc/kcore.c: add mmap interface

    Kalesh Singh <kaleshsingh@google.com>:
      procfs: allow reading fdinfo with PTRACE_MODE_READ
      procfs/dmabuf: add inode number to /proc/*/fdinfo

Subsystem: sysctl

    Jiapeng Chong <jiapeng.chong@linux.alibaba.com>:
      sysctl: remove redundant assignment to first

Subsystem: misc

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      drm: include only needed headers in ascii85.h

Subsystem: core-kernel

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      kernel.h: split out panic and oops helpers

Subsystem: lib

    Zhen Lei <thunder.leizhen@huawei.com>:
      lib: decompress_bunzip2: remove an unneeded semicolon

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
    Patch series "lib/string_helpers: get rid of ugly *_escape_mem_ascii()", v3:
      lib/string_helpers: switch to use BIT() macro
      lib/string_helpers: move ESCAPE_NP check inside 'else' branch in a loop
      lib/string_helpers: drop indentation level in string_escape_mem()
      lib/string_helpers: introduce ESCAPE_NA for escaping non-ASCII
      lib/string_helpers: introduce ESCAPE_NAP to escape non-ASCII and non-printable
      lib/string_helpers: allow to append additional characters to be escaped
      lib/test-string_helpers: print flags in hexadecimal format
      lib/test-string_helpers: get rid of trailing comma in terminators
      lib/test-string_helpers: add test cases for new features
      MAINTAINERS: add myself as designated reviewer for generic string library
      seq_file: introduce seq_escape_mem()
      seq_file: add seq_escape_str() as replica of string_escape_str()
      seq_file: convert seq_escape() to use seq_escape_str()
      nfsd: avoid non-flexible API in seq_quote_mem()
      seq_file: drop unused *_escape_mem_ascii()

    Trent Piepho <tpiepho@gmail.com>:
      lib/math/rational.c: fix divide by zero
      lib/math/rational: add Kunit test cases

    Zhen Lei <thunder.leizhen@huawei.com>:
      lib/decompressors: fix spelling mistakes
      lib/mpi: fix spelling mistakes

    Alexey Dobriyan <adobriyan@gmail.com>:
      lib: memscan() fixlet
      lib: uninline simple_strtoull()

    Matteo Croce <mcroce@microsoft.com>:
      lib/test_string.c: allow module removal

    Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
      kernel.h: split out kstrtox() and simple_strtox() to a separate header

Subsystem: lz4

    Rajat Asthana <thisisrast7@gmail.com>:
      lz4_decompress: declare LZ4_decompress_safe_withPrefix64k static

    Dimitri John Ledkov <dimitri.ledkov@canonical.com>:
      lib/decompress_unlz4.c: correctly handle zero-padding around initrds.

Subsystem: checkpatch

    Guenter Roeck <linux@roeck-us.net>:
      checkpatch: scripts/spdxcheck.py now requires python3

    Joe Perches <joe@perches.com>:
      checkpatch: improve the indented label test

    Guenter Roeck <linux@roeck-us.net>:
      checkpatch: do not complain about positive return values starting with EPOLL

Subsystem: init

    Andrew Halaney <ahalaney@redhat.com>:
      init: print out unknown kernel parameters

Subsystem: kprobes

    Barry Song <song.bao.hua@hisilicon.com>:
      kprobes: remove duplicated strong free_insn_page in x86 and s390

Subsystem: nilfs2

    Colin Ian King <colin.king@canonical.com>:
      nilfs2: remove redundant continue statement in a while-loop

Subsystem: hfs

    Zhen Lei <thunder.leizhen@huawei.com>:
      hfsplus: remove unnecessary oom message

    Chung-Chiang Cheng <shepjeng@gmail.com>:
      hfsplus: report create_date to kstat.btime

Subsystem: signals

    Al Viro <viro@zeniv.linux.org.uk>:
      x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned

Subsystem: exec

    Alexey Dobriyan <adobriyan@gmail.com>:
      exec: remove checks in __register_bimfmt()

Subsystem: kcov

    Marco Elver <elver@google.com>:
      kcov: add __no_sanitize_coverage to fix noinstr for all architectures

Subsystem: selftests

    Dave Hansen <dave.hansen@linux.intel.com>:
    Patch series "selftests/vm/pkeys: Bug fixes and a new test":
      selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
      selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
      selftests/vm/pkeys: refill shadow register after implicit kernel write
      selftests/vm/pkeys: exercise x86 XSAVE init state

Subsystem: compress/decompress

    Yu Kuai <yukuai3@huawei.com>:
      lib/decompressors: remove set but not used variabled 'level'

Subsystem: ipc

    Vasily Averin <vvs@virtuozzo.com>:
    Patch series "ipc: allocations cleanup", v2:
      ipc sem: use kvmalloc for sem_undo allocation
      ipc: use kmalloc for msg_queue and shmid_kernel

    Manfred Spraul <manfred@colorfullife.com>:
      ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
      ipc/util.c: use binary search for max_idx

 Documentation/admin-guide/kernel-parameters.txt | 35
 Documentation/admin-guide/mm/hugetlbpage.rst | 11
 Documentation/admin-guide/mm/memory-hotplug.rst | 13
 Documentation/admin-guide/mm/pagemap.rst | 2
 Documentation/admin-guide/mm/userfaultfd.rst | 3
 Documentation/core-api/kernel-api.rst | 7
 Documentation/filesystems/proc.rst | 48
 Documentation/vm/hmm.rst | 19
 Documentation/vm/unevictable-lru.rst | 33
 MAINTAINERS | 10
 arch/alpha/Kconfig | 5
 arch/alpha/include/asm/pgalloc.h | 1
 arch/alpha/include/asm/pgtable.h | 1
 arch/alpha/include/uapi/asm/mman.h | 3
 arch/alpha/kernel/setup.c | 2
 arch/arc/include/asm/pgalloc.h | 2
 arch/arc/include/asm/pgtable.h | 8
 arch/arm/Kconfig | 3
 arch/arm/include/asm/pgalloc.h | 1
 arch/arm64/Kconfig | 15
 arch/arm64/include/asm/hugetlb.h | 3
 arch/arm64/include/asm/memory.h | 2
 arch/arm64/include/asm/page.h | 4
 arch/arm64/include/asm/pgalloc.h | 1
 arch/arm64/include/asm/pgtable.h | 2
 arch/arm64/kernel/setup.c | 1
 arch/arm64/kvm/mmu.c | 2
 arch/arm64/mm/hugetlbpage.c | 5
 arch/arm64/mm/init.c | 51
 arch/arm64/mm/ioremap.c | 4
 arch/arm64/mm/mmu.c | 22
 arch/csky/include/asm/pgalloc.h | 2
 arch/csky/include/asm/pgtable.h | 1
 arch/hexagon/include/asm/pgtable.h | 4
 arch/ia64/Kconfig | 7
 arch/ia64/include/asm/pal.h | 1
 arch/ia64/include/asm/pgalloc.h | 1
 arch/ia64/include/asm/pgtable.h | 1
 arch/m68k/Kconfig | 5
 arch/m68k/include/asm/mcf_pgalloc.h | 2
 arch/m68k/include/asm/mcf_pgtable.h | 2
 arch/m68k/include/asm/motorola_pgalloc.h | 1
 arch/m68k/include/asm/motorola_pgtable.h | 2
 arch/m68k/include/asm/pgtable_mm.h | 1
 arch/m68k/include/asm/sun3_pgalloc.h | 1
 arch/microblaze/Kconfig | 4
 arch/microblaze/include/asm/pgalloc.h | 2
 arch/microblaze/include/asm/pgtable.h | 2
 arch/mips/Kconfig | 10
 arch/mips/include/asm/pgalloc.h | 1
 arch/mips/include/asm/pgtable-32.h | 1
 arch/mips/include/asm/pgtable-64.h | 1
 arch/mips/include/uapi/asm/mman.h | 3
 arch/mips/kernel/relocate.c | 1
 arch/mips/sgi-ip22/ip22-reset.c | 1
 arch/mips/sgi-ip32/ip32-reset.c | 1
 arch/nds32/include/asm/pgalloc.h | 5
 arch/nios2/include/asm/pgalloc.h | 1
 arch/nios2/include/asm/pgtable.h | 2
 arch/openrisc/include/asm/pgalloc.h | 2
 arch/openrisc/include/asm/pgtable.h | 1
 arch/parisc/include/asm/pgalloc.h | 1
 arch/parisc/include/asm/pgtable.h | 2
 arch/parisc/include/uapi/asm/mman.h | 3
 arch/parisc/kernel/pdc_chassis.c | 1
 arch/powerpc/Kconfig | 6
 arch/powerpc/include/asm/book3s/pgtable.h | 1
 arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h | 5
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h | 43
 arch/powerpc/include/asm/nohash/32/pgtable.h | 1
 arch/powerpc/include/asm/nohash/64/pgtable.h | 2
 arch/powerpc/include/asm/pgalloc.h | 5
 arch/powerpc/include/asm/pgtable.h | 6
 arch/powerpc/kernel/setup-common.c | 1
 arch/powerpc/platforms/Kconfig.cputype | 1
 arch/riscv/Kconfig | 5
 arch/riscv/include/asm/pgalloc.h | 2
 arch/riscv/include/asm/pgtable.h | 2
 arch/s390/Kconfig | 6
 arch/s390/include/asm/pgalloc.h | 3
 arch/s390/include/asm/pgtable.h | 5
 arch/s390/kernel/ipl.c | 1
 arch/s390/kernel/kprobes.c | 5
 arch/s390/mm/pgtable.c | 2
 arch/sh/include/asm/pgalloc.h | 1
 arch/sh/include/asm/pgtable.h | 2
 arch/sparc/Kconfig | 5
 arch/sparc/include/asm/pgalloc_32.h | 1
 arch/sparc/include/asm/pgalloc_64.h | 1
 arch/sparc/include/asm/pgtable_32.h | 3
 arch/sparc/include/asm/pgtable_64.h | 8
 arch/sparc/kernel/sstate.c | 1
 arch/sparc/mm/hugetlbpage.c | 6
 arch/sparc/mm/init_64.c | 1
 arch/um/drivers/mconsole_kern.c | 1
 arch/um/include/asm/pgalloc.h | 1
 arch/um/include/asm/pgtable-2level.h | 1
 arch/um/include/asm/pgtable-3level.h | 1
 arch/um/kernel/um_arch.c | 1
 arch/x86/Kconfig | 17
 arch/x86/include/asm/desc.h | 1
 arch/x86/include/asm/pgalloc.h | 2
 arch/x86/include/asm/pgtable_types.h | 2
 arch/x86/kernel/cpu/mshyperv.c | 1
 arch/x86/kernel/kprobes/core.c | 6
 arch/x86/kernel/setup.c | 1
 arch/x86/mm/init_64.c | 21
 arch/x86/mm/pgtable.c | 34
 arch/x86/purgatory/purgatory.c | 2
 arch/x86/xen/enlighten.c | 1
 arch/xtensa/include/asm/pgalloc.h | 2
 arch/xtensa/include/asm/pgtable.h | 1
 arch/xtensa/include/uapi/asm/mman.h | 3
 arch/xtensa/platforms/iss/setup.c | 1
 drivers/block/zram/zram_drv.h | 2
 drivers/bus/brcmstb_gisb.c | 1
 drivers/char/ipmi/ipmi_msghandler.c | 1
 drivers/clk/analogbits/wrpll-cln28hpc.c | 4
 drivers/edac/altera_edac.c | 1
 drivers/firmware/google/gsmi.c | 1
 drivers/gpu/drm/nouveau/include/nvif/if000c.h | 1
 drivers/gpu/drm/nouveau/nouveau_svm.c | 162 ++-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h | 1
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c | 6
 drivers/hv/vmbus_drv.c | 1
 drivers/hwtracing/coresight/coresight-cpu-debug.c | 1
 drivers/leds/trigger/ledtrig-activity.c | 1
 drivers/leds/trigger/ledtrig-heartbeat.c | 1
 drivers/leds/trigger/ledtrig-panic.c | 1
 drivers/misc/bcm-vk/bcm_vk_dev.c | 1
 drivers/misc/ibmasm/heartbeat.c | 1
 drivers/misc/pvpanic/pvpanic.c | 1
 drivers/net/ipa/ipa_smp2p.c | 1
 drivers/parisc/power.c | 1
 drivers/power/reset/ltc2952-poweroff.c | 1
 drivers/remoteproc/remoteproc_core.c | 1
 drivers/s390/char/con3215.c | 1
 drivers/s390/char/con3270.c | 1
 drivers/s390/char/sclp.c | 1
 drivers/s390/char/sclp_con.c | 1
 drivers/s390/char/sclp_vt220.c | 1
 drivers/s390/char/zcore.c | 1
 drivers/soc/bcm/brcmstb/pm/pm-arm.c | 1
 drivers/staging/olpc_dcon/olpc_dcon.c | 1
 drivers/video/fbdev/hyperv_fb.c | 1
 drivers/virtio/virtio_mem.c | 2
 fs/Kconfig | 15
 fs/exec.c | 3
 fs/hfsplus/inode.c | 5
 fs/hfsplus/xattr.c | 1
 fs/nfsd/nfs4state.c | 2
 fs/nilfs2/btree.c | 1
 fs/open.c | 13
 fs/proc/base.c | 6
 fs/proc/fd.c | 20
 fs/proc/kcore.c | 136 ++
 fs/proc/task_mmu.c | 34
 fs/seq_file.c | 43
 fs/userfaultfd.c | 15
 include/asm-generic/bug.h | 3
 include/linux/ascii85.h | 3
 include/linux/bootmem_info.h | 68 +
 include/linux/compat.h | 2
 include/linux/compiler-clang.h | 17
 include/linux/compiler-gcc.h | 6
 include/linux/compiler_types.h | 2
 include/linux/huge_mm.h | 74 -
 include/linux/hugetlb.h | 80 +
 include/linux/hugetlb_cgroup.h | 19
 include/linux/kcore.h | 3
 include/linux/kernel.h | 227 ----
 include/linux/kprobes.h | 1
 include/linux/kstrtox.h | 155 ++
 include/linux/memblock.h | 4
 include/linux/memory_hotplug.h | 27
 include/linux/mempolicy.h | 9
 include/linux/memremap.h | 2
 include/linux/migrate.h | 27
 include/linux/mm.h | 18
 include/linux/mm_types.h | 2
 include/linux/mmu_notifier.h | 26
 include/linux/mmzone.h | 27
 include/linux/mpi.h | 4
 include/linux/page-flags.h | 22
 include/linux/panic.h | 98 +
 include/linux/panic_notifier.h | 12
 include/linux/pgtable.h | 44
 include/linux/rmap.h | 13
 include/linux/seq_file.h | 10
 include/linux/shmem_fs.h | 19
 include/linux/signal.h | 2
 include/linux/string.h | 7
 include/linux/string_helpers.h | 31
 include/linux/sunrpc/cache.h | 1
 include/linux/swap.h | 19
 include/linux/swapops.h | 171 +--
 include/linux/thread_info.h | 1
 include/linux/userfaultfd_k.h | 5
 include/linux/vmalloc.h | 15
 include/linux/zbud.h | 23
 include/trace/events/vmscan.h | 41
 include/uapi/asm-generic/mman-common.h | 3
 include/uapi/linux/mempolicy.h | 1
 include/uapi/linux/userfaultfd.h | 7
 init/main.c | 42
 ipc/msg.c | 6
 ipc/sem.c | 25
 ipc/shm.c | 6
 ipc/util.c | 44
 ipc/util.h | 3
 kernel/hung_task.c | 1
 kernel/kexec_core.c | 1
 kernel/kprobes.c | 2
 kernel/panic.c | 1
 kernel/rcu/tree.c | 2
 kernel/signal.c | 14
 kernel/sysctl.c | 4
 kernel/trace/trace.c | 1
 lib/Kconfig.debug | 12
 lib/decompress_bunzip2.c | 6
 lib/decompress_unlz4.c | 8
 lib/decompress_unlzo.c | 3
 lib/decompress_unxz.c | 2
 lib/decompress_unzstd.c | 4
 lib/kstrtox.c | 5
 lib/lz4/lz4_decompress.c | 2
 lib/math/Makefile | 1
 lib/math/rational-test.c | 56 +
 lib/math/rational.c | 16
 lib/mpi/longlong.h | 4
 lib/mpi/mpicoder.c | 6
 lib/mpi/mpiutil.c | 2
 lib/parser.c | 1
 lib/string.c | 2
 lib/string_helpers.c | 142 +-
 lib/test-string_helpers.c | 157 ++-
 lib/test_hmm.c | 127 ++
 lib/test_hmm_uapi.h | 2
 lib/test_string.c | 5
 lib/vsprintf.c | 1
 lib/xz/xz_dec_bcj.c | 2
 lib/xz/xz_dec_lzma2.c | 8
 lib/zlib_inflate/inffast.c | 2
 lib/zstd/huf.h | 2
 mm/Kconfig | 16
 mm/Makefile | 2
 mm/bootmem_info.c | 127 ++
 mm/compaction.c | 20
 mm/debug_vm_pgtable.c | 109 --
 mm/gup.c | 58 +
 mm/hmm.c | 12
 mm/huge_memory.c | 269 ++---
 mm/hugetlb.c | 369 +++++--
 mm/hugetlb_vmemmap.c | 332 ++++++
 mm/hugetlb_vmemmap.h | 53 -
 mm/internal.h | 29
 mm/kfence/core.c | 4
 mm/khugepaged.c | 20
 mm/madvise.c | 66 +
 mm/mapping_dirty_helpers.c | 2
 mm/memblock.c | 28
 mm/memcontrol.c | 4
 mm/memory-failure.c | 38
 mm/memory.c | 239 +++-
 mm/memory_hotplug.c | 161 ---
 mm/mempolicy.c | 323 ++----
 mm/migrate.c | 268 +----
 mm/mlock.c | 12
 mm/mmap_lock.c | 59 -
 mm/mprotect.c | 18
 mm/nommu.c | 5
 mm/oom_kill.c | 2
 mm/page_alloc.c | 5
 mm/page_vma_mapped.c | 15
 mm/rmap.c | 644 +++++++++---
 mm/shmem.c | 125 --
 mm/sparse-vmemmap.c | 432 +++++++-
 mm/sparse.c | 1
 mm/swap.c | 2
 mm/swapfile.c | 2
 mm/userfaultfd.c | 249 ++--
 mm/util.c | 40
 mm/vmalloc.c | 37
 mm/vmscan.c | 20
 mm/workingset.c | 10
 mm/z3fold.c | 39
 mm/zbud.c | 235 ++--
 mm/zsmalloc.c | 5
 mm/zswap.c | 26
 scripts/checkpatch.pl | 16
 tools/testing/selftests/vm/.gitignore | 3
 tools/testing/selftests/vm/Makefile | 5
 tools/testing/selftests/vm/hmm-tests.c | 158 +++
 tools/testing/selftests/vm/khugepaged.c | 4
 tools/testing/selftests/vm/madv_populate.c | 342 ++++++
 tools/testing/selftests/vm/pkey-x86.h | 1
 tools/testing/selftests/vm/protection_keys.c | 85 +
 tools/testing/selftests/vm/run_vmtests.sh | 16
 tools/testing/selftests/vm/userfaultfd.c | 1094 ++++++++++-----------
 299 files changed, 6277 insertions(+), 3183 deletions(-)
On Wed, Jun 30, 2021 at 6:46 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> This is the rest of the -mm tree, less 66 patches which are dependent on
> things which are (or were recently) in linux-next. I'll trickle that
> material over next week.
I haven't bisected this yet, but with the current -git I'm getting
watchdog: BUG: soft lockup - CPU#41 stuck for 49s!
and the common call chain seems to be in flush_tlb_mm_range ->
on_each_cpu_cond_mask.
Commit e058a84bfddc42ba356a2316f2cf1141974625c9 is good, and looking
at the pulls and merges I've done since, this -mm series looks like
the obvious culprit.
I'll go start bisection, but I thought I'd give a heads-up in case
somebody else has seen TLB-flush-related lockups and already figured
out the guilty party..
Linus
On Fri, Jul 2, 2021 at 5:28 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Commit e058a84bfddc42ba356a2316f2cf1141974625c9 is good, and looking
> at the pulls and merges I've done since, this -mm series looks like
> the obvious culprit.
No, unless my bisection is wrong, the -mm branch is innocent, and was
discarded from the suspects on the very first bisection trial.
So never mind.
Linus
54 patches, based on a931dd33d370896a683236bba67c0d6f3d01144d.

Subsystems affected by this patch series:

  lib
  mm/slub
  mm/secretmem
  mm/cleanups
  mm/init
  debug
  mm/pagemap
  mm/mremap

Subsystem: lib

    Zhen Lei <thunder.leizhen@huawei.com>:
      lib/test: fix spelling mistakes
      lib: fix spelling mistakes
      lib: fix spelling mistakes in header files

Subsystem: mm/slub

    Nathan Chancellor <nathan@kernel.org>:
    Patch series "hexagon: Fix build error with CONFIG_STACKDEPOT and select CONFIG_ARCH_WANT_LD_ORPHAN_WARN":
      hexagon: handle {,SOFT}IRQENTRY_TEXT in linker script
      hexagon: use common DISCARDS macro
      hexagon: select ARCH_WANT_LD_ORPHAN_WARN

    Oliver Glitta <glittao@gmail.com>:
      mm/slub: use stackdepot to save stack trace in objects

Subsystem: mm/secretmem

    Mike Rapoport <rppt@linux.ibm.com>:
    Patch series "mm: introduce memfd_secret system call to create "secret" memory areas", v20:
      mmap: make mlock_future_check() global
      riscv/Kconfig: make direct map manipulation options depend on MMU
      set_memory: allow querying whether set_direct_map_*() is actually enabled
      mm: introduce memfd_secret system call to create "secret" memory areas
      PM: hibernate: disable when there are active secretmem users
      arch, mm: wire up memfd_secret system call where relevant
      secretmem: test: add basic selftest for memfd_secret(2)

Subsystem: mm/cleanups

    Zhen Lei <thunder.leizhen@huawei.com>:
      mm: fix spelling mistakes in header files

Subsystem: mm/init

    Kefeng Wang <wangkefeng.wang@huawei.com>:
    Patch series "init_mm: cleanup ARCH's text/data/brk setup code", v3:
      mm: add setup_initial_init_mm() helper
      arc: convert to setup_initial_init_mm()
      arm: convert to setup_initial_init_mm()
      arm64: convert to setup_initial_init_mm()
      csky: convert to setup_initial_init_mm()
      h8300: convert to setup_initial_init_mm()
      m68k: convert to setup_initial_init_mm()
      nds32: convert to setup_initial_init_mm()
      nios2: convert to setup_initial_init_mm()
      openrisc: convert to setup_initial_init_mm()
      powerpc: convert to setup_initial_init_mm()
      riscv: convert to setup_initial_init_mm()
      s390: convert to setup_initial_init_mm()
      sh: convert to setup_initial_init_mm()
      x86: convert to setup_initial_init_mm()

Subsystem: debug

    Stephen Boyd <swboyd@chromium.org>:
    Patch series "Add build ID to stacktraces", v6:
      buildid: only consider GNU notes for build ID parsing
      buildid: add API to parse build ID out of buffer
      buildid: stash away kernels build ID on init
      dump_stack: add vmlinux build ID to stack traces
      module: add printk formats to add module build ID to stacktraces
      arm64: stacktrace: use %pSb for backtrace printing
      x86/dumpstack: use %pSb/%pBb for backtrace printing
      scripts/decode_stacktrace.sh: support debuginfod
      scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm
      scripts/decode_stacktrace.sh: indicate 'auto' can be used for base path
      buildid: mark some arguments const
      buildid: fix kernel-doc notation
      kdump: use vmlinux_build_id to simplify

Subsystem: mm/pagemap

    "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
      mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *
      mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t *

Subsystem: mm/mremap

    "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
    Patch series "mrermap fixes", v2:
      selftest/mremap_test: update the test to handle pagesize other than 4K
      selftest/mremap_test: avoid crash with static build
      mm/mremap: convert huge PUD move to separate helper
      mm/mremap: don't enable optimized PUD move if page table levels is 2
      mm/mremap: use pmd/pud_poplulate to update page table entries
      mm/mremap: hold the rmap lock in write mode when moving page table entries.
    Patch series "Speedup mremap on ppc64", v8:
      mm/mremap: allow arch runtime override
      powerpc/book3s64/mm: update flush_tlb_range to flush page walk cache
      powerpc/mm: enable HAVE_MOVE_PMD support

 Documentation/core-api/printk-formats.rst | 11
 arch/alpha/include/asm/pgtable.h | 8
 arch/arc/mm/init.c | 5
 arch/arm/include/asm/pgtable-3level.h | 2
 arch/arm/kernel/setup.c | 5
 arch/arm64/include/asm/Kbuild | 1
 arch/arm64/include/asm/cacheflush.h | 6
 arch/arm64/include/asm/kfence.h | 2
 arch/arm64/include/asm/pgtable.h | 8
 arch/arm64/include/asm/set_memory.h | 17 +
 arch/arm64/include/uapi/asm/unistd.h | 1
 arch/arm64/kernel/machine_kexec.c | 1
 arch/arm64/kernel/setup.c | 5
 arch/arm64/kernel/stacktrace.c | 2
 arch/arm64/mm/mmu.c | 7
 arch/arm64/mm/pageattr.c | 13
 arch/csky/kernel/setup.c | 5
 arch/h8300/kernel/setup.c | 5
 arch/hexagon/Kconfig | 1
 arch/hexagon/kernel/vmlinux.lds.S | 9
 arch/ia64/include/asm/pgtable.h | 4
 arch/m68k/include/asm/motorola_pgtable.h | 2
 arch/m68k/kernel/setup_mm.c | 5
 arch/m68k/kernel/setup_no.c | 5
 arch/mips/include/asm/pgtable-64.h | 8
 arch/nds32/kernel/setup.c | 5
 arch/nios2/kernel/setup.c | 5
 arch/openrisc/kernel/setup.c | 5
 arch/parisc/include/asm/pgtable.h | 4
 arch/powerpc/include/asm/book3s/64/pgtable.h | 11
 arch/powerpc/include/asm/book3s/64/tlbflush-radix.h | 2
 arch/powerpc/include/asm/nohash/64/pgtable-4k.h | 6
 arch/powerpc/include/asm/nohash/64/pgtable.h | 6
 arch/powerpc/include/asm/tlb.h | 6
 arch/powerpc/kernel/setup-common.c | 5
 arch/powerpc/mm/book3s64/radix_hugetlbpage.c | 8
 arch/powerpc/mm/book3s64/radix_pgtable.c | 6
 arch/powerpc/mm/book3s64/radix_tlb.c | 44 +-
 arch/powerpc/mm/pgtable_64.c | 4
 arch/powerpc/platforms/Kconfig.cputype | 2
 arch/riscv/Kconfig | 4
 arch/riscv/include/asm/pgtable-64.h | 4
 arch/riscv/include/asm/unistd.h | 1
 arch/riscv/kernel/setup.c | 5
 arch/s390/kernel/setup.c | 5
 arch/sh/include/asm/pgtable-3level.h | 4
 arch/sh/kernel/setup.c | 5
 arch/sparc/include/asm/pgtable_32.h | 6
 arch/sparc/include/asm/pgtable_64.h | 10
 arch/um/include/asm/pgtable-3level.h | 2
 arch/x86/entry/syscalls/syscall_32.tbl | 1
 arch/x86/entry/syscalls/syscall_64.tbl | 1
 arch/x86/include/asm/pgtable.h | 8
 arch/x86/kernel/dumpstack.c | 2
 arch/x86/kernel/setup.c | 5
 arch/x86/mm/init_64.c | 4
 arch/x86/mm/pat/set_memory.c | 4
 arch/x86/mm/pgtable.c | 2
 include/asm-generic/pgtable-nop4d.h | 2
 include/asm-generic/pgtable-nopmd.h | 2
 include/asm-generic/pgtable-nopud.h | 4
 include/linux/bootconfig.h | 4
 include/linux/buildid.h | 10
 include/linux/compaction.h | 4
 include/linux/cpumask.h | 2
 include/linux/crash_core.h | 12
 include/linux/debugobjects.h | 2
 include/linux/hmm.h | 2
 include/linux/hugetlb.h | 6
 include/linux/kallsyms.h | 21 +
 include/linux/list_lru.h | 4
 include/linux/lru_cache.h | 8
 include/linux/mm.h | 3
 include/linux/mmu_notifier.h | 8
 include/linux/module.h | 9
 include/linux/nodemask.h | 6
 include/linux/percpu-defs.h | 2
 include/linux/percpu-refcount.h | 2
 include/linux/pgtable.h | 4
 include/linux/scatterlist.h | 2
 include/linux/secretmem.h | 54 +++
 include/linux/set_memory.h | 12
 include/linux/shrinker.h | 2
 include/linux/syscalls.h | 1
 include/linux/vmalloc.h | 4
 include/uapi/asm-generic/unistd.h | 7
 include/uapi/linux/magic.h | 1
 init/Kconfig | 1
 init/main.c | 2
 kernel/crash_core.c | 50 ---
 kernel/kallsyms.c | 104 +++++--
 kernel/module.c | 42 ++
 kernel/power/hibernate.c | 5
 kernel/sys_ni.c | 2
 lib/Kconfig.debug | 17 -
 lib/asn1_encoder.c | 2
 lib/buildid.c | 80 ++++-
 lib/devres.c | 2
 lib/dump_stack.c | 13
 lib/dynamic_debug.c | 2
 lib/fonts/font_pearl_8x8.c | 2
 lib/kfifo.c | 2
 lib/list_sort.c | 2
 lib/nlattr.c | 4
 lib/oid_registry.c | 2
 lib/pldmfw/pldmfw.c | 2
 lib/reed_solomon/test_rslib.c | 2
 lib/refcount.c | 2
 lib/rhashtable.c | 2
 lib/sbitmap.c | 2
 lib/scatterlist.c | 4
 lib/seq_buf.c | 2
 lib/sort.c | 2
 lib/stackdepot.c | 2
 lib/test_bitops.c | 2
 lib/test_bpf.c | 2
 lib/test_kasan.c | 2
 lib/test_kmod.c | 6
 lib/test_scanf.c | 2
 lib/vsprintf.c | 10
 mm/Kconfig | 4
 mm/Makefile | 1
 mm/gup.c | 12
 mm/init-mm.c | 9
 mm/internal.h | 3
 mm/mlock.c | 3
 mm/mmap.c | 5
 mm/mremap.c | 108 ++++++-
 mm/secretmem.c | 254 +++++++++++++++++
 mm/slub.c | 79 +++--
 scripts/checksyscalls.sh | 4
 scripts/decode_stacktrace.sh | 89 +++++-
 tools/testing/selftests/vm/.gitignore | 1
 tools/testing/selftests/vm/Makefile | 3
 tools/testing/selftests/vm/memfd_secret.c | 296 ++++++++++++++++++++
 tools/testing/selftests/vm/mremap_test.c | 116 ++++---
 tools/testing/selftests/vm/run_vmtests.sh | 17 +
 137 files changed, 1470 insertions(+), 442 deletions(-)
13 patches, based on 40226a3d96ef8ab8980f032681c8bfd46d63874e.

Subsystems affected by this patch series:

  mm/kasan
  mm/pagealloc
  mm/rmap
  mm/hmm
  hfs
  mm/hugetlb

Subsystem: mm/kasan

    Marco Elver <elver@google.com>:
      mm: move helper to check slub_debug_enabled

    Yee Lee <yee.lee@mediatek.com>:
      kasan: add memzero init for unaligned size at DEBUG

    Marco Elver <elver@google.com>:
      kasan: fix build by including kernel.h

Subsystem: mm/pagealloc

    Matteo Croce <mcroce@microsoft.com>:
      Revert "mm/page_alloc: make should_fail_alloc_page() static"

    Mel Gorman <mgorman@techsingularity.net>:
      mm/page_alloc: avoid page allocator recursion with pagesets.lock held

    Yanfei Xu <yanfei.xu@windriver.com>:
      mm/page_alloc: correct return value when failing at preparing

    Chuck Lever <chuck.lever@oracle.com>:
      mm/page_alloc: further fix __alloc_pages_bulk() return value

Subsystem: mm/rmap

    Christoph Hellwig <hch@lst.de>:
      mm: fix the try_to_unmap prototype for !CONFIG_MMU

Subsystem: mm/hmm

    Alistair Popple <apopple@nvidia.com>:
      lib/test_hmm: remove set but unused page variable

Subsystem: hfs

    Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>:
    Patch series "hfs: fix various errors", v2:
      hfs: add missing clean-up in hfs_fill_super
      hfs: fix high memory mapping in hfs_bnode_read
      hfs: add lock nesting notation to hfs_find_init

Subsystem: mm/hugetlb

    Joao Martins <joao.m.martins@oracle.com>:
      mm/hugetlb: fix refs calculation from unaligned @vaddr

 fs/hfs/bfind.c        | 14 +++++++++++++-
 fs/hfs/bnode.c        | 25 ++++++++++++++++++++-----
 fs/hfs/btree.h        |  7 +++++++
 fs/hfs/super.c        | 10 +++++-----
 include/linux/kasan.h |  1 +
 include/linux/rmap.h  |  4 +++-
 lib/test_hmm.c        |  2 --
 mm/hugetlb.c          |  5 +++--
 mm/kasan/kasan.h      | 12 ++++++++++++
 mm/page_alloc.c       | 30 ++++++++++++++++++++++--------
 mm/slab.h             | 15 +++++++++++----
 mm/slub.c             | 14 --------------
 12 files changed, 97 insertions(+), 42 deletions(-)
15 patches, based on 704f4cba43d4ed31ef4beb422313f1263d87bc55.

Subsystems affected by this patch series:

  mm/userfaultfd
  mm/kfence
  mm/highmem
  mm/pagealloc
  mm/memblock
  mm/pagecache
  mm/secretmem
  mm/pagemap
  mm/hugetlbfs

Subsystem: mm/userfaultfd

    Peter Collingbourne <pcc@google.com>:
    Patch series "userfaultfd: do not untag user pointers", v5:
      userfaultfd: do not untag user pointers
      selftest: use mmap instead of posix_memalign to allocate memory

Subsystem: mm/kfence

    Weizhao Ouyang <o451686892@gmail.com>:
      kfence: defer kfence_test_init to ensure that kunit debugfs is created

    Alexander Potapenko <glider@google.com>:
      kfence: move the size check to the beginning of __kfence_alloc()
      kfence: skip all GFP_ZONEMASK allocations

Subsystem: mm/highmem

    Christoph Hellwig <hch@lst.de>:
      mm: call flush_dcache_page() in memcpy_to_page() and memzero_page()
      mm: use kmap_local_page in memzero_page

Subsystem: mm/pagealloc

    Sergei Trofimovich <slyfox@gentoo.org>:
      mm: page_alloc: fix page_poison=1 / INIT_ON_ALLOC_DEFAULT_ON interaction

Subsystem: mm/memblock

    Mike Rapoport <rppt@linux.ibm.com>:
      memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions

Subsystem: mm/pagecache

    Roman Gushchin <guro@fb.com>:
      writeback, cgroup: remove wb from offline list before releasing refcnt
      writeback, cgroup: do not reparent dax inodes

Subsystem: mm/secretmem

    Mike Rapoport <rppt@linux.ibm.com>:
      mm/secretmem: wire up ->set_page_dirty

Subsystem: mm/pagemap

    Muchun Song <songmuchun@bytedance.com>:
      mm: mmap_lock: fix disabling preemption directly

    Qi Zheng <zhengqi.arch@bytedance.com>:
      mm: fix the deadlock in finish_fault()

Subsystem: mm/hugetlbfs

    Mike Kravetz <mike.kravetz@oracle.com>:
      hugetlbfs: fix mount mode command line processing

 Documentation/arm64/tagged-address-abi.rst | 26 ++++++++++++++++++--------
 fs/fs-writeback.c                          |  3 +++
 fs/hugetlbfs/inode.c                       |  2 +-
 fs/userfaultfd.c                           | 26 ++++++++++++--------------
 include/linux/highmem.h                    |  6 ++++--
 include/linux/memblock.h                   |  4 ++--
 mm/backing-dev.c                           |  2 +-
 mm/kfence/core.c                           | 19 ++++++++++++++++---
 mm/kfence/kfence_test.c                    |  2 +-
 mm/memblock.c                              |  3 ++-
 mm/memory.c                                | 11 ++++++++++-
 mm/mmap_lock.c                             |  4 ++--
 mm/page_alloc.c                            | 29 ++++++++++++++++-------------
 mm/secretmem.c                             |  1 +
 tools/testing/selftests/vm/userfaultfd.c   |  6 ++++--
 15 files changed, 93 insertions(+), 51 deletions(-)
7 patches, based on 7e96bf476270aecea66740a083e51b38c1371cd2.

Subsystems affected by this patch series:

  lib
  ocfs2
  mm/memcg
  mm/migration
  mm/slub
  mm/memcg

Subsystem: lib

    Matteo Croce <mcroce@microsoft.com>:
      lib/test_string.c: move string selftest in the Runtime Testing menu

Subsystem: ocfs2

    Junxiao Bi <junxiao.bi@oracle.com>:
      ocfs2: fix zero out valid data
      ocfs2: issue zeroout to EOF blocks

Subsystem: mm/memcg

    Johannes Weiner <hannes@cmpxchg.org>:
      mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code

Subsystem: mm/migration

    "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
      mm/migrate: fix NR_ISOLATED corruption on 64-bit

Subsystem: mm/slub

    Shakeel Butt <shakeelb@google.com>:
      slub: fix unreclaimable slab stat for bulk free

Subsystem: mm/memcg

    Wang Hai <wanghai38@huawei.com>:
      mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()

 fs/ocfs2/file.c   | 103 ++++++++++++++++++++++++++++++++----------------------
 lib/Kconfig       |   3 -
 lib/Kconfig.debug |   3 +
 mm/memcontrol.c   |   3 +
 mm/migrate.c      |   2 -
 mm/slab.h         |   2 -
 mm/slub.c         |  22 ++++++-----
 7 files changed, 81 insertions(+), 57 deletions(-)
7 patches, based on f8e6dfc64f6135d1b6c5215c14cd30b9b60a0008.

Subsystems affected by this patch series:

  mm/kasan
  mm/slub
  mm/madvise
  mm/memcg
  lib

Subsystem: mm/kasan

    Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>:
    Patch series "kasan, slub: reset tag when printing address", v3:
      kasan, kmemleak: reset tags when scanning block
      kasan, slub: reset tag when printing address

Subsystem: mm/slub

    Shakeel Butt <shakeelb@google.com>:
      slub: fix kmalloc_pagealloc_invalid_free unit test

    Vlastimil Babka <vbabka@suse.cz>:
      mm: slub: fix slub_debug disabling for list of slabs

Subsystem: mm/madvise

    David Hildenbrand <david@redhat.com>:
      mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE)

Subsystem: mm/memcg

    Waiman Long <longman@redhat.com>:
      mm/memcg: fix incorrect flushing of lruvec data in obj_stock

Subsystem: lib

    Liang Wang <wangliang101@huawei.com>:
      lib: use PFN_PHYS() in devmem_is_allowed()

 lib/devmem_is_allowed.c |  2 +-
 mm/gup.c                |  7 +++++--
 mm/kmemleak.c           |  6 +++---
 mm/madvise.c            |  4 +++-
 mm/memcontrol.c         |  6 ++++--
 mm/slub.c               | 25 ++++++++++++++-----------
 6 files changed, 30 insertions(+), 20 deletions(-)
10 patches, based on 614cb2751d3150850d459bee596c397f344a7936.

Subsystems affected by this patch series:

  mm/shmem
  mm/pagealloc
  mm/tracing
  MAINTAINERS
  mm/memcg
  mm/memory-failure
  mm/vmscan
  mm/kfence
  mm/hugetlb

Subsystem: mm/shmem

    Yang Shi <shy828301@gmail.com>:
      Revert "mm/shmem: fix shmem_swapin() race with swapoff"
      Revert "mm: swap: check if swap backing device is congested or not"

Subsystem: mm/pagealloc

    Doug Berger <opendmb@gmail.com>:
      mm/page_alloc: don't corrupt pcppage_migratetype

Subsystem: mm/tracing

    Mike Rapoport <rppt@linux.ibm.com>:
      mmflags.h: add missing __GFP_ZEROTAGS and __GFP_SKIP_KASAN_POISON names

Subsystem: MAINTAINERS

    Nathan Chancellor <nathan@kernel.org>:
      MAINTAINERS: update ClangBuiltLinux IRC chat

Subsystem: mm/memcg

    Johannes Weiner <hannes@cmpxchg.org>:
      mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim

Subsystem: mm/memory-failure

    Naoya Horiguchi <naoya.horiguchi@nec.com>:
      mm/hwpoison: retry with shake_page() for unhandlable pages

Subsystem: mm/vmscan

    Johannes Weiner <hannes@cmpxchg.org>:
      mm: vmscan: fix missing psi annotation for node_reclaim()

Subsystem: mm/kfence

    Marco Elver <elver@google.com>:
      kfence: fix is_kfence_address() for addresses below KFENCE_POOL_SIZE

Subsystem: mm/hugetlb

    Mike Kravetz <mike.kravetz@oracle.com>:
      hugetlb: don't pass page cache pages to restore_reserve_on_error

 MAINTAINERS                    |  2 +-
 include/linux/kfence.h         |  7 ++++---
 include/linux/memcontrol.h     | 29 +++++++++++++++--------------
 include/trace/events/mmflags.h |  4 +++-
 mm/hugetlb.c                   | 19 ++++++++++++++-----
 mm/memory-failure.c            | 12 +++++++++---
 mm/page_alloc.c                | 25 ++++++++++++-------------
 mm/shmem.c                     | 14 +-------------
 mm/swap_state.c                |  7 -------
 mm/vmscan.c                    | 30 ++++++++++++++++++++++--------
 10 files changed, 81 insertions(+), 68 deletions(-)
2 patches, based on 6e764bcd1cf72a2846c0e53d3975a09b242c04c9.

Subsystems affected by this patch series:

  mm/memory-hotplug
  MAINTAINERS

Subsystem: mm/memory-hotplug

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/memory_hotplug: fix potential permanent lru cache disable

Subsystem: MAINTAINERS

    Namjae Jeon <namjae.jeon@samsung.com>:
      MAINTAINERS: exfat: update my email address

 MAINTAINERS         | 2 +-
 mm/memory_hotplug.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)
212 patches, based on 4a3bb4200a5958d76cc26ebe4db4257efa56812b. Subsystems affected by this patch series: ia64 ocfs2 block mm/slub mm/debug mm/pagecache mm/gup mm/swap mm/shmem mm/memcg mm/selftests mm/pagemap mm/mremap mm/bootmem mm/sparsemem mm/vmalloc mm/kasan mm/pagealloc mm/memory-failure mm/hugetlb mm/userfaultfd mm/vmscan mm/compaction mm/mempolicy mm/memblock mm/oom-kill mm/migration mm/ksm mm/percpu mm/vmstat mm/madvise Subsystem: ia64 Jason Wang <wangborong@cdjrlc.com>: ia64: fix typo in a comment Geert Uytterhoeven <geert+renesas@glider.be>: Patch series "ia64: Miscellaneous fixes and cleanups": ia64: fix #endif comment for reserve_elfcorehdr() ia64: make reserve_elfcorehdr() static ia64: make num_rsvd_regions static Subsystem: ocfs2 Dan Carpenter <dan.carpenter@oracle.com>: ocfs2: remove an unnecessary condition Tuo Li <islituo@gmail.com>: ocfs2: quota_local: fix possible uninitialized-variable access in ocfs2_local_read_info() Gang He <ghe@suse.com>: ocfs2: ocfs2_downconvert_lock failure results in deadlock Subsystem: block kernel test robot <lkp@intel.com>: arch/csky/kernel/probes/kprobes.c: fix bugon.cocci warnings Subsystem: mm/slub Vlastimil Babka <vbabka@suse.cz>: Patch series "SLUB: reduce irq disabled scope and make it RT compatible", v4: mm, slub: don't call flush_all() from slab_debug_trace_open() mm, slub: allocate private object map for debugfs listings mm, slub: allocate private object map for validate_slab_cache() mm, slub: don't disable irq for debug_check_no_locks_freed() mm, slub: remove redundant unfreeze_partials() from put_cpu_partial() mm, slub: unify cmpxchg_double_slab() and __cmpxchg_double_slab() mm, slub: extract get_partial() from new_slab_objects() mm, slub: dissolve new_slab_objects() into ___slab_alloc() mm, slub: return slab page from get_partial() and set c->page afterwards mm, slub: restructure new page checks in ___slab_alloc() mm, slub: simplify kmem_cache_cpu and tid setup mm, slub: move disabling/enabling irqs to 
___slab_alloc() mm, slub: do initial checks in ___slab_alloc() with irqs enabled mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc() mm, slub: restore irqs around calling new_slab() mm, slub: validate slab from partial list or page allocator before making it cpu slab mm, slub: check new pages with restored irqs mm, slub: stop disabling irqs around get_partial() mm, slub: move reset of c->page and freelist out of deactivate_slab() mm, slub: make locking in deactivate_slab() irq-safe mm, slub: call deactivate_slab() without disabling irqs mm, slub: move irq control into unfreeze_partials() mm, slub: discard slabs in unfreeze_partials() without irqs disabled mm, slub: detach whole partial list at once in unfreeze_partials() mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing mm, slub: only disable irq with spin_lock in __unfreeze_partials() mm, slub: don't disable irqs in slub_cpu_dead() mm, slab: make flush_slab() possible to call with irqs enabled Sebastian Andrzej Siewior <bigeasy@linutronix.de>: mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context mm: slub: make object_map_lock a raw_spinlock_t Vlastimil Babka <vbabka@suse.cz>: mm, slub: optionally save/restore irqs in slab_[un]lock()/ mm, slub: make slab_lock() disable irqs with PREEMPT_RT mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg mm, slub: use migrate_disable() on PREEMPT_RT mm, slub: convert kmem_cpu_slab protection to local_lock Subsystem: mm/debug Gavin Shan <gshan@redhat.com>: Patch series "mm/debug_vm_pgtable: Enhancements", v6: mm/debug_vm_pgtable: introduce struct pgtable_debug_args mm/debug_vm_pgtable: use struct pgtable_debug_args in basic tests mm/debug_vm_pgtable: use struct pgtable_debug_args in leaf and savewrite tests mm/debug_vm_pgtable: use struct pgtable_debug_args in protnone and devmap tests mm/debug_vm_pgtable: use struct pgtable_debug_args in soft_dirty and swap tests 
mm/debug_vm_pgtable: use struct pgtable_debug_args in migration and thp tests mm/debug_vm_pgtable: use struct pgtable_debug_args in PTE modifying tests mm/debug_vm_pgtable: use struct pgtable_debug_args in PMD modifying tests mm/debug_vm_pgtable: use struct pgtable_debug_args in PUD modifying tests mm/debug_vm_pgtable: use struct pgtable_debug_args in PGD and P4D modifying tests mm/debug_vm_pgtable: remove unused code mm/debug_vm_pgtable: fix corrupted page flag "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm: report a more useful address for reclaim acquisition liuhailong <liuhailong@oppo.com>: mm: add kernel_misc_reclaimable in show_free_areas Subsystem: mm/pagecache Jan Kara <jack@suse.cz>: Patch series "writeback: Fix bandwidth estimates", v4: writeback: track number of inodes under writeback writeback: reliably update bandwidth estimation writeback: fix bandwidth estimate for spiky workload writeback: rename domain_update_bandwidth() writeback: use READ_ONCE for unlocked reads of writeback stats Johannes Weiner <hannes@cmpxchg.org>: mm: remove irqsave/restore locking from contexts with irqs enabled fs: drop_caches: fix skipping over shadow cache inodes fs: inode: count invalidated shadow pages in pginodesteal Shakeel Butt <shakeelb@google.com>: writeback: memcg: simplify cgroup_writeback_by_id Jing Yangyang <jing.yangyang@zte.com.cn>: include/linux/buffer_head.h: fix boolreturn.cocci warnings Subsystem: mm/gup Miaohe Lin <linmiaohe@huawei.com>: Patch series "Cleanups and fixup for gup": mm: gup: remove set but unused local variable major mm: gup: remove unneed local variable orig_refs mm: gup: remove useless BUG_ON in __get_user_pages() mm: gup: fix potential pgmap refcnt leak in __gup_device_huge() mm: gup: use helper PAGE_ALIGNED in populate_vma_page_range() John Hubbard <jhubbard@nvidia.com>: Patch series "A few gup refactorings and documentation updates", v3: mm/gup: documentation corrections for gup/pup mm/gup: small refactoring: simplify 
try_grab_page() mm/gup: remove try_get_page(), call try_get_compound_head() directly Subsystem: mm/swap Hugh Dickins <hughd@google.com>: fs, mm: fix race in unlinking swapfile John Hubbard <jhubbard@nvidia.com>: mm: delete unused get_kernel_page() Subsystem: mm/shmem Sebastian Andrzej Siewior <bigeasy@linutronix.de>: shmem: use raw_spinlock_t for ->stat_lock Miaohe Lin <linmiaohe@huawei.com>: Patch series "Cleanups for shmem": shmem: remove unneeded variable ret shmem: remove unneeded header file shmem: remove unneeded function forward declaration shmem: include header file to declare swap_info Hugh Dickins <hughd@google.com>: Patch series "huge tmpfs: shmem_is_huge() fixes and cleanups": huge tmpfs: fix fallocate(vanilla) advance over huge pages huge tmpfs: fix split_huge_page() after FALLOC_FL_KEEP_SIZE huge tmpfs: remove shrinklist addition from shmem_setattr() huge tmpfs: revert shmem's use of transhuge_vma_enabled() huge tmpfs: move shmem_huge_enabled() upwards huge tmpfs: SGP_NOALLOC to stop collapse_file() on race huge tmpfs: shmem_is_huge(vma, inode, index) huge tmpfs: decide stat.st_blksize by shmem_is_huge() shmem: shmem_writepage() split unlikely i915 THP Subsystem: mm/memcg Suren Baghdasaryan <surenb@google.com>: mm, memcg: add mem_cgroup_disabled checks in vmpressure and swap-related functions mm, memcg: inline mem_cgroup_{charge/uncharge} to improve disabled memcg config mm, memcg: inline swap-related functions to improve disabled memcg config Vasily Averin <vvs@virtuozzo.com>: memcg: enable accounting for pids in nested pid namespaces Shakeel Butt <shakeelb@google.com>: memcg: switch lruvec stats to rstat memcg: infrastructure to flush memcg stats Yutian Yang <nglaive@gmail.com>: memcg: charge fs_context and legacy_fs_context Vasily Averin <vvs@virtuozzo.com>: Patch series "memcg accounting from OpenVZ", v7: memcg: enable accounting for mnt_cache entries memcg: enable accounting for pollfd and select bits arrays memcg: enable accounting for file lock 
caches memcg: enable accounting for fasync_cache memcg: enable accounting for new namesapces and struct nsproxy memcg: enable accounting of ipc resources memcg: enable accounting for signals memcg: enable accounting for posix_timers_cache slab memcg: enable accounting for ldt_struct objects Shakeel Butt <shakeelb@google.com>: memcg: cleanup racy sum avoidance code Vasily Averin <vvs@virtuozzo.com>: memcg: replace in_interrupt() by !in_task() in active_memcg() Baolin Wang <baolin.wang@linux.alibaba.com>: mm: memcontrol: set the correct memcg swappiness restriction Miaohe Lin <linmiaohe@huawei.com>: mm, memcg: remove unused functions mm, memcg: save some atomic ops when flush is already true Michal Hocko <mhocko@suse.com>: memcg: fix up drain_local_stock comment Shakeel Butt <shakeelb@google.com>: memcg: make memcg->event_list_lock irqsafe Subsystem: mm/selftests Po-Hsu Lin <po-hsu.lin@canonical.com>: selftests/vm: use kselftest skip code for skipped tests Colin Ian King <colin.king@canonical.com>: selftests: Fix spelling mistake "cann't" -> "cannot" Subsystem: mm/pagemap Nicholas Piggin <npiggin@gmail.com>: Patch series "shoot lazy tlbs", v4: lazy tlb: introduce lazy mm refcount helper functions lazy tlb: allow lazy tlb mm refcounting to be configurable lazy tlb: shoot lazies, a non-refcounting lazy tlb option powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN Christoph Hellwig <hch@lst.de>: Patch series "_kernel_dcache_page fixes and removal": mmc: JZ4740: remove the flush_kernel_dcache_page call in jz4740_mmc_read_data mmc: mmc_spi: replace flush_kernel_dcache_page with flush_dcache_page scatterlist: replace flush_kernel_dcache_page with flush_dcache_page mm: remove flush_kernel_dcache_page Huang Ying <ying.huang@intel.com>: mm,do_huge_pmd_numa_page: remove unnecessary TLB flushing code Greg Kroah-Hartman <gregkh@linuxfoundation.org>: mm: change fault_in_pages_* to have an unsigned size parameter Luigi Rizzo <lrizzo@google.com>: mm/pagemap: add mmap_assert_locked() 
annotations to find_vma*() "Liam R. Howlett" <Liam.Howlett@Oracle.com>: remap_file_pages: Use vma_lookup() instead of find_vma() Subsystem: mm/mremap Chen Wandun <chenwandun@huawei.com>: mm/mremap: fix memory account on do_munmap() failure Subsystem: mm/bootmem Muchun Song <songmuchun@bytedance.com>: mm/bootmem_info.c: mark __init on register_page_bootmem_info_section Subsystem: mm/sparsemem Ohhoon Kwon <ohoono.kwon@samsung.com>: Patch series "mm: sparse: remove __section_nr() function", v4: mm: sparse: pass section_nr to section_mark_present mm: sparse: pass section_nr to find_memory_block mm: sparse: remove __section_nr() function Naoya Horiguchi <naoya.horiguchi@nec.com>: mm/sparse: set SECTION_NID_SHIFT to 6 Matthew Wilcox <willy@infradead.org>: include/linux/mmzone.h: avoid a warning in sparse memory support Miles Chen <miles.chen@mediatek.com>: mm/sparse: clarify pgdat_to_phys Subsystem: mm/vmalloc "Uladzislau Rezki (Sony)" <urezki@gmail.com>: mm/vmalloc: use batched page requests in bulk-allocator mm/vmalloc: remove gfpflags_allow_blocking() check lib/test_vmalloc.c: add a new 'nr_pages' parameter Chen Wandun <chenwandun@huawei.com>: mm/vmalloc: fix wrong behavior in vread Subsystem: mm/kasan Woody Lin <woodylin@google.com>: mm/kasan: move kasan.fault to mm/kasan/report.c Andrey Konovalov <andreyknvl@gmail.com>: Patch series "kasan: test: avoid crashing the kernel with HW_TAGS", v2: kasan: test: rework kmalloc_oob_right kasan: test: avoid writing invalid memory kasan: test: avoid corrupting memory via memset kasan: test: disable kmalloc_memmove_invalid_size for HW_TAGS kasan: test: only do kmalloc_uaf_memset for generic mode kasan: test: clean up ksize_uaf kasan: test: avoid corrupting memory in copy_user_test kasan: test: avoid corrupting memory in kasan_rcu_uaf Subsystem: mm/pagealloc Mike Rapoport <rppt@linux.ibm.com>: Patch series "mm: ensure consistency of memory map poisoning": mm/page_alloc: always initialize memory map for the holes microblaze: 
simplify pte_alloc_one_kernel() mm: introduce memmap_alloc() to unify memory map allocation memblock: stop poisoning raw allocations Nico Pache <npache@redhat.com>: mm/page_alloc.c: fix 'zone_id' may be used uninitialized in this function warning Mike Rapoport <rppt@linux.ibm.com>: mm/page_alloc: make alloc_node_mem_map() __init rather than __ref Vasily Averin <vvs@virtuozzo.com>: mm/page_alloc.c: use in_task() "George G. Davis" <davis.george@siemens.com>: mm/page_isolation: tracing: trace all test_pages_isolated failures Subsystem: mm/memory-failure Miaohe Lin <linmiaohe@huawei.com>: Patch series "Cleanups and fixup for hwpoison": mm/hwpoison: remove unneeded variable unmap_success mm/hwpoison: fix potential pte_unmap_unlock pte error mm/hwpoison: change argument struct page **hpagep to *hpage mm/hwpoison: fix some obsolete comments Yang Shi <shy828301@gmail.com>: mm: hwpoison: don't drop slab caches for offlining non-LRU page doc: hwpoison: correct the support for hugepage mm: hwpoison: dump page for unhandlable page Michael Wang <yun.wang@linux.alibaba.com>: mm: fix panic caused by __page_handle_poison() Subsystem: mm/hugetlb Mike Kravetz <mike.kravetz@oracle.com>: hugetlb: simplify prep_compound_gigantic_page ref count racing code hugetlb: drop ref count earlier after page allocation hugetlb: before freeing hugetlb page set dtor to appropriate value hugetlb: fix hugetlb cgroup refcounting during vma split Subsystem: mm/userfaultfd Nadav Amit <namit@vmware.com>: Patch series "userfaultfd: minor bug fixes": userfaultfd: change mmap_changing to atomic userfaultfd: prevent concurrent API initialization selftests/vm/userfaultfd: wake after copy failure Subsystem: mm/vmscan Dave Hansen <dave.hansen@linux.intel.com>: Patch series "Migrate Pages in lieu of discard", v11: mm/numa: automatically generate node migration order mm/migrate: update node demotion order on hotplug events Yang Shi <yang.shi@linux.alibaba.com>: mm/migrate: enable returning precise migrate_pages() 
success count Dave Hansen <dave.hansen@linux.intel.com>: mm/migrate: demote pages during reclaim Yang Shi <yang.shi@linux.alibaba.com>: mm/vmscan: add page demotion counter Dave Hansen <dave.hansen@linux.intel.com>: mm/vmscan: add helper for querying ability to age anonymous pages Keith Busch <kbusch@kernel.org>: mm/vmscan: Consider anonymous pages without swap Dave Hansen <dave.hansen@linux.intel.com>: mm/vmscan: never demote for memcg reclaim Huang Ying <ying.huang@intel.com>: mm/migrate: add sysfs interface to enable reclaim migration Hui Su <suhui@zeku.com>: mm/vmpressure: replace vmpressure_to_css() with vmpressure_to_memcg() Miaohe Lin <linmiaohe@huawei.com>: Patch series "Cleanups for vmscan", v2: mm/vmscan: remove the PageDirty check after MADV_FREE pages are page_ref_freezed mm/vmscan: remove misleading setting to sc->priority mm/vmscan: remove unneeded return value of kswapd_run() mm/vmscan: add 'else' to remove check_pending label Vlastimil Babka <vbabka@suse.cz>: mm, vmscan: guarantee drop_slab_node() termination Subsystem: mm/compaction Charan Teja Reddy <charante@codeaurora.org>: mm: compaction: optimize proactive compaction deferrals mm: compaction: support triggering of proactive compaction by user Subsystem: mm/mempolicy Baolin Wang <baolin.wang@linux.alibaba.com>: mm/mempolicy: use readable NUMA_NO_NODE macro instead of magic number Dave Hansen <dave.hansen@linux.intel.com>: Patch series "Introduce multi-preference mempolicy", v7: mm/mempolicy: add MPOL_PREFERRED_MANY for multiple preferred nodes Feng Tang <feng.tang@intel.com>: mm/memplicy: add page allocation function for MPOL_PREFERRED_MANY policy Ben Widawsky <ben.widawsky@intel.com>: mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY mm/mempolicy: advertise new MPOL_PREFERRED_MANY Feng Tang <feng.tang@intel.com>: mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies Vasily Averin <vvs@virtuozzo.com>: mm/mempolicy.c: use in_task() in mempolicy_slab_node() 
Subsystem: mm/memblock Mike Rapoport <rppt@linux.ibm.com>: memblock: make memblock_find_in_range method private Subsystem: mm/oom-kill Suren Baghdasaryan <surenb@google.com>: mm: introduce process_mrelease system call mm: wire up syscall process_mrelease Subsystem: mm/migration Randy Dunlap <rdunlap@infradead.org>: mm/migrate: correct kernel-doc notation Subsystem: mm/ksm Zhansaya Bagdauletkyzy <zhansayabagdaulet@gmail.com>: Patch series "add KSM selftests": selftests: vm: add KSM merge test selftests: vm: add KSM unmerge test selftests: vm: add KSM zero page merging test selftests: vm: add KSM merging across nodes test mm: KSM: fix data type Patch series "add KSM performance tests", v3: selftests: vm: add KSM merging time test selftests: vm: add COW time test for KSM pages Subsystem: mm/percpu Jing Xiangfeng <jingxiangfeng@huawei.com>: mm/percpu,c: remove obsolete comments of pcpu_chunk_populated() Subsystem: mm/vmstat Miaohe Lin <linmiaohe@huawei.com>: Patch series "Cleanup for vmstat": mm/vmstat: correct some wrong comments mm/vmstat: simplify the array size calculation mm/vmstat: remove unneeded return value Subsystem: mm/madvise zhangkui <zhangkui@oppo.com>: mm/madvise: add MADV_WILLNEED to process_madvise() Documentation/ABI/testing/sysfs-kernel-mm-numa | 24 Documentation/admin-guide/mm/numa_memory_policy.rst | 15 Documentation/admin-guide/sysctl/vm.rst | 3 Documentation/core-api/cachetlb.rst | 86 - Documentation/dev-tools/kasan.rst | 13 Documentation/translations/zh_CN/core-api/cachetlb.rst | 9 Documentation/vm/hwpoison.rst | 1 arch/Kconfig | 28 arch/alpha/kernel/syscalls/syscall.tbl | 2 arch/arm/include/asm/cacheflush.h | 4 arch/arm/kernel/setup.c | 20 arch/arm/mach-rpc/ecard.c | 2 arch/arm/mm/flush.c | 33 arch/arm/mm/nommu.c | 6 arch/arm/tools/syscall.tbl | 2 arch/arm64/include/asm/unistd.h | 2 arch/arm64/include/asm/unistd32.h | 2 arch/arm64/kvm/hyp/reserved_mem.c | 9 arch/arm64/mm/init.c | 38 arch/csky/abiv1/cacheflush.c | 11 
arch/csky/abiv1/inc/abi/cacheflush.h | 4 arch/csky/kernel/probes/kprobes.c | 3 arch/ia64/include/asm/meminit.h | 2 arch/ia64/kernel/acpi.c | 2 arch/ia64/kernel/setup.c | 55 arch/ia64/kernel/syscalls/syscall.tbl | 2 arch/m68k/kernel/syscalls/syscall.tbl | 2 arch/microblaze/include/asm/page.h | 3 arch/microblaze/include/asm/pgtable.h | 2 arch/microblaze/kernel/syscalls/syscall.tbl | 2 arch/microblaze/mm/init.c | 12 arch/microblaze/mm/pgtable.c | 17 arch/mips/include/asm/cacheflush.h | 8 arch/mips/kernel/setup.c | 14 arch/mips/kernel/syscalls/syscall_n32.tbl | 2 arch/mips/kernel/syscalls/syscall_n64.tbl | 2 arch/mips/kernel/syscalls/syscall_o32.tbl | 2 arch/nds32/include/asm/cacheflush.h | 3 arch/nds32/mm/cacheflush.c | 9 arch/parisc/include/asm/cacheflush.h | 8 arch/parisc/kernel/cache.c | 3 arch/parisc/kernel/syscalls/syscall.tbl | 2 arch/powerpc/Kconfig | 1 arch/powerpc/kernel/smp.c | 2 arch/powerpc/kernel/syscalls/syscall.tbl | 2 arch/powerpc/mm/book3s64/radix_tlb.c | 4 arch/powerpc/platforms/pseries/hotplug-memory.c | 4 arch/riscv/mm/init.c | 44 arch/s390/kernel/setup.c | 9 arch/s390/kernel/syscalls/syscall.tbl | 2 arch/s390/mm/fault.c | 2 arch/sh/include/asm/cacheflush.h | 8 arch/sh/kernel/syscalls/syscall.tbl | 2 arch/sparc/kernel/syscalls/syscall.tbl | 2 arch/x86/entry/syscalls/syscall_32.tbl | 1 arch/x86/entry/syscalls/syscall_64.tbl | 1 arch/x86/kernel/aperture_64.c | 5 arch/x86/kernel/ldt.c | 6 arch/x86/mm/init.c | 23 arch/x86/mm/numa.c | 5 arch/x86/mm/numa_emulation.c | 5 arch/x86/realmode/init.c | 2 arch/xtensa/kernel/syscalls/syscall.tbl | 2 block/blk-map.c | 2 drivers/acpi/tables.c | 5 drivers/base/arch_numa.c | 5 drivers/base/memory.c | 4 drivers/mmc/host/jz4740_mmc.c | 4 drivers/mmc/host/mmc_spi.c | 2 drivers/of/of_reserved_mem.c | 12 fs/drop_caches.c | 3 fs/exec.c | 12 fs/fcntl.c | 3 fs/fs-writeback.c | 28 fs/fs_context.c | 4 fs/inode.c | 2 fs/locks.c | 6 fs/namei.c | 8 fs/namespace.c | 7 fs/ocfs2/dlmglue.c | 14 fs/ocfs2/quota_global.c | 1 
fs/ocfs2/quota_local.c | 2 fs/pipe.c | 2 fs/select.c | 4 fs/userfaultfd.c | 116 - include/linux/backing-dev-defs.h | 2 include/linux/backing-dev.h | 19 include/linux/buffer_head.h | 2 include/linux/compaction.h | 2 include/linux/highmem.h | 5 include/linux/hugetlb_cgroup.h | 12 include/linux/memblock.h | 2 include/linux/memcontrol.h | 118 + include/linux/memory.h | 2 include/linux/mempolicy.h | 16 include/linux/migrate.h | 14 include/linux/mm.h | 17 include/linux/mmzone.h | 4 include/linux/page-flags.h | 9 include/linux/pagemap.h | 4 include/linux/sched/mm.h | 35 include/linux/shmem_fs.h | 25 include/linux/slub_def.h | 6 include/linux/swap.h | 28 include/linux/syscalls.h | 1 include/linux/userfaultfd_k.h | 8 include/linux/vm_event_item.h | 2 include/linux/vmpressure.h | 2 include/linux/writeback.h | 4 include/trace/events/migrate.h | 3 include/uapi/asm-generic/unistd.h | 4 include/uapi/linux/mempolicy.h | 1 ipc/msg.c | 2 ipc/namespace.c | 2 ipc/sem.c | 9 ipc/shm.c | 2 kernel/cgroup/namespace.c | 2 kernel/cpu.c | 2 kernel/exit.c | 2 kernel/fork.c | 51 kernel/kthread.c | 21 kernel/nsproxy.c | 2 kernel/pid_namespace.c | 5 kernel/sched/core.c | 37 kernel/sched/sched.h | 4 kernel/signal.c | 2 kernel/sys_ni.c | 1 kernel/sysctl.c | 2 kernel/time/namespace.c | 4 kernel/time/posix-timers.c | 4 kernel/user_namespace.c | 2 lib/scatterlist.c | 5 lib/test_kasan.c | 80 - lib/test_kasan_module.c | 20 lib/test_vmalloc.c | 5 mm/backing-dev.c | 11 mm/bootmem_info.c | 4 mm/compaction.c | 69 - mm/debug_vm_pgtable.c | 982 +++++++++------ mm/filemap.c | 15 mm/gup.c | 109 - mm/huge_memory.c | 32 mm/hugetlb.c | 173 ++ mm/hwpoison-inject.c | 2 mm/internal.h | 9 mm/kasan/hw_tags.c | 43 mm/kasan/kasan.h | 1 mm/kasan/report.c | 29 mm/khugepaged.c | 2 mm/ksm.c | 8 mm/madvise.c | 1 mm/memblock.c | 22 mm/memcontrol.c | 234 +-- mm/memory-failure.c | 53 mm/memory_hotplug.c | 2 mm/mempolicy.c | 207 ++- mm/migrate.c | 319 ++++ mm/mmap.c | 7 mm/mremap.c | 2 mm/oom_kill.c | 70 + mm/page-writeback.c | 
133 +- mm/page_alloc.c | 62 mm/page_isolation.c | 13 mm/percpu.c | 3 mm/shmem.c | 309 ++-- mm/slab_common.c | 2 mm/slub.c | 1085 ++++++++++------- mm/sparse.c | 46 mm/swap.c | 22 mm/swapfile.c | 14 mm/truncate.c | 28 mm/userfaultfd.c | 15 mm/vmalloc.c | 79 - mm/vmpressure.c | 10 mm/vmscan.c | 220 ++- mm/vmstat.c | 25 security/tomoyo/domain.c | 13 tools/testing/scatterlist/linux/mm.h | 1 tools/testing/selftests/vm/.gitignore | 1 tools/testing/selftests/vm/Makefile | 3 tools/testing/selftests/vm/charge_reserved_hugetlb.sh | 5 tools/testing/selftests/vm/hugetlb_reparenting_test.sh | 5 tools/testing/selftests/vm/ksm_tests.c | 696 ++++++++++ tools/testing/selftests/vm/mlock-random-test.c | 2 tools/testing/selftests/vm/run_vmtests.sh | 98 + tools/testing/selftests/vm/userfaultfd.c | 13 186 files changed, 4488 insertions(+), 2281 deletions(-)
On Thu, 2 Sep 2021 14:48:20 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> 212 patches, based on 4a3bb4200a5958d76cc26ebe4db4257efa56812b.
Make that "based on 7d2a07b769330c34b4deabeed939325c77a7ec2f".
147 patches, based on 7d2a07b769330c34b4deabeed939325c77a7ec2f. Subsystems affected by this patch series: mm/slub mm/memory-hotplug mm/rmap mm/ioremap mm/highmem mm/cleanups mm/secretmem mm/kfence mm/damon alpha percpu procfs misc core-kernel MAINTAINERS lib bitops checkpatch epoll init nilfs2 coredump fork pids criu kconfig selftests ipc mm/vmscan scripts Subsystem: mm/slub Vlastimil Babka <vbabka@suse.cz>: Patch series "SLUB: reduce irq disabled scope and make it RT compatible", v6: mm, slub: don't call flush_all() from slab_debug_trace_open() mm, slub: allocate private object map for debugfs listings mm, slub: allocate private object map for validate_slab_cache() mm, slub: don't disable irq for debug_check_no_locks_freed() mm, slub: remove redundant unfreeze_partials() from put_cpu_partial() mm, slub: extract get_partial() from new_slab_objects() mm, slub: dissolve new_slab_objects() into ___slab_alloc() mm, slub: return slab page from get_partial() and set c->page afterwards mm, slub: restructure new page checks in ___slab_alloc() mm, slub: simplify kmem_cache_cpu and tid setup mm, slub: move disabling/enabling irqs to ___slab_alloc() mm, slub: do initial checks in ___slab_alloc() with irqs enabled mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc() mm, slub: restore irqs around calling new_slab() mm, slub: validate slab from partial list or page allocator before making it cpu slab mm, slub: check new pages with restored irqs mm, slub: stop disabling irqs around get_partial() mm, slub: move reset of c->page and freelist out of deactivate_slab() mm, slub: make locking in deactivate_slab() irq-safe mm, slub: call deactivate_slab() without disabling irqs mm, slub: move irq control into unfreeze_partials() mm, slub: discard slabs in unfreeze_partials() without irqs disabled mm, slub: detach whole partial list at once in unfreeze_partials() mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing mm, slub: only disable 
irq with spin_lock in __unfreeze_partials() mm, slub: don't disable irqs in slub_cpu_dead() mm, slab: split out the cpu offline variant of flush_slab() Sebastian Andrzej Siewior <bigeasy@linutronix.de>: mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context mm: slub: make object_map_lock a raw_spinlock_t Vlastimil Babka <vbabka@suse.cz>: mm, slub: make slab_lock() disable irqs with PREEMPT_RT mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg mm, slub: use migrate_disable() on PREEMPT_RT mm, slub: convert kmem_cpu_slab protection to local_lock Subsystem: mm/memory-hotplug David Hildenbrand <david@redhat.com>: Patch series "memory-hotplug.rst: complete admin-guide overhaul", v3: memory-hotplug.rst: remove locking details from admin-guide memory-hotplug.rst: complete admin-guide overhaul Mike Rapoport <rppt@linux.ibm.com>: Patch series "mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE": mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE mm: memory_hotplug: cleanup after removal of pfn_valid_within() David Hildenbrand <david@redhat.com>: Patch series "mm/memory_hotplug: preparatory patches for new online policy and memory": mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range() mm/memory_hotplug: remove nid parameter from arch_remove_memory() mm/memory_hotplug: remove nid parameter from remove_memory() and friends ACPI: memhotplug: memory resources cannot be enabled yet Patch series "mm/memory_hotplug: "auto-movable" online policy and memory groups", v3: mm: track present early pages per zone mm/memory_hotplug: introduce "auto-movable" online policy drivers/base/memory: introduce "memory groups" to logically group memory blocks mm/memory_hotplug: track present pages in memory groups ACPI: memhotplug: use a single static memory group for a single memory device dax/kmem: use a single static memory group for a single probed unit virtio-mem: use a single dynamic memory group for a single 
virtio-mem device mm/memory_hotplug: memory group aware "auto-movable" online policy mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy Miaohe Lin <linmiaohe@huawei.com>: Patch series "Cleanup and fixups for memory hotplug": mm/memory_hotplug: use helper zone_is_zone_device() to simplify the code Subsystem: mm/rmap Muchun Song <songmuchun@bytedance.com>: mm: remove redundant compound_head() calling Subsystem: mm/ioremap Christoph Hellwig <hch@lst.de>: riscv: only select GENERIC_IOREMAP if MMU support is enabled Patch series "small ioremap cleanups": mm: move ioremap_page_range to vmalloc.c mm: don't allow executable ioremap mappings Weizhao Ouyang <o451686892@gmail.com>: mm/early_ioremap.c: remove redundant early_ioremap_shutdown() Subsystem: mm/highmem Sebastian Andrzej Siewior <bigeasy@linutronix.de>: highmem: don't disable preemption on RT in kmap_atomic() Subsystem: mm/cleanups Changbin Du <changbin.du@gmail.com>: mm: in_irq() cleanup Muchun Song <songmuchun@bytedance.com>: mm: introduce PAGEFLAGS_MASK to replace ((1UL << NR_PAGEFLAGS) - 1) Subsystem: mm/secretmem Jordy Zomer <jordy@jordyzomer.github.io>: mm/secretmem: use refcount_t instead of atomic_t Subsystem: mm/kfence Marco Elver <elver@google.com>: kfence: show cpu and timestamp in alloc/free info kfence: test: fail fast if disabled at boot Subsystem: mm/damon SeongJae Park <sjpark@amazon.de>: Patch series "Introduce Data Access MONitor (DAMON)", v34: mm: introduce Data Access MONitor (DAMON) mm/damon/core: implement region-based sampling mm/damon: adaptively adjust regions mm/idle_page_tracking: make PG_idle reusable mm/damon: implement primitives for the virtual memory address spaces mm/damon: add a tracepoint mm/damon: implement a debugfs-based user space interface mm/damon/dbgfs: export kdamond pid to the user space mm/damon/dbgfs: support multiple contexts Documentation: add documents for DAMON mm/damon: add kunit tests mm/damon: add user space selftests MAINTAINERS: 
update for DAMON Subsystem: alpha Randy Dunlap <rdunlap@infradead.org>: alpha: agp: make empty macros use do-while-0 style alpha: pci-sysfs: fix all kernel-doc warnings Subsystem: percpu Greg Kroah-Hartman <gregkh@linuxfoundation.org>: percpu: remove export of pcpu_base_addr Subsystem: procfs Feng Zhou <zhoufeng.zf@bytedance.com>: fs/proc/kcore.c: add mmap interface Christoph Hellwig <hch@lst.de>: proc: stop using seq_get_buf in proc_task_name Ohhoon Kwon <ohoono.kwon@samsung.com>: connector: send event on write to /proc/[pid]/comm Subsystem: misc Colin Ian King <colin.king@canonical.com>: arch: Kconfig: fix spelling mistake "seperate" -> "separate" Andy Shevchenko <andriy.shevchenko@linux.intel.com>: include/linux/once.h: fix trivia typo Not -> Note Daniel Lezcano <daniel.lezcano@linaro.org>: Patch series "Add Hz macros", v3: units: change from 'L' to 'UL' units: add the HZ macros thermal/drivers/devfreq_cooling: use HZ macros devfreq: use HZ macros iio/drivers/as73211: use HZ macros hwmon/drivers/mr75203: use HZ macros iio/drivers/hid-sensor: use HZ macros i2c/drivers/ov02q10: use HZ macros mtd/drivers/nand: use HZ macros phy/drivers/stm32: use HZ macros Subsystem: core-kernel Yang Yang <yang.yang29@zte.com.cn>: kernel/acct.c: use dedicated helper to access rlimit values Pavel Skripkin <paskripkin@gmail.com>: profiling: fix shift-out-of-bounds bugs Subsystem: MAINTAINERS Nathan Chancellor <nathan@kernel.org>: MAINTAINERS: update ClangBuiltLinux mailing list Documentation/llvm: update mailing list Documentation/llvm: update IRC location Subsystem: lib Geert Uytterhoeven <geert@linux-m68k.org>: Patch series "math: RATIONAL and RATIONAL_KUNIT_TEST improvements": math: make RATIONAL tristate math: RATIONAL_KUNIT_TEST should depend on RATIONAL instead of selecting it Matteo Croce <mcroce@microsoft.com>: Patch series "lib/string: optimized mem* functions", v2: lib/string: optimized memcpy lib/string: optimized memmove lib/string: optimized memset Daniel Latypov 
<dlatypov@google.com>: lib/test: convert test_sort.c to use KUnit Randy Dunlap <rdunlap@infradead.org>: lib/dump_stack: correct kernel-doc notation lib/iov_iter.c: fix kernel-doc warnings Subsystem: bitops Yury Norov <yury.norov@gmail.com>: Patch series "Resend bitmap patches": bitops: protect find_first_{,zero}_bit properly bitops: move find_bit_*_le functions from le.h to find.h include: move find.h from asm_generic to linux arch: remove GENERIC_FIND_FIRST_BIT entirely lib: add find_first_and_bit() cpumask: use find_first_and_bit() all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate tools: sync tools/bitmap with mother linux cpumask: replace cpumask_next_* with cpumask_first_* where appropriate include/linux: move for_each_bit() macros from bitops.h to find.h find: micro-optimize for_each_{set,clear}_bit() bitops: replace for_each_*_bit_from() with for_each_*_bit() where appropriate Andy Shevchenko <andriy.shevchenko@linux.intel.com>: tools: rename bitmap_alloc() to bitmap_zalloc() Yury Norov <yury.norov@gmail.com>: mm/percpu: micro-optimize pcpu_is_populated() bitmap: unify find_bit operations lib: bitmap: add performance test for bitmap_print_to_pagebuf vsprintf: rework bitmap_list_string Subsystem: checkpatch Joe Perches <joe@perches.com>: checkpatch: support wide strings Mimi Zohar <zohar@linux.ibm.com>: checkpatch: make email address check case insensitive Joe Perches <joe@perches.com>: checkpatch: improve GIT_COMMIT_ID test Subsystem: epoll Nicholas Piggin <npiggin@gmail.com>: fs/epoll: use a per-cpu counter for user's watches count Subsystem: init Rasmus Villemoes <linux@rasmusvillemoes.dk>: init: move usermodehelper_enable() to populate_rootfs() Kefeng Wang <wangkefeng.wang@huawei.com>: trap: cleanup trap_init() Subsystem: nilfs2 Nanyong Sun <sunnanyong@huawei.com>: Patch series "nilfs2: fix incorrect usage of kobject": nilfs2: fix memory leak in nilfs_sysfs_create_device_group nilfs2: fix NULL pointer in 
nilfs_##name##_attr_release nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group Zhen Lei <thunder.leizhen@huawei.com>: nilfs2: use refcount_dec_and_lock() to fix potential UAF Subsystem: coredump David Oberhollenzer <david.oberhollenzer@sigma-star.at>: fs/coredump.c: log if a core dump is aborted due to changed file permissions QiuXi <qiuxi1@huawei.com>: coredump: fix memleak in dump_vma_snapshot() Subsystem: fork Christoph Hellwig <hch@lst.de>: kernel/fork.c: unexport get_{mm,task}_exe_file Subsystem: pids Takahiro Itazuri <itazur@amazon.com>: pid: cleanup the stale comment mentioning pidmap_init(). Subsystem: criu Cyrill Gorcunov <gorcunov@gmail.com>: prctl: allow to setup brk for et_dyn executables Subsystem: kconfig Zenghui Yu <yuzenghui@huawei.com>: configs: remove the obsolete CONFIG_INPUT_POLLDEV Lukas Bulwahn <lukas.bulwahn@gmail.com>: Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH Subsystem: selftests Greg Thelen <gthelen@google.com>: selftests/memfd: remove unused variable Subsystem: ipc Rafael Aquini <aquini@redhat.com>: ipc: replace costly bailout check in sysvipc_find_ipc() Subsystem: mm/vmscan Randy Dunlap <rdunlap@infradead.org>: mm/workingset: correct kernel-doc notations Subsystem: scripts Randy Dunlap <rdunlap@infradead.org>: scripts: check_extable: fix typo in user error message a/Documentation/admin-guide/mm/damon/index.rst | 15 a/Documentation/admin-guide/mm/damon/start.rst | 114 + a/Documentation/admin-guide/mm/damon/usage.rst | 112 + a/Documentation/admin-guide/mm/index.rst | 1 a/Documentation/admin-guide/mm/memory-hotplug.rst | 842 ++++++----- a/Documentation/dev-tools/kfence.rst | 98 - a/Documentation/kbuild/llvm.rst | 5 a/Documentation/vm/damon/api.rst | 20 a/Documentation/vm/damon/design.rst | 166 ++ 
a/Documentation/vm/damon/faq.rst | 51 a/Documentation/vm/damon/index.rst | 30 a/Documentation/vm/index.rst | 1 a/MAINTAINERS | 17 a/arch/Kconfig | 2 a/arch/alpha/include/asm/agp.h | 4 a/arch/alpha/include/asm/bitops.h | 2 a/arch/alpha/kernel/pci-sysfs.c | 12 a/arch/arc/Kconfig | 1 a/arch/arc/include/asm/bitops.h | 1 a/arch/arc/kernel/traps.c | 5 a/arch/arm/configs/dove_defconfig | 1 a/arch/arm/configs/pxa_defconfig | 1 a/arch/arm/include/asm/bitops.h | 1 a/arch/arm/kernel/traps.c | 5 a/arch/arm64/Kconfig | 1 a/arch/arm64/include/asm/bitops.h | 1 a/arch/arm64/mm/mmu.c | 3 a/arch/csky/include/asm/bitops.h | 1 a/arch/h8300/include/asm/bitops.h | 1 a/arch/h8300/kernel/traps.c | 4 a/arch/hexagon/include/asm/bitops.h | 1 a/arch/hexagon/kernel/traps.c | 4 a/arch/ia64/include/asm/bitops.h | 2 a/arch/ia64/mm/init.c | 3 a/arch/m68k/include/asm/bitops.h | 2 a/arch/mips/Kconfig | 1 a/arch/mips/configs/lemote2f_defconfig | 1 a/arch/mips/configs/pic32mzda_defconfig | 1 a/arch/mips/configs/rt305x_defconfig | 1 a/arch/mips/configs/xway_defconfig | 1 a/arch/mips/include/asm/bitops.h | 1 a/arch/nds32/kernel/traps.c | 5 a/arch/nios2/kernel/traps.c | 5 a/arch/openrisc/include/asm/bitops.h | 1 a/arch/openrisc/kernel/traps.c | 5 a/arch/parisc/configs/generic-32bit_defconfig | 1 a/arch/parisc/include/asm/bitops.h | 2 a/arch/parisc/kernel/traps.c | 4 a/arch/powerpc/include/asm/bitops.h | 2 a/arch/powerpc/include/asm/cputhreads.h | 2 a/arch/powerpc/kernel/traps.c | 5 a/arch/powerpc/mm/mem.c | 3 a/arch/powerpc/platforms/pasemi/dma_lib.c | 4 a/arch/powerpc/platforms/pseries/hotplug-memory.c | 9 a/arch/riscv/Kconfig | 2 a/arch/riscv/include/asm/bitops.h | 1 a/arch/riscv/kernel/traps.c | 5 a/arch/s390/Kconfig | 1 a/arch/s390/include/asm/bitops.h | 1 a/arch/s390/kvm/kvm-s390.c | 2 a/arch/s390/mm/init.c | 3 a/arch/sh/include/asm/bitops.h | 1 a/arch/sh/mm/init.c | 3 a/arch/sparc/include/asm/bitops_32.h | 1 a/arch/sparc/include/asm/bitops_64.h | 2 a/arch/um/kernel/trap.c | 4 a/arch/x86/Kconfig | 1 
a/arch/x86/configs/i386_defconfig | 1 a/arch/x86/configs/x86_64_defconfig | 1 a/arch/x86/include/asm/bitops.h | 2 a/arch/x86/kernel/apic/vector.c | 4 a/arch/x86/mm/init_32.c | 3 a/arch/x86/mm/init_64.c | 3 a/arch/x86/um/Kconfig | 1 a/arch/xtensa/include/asm/bitops.h | 1 a/block/blk-mq.c | 2 a/drivers/acpi/acpi_memhotplug.c | 46 a/drivers/base/memory.c | 231 ++- a/drivers/base/node.c | 2 a/drivers/block/rnbd/rnbd-clt.c | 2 a/drivers/dax/kmem.c | 43 a/drivers/devfreq/devfreq.c | 2 a/drivers/dma/ti/edma.c | 2 a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 4 a/drivers/hwmon/ltc2992.c | 3 a/drivers/hwmon/mr75203.c | 2 a/drivers/iio/adc/ad7124.c | 2 a/drivers/iio/common/hid-sensors/hid-sensor-attributes.c | 3 a/drivers/iio/light/as73211.c | 3 a/drivers/infiniband/hw/irdma/hw.c | 16 a/drivers/media/cec/core/cec-core.c | 2 a/drivers/media/i2c/ov02a10.c | 2 a/drivers/media/mc/mc-devnode.c | 2 a/drivers/mmc/host/renesas_sdhi_core.c | 2 a/drivers/mtd/nand/raw/intel-nand-controller.c | 2 a/drivers/net/virtio_net.c | 2 a/drivers/pci/controller/dwc/pci-dra7xx.c | 2 a/drivers/phy/st/phy-stm32-usbphyc.c | 2 a/drivers/scsi/lpfc/lpfc_sli.c | 10 a/drivers/soc/fsl/qbman/bman_portal.c | 2 a/drivers/soc/fsl/qbman/qman_portal.c | 2 a/drivers/soc/ti/k3-ringacc.c | 4 a/drivers/thermal/devfreq_cooling.c | 2 a/drivers/tty/n_tty.c | 2 a/drivers/virt/acrn/ioreq.c | 3 a/drivers/virtio/virtio_mem.c | 26 a/fs/coredump.c | 15 a/fs/eventpoll.c | 18 a/fs/f2fs/segment.c | 8 a/fs/nilfs2/sysfs.c | 26 a/fs/nilfs2/the_nilfs.c | 9 a/fs/ocfs2/cluster/heartbeat.c | 2 a/fs/ocfs2/dlm/dlmdomain.c | 4 a/fs/ocfs2/dlm/dlmmaster.c | 18 a/fs/ocfs2/dlm/dlmrecovery.c | 2 a/fs/ocfs2/dlm/dlmthread.c | 2 a/fs/proc/array.c | 18 a/fs/proc/base.c | 5 a/fs/proc/kcore.c | 73 a/include/asm-generic/bitops.h | 1 a/include/asm-generic/bitops/find.h | 198 -- a/include/asm-generic/bitops/le.h | 64 a/include/asm-generic/early_ioremap.h | 6 a/include/linux/bitmap.h | 34 a/include/linux/bitops.h | 34 a/include/linux/cpumask.h | 46 
a/include/linux/damon.h | 290 +++ a/include/linux/find.h | 134 + a/include/linux/highmem-internal.h | 27 a/include/linux/memory.h | 55 a/include/linux/memory_hotplug.h | 40 a/include/linux/mmzone.h | 19 a/include/linux/once.h | 2 a/include/linux/page-flags.h | 17 a/include/linux/page_ext.h | 2 a/include/linux/page_idle.h | 6 a/include/linux/pagemap.h | 7 a/include/linux/sched/user.h | 3 a/include/linux/slub_def.h | 6 a/include/linux/threads.h | 2 a/include/linux/units.h | 10 a/include/linux/vmalloc.h | 3 a/include/trace/events/damon.h | 43 a/include/trace/events/mmflags.h | 2 a/include/trace/events/page_ref.h | 4 a/init/initramfs.c | 2 a/init/main.c | 3 a/init/noinitramfs.c | 2 a/ipc/util.c | 16 a/kernel/acct.c | 2 a/kernel/fork.c | 2 a/kernel/profile.c | 21 a/kernel/sys.c | 7 a/kernel/time/clocksource.c | 4 a/kernel/user.c | 25 a/lib/Kconfig | 3 a/lib/Kconfig.debug | 9 a/lib/dump_stack.c | 3 a/lib/find_bit.c | 21 a/lib/find_bit_benchmark.c | 21 a/lib/genalloc.c | 2 a/lib/iov_iter.c | 8 a/lib/math/Kconfig | 2 a/lib/math/rational.c | 3 a/lib/string.c | 130 + a/lib/test_bitmap.c | 37 a/lib/test_printf.c | 2 a/lib/test_sort.c | 40 a/lib/vsprintf.c | 26 a/mm/Kconfig | 15 a/mm/Makefile | 4 a/mm/compaction.c | 20 a/mm/damon/Kconfig | 68 a/mm/damon/Makefile | 5 a/mm/damon/core-test.h | 253 +++ a/mm/damon/core.c | 748 ++++++++++ a/mm/damon/dbgfs-test.h | 126 + a/mm/damon/dbgfs.c | 631 ++++++++ a/mm/damon/vaddr-test.h | 329 ++++ a/mm/damon/vaddr.c | 672 +++++++++ a/mm/early_ioremap.c | 5 a/mm/highmem.c | 2 a/mm/ioremap.c | 25 a/mm/kfence/core.c | 3 a/mm/kfence/kfence.h | 2 a/mm/kfence/kfence_test.c | 3 a/mm/kfence/report.c | 19 a/mm/kmemleak.c | 2 a/mm/memory_hotplug.c | 396 ++++- a/mm/memremap.c | 5 a/mm/page_alloc.c | 27 a/mm/page_ext.c | 12 a/mm/page_idle.c | 10 a/mm/page_isolation.c | 7 a/mm/page_owner.c | 14 a/mm/percpu.c | 36 a/mm/rmap.c | 6 a/mm/secretmem.c | 9 a/mm/slab_common.c | 2 a/mm/slub.c | 1023 +++++++++----- a/mm/vmalloc.c | 24 a/mm/workingset.c | 2 
a/net/ncsi/ncsi-manage.c | 4 a/scripts/check_extable.sh | 2 a/scripts/checkpatch.pl | 93 - a/tools/include/linux/bitmap.h | 4 a/tools/perf/bench/find-bit-bench.c | 2 a/tools/perf/builtin-c2c.c | 6 a/tools/perf/builtin-record.c | 2 a/tools/perf/tests/bitmap.c | 2 a/tools/perf/tests/mem2node.c | 2 a/tools/perf/util/affinity.c | 4 a/tools/perf/util/header.c | 4 a/tools/perf/util/metricgroup.c | 2 a/tools/perf/util/mmap.c | 4 a/tools/testing/selftests/damon/Makefile | 7 a/tools/testing/selftests/damon/_chk_dependency.sh | 28 a/tools/testing/selftests/damon/debugfs_attrs.sh | 75 + a/tools/testing/selftests/kvm/dirty_log_perf_test.c | 2 a/tools/testing/selftests/kvm/dirty_log_test.c | 4 a/tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c | 2 a/tools/testing/selftests/memfd/memfd_test.c | 2 b/MAINTAINERS | 2 b/tools/include/asm-generic/bitops.h | 1 b/tools/include/linux/bitmap.h | 7 b/tools/include/linux/find.h | 81 + b/tools/lib/find_bit.c | 20 227 files changed, 6695 insertions(+), 1875 deletions(-)
On 9/8/21 04:52, Andrew Morton wrote: > Subsystem: mm/slub > > Vlastimil Babka <vbabka@suse.cz>: > Patch series "SLUB: reduce irq disabled scope and make it RT compatible", v6: > mm, slub: don't call flush_all() from slab_debug_trace_open() > mm, slub: allocate private object map for debugfs listings > mm, slub: allocate private object map for validate_slab_cache() > mm, slub: don't disable irq for debug_check_no_locks_freed() > mm, slub: remove redundant unfreeze_partials() from put_cpu_partial() > mm, slub: extract get_partial() from new_slab_objects() > mm, slub: dissolve new_slab_objects() into ___slab_alloc() > mm, slub: return slab page from get_partial() and set c->page afterwards > mm, slub: restructure new page checks in ___slab_alloc() > mm, slub: simplify kmem_cache_cpu and tid setup > mm, slub: move disabling/enabling irqs to ___slab_alloc() > mm, slub: do initial checks in ___slab_alloc() with irqs enabled > mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc() > mm, slub: restore irqs around calling new_slab() > mm, slub: validate slab from partial list or page allocator before making it cpu slab > mm, slub: check new pages with restored irqs > mm, slub: stop disabling irqs around get_partial() > mm, slub: move reset of c->page and freelist out of deactivate_slab() > mm, slub: make locking in deactivate_slab() irq-safe > mm, slub: call deactivate_slab() without disabling irqs > mm, slub: move irq control into unfreeze_partials() > mm, slub: discard slabs in unfreeze_partials() without irqs disabled > mm, slub: detach whole partial list at once in unfreeze_partials() > mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing > mm, slub: only disable irq with spin_lock in __unfreeze_partials() > mm, slub: don't disable irqs in slub_cpu_dead() > mm, slab: split out the cpu offline variant of flush_slab() > > Sebastian Andrzej Siewior <bigeasy@linutronix.de>: > mm: slub: move flush_cpu_slab() invocations 
__free_slab() invocations out of IRQ context > mm: slub: make object_map_lock a raw_spinlock_t > > Vlastimil Babka <vbabka@suse.cz>: > mm, slub: make slab_lock() disable irqs with PREEMPT_RT > mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg > mm, slub: use migrate_disable() on PREEMPT_RT > mm, slub: convert kmem_cpu_slab protection to local_lock For my own peace of mind, I've checked that this part (patches 1 to 33) is identical to the v6 posting [1] and the git version [2] that Mel and Mike tested (replies to [1]). [1] https://lore.kernel.org/all/20210904105003.11688-1-vbabka@suse.cz/ [2] git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git tags/mm-slub-5.15-rc1
This is the post-linux-next material, so it is based upon latest upstream to catch the now-merged dependencies. 10 patches, based on 2d338201d5311bcd79d42f66df4cecbcbc5f4f2c. Subsystems affected by this patch series: mm/vmstat mm/migration compat Subsystem: mm/vmstat Ingo Molnar <mingo@elte.hu>: mm/vmstat: protect per cpu variables with preempt disable on RT Subsystem: mm/migration Baolin Wang <baolin.wang@linux.alibaba.com>: mm: migrate: introduce a local variable to get the number of pages mm: migrate: fix the incorrect function name in comments mm: migrate: change to use bool type for 'page_was_mapped' Subsystem: compat Arnd Bergmann <arnd@arndb.de>: Patch series "compat: remove compat_alloc_user_space", v5: kexec: move locking into do_kexec_load kexec: avoid compat_alloc_user_space mm: simplify compat_sys_move_pages mm: simplify compat numa syscalls compat: remove some compat entry points arch: remove compat_alloc_user_space arch/arm64/include/asm/compat.h | 5 arch/arm64/include/asm/uaccess.h | 11 - arch/arm64/include/asm/unistd32.h | 10 - arch/arm64/lib/Makefile | 2 arch/arm64/lib/copy_in_user.S | 77 ---------- arch/mips/cavium-octeon/octeon-memcpy.S | 2 arch/mips/include/asm/compat.h | 8 - arch/mips/include/asm/uaccess.h | 26 --- arch/mips/kernel/syscalls/syscall_n32.tbl | 10 - arch/mips/kernel/syscalls/syscall_o32.tbl | 10 - arch/mips/lib/memcpy.S | 11 - arch/parisc/include/asm/compat.h | 6 arch/parisc/include/asm/uaccess.h | 2 arch/parisc/kernel/syscalls/syscall.tbl | 8 - arch/parisc/lib/memcpy.c | 9 - arch/powerpc/include/asm/compat.h | 16 -- arch/powerpc/kernel/syscalls/syscall.tbl | 10 - arch/s390/include/asm/compat.h | 10 - arch/s390/include/asm/uaccess.h | 3 arch/s390/kernel/syscalls/syscall.tbl | 10 - arch/s390/lib/uaccess.c | 63 -------- arch/sparc/include/asm/compat.h | 19 -- arch/sparc/kernel/process_64.c | 2 arch/sparc/kernel/signal32.c | 12 - arch/sparc/kernel/signal_64.c | 8 - arch/sparc/kernel/syscalls/syscall.tbl | 10 - 
arch/x86/entry/syscalls/syscall_32.tbl | 4 arch/x86/entry/syscalls/syscall_64.tbl | 2 arch/x86/include/asm/compat.h | 13 - arch/x86/include/asm/uaccess_64.h | 7 include/linux/compat.h | 39 +---- include/linux/uaccess.h | 10 - include/uapi/asm-generic/unistd.h | 10 - kernel/compat.c | 21 -- kernel/kexec.c | 105 +++++--------- kernel/sys_ni.c | 5 mm/mempolicy.c | 213 +++++++----------------------- mm/migrate.c | 69 +++++---- mm/vmstat.c | 48 ++++++ 39 files changed, 243 insertions(+), 663 deletions(-)
A bunch of hotfixes, mostly cc:stable.

8 patches, based on 2d338201d5311bcd79d42f66df4cecbcbc5f4f2c.

Subsystems affected by this patch series:

  mm/hmm
  mm/hugetlb
  mm/vmscan
  mm/pagealloc
  mm/pagemap
  mm/kmemleak
  mm/mempolicy
  mm/memblock

Subsystem: mm/hmm

    Li Zhijian <lizhijian@cn.fujitsu.com>:
      mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled

Subsystem: mm/hugetlb

    Liu Zixian <liuzixian4@huawei.com>:
      mm/hugetlb: initialize hugetlb_usage in mm_init

Subsystem: mm/vmscan

    Rik van Riel <riel@surriel.com>:
      mm,vmscan: fix divide by zero in get_scan_count

Subsystem: mm/pagealloc

    Miaohe Lin <linmiaohe@huawei.com>:
      mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype

Subsystem: mm/pagemap

    Liam Howlett <liam.howlett@oracle.com>:
      mmap_lock: change trace and locking order

Subsystem: mm/kmemleak

    Naohiro Aota <naohiro.aota@wdc.com>:
      mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp

Subsystem: mm/mempolicy

    yanghui <yanghui.def@bytedance.com>:
      mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task

Subsystem: mm/memblock

    Mike Rapoport <rppt@linux.ibm.com>:
      nds32/setup: remove unused memblock_region variable in setup_memory()

 arch/nds32/kernel/setup.c |    1 -
 include/linux/hugetlb.h   |    9 +++++++++
 include/linux/mmap_lock.h |    8 ++++----
 kernel/fork.c             |    1 +
 mm/hmm.c                  |    5 ++++-
 mm/kmemleak.c             |    3 ++-
 mm/mempolicy.c            |   17 +++++++++++++----
 mm/page_alloc.c           |    4 +++-
 mm/vmscan.c               |    2 +-
 9 files changed, 37 insertions(+), 13 deletions(-)
More post linux-next material.

9 patches, based on f154c806676ad7153c6e161f30c53a44855329d6.

Subsystems affected by this patch series:

  mm/slab-generic
  rapidio
  mm/debug

Subsystem: mm/slab-generic

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      mm: move kvmalloc-related functions to slab.h

Subsystem: rapidio

    Kees Cook <keescook@chromium.org>:
      rapidio: avoid bogus __alloc_size warning

Subsystem: mm/debug

    Kees Cook <keescook@chromium.org>:
    Patch series "Add __alloc_size() for better bounds checking", v2:
      Compiler Attributes: add __alloc_size() for better bounds checking
      checkpatch: add __alloc_size() to known $Attribute
      slab: clean up function declarations
      slab: add __alloc_size attributes for better bounds checking
      mm/page_alloc: add __alloc_size attributes for better bounds checking
      percpu: add __alloc_size attributes for better bounds checking
      mm/vmalloc: add __alloc_size attributes for better bounds checking

 Makefile                                 |   15 +++
 drivers/of/kexec.c                       |    1
 drivers/rapidio/devices/rio_mport_cdev.c |    9 +-
 include/linux/compiler_attributes.h      |    6 +
 include/linux/gfp.h                      |    2
 include/linux/mm.h                       |   34 --------
 include/linux/percpu.h                   |    3
 include/linux/slab.h                     |  122 ++++++++++++++++++++++---------
 include/linux/vmalloc.h                  |   11 ++
 scripts/checkpatch.pl                    |    3
 10 files changed, 132 insertions(+), 74 deletions(-)
On Thu, Sep 09, 2021 at 08:09:48PM -0700, Andrew Morton wrote: > > More post linux-next material. > > 9 patches, based on f154c806676ad7153c6e161f30c53a44855329d6. > > Subsystems affected by this patch series: > > mm/slab-generic > rapidio > mm/debug > > Subsystem: mm/slab-generic > > "Matthew Wilcox (Oracle)" <willy@infradead.org>: > mm: move kvmalloc-related functions to slab.h > > Subsystem: rapidio > > Kees Cook <keescook@chromium.org>: > rapidio: avoid bogus __alloc_size warning > > Subsystem: mm/debug > > Kees Cook <keescook@chromium.org>: > Patch series "Add __alloc_size() for better bounds checking", v2: > Compiler Attributes: add __alloc_size() for better bounds checking > checkpatch: add __alloc_size() to known $Attribute > slab: clean up function declarations > slab: add __alloc_size attributes for better bounds checking > mm/page_alloc: add __alloc_size attributes for better bounds checking > percpu: add __alloc_size attributes for better bounds checking > mm/vmalloc: add __alloc_size attributes for better bounds checking Hi, FYI, in overnight build testing I found yet another corner case in GCC's handling of the __alloc_size attribute. It's the gift that keeps on giving. The fix is here: https://lore.kernel.org/lkml/20210910165851.3296624-1-keescook@chromium.org/ > > Makefile | 15 +++ > drivers/of/kexec.c | 1 > drivers/rapidio/devices/rio_mport_cdev.c | 9 +- > include/linux/compiler_attributes.h | 6 + > include/linux/gfp.h | 2 > include/linux/mm.h | 34 -------- > include/linux/percpu.h | 3 > include/linux/slab.h | 122 ++++++++++++++++++++++--------- > include/linux/vmalloc.h | 11 ++ > scripts/checkpatch.pl | 3 > 10 files changed, 132 insertions(+), 74 deletions(-) > -- Kees Cook
On Fri, Sep 10, 2021 at 10:11:53AM -0700, Kees Cook wrote: > On Thu, Sep 09, 2021 at 08:09:48PM -0700, Andrew Morton wrote: > > > > More post linux-next material. > > > > 9 patches, based on f154c806676ad7153c6e161f30c53a44855329d6. > > > > Subsystems affected by this patch series: > > > > mm/slab-generic > > rapidio > > mm/debug > > > > Subsystem: mm/slab-generic > > > > "Matthew Wilcox (Oracle)" <willy@infradead.org>: > > mm: move kvmalloc-related functions to slab.h > > > > Subsystem: rapidio > > > > Kees Cook <keescook@chromium.org>: > > rapidio: avoid bogus __alloc_size warning > > > > Subsystem: mm/debug > > > > Kees Cook <keescook@chromium.org>: > > Patch series "Add __alloc_size() for better bounds checking", v2: > > Compiler Attributes: add __alloc_size() for better bounds checking > > checkpatch: add __alloc_size() to known $Attribute > > slab: clean up function declarations > > slab: add __alloc_size attributes for better bounds checking > > mm/page_alloc: add __alloc_size attributes for better bounds checking > > percpu: add __alloc_size attributes for better bounds checking > > mm/vmalloc: add __alloc_size attributes for better bounds checking > > Hi, > > FYI, in overnight build testing I found yet another corner case in > GCC's handling of the __alloc_size attribute. It's the gift that keeps > on giving. The fix is here: > > https://lore.kernel.org/lkml/20210910165851.3296624-1-keescook@chromium.org/ I'm so glad it's Friday. Here's the v2 fix... 
*sigh* https://lore.kernel.org/lkml/20210910201132.3809437-1-keescook@chromium.org/ -Kees > > > > > Makefile | 15 +++ > > drivers/of/kexec.c | 1 > > drivers/rapidio/devices/rio_mport_cdev.c | 9 +- > > include/linux/compiler_attributes.h | 6 + > > include/linux/gfp.h | 2 > > include/linux/mm.h | 34 -------- > > include/linux/percpu.h | 3 > > include/linux/slab.h | 122 ++++++++++++++++++++++--------- > > include/linux/vmalloc.h | 11 ++ > > scripts/checkpatch.pl | 3 > > 10 files changed, 132 insertions(+), 74 deletions(-) > > > > -- > Kees Cook -- Kees Cook
16 patches, based on 7d42e98182586f57f376406d033f05fe135edb75.

Subsystems affected by this patch series:

  mm/memory-failure
  mm/kasan
  mm/damon
  xtensa
  mm/shmem
  ocfs2
  scripts
  mm/tools
  lib
  mm/pagecache
  mm/debug
  sh
  mm/kasan
  mm/memory-failure
  mm/pagemap

Subsystem: mm/memory-failure

    Naoya Horiguchi <naoya.horiguchi@nec.com>:
      mm, hwpoison: add is_free_buddy_page() in HWPoisonHandlable()

Subsystem: mm/kasan

    Marco Elver <elver@google.com>:
      kasan: fix Kconfig check of CC_HAS_WORKING_NOSANITIZE_ADDRESS

Subsystem: mm/damon

    Adam Borowski <kilobyte@angband.pl>:
      mm/damon: don't use strnlen() with known-bogus source length

Subsystem: xtensa

    Guenter Roeck <linux@roeck-us.net>:
      xtensa: increase size of gcc stack frame check

Subsystem: mm/shmem

    Liu Yuntao <liuyuntao10@huawei.com>:
      mm/shmem.c: fix judgment error in shmem_is_huge()

Subsystem: ocfs2

    Wengang Wang <wen.gang.wang@oracle.com>:
      ocfs2: drop acl cache for directories too

Subsystem: scripts

    Miles Chen <miles.chen@mediatek.com>:
      scripts/sorttable: riscv: fix undeclared identifier 'EM_RISCV' error

Subsystem: mm/tools

    Changbin Du <changbin.du@gmail.com>:
      tools/vm/page-types: remove dependency on opt_file for idle page tracking

Subsystem: lib

    Paul Menzel <pmenzel@molgen.mpg.de>:
      lib/zlib_inflate/inffast: check config in C to avoid unused function warning

Subsystem: mm/pagecache

    Minchan Kim <minchan@kernel.org>:
      mm: fs: invalidate bh_lrus for only cold path

Subsystem: mm/debug

    Weizhao Ouyang <o451686892@gmail.com>:
      mm/debug: sync up MR_CONTIG_RANGE and MR_LONGTERM_PIN
      mm/debug: sync up latest migrate_reason to migrate_reason_names

Subsystem: sh

    Geert Uytterhoeven <geert+renesas@glider.be>:
      sh: pgtable-3level: fix cast to pointer from integer of different size

Subsystem: mm/kasan

    Nathan Chancellor <nathan@kernel.org>:
      kasan: always respect CONFIG_KASAN_STACK

Subsystem: mm/memory-failure

    Qi Zheng <zhengqi.arch@bytedance.com>:
      mm/memory_failure: fix the missing pte_unmap() call

Subsystem: mm/pagemap

    Chen Jun <chenjun102@huawei.com>:
      mm: fix uninitialized use in overcommit_policy_handler

 arch/sh/include/asm/pgtable-3level.h |    2 +-
 fs/buffer.c                          |    8 ++++++--
 fs/ocfs2/dlmglue.c                   |    3 ++-
 include/linux/buffer_head.h          |    4 ++--
 include/linux/migrate.h              |    6 +++++-
 lib/Kconfig.debug                    |    2 +-
 lib/Kconfig.kasan                    |    2 ++
 lib/zlib_inflate/inffast.c           |   13 ++++++-------
 mm/damon/dbgfs-test.h                |   16 ++++++++--------
 mm/debug.c                           |    4 +++-
 mm/memory-failure.c                  |   12 ++++++------
 mm/shmem.c                           |    4 ++--
 mm/swap.c                            |   19 ++++++++++++++++---
 mm/util.c                            |    4 ++--
 scripts/Makefile.kasan               |    3 ++-
 scripts/sorttable.c                  |    4 ++++
 tools/vm/page-types.c                |    2 +-
 17 files changed, 69 insertions(+), 39 deletions(-)
19 patches, based on 519d81956ee277b4419c723adfb154603c2565ba.

Subsystems affected by this patch series:

  mm/userfaultfd
  mm/migration
  ocfs2
  mm/memblock
  mm/mempolicy
  mm/slub
  binfmt
  vfs
  mm/secretmem
  mm/thp
  misc

Subsystem: mm/userfaultfd

    Peter Xu <peterx@redhat.com>:
      mm/userfaultfd: selftests: fix memory corruption with thp enabled

    Nadav Amit <namit@vmware.com>:
      userfaultfd: fix a race between writeprotect and exit_mmap()

Subsystem: mm/migration

    Dave Hansen <dave.hansen@linux.intel.com>:
    Patch series "mm/migrate: 5.15 fixes for automatic demotion", v2:
      mm/migrate: optimize hotplug-time demotion order updates
      mm/migrate: add CPU hotplug to demotion #ifdef

    Huang Ying <ying.huang@intel.com>:
      mm/migrate: fix CPUHP state to update node demotion order

Subsystem: ocfs2

    Jan Kara <jack@suse.cz>:
      ocfs2: fix data corruption after conversion from inline format

    Valentin Vidic <vvidic@valentin-vidic.from.hr>:
      ocfs2: mount fails with buffer overflow in strlen

Subsystem: mm/memblock

    Peng Fan <peng.fan@nxp.com>:
      memblock: check memory total_size

Subsystem: mm/mempolicy

    Eric Dumazet <edumazet@google.com>:
      mm/mempolicy: do not allow illegal MPOL_F_NUMA_BALANCING | MPOL_LOCAL in mbind()

Subsystem: mm/slub

    Miaohe Lin <linmiaohe@huawei.com>:
    Patch series "Fixups for slub":
      mm, slub: fix two bugs in slab_debug_trace_open()
      mm, slub: fix mismatch between reconstructed freelist depth and cnt
      mm, slub: fix potential memoryleak in kmem_cache_open()
      mm, slub: fix potential use-after-free in slab_debugfs_fops
      mm, slub: fix incorrect memcg slab count for bulk free

Subsystem: binfmt

    Lukas Bulwahn <lukas.bulwahn@gmail.com>:
      elfcore: correct reference to CONFIG_UML

Subsystem: vfs

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      vfs: check fd has read access in kernel_read_file_from_fd()

Subsystem: mm/secretmem

    Sean Christopherson <seanjc@google.com>:
      mm/secretmem: fix NULL page->mapping dereference in page_is_secretmem()

Subsystem: mm/thp

    Marek Szyprowski <m.szyprowski@samsung.com>:
      mm/thp: decrease nr_thps in file's mapping on THP split

Subsystem: misc

    Andrej Shadura <andrew.shadura@collabora.co.uk>:
      mailmap: add Andrej Shadura

 .mailmap                                 |    2 +
 fs/kernel_read_file.c                    |    2 -
 fs/ocfs2/alloc.c                         |   46 ++++++-----------------
 fs/ocfs2/super.c                         |   14 +++++--
 fs/userfaultfd.c                         |   12 ++++--
 include/linux/cpuhotplug.h               |    4 ++
 include/linux/elfcore.h                  |    2 -
 include/linux/memory.h                   |    5 ++
 include/linux/secretmem.h                |    2 -
 mm/huge_memory.c                         |    6 ++-
 mm/memblock.c                            |    2 -
 mm/mempolicy.c                           |   16 ++------
 mm/migrate.c                             |   62 ++++++++++++++++++-------------
 mm/page_ext.c                            |    4 --
 mm/slab.c                                |    4 +-
 mm/slub.c                                |   31 ++++++++++++---
 tools/testing/selftests/vm/userfaultfd.c |   23 ++++++++++-
 17 files changed, 138 insertions(+), 99 deletions(-)
11 patches, based on 411a44c24a561e449b592ff631b7ae321f1eb559.

Subsystems affected by this patch series:

  mm/memcg
  mm/memory-failure
  mm/oom-kill
  ocfs2
  mm/secretmem
  mm/vmalloc
  mm/hugetlb
  mm/damon
  mm/tools

Subsystem: mm/memcg

    Shakeel Butt <shakeelb@google.com>:
      memcg: page_alloc: skip bulk allocator for __GFP_ACCOUNT

Subsystem: mm/memory-failure

    Yang Shi <shy828301@gmail.com>:
      mm: hwpoison: remove the unnecessary THP check
      mm: filemap: check if THP has hwpoisoned subpage for PMD page fault

Subsystem: mm/oom-kill

    Suren Baghdasaryan <surenb@google.com>:
      mm/oom_kill.c: prevent a race between process_mrelease and exit_mmap

Subsystem: ocfs2

    Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>:
      ocfs2: fix race between searching chunks and release journal_head from buffer_head

Subsystem: mm/secretmem

    Kees Cook <keescook@chromium.org>:
      mm/secretmem: avoid letting secretmem_users drop to zero

Subsystem: mm/vmalloc

    Chen Wandun <chenwandun@huawei.com>:
      mm/vmalloc: fix numa spreading for large hash tables

Subsystem: mm/hugetlb

    Rongwei Wang <rongwei.wang@linux.alibaba.com>:
      mm, thp: bail out early in collapse_file for writeback page

    Yang Shi <shy828301@gmail.com>:
      mm: khugepaged: skip huge page collapse for special files

Subsystem: mm/damon

    SeongJae Park <sj@kernel.org>:
      mm/damon/core-test: fix wrong expectations for 'damon_split_regions_of()'

Subsystem: mm/tools

    David Yang <davidcomponentone@gmail.com>:
      tools/testing/selftests/vm/split_huge_page_test.c: fix application of sizeof to pointer

 fs/ocfs2/suballoc.c                               |   22 ++++++++++-------
 include/linux/page-flags.h                        |   23 ++++++++++++++++++
 mm/damon/core-test.h                              |    4 +--
 mm/huge_memory.c                                  |    2 +
 mm/khugepaged.c                                   |   26 +++++++++++++-------
 mm/memory-failure.c                               |   28 +++++++++++-----------
 mm/memory.c                                       |    9 +++++++
 mm/oom_kill.c                                     |   23 +++++++++---------
 mm/page_alloc.c                                   |    8 +++++-
 mm/secretmem.c                                    |    2 -
 mm/vmalloc.c                                      |   15 +++++++----
 tools/testing/selftests/vm/split_huge_page_test.c |    2 -
 12 files changed, 110 insertions(+), 54 deletions(-)
262 patches, based on 8bb7eca972ad531c9b149c0a51ab43a417385813 Subsystems affected by this patch series: scripts ocfs2 vfs mm/slab-generic mm/slab mm/slub mm/kconfig mm/dax mm/kasan mm/debug mm/pagecache mm/gup mm/swap mm/memcg mm/pagemap mm/mprotect mm/mremap mm/iomap mm/tracing mm/vmalloc mm/pagealloc mm/memory-failure mm/hugetlb mm/userfaultfd mm/vmscan mm/tools mm/memblock mm/oom-kill mm/hugetlbfs mm/migration mm/thp mm/readahead mm/nommu mm/ksm mm/vmstat mm/madvise mm/memory-hotplug mm/rmap mm/zsmalloc mm/highmem mm/zram mm/cleanups mm/kfence mm/damon Subsystem: scripts Colin Ian King <colin.king@canonical.com>: scripts/spelling.txt: add more spellings to spelling.txt Sven Eckelmann <sven@narfation.org>: scripts/spelling.txt: fix "mistake" version of "synchronization" weidonghui <weidonghui@allwinnertech.com>: scripts/decodecode: fix faulting instruction no print when opps.file is DOS format Subsystem: ocfs2 Chenyuan Mi <cymi20@fudan.edu.cn>: ocfs2: fix handle refcount leak in two exception handling paths Valentin Vidic <vvidic@valentin-vidic.from.hr>: ocfs2: cleanup journal init and shutdown Colin Ian King <colin.king@canonical.com>: ocfs2/dlm: remove redundant assignment of variable ret Jan Kara <jack@suse.cz>: Patch series "ocfs2: Truncate data corruption fix": ocfs2: fix data corruption on truncate ocfs2: do not zero pages beyond i_size Subsystem: vfs Arnd Bergmann <arnd@arndb.de>: fs/posix_acl.c: avoid -Wempty-body warning Jia He <justin.he@arm.com>: d_path: fix Kernel doc validator complaining Subsystem: mm/slab-generic "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm: move kvmalloc-related functions to slab.h Subsystem: mm/slab Shi Lei <shi_lei@massclouds.com>: mm/slab.c: remove useless lines in enable_cpucache() Subsystem: mm/slub Kefeng Wang <wangkefeng.wang@huawei.com>: slub: add back check for free nonslab objects Vlastimil Babka <vbabka@suse.cz>: mm, slub: change percpu partial accounting from objects to pages mm/slub: increase default cpu 
partial list sizes Hyeonggon Yoo <42.hyeyoo@gmail.com>: mm, slub: use prefetchw instead of prefetch Subsystem: mm/kconfig Sebastian Andrzej Siewior <bigeasy@linutronix.de>: mm: disable NUMA_BALANCING_DEFAULT_ENABLED and TRANSPARENT_HUGEPAGE on PREEMPT_RT Subsystem: mm/dax Christoph Hellwig <hch@lst.de>: mm: don't include <linux/dax.h> in <linux/mempolicy.h> Subsystem: mm/kasan Marco Elver <elver@google.com>: Patch series "stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock", v2: lib/stackdepot: include gfp.h lib/stackdepot: remove unused function argument lib/stackdepot: introduce __stack_depot_save() kasan: common: provide can_alloc in kasan_save_stack() kasan: generic: introduce kasan_record_aux_stack_noalloc() workqueue, kasan: avoid alloc_pages() when recording stack "Matthew Wilcox (Oracle)" <willy@infradead.org>: kasan: fix tag for large allocations when using CONFIG_SLAB Peter Collingbourne <pcc@google.com>: kasan: test: add memcpy test that avoids out-of-bounds write Subsystem: mm/debug Peter Xu <peterx@redhat.com>: Patch series "mm/smaps: Fixes and optimizations on shmem swap handling": mm/smaps: fix shmem pte hole swap calculation mm/smaps: use vma->vm_pgoff directly when counting partial swap mm/smaps: simplify shmem handling of pte holes Guo Ren <guoren@linux.alibaba.com>: mm: debug_vm_pgtable: don't use __P000 directly Kees Cook <keescook@chromium.org>: kasan: test: bypass __alloc_size checks Patch series "Add __alloc_size()", v3: rapidio: avoid bogus __alloc_size warning Compiler Attributes: add __alloc_size() for better bounds checking slab: clean up function prototypes slab: add __alloc_size attributes for better bounds checking mm/kvmalloc: add __alloc_size attributes for better bounds checking mm/vmalloc: add __alloc_size attributes for better bounds checking mm/page_alloc: add __alloc_size attributes for better bounds checking percpu: add __alloc_size attributes for better bounds checking Yinan Zhang 
<zhangyinan2019@email.szu.edu.cn>: mm/page_ext.c: fix a comment Subsystem: mm/pagecache David Howells <dhowells@redhat.com>: mm: stop filemap_read() from grabbing a superfluous page Christoph Hellwig <hch@lst.de>: Patch series "simplify bdi unregistation": mm: export bdi_unregister mtd: call bdi_unregister explicitly fs: explicitly unregister per-superblock BDIs mm: don't automatically unregister bdis mm: simplify bdi refcounting Jens Axboe <axboe@kernel.dk>: mm: don't read i_size of inode unless we need it "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm/filemap.c: remove bogus VM_BUG_ON Jens Axboe <axboe@kernel.dk>: mm: move more expensive part of XA setup out of mapping check Subsystem: mm/gup John Hubbard <jhubbard@nvidia.com>: mm/gup: further simplify __gup_device_huge() Subsystem: mm/swap Xu Wang <vulab@iscas.ac.cn>: mm/swapfile: remove needless request_queue NULL pointer check Rafael Aquini <aquini@redhat.com>: mm/swapfile: fix an integer overflow in swap_show() "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm: optimise put_pages_list() Subsystem: mm/memcg Peter Xu <peterx@redhat.com>: mm/memcg: drop swp_entry_t* in mc_handle_file_pte() Shakeel Butt <shakeelb@google.com>: memcg: flush stats only if updated memcg: unify memcg stat flushing Waiman Long <longman@redhat.com>: mm/memcg: remove obsolete memcg_free_kmem() Len Baker <len.baker@gmx.com>: mm/list_lru.c: prefer struct_size over open coded arithmetic Shakeel Butt <shakeelb@google.com>: memcg, kmem: further deprecate kmem.limit_in_bytes Muchun Song <songmuchun@bytedance.com>: mm: list_lru: remove holding lru lock mm: list_lru: fix the return value of list_lru_count_one() mm: memcontrol: remove kmemcg_id reparenting mm: memcontrol: remove the kmem states mm: list_lru: only add memcg-aware lrus to the global lru list Vasily Averin <vvs@virtuozzo.com>: Patch series "memcg: prohibit unconditional exceeding the limit of dying tasks", v3: mm, oom: pagefault_out_of_memory: don't force global OOM for dying 
tasks Michal Hocko <mhocko@suse.com>: mm, oom: do not trigger out_of_memory from the #PF Vasily Averin <vvs@virtuozzo.com>: memcg: prohibit unconditional exceeding the limit of dying tasks Subsystem: mm/pagemap Peng Liu <liupeng256@huawei.com>: mm/mmap.c: fix a data race of mm->total_vm Rolf Eike Beer <eb@emlix.com>: mm: use __pfn_to_section() instead of open coding it Amit Daniel Kachhap <amit.kachhap@arm.com>: mm/memory.c: avoid unnecessary kernel/user pointer conversion Nadav Amit <namit@vmware.com>: mm/memory.c: use correct VMA flags when freeing page-tables Peter Xu <peterx@redhat.com>: Patch series "mm: A few cleanup patches around zap, shmem and uffd", v4: mm/shmem: unconditionally set pte dirty in mfill_atomic_install_pte mm: clear vmf->pte after pte_unmap_same() returns mm: drop first_index/last_index in zap_details mm: add zap_skip_check_mapping() helper Qi Zheng <zhengqi.arch@bytedance.com>: Patch series "Do some code cleanups related to mm", v3: mm: introduce pmd_install() helper mm: remove redundant smp_wmb() Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com>: Documentation: update pagemap with shmem exceptions Nicholas Piggin <npiggin@gmail.com>: Patch series "shoot lazy tlbs", v4: lazy tlb: introduce lazy mm refcount helper functions lazy tlb: allow lazy tlb mm refcounting to be configurable lazy tlb: shoot lazies, a non-refcounting lazy tlb option powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN Lukas Bulwahn <lukas.bulwahn@gmail.com>: memory: remove unused CONFIG_MEM_BLOCK_SIZE Subsystem: mm/mprotect Liu Song <liu.song11@zte.com.cn>: mm/mprotect.c: avoid repeated assignment in do_mprotect_pkey() Subsystem: mm/mremap Dmitry Safonov <dima@arista.com>: mm/mremap: don't account pages in vma_to_resize() Subsystem: mm/iomap Lucas De Marchi <lucas.demarchi@intel.com>: include/linux/io-mapping.h: remove fallback for writecombine Subsystem: mm/tracing Gang Li <ligang.bdlg@bytedance.com>: mm: mmap_lock: remove redundant newline in TP_printk mm: mmap_lock: use 
DECLARE_EVENT_CLASS and DEFINE_EVENT_FN Subsystem: mm/vmalloc Vasily Averin <vvs@virtuozzo.com>: mm/vmalloc: repair warn_alloc()s in __vmalloc_area_node() Peter Zijlstra <peterz@infradead.org>: mm/vmalloc: don't allow VM_NO_GUARD on vmap() Eric Dumazet <edumazet@google.com>: mm/vmalloc: make show_numa_info() aware of hugepage mappings mm/vmalloc: make sure to dump unpurged areas in /proc/vmallocinfo "Uladzislau Rezki (Sony)" <urezki@gmail.com>: mm/vmalloc: do not adjust the search size for alignment overhead mm/vmalloc: check various alignments when debugging Vasily Averin <vvs@virtuozzo.com>: vmalloc: back off when the current task is OOM-killed Kefeng Wang <wangkefeng.wang@huawei.com>: vmalloc: choose a better start address in vm_area_register_early() arm64: support page mapping percpu first chunk allocator kasan: arm64: fix pcpu_page_first_chunk crash with KASAN_VMALLOC Michal Hocko <mhocko@suse.com>: mm/vmalloc: be more explicit about supported gfp flags Chen Wandun <chenwandun@huawei.com>: mm/vmalloc: introduce alloc_pages_bulk_array_mempolicy to accelerate memory allocation Changcheng Deng <deng.changcheng@zte.com.cn>: lib/test_vmalloc.c: use swap() to make code cleaner Subsystem: mm/pagealloc Eric Dumazet <edumazet@google.com>: mm/large system hash: avoid possible NULL deref in alloc_large_system_hash Miaohe Lin <linmiaohe@huawei.com>: Patch series "Cleanups and fixup for page_alloc", v2: mm/page_alloc.c: remove meaningless VM_BUG_ON() in pindex_to_order() mm/page_alloc.c: simplify the code by using macro K() mm/page_alloc.c: fix obsolete comment in free_pcppages_bulk() mm/page_alloc.c: use helper function zone_spans_pfn() mm/page_alloc.c: avoid allocating highmem pages via alloc_pages_exact[_nid] Bharata B Rao <bharata@amd.com>: Patch series "Fix NUMA nodes fallback list ordering": mm/page_alloc: print node fallback order Krupa Ramakrishnan <krupa.ramakrishnan@amd.com>: mm/page_alloc: use accumulated load when building node fallback list Geert Uytterhoeven 
<geert+renesas@glider.be>: Patch series "Fix NUMA without SMP": mm: move node_reclaim_distance to fix NUMA without SMP mm: move fold_vm_numa_events() to fix NUMA without SMP Eric Dumazet <edumazet@google.com>: mm/page_alloc.c: do not acquire zone lock in is_free_buddy_page() Feng Tang <feng.tang@intel.com>: mm/page_alloc: detect allocation forbidden by cpuset and bail out early Liangcai Fan <liangcaifan19@gmail.com>: mm/page_alloc.c: show watermark_boost of zone in zoneinfo Christophe Leroy <christophe.leroy@csgroup.eu>: mm: create a new system state and fix core_kernel_text() mm: make generic arch_is_kernel_initmem_freed() do what it says powerpc: use generic version of arch_is_kernel_initmem_freed() s390: use generic version of arch_is_kernel_initmem_freed() Sebastian Andrzej Siewior <bigeasy@linutronix.de>: mm: page_alloc: use migrate_disable() in drain_local_pages_wq() Wang ShaoBo <bobo.shaobowang@huawei.com>: mm/page_alloc: use clamp() to simplify code Subsystem: mm/memory-failure Marco Elver <elver@google.com>: mm: fix data race in PagePoisoned() Rikard Falkeborn <rikard.falkeborn@gmail.com>: mm/memory_failure: constify static mm_walk_ops Yang Shi <shy828301@gmail.com>: Patch series "Solve silent data loss caused by poisoned page cache (shmem/tmpfs)", v5: mm: filemap: coding style cleanup for filemap_map_pmd() mm: hwpoison: refactor refcount check handling mm: shmem: don't truncate page if memory failure happens mm: hwpoison: handle non-anonymous THP correctly Subsystem: mm/hugetlb Peter Xu <peterx@redhat.com>: mm/hugetlb: drop __unmap_hugepage_range definition from hugetlb.h Mike Kravetz <mike.kravetz@oracle.com>: Patch series "hugetlb: add demote/split page functionality", v4: hugetlb: add demote hugetlb page sysfs interfaces mm/cma: add cma_pages_valid to determine if pages are in CMA hugetlb: be sure to free demoted CMA pages to CMA hugetlb: add demote bool to gigantic page routines hugetlb: add hugetlb demote page support Liangcai Fan 
<liangcaifan19@gmail.com>: mm: khugepaged: recalculate min_free_kbytes after stopping khugepaged Mina Almasry <almasrymina@google.com>: mm, hugepages: add mremap() support for hugepage backed vma mm, hugepages: add hugetlb vma mremap() test Baolin Wang <baolin.wang@linux.alibaba.com>: hugetlb: support node specified when using cma for gigantic hugepages Ran Jianping <ran.jianping@zte.com.cn>: mm: remove duplicate include in hugepage-mremap.c Baolin Wang <baolin.wang@linux.alibaba.com>: Patch series "Some cleanups and improvements for hugetlb": hugetlb_cgroup: remove unused hugetlb_cgroup_from_counter macro hugetlb: replace the obsolete hugetlb_instantiation_mutex in the comments hugetlb: remove redundant validation in has_same_uncharge_info() hugetlb: remove redundant VM_BUG_ON() in add_reservation_in_range() Mike Kravetz <mike.kravetz@oracle.com>: hugetlb: remove unnecessary set_page_count in prep_compound_gigantic_page Subsystem: mm/userfaultfd Axel Rasmussen <axelrasmussen@google.com>: Patch series "Small userfaultfd selftest fixups", v2: userfaultfd/selftests: don't rely on GNU extensions for random numbers userfaultfd/selftests: fix feature support detection userfaultfd/selftests: fix calculation of expected ioctls Subsystem: mm/vmscan Miaohe Lin <linmiaohe@huawei.com>: mm/page_isolation: fix potential missing call to unset_migratetype_isolate() mm/page_isolation: guard against possible putback unisolated page Kai Song <songkai01@inspur.com>: mm/vmscan.c: fix -Wunused-but-set-variable warning Mel Gorman <mgorman@techsingularity.net>: Patch series "Remove dependency on congestion_wait in mm/", v5. 
Patch series: mm/vmscan: throttle reclaim until some writeback completes if congested mm/vmscan: throttle reclaim and compaction when too may pages are isolated mm/vmscan: throttle reclaim when no progress is being made mm/writeback: throttle based on page writeback instead of congestion mm/page_alloc: remove the throttling logic from the page allocator mm/vmscan: centralise timeout values for reclaim_throttle mm/vmscan: increase the timeout if page reclaim is not making progress mm/vmscan: delay waking of tasks throttled on NOPROGRESS Yuanzheng Song <songyuanzheng@huawei.com>: mm/vmpressure: fix data-race with memcg->socket_pressure Subsystem: mm/tools Zhenliang Wei <weizhenliang@huawei.com>: tools/vm/page_owner_sort.c: count and sort by mem Naoya Horiguchi <naoya.horiguchi@nec.com>: Patch series "tools/vm/page-types.c: a few improvements": tools/vm/page-types.c: make walk_file() aware of address range option tools/vm/page-types.c: move show_file() to summary output tools/vm/page-types.c: print file offset in hexadecimal Subsystem: mm/memblock Mike Rapoport <rppt@linux.ibm.com>: Patch series "memblock: cleanup memblock_free interface", v2: arch_numa: simplify numa_distance allocation xen/x86: free_p2m_page: use memblock_free_ptr() to free a virtual pointer memblock: drop memblock_free_early_nid() and memblock_free_early() memblock: stop aliasing __memblock_free_late with memblock_free_late memblock: rename memblock_free to memblock_phys_free memblock: use memblock_free for freeing virtual pointers Subsystem: mm/oom-kill Sultan Alsawaf <sultan@kerneltoast.com>: mm: mark the OOM reaper thread as freezable Subsystem: mm/hugetlbfs Zhenguo Yao <yaozhenguo1@gmail.com>: hugetlbfs: extend the definition of hugepages parameter to support node allocation Subsystem: mm/migration John Hubbard <jhubbard@nvidia.com>: mm/migrate: de-duplicate migrate_reason strings Yang Shi <shy828301@gmail.com>: mm: migrate: make demotion knob depend on migration Subsystem: mm/thp "George G. 
Davis" <davis.george@siemens.com>: selftests/vm/transhuge-stress: fix ram size thinko Rongwei Wang <rongwei.wang@linux.alibaba.com>: Patch series "fix two bugs for file THP": mm, thp: lock filemap when truncating page cache mm, thp: fix incorrect unmap behavior for private pages Subsystem: mm/readahead Lin Feng <linf@wangsu.com>: mm/readahead.c: fix incorrect comments for get_init_ra_size Subsystem: mm/nommu Kefeng Wang <wangkefeng.wang@huawei.com>: mm: nommu: kill arch_get_unmapped_area() Subsystem: mm/ksm "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>: selftest/vm: fix ksm selftest to run with different NUMA topologies Pedro Demarchi Gomes <pedrodemargomes@gmail.com>: selftests: vm: add KSM huge pages merging time test Subsystem: mm/vmstat Liu Shixin <liushixin2@huawei.com>: mm/vmstat: annotate data race for zone->free_area[order].nr_free Lin Feng <linf@wangsu.com>: mm: vmstat.c: make extfrag_index show more pretty Subsystem: mm/madvise David Hildenbrand <david@redhat.com>: selftests/vm: make MADV_POPULATE_(READ|WRITE) use in-tree headers Subsystem: mm/memory-hotplug Tang Yizhou <tangyizhou@huawei.com>: mm/memory_hotplug: add static qualifier for online_policy_to_str() David Hildenbrand <david@redhat.com>: Patch series "memory-hotplug.rst: document the "auto-movable" online policy": memory-hotplug.rst: fix two instances of "movablecore" that should be "movable_node" memory-hotplug.rst: fix wrong /sys/module/memory_hotplug/parameters/ path memory-hotplug.rst: document the "auto-movable" online policy Patch series "mm/memory_hotplug: Kconfig and 32 bit cleanups": mm/memory_hotplug: remove CONFIG_X86_64_ACPI_NUMA dependency from CONFIG_MEMORY_HOTPLUG mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSE mm/memory_hotplug: restrict CONFIG_MEMORY_HOTPLUG to 64 bit mm/memory_hotplug: remove HIGHMEM leftovers mm/memory_hotplug: remove stale function declarations x86: remove memory hotplug support on X86_32 Patch series "mm/memory_hotplug: full support for 
add_memory_driver_managed() with CONFIG_ARCH_KEEP_MEMBLOCK", v2: mm/memory_hotplug: handle memblock_add_node() failures in add_memory_resource() memblock: improve MEMBLOCK_HOTPLUG documentation memblock: allow to specify flags with memblock_add_node() memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGED mm/memory_hotplug: indicate MEMBLOCK_DRIVER_MANAGED with IORESOURCE_SYSRAM_DRIVER_MANAGED Subsystem: mm/rmap Alistair Popple <apopple@nvidia.com>: mm/rmap.c: avoid double faults migrating device private pages Subsystem: mm/zsmalloc Miaohe Lin <linmiaohe@huawei.com>: mm/zsmalloc.c: close race window between zs_pool_dec_isolated() and zs_unregister_migration() Subsystem: mm/highmem Ira Weiny <ira.weiny@intel.com>: mm/highmem: remove deprecated kmap_atomic Subsystem: mm/zram Jaewon Kim <jaewon31.kim@samsung.com>: zram_drv: allow reclaim on bio_alloc Dan Carpenter <dan.carpenter@oracle.com>: zram: off by one in read_block_state() Brian Geffon <bgeffon@google.com>: zram: introduce an aged idle interface Subsystem: mm/cleanups Stephen Kitt <steve@sk2.org>: mm: remove HARDENED_USERCOPY_FALLBACK Mianhan Liu <liumh1@shanghaitech.edu.cn>: include/linux/mm.h: move nr_free_buffer_pages from swap.h to mm.h Subsystem: mm/kfence Marco Elver <elver@google.com>: stacktrace: move filter_irq_stacks() to kernel/stacktrace.c kfence: count unexpectedly skipped allocations kfence: move saving stack trace of allocations into __kfence_alloc() kfence: limit currently covered allocations when pool nearly full kfence: add note to documentation about skipping covered allocations kfence: test: use kunit_skip() to skip tests kfence: shorten critical sections of alloc/free kfence: always use static branches to guard kfence_alloc() kfence: default to dynamic branch instead of static keys mode Subsystem: mm/damon Geert Uytterhoeven <geert@linux-m68k.org>: mm/damon: grammar s/works/work/ SeongJae Park <sjpark@amazon.de>: Documentation/vm: move user guides to admin-guide/mm/ 
SeongJae Park <sj@kernel.org>: MAINTAINERS: update SeongJae's email address SeongJae Park <sjpark@amazon.de>: docs/vm/damon: remove broken reference include/linux/damon.h: fix kernel-doc comments for 'damon_callback' SeongJae Park <sj@kernel.org>: mm/damon/core: print kdamond start log in debug mode only Changbin Du <changbin.du@gmail.com>: mm/damon: remove unnecessary do_exit() from kdamond mm/damon: needn't hold kdamond_lock to print pid of kdamond Colin Ian King <colin.king@canonical.com>: mm/damon/core: nullify pointer ctx->kdamond with a NULL SeongJae Park <sj@kernel.org>: Patch series "Implement Data Access Monitoring-based Memory Operation Schemes": mm/damon/core: account age of target regions mm/damon/core: implement DAMON-based Operation Schemes (DAMOS) mm/damon/vaddr: support DAMON-based Operation Schemes mm/damon/dbgfs: support DAMON-based Operation Schemes mm/damon/schemes: implement statistics feature selftests/damon: add 'schemes' debugfs tests Docs/admin-guide/mm/damon: document DAMON-based Operation Schemes Patch series "DAMON: Support Physical Memory Address Space Monitoring": mm/damon/dbgfs: allow users to set initial monitoring target regions mm/damon/dbgfs-test: add a unit test case for 'init_regions' Docs/admin-guide/mm/damon: document 'init_regions' feature mm/damon/vaddr: separate commonly usable functions mm/damon: implement primitives for physical address space monitoring mm/damon/dbgfs: support physical memory monitoring Docs/DAMON: document physical memory monitoring support Rikard Falkeborn <rikard.falkeborn@gmail.com>: mm/damon/vaddr: constify static mm_walk_ops Rongwei Wang <rongwei.wang@linux.alibaba.com>: mm/damon/dbgfs: remove unnecessary variables SeongJae Park <sj@kernel.org>: mm/damon/paddr: support the pageout scheme mm/damon/schemes: implement size quota for schemes application speed control mm/damon/schemes: skip already charged targets and regions mm/damon/schemes: implement time quota mm/damon/dbgfs: support quotas of
schemes mm/damon/selftests: support schemes quotas mm/damon/schemes: prioritize regions within the quotas mm/damon/vaddr,paddr: support pageout prioritization mm/damon/dbgfs: support prioritization weights tools/selftests/damon: update for regions prioritization of schemes mm/damon/schemes: activate schemes based on a watermarks mechanism mm/damon/dbgfs: support watermarks selftests/damon: support watermarks mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM) Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM Xin Hao <xhao@linux.alibaba.com>: Patch series "mm/damon: Fix some small bugs", v4: mm/damon: remove unnecessary variable initialization mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on SeongJae Park <sj@kernel.org>: Patch series "Fix trivial nits in Documentation/admin-guide/mm": Docs/admin-guide/mm/damon/start: fix wrong example commands Docs/admin-guide/mm/damon/start: fix a wrong link Docs/admin-guide/mm/damon/start: simplify the content Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions Changbin Du <changbin.du@gmail.com>: mm/damon: simplify stop mechanism Colin Ian King <colin.i.king@googlemail.com>: mm/damon: fix a few spelling mistakes in comments and a pr_debug message Changbin Du <changbin.du@gmail.com>: mm/damon: remove return value from before_terminate callback a/Documentation/admin-guide/blockdev/zram.rst | 8 a/Documentation/admin-guide/cgroup-v1/memory.rst | 11 a/Documentation/admin-guide/kernel-parameters.txt | 14 a/Documentation/admin-guide/mm/damon/index.rst | 1 a/Documentation/admin-guide/mm/damon/reclaim.rst | 235 +++ a/Documentation/admin-guide/mm/damon/start.rst | 140 + a/Documentation/admin-guide/mm/damon/usage.rst | 117 + a/Documentation/admin-guide/mm/hugetlbpage.rst | 42 a/Documentation/admin-guide/mm/memory-hotplug.rst | 147 +- a/Documentation/admin-guide/mm/pagemap.rst | 75 - a/Documentation/core-api/memory-hotplug.rst | 3 a/Documentation/dev-tools/kfence.rst | 23 
a/Documentation/translations/zh_CN/core-api/memory-hotplug.rst | 4 a/Documentation/vm/damon/design.rst | 29 a/Documentation/vm/damon/faq.rst | 5 a/Documentation/vm/damon/index.rst | 1 a/Documentation/vm/page_owner.rst | 23 a/MAINTAINERS | 2 a/Makefile | 15 a/arch/Kconfig | 28 a/arch/alpha/kernel/core_irongate.c | 6 a/arch/arc/mm/init.c | 6 a/arch/arm/mach-hisi/platmcpm.c | 2 a/arch/arm/mach-rpc/ecard.c | 2 a/arch/arm/mm/init.c | 2 a/arch/arm64/Kconfig | 4 a/arch/arm64/mm/kasan_init.c | 16 a/arch/arm64/mm/mmu.c | 4 a/arch/ia64/mm/contig.c | 2 a/arch/ia64/mm/init.c | 2 a/arch/m68k/mm/mcfmmu.c | 3 a/arch/m68k/mm/motorola.c | 6 a/arch/mips/loongson64/init.c | 4 a/arch/mips/mm/init.c | 6 a/arch/mips/sgi-ip27/ip27-memory.c | 3 a/arch/mips/sgi-ip30/ip30-setup.c | 6 a/arch/powerpc/Kconfig | 1 a/arch/powerpc/configs/skiroot_defconfig | 1 a/arch/powerpc/include/asm/machdep.h | 2 a/arch/powerpc/include/asm/sections.h | 13 a/arch/powerpc/kernel/dt_cpu_ftrs.c | 8 a/arch/powerpc/kernel/paca.c | 8 a/arch/powerpc/kernel/setup-common.c | 4 a/arch/powerpc/kernel/setup_64.c | 6 a/arch/powerpc/kernel/smp.c | 2 a/arch/powerpc/mm/book3s64/radix_tlb.c | 4 a/arch/powerpc/mm/hugetlbpage.c | 9 a/arch/powerpc/platforms/powernv/pci-ioda.c | 4 a/arch/powerpc/platforms/powernv/setup.c | 4 a/arch/powerpc/platforms/pseries/setup.c | 2 a/arch/powerpc/platforms/pseries/svm.c | 9 a/arch/riscv/kernel/setup.c | 10 a/arch/s390/include/asm/sections.h | 12 a/arch/s390/kernel/setup.c | 11 a/arch/s390/kernel/smp.c | 6 a/arch/s390/kernel/uv.c | 2 a/arch/s390/mm/init.c | 3 a/arch/s390/mm/kasan_init.c | 2 a/arch/sh/boards/mach-ap325rxa/setup.c | 2 a/arch/sh/boards/mach-ecovec24/setup.c | 4 a/arch/sh/boards/mach-kfr2r09/setup.c | 2 a/arch/sh/boards/mach-migor/setup.c | 2 a/arch/sh/boards/mach-se/7724/setup.c | 4 a/arch/sparc/kernel/smp_64.c | 4 a/arch/um/kernel/mem.c | 4 a/arch/x86/Kconfig | 6 a/arch/x86/kernel/setup.c | 4 a/arch/x86/kernel/setup_percpu.c | 2 a/arch/x86/mm/init.c | 2 a/arch/x86/mm/init_32.c | 
31 a/arch/x86/mm/kasan_init_64.c | 4 a/arch/x86/mm/numa.c | 2 a/arch/x86/mm/numa_emulation.c | 2 a/arch/x86/xen/mmu_pv.c | 8 a/arch/x86/xen/p2m.c | 4 a/arch/x86/xen/setup.c | 6 a/drivers/base/Makefile | 2 a/drivers/base/arch_numa.c | 96 + a/drivers/base/node.c | 9 a/drivers/block/zram/zram_drv.c | 66 a/drivers/firmware/efi/memmap.c | 2 a/drivers/hwmon/occ/p9_sbe.c | 1 a/drivers/macintosh/smu.c | 2 a/drivers/mmc/core/mmc_test.c | 1 a/drivers/mtd/mtdcore.c | 1 a/drivers/of/kexec.c | 4 a/drivers/of/of_reserved_mem.c | 5 a/drivers/rapidio/devices/rio_mport_cdev.c | 9 a/drivers/s390/char/sclp_early.c | 4 a/drivers/usb/early/xhci-dbc.c | 10 a/drivers/virtio/Kconfig | 2 a/drivers/xen/swiotlb-xen.c | 4 a/fs/d_path.c | 8 a/fs/exec.c | 4 a/fs/ocfs2/alloc.c | 21 a/fs/ocfs2/dlm/dlmrecovery.c | 1 a/fs/ocfs2/file.c | 8 a/fs/ocfs2/inode.c | 4 a/fs/ocfs2/journal.c | 28 a/fs/ocfs2/journal.h | 3 a/fs/ocfs2/super.c | 40 a/fs/open.c | 16 a/fs/posix_acl.c | 3 a/fs/proc/task_mmu.c | 28 a/fs/super.c | 3 a/include/asm-generic/sections.h | 14 a/include/linux/backing-dev-defs.h | 3 a/include/linux/backing-dev.h | 1 a/include/linux/cma.h | 1 a/include/linux/compiler-gcc.h | 8 a/include/linux/compiler_attributes.h | 10 a/include/linux/compiler_types.h | 12 a/include/linux/cpuset.h | 17 a/include/linux/damon.h | 258 +++ a/include/linux/fs.h | 1 a/include/linux/gfp.h | 8 a/include/linux/highmem.h | 28 a/include/linux/hugetlb.h | 36 a/include/linux/io-mapping.h | 6 a/include/linux/kasan.h | 8 a/include/linux/kernel.h | 1 a/include/linux/kfence.h | 21 a/include/linux/memblock.h | 48 a/include/linux/memcontrol.h | 9 a/include/linux/memory.h | 26 a/include/linux/memory_hotplug.h | 3 a/include/linux/mempolicy.h | 5 a/include/linux/migrate.h | 23 a/include/linux/migrate_mode.h | 13 a/include/linux/mm.h | 57 a/include/linux/mm_types.h | 2 a/include/linux/mmzone.h | 41 a/include/linux/node.h | 4 a/include/linux/page-flags.h | 2 a/include/linux/percpu.h | 6 a/include/linux/sched/mm.h | 25 
a/include/linux/slab.h | 181 +- a/include/linux/slub_def.h | 13 a/include/linux/stackdepot.h | 8 a/include/linux/stacktrace.h | 1 a/include/linux/swap.h | 1 a/include/linux/vmalloc.h | 24 a/include/trace/events/mmap_lock.h | 50 a/include/trace/events/vmscan.h | 42 a/include/trace/events/writeback.h | 7 a/init/Kconfig | 2 a/init/initramfs.c | 4 a/init/main.c | 6 a/kernel/cgroup/cpuset.c | 23 a/kernel/cpu.c | 2 a/kernel/dma/swiotlb.c | 6 a/kernel/exit.c | 2 a/kernel/extable.c | 2 a/kernel/fork.c | 51 a/kernel/kexec_file.c | 5 a/kernel/kthread.c | 21 a/kernel/locking/lockdep.c | 15 a/kernel/printk/printk.c | 4 a/kernel/sched/core.c | 37 a/kernel/sched/sched.h | 4 a/kernel/sched/topology.c | 1 a/kernel/stacktrace.c | 30 a/kernel/tsacct.c | 2 a/kernel/workqueue.c | 2 a/lib/Kconfig.debug | 2 a/lib/Kconfig.kfence | 26 a/lib/bootconfig.c | 2 a/lib/cpumask.c | 6 a/lib/stackdepot.c | 76 - a/lib/test_kasan.c | 26 a/lib/test_kasan_module.c | 2 a/lib/test_vmalloc.c | 6 a/mm/Kconfig | 10 a/mm/backing-dev.c | 65 a/mm/cma.c | 26 a/mm/compaction.c | 12 a/mm/damon/Kconfig | 24 a/mm/damon/Makefile | 4 a/mm/damon/core.c | 500 ++++++- a/mm/damon/dbgfs-test.h | 56 a/mm/damon/dbgfs.c | 486 +++++- a/mm/damon/paddr.c | 275 +++ a/mm/damon/prmtv-common.c | 133 + a/mm/damon/prmtv-common.h | 20 a/mm/damon/reclaim.c | 356 ++++ a/mm/damon/vaddr-test.h | 2 a/mm/damon/vaddr.c | 167 +- a/mm/debug.c | 20 a/mm/debug_vm_pgtable.c | 7 a/mm/filemap.c | 78 - a/mm/gup.c | 5 a/mm/highmem.c | 6 a/mm/hugetlb.c | 713 +++++++++- a/mm/hugetlb_cgroup.c | 3 a/mm/internal.h | 26 a/mm/kasan/common.c | 8 a/mm/kasan/generic.c | 16 a/mm/kasan/kasan.h | 2 a/mm/kasan/shadow.c | 5 a/mm/kfence/core.c | 214 ++- a/mm/kfence/kfence.h | 2 a/mm/kfence/kfence_test.c | 14 a/mm/khugepaged.c | 10 a/mm/list_lru.c | 58 a/mm/memblock.c | 35 a/mm/memcontrol.c | 217 +-- a/mm/memory-failure.c | 117 + a/mm/memory.c | 166 +- a/mm/memory_hotplug.c | 57 a/mm/mempolicy.c | 143 +- a/mm/migrate.c | 61 a/mm/mmap.c | 2 a/mm/mprotect.c | 5 
a/mm/mremap.c | 86 - a/mm/nommu.c | 6 a/mm/oom_kill.c | 27 a/mm/page-writeback.c | 13 a/mm/page_alloc.c | 119 - a/mm/page_ext.c | 2 a/mm/page_isolation.c | 29 a/mm/percpu.c | 24 a/mm/readahead.c | 2 a/mm/rmap.c | 8 a/mm/shmem.c | 44 a/mm/slab.c | 16 a/mm/slab_common.c | 8 a/mm/slub.c | 117 - a/mm/sparse-vmemmap.c | 2 a/mm/sparse.c | 6 a/mm/swap.c | 23 a/mm/swapfile.c | 6 a/mm/userfaultfd.c | 8 a/mm/vmalloc.c | 107 + a/mm/vmpressure.c | 2 a/mm/vmscan.c | 194 ++ a/mm/vmstat.c | 76 - a/mm/zsmalloc.c | 7 a/net/ipv4/tcp.c | 1 a/net/ipv4/udp.c | 1 a/net/netfilter/ipvs/ip_vs_ctl.c | 1 a/net/openvswitch/meter.c | 1 a/net/sctp/protocol.c | 1 a/scripts/checkpatch.pl | 3 a/scripts/decodecode | 2 a/scripts/spelling.txt | 18 a/security/Kconfig | 14 a/tools/testing/selftests/damon/debugfs_attrs.sh | 25 a/tools/testing/selftests/memory-hotplug/config | 1 a/tools/testing/selftests/vm/.gitignore | 1 a/tools/testing/selftests/vm/Makefile | 1 a/tools/testing/selftests/vm/hugepage-mremap.c | 161 ++ a/tools/testing/selftests/vm/ksm_tests.c | 154 ++ a/tools/testing/selftests/vm/madv_populate.c | 15 a/tools/testing/selftests/vm/run_vmtests.sh | 11 a/tools/testing/selftests/vm/transhuge-stress.c | 2 a/tools/testing/selftests/vm/userfaultfd.c | 157 +- a/tools/vm/page-types.c | 38 a/tools/vm/page_owner_sort.c | 94 + b/Documentation/admin-guide/mm/index.rst | 2 b/Documentation/vm/index.rst | 26 260 files changed, 6448 insertions(+), 2327 deletions(-)
87 patches, based on 8bb7eca972ad531c9b149c0a51ab43a417385813, plus previously sent material. Subsystems affected by this patch series: mm/pagecache mm/hugetlb procfs misc MAINTAINERS lib checkpatch binfmt kallsyms ramfs init codafs nilfs2 hfs crash_dump signals seq_file fork sysvfs kcov gdb resource selftests ipc Subsystem: mm/pagecache Johannes Weiner <hannes@cmpxchg.org>: vfs: keep inodes with page cache off the inode shrinker LRU Subsystem: mm/hugetlb zhangyiru <zhangyiru3@huawei.com>: mm,hugetlb: remove mlock ulimit for SHM_HUGETLB Subsystem: procfs Florian Weimer <fweimer@redhat.com>: procfs: do not list TID 0 in /proc/<pid>/task David Hildenbrand <david@redhat.com>: x86/xen: update xen_oldmem_pfn_is_ram() documentation x86/xen: simplify xen_oldmem_pfn_is_ram() x86/xen: print a warning when HVMOP_get_mem_type fails proc/vmcore: let pfn_is_ram() return a bool proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks virtio-mem: factor out hotplug specifics from virtio_mem_init() into virtio_mem_init_hotplug() virtio-mem: factor out hotplug specifics from virtio_mem_probe() into virtio_mem_init_hotplug() virtio-mem: factor out hotplug specifics from virtio_mem_remove() into virtio_mem_deinit_hotplug() virtio-mem: kdump mode to sanitize /proc/vmcore access Stephen Brennan <stephen.s.brennan@oracle.com>: proc: allow pid_revalidate() during LOOKUP_RCU Subsystem: misc Andy Shevchenko <andriy.shevchenko@linux.intel.com>: Patch series "kernel.h further split", v5: kernel.h: drop unneeded <linux/kernel.h> inclusion from other headers kernel.h: split out container_of() and typeof_member() macros include/kunit/test.h: replace kernel.h with the necessary inclusions include/linux/list.h: replace kernel.h with the necessary inclusions include/linux/llist.h: replace kernel.h with the necessary inclusions include/linux/plist.h: replace kernel.h with the necessary inclusions include/media/media-entity.h: replace kernel.h with the necessary inclusions 
include/linux/delay.h: replace kernel.h with the necessary inclusions include/linux/sbitmap.h: replace kernel.h with the necessary inclusions include/linux/radix-tree.h: replace kernel.h with the necessary inclusions include/linux/generic-radix-tree.h: replace kernel.h with the necessary inclusions Stephen Rothwell <sfr@canb.auug.org.au>: kernel.h: split out instruction pointer accessors Rasmus Villemoes <linux@rasmusvillemoes.dk>: linux/container_of.h: switch to static_assert Colin Ian King <colin.i.king@googlemail.com>: mailmap: update email address for Colin King Subsystem: MAINTAINERS Kees Cook <keescook@chromium.org>: MAINTAINERS: add "exec & binfmt" section with myself and Eric Lukas Bulwahn <lukas.bulwahn@gmail.com>: Patch series "Rectify file references for dt-bindings in MAINTAINERS", v5: MAINTAINERS: rectify entry for ARM/TOSHIBA VISCONTI ARCHITECTURE MAINTAINERS: rectify entry for HIKEY960 ONBOARD USB GPIO HUB DRIVER MAINTAINERS: rectify entry for INTEL KEEM BAY DRM DRIVER MAINTAINERS: rectify entry for ALLWINNER HARDWARE SPINLOCK SUPPORT Subsystem: lib Imran Khan <imran.f.khan@oracle.com>: Patch series "lib, stackdepot: check stackdepot handle before accessing slabs", v2: lib, stackdepot: check stackdepot handle before accessing slabs lib, stackdepot: add helper to print stack entries lib, stackdepot: add helper to print stack entries into buffer Lucas De Marchi <lucas.demarchi@intel.com>: include/linux/string_helpers.h: add linux/string.h for strlen() Alexey Dobriyan <adobriyan@gmail.com>: lib: uninline simple_strntoull() as well Thomas Gleixner <tglx@linutronix.de>: mm/scatterlist: replace the !preemptible warning in sg_miter_stop() Subsystem: checkpatch Rikard Falkeborn <rikard.falkeborn@gmail.com>: const_structs.checkpatch: add a few sound ops structs Joe Perches <joe@perches.com>: checkpatch: improve EXPORT_SYMBOL test for EXPORT_SYMBOL_NS uses Peter Ujfalusi <peter.ujfalusi@linux.intel.com>: checkpatch: get default codespell dictionary path from 
package location Subsystem: binfmt Kees Cook <keescook@chromium.org>: binfmt_elf: reintroduce using MAP_FIXED_NOREPLACE Alexey Dobriyan <adobriyan@gmail.com>: ELF: simplify STACK_ALLOC macro Subsystem: kallsyms Kefeng Wang <wangkefeng.wang@huawei.com>: Patch series "sections: Unify kernel sections range check and use", v4: kallsyms: remove arch specific text and data check kallsyms: fix address-checks for kernel related range sections: move and rename core_kernel_data() to is_kernel_core_data() sections: move is_kernel_inittext() into sections.h x86: mm: rename __is_kernel_text() to is_x86_32_kernel_text() sections: provide internal __is_kernel() and __is_kernel_text() helper mm: kasan: use is_kernel() helper extable: use is_kernel_text() helper powerpc/mm: use core_kernel_text() helper microblaze: use is_kernel_text() helper alpha: use is_kernel_text() helper Subsystem: ramfs yangerkun <yangerkun@huawei.com>: ramfs: fix mount source show for ramfs Subsystem: init Andrew Halaney <ahalaney@redhat.com>: init: make unknown command line param message clearer Subsystem: codafs Jan Harkes <jaharkes@cs.cmu.edu>: Patch series "Coda updates for -next": coda: avoid NULL pointer dereference from a bad inode coda: check for async upcall request using local state Alex Shi <alex.shi@linux.alibaba.com>: coda: remove err which no one care Jan Harkes <jaharkes@cs.cmu.edu>: coda: avoid flagging NULL inodes coda: avoid hidden code duplication in rename coda: avoid doing bad things on inode type changes during revalidation Xiyu Yang <xiyuyang19@fudan.edu.cn>: coda: convert from atomic_t to refcount_t on coda_vm_ops->refcnt Jing Yangyang <jing.yangyang@zte.com.cn>: coda: use vmemdup_user to replace the open code Jan Harkes <jaharkes@cs.cmu.edu>: coda: bump module version to 7.2 Subsystem: nilfs2 Qing Wang <wangqing@vivo.com>: Patch series "nilfs2 updates": nilfs2: replace snprintf in show functions with sysfs_emit Ryusuke Konishi <konishi.ryusuke@gmail.com>: nilfs2: remove filenames 
from file comments Subsystem: hfs Arnd Bergmann <arnd@arndb.de>: hfs/hfsplus: use WARN_ON for sanity check Subsystem: crash_dump Changcheng Deng <deng.changcheng@zte.com.cn>: crash_dump: fix boolreturn.cocci warning Ye Guojin <ye.guojin@zte.com.cn>: crash_dump: remove duplicate include in crash_dump.h Subsystem: signals Ye Guojin <ye.guojin@zte.com.cn>: signal: remove duplicate include in signal.h Subsystem: seq_file Andy Shevchenko <andriy.shevchenko@linux.intel.com>: seq_file: move seq_escape() to a header Muchun Song <songmuchun@bytedance.com>: seq_file: fix passing wrong private data Subsystem: fork Ran Xiaokai <ran.xiaokai@zte.com.cn>: kernel/fork.c: unshare(): use swap() to make code cleaner Subsystem: sysvfs Pavel Skripkin <paskripkin@gmail.com>: sysv: use BUILD_BUG_ON instead of runtime check Subsystem: kcov Sebastian Andrzej Siewior <bigeasy@linutronix.de>: Patch series "kcov: PREEMPT_RT fixup + misc", v2: Documentation/kcov: include types.h in the example Documentation/kcov: define `ip' in the example kcov: allocate per-CPU memory on the relevant node kcov: avoid enable+disable interrupts if !in_task() kcov: replace local_irq_save() with a local_lock_t Subsystem: gdb Douglas Anderson <dianders@chromium.org>: scripts/gdb: handle split debug for vmlinux Subsystem: resource David Hildenbrand <david@redhat.com>: Patch series "virtio-mem: disallow mapping virtio-mem memory via /dev/mem", v5: kernel/resource: clean up and optimize iomem_is_exclusive() kernel/resource: disallow access to exclusive system RAM regions virtio-mem: disallow mapping virtio-mem memory via /dev/mem Subsystem: selftests SeongJae Park <sjpark@amazon.de>: selftests/kselftest/runner/run_one(): allow running non-executable files Subsystem: ipc Michal Clapinski <mclapinski@google.com>: ipc: check checkpoint_restore_ns_capable() to modify C/R proc files Manfred Spraul <manfred@colorfullife.com>: ipc/ipc_sysctl.c: remove fallback for !CONFIG_PROC_SYSCTL .mailmap | 2 
Documentation/dev-tools/kcov.rst | 5 MAINTAINERS | 21 + arch/alpha/kernel/traps.c | 4 arch/microblaze/mm/pgtable.c | 3 arch/powerpc/mm/pgtable_32.c | 7 arch/riscv/lib/delay.c | 4 arch/s390/include/asm/facility.h | 4 arch/x86/kernel/aperture_64.c | 13 arch/x86/kernel/unwind_orc.c | 2 arch/x86/mm/init_32.c | 14 arch/x86/xen/mmu_hvm.c | 39 -- drivers/gpu/drm/drm_dp_mst_topology.c | 5 drivers/gpu/drm/drm_mm.c | 5 drivers/gpu/drm/i915/i915_vma.c | 5 drivers/gpu/drm/i915/intel_runtime_pm.c | 20 - drivers/media/dvb-frontends/cxd2880/cxd2880_common.h | 1 drivers/virtio/Kconfig | 1 drivers/virtio/virtio_mem.c | 321 +++++++++++++------ fs/binfmt_elf.c | 33 + fs/coda/cnode.c | 13 fs/coda/coda_linux.c | 39 +- fs/coda/coda_linux.h | 6 fs/coda/dir.c | 20 - fs/coda/file.c | 12 fs/coda/psdev.c | 14 fs/coda/upcall.c | 3 fs/hfs/inode.c | 6 fs/hfsplus/inode.c | 12 fs/hugetlbfs/inode.c | 23 - fs/inode.c | 46 +- fs/internal.h | 1 fs/nilfs2/alloc.c | 2 fs/nilfs2/alloc.h | 2 fs/nilfs2/bmap.c | 2 fs/nilfs2/bmap.h | 2 fs/nilfs2/btnode.c | 2 fs/nilfs2/btnode.h | 2 fs/nilfs2/btree.c | 2 fs/nilfs2/btree.h | 2 fs/nilfs2/cpfile.c | 2 fs/nilfs2/cpfile.h | 2 fs/nilfs2/dat.c | 2 fs/nilfs2/dat.h | 2 fs/nilfs2/dir.c | 2 fs/nilfs2/direct.c | 2 fs/nilfs2/direct.h | 2 fs/nilfs2/file.c | 2 fs/nilfs2/gcinode.c | 2 fs/nilfs2/ifile.c | 2 fs/nilfs2/ifile.h | 2 fs/nilfs2/inode.c | 2 fs/nilfs2/ioctl.c | 2 fs/nilfs2/mdt.c | 2 fs/nilfs2/mdt.h | 2 fs/nilfs2/namei.c | 2 fs/nilfs2/nilfs.h | 2 fs/nilfs2/page.c | 2 fs/nilfs2/page.h | 2 fs/nilfs2/recovery.c | 2 fs/nilfs2/segbuf.c | 2 fs/nilfs2/segbuf.h | 2 fs/nilfs2/segment.c | 2 fs/nilfs2/segment.h | 2 fs/nilfs2/sufile.c | 2 fs/nilfs2/sufile.h | 2 fs/nilfs2/super.c | 2 fs/nilfs2/sysfs.c | 78 ++-- fs/nilfs2/sysfs.h | 2 fs/nilfs2/the_nilfs.c | 2 fs/nilfs2/the_nilfs.h | 2 fs/proc/base.c | 21 - fs/proc/vmcore.c | 109 ++++-- fs/ramfs/inode.c | 11 fs/seq_file.c | 16 fs/sysv/super.c | 6 include/asm-generic/sections.h | 75 +++- include/kunit/test.h | 13 
include/linux/bottom_half.h | 3 include/linux/container_of.h | 52 ++- include/linux/crash_dump.h | 30 + include/linux/delay.h | 2 include/linux/fs.h | 1 include/linux/fwnode.h | 1 include/linux/generic-radix-tree.h | 3 include/linux/hugetlb.h | 6 include/linux/instruction_pointer.h | 8 include/linux/kallsyms.h | 21 - include/linux/kernel.h | 39 -- include/linux/list.h | 4 include/linux/llist.h | 4 include/linux/pagemap.h | 50 ++ include/linux/plist.h | 5 include/linux/radix-tree.h | 4 include/linux/rwsem.h | 1 include/linux/sbitmap.h | 11 include/linux/seq_file.h | 19 + include/linux/signal.h | 1 include/linux/smp.h | 1 include/linux/spinlock.h | 1 include/linux/stackdepot.h | 5 include/linux/string_helpers.h | 1 include/media/media-entity.h | 3 init/main.c | 4 ipc/ipc_sysctl.c | 42 +- ipc/shm.c | 8 kernel/extable.c | 33 - kernel/fork.c | 9 kernel/kcov.c | 40 +- kernel/locking/lockdep.c | 3 kernel/resource.c | 54 ++- kernel/trace/ftrace.c | 2 lib/scatterlist.c | 11 lib/stackdepot.c | 46 ++ lib/vsprintf.c | 3 mm/Kconfig | 7 mm/filemap.c | 8 mm/kasan/report.c | 17 - mm/memfd.c | 4 mm/mmap.c | 3 mm/page_owner.c | 18 - mm/truncate.c | 19 + mm/vmscan.c | 7 mm/workingset.c | 10 net/sysctl_net.c | 2 scripts/checkpatch.pl | 33 + scripts/const_structs.checkpatch | 4 scripts/gdb/linux/symbols.py | 3 tools/testing/selftests/kselftest/runner.sh | 28 + tools/testing/selftests/proc/.gitignore | 1 tools/testing/selftests/proc/Makefile | 2 tools/testing/selftests/proc/proc-tid0.c | 81 ++++ 132 files changed, 1206 insertions(+), 681 deletions(-)
The post-linux-next material.

7 patches, based on debe436e77c72fcee804fb867f275e6d31aa999c.

Subsystems affected by this patch series:

  mm/debug
  mm/slab-generic
  mm/migration
  mm/memcg
  mm/kasan

Subsystem: mm/debug

    Yixuan Cao <caoyixuan2019@email.szu.edu.cn>:
      mm/page_owner.c: modify the type of argument "order" in some functions

Subsystem: mm/slab-generic

    Ingo Molnar <mingo@kernel.org>:
      mm: allow only SLUB on PREEMPT_RT

Subsystem: mm/migration

    Baolin Wang <baolin.wang@linux.alibaba.com>:
      mm: migrate: simplify the file-backed pages validation when migrating its mapping

    Alistair Popple <apopple@nvidia.com>:
      mm/migrate.c: remove MIGRATE_PFN_LOCKED

Subsystem: mm/memcg

    Christoph Hellwig <hch@lst.de>:
    Patch series "unexport memcg locking helpers":
      mm: unexport folio_memcg_{,un}lock
      mm: unexport {,un}lock_page_memcg

Subsystem: mm/kasan

    Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>:
      kasan: add kasan mode messages when kasan init

 Documentation/vm/hmm.rst | 2
 arch/arm64/mm/kasan_init.c | 2
 arch/powerpc/kvm/book3s_hv_uvmem.c | 4
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 4
 include/linux/migrate.h | 1
 include/linux/page_owner.h | 12 +-
 init/Kconfig | 2
 lib/test_hmm.c | 5 -
 mm/kasan/hw_tags.c | 14 ++
 mm/kasan/sw_tags.c | 2
 mm/memcontrol.c | 4
 mm/migrate.c | 151 +++++-------------------------
 mm/page_owner.c | 6 -
 14 files changed, 61 insertions(+), 150 deletions(-)
15 patches, based on a90af8f15bdc9449ee2d24e1d73fa3f7e8633f81.

Subsystems affected by this patch series:

  mm/swap
  ipc
  mm/slab-generic
  hexagon
  mm/kmemleak
  mm/hugetlb
  mm/kasan
  mm/damon
  mm/highmem
  proc

Subsystem: mm/swap

    Matthew Wilcox <willy@infradead.org>:
      mm/swap.c:put_pages_list(): reinitialise the page list

Subsystem: ipc

    Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>:
    Patch series "shm: shm_rmid_forced feature fixes":
      ipc: WARN if trying to remove ipc object which is absent
      shm: extend forced shm destroy to support objects from several IPC nses

Subsystem: mm/slab-generic

    Yunfeng Ye <yeyunfeng@huawei.com>:
      mm: emit the "free" trace report before freeing memory in kmem_cache_free()

Subsystem: hexagon

    Nathan Chancellor <nathan@kernel.org>:
    Patch series "Fixes for ARCH=hexagon allmodconfig", v2:
      hexagon: export raw I/O routines for modules
      hexagon: clean up timer-regs.h
      hexagon: ignore vmlinux.lds

Subsystem: mm/kmemleak

    Rustam Kovhaev <rkovhaev@gmail.com>:
      mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag

Subsystem: mm/hugetlb

    Bui Quang Minh <minhquangbui99@gmail.com>:
      hugetlb: fix hugetlb cgroup refcounting during mremap

    Mina Almasry <almasrymina@google.com>:
      hugetlb, userfaultfd: fix reservation restore on userfaultfd error

Subsystem: mm/kasan

    Kees Cook <keescook@chromium.org>:
      kasan: test: silence intentional read overflow warnings

Subsystem: mm/damon

    SeongJae Park <sj@kernel.org>:
    Patch series "DAMON fixes":
      mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation
      mm/damon/dbgfs: fix missed use of damon_dbgfs_lock

Subsystem: mm/highmem

    Ard Biesheuvel <ardb@kernel.org>:
      kmap_local: don't assume kmap PTEs are linear arrays in memory

Subsystem: proc

    David Hildenbrand <david@redhat.com>:
      proc/vmcore: fix clearing user buffer by properly using clear_user()

 arch/arm/Kconfig | 1
 arch/hexagon/include/asm/timer-regs.h | 26 ----
 arch/hexagon/include/asm/timex.h | 3
 arch/hexagon/kernel/.gitignore | 1
 arch/hexagon/kernel/time.c | 12 +-
 arch/hexagon/lib/io.c | 4
 fs/proc/vmcore.c | 20 ++-
 include/linux/hugetlb_cgroup.h | 12 ++
 include/linux/ipc_namespace.h | 15 ++
 include/linux/sched/task.h | 2
 ipc/shm.c | 189 +++++++++++++++++++++++++---------
 ipc/util.c | 6 -
 lib/test_kasan.c | 2
 mm/Kconfig | 3
 mm/damon/dbgfs.c | 20 ++-
 mm/highmem.c | 32 +++--
 mm/hugetlb.c | 11 +
 mm/slab.c | 3
 mm/slab.h | 2
 mm/slob.c | 3
 mm/slub.c | 2
 mm/swap.c | 1
 22 files changed, 254 insertions(+), 116 deletions(-)
21 patches, based on c741e49150dbb0c0aebe234389f4aa8b47958fa8.

Subsystems affected by this patch series:

  mm/mlock
  MAINTAINERS
  mailmap
  mm/pagecache
  mm/damon
  mm/slub
  mm/memcg
  mm/hugetlb
  mm/pagecache

Subsystem: mm/mlock

    Drew DeVault <sir@cmpwn.com>:
      Increase default MLOCK_LIMIT to 8 MiB

Subsystem: MAINTAINERS

    Dave Young <dyoung@redhat.com>:
      MAINTAINERS: update kdump maintainers

Subsystem: mailmap

    Guo Ren <guoren@linux.alibaba.com>:
      mailmap: update email address for Guo Ren

Subsystem: mm/pagecache

    "Matthew Wilcox (Oracle)" <willy@infradead.org>:
      filemap: remove PageHWPoison check from next_uptodate_page()

Subsystem: mm/damon

    SeongJae Park <sj@kernel.org>:
    Patch series "mm/damon: Fix fake /proc/loadavg reports", v3:
      timers: implement usleep_idle_range()
      mm/damon/core: fix fake load reports due to uninterruptible sleeps
    Patch series "mm/damon: Trivial fixups and improvements":
      mm/damon/core: use better timer mechanisms selection threshold
      mm/damon/dbgfs: remove an unnecessary error message
      mm/damon/core: remove unnecessary error messages
      mm/damon/vaddr: remove an unnecessary warning message
      mm/damon/vaddr-test: split a test function having >1024 bytes frame size
      mm/damon/vaddr-test: remove unnecessary variables
      selftests/damon: skip test if DAMON is running
      selftests/damon: test DAMON enabling with empty target_ids case
      selftests/damon: test wrong DAMOS condition ranges input
      selftests/damon: test debugfs file reads/writes with huge count
      selftests/damon: split test cases

Subsystem: mm/slub

    Gerald Schaefer <gerald.schaefer@linux.ibm.com>:
      mm/slub: fix endianness bug for alloc/free_traces attributes

Subsystem: mm/memcg

    Waiman Long <longman@redhat.com>:
      mm/memcg: relocate mod_objcg_mlstate(), get_obj_stock() and put_obj_stock()

Subsystem: mm/hugetlb

    Zhenguo Yao <yaozhenguo1@gmail.com>:
      hugetlbfs: fix issue of preallocation of gigantic pages can't work

Subsystem: mm/pagecache

    Manjong Lee <mj0123.lee@samsung.com>:
      mm: bdi: initialize bdi_min_ratio when bdi is unregistered

 .mailmap | 2
 MAINTAINERS | 2
 include/linux/delay.h | 14
 include/uapi/linux/resource.h | 13
 kernel/time/timer.c | 16 -
 mm/backing-dev.c | 7
 mm/damon/core.c | 20 -
 mm/damon/dbgfs.c | 4
 mm/damon/vaddr-test.h | 85 ++---
 mm/damon/vaddr.c | 1
 mm/filemap.c | 2
 mm/hugetlb.c | 2
 mm/memcontrol.c | 106 +++----
 mm/slub.c | 15 -
 tools/testing/selftests/damon/.gitignore | 2
 tools/testing/selftests/damon/Makefile | 7
 tools/testing/selftests/damon/_debugfs_common.sh | 52 +++
 tools/testing/selftests/damon/debugfs_attrs.sh | 149 ++--------
 tools/testing/selftests/damon/debugfs_empty_targets.sh | 13
 tools/testing/selftests/damon/debugfs_huge_count_read_write.sh | 22 +
 tools/testing/selftests/damon/debugfs_schemes.sh | 19 +
 tools/testing/selftests/damon/debugfs_target_ids.sh | 19 +
 tools/testing/selftests/damon/huge_count_read_write.c | 39 ++
 23 files changed, 363 insertions(+), 248 deletions(-)
9 patches, based on bc491fb12513e79702c6f936c838f792b5389129.

Subsystems affected by this patch series:

  mm/kfence
  mm/mempolicy
  core-kernel
  MAINTAINERS
  mm/memory-failure
  mm/pagemap
  mm/pagealloc
  mm/damon
  mm/memory-failure

Subsystem: mm/kfence

    Baokun Li <libaokun1@huawei.com>:
      kfence: fix memory leak when cat kfence objects

Subsystem: mm/mempolicy

    Andrey Ryabinin <arbn@yandex-team.com>:
      mm: mempolicy: fix THP allocations escaping mempolicy restrictions

Subsystem: core-kernel

    Philipp Rudo <prudo@redhat.com>:
      kernel/crash_core: suppress unknown crashkernel parameter warning

Subsystem: MAINTAINERS

    Randy Dunlap <rdunlap@infradead.org>:
      MAINTAINERS: mark more list instances as moderated

Subsystem: mm/memory-failure

    Naoya Horiguchi <naoya.horiguchi@nec.com>:
      mm, hwpoison: fix condition in free hugetlb page path

Subsystem: mm/pagemap

    Hugh Dickins <hughd@google.com>:
      mm: delete unsafe BUG from page_cache_add_speculative()

Subsystem: mm/pagealloc

    Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr>:
      mm/page_alloc: fix __alloc_size attribute for alloc_pages_exact_nid

Subsystem: mm/damon

    SeongJae Park <sj@kernel.org>:
      mm/damon/dbgfs: protect targets destructions with kdamond_lock

Subsystem: mm/memory-failure

    Liu Shixin <liushixin2@huawei.com>:
      mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()

 MAINTAINERS | 4 ++--
 include/linux/gfp.h | 2 +-
 include/linux/pagemap.h | 1 -
 kernel/crash_core.c | 11 +++++++++++
 mm/damon/dbgfs.c | 2 ++
 mm/kfence/core.c | 1 +
 mm/memory-failure.c | 14 +++++---------
 mm/mempolicy.c | 3 +--
 8 files changed, 23 insertions(+), 15 deletions(-)
2 patches, based on 4f3d93c6eaff6b84e43b63e0d7a119c5920e1020.

Subsystems affected by this patch series:

  mm/userfaultfd
  mm/damon

Subsystem: mm/userfaultfd

    Mike Kravetz <mike.kravetz@oracle.com>:
      userfaultfd/selftests: fix hugetlb area allocations

Subsystem: mm/damon

    SeongJae Park <sj@kernel.org>:
      mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()'

 mm/damon/dbgfs.c | 9 +++++++--
 tools/testing/selftests/vm/userfaultfd.c | 16 ++++++++++------
 2 files changed, 17 insertions(+), 8 deletions(-)
146 patches, based on df0cc57e057f18e44dac8e6c18aba47ab53202f9 ("Linux 5.16") Subsystems affected by this patch series: kthread ia64 scripts ntfs squashfs ocfs2 vfs mm/slab-generic mm/slab mm/kmemleak mm/dax mm/kasan mm/debug mm/pagecache mm/gup mm/shmem mm/frontswap mm/memremap mm/memcg mm/selftests mm/pagemap mm/dma mm/vmalloc mm/memory-failure mm/hugetlb mm/userfaultfd mm/vmscan mm/mempolicy mm/oom-kill mm/hugetlbfs mm/migration mm/thp mm/ksm mm/page-poison mm/percpu mm/rmap mm/zswap mm/zram mm/cleanups mm/hmm mm/damon Subsystem: kthread Cai Huoqing <caihuoqing@baidu.com>: kthread: add the helper function kthread_run_on_cpu() RDMA/siw: make use of the helper function kthread_run_on_cpu() ring-buffer: make use of the helper function kthread_run_on_cpu() rcutorture: make use of the helper function kthread_run_on_cpu() trace/osnoise: make use of the helper function kthread_run_on_cpu() trace/hwlat: make use of the helper function kthread_run_on_cpu() Subsystem: ia64 Yang Guang <yang.guang5@zte.com.cn>: ia64: module: use swap() to make code cleaner arch/ia64/kernel/setup.c: use swap() to make code cleaner Jason Wang <wangborong@cdjrlc.com>: ia64: fix typo in a comment Greg Kroah-Hartman <gregkh@linuxfoundation.org>: ia64: topology: use default_groups in kobj_type Subsystem: scripts Drew Fustini <dfustini@baylibre.com>: scripts/spelling.txt: add "oveflow" Subsystem: ntfs Yang Li <yang.lee@linux.alibaba.com>: fs/ntfs/attrib.c: fix one kernel-doc comment Subsystem: squashfs Zheng Liang <zhengliang6@huawei.com>: squashfs: provide backing_dev_info in order to disable read-ahead Subsystem: ocfs2 Zhang Mingyu <zhang.mingyu@zte.com.cn>: ocfs2: use BUG_ON instead of if condition followed by BUG. 
Joseph Qi <joseph.qi@linux.alibaba.com>: ocfs2: clearly handle ocfs2_grab_pages_for_write() return value Greg Kroah-Hartman <gregkh@linuxfoundation.org>: ocfs2: use default_groups in kobj_type Colin Ian King <colin.i.king@gmail.com>: ocfs2: remove redundant assignment to pointer root_bh Greg Kroah-Hartman <gregkh@linuxfoundation.org>: ocfs2: cluster: use default_groups in kobj_type Colin Ian King <colin.i.king@gmail.com>: ocfs2: remove redundant assignment to variable free_space Subsystem: vfs Amit Daniel Kachhap <amit.kachhap@arm.com>: fs/ioctl: remove unnecessary __user annotation Subsystem: mm/slab-generic Marco Elver <elver@google.com>: mm/slab_common: use WARN() if cache still has objects on destroy Subsystem: mm/slab Muchun Song <songmuchun@bytedance.com>: mm: slab: make slab iterator functions static Subsystem: mm/kmemleak Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>: kmemleak: fix kmemleak false positive report with HW tag-based kasan enable Calvin Zhang <calvinzhang.cool@gmail.com>: mm: kmemleak: alloc gray object for reserved region with direct map Kefeng Wang <wangkefeng.wang@huawei.com>: mm: defer kmemleak object creation of module_alloc() Subsystem: mm/dax Joao Martins <joao.m.martins@oracle.com>: Patch series "mm, device-dax: Introduce compound pages in devmap", v7: mm/page_alloc: split prep_compound_page into head and tail subparts mm/page_alloc: refactor memmap_init_zone_device() page init mm/memremap: add ZONE_DEVICE support for compound pages device-dax: use ALIGN() for determining pgoff device-dax: use struct_size() device-dax: ensure dev_dax->pgmap is valid for dynamic devices device-dax: factor out page mapping initialization device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}() device-dax: remove pfn from __dev_dax_{pte,pmd,pud}_fault() device-dax: compound devmap support Subsystem: mm/kasan Marco Elver <elver@google.com>: kasan: test: add globals left-out-of-bounds test kasan: add ability to detect double-kmem_cache_destroy() kasan: 
test: add test case for double-kmem_cache_destroy() Andrey Konovalov <andreyknvl@google.com>: kasan: fix quarantine conflicting with init_on_free Subsystem: mm/debug "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm,fs: split dump_mapping() out from dump_page() Anshuman Khandual <anshuman.khandual@arm.com>: mm/debug_vm_pgtable: update comments regarding migration swap entries Subsystem: mm/pagecache chiminghao <chi.minghao@zte.com.cn>: mm/truncate.c: remove unneeded variable Subsystem: mm/gup Christophe Leroy <christophe.leroy@csgroup.eu>: gup: avoid multiple user access locking/unlocking in fault_in_{read/write}able Li Xinhai <lixinhai.lxh@gmail.com>: mm/gup.c: stricter check on THP migration entry during follow_pmd_mask Subsystem: mm/shmem Yang Shi <shy828301@gmail.com>: mm: shmem: don't truncate page if memory failure happens Gang Li <ligang.bdlg@bytedance.com>: shmem: fix a race between shmem_unused_huge_shrink and shmem_evict_inode Subsystem: mm/frontswap Christophe JAILLET <christophe.jaillet@wanadoo.fr>: mm/frontswap.c: use non-atomic '__set_bit()' when possible Subsystem: mm/memremap Subsystem: mm/memcg Muchun Song <songmuchun@bytedance.com>: mm: memcontrol: make cgroup_memory_nokmem static Donghai Qiao <dqiao@redhat.com>: mm/page_counter: remove an incorrect call to propagate_protected_usage() Dan Schatzberg <schatzberg.dan@gmail.com>: mm/memcg: add oom_group_kill memory event Shakeel Butt <shakeelb@google.com>: memcg: better bounds on the memcg stats updates Wang Weiyang <wangweiyang2@huawei.com>: mm/memcg: use struct_size() helper in kzalloc() Shakeel Butt <shakeelb@google.com>: memcg: add per-memcg vmalloc stat Subsystem: mm/selftests chiminghao <chi.minghao@zte.com.cn>: tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner Subsystem: mm/pagemap Qi Zheng <zhengqi.arch@bytedance.com>: mm: remove redundant check about FAULT_FLAG_ALLOW_RETRY bit Colin Cross <ccross@google.com>: Patch series "mm: rearrange madvise code to allow for 
reuse", v11: mm: rearrange madvise code to allow for reuse mm: add a field to store names for private anonymous memory Suren Baghdasaryan <surenb@google.com>: mm: add anonymous vma name refcounting Arnd Bergmann <arnd@arndb.de>: mm: move anon_vma declarations to linux/mm_inline.h mm: move tlb_flush_pending inline helpers to mm_inline.h Suren Baghdasaryan <surenb@google.com>: mm: protect free_pgtables with mmap_lock write lock in exit_mmap mm: document locking restrictions for vm_operations_struct::close mm/oom_kill: allow process_mrelease to run under mmap_lock protection Shuah Khan <skhan@linuxfoundation.org>: docs/vm: add vmalloced-kernel-stacks document Pasha Tatashin <pasha.tatashin@soleen.com>: Patch series "page table check", v3: mm: change page type prior to adding page table entry mm: ptep_clear() page table helper mm: page table check x86: mm: add x86_64 support for page table check "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm: remove last argument of reuse_swap_page() mm: remove the total_mapcount argument from page_trans_huge_map_swapcount() mm: remove the total_mapcount argument from page_trans_huge_mapcount() Subsystem: mm/dma Christian König <christian.koenig@amd.com>: mm/dmapool.c: revert "make dma pool to use kmalloc_node" Subsystem: mm/vmalloc Michal Hocko <mhocko@suse.com>: Patch series "extend vmalloc support for constrained allocations", v2: mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc mm/vmalloc: add support for __GFP_NOFAIL mm/vmalloc: be more explicit about supported gfp flags. 
mm: allow !GFP_KERNEL allocations for kvmalloc mm: make slab and vmalloc allocators __GFP_NOLOCKDEP aware "NeilBrown" <neilb@suse.de>: mm: introduce memalloc_retry_wait() Suren Baghdasaryan <surenb@google.com>: mm/pagealloc: sysctl: change watermark_scale_factor max limit to 30% Changcheng Deng <deng.changcheng@zte.com.cn>: mm: fix boolreturn.cocci warning Xiongwei Song <sxwjean@gmail.com>: mm: page_alloc: fix building error on -Werror=array-compare Michal Hocko <mhocko@suse.com>: mm: drop node from alloc_pages_vma Miles Chen <miles.chen@mediatek.com>: include/linux/gfp.h: further document GFP_DMA32 Anshuman Khandual <anshuman.khandual@arm.com>: mm/page_alloc.c: modify the comment section for alloc_contig_pages() Baoquan He <bhe@redhat.com>: Patch series "Handle warning of allocation failure on DMA zone w/o managed pages", v4: mm_zone: add function to check if managed dma zone exists dma/pool: create dma atomic pool only if dma zone has managed pages mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages Subsystem: mm/memory-failure Subsystem: mm/hugetlb Mina Almasry <almasrymina@google.com>: hugetlb: add hugetlb.*.numa_stat file Yosry Ahmed <yosryahmed@google.com>: mm, hugepages: make memory size variable in hugepage-mremap selftest Yang Yang <yang.yang29@zte.com.cn>: mm/vmstat: add events for THP max_ptes_* exceeds Waiman Long <longman@redhat.com>: selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting Subsystem: mm/userfaultfd Peter Xu <peterx@redhat.com>: selftests/uffd: allow EINTR/EAGAIN Mike Kravetz <mike.kravetz@oracle.com>: userfaultfd/selftests: clean up hugetlb allocation code Subsystem: mm/vmscan Gang Li <ligang.bdlg@bytedance.com>: vmscan: make drop_slab_node static Chen Wandun <chenwandun@huawei.com>: mm/page_isolation: unset migratetype directly for non Buddy page Subsystem: mm/mempolicy "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>: Patch series "mm: add new syscall set_mempolicy_home_node", v6: 
mm/mempolicy: use policy_node helper with MPOL_PREFERRED_MANY mm/mempolicy: add set_mempolicy_home_node syscall mm/mempolicy: wire up syscall set_mempolicy_home_node Randy Dunlap <rdunlap@infradead.org>: mm/mempolicy: fix all kernel-doc warnings Subsystem: mm/oom-kill Jann Horn <jannh@google.com>: mm, oom: OOM sysrq should always kill a process Subsystem: mm/hugetlbfs Sean Christopherson <seanjc@google.com>: hugetlbfs: fix off-by-one error in hugetlb_vmdelete_list() Subsystem: mm/migration Baolin Wang <baolin.wang@linux.alibaba.com>: Patch series "Improve the migration stats": mm: migrate: fix the return value of migrate_pages() mm: migrate: correct the hugetlb migration stats mm: compaction: fix the migration stats in trace_mm_compaction_migratepages() mm: migrate: support multiple target nodes demotion mm: migrate: add more comments for selecting target node randomly Huang Ying <ying.huang@intel.com>: mm/migrate: move node demotion code to near its user Colin Ian King <colin.i.king@gmail.com>: mm/migrate: remove redundant variables used in a for-loop Subsystem: mm/thp Anshuman Khandual <anshuman.khandual@arm.com>: mm/thp: drop unused trace events hugepage_[invalidate|splitting] Subsystem: mm/ksm Nanyong Sun <sunnanyong@huawei.com>: mm: ksm: fix use-after-free kasan report in ksm_might_need_to_copy Subsystem: mm/page-poison Naoya Horiguchi <naoya.horiguchi@nec.com>: Patch series "mm/hwpoison: fix unpoison_memory()", v4: mm/hwpoison: mf_mutex for soft offline and unpoison mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE mm/hwpoison: fix unpoison_memory() Subsystem: mm/percpu Qi Zheng <zhengqi.arch@bytedance.com>: mm: memcg/percpu: account extra objcg space to memory cgroups Subsystem: mm/rmap Huang Ying <ying.huang@intel.com>: mm/rmap: fix potential batched TLB flush race Subsystem: mm/zswap Zhaoyu Liu <zackary.liu.pro@gmail.com>: zpool: remove the list of pools_head Subsystem: mm/zram Luis Chamberlain <mcgrof@kernel.org>: zram: use ATTRIBUTE_GROUPS 
Subsystem: mm/cleanups Quanfa Fu <fuqf0919@gmail.com>: mm: fix some comment errors Ting Liu <liuting.0x7c00@bytedance.com>: mm: make some vars and functions static or __init Subsystem: mm/hmm Alistair Popple <apopple@nvidia.com>: mm/hmm.c: allow VM_MIXEDMAP to work with hmm_range_fault Subsystem: mm/damon Xin Hao <xhao@linux.alibaba.com>: Patch series "mm/damon: Do some small changes", v4: mm/damon: unified access_check function naming rules mm/damon: add 'age' of region tracepoint support mm/damon/core: use abs() instead of diff_of() mm/damon: remove some unneeded function definitions in damon.h Yihao Han <hanyihao@vivo.com>: mm/damon/vaddr: remove swap_ranges() and replace it with swap() Xin Hao <xhao@linux.alibaba.com>: mm/damon/schemes: add the validity judgment of thresholds mm/damon: move damon_rand() definition into damon.h mm/damon: modify damon_rand() macro to static inline function SeongJae Park <sj@kernel.org>: Patch series "mm/damon: Misc cleanups": mm/damon: convert macro functions to static inline functions Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks Docs/admin-guide/mm/damon/usage: remove redundant information Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts mm/damon: remove a mistakenly added comment for a future feature Patch series "mm/damon/schemes: Extend stats for better online analysis and tuning": mm/damon/schemes: account scheme actions that successfully applied mm/damon/schemes: account how many times quota limit has exceeded mm/damon/reclaim: provide reclamation statistics Docs/admin-guide/mm/damon/reclaim: document statistics parameters mm/damon/dbgfs: support all DAMOS stats Docs/admin-guide/mm/damon/usage: update for schemes statistics Baolin Wang <baolin.wang@linux.alibaba.com>: mm/damon: add access checking for hugetlb pages Guoqing Jiang <guoqing.jiang@linux.dev>: mm/damon: move the implementation of 
damon_insert_region to damon.h SeongJae Park <sj@kernel.org>: Patch series "mm/damon: Hide unnecessary information disclosures": mm/damon/dbgfs: remove an unnecessary variable mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log mm/damon: hide kernel pointer from tracepoint event Documentation/admin-guide/cgroup-v1/hugetlb.rst | 4 Documentation/admin-guide/cgroup-v2.rst | 11 Documentation/admin-guide/mm/damon/reclaim.rst | 25 Documentation/admin-guide/mm/damon/usage.rst | 235 +++++-- Documentation/admin-guide/mm/numa_memory_policy.rst | 16 Documentation/admin-guide/sysctl/vm.rst | 2 Documentation/filesystems/proc.rst | 6 Documentation/vm/arch_pgtable_helpers.rst | 20 Documentation/vm/index.rst | 2 Documentation/vm/page_migration.rst | 12 Documentation/vm/page_table_check.rst | 56 + Documentation/vm/vmalloced-kernel-stacks.rst | 153 ++++ MAINTAINERS | 9 arch/Kconfig | 3 arch/alpha/kernel/syscalls/syscall.tbl | 1 arch/alpha/mm/fault.c | 16 arch/arc/mm/fault.c | 3 arch/arm/mm/fault.c | 2 arch/arm/tools/syscall.tbl | 1 arch/arm64/include/asm/unistd.h | 2 arch/arm64/include/asm/unistd32.h | 2 arch/arm64/kernel/module.c | 4 arch/arm64/mm/fault.c | 6 arch/hexagon/mm/vm_fault.c | 8 arch/ia64/kernel/module.c | 6 arch/ia64/kernel/setup.c | 5 arch/ia64/kernel/syscalls/syscall.tbl | 1 arch/ia64/kernel/topology.c | 3 arch/ia64/kernel/uncached.c | 2 arch/ia64/mm/fault.c | 16 arch/m68k/kernel/syscalls/syscall.tbl | 1 arch/m68k/mm/fault.c | 18 arch/microblaze/kernel/syscalls/syscall.tbl | 1 arch/microblaze/mm/fault.c | 18 arch/mips/kernel/syscalls/syscall_n32.tbl | 1 arch/mips/kernel/syscalls/syscall_n64.tbl | 1 arch/mips/kernel/syscalls/syscall_o32.tbl | 1 arch/mips/mm/fault.c | 19 arch/nds32/mm/fault.c | 16 arch/nios2/mm/fault.c | 18 arch/openrisc/mm/fault.c | 18 arch/parisc/kernel/syscalls/syscall.tbl | 1 arch/parisc/mm/fault.c | 18 arch/powerpc/kernel/syscalls/syscall.tbl | 1 
arch/powerpc/mm/fault.c | 6 arch/riscv/mm/fault.c | 2 arch/s390/kernel/module.c | 5 arch/s390/kernel/syscalls/syscall.tbl | 1 arch/s390/mm/fault.c | 28 arch/sh/kernel/syscalls/syscall.tbl | 1 arch/sh/mm/fault.c | 18 arch/sparc/kernel/syscalls/syscall.tbl | 1 arch/sparc/mm/fault_32.c | 16 arch/sparc/mm/fault_64.c | 16 arch/um/kernel/trap.c | 8 arch/x86/Kconfig | 1 arch/x86/entry/syscalls/syscall_32.tbl | 1 arch/x86/entry/syscalls/syscall_64.tbl | 1 arch/x86/include/asm/pgtable.h | 31 - arch/x86/kernel/module.c | 7 arch/x86/mm/fault.c | 3 arch/xtensa/kernel/syscalls/syscall.tbl | 1 arch/xtensa/mm/fault.c | 17 drivers/block/zram/zram_drv.c | 11 drivers/dax/bus.c | 32 + drivers/dax/bus.h | 1 drivers/dax/device.c | 140 ++-- drivers/infiniband/sw/siw/siw_main.c | 7 drivers/of/fdt.c | 6 fs/ext4/extents.c | 8 fs/ext4/inline.c | 5 fs/ext4/page-io.c | 9 fs/f2fs/data.c | 4 fs/f2fs/gc.c | 5 fs/f2fs/inode.c | 4 fs/f2fs/node.c | 4 fs/f2fs/recovery.c | 6 fs/f2fs/segment.c | 9 fs/f2fs/super.c | 5 fs/hugetlbfs/inode.c | 7 fs/inode.c | 49 + fs/ioctl.c | 2 fs/ntfs/attrib.c | 2 fs/ocfs2/alloc.c | 2 fs/ocfs2/aops.c | 26 fs/ocfs2/cluster/masklog.c | 11 fs/ocfs2/dir.c | 2 fs/ocfs2/filecheck.c | 3 fs/ocfs2/journal.c | 6 fs/proc/task_mmu.c | 13 fs/squashfs/super.c | 33 + fs/userfaultfd.c | 8 fs/xfs/kmem.c | 3 fs/xfs/xfs_buf.c | 2 include/linux/ceph/libceph.h | 1 include/linux/damon.h | 93 +-- include/linux/fs.h | 1 include/linux/gfp.h | 12 include/linux/hugetlb.h | 4 include/linux/hugetlb_cgroup.h | 7 include/linux/kasan.h | 4 include/linux/kthread.h | 25 include/linux/memcontrol.h | 22 include/linux/mempolicy.h | 1 include/linux/memremap.h | 11 include/linux/mm.h | 76 -- include/linux/mm_inline.h | 136 ++++ include/linux/mm_types.h | 252 +++----- include/linux/mmzone.h | 9 include/linux/page-flags.h | 6 include/linux/page_idle.h | 1 include/linux/page_table_check.h | 147 ++++ include/linux/pgtable.h | 8 include/linux/sched/mm.h | 26 include/linux/swap.h | 8 include/linux/syscalls.h | 3 
include/linux/vm_event_item.h | 3 include/linux/vmalloc.h | 7 include/ras/ras_event.h | 2 include/trace/events/compaction.h | 24 include/trace/events/damon.h | 15 include/trace/events/thp.h | 35 - include/uapi/asm-generic/unistd.h | 5 include/uapi/linux/prctl.h | 3 kernel/dma/pool.c | 4 kernel/fork.c | 3 kernel/kthread.c | 1 kernel/rcu/rcutorture.c | 7 kernel/sys.c | 63 ++ kernel/sys_ni.c | 1 kernel/sysctl.c | 3 kernel/trace/ring_buffer.c | 7 kernel/trace/trace_hwlat.c | 6 kernel/trace/trace_osnoise.c | 3 lib/test_hmm.c | 24 lib/test_kasan.c | 30 mm/Kconfig | 14 mm/Kconfig.debug | 24 mm/Makefile | 1 mm/compaction.c | 7 mm/damon/core.c | 45 - mm/damon/dbgfs.c | 20 mm/damon/paddr.c | 24 mm/damon/prmtv-common.h | 4 mm/damon/reclaim.c | 46 + mm/damon/vaddr.c | 186 ++++-- mm/debug.c | 52 - mm/debug_vm_pgtable.c | 6 mm/dmapool.c | 2 mm/frontswap.c | 4 mm/gup.c | 31 - mm/hmm.c | 5 mm/huge_memory.c | 32 - mm/hugetlb.c | 6 mm/hugetlb_cgroup.c | 133 +++- mm/internal.h | 7 mm/kasan/quarantine.c | 11 mm/kasan/shadow.c | 9 mm/khugepaged.c | 23 mm/kmemleak.c | 21 mm/ksm.c | 5 mm/madvise.c | 510 ++++++++++------ mm/mapping_dirty_helpers.c | 1 mm/memcontrol.c | 44 - mm/memory-failure.c | 189 +++--- mm/memory.c | 12 mm/mempolicy.c | 95 ++- mm/memremap.c | 18 mm/migrate.c | 527 ++++++++++------- mm/mlock.c | 2 mm/mmap.c | 55 + mm/mmu_gather.c | 1 mm/mprotect.c | 2 mm/oom_kill.c | 30 mm/page_alloc.c | 198 ++++-- mm/page_counter.c | 1 mm/page_ext.c | 8 mm/page_isolation.c | 2 mm/page_owner.c | 4 mm/page_table_check.c | 270 ++++++++ mm/percpu-internal.h | 18 mm/percpu.c | 10 mm/pgtable-generic.c | 1 mm/rmap.c | 43 + mm/shmem.c | 91 ++ mm/slab.h | 5 mm/slab_common.c | 34 - mm/swap.c | 2 mm/swapfile.c | 46 - mm/truncate.c | 5 mm/userfaultfd.c | 5 mm/util.c | 15 mm/vmalloc.c | 75 +- mm/vmscan.c | 2 mm/vmstat.c | 3 mm/zpool.c | 12 net/ceph/buffer.c | 4 net/ceph/ceph_common.c | 27 net/ceph/crypto.c | 2 net/ceph/messenger.c | 2 net/ceph/messenger_v2.c | 2 net/ceph/osdmap.c | 12 
net/sunrpc/svc_xprt.c | 3 scripts/spelling.txt | 1 tools/testing/selftests/vm/charge_reserved_hugetlb.sh | 34 - tools/testing/selftests/vm/hmm-tests.c | 42 + tools/testing/selftests/vm/hugepage-mremap.c | 46 - tools/testing/selftests/vm/hugetlb_reparenting_test.sh | 21 tools/testing/selftests/vm/run_vmtests.sh | 2 tools/testing/selftests/vm/userfaultfd.c | 33 - tools/testing/selftests/vm/write_hugetlb_memory.sh | 2 211 files changed, 3980 insertions(+), 1759 deletions(-)
55 patches, based on df0cc57e057f18e44dac8e6c18aba47ab53202f9 ("Linux 5.16") Subsystems affected by this patch series: percpu procfs sysctl misc core-kernel get_maintainer lib checkpatch binfmt nilfs2 hfs fat adfs panic delayacct kconfig kcov ubsan Subsystem: percpu Kefeng Wang <wangkefeng.wang@huawei.com>: Patch series "mm: percpu: Cleanup percpu first chunk function": mm: percpu: generalize percpu related config mm: percpu: add pcpu_fc_cpu_to_node_fn_t typedef mm: percpu: add generic pcpu_fc_alloc/free funciton mm: percpu: add generic pcpu_populate_pte() function Subsystem: procfs David Hildenbrand <david@redhat.com>: proc/vmcore: don't fake reading zeroes on surprise vmcore_cb unregistration Hans de Goede <hdegoede@redhat.com>: proc: make the proc_create[_data]() stubs static inlines Qi Zheng <zhengqi.arch@bytedance.com>: proc: convert the return type of proc_fd_access_allowed() to be boolean Subsystem: sysctl Geert Uytterhoeven <geert+renesas@glider.be>: sysctl: fix duplicate path separator in printed entries luo penghao <luo.penghao@zte.com.cn>: sysctl: remove redundant ret assignment Subsystem: misc Andy Shevchenko <andriy.shevchenko@linux.intel.com>: include/linux/unaligned: replace kernel.h with the necessary inclusions kernel.h: include a note to discourage people from including it in headers Subsystem: core-kernel Yafang Shao <laoar.shao@gmail.com>: Patch series "task comm cleanups", v2: fs/exec: replace strlcpy with strscpy_pad in __set_task_comm fs/exec: replace strncpy with strscpy_pad in __get_task_comm drivers/infiniband: replace open-coded string copy with get_task_comm fs/binfmt_elf: replace open-coded string copy with get_task_comm samples/bpf/test_overhead_kprobe_kern: replace bpf_probe_read_kernel with bpf_probe_read_kernel_str to get task comm tools/bpf/bpftool/skeleton: replace bpf_probe_read_kernel with bpf_probe_read_kernel_str to get task comm tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN kthread: dynamically 
allocate memory to store kthread's full name Davidlohr Bueso <dave@stgolabs.net>: kernel/sys.c: only take tasklist_lock for get/setpriority(PRIO_PGRP) Subsystem: get_maintainer Randy Dunlap <rdunlap@infradead.org>: get_maintainer: don't remind about no git repo when --nogit is used Subsystem: lib Alexey Dobriyan <adobriyan@gmail.com>: kstrtox: uninline everything Andy Shevchenko <andriy.shevchenko@linux.intel.com>: list: introduce list_is_head() helper and re-use it in list.h Zhen Lei <thunder.leizhen@huawei.com>: lib/list_debug.c: print more list debugging context in __list_del_entry_valid() Isabella Basso <isabbasso@riseup.net>: Patch series "test_hash.c: refactor into KUnit", v3: hash.h: remove unused define directive test_hash.c: split test_int_hash into arch-specific functions test_hash.c: split test_hash_init lib/Kconfig.debug: properly split hash test kernel entries test_hash.c: refactor into kunit Andy Shevchenko <andriy.shevchenko@linux.intel.com>: kunit: replace kernel.h with the necessary inclusions uuid: discourage people from using UAPI header in new code uuid: remove licence boilerplate text from the header Andrey Konovalov <andreyknvl@google.com>: lib/test_meminit: destroy cache in kmem_cache_alloc_bulk() test Subsystem: checkpatch Jerome Forissier <jerome@forissier.org>: checkpatch: relax regexp for COMMIT_LOG_LONG_LINE Joe Perches <joe@perches.com>: checkpatch: improve Kconfig help test Rikard Falkeborn <rikard.falkeborn@gmail.com>: const_structs.checkpatch: add frequently used ops structs Subsystem: binfmt "H.J. 
Lu" <hjl.tools@gmail.com>: fs/binfmt_elf: use PT_LOAD p_align values for static PIE Subsystem: nilfs2 Colin Ian King <colin.i.king@gmail.com>: nilfs2: remove redundant pointer sbufs Subsystem: hfs Kees Cook <keescook@chromium.org>: hfsplus: use struct_group_attr() for memcpy() region Subsystem: fat "NeilBrown" <neilb@suse.de>: FAT: use io_schedule_timeout() instead of congestion_wait() Subsystem: adfs Minghao Chi <chi.minghao@zte.com.cn>: fs/adfs: remove unneeded variable make code cleaner Subsystem: panic Marco Elver <elver@google.com>: panic: use error_report_end tracepoint on warnings Sebastian Andrzej Siewior <bigeasy@linutronix.de>: panic: remove oops_id Subsystem: delayacct Yang Yang <yang.yang29@zte.com.cn>: delayacct: support swapin delay accounting for swapping without blkio delayacct: fix incomplete disable operation when switch enable to disable delayacct: cleanup flags in struct task_delay_info and functions use it wangyong <wang.yong12@zte.com.cn>: Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact delayacct: track delays from memory compact Subsystem: kconfig Qian Cai <quic_qiancai@quicinc.com>: configs: introduce debug.config for CI-like setup Nathan Chancellor <nathan@kernel.org>: Patch series "Fix CONFIG_TEST_KMOD with 256kB page size": arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB btrfs: use generic Kconfig option for 256kB page size limit lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB Subsystem: kcov Marco Elver <elver@google.com>: kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR Subsystem: ubsan Kees Cook <keescook@chromium.org>: ubsan: remove CONFIG_UBSAN_OBJECT_SIZE Colin Ian King <colin.i.king@gmail.com>: lib: remove redundant assignment to variable ret Documentation/accounting/delay-accounting.rst | 63 +- arch/Kconfig | 4 arch/arm64/Kconfig | 20 arch/ia64/Kconfig | 9 arch/mips/Kconfig | 10 arch/mips/mm/init.c | 28 - arch/powerpc/Kconfig 
| 17 arch/powerpc/kernel/setup_64.c | 113 ---- arch/riscv/Kconfig | 10 arch/sparc/Kconfig | 12 arch/sparc/kernel/led.c | 8 arch/sparc/kernel/smp_64.c | 119 ----- arch/x86/Kconfig | 19 arch/x86/kernel/setup_percpu.c | 82 --- drivers/base/arch_numa.c | 78 --- drivers/infiniband/hw/qib/qib.h | 2 drivers/infiniband/hw/qib/qib_file_ops.c | 2 drivers/infiniband/sw/rxe/rxe_qp.c | 3 drivers/net/wireless/broadcom/brcm80211/brcmfmac/xtlv.c | 2 fs/adfs/inode.c | 4 fs/binfmt_elf.c | 6 fs/btrfs/Kconfig | 3 fs/exec.c | 5 fs/fat/file.c | 5 fs/hfsplus/hfsplus_raw.h | 12 fs/hfsplus/xattr.c | 4 fs/nilfs2/page.c | 4 fs/proc/array.c | 3 fs/proc/base.c | 4 fs/proc/proc_sysctl.c | 9 fs/proc/vmcore.c | 10 include/kunit/assert.h | 2 include/linux/delayacct.h | 107 ++-- include/linux/elfcore-compat.h | 5 include/linux/elfcore.h | 5 include/linux/hash.h | 5 include/linux/kernel.h | 9 include/linux/kthread.h | 1 include/linux/list.h | 36 - include/linux/percpu.h | 21 include/linux/proc_fs.h | 12 include/linux/sched.h | 9 include/linux/unaligned/packed_struct.h | 2 include/trace/events/error_report.h | 8 include/uapi/linux/taskstats.h | 6 include/uapi/linux/uuid.h | 10 kernel/configs/debug.config | 105 ++++ kernel/delayacct.c | 49 +- kernel/kthread.c | 32 + kernel/panic.c | 21 kernel/sys.c | 16 lib/Kconfig.debug | 45 + lib/Kconfig.ubsan | 13 lib/Makefile | 5 lib/asn1_encoder.c | 2 lib/kstrtox.c | 12 lib/list_debug.c | 8 lib/lz4/lz4defs.h | 2 lib/test_hash.c | 375 +++++++--------- lib/test_meminit.c | 1 lib/test_ubsan.c | 22 mm/Kconfig | 12 mm/memory.c | 4 mm/page_alloc.c | 3 mm/page_io.c | 3 mm/percpu.c | 168 +++++-- samples/bpf/offwaketime_kern.c | 4 samples/bpf/test_overhead_kprobe_kern.c | 11 samples/bpf/test_overhead_tp_kern.c | 5 scripts/Makefile.ubsan | 1 scripts/checkpatch.pl | 54 +- scripts/const_structs.checkpatch | 23 scripts/get_maintainer.pl | 2 tools/accounting/getdelays.c | 8 tools/bpf/bpftool/skeleton/pid_iter.bpf.c | 4 tools/include/linux/hash.h | 5 
tools/testing/selftests/bpf/progs/test_stacktrace_map.c | 6 tools/testing/selftests/bpf/progs/test_tracepoint.c | 6 78 files changed, 943 insertions(+), 992 deletions(-)
This is the post-linux-next queue. Material which was based on or dependent upon material which was in -next. 69 patches, based on 9b57f458985742bd1c585f4c7f36d04634ce1143. Subsystems affected by this patch series: mm/migration sysctl mm/zsmalloc proc lib Subsystem: mm/migration Alistair Popple <apopple@nvidia.com>: mm/migrate.c: rework migration_entry_wait() to not take a pageref Subsystem: sysctl Xiaoming Ni <nixiaoming@huawei.com>: Patch series "sysctl: first set of kernel/sysctl cleanups", v2: sysctl: add a new register_sysctl_init() interface sysctl: move some boundary constants from sysctl.c to sysctl_vals hung_task: move hung_task sysctl interface to hung_task.c watchdog: move watchdog sysctl interface to watchdog.c Stephen Kitt <steve@sk2.org>: sysctl: make ngroups_max const Xiaoming Ni <nixiaoming@huawei.com>: sysctl: use const for typically used max/min proc sysctls sysctl: use SYSCTL_ZERO to replace some static int zero uses aio: move aio sysctl to aio.c dnotify: move dnotify sysctl to dnotify.c Luis Chamberlain <mcgrof@kernel.org>: Patch series "sysctl: second set of kernel/sysctl cleanups", v2: hpet: simplify subdirectory registration with register_sysctl() i915: simplify subdirectory registration with register_sysctl() macintosh/mac_hid.c: simplify subdirectory registration with register_sysctl() ocfs2: simplify subdirectory registration with register_sysctl() test_sysctl: simplify subdirectory registration with register_sysctl() Xiaoming Ni <nixiaoming@huawei.com>: inotify: simplify subdirectory registration with register_sysctl() Luis Chamberlain <mcgrof@kernel.org>: cdrom: simplify subdirectory registration with register_sysctl() Xiaoming Ni <nixiaoming@huawei.com>: eventpoll: simplify sysctl declaration with register_sysctl() Patch series "sysctl: 3rd set of kernel/sysctl cleanups", v2: firmware_loader: move firmware sysctl to its own files random: move the random sysctl declarations to its own file Luis Chamberlain <mcgrof@kernel.org>: sysctl: 
add helper to register a sysctl mount point fs: move binfmt_misc sysctl to its own file Xiaoming Ni <nixiaoming@huawei.com>: printk: move printk sysctl to printk/sysctl.c scsi/sg: move sg-big-buff sysctl to scsi/sg.c stackleak: move stack_erasing sysctl to stackleak.c Luis Chamberlain <mcgrof@kernel.org>: sysctl: share unsigned long const values Patch series "sysctl: 4th set of kernel/sysctl cleanups": fs: move inode sysctls to its own file fs: move fs stat sysctls to file_table.c fs: move dcache sysctls to its own file sysctl: move maxolduid as a sysctl specific const fs: move shared sysctls to fs/sysctls.c fs: move locking sysctls where they are used fs: move namei sysctls to its own file fs: move fs/exec.c sysctls into its own file fs: move pipe sysctls to is own file Patch series "sysctl: add and use base directory declarer and registration helper": sysctl: add and use base directory declarer and registration helper fs: move namespace sysctls and declare fs base directory kernel/sysctl.c: rename sysctl_init() to sysctl_init_bases() Xiaoming Ni <nixiaoming@huawei.com>: printk: fix build warning when CONFIG_PRINTK=n fs/coredump: move coredump sysctls into its own file kprobe: move sysctl_kprobes_optimization to kprobes.c Colin Ian King <colin.i.king@gmail.com>: kernel/sysctl.c: remove unused variable ten_thousand Baokun Li <libaokun1@huawei.com>: sysctl: returns -EINVAL when a negative value is passed to proc_doulongvec_minmax Subsystem: mm/zsmalloc Minchan Kim <minchan@kernel.org>: Patch series "zsmalloc: remove bit_spin_lock", v2: zsmalloc: introduce some helper functions zsmalloc: rename zs_stat_type to class_stat_type zsmalloc: decouple class actions from zspage works zsmalloc: introduce obj_allocated zsmalloc: move huge compressed obj from page to zspage zsmalloc: remove zspage isolation for migration locking/rwlocks: introduce write_lock_nested zsmalloc: replace per zpage lock with pool->migrate_lock Mike Galbraith <umgwanakikbuti@gmail.com>: zsmalloc: 
replace get_cpu_var with local_lock Subsystem: proc Muchun Song <songmuchun@bytedance.com>: fs: proc: store PDE()->data into inode->i_private proc: remove PDE_DATA() completely Subsystem: lib Vlastimil Babka <vbabka@suse.cz>: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() lib/stackdepot: fix spelling mistake and grammar in pr_err message lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup3 lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup4 Marco Elver <elver@google.com>: lib/stackdepot: always do filter_irq_stacks() in stack_depot_save() Christoph Hellwig <hch@lst.de>: Patch series "remove Xen tmem leftovers": mm: remove cleancache frontswap: remove frontswap_writethrough frontswap: remove frontswap_tmem_exclusive_gets frontswap: remove frontswap_shrink frontswap: remove frontswap_curr_pages frontswap: simplify frontswap_init frontswap: remove the frontswap exports mm: simplify try_to_unuse frontswap: remove frontswap_test frontswap: simplify frontswap_register_ops mm: mark swap_lock and swap_active_head static frontswap: remove support for multiple ops mm: hide the FRONTSWAP Kconfig symbol Documentation/vm/cleancache.rst | 296 ------ Documentation/vm/frontswap.rst | 31 Documentation/vm/index.rst | 1 MAINTAINERS | 7 arch/alpha/kernel/srm_env.c | 4 arch/arm/configs/bcm2835_defconfig | 1 arch/arm/configs/qcom_defconfig | 1 arch/arm/kernel/atags_proc.c | 2 arch/arm/mm/alignment.c | 2 arch/ia64/kernel/salinfo.c | 10 arch/m68k/configs/amiga_defconfig | 1 arch/m68k/configs/apollo_defconfig | 1 arch/m68k/configs/atari_defconfig | 1 arch/m68k/configs/bvme6000_defconfig | 1 arch/m68k/configs/hp300_defconfig | 1 arch/m68k/configs/mac_defconfig | 1 arch/m68k/configs/multi_defconfig | 1 arch/m68k/configs/mvme147_defconfig | 1 arch/m68k/configs/mvme16x_defconfig | 1 arch/m68k/configs/q40_defconfig 
| 1 arch/m68k/configs/sun3_defconfig | 1 arch/m68k/configs/sun3x_defconfig | 1 arch/powerpc/kernel/proc_powerpc.c | 4 arch/s390/configs/debug_defconfig | 1 arch/s390/configs/defconfig | 1 arch/sh/mm/alignment.c | 4 arch/xtensa/platforms/iss/simdisk.c | 4 block/bdev.c | 5 drivers/acpi/proc.c | 2 drivers/base/firmware_loader/fallback.c | 7 drivers/base/firmware_loader/fallback.h | 11 drivers/base/firmware_loader/fallback_table.c | 25 drivers/cdrom/cdrom.c | 23 drivers/char/hpet.c | 22 drivers/char/random.c | 14 drivers/gpu/drm/drm_dp_mst_topology.c | 1 drivers/gpu/drm/drm_mm.c | 4 drivers/gpu/drm/drm_modeset_lock.c | 9 drivers/gpu/drm/i915/i915_perf.c | 22 drivers/gpu/drm/i915/intel_runtime_pm.c | 3 drivers/hwmon/dell-smm-hwmon.c | 4 drivers/macintosh/mac_hid.c | 24 drivers/net/bonding/bond_procfs.c | 8 drivers/net/wireless/cisco/airo.c | 22 drivers/net/wireless/intersil/hostap/hostap_ap.c | 16 drivers/net/wireless/intersil/hostap/hostap_download.c | 2 drivers/net/wireless/intersil/hostap/hostap_proc.c | 24 drivers/net/wireless/ray_cs.c | 2 drivers/nubus/proc.c | 36 drivers/parisc/led.c | 4 drivers/pci/proc.c | 10 drivers/platform/x86/thinkpad_acpi.c | 4 drivers/platform/x86/toshiba_acpi.c | 16 drivers/pnp/isapnp/proc.c | 2 drivers/pnp/pnpbios/proc.c | 4 drivers/scsi/scsi_proc.c | 4 drivers/scsi/sg.c | 35 drivers/usb/gadget/function/rndis.c | 4 drivers/zorro/proc.c | 2 fs/Makefile | 4 fs/afs/proc.c | 6 fs/aio.c | 31 fs/binfmt_misc.c | 6 fs/btrfs/extent_io.c | 10 fs/btrfs/super.c | 2 fs/coredump.c | 66 + fs/dcache.c | 37 fs/eventpoll.c | 10 fs/exec.c | 145 +-- fs/ext4/mballoc.c | 14 fs/ext4/readpage.c | 6 fs/ext4/super.c | 3 fs/f2fs/data.c | 13 fs/file_table.c | 47 - fs/inode.c | 39 fs/jbd2/journal.c | 2 fs/locks.c | 34 fs/mpage.c | 7 fs/namei.c | 58 + fs/namespace.c | 24 fs/notify/dnotify/dnotify.c | 21 fs/notify/fanotify/fanotify_user.c | 10 fs/notify/inotify/inotify_user.c | 11 fs/ntfs3/ntfs_fs.h | 1 fs/ocfs2/stackglue.c | 25 fs/ocfs2/super.c | 2 fs/pipe.c | 64 + 
fs/proc/generic.c | 6 fs/proc/inode.c | 1 fs/proc/internal.h | 5 fs/proc/proc_net.c | 8 fs/proc/proc_sysctl.c | 67 + fs/super.c | 3 fs/sysctls.c | 47 - include/linux/aio.h | 4 include/linux/cleancache.h | 124 -- include/linux/coredump.h | 10 include/linux/dcache.h | 10 include/linux/dnotify.h | 1 include/linux/fanotify.h | 2 include/linux/frontswap.h | 35 include/linux/fs.h | 18 include/linux/inotify.h | 3 include/linux/kprobes.h | 6 include/linux/migrate.h | 2 include/linux/mount.h | 3 include/linux/pipe_fs_i.h | 4 include/linux/poll.h | 2 include/linux/printk.h | 4 include/linux/proc_fs.h | 17 include/linux/ref_tracker.h | 2 include/linux/rwlock.h | 6 include/linux/rwlock_api_smp.h | 8 include/linux/rwlock_rt.h | 10 include/linux/sched/sysctl.h | 14 include/linux/seq_file.h | 2 include/linux/shmem_fs.h | 3 include/linux/spinlock_api_up.h | 1 include/linux/stackdepot.h | 25 include/linux/stackleak.h | 5 include/linux/swapfile.h | 3 include/linux/sysctl.h | 67 + include/scsi/sg.h | 4 init/main.c | 9 ipc/util.c | 2 kernel/hung_task.c | 81 + kernel/irq/proc.c | 8 kernel/kprobes.c | 30 kernel/locking/spinlock.c | 10 kernel/locking/spinlock_rt.c | 12 kernel/printk/Makefile | 5 kernel/printk/internal.h | 8 kernel/printk/printk.c | 4 kernel/printk/sysctl.c | 85 + kernel/resource.c | 4 kernel/stackleak.c | 26 kernel/sysctl.c | 790 +---------------- kernel/watchdog.c | 101 ++ lib/Kconfig | 4 lib/Kconfig.kasan | 2 lib/stackdepot.c | 46 lib/test_sysctl.c | 22 mm/Kconfig | 40 mm/Makefile | 1 mm/cleancache.c | 315 ------ mm/filemap.c | 102 +- mm/frontswap.c | 259 ----- mm/kasan/common.c | 1 mm/migrate.c | 38 mm/page_owner.c | 2 mm/shmem.c | 33 mm/swapfile.c | 90 - mm/truncate.c | 15 mm/zsmalloc.c | 557 ++++------- mm/zswap.c | 8 net/atm/proc.c | 4 net/bluetooth/af_bluetooth.c | 8 net/can/bcm.c | 2 net/can/proc.c | 2 net/core/neighbour.c | 6 net/core/pktgen.c | 6 net/ipv4/netfilter/ipt_CLUSTERIP.c | 6 net/ipv4/raw.c | 8 net/ipv4/tcp_ipv4.c | 2 net/ipv4/udp.c | 6 
net/netfilter/x_tables.c | 10 net/netfilter/xt_hashlimit.c | 18 net/netfilter/xt_recent.c | 4 net/sunrpc/auth_gss/svcauth_gss.c | 4 net/sunrpc/cache.c | 24 net/sunrpc/stats.c | 2 sound/core/info.c | 4 172 files changed, 1877 insertions(+), 2931 deletions(-)
12 patches, based on 169387e2aa291a4e3cb856053730fe99d6cec06f.

Subsystems affected by this patch series:

  sysctl
  binfmt
  ia64
  mm/memory-failure
  mm/folios
  selftests
  mm/kasan
  mm/psi
  ocfs2

Subsystem: sysctl

    Andrew Morton <akpm@linux-foundation.org>:
      include/linux/sysctl.h: fix register_sysctl_mount_point() return type

Subsystem: binfmt

    Tong Zhang <ztong0001@gmail.com>:
      binfmt_misc: fix crash when load/unload module

Subsystem: ia64

    Randy Dunlap <rdunlap@infradead.org>:
      ia64: make IA64_MCA_RECOVERY bool instead of tristate

Subsystem: mm/memory-failure

    Joao Martins <joao.m.martins@oracle.com>:
      memory-failure: fetch compound_head after pgmap_pfn_valid()

Subsystem: mm/folios

    Wei Yang <richard.weiyang@gmail.com>:
      mm: page->mapping folio->mapping should have the same offset

Subsystem: selftests

    Maor Gottlieb <maorg@nvidia.com>:
      tools/testing/scatterlist: add missing defines

Subsystem: mm/kasan

    Marco Elver <elver@google.com>:
      kasan: test: fix compatibility with FORTIFY_SOURCE

    Peter Collingbourne <pcc@google.com>:
      mm, kasan: use compare-exchange operation to set KASAN page tag

Subsystem: mm/psi

    Suren Baghdasaryan <surenb@google.com>:
      psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n
      psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n

Subsystem: ocfs2

    Joseph Qi <joseph.qi@linux.alibaba.com>:
    Patch series "ocfs2: fix a deadlock case":
      jbd2: export jbd2_journal_[grab|put]_journal_head
      ocfs2: fix a deadlock when commit trans

 arch/ia64/Kconfig                    |    2
 fs/binfmt_misc.c                     |    8 +--
 fs/jbd2/journal.c                    |    2
 fs/ocfs2/suballoc.c                  |   25 ++++-------
 include/linux/mm.h                   |   17 +++++--
 include/linux/mm_types.h             |    1
 include/linux/psi.h                  |   11 ++--
 include/linux/sysctl.h               |    2
 kernel/sched/psi.c                   |   79 ++++++++++++++++++-----------------
 lib/test_kasan.c                     |    5 ++
 mm/memory-failure.c                  |    6 ++
 tools/testing/scatterlist/linux/mm.h |    3 -
 12 files changed, 91 insertions(+), 70 deletions(-)
On Fri, Jan 28, 2022 at 06:13:41PM -0800, Andrew Morton wrote:
> 12 patches, based on 169387e2aa291a4e3cb856053730fe99d6cec06f.
  ^^
I see 7?
On Sat, 29 Jan 2022 04:25:33 +0000 Matthew Wilcox <willy@infradead.org> wrote:
> On Fri, Jan 28, 2022 at 06:13:41PM -0800, Andrew Morton wrote:
> > 12 patches, based on 169387e2aa291a4e3cb856053730fe99d6cec06f.
>   ^^
>
> I see 7?
Crap, sorry, ignore all this, shall redo tomorrow.
(It wasn't a good day over here. The thing with disk drives is that
the bigger they are, the harder they fall).
12 patches, based on f8c7e4ede46fe63ff10000669652648aab09d112.

Subsystems affected by this patch series:

  sysctl
  binfmt
  ia64
  mm/memory-failure
  mm/folios
  selftests
  mm/kasan
  mm/psi
  ocfs2

Subsystem: sysctl

    Andrew Morton <akpm@linux-foundation.org>:
      include/linux/sysctl.h: fix register_sysctl_mount_point() return type

Subsystem: binfmt

    Tong Zhang <ztong0001@gmail.com>:
      binfmt_misc: fix crash when load/unload module

Subsystem: ia64

    Randy Dunlap <rdunlap@infradead.org>:
      ia64: make IA64_MCA_RECOVERY bool instead of tristate

Subsystem: mm/memory-failure

    Joao Martins <joao.m.martins@oracle.com>:
      memory-failure: fetch compound_head after pgmap_pfn_valid()

Subsystem: mm/folios

    Wei Yang <richard.weiyang@gmail.com>:
      mm: page->mapping folio->mapping should have the same offset

Subsystem: selftests

    Maor Gottlieb <maorg@nvidia.com>:
      tools/testing/scatterlist: add missing defines

Subsystem: mm/kasan

    Marco Elver <elver@google.com>:
      kasan: test: fix compatibility with FORTIFY_SOURCE

    Peter Collingbourne <pcc@google.com>:
      mm, kasan: use compare-exchange operation to set KASAN page tag

Subsystem: mm/psi

    Suren Baghdasaryan <surenb@google.com>:
      psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n
      psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n

Subsystem: ocfs2

    Joseph Qi <joseph.qi@linux.alibaba.com>:
    Patch series "ocfs2: fix a deadlock case":
      jbd2: export jbd2_journal_[grab|put]_journal_head
      ocfs2: fix a deadlock when commit trans

 arch/ia64/Kconfig                    |    2
 fs/binfmt_misc.c                     |    8 +--
 fs/jbd2/journal.c                    |    2
 fs/ocfs2/suballoc.c                  |   25 ++++-------
 include/linux/mm.h                   |   17 +++++--
 include/linux/mm_types.h             |    1
 include/linux/psi.h                  |   11 ++--
 include/linux/sysctl.h               |    2
 kernel/sched/psi.c                   |   79 ++++++++++++++++++-----------------
 lib/test_kasan.c                     |    5 ++
 mm/memory-failure.c                  |    6 ++
 tools/testing/scatterlist/linux/mm.h |    3 -
 12 files changed, 91 insertions(+), 70 deletions(-)
10 patches, based on 1f2cfdd349b7647f438c1e552dc1b983da86d830.

Subsystems affected by this patch series:

  mm/vmscan mm/debug mm/pagemap ipc mm/kmemleak MAINTAINERS mm/selftests

Subsystem: mm/vmscan

    Chen Wandun <chenwandun@huawei.com>:
      Revert "mm/page_isolation: unset migratetype directly for non Buddy page"

Subsystem: mm/debug

    Pasha Tatashin <pasha.tatashin@soleen.com>:
    Patch series "page table check fixes and cleanups", v5:
      mm/debug_vm_pgtable: remove pte entry from the page table
      mm/page_table_check: use unsigned long for page counters and cleanup
      mm/khugepaged: unify collapse pmd clear, flush and free
      mm/page_table_check: check entries at pmd levels

Subsystem: mm/pagemap

    Mike Rapoport <rppt@linux.ibm.com>:
      mm/pgtable: define pte_index so that preprocessor could recognize it

Subsystem: ipc

    Minghao Chi <chi.minghao@zte.com.cn>:
      ipc/sem: do not sleep with a spin lock held

Subsystem: mm/kmemleak

    Lang Yu <lang.yu@amd.com>:
      mm/kmemleak: avoid scanning potential huge holes

Subsystem: MAINTAINERS

    Mike Rapoport <rppt@linux.ibm.com>:
      MAINTAINERS: update rppt's email

Subsystem: mm/selftests

    Shuah Khan <skhan@linuxfoundation.org>:
      kselftest/vm: revert "tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner"

 MAINTAINERS                              |    2 -
 include/linux/page_table_check.h         |   19 ++++++++++
 include/linux/pgtable.h                  |    1 
 ipc/sem.c                                |    4 +-
 mm/debug_vm_pgtable.c                    |    2 +
 mm/khugepaged.c                          |   37 +++++++++++---------
 mm/kmemleak.c                            |   13 +++----
 mm/page_isolation.c                      |    2 -
 mm/page_table_check.c                    |   55 +++++++++++++++----------------
 tools/testing/selftests/vm/userfaultfd.c |   11 ++++--
 10 files changed, 89 insertions(+), 57 deletions(-)
5 patches, based on f1baf68e1383f6ed93eb9cff2866d46562607a43.

Subsystems affected by this patch series:

  binfmt procfs mm/vmscan mm/memcg mm/kfence

Subsystem: binfmt

    Mike Rapoport <rppt@linux.ibm.com>:
      fs/binfmt_elf: fix PT_LOAD p_align values for loaders

Subsystem: procfs

    Yang Shi <shy828301@gmail.com>:
      fs/proc: task_mmu.c: don't read mapcount for migration entry

Subsystem: mm/vmscan

    Mel Gorman <mgorman@suse.de>:
      mm: vmscan: remove deadlock due to throttling failing to make progress

Subsystem: mm/memcg

    Roman Gushchin <guro@fb.com>:
      mm: memcg: synchronize objcg lists with a dedicated spinlock

Subsystem: mm/kfence

    Peng Liu <liupeng256@huawei.com>:
      kfence: make test case compatible with run time set sample interval

 fs/binfmt_elf.c            |    2 +-
 fs/proc/task_mmu.c         |   40 +++++++++++++++++++++++++++++++---------
 include/linux/kfence.h     |    2 ++
 include/linux/memcontrol.h |    5 +++--
 mm/kfence/core.c           |    3 ++-
 mm/kfence/kfence_test.c    |    8 ++++----
 mm/memcontrol.c            |   10 +++++-----
 mm/vmscan.c                |    4 +++-
 8 files changed, 51 insertions(+), 23 deletions(-)
On Fri, Feb 11, 2022 at 4:27 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 5 patches, based on f1baf68e1383f6ed93eb9cff2866d46562607a43.
So this *completely* flummoxed 'b4', because you first sent the wrong
series, and then sent the right one in the same thread.
I fetched the emails manually, but honestly, this was confusing even
then, with two "[PATCH x/5]" series where the only way to tell the
right one was basically by date of email. They did arrive in the same
order in my mailbox, but even that wouldn't have been guaranteed if
there had been some mailer delays somewhere..
So next time when you mess up, resend it all as a completely new
series and completely new threading - so with a new header email too.
Please?
And since I'm here, let me just verify that yes, the series you
actually want me to apply is this one (as described by the head
email):
Subject: [patch 1/5] fs/binfmt_elf: fix PT_LOAD p_align values ..
Subject: [patch 2/5] fs/proc: task_mmu.c: don't read mapcount f..
Subject: [patch 3/5] mm: vmscan: remove deadlock due to throttl..
Subject: [patch 4/5] mm: memcg: synchronize objcg lists with a ..
Subject: [patch 5/5] kfence: make test case compatible with run..
and not the other one with GUP patches?
Linus
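
The ambiguity described in the message above (two "[patch x/5]" series in one thread, distinguishable only by date) can be illustrated with a short sketch. This is not how 'b4' works internally; the function names, the (subject, date) tuple shape, the timestamps, and the first subject line are all hypothetical placeholders chosen for the example.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the "[patch 3/5]" or "[PATCH 3/5]" counter in a subject line.
SLOT_RE = re.compile(r"\[patch\s+(\d+)/(\d+)\]", re.IGNORECASE)


def find_ambiguous_slots(messages):
    """Return the (index, total) slots that occur more than once in a thread.

    `messages` is a list of (subject, date) tuples. That shape is an
    assumption made for this sketch, not a real mail-tool data model.
    """
    slots = defaultdict(list)
    for subject, date in messages:
        match = SLOT_RE.search(subject)
        if match:
            key = (int(match.group(1)), int(match.group(2)))
            slots[key].append((date, subject))
    # Keep only duplicated slots, each sorted oldest to newest.
    return {key: sorted(items) for key, items in slots.items() if len(items) > 1}


def pick_latest(ambiguous):
    """Fallback heuristic: for each duplicated slot, keep the newest subject."""
    return {key: items[-1][1] for key, items in ambiguous.items()}


# Hypothetical thread: the first entry stands in for the wrongly sent series.
# Timestamps are made up for illustration.
thread = [
    ("[patch 1/5] placeholder subject from the wrong series", datetime(2022, 2, 11, 16, 0)),
    ("[patch 1/5] fs/binfmt_elf: fix PT_LOAD p_align values ..", datetime(2022, 2, 11, 16, 30)),
    ("[patch 2/5] fs/proc: task_mmu.c: don't read mapcount f..", datetime(2022, 2, 11, 16, 30)),
]

ambiguous = find_ambiguous_slots(thread)
chosen = pick_latest(ambiguous)
```

Picking by date is exactly the fragile heuristic the message complains about: mail delays can reorder arrival, which is why resending as a fresh thread with a new header email avoids the problem altogether.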
On Fri, 11 Feb 2022 18:02:53 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, Feb 11, 2022 at 4:27 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > 5 patches, based on f1baf68e1383f6ed93eb9cff2866d46562607a43.
>
> So this *completely* flummoxed 'b4', because you first sent the wrong
> series, and then sent the right one in the same thread.
>
> I fetched the emails manually, but honestly, this was confusing even
> then, with two "[PATCH x/5]" series where the only way to tell the
> right one was basically by date of email. They did arrive in the same
> order in my mailbox, but even that wouldn't have been guaranteed if
> there had been some mailer delays somewhere..

Yes, I wondered.  Sorry bout that.

> So next time when you mess up, resend it all as a completely new
> series and completely new threading - so with a new header email too.
> Please?

Wilco.

> And since I'm here, let me just verify that yes, the series you
> actually want me to apply is this one (as described by the head
> email):
>
> Subject: [patch 1/5] fs/binfmt_elf: fix PT_LOAD p_align values ..
> Subject: [patch 2/5] fs/proc: task_mmu.c: don't read mapcount f..
> Subject: [patch 3/5] mm: vmscan: remove deadlock due to throttl..
> Subject: [patch 4/5] mm: memcg: synchronize objcg lists with a ..
> Subject: [patch 5/5] kfence: make test case compatible with run..
>
> and not the other one with GUP patches?

Those are the ones.  Five fixes, three with cc:stable.
12 patches, based on c47658311d60be064b839f329c0e4d34f5f0735b.

Subsystems affected by this patch series:

  MAINTAINERS mm/hugetlb mm/kasan mm/hugetlbfs mm/pagemap mm/selftests
  mm/memcg mm/slab mailmap memfd

Subsystem: MAINTAINERS

    Luis Chamberlain <mcgrof@kernel.org>:
      MAINTAINERS: add sysctl-next git tree

Subsystem: mm/hugetlb

    "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
      mm/hugetlb: fix kernel crash with hugetlb mremap

Subsystem: mm/kasan

    Andrey Konovalov <andreyknvl@google.com>:
      kasan: test: prevent cache merging in kmem_cache_double_destroy

Subsystem: mm/hugetlbfs

    Liu Yuntao <liuyuntao10@huawei.com>:
      hugetlbfs: fix a truncation issue in hugepages parameter

Subsystem: mm/pagemap

    Suren Baghdasaryan <surenb@google.com>:
      mm: fix use-after-free bug when mm->mmap is reused after being freed

Subsystem: mm/selftests

    "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
      selftest/vm: fix map_fixed_noreplace test failure

Subsystem: mm/memcg

    Roman Gushchin <roman.gushchin@linux.dev>:
      MAINTAINERS: add Roman as a memcg co-maintainer

    Vladimir Davydov <vdavydov.dev@gmail.com>:
      MAINTAINERS: remove Vladimir from memcg maintainers

    Shakeel Butt <shakeelb@google.com>:
      MAINTAINERS: add Shakeel as a memcg co-maintainer

Subsystem: mm/slab

    Vlastimil Babka <vbabka@suse.cz>:
      MAINTAINERS, SLAB: add Roman as reviewer, git tree

Subsystem: mailmap

    Roman Gushchin <roman.gushchin@linux.dev>:
      mailmap: update Roman Gushchin's email

Subsystem: memfd

    Mike Kravetz <mike.kravetz@oracle.com>:
      selftests/memfd: clean up mapping in mfd_fail_write

 .mailmap                                         |    3 +
 MAINTAINERS                                      |    6 ++
 lib/test_kasan.c                                 |    5 +-
 mm/hugetlb.c                                     |   11 ++---
 mm/mmap.c                                        |    1 
 tools/testing/selftests/memfd/memfd_test.c       |    1 
 tools/testing/selftests/vm/map_fixed_noreplace.c |   49 +++++++++++++++++------
 7 files changed, 56 insertions(+), 20 deletions(-)
8 patches, based on 07ebd38a0da24d2534da57b4841346379db9f354.

Subsystems affected by this patch series:

  mm/hugetlb mm/pagemap memfd selftests mm/userfaultfd kconfig

Subsystem: mm/hugetlb

    Mike Kravetz <mike.kravetz@oracle.com>:
      selftests/vm: cleanup hugetlb file after mremap test

Subsystem: mm/pagemap

    Suren Baghdasaryan <surenb@google.com>:
      mm: refactor vm_area_struct::anon_vma_name usage code
      mm: prevent vm_area_struct::anon_name refcount saturation
      mm: fix use-after-free when anon vma name is used after vma is freed

Subsystem: memfd

    Hugh Dickins <hughd@google.com>:
      memfd: fix F_SEAL_WRITE after shmem huge page allocated

Subsystem: selftests

    Chengming Zhou <zhouchengming@bytedance.com>:
      kselftest/vm: fix tests build with old libc

Subsystem: mm/userfaultfd

    Yun Zhou <yun.zhou@windriver.com>:
      proc: fix documentation and description of pagemap

Subsystem: kconfig

    Qian Cai <quic_qiancai@quicinc.com>:
      configs/debug: set CONFIG_DEBUG_INFO=y properly

 Documentation/admin-guide/mm/pagemap.rst     |    2 
 fs/proc/task_mmu.c                           |    9 +-
 fs/userfaultfd.c                             |    6 -
 include/linux/mm.h                           |    7 +
 include/linux/mm_inline.h                    |  105 ++++++++++++++++++---------
 include/linux/mm_types.h                     |    5 +
 kernel/configs/debug.config                  |    2 
 kernel/fork.c                                |    4 -
 kernel/sys.c                                 |   19 +++-
 mm/madvise.c                                 |   98 +++++++++----------------
 mm/memfd.c                                   |   40 +++++++---
 mm/mempolicy.c                               |    2 
 mm/mlock.c                                   |    2 
 mm/mmap.c                                    |   12 +--
 mm/mprotect.c                                |    2 
 tools/testing/selftests/vm/hugepage-mremap.c |   26 ++++--
 tools/testing/selftests/vm/run_vmtests.sh    |    3 
 tools/testing/selftests/vm/userfaultfd.c     |    1 
 18 files changed, 201 insertions(+), 144 deletions(-)
4 patches, based on 56e337f2cf1326323844927a04e9dbce9a244835.

Subsystems affected by this patch series:

  mm/swap kconfig ocfs2 selftests

Subsystem: mm/swap

    Guo Ziliang <guo.ziliang@zte.com.cn>:
      mm: swap: get rid of deadloop in swapin readahead

Subsystem: kconfig

    Qian Cai <quic_qiancai@quicinc.com>:
      configs/debug: restore DEBUG_INFO=y for overriding

Subsystem: ocfs2

    Joseph Qi <joseph.qi@linux.alibaba.com>:
      ocfs2: fix crash when initialize filecheck kobj fails

Subsystem: selftests

    Yosry Ahmed <yosryahmed@google.com>:
      selftests: vm: fix clang build error multiple output files

 fs/ocfs2/super.c                    |   22 +++++++++++-----------
 kernel/configs/debug.config         |    1 +
 mm/swap_state.c                     |    2 +-
 tools/testing/selftests/vm/Makefile |    6 ++----
 4 files changed, 15 insertions(+), 16 deletions(-)
- A few misc subsystems

- There is a lot of MM material in Willy's tree.  Folio work and
  non-folio patches which depended on that work.

  Here I send almost all the MM patches which precede the patches in
  Willy's tree.  The remaining ~100 MM patches are staged on Willy's
  tree and I'll send those along once Willy is merged up.

I tried this batch against your current tree (as of
51912904076680281) and a couple need some extra persuasion to apply,
but all looks OK otherwise.

227 patches, based on f443e374ae131c168a065ea1748feac6b2e76613

Subsystems affected by this patch series:

  kthread scripts ntfs ocfs2 block vfs mm/kasan mm/pagecache mm/gup
  mm/swap mm/shmem mm/memcg mm/selftests mm/pagemap mm/mremap
  mm/sparsemem mm/vmalloc mm/pagealloc mm/memory-failure mm/mlock
  mm/hugetlb mm/userfaultfd mm/vmscan mm/compaction mm/mempolicy
  mm/oom-kill mm/migration mm/thp mm/cma mm/autonuma mm/psi mm/ksm
  mm/page-poison mm/madvise mm/memory-hotplug mm/rmap mm/zswap
  mm/uaccess mm/ioremap mm/highmem mm/cleanups mm/kfence mm/hmm
  mm/damon

Subsystem: kthread

    Rasmus Villemoes <linux@rasmusvillemoes.dk>:
      linux/kthread.h: remove unused macros

Subsystem: scripts

    Colin Ian King <colin.i.king@gmail.com>:
      scripts/spelling.txt: add more spellings to spelling.txt

Subsystem: ntfs

    Dongliang Mu <mudongliangabcd@gmail.com>:
      ntfs: add sanity check on allocation size

Subsystem: ocfs2

    Joseph Qi <joseph.qi@linux.alibaba.com>:
      ocfs2: cleanup some return variables

    hongnanli <hongnan.li@linux.alibaba.com>:
      fs/ocfs2: fix comments mentioning i_mutex

Subsystem: block

    NeilBrown <neilb@suse.de>:
    Patch series "Remove remaining parts of congestion tracking code", v2:
      doc: convert 'subsection' to 'section' in gfp.h
      mm: document and polish read-ahead code
      mm: improve cleanup when ->readpages doesn't process all pages
      fuse: remove reliance on bdi congestion
      nfs: remove reliance on bdi congestion
      ceph: remove reliance on bdi congestion
      remove inode_congested()
      remove bdi_congested() and wb_congested() and related functions
f2fs: replace congestion_wait() calls with io_schedule_timeout() block/bfq-iosched.c: use "false" rather than "BLK_RW_ASYNC" remove congestion tracking framework Subsystem: vfs Anthony Iliopoulos <ailiop@suse.com>: mount: warn only once about timestamp range expiration Subsystem: mm/kasan Miaohe Lin <linmiaohe@huawei.com>: mm/memremap: avoid calling kasan_remove_zero_shadow() for device private memory Subsystem: mm/pagecache Miaohe Lin <linmiaohe@huawei.com>: filemap: remove find_get_pages() mm/writeback: minor clean up for highmem_dirtyable_memory Minchan Kim <minchan@kernel.org>: mm: fs: fix lru_cache_disabled race in bh_lru Subsystem: mm/gup Peter Xu <peterx@redhat.com>: Patch series "mm/gup: some cleanups", v5: mm: fix invalid page pointer returned with FOLL_PIN gups John Hubbard <jhubbard@nvidia.com>: mm/gup: follow_pfn_pte(): -EEXIST cleanup mm/gup: remove unused pin_user_pages_locked() mm: change lookup_node() to use get_user_pages_fast() mm/gup: remove unused get_user_pages_locked() Subsystem: mm/swap Bang Li <libang.linuxer@gmail.com>: mm/swap: fix confusing comment in folio_mark_accessed Subsystem: mm/shmem Xavier Roche <xavier.roche@algolia.com>: tmpfs: support for file creation time Hugh Dickins <hughd@google.com>: shmem: mapping_set_exiting() to help mapped resilience tmpfs: do not allocate pages on read Miaohe Lin <linmiaohe@huawei.com>: mm: shmem: use helper macro __ATTR_RW Subsystem: mm/memcg Shakeel Butt <shakeelb@google.com>: memcg: replace in_interrupt() with !in_task() Yosry Ahmed <yosryahmed@google.com>: memcg: add per-memcg total kernel memory stat Wei Yang <richard.weiyang@gmail.com>: mm/memcg: mem_cgroup_per_node is already set to 0 on allocation mm/memcg: retrieve parent memcg from css.parent Shakeel Butt <shakeelb@google.com>: Patch series "memcg: robust enforcement of memory.high", v2: memcg: refactor mem_cgroup_oom memcg: unify force charging conditions selftests: memcg: test high limit for single entry allocation memcg: synchronously 
enforce memory.high for large overcharges Randy Dunlap <rdunlap@infradead.org>: mm/memcontrol: return 1 from cgroup.memory __setup() handler Michal Hocko <mhocko@suse.com>: Patch series "mm/memcg: Address PREEMPT_RT problems instead of disabling it", v5: mm/memcg: revert ("mm/memcg: optimize user context object stock access") Sebastian Andrzej Siewior <bigeasy@linutronix.de>: mm/memcg: disable threshold event handlers on PREEMPT_RT mm/memcg: protect per-CPU counter by disabling preemption on PREEMPT_RT where needed. Johannes Weiner <hannes@cmpxchg.org>: mm/memcg: opencode the inner part of obj_cgroup_uncharge_pages() in drain_obj_stock() Sebastian Andrzej Siewior <bigeasy@linutronix.de>: mm/memcg: protect memcg_stock with a local_lock_t mm/memcg: disable migration instead of preemption in drain_all_stock(). Muchun Song <songmuchun@bytedance.com>: Patch series "Optimize list lru memory consumption", v6: mm: list_lru: transpose the array of per-node per-memcg lru lists mm: introduce kmem_cache_alloc_lru fs: introduce alloc_inode_sb() to allocate filesystems specific inode fs: allocate inode by using alloc_inode_sb() f2fs: allocate inode by using alloc_inode_sb() mm: dcache: use kmem_cache_alloc_lru() to allocate dentry xarray: use kmem_cache_alloc_lru to allocate xa_node mm: memcontrol: move memcg_online_kmem() to mem_cgroup_css_online() mm: list_lru: allocate list_lru_one only when needed mm: list_lru: rename memcg_drain_all_list_lrus to memcg_reparent_list_lrus mm: list_lru: replace linear array with xarray mm: memcontrol: reuse memory cgroup ID for kmem ID mm: memcontrol: fix cannot alloc the maximum memcg ID mm: list_lru: rename list_lru_per_memcg to list_lru_memcg mm: memcontrol: rename memcg_cache_id to memcg_kmem_id Vasily Averin <vvs@virtuozzo.com>: memcg: enable accounting for tty-related objects Subsystem: mm/selftests Guillaume Tucker <guillaume.tucker@collabora.com>: selftests, x86: fix how check_cc.sh is being invoked Subsystem: mm/pagemap Anshuman 
Khandual <anshuman.khandual@arm.com>: mm: merge pte_mkhuge() call into arch_make_huge_pte() Stafford Horne <shorne@gmail.com>: mm: remove mmu_gathers storage from remaining architectures Muchun Song <songmuchun@bytedance.com>: Patch series "Fix some cache flush bugs", v5: mm: thp: fix wrong cache flush in remove_migration_pmd() mm: fix missing cache flush for all tail pages of compound page mm: hugetlb: fix missing cache flush in copy_huge_page_from_user() mm: hugetlb: fix missing cache flush in hugetlb_mcopy_atomic_pte() mm: shmem: fix missing cache flush in shmem_mfill_atomic_pte() mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic() mm: replace multiple dcache flush with flush_dcache_folio() Peter Xu <peterx@redhat.com>: Patch series "mm: Rework zap ptes on swap entries", v5: mm: don't skip swap entry even if zap_details specified mm: rename zap_skip_check_mapping() to should_zap_page() mm: change zap_details.zap_mapping into even_cows mm: rework swap handling of zap_pte_range Randy Dunlap <rdunlap@infradead.org>: mm/mmap: return 1 from stack_guard_gap __setup() handler Miaohe Lin <linmiaohe@huawei.com>: mm/memory.c: use helper function range_in_vma() mm/memory.c: use helper macro min and max in unmap_mapping_range_tree() Hugh Dickins <hughd@google.com>: mm: _install_special_mapping() apply VM_LOCKED_CLEAR_MASK Miaohe Lin <linmiaohe@huawei.com>: mm/mmap: remove obsolete comment in ksys_mmap_pgoff Subsystem: mm/mremap Miaohe Lin <linmiaohe@huawei.com>: mm/mremap:: use vma_lookup() instead of find_vma() Subsystem: mm/sparsemem Miaohe Lin <linmiaohe@huawei.com>: mm/sparse: make mminit_validate_memmodel_limits() static Subsystem: mm/vmalloc Miaohe Lin <linmiaohe@huawei.com>: mm/vmalloc: remove unneeded function forward declaration "Uladzislau Rezki (Sony)" <urezki@gmail.com>: mm/vmalloc: Move draining areas out of caller context Uladzislau Rezki <uladzislau.rezki@sony.com>: mm/vmalloc: add adjust_search_size parameter "Uladzislau Rezki 
(Sony)" <urezki@gmail.com>: mm/vmalloc: eliminate an extra orig_gfp_mask Jiapeng Chong <jiapeng.chong@linux.alibaba.com>: mm/vmalloc.c: fix "unused function" warning Bang Li <libang.linuxer@gmail.com>: mm/vmalloc: fix comments about vmap_area struct Subsystem: mm/pagealloc Zi Yan <ziy@nvidia.com>: mm: page_alloc: avoid merging non-fallbackable pageblocks with others Peter Collingbourne <pcc@google.com>: mm/mmzone.c: use try_cmpxchg() in page_cpupid_xchg_last() Miaohe Lin <linmiaohe@huawei.com>: mm/mmzone.h: remove unused macros Nicolas Saenz Julienne <nsaenzju@redhat.com>: mm/page_alloc: don't pass pfn to free_unref_page_commit() David Hildenbrand <david@redhat.com>: Patch series "mm: enforce pageblock_order < MAX_ORDER": cma: factor out minimum alignment requirement mm: enforce pageblock_order < MAX_ORDER Nathan Chancellor <nathan@kernel.org>: mm/page_alloc: mark pagesets as __maybe_unused Alistair Popple <apopple@nvidia.com>: mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node Mel Gorman <mgorman@techsingularity.net>: Patch series "Follow-up on high-order PCP caching", v2: mm/page_alloc: fetch the correct pcp buddy during bulk free mm/page_alloc: track range of active PCP lists during bulk free mm/page_alloc: simplify how many pages are selected per pcp list during bulk free mm/page_alloc: drain the requested list first during bulk free mm/page_alloc: free pages in a single pass during bulk free mm/page_alloc: limit number of high-order pages on PCP during bulk free mm/page_alloc: do not prefetch buddies during bulk free Oscar Salvador <osalvador@suse.de>: arch/x86/mm/numa: Do not initialize nodes twice Suren Baghdasaryan <surenb@google.com>: mm: count time in drain_all_pages during direct reclaim as memory pressure Eric Dumazet <edumazet@google.com>: mm/page_alloc: call check_new_pages() while zone spinlock is not held Mel Gorman <mgorman@techsingularity.net>: mm/page_alloc: check high-order pages for corruption during PCP operations Subsystem: 
mm/memory-failure Naoya Horiguchi <naoya.horiguchi@nec.com>: mm/memory-failure.c: remove obsolete comment mm/hwpoison: fix error page recovered but reported "not recovered" Rik van Riel <riel@surriel.com>: mm: invalidate hwpoison page cache page in fault path Miaohe Lin <linmiaohe@huawei.com>: Patch series "A few cleanup and fixup patches for memory failure", v3: mm/memory-failure.c: minor clean up for memory_failure_dev_pagemap mm/memory-failure.c: catch unexpected -EFAULT from vma_address() mm/memory-failure.c: rework the signaling logic in kill_proc mm/memory-failure.c: fix race with changing page more robustly mm/memory-failure.c: remove PageSlab check in hwpoison_filter_dev mm/memory-failure.c: rework the try_to_unmap logic in hwpoison_user_mappings() mm/memory-failure.c: remove obsolete comment in __soft_offline_page mm/memory-failure.c: remove unnecessary PageTransTail check mm/hwpoison-inject: support injecting hwpoison to free page luofei <luofei@unicloud.com>: mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handler mm/hwpoison: add in-use hugepage hwpoison filter judgement Miaohe Lin <linmiaohe@huawei.com>: Patch series "A few fixup patches for memory failure", v2: mm/memory-failure.c: fix race with changing page compound again mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages mm/memory-failure.c: make non-LRU movable pages unhandlable Vlastimil Babka <vbabka@suse.cz>: mm, fault-injection: declare should_fail_alloc_page() Subsystem: mm/mlock Miaohe Lin <linmiaohe@huawei.com>: mm/mlock: fix potential imbalanced rlimit ucounts adjustment Subsystem: mm/hugetlb Muchun Song <songmuchun@bytedance.com>: Patch series "Free the 2nd vmemmap page associated with each HugeTLB page", v7: mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key mm: sparsemem: use page table lock to protect kernel pmd operations selftests: vm: add a 
hugetlb test case mm: sparsemem: move vmemmap related to HugeTLB to CONFIG_HUGETLB_PAGE_FREE_VMEMMAP Anshuman Khandual <anshuman.khandual@arm.com>: mm/hugetlb: generalize ARCH_WANT_GENERAL_HUGETLB Mike Kravetz <mike.kravetz@oracle.com>: hugetlb: clean up potential spectre issue warnings Miaohe Lin <linmiaohe@huawei.com>: mm/hugetlb: use helper macro __ATTR_RW David Howells <dhowells@redhat.com>: mm/hugetlb.c: export PageHeadHuge() Miaohe Lin <linmiaohe@huawei.com>: mm: remove unneeded local variable follflags Subsystem: mm/userfaultfd Nadav Amit <namit@vmware.com>: userfaultfd: provide unmasked address on page-fault Guo Zhengkui <guozhengkui@vivo.com>: userfaultfd/selftests: fix uninitialized_var.cocci warning Subsystem: mm/vmscan Hugh Dickins <hughd@google.com>: mm/fs: delete PF_SWAPWRITE mm: __isolate_lru_page_prepare() in isolate_migratepages_block() Waiman Long <longman@redhat.com>: mm/list_lru: optimize memcg_reparent_list_lru_node() Marcelo Tosatti <mtosatti@redhat.com>: mm: lru_cache_disable: replace work queue synchronization with synchronize_rcu Sebastian Andrzej Siewior <bigeasy@linutronix.de>: mm: workingset: replace IRQ-off check with a lockdep assert. 
Charan Teja Kalla <quic_charante@quicinc.com>: mm: vmscan: fix documentation for page_check_references() Subsystem: mm/compaction Baolin Wang <baolin.wang@linux.alibaba.com>: mm: compaction: cleanup the compaction trace events Subsystem: mm/mempolicy Hugh Dickins <hughd@google.com>: mempolicy: mbind_range() set_policy() after vma_merge() Subsystem: mm/oom-kill Miaohe Lin <linmiaohe@huawei.com>: mm/oom_kill: remove unneeded is_memcg_oom check Subsystem: mm/migration Huang Ying <ying.huang@intel.com>: mm,migrate: fix establishing demotion target "andrew.yang" <andrew.yang@mediatek.com>: mm/migrate: fix race between lock page and clear PG_Isolated Subsystem: mm/thp Hugh Dickins <hughd@google.com>: mm/thp: refix __split_huge_pmd_locked() for migration PMD Subsystem: mm/cma Hari Bathini <hbathini@linux.ibm.com>: Patch series "powerpc/fadump: handle CMA activation failure appropriately", v3: mm/cma: provide option to opt out from exposing pages on activation failure powerpc/fadump: opt out from freeing pages on cma activation failure Subsystem: mm/autonuma Huang Ying <ying.huang@intel.com>: Patch series "NUMA balancing: optimize memory placement for memory tiering system", v13: NUMA Balancing: add page promotion counter NUMA balancing: optimize page placement for memory tiering system memory tiering: skip to scan fast memory Subsystem: mm/psi Johannes Weiner <hannes@cmpxchg.org>: mm: page_io: fix psi memory pressure error on cold swapins Subsystem: mm/ksm Yang Yang <yang.yang29@zte.com.cn>: mm/vmstat: add event for ksm swapping in copy Miaohe Lin <linmiaohe@huawei.com>: mm/ksm: use helper macro __ATTR_RW Subsystem: mm/page-poison "Matthew Wilcox (Oracle)" <willy@infradead.org>: mm/hwpoison: check the subpage, not the head page Subsystem: mm/madvise Miaohe Lin <linmiaohe@huawei.com>: mm/madvise: use vma_lookup() instead of find_vma() Charan Teja Kalla <quic_charante@quicinc.com>: Patch series "mm: madvise: return correct bytes processed with: mm: madvise: return correct 
bytes advised with process_madvise mm: madvise: skip unmapped vma holes passed to process_madvise Subsystem: mm/memory-hotplug Michal Hocko <mhocko@suse.com>: Patch series "mm, memory_hotplug: handle unitialized numa node gracefully": mm, memory_hotplug: make arch_alloc_nodedata independent on CONFIG_MEMORY_HOTPLUG mm: handle uninitialized numa nodes gracefully mm, memory_hotplug: drop arch_free_nodedata mm, memory_hotplug: reorganize new pgdat initialization mm: make free_area_init_node aware of memory less nodes Wei Yang <richard.weiyang@gmail.com>: memcg: do not tweak node in alloc_mem_cgroup_per_node_info David Hildenbrand <david@redhat.com>: drivers/base/memory: add memory block to memory group after registration succeeded drivers/base/node: consolidate node device subsystem initialization in node_dev_init() Miaohe Lin <linmiaohe@huawei.com>: Patch series "A few cleanup patches around memory_hotplug": mm/memory_hotplug: remove obsolete comment of __add_pages mm/memory_hotplug: avoid calling zone_intersects() for ZONE_NORMAL mm/memory_hotplug: clean up try_offline_node mm/memory_hotplug: fix misplaced comment in offline_pages David Hildenbrand <david@redhat.com>: Patch series "drivers/base/memory: determine and store zone for single-zone memory blocks", v2: drivers/base/node: rename link_mem_sections() to register_memory_block_under_node() drivers/base/memory: determine and store zone for single-zone memory blocks drivers/base/memory: clarify adding and removing of memory blocks Oscar Salvador <osalvador@suse.de>: mm: only re-generate demotion targets when a numa node changes its N_CPU state Subsystem: mm/rmap Hugh Dickins <hughd@google.com>: mm/thp: ClearPageDoubleMap in first page_add_file_rmap() Subsystem: mm/zswap "Maciej S. 
Szmigiero" <maciej.szmigiero@oracle.com>: mm/zswap.c: allow handling just same-value filled pages Subsystem: mm/uaccess Christophe Leroy <christophe.leroy@csgroup.eu>: mm: remove usercopy_warn() mm: uninline copy_overflow() Randy Dunlap <rdunlap@infradead.org>: mm/usercopy: return 1 from hardened_usercopy __setup() handler Subsystem: mm/ioremap Vlastimil Babka <vbabka@suse.cz>: mm/early_ioremap: declare early_memremap_pgprot_adjust() Subsystem: mm/highmem Ira Weiny <ira.weiny@intel.com>: highmem: document kunmap_local() Miaohe Lin <linmiaohe@huawei.com>: mm/highmem: remove unnecessary done label Subsystem: mm/cleanups "Dr. David Alan Gilbert" <linux@treblig.org>: mm/page_table_check.c: use strtobool for param parsing Subsystem: mm/kfence tangmeng <tangmeng@uniontech.com>: mm/kfence: remove unnecessary CONFIG_KFENCE option Tianchen Ding <dtcccc@linux.alibaba.com>: Patch series "provide the flexibility to enable KFENCE", v3: kfence: allow re-enabling KFENCE after system startup kfence: alloc kfence_pool after system startup Peng Liu <liupeng256@huawei.com>: Patch series "kunit: fix a UAF bug and do some optimization", v2: kunit: fix UAF when run kfence test case test_gfpzero kunit: make kunit_test_timeout compatible with comment kfence: test: try to avoid test_gfpzero trigger rcu_stall Marco Elver <elver@google.com>: kfence: allow use of a deferrable timer Subsystem: mm/hmm Miaohe Lin <linmiaohe@huawei.com>: mm/hmm.c: remove unneeded local variable ret Subsystem: mm/damon SeongJae Park <sj@kernel.org>: Patch series "Remove the type-unclear target id concept": mm/damon/dbgfs/init_regions: use target index instead of target id Docs/admin-guide/mm/damon/usage: update for changed initail_regions file input mm/damon/core: move damon_set_targets() into dbgfs mm/damon: remove the target id concept Baolin Wang <baolin.wang@linux.alibaba.com>: mm/damon: remove redundant page validation SeongJae Park <sj@kernel.org>: Patch series "Allow DAMON user code independent of 
monitoring primitives": mm/damon: rename damon_primitives to damon_operations mm/damon: let monitoring operations can be registered and selected mm/damon/paddr,vaddr: register themselves to DAMON in subsys_initcall mm/damon/reclaim: use damon_select_ops() instead of damon_{v,p}a_set_operations() mm/damon/dbgfs: use damon_select_ops() instead of damon_{v,p}a_set_operations() mm/damon/dbgfs: use operations id for knowing if the target has pid mm/damon/dbgfs-test: fix is_target_id() change mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() tangmeng <tangmeng@uniontech.com>: mm/damon: remove unnecessary CONFIG_DAMON option SeongJae Park <sj@kernel.org>: Patch series "Docs/damon: Update documents for better consistency": Docs/vm/damon: call low level monitoring primitives the operations Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Docs/damon: update outdated term 'regions update interval' Patch series "Introduce DAMON sysfs interface", v3: mm/damon/core: allow non-exclusive DAMON start/stop mm/damon/core: add number of each enum type values mm/damon: implement a minimal stub for sysfs-based DAMON interface mm/damon/sysfs: link DAMON for virtual address spaces monitoring mm/damon/sysfs: support the physical address space monitoring mm/damon/sysfs: support DAMON-based Operation Schemes mm/damon/sysfs: support DAMOS quotas mm/damon/sysfs: support schemes prioritization mm/damon/sysfs: support DAMOS watermarks mm/damon/sysfs: support DAMOS stats selftests/damon: add a test for DAMON sysfs interface Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface Docs/ABI/testing: add DAMON sysfs interface ABI document Xin Hao <xhao@linux.alibaba.com>: mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Documentation/ABI/testing/sysfs-kernel-mm-damon | 274 ++ Documentation/admin-guide/cgroup-v1/memory.rst | 2 Documentation/admin-guide/cgroup-v2.rst | 5 
Documentation/admin-guide/kernel-parameters.txt | 2 Documentation/admin-guide/mm/damon/usage.rst | 380 +++ Documentation/admin-guide/mm/zswap.rst | 22 Documentation/admin-guide/sysctl/kernel.rst | 31 Documentation/core-api/mm-api.rst | 19 Documentation/dev-tools/kfence.rst | 12 Documentation/filesystems/porting.rst | 6 Documentation/filesystems/vfs.rst | 16 Documentation/vm/damon/design.rst | 43 Documentation/vm/damon/faq.rst | 2 MAINTAINERS | 1 arch/arm/Kconfig | 4 arch/arm64/kernel/setup.c | 3 arch/arm64/mm/hugetlbpage.c | 1 arch/hexagon/mm/init.c | 2 arch/ia64/kernel/topology.c | 10 arch/ia64/mm/discontig.c | 11 arch/mips/kernel/topology.c | 5 arch/nds32/mm/init.c | 1 arch/openrisc/mm/init.c | 2 arch/powerpc/include/asm/fadump-internal.h | 5 arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h | 4 arch/powerpc/kernel/fadump.c | 8 arch/powerpc/kernel/sysfs.c | 17 arch/riscv/Kconfig | 4 arch/riscv/kernel/setup.c | 3 arch/s390/kernel/numa.c | 7 arch/sh/kernel/topology.c | 5 arch/sparc/kernel/sysfs.c | 12 arch/sparc/mm/hugetlbpage.c | 1 arch/x86/Kconfig | 4 arch/x86/kernel/cpu/mce/core.c | 8 arch/x86/kernel/topology.c | 5 arch/x86/mm/numa.c | 33 block/bdev.c | 2 block/bfq-iosched.c | 2 drivers/base/init.c | 1 drivers/base/memory.c | 149 + drivers/base/node.c | 48 drivers/block/drbd/drbd_int.h | 3 drivers/block/drbd/drbd_req.c | 3 drivers/dax/super.c | 2 drivers/of/of_reserved_mem.c | 9 drivers/tty/tty_io.c | 2 drivers/virtio/virtio_mem.c | 9 fs/9p/vfs_inode.c | 2 fs/adfs/super.c | 2 fs/affs/super.c | 2 fs/afs/super.c | 2 fs/befs/linuxvfs.c | 2 fs/bfs/inode.c | 2 fs/btrfs/inode.c | 2 fs/buffer.c | 8 fs/ceph/addr.c | 22 fs/ceph/inode.c | 2 fs/ceph/super.c | 1 fs/ceph/super.h | 1 fs/cifs/cifsfs.c | 2 fs/coda/inode.c | 2 fs/dcache.c | 3 fs/ecryptfs/super.c | 2 fs/efs/super.c | 2 fs/erofs/super.c | 2 fs/exfat/super.c | 2 fs/ext2/ialloc.c | 5 fs/ext2/super.c | 2 fs/ext4/super.c | 2 fs/f2fs/compress.c | 4 fs/f2fs/data.c | 3 fs/f2fs/f2fs.h | 6 fs/f2fs/segment.c | 8 
fs/f2fs/super.c | 14 fs/fat/inode.c | 2 fs/freevxfs/vxfs_super.c | 2 fs/fs-writeback.c | 40 fs/fuse/control.c | 17 fs/fuse/dev.c | 8 fs/fuse/file.c | 17 fs/fuse/inode.c | 2 fs/gfs2/super.c | 2 fs/hfs/super.c | 2 fs/hfsplus/super.c | 2 fs/hostfs/hostfs_kern.c | 2 fs/hpfs/super.c | 2 fs/hugetlbfs/inode.c | 2 fs/inode.c | 2 fs/isofs/inode.c | 2 fs/jffs2/super.c | 2 fs/jfs/super.c | 2 fs/minix/inode.c | 2 fs/namespace.c | 2 fs/nfs/inode.c | 2 fs/nfs/write.c | 14 fs/nilfs2/segbuf.c | 16 fs/nilfs2/super.c | 2 fs/ntfs/inode.c | 6 fs/ntfs3/super.c | 2 fs/ocfs2/alloc.c | 2 fs/ocfs2/aops.c | 2 fs/ocfs2/cluster/nodemanager.c | 2 fs/ocfs2/dir.c | 4 fs/ocfs2/dlmfs/dlmfs.c | 2 fs/ocfs2/file.c | 13 fs/ocfs2/inode.c | 2 fs/ocfs2/localalloc.c | 6 fs/ocfs2/namei.c | 2 fs/ocfs2/ocfs2.h | 4 fs/ocfs2/quota_global.c | 2 fs/ocfs2/stack_user.c | 18 fs/ocfs2/super.c | 2 fs/ocfs2/xattr.c | 2 fs/openpromfs/inode.c | 2 fs/orangefs/super.c | 2 fs/overlayfs/super.c | 2 fs/proc/inode.c | 2 fs/qnx4/inode.c | 2 fs/qnx6/inode.c | 2 fs/reiserfs/super.c | 2 fs/romfs/super.c | 2 fs/squashfs/super.c | 2 fs/sysv/inode.c | 2 fs/ubifs/super.c | 2 fs/udf/super.c | 2 fs/ufs/super.c | 2 fs/userfaultfd.c | 5 fs/vboxsf/super.c | 2 fs/xfs/libxfs/xfs_btree.c | 2 fs/xfs/xfs_buf.c | 3 fs/xfs/xfs_icache.c | 2 fs/zonefs/super.c | 2 include/linux/backing-dev-defs.h | 8 include/linux/backing-dev.h | 50 include/linux/cma.h | 14 include/linux/damon.h | 95 include/linux/fault-inject.h | 2 include/linux/fs.h | 21 include/linux/gfp.h | 10 include/linux/highmem-internal.h | 10 include/linux/hugetlb.h | 8 include/linux/kthread.h | 22 include/linux/list_lru.h | 45 include/linux/memcontrol.h | 46 include/linux/memory.h | 12 include/linux/memory_hotplug.h | 132 - include/linux/migrate.h | 8 include/linux/mm.h | 11 include/linux/mmzone.h | 22 include/linux/nfs_fs_sb.h | 1 include/linux/node.h | 25 include/linux/page-flags.h | 96 include/linux/pageblock-flags.h | 7 include/linux/pagemap.h | 7 include/linux/sched.h | 1 
include/linux/sched/sysctl.h | 10 include/linux/shmem_fs.h | 1 include/linux/slab.h | 3 include/linux/swap.h | 6 include/linux/thread_info.h | 5 include/linux/uaccess.h | 2 include/linux/vm_event_item.h | 3 include/linux/vmalloc.h | 4 include/linux/xarray.h | 9 include/ras/ras_event.h | 1 include/trace/events/compaction.h | 26 include/trace/events/writeback.h | 28 include/uapi/linux/userfaultfd.h | 8 ipc/mqueue.c | 2 kernel/dma/contiguous.c | 4 kernel/sched/core.c | 21 kernel/sysctl.c | 2 lib/Kconfig.kfence | 12 lib/kunit/try-catch.c | 3 lib/xarray.c | 10 mm/Kconfig | 6 mm/backing-dev.c | 57 mm/cma.c | 31 mm/cma.h | 1 mm/compaction.c | 60 mm/damon/Kconfig | 19 mm/damon/Makefile | 7 mm/damon/core-test.h | 23 mm/damon/core.c | 190 + mm/damon/dbgfs-test.h | 103 mm/damon/dbgfs.c | 264 +- mm/damon/ops-common.c | 133 + mm/damon/ops-common.h | 16 mm/damon/paddr.c | 62 mm/damon/prmtv-common.c | 133 - mm/damon/prmtv-common.h | 16 mm/damon/reclaim.c | 11 mm/damon/sysfs.c | 2632 ++++++++++++++++++++++- mm/damon/vaddr-test.h | 8 mm/damon/vaddr.c | 67 mm/early_ioremap.c | 1 mm/fadvise.c | 5 mm/filemap.c | 17 mm/gup.c | 103 mm/highmem.c | 9 mm/hmm.c | 3 mm/huge_memory.c | 41 mm/hugetlb.c | 23 mm/hugetlb_vmemmap.c | 74 mm/hwpoison-inject.c | 7 mm/internal.h | 19 mm/kfence/Makefile | 2 mm/kfence/core.c | 147 + mm/kfence/kfence_test.c | 3 mm/ksm.c | 6 mm/list_lru.c | 690 ++---- mm/maccess.c | 6 mm/madvise.c | 18 mm/memcontrol.c | 549 ++-- mm/memory-failure.c | 148 - mm/memory.c | 116 - mm/memory_hotplug.c | 136 - mm/mempolicy.c | 29 mm/memremap.c | 3 mm/migrate.c | 128 - mm/mlock.c | 1 mm/mmap.c | 5 mm/mmzone.c | 7 mm/mprotect.c | 13 mm/mremap.c | 4 mm/oom_kill.c | 3 mm/page-writeback.c | 12 mm/page_alloc.c | 429 +-- mm/page_io.c | 7 mm/page_table_check.c | 10 mm/ptdump.c | 16 mm/readahead.c | 124 + mm/rmap.c | 15 mm/shmem.c | 46 mm/slab.c | 39 mm/slab.h | 25 mm/slob.c | 6 mm/slub.c | 42 mm/sparse-vmemmap.c | 70 mm/sparse.c | 2 mm/swap.c | 25 mm/swapfile.c | 1 mm/usercopy.c | 16 
mm/userfaultfd.c | 3 mm/vmalloc.c | 102 mm/vmscan.c | 138 - mm/vmstat.c | 19 mm/workingset.c | 7 mm/zswap.c | 15 net/socket.c | 2 net/sunrpc/rpc_pipe.c | 2 scripts/spelling.txt | 16 tools/testing/selftests/cgroup/cgroup_util.c | 15 tools/testing/selftests/cgroup/cgroup_util.h | 1 tools/testing/selftests/cgroup/test_memcontrol.c | 78 tools/testing/selftests/damon/Makefile | 1 tools/testing/selftests/damon/sysfs.sh | 306 ++ tools/testing/selftests/vm/.gitignore | 1 tools/testing/selftests/vm/Makefile | 7 tools/testing/selftests/vm/hugepage-vmemmap.c | 144 + tools/testing/selftests/vm/run_vmtests.sh | 11 tools/testing/selftests/vm/userfaultfd.c | 2 tools/testing/selftests/x86/Makefile | 6 264 files changed, 7205 insertions(+), 3090 deletions(-)
Various misc subsystems, before getting into the post-linux-next material. This is all based on v5.17. I tested applying and compiling against today's 1bc191051dca28fa6. One patch required an extra whack, all looks good. 41 patches, based on f443e374ae131c168a065ea1748feac6b2e76613. Subsystems affected by this patch series: procfs misc core-kernel lib checkpatch init pipe minix fat cgroups kexec kdump taskstats panic kcov resource ubsan Subsystem: procfs Hao Lee <haolee.swjtu@gmail.com>: proc: alloc PATH_MAX bytes for /proc/${pid}/fd/ symlinks David Hildenbrand <david@redhat.com>: proc/vmcore: fix possible deadlock on concurrent mmap and read Yang Li <yang.lee@linux.alibaba.com>: proc/vmcore: fix vmcore_alloc_buf() kernel-doc comment Subsystem: misc Bjorn Helgaas <bhelgaas@google.com>: linux/types.h: remove unnecessary __bitwise__ Documentation/sparse: add hints about __CHECKER__ Subsystem: core-kernel Miaohe Lin <linmiaohe@huawei.com>: kernel/ksysfs.c: use helper macro __ATTR_RW Subsystem: lib Kees Cook <keescook@chromium.org>: Kconfig.debug: make DEBUG_INFO selectable from a choice Rasmus Villemoes <linux@rasmusvillemoes.dk>: include: drop pointless __compiler_offsetof indirection Christophe Leroy <christophe.leroy@csgroup.eu>: ilog2: force inlining of __ilog2_u32() and __ilog2_u64() Andy Shevchenko <andriy.shevchenko@linux.intel.com>: bitfield: add explicit inclusions to the example Feng Tang <feng.tang@intel.com>: lib/Kconfig.debug: add ARCH dependency for FUNCTION_ALIGN option Randy Dunlap <rdunlap@infradead.org>: lib: bitmap: fix many kernel-doc warnings Subsystem: checkpatch Joe Perches <joe@perches.com>: checkpatch: prefer MODULE_LICENSE("GPL") over MODULE_LICENSE("GPL v2") checkpatch: add --fix option for some TRAILING_STATEMENTS checkpatch: add early_param exception to blank line after struct/function test Sagar Patel <sagarmp@cs.unc.edu>: checkpatch: use python3 to find codespell dictionary Subsystem: init Mark-PK Tsai <mark-pk.tsai@mediatek.com>: init: 
use ktime_us_delta() to make initcall_debug log more precise Randy Dunlap <rdunlap@infradead.org>: init.h: improve __setup and early_param documentation init/main.c: return 1 from handled __setup() functions Subsystem: pipe Andrei Vagin <avagin@gmail.com>: fs/pipe: use kvcalloc to allocate a pipe_buffer array fs/pipe.c: local vars have to match types of proper pipe_inode_info fields Subsystem: minix Qinghua Jin <qhjin.dev@gmail.com>: minix: fix bug when opening a file with O_DIRECT Subsystem: fat Helge Deller <deller@gmx.de>: fat: use pointer to simple type in put_user() Subsystem: cgroups Sebastian Andrzej Siewior <bigeasy@linutronix.de>: cgroup: use irqsave in cgroup_rstat_flush_locked(). cgroup: add a comment to cgroup_rstat_flush_locked(). Subsystem: kexec Jisheng Zhang <jszhang@kernel.org>: Patch series "kexec: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef", v2: kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef Subsystem: kdump Tiezhu Yang <yangtiezhu@loongson.cn>: Patch series "Update doc and fix some issues about kdump", v2: docs: kdump: update description about sysfs file system support docs: kdump: add scp example to write out the dump file panic: unset panic_on_warn inside panic() ubsan: no need to unset panic_on_warn in ubsan_epilogue() kasan: no need to unset panic_on_warn in end_report() Subsystem: taskstats Lukas Bulwahn <lukas.bulwahn@gmail.com>: taskstats: remove unneeded dead assignment Subsystem: panic "Guilherme G. 
Piccoli" <gpiccoli@igalia.com>: Patch series "Some improvements on panic_print": docs: sysctl/kernel: add missing bit to panic_print panic: add option to dump all CPUs backtraces in panic_print panic: move panic_print before kmsg dumpers Subsystem: kcov Aleksandr Nogikh <nogikh@google.com>: Patch series "kcov: improve mmap processing", v3: kcov: split ioctl handling into locked and unlocked parts kcov: properly handle subsequent mmap calls Subsystem: resource Miaohe Lin <linmiaohe@huawei.com>: kernel/resource: fix kfree() of bootmem memory again Subsystem: ubsan Marco Elver <elver@google.com>: Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang" Documentation/admin-guide/kdump/kdump.rst | 10 + Documentation/admin-guide/kernel-parameters.txt | 5 Documentation/admin-guide/sysctl/kernel.rst | 2 Documentation/dev-tools/sparse.rst | 2 arch/arm64/mm/init.c | 9 - arch/riscv/mm/init.c | 6 - arch/x86/kernel/setup.c | 10 - fs/fat/dir.c | 2 fs/minix/inode.c | 3 fs/pipe.c | 13 +- fs/proc/base.c | 8 - fs/proc/vmcore.c | 43 +++---- include/linux/bitfield.h | 3 include/linux/compiler_types.h | 3 include/linux/init.h | 11 + include/linux/kexec.h | 12 +- include/linux/log2.h | 4 include/linux/stddef.h | 6 - include/uapi/linux/types.h | 6 - init/main.c | 14 +- kernel/cgroup/rstat.c | 13 +- kernel/kcov.c | 102 ++++++++--------- kernel/ksysfs.c | 3 kernel/panic.c | 37 ++++-- kernel/resource.c | 41 +----- kernel/taskstats.c | 5 lib/Kconfig.debug | 142 ++++++++++++------------ lib/Kconfig.kcsan | 11 - lib/Kconfig.ubsan | 12 -- lib/bitmap.c | 24 ++-- lib/ubsan.c | 10 - mm/kasan/report.c | 10 - scripts/checkpatch.pl | 31 ++++- tools/include/linux/types.h | 5 34 files changed, 313 insertions(+), 305 deletions(-)
This is the material which was staged after willystuff in linux-next. Everything applied seamlessly on your latest, all looks well. 114 patches, based on 52deda9551a01879b3562e7b41748e85c591f14c. Subsystems affected by this patch series: mm/debug mm/selftests mm/pagecache mm/thp mm/rmap mm/migration mm/kasan mm/hugetlb mm/pagemap mm/madvise selftests Subsystem: mm/debug Sean Anderson <seanga2@gmail.com>: tools/vm/page_owner_sort.c: sort by stacktrace before culling tools/vm/page_owner_sort.c: support sorting by stack trace Yinan Zhang <zhangyinan2019@email.szu.edu.cn>: tools/vm/page_owner_sort.c: add switch between culling by stacktrace and txt Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>: tools/vm/page_owner_sort.c: support sorting pid and time Shenghong Han <hanshenghong2019@email.szu.edu.cn>: tools/vm/page_owner_sort.c: two trivial fixes Yixuan Cao <caoyixuan2019@email.szu.edu.cn>: tools/vm/page_owner_sort.c: delete invalid duplicate code Shenghong Han <hanshenghong2019@email.szu.edu.cn>: Documentation/vm/page_owner.rst: update the documentation Shuah Khan <skhan@linuxfoundation.org>: Documentation/vm/page_owner.rst: fix unexpected indentation warns Waiman Long <longman@redhat.com>: Patch series "mm/page_owner: Extend page_owner to show memcg information", v4: lib/vsprintf: avoid redundant work with 0 size mm/page_owner: use scnprintf() to avoid excessive buffer overrun check mm/page_owner: print memcg information mm/page_owner: record task command name Yixuan Cao <caoyixuan2019@email.szu.edu.cn>: mm/page_owner.c: record tgid tools/vm/page_owner_sort.c: fix the instructions for use Jiajian Ye <yejiajian2018@email.szu.edu.cn>: tools/vm/page_owner_sort.c: fix comments tools/vm/page_owner_sort.c: add a security check tools/vm/page_owner_sort.c: support sorting by tgid and update documentation tools/vm/page_owner_sort: fix three trivival places tools/vm/page_owner_sort: support for sorting by task command name tools/vm/page_owner_sort.c: support for selecting by 
PID, TGID or task command name tools/vm/page_owner_sort.c: support for user-defined culling rules Christoph Hellwig <hch@lst.de>: mm: unexport page_init_poison Subsystem: mm/selftests "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>: selftest/vm: add util.h and and move helper functions there Mike Rapoport <rppt@kernel.org>: selftest/vm: add helpers to detect PAGE_SIZE and PAGE_SHIFT Subsystem: mm/pagecache Hugh Dickins <hughd@google.com>: mm: delete __ClearPageWaiters() mm: filemap_unaccount_folio() large skip mapcount fixup Subsystem: mm/thp Hugh Dickins <hughd@google.com>: mm/thp: fix NR_FILE_MAPPED accounting in page_*_file_rmap() Subsystem: mm/rmap Subsystem: mm/migration Anshuman Khandual <anshuman.khandual@arm.com>: Patch series "mm/migration: Add trace events", v3: mm/migration: add trace events for THP migrations mm/migration: add trace events for base page and HugeTLB migrations Subsystem: mm/kasan Andrey Konovalov <andreyknvl@google.com>: Patch series "kasan, vmalloc, arm64: add vmalloc tagging support for SW/HW_TAGS", v6: kasan, page_alloc: deduplicate should_skip_kasan_poison kasan, page_alloc: move tag_clear_highpage out of kernel_init_free_pages kasan, page_alloc: merge kasan_free_pages into free_pages_prepare kasan, page_alloc: simplify kasan_poison_pages call site kasan, page_alloc: init memory of skipped pages on free kasan: drop skip_kasan_poison variable in free_pages_prepare mm: clarify __GFP_ZEROTAGS comment kasan: only apply __GFP_ZEROTAGS when memory is zeroed kasan, page_alloc: refactor init checks in post_alloc_hook kasan, page_alloc: merge kasan_alloc_pages into post_alloc_hook kasan, page_alloc: combine tag_clear_highpage calls in post_alloc_hook kasan, page_alloc: move SetPageSkipKASanPoison in post_alloc_hook kasan, page_alloc: move kernel_init_free_pages in post_alloc_hook kasan, page_alloc: rework kasan_unpoison_pages call site kasan: clean up metadata byte definitions kasan: define KASAN_VMALLOC_INVALID for SW_TAGS kasan, x86, arm64, 
s390: rename functions for modules shadow kasan, vmalloc: drop outdated VM_KASAN comment kasan: reorder vmalloc hooks kasan: add wrappers for vmalloc hooks kasan, vmalloc: reset tags in vmalloc functions kasan, fork: reset pointer tags of vmapped stacks kasan, arm64: reset pointer tags of vmapped stacks kasan, vmalloc: add vmalloc tagging for SW_TAGS kasan, vmalloc, arm64: mark vmalloc mappings as pgprot_tagged kasan, vmalloc: unpoison VM_ALLOC pages after mapping kasan, mm: only define ___GFP_SKIP_KASAN_POISON with HW_TAGS kasan, page_alloc: allow skipping unpoisoning for HW_TAGS kasan, page_alloc: allow skipping memory init for HW_TAGS kasan, vmalloc: add vmalloc tagging for HW_TAGS kasan, vmalloc: only tag normal vmalloc allocations kasan, arm64: don't tag executable vmalloc allocations kasan: mark kasan_arg_stacktrace as __initdata kasan: clean up feature flags for HW_TAGS mode kasan: add kasan.vmalloc command line flag kasan: allow enabling KASAN_VMALLOC and SW/HW_TAGS arm64: select KASAN_VMALLOC for SW/HW_TAGS modes kasan: documentation updates kasan: improve vmalloc tests kasan: test: support async (again) and asymm modes for HW_TAGS tangmeng <tangmeng@uniontech.com>: mm/kasan: remove unnecessary CONFIG_KASAN option Peter Collingbourne <pcc@google.com>: kasan: update function name in comments Andrey Konovalov <andreyknvl@google.com>: kasan: print virtual mapping info in reports Patch series "kasan: report clean-ups and improvements": kasan: drop addr check from describe_object_addr kasan: more line breaks in reports kasan: rearrange stack frame info in reports kasan: improve stack frame info in reports kasan: print basic stack frame info for SW_TAGS kasan: simplify async check in end_report() kasan: simplify kasan_update_kunit_status() and call sites kasan: check CONFIG_KASAN_KUNIT_TEST instead of CONFIG_KUNIT kasan: move update_kunit_status to start_report kasan: move disable_trace_on_warning to start_report kasan: split out print_report from __kasan_report 
kasan: simplify kasan_find_first_bad_addr call sites kasan: restructure kasan_report kasan: merge __kasan_report into kasan_report kasan: call print_report from kasan_report_invalid_free kasan: move and simplify kasan_report_async kasan: rename kasan_access_info to kasan_report_info kasan: add comment about UACCESS regions to kasan_report kasan: respect KASAN_BIT_REPORTED in all reporting routines kasan: reorder reporting functions kasan: move and hide kasan_save_enable/restore_multi_shot kasan: disable LOCKDEP when printing reports Subsystem: mm/hugetlb Mike Kravetz <mike.kravetz@oracle.com>: Patch series "Add hugetlb MADV_DONTNEED support", v3: mm: enable MADV_DONTNEED for hugetlb mappings selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test userfaultfd/selftests: enable hugetlb remap and remove event testing Miaohe Lin <linmiaohe@huawei.com>: mm/huge_memory: make is_transparent_hugepage() static Subsystem: mm/pagemap David Hildenbrand <david@redhat.com>: Patch series "mm: COW fixes part 1: fix the COW security issue for THP and swap", v3: mm: optimize do_wp_page() for exclusive pages in the swapcache mm: optimize do_wp_page() for fresh pages in local LRU pagevecs mm: slightly clarify KSM logic in do_swap_page() mm: streamline COW logic in do_swap_page() mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page() mm/khugepaged: remove reuse_swap_page() usage mm/swapfile: remove stale reuse_swap_page() mm/huge_memory: remove stale page_trans_huge_mapcount() mm/huge_memory: remove stale locking logic from __split_huge_pmd() Hugh Dickins <hughd@google.com>: mm: warn on deleting redirtied only if accounted mm: unmap_mapping_range_tree() with i_mmap_rwsem shared Anshuman Khandual <anshuman.khandual@arm.com>: mm: generalize ARCH_HAS_FILTER_PGPROT Subsystem: mm/madvise Mauricio Faria de Oliveira <mfo@canonical.com>: mm: fix race between MADV_FREE reclaim and blkdev direct IO read Johannes Weiner <hannes@cmpxchg.org>: mm: madvise: MADV_DONTNEED_LOCKED 
Subsystem: selftests Muhammad Usama Anjum <usama.anjum@collabora.com>: selftests: vm: remove dependecy from internal kernel macros Kees Cook <keescook@chromium.org>: selftests: kselftest framework: provide "finished" helper Documentation/dev-tools/kasan.rst | 17 Documentation/vm/page_owner.rst | 72 ++ arch/alpha/include/uapi/asm/mman.h | 2 arch/arm64/Kconfig | 2 arch/arm64/include/asm/vmalloc.h | 6 arch/arm64/include/asm/vmap_stack.h | 5 arch/arm64/kernel/module.c | 5 arch/arm64/mm/pageattr.c | 2 arch/arm64/net/bpf_jit_comp.c | 3 arch/mips/include/uapi/asm/mman.h | 2 arch/parisc/include/uapi/asm/mman.h | 2 arch/powerpc/mm/book3s64/trace.c | 1 arch/s390/kernel/module.c | 2 arch/x86/Kconfig | 3 arch/x86/kernel/module.c | 2 arch/x86/mm/init.c | 1 arch/xtensa/include/uapi/asm/mman.h | 2 include/linux/gfp.h | 53 +- include/linux/huge_mm.h | 6 include/linux/kasan.h | 136 +++-- include/linux/mm.h | 5 include/linux/page-flags.h | 2 include/linux/pagemap.h | 3 include/linux/swap.h | 4 include/linux/vmalloc.h | 18 include/trace/events/huge_memory.h | 1 include/trace/events/migrate.h | 31 + include/trace/events/mmflags.h | 18 include/trace/events/thp.h | 27 + include/uapi/asm-generic/mman-common.h | 2 kernel/fork.c | 13 kernel/scs.c | 16 lib/Kconfig.kasan | 18 lib/test_kasan.c | 239 ++++++++- lib/vsprintf.c | 8 mm/Kconfig | 3 mm/debug.c | 1 mm/filemap.c | 63 +- mm/huge_memory.c | 109 ---- mm/kasan/Makefile | 2 mm/kasan/common.c | 4 mm/kasan/hw_tags.c | 243 +++++++--- mm/kasan/kasan.h | 76 ++- mm/kasan/report.c | 516 +++++++++++---------- mm/kasan/report_generic.c | 34 - mm/kasan/report_hw_tags.c | 1 mm/kasan/report_sw_tags.c | 16 mm/kasan/report_tags.c | 2 mm/kasan/shadow.c | 76 +-- mm/khugepaged.c | 11 mm/madvise.c | 57 +- mm/memory.c | 129 +++-- mm/memremap.c | 2 mm/migrate.c | 4 mm/page-writeback.c | 18 mm/page_alloc.c | 270 ++++++----- mm/page_owner.c | 86 ++- mm/rmap.c | 62 +- mm/swap.c | 4 mm/swapfile.c | 104 ---- mm/vmalloc.c | 167 ++++-- 
tools/testing/selftests/kselftest.h | 10 tools/testing/selftests/vm/.gitignore | 1 tools/testing/selftests/vm/Makefile | 1 tools/testing/selftests/vm/gup_test.c | 3 tools/testing/selftests/vm/hugetlb-madvise.c | 410 ++++++++++++++++ tools/testing/selftests/vm/ksm_tests.c | 38 - tools/testing/selftests/vm/memfd_secret.c | 2 tools/testing/selftests/vm/run_vmtests.sh | 15 tools/testing/selftests/vm/transhuge-stress.c | 41 - tools/testing/selftests/vm/userfaultfd.c | 72 +- tools/testing/selftests/vm/util.h | 75 ++- tools/vm/page_owner_sort.c | 628 +++++++++++++++++++++----- 73 files changed, 2797 insertions(+), 1288 deletions(-)
16 patches, based on e8b767f5e04097aaedcd6e06e2270f9fe5282696. Subsystems affected by this patch series: mm/madvise ocfs2 nilfs2 mm/mlock mm/kfence mailmap mm/memory-failure mm/kasan mm/debug mm/kmemleak mm/damon Subsystem: mm/madvise Charan Teja Kalla <quic_charante@quicinc.com>: Revert "mm: madvise: skip unmapped vma holes passed to process_madvise" Subsystem: ocfs2 Joseph Qi <joseph.qi@linux.alibaba.com>: ocfs2: fix crash when mount with quota enabled Subsystem: nilfs2 Ryusuke Konishi <konishi.ryusuke@gmail.com>: Patch series "nilfs2 lockdep warning fixes": nilfs2: fix lockdep warnings in page operations for btree nodes nilfs2: fix lockdep warnings during disk space reclamation nilfs2: get rid of nilfs_mapping_init() Subsystem: mm/mlock Hugh Dickins <hughd@google.com>: mm/munlock: add lru_add_drain() to fix memcg_stat_test mm/munlock: update Documentation/vm/unevictable-lru.rst Sebastian Andrzej Siewior <bigeasy@linutronix.de>: mm/munlock: protect the per-CPU pagevec by a local_lock_t Subsystem: mm/kfence Muchun Song <songmuchun@bytedance.com>: mm: kfence: fix objcgs vector allocation Subsystem: mailmap Kirill Tkhai <kirill.tkhai@openvz.org>: mailmap: update Kirill's email Subsystem: mm/memory-failure Rik van Riel <riel@surriel.com>: mm,hwpoison: unmap poisoned page before invalidation Subsystem: mm/kasan Andrey Konovalov <andreyknvl@google.com>: mm, kasan: fix __GFP_BITS_SHIFT definition breaking LOCKDEP Subsystem: mm/debug Yinan Zhang <zhangyinan2019@email.szu.edu.cn>: tools/vm/page_owner_sort.c: remove -c option doc/vm/page_owner.rst: remove content related to -c option Subsystem: mm/kmemleak Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>: mm/kmemleak: reset tag when compare object pointer Subsystem: mm/damon Jonghyeon Kim <tome01@ajou.ac.kr>: mm/damon: prevent activated scheme from sleeping by deactivated schemes .mailmap | 1 Documentation/vm/page_owner.rst | 1 Documentation/vm/unevictable-lru.rst | 473 +++++++++++++++-------------------- fs/nilfs2/btnode.c | 23 + 
fs/nilfs2/btnode.h | 1 fs/nilfs2/btree.c | 27 + fs/nilfs2/dat.c | 4 fs/nilfs2/gcinode.c | 7 fs/nilfs2/inode.c | 167 +++++++++++- fs/nilfs2/mdt.c | 45 ++- fs/nilfs2/mdt.h | 6 fs/nilfs2/nilfs.h | 16 - fs/nilfs2/page.c | 16 - fs/nilfs2/page.h | 1 fs/nilfs2/segment.c | 9 fs/nilfs2/super.c | 5 fs/ocfs2/quota_global.c | 23 - fs/ocfs2/quota_local.c | 2 include/linux/gfp.h | 4 mm/damon/core.c | 5 mm/gup.c | 10 mm/internal.h | 6 mm/kfence/core.c | 11 mm/kfence/kfence.h | 3 mm/kmemleak.c | 9 mm/madvise.c | 9 mm/memory.c | 12 mm/migrate.c | 2 mm/mlock.c | 46 ++- mm/page_alloc.c | 1 mm/rmap.c | 4 mm/swap.c | 4 tools/vm/page_owner_sort.c | 6 33 files changed, 560 insertions(+), 399 deletions(-)
9 patches, based on d00c50b35101b862c3db270ffeba53a63a1063d9. Subsystems affected by this patch series: mm/migration mm/highmem lz4 mm/sparsemem mm/mremap mm/mempolicy mailmap mm/memcg MAINTAINERS Subsystem: mm/migration Zi Yan <ziy@nvidia.com>: mm: migrate: use thp_order instead of HPAGE_PMD_ORDER for new page allocation. Subsystem: mm/highmem Max Filippov <jcmvbkbc@gmail.com>: highmem: fix checks in __kmap_local_sched_{in,out} Subsystem: lz4 Guo Xuenan <guoxuenan@huawei.com>: lz4: fix LZ4_decompress_safe_partial read out of bound Subsystem: mm/sparsemem Waiman Long <longman@redhat.com>: mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning Subsystem: mm/mremap Paolo Bonzini <pbonzini@redhat.com>: mm/mremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0) Subsystem: mm/mempolicy Miaohe Lin <linmiaohe@huawei.com>: mm/mempolicy: fix mpol_new leak in shared_policy_replace Subsystem: mailmap Vasily Averin <vasily.averin@linux.dev>: mailmap: update Vasily Averin's email address Subsystem: mm/memcg Andrew Morton <akpm@linux-foundation.org>: mm/list_lru.c: revert "mm/list_lru: optimize memcg_reparent_list_lru_node()" Subsystem: MAINTAINERS Tom Rix <trix@redhat.com>: MAINTAINERS: add Tom as clang reviewer .mailmap | 4 ++++ MAINTAINERS | 1 + include/linux/mmzone.h | 11 +++++++---- lib/lz4/lz4_decompress.c | 8 ++++++-- mm/highmem.c | 4 ++-- mm/list_lru.c | 6 ------ mm/mempolicy.c | 3 ++- mm/migrate.c | 2 +- mm/mremap.c | 3 +++ 9 files changed, 26 insertions(+), 16 deletions(-)
14 patches, based on 115acbb56978941bb7537a97dfc303da286106c1. Subsystems affected by this patch series: MAINTAINERS mm/tmpfs mm/secretmem mm/kasan mm/kfence mm/pagealloc mm/zram mm/compaction mm/hugetlb binfmt mm/vmalloc mm/kmemleak Subsystem: MAINTAINERS Joe Perches <joe@perches.com>: MAINTAINERS: Broadcom internal lists aren't maintainers Subsystem: mm/tmpfs Hugh Dickins <hughd@google.com>: tmpfs: fix regressions from wider use of ZERO_PAGE Subsystem: mm/secretmem Axel Rasmussen <axelrasmussen@google.com>: mm/secretmem: fix panic when growing a memfd_secret Subsystem: mm/kasan Zqiang <qiang1.zhang@intel.com>: irq_work: use kasan_record_aux_stack_noalloc() record callstack Vincenzo Frascino <vincenzo.frascino@arm.com>: kasan: fix hw tags enablement when KUNIT tests are disabled Subsystem: mm/kfence Marco Elver <elver@google.com>: mm, kfence: support kmem_dump_obj() for KFENCE objects Subsystem: mm/pagealloc Juergen Gross <jgross@suse.com>: mm, page_alloc: fix build_zonerefs_node() Subsystem: mm/zram Minchan Kim <minchan@kernel.org>: mm: fix unexpected zeroed page mapping with zram swap Subsystem: mm/compaction Charan Teja Kalla <quic_charante@quicinc.com>: mm: compaction: fix compiler warning when CONFIG_COMPACTION=n Subsystem: mm/hugetlb Mike Kravetz <mike.kravetz@oracle.com>: hugetlb: do not demote poisoned hugetlb pages Subsystem: binfmt Andrew Morton <akpm@linux-foundation.org>: revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders" revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE" Subsystem: mm/vmalloc Omar Sandoval <osandov@fb.com>: mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore Subsystem: mm/kmemleak Patrick Wang <patrick.wang.shcn@gmail.com>: mm: kmemleak: take a full lowmem check in kmemleak_*_phys() MAINTAINERS | 64 ++++++++++++++++++++-------------------- arch/x86/include/asm/io.h | 2 - arch/x86/kernel/crash_dump_64.c | 1 fs/binfmt_elf.c | 6 +-- include/linux/kfence.h | 24 +++++++++++++++ 
kernel/irq_work.c | 2 - mm/compaction.c | 10 +++--- mm/filemap.c | 6 --- mm/hugetlb.c | 17 ++++++---- mm/kasan/hw_tags.c | 5 +-- mm/kasan/kasan.h | 10 +++--- mm/kfence/core.c | 21 ------------- mm/kfence/kfence.h | 21 +++++++++++++ mm/kfence/report.c | 47 +++++++++++++++++++++++++++++ mm/kmemleak.c | 8 ++--- mm/page_alloc.c | 2 - mm/page_io.c | 54 --------------------------------- mm/secretmem.c | 17 ++++++++++ mm/shmem.c | 31 ++++++++++++------- mm/slab.c | 2 - mm/slab.h | 2 - mm/slab_common.c | 9 +++++ mm/slob.c | 2 - mm/slub.c | 2 - mm/vmalloc.c | 11 ------ 25 files changed, 207 insertions(+), 169 deletions(-)
13 patches, based on b253435746d9a4a701b5f09211b9c14d3370d0da. Subsystems affected by this patch series: mm/memory-failure mm/memcg mm/userfaultfd mm/hugetlbfs mm/mremap mm/oom-kill mm/kasan kcov mm/hmm Subsystem: mm/memory-failure Naoya Horiguchi <naoya.horiguchi@nec.com>: mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb() Xu Yu <xuyu@linux.alibaba.com>: mm/memory-failure.c: skip huge_zero_page in memory_failure() Subsystem: mm/memcg Shakeel Butt <shakeelb@google.com>: memcg: sync flush only if periodic flush is delayed Subsystem: mm/userfaultfd Nadav Amit <namit@vmware.com>: userfaultfd: mark uffd_wp regardless of VM_WRITE flag Subsystem: mm/hugetlbfs Christophe Leroy <christophe.leroy@csgroup.eu>: mm, hugetlb: allow for "high" userspace addresses Subsystem: mm/mremap Sidhartha Kumar <sidhartha.kumar@oracle.com>: selftest/vm: verify mmap addr in mremap_test selftest/vm: verify remap destination address in mremap_test selftest/vm: support xfail in mremap_test selftest/vm: add skip support to mremap_test Subsystem: mm/oom-kill Nico Pache <npache@redhat.com>: oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup Subsystem: mm/kasan Vincenzo Frascino <vincenzo.frascino@arm.com>: MAINTAINERS: add Vincenzo Frascino to KASAN reviewers Subsystem: kcov Aleksandr Nogikh <nogikh@google.com>: kcov: don't generate a warning on vm_insert_page()'s failure Subsystem: mm/hmm Alistair Popple <apopple@nvidia.com>: mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() MAINTAINERS | 1 fs/hugetlbfs/inode.c | 9 - include/linux/hugetlb.h | 6 + include/linux/memcontrol.h | 5 include/linux/mm.h | 8 + include/linux/sched.h | 1 include/linux/sched/mm.h | 8 + kernel/kcov.c | 7 - mm/hugetlb.c | 10 + mm/memcontrol.c | 12 ++ mm/memory-failure.c | 158 ++++++++++++++++++++++-------- mm/mmap.c | 8 - mm/mmu_notifier.c | 14 ++ mm/oom_kill.c | 54 +++++++--- mm/userfaultfd.c | 15 +- mm/workingset.c | 2 
tools/testing/selftests/vm/mremap_test.c | 85 +++++++++++++++- tools/testing/selftests/vm/run_vmtests.sh | 11 +- 18 files changed, 327 insertions(+), 87 deletions(-)
2 patches, based on d615b5416f8a1afeb82d13b238f8152c572d59c0. Subsystems affected by this patch series: mm/kasan mm/debug Subsystem: mm/kasan Zqiang <qiang1.zhang@intel.com>: kasan: prevent cpu_quarantine corruption when CPU offline and cache shrink occur at same time Subsystem: mm/debug Akira Yokosawa <akiyks@gmail.com>: docs: vm/page_owner: use literal blocks for param description Documentation/vm/page_owner.rst | 5 +++-- mm/kasan/quarantine.c | 7 +++++++ 2 files changed, 10 insertions(+), 2 deletions(-)