* incoming
From: Andrew Morton @ 2022-03-22 21:38 UTC
To: Linus Torvalds; +Cc: linux-mm, mm-commits, patches
- A few misc subsystems

- There is a lot of MM material in Willy's tree: folio work, and
  non-folio patches which depend on that work.

  Here I send almost all the MM patches which precede the patches in
  Willy's tree.  The remaining ~100 MM patches are staged on Willy's
  tree and I'll send those along once Willy is merged up.

  I tried this batch against your current tree (as of
  51912904076680281) and a couple need some extra persuasion to apply,
  but all looks OK otherwise.

227 patches, based on f443e374ae131c168a065ea1748feac6b2e76613
Subsystems affected by this patch series:
kthread
scripts
ntfs
ocfs2
block
vfs
mm/kasan
mm/pagecache
mm/gup
mm/swap
mm/shmem
mm/memcg
mm/selftests
mm/pagemap
mm/mremap
mm/sparsemem
mm/vmalloc
mm/pagealloc
mm/memory-failure
mm/mlock
mm/hugetlb
mm/userfaultfd
mm/vmscan
mm/compaction
mm/mempolicy
mm/oom-kill
mm/migration
mm/thp
mm/cma
mm/autonuma
mm/psi
mm/ksm
mm/page-poison
mm/madvise
mm/memory-hotplug
mm/rmap
mm/zswap
mm/uaccess
mm/ioremap
mm/highmem
mm/cleanups
mm/kfence
mm/hmm
mm/damon
Subsystem: kthread
Rasmus Villemoes <linux@rasmusvillemoes.dk>:
linux/kthread.h: remove unused macros
Subsystem: scripts
Colin Ian King <colin.i.king@gmail.com>:
scripts/spelling.txt: add more spellings to spelling.txt
Subsystem: ntfs
Dongliang Mu <mudongliangabcd@gmail.com>:
ntfs: add sanity check on allocation size
Subsystem: ocfs2
Joseph Qi <joseph.qi@linux.alibaba.com>:
ocfs2: cleanup some return variables
hongnanli <hongnan.li@linux.alibaba.com>:
fs/ocfs2: fix comments mentioning i_mutex
Subsystem: block
NeilBrown <neilb@suse.de>:
Patch series "Remove remaining parts of congestion tracking code", v2:
doc: convert 'subsection' to 'section' in gfp.h
mm: document and polish read-ahead code
mm: improve cleanup when ->readpages doesn't process all pages
fuse: remove reliance on bdi congestion
nfs: remove reliance on bdi congestion
ceph: remove reliance on bdi congestion
remove inode_congested()
remove bdi_congested() and wb_congested() and related functions
f2fs: replace congestion_wait() calls with io_schedule_timeout()
block/bfq-iosched.c: use "false" rather than "BLK_RW_ASYNC"
remove congestion tracking framework
Subsystem: vfs
Anthony Iliopoulos <ailiop@suse.com>:
mount: warn only once about timestamp range expiration
Subsystem: mm/kasan
Miaohe Lin <linmiaohe@huawei.com>:
mm/memremap: avoid calling kasan_remove_zero_shadow() for device private memory
Subsystem: mm/pagecache
Miaohe Lin <linmiaohe@huawei.com>:
filemap: remove find_get_pages()
mm/writeback: minor clean up for highmem_dirtyable_memory
Minchan Kim <minchan@kernel.org>:
mm: fs: fix lru_cache_disabled race in bh_lru
Subsystem: mm/gup
Peter Xu <peterx@redhat.com>:
Patch series "mm/gup: some cleanups", v5:
mm: fix invalid page pointer returned with FOLL_PIN gups
John Hubbard <jhubbard@nvidia.com>:
mm/gup: follow_pfn_pte(): -EEXIST cleanup
mm/gup: remove unused pin_user_pages_locked()
mm: change lookup_node() to use get_user_pages_fast()
mm/gup: remove unused get_user_pages_locked()
Subsystem: mm/swap
Bang Li <libang.linuxer@gmail.com>:
mm/swap: fix confusing comment in folio_mark_accessed
Subsystem: mm/shmem
Xavier Roche <xavier.roche@algolia.com>:
tmpfs: support for file creation time
Hugh Dickins <hughd@google.com>:
shmem: mapping_set_exiting() to help mapped resilience
tmpfs: do not allocate pages on read
Miaohe Lin <linmiaohe@huawei.com>:
mm: shmem: use helper macro __ATTR_RW
Subsystem: mm/memcg
Shakeel Butt <shakeelb@google.com>:
memcg: replace in_interrupt() with !in_task()
Yosry Ahmed <yosryahmed@google.com>:
memcg: add per-memcg total kernel memory stat
Wei Yang <richard.weiyang@gmail.com>:
mm/memcg: mem_cgroup_per_node is already set to 0 on allocation
mm/memcg: retrieve parent memcg from css.parent
Shakeel Butt <shakeelb@google.com>:
Patch series "memcg: robust enforcement of memory.high", v2:
memcg: refactor mem_cgroup_oom
memcg: unify force charging conditions
selftests: memcg: test high limit for single entry allocation
memcg: synchronously enforce memory.high for large overcharges
Randy Dunlap <rdunlap@infradead.org>:
mm/memcontrol: return 1 from cgroup.memory __setup() handler
Michal Hocko <mhocko@suse.com>:
Patch series "mm/memcg: Address PREEMPT_RT problems instead of disabling it", v5:
mm/memcg: revert ("mm/memcg: optimize user context object stock access")
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
mm/memcg: disable threshold event handlers on PREEMPT_RT
mm/memcg: protect per-CPU counter by disabling preemption on PREEMPT_RT where needed.
Johannes Weiner <hannes@cmpxchg.org>:
mm/memcg: opencode the inner part of obj_cgroup_uncharge_pages() in drain_obj_stock()
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
mm/memcg: protect memcg_stock with a local_lock_t
mm/memcg: disable migration instead of preemption in drain_all_stock().
Muchun Song <songmuchun@bytedance.com>:
Patch series "Optimize list lru memory consumption", v6:
mm: list_lru: transpose the array of per-node per-memcg lru lists
mm: introduce kmem_cache_alloc_lru
fs: introduce alloc_inode_sb() to allocate filesystems specific inode
fs: allocate inode by using alloc_inode_sb()
f2fs: allocate inode by using alloc_inode_sb()
mm: dcache: use kmem_cache_alloc_lru() to allocate dentry
xarray: use kmem_cache_alloc_lru to allocate xa_node
mm: memcontrol: move memcg_online_kmem() to mem_cgroup_css_online()
mm: list_lru: allocate list_lru_one only when needed
mm: list_lru: rename memcg_drain_all_list_lrus to memcg_reparent_list_lrus
mm: list_lru: replace linear array with xarray
mm: memcontrol: reuse memory cgroup ID for kmem ID
mm: memcontrol: fix cannot alloc the maximum memcg ID
mm: list_lru: rename list_lru_per_memcg to list_lru_memcg
mm: memcontrol: rename memcg_cache_id to memcg_kmem_id
Vasily Averin <vvs@virtuozzo.com>:
memcg: enable accounting for tty-related objects
Subsystem: mm/selftests
Guillaume Tucker <guillaume.tucker@collabora.com>:
selftests, x86: fix how check_cc.sh is being invoked
Subsystem: mm/pagemap
Anshuman Khandual <anshuman.khandual@arm.com>:
mm: merge pte_mkhuge() call into arch_make_huge_pte()
Stafford Horne <shorne@gmail.com>:
mm: remove mmu_gathers storage from remaining architectures
Muchun Song <songmuchun@bytedance.com>:
Patch series "Fix some cache flush bugs", v5:
mm: thp: fix wrong cache flush in remove_migration_pmd()
mm: fix missing cache flush for all tail pages of compound page
mm: hugetlb: fix missing cache flush in copy_huge_page_from_user()
mm: hugetlb: fix missing cache flush in hugetlb_mcopy_atomic_pte()
mm: shmem: fix missing cache flush in shmem_mfill_atomic_pte()
mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic()
mm: replace multiple dcache flush with flush_dcache_folio()
Peter Xu <peterx@redhat.com>:
Patch series "mm: Rework zap ptes on swap entries", v5:
mm: don't skip swap entry even if zap_details specified
mm: rename zap_skip_check_mapping() to should_zap_page()
mm: change zap_details.zap_mapping into even_cows
mm: rework swap handling of zap_pte_range
Randy Dunlap <rdunlap@infradead.org>:
mm/mmap: return 1 from stack_guard_gap __setup() handler
Miaohe Lin <linmiaohe@huawei.com>:
mm/memory.c: use helper function range_in_vma()
mm/memory.c: use helper macro min and max in unmap_mapping_range_tree()
Hugh Dickins <hughd@google.com>:
mm: _install_special_mapping() apply VM_LOCKED_CLEAR_MASK
Miaohe Lin <linmiaohe@huawei.com>:
mm/mmap: remove obsolete comment in ksys_mmap_pgoff
Subsystem: mm/mremap
Miaohe Lin <linmiaohe@huawei.com>:
mm/mremap:: use vma_lookup() instead of find_vma()
Subsystem: mm/sparsemem
Miaohe Lin <linmiaohe@huawei.com>:
mm/sparse: make mminit_validate_memmodel_limits() static
Subsystem: mm/vmalloc
Miaohe Lin <linmiaohe@huawei.com>:
mm/vmalloc: remove unneeded function forward declaration
"Uladzislau Rezki (Sony)" <urezki@gmail.com>:
mm/vmalloc: Move draining areas out of caller context
Uladzislau Rezki <uladzislau.rezki@sony.com>:
mm/vmalloc: add adjust_search_size parameter
"Uladzislau Rezki (Sony)" <urezki@gmail.com>:
mm/vmalloc: eliminate an extra orig_gfp_mask
Jiapeng Chong <jiapeng.chong@linux.alibaba.com>:
mm/vmalloc.c: fix "unused function" warning
Bang Li <libang.linuxer@gmail.com>:
mm/vmalloc: fix comments about vmap_area struct
Subsystem: mm/pagealloc
Zi Yan <ziy@nvidia.com>:
mm: page_alloc: avoid merging non-fallbackable pageblocks with others
Peter Collingbourne <pcc@google.com>:
mm/mmzone.c: use try_cmpxchg() in page_cpupid_xchg_last()
Miaohe Lin <linmiaohe@huawei.com>:
mm/mmzone.h: remove unused macros
Nicolas Saenz Julienne <nsaenzju@redhat.com>:
mm/page_alloc: don't pass pfn to free_unref_page_commit()
David Hildenbrand <david@redhat.com>:
Patch series "mm: enforce pageblock_order < MAX_ORDER":
cma: factor out minimum alignment requirement
mm: enforce pageblock_order < MAX_ORDER
Nathan Chancellor <nathan@kernel.org>:
mm/page_alloc: mark pagesets as __maybe_unused
Alistair Popple <apopple@nvidia.com>:
mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node
Mel Gorman <mgorman@techsingularity.net>:
Patch series "Follow-up on high-order PCP caching", v2:
mm/page_alloc: fetch the correct pcp buddy during bulk free
mm/page_alloc: track range of active PCP lists during bulk free
mm/page_alloc: simplify how many pages are selected per pcp list during bulk free
mm/page_alloc: drain the requested list first during bulk free
mm/page_alloc: free pages in a single pass during bulk free
mm/page_alloc: limit number of high-order pages on PCP during bulk free
mm/page_alloc: do not prefetch buddies during bulk free
Oscar Salvador <osalvador@suse.de>:
arch/x86/mm/numa: Do not initialize nodes twice
Suren Baghdasaryan <surenb@google.com>:
mm: count time in drain_all_pages during direct reclaim as memory pressure
Eric Dumazet <edumazet@google.com>:
mm/page_alloc: call check_new_pages() while zone spinlock is not held
Mel Gorman <mgorman@techsingularity.net>:
mm/page_alloc: check high-order pages for corruption during PCP operations
Subsystem: mm/memory-failure
Naoya Horiguchi <naoya.horiguchi@nec.com>:
mm/memory-failure.c: remove obsolete comment
mm/hwpoison: fix error page recovered but reported "not recovered"
Rik van Riel <riel@surriel.com>:
mm: invalidate hwpoison page cache page in fault path
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "A few cleanup and fixup patches for memory failure", v3:
mm/memory-failure.c: minor clean up for memory_failure_dev_pagemap
mm/memory-failure.c: catch unexpected -EFAULT from vma_address()
mm/memory-failure.c: rework the signaling logic in kill_proc
mm/memory-failure.c: fix race with changing page more robustly
mm/memory-failure.c: remove PageSlab check in hwpoison_filter_dev
mm/memory-failure.c: rework the try_to_unmap logic in hwpoison_user_mappings()
mm/memory-failure.c: remove obsolete comment in __soft_offline_page
mm/memory-failure.c: remove unnecessary PageTransTail check
mm/hwpoison-inject: support injecting hwpoison to free page
luofei <luofei@unicloud.com>:
mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handler
mm/hwpoison: add in-use hugepage hwpoison filter judgement
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "A few fixup patches for memory failure", v2:
mm/memory-failure.c: fix race with changing page compound again
mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages
mm/memory-failure.c: make non-LRU movable pages unhandlable
Vlastimil Babka <vbabka@suse.cz>:
mm, fault-injection: declare should_fail_alloc_page()
Subsystem: mm/mlock
Miaohe Lin <linmiaohe@huawei.com>:
mm/mlock: fix potential imbalanced rlimit ucounts adjustment
Subsystem: mm/hugetlb
Muchun Song <songmuchun@bytedance.com>:
Patch series "Free the 2nd vmemmap page associated with each HugeTLB page", v7:
mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key
mm: sparsemem: use page table lock to protect kernel pmd operations
selftests: vm: add a hugetlb test case
mm: sparsemem: move vmemmap related to HugeTLB to CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/hugetlb: generalize ARCH_WANT_GENERAL_HUGETLB
Mike Kravetz <mike.kravetz@oracle.com>:
hugetlb: clean up potential spectre issue warnings
Miaohe Lin <linmiaohe@huawei.com>:
mm/hugetlb: use helper macro __ATTR_RW
David Howells <dhowells@redhat.com>:
mm/hugetlb.c: export PageHeadHuge()
Miaohe Lin <linmiaohe@huawei.com>:
mm: remove unneeded local variable follflags
Subsystem: mm/userfaultfd
Nadav Amit <namit@vmware.com>:
userfaultfd: provide unmasked address on page-fault
Guo Zhengkui <guozhengkui@vivo.com>:
userfaultfd/selftests: fix uninitialized_var.cocci warning
Subsystem: mm/vmscan
Hugh Dickins <hughd@google.com>:
mm/fs: delete PF_SWAPWRITE
mm: __isolate_lru_page_prepare() in isolate_migratepages_block()
Waiman Long <longman@redhat.com>:
mm/list_lru: optimize memcg_reparent_list_lru_node()
Marcelo Tosatti <mtosatti@redhat.com>:
mm: lru_cache_disable: replace work queue synchronization with synchronize_rcu
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
mm: workingset: replace IRQ-off check with a lockdep assert.
Charan Teja Kalla <quic_charante@quicinc.com>:
mm: vmscan: fix documentation for page_check_references()
Subsystem: mm/compaction
Baolin Wang <baolin.wang@linux.alibaba.com>:
mm: compaction: cleanup the compaction trace events
Subsystem: mm/mempolicy
Hugh Dickins <hughd@google.com>:
mempolicy: mbind_range() set_policy() after vma_merge()
Subsystem: mm/oom-kill
Miaohe Lin <linmiaohe@huawei.com>:
mm/oom_kill: remove unneeded is_memcg_oom check
Subsystem: mm/migration
Huang Ying <ying.huang@intel.com>:
mm,migrate: fix establishing demotion target
"andrew.yang" <andrew.yang@mediatek.com>:
mm/migrate: fix race between lock page and clear PG_Isolated
Subsystem: mm/thp
Hugh Dickins <hughd@google.com>:
mm/thp: refix __split_huge_pmd_locked() for migration PMD
Subsystem: mm/cma
Hari Bathini <hbathini@linux.ibm.com>:
Patch series "powerpc/fadump: handle CMA activation failure appropriately", v3:
mm/cma: provide option to opt out from exposing pages on activation failure
powerpc/fadump: opt out from freeing pages on cma activation failure
Subsystem: mm/autonuma
Huang Ying <ying.huang@intel.com>:
Patch series "NUMA balancing: optimize memory placement for memory tiering system", v13:
NUMA Balancing: add page promotion counter
NUMA balancing: optimize page placement for memory tiering system
memory tiering: skip to scan fast memory
Subsystem: mm/psi
Johannes Weiner <hannes@cmpxchg.org>:
mm: page_io: fix psi memory pressure error on cold swapins
Subsystem: mm/ksm
Yang Yang <yang.yang29@zte.com.cn>:
mm/vmstat: add event for ksm swapping in copy
Miaohe Lin <linmiaohe@huawei.com>:
mm/ksm: use helper macro __ATTR_RW
Subsystem: mm/page-poison
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/hwpoison: check the subpage, not the head page
Subsystem: mm/madvise
Miaohe Lin <linmiaohe@huawei.com>:
mm/madvise: use vma_lookup() instead of find_vma()
Charan Teja Kalla <quic_charante@quicinc.com>:
Patch series "mm: madvise: return correct bytes processed with:
mm: madvise: return correct bytes advised with process_madvise
mm: madvise: skip unmapped vma holes passed to process_madvise
Subsystem: mm/memory-hotplug
Michal Hocko <mhocko@suse.com>:
Patch series "mm, memory_hotplug: handle unitialized numa node gracefully":
mm, memory_hotplug: make arch_alloc_nodedata independent on CONFIG_MEMORY_HOTPLUG
mm: handle uninitialized numa nodes gracefully
mm, memory_hotplug: drop arch_free_nodedata
mm, memory_hotplug: reorganize new pgdat initialization
mm: make free_area_init_node aware of memory less nodes
Wei Yang <richard.weiyang@gmail.com>:
memcg: do not tweak node in alloc_mem_cgroup_per_node_info
David Hildenbrand <david@redhat.com>:
drivers/base/memory: add memory block to memory group after registration succeeded
drivers/base/node: consolidate node device subsystem initialization in node_dev_init()
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "A few cleanup patches around memory_hotplug":
mm/memory_hotplug: remove obsolete comment of __add_pages
mm/memory_hotplug: avoid calling zone_intersects() for ZONE_NORMAL
mm/memory_hotplug: clean up try_offline_node
mm/memory_hotplug: fix misplaced comment in offline_pages
David Hildenbrand <david@redhat.com>:
Patch series "drivers/base/memory: determine and store zone for single-zone memory blocks", v2:
drivers/base/node: rename link_mem_sections() to register_memory_block_under_node()
drivers/base/memory: determine and store zone for single-zone memory blocks
drivers/base/memory: clarify adding and removing of memory blocks
Oscar Salvador <osalvador@suse.de>:
mm: only re-generate demotion targets when a numa node changes its N_CPU state
Subsystem: mm/rmap
Hugh Dickins <hughd@google.com>:
mm/thp: ClearPageDoubleMap in first page_add_file_rmap()
Subsystem: mm/zswap
"Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>:
mm/zswap.c: allow handling just same-value filled pages
Subsystem: mm/uaccess
Christophe Leroy <christophe.leroy@csgroup.eu>:
mm: remove usercopy_warn()
mm: uninline copy_overflow()
Randy Dunlap <rdunlap@infradead.org>:
mm/usercopy: return 1 from hardened_usercopy __setup() handler
Subsystem: mm/ioremap
Vlastimil Babka <vbabka@suse.cz>:
mm/early_ioremap: declare early_memremap_pgprot_adjust()
Subsystem: mm/highmem
Ira Weiny <ira.weiny@intel.com>:
highmem: document kunmap_local()
Miaohe Lin <linmiaohe@huawei.com>:
mm/highmem: remove unnecessary done label
Subsystem: mm/cleanups
"Dr. David Alan Gilbert" <linux@treblig.org>:
mm/page_table_check.c: use strtobool for param parsing
Subsystem: mm/kfence
tangmeng <tangmeng@uniontech.com>:
mm/kfence: remove unnecessary CONFIG_KFENCE option
Tianchen Ding <dtcccc@linux.alibaba.com>:
Patch series "provide the flexibility to enable KFENCE", v3:
kfence: allow re-enabling KFENCE after system startup
kfence: alloc kfence_pool after system startup
Peng Liu <liupeng256@huawei.com>:
Patch series "kunit: fix a UAF bug and do some optimization", v2:
kunit: fix UAF when run kfence test case test_gfpzero
kunit: make kunit_test_timeout compatible with comment
kfence: test: try to avoid test_gfpzero trigger rcu_stall
Marco Elver <elver@google.com>:
kfence: allow use of a deferrable timer
Subsystem: mm/hmm
Miaohe Lin <linmiaohe@huawei.com>:
mm/hmm.c: remove unneeded local variable ret
Subsystem: mm/damon
SeongJae Park <sj@kernel.org>:
Patch series "Remove the type-unclear target id concept":
mm/damon/dbgfs/init_regions: use target index instead of target id
Docs/admin-guide/mm/damon/usage: update for changed initail_regions file input
mm/damon/core: move damon_set_targets() into dbgfs
mm/damon: remove the target id concept
Baolin Wang <baolin.wang@linux.alibaba.com>:
mm/damon: remove redundant page validation
SeongJae Park <sj@kernel.org>:
Patch series "Allow DAMON user code independent of monitoring primitives":
mm/damon: rename damon_primitives to damon_operations
mm/damon: let monitoring operations can be registered and selected
mm/damon/paddr,vaddr: register themselves to DAMON in subsys_initcall
mm/damon/reclaim: use damon_select_ops() instead of damon_{v,p}a_set_operations()
mm/damon/dbgfs: use damon_select_ops() instead of damon_{v,p}a_set_operations()
mm/damon/dbgfs: use operations id for knowing if the target has pid
mm/damon/dbgfs-test: fix is_target_id() change
mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
tangmeng <tangmeng@uniontech.com>:
mm/damon: remove unnecessary CONFIG_DAMON option
SeongJae Park <sj@kernel.org>:
Patch series "Docs/damon: Update documents for better consistency":
Docs/vm/damon: call low level monitoring primitives the operations
Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling
Docs/damon: update outdated term 'regions update interval'
Patch series "Introduce DAMON sysfs interface", v3:
mm/damon/core: allow non-exclusive DAMON start/stop
mm/damon/core: add number of each enum type values
mm/damon: implement a minimal stub for sysfs-based DAMON interface
mm/damon/sysfs: link DAMON for virtual address spaces monitoring
mm/damon/sysfs: support the physical address space monitoring
mm/damon/sysfs: support DAMON-based Operation Schemes
mm/damon/sysfs: support DAMOS quotas
mm/damon/sysfs: support schemes prioritization
mm/damon/sysfs: support DAMOS watermarks
mm/damon/sysfs: support DAMOS stats
selftests/damon: add a test for DAMON sysfs interface
Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
Docs/ABI/testing: add DAMON sysfs interface ABI document
Xin Hao <xhao@linux.alibaba.com>:
mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()
Documentation/ABI/testing/sysfs-kernel-mm-damon | 274 ++
Documentation/admin-guide/cgroup-v1/memory.rst | 2
Documentation/admin-guide/cgroup-v2.rst | 5
Documentation/admin-guide/kernel-parameters.txt | 2
Documentation/admin-guide/mm/damon/usage.rst | 380 +++
Documentation/admin-guide/mm/zswap.rst | 22
Documentation/admin-guide/sysctl/kernel.rst | 31
Documentation/core-api/mm-api.rst | 19
Documentation/dev-tools/kfence.rst | 12
Documentation/filesystems/porting.rst | 6
Documentation/filesystems/vfs.rst | 16
Documentation/vm/damon/design.rst | 43
Documentation/vm/damon/faq.rst | 2
MAINTAINERS | 1
arch/arm/Kconfig | 4
arch/arm64/kernel/setup.c | 3
arch/arm64/mm/hugetlbpage.c | 1
arch/hexagon/mm/init.c | 2
arch/ia64/kernel/topology.c | 10
arch/ia64/mm/discontig.c | 11
arch/mips/kernel/topology.c | 5
arch/nds32/mm/init.c | 1
arch/openrisc/mm/init.c | 2
arch/powerpc/include/asm/fadump-internal.h | 5
arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h | 4
arch/powerpc/kernel/fadump.c | 8
arch/powerpc/kernel/sysfs.c | 17
arch/riscv/Kconfig | 4
arch/riscv/kernel/setup.c | 3
arch/s390/kernel/numa.c | 7
arch/sh/kernel/topology.c | 5
arch/sparc/kernel/sysfs.c | 12
arch/sparc/mm/hugetlbpage.c | 1
arch/x86/Kconfig | 4
arch/x86/kernel/cpu/mce/core.c | 8
arch/x86/kernel/topology.c | 5
arch/x86/mm/numa.c | 33
block/bdev.c | 2
block/bfq-iosched.c | 2
drivers/base/init.c | 1
drivers/base/memory.c | 149 +
drivers/base/node.c | 48
drivers/block/drbd/drbd_int.h | 3
drivers/block/drbd/drbd_req.c | 3
drivers/dax/super.c | 2
drivers/of/of_reserved_mem.c | 9
drivers/tty/tty_io.c | 2
drivers/virtio/virtio_mem.c | 9
fs/9p/vfs_inode.c | 2
fs/adfs/super.c | 2
fs/affs/super.c | 2
fs/afs/super.c | 2
fs/befs/linuxvfs.c | 2
fs/bfs/inode.c | 2
fs/btrfs/inode.c | 2
fs/buffer.c | 8
fs/ceph/addr.c | 22
fs/ceph/inode.c | 2
fs/ceph/super.c | 1
fs/ceph/super.h | 1
fs/cifs/cifsfs.c | 2
fs/coda/inode.c | 2
fs/dcache.c | 3
fs/ecryptfs/super.c | 2
fs/efs/super.c | 2
fs/erofs/super.c | 2
fs/exfat/super.c | 2
fs/ext2/ialloc.c | 5
fs/ext2/super.c | 2
fs/ext4/super.c | 2
fs/f2fs/compress.c | 4
fs/f2fs/data.c | 3
fs/f2fs/f2fs.h | 6
fs/f2fs/segment.c | 8
fs/f2fs/super.c | 14
fs/fat/inode.c | 2
fs/freevxfs/vxfs_super.c | 2
fs/fs-writeback.c | 40
fs/fuse/control.c | 17
fs/fuse/dev.c | 8
fs/fuse/file.c | 17
fs/fuse/inode.c | 2
fs/gfs2/super.c | 2
fs/hfs/super.c | 2
fs/hfsplus/super.c | 2
fs/hostfs/hostfs_kern.c | 2
fs/hpfs/super.c | 2
fs/hugetlbfs/inode.c | 2
fs/inode.c | 2
fs/isofs/inode.c | 2
fs/jffs2/super.c | 2
fs/jfs/super.c | 2
fs/minix/inode.c | 2
fs/namespace.c | 2
fs/nfs/inode.c | 2
fs/nfs/write.c | 14
fs/nilfs2/segbuf.c | 16
fs/nilfs2/super.c | 2
fs/ntfs/inode.c | 6
fs/ntfs3/super.c | 2
fs/ocfs2/alloc.c | 2
fs/ocfs2/aops.c | 2
fs/ocfs2/cluster/nodemanager.c | 2
fs/ocfs2/dir.c | 4
fs/ocfs2/dlmfs/dlmfs.c | 2
fs/ocfs2/file.c | 13
fs/ocfs2/inode.c | 2
fs/ocfs2/localalloc.c | 6
fs/ocfs2/namei.c | 2
fs/ocfs2/ocfs2.h | 4
fs/ocfs2/quota_global.c | 2
fs/ocfs2/stack_user.c | 18
fs/ocfs2/super.c | 2
fs/ocfs2/xattr.c | 2
fs/openpromfs/inode.c | 2
fs/orangefs/super.c | 2
fs/overlayfs/super.c | 2
fs/proc/inode.c | 2
fs/qnx4/inode.c | 2
fs/qnx6/inode.c | 2
fs/reiserfs/super.c | 2
fs/romfs/super.c | 2
fs/squashfs/super.c | 2
fs/sysv/inode.c | 2
fs/ubifs/super.c | 2
fs/udf/super.c | 2
fs/ufs/super.c | 2
fs/userfaultfd.c | 5
fs/vboxsf/super.c | 2
fs/xfs/libxfs/xfs_btree.c | 2
fs/xfs/xfs_buf.c | 3
fs/xfs/xfs_icache.c | 2
fs/zonefs/super.c | 2
include/linux/backing-dev-defs.h | 8
include/linux/backing-dev.h | 50
include/linux/cma.h | 14
include/linux/damon.h | 95
include/linux/fault-inject.h | 2
include/linux/fs.h | 21
include/linux/gfp.h | 10
include/linux/highmem-internal.h | 10
include/linux/hugetlb.h | 8
include/linux/kthread.h | 22
include/linux/list_lru.h | 45
include/linux/memcontrol.h | 46
include/linux/memory.h | 12
include/linux/memory_hotplug.h | 132 -
include/linux/migrate.h | 8
include/linux/mm.h | 11
include/linux/mmzone.h | 22
include/linux/nfs_fs_sb.h | 1
include/linux/node.h | 25
include/linux/page-flags.h | 96
include/linux/pageblock-flags.h | 7
include/linux/pagemap.h | 7
include/linux/sched.h | 1
include/linux/sched/sysctl.h | 10
include/linux/shmem_fs.h | 1
include/linux/slab.h | 3
include/linux/swap.h | 6
include/linux/thread_info.h | 5
include/linux/uaccess.h | 2
include/linux/vm_event_item.h | 3
include/linux/vmalloc.h | 4
include/linux/xarray.h | 9
include/ras/ras_event.h | 1
include/trace/events/compaction.h | 26
include/trace/events/writeback.h | 28
include/uapi/linux/userfaultfd.h | 8
ipc/mqueue.c | 2
kernel/dma/contiguous.c | 4
kernel/sched/core.c | 21
kernel/sysctl.c | 2
lib/Kconfig.kfence | 12
lib/kunit/try-catch.c | 3
lib/xarray.c | 10
mm/Kconfig | 6
mm/backing-dev.c | 57
mm/cma.c | 31
mm/cma.h | 1
mm/compaction.c | 60
mm/damon/Kconfig | 19
mm/damon/Makefile | 7
mm/damon/core-test.h | 23
mm/damon/core.c | 190 +
mm/damon/dbgfs-test.h | 103
mm/damon/dbgfs.c | 264 +-
mm/damon/ops-common.c | 133 +
mm/damon/ops-common.h | 16
mm/damon/paddr.c | 62
mm/damon/prmtv-common.c | 133 -
mm/damon/prmtv-common.h | 16
mm/damon/reclaim.c | 11
mm/damon/sysfs.c | 2632 ++++++++++++++++++++++-
mm/damon/vaddr-test.h | 8
mm/damon/vaddr.c | 67
mm/early_ioremap.c | 1
mm/fadvise.c | 5
mm/filemap.c | 17
mm/gup.c | 103
mm/highmem.c | 9
mm/hmm.c | 3
mm/huge_memory.c | 41
mm/hugetlb.c | 23
mm/hugetlb_vmemmap.c | 74
mm/hwpoison-inject.c | 7
mm/internal.h | 19
mm/kfence/Makefile | 2
mm/kfence/core.c | 147 +
mm/kfence/kfence_test.c | 3
mm/ksm.c | 6
mm/list_lru.c | 690 ++----
mm/maccess.c | 6
mm/madvise.c | 18
mm/memcontrol.c | 549 ++--
mm/memory-failure.c | 148 -
mm/memory.c | 116 -
mm/memory_hotplug.c | 136 -
mm/mempolicy.c | 29
mm/memremap.c | 3
mm/migrate.c | 128 -
mm/mlock.c | 1
mm/mmap.c | 5
mm/mmzone.c | 7
mm/mprotect.c | 13
mm/mremap.c | 4
mm/oom_kill.c | 3
mm/page-writeback.c | 12
mm/page_alloc.c | 429 +--
mm/page_io.c | 7
mm/page_table_check.c | 10
mm/ptdump.c | 16
mm/readahead.c | 124 +
mm/rmap.c | 15
mm/shmem.c | 46
mm/slab.c | 39
mm/slab.h | 25
mm/slob.c | 6
mm/slub.c | 42
mm/sparse-vmemmap.c | 70
mm/sparse.c | 2
mm/swap.c | 25
mm/swapfile.c | 1
mm/usercopy.c | 16
mm/userfaultfd.c | 3
mm/vmalloc.c | 102
mm/vmscan.c | 138 -
mm/vmstat.c | 19
mm/workingset.c | 7
mm/zswap.c | 15
net/socket.c | 2
net/sunrpc/rpc_pipe.c | 2
scripts/spelling.txt | 16
tools/testing/selftests/cgroup/cgroup_util.c | 15
tools/testing/selftests/cgroup/cgroup_util.h | 1
tools/testing/selftests/cgroup/test_memcontrol.c | 78
tools/testing/selftests/damon/Makefile | 1
tools/testing/selftests/damon/sysfs.sh | 306 ++
tools/testing/selftests/vm/.gitignore | 1
tools/testing/selftests/vm/Makefile | 7
tools/testing/selftests/vm/hugepage-vmemmap.c | 144 +
tools/testing/selftests/vm/run_vmtests.sh | 11
tools/testing/selftests/vm/userfaultfd.c | 2
tools/testing/selftests/x86/Makefile | 6
264 files changed, 7205 insertions(+), 3090 deletions(-)
* [patch 001/227] linux/kthread.h: remove unused macros
From: Andrew Morton @ 2022-03-22 21:38 UTC
To: tj, pmladek, laoar.shao, ebiederm, david, caihuoqing, linux,
akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Subject: linux/kthread.h: remove unused macros
Ever since these macros were introduced in commit b56c0d8937e6 ("kthread:
implement kthread_worker"), there has been precisely one user (commit
4d115420707a, "NVMe: Async IO queue deletion"), and that user went away in
2016 with db3cbfff5bcc ("NVMe: IO queue deletion re-write").

Apart from being unused, these macros are also awkward to use (which may
contribute to them not being used): Having a way to statically (or
on-stack) allocate the storage for the struct kthread_worker itself
doesn't help much, since obviously one needs to have some code for
actually _spawning_ the worker thread, which must have error checking.
And these days we have the kthread_create_worker() interface which both
allocates the struct kthread_worker and spawns the kthread.
Link: https://lkml.kernel.org/r/20220314145343.494694-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/kthread.h | 22 ----------------------
1 file changed, 22 deletions(-)
--- a/include/linux/kthread.h~linux-kthreadh-remove-unused-macros
+++ a/include/linux/kthread.h
@@ -141,12 +141,6 @@ struct kthread_delayed_work {
 	struct timer_list timer;
 };
 
-#define KTHREAD_WORKER_INIT(worker) { \
-	.lock = __RAW_SPIN_LOCK_UNLOCKED((worker).lock), \
-	.work_list = LIST_HEAD_INIT((worker).work_list), \
-	.delayed_work_list = LIST_HEAD_INIT((worker).delayed_work_list),\
-	}
-
 #define KTHREAD_WORK_INIT(work, fn) { \
 	.node = LIST_HEAD_INIT((work).node), \
 	.func = (fn), \
@@ -158,9 +152,6 @@ struct kthread_delayed_work {
 				     TIMER_IRQSAFE), \
 	}
 
-#define DEFINE_KTHREAD_WORKER(worker) \
-	struct kthread_worker worker = KTHREAD_WORKER_INIT(worker)
-
 #define DEFINE_KTHREAD_WORK(work, fn) \
 	struct kthread_work work = KTHREAD_WORK_INIT(work, fn)
 
@@ -168,19 +159,6 @@ struct kthread_delayed_work {
 	struct kthread_delayed_work dwork = \
 		KTHREAD_DELAYED_WORK_INIT(dwork, fn)
 
-/*
- * kthread_worker.lock needs its own lockdep class key when defined on
- * stack with lockdep enabled. Use the following macros in such cases.
- */
-#ifdef CONFIG_LOCKDEP
-# define KTHREAD_WORKER_INIT_ONSTACK(worker) \
-	({ kthread_init_worker(&worker); worker; })
-# define DEFINE_KTHREAD_WORKER_ONSTACK(worker) \
-	struct kthread_worker worker = KTHREAD_WORKER_INIT_ONSTACK(worker)
-#else
-# define DEFINE_KTHREAD_WORKER_ONSTACK(worker) DEFINE_KTHREAD_WORKER(worker)
-#endif
-
 extern void __kthread_init_worker(struct kthread_worker *worker,
 			const char *name, struct lock_class_key *key);
 
_
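
For readers unfamiliar with the interface the changelog points to, the
following is a minimal sketch of a module that uses
kthread_create_worker() where the removed static initializers might
once have been used. The names demo_worker, demo_work and demo_work_fn
are hypothetical and not part of this patch. The calls are assumed to
behave as declared in include/linux/kthread.h around the time of this
series, so treat this as an illustration rather than code from the
series.

```c
#include <linux/err.h>
#include <linux/init.h>
#include <linux/kthread.h>
#include <linux/module.h>
#include <linux/printk.h>

/* Hypothetical names, used only for this illustration. */
static struct kthread_worker *demo_worker;
static struct kthread_work demo_work;

/* Runs in the context of the worker kthread. */
static void demo_work_fn(struct kthread_work *work)
{
	pr_info("demo work item executed\n");
}

static int __init demo_init(void)
{
	/*
	 * One call allocates the struct kthread_worker and spawns the
	 * kthread. Failure is reported as an ERR_PTR() value, so the
	 * error check cannot be skipped.
	 */
	demo_worker = kthread_create_worker(0, "demo_worker");
	if (IS_ERR(demo_worker))
		return PTR_ERR(demo_worker);

	kthread_init_work(&demo_work, demo_work_fn);
	kthread_queue_work(demo_worker, &demo_work);
	return 0;
}

static void __exit demo_exit(void)
{
	/* Flushes pending work, stops the kthread, frees the worker. */
	kthread_destroy_worker(demo_worker);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```

The only step that can fail here is spawning the thread, and its result
has to be checked at run time. That is the error checking the changelog
says a static or on-stack initializer cannot provide on its own.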
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 002/227] scripts/spelling.txt: add more spellings to spelling.txt
From: Andrew Morton @ 2022-03-22 21:38 UTC (permalink / raw)
To: joe, colin.i.king, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Colin Ian King <colin.i.king@gmail.com>
Subject: scripts/spelling.txt: add more spellings to spelling.txt
Add some of the more common spelling mistakes and typos that I've found
while fixing up spelling mistakes in the kernel over the past four months.
Link: https://lkml.kernel.org/r/20220216152343.105546-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
scripts/spelling.txt | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
--- a/scripts/spelling.txt~scripts-spellingtxt-add-more-spellings-to-spellingtxt
+++ a/scripts/spelling.txt
@@ -180,6 +180,7 @@ asuming||assuming
asycronous||asynchronous
asychronous||asynchronous
asynchnous||asynchronous
+asynchronus||asynchronous
asynchromous||asynchronous
asymetric||asymmetric
asymmeric||asymmetric
@@ -231,6 +232,7 @@ baloons||balloons
bandwith||bandwidth
banlance||balance
batery||battery
+battey||battery
beacuse||because
becasue||because
becomming||becoming
@@ -333,6 +335,7 @@ commoditiy||commodity
comsume||consume
comsumer||consumer
comsuming||consuming
+comaptible||compatible
compability||compatibility
compaibility||compatibility
comparsion||comparison
@@ -353,7 +356,9 @@ compoment||component
comppatible||compatible
compres||compress
compresion||compression
+compresser||compressor
comression||compression
+comsumed||consumed
comunicate||communicate
comunication||communication
conbination||combination
@@ -530,6 +535,7 @@ dissconect||disconnect
distiction||distinction
divisable||divisible
divsiors||divisors
+dsiabled||disabled
docuentation||documentation
documantation||documentation
documentaion||documentation
@@ -677,6 +683,7 @@ frequence||frequency
frequncy||frequency
frequancy||frequency
frome||from
+fronend||frontend
fucntion||function
fuction||function
fuctions||functions
@@ -761,6 +768,7 @@ implmentation||implementation
implmenting||implementing
incative||inactive
incomming||incoming
+incompaitiblity||incompatibility
incompatabilities||incompatibilities
incompatable||incompatible
incompatble||incompatible
@@ -942,6 +950,7 @@ metdata||metadata
micropone||microphone
microprocesspr||microprocessor
migrateable||migratable
+millenium||millennium
milliseonds||milliseconds
minium||minimum
minimam||minimum
@@ -1007,6 +1016,7 @@ notity||notify
nubmer||number
numebr||number
numner||number
+nunber||number
obtaion||obtain
obusing||abusing
occassionally||occasionally
@@ -1136,6 +1146,7 @@ preprare||prepare
pressre||pressure
presuambly||presumably
previosuly||previously
+previsously||previously
primative||primitive
princliple||principle
priorty||priority
@@ -1297,6 +1308,7 @@ routins||routines
rquest||request
runing||running
runned||ran
+runnnig||running
runnning||running
runtine||runtime
sacrifying||sacrificing
@@ -1353,6 +1365,7 @@ similiar||similar
simlar||similar
simliar||similar
simpified||simplified
+simultanous||simultaneous
singaled||signaled
singal||signal
singed||signed
@@ -1461,6 +1474,7 @@ syste||system
sytem||system
sythesis||synthesis
taht||that
+tained||tainted
tansmit||transmit
targetted||targeted
targetting||targeting
@@ -1489,6 +1503,7 @@ timout||timeout
tmis||this
toogle||toggle
torerable||tolerable
+torlence||tolerance
traget||target
traking||tracking
tramsmitted||transmitted
@@ -1503,6 +1518,7 @@ transferd||transferred
transfered||transferred
transfering||transferring
transision||transition
+transistioned||transitioned
transmittd||transmitted
transormed||transformed
trasfer||transfer
_
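The entries added above follow the "typo||correction" layout visible in the diff context. As a rough illustration of how a tool might consume one such line, here is a minimal sketch in C. The function name parse_spelling_entry and its buffer-based interface are hypothetical and are not taken from the kernel tree or from checkpatch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split one spelling.txt-style entry of the form "typo||correction".
 * Hypothetical helper for illustration only.
 *
 * Copies each half into the caller-supplied buffers and returns 0 on
 * success, or -1 if the separator is missing, either half is empty,
 * or a buffer is too small.
 */
int parse_spelling_entry(const char *line,
                         char *typo, size_t typo_len,
                         char *fix, size_t fix_len)
{
    const char *sep = strstr(line, "||");
    size_t n_typo, n_fix;

    if (!sep)
        return -1;

    n_typo = (size_t)(sep - line);
    sep += 2;                        /* skip the "||" separator */
    n_fix = strcspn(sep, "\r\n");    /* ignore a trailing newline */

    if (n_typo == 0 || n_fix == 0)
        return -1;
    if (n_typo >= typo_len || n_fix >= fix_len)
        return -1;

    memcpy(typo, line, n_typo);
    typo[n_typo] = '\0';
    memcpy(fix, sep, n_fix);
    fix[n_fix] = '\0';
    return 0;
}
```

Calling it on "battey||battery\n" with two 32-byte buffers would fill them with "battey" and "battery". A line without the separator is rejected instead of being guessed at.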
* [patch 003/227] ntfs: add sanity check on allocation size
From: Andrew Morton @ 2022-03-22 21:38 UTC (permalink / raw)
To: anton, mudongliangabcd, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Dongliang Mu <mudongliangabcd@gmail.com>
Subject: ntfs: add sanity check on allocation size
ntfs_read_inode_mount() invokes ntfs_malloc_nofs() with a zero allocation
size, which triggers a BUG in __ntfs_malloc().
Fix this by adding a sanity check on ni->attr_list_size.
Link: https://lkml.kernel.org/r/20220120094914.47736-1-dzm91@hust.edu.cn
Reported-by: syzbot+3c765c5248797356edaa@syzkaller.appspotmail.com
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Acked-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/ntfs/inode.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/fs/ntfs/inode.c~ntfs-add-sanity-check-on-allocation-size
+++ a/fs/ntfs/inode.c
@@ -1881,6 +1881,10 @@ int ntfs_read_inode_mount(struct inode *
}
/* Now allocate memory for the attribute list. */
ni->attr_list_size = (u32)ntfs_attr_size(a);
+ if (!ni->attr_list_size) {
+ ntfs_error(sb, "Attr_list_size is zero");
+ goto put_err_out;
+ }
ni->attr_list = ntfs_malloc_nofs(ni->attr_list_size);
if (!ni->attr_list) {
ntfs_error(sb, "Not enough memory to allocate buffer "
_
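The pattern in the hunk above is to validate a size read from on-disk metadata before handing it to an allocator that treats zero as a caller bug. It can be sketched in userspace C as follows. The names strict_alloc and load_attr_list are hypothetical stand-ins, and abort() only plays the role of the BUG that the commit message describes. This is not the NTFS code itself.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for an allocator that considers a zero-sized
 * request a programming error rather than a recoverable condition.
 */
void *strict_alloc(size_t size)
{
    if (size == 0)
        abort();    /* stands in for the BUG hit in the bug report */
    return malloc(size);
}

/*
 * Validate a size that came from untrusted metadata before allocating.
 * Returns 0 and stores the buffer in *out on success, or -1 with
 * *out set to NULL on failure.
 */
int load_attr_list(size_t size_from_disk, void **out)
{
    *out = NULL;

    /* Corrupt metadata: fail the operation instead of crashing. */
    if (size_from_disk == 0)
        return -1;

    *out = strict_alloc(size_from_disk);
    return *out ? 0 : -1;
}
```

With the check in place, load_attr_list(0, &buf) returns -1 and never reaches the allocator. A non-zero size proceeds as before, and the caller frees the buffer.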
* [patch 004/227] ocfs2: cleanup some return variables
From: Andrew Morton @ 2022-03-22 21:38 UTC (permalink / raw)
To: zealci, piaojun, mark, junxiao.bi, jlbec, ghe, gechangwei,
chi.minghao, cgel.zte, joseph.qi, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Joseph Qi <joseph.qi@linux.alibaba.com>
Subject: ocfs2: cleanup some return variables
Return the result directly instead of assigning it to an intermediate
variable first.
Link: https://lkml.kernel.org/r/20220114021641.13927-1-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Cc: Minghao Chi <chi.minghao@zte.com.cn>
Cc: CGEL ZTE <cgel.zte@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/ocfs2/file.c | 9 +++------
fs/ocfs2/stack_user.c | 18 ++++++------------
2 files changed, 9 insertions(+), 18 deletions(-)
--- a/fs/ocfs2/file.c~ocfs2-cleanup-some-return-variables
+++ a/fs/ocfs2/file.c
@@ -540,15 +540,12 @@ int ocfs2_add_inode_data(struct ocfs2_su
struct ocfs2_alloc_context *meta_ac,
enum ocfs2_alloc_restarted *reason_ret)
{
- int ret;
struct ocfs2_extent_tree et;
ocfs2_init_dinode_extent_tree(&et, INODE_CACHE(inode), fe_bh);
- ret = ocfs2_add_clusters_in_btree(handle, &et, logical_offset,
- clusters_to_add, mark_unwritten,
- data_ac, meta_ac, reason_ret);
-
- return ret;
+ return ocfs2_add_clusters_in_btree(handle, &et, logical_offset,
+ clusters_to_add, mark_unwritten,
+ data_ac, meta_ac, reason_ret);
}
static int ocfs2_extend_allocation(struct inode *inode, u32 logical_start,
--- a/fs/ocfs2/stack_user.c~ocfs2-cleanup-some-return-variables
+++ a/fs/ocfs2/stack_user.c
@@ -683,28 +683,22 @@ static int user_dlm_lock(struct ocfs2_cl
void *name,
unsigned int namelen)
{
- int ret;
-
if (!lksb->lksb_fsdlm.sb_lvbptr)
lksb->lksb_fsdlm.sb_lvbptr = (char *)lksb +
sizeof(struct dlm_lksb);
- ret = dlm_lock(conn->cc_lockspace, mode, &lksb->lksb_fsdlm,
- flags|DLM_LKF_NODLCKWT, name, namelen, 0,
- fsdlm_lock_ast_wrapper, lksb,
- fsdlm_blocking_ast_wrapper);
- return ret;
+ return dlm_lock(conn->cc_lockspace, mode, &lksb->lksb_fsdlm,
+ flags|DLM_LKF_NODLCKWT, name, namelen, 0,
+ fsdlm_lock_ast_wrapper, lksb,
+ fsdlm_blocking_ast_wrapper);
}
static int user_dlm_unlock(struct ocfs2_cluster_connection *conn,
struct ocfs2_dlm_lksb *lksb,
u32 flags)
{
- int ret;
-
- ret = dlm_unlock(conn->cc_lockspace, lksb->lksb_fsdlm.sb_lkid,
- flags, &lksb->lksb_fsdlm, lksb);
- return ret;
+ return dlm_unlock(conn->cc_lockspace, lksb->lksb_fsdlm.sb_lkid,
+ flags, &lksb->lksb_fsdlm, lksb);
}
static int user_dlm_lock_status(struct ocfs2_dlm_lksb *lksb)
_
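The cleanup above is purely mechanical: a local variable that only carries a callee's result to the return statement is dropped. A minimal before/after sketch in plain C follows. do_unlock, unlock_before and unlock_after are hypothetical names, with do_unlock standing in for a callee such as dlm_unlock().

```c
/* Hypothetical callee: returns 0 on success, -1 on an invalid id. */
int do_unlock(int lock_id)
{
    return lock_id > 0 ? 0 : -1;
}

/* Before: the result takes a detour through a local variable. */
int unlock_before(int lock_id)
{
    int ret;

    ret = do_unlock(lock_id);
    return ret;
}

/* After: identical behavior, expressed as a single statement. */
int unlock_after(int lock_id)
{
    return do_unlock(lock_id);
}
```

Both variants return the same value for every input, so the change is safe as long as nothing else reads or modifies the temporary between the call and the return.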
* [patch 005/227] fs/ocfs2: fix comments mentioning i_mutex
From: Andrew Morton @ 2022-03-22 21:38 UTC (permalink / raw)
To: piaojun, mark, junxiao.bi, joseph.qi, jlbec, ghe, gechangwei,
hongnan.li, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: hongnanli <hongnan.li@linux.alibaba.com>
Subject: fs/ocfs2: fix comments mentioning i_mutex
inode->i_mutex was replaced with inode->i_rwsem long ago. Fix the
comments that still mention i_mutex.
Link: https://lkml.kernel.org/r/20220214031314.100094-1-hongnan.li@linux.alibaba.com
Signed-off-by: hongnanli <hongnan.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/ocfs2/alloc.c | 2 +-
fs/ocfs2/aops.c | 2 +-
fs/ocfs2/cluster/nodemanager.c | 2 +-
fs/ocfs2/dir.c | 4 ++--
fs/ocfs2/file.c | 4 ++--
fs/ocfs2/inode.c | 2 +-
fs/ocfs2/localalloc.c | 6 +++---
fs/ocfs2/namei.c | 2 +-
fs/ocfs2/ocfs2.h | 4 ++--
fs/ocfs2/quota_global.c | 2 +-
fs/ocfs2/xattr.c | 2 +-
11 files changed, 16 insertions(+), 16 deletions(-)
--- a/fs/ocfs2/alloc.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/alloc.c
@@ -5981,7 +5981,7 @@ bail:
return status;
}
-/* Expects you to already be holding tl_inode->i_mutex */
+/* Expects you to already be holding tl_inode->i_rwsem */
int __ocfs2_flush_truncate_log(struct ocfs2_super *osb)
{
int status;
--- a/fs/ocfs2/aops.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/aops.c
@@ -2311,7 +2311,7 @@ static int ocfs2_dio_end_io_write(struct
down_write(&oi->ip_alloc_sem);
- /* Delete orphan before acquire i_mutex. */
+ /* Delete orphan before acquire i_rwsem. */
if (dwc->dw_orphaned) {
BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
--- a/fs/ocfs2/cluster/nodemanager.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/cluster/nodemanager.c
@@ -689,7 +689,7 @@ static struct config_group *o2nm_cluster
struct o2nm_node_group *ns = NULL;
struct config_group *o2hb_group = NULL, *ret = NULL;
- /* this runs under the parent dir's i_mutex; there can be only
+ /* this runs under the parent dir's i_rwsem; there can be only
* one caller in here at a time */
if (o2nm_single_cluster)
return ERR_PTR(-ENOSPC);
--- a/fs/ocfs2/dir.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/dir.c
@@ -1957,7 +1957,7 @@ bail_nolock:
}
/*
- * NOTE: this should always be called with parent dir i_mutex taken.
+ * NOTE: this should always be called with parent dir i_rwsem taken.
*/
int ocfs2_find_files_on_disk(const char *name,
int namelen,
@@ -2003,7 +2003,7 @@ int ocfs2_lookup_ino_from_name(struct in
* Return 0 if the name does not exist
* Return -EEXIST if the directory contains the name
*
- * Callers should have i_mutex + a cluster lock on dir
+ * Callers should have i_rwsem + a cluster lock on dir
*/
int ocfs2_check_dir_for_entry(struct inode *dir,
const char *name,
--- a/fs/ocfs2/file.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/file.c
@@ -270,7 +270,7 @@ int ocfs2_update_inode_atime(struct inod
/*
* Don't use ocfs2_mark_inode_dirty() here as we don't always
- * have i_mutex to guard against concurrent changes to other
+ * have i_rwsem to guard against concurrent changes to other
* inode fields.
*/
inode->i_atime = current_time(inode);
@@ -1065,7 +1065,7 @@ static int ocfs2_extend_file(struct inod
/*
* The alloc sem blocks people in read/write from reading our
* allocation until we're done changing it. We depend on
- * i_mutex to block other extend/truncate calls while we're
+ * i_rwsem to block other extend/truncate calls while we're
* here. We even have to hold it for sparse files because there
* might be some tail zeroing.
*/
--- a/fs/ocfs2/inode.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/inode.c
@@ -713,7 +713,7 @@ bail:
/*
* Serialize with orphan dir recovery. If the process doing
* recovery on this orphan dir does an iget() with the dir
- * i_mutex held, we'll deadlock here. Instead we detect this
+ * i_rwsem held, we'll deadlock here. Instead we detect this
* and exit early - recovery will wipe this inode for us.
*/
static int ocfs2_check_orphan_recovery_state(struct ocfs2_super *osb,
--- a/fs/ocfs2/localalloc.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/localalloc.c
@@ -606,7 +606,7 @@ out:
/*
* make sure we've got at least bits_wanted contiguous bits in the
- * local alloc. You lose them when you drop i_mutex.
+ * local alloc. You lose them when you drop i_rwsem.
*
* We will add ourselves to the transaction passed in, but may start
* our own in order to shift windows.
@@ -636,7 +636,7 @@ int ocfs2_reserve_local_alloc_bits(struc
/*
* We must double check state and allocator bits because
- * another process may have changed them while holding i_mutex.
+ * another process may have changed them while holding i_rwsem.
*/
spin_lock(&osb->osb_lock);
if (!ocfs2_la_state_enabled(osb) ||
@@ -1029,7 +1029,7 @@ enum ocfs2_la_event {
/*
* Given an event, calculate the size of our next local alloc window.
*
- * This should always be called under i_mutex of the local alloc inode
+ * This should always be called under i_rwsem of the local alloc inode
* so that local alloc disabling doesn't race with processes trying to
* use the allocator.
*
--- a/fs/ocfs2/namei.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/namei.c
@@ -476,7 +476,7 @@ leave:
ocfs2_free_alloc_context(meta_ac);
/*
- * We should call iput after the i_mutex of the bitmap been
+ * We should call iput after the i_rwsem of the bitmap been
* unlocked in ocfs2_free_alloc_context, or the
* ocfs2_delete_inode will mutex_lock again.
*/
--- a/fs/ocfs2/ocfs2.h~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/ocfs2.h
@@ -355,7 +355,7 @@ struct ocfs2_super
struct delayed_work la_enable_wq;
/*
- * Must hold local alloc i_mutex and osb->osb_lock to change
+ * Must hold local alloc i_rwsem and osb->osb_lock to change
* local_alloc_bits. Reads can be done under either lock.
*/
unsigned int local_alloc_bits;
@@ -430,7 +430,7 @@ struct ocfs2_super
atomic_t osb_tl_disable;
/*
* How many clusters in our truncate log.
- * It must be protected by osb_tl_inode->i_mutex.
+ * It must be protected by osb_tl_inode->i_rwsem.
*/
unsigned int truncated_clusters;
--- a/fs/ocfs2/quota_global.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/quota_global.c
@@ -36,7 +36,7 @@
* should be obeyed by all the functions:
* - any write of quota structure (either to local or global file) is protected
* by dqio_sem or dquot->dq_lock.
- * - any modification of global quota file holds inode cluster lock, i_mutex,
+ * - any modification of global quota file holds inode cluster lock, i_rwsem,
* and ip_alloc_sem of the global quota file (achieved by
* ocfs2_lock_global_qf). It also has to hold qinfo_lock.
* - an allocation of new blocks for local quota file is protected by
--- a/fs/ocfs2/xattr.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/xattr.c
@@ -7205,7 +7205,7 @@ out:
* Used for reflink a non-preserve-security file.
*
* It uses common api like ocfs2_xattr_set, so the caller
- * must not hold any lock expect i_mutex.
+ * must not hold any lock expect i_rwsem.
*/
int ocfs2_init_security_and_acl(struct inode *dir,
struct inode *inode,
_
* [patch 005/227] fs/ocfs2: fix comments mentioning i_mutex
@ 2022-03-22 21:38 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:38 UTC (permalink / raw)
To: piaojun, mark, junxiao.bi, joseph.qi, jlbec, ghe, gechangwei,
hongnan.li, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: hongnanli <hongnan.li@linux.alibaba.com>
Subject: fs/ocfs2: fix comments mentioning i_mutex
inode->i_mutex was replaced with inode->i_rwsem long ago. Fix the
comments that still mention i_mutex.
Link: https://lkml.kernel.org/r/20220214031314.100094-1-hongnan.li@linux.alibaba.com
Signed-off-by: hongnanli <hongnan.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/ocfs2/alloc.c | 2 +-
fs/ocfs2/aops.c | 2 +-
fs/ocfs2/cluster/nodemanager.c | 2 +-
fs/ocfs2/dir.c | 4 ++--
fs/ocfs2/file.c | 4 ++--
fs/ocfs2/inode.c | 2 +-
fs/ocfs2/localalloc.c | 6 +++---
fs/ocfs2/namei.c | 2 +-
fs/ocfs2/ocfs2.h | 4 ++--
fs/ocfs2/quota_global.c | 2 +-
fs/ocfs2/xattr.c | 2 +-
11 files changed, 16 insertions(+), 16 deletions(-)
--- a/fs/ocfs2/alloc.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/alloc.c
@@ -5981,7 +5981,7 @@ bail:
return status;
}
-/* Expects you to already be holding tl_inode->i_mutex */
+/* Expects you to already be holding tl_inode->i_rwsem */
int __ocfs2_flush_truncate_log(struct ocfs2_super *osb)
{
int status;
--- a/fs/ocfs2/aops.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/aops.c
@@ -2311,7 +2311,7 @@ static int ocfs2_dio_end_io_write(struct
down_write(&oi->ip_alloc_sem);
- /* Delete orphan before acquire i_mutex. */
+ /* Delete orphan before acquire i_rwsem. */
if (dwc->dw_orphaned) {
BUG_ON(dwc->dw_writer_pid != task_pid_nr(current));
--- a/fs/ocfs2/cluster/nodemanager.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/cluster/nodemanager.c
@@ -689,7 +689,7 @@ static struct config_group *o2nm_cluster
struct o2nm_node_group *ns = NULL;
struct config_group *o2hb_group = NULL, *ret = NULL;
- /* this runs under the parent dir's i_mutex; there can be only
+ /* this runs under the parent dir's i_rwsem; there can be only
* one caller in here at a time */
if (o2nm_single_cluster)
return ERR_PTR(-ENOSPC);
--- a/fs/ocfs2/dir.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/dir.c
@@ -1957,7 +1957,7 @@ bail_nolock:
}
/*
- * NOTE: this should always be called with parent dir i_mutex taken.
+ * NOTE: this should always be called with parent dir i_rwsem taken.
*/
int ocfs2_find_files_on_disk(const char *name,
int namelen,
@@ -2003,7 +2003,7 @@ int ocfs2_lookup_ino_from_name(struct in
* Return 0 if the name does not exist
* Return -EEXIST if the directory contains the name
*
- * Callers should have i_mutex + a cluster lock on dir
+ * Callers should have i_rwsem + a cluster lock on dir
*/
int ocfs2_check_dir_for_entry(struct inode *dir,
const char *name,
--- a/fs/ocfs2/file.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/file.c
@@ -270,7 +270,7 @@ int ocfs2_update_inode_atime(struct inod
/*
* Don't use ocfs2_mark_inode_dirty() here as we don't always
- * have i_mutex to guard against concurrent changes to other
+ * have i_rwsem to guard against concurrent changes to other
* inode fields.
*/
inode->i_atime = current_time(inode);
@@ -1065,7 +1065,7 @@ static int ocfs2_extend_file(struct inod
/*
* The alloc sem blocks people in read/write from reading our
* allocation until we're done changing it. We depend on
- * i_mutex to block other extend/truncate calls while we're
+ * i_rwsem to block other extend/truncate calls while we're
* here. We even have to hold it for sparse files because there
* might be some tail zeroing.
*/
--- a/fs/ocfs2/inode.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/inode.c
@@ -713,7 +713,7 @@ bail:
/*
* Serialize with orphan dir recovery. If the process doing
* recovery on this orphan dir does an iget() with the dir
- * i_mutex held, we'll deadlock here. Instead we detect this
+ * i_rwsem held, we'll deadlock here. Instead we detect this
* and exit early - recovery will wipe this inode for us.
*/
static int ocfs2_check_orphan_recovery_state(struct ocfs2_super *osb,
--- a/fs/ocfs2/localalloc.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/localalloc.c
@@ -606,7 +606,7 @@ out:
/*
* make sure we've got at least bits_wanted contiguous bits in the
- * local alloc. You lose them when you drop i_mutex.
+ * local alloc. You lose them when you drop i_rwsem.
*
* We will add ourselves to the transaction passed in, but may start
* our own in order to shift windows.
@@ -636,7 +636,7 @@ int ocfs2_reserve_local_alloc_bits(struc
/*
* We must double check state and allocator bits because
- * another process may have changed them while holding i_mutex.
+ * another process may have changed them while holding i_rwsem.
*/
spin_lock(&osb->osb_lock);
if (!ocfs2_la_state_enabled(osb) ||
@@ -1029,7 +1029,7 @@ enum ocfs2_la_event {
/*
* Given an event, calculate the size of our next local alloc window.
*
- * This should always be called under i_mutex of the local alloc inode
+ * This should always be called under i_rwsem of the local alloc inode
* so that local alloc disabling doesn't race with processes trying to
* use the allocator.
*
--- a/fs/ocfs2/namei.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/namei.c
@@ -476,7 +476,7 @@ leave:
ocfs2_free_alloc_context(meta_ac);
/*
- * We should call iput after the i_mutex of the bitmap been
+ * We should call iput after the i_rwsem of the bitmap been
* unlocked in ocfs2_free_alloc_context, or the
* ocfs2_delete_inode will mutex_lock again.
*/
--- a/fs/ocfs2/ocfs2.h~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/ocfs2.h
@@ -355,7 +355,7 @@ struct ocfs2_super
struct delayed_work la_enable_wq;
/*
- * Must hold local alloc i_mutex and osb->osb_lock to change
+ * Must hold local alloc i_rwsem and osb->osb_lock to change
* local_alloc_bits. Reads can be done under either lock.
*/
unsigned int local_alloc_bits;
@@ -430,7 +430,7 @@ struct ocfs2_super
atomic_t osb_tl_disable;
/*
* How many clusters in our truncate log.
- * It must be protected by osb_tl_inode->i_mutex.
+ * It must be protected by osb_tl_inode->i_rwsem.
*/
unsigned int truncated_clusters;
--- a/fs/ocfs2/quota_global.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/quota_global.c
@@ -36,7 +36,7 @@
* should be obeyed by all the functions:
* - any write of quota structure (either to local or global file) is protected
* by dqio_sem or dquot->dq_lock.
- * - any modification of global quota file holds inode cluster lock, i_mutex,
+ * - any modification of global quota file holds inode cluster lock, i_rwsem,
* and ip_alloc_sem of the global quota file (achieved by
* ocfs2_lock_global_qf). It also has to hold qinfo_lock.
* - an allocation of new blocks for local quota file is protected by
--- a/fs/ocfs2/xattr.c~fs-ocfs2-fix-comments-mentioning-i_mutex
+++ a/fs/ocfs2/xattr.c
@@ -7205,7 +7205,7 @@ out:
* Used for reflink a non-preserve-security file.
*
* It uses common api like ocfs2_xattr_set, so the caller
- * must not hold any lock expect i_mutex.
+ * must not hold any lock except i_rwsem.
*/
int ocfs2_init_security_and_acl(struct inode *dir,
struct inode *inode,
_
* [patch 006/227] doc: convert 'subsection' to 'section' in gfp.h
From: Andrew Morton @ 2022-03-22 21:38 UTC (permalink / raw)
To: trond.myklebust, philipp.reisner, paolo.valente, miklos,
lars.ellenberg, konishi.ryusuke, jlayton, jaegeuk, jack,
idryomov, fengguang.wu, djwong, chao, axboe, Anna.Schumaker,
neilb, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: NeilBrown <neilb@suse.de>
Subject: doc: convert 'subsection' to 'section' in gfp.h
Patch series "Remove remaining parts of congestion tracking code", v2.
This patch (of 11):
Various DOC: sections in gfp.h have subsection headers (~~~) but the place
where they are included in mm-api.rst does not have sections, only
chapters.
So convert to section headers (---) to avoid confusion. Specifically, if
sections are added later in mm-api.rst, an error results.
Link: https://lkml.kernel.org/r/164549971112.9187.16871723439770288255.stgit@noble.brown
Link: https://lkml.kernel.org/r/164549983733.9187.17894407453436115822.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/gfp.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
--- a/include/linux/gfp.h~doc-convert-subsection-to-section-in-gfph
+++ a/include/linux/gfp.h
@@ -79,7 +79,7 @@ struct vm_area_struct;
* DOC: Page mobility and placement hints
*
* Page mobility and placement hints
- * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * ---------------------------------
*
* These flags provide hints about how mobile the page is. Pages with similar
* mobility are placed within the same pageblocks to minimise problems due
@@ -112,7 +112,7 @@ struct vm_area_struct;
* DOC: Watermark modifiers
*
* Watermark modifiers -- controls access to emergency reserves
- * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * ------------------------------------------------------------
*
* %__GFP_HIGH indicates that the caller is high-priority and that granting
* the request is necessary before the system can make forward progress.
@@ -144,7 +144,7 @@ struct vm_area_struct;
* DOC: Reclaim modifiers
*
* Reclaim modifiers
- * ~~~~~~~~~~~~~~~~~
+ * -----------------
* Please note that all the following flags are only applicable to sleepable
* allocations (e.g. %GFP_NOWAIT and %GFP_ATOMIC will ignore them).
*
@@ -224,7 +224,7 @@ struct vm_area_struct;
* DOC: Action modifiers
*
* Action modifiers
- * ~~~~~~~~~~~~~~~~
+ * ----------------
*
* %__GFP_NOWARN suppresses allocation failure reports.
*
@@ -256,7 +256,7 @@ struct vm_area_struct;
* DOC: Useful GFP flag combinations
*
* Useful GFP flag combinations
- * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * ----------------------------
*
* Useful GFP flag combinations that are commonly used. It is recommended
* that subsystems start with one of these combinations and then set/clear
_
* [patch 007/227] mm: document and polish read-ahead code
From: Andrew Morton @ 2022-03-22 21:38 UTC (permalink / raw)
To: trond.myklebust, philipp.reisner, paolo.valente, miklos,
lars.ellenberg, konishi.ryusuke, jlayton, jaegeuk, jack,
idryomov, fengguang.wu, djwong, chao, axboe, Anna.Schumaker,
neilb, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: NeilBrown <neilb@suse.de>
Subject: mm: document and polish read-ahead code
Add some "big-picture" documentation for read-ahead and polish the code to
make it fit this documentation.
The meaning of ->async_size is clarified to match its name, i.e. any
request to ->readahead() has a sync part and an async part. The caller
will wait for the sync pages to complete, but will not wait for the async
pages. The first async page is still marked PG_readahead.
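As a rough illustration of the sync/async split, the following userspace
sketch models one request with just the three window fields. The struct and
helper names (ra_window, ra_sync_pages, ra_marker_index) are hypothetical and
are not kernel API; the only facts taken from this patch are that ->size
counts all pages in the request, ->async_size counts the async tail, and the
first async page carries PG_readahead.

```c
#include <assert.h>

/* Hypothetical userspace model of the window fields in struct file_ra_state. */
struct ra_window {
	unsigned long start;      /* index of the first page in the request */
	unsigned long size;       /* total pages in the request */
	unsigned long async_size; /* pages in the async tail */
};

/* Pages the caller waits for: everything before the async tail. */
static unsigned long ra_sync_pages(const struct ra_window *w)
{
	assert(w->async_size <= w->size);
	return w->size - w->async_size;
}

/*
 * Index of the first async page, i.e. the page that would carry
 * PG_readahead.  Only meaningful when the async tail is non-empty.
 */
static unsigned long ra_marker_index(const struct ra_window *w)
{
	assert(w->async_size > 0 && w->async_size <= w->size);
	return w->start + ra_sync_pages(w);
}
```

For a request starting at page 100 with size 32 and async_size 24, this model
gives 8 sync pages and puts the marker on page 108. A fully asynchronous
request (async_size equal to size) has no sync pages, so the marker sits on
its first page.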
Note that the current function names page_cache_sync_ra() and
page_cache_async_ra() are misleading. All ra requests are partly sync and
partly async, though either part can be empty. A page_cache_sync_ra() request
will usually set ->async_size non-zero, implying it is not all
synchronous.
When a non-zero req_count is passed to page_cache_async_ra(), the
implication is that some prefix of the request is synchronous, though the
calculation made there is incorrect - I haven't tried to fix it.
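The overview comment added to mm/readahead.c by this patch describes the
async tail as the determined request size minus the explicitly requested
size, with zero used if that would be negative. A minimal sketch of that
arithmetic follows; ra_async_tail is a hypothetical helper, not a kernel
function, and it inherits the caveat stated above: the result is wrong when
the region does not start at the accessed page.

```c
/*
 * Hypothetical helper: pages left for the async tail once the
 * explicitly requested pages are subtracted from the request size.
 * Unsigned subtraction would wrap, so the clamp is done by comparison.
 */
static unsigned long ra_async_tail(unsigned long region_size,
				   unsigned long req_size)
{
	return region_size > req_size ? region_size - req_size : 0;
}
```

With a 32-page region and an 8-page explicit request this yields a 24-page
async tail; when the explicit request is at least as large as the region, the
tail is empty and the whole request is synchronous.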
Link: https://lkml.kernel.org/r/164549983734.9187.11586890887006601405.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/core-api/mm-api.rst | 19 ++++-
Documentation/filesystems/vfs.rst | 16 ++--
include/linux/fs.h | 9 +-
mm/readahead.c | 99 ++++++++++++++++++++++++++++
4 files changed, 133 insertions(+), 10 deletions(-)
--- a/Documentation/core-api/mm-api.rst~mm-document-and-polish-read-ahead-code
+++ a/Documentation/core-api/mm-api.rst
@@ -58,15 +58,30 @@ Virtually Contiguous Mappings
File Mapping and Page Cache
===========================
-.. kernel-doc:: mm/readahead.c
- :export:
+Filemap
+-------
.. kernel-doc:: mm/filemap.c
:export:
+Readahead
+---------
+
+.. kernel-doc:: mm/readahead.c
+ :doc: Readahead Overview
+
+.. kernel-doc:: mm/readahead.c
+ :export:
+
+Writeback
+---------
+
.. kernel-doc:: mm/page-writeback.c
:export:
+Truncate
+--------
+
.. kernel-doc:: mm/truncate.c
:export:
--- a/Documentation/filesystems/vfs.rst~mm-document-and-polish-read-ahead-code
+++ a/Documentation/filesystems/vfs.rst
@@ -806,12 +806,16 @@ cache in your filesystem. The following
object. The pages are consecutive in the page cache and are
locked. The implementation should decrement the page refcount
after starting I/O on each page. Usually the page will be
- unlocked by the I/O completion handler. If the filesystem decides
- to stop attempting I/O before reaching the end of the readahead
- window, it can simply return. The caller will decrement the page
- refcount and unlock the remaining pages for you. Set PageUptodate
- if the I/O completes successfully. Setting PageError on any page
- will be ignored; simply unlock the page if an I/O error occurs.
+ unlocked by the I/O completion handler. The set of pages is
+ divided into some sync pages followed by some async pages;
+ rac->ra->async_size gives the number of async pages. The
+ filesystem should attempt to read all sync pages but may decide
+ to stop once it reaches the async pages. If it does decide to
+ stop attempting I/O, it can simply return. The caller will
+ remove the remaining pages from the address space, unlock them
+ and decrement the page refcount. Set PageUptodate if the I/O
+ completes successfully. Setting PageError on any page will be
+ ignored; simply unlock the page if an I/O error occurs.
``readpages``
called by the VM to read pages associated with the address_space
--- a/include/linux/fs.h~mm-document-and-polish-read-ahead-code
+++ a/include/linux/fs.h
@@ -930,10 +930,15 @@ struct fown_struct {
* struct file_ra_state - Track a file's readahead state.
* @start: Where the most recent readahead started.
* @size: Number of pages read in the most recent readahead.
- * @async_size: Start next readahead when this many pages are left.
- * @ra_pages: Maximum size of a readahead request.
+ * @async_size: Number of pages that were/are not needed immediately
+ * and so were/are genuinely "ahead". Start next readahead when
+ * the first of these pages is accessed.
+ * @ra_pages: Maximum size of a readahead request, copied from the bdi.
* @mmap_miss: How many mmap accesses missed in the page cache.
* @prev_pos: The last byte in the most recent read request.
+ *
+ * When this structure is passed to ->readahead(), the "most recent"
+ * readahead means the current readahead.
*/
struct file_ra_state {
pgoff_t start;
--- a/mm/readahead.c~mm-document-and-polish-read-ahead-code
+++ a/mm/readahead.c
@@ -8,6 +8,105 @@
* Initial version.
*/
+/**
+ * DOC: Readahead Overview
+ *
+ * Readahead is used to read content into the page cache before it is
+ * explicitly requested by the application. Readahead only ever
+ * attempts to read pages that are not yet in the page cache. If a
+ * page is present but not up-to-date, readahead will not try to read
+ * it. In that case a simple ->readpage() will be requested.
+ *
+ * Readahead is triggered when an application read request (whether a
+ * system call or a page fault) finds that the requested page is not in
+ * the page cache, or that it is in the page cache and has the
+ * %PG_readahead flag set. This flag indicates that the page was loaded
+ * as part of a previous read-ahead request and now that it has been
+ * accessed, it is time for the next read-ahead.
+ *
+ * Each readahead request is partly synchronous read, and partly async
+ * read-ahead. This is reflected in the struct file_ra_state which
+ * contains ->size being the total number of pages, and ->async_size
+ * which is the number of pages in the async section. The first page in
+ * this async section will have %PG_readahead set as a trigger for a
+ * subsequent read ahead. Once a series of sequential reads has been
+ * established, there should be no need for a synchronous component and
+ * all read ahead requests will be fully asynchronous.
+ *
+ * When either of the triggers causes a readahead, three numbers need to
+ * be determined: the start of the region, the size of the region, and
+ * the size of the async tail.
+ *
+ * The start of the region is simply the first page address at or after
+ * the accessed address, which is not currently populated in the page
+ * cache. This is found with a simple search in the page cache.
+ *
+ * The size of the async tail is determined by subtracting the size that
+ * was explicitly requested from the determined request size, unless
+ * this would be less than zero - then zero is used. NOTE THIS
+ * CALCULATION IS WRONG WHEN THE START OF THE REGION IS NOT THE ACCESSED
+ * PAGE.
+ *
+ * The size of the region is normally determined from the size of the
+ * previous readahead which loaded the preceding pages. This may be
+ * discovered from the struct file_ra_state for simple sequential reads,
+ * or from examining the state of the page cache when multiple
+ * sequential reads are interleaved. Specifically: where the readahead
+ * was triggered by the %PG_readahead flag, the size of the previous
+ * readahead is assumed to be the number of pages from the triggering
+ * page to the start of the new readahead. In these cases, the size of
+ * the previous readahead is scaled, often doubled, for the new
+ * readahead, though see get_next_ra_size() for details.
+ *
+ * If the size of the previous read cannot be determined, the number of
+ * preceding pages in the page cache is used to estimate the size of
+ * a previous read. This estimate could easily be misled by random
+ * reads being coincidentally adjacent, so it is ignored unless it is
+ * larger than the current request, and it is not scaled up, unless it
+ * is at the start of file.
+ *
+ * In general read ahead is accelerated at the start of the file, as
+ * reads from there are often sequential. There are other minor
+ * adjustments to the read ahead size in various special cases and these
+ * are best discovered by reading the code.
+ *
+ * The above calculation determines the readahead, to which any requested
+ * read size may be added.
+ *
+ * Readahead requests are sent to the filesystem using the ->readahead()
+ * address space operation, for which mpage_readahead() is a canonical
+ * implementation. ->readahead() should normally initiate reads on all
+ * pages, but may fail to read any or all pages without causing an IO
+ * error. The page cache reading code will issue a ->readpage() request
+ * for any page which ->readahead() does not provide, and only an error
+ * from this will be final.
+ *
+ * ->readahead() will generally call readahead_page() repeatedly to get
+ * each page from those prepared for read ahead. It may fail to read a
+ * page by:
+ *
+ * * not calling readahead_page() sufficiently many times, effectively
+ * ignoring some pages, as might be appropriate if the path to
+ * storage is congested.
+ *
+ * * failing to actually submit a read request for a given page,
+ * possibly due to insufficient resources, or
+ *
+ * * getting an error during subsequent processing of a request.
+ *
+ * In the last two cases, the page should be unlocked to indicate that
+ * the read attempt has failed. In the first case the page will be
+ * unlocked by the caller.
+ *
+ * Those pages not in the final ``async_size`` of the request should be
+ * considered to be important and ->readahead() should not fail them due
+ * to congestion or temporary resource unavailability, but should wait
+ * for necessary resources (e.g. memory or indexing information) to
+ * become available. Pages in the final ``async_size`` may be
+ * considered less urgent and failure to read them is more acceptable.
+ * They will eventually be read individually using ->readpage().
+ */
+
#include <linux/kernel.h>
#include <linux/dax.h>
#include <linux/gfp.h>
_
* [patch 008/227] mm: improve cleanup when ->readpages doesn't process all pages
From: Andrew Morton @ 2022-03-22 21:38 UTC (permalink / raw)
To: trond.myklebust, philipp.reisner, paolo.valente, miklos,
lars.ellenberg, konishi.ryusuke, jlayton, jaegeuk, jack,
idryomov, fengguang.wu, djwong, chao, axboe, Anna.Schumaker,
neilb, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: NeilBrown <neilb@suse.de>
Subject: mm: improve cleanup when ->readpages doesn't process all pages
If ->readpages doesn't process all the pages, then it is best to act as
though they weren't requested so that a subsequent readahead can try
again.
So:
- remove any 'ahead' pages from the page cache so they can be loaded
with ->readahead() rather than multiple ->readpage()s
- update the file_ra_state to reflect the reads that were actually
submitted.
This allows ->readpages() to abort early due to, e.g., congestion, which
will then allow us to remove the inode_read_congested() test from
page_cache_async_ra().
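As a rough illustration of the accounting described above, here is a minimal user-space sketch of the cleanup loop. It is not kernel code: struct ra_model and ra_model_cleanup() are hypothetical names, the two fields stand in for the size and async_size members of file_ra_state, and "removing a page from the page cache" is reduced to incrementing a counter.

```c
#include <assert.h>

/* Hypothetical stand-in for the two file_ra_state fields the patch adjusts. */
struct ra_model {
	unsigned int size;       /* pages in the current read-ahead window */
	unsigned int async_size; /* trailing pages considered less urgent */
};

/*
 * Model of the cleanup loop in read_pages(): 'leftover' is the number of
 * pages that ->readahead() did not consume.  Every leftover page shrinks
 * the window; while async pages remain, the page is also counted as
 * removed from the page cache.  Returns the number of removed pages.
 */
static unsigned int ra_model_cleanup(struct ra_model *ra, unsigned int leftover)
{
	unsigned int removed = 0;

	assert(leftover <= ra->size);
	while (leftover--) {
		ra->size -= 1;
		if (ra->async_size > 0) {
			ra->async_size -= 1;
			removed++;
		}
	}
	return removed;
}
```

In this model, leftover pages beyond the async tail shrink the window but stay in the page cache, which matches the patch comment that such pages are later read individually.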
Link: https://lkml.kernel.org/r/164549983736.9187.16755913785880819183.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/readahead.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
--- a/mm/readahead.c~mm-improve-cleanup-when-readpages-doesnt-process-all-pages
+++ a/mm/readahead.c
@@ -104,7 +104,13 @@
* for necessary resources (e.g. memory or indexing information) to
* become available. Pages in the final ``async_size`` may be
* considered less urgent and failure to read them is more acceptable.
- * They will eventually be read individually using ->readpage().
+ * In this case it is best to use delete_from_page_cache() to remove the
+ * pages from the page cache as is automatically done for pages that
+ * were not fetched with readahead_page(). This will allow a
+ * subsequent synchronous read ahead request to try them again. If they
+ * are left in the page cache, then they will be read individually using
+ * ->readpage().
+ *
*/
#include <linux/kernel.h>
@@ -226,8 +232,17 @@ static void read_pages(struct readahead_
if (aops->readahead) {
aops->readahead(rac);
- /* Clean up the remaining pages */
+ /*
+ * Clean up the remaining pages. The sizes in ->ra
+ * may be used to size the next read-ahead, so make sure
+ * they accurately reflect what happened.
+ */
while ((page = readahead_page(rac))) {
+ rac->ra->size -= 1;
+ if (rac->ra->async_size > 0) {
+ rac->ra->async_size -= 1;
+ delete_from_page_cache(page);
+ }
unlock_page(page);
put_page(page);
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 009/227] fuse: remove reliance on bdi congestion
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:38 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:38 UTC (permalink / raw)
To: trond.myklebust, philipp.reisner, paolo.valente, miklos,
lars.ellenberg, konishi.ryusuke, jlayton, jaegeuk, jack,
idryomov, fengguang.wu, djwong, chao, axboe, Anna.Schumaker,
neilb, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: NeilBrown <neilb@suse.de>
Subject: fuse: remove reliance on bdi congestion
The bdi congestion tracking is not widely used and will be removed.
Fuse is one of a small number of filesystems that uses it, setting both
the sync (read) and async (write) congestion flags at what it determines
are appropriate times.
The only remaining effect of the sync flag is to cause read-ahead to be
skipped. The only remaining effect of the async flag is to cause (some)
WB_SYNC_NONE writes to be skipped.
So instead of setting the flags, change:
- .readahead to stop when it has submitted all non-async pages
for read.
- .writepages to do nothing if WB_SYNC_NONE and the flag would be set
- .writepage to return AOP_WRITEPAGE_ACTIVATE if WB_SYNC_NONE
and the flag would be set.
The writepages change causes a behavioural change in that pageout() can
now return PAGE_ACTIVATE instead of PAGE_KEEP, so SetPageActive() will be
called on the page which (I think) will further delay the next attempt at
writeout. This might be a good thing.
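The three decisions listed above all reduce to one predicate on the connection's background-request count. The following is a minimal user-space sketch of that logic only; struct fuse_model, enum model_wb_mode and the fuse_model_*() helpers are hypothetical names, and just the two counters are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two struct fuse_conn counters used here. */
struct fuse_model {
	unsigned int num_background;
	unsigned int congestion_threshold;
};

/* Hypothetical stand-in for the writeback_control sync_mode values. */
enum model_wb_mode { MODEL_WB_SYNC_NONE, MODEL_WB_SYNC_ALL };

/* The condition under which the old code would have set the bdi flags. */
static bool fuse_model_congested(const struct fuse_model *fc)
{
	return fc->num_background >= fc->congestion_threshold;
}

/* Read-ahead stops once only async pages remain and we are congested. */
static bool fuse_model_skip_readahead(const struct fuse_model *fc,
				      unsigned int async_size,
				      unsigned int pages_left)
{
	return fuse_model_congested(fc) && async_size >= pages_left;
}

/* Non-integrity writeback is skipped while congested. */
static bool fuse_model_skip_writeback(const struct fuse_model *fc,
				      enum model_wb_mode mode)
{
	return mode == MODEL_WB_SYNC_NONE && fuse_model_congested(fc);
}
```

In the patch itself the skip takes a different form in each hook: .readahead breaks out of its loop, .writepages returns 0, and .writepage returns AOP_WRITEPAGE_ACTIVATE.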
Link: https://lkml.kernel.org/r/164549983737.9187.2627117501000365074.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/fuse/control.c | 17 -----------------
fs/fuse/dev.c | 8 --------
fs/fuse/file.c | 17 +++++++++++++++++
3 files changed, 17 insertions(+), 25 deletions(-)
--- a/fs/fuse/control.c~fuse-remove-reliance-on-bdi-congestion
+++ a/fs/fuse/control.c
@@ -164,7 +164,6 @@ static ssize_t fuse_conn_congestion_thre
{
unsigned val;
struct fuse_conn *fc;
- struct fuse_mount *fm;
ssize_t ret;
ret = fuse_conn_limit_write(file, buf, count, ppos, &val,
@@ -178,22 +177,6 @@ static ssize_t fuse_conn_congestion_thre
down_read(&fc->killsb);
spin_lock(&fc->bg_lock);
fc->congestion_threshold = val;
-
- /*
- * Get any fuse_mount belonging to this fuse_conn; s_bdi is
- * shared between all of them
- */
-
- if (!list_empty(&fc->mounts)) {
- fm = list_first_entry(&fc->mounts, struct fuse_mount, fc_entry);
- if (fc->num_background < fc->congestion_threshold) {
- clear_bdi_congested(fm->sb->s_bdi, BLK_RW_SYNC);
- clear_bdi_congested(fm->sb->s_bdi, BLK_RW_ASYNC);
- } else {
- set_bdi_congested(fm->sb->s_bdi, BLK_RW_SYNC);
- set_bdi_congested(fm->sb->s_bdi, BLK_RW_ASYNC);
- }
- }
spin_unlock(&fc->bg_lock);
up_read(&fc->killsb);
fuse_conn_put(fc);
--- a/fs/fuse/dev.c~fuse-remove-reliance-on-bdi-congestion
+++ a/fs/fuse/dev.c
@@ -315,10 +315,6 @@ void fuse_request_end(struct fuse_req *r
wake_up(&fc->blocked_waitq);
}
- if (fc->num_background == fc->congestion_threshold && fm->sb) {
- clear_bdi_congested(fm->sb->s_bdi, BLK_RW_SYNC);
- clear_bdi_congested(fm->sb->s_bdi, BLK_RW_ASYNC);
- }
fc->num_background--;
fc->active_background--;
flush_bg_queue(fc);
@@ -540,10 +536,6 @@ static bool fuse_request_queue_backgroun
fc->num_background++;
if (fc->num_background == fc->max_background)
fc->blocked = 1;
- if (fc->num_background == fc->congestion_threshold && fm->sb) {
- set_bdi_congested(fm->sb->s_bdi, BLK_RW_SYNC);
- set_bdi_congested(fm->sb->s_bdi, BLK_RW_ASYNC);
- }
list_add_tail(&req->list, &fc->bg_queue);
flush_bg_queue(fc);
queued = true;
--- a/fs/fuse/file.c~fuse-remove-reliance-on-bdi-congestion
+++ a/fs/fuse/file.c
@@ -966,6 +966,14 @@ static void fuse_readahead(struct readah
struct fuse_io_args *ia;
struct fuse_args_pages *ap;
+ if (fc->num_background >= fc->congestion_threshold &&
+ rac->ra->async_size >= readahead_count(rac))
+ /*
+ * Congested and only async pages left, so skip the
+ * rest.
+ */
+ break;
+
nr_pages = readahead_count(rac) - nr_pages;
if (nr_pages > max_pages)
nr_pages = max_pages;
@@ -1959,6 +1967,7 @@ err:
static int fuse_writepage(struct page *page, struct writeback_control *wbc)
{
+ struct fuse_conn *fc = get_fuse_conn(page->mapping->host);
int err;
if (fuse_page_is_writeback(page->mapping->host, page->index)) {
@@ -1974,6 +1983,10 @@ static int fuse_writepage(struct page *p
return 0;
}
+ if (wbc->sync_mode == WB_SYNC_NONE &&
+ fc->num_background >= fc->congestion_threshold)
+ return AOP_WRITEPAGE_ACTIVATE;
+
err = fuse_writepage_locked(page);
unlock_page(page);
@@ -2227,6 +2240,10 @@ static int fuse_writepages(struct addres
if (fuse_is_bad(inode))
goto out;
+ if (wbc->sync_mode == WB_SYNC_NONE &&
+ fc->num_background >= fc->congestion_threshold)
+ return 0;
+
data.inode = inode;
data.wpa = NULL;
data.ff = NULL;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 010/227] nfs: remove reliance on bdi congestion
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: trond.myklebust, philipp.reisner, paolo.valente, miklos,
lars.ellenberg, konishi.ryusuke, jlayton, jaegeuk, jack,
idryomov, fengguang.wu, djwong, chao, axboe, Anna.Schumaker,
neilb, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: NeilBrown <neilb@suse.de>
Subject: nfs: remove reliance on bdi congestion
The bdi congestion tracking is not widely used and will be removed.
NFS is one of a small number of filesystems that uses it, setting just the
async (write) congestion flag at what it determines are appropriate times.
The only remaining effect of the async flag is to cause (some)
WB_SYNC_NONE writes to be skipped.
So instead of setting the flag, set an internal flag and change:
- .writepages to do nothing if WB_SYNC_NONE and the flag is set
- .writepage to return AOP_WRITEPAGE_ACTIVATE if WB_SYNC_NONE
and the flag is set.
The writepages change causes a behavioural change in that pageout() can
now return PAGE_ACTIVATE instead of PAGE_KEEP, so SetPageActive() will be
called on the page which (I think) will further delay the next attempt at
writeout. This might be a good thing.
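The internal flag behaves as a simple hysteresis latch on the writeback counter: it is set when the count rises above an "on" threshold and cleared only when it falls below a lower "off" threshold. A minimal user-space sketch follows; struct wb_model and its helpers are hypothetical names, the thresholds are plain parameters rather than the real NFS_CONGESTION_ON_THRESH and NFS_CONGESTION_OFF_THRESH values, and the atomic counter is modelled with an ordinary long.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-server writeback accounting. */
struct wb_model {
	long writeback;       /* pages currently under writeback */
	long on_thresh;       /* set the flag above this count */
	long off_thresh;      /* clear the flag below this count */
	bool write_congested; /* the internal flag replacing the bdi bit */
};

/* A page enters writeback: latch the flag once the count exceeds on_thresh. */
static void wb_model_start_page(struct wb_model *s)
{
	if (++s->writeback > s->on_thresh)
		s->write_congested = true;
}

/* A page finishes writeback: release the flag once below off_thresh. */
static void wb_model_end_page(struct wb_model *s)
{
	assert(s->writeback > 0);
	if (--s->writeback < s->off_thresh)
		s->write_congested = false;
}
```

Between the two thresholds the flag keeps its previous value, so the congested state does not flip on every page while the count hovers around a single limit.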
Link: https://lkml.kernel.org/r/164549983738.9187.3972219847989393182.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/nfs/write.c | 14 +++++++++++---
include/linux/nfs_fs_sb.h | 1 +
2 files changed, 12 insertions(+), 3 deletions(-)
--- a/fs/nfs/write.c~nfs-remove-reliance-on-bdi-congestion
+++ a/fs/nfs/write.c
@@ -417,7 +417,7 @@ static void nfs_set_page_writeback(struc
if (atomic_long_inc_return(&nfss->writeback) >
NFS_CONGESTION_ON_THRESH)
- set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
+ nfss->write_congested = 1;
}
static void nfs_end_page_writeback(struct nfs_page *req)
@@ -433,7 +433,7 @@ static void nfs_end_page_writeback(struc
end_page_writeback(req->wb_page);
if (atomic_long_dec_return(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH)
- clear_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
+ nfss->write_congested = 0;
}
/*
@@ -672,6 +672,10 @@ static int nfs_writepage_locked(struct p
struct inode *inode = page_file_mapping(page)->host;
int err;
+ if (wbc->sync_mode == WB_SYNC_NONE &&
+ NFS_SERVER(inode)->write_congested)
+ return AOP_WRITEPAGE_ACTIVATE;
+
nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGE);
nfs_pageio_init_write(&pgio, inode, 0,
false, &nfs_async_write_completion_ops);
@@ -719,6 +723,10 @@ int nfs_writepages(struct address_space
int priority = 0;
int err;
+ if (wbc->sync_mode == WB_SYNC_NONE &&
+ NFS_SERVER(inode)->write_congested)
+ return 0;
+
nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGES);
if (!(mntflags & NFS_MOUNT_WRITE_EAGER) || wbc->for_kupdate ||
@@ -1893,7 +1901,7 @@ static void nfs_commit_release_pages(str
}
nfss = NFS_SERVER(data->inode);
if (atomic_long_read(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH)
- clear_bdi_congested(inode_to_bdi(data->inode), BLK_RW_ASYNC);
+ nfss->write_congested = 0;
nfs_init_cinfo(&cinfo, data->inode, data->dreq);
nfs_commit_end(cinfo.mds);
--- a/include/linux/nfs_fs_sb.h~nfs-remove-reliance-on-bdi-congestion
+++ a/include/linux/nfs_fs_sb.h
@@ -138,6 +138,7 @@ struct nfs_server {
struct nlm_host *nlm_host; /* NLM client handle */
struct nfs_iostats __percpu *io_stats; /* I/O statistics */
atomic_long_t writeback; /* number of writeback pages */
+ unsigned int write_congested;/* flag set when writeback gets too high */
unsigned int flags; /* various flags */
/* The following are for internal use only. Also see uapi/linux/nfs_mount.h */
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 011/227] ceph: remove reliance on bdi congestion
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: trond.myklebust, philipp.reisner, paolo.valente, miklos,
lars.ellenberg, konishi.ryusuke, jlayton, jaegeuk, jack,
idryomov, fengguang.wu, djwong, chao, axboe, Anna.Schumaker,
neilb, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: NeilBrown <neilb@suse.de>
Subject: ceph: remove reliance on bdi congestion
The bdi congestion tracking is not widely used and will be removed.
CephFS is one of a small number of filesystems that uses it, setting just
the async (write) congestion flag at what it determines are appropriate
times.
The only remaining effect of the async flag is to cause (some)
WB_SYNC_NONE writes to be skipped.
So instead of setting the flag, set an internal flag and change:
- .writepages to do nothing if WB_SYNC_NONE and the flag is set
- .writepage to return AOP_WRITEPAGE_ACTIVATE if WB_SYNC_NONE
and the flag is set.
The writepages change causes a behavioural change in that pageout() can
now return PAGE_ACTIVATE instead of PAGE_KEEP, so SetPageActive() will be
called on the page which (I think) will further delay the next attempt at
writeout. This might be a good thing.
Link: https://lkml.kernel.org/r/164549983739.9187.14895675781408171186.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/ceph/addr.c | 22 +++++++++++++---------
fs/ceph/super.c | 1 +
fs/ceph/super.h | 1 +
3 files changed, 15 insertions(+), 9 deletions(-)
--- a/fs/ceph/addr.c~ceph-remove-reliance-on-bdi-congestion
+++ a/fs/ceph/addr.c
@@ -563,7 +563,7 @@ static int writepage_nounlock(struct pag
if (atomic_long_inc_return(&fsc->writeback_count) >
CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
- set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
+ fsc->write_congested = true;
req = ceph_osdc_new_request(osdc, &ci->i_layout, ceph_vino(inode), page_off, &len, 0, 1,
CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE, snapc,
@@ -623,7 +623,7 @@ static int writepage_nounlock(struct pag
if (atomic_long_dec_return(&fsc->writeback_count) <
CONGESTION_OFF_THRESH(fsc->mount_options->congestion_kb))
- clear_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
+ fsc->write_congested = false;
return err;
}
@@ -635,6 +635,10 @@ static int ceph_writepage(struct page *p
BUG_ON(!inode);
ihold(inode);
+ if (wbc->sync_mode == WB_SYNC_NONE &&
+ ceph_inode_to_client(inode)->write_congested)
+ return AOP_WRITEPAGE_ACTIVATE;
+
wait_on_page_fscache(page);
err = writepage_nounlock(page, wbc);
@@ -707,8 +711,7 @@ static void writepages_finish(struct cep
if (atomic_long_dec_return(&fsc->writeback_count) <
CONGESTION_OFF_THRESH(
fsc->mount_options->congestion_kb))
- clear_bdi_congested(inode_to_bdi(inode),
- BLK_RW_ASYNC);
+ fsc->write_congested = false;
ceph_put_snap_context(detach_page_private(page));
end_page_writeback(page);
@@ -760,6 +763,10 @@ static int ceph_writepages_start(struct
bool done = false;
bool caching = ceph_is_cache_enabled(inode);
+ if (wbc->sync_mode == WB_SYNC_NONE &&
+ fsc->write_congested)
+ return 0;
+
dout("writepages_start %p (mode=%s)\n", inode,
wbc->sync_mode == WB_SYNC_NONE ? "NONE" :
(wbc->sync_mode == WB_SYNC_ALL ? "ALL" : "HOLD"));
@@ -954,11 +961,8 @@ get_more_pages:
if (atomic_long_inc_return(&fsc->writeback_count) >
CONGESTION_ON_THRESH(
- fsc->mount_options->congestion_kb)) {
- set_bdi_congested(inode_to_bdi(inode),
- BLK_RW_ASYNC);
- }
-
+ fsc->mount_options->congestion_kb))
+ fsc->write_congested = true;
pages[locked_pages++] = page;
pvec.pages[i] = NULL;
--- a/fs/ceph/super.c~ceph-remove-reliance-on-bdi-congestion
+++ a/fs/ceph/super.c
@@ -802,6 +802,7 @@ static struct ceph_fs_client *create_fs_
fsc->have_copy_from2 = true;
atomic_long_set(&fsc->writeback_count, 0);
+ fsc->write_congested = false;
err = -ENOMEM;
/*
--- a/fs/ceph/super.h~ceph-remove-reliance-on-bdi-congestion
+++ a/fs/ceph/super.h
@@ -121,6 +121,7 @@ struct ceph_fs_client {
struct ceph_mds_client *mdsc;
atomic_long_t writeback_count;
+ bool write_congested;
struct workqueue_struct *inode_wq;
struct workqueue_struct *cap_wq;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 011/227] ceph: remove reliance on bdi congestion
@ 2022-03-22 21:39 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: trond.myklebust, philipp.reisner, paolo.valente, miklos,
lars.ellenberg, konishi.ryusuke, jlayton, jaegeuk, jack,
idryomov, fengguang.wu, djwong, chao, axboe, Anna.Schumaker,
neilb, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: NeilBrown <neilb@suse.de>
Subject: ceph: remove reliance on bdi congestion
The bdi congestion tracking is not widely used and will be removed.
CephFS is one of a small number of filesystems that uses it, setting just
the async (write) congestion flag at what it determines are appropriate
times.
The only remaining effect of the async flag is to cause (some)
WB_SYNC_NONE writes to be skipped.
So instead of setting the flag, set an internal flag and change:
- .writepages to do nothing if WB_SYNC_NONE and the flag is set
- .writepage to return AOP_WRITEPAGE_ACTIVATE if WB_SYNC_NONE
and the flag is set.
The writepages change causes a behavioural change in that pageout() can
now return PAGE_ACTIVATE instead of PAGE_KEEP, so SetPageActive() will be
called on the page which (I think) wil further delay the next attempt at
writeout. This might be a good thing.
Link: https://lkml.kernel.org/r/164549983739.9187.14895675781408171186.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/ceph/addr.c | 22 +++++++++++++---------
fs/ceph/super.c | 1 +
fs/ceph/super.h | 1 +
3 files changed, 15 insertions(+), 9 deletions(-)
--- a/fs/ceph/addr.c~ceph-remove-reliance-on-bdi-congestion
+++ a/fs/ceph/addr.c
@@ -563,7 +563,7 @@ static int writepage_nounlock(struct pag
if (atomic_long_inc_return(&fsc->writeback_count) >
CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
- set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
+ fsc->write_congested = true;
req = ceph_osdc_new_request(osdc, &ci->i_layout, ceph_vino(inode), page_off, &len, 0, 1,
CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE, snapc,
@@ -623,7 +623,7 @@ static int writepage_nounlock(struct pag
if (atomic_long_dec_return(&fsc->writeback_count) <
CONGESTION_OFF_THRESH(fsc->mount_options->congestion_kb))
- clear_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
+ fsc->write_congested = false;
return err;
}
@@ -635,6 +635,10 @@ static int ceph_writepage(struct page *p
BUG_ON(!inode);
ihold(inode);
+ if (wbc->sync_mode == WB_SYNC_NONE &&
+ ceph_inode_to_client(inode)->write_congested)
+ return AOP_WRITEPAGE_ACTIVATE;
+
wait_on_page_fscache(page);
err = writepage_nounlock(page, wbc);
@@ -707,8 +711,7 @@ static void writepages_finish(struct cep
if (atomic_long_dec_return(&fsc->writeback_count) <
CONGESTION_OFF_THRESH(
fsc->mount_options->congestion_kb))
- clear_bdi_congested(inode_to_bdi(inode),
- BLK_RW_ASYNC);
+ fsc->write_congested = false;
ceph_put_snap_context(detach_page_private(page));
end_page_writeback(page);
@@ -760,6 +763,10 @@ static int ceph_writepages_start(struct
bool done = false;
bool caching = ceph_is_cache_enabled(inode);
+ if (wbc->sync_mode == WB_SYNC_NONE &&
+ fsc->write_congested)
+ return 0;
+
dout("writepages_start %p (mode=%s)\n", inode,
wbc->sync_mode == WB_SYNC_NONE ? "NONE" :
(wbc->sync_mode == WB_SYNC_ALL ? "ALL" : "HOLD"));
@@ -954,11 +961,8 @@ get_more_pages:
if (atomic_long_inc_return(&fsc->writeback_count) >
CONGESTION_ON_THRESH(
- fsc->mount_options->congestion_kb)) {
- set_bdi_congested(inode_to_bdi(inode),
- BLK_RW_ASYNC);
- }
-
+ fsc->mount_options->congestion_kb))
+ fsc->write_congested = true;
pages[locked_pages++] = page;
pvec.pages[i] = NULL;
--- a/fs/ceph/super.c~ceph-remove-reliance-on-bdi-congestion
+++ a/fs/ceph/super.c
@@ -802,6 +802,7 @@ static struct ceph_fs_client *create_fs_
fsc->have_copy_from2 = true;
atomic_long_set(&fsc->writeback_count, 0);
+ fsc->write_congested = false;
err = -ENOMEM;
/*
--- a/fs/ceph/super.h~ceph-remove-reliance-on-bdi-congestion
+++ a/fs/ceph/super.h
@@ -121,6 +121,7 @@ struct ceph_fs_client {
struct ceph_mds_client *mdsc;
atomic_long_t writeback_count;
+ bool write_congested;
struct workqueue_struct *inode_wq;
struct workqueue_struct *cap_wq;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 012/227] remove inode_congested()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: trond.myklebust, philipp.reisner, paolo.valente, miklos,
lars.ellenberg, konishi.ryusuke, jlayton, jaegeuk, jack,
idryomov, fengguang.wu, djwong, chao, axboe, Anna.Schumaker,
neilb, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: NeilBrown <neilb@suse.de>
Subject: remove inode_congested()
inode_congested() reports if the backing-device for the inode is
congested. No bdi reports congestion any more, so this always returns
'false'.
So remove inode_congested() and related functions, and remove the call
sites, assuming that inode_congested() always returns 'false'.
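The call-site simplification described above can be checked with a small user-space model. This is only a sketch, not kernel code: the model_* names and MODEL_PF_SWAPWRITE are hypothetical stand-ins, and the task and inode state is reduced to plain arguments. It mirrors the shape of the may_write_to_inode() helper that this patch deletes and shows that, once the congestion predicate is constant 'false', the helper allows the write for every input.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's PF_SWAPWRITE flag bit. */
#define MODEL_PF_SWAPWRITE 0x1u

/* Model assumption: no bdi reports congestion, so this is constant. */
bool model_inode_write_congested(void)
{
	return false;
}

/* Same control flow as the removed helper, with inputs reduced to flags. */
bool model_may_write_to_inode(unsigned int task_flags, bool same_bdi)
{
	if (task_flags & MODEL_PF_SWAPWRITE)
		return true;
	if (!model_inode_write_congested())
		return true;	/* always taken while congestion is constant false */
	if (same_bdi)
		return true;
	return false;
}

/* Returns true if the helper allows the write for every input combination. */
bool model_helper_is_always_true(void)
{
	for (unsigned int flags = 0; flags <= MODEL_PF_SWAPWRITE; flags++)
		for (int same = 0; same <= 1; same++)
			if (!model_may_write_to_inode(flags, same != 0))
				return false;
	return true;
}

/* Optional self-check a caller may run once at start-up. */
void model_self_check(void)
{
	assert(model_helper_is_always_true());
}
```

Because the helper can never return false in this model, a caller's "if (!may_write_to_inode(...)) return PAGE_KEEP;" branch is dead code, which is why the patch can drop both the helper and the check.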
Link: https://lkml.kernel.org/r/164549983741.9187.2174285592262191311.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/fs-writeback.c | 37 ----------------------------------
include/linux/backing-dev.h | 22 --------------------
mm/fadvise.c | 5 +---
mm/readahead.c | 6 -----
mm/vmscan.c | 17 ---------------
5 files changed, 3 insertions(+), 84 deletions(-)
--- a/fs/fs-writeback.c~remove-inode_congested
+++ a/fs/fs-writeback.c
@@ -894,43 +894,6 @@ void wbc_account_cgroup_owner(struct wri
EXPORT_SYMBOL_GPL(wbc_account_cgroup_owner);
/**
- * inode_congested - test whether an inode is congested
- * @inode: inode to test for congestion (may be NULL)
- * @cong_bits: mask of WB_[a]sync_congested bits to test
- *
- * Tests whether @inode is congested. @cong_bits is the mask of congestion
- * bits to test and the return value is the mask of set bits.
- *
- * If cgroup writeback is enabled for @inode, the congestion state is
- * determined by whether the cgwb (cgroup bdi_writeback) for the blkcg
- * associated with @inode is congested; otherwise, the root wb's congestion
- * state is used.
- *
- * @inode is allowed to be NULL as this function is often called on
- * mapping->host which is NULL for the swapper space.
- */
-int inode_congested(struct inode *inode, int cong_bits)
-{
- /*
- * Once set, ->i_wb never becomes NULL while the inode is alive.
- * Start transaction iff ->i_wb is visible.
- */
- if (inode && inode_to_wb_is_valid(inode)) {
- struct bdi_writeback *wb;
- struct wb_lock_cookie lock_cookie = {};
- bool congested;
-
- wb = unlocked_inode_to_wb_begin(inode, &lock_cookie);
- congested = wb_congested(wb, cong_bits);
- unlocked_inode_to_wb_end(inode, &lock_cookie);
- return congested;
- }
-
- return wb_congested(&inode_to_bdi(inode)->wb, cong_bits);
-}
-EXPORT_SYMBOL_GPL(inode_congested);
-
-/**
* wb_split_bdi_pages - split nr_pages to write according to bandwidth
* @wb: target bdi_writeback to split @nr_pages to
* @nr_pages: number of pages to write for the whole bdi
--- a/include/linux/backing-dev.h~remove-inode_congested
+++ a/include/linux/backing-dev.h
@@ -162,7 +162,6 @@ struct bdi_writeback *wb_get_create(stru
gfp_t gfp);
void wb_memcg_offline(struct mem_cgroup *memcg);
void wb_blkcg_offline(struct blkcg *blkcg);
-int inode_congested(struct inode *inode, int cong_bits);
/**
* inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode
@@ -390,29 +389,8 @@ static inline void wb_blkcg_offline(stru
{
}
-static inline int inode_congested(struct inode *inode, int cong_bits)
-{
- return wb_congested(&inode_to_bdi(inode)->wb, cong_bits);
-}
-
#endif /* CONFIG_CGROUP_WRITEBACK */
-static inline int inode_read_congested(struct inode *inode)
-{
- return inode_congested(inode, 1 << WB_sync_congested);
-}
-
-static inline int inode_write_congested(struct inode *inode)
-{
- return inode_congested(inode, 1 << WB_async_congested);
-}
-
-static inline int inode_rw_congested(struct inode *inode)
-{
- return inode_congested(inode, (1 << WB_sync_congested) |
- (1 << WB_async_congested));
-}
-
static inline int bdi_congested(struct backing_dev_info *bdi, int cong_bits)
{
return wb_congested(&bdi->wb, cong_bits);
--- a/mm/fadvise.c~remove-inode_congested
+++ a/mm/fadvise.c
@@ -109,9 +109,8 @@ int generic_fadvise(struct file *file, l
case POSIX_FADV_NOREUSE:
break;
case POSIX_FADV_DONTNEED:
- if (!inode_write_congested(mapping->host))
- __filemap_fdatawrite_range(mapping, offset, endbyte,
- WB_SYNC_NONE);
+ __filemap_fdatawrite_range(mapping, offset, endbyte,
+ WB_SYNC_NONE);
/*
* First and last FULL page! Partial pages are deliberately
--- a/mm/readahead.c~remove-inode_congested
+++ a/mm/readahead.c
@@ -709,12 +709,6 @@ void page_cache_async_ra(struct readahea
folio_clear_readahead(folio);
- /*
- * Defer asynchronous read-ahead on IO congestion.
- */
- if (inode_read_congested(ractl->mapping->host))
- return;
-
if (blk_cgroup_congested())
return;
--- a/mm/vmscan.c~remove-inode_congested
+++ a/mm/vmscan.c
@@ -989,17 +989,6 @@ static inline int is_page_cache_freeable
return page_count(page) - page_has_private(page) == 1 + page_cache_pins;
}
-static int may_write_to_inode(struct inode *inode)
-{
- if (current->flags & PF_SWAPWRITE)
- return 1;
- if (!inode_write_congested(inode))
- return 1;
- if (inode_to_bdi(inode) == current->backing_dev_info)
- return 1;
- return 0;
-}
-
/*
* We detected a synchronous write error writing a page out. Probably
* -ENOSPC. We need to propagate that into the address_space for a subsequent
@@ -1201,8 +1190,6 @@ static pageout_t pageout(struct page *pa
}
if (mapping->a_ops->writepage == NULL)
return PAGE_ACTIVATE;
- if (!may_write_to_inode(mapping->host))
- return PAGE_KEEP;
if (clear_page_dirty_for_io(page)) {
int res;
@@ -1578,9 +1565,7 @@ retry:
* end of the LRU a second time.
*/
mapping = page_mapping(page);
- if (((dirty || writeback) && mapping &&
- inode_write_congested(mapping->host)) ||
- (writeback && PageReclaim(page)))
+ if (writeback && PageReclaim(page))
stat->nr_congested++;
/*
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 013/227] remove bdi_congested() and wb_congested() and related functions
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: trond.myklebust, philipp.reisner, paolo.valente, miklos,
lars.ellenberg, konishi.ryusuke, jlayton, jaegeuk, jack,
idryomov, fengguang.wu, djwong, chao, axboe, Anna.Schumaker,
neilb, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: NeilBrown <neilb@suse.de>
Subject: remove bdi_congested() and wb_congested() and related functions
These functions are no longer useful as no BDIs report congestion any
more.
Removing the test on bdi_write_congested() in current_may_throttle() could
cause a small change in behaviour, but only when PF_LOCAL_THROTTLE is set.
So replace the calls with 'false' and simplify the code - and remove the
functions.
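The scope of the behaviour change mentioned above can be illustrated with a user-space truth table. This is a sketch under stated assumptions, not kernel code: the model_* names and MODEL_PF_LOCAL_THROTTLE are hypothetical, and the task's flags, its backing_dev_info pointer and the congestion state are reduced to plain arguments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's PF_LOCAL_THROTTLE flag bit. */
#define MODEL_PF_LOCAL_THROTTLE 0x1u

/* Old shape: three-way OR over flag, bdi presence and write congestion. */
bool model_may_throttle_old(unsigned int flags, bool has_bdi,
			    bool write_congested)
{
	return !(flags & MODEL_PF_LOCAL_THROTTLE) || !has_bdi || write_congested;
}

/* New shape from the patch: only the flag matters. */
bool model_may_throttle_new(unsigned int flags)
{
	return !(flags & MODEL_PF_LOCAL_THROTTLE);
}

/* Count (has_bdi, write_congested) combinations where old and new disagree. */
int model_count_differences(unsigned int flags)
{
	int n = 0;

	for (int has_bdi = 0; has_bdi <= 1; has_bdi++)
		for (int cong = 0; cong <= 1; cong++)
			if (model_may_throttle_old(flags, has_bdi != 0, cong != 0) !=
			    model_may_throttle_new(flags))
				n++;
	return n;
}

/* Optional self-check: without the flag, old and new always agree. */
void model_self_check(void)
{
	assert(model_count_differences(0u) == 0);
}
```

In this model the two versions agree on every input when the flag is clear and can differ only when it is set, which matches the commit message. Since congestion is never reported any more, the only disagreement reachable in practice is the case where the flag is set and no backing_dev_info is recorded.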
[akpm@linux-foundation.org: fix build]
Link: https://lkml.kernel.org/r/164549983742.9187.2570198746005819592.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nilfs]
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
drivers/block/drbd/drbd_int.h | 3 ---
drivers/block/drbd/drbd_req.c | 3 +--
fs/ext2/ialloc.c | 5 -----
fs/nilfs2/segbuf.c | 16 ----------------
fs/xfs/xfs_buf.c | 3 ---
include/linux/backing-dev.h | 26 --------------------------
mm/vmscan.c | 4 +---
7 files changed, 2 insertions(+), 58 deletions(-)
--- a/drivers/block/drbd/drbd_int.h~remove-bdi_congested-and-wb_congested-and-related-functions
+++ a/drivers/block/drbd/drbd_int.h
@@ -638,9 +638,6 @@ enum {
STATE_SENT, /* Do not change state/UUIDs while this is set */
CALLBACK_PENDING, /* Whether we have a call_usermodehelper(, UMH_WAIT_PROC)
* pending, from drbd worker context.
- * If set, bdi_write_congested() returns true,
- * so shrink_page_list() would not recurse into,
- * and potentially deadlock on, this drbd worker.
*/
DISCONNECT_SENT,
--- a/drivers/block/drbd/drbd_req.c~remove-bdi_congested-and-wb_congested-and-related-functions
+++ a/drivers/block/drbd/drbd_req.c
@@ -909,8 +909,7 @@ static bool remote_due_to_read_balancing
switch (rbm) {
case RB_CONGESTED_REMOTE:
- return bdi_read_congested(
- device->ldev->backing_bdev->bd_disk->bdi);
+ return 0;
case RB_LEAST_PENDING:
return atomic_read(&device->local_cnt) >
atomic_read(&device->ap_pending_cnt) + atomic_read(&device->rs_pending_cnt);
--- a/fs/ext2/ialloc.c~remove-bdi_congested-and-wb_congested-and-related-functions
+++ a/fs/ext2/ialloc.c
@@ -170,11 +170,6 @@ static void ext2_preread_inode(struct in
unsigned long offset;
unsigned long block;
struct ext2_group_desc * gdp;
- struct backing_dev_info *bdi;
-
- bdi = inode_to_bdi(inode);
- if (bdi_rw_congested(bdi))
- return;
block_group = (inode->i_ino - 1) / EXT2_INODES_PER_GROUP(inode->i_sb);
gdp = ext2_get_group_desc(inode->i_sb, block_group, NULL);
--- a/fs/nilfs2/segbuf.c~remove-bdi_congested-and-wb_congested-and-related-functions
+++ a/fs/nilfs2/segbuf.c
@@ -341,18 +341,6 @@ static int nilfs_segbuf_submit_bio(struc
int mode_flags)
{
struct bio *bio = wi->bio;
- int err;
-
- if (segbuf->sb_nbio > 0 &&
- bdi_write_congested(segbuf->sb_super->s_bdi)) {
- wait_for_completion(&segbuf->sb_bio_event);
- segbuf->sb_nbio--;
- if (unlikely(atomic_read(&segbuf->sb_err))) {
- bio_put(bio);
- err = -EIO;
- goto failed;
- }
- }
bio->bi_end_io = nilfs_end_bio_write;
bio->bi_private = segbuf;
@@ -365,10 +353,6 @@ static int nilfs_segbuf_submit_bio(struc
wi->nr_vecs = min(wi->max_pages, wi->rest_blocks);
wi->start = wi->end;
return 0;
-
- failed:
- wi->bio = NULL;
- return err;
}
/**
--- a/fs/xfs/xfs_buf.c~remove-bdi_congested-and-wb_congested-and-related-functions
+++ a/fs/xfs/xfs_buf.c
@@ -843,9 +843,6 @@ xfs_buf_readahead_map(
{
struct xfs_buf *bp;
- if (bdi_read_congested(target->bt_bdev->bd_disk->bdi))
- return;
-
xfs_buf_read_map(target, map, nmaps,
XBF_TRYLOCK | XBF_ASYNC | XBF_READ_AHEAD, &bp, ops,
__this_address);
--- a/include/linux/backing-dev.h~remove-bdi_congested-and-wb_congested-and-related-functions
+++ a/include/linux/backing-dev.h
@@ -135,11 +135,6 @@ static inline bool writeback_in_progress
struct backing_dev_info *inode_to_bdi(struct inode *inode);
-static inline int wb_congested(struct bdi_writeback *wb, int cong_bits)
-{
- return wb->congested & cong_bits;
-}
-
long congestion_wait(int sync, long timeout);
static inline bool mapping_can_writeback(struct address_space *mapping)
@@ -391,27 +386,6 @@ static inline void wb_blkcg_offline(stru
#endif /* CONFIG_CGROUP_WRITEBACK */
-static inline int bdi_congested(struct backing_dev_info *bdi, int cong_bits)
-{
- return wb_congested(&bdi->wb, cong_bits);
-}
-
-static inline int bdi_read_congested(struct backing_dev_info *bdi)
-{
- return bdi_congested(bdi, 1 << WB_sync_congested);
-}
-
-static inline int bdi_write_congested(struct backing_dev_info *bdi)
-{
- return bdi_congested(bdi, 1 << WB_async_congested);
-}
-
-static inline int bdi_rw_congested(struct backing_dev_info *bdi)
-{
- return bdi_congested(bdi, (1 << WB_sync_congested) |
- (1 << WB_async_congested));
-}
-
const char *bdi_dev_name(struct backing_dev_info *bdi);
#endif /* _LINUX_BACKING_DEV_H */
--- a/mm/vmscan.c~remove-bdi_congested-and-wb_congested-and-related-functions
+++ a/mm/vmscan.c
@@ -2364,9 +2364,7 @@ static unsigned int move_pages_to_lru(st
*/
static int current_may_throttle(void)
{
- return !(current->flags & PF_LOCAL_THROTTLE) ||
- current->backing_dev_info == NULL ||
- bdi_write_congested(current->backing_dev_info);
+ return !(current->flags & PF_LOCAL_THROTTLE);
}
/*
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 014/227] f2fs: replace congestion_wait() calls with io_schedule_timeout()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: trond.myklebust, philipp.reisner, paolo.valente, miklos,
lars.ellenberg, konishi.ryusuke, jlayton, jaegeuk, jack,
idryomov, fengguang.wu, djwong, chao, axboe, Anna.Schumaker,
neilb, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: NeilBrown <neilb@suse.de>
Subject: f2fs: replace congestion_wait() calls with io_schedule_timeout()
As congestion is no longer tracked, congestion_wait() is effectively
equivalent to io_schedule_timeout(). So introduce
f2fs_io_schedule_timeout(), which sets TASK_UNINTERRUPTIBLE before calling
io_schedule_timeout(), and call that instead.
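Several of the converted call sites are bounded retry loops of the form "try, wait a fixed time on failure, try again". The sketch below models that pattern in user space, following the do/while shape visible in the f2fs_flush_device_cache() hunk of this patch. It is an illustration only: the model_* names are hypothetical, and the wait is an injected callback rather than a real sleep, so it does not reproduce the kernel helper's scheduling behaviour.

```c
#include <assert.h>

/* Hypothetical operation type: returns 0 on success, nonzero to ask for a retry. */
typedef int (*model_op_fn)(void *ctx);

/* Hypothetical wait hook standing in for the fixed-timeout wait. */
typedef void (*model_wait_fn)(long timeout);

/* Number of waits performed so far by model_count_wait(). */
int model_waits;

/* Wait hook that only counts calls, so the loop can be observed without sleeping. */
void model_count_wait(long timeout)
{
	(void)timeout;
	model_waits++;
}

/* Try op up to count times (count >= 1), waiting after each failed attempt. */
int model_retry(model_op_fn op, void *ctx, int count,
		model_wait_fn wait, long timeout)
{
	int ret;

	do {
		ret = op(ctx);
		if (ret)
			wait(timeout);
	} while (ret && --count);
	return ret;
}

/* Example operation: fails until the counter behind ctx reaches zero. */
int model_fail_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -1;
	}
	return 0;
}

/* Optional self-check: an operation that succeeds at once never waits. */
void model_self_check(void)
{
	int remaining = 0;

	model_waits = 0;
	assert(model_retry(model_fail_n_times, &remaining, 1,
			   model_count_wait, 1) == 0);
	assert(model_waits == 0);
}
```

Injecting the wait keeps the loop logic separate from how the waiting is done, which is roughly what the patch achieves by routing every f2fs call site through a single helper.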
Link: https://lkml.kernel.org/r/164549983744.9187.6425865370954230902.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/f2fs/compress.c | 4 +---
fs/f2fs/data.c | 3 +--
fs/f2fs/f2fs.h | 6 ++++++
fs/f2fs/segment.c | 8 +++-----
fs/f2fs/super.c | 6 ++----
5 files changed, 13 insertions(+), 14 deletions(-)
--- a/fs/f2fs/compress.c~f2fs-replace-congestion_wait-calls-with-io_schedule_timeout
+++ a/fs/f2fs/compress.c
@@ -1505,9 +1505,7 @@ continue_unlock:
if (IS_NOQUOTA(cc->inode))
return 0;
ret = 0;
- cond_resched();
- congestion_wait(BLK_RW_ASYNC,
- DEFAULT_IO_TIMEOUT);
+ f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
goto retry_write;
}
return ret;
--- a/fs/f2fs/data.c~f2fs-replace-congestion_wait-calls-with-io_schedule_timeout
+++ a/fs/f2fs/data.c
@@ -3047,8 +3047,7 @@ result:
} else if (ret == -EAGAIN) {
ret = 0;
if (wbc->sync_mode == WB_SYNC_ALL) {
- cond_resched();
- congestion_wait(BLK_RW_ASYNC,
+ f2fs_io_schedule_timeout(
DEFAULT_IO_TIMEOUT);
goto retry_write;
}
--- a/fs/f2fs/f2fs.h~f2fs-replace-congestion_wait-calls-with-io_schedule_timeout
+++ a/fs/f2fs/f2fs.h
@@ -4426,6 +4426,12 @@ static inline bool f2fs_block_unit_disca
return F2FS_OPTION(sbi).discard_unit == DISCARD_UNIT_BLOCK;
}
+static inline void f2fs_io_schedule_timeout(long timeout)
+{
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ io_schedule_timeout(timeout);
+}
+
#define EFSBADCRC EBADMSG /* Bad CRC detected */
#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
--- a/fs/f2fs/segment.c~f2fs-replace-congestion_wait-calls-with-io_schedule_timeout
+++ a/fs/f2fs/segment.c
@@ -313,8 +313,7 @@ next:
skip:
iput(inode);
}
- congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);
- cond_resched();
+ f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
if (gc_failure) {
if (++looped >= count)
return;
@@ -803,8 +802,7 @@ int f2fs_flush_device_cache(struct f2fs_
do {
ret = __submit_flush_wait(sbi, FDEV(i).bdev);
if (ret)
- congestion_wait(BLK_RW_ASYNC,
- DEFAULT_IO_TIMEOUT);
+ f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
} while (ret && --count);
if (ret) {
@@ -3133,7 +3131,7 @@ next:
blk_finish_plug(&plug);
mutex_unlock(&dcc->cmd_lock);
trimmed += __wait_all_discard_cmd(sbi, NULL);
- congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);
+ f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
goto next;
}
skip:
--- a/fs/f2fs/super.c~f2fs-replace-congestion_wait-calls-with-io_schedule_timeout
+++ a/fs/f2fs/super.c
@@ -2135,8 +2135,7 @@ static void f2fs_enable_checkpoint(struc
/* we should flush all the data to keep data consistency */
do {
sync_inodes_sb(sbi->sb);
- cond_resched();
- congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);
+ f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
} while (get_pages(sbi, F2FS_DIRTY_DATA) && retry--);
if (unlikely(retry < 0))
@@ -2504,8 +2503,7 @@ retry:
&page, &fsdata);
if (unlikely(err)) {
if (err == -ENOMEM) {
- congestion_wait(BLK_RW_ASYNC,
- DEFAULT_IO_TIMEOUT);
+ f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
goto retry;
}
set_sbi_flag(F2FS_SB(sb), SBI_QUOTA_NEED_REPAIR);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 015/227] block/bfq-iosched.c: use "false" rather than "BLK_RW_ASYNC"
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: trond.myklebust, philipp.reisner, paolo.valente, miklos,
lars.ellenberg, konishi.ryusuke, jlayton, jaegeuk, jack,
idryomov, fengguang.wu, djwong, chao, axboe, Anna.Schumaker,
neilb, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: NeilBrown <neilb@suse.de>
Subject: block/bfq-iosched.c: use "false" rather than "BLK_RW_ASYNC"
bfq_get_queue() expects a "bool" for the third arg, so pass "false" rather
than "BLK_RW_ASYNC" which will soon be removed.
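
Why the substitution does not change behaviour can be checked with a small
standalone sketch. The enum values are the ones removed from
include/linux/backing-dev-defs.h later in this series; wants_sync_queue() is a
hypothetical stand-in for a function with a bool parameter and is not the real
bfq_get_queue().

```c
#include <assert.h>
#include <stdbool.h>

/* Values as they appear in the enum removed by patch 016 of this series. */
enum {
	BLK_RW_ASYNC = 0,
	BLK_RW_SYNC = 1,
};

/* The old argument converted to false, so passing false is equivalent. */
static_assert(BLK_RW_ASYNC == 0, "BLK_RW_ASYNC must convert to false");

/* Hypothetical stand-in for a callee that takes a bool argument. */
static bool wants_sync_queue(bool is_sync)
{
	return is_sync;
}
```

Passing the enum constant compiled only because integers convert implicitly
to bool; spelling out "false" states the intent and drops the dependency on
the enum.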
Link: https://lkml.kernel.org/r/164549983746.9187.7949730109246767909.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Jens Axboe <axboe@kernel.dk>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
block/bfq-iosched.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/block/bfq-iosched.c~block-bfq-ioschedc-use-false-rather-than-blk_rw_async
+++ a/block/bfq-iosched.c
@@ -5448,7 +5448,7 @@ static void bfq_check_ioprio_change(stru
bfqq = bic_to_bfqq(bic, false);
if (bfqq) {
bfq_release_process_ref(bfqd, bfqq);
- bfqq = bfq_get_queue(bfqd, bio, BLK_RW_ASYNC, bic, true);
+ bfqq = bfq_get_queue(bfqd, bio, false, bic, true);
bic_set_bfqq(bic, bfqq, false);
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 016/227] remove congestion tracking framework
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: trond.myklebust, philipp.reisner, paolo.valente, miklos,
lars.ellenberg, konishi.ryusuke, jlayton, jaegeuk, jack,
idryomov, fengguang.wu, djwong, chao, axboe, Anna.Schumaker,
neilb, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: NeilBrown <neilb@suse.de>
Subject: remove congestion tracking framework
This framework is no longer used - so discard it.
Link: https://lkml.kernel.org/r/164549983747.9187.6171768583526866601.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/backing-dev-defs.h | 8 ----
include/linux/backing-dev.h | 2 -
include/trace/events/writeback.h | 28 --------------
mm/backing-dev.c | 57 -----------------------------
4 files changed, 95 deletions(-)
--- a/include/linux/backing-dev-defs.h~remove-congestion-tracking-framework
+++ a/include/linux/backing-dev-defs.h
@@ -207,14 +207,6 @@ struct backing_dev_info {
#endif
};
-enum {
- BLK_RW_ASYNC = 0,
- BLK_RW_SYNC = 1,
-};
-
-void clear_bdi_congested(struct backing_dev_info *bdi, int sync);
-void set_bdi_congested(struct backing_dev_info *bdi, int sync);
-
struct wb_lock_cookie {
bool locked;
unsigned long flags;
--- a/include/linux/backing-dev.h~remove-congestion-tracking-framework
+++ a/include/linux/backing-dev.h
@@ -135,8 +135,6 @@ static inline bool writeback_in_progress
struct backing_dev_info *inode_to_bdi(struct inode *inode);
-long congestion_wait(int sync, long timeout);
-
static inline bool mapping_can_writeback(struct address_space *mapping)
{
return inode_to_bdi(mapping->host)->capabilities & BDI_CAP_WRITEBACK;
--- a/include/trace/events/writeback.h~remove-congestion-tracking-framework
+++ a/include/trace/events/writeback.h
@@ -735,34 +735,6 @@ TRACE_EVENT(writeback_sb_inodes_requeue,
)
);
-DECLARE_EVENT_CLASS(writeback_congest_waited_template,
-
- TP_PROTO(unsigned int usec_timeout, unsigned int usec_delayed),
-
- TP_ARGS(usec_timeout, usec_delayed),
-
- TP_STRUCT__entry(
- __field( unsigned int, usec_timeout )
- __field( unsigned int, usec_delayed )
- ),
-
- TP_fast_assign(
- __entry->usec_timeout = usec_timeout;
- __entry->usec_delayed = usec_delayed;
- ),
-
- TP_printk("usec_timeout=%u usec_delayed=%u",
- __entry->usec_timeout,
- __entry->usec_delayed)
-);
-
-DEFINE_EVENT(writeback_congest_waited_template, writeback_congestion_wait,
-
- TP_PROTO(unsigned int usec_timeout, unsigned int usec_delayed),
-
- TP_ARGS(usec_timeout, usec_delayed)
-);
-
DECLARE_EVENT_CLASS(writeback_single_inode_template,
TP_PROTO(struct inode *inode,
--- a/mm/backing-dev.c~remove-congestion-tracking-framework
+++ a/mm/backing-dev.c
@@ -1005,60 +1005,3 @@ const char *bdi_dev_name(struct backing_
return bdi->dev_name;
}
EXPORT_SYMBOL_GPL(bdi_dev_name);
-
-static wait_queue_head_t congestion_wqh[2] = {
- __WAIT_QUEUE_HEAD_INITIALIZER(congestion_wqh[0]),
- __WAIT_QUEUE_HEAD_INITIALIZER(congestion_wqh[1])
- };
-static atomic_t nr_wb_congested[2];
-
-void clear_bdi_congested(struct backing_dev_info *bdi, int sync)
-{
- wait_queue_head_t *wqh = &congestion_wqh[sync];
- enum wb_congested_state bit;
-
- bit = sync ? WB_sync_congested : WB_async_congested;
- if (test_and_clear_bit(bit, &bdi->wb.congested))
- atomic_dec(&nr_wb_congested[sync]);
- smp_mb__after_atomic();
- if (waitqueue_active(wqh))
- wake_up(wqh);
-}
-EXPORT_SYMBOL(clear_bdi_congested);
-
-void set_bdi_congested(struct backing_dev_info *bdi, int sync)
-{
- enum wb_congested_state bit;
-
- bit = sync ? WB_sync_congested : WB_async_congested;
- if (!test_and_set_bit(bit, &bdi->wb.congested))
- atomic_inc(&nr_wb_congested[sync]);
-}
-EXPORT_SYMBOL(set_bdi_congested);
-
-/**
- * congestion_wait - wait for a backing_dev to become uncongested
- * @sync: SYNC or ASYNC IO
- * @timeout: timeout in jiffies
- *
- * Waits for up to @timeout jiffies for a backing_dev (any backing_dev) to exit
- * write congestion. If no backing_devs are congested then just wait for the
- * next write to be completed.
- */
-long congestion_wait(int sync, long timeout)
-{
- long ret;
- unsigned long start = jiffies;
- DEFINE_WAIT(wait);
- wait_queue_head_t *wqh = &congestion_wqh[sync];
-
- prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
- ret = io_schedule_timeout(timeout);
- finish_wait(wqh, &wait);
-
- trace_writeback_congestion_wait(jiffies_to_usecs(timeout),
- jiffies_to_usecs(jiffies - start));
-
- return ret;
-}
-EXPORT_SYMBOL(congestion_wait);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 017/227] mount: warn only once about timestamp range expiration
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: viro, hch, djwong, deepa.kernel, christian.brauner, ailiop, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Anthony Iliopoulos <ailiop@suse.com>
Subject: mount: warn only once about timestamp range expiration
Commit f8b92ba67c5d ("mount: Add mount warning for impending timestamp
expiry") introduced a mount warning regarding filesystem timestamp limits
that is printed upon each writable mount or remount.
This can result in a lot of unnecessary messages in the kernel log in
setups where filesystems are being frequently remounted (or mounted
multiple times).
Avoid this by setting a superblock flag which indicates that the warning
has been emitted at least once for any particular mount, as suggested in
[1].
[1] https://lore.kernel.org/CAHk-=wim6VGnxQmjfK_tDg6fbHYKL4EFkmnTjVr9QnRqjDBAeA@mail.gmail.com/
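
The check-then-set flag pattern described above can be sketched in userspace
as follows. This is an illustration only: struct fake_sb and
warn_timestamp_expiry() are hypothetical stand-ins, the horizon parameter
plays the role of TIME_UPTIME_SEC_MAX, and only the flag value is taken from
the patch.

```c
#include <assert.h>
#include <stdio.h>

#define SB_I_TS_EXPIRY_WARNED 0x00000400 /* flag value taken from the patch */

/* Hypothetical, trimmed-down stand-in for struct super_block. */
struct fake_sb {
	unsigned long s_iflags;
	long long s_time_max;
};

/* Warn at most once per superblock; returns 1 if a warning was printed. */
static int warn_timestamp_expiry(struct fake_sb *sb, long long now,
				 long long horizon, int readonly)
{
	assert(sb != NULL);

	if (readonly ||
	    (sb->s_iflags & SB_I_TS_EXPIRY_WARNED) ||
	    now + horizon <= sb->s_time_max)
		return 0;

	fprintf(stderr, "filesystem supports timestamps until %lld\n",
		sb->s_time_max);
	sb->s_iflags |= SB_I_TS_EXPIRY_WARNED; /* remember that we warned */
	return 1;
}
```

Because the flag is kept on the superblock rather than on the mount, repeated
mounts or remounts of the same filesystem stay quiet after the first warning.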
Link: https://lkml.kernel.org/r/20220119202934.26495-1-ailiop@suse.com
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/namespace.c | 2 ++
include/linux/fs.h | 1 +
2 files changed, 3 insertions(+)
--- a/fs/namespace.c~mount-warn-only-once-about-timestamp-range-expiration
+++ a/fs/namespace.c
@@ -2597,6 +2597,7 @@ static void mnt_warn_timestamp_expiry(st
struct super_block *sb = mnt->mnt_sb;
if (!__mnt_is_readonly(mnt) &&
+ (!(sb->s_iflags & SB_I_TS_EXPIRY_WARNED)) &&
(ktime_get_real_seconds() + TIME_UPTIME_SEC_MAX > sb->s_time_max)) {
char *buf = (char *)__get_free_page(GFP_KERNEL);
char *mntpath = buf ? d_path(mountpoint, buf, PAGE_SIZE) : ERR_PTR(-ENOMEM);
@@ -2611,6 +2612,7 @@ static void mnt_warn_timestamp_expiry(st
tm.tm_year+1900, (unsigned long long)sb->s_time_max);
free_page((unsigned long)buf);
+ sb->s_iflags |= SB_I_TS_EXPIRY_WARNED;
}
}
--- a/include/linux/fs.h~mount-warn-only-once-about-timestamp-range-expiration
+++ a/include/linux/fs.h
@@ -1440,6 +1440,7 @@ extern int send_sigurg(struct fown_struc
#define SB_I_SKIP_SYNC 0x00000100 /* Skip superblock at global sync */
#define SB_I_PERSB_BDI 0x00000200 /* has a per-sb bdi */
+#define SB_I_TS_EXPIRY_WARNED 0x00000400 /* warned about timestamp range expiry */
/* Possible states of 'frozen' field */
enum {
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 018/227] mm/memremap: avoid calling kasan_remove_zero_shadow() for device private memory
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: songmuchun, linmiaohe, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memremap: avoid calling kasan_remove_zero_shadow() for device private memory
For device private memory, we do not create a linear mapping for the
memory because the device memory is inaccessible. Thus we do not add a
kasan zero shadow for it, so it is unnecessary to call
kasan_remove_zero_shadow() for it.
Link: https://lkml.kernel.org/r/20220126092602.1425-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memremap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/memremap.c~mm-memremap-avoid-calling-kasan_remove_zero_shadow-for-device-private-memory
+++ a/mm/memremap.c
@@ -282,7 +282,8 @@ static int pagemap_range(struct dev_page
return 0;
err_add_memory:
- kasan_remove_zero_shadow(__va(range->start), range_len(range));
+ if (!is_private)
+ kasan_remove_zero_shadow(__va(range->start), range_len(range));
err_kasan:
untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range));
err_pfn_remap:
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 019/227] filemap: remove find_get_pages()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: willy, william.kucharski, vbabka, kirill.shutemov, hch, hannes,
dhowells, agruenba, linmiaohe, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: filemap: remove find_get_pages()
It's unused now. Remove it and clean up the relevant comment.
Link: https://lkml.kernel.org/r/20220208134149.47299-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/pagemap.h | 7 -------
mm/filemap.c | 11 ++++++-----
2 files changed, 6 insertions(+), 12 deletions(-)
--- a/include/linux/pagemap.h~filemap-remove-find_get_pages
+++ a/include/linux/pagemap.h
@@ -594,13 +594,6 @@ static inline struct page *find_subpage(
unsigned find_get_pages_range(struct address_space *mapping, pgoff_t *start,
pgoff_t end, unsigned int nr_pages,
struct page **pages);
-static inline unsigned find_get_pages(struct address_space *mapping,
- pgoff_t *start, unsigned int nr_pages,
- struct page **pages)
-{
- return find_get_pages_range(mapping, start, (pgoff_t)-1, nr_pages,
- pages);
-}
unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t start,
unsigned int nr_pages, struct page **pages);
unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index,
--- a/mm/filemap.c~filemap-remove-find_get_pages
+++ a/mm/filemap.c
@@ -2229,8 +2229,9 @@ out:
* @nr_pages: The maximum number of pages
* @pages: Where the resulting pages are placed
*
- * find_get_pages_contig() works exactly like find_get_pages(), except
- * that the returned number of pages are guaranteed to be contiguous.
+ * find_get_pages_contig() works exactly like find_get_pages_range(),
+ * except that the returned number of pages are guaranteed to be
+ * contiguous.
*
* Return: the number of pages which were found.
*/
@@ -2290,9 +2291,9 @@ EXPORT_SYMBOL(find_get_pages_contig);
* @nr_pages: the maximum number of pages
* @pages: where the resulting pages are placed
*
- * Like find_get_pages(), except we only return head pages which are tagged
- * with @tag. @index is updated to the index immediately after the last
- * page we return, ready for the next iteration.
+ * Like find_get_pages_range(), except we only return head pages which are
+ * tagged with @tag. @index is updated to the index immediately after the
+ * last page we return, ready for the next iteration.
*
* Return: the number of pages which were found.
*/
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 020/227] mm/writeback: minor clean up for highmem_dirtyable_memory
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: hannes, linmiaohe, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/writeback: minor clean up for highmem_dirtyable_memory
Since commit a804552b9a15 ("mm/page-writeback.c: fix dirty_balance_reserve
subtraction from dirtyable memory"), the local variable x cannot be
negative, and it cannot overflow because it is the total number of
dirtyable highmem pages. Remove the unneeded comment and overflow check.
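To illustrate why the removed check is redundant, here is a minimal
userspace sketch; it is not the kernel code. It assumes that each zone's
contribution is computed by subtracting a reserve that has first been
clamped to the free page count, so every term added to x is non-negative.
struct zone_model and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for per-zone counters; not the kernel's struct zone. */
struct zone_model {
	unsigned long free_pages;
	unsigned long reserve_pages;
	unsigned long file_pages;
};

/* Clamp the reserve to the free count so the subtraction cannot wrap. */
static unsigned long dirtyable_in_zone(const struct zone_model *z)
{
	unsigned long reserve = z->reserve_pages < z->free_pages ?
				z->reserve_pages : z->free_pages;

	return z->free_pages - reserve + z->file_pages;
}

/* Sum of non-negative terms: a "(long)x < 0" test has nothing to catch. */
static unsigned long highmem_dirtyable_model(const struct zone_model *zones,
					     size_t nr_zones)
{
	unsigned long x = 0;
	size_t i;

	for (i = 0; i < nr_zones; i++)
		x += dirtyable_in_zone(&zones[i]);

	return x;
}
```

With the clamp applied per zone, a zone whose reserve exceeds its free
pages contributes only its file pages instead of a wrapped-around value.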
Link: https://lkml.kernel.org/r/20220224115416.46089-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page-writeback.c | 12 ------------
1 file changed, 12 deletions(-)
--- a/mm/page-writeback.c~mm-writeback-minor-clean-up-for-highmem_dirtyable_memory
+++ a/mm/page-writeback.c
@@ -324,18 +324,6 @@ static unsigned long highmem_dirtyable_m
}
/*
- * Unreclaimable memory (kernel memory or anonymous memory
- * without swap) can bring down the dirtyable pages below
- * the zone's dirty balance reserve and the above calculation
- * will underflow. However we still want to add in nodes
- * which are below threshold (negative values) to get a more
- * accurate calculation but make sure that the total never
- * underflows.
- */
- if ((long)x < 0)
- x = 0;
-
- /*
* Make sure that the number of highmem pages is never larger
* than the number of the total dirtyable memory. This can only
* occur in very strange VM situations but we want to make sure
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 021/227] mm: fs: fix lru_cache_disabled race in bh_lru
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: stable, mtosatti, joaodias, cgoldswo, minchan, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: Minchan Kim <minchan@kernel.org>
Subject: mm: fs: fix lru_cache_disabled race in bh_lru
Check lru_cache_disabled() under bh_lru_lock. Otherwise, the race below
can occur and pages containing a buffer_head fail to migrate.
CPU 0                                   CPU 1
bh_lru_install
                                        lru_cache_disable
  lru_cache_disabled = false
                                        atomic_inc(&lru_disable_count);
                                        invalidate_bh_lrus_cpu of CPU 0
                                        bh_lru_lock
                                        __invalidate_bh_lrus
                                        bh_lru_unlock
  bh_lru_lock
  install the bh
  bh_lru_unlock
When this race happens, a CMA allocation fails, which is critical for
workloads that depend on CMA.
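The ordering that closes this race can be sketched in userspace as
follows. This is only an analogy under stated assumptions: the kernel's
bh_lru_lock() is not a spinlock, and every name below (lru_lock,
lru_entries, lru_install(), lru_disable_and_drain()) is hypothetical. The
point it shows is that the "disabled" test and the install happen inside
the same critical section, so a drain cannot run between them.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the per-CPU LRU and its lock. */
static atomic_flag lru_lock = ATOMIC_FLAG_INIT;
static atomic_int lru_disable_count;
static int lru_entries;

static void lru_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&lru_lock,
						 memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void lru_lock_release(void)
{
	atomic_flag_clear_explicit(&lru_lock, memory_order_release);
}

/* Returns true if the entry was cached, false if caching is disabled. */
static bool lru_install(void)
{
	bool installed = false;

	lru_lock_acquire();
	/* Checked under the lock: no drain can slip in before the install. */
	if (atomic_load(&lru_disable_count) == 0) {
		lru_entries++;
		installed = true;
	}
	lru_lock_release();
	return installed;
}

/* Raise the disable count first, then drain the cache under the lock. */
static void lru_disable_and_drain(void)
{
	atomic_fetch_add(&lru_disable_count, 1);
	lru_lock_acquire();
	lru_entries = 0;
	lru_lock_release();
}
```

Either the installer takes the lock first and the later drain removes its
entry, or the disabler raises the count first and the installer backs off.
In neither order does an entry survive the drain.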
Link: https://lkml.kernel.org/r/20220308180709.2017638-1-minchan@kernel.org
Fixes: 8cc621d2f45d ("mm: fs: invalidate BH LRU during page migration")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: John Dias <joaodias@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/buffer.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/fs/buffer.c~mm-fs-fix-lru_cache_disabled-race-in-bh_lru
+++ a/fs/buffer.c
@@ -1235,16 +1235,18 @@ static void bh_lru_install(struct buffer
int i;
check_irqs_on();
+ bh_lru_lock();
+
/*
* the refcount of buffer_head in bh_lru prevents dropping the
* attached page(i.e., try_to_free_buffers) so it could cause
* failing page migration.
* Skip putting upcoming bh into bh_lru until migration is done.
*/
- if (lru_cache_disabled())
+ if (lru_cache_disabled()) {
+ bh_lru_unlock();
return;
-
- bh_lru_lock();
+ }
b = this_cpu_ptr(&bh_lrus);
for (i = 0; i < BH_LRU_SIZE; i++) {
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 022/227] mm: fix invalid page pointer returned with FOLL_PIN gups
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: willy, lukas.bulwahn, kirill.shutemov, jhubbard, jgg, jgg, jack,
imbrenda, hch, david, alex.williamson, aarcange, peterx, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Peter Xu <peterx@redhat.com>
Subject: mm: fix invalid page pointer returned with FOLL_PIN gups
Patch series "mm/gup: some cleanups", v5.
This patch (of 5):
Alex reported invalid page pointer returned with pin_user_pages_remote()
from vfio after upstream commit 4b6c33b32296 ("vfio/type1: Prepare for
batched pinning with struct vfio_batch").
It turns out that the vfio commit is not at fault; however, after vfio
switched to a full page buffer to store the page pointers, the problem
became easier to trigger.
The problem is that for VM_PFNMAP vmas we should normally fail with
-EFAULT, after which vfio carries on to handle the MMIO regions. However,
when the bug triggered, follow_page_mask() returned -EEXIST for such a
page, which makes GUP skip the current page and leave that entry in
**pages untouched. The caller is not aware of this, so it dereferences the
entry as usual even though the pointer data can be anything.
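The flag check at the heart of this fix can be modelled with a small
userspace sketch. The MODEL_FOLL_* bit values and the function name
pfn_pte_status() are assumptions made for illustration and are not copied
from the kernel; the with_fix parameter selects between the old check
(FOLL_GET only) and the new one (FOLL_GET or FOLL_PIN).

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag bits; the values are assumptions, not the kernel's. */
#define MODEL_FOLL_GET 0x1u
#define MODEL_FOLL_PIN 0x2u

/*
 * Status for a PTE that maps a raw PFN with no struct page behind it.
 * -EFAULT makes the caller fail; -EEXIST makes the caller skip the entry.
 */
static int pfn_pte_status(unsigned int flags, bool with_fix)
{
	unsigned int needs_page = MODEL_FOLL_GET;

	if (with_fix)
		needs_page |= MODEL_FOLL_PIN;

	/* No struct page exists, so no reference or pin can be taken. */
	if (flags & needs_page)
		return -EFAULT;

	/* A valid PTE exists, but there is nothing to store in **pages. */
	return -EEXIST;
}
```

Without the fix, a FOLL_PIN request falls through to -EEXIST and the entry
is skipped silently; with the fix it fails with -EFAULT, which matches
what FOLL_GET callers already see.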
We have had that -EEXIST logic since commit 1027e4436b6a ("mm: make GUP
handle pfn mapping unless FOLL_GET is requested"), which seems very
reasonable. When we reworked GUP with FOLL_PIN in commit 3faa52c03f44
("mm/gup: track FOLL_PIN pages"), we may have overlooked that special
path, even though that commit rightfully touched up follow_devmap_pud()
to check FOLL_PIN when it needs to return -EEXIST.
Attaching the Fixes to the FOLL_PIN rework commit, as it happened later than
1027e4436b6a.
[jhubbard@nvidia.com: added some tags, removed a reference to an out of tree module.]
Link: https://lkml.kernel.org/r/20220207062213.235127-1-jhubbard@nvidia.com
Link: https://lkml.kernel.org/r/20220204020010.68930-1-jhubbard@nvidia.com
Link: https://lkml.kernel.org/r/20220204020010.68930-2-jhubbard@nvidia.com
Fixes: 3faa52c03f44 ("mm/gup: track FOLL_PIN pages")
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Debugged-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/gup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/gup.c~mm-fix-invalid-page-pointer-returned-with-foll_pin-gups
+++ a/mm/gup.c
@@ -465,7 +465,7 @@ static int follow_pfn_pte(struct vm_area
pte_t *pte, unsigned int flags)
{
/* No page to get reference */
- if (flags & FOLL_GET)
+ if (flags & (FOLL_GET | FOLL_PIN))
return -EFAULT;
if (flags & FOLL_TOUCH) {
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 023/227] mm/gup: follow_pfn_pte(): -EEXIST cleanup
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: willy, peterx, lukas.bulwahn, kirill.shutemov, jgg, jgg, jack,
imbrenda, hch, david, alex.williamson, aarcange, jhubbard, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: John Hubbard <jhubbard@nvidia.com>
Subject: mm/gup: follow_pfn_pte(): -EEXIST cleanup
Remove a quirky special case from follow_pfn_pte(), and adjust its callers
to match. Caller changes include:
__get_user_pages(): Regardless of any FOLL_* flags, get_user_pages() and
its variants should handle PFN-only entries by stopping early, if the
caller expected **pages to be filled in. This makes for a more reliable
API, as compared to the previous approach of skipping over such entries
(and thus leaving them silently unwritten).
move_pages(): squash the -EEXIST error return from follow_page() into
-EFAULT, because -EFAULT is listed in the man page, whereas -EEXIST is
not.
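The two caller changes described above can be sketched in userspace as
follows. This is a simplified model, not the kernel implementation:
gup_model(), move_pages_status() and the has_struct_page[] input are
hypothetical, and pages[] holds a marker value instead of real struct
page pointers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Walk nr entries. has_struct_page[i] == false models a PFN-only entry.
 * If the caller passed pages[], stop at the first PFN-only entry instead
 * of skipping it, so no slot is left silently unwritten.
 * Returns the number of entries handled, or -EEXIST if none were.
 */
static long gup_model(const bool *has_struct_page, long nr, int *pages)
{
	long i;

	for (i = 0; i < nr; i++) {
		if (!has_struct_page[i]) {
			if (pages)
				return i ? i : -EEXIST;
			continue; /* nothing to fill in, so skipping is safe */
		}
		if (pages)
			pages[i] = 1; /* marker for a filled slot */
	}
	return nr;
}

/* move_pages() does not document -EEXIST, so report -EFAULT instead. */
static int move_pages_status(int err)
{
	return err == -EEXIST ? -EFAULT : err;
}
```

A caller that receives a short count from gup_model() knows exactly which
slots are valid, which is the reliability gain the description refers to.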
Link: https://lkml.kernel.org/r/20220204020010.68930-3-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Peter Xu <peterx@redhat.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/gup.c | 13 ++++++++-----
mm/migrate.c | 7 +++++++
2 files changed, 15 insertions(+), 5 deletions(-)
--- a/mm/gup.c~mm-gup-follow_pfn_pte-eexist-cleanup
+++ a/mm/gup.c
@@ -464,10 +464,6 @@ static struct page *no_page_table(struct
static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
pte_t *pte, unsigned int flags)
{
- /* No page to get reference */
- if (flags & (FOLL_GET | FOLL_PIN))
- return -EFAULT;
-
if (flags & FOLL_TOUCH) {
pte_t entry = *pte;
@@ -1205,8 +1201,15 @@ retry:
} else if (PTR_ERR(page) == -EEXIST) {
/*
* Proper page table entry exists, but no corresponding
- * struct page.
+ * struct page. If the caller expects **pages to be
+ * filled in, bail out now, because that can't be done
+ * for this page.
*/
+ if (pages) {
+ ret = PTR_ERR(page);
+ goto out;
+ }
+
goto next_page;
} else if (IS_ERR(page)) {
ret = PTR_ERR(page);
--- a/mm/migrate.c~mm-gup-follow_pfn_pte-eexist-cleanup
+++ a/mm/migrate.c
@@ -1762,6 +1762,13 @@ static int do_pages_move(struct mm_struc
}
/*
+ * The move_pages() man page does not have an -EEXIST choice, so
+ * use -EFAULT instead.
+ */
+ if (err == -EEXIST)
+ err = -EFAULT;
+
+ /*
* If the page is already on the target node (!err), store the
* node, otherwise, store the err.
*/
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 024/227] mm/gup: remove unused pin_user_pages_locked()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: willy, peterx, lukas.bulwahn, kirill.shutemov, jgg, jgg, jack,
imbrenda, hch, david, alex.williamson, aarcange, jhubbard, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: John Hubbard <jhubbard@nvidia.com>
Subject: mm/gup: remove unused pin_user_pages_locked()
This routine was used for a short while, but then the calling code was
refactored and the only caller was removed.
Link: https://lkml.kernel.org/r/20220204020010.68930-4-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/mm.h | 2 --
mm/gup.c | 29 -----------------------------
2 files changed, 31 deletions(-)
--- a/include/linux/mm.h~mm-gup-remove-unused-pin_user_pages_locked
+++ a/include/linux/mm.h
@@ -1918,8 +1918,6 @@ long pin_user_pages(unsigned long start,
struct vm_area_struct **vmas);
long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages, int *locked);
-long pin_user_pages_locked(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages, int *locked);
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
--- a/mm/gup.c~mm-gup-remove-unused-pin_user_pages_locked
+++ a/mm/gup.c
@@ -3127,32 +3127,3 @@ long pin_user_pages_unlocked(unsigned lo
return get_user_pages_unlocked(start, nr_pages, pages, gup_flags);
}
EXPORT_SYMBOL(pin_user_pages_unlocked);
-
-/*
- * pin_user_pages_locked() is the FOLL_PIN variant of get_user_pages_locked().
- * Behavior is the same, except that this one sets FOLL_PIN and rejects
- * FOLL_GET.
- */
-long pin_user_pages_locked(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- int *locked)
-{
- /*
- * FIXME: Current FOLL_LONGTERM behavior is incompatible with
- * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on
- * vmas. As there are no users of this flag in this call we simply
- * disallow this option for now.
- */
- if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
- return -EINVAL;
-
- /* FOLL_GET and FOLL_PIN are mutually exclusive. */
- if (WARN_ON_ONCE(gup_flags & FOLL_GET))
- return -EINVAL;
-
- gup_flags |= FOLL_PIN;
- return __get_user_pages_locked(current->mm, start, nr_pages,
- pages, NULL, locked,
- gup_flags | FOLL_TOUCH);
-}
-EXPORT_SYMBOL(pin_user_pages_locked);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 025/227] mm: change lookup_node() to use get_user_pages_fast()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: willy, peterx, lukas.bulwahn, kirill.shutemov, jgg, jgg, jack,
imbrenda, hch, david, alex.williamson, aarcange, jhubbard, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: John Hubbard <jhubbard@nvidia.com>
Subject: mm: change lookup_node() to use get_user_pages_fast()
The purpose of calling get_user_pages_locked() from lookup_node() was to
allow for unlocking the mmap_lock when reading a page from the disk during
a page fault (hidden behind VM_FAULT_RETRY). The idea was to reduce
contention on the heavily-used mmap_lock. (Thanks to Jan Kara for clearly
pointing that out, and in fact I've used some of his wording here.)
However, it is unlikely for lookup_node() to take a page fault. With that
in mind, change over to calling get_user_pages_fast(). This simplifies
the code, runs a little faster in the expected case, and allows removing
get_user_pages_locked() entirely, in a subsequent patch.
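For anyone wanting to exercise this path from userspace: lookup_node() is
reached through get_mempolicy(2) when both MPOL_F_NODE and MPOL_F_ADDR are
passed (see the do_get_mempolicy() hunk in this patch). A minimal sketch,
not part of the patch: node_of_address() is a made-up helper name, and the
raw syscall is used only to avoid assuming that libnuma is installed. The
call may legitimately fail (non-NUMA kernel, seccomp filter), so the helper
reports that instead of guessing.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Flag values as defined in the kernel UAPI header <linux/mempolicy.h>. */
#ifndef MPOL_F_NODE
#define MPOL_F_NODE (1 << 0)
#endif
#ifndef MPOL_F_ADDR
#define MPOL_F_ADDR (1 << 1)
#endif

/*
 * Hypothetical helper: return the NUMA node backing the page that
 * contains addr, or -1 with errno set if the kernel refuses the query.
 */
static int node_of_address(void *addr)
{
	int node = -1;

	assert(addr != NULL);
#ifdef SYS_get_mempolicy
	if (syscall(SYS_get_mempolicy, &node, NULL, 0UL, addr,
		    MPOL_F_NODE | MPOL_F_ADDR) != 0)
		return -1;
	return node;
#else
	/* No syscall number on this architecture/libc combination. */
	errno = ENOSYS;
	return -1;
#endif
}
```

Touch the buffer before asking, so that the page is already present and the
lookup stays on the fast path this patch is aiming for.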
Link: https://lkml.kernel.org/r/20220204020010.68930-5-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/mempolicy.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
--- a/mm/mempolicy.c~mm-change-lookup_node-to-use-get_user_pages_fast
+++ a/mm/mempolicy.c
@@ -907,17 +907,14 @@ static void get_policy_nodemask(struct m
static int lookup_node(struct mm_struct *mm, unsigned long addr)
{
struct page *p = NULL;
- int err;
+ int ret;
- int locked = 1;
- err = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &p, &locked);
- if (err > 0) {
- err = page_to_nid(p);
+ ret = get_user_pages_fast(addr & PAGE_MASK, 1, 0, &p);
+ if (ret > 0) {
+ ret = page_to_nid(p);
put_page(p);
}
- if (locked)
- mmap_read_unlock(mm);
- return err;
+ return ret;
}
/* Retrieve NUMA policy */
@@ -968,14 +965,14 @@ static long do_get_mempolicy(int *policy
if (flags & MPOL_F_NODE) {
if (flags & MPOL_F_ADDR) {
/*
- * Take a refcount on the mpol, lookup_node()
- * will drop the mmap_lock, so after calling
- * lookup_node() only "pol" remains valid, "vma"
- * is stale.
+ * Take a refcount on the mpol, because we are about to
+ * drop the mmap_lock, after which only "pol" remains
+ * valid, "vma" is stale.
*/
pol_refcount = pol;
vma = NULL;
mpol_get(pol);
+ mmap_read_unlock(mm);
err = lookup_node(mm, addr);
if (err < 0)
goto out;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 026/227] mm/gup: remove unused get_user_pages_locked()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: willy, peterx, lukas.bulwahn, kirill.shutemov, jgg, jgg, jack,
imbrenda, hch, david, alex.williamson, aarcange, jhubbard, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: John Hubbard <jhubbard@nvidia.com>
Subject: mm/gup: remove unused get_user_pages_locked()
Now that the last caller of get_user_pages_locked() is gone, remove it.
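For readers who never met the calling convention being removed, here is a
toy userspace model of it. This is only an illustration under loud
assumptions: toy_mm, toy_read_lock(), toy_lookup_locked() and toy_caller()
are invented names, the "lock" is a plain counter, and the real
__get_user_pages_locked() does considerably more. The one point it tries to
show is that the callee may return with the lock already dropped, so the
caller has to consult "locked" before unlocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an mm whose mmap_lock is modelled as a reader count. */
struct toy_mm {
	int read_holders;
};

static void toy_read_lock(struct toy_mm *mm)
{
	mm->read_holders++;
}

static void toy_read_unlock(struct toy_mm *mm)
{
	assert(mm->read_holders > 0);
	mm->read_holders--;
}

/*
 * Models a *_locked() lookup: must be entered with the lock held and
 * *locked == 1.  If the lookup would have to wait for I/O, it drops the
 * lock and tells the caller so by clearing *locked.
 */
static long toy_lookup_locked(struct toy_mm *mm, bool needs_io, int *locked)
{
	assert(*locked == 1 && mm->read_holders > 0);
	if (needs_io) {
		toy_read_unlock(mm);
		*locked = 0;
	}
	return 1;	/* pretend one page was found */
}

/* The caller-side pattern from the removed kernel-doc comment. */
static long toy_caller(struct toy_mm *mm, bool needs_io)
{
	int locked = 1;
	long ret;

	toy_read_lock(mm);
	ret = toy_lookup_locked(mm, needs_io, &locked);
	if (locked)
		toy_read_unlock(mm);
	return ret;
}
```

Either way the caller ends up with the lock released exactly once, which is
the bookkeeping that lookup_node() no longer has to carry.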
Link: https://lkml.kernel.org/r/20220204020010.68930-6-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/mm.h | 2 -
mm/gup.c | 59 -------------------------------------------
2 files changed, 61 deletions(-)
--- a/include/linux/mm.h~mm-gup-remove-unused-get_user_pages_locked
+++ a/include/linux/mm.h
@@ -1916,8 +1916,6 @@ long get_user_pages(unsigned long start,
long pin_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas);
-long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages, int *locked);
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
--- a/mm/gup.c~mm-gup-remove-unused-get_user_pages_locked
+++ a/mm/gup.c
@@ -2126,65 +2126,6 @@ long get_user_pages(unsigned long start,
}
EXPORT_SYMBOL(get_user_pages);
-/**
- * get_user_pages_locked() - variant of get_user_pages()
- *
- * @start: starting user address
- * @nr_pages: number of pages from start to pin
- * @gup_flags: flags modifying lookup behaviour
- * @pages: array that receives pointers to the pages pinned.
- * Should be at least nr_pages long. Or NULL, if caller
- * only intends to ensure the pages are faulted in.
- * @locked: pointer to lock flag indicating whether lock is held and
- * subsequently whether VM_FAULT_RETRY functionality can be
- * utilised. Lock must initially be held.
- *
- * It is suitable to replace the form:
- *
- * mmap_read_lock(mm);
- * do_something()
- * get_user_pages(mm, ..., pages, NULL);
- * mmap_read_unlock(mm);
- *
- * to:
- *
- * int locked = 1;
- * mmap_read_lock(mm);
- * do_something()
- * get_user_pages_locked(mm, ..., pages, &locked);
- * if (locked)
- * mmap_read_unlock(mm);
- *
- * We can leverage the VM_FAULT_RETRY functionality in the page fault
- * paths better by using either get_user_pages_locked() or
- * get_user_pages_unlocked().
- *
- */
-long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- int *locked)
-{
- /*
- * FIXME: Current FOLL_LONGTERM behavior is incompatible with
- * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on
- * vmas. As there are no users of this flag in this call we simply
- * disallow this option for now.
- */
- if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
- return -EINVAL;
- /*
- * FOLL_PIN must only be set internally by the pin_user_pages*() APIs,
- * never directly by the caller, so enforce that:
- */
- if (WARN_ON_ONCE(gup_flags & FOLL_PIN))
- return -EINVAL;
-
- return __get_user_pages_locked(current->mm, start, nr_pages,
- pages, NULL, locked,
- gup_flags | FOLL_TOUCH);
-}
-EXPORT_SYMBOL(get_user_pages_locked);
-
/*
* get_user_pages_unlocked() is suitable to replace the form:
*
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 027/227] mm/swap: fix confusing comment in folio_mark_accessed
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: libang.linuxer, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Bang Li <libang.linuxer@gmail.com>
Subject: mm/swap: fix confusing comment in folio_mark_accessed
For unevictable pages, we don't need to mark them.
Link: https://lkml.kernel.org/r/20220311141519.59948-1-libang.linuxer@gmail.com
Signed-off-by: Bang Li <libang.linuxer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/swap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/swap.c~mm-swap-fix-confusing-comment-in-folio_mark_accessed
+++ a/mm/swap.c
@@ -425,7 +425,7 @@ void folio_mark_accessed(struct folio *f
/*
* Unevictable pages are on the "LRU_UNEVICTABLE" list. But,
* this list is never rotated or maintained, so marking an
- * evictable page accessed has no effect.
+ * unevictable page accessed has no effect.
*/
} else if (!folio_test_active(folio)) {
/*
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 028/227] tmpfs: support for file creation time
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: xavier.grand, sylvain.bellone, jdelvare, hughd, xavier.roche,
akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Xavier Roche <xavier.roche@algolia.com>
Subject: tmpfs: support for file creation time
Various filesystems (including ext4) now support file creation time. This
patch adds such support for tmpfs-based filesystems.
Note that using shmem_getattr() on file types other than regular files
requires that shmem_is_huge() check the file type, to avoid reporting an
incorrect HPAGE_PMD_SIZE blksize.
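The new field is visible to userspace through statx(2) with STATX_BTIME.
A minimal sketch of how one might query it, not part of the patch:
file_birth_time() is a made-up helper name, and the statx() wrapper is
assumed to come from glibc 2.28 or later. The filesystem is free not to
report a birth time, so the result mask has to be checked.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>	/* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
#include <sys/stat.h>	/* statx(), struct statx, STATX_BTIME */
#include <time.h>

/*
 * Hypothetical helper: fetch the creation ("birth") time of path.
 * Returns 1 and fills *out if the filesystem reported it, 0 if the
 * call succeeded but no birth time is available, -1 on error.
 */
static int file_birth_time(const char *path, struct timespec *out)
{
	struct statx stx;

	assert(path != NULL && out != NULL);
	if (statx(AT_FDCWD, path, AT_SYMLINK_NOFOLLOW, STATX_BTIME, &stx) != 0)
		return -1;
	/* The kernel sets STATX_BTIME in stx_mask only if btime is valid. */
	if (!(stx.stx_mask & STATX_BTIME))
		return 0;
	out->tv_sec = stx.stx_btime.tv_sec;
	out->tv_nsec = stx.stx_btime.tv_nsec;
	return 1;
}
```

On a kernel with this patch, calling it on a file under a tmpfs mount
should return 1; without the patch, 0 would be the expected answer there.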
[hughd@google.com: three tweaks to creation time patch]
Link: https://lkml.kernel.org/r/b954973a-b8d1-cab8-63bd-6ea8063de3@google.com
Link: https://lkml.kernel.org/r/20220314211150.GA123458@xavier-xps
Link: https://lkml.kernel.org/r/20220211213628.GA1919658@xavier-xps
Signed-off-by: Xavier Roche <xavier.roche@algolia.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Jean Delvare <jdelvare@suse.de>
Tested-by: Sylvain Bellone <sylvain.bellone@algolia.com>
Reported-by: Xavier Grand <xavier.grand@algolia.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/shmem_fs.h | 1 +
mm/shmem.c | 16 +++++++++++++---
2 files changed, 14 insertions(+), 3 deletions(-)
--- a/include/linux/shmem_fs.h~tmpfs-support-for-file-creation-time
+++ a/include/linux/shmem_fs.h
@@ -24,6 +24,7 @@ struct shmem_inode_info {
struct shared_policy policy; /* NUMA memory alloc policy */
struct simple_xattrs xattrs; /* list of xattrs */
atomic_t stop_eviction; /* hold when working on inode */
+ struct timespec64 i_crtime; /* file creation time */
struct inode vfs_inode;
};
--- a/mm/shmem.c~tmpfs-support-for-file-creation-time
+++ a/mm/shmem.c
@@ -476,6 +476,8 @@ bool shmem_is_huge(struct vm_area_struct
{
loff_t i_size;
+ if (!S_ISREG(inode->i_mode))
+ return false;
if (shmem_huge == SHMEM_HUGE_DENY)
return false;
if (vma && ((vma->vm_flags & VM_NOHUGEPAGE) ||
@@ -1061,6 +1063,12 @@ static int shmem_getattr(struct user_nam
if (shmem_is_huge(NULL, inode, 0))
stat->blksize = HPAGE_PMD_SIZE;
+ if (request_mask & STATX_BTIME) {
+ stat->result_mask |= STATX_BTIME;
+ stat->btime.tv_sec = info->i_crtime.tv_sec;
+ stat->btime.tv_nsec = info->i_crtime.tv_nsec;
+ }
+
return 0;
}
@@ -1854,9 +1862,6 @@ repeat:
return 0;
}
- /* Never use a huge page for shmem_symlink() */
- if (S_ISLNK(inode->i_mode))
- goto alloc_nohuge;
if (!shmem_is_huge(vma, inode, index))
goto alloc_nohuge;
@@ -2265,6 +2270,7 @@ static struct inode *shmem_get_inode(str
atomic_set(&info->stop_eviction, 0);
info->seals = F_SEAL_SEAL;
info->flags = flags & VM_NORESERVE;
+ info->i_crtime = inode->i_mtime;
INIT_LIST_HEAD(&info->shrinklist);
INIT_LIST_HEAD(&info->swaplist);
simple_xattrs_init(&info->xattrs);
@@ -3196,6 +3202,7 @@ static ssize_t shmem_listxattr(struct de
#endif /* CONFIG_TMPFS_XATTR */
static const struct inode_operations shmem_short_symlink_operations = {
+ .getattr = shmem_getattr,
.get_link = simple_get_link,
#ifdef CONFIG_TMPFS_XATTR
.listxattr = shmem_listxattr,
@@ -3203,6 +3210,7 @@ static const struct inode_operations shm
};
static const struct inode_operations shmem_symlink_inode_operations = {
+ .getattr = shmem_getattr,
.get_link = shmem_get_link,
#ifdef CONFIG_TMPFS_XATTR
.listxattr = shmem_listxattr,
@@ -3790,6 +3798,7 @@ static const struct inode_operations shm
static const struct inode_operations shmem_dir_inode_operations = {
#ifdef CONFIG_TMPFS
+ .getattr = shmem_getattr,
.create = shmem_create,
.lookup = simple_lookup,
.link = shmem_link,
@@ -3811,6 +3820,7 @@ static const struct inode_operations shm
};
static const struct inode_operations shmem_special_inode_operations = {
+ .getattr = shmem_getattr,
#ifdef CONFIG_TMPFS_XATTR
.listxattr = shmem_listxattr,
#endif
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 029/227] shmem: mapping_set_exiting() to help mapped resilience
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:39 UTC (permalink / raw)
To: hughd, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Hugh Dickins <hughd@google.com>
Subject: shmem: mapping_set_exiting() to help mapped resilience
When I added page_mapped() resilience in __delete_from_page_cache() for
the mapping_exiting() case, I missed that mapping_set_exiting() is done in
truncate_inode_pages_final(), which is not actually called for shmem.
(Today, it is folio_mapped() resilience in filemap_unaccount_folio().)
So the fixup to avoid a memory leak in this case never worked on shmem:
add a mapping_set_exiting() in shmem_evict_inode() at last. But this is
hardly a candidate for stable, since it only helps in the "Bad page" case.
Link: https://lkml.kernel.org/r/beefffda-6326-e36d-2d41-ed15b51af872@google.com
Fixes: 06b241f32c71 ("mm: __delete_from_page_cache show Bad page if mapped")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/shmem.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/shmem.c~shmem-mapping_set_exiting-to-help-mapped-resilience
+++ a/mm/shmem.c
@@ -1129,6 +1129,7 @@ static void shmem_evict_inode(struct ino
if (shmem_mapping(inode->i_mapping)) {
shmem_unacct_size(info->flags, inode->i_size);
inode->i_size = 0;
+ mapping_set_exiting(inode->i_mapping);
shmem_truncate_range(inode, 0, (loff_t)-1);
if (!list_empty(&info->shrinklist)) {
spin_lock(&sbinfo->shrinklist_lock);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 030/227] tmpfs: do not allocate pages on read
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: zkabelac, mpatocka, miklos, lczerner, hch, djwong, bp, hughd,
akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Hugh Dickins <hughd@google.com>
Subject: tmpfs: do not allocate pages on read
Mikulas asked in
https://lore.kernel.org/linux-mm/alpine.LRH.2.02.2007210510230.6959@file01.intranet.prod.int.rdu2.redhat.com/
Do we still need a0ee5ec520ed ("tmpfs: allocate on read when stacked")?
Lukas noticed this unusual behavior of loop device backed by tmpfs in
https://lore.kernel.org/linux-mm/20211126075100.gd64odg2bcptiqeb@work/
Normally, shmem_file_read_iter() copies the ZERO_PAGE when reading holes;
but if it looks like it might be a read for "a stacking filesystem", it
allocates actual pages to the page cache, and even marks them as dirty.
And reads from the loop device do satisfy the test that is used.
This oddity was added for an old version of unionfs, to help to limit its
usage to the limited size of the tmpfs mount involved; but about the same
time as the tmpfs mod went in (2.6.25), unionfs was reworked to proceed
differently; and the mod was kept just in case others needed it.
Do we still need it? I cannot answer with more certainty than "Probably
not". It's nasty enough that we really should try to delete it; but if a
regression is reported somewhere, then we might have to revert later.
It's not quite as simple as just removing the test (as Mikulas did):
xfstests generic/013 hung because splice from tmpfs failed on page not
up-to-date and page mapping unset. That can be fixed just by marking the
ZERO_PAGE as Uptodate, which of course it is: do so in pagecache_init() -
it might be useful to others than tmpfs.
My intention, though, was to stop using the ZERO_PAGE here altogether:
surely iov_iter_zero() is better for this case? Sadly not: it relies on
clear_user(), and the x86 clear_user() is slower than its copy_user():
https://lore.kernel.org/lkml/2f5ca5e4-e250-a41c-11fb-a7f4ebc7e1c9@google.com/
But while we are still using the ZERO_PAGE, let's stop dirtying its struct
page cacheline with unnecessary get_page() and put_page().
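The ordinary case described above (a hole reads back as zeroes without
anything being allocated) can be checked from userspace. A minimal sketch,
not part of the patch: read_hole() is a made-up helper name, and it only
covers a plain read(2) into a user buffer. It does not reproduce the
loop-device case that used to trigger allocation; that needs root and a
loop device on top of a tmpfs file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a 1 MiB file in dir that is one big hole,
 * read it back, and record st_blocks before and after the read.
 * Returns 0 if every byte read was zero, 1 if not, -1 on error.
 */
static int read_hole(const char *dir, long long *blocks_before,
		     long long *blocks_after)
{
	char path[4096];
	char buf[4096];
	struct stat st;
	ssize_t n;
	int fd, rc = -1;

	assert(dir && blocks_before && blocks_after);
	if (snprintf(path, sizeof(path), "%s/hole-XXXXXX", dir) >=
	    (int)sizeof(path))
		return -1;
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* the open fd keeps the file alive */

	/* Extending an empty file with ftruncate() leaves it all hole. */
	if (ftruncate(fd, 1 << 20) != 0 || fstat(fd, &st) != 0)
		goto out;
	*blocks_before = st.st_blocks;

	rc = 0;
	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		for (ssize_t i = 0; i < n; i++)
			if (buf[i] != 0)
				rc = 1;
	}
	if (n < 0 || fstat(fd, &st) != 0) {
		rc = -1;
		goto out;
	}
	*blocks_after = st.st_blocks;
out:
	close(fd);
	return rc;
}
```

Pointing dir at a tmpfs mount and comparing the two block counts gives a
quick sanity check that reading did not populate the file.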
Link: https://lkml.kernel.org/r/90bc5e69-9984-b5fa-a685-be55f2b64b@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Lukas Czerner <lczerner@redhat.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/filemap.c | 6 ++++++
mm/shmem.c | 20 ++++++--------------
2 files changed, 12 insertions(+), 14 deletions(-)
--- a/mm/filemap.c~tmpfs-do-not-allocate-pages-on-read
+++ a/mm/filemap.c
@@ -1054,6 +1054,12 @@ void __init pagecache_init(void)
init_waitqueue_head(&folio_wait_table[i]);
page_writeback_init();
+
+ /*
+ * tmpfs uses the ZERO_PAGE for reading holes: it is up-to-date,
+ * and splice's page_cache_pipe_buf_confirm() needs to see that.
+ */
+ SetPageUptodate(ZERO_PAGE(0));
}
/*
--- a/mm/shmem.c~tmpfs-do-not-allocate-pages-on-read
+++ a/mm/shmem.c
@@ -2499,19 +2499,10 @@ static ssize_t shmem_file_read_iter(stru
struct address_space *mapping = inode->i_mapping;
pgoff_t index;
unsigned long offset;
- enum sgp_type sgp = SGP_READ;
int error = 0;
ssize_t retval = 0;
loff_t *ppos = &iocb->ki_pos;
- /*
- * Might this read be for a stacking filesystem? Then when reading
- * holes of a sparse file, we actually need to allocate those pages,
- * and even mark them dirty, so it cannot exceed the max_blocks limit.
- */
- if (!iter_is_iovec(to))
- sgp = SGP_CACHE;
-
index = *ppos >> PAGE_SHIFT;
offset = *ppos & ~PAGE_MASK;
@@ -2520,6 +2511,7 @@ static ssize_t shmem_file_read_iter(stru
pgoff_t end_index;
unsigned long nr, ret;
loff_t i_size = i_size_read(inode);
+ bool got_page;
end_index = i_size >> PAGE_SHIFT;
if (index > end_index)
@@ -2530,15 +2522,13 @@ static ssize_t shmem_file_read_iter(stru
break;
}
- error = shmem_getpage(inode, index, &page, sgp);
+ error = shmem_getpage(inode, index, &page, SGP_READ);
if (error) {
if (error == -EINVAL)
error = 0;
break;
}
if (page) {
- if (sgp == SGP_CACHE)
- set_page_dirty(page);
unlock_page(page);
if (PageHWPoison(page)) {
@@ -2578,9 +2568,10 @@ static ssize_t shmem_file_read_iter(stru
*/
if (!offset)
mark_page_accessed(page);
+ got_page = true;
} else {
page = ZERO_PAGE(0);
- get_page(page);
+ got_page = false;
}
/*
@@ -2593,7 +2584,8 @@ static ssize_t shmem_file_read_iter(stru
index += offset >> PAGE_SHIFT;
offset &= ~PAGE_MASK;
- put_page(page);
+ if (got_page)
+ put_page(page);
if (!iov_iter_count(to))
break;
if (ret < nr) {
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 031/227] mm: shmem: use helper macro __ATTR_RW
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: hughd, linmiaohe, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm: shmem: use helper macro __ATTR_RW
Use the helper macro __ATTR_RW to define shmem_enabled_attr, which makes the
code clearer. Minor readability improvement.
Link: https://lkml.kernel.org/r/20220312082252.55586-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/shmem.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/mm/shmem.c~mm-shmem-use-helper-macro-__attr_rw
+++ a/mm/shmem.c
@@ -3965,8 +3965,7 @@ static ssize_t shmem_enabled_store(struc
return count;
}
-struct kobj_attribute shmem_enabled_attr =
- __ATTR(shmem_enabled, 0644, shmem_enabled_show, shmem_enabled_store);
+struct kobj_attribute shmem_enabled_attr = __ATTR_RW(shmem_enabled);
#endif /* CONFIG_TRANSPARENT_HUGEPAGE && CONFIG_SYSFS */
#else /* !CONFIG_SHMEM */
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 032/227] memcg: replace in_interrupt() with !in_task()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: vvs, roman.gushchin, mhocko, hannes, shakeelb, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: Shakeel Butt <shakeelb@google.com>
Subject: memcg: replace in_interrupt() with !in_task()
Replace the deprecated in_interrupt() with !in_task() because
in_interrupt() returns true for BH disabled even if the call happens in
the task context. in_task() is the right interface to differentiate task
context from NMI, hard IRQ and softirq contexts.
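The difference can be sketched with a simplified userspace model of
preempt_count. The MODEL_* names and the bit positions below are assumptions
made for this illustration and are not copied from the kernel headers; the
point is only that disabling BH sets softirq bits without setting the
"serving softirq" bit.

```c
#include <stdbool.h>

/* Assumed layout: softirq count in bits 8-15, hardirq in 16-19, NMI in 20-23. */
#define MODEL_SOFTIRQ_SHIFT	8
#define MODEL_HARDIRQ_SHIFT	16
#define MODEL_NMI_SHIFT		20

#define MODEL_SOFTIRQ_MASK	(0xffu << MODEL_SOFTIRQ_SHIFT)
#define MODEL_HARDIRQ_MASK	(0xfu << MODEL_HARDIRQ_SHIFT)
#define MODEL_NMI_MASK		(0xfu << MODEL_NMI_SHIFT)

/* Serving a softirq adds one unit; disabling BH adds two units. */
#define MODEL_SOFTIRQ_OFFSET		(1u << MODEL_SOFTIRQ_SHIFT)
#define MODEL_SOFTIRQ_DISABLE_OFFSET	(2u * MODEL_SOFTIRQ_OFFSET)

/* Model of in_interrupt(): true if any NMI, hardirq or softirq bit is set. */
bool model_in_interrupt(unsigned int preempt_count)
{
	return (preempt_count & (MODEL_NMI_MASK | MODEL_HARDIRQ_MASK |
				 MODEL_SOFTIRQ_MASK)) != 0;
}

/*
 * Model of in_task(): false only in NMI, in hard IRQ, or while actually
 * serving a softirq. A BH-disabled section leaves this true.
 */
bool model_in_task(unsigned int preempt_count)
{
	return (preempt_count & (MODEL_NMI_MASK | MODEL_HARDIRQ_MASK |
				 MODEL_SOFTIRQ_OFFSET)) == 0;
}
```

With preempt_count equal to MODEL_SOFTIRQ_DISABLE_OFFSET (task context with
BH disabled), model_in_interrupt() is true while model_in_task() is also
true, which is the case the changelog describes.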
Link: https://lkml.kernel.org/r/20220127162636.3461256-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/memcontrol.c~memcg-replace-in_interrupt-with-in_task
+++ a/mm/memcontrol.c
@@ -2688,7 +2688,7 @@ done_restock:
READ_ONCE(memcg->swap.high);
/* Don't bother a random interrupted task */
- if (in_interrupt()) {
+ if (!in_task()) {
if (mem_high) {
schedule_work(&memcg->high_work);
break;
@@ -6968,7 +6968,7 @@ void mem_cgroup_sk_alloc(struct sock *sk
return;
/* Do not associate the sock with unrelated interrupted task's memcg. */
- if (in_interrupt())
+ if (!in_task())
return;
rcu_read_lock();
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 033/227] memcg: add per-memcg total kernel memory stat
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: songmuchun, shakeelb, mhocko, hannes, yosryahmed, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: Yosry Ahmed <yosryahmed@google.com>
Subject: memcg: add per-memcg total kernel memory stat
Currently memcg stats show several types of kernel memory: kernel stack,
page tables, sock, vmalloc, and slab. However, there are other
allocations with __GFP_ACCOUNT (or supersets such as GFP_KERNEL_ACCOUNT)
that are not accounted in any of those stats. A few examples are:
- various kvm allocations (e.g. allocated pages to create vcpus)
- io_uring
- tmp_page in pipes during pipe_write()
- bpf ringbuffers
- unix sockets
Keeping track of the total kernel memory is essential for the ease of
migration from cgroup v1 to v2 as there are large discrepancies between
v1's kmem.usage_in_bytes and the sum of the available kernel memory stats
in v2. Adding separate memcg stats for all __GFP_ACCOUNT kernel
allocations is an impractical maintenance burden as there are a lot of those
all over the kernel code, with more use cases likely to show up in the
future.
Therefore, add a "kernel" memcg stat that is analogous to the kmem page
counter, with added benefits such as using rstat infrastructure which
aggregates stats more efficiently. Additionally, this provides a lighter
alternative in case the legacy kmem is deprecated in the future.
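For consumers of the new stat, here is a minimal sketch of picking the
"kernel" entry out of memory.stat-style text ("name value" per line). The
function name memstat_lookup() is hypothetical. Note that the lookup must
match the whole key, since "kernel" is also a prefix of "kernel_stack".

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find `key` in flat-keyed text of the form
 * "name value\n" and store its value. Returns 0 on success, -1 if the
 * key is absent or its value cannot be parsed.
 */
int memstat_lookup(const char *text, const char *key,
		   unsigned long long *value)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Require a space right after the key: "kernel" != "kernel_stack". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
			return sscanf(line + klen, "%llu", value) == 1 ? 0 : -1;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

In practice the text would come from reading a cgroup's memory.stat file;
the sketch takes a string so that it stays independent of any particular
cgroup path.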
[yosryahmed@google.com: v2]
Link: https://lkml.kernel.org/r/20220203193856.972500-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20220201200823.3283171-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/admin-guide/cgroup-v2.rst | 5 ++++
include/linux/memcontrol.h | 1
mm/memcontrol.c | 27 +++++++++++++++++-----
3 files changed, 27 insertions(+), 6 deletions(-)
--- a/Documentation/admin-guide/cgroup-v2.rst~memcg-add-per-memcg-total-kernel-memory-stat
+++ a/Documentation/admin-guide/cgroup-v2.rst
@@ -1301,6 +1301,11 @@ PAGE_SIZE multiple when read back.
Amount of memory used to cache filesystem data,
including tmpfs and shared memory.
+ kernel (npn)
+ Amount of total kernel memory, including
+ (kernel_stack, pagetables, percpu, vmalloc, slab) in
+ addition to other kernel memory use cases.
+
kernel_stack
Amount of memory allocated to kernel stacks.
--- a/include/linux/memcontrol.h~memcg-add-per-memcg-total-kernel-memory-stat
+++ a/include/linux/memcontrol.h
@@ -34,6 +34,7 @@ enum memcg_stat_item {
MEMCG_SOCK,
MEMCG_PERCPU_B,
MEMCG_VMALLOC,
+ MEMCG_KMEM,
MEMCG_NR_STAT,
};
--- a/mm/memcontrol.c~memcg-add-per-memcg-total-kernel-memory-stat
+++ a/mm/memcontrol.c
@@ -1371,6 +1371,7 @@ struct memory_stat {
static const struct memory_stat memory_stats[] = {
{ "anon", NR_ANON_MAPPED },
{ "file", NR_FILE_PAGES },
+ { "kernel", MEMCG_KMEM },
{ "kernel_stack", NR_KERNEL_STACK_KB },
{ "pagetables", NR_PAGETABLE },
{ "percpu", MEMCG_PERCPU_B },
@@ -2114,6 +2115,7 @@ static DEFINE_MUTEX(percpu_charge_mutex)
static void drain_obj_stock(struct obj_stock *stock);
static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
struct mem_cgroup *root_memcg);
+static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages);
#else
static inline void drain_obj_stock(struct obj_stock *stock)
@@ -2124,6 +2126,9 @@ static bool obj_stock_flush_required(str
{
return false;
}
+static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
+{
+}
#endif
/**
@@ -2979,6 +2984,18 @@ static void memcg_free_cache_id(int id)
ida_simple_remove(&memcg_cache_ida, id);
}
+static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
+{
+ mod_memcg_state(memcg, MEMCG_KMEM, nr_pages);
+ if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
+ if (nr_pages > 0)
+ page_counter_charge(&memcg->kmem, nr_pages);
+ else
+ page_counter_uncharge(&memcg->kmem, -nr_pages);
+ }
+}
+
+
/*
* obj_cgroup_uncharge_pages: uncharge a number of kernel pages from a objcg
* @objcg: object cgroup to uncharge
@@ -2991,8 +3008,7 @@ static void obj_cgroup_uncharge_pages(st
memcg = get_mem_cgroup_from_objcg(objcg);
- if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
- page_counter_uncharge(&memcg->kmem, nr_pages);
+ memcg_account_kmem(memcg, -nr_pages);
refill_stock(memcg, nr_pages);
css_put(&memcg->css);
@@ -3018,8 +3034,7 @@ static int obj_cgroup_charge_pages(struc
if (ret)
goto out;
- if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
- page_counter_charge(&memcg->kmem, nr_pages);
+ memcg_account_kmem(memcg, nr_pages);
out:
css_put(&memcg->css);
@@ -6801,8 +6816,8 @@ static void uncharge_batch(const struct
page_counter_uncharge(&ug->memcg->memory, ug->nr_memory);
if (do_memsw_account())
page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
- if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
- page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
+ if (ug->nr_kmem)
+ memcg_account_kmem(ug->memcg, -ug->nr_kmem);
memcg_oom_recover(ug->memcg);
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 034/227] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: vdavydov.dev, vbabka, surenb, songmuchun, shy828301, shakeelb,
rppt, roman.gushchin, mhocko, hannes, guro, richard.weiyang,
akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Wei Yang <richard.weiyang@gmail.com>
Subject: mm/memcg: mem_cgroup_per_node is already set to 0 on allocation
kzalloc_node() already zeroes the allocation, so there is no need to set
these fields to 0 again.
Link: https://lkml.kernel.org/r/20220201004643.8391-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 2 --
1 file changed, 2 deletions(-)
--- a/mm/memcontrol.c~mm-memcg-mem_cgroup_per_node-is-already-set-to-0-on-allocation
+++ a/mm/memcontrol.c
@@ -5105,8 +5105,6 @@ static int alloc_mem_cgroup_per_node_inf
}
lruvec_init(&pn->lruvec);
- pn->usage_in_excess = 0;
- pn->on_tree = false;
pn->memcg = memcg;
memcg->nodeinfo[node] = pn;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 035/227] mm/memcg: retrieve parent memcg from css.parent
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: vdavydov.dev, vbabka, surenb, songmuchun, shy828301, shakeelb,
rppt, roman.gushchin, mhocko, hannes, guro, richard.weiyang,
akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Wei Yang <richard.weiyang@gmail.com>
Subject: mm/memcg: retrieve parent memcg from css.parent
The parent we get from page_counter is correct, but page_counter and css are
two separate hierarchies.
Let's retrieve the parent memcg from css.parent just like parent_cs(),
blkcg_parent(), etc.
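The pattern can be sketched in userspace with simplified stand-in types. The
struct and function names below (css_model, memcg_model, and so on) are
invented for this illustration and are not the kernel definitions. The sketch
assumes that the css-to-memcg conversion maps a NULL css to a NULL memcg,
which is what lets the explicit NULL check for the root go away.

```c
#include <stddef.h>

/* Stand-in for the embedded cgroup state: only the parent link is modelled. */
struct css_model {
	struct css_model *parent;
};

/* Stand-in for the memcg: embeds its css, like the pattern in the patch. */
struct memcg_model {
	int id;
	struct css_model css;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Assumed behaviour: a NULL css yields a NULL memcg. */
static struct memcg_model *memcg_from_css_model(struct css_model *css)
{
	return css ? model_container_of(css, struct memcg_model, css) : NULL;
}

/* Parent lookup through the css hierarchy; the root has no parent. */
struct memcg_model *parent_memcg_model(struct memcg_model *memcg)
{
	return memcg_from_css_model(memcg->css.parent);
}
```

For a root whose css.parent is NULL, parent_memcg_model() returns NULL
without any special case in the caller.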
Link: https://lkml.kernel.org/r/20220201004643.8391-2-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/memcontrol.h | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
--- a/include/linux/memcontrol.h~mm-memcg-retrieve-parent-memcg-from-cssparent
+++ a/include/linux/memcontrol.h
@@ -842,9 +842,7 @@ static inline struct mem_cgroup *lruvec_
*/
static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
{
- if (!memcg->memory.parent)
- return NULL;
- return mem_cgroup_from_counter(memcg->memory.parent, memory);
+ return mem_cgroup_from_css(memcg->css.parent);
}
static inline bool mem_cgroup_is_descendant(struct mem_cgroup *memcg,
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 036/227] memcg: refactor mem_cgroup_oom
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: roman.gushchin, mhocko, hannes, guro, chris, shakeelb, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Shakeel Butt <shakeelb@google.com>
Subject: memcg: refactor mem_cgroup_oom
Patch series "memcg: robust enforcement of memory.high", v2.
Due to the semantics of memory.high enforcement, i.e. throttling the
workload without oom-kill, we are trying to use it for right-sizing the
workloads in our production environment. However, we observed that the
mechanism fails for some specific applications which do big chunks of
allocation in a single syscall. This failure is due to a limitation of
the current implementation of memory.high enforcement.
This patch series solves the issue by enforcing memory.high
synchronously if the current process has accumulated a large amount of
high overcharge.
This patch (of 4):
The function mem_cgroup_oom returns an enum which has four possible values,
but the caller does not care about those values and only checks whether the
return value is OOM_SUCCESS or not. So, remove the enum altogether and
make mem_cgroup_oom return a simple bool.
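The equivalence this relies on can be checked with a small userspace
sketch. The enum values are the ones removed by the patch; the two helper
functions are hypothetical and only model the caller's view, not the
kernel's actual control flow.

```c
#include <assert.h>
#include <stdbool.h>

/* The four outcomes the old enum distinguished (names from the patch). */
enum oom_status_model { OOM_SUCCESS, OOM_FAILED, OOM_ASYNC, OOM_SKIPPED };

/* What the caller effectively did with the enum before the patch. */
static bool caller_retries_old(enum oom_status_model status)
{
	return status == OOM_SUCCESS;
}

/* The bool a bool-returning function would yield for each old outcome. */
static bool oom_result_new(enum oom_status_model status)
{
	switch (status) {
	case OOM_SUCCESS:
		return true;
	case OOM_FAILED:
	case OOM_ASYNC:
	case OOM_SKIPPED:
		return false;
	}
	return false;
}
```

For every old outcome the two helpers agree, which is why the caller loses
no information when the enum is collapsed.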
Link: https://lkml.kernel.org/r/20220211064917.2028469-1-shakeelb@google.com
Link: https://lkml.kernel.org/r/20220211064917.2028469-2-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 44 +++++++++++++++++---------------------------
1 file changed, 17 insertions(+), 27 deletions(-)
--- a/mm/memcontrol.c~memcg-refactor-mem_cgroup_oom
+++ a/mm/memcontrol.c
@@ -1796,20 +1796,16 @@ static void memcg_oom_recover(struct mem
__wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, memcg);
}
-enum oom_status {
- OOM_SUCCESS,
- OOM_FAILED,
- OOM_ASYNC,
- OOM_SKIPPED
-};
-
-static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
+/*
+ * Returns true if successfully killed one or more processes. Though in some
+ * corner cases it can return true even without killing any process.
+ */
+static bool mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
{
- enum oom_status ret;
- bool locked;
+ bool locked, ret;
if (order > PAGE_ALLOC_COSTLY_ORDER)
- return OOM_SKIPPED;
+ return false;
memcg_memory_event(memcg, MEMCG_OOM);
@@ -1832,14 +1828,13 @@ static enum oom_status mem_cgroup_oom(st
* victim and then we have to bail out from the charge path.
*/
if (memcg->oom_kill_disable) {
- if (!current->in_user_fault)
- return OOM_SKIPPED;
- css_get(&memcg->css);
- current->memcg_in_oom = memcg;
- current->memcg_oom_gfp_mask = mask;
- current->memcg_oom_order = order;
-
- return OOM_ASYNC;
+ if (current->in_user_fault) {
+ css_get(&memcg->css);
+ current->memcg_in_oom = memcg;
+ current->memcg_oom_gfp_mask = mask;
+ current->memcg_oom_order = order;
+ }
+ return false;
}
mem_cgroup_mark_under_oom(memcg);
@@ -1850,10 +1845,7 @@ static enum oom_status mem_cgroup_oom(st
mem_cgroup_oom_notify(memcg);
mem_cgroup_unmark_under_oom(memcg);
- if (mem_cgroup_out_of_memory(memcg, mask, order))
- ret = OOM_SUCCESS;
- else
- ret = OOM_FAILED;
+ ret = mem_cgroup_out_of_memory(memcg, mask, order);
if (locked)
mem_cgroup_oom_unlock(memcg);
@@ -2546,7 +2538,6 @@ static int try_charge_memcg(struct mem_c
int nr_retries = MAX_RECLAIM_RETRIES;
struct mem_cgroup *mem_over_limit;
struct page_counter *counter;
- enum oom_status oom_status;
unsigned long nr_reclaimed;
bool passed_oom = false;
bool may_swap = true;
@@ -2649,9 +2640,8 @@ retry:
* a forward progress or bypass the charge if the oom killer
* couldn't make any progress.
*/
- oom_status = mem_cgroup_oom(mem_over_limit, gfp_mask,
- get_order(nr_pages * PAGE_SIZE));
- if (oom_status == OOM_SUCCESS) {
+ if (mem_cgroup_oom(mem_over_limit, gfp_mask,
+ get_order(nr_pages * PAGE_SIZE))) {
passed_oom = true;
nr_retries = MAX_RECLAIM_RETRIES;
goto retry;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 037/227] memcg: unify force charging conditions
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: roman.gushchin, mhocko, hannes, guro, chris, shakeelb, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Shakeel Butt <shakeelb@google.com>
Subject: memcg: unify force charging conditions
Currently the kernel force charges the allocations which have __GFP_HIGH
flag without triggering the memory reclaim. __GFP_HIGH indicates that the
caller is high priority and since commit 869712fd3de5 ("mm: memcontrol:
fix network errors from failing __GFP_ATOMIC charges") the kernel lets
such allocations do force charging. Please note that __GFP_ATOMIC has
been replaced by __GFP_HIGH.
__GFP_HIGH does not tell if the caller can block or can trigger reclaim;
there are separate checks to determine that. So there is no need to skip
reclaim for __GFP_HIGH allocations; handle __GFP_HIGH together
with __GFP_NOFAIL, which also does force charging.
Please note that this is a noop change, as there are currently no __GFP_HIGH
allocators in the kernel which also have __GFP_ACCOUNT (or SLAB_ACCOUNT)
and do not allow reclaim.
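A minimal userspace model of the unified check at the nomem label might
look like the following. The flag values and the function name are
hypothetical; the real gfp flags live in the kernel headers and the real
check sits inside try_charge_memcg().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this illustration. */
#define GFP_HIGH_MODEL   0x1u
#define GFP_NOFAIL_MODEL 0x2u

/*
 * Models the check at the "nomem" label after the patch: true means the
 * charge is forced through, false means the caller gets -ENOMEM.
 */
static bool force_charge_at_nomem(unsigned int gfp_mask)
{
	return (gfp_mask & (GFP_NOFAIL_MODEL | GFP_HIGH_MODEL)) != 0;
}
```

In this model either flag alone is enough to force the charge, and a mask
with neither flag takes the -ENOMEM path.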
Link: https://lkml.kernel.org/r/20220211064917.2028469-3-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chris Down <chris@chrisdown.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
--- a/mm/memcontrol.c~memcg-unify-force-charging-conditions
+++ a/mm/memcontrol.c
@@ -2566,15 +2566,6 @@ retry:
}
/*
- * Memcg doesn't have a dedicated reserve for atomic
- * allocations. But like the global atomic pool, we need to
- * put the burden of reclaim on regular allocation requests
- * and let these go through as privileged allocations.
- */
- if (gfp_mask & __GFP_ATOMIC)
- goto force;
-
- /*
* Prevent unbounded recursion when reclaim operations need to
* allocate memory. This might exceed the limits temporarily,
* but we prefer facilitating memory reclaim and getting back
@@ -2647,7 +2638,13 @@ retry:
goto retry;
}
nomem:
- if (!(gfp_mask & __GFP_NOFAIL))
+ /*
+ * Memcg doesn't have a dedicated reserve for atomic
+ * allocations. But like the global atomic pool, we need to
+ * put the burden of reclaim on regular allocation requests
+ * and let these go through as privileged allocations.
+ */
+ if (!(gfp_mask & (__GFP_NOFAIL | __GFP_HIGH)))
return -ENOMEM;
force:
/*
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 038/227] selftests: memcg: test high limit for single entry allocation
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: roman.gushchin, mhocko, hannes, guro, chris, shakeelb, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Shakeel Butt <shakeelb@google.com>
Subject: selftests: memcg: test high limit for single entry allocation
Test the enforcement of the memory.high limit for a large amount of memory
allocated within a single kernel entry. There are valid use-cases where
the application can trigger a large amount of memory allocation within a
single syscall, e.g. mlock() or mmap(MAP_POPULATE). Make sure memory.high
limit enforcement works for such use-cases.
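The test added by this patch exercises the mlock() case. For the other case
named above, a hypothetical helper that faults in a whole anonymous mapping
inside the mmap() call itself could be sketched as below; the name
alloc_anon_populate is an assumption and is not part of the selftests.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for MAP_ANONYMOUS and MAP_POPULATE */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical counterpart to alloc_anon_mlock(): MAP_POPULATE asks the
 * kernel to fault in the pages before mmap() returns, so the whole
 * allocation happens within one kernel entry.
 * Returns 0 on success, -1 if the mapping could not be created.
 */
static int alloc_anon_populate(size_t size)
{
	void *buf;

	buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	munmap(buf, size);
	return 0;
}
```

Run inside a cgroup with memory.high set, a call such as
alloc_anon_populate(200 << 20) would be expected to trigger the same kind
of single-entry overcharge that the mlock() variant does.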
Link: https://lkml.kernel.org/r/20220211064917.2028469-4-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chris Down <chris@chrisdown.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
tools/testing/selftests/cgroup/cgroup_util.c | 15 ++
tools/testing/selftests/cgroup/cgroup_util.h | 1
tools/testing/selftests/cgroup/test_memcontrol.c | 78 +++++++++++++
3 files changed, 91 insertions(+), 3 deletions(-)
--- a/tools/testing/selftests/cgroup/cgroup_util.c~selftests-memcg-test-high-limit-for-single-entry-allocation
+++ a/tools/testing/selftests/cgroup/cgroup_util.c
@@ -583,7 +583,7 @@ int clone_into_cgroup_run_wait(const cha
return 0;
}
-int cg_prepare_for_wait(const char *cgroup)
+static int __prepare_for_wait(const char *cgroup, const char *filename)
{
int fd, ret = -1;
@@ -591,8 +591,7 @@ int cg_prepare_for_wait(const char *cgro
if (fd == -1)
return fd;
- ret = inotify_add_watch(fd, cg_control(cgroup, "cgroup.events"),
- IN_MODIFY);
+ ret = inotify_add_watch(fd, cg_control(cgroup, filename), IN_MODIFY);
if (ret == -1) {
close(fd);
fd = -1;
@@ -601,6 +600,16 @@ int cg_prepare_for_wait(const char *cgro
return fd;
}
+int cg_prepare_for_wait(const char *cgroup)
+{
+ return __prepare_for_wait(cgroup, "cgroup.events");
+}
+
+int memcg_prepare_for_wait(const char *cgroup)
+{
+ return __prepare_for_wait(cgroup, "memory.events");
+}
+
int cg_wait_for(int fd)
{
int ret = -1;
--- a/tools/testing/selftests/cgroup/cgroup_util.h~selftests-memcg-test-high-limit-for-single-entry-allocation
+++ a/tools/testing/selftests/cgroup/cgroup_util.h
@@ -55,4 +55,5 @@ extern int clone_reap(pid_t pid, int opt
extern int clone_into_cgroup_run_wait(const char *cgroup);
extern int dirfd_open_opath(const char *dir);
extern int cg_prepare_for_wait(const char *cgroup);
+extern int memcg_prepare_for_wait(const char *cgroup);
extern int cg_wait_for(int fd);
--- a/tools/testing/selftests/cgroup/test_memcontrol.c~selftests-memcg-test-high-limit-for-single-entry-allocation
+++ a/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -16,6 +16,7 @@
#include <netinet/in.h>
#include <netdb.h>
#include <errno.h>
+#include <sys/mman.h>
#include "../kselftest.h"
#include "cgroup_util.h"
@@ -628,6 +629,82 @@ cleanup:
return ret;
}
+static int alloc_anon_mlock(const char *cgroup, void *arg)
+{
+ size_t size = (size_t)arg;
+ void *buf;
+
+ buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON,
+ 0, 0);
+ if (buf == MAP_FAILED)
+ return -1;
+
+ mlock(buf, size);
+ munmap(buf, size);
+ return 0;
+}
+
+/*
+ * This test checks that memory.high is able to throttle big single shot
+ * allocation i.e. large allocation within one kernel entry.
+ */
+static int test_memcg_high_sync(const char *root)
+{
+ int ret = KSFT_FAIL, pid, fd = -1;
+ char *memcg;
+ long pre_high, pre_max;
+ long post_high, post_max;
+
+ memcg = cg_name(root, "memcg_test");
+ if (!memcg)
+ goto cleanup;
+
+ if (cg_create(memcg))
+ goto cleanup;
+
+ pre_high = cg_read_key_long(memcg, "memory.events", "high ");
+ pre_max = cg_read_key_long(memcg, "memory.events", "max ");
+ if (pre_high < 0 || pre_max < 0)
+ goto cleanup;
+
+ if (cg_write(memcg, "memory.swap.max", "0"))
+ goto cleanup;
+
+ if (cg_write(memcg, "memory.high", "30M"))
+ goto cleanup;
+
+ if (cg_write(memcg, "memory.max", "140M"))
+ goto cleanup;
+
+ fd = memcg_prepare_for_wait(memcg);
+ if (fd < 0)
+ goto cleanup;
+
+ pid = cg_run_nowait(memcg, alloc_anon_mlock, (void *)MB(200));
+ if (pid < 0)
+ goto cleanup;
+
+ cg_wait_for(fd);
+
+ post_high = cg_read_key_long(memcg, "memory.events", "high ");
+ post_max = cg_read_key_long(memcg, "memory.events", "max ");
+ if (post_high < 0 || post_max < 0)
+ goto cleanup;
+
+ if (pre_high == post_high || pre_max != post_max)
+ goto cleanup;
+
+ ret = KSFT_PASS;
+
+cleanup:
+ if (fd >= 0)
+ close(fd);
+ cg_destroy(memcg);
+ free(memcg);
+
+ return ret;
+}
+
/*
* This test checks that memory.max limits the amount of
* memory which can be consumed by either anonymous memory
@@ -1180,6 +1257,7 @@ struct memcg_test {
T(test_memcg_min),
T(test_memcg_low),
T(test_memcg_high),
+ T(test_memcg_high_sync),
T(test_memcg_max),
T(test_memcg_oom_events),
T(test_memcg_swap_max),
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 039/227] memcg: synchronously enforce memory.high for large overcharges
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: roman.gushchin, mhocko, hannes, guro, chris, shakeelb, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Shakeel Butt <shakeelb@google.com>
Subject: memcg: synchronously enforce memory.high for large overcharges
The high limit is used to throttle the workload without invoking the
oom-killer. Recently we tried to use the high limit to right-size our
internal workloads, more specifically by dynamically adjusting the limits of
the workload without letting the workload get oom-killed. However, due to
a limitation in the implementation of high limit enforcement, we
observed that the mechanism fails for some real workloads.
The high limit is enforced on return-to-userspace, i.e. the kernel lets the
usage go over the limit, and when execution returns to userspace, the
high reclaim is triggered and the process can get throttled as well.
However, this mechanism fails for workloads which do large allocations in a
single kernel entry, e.g. applications that mlock() a large chunk of
memory in a single syscall. Such applications bypass the high limit and
can trigger the oom-killer.
To make high limit enforcement more robust, this patch makes the limit
enforcement synchronous only if the accumulated overcharge becomes larger
than MEMCG_CHARGE_BATCH. So most allocations will still be
throttled on the return-to-userspace path, and only the extreme allocations
which accumulate a large amount of overcharge without returning to
userspace will be throttled synchronously. The value MEMCG_CHARGE_BATCH
is a bit arbitrary, but most other places in the memcg codebase use
this constant, so for now this patch uses the same one.
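The added condition can be modelled in userspace as a simple predicate.
This is only a sketch: the batch value of 32 is an assumption made for the
example, and the boolean parameters stand in for the PF_MEMALLOC test and
gfpflags_allow_blocking() used in the diff below.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for illustration; the kernel defines MEMCG_CHARGE_BATCH. */
#define CHARGE_BATCH_MODEL 32u

/*
 * Models the condition added at the end of try_charge_memcg(): reclaim
 * synchronously only when the task has built up a large overcharge, is
 * not itself in reclaim (PF_MEMALLOC), and is allowed to block.
 */
static bool should_handle_over_high_now(unsigned int nr_pages_over_high,
					bool task_is_memalloc,
					bool gfp_allows_blocking)
{
	return nr_pages_over_high > CHARGE_BATCH_MODEL &&
	       !task_is_memalloc && gfp_allows_blocking;
}
```

Note the strict comparison: an overcharge of exactly one batch still waits
for the return-to-userspace path.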
Link: https://lkml.kernel.org/r/20220211064917.2028469-5-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Chris Down <chris@chrisdown.name>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/mm/memcontrol.c~memcg-synchronously-enforce-memoryhigh-for-large-overcharges
+++ a/mm/memcontrol.c
@@ -2704,6 +2704,11 @@ done_restock:
}
} while ((memcg = parent_mem_cgroup(memcg)));
+ if (current->memcg_nr_pages_over_high > MEMCG_CHARGE_BATCH &&
+ !(current->flags & PF_MEMALLOC) &&
+ gfpflags_allow_blocking(gfp_mask)) {
+ mem_cgroup_handle_over_high();
+ }
return 0;
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 040/227] mm/memcontrol: return 1 from cgroup.memory __setup() handler
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: vdavydov.dev, roman.gushchin, mkoutny, mhocko, i.zhbanov, hannes,
rdunlap, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Randy Dunlap <rdunlap@infradead.org>
Subject: mm/memcontrol: return 1 from cgroup.memory __setup() handler
__setup() handlers should return 1 if the command line option is handled
and 0 if not (or maybe never return 0; it just pollutes init's
environment).
The only reason that this particular __setup handler does not pollute
init's environment is that the setup string contains a '.', as in
"cgroup.memory". This causes init/main.c::unknown_bootoption() to
consider it to be an "Unused module parameter" and ignore it. (This is
for parsing of loadable module parameters any time after kernel init.)
Otherwise the string "cgroup.memory=whatever" would be added to init's
environment strings.
Instead of relying on this '.' quirk, just return 1 to indicate that the
boot option has been handled.
Note that there is no warning message if someone enters:
cgroup.memory=anything_invalid
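The interaction between the handler's return value and the '.' quirk can
be sketched as a small userspace model. The function below is hypothetical
and heavily simplified; it only captures the two checks described above and
is not the code in init/main.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Simplified model: does a boot option end up being passed on to init?
 * handler_ret is what the matching __setup() handler returned.
 */
static bool passed_to_init(const char *param, int handler_ret)
{
	if (handler_ret)		/* handler claimed the option */
		return false;
	if (strchr(param, '.'))		/* treated as an unused module parameter */
		return false;
	return true;			/* falls through to init's environment/args */
}
```

In this model "cgroup.memory" is never passed on, whatever the handler
returns, while a dot-free option with a handler returning 0 would be, which
is the case the patch guards against by returning 1.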
Link: https://lkml.kernel.org/r/20220222005811.10672-1-rdunlap@infradead.org
Fixes: f7e1cb6ec51b0 ("mm: memcontrol: account socket memory in unified hierarchy memory controller")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memcontrol.c~mm-memcontrol-return-1-from-cgroupmemory-__setup-handler
+++ a/mm/memcontrol.c
@@ -7058,7 +7058,7 @@ static int __init cgroup_memory(char *s)
if (!strcmp(token, "nokmem"))
cgroup_memory_nokmem = true;
}
- return 0;
+ return 1;
}
__setup("cgroup.memory=", cgroup_memory);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 041/227] mm/memcg: revert ("mm/memcg: optimize user context object stock access")
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: vdavydov.dev, tglx, shakeelb, peterz, oliver.sang, mkoutny,
mhocko, longman, hannes, guro, bigeasy, mhocko, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 8789 bytes --]
From: Michal Hocko <mhocko@suse.com>
Subject: mm/memcg: revert ("mm/memcg: optimize user context object stock access")
Patch series "mm/memcg: Address PREEMPT_RT problems instead of disabling it", v5.
This series aims to address the memcg related problem on PREEMPT_RT.
I tested them on CONFIG_PREEMPT and CONFIG_PREEMPT_RT with the
tools/testing/selftests/cgroup/* tests and I haven't observed any
regressions (other than the lockdep report that is already there).
This patch (of 6):
The optimisation is based on a micro benchmark where local_irq_save() is
more expensive than a preempt_disable(). There is no evidence that it is
visible in a real-world workload and there are CPUs where the opposite is
true (local_irq_save() is cheaper than preempt_disable()).
Based on micro benchmarks, the optimisation makes sense on PREEMPT_NONE
where preempt_disable() is optimized away. There is no improvement with
PREEMPT_DYNAMIC since the preemption counter is always available.
The optimization also makes the PREEMPT_RT integration more complicated
since most of the assumptions are not true on PREEMPT_RT.
Revert the optimisation since it complicates the PREEMPT_RT integration
and the improvement is hardly visible.
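
The single-stock access pattern that the revert restores can be modelled
in user space roughly as follows. This is a sketch only: stock_model,
cpu_stock and the model_irq_save()/model_irq_restore() macros are
hypothetical stand-ins for the per-CPU memcg_stock and for
local_irq_save()/local_irq_restore(), which cannot be used outside the
kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel primitives; no-ops in user space. */
#define model_irq_save(flags)		do { (flags) = 1UL; } while (0)
#define model_irq_restore(flags)	do { (void)(flags); } while (0)

/* Only the two fields this sketch needs from the merged stock structure. */
struct stock_model {
	const void *cached_objcg;
	unsigned int nr_bytes;
};

/* Stands in for this CPU's memcg_stock. */
static struct stock_model cpu_stock;

/* One stock, one protection scheme, regardless of calling context. */
static bool consume_obj_stock_model(const void *objcg, unsigned int nr_bytes)
{
	unsigned long flags;
	bool ret = false;

	model_irq_save(flags);
	if (objcg == cpu_stock.cached_objcg && cpu_stock.nr_bytes >= nr_bytes) {
		cpu_stock.nr_bytes -= nr_bytes;
		ret = true;
	}
	model_irq_restore(flags);

	return ret;
}
```

Compared with the reverted task_obj/irq_obj split, there is no in_task()
branch here, so callers never have to decide which stock they are
allowed to touch.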
[bigeasy@linutronix.de: patch body around Michal's diff]
Link: https://lkml.kernel.org/r/20220226204144.1008339-1-bigeasy@linutronix.de
Link: https://lore.kernel.org/all/YgOGkXXCrD%2F1k+p4@dhcp22.suse.cz
Link: https://lkml.kernel.org/r/YdX+INO9gQje6d0S@linutronix.de
Link: https://lkml.kernel.org/r/20220226204144.1008339-2-bigeasy@linutronix.de
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Cc: kernel test robot <oliver.sang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 94 +++++++++++++---------------------------------
1 file changed, 27 insertions(+), 67 deletions(-)
--- a/mm/memcontrol.c~mm-memcg-revert-mm-memcg-optimize-user-context-object-stock-access
+++ a/mm/memcontrol.c
@@ -2078,23 +2078,17 @@ void unlock_page_memcg(struct page *page
folio_memcg_unlock(page_folio(page));
}
-struct obj_stock {
+struct memcg_stock_pcp {
+ struct mem_cgroup *cached; /* this never be root cgroup */
+ unsigned int nr_pages;
+
#ifdef CONFIG_MEMCG_KMEM
struct obj_cgroup *cached_objcg;
struct pglist_data *cached_pgdat;
unsigned int nr_bytes;
int nr_slab_reclaimable_b;
int nr_slab_unreclaimable_b;
-#else
- int dummy[0];
#endif
-};
-
-struct memcg_stock_pcp {
- struct mem_cgroup *cached; /* this never be root cgroup */
- unsigned int nr_pages;
- struct obj_stock task_obj;
- struct obj_stock irq_obj;
struct work_struct work;
unsigned long flags;
@@ -2104,13 +2098,13 @@ static DEFINE_PER_CPU(struct memcg_stock
static DEFINE_MUTEX(percpu_charge_mutex);
#ifdef CONFIG_MEMCG_KMEM
-static void drain_obj_stock(struct obj_stock *stock);
+static void drain_obj_stock(struct memcg_stock_pcp *stock);
static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
struct mem_cgroup *root_memcg);
static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages);
#else
-static inline void drain_obj_stock(struct obj_stock *stock)
+static inline void drain_obj_stock(struct memcg_stock_pcp *stock)
{
}
static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
@@ -2190,9 +2184,7 @@ static void drain_local_stock(struct wor
local_irq_save(flags);
stock = this_cpu_ptr(&memcg_stock);
- drain_obj_stock(&stock->irq_obj);
- if (in_task())
- drain_obj_stock(&stock->task_obj);
+ drain_obj_stock(stock);
drain_stock(stock);
clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
@@ -2768,41 +2760,6 @@ retry:
#define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | __GFP_ACCOUNT)
/*
- * Most kmem_cache_alloc() calls are from user context. The irq disable/enable
- * sequence used in this case to access content from object stock is slow.
- * To optimize for user context access, there are now two object stocks for
- * task context and interrupt context access respectively.
- *
- * The task context object stock can be accessed by disabling preemption only
- * which is cheap in non-preempt kernel. The interrupt context object stock
- * can only be accessed after disabling interrupt. User context code can
- * access interrupt object stock, but not vice versa.
- */
-static inline struct obj_stock *get_obj_stock(unsigned long *pflags)
-{
- struct memcg_stock_pcp *stock;
-
- if (likely(in_task())) {
- *pflags = 0UL;
- preempt_disable();
- stock = this_cpu_ptr(&memcg_stock);
- return &stock->task_obj;
- }
-
- local_irq_save(*pflags);
- stock = this_cpu_ptr(&memcg_stock);
- return &stock->irq_obj;
-}
-
-static inline void put_obj_stock(unsigned long flags)
-{
- if (likely(in_task()))
- preempt_enable();
- else
- local_irq_restore(flags);
-}
-
-/*
* mod_objcg_mlstate() may be called with irq enabled, so
* mod_memcg_lruvec_state() should be used.
*/
@@ -3082,10 +3039,13 @@ void __memcg_kmem_uncharge_page(struct p
void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
enum node_stat_item idx, int nr)
{
+ struct memcg_stock_pcp *stock;
unsigned long flags;
- struct obj_stock *stock = get_obj_stock(&flags);
int *bytes;
+ local_irq_save(flags);
+ stock = this_cpu_ptr(&memcg_stock);
+
/*
* Save vmstat data in stock and skip vmstat array update unless
* accumulating over a page of vmstat data or when pgdat or idx
@@ -3136,26 +3096,29 @@ void mod_objcg_state(struct obj_cgroup *
if (nr)
mod_objcg_mlstate(objcg, pgdat, idx, nr);
- put_obj_stock(flags);
+ local_irq_restore(flags);
}
static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
{
+ struct memcg_stock_pcp *stock;
unsigned long flags;
- struct obj_stock *stock = get_obj_stock(&flags);
bool ret = false;
+ local_irq_save(flags);
+
+ stock = this_cpu_ptr(&memcg_stock);
if (objcg == stock->cached_objcg && stock->nr_bytes >= nr_bytes) {
stock->nr_bytes -= nr_bytes;
ret = true;
}
- put_obj_stock(flags);
+ local_irq_restore(flags);
return ret;
}
-static void drain_obj_stock(struct obj_stock *stock)
+static void drain_obj_stock(struct memcg_stock_pcp *stock)
{
struct obj_cgroup *old = stock->cached_objcg;
@@ -3211,13 +3174,8 @@ static bool obj_stock_flush_required(str
{
struct mem_cgroup *memcg;
- if (in_task() && stock->task_obj.cached_objcg) {
- memcg = obj_cgroup_memcg(stock->task_obj.cached_objcg);
- if (memcg && mem_cgroup_is_descendant(memcg, root_memcg))
- return true;
- }
- if (stock->irq_obj.cached_objcg) {
- memcg = obj_cgroup_memcg(stock->irq_obj.cached_objcg);
+ if (stock->cached_objcg) {
+ memcg = obj_cgroup_memcg(stock->cached_objcg);
if (memcg && mem_cgroup_is_descendant(memcg, root_memcg))
return true;
}
@@ -3228,10 +3186,13 @@ static bool obj_stock_flush_required(str
static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
bool allow_uncharge)
{
+ struct memcg_stock_pcp *stock;
unsigned long flags;
- struct obj_stock *stock = get_obj_stock(&flags);
unsigned int nr_pages = 0;
+ local_irq_save(flags);
+
+ stock = this_cpu_ptr(&memcg_stock);
if (stock->cached_objcg != objcg) { /* reset if necessary */
drain_obj_stock(stock);
obj_cgroup_get(objcg);
@@ -3247,7 +3208,7 @@ static void refill_obj_stock(struct obj_
stock->nr_bytes &= (PAGE_SIZE - 1);
}
- put_obj_stock(flags);
+ local_irq_restore(flags);
if (nr_pages)
obj_cgroup_uncharge_pages(objcg, nr_pages);
@@ -6826,7 +6787,6 @@ static void uncharge_folio(struct folio
long nr_pages;
struct mem_cgroup *memcg;
struct obj_cgroup *objcg;
- bool use_objcg = folio_memcg_kmem(folio);
VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
@@ -6835,7 +6795,7 @@ static void uncharge_folio(struct folio
* folio memcg or objcg at this point, we have fully
* exclusive access to the folio.
*/
- if (use_objcg) {
+ if (folio_memcg_kmem(folio)) {
objcg = __folio_objcg(folio);
/*
* This get matches the put at the end of the function and
@@ -6863,7 +6823,7 @@ static void uncharge_folio(struct folio
nr_pages = folio_nr_pages(folio);
- if (use_objcg) {
+ if (folio_memcg_kmem(folio)) {
ug->nr_memory += nr_pages;
ug->nr_kmem += nr_pages;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 042/227] mm/memcg: disable threshold event handlers on PREEMPT_RT
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: vdavydov.dev, tglx, shakeelb, peterz, oliver.sang, mkoutny,
mhocko, mhocko, longman, hannes, guro, bigeasy, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 3978 bytes --]
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: mm/memcg: disable threshold event handlers on PREEMPT_RT
During the integration of PREEMPT_RT support, the code flow around
memcg_check_events() resulted in `twisted code'. Moving the code around
and avoiding that would then lead to an additional local-irq-save section
within memcg_check_events(). While looking better, it adds a
local-irq-save section to a code flow which is usually within a
local-irq-off block on non-PREEMPT_RT configurations.
The threshold event handler is a deprecated memcg v1 feature. Instead of
trying to get it to work under PREEMPT_RT just disable it. There should
be no users on PREEMPT_RT. From that perspective it makes even less sense
to get it to work under PREEMPT_RT while having zero users.
Make memory.soft_limit_in_bytes and cgroup.event_control return
-EOPNOTSUPP on PREEMPT_RT. Make memcg_check_events() a no-op and make
memcg_write_event_control() return -EOPNOTSUPP on PREEMPT_RT.
Document that the two knobs are disabled on PREEMPT_RT.
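
The behaviour of the soft-limit knob described above can be sketched in
user space as below. The names memcg_model and write_soft_limit_model()
are hypothetical, and the rt parameter stands in for
IS_ENABLED(CONFIG_PREEMPT_RT), which in the kernel is a compile-time
constant rather than a runtime argument.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one field of struct mem_cgroup used here. */
struct memcg_model {
	unsigned long soft_limit;
};

/*
 * Returns the number of bytes consumed on success or a negative errno.
 * rt models IS_ENABLED(CONFIG_PREEMPT_RT).
 */
static long write_soft_limit_model(struct memcg_model *memcg, bool rt,
				   unsigned long nr_pages, size_t nbytes)
{
	int ret;

	if (rt) {
		/* Knob disabled: leave the stored limit untouched. */
		ret = -EOPNOTSUPP;
	} else {
		memcg->soft_limit = nr_pages;
		ret = 0;
	}

	/* Same result as "ret ?: nbytes", written without the GNU extension. */
	return ret ? ret : (long)nbytes;
}
```

A writer to the knob would therefore see the write fail with EOPNOTSUPP
on PREEMPT_RT instead of having the value silently accepted.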
Link: https://lkml.kernel.org/r/20220226204144.1008339-3-bigeasy@linutronix.de
Suggested-by: Michal Hocko <mhocko@kernel.org>
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: kernel test robot <oliver.sang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/admin-guide/cgroup-v1/memory.rst | 2 ++
mm/memcontrol.c | 14 ++++++++++++--
2 files changed, 14 insertions(+), 2 deletions(-)
--- a/Documentation/admin-guide/cgroup-v1/memory.rst~mm-memcg-disable-threshold-event-handlers-on-preempt_rt
+++ a/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -64,6 +64,7 @@ Brief summary of control files.
threads
cgroup.procs show list of processes
cgroup.event_control an interface for event_fd()
+ This knob is not available on CONFIG_PREEMPT_RT systems.
memory.usage_in_bytes show current usage for memory
(See 5.5 for details)
memory.memsw.usage_in_bytes show current usage for memory+Swap
@@ -75,6 +76,7 @@ Brief summary of control files.
memory.max_usage_in_bytes show max memory usage recorded
memory.memsw.max_usage_in_bytes show max memory+Swap usage recorded
memory.soft_limit_in_bytes set/show soft limit of memory usage
+ This knob is not available on CONFIG_PREEMPT_RT systems.
memory.stat show various statistics
memory.use_hierarchy set/show hierarchical account enabled
This knob is deprecated and shouldn't be
--- a/mm/memcontrol.c~mm-memcg-disable-threshold-event-handlers-on-preempt_rt
+++ a/mm/memcontrol.c
@@ -858,6 +858,9 @@ static bool mem_cgroup_event_ratelimit(s
*/
static void memcg_check_events(struct mem_cgroup *memcg, int nid)
{
+ if (IS_ENABLED(CONFIG_PREEMPT_RT))
+ return;
+
/* threshold event is triggered in finer grain than soft limit */
if (unlikely(mem_cgroup_event_ratelimit(memcg,
MEM_CGROUP_TARGET_THRESH))) {
@@ -3731,8 +3734,12 @@ static ssize_t mem_cgroup_write(struct k
}
break;
case RES_SOFT_LIMIT:
- memcg->soft_limit = nr_pages;
- ret = 0;
+ if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+ ret = -EOPNOTSUPP;
+ } else {
+ memcg->soft_limit = nr_pages;
+ ret = 0;
+ }
break;
}
return ret ?: nbytes;
@@ -4708,6 +4715,9 @@ static ssize_t memcg_write_event_control
char *endp;
int ret;
+ if (IS_ENABLED(CONFIG_PREEMPT_RT))
+ return -EOPNOTSUPP;
+
buf = strstrip(buf);
efd = simple_strtoul(buf, &endp, 10);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 043/227] mm/memcg: protect per-CPU counter by disabling preemption on PREEMPT_RT where needed.
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: vdavydov.dev, tglx, shakeelb, peterz, oliver.sang, mkoutny,
mhocko, mhocko, longman, hannes, guro, bigeasy, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 4814 bytes --]
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: mm/memcg: protect per-CPU counter by disabling preemption on PREEMPT_RT where needed.
The per-CPU counters are modified with the non-atomic modifier. The
consistency is ensured by disabling interrupts for the update. On
non-PREEMPT_RT configurations this works because acquiring a spinlock_t
typed lock with the _irq() suffix disables interrupts. On PREEMPT_RT
configurations the RMW operation can be interrupted.
Another problem is that mem_cgroup_swapout() expects to be invoked with
disabled interrupts because the caller has to acquire a spinlock_t which
is acquired with disabled interrupts. Since spinlock_t never disables
interrupts on PREEMPT_RT the interrupts are never disabled at this point.
The code is never called from in_irq() context on PREEMPT_RT therefore
disabling preemption during the update is sufficient on PREEMPT_RT. The
sections which explicitly disable interrupts can remain on PREEMPT_RT
because the sections remain short and they don't involve sleeping locks
(memcg_check_events() is doing nothing on PREEMPT_RT).
Disable preemption during update of the per-CPU variables which do not
explicitly disable interrupts.
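
Why an unprotected read-modify-write loses updates can be shown with a
deterministic user-space model. Everything here is hypothetical
(counter, section_locked, pending, competing_update(), rmw_increment());
the "lock" only models the effect of preempt_disable() keeping a
competing updater off the CPU until the section ends.

```c
#include <stdbool.h>

static long counter;		/* stands in for one CPU's counter */
static bool section_locked;	/* models the competing updater being held off */
static int pending;		/* updates deferred while the section is locked */

/* A competing updater that wants to run on the same CPU. */
static void competing_update(void)
{
	if (section_locked)
		pending++;	/* has to wait until the section ends */
	else
		counter++;
}

static void model_unlock(void)
{
	section_locked = false;
	counter += pending;	/* deferred updates run now */
	pending = 0;
}

/* A non-atomic increment split into load and store, with the worst timing. */
static void rmw_increment(bool protect)
{
	long tmp;

	if (protect)
		section_locked = true;

	tmp = counter;		/* read */
	competing_update();	/* competitor arrives between read and write */
	counter = tmp + 1;	/* modify and write back */

	if (protect)
		model_unlock();
}
```

Starting from 5, the unprotected variant ends at 6 because the
competitor's increment is overwritten, while the protected variant ends
at 7.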
Link: https://lkml.kernel.org/r/20220226204144.1008339-4-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: kernel test robot <oliver.sang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 56 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 55 insertions(+), 1 deletion(-)
--- a/mm/memcontrol.c~mm-memcg-protect-per-cpu-counter-by-disabling-preemption-on-preempt_rt-where-needed
+++ a/mm/memcontrol.c
@@ -629,6 +629,35 @@ static DEFINE_SPINLOCK(stats_flush_lock)
static DEFINE_PER_CPU(unsigned int, stats_updates);
static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
+/*
+ * Accessors to ensure that preemption is disabled on PREEMPT_RT because it can
+ * not rely on this as part of an acquired spinlock_t lock. These functions are
+ * never used in hardirq context on PREEMPT_RT and therefore disabling preemption
+ * is sufficient.
+ */
+static void memcg_stats_lock(void)
+{
+#ifdef CONFIG_PREEMPT_RT
+ preempt_disable();
+#else
+ VM_BUG_ON(!irqs_disabled());
+#endif
+}
+
+static void __memcg_stats_lock(void)
+{
+#ifdef CONFIG_PREEMPT_RT
+ preempt_disable();
+#endif
+}
+
+static void memcg_stats_unlock(void)
+{
+#ifdef CONFIG_PREEMPT_RT
+ preempt_enable();
+#endif
+}
+
static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
{
unsigned int x;
@@ -705,6 +734,27 @@ void __mod_memcg_lruvec_state(struct lru
pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
memcg = pn->memcg;
+ /*
+ * The callers from rmap rely on disabled preemption because they never
+ * update their counters from in-interrupt context. For these
+ * counters we check that the update is never performed from an
+ * interrupt context while other callers need to have interrupts disabled.
+ */
+ __memcg_stats_lock();
+ if (IS_ENABLED(CONFIG_DEBUG_VM) && !IS_ENABLED(CONFIG_PREEMPT_RT)) {
+ switch (idx) {
+ case NR_ANON_MAPPED:
+ case NR_FILE_MAPPED:
+ case NR_ANON_THPS:
+ case NR_SHMEM_PMDMAPPED:
+ case NR_FILE_PMDMAPPED:
+ WARN_ON_ONCE(!in_task());
+ break;
+ default:
+ WARN_ON_ONCE(!irqs_disabled());
+ }
+ }
+
/* Update memcg */
__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
@@ -712,6 +762,7 @@ void __mod_memcg_lruvec_state(struct lru
__this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
memcg_rstat_updated(memcg, val);
+ memcg_stats_unlock();
}
/**
@@ -794,8 +845,10 @@ void __count_memcg_events(struct mem_cgr
if (mem_cgroup_disabled())
return;
+ memcg_stats_lock();
__this_cpu_add(memcg->vmstats_percpu->events[idx], count);
memcg_rstat_updated(memcg, count);
+ memcg_stats_unlock();
}
static unsigned long memcg_events(struct mem_cgroup *memcg, int event)
@@ -7154,8 +7207,9 @@ void mem_cgroup_swapout(struct page *pag
* important here to have the interrupts disabled because it is the
* only synchronisation we have for updating the per-CPU variables.
*/
- VM_BUG_ON(!irqs_disabled());
+ memcg_stats_lock();
mem_cgroup_charge_statistics(memcg, -nr_entries);
+ memcg_stats_unlock();
memcg_check_events(memcg, page_to_nid(page));
css_put(&memcg->css);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 044/227] mm/memcg: opencode the inner part of obj_cgroup_uncharge_pages() in drain_obj_stock()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: vdavydov.dev, tglx, shakeelb, peterz, oliver.sang, mkoutny,
mhocko, mhocko, longman, guro, bigeasy, hannes, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm/memcg: opencode the inner part of obj_cgroup_uncharge_pages() in drain_obj_stock()
Provide the inner part of refill_stock() as __refill_stock() without
disabling interrupts. This eases the integration of local_lock_t where
recursive locking must be avoided. Open code obj_cgroup_uncharge_pages()
in drain_obj_stock() and use __refill_stock(). The caller of
drain_obj_stock() already disables interrupts.
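For readers unfamiliar with the inner/outer split, here is a minimal userspace sketch of the same shape. It is illustrative only: the model_* names, the batch limit of 64 and the boolean that stands in for the local IRQ state are assumptions made for this sketch, not kernel identifiers.

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_CHARGE_BATCH 64U      /* hypothetical batch limit */

static unsigned int model_stock_pages; /* cached charges on "this CPU" */
static unsigned int model_drained;     /* pages handed back so far */
static bool model_irqs_off;            /* stands in for the IRQ state */

/* Inner part: the caller must already have "interrupts" disabled. */
static void model_refill_stock_inner(unsigned int nr_pages)
{
	assert(model_irqs_off);
	model_stock_pages += nr_pages;
	if (model_stock_pages > MODEL_CHARGE_BATCH) {
		model_drained += model_stock_pages;
		model_stock_pages = 0;
	}
}

/* Outer part: saves, disables and restores the state around the inner part. */
static void model_refill_stock(unsigned int nr_pages)
{
	bool flags = model_irqs_off;    /* models local_irq_save() */

	model_irqs_off = true;
	model_refill_stock_inner(nr_pages);
	model_irqs_off = flags;         /* models local_irq_restore() */
}

/* Models a caller that already runs with "interrupts" disabled. */
static void model_drain_obj_stock(unsigned int nr_pages)
{
	assert(model_irqs_off);
	if (nr_pages)
		model_refill_stock_inner(nr_pages);
}
```

The point of the split is that a caller which already holds the protection can use the inner helper directly, so a later conversion of the outer helper to a non-recursive lock does not make that caller acquire it twice.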
[bigeasy@linutronix.de: patch body around Johannes' diff]
Link: https://lkml.kernel.org/r/20220226204144.1008339-5-bigeasy@linutronix.de
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: kernel test robot <oliver.sang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
--- a/mm/memcontrol.c~mm-memcg-opencode-the-inner-part-of-obj_cgroup_uncharge_pages-in-drain_obj_stock
+++ a/mm/memcontrol.c
@@ -2251,12 +2251,9 @@ static void drain_local_stock(struct wor
* Cache charges(val) to local per_cpu area.
* This will be consumed by consume_stock() function, later.
*/
-static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
{
struct memcg_stock_pcp *stock;
- unsigned long flags;
-
- local_irq_save(flags);
stock = this_cpu_ptr(&memcg_stock);
if (stock->cached != memcg) { /* reset if necessary */
@@ -2268,7 +2265,14 @@ static void refill_stock(struct mem_cgro
if (stock->nr_pages > MEMCG_CHARGE_BATCH)
drain_stock(stock);
+}
+
+static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+ unsigned long flags;
+ local_irq_save(flags);
+ __refill_stock(memcg, nr_pages);
local_irq_restore(flags);
}
@@ -3185,8 +3189,16 @@ static void drain_obj_stock(struct memcg
unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1);
- if (nr_pages)
- obj_cgroup_uncharge_pages(old, nr_pages);
+ if (nr_pages) {
+ struct mem_cgroup *memcg;
+
+ memcg = get_mem_cgroup_from_objcg(old);
+
+ memcg_account_kmem(memcg, -nr_pages);
+ __refill_stock(memcg, nr_pages);
+
+ css_put(&memcg->css);
+ }
/*
* The leftover is flushed to the centralized per-memcg value.
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 045/227] mm/memcg: protect memcg_stock with a local_lock_t
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: vdavydov.dev, tglx, shakeelb, roman.gushchin, peterz,
oliver.sang, mkoutny, mhocko, longman, hannes, bigeasy, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: mm/memcg: protect memcg_stock with a local_lock_t
The members of the per-CPU structure memcg_stock_pcp are protected by
disabling interrupts. This does not work on PREEMPT_RT because it
creates an atomic context in which actions are performed which require
a preemptible context. One example is obj_cgroup_release().
The IRQ-disable sections can be replaced with local_lock_t, which preserves
the explicit disabling of interrupts while keeping the code preemptible on
PREEMPT_RT.
drain_obj_stock() drops a reference on obj_cgroup which leads to an
invocation of obj_cgroup_release() if it is the last reference. This in
turn leads to recursive locking of the local_lock_t. To avoid this,
obj_cgroup_release() is invoked outside of the locked section.
obj_cgroup_uncharge_pages() can be invoked with the local_lock_t acquired
and without it. This will lead later to a recursion in refill_stock().
To avoid the locking recursion provide obj_cgroup_uncharge_pages_locked()
which uses the locked version of refill_stock().
- Replace disabling interrupts for memcg_stock with a local_lock_t.
- Let drain_obj_stock() return the old struct obj_cgroup which is passed
to obj_cgroup_put() outside of the locked section.
- Provide obj_cgroup_uncharge_pages_locked() which uses the locked
version of refill_stock() to avoid recursive locking in
drain_obj_stock().
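The "return the old object and drop it after unlocking" idea from the list above can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the patch itself: the model_* names are hypothetical, and a boolean with an assert() stands in for a non-recursive local lock.

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_objcg {
	int refcount;
	bool released;
};

static bool model_lock_held;             /* models a non-recursive lock */
static struct model_objcg *model_cached; /* models the cached object */

static void model_lock(void)
{
	assert(!model_lock_held);   /* taking it twice would be a bug */
	model_lock_held = true;
}

static void model_unlock(void)
{
	assert(model_lock_held);
	model_lock_held = false;
}

/* Dropping the last reference runs a release path that takes the lock. */
static void model_objcg_put(struct model_objcg *objcg)
{
	if (--objcg->refcount == 0) {
		model_lock();
		objcg->released = true;
		model_unlock();
	}
}

/* Called with the lock held: detach the cached object and hand it back. */
static struct model_objcg *model_drain_obj_stock(void)
{
	struct model_objcg *old = model_cached;

	model_cached = NULL;
	return old;
}

/* Replace the cached object; the old one is put only after unlocking. */
static void model_switch_cached(struct model_objcg *objcg)
{
	struct model_objcg *old;

	model_lock();
	old = model_drain_obj_stock();
	objcg->refcount++;
	model_cached = objcg;
	model_unlock();

	if (old)
		model_objcg_put(old);   /* safe: the lock is no longer held */
}
```

Had model_objcg_put() been called inside the locked section, the assert() in model_lock() would fire on the final put, which is the recursion the changelog wants to avoid.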
Link: https://lkml.kernel.org/r/20220209014709.GA26885@xsang-OptiPlex-9020
Link: https://lkml.kernel.org/r/20220226204144.1008339-6-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: kernel test robot <oliver.sang@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 59 +++++++++++++++++++++++++++++-----------------
1 file changed, 38 insertions(+), 21 deletions(-)
--- a/mm/memcontrol.c~mm-memcg-protect-memcg_stock-with-a-local_lock_t
+++ a/mm/memcontrol.c
@@ -2135,6 +2135,7 @@ void unlock_page_memcg(struct page *page
}
struct memcg_stock_pcp {
+ local_lock_t stock_lock;
struct mem_cgroup *cached; /* this never be root cgroup */
unsigned int nr_pages;
@@ -2150,18 +2151,21 @@ struct memcg_stock_pcp {
unsigned long flags;
#define FLUSHING_CACHED_CHARGE 0
};
-static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
+static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock) = {
+ .stock_lock = INIT_LOCAL_LOCK(stock_lock),
+};
static DEFINE_MUTEX(percpu_charge_mutex);
#ifdef CONFIG_MEMCG_KMEM
-static void drain_obj_stock(struct memcg_stock_pcp *stock);
+static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock);
static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
struct mem_cgroup *root_memcg);
static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages);
#else
-static inline void drain_obj_stock(struct memcg_stock_pcp *stock)
+static inline struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
{
+ return NULL;
}
static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
struct mem_cgroup *root_memcg)
@@ -2193,7 +2197,7 @@ static bool consume_stock(struct mem_cgr
if (nr_pages > MEMCG_CHARGE_BATCH)
return ret;
- local_irq_save(flags);
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
stock = this_cpu_ptr(&memcg_stock);
if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
@@ -2201,7 +2205,7 @@ static bool consume_stock(struct mem_cgr
ret = true;
}
- local_irq_restore(flags);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
return ret;
}
@@ -2230,6 +2234,7 @@ static void drain_stock(struct memcg_sto
static void drain_local_stock(struct work_struct *dummy)
{
struct memcg_stock_pcp *stock;
+ struct obj_cgroup *old = NULL;
unsigned long flags;
/*
@@ -2237,14 +2242,16 @@ static void drain_local_stock(struct wor
* drain_stock races is that we always operate on local CPU stock
* here with IRQ disabled
*/
- local_irq_save(flags);
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
stock = this_cpu_ptr(&memcg_stock);
- drain_obj_stock(stock);
+ old = drain_obj_stock(stock);
drain_stock(stock);
clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
- local_irq_restore(flags);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+ if (old)
+ obj_cgroup_put(old);
}
/*
@@ -2271,9 +2278,9 @@ static void refill_stock(struct mem_cgro
{
unsigned long flags;
- local_irq_save(flags);
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
__refill_stock(memcg, nr_pages);
- local_irq_restore(flags);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
}
/*
@@ -3100,10 +3107,11 @@ void mod_objcg_state(struct obj_cgroup *
enum node_stat_item idx, int nr)
{
struct memcg_stock_pcp *stock;
+ struct obj_cgroup *old = NULL;
unsigned long flags;
int *bytes;
- local_irq_save(flags);
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
stock = this_cpu_ptr(&memcg_stock);
/*
@@ -3112,7 +3120,7 @@ void mod_objcg_state(struct obj_cgroup *
* changes.
*/
if (stock->cached_objcg != objcg) {
- drain_obj_stock(stock);
+ old = drain_obj_stock(stock);
obj_cgroup_get(objcg);
stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
@@ -3156,7 +3164,9 @@ void mod_objcg_state(struct obj_cgroup *
if (nr)
mod_objcg_mlstate(objcg, pgdat, idx, nr);
- local_irq_restore(flags);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+ if (old)
+ obj_cgroup_put(old);
}
static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -3165,7 +3175,7 @@ static bool consume_obj_stock(struct obj
unsigned long flags;
bool ret = false;
- local_irq_save(flags);
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
stock = this_cpu_ptr(&memcg_stock);
if (objcg == stock->cached_objcg && stock->nr_bytes >= nr_bytes) {
@@ -3173,17 +3183,17 @@ static bool consume_obj_stock(struct obj
ret = true;
}
- local_irq_restore(flags);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
return ret;
}
-static void drain_obj_stock(struct memcg_stock_pcp *stock)
+static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
{
struct obj_cgroup *old = stock->cached_objcg;
if (!old)
- return;
+ return NULL;
if (stock->nr_bytes) {
unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
@@ -3233,8 +3243,12 @@ static void drain_obj_stock(struct memcg
stock->cached_pgdat = NULL;
}
- obj_cgroup_put(old);
stock->cached_objcg = NULL;
+ /*
+ * The `old' object needs to be released by the caller via
+ * obj_cgroup_put() outside of memcg_stock_pcp::stock_lock.
+ */
+ return old;
}
static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
@@ -3255,14 +3269,15 @@ static void refill_obj_stock(struct obj_
bool allow_uncharge)
{
struct memcg_stock_pcp *stock;
+ struct obj_cgroup *old = NULL;
unsigned long flags;
unsigned int nr_pages = 0;
- local_irq_save(flags);
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
stock = this_cpu_ptr(&memcg_stock);
if (stock->cached_objcg != objcg) { /* reset if necessary */
- drain_obj_stock(stock);
+ old = drain_obj_stock(stock);
obj_cgroup_get(objcg);
stock->cached_objcg = objcg;
stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
@@ -3276,7 +3291,9 @@ static void refill_obj_stock(struct obj_
stock->nr_bytes &= (PAGE_SIZE - 1);
}
- local_irq_restore(flags);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+ if (old)
+ obj_cgroup_put(old);
if (nr_pages)
obj_cgroup_uncharge_pages(objcg, nr_pages);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 045/227] mm/memcg: protect memcg_stock with a local_lock_t
@ 2022-03-22 21:40 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: vdavydov.dev, tglx, shakeelb, roman.gushchin, peterz,
oliver.sang, mkoutny, mhocko, longman, hannes, bigeasy, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 8126 bytes --]
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: mm/memcg: protect memcg_stock with a local_lock_t
The members of the per-CPU structure memcg_stock_pcp are protected by
disabling interrupts. This is not working on PREEMPT_RT because it
creates atomic context in which actions are performed which require
preemptible context. One example is obj_cgroup_release().
The IRQ-disable sections can be replaced with local_lock_t which preserves
the explicit disabling of interrupts while keeps the code preemptible on
PREEMPT_RT.
drain_obj_stock() drops a reference on obj_cgroup which leads to an
invocat= ion of obj_cgroup_release() if it is the last object. This in
turn leads to recursive locking of the local_lock_t. To avoid this,
obj_cgroup_release() = is invoked outside of the locked section.
obj_cgroup_uncharge_pages() can be invoked with the local_lock_t acquired
a= nd without it. This will lead later to a recursion in refill_stock().
To avoid the locking recursion provide obj_cgroup_uncharge_pages_locked()
which uses the locked version of refill_stock().
- Replace disabling interrupts for memcg_stock with a local_lock_t.
- Let drain_obj_stock() return the old struct obj_cgroup which is passed
to obj_cgroup_put() outside of the locked section.
- Provide obj_cgroup_uncharge_pages_locked() which uses the locked
version of refill_stock() to avoid recursive locking in
drain_obj_stock().
Link: https://lkml.kernel.org/r/20220209014709.GA26885@xsang-OptiPlex-9020
Link: https://lkml.kernel.org/r/20220226204144.1008339-6-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: kernel test robot <oliver.sang@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 59 +++++++++++++++++++++++++++++-----------------
1 file changed, 38 insertions(+), 21 deletions(-)
--- a/mm/memcontrol.c~mm-memcg-protect-memcg_stock-with-a-local_lock_t
+++ a/mm/memcontrol.c
@@ -2135,6 +2135,7 @@ void unlock_page_memcg(struct page *page
}
struct memcg_stock_pcp {
+ local_lock_t stock_lock;
struct mem_cgroup *cached; /* this never be root cgroup */
unsigned int nr_pages;
@@ -2150,18 +2151,21 @@ struct memcg_stock_pcp {
unsigned long flags;
#define FLUSHING_CACHED_CHARGE 0
};
-static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
+static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock) = {
+ .stock_lock = INIT_LOCAL_LOCK(stock_lock),
+};
static DEFINE_MUTEX(percpu_charge_mutex);
#ifdef CONFIG_MEMCG_KMEM
-static void drain_obj_stock(struct memcg_stock_pcp *stock);
+static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock);
static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
struct mem_cgroup *root_memcg);
static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages);
#else
-static inline void drain_obj_stock(struct memcg_stock_pcp *stock)
+static inline struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
{
+ return NULL;
}
static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
struct mem_cgroup *root_memcg)
@@ -2193,7 +2197,7 @@ static bool consume_stock(struct mem_cgr
if (nr_pages > MEMCG_CHARGE_BATCH)
return ret;
- local_irq_save(flags);
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
stock = this_cpu_ptr(&memcg_stock);
if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
@@ -2201,7 +2205,7 @@ static bool consume_stock(struct mem_cgr
ret = true;
}
- local_irq_restore(flags);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
return ret;
}
@@ -2230,6 +2234,7 @@ static void drain_stock(struct memcg_sto
static void drain_local_stock(struct work_struct *dummy)
{
struct memcg_stock_pcp *stock;
+ struct obj_cgroup *old = NULL;
unsigned long flags;
/*
@@ -2237,14 +2242,16 @@ static void drain_local_stock(struct wor
* drain_stock races is that we always operate on local CPU stock
* here with IRQ disabled
*/
- local_irq_save(flags);
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
stock = this_cpu_ptr(&memcg_stock);
- drain_obj_stock(stock);
+ old = drain_obj_stock(stock);
drain_stock(stock);
clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
- local_irq_restore(flags);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+ if (old)
+ obj_cgroup_put(old);
}
/*
@@ -2271,9 +2278,9 @@ static void refill_stock(struct mem_cgro
{
unsigned long flags;
- local_irq_save(flags);
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
__refill_stock(memcg, nr_pages);
- local_irq_restore(flags);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
}
/*
@@ -3100,10 +3107,11 @@ void mod_objcg_state(struct obj_cgroup *
enum node_stat_item idx, int nr)
{
struct memcg_stock_pcp *stock;
+ struct obj_cgroup *old = NULL;
unsigned long flags;
int *bytes;
- local_irq_save(flags);
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
stock = this_cpu_ptr(&memcg_stock);
/*
@@ -3112,7 +3120,7 @@ void mod_objcg_state(struct obj_cgroup *
* changes.
*/
if (stock->cached_objcg != objcg) {
- drain_obj_stock(stock);
+ old = drain_obj_stock(stock);
obj_cgroup_get(objcg);
stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
@@ -3156,7 +3164,9 @@ void mod_objcg_state(struct obj_cgroup *
if (nr)
mod_objcg_mlstate(objcg, pgdat, idx, nr);
- local_irq_restore(flags);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+ if (old)
+ obj_cgroup_put(old);
}
static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -3165,7 +3175,7 @@ static bool consume_obj_stock(struct obj
unsigned long flags;
bool ret = false;
- local_irq_save(flags);
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
stock = this_cpu_ptr(&memcg_stock);
if (objcg == stock->cached_objcg && stock->nr_bytes >= nr_bytes) {
@@ -3173,17 +3183,17 @@ static bool consume_obj_stock(struct obj
ret = true;
}
- local_irq_restore(flags);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
return ret;
}
-static void drain_obj_stock(struct memcg_stock_pcp *stock)
+static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
{
struct obj_cgroup *old = stock->cached_objcg;
if (!old)
- return;
+ return NULL;
if (stock->nr_bytes) {
unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
@@ -3233,8 +3243,12 @@ static void drain_obj_stock(struct memcg
stock->cached_pgdat = NULL;
}
- obj_cgroup_put(old);
stock->cached_objcg = NULL;
+ /*
+ * The `old' objects needs to be released by the caller via
+ * obj_cgroup_put() outside of memcg_stock_pcp::stock_lock.
+ */
+ return old;
}
static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
@@ -3255,14 +3269,15 @@ static void refill_obj_stock(struct obj_
bool allow_uncharge)
{
struct memcg_stock_pcp *stock;
+ struct obj_cgroup *old = NULL;
unsigned long flags;
unsigned int nr_pages = 0;
- local_irq_save(flags);
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
stock = this_cpu_ptr(&memcg_stock);
if (stock->cached_objcg != objcg) { /* reset if necessary */
- drain_obj_stock(stock);
+ old = drain_obj_stock(stock);
obj_cgroup_get(objcg);
stock->cached_objcg = objcg;
stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
@@ -3276,7 +3291,9 @@ static void refill_obj_stock(struct obj_
stock->nr_bytes &= (PAGE_SIZE - 1);
}
- local_irq_restore(flags);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+ if (old)
+ obj_cgroup_put(old);
if (nr_pages)
obj_cgroup_uncharge_pages(objcg, nr_pages);
_
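The comment added to drain_obj_stock() says the old object must be put by the caller outside of stock_lock. Below is a minimal userspace sketch of that "detach under the lock, release after unlocking" shape. Everything in it is a hypothetical stand-in: struct obj models a reference-counted obj_cgroup, the locked flag models memcg_stock.stock_lock, and drain()/refill()/obj_put() only mirror the structure of the kernel functions, not their real behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct obj_cgroup: just a reference count. */
struct obj {
	int refs;
};

/* Hypothetical stand-in for the per-CPU stock and its local lock. */
struct stock {
	bool locked;		/* models memcg_stock.stock_lock */
	struct obj *cached;	/* models stock->cached_objcg */
};

/* Counts releases that happened while the lock was held (should stay 0). */
static int puts_under_lock;

static void obj_put(struct stock *s, struct obj *o)
{
	if (s->locked)
		puts_under_lock++;
	o->refs--;
}

/* Detach the cached object; the caller releases it after unlocking. */
static struct obj *drain(struct stock *s)
{
	struct obj *old = s->cached;

	assert(s->locked);
	s->cached = NULL;
	return old;		/* may be NULL */
}

/* Swap the cached object, following the shape of refill_obj_stock(). */
static void refill(struct stock *s, struct obj *o)
{
	struct obj *old = NULL;

	s->locked = true;		/* "lock" */
	if (s->cached != o) {
		old = drain(s);
		o->refs++;
		s->cached = o;
	}
	s->locked = false;		/* "unlock" */

	if (old)
		obj_put(s, old);	/* release only after the unlock */
}
```

Calling refill() twice with two different objects leaves puts_under_lock at zero, because the reference to the first object is only dropped after the lock flag has been cleared.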
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 046/227] mm/memcg: disable migration instead of preemption in drain_all_stock().
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: vdavydov.dev, tglx, shakeelb, roman.gushchin, peterz,
oliver.sang, mkoutny, mhocko, mhocko, longman, hannes, bigeasy,
akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: mm/memcg: disable migration instead of preemption in drain_all_stock().
Before the for-each-CPU loop, preemption is disabled so that
drain_local_stock() can be invoked directly instead of scheduling a
worker. Ensuring that drain_local_stock() completed on the local CPU is
not a correctness problem. It _could_ be that the charging path will be
forced to reclaim memory because cached charges are still waiting to be
drained.
Disabling preemption before invoking drain_local_stock() is problematic on
PREEMPT_RT due to the sleeping locks involved. To ensure that no CPU
migration happens across for_each_online_cpu() it is enough to use
migrate_disable(), which disables migration and keeps the context preemptible
so that a sleeping lock can be acquired. A race with CPU hotplug is not a
problem because pcp data is not going away. In the worst case we just
schedule draining of an empty stock.
Use migrate_disable() instead of get_cpu() around the
for_each_online_cpu() loop.
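The loop shape described above can be modelled in plain userspace C. This is only an illustrative sketch: NR_CPUS, struct stock_model, this_cpu and the migrate_disable_depth counter are invented for the model, the "skip empty stocks" test is an assumption about when a flush is wanted, and nothing here calls real kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

struct stock_model {
	int cached;		/* cached charges waiting to be drained */
	bool work_queued;	/* models schedule_work_on(cpu, ...) */
};

static struct stock_model stocks[NR_CPUS];
static int migrate_disable_depth;	/* models migrate_disable()/migrate_enable() */
static int this_cpu;			/* CPU the caller is running on */

static void drain_local(struct stock_model *s)
{
	s->cached = 0;
}

/* Returns the number of remote CPUs for which work was queued. */
static int drain_all(void)
{
	int cpu, curcpu, queued = 0;

	migrate_disable_depth++;	/* pinned to this CPU, still preemptible */
	curcpu = this_cpu;
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		struct stock_model *s = &stocks[cpu];

		if (!s->cached)		/* assumption: nothing to flush */
			continue;
		if (cpu == curcpu) {
			drain_local(s);	/* local CPU: drain directly */
		} else {
			s->work_queued = true;	/* remote CPU: defer to a worker */
			queued++;
		}
	}
	migrate_disable_depth--;
	return queued;
}
```

The point of the model is that the only property the loop needs is a stable curcpu for its duration, which is what disabling migration provides; it does not need preemption to be off.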
Link: https://lkml.kernel.org/r/20220226204144.1008339-7-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: kernel test robot <oliver.sang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/mm/memcontrol.c~mm-memcg-disable-migration-instead-of-preemption-in-drain_all_stock
+++ a/mm/memcontrol.c
@@ -2300,7 +2300,8 @@ static void drain_all_stock(struct mem_c
* as well as workers from this path always operate on the local
* per-cpu data. CPU up doesn't touch memcg_stock at all.
*/
- curcpu = get_cpu();
+ migrate_disable();
+ curcpu = smp_processor_id();
for_each_online_cpu(cpu) {
struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
struct mem_cgroup *memcg;
@@ -2323,7 +2324,7 @@ static void drain_all_stock(struct mem_c
schedule_work_on(cpu, &stock->work);
}
}
- put_cpu();
+ migrate_enable();
mutex_unlock(&percpu_charge_mutex);
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 047/227] mm: list_lru: transpose the array of per-node per-memcg lru lists
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: list_lru: transpose the array of per-node per-memcg lru lists
Patch series "Optimize list lru memory consumption", v6.
On one of our servers, we found a suspected memory leak. The kmalloc-32
slab cache consumes more than 6GB of memory. Other kmem_caches consume less
than 2GB of memory.
After in-depth analysis, we found that the memory consumption of the
kmalloc-32 slab cache is caused by list_lru_one allocations.
crash> p memcg_nr_cache_ids
memcg_nr_cache_ids = $2 = 24574
memcg_nr_cache_ids is very large, and the memory consumption of each list_lru
can be calculated with the following formula.
num_numa_node * memcg_nr_cache_ids * 32 (kmalloc-32)
There are 4 NUMA nodes in our system, so each list_lru consumes ~3MB.
crash> list super_blocks | wc -l
952
Every mount registers 2 list_lrus, one for inodes and one for
dentries. There are 952 super_blocks, so the total memory is 952 * 2 * 3
MB (~5.6GB). The number of memory cgroups is now less than 500, so I
guess more than 12286 memory cgroups were created on this machine at some
point (I do not know why there were so many cgroups; it may be a user's bug
or the user may really want to do that). Because memcg_nr_cache_ids has not
been reduced to a suitable value, a lot of memory is wasted. If we
want to reduce memcg_nr_cache_ids, we have to *reboot* the server. This
is not what we want.
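As a quick check of the arithmetic above, here is a minimal C sketch. The helper names list_lru_bytes() and total_bytes() are made up for illustration and are not kernel functions; the inputs are the figures quoted in this message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes used by the list_lru_one allocations of one list_lru, following
 * the formula in the text: nodes * cache ids * slab object size.
 */
static uint64_t list_lru_bytes(uint64_t nr_nodes, uint64_t nr_cache_ids,
			       uint64_t obj_size)
{
	return nr_nodes * nr_cache_ids * obj_size;
}

/* Each superblock registers two list_lrus (inode and dentry). */
static uint64_t total_bytes(uint64_t nr_super_blocks, uint64_t per_lru_bytes)
{
	return nr_super_blocks * 2 * per_lru_bytes;
}
```

With 4 nodes, 24574 cache ids and 32-byte objects, list_lru_bytes() gives 3145472 bytes (just under 3 MiB) per list_lru, and total_bytes() for 952 superblocks gives 5988978688 bytes, which is roughly the 5.6GB estimated above.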
I previously posted a patchset [1] to reduce memcg_nr_cache_ids, but
that did not fundamentally solve the problem.
We currently allocate scope for every memcg to be tracked on every
superblock instantiated in the system, regardless of whether that
superblock is even accessible to that memcg.
These huge memcg counts come from container hosts where memcgs are
confined to just a small subset of the total number of superblocks that
are instantiated at any given point in time.
For these systems with huge container counts, list_lru does not need the
capability of tracking every memcg on every superblock.
What it comes down to is that the list_lru is only needed for a given
memcg if that memcg is instantiating and freeing objects on a given
list_lru.
As Dave said, "Which makes me think we should be moving more towards 'add
the memcg to the list_lru at the first insert' model rather than
'instantiate all at memcg init time just in case'."
This patchset aims to optimize the list_lru memory consumption from
different aspects.
I did a simple test to show the optimization. I created 10k memory
cgroups and mounted 10k filesystems on the system, then used the free
command to show how much memory the system consumes after this operation
(there are 2 NUMA nodes in the system).
+-----------------------+------------------------+
| condition             | memory consumption     |
+-----------------------+------------------------+
| without this patchset | 24464 MB               |
+-----------------------+------------------------+
| after patch 1         | 21957 MB               | <--------+
+-----------------------+------------------------+          |
| after patch 10        | 6895 MB                |          |
+-----------------------+------------------------+          |
| after patch 12        | 4367 MB                |          |
+-----------------------+------------------------+          |
                                                            |
The more the number of nodes, the more obvious the effect---+
BTW, there was a recent discussion [2] on the same issue.
[1] https://lore.kernel.org/all/20210428094949.43579-1-songmuchun@bytedance.com/
[2] https://lore.kernel.org/all/20210405054848.GA1077931@in.ibm.com/
This series not only optimizes the memory usage of list_lru but also
simplifies the code.
This patch (of 16):
The current scheme of maintaining per-node per-memcg lru lists looks like:
  struct list_lru {
    struct list_lru_node *node;           (for each node)
      struct list_lru_memcg *memcg_lrus;
        struct list_lru_one *lru[];       (for each memcg)
  }
By effectively transposing the two-dimensional array of list_lru_one structures
(per-node per-memcg => per-memcg per-node) it's possible to save some memory
and simplify alloc/dealloc paths. The new scheme looks like:
  struct list_lru {
    struct list_lru_memcg *mlrus;
      struct list_lru_per_memcg *mlru[];  (for each memcg)
        struct list_lru_one node[0];      (for each node)
  }
The memory savings come not only from 'struct rcu_head' but also from the
pointer arrays used to store the pointers to 'struct list_lru_one'. Before
this patch there is one such array per node, and its size is 8 (a pointer) *
memcg_nr_cache_ids, so the total size of the arrays is
8 * num_nodes * memcg_nr_cache_ids. After this patch, the size becomes
8 * memcg_nr_cache_ids.
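To make the transposed layout concrete, here is a small userspace model of it. It is a sketch under stated assumptions, not the kernel code: the lru_* types, the nr_nodes/nr_memcgs bookkeeping fields, PTR_BYTES and all function names are invented for illustration, and plain calloc()/free() stand in for the kernel allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PTR_BYTES 8u	/* pointer size assumed by the text (64-bit) */

/* Simplified stand-ins for the kernel structures; only the shape matters. */
struct lru_one {
	long nr_items;
};

struct lru_per_memcg {
	int nr_nodes;			/* bookkeeping for this model only */
	struct lru_one node[];		/* indexed by node id */
};

struct lru_memcg {
	int nr_memcgs;			/* bookkeeping for this model only */
	struct lru_per_memcg *mlru[];	/* indexed by memcg id */
};

static void mlrus_free(struct lru_memcg *m)
{
	int i;

	if (!m)
		return;
	for (i = 0; i < m->nr_memcgs; i++)
		free(m->mlru[i]);	/* free(NULL) is a no-op */
	free(m);
}

/* One pointer array for the whole lru, one allocation per memcg. */
static struct lru_memcg *mlrus_alloc(int nr_memcgs, int nr_nodes)
{
	struct lru_memcg *m;
	int i;

	m = calloc(1, sizeof(*m) + (size_t)nr_memcgs * sizeof(m->mlru[0]));
	if (!m)
		return NULL;
	m->nr_memcgs = nr_memcgs;

	for (i = 0; i < nr_memcgs; i++) {
		struct lru_per_memcg *p;

		p = calloc(1, sizeof(*p) + (size_t)nr_nodes * sizeof(p->node[0]));
		if (!p) {
			mlrus_free(m);	/* frees the entries set so far */
			return NULL;
		}
		p->nr_nodes = nr_nodes;
		m->mlru[i] = p;
	}
	return m;
}

/* Per-memcg first, then per-node: mirrors &mlrus->mlru[idx]->node[nid]. */
static struct lru_one *lru_lookup(struct lru_memcg *m, int idx, int nid)
{
	return &m->mlru[idx]->node[nid];
}

/* Pointer-array bytes before (one array per node) and after (one array). */
static size_t ptr_bytes_old(size_t nr_nodes, size_t nr_ids)
{
	return PTR_BYTES * nr_nodes * nr_ids;
}

static size_t ptr_bytes_new(size_t nr_ids)
{
	return PTR_BYTES * nr_ids;
}
```

For the 4-node machine with memcg_nr_cache_ids = 24574 described earlier, ptr_bytes_old() gives 786368 bytes of pointers per list_lru and ptr_bytes_new() gives 196592 bytes, which is the num_nodes-fold reduction the paragraph above describes.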
Link: https://lkml.kernel.org/r/20220228122126.37293-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20220228122126.37293-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/list_lru.h | 17 +--
mm/list_lru.c | 206 +++++++++++++------------------------
2 files changed, 86 insertions(+), 137 deletions(-)
--- a/include/linux/list_lru.h~mm-list_lru-transpose-the-array-of-per-node-per-memcg-lru-lists
+++ a/include/linux/list_lru.h
@@ -31,10 +31,15 @@ struct list_lru_one {
long nr_items;
};
+struct list_lru_per_memcg {
+ /* array of per cgroup per node lists, indexed by node id */
+ struct list_lru_one node[0];
+};
+
struct list_lru_memcg {
- struct rcu_head rcu;
+ struct rcu_head rcu;
/* array of per cgroup lists, indexed by memcg_cache_id */
- struct list_lru_one *lru[];
+ struct list_lru_per_memcg *mlru[];
};
struct list_lru_node {
@@ -42,11 +47,7 @@ struct list_lru_node {
spinlock_t lock;
/* global list, used for the root cgroup in cgroup aware lrus */
struct list_lru_one lru;
-#ifdef CONFIG_MEMCG_KMEM
- /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
- struct list_lru_memcg __rcu *memcg_lrus;
-#endif
- long nr_items;
+ long nr_items;
} ____cacheline_aligned_in_smp;
struct list_lru {
@@ -55,6 +56,8 @@ struct list_lru {
struct list_head list;
int shrinker_id;
bool memcg_aware;
+ /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
+ struct list_lru_memcg __rcu *mlrus;
#endif
};
--- a/mm/list_lru.c~mm-list_lru-transpose-the-array-of-per-node-per-memcg-lru-lists
+++ a/mm/list_lru.c
@@ -49,35 +49,37 @@ static int lru_shrinker_id(struct list_l
}
static inline struct list_lru_one *
-list_lru_from_memcg_idx(struct list_lru_node *nlru, int idx)
+list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
{
- struct list_lru_memcg *memcg_lrus;
+ struct list_lru_memcg *mlrus;
+ struct list_lru_node *nlru = &lru->node[nid];
+
/*
* Either lock or RCU protects the array of per cgroup lists
- * from relocation (see memcg_update_list_lru_node).
+ * from relocation (see memcg_update_list_lru).
*/
- memcg_lrus = rcu_dereference_check(nlru->memcg_lrus,
- lockdep_is_held(&nlru->lock));
- if (memcg_lrus && idx >= 0)
- return memcg_lrus->lru[idx];
+ mlrus = rcu_dereference_check(lru->mlrus, lockdep_is_held(&nlru->lock));
+ if (mlrus && idx >= 0)
+ return &mlrus->mlru[idx]->node[nid];
return &nlru->lru;
}
static inline struct list_lru_one *
-list_lru_from_kmem(struct list_lru_node *nlru, void *ptr,
+list_lru_from_kmem(struct list_lru *lru, int nid, void *ptr,
struct mem_cgroup **memcg_ptr)
{
+ struct list_lru_node *nlru = &lru->node[nid];
struct list_lru_one *l = &nlru->lru;
struct mem_cgroup *memcg = NULL;
- if (!nlru->memcg_lrus)
+ if (!lru->mlrus)
goto out;
memcg = mem_cgroup_from_obj(ptr);
if (!memcg)
goto out;
- l = list_lru_from_memcg_idx(nlru, memcg_cache_id(memcg));
+ l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg));
out:
if (memcg_ptr)
*memcg_ptr = memcg;
@@ -103,18 +105,18 @@ static inline bool list_lru_memcg_aware(
}
static inline struct list_lru_one *
-list_lru_from_memcg_idx(struct list_lru_node *nlru, int idx)
+list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
{
- return &nlru->lru;
+ return &lru->node[nid].lru;
}
static inline struct list_lru_one *
-list_lru_from_kmem(struct list_lru_node *nlru, void *ptr,
+list_lru_from_kmem(struct list_lru *lru, int nid, void *ptr,
struct mem_cgroup **memcg_ptr)
{
if (memcg_ptr)
*memcg_ptr = NULL;
- return &nlru->lru;
+ return &lru->node[nid].lru;
}
#endif /* CONFIG_MEMCG_KMEM */
@@ -127,7 +129,7 @@ bool list_lru_add(struct list_lru *lru,
spin_lock(&nlru->lock);
if (list_empty(item)) {
- l = list_lru_from_kmem(nlru, item, &memcg);
+ l = list_lru_from_kmem(lru, nid, item, &memcg);
list_add_tail(item, &l->list);
/* Set shrinker bit if the first element was added */
if (!l->nr_items++)
@@ -150,7 +152,7 @@ bool list_lru_del(struct list_lru *lru,
spin_lock(&nlru->lock);
if (!list_empty(item)) {
- l = list_lru_from_kmem(nlru, item, NULL);
+ l = list_lru_from_kmem(lru, nid, item, NULL);
list_del_init(item);
l->nr_items--;
nlru->nr_items--;
@@ -180,12 +182,11 @@ EXPORT_SYMBOL_GPL(list_lru_isolate_move)
unsigned long list_lru_count_one(struct list_lru *lru,
int nid, struct mem_cgroup *memcg)
{
- struct list_lru_node *nlru = &lru->node[nid];
struct list_lru_one *l;
long count;
rcu_read_lock();
- l = list_lru_from_memcg_idx(nlru, memcg_cache_id(memcg));
+ l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg));
count = READ_ONCE(l->nr_items);
rcu_read_unlock();
@@ -206,16 +207,16 @@ unsigned long list_lru_count_node(struct
EXPORT_SYMBOL_GPL(list_lru_count_node);
static unsigned long
-__list_lru_walk_one(struct list_lru_node *nlru, int memcg_idx,
+__list_lru_walk_one(struct list_lru *lru, int nid, int memcg_idx,
list_lru_walk_cb isolate, void *cb_arg,
unsigned long *nr_to_walk)
{
-
+ struct list_lru_node *nlru = &lru->node[nid];
struct list_lru_one *l;
struct list_head *item, *n;
unsigned long isolated = 0;
- l = list_lru_from_memcg_idx(nlru, memcg_idx);
+ l = list_lru_from_memcg_idx(lru, nid, memcg_idx);
restart:
list_for_each_safe(item, n, &l->list) {
enum lru_status ret;
@@ -272,8 +273,8 @@ list_lru_walk_one(struct list_lru *lru,
unsigned long ret;
spin_lock(&nlru->lock);
- ret = __list_lru_walk_one(nlru, memcg_cache_id(memcg), isolate, cb_arg,
- nr_to_walk);
+ ret = __list_lru_walk_one(lru, nid, memcg_cache_id(memcg), isolate,
+ cb_arg, nr_to_walk);
spin_unlock(&nlru->lock);
return ret;
}
@@ -288,8 +289,8 @@ list_lru_walk_one_irq(struct list_lru *l
unsigned long ret;
spin_lock_irq(&nlru->lock);
- ret = __list_lru_walk_one(nlru, memcg_cache_id(memcg), isolate, cb_arg,
- nr_to_walk);
+ ret = __list_lru_walk_one(lru, nid, memcg_cache_id(memcg), isolate,
+ cb_arg, nr_to_walk);
spin_unlock_irq(&nlru->lock);
return ret;
}
@@ -308,7 +309,7 @@ unsigned long list_lru_walk_node(struct
struct list_lru_node *nlru = &lru->node[nid];
spin_lock(&nlru->lock);
- isolated += __list_lru_walk_one(nlru, memcg_idx,
+ isolated += __list_lru_walk_one(lru, nid, memcg_idx,
isolate, cb_arg,
nr_to_walk);
spin_unlock(&nlru->lock);
@@ -328,166 +329,111 @@ static void init_one_lru(struct list_lru
}
#ifdef CONFIG_MEMCG_KMEM
-static void __memcg_destroy_list_lru_node(struct list_lru_memcg *memcg_lrus,
- int begin, int end)
+static void memcg_destroy_list_lru_range(struct list_lru_memcg *mlrus,
+ int begin, int end)
{
int i;
for (i = begin; i < end; i++)
- kfree(memcg_lrus->lru[i]);
+ kfree(mlrus->mlru[i]);
}
-static int __memcg_init_list_lru_node(struct list_lru_memcg *memcg_lrus,
- int begin, int end)
+static int memcg_init_list_lru_range(struct list_lru_memcg *mlrus,
+ int begin, int end)
{
int i;
for (i = begin; i < end; i++) {
- struct list_lru_one *l;
+ int nid;
+ struct list_lru_per_memcg *mlru;
- l = kmalloc(sizeof(struct list_lru_one), GFP_KERNEL);
- if (!l)
+ mlru = kmalloc(struct_size(mlru, node, nr_node_ids), GFP_KERNEL);
+ if (!mlru)
goto fail;
- init_one_lru(l);
- memcg_lrus->lru[i] = l;
+ for_each_node(nid)
+ init_one_lru(&mlru->node[nid]);
+ mlrus->mlru[i] = mlru;
}
return 0;
fail:
- __memcg_destroy_list_lru_node(memcg_lrus, begin, i);
+ memcg_destroy_list_lru_range(mlrus, begin, i);
return -ENOMEM;
}
-static int memcg_init_list_lru_node(struct list_lru_node *nlru)
+static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
{
- struct list_lru_memcg *memcg_lrus;
+ struct list_lru_memcg *mlrus;
int size = memcg_nr_cache_ids;
- memcg_lrus = kvmalloc(struct_size(memcg_lrus, lru, size), GFP_KERNEL);
- if (!memcg_lrus)
+ lru->memcg_aware = memcg_aware;
+ if (!memcg_aware)
+ return 0;
+
+ mlrus = kvmalloc(struct_size(mlrus, mlru, size), GFP_KERNEL);
+ if (!mlrus)
return -ENOMEM;
- if (__memcg_init_list_lru_node(memcg_lrus, 0, size)) {
- kvfree(memcg_lrus);
+ if (memcg_init_list_lru_range(mlrus, 0, size)) {
+ kvfree(mlrus);
return -ENOMEM;
}
- RCU_INIT_POINTER(nlru->memcg_lrus, memcg_lrus);
+ RCU_INIT_POINTER(lru->mlrus, mlrus);
return 0;
}
-static void memcg_destroy_list_lru_node(struct list_lru_node *nlru)
+static void memcg_destroy_list_lru(struct list_lru *lru)
{
- struct list_lru_memcg *memcg_lrus;
+ struct list_lru_memcg *mlrus;
+
+ if (!list_lru_memcg_aware(lru))
+ return;
+
/*
* This is called when shrinker has already been unregistered,
* and nobody can use it. So, there is no need to use kvfree_rcu().
*/
- memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus, true);
- __memcg_destroy_list_lru_node(memcg_lrus, 0, memcg_nr_cache_ids);
- kvfree(memcg_lrus);
+ mlrus = rcu_dereference_protected(lru->mlrus, true);
+ memcg_destroy_list_lru_range(mlrus, 0, memcg_nr_cache_ids);
+ kvfree(mlrus);
}
-static int memcg_update_list_lru_node(struct list_lru_node *nlru,
- int old_size, int new_size)
+static int memcg_update_list_lru(struct list_lru *lru, int old_size, int new_size)
{
struct list_lru_memcg *old, *new;
BUG_ON(old_size > new_size);
- old = rcu_dereference_protected(nlru->memcg_lrus,
+ old = rcu_dereference_protected(lru->mlrus,
lockdep_is_held(&list_lrus_mutex));
- new = kvmalloc(struct_size(new, lru, new_size), GFP_KERNEL);
+ new = kvmalloc(struct_size(new, mlru, new_size), GFP_KERNEL);
if (!new)
return -ENOMEM;
- if (__memcg_init_list_lru_node(new, old_size, new_size)) {
+ if (memcg_init_list_lru_range(new, old_size, new_size)) {
kvfree(new);
return -ENOMEM;
}
- memcpy(&new->lru, &old->lru, flex_array_size(new, lru, old_size));
- rcu_assign_pointer(nlru->memcg_lrus, new);
+ memcpy(&new->mlru, &old->mlru, flex_array_size(new, mlru, old_size));
+ rcu_assign_pointer(lru->mlrus, new);
kvfree_rcu(old, rcu);
return 0;
}
-static void memcg_cancel_update_list_lru_node(struct list_lru_node *nlru,
- int old_size, int new_size)
-{
- struct list_lru_memcg *memcg_lrus;
-
- memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus,
- lockdep_is_held(&list_lrus_mutex));
- /* do not bother shrinking the array back to the old size, because we
- * cannot handle allocation failures here */
- __memcg_destroy_list_lru_node(memcg_lrus, old_size, new_size);
-}
-
-static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
-{
- int i;
-
- lru->memcg_aware = memcg_aware;
-
- if (!memcg_aware)
- return 0;
-
- for_each_node(i) {
- if (memcg_init_list_lru_node(&lru->node[i]))
- goto fail;
- }
- return 0;
-fail:
- for (i = i - 1; i >= 0; i--) {
- if (!lru->node[i].memcg_lrus)
- continue;
- memcg_destroy_list_lru_node(&lru->node[i]);
- }
- return -ENOMEM;
-}
-
-static void memcg_destroy_list_lru(struct list_lru *lru)
-{
- int i;
-
- if (!list_lru_memcg_aware(lru))
- return;
-
- for_each_node(i)
- memcg_destroy_list_lru_node(&lru->node[i]);
-}
-
-static int memcg_update_list_lru(struct list_lru *lru,
- int old_size, int new_size)
-{
- int i;
-
- for_each_node(i) {
- if (memcg_update_list_lru_node(&lru->node[i],
- old_size, new_size))
- goto fail;
- }
- return 0;
-fail:
- for (i = i - 1; i >= 0; i--) {
- if (!lru->node[i].memcg_lrus)
- continue;
-
- memcg_cancel_update_list_lru_node(&lru->node[i],
- old_size, new_size);
- }
- return -ENOMEM;
-}
-
static void memcg_cancel_update_list_lru(struct list_lru *lru,
int old_size, int new_size)
{
- int i;
+ struct list_lru_memcg *mlrus;
- for_each_node(i)
- memcg_cancel_update_list_lru_node(&lru->node[i],
- old_size, new_size);
+ mlrus = rcu_dereference_protected(lru->mlrus,
+ lockdep_is_held(&list_lrus_mutex));
+ /*
+ * Do not bother shrinking the array back to the old size, because we
+ * cannot handle allocation failures here.
+ */
+ memcg_destroy_list_lru_range(mlrus, old_size, new_size);
}
int memcg_update_all_list_lrus(int new_size)
@@ -524,8 +470,8 @@ static void memcg_drain_list_lru_node(st
*/
spin_lock_irq(&nlru->lock);
- src = list_lru_from_memcg_idx(nlru, src_idx);
- dst = list_lru_from_memcg_idx(nlru, dst_idx);
+ src = list_lru_from_memcg_idx(lru, nid, src_idx);
+ dst = list_lru_from_memcg_idx(lru, nid, dst_idx);
list_splice_init(&src->list, &dst->list);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 047/227] mm: list_lru: transpose the array of per-node per-memcg lru lists
@ 2022-03-22 21:40 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: list_lru: transpose the array of per-node per-memcg lru lists
Patch series "Optimize list lru memory consumption", v6.
On one of our servers, we found a suspected memory leak. The kmalloc-32
slab cache consumes more than 6GB of memory. Other kmem_caches consume less
than 2GB of memory.
After in-depth analysis, we found that the memory consumption of the
kmalloc-32 slab cache is caused by list_lru_one allocations.
crash> p memcg_nr_cache_ids
memcg_nr_cache_ids = $2 = 24574
memcg_nr_cache_ids is very large, and the memory consumption of each list_lru
can be calculated with the following formula.
num_numa_node * memcg_nr_cache_ids * 32 (kmalloc-32)
There are 4 NUMA nodes in our system, so each list_lru consumes ~3MB.
crash> list super_blocks | wc -l
952
Every mount registers 2 list_lrus, one for inodes and one for
dentries. There are 952 super_blocks, so the total memory is 952 * 2 * 3
MB (~5.6GB). The number of memory cgroups is now less than 500, so I
guess more than 12286 memory cgroups were created on this machine at some
point (I do not know why there were so many cgroups; it may be a user's bug
or the user may really want to do that). Because memcg_nr_cache_ids has not
been reduced to a suitable value, a lot of memory is wasted. If we
want to reduce memcg_nr_cache_ids, we have to *reboot* the server. This
is not what we want.
I previously posted a patchset [1] to reduce memcg_nr_cache_ids, but
that did not fundamentally solve the problem.
We currently allocate scope for every memcg to be tracked on every
superblock instantiated in the system, regardless of whether that
superblock is even accessible to that memcg.
These huge memcg counts come from container hosts where memcgs are
confined to just a small subset of the total number of superblocks that
are instantiated at any given point in time.
For these systems with huge container counts, list_lru does not need the
capability of tracking every memcg on every superblock.
What it comes down to is that the list_lru is only needed for a given
memcg if that memcg is instantiating and freeing objects on a given
list_lru.
As Dave said, "Which makes me think we should be moving more towards 'add
the memcg to the list_lru at the first insert' model rather than
'instantiate all at memcg init time just in case'."
This patchset aims to optimize the list_lru memory consumption from
different aspects.
I did a simple test to show the optimization. I created 10k memory
cgroups and mounted 10k filesystems on the system, then used the free
command to show how much memory the system consumes after this operation
(there are 2 NUMA nodes in the system).
+-----------------------+------------------------+
| condition             | memory consumption     |
+-----------------------+------------------------+
| without this patchset | 24464 MB               |
+-----------------------+------------------------+
| after patch 1         | 21957 MB               | <--------+
+-----------------------+------------------------+          |
| after patch 10        | 6895 MB                |          |
+-----------------------+------------------------+          |
| after patch 12        | 4367 MB                |          |
+-----------------------+------------------------+          |
                                                            |
The more the number of nodes, the more obvious the effect---+
BTW, there was a recent discussion [2] on the same issue.
[1] https://lore.kernel.org/all/20210428094949.43579-1-songmuchun@bytedance.com/
[2] https://lore.kernel.org/all/20210405054848.GA1077931@in.ibm.com/
This series not only optimizes the memory usage of list_lru but also
simplifies the code.
This patch (of 16):
The current scheme of maintaining per-node per-memcg lru lists looks like:
  struct list_lru {
    struct list_lru_node *node;           (for each node)
      struct list_lru_memcg *memcg_lrus;
        struct list_lru_one *lru[];       (for each memcg)
  }
By effectively transposing the two-dimensional array of list_lru_one structures
(per-node per-memcg => per-memcg per-node) it's possible to save some memory
and simplify alloc/dealloc paths. The new scheme looks like:
  struct list_lru {
    struct list_lru_memcg *mlrus;
      struct list_lru_per_memcg *mlru[];  (for each memcg)
        struct list_lru_one node[0];      (for each node)
  }
The memory savings come not only from 'struct rcu_head' but also from the
pointer arrays used to store the pointers to 'struct list_lru_one'. Before
this patch there is one such array per node, and its size is 8 (a pointer) *
memcg_nr_cache_ids, so the total size of the arrays is
8 * num_nodes * memcg_nr_cache_ids. After this patch, the size becomes
8 * memcg_nr_cache_ids.
Link: https://lkml.kernel.org/r/20220228122126.37293-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20220228122126.37293-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/list_lru.h | 17 +--
mm/list_lru.c | 206 +++++++++++++------------------------
2 files changed, 86 insertions(+), 137 deletions(-)
--- a/include/linux/list_lru.h~mm-list_lru-transpose-the-array-of-per-node-per-memcg-lru-lists
+++ a/include/linux/list_lru.h
@@ -31,10 +31,15 @@ struct list_lru_one {
long nr_items;
};
+struct list_lru_per_memcg {
+ /* array of per cgroup per node lists, indexed by node id */
+ struct list_lru_one node[0];
+};
+
struct list_lru_memcg {
- struct rcu_head rcu;
+ struct rcu_head rcu;
/* array of per cgroup lists, indexed by memcg_cache_id */
- struct list_lru_one *lru[];
+ struct list_lru_per_memcg *mlru[];
};
struct list_lru_node {
@@ -42,11 +47,7 @@ struct list_lru_node {
spinlock_t lock;
/* global list, used for the root cgroup in cgroup aware lrus */
struct list_lru_one lru;
-#ifdef CONFIG_MEMCG_KMEM
- /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
- struct list_lru_memcg __rcu *memcg_lrus;
-#endif
- long nr_items;
+ long nr_items;
} ____cacheline_aligned_in_smp;
struct list_lru {
@@ -55,6 +56,8 @@ struct list_lru {
struct list_head list;
int shrinker_id;
bool memcg_aware;
+ /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
+ struct list_lru_memcg __rcu *mlrus;
#endif
};
--- a/mm/list_lru.c~mm-list_lru-transpose-the-array-of-per-node-per-memcg-lru-lists
+++ a/mm/list_lru.c
@@ -49,35 +49,37 @@ static int lru_shrinker_id(struct list_l
}
static inline struct list_lru_one *
-list_lru_from_memcg_idx(struct list_lru_node *nlru, int idx)
+list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
{
- struct list_lru_memcg *memcg_lrus;
+ struct list_lru_memcg *mlrus;
+ struct list_lru_node *nlru = &lru->node[nid];
+
/*
* Either lock or RCU protects the array of per cgroup lists
- * from relocation (see memcg_update_list_lru_node).
+ * from relocation (see memcg_update_list_lru).
*/
- memcg_lrus = rcu_dereference_check(nlru->memcg_lrus,
- lockdep_is_held(&nlru->lock));
- if (memcg_lrus && idx >= 0)
- return memcg_lrus->lru[idx];
+ mlrus = rcu_dereference_check(lru->mlrus, lockdep_is_held(&nlru->lock));
+ if (mlrus && idx >= 0)
+ return &mlrus->mlru[idx]->node[nid];
return &nlru->lru;
}
static inline struct list_lru_one *
-list_lru_from_kmem(struct list_lru_node *nlru, void *ptr,
+list_lru_from_kmem(struct list_lru *lru, int nid, void *ptr,
struct mem_cgroup **memcg_ptr)
{
+ struct list_lru_node *nlru = &lru->node[nid];
struct list_lru_one *l = &nlru->lru;
struct mem_cgroup *memcg = NULL;
- if (!nlru->memcg_lrus)
+ if (!lru->mlrus)
goto out;
memcg = mem_cgroup_from_obj(ptr);
if (!memcg)
goto out;
- l = list_lru_from_memcg_idx(nlru, memcg_cache_id(memcg));
+ l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg));
out:
if (memcg_ptr)
*memcg_ptr = memcg;
@@ -103,18 +105,18 @@ static inline bool list_lru_memcg_aware(
}
static inline struct list_lru_one *
-list_lru_from_memcg_idx(struct list_lru_node *nlru, int idx)
+list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
{
- return &nlru->lru;
+ return &lru->node[nid].lru;
}
static inline struct list_lru_one *
-list_lru_from_kmem(struct list_lru_node *nlru, void *ptr,
+list_lru_from_kmem(struct list_lru *lru, int nid, void *ptr,
struct mem_cgroup **memcg_ptr)
{
if (memcg_ptr)
*memcg_ptr = NULL;
- return &nlru->lru;
+ return &lru->node[nid].lru;
}
#endif /* CONFIG_MEMCG_KMEM */
@@ -127,7 +129,7 @@ bool list_lru_add(struct list_lru *lru,
spin_lock(&nlru->lock);
if (list_empty(item)) {
- l = list_lru_from_kmem(nlru, item, &memcg);
+ l = list_lru_from_kmem(lru, nid, item, &memcg);
list_add_tail(item, &l->list);
/* Set shrinker bit if the first element was added */
if (!l->nr_items++)
@@ -150,7 +152,7 @@ bool list_lru_del(struct list_lru *lru,
spin_lock(&nlru->lock);
if (!list_empty(item)) {
- l = list_lru_from_kmem(nlru, item, NULL);
+ l = list_lru_from_kmem(lru, nid, item, NULL);
list_del_init(item);
l->nr_items--;
nlru->nr_items--;
@@ -180,12 +182,11 @@ EXPORT_SYMBOL_GPL(list_lru_isolate_move)
unsigned long list_lru_count_one(struct list_lru *lru,
int nid, struct mem_cgroup *memcg)
{
- struct list_lru_node *nlru = &lru->node[nid];
struct list_lru_one *l;
long count;
rcu_read_lock();
- l = list_lru_from_memcg_idx(nlru, memcg_cache_id(memcg));
+ l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg));
count = READ_ONCE(l->nr_items);
rcu_read_unlock();
@@ -206,16 +207,16 @@ unsigned long list_lru_count_node(struct
EXPORT_SYMBOL_GPL(list_lru_count_node);
static unsigned long
-__list_lru_walk_one(struct list_lru_node *nlru, int memcg_idx,
+__list_lru_walk_one(struct list_lru *lru, int nid, int memcg_idx,
list_lru_walk_cb isolate, void *cb_arg,
unsigned long *nr_to_walk)
{
-
+ struct list_lru_node *nlru = &lru->node[nid];
struct list_lru_one *l;
struct list_head *item, *n;
unsigned long isolated = 0;
- l = list_lru_from_memcg_idx(nlru, memcg_idx);
+ l = list_lru_from_memcg_idx(lru, nid, memcg_idx);
restart:
list_for_each_safe(item, n, &l->list) {
enum lru_status ret;
@@ -272,8 +273,8 @@ list_lru_walk_one(struct list_lru *lru,
unsigned long ret;
spin_lock(&nlru->lock);
- ret = __list_lru_walk_one(nlru, memcg_cache_id(memcg), isolate, cb_arg,
- nr_to_walk);
+ ret = __list_lru_walk_one(lru, nid, memcg_cache_id(memcg), isolate,
+ cb_arg, nr_to_walk);
spin_unlock(&nlru->lock);
return ret;
}
@@ -288,8 +289,8 @@ list_lru_walk_one_irq(struct list_lru *l
unsigned long ret;
spin_lock_irq(&nlru->lock);
- ret = __list_lru_walk_one(nlru, memcg_cache_id(memcg), isolate, cb_arg,
- nr_to_walk);
+ ret = __list_lru_walk_one(lru, nid, memcg_cache_id(memcg), isolate,
+ cb_arg, nr_to_walk);
spin_unlock_irq(&nlru->lock);
return ret;
}
@@ -308,7 +309,7 @@ unsigned long list_lru_walk_node(struct
struct list_lru_node *nlru = &lru->node[nid];
spin_lock(&nlru->lock);
- isolated += __list_lru_walk_one(nlru, memcg_idx,
+ isolated += __list_lru_walk_one(lru, nid, memcg_idx,
isolate, cb_arg,
nr_to_walk);
spin_unlock(&nlru->lock);
@@ -328,166 +329,111 @@ static void init_one_lru(struct list_lru
}
#ifdef CONFIG_MEMCG_KMEM
-static void __memcg_destroy_list_lru_node(struct list_lru_memcg *memcg_lrus,
- int begin, int end)
+static void memcg_destroy_list_lru_range(struct list_lru_memcg *mlrus,
+ int begin, int end)
{
int i;
for (i = begin; i < end; i++)
- kfree(memcg_lrus->lru[i]);
+ kfree(mlrus->mlru[i]);
}
-static int __memcg_init_list_lru_node(struct list_lru_memcg *memcg_lrus,
- int begin, int end)
+static int memcg_init_list_lru_range(struct list_lru_memcg *mlrus,
+ int begin, int end)
{
int i;
for (i = begin; i < end; i++) {
- struct list_lru_one *l;
+ int nid;
+ struct list_lru_per_memcg *mlru;
- l = kmalloc(sizeof(struct list_lru_one), GFP_KERNEL);
- if (!l)
+ mlru = kmalloc(struct_size(mlru, node, nr_node_ids), GFP_KERNEL);
+ if (!mlru)
goto fail;
- init_one_lru(l);
- memcg_lrus->lru[i] = l;
+ for_each_node(nid)
+ init_one_lru(&mlru->node[nid]);
+ mlrus->mlru[i] = mlru;
}
return 0;
fail:
- __memcg_destroy_list_lru_node(memcg_lrus, begin, i);
+ memcg_destroy_list_lru_range(mlrus, begin, i);
return -ENOMEM;
}
-static int memcg_init_list_lru_node(struct list_lru_node *nlru)
+static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
{
- struct list_lru_memcg *memcg_lrus;
+ struct list_lru_memcg *mlrus;
int size = memcg_nr_cache_ids;
- memcg_lrus = kvmalloc(struct_size(memcg_lrus, lru, size), GFP_KERNEL);
- if (!memcg_lrus)
+ lru->memcg_aware = memcg_aware;
+ if (!memcg_aware)
+ return 0;
+
+ mlrus = kvmalloc(struct_size(mlrus, mlru, size), GFP_KERNEL);
+ if (!mlrus)
return -ENOMEM;
- if (__memcg_init_list_lru_node(memcg_lrus, 0, size)) {
- kvfree(memcg_lrus);
+ if (memcg_init_list_lru_range(mlrus, 0, size)) {
+ kvfree(mlrus);
return -ENOMEM;
}
- RCU_INIT_POINTER(nlru->memcg_lrus, memcg_lrus);
+ RCU_INIT_POINTER(lru->mlrus, mlrus);
return 0;
}
-static void memcg_destroy_list_lru_node(struct list_lru_node *nlru)
+static void memcg_destroy_list_lru(struct list_lru *lru)
{
- struct list_lru_memcg *memcg_lrus;
+ struct list_lru_memcg *mlrus;
+
+ if (!list_lru_memcg_aware(lru))
+ return;
+
/*
* This is called when shrinker has already been unregistered,
* and nobody can use it. So, there is no need to use kvfree_rcu().
*/
- memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus, true);
- __memcg_destroy_list_lru_node(memcg_lrus, 0, memcg_nr_cache_ids);
- kvfree(memcg_lrus);
+ mlrus = rcu_dereference_protected(lru->mlrus, true);
+ memcg_destroy_list_lru_range(mlrus, 0, memcg_nr_cache_ids);
+ kvfree(mlrus);
}
-static int memcg_update_list_lru_node(struct list_lru_node *nlru,
- int old_size, int new_size)
+static int memcg_update_list_lru(struct list_lru *lru, int old_size, int new_size)
{
struct list_lru_memcg *old, *new;
BUG_ON(old_size > new_size);
- old = rcu_dereference_protected(nlru->memcg_lrus,
+ old = rcu_dereference_protected(lru->mlrus,
lockdep_is_held(&list_lrus_mutex));
- new = kvmalloc(struct_size(new, lru, new_size), GFP_KERNEL);
+ new = kvmalloc(struct_size(new, mlru, new_size), GFP_KERNEL);
if (!new)
return -ENOMEM;
- if (__memcg_init_list_lru_node(new, old_size, new_size)) {
+ if (memcg_init_list_lru_range(new, old_size, new_size)) {
kvfree(new);
return -ENOMEM;
}
- memcpy(&new->lru, &old->lru, flex_array_size(new, lru, old_size));
- rcu_assign_pointer(nlru->memcg_lrus, new);
+ memcpy(&new->mlru, &old->mlru, flex_array_size(new, mlru, old_size));
+ rcu_assign_pointer(lru->mlrus, new);
kvfree_rcu(old, rcu);
return 0;
}
-static void memcg_cancel_update_list_lru_node(struct list_lru_node *nlru,
- int old_size, int new_size)
-{
- struct list_lru_memcg *memcg_lrus;
-
- memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus,
- lockdep_is_held(&list_lrus_mutex));
- /* do not bother shrinking the array back to the old size, because we
- * cannot handle allocation failures here */
- __memcg_destroy_list_lru_node(memcg_lrus, old_size, new_size);
-}
-
-static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
-{
- int i;
-
- lru->memcg_aware = memcg_aware;
-
- if (!memcg_aware)
- return 0;
-
- for_each_node(i) {
- if (memcg_init_list_lru_node(&lru->node[i]))
- goto fail;
- }
- return 0;
-fail:
- for (i = i - 1; i >= 0; i--) {
- if (!lru->node[i].memcg_lrus)
- continue;
- memcg_destroy_list_lru_node(&lru->node[i]);
- }
- return -ENOMEM;
-}
-
-static void memcg_destroy_list_lru(struct list_lru *lru)
-{
- int i;
-
- if (!list_lru_memcg_aware(lru))
- return;
-
- for_each_node(i)
- memcg_destroy_list_lru_node(&lru->node[i]);
-}
-
-static int memcg_update_list_lru(struct list_lru *lru,
- int old_size, int new_size)
-{
- int i;
-
- for_each_node(i) {
- if (memcg_update_list_lru_node(&lru->node[i],
- old_size, new_size))
- goto fail;
- }
- return 0;
-fail:
- for (i = i - 1; i >= 0; i--) {
- if (!lru->node[i].memcg_lrus)
- continue;
-
- memcg_cancel_update_list_lru_node(&lru->node[i],
- old_size, new_size);
- }
- return -ENOMEM;
-}
-
static void memcg_cancel_update_list_lru(struct list_lru *lru,
int old_size, int new_size)
{
- int i;
+ struct list_lru_memcg *mlrus;
- for_each_node(i)
- memcg_cancel_update_list_lru_node(&lru->node[i],
- old_size, new_size);
+ mlrus = rcu_dereference_protected(lru->mlrus,
+ lockdep_is_held(&list_lrus_mutex));
+ /*
+ * Do not bother shrinking the array back to the old size, because we
+ * cannot handle allocation failures here.
+ */
+ memcg_destroy_list_lru_range(mlrus, old_size, new_size);
}
int memcg_update_all_list_lrus(int new_size)
@@ -524,8 +470,8 @@ static void memcg_drain_list_lru_node(st
*/
spin_lock_irq(&nlru->lock);
- src = list_lru_from_memcg_idx(nlru, src_idx);
- dst = list_lru_from_memcg_idx(nlru, dst_idx);
+ src = list_lru_from_memcg_idx(lru, nid, src_idx);
+ dst = list_lru_from_memcg_idx(lru, nid, dst_idx);
list_splice_init(&src->list, &dst->list);
_
* [patch 048/227] mm: introduce kmem_cache_alloc_lru
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: introduce kmem_cache_alloc_lru
We currently allocate scope for every memcg to be able to be tracked on
every superblock instantiated in the system, regardless of whether that
superblock is even accessible to that memcg.
These huge memcg counts come from container hosts where memcgs are
confined to just a small subset of the total number of superblocks that
are instantiated at any given point in time.
For these systems with huge container counts, list_lru does not need the
capability of tracking every memcg on every superblock. What it comes
down to is adding the memcg to the list_lru at the first insert. So
introduce kmem_cache_alloc_lru to allocate an object together with its
list_lru. In a later patch, we will convert all inode and dentry
allocations from kmem_cache_alloc to kmem_cache_alloc_lru.
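The "allocate at first insert" idea can be illustrated with a small
userspace sketch. This is not the kernel code: struct group, struct
lazy_lru, lazy_lru_alloc() and the fixed NR_GROUPS/NR_NODES sizes are
hypothetical stand-ins for mem_cgroup, list_lru, memcg_list_lru_alloc()
and the real tables, a pthread mutex replaces the spinlock and RCU, and
the root group gets an ordinary slot here. It only shows the pattern of
allocating for a group and its ancestors outside the lock and then
publishing under it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NR_GROUPS 8	/* hypothetical: fixed number of group ids */
#define NR_NODES  2	/* hypothetical: number of NUMA nodes */

/* Stands in for struct list_lru_per_memcg: one list per node. */
struct per_group_lists {
	long nr_items[NR_NODES];
};

/* Stands in for struct mem_cgroup: an id plus a parent link. */
struct group {
	int id;
	struct group *parent;
};

/* Stands in for struct list_lru: every slot starts out NULL. */
struct lazy_lru {
	pthread_mutex_t lock;	/* protects slot[] */
	struct per_group_lists *slot[NR_GROUPS];
};

static void lazy_lru_init(struct lazy_lru *lru)
{
	pthread_mutex_init(&lru->lock, NULL);
	for (int i = 0; i < NR_GROUPS; i++)
		lru->slot[i] = NULL;
}

static int slot_allocated(struct lazy_lru *lru, const struct group *g)
{
	int allocated;

	assert(g->id >= 0 && g->id < NR_GROUPS);
	pthread_mutex_lock(&lru->lock);
	allocated = lru->slot[g->id] != NULL;
	pthread_mutex_unlock(&lru->lock);
	return allocated;
}

/* Ensure g and all its ancestors have a slot; 0 on success, -1 on OOM. */
static int lazy_lru_alloc(struct lazy_lru *lru, struct group *g)
{
	struct pending {
		struct per_group_lists *lists;
		struct group *group;
	} *table;
	int depth = 0, i;

	if (slot_allocated(lru, g))
		return 0;

	for (struct group *p = g; p; p = p->parent)
		depth++;
	table = calloc(depth, sizeof(*table));
	if (!table)
		return -1;

	/* Allocate without the lock, walking up to the first populated ancestor. */
	for (i = 0; g; g = g->parent, i++) {
		if (slot_allocated(lru, g))
			break;
		table[i].group = g;
		table[i].lists = calloc(1, sizeof(*table[i].lists));
		if (!table[i].lists) {
			while (i--)
				free(table[i].lists);
			free(table);
			return -1;
		}
	}

	/* Publish under the lock; drop our copy if another thread won the race. */
	pthread_mutex_lock(&lru->lock);
	while (i--) {
		int idx = table[i].group->id;

		if (lru->slot[idx])
			free(table[i].lists);
		else
			lru->slot[idx] = table[i].lists;
	}
	pthread_mutex_unlock(&lru->lock);

	free(table);
	return 0;
}
```

In this sketch a caller would run lazy_lru_init() once and then call
lazy_lru_alloc() before inserting an object for a group. Groups that
never insert anything never get a slot, which is the memory saving the
changelog is after.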
Link: https://lkml.kernel.org/r/20220228122126.37293-3-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/list_lru.h | 4 +
include/linux/memcontrol.h | 14 ++++
include/linux/slab.h | 3 +
mm/list_lru.c | 104 +++++++++++++++++++++++++++++++----
mm/memcontrol.c | 14 ----
mm/slab.c | 39 +++++++++----
mm/slab.h | 25 +++++++-
mm/slob.c | 6 ++
mm/slub.c | 42 +++++++++-----
9 files changed, 198 insertions(+), 53 deletions(-)
--- a/include/linux/list_lru.h~mm-introduce-kmem_cache_alloc_lru
+++ a/include/linux/list_lru.h
@@ -56,6 +56,8 @@ struct list_lru {
struct list_head list;
int shrinker_id;
bool memcg_aware;
+ /* protects ->mlrus->mlru[i] */
+ spinlock_t lock;
/* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
struct list_lru_memcg __rcu *mlrus;
#endif
@@ -72,6 +74,8 @@ int __list_lru_init(struct list_lru *lru
#define list_lru_init_memcg(lru, shrinker) \
__list_lru_init((lru), true, NULL, shrinker)
+int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
+ gfp_t gfp);
int memcg_update_all_list_lrus(int num_memcgs);
void memcg_drain_all_list_lrus(int src_idx, struct mem_cgroup *dst_memcg);
--- a/include/linux/memcontrol.h~mm-introduce-kmem_cache_alloc_lru
+++ a/include/linux/memcontrol.h
@@ -524,6 +524,20 @@ static inline struct mem_cgroup *page_me
return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
}
+static inline struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg)
+{
+ struct mem_cgroup *memcg;
+
+ rcu_read_lock();
+retry:
+ memcg = obj_cgroup_memcg(objcg);
+ if (unlikely(!css_tryget(&memcg->css)))
+ goto retry;
+ rcu_read_unlock();
+
+ return memcg;
+}
+
#ifdef CONFIG_MEMCG_KMEM
/*
* folio_memcg_kmem - Check if the folio has the memcg_kmem flag set.
--- a/include/linux/slab.h~mm-introduce-kmem_cache_alloc_lru
+++ a/include/linux/slab.h
@@ -135,6 +135,7 @@
#include <linux/kasan.h>
+struct list_lru;
struct mem_cgroup;
/*
* struct kmem_cache related prototypes
@@ -416,6 +417,8 @@ static __always_inline unsigned int __km
void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_size(1);
void *kmem_cache_alloc(struct kmem_cache *s, gfp_t flags) __assume_slab_alignment __malloc;
+void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
+ gfp_t gfpflags) __assume_slab_alignment __malloc;
void kmem_cache_free(struct kmem_cache *s, void *objp);
/*
--- a/mm/list_lru.c~mm-introduce-kmem_cache_alloc_lru
+++ a/mm/list_lru.c
@@ -13,6 +13,7 @@
#include <linux/mutex.h>
#include <linux/memcontrol.h>
#include "slab.h"
+#include "internal.h"
#ifdef CONFIG_MEMCG_KMEM
static LIST_HEAD(memcg_list_lrus);
@@ -338,22 +339,30 @@ static void memcg_destroy_list_lru_range
kfree(mlrus->mlru[i]);
}
+static struct list_lru_per_memcg *memcg_init_list_lru_one(gfp_t gfp)
+{
+ int nid;
+ struct list_lru_per_memcg *mlru;
+
+ mlru = kmalloc(struct_size(mlru, node, nr_node_ids), gfp);
+ if (!mlru)
+ return NULL;
+
+ for_each_node(nid)
+ init_one_lru(&mlru->node[nid]);
+
+ return mlru;
+}
+
static int memcg_init_list_lru_range(struct list_lru_memcg *mlrus,
int begin, int end)
{
int i;
for (i = begin; i < end; i++) {
- int nid;
- struct list_lru_per_memcg *mlru;
-
- mlru = kmalloc(struct_size(mlru, node, nr_node_ids), GFP_KERNEL);
- if (!mlru)
+ mlrus->mlru[i] = memcg_init_list_lru_one(GFP_KERNEL);
+ if (!mlrus->mlru[i])
goto fail;
-
- for_each_node(nid)
- init_one_lru(&mlru->node[nid]);
- mlrus->mlru[i] = mlru;
}
return 0;
fail:
@@ -370,6 +379,8 @@ static int memcg_init_list_lru(struct li
if (!memcg_aware)
return 0;
+ spin_lock_init(&lru->lock);
+
mlrus = kvmalloc(struct_size(mlrus, mlru, size), GFP_KERNEL);
if (!mlrus)
return -ENOMEM;
@@ -416,8 +427,11 @@ static int memcg_update_list_lru(struct
return -ENOMEM;
}
+ spin_lock_irq(&lru->lock);
memcpy(&new->mlru, &old->mlru, flex_array_size(new, mlru, old_size));
rcu_assign_pointer(lru->mlrus, new);
+ spin_unlock_irq(&lru->lock);
+
kvfree_rcu(old, rcu);
return 0;
}
@@ -502,6 +516,78 @@ void memcg_drain_all_list_lrus(int src_i
memcg_drain_list_lru(lru, src_idx, dst_memcg);
mutex_unlock(&list_lrus_mutex);
}
+
+static bool memcg_list_lru_allocated(struct mem_cgroup *memcg,
+ struct list_lru *lru)
+{
+ bool allocated;
+ int idx;
+
+ idx = memcg->kmemcg_id;
+ if (unlikely(idx < 0))
+ return true;
+
+ rcu_read_lock();
+ allocated = !!rcu_dereference(lru->mlrus)->mlru[idx];
+ rcu_read_unlock();
+
+ return allocated;
+}
+
+int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
+ gfp_t gfp)
+{
+ int i;
+ unsigned long flags;
+ struct list_lru_memcg *mlrus;
+ struct list_lru_memcg_table {
+ struct list_lru_per_memcg *mlru;
+ struct mem_cgroup *memcg;
+ } *table;
+
+ if (!list_lru_memcg_aware(lru) || memcg_list_lru_allocated(memcg, lru))
+ return 0;
+
+ gfp &= GFP_RECLAIM_MASK;
+ table = kmalloc_array(memcg->css.cgroup->level, sizeof(*table), gfp);
+ if (!table)
+ return -ENOMEM;
+
+ /*
+ * Because the list_lru can be reparented to the parent cgroup's
+ * list_lru, we should make sure that this cgroup and all its
+ * ancestors have allocated list_lru_per_memcg.
+ */
+ for (i = 0; memcg; memcg = parent_mem_cgroup(memcg), i++) {
+ if (memcg_list_lru_allocated(memcg, lru))
+ break;
+
+ table[i].memcg = memcg;
+ table[i].mlru = memcg_init_list_lru_one(gfp);
+ if (!table[i].mlru) {
+ while (i--)
+ kfree(table[i].mlru);
+ kfree(table);
+ return -ENOMEM;
+ }
+ }
+
+ spin_lock_irqsave(&lru->lock, flags);
+ mlrus = rcu_dereference_protected(lru->mlrus, true);
+ while (i--) {
+ int index = table[i].memcg->kmemcg_id;
+
+ if (mlrus->mlru[index])
+ kfree(table[i].mlru);
+ else
+ mlrus->mlru[index] = table[i].mlru;
+ }
+ spin_unlock_irqrestore(&lru->lock, flags);
+
+ kfree(table);
+
+ return 0;
+}
#else
static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
{
--- a/mm/memcontrol.c~mm-introduce-kmem_cache_alloc_lru
+++ a/mm/memcontrol.c
@@ -2805,20 +2805,6 @@ static void commit_charge(struct folio *
folio->memcg_data = (unsigned long)memcg;
}
-static struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg)
-{
- struct mem_cgroup *memcg;
-
- rcu_read_lock();
-retry:
- memcg = obj_cgroup_memcg(objcg);
- if (unlikely(!css_tryget(&memcg->css)))
- goto retry;
- rcu_read_unlock();
-
- return memcg;
-}
-
#ifdef CONFIG_MEMCG_KMEM
/*
* The allocated objcg pointers array is not accounted directly.
--- a/mm/slab.c~mm-introduce-kmem_cache_alloc_lru
+++ a/mm/slab.c
@@ -3211,7 +3211,7 @@ slab_alloc_node(struct kmem_cache *cache
bool init = false;
flags &= gfp_allowed_mask;
- cachep = slab_pre_alloc_hook(cachep, &objcg, 1, flags);
+ cachep = slab_pre_alloc_hook(cachep, NULL, &objcg, 1, flags);
if (unlikely(!cachep))
return NULL;
@@ -3287,7 +3287,8 @@ __do_cache_alloc(struct kmem_cache *cach
#endif /* CONFIG_NUMA */
static __always_inline void *
-slab_alloc(struct kmem_cache *cachep, gfp_t flags, size_t orig_size, unsigned long caller)
+slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
+ size_t orig_size, unsigned long caller)
{
unsigned long save_flags;
void *objp;
@@ -3295,7 +3296,7 @@ slab_alloc(struct kmem_cache *cachep, gf
bool init = false;
flags &= gfp_allowed_mask;
- cachep = slab_pre_alloc_hook(cachep, &objcg, 1, flags);
+ cachep = slab_pre_alloc_hook(cachep, lru, &objcg, 1, flags);
if (unlikely(!cachep))
return NULL;
@@ -3484,6 +3485,18 @@ void ___cache_free(struct kmem_cache *ca
__free_one(ac, objp);
}
+static __always_inline
+void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
+ gfp_t flags)
+{
+ void *ret = slab_alloc(cachep, lru, flags, cachep->object_size, _RET_IP_);
+
+ trace_kmem_cache_alloc(_RET_IP_, ret,
+ cachep->object_size, cachep->size, flags);
+
+ return ret;
+}
+
/**
* kmem_cache_alloc - Allocate an object
* @cachep: The cache to allocate from.
@@ -3496,15 +3509,17 @@ void ___cache_free(struct kmem_cache *ca
*/
void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
- void *ret = slab_alloc(cachep, flags, cachep->object_size, _RET_IP_);
-
- trace_kmem_cache_alloc(_RET_IP_, ret,
- cachep->object_size, cachep->size, flags);
-
- return ret;
+ return __kmem_cache_alloc_lru(cachep, NULL, flags);
}
EXPORT_SYMBOL(kmem_cache_alloc);
+void *kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
+ gfp_t flags)
+{
+ return __kmem_cache_alloc_lru(cachep, lru, flags);
+}
+EXPORT_SYMBOL(kmem_cache_alloc_lru);
+
static __always_inline void
cache_alloc_debugcheck_after_bulk(struct kmem_cache *s, gfp_t flags,
size_t size, void **p, unsigned long caller)
@@ -3521,7 +3536,7 @@ int kmem_cache_alloc_bulk(struct kmem_ca
size_t i;
struct obj_cgroup *objcg = NULL;
- s = slab_pre_alloc_hook(s, &objcg, size, flags);
+ s = slab_pre_alloc_hook(s, NULL, &objcg, size, flags);
if (!s)
return 0;
@@ -3562,7 +3577,7 @@ kmem_cache_alloc_trace(struct kmem_cache
{
void *ret;
- ret = slab_alloc(cachep, flags, size, _RET_IP_);
+ ret = slab_alloc(cachep, NULL, flags, size, _RET_IP_);
ret = kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc(_RET_IP_, ret,
@@ -3689,7 +3704,7 @@ static __always_inline void *__do_kmallo
cachep = kmalloc_slab(size, flags);
if (unlikely(ZERO_OR_NULL_PTR(cachep)))
return cachep;
- ret = slab_alloc(cachep, flags, size, caller);
+ ret = slab_alloc(cachep, NULL, flags, size, caller);
ret = kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc(caller, ret,
--- a/mm/slab.h~mm-introduce-kmem_cache_alloc_lru
+++ a/mm/slab.h
@@ -231,6 +231,7 @@ struct kmem_cache {
#include <linux/kmemleak.h>
#include <linux/random.h>
#include <linux/sched/mm.h>
+#include <linux/list_lru.h>
/*
* State of the slab allocator.
@@ -472,6 +473,7 @@ static inline size_t obj_full_size(struc
* Returns false if the allocation should fail.
*/
static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+ struct list_lru *lru,
struct obj_cgroup **objcgp,
size_t objects, gfp_t flags)
{
@@ -487,13 +489,26 @@ static inline bool memcg_slab_pre_alloc_
if (!objcg)
return true;
- if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s))) {
- obj_cgroup_put(objcg);
- return false;
+ if (lru) {
+ int ret;
+ struct mem_cgroup *memcg;
+
+ memcg = get_mem_cgroup_from_objcg(objcg);
+ ret = memcg_list_lru_alloc(memcg, lru, flags);
+ css_put(&memcg->css);
+
+ if (ret)
+ goto out;
}
+ if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s)))
+ goto out;
+
*objcgp = objcg;
return true;
+out:
+ obj_cgroup_put(objcg);
+ return false;
}
static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
@@ -598,6 +613,7 @@ static inline void memcg_free_slab_cgrou
}
static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+ struct list_lru *lru,
struct obj_cgroup **objcgp,
size_t objects, gfp_t flags)
{
@@ -697,6 +713,7 @@ static inline size_t slab_ksize(const st
}
static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
+ struct list_lru *lru,
struct obj_cgroup **objcgp,
size_t size, gfp_t flags)
{
@@ -707,7 +724,7 @@ static inline struct kmem_cache *slab_pr
if (should_failslab(s, flags))
return NULL;
- if (!memcg_slab_pre_alloc_hook(s, objcgp, size, flags))
+ if (!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags))
return NULL;
return s;
--- a/mm/slob.c~mm-introduce-kmem_cache_alloc_lru
+++ a/mm/slob.c
@@ -635,6 +635,12 @@ void *kmem_cache_alloc(struct kmem_cache
}
EXPORT_SYMBOL(kmem_cache_alloc);
+
+void *kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags)
+{
+ return slob_alloc_node(cachep, flags, NUMA_NO_NODE);
+}
+EXPORT_SYMBOL(kmem_cache_alloc_lru);
#ifdef CONFIG_NUMA
void *__kmalloc_node(size_t size, gfp_t gfp, int node)
{
--- a/mm/slub.c~mm-introduce-kmem_cache_alloc_lru
+++ a/mm/slub.c
@@ -3131,7 +3131,7 @@ static __always_inline void maybe_wipe_o
*
* Otherwise we can simply pick the next object from the lockless free list.
*/
-static __always_inline void *slab_alloc_node(struct kmem_cache *s,
+static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
{
void *object;
@@ -3141,7 +3141,7 @@ static __always_inline void *slab_alloc_
struct obj_cgroup *objcg = NULL;
bool init = false;
- s = slab_pre_alloc_hook(s, &objcg, 1, gfpflags);
+ s = slab_pre_alloc_hook(s, lru, &objcg, 1, gfpflags);
if (!s)
return NULL;
@@ -3232,27 +3232,41 @@ out:
return object;
}
-static __always_inline void *slab_alloc(struct kmem_cache *s,
+static __always_inline void *slab_alloc(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags, unsigned long addr, size_t orig_size)
{
- return slab_alloc_node(s, gfpflags, NUMA_NO_NODE, addr, orig_size);
+ return slab_alloc_node(s, lru, gfpflags, NUMA_NO_NODE, addr, orig_size);
}
-void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
+static __always_inline
+void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
+ gfp_t gfpflags)
{
- void *ret = slab_alloc(s, gfpflags, _RET_IP_, s->object_size);
+ void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
trace_kmem_cache_alloc(_RET_IP_, ret, s->object_size,
s->size, gfpflags);
return ret;
}
+
+void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
+{
+ return __kmem_cache_alloc_lru(s, NULL, gfpflags);
+}
EXPORT_SYMBOL(kmem_cache_alloc);
+void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
+ gfp_t gfpflags)
+{
+ return __kmem_cache_alloc_lru(s, lru, gfpflags);
+}
+EXPORT_SYMBOL(kmem_cache_alloc_lru);
+
#ifdef CONFIG_TRACING
void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
- void *ret = slab_alloc(s, gfpflags, _RET_IP_, size);
+ void *ret = slab_alloc(s, NULL, gfpflags, _RET_IP_, size);
trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags);
ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
@@ -3263,7 +3277,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_trace);
#ifdef CONFIG_NUMA
void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
{
- void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_, s->object_size);
+ void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
trace_kmem_cache_alloc_node(_RET_IP_, ret,
s->object_size, s->size, gfpflags, node);
@@ -3277,7 +3291,7 @@ void *kmem_cache_alloc_node_trace(struct
gfp_t gfpflags,
int node, size_t size)
{
- void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_, size);
+ void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
trace_kmalloc_node(_RET_IP_, ret,
size, s->size, gfpflags, node);
@@ -3667,7 +3681,7 @@ int kmem_cache_alloc_bulk(struct kmem_ca
struct obj_cgroup *objcg = NULL;
/* memcg and kmem_cache debug support */
- s = slab_pre_alloc_hook(s, &objcg, size, flags);
+ s = slab_pre_alloc_hook(s, NULL, &objcg, size, flags);
if (unlikely(!s))
return false;
/*
@@ -4417,7 +4431,7 @@ void *__kmalloc(size_t size, gfp_t flags
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;
- ret = slab_alloc(s, flags, _RET_IP_, size);
+ ret = slab_alloc(s, NULL, flags, _RET_IP_, size);
trace_kmalloc(_RET_IP_, ret, size, s->size, flags);
@@ -4465,7 +4479,7 @@ void *__kmalloc_node(size_t size, gfp_t
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;
- ret = slab_alloc_node(s, flags, node, _RET_IP_, size);
+ ret = slab_alloc_node(s, NULL, flags, node, _RET_IP_, size);
trace_kmalloc_node(_RET_IP_, ret, size, s->size, flags, node);
@@ -4923,7 +4937,7 @@ void *__kmalloc_track_caller(size_t size
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;
- ret = slab_alloc(s, gfpflags, caller, size);
+ ret = slab_alloc(s, NULL, gfpflags, caller, size);
/* Honor the call site pointer we received. */
trace_kmalloc(caller, ret, size, s->size, gfpflags);
@@ -4954,7 +4968,7 @@ void *__kmalloc_node_track_caller(size_t
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;
- ret = slab_alloc_node(s, gfpflags, node, caller, size);
+ ret = slab_alloc_node(s, NULL, gfpflags, node, caller, size);
/* Honor the call site pointer we received. */
trace_kmalloc_node(caller, ret, size, s->size, gfpflags, node);
_
* [patch 048/227] mm: introduce kmem_cache_alloc_lru
@ 2022-03-22 21:40 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:40 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: introduce kmem_cache_alloc_lru
We currently allocate scope for every memcg to be able to be tracked on
every superblock instantiated in the system, regardless of whether that
superblock is even accessible to that memcg.
These huge memcg counts come from container hosts where memcgs are
confined to just a small subset of the total number of superblocks that
are instantiated at any given point in time.
For these systems with huge container counts, list_lru does not need the
capability of tracking every memcg on every superblock. What it comes
down to is adding the memcg to the list_lru at the first insert. So
introduce kmem_cache_alloc_lru to allocate an object together with its
list_lru. In a later patch, we will convert all inode and dentry
allocations from kmem_cache_alloc to kmem_cache_alloc_lru.
Link: https://lkml.kernel.org/r/20220228122126.37293-3-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/list_lru.h | 4 +
include/linux/memcontrol.h | 14 ++++
include/linux/slab.h | 3 +
mm/list_lru.c | 104 +++++++++++++++++++++++++++++++----
mm/memcontrol.c | 14 ----
mm/slab.c | 39 +++++++++----
mm/slab.h | 25 +++++++-
mm/slob.c | 6 ++
mm/slub.c | 42 +++++++++-----
9 files changed, 198 insertions(+), 53 deletions(-)
--- a/include/linux/list_lru.h~mm-introduce-kmem_cache_alloc_lru
+++ a/include/linux/list_lru.h
@@ -56,6 +56,8 @@ struct list_lru {
struct list_head list;
int shrinker_id;
bool memcg_aware;
+ /* protects ->mlrus->mlru[i] */
+ spinlock_t lock;
/* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
struct list_lru_memcg __rcu *mlrus;
#endif
@@ -72,6 +74,8 @@ int __list_lru_init(struct list_lru *lru
#define list_lru_init_memcg(lru, shrinker) \
__list_lru_init((lru), true, NULL, shrinker)
+int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
+ gfp_t gfp);
int memcg_update_all_list_lrus(int num_memcgs);
void memcg_drain_all_list_lrus(int src_idx, struct mem_cgroup *dst_memcg);
--- a/include/linux/memcontrol.h~mm-introduce-kmem_cache_alloc_lru
+++ a/include/linux/memcontrol.h
@@ -524,6 +524,20 @@ static inline struct mem_cgroup *page_me
return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
}
+static inline struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg)
+{
+ struct mem_cgroup *memcg;
+
+ rcu_read_lock();
+retry:
+ memcg = obj_cgroup_memcg(objcg);
+ if (unlikely(!css_tryget(&memcg->css)))
+ goto retry;
+ rcu_read_unlock();
+
+ return memcg;
+}
+
#ifdef CONFIG_MEMCG_KMEM
/*
* folio_memcg_kmem - Check if the folio has the memcg_kmem flag set.
--- a/include/linux/slab.h~mm-introduce-kmem_cache_alloc_lru
+++ a/include/linux/slab.h
@@ -135,6 +135,7 @@
#include <linux/kasan.h>
+struct list_lru;
struct mem_cgroup;
/*
* struct kmem_cache related prototypes
@@ -416,6 +417,8 @@ static __always_inline unsigned int __km
void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_size(1);
void *kmem_cache_alloc(struct kmem_cache *s, gfp_t flags) __assume_slab_alignment __malloc;
+void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
+ gfp_t gfpflags) __assume_slab_alignment __malloc;
void kmem_cache_free(struct kmem_cache *s, void *objp);
/*
--- a/mm/list_lru.c~mm-introduce-kmem_cache_alloc_lru
+++ a/mm/list_lru.c
@@ -13,6 +13,7 @@
#include <linux/mutex.h>
#include <linux/memcontrol.h>
#include "slab.h"
+#include "internal.h"
#ifdef CONFIG_MEMCG_KMEM
static LIST_HEAD(memcg_list_lrus);
@@ -338,22 +339,30 @@ static void memcg_destroy_list_lru_range
kfree(mlrus->mlru[i]);
}
+static struct list_lru_per_memcg *memcg_init_list_lru_one(gfp_t gfp)
+{
+ int nid;
+ struct list_lru_per_memcg *mlru;
+
+ mlru = kmalloc(struct_size(mlru, node, nr_node_ids), gfp);
+ if (!mlru)
+ return NULL;
+
+ for_each_node(nid)
+ init_one_lru(&mlru->node[nid]);
+
+ return mlru;
+}
+
static int memcg_init_list_lru_range(struct list_lru_memcg *mlrus,
int begin, int end)
{
int i;
for (i = begin; i < end; i++) {
- int nid;
- struct list_lru_per_memcg *mlru;
-
- mlru = kmalloc(struct_size(mlru, node, nr_node_ids), GFP_KERNEL);
- if (!mlru)
+ mlrus->mlru[i] = memcg_init_list_lru_one(GFP_KERNEL);
+ if (!mlrus->mlru[i])
goto fail;
-
- for_each_node(nid)
- init_one_lru(&mlru->node[nid]);
- mlrus->mlru[i] = mlru;
}
return 0;
fail:
@@ -370,6 +379,8 @@ static int memcg_init_list_lru(struct li
if (!memcg_aware)
return 0;
+ spin_lock_init(&lru->lock);
+
mlrus = kvmalloc(struct_size(mlrus, mlru, size), GFP_KERNEL);
if (!mlrus)
return -ENOMEM;
@@ -416,8 +427,11 @@ static int memcg_update_list_lru(struct
return -ENOMEM;
}
+ spin_lock_irq(&lru->lock);
memcpy(&new->mlru, &old->mlru, flex_array_size(new, mlru, old_size));
rcu_assign_pointer(lru->mlrus, new);
+ spin_unlock_irq(&lru->lock);
+
kvfree_rcu(old, rcu);
return 0;
}
@@ -502,6 +516,78 @@ void memcg_drain_all_list_lrus(int src_i
memcg_drain_list_lru(lru, src_idx, dst_memcg);
mutex_unlock(&list_lrus_mutex);
}
+
+static bool memcg_list_lru_allocated(struct mem_cgroup *memcg,
+ struct list_lru *lru)
+{
+ bool allocated;
+ int idx;
+
+ idx = memcg->kmemcg_id;
+ if (unlikely(idx < 0))
+ return true;
+
+ rcu_read_lock();
+ allocated = !!rcu_dereference(lru->mlrus)->mlru[idx];
+ rcu_read_unlock();
+
+ return allocated;
+}
+
+int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
+ gfp_t gfp)
+{
+ int i;
+ unsigned long flags;
+ struct list_lru_memcg *mlrus;
+ struct list_lru_memcg_table {
+ struct list_lru_per_memcg *mlru;
+ struct mem_cgroup *memcg;
+ } *table;
+
+ if (!list_lru_memcg_aware(lru) || memcg_list_lru_allocated(memcg, lru))
+ return 0;
+
+ gfp &= GFP_RECLAIM_MASK;
+ table = kmalloc_array(memcg->css.cgroup->level, sizeof(*table), gfp);
+ if (!table)
+ return -ENOMEM;
+
+ /*
+ * Because the list_lru can be reparented to the parent cgroup's
+ * list_lru, we should make sure that this cgroup and all its
+ * ancestors have allocated list_lru_per_memcg.
+ */
+ for (i = 0; memcg; memcg = parent_mem_cgroup(memcg), i++) {
+ if (memcg_list_lru_allocated(memcg, lru))
+ break;
+
+ table[i].memcg = memcg;
+ table[i].mlru = memcg_init_list_lru_one(gfp);
+ if (!table[i].mlru) {
+ while (i--)
+ kfree(table[i].mlru);
+ kfree(table);
+ return -ENOMEM;
+ }
+ }
+
+ spin_lock_irqsave(&lru->lock, flags);
+ mlrus = rcu_dereference_protected(lru->mlrus, true);
+ while (i--) {
+ int index = table[i].memcg->kmemcg_id;
+
+ if (mlrus->mlru[index])
+ kfree(table[i].mlru);
+ else
+ mlrus->mlru[index] = table[i].mlru;
+ }
+ spin_unlock_irqrestore(&lru->lock, flags);
+
+ kfree(table);
+
+ return 0;
+}
#else
static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
{
--- a/mm/memcontrol.c~mm-introduce-kmem_cache_alloc_lru
+++ a/mm/memcontrol.c
@@ -2805,20 +2805,6 @@ static void commit_charge(struct folio *
folio->memcg_data = (unsigned long)memcg;
}
-static struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg)
-{
- struct mem_cgroup *memcg;
-
- rcu_read_lock();
-retry:
- memcg = obj_cgroup_memcg(objcg);
- if (unlikely(!css_tryget(&memcg->css)))
- goto retry;
- rcu_read_unlock();
-
- return memcg;
-}
-
#ifdef CONFIG_MEMCG_KMEM
/*
* The allocated objcg pointers array is not accounted directly.
--- a/mm/slab.c~mm-introduce-kmem_cache_alloc_lru
+++ a/mm/slab.c
@@ -3211,7 +3211,7 @@ slab_alloc_node(struct kmem_cache *cache
bool init = false;
flags &= gfp_allowed_mask;
- cachep = slab_pre_alloc_hook(cachep, &objcg, 1, flags);
+ cachep = slab_pre_alloc_hook(cachep, NULL, &objcg, 1, flags);
if (unlikely(!cachep))
return NULL;
@@ -3287,7 +3287,8 @@ __do_cache_alloc(struct kmem_cache *cach
#endif /* CONFIG_NUMA */
static __always_inline void *
-slab_alloc(struct kmem_cache *cachep, gfp_t flags, size_t orig_size, unsigned long caller)
+slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
+ size_t orig_size, unsigned long caller)
{
unsigned long save_flags;
void *objp;
@@ -3295,7 +3296,7 @@ slab_alloc(struct kmem_cache *cachep, gf
bool init = false;
flags &= gfp_allowed_mask;
- cachep = slab_pre_alloc_hook(cachep, &objcg, 1, flags);
+ cachep = slab_pre_alloc_hook(cachep, lru, &objcg, 1, flags);
if (unlikely(!cachep))
return NULL;
@@ -3484,6 +3485,18 @@ void ___cache_free(struct kmem_cache *ca
__free_one(ac, objp);
}
+static __always_inline
+void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
+ gfp_t flags)
+{
+ void *ret = slab_alloc(cachep, lru, flags, cachep->object_size, _RET_IP_);
+
+ trace_kmem_cache_alloc(_RET_IP_, ret,
+ cachep->object_size, cachep->size, flags);
+
+ return ret;
+}
+
/**
* kmem_cache_alloc - Allocate an object
* @cachep: The cache to allocate from.
@@ -3496,15 +3509,17 @@ void ___cache_free(struct kmem_cache *ca
*/
void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
- void *ret = slab_alloc(cachep, flags, cachep->object_size, _RET_IP_);
-
- trace_kmem_cache_alloc(_RET_IP_, ret,
- cachep->object_size, cachep->size, flags);
-
- return ret;
+ return __kmem_cache_alloc_lru(cachep, NULL, flags);
}
EXPORT_SYMBOL(kmem_cache_alloc);
+void *kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
+ gfp_t flags)
+{
+ return __kmem_cache_alloc_lru(cachep, lru, flags);
+}
+EXPORT_SYMBOL(kmem_cache_alloc_lru);
+
static __always_inline void
cache_alloc_debugcheck_after_bulk(struct kmem_cache *s, gfp_t flags,
size_t size, void **p, unsigned long caller)
@@ -3521,7 +3536,7 @@ int kmem_cache_alloc_bulk(struct kmem_ca
size_t i;
struct obj_cgroup *objcg = NULL;
- s = slab_pre_alloc_hook(s, &objcg, size, flags);
+ s = slab_pre_alloc_hook(s, NULL, &objcg, size, flags);
if (!s)
return 0;
@@ -3562,7 +3577,7 @@ kmem_cache_alloc_trace(struct kmem_cache
{
void *ret;
- ret = slab_alloc(cachep, flags, size, _RET_IP_);
+ ret = slab_alloc(cachep, NULL, flags, size, _RET_IP_);
ret = kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc(_RET_IP_, ret,
@@ -3689,7 +3704,7 @@ static __always_inline void *__do_kmallo
cachep = kmalloc_slab(size, flags);
if (unlikely(ZERO_OR_NULL_PTR(cachep)))
return cachep;
- ret = slab_alloc(cachep, flags, size, caller);
+ ret = slab_alloc(cachep, NULL, flags, size, caller);
ret = kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc(caller, ret,
--- a/mm/slab.h~mm-introduce-kmem_cache_alloc_lru
+++ a/mm/slab.h
@@ -231,6 +231,7 @@ struct kmem_cache {
#include <linux/kmemleak.h>
#include <linux/random.h>
#include <linux/sched/mm.h>
+#include <linux/list_lru.h>
/*
* State of the slab allocator.
@@ -472,6 +473,7 @@ static inline size_t obj_full_size(struc
* Returns false if the allocation should fail.
*/
static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+ struct list_lru *lru,
struct obj_cgroup **objcgp,
size_t objects, gfp_t flags)
{
@@ -487,13 +489,26 @@ static inline bool memcg_slab_pre_alloc_
if (!objcg)
return true;
- if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s))) {
- obj_cgroup_put(objcg);
- return false;
+ if (lru) {
+ int ret;
+ struct mem_cgroup *memcg;
+
+ memcg = get_mem_cgroup_from_objcg(objcg);
+ ret = memcg_list_lru_alloc(memcg, lru, flags);
+ css_put(&memcg->css);
+
+ if (ret)
+ goto out;
}
+ if (obj_cgroup_charge(objcg, flags, objects * obj_full_size(s)))
+ goto out;
+
*objcgp = objcg;
return true;
+out:
+ obj_cgroup_put(objcg);
+ return false;
}
static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
@@ -598,6 +613,7 @@ static inline void memcg_free_slab_cgrou
}
static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+ struct list_lru *lru,
struct obj_cgroup **objcgp,
size_t objects, gfp_t flags)
{
@@ -697,6 +713,7 @@ static inline size_t slab_ksize(const st
}
static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
+ struct list_lru *lru,
struct obj_cgroup **objcgp,
size_t size, gfp_t flags)
{
@@ -707,7 +724,7 @@ static inline struct kmem_cache *slab_pr
if (should_failslab(s, flags))
return NULL;
- if (!memcg_slab_pre_alloc_hook(s, objcgp, size, flags))
+ if (!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags))
return NULL;
return s;
--- a/mm/slob.c~mm-introduce-kmem_cache_alloc_lru
+++ a/mm/slob.c
@@ -635,6 +635,12 @@ void *kmem_cache_alloc(struct kmem_cache
}
EXPORT_SYMBOL(kmem_cache_alloc);
+
+void *kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags)
+{
+ return slob_alloc_node(cachep, flags, NUMA_NO_NODE);
+}
+EXPORT_SYMBOL(kmem_cache_alloc_lru);
#ifdef CONFIG_NUMA
void *__kmalloc_node(size_t size, gfp_t gfp, int node)
{
--- a/mm/slub.c~mm-introduce-kmem_cache_alloc_lru
+++ a/mm/slub.c
@@ -3131,7 +3131,7 @@ static __always_inline void maybe_wipe_o
*
* Otherwise we can simply pick the next object from the lockless free list.
*/
-static __always_inline void *slab_alloc_node(struct kmem_cache *s,
+static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
{
void *object;
@@ -3141,7 +3141,7 @@ static __always_inline void *slab_alloc_
struct obj_cgroup *objcg = NULL;
bool init = false;
- s = slab_pre_alloc_hook(s, &objcg, 1, gfpflags);
+ s = slab_pre_alloc_hook(s, lru, &objcg, 1, gfpflags);
if (!s)
return NULL;
@@ -3232,27 +3232,41 @@ out:
return object;
}
-static __always_inline void *slab_alloc(struct kmem_cache *s,
+static __always_inline void *slab_alloc(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags, unsigned long addr, size_t orig_size)
{
- return slab_alloc_node(s, gfpflags, NUMA_NO_NODE, addr, orig_size);
+ return slab_alloc_node(s, lru, gfpflags, NUMA_NO_NODE, addr, orig_size);
}
-void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
+static __always_inline
+void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
+ gfp_t gfpflags)
{
- void *ret = slab_alloc(s, gfpflags, _RET_IP_, s->object_size);
+ void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
trace_kmem_cache_alloc(_RET_IP_, ret, s->object_size,
s->size, gfpflags);
return ret;
}
+
+void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
+{
+ return __kmem_cache_alloc_lru(s, NULL, gfpflags);
+}
EXPORT_SYMBOL(kmem_cache_alloc);
+void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
+ gfp_t gfpflags)
+{
+ return __kmem_cache_alloc_lru(s, lru, gfpflags);
+}
+EXPORT_SYMBOL(kmem_cache_alloc_lru);
+
#ifdef CONFIG_TRACING
void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
- void *ret = slab_alloc(s, gfpflags, _RET_IP_, size);
+ void *ret = slab_alloc(s, NULL, gfpflags, _RET_IP_, size);
trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags);
ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
@@ -3263,7 +3277,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_trace);
#ifdef CONFIG_NUMA
void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
{
- void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_, s->object_size);
+ void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
trace_kmem_cache_alloc_node(_RET_IP_, ret,
s->object_size, s->size, gfpflags, node);
@@ -3277,7 +3291,7 @@ void *kmem_cache_alloc_node_trace(struct
gfp_t gfpflags,
int node, size_t size)
{
- void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_, size);
+ void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
trace_kmalloc_node(_RET_IP_, ret,
size, s->size, gfpflags, node);
@@ -3667,7 +3681,7 @@ int kmem_cache_alloc_bulk(struct kmem_ca
struct obj_cgroup *objcg = NULL;
/* memcg and kmem_cache debug support */
- s = slab_pre_alloc_hook(s, &objcg, size, flags);
+ s = slab_pre_alloc_hook(s, NULL, &objcg, size, flags);
if (unlikely(!s))
return false;
/*
@@ -4417,7 +4431,7 @@ void *__kmalloc(size_t size, gfp_t flags
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;
- ret = slab_alloc(s, flags, _RET_IP_, size);
+ ret = slab_alloc(s, NULL, flags, _RET_IP_, size);
trace_kmalloc(_RET_IP_, ret, size, s->size, flags);
@@ -4465,7 +4479,7 @@ void *__kmalloc_node(size_t size, gfp_t
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;
- ret = slab_alloc_node(s, flags, node, _RET_IP_, size);
+ ret = slab_alloc_node(s, NULL, flags, node, _RET_IP_, size);
trace_kmalloc_node(_RET_IP_, ret, size, s->size, flags, node);
@@ -4923,7 +4937,7 @@ void *__kmalloc_track_caller(size_t size
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;
- ret = slab_alloc(s, gfpflags, caller, size);
+ ret = slab_alloc(s, NULL, gfpflags, caller, size);
/* Honor the call site pointer we received. */
trace_kmalloc(caller, ret, size, s->size, gfpflags);
@@ -4954,7 +4968,7 @@ void *__kmalloc_node_track_caller(size_t
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;
- ret = slab_alloc_node(s, gfpflags, node, caller, size);
+ ret = slab_alloc_node(s, NULL, gfpflags, node, caller, size);
/* Honor the call site pointer we received. */
trace_kmalloc_node(caller, ret, size, s->size, gfpflags, node);
_
* [patch 049/227] fs: introduce alloc_inode_sb() to allocate filesystems specific inode
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: fs: introduce alloc_inode_sb() to allocate filesystems specific inode
An allocated inode is supposed to be added to the list_lru of its memcg,
so that list_lru must be allocated in advance. kmem_cache_alloc_lru()
does this: it allocates both the object and the list_lru. File systems
are its main users, so introduce alloc_inode_sb() to allocate
filesystem-specific inodes and set up the inode reclaim context properly.
File systems are expected to allocate inodes with alloc_inode_sb().
Later patches convert all users to the new API.
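
The following is a minimal userspace sketch of how a filesystem's ->alloc_inode callback might look once it goes through alloc_inode_sb(). The kernel types and kmem_cache_alloc_lru() are replaced by simplified stand-ins so the example compiles on its own; the "foo" filesystem, foo_inode_info, FOO_I() and last_lru_seen are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for kernel types; only the fields used here. */
typedef unsigned int gfp_t;
#define GFP_KERNEL 0u

struct list_lru { int unused; };
struct kmem_cache { size_t object_size; };
struct super_block { struct list_lru s_inode_lru; };
struct inode { unsigned long i_ino; };

/* Records which list_lru the last allocation was made against. */
static struct list_lru *last_lru_seen;

/* Stand-in: the real function also sets up the memcg's list_lru. */
static void *kmem_cache_alloc_lru(struct kmem_cache *cache,
				  struct list_lru *lru, gfp_t gfp)
{
	(void)gfp;
	last_lru_seen = lru;
	return malloc(cache->object_size);
}

/* Same shape as the helper added to include/linux/fs.h by this patch. */
static inline void *
alloc_inode_sb(struct super_block *sb, struct kmem_cache *cache, gfp_t gfp)
{
	return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp);
}

/* Hypothetical filesystem "foo" embedding the VFS inode in its own struct. */
struct foo_inode_info {
	int foo_flags;
	struct inode vfs_inode;
};

static struct kmem_cache foo_inode_cachep = { sizeof(struct foo_inode_info) };

static struct foo_inode_info *FOO_I(struct inode *inode)
{
	return (struct foo_inode_info *)
		((char *)inode - offsetof(struct foo_inode_info, vfs_inode));
}

static struct inode *foo_alloc_inode(struct super_block *sb)
{
	struct foo_inode_info *fi;

	/* Previously: kmem_cache_alloc(&foo_inode_cachep, GFP_KERNEL) */
	fi = alloc_inode_sb(sb, &foo_inode_cachep, GFP_KERNEL);
	if (!fi)
		return NULL;
	fi->foo_flags = 0;
	return &fi->vfs_inode;
}
```

The only change a caller sees is the extra sb argument, which lets the allocator find the superblock's s_inode_lru at allocation time.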
Link: https://lkml.kernel.org/r/20220228122126.37293-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/filesystems/porting.rst | 6 ++++++
fs/inode.c | 2 +-
include/linux/fs.h | 11 +++++++++++
3 files changed, 18 insertions(+), 1 deletion(-)
--- a/Documentation/filesystems/porting.rst~fs-introduce-alloc_inode_sb-to-allocate-filesystems-specific-inode
+++ a/Documentation/filesystems/porting.rst
@@ -45,6 +45,12 @@ typically between calling iget_locked()
At some point that will become mandatory.
+**mandatory**
+
+The foo_inode_info should always be allocated through alloc_inode_sb() rather
+than through kmem_cache_alloc() or the kmalloc() family, so that the inode
+reclaim context is set up correctly.
+
---
**mandatory**
--- a/fs/inode.c~fs-introduce-alloc_inode_sb-to-allocate-filesystems-specific-inode
+++ a/fs/inode.c
@@ -259,7 +259,7 @@ static struct inode *alloc_inode(struct
if (ops->alloc_inode)
inode = ops->alloc_inode(sb);
else
- inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL);
+ inode = alloc_inode_sb(sb, inode_cachep, GFP_KERNEL);
if (!inode)
return NULL;
--- a/include/linux/fs.h~fs-introduce-alloc_inode_sb-to-allocate-filesystems-specific-inode
+++ a/include/linux/fs.h
@@ -42,6 +42,7 @@
#include <linux/mount.h>
#include <linux/cred.h>
#include <linux/mnt_idmapping.h>
+#include <linux/slab.h>
#include <asm/byteorder.h>
#include <uapi/linux/fs.h>
@@ -3114,6 +3115,16 @@ extern void free_inode_nonrcu(struct ino
extern int should_remove_suid(struct dentry *);
extern int file_remove_privs(struct file *);
+/*
+ * This must be used for allocating filesystem-specific inodes, so that
+ * the inode reclaim context is set up correctly.
+ */
+static inline void *
+alloc_inode_sb(struct super_block *sb, struct kmem_cache *cache, gfp_t gfp)
+{
+ return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp);
+}
+
extern void __insert_inode_hash(struct inode *, unsigned long hashval);
static inline void insert_inode_hash(struct inode *inode)
{
_
* [patch 050/227] fs: allocate inode by using alloc_inode_sb()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: fs: allocate inode by using alloc_inode_sb()
Inode allocation is supposed to go through alloc_inode_sb(), so convert
the kmem_cache_alloc() call in every filesystem's inode allocation path
to alloc_inode_sb().
Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4]
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
block/bdev.c | 2 +-
drivers/dax/super.c | 2 +-
fs/9p/vfs_inode.c | 2 +-
fs/adfs/super.c | 2 +-
fs/affs/super.c | 2 +-
fs/afs/super.c | 2 +-
fs/befs/linuxvfs.c | 2 +-
fs/bfs/inode.c | 2 +-
fs/btrfs/inode.c | 2 +-
fs/ceph/inode.c | 2 +-
fs/cifs/cifsfs.c | 2 +-
fs/coda/inode.c | 2 +-
fs/ecryptfs/super.c | 2 +-
fs/efs/super.c | 2 +-
fs/erofs/super.c | 2 +-
fs/exfat/super.c | 2 +-
fs/ext2/super.c | 2 +-
fs/ext4/super.c | 2 +-
fs/fat/inode.c | 2 +-
fs/freevxfs/vxfs_super.c | 2 +-
fs/fuse/inode.c | 2 +-
fs/gfs2/super.c | 2 +-
fs/hfs/super.c | 2 +-
fs/hfsplus/super.c | 2 +-
fs/hostfs/hostfs_kern.c | 2 +-
fs/hpfs/super.c | 2 +-
fs/hugetlbfs/inode.c | 2 +-
fs/isofs/inode.c | 2 +-
fs/jffs2/super.c | 2 +-
fs/jfs/super.c | 2 +-
fs/minix/inode.c | 2 +-
fs/nfs/inode.c | 2 +-
fs/nilfs2/super.c | 2 +-
fs/ntfs/inode.c | 2 +-
fs/ntfs3/super.c | 2 +-
fs/ocfs2/dlmfs/dlmfs.c | 2 +-
fs/ocfs2/super.c | 2 +-
fs/openpromfs/inode.c | 2 +-
fs/orangefs/super.c | 2 +-
fs/overlayfs/super.c | 2 +-
fs/proc/inode.c | 2 +-
fs/qnx4/inode.c | 2 +-
fs/qnx6/inode.c | 2 +-
fs/reiserfs/super.c | 2 +-
fs/romfs/super.c | 2 +-
fs/squashfs/super.c | 2 +-
fs/sysv/inode.c | 2 +-
fs/ubifs/super.c | 2 +-
fs/udf/super.c | 2 +-
fs/ufs/super.c | 2 +-
fs/vboxsf/super.c | 2 +-
fs/xfs/xfs_icache.c | 2 +-
fs/zonefs/super.c | 2 +-
ipc/mqueue.c | 2 +-
mm/shmem.c | 2 +-
net/socket.c | 2 +-
net/sunrpc/rpc_pipe.c | 2 +-
57 files changed, 57 insertions(+), 57 deletions(-)
--- a/block/bdev.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/block/bdev.c
@@ -385,7 +385,7 @@ static struct kmem_cache * bdev_cachep _
static struct inode *bdev_alloc_inode(struct super_block *sb)
{
- struct bdev_inode *ei = kmem_cache_alloc(bdev_cachep, GFP_KERNEL);
+ struct bdev_inode *ei = alloc_inode_sb(sb, bdev_cachep, GFP_KERNEL);
if (!ei)
return NULL;
--- a/drivers/dax/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/drivers/dax/super.c
@@ -282,7 +282,7 @@ static struct inode *dax_alloc_inode(str
struct dax_device *dax_dev;
struct inode *inode;
- dax_dev = kmem_cache_alloc(dax_cache, GFP_KERNEL);
+ dax_dev = alloc_inode_sb(sb, dax_cache, GFP_KERNEL);
if (!dax_dev)
return NULL;
--- a/fs/9p/vfs_inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/9p/vfs_inode.c
@@ -228,7 +228,7 @@ struct inode *v9fs_alloc_inode(struct su
{
struct v9fs_inode *v9inode;
- v9inode = kmem_cache_alloc(v9fs_inode_cache, GFP_KERNEL);
+ v9inode = alloc_inode_sb(sb, v9fs_inode_cache, GFP_KERNEL);
if (!v9inode)
return NULL;
#ifdef CONFIG_9P_FSCACHE
--- a/fs/adfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/adfs/super.c
@@ -220,7 +220,7 @@ static struct kmem_cache *adfs_inode_cac
static struct inode *adfs_alloc_inode(struct super_block *sb)
{
struct adfs_inode_info *ei;
- ei = kmem_cache_alloc(adfs_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, adfs_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--- a/fs/affs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/affs/super.c
@@ -100,7 +100,7 @@ static struct inode *affs_alloc_inode(st
{
struct affs_inode_info *i;
- i = kmem_cache_alloc(affs_inode_cachep, GFP_KERNEL);
+ i = alloc_inode_sb(sb, affs_inode_cachep, GFP_KERNEL);
if (!i)
return NULL;
--- a/fs/afs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/afs/super.c
@@ -679,7 +679,7 @@ static struct inode *afs_alloc_inode(str
{
struct afs_vnode *vnode;
- vnode = kmem_cache_alloc(afs_inode_cachep, GFP_KERNEL);
+ vnode = alloc_inode_sb(sb, afs_inode_cachep, GFP_KERNEL);
if (!vnode)
return NULL;
--- a/fs/befs/linuxvfs.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/befs/linuxvfs.c
@@ -277,7 +277,7 @@ befs_alloc_inode(struct super_block *sb)
{
struct befs_inode_info *bi;
- bi = kmem_cache_alloc(befs_inode_cachep, GFP_KERNEL);
+ bi = alloc_inode_sb(sb, befs_inode_cachep, GFP_KERNEL);
if (!bi)
return NULL;
return &bi->vfs_inode;
--- a/fs/bfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/bfs/inode.c
@@ -239,7 +239,7 @@ static struct kmem_cache *bfs_inode_cach
static struct inode *bfs_alloc_inode(struct super_block *sb)
{
struct bfs_inode_info *bi;
- bi = kmem_cache_alloc(bfs_inode_cachep, GFP_KERNEL);
+ bi = alloc_inode_sb(sb, bfs_inode_cachep, GFP_KERNEL);
if (!bi)
return NULL;
return &bi->vfs_inode;
--- a/fs/btrfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/btrfs/inode.c
@@ -8787,7 +8787,7 @@ struct inode *btrfs_alloc_inode(struct s
struct btrfs_inode *ei;
struct inode *inode;
- ei = kmem_cache_alloc(btrfs_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, btrfs_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
--- a/fs/ceph/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ceph/inode.c
@@ -447,7 +447,7 @@ struct inode *ceph_alloc_inode(struct su
struct ceph_inode_info *ci;
int i;
- ci = kmem_cache_alloc(ceph_inode_cachep, GFP_NOFS);
+ ci = alloc_inode_sb(sb, ceph_inode_cachep, GFP_NOFS);
if (!ci)
return NULL;
--- a/fs/cifs/cifsfs.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/cifs/cifsfs.c
@@ -354,7 +354,7 @@ static struct inode *
cifs_alloc_inode(struct super_block *sb)
{
struct cifsInodeInfo *cifs_inode;
- cifs_inode = kmem_cache_alloc(cifs_inode_cachep, GFP_KERNEL);
+ cifs_inode = alloc_inode_sb(sb, cifs_inode_cachep, GFP_KERNEL);
if (!cifs_inode)
return NULL;
cifs_inode->cifsAttrs = 0x20; /* default */
--- a/fs/coda/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/coda/inode.c
@@ -43,7 +43,7 @@ static struct kmem_cache * coda_inode_ca
static struct inode *coda_alloc_inode(struct super_block *sb)
{
struct coda_inode_info *ei;
- ei = kmem_cache_alloc(coda_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, coda_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
memset(&ei->c_fid, 0, sizeof(struct CodaFid));
--- a/fs/ecryptfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ecryptfs/super.c
@@ -38,7 +38,7 @@ static struct inode *ecryptfs_alloc_inod
struct ecryptfs_inode_info *inode_info;
struct inode *inode = NULL;
- inode_info = kmem_cache_alloc(ecryptfs_inode_info_cache, GFP_KERNEL);
+ inode_info = alloc_inode_sb(sb, ecryptfs_inode_info_cache, GFP_KERNEL);
if (unlikely(!inode_info))
goto out;
if (ecryptfs_init_crypt_stat(&inode_info->crypt_stat)) {
--- a/fs/efs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/efs/super.c
@@ -69,7 +69,7 @@ static struct kmem_cache * efs_inode_cac
static struct inode *efs_alloc_inode(struct super_block *sb)
{
struct efs_inode_info *ei;
- ei = kmem_cache_alloc(efs_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, efs_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--- a/fs/erofs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/erofs/super.c
@@ -84,7 +84,7 @@ static void erofs_inode_init_once(void *
static struct inode *erofs_alloc_inode(struct super_block *sb)
{
struct erofs_inode *vi =
- kmem_cache_alloc(erofs_inode_cachep, GFP_KERNEL);
+ alloc_inode_sb(sb, erofs_inode_cachep, GFP_KERNEL);
if (!vi)
return NULL;
--- a/fs/exfat/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/exfat/super.c
@@ -183,7 +183,7 @@ static struct inode *exfat_alloc_inode(s
{
struct exfat_inode_info *ei;
- ei = kmem_cache_alloc(exfat_inode_cachep, GFP_NOFS);
+ ei = alloc_inode_sb(sb, exfat_inode_cachep, GFP_NOFS);
if (!ei)
return NULL;
--- a/fs/ext2/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ext2/super.c
@@ -180,7 +180,7 @@ static struct kmem_cache * ext2_inode_ca
static struct inode *ext2_alloc_inode(struct super_block *sb)
{
struct ext2_inode_info *ei;
- ei = kmem_cache_alloc(ext2_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, ext2_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
ei->i_block_alloc_info = NULL;
--- a/fs/ext4/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ext4/super.c
@@ -1316,7 +1316,7 @@ static struct inode *ext4_alloc_inode(st
{
struct ext4_inode_info *ei;
- ei = kmem_cache_alloc(ext4_inode_cachep, GFP_NOFS);
+ ei = alloc_inode_sb(sb, ext4_inode_cachep, GFP_NOFS);
if (!ei)
return NULL;
--- a/fs/fat/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/fat/inode.c
@@ -745,7 +745,7 @@ static struct kmem_cache *fat_inode_cach
static struct inode *fat_alloc_inode(struct super_block *sb)
{
struct msdos_inode_info *ei;
- ei = kmem_cache_alloc(fat_inode_cachep, GFP_NOFS);
+ ei = alloc_inode_sb(sb, fat_inode_cachep, GFP_NOFS);
if (!ei)
return NULL;
--- a/fs/freevxfs/vxfs_super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/freevxfs/vxfs_super.c
@@ -124,7 +124,7 @@ static struct inode *vxfs_alloc_inode(st
{
struct vxfs_inode_info *vi;
- vi = kmem_cache_alloc(vxfs_inode_cachep, GFP_KERNEL);
+ vi = alloc_inode_sb(sb, vxfs_inode_cachep, GFP_KERNEL);
if (!vi)
return NULL;
inode_init_once(&vi->vfs_inode);
--- a/fs/fuse/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/fuse/inode.c
@@ -72,7 +72,7 @@ static struct inode *fuse_alloc_inode(st
{
struct fuse_inode *fi;
- fi = kmem_cache_alloc(fuse_inode_cachep, GFP_KERNEL);
+ fi = alloc_inode_sb(sb, fuse_inode_cachep, GFP_KERNEL);
if (!fi)
return NULL;
--- a/fs/gfs2/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/gfs2/super.c
@@ -1425,7 +1425,7 @@ static struct inode *gfs2_alloc_inode(st
{
struct gfs2_inode *ip;
- ip = kmem_cache_alloc(gfs2_inode_cachep, GFP_KERNEL);
+ ip = alloc_inode_sb(sb, gfs2_inode_cachep, GFP_KERNEL);
if (!ip)
return NULL;
ip->i_flags = 0;
--- a/fs/hfsplus/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/hfsplus/super.c
@@ -624,7 +624,7 @@ static struct inode *hfsplus_alloc_inode
{
struct hfsplus_inode_info *i;
- i = kmem_cache_alloc(hfsplus_inode_cachep, GFP_KERNEL);
+ i = alloc_inode_sb(sb, hfsplus_inode_cachep, GFP_KERNEL);
return i ? &i->vfs_inode : NULL;
}
--- a/fs/hfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/hfs/super.c
@@ -162,7 +162,7 @@ static struct inode *hfs_alloc_inode(str
{
struct hfs_inode_info *i;
- i = kmem_cache_alloc(hfs_inode_cachep, GFP_KERNEL);
+ i = alloc_inode_sb(sb, hfs_inode_cachep, GFP_KERNEL);
return i ? &i->vfs_inode : NULL;
}
--- a/fs/hostfs/hostfs_kern.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/hostfs/hostfs_kern.c
@@ -222,7 +222,7 @@ static struct inode *hostfs_alloc_inode(
{
struct hostfs_inode_info *hi;
- hi = kmem_cache_alloc(hostfs_inode_cache, GFP_KERNEL_ACCOUNT);
+ hi = alloc_inode_sb(sb, hostfs_inode_cache, GFP_KERNEL_ACCOUNT);
if (hi == NULL)
return NULL;
hi->fd = -1;
--- a/fs/hpfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/hpfs/super.c
@@ -232,7 +232,7 @@ static struct kmem_cache * hpfs_inode_ca
static struct inode *hpfs_alloc_inode(struct super_block *sb)
{
struct hpfs_inode_info *ei;
- ei = kmem_cache_alloc(hpfs_inode_cachep, GFP_NOFS);
+ ei = alloc_inode_sb(sb, hpfs_inode_cachep, GFP_NOFS);
if (!ei)
return NULL;
return &ei->vfs_inode;
--- a/fs/hugetlbfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/hugetlbfs/inode.c
@@ -1110,7 +1110,7 @@ static struct inode *hugetlbfs_alloc_ino
if (unlikely(!hugetlbfs_dec_free_inodes(sbinfo)))
return NULL;
- p = kmem_cache_alloc(hugetlbfs_inode_cachep, GFP_KERNEL);
+ p = alloc_inode_sb(sb, hugetlbfs_inode_cachep, GFP_KERNEL);
if (unlikely(!p)) {
hugetlbfs_inc_free_inodes(sbinfo);
return NULL;
--- a/fs/isofs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/isofs/inode.c
@@ -70,7 +70,7 @@ static struct kmem_cache *isofs_inode_ca
static struct inode *isofs_alloc_inode(struct super_block *sb)
{
struct iso_inode_info *ei;
- ei = kmem_cache_alloc(isofs_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, isofs_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--- a/fs/jffs2/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/jffs2/super.c
@@ -39,7 +39,7 @@ static struct inode *jffs2_alloc_inode(s
{
struct jffs2_inode_info *f;
- f = kmem_cache_alloc(jffs2_inode_cachep, GFP_KERNEL);
+ f = alloc_inode_sb(sb, jffs2_inode_cachep, GFP_KERNEL);
if (!f)
return NULL;
return &f->vfs_inode;
--- a/fs/jfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/jfs/super.c
@@ -102,7 +102,7 @@ static struct inode *jfs_alloc_inode(str
{
struct jfs_inode_info *jfs_inode;
- jfs_inode = kmem_cache_alloc(jfs_inode_cachep, GFP_NOFS);
+ jfs_inode = alloc_inode_sb(sb, jfs_inode_cachep, GFP_NOFS);
if (!jfs_inode)
return NULL;
#ifdef CONFIG_QUOTA
--- a/fs/minix/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/minix/inode.c
@@ -63,7 +63,7 @@ static struct kmem_cache * minix_inode_c
static struct inode *minix_alloc_inode(struct super_block *sb)
{
struct minix_inode_info *ei;
- ei = kmem_cache_alloc(minix_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, minix_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--- a/fs/nfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/nfs/inode.c
@@ -2238,7 +2238,7 @@ static int nfs_update_inode(struct inode
struct inode *nfs_alloc_inode(struct super_block *sb)
{
struct nfs_inode *nfsi;
- nfsi = kmem_cache_alloc(nfs_inode_cachep, GFP_KERNEL);
+ nfsi = alloc_inode_sb(sb, nfs_inode_cachep, GFP_KERNEL);
if (!nfsi)
return NULL;
nfsi->flags = 0UL;
--- a/fs/nilfs2/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/nilfs2/super.c
@@ -151,7 +151,7 @@ struct inode *nilfs_alloc_inode(struct s
{
struct nilfs_inode_info *ii;
- ii = kmem_cache_alloc(nilfs_inode_cachep, GFP_NOFS);
+ ii = alloc_inode_sb(sb, nilfs_inode_cachep, GFP_NOFS);
if (!ii)
return NULL;
ii->i_bh = NULL;
--- a/fs/ntfs3/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ntfs3/super.c
@@ -399,7 +399,7 @@ static struct kmem_cache *ntfs_inode_cac
static struct inode *ntfs_alloc_inode(struct super_block *sb)
{
- struct ntfs_inode *ni = kmem_cache_alloc(ntfs_inode_cachep, GFP_NOFS);
+ struct ntfs_inode *ni = alloc_inode_sb(sb, ntfs_inode_cachep, GFP_NOFS);
if (!ni)
return NULL;
--- a/fs/ntfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ntfs/inode.c
@@ -310,7 +310,7 @@ struct inode *ntfs_alloc_big_inode(struc
ntfs_inode *ni;
ntfs_debug("Entering.");
- ni = kmem_cache_alloc(ntfs_big_inode_cache, GFP_NOFS);
+ ni = alloc_inode_sb(sb, ntfs_big_inode_cache, GFP_NOFS);
if (likely(ni != NULL)) {
ni->state = 0;
return VFS_I(ni);
--- a/fs/ocfs2/dlmfs/dlmfs.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ocfs2/dlmfs/dlmfs.c
@@ -280,7 +280,7 @@ static struct inode *dlmfs_alloc_inode(s
{
struct dlmfs_inode_private *ip;
- ip = kmem_cache_alloc(dlmfs_inode_cache, GFP_NOFS);
+ ip = alloc_inode_sb(sb, dlmfs_inode_cache, GFP_NOFS);
if (!ip)
return NULL;
--- a/fs/ocfs2/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ocfs2/super.c
@@ -548,7 +548,7 @@ static struct inode *ocfs2_alloc_inode(s
{
struct ocfs2_inode_info *oi;
- oi = kmem_cache_alloc(ocfs2_inode_cachep, GFP_NOFS);
+ oi = alloc_inode_sb(sb, ocfs2_inode_cachep, GFP_NOFS);
if (!oi)
return NULL;
--- a/fs/openpromfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/openpromfs/inode.c
@@ -335,7 +335,7 @@ static struct inode *openprom_alloc_inod
{
struct op_inode_info *oi;
- oi = kmem_cache_alloc(op_inode_cachep, GFP_KERNEL);
+ oi = alloc_inode_sb(sb, op_inode_cachep, GFP_KERNEL);
if (!oi)
return NULL;
--- a/fs/orangefs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/orangefs/super.c
@@ -107,7 +107,7 @@ static struct inode *orangefs_alloc_inod
{
struct orangefs_inode_s *orangefs_inode;
- orangefs_inode = kmem_cache_alloc(orangefs_inode_cache, GFP_KERNEL);
+ orangefs_inode = alloc_inode_sb(sb, orangefs_inode_cache, GFP_KERNEL);
if (!orangefs_inode)
return NULL;
--- a/fs/overlayfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/overlayfs/super.c
@@ -174,7 +174,7 @@ static struct kmem_cache *ovl_inode_cach
static struct inode *ovl_alloc_inode(struct super_block *sb)
{
- struct ovl_inode *oi = kmem_cache_alloc(ovl_inode_cachep, GFP_KERNEL);
+ struct ovl_inode *oi = alloc_inode_sb(sb, ovl_inode_cachep, GFP_KERNEL);
if (!oi)
return NULL;
--- a/fs/proc/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/proc/inode.c
@@ -66,7 +66,7 @@ static struct inode *proc_alloc_inode(st
{
struct proc_inode *ei;
- ei = kmem_cache_alloc(proc_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, proc_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
ei->pid = NULL;
--- a/fs/qnx4/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/qnx4/inode.c
@@ -338,7 +338,7 @@ static struct kmem_cache *qnx4_inode_cac
static struct inode *qnx4_alloc_inode(struct super_block *sb)
{
struct qnx4_inode_info *ei;
- ei = kmem_cache_alloc(qnx4_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, qnx4_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--- a/fs/qnx6/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/qnx6/inode.c
@@ -597,7 +597,7 @@ static struct kmem_cache *qnx6_inode_cac
static struct inode *qnx6_alloc_inode(struct super_block *sb)
{
struct qnx6_inode_info *ei;
- ei = kmem_cache_alloc(qnx6_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, qnx6_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--- a/fs/reiserfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/reiserfs/super.c
@@ -639,7 +639,7 @@ static struct kmem_cache *reiserfs_inode
static struct inode *reiserfs_alloc_inode(struct super_block *sb)
{
struct reiserfs_inode_info *ei;
- ei = kmem_cache_alloc(reiserfs_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, reiserfs_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
atomic_set(&ei->openers, 0);
--- a/fs/romfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/romfs/super.c
@@ -375,7 +375,7 @@ static struct inode *romfs_alloc_inode(s
{
struct romfs_inode_info *inode;
- inode = kmem_cache_alloc(romfs_inode_cachep, GFP_KERNEL);
+ inode = alloc_inode_sb(sb, romfs_inode_cachep, GFP_KERNEL);
return inode ? &inode->vfs_inode : NULL;
}
--- a/fs/squashfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/squashfs/super.c
@@ -584,7 +584,7 @@ static void __exit exit_squashfs_fs(void
static struct inode *squashfs_alloc_inode(struct super_block *sb)
{
struct squashfs_inode_info *ei =
- kmem_cache_alloc(squashfs_inode_cachep, GFP_KERNEL);
+ alloc_inode_sb(sb, squashfs_inode_cachep, GFP_KERNEL);
return ei ? &ei->vfs_inode : NULL;
}
--- a/fs/sysv/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/sysv/inode.c
@@ -306,7 +306,7 @@ static struct inode *sysv_alloc_inode(st
{
struct sysv_inode_info *si;
- si = kmem_cache_alloc(sysv_inode_cachep, GFP_KERNEL);
+ si = alloc_inode_sb(sb, sysv_inode_cachep, GFP_KERNEL);
if (!si)
return NULL;
return &si->vfs_inode;
--- a/fs/ubifs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ubifs/super.c
@@ -268,7 +268,7 @@ static struct inode *ubifs_alloc_inode(s
{
struct ubifs_inode *ui;
- ui = kmem_cache_alloc(ubifs_inode_slab, GFP_NOFS);
+ ui = alloc_inode_sb(sb, ubifs_inode_slab, GFP_NOFS);
if (!ui)
return NULL;
--- a/fs/udf/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/udf/super.c
@@ -136,7 +136,7 @@ static struct kmem_cache *udf_inode_cach
static struct inode *udf_alloc_inode(struct super_block *sb)
{
struct udf_inode_info *ei;
- ei = kmem_cache_alloc(udf_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, udf_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
--- a/fs/ufs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ufs/super.c
@@ -1443,7 +1443,7 @@ static struct inode *ufs_alloc_inode(str
{
struct ufs_inode_info *ei;
- ei = kmem_cache_alloc(ufs_inode_cachep, GFP_NOFS);
+ ei = alloc_inode_sb(sb, ufs_inode_cachep, GFP_NOFS);
if (!ei)
return NULL;
--- a/fs/vboxsf/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/vboxsf/super.c
@@ -241,7 +241,7 @@ static struct inode *vboxsf_alloc_inode(
{
struct vboxsf_inode *sf_i;
- sf_i = kmem_cache_alloc(vboxsf_inode_cachep, GFP_NOFS);
+ sf_i = alloc_inode_sb(sb, vboxsf_inode_cachep, GFP_NOFS);
if (!sf_i)
return NULL;
--- a/fs/xfs/xfs_icache.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/xfs/xfs_icache.c
@@ -77,7 +77,7 @@ xfs_inode_alloc(
* XXX: If this didn't occur in transactions, we could drop GFP_NOFAIL
* and return NULL here on ENOMEM.
*/
- ip = kmem_cache_alloc(xfs_inode_cache, GFP_KERNEL | __GFP_NOFAIL);
+ ip = alloc_inode_sb(mp->m_super, xfs_inode_cache, GFP_KERNEL | __GFP_NOFAIL);
if (inode_init_always(mp->m_super, VFS_I(ip))) {
kmem_cache_free(xfs_inode_cache, ip);
--- a/fs/zonefs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/zonefs/super.c
@@ -1137,7 +1137,7 @@ static struct inode *zonefs_alloc_inode(
{
struct zonefs_inode_info *zi;
- zi = kmem_cache_alloc(zonefs_inode_cachep, GFP_KERNEL);
+ zi = alloc_inode_sb(sb, zonefs_inode_cachep, GFP_KERNEL);
if (!zi)
return NULL;
--- a/ipc/mqueue.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/ipc/mqueue.c
@@ -486,7 +486,7 @@ static struct inode *mqueue_alloc_inode(
{
struct mqueue_inode_info *ei;
- ei = kmem_cache_alloc(mqueue_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, mqueue_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--- a/mm/shmem.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/mm/shmem.c
@@ -3708,7 +3708,7 @@ static struct kmem_cache *shmem_inode_ca
static struct inode *shmem_alloc_inode(struct super_block *sb)
{
struct shmem_inode_info *info;
- info = kmem_cache_alloc(shmem_inode_cachep, GFP_KERNEL);
+ info = alloc_inode_sb(sb, shmem_inode_cachep, GFP_KERNEL);
if (!info)
return NULL;
return &info->vfs_inode;
--- a/net/socket.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/net/socket.c
@@ -301,7 +301,7 @@ static struct inode *sock_alloc_inode(st
{
struct socket_alloc *ei;
- ei = kmem_cache_alloc(sock_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, sock_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
init_waitqueue_head(&ei->socket.wq.wait);
--- a/net/sunrpc/rpc_pipe.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/net/sunrpc/rpc_pipe.c
@@ -197,7 +197,7 @@ static struct inode *
rpc_alloc_inode(struct super_block *sb)
{
struct rpc_inode *rpci;
- rpci = kmem_cache_alloc(rpc_inode_cachep, GFP_KERNEL);
+ rpci = alloc_inode_sb(sb, rpc_inode_cachep, GFP_KERNEL);
if (!rpci)
return NULL;
return &rpci->vfs_inode;
_
* [patch 050/227] fs: allocate inode by using alloc_inode_sb()
@ 2022-03-22 21:41 ` Andrew Morton
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: fs: allocate inode by using alloc_inode_sb()
Inode allocation is supposed to go through alloc_inode_sb(), so convert
the kmem_cache_alloc() call in every filesystem's inode allocation path
to alloc_inode_sb().
Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4]
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
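Note for readers: the conversion described in the changelog above has the
same shape in every file. The sketch below is a userspace mock-up of that
shape, not kernel code. The types are simplified stand-ins, "foofs" is a
hypothetical filesystem, and the nr_allocs counter exists only so the
effect can be observed. It assumes, as I understand the rest of this
series, that alloc_inode_sb() is a thin wrapper which passes the
superblock's inode LRU to kmem_cache_alloc_lru().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for kernel types. These are NOT the real definitions. */
typedef unsigned int gfp_t;
#define GFP_KERNEL 0u

struct list_lru { unsigned long nr_allocs; };	/* mock: only counts allocations */
struct kmem_cache { size_t object_size; };	/* mock: only remembers the size */
struct super_block { struct list_lru s_inode_lru; };
struct inode { struct super_block *i_sb; };

/* Mock of kmem_cache_alloc_lru(): allocate an object and note the LRU used. */
static void *kmem_cache_alloc_lru(struct kmem_cache *cache,
				  struct list_lru *lru, gfp_t gfp)
{
	(void)gfp;	/* allocation flags are ignored in this mock */
	if (lru)
		lru->nr_allocs++;
	return calloc(1, cache->object_size);
}

/* Assumed shape of the helper: tie the allocation to sb's inode LRU. */
static void *alloc_inode_sb(struct super_block *sb,
			    struct kmem_cache *cache, gfp_t gfp)
{
	assert(sb != NULL);
	return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp);
}

/* Hypothetical filesystem-private inode embedding the VFS inode. */
struct foofs_inode_info {
	int flags;
	struct inode vfs_inode;
};

static struct kmem_cache foofs_cache = { sizeof(struct foofs_inode_info) };
static struct kmem_cache *foofs_inode_cachep = &foofs_cache;

static struct inode *foofs_alloc_inode(struct super_block *sb)
{
	struct foofs_inode_info *ei;

	/* Before: ei = kmem_cache_alloc(foofs_inode_cachep, GFP_KERNEL); */
	ei = alloc_inode_sb(sb, foofs_inode_cachep, GFP_KERNEL);
	if (!ei)
		return NULL;
	return &ei->vfs_inode;
}
```

The only change a filesystem sees is the extra sb argument. The gfp flags
and the NULL handling at each call site stay as they were.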
block/bdev.c | 2 +-
drivers/dax/super.c | 2 +-
fs/9p/vfs_inode.c | 2 +-
fs/adfs/super.c | 2 +-
fs/affs/super.c | 2 +-
fs/afs/super.c | 2 +-
fs/befs/linuxvfs.c | 2 +-
fs/bfs/inode.c | 2 +-
fs/btrfs/inode.c | 2 +-
fs/ceph/inode.c | 2 +-
fs/cifs/cifsfs.c | 2 +-
fs/coda/inode.c | 2 +-
fs/ecryptfs/super.c | 2 +-
fs/efs/super.c | 2 +-
fs/erofs/super.c | 2 +-
fs/exfat/super.c | 2 +-
fs/ext2/super.c | 2 +-
fs/ext4/super.c | 2 +-
fs/fat/inode.c | 2 +-
fs/freevxfs/vxfs_super.c | 2 +-
fs/fuse/inode.c | 2 +-
fs/gfs2/super.c | 2 +-
fs/hfs/super.c | 2 +-
fs/hfsplus/super.c | 2 +-
fs/hostfs/hostfs_kern.c | 2 +-
fs/hpfs/super.c | 2 +-
fs/hugetlbfs/inode.c | 2 +-
fs/isofs/inode.c | 2 +-
fs/jffs2/super.c | 2 +-
fs/jfs/super.c | 2 +-
fs/minix/inode.c | 2 +-
fs/nfs/inode.c | 2 +-
fs/nilfs2/super.c | 2 +-
fs/ntfs/inode.c | 2 +-
fs/ntfs3/super.c | 2 +-
fs/ocfs2/dlmfs/dlmfs.c | 2 +-
fs/ocfs2/super.c | 2 +-
fs/openpromfs/inode.c | 2 +-
fs/orangefs/super.c | 2 +-
fs/overlayfs/super.c | 2 +-
fs/proc/inode.c | 2 +-
fs/qnx4/inode.c | 2 +-
fs/qnx6/inode.c | 2 +-
fs/reiserfs/super.c | 2 +-
fs/romfs/super.c | 2 +-
fs/squashfs/super.c | 2 +-
fs/sysv/inode.c | 2 +-
fs/ubifs/super.c | 2 +-
fs/udf/super.c | 2 +-
fs/ufs/super.c | 2 +-
fs/vboxsf/super.c | 2 +-
fs/xfs/xfs_icache.c | 2 +-
fs/zonefs/super.c | 2 +-
ipc/mqueue.c | 2 +-
mm/shmem.c | 2 +-
net/socket.c | 2 +-
net/sunrpc/rpc_pipe.c | 2 +-
57 files changed, 57 insertions(+), 57 deletions(-)
--- a/block/bdev.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/block/bdev.c
@@ -385,7 +385,7 @@ static struct kmem_cache * bdev_cachep _
static struct inode *bdev_alloc_inode(struct super_block *sb)
{
- struct bdev_inode *ei = kmem_cache_alloc(bdev_cachep, GFP_KERNEL);
+ struct bdev_inode *ei = alloc_inode_sb(sb, bdev_cachep, GFP_KERNEL);
if (!ei)
return NULL;
--- a/drivers/dax/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/drivers/dax/super.c
@@ -282,7 +282,7 @@ static struct inode *dax_alloc_inode(str
struct dax_device *dax_dev;
struct inode *inode;
- dax_dev = kmem_cache_alloc(dax_cache, GFP_KERNEL);
+ dax_dev = alloc_inode_sb(sb, dax_cache, GFP_KERNEL);
if (!dax_dev)
return NULL;
--- a/fs/9p/vfs_inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/9p/vfs_inode.c
@@ -228,7 +228,7 @@ struct inode *v9fs_alloc_inode(struct su
{
struct v9fs_inode *v9inode;
- v9inode = kmem_cache_alloc(v9fs_inode_cache, GFP_KERNEL);
+ v9inode = alloc_inode_sb(sb, v9fs_inode_cache, GFP_KERNEL);
if (!v9inode)
return NULL;
#ifdef CONFIG_9P_FSCACHE
--- a/fs/adfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/adfs/super.c
@@ -220,7 +220,7 @@ static struct kmem_cache *adfs_inode_cac
static struct inode *adfs_alloc_inode(struct super_block *sb)
{
struct adfs_inode_info *ei;
- ei = kmem_cache_alloc(adfs_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, adfs_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--- a/fs/affs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/affs/super.c
@@ -100,7 +100,7 @@ static struct inode *affs_alloc_inode(st
{
struct affs_inode_info *i;
- i = kmem_cache_alloc(affs_inode_cachep, GFP_KERNEL);
+ i = alloc_inode_sb(sb, affs_inode_cachep, GFP_KERNEL);
if (!i)
return NULL;
--- a/fs/afs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/afs/super.c
@@ -679,7 +679,7 @@ static struct inode *afs_alloc_inode(str
{
struct afs_vnode *vnode;
- vnode = kmem_cache_alloc(afs_inode_cachep, GFP_KERNEL);
+ vnode = alloc_inode_sb(sb, afs_inode_cachep, GFP_KERNEL);
if (!vnode)
return NULL;
--- a/fs/befs/linuxvfs.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/befs/linuxvfs.c
@@ -277,7 +277,7 @@ befs_alloc_inode(struct super_block *sb)
{
struct befs_inode_info *bi;
- bi = kmem_cache_alloc(befs_inode_cachep, GFP_KERNEL);
+ bi = alloc_inode_sb(sb, befs_inode_cachep, GFP_KERNEL);
if (!bi)
return NULL;
return &bi->vfs_inode;
--- a/fs/bfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/bfs/inode.c
@@ -239,7 +239,7 @@ static struct kmem_cache *bfs_inode_cach
static struct inode *bfs_alloc_inode(struct super_block *sb)
{
struct bfs_inode_info *bi;
- bi = kmem_cache_alloc(bfs_inode_cachep, GFP_KERNEL);
+ bi = alloc_inode_sb(sb, bfs_inode_cachep, GFP_KERNEL);
if (!bi)
return NULL;
return &bi->vfs_inode;
--- a/fs/btrfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/btrfs/inode.c
@@ -8787,7 +8787,7 @@ struct inode *btrfs_alloc_inode(struct s
struct btrfs_inode *ei;
struct inode *inode;
- ei = kmem_cache_alloc(btrfs_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, btrfs_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
--- a/fs/ceph/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ceph/inode.c
@@ -447,7 +447,7 @@ struct inode *ceph_alloc_inode(struct su
struct ceph_inode_info *ci;
int i;
- ci = kmem_cache_alloc(ceph_inode_cachep, GFP_NOFS);
+ ci = alloc_inode_sb(sb, ceph_inode_cachep, GFP_NOFS);
if (!ci)
return NULL;
--- a/fs/cifs/cifsfs.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/cifs/cifsfs.c
@@ -354,7 +354,7 @@ static struct inode *
cifs_alloc_inode(struct super_block *sb)
{
struct cifsInodeInfo *cifs_inode;
- cifs_inode = kmem_cache_alloc(cifs_inode_cachep, GFP_KERNEL);
+ cifs_inode = alloc_inode_sb(sb, cifs_inode_cachep, GFP_KERNEL);
if (!cifs_inode)
return NULL;
cifs_inode->cifsAttrs = 0x20; /* default */
--- a/fs/coda/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/coda/inode.c
@@ -43,7 +43,7 @@ static struct kmem_cache * coda_inode_ca
static struct inode *coda_alloc_inode(struct super_block *sb)
{
struct coda_inode_info *ei;
- ei = kmem_cache_alloc(coda_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, coda_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
memset(&ei->c_fid, 0, sizeof(struct CodaFid));
--- a/fs/ecryptfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ecryptfs/super.c
@@ -38,7 +38,7 @@ static struct inode *ecryptfs_alloc_inod
struct ecryptfs_inode_info *inode_info;
struct inode *inode = NULL;
- inode_info = kmem_cache_alloc(ecryptfs_inode_info_cache, GFP_KERNEL);
+ inode_info = alloc_inode_sb(sb, ecryptfs_inode_info_cache, GFP_KERNEL);
if (unlikely(!inode_info))
goto out;
if (ecryptfs_init_crypt_stat(&inode_info->crypt_stat)) {
--- a/fs/efs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/efs/super.c
@@ -69,7 +69,7 @@ static struct kmem_cache * efs_inode_cac
static struct inode *efs_alloc_inode(struct super_block *sb)
{
struct efs_inode_info *ei;
- ei = kmem_cache_alloc(efs_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, efs_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--- a/fs/erofs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/erofs/super.c
@@ -84,7 +84,7 @@ static void erofs_inode_init_once(void *
static struct inode *erofs_alloc_inode(struct super_block *sb)
{
struct erofs_inode *vi =
- kmem_cache_alloc(erofs_inode_cachep, GFP_KERNEL);
+ alloc_inode_sb(sb, erofs_inode_cachep, GFP_KERNEL);
if (!vi)
return NULL;
--- a/fs/exfat/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/exfat/super.c
@@ -183,7 +183,7 @@ static struct inode *exfat_alloc_inode(s
{
struct exfat_inode_info *ei;
- ei = kmem_cache_alloc(exfat_inode_cachep, GFP_NOFS);
+ ei = alloc_inode_sb(sb, exfat_inode_cachep, GFP_NOFS);
if (!ei)
return NULL;
--- a/fs/ext2/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ext2/super.c
@@ -180,7 +180,7 @@ static struct kmem_cache * ext2_inode_ca
static struct inode *ext2_alloc_inode(struct super_block *sb)
{
struct ext2_inode_info *ei;
- ei = kmem_cache_alloc(ext2_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, ext2_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
ei->i_block_alloc_info = NULL;
--- a/fs/ext4/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ext4/super.c
@@ -1316,7 +1316,7 @@ static struct inode *ext4_alloc_inode(st
{
struct ext4_inode_info *ei;
- ei = kmem_cache_alloc(ext4_inode_cachep, GFP_NOFS);
+ ei = alloc_inode_sb(sb, ext4_inode_cachep, GFP_NOFS);
if (!ei)
return NULL;
--- a/fs/fat/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/fat/inode.c
@@ -745,7 +745,7 @@ static struct kmem_cache *fat_inode_cach
static struct inode *fat_alloc_inode(struct super_block *sb)
{
struct msdos_inode_info *ei;
- ei = kmem_cache_alloc(fat_inode_cachep, GFP_NOFS);
+ ei = alloc_inode_sb(sb, fat_inode_cachep, GFP_NOFS);
if (!ei)
return NULL;
--- a/fs/freevxfs/vxfs_super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/freevxfs/vxfs_super.c
@@ -124,7 +124,7 @@ static struct inode *vxfs_alloc_inode(st
{
struct vxfs_inode_info *vi;
- vi = kmem_cache_alloc(vxfs_inode_cachep, GFP_KERNEL);
+ vi = alloc_inode_sb(sb, vxfs_inode_cachep, GFP_KERNEL);
if (!vi)
return NULL;
inode_init_once(&vi->vfs_inode);
--- a/fs/fuse/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/fuse/inode.c
@@ -72,7 +72,7 @@ static struct inode *fuse_alloc_inode(st
{
struct fuse_inode *fi;
- fi = kmem_cache_alloc(fuse_inode_cachep, GFP_KERNEL);
+ fi = alloc_inode_sb(sb, fuse_inode_cachep, GFP_KERNEL);
if (!fi)
return NULL;
--- a/fs/gfs2/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/gfs2/super.c
@@ -1425,7 +1425,7 @@ static struct inode *gfs2_alloc_inode(st
{
struct gfs2_inode *ip;
- ip = kmem_cache_alloc(gfs2_inode_cachep, GFP_KERNEL);
+ ip = alloc_inode_sb(sb, gfs2_inode_cachep, GFP_KERNEL);
if (!ip)
return NULL;
ip->i_flags = 0;
--- a/fs/hfsplus/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/hfsplus/super.c
@@ -624,7 +624,7 @@ static struct inode *hfsplus_alloc_inode
{
struct hfsplus_inode_info *i;
- i = kmem_cache_alloc(hfsplus_inode_cachep, GFP_KERNEL);
+ i = alloc_inode_sb(sb, hfsplus_inode_cachep, GFP_KERNEL);
return i ? &i->vfs_inode : NULL;
}
--- a/fs/hfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/hfs/super.c
@@ -162,7 +162,7 @@ static struct inode *hfs_alloc_inode(str
{
struct hfs_inode_info *i;
- i = kmem_cache_alloc(hfs_inode_cachep, GFP_KERNEL);
+ i = alloc_inode_sb(sb, hfs_inode_cachep, GFP_KERNEL);
return i ? &i->vfs_inode : NULL;
}
--- a/fs/hostfs/hostfs_kern.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/hostfs/hostfs_kern.c
@@ -222,7 +222,7 @@ static struct inode *hostfs_alloc_inode(
{
struct hostfs_inode_info *hi;
- hi = kmem_cache_alloc(hostfs_inode_cache, GFP_KERNEL_ACCOUNT);
+ hi = alloc_inode_sb(sb, hostfs_inode_cache, GFP_KERNEL_ACCOUNT);
if (hi == NULL)
return NULL;
hi->fd = -1;
--- a/fs/hpfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/hpfs/super.c
@@ -232,7 +232,7 @@ static struct kmem_cache * hpfs_inode_ca
static struct inode *hpfs_alloc_inode(struct super_block *sb)
{
struct hpfs_inode_info *ei;
- ei = kmem_cache_alloc(hpfs_inode_cachep, GFP_NOFS);
+ ei = alloc_inode_sb(sb, hpfs_inode_cachep, GFP_NOFS);
if (!ei)
return NULL;
return &ei->vfs_inode;
--- a/fs/hugetlbfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/hugetlbfs/inode.c
@@ -1110,7 +1110,7 @@ static struct inode *hugetlbfs_alloc_ino
if (unlikely(!hugetlbfs_dec_free_inodes(sbinfo)))
return NULL;
- p = kmem_cache_alloc(hugetlbfs_inode_cachep, GFP_KERNEL);
+ p = alloc_inode_sb(sb, hugetlbfs_inode_cachep, GFP_KERNEL);
if (unlikely(!p)) {
hugetlbfs_inc_free_inodes(sbinfo);
return NULL;
--- a/fs/isofs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/isofs/inode.c
@@ -70,7 +70,7 @@ static struct kmem_cache *isofs_inode_ca
static struct inode *isofs_alloc_inode(struct super_block *sb)
{
struct iso_inode_info *ei;
- ei = kmem_cache_alloc(isofs_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, isofs_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--- a/fs/jffs2/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/jffs2/super.c
@@ -39,7 +39,7 @@ static struct inode *jffs2_alloc_inode(s
{
struct jffs2_inode_info *f;
- f = kmem_cache_alloc(jffs2_inode_cachep, GFP_KERNEL);
+ f = alloc_inode_sb(sb, jffs2_inode_cachep, GFP_KERNEL);
if (!f)
return NULL;
return &f->vfs_inode;
--- a/fs/jfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/jfs/super.c
@@ -102,7 +102,7 @@ static struct inode *jfs_alloc_inode(str
{
struct jfs_inode_info *jfs_inode;
- jfs_inode = kmem_cache_alloc(jfs_inode_cachep, GFP_NOFS);
+ jfs_inode = alloc_inode_sb(sb, jfs_inode_cachep, GFP_NOFS);
if (!jfs_inode)
return NULL;
#ifdef CONFIG_QUOTA
--- a/fs/minix/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/minix/inode.c
@@ -63,7 +63,7 @@ static struct kmem_cache * minix_inode_c
static struct inode *minix_alloc_inode(struct super_block *sb)
{
struct minix_inode_info *ei;
- ei = kmem_cache_alloc(minix_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, minix_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--- a/fs/nfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/nfs/inode.c
@@ -2238,7 +2238,7 @@ static int nfs_update_inode(struct inode
struct inode *nfs_alloc_inode(struct super_block *sb)
{
struct nfs_inode *nfsi;
- nfsi = kmem_cache_alloc(nfs_inode_cachep, GFP_KERNEL);
+ nfsi = alloc_inode_sb(sb, nfs_inode_cachep, GFP_KERNEL);
if (!nfsi)
return NULL;
nfsi->flags = 0UL;
--- a/fs/nilfs2/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/nilfs2/super.c
@@ -151,7 +151,7 @@ struct inode *nilfs_alloc_inode(struct s
{
struct nilfs_inode_info *ii;
- ii = kmem_cache_alloc(nilfs_inode_cachep, GFP_NOFS);
+ ii = alloc_inode_sb(sb, nilfs_inode_cachep, GFP_NOFS);
if (!ii)
return NULL;
ii->i_bh = NULL;
--- a/fs/ntfs3/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ntfs3/super.c
@@ -399,7 +399,7 @@ static struct kmem_cache *ntfs_inode_cac
static struct inode *ntfs_alloc_inode(struct super_block *sb)
{
- struct ntfs_inode *ni = kmem_cache_alloc(ntfs_inode_cachep, GFP_NOFS);
+ struct ntfs_inode *ni = alloc_inode_sb(sb, ntfs_inode_cachep, GFP_NOFS);
if (!ni)
return NULL;
--- a/fs/ntfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ntfs/inode.c
@@ -310,7 +310,7 @@ struct inode *ntfs_alloc_big_inode(struc
ntfs_inode *ni;
ntfs_debug("Entering.");
- ni = kmem_cache_alloc(ntfs_big_inode_cache, GFP_NOFS);
+ ni = alloc_inode_sb(sb, ntfs_big_inode_cache, GFP_NOFS);
if (likely(ni != NULL)) {
ni->state = 0;
return VFS_I(ni);
--- a/fs/ocfs2/dlmfs/dlmfs.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ocfs2/dlmfs/dlmfs.c
@@ -280,7 +280,7 @@ static struct inode *dlmfs_alloc_inode(s
{
struct dlmfs_inode_private *ip;
- ip = kmem_cache_alloc(dlmfs_inode_cache, GFP_NOFS);
+ ip = alloc_inode_sb(sb, dlmfs_inode_cache, GFP_NOFS);
if (!ip)
return NULL;
--- a/fs/ocfs2/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ocfs2/super.c
@@ -548,7 +548,7 @@ static struct inode *ocfs2_alloc_inode(s
{
struct ocfs2_inode_info *oi;
- oi = kmem_cache_alloc(ocfs2_inode_cachep, GFP_NOFS);
+ oi = alloc_inode_sb(sb, ocfs2_inode_cachep, GFP_NOFS);
if (!oi)
return NULL;
--- a/fs/openpromfs/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/openpromfs/inode.c
@@ -335,7 +335,7 @@ static struct inode *openprom_alloc_inod
{
struct op_inode_info *oi;
- oi = kmem_cache_alloc(op_inode_cachep, GFP_KERNEL);
+ oi = alloc_inode_sb(sb, op_inode_cachep, GFP_KERNEL);
if (!oi)
return NULL;
--- a/fs/orangefs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/orangefs/super.c
@@ -107,7 +107,7 @@ static struct inode *orangefs_alloc_inod
{
struct orangefs_inode_s *orangefs_inode;
- orangefs_inode = kmem_cache_alloc(orangefs_inode_cache, GFP_KERNEL);
+ orangefs_inode = alloc_inode_sb(sb, orangefs_inode_cache, GFP_KERNEL);
if (!orangefs_inode)
return NULL;
--- a/fs/overlayfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/overlayfs/super.c
@@ -174,7 +174,7 @@ static struct kmem_cache *ovl_inode_cach
static struct inode *ovl_alloc_inode(struct super_block *sb)
{
- struct ovl_inode *oi = kmem_cache_alloc(ovl_inode_cachep, GFP_KERNEL);
+ struct ovl_inode *oi = alloc_inode_sb(sb, ovl_inode_cachep, GFP_KERNEL);
if (!oi)
return NULL;
--- a/fs/proc/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/proc/inode.c
@@ -66,7 +66,7 @@ static struct inode *proc_alloc_inode(st
{
struct proc_inode *ei;
- ei = kmem_cache_alloc(proc_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, proc_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
ei->pid = NULL;
--- a/fs/qnx4/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/qnx4/inode.c
@@ -338,7 +338,7 @@ static struct kmem_cache *qnx4_inode_cac
static struct inode *qnx4_alloc_inode(struct super_block *sb)
{
struct qnx4_inode_info *ei;
- ei = kmem_cache_alloc(qnx4_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, qnx4_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--- a/fs/qnx6/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/qnx6/inode.c
@@ -597,7 +597,7 @@ static struct kmem_cache *qnx6_inode_cac
static struct inode *qnx6_alloc_inode(struct super_block *sb)
{
struct qnx6_inode_info *ei;
- ei = kmem_cache_alloc(qnx6_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, qnx6_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--- a/fs/reiserfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/reiserfs/super.c
@@ -639,7 +639,7 @@ static struct kmem_cache *reiserfs_inode
static struct inode *reiserfs_alloc_inode(struct super_block *sb)
{
struct reiserfs_inode_info *ei;
- ei = kmem_cache_alloc(reiserfs_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, reiserfs_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
atomic_set(&ei->openers, 0);
--- a/fs/romfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/romfs/super.c
@@ -375,7 +375,7 @@ static struct inode *romfs_alloc_inode(s
{
struct romfs_inode_info *inode;
- inode = kmem_cache_alloc(romfs_inode_cachep, GFP_KERNEL);
+ inode = alloc_inode_sb(sb, romfs_inode_cachep, GFP_KERNEL);
return inode ? &inode->vfs_inode : NULL;
}
--- a/fs/squashfs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/squashfs/super.c
@@ -584,7 +584,7 @@ static void __exit exit_squashfs_fs(void
static struct inode *squashfs_alloc_inode(struct super_block *sb)
{
struct squashfs_inode_info *ei =
- kmem_cache_alloc(squashfs_inode_cachep, GFP_KERNEL);
+ alloc_inode_sb(sb, squashfs_inode_cachep, GFP_KERNEL);
return ei ? &ei->vfs_inode : NULL;
}
--- a/fs/sysv/inode.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/sysv/inode.c
@@ -306,7 +306,7 @@ static struct inode *sysv_alloc_inode(st
{
struct sysv_inode_info *si;
- si = kmem_cache_alloc(sysv_inode_cachep, GFP_KERNEL);
+ si = alloc_inode_sb(sb, sysv_inode_cachep, GFP_KERNEL);
if (!si)
return NULL;
return &si->vfs_inode;
--- a/fs/ubifs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ubifs/super.c
@@ -268,7 +268,7 @@ static struct inode *ubifs_alloc_inode(s
{
struct ubifs_inode *ui;
- ui = kmem_cache_alloc(ubifs_inode_slab, GFP_NOFS);
+ ui = alloc_inode_sb(sb, ubifs_inode_slab, GFP_NOFS);
if (!ui)
return NULL;
--- a/fs/udf/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/udf/super.c
@@ -136,7 +136,7 @@ static struct kmem_cache *udf_inode_cach
static struct inode *udf_alloc_inode(struct super_block *sb)
{
struct udf_inode_info *ei;
- ei = kmem_cache_alloc(udf_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, udf_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
--- a/fs/ufs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/ufs/super.c
@@ -1443,7 +1443,7 @@ static struct inode *ufs_alloc_inode(str
{
struct ufs_inode_info *ei;
- ei = kmem_cache_alloc(ufs_inode_cachep, GFP_NOFS);
+ ei = alloc_inode_sb(sb, ufs_inode_cachep, GFP_NOFS);
if (!ei)
return NULL;
--- a/fs/vboxsf/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/vboxsf/super.c
@@ -241,7 +241,7 @@ static struct inode *vboxsf_alloc_inode(
{
struct vboxsf_inode *sf_i;
- sf_i = kmem_cache_alloc(vboxsf_inode_cachep, GFP_NOFS);
+ sf_i = alloc_inode_sb(sb, vboxsf_inode_cachep, GFP_NOFS);
if (!sf_i)
return NULL;
--- a/fs/xfs/xfs_icache.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/xfs/xfs_icache.c
@@ -77,7 +77,7 @@ xfs_inode_alloc(
* XXX: If this didn't occur in transactions, we could drop GFP_NOFAIL
* and return NULL here on ENOMEM.
*/
- ip = kmem_cache_alloc(xfs_inode_cache, GFP_KERNEL | __GFP_NOFAIL);
+ ip = alloc_inode_sb(mp->m_super, xfs_inode_cache, GFP_KERNEL | __GFP_NOFAIL);
if (inode_init_always(mp->m_super, VFS_I(ip))) {
kmem_cache_free(xfs_inode_cache, ip);
--- a/fs/zonefs/super.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/zonefs/super.c
@@ -1137,7 +1137,7 @@ static struct inode *zonefs_alloc_inode(
{
struct zonefs_inode_info *zi;
- zi = kmem_cache_alloc(zonefs_inode_cachep, GFP_KERNEL);
+ zi = alloc_inode_sb(sb, zonefs_inode_cachep, GFP_KERNEL);
if (!zi)
return NULL;
--- a/ipc/mqueue.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/ipc/mqueue.c
@@ -486,7 +486,7 @@ static struct inode *mqueue_alloc_inode(
{
struct mqueue_inode_info *ei;
- ei = kmem_cache_alloc(mqueue_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, mqueue_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
return &ei->vfs_inode;
--- a/mm/shmem.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/mm/shmem.c
@@ -3708,7 +3708,7 @@ static struct kmem_cache *shmem_inode_ca
static struct inode *shmem_alloc_inode(struct super_block *sb)
{
struct shmem_inode_info *info;
- info = kmem_cache_alloc(shmem_inode_cachep, GFP_KERNEL);
+ info = alloc_inode_sb(sb, shmem_inode_cachep, GFP_KERNEL);
if (!info)
return NULL;
return &info->vfs_inode;
--- a/net/socket.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/net/socket.c
@@ -301,7 +301,7 @@ static struct inode *sock_alloc_inode(st
{
struct socket_alloc *ei;
- ei = kmem_cache_alloc(sock_inode_cachep, GFP_KERNEL);
+ ei = alloc_inode_sb(sb, sock_inode_cachep, GFP_KERNEL);
if (!ei)
return NULL;
init_waitqueue_head(&ei->socket.wq.wait);
--- a/net/sunrpc/rpc_pipe.c~fs-allocate-inode-by-using-alloc_inode_sb
+++ a/net/sunrpc/rpc_pipe.c
@@ -197,7 +197,7 @@ static struct inode *
rpc_alloc_inode(struct super_block *sb)
{
struct rpc_inode *rpci;
- rpci = kmem_cache_alloc(rpc_inode_cachep, GFP_KERNEL);
+ rpci = alloc_inode_sb(sb, rpc_inode_cachep, GFP_KERNEL);
if (!rpci)
return NULL;
return &rpci->vfs_inode;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 051/227] f2fs: allocate inode by using alloc_inode_sb()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: f2fs: allocate inode by using alloc_inode_sb()
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
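
For readers who want to see the shape of the conversion outside the kernel,
here is a minimal userspace sketch. It assumes that alloc_inode_sb() simply
forwards to kmem_cache_alloc_lru() with the superblock's inode list_lru; the
struct layouts, the nr_allocs counter, the inject_fault flag, the GFP_KERNEL
placeholder value and demo_alloc_inode() are all hypothetical stand-ins, and
the allocator is mocked with calloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for kernel types; layouts are illustrative only. */
typedef unsigned int gfp_t;
#define GFP_KERNEL 0u			/* placeholder value, not the kernel's */

struct list_lru { unsigned long nr_allocs; };	/* hypothetical counter */
struct kmem_cache { size_t object_size; };
struct super_block {
	struct list_lru s_inode_lru;
	bool inject_fault;		/* hypothetical stand-in for fault injection */
};

/* Mock allocator: returns zeroed memory and notes which lru it was given. */
static void *kmem_cache_alloc_lru(struct kmem_cache *cache,
				  struct list_lru *lru, gfp_t gfp)
{
	void *obj = calloc(1, cache->object_size);

	(void)gfp;
	if (obj && lru)
		lru->nr_allocs++;
	return obj;
}

/* Assumed shape of the helper: tie the allocation to the sb's inode lru. */
static void *alloc_inode_sb(struct super_block *sb,
			    struct kmem_cache *cache, gfp_t gfp)
{
	return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp);
}

struct demo_inode_info { int flags; };

/* Same ordering as the f2fs hunk: injection check first, then allocate. */
static struct demo_inode_info *demo_alloc_inode(struct super_block *sb,
						struct kmem_cache *cache)
{
	if (sb->inject_fault)
		return NULL;
	return alloc_inode_sb(sb, cache, GFP_KERNEL);
}
```

In this model an injected fault returns NULL before the allocator is reached,
so nothing is attributed to the lru, which is why the f2fs hunk open-codes the
injection check instead of going through f2fs_kmem_cache_alloc().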
Link: https://lkml.kernel.org/r/20220228122126.37293-6-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/f2fs/super.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/fs/f2fs/super.c~f2fs-allocate-inode-by-using-alloc_inode_sb
+++ a/fs/f2fs/super.c
@@ -1345,8 +1345,12 @@ static struct inode *f2fs_alloc_inode(st
{
struct f2fs_inode_info *fi;
- fi = f2fs_kmem_cache_alloc(f2fs_inode_cachep,
- GFP_F2FS_ZERO, false, F2FS_SB(sb));
+ if (time_to_inject(F2FS_SB(sb), FAULT_SLAB_ALLOC)) {
+ f2fs_show_injection_info(F2FS_SB(sb), FAULT_SLAB_ALLOC);
+ return NULL;
+ }
+
+ fi = alloc_inode_sb(sb, f2fs_inode_cachep, GFP_F2FS_ZERO);
if (!fi)
return NULL;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 052/227] mm: dcache: use kmem_cache_alloc_lru() to allocate dentry
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: dcache: use kmem_cache_alloc_lru() to allocate dentry
Like inodes, dentries are also added to their memcg list_lru. So replace
kmem_cache_alloc() with kmem_cache_alloc_lru() when allocating a dentry.
Link: https://lkml.kernel.org/r/20220228122126.37293-8-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/dcache.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/dcache.c~mm-dcache-use-kmem_cache_alloc_lru-to-allocate-dentry
+++ a/fs/dcache.c
@@ -1766,7 +1766,8 @@ static struct dentry *__d_alloc(struct s
char *dname;
int err;
- dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL);
+ dentry = kmem_cache_alloc_lru(dentry_cache, &sb->s_dentry_lru,
+ GFP_KERNEL);
if (!dentry)
return NULL;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 053/227] xarray: use kmem_cache_alloc_lru to allocate xa_node
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: xarray: use kmem_cache_alloc_lru to allocate xa_node
The workingset code adds xa_nodes to the shadow_nodes list_lru, so
xa_nodes should be allocated with kmem_cache_alloc_lru(). Use
xas_set_lru() to pass the list_lru into which the xa_node will be
inserted, so that the xa_node reclaim context is set up correctly.
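
The plumbing described above can be sketched in userspace as follows. This
is a simplified model, not the kernel code: struct xa_state is reduced to
the one new field, node_alloc_lru() is a hypothetical calloc()-based mock of
the slab call, nr_allocs is an invented counter, and mapping_set_update()
takes two booleans in place of the dax_mapping()/shmem_mapping() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct list_lru { unsigned long nr_allocs; };	/* hypothetical counter */
struct xa_node { int count; };

/* Reduced operation state: only the field this patch adds. */
struct xa_state { struct list_lru *xa_lru; };

/* Models the list that becomes non-static in mm/workingset.c. */
static struct list_lru shadow_nodes;

static inline void xas_set_lru(struct xa_state *xas, struct list_lru *lru)
{
	xas->xa_lru = lru;
}

/* Mock of the slab call; a NULL lru means "no list_lru context". */
static struct xa_node *node_alloc_lru(struct list_lru *lru)
{
	struct xa_node *node = calloc(1, sizeof(*node));

	if (node && lru)
		lru->nr_allocs++;
	return node;
}

/* Every node allocation forwards whatever lru the state carries. */
static struct xa_node *xas_alloc(struct xa_state *xas)
{
	return node_alloc_lru(xas->xa_lru);
}

/* Only mappings whose nodes the workingset tracks get the lru set. */
static void mapping_set_update(struct xa_state *xas, bool is_dax, bool is_shmem)
{
	if (!is_dax && !is_shmem)
		xas_set_lru(xas, &shadow_nodes);
}
```

Carrying the pointer in the operation state means callers that never touch
shadow entries keep the NULL default and allocate exactly as before.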
Link: https://lkml.kernel.org/r/20220228122126.37293-9-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/swap.h | 5 ++++-
include/linux/xarray.h | 9 ++++++++-
lib/xarray.c | 10 +++++-----
mm/workingset.c | 2 +-
4 files changed, 18 insertions(+), 8 deletions(-)
--- a/include/linux/swap.h~xarray-use-kmem_cache_alloc_lru-to-allocate-xa_node
+++ a/include/linux/swap.h
@@ -334,9 +334,12 @@ void workingset_activation(struct folio
/* Only track the nodes of mappings with shadow entries */
void workingset_update_node(struct xa_node *node);
+extern struct list_lru shadow_nodes;
#define mapping_set_update(xas, mapping) do { \
- if (!dax_mapping(mapping) && !shmem_mapping(mapping)) \
+ if (!dax_mapping(mapping) && !shmem_mapping(mapping)) { \
xas_set_update(xas, workingset_update_node); \
+ xas_set_lru(xas, &shadow_nodes); \
+ } \
} while (0)
/* linux/mm/page_alloc.c */
--- a/include/linux/xarray.h~xarray-use-kmem_cache_alloc_lru-to-allocate-xa_node
+++ a/include/linux/xarray.h
@@ -1317,6 +1317,7 @@ struct xa_state {
struct xa_node *xa_node;
struct xa_node *xa_alloc;
xa_update_node_t xa_update;
+ struct list_lru *xa_lru;
};
/*
@@ -1336,7 +1337,8 @@ struct xa_state {
.xa_pad = 0, \
.xa_node = XAS_RESTART, \
.xa_alloc = NULL, \
- .xa_update = NULL \
+ .xa_update = NULL, \
+ .xa_lru = NULL, \
}
/**
@@ -1631,6 +1633,11 @@ static inline void xas_set_update(struct
xas->xa_update = update;
}
+static inline void xas_set_lru(struct xa_state *xas, struct list_lru *lru)
+{
+ xas->xa_lru = lru;
+}
+
/**
* xas_next_entry() - Advance iterator to next present entry.
* @xas: XArray operation state.
--- a/lib/xarray.c~xarray-use-kmem_cache_alloc_lru-to-allocate-xa_node
+++ a/lib/xarray.c
@@ -302,7 +302,7 @@ bool xas_nomem(struct xa_state *xas, gfp
}
if (xas->xa->xa_flags & XA_FLAGS_ACCOUNT)
gfp |= __GFP_ACCOUNT;
- xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep, gfp);
+ xas->xa_alloc = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
if (!xas->xa_alloc)
return false;
xas->xa_alloc->parent = NULL;
@@ -334,10 +334,10 @@ static bool __xas_nomem(struct xa_state
gfp |= __GFP_ACCOUNT;
if (gfpflags_allow_blocking(gfp)) {
xas_unlock_type(xas, lock_type);
- xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep, gfp);
+ xas->xa_alloc = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
xas_lock_type(xas, lock_type);
} else {
- xas->xa_alloc = kmem_cache_alloc(radix_tree_node_cachep, gfp);
+ xas->xa_alloc = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
}
if (!xas->xa_alloc)
return false;
@@ -371,7 +371,7 @@ static void *xas_alloc(struct xa_state *
if (xas->xa->xa_flags & XA_FLAGS_ACCOUNT)
gfp |= __GFP_ACCOUNT;
- node = kmem_cache_alloc(radix_tree_node_cachep, gfp);
+ node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
if (!node) {
xas_set_err(xas, -ENOMEM);
return NULL;
@@ -1014,7 +1014,7 @@ void xas_split_alloc(struct xa_state *xa
void *sibling = NULL;
struct xa_node *node;
- node = kmem_cache_alloc(radix_tree_node_cachep, gfp);
+ node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
if (!node)
goto nomem;
node->array = xas->xa;
--- a/mm/workingset.c~xarray-use-kmem_cache_alloc_lru-to-allocate-xa_node
+++ a/mm/workingset.c
@@ -429,7 +429,7 @@ out:
* point where they would still be useful.
*/
-static struct list_lru shadow_nodes;
+struct list_lru shadow_nodes;
void workingset_update_node(struct xa_node *node)
{
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 054/227] mm: memcontrol: move memcg_online_kmem() to mem_cgroup_css_online()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: memcontrol: move memcg_online_kmem() to mem_cgroup_css_online()
Moving memcg_online_kmem() to mem_cgroup_css_online() simplifies the
code, and there is no longer any need to set ->kmemcg_id to -1 to
indicate that the memcg is offline. In the next patch, ->kmemcg_id will
be used to synchronize list_lru reparenting, which requires that
->kmemcg_id does not change.
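
The resulting online path unwinds in reverse order of setup. A rough
userspace model of that control flow is shown here; struct memcg_model, its
failure knobs and every model_*() helper are hypothetical and only mirror
the order of operations in the diff, not the real memcg internals.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model state; the fail_* knobs force each step to fail. */
struct memcg_model {
	bool fail_kmem;
	bool fail_shrinker;
	bool kmem_online;
	bool shrinker_ready;
	bool id_valid;
};

static int model_online_kmem(struct memcg_model *m)
{
	if (m->fail_kmem)
		return -ENOMEM;
	m->kmem_online = true;
	return 0;
}

static void model_offline_kmem(struct memcg_model *m)
{
	m->kmem_online = false;
}

static int model_alloc_shrinker_info(struct memcg_model *m)
{
	if (m->fail_shrinker)
		return -ENOMEM;
	m->shrinker_ready = true;
	return 0;
}

/* Same label order as the patched online path: undo kmem, then the id. */
static int model_css_online(struct memcg_model *m)
{
	if (model_online_kmem(m))
		goto remove_id;

	if (model_alloc_shrinker_info(m))
		goto offline_kmem;

	return 0;
offline_kmem:
	model_offline_kmem(m);
remove_id:
	m->id_valid = false;
	return -ENOMEM;
}
```

Because a failed online now undoes the kmem setup itself, the free path in
this model has nothing left to offline, which matches the removal of the
memcg_offline_kmem() call from mem_cgroup_css_free() in the diff.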
Link: https://lkml.kernel.org/r/20220228122126.37293-10-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 37 ++++++++++++++++---------------------
1 file changed, 16 insertions(+), 21 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-move-memcg_online_kmem-to-mem_cgroup_css_online
+++ a/mm/memcontrol.c
@@ -3670,7 +3670,8 @@ static int memcg_online_kmem(struct mem_
if (cgroup_memory_nokmem)
return 0;
- BUG_ON(memcg->kmemcg_id >= 0);
+ if (unlikely(mem_cgroup_is_root(memcg)))
+ return 0;
memcg_id = memcg_alloc_cache_id();
if (memcg_id < 0)
@@ -3696,7 +3697,10 @@ static void memcg_offline_kmem(struct me
struct mem_cgroup *parent;
int kmemcg_id;
- if (memcg->kmemcg_id == -1)
+ if (cgroup_memory_nokmem)
+ return;
+
+ if (unlikely(mem_cgroup_is_root(memcg)))
return;
parent = parent_mem_cgroup(memcg);
@@ -3706,7 +3710,6 @@ static void memcg_offline_kmem(struct me
memcg_reparent_objcgs(memcg, parent);
kmemcg_id = memcg->kmemcg_id;
- BUG_ON(kmemcg_id < 0);
/*
* After we have finished memcg_reparent_objcgs(), all list_lrus
@@ -3717,7 +3720,6 @@ static void memcg_offline_kmem(struct me
memcg_drain_all_list_lrus(kmemcg_id, parent);
memcg_free_cache_id(kmemcg_id);
- memcg->kmemcg_id = -1;
}
#else
static int memcg_online_kmem(struct mem_cgroup *memcg)
@@ -5237,7 +5239,6 @@ mem_cgroup_css_alloc(struct cgroup_subsy
{
struct mem_cgroup *parent = mem_cgroup_from_css(parent_css);
struct mem_cgroup *memcg, *old_memcg;
- long error = -ENOMEM;
old_memcg = set_active_memcg(parent);
memcg = mem_cgroup_alloc();
@@ -5266,34 +5267,26 @@ mem_cgroup_css_alloc(struct cgroup_subsy
return &memcg->css;
}
- /* The following stuff does not apply to the root */
- error = memcg_online_kmem(memcg);
- if (error)
- goto fail;
-
if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
static_branch_inc(&memcg_sockets_enabled_key);
return &memcg->css;
-fail:
- mem_cgroup_id_remove(memcg);
- mem_cgroup_free(memcg);
- return ERR_PTR(error);
}
static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
{
struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+ if (memcg_online_kmem(memcg))
+ goto remove_id;
+
/*
* A memcg must be visible for expand_shrinker_info()
* by the time the maps are allocated. So, we allocate maps
* here, when for_each_mem_cgroup() can't skip it.
*/
- if (alloc_shrinker_info(memcg)) {
- mem_cgroup_id_remove(memcg);
- return -ENOMEM;
- }
+ if (alloc_shrinker_info(memcg))
+ goto offline_kmem;
/* Online state pins memcg ID, memcg ID pins CSS */
refcount_set(&memcg->id.ref, 1);
@@ -5303,6 +5296,11 @@ static int mem_cgroup_css_online(struct
queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
2UL*HZ);
return 0;
+offline_kmem:
+ memcg_offline_kmem(memcg);
+remove_id:
+ mem_cgroup_id_remove(memcg);
+ return -ENOMEM;
}
static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
@@ -5360,9 +5358,6 @@ static void mem_cgroup_css_free(struct c
cancel_work_sync(&memcg->high_work);
mem_cgroup_remove_from_trees(memcg);
free_shrinker_info(memcg);
-
- /* Need to offline kmem if online_css() fails */
- memcg_offline_kmem(memcg);
mem_cgroup_free(memcg);
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
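The error handling that patch 054 gives mem_cgroup_css_online() is the usual kernel "goto unwind" pattern: each later failure jumps to a label that undoes the earlier steps in reverse order. Below is a minimal userspace sketch of that shape. All demo_* names and the fail_* knobs are hypothetical stand-ins for illustration, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg; the flags record which steps are live. */
struct demo_group {
	bool kmem_online;
	bool shrinker_info;
	bool id_registered;
	bool fail_kmem;		/* knob: make step 1 fail */
	bool fail_shrinker;	/* knob: make step 2 fail */
};

static int demo_online_kmem(struct demo_group *g)
{
	if (g->fail_kmem)
		return -ENOMEM;
	g->kmem_online = true;
	return 0;
}

static void demo_offline_kmem(struct demo_group *g)
{
	g->kmem_online = false;
}

static int demo_alloc_shrinker_info(struct demo_group *g)
{
	if (g->fail_shrinker)
		return -ENOMEM;
	g->shrinker_info = true;
	return 0;
}

static void demo_id_remove(struct demo_group *g)
{
	g->id_registered = false;
}

/*
 * Same control flow as the patched mem_cgroup_css_online(): a failure in
 * step 2 undoes step 1 and then falls through to the common cleanup.
 */
int demo_css_online(struct demo_group *g)
{
	assert(g != NULL);

	if (demo_online_kmem(g))
		goto remove_id;

	if (demo_alloc_shrinker_info(g))
		goto offline_kmem;

	return 0;

offline_kmem:
	demo_offline_kmem(g);
remove_id:
	demo_id_remove(g);
	return -ENOMEM;
}
```

Because the labels fall through, adding a third step only needs one more label above offline_kmem, which is presumably why the patch prefers this form over repeating the cleanup calls in each branch.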
* [patch 055/227] mm: list_lru: allocate list_lru_one only when needed
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: list_lru: allocate list_lru_one only when needed
On one of our servers, we found a suspected memory leak. The kmalloc-32
slab cache consumes more than 6GB of memory, while the other kmem_caches
consume less than 2GB.
After in-depth analysis, we found that the memory consumption of the
kmalloc-32 slab cache is caused by list_lru_one allocations.
crash> p memcg_nr_cache_ids
memcg_nr_cache_ids = $2 = 24574
memcg_nr_cache_ids is very large, and the memory consumption of each
list_lru can be calculated with the following formula:
num_numa_node * memcg_nr_cache_ids * 32 (kmalloc-32)
There are 4 NUMA nodes in our system, so each list_lru consumes ~3MB.
crash> list super_blocks | wc -l
952
Every mount registers 2 list_lrus, one for inodes and one for dentries.
There are 952 super_blocks, so the total memory is 952 * 2 * 3 MB
(~5.6GB). But the number of memory cgroups is less than 500, so I guess
more than 12286 containers have been deployed on this machine (I do not
know why there are so many containers; it may be a user's bug, or the
user may really want to do that), and memcg_nr_cache_ids has not been
reduced to a suitable value. This wastes a lot of memory.
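The arithmetic above can be checked with a small sketch. The helper names are hypothetical; the inputs (4 NUMA nodes, memcg_nr_cache_ids = 24574, 32-byte objects, 952 super_blocks, 2 list_lrus each) are the figures quoted in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes consumed by one list_lru when every memcg id gets a slot per node. */
uint64_t demo_bytes_per_list_lru(uint64_t numa_nodes, uint64_t nr_cache_ids,
				 uint64_t object_size)
{
	assert(object_size > 0);
	return numa_nodes * nr_cache_ids * object_size;
}

/* Total bytes across all super_blocks, each registering lrus_per_sb list_lrus. */
uint64_t demo_total_bytes(uint64_t super_blocks, uint64_t lrus_per_sb,
			  uint64_t bytes_per_lru)
{
	return super_blocks * lrus_per_sb * bytes_per_lru;
}
```

With the quoted inputs, demo_bytes_per_list_lru(4, 24574, 32) is 3145472 bytes (about 3MB), and demo_total_bytes(952, 2, 3145472) is 5988978688 bytes, which is roughly 5.6GB when counted in units of 2^30 bytes.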
Now that the infrastructure for dynamic list_lru_one allocation is ready,
remove the static allocation code to save memory.
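The idea of allocating per-memcg lists only on demand can be sketched in userspace as follows. This is a single-threaded illustration under stated assumptions: all demo_* names are hypothetical, there is no RCU or locking, and out-of-range indices simply yield NULL (the kernel instead falls back to the per-node global list for a negative index). It mirrors three points of the patch: the slot table starts zeroed, lookups tolerate empty slots, and growing the table zero-fills the new tail.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_list {
	long nr_items;
};

/* Hypothetical analogue of list_lru_memcg: a table of lazily filled slots. */
struct demo_table {
	int size;
	struct demo_list **slot;	/* NULL means "not allocated yet" */
};

/* Allocate only the zeroed pointer table (compare kvzalloc() in the patch). */
int demo_table_init(struct demo_table *t, int size)
{
	assert(size > 0);
	t->slot = calloc((size_t)size, sizeof(*t->slot));
	if (!t->slot)
		return -1;
	t->size = size;
	return 0;
}

/* Grow the table: copy the old pointers, then zero the new tail. */
int demo_table_grow(struct demo_table *t, int new_size)
{
	struct demo_list **new_slot;

	assert(new_size > t->size);
	new_slot = malloc((size_t)new_size * sizeof(*new_slot));
	if (!new_slot)
		return -1;
	memcpy(new_slot, t->slot, (size_t)t->size * sizeof(*new_slot));
	/* Assumes all-zero bytes represent NULL, as on mainstream platforms. */
	memset(&new_slot[t->size], 0,
	       (size_t)(new_size - t->size) * sizeof(*new_slot));
	free(t->slot);
	t->slot = new_slot;
	t->size = new_size;
	return 0;
}

/* Lookup may return NULL now that slots are allocated on demand. */
struct demo_list *demo_lookup(const struct demo_table *t, int idx)
{
	if (idx < 0 || idx >= t->size)
		return NULL;
	return t->slot[idx];
}

/* Readers treat a missing slot as an empty list. */
long demo_count(const struct demo_table *t, int idx)
{
	const struct demo_list *l = demo_lookup(t, idx);

	return l ? l->nr_items : 0;
}

/* First use of a slot allocates it. */
int demo_add(struct demo_table *t, int idx)
{
	if (idx < 0 || idx >= t->size)
		return -1;
	if (!t->slot[idx]) {
		t->slot[idx] = calloc(1, sizeof(*t->slot[idx]));
		if (!t->slot[idx])
			return -1;
	}
	t->slot[idx]->nr_items++;
	return 0;
}
```

The cost of the lazy scheme is that every reader must handle a NULL slot, which is why the diff below adds NULL checks to list_lru_count_one(), __list_lru_walk_one() and the drain path.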
Link: https://lkml.kernel.org/r/20220228122126.37293-11-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/list_lru.h | 7 +-
mm/list_lru.c | 121 ++++++++++++++++++++-----------------
mm/memcontrol.c | 6 +
3 files changed, 77 insertions(+), 57 deletions(-)
--- a/include/linux/list_lru.h~mm-list_lru-allocate-list_lru_one-only-when-needed
+++ a/include/linux/list_lru.h
@@ -32,14 +32,15 @@ struct list_lru_one {
};
struct list_lru_per_memcg {
+ struct rcu_head rcu;
/* array of per cgroup per node lists, indexed by node id */
- struct list_lru_one node[0];
+ struct list_lru_one node[];
};
struct list_lru_memcg {
struct rcu_head rcu;
/* array of per cgroup lists, indexed by memcg_cache_id */
- struct list_lru_per_memcg *mlru[];
+ struct list_lru_per_memcg __rcu *mlru[];
};
struct list_lru_node {
@@ -77,7 +78,7 @@ int __list_lru_init(struct list_lru *lru
int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
gfp_t gfp);
int memcg_update_all_list_lrus(int num_memcgs);
-void memcg_drain_all_list_lrus(int src_idx, struct mem_cgroup *dst_memcg);
+void memcg_drain_all_list_lrus(struct mem_cgroup *src, struct mem_cgroup *dst);
/**
* list_lru_add: add an element to the lru list's tail
--- a/mm/list_lru.c~mm-list_lru-allocate-list_lru_one-only-when-needed
+++ a/mm/list_lru.c
@@ -60,8 +60,12 @@ list_lru_from_memcg_idx(struct list_lru
* from relocation (see memcg_update_list_lru).
*/
mlrus = rcu_dereference_check(lru->mlrus, lockdep_is_held(&nlru->lock));
- if (mlrus && idx >= 0)
- return &mlrus->mlru[idx]->node[nid];
+ if (mlrus && idx >= 0) {
+ struct list_lru_per_memcg *mlru;
+
+ mlru = rcu_dereference_check(mlrus->mlru[idx], true);
+ return mlru ? &mlru->node[nid] : NULL;
+ }
return &nlru->lru;
}
@@ -188,7 +192,7 @@ unsigned long list_lru_count_one(struct
rcu_read_lock();
l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg));
- count = READ_ONCE(l->nr_items);
+ count = l ? READ_ONCE(l->nr_items) : 0;
rcu_read_unlock();
if (unlikely(count < 0))
@@ -217,8 +221,11 @@ __list_lru_walk_one(struct list_lru *lru
struct list_head *item, *n;
unsigned long isolated = 0;
- l = list_lru_from_memcg_idx(lru, nid, memcg_idx);
restart:
+ l = list_lru_from_memcg_idx(lru, nid, memcg_idx);
+ if (!l)
+ goto out;
+
list_for_each_safe(item, n, &l->list) {
enum lru_status ret;
@@ -262,6 +269,7 @@ restart:
BUG();
}
}
+out:
return isolated;
}
@@ -354,20 +362,25 @@ static struct list_lru_per_memcg *memcg_
return mlru;
}
-static int memcg_init_list_lru_range(struct list_lru_memcg *mlrus,
- int begin, int end)
+static void memcg_list_lru_free(struct list_lru *lru, int src_idx)
{
- int i;
+ struct list_lru_memcg *mlrus;
+ struct list_lru_per_memcg *mlru;
- for (i = begin; i < end; i++) {
- mlrus->mlru[i] = memcg_init_list_lru_one(GFP_KERNEL);
- if (!mlrus->mlru[i])
- goto fail;
- }
- return 0;
-fail:
- memcg_destroy_list_lru_range(mlrus, begin, i);
- return -ENOMEM;
+ spin_lock_irq(&lru->lock);
+ mlrus = rcu_dereference_protected(lru->mlrus, true);
+ mlru = rcu_dereference_protected(mlrus->mlru[src_idx], true);
+ rcu_assign_pointer(mlrus->mlru[src_idx], NULL);
+ spin_unlock_irq(&lru->lock);
+
+ /*
+ * The __list_lru_walk_one() can walk the list of this node.
+ * We need kvfree_rcu() here. And the walking of the list
+ * is under lru->node[nid]->lock, which can serve as a RCU
+ * read-side critical section.
+ */
+ if (mlru)
+ kvfree_rcu(mlru, rcu);
}
static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
@@ -381,14 +394,10 @@ static int memcg_init_list_lru(struct li
spin_lock_init(&lru->lock);
- mlrus = kvmalloc(struct_size(mlrus, mlru, size), GFP_KERNEL);
+ mlrus = kvzalloc(struct_size(mlrus, mlru, size), GFP_KERNEL);
if (!mlrus)
return -ENOMEM;
- if (memcg_init_list_lru_range(mlrus, 0, size)) {
- kvfree(mlrus);
- return -ENOMEM;
- }
RCU_INIT_POINTER(lru->mlrus, mlrus);
return 0;
@@ -422,13 +431,9 @@ static int memcg_update_list_lru(struct
if (!new)
return -ENOMEM;
- if (memcg_init_list_lru_range(new, old_size, new_size)) {
- kvfree(new);
- return -ENOMEM;
- }
-
spin_lock_irq(&lru->lock);
memcpy(&new->mlru, &old->mlru, flex_array_size(new, mlru, old_size));
+ memset(&new->mlru[old_size], 0, flex_array_size(new, mlru, new_size - old_size));
rcu_assign_pointer(lru->mlrus, new);
spin_unlock_irq(&lru->lock);
@@ -436,20 +441,6 @@ static int memcg_update_list_lru(struct
return 0;
}
-static void memcg_cancel_update_list_lru(struct list_lru *lru,
- int old_size, int new_size)
-{
- struct list_lru_memcg *mlrus;
-
- mlrus = rcu_dereference_protected(lru->mlrus,
- lockdep_is_held(&list_lrus_mutex));
- /*
- * Do not bother shrinking the array back to the old size, because we
- * cannot handle allocation failures here.
- */
- memcg_destroy_list_lru_range(mlrus, old_size, new_size);
-}
-
int memcg_update_all_list_lrus(int new_size)
{
int ret = 0;
@@ -460,15 +451,10 @@ int memcg_update_all_list_lrus(int new_s
list_for_each_entry(lru, &memcg_list_lrus, list) {
ret = memcg_update_list_lru(lru, old_size, new_size);
if (ret)
- goto fail;
+ break;
}
-out:
mutex_unlock(&list_lrus_mutex);
return ret;
-fail:
- list_for_each_entry_continue_reverse(lru, &memcg_list_lrus, list)
- memcg_cancel_update_list_lru(lru, old_size, new_size);
- goto out;
}
static void memcg_drain_list_lru_node(struct list_lru *lru, int nid,
@@ -485,6 +471,8 @@ static void memcg_drain_list_lru_node(st
spin_lock_irq(&nlru->lock);
src = list_lru_from_memcg_idx(lru, nid, src_idx);
+ if (!src)
+ goto out;
dst = list_lru_from_memcg_idx(lru, nid, dst_idx);
list_splice_init(&src->list, &dst->list);
@@ -494,7 +482,7 @@ static void memcg_drain_list_lru_node(st
set_shrinker_bit(dst_memcg, nid, lru_shrinker_id(lru));
src->nr_items = 0;
}
-
+out:
spin_unlock_irq(&nlru->lock);
}
@@ -505,15 +493,41 @@ static void memcg_drain_list_lru(struct
for_each_node(i)
memcg_drain_list_lru_node(lru, i, src_idx, dst_memcg);
+
+ memcg_list_lru_free(lru, src_idx);
}
-void memcg_drain_all_list_lrus(int src_idx, struct mem_cgroup *dst_memcg)
+void memcg_drain_all_list_lrus(struct mem_cgroup *src, struct mem_cgroup *dst)
{
+ struct cgroup_subsys_state *css;
struct list_lru *lru;
+ int src_idx = src->kmemcg_id;
+
+ /*
+ * Change kmemcg_id of this cgroup and all its descendants to the
+ * parent's id, and then move all entries from this cgroup's list_lrus
+ * to ones of the parent.
+ *
+ * After we have finished, all list_lrus corresponding to this cgroup
+ * are guaranteed to remain empty. So we can safely free this cgroup's
+ * list lrus in memcg_list_lru_free().
+ *
+ * Changing ->kmemcg_id to the parent can prevent memcg_list_lru_alloc()
+ * from allocating list lrus for this cgroup after memcg_list_lru_free()
+ * call.
+ */
+ rcu_read_lock();
+ css_for_each_descendant_pre(css, &src->css) {
+ struct mem_cgroup *memcg;
+
+ memcg = mem_cgroup_from_css(css);
+ memcg->kmemcg_id = dst->kmemcg_id;
+ }
+ rcu_read_unlock();
mutex_lock(&list_lrus_mutex);
list_for_each_entry(lru, &memcg_list_lrus, list)
- memcg_drain_list_lru(lru, src_idx, dst_memcg);
+ memcg_drain_list_lru(lru, src_idx, dst);
mutex_unlock(&list_lrus_mutex);
}
@@ -528,7 +542,7 @@ static bool memcg_list_lru_allocated(str
return true;
rcu_read_lock();
- allocated = !!rcu_dereference(lru->mlrus)->mlru[idx];
+ allocated = !!rcu_access_pointer(rcu_dereference(lru->mlrus)->mlru[idx]);
rcu_read_unlock();
return allocated;
@@ -576,11 +590,12 @@ int memcg_list_lru_alloc(struct mem_cgro
mlrus = rcu_dereference_protected(lru->mlrus, true);
while (i--) {
int index = table[i].memcg->kmemcg_id;
+ struct list_lru_per_memcg *mlru = table[i].mlru;
- if (mlrus->mlru[index])
- kfree(table[i].mlru);
+ if (index < 0 || rcu_dereference_protected(mlrus->mlru[index], true))
+ kfree(mlru);
else
- mlrus->mlru[index] = table[i].mlru;
+ rcu_assign_pointer(mlrus->mlru[index], mlru);
}
spin_unlock_irqrestore(&lru->lock, flags);
--- a/mm/memcontrol.c~mm-list_lru-allocate-list_lru_one-only-when-needed
+++ a/mm/memcontrol.c
@@ -3709,6 +3709,10 @@ static void memcg_offline_kmem(struct me
memcg_reparent_objcgs(memcg, parent);
+ /*
+ * memcg_drain_all_list_lrus() can change memcg->kmemcg_id.
+ * Cache it to local @kmemcg_id.
+ */
kmemcg_id = memcg->kmemcg_id;
/*
@@ -3717,7 +3721,7 @@ static void memcg_offline_kmem(struct me
* The ordering is imposed by list_lru_node->lock taken by
* memcg_drain_all_list_lrus().
*/
- memcg_drain_all_list_lrus(kmemcg_id, parent);
+ memcg_drain_all_list_lrus(memcg, parent);
memcg_free_cache_id(kmemcg_id);
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
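The reparenting step that patch 055 adds to memcg_drain_all_list_lrus() can be sketched in userspace as below. It is a simplified, single-threaded illustration: the demo_* types are hypothetical, a plain counter stands in for the spliced list, free() stands in for the RCU-deferred kvfree_rcu(), and the sketch assumes the parent's slot already exists whenever the child's does.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAX_CHILDREN 4

/* Hypothetical stand-in for a memcg in a small cgroup tree. */
struct demo_memcg {
	int kmem_id;
	struct demo_memcg *child[DEMO_MAX_CHILDREN];
	int nr_children;
};

struct demo_list {
	long nr_items;
};

/* Pre-order walk: point this group and all descendants at the given id. */
static void demo_set_ids(struct demo_memcg *g, int id)
{
	g->kmem_id = id;
	for (int i = 0; i < g->nr_children; i++)
		demo_set_ids(g->child[i], id);
}

/*
 * Move everything from src's slot to the parent's slot, then free src's slot.
 * The ids are switched first so that no new slot is allocated for src after
 * its old one has been freed.
 */
void demo_reparent(struct demo_list **slot, struct demo_memcg *src,
		   struct demo_memcg *parent)
{
	int src_idx = src->kmem_id;	/* cache it: demo_set_ids() overwrites it */
	int dst_idx = parent->kmem_id;
	struct demo_list *s;

	assert(src_idx != dst_idx);
	demo_set_ids(src, dst_idx);

	s = slot[src_idx];
	if (!s)
		return;			/* never allocated, nothing to move */

	assert(slot[dst_idx] != NULL);	/* assumption stated in the lead-in */
	slot[dst_idx]->nr_items += s->nr_items;
	slot[src_idx] = NULL;
	free(s);			/* the kernel defers this with kvfree_rcu() */
}
```

Caching src_idx before the id walk corresponds to the comment the patch adds in memcg_offline_kmem(), which saves memcg->kmemcg_id in a local variable because the drain call can change it.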
* [patch 056/227] mm: list_lru: rename memcg_drain_all_list_lrus to memcg_reparent_list_lrus
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: list_lru: rename memcg_drain_all_list_lrus to memcg_reparent_list_lrus
The purpose of memcg_drain_all_list_lrus() is to reparent list_lrus,
which is very similar to what memcg_reparent_objcgs() does. Rename it to
memcg_reparent_list_lrus() so that the name is more consistent with
memcg_reparent_objcgs().
Link: https://lkml.kernel.org/r/20220228122126.37293-12-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/list_lru.h | 2 +-
mm/list_lru.c | 24 ++++++++++++------------
mm/memcontrol.c | 6 +++---
3 files changed, 16 insertions(+), 16 deletions(-)
--- a/include/linux/list_lru.h~mm-list_lru-rename-memcg_drain_all_list_lrus-to-memcg_reparent_list_lrus
+++ a/include/linux/list_lru.h
@@ -78,7 +78,7 @@ int __list_lru_init(struct list_lru *lru
int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
gfp_t gfp);
int memcg_update_all_list_lrus(int num_memcgs);
-void memcg_drain_all_list_lrus(struct mem_cgroup *src, struct mem_cgroup *dst);
+void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent);
/**
* list_lru_add: add an element to the lru list's tail
--- a/mm/list_lru.c~mm-list_lru-rename-memcg_drain_all_list_lrus-to-memcg_reparent_list_lrus
+++ a/mm/list_lru.c
@@ -457,8 +457,8 @@ int memcg_update_all_list_lrus(int new_s
return ret;
}
-static void memcg_drain_list_lru_node(struct list_lru *lru, int nid,
- int src_idx, struct mem_cgroup *dst_memcg)
+static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
+ int src_idx, struct mem_cgroup *dst_memcg)
{
struct list_lru_node *nlru = &lru->node[nid];
int dst_idx = dst_memcg->kmemcg_id;
@@ -486,22 +486,22 @@ out:
spin_unlock_irq(&nlru->lock);
}
-static void memcg_drain_list_lru(struct list_lru *lru,
- int src_idx, struct mem_cgroup *dst_memcg)
+static void memcg_reparent_list_lru(struct list_lru *lru,
+ int src_idx, struct mem_cgroup *dst_memcg)
{
int i;
for_each_node(i)
- memcg_drain_list_lru_node(lru, i, src_idx, dst_memcg);
+ memcg_reparent_list_lru_node(lru, i, src_idx, dst_memcg);
memcg_list_lru_free(lru, src_idx);
}
-void memcg_drain_all_list_lrus(struct mem_cgroup *src, struct mem_cgroup *dst)
+void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent)
{
struct cgroup_subsys_state *css;
struct list_lru *lru;
- int src_idx = src->kmemcg_id;
+ int src_idx = memcg->kmemcg_id;
/*
* Change kmemcg_id of this cgroup and all its descendants to the
@@ -517,17 +517,17 @@ void memcg_drain_all_list_lrus(struct me
* call.
*/
rcu_read_lock();
- css_for_each_descendant_pre(css, &src->css) {
- struct mem_cgroup *memcg;
+ css_for_each_descendant_pre(css, &memcg->css) {
+ struct mem_cgroup *child;
- memcg = mem_cgroup_from_css(css);
- memcg->kmemcg_id = dst->kmemcg_id;
+ child = mem_cgroup_from_css(css);
+ child->kmemcg_id = parent->kmemcg_id;
}
rcu_read_unlock();
mutex_lock(&list_lrus_mutex);
list_for_each_entry(lru, &memcg_list_lrus, list)
- memcg_drain_list_lru(lru, src_idx, dst);
+ memcg_reparent_list_lru(lru, src_idx, parent);
mutex_unlock(&list_lrus_mutex);
}
--- a/mm/memcontrol.c~mm-list_lru-rename-memcg_drain_all_list_lrus-to-memcg_reparent_list_lrus
+++ a/mm/memcontrol.c
@@ -3710,7 +3710,7 @@ static void memcg_offline_kmem(struct me
memcg_reparent_objcgs(memcg, parent);
/*
- * memcg_drain_all_list_lrus() can change memcg->kmemcg_id.
+ * memcg_reparent_list_lrus() can change memcg->kmemcg_id.
* Cache it to local @kmemcg_id.
*/
kmemcg_id = memcg->kmemcg_id;
@@ -3719,9 +3719,9 @@ static void memcg_offline_kmem(struct me
* After we have finished memcg_reparent_objcgs(), all list_lrus
* corresponding to this cgroup are guaranteed to remain empty.
* The ordering is imposed by list_lru_node->lock taken by
- * memcg_drain_all_list_lrus().
+ * memcg_reparent_list_lrus().
*/
- memcg_drain_all_list_lrus(memcg, parent);
+ memcg_reparent_list_lrus(memcg, parent);
memcg_free_cache_id(kmemcg_id);
}
_
* [patch 057/227] mm: list_lru: replace linear array with xarray
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: list_lru: replace linear array with xarray
If we run 10k containers in the system, the size of the
list_lru_memcg->mlru array can be ~96KB per list_lru. When the number
of containers decreases, the array is not shrunk, so this does not
scale. The xarray is a good choice for this case: it saves a lot of
memory when there are tens of thousands of containers in the system,
and it lets us remove the array-resizing logic, which simplifies the
code.
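The ~96KB figure can be reproduced from the growth rule in
memcg_alloc_cache_id(), which this patch removes. What follows is a
userspace sketch, not kernel code. It assumes ids are handed out densely
from 0 and never freed, a 64-bit build (8-byte pointers and a 16-byte
struct rcu_head in front of the array), and MEM_CGROUP_ID_MAX equal to
USHRT_MAX. The helper names table_slots_after() and table_bytes() are made
up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the removed code; the MAX value assumes USHRT_MAX. */
#define MEMCG_CACHES_MIN_SIZE 4
#define MEMCG_CACHES_MAX_SIZE 65535

/*
 * Model of the removed growth rule: whenever a new id does not fit,
 * the table is resized to 2 * (id + 1), clamped to [MIN, MAX].
 * Returns the number of slots after ids 0..nr_memcgs-1 were allocated.
 * Hypothetical helper, not a kernel function.
 */
static int table_slots_after(int nr_memcgs)
{
	int nr_cache_ids = 0;	/* plays the role of memcg_nr_cache_ids */
	int id;

	assert(nr_memcgs >= 0 && nr_memcgs <= MEMCG_CACHES_MAX_SIZE);

	for (id = 0; id < nr_memcgs; id++) {
		int size;

		if (id < nr_cache_ids)
			continue;	/* id still fits, no resize */

		size = 2 * (id + 1);
		if (size < MEMCG_CACHES_MIN_SIZE)
			size = MEMCG_CACHES_MIN_SIZE;
		else if (size > MEMCG_CACHES_MAX_SIZE)
			size = MEMCG_CACHES_MAX_SIZE;
		nr_cache_ids = size;
	}
	return nr_cache_ids;
}

/*
 * Bytes used by one per-list_lru table: an assumed 16-byte header
 * (struct rcu_head on 64-bit) plus one assumed 8-byte pointer per slot.
 */
static size_t table_bytes(int slots)
{
	return 16 + (size_t)slots * 8;
}
```

Under those assumptions table_slots_after(10000) gives 12286 slots and
table_bytes(12286) gives 98304 bytes, i.e. 96 KiB for every memcg-aware
list_lru. The old code never shrank the table, so that memory stayed
allocated after the containers were gone.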
[akpm@linux-foundation.org: remove unused local]
Link: https://lkml.kernel.org/r/20220228122126.37293-13-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/list_lru.h | 13 --
include/linux/memcontrol.h | 23 ---
mm/list_lru.c | 203 +++++++++++------------------------
mm/memcontrol.c | 77 -------------
4 files changed, 73 insertions(+), 243 deletions(-)
--- a/include/linux/list_lru.h~mm-list_lru-replace-linear-array-with-xarray
+++ a/include/linux/list_lru.h
@@ -11,6 +11,7 @@
#include <linux/list.h>
#include <linux/nodemask.h>
#include <linux/shrinker.h>
+#include <linux/xarray.h>
struct mem_cgroup;
@@ -37,12 +38,6 @@ struct list_lru_per_memcg {
struct list_lru_one node[];
};
-struct list_lru_memcg {
- struct rcu_head rcu;
- /* array of per cgroup lists, indexed by memcg_cache_id */
- struct list_lru_per_memcg __rcu *mlru[];
-};
-
struct list_lru_node {
/* protects all lists on the node, including per cgroup */
spinlock_t lock;
@@ -57,10 +52,7 @@ struct list_lru {
struct list_head list;
int shrinker_id;
bool memcg_aware;
- /* protects ->mlrus->mlru[i] */
- spinlock_t lock;
- /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
- struct list_lru_memcg __rcu *mlrus;
+ struct xarray xa;
#endif
};
@@ -77,7 +69,6 @@ int __list_lru_init(struct list_lru *lru
int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
gfp_t gfp);
-int memcg_update_all_list_lrus(int num_memcgs);
void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent);
/**
--- a/include/linux/memcontrol.h~mm-list_lru-replace-linear-array-with-xarray
+++ a/include/linux/memcontrol.h
@@ -1685,18 +1685,6 @@ void obj_cgroup_uncharge(struct obj_cgro
extern struct static_key_false memcg_kmem_enabled_key;
-extern int memcg_nr_cache_ids;
-void memcg_get_cache_ids(void);
-void memcg_put_cache_ids(void);
-
-/*
- * Helper macro to loop through all memcg-specific caches. Callers must still
- * check if the cache is valid (it is either valid or NULL).
- * the slab_mutex must be held when looping through those caches
- */
-#define for_each_memcg_cache_index(_idx) \
- for ((_idx) = 0; (_idx) < memcg_nr_cache_ids; (_idx)++)
-
static inline bool memcg_kmem_enabled(void)
{
return static_branch_likely(&memcg_kmem_enabled_key);
@@ -1753,9 +1741,6 @@ static inline void __memcg_kmem_uncharge
{
}
-#define for_each_memcg_cache_index(_idx) \
- for (; NULL; )
-
static inline bool memcg_kmem_enabled(void)
{
return false;
@@ -1766,14 +1751,6 @@ static inline int memcg_cache_id(struct
return -1;
}
-static inline void memcg_get_cache_ids(void)
-{
-}
-
-static inline void memcg_put_cache_ids(void)
-{
-}
-
static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
{
return NULL;
--- a/mm/list_lru.c~mm-list_lru-replace-linear-array-with-xarray
+++ a/mm/list_lru.c
@@ -52,21 +52,12 @@ static int lru_shrinker_id(struct list_l
static inline struct list_lru_one *
list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
{
- struct list_lru_memcg *mlrus;
- struct list_lru_node *nlru = &lru->node[nid];
-
- /*
- * Either lock or RCU protects the array of per cgroup lists
- * from relocation (see memcg_update_list_lru).
- */
- mlrus = rcu_dereference_check(lru->mlrus, lockdep_is_held(&nlru->lock));
- if (mlrus && idx >= 0) {
- struct list_lru_per_memcg *mlru;
+ if (list_lru_memcg_aware(lru) && idx >= 0) {
+ struct list_lru_per_memcg *mlru = xa_load(&lru->xa, idx);
- mlru = rcu_dereference_check(mlrus->mlru[idx], true);
return mlru ? &mlru->node[nid] : NULL;
}
- return &nlru->lru;
+ return &lru->node[nid].lru;
}
static inline struct list_lru_one *
@@ -77,7 +68,7 @@ list_lru_from_kmem(struct list_lru *lru,
struct list_lru_one *l = &nlru->lru;
struct mem_cgroup *memcg = NULL;
- if (!lru->mlrus)
+ if (!list_lru_memcg_aware(lru))
goto out;
memcg = mem_cgroup_from_obj(ptr);
@@ -309,16 +300,20 @@ unsigned long list_lru_walk_node(struct
unsigned long *nr_to_walk)
{
long isolated = 0;
- int memcg_idx;
isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
nr_to_walk);
+
+#ifdef CONFIG_MEMCG_KMEM
if (*nr_to_walk > 0 && list_lru_memcg_aware(lru)) {
- for_each_memcg_cache_index(memcg_idx) {
+ struct list_lru_per_memcg *mlru;
+ unsigned long index;
+
+ xa_for_each(&lru->xa, index, mlru) {
struct list_lru_node *nlru = &lru->node[nid];
spin_lock(&nlru->lock);
- isolated += __list_lru_walk_one(lru, nid, memcg_idx,
+ isolated += __list_lru_walk_one(lru, nid, index,
isolate, cb_arg,
nr_to_walk);
spin_unlock(&nlru->lock);
@@ -327,6 +322,8 @@ unsigned long list_lru_walk_node(struct
break;
}
}
+#endif
+
return isolated;
}
EXPORT_SYMBOL_GPL(list_lru_walk_node);
@@ -338,15 +335,6 @@ static void init_one_lru(struct list_lru
}
#ifdef CONFIG_MEMCG_KMEM
-static void memcg_destroy_list_lru_range(struct list_lru_memcg *mlrus,
- int begin, int end)
-{
- int i;
-
- for (i = begin; i < end; i++)
- kfree(mlrus->mlru[i]);
-}
-
static struct list_lru_per_memcg *memcg_init_list_lru_one(gfp_t gfp)
{
int nid;
@@ -364,14 +352,7 @@ static struct list_lru_per_memcg *memcg_
static void memcg_list_lru_free(struct list_lru *lru, int src_idx)
{
- struct list_lru_memcg *mlrus;
- struct list_lru_per_memcg *mlru;
-
- spin_lock_irq(&lru->lock);
- mlrus = rcu_dereference_protected(lru->mlrus, true);
- mlru = rcu_dereference_protected(mlrus->mlru[src_idx], true);
- rcu_assign_pointer(mlrus->mlru[src_idx], NULL);
- spin_unlock_irq(&lru->lock);
+ struct list_lru_per_memcg *mlru = xa_erase_irq(&lru->xa, src_idx);
/*
* The __list_lru_walk_one() can walk the list of this node.
@@ -383,78 +364,27 @@ static void memcg_list_lru_free(struct l
kvfree_rcu(mlru, rcu);
}
-static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
+static inline void memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
{
- struct list_lru_memcg *mlrus;
- int size = memcg_nr_cache_ids;
-
+ if (memcg_aware)
+ xa_init_flags(&lru->xa, XA_FLAGS_LOCK_IRQ);
lru->memcg_aware = memcg_aware;
- if (!memcg_aware)
- return 0;
-
- spin_lock_init(&lru->lock);
-
- mlrus = kvzalloc(struct_size(mlrus, mlru, size), GFP_KERNEL);
- if (!mlrus)
- return -ENOMEM;
-
- RCU_INIT_POINTER(lru->mlrus, mlrus);
-
- return 0;
}
static void memcg_destroy_list_lru(struct list_lru *lru)
{
- struct list_lru_memcg *mlrus;
+ XA_STATE(xas, &lru->xa, 0);
+ struct list_lru_per_memcg *mlru;
if (!list_lru_memcg_aware(lru))
return;
- /*
- * This is called when shrinker has already been unregistered,
- * and nobody can use it. So, there is no need to use kvfree_rcu().
- */
- mlrus = rcu_dereference_protected(lru->mlrus, true);
- memcg_destroy_list_lru_range(mlrus, 0, memcg_nr_cache_ids);
- kvfree(mlrus);
-}
-
-static int memcg_update_list_lru(struct list_lru *lru, int old_size, int new_size)
-{
- struct list_lru_memcg *old, *new;
-
- BUG_ON(old_size > new_size);
-
- old = rcu_dereference_protected(lru->mlrus,
- lockdep_is_held(&list_lrus_mutex));
- new = kvmalloc(struct_size(new, mlru, new_size), GFP_KERNEL);
- if (!new)
- return -ENOMEM;
-
- spin_lock_irq(&lru->lock);
- memcpy(&new->mlru, &old->mlru, flex_array_size(new, mlru, old_size));
- memset(&new->mlru[old_size], 0, flex_array_size(new, mlru, new_size - old_size));
- rcu_assign_pointer(lru->mlrus, new);
- spin_unlock_irq(&lru->lock);
-
- kvfree_rcu(old, rcu);
- return 0;
-}
-
-int memcg_update_all_list_lrus(int new_size)
-{
- int ret = 0;
- struct list_lru *lru;
- int old_size = memcg_nr_cache_ids;
-
- mutex_lock(&list_lrus_mutex);
- list_for_each_entry(lru, &memcg_list_lrus, list) {
- ret = memcg_update_list_lru(lru, old_size, new_size);
- if (ret)
- break;
+ xas_lock_irq(&xas);
+ xas_for_each(&xas, mlru, ULONG_MAX) {
+ kfree(mlru);
+ xas_store(&xas, NULL);
}
- mutex_unlock(&list_lrus_mutex);
- return ret;
+ xas_unlock_irq(&xas);
}
static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
@@ -521,7 +451,7 @@ void memcg_reparent_list_lrus(struct mem
struct mem_cgroup *child;
child = mem_cgroup_from_css(css);
- child->kmemcg_id = parent->kmemcg_id;
+ WRITE_ONCE(child->kmemcg_id, parent->kmemcg_id);
}
rcu_read_unlock();
@@ -531,21 +461,12 @@ void memcg_reparent_list_lrus(struct mem
mutex_unlock(&list_lrus_mutex);
}
-static bool memcg_list_lru_allocated(struct mem_cgroup *memcg,
- struct list_lru *lru)
+static inline bool memcg_list_lru_allocated(struct mem_cgroup *memcg,
+ struct list_lru *lru)
{
- bool allocated;
- int idx;
-
- idx = memcg->kmemcg_id;
- if (unlikely(idx < 0))
- return true;
+ int idx = memcg->kmemcg_id;
- rcu_read_lock();
- allocated = !!rcu_access_pointer(rcu_dereference(lru->mlrus)->mlru[idx]);
- rcu_read_unlock();
-
- return allocated;
+ return idx < 0 || xa_load(&lru->xa, idx);
}
int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
@@ -553,11 +474,11 @@ int memcg_list_lru_alloc(struct mem_cgro
{
int i;
unsigned long flags;
- struct list_lru_memcg *mlrus;
struct list_lru_memcg_table {
struct list_lru_per_memcg *mlru;
struct mem_cgroup *memcg;
} *table;
+ XA_STATE(xas, &lru->xa, 0);
if (!list_lru_memcg_aware(lru) || memcg_list_lru_allocated(memcg, lru))
return 0;
@@ -586,27 +507,48 @@ int memcg_list_lru_alloc(struct mem_cgro
}
}
- spin_lock_irqsave(&lru->lock, flags);
- mlrus = rcu_dereference_protected(lru->mlrus, true);
+ xas_lock_irqsave(&xas, flags);
while (i--) {
- int index = table[i].memcg->kmemcg_id;
+ int index = READ_ONCE(table[i].memcg->kmemcg_id);
struct list_lru_per_memcg *mlru = table[i].mlru;
- if (index < 0 || rcu_dereference_protected(mlrus->mlru[index], true))
+ xas_set(&xas, index);
+retry:
+ if (unlikely(index < 0 || xas_error(&xas) || xas_load(&xas))) {
kfree(mlru);
- else
- rcu_assign_pointer(mlrus->mlru[index], mlru);
+ } else {
+ xas_store(&xas, mlru);
+ if (xas_error(&xas) == -ENOMEM) {
+ xas_unlock_irqrestore(&xas, flags);
+ if (xas_nomem(&xas, gfp))
+ xas_set_err(&xas, 0);
+ xas_lock_irqsave(&xas, flags);
+ /*
+ * The xas lock has been released, this memcg
+ * can be reparented before us. So reload
+ * memcg id. More details see the comments
+ * in memcg_reparent_list_lrus().
+ */
+ index = READ_ONCE(table[i].memcg->kmemcg_id);
+ if (index < 0)
+ xas_set_err(&xas, 0);
+ else if (!xas_error(&xas) && index != xas.xa_index)
+ xas_set(&xas, index);
+ goto retry;
+ }
+ }
}
- spin_unlock_irqrestore(&lru->lock, flags);
-
+ /* xas_nomem() is used to free memory instead of memory allocation. */
+ if (xas.xa_alloc)
+ xas_nomem(&xas, gfp);
+ xas_unlock_irqrestore(&xas, flags);
kfree(table);
- return 0;
+ return xas_error(&xas);
}
#else
-static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
+static inline void memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
{
- return 0;
}
static void memcg_destroy_list_lru(struct list_lru *lru)
@@ -618,7 +560,6 @@ int __list_lru_init(struct list_lru *lru
struct lock_class_key *key, struct shrinker *shrinker)
{
int i;
- int err = -ENOMEM;
#ifdef CONFIG_MEMCG_KMEM
if (shrinker)
@@ -626,11 +567,10 @@ int __list_lru_init(struct list_lru *lru
else
lru->shrinker_id = -1;
#endif
- memcg_get_cache_ids();
lru->node = kcalloc(nr_node_ids, sizeof(*lru->node), GFP_KERNEL);
if (!lru->node)
- goto out;
+ return -ENOMEM;
for_each_node(i) {
spin_lock_init(&lru->node[i].lock);
@@ -639,18 +579,10 @@ int __list_lru_init(struct list_lru *lru
init_one_lru(&lru->node[i].lru);
}
- err = memcg_init_list_lru(lru, memcg_aware);
- if (err) {
- kfree(lru->node);
- /* Do this so a list_lru_destroy() doesn't crash: */
- lru->node = NULL;
- goto out;
- }
-
+ memcg_init_list_lru(lru, memcg_aware);
list_lru_register(lru);
-out:
- memcg_put_cache_ids();
- return err;
+
+ return 0;
}
EXPORT_SYMBOL_GPL(__list_lru_init);
@@ -660,8 +592,6 @@ void list_lru_destroy(struct list_lru *l
if (!lru->node)
return;
- memcg_get_cache_ids();
-
list_lru_unregister(lru);
memcg_destroy_list_lru(lru);
@@ -671,6 +601,5 @@ void list_lru_destroy(struct list_lru *l
#ifdef CONFIG_MEMCG_KMEM
lru->shrinker_id = -1;
#endif
- memcg_put_cache_ids();
}
EXPORT_SYMBOL_GPL(list_lru_destroy);
--- a/mm/memcontrol.c~mm-list_lru-replace-linear-array-with-xarray
+++ a/mm/memcontrol.c
@@ -351,42 +351,17 @@ static void memcg_reparent_objcgs(struct
* This will be used as a shrinker list's index.
* The main reason for not using cgroup id for this:
* this works better in sparse environments, where we have a lot of memcgs,
- * but only a few kmem-limited. Or also, if we have, for instance, 200
- * memcgs, and none but the 200th is kmem-limited, we'd have to have a
- * 200 entry array for that.
- *
- * The current size of the caches array is stored in memcg_nr_cache_ids. It
- * will double each time we have to increase it.
+ * but only a few kmem-limited.
*/
static DEFINE_IDA(memcg_cache_ida);
-int memcg_nr_cache_ids;
-
-/* Protects memcg_nr_cache_ids */
-static DECLARE_RWSEM(memcg_cache_ids_sem);
-
-void memcg_get_cache_ids(void)
-{
- down_read(&memcg_cache_ids_sem);
-}
-
-void memcg_put_cache_ids(void)
-{
- up_read(&memcg_cache_ids_sem);
-}
/*
- * MIN_SIZE is different than 1, because we would like to avoid going through
- * the alloc/free process all the time. In a small machine, 4 kmem-limited
- * cgroups is a reasonable guess. In the future, it could be a parameter or
- * tunable, but that is strictly not necessary.
- *
* MAX_SIZE should be as large as the number of cgrp_ids. Ideally, we could get
* this constant directly from cgroup, but it is understandable that this is
* better kept as an internal representation in cgroup.c. In any case, the
* cgrp_id space is not getting any smaller, and we don't have to necessarily
* increase ours as well if it increases.
*/
-#define MEMCG_CACHES_MIN_SIZE 4
#define MEMCG_CACHES_MAX_SIZE MEM_CGROUP_ID_MAX
/*
@@ -2944,49 +2919,6 @@ __always_inline struct obj_cgroup *get_o
return objcg;
}
-static int memcg_alloc_cache_id(void)
-{
- int id, size;
- int err;
-
- id = ida_simple_get(&memcg_cache_ida,
- 0, MEMCG_CACHES_MAX_SIZE, GFP_KERNEL);
- if (id < 0)
- return id;
-
- if (id < memcg_nr_cache_ids)
- return id;
-
- /*
- * There's no space for the new id in memcg_caches arrays,
- * so we have to grow them.
- */
- down_write(&memcg_cache_ids_sem);
-
- size = 2 * (id + 1);
- if (size < MEMCG_CACHES_MIN_SIZE)
- size = MEMCG_CACHES_MIN_SIZE;
- else if (size > MEMCG_CACHES_MAX_SIZE)
- size = MEMCG_CACHES_MAX_SIZE;
-
- err = memcg_update_all_list_lrus(size);
- if (!err)
- memcg_nr_cache_ids = size;
-
- up_write(&memcg_cache_ids_sem);
-
- if (err) {
- ida_simple_remove(&memcg_cache_ida, id);
- return err;
- }
- return id;
-}
-
-static void memcg_free_cache_id(int id)
-{
- ida_simple_remove(&memcg_cache_ida, id);
-}
-
static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
{
mod_memcg_state(memcg, MEMCG_KMEM, nr_pages);
@@ -3673,13 +3605,14 @@ static int memcg_online_kmem(struct mem_
if (unlikely(mem_cgroup_is_root(memcg)))
return 0;
- memcg_id = memcg_alloc_cache_id();
+ memcg_id = ida_alloc_max(&memcg_cache_ida, MEMCG_CACHES_MAX_SIZE - 1,
+ GFP_KERNEL);
if (memcg_id < 0)
return memcg_id;
objcg = obj_cgroup_alloc();
if (!objcg) {
- memcg_free_cache_id(memcg_id);
+ ida_free(&memcg_cache_ida, memcg_id);
return -ENOMEM;
}
objcg->memcg = memcg;
@@ -3723,7 +3656,7 @@ static void memcg_offline_kmem(struct me
*/
memcg_reparent_list_lrus(memcg, parent);
- memcg_free_cache_id(kmemcg_id);
+ ida_free(&memcg_cache_ida, kmemcg_id);
}
#else
static int memcg_online_kmem(struct mem_cgroup *memcg)
_
* [patch 057/227] mm: list_lru: replace linear array with xarray
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: list_lru: replace linear array with xarray
If we run 10k containers in the system, the size of the
list_lru_memcg->mlru array can be ~96KB per list_lru. When the number
of containers decreases, the array is not shrunk, so this does not
scale. The xarray is a good choice for this case: it saves a lot of
memory when there are tens of thousands of containers in the system,
and it lets us remove the array-resizing logic, which simplifies the
code.
[akpm@linux-foundation.org: remove unused local]
Link: https://lkml.kernel.org/r/20220228122126.37293-13-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/list_lru.h | 13 --
include/linux/memcontrol.h | 23 ---
mm/list_lru.c | 203 +++++++++++------------------------
mm/memcontrol.c | 77 -------------
4 files changed, 73 insertions(+), 243 deletions(-)
--- a/include/linux/list_lru.h~mm-list_lru-replace-linear-array-with-xarray
+++ a/include/linux/list_lru.h
@@ -11,6 +11,7 @@
#include <linux/list.h>
#include <linux/nodemask.h>
#include <linux/shrinker.h>
+#include <linux/xarray.h>
struct mem_cgroup;
@@ -37,12 +38,6 @@ struct list_lru_per_memcg {
struct list_lru_one node[];
};
-struct list_lru_memcg {
- struct rcu_head rcu;
- /* array of per cgroup lists, indexed by memcg_cache_id */
- struct list_lru_per_memcg __rcu *mlru[];
-};
-
struct list_lru_node {
/* protects all lists on the node, including per cgroup */
spinlock_t lock;
@@ -57,10 +52,7 @@ struct list_lru {
struct list_head list;
int shrinker_id;
bool memcg_aware;
- /* protects ->mlrus->mlru[i] */
- spinlock_t lock;
- /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
- struct list_lru_memcg __rcu *mlrus;
+ struct xarray xa;
#endif
};
@@ -77,7 +69,6 @@ int __list_lru_init(struct list_lru *lru
int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
gfp_t gfp);
-int memcg_update_all_list_lrus(int num_memcgs);
void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent);
/**
--- a/include/linux/memcontrol.h~mm-list_lru-replace-linear-array-with-xarray
+++ a/include/linux/memcontrol.h
@@ -1685,18 +1685,6 @@ void obj_cgroup_uncharge(struct obj_cgro
extern struct static_key_false memcg_kmem_enabled_key;
-extern int memcg_nr_cache_ids;
-void memcg_get_cache_ids(void);
-void memcg_put_cache_ids(void);
-
-/*
- * Helper macro to loop through all memcg-specific caches. Callers must still
- * check if the cache is valid (it is either valid or NULL).
- * the slab_mutex must be held when looping through those caches
- */
-#define for_each_memcg_cache_index(_idx) \
- for ((_idx) = 0; (_idx) < memcg_nr_cache_ids; (_idx)++)
-
static inline bool memcg_kmem_enabled(void)
{
return static_branch_likely(&memcg_kmem_enabled_key);
@@ -1753,9 +1741,6 @@ static inline void __memcg_kmem_uncharge
{
}
-#define for_each_memcg_cache_index(_idx) \
- for (; NULL; )
-
static inline bool memcg_kmem_enabled(void)
{
return false;
@@ -1766,14 +1751,6 @@ static inline int memcg_cache_id(struct
return -1;
}
-static inline void memcg_get_cache_ids(void)
-{
-}
-
-static inline void memcg_put_cache_ids(void)
-{
-}
-
static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
{
return NULL;
--- a/mm/list_lru.c~mm-list_lru-replace-linear-array-with-xarray
+++ a/mm/list_lru.c
@@ -52,21 +52,12 @@ static int lru_shrinker_id(struct list_l
static inline struct list_lru_one *
list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
{
- struct list_lru_memcg *mlrus;
- struct list_lru_node *nlru = &lru->node[nid];
-
- /*
- * Either lock or RCU protects the array of per cgroup lists
- * from relocation (see memcg_update_list_lru).
- */
- mlrus = rcu_dereference_check(lru->mlrus, lockdep_is_held(&nlru->lock));
- if (mlrus && idx >= 0) {
- struct list_lru_per_memcg *mlru;
+ if (list_lru_memcg_aware(lru) && idx >= 0) {
+ struct list_lru_per_memcg *mlru = xa_load(&lru->xa, idx);
- mlru = rcu_dereference_check(mlrus->mlru[idx], true);
return mlru ? &mlru->node[nid] : NULL;
}
- return &nlru->lru;
+ return &lru->node[nid].lru;
}
static inline struct list_lru_one *
@@ -77,7 +68,7 @@ list_lru_from_kmem(struct list_lru *lru,
struct list_lru_one *l = &nlru->lru;
struct mem_cgroup *memcg = NULL;
- if (!lru->mlrus)
+ if (!list_lru_memcg_aware(lru))
goto out;
memcg = mem_cgroup_from_obj(ptr);
@@ -309,16 +300,20 @@ unsigned long list_lru_walk_node(struct
unsigned long *nr_to_walk)
{
long isolated = 0;
- int memcg_idx;
isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
nr_to_walk);
+
+#ifdef CONFIG_MEMCG_KMEM
if (*nr_to_walk > 0 && list_lru_memcg_aware(lru)) {
- for_each_memcg_cache_index(memcg_idx) {
+ struct list_lru_per_memcg *mlru;
+ unsigned long index;
+
+ xa_for_each(&lru->xa, index, mlru) {
struct list_lru_node *nlru = &lru->node[nid];
spin_lock(&nlru->lock);
- isolated += __list_lru_walk_one(lru, nid, memcg_idx,
+ isolated += __list_lru_walk_one(lru, nid, index,
isolate, cb_arg,
nr_to_walk);
spin_unlock(&nlru->lock);
@@ -327,6 +322,8 @@ unsigned long list_lru_walk_node(struct
break;
}
}
+#endif
+
return isolated;
}
EXPORT_SYMBOL_GPL(list_lru_walk_node);
@@ -338,15 +335,6 @@ static void init_one_lru(struct list_lru
}
#ifdef CONFIG_MEMCG_KMEM
-static void memcg_destroy_list_lru_range(struct list_lru_memcg *mlrus,
- int begin, int end)
-{
- int i;
-
- for (i = begin; i < end; i++)
- kfree(mlrus->mlru[i]);
-}
-
static struct list_lru_per_memcg *memcg_init_list_lru_one(gfp_t gfp)
{
int nid;
@@ -364,14 +352,7 @@ static struct list_lru_per_memcg *memcg_
static void memcg_list_lru_free(struct list_lru *lru, int src_idx)
{
- struct list_lru_memcg *mlrus;
- struct list_lru_per_memcg *mlru;
-
- spin_lock_irq(&lru->lock);
- mlrus = rcu_dereference_protected(lru->mlrus, true);
- mlru = rcu_dereference_protected(mlrus->mlru[src_idx], true);
- rcu_assign_pointer(mlrus->mlru[src_idx], NULL);
- spin_unlock_irq(&lru->lock);
+ struct list_lru_per_memcg *mlru = xa_erase_irq(&lru->xa, src_idx);
/*
* The __list_lru_walk_one() can walk the list of this node.
@@ -383,78 +364,27 @@ static void memcg_list_lru_free(struct l
kvfree_rcu(mlru, rcu);
}
-static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
+static inline void memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
{
- struct list_lru_memcg *mlrus;
- int size = memcg_nr_cache_ids;
-
+ if (memcg_aware)
+ xa_init_flags(&lru->xa, XA_FLAGS_LOCK_IRQ);
lru->memcg_aware = memcg_aware;
- if (!memcg_aware)
- return 0;
-
- spin_lock_init(&lru->lock);
-
- mlrus = kvzalloc(struct_size(mlrus, mlru, size), GFP_KERNEL);
- if (!mlrus)
- return -ENOMEM;
-
- RCU_INIT_POINTER(lru->mlrus, mlrus);
-
- return 0;
}
static void memcg_destroy_list_lru(struct list_lru *lru)
{
- struct list_lru_memcg *mlrus;
+ XA_STATE(xas, &lru->xa, 0);
+ struct list_lru_per_memcg *mlru;
if (!list_lru_memcg_aware(lru))
return;
- /*
- * This is called when shrinker has already been unregistered,
- * and nobody can use it. So, there is no need to use kvfree_rcu().
- */
- mlrus = rcu_dereference_protected(lru->mlrus, true);
- memcg_destroy_list_lru_range(mlrus, 0, memcg_nr_cache_ids);
- kvfree(mlrus);
-}
-
-static int memcg_update_list_lru(struct list_lru *lru, int old_size, int new_size)
-{
- struct list_lru_memcg *old, *new;
-
- BUG_ON(old_size > new_size);
-
- old = rcu_dereference_protected(lru->mlrus,
- lockdep_is_held(&list_lrus_mutex));
- new = kvmalloc(struct_size(new, mlru, new_size), GFP_KERNEL);
- if (!new)
- return -ENOMEM;
-
- spin_lock_irq(&lru->lock);
- memcpy(&new->mlru, &old->mlru, flex_array_size(new, mlru, old_size));
- memset(&new->mlru[old_size], 0, flex_array_size(new, mlru, new_size - old_size));
- rcu_assign_pointer(lru->mlrus, new);
- spin_unlock_irq(&lru->lock);
-
- kvfree_rcu(old, rcu);
- return 0;
-}
-
-int memcg_update_all_list_lrus(int new_size)
-{
- int ret = 0;
- struct list_lru *lru;
- int old_size = memcg_nr_cache_ids;
-
- mutex_lock(&list_lrus_mutex);
- list_for_each_entry(lru, &memcg_list_lrus, list) {
- ret = memcg_update_list_lru(lru, old_size, new_size);
- if (ret)
- break;
+ xas_lock_irq(&xas);
+ xas_for_each(&xas, mlru, ULONG_MAX) {
+ kfree(mlru);
+ xas_store(&xas, NULL);
}
- mutex_unlock(&list_lrus_mutex);
- return ret;
+ xas_unlock_irq(&xas);
}
static void memcg_reparent_list_lru_node(struct list_lru *lru, int nid,
@@ -521,7 +451,7 @@ void memcg_reparent_list_lrus(struct mem
struct mem_cgroup *child;
child = mem_cgroup_from_css(css);
- child->kmemcg_id = parent->kmemcg_id;
+ WRITE_ONCE(child->kmemcg_id, parent->kmemcg_id);
}
rcu_read_unlock();
@@ -531,21 +461,12 @@ void memcg_reparent_list_lrus(struct mem
mutex_unlock(&list_lrus_mutex);
}
-static bool memcg_list_lru_allocated(struct mem_cgroup *memcg,
- struct list_lru *lru)
+static inline bool memcg_list_lru_allocated(struct mem_cgroup *memcg,
+ struct list_lru *lru)
{
- bool allocated;
- int idx;
-
- idx = memcg->kmemcg_id;
- if (unlikely(idx < 0))
- return true;
+ int idx = memcg->kmemcg_id;
- rcu_read_lock();
- allocated = !!rcu_access_pointer(rcu_dereference(lru->mlrus)->mlru[idx]);
- rcu_read_unlock();
-
- return allocated;
+ return idx < 0 || xa_load(&lru->xa, idx);
}
int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
@@ -553,11 +474,11 @@ int memcg_list_lru_alloc(struct mem_cgro
{
int i;
unsigned long flags;
- struct list_lru_memcg *mlrus;
struct list_lru_memcg_table {
struct list_lru_per_memcg *mlru;
struct mem_cgroup *memcg;
} *table;
+ XA_STATE(xas, &lru->xa, 0);
if (!list_lru_memcg_aware(lru) || memcg_list_lru_allocated(memcg, lru))
return 0;
@@ -586,27 +507,48 @@ int memcg_list_lru_alloc(struct mem_cgro
}
}
- spin_lock_irqsave(&lru->lock, flags);
- mlrus = rcu_dereference_protected(lru->mlrus, true);
+ xas_lock_irqsave(&xas, flags);
while (i--) {
- int index = table[i].memcg->kmemcg_id;
+ int index = READ_ONCE(table[i].memcg->kmemcg_id);
struct list_lru_per_memcg *mlru = table[i].mlru;
- if (index < 0 || rcu_dereference_protected(mlrus->mlru[index], true))
+ xas_set(&xas, index);
+retry:
+ if (unlikely(index < 0 || xas_error(&xas) || xas_load(&xas))) {
kfree(mlru);
- else
- rcu_assign_pointer(mlrus->mlru[index], mlru);
+ } else {
+ xas_store(&xas, mlru);
+ if (xas_error(&xas) == -ENOMEM) {
+ xas_unlock_irqrestore(&xas, flags);
+ if (xas_nomem(&xas, gfp))
+ xas_set_err(&xas, 0);
+ xas_lock_irqsave(&xas, flags);
+ /*
+ * The xas lock has been released, this memcg
+ * can be reparented before us. So reload
+ * memcg id. More details see the comments
+ * in memcg_reparent_list_lrus().
+ */
+ index = READ_ONCE(table[i].memcg->kmemcg_id);
+ if (index < 0)
+ xas_set_err(&xas, 0);
+ else if (!xas_error(&xas) && index != xas.xa_index)
+ xas_set(&xas, index);
+ goto retry;
+ }
+ }
}
- spin_unlock_irqrestore(&lru->lock, flags);
-
+ /* xas_nomem() is used to free memory instead of memory allocation. */
+ if (xas.xa_alloc)
+ xas_nomem(&xas, gfp);
+ xas_unlock_irqrestore(&xas, flags);
kfree(table);
- return 0;
+ return xas_error(&xas);
}
#else
-static int memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
+static inline void memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
{
- return 0;
}
static void memcg_destroy_list_lru(struct list_lru *lru)
@@ -618,7 +560,6 @@ int __list_lru_init(struct list_lru *lru
struct lock_class_key *key, struct shrinker *shrinker)
{
int i;
- int err = -ENOMEM;
#ifdef CONFIG_MEMCG_KMEM
if (shrinker)
@@ -626,11 +567,10 @@ int __list_lru_init(struct list_lru *lru
else
lru->shrinker_id = -1;
#endif
- memcg_get_cache_ids();
lru->node = kcalloc(nr_node_ids, sizeof(*lru->node), GFP_KERNEL);
if (!lru->node)
- goto out;
+ return -ENOMEM;
for_each_node(i) {
spin_lock_init(&lru->node[i].lock);
@@ -639,18 +579,10 @@ int __list_lru_init(struct list_lru *lru
init_one_lru(&lru->node[i].lru);
}
- err = memcg_init_list_lru(lru, memcg_aware);
- if (err) {
- kfree(lru->node);
- /* Do this so a list_lru_destroy() doesn't crash: */
- lru->node = NULL;
- goto out;
- }
-
+ memcg_init_list_lru(lru, memcg_aware);
list_lru_register(lru);
-out:
- memcg_put_cache_ids();
- return err;
+
+ return 0;
}
EXPORT_SYMBOL_GPL(__list_lru_init);
@@ -660,8 +592,6 @@ void list_lru_destroy(struct list_lru *l
if (!lru->node)
return;
- memcg_get_cache_ids();
-
list_lru_unregister(lru);
memcg_destroy_list_lru(lru);
@@ -671,6 +601,5 @@ void list_lru_destroy(struct list_lru *l
#ifdef CONFIG_MEMCG_KMEM
lru->shrinker_id = -1;
#endif
- memcg_put_cache_ids();
}
EXPORT_SYMBOL_GPL(list_lru_destroy);
--- a/mm/memcontrol.c~mm-list_lru-replace-linear-array-with-xarray
+++ a/mm/memcontrol.c
@@ -351,42 +351,17 @@ static void memcg_reparent_objcgs(struct
* This will be used as a shrinker list's index.
* The main reason for not using cgroup id for this:
* this works better in sparse environments, where we have a lot of memcgs,
- * but only a few kmem-limited. Or also, if we have, for instance, 200
- * memcgs, and none but the 200th is kmem-limited, we'd have to have a
- * 200 entry array for that.
- *
- * The current size of the caches array is stored in memcg_nr_cache_ids. It
- * will double each time we have to increase it.
+ * but only a few kmem-limited.
*/
static DEFINE_IDA(memcg_cache_ida);
-int memcg_nr_cache_ids;
-
-/* Protects memcg_nr_cache_ids */
-static DECLARE_RWSEM(memcg_cache_ids_sem);
-
-void memcg_get_cache_ids(void)
-{
- down_read(&memcg_cache_ids_sem);
-}
-
-void memcg_put_cache_ids(void)
-{
- up_read(&memcg_cache_ids_sem);
-}
/*
- * MIN_SIZE is different than 1, because we would like to avoid going through
- * the alloc/free process all the time. In a small machine, 4 kmem-limited
- * cgroups is a reasonable guess. In the future, it could be a parameter or
- * tunable, but that is strictly not necessary.
- *
* MAX_SIZE should be as large as the number of cgrp_ids. Ideally, we could get
* this constant directly from cgroup, but it is understandable that this is
* better kept as an internal representation in cgroup.c. In any case, the
* cgrp_id space is not getting any smaller, and we don't have to necessarily
* increase ours as well if it increases.
*/
-#define MEMCG_CACHES_MIN_SIZE 4
#define MEMCG_CACHES_MAX_SIZE MEM_CGROUP_ID_MAX
/*
@@ -2944,49 +2919,6 @@ __always_inline struct obj_cgroup *get_o
return objcg;
}
-static int memcg_alloc_cache_id(void)
-{
- int id, size;
- int err;
-
- id = ida_simple_get(&memcg_cache_ida,
- 0, MEMCG_CACHES_MAX_SIZE, GFP_KERNEL);
- if (id < 0)
- return id;
-
- if (id < memcg_nr_cache_ids)
- return id;
-
- /*
- * There's no space for the new id in memcg_caches arrays,
- * so we have to grow them.
- */
- down_write(&memcg_cache_ids_sem);
-
- size = 2 * (id + 1);
- if (size < MEMCG_CACHES_MIN_SIZE)
- size = MEMCG_CACHES_MIN_SIZE;
- else if (size > MEMCG_CACHES_MAX_SIZE)
- size = MEMCG_CACHES_MAX_SIZE;
-
- err = memcg_update_all_list_lrus(size);
- if (!err)
- memcg_nr_cache_ids = size;
-
- up_write(&memcg_cache_ids_sem);
-
- if (err) {
- ida_simple_remove(&memcg_cache_ida, id);
- return err;
- }
- return id;
-}
-
-static void memcg_free_cache_id(int id)
-{
- ida_simple_remove(&memcg_cache_ida, id);
-}
-
static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
{
mod_memcg_state(memcg, MEMCG_KMEM, nr_pages);
@@ -3673,13 +3605,14 @@ static int memcg_online_kmem(struct mem_
if (unlikely(mem_cgroup_is_root(memcg)))
return 0;
- memcg_id = memcg_alloc_cache_id();
+ memcg_id = ida_alloc_max(&memcg_cache_ida, MEMCG_CACHES_MAX_SIZE - 1,
+ GFP_KERNEL);
if (memcg_id < 0)
return memcg_id;
objcg = obj_cgroup_alloc();
if (!objcg) {
- memcg_free_cache_id(memcg_id);
+ ida_free(&memcg_cache_ida, memcg_id);
return -ENOMEM;
}
objcg->memcg = memcg;
@@ -3723,7 +3656,7 @@ static void memcg_offline_kmem(struct me
*/
memcg_reparent_list_lrus(memcg, parent);
- memcg_free_cache_id(kmemcg_id);
+ ida_free(&memcg_cache_ida, kmemcg_id);
}
#else
static int memcg_online_kmem(struct mem_cgroup *memcg)
_
* [patch 058/227] mm: memcontrol: reuse memory cgroup ID for kmem ID
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: memcontrol: reuse memory cgroup ID for kmem ID
The memory cgroup code uses two ID allocators: an IDA for the kmem ID and
an IDR for the memory cgroup ID. The maximum ID of both is 64Ki, so either
of them can limit the total number of memory cgroups. Reuse the memory
cgroup ID as the kmem ID to simplify the code.
Link: https://lkml.kernel.org/r/20220228122126.37293-14-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 39 +++------------------------------------
1 file changed, 3 insertions(+), 36 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-reuse-memory-cgroup-id-for-kmem-id
+++ a/mm/memcontrol.c
@@ -348,23 +348,6 @@ static void memcg_reparent_objcgs(struct
}
/*
- * This will be used as a shrinker list's index.
- * The main reason for not using cgroup id for this:
- * this works better in sparse environments, where we have a lot of memcgs,
- * but only a few kmem-limited.
- */
-static DEFINE_IDA(memcg_cache_ida);
-
-/*
- * MAX_SIZE should be as large as the number of cgrp_ids. Ideally, we could get
- * this constant directly from cgroup, but it is understandable that this is
- * better kept as an internal representation in cgroup.c. In any case, the
- * cgrp_id space is not getting any smaller, and we don't have to necessarily
- * increase ours as well if it increases.
- */
-#define MEMCG_CACHES_MAX_SIZE MEM_CGROUP_ID_MAX
-
-/*
* A lot of the calls to the cache allocation functions are expected to be
* inlined by the compiler. Since the calls to memcg_slab_pre_alloc_hook() are
* conditional to this static branch, we'll have to allow modules that does
@@ -3597,7 +3580,6 @@ static u64 mem_cgroup_read_u64(struct cg
static int memcg_online_kmem(struct mem_cgroup *memcg)
{
struct obj_cgroup *objcg;
- int memcg_id;
if (cgroup_memory_nokmem)
return 0;
@@ -3605,22 +3587,16 @@ static int memcg_online_kmem(struct mem_
if (unlikely(mem_cgroup_is_root(memcg)))
return 0;
- memcg_id = ida_alloc_max(&memcg_cache_ida, MEMCG_CACHES_MAX_SIZE - 1,
- GFP_KERNEL);
- if (memcg_id < 0)
- return memcg_id;
-
objcg = obj_cgroup_alloc();
- if (!objcg) {
- ida_free(&memcg_cache_ida, memcg_id);
+ if (!objcg)
return -ENOMEM;
- }
+
objcg->memcg = memcg;
rcu_assign_pointer(memcg->objcg, objcg);
static_branch_enable(&memcg_kmem_enabled_key);
- memcg->kmemcg_id = memcg_id;
+ memcg->kmemcg_id = memcg->id.id;
return 0;
}
@@ -3628,7 +3604,6 @@ static int memcg_online_kmem(struct mem_
static void memcg_offline_kmem(struct mem_cgroup *memcg)
{
struct mem_cgroup *parent;
- int kmemcg_id;
if (cgroup_memory_nokmem)
return;
@@ -3643,20 +3618,12 @@ static void memcg_offline_kmem(struct me
memcg_reparent_objcgs(memcg, parent);
/*
- * memcg_reparent_list_lrus() can change memcg->kmemcg_id.
- * Cache it to local @kmemcg_id.
- */
- kmemcg_id = memcg->kmemcg_id;
-
- /*
* After we have finished memcg_reparent_objcgs(), all list_lrus
* corresponding to this cgroup are guaranteed to remain empty.
* The ordering is imposed by list_lru_node->lock taken by
* memcg_reparent_list_lrus().
*/
memcg_reparent_list_lrus(memcg, parent);
-
- ida_free(&memcg_cache_ida, kmemcg_id);
}
#else
static int memcg_online_kmem(struct mem_cgroup *memcg)
_
* [patch 059/227] mm: memcontrol: fix cannot alloc the maximum memcg ID
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: memcontrol: fix cannot alloc the maximum memcg ID
idr_alloc() treats its @end argument as exclusive, so in the current
implementation the maximum memcg ID is 65534 instead of 65535. This looks
like a bug, so pass MEM_CGROUP_ID_MAX + 1 as the end of the range.
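For illustration only (not part of the patch): a minimal userspace sketch of the half-open range semantics described above. The names `model_ids`, `model_alloc_range()`, `model_highest_id()` and `MODEL_ID_MAX` are hypothetical stand-ins, the allocator never frees IDs, and it returns -1 on exhaustion where the kernel's idr_alloc() would return a negative errno.

```c
#include <assert.h>

/* Stands in for MEM_CGROUP_ID_MAX; the value is taken from the commit text. */
#define MODEL_ID_MAX 65535

/* Hypothetical allocator state: the lowest ID that has not been handed out. */
struct model_ids {
	int next_free;
};

/*
 * Hand out the lowest unused ID in the half-open range [start, end).
 * Like idr_alloc(), @end itself is never returned. Returns -1 when the
 * range is exhausted (an assumption of this model, not the kernel's
 * error code).
 */
static int model_alloc_range(struct model_ids *ids, int start, int end)
{
	int id;

	assert(start >= 0 && start <= end);

	id = ids->next_free < start ? start : ids->next_free;
	if (id >= end)
		return -1;

	ids->next_free = id + 1;
	return id;
}

/* Allocate from [1, end) until exhaustion; return the highest ID seen. */
static int model_highest_id(int end)
{
	struct model_ids ids = { 0 };
	int id, highest = -1;

	while ((id = model_alloc_range(&ids, 1, end)) >= 0)
		highest = id;

	return highest;
}
```

In this model, `model_highest_id(MODEL_ID_MAX)` corresponds to the old call and tops out one below the intended maximum, while `model_highest_id(MODEL_ID_MAX + 1)` corresponds to the fixed call and reaches it.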
Link: https://lkml.kernel.org/r/20220228122126.37293-15-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-cannot-alloc-the-maximum-memcg-id
+++ a/mm/memcontrol.c
@@ -5088,8 +5088,7 @@ static struct mem_cgroup *mem_cgroup_all
return ERR_PTR(error);
memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
- 1, MEM_CGROUP_ID_MAX,
- GFP_KERNEL);
+ 1, MEM_CGROUP_ID_MAX + 1, GFP_KERNEL);
if (memcg->id.id < 0) {
error = memcg->id.id;
goto fail;
_
* [patch 060/227] mm: list_lru: rename list_lru_per_memcg to list_lru_memcg
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: list_lru: rename list_lru_per_memcg to list_lru_memcg
The name list_lru_memcg was previously taken and became free with the
last commit. Rename list_lru_per_memcg to list_lru_memcg, since the name
is shorter.
Link: https://lkml.kernel.org/r/20220228122126.37293-16-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/list_lru.h | 2 +-
mm/list_lru.c | 18 +++++++++---------
2 files changed, 10 insertions(+), 10 deletions(-)
--- a/include/linux/list_lru.h~mm-list_lru-rename-list_lru_per_memcg-to-list_lru_memcg
+++ a/include/linux/list_lru.h
@@ -32,7 +32,7 @@ struct list_lru_one {
long nr_items;
};
-struct list_lru_per_memcg {
+struct list_lru_memcg {
struct rcu_head rcu;
/* array of per cgroup per node lists, indexed by node id */
struct list_lru_one node[];
--- a/mm/list_lru.c~mm-list_lru-rename-list_lru_per_memcg-to-list_lru_memcg
+++ a/mm/list_lru.c
@@ -53,7 +53,7 @@ static inline struct list_lru_one *
list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
{
if (list_lru_memcg_aware(lru) && idx >= 0) {
- struct list_lru_per_memcg *mlru = xa_load(&lru->xa, idx);
+ struct list_lru_memcg *mlru = xa_load(&lru->xa, idx);
return mlru ? &mlru->node[nid] : NULL;
}
@@ -306,7 +306,7 @@ unsigned long list_lru_walk_node(struct
#ifdef CONFIG_MEMCG_KMEM
if (*nr_to_walk > 0 && list_lru_memcg_aware(lru)) {
- struct list_lru_per_memcg *mlru;
+ struct list_lru_memcg *mlru;
unsigned long index;
xa_for_each(&lru->xa, index, mlru) {
@@ -335,10 +335,10 @@ static void init_one_lru(struct list_lru
}
#ifdef CONFIG_MEMCG_KMEM
-static struct list_lru_per_memcg *memcg_init_list_lru_one(gfp_t gfp)
+static struct list_lru_memcg *memcg_init_list_lru_one(gfp_t gfp)
{
int nid;
- struct list_lru_per_memcg *mlru;
+ struct list_lru_memcg *mlru;
mlru = kmalloc(struct_size(mlru, node, nr_node_ids), gfp);
if (!mlru)
@@ -352,7 +352,7 @@ static struct list_lru_per_memcg *memcg_
static void memcg_list_lru_free(struct list_lru *lru, int src_idx)
{
- struct list_lru_per_memcg *mlru = xa_erase_irq(&lru->xa, src_idx);
+ struct list_lru_memcg *mlru = xa_erase_irq(&lru->xa, src_idx);
/*
* The __list_lru_walk_one() can walk the list of this node.
@@ -374,7 +374,7 @@ static inline void memcg_init_list_lru(s
static void memcg_destroy_list_lru(struct list_lru *lru)
{
XA_STATE(xas, &lru->xa, 0);
- struct list_lru_per_memcg *mlru;
+ struct list_lru_memcg *mlru;
if (!list_lru_memcg_aware(lru))
return;
@@ -475,7 +475,7 @@ int memcg_list_lru_alloc(struct mem_cgro
int i;
unsigned long flags;
struct list_lru_memcg_table {
- struct list_lru_per_memcg *mlru;
+ struct list_lru_memcg *mlru;
struct mem_cgroup *memcg;
} *table;
XA_STATE(xas, &lru->xa, 0);
@@ -491,7 +491,7 @@ int memcg_list_lru_alloc(struct mem_cgro
/*
* Because the list_lru can be reparented to the parent cgroup's
* list_lru, we should make sure that this cgroup and all its
- * ancestors have allocated list_lru_per_memcg.
+ * ancestors have allocated list_lru_memcg.
*/
for (i = 0; memcg; memcg = parent_mem_cgroup(memcg), i++) {
if (memcg_list_lru_allocated(memcg, lru))
@@ -510,7 +510,7 @@ int memcg_list_lru_alloc(struct mem_cgro
xas_lock_irqsave(&xas, flags);
while (i--) {
int index = READ_ONCE(table[i].memcg->kmemcg_id);
- struct list_lru_per_memcg *mlru = table[i].mlru;
+ struct list_lru_memcg *mlru = table[i].mlru;
xas_set(&xas, index);
retry:
_
* [patch 061/227] mm: memcontrol: rename memcg_cache_id to memcg_kmem_id
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: zhengqi.arch, willy, vdavydov.dev, vbabka, tytso,
trond.myklebust, shy828301, shakeelb, roman.gushchin,
richard.weiyang, mhocko, kari.argillander, jaegeuk, hannes,
fam.zheng, duanxiongchun, david, chao, Anna.Schumaker, alexs,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: memcontrol: rename memcg_cache_id to memcg_kmem_id
memcg_cache_id(), introduced by commit 2633d7a02823 ("slab/slub:
consider a memcg parameter in kmem_create_cache"), was used to index the
kmem_cache->memcg_params->memcg_caches array. That array has since been
removed by commit
9855609bde03 ("mm: memcg/slab: use a single set of kmem_caches for all
accounted allocations"), so the name no longer needs to refer to caches.
Rename it to memcg_kmem_id(), which reflects that the ID is kmem
related.
Link: https://lkml.kernel.org/r/20220228122126.37293-17-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/memcontrol.h | 4 ++--
mm/list_lru.c | 8 ++++----
2 files changed, 6 insertions(+), 6 deletions(-)
--- a/include/linux/memcontrol.h~mm-memcontrol-rename-memcg_cache_id-to-memcg_kmem_id
+++ a/include/linux/memcontrol.h
@@ -1708,7 +1708,7 @@ static inline void memcg_kmem_uncharge_p
* A helper for accessing memcg's kmem_id, used for getting
* corresponding LRU lists.
*/
-static inline int memcg_cache_id(struct mem_cgroup *memcg)
+static inline int memcg_kmem_id(struct mem_cgroup *memcg)
{
return memcg ? memcg->kmemcg_id : -1;
}
@@ -1746,7 +1746,7 @@ static inline bool memcg_kmem_enabled(vo
return false;
}
-static inline int memcg_cache_id(struct mem_cgroup *memcg)
+static inline int memcg_kmem_id(struct mem_cgroup *memcg)
{
return -1;
}
--- a/mm/list_lru.c~mm-memcontrol-rename-memcg_cache_id-to-memcg_kmem_id
+++ a/mm/list_lru.c
@@ -75,7 +75,7 @@ list_lru_from_kmem(struct list_lru *lru,
if (!memcg)
goto out;
- l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg));
+ l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
out:
if (memcg_ptr)
*memcg_ptr = memcg;
@@ -182,7 +182,7 @@ unsigned long list_lru_count_one(struct
long count;
rcu_read_lock();
- l = list_lru_from_memcg_idx(lru, nid, memcg_cache_id(memcg));
+ l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
count = l ? READ_ONCE(l->nr_items) : 0;
rcu_read_unlock();
@@ -273,7 +273,7 @@ list_lru_walk_one(struct list_lru *lru,
unsigned long ret;
spin_lock(&nlru->lock);
- ret = __list_lru_walk_one(lru, nid, memcg_cache_id(memcg), isolate,
+ ret = __list_lru_walk_one(lru, nid, memcg_kmem_id(memcg), isolate,
cb_arg, nr_to_walk);
spin_unlock(&nlru->lock);
return ret;
@@ -289,7 +289,7 @@ list_lru_walk_one_irq(struct list_lru *l
unsigned long ret;
spin_lock_irq(&nlru->lock);
- ret = __list_lru_walk_one(lru, nid, memcg_cache_id(memcg), isolate,
+ ret = __list_lru_walk_one(lru, nid, memcg_kmem_id(memcg), isolate,
cb_arg, nr_to_walk);
spin_unlock_irq(&nlru->lock);
return ret;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
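To illustrate what the renamed helper in patch 061 returns, here is a minimal user-space sketch. The memcg_kmem_id() body follows the diff above; struct mem_cgroup is reduced to a hypothetical one-field stand-in, and lru_slot() is an invented helper (it is not the kernel's list_lru code) that only shows how a negative ID can select a shared fallback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct mem_cgroup. */
struct mem_cgroup {
	int kmemcg_id;	/* ID used to find this cgroup's per-memcg LRU lists */
};

/* Same logic as the renamed helper in the diff: no memcg means -1. */
static inline int memcg_kmem_id(const struct mem_cgroup *memcg)
{
	return memcg ? memcg->kmemcg_id : -1;
}

/*
 * Hypothetical lookup (not the kernel's list_lru code): return the
 * per-memcg slot for a valid ID, or the shared fallback slot otherwise.
 */
static int *lru_slot(int *per_memcg, size_t nr, int *fallback, int idx)
{
	if (idx >= 0 && (size_t)idx < nr)
		return &per_memcg[idx];
	return fallback;
}
```

Under these assumptions, lru_slot(slots, nr, &global, memcg_kmem_id(NULL)) yields the fallback slot, while a cgroup whose kmemcg_id is 2 selects slots[2].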
* [patch 062/227] memcg: enable accounting for tty-related objects
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: vdavydov.dev, shakeelb, roman.gushchin, mhocko, jirislaby,
hannes, gregkh, vvs, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Vasily Averin <vvs@virtuozzo.com>
Subject: memcg: enable accounting for tty-related objects
At each login the user forces the kernel to create a new terminal and
allocate up to ~1KB of memory for the tty-related structures.
By default up to 4096 ptys may be created, with 1024 reserved for the
initial mount namespace only; these settings are controlled by the host
admin.
However, this default is not enough for hosters with thousands of containers
per node, so the host admin can be forced to raise it up to NR_UNIX98_PTY_MAX
= 1<<20.
By default a container is restricted by pty mount_opt.max = 1024, but an admin
inside the container can change it via remount. As a result, one container
can consume almost all allowed ptys and allocate up to 1GB of unaccounted
memory.
This is not enough per se to trigger OOM on the host, but it allows the
container to significantly exceed its assigned memcg limit and causes trouble
on an over-committed node.
It makes sense to account for these allocations to restrict the host's memory
consumption from inside the memcg-limited container.
Link: https://lkml.kernel.org/r/5d4bca06-7d4f-a905-e518-12981ebca1b3@virtuozzo.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
drivers/tty/tty_io.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/tty/tty_io.c~memcg-enable-accounting-for-tty-related-objects
+++ a/drivers/tty/tty_io.c
@@ -3088,7 +3088,7 @@ struct tty_struct *alloc_tty_struct(stru
{
struct tty_struct *tty;
- tty = kzalloc(sizeof(*tty), GFP_KERNEL);
+ tty = kzalloc(sizeof(*tty), GFP_KERNEL_ACCOUNT);
if (!tty)
return NULL;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
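The "up to 1GB" figure in patch 062 follows from multiplying the pty limit by the per-tty allocation. The sketch below only reproduces that arithmetic; TTY_BYTES_ESTIMATE (taken as 1024 to match the "~1KB" in the message) and worst_case_tty_bytes() are names assumed for illustration, and the real per-tty footprint depends on the kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on ptys quoted in the commit message (1<<20). */
#define NR_UNIX98_PTY_MAX	(UINT64_C(1) << 20)

/* Assumption: roughly 1KB of tty-related structures per terminal. */
#define TTY_BYTES_ESTIMATE	UINT64_C(1024)

/* Worst-case memory, in bytes, for nr_ptys terminals. */
static uint64_t worst_case_tty_bytes(uint64_t nr_ptys, uint64_t bytes_per_tty)
{
	return nr_ptys * bytes_per_tty;
}
```

With these estimates, the default limit of 4096 ptys corresponds to about 4MB, while NR_UNIX98_PTY_MAX corresponds to about 1GB, which is the unaccounted amount the patch brings under memcg control.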
* [patch 063/227] selftests, x86: fix how check_cc.sh is being invoked
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: shuah, groeck, dave.hansen, bp, bot, guillaume.tucker, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Guillaume Tucker <guillaume.tucker@collabora.com>
Subject: selftests, x86: fix how check_cc.sh is being invoked
The $(CC) variable used in Makefiles could contain several arguments such
as "ccache gcc". These need to be passed as a single string to
check_cc.sh, otherwise only the first argument will be used as the
compiler command. Without quotes, the $(CC) variable is passed as
distinct arguments which causes the script to fail to build trivial
programs.
Fix this by adding quotes around $(CC) when calling check_cc.sh to pass
the whole string as a single argument to the script even if it has several
words such as "ccache gcc".
Link: https://lkml.kernel.org/r/d0d460d7be0107a69e3c52477761a6fe694c1840.1646991629.git.guillaume.tucker@collabora.com
Fixes: e9886ace222e ("selftests, x86: Rework x86 target architecture detection")
Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Tested-by: "kernelci.org bot" <bot@kernelci.org>
Reviewed-by: Guenter Roeck <groeck@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
tools/testing/selftests/vm/Makefile | 6 +++---
tools/testing/selftests/x86/Makefile | 6 +++---
2 files changed, 6 insertions(+), 6 deletions(-)
--- a/tools/testing/selftests/vm/Makefile~selftests-x86-fix-how-check_ccsh-is-being-invoked
+++ a/tools/testing/selftests/vm/Makefile
@@ -51,9 +51,9 @@ TEST_GEN_FILES += split_huge_page_test
TEST_GEN_FILES += ksm_tests
ifeq ($(MACHINE),x86_64)
-CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh $(CC) ../x86/trivial_32bit_program.c -m32)
-CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh $(CC) ../x86/trivial_64bit_program.c)
-CAN_BUILD_WITH_NOPIE := $(shell ./../x86/check_cc.sh $(CC) ../x86/trivial_program.c -no-pie)
+CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_32bit_program.c -m32)
+CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_64bit_program.c)
+CAN_BUILD_WITH_NOPIE := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_program.c -no-pie)
TARGETS := protection_keys
BINARIES_32 := $(TARGETS:%=%_32)
--- a/tools/testing/selftests/x86/Makefile~selftests-x86-fix-how-check_ccsh-is-being-invoked
+++ a/tools/testing/selftests/x86/Makefile
@@ -6,9 +6,9 @@ include ../lib.mk
.PHONY: all all_32 all_64 warn_32bit_failure clean
UNAME_M := $(shell uname -m)
-CAN_BUILD_I386 := $(shell ./check_cc.sh $(CC) trivial_32bit_program.c -m32)
-CAN_BUILD_X86_64 := $(shell ./check_cc.sh $(CC) trivial_64bit_program.c)
-CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie)
+CAN_BUILD_I386 := $(shell ./check_cc.sh "$(CC)" trivial_32bit_program.c -m32)
+CAN_BUILD_X86_64 := $(shell ./check_cc.sh "$(CC)" trivial_64bit_program.c)
+CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh "$(CC)" trivial_program.c -no-pie)
TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \
check_initial_reg_state sigreturn iopl ioperm \
_
^ permalink raw reply [flat|nested] 786+ messages in thread
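As a rough illustration of why the quotes in patch 063 matter, the following C sketch models what a shell does with an unquoted expansion: it splits the value into separate words. split_words() and WORD_MAX are hypothetical names, and the model is deliberately simplified (spaces only, with no quoting, escapes, globbing or IFS handling).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WORD_MAX 64

/*
 * Simplified model of shell word splitting for an unquoted expansion:
 * split on spaces only. Stores at most `max` words in `out`, truncating
 * each to WORD_MAX - 1 characters, and returns how many were stored.
 */
static size_t split_words(const char *s, char out[][WORD_MAX], size_t max)
{
	size_t n = 0;

	while (n < max) {
		s += strspn(s, " ");		/* skip separators */
		if (*s == '\0')
			break;
		size_t len = strcspn(s, " ");	/* length of this word */
		size_t copy = len < WORD_MAX - 1 ? len : WORD_MAX - 1;
		memcpy(out[n], s, copy);
		out[n][copy] = '\0';
		n++;
		s += len;
	}
	return n;
}
```

In this model, a CC value of "ccache gcc" splits into two words, so an unquoted call hands the script "ccache" as its first argument and "gcc" as its second. Quoting "$(CC)" skips the splitting step, and the script receives the whole string as one argument.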
* [patch 064/227] mm: merge pte_mkhuge() call into arch_make_huge_pte()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: will, paulus, mpe, mike.kravetz, davem, christophe.leroy,
catalin.marinas, anshuman.khandual, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: mm: merge pte_mkhuge() call into arch_make_huge_pte()
Each call to pte_mkhuge() is invariably followed by a call to
arch_make_huge_pte(). Instead, arch_make_huge_pte() can call
pte_mkhuge() itself at the beginning. This updates the generic fallback stub
for arch_make_huge_pte() and the available platform definitions. This makes
huge pte creation much cleaner and easier to follow.
Link: https://lkml.kernel.org/r/1643860669-26307-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/arm64/mm/hugetlbpage.c | 1 +
arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h | 4 ++--
arch/sparc/mm/hugetlbpage.c | 1 +
include/linux/hugetlb.h | 2 +-
mm/hugetlb.c | 3 +--
mm/vmalloc.c | 1 -
6 files changed, 6 insertions(+), 6 deletions(-)
--- a/arch/arm64/mm/hugetlbpage.c~mm-merge-pte_mkhuge-call-into-arch_make_huge_pte
+++ a/arch/arm64/mm/hugetlbpage.c
@@ -347,6 +347,7 @@ pte_t arch_make_huge_pte(pte_t entry, un
{
size_t pagesize = 1UL << shift;
+ entry = pte_mkhuge(entry);
if (pagesize == CONT_PTE_SIZE) {
entry = pte_mkcont(entry);
} else if (pagesize == CONT_PMD_SIZE) {
--- a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h~mm-merge-pte_mkhuge-call-into-arch_make_huge_pte
+++ a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
@@ -71,9 +71,9 @@ static inline pte_t arch_make_huge_pte(p
size_t size = 1UL << shift;
if (size == SZ_16K)
- return __pte(pte_val(entry) & ~_PAGE_HUGE);
+ return __pte(pte_val(entry) | _PAGE_SPS);
else
- return entry;
+ return __pte(pte_val(entry) | _PAGE_SPS | _PAGE_HUGE);
}
#define arch_make_huge_pte arch_make_huge_pte
#endif
--- a/arch/sparc/mm/hugetlbpage.c~mm-merge-pte_mkhuge-call-into-arch_make_huge_pte
+++ a/arch/sparc/mm/hugetlbpage.c
@@ -181,6 +181,7 @@ pte_t arch_make_huge_pte(pte_t entry, un
{
pte_t pte;
+ entry = pte_mkhuge(entry);
pte = hugepage_shift_to_tte(entry, shift);
#ifdef CONFIG_SPARC64
--- a/include/linux/hugetlb.h~mm-merge-pte_mkhuge-call-into-arch_make_huge_pte
+++ a/include/linux/hugetlb.h
@@ -754,7 +754,7 @@ static inline void arch_clear_hugepage_f
static inline pte_t arch_make_huge_pte(pte_t entry, unsigned int shift,
vm_flags_t flags)
{
- return entry;
+ return pte_mkhuge(entry);
}
#endif
--- a/mm/hugetlb.c~mm-merge-pte_mkhuge-call-into-arch_make_huge_pte
+++ a/mm/hugetlb.c
@@ -4637,7 +4637,6 @@ static pte_t make_huge_pte(struct vm_are
vma->vm_page_prot));
}
entry = pte_mkyoung(entry);
- entry = pte_mkhuge(entry);
entry = arch_make_huge_pte(entry, shift, vma->vm_flags);
return entry;
@@ -6171,7 +6170,7 @@ unsigned long hugetlb_change_protection(
unsigned int shift = huge_page_shift(hstate_vma(vma));
old_pte = huge_ptep_modify_prot_start(vma, address, ptep);
- pte = pte_mkhuge(huge_pte_modify(old_pte, newprot));
+ pte = huge_pte_modify(old_pte, newprot);
pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
pages++;
--- a/mm/vmalloc.c~mm-merge-pte_mkhuge-call-into-arch_make_huge_pte
+++ a/mm/vmalloc.c
@@ -118,7 +118,6 @@ static int vmap_pte_range(pmd_t *pmd, un
if (size != PAGE_SIZE) {
pte_t entry = pfn_pte(pfn, prot);
- entry = pte_mkhuge(entry);
entry = arch_make_huge_pte(entry, ilog2(size), 0);
set_huge_pte_at(&init_mm, addr, pte, entry);
pfn += PFN_DOWN(size);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
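The call-order change in patch 064 can be sketched in user space as follows. This is only a model under stated assumptions: pte_t is a hypothetical one-field struct, the PTE_YOUNG and PTE_HUGE bit positions are invented, and arch_make_huge_pte() drops the vm_flags argument that the kernel version takes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PTE model; the bit positions are made up for illustration. */
typedef struct { uint64_t val; } pte_t;

#define PTE_YOUNG	(UINT64_C(1) << 0)
#define PTE_HUGE	(UINT64_C(1) << 1)

static pte_t pte_mkyoung(pte_t pte) { pte.val |= PTE_YOUNG; return pte; }
static pte_t pte_mkhuge(pte_t pte)  { pte.val |= PTE_HUGE;  return pte; }

/* Generic fallback after the patch: it marks the entry huge itself. */
static pte_t arch_make_huge_pte(pte_t entry, unsigned int shift)
{
	(void)shift;	/* unused by the generic fallback */
	return pte_mkhuge(entry);
}

/* Caller after the patch: no separate pte_mkhuge() call is needed. */
static pte_t make_huge_pte(pte_t entry, unsigned int shift)
{
	entry = pte_mkyoung(entry);
	return arch_make_huge_pte(entry, shift);
}
```

Because the huge bit is now set in one place, a caller cannot forget it, and an architecture override that needs different bits (as the 8xx hunk does) can decide on its own which flags to apply.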
* [patch 065/227] mm: remove mmu_gathers storage from remaining architectures
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: wangkefeng.wang, stefan.kristiansson, rppt, rmk+kernel, nickhu,
jonas, green.hu, deanbo422, david, dave.hansen, christophe.leroy,
bcain, shorne, akpm, patches, linux-mm, mm-commits, torvalds,
akpm
From: Stafford Horne <shorne@gmail.com>
Subject: mm: remove mmu_gathers storage from remaining architectures
Originally the mmu_gathers were removed in commit 1c3951769621 ("mm: now
that all old mmu_gather code is gone, remove the storage"). However, the
openrisc and hexagon architectures were merged around the same time and
their mmu_gathers storage was not removed.
This patch removes it from openrisc, hexagon and nds32.
Noticed while cleaning up this warning:
arch/openrisc/mm/init.c:41:1: warning: symbol 'mmu_gathers' was not declared. Should it be static?
Link: https://lkml.kernel.org/r/20220205141956.3315419-1-shorne@gmail.com
Signed-off-by: Stafford Horne <shorne@gmail.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/hexagon/mm/init.c | 2 --
arch/nds32/mm/init.c | 1 -
arch/openrisc/mm/init.c | 2 --
3 files changed, 5 deletions(-)
--- a/arch/hexagon/mm/init.c~mm-remove-mmu_gathers-storage-from-remaining-architectures
+++ a/arch/hexagon/mm/init.c
@@ -29,8 +29,6 @@ int max_kernel_seg = 0x303;
/* indicate pfn's of high memory */
unsigned long highstart_pfn, highend_pfn;
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
/* Default cache attribute for newly created page tables */
unsigned long _dflt_cache_att = CACHEDEF;
--- a/arch/nds32/mm/init.c~mm-remove-mmu_gathers-storage-from-remaining-architectures
+++ a/arch/nds32/mm/init.c
@@ -18,7 +18,6 @@
#include <asm/tlb.h>
#include <asm/page.h>
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
DEFINE_SPINLOCK(anon_alias_lock);
extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
--- a/arch/openrisc/mm/init.c~mm-remove-mmu_gathers-storage-from-remaining-architectures
+++ a/arch/openrisc/mm/init.c
@@ -38,8 +38,6 @@
int mem_init_done;
-DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
-
static void __init zone_sizes_init(void)
{
unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0 };
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 066/227] mm: thp: fix wrong cache flush in remove_migration_pmd()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: ziy, rientjes, peterx, mike.kravetz, lars.persson,
kirill.shutemov, fam.zheng, duanxiongchun, axelrasmussen,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: thp: fix wrong cache flush in remove_migration_pmd()
Patch series "Fix some cache flush bugs", v5.
This series focuses on fixing cache maintenance.
This patch (of 7):
flush_cache_range() is justified only if the page is already placed in the
process page table, but here that happens right after the
flush_cache_range() call, so using this interface is wrong. There is also
no need to invalidate the cache, since the mapping was non-present before
remove_migration_pmd(). So just remove the call.
Link: https://lkml.kernel.org/r/20220210123058.79206-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20220210123058.79206-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/huge_memory.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/huge_memory.c~mm-thp-fix-wrong-cache-flush-in-remove_migration_pmd
+++ a/mm/huge_memory.c
@@ -3197,7 +3197,6 @@ void remove_migration_pmd(struct page_vm
if (pmd_swp_uffd_wp(*pvmw->pmd))
pmde = pmd_wrprotect(pmd_mkuffd_wp(pmde));
- flush_cache_range(vma, mmun_start, mmun_start + HPAGE_PMD_SIZE);
if (PageAnon(new))
page_add_anon_rmap(new, vma, mmun_start, true);
else
@@ -3205,6 +3204,8 @@ void remove_migration_pmd(struct page_vm
set_pmd_at(mm, mmun_start, pvmw->pmd, pmde);
if ((vma->vm_flags & VM_LOCKED) && !PageDoubleMap(new))
mlock_vma_page(new);
+
+ /* No need to invalidate - it was non-present before */
update_mmu_cache_pmd(vma, address, pvmw->pmd);
}
#endif
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 067/227] mm: fix missing cache flush for all tail pages of compound page
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: ziy, rientjes, peterx, mike.kravetz, lars.persson,
kirill.shutemov, fam.zheng, duanxiongchun, axelrasmussen,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: fix missing cache flush for all tail pages of compound page
The D-cache maintenance inside move_to_new_page() only considers one page,
so there is still a D-cache maintenance issue for the tail pages of a
compound page (e.g. THP or HugeTLB).
THP migration is only enabled on x86_64, ARM64 and powerpc, while powerpc
and arm64 need to maintain the consistency between I-Cache and D-Cache,
which depends on flush_dcache_page().
But there are no issues on arm64 and powerpc since they already consider
the compound page cache flushing in their icache flush function. HugeTLB
migration is enabled on arm, arm64, mips, parisc, powerpc, riscv, s390 and
sh, while arm has handled the compound page cache flush in
flush_dcache_page(), but most others do not.
In theory, the issue exists on many architectures. Fix this by calling
flush_dcache_page() on every subpage rather than using
flush_dcache_folio(), since the latter is not backportable.
Link: https://lkml.kernel.org/r/20220210123058.79206-3-songmuchun@bytedance.com
Fixes: 290408d4a250 ("hugetlb: hugepage migration core")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/migrate.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/mm/migrate.c~mm-fix-missing-cache-flush-for-all-tail-pages-of-compound-page
+++ a/mm/migrate.c
@@ -916,9 +916,12 @@ static int move_to_new_page(struct page
if (!PageMappingFlags(page))
page->mapping = NULL;
- if (likely(!is_zone_device_page(newpage)))
- flush_dcache_page(newpage);
+ if (likely(!is_zone_device_page(newpage))) {
+ int i, nr = compound_nr(newpage);
+ for (i = 0; i < nr; i++)
+ flush_dcache_page(newpage + i);
+ }
}
out:
return rc;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
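For readers following the thread, the per-subpage flush that patch 067 adds
can be illustrated with a small userspace model. This is only a sketch under
stated assumptions: struct model_page, model_compound_nr() and
model_flush_dcache_page() are hypothetical stand-ins invented here, not
kernel APIs, and the "flush" merely counts calls instead of touching any
cache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the real kernel layout. */
struct model_page {
	unsigned int order;	/* meaningful on the head page only */
	unsigned int flushes;	/* number of times this subpage was flushed */
};

/* Models compound_nr(): base pages covered by the given head page. */
static unsigned long model_compound_nr(const struct model_page *head)
{
	return 1UL << head->order;
}

/* Models flush_dcache_page(): only records that it was called. */
static void model_flush_dcache_page(struct model_page *page)
{
	page->flushes++;
}

/* Behaviour before the patch: only the head page is flushed. */
static void flush_head_only(struct model_page *head)
{
	assert(head != NULL);
	model_flush_dcache_page(head);
}

/* Behaviour after the patch: every subpage is flushed in turn. */
static void flush_all_subpages(struct model_page *head)
{
	unsigned long i, nr;

	assert(head != NULL);
	nr = model_compound_nr(head);
	for (i = 0; i < nr; i++)
		model_flush_dcache_page(head + i);
}

/* Counts the subpages that have never been flushed. */
static unsigned long count_unflushed(const struct model_page *head)
{
	unsigned long i, missed = 0, nr = model_compound_nr(head);

	for (i = 0; i < nr; i++)
		if (head[i].flushes == 0)
			missed++;
	return missed;
}
```

In this model, an order-3 compound page flushed with flush_head_only()
leaves its seven tail pages untouched, while flush_all_subpages() covers
all eight. That is the gap the patch closes on architectures whose
flush_dcache_page() acts on a single base page.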
* [patch 068/227] mm: hugetlb: fix missing cache flush in copy_huge_page_from_user()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:41 UTC (permalink / raw)
To: ziy, rientjes, peterx, mike.kravetz, lars.persson,
kirill.shutemov, fam.zheng, duanxiongchun, axelrasmussen,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: hugetlb: fix missing cache flush in copy_huge_page_from_user()
userfaultfd calls copy_huge_page_from_user(), which does not do any cache
flushing for the target page. The target page is then mapped into user
space at a different (user) address, which might alias with the kernel
address that was used to copy the data in from user space. Fix this issue
by flushing the dcache in copy_huge_page_from_user().
Link: https://lkml.kernel.org/r/20220210123058.79206-4-songmuchun@bytedance.com
Fixes: fa4d75c1de13 ("userfaultfd: hugetlbfs: add copy_huge_page_from_user for hugetlb userfaultfd support")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/memory.c~mm-hugetlb-fix-missing-cache-flush-in-copy_huge_page_from_user
+++ a/mm/memory.c
@@ -5444,6 +5444,8 @@ long copy_huge_page_from_user(struct pag
if (rc)
break;
+ flush_dcache_page(subpage);
+
cond_resched();
}
return ret_val;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 069/227] mm: hugetlb: fix missing cache flush in hugetlb_mcopy_atomic_pte()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: ziy, rientjes, peterx, mike.kravetz, lars.persson,
kirill.shutemov, fam.zheng, duanxiongchun, axelrasmussen,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: hugetlb: fix missing cache flush in hugetlb_mcopy_atomic_pte()
folio_copy() copies the data from one page to the target page. The target
page is then mapped at a user space address, which might alias with the
kernel address that was used to write the copied data. There are two ways
to fix this issue:
1) insert flush_dcache_page() after folio_copy().
2) replace folio_copy() with copy_user_huge_page(), which already
takes care of the cache maintenance.
We chose option 2) since architectures can optimize this situation. It
also makes backports easier.
Link: https://lkml.kernel.org/r/20220210123058.79206-5-songmuchun@bytedance.com
Fixes: 8cc5fcbb5be8 ("mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-missing-cache-flush-in-hugetlb_mcopy_atomic_pte
+++ a/mm/hugetlb.c
@@ -5816,7 +5816,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_s
*pagep = NULL;
goto out;
}
- folio_copy(page_folio(page), page_folio(*pagep));
+ copy_user_huge_page(page, *pagep, dst_addr, dst_vma,
+ pages_per_huge_page(h));
put_page(*pagep);
*pagep = NULL;
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 070/227] mm: shmem: fix missing cache flush in shmem_mfill_atomic_pte()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: ziy, rientjes, peterx, mike.kravetz, lars.persson,
kirill.shutemov, fam.zheng, duanxiongchun, axelrasmussen,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: shmem: fix missing cache flush in shmem_mfill_atomic_pte()
userfaultfd calls shmem_mfill_atomic_pte(), which does not do any cache
flushing for the target page. The target page is then mapped into user
space at a different (user) address, which might alias with the kernel
address that was used to copy the data in from user space. Insert
flush_dcache_page() in the non-zero-page case, and replace
clear_highpage() with clear_user_highpage(), which already takes care of
the cache maintenance.
Link: https://lkml.kernel.org/r/20220210123058.79206-6-songmuchun@bytedance.com
Fixes: 8d1039634206 ("userfaultfd: shmem: add shmem_mfill_zeropage_pte for userfaultfd support")
Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/shmem.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/shmem.c~mm-shmem-fix-missing-cache-flush-in-shmem_mfill_atomic_pte
+++ a/mm/shmem.c
@@ -2364,8 +2364,10 @@ int shmem_mfill_atomic_pte(struct mm_str
/* don't free the page */
goto out_unacct_blocks;
}
+
+ flush_dcache_page(page);
} else { /* ZEROPAGE */
- clear_highpage(page);
+ clear_user_highpage(page, dst_addr);
}
} else {
page = *pagep;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 071/227] mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: ziy, rientjes, peterx, mike.kravetz, lars.persson,
kirill.shutemov, fam.zheng, duanxiongchun, axelrasmussen,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic()
userfaultfd calls mcopy_atomic_pte() and __mcopy_atomic(), which do not do
any cache flushing for the target page. The target page is then mapped
into user space at a different (user) address, which might alias with the
kernel address that was used to copy the data in from user space. Fix
this by inserting flush_dcache_page() after copy_from_user() succeeds.
Link: https://lkml.kernel.org/r/20220210123058.79206-7-songmuchun@bytedance.com
Fixes: b6ebaedb4cb1 ("userfaultfd: avoid mmap_sem read recursion in mcopy_atomic")
Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/userfaultfd.c | 3 +++
1 file changed, 3 insertions(+)
--- a/mm/userfaultfd.c~mm-userfaultfd-fix-missing-cache-flush-in-mcopy_atomic_pte-and-__mcopy_atomic
+++ a/mm/userfaultfd.c
@@ -150,6 +150,8 @@ static int mcopy_atomic_pte(struct mm_st
/* don't free the page */
goto out;
}
+
+ flush_dcache_page(page);
} else {
page = *pagep;
*pagep = NULL;
@@ -625,6 +627,7 @@ retry:
err = -EFAULT;
goto out;
}
+ flush_dcache_page(page);
goto retry;
} else
BUG_ON(page);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 072/227] mm: replace multiple dcache flush with flush_dcache_folio()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: ziy, rientjes, peterx, mike.kravetz, lars.persson,
kirill.shutemov, fam.zheng, duanxiongchun, axelrasmussen,
songmuchun, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: replace multiple dcache flush with flush_dcache_folio()
Simplify the code by using flush_dcache_folio().
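As a rough illustration of why the two forms are interchangeable here, the
sketch below models a folio as a run of pages with a per-page "flushed"
flag. The struct layouts, MAX_PAGES, flush_each_page() and count_flushed()
are hypothetical stand-ins invented for this example, not kernel
definitions; the only claim is that one folio-wide call covers the same
pages as the open-coded per-page loop being removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 8	/* hypothetical upper bound for this model only */

/* Hypothetical model: a page only records whether it was flushed. */
struct page { bool flushed; };

/* Hypothetical model: a folio is a run of nr contiguous pages. */
struct folio {
	struct page pages[MAX_PAGES];
	size_t nr;
};

/* Stand-in for the per-page flush. */
static void flush_dcache_page(struct page *page)
{
	page->flushed = true;
}

/* Stand-in for the folio-wide flush: one call covers every page. */
static void flush_dcache_folio(struct folio *folio)
{
	for (size_t i = 0; i < folio->nr; i++)
		flush_dcache_page(&folio->pages[i]);
}

/* The open-coded loop that the patch removes from move_to_new_page(). */
static void flush_each_page(struct page *head, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		flush_dcache_page(head + i);
}

/* Count how many pages of the folio have been flushed. */
static size_t count_flushed(const struct folio *folio)
{
	size_t n = 0;

	for (size_t i = 0; i < folio->nr; i++)
		n += folio->pages[i].flushed;
	return n;
}
```

In this model, calling flush_each_page(f.pages, f.nr) and calling
flush_dcache_folio(&f) both leave count_flushed(&f) equal to f.nr.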
Link: https://lkml.kernel.org/r/20220210123058.79206-8-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/migrate.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
--- a/mm/migrate.c~mm-replace-multiple-dcache-flush-with-flush_dcache_folio
+++ a/mm/migrate.c
@@ -916,12 +916,8 @@ static int move_to_new_page(struct page
if (!PageMappingFlags(page))
page->mapping = NULL;
- if (likely(!is_zone_device_page(newpage))) {
- int i, nr = compound_nr(newpage);
-
- for (i = 0; i < nr; i++)
- flush_dcache_page(newpage + i);
- }
+ if (likely(!is_zone_device_page(newpage)))
+ flush_dcache_folio(page_folio(newpage));
}
out:
return rc;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 073/227] mm: don't skip swap entry even if zap_details specified
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: willy, vbabka, stable, shy828301, kirill, jhubbard, hughd, david,
apopple, aarcange, peterx, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Peter Xu <peterx@redhat.com>
Subject: mm: don't skip swap entry even if zap_details specified
Patch series "mm: Rework zap ptes on swap entries", v5.
Patch 1 should fix a long standing bug for zap_pte_range() on zap_details
usage. The risk is we could have some swap entries skipped while we should
have zapped them.
Migration entries are not the major concern because file-backed memory is
always zapped in the pattern "first time without page lock, then re-zap with
page lock", hence the 2nd zap will always make sure all migration entries are
already recovered.
However, there can be issues when real swap entries get skipped erroneously.
There's a reproducer provided in the commit message of patch 1 for that.
Patches 2-4 are cleanups that are based on patch 1. After the whole patchset
is applied, we should have a very clean view of zap_pte_range().
Only patch 1 needs to be backported to stable if necessary.
This patch (of 4):
The "details" pointer shouldn't be the token to decide whether we should
skip swap entries.
For example, when the caller specifies details->zap_mapping==NULL, it
means the user wants to zap all the pages (including COWed pages), so we
need to look into swap entries because there can be private COWed pages
that were swapped out.
Skipping swap entries whenever details is non-NULL may wrongly leave
behind some swap entries that we should have zapped.
A reproducer of the problem:
===8<===
#define _GNU_SOURCE         /* See feature_test_macros(7) */
#include <stdio.h>
#include <assert.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>

int page_size;
int shmem_fd;
char *buffer;

int main(void)
{
	int ret;
	char val;

	page_size = getpagesize();
	shmem_fd = memfd_create("test", 0);
	assert(shmem_fd >= 0);

	ret = ftruncate(shmem_fd, page_size * 2);
	assert(ret == 0);

	buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE, shmem_fd, 0);
	assert(buffer != MAP_FAILED);

	/* Write private page, swap it out */
	buffer[page_size] = 1;
	madvise(buffer, page_size * 2, MADV_PAGEOUT);

	/* This should drop private buffer[page_size] already */
	ret = ftruncate(shmem_fd, page_size);
	assert(ret == 0);
	/* Recover the size */
	ret = ftruncate(shmem_fd, page_size * 2);
	assert(ret == 0);

	/* Re-read the data, it should be all zero */
	val = buffer[page_size];
	if (val == 0)
		printf("Good\n");
	else
		printf("BUG\n");

	return 0;
}
===8<===
We don't need to touch up the pmd path, because pmd never had an issue with
swap entries. For example, shmem pmd migration will always be split into
pte level, and the same goes for swapping of anonymous memory.
Add another helper should_zap_cows() so that we can also check whether we
should zap private mappings when there's no page pointer specified.
This patch drops the old trick of skipping every swap entry whenever details
is non-NULL, so we handle swap ptes coherently. Meanwhile we should do the
same check upon migration entries, hwpoison entries and genuine swap entries
too.
To be explicit, we should still remember to keep the private entries if
even_cows==false, and always zap them when even_cows==true.
The issue seems to exist starting from the initial commit of git.
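The behaviour change for non-present ptes can be sketched in userspace as
below. This is only a reduced model under stated assumptions: struct
address_space and struct zap_details are cut down to what the check reads,
and enum entry_kind, old_zaps_entry() and new_zaps_entry() are hypothetical
names made up for the illustration. should_zap_cows() follows the helper
added by this patch. Migration entries are left out because they take the
page-based check instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct address_space. */
struct address_space { int unused; };

/* Reduced model of struct zap_details: only the field this check reads. */
struct zap_details {
	struct address_space *zap_mapping; /* non-NULL: keep private pages */
};

/* Hypothetical labels for two of the non-present pte kinds. */
enum entry_kind { ENTRY_SWAP, ENTRY_HWPOISON };

/* Same logic as the helper added by the patch. */
static bool should_zap_cows(const struct zap_details *details)
{
	/* By default, zap all pages */
	if (!details)
		return true;

	/* Or, zap COWed pages only if the caller wants to */
	return !details->zap_mapping;
}

/* Old behaviour: any non-NULL details pointer left the entry in place. */
static bool old_zaps_entry(const struct zap_details *details)
{
	return details == NULL;
}

/* New behaviour for genuine swap entries and hwpoison entries. */
static bool new_zaps_entry(const struct zap_details *details,
			   enum entry_kind kind)
{
	(void)kind; /* both kinds use the same check in the patch */
	return should_zap_cows(details);
}
```

The interesting case is a non-NULL details with zap_mapping==NULL (the
even_cows==true callers): old_zaps_entry() returns false, which is the bug,
while new_zaps_entry() returns true.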
[peterx@redhat.com: comment tweaks]
Link: https://lkml.kernel.org/r/20220217060746.71256-2-peterx@redhat.com
Link: https://lkml.kernel.org/r/20220217060746.71256-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20220216094810.60572-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20220216094810.60572-2-peterx@redhat.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory.c | 40 +++++++++++++++++++++++++++++++---------
1 file changed, 31 insertions(+), 9 deletions(-)
--- a/mm/memory.c~mm-dont-skip-swap-entry-even-if-zap_details-specified
+++ a/mm/memory.c
@@ -1313,6 +1313,17 @@ struct zap_details {
struct folio *single_folio; /* Locked folio to be unmapped */
};
+/* Whether we should zap all COWed (private) pages too */
+static inline bool should_zap_cows(struct zap_details *details)
+{
+ /* By default, zap all pages */
+ if (!details)
+ return true;
+
+ /* Or, we zap COWed pages only if the caller wants to */
+ return !details->zap_mapping;
+}
+
/*
* We set details->zap_mapping when we want to unmap shared but keep private
* pages. Return true if skip zapping this page, false otherwise.
@@ -1320,11 +1331,15 @@ struct zap_details {
static inline bool
zap_skip_check_mapping(struct zap_details *details, struct page *page)
{
- if (!details || !page)
+ /* If we can make a decision without *page.. */
+ if (should_zap_cows(details))
+ return false;
+
+ /* E.g. the caller passes NULL for the case of a zero page */
+ if (!page)
return false;
- return details->zap_mapping &&
- (details->zap_mapping != page_rmapping(page));
+ return details->zap_mapping != page_rmapping(page);
}
static unsigned long zap_pte_range(struct mmu_gather *tlb,
@@ -1405,17 +1420,24 @@ again:
continue;
}
- /* If details->check_mapping, we leave swap entries. */
- if (unlikely(details))
- continue;
-
- if (!non_swap_entry(entry))
+ if (!non_swap_entry(entry)) {
+ /* Genuine swap entry, hence a private anon page */
+ if (!should_zap_cows(details))
+ continue;
rss[MM_SWAPENTS]--;
- else if (is_migration_entry(entry)) {
+ } else if (is_migration_entry(entry)) {
struct page *page;
page = pfn_swap_entry_to_page(entry);
+ if (zap_skip_check_mapping(details, page))
+ continue;
rss[mm_counter(page)]--;
+ } else if (is_hwpoison_entry(entry)) {
+ if (!should_zap_cows(details))
+ continue;
+ } else {
+ /* We should have covered all the swap entry types */
+ WARN_ON_ONCE(1);
}
if (unlikely(!free_swap_and_cache(entry)))
print_bad_pte(vma, addr, ptent, NULL);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 074/227] mm: rename zap_skip_check_mapping() to should_zap_page()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: willy, vbabka, shy828301, kirill, jhubbard, hughd, david,
apopple, aarcange, peterx, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Peter Xu <peterx@redhat.com>
Subject: mm: rename zap_skip_check_mapping() to should_zap_page()
The previous name is against the natural way people think. Invert the
meaning and also the return value. No functional change intended.
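One way to convince oneself that the rename is purely an inversion is to
run both predicates over a small set of inputs, as in the sketch below. The
reduced struct page (whose mapping field stands in for what page_rmapping()
would return) and count_mismatches() are hypothetical constructs for this
illustration only; the two predicates follow the before and after versions
in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced types; mapping models page_rmapping(page). */
struct address_space { int unused; };
struct page { struct address_space *mapping; };
struct zap_details { struct address_space *zap_mapping; };

static bool should_zap_cows(const struct zap_details *details)
{
	if (!details)
		return true;
	return !details->zap_mapping;
}

/* Before the rename: true means "skip this page". */
static bool zap_skip_check_mapping(const struct zap_details *details,
				   const struct page *page)
{
	if (should_zap_cows(details))
		return false;
	if (!page)
		return false;
	return details->zap_mapping != page->mapping;
}

/* After the rename: true means "zap this page". */
static bool should_zap_page(const struct zap_details *details,
			    const struct page *page)
{
	if (should_zap_cows(details))
		return true;
	if (!page)
		return true;
	return details->zap_mapping == page->mapping;
}

/* Count inputs on which the two are not exact opposites. */
static int count_mismatches(void)
{
	static struct address_space a, b;
	const struct page pages[] = { { &a }, { &b } };
	const struct zap_details dets[] = { { NULL }, { &a }, { &b } };
	const struct zap_details *dptr[] = { NULL, &dets[0], &dets[1], &dets[2] };
	const struct page *pptr[] = { NULL, &pages[0], &pages[1] };
	int mismatches = 0;

	for (size_t d = 0; d < sizeof(dptr) / sizeof(dptr[0]); d++)
		for (size_t p = 0; p < sizeof(pptr) / sizeof(pptr[0]); p++)
			if (should_zap_page(dptr[d], pptr[p]) ==
			    zap_skip_check_mapping(dptr[d], pptr[p]))
				mismatches++;

	return mismatches;
}
```

In this model count_mismatches() returns 0, matching the "no functional
change intended" note, provided every caller also flips its test from
zap_skip_check_mapping(...) to !should_zap_page(...).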
Link: https://lkml.kernel.org/r/20220216094810.60572-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
--- a/mm/memory.c~mm-rename-zap_skip_check_mapping-to-should_zap_page
+++ a/mm/memory.c
@@ -1326,20 +1326,19 @@ static inline bool should_zap_cows(struc
/*
* We set details->zap_mapping when we want to unmap shared but keep private
- * pages. Return true if skip zapping this page, false otherwise.
+ * pages. Return true if we should zap this page, false otherwise.
*/
-static inline bool
-zap_skip_check_mapping(struct zap_details *details, struct page *page)
+static inline bool should_zap_page(struct zap_details *details, struct page *page)
{
/* If we can make a decision without *page.. */
if (should_zap_cows(details))
- return false;
+ return true;
/* E.g. the caller passes NULL for the case of a zero page */
if (!page)
- return false;
+ return true;
- return details->zap_mapping != page_rmapping(page);
+ return details->zap_mapping == page_rmapping(page);
}
static unsigned long zap_pte_range(struct mmu_gather *tlb,
@@ -1374,7 +1373,7 @@ again:
struct page *page;
page = vm_normal_page(vma, addr, ptent);
- if (unlikely(zap_skip_check_mapping(details, page)))
+ if (unlikely(!should_zap_page(details, page)))
continue;
ptent = ptep_get_and_clear_full(mm, addr, pte,
tlb->fullmm);
@@ -1408,7 +1407,7 @@ again:
is_device_exclusive_entry(entry)) {
struct page *page = pfn_swap_entry_to_page(entry);
- if (unlikely(zap_skip_check_mapping(details, page)))
+ if (unlikely(!should_zap_page(details, page)))
continue;
pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
rss[mm_counter(page)]--;
@@ -1429,7 +1428,7 @@ again:
struct page *page;
page = pfn_swap_entry_to_page(entry);
- if (zap_skip_check_mapping(details, page))
+ if (!should_zap_page(details, page))
continue;
rss[mm_counter(page)]--;
} else if (is_hwpoison_entry(entry)) {
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 075/227] mm: change zap_details.zap_mapping into even_cows
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: willy, vbabka, shy828301, kirill, jhubbard, hughd, david,
apopple, aarcange, peterx, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Peter Xu <peterx@redhat.com>
Subject: mm: change zap_details.zap_mapping into even_cows
Currently we have a zap_mapping pointer maintained in zap_details; when it
is specified we only want to zap the pages that have the same mapping as
the one the caller has specified.
But what we want to do is actually simpler: we want to skip zapping
private (COW-ed) pages in some cases. We can refer to
unmap_mapping_pages() callers where we could have passed in different
even_cows values. The other user is unmap_mapping_folio() where we always
want to skip private pages.
According to Hugh, we used a mapping pointer for historical reasons, as
explained here:
https://lore.kernel.org/lkml/391aa58d-ce84-9d4-d68d-d98a9c533255@google.com/
Quoting partly from Hugh:
Which raises the question again of why I did not just use a boolean flag
there originally: aah, I think I've found why. In those days there was a
horrible "optimization", for better performance on some benchmark I guess,
which when you read from /dev/zero into a private mapping, would map the zero
page there (look up read_zero_pagealigned() and zeromap_page_range() if you
dare). So there was another category of page to be skipped along with the
anon COWs, and I didn't want multiple tests in the zap loop, so checking
check_mapping against page->mapping did both. I think nowadays you could do
it by checking for PageAnon page (or genuine swap entry) instead.
This patch replaces the zap_details.zap_mapping pointer with the even_cows
boolean; when even_cows is false, we decide per page by checking PageAnon.
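The sketch below compares the old pointer-based check with the new boolean
one on a reduced model. Everything here is hypothetical scaffolding for
illustration: struct page only carries an rmapping pointer (modelling
page_rmapping()) and an anon flag (modelling PageAnon()), the old_* and
new_* names and convert() are invented, and the model assumes that every
non-anon page met during the walk belongs to the mapping being unmapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced page model: just what the two checks look at. */
struct page {
	const void *rmapping;	/* models page_rmapping(page) */
	bool anon;		/* models PageAnon(page) */
};

struct old_details { const void *zap_mapping; };
struct new_details { bool even_cows; };

/* Check as it looked before this patch (pointer comparison). */
static bool old_should_zap_page(const struct old_details *d,
				const struct page *page)
{
	if (!d || !d->zap_mapping)
		return true;
	if (!page)
		return true;
	return d->zap_mapping == page->rmapping;
}

/* Check as it looks after this patch (boolean plus PageAnon). */
static bool new_should_zap_page(const struct new_details *d,
				const struct page *page)
{
	if (!d || d->even_cows)
		return true;
	if (!page)
		return true;
	/* Otherwise only zap non-anon pages */
	return !page->anon;
}

/* Mirrors how the callers are converted: a NULL mapping meant "even cows". */
static struct new_details convert(const struct old_details *d)
{
	struct new_details n = { .even_cows = (d->zap_mapping == NULL) };

	return n;
}
```

Under that assumption both versions keep a COWed anon page when the caller
asked to preserve private pages, and both zap it when even_cows is set or
when no details are passed at all.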
Link: https://lkml.kernel.org/r/20220216094810.60572-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
--- a/mm/memory.c~mm-change-zap_detailszap_mapping-into-even_cows
+++ a/mm/memory.c
@@ -1309,8 +1309,8 @@ copy_page_range(struct vm_area_struct *d
* Parameter block passed down to zap_pte_range in exceptional cases.
*/
struct zap_details {
- struct address_space *zap_mapping; /* Check page->mapping if set */
struct folio *single_folio; /* Locked folio to be unmapped */
+ bool even_cows; /* Zap COWed private pages too? */
};
/* Whether we should zap all COWed (private) pages too */
@@ -1321,13 +1321,10 @@ static inline bool should_zap_cows(struc
return true;
/* Or, we zap COWed pages only if the caller wants to */
- return !details->zap_mapping;
+ return details->even_cows;
}
-/*
- * We set details->zap_mapping when we want to unmap shared but keep private
- * pages. Return true if we should zap this page, false otherwise.
- */
+/* Decides whether we should zap this page with the page pointer specified */
static inline bool should_zap_page(struct zap_details *details, struct page *page)
{
/* If we can make a decision without *page.. */
@@ -1338,7 +1335,8 @@ static inline bool should_zap_page(struc
if (!page)
return true;
- return details->zap_mapping == page_rmapping(page);
+ /* Otherwise we should only zap non-anon pages */
+ return !PageAnon(page);
}
static unsigned long zap_pte_range(struct mmu_gather *tlb,
@@ -3398,7 +3396,7 @@ void unmap_mapping_folio(struct folio *f
first_index = folio->index;
last_index = folio->index + folio_nr_pages(folio) - 1;
- details.zap_mapping = mapping;
+ details.even_cows = false;
details.single_folio = folio;
i_mmap_lock_write(mapping);
@@ -3427,7 +3425,7 @@ void unmap_mapping_pages(struct address_
pgoff_t first_index = start;
pgoff_t last_index = start + nr - 1;
- details.zap_mapping = even_cows ? NULL : mapping;
+ details.even_cows = even_cows;
if (last_index < first_index)
last_index = ULONG_MAX;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 076/227] mm: rework swap handling of zap_pte_range
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: willy, vbabka, shy828301, kirill, jhubbard, hughd, david,
apopple, aarcange, peterx, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Peter Xu <peterx@redhat.com>
Subject: mm: rework swap handling of zap_pte_range
Clean the code up by merging the device private/exclusive swap entry
handling with the rest, so that the pte clear operation is shared too.
struct page *page is declared in multiple places in the function; move the
declaration upward.
free_swap_and_cache() is only useful for the !non_swap_entry() case, so move
it into that branch.
No functional change intended.
Link: https://lkml.kernel.org/r/20220216094810.60572-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory.c | 21 ++++++---------------
1 file changed, 6 insertions(+), 15 deletions(-)
--- a/mm/memory.c~mm-rework-swap-handling-of-zap_pte_range
+++ a/mm/memory.c
@@ -1361,6 +1361,8 @@ again:
arch_enter_lazy_mmu_mode();
do {
pte_t ptent = *pte;
+ struct page *page;
+
if (pte_none(ptent))
continue;
@@ -1368,8 +1370,6 @@ again:
break;
if (pte_present(ptent)) {
- struct page *page;
-
page = vm_normal_page(vma, addr, ptent);
if (unlikely(!should_zap_page(details, page)))
continue;
@@ -1403,28 +1403,21 @@ again:
entry = pte_to_swp_entry(ptent);
if (is_device_private_entry(entry) ||
is_device_exclusive_entry(entry)) {
- struct page *page = pfn_swap_entry_to_page(entry);
-
+ page = pfn_swap_entry_to_page(entry);
if (unlikely(!should_zap_page(details, page)))
continue;
- pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
rss[mm_counter(page)]--;
-
if (is_device_private_entry(entry))
page_remove_rmap(page, false);
-
put_page(page);
- continue;
- }
-
- if (!non_swap_entry(entry)) {
+ } else if (!non_swap_entry(entry)) {
/* Genuine swap entry, hence a private anon page */
if (!should_zap_cows(details))
continue;
rss[MM_SWAPENTS]--;
+ if (unlikely(!free_swap_and_cache(entry)))
+ print_bad_pte(vma, addr, ptent, NULL);
} else if (is_migration_entry(entry)) {
- struct page *page;
-
page = pfn_swap_entry_to_page(entry);
if (!should_zap_page(details, page))
continue;
@@ -1436,8 +1429,6 @@ again:
/* We should have covered all the swap entry types */
WARN_ON_ONCE(1);
}
- if (unlikely(!free_swap_and_cache(entry)))
- print_bad_pte(vma, addr, ptent, NULL);
pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
} while (pte++, addr += PAGE_SIZE, addr != end);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
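The "no functional change intended" claim of patch 076 can be checked on a toy model. The sketch below is userspace C, not kernel code: the enum, the result struct and every function name are hypothetical, the per-branch should_zap_page()/should_zap_cows() checks are collapsed into one boolean, and free_swap_and_cache() is modeled as doing work only for genuine swap entries, which is what the commit message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the non-present entry kinds seen in the diff. */
enum entry_kind { ENTRY_DEVICE, ENTRY_SWAP, ENTRY_MIGRATION };

/* Side effects of one loop iteration, recorded as flags. */
struct zap_result {
	bool pte_cleared;
	bool swap_freed;
};

/* Stand-in for free_swap_and_cache(): only genuine swap entries need it. */
static bool model_free_swap(enum entry_kind kind)
{
	return kind == ENTRY_SWAP;
}

/* Old shape: device entries clear the PTE themselves and leave early. */
struct zap_result zap_old(enum entry_kind kind, bool should_zap)
{
	struct zap_result r = { false, false };

	if (!should_zap)
		return r;			/* models "continue" */
	if (kind == ENTRY_DEVICE) {
		r.pte_cleared = true;		/* private clear, then continue */
		return r;
	}
	r.swap_freed = model_free_swap(kind);	/* called for every other kind */
	r.pte_cleared = true;
	return r;
}

/* New shape: one if/else-if chain and a single shared PTE clear. */
struct zap_result zap_new(enum entry_kind kind, bool should_zap)
{
	struct zap_result r = { false, false };

	if (!should_zap)
		return r;
	if (kind == ENTRY_DEVICE) {
		/* rss and rmap bookkeeping only */
	} else if (kind == ENTRY_SWAP) {
		r.swap_freed = model_free_swap(kind);
	} else {
		/* migration entry: rss bookkeeping only */
	}
	r.pte_cleared = true;
	return r;
}

/* Checks that both shapes agree for every modeled input. */
void check_flows_agree(void)
{
	for (int k = ENTRY_DEVICE; k <= ENTRY_MIGRATION; k++) {
		for (int z = 0; z <= 1; z++) {
			struct zap_result a = zap_old((enum entry_kind)k, z != 0);
			struct zap_result b = zap_new((enum entry_kind)k, z != 0);

			assert(a.pte_cleared == b.pte_cleared);
			assert(a.swap_freed == b.swap_freed);
		}
	}
}
```

Calling check_flows_agree() walks all six modeled combinations. The model says nothing about rss accounting or rmap handling, which the patch leaves inside the individual branches.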
* [patch 077/227] mm/mmap: return 1 from stack_guard_gap __setup() handler
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: i.zhbanov, hughd, rdunlap, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Randy Dunlap <rdunlap@infradead.org>
Subject: mm/mmap: return 1 from stack_guard_gap __setup() handler
__setup() handlers should return 1 if the command line option is handled
and 0 if not (or maybe never return 0; it just pollutes init's
environment). This prevents:
Unknown kernel command line parameters \
"BOOT_IMAGE=/boot/bzImage-517rc5 stack_guard_gap=100", will be \
passed to user space.
Run /sbin/init as init process
with arguments:
/sbin/init
with environment:
HOME=/
TERM=linux
BOOT_IMAGE=/boot/bzImage-517rc5
stack_guard_gap=100
Return 1 to indicate that the boot option has been handled.
Note that there is no warning message if someone enters:
stack_guard_gap=anything_invalid
and 'val' and stack_guard_gap are both set to 0 due to the use of
simple_strtoul(). This could be improved by using kstrtoxxx() and
checking for an error.
It appears that having stack_guard_gap == 0 is valid (if unexpected) since
using "stack_guard_gap=0" on the kernel command line does that.
Link: https://lkml.kernel.org/r/20220222005817.11087-1-rdunlap@infradead.org
Link: https://lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Fixes: 1be7107fbe18e ("mm: larger stack guard gap, between vmas")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/mmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/mmap.c~mm-mmap-return-1-from-stack_guard_gap-__setup-handler
+++ a/mm/mmap.c
@@ -2557,7 +2557,7 @@ static int __init cmdline_parse_stack_gu
if (!*endptr)
stack_guard_gap = val << PAGE_SHIFT;
- return 0;
+ return 1;
}
__setup("stack_guard_gap=", cmdline_parse_stack_guard_gap);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
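The commit message of patch 077 suggests that parsing could be improved by using a kstrtoxxx() helper and checking for an error. A rough userspace sketch of that idea follows. It is not the kernel handler: strtoul() stands in for the kernel parser, and the names model_parse_stack_guard_gap, model_stack_guard_gap and MODEL_PAGE_SHIFT, as well as the 4 KiB page size, are assumptions made for illustration. The handler always returns 1, since the return value means "option consumed", not "value accepted".

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Stand-in for the kernel variable; the initial value is illustrative. */
unsigned long model_stack_guard_gap = 256UL << MODEL_PAGE_SHIFT;

/*
 * Model of a __setup()-style handler with strict parsing. A malformed
 * value is reported and ignored, but the option still counts as handled.
 */
int model_parse_stack_guard_gap(const char *p)
{
	char *endptr;
	unsigned long val;

	assert(p != NULL);

	errno = 0;
	val = strtoul(p, &endptr, 10);
	if (errno != 0 || endptr == p || *endptr != '\0')
		fprintf(stderr, "stack_guard_gap: ignoring invalid value '%s'\n", p);
	else
		model_stack_guard_gap = val << MODEL_PAGE_SHIFT;

	return 1;	/* handled: do not pass the option on to init */
}
```

With this shape, "stack_guard_gap=0" still sets the gap to zero, while a malformed value leaves the previous setting in place and produces a message.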
* [patch 078/227] mm/memory.c: use helper function range_in_vma()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: linmiaohe, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory.c: use helper function range_in_vma()
Use the helper function range_in_vma() to check whether the range
[address, address + size) lies within the vma. Minor readability improvement.
Link: https://lkml.kernel.org/r/20220219021441.29173-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memory.c~mm-use-helper-function-range_in_vma
+++ a/mm/memory.c
@@ -1715,7 +1715,7 @@ static void zap_page_range_single(struct
void zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
unsigned long size)
{
- if (address < vma->vm_start || address + size > vma->vm_end ||
+ if (!range_in_vma(vma, address, address + size) ||
!(vma->vm_flags & VM_PFNMAP))
return;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
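For readers who do not have the helper from patch 078 in mind, here is a small userspace model of the check. struct model_vma and both function names are stand-ins invented for this sketch. The helper body mirrors what I understand range_in_vma() to do, so treat it as an approximation, not a copy of the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct vm_area_struct. */
struct model_vma {
	unsigned long vm_start;	/* first address inside the area */
	unsigned long vm_end;	/* first address past the area */
};

/* True when [start, end) lies entirely inside the area. */
bool model_range_in_vma(const struct model_vma *vma,
			unsigned long start, unsigned long end)
{
	return vma && vma->vm_start <= start && end <= vma->vm_end;
}

/* The open-coded test that the patch replaces, un-negated for comparison. */
bool model_open_coded(const struct model_vma *vma,
		      unsigned long address, unsigned long size)
{
	assert(vma != NULL);
	return !(address < vma->vm_start || address + size > vma->vm_end);
}
```

For a non-NULL vma the two functions agree whenever end equals address + size, which is how zap_vma_ptes() calls the helper in the diff.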
* [patch 079/227] mm/memory.c: use helper macro min and max in unmap_mapping_range_tree()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: linmiaohe, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory.c: use helper macro min and max in unmap_mapping_range_tree()
Use the min() and max() helper macros to simplify the range-clamping
logic. Minor readability improvement.
Link: https://lkml.kernel.org/r/20220224121134.35068-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
--- a/mm/memory.c~mm-use-helper-macro-min-and-max-in-unmap_mapping_range_tree
+++ a/mm/memory.c
@@ -3350,12 +3350,8 @@ static inline void unmap_mapping_range_t
vma_interval_tree_foreach(vma, root, first_index, last_index) {
vba = vma->vm_pgoff;
vea = vba + vma_pages(vma) - 1;
- zba = first_index;
- if (zba < vba)
- zba = vba;
- zea = last_index;
- if (zea > vea)
- zea = vea;
+ zba = max(first_index, vba);
+ zea = min(last_index, vea);
unmap_mapping_range_vma(vma,
((zba - vba) << PAGE_SHIFT) + vma->vm_start,
_
^ permalink raw reply [flat|nested] 786+ messages in thread
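The clamping in patch 079 is an interval intersection. A minimal userspace sketch, with hypothetical names (struct index_range, clamp_to_vma, min_ul, max_ul) and plain functions in place of the kernel's type-checked min()/max() macros:

```c
#include <assert.h>

/* Inclusive range of page offsets. */
struct index_range {
	unsigned long first;
	unsigned long last;
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Intersect the requested range [first_index, last_index] with the page
 * offsets [vba, vea] covered by one vma. The ranges are assumed to
 * overlap, as they would when the vma comes from an interval tree walk
 * over the requested range.
 */
struct index_range clamp_to_vma(unsigned long first_index,
				unsigned long last_index,
				unsigned long vba, unsigned long vea)
{
	struct index_range r;

	assert(first_index <= vea && vba <= last_index);

	r.first = max_ul(first_index, vba);	/* zba in the patch */
	r.last = min_ul(last_index, vea);	/* zea in the patch */
	return r;
}
```

For example, clamping the request [5, 20] against a vma covering [10, 15] yields [10, 15], and clamping [0, 12] against the same vma yields [10, 12].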
* [patch 080/227] mm: _install_special_mapping() apply VM_LOCKED_CLEAR_MASK
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: vbabka, hughd, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Hugh Dickins <hughd@google.com>
Subject: mm: _install_special_mapping() apply VM_LOCKED_CLEAR_MASK
_install_special_mapping() adds the VM_SPECIAL bit VM_DONTEXPAND (and
never attempts to update locked_vm), so it ought to be consistent with
mmap_region() and mlock_fixup(), making sure not to add VM_LOCKED or
VM_LOCKONFAULT. I doubt that this fixes any problem in practice: just do
it for consistency.
Link: https://lkml.kernel.org/r/a85315a9-21d1-6133-c5fc-c89863dfb25b@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/mmap.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/mmap.c~mm-_install_special_mapping-apply-vm_locked_clear_mask
+++ a/mm/mmap.c
@@ -3448,6 +3448,7 @@ static struct vm_area_struct *__install_
vma->vm_end = addr + len;
vma->vm_flags = vm_flags | mm->def_flags | VM_DONTEXPAND | VM_SOFTDIRTY;
+ vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
vma->vm_ops = ops;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
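The effect of the one-line change in patch 080 is easiest to see on raw flag bits. The sketch below is a userspace model with MODEL_-prefixed constants whose numeric values are illustrative assumptions. Only the shape of the computation is taken from the diff. The scenario assumed here is a process whose default flags already carry the locked bit.

```c
#include <assert.h>

/* Illustrative bit values; the real definitions live in kernel headers. */
#define MODEL_VM_LOCKED		0x00002000UL
#define MODEL_VM_DONTEXPAND	0x00040000UL
#define MODEL_VM_LOCKONFAULT	0x00080000UL
#define MODEL_VM_SOFTDIRTY	0x08000000UL
#define MODEL_VM_LOCKED_CLEAR_MASK \
	(~(MODEL_VM_LOCKED | MODEL_VM_LOCKONFAULT))

/* Mirrors the two assignments to vma->vm_flags shown in the diff. */
unsigned long model_special_mapping_flags(unsigned long vm_flags,
					   unsigned long def_flags)
{
	unsigned long flags;

	flags = vm_flags | def_flags | MODEL_VM_DONTEXPAND | MODEL_VM_SOFTDIRTY;
	flags &= MODEL_VM_LOCKED_CLEAR_MASK;	/* the line added by the patch */
	return flags;
}

/* A locked default flag must not leak into the special mapping. */
void model_special_mapping_example(void)
{
	unsigned long flags = model_special_mapping_flags(0, MODEL_VM_LOCKED);

	assert(!(flags & MODEL_VM_LOCKED));
	assert(flags & MODEL_VM_DONTEXPAND);
}
```

The mask only strips the two lock-related bits. Every other requested or default flag passes through unchanged.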
* [patch 081/227] mm/mmap: remove obsolete comment in ksys_mmap_pgoff
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: linmiaohe, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/mmap: remove obsolete comment in ksys_mmap_pgoff
RLIMIT_MEMLOCK is now implemented on top of ucounts, and commit
83c1fd763b32 ("mm,hugetlb: remove mlock ulimit for SHM_HUGETLB") removed
the mlock ulimit for SHM_HUGETLB. The comment about a dummy user value is
therefore obsolete; remove it.
Link: https://lkml.kernel.org/r/20220309090623.13036-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/mmap.c | 2 --
1 file changed, 2 deletions(-)
--- a/mm/mmap.c~mm-mmap-remove-obsolete-comment-in-ksys_mmap_pgoff
+++ a/mm/mmap.c
@@ -1616,8 +1616,6 @@ unsigned long ksys_mmap_pgoff(unsigned l
/*
* VM_NORESERVE is used because the reservations will be
* taken when vm_ops->mmap() is called
- * A dummy user value is used because we are not locking
- * memory so no accounting is necessary
*/
file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
VM_NORESERVE,
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 082/227] mm/mremap: use vma_lookup() instead of find_vma()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: david, linmiaohe, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/mremap: use vma_lookup() instead of find_vma()
vma_lookup() verifies that the address is contained in the found vma,
which makes the code easier to read.
Link: https://lkml.kernel.org/r/20220312083118.48284-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/mremap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/mremap.c~mm-mremap-use-vma_lookup-instead-of-find_vma
+++ a/mm/mremap.c
@@ -942,8 +942,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, a
if (mmap_write_lock_killable(current->mm))
return -EINTR;
- vma = find_vma(mm, addr);
- if (!vma || vma->vm_start > addr) {
+ vma = vma_lookup(mm, addr);
+ if (!vma) {
ret = EFAULT;
goto out;
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
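The difference between the two lookups in patch 082 can be shown with a sorted array in place of the kernel's vma tree. Everything below is a userspace model with invented names (struct model_vma, model_find_vma, model_vma_lookup). The linear scan is only there to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a vma: the half-open range [vm_start, vm_end). */
struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * find_vma()-like lookup over an array sorted by address: returns the
 * first area that ends above addr. That area may start above addr too,
 * in which case addr sits in a gap in front of it.
 */
const struct model_vma *model_find_vma(const struct model_vma *vmas,
					size_t n, unsigned long addr)
{
	assert(vmas != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (addr < vmas[i].vm_end)
			return &vmas[i];
	return NULL;
}

/* vma_lookup()-like lookup: same search, plus the containment check. */
const struct model_vma *model_vma_lookup(const struct model_vma *vmas,
					  size_t n, unsigned long addr)
{
	const struct model_vma *vma = model_find_vma(vmas, n, addr);

	if (vma && addr < vma->vm_start)
		vma = NULL;
	return vma;
}
```

With areas at [0x1000, 0x2000) and [0x4000, 0x5000), looking up 0x3000 returns the second area from model_find_vma() but NULL from model_vma_lookup(), which is why the caller in the diff no longer needs its own vm_start comparison.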
* [patch 083/227] mm/sparse: make mminit_validate_memmodel_limits() static
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: rppt, linmiaohe, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/sparse: make mminit_validate_memmodel_limits() static
It is now used only in sparse.c, so make it static and clean up the
related declarations.
Link: https://lkml.kernel.org/r/20220127093221.63524-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/internal.h | 11 -----------
mm/sparse.c | 2 +-
2 files changed, 1 insertion(+), 12 deletions(-)
--- a/mm/internal.h~mm-sparse-make-mminit_validate_memmodel_limits-static
+++ a/mm/internal.h
@@ -572,17 +572,6 @@ static inline void mminit_verify_zonelis
}
#endif /* CONFIG_DEBUG_MEMORY_INIT */
-/* mminit_validate_memmodel_limits is independent of CONFIG_DEBUG_MEMORY_INIT */
-#if defined(CONFIG_SPARSEMEM)
-extern void mminit_validate_memmodel_limits(unsigned long *start_pfn,
- unsigned long *end_pfn);
-#else
-static inline void mminit_validate_memmodel_limits(unsigned long *start_pfn,
- unsigned long *end_pfn)
-{
-}
-#endif /* CONFIG_SPARSEMEM */
-
#define NODE_RECLAIM_NOSCAN -2
#define NODE_RECLAIM_FULL -1
#define NODE_RECLAIM_SOME 0
--- a/mm/sparse.c~mm-sparse-make-mminit_validate_memmodel_limits-static
+++ a/mm/sparse.c
@@ -126,7 +126,7 @@ static inline int sparse_early_nid(struc
}
/* Validate the physical addressing limitations of the model */
-void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
+static void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
unsigned long *end_pfn)
{
unsigned long max_sparsemem_pfn = 1UL << (MAX_PHYSMEM_BITS-PAGE_SHIFT);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 084/227] mm/vmalloc: remove unneeded function forward declaration
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: urezki, linmiaohe, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/vmalloc: remove unneeded function forward declaration
The forward declaration for lazy_max_pages() is unnecessary. Remove it.
Link: https://lkml.kernel.org/r/20220124133752.60663-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/vmalloc.c | 1 -
1 file changed, 1 deletion(-)
--- a/mm/vmalloc.c~mm-vmalloc-remove-unneeded-function-forward-declaration
+++ a/mm/vmalloc.c
@@ -791,7 +791,6 @@ RB_DECLARE_CALLBACKS_MAX(static, free_vm
static void purge_vmap_area_lazy(void);
static BLOCKING_NOTIFIER_HEAD(vmap_notify_list);
-static unsigned long lazy_max_pages(void);
static atomic_long_t nr_vmalloc_pages;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 085/227] mm/vmalloc: Move draining areas out of caller context
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: willy, vvs, uladzislau.rezki, oleksiy.avramchenko, npiggin, hch,
urezki, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Subject: mm/vmalloc: Move draining areas out of caller context
A caller initiates the drain process from its own context once the
drain threshold is reached or passed. There are at least two
drawbacks of doing so:
a) a caller can be a high-prio or RT task. In that case it can get
stuck doing the actual drain of all lazily freed areas.
This is not optimal because such tasks are usually latency
sensitive, and control should be returned to them as soon
as possible in order to drive such workloads in time. See
96e2db456135 ("mm/vmalloc: rework the drain logic")
b) It is not safe to call vfree() while holding a spinlock due
to the vmap_purge_lock mutex. There was a report about this from
Zeal Robot <zealci@zte.com.cn> here:
https://lore.kernel.org/all/20211222081026.484058-1-chi.minghao@zte.com.cn
Moving the drain to a separate work context addresses those
issues.
v1->v2:
- Added the "_work" suffix to the drain worker function name.
v2->v3:
- Remove drain_vmap_work_in_progress. Extra queuing is expected
under heavy load, but it can be disregarded because the work
item bails out if there is nothing to be done.
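The hand-off described above can be pictured with a single-threaded userspace model. All names here (model_free_area, model_drain_work, the MODEL_LAZY_MAX_PAGES threshold of 32) are hypothetical, and a boolean stands in for the queued work item. In the kernel the request would presumably go through the workqueue API, which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_LAZY_MAX_PAGES 32UL	/* hypothetical drain threshold */

unsigned long model_lazy_nr;		/* pages waiting to be purged */
bool model_work_pending;		/* stands in for the queued work item */

/*
 * Free path: only account for the pages and, past the threshold, ask for
 * a deferred drain. The caller never purges in its own context.
 */
void model_free_area(unsigned long nr_pages)
{
	model_lazy_nr += nr_pages;
	if (model_lazy_nr > MODEL_LAZY_MAX_PAGES)
		model_work_pending = true;
}

/*
 * Worker: runs later in its own context. It bails out if nothing was
 * requested and otherwise keeps purging until the backlog is at or
 * below the threshold, mirroring the do/while loop in the diff.
 */
void model_drain_work(void)
{
	if (!model_work_pending)
		return;
	model_work_pending = false;

	do {
		model_lazy_nr = 0;	/* stands in for the actual purge */
	} while (model_lazy_nr > MODEL_LAZY_MAX_PAGES);

	assert(model_lazy_nr <= MODEL_LAZY_MAX_PAGES);
}
```

In this model the recheck never loops because nothing runs concurrently. In the kernel, other CPUs can add lazy areas while the purge is in progress, which is what the recheck is for.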
Link: https://lkml.kernel.org/r/20220131144058.35608-1-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/vmalloc.c | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)
--- a/mm/vmalloc.c~mm-vmalloc-move-draining-areas-out-of-caller-context
+++ a/mm/vmalloc.c
@@ -791,6 +791,8 @@ RB_DECLARE_CALLBACKS_MAX(static, free_vm
static void purge_vmap_area_lazy(void);
static BLOCKING_NOTIFIER_HEAD(vmap_notify_list);
+static void drain_vmap_area_work(struct work_struct *work);
+static DECLARE_WORK(drain_vmap_work, drain_vmap_area_work);
static atomic_long_t nr_vmalloc_pages;
@@ -1718,18 +1720,6 @@ static bool __purge_vmap_area_lazy(unsig
}
/*
- * Kick off a purge of the outstanding lazy areas. Don't bother if somebody
- * is already purging.
- */
-static void try_purge_vmap_area_lazy(void)
-{
- if (mutex_trylock(&vmap_purge_lock)) {
- __purge_vmap_area_lazy(ULONG_MAX, 0);
- mutex_unlock(&vmap_purge_lock);
- }
-}
-
-/*
* Kick off a purge of the outstanding lazy areas.
*/
static void purge_vmap_area_lazy(void)
@@ -1740,6 +1730,20 @@ static void purge_vmap_area_lazy(void)
mutex_unlock(&vmap_purge_lock);
}
+static void drain_vmap_area_work(struct work_struct *work)
+{
+ unsigned long nr_lazy;
+
+ do {
+ mutex_lock(&vmap_purge_lock);
+ __purge_vmap_area_lazy(ULONG_MAX, 0);
+ mutex_unlock(&vmap_purge_lock);
+
+ /* Recheck if further work is required. */
+ nr_lazy = atomic_long_read(&vmap_lazy_nr);
+ } while (nr_lazy > lazy_max_pages());
+}
+
/*
* Free a vmap area, caller ensuring that the area has been unmapped
* and flush_cache_vunmap had been called for the correct range
@@ -1766,7 +1770,7 @@ static void free_vmap_area_noflush(struc
/* After this point, we may free va at any time */
if (unlikely(nr_lazy > lazy_max_pages()))
- try_purge_vmap_area_lazy();
+ schedule_work(&drain_vmap_work);
}
/*
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 086/227] mm/vmalloc: add adjust_search_size parameter
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: willy, vvs, urezki, oleksiy.avramchenko, npiggin, hch,
uladzislau.rezki, akpm, patches, linux-mm, mm-commits, torvalds,
akpm
From: Uladzislau Rezki <uladzislau.rezki@sony.com>
Subject: mm/vmalloc: add adjust_search_size parameter
Extend the find_vmap_lowest_match() function with one more parameter, the
"adjust_search_size" boolean, which makes it possible to control the
accuracy of the block search when a specific alignment is required.
With this patch, the search size is always adjusted so that a request is
served as fast as possible, for performance reasons.
There is one exception, though: short ranges where the requested size
corresponds to the passed vstart/vend restriction together with a specific
alignment request. In such a scenario an adjustment would not lead to a
successful allocation.
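As a worked illustration of the exception, the sketch below mirrors the two
decisions the patch makes: how the search length is padded, and when padding
is skipped. It is a user-space approximation, not the kernel code; the
demo_* names and the 4 KiB page size are assumptions made for the example.

```c
#include <stdbool.h>

#define DEMO_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Worst-case padded length when adjusting, plain size otherwise. */
static unsigned long demo_search_length(unsigned long size,
					unsigned long align, bool adjust)
{
	return adjust ? size + align - 1 : size;
}

/* Mirrors the condition added to __alloc_vmap_area() in this patch. */
static bool demo_should_adjust(unsigned long size, unsigned long align,
			       unsigned long vstart, unsigned long vend)
{
	/* Free blocks are at least page aligned, so padding gains nothing. */
	if (align <= DEMO_PAGE_SIZE)
		return false;

	/* The range is exactly as large as the request: padding cannot fit. */
	if (vend - vstart == size)
		return false;

	return true;
}
```

For a 2 MiB request with 2 MiB alignment in a window of exactly 2 MiB, the
padded length would be 4 MiB - 1, which no free block inside that window can
satisfy. That is the case where the adjustment has to be disabled.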
Link: https://lkml.kernel.org/r/20220119143540.601149-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki <uladzislau.rezki@sony.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/vmalloc.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)
--- a/mm/vmalloc.c~mm-vmalloc-add-adjust_search_size-parameter
+++ a/mm/vmalloc.c
@@ -1189,22 +1189,28 @@ is_within_this_va(struct vmap_area *va,
/*
* Find the first free block(lowest start address) in the tree,
* that will accomplish the request corresponding to passing
- * parameters.
+ * parameters. Please note, with an alignment bigger than PAGE_SIZE,
+ * a search length is adjusted to account for worst case alignment
+ * overhead.
*/
static __always_inline struct vmap_area *
-find_vmap_lowest_match(unsigned long size,
- unsigned long align, unsigned long vstart)
+find_vmap_lowest_match(unsigned long size, unsigned long align,
+ unsigned long vstart, bool adjust_search_size)
{
struct vmap_area *va;
struct rb_node *node;
+ unsigned long length;
/* Start from the root. */
node = free_vmap_area_root.rb_node;
+ /* Adjust the search size for alignment overhead. */
+ length = adjust_search_size ? size + align - 1 : size;
+
while (node) {
va = rb_entry(node, struct vmap_area, rb_node);
- if (get_subtree_max_size(node->rb_left) >= size &&
+ if (get_subtree_max_size(node->rb_left) >= length &&
vstart < va->va_start) {
node = node->rb_left;
} else {
@@ -1214,9 +1220,9 @@ find_vmap_lowest_match(unsigned long siz
/*
* Does not make sense to go deeper towards the right
* sub-tree if it does not have a free block that is
- * equal or bigger to the requested search size.
+ * equal or bigger to the requested search length.
*/
- if (get_subtree_max_size(node->rb_right) >= size) {
+ if (get_subtree_max_size(node->rb_right) >= length) {
node = node->rb_right;
continue;
}
@@ -1232,7 +1238,7 @@ find_vmap_lowest_match(unsigned long siz
if (is_within_this_va(va, size, align, vstart))
return va;
- if (get_subtree_max_size(node->rb_right) >= size &&
+ if (get_subtree_max_size(node->rb_right) >= length &&
vstart <= va->va_start) {
/*
* Shift the vstart forward. Please note, we update it with
@@ -1280,7 +1286,7 @@ find_vmap_lowest_match_check(unsigned lo
get_random_bytes(&rnd, sizeof(rnd));
vstart = VMALLOC_START + rnd;
- va_1 = find_vmap_lowest_match(size, align, vstart);
+ va_1 = find_vmap_lowest_match(size, align, vstart, false);
va_2 = find_vmap_lowest_linear_match(size, align, vstart);
if (va_1 != va_2)
@@ -1431,12 +1437,25 @@ static __always_inline unsigned long
__alloc_vmap_area(unsigned long size, unsigned long align,
unsigned long vstart, unsigned long vend)
{
+ bool adjust_search_size = true;
unsigned long nva_start_addr;
struct vmap_area *va;
enum fit_type type;
int ret;
- va = find_vmap_lowest_match(size, align, vstart);
+ /*
+ * Do not adjust when:
+ * a) align <= PAGE_SIZE, because it does not make any sense.
+ * All blocks(their start addresses) are at least PAGE_SIZE
+ * aligned anyway;
+ * b) a short range where a requested size corresponds to exactly
+ * specified [vstart:vend] interval and an alignment > PAGE_SIZE.
+ * With adjusted search length an allocation would not succeed.
+ */
+ if (align <= PAGE_SIZE || (align > PAGE_SIZE && (vend - vstart) == size))
+ adjust_search_size = false;
+
+ va = find_vmap_lowest_match(size, align, vstart, adjust_search_size);
if (unlikely(!va))
return vend;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 087/227] mm/vmalloc: eliminate an extra orig_gfp_mask
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: willy, vvs, uladzislau.rezki, oleksiy.avramchenko, npiggin, hch,
urezki, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Subject: mm/vmalloc: eliminate an extra orig_gfp_mask
That extra variable was introduced just to keep the originally
passed gfp_mask, because gfp_mask is updated with __GFP_NOWARN on entry and
error handling messages were therefore broken.
Instead we can keep the original gfp_mask unmodified and pass the
extra __GFP_NOWARN flag together with gfp_mask as a parameter to the
vm_area_alloc_pages() function. This makes the code less confusing.
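The idea can be sketched in user-space C as follows. This is only an
illustration: the demo_* names and the flag values are made up for the
example and do not correspond to real GFP bits. The caller's mask is never
modified; the "no warning" bit is OR-ed in only at the call that should stay
quiet, so the later warning still sees the original mask.

```c
typedef unsigned int demo_gfp_t;	/* hypothetical stand-in for gfp_t */

#define DEMO_GFP_BASE   0x0cc0u	/* made-up "normal allocation" bits */
#define DEMO_GFP_NOWARN 0x2000u	/* made-up "do not warn" bit */

static demo_gfp_t demo_last_alloc_mask;
static demo_gfp_t demo_last_warn_mask;

/* Stub allocator: records the mask it was given and allocates nothing. */
static unsigned int demo_alloc_pages(demo_gfp_t gfp, unsigned int nr_pages)
{
	(void)nr_pages;
	demo_last_alloc_mask = gfp;
	return 0;
}

/* Stub warning hook: records the mask used for reporting. */
static void demo_warn_alloc(demo_gfp_t gfp)
{
	demo_last_warn_mask = gfp;
}

static int demo_area_alloc(demo_gfp_t gfp_mask, unsigned int nr_pages)
{
	/* Silence the inner allocator only for this call. */
	unsigned int got = demo_alloc_pages(gfp_mask | DEMO_GFP_NOWARN,
					    nr_pages);

	if (got != nr_pages) {
		/* Report with the caller's untouched mask. */
		demo_warn_alloc(gfp_mask);
		return -1;
	}
	return 0;
}
```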
Link: https://lkml.kernel.org/r/20220119143540.601149-3-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/vmalloc.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
--- a/mm/vmalloc.c~mm-vmalloc-eliminate-an-extra-orig_gfp_mask
+++ a/mm/vmalloc.c
@@ -2946,7 +2946,6 @@ static void *__vmalloc_area_node(struct
int node)
{
const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
- const gfp_t orig_gfp_mask = gfp_mask;
bool nofail = gfp_mask & __GFP_NOFAIL;
unsigned long addr = (unsigned long)area->addr;
unsigned long size = get_vm_area_size(area);
@@ -2970,7 +2969,7 @@ static void *__vmalloc_area_node(struct
}
if (!area->pages) {
- warn_alloc(orig_gfp_mask, NULL,
+ warn_alloc(gfp_mask, NULL,
"vmalloc error: size %lu, failed to allocated page array size %lu",
nr_small_pages * PAGE_SIZE, array_size);
free_vm_area(area);
@@ -2980,8 +2979,8 @@ static void *__vmalloc_area_node(struct
set_vm_area_page_order(area, page_shift - PAGE_SHIFT);
page_order = vm_area_page_order(area);
- area->nr_pages = vm_area_alloc_pages(gfp_mask, node,
- page_order, nr_small_pages, area->pages);
+ area->nr_pages = vm_area_alloc_pages(gfp_mask | __GFP_NOWARN,
+ node, page_order, nr_small_pages, area->pages);
atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
if (gfp_mask & __GFP_ACCOUNT) {
@@ -2997,7 +2996,7 @@ static void *__vmalloc_area_node(struct
* allocation request, free them via __vfree() if any.
*/
if (area->nr_pages != nr_small_pages) {
- warn_alloc(orig_gfp_mask, NULL,
+ warn_alloc(gfp_mask, NULL,
"vmalloc error: size %lu, page order %u, failed to allocate pages",
area->nr_pages * PAGE_SIZE, page_order);
goto fail;
@@ -3025,7 +3024,7 @@ static void *__vmalloc_area_node(struct
memalloc_noio_restore(flags);
if (ret < 0) {
- warn_alloc(orig_gfp_mask, NULL,
+ warn_alloc(gfp_mask, NULL,
"vmalloc error: size %lu, failed to map pages",
area->nr_pages * PAGE_SIZE);
goto fail;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 088/227] mm/vmalloc.c: fix "unused function" warning
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:42 UTC (permalink / raw)
To: abaci, jiapeng.chong, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Subject: mm/vmalloc.c: fix "unused function" warning
compute_subtree_max_size() is unused unless the kernel is built with
DEBUG_AUGMENT_PROPAGATE_CHECK enabled, which triggers:
mm/vmalloc.c:785:1: warning: unused function 'compute_subtree_max_size'
[-Wunused-function].
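A small user-space sketch of the same fix pattern: a static helper whose
only caller is debug-only code is defined inside the same preprocessor
guard, so it is never compiled when that code is disabled. The demo_* names
and the DEMO_PROPAGATE_CHECK switch are hypothetical and exist only for this
example.

```c
#include <stddef.h>

#ifndef DEMO_PROPAGATE_CHECK
#define DEMO_PROPAGATE_CHECK 1	/* hypothetical debug switch */
#endif

struct demo_node {
	unsigned long size;		/* size of this node's block */
	unsigned long subtree_max;	/* cached maximum of the subtree */
	struct demo_node *left;
	struct demo_node *right;
};

#if DEMO_PROPAGATE_CHECK
/* Cached maximum of a subtree, 0 for an empty one. */
static unsigned long demo_subtree_max(const struct demo_node *n)
{
	return n ? n->subtree_max : 0;
}

/* Recompute the maximum from the node and both children. */
static unsigned long demo_compute_subtree_max(const struct demo_node *n)
{
	unsigned long max = n->size;
	unsigned long l = demo_subtree_max(n->left);
	unsigned long r = demo_subtree_max(n->right);

	if (l > max)
		max = l;
	if (r > max)
		max = r;
	return max;
}

/* Debug-only check: does the cached value match a fresh computation? */
static int demo_check_node(const struct demo_node *n)
{
	return demo_compute_subtree_max(n) == n->subtree_max;
}
#endif
```

Unlike get_subtree_max_size() in the patch, the helper demo_subtree_max() has
no other users here, so it sits inside the guard as well.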
Link: https://lkml.kernel.org/r/20220129034652.75359-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/vmalloc.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
--- a/mm/vmalloc.c~mm-vmallocc-fix-unused-function-warning
+++ a/mm/vmalloc.c
@@ -775,17 +775,6 @@ get_subtree_max_size(struct rb_node *nod
return va ? va->subtree_max_size : 0;
}
-/*
- * Gets called when remove the node and rotate.
- */
-static __always_inline unsigned long
-compute_subtree_max_size(struct vmap_area *va)
-{
- return max3(va_size(va),
- get_subtree_max_size(va->rb_node.rb_left),
- get_subtree_max_size(va->rb_node.rb_right));
-}
-
RB_DECLARE_CALLBACKS_MAX(static, free_vmap_area_rb_augment_cb,
struct vmap_area, rb_node, unsigned long, subtree_max_size, va_size)
@@ -973,6 +962,17 @@ unlink_va(struct vmap_area *va, struct r
}
#if DEBUG_AUGMENT_PROPAGATE_CHECK
+/*
+ * Gets called when remove the node and rotate.
+ */
+static __always_inline unsigned long
+compute_subtree_max_size(struct vmap_area *va)
+{
+ return max3(va_size(va),
+ get_subtree_max_size(va->rb_node.rb_left),
+ get_subtree_max_size(va->rb_node.rb_right));
+}
+
static void
augment_tree_propagate_check(void)
{
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 089/227] mm/vmalloc: fix comments about vmap_area struct
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: urezki, lpf.vector, libang.linuxer, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Bang Li <libang.linuxer@gmail.com>
Subject: mm/vmalloc: fix comments about vmap_area struct
vmap_area_root is the root of the "busy" tree and
free_vmap_area_root is the root of the "free" tree.
Link: https://lkml.kernel.org/r/20220305011510.33596-1-libang.linuxer@gmail.com
Fixes: 688fcbfc06e4 ("mm/vmalloc: modify struct vmap_area to reduce its size")
Signed-off-by: Bang Li <libang.linuxer@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Pengfei Li <lpf.vector@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/vmalloc.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/include/linux/vmalloc.h~mm-vmalloc-fix-comments-about-vmap_area-struct
+++ a/include/linux/vmalloc.h
@@ -80,8 +80,8 @@ struct vmap_area {
/*
* The following two variables can be packed, because
* a vmap_area object can be either:
- * 1) in "free" tree (root is vmap_area_root)
- * 2) or "busy" tree (root is free_vmap_area_root)
+ * 1) in "free" tree (root is free_vmap_area_root)
+ * 2) or "busy" tree (root is vmap_area_root)
*/
union {
unsigned long subtree_max_size; /* in "free" tree */
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 090/227] mm: page_alloc: avoid merging non-fallbackable pageblocks with others
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: vbabka, rppt, rppt, osalvador, mgorman, david, ziy, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Zi Yan <ziy@nvidia.com>
Subject: mm: page_alloc: avoid merging non-fallbackable pageblocks with others
This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance. It
prepares for the upcoming removal of the MAX_ORDER-1 alignment requirement
for CMA and alloc_contig_range().
MIGRATE_HIGHATOMIC should not merge with other migratetypes like
MIGRATE_ISOLATE and MIGRATE_CMA[1], so this commit prevents that too.
Remove MIGRATE_CMA and MIGRATE_ISOLATE from the fallbacks list, since they
are never used.
[1] https://lore.kernel.org/linux-mm/20211130100853.GP3366@techsingularity.net/
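The merge rule can be illustrated with a stand-alone sketch. The enum below
is a demo copy that follows the ordering of the kernel's migratetype enum
with CMA and memory isolation both enabled; the demo_* and DEMO_* names are
hypothetical and the code is not taken from the tree. Only the types below
the PCPTYPES marker have fallbacks, and two buddies of different types may
merge only if both are of that kind.

```c
#include <stdbool.h>

/* Demo copy of the ordering; the HIGHATOMIC alias matters for the test. */
enum demo_migratetype {
	DEMO_UNMOVABLE,
	DEMO_MOVABLE,
	DEMO_RECLAIMABLE,
	DEMO_PCPTYPES,			/* number of types with fallbacks */
	DEMO_HIGHATOMIC = DEMO_PCPTYPES,
	DEMO_CMA,
	DEMO_ISOLATE,
	DEMO_TYPES
};

/* Same test as migratetype_is_mergeable() in the patch. */
static bool demo_is_mergeable(int mt)
{
	return mt < DEMO_PCPTYPES;
}

/* Mirrors the check before "goto done_merging" in the free path. */
static bool demo_may_merge(int mt, int buddy_mt)
{
	if (mt != buddy_mt &&
	    (!demo_is_mergeable(mt) || !demo_is_mergeable(buddy_mt)))
		return false;

	return true;
}
```

Note that two pageblocks of the same type may still merge even when that
type has no fallbacks, because the check only fires when the types differ.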
Link: https://lkml.kernel.org/r/20220124175957.1261961-1-zi.yan@sent.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/mmzone.h | 11 +++++++++
mm/page_alloc.c | 46 ++++++++++++++++++---------------------
2 files changed, 33 insertions(+), 24 deletions(-)
--- a/include/linux/mmzone.h~mm-page_alloc-avoid-merging-non-fallbackable-pageblocks-with-others
+++ a/include/linux/mmzone.h
@@ -83,6 +83,17 @@ static inline bool is_migrate_movable(in
return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
}
+/*
+ * Check whether a migratetype can be merged with another migratetype.
+ *
+ * It is only mergeable when it can fall back to other migratetypes for
+ * allocation. See fallbacks[MIGRATE_TYPES][3] in page_alloc.c.
+ */
+static inline bool migratetype_is_mergeable(int mt)
+{
+ return mt < MIGRATE_PCPTYPES;
+}
+
#define for_each_migratetype_order(order, type) \
for (order = 0; order < MAX_ORDER; order++) \
for (type = 0; type < MIGRATE_TYPES; type++)
--- a/mm/page_alloc.c~mm-page_alloc-avoid-merging-non-fallbackable-pageblocks-with-others
+++ a/mm/page_alloc.c
@@ -1117,25 +1117,24 @@ continue_merging:
}
if (order < MAX_ORDER - 1) {
/* If we are here, it means order is >= pageblock_order.
- * We want to prevent merge between freepages on isolate
- * pageblock and normal pageblock. Without this, pageblock
- * isolation could cause incorrect freepage or CMA accounting.
+ * We want to prevent merge between freepages on pageblock
+ * without fallbacks and normal pageblock. Without this,
+ * pageblock isolation could cause incorrect freepage or CMA
+ * accounting or HIGHATOMIC accounting.
*
* We don't want to hit this code for the more frequent
* low-order merging.
*/
- if (unlikely(has_isolate_pageblock(zone))) {
- int buddy_mt;
+ int buddy_mt;
- buddy_pfn = __find_buddy_pfn(pfn, order);
- buddy = page + (buddy_pfn - pfn);
- buddy_mt = get_pageblock_migratetype(buddy);
-
- if (migratetype != buddy_mt
- && (is_migrate_isolate(migratetype) ||
- is_migrate_isolate(buddy_mt)))
- goto done_merging;
- }
+ buddy_pfn = __find_buddy_pfn(pfn, order);
+ buddy = page + (buddy_pfn - pfn);
+ buddy_mt = get_pageblock_migratetype(buddy);
+
+ if (migratetype != buddy_mt
+ && (!migratetype_is_mergeable(migratetype) ||
+ !migratetype_is_mergeable(buddy_mt)))
+ goto done_merging;
max_order = order + 1;
goto continue_merging;
}
@@ -2479,17 +2478,13 @@ struct page *__rmqueue_smallest(struct z
/*
* This array describes the order lists are fallen back to when
* the free lists for the desirable migrate type are depleted
+ *
+ * The other migratetypes do not have fallbacks.
*/
static int fallbacks[MIGRATE_TYPES][3] = {
[MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_TYPES },
[MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_TYPES },
-#ifdef CONFIG_CMA
- [MIGRATE_CMA] = { MIGRATE_TYPES }, /* Never used */
-#endif
-#ifdef CONFIG_MEMORY_ISOLATION
- [MIGRATE_ISOLATE] = { MIGRATE_TYPES }, /* Never used */
-#endif
};
#ifdef CONFIG_CMA
@@ -2795,8 +2790,8 @@ static void reserve_highatomic_pageblock
/* Yoink! */
mt = get_pageblock_migratetype(page);
- if (!is_migrate_highatomic(mt) && !is_migrate_isolate(mt)
- && !is_migrate_cma(mt)) {
+ /* Only reserve normal pageblocks (i.e., they can merge with others) */
+ if (migratetype_is_mergeable(mt)) {
zone->nr_reserved_highatomic += pageblock_nr_pages;
set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
@@ -3545,8 +3540,11 @@ int __isolate_free_page(struct page *pag
struct page *endpage = page + (1 << order) - 1;
for (; page < endpage; page += pageblock_nr_pages) {
int mt = get_pageblock_migratetype(page);
- if (!is_migrate_isolate(mt) && !is_migrate_cma(mt)
- && !is_migrate_highatomic(mt))
+ /*
+ * Only change normal pageblocks (i.e., they can merge
+ * with others)
+ */
+ if (migratetype_is_mergeable(mt))
set_pageblock_migratetype(page,
MIGRATE_MOVABLE);
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 091/227] mm/mmzone.c: use try_cmpxchg() in page_cpupid_xchg_last()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: peterz, mgorman, andreyknvl, pcc, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Peter Collingbourne <pcc@google.com>
Subject: mm/mmzone.c: use try_cmpxchg() in page_cpupid_xchg_last()
This will let us avoid an additional read from page->flags when retrying
the compare-exchange on some architectures.
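The retry pattern can be sketched in userspace with C11 atomics, whose
compare-exchange also writes the observed value back into the "expected"
argument on failure. This is an analogy under stated assumptions, not the
kernel's try_cmpxchg(): FIELD_SHIFT, FIELD_MASK, field_xchg_last() and
field_xchg_demo() are hypothetical names standing in for the cpupid field
handling.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical field layout: an 8-bit field stored at bits 8..15. */
#define FIELD_SHIFT	8
#define FIELD_MASK	0xffUL

/* Replace the field with val and return the previous field value. */
unsigned long field_xchg_last(_Atomic unsigned long *word, unsigned long val)
{
	/* Read the word once, before entering the loop. */
	unsigned long old = atomic_load_explicit(word, memory_order_relaxed);
	unsigned long new, last;

	do {
		new = old;
		last = (new >> FIELD_SHIFT) & FIELD_MASK;
		new &= ~(FIELD_MASK << FIELD_SHIFT);
		new |= (val & FIELD_MASK) << FIELD_SHIFT;
		/*
		 * On failure the call stores the current value of *word in
		 * old, so the retry needs no separate reload.
		 */
	} while (!atomic_compare_exchange_weak(word, &old, new));

	return last;
}

/* Small usage example; both assertions are expected to hold. */
void field_xchg_demo(void)
{
	_Atomic unsigned long word = 0x1234;

	assert(field_xchg_last(&word, 0xab) == 0x12);
	assert(atomic_load(&word) == 0xab34);
}
```

The point of the pattern is that the value used to compute the new word
and the value passed as "expected" are always the same snapshot.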
Link: https://lkml.kernel.org/r/20220120011200.1322836-1-pcc@google.com
Link: https://linux-review.googlesource.com/id/I2e1f5b5b080ac9c4e0eb7f98768dba6fd7821693
Signed-off-by: Peter Collingbourne <pcc@google.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/mmzone.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/mm/mmzone.c~mm-mmzonec-use-try_cmpxchg-in-page_cpupid_xchg_last
+++ a/mm/mmzone.c
@@ -89,13 +89,14 @@ int page_cpupid_xchg_last(struct page *p
unsigned long old_flags, flags;
int last_cpupid;
+ old_flags = READ_ONCE(page->flags);
do {
- old_flags = flags = page->flags;
- last_cpupid = page_cpupid_last(page);
+ flags = old_flags;
+ last_cpupid = (flags >> LAST_CPUPID_PGSHIFT) & LAST_CPUPID_MASK;
flags &= ~(LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT);
flags |= (cpupid & LAST_CPUPID_MASK) << LAST_CPUPID_PGSHIFT;
- } while (unlikely(cmpxchg(&page->flags, old_flags, flags) != old_flags));
+ } while (unlikely(!try_cmpxchg(&page->flags, &old_flags, flags)));
return last_cpupid;
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 092/227] mm/mmzone.h: remove unused macros
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: rppt, david, linmiaohe, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/mmzone.h: remove unused macros
Remove pgdat_page_nr, nid_page_nr and NODE_MEM_MAP. They are unused now.
Link: https://lkml.kernel.org/r/20220127093210.62293-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/mmzone.h | 7 -------
1 file changed, 7 deletions(-)
--- a/include/linux/mmzone.h~mm-mmzoneh-remove-unused-macros
+++ a/include/linux/mmzone.h
@@ -931,12 +931,6 @@ typedef struct pglist_data {
#define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages)
#define node_spanned_pages(nid) (NODE_DATA(nid)->node_spanned_pages)
-#ifdef CONFIG_FLATMEM
-#define pgdat_page_nr(pgdat, pagenr) ((pgdat)->node_mem_map + (pagenr))
-#else
-#define pgdat_page_nr(pgdat, pagenr) pfn_to_page((pgdat)->node_start_pfn + (pagenr))
-#endif
-#define nid_page_nr(nid, pagenr) pgdat_page_nr(NODE_DATA(nid),(pagenr))
#define node_start_pfn(nid) (NODE_DATA(nid)->node_start_pfn)
#define node_end_pfn(nid) pgdat_end_pfn(NODE_DATA(nid))
@@ -1112,7 +1106,6 @@ static inline struct pglist_data *NODE_D
{
return &contig_page_data;
}
-#define NODE_MEM_MAP(nid) mem_map
#else /* CONFIG_NUMA */
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 093/227] mm/page_alloc: don't pass pfn to free_unref_page_commit()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: willy, vbabka, nsaenzju, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Subject: mm/page_alloc: don't pass pfn to free_unref_page_commit()
free_unref_page_commit() doesn't make use of its pfn argument, so get
rid of it.
Link: https://lkml.kernel.org/r/20220202140451.415928-1-nsaenzju@redhat.com
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-dont-pass-pfn-to-free_unref_page_commit
+++ a/mm/page_alloc.c
@@ -3366,8 +3366,8 @@ static int nr_pcp_high(struct per_cpu_pa
return min(READ_ONCE(pcp->batch) << 2, high);
}
-static void free_unref_page_commit(struct page *page, unsigned long pfn,
- int migratetype, unsigned int order)
+static void free_unref_page_commit(struct page *page, int migratetype,
+ unsigned int order)
{
struct zone *zone = page_zone(page);
struct per_cpu_pages *pcp;
@@ -3416,7 +3416,7 @@ void free_unref_page(struct page *page,
}
local_lock_irqsave(&pagesets.lock, flags);
- free_unref_page_commit(page, pfn, migratetype, order);
+ free_unref_page_commit(page, migratetype, order);
local_unlock_irqrestore(&pagesets.lock, flags);
}
@@ -3426,13 +3426,13 @@ void free_unref_page(struct page *page,
void free_unref_page_list(struct list_head *list)
{
struct page *page, *next;
- unsigned long flags, pfn;
+ unsigned long flags;
int batch_count = 0;
int migratetype;
/* Prepare pages for freeing */
list_for_each_entry_safe(page, next, list, lru) {
- pfn = page_to_pfn(page);
+ unsigned long pfn = page_to_pfn(page);
if (!free_unref_page_prepare(page, pfn, 0)) {
list_del(&page->lru);
continue;
@@ -3448,15 +3448,10 @@ void free_unref_page_list(struct list_he
free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE);
continue;
}
-
- set_page_private(page, pfn);
}
local_lock_irqsave(&pagesets.lock, flags);
list_for_each_entry_safe(page, next, list, lru) {
- pfn = page_private(page);
- set_page_private(page, 0);
-
/*
* Non-isolated types over MIGRATE_PCPTYPES get added
* to the MIGRATE_MOVABLE pcp list.
@@ -3466,7 +3461,7 @@ void free_unref_page_list(struct list_he
migratetype = MIGRATE_MOVABLE;
trace_mm_page_free_batched(page);
- free_unref_page_commit(page, pfn, migratetype, 0);
+ free_unref_page_commit(page, migratetype, 0);
/*
* Guard against excessive IRQ disabled times when we get
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 094/227] cma: factor out minimum alignment requirement
2022-03-22 21:38 incoming Andrew Morton
2022-03-22 21:38 ` Andrew Morton
@ 2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (224 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: ziy, vbabka, robin.murphy, robh, paulus, m.szyprowski, mst, mpe,
minchan, iommu, hch, frowand.list, benh, aneesh.kumar, david,
akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: David Hildenbrand <david@redhat.com>
Subject: cma: factor out minimum alignment requirement
Patch series "mm: enforce pageblock_order < MAX_ORDER".
Having pageblock_order >= MAX_ORDER seems to be able to happen in corner
cases and some parts of the kernel are not prepared for it.
For example, Aneesh has shown [1] that such kernels can be compiled on
ppc64 with 64k base pages by setting FORCE_MAX_ZONEORDER=8, which will run
into a WARN_ON_ONCE(order >= MAX_ORDER) in compaction code right during
boot.
We can get pageblock_order >= MAX_ORDER when the default hugetlb size is
bigger than the maximum allocation granularity of the buddy, in which case
we are no longer talking about huge pages but instead gigantic pages.
Having pageblock_order >= MAX_ORDER can only make alloc_contig_range() of
such gigantic pages more likely to succeed.
Reliable use of gigantic pages either requires boot time allocation or
CMA, no need to overcomplicate some places in the kernel to optimize for
corner cases that are broken in other areas of the kernel.
This patch (of 2):
Let's enforce pageblock_order < MAX_ORDER and simplify.
Especially patch #1 can be regarded as a cleanup before:
[PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range
alignment. [2]
[1] https://lkml.kernel.org/r/87r189a2ks.fsf@linux.ibm.com
[2] https://lkml.kernel.org/r/20220211164135.1803616-1-zi.yan@sent.com
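As a rough illustration of the factored-out alignment and of the combined
base/size check that this patch switches to, here is a userspace sketch.
The DEMO_* constants are assumptions chosen for the example (4 KiB pages,
MAX_ORDER of 11, pageblock_order of 9) and do not describe any particular
configuration; cma_region_is_aligned() and cma_alignment_demo() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; real kernels derive these from the config. */
#define DEMO_PAGE_SIZE		4096ULL
#define DEMO_MAX_ORDER		11
#define DEMO_PAGEBLOCK_ORDER	9

#define DEMO_MAX_ORDER_NR_PAGES	(1ULL << (DEMO_MAX_ORDER - 1))
#define DEMO_PAGEBLOCK_NR_PAGES	(1ULL << DEMO_PAGEBLOCK_ORDER)
#define DEMO_MAX(a, b)		((a) > (b) ? (a) : (b))

/* Same shape as the definitions added to include/linux/cma.h. */
#define DEMO_CMA_MIN_ALIGNMENT_PAGES \
	DEMO_MAX(DEMO_MAX_ORDER_NR_PAGES, DEMO_PAGEBLOCK_NR_PAGES)
#define DEMO_CMA_MIN_ALIGNMENT_BYTES \
	(DEMO_PAGE_SIZE * DEMO_CMA_MIN_ALIGNMENT_PAGES)

/* Valid for power-of-two alignments only. */
#define DEMO_IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

/*
 * OR-ing base and size keeps every low bit set in either value, so one
 * test covers both: the result is aligned only if both operands are.
 */
bool cma_region_is_aligned(uint64_t base, uint64_t size)
{
	return DEMO_IS_ALIGNED(base | size, DEMO_CMA_MIN_ALIGNMENT_BYTES);
}

/* Small usage example; every assertion is expected to hold. */
void cma_alignment_demo(void)
{
	/* 1024 pages of 4 KiB each: 4 MiB. */
	assert(DEMO_CMA_MIN_ALIGNMENT_BYTES == 0x400000ULL);
	assert(cma_region_is_aligned(0x10000000ULL, 0x800000ULL));
	assert(!cma_region_is_aligned(0x10000000ULL, 0x200000ULL));
}
```

With these example values the minimum alignment works out to 4 MiB, and a
region is rejected as soon as either its base or its size has any of the
low 22 bits set.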
Link: https://lkml.kernel.org/r/20220214174132.219303-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Rob Herring <robh@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: John Garry via iommu <iommu@lists.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/powerpc/include/asm/fadump-internal.h | 5 ----
arch/powerpc/kernel/fadump.c | 2 -
drivers/of/of_reserved_mem.c | 9 ++------
include/linux/cma.h | 9 ++++++++
kernel/dma/contiguous.c | 4 ---
mm/cma.c | 20 ++++---------------
6 files changed, 19 insertions(+), 30 deletions(-)
--- a/arch/powerpc/include/asm/fadump-internal.h~cma-factor-out-minimum-alignment-requirement
+++ a/arch/powerpc/include/asm/fadump-internal.h
@@ -19,11 +19,6 @@
#define memblock_num_regions(memblock_type) (memblock.memblock_type.cnt)
-/* Alignment per CMA requirement. */
-#define FADUMP_CMA_ALIGNMENT (PAGE_SIZE << \
- max_t(unsigned long, MAX_ORDER - 1, \
- pageblock_order))
-
/* FAD commands */
#define FADUMP_REGISTER 1
#define FADUMP_UNREGISTER 2
--- a/arch/powerpc/kernel/fadump.c~cma-factor-out-minimum-alignment-requirement
+++ a/arch/powerpc/kernel/fadump.c
@@ -544,7 +544,7 @@ int __init fadump_reserve_mem(void)
if (!fw_dump.nocma) {
fw_dump.boot_memory_size =
ALIGN(fw_dump.boot_memory_size,
- FADUMP_CMA_ALIGNMENT);
+ CMA_MIN_ALIGNMENT_BYTES);
}
#endif
--- a/drivers/of/of_reserved_mem.c~cma-factor-out-minimum-alignment-requirement
+++ a/drivers/of/of_reserved_mem.c
@@ -22,6 +22,7 @@
#include <linux/slab.h>
#include <linux/memblock.h>
#include <linux/kmemleak.h>
+#include <linux/cma.h>
#include "of_private.h"
@@ -116,12 +117,8 @@ static int __init __reserved_mem_alloc_s
if (IS_ENABLED(CONFIG_CMA)
&& of_flat_dt_is_compatible(node, "shared-dma-pool")
&& of_get_flat_dt_prop(node, "reusable", NULL)
- && !nomap) {
- unsigned long order =
- max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
-
- align = max(align, (phys_addr_t)PAGE_SIZE << order);
- }
+ && !nomap)
+ align = max_t(phys_addr_t, align, CMA_MIN_ALIGNMENT_BYTES);
prop = of_get_flat_dt_prop(node, "alloc-ranges", &len);
if (prop) {
--- a/include/linux/cma.h~cma-factor-out-minimum-alignment-requirement
+++ a/include/linux/cma.h
@@ -20,6 +20,15 @@
#define CMA_MAX_NAME 64
+/*
+ * TODO: once the buddy -- especially pageblock merging and alloc_contig_range()
+ * -- can deal with only some pageblocks of a higher-order page being
+ * MIGRATE_CMA, we can use pageblock_nr_pages.
+ */
+#define CMA_MIN_ALIGNMENT_PAGES max_t(phys_addr_t, MAX_ORDER_NR_PAGES, \
+ pageblock_nr_pages)
+#define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)
+
struct cma;
extern unsigned long totalcma_pages;
--- a/kernel/dma/contiguous.c~cma-factor-out-minimum-alignment-requirement
+++ a/kernel/dma/contiguous.c
@@ -399,8 +399,6 @@ static const struct reserved_mem_ops rme
static int __init rmem_cma_setup(struct reserved_mem *rmem)
{
- phys_addr_t align = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
- phys_addr_t mask = align - 1;
unsigned long node = rmem->fdt_node;
bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
struct cma *cma;
@@ -416,7 +414,7 @@ static int __init rmem_cma_setup(struct
of_get_flat_dt_prop(node, "no-map", NULL))
return -EINVAL;
- if ((rmem->base & mask) || (rmem->size & mask)) {
+ if (!IS_ALIGNED(rmem->base | rmem->size, CMA_MIN_ALIGNMENT_BYTES)) {
pr_err("Reserved memory: incorrect alignment of CMA region\n");
return -EINVAL;
}
--- a/mm/cma.c~cma-factor-out-minimum-alignment-requirement
+++ a/mm/cma.c
@@ -168,7 +168,6 @@ int __init cma_init_reserved_mem(phys_ad
struct cma **res_cma)
{
struct cma *cma;
- phys_addr_t alignment;
/* Sanity checks */
if (cma_area_count == ARRAY_SIZE(cma_areas)) {
@@ -179,15 +178,12 @@ int __init cma_init_reserved_mem(phys_ad
if (!size || !memblock_is_region_reserved(base, size))
return -EINVAL;
- /* ensure minimal alignment required by mm core */
- alignment = PAGE_SIZE <<
- max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
-
/* alignment should be aligned with order_per_bit */
- if (!IS_ALIGNED(alignment >> PAGE_SHIFT, 1 << order_per_bit))
+ if (!IS_ALIGNED(CMA_MIN_ALIGNMENT_PAGES, 1 << order_per_bit))
return -EINVAL;
- if (ALIGN(base, alignment) != base || ALIGN(size, alignment) != size)
+ /* ensure minimal alignment required by mm core */
+ if (!IS_ALIGNED(base | size, CMA_MIN_ALIGNMENT_BYTES))
return -EINVAL;
/*
@@ -262,14 +258,8 @@ int __init cma_declare_contiguous_nid(ph
if (alignment && !is_power_of_2(alignment))
return -EINVAL;
- /*
- * Sanitise input arguments.
- * Pages both ends in CMA area could be merged into adjacent unmovable
- * migratetype page by page allocator's buddy algorithm. In the case,
- * you couldn't get a contiguous memory, which is not what we want.
- */
- alignment = max(alignment, (phys_addr_t)PAGE_SIZE <<
- max_t(unsigned long, MAX_ORDER - 1, pageblock_order));
+ /* Sanitise input arguments. */
+ alignment = max_t(phys_addr_t, alignment, CMA_MIN_ALIGNMENT_BYTES);
if (fixed && base & (alignment - 1)) {
ret = -EINVAL;
pr_err("Region at %pa must be aligned to %pa bytes\n",
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 094/227] cma: factor out minimum alignment requirement
@ 2022-03-22 21:43 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: ziy, vbabka, robin.murphy, robh, paulus, m.szyprowski, mst, mpe,
minchan, iommu, hch, frowand.list, benh, aneesh.kumar, david,
akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: David Hildenbrand <david@redhat.com>
Subject: cma: factor out minimum alignment requirement
Patch series "mm: enforce pageblock_order < MAX_ORDER".
Having pageblock_order >= MAX_ORDER seems to be able to happen in corner
cases and some parts of the kernel are not prepared for it.
For example, Aneesh has shown [1] that such kernels can be compiled on
ppc64 with 64k base pages by setting FORCE_MAX_ZONEORDER=8, which will run
into a WARN_ON_ONCE(order >= MAX_ORDER) in compaction code right during
boot.
We can get pageblock_order >= MAX_ORDER when the default hugetlb size is
bigger than the maximum allocation granularity of the buddy, in which case
we are no longer talking about huge pages but instead gigantic pages.
Having pageblock_order >= MAX_ORDER can only make alloc_contig_range() of
such gigantic pages more likely to succeed.
Reliable use of gigantic pages either requires boot time allocation or
CMA, no need to overcomplicate some places in the kernel to optimize for
corner cases that are broken in other areas of the kernel.
This patch (of 2):
Let's enforce pageblock_order < MAX_ORDER and simplify.
Especially patch #1 can be regarded as a cleanup before:
[PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range
alignment. [2]
[1] https://lkml.kernel.org/r/87r189a2ks.fsf@linux.ibm.com
[2] https://lkml.kernel.org/r/20220211164135.1803616-1-zi.yan@sent.com
Link: https://lkml.kernel.org/r/20220214174132.219303-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Rob Herring <robh@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: John Garry via iommu <iommu@lists.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/powerpc/include/asm/fadump-internal.h | 5 ----
arch/powerpc/kernel/fadump.c | 2 -
drivers/of/of_reserved_mem.c | 9 ++------
include/linux/cma.h | 9 ++++++++
kernel/dma/contiguous.c | 4 ---
mm/cma.c | 20 ++++---------------
6 files changed, 19 insertions(+), 30 deletions(-)
--- a/arch/powerpc/include/asm/fadump-internal.h~cma-factor-out-minimum-alignment-requirement
+++ a/arch/powerpc/include/asm/fadump-internal.h
@@ -19,11 +19,6 @@
#define memblock_num_regions(memblock_type) (memblock.memblock_type.cnt)
-/* Alignment per CMA requirement. */
-#define FADUMP_CMA_ALIGNMENT (PAGE_SIZE << \
- max_t(unsigned long, MAX_ORDER - 1, \
- pageblock_order))
-
/* FAD commands */
#define FADUMP_REGISTER 1
#define FADUMP_UNREGISTER 2
--- a/arch/powerpc/kernel/fadump.c~cma-factor-out-minimum-alignment-requirement
+++ a/arch/powerpc/kernel/fadump.c
@@ -544,7 +544,7 @@ int __init fadump_reserve_mem(void)
if (!fw_dump.nocma) {
fw_dump.boot_memory_size =
ALIGN(fw_dump.boot_memory_size,
- FADUMP_CMA_ALIGNMENT);
+ CMA_MIN_ALIGNMENT_BYTES);
}
#endif
--- a/drivers/of/of_reserved_mem.c~cma-factor-out-minimum-alignment-requirement
+++ a/drivers/of/of_reserved_mem.c
@@ -22,6 +22,7 @@
#include <linux/slab.h>
#include <linux/memblock.h>
#include <linux/kmemleak.h>
+#include <linux/cma.h>
#include "of_private.h"
@@ -116,12 +117,8 @@ static int __init __reserved_mem_alloc_s
if (IS_ENABLED(CONFIG_CMA)
&& of_flat_dt_is_compatible(node, "shared-dma-pool")
&& of_get_flat_dt_prop(node, "reusable", NULL)
- && !nomap) {
- unsigned long order =
- max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
-
- align = max(align, (phys_addr_t)PAGE_SIZE << order);
- }
+ && !nomap)
+ align = max_t(phys_addr_t, align, CMA_MIN_ALIGNMENT_BYTES);
prop = of_get_flat_dt_prop(node, "alloc-ranges", &len);
if (prop) {
--- a/include/linux/cma.h~cma-factor-out-minimum-alignment-requirement
+++ a/include/linux/cma.h
@@ -20,6 +20,15 @@
#define CMA_MAX_NAME 64
+/*
+ * TODO: once the buddy -- especially pageblock merging and alloc_contig_range()
+ * -- can deal with only some pageblocks of a higher-order page being
+ * MIGRATE_CMA, we can use pageblock_nr_pages.
+ */
+#define CMA_MIN_ALIGNMENT_PAGES max_t(phys_addr_t, MAX_ORDER_NR_PAGES, \
+ pageblock_nr_pages)
+#define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)
+
struct cma;
extern unsigned long totalcma_pages;
--- a/kernel/dma/contiguous.c~cma-factor-out-minimum-alignment-requirement
+++ a/kernel/dma/contiguous.c
@@ -399,8 +399,6 @@ static const struct reserved_mem_ops rme
static int __init rmem_cma_setup(struct reserved_mem *rmem)
{
- phys_addr_t align = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
- phys_addr_t mask = align - 1;
unsigned long node = rmem->fdt_node;
bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
struct cma *cma;
@@ -416,7 +414,7 @@ static int __init rmem_cma_setup(struct
of_get_flat_dt_prop(node, "no-map", NULL))
return -EINVAL;
- if ((rmem->base & mask) || (rmem->size & mask)) {
+ if (!IS_ALIGNED(rmem->base | rmem->size, CMA_MIN_ALIGNMENT_BYTES)) {
pr_err("Reserved memory: incorrect alignment of CMA region\n");
return -EINVAL;
}
--- a/mm/cma.c~cma-factor-out-minimum-alignment-requirement
+++ a/mm/cma.c
@@ -168,7 +168,6 @@ int __init cma_init_reserved_mem(phys_ad
struct cma **res_cma)
{
struct cma *cma;
- phys_addr_t alignment;
/* Sanity checks */
if (cma_area_count == ARRAY_SIZE(cma_areas)) {
@@ -179,15 +178,12 @@ int __init cma_init_reserved_mem(phys_ad
if (!size || !memblock_is_region_reserved(base, size))
return -EINVAL;
- /* ensure minimal alignment required by mm core */
- alignment = PAGE_SIZE <<
- max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
-
/* alignment should be aligned with order_per_bit */
- if (!IS_ALIGNED(alignment >> PAGE_SHIFT, 1 << order_per_bit))
+ if (!IS_ALIGNED(CMA_MIN_ALIGNMENT_PAGES, 1 << order_per_bit))
return -EINVAL;
- if (ALIGN(base, alignment) != base || ALIGN(size, alignment) != size)
+ /* ensure minimal alignment required by mm core */
+ if (!IS_ALIGNED(base | size, CMA_MIN_ALIGNMENT_BYTES))
return -EINVAL;
/*
@@ -262,14 +258,8 @@ int __init cma_declare_contiguous_nid(ph
if (alignment && !is_power_of_2(alignment))
return -EINVAL;
- /*
- * Sanitise input arguments.
- * Pages both ends in CMA area could be merged into adjacent unmovable
- * migratetype page by page allocator's buddy algorithm. In the case,
- * you couldn't get a contiguous memory, which is not what we want.
- */
- alignment = max(alignment, (phys_addr_t)PAGE_SIZE <<
- max_t(unsigned long, MAX_ORDER - 1, pageblock_order));
+ /* Sanitise input arguments. */
+ alignment = max_t(phys_addr_t, alignment, CMA_MIN_ALIGNMENT_BYTES);
if (fixed && base & (alignment - 1)) {
ret = -EINVAL;
pr_err("Region at %pa must be aligned to %pa bytes\n",
_
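A note on the IS_ALIGNED(base | size, ...) form used in the hunks above: for a power-of-two alignment, OR-ing base and size and testing once gives the same answer as testing each value against the mask separately, because any low bit set in either value is also set in the OR. The following is a minimal userspace sketch of that equivalence, not kernel code. IS_ALIGNED_POW2() is a local stand-in for the kernel macro, and the helper names and the 4 MiB alignment in the usage note are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for IS_ALIGNED(); only valid when a is a power of two. */
#define IS_ALIGNED_POW2(x, a) ((((uint64_t)(x)) & ((uint64_t)(a) - 1)) == 0)

/* Old style: check base and size separately against the alignment mask. */
static bool aligned_separately(uint64_t base, uint64_t size, uint64_t align)
{
	uint64_t mask = align - 1;

	assert(align && (align & mask) == 0);	/* power of two only */
	return !((base & mask) || (size & mask));
}

/* New style: OR both values first, then do a single alignment test. */
static bool aligned_combined(uint64_t base, uint64_t size, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);	/* power of two only */
	return IS_ALIGNED_POW2(base | size, align);
}
```

With an assumed alignment of 0x400000 (4 MiB), both helpers accept base 0x40000000 with size 0x800000, and both reject the same base with size 0x801000.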
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 095/227] mm: enforce pageblock_order < MAX_ORDER
2022-03-22 21:38 incoming Andrew Morton
2022-03-22 21:38 ` Andrew Morton
@ 2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (224 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: ziy, vbabka, robin.murphy, robh+dt, paulus, m.szyprowski, mst,
mpe, minchan, iommu, hch, frowand.list, benh, aneesh.kumar,
david, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: David Hildenbrand <david@redhat.com>
Subject: mm: enforce pageblock_order < MAX_ORDER
Some places in the kernel don't really expect pageblock_order >=
MAX_ORDER, and it looks like this is only possible in corner cases:
1) With CONFIG_DEFERRED_STRUCT_PAGE_INIT, we'll end up freeing
pageblock_order pages via __free_pages_core(), which cannot possibly work.
2) find_zone_movable_pfns_for_nodes() will round up the ZONE_MOVABLE
start PFN to MAX_ORDER_NR_PAGES. Consequently, with a bigger
pageblock_order, we could have a single pageblock partially managed by
two zones.
3) compaction code runs into __fragmentation_index() with order
>= MAX_ORDER, triggering the WARN_ON_ONCE(order >= MAX_ORDER) check. [1]
4) mm/page_reporting.c won't be reporting any pages with default
page_reporting_order == pageblock_order, as we'll be skipping the
reporting loop inside page_reporting_process_zone().
5) __rmqueue_fallback() will never be able to steal with
ALLOC_NOFRAGMENT.
pageblock_order >= MAX_ORDER is weird either way: it's a pure optimization
for making alloc_contig_range(), as used for allocation of gigantic pages,
a little more likely to succeed. However, if there is demand for
somewhat reliable allocation of gigantic pages, affected setups should be
using CMA or boot-time allocations instead.
So let's make sure that pageblock_order < MAX_ORDER and simplify.
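To illustrate the clamping this patch enforces, here is a minimal userspace sketch (not kernel code). It mirrors the min_t(unsigned int, HUGETLB_PAGE_ORDER, MAX_ORDER - 1) expression introduced in pageblock-flags.h. The helper name clamped_pageblock_order() is made up for this note, and the order values in the usage remark are assumed examples.

```c
#include <assert.h>

/*
 * Hypothetical helper: return the pageblock order that results from
 * clamping the huge page order to the largest order the buddy
 * allocator can hand out, i.e. MAX_ORDER - 1.
 */
static unsigned int clamped_pageblock_order(unsigned int hugetlb_page_order,
					    unsigned int max_order)
{
	unsigned int limit;

	assert(max_order > 0);	/* MAX_ORDER - 1 must not underflow */
	limit = max_order - 1;

	return hugetlb_page_order < limit ? hugetlb_page_order : limit;
}
```

Assuming a huge page order of 8 together with MAX_ORDER = 8 (the FORCE_MAX_ZONEORDER=8 case from [1]), clamped_pageblock_order(8, 8) yields 7, so pageblock_order stays below MAX_ORDER. A huge page order that is already smaller than MAX_ORDER - 1 passes through unchanged.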
[1] https://lkml.kernel.org/r/87r189a2ks.fsf@linux.ibm.com
Link: https://lkml.kernel.org/r/20220214174132.219303-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: John Garry via iommu <iommu@lists.linux-foundation.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
drivers/virtio/virtio_mem.c | 9 ++------
include/linux/cma.h | 3 --
include/linux/pageblock-flags.h | 7 ++++--
mm/Kconfig | 3 ++
mm/page_alloc.c | 32 +++++++-----------------------
5 files changed, 20 insertions(+), 34 deletions(-)
--- a/drivers/virtio/virtio_mem.c~mm-enforce-pageblock_order-max_order
+++ a/drivers/virtio/virtio_mem.c
@@ -2476,13 +2476,10 @@ static int virtio_mem_init_hotplug(struc
VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
/*
- * We want subblocks to span at least MAX_ORDER_NR_PAGES and
- * pageblock_nr_pages pages. This:
- * - Is required for now for alloc_contig_range() to work reliably -
- * it doesn't properly handle smaller granularity on ZONE_NORMAL.
+ * TODO: once alloc_contig_range() works reliably with pageblock
+ * granularity on ZONE_NORMAL, use pageblock_nr_pages instead.
*/
- sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
- pageblock_nr_pages) * PAGE_SIZE;
+ sb_size = PAGE_SIZE * MAX_ORDER_NR_PAGES;
sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
if (sb_size < memory_block_size_bytes() && !force_bbm) {
--- a/include/linux/cma.h~mm-enforce-pageblock_order-max_order
+++ a/include/linux/cma.h
@@ -25,8 +25,7 @@
* -- can deal with only some pageblocks of a higher-order page being
* MIGRATE_CMA, we can use pageblock_nr_pages.
*/
-#define CMA_MIN_ALIGNMENT_PAGES max_t(phys_addr_t, MAX_ORDER_NR_PAGES, \
- pageblock_nr_pages)
+#define CMA_MIN_ALIGNMENT_PAGES MAX_ORDER_NR_PAGES
#define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)
struct cma;
--- a/include/linux/pageblock-flags.h~mm-enforce-pageblock_order-max_order
+++ a/include/linux/pageblock-flags.h
@@ -37,8 +37,11 @@ extern unsigned int pageblock_order;
#else /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
-/* Huge pages are a constant size */
-#define pageblock_order HUGETLB_PAGE_ORDER
+/*
+ * Huge pages are a constant size, but don't exceed the maximum allocation
+ * granularity.
+ */
+#define pageblock_order min_t(unsigned int, HUGETLB_PAGE_ORDER, MAX_ORDER - 1)
#endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
--- a/mm/Kconfig~mm-enforce-pageblock_order-max_order
+++ a/mm/Kconfig
@@ -262,6 +262,9 @@ config HUGETLB_PAGE_SIZE_VARIABLE
HUGETLB_PAGE_ORDER when there are multiple HugeTLB page sizes available
on a platform.
+ Note that the pageblock_order cannot exceed MAX_ORDER - 1 and will be
+ clamped down to MAX_ORDER - 1.
+
config CONTIG_ALLOC
def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
--- a/mm/page_alloc.c~mm-enforce-pageblock_order-max_order
+++ a/mm/page_alloc.c
@@ -1072,14 +1072,12 @@ static inline void __free_one_page(struc
int migratetype, fpi_t fpi_flags)
{
struct capture_control *capc = task_capc(zone);
+ unsigned int max_order = pageblock_order;
unsigned long buddy_pfn;
unsigned long combined_pfn;
- unsigned int max_order;
struct page *buddy;
bool to_tail;
- max_order = min_t(unsigned int, MAX_ORDER - 1, pageblock_order);
-
VM_BUG_ON(!zone_is_initialized(zone));
VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
@@ -2259,19 +2257,8 @@ void __init init_cma_reserved_pageblock(
} while (++p, --i);
set_pageblock_migratetype(page, MIGRATE_CMA);
-
- if (pageblock_order >= MAX_ORDER) {
- i = pageblock_nr_pages;
- p = page;
- do {
- set_page_refcounted(p);
- __free_pages(p, MAX_ORDER - 1);
- p += MAX_ORDER_NR_PAGES;
- } while (i -= MAX_ORDER_NR_PAGES);
- } else {
- set_page_refcounted(page);
- __free_pages(page, pageblock_order);
- }
+ set_page_refcounted(page);
+ __free_pages(page, pageblock_order);
adjust_managed_page_count(page, pageblock_nr_pages);
page_zone(page)->cma_pages += pageblock_nr_pages;
@@ -7382,16 +7369,15 @@ static inline void setup_usemap(struct z
/* Initialise the number of pages represented by NR_PAGEBLOCK_BITS */
void __init set_pageblock_order(void)
{
- unsigned int order;
+ unsigned int order = MAX_ORDER - 1;
/* Check that pageblock_nr_pages has not already been setup */
if (pageblock_order)
return;
- if (HPAGE_SHIFT > PAGE_SHIFT)
+ /* Don't let pageblocks exceed the maximum allocation granularity. */
+ if (HPAGE_SHIFT > PAGE_SHIFT && HUGETLB_PAGE_ORDER < order)
order = HUGETLB_PAGE_ORDER;
- else
- order = MAX_ORDER - 1;
/*
* Assume the largest contiguous order of interest is a huge page.
@@ -8979,14 +8965,12 @@ struct page *has_unmovable_pages(struct
#ifdef CONFIG_CONTIG_ALLOC
static unsigned long pfn_max_align_down(unsigned long pfn)
{
- return pfn & ~(max_t(unsigned long, MAX_ORDER_NR_PAGES,
- pageblock_nr_pages) - 1);
+ return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
}
static unsigned long pfn_max_align_up(unsigned long pfn)
{
- return ALIGN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
- pageblock_nr_pages));
+ return ALIGN(pfn, MAX_ORDER_NR_PAGES);
}
#if defined(CONFIG_DYNAMIC_DEBUG) || \
_
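As a worked example of the simplified pfn_max_align_down() and pfn_max_align_up() above: once pageblock_nr_pages can no longer exceed MAX_ORDER_NR_PAGES, rounding to MAX_ORDER_NR_PAGES alone is sufficient. The sketch below is userspace C, not kernel code. The EXAMPLE_* constants and the helper names are assumptions chosen for illustration, with MAX_ORDER_NR_PAGES taken as 1 << (MAX_ORDER - 1).

```c
#include <assert.h>

/* Assumed example configuration: MAX_ORDER = 11, so 1024 pages per block. */
#define EXAMPLE_MAX_ORDER		11UL
#define EXAMPLE_MAX_ORDER_NR_PAGES	(1UL << (EXAMPLE_MAX_ORDER - 1))

/* Round a PFN down to a multiple of nr_pages (power of two). */
static unsigned long pfn_align_down(unsigned long pfn, unsigned long nr_pages)
{
	assert(nr_pages && (nr_pages & (nr_pages - 1)) == 0);
	return pfn & ~(nr_pages - 1);
}

/* Round a PFN up to a multiple of nr_pages (power of two). */
static unsigned long pfn_align_up(unsigned long pfn, unsigned long nr_pages)
{
	assert(nr_pages && (nr_pages & (nr_pages - 1)) == 0);
	return (pfn + nr_pages - 1) & ~(nr_pages - 1);
}
```

With these assumed values, PFN 1500 rounds down to 1024 and up to 2048, while a PFN that is already a multiple of 1024 is returned unchanged by both helpers.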
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 095/227] mm: enforce pageblock_order < MAX_ORDER
@ 2022-03-22 21:43 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: ziy, vbabka, robin.murphy, robh+dt, paulus, m.szyprowski, mst,
mpe, minchan, iommu, hch, frowand.list, benh, aneesh.kumar,
david, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: David Hildenbrand <david@redhat.com>
Subject: mm: enforce pageblock_order < MAX_ORDER
Some places in the kernel don't really expect pageblock_order >=
MAX_ORDER, and it looks like this is only possible in corner cases:
1) CONFIG_DEFERRED_STRUCT_PAGE_INIT we'll end up freeing pageblock_order
pages via __free_pages_core(), which cannot possibly work.
2) find_zone_movable_pfns_for_nodes() will roundup the ZONE_MOVABLE
start PFN to MAX_ORDER_NR_PAGES. Consequently with a bigger
pageblock_order, we could have a single pageblock partially managed by
two zones.
3) compaction code runs into __fragmentation_index() with order
>= MAX_ORDER, when checking WARN_ON_ONCE(order >= MAX_ORDER). [1]
4) mm/page_reporting.c won't be reporting any pages with default
page_reporting_order == pageblock_order, as we'll be skipping the
reporting loop inside page_reporting_process_zone().
5) __rmqueue_fallback() will never be able to steal with
ALLOC_NOFRAGMENT.
pageblock_order >= MAX_ORDER is weird either way: it's a pure optimization
for making alloc_contig_range(), as used for allcoation of gigantic pages,
a little more reliable to succeed. However, if there is demand for
somewhat reliable allocation of gigantic pages, affected setups should be
using CMA or boottime allocations instead.
So let's make sure that pageblock_order < MAX_ORDER and simplify.
[1] https://lkml.kernel.org/r/87r189a2ks.fsf@linux.ibm.com
Link: https://lkml.kernel.org/r/20220214174132.219303-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: John Garry via iommu <iommu@lists.linux-foundation.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
drivers/virtio/virtio_mem.c | 9 ++------
include/linux/cma.h | 3 --
include/linux/pageblock-flags.h | 7 ++++--
mm/Kconfig | 3 ++
mm/page_alloc.c | 32 +++++++-----------------------
5 files changed, 20 insertions(+), 34 deletions(-)
--- a/drivers/virtio/virtio_mem.c~mm-enforce-pageblock_order-max_order
+++ a/drivers/virtio/virtio_mem.c
@@ -2476,13 +2476,10 @@ static int virtio_mem_init_hotplug(struc
VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
/*
- * We want subblocks to span at least MAX_ORDER_NR_PAGES and
- * pageblock_nr_pages pages. This:
- * - Is required for now for alloc_contig_range() to work reliably -
- * it doesn't properly handle smaller granularity on ZONE_NORMAL.
+ * TODO: once alloc_contig_range() works reliably with pageblock
+ * granularity on ZONE_NORMAL, use pageblock_nr_pages instead.
*/
- sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
- pageblock_nr_pages) * PAGE_SIZE;
+ sb_size = PAGE_SIZE * MAX_ORDER_NR_PAGES;
sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
if (sb_size < memory_block_size_bytes() && !force_bbm) {
--- a/include/linux/cma.h~mm-enforce-pageblock_order-max_order
+++ a/include/linux/cma.h
@@ -25,8 +25,7 @@
* -- can deal with only some pageblocks of a higher-order page being
* MIGRATE_CMA, we can use pageblock_nr_pages.
*/
-#define CMA_MIN_ALIGNMENT_PAGES max_t(phys_addr_t, MAX_ORDER_NR_PAGES, \
- pageblock_nr_pages)
+#define CMA_MIN_ALIGNMENT_PAGES MAX_ORDER_NR_PAGES
#define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)
struct cma;
--- a/include/linux/pageblock-flags.h~mm-enforce-pageblock_order-max_order
+++ a/include/linux/pageblock-flags.h
@@ -37,8 +37,11 @@ extern unsigned int pageblock_order;
#else /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
-/* Huge pages are a constant size */
-#define pageblock_order HUGETLB_PAGE_ORDER
+/*
+ * Huge pages are a constant size, but don't exceed the maximum allocation
+ * granularity.
+ */
+#define pageblock_order min_t(unsigned int, HUGETLB_PAGE_ORDER, MAX_ORDER - 1)
#endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
--- a/mm/Kconfig~mm-enforce-pageblock_order-max_order
+++ a/mm/Kconfig
@@ -262,6 +262,9 @@ config HUGETLB_PAGE_SIZE_VARIABLE
HUGETLB_PAGE_ORDER when there are multiple HugeTLB page sizes available
on a platform.
+ Note that the pageblock_order cannot exceed MAX_ORDER - 1 and will be
+ clamped down to MAX_ORDER - 1.
+
config CONTIG_ALLOC
def_bool (MEMORY_ISOLATION && COMPACTION) || CMA
--- a/mm/page_alloc.c~mm-enforce-pageblock_order-max_order
+++ a/mm/page_alloc.c
@@ -1072,14 +1072,12 @@ static inline void __free_one_page(struc
int migratetype, fpi_t fpi_flags)
{
struct capture_control *capc = task_capc(zone);
+ unsigned int max_order = pageblock_order;
unsigned long buddy_pfn;
unsigned long combined_pfn;
- unsigned int max_order;
struct page *buddy;
bool to_tail;
- max_order = min_t(unsigned int, MAX_ORDER - 1, pageblock_order);
-
VM_BUG_ON(!zone_is_initialized(zone));
VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
@@ -2259,19 +2257,8 @@ void __init init_cma_reserved_pageblock(
} while (++p, --i);
set_pageblock_migratetype(page, MIGRATE_CMA);
-
- if (pageblock_order >= MAX_ORDER) {
- i = pageblock_nr_pages;
- p = page;
- do {
- set_page_refcounted(p);
- __free_pages(p, MAX_ORDER - 1);
- p += MAX_ORDER_NR_PAGES;
- } while (i -= MAX_ORDER_NR_PAGES);
- } else {
- set_page_refcounted(page);
- __free_pages(page, pageblock_order);
- }
+ set_page_refcounted(page);
+ __free_pages(page, pageblock_order);
adjust_managed_page_count(page, pageblock_nr_pages);
page_zone(page)->cma_pages += pageblock_nr_pages;
@@ -7382,16 +7369,15 @@ static inline void setup_usemap(struct z
/* Initialise the number of pages represented by NR_PAGEBLOCK_BITS */
void __init set_pageblock_order(void)
{
- unsigned int order;
+ unsigned int order = MAX_ORDER - 1;
/* Check that pageblock_nr_pages has not already been setup */
if (pageblock_order)
return;
- if (HPAGE_SHIFT > PAGE_SHIFT)
+ /* Don't let pageblocks exceed the maximum allocation granularity. */
+ if (HPAGE_SHIFT > PAGE_SHIFT && HUGETLB_PAGE_ORDER < order)
order = HUGETLB_PAGE_ORDER;
- else
- order = MAX_ORDER - 1;
/*
* Assume the largest contiguous order of interest is a huge page.
@@ -8979,14 +8965,12 @@ struct page *has_unmovable_pages(struct
#ifdef CONFIG_CONTIG_ALLOC
static unsigned long pfn_max_align_down(unsigned long pfn)
{
- return pfn & ~(max_t(unsigned long, MAX_ORDER_NR_PAGES,
- pageblock_nr_pages) - 1);
+ return ALIGN_DOWN(pfn, MAX_ORDER_NR_PAGES);
}
static unsigned long pfn_max_align_up(unsigned long pfn)
{
- return ALIGN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
- pageblock_nr_pages));
+ return ALIGN(pfn, MAX_ORDER_NR_PAGES);
}
#if defined(CONFIG_DYNAMIC_DEBUG) || \
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 096/227] mm/page_alloc: mark pagesets as __maybe_unused
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: peterz, ndesaulniers, bot, bigeasy, nathan, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: Nathan Chancellor <nathan@kernel.org>
Subject: mm/page_alloc: mark pagesets as __maybe_unused
Commit 9983a9d577db ("locking/local_lock: Make the empty local_lock_*()
function a macro.") in the -tip tree converted the local_lock_*()
functions into macros, which causes a warning with clang with
CONFIG_PREEMPT_RT=n + CONFIG_DEBUG_LOCK_ALLOC=n:
mm/page_alloc.c:131:40: error: variable 'pagesets' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
static DEFINE_PER_CPU(struct pagesets, pagesets) = {
^
1 error generated.
Prior to that change, clang was not able to tell that pagesets was unused
in this configuration because it does not perform cross-function analysis
in the frontend. After that change, it sees that the macros just do a
typecheck on the lock member of pagesets, which is evaluated at compile
time (so the variable is technically "used"), meaning the variable is not
needed in the final assembly, as the warning states.
Mark the variable as __maybe_unused to make it clear to clang that this is
expected in this configuration so there is no more warning.
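
As a rough user-space illustration of the situation described above (not the kernel code itself), the sketch below defines a static variable that is only ever named inside an unevaluated sizeof(). All demo_* names are hypothetical, and demo_maybe_unused is assumed to expand to the GCC/clang "unused" attribute in the same spirit as the kernel's __maybe_unused.

```c
#include <assert.h>

/* Assumption: GCC/clang attribute syntax, standing in for __maybe_unused. */
#define demo_maybe_unused __attribute__((__unused__))

struct demo_lock { int dummy; };
struct demo_pagesets { struct demo_lock lock; };

/*
 * Only referenced inside sizeof() below, so no run-time code ever touches
 * it. A compiler may therefore decide not to emit it and warn about that;
 * the attribute documents that this is expected.
 */
static struct demo_pagesets demo_pagesets demo_maybe_unused = {
	.lock = { 0 },
};

/* Hypothetical no-op lock: type-checks its argument, evaluates nothing. */
#define demo_local_lock(l) ((void)sizeof(*(l)))

int demo_critical_section(void)
{
	/* Compile-time use only: sizeof does not evaluate its operand. */
	assert(sizeof(demo_pagesets.lock) == sizeof(struct demo_lock));
	demo_local_lock(&demo_pagesets.lock);
	return 0;
}
```

Whether a given compiler version actually warns without the attribute depends on its flags, so the sketch only shows the shape of the code, not a guaranteed reproducer.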
Link: https://github.com/ClangBuiltLinux/linux/issues/1593
Link: https://lkml.kernel.org/r/20220215184322.440969-1-nathan@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/page_alloc.c~mm-page_alloc-mark-pagesets-as-__maybe_unused
+++ a/mm/page_alloc.c
@@ -128,7 +128,7 @@ static DEFINE_MUTEX(pcp_batch_high_lock)
struct pagesets {
local_lock_t lock;
};
-static DEFINE_PER_CPU(struct pagesets, pagesets) = {
+static DEFINE_PER_CPU(struct pagesets, pagesets) __maybe_unused = {
.lock = INIT_LOCAL_LOCK(lock),
};
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 097/227] mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: ziy, stable, osalvador, mgorman, jhubbard, david,
anshuman.khandual, apopple, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Alistair Popple <apopple@nvidia.com>
Subject: mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node
ZONE_MOVABLE uses the remaining memory in each node. Its starting pfn is
also aligned to MAX_ORDER_NR_PAGES. It is possible for the remaining
memory in a node to be less than MAX_ORDER_NR_PAGES, meaning there is not
enough room for ZONE_MOVABLE on that node.
Unfortunately this condition is not checked for. This leads to
zone_movable_pfn[] getting set to a pfn greater than the last pfn in a
node.
calculate_node_totalpages() then sets zone->present_pages to be greater
than zone->spanned_pages which is invalid, as spanned_pages represents the
maximum number of pages in a zone assuming no holes.
Subsequently it is possible free_area_init_core() will observe a zone of
size zero with present pages. In this case it will skip setting up the
zone, including the initialisation of free_lists[].
However populated_zone() checks zone->present_pages to see if a zone has
memory available. This is used by iterators such as walk_zones_in_node().
pagetypeinfo_showfree() uses this to walk the free_list of each zone in
each node, which are assumed to be initialised due to the zone not being
empty. As free_area_init_core() never initialised the free_lists[] this
results in the following kernel crash when trying to read
/proc/pagetypeinfo:
[ 67.534914] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 67.535429] #PF: supervisor read access in kernel mode
[ 67.535789] #PF: error_code(0x0000) - not-present page
[ 67.536128] PGD 0 P4D 0
[ 67.536305] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
[ 67.536696] CPU: 0 PID: 456 Comm: cat Not tainted 5.16.0 #461
[ 67.537096] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[ 67.537638] RIP: 0010:pagetypeinfo_show+0x163/0x460
[ 67.537992] Code: 9e 82 e8 80 57 0e 00 49 8b 06 b9 01 00 00 00 4c 39 f0 75 16 e9 65 02 00 00 48 83 c1 01 48 81 f9 a0 86 01 00 0f 84 48 02 00 00 <48> 8b 00 4c 39 f0 75 e7 48 c7 c2 80 a2 e2 82 48 c7 c6 79 ef e3 82
[ 67.538259] RSP: 0018:ffffc90001c4bd10 EFLAGS: 00010003
[ 67.538259] RAX: 0000000000000000 RBX: ffff88801105f638 RCX: 0000000000000001
[ 67.538259] RDX: 0000000000000001 RSI: 000000000000068b RDI: ffff8880163dc68b
[ 67.538259] RBP: ffffc90001c4bd90 R08: 0000000000000001 R09: ffff8880163dc67e
[ 67.538259] R10: 656c6261766f6d6e R11: 6c6261766f6d6e55 R12: ffff88807ffb4a00
[ 67.538259] R13: ffff88807ffb49f8 R14: ffff88807ffb4580 R15: ffff88807ffb3000
[ 67.538259] FS: 00007f9c83eff5c0(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[ 67.538259] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 67.538259] CR2: 0000000000000000 CR3: 0000000013c8e000 CR4: 0000000000350ef0
[ 67.538259] Call Trace:
[ 67.538259] <TASK>
[ 67.538259] seq_read_iter+0x128/0x460
[ 67.538259] ? aa_file_perm+0x1af/0x5f0
[ 67.538259] proc_reg_read_iter+0x51/0x80
[ 67.538259] ? lock_is_held_type+0xea/0x140
[ 67.538259] new_sync_read+0x113/0x1a0
[ 67.538259] vfs_read+0x136/0x1d0
[ 67.538259] ksys_read+0x70/0xf0
[ 67.538259] __x64_sys_read+0x1a/0x20
[ 67.538259] do_syscall_64+0x3b/0xc0
[ 67.538259] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 67.538259] RIP: 0033:0x7f9c83e23cce
[ 67.538259] Code: c0 e9 b6 fe ff ff 50 48 8d 3d 6e 13 0a 00 e8 c9 e3 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
[ 67.538259] RSP: 002b:00007fff116e1a08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 67.538259] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f9c83e23cce
[ 67.538259] RDX: 0000000000020000 RSI: 00007f9c83a2c000 RDI: 0000000000000003
[ 67.538259] RBP: 00007f9c83a2c000 R08: 00007f9c83a2b010 R09: 0000000000000000
[ 67.538259] R10: 00007f9c83f2d7d0 R11: 0000000000000246 R12: 0000000000000000
[ 67.538259] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[ 67.538259] </TASK>
Fix this by checking that the aligned zone_movable_pfn[] does not exceed
the end of the node, and if it does skip creating a movable zone on this
node.
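
The check described above can be sketched in isolation as follows. This is a minimal stand-alone model, assuming an illustrative alignment of 1024 pages; demo_roundup() and demo_movable_start() are hypothetical helpers, not kernel functions, and a return value of 0 stands for "no movable zone on this node".

```c
#include <assert.h>

/* Illustrative value only; the real MAX_ORDER_NR_PAGES depends on config. */
#define DEMO_MAX_ORDER_NR_PAGES 1024UL

/* Round x up to the next multiple of step. */
static unsigned long demo_roundup(unsigned long x, unsigned long step)
{
	assert(step > 0);
	return ((x + step - 1) / step) * step;
}

/*
 * Align the proposed ZONE_MOVABLE start pfn. If alignment pushes it to or
 * past the end of the node, return 0 so that no movable zone is created.
 */
unsigned long demo_movable_start(unsigned long movable_pfn,
				 unsigned long node_end_pfn)
{
	unsigned long aligned = demo_roundup(movable_pfn,
					     DEMO_MAX_ORDER_NR_PAGES);

	if (aligned >= node_end_pfn)
		return 0;
	return aligned;
}
```

For example, a start pfn of 3900 on a node ending at pfn 4000 rounds up to 4096, which lies beyond the node, so the helper returns 0 instead of a start pfn outside the node.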
Link: https://lkml.kernel.org/r/20220215025831.2113067-1-apopple@nvidia.com
Fixes: 2a1e274acf0b ("Create the ZONE_MOVABLE zone")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/mm/page_alloc.c~mm-pages_allocc-dont-create-zone_movable-beyond-the-end-of-a-node
+++ a/mm/page_alloc.c
@@ -7951,10 +7951,17 @@ restart:
out2:
/* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
- for (nid = 0; nid < MAX_NUMNODES; nid++)
+ for (nid = 0; nid < MAX_NUMNODES; nid++) {
+ unsigned long start_pfn, end_pfn;
+
zone_movable_pfn[nid] =
roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
+ get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
+ if (zone_movable_pfn[nid] >= end_pfn)
+ zone_movable_pfn[nid] = 0;
+ }
+
out:
/* restore the node_state */
node_states[N_MEMORY] = saved_node_state;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 098/227] mm/page_alloc: fetch the correct pcp buddy during bulk free
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: vbabka, mhocko, dave.hansen, brouer, aaron.lu, mgorman, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm/page_alloc: fetch the correct pcp buddy during bulk free
Patch series "Follow-up on high-order PCP caching", v2.
Commit 44042b449872 ("mm/page_alloc: allow high-order pages to be stored
on the per-cpu lists") was primarily aimed at reducing the cost of SLUB
cache refills of high-order pages in two ways. Firstly, zone lock
acquisitions were reduced and secondly, there were fewer buddy list
modifications. This is a follow-up series fixing some issues that became
apparent after merging.
Patch 1 is a functional fix. The bug it fixes is harmless but inefficient.
Patches 2-5 reduce the overhead of bulk freeing of PCP pages. While the
overhead is small, it's cumulative and noticeable when truncating large
files. The changelog for patch 4 includes results of a microbench that
deletes large sparse files with data in page cache. Sparse files were
used to eliminate filesystem overhead.
Patch 6 addresses issues with high-order PCP pages being stored on PCP
lists for too long. Pages freed on a CPU may not be quickly
reused and in some cases this can increase cache miss rates. Details are
included in the changelog.
This patch (of 6):
free_pcppages_bulk() prefetches buddies about to be freed but the order
must also be passed in as PCP lists store multiple orders.
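
To see why the order matters, here is a small sketch of the buddy calculation, assuming the usual XOR form of the buddy pfn computation. demo_find_buddy_pfn() is an illustrative stand-in, not the kernel helper.

```c
#include <assert.h>
#include <limits.h>

/*
 * The buddy of a block of 2^order pages differs from it only in bit
 * 'order' of the pfn, so flipping that bit yields the buddy's pfn.
 */
unsigned long demo_find_buddy_pfn(unsigned long pfn, unsigned int order)
{
	assert(order < sizeof(unsigned long) * CHAR_BIT);
	return pfn ^ (1UL << order);
}
```

With an order-3 block at pfn 8, passing order 0 gives pfn 9, which is a page inside the block being freed, while passing order 3 gives pfn 0, the start of the real buddy block. Prefetching with a hardcoded order of 0 therefore touches the wrong struct page for any high-order entry on the list.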
Link: https://lkml.kernel.org/r/20220217002227.5739-1-mgorman@techsingularity.net
Link: https://lkml.kernel.org/r/20220217002227.5739-2-mgorman@techsingularity.net
Fixes: 44042b449872 ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
Tested-by: Aaron Lu <aaron.lu@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-fetch-the-correct-pcp-buddy-during-bulk-free
+++ a/mm/page_alloc.c
@@ -1429,10 +1429,10 @@ static bool bulkfree_pcp_prepare(struct
}
#endif /* CONFIG_DEBUG_VM */
-static inline void prefetch_buddy(struct page *page)
+static inline void prefetch_buddy(struct page *page, unsigned int order)
{
unsigned long pfn = page_to_pfn(page);
- unsigned long buddy_pfn = __find_buddy_pfn(pfn, 0);
+ unsigned long buddy_pfn = __find_buddy_pfn(pfn, order);
struct page *buddy = page + (buddy_pfn - pfn);
prefetch(buddy);
@@ -1509,7 +1509,7 @@ static void free_pcppages_bulk(struct zo
* prefetch buddy for the first pcp->batch nr of pages.
*/
if (prefetch_nr) {
- prefetch_buddy(page);
+ prefetch_buddy(page, order);
prefetch_nr--;
}
} while (count > 0 && --batch_free && !list_empty(list));
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 099/227] mm/page_alloc: track range of active PCP lists during bulk free
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: vbabka, mhocko, dave.hansen, brouer, aaron.lu, mgorman, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm/page_alloc: track range of active PCP lists during bulk free
free_pcppages_bulk() frees pages in a round-robin fashion. Originally,
this was dealing only with migratetypes but storing high-order pages means
that there can be many more empty lists that are uselessly checked. Track
the minimum and maximum active pindex to reduce the search space.
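
The idea of shrinking the search window can be modelled outside the kernel roughly as below. This is a sketch under simplifying assumptions: lists are represented by a plain array of page counts, DEMO_NR_LISTS and struct demo_scan are hypothetical, and the caller is assumed to guarantee that at least one list is non-empty.

```c
#include <assert.h>

#define DEMO_NR_LISTS 8

struct demo_scan {
	int pindex;	/* current position in the round-robin walk */
	int min_pindex;	/* lowest index that may still hold pages */
	int max_pindex;	/* highest index that may still hold pages */
};

/*
 * Advance to the next non-empty list. Whenever an empty list is found at
 * either edge of the window, the window shrinks so later walks skip it.
 */
int demo_next_list(struct demo_scan *s, const int counts[DEMO_NR_LISTS])
{
	for (;;) {
		/* Holds as long as some list inside the window is non-empty. */
		assert(s->min_pindex <= s->max_pindex);

		if (++s->pindex > s->max_pindex)
			s->pindex = s->min_pindex;
		if (counts[s->pindex] > 0)
			return s->pindex;

		if (s->pindex == s->max_pindex)
			s->max_pindex--;
		if (s->pindex == s->min_pindex)
			s->min_pindex++;
	}
}
```

With pages only on lists 3 and 5, the first full wrap-around trims the window from [0, 7] down to [3, 6], and a later pass trims it to [3, 5], so subsequent walks no longer visit the empty lists outside that range.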
Link: https://lkml.kernel.org/r/20220217002227.5739-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Aaron Lu <aaron.lu@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-track-range-of-active-pcp-lists-during-bulk-free
+++ a/mm/page_alloc.c
@@ -1447,6 +1447,8 @@ static void free_pcppages_bulk(struct zo
struct per_cpu_pages *pcp)
{
int pindex = 0;
+ int min_pindex = 0;
+ int max_pindex = NR_PCP_LISTS - 1;
int batch_free = 0;
int nr_freed = 0;
unsigned int order;
@@ -1472,13 +1474,20 @@ static void free_pcppages_bulk(struct zo
*/
do {
batch_free++;
- if (++pindex == NR_PCP_LISTS)
- pindex = 0;
+ if (++pindex > max_pindex)
+ pindex = min_pindex;
list = &pcp->lists[pindex];
- } while (list_empty(list));
+ if (!list_empty(list))
+ break;
+
+ if (pindex == max_pindex)
+ max_pindex--;
+ if (pindex == min_pindex)
+ min_pindex++;
+ } while (1);
/* This is the only non-empty list. Free them all. */
- if (batch_free == NR_PCP_LISTS)
+ if (batch_free >= max_pindex - min_pindex)
batch_free = count;
order = pindex_to_order(pindex);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 100/227] mm/page_alloc: simplify how many pages are selected per pcp list during bulk free
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: vbabka, mhocko, dave.hansen, brouer, aaron.lu, mgorman, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm/page_alloc: simplify how many pages are selected per pcp list during bulk free
free_pcppages_bulk() selects pages to free by round-robining between
lists. Originally this was to evenly shrink pages by migratetype but
uneven freeing is inevitable due to high-order pages. Simplify list
selection: free_unref_page_commit() starts with a list that definitely has
pages on it, and for drain it does not matter where draining starts as all
pages are removed.
Link: https://lkml.kernel.org/r/20220217002227.5739-4-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Aaron Lu <aaron.lu@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 34 +++++++++++-----------------------
1 file changed, 11 insertions(+), 23 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-simplify-how-many-pages-are-selected-per-pcp-list-during-bulk-free
+++ a/mm/page_alloc.c
@@ -1444,13 +1444,11 @@ static inline void prefetch_buddy(struct
* count is the number of pages to free.
*/
static void free_pcppages_bulk(struct zone *zone, int count,
- struct per_cpu_pages *pcp)
+ struct per_cpu_pages *pcp,
+ int pindex)
{
- int pindex = 0;
int min_pindex = 0;
int max_pindex = NR_PCP_LISTS - 1;
- int batch_free = 0;
- int nr_freed = 0;
unsigned int order;
int prefetch_nr = READ_ONCE(pcp->batch);
bool isolated_pageblocks;
@@ -1464,16 +1462,10 @@ static void free_pcppages_bulk(struct zo
count = min(pcp->count, count);
while (count > 0) {
struct list_head *list;
+ int nr_pages;
- /*
- * Remove pages from lists in a round-robin fashion. A
- * batch_free count is maintained that is incremented when an
- * empty list is encountered. This is so more pages are freed
- * off fuller lists instead of spinning excessively around empty
- * lists
- */
+ /* Remove pages from lists in a round-robin fashion. */
do {
- batch_free++;
if (++pindex > max_pindex)
pindex = min_pindex;
list = &pcp->lists[pindex];
@@ -1486,18 +1478,15 @@ static void free_pcppages_bulk(struct zo
min_pindex++;
} while (1);
- /* This is the only non-empty list. Free them all. */
- if (batch_free >= max_pindex - min_pindex)
- batch_free = count;
-
order = pindex_to_order(pindex);
+ nr_pages = 1 << order;
BUILD_BUG_ON(MAX_ORDER >= (1<<NR_PCP_ORDER_WIDTH));
do {
page = list_last_entry(list, struct page, lru);
/* must delete to avoid corrupting pcp list */
list_del(&page->lru);
- nr_freed += 1 << order;
- count -= 1 << order;
+ count -= nr_pages;
+ pcp->count -= nr_pages;
if (bulkfree_pcp_prepare(page))
continue;
@@ -1521,9 +1510,8 @@ static void free_pcppages_bulk(struct zo
prefetch_buddy(page, order);
prefetch_nr--;
}
- } while (count > 0 && --batch_free && !list_empty(list));
+ } while (count > 0 && !list_empty(list));
}
- pcp->count -= nr_freed;
/*
* local_lock_irq held so equivalent to spin_lock_irqsave for
@@ -3077,7 +3065,7 @@ void drain_zone_pages(struct zone *zone,
batch = READ_ONCE(pcp->batch);
to_drain = min(pcp->count, batch);
if (to_drain > 0)
- free_pcppages_bulk(zone, to_drain, pcp);
+ free_pcppages_bulk(zone, to_drain, pcp, 0);
local_unlock_irqrestore(&pagesets.lock, flags);
}
#endif
@@ -3098,7 +3086,7 @@ static void drain_pages_zone(unsigned in
pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
if (pcp->count)
- free_pcppages_bulk(zone, pcp->count, pcp);
+ free_pcppages_bulk(zone, pcp->count, pcp, 0);
local_unlock_irqrestore(&pagesets.lock, flags);
}
@@ -3379,7 +3367,7 @@ static void free_unref_page_commit(struc
if (pcp->count >= high) {
int batch = READ_ONCE(pcp->batch);
- free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch), pcp);
+ free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch), pcp, pindex);
}
}
_
* [patch 101/227] mm/page_alloc: drain the requested list first during bulk free
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: vbabka, mhocko, dave.hansen, brouer, aaron.lu, mgorman, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm/page_alloc: drain the requested list first during bulk free
Prior to the series, pindex 0 (order-0 MIGRATE_UNMOVABLE) was always
skipped first and the precise reason is forgotten. A potential reason may
have been to artificially preserve MIGRATE_UNMOVABLE but there is no
reason why that would be optimal as it depends on the workload. The more
likely reason is that it was less complicated to do a pre-increment
instead of a post-increment in terms of overall code flow. As
free_pcppages_bulk() now typically receives the pindex of the PCP list
that exceeded high, always start draining that list.
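The effect of the adjustment can be seen in a small user-space sketch. This is an illustration under simplifying assumptions, not the kernel implementation: next_list() and first_list_drained() are hypothetical helpers, and each list is modelled as a page count.

```c
#include <assert.h>

/*
 * Pre-increment search for the next non-empty list, as in the selection
 * loop. nr_pages[i] is the number of pages on list i; at least one list
 * must be non-empty or the search does not terminate.
 */
static int next_list(const int *nr_pages, int nr_lists, int pindex)
{
    assert(nr_lists > 0);
    do {
        if (++pindex > nr_lists - 1)
            pindex = 0;
    } while (nr_pages[pindex] == 0);
    return pindex;
}

/*
 * Stepping back by one before the pre-increment loop means the requested
 * list is the first one examined. pindex 0 becomes -1, which the
 * pre-increment turns back into 0.
 */
static int first_list_drained(const int *nr_pages, int nr_lists, int pindex)
{
    return next_list(nr_pages, nr_lists, pindex - 1);
}
```

Without the adjustment, next_list() called with the requested pindex starts at the following list. With it, the requested list is drained first unless it is empty, in which case the search moves on as before.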
Link: https://lkml.kernel.org/r/20220217002227.5739-5-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Aaron Lu <aaron.lu@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/page_alloc.c~mm-page_alloc-drain-the-requested-list-first-during-bulk-free
+++ a/mm/page_alloc.c
@@ -1460,6 +1460,10 @@ static void free_pcppages_bulk(struct zo
* below while (list_empty(list)) loop.
*/
count = min(pcp->count, count);
+
+ /* Ensure requested pindex is drained first. */
+ pindex = pindex - 1;
+
while (count > 0) {
struct list_head *list;
int nr_pages;
_
* [patch 102/227] mm/page_alloc: free pages in a single pass during bulk free
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: vbabka, mhocko, dave.hansen, brouer, aaron.lu, mgorman, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm/page_alloc: free pages in a single pass during bulk free
free_pcppages_bulk() has taken two passes through the pcp lists since
commit 0a5f4e5b4562 ("mm/free_pcppages_bulk: do not hold lock when picking
pages to free") so that the cost of selecting PCP lists is not paid while
the zone lock is held. Now that list selection is simpler, the main cost
during selection is bulkfree_pcp_prepare() which in the normal case is a
simple check and prefetching. As the list manipulations have a cost of
their own, go back to freeing pages in a single pass.
The series up to this point was evaluated using a trunc microbenchmark
that truncates sparse files stored in page cache (mmtests config
config-io-trunc). Sparse files were used to limit filesystem interaction.
The results versus a revert of storing high-order pages in the PCP lists
are:
1-socket Skylake
                             5.17.0-rc3             5.17.0-rc3             5.17.0-rc3
                                vanilla    mm-reverthighpcp-v1       mm-highpcpopt-v2
Min       elapsed     540.00 (   0.00%)      530.00 (   1.85%)      530.00 (   1.85%)
Amean     elapsed     543.00 (   0.00%)      530.00 *   2.39%*      530.00 *   2.39%*
Stddev    elapsed       4.83 (   0.00%)        0.00 ( 100.00%)        0.00 ( 100.00%)
CoeffVar  elapsed       0.89 (   0.00%)        0.00 ( 100.00%)        0.00 ( 100.00%)
Max       elapsed     550.00 (   0.00%)      530.00 (   3.64%)      530.00 (   3.64%)
BAmean-50 elapsed     540.00 (   0.00%)      530.00 (   1.85%)      530.00 (   1.85%)
BAmean-95 elapsed     542.22 (   0.00%)      530.00 (   2.25%)      530.00 (   2.25%)
BAmean-99 elapsed     542.22 (   0.00%)      530.00 (   2.25%)      530.00 (   2.25%)

2-socket CascadeLake
                             5.17.0-rc3             5.17.0-rc3             5.17.0-rc3
                                vanilla    mm-reverthighpcp-v1       mm-highpcpopt-v2
Min       elapsed     510.00 (   0.00%)      500.00 (   1.96%)      500.00 (   1.96%)
Amean     elapsed     529.00 (   0.00%)      521.00 (   1.51%)      510.00 *   3.59%*
Stddev    elapsed      16.63 (   0.00%)       12.87 (  22.64%)       11.55 (  30.58%)
CoeffVar  elapsed       3.14 (   0.00%)        2.47 (  21.46%)        2.26 (  27.99%)
Max       elapsed     550.00 (   0.00%)      540.00 (   1.82%)      530.00 (   3.64%)
BAmean-50 elapsed     516.00 (   0.00%)      512.00 (   0.78%)      500.00 (   3.10%)
BAmean-95 elapsed     526.67 (   0.00%)      518.89 (   1.48%)      507.78 (   3.59%)
BAmean-99 elapsed     526.67 (   0.00%)      518.89 (   1.48%)      507.78 (   3.59%)
The original motivation for multi-passes was will-it-scale page_fault1
using $nr_cpu processes.
2-socket CascadeLake (40 cores, 80 CPUs HT enabled)
                                                 5.17.0-rc3               5.17.0-rc3
                                                    vanilla         mm-highpcpopt-v2
Hmean     page_fault1-processes-2     2694662.26 (   0.00%)    2695780.35 (   0.04%)
Hmean     page_fault1-processes-5     6425819.34 (   0.00%)    6435544.57 *   0.15%*
Hmean     page_fault1-processes-8     9642169.10 (   0.00%)    9658962.39 (   0.17%)
Hmean     page_fault1-processes-12   12167502.10 (   0.00%)   12190163.79 (   0.19%)
Hmean     page_fault1-processes-21   15636859.03 (   0.00%)   15612447.26 (  -0.16%)
Hmean     page_fault1-processes-30   25157348.61 (   0.00%)   25169456.65 (   0.05%)
Hmean     page_fault1-processes-48   27694013.85 (   0.00%)   27671111.46 (  -0.08%)
Hmean     page_fault1-processes-79   25928742.64 (   0.00%)   25934202.02 (   0.02%) <--
Hmean     page_fault1-processes-110  25730869.75 (   0.00%)   25671880.65 *  -0.23%*
Hmean     page_fault1-processes-141  25626992.42 (   0.00%)   25629551.61 (   0.01%)
Hmean     page_fault1-processes-172  25611651.35 (   0.00%)   25614927.99 (   0.01%)
Hmean     page_fault1-processes-203  25577298.75 (   0.00%)   25583445.59 (   0.02%)
Hmean     page_fault1-processes-234  25580686.07 (   0.00%)   25608240.71 (   0.11%)
Hmean     page_fault1-processes-265  25570215.47 (   0.00%)   25568647.58 (  -0.01%)
Hmean     page_fault1-processes-296  25549488.62 (   0.00%)   25543935.00 (  -0.02%)
Hmean     page_fault1-processes-320  25555149.05 (   0.00%)   25575696.74 (   0.08%)
The differences are mostly within the noise and the difference close to
$nr_cpus is negligible.
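For readers checking the tables, the percentages appear consistent with a plain relative difference against the vanilla column, with the sign chosen so that positive means better. The sketch below is an assumption about how the figures can be reproduced, not a description of the mmtests implementation; both function names are hypothetical.

```c
#include <assert.h>

/* Gain in percent when a lower value is better (e.g. elapsed time). */
static double gain_lower_is_better(double baseline, double value)
{
    assert(baseline != 0.0);
    return (baseline - value) / baseline * 100.0;
}

/* Gain in percent when a higher value is better (e.g. Hmean throughput). */
static double gain_higher_is_better(double baseline, double value)
{
    assert(baseline != 0.0);
    return (value - baseline) / baseline * 100.0;
}
```

For example, the 1-socket Amean elapsed of 543.00 versus 530.00 gives (543 - 530) / 543, roughly 2.39%, and the page_fault1-processes-21 row gives (15612447.26 - 15636859.03) / 15636859.03, roughly -0.16%.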
Link: https://lkml.kernel.org/r/20220217002227.5739-6-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Aaron Lu <aaron.lu@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 56 +++++++++++++++++-----------------------------
1 file changed, 21 insertions(+), 35 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-free-pages-in-a-single-pass-during-bulk-free
+++ a/mm/page_alloc.c
@@ -1452,8 +1452,7 @@ static void free_pcppages_bulk(struct zo
unsigned int order;
int prefetch_nr = READ_ONCE(pcp->batch);
bool isolated_pageblocks;
- struct page *page, *tmp;
- LIST_HEAD(head);
+ struct page *page;
/*
* Ensure proper count is passed which otherwise would stuck in the
@@ -1464,6 +1463,13 @@ static void free_pcppages_bulk(struct zo
/* Ensure requested pindex is drained first. */
pindex = pindex - 1;
+ /*
+ * local_lock_irq held so equivalent to spin_lock_irqsave for
+ * both PREEMPT_RT and non-PREEMPT_RT configurations.
+ */
+ spin_lock(&zone->lock);
+ isolated_pageblocks = has_isolate_pageblock(zone);
+
while (count > 0) {
struct list_head *list;
int nr_pages;
@@ -1486,7 +1492,11 @@ static void free_pcppages_bulk(struct zo
nr_pages = 1 << order;
BUILD_BUG_ON(MAX_ORDER >= (1<<NR_PCP_ORDER_WIDTH));
do {
+ int mt;
+
page = list_last_entry(list, struct page, lru);
+ mt = get_pcppage_migratetype(page);
+
/* must delete to avoid corrupting pcp list */
list_del(&page->lru);
count -= nr_pages;
@@ -1495,12 +1505,6 @@ static void free_pcppages_bulk(struct zo
if (bulkfree_pcp_prepare(page))
continue;
- /* Encode order with the migratetype */
- page->index <<= NR_PCP_ORDER_WIDTH;
- page->index |= order;
-
- list_add_tail(&page->lru, &head);
-
/*
* We are going to put the page back to the global
* pool, prefetch its buddy to speed up later access
@@ -1514,36 +1518,18 @@ static void free_pcppages_bulk(struct zo
prefetch_buddy(page, order);
prefetch_nr--;
}
- } while (count > 0 && !list_empty(list));
- }
- /*
- * local_lock_irq held so equivalent to spin_lock_irqsave for
- * both PREEMPT_RT and non-PREEMPT_RT configurations.
- */
- spin_lock(&zone->lock);
- isolated_pageblocks = has_isolate_pageblock(zone);
+ /* MIGRATE_ISOLATE page should not go to pcplists */
+ VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
+ /* Pageblock could have been isolated meanwhile */
+ if (unlikely(isolated_pageblocks))
+ mt = get_pageblock_migratetype(page);
- /*
- * Use safe version since after __free_one_page(),
- * page->lru.next will not point to original list.
- */
- list_for_each_entry_safe(page, tmp, &head, lru) {
- int mt = get_pcppage_migratetype(page);
-
- /* mt has been encoded with the order (see above) */
- order = mt & NR_PCP_ORDER_MASK;
- mt >>= NR_PCP_ORDER_WIDTH;
-
- /* MIGRATE_ISOLATE page should not go to pcplists */
- VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
- /* Pageblock could have been isolated meanwhile */
- if (unlikely(isolated_pageblocks))
- mt = get_pageblock_migratetype(page);
-
- __free_one_page(page, page_to_pfn(page), zone, order, mt, FPI_NONE);
- trace_mm_page_pcpu_drain(page, order, mt);
+ __free_one_page(page, page_to_pfn(page), zone, order, mt, FPI_NONE);
+ trace_mm_page_pcpu_drain(page, order, mt);
+ } while (count > 0 && !list_empty(list));
}
+
spin_unlock(&zone->lock);
}
_
* [patch 103/227] mm/page_alloc: limit number of high-order pages on PCP during bulk free
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: vbabka, mhocko, dave.hansen, brouer, aaron.lu, mgorman, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm/page_alloc: limit number of high-order pages on PCP during bulk free
When a PCP is mostly used for frees then high-order pages can exist on PCP
lists for some time. This is problematic when the allocation pattern is
all allocations from one CPU and all frees from another resulting in
colder pages being used. When bulk freeing pages, limit the number of
high-order pages that are stored on the PCP lists.
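The decision can be sketched in user space as follows. This is a simplified model under stated assumptions rather than the kernel code: struct pcp_model and the three helpers are hypothetical, the ZONE_RECLAIM_ACTIVE handling in nr_pcp_high() is omitted, and the batch scaling by free_factor in nr_pcp_free() is reduced to returning the batch size.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_ALLOC_COSTLY_ORDER 3  /* value used by the kernel */

/* Hypothetical stand-in for the fields of struct per_cpu_pages used here. */
struct pcp_model {
    int count;        /* pages on the PCP lists, including the new free */
    int high;         /* high watermark */
    int batch;        /* normal bulk-free batch size */
    int free_factor;  /* non-zero while the PCP is freeing without allocating */
};

/* High-order, non-costly frees on a PCP that is only freeing. */
static bool want_free_high(const struct pcp_model *pcp, int order)
{
    return pcp->free_factor && order && order <= PAGE_ALLOC_COSTLY_ORDER;
}

/* A high mark of 0 makes the "count >= high" check always trigger. */
static int high_mark(const struct pcp_model *pcp, bool free_high)
{
    if (!pcp->high || free_high)
        return 0;
    return pcp->high;
}

/* Number of pages a free of the given order would hand to the bulk free. */
static int pages_to_free(const struct pcp_model *pcp, int order)
{
    bool free_high;

    assert(order >= 0);
    free_high = want_free_high(pcp, order);

    if (pcp->count < high_mark(pcp, free_high))
        return 0;               /* below the high mark, nothing to do */
    if (free_high)
        return pcp->count;      /* free everything on the PCP */
    if (pcp->high < pcp->batch)
        return 1;               /* PCP disabled or boot pageset */
    return pcp->batch;          /* simplified: real code scales the batch */
}
```

In this model an order-2 free on a PCP with a non-zero free_factor empties the PCP even when count is well below high, while order-0 frees and orders above PAGE_ALLOC_COSTLY_ORDER keep the existing behaviour.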
Netperf running on localhost exhibits this pattern and while it does not
matter for some machines, it does matter for others with smaller caches
where cache misses cause problems due to reduced page reuse. Pages freed
directly to the buddy list may be reused quickly while still cache hot
whereas pages stored on the PCP lists may be cold by the time
free_pcppages_bulk() is called.
Using perf kmem:mm_page_alloc, the 5 most used page frames were
5.17-rc3
   13041 pfn=0x111a30
   13081 pfn=0x5814d0
   13097 pfn=0x108258
   13121 pfn=0x689598
   13128 pfn=0x5814d8

5.17-revert-highpcp
  192009 pfn=0x54c140
  195426 pfn=0x1081d0
  200908 pfn=0x61c808
  243515 pfn=0xa9dc20
  402523 pfn=0x222bb8

5.17-full-series
  142693 pfn=0x346208
  162227 pfn=0x13bf08
  166413 pfn=0x2711e0
  166950 pfn=0x2702f8
The spread is wider as there is still time before pages freed to one PCP
get released, with a tradeoff between fast reuse and reduced zone lock
acquisition.
On the machine used to gather the traces, the headline performance was
equivalent.
netperf-tcp
                           5.17.0-rc3             5.17.0-rc3             5.17.0-rc3
                              vanilla  mm-reverthighpcp-v1r1     mm-highpcplimit-v2
Hmean     64        839.93 (   0.00%)      840.77 (   0.10%)      841.02 (   0.13%)
Hmean     128      1614.22 (   0.00%)     1622.07 *   0.49%*     1636.41 *   1.37%*
Hmean     256      2952.00 (   0.00%)     2953.19 (   0.04%)     2977.76 *   0.87%*
Hmean     1024    10291.67 (   0.00%)    10239.17 (  -0.51%)    10434.41 *   1.39%*
Hmean     2048    17335.08 (   0.00%)    17399.97 (   0.37%)    17134.81 *  -1.16%*
Hmean     3312    22628.15 (   0.00%)    22471.97 (  -0.69%)    22422.78 (  -0.91%)
Hmean     4096    25009.50 (   0.00%)    24752.83 *  -1.03%*    24740.41 (  -1.08%)
Hmean     8192    32745.01 (   0.00%)    31682.63 *  -3.24%*    32153.50 *  -1.81%*
Hmean     16384   39759.59 (   0.00%)    36805.78 *  -7.43%*    38948.13 *  -2.04%*
On a 1-socket Skylake machine with a small CPU cache that suffers more if
cache misses are too high:
netperf-tcp
                           5.17.0-rc3             5.17.0-rc3             5.17.0-rc3
                              vanilla    mm-reverthighpcp-v1     mm-highpcplimit-v2
Hmean     64        938.95 (   0.00%)      941.50 *   0.27%*      943.61 *   0.50%*
Hmean     128      1843.10 (   0.00%)     1857.58 *   0.79%*     1861.09 *   0.98%*
Hmean     256      3573.07 (   0.00%)     3667.45 *   2.64%*     3674.91 *   2.85%*
Hmean     1024    13206.52 (   0.00%)    13487.80 *   2.13%*    13393.21 *   1.41%*
Hmean     2048    22870.23 (   0.00%)    23337.96 *   2.05%*    23188.41 *   1.39%*
Hmean     3312    31001.99 (   0.00%)    32206.50 *   3.89%*    31863.62 *   2.78%*
Hmean     4096    35364.59 (   0.00%)    36490.96 *   3.19%*    36112.54 *   2.11%*
Hmean     8192    48497.71 (   0.00%)    49954.05 *   3.00%*    49588.26 *   2.25%*
Hmean     16384   58410.86 (   0.00%)    60839.80 *   4.16%*    62282.96 *   6.63%*
Note that this was a machine that did not benefit from caching high-order
pages and performance is almost restored with the series applied. It's
not fully restored as cache misses are still higher. This is a trade-off
between optimising for a workload that does all allocs on one CPU and
frees on another or more general workloads that need high-order pages for
SLUB and benefit from avoiding zone->lock for every SLUB refill/drain.
Link: https://lkml.kernel.org/r/20220217002227.5739-7-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Aaron Lu <aaron.lu@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-limit-number-of-high-order-pages-on-pcp-during-bulk-free
+++ a/mm/page_alloc.c
@@ -3299,10 +3299,15 @@ static bool free_unref_page_prepare(stru
return true;
}
-static int nr_pcp_free(struct per_cpu_pages *pcp, int high, int batch)
+static int nr_pcp_free(struct per_cpu_pages *pcp, int high, int batch,
+ bool free_high)
{
int min_nr_free, max_nr_free;
+ /* Free everything if batch freeing high-order pages. */
+ if (unlikely(free_high))
+ return pcp->count;
+
/* Check for PCP disabled or boot pageset */
if (unlikely(high < batch))
return 1;
@@ -3323,11 +3328,12 @@ static int nr_pcp_free(struct per_cpu_pa
return batch;
}
-static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone)
+static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
+ bool free_high)
{
int high = READ_ONCE(pcp->high);
- if (unlikely(!high))
+ if (unlikely(!high || free_high))
return 0;
if (!test_bit(ZONE_RECLAIM_ACTIVE, &zone->flags))
@@ -3347,17 +3353,27 @@ static void free_unref_page_commit(struc
struct per_cpu_pages *pcp;
int high;
int pindex;
+ bool free_high;
__count_vm_event(PGFREE);
pcp = this_cpu_ptr(zone->per_cpu_pageset);
pindex = order_to_pindex(migratetype, order);
list_add(&page->lru, &pcp->lists[pindex]);
pcp->count += 1 << order;
- high = nr_pcp_high(pcp, zone);
+
+ /*
+ * As high-order pages other than THP's stored on PCP can contribute
+ * to fragmentation, limit the number stored when PCP is heavily
+ * freeing without allocation. The remainder after bulk freeing
+ * stops will be drained from vmstat refresh context.
+ */
+ free_high = (pcp->free_factor && order && order <= PAGE_ALLOC_COSTLY_ORDER);
+
+ high = nr_pcp_high(pcp, zone, free_high);
if (pcp->count >= high) {
int batch = READ_ONCE(pcp->batch);
- free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch), pcp, pindex);
+ free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch, free_high), pcp, pindex);
}
}
_
* [patch 103/227] mm/page_alloc: limit number of high-order pages on PCP during bulk free
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: vbabka, mhocko, dave.hansen, brouer, aaron.lu, mgorman, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm/page_alloc: limit number of high-order pages on PCP during bulk free
When a PCP is mostly used for frees then high-order pages can exist on PCP
lists for some time. This is problematic when the allocation pattern is
all allocations from one CPU and all frees from another resulting in
colder pages being used. When bulk freeing pages, limit the number of
high-order pages that are stored on the PCP lists.
Netperf running on localhost exhibits this pattern and while it does not
matter for some machines, it does matter for others with smaller caches
where cache misses cause problems due to reduced page reuse. Pages freed
directly to the buddy list may be reused quickly while still cache hot
whereas pages stored on the PCP lists may be cold by the time
free_pcppages_bulk() is called.
Using perf kmem:mm_page_alloc, the 5 most used page frames were
5.17-rc3
13041 pfn=0x111a30
13081 pfn=0x5814d0
13097 pfn=0x108258
13121 pfn=0x689598
13128 pfn=0x5814d8
5.17-revert-highpcp
192009 pfn=0x54c140
195426 pfn=0x1081d0
200908 pfn=0x61c808
243515 pfn=0xa9dc20
402523 pfn=0x222bb8
5.17-full-series
142693 pfn=0x346208
162227 pfn=0x13bf08
166413 pfn=0x2711e0
166950 pfn=0x2702f8
The spread is wider as there is still time before pages freed to one PCP
get released with a tradeoff between fast reuse and reduced zone lock
acquisition.
On the machine used to gather the traces, the headline performance was
equivalent.
netperf-tcp
                        5.17.0-rc3             5.17.0-rc3             5.17.0-rc3
                           vanilla  mm-reverthighpcp-v1r1     mm-highpcplimit-v2
Hmean    64       839.93 (  0.00%)       840.77 (  0.10%)       841.02 (  0.13%)
Hmean   128      1614.22 (  0.00%)      1622.07 *  0.49%*      1636.41 *  1.37%*
Hmean   256      2952.00 (  0.00%)      2953.19 (  0.04%)      2977.76 *  0.87%*
Hmean  1024     10291.67 (  0.00%)     10239.17 ( -0.51%)     10434.41 *  1.39%*
Hmean  2048     17335.08 (  0.00%)     17399.97 (  0.37%)     17134.81 * -1.16%*
Hmean  3312     22628.15 (  0.00%)     22471.97 ( -0.69%)     22422.78 ( -0.91%)
Hmean  4096     25009.50 (  0.00%)     24752.83 * -1.03%*     24740.41 ( -1.08%)
Hmean  8192     32745.01 (  0.00%)     31682.63 * -3.24%*     32153.50 * -1.81%*
Hmean 16384     39759.59 (  0.00%)     36805.78 * -7.43%*     38948.13 * -2.04%*
On a 1-socket Skylake machine with a small CPU cache, which suffers more
if cache misses are too high:
netperf-tcp
                        5.17.0-rc3             5.17.0-rc3             5.17.0-rc3
                           vanilla    mm-reverthighpcp-v1     mm-highpcplimit-v2
Hmean    64       938.95 (  0.00%)       941.50 *  0.27%*       943.61 *  0.50%*
Hmean   128      1843.10 (  0.00%)      1857.58 *  0.79%*      1861.09 *  0.98%*
Hmean   256      3573.07 (  0.00%)      3667.45 *  2.64%*      3674.91 *  2.85%*
Hmean  1024     13206.52 (  0.00%)     13487.80 *  2.13%*     13393.21 *  1.41%*
Hmean  2048     22870.23 (  0.00%)     23337.96 *  2.05%*     23188.41 *  1.39%*
Hmean  3312     31001.99 (  0.00%)     32206.50 *  3.89%*     31863.62 *  2.78%*
Hmean  4096     35364.59 (  0.00%)     36490.96 *  3.19%*     36112.54 *  2.11%*
Hmean  8192     48497.71 (  0.00%)     49954.05 *  3.00%*     49588.26 *  2.25%*
Hmean 16384     58410.86 (  0.00%)     60839.80 *  4.16%*     62282.96 *  6.63%*
Note that this was a machine that did not benefit from caching high-order
pages and performance is almost restored with the series applied. It's
not fully restored as cache misses are still higher. This is a trade-off
between optimising for a workload that does all allocs on one CPU and
frees on another or more general workloads that need high-order pages for
SLUB and benefit from avoiding zone->lock for every SLUB refill/drain.
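The decision described above can be sketched as a small userspace model.
This is only an illustration, not the kernel code: struct pcp_model,
COSTLY_ORDER and both helpers are hypothetical names, and the normal
(non-free_high) path is reduced to "drain at most one batch once the high
mark is reached", which is simpler than the real nr_pcp_free().

```c
#include <stdbool.h>

/* Assumed to match the kernel's PAGE_ALLOC_COSTLY_ORDER (3). */
#define COSTLY_ORDER 3

/* Hypothetical, pared-down stand-in for struct per_cpu_pages. */
struct pcp_model {
	int count;        /* pages currently held on the per-cpu lists */
	int high;         /* normal high watermark */
	int batch;        /* normal bulk-free chunk size */
	int free_factor;  /* non-zero while this CPU frees without allocating */
};

/* Mirrors the free_high test added to free_unref_page_commit(). */
static bool model_free_high(const struct pcp_model *pcp, unsigned int order)
{
	return pcp->free_factor && order && order <= COSTLY_ORDER;
}

/*
 * Account one freed block of the given order and return how many pages
 * would be passed to the bulk free; 0 means the block stays on the PCP.
 */
static int model_pages_to_drain(struct pcp_model *pcp, unsigned int order)
{
	bool free_high = model_free_high(pcp, order);
	/* With free_high set the high mark is treated as 0. */
	int high = free_high ? 0 : pcp->high;

	pcp->count += 1 << order;
	if (pcp->count < high)
		return 0;

	/* free_high drains everything; otherwise at most one batch. */
	if (free_high)
		return pcp->count;
	return pcp->batch < pcp->count ? pcp->batch : pcp->count;
}
```

In this model a CPU that is only freeing (free_factor != 0) drains its whole
list as soon as an order-1 to order-3 block is freed, while order-0 frees
and frees on a CPU that also allocates keep the usual high/batch behaviour.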
Link: https://lkml.kernel.org/r/20220217002227.5739-7-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Aaron Lu <aaron.lu@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-limit-number-of-high-order-pages-on-pcp-during-bulk-free
+++ a/mm/page_alloc.c
@@ -3299,10 +3299,15 @@ static bool free_unref_page_prepare(stru
return true;
}
-static int nr_pcp_free(struct per_cpu_pages *pcp, int high, int batch)
+static int nr_pcp_free(struct per_cpu_pages *pcp, int high, int batch,
+ bool free_high)
{
int min_nr_free, max_nr_free;
+ /* Free everything if batch freeing high-order pages. */
+ if (unlikely(free_high))
+ return pcp->count;
+
/* Check for PCP disabled or boot pageset */
if (unlikely(high < batch))
return 1;
@@ -3323,11 +3328,12 @@ static int nr_pcp_free(struct per_cpu_pa
return batch;
}
-static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone)
+static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
+ bool free_high)
{
int high = READ_ONCE(pcp->high);
- if (unlikely(!high))
+ if (unlikely(!high || free_high))
return 0;
if (!test_bit(ZONE_RECLAIM_ACTIVE, &zone->flags))
@@ -3347,17 +3353,27 @@ static void free_unref_page_commit(struc
struct per_cpu_pages *pcp;
int high;
int pindex;
+ bool free_high;
__count_vm_event(PGFREE);
pcp = this_cpu_ptr(zone->per_cpu_pageset);
pindex = order_to_pindex(migratetype, order);
list_add(&page->lru, &pcp->lists[pindex]);
pcp->count += 1 << order;
- high = nr_pcp_high(pcp, zone);
+
+ /*
+ * As high-order pages other than THP's stored on PCP can contribute
+ * to fragmentation, limit the number stored when PCP is heavily
+ * freeing without allocation. The remainder after bulk freeing
+ * stops will be drained from vmstat refresh context.
+ */
+ free_high = (pcp->free_factor && order && order <= PAGE_ALLOC_COSTLY_ORDER);
+
+ high = nr_pcp_high(pcp, zone, free_high);
if (pcp->count >= high) {
int batch = READ_ONCE(pcp->batch);
- free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch), pcp, pindex);
+ free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch, free_high), pcp, pindex);
}
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 104/227] mm/page_alloc: do not prefetch buddies during bulk free
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: vbabka, mhocko, dave.hansen, brouer, aaron.lu, mgorman, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm/page_alloc: do not prefetch buddies during bulk free
free_pcppages_bulk() has taken two passes through the pcp lists since
commit 0a5f4e5b4562 ("mm/free_pcppages_bulk: do not hold lock when picking
pages to free") due to deferring the cost of selecting PCP lists until the
zone lock is held.
As the list processing now takes place under the zone lock, it's less
clear that the prefetch will always be a benefit, for two reasons.
1. There is a guaranteed cost to calculating the buddy which definitely
has to be calculated again. However, as the zone lock is held and
there is no deferring of buddy merging, there is no guarantee that the
prefetch will have completed when the second buddy calculation takes
place and buddies are being merged. With or without the prefetch, there
may be further stalls depending on how many pages get merged. In other
words, a stall due to merging is inevitable and at best only one stall
might be avoided at the cost of calculating the buddy location twice.
2. As the zone lock is held, prefetch_nr makes less sense as once
prefetch_nr expires, the cache lines of interest have already been
merged.
The main concern is that there is a definite cost to calculating the buddy
location early for the prefetch and it is a "maybe win" depending on
whether the CPU prefetch logic and memory are fast enough. Remove the
prefetch logic on the basis that reduced instructions in a path is always
a saving, whereas the prefetch might save one memory stall depending on
the CPU and memory.
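For readers unfamiliar with the calculation being discussed, the buddy
location is a single bit flip of the page frame number. The sketch below is
a userspace illustration with hypothetical helper names (buddy_pfn_of,
merged_pfn_of); it assumes the pfn is aligned to the block's order, as it is
for pages on the free lists.

```c
/*
 * Hypothetical userspace mirror of the buddy lookup: the buddy of an
 * order-N block differs from it only in bit N of the pfn.
 */
static unsigned long buddy_pfn_of(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

/* First pfn of the order N+1 block formed when the two buddies merge. */
static unsigned long merged_pfn_of(unsigned long pfn, unsigned int order)
{
	return buddy_pfn_of(pfn, order) & pfn;
}
```

The arithmetic itself is cheap; the argument above is that doing it twice,
once for the prefetch and once for the merge, is a guaranteed cost, while
the benefit of the prefetch is not guaranteed.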
In most cases, this has marginal benefit as the calculations are a small
part of the overall freeing of pages. However, it was detectable on at
least one machine.
                          5.17.0-rc3             5.17.0-rc3
                mm-highpcplimit-v2r1     mm-noprefetch-v1r1
Min   elapsed       630.00 (  0.00%)       610.00 (  3.17%)
Amean elapsed       639.00 (  0.00%)       623.00 *  2.50%*
Max   elapsed       660.00 (  0.00%)       660.00 (  0.00%)
Link: https://lkml.kernel.org/r/20220221094119.15282-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Suggested-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 24 ------------------------
1 file changed, 24 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-do-not-prefetch-buddies-during-bulk-free
+++ a/mm/page_alloc.c
@@ -1429,15 +1429,6 @@ static bool bulkfree_pcp_prepare(struct
}
#endif /* CONFIG_DEBUG_VM */
-static inline void prefetch_buddy(struct page *page, unsigned int order)
-{
- unsigned long pfn = page_to_pfn(page);
- unsigned long buddy_pfn = __find_buddy_pfn(pfn, order);
- struct page *buddy = page + (buddy_pfn - pfn);
-
- prefetch(buddy);
-}
-
/*
* Frees a number of pages from the PCP lists
* Assumes all pages on list are in same zone.
@@ -1450,7 +1441,6 @@ static void free_pcppages_bulk(struct zo
int min_pindex = 0;
int max_pindex = NR_PCP_LISTS - 1;
unsigned int order;
- int prefetch_nr = READ_ONCE(pcp->batch);
bool isolated_pageblocks;
struct page *page;
@@ -1505,20 +1495,6 @@ static void free_pcppages_bulk(struct zo
if (bulkfree_pcp_prepare(page))
continue;
- /*
- * We are going to put the page back to the global
- * pool, prefetch its buddy to speed up later access
- * under zone->lock. It is believed the overhead of
- * an additional test and calculating buddy_pfn here
- * can be offset by reduced memory latency later. To
- * avoid excessive prefetching due to large count, only
- * prefetch buddy for the first pcp->batch nr of pages.
- */
- if (prefetch_nr) {
- prefetch_buddy(page, order);
- prefetch_nr--;
- }
-
/* MIGRATE_ISOLATE page should not go to pcplists */
VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
/* Pageblock could have been isolated meanwhile */
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 105/227] arch/x86/mm/numa: Do not initialize nodes twice
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: richard.weiyang, raquini, mhocko, dennis, david, dave.hansen,
amakhalov, osalvador, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Oscar Salvador <osalvador@suse.de>
Subject: arch/x86/mm/numa: Do not initialize nodes twice
On x86, prior to ("mm: handle uninitialized numa nodes gracefully"), NUMA
nodes could be allocated at three different places.
- numa_register_memblks
- init_cpu_to_node
- init_gi_nodes
All these calls happen at setup_arch, and have the following order:
setup_arch
  ...
  x86_numa_init
    numa_init
      numa_register_memblks
  ...
  init_cpu_to_node
    init_memory_less_node
      alloc_node_data
      free_area_init_memoryless_node
  init_gi_nodes
    init_memory_less_node
      alloc_node_data
      free_area_init_memoryless_node
numa_register_memblks() is only interested in those nodes which have
memory, so it skips over any memoryless node it finds. Later on, when we
have read ACPI's SRAT table, we call init_cpu_to_node() and
init_gi_nodes(), which initialize any memoryless node we might have that
has either CPU or Initiator affinity, meaning we allocate a pg_data_t
struct for each of them and mark them as ONLINE.
So far so good, but the thing is that after ("mm: handle uninitialized
numa nodes gracefully"), we allocate all possible NUMA nodes in
free_area_init(), meaning we have a picture like the following:
setup_arch
  x86_numa_init
    numa_init
      numa_register_memblks          <-- allocate non-memoryless node
  x86_init.paging.pagetable_init
    ...
      free_area_init
        free_area_init_memoryless    <-- allocate memoryless node
  init_cpu_to_node
    alloc_node_data                  <-- allocate memoryless node with CPU
    free_area_init_memoryless_node
  init_gi_nodes
    alloc_node_data                  <-- allocate memoryless node with Initiator
    free_area_init_memoryless_node
free_area_init() already allocates all possible NUMA nodes, but
init_cpu_to_node() and init_gi_nodes() are clueless about that, so they go
ahead and allocate a new pg_data_t struct without checking anything,
meaning we end up allocating twice.
It should be made clear that this only happens in the case where a
memoryless NUMA node happens to have a CPU/Initiator affinity.
So get rid of init_memory_less_node() and just set the node online.
Note that setting the node online is needed, otherwise we choke down the
chain when bringup_nonboot_cpus() ends up calling
__try_online_node()->register_one_node()->... and we blow up in
bus_add_device(). As can be seen here:
==========
[ 0.585060] BUG: kernel NULL pointer dereference, address: 0000000000000060
[ 0.586091] #PF: supervisor read access in kernel mode
[ 0.586831] #PF: error_code(0x0000) - not-present page
[ 0.586930] PGD 0 P4D 0
[ 0.586930] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
[ 0.586930] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-1-default+ #45
[ 0.586930] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/4
[ 0.586930] RIP: 0010:bus_add_device+0x5a/0x140
[ 0.586930] Code: 8b 74 24 20 48 89 df e8 84 96 ff ff 85 c0 89 c5 75 38 48 8b 53 50 48 85 d2 0f 84 bb 00 004
[ 0.586930] RSP: 0000:ffffc9000022bd10 EFLAGS: 00010246
[ 0.586930] RAX: 0000000000000000 RBX: ffff888100987400 RCX: ffff8881003e4e19
[ 0.586930] RDX: ffff8881009a5e00 RSI: ffff888100987400 RDI: ffff888100987400
[ 0.586930] RBP: 0000000000000000 R08: ffff8881003e4e18 R09: ffff8881003e4c98
[ 0.586930] R10: 0000000000000000 R11: ffff888100402bc0 R12: ffffffff822ceba0
[ 0.586930] R13: 0000000000000000 R14: ffff888100987400 R15: 0000000000000000
[ 0.586930] FS: 0000000000000000(0000) GS:ffff88853fc00000(0000) knlGS:0000000000000000
[ 0.586930] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.586930] CR2: 0000000000000060 CR3: 000000000200a001 CR4: 00000000001706b0
[ 0.586930] Call Trace:
[ 0.586930] <TASK>
[ 0.586930] device_add+0x4c0/0x910
[ 0.586930] __register_one_node+0x97/0x2d0
[ 0.586930] __try_online_node+0x85/0xc0
[ 0.586930] try_online_node+0x25/0x40
[ 0.586930] cpu_up+0x4f/0x100
[ 0.586930] bringup_nonboot_cpus+0x4f/0x60
[ 0.586930] smp_init+0x26/0x79
[ 0.586930] kernel_init_freeable+0x130/0x2f1
[ 0.586930] ? rest_init+0x100/0x100
[ 0.586930] kernel_init+0x17/0x150
[ 0.586930] ? rest_init+0x100/0x100
[ 0.586930] ret_from_fork+0x22/0x30
[ 0.586930] </TASK>
[ 0.586930] Modules linked in:
[ 0.586930] CR2: 0000000000000060
[ 0.586930] ---[ end trace 0000000000000000 ]---
==========
The reason is simple: by the time bringup_nonboot_cpus() gets called, we
have not registered the node_subsys bus yet, so we crash when
bus_add_device() tries to dereference bus()->p.
The following shows the order of the calls:
kernel_init_freeable
  smp_init
    bringup_nonboot_cpus
      ...
        bus_add_device()             <- we did not register node_subsys yet
  do_basic_setup
    do_initcalls
      postcore_initcall(register_node_type);
        register_node_type
          subsys_system_register
            subsys_register
              bus_register           <- register node_subsys bus
Why does setting the node online save us, then? Well, simply because
__try_online_node() backs off when the node is online, meaning we do not
end up calling register_one_node() in the first place.
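The ordering problem can be illustrated with a toy model. Everything below
is hypothetical (struct boot_model, model_try_online_node and the result
enum are made-up names); it only mimics the control flow described above,
where an already-online node is skipped and an offline node is registered
against a bus that may not exist yet.

```c
#include <stdbool.h>

#define MODEL_MAX_NODES 8

/* Hypothetical snapshot of the boot state relevant to this bug. */
struct boot_model {
	bool node_online[MODEL_MAX_NODES];
	bool node_subsys_registered; /* true once the postcore initcall ran */
};

enum online_result {
	ONLINE_SKIPPED,    /* node already online, nothing registered */
	ONLINE_REGISTERED, /* node registered and marked online */
	ONLINE_CRASH,      /* stands in for the NULL dereference */
};

/* Models the back-off in __try_online_node() as described above. */
static enum online_result model_try_online_node(struct boot_model *b, int nid)
{
	if (b->node_online[nid])
		return ONLINE_SKIPPED;

	/* Registering a node before the bus exists is the crash case. */
	if (!b->node_subsys_registered)
		return ONLINE_CRASH;

	b->node_online[nid] = true;
	return ONLINE_REGISTERED;
}
```

Marking the memoryless node online in init_cpu_to_node() and init_gi_nodes()
corresponds to setting node_online[nid] before smp_init() runs, which forces
the first branch and avoids the registration path altogether.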
This is subtle, broken and deserves a deep analysis and thought about how
to put this into shape, but for now let us have this easy fix for the
leaking memory issue.
[osalvador@suse.de: add comments]
Link: https://lkml.kernel.org/r/20220221142649.3457-1-osalvador@suse.de
Link: https://lkml.kernel.org/r/20220218224302.5282-2-osalvador@suse.de
Fixes: da4490c958ad ("mm: handle uninitialized numa nodes gracefully")
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Rafael Aquini <raquini@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Alexey Makhalov <amakhalov@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/x86/mm/numa.c | 33 ++++++++++++++++++++-------------
include/linux/mm.h | 1 -
mm/page_alloc.c | 2 +-
3 files changed, 21 insertions(+), 15 deletions(-)
--- a/arch/x86/mm/numa.c~arch-x86-mm-numa-do-not-initialize-nodes-twice
+++ a/arch/x86/mm/numa.c
@@ -738,17 +738,6 @@ void __init x86_numa_init(void)
numa_init(dummy_numa_init);
}
-static void __init init_memory_less_node(int nid)
-{
- /* Allocate and initialize node data. Memory-less node is now online.*/
- alloc_node_data(nid);
- free_area_init_memoryless_node(nid);
-
- /*
- * All zonelists will be built later in start_kernel() after per cpu
- * areas are initialized.
- */
-}
/*
* A node may exist which has one or more Generic Initiators but no CPUs and no
@@ -766,9 +755,18 @@ void __init init_gi_nodes(void)
{
int nid;
+ /*
+ * Exclude this node from
+ * bringup_nonboot_cpus
+ * cpu_up
+ * __try_online_node
+ * register_one_node
+ * because node_subsys is not initialized yet.
+ * TODO remove dependency on node_online
+ */
for_each_node_state(nid, N_GENERIC_INITIATOR)
if (!node_online(nid))
- init_memory_less_node(nid);
+ node_set_online(nid);
}
/*
@@ -798,8 +796,17 @@ void __init init_cpu_to_node(void)
if (node == NUMA_NO_NODE)
continue;
+ /*
+ * Exclude this node from
+ * bringup_nonboot_cpus
+ * cpu_up
+ * __try_online_node
+ * register_one_node
+ * because node_subsys is not initialized yet.
+ * TODO remove dependency on node_online
+ */
if (!node_online(node))
- init_memory_less_node(node);
+ node_set_online(node);
numa_set_node(cpu, node);
}
--- a/include/linux/mm.h~arch-x86-mm-numa-do-not-initialize-nodes-twice
+++ a/include/linux/mm.h
@@ -2449,7 +2449,6 @@ static inline spinlock_t *pud_lock(struc
}
extern void __init pagecache_init(void);
-extern void __init free_area_init_memoryless_node(int nid);
extern void free_initmem(void);
/*
--- a/mm/page_alloc.c~arch-x86-mm-numa-do-not-initialize-nodes-twice
+++ a/mm/page_alloc.c
@@ -7626,7 +7626,7 @@ static void __init free_area_init_node(i
free_area_init_core(pgdat);
}
-void __init free_area_init_memoryless_node(int nid)
+static void __init free_area_init_memoryless_node(int nid)
{
free_area_init_node(nid);
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 106/227] mm: count time in drain_all_pages during direct reclaim as memory pressure
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: timmurray, shakeelb, roman.gushchin, pmladek, peterz, minchan,
mhocko, hannes, surenb, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Suren Baghdasaryan <surenb@google.com>
Subject: mm: count time in drain_all_pages during direct reclaim as memory pressure
When page allocation in direct reclaim path fails, the system will make
one attempt to shrink per-cpu page lists and free pages from high alloc
reserves. Draining per-cpu pages into buddy allocator can be a very slow
operation because it's done using workqueues and the task in direct
reclaim waits for all of them to finish before proceeding. Currently this
time is not accounted as psi memory stall.
While testing mobile devices under extreme memory pressure, when
allocations are failing during direct reclaim, we noticed that psi events
which would be expected in such conditions were not triggered. After
profiling these cases it was determined that the reason for missing psi
events was that a big chunk of time spent in direct reclaim is not
accounted as memory stall, therefore psi would not reach the levels at
which an event is generated. Further investigation revealed that the bulk
of that unaccounted time was spent inside drain_all_pages call.
A typical captured case when drain_all_pages path gets activated:
__alloc_pages_slowpath took 44.644.613ns
    __perform_reclaim took 751.668ns (1.7%)
    drain_all_pages took 43.887.167ns (98.3%)
PSI in this case records the time spent in __perform_reclaim but ignores
drain_all_pages, IOW it misses 98.3% of the time spent in
__alloc_pages_slowpath.
Annotate __alloc_pages_direct_reclaim in its entirety so that delays from
handling page allocation failure in the direct reclaim path are accounted
as memory stall.
Link: https://lkml.kernel.org/r/20220223194812.1299646-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: Tim Murray <timmurray@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
--- a/mm/page_alloc.c~mm-count-time-in-drain_all_pages-during-direct-reclaim-as-memory-pressure
+++ a/mm/page_alloc.c
@@ -4554,13 +4554,12 @@ __perform_reclaim(gfp_t gfp_mask, unsign
const struct alloc_context *ac)
{
unsigned int noreclaim_flag;
- unsigned long pflags, progress;
+ unsigned long progress;
cond_resched();
/* We now go into synchronous reclaim */
cpuset_memory_pressure_bump();
- psi_memstall_enter(&pflags);
fs_reclaim_acquire(gfp_mask);
noreclaim_flag = memalloc_noreclaim_save();
@@ -4569,7 +4568,6 @@ __perform_reclaim(gfp_t gfp_mask, unsign
memalloc_noreclaim_restore(noreclaim_flag);
fs_reclaim_release(gfp_mask);
- psi_memstall_leave(&pflags);
cond_resched();
@@ -4583,11 +4581,13 @@ __alloc_pages_direct_reclaim(gfp_t gfp_m
unsigned long *did_some_progress)
{
struct page *page = NULL;
+ unsigned long pflags;
bool drained = false;
+ psi_memstall_enter(&pflags);
*did_some_progress = __perform_reclaim(gfp_mask, order, ac);
if (unlikely(!(*did_some_progress)))
- return NULL;
+ goto out;
retry:
page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
@@ -4603,6 +4603,8 @@ retry:
drained = true;
goto retry;
}
+out:
+ psi_memstall_leave(&pflags);
return page;
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
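The effect of widening the annotated region in patch 106 can be sketched in userspace C. This is only a model under stated assumptions: the fake clock, the stall counter and every `model_*` name are hypothetical stand-ins, not kernel APIs, and the two durations are the figures quoted in the commit message.

```c
#include <stdint.h>

/* Hypothetical stand-ins: a fake clock and a stall counter, in nanoseconds. */
static uint64_t fake_now_ns;
static uint64_t stall_total_ns;

/* Durations taken from the commit message (dots there are thousands separators). */
enum { RECLAIM_NS = 751668, DRAIN_NS = 43887167 };

/* Model of psi_memstall_enter(): remember when the stall started. */
static void model_memstall_enter(uint64_t *start)
{
    *start = fake_now_ns;
}

/* Model of psi_memstall_leave(): account the elapsed time as memory stall. */
static void model_memstall_leave(const uint64_t *start)
{
    stall_total_ns += fake_now_ns - *start;
}

/* Model of __perform_reclaim(); 'annotate' selects the pre-patch behaviour. */
static void model_perform_reclaim(int annotate)
{
    uint64_t start = 0;

    if (annotate)
        model_memstall_enter(&start);
    fake_now_ns += RECLAIM_NS;
    if (annotate)
        model_memstall_leave(&start);
}

/* Model of drain_all_pages(): slow, and never annotated on its own. */
static void model_drain_all_pages(void)
{
    fake_now_ns += DRAIN_NS;
}

/* Pre-patch scope: only the reclaim step is counted as stall. */
uint64_t model_old_scope(void)
{
    fake_now_ns = 0;
    stall_total_ns = 0;
    model_perform_reclaim(1);
    model_drain_all_pages();
    return stall_total_ns;
}

/* Post-patch scope: the whole direct-reclaim path is counted as stall. */
uint64_t model_new_scope(void)
{
    uint64_t start;

    fake_now_ns = 0;
    stall_total_ns = 0;
    model_memstall_enter(&start);
    model_perform_reclaim(0);
    model_drain_all_pages();
    model_memstall_leave(&start);
    return stall_total_ns;
}
```

In this model, model_old_scope() accounts only the reclaim time, while model_new_scope() accounts the reclaim and drain time together, which is the behaviour the patch is after.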
* [patch 107/227] mm/page_alloc: call check_new_pages() while zone spinlock is not held
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:43 UTC (permalink / raw)
To: weixugc, vbabka, shakeelb, rientjes, mhocko, mgorman, hughd,
gthelen, edumazet, akpm, patches, linux-mm, mm-commits, torvalds,
akpm
From: Eric Dumazet <edumazet@google.com>
Subject: mm/page_alloc: call check_new_pages() while zone spinlock is not held
For high order pages not using pcp, rmqueue() is currently calling the
costly check_new_pages() while zone spinlock is held, and hard irqs
masked.
This is not needed; we can release the spinlock sooner to reduce zone
spinlock contention.
Note that after this patch, we call __mod_zone_freepage_state() before
deciding to leak the page because it is in a bad state.
Link: https://lkml.kernel.org/r/20220304170215.1868106-1-eric.dumazet@gmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-call-check_new_pages-while-zone-spinlock-is-not-held
+++ a/mm/page_alloc.c
@@ -3665,10 +3665,10 @@ struct page *rmqueue(struct zone *prefer
* allocate greater than order-1 page units with __GFP_NOFAIL.
*/
WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
- spin_lock_irqsave(&zone->lock, flags);
do {
page = NULL;
+ spin_lock_irqsave(&zone->lock, flags);
/*
* order-0 request can reach here when the pcplist is skipped
* due to non-CMA allocation context. HIGHATOMIC area is
@@ -3680,15 +3680,15 @@ struct page *rmqueue(struct zone *prefer
if (page)
trace_mm_page_alloc_zone_locked(page, order, migratetype);
}
- if (!page)
+ if (!page) {
page = __rmqueue(zone, order, migratetype, alloc_flags);
- } while (page && check_new_pages(page, order));
- if (!page)
- goto failed;
-
- __mod_zone_freepage_state(zone, -(1 << order),
- get_pcppage_migratetype(page));
- spin_unlock_irqrestore(&zone->lock, flags);
+ if (!page)
+ goto failed;
+ }
+ __mod_zone_freepage_state(zone, -(1 << order),
+ get_pcppage_migratetype(page));
+ spin_unlock_irqrestore(&zone->lock, flags);
+ } while (check_new_pages(page, order));
__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
zone_statistics(preferred_zone, zone, 1);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
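The reordered loop in patch 107 can be illustrated with a small userspace model. It is a sketch, not the kernel code: a boolean stands in for zone->lock, an array used as a stack stands in for the free lists, and all `model_*` names are hypothetical. It shows the two points made in the commit message: the check runs with the lock released, and the free-page accounting is updated before a bad page is discarded.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_page {
    bool bad;                     /* true if the page would fail its check */
};

struct model_zone {
    bool locked;                  /* stands in for zone->lock */
    struct model_page *free;      /* free pages, used as a stack */
    size_t nr_free;               /* entries left in 'free' */
    long free_count;              /* stands in for the free-page counter */
    unsigned int checks_under_lock; /* how often a check ran with the lock held */
};

/* Model of check_new_pages(): returns true for a bad page. */
static bool model_check_new_page(struct model_zone *z,
                                 const struct model_page *p)
{
    if (z->locked)
        z->checks_under_lock++;
    return p->bad;
}

/* Model of the post-patch rmqueue() loop. */
struct model_page *model_rmqueue(struct model_zone *z)
{
    struct model_page *page;

    do {
        z->locked = true;         /* lock is now taken inside the loop */
        if (z->nr_free == 0) {
            z->locked = false;
            return NULL;          /* nothing left to hand out */
        }
        page = &z->free[--z->nr_free];
        z->free_count--;          /* accounting happens before the check */
        z->locked = false;        /* lock dropped before the costly check */
    } while (model_check_new_page(z, page));

    return page;
}
```

With a stack holding one good page below two bad ones, model_rmqueue() skips both bad pages, returns the good one, and leaves checks_under_lock at zero; free_count has dropped by three because the bad pages were accounted before being leaked.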
* [patch 108/227] mm/page_alloc: check high-order pages for corruption during PCP operations
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: weixugc, vbabka, shakeelb, rientjes, mhocko, hughd, gthelen,
edumazet, mgorman, akpm, patches, linux-mm, mm-commits, torvalds,
akpm
From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm/page_alloc: check high-order pages for corruption during PCP operations
Eric Dumazet pointed out that commit 44042b449872 ("mm/page_alloc: allow
high-order pages to be stored on the per-cpu lists") only checks the head
page during PCP refill and allocation operations. This was an oversight
and all pages should be checked. This will incur a small performance
penalty but it's necessary for correctness.
Link: https://lkml.kernel.org/r/20220310092456.GJ15701@techsingularity.net
Fixes: 44042b449872 ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 46 +++++++++++++++++++++++-----------------------
1 file changed, 23 insertions(+), 23 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-check-high-order-pages-for-corruption-during-pcp-operations
+++ a/mm/page_alloc.c
@@ -2291,23 +2291,36 @@ static inline int check_new_page(struct
return 1;
}
+static bool check_new_pages(struct page *page, unsigned int order)
+{
+ int i;
+ for (i = 0; i < (1 << order); i++) {
+ struct page *p = page + i;
+
+ if (unlikely(check_new_page(p)))
+ return true;
+ }
+
+ return false;
+}
+
#ifdef CONFIG_DEBUG_VM
/*
* With DEBUG_VM enabled, order-0 pages are checked for expected state when
* being allocated from pcp lists. With debug_pagealloc also enabled, they are
* also checked when pcp lists are refilled from the free lists.
*/
-static inline bool check_pcp_refill(struct page *page)
+static inline bool check_pcp_refill(struct page *page, unsigned int order)
{
if (debug_pagealloc_enabled_static())
- return check_new_page(page);
+ return check_new_pages(page, order);
else
return false;
}
-static inline bool check_new_pcp(struct page *page)
+static inline bool check_new_pcp(struct page *page, unsigned int order)
{
- return check_new_page(page);
+ return check_new_pages(page, order);
}
#else
/*
@@ -2315,32 +2328,19 @@ static inline bool check_new_pcp(struct
* when pcp lists are being refilled from the free lists. With debug_pagealloc
* enabled, they are also checked when being allocated from the pcp lists.
*/
-static inline bool check_pcp_refill(struct page *page)
+static inline bool check_pcp_refill(struct page *page, unsigned int order)
{
- return check_new_page(page);
+ return check_new_pages(page, order);
}
-static inline bool check_new_pcp(struct page *page)
+static inline bool check_new_pcp(struct page *page, unsigned int order)
{
if (debug_pagealloc_enabled_static())
- return check_new_page(page);
+ return check_new_pages(page, order);
else
return false;
}
#endif /* CONFIG_DEBUG_VM */
-static bool check_new_pages(struct page *page, unsigned int order)
-{
- int i;
- for (i = 0; i < (1 << order); i++) {
- struct page *p = page + i;
-
- if (unlikely(check_new_page(p)))
- return true;
- }
-
- return false;
-}
-
inline void post_alloc_hook(struct page *page, unsigned int order,
gfp_t gfp_flags)
{
@@ -2982,7 +2982,7 @@ static int rmqueue_bulk(struct zone *zon
if (unlikely(page == NULL))
break;
- if (unlikely(check_pcp_refill(page)))
+ if (unlikely(check_pcp_refill(page, order)))
continue;
/*
@@ -3600,7 +3600,7 @@ struct page *__rmqueue_pcplist(struct zo
page = list_first_entry(list, struct page, lru);
list_del(&page->lru);
pcp->count -= 1 << order;
- } while (check_new_pcp(page));
+ } while (check_new_pcp(page, order));
return page;
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
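The difference between checking only the head page and checking every page of a high-order block (patch 108) can be shown with a minimal userspace sketch. The `model_*` names and the `bad` flag are hypothetical; only the loop shape follows check_new_pages() from the diff.

```c
#include <stdbool.h>

struct model_page {
    bool bad;                     /* true if the page is in an unexpected state */
};

/* Model of check_new_page(): returns true for a bad page. */
static bool model_check_new_page(const struct model_page *p)
{
    return p->bad;
}

/* Pre-patch PCP behaviour: only the head page is looked at. */
bool model_check_head_only(const struct model_page *page, unsigned int order)
{
    (void)order;                  /* the order was not even passed down */
    return model_check_new_page(page);
}

/* Post-patch behaviour: all 1 << order pages are looked at. */
bool model_check_new_pages(const struct model_page *page, unsigned int order)
{
    unsigned int i;

    for (i = 0; i < (1u << order); i++) {
        if (model_check_new_page(page + i))
            return true;
    }

    return false;
}
```

For an order-2 block whose last tail page is bad, model_check_head_only() reports nothing while model_check_new_pages() flags the block.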
* [patch 109/227] mm/memory-failure.c: remove obsolete comment
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: shy828301, osalvador, mike.kravetz, linmiaohe, anshuman.khandual,
naoya.horiguchi, akpm, patches, linux-mm, mm-commits, torvalds,
akpm
From: Naoya Horiguchi <naoya.horiguchi@nec.com>
Subject: mm/memory-failure.c: remove obsolete comment
With the introduction of mf_mutex, most of the memory error handling
process is mutually exclusive, so the in-line comment about the subtlety of
double-checking PageHWPoison is no longer correct. Remove it.
Link: https://lkml.kernel.org/r/20220125025601.3054511-1-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory-failure.c | 6 ------
1 file changed, 6 deletions(-)
--- a/mm/memory-failure.c~mm-hwpoison-remove-obsolete-comment
+++ a/mm/memory-failure.c
@@ -2150,12 +2150,6 @@ static int __soft_offline_page(struct pa
.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
};
- /*
- * Check PageHWPoison again inside page lock because PageHWPoison
- * is set by memory_failure() outside page lock. Note that
- * memory_failure() also double-checks PageHWPoison inside page lock,
- * so there's no race between soft_offline_page() and memory_failure().
- */
lock_page(page);
if (!PageHuge(page))
wait_on_page_writeback(page);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 110/227] mm/hwpoison: fix error page recovered but reported "not recovered"
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: youquan.song, tony.luck, naoya.horiguchi, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: Naoya Horiguchi <naoya.horiguchi@nec.com>
Subject: mm/hwpoison: fix error page recovered but reported "not recovered"
When an uncorrected memory error is consumed there is a race between the
CMCI from the memory controller reporting an uncorrected error with a UCNA
signature, and the core reporting an SRAR signature machine check when
the data is about to be consumed.
If the CMCI wins that race, the page is marked poisoned when
uc_decode_notifier() calls memory_failure() and the machine check
processing code finds the page already poisoned. It calls
kill_accessing_process() to make sure a SIGBUS is sent, but returns the
wrong error code.
Console log looks like this:
[34775.674296] mce: Uncorrected hardware memory error in user-access at 3710b3400
[34775.675413] Memory failure: 0x3710b3: recovery action for dirty LRU page: Recovered
[34775.690310] Memory failure: 0x3710b3: already hardware poisoned
[34775.696247] Memory failure: 0x3710b3: Sending SIGBUS to einj_mem_uc:361438 due to hardware memory corruption
[34775.706072] mce: Memory error not recovered
kill_accessing_process() is supposed to return -EHWPOISON to notify that
SIGBUS has already been sent to the process and kill_me_maybe() doesn't have
to send it again. The current code fails to do this, so fix it to work as
intended. This change avoids the noise message "Memory error not recovered"
and skips duplicate SIGBUSs.
[tony.luck@intel.com: reword some parts of commit message]
Link: https://lkml.kernel.org/r/20220113231117.1021405-1-naoya.horiguchi@linux.dev
Fixes: a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: Youquan Song <youquan.song@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory-failure.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/memory-failure.c~mm-hwpoison-fix-error-page-recovered-but-reported-not-recovered
+++ a/mm/memory-failure.c
@@ -707,8 +707,10 @@ static int kill_accessing_process(struct
(void *)&priv);
if (ret == 1 && priv.tk.addr)
kill_proc(&priv.tk, pfn, flags);
+ else
+ ret = 0;
mmap_read_unlock(p->mm);
- return ret ? -EFAULT : -EHWPOISON;
+ return ret > 0 ? -EHWPOISON : -EFAULT;
}
static const char *action_name[] = {
_
^ permalink raw reply [flat|nested] 786+ messages in thread
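The return-code change in patch 110 is easy to misread, so here is a userspace sketch of the old and new mappings. It assumes, as the `ret == 1` test in the diff suggests, that the page-table walk yields 1 when it found the poisoned mapping, 0 when it did not, and a negative value on error. The `model_*` names and the `MODEL_E*` constants are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>

/* Model error numbers; stand-ins for EFAULT and EHWPOISON. */
#define MODEL_EFAULT     14
#define MODEL_EHWPOISON 133

/* Pre-patch mapping: any nonzero walk result, including "found", is -EFAULT. */
int model_old_return(int walk_ret)
{
    return walk_ret ? -MODEL_EFAULT : -MODEL_EHWPOISON;
}

/*
 * Post-patch mapping: -EHWPOISON only when the mapping was found and an
 * address is known, which is exactly the case where SIGBUS was sent.
 */
int model_new_return(int walk_ret, unsigned long addr, bool *sigbus_sent)
{
    *sigbus_sent = false;

    if (walk_ret == 1 && addr)
        *sigbus_sent = true;      /* kill_proc() would run here */
    else
        walk_ret = 0;

    return walk_ret > 0 ? -MODEL_EHWPOISON : -MODEL_EFAULT;
}
```

In the model, a successful walk (1 with a nonzero address) gives -EFAULT under the old mapping even though the signal went out, and -EHWPOISON under the new one; every other combination now gives -EFAULT with no signal sent.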
* [patch 111/227] mm: invalidate hwpoison page cache page in fault path
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: willy, stable, osalvador, naoya.horiguchi, mgorman, linmiaohe,
jhubbard, hannes, riel, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Rik van Riel <riel@surriel.com>
Subject: mm: invalidate hwpoison page cache page in fault path
Sometimes the page offlining code can leave behind a hwpoisoned clean page
cache page. This can lead to programs being killed over and over and over
again as they fault in the hwpoisoned page, get killed, and then get
re-spawned by whatever wanted to run them.
This is particularly embarrassing when the page was offlined due to having
too many corrected memory errors. Now we are killing tasks due to them
trying to access memory that probably isn't even corrupted.
This problem can be avoided by invalidating the page from the page fault
handler, which already has a branch for dealing with these kinds of pages.
With this patch we simply pretend the page fault was successful if the
page was invalidated, return to userspace, incur another page fault, read
in the file from disk (to a new memory page), and then everything works
again.
Link: https://lkml.kernel.org/r/20220212213740.423efcea@imladris.surriel.com
Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/mm/memory.c~mm-clean-up-hwpoison-page-cache-page-in-fault-path
+++ a/mm/memory.c
@@ -3877,11 +3877,16 @@ static vm_fault_t __do_fault(struct vm_f
return ret;
if (unlikely(PageHWPoison(vmf->page))) {
- if (ret & VM_FAULT_LOCKED)
+ vm_fault_t poisonret = VM_FAULT_HWPOISON;
+ if (ret & VM_FAULT_LOCKED) {
+ /* Retry if a clean page was removed from the cache. */
+ if (invalidate_inode_page(vmf->page))
+ poisonret = 0;
unlock_page(vmf->page);
+ }
put_page(vmf->page);
vmf->page = NULL;
- return VM_FAULT_HWPOISON;
+ return poisonret;
}
if (unlikely(!(ret & VM_FAULT_LOCKED)))
_
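The decision made in the hwpoison branch above can be modeled in user space. This is a minimal sketch, not kernel code: `struct sketch_page`, `sketch_invalidate()`, `sketch_hwpoison_result()` and the `SKETCH_FAULT_*` values are hypothetical stand-ins, and the real invalidate_inode_page() checks more conditions than the single dirty flag assumed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in fault codes for this sketch; not the kernel's vm_fault_t bits. */
enum {
    SKETCH_FAULT_HWPOISON = 0x1,
    SKETCH_FAULT_LOCKED   = 0x2,
};

/* Hypothetical, heavily reduced model of a page cache page. */
struct sketch_page {
    bool hwpoison;  /* page was marked as hardware-poisoned */
    bool dirty;     /* page holds data not yet written back */
    bool in_cache;  /* page is still present in the page cache */
};

/* Models invalidate_inode_page(): only a clean, cached page is dropped. */
static bool sketch_invalidate(struct sketch_page *page)
{
    if (page->dirty || !page->in_cache)
        return false;
    page->in_cache = false;
    return true;
}

/* Models the hwpoison branch; a result of 0 means "let the fault retry". */
int sketch_hwpoison_result(struct sketch_page *page, int ret)
{
    int poisonret = SKETCH_FAULT_HWPOISON;

    assert(page && page->hwpoison);
    /* Retry if a clean page was removed from the cache. */
    if ((ret & SKETCH_FAULT_LOCKED) && sketch_invalidate(page))
        poisonret = 0;
    return poisonret;
}
```

In this model a clean, locked page yields 0 and leaves the cache, so the next fault would read a fresh copy from disk. A dirty page, or one that was not returned locked, still yields the hwpoison result.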
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 112/227] mm/memory-failure.c: minor clean up for memory_failure_dev_pagemap
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: naoya.horiguchi, linmiaohe, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory-failure.c: minor clean up for memory_failure_dev_pagemap
Patch series "A few cleanup and fixup patches for memory failure", v3.
This series contains a few patches that simplify the code logic, remove an
unneeded variable and remove an obsolete comment. It also fixes the race
with a changing page more robustly in memory_failure. More details can be
found in the respective changelogs.
This patch (of 8):
The flags always have MF_ACTION_REQUIRED and MF_MUST_KILL set at this
point, so we do not need to check these flags again.
Link: https://lkml.kernel.org/r/20220218090118.1105-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220218090118.1105-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory-failure.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/memory-failure.c~mm-memory-failurec-minor-clean-up-for-memory_failure_dev_pagemap
+++ a/mm/memory-failure.c
@@ -1640,7 +1640,7 @@ static int memory_failure_dev_pagemap(un
* SIGBUS (i.e. MF_MUST_KILL)
*/
flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
- collect_procs(page, &tokill, flags & MF_ACTION_REQUIRED);
+ collect_procs(page, &tokill, true);
list_for_each_entry(tk, &tokill, nd)
if (tk->size_shift)
@@ -1655,7 +1655,7 @@ static int memory_failure_dev_pagemap(un
start = (page->index << PAGE_SHIFT) & ~(size - 1);
unmap_mapping_range(page->mapping, start, size, 0);
}
- kill_procs(&tokill, flags & MF_MUST_KILL, false, pfn, flags);
+ kill_procs(&tokill, true, false, pfn, flags);
rc = 0;
unlock:
dax_unlock_page(page, cookie);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 113/227] mm/memory-failure.c: catch unexpected -EFAULT from vma_address()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: naoya.horiguchi, linmiaohe, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory-failure.c: catch unexpected -EFAULT from vma_address()
It's unexpected to walk the page table when vma_address() returns -EFAULT.
But dev_pagemap_mapping_shift() is called only when the vma associated with
the error page has already been found in collect_procs_{file,anon}, so
vma_address() should not return -EFAULT unless there is a bug, as Naoya
pointed out. We can use VM_BUG_ON_VMA() to catch this bug here.
Link: https://lkml.kernel.org/r/20220218090118.1105-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory-failure.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/memory-failure.c~mm-memory-failurec-catch-unexpected-efault-from-vma_address
+++ a/mm/memory-failure.c
@@ -315,6 +315,7 @@ static unsigned long dev_pagemap_mapping
pmd_t *pmd;
pte_t *pte;
+ VM_BUG_ON_VMA(address == -EFAULT, vma);
pgd = pgd_offset(vma->vm_mm, address);
if (!pgd_present(*pgd))
return 0;
_
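The check above compares an unsigned long against a negative errno value. The following user-space sketch shows why that comparison works in C. `sketch_vma_address()` and `sketch_is_efault()` are hypothetical helpers written for illustration only; they assume, as the changelog implies, that the failure value is -EFAULT carried in an unsigned long.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for an address lookup that reports failure
 * in-band: -EFAULT converted to unsigned long becomes a very large
 * value that no valid user address is assumed to equal.
 */
unsigned long sketch_vma_address(bool page_in_vma, unsigned long addr)
{
    return page_in_vma ? addr : (unsigned long)-EFAULT;
}

/*
 * The patch writes "address == -EFAULT" without a cast; the usual
 * arithmetic conversions turn the int into an unsigned long first.
 * The cast here only makes that conversion explicit.
 */
bool sketch_is_efault(unsigned long address)
{
    return address == (unsigned long)-EFAULT;
}
```

A caller that must never see the failure value can therefore assert on it before using the address, which is the role VM_BUG_ON_VMA() plays in the patch.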
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 114/227] mm/memory-failure.c: rework the signaling logic in kill_proc
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: naoya.horiguchi, linmiaohe, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory-failure.c: rework the signaling logic in kill_proc
The BUS_MCEERR_AR code is sent only when MF_ACTION_REQUIRED is set and the
target is current. Rework the code to make this clear.
Link: https://lkml.kernel.org/r/20220218090118.1105-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory-failure.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
--- a/mm/memory-failure.c~mm-memory-failurec-rework-the-signaling-logic-in-kill_proc
+++ a/mm/memory-failure.c
@@ -258,16 +258,13 @@ static int kill_proc(struct to_kill *tk,
pr_err("Memory failure: %#lx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
pfn, t->comm, t->pid);
- if (flags & MF_ACTION_REQUIRED) {
- if (t == current)
- ret = force_sig_mceerr(BUS_MCEERR_AR,
- (void __user *)tk->addr, addr_lsb);
- else
- /* Signal other processes sharing the page if they have PF_MCE_EARLY set. */
- ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr,
- addr_lsb, t);
- } else {
+ if ((flags & MF_ACTION_REQUIRED) && (t == current))
+ ret = force_sig_mceerr(BUS_MCEERR_AR,
+ (void __user *)tk->addr, addr_lsb);
+ else
/*
+ * Signal other processes sharing the page if they have
+ * PF_MCE_EARLY set.
* Don't use force here, it's convenient if the signal
* can be temporarily blocked.
* This could cause a loop when the user sets SIGBUS
@@ -275,7 +272,6 @@ static int kill_proc(struct to_kill *tk,
*/
ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr,
addr_lsb, t); /* synchronous? */
- }
if (ret < 0)
pr_info("Memory failure: Error sending signal to %s:%d: %d\n",
t->comm, t->pid, ret);
_
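The rework above should not change which signal code a task receives. The sketch below models both shapes of the branch in user space so they can be compared. `SKETCH_MF_ACTION_REQUIRED`, `enum sketch_code` and both functions are hypothetical names for illustration; the actual delivery calls (force_sig_mceerr() and send_sig_mceerr()) are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MF_ACTION_REQUIRED 0x2   /* stand-in flag bit for this sketch */

/* AR = "action required", AO = "action optional". */
enum sketch_code { SKETCH_AR, SKETCH_AO };

/* Shape of the logic before the patch: nested branches. */
enum sketch_code sketch_code_before(int flags, bool target_is_current)
{
    if (flags & SKETCH_MF_ACTION_REQUIRED) {
        if (target_is_current)
            return SKETCH_AR;
        return SKETCH_AO;
    }
    return SKETCH_AO;
}

/* Shape after the patch: one combined condition, one fallback. */
enum sketch_code sketch_code_after(int flags, bool target_is_current)
{
    if ((flags & SKETCH_MF_ACTION_REQUIRED) && target_is_current)
        return SKETCH_AR;
    return SKETCH_AO;
}
```

Both functions agree on all four input combinations, and only the combination "flag set and target is current" selects the AR code.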
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 115/227] mm/memory-failure.c: fix race with changing page more robustly
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: naoya.horiguchi, linmiaohe, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory-failure.c: fix race with changing page more robustly
We are only meant to deal with a non-compound page after we split the THP
in memory_failure. However, the page could have changed compound pages
during a race window. If this happens, we retry once in the hope of
handling the page in the next round. Also remove the unneeded orig_head:
it is always equal to hpage, so we can use hpage directly and drop the
redundant variable.
Link: https://lkml.kernel.org/r/20220218090118.1105-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory-failure.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
--- a/mm/memory-failure.c~mm-memory-failurec-fix-race-with-changing-page-more-robustly
+++ a/mm/memory-failure.c
@@ -1686,7 +1686,6 @@ int memory_failure(unsigned long pfn, in
{
struct page *p;
struct page *hpage;
- struct page *orig_head;
struct dev_pagemap *pgmap;
int res = 0;
unsigned long page_flags;
@@ -1732,7 +1731,7 @@ try_again:
goto unlock_mutex;
}
- orig_head = hpage = compound_head(p);
+ hpage = compound_head(p);
num_poisoned_pages_inc();
/*
@@ -1813,10 +1812,21 @@ try_again:
lock_page(p);
/*
- * The page could have changed compound pages during the locking.
- * If this happens just bail out.
+ * We're only intended to deal with the non-Compound page here.
+ * However, the page could have changed compound pages due to
+ * race window. If this happens, we could try again to hopefully
+ * handle the page next round.
*/
- if (PageCompound(p) && compound_head(p) != orig_head) {
+ if (PageCompound(p)) {
+ if (retry) {
+ if (TestClearPageHWPoison(p))
+ num_poisoned_pages_dec();
+ unlock_page(p);
+ put_page(p);
+ flags &= ~MF_COUNT_INCREASED;
+ retry = false;
+ goto try_again;
+ }
action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
res = -EBUSY;
goto unlock_page;
_
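The retry-once control flow added above can be sketched in isolation. This is a user-space model under stated assumptions: `sketch_memory_failure()` and its `compound_seen` input are hypothetical, and the cleanup the patch performs before retrying (clearing the poison flag, unlocking and releasing the page) is reduced to a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * compound_seen[i] says whether the page looks compound on attempt i.
 * At most two attempts are made, so two entries are enough.
 * Returns 0 on success or -EBUSY if the page is compound both times;
 * *attempts reports how many rounds were run.
 */
int sketch_memory_failure(const bool compound_seen[2], int *attempts)
{
    bool retry = true;
    int attempt = 0;

    assert(compound_seen && attempts);
try_again:
    /* ...take a reference, split the huge page, lock the page... */
    if (compound_seen[attempt++]) {
        if (retry) {
            /* Undo this round's side effects, then try exactly once more. */
            retry = false;
            goto try_again;
        }
        *attempts = attempt;
        return -EBUSY;
    }
    *attempts = attempt;
    return 0;
}
```

Because the retry flag is cleared before jumping back, the loop is bounded: a page that loses the race twice is reported as busy rather than retried forever.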
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 116/227] mm/memory-failure.c: remove PageSlab check in hwpoison_filter_dev
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: naoya.horiguchi, linmiaohe, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory-failure.c: remove PageSlab check in hwpoison_filter_dev
Since commit 03e5ac2fc3bf ("mm: fix crash when using XFS on loopback"),
page_mapping() can handle slab pages. So remove the unnecessary PageSlab
check and the obsolete comment.
Link: https://lkml.kernel.org/r/20220218090118.1105-6-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory-failure.c | 6 ------
1 file changed, 6 deletions(-)
--- a/mm/memory-failure.c~mm-memory-failurec-remove-pageslab-check-in-hwpoison_filter_dev
+++ a/mm/memory-failure.c
@@ -130,12 +130,6 @@ static int hwpoison_filter_dev(struct pa
hwpoison_filter_dev_minor == ~0U)
return 0;
- /*
- * page_mapping() does not accept slab pages.
- */
- if (PageSlab(p))
- return -EINVAL;
-
mapping = page_mapping(p);
if (mapping == NULL || mapping->host == NULL)
return -EINVAL;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 117/227] mm/memory-failure.c: rework the try_to_unmap logic in hwpoison_user_mappings()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: naoya.horiguchi, linmiaohe, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory-failure.c: rework the try_to_unmap logic in hwpoison_user_mappings()
try_to_unmap needs the semaphore taken in write mode only for hugetlb pages
in shared mappings. Rework the code to make this clear.
Link: https://lkml.kernel.org/r/20220218090118.1105-7-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory-failure.c | 34 +++++++++++++++-------------------
1 file changed, 15 insertions(+), 19 deletions(-)
--- a/mm/memory-failure.c~mm-memory-failurec-rework-the-try_to_unmap-logic-in-hwpoison_user_mappings
+++ a/mm/memory-failure.c
@@ -1404,26 +1404,22 @@ static bool hwpoison_user_mappings(struc
if (kill)
collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
- if (!PageHuge(hpage)) {
- try_to_unmap(hpage, ttu);
+ if (PageHuge(hpage) && !PageAnon(hpage)) {
+ /*
+ * For hugetlb pages in shared mappings, try_to_unmap
+ * could potentially call huge_pmd_unshare. Because of
+ * this, take semaphore in write mode here and set
+ * TTU_RMAP_LOCKED to indicate we have taken the lock
+ * at this higher level.
+ */
+ mapping = hugetlb_page_mapping_lock_write(hpage);
+ if (mapping) {
+ try_to_unmap(hpage, ttu|TTU_RMAP_LOCKED);
+ i_mmap_unlock_write(mapping);
+ } else
+ pr_info("Memory failure: %#lx: could not lock mapping for mapped huge page\n", pfn);
} else {
- if (!PageAnon(hpage)) {
- /*
- * For hugetlb pages in shared mappings, try_to_unmap
- * could potentially call huge_pmd_unshare. Because of
- * this, take semaphore in write mode here and set
- * TTU_RMAP_LOCKED to indicate we have taken the lock
- * at this higher level.
- */
- mapping = hugetlb_page_mapping_lock_write(hpage);
- if (mapping) {
- try_to_unmap(hpage, ttu|TTU_RMAP_LOCKED);
- i_mmap_unlock_write(mapping);
- } else
- pr_info("Memory failure: %#lx: could not lock mapping for mapped huge page\n", pfn);
- } else {
- try_to_unmap(hpage, ttu);
- }
+ try_to_unmap(hpage, ttu);
}
unmap_success = !page_mapped(hpage);
_
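The branch restructuring above can be checked the same way as any boolean refactor. The sketch below models which unmap path is chosen before and after the patch. All names are hypothetical, and the case where taking the mapping lock fails (the pr_info() branch) is deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_unmap {
    SKETCH_UNMAP_PLAIN,         /* try_to_unmap() with the caller's flags */
    SKETCH_UNMAP_WRITE_LOCKED,  /* lock taken in write mode first */
};

/* Shape before the patch: test "not huge" first, then nest on "anon". */
enum sketch_unmap sketch_path_before(bool huge, bool anon)
{
    if (!huge)
        return SKETCH_UNMAP_PLAIN;
    if (!anon)
        return SKETCH_UNMAP_WRITE_LOCKED;
    return SKETCH_UNMAP_PLAIN;
}

/* Shape after the patch: name the one special case directly. */
enum sketch_unmap sketch_path_after(bool huge, bool anon)
{
    if (huge && !anon)
        return SKETCH_UNMAP_WRITE_LOCKED;
    return SKETCH_UNMAP_PLAIN;
}
```

The two functions agree for every combination of the two inputs, and the write-locked path is taken only for a huge page that is not anonymous, which matches the shared-mapping case the changelog describes.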
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 118/227] mm/memory-failure.c: remove obsolete comment in __soft_offline_page
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: linmiaohe, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory-failure.c: remove obsolete comment in __soft_offline_page
Since commit add05cecef80 ("mm: soft-offline: don't free target page in
successful page migration"), the set_migratetype_isolate logic has been
removed. Remove this now-obsolete comment.
Link: https://lkml.kernel.org/r/20220218090118.1105-8-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory-failure.c | 4 ----
1 file changed, 4 deletions(-)
--- a/mm/memory-failure.c~mm-memory-failurec-remove-obsolete-comment-in-__soft_offline_page
+++ a/mm/memory-failure.c
@@ -2167,10 +2167,6 @@ static int __soft_offline_page(struct pa
ret = invalidate_inode_page(page);
unlock_page(page);
- /*
- * RED-PEN would be better to keep it isolated here, but we
- * would need to fix isolation locking first.
- */
if (ret) {
pr_info("soft_offline: %#lx: invalidated\n", pfn);
page_handle_poison(page, false, true);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 119/227] mm/memory-failure.c: remove unnecessary PageTransTail check
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: naoya.horiguchi, linmiaohe, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory-failure.c: remove unnecessary PageTransTail check
When we reach here, we're guaranteed to have a non-compound page because
the THP has already been split. Remove this unnecessary PageTransTail check.
Link: https://lkml.kernel.org/r/20220218090118.1105-9-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory-failure.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memory-failure.c~mm-memory-failurec-remove-unnecessary-pagetranstail-check
+++ a/mm/memory-failure.c
@@ -1844,7 +1844,7 @@ try_again:
* page_lock. We need wait writeback completion for this page or it
* may trigger vfs BUG while evict inode.
*/
- if (!PageTransTail(p) && !PageLRU(p) && !PageWriteback(p))
+ if (!PageLRU(p) && !PageWriteback(p))
goto identify_page_state;
/*
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 120/227] mm/hwpoison-inject: support injecting hwpoison to free page
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: naoya.horiguchi, linmiaohe, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/hwpoison-inject: support injecting hwpoison to free page
memory_failure() can handle free buddy pages. Support injecting hwpoison
into a free page by adding an is_free_buddy_page() check when hwpoison
filter is disabled.
[akpm@linux-foundation.org: export is_free_buddy_page() to modules]
Link: https://lkml.kernel.org/r/20220218092052.3853-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
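A minimal userspace sketch of the gate changed in the hwpoison-inject.c hunk
below, assuming the three page predicates can be modelled as plain booleans.
struct page_model and both inject_passes_*() helpers are hypothetical names
used only for illustration; they are not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the page state that the check inspects. */
struct page_model {
	bool lru;        /* models PageLRU(hpage) */
	bool huge;       /* models PageHuge(p) */
	bool free_buddy; /* models is_free_buddy_page(p) */
};

/* Old gate: only LRU or hugetlb pages get past the early "return 0". */
bool inject_passes_old(const struct page_model *pg)
{
	return pg->lru || pg->huge;
}

/* New gate: a free buddy page is let through as well. */
bool inject_passes_new(const struct page_model *pg)
{
	return pg->lru || pg->huge || pg->free_buddy;
}
```

In this model a page with only free_buddy set is rejected by the old gate and
accepted by the new one; a page with none of the three properties is rejected
by both.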
mm/hwpoison-inject.c | 4 ++--
mm/page_alloc.c | 1 +
2 files changed, 3 insertions(+), 2 deletions(-)
--- a/mm/hwpoison-inject.c~mm-hwpoison-inject-support-injecting-hwpoison-to-free-page
+++ a/mm/hwpoison-inject.c
@@ -32,9 +32,9 @@ static int hwpoison_inject(void *data, u
shake_page(hpage);
/*
- * This implies unable to support non-LRU pages.
+ * This implies unable to support non-LRU pages except free page.
*/
- if (!PageLRU(hpage) && !PageHuge(p))
+ if (!PageLRU(hpage) && !PageHuge(p) && !is_free_buddy_page(p))
return 0;
/*
--- a/mm/page_alloc.c~mm-hwpoison-inject-support-injecting-hwpoison-to-free-page
+++ a/mm/page_alloc.c
@@ -9417,6 +9417,7 @@ bool is_free_buddy_page(struct page *pag
return order < MAX_ORDER;
}
+EXPORT_SYMBOL(is_free_buddy_page);
#ifdef CONFIG_MEMORY_FAILURE
/*
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 121/227] mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handler
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: tony.luck, tglx, naoya.horiguchi, mingo, linmiaohe, hpa,
dave.hansen, bp, luofei, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: luofei <luofei@unicloud.com>
Subject: mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handler
When a hwpoison page meets the filter conditions, the mce handler should
not treat that as successful memory_failure() processing. memory_failure()
should return a distinct value instead; otherwise the mce handler assumes
the error page has been identified and isolated, which may lead to calling
set_mce_nospec() to change the page attributes, etc.
Here memory_failure() returns -EOPNOTSUPP to indicate that the error event
was filtered. The mce handler should take no action in this situation, and
the hwpoison injector should treat it as success.
Link: https://lkml.kernel.org/r/20220223082135.2769649-1-luofei@unicloud.com
Signed-off-by: luofei <luofei@unicloud.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
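The return-value convention described in the changelog above can be sketched
in plain userspace C. This is only an illustration under stated assumptions:
enum mce_next_step, mce_classify() and injector_result() are hypothetical
names, and the EHWPOISON fallback value is the asm-generic Linux number, used
only when the C library does not define it.

```c
#include <errno.h>

#ifndef EHWPOISON
#define EHWPOISON 133 /* asm-generic Linux value; fallback for other libcs */
#endif

/* Hypothetical labels for what a machine-check style caller does next. */
enum mce_next_step {
	MCE_PAGE_ISOLATED, /* ret == 0: error handled, page attribute may change */
	MCE_NOTHING_TO_DO, /* -EHWPOISON or -EOPNOTSUPP: return quietly */
	MCE_NOT_RECOVERED, /* any other error */
};

/* Classify a memory_failure()-style return code for the mce-like caller. */
enum mce_next_step mce_classify(int mf_ret)
{
	if (mf_ret == 0)
		return MCE_PAGE_ISOLATED;
	if (mf_ret == -EHWPOISON || mf_ret == -EOPNOTSUPP)
		return MCE_NOTHING_TO_DO;
	return MCE_NOT_RECOVERED;
}

/* Injection interfaces report a filtered event as success. */
int injector_result(int mf_ret)
{
	return mf_ret == -EOPNOTSUPP ? 0 : mf_ret;
}
```

The point of the distinct code is visible in mce_classify(): a filtered event
no longer lands in the "page isolated" branch, while injector_result() keeps
the injection interfaces returning success for the same event.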
arch/x86/kernel/cpu/mce/core.c | 8 +++++---
drivers/base/memory.c | 2 ++
mm/hwpoison-inject.c | 3 ++-
mm/madvise.c | 2 ++
mm/memory-failure.c | 9 +++++++--
5 files changed, 18 insertions(+), 6 deletions(-)
--- a/arch/x86/kernel/cpu/mce/core.c~mm-hwpoison-avoid-the-impact-of-hwpoison_filter-return-value-on-mce-handler
+++ a/arch/x86/kernel/cpu/mce/core.c
@@ -1304,10 +1304,12 @@ static void kill_me_maybe(struct callbac
/*
* -EHWPOISON from memory_failure() means that it already sent SIGBUS
- * to the current process with the proper error info, so no need to
- * send SIGBUS here again.
+ * to the current process with the proper error info,
+ * -EOPNOTSUPP means hwpoison_filter() filtered the error event,
+ *
+ * In both cases, no further processing is required.
*/
- if (ret == -EHWPOISON)
+ if (ret == -EHWPOISON || ret == -EOPNOTSUPP)
return;
pr_err("Memory error not recovered");
--- a/drivers/base/memory.c~mm-hwpoison-avoid-the-impact-of-hwpoison_filter-return-value-on-mce-handler
+++ a/drivers/base/memory.c
@@ -555,6 +555,8 @@ static ssize_t hard_offline_page_store(s
return -EINVAL;
pfn >>= PAGE_SHIFT;
ret = memory_failure(pfn, 0);
+ if (ret == -EOPNOTSUPP)
+ ret = 0;
return ret ? ret : count;
}
--- a/mm/hwpoison-inject.c~mm-hwpoison-avoid-the-impact-of-hwpoison_filter-return-value-on-mce-handler
+++ a/mm/hwpoison-inject.c
@@ -48,7 +48,8 @@ static int hwpoison_inject(void *data, u
inject:
pr_info("Injecting memory failure at pfn %#lx\n", pfn);
- return memory_failure(pfn, 0);
+ err = memory_failure(pfn, 0);
+ return (err == -EOPNOTSUPP) ? 0 : err;
}
static int hwpoison_unpoison(void *data, u64 val)
--- a/mm/madvise.c~mm-hwpoison-avoid-the-impact-of-hwpoison_filter-return-value-on-mce-handler
+++ a/mm/madvise.c
@@ -1067,6 +1067,8 @@ static int madvise_inject_error(int beha
pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
pfn, start);
ret = memory_failure(pfn, MF_COUNT_INCREASED);
+ if (ret == -EOPNOTSUPP)
+ ret = 0;
}
if (ret)
--- a/mm/memory-failure.c~mm-hwpoison-avoid-the-impact-of-hwpoison_filter-return-value-on-mce-handler
+++ a/mm/memory-failure.c
@@ -1515,7 +1515,7 @@ static int memory_failure_hugetlb(unsign
if (TestClearPageHWPoison(head))
num_poisoned_pages_dec();
unlock_page(head);
- return 0;
+ return -EOPNOTSUPP;
}
unlock_page(head);
res = MF_FAILED;
@@ -1602,7 +1602,7 @@ static int memory_failure_dev_pagemap(un
goto out;
if (hwpoison_filter(page)) {
- rc = 0;
+ rc = -EOPNOTSUPP;
goto unlock;
}
@@ -1671,6 +1671,10 @@ static DEFINE_MUTEX(mf_mutex);
*
* Must run in process context (e.g. a work queue) with interrupts
* enabled and no spinlocks hold.
+ *
+ * Return: 0 for successfully handled the memory error,
+ * -EOPNOTSUPP for hwpoison_filter() filtered the error event,
+ * < 0(except -EOPNOTSUPP) on failure.
*/
int memory_failure(unsigned long pfn, int flags)
{
@@ -1836,6 +1840,7 @@ try_again:
num_poisoned_pages_dec();
unlock_page(p);
put_page(p);
+ res = -EOPNOTSUPP;
goto unlock_mutex;
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 122/227] mm/hwpoison: add in-use hugepage hwpoison filter judgement
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: tony.luck, tglx, naoya.horiguchi, mingo, linmiaohe, hpa,
dave.hansen, bp, luofei, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: luofei <luofei@unicloud.com>
Subject: mm/hwpoison: add in-use hugepage hwpoison filter judgement
After successfully taking a reference on the huge page, it is still
necessary to call hwpoison_filter() to make a filter judgement; otherwise
the filtered hugepage will be unmapped and the related process may be
killed.
Link: https://lkml.kernel.org/r/20220223082254.2769757-1-luofei@unicloud.com
Signed-off-by: luofei <luofei@unicloud.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
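A rough userspace model of the bail-out path added by the hunk below, assuming
the poison flag, the page reference and the global poisoned-page counter can
be represented by simple fields. struct hpage_model, num_poisoned_model and
filtered_bailout() are hypothetical names for illustration only.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state touched on the filtered path. */
struct hpage_model {
	bool hwpoison; /* models the hwpoison flag on the head page */
	int refcount;  /* models the reference taken before the lock */
};

long num_poisoned_model; /* models the global poisoned-page counter */

/*
 * Undo the side effects and report "filtered" when the filter matches;
 * otherwise leave the state alone and return 0 so handling continues.
 */
int filtered_bailout(struct hpage_model *head, bool filter_matches)
{
	if (!filter_matches)
		return 0;
	if (head->hwpoison) { /* TestClearPageHWPoison() analogue */
		head->hwpoison = false;
		num_poisoned_model--;
	}
	head->refcount--; /* put_page() analogue */
	return -EOPNOTSUPP;
}
```

The sketch only tracks bookkeeping: on a filter match the poison mark and the
extra reference are both dropped before -EOPNOTSUPP is returned, so the page
is left as it was found.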
mm/memory-failure.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/mm/memory-failure.c~mm-hwpoison-add-in-use-hugepage-hwpoison-filter-judgement
+++ a/mm/memory-failure.c
@@ -1534,6 +1534,14 @@ static int memory_failure_hugetlb(unsign
lock_page(head);
page_flags = head->flags;
+ if (hwpoison_filter(p)) {
+ if (TestClearPageHWPoison(head))
+ num_poisoned_pages_dec();
+ put_page(p);
+ res = -EOPNOTSUPP;
+ goto out;
+ }
+
/*
* TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
* simply disable it. In order to make it work properly, we need
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 123/227] mm/memory-failure.c: fix race with changing page compound again
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: tony.luck, shy828301, naoya.horiguchi, mike.kravetz, bp,
linmiaohe, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory-failure.c: fix race with changing page compound again
Patch series "A few fixup patches for memory failure", v2.
This series contains a few patches that fix the race with a changing
compound page, make non-LRU movable pages unhandlable, and so on. More
details can be found in the respective changelogs.
There is a race window: after we get the compound_head, the hugetlb page
could be freed to buddy, or even changed to another compound page, just
before we try to get the hwpoison page. Consider the race window below:
CPU 1                                   CPU 2
memory_failure_hugetlb
struct page *head = compound_head(p);
                                        hugetlb page might be freed to
                                        buddy, or even changed to another
                                        compound page.
get_hwpoison_page -- page is not what we want now...
If this race happens, just bail out. MF_MSG_DIFFERENT_PAGE_SIZE is also
introduced to record this event.
[akpm@linux-foundation.org: s@/**@/*@, per Naoya Horiguchi]
Link: https://lkml.kernel.org/r/20220312074613.4798-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220312074613.4798-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
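The "sample, lock, then re-validate" pattern that this fix relies on can be
sketched in userspace as follows. This is a simplified model, not the kernel
code: struct page_model and recheck_after_lock() are hypothetical, and the
lock itself is left out because only the comparison is being illustrated.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page model: each page records its current compound head. */
struct page_model {
	bool huge;                     /* models PageHuge(p) */
	const struct page_model *head; /* models compound_head(p) */
};

/*
 * Meant to run with the (modelled) head lock held: compare the current
 * state against the head pointer that was sampled before locking.
 */
int recheck_after_lock(const struct page_model *p,
		       const struct page_model *head_snapshot)
{
	if (!p->huge || p->head != head_snapshot)
		return -EBUSY; /* page changed underneath us: bail out */
	return 0;
}
```

If the page stopped being a huge page, or now belongs to a different compound
head than the one sampled earlier, the model returns -EBUSY, mirroring the
bail-out in the hunk below.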
include/linux/mm.h | 1 +
include/ras/ras_event.h | 1 +
mm/memory-failure.c | 12 ++++++++++++
3 files changed, 14 insertions(+)
--- a/include/linux/mm.h~mm-memory-failurec-fix-race-with-changing-page-compound-again
+++ a/include/linux/mm.h
@@ -3239,6 +3239,7 @@ enum mf_action_page_type {
MF_MSG_BUDDY,
MF_MSG_DAX,
MF_MSG_UNSPLIT_THP,
+ MF_MSG_DIFFERENT_PAGE_SIZE,
MF_MSG_UNKNOWN,
};
--- a/include/ras/ras_event.h~mm-memory-failurec-fix-race-with-changing-page-compound-again
+++ a/include/ras/ras_event.h
@@ -374,6 +374,7 @@ TRACE_EVENT(aer_event,
EM ( MF_MSG_BUDDY, "free buddy page" ) \
EM ( MF_MSG_DAX, "dax page" ) \
EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \
+ EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" ) \
EMe ( MF_MSG_UNKNOWN, "unknown page" )
/*
--- a/mm/memory-failure.c~mm-memory-failurec-fix-race-with-changing-page-compound-again
+++ a/mm/memory-failure.c
@@ -732,6 +732,7 @@ static const char * const action_page_ty
[MF_MSG_BUDDY] = "free buddy page",
[MF_MSG_DAX] = "dax page",
[MF_MSG_UNSPLIT_THP] = "unsplit thp",
+ [MF_MSG_DIFFERENT_PAGE_SIZE] = "different page size",
[MF_MSG_UNKNOWN] = "unknown page",
};
@@ -1532,6 +1533,17 @@ static int memory_failure_hugetlb(unsign
}
lock_page(head);
+
+ /*
+ * The page could have changed compound pages due to race window.
+ * If this happens just bail out.
+ */
+ if (!PageHuge(p) || compound_head(p) != head) {
+ action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);
+ res = -EBUSY;
+ goto out;
+ }
+
page_flags = head->flags;
if (hwpoison_filter(p)) {
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 124/227] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: tony.luck, shy828301, naoya.horiguchi, mike.kravetz, bp,
linmiaohe, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages
Since commit 042c4f32323b ("mm/truncate: Inline invalidate_complete_page()
into its one caller"), invalidate_inode_page() can invalidate pages in the
swap cache because the page->mapping != mapping check was removed. But
invalidate_inode_page() is not expected to deal with pages in the swap
cache. Non-LRU movable pages can reach here too, and they are not page
cache pages either. Skip both kinds of page by checking PageSwapCache and
PageLRU.
Link: https://lkml.kernel.org/r/20220312074613.4798-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
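As a small illustration of the tightened condition in the hunk below, here is
a userspace predicate, assuming the page properties can be modelled as
booleans. struct page_model and should_try_invalidate() are hypothetical names
and do not exist in the kernel.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the page properties the condition tests. */
struct page_model {
	bool huge;      /* models PageHuge(page) */
	bool lru;       /* models PageLRU(page) */
	bool swapcache; /* models PageSwapCache(page) */
};

/* True only for pages that may be handed to the invalidation attempt. */
bool should_try_invalidate(const struct page_model *pg)
{
	return !pg->huge && pg->lru && !pg->swapcache;
}
```

Under this model a swap cache page and a non-LRU movable page both skip the
invalidation attempt, while an ordinary LRU page cache page still reaches it.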
mm/memory-failure.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memory-failure.c~mm-memory-failurec-avoid-calling-invalidate_inode_page-with-unexpected-pages
+++ a/mm/memory-failure.c
@@ -2184,7 +2184,7 @@ static int __soft_offline_page(struct pa
return 0;
}
- if (!PageHuge(page))
+ if (!PageHuge(page) && PageLRU(page) && !PageSwapCache(page))
/*
* Try to invalidate first. This should work for
* non dirty unmapped page cache pages.
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 125/227] mm/memory-failure.c: make non-LRU movable pages unhandlable
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: tony.luck, shy828301, naoya.horiguchi, mike.kravetz, bp,
linmiaohe, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory-failure.c: make non-LRU movable pages unhandlable
We cannot really handle non-LRU movable pages in memory failure.
Typically they are balloon, zsmalloc, etc. pages. If we run into a base
(4K) non-LRU movable page, we could get as far as identify_page_state(),
where it should not fall into any category except me_unknown.
Non-LRU compound movable pages could be mistaken for transhuge pages, but
splitting non-LRU movable pages with split_huge_page_to_list in
memory_failure is unexpected. So simply make non-LRU movable pages
unhandlable to avoid these possible nasty cases.
Link: https://lkml.kernel.org/r/20220312074613.4798-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Suggested-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
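The flag-dependent check introduced by the hunk below can be modelled in
userspace like this. It is a sketch under assumptions: struct page_model and
handlable() are hypothetical, and MODEL_MF_SOFT_OFFLINE is an illustrative bit
standing in for MF_SOFT_OFFLINE rather than a value taken from the patch.

```c
#include <stdbool.h>

/* Illustrative bit only; stands in for the kernel's MF_SOFT_OFFLINE. */
#define MODEL_MF_SOFT_OFFLINE (1UL << 3)

/* Hypothetical stand-in for the page properties the helper tests. */
struct page_model {
	bool lru;        /* models PageLRU(page) */
	bool movable;    /* models __PageMovable(page) */
	bool free_buddy; /* models is_free_buddy_page(page) */
};

/* Non-LRU movable pages count as handlable only for soft offline. */
bool handlable(const struct page_model *pg, unsigned long flags)
{
	bool movable = (flags & MODEL_MF_SOFT_OFFLINE) && pg->movable;

	return movable || pg->lru || pg->free_buddy;
}
```

With no flags set, a page that is only movable is reported as unhandlable;
passing the soft-offline bit makes the same page handlable, which matches the
intent that soft offline can still migrate such pages.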
mm/memory-failure.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
--- a/mm/memory-failure.c~mm-memory-failurec-make-non-lru-movable-pages-unhandlable
+++ a/mm/memory-failure.c
@@ -1176,12 +1176,18 @@ void ClearPageHWPoisonTakenOff(struct pa
* does not return true for hugetlb or device memory pages, so it's assumed
* to be called only in the context where we never have such pages.
*/
-static inline bool HWPoisonHandlable(struct page *page)
+static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
{
- return PageLRU(page) || __PageMovable(page) || is_free_buddy_page(page);
+ bool movable = false;
+
+ /* Soft offline could migrate non-LRU movable pages */
+ if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
+ movable = true;
+
+ return movable || PageLRU(page) || is_free_buddy_page(page);
}
-static int __get_hwpoison_page(struct page *page)
+static int __get_hwpoison_page(struct page *page, unsigned long flags)
{
struct page *head = compound_head(page);
int ret = 0;
@@ -1196,7 +1202,7 @@ static int __get_hwpoison_page(struct pa
* for any unsupported type of page in order to reduce the risk of
* unexpected races caused by taking a page refcount.
*/
- if (!HWPoisonHandlable(head))
+ if (!HWPoisonHandlable(head, flags))
return -EBUSY;
if (get_page_unless_zero(head)) {
@@ -1221,7 +1227,7 @@ static int get_any_page(struct page *p,
try_again:
if (!count_increased) {
- ret = __get_hwpoison_page(p);
+ ret = __get_hwpoison_page(p, flags);
if (!ret) {
if (page_count(p)) {
/* We raced with an allocation, retry. */
@@ -1249,7 +1255,7 @@ try_again:
}
}
- if (PageHuge(p) || HWPoisonHandlable(p)) {
+ if (PageHuge(p) || HWPoisonHandlable(p, flags)) {
ret = 1;
} else {
/*
@@ -2302,7 +2308,7 @@ int soft_offline_page(unsigned long pfn,
retry:
get_online_mems();
- ret = get_hwpoison_page(page, flags);
+ ret = get_hwpoison_page(page, flags | MF_SOFT_OFFLINE);
put_online_mems();
if (ret > 0) {
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 126/227] mm, fault-injection: declare should_fail_alloc_page()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: willy, mgorman, david, vbabka, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Vlastimil Babka <vbabka@suse.cz>
Subject: mm, fault-injection: declare should_fail_alloc_page()
The mm/ directory can almost fully be built with W=1, which would help in
local development. One remaining issue is a missing prototype for
should_fail_alloc_page(). Thus add it next to the should_failslab()
prototype.
Note that the previous attempt, commit f7173090033c ("mm/page_alloc: make
should_fail_alloc_page() static"), had to be reverted by commit
54aa386661fe as it caused an unresolved symbol error with
CONFIG_DEBUG_INFO_BTF=y.
Link: https://lkml.kernel.org/r/20220314165724.16071-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/fault-inject.h | 2 ++
1 file changed, 2 insertions(+)
--- a/include/linux/fault-inject.h~mm-fault-injection-declare-should_fail_alloc_page
+++ a/include/linux/fault-inject.h
@@ -64,6 +64,8 @@ static inline struct dentry *fault_creat
struct kmem_cache;
+bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order);
+
int should_failslab(struct kmem_cache *s, gfp_t gfpflags);
#ifdef CONFIG_FAILSLAB
extern bool __should_failslab(struct kmem_cache *s, gfp_t gfpflags);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 127/227] mm/mlock: fix potential imbalanced rlimit ucounts adjustment
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:44 UTC (permalink / raw)
To: hughd, herbert.van.den.bergh, chris.mason, linmiaohe, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/mlock: fix potential imbalanced rlimit ucounts adjustment
user_shm_lock() forgets to set allowed to 0 when get_ucounts() fails, so
a later user_shm_unlock() might perform an extra dec_rlimit_ucounts().
Fix this by resetting allowed to 0.
Link: https://lkml.kernel.org/r/20220310132417.41189-1-linmiaohe@huawei.com
Fixes: d7c9e99aee48 ("Reimplement RLIMIT_MEMLOCK on top of ucounts")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/mlock.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/mlock.c~mm-mlock-fix-potential-imbalanced-rlimit-ucounts-adjustment
+++ a/mm/mlock.c
@@ -839,6 +839,7 @@ int user_shm_lock(size_t size, struct uc
}
if (!get_ucounts(ucounts)) {
dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);
+ allowed = 0;
goto out;
}
allowed = 1;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 128/227] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: zhengqi.arch, willy, song.bao.hua, osalvador, mike.kravetz,
mhocko, fam.zheng, duanxiongchun, david, corbet, chenhuang5,
bodeddub, songmuchun, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
Patch series "Free the 2nd vmemmap page associated with each HugeTLB page", v7.
This series can minimize the overhead of struct page for 2MB HugeTLB pages
significantly. It further reduces the overhead of struct page by 12.5%
for a 2MB HugeTLB compared to the previous approach, which means 2GB per
1TB HugeTLB. It is a nice gain. Comments and reviews are welcome.
Thanks.
For the main implementation and details, refer to the commit log of patch
1. In this series, I have changed the following four helpers; the
following table shows how the overhead of those helpers is affected.
+------------------+-----------------------+
|       APIs       | head page | tail page |
+------------------+-----------+-----------+
|    PageHead()    |     Y     |     N     |
+------------------+-----------+-----------+
|    PageTail()    |     Y     |     N     |
+------------------+-----------+-----------+
|  PageCompound()  |     N     |     N     |
+------------------+-----------+-----------+
| compound_head()  |     Y     |     N     |
+------------------+-----------+-----------+
Y: Overhead is increased.
N: Overhead is _NOT_ increased.
It shows that the overhead of those helpers on a tail page doesn't change
between "hugetlb_free_vmemmap=on" and "hugetlb_free_vmemmap=off". But the
overhead on a head page will be increased when "hugetlb_free_vmemmap=on"
(except PageCompound()). So I believe that Matthew Wilcox's folio series
will help with this.
PageHead() and PageTail() have far fewer users than compound_head(),
and most users of PageTail() are VM_BUG_ON() checks, so I have tested
the overhead of compound_head() on head pages.
I measured the overhead of calling compound_head() on a head page at
2.11ns (the average over 10 million calls to compound_head()).
For a head page whose address is not aligned with PAGE_SIZE or a
non-compound page, the overhead of compound_head() is 2.54ns which is
increased by 20%. For a head page whose address is aligned with
PAGE_SIZE, the overhead of compound_head() is 2.97ns which is increased by
40%. Most pages are the former. I do not think the overhead is
significant since the overhead of compound_head() itself is low.
This patch (of 5):
This patch minimizes the overhead of struct page for 2MB HugeTLB pages
significantly. It further reduces the overhead of struct page by 12.5%
for a 2MB HugeTLB compared to the previous approach, which means 2GB per
1TB HugeTLB (2MB type).
After the feature of "Free some vmemmap pages of HugeTLB page" is
enabled, the mapping of the vmemmap addresses associated with a 2MB
HugeTLB page becomes the figure below.
   HugeTLB                  struct pages(8 pages)         page frame(8 pages)
+-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
|           |                     |     0     | -------------> |     0     |
|           |                     +-----------+                +-----------+
|           |                     |     1     | -------------> |     1     |
|           |                     +-----------+                +-----------+
|           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
|           |                     +-----------+                   | | | | |
|           |                     |     3     | ------------------+ | | | |
|           |                     +-----------+                     | | | |
|           |                     |     4     | --------------------+ | | |
|    2MB    |                     +-----------+                       | | |
|           |                     |     5     | ----------------------+ | |
|           |                     +-----------+                         | |
|           |                     |     6     | ------------------------+ |
|           |                     +-----------+                           |
|           |                     |     7     | --------------------------+
|           |                     +-----------+
|           |
|           |
|           |
+-----------+
As we can see, the 2nd vmemmap page frame (indexed by 1) is reused and
remapped. However, the 2nd vmemmap page frame can also be freed to
the buddy allocator, and then we can change the mapping from the figure
above to the figure below.
   HugeTLB                  struct pages(8 pages)         page frame(8 pages)
+-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
|           |                     |     0     | -------------> |     0     |
|           |                     +-----------+                +-----------+
|           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
|           |                     +-----------+                  | | | | | |
|           |                     |     2     | -----------------+ | | | | |
|           |                     +-----------+                    | | | | |
|           |                     |     3     | -------------------+ | | | |
|           |                     +-----------+                      | | | |
|           |                     |     4     | ---------------------+ | | |
|    2MB    |                     +-----------+                        | | |
|           |                     |     5     | -----------------------+ | |
|           |                     +-----------+                          | |
|           |                     |     6     | -------------------------+ |
|           |                     +-----------+                            |
|           |                     |     7     | ---------------------------+
|           |                     +-----------+
|           |
|           |
|           |
+-----------+
After we do this, all tail vmemmap pages (1-7) are mapped to the head
vmemmap page frame (0). In other words, there is more than one page
struct with PG_head associated with each HugeTLB page. We __know__ that
there is only one head page struct, so the tail page structs with PG_head
are fake head page structs. We need an approach to distinguish between
those two different types of page structs so that compound_head(),
PageHead() and PageTail() can work properly if the parameter is a tail
page struct that has PG_head set.
The following code snippet describes how to distinguish between real and
fake head page structs.
if (test_bit(PG_head, &page->flags)) {
unsigned long head = READ_ONCE(page[1].compound_head);
if (head & 1) {
if (head == (unsigned long)page + 1)
==> head page struct
else
==> tail page struct
} else
==> head page struct
}
We can safely access the field of the @page[1] with PG_head because the
@page is a compound page composed of at least two contiguous pages.
[songmuchun@bytedance.com: restore lost comment changes]
Link: https://lkml.kernel.org/r/20211101031651.75851-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20211101031651.75851-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Chen Huang <chenhuang5@huawei.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/admin-guide/kernel-parameters.txt | 2
include/linux/page-flags.h | 78 +++++++++++++-
mm/hugetlb_vmemmap.c | 62 ++++++-----
mm/sparse-vmemmap.c | 21 +++
4 files changed, 130 insertions(+), 33 deletions(-)
--- a/Documentation/admin-guide/kernel-parameters.txt~mm-hugetlb-free-the-2nd-vmemmap-page-associated-with-each-hugetlb-page
+++ a/Documentation/admin-guide/kernel-parameters.txt
@@ -1625,7 +1625,7 @@
[KNL] Reguires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
enabled.
Allows heavy hugetlb users to free up some more
- memory (6 * PAGE_SIZE for each 2MB hugetlb page).
+ memory (7 * PAGE_SIZE for each 2MB hugetlb page).
Format: { on | off (default) }
on: enable the feature
--- a/include/linux/page-flags.h~mm-hugetlb-free-the-2nd-vmemmap-page-associated-with-each-hugetlb-page
+++ a/include/linux/page-flags.h
@@ -190,13 +190,69 @@ enum pageflags {
#ifndef __GENERATING_BOUNDS_H
+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+extern bool hugetlb_free_vmemmap_enabled;
+
+/*
+ * If the feature of freeing some vmemmap pages associated with each HugeTLB
+ * page is enabled, the head vmemmap page frame is reused and all of the tail
+ * vmemmap addresses map to the head vmemmap page frame (furture details can
+ * refer to the figure at the head of the mm/hugetlb_vmemmap.c). In other
+ * words, there are more than one page struct with PG_head associated with each
+ * HugeTLB page. We __know__ that there is only one head page struct, the tail
+ * page structs with PG_head are fake head page structs. We need an approach
+ * to distinguish between those two different types of page structs so that
+ * compound_head() can return the real head page struct when the parameter is
+ * the tail page struct but with PG_head.
+ *
+ * The page_fixed_fake_head() returns the real head page struct if the @page is
+ * fake page head, otherwise, returns @page which can either be a true page
+ * head or tail.
+ */
+static __always_inline const struct page *page_fixed_fake_head(const struct page *page)
+{
+ if (!hugetlb_free_vmemmap_enabled)
+ return page;
+
+ /*
+ * Only addresses aligned with PAGE_SIZE of struct page may be fake head
+ * struct page. The alignment check aims to avoid access the fields (
+ * e.g. compound_head) of the @page[1]. It can avoid touch a (possibly)
+ * cold cacheline in some cases.
+ */
+ if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
+ test_bit(PG_head, &page->flags)) {
+ /*
+ * We can safely access the field of the @page[1] with PG_head
+ * because the @page is a compound page composed with at least
+ * two contiguous pages.
+ */
+ unsigned long head = READ_ONCE(page[1].compound_head);
+
+ if (likely(head & 1))
+ return (const struct page *)(head - 1);
+ }
+ return page;
+}
+#else
+static inline const struct page *page_fixed_fake_head(const struct page *page)
+{
+ return page;
+}
+#endif
+
+static __always_inline int page_is_fake_head(struct page *page)
+{
+ return page_fixed_fake_head(page) != page;
+}
+
static inline unsigned long _compound_head(const struct page *page)
{
unsigned long head = READ_ONCE(page->compound_head);
if (unlikely(head & 1))
return head - 1;
- return (unsigned long)page;
+ return (unsigned long)page_fixed_fake_head(page);
}
#define compound_head(page) ((typeof(page))_compound_head(page))
@@ -231,12 +287,13 @@ static inline unsigned long _compound_he
static __always_inline int PageTail(struct page *page)
{
- return READ_ONCE(page->compound_head) & 1;
+ return READ_ONCE(page->compound_head) & 1 || page_is_fake_head(page);
}
static __always_inline int PageCompound(struct page *page)
{
- return test_bit(PG_head, &page->flags) || PageTail(page);
+ return test_bit(PG_head, &page->flags) ||
+ READ_ONCE(page->compound_head) & 1;
}
#define PAGE_POISON_PATTERN -1l
@@ -695,7 +752,20 @@ static inline bool test_set_page_writeba
return set_page_writeback(page);
}
-__PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY)
+static __always_inline bool folio_test_head(struct folio *folio)
+{
+ return test_bit(PG_head, folio_flags(folio, FOLIO_PF_ANY));
+}
+
+static __always_inline int PageHead(struct page *page)
+{
+ PF_POISONED_CHECK(page);
+ return test_bit(PG_head, &page->flags) && !page_is_fake_head(page);
+}
+
+__SETPAGEFLAG(Head, head, PF_ANY)
+__CLEARPAGEFLAG(Head, head, PF_ANY)
+CLEARPAGEFLAG(Head, head, PF_ANY)
/**
* folio_test_large() - Does this folio contain more than one page?
--- a/mm/hugetlb_vmemmap.c~mm-hugetlb-free-the-2nd-vmemmap-page-associated-with-each-hugetlb-page
+++ a/mm/hugetlb_vmemmap.c
@@ -124,9 +124,9 @@
* page of page structs (page 0) associated with the HugeTLB page contains the 4
* page structs necessary to describe the HugeTLB. The only use of the remaining
* pages of page structs (page 1 to page 7) is to point to page->compound_head.
- * Therefore, we can remap pages 2 to 7 to page 1. Only 2 pages of page structs
+ * Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of page structs
* will be used for each HugeTLB page. This will allow us to free the remaining
- * 6 pages to the buddy allocator.
+ * 7 pages to the buddy allocator.
*
* Here is how things look after remapping.
*
@@ -134,30 +134,30 @@
* +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+
* | | | 0 | -------------> | 0 |
* | | +-----------+ +-----------+
- * | | | 1 | -------------> | 1 |
- * | | +-----------+ +-----------+
- * | | | 2 | ----------------^ ^ ^ ^ ^ ^
- * | | +-----------+ | | | | |
- * | | | 3 | ------------------+ | | | |
- * | | +-----------+ | | | |
- * | | | 4 | --------------------+ | | |
- * | PMD | +-----------+ | | |
- * | level | | 5 | ----------------------+ | |
- * | mapping | +-----------+ | |
- * | | | 6 | ------------------------+ |
- * | | +-----------+ |
- * | | | 7 | --------------------------+
+ * | | | 1 | ---------------^ ^ ^ ^ ^ ^ ^
+ * | | +-----------+ | | | | | |
+ * | | | 2 | -----------------+ | | | | |
+ * | | +-----------+ | | | | |
+ * | | | 3 | -------------------+ | | | |
+ * | | +-----------+ | | | |
+ * | | | 4 | ---------------------+ | | |
+ * | PMD | +-----------+ | | |
+ * | level | | 5 | -----------------------+ | |
+ * | mapping | +-----------+ | |
+ * | | | 6 | -------------------------+ |
+ * | | +-----------+ |
+ * | | | 7 | ---------------------------+
* | | +-----------+
* | |
* | |
* | |
* +-----------+
*
- * When a HugeTLB is freed to the buddy system, we should allocate 6 pages for
+ * When a HugeTLB is freed to the buddy system, we should allocate 7 pages for
* vmemmap pages and restore the previous mapping relationship.
*
* For the HugeTLB page of the pud level mapping. It is similar to the former.
- * We also can use this approach to free (PAGE_SIZE - 2) vmemmap pages.
+ * We also can use this approach to free (PAGE_SIZE - 1) vmemmap pages.
*
* Apart from the HugeTLB page of the pmd/pud level mapping, some architectures
* (e.g. aarch64) provides a contiguous bit in the translation table entries
@@ -166,7 +166,13 @@
*
* The contiguous bit is used to increase the mapping size at the pmd and pte
* (last) level. So this type of HugeTLB page can be optimized only when its
- * size of the struct page structs is greater than 2 pages.
+ * size of the struct page structs is greater than 1 page.
+ *
+ * Notice: The head vmemmap page is not freed to the buddy allocator and all
+ * tail vmemmap pages are mapped to the head vmemmap page frame. So we can see
+ * more than one struct page struct with PG_head (e.g. 8 per 2 MB HugeTLB page)
+ * associated with each HugeTLB page. The compound_head() can handle this
+ * correctly (more details refer to the comment above compound_head()).
*/
#define pr_fmt(fmt) "HugeTLB: " fmt
@@ -175,19 +181,21 @@
/*
* There are a lot of struct page structures associated with each HugeTLB page.
* For tail pages, the value of compound_head is the same. So we can reuse first
- * page of tail page structures. We map the virtual addresses of the remaining
- * pages of tail page structures to the first tail page struct, and then free
- * these page frames. Therefore, we need to reserve two pages as vmemmap areas.
+ * page of head page structures. We map the virtual addresses of all the pages
+ * of tail page structures to the head page struct, and then free these page
+ * frames. Therefore, we need to reserve one pages as vmemmap areas.
*/
-#define RESERVE_VMEMMAP_NR 2U
+#define RESERVE_VMEMMAP_NR 1U
#define RESERVE_VMEMMAP_SIZE (RESERVE_VMEMMAP_NR << PAGE_SHIFT)
-bool hugetlb_free_vmemmap_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
+bool hugetlb_free_vmemmap_enabled __read_mostly =
+ IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
+EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled);
static int __init early_hugetlb_free_vmemmap_param(char *buf)
{
/* We cannot optimize if a "struct page" crosses page boundaries. */
- if ((!is_power_of_2(sizeof(struct page)))) {
+ if (!is_power_of_2(sizeof(struct page))) {
pr_warn("cannot free vmemmap pages because \"struct page\" crosses page boundaries\n");
return 0;
}
@@ -236,7 +244,6 @@ int alloc_huge_page_vmemmap(struct hstat
*/
ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
-
if (!ret)
ClearHPageVmemmapOptimized(head);
@@ -282,9 +289,8 @@ void __init hugetlb_vmemmap_init(struct
vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT;
/*
- * The head page and the first tail page are not to be freed to buddy
- * allocator, the other pages will map to the first tail page, so they
- * can be freed.
+ * The head page is not to be freed to buddy allocator, the other tail
+ * pages will map to the head page, so they can be freed.
*
* Could RESERVE_VMEMMAP_NR be greater than @vmemmap_pages? It is true
* on some architectures (e.g. aarch64). See Documentation/arm64/
--- a/mm/sparse-vmemmap.c~mm-hugetlb-free-the-2nd-vmemmap-page-associated-with-each-hugetlb-page
+++ a/mm/sparse-vmemmap.c
@@ -245,6 +245,26 @@ static void vmemmap_remap_pte(pte_t *pte
set_pte_at(&init_mm, addr, pte, entry);
}
+/*
+ * How many struct page structs need to be reset. When we reuse the head
+ * struct page, the special metadata (e.g. page->flags or page->mapping)
+ * cannot copy to the tail struct page structs. The invalid value will be
+ * checked in the free_tail_pages_check(). In order to avoid the message
+ * of "corrupted mapping in tail page". We need to reset at least 3 (one
+ * head struct page struct and two tail struct page structs) struct page
+ * structs.
+ */
+#define NR_RESET_STRUCT_PAGE 3
+
+static inline void reset_struct_pages(struct page *start)
+{
+ int i;
+ struct page *from = start + NR_RESET_STRUCT_PAGE;
+
+ for (i = 0; i < NR_RESET_STRUCT_PAGE; i++)
+ memcpy(start + i, from, sizeof(*from));
+}
+
static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
struct vmemmap_remap_walk *walk)
{
@@ -258,6 +278,7 @@ static void vmemmap_restore_pte(pte_t *p
list_del(&page->lru);
to = page_to_virt(page);
copy_page(to, (void *)walk->reuse_addr);
+ reset_struct_pages(to);
set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 128/227] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
@ 2022-03-22 21:45 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: zhengqi.arch, willy, song.bao.hua, osalvador, mike.kravetz,
mhocko, fam.zheng, duanxiongchun, david, corbet, chenhuang5,
bodeddub, songmuchun, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
Patch series "Free the 2nd vmemmap page associated with each HugeTLB page", v7.
This series can minimize the overhead of struct page for 2MB HugeTLB pages
significantly. It further reduces the overhead of struct page by 12.5%
for a 2MB HugeTLB compared to the previous approach, which means 2GB per
1TB HugeTLB. It is a nice gain. Comments and reviews are welcome.
Thanks.
For the main implementation and details, refer to the commit log of patch
1. In this series, I have changed the following four helpers; the
following table shows how the overhead of those helpers is affected.
+------------------+-----------------------+
|       APIs       | head page | tail page |
+------------------+-----------+-----------+
|    PageHead()    |     Y     |     N     |
+------------------+-----------+-----------+
|    PageTail()    |     Y     |     N     |
+------------------+-----------+-----------+
|  PageCompound()  |     N     |     N     |
+------------------+-----------+-----------+
| compound_head()  |     Y     |     N     |
+------------------+-----------+-----------+
Y: Overhead is increased.
N: Overhead is _NOT_ increased.
It shows that the overhead of those helpers on a tail page doesn't change
between "hugetlb_free_vmemmap=on" and "hugetlb_free_vmemmap=off". But the
overhead on a head page will be increased when "hugetlb_free_vmemmap=on"
(except PageCompound()). So I believe that Matthew Wilcox's folio series
will help with this.
PageHead() and PageTail() have far fewer users than compound_head(),
and most users of PageTail() are VM_BUG_ON() checks, so I have tested
the overhead of compound_head() on head pages.
I measured the overhead of calling compound_head() on a head page at
2.11ns (the average over 10 million calls to compound_head()).
For a head page whose address is not aligned with PAGE_SIZE or a
non-compound page, the overhead of compound_head() is 2.54ns which is
increased by 20%. For a head page whose address is aligned with
PAGE_SIZE, the overhead of compound_head() is 2.97ns which is increased by
40%. Most pages are the former. I do not think the overhead is
significant since the overhead of compound_head() itself is low.
This patch (of 5):
This patch minimizes the overhead of struct page for 2MB HugeTLB pages
significantly. It further reduces the overhead of struct page by 12.5%
for a 2MB HugeTLB compared to the previous approach, which means 2GB per
1TB HugeTLB (2MB type).
After the feature of "Free some vmemmap pages of HugeTLB page" is
enabled, the mapping of the vmemmap addresses associated with a 2MB
HugeTLB page becomes the figure below.
   HugeTLB                  struct pages(8 pages)         page frame(8 pages)
+-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
|           |                     |     0     | -------------> |     0     |
|           |                     +-----------+                +-----------+
|           |                     |     1     | -------------> |     1     |
|           |                     +-----------+                +-----------+
|           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
|           |                     +-----------+                   | | | | |
|           |                     |     3     | ------------------+ | | | |
|           |                     +-----------+                     | | | |
|           |                     |     4     | --------------------+ | | |
|    2MB    |                     +-----------+                       | | |
|           |                     |     5     | ----------------------+ | |
|           |                     +-----------+                         | |
|           |                     |     6     | ------------------------+ |
|           |                     +-----------+                           |
|           |                     |     7     | --------------------------+
|           |                     +-----------+
|           |
|           |
|           |
+-----------+
As we can see, the 2nd vmemmap page frame (indexed by 1) is reused and
remapped. However, the 2nd vmemmap page frame can also be freed to
the buddy allocator, and then we can change the mapping from the figure
above to the figure below.
   HugeTLB                  struct pages(8 pages)         page frame(8 pages)
+-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
|           |                     |     0     | -------------> |     0     |
|           |                     +-----------+                +-----------+
|           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
|           |                     +-----------+                  | | | | | |
|           |                     |     2     | -----------------+ | | | | |
|           |                     +-----------+                    | | | | |
|           |                     |     3     | -------------------+ | | | |
|           |                     +-----------+                      | | | |
|           |                     |     4     | ---------------------+ | | |
|    2MB    |                     +-----------+                        | | |
|           |                     |     5     | -----------------------+ | |
|           |                     +-----------+                          | |
|           |                     |     6     | -------------------------+ |
|           |                     +-----------+                            |
|           |                     |     7     | ---------------------------+
|           |                     +-----------+
|           |
|           |
|           |
+-----------+
After we do this, all tail vmemmap pages (1-7) are mapped to the head
vmemmap page frame (0). In other words, there is more than one page
struct with PG_head associated with each HugeTLB page. We __know__ that
there is only one head page struct, so the tail page structs with PG_head
are fake head page structs. We need an approach to distinguish between
those two different types of page structs so that compound_head(),
PageHead() and PageTail() can work properly if the parameter is a tail
page struct that has PG_head set.
The following code snippet describes how to distinguish between real and
fake head page structs.
	if (test_bit(PG_head, &page->flags)) {
		unsigned long head = READ_ONCE(page[1].compound_head);
		if (head & 1) {
			if (head == (unsigned long)page + 1)
				==> head page struct
			else
				==> tail page struct
		} else
			==> head page struct
	}
We can safely access the fields of @page[1] when @page has PG_head set,
because @page is then part of a compound page composed of at least two
contiguous pages.
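The check can be tried out in user space with a toy model. The sketch
below is not kernel code: struct page is reduced to two fields, the
PG_head bit position is arbitrary, and the page-table aliasing of tail
vmemmap pages onto the head frame is emulated by copying frame 0 over
frames 1-7. All names other than compound_head and PG_head are made up
for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct page (illustration only). */
struct page {
	unsigned long flags;
	uintptr_t compound_head;	/* bit 0 set => tail, rest => head address */
};

#define PG_HEAD_BIT	(1UL << 0)	/* hypothetical bit position */
#define PAGES_PER_FRAME	64		/* struct pages per vmemmap page (assumed) */
#define NR_FRAMES	8		/* vmemmap pages per 2MB HugeTLB page */
#define NR_PAGES	(PAGES_PER_FRAME * NR_FRAMES)

/*
 * Set up the struct pages of one compound page, then emulate the remap:
 * frames 1..7 become copies of frame 0, as if they aliased the same memory.
 */
static void build_remapped_vmemmap(struct page *map)
{
	memset(map, 0, NR_PAGES * sizeof(*map));
	map[0].flags = PG_HEAD_BIT;
	for (int i = 1; i < NR_PAGES; i++)
		map[i].compound_head = (uintptr_t)&map[0] | 1;
	for (int f = 1; f < NR_FRAMES; f++)
		memcpy(&map[f * PAGES_PER_FRAME], &map[0],
		       PAGES_PER_FRAME * sizeof(*map));
}

/* Returns 1 for the real head, 0 for a fake head or a page without PG_head. */
static int is_real_head(const struct page *page)
{
	if (!(page->flags & PG_HEAD_BIT))
		return 0;

	uintptr_t head = page[1].compound_head;

	if (head & 1)
		return head == (uintptr_t)page + 1;
	return 1;
}
```

In this model the first struct page of every frame carries PG_head, but
only the one in frame 0 has a successor whose compound_head points back
at it, so only that one is reported as the real head.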
[songmuchun@bytedance.com: restore lost comment changes]
Link: https://lkml.kernel.org/r/20211101031651.75851-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20211101031651.75851-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Chen Huang <chenhuang5@huawei.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/admin-guide/kernel-parameters.txt | 2
include/linux/page-flags.h | 78 +++++++++++++-
mm/hugetlb_vmemmap.c | 62 ++++++-----
mm/sparse-vmemmap.c | 21 +++
4 files changed, 130 insertions(+), 33 deletions(-)
--- a/Documentation/admin-guide/kernel-parameters.txt~mm-hugetlb-free-the-2nd-vmemmap-page-associated-with-each-hugetlb-page
+++ a/Documentation/admin-guide/kernel-parameters.txt
@@ -1625,7 +1625,7 @@
[KNL] Reguires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
enabled.
Allows heavy hugetlb users to free up some more
- memory (6 * PAGE_SIZE for each 2MB hugetlb page).
+ memory (7 * PAGE_SIZE for each 2MB hugetlb page).
Format: { on | off (default) }
on: enable the feature
--- a/include/linux/page-flags.h~mm-hugetlb-free-the-2nd-vmemmap-page-associated-with-each-hugetlb-page
+++ a/include/linux/page-flags.h
@@ -190,13 +190,69 @@ enum pageflags {
#ifndef __GENERATING_BOUNDS_H
+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+extern bool hugetlb_free_vmemmap_enabled;
+
+/*
+ * If the feature of freeing some vmemmap pages associated with each HugeTLB
+ * page is enabled, the head vmemmap page frame is reused and all of the tail
+ * vmemmap addresses map to the head vmemmap page frame (furture details can
+ * refer to the figure at the head of the mm/hugetlb_vmemmap.c). In other
+ * words, there are more than one page struct with PG_head associated with each
+ * HugeTLB page. We __know__ that there is only one head page struct, the tail
+ * page structs with PG_head are fake head page structs. We need an approach
+ * to distinguish between those two different types of page structs so that
+ * compound_head() can return the real head page struct when the parameter is
+ * the tail page struct but with PG_head.
+ *
+ * The page_fixed_fake_head() returns the real head page struct if the @page is
+ * fake page head, otherwise, returns @page which can either be a true page
+ * head or tail.
+ */
+static __always_inline const struct page *page_fixed_fake_head(const struct page *page)
+{
+ if (!hugetlb_free_vmemmap_enabled)
+ return page;
+
+ /*
+ * Only addresses aligned with PAGE_SIZE of struct page may be fake head
+ * struct page. The alignment check aims to avoid access the fields (
+ * e.g. compound_head) of the @page[1]. It can avoid touch a (possibly)
+ * cold cacheline in some cases.
+ */
+ if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
+ test_bit(PG_head, &page->flags)) {
+ /*
+ * We can safely access the field of the @page[1] with PG_head
+ * because the @page is a compound page composed with at least
+ * two contiguous pages.
+ */
+ unsigned long head = READ_ONCE(page[1].compound_head);
+
+ if (likely(head & 1))
+ return (const struct page *)(head - 1);
+ }
+ return page;
+}
+#else
+static inline const struct page *page_fixed_fake_head(const struct page *page)
+{
+ return page;
+}
+#endif
+
+static __always_inline int page_is_fake_head(struct page *page)
+{
+ return page_fixed_fake_head(page) != page;
+}
+
static inline unsigned long _compound_head(const struct page *page)
{
unsigned long head = READ_ONCE(page->compound_head);
if (unlikely(head & 1))
return head - 1;
- return (unsigned long)page;
+ return (unsigned long)page_fixed_fake_head(page);
}
#define compound_head(page) ((typeof(page))_compound_head(page))
@@ -231,12 +287,13 @@ static inline unsigned long _compound_he
static __always_inline int PageTail(struct page *page)
{
- return READ_ONCE(page->compound_head) & 1;
+ return READ_ONCE(page->compound_head) & 1 || page_is_fake_head(page);
}
static __always_inline int PageCompound(struct page *page)
{
- return test_bit(PG_head, &page->flags) || PageTail(page);
+ return test_bit(PG_head, &page->flags) ||
+ READ_ONCE(page->compound_head) & 1;
}
#define PAGE_POISON_PATTERN -1l
@@ -695,7 +752,20 @@ static inline bool test_set_page_writeba
return set_page_writeback(page);
}
-__PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY)
+static __always_inline bool folio_test_head(struct folio *folio)
+{
+ return test_bit(PG_head, folio_flags(folio, FOLIO_PF_ANY));
+}
+
+static __always_inline int PageHead(struct page *page)
+{
+ PF_POISONED_CHECK(page);
+ return test_bit(PG_head, &page->flags) && !page_is_fake_head(page);
+}
+
+__SETPAGEFLAG(Head, head, PF_ANY)
+__CLEARPAGEFLAG(Head, head, PF_ANY)
+CLEARPAGEFLAG(Head, head, PF_ANY)
/**
* folio_test_large() - Does this folio contain more than one page?
--- a/mm/hugetlb_vmemmap.c~mm-hugetlb-free-the-2nd-vmemmap-page-associated-with-each-hugetlb-page
+++ a/mm/hugetlb_vmemmap.c
@@ -124,9 +124,9 @@
* page of page structs (page 0) associated with the HugeTLB page contains the 4
* page structs necessary to describe the HugeTLB. The only use of the remaining
* pages of page structs (page 1 to page 7) is to point to page->compound_head.
- * Therefore, we can remap pages 2 to 7 to page 1. Only 2 pages of page structs
+ * Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of page structs
* will be used for each HugeTLB page. This will allow us to free the remaining
- * 6 pages to the buddy allocator.
+ * 7 pages to the buddy allocator.
*
* Here is how things look after remapping.
*
@@ -134,30 +134,30 @@
* +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+
* | | | 0 | -------------> | 0 |
* | | +-----------+ +-----------+
- * | | | 1 | -------------> | 1 |
- * | | +-----------+ +-----------+
- * | | | 2 | ----------------^ ^ ^ ^ ^ ^
- * | | +-----------+ | | | | |
- * | | | 3 | ------------------+ | | | |
- * | | +-----------+ | | | |
- * | | | 4 | --------------------+ | | |
- * | PMD | +-----------+ | | |
- * | level | | 5 | ----------------------+ | |
- * | mapping | +-----------+ | |
- * | | | 6 | ------------------------+ |
- * | | +-----------+ |
- * | | | 7 | --------------------------+
+ * | | | 1 | ---------------^ ^ ^ ^ ^ ^ ^
+ * | | +-----------+ | | | | | |
+ * | | | 2 | -----------------+ | | | | |
+ * | | +-----------+ | | | | |
+ * | | | 3 | -------------------+ | | | |
+ * | | +-----------+ | | | |
+ * | | | 4 | ---------------------+ | | |
+ * | PMD | +-----------+ | | |
+ * | level | | 5 | -----------------------+ | |
+ * | mapping | +-----------+ | |
+ * | | | 6 | -------------------------+ |
+ * | | +-----------+ |
+ * | | | 7 | ---------------------------+
* | | +-----------+
* | |
* | |
* | |
* +-----------+
*
- * When a HugeTLB is freed to the buddy system, we should allocate 6 pages for
+ * When a HugeTLB is freed to the buddy system, we should allocate 7 pages for
* vmemmap pages and restore the previous mapping relationship.
*
* For the HugeTLB page of the pud level mapping. It is similar to the former.
- * We also can use this approach to free (PAGE_SIZE - 2) vmemmap pages.
+ * We also can use this approach to free (PAGE_SIZE - 1) vmemmap pages.
*
* Apart from the HugeTLB page of the pmd/pud level mapping, some architectures
* (e.g. aarch64) provides a contiguous bit in the translation table entries
@@ -166,7 +166,13 @@
*
* The contiguous bit is used to increase the mapping size at the pmd and pte
* (last) level. So this type of HugeTLB page can be optimized only when its
- * size of the struct page structs is greater than 2 pages.
+ * size of the struct page structs is greater than 1 page.
+ *
+ * Notice: The head vmemmap page is not freed to the buddy allocator and all
+ * tail vmemmap pages are mapped to the head vmemmap page frame. So we can see
+ * more than one struct page struct with PG_head (e.g. 8 per 2 MB HugeTLB page)
+ * associated with each HugeTLB page. The compound_head() can handle this
+ * correctly (more details refer to the comment above compound_head()).
*/
#define pr_fmt(fmt) "HugeTLB: " fmt
@@ -175,19 +181,21 @@
/*
* There are a lot of struct page structures associated with each HugeTLB page.
* For tail pages, the value of compound_head is the same. So we can reuse first
- * page of tail page structures. We map the virtual addresses of the remaining
- * pages of tail page structures to the first tail page struct, and then free
- * these page frames. Therefore, we need to reserve two pages as vmemmap areas.
+ * page of head page structures. We map the virtual addresses of all the pages
+ * of tail page structures to the head page struct, and then free these page
+ * frames. Therefore, we need to reserve one pages as vmemmap areas.
*/
-#define RESERVE_VMEMMAP_NR 2U
+#define RESERVE_VMEMMAP_NR 1U
#define RESERVE_VMEMMAP_SIZE (RESERVE_VMEMMAP_NR << PAGE_SHIFT)
-bool hugetlb_free_vmemmap_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
+bool hugetlb_free_vmemmap_enabled __read_mostly =
+ IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
+EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled);
static int __init early_hugetlb_free_vmemmap_param(char *buf)
{
/* We cannot optimize if a "struct page" crosses page boundaries. */
- if ((!is_power_of_2(sizeof(struct page)))) {
+ if (!is_power_of_2(sizeof(struct page))) {
pr_warn("cannot free vmemmap pages because \"struct page\" crosses page boundaries\n");
return 0;
}
@@ -236,7 +244,6 @@ int alloc_huge_page_vmemmap(struct hstat
*/
ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
-
if (!ret)
ClearHPageVmemmapOptimized(head);
@@ -282,9 +289,8 @@ void __init hugetlb_vmemmap_init(struct
vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT;
/*
- * The head page and the first tail page are not to be freed to buddy
- * allocator, the other pages will map to the first tail page, so they
- * can be freed.
+ * The head page is not to be freed to buddy allocator, the other tail
+ * pages will map to the head page, so they can be freed.
*
* Could RESERVE_VMEMMAP_NR be greater than @vmemmap_pages? It is true
* on some architectures (e.g. aarch64). See Documentation/arm64/
--- a/mm/sparse-vmemmap.c~mm-hugetlb-free-the-2nd-vmemmap-page-associated-with-each-hugetlb-page
+++ a/mm/sparse-vmemmap.c
@@ -245,6 +245,26 @@ static void vmemmap_remap_pte(pte_t *pte
set_pte_at(&init_mm, addr, pte, entry);
}
+/*
+ * How many struct page structs need to be reset. When we reuse the head
+ * struct page, the special metadata (e.g. page->flags or page->mapping)
+ * cannot copy to the tail struct page structs. The invalid value will be
+ * checked in the free_tail_pages_check(). In order to avoid the message
+ * of "corrupted mapping in tail page". We need to reset at least 3 (one
+ * head struct page struct and two tail struct page structs) struct page
+ * structs.
+ */
+#define NR_RESET_STRUCT_PAGE 3
+
+static inline void reset_struct_pages(struct page *start)
+{
+ int i;
+ struct page *from = start + NR_RESET_STRUCT_PAGE;
+
+ for (i = 0; i < NR_RESET_STRUCT_PAGE; i++)
+ memcpy(start + i, from, sizeof(*from));
+}
+
static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
struct vmemmap_remap_walk *walk)
{
@@ -258,6 +278,7 @@ static void vmemmap_restore_pte(pte_t *p
list_del(&page->lru);
to = page_to_virt(page);
copy_page(to, (void *)walk->reuse_addr);
+ reset_struct_pages(to);
set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 129/227] mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: zhengqi.arch, willy, song.bao.hua, osalvador, mike.kravetz,
mhocko, fam.zheng, duanxiongchun, david, corbet, chenhuang5,
bodeddub, songmuchun, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key
page_fixed_fake_head() is used throughout memory management, and its
conditional check has to read a global variable. Although the overhead
of this check may be small, it increases when the memory cache comes
under pressure. Also, the global variable is not modified after system
boot, so the static key mechanism is a good fit here.
Link: https://lkml.kernel.org/r/20211101031651.75851-3-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Chen Huang <chenhuang5@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/hugetlb.h | 6 ------
include/linux/page-flags.h | 16 ++++++++++++++--
mm/hugetlb_vmemmap.c | 12 ++++++------
mm/memory_hotplug.c | 2 +-
4 files changed, 21 insertions(+), 15 deletions(-)
--- a/include/linux/hugetlb.h~mm-hugetlb-replace-hugetlb_free_vmemmap_enabled-with-a-static_key
+++ a/include/linux/hugetlb.h
@@ -1075,12 +1075,6 @@ static inline void set_huge_swap_pte_at(
}
#endif /* CONFIG_HUGETLB_PAGE */
-#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
-extern bool hugetlb_free_vmemmap_enabled;
-#else
-#define hugetlb_free_vmemmap_enabled false
-#endif
-
static inline spinlock_t *huge_pte_lock(struct hstate *h,
struct mm_struct *mm, pte_t *pte)
{
--- a/include/linux/page-flags.h~mm-hugetlb-replace-hugetlb_free_vmemmap_enabled-with-a-static_key
+++ a/include/linux/page-flags.h
@@ -191,7 +191,14 @@ enum pageflags {
#ifndef __GENERATING_BOUNDS_H
#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
-extern bool hugetlb_free_vmemmap_enabled;
+DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
+ hugetlb_free_vmemmap_enabled_key);
+
+static __always_inline bool hugetlb_free_vmemmap_enabled(void)
+{
+ return static_branch_maybe(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
+ &hugetlb_free_vmemmap_enabled_key);
+}
/*
* If the feature of freeing some vmemmap pages associated with each HugeTLB
@@ -211,7 +218,7 @@ extern bool hugetlb_free_vmemmap_enabled
*/
static __always_inline const struct page *page_fixed_fake_head(const struct page *page)
{
- if (!hugetlb_free_vmemmap_enabled)
+ if (!hugetlb_free_vmemmap_enabled())
return page;
/*
@@ -239,6 +246,11 @@ static inline const struct page *page_fi
{
return page;
}
+
+static inline bool hugetlb_free_vmemmap_enabled(void)
+{
+ return false;
+}
#endif
static __always_inline int page_is_fake_head(struct page *page)
--- a/mm/hugetlb_vmemmap.c~mm-hugetlb-replace-hugetlb_free_vmemmap_enabled-with-a-static_key
+++ a/mm/hugetlb_vmemmap.c
@@ -188,9 +188,9 @@
#define RESERVE_VMEMMAP_NR 1U
#define RESERVE_VMEMMAP_SIZE (RESERVE_VMEMMAP_NR << PAGE_SHIFT)
-bool hugetlb_free_vmemmap_enabled __read_mostly =
- IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
-EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled);
+DEFINE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
+ hugetlb_free_vmemmap_enabled_key);
+EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled_key);
static int __init early_hugetlb_free_vmemmap_param(char *buf)
{
@@ -204,9 +204,9 @@ static int __init early_hugetlb_free_vme
return -EINVAL;
if (!strcmp(buf, "on"))
- hugetlb_free_vmemmap_enabled = true;
+ static_branch_enable(&hugetlb_free_vmemmap_enabled_key);
else if (!strcmp(buf, "off"))
- hugetlb_free_vmemmap_enabled = false;
+ static_branch_disable(&hugetlb_free_vmemmap_enabled_key);
else
return -EINVAL;
@@ -284,7 +284,7 @@ void __init hugetlb_vmemmap_init(struct
BUILD_BUG_ON(__NR_USED_SUBPAGE >=
RESERVE_VMEMMAP_SIZE / sizeof(struct page));
- if (!hugetlb_free_vmemmap_enabled)
+ if (!hugetlb_free_vmemmap_enabled())
return;
vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT;
--- a/mm/memory_hotplug.c~mm-hugetlb-replace-hugetlb_free_vmemmap_enabled-with-a-static_key
+++ a/mm/memory_hotplug.c
@@ -1327,7 +1327,7 @@ bool mhp_supports_memmap_on_memory(unsig
* populate a single PMD.
*/
return memmap_on_memory &&
- !hugetlb_free_vmemmap_enabled &&
+ !hugetlb_free_vmemmap_enabled() &&
IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
size == memory_block_size_bytes() &&
IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 130/227] mm: sparsemem: use page table lock to protect kernel pmd operations
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: zhengqi.arch, willy, song.bao.hua, osalvador, mike.kravetz,
mhocko, fam.zheng, duanxiongchun, david, corbet, chenhuang5,
bodeddub, songmuchun, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: sparsemem: use page table lock to protect kernel pmd operations
init_mm.page_table_lock protects kernel page tables, so we can use it,
instead of the mmap write lock, to serialize the splitting of vmemmap PMD
mappings. This increases the concurrency of vmemmap_remap_free().
In practice, it increases the concurrency between allocations of HugeTLB
pages, but that is not the only benefit. There are many users of the mmap
read lock of init_mm. Currently the mmap write lock is held throughout
vmemmap_remap_free(); dropping the write lock means those readers are no
longer affected. This does not make anything worse and is always a win.
The kernel page table walker does not hold the page_table_lock when
walking pmd entries, so it may observe an inconsistent pmd entry, because
a pmd entry can change from a huge pmd entry to a PTE page table.
The only user of the kernel page table walker is ptdump. ptdump already
takes this into account by caching the value of the pmd entry in a local
variable. However, we also need to set ->action to ACTION_CONTINUE so
that the walker does not walk every pte entry again when a concurrent
thread has split the huge pmd.
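The locking pattern used for the split can be sketched in user space as
follows. This is a toy model under stated assumptions, not the kernel
implementation: struct toy_pmd, the spinlock built on atomic_flag and
all function names are invented for the example. It only shows the idea
of preparing the new table without the lock, rechecking the entry under
the lock, and discarding the table if another thread got there first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a PMD entry: a huge (leaf) mapping or a PTE table. */
struct toy_pmd {
	bool leaf;
	int *pte_table;
};

/* Plays the role of init_mm.page_table_lock in this model. */
static atomic_flag table_lock = ATOMIC_FLAG_INIT;
static int nr_discarded;	/* tables freed after losing the race */

static void toy_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&table_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void toy_unlock(void)
{
	atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/* Allocate outside the lock; install only if the entry is still a leaf. */
static int toy_split_recheck(struct toy_pmd *pmd)
{
	int *table = calloc(512, sizeof(*table));

	if (!table)
		return -1;

	toy_lock();
	if (pmd->leaf) {
		pmd->pte_table = table;
		pmd->leaf = false;
	} else {
		/* Another caller already split this entry; drop our table. */
		free(table);
		nr_discarded++;
	}
	toy_unlock();
	return 0;
}

/* Cheap check first, so the already-split case allocates nothing. */
static int toy_split(struct toy_pmd *pmd)
{
	bool leaf;

	toy_lock();
	leaf = pmd->leaf;
	toy_unlock();

	return leaf ? toy_split_recheck(pmd) : 0;
}
```

Keeping the allocation outside the lock keeps the critical section
short, at the cost of occasionally freeing a table that turned out not
to be needed.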
Link: https://lkml.kernel.org/r/20211101031651.75851-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Chen Huang <chenhuang5@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/ptdump.c | 16 ++++++++++----
mm/sparse-vmemmap.c | 47 +++++++++++++++++++++++++++---------------
2 files changed, 43 insertions(+), 20 deletions(-)
--- a/mm/ptdump.c~mm-sparsemem-use-page-table-lock-to-protect-kernel-pmd-operations
+++ a/mm/ptdump.c
@@ -40,8 +40,10 @@ static int ptdump_pgd_entry(pgd_t *pgd,
if (st->effective_prot)
st->effective_prot(st, 0, pgd_val(val));
- if (pgd_leaf(val))
+ if (pgd_leaf(val)) {
st->note_page(st, addr, 0, pgd_val(val));
+ walk->action = ACTION_CONTINUE;
+ }
return 0;
}
@@ -61,8 +63,10 @@ static int ptdump_p4d_entry(p4d_t *p4d,
if (st->effective_prot)
st->effective_prot(st, 1, p4d_val(val));
- if (p4d_leaf(val))
+ if (p4d_leaf(val)) {
st->note_page(st, addr, 1, p4d_val(val));
+ walk->action = ACTION_CONTINUE;
+ }
return 0;
}
@@ -82,8 +86,10 @@ static int ptdump_pud_entry(pud_t *pud,
if (st->effective_prot)
st->effective_prot(st, 2, pud_val(val));
- if (pud_leaf(val))
+ if (pud_leaf(val)) {
st->note_page(st, addr, 2, pud_val(val));
+ walk->action = ACTION_CONTINUE;
+ }
return 0;
}
@@ -101,8 +107,10 @@ static int ptdump_pmd_entry(pmd_t *pmd,
if (st->effective_prot)
st->effective_prot(st, 3, pmd_val(val));
- if (pmd_leaf(val))
+ if (pmd_leaf(val)) {
st->note_page(st, addr, 3, pmd_val(val));
+ walk->action = ACTION_CONTINUE;
+ }
return 0;
}
--- a/mm/sparse-vmemmap.c~mm-sparsemem-use-page-table-lock-to-protect-kernel-pmd-operations
+++ a/mm/sparse-vmemmap.c
@@ -53,8 +53,7 @@ struct vmemmap_remap_walk {
struct list_head *vmemmap_pages;
};
-static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
- struct vmemmap_remap_walk *walk)
+static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
{
pmd_t __pmd;
int i;
@@ -76,15 +75,34 @@ static int split_vmemmap_huge_pmd(pmd_t
set_pte_at(&init_mm, addr, pte, entry);
}
- /* Make pte visible before pmd. See comment in pmd_install(). */
- smp_wmb();
- pmd_populate_kernel(&init_mm, pmd, pgtable);
-
- flush_tlb_kernel_range(start, start + PMD_SIZE);
+ spin_lock(&init_mm.page_table_lock);
+ if (likely(pmd_leaf(*pmd))) {
+ /* Make pte visible before pmd. See comment in pmd_install(). */
+ smp_wmb();
+ pmd_populate_kernel(&init_mm, pmd, pgtable);
+ flush_tlb_kernel_range(start, start + PMD_SIZE);
+ } else {
+ pte_free_kernel(&init_mm, pgtable);
+ }
+ spin_unlock(&init_mm.page_table_lock);
return 0;
}
+static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
+{
+ int leaf;
+
+ spin_lock(&init_mm.page_table_lock);
+ leaf = pmd_leaf(*pmd);
+ spin_unlock(&init_mm.page_table_lock);
+
+ if (!leaf)
+ return 0;
+
+ return __split_vmemmap_huge_pmd(pmd, start);
+}
+
static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long end,
struct vmemmap_remap_walk *walk)
@@ -121,13 +139,12 @@ static int vmemmap_pmd_range(pud_t *pud,
pmd = pmd_offset(pud, addr);
do {
- if (pmd_leaf(*pmd)) {
- int ret;
+ int ret;
+
+ ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK);
+ if (ret)
+ return ret;
- ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
- if (ret)
- return ret;
- }
next = pmd_addr_end(addr, end);
vmemmap_pte_range(pmd, addr, next, walk);
} while (pmd++, addr = next, addr != end);
@@ -321,10 +338,8 @@ int vmemmap_remap_free(unsigned long sta
*/
BUG_ON(start - reuse != PAGE_SIZE);
- mmap_write_lock(&init_mm);
+ mmap_read_lock(&init_mm);
ret = vmemmap_remap_range(reuse, end, &walk);
- mmap_write_downgrade(&init_mm);
-
if (ret && walk.nr_walked) {
end = reuse + walk.nr_walked * PAGE_SIZE;
/*
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 130/227] mm: sparsemem: use page table lock to protect kernel pmd operations
@ 2022-03-22 21:45 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: zhengqi.arch, willy, song.bao.hua, osalvador, mike.kravetz,
mhocko, fam.zheng, duanxiongchun, david, corbet, chenhuang5,
bodeddub, songmuchun, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: sparsemem: use page table lock to protect kernel pmd operations
init_mm.page_table_lock protects kernel page tables, so we can use it,
instead of the mmap write lock, to serialize the splitting of vmemmap PMD
mappings. This increases the concurrency of vmemmap_remap_free().
In practice, it increases the concurrency between allocations of HugeTLB
pages, but that is not the only benefit. There are many users of the mmap
read lock of init_mm. Currently the mmap write lock is held throughout
vmemmap_remap_free(); dropping the write lock means those readers are no
longer affected. This does not make anything worse and is always a win.
The kernel page table walker does not hold the page_table_lock when
walking pmd entries, so it may observe an inconsistent pmd entry, because
a pmd entry can change from a huge pmd entry to a PTE page table.
The only user of the kernel page table walker is ptdump. ptdump already
takes this into account by caching the value of the pmd entry in a local
variable. However, we also need to set ->action to ACTION_CONTINUE so
that the walker does not walk every pte entry again when a concurrent
thread has split the huge pmd.
Link: https://lkml.kernel.org/r/20211101031651.75851-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Chen Huang <chenhuang5@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/ptdump.c | 16 ++++++++++----
mm/sparse-vmemmap.c | 47 +++++++++++++++++++++++++++---------------
2 files changed, 43 insertions(+), 20 deletions(-)
--- a/mm/ptdump.c~mm-sparsemem-use-page-table-lock-to-protect-kernel-pmd-operations
+++ a/mm/ptdump.c
@@ -40,8 +40,10 @@ static int ptdump_pgd_entry(pgd_t *pgd,
if (st->effective_prot)
st->effective_prot(st, 0, pgd_val(val));
- if (pgd_leaf(val))
+ if (pgd_leaf(val)) {
st->note_page(st, addr, 0, pgd_val(val));
+ walk->action = ACTION_CONTINUE;
+ }
return 0;
}
@@ -61,8 +63,10 @@ static int ptdump_p4d_entry(p4d_t *p4d,
if (st->effective_prot)
st->effective_prot(st, 1, p4d_val(val));
- if (p4d_leaf(val))
+ if (p4d_leaf(val)) {
st->note_page(st, addr, 1, p4d_val(val));
+ walk->action = ACTION_CONTINUE;
+ }
return 0;
}
@@ -82,8 +86,10 @@ static int ptdump_pud_entry(pud_t *pud,
if (st->effective_prot)
st->effective_prot(st, 2, pud_val(val));
- if (pud_leaf(val))
+ if (pud_leaf(val)) {
st->note_page(st, addr, 2, pud_val(val));
+ walk->action = ACTION_CONTINUE;
+ }
return 0;
}
@@ -101,8 +107,10 @@ static int ptdump_pmd_entry(pmd_t *pmd,
if (st->effective_prot)
st->effective_prot(st, 3, pmd_val(val));
- if (pmd_leaf(val))
+ if (pmd_leaf(val)) {
st->note_page(st, addr, 3, pmd_val(val));
+ walk->action = ACTION_CONTINUE;
+ }
return 0;
}
--- a/mm/sparse-vmemmap.c~mm-sparsemem-use-page-table-lock-to-protect-kernel-pmd-operations
+++ a/mm/sparse-vmemmap.c
@@ -53,8 +53,7 @@ struct vmemmap_remap_walk {
struct list_head *vmemmap_pages;
};
-static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
- struct vmemmap_remap_walk *walk)
+static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
{
pmd_t __pmd;
int i;
@@ -76,15 +75,34 @@ static int split_vmemmap_huge_pmd(pmd_t
set_pte_at(&init_mm, addr, pte, entry);
}
- /* Make pte visible before pmd. See comment in pmd_install(). */
- smp_wmb();
- pmd_populate_kernel(&init_mm, pmd, pgtable);
-
- flush_tlb_kernel_range(start, start + PMD_SIZE);
+ spin_lock(&init_mm.page_table_lock);
+ if (likely(pmd_leaf(*pmd))) {
+ /* Make pte visible before pmd. See comment in pmd_install(). */
+ smp_wmb();
+ pmd_populate_kernel(&init_mm, pmd, pgtable);
+ flush_tlb_kernel_range(start, start + PMD_SIZE);
+ } else {
+ pte_free_kernel(&init_mm, pgtable);
+ }
+ spin_unlock(&init_mm.page_table_lock);
return 0;
}
+static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
+{
+ int leaf;
+
+ spin_lock(&init_mm.page_table_lock);
+ leaf = pmd_leaf(*pmd);
+ spin_unlock(&init_mm.page_table_lock);
+
+ if (!leaf)
+ return 0;
+
+ return __split_vmemmap_huge_pmd(pmd, start);
+}
+
static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long end,
struct vmemmap_remap_walk *walk)
@@ -121,13 +139,12 @@ static int vmemmap_pmd_range(pud_t *pud,
pmd = pmd_offset(pud, addr);
do {
- if (pmd_leaf(*pmd)) {
- int ret;
+ int ret;
+
+ ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK);
+ if (ret)
+ return ret;
- ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
- if (ret)
- return ret;
- }
next = pmd_addr_end(addr, end);
vmemmap_pte_range(pmd, addr, next, walk);
} while (pmd++, addr = next, addr != end);
@@ -321,10 +338,8 @@ int vmemmap_remap_free(unsigned long sta
*/
BUG_ON(start - reuse != PAGE_SIZE);
- mmap_write_lock(&init_mm);
+ mmap_read_lock(&init_mm);
ret = vmemmap_remap_range(reuse, end, &walk);
- mmap_write_downgrade(&init_mm);
-
if (ret && walk.nr_walked) {
end = reuse + walk.nr_walked * PAGE_SIZE;
/*
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 131/227] selftests: vm: add a hugetlb test case
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: zhengqi.arch, willy, song.bao.hua, osalvador, mike.kravetz,
mhocko, fam.zheng, duanxiongchun, david, corbet, chenhuang5,
bodeddub, songmuchun, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: selftests: vm: add a hugetlb test case
Since the head vmemmap page frame associated with each HugeTLB page is
reused, we should hide the PG_head flag of tail struct pages from the user.
Add a test case to check whether this works properly. The test steps are
as follows.
1) Allocate a 2MB HugeTLB page.
2) Get the page frame of each page.
3) Apply those APIs to each page frame.
4) Check that those APIs behave exactly the same as before.
Reading the flags of a page via /proc/kpageflags is done in
stable_page_flags(), which invokes PageHead(), PageTail(),
PageCompound() and compound_head(). If those APIs work properly, the head
page must have bits 15 and 17 set, and tail pages must have bits 16 and 17
set but bit 15 unset. Those flags are checked in check_page_flags().
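The bit rules above can be written down as two small predicates. This is
a minimal sketch of the check only, not the selftest itself: the macro and
function names are hypothetical, and the bit positions (15, 16, 17) are
the ones stated in this commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in a /proc/kpageflags word, as given in the text above. */
#define FLAG_COMPOUND_HEAD	(UINT64_C(1) << 15)
#define FLAG_COMPOUND_TAIL	(UINT64_C(1) << 16)
#define FLAG_HUGE		(UINT64_C(1) << 17)

/* A head page must have both the compound-head and huge bits set. */
static bool is_hugetlb_head(uint64_t flags)
{
	const uint64_t want = FLAG_COMPOUND_HEAD | FLAG_HUGE;

	return (flags & want) == want;
}

/* A tail page must have tail and huge set, and must not look like a head. */
static bool is_hugetlb_tail(uint64_t flags)
{
	const uint64_t want = FLAG_COMPOUND_TAIL | FLAG_HUGE;

	return (flags & want) == want && !(flags & FLAG_COMPOUND_HEAD);
}
```

A tail page whose flags word also carries bit 15 is rejected by
is_hugetlb_tail(), which is the failure the test is meant to catch when a
reused vmemmap page makes a tail struct page look like a head.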
Link: https://lkml.kernel.org/r/20211101031651.75851-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Chen Huang <chenhuang5@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
tools/testing/selftests/vm/.gitignore | 1
tools/testing/selftests/vm/Makefile | 1
tools/testing/selftests/vm/hugepage-vmemmap.c | 144 ++++++++++++++++
tools/testing/selftests/vm/run_vmtests.sh | 11 +
4 files changed, 157 insertions(+)
--- a/tools/testing/selftests/vm/.gitignore~selftests-vm-add-a-hugetlb-test-case
+++ a/tools/testing/selftests/vm/.gitignore
@@ -2,6 +2,7 @@
hugepage-mmap
hugepage-mremap
hugepage-shm
+hugepage-vmemmap
khugepaged
map_hugetlb
map_populate
--- /dev/null
+++ a/tools/testing/selftests/vm/hugepage-vmemmap.c
@@ -0,0 +1,144 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * A test case of using hugepage memory in a user application using the
+ * mmap system call with MAP_HUGETLB flag. Before running this program
+ * make sure the administrator has allocated enough default sized huge
+ * pages to cover the 2 MB allocation.
+ */
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+
+#define MAP_LENGTH (2UL * 1024 * 1024)
+
+#ifndef MAP_HUGETLB
+#define MAP_HUGETLB 0x40000 /* arch specific */
+#endif
+
+#define PAGE_SIZE 4096
+
+#define PAGE_COMPOUND_HEAD (1UL << 15)
+#define PAGE_COMPOUND_TAIL (1UL << 16)
+#define PAGE_HUGE (1UL << 17)
+
+#define HEAD_PAGE_FLAGS (PAGE_COMPOUND_HEAD | PAGE_HUGE)
+#define TAIL_PAGE_FLAGS (PAGE_COMPOUND_TAIL | PAGE_HUGE)
+
+#define PM_PFRAME_BITS 55
+#define PM_PFRAME_MASK ~((1UL << PM_PFRAME_BITS) - 1)
+
+/*
+ * For ia64 architecture, Linux kernel reserves Region number 4 for hugepages.
+ * That means the addresses starting with 0x800000... will need to be
+ * specified. Specifying a fixed address is not required on ppc64, i386
+ * or x86_64.
+ */
+#ifdef __ia64__
+#define MAP_ADDR (void *)(0x8000000000000000UL)
+#define MAP_FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED)
+#else
+#define MAP_ADDR NULL
+#define MAP_FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
+#endif
+
+static void write_bytes(char *addr, size_t length)
+{
+ unsigned long i;
+
+ for (i = 0; i < length; i++)
+ *(addr + i) = (char)i;
+}
+
+static unsigned long virt_to_pfn(void *addr)
+{
+ int fd;
+ unsigned long pagemap;
+
+ fd = open("/proc/self/pagemap", O_RDONLY);
+ if (fd < 0)
+ return -1UL;
+
+ lseek(fd, (unsigned long)addr / PAGE_SIZE * sizeof(pagemap), SEEK_SET);
+ read(fd, &pagemap, sizeof(pagemap));
+ close(fd);
+
+ return pagemap & ~PM_PFRAME_MASK;
+}
+
+static int check_page_flags(unsigned long pfn)
+{
+ int fd, i;
+ unsigned long pageflags;
+
+ fd = open("/proc/kpageflags", O_RDONLY);
+ if (fd < 0)
+ return -1;
+
+ lseek(fd, pfn * sizeof(pageflags), SEEK_SET);
+
+ read(fd, &pageflags, sizeof(pageflags));
+ if ((pageflags & HEAD_PAGE_FLAGS) != HEAD_PAGE_FLAGS) {
+ close(fd);
+ printf("Head page flags (%lx) is invalid\n", pageflags);
+ return -1;
+ }
+
+ /*
+ * pages other than the first page must be tail and shouldn't be head;
+ * this also verifies kernel has correctly set the fake page_head to tail
+ * while hugetlb_free_vmemmap is enabled.
+ */
+ for (i = 1; i < MAP_LENGTH / PAGE_SIZE; i++) {
+ read(fd, &pageflags, sizeof(pageflags));
+ if ((pageflags & TAIL_PAGE_FLAGS) != TAIL_PAGE_FLAGS ||
+ (pageflags & HEAD_PAGE_FLAGS) == HEAD_PAGE_FLAGS) {
+ close(fd);
+ printf("Tail page flags (%lx) is invalid\n", pageflags);
+ return -1;
+ }
+ }
+
+ close(fd);
+
+ return 0;
+}
+
+int main(int argc, char **argv)
+{
+ void *addr;
+ unsigned long pfn;
+
+ addr = mmap(MAP_ADDR, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0);
+ if (addr == MAP_FAILED) {
+ perror("mmap");
+ exit(1);
+ }
+
+ /* Trigger allocation of HugeTLB page. */
+ write_bytes(addr, MAP_LENGTH);
+
+ pfn = virt_to_pfn(addr);
+ if (pfn == -1UL) {
+ munmap(addr, MAP_LENGTH);
+ perror("virt_to_pfn");
+ exit(1);
+ }
+
+ printf("Returned address is %p whose pfn is %lx\n", addr, pfn);
+
+ if (check_page_flags(pfn) < 0) {
+ munmap(addr, MAP_LENGTH);
+ perror("check_page_flags");
+ exit(1);
+ }
+
+ /* munmap() length of MAP_HUGETLB memory must be hugepage aligned */
+ if (munmap(addr, MAP_LENGTH)) {
+ perror("munmap");
+ exit(1);
+ }
+
+ return 0;
+}
--- a/tools/testing/selftests/vm/Makefile~selftests-vm-add-a-hugetlb-test-case
+++ a/tools/testing/selftests/vm/Makefile
@@ -33,6 +33,7 @@ TEST_GEN_FILES += hmm-tests
TEST_GEN_FILES += hugepage-mmap
TEST_GEN_FILES += hugepage-mremap
TEST_GEN_FILES += hugepage-shm
+TEST_GEN_FILES += hugepage-vmemmap
TEST_GEN_FILES += khugepaged
TEST_GEN_FILES += madv_populate
TEST_GEN_FILES += map_fixed_noreplace
--- a/tools/testing/selftests/vm/run_vmtests.sh~selftests-vm-add-a-hugetlb-test-case
+++ a/tools/testing/selftests/vm/run_vmtests.sh
@@ -120,6 +120,17 @@ else
fi
rm -f $mnt/huge_mremap
+echo "------------------------"
+echo "running hugepage-vmemmap"
+echo "------------------------"
+./hugepage-vmemmap
+if [ $? -ne 0 ]; then
+ echo "[FAIL]"
+ exitcode=1
+else
+ echo "[PASS]"
+fi
+
echo "NOTE: The above hugetlb tests provide minimal coverage. Use"
echo " https://github.com/libhugetlbfs/libhugetlbfs.git for"
echo " hugetlb regression testing."
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 132/227] mm: sparsemem: move vmemmap related to HugeTLB to CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: zhengqi.arch, willy, song.bao.hua, osalvador, mike.kravetz,
mhocko, fam.zheng, duanxiongchun, david, corbet, chenhuang5,
bodeddub, songmuchun, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Muchun Song <songmuchun@bytedance.com>
Subject: mm: sparsemem: move vmemmap related to HugeTLB to CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
vmemmap_remap_free() and vmemmap_remap_alloc() are relevant to HugeTLB,
so move those functions under CONFIG_HUGETLB_PAGE_FREE_VMEMMAP.
Link: https://lkml.kernel.org/r/20211101031651.75851-6-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Chen Huang <chenhuang5@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/mm.h | 2 ++
mm/sparse-vmemmap.c | 2 ++
2 files changed, 4 insertions(+)
--- a/include/linux/mm.h~mm-sparsemem-move-vmemmap-related-to-hugetlb-to-config_hugetlb_page_free_vmemmap
+++ a/include/linux/mm.h
@@ -3146,10 +3146,12 @@ static inline void print_vma_addr(char *
}
#endif
+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
int vmemmap_remap_free(unsigned long start, unsigned long end,
unsigned long reuse);
int vmemmap_remap_alloc(unsigned long start, unsigned long end,
unsigned long reuse, gfp_t gfp_mask);
+#endif
void *sparse_buffer_alloc(unsigned long size);
struct page * __populate_section_memmap(unsigned long pfn,
--- a/mm/sparse-vmemmap.c~mm-sparsemem-move-vmemmap-related-to-hugetlb-to-config_hugetlb_page_free_vmemmap
+++ a/mm/sparse-vmemmap.c
@@ -34,6 +34,7 @@
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
/**
* struct vmemmap_remap_walk - walk vmemmap page table
*
@@ -419,6 +420,7 @@ int vmemmap_remap_alloc(unsigned long st
return 0;
}
+#endif /* CONFIG_HUGETLB_PAGE_FREE_VMEMMAP */
/*
* Allocate a block of memory to be used to back the virtual memory map
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 133/227] mm/hugetlb: generalize ARCH_WANT_GENERAL_HUGETLB
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: tglx, paul.walmsley, palmer, mingo, mike.kravetz, linux,
anshuman.khandual, akpm, patches, linux-mm, mm-commits, torvalds,
akpm
From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: mm/hugetlb: generalize ARCH_WANT_GENERAL_HUGETLB
The ARCH_WANT_GENERAL_HUGETLB config has duplicate definitions on the
platforms that subscribe to it. Instead, make it a generic config option
which can be selected on applicable platforms when required.
Link: https://lkml.kernel.org/r/1643718465-4324-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/arm/Kconfig | 4 +---
arch/riscv/Kconfig | 4 +---
arch/x86/Kconfig | 4 +---
mm/Kconfig | 3 +++
4 files changed, 6 insertions(+), 9 deletions(-)
--- a/arch/arm/Kconfig~mm-hugetlb-generalize-arch_want_general_hugetlb
+++ a/arch/arm/Kconfig
@@ -37,6 +37,7 @@ config ARM
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_USE_MEMTEST
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
+ select ARCH_WANT_GENERAL_HUGETLB
select ARCH_WANT_IPC_PARSE_VERSION
select ARCH_WANT_LD_ORPHAN_WARN
select BINFMT_FLAT_ARGVP_ENVP_ON_STACK
@@ -1508,9 +1509,6 @@ config HW_PERF_EVENTS
def_bool y
depends on ARM_PMU
-config ARCH_WANT_GENERAL_HUGETLB
- def_bool y
-
config ARM_MODULE_PLTS
bool "Use PLTs to allow module memory to spill over into vmalloc area"
depends on MODULES
--- a/arch/riscv/Kconfig~mm-hugetlb-generalize-arch_want_general_hugetlb
+++ a/arch/riscv/Kconfig
@@ -40,6 +40,7 @@ config RISCV
select ARCH_USE_MEMTEST
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
select ARCH_WANT_FRAME_POINTERS
+ select ARCH_WANT_GENERAL_HUGETLB
select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
select BUILDTIME_TABLE_SORT if MMU
@@ -171,9 +172,6 @@ config ARCH_SPARSEMEM_ENABLE
config ARCH_SELECT_MEMORY_MODEL
def_bool ARCH_SPARSEMEM_ENABLE
-config ARCH_WANT_GENERAL_HUGETLB
- def_bool y
-
config ARCH_SUPPORTS_UPROBES
def_bool y
--- a/arch/x86/Kconfig~mm-hugetlb-generalize-arch_want_general_hugetlb
+++ a/arch/x86/Kconfig
@@ -118,6 +118,7 @@ config X86
select ARCH_WANT_DEFAULT_BPF_JIT if X86_64
select ARCH_WANTS_DYNAMIC_TASK_STRUCT
select ARCH_WANTS_NO_INSTR
+ select ARCH_WANT_GENERAL_HUGETLB
select ARCH_WANT_HUGE_PMD_SHARE
select ARCH_WANT_LD_ORPHAN_WARN
select ARCH_WANTS_THP_SWAP if X86_64
@@ -347,9 +348,6 @@ config ARCH_NR_GPIO
config ARCH_SUSPEND_POSSIBLE
def_bool y
-config ARCH_WANT_GENERAL_HUGETLB
- def_bool y
-
config AUDIT_ARCH
def_bool y if X86_64
--- a/mm/Kconfig~mm-hugetlb-generalize-arch_want_general_hugetlb
+++ a/mm/Kconfig
@@ -414,6 +414,9 @@ choice
benefit.
endchoice
+config ARCH_WANT_GENERAL_HUGETLB
+ bool
+
config ARCH_WANTS_THP_SWAP
def_bool n
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 134/227] hugetlb: clean up potential spectre issue warnings
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: yaozhenguo1, mhocko, liuyuntao10, dan.carpenter, baolin.wang,
mike.kravetz, akpm, patches, linux-mm, mm-commits, torvalds,
akpm
From: Mike Kravetz <mike.kravetz@oracle.com>
Subject: hugetlb: clean up potential spectre issue warnings
Recently introduced code allows NUMA nodes to be specified on the kernel
command line for hugetlb allocations or CMA reservations. The node values
are user specified and used as indices into arrays. This generated the
following smatch warnings:
mm/hugetlb.c:4170 hugepages_setup() warn: potential spectre issue 'default_hugepages_in_node' [w]
mm/hugetlb.c:4172 hugepages_setup() warn: potential spectre issue 'parsed_hstate->max_huge_pages_node' [w]
mm/hugetlb.c:6898 cmdline_parse_hugetlb_cma() warn: potential spectre issue 'hugetlb_cma_size_in_node' [w] (local cap)
Clean up by using array_index_nospec() to sanitize array indices.
The routine cmdline_parse_hugetlb_cma has the same overflow/truncation
issue addressed in [1]. That is also fixed with this change.
[1] https://lore.kernel.org/linux-mm/20220209134018.8242-1-liuyuntao10@huawei.com/
As Michal pointed out, this is unlikely to be exploitable because it is
__init code. But the patch suppresses the warnings.
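For readers unfamiliar with the technique, here is a minimal userspace
sketch of mask-based index clamping, loosely modeled on the generic
fallback behind array_index_nospec(). It is an illustration under
assumptions, not the kernel implementation: index_mask() and
clamp_index() are hypothetical names, architectures may provide their own
variants, and the code relies on right shift of a negative signed long
being arithmetic (implementation-defined in C, but true for GCC and
Clang). Both index and size are assumed to be at most LONG_MAX.

```c
#include <limits.h>

#define LONG_BITS	(sizeof(long) * CHAR_BIT)

/*
 * Returns all ones when index < size and zero otherwise, computed
 * without a conditional branch that could be mispredicted.
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	/*
	 * For index < size neither operand of the OR has its top bit set,
	 * so the complement is negative and shifts down to all ones.
	 * For index >= size the subtraction wraps, the top bit is set,
	 * and the complement shifts down to zero.
	 */
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (LONG_BITS - 1));
}

/* In-range indices pass through; out-of-range ones collapse to 0. */
static unsigned long clamp_index(unsigned long index, unsigned long size)
{
	return index & index_mask(index, size);
}
```

As in the patch, the ordinary bounds check still comes first and rejects
bad input; the clamp only ensures that a speculatively executed array
access cannot use an out-of-range index.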
[mike.kravetz@oracle.com: v2]
Link: https://lkml.kernel.org/r/20220218212946.35441-1-mike.kravetz@oracle.com
Link: https://lkml.kernel.org/r/20220217234218.192885-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Cc: Liu Yuntao <liuyuntao10@huawei.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/mm/hugetlb.c~hugetlb-clean-up-potential-spectre-issue-warnings
+++ a/mm/hugetlb.c
@@ -31,6 +31,7 @@
#include <linux/llist.h>
#include <linux/cma.h>
#include <linux/migrate.h>
+#include <linux/nospec.h>
#include <asm/page.h>
#include <asm/pgalloc.h>
@@ -4161,7 +4162,7 @@ static int __init hugepages_setup(char *
}
if (tmp >= nr_online_nodes)
goto invalid;
- node = tmp;
+ node = array_index_nospec(tmp, nr_online_nodes);
p += count + 1;
/* Parse hugepages */
if (sscanf(p, "%lu%n", &tmp, &count) != 1)
@@ -6889,9 +6890,9 @@ static int __init cmdline_parse_hugetlb_
break;
if (s[count] == ':') {
- nid = tmp;
- if (nid < 0 || nid >= MAX_NUMNODES)
+ if (tmp >= MAX_NUMNODES)
break;
+ nid = array_index_nospec(tmp, MAX_NUMNODES);
s += count + 1;
tmp = memparse(s, &s);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 135/227] mm/hugetlb: use helper macro __ATTR_RW
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: songmuchun, mike.kravetz, linmiaohe, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/hugetlb: use helper macro __ATTR_RW
Use helper macro __ATTR_RW to define HSTATE_ATTR to make the code clearer.
Minor readability improvement.
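To illustrate why the two forms are equivalent, here is a small userspace
sketch. struct demo_attr, DEMO_ATTR() and DEMO_ATTR_RW() are hypothetical
stand-ins for struct kobj_attribute, __ATTR() and __ATTR_RW(); they only
mimic the shape of the kernel macros (mode 0644 plus show/store callbacks
derived from the attribute name by token pasting). The attribute name is
chosen purely for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, stripped-down stand-in for struct kobj_attribute. */
struct demo_attr {
	const char *name;
	unsigned short mode;
	int (*show)(char *buf);
	int (*store)(const char *buf);
};

/* Long form: every field spelled out by the caller. */
#define DEMO_ATTR(_name, _mode, _show, _store) \
	{ .name = #_name, .mode = (_mode), .show = (_show), .store = (_store) }

/* Short form: mode and callback names are derived from _name. */
#define DEMO_ATTR_RW(_name) \
	DEMO_ATTR(_name, 0644, _name##_show, _name##_store)

static int nr_hugepages_show(char *buf) { buf[0] = '\0'; return 0; }
static int nr_hugepages_store(const char *buf) { return (int)strlen(buf); }

static struct demo_attr long_form =
	DEMO_ATTR(nr_hugepages, 0644, nr_hugepages_show, nr_hugepages_store);
static struct demo_attr short_form = DEMO_ATTR_RW(nr_hugepages);
```

Both initializers produce the same name, mode and callbacks, which is why
the change is a pure readability cleanup.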
Link: https://lkml.kernel.org/r/20220222112731.33479-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-use-helper-macro-__attr_rw
+++ a/mm/hugetlb.c
@@ -3499,8 +3499,7 @@ static int demote_pool_huge_page(struct
static struct kobj_attribute _name##_attr = __ATTR_WO(_name)
#define HSTATE_ATTR(_name) \
- static struct kobj_attribute _name##_attr = \
- __ATTR(_name, 0644, _name##_show, _name##_store)
+ static struct kobj_attribute _name##_attr = __ATTR_RW(_name)
static struct kobject *hugepages_kobj;
static struct kobject *hstate_kobjs[HUGE_MAX_HSTATE];
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 136/227] mm/hugetlb.c: export PageHeadHuge()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: willy, mike.kravetz, kirill, hch, dhowells, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: David Howells <dhowells@redhat.com>
Subject: mm/hugetlb.c: export PageHeadHuge()
Export PageHeadHuge() - it's used by folio_test_hugetlb() and thence by
functions such as folio_file_page() and folio_contains(). Matthew suggested I use
the first of those instead of doing the same calculation manually - but I
can't call it from a module.
Kirill suggested rearranging things to put it in a header, but that
introduces header dependencies because of where constants are defined.
[akpm@linux-foundation.org: s/EXPORT_SYMBOL/EXPORT_SYMBOL_GPL/, per Christoph]
Link: https://lkml.kernel.org/r/2494562.1646054576@warthog.procyon.org.uk
Link: https://lore.kernel.org/r/163707085314.3221130.14783857863702203440.stgit@warthog.procyon.org.uk/
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/hugetlb.c~mm-export-pageheadhuge
+++ a/mm/hugetlb.c
@@ -1855,6 +1855,7 @@ int PageHeadHuge(struct page *page_head)
return page_head[1].compound_dtor == HUGETLB_PAGE_DTOR;
}
+EXPORT_SYMBOL_GPL(PageHeadHuge);
/*
* Find and lock address space (mapping) in write mode.
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 137/227] mm: remove unneeded local variable follflags
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: anshuman.khandual, linmiaohe, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm: remove unneeded local variable follflags
We can pass FOLL_GET | FOLL_DUMP to follow_page directly to simplify the
code a bit in add_page_for_migration and split_huge_pages_pid.
Link: https://lkml.kernel.org/r/20220311072002.35575-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/huge_memory.c | 4 +---
mm/migrate.c | 4 +---
2 files changed, 2 insertions(+), 6 deletions(-)
--- a/mm/huge_memory.c~mm-remove-unneeded-local-variable-follflags-v2
+++ a/mm/huge_memory.c
@@ -2953,7 +2953,6 @@ static int split_huge_pages_pid(int pid,
*/
for (addr = vaddr_start; addr < vaddr_end; addr += PAGE_SIZE) {
struct vm_area_struct *vma = find_vma(mm, addr);
- unsigned int follflags;
struct page *page;
if (!vma || addr < vma->vm_start)
@@ -2966,8 +2965,7 @@ static int split_huge_pages_pid(int pid,
}
/* FOLL_DUMP to ignore special (like zero) pages */
- follflags = FOLL_GET | FOLL_DUMP;
- page = follow_page(vma, addr, follflags);
+ page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
if (IS_ERR(page))
continue;
--- a/mm/migrate.c~mm-remove-unneeded-local-variable-follflags-v2
+++ a/mm/migrate.c
@@ -1611,7 +1611,6 @@ static int add_page_for_migration(struct
{
struct vm_area_struct *vma;
struct page *page;
- unsigned int follflags;
int err;
mmap_read_lock(mm);
@@ -1621,8 +1620,7 @@ static int add_page_for_migration(struct
goto out;
/* FOLL_DUMP to ignore special (like zero) pages */
- follflags = FOLL_GET | FOLL_DUMP;
- page = follow_page(vma, addr, follflags);
+ page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
err = PTR_ERR(page);
if (IS_ERR(page))
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 138/227] userfaultfd: provide unmasked address on page-fault
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: rppt, peterx, jack, david, aarcange, namit, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: Nadav Amit <namit@vmware.com>
Subject: userfaultfd: provide unmasked address on page-fault
Userfaultfd is supposed to provide the full address (i.e., unmasked) of
the faulting access back to userspace. However, that has not been the case
for quite some time.
Even running "userfaultfd_demo" from the userfaultfd man page provides the
wrong output (and contradicts the man page). Notice that
"UFFD_EVENT_PAGEFAULT event" shows the masked address (7fc5e30b3000) and
not the first read address (0x7fc5e30b300f).
Address returned by mmap() = 0x7fc5e30b3000
fault_handler_thread():
poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000
(uffdio_copy.copy returned 4096)
Read address 0x7fc5e30b300f in main(): A
Read address 0x7fc5e30b340f in main(): A
Read address 0x7fc5e30b380f in main(): A
Read address 0x7fc5e30b3c0f in main(): A
The exact address is useful for various reasons and specifically for
prefetching decisions. If it is known that the memory is populated by
certain objects whose size is not page-aligned, then based on the faulting
address, the uffd-monitor can decide whether to prefetch and prefault the
adjacent page.
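As a rough illustration of the prefetching decision described above: a
monitor that knows the size of the objects stored in the region can check
whether the object at the exact faulting address crosses into the next
page. Everything here is hypothetical (the function name, the fixed
4096-byte page size, and the assumption that the fault hits the first
byte of an object); it is only a sketch of the kind of check an unmasked
address makes possible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/*
 * Returns true when an object of obj_size bytes that starts at
 * fault_addr extends past the end of the faulting page, i.e. when
 * prefaulting the adjacent page is likely to pay off.
 */
static bool object_spans_next_page(uint64_t fault_addr, uint64_t obj_size)
{
	uint64_t offset = fault_addr & (DEMO_PAGE_SIZE - 1);

	return offset + obj_size > DEMO_PAGE_SIZE;
}
```

With a masked address the offset is always zero, so this check cannot
distinguish an access near the start of the page from one near its end.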
This bug has been in the kernel for quite some time: since commit
1a29d85eb0f1 ("mm: use vmf->address instead of of vmf->virtual_address"),
which dates back to 2016. A concern has been raised that existing
userspace applications might rely on the old/wrong behavior in which the
address is masked. Therefore, it was suggested to provide the masked
address unless the user explicitly asks for the exact address.
Add a new userfaultfd feature UFFD_FEATURE_EXACT_ADDRESS to direct
userfaultfd to provide the exact address. Add a new "real_address" field
to vmf to hold the unmasked address. Provide the address to userspace
accordingly.
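The resulting behavior can be sketched in userspace as follows. This
mirrors the logic the patch adds to userfault_msg(), but the names
(reported_fault_address(), DEMO_*) are hypothetical and a 4096-byte page
size is assumed; the feature bit value matches the (1<<11) definition
introduced by this patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL			/* assumption: 4 KiB pages */
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))
/* Same bit position as UFFD_FEATURE_EXACT_ADDRESS in this patch. */
#define DEMO_FEATURE_EXACT_ADDRESS (1ULL << 11)

/*
 * Address reported to the monitor: page-aligned by default, exact only
 * when the monitor opted in to the new feature.
 */
static uint64_t reported_fault_address(uint64_t real_address, uint64_t features)
{
	if (!(features & DEMO_FEATURE_EXACT_ADDRESS))
		real_address &= DEMO_PAGE_MASK;
	return real_address;
}
```

Keeping the masked address as the default preserves the behavior that
existing monitors have seen since 2016.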
Initialize real_address in various code-paths to be consistent with
address, even when it is not used, to be on the safe side.
[namit@vmware.com: initialize real_address on all code paths, per Jan]
Link: https://lkml.kernel.org/r/20220226022655.350562-1-namit@vmware.com
[akpm@linux-foundation.org: fix typo in comment, per Jan]
Link: https://lkml.kernel.org/r/20220218041003.3508-1-namit@vmware.com
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/userfaultfd.c | 5 ++++-
include/linux/mm.h | 3 ++-
include/uapi/linux/userfaultfd.h | 8 +++++++-
mm/hugetlb.c | 6 ++++--
mm/memory.c | 1 +
mm/swapfile.c | 1 +
6 files changed, 19 insertions(+), 5 deletions(-)
--- a/fs/userfaultfd.c~userfaultfd-provide-unmasked-address-on-page-fault
+++ a/fs/userfaultfd.c
@@ -198,6 +198,9 @@ static inline struct uffd_msg userfault_
struct uffd_msg msg;
msg_init(&msg);
msg.event = UFFD_EVENT_PAGEFAULT;
+
+ if (!(features & UFFD_FEATURE_EXACT_ADDRESS))
+ address &= PAGE_MASK;
msg.arg.pagefault.address = address;
/*
* These flags indicate why the userfault occurred:
@@ -482,7 +485,7 @@ vm_fault_t handle_userfault(struct vm_fa
init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function);
uwq.wq.private = current;
- uwq.msg = userfault_msg(vmf->address, vmf->flags, reason,
+ uwq.msg = userfault_msg(vmf->real_address, vmf->flags, reason,
ctx->features);
uwq.ctx = ctx;
uwq.waken = false;
--- a/include/linux/mm.h~userfaultfd-provide-unmasked-address-on-page-fault
+++ a/include/linux/mm.h
@@ -478,7 +478,8 @@ struct vm_fault {
struct vm_area_struct *vma; /* Target VMA */
gfp_t gfp_mask; /* gfp mask to be used for allocations */
pgoff_t pgoff; /* Logical page offset based on vma */
- unsigned long address; /* Faulting virtual address */
+ unsigned long address; /* Faulting virtual address - masked */
+ unsigned long real_address; /* Faulting virtual address - unmasked */
};
enum fault_flag flags; /* FAULT_FLAG_xxx flags
* XXX: should really be 'const' */
--- a/include/uapi/linux/userfaultfd.h~userfaultfd-provide-unmasked-address-on-page-fault
+++ a/include/uapi/linux/userfaultfd.h
@@ -32,7 +32,8 @@
UFFD_FEATURE_SIGBUS | \
UFFD_FEATURE_THREAD_ID | \
UFFD_FEATURE_MINOR_HUGETLBFS | \
- UFFD_FEATURE_MINOR_SHMEM)
+ UFFD_FEATURE_MINOR_SHMEM | \
+ UFFD_FEATURE_EXACT_ADDRESS)
#define UFFD_API_IOCTLS \
((__u64)1 << _UFFDIO_REGISTER | \
(__u64)1 << _UFFDIO_UNREGISTER | \
@@ -189,6 +190,10 @@ struct uffdio_api {
*
* UFFD_FEATURE_MINOR_SHMEM indicates the same support as
* UFFD_FEATURE_MINOR_HUGETLBFS, but for shmem-backed pages instead.
+ *
+ * UFFD_FEATURE_EXACT_ADDRESS indicates that the exact address of page
+ * faults would be provided and the offset within the page would not be
+ * masked.
*/
#define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0)
#define UFFD_FEATURE_EVENT_FORK (1<<1)
@@ -201,6 +206,7 @@ struct uffdio_api {
#define UFFD_FEATURE_THREAD_ID (1<<8)
#define UFFD_FEATURE_MINOR_HUGETLBFS (1<<9)
#define UFFD_FEATURE_MINOR_SHMEM (1<<10)
+#define UFFD_FEATURE_EXACT_ADDRESS (1<<11)
__u64 features;
__u64 ioctls;
--- a/mm/hugetlb.c~userfaultfd-provide-unmasked-address-on-page-fault
+++ a/mm/hugetlb.c
@@ -5341,6 +5341,7 @@ static inline vm_fault_t hugetlb_handle_
pgoff_t idx,
unsigned int flags,
unsigned long haddr,
+ unsigned long addr,
unsigned long reason)
{
vm_fault_t ret;
@@ -5348,6 +5349,7 @@ static inline vm_fault_t hugetlb_handle_
struct vm_fault vmf = {
.vma = vma,
.address = haddr,
+ .real_address = addr,
.flags = flags,
/*
@@ -5416,7 +5418,7 @@ retry:
/* Check for page in userfault range */
if (userfaultfd_missing(vma)) {
ret = hugetlb_handle_userfault(vma, mapping, idx,
- flags, haddr,
+ flags, haddr, address,
VM_UFFD_MISSING);
goto out;
}
@@ -5480,7 +5482,7 @@ retry:
unlock_page(page);
put_page(page);
ret = hugetlb_handle_userfault(vma, mapping, idx,
- flags, haddr,
+ flags, haddr, address,
VM_UFFD_MINOR);
goto out;
}
--- a/mm/memory.c~userfaultfd-provide-unmasked-address-on-page-fault
+++ a/mm/memory.c
@@ -4633,6 +4633,7 @@ static vm_fault_t __handle_mm_fault(stru
struct vm_fault vmf = {
.vma = vma,
.address = address & PAGE_MASK,
+ .real_address = address,
.flags = flags,
.pgoff = linear_page_index(vma, address),
.gfp_mask = __get_fault_gfp_mask(vma),
--- a/mm/swapfile.c~userfaultfd-provide-unmasked-address-on-page-fault
+++ a/mm/swapfile.c
@@ -1951,6 +1951,7 @@ static int unuse_pte_range(struct vm_are
struct vm_fault vmf = {
.vma = vma,
.address = addr,
+ .real_address = addr,
.pmd = pmd,
};
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 139/227] userfaultfd/selftests: fix uninitialized_var.cocci warning
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: shuah, guozhengkui, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Guo Zhengkui <guozhengkui@vivo.com>
Subject: userfaultfd/selftests: fix uninitialized_var.cocci warning
Fix the following coccicheck warning:
tools/testing/selftests/vm/userfaultfd.c:556:23-24:
WARNING this kind of initialization is deprecated
`unsigned long page_nr = *(&page_nr)` has the same form as the
uninitialized_var() macro. I removed the redundant assignment. It has
been tested with gcc (Debian 8.3.0-6) 8.3.0.
The patch which removed uninitialized_var() is:
https://lore.kernel.org/all/20121028102007.GA7547@gmail.com/ And there are
very few "/* GCC */" comments in the Linux kernel code now.
Link: https://lkml.kernel.org/r/20220304082333.9252-1-guozhengkui@vivo.com
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
tools/testing/selftests/vm/userfaultfd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/vm/userfaultfd.c~userfaultfd-selftests-fix-uninitialized_varcocci-warning
+++ a/tools/testing/selftests/vm/userfaultfd.c
@@ -540,7 +540,7 @@ static void continue_range(int ufd, __u6
static void *locking_thread(void *arg)
{
unsigned long cpu = (unsigned long) arg;
- unsigned long page_nr = *(&(page_nr)); /* uninitialized warning */
+ unsigned long page_nr;
unsigned long long count;
if (!(bounces & BOUNCE_RANDOM)) {
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 140/227] mm/fs: delete PF_SWAPWRITE
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: willy, neilb, jack, djwong, david, hughd, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: Hugh Dickins <hughd@google.com>
Subject: mm/fs: delete PF_SWAPWRITE
PF_SWAPWRITE has been redundant since v3.2 commit ee72886d8ed5 ("mm:
vmscan: do not writeback filesystem pages in direct reclaim").
Coincidentally, NeilBrown's current patch "remove inode_congested()"
deletes may_write_to_inode(), which appeared to be the one function which
took notice of PF_SWAPWRITE. But if you study the old logic, and the
conditions under which may_write_to_inode() was called, you discover that
flag and function have been pointless for a decade.
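For readers who have not met the idiom being deleted, here is a minimal
userspace sketch of the bracket that migrate_pages() used: set a per-task
flag for the duration of a call, and clear it afterwards only if it was not
already set on entry. The names struct task, DEMO_PF_SWAPWRITE,
with_swapwrite() and flag_is_set() are hypothetical stand-ins for
illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; only the flags word matters here. */
struct task {
	unsigned int flags;
};

/* Illustrative bit; it reuses the value of the flag this patch removes. */
#define DEMO_PF_SWAPWRITE 0x00800000u

/*
 * Run fn() with DEMO_PF_SWAPWRITE set, then put the flag back the way the
 * caller had it. Returns whatever fn() returns.
 */
static int with_swapwrite(struct task *t, int (*fn)(struct task *))
{
	/* Remember whether the caller already had the flag set. */
	unsigned int had_flag = t->flags & DEMO_PF_SWAPWRITE;
	int ret;

	assert(fn != NULL);

	if (!had_flag)
		t->flags |= DEMO_PF_SWAPWRITE;

	ret = fn(t);

	/* Only clear the flag if this function was the one that set it. */
	if (!had_flag)
		t->flags &= ~DEMO_PF_SWAPWRITE;

	return ret;
}

/* Example callback: reports whether the flag is visible while it runs. */
static int flag_is_set(struct task *t)
{
	return (t->flags & DEMO_PF_SWAPWRITE) != 0;
}
```

Once nothing reads the flag, both halves of such a bracket are dead code,
which is why the diff below can delete them without replacement.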
Link: https://lkml.kernel.org/r/75e80e7-742d-e3bd-531-614db8961e4@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Jan Kara <jack@suse.de>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/fs-writeback.c | 3 ---
fs/xfs/libxfs/xfs_btree.c | 2 +-
include/linux/sched.h | 1 -
mm/migrate.c | 7 -------
mm/vmscan.c | 8 ++------
5 files changed, 3 insertions(+), 18 deletions(-)
--- a/fs/fs-writeback.c~mm-fs-delete-pf_swapwrite
+++ a/fs/fs-writeback.c
@@ -2197,7 +2197,6 @@ void wb_workfn(struct work_struct *work)
long pages_written;
set_worker_desc("flush-%s", bdi_dev_name(wb->bdi));
- current->flags |= PF_SWAPWRITE;
if (likely(!current_is_workqueue_rescuer() ||
!test_bit(WB_registered, &wb->state))) {
@@ -2226,8 +2225,6 @@ void wb_workfn(struct work_struct *work)
wb_wakeup(wb);
else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
wb_wakeup_delayed(wb);
-
- current->flags &= ~PF_SWAPWRITE;
}
/*
--- a/fs/xfs/libxfs/xfs_btree.c~mm-fs-delete-pf_swapwrite
+++ a/fs/xfs/libxfs/xfs_btree.c
@@ -2818,7 +2818,7 @@ xfs_btree_split_worker(
* in any way.
*/
if (args->kswapd)
- new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
+ new_pflags |= PF_MEMALLOC | PF_KSWAPD;
current_set_flags_nested(&pflags, new_pflags);
xfs_trans_set_context(args->cur->bc_tp);
--- a/include/linux/sched.h~mm-fs-delete-pf_swapwrite
+++ a/include/linux/sched.h
@@ -1689,7 +1689,6 @@ extern struct pid *cad_pid;
* I am cleaning dirty pages from some other bdi. */
#define PF_KTHREAD 0x00200000 /* I am a kernel thread */
#define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */
-#define PF_SWAPWRITE 0x00800000 /* Allowed to write to swap */
#define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */
#define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */
#define PF_MEMALLOC_PIN 0x10000000 /* Allocation context constrained to zones which allow long term pinning. */
--- a/mm/migrate.c~mm-fs-delete-pf_swapwrite
+++ a/mm/migrate.c
@@ -1350,7 +1350,6 @@ int migrate_pages(struct list_head *from
bool is_thp = false;
struct page *page;
struct page *page2;
- int swapwrite = current->flags & PF_SWAPWRITE;
int rc, nr_subpages;
LIST_HEAD(ret_pages);
LIST_HEAD(thp_split_pages);
@@ -1359,9 +1358,6 @@ int migrate_pages(struct list_head *from
trace_mm_migrate_pages_start(mode, reason);
- if (!swapwrite)
- current->flags |= PF_SWAPWRITE;
-
thp_subpage_migration:
for (pass = 0; pass < 10 && (retry || thp_retry); pass++) {
retry = 0;
@@ -1516,9 +1512,6 @@ out:
trace_mm_migrate_pages(nr_succeeded, nr_failed_pages, nr_thp_succeeded,
nr_thp_failed, nr_thp_split, mode, reason);
- if (!swapwrite)
- current->flags &= ~PF_SWAPWRITE;
-
if (ret_succeeded)
*ret_succeeded = nr_succeeded;
--- a/mm/vmscan.c~mm-fs-delete-pf_swapwrite
+++ a/mm/vmscan.c
@@ -4457,7 +4457,7 @@ static int kswapd(void *p)
* us from recursively trying to free more memory as we're
* trying to free the first piece of memory in the first place).
*/
- tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
+ tsk->flags |= PF_MEMALLOC | PF_KSWAPD;
set_freezable();
WRITE_ONCE(pgdat->kswapd_order, 0);
@@ -4508,7 +4508,7 @@ kswapd_try_sleep:
goto kswapd_try_sleep;
}
- tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
+ tsk->flags &= ~(PF_MEMALLOC | PF_KSWAPD);
return 0;
}
@@ -4749,11 +4749,8 @@ static int __node_reclaim(struct pglist_
fs_reclaim_acquire(sc.gfp_mask);
/*
* We need to be able to allocate from the reserves for RECLAIM_UNMAP
- * and we also need to be able to write out pages for RECLAIM_WRITE
- * and RECLAIM_UNMAP.
*/
noreclaim_flag = memalloc_noreclaim_save();
- p->flags |= PF_SWAPWRITE;
set_task_reclaim_state(p, &sc.reclaim_state);
if (node_pagecache_reclaimable(pgdat) > pgdat->min_unmapped_pages) {
@@ -4767,7 +4764,6 @@ static int __node_reclaim(struct pglist_
}
set_task_reclaim_state(p, NULL);
- current->flags &= ~PF_SWAPWRITE;
memalloc_noreclaim_restore(noreclaim_flag);
fs_reclaim_release(sc.gfp_mask);
psi_memstall_leave(&pflags);
_
* [patch 141/227] mm: __isolate_lru_page_prepare() in isolate_migratepages_block()
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: rientjes, alexs, alexander.duyck, hughd, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Hugh Dickins <hughd@google.com>
Subject: mm: __isolate_lru_page_prepare() in isolate_migratepages_block()
__isolate_lru_page_prepare() conflates two unrelated functions, with the
flags to one disjoint from the flags to the other; and hides some of the
important checks outside of isolate_migratepages_block(), where the
sequence is better to be visible. It comes from the days of lumpy
reclaim, before compaction, when the combination made more sense.
Move what's needed by mm/compaction.c isolate_migratepages_block() inline
there, and what's needed by mm/vmscan.c isolate_lru_pages() inline there.
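As a rough illustration of why the two callers do not need a shared helper,
the checks can be written as two independent predicates over a mock page.
This is only a sketch: struct demo_page, the DEMO_ISOLATE_* bits and both
function names are hypothetical, the bit values are arbitrary, and the
trylock_page() step that stabilises the mapping is reduced to a plain field
read.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode bits; names echo the kernel's, values are arbitrary. */
#define DEMO_ISOLATE_UNMAPPED      0x2u
#define DEMO_ISOLATE_ASYNC_MIGRATE 0x4u
#define DEMO_ISOLATE_UNEVICTABLE   0x8u

/* Hypothetical stand-in for struct page: one bool per property tested. */
struct demo_page {
	bool lru;		/* PageLRU() */
	bool unevictable;	/* PageUnevictable() */
	bool writeback;		/* PageWriteback() */
	bool dirty;		/* PageDirty() */
	bool has_mapping;	/* page_mapping() != NULL */
	bool can_migrate;	/* mapping has a ->migratepage callback */
	bool mapped;		/* page_mapped() */
};

/* Checks wanted by the compaction side only. */
static bool compaction_may_isolate(const struct demo_page *p, unsigned int mode)
{
	/* Compaction never passes the reclaim-only bit. */
	assert(!(mode & DEMO_ISOLATE_UNMAPPED));

	if (!p->lru)
		return false;
	if (!(mode & DEMO_ISOLATE_UNEVICTABLE) && p->unevictable)
		return false;
	if ((mode & DEMO_ISOLATE_ASYNC_MIGRATE) && p->writeback)
		return false;
	if ((mode & DEMO_ISOLATE_ASYNC_MIGRATE) && p->dirty) {
		/* Dirty pages migrate without blocking only in these cases. */
		if (p->has_mapping && !p->can_migrate)
			return false;
	}
	return true;
}

/* Checks wanted by the reclaim side only: no mode bits are needed at all. */
static bool reclaim_may_isolate(const struct demo_page *p, bool may_unmap)
{
	if (!p->lru)
		return false;
	if (!may_unmap && p->mapped)
		return false;
	return true;
}
```

The only test the two predicates share is the PageLRU() one, so splitting
them costs a single duplicated line.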
Shorten "isolate_mode" to "mode", so the sequence of conditions is easier
to read. Declare a "mapping" variable, to save one call to page_mapping()
(but not another: calling again after page is locked is necessary).
Simplify isolate_lru_pages() with a "move_to" list pointer.
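The "move_to" simplification can be sketched in userspace as follows: each
item defaults to going back where it came from, every check either
redirects it or jumps to a single move site, and only the success path
overrides the destination. This is an illustration under assumptions:
struct item, struct bin, bin_add() and isolate_items() are hypothetical,
and fixed-size arrays stand in for the kernel's list_head lists.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX 16

/* Hypothetical stand-in for a page: just the properties the checks read. */
struct item {
	int zone;	/* compared against the reclaim index */
	int on_lru;	/* analogous to PageLRU() */
	int mapped;	/* analogous to page_mapped() */
};

/* Hypothetical fixed-size bin standing in for a struct list_head. */
struct bin {
	const struct item *slot[DEMO_MAX];
	size_t count;
};

static void bin_add(struct bin *b, const struct item *it)
{
	assert(b->count < DEMO_MAX);
	b->slot[b->count++] = it;
}

/*
 * Sort n items into three bins and return how many went to "dst".
 * There is exactly one bin_add() call, at the "move" label.
 */
static size_t isolate_items(const struct item *items, size_t n,
			    int reclaim_idx, int may_unmap,
			    struct bin *src, struct bin *dst,
			    struct bin *skipped)
{
	size_t taken = 0;

	for (size_t i = 0; i < n; i++) {
		const struct item *it = &items[i];
		struct bin *move_to = src;	/* default: put it back */

		if (it->zone > reclaim_idx) {
			move_to = skipped;
			goto move;
		}
		if (!it->on_lru)
			goto move;
		if (!may_unmap && it->mapped)
			goto move;

		taken++;
		move_to = dst;
move:
		bin_add(move_to, it);
	}
	return taken;
}
```

Compared with repeating "move; continue;" in every failure branch, the
single move site makes it harder to forget the move on a newly added check.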
Link: https://lkml.kernel.org/r/879d62a8-91cc-d3c6-fb3b-69768236df68@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Alex Shi <alexs@kernel.org>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/swap.h | 1
mm/compaction.c | 51 +++++++++++++++++---
mm/vmscan.c | 101 +++++++----------------------------------
3 files changed, 62 insertions(+), 91 deletions(-)
--- a/include/linux/swap.h~mm-__isolate_lru_page_prepare-in-isolate_migratepages_block
+++ a/include/linux/swap.h
@@ -387,7 +387,6 @@ extern void lru_cache_add_inactive_or_un
extern unsigned long zone_reclaimable_pages(struct zone *zone);
extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
gfp_t gfp_mask, nodemask_t *mask);
-extern bool __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode);
extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
unsigned long nr_pages,
gfp_t gfp_mask,
--- a/mm/compaction.c~mm-__isolate_lru_page_prepare-in-isolate_migratepages_block
+++ a/mm/compaction.c
@@ -785,7 +785,7 @@ static bool too_many_isolated(pg_data_t
* @cc: Compaction control structure.
* @low_pfn: The first PFN to isolate
* @end_pfn: The one-past-the-last PFN to isolate, within same pageblock
- * @isolate_mode: Isolation mode to be used.
+ * @mode: Isolation mode to be used.
*
* Isolate all pages that can be migrated from the range specified by
* [low_pfn, end_pfn). The range is expected to be within same pageblock.
@@ -798,7 +798,7 @@ static bool too_many_isolated(pg_data_t
*/
static int
isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
- unsigned long end_pfn, isolate_mode_t isolate_mode)
+ unsigned long end_pfn, isolate_mode_t mode)
{
pg_data_t *pgdat = cc->zone->zone_pgdat;
unsigned long nr_scanned = 0, nr_isolated = 0;
@@ -806,6 +806,7 @@ isolate_migratepages_block(struct compac
unsigned long flags = 0;
struct lruvec *locked = NULL;
struct page *page = NULL, *valid_page = NULL;
+ struct address_space *mapping;
unsigned long start_pfn = low_pfn;
bool skip_on_failure = false;
unsigned long next_skip_pfn = 0;
@@ -990,7 +991,7 @@ isolate_migratepages_block(struct compac
locked = NULL;
}
- if (!isolate_movable_page(page, isolate_mode))
+ if (!isolate_movable_page(page, mode))
goto isolate_success;
}
@@ -1002,15 +1003,15 @@ isolate_migratepages_block(struct compac
* so avoid taking lru_lock and isolating it unnecessarily in an
* admittedly racy check.
*/
- if (!page_mapping(page) &&
- page_count(page) > page_mapcount(page))
+ mapping = page_mapping(page);
+ if (!mapping && page_count(page) > page_mapcount(page))
goto isolate_fail;
/*
* Only allow to migrate anonymous pages in GFP_NOFS context
* because those do not depend on fs locks.
*/
- if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page))
+ if (!(cc->gfp_mask & __GFP_FS) && mapping)
goto isolate_fail;
/*
@@ -1021,9 +1022,45 @@ isolate_migratepages_block(struct compac
if (unlikely(!get_page_unless_zero(page)))
goto isolate_fail;
- if (!__isolate_lru_page_prepare(page, isolate_mode))
+ /* Only take pages on LRU: a check now makes later tests safe */
+ if (!PageLRU(page))
goto isolate_fail_put;
+ /* Compaction might skip unevictable pages but CMA takes them */
+ if (!(mode & ISOLATE_UNEVICTABLE) && PageUnevictable(page))
+ goto isolate_fail_put;
+
+ /*
+ * To minimise LRU disruption, the caller can indicate with
+ * ISOLATE_ASYNC_MIGRATE that it only wants to isolate pages
+ * it will be able to migrate without blocking - clean pages
+ * for the most part. PageWriteback would require blocking.
+ */
+ if ((mode & ISOLATE_ASYNC_MIGRATE) && PageWriteback(page))
+ goto isolate_fail_put;
+
+ if ((mode & ISOLATE_ASYNC_MIGRATE) && PageDirty(page)) {
+ bool migrate_dirty;
+
+ /*
+ * Only pages without mappings or that have a
+ * ->migratepage callback are possible to migrate
+ * without blocking. However, we can be racing with
+ * truncation so it's necessary to lock the page
+ * to stabilise the mapping as truncation holds
+ * the page lock until after the page is removed
+ * from the page cache.
+ */
+ if (!trylock_page(page))
+ goto isolate_fail_put;
+
+ mapping = page_mapping(page);
+ migrate_dirty = !mapping || mapping->a_ops->migratepage;
+ unlock_page(page);
+ if (!migrate_dirty)
+ goto isolate_fail_put;
+ }
+
/* Try isolate the page */
if (!TestClearPageLRU(page))
goto isolate_fail_put;
--- a/mm/vmscan.c~mm-__isolate_lru_page_prepare-in-isolate_migratepages_block
+++ a/mm/vmscan.c
@@ -1999,69 +1999,6 @@ unsigned int reclaim_clean_pages_from_li
}
/*
- * Attempt to remove the specified page from its LRU. Only take this page
- * if it is of the appropriate PageActive status. Pages which are being
- * freed elsewhere are also ignored.
- *
- * page: page to consider
- * mode: one of the LRU isolation modes defined above
- *
- * returns true on success, false on failure.
- */
-bool __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode)
-{
- /* Only take pages on the LRU. */
- if (!PageLRU(page))
- return false;
-
- /* Compaction should not handle unevictable pages but CMA can do so */
- if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
- return false;
-
- /*
- * To minimise LRU disruption, the caller can indicate that it only
- * wants to isolate pages it will be able to operate on without
- * blocking - clean pages for the most part.
- *
- * ISOLATE_ASYNC_MIGRATE is used to indicate that it only wants to pages
- * that it is possible to migrate without blocking
- */
- if (mode & ISOLATE_ASYNC_MIGRATE) {
- /* All the caller can do on PageWriteback is block */
- if (PageWriteback(page))
- return false;
-
- if (PageDirty(page)) {
- struct address_space *mapping;
- bool migrate_dirty;
-
- /*
- * Only pages without mappings or that have a
- * ->migratepage callback are possible to migrate
- * without blocking. However, we can be racing with
- * truncation so it's necessary to lock the page
- * to stabilise the mapping as truncation holds
- * the page lock until after the page is removed
- * from the page cache.
- */
- if (!trylock_page(page))
- return false;
-
- mapping = page_mapping(page);
- migrate_dirty = !mapping || mapping->a_ops->migratepage;
- unlock_page(page);
- if (!migrate_dirty)
- return false;
- }
- }
-
- if ((mode & ISOLATE_UNMAPPED) && page_mapped(page))
- return false;
-
- return true;
-}
-
-/*
* Update LRU sizes after isolating pages. The LRU size updates must
* be complete before mem_cgroup_update_lru_size due to a sanity check.
*/
@@ -2112,11 +2049,11 @@ static unsigned long isolate_lru_pages(u
unsigned long skipped = 0;
unsigned long scan, total_scan, nr_pages;
LIST_HEAD(pages_skipped);
- isolate_mode_t mode = (sc->may_unmap ? 0 : ISOLATE_UNMAPPED);
total_scan = 0;
scan = 0;
while (scan < nr_to_scan && !list_empty(src)) {
+ struct list_head *move_to = src;
struct page *page;
page = lru_to_page(src);
@@ -2126,9 +2063,9 @@ static unsigned long isolate_lru_pages(u
total_scan += nr_pages;
if (page_zonenum(page) > sc->reclaim_idx) {
- list_move(&page->lru, &pages_skipped);
nr_skipped[page_zonenum(page)] += nr_pages;
- continue;
+ move_to = &pages_skipped;
+ goto move;
}
/*
@@ -2136,37 +2073,34 @@ static unsigned long isolate_lru_pages(u
* return with no isolated pages if the LRU mostly contains
* ineligible pages. This causes the VM to not reclaim any
* pages, triggering a premature OOM.
- *
- * Account all tail pages of THP. This would not cause
- * premature OOM since __isolate_lru_page() returns -EBUSY
- * only when the page is being freed somewhere else.
+ * Account all tail pages of THP.
*/
scan += nr_pages;
- if (!__isolate_lru_page_prepare(page, mode)) {
- /* It is being freed elsewhere */
- list_move(&page->lru, src);
- continue;
- }
+
+ if (!PageLRU(page))
+ goto move;
+ if (!sc->may_unmap && page_mapped(page))
+ goto move;
+
/*
* Be careful not to clear PageLRU until after we're
* sure the page is not being freed elsewhere -- the
* page release code relies on it.
*/
- if (unlikely(!get_page_unless_zero(page))) {
- list_move(&page->lru, src);
- continue;
- }
+ if (unlikely(!get_page_unless_zero(page)))
+ goto move;
if (!TestClearPageLRU(page)) {
/* Another thread is already isolating this page */
put_page(page);
- list_move(&page->lru, src);
- continue;
+ goto move;
}
nr_taken += nr_pages;
nr_zone_taken[page_zonenum(page)] += nr_pages;
- list_move(&page->lru, dst);
+ move_to = dst;
+move:
+ list_move(&page->lru, move_to);
}
/*
@@ -2190,7 +2124,8 @@ static unsigned long isolate_lru_pages(u
}
*nr_scanned = total_scan;
trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan,
- total_scan, skipped, nr_taken, mode, lru);
+ total_scan, skipped, nr_taken,
+ sc->may_unmap ? 0 : ISOLATE_UNMAPPED, lru);
update_lru_sizes(lruvec, lru, nr_zone_taken);
return nr_taken;
}
_
* [patch 142/227] mm/list_lru: optimize memcg_reparent_list_lru_node()
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: songmuchun, shakeelb, roman.gushchin, mhocko, hannes, longman,
akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Waiman Long <longman@redhat.com>
Subject: mm/list_lru: optimize memcg_reparent_list_lru_node()
Since commit 2c80cd57c743 ("mm/list_lru.c: fix list_lru_count_node() to be
race free"), we are tracking the total number of lru entries in a
list_lru_node in its nr_items field. In the case of
memcg_reparent_list_lru_node(), there is nothing to be done if nr_items is
0. We don't even need to take the nlru->lock as no new lru entry could be
added by a racing list_lru_add() to the draining src_idx memcg at this
point.
On systems that serve a lot of containers, it is possible that there can
be thousands of list_lru's present due to the fact that each container may
mount its own container specific filesystems. As a typical container uses
only a few cpus, it is likely that only the list_lru_node that contains
those cpus will be utilized while the rest may be empty. In other words,
there can be a lot of list_lru_nodes with 0 nr_items. By skipping the
lock/unlock operation and the load of a cacheline from memcg_lrus, a
sizeable number of cpu cycles can be saved. That can be substantial if we
are talking about thousands of list_lru_nodes with 0 nr_items.
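The shape of the fast path can be sketched in userspace with C11 atomics.
This is a hedged illustration only: struct demo_node and the demo_*()
helpers are hypothetical, a relaxed atomic load stands in for READ_ONCE(),
and an atomic_flag spinlock stands in for nlru->lock. The sketch shows the
control flow; it does not itself demonstrate the claim above that no racing
list_lru_add() can target the draining memcg, which is what makes the
unlocked check safe in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for list_lru_node. */
struct demo_node {
	atomic_flag locked;	/* stands in for nlru->lock */
	atomic_long nr_items;	/* total entries in this node */
	long src_items;		/* entries owned by the dying memcg */
	long dst_items;		/* entries owned by its parent */
	long lock_acquisitions;	/* instrumentation for the demo only */
};

static void demo_node_init(struct demo_node *n, long src_items)
{
	atomic_flag_clear(&n->locked);
	atomic_init(&n->nr_items, src_items);
	n->src_items = src_items;
	n->dst_items = 0;
	n->lock_acquisitions = 0;
}

static void demo_lock(struct demo_node *n)
{
	while (atomic_flag_test_and_set_explicit(&n->locked,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
	n->lock_acquisitions++;
}

static void demo_unlock(struct demo_node *n)
{
	atomic_flag_clear_explicit(&n->locked, memory_order_release);
}

/* Move all src entries to dst; returns the number of entries moved. */
static long demo_reparent_node(struct demo_node *n)
{
	long moved;

	/* Fast path: an empty node needs neither the lock nor the lists. */
	if (!atomic_load_explicit(&n->nr_items, memory_order_relaxed))
		return 0;

	demo_lock(n);
	/* The total is unchanged by reparenting; only ownership moves. */
	assert(n->src_items + n->dst_items ==
	       atomic_load_explicit(&n->nr_items, memory_order_relaxed));
	moved = n->src_items;
	n->dst_items += moved;
	n->src_items = 0;
	demo_unlock(n);

	return moved;
}
```

With thousands of empty nodes, each call then costs one load instead of a
lock/unlock pair plus the list accesses.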
Link: https://lkml.kernel.org/r/20220309144000.1470138-1-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/list_lru.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/mm/list_lru.c~mm-list_lru-optimize-memcg_reparent_list_lru_node
+++ a/mm/list_lru.c
@@ -395,6 +395,12 @@ static void memcg_reparent_list_lru_node
struct list_lru_one *src, *dst;
/*
+ * If there is no lru entry in this nlru, we can skip it immediately.
+ */
+ if (!READ_ONCE(nlru->nr_items))
+ return;
+
+ /*
* Since list_lru_{add,del} may be called under an IRQ-safe lock,
* we have to use IRQ-safe primitives here to avoid deadlock.
*/
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 143/227] mm: lru_cache_disable: replace work queue synchronization with synchronize_rcu
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: willy, tglx, paulmck, nsaenzju, minchan, mgorman, juri.lelli,
bigeasy, mtosatti, akpm, patches, linux-mm, mm-commits, torvalds,
akpm
From: Marcelo Tosatti <mtosatti@redhat.com>
Subject: mm: lru_cache_disable: replace work queue synchronization with synchronize_rcu
On systems that run FIFO:1 applications that busy loop, any SCHED_OTHER
task that attempts to execute on such a CPU (such as work threads) will
not be scheduled, which leads to system hangs.
Commit d479960e44f27e0e5 ("mm: disable LRU pagevec during the migration
temporarily") relies on queueing work items on all online CPUs to ensure
visibility of lru_disable_count.
To fix this, replace the usage of work items with synchronize_rcu,
which provides the same guarantees.
Readers of lru_disable_count are protected by either disabling
preemption or rcu_read_lock:
preempt_disable, local_irq_disable [bh_lru_lock()]
rcu_read_lock [rt_spin_lock CONFIG_PREEMPT_RT]
preempt_disable [local_lock !CONFIG_PREEMPT_RT]
Since v5.1 kernel, synchronize_rcu() is guaranteed to wait on
preempt_disable() regions of code. So any CPU which sees
lru_disable_count = 0 will have exited the critical section when
synchronize_rcu() returns.
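The ordering argument can be modelled in userspace with C11 atomics and
pthreads. This is a toy sketch, not kernel RCU and not liburcu: every
toy_* name is hypothetical, a per-reader flag stands in for a
preempt-disabled region, and toy_synchronize() is a simplistic stand-in
for a grace period. It assumes sequentially consistent atomics, which is
the default for the calls used here.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_READERS 4
#define NR_ROUNDS  100000

static atomic_int  disable_count;            /* models lru_disable_count */
static atomic_bool in_section[NR_READERS];   /* models a preempt-disabled region */
static atomic_int  pending;                  /* models pages parked in a pagevec */

/* Reader: enter the section, then decide whether caching is still allowed. */
static void *toy_reader(void *arg)
{
	int id = (int)(long)arg;

	for (int i = 0; i < NR_ROUNDS; i++) {
		atomic_store(&in_section[id], true);
		if (atomic_load(&disable_count) == 0)
			atomic_fetch_add(&pending, 1);   /* "cache" a page */
		atomic_store(&in_section[id], false);
	}
	return NULL;
}

/* Toy grace period: wait until each reader has been seen outside a section. */
static void toy_synchronize(void)
{
	for (int id = 0; id < NR_READERS; id++)
		while (atomic_load(&in_section[id]))
			sched_yield();
}

/* Returns the number of pages cached after the disable completed. */
static int toy_run(void)
{
	pthread_t tid[NR_READERS];

	for (long id = 0; id < NR_READERS; id++)
		pthread_create(&tid[id], NULL, toy_reader, (void *)id);

	atomic_fetch_add(&disable_count, 1);  /* like atomic_inc() in the patch */
	toy_synchronize();                    /* stands in for synchronize_rcu() */
	atomic_exchange(&pending, 0);         /* stands in for the drain */

	for (int id = 0; id < NR_READERS; id++)
		pthread_join(tid[id], NULL);
	return atomic_load(&pending);
}
```

In this model a reader that saw a zero count has left its section before
toy_synchronize() returns, and any later section sees a non-zero count, so
nothing is cached after the drain. No work has to be queued on the reader
threads, which is the property the patch wants on CPUs monopolized by a
FIFO task.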
Link: https://lkml.kernel.org/r/Yin7hDxdt0s/x+fp@fuller.cnet
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/swap.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)
--- a/mm/swap.c~mm-lru_cache_disable-replace-work-queue-synchronization-with-synchronize_rcu
+++ a/mm/swap.c
@@ -831,8 +831,7 @@ inline void __lru_add_drain_all(bool for
for_each_online_cpu(cpu) {
struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
- if (force_all_cpus ||
- pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
+ if (pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
data_race(pagevec_count(&per_cpu(lru_rotate.pvec, cpu))) ||
pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) ||
pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) ||
@@ -876,15 +875,21 @@ atomic_t lru_disable_count = ATOMIC_INIT
void lru_cache_disable(void)
{
atomic_inc(&lru_disable_count);
-#ifdef CONFIG_SMP
/*
- * lru_add_drain_all in the force mode will schedule draining on
- * all online CPUs so any calls of lru_cache_disabled wrapped by
- * local_lock or preemption disabled would be ordered by that.
- * The atomic operation doesn't need to have stronger ordering
- * requirements because that is enforced by the scheduling
- * guarantees.
+ * Readers of lru_disable_count are protected by either disabling
+ * preemption or rcu_read_lock:
+ *
+ * preempt_disable, local_irq_disable [bh_lru_lock()]
+ * rcu_read_lock [rt_spin_lock CONFIG_PREEMPT_RT]
+ * preempt_disable [local_lock !CONFIG_PREEMPT_RT]
+ *
+ * Since v5.1 kernel, synchronize_rcu() is guaranteed to wait on
+ * preempt_disable() regions of code. So any CPU which sees
+ * lru_disable_count = 0 will have exited the critical
+ * section when synchronize_rcu() returns.
*/
+ synchronize_rcu();
+#ifdef CONFIG_SMP
__lru_add_drain_all(true);
#else
lru_add_and_bh_lrus_drain();
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 144/227] mm: workingset: replace IRQ-off check with a lockdep assert.
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: tj, tglx, lizefan.x, hannes, bigeasy, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: mm: workingset: replace IRQ-off check with a lockdep assert.
Commit 68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow
nodes") introduced an IRQ-off check to ensure that a lock is held which
also disabled interrupts. This does not work the same way on PREEMPT_RT
because none of the locks that are held disable interrupts.
Replace this check with a lockdep assert which ensures that the lock is
held.
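As a rough userspace illustration of the new check: the lock is found by
walking from the embedded array back to its containing object, and the
assertion is about that lock rather than about the interrupt state. All
toy_* names are hypothetical, and a plain "held" flag is a simplification
of what lockdep actually tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace version of the kernel's container_of(), without type checking. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_lock   { bool held; };                 /* hypothetical lock with a "held" flag */
struct toy_xarray { struct toy_lock xa_lock; };
struct toy_mapping {                              /* stands in for struct address_space */
	long host_id;
	struct toy_xarray i_pages;
};
struct toy_node { struct toy_xarray *array; };    /* stands in for struct xa_node */

/* Checks lock ownership instead of asking whether interrupts are off. */
static bool toy_update_node_allowed(struct toy_node *node)
{
	struct toy_mapping *mapping =
		toy_container_of(node->array, struct toy_mapping, i_pages);

	return mapping->i_pages.xa_lock.held;
}
```

The result depends only on the lock, so it stays meaningful on a
configuration where taking the lock does not disable interrupts.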
Link: https://lkml.kernel.org/r/20220301122143.1521823-3-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/workingset.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/workingset.c~mm-workingset-replace-irq-off-check-with-a-lockdep-assert
+++ a/mm/workingset.c
@@ -433,6 +433,8 @@ struct list_lru shadow_nodes;
void workingset_update_node(struct xa_node *node)
{
+ struct address_space *mapping;
+
/*
* Track non-empty nodes that contain only shadow entries;
* unlink those that contain pages or are being freed.
@@ -441,7 +443,8 @@ void workingset_update_node(struct xa_no
* already where they should be. The list_empty() test is safe
* as node->private_list is protected by the i_pages lock.
*/
- VM_WARN_ON_ONCE(!irqs_disabled()); /* For __inc_lruvec_page_state */
+ mapping = container_of(node->array, struct address_space, i_pages);
+ lockdep_assert_held(&mapping->i_pages.xa_lock);
if (node->count && node->count == node->nr_values) {
if (list_empty(&node->private_list)) {
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 145/227] mm: vmscan: fix documentation for page_check_references()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: vbabka, mhocko, iamjoonsoo.kim, hannes, quic_charante, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Charan Teja Kalla <quic_charante@quicinc.com>
Subject: mm: vmscan: fix documentation for page_check_references()
Commit b518154e59aa ("mm/vmscan: protect the workingset on anonymous LRU")
requires looking twice at both mapped anon and file pages, to see whether
they are used more than once, before deciding between reclaim and
activation. Correct the documentation accordingly.
Link: https://lkml.kernel.org/r/1646925640-21324-1-git-send-email-quic_charante@quicinc.com
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/vmscan.c~mm-vmscan-fix-documentation-for-page_check_references
+++ a/mm/vmscan.c
@@ -1385,7 +1385,7 @@ static enum page_references page_check_r
/*
* All mapped pages start out with page table
* references from the instantiating fault, so we need
- * to look twice if a mapped file page is used more
+ * to look twice if a mapped file/anon page is used more
* than once.
*
* Mark it and spare it for another trip around the
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 146/227] mm: compaction: cleanup the compaction trace events
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: rostedt, mingo, baolin.wang, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Baolin Wang <baolin.wang@linux.alibaba.com>
Subject: mm: compaction: cleanup the compaction trace events
As Steven suggested [1], we should pass the pointers to the trace event
and dereference them there, rather than at the tracepoint call site, so
that no dereference happens when the tracepoint is disabled.
[1] https://lkml.org/lkml/2021/11/3/409
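The pattern can be sketched outside the kernel as follows. This is an
illustration only: the toy_* names are hypothetical, a plain bool stands
in for the static key that guards a real tracepoint, and
toy_fast_assign() plays the role of TP_fast_assign().

```c
#include <assert.h>
#include <stdbool.h>

struct toy_cc {                 /* stands in for struct compact_control */
	unsigned long migrate_pfn;
	unsigned long free_pfn;
};

struct toy_entry {              /* stands in for the trace ring-buffer record */
	unsigned long migrate_pfn;
	unsigned long free_pfn;
};

static bool toy_trace_enabled;  /* the kernel uses a static key here */
static struct toy_entry toy_last;
static int toy_records;

/* Counterpart of TP_fast_assign(): runs only when the event is enabled. */
static void toy_fast_assign(const struct toy_cc *cc)
{
	toy_last.migrate_pfn = cc->migrate_pfn;
	toy_last.free_pfn = cc->free_pfn;
	toy_records++;
}

/* Call site passes only the pointer; no field is read on the disabled path. */
static inline void toy_trace_begin(const struct toy_cc *cc)
{
	if (toy_trace_enabled)
		toy_fast_assign(cc);
}
```

With toy_trace_enabled false, toy_trace_begin(&cc) returns without
touching cc, whereas a call such as toy_trace_begin_old(cc->migrate_pfn,
cc->free_pfn) (a hypothetical old-style signature) would load both fields
before the enabled check.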
Link: https://lkml.kernel.org/r/4cd393b4d57f8f01ed72c001509b28e3a3b1a8c1.1646985115.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/trace/events/compaction.h | 26 +++++++++++++-------------
mm/compaction.c | 9 +++------
2 files changed, 16 insertions(+), 19 deletions(-)
--- a/include/trace/events/compaction.h~mm-compaction-cleanup-the-compaction-trace-events
+++ a/include/trace/events/compaction.h
@@ -67,10 +67,10 @@ DEFINE_EVENT(mm_compaction_isolate_templ
#ifdef CONFIG_COMPACTION
TRACE_EVENT(mm_compaction_migratepages,
- TP_PROTO(unsigned long nr_all,
+ TP_PROTO(struct compact_control *cc,
unsigned int nr_succeeded),
- TP_ARGS(nr_all, nr_succeeded),
+ TP_ARGS(cc, nr_succeeded),
TP_STRUCT__entry(
__field(unsigned long, nr_migrated)
@@ -79,7 +79,7 @@ TRACE_EVENT(mm_compaction_migratepages,
TP_fast_assign(
__entry->nr_migrated = nr_succeeded;
- __entry->nr_failed = nr_all - nr_succeeded;
+ __entry->nr_failed = cc->nr_migratepages - nr_succeeded;
),
TP_printk("nr_migrated=%lu nr_failed=%lu",
@@ -88,10 +88,10 @@ TRACE_EVENT(mm_compaction_migratepages,
);
TRACE_EVENT(mm_compaction_begin,
- TP_PROTO(unsigned long zone_start, unsigned long migrate_pfn,
- unsigned long free_pfn, unsigned long zone_end, bool sync),
+ TP_PROTO(struct compact_control *cc, unsigned long zone_start,
+ unsigned long zone_end, bool sync),
- TP_ARGS(zone_start, migrate_pfn, free_pfn, zone_end, sync),
+ TP_ARGS(cc, zone_start, zone_end, sync),
TP_STRUCT__entry(
__field(unsigned long, zone_start)
@@ -103,8 +103,8 @@ TRACE_EVENT(mm_compaction_begin,
TP_fast_assign(
__entry->zone_start = zone_start;
- __entry->migrate_pfn = migrate_pfn;
- __entry->free_pfn = free_pfn;
+ __entry->migrate_pfn = cc->migrate_pfn;
+ __entry->free_pfn = cc->free_pfn;
__entry->zone_end = zone_end;
__entry->sync = sync;
),
@@ -118,11 +118,11 @@ TRACE_EVENT(mm_compaction_begin,
);
TRACE_EVENT(mm_compaction_end,
- TP_PROTO(unsigned long zone_start, unsigned long migrate_pfn,
- unsigned long free_pfn, unsigned long zone_end, bool sync,
+ TP_PROTO(struct compact_control *cc, unsigned long zone_start,
+ unsigned long zone_end, bool sync,
int status),
- TP_ARGS(zone_start, migrate_pfn, free_pfn, zone_end, sync, status),
+ TP_ARGS(cc, zone_start, zone_end, sync, status),
TP_STRUCT__entry(
__field(unsigned long, zone_start)
@@ -135,8 +135,8 @@ TRACE_EVENT(mm_compaction_end,
TP_fast_assign(
__entry->zone_start = zone_start;
- __entry->migrate_pfn = migrate_pfn;
- __entry->free_pfn = free_pfn;
+ __entry->migrate_pfn = cc->migrate_pfn;
+ __entry->free_pfn = cc->free_pfn;
__entry->zone_end = zone_end;
__entry->sync = sync;
__entry->status = status;
--- a/mm/compaction.c~mm-compaction-cleanup-the-compaction-trace-events
+++ a/mm/compaction.c
@@ -2387,8 +2387,7 @@ compact_zone(struct compact_control *cc,
update_cached = !sync &&
cc->zone->compact_cached_migrate_pfn[0] == cc->zone->compact_cached_migrate_pfn[1];
- trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
- cc->free_pfn, end_pfn, sync);
+ trace_mm_compaction_begin(cc, start_pfn, end_pfn, sync);
/* lru_add_drain_all could be expensive with involving other CPUs */
lru_add_drain();
@@ -2438,8 +2437,7 @@ compact_zone(struct compact_control *cc,
compaction_free, (unsigned long)cc, cc->mode,
MR_COMPACTION, &nr_succeeded);
- trace_mm_compaction_migratepages(cc->nr_migratepages,
- nr_succeeded);
+ trace_mm_compaction_migratepages(cc, nr_succeeded);
/* All pages were either migrated or will be released */
cc->nr_migratepages = 0;
@@ -2515,8 +2513,7 @@ out:
count_compact_events(COMPACTMIGRATE_SCANNED, cc->total_migrate_scanned);
count_compact_events(COMPACTFREE_SCANNED, cc->total_free_scanned);
- trace_mm_compaction_end(start_pfn, cc->migrate_pfn,
- cc->free_pfn, end_pfn, sync, ret);
+ trace_mm_compaction_end(cc, start_pfn, end_pfn, sync, ret);
return ret;
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 147/227] mempolicy: mbind_range() set_policy() after vma_merge()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:45 UTC (permalink / raw)
To: vbabka, stable, oleg, Liam.Howlett, hughd, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: Hugh Dickins <hughd@google.com>
Subject: mempolicy: mbind_range() set_policy() after vma_merge()
v2.6.34 commit 9d8cebd4bcd7 ("mm: fix mbind vma merge problem") introduced
vma_merge() to mbind_range(); but unlike madvise, mlock and mprotect, it
put a "continue" to next vma where its precedents go to update flags on
current vma before advancing: that left vma with the wrong setting in the
infamous vma_merge() case 8.
v3.10 commit 1444f92c8498 ("mm: merging memory blocks resets mempolicy")
tried to fix that in vma_adjust(), without fully understanding the issue.
v3.11 commit 3964acd0dbec ("mm: mempolicy: fix mbind_range() &&
vma_adjust() interaction") reverted that, and went about the fix in the
right way, but chose to optimize out an unnecessary mpol_dup() with a
prior mpol_equal() test. But on tmpfs, that also pessimized out the vital
call to its ->set_policy(), leaving the new mbind unenforced.
The user visible effect was that the pages got allocated on the local
node (happened to be 0), after the mbind() caller had specifically
asked for them to be allocated on node 1. There was not any page
migration involved in the case reported: the pages simply got allocated
on the wrong node.
Just delete that optimization now (though it could be made conditional on
vma not having a set_policy). Also remove the "next" variable: it turned
out to be blameless, but also pointless.
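A much simplified model of why the mpol_equal() shortcut was harmful is
sketched below. It is a toy under stated assumptions: policies are plain
integers, struct toy_vma and the toy_* functions are hypothetical, and a
single backing_policy field stands in for the policy that a tmpfs-like
object keeps for the range and updates only through its set_policy hook.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_vma;
typedef void (*toy_set_policy_fn)(struct toy_vma *vma, int pol);

struct toy_vma {
	int policy;                    /* policy recorded on the vma itself */
	int backing_policy;            /* policy kept by the backing object, as on tmpfs */
	toy_set_policy_fn set_policy;  /* NULL when the mapping has no such hook */
};

static void toy_shmem_set_policy(struct toy_vma *vma, int pol)
{
	vma->backing_policy = pol;
}

/* Install new_pol on vma; skip_if_equal models the removed mpol_equal() test. */
static void toy_replace_policy(struct toy_vma *vma, int new_pol, bool skip_if_equal)
{
	if (skip_if_equal && vma->policy == new_pol)
		return;                /* the hook below is never reached */
	if (vma->set_policy)
		vma->set_policy(vma, new_pol);
	vma->policy = new_pol;
}
```

After a merge into a vma that already carries the requested policy, the
shortcut returns early and backing_policy keeps its old value; without the
shortcut the hook runs and the backing object sees the new policy.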
Link: https://lkml.kernel.org/r/319e4db9-64ae-4bca-92f0-ade85d342ff@google.com
Fixes: 3964acd0dbec ("mm: mempolicy: fix mbind_range() && vma_adjust() interaction")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/mempolicy.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
--- a/mm/mempolicy.c~mempolicy-mbind_range-set_policy-after-vma_merge
+++ a/mm/mempolicy.c
@@ -786,7 +786,6 @@ static int vma_replace_policy(struct vm_
static int mbind_range(struct mm_struct *mm, unsigned long start,
unsigned long end, struct mempolicy *new_pol)
{
- struct vm_area_struct *next;
struct vm_area_struct *prev;
struct vm_area_struct *vma;
int err = 0;
@@ -801,8 +800,7 @@ static int mbind_range(struct mm_struct
if (start > vma->vm_start)
prev = vma;
- for (; vma && vma->vm_start < end; prev = vma, vma = next) {
- next = vma->vm_next;
+ for (; vma && vma->vm_start < end; prev = vma, vma = vma->vm_next) {
vmstart = max(start, vma->vm_start);
vmend = min(end, vma->vm_end);
@@ -817,10 +815,6 @@ static int mbind_range(struct mm_struct
anon_vma_name(vma));
if (prev) {
vma = prev;
- next = vma->vm_next;
- if (mpol_equal(vma_policy(vma), new_pol))
- continue;
- /* vma_merge() joined vma && vma->next, case 8 */
goto replace;
}
if (vma->vm_start != vmstart) {
_
* [patch 148/227] mm/oom_kill: remove unneeded is_memcg_oom check
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: rientjes, mhocko, linmiaohe, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/oom_kill: remove unneeded is_memcg_oom check
oom_cpuset_eligible() is always called when !is_memcg_oom(). Remove this
unnecessary check.
Link: https://lkml.kernel.org/r/20220224115933.20154-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/oom_kill.c | 3 ---
1 file changed, 3 deletions(-)
--- a/mm/oom_kill.c~mm-oom_kill-remove-unneeded-is_memcg_oom-check
+++ a/mm/oom_kill.c
@@ -93,9 +93,6 @@ static bool oom_cpuset_eligible(struct t
bool ret = false;
const nodemask_t *mask = oc->nodemask;
- if (is_memcg_oom(oc))
- return true;
-
rcu_read_lock();
for_each_thread(start, tsk) {
if (mask) {
_
* [patch 149/227] mm,migrate: fix establishing demotion target
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: ziy, zhongjiang-ali, xlpang, shy828301, osalvador, mgorman,
dave.hansen, baolin.wang, ying.huang, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Huang Ying <ying.huang@intel.com>
Subject: mm,migrate: fix establishing demotion target
In commit ac16ec835314 ("mm: migrate: support multiple target nodes
demotion"), after the first demotion target node is found, we continue to
check the next candidate obtained via find_next_best_node(). This is to
find all demotion target nodes with the same NUMA distance. But one side
effect of find_next_best_node() is that the candidate node it returns is
set in the "used" parameter. So even if the candidate node fails the
subsequent NUMA distance check, it can no longer be used as a demotion
target node for the following nodes. For example, for a system as
follows,
node distances:
node   0   1   2   3
   0:  10  21  17  28
   1:  21  10  28  17
   2:  17  28  10  28
   3:  28  17  28  10
when we establish the demotion target node for node 0, in the first round
node 2 is added to the demotion target node set. Then in the second round,
node 3 is checked and rejected because distance(0, 3) > distance(0, 2).
But node 3 is set in the "used" nodemask too. When we establish the
demotion target node for node 1, there is no available node. This is
wrong: node 3 should be set as the demotion target of node 1.
To fix this, if the candidate node fails the distance check, it is cleared
in the "used" nodemask, so that it can be used for the following nodes.
The bug can be reproduced, and is fixed by this patch, on a 2 socket
server machine with DRAM and PMEM.
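The scenario above can be replayed with a small user-space model. This is
only a sketch, not the kernel code: next_best_node() and establish_target()
are simplified stand-ins for find_next_best_node() and
establish_migrate_target(), the "used" nodemask is a plain bitmask, and the
helpers first_target() and node1_target() are hypothetical. It also assumes
that nodes 0 and 1 are the demotion sources and are already marked as used
before the search starts.

```c
#include <stdbool.h>

#define NR_NODES 4
#define NO_NODE  (-1)

/* Distance table from the commit message. */
static const int dist[NR_NODES][NR_NODES] = {
	{ 10, 21, 17, 28 },
	{ 21, 10, 28, 17 },
	{ 17, 28, 10, 28 },
	{ 28, 17, 28, 10 },
};

/* Stand-in for find_next_best_node(): nearest unused node, marked as used. */
static int next_best_node(int node, unsigned int *used)
{
	int best = NO_NODE;

	for (int n = 0; n < NR_NODES; n++) {
		if (*used & (1u << n))
			continue;
		if (best == NO_NODE || dist[node][n] < dist[node][best])
			best = n;
	}
	if (best != NO_NODE)
		*used |= 1u << best;
	return best;
}

/* Stand-in for establish_migrate_target(), with the fix made optional. */
static int establish_target(int node, unsigned int *used, int best_distance,
			    bool clear_on_reject)
{
	int target = next_best_node(node, used);

	if (target == NO_NODE)
		return NO_NODE;
	if (best_distance != -1 && dist[node][target] > best_distance) {
		if (clear_on_reject)
			*used &= ~(1u << target);	/* the "out_clear" path */
		return NO_NODE;
	}
	return target;
}

/* Collect targets for one node; return the first one found, if any. */
static int first_target(int node, unsigned int *used, bool clear_on_reject)
{
	int first = NO_NODE, best_distance = -1, t;

	while ((t = establish_target(node, used, best_distance,
				     clear_on_reject)) != NO_NODE) {
		if (first == NO_NODE) {
			first = t;
			best_distance = dist[node][t];
		}
	}
	return first;
}

/* Process node 0, then node 1, and report what node 1 ended up with. */
static int node1_target(bool clear_on_reject)
{
	unsigned int used = (1u << 0) | (1u << 1);	/* assumed sources */

	first_target(0, &used, clear_on_reject);
	return first_target(1, &used, clear_on_reject);
}
```

Under these assumptions node1_target(false) returns NO_NODE, which is the
failure described above, while node1_target(true) returns node 3.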
Link: https://lkml.kernel.org/r/20220128055940.1792614-1-ying.huang@intel.com
Fixes: ac16ec835314 ("mm: migrate: support multiple target nodes demotion")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Cc: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
Cc: Xunlei Pang <xlpang@linux.alibaba.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/migrate.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/mm/migrate.c~mmmigrate-fix-establishing-demotion-target
+++ a/mm/migrate.c
@@ -3079,18 +3079,21 @@ static int establish_migrate_target(int
if (best_distance != -1) {
val = node_distance(node, migration_target);
if (val > best_distance)
- return NUMA_NO_NODE;
+ goto out_clear;
}
index = nd->nr;
if (WARN_ONCE(index >= DEMOTION_TARGET_NODES,
"Exceeds maximum demotion target nodes\n"))
- return NUMA_NO_NODE;
+ goto out_clear;
nd->nodes[index] = migration_target;
nd->nr++;
return migration_target;
+out_clear:
+ node_clear(migration_target, *used);
+ return NUMA_NO_NODE;
}
/*
_
* [patch 150/227] mm/migrate: fix race between lock page and clear PG_Isolated
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: willy, william.kucharski, vbabka, shy828301, nicholas.tang, maz,
matthias.bgg, Kuan-Ying.Lee, dhowells, david, andrew.yang, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: "andrew.yang" <andrew.yang@mediatek.com>
Subject: mm/migrate: fix race between lock page and clear PG_Isolated
When memory is tight, the system may start to compact memory to satisfy
large contiguous memory demands. If one process tries to lock a memory
page that is being locked and isolated for compaction, it may wait a long
time or even forever. This is because compaction performs a non-atomic
PG_Isolated clear while holding the page lock, which may overwrite the
PG_waiters bit set by the process that failed to obtain the page lock and
added itself to the wait queue to wait for the page to be unlocked.
CPU1                            CPU2
lock_page(page); (successful)
                                lock_page(); (failed)
__ClearPageIsolated(page);      SetPageWaiters(page) (may be overwritten)
unlock_page(page);
The solution is to not perform non-atomic operations on page flags while
holding the page lock.
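The lost update can be shown with a deterministic user-space sketch. It is
not kernel code: the bit positions of PG_WAITERS and PG_ISOLATED below are
made up for illustration, and the race is replayed in a single thread by
splitting the non-atomic clear into its load and store halves, with the
other CPU's write placed in between. The second function uses C11 atomics
as a stand-in for the atomic ClearPageIsolated().

```c
#include <stdatomic.h>

/* Hypothetical bit positions, chosen only for this illustration. */
#define PG_WAITERS  (1UL << 0)
#define PG_ISOLATED (1UL << 1)

/* Non-atomic clear: a plain load, modify, store on the flags word. */
static unsigned long nonatomic_clear_race(void)
{
	unsigned long flags = PG_ISOLATED;
	unsigned long snapshot;

	snapshot = flags;			/* CPU1: load */
	flags |= PG_WAITERS;			/* CPU2: SetPageWaiters() */
	flags = snapshot & ~PG_ISOLATED;	/* CPU1: store, PG_WAITERS lost */
	return flags;
}

/* Atomic clear: each read-modify-write is indivisible, nothing is lost. */
static unsigned long atomic_clear_race(void)
{
	atomic_ulong flags;

	atomic_init(&flags, PG_ISOLATED);
	atomic_fetch_or(&flags, PG_WAITERS);	/* CPU2 */
	atomic_fetch_and(&flags, ~PG_ISOLATED);	/* CPU1 */
	return atomic_load(&flags);
}
```

In this model nonatomic_clear_race() ends with no bits set, so the waiter
would never be woken, whereas atomic_clear_race() keeps PG_WAITERS set.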
Link: https://lkml.kernel.org/r/20220315030515.20263-1-andrew.yang@mediatek.com
Signed-off-by: andrew.yang <andrew.yang@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Vlastimil Babka" <vbabka@suse.cz>
Cc: David Howells <dhowells@redhat.com>
Cc: "William Kucharski" <william.kucharski@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Nicholas Tang <nicholas.tang@mediatek.com>
Cc: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/page-flags.h | 2 +-
mm/migrate.c | 12 ++++++------
2 files changed, 7 insertions(+), 7 deletions(-)
--- a/include/linux/page-flags.h~mm-migrate-fix-race-between-lock-page-and-clear-pg_isolated
+++ a/include/linux/page-flags.h
@@ -1000,7 +1000,7 @@ PAGE_TYPE_OPS(Guard, guard)
extern bool is_free_buddy_page(struct page *page);
-__PAGEFLAG(Isolated, isolated, PF_ANY);
+PAGEFLAG(Isolated, isolated, PF_ANY);
#ifdef CONFIG_MMU
#define __PG_MLOCKED (1UL << PG_mlocked)
--- a/mm/migrate.c~mm-migrate-fix-race-between-lock-page-and-clear-pg_isolated
+++ a/mm/migrate.c
@@ -107,7 +107,7 @@ int isolate_movable_page(struct page *pa
/* Driver shouldn't use PG_isolated bit of page->flags */
WARN_ON_ONCE(PageIsolated(page));
- __SetPageIsolated(page);
+ SetPageIsolated(page);
unlock_page(page);
return 0;
@@ -126,7 +126,7 @@ static void putback_movable_page(struct
mapping = page_mapping(page);
mapping->a_ops->putback_page(page);
- __ClearPageIsolated(page);
+ ClearPageIsolated(page);
}
/*
@@ -159,7 +159,7 @@ void putback_movable_pages(struct list_h
if (PageMovable(page))
putback_movable_page(page);
else
- __ClearPageIsolated(page);
+ ClearPageIsolated(page);
unlock_page(page);
put_page(page);
} else {
@@ -883,7 +883,7 @@ static int move_to_new_page(struct page
VM_BUG_ON_PAGE(!PageIsolated(page), page);
if (!PageMovable(page)) {
rc = MIGRATEPAGE_SUCCESS;
- __ClearPageIsolated(page);
+ ClearPageIsolated(page);
goto out;
}
@@ -905,7 +905,7 @@ static int move_to_new_page(struct page
* We clear PG_movable under page_lock so any compactor
* cannot try to migrate this page.
*/
- __ClearPageIsolated(page);
+ ClearPageIsolated(page);
}
/*
@@ -1091,7 +1091,7 @@ static int unmap_and_move(new_page_t get
if (unlikely(__PageMovable(page))) {
lock_page(page);
if (!PageMovable(page))
- __ClearPageIsolated(page);
+ ClearPageIsolated(page);
unlock_page(page);
}
goto out;
_
* [patch 151/227] mm/thp: refix __split_huge_pmd_locked() for migration PMD
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: ziy, shy828301, rcampbell, kirill.shutemov, hughd, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: Hugh Dickins <hughd@google.com>
Subject: mm/thp: refix __split_huge_pmd_locked() for migration PMD
Migration entries do not contribute to a page's reference count: move
__split_huge_pmd_locked()'s page_ref_add() into pmd_migration's else block
(along with the page_count() check - a page is quite likely to have
reference count frozen to 0 when a migration entry is found).
This will fix a very rare anonymous memory leak, after a split_huge_pmd()
raced with an anon split_huge_page() or an anon THP migrate_pages(): since
the wrongly raised refcount stopped the page (perhaps small, perhaps huge,
depending on when the race hit) from ever being freed. At first I thought
there were worse risks, from prematurely unfreezing a frozen page: but now
think that would only affect page cache pages, which do not come this way
(except for anonymous pages in swap cache, perhaps).
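The refcount accounting can be pictured with a tiny user-space model. It is
a sketch only: refcount_after_split() is a hypothetical helper, not a
kernel function, and HPAGE_PMD_NR is assumed to be 512 (a 2MB huge page
built from 4KB base pages).

```c
#include <stdbool.h>

#define HPAGE_PMD_NR 512	/* assumption: 2MB THP, 4KB base pages */

/*
 * Model of the reference count after __split_huge_pmd_locked().
 * Before the patch the extra references were added unconditionally;
 * with the patch they are added only when the PMD is not a migration
 * entry.
 */
static int refcount_after_split(int refcount, bool pmd_migration,
				bool patched)
{
	if (patched && pmd_migration)
		return refcount;
	return refcount + HPAGE_PMD_NR - 1;
}
```

For a page whose count is frozen at 0 behind a migration entry, the
unpatched path leaves it at 511 in this model, which is the kind of stray
reference that keeps the page from being freed; the patched path leaves it
at 0.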
Link: https://lkml.kernel.org/r/84792468-f512-e48f-378c-e34c3641e97@google.com
Fixes: ec0abae6dcdf ("mm/thp: fix __split_huge_pmd_locked() for migration PMD")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/huge_memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/huge_memory.c~mm-thp-refix-__split_huge_pmd_locked-for-migration-pmd
+++ a/mm/huge_memory.c
@@ -2055,9 +2055,9 @@ static void __split_huge_pmd_locked(stru
young = pmd_young(old_pmd);
soft_dirty = pmd_soft_dirty(old_pmd);
uffd_wp = pmd_uffd_wp(old_pmd);
+ VM_BUG_ON_PAGE(!page_count(page), page);
+ page_ref_add(page, HPAGE_PMD_NR - 1);
}
- VM_BUG_ON_PAGE(!page_count(page), page);
- page_ref_add(page, HPAGE_PMD_NR - 1);
/*
* Withdraw the table only after we mark the pmd entry invalid.
_
* [patch 152/227] mm/cma: provide option to opt out from exposing pages on activation failure
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: sourabhjain, osalvador, mpe, mike.kravetz, mahesh, david,
hbathini, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Hari Bathini <hbathini@linux.ibm.com>
Subject: mm/cma: provide option to opt out from exposing pages on activation failure
Patch series "powerpc/fadump: handle CMA activation failure appropriately", v3.
Commit 072355c1cf2d ("mm/cma: expose all pages to the buddy if activation
of an area fails") started exposing all pages to the buddy allocator on
CMA activation failure. But there can be CMA users that want to handle the
reserved memory differently on CMA activation failure. Provide an option
to opt out from exposing pages to the buddy allocator for such cases.
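The effect of the opt-out can be sketched in user space. This is an
illustration under stated assumptions, not the kernel implementation:
struct cma_model, activation_failed() and released_on_failure() are
hypothetical names, and the count of released pages stands in for the
free_reserved_page() loop in the error path.

```c
#include <stdbool.h>

/* Minimal model of the fields of struct cma that matter here. */
struct cma_model {
	unsigned long count;		/* pages in the area */
	bool reserve_pages_on_error;	/* the new opt-out flag */
};

/* Mirrors the intent of cma_reserve_pages_on_error(). */
static void reserve_pages_on_error(struct cma_model *cma)
{
	cma->reserve_pages_on_error = true;
}

/* Model of the out_error path: returns how many pages go to the buddy. */
static unsigned long activation_failed(struct cma_model *cma)
{
	unsigned long released = 0;

	if (!cma->reserve_pages_on_error)
		released = cma->count;
	cma->count = 0;
	return released;
}

/* Set up an area, optionally opt out, then fail its activation. */
static unsigned long released_on_failure(unsigned long count, bool opt_out)
{
	struct cma_model cma = { .count = count };

	if (opt_out)
		reserve_pages_on_error(&cma);
	return activation_failed(&cma);
}
```

In this model released_on_failure(1024, false) hands all 1024 pages to the
buddy allocator, while released_on_failure(1024, true) hands over none and
the memory stays reserved.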
Link: https://lkml.kernel.org/r/20220117075246.36072-1-hbathini@linux.ibm.com
Link: https://lkml.kernel.org/r/20220117075246.36072-2-hbathini@linux.ibm.com
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/cma.h | 2 ++
mm/cma.c | 11 +++++++++--
mm/cma.h | 1 +
3 files changed, 12 insertions(+), 2 deletions(-)
--- a/include/linux/cma.h~mm-cma-provide-option-to-opt-out-from-exposing-pages-on-activation-failure
+++ a/include/linux/cma.h
@@ -58,4 +58,6 @@ extern bool cma_pages_valid(struct cma *
extern bool cma_release(struct cma *cma, const struct page *pages, unsigned long count);
extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data);
+
+extern void cma_reserve_pages_on_error(struct cma *cma);
#endif
--- a/mm/cma.c~mm-cma-provide-option-to-opt-out-from-exposing-pages-on-activation-failure
+++ a/mm/cma.c
@@ -131,8 +131,10 @@ not_in_zone:
bitmap_free(cma->bitmap);
out_error:
/* Expose all pages to the buddy, they are useless for CMA. */
- for (pfn = base_pfn; pfn < base_pfn + cma->count; pfn++)
- free_reserved_page(pfn_to_page(pfn));
+ if (!cma->reserve_pages_on_error) {
+ for (pfn = base_pfn; pfn < base_pfn + cma->count; pfn++)
+ free_reserved_page(pfn_to_page(pfn));
+ }
totalcma_pages -= cma->count;
cma->count = 0;
pr_err("CMA area %s could not be activated\n", cma->name);
@@ -150,6 +152,11 @@ static int __init cma_init_reserved_area
}
core_initcall(cma_init_reserved_areas);
+void __init cma_reserve_pages_on_error(struct cma *cma)
+{
+ cma->reserve_pages_on_error = true;
+}
+
/**
* cma_init_reserved_mem() - create custom contiguous area from reserved memory
* @base: Base address of the reserved area
--- a/mm/cma.h~mm-cma-provide-option-to-opt-out-from-exposing-pages-on-activation-failure
+++ a/mm/cma.h
@@ -30,6 +30,7 @@ struct cma {
/* kobject requires dynamic object */
struct cma_kobject *cma_kobj;
#endif
+ bool reserve_pages_on_error;
};
extern struct cma cma_areas[MAX_CMA_AREAS];
_
* [patch 153/227] powerpc/fadump: opt out from freeing pages on cma activation failure
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: sourabhjain, osalvador, mpe, mike.kravetz, mahesh, david,
hbathini, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Hari Bathini <hbathini@linux.ibm.com>
Subject: powerpc/fadump: opt out from freeing pages on cma activation failure
With commit a4e92ce8e4c8 ("powerpc/fadump: Reservationless firmware
assisted dump"), Linux kernel's Contiguous Memory Allocator (CMA) based
reservation was introduced in fadump. That change was aimed at using CMA
to let applications utilize the memory reserved for fadump while blocking
it from being used for kernel pages. The assumption was that, even if CMA
activation fails for whatever reason, the memory still remains reserved,
which prevents it from being used for kernel pages. But commit
072355c1cf2d ("mm/cma: expose all pages to the buddy if activation of an
area fails") breaks this assumption, as it started exposing all pages to
the buddy allocator on CMA activation failure. This led to warning
messages like the one below while running crash-utility on the vmcore of
a kernel having the above two commits:
crash: seek error: kernel virtual address: <from reserved region>
To fix this problem, opt out from exposing pages to buddy allocator on CMA
activation failure for fadump reserved memory.
Link: https://lkml.kernel.org/r/20220117075246.36072-3-hbathini@linux.ibm.com
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/powerpc/kernel/fadump.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/arch/powerpc/kernel/fadump.c~powerpc-fadump-opt-out-from-freeing-pages-on-cma-activation-failure
+++ a/arch/powerpc/kernel/fadump.c
@@ -113,6 +113,12 @@ static int __init fadump_cma_init(void)
}
/*
+ * If CMA activation fails, keep the pages reserved, instead of
+ * exposing them to buddy allocator. Same as 'fadump=nocma' case.
+ */
+ cma_reserve_pages_on_error(fadump_cma);
+
+ /*
* So we now have successfully initialized cma area for fadump.
*/
pr_info("Initialized 0x%lx bytes cma area at %ldMB from 0x%lx "
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 154/227] NUMA Balancing: add page promotion counter
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: ziy, zhongjiang-ali, weixugc, shy828301, shakeelb, riel, rdunlap,
peterz, osalvador, mhocko, mgorman, hannes, feng.tang,
dave.hansen, baolin.wang, ying.huang, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Huang Ying <ying.huang@intel.com>
Subject: NUMA Balancing: add page promotion counter
Patch series "NUMA balancing: optimize memory placement for memory tiering system", v13
With the advent of various new memory types, some machines will have
multiple types of memory, e.g. DRAM and PMEM (persistent memory). The
memory subsystem of these machines can be called a memory tiering system,
because the performance of the different types of memory differs.
After commit c221c0b0308f ("device-dax: "Hotplug" persistent memory for
use like normal RAM"), PMEM can be used as cost-effective
volatile memory in separate NUMA nodes. In a typical memory tiering
system, there are CPUs, DRAM and PMEM in each physical NUMA node. The
CPUs and the DRAM will be put in one logical node, while the PMEM will be
put in another (faked) logical node.
To optimize overall system performance, the hot pages should be placed
in the DRAM node. To do that, we need to identify the hot pages in the PMEM
node and migrate them to the DRAM node via NUMA migration.
The original NUMA balancing already has a set of
mechanisms to identify the pages recently accessed by the CPUs in a node
and migrate those pages to that node. So we can reuse these mechanisms to
optimize the page placement in the memory tiering
system. This is implemented in this patchset.
On the other hand, the cold pages should be placed in the PMEM node. So we
also need to identify the cold pages in the DRAM node and migrate them to
the PMEM node.
In commit 26aa2d199d6f ("mm/migrate: demote pages during reclaim"), a
mechanism to demote cold DRAM pages to the PMEM node under memory pressure
is implemented. Based on that, cold DRAM pages can be demoted to the PMEM
node proactively to free some memory space on the DRAM node to accommodate the
promoted hot PMEM pages. This is implemented in this patchset too.
We have tested the solution with the pmbench memory accessing benchmark
with an 80:20 read/write ratio and a Gaussian access address distribution
on a 2-socket Intel server with Optane DC Persistent Memory Model. The
test results show that the pmbench score can improve by up to 95.9%.
This patch (of 3):
In a system with multiple memory types, e.g. DRAM and PMEM, the CPU
and DRAM in one socket will be put in one NUMA node as before, while
the PMEM will be put in another NUMA node, as described in
commit c221c0b0308f ("device-dax: "Hotplug" persistent memory for use
like normal RAM"). So the NUMA balancing mechanism will identify all
PMEM accesses as remote accesses and try to promote the PMEM pages to
DRAM.
To distinguish the number of inter-type promoted pages from that of
inter-socket migrated pages, a new vmstat counter is added. The
counter is per-node (counted in the target node), so it can be used to
identify promotion imbalance among the NUMA nodes.
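The accounting rule described above can be illustrated with a small userspace model. This is only a sketch, not the kernel code: struct model_node and the model_* helpers are hypothetical names invented for illustration, and the has_cpu field merely stands in for the node_state(node, N_CPU) test that the patch uses in node_is_toptier().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a NUMA node; not a kernel structure. */
struct model_node {
	bool has_cpu;                    /* stands in for node_state(node, N_CPU) */
	unsigned long pgpromote_success; /* per-node counter, kept on the target */
};

/* Mirrors the idea of node_is_toptier(): a node with CPUs is top tier. */
static bool model_node_is_toptier(const struct model_node *n)
{
	assert(n);
	return n->has_cpu;
}

/*
 * Account nr_succeeded migrated pages. Only a move from a non-top-tier
 * node to a top-tier node is counted as a promotion; a move between two
 * top-tier nodes is an ordinary inter-socket migration.
 * Returns true when the promotion counter was updated.
 */
static bool model_account_migration(const struct model_node *src,
				    struct model_node *dst,
				    unsigned int nr_succeeded)
{
	assert(src && dst);
	if (!nr_succeeded)
		return false;
	if (model_node_is_toptier(src) || !model_node_is_toptier(dst))
		return false;
	dst->pgpromote_success += nr_succeeded;
	return true;
}
```

In this model, moving 512 pages from a CPU-less node to a node with CPUs raises the target's counter by 512, while a move between two nodes that both have CPUs leaves it unchanged.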
Link: https://lkml.kernel.org/r/20220301085329.3210428-1-ying.huang@intel.com
Link: https://lkml.kernel.org/r/20220221084529.1052339-1-ying.huang@intel.com
Link: https://lkml.kernel.org/r/20220221084529.1052339-2-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/mmzone.h | 3 +++
include/linux/node.h | 5 +++++
mm/migrate.c | 13 ++++++++++---
mm/vmstat.c | 3 +++
4 files changed, 21 insertions(+), 3 deletions(-)
--- a/include/linux/mmzone.h~numa-balancing-add-page-promotion-counter
+++ a/include/linux/mmzone.h
@@ -222,6 +222,9 @@ enum node_stat_item {
#ifdef CONFIG_SWAP
NR_SWAPCACHE,
#endif
+#ifdef CONFIG_NUMA_BALANCING
+ PGPROMOTE_SUCCESS, /* promote successfully */
+#endif
NR_VM_NODE_STAT_ITEMS
};
--- a/include/linux/node.h~numa-balancing-add-page-promotion-counter
+++ a/include/linux/node.h
@@ -181,4 +181,9 @@ static inline void register_hugetlbfs_wi
#define to_node(device) container_of(device, struct node, dev)
+static inline bool node_is_toptier(int node)
+{
+ return node_state(node, N_CPU);
+}
+
#endif /* _LINUX_NODE_H_ */
--- a/mm/migrate.c~numa-balancing-add-page-promotion-counter
+++ a/mm/migrate.c
@@ -2069,6 +2069,7 @@ int migrate_misplaced_page(struct page *
pg_data_t *pgdat = NODE_DATA(node);
int isolated;
int nr_remaining;
+ unsigned int nr_succeeded;
LIST_HEAD(migratepages);
new_page_t *new;
bool compound;
@@ -2107,7 +2108,8 @@ int migrate_misplaced_page(struct page *
list_add(&page->lru, &migratepages);
nr_remaining = migrate_pages(&migratepages, *new, NULL, node,
- MIGRATE_ASYNC, MR_NUMA_MISPLACED, NULL);
+ MIGRATE_ASYNC, MR_NUMA_MISPLACED,
+ &nr_succeeded);
if (nr_remaining) {
if (!list_empty(&migratepages)) {
list_del(&page->lru);
@@ -2116,8 +2118,13 @@ int migrate_misplaced_page(struct page *
putback_lru_page(page);
}
isolated = 0;
- } else
- count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_pages);
+ }
+ if (nr_succeeded) {
+ count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
+ if (!node_is_toptier(page_to_nid(page)) && node_is_toptier(node))
+ mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
+ nr_succeeded);
+ }
BUG_ON(!list_empty(&migratepages));
return isolated;
--- a/mm/vmstat.c~numa-balancing-add-page-promotion-counter
+++ a/mm/vmstat.c
@@ -1242,6 +1242,9 @@ const char * const vmstat_text[] = {
#ifdef CONFIG_SWAP
"nr_swapcached",
#endif
+#ifdef CONFIG_NUMA_BALANCING
+ "pgpromote_success",
+#endif
/* enum writeback_stat_item counters */
"nr_dirty_threshold",
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 155/227] NUMA balancing: optimize page placement for memory tiering system
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: ziy, zhongjiang-ali, weixugc, shy828301, shakeelb, riel, rdunlap,
peterz, osalvador, mhocko, mgorman, hannes, feng.tang,
dave.hansen, baolin.wang, ying.huang, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Huang Ying <ying.huang@intel.com>
Subject: NUMA balancing: optimize page placement for memory tiering system
With the advent of various new memory types, some machines will have
multiple types of memory, e.g. DRAM and PMEM (persistent memory). The
memory subsystem of these machines can be called a memory tiering system,
because the performance of the different types of memory usually
differs.
In such a system, because of changes in the memory access pattern etc.,
some pages in the slow memory may become hot globally. So in this
patch, the NUMA balancing mechanism is enhanced to dynamically optimize
the page placement among the different memory types according to how
hot or cold the pages are.
In a typical memory tiering system, there are CPUs, fast memory and
slow memory in each physical NUMA node. The CPUs and the fast memory
will be put in one logical node (called fast memory node), while the
slow memory will be put in another (faked) logical node (called slow
memory node). That is, the fast memory is regarded as local while the
slow memory is regarded as remote. So it's possible for the recently
accessed pages in the slow memory node to be promoted to the fast
memory node via the existing NUMA balancing mechanism.
The original NUMA balancing mechanism stops migrating pages if the
free memory of the target node falls below the high watermark. This
is a reasonable policy if there's only one memory type. But it makes
the original NUMA balancing mechanism almost useless for optimizing
page placement among different memory types. Details are as follows.
It is common for the working-set size of the workload to be
larger than the size of the fast memory nodes. Otherwise, it's
unnecessary to use the slow memory at all. So there are almost never
enough free pages in the fast memory nodes, and the globally hot
pages in the slow memory node cannot be promoted to the fast memory
node. To solve the issue, we have 2 choices, as follows:
a. Ignore the free pages watermark checking when promoting hot pages
from the slow memory node to the fast memory node. This will
create some memory pressure in the fast memory node and thus trigger
memory reclaim, so that the cold pages in the fast memory
node will be demoted to the slow memory node.
b. Define a new watermark called wmark_promo which is higher than
wmark_high, and have kswapd reclaim pages until free pages reach
that watermark. The scenario is as follows: when we want to promote
hot pages from slow memory to fast memory, but the fast memory's free
pages would go lower than the high watermark with such a promotion, we
wake up kswapd with the wmark_promo watermark in order to demote cold
pages and free up some space. So the next time we want to promote hot
pages, we might have a chance of doing so.
Choice "a" may create high memory pressure in the fast memory node.
If the memory pressure of the workload is high, the memory pressure
may become so high that the memory allocation latency of the workload
is affected, e.g. direct reclaim may be triggered.
Choice "b" works much better in this respect. If the memory
pressure of the workload is high, hot page promotion will stop
earlier because its allocation watermark is higher than that of
normal memory allocation. So in this patch, choice "b" is implemented.
A new zone watermark (WMARK_PROMO) is added, which is larger than the
high watermark and can be controlled via watermark_scale_factor.
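The watermark layout of choice "b" can be sketched with a small userspace model. This is an illustration under stated assumptions, not the kernel implementation: struct model_zone and the model_* names are hypothetical, and the step argument stands in for the per-zone gap that the diff calls tmp, which is assumed here to be already computed from watermark_scale_factor.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the per-zone watermarks; not kernel code. */
enum model_wmark {
	MODEL_WMARK_MIN,
	MODEL_WMARK_LOW,
	MODEL_WMARK_HIGH,
	MODEL_WMARK_PROMO,
	MODEL_NR_WMARK
};

struct model_zone {
	unsigned long wmark[MODEL_NR_WMARK];
};

/*
 * min_pages: the zone's min watermark, in pages.
 * step:      the gap between consecutive watermarks (the value the diff
 *            calls 'tmp'); assumed here to be already computed.
 */
static void model_setup_wmarks(struct model_zone *z, unsigned long min_pages,
			       unsigned long step)
{
	assert(z);
	z->wmark[MODEL_WMARK_MIN] = min_pages;
	z->wmark[MODEL_WMARK_LOW] = min_pages + step;
	z->wmark[MODEL_WMARK_HIGH] = z->wmark[MODEL_WMARK_LOW] + step;
	z->wmark[MODEL_WMARK_PROMO] = z->wmark[MODEL_WMARK_HIGH] + step;
}

/* Watermark kswapd reclaims up to: PROMO when tiering is on, else HIGH. */
static unsigned long model_kswapd_target(const struct model_zone *z,
					 bool tiering_enabled)
{
	assert(z);
	return z->wmark[tiering_enabled ? MODEL_WMARK_PROMO : MODEL_WMARK_HIGH];
}
```

For example, with a min watermark of 1000 pages and a step of 250 pages, the model yields low = 1250, high = 1500 and promo = 1750. With memory tiering enabled, kswapd in this model keeps reclaiming until 1750 free pages are available rather than 1500, which leaves headroom for promotions.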
In addition to the original page placement optimization among sockets,
the NUMA balancing mechanism is extended to optimize page
placement according to hot/cold among different memory types. So the
sysctl user space interface (numa_balancing) is extended in a backward
compatible way as follows, so that users can enable/disable these
functions individually.
The sysctl is converted from a Boolean value to a bit field. The
definition of the flags is:
- 0: NUMA_BALANCING_DISABLED
- 1: NUMA_BALANCING_NORMAL
- 2: NUMA_BALANCING_MEMORY_TIERING
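As a rough illustration of how such a bit field can be decoded, the following userspace sketch reuses the three flag values listed above. The model_* helper names are hypothetical and exist only for this example; the observation that any non-zero mode turns scanning on is read off the diff, where the mode is passed to a helper taking a bool.

```c
#include <stdbool.h>

/* Flag values as listed in the changelog. */
#define NUMA_BALANCING_DISABLED		0x0
#define NUMA_BALANCING_NORMAL		0x1
#define NUMA_BALANCING_MEMORY_TIERING	0x2

/* Hypothetical helper: is the classic inter-node balancing bit set? */
static bool model_normal_enabled(int mode)
{
	return (mode & NUMA_BALANCING_NORMAL) != 0;
}

/* Hypothetical helper: is the memory tiering bit set? */
static bool model_tiering_enabled(int mode)
{
	return (mode & NUMA_BALANCING_MEMORY_TIERING) != 0;
}

/*
 * Hypothetical helper: page scanning and fault trapping are needed
 * whenever at least one of the bits is set.
 */
static bool model_scanning_enabled(int mode)
{
	return mode != NUMA_BALANCING_DISABLED;
}
```

Writing 1 keeps the old Boolean behaviour, 2 enables only the memory tiering optimization, and 3 (1 ORed with 2) enables both.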
We have tested the patch with the pmbench memory accessing benchmark
with an 80:20 read/write ratio and a Gaussian access address
distribution on a 2-socket Intel server with Optane DC Persistent
Memory Model. The test results show that the pmbench score can
improve by up to 95.9%.
Thanks to Andrew Morton for helping fix the document format error.
Link: https://lkml.kernel.org/r/20220221084529.1052339-3-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/admin-guide/sysctl/kernel.rst | 31 ++++++++++++------
include/linux/mmzone.h | 1
include/linux/sched/sysctl.h | 10 +++++
kernel/sched/core.c | 21 +++++++++---
kernel/sysctl.c | 2 -
mm/migrate.c | 16 ++++++++-
mm/page_alloc.c | 3 +
mm/vmscan.c | 6 ++-
8 files changed, 71 insertions(+), 19 deletions(-)
--- a/Documentation/admin-guide/sysctl/kernel.rst~numa-balancing-optimize-page-placement-for-memory-tiering-system
+++ a/Documentation/admin-guide/sysctl/kernel.rst
@@ -595,16 +595,23 @@ Documentation/admin-guide/kernel-paramet
numa_balancing
==============
-Enables/disables automatic page fault based NUMA memory
-balancing. Memory is moved automatically to nodes
-that access it often.
-
-Enables/disables automatic NUMA memory balancing. On NUMA machines, there
-is a performance penalty if remote memory is accessed by a CPU. When this
-feature is enabled the kernel samples what task thread is accessing memory
-by periodically unmapping pages and later trapping a page fault. At the
-time of the page fault, it is determined if the data being accessed should
-be migrated to a local memory node.
+Enables/disables and configures automatic page fault based NUMA memory
+balancing. Memory is moved automatically to nodes that access it often.
+The value to set can be the result of ORing the following:
+
+= =================================
+0 NUMA_BALANCING_DISABLED
+1 NUMA_BALANCING_NORMAL
+2 NUMA_BALANCING_MEMORY_TIERING
+= =================================
+
+Or NUMA_BALANCING_NORMAL to optimize page placement among different
+NUMA nodes to reduce remote accessing. On NUMA machines, there is a
+performance penalty if remote memory is accessed by a CPU. When this
+feature is enabled the kernel samples what task thread is accessing
+memory by periodically unmapping pages and later trapping a page
+fault. At the time of the page fault, it is determined if the data
+being accessed should be migrated to a local memory node.
The unmapping of pages and trapping faults incur additional overhead that
ideally is offset by improved memory locality but there is no universal
@@ -615,6 +622,10 @@ faults may be controlled by the `numa_ba
numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
numa_balancing_scan_size_mb`_, and numa_balancing_settle_count sysctls.
+Or NUMA_BALANCING_MEMORY_TIERING to optimize page placement among
+different types of memory (represented as different NUMA nodes) to
+place the hot pages in the fast memory. This is implemented based on
+unmapping and page fault too.
numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
===============================================================================================================================
--- a/include/linux/mmzone.h~numa-balancing-optimize-page-placement-for-memory-tiering-system
+++ a/include/linux/mmzone.h
@@ -353,6 +353,7 @@ enum zone_watermarks {
WMARK_MIN,
WMARK_LOW,
WMARK_HIGH,
+ WMARK_PROMO,
NR_WMARK
};
--- a/include/linux/sched/sysctl.h~numa-balancing-optimize-page-placement-for-memory-tiering-system
+++ a/include/linux/sched/sysctl.h
@@ -23,6 +23,16 @@ enum sched_tunable_scaling {
SCHED_TUNABLESCALING_END,
};
+#define NUMA_BALANCING_DISABLED 0x0
+#define NUMA_BALANCING_NORMAL 0x1
+#define NUMA_BALANCING_MEMORY_TIERING 0x2
+
+#ifdef CONFIG_NUMA_BALANCING
+extern int sysctl_numa_balancing_mode;
+#else
+#define sysctl_numa_balancing_mode 0
+#endif
+
/*
* control realtime throttling:
*
--- a/kernel/sched/core.c~numa-balancing-optimize-page-placement-for-memory-tiering-system
+++ a/kernel/sched/core.c
@@ -4279,7 +4279,9 @@ DEFINE_STATIC_KEY_FALSE(sched_numa_balan
#ifdef CONFIG_NUMA_BALANCING
-void set_numabalancing_state(bool enabled)
+int sysctl_numa_balancing_mode;
+
+static void __set_numabalancing_state(bool enabled)
{
if (enabled)
static_branch_enable(&sched_numa_balancing);
@@ -4287,13 +4289,22 @@ void set_numabalancing_state(bool enable
static_branch_disable(&sched_numa_balancing);
}
+void set_numabalancing_state(bool enabled)
+{
+ if (enabled)
+ sysctl_numa_balancing_mode = NUMA_BALANCING_NORMAL;
+ else
+ sysctl_numa_balancing_mode = NUMA_BALANCING_DISABLED;
+ __set_numabalancing_state(enabled);
+}
+
#ifdef CONFIG_PROC_SYSCTL
int sysctl_numa_balancing(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
struct ctl_table t;
int err;
- int state = static_branch_likely(&sched_numa_balancing);
+ int state = sysctl_numa_balancing_mode;
if (write && !capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -4303,8 +4314,10 @@ int sysctl_numa_balancing(struct ctl_tab
err = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
if (err < 0)
return err;
- if (write)
- set_numabalancing_state(state);
+ if (write) {
+ sysctl_numa_balancing_mode = state;
+ __set_numabalancing_state(state);
+ }
return err;
}
#endif
--- a/kernel/sysctl.c~numa-balancing-optimize-page-placement-for-memory-tiering-system
+++ a/kernel/sysctl.c
@@ -1696,7 +1696,7 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = sysctl_numa_balancing,
.extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
+ .extra2 = SYSCTL_FOUR,
},
#endif /* CONFIG_NUMA_BALANCING */
{
--- a/mm/migrate.c~numa-balancing-optimize-page-placement-for-memory-tiering-system
+++ a/mm/migrate.c
@@ -51,6 +51,7 @@
#include <linux/oom.h>
#include <linux/memory.h>
#include <linux/random.h>
+#include <linux/sched/sysctl.h>
#include <asm/tlbflush.h>
@@ -2031,16 +2032,27 @@ static int numamigrate_isolate_page(pg_d
{
int page_lru;
int nr_pages = thp_nr_pages(page);
+ int order = compound_order(page);
- VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page), page);
+ VM_BUG_ON_PAGE(order && !PageTransHuge(page), page);
/* Do not migrate THP mapped by multiple processes */
if (PageTransHuge(page) && total_mapcount(page) > 1)
return 0;
/* Avoid migrating to a node that is nearly full */
- if (!migrate_balanced_pgdat(pgdat, nr_pages))
+ if (!migrate_balanced_pgdat(pgdat, nr_pages)) {
+ int z;
+
+ if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING))
+ return 0;
+ for (z = pgdat->nr_zones - 1; z >= 0; z--) {
+ if (populated_zone(pgdat->node_zones + z))
+ break;
+ }
+ wakeup_kswapd(pgdat->node_zones + z, 0, order, ZONE_MOVABLE);
return 0;
+ }
if (isolate_lru_page(page))
return 0;
--- a/mm/page_alloc.c~numa-balancing-optimize-page-placement-for-memory-tiering-system
+++ a/mm/page_alloc.c
@@ -8441,7 +8441,8 @@ static void __setup_per_zone_wmarks(void
zone->watermark_boost = 0;
zone->_watermark[WMARK_LOW] = min_wmark_pages(zone) + tmp;
- zone->_watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2;
+ zone->_watermark[WMARK_HIGH] = low_wmark_pages(zone) + tmp;
+ zone->_watermark[WMARK_PROMO] = high_wmark_pages(zone) + tmp;
spin_unlock_irqrestore(&zone->lock, flags);
}
--- a/mm/vmscan.c~numa-balancing-optimize-page-placement-for-memory-tiering-system
+++ a/mm/vmscan.c
@@ -56,6 +56,7 @@
#include <linux/swapops.h>
#include <linux/balloon_compaction.h>
+#include <linux/sched/sysctl.h>
#include "internal.h"
@@ -3895,7 +3896,10 @@ static bool pgdat_balanced(pg_data_t *pg
if (!managed_zone(zone))
continue;
- mark = high_wmark_pages(zone);
+ if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
+ mark = wmark_pages(zone, WMARK_PROMO);
+ else
+ mark = high_wmark_pages(zone);
if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx))
return true;
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 155/227] NUMA balancing: optimize page placement for memory tiering system
@ 2022-03-22 21:46 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: ziy, zhongjiang-ali, weixugc, shy828301, shakeelb, riel, rdunlap,
peterz, osalvador, mhocko, mgorman, hannes, feng.tang,
dave.hansen, baolin.wang, ying.huang, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Huang Ying <ying.huang@intel.com>
Subject: NUMA balancing: optimize page placement for memory tiering system
With the advent of various new memory types, some machines will have
multiple types of memory, e.g. DRAM and PMEM (persistent memory). The
memory subsystem of these machines can be called a memory tiering system,
because the performance of the different types of memory usually
differs.
In such a system, because of changes in the memory access pattern etc.,
some pages in the slow memory may become hot globally. So in this
patch, the NUMA balancing mechanism is enhanced to dynamically optimize
the page placement among the different memory types according to how
hot or cold the pages are.
In a typical memory tiering system, there are CPUs, fast memory and
slow memory in each physical NUMA node. The CPUs and the fast memory
will be put in one logical node (called fast memory node), while the
slow memory will be put in another (faked) logical node (called slow
memory node). That is, the fast memory is regarded as local while the
slow memory is regarded as remote. So it's possible for the recently
accessed pages in the slow memory node to be promoted to the fast
memory node via the existing NUMA balancing mechanism.
The original NUMA balancing mechanism stops migrating pages once the
free memory of the target node drops below the high watermark. This
is a reasonable policy if there's only one memory type. But it makes
the original NUMA balancing mechanism almost useless for optimizing
page placement among different memory types. Details are as follows.
It is common for the working-set size of the workload to be
larger than the size of the fast memory nodes. Otherwise, it's
unnecessary to use the slow memory at all. So there are almost never
enough free pages in the fast memory nodes, and the globally hot
pages in the slow memory node cannot be promoted to the fast memory
node. To solve the issue, we have 2 choices as follows:
a. Ignore the free pages watermark checking when promoting hot pages
from the slow memory node to the fast memory node. This will
create some memory pressure in the fast memory node and thus trigger
memory reclaim, so that the cold pages in the fast memory
node will be demoted to the slow memory node.
b. Define a new watermark called wmark_promo which is higher than
wmark_high, and have kswapd reclaim pages until free pages reach
that watermark. The scenario is as follows: when we want to promote
hot pages from slow memory to fast memory, but the fast memory's free
pages would drop below the high watermark with such a promotion, we wake
up kswapd with the wmark_promo watermark in order to demote cold pages and
free up some space. So, the next time we want to promote hot pages we
might have a chance of doing so.
Choice "a" may create high memory pressure in the fast memory node.
If the memory pressure of the workload is high, the memory pressure
may become so high that the memory allocation latency of the workload
is affected, e.g. direct reclaim may be triggered.
Choice "b" works much better in this respect. If the memory
pressure of the workload is high, hot page promotion will stop
earlier because its allocation watermark is higher than that of
normal memory allocation. So in this patch, choice "b" is implemented.
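
The decision described for choice "b" can be sketched in userspace C as
follows. This is only an illustration under simplifying assumptions: the
names promo_decide(), kswapd_target() and enum promo_action are
hypothetical, and the single "free > high + nr_pages" comparison stands in
for the per-zone watermark check the kernel actually performs.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_BALANCING_MEMORY_TIERING 0x2	/* value taken from this patch */

/* Hypothetical outcome of one promotion attempt. */
enum promo_action {
	PROMO_MIGRATE,		/* enough headroom: promote the page */
	PROMO_SKIP,		/* too full and tiering mode off: give up */
	PROMO_SKIP_WAKE_KSWAPD,	/* too full, tiering mode on: give up now,
				 * but ask kswapd to make room */
};

/*
 * Simplified stand-in for the check in numamigrate_isolate_page():
 * promote only if free pages stay above the high watermark after
 * taking nr_pages.
 */
static inline enum promo_action
promo_decide(unsigned long free, unsigned long nr_pages,
	     unsigned long high_wmark, int mode)
{
	assert(nr_pages > 0);
	if (free > high_wmark + nr_pages)
		return PROMO_MIGRATE;
	if (mode & NUMA_BALANCING_MEMORY_TIERING)
		return PROMO_SKIP_WAKE_KSWAPD;
	return PROMO_SKIP;
}

/* Watermark kswapd reclaims up to, mirroring the pgdat_balanced() change. */
static inline unsigned long
kswapd_target(unsigned long high_wmark, unsigned long promo_wmark, int mode)
{
	return (mode & NUMA_BALANCING_MEMORY_TIERING) ? promo_wmark : high_wmark;
}
```

Because kswapd reclaims up to the higher promo watermark while promotion
only needs headroom above the high watermark, a later promotion attempt
has a chance to succeed once kswapd has run.
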
A new zone watermark (WMARK_PROMO) is added, which is larger than the
high watermark and can be controlled via watermark_scale_factor.
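
As a rough illustration of how the new watermark sits above the existing
ones, the sketch below mirrors the arithmetic in the mm/page_alloc.c hunk
of this patch: each watermark is one step "tmp" above the previous one.
setup_wmarks() is a hypothetical userspace helper; in the kernel, tmp is
computed per zone (watermark_scale_factor feeds into it), whereas here the
caller simply passes it in.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace mirror of the enum extended by this patch. */
enum zone_watermarks { WMARK_MIN, WMARK_LOW, WMARK_HIGH, WMARK_PROMO, NR_WMARK };

/*
 * Fill wmark[] from the min watermark and the per-step gap tmp.
 * Hypothetical helper; only the ladder structure is taken from the patch.
 */
static inline void setup_wmarks(unsigned long wmark[NR_WMARK],
				unsigned long min_pages, unsigned long tmp)
{
	assert(wmark != NULL);
	wmark[WMARK_MIN]   = min_pages;
	wmark[WMARK_LOW]   = wmark[WMARK_MIN]  + tmp;
	wmark[WMARK_HIGH]  = wmark[WMARK_LOW]  + tmp;	/* same as min + 2 * tmp */
	wmark[WMARK_PROMO] = wmark[WMARK_HIGH] + tmp;	/* new in this patch */
}
```

With min_pages = 1000 and tmp = 250, for example, the ladder is 1000,
1250, 1500 and 1750 pages, so raising tmp widens the gap between the
high and promo watermarks as well.
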
In addition to the original page placement optimization among sockets,
the NUMA balancing mechanism is extended to optimize page
placement according to hot/cold among different memory types. So the
sysctl user space interface (numa_balancing) is extended in a backward
compatible way as follows, so that users can enable/disable each
function individually.
The sysctl is converted from a Boolean value to a bit field. The
definition of the flags is:
- 0: NUMA_BALANCING_DISABLED
- 1: NUMA_BALANCING_NORMAL
- 2: NUMA_BALANCING_MEMORY_TIERING
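
A minimal sketch of how such a bit field can be decoded is shown below.
The flag values are the ones listed above; the helper names
mode_has_normal() and mode_has_tiering() are hypothetical and do not
exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined by this patch in include/linux/sched/sysctl.h. */
#define NUMA_BALANCING_DISABLED		0x0
#define NUMA_BALANCING_NORMAL		0x1
#define NUMA_BALANCING_MEMORY_TIERING	0x2

/* The two modes must occupy distinct bits so that they can be ORed. */
static_assert((NUMA_BALANCING_NORMAL & NUMA_BALANCING_MEMORY_TIERING) == 0,
	      "mode flags must not overlap");

/* Hypothetical helper: is classic cross-socket balancing requested? */
static inline bool mode_has_normal(int mode)
{
	return (mode & NUMA_BALANCING_NORMAL) != 0;
}

/* Hypothetical helper: is hot page promotion between memory types requested? */
static inline bool mode_has_tiering(int mode)
{
	return (mode & NUMA_BALANCING_MEMORY_TIERING) != 0;
}
```

The old Boolean values keep their meaning (0 disables everything, 1
enables only the original behaviour), which is what makes the extension
backward compatible; 3 enables both modes at once.
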
We have tested the patch with the pmbench memory accessing benchmark
with an 80:20 read/write ratio and a Gauss access address
distribution on a 2 socket Intel server with Optane DC Persistent
Memory Model. The test results show that the pmbench score can
improve by up to 95.9%.
Thanks to Andrew Morton for helping fix the document format error.
Link: https://lkml.kernel.org/r/20220221084529.1052339-3-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/admin-guide/sysctl/kernel.rst | 31 ++++++++++++------
include/linux/mmzone.h | 1
include/linux/sched/sysctl.h | 10 +++++
kernel/sched/core.c | 21 +++++++++---
kernel/sysctl.c | 2 -
mm/migrate.c | 16 ++++++++-
mm/page_alloc.c | 3 +
mm/vmscan.c | 6 ++-
8 files changed, 71 insertions(+), 19 deletions(-)
--- a/Documentation/admin-guide/sysctl/kernel.rst~numa-balancing-optimize-page-placement-for-memory-tiering-system
+++ a/Documentation/admin-guide/sysctl/kernel.rst
@@ -595,16 +595,23 @@ Documentation/admin-guide/kernel-paramet
numa_balancing
==============
-Enables/disables automatic page fault based NUMA memory
-balancing. Memory is moved automatically to nodes
-that access it often.
-
-Enables/disables automatic NUMA memory balancing. On NUMA machines, there
-is a performance penalty if remote memory is accessed by a CPU. When this
-feature is enabled the kernel samples what task thread is accessing memory
-by periodically unmapping pages and later trapping a page fault. At the
-time of the page fault, it is determined if the data being accessed should
-be migrated to a local memory node.
+Enables/disables and configures automatic page fault based NUMA memory
+balancing. Memory is moved automatically to nodes that access it often.
+The value to set can be the result of ORing the following:
+
+= =================================
+0 NUMA_BALANCING_DISABLED
+1 NUMA_BALANCING_NORMAL
+2 NUMA_BALANCING_MEMORY_TIERING
+= =================================
+
+Or NUMA_BALANCING_NORMAL to optimize page placement among different
+NUMA nodes to reduce remote accessing. On NUMA machines, there is a
+performance penalty if remote memory is accessed by a CPU. When this
+feature is enabled the kernel samples what task thread is accessing
+memory by periodically unmapping pages and later trapping a page
+fault. At the time of the page fault, it is determined if the data
+being accessed should be migrated to a local memory node.
The unmapping of pages and trapping faults incur additional overhead that
ideally is offset by improved memory locality but there is no universal
@@ -615,6 +622,10 @@ faults may be controlled by the `numa_ba
numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
numa_balancing_scan_size_mb`_, and numa_balancing_settle_count sysctls.
+Or NUMA_BALANCING_MEMORY_TIERING to optimize page placement among
+different types of memory (represented as different NUMA nodes) to
+place the hot pages in the fast memory. This is implemented based on
+unmapping and page fault too.
numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
===============================================================================================================================
--- a/include/linux/mmzone.h~numa-balancing-optimize-page-placement-for-memory-tiering-system
+++ a/include/linux/mmzone.h
@@ -353,6 +353,7 @@ enum zone_watermarks {
WMARK_MIN,
WMARK_LOW,
WMARK_HIGH,
+ WMARK_PROMO,
NR_WMARK
};
--- a/include/linux/sched/sysctl.h~numa-balancing-optimize-page-placement-for-memory-tiering-system
+++ a/include/linux/sched/sysctl.h
@@ -23,6 +23,16 @@ enum sched_tunable_scaling {
SCHED_TUNABLESCALING_END,
};
+#define NUMA_BALANCING_DISABLED 0x0
+#define NUMA_BALANCING_NORMAL 0x1
+#define NUMA_BALANCING_MEMORY_TIERING 0x2
+
+#ifdef CONFIG_NUMA_BALANCING
+extern int sysctl_numa_balancing_mode;
+#else
+#define sysctl_numa_balancing_mode 0
+#endif
+
/*
* control realtime throttling:
*
--- a/kernel/sched/core.c~numa-balancing-optimize-page-placement-for-memory-tiering-system
+++ a/kernel/sched/core.c
@@ -4279,7 +4279,9 @@ DEFINE_STATIC_KEY_FALSE(sched_numa_balan
#ifdef CONFIG_NUMA_BALANCING
-void set_numabalancing_state(bool enabled)
+int sysctl_numa_balancing_mode;
+
+static void __set_numabalancing_state(bool enabled)
{
if (enabled)
static_branch_enable(&sched_numa_balancing);
@@ -4287,13 +4289,22 @@ void set_numabalancing_state(bool enable
static_branch_disable(&sched_numa_balancing);
}
+void set_numabalancing_state(bool enabled)
+{
+ if (enabled)
+ sysctl_numa_balancing_mode = NUMA_BALANCING_NORMAL;
+ else
+ sysctl_numa_balancing_mode = NUMA_BALANCING_DISABLED;
+ __set_numabalancing_state(enabled);
+}
+
#ifdef CONFIG_PROC_SYSCTL
int sysctl_numa_balancing(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
struct ctl_table t;
int err;
- int state = static_branch_likely(&sched_numa_balancing);
+ int state = sysctl_numa_balancing_mode;
if (write && !capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -4303,8 +4314,10 @@ int sysctl_numa_balancing(struct ctl_tab
err = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
if (err < 0)
return err;
- if (write)
- set_numabalancing_state(state);
+ if (write) {
+ sysctl_numa_balancing_mode = state;
+ __set_numabalancing_state(state);
+ }
return err;
}
#endif
--- a/kernel/sysctl.c~numa-balancing-optimize-page-placement-for-memory-tiering-system
+++ a/kernel/sysctl.c
@@ -1696,7 +1696,7 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = sysctl_numa_balancing,
.extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
+ .extra2 = SYSCTL_FOUR,
},
#endif /* CONFIG_NUMA_BALANCING */
{
--- a/mm/migrate.c~numa-balancing-optimize-page-placement-for-memory-tiering-system
+++ a/mm/migrate.c
@@ -51,6 +51,7 @@
#include <linux/oom.h>
#include <linux/memory.h>
#include <linux/random.h>
+#include <linux/sched/sysctl.h>
#include <asm/tlbflush.h>
@@ -2031,16 +2032,27 @@ static int numamigrate_isolate_page(pg_d
{
int page_lru;
int nr_pages = thp_nr_pages(page);
+ int order = compound_order(page);
- VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page), page);
+ VM_BUG_ON_PAGE(order && !PageTransHuge(page), page);
/* Do not migrate THP mapped by multiple processes */
if (PageTransHuge(page) && total_mapcount(page) > 1)
return 0;
/* Avoid migrating to a node that is nearly full */
- if (!migrate_balanced_pgdat(pgdat, nr_pages))
+ if (!migrate_balanced_pgdat(pgdat, nr_pages)) {
+ int z;
+
+ if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING))
+ return 0;
+ for (z = pgdat->nr_zones - 1; z >= 0; z--) {
+ if (populated_zone(pgdat->node_zones + z))
+ break;
+ }
+ wakeup_kswapd(pgdat->node_zones + z, 0, order, ZONE_MOVABLE);
return 0;
+ }
if (isolate_lru_page(page))
return 0;
--- a/mm/page_alloc.c~numa-balancing-optimize-page-placement-for-memory-tiering-system
+++ a/mm/page_alloc.c
@@ -8441,7 +8441,8 @@ static void __setup_per_zone_wmarks(void
zone->watermark_boost = 0;
zone->_watermark[WMARK_LOW] = min_wmark_pages(zone) + tmp;
- zone->_watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2;
+ zone->_watermark[WMARK_HIGH] = low_wmark_pages(zone) + tmp;
+ zone->_watermark[WMARK_PROMO] = high_wmark_pages(zone) + tmp;
spin_unlock_irqrestore(&zone->lock, flags);
}
--- a/mm/vmscan.c~numa-balancing-optimize-page-placement-for-memory-tiering-system
+++ a/mm/vmscan.c
@@ -56,6 +56,7 @@
#include <linux/swapops.h>
#include <linux/balloon_compaction.h>
+#include <linux/sched/sysctl.h>
#include "internal.h"
@@ -3895,7 +3896,10 @@ static bool pgdat_balanced(pg_data_t *pg
if (!managed_zone(zone))
continue;
- mark = high_wmark_pages(zone);
+ if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
+ mark = wmark_pages(zone, WMARK_PROMO);
+ else
+ mark = high_wmark_pages(zone);
if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx))
return true;
}
_
* [patch 156/227] memory tiering: skip to scan fast memory
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: ziy, zhongjiang-ali, weixugc, shy828301, shakeelb, riel, rdunlap,
peterz, osalvador, mhocko, mgorman, hannes, feng.tang,
dave.hansen, baolin.wang, ying.huang, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Huang Ying <ying.huang@intel.com>
Subject: memory tiering: skip to scan fast memory
If NUMA balancing isn't used to optimize page placement among
sockets but only among memory types, the hot pages in the fast memory
node can't be migrated (promoted) anywhere. So it's unnecessary
to scan the pages in the fast memory node by changing their PTE/PMD
mapping to PROT_NONE, and the resulting page faults can be avoided too.
In the test, if only the memory tiering NUMA balancing mode is enabled,
the number of NUMA balancing hint faults for the DRAM node is
reduced to almost 0 with the patch, while the benchmark score doesn't
change visibly.
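
The skip condition can be sketched as a small predicate. This is a
userspace illustration only: skip_prot_numa_scan() and the two-node
topology table are hypothetical, with the table standing in for the
kernel's node_is_toptier().

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_BALANCING_NORMAL		0x1	/* values from the previous patch */
#define NUMA_BALANCING_MEMORY_TIERING	0x2

#define NR_NODES_SKETCH 2

/* Hypothetical topology: node 0 is DRAM (top tier), node 1 is PMEM. */
static const bool node_toptier_sketch[NR_NODES_SKETCH] = { true, false };

/*
 * Return true if pages on node nid need not be made PROT_NONE:
 * without normal NUMA balancing, a top-tier page has nowhere to be
 * promoted to, so a hint fault on it would be pure overhead.
 */
static inline bool skip_prot_numa_scan(int mode, int nid)
{
	assert(nid >= 0 && nid < NR_NODES_SKETCH);
	return !(mode & NUMA_BALANCING_NORMAL) && node_toptier_sketch[nid];
}
```

With only NUMA_BALANCING_MEMORY_TIERING set, the DRAM node is skipped and
the PMEM node is still scanned; once NUMA_BALANCING_NORMAL is also set,
both nodes are scanned as before.
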
Link: https://lkml.kernel.org/r/20220221084529.1052339-4-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/huge_memory.c | 30 +++++++++++++++++++++---------
mm/mprotect.c | 13 ++++++++++++-
2 files changed, 33 insertions(+), 10 deletions(-)
--- a/mm/huge_memory.c~memory-tiering-skip-to-scan-fast-memory
+++ a/mm/huge_memory.c
@@ -34,6 +34,7 @@
#include <linux/oom.h>
#include <linux/numa.h>
#include <linux/page_owner.h>
+#include <linux/sched/sysctl.h>
#include <asm/tlb.h>
#include <asm/pgalloc.h>
@@ -1766,17 +1767,28 @@ int change_huge_pmd(struct vm_area_struc
}
#endif
- /*
- * Avoid trapping faults against the zero page. The read-only
- * data is likely to be read-cached on the local CPU and
- * local/remote hits to the zero page are not interesting.
- */
- if (prot_numa && is_huge_zero_pmd(*pmd))
- goto unlock;
+ if (prot_numa) {
+ struct page *page;
+ /*
+ * Avoid trapping faults against the zero page. The read-only
+ * data is likely to be read-cached on the local CPU and
+ * local/remote hits to the zero page are not interesting.
+ */
+ if (is_huge_zero_pmd(*pmd))
+ goto unlock;
- if (prot_numa && pmd_protnone(*pmd))
- goto unlock;
+ if (pmd_protnone(*pmd))
+ goto unlock;
+ page = pmd_page(*pmd);
+ /*
+ * Skip scanning top tier node if normal numa
+ * balancing is disabled
+ */
+ if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
+ node_is_toptier(page_to_nid(page)))
+ goto unlock;
+ }
/*
* In case prot_numa, we are under mmap_read_lock(mm). It's critical
* to not clear pmd intermittently to avoid race with MADV_DONTNEED
--- a/mm/mprotect.c~memory-tiering-skip-to-scan-fast-memory
+++ a/mm/mprotect.c
@@ -29,6 +29,7 @@
#include <linux/uaccess.h>
#include <linux/mm_inline.h>
#include <linux/pgtable.h>
+#include <linux/sched/sysctl.h>
#include <asm/cacheflush.h>
#include <asm/mmu_context.h>
#include <asm/tlbflush.h>
@@ -83,6 +84,7 @@ static unsigned long change_pte_range(st
*/
if (prot_numa) {
struct page *page;
+ int nid;
/* Avoid TLB flush if possible */
if (pte_protnone(oldpte))
@@ -109,7 +111,16 @@ static unsigned long change_pte_range(st
* Don't mess with PTEs if page is already on the node
* a single-threaded process is running on.
*/
- if (target_node == page_to_nid(page))
+ nid = page_to_nid(page);
+ if (target_node == nid)
+ continue;
+
+ /*
+ * Skip scanning top tier node if normal numa
+ * balancing is disabled
+ */
+ if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
+ node_is_toptier(nid))
continue;
}
_
* [patch 157/227] mm: page_io: fix psi memory pressure error on cold swapins
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: yuzhao, minchan, iamjoonsoo.kim, cgel.zte, hannes, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: page_io: fix psi memory pressure error on cold swapins
Once upon a time, all swapins counted toward memory pressure[1]. Then
Joonsoo introduced workingset detection for anonymous pages and we gained
the ability to distinguish hot from cold swapins[2][3]. But we failed to
update swap_readpage() accordingly, and now we account partial memory
pressure in the swapin path of cold memory.
Not for all situations - which adds more inconsistency: paths using the
conventional submit_bio() and lock_page() route will not see much pressure
- unless storage itself is heavily congested and the bio submissions
stall. ZRAM and ZSWAP do most of the work directly from swap_readpage()
and will see all swapins reflected as pressure.
IOW, a workload doing cold swapins could see little to no pressure
reported with on-disk swap, but potentially high pressure with a zram or
zswap backend. That confuses any psi-based health monitoring, load
shedding, proactive reclaim, or userspace OOM killing schemes that might
be in place for the workload.
Restore consistency by making all swapin stall accounting conditional on
the page actually being part of the workingset.
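
The accounting rule can be sketched in userspace C as below. The counters
and the *_sketch() functions are hypothetical stand-ins for
psi_memstall_enter(), psi_memstall_leave() and swap_readpage(); only the
control flow is taken from the patch, which reads the workingset flag once
up front and uses the saved value for both calls so that they stay paired.

```c
#include <assert.h>
#include <stdbool.h>

static int memstall_depth;	/* stall sections currently open */
static int memstall_total;	/* stall sections opened so far */

/* Hypothetical stand-in for psi_memstall_enter(). */
static inline void memstall_enter_sketch(void)
{
	memstall_depth++;
	memstall_total++;
}

/* Hypothetical stand-in for psi_memstall_leave(). */
static inline void memstall_leave_sketch(void)
{
	assert(memstall_depth > 0);	/* leave must match an enter */
	memstall_depth--;
}

/* Only a swapin of a workingset (hot) page counts as a memory stall. */
static inline void swap_readpage_sketch(bool page_in_workingset)
{
	bool workingset = page_in_workingset;	/* sampled once */

	if (workingset)
		memstall_enter_sketch();
	/* ... the actual read from the swap backend would happen here ... */
	if (workingset)
		memstall_leave_sketch();
}
```

A cold swapin leaves both counters untouched regardless of the swap
backend, which is the consistency the patch is after.
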
[1] commit 937790699be9 ("mm/page_io.c: annotate refault stalls from swap_readpage")
[2] commit aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU")
[3] commit cad8320b4b39 ("mm/swap: don't SetPageWorkingset unconditionally during swapin")
Link: https://lkml.kernel.org/r/20220214214921.419687-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: CGEL <cgel.zte@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_io.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/mm/page_io.c~mm-page_io-fix-psi-memory-pressure-error-on-cold-swapins
+++ a/mm/page_io.c
@@ -359,6 +359,7 @@ int swap_readpage(struct page *page, boo
struct bio *bio;
int ret = 0;
struct swap_info_struct *sis = page_swap_info(page);
+ bool workingset = PageWorkingset(page);
unsigned long pflags;
VM_BUG_ON_PAGE(!PageSwapCache(page) && !synchronous, page);
@@ -370,7 +371,8 @@ int swap_readpage(struct page *page, boo
* or the submitting cgroup IO-throttled, submission can be a
* significant part of overall IO time.
*/
- psi_memstall_enter(&pflags);
+ if (workingset)
+ psi_memstall_enter(&pflags);
delayacct_swapin_start();
if (frontswap_load(page) == 0) {
@@ -433,7 +435,8 @@ int swap_readpage(struct page *page, boo
bio_put(bio);
out:
- psi_memstall_leave(&pflags);
+ if (workingset)
+ psi_memstall_leave(&pflags);
delayacct_swapin_end();
return ret;
}
_
* [patch 158/227] mm/vmstat: add event for ksm swapping in copy
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: yang.shi, saravanand, ran.xiaokai, hughd, dave.hansen,
yang.yang29, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Yang Yang <yang.yang29@zte.com.cn>
Subject: mm/vmstat: add event for ksm swapping in copy
When a fault brings in from swap what used to be a KSM page, and that
page had been swapped in before, the system has to make a copy and
leave remerging the pages to a later pass of ksmd.
That is not good for performance, so we should reduce this kind of copying.
There are some ways to reduce it, for example lessening swappiness or
the madvise(, , MADV_MERGEABLE) range. So add this event to support
this tuning, just like the patch "mm, THP, swap: add THP swapping out
fallback counting".
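
For tuning, the new counter can be read from /proc/vmstat, where it shows
up as "ksm_swpin_copy" when both CONFIG_SWAP and CONFIG_KSM are enabled.
A minimal userspace sketch of the lookup follows; vmstat_lookup() is a
hypothetical helper that works on text already read into a buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up name in vmstat-style text (one "key value" pair per line).
 * Returns 0 and stores the value in *out on success, -1 if the key
 * is absent.
 */
static inline int vmstat_lookup(const char *text, const char *name,
				unsigned long long *out)
{
	char key[64];
	unsigned long long val;
	int consumed;

	assert(text && name && out);
	/* %n records how far sscanf got, so the next pass resumes there. */
	while (sscanf(text, "%63s %llu%n", key, &val, &consumed) == 2) {
		if (strcmp(key, name) == 0) {
			*out = val;
			return 0;
		}
		text += consumed;
	}
	return -1;
}
```

A caller would read /proc/vmstat into a NUL-terminated buffer and pass it
in; sampling the value before and after a workload run gives the number
of KSM swapin copies it caused.
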
Link: https://lkml.kernel.org/r/20220113023839.758845-1-yang.yang29@zte.com.cn
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Cc: Hugh Dickins <hughd@google.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Saravanan D <saravanand@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/vm_event_item.h | 3 +++
mm/ksm.c | 3 +++
mm/vmstat.c | 3 +++
3 files changed, 9 insertions(+)
--- a/include/linux/vm_event_item.h~mm-vmstat-add-event-for-ksm-swapping-in-copy
+++ a/include/linux/vm_event_item.h
@@ -129,6 +129,9 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
#ifdef CONFIG_SWAP
SWAP_RA,
SWAP_RA_HIT,
+#ifdef CONFIG_KSM
+ KSM_SWPIN_COPY,
+#endif
#endif
#ifdef CONFIG_X86
DIRECT_MAP_LEVEL2_SPLIT,
--- a/mm/ksm.c~mm-vmstat-add-event-for-ksm-swapping-in-copy
+++ a/mm/ksm.c
@@ -2595,6 +2595,9 @@ struct page *ksm_might_need_to_copy(stru
SetPageDirty(new_page);
__SetPageUptodate(new_page);
__SetPageLocked(new_page);
+#ifdef CONFIG_SWAP
+ count_vm_event(KSM_SWPIN_COPY);
+#endif
}
return new_page;
--- a/mm/vmstat.c~mm-vmstat-add-event-for-ksm-swapping-in-copy
+++ a/mm/vmstat.c
@@ -1388,6 +1388,9 @@ const char * const vmstat_text[] = {
#ifdef CONFIG_SWAP
"swap_ra",
"swap_ra_hit",
+#ifdef CONFIG_KSM
+ "ksm_swpin_copy",
+#endif
#endif
#ifdef CONFIG_X86
"direct_map_level2_splits",
_
* [patch 159/227] mm/ksm: use helper macro __ATTR_RW
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: linmiaohe, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/ksm: use helper macro __ATTR_RW
Use the helper macro __ATTR_RW to define KSM_ATTR to make the code clearer.
Minor readability improvement.
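
The equivalence relied on here can be illustrated with a userspace
imitation of the two macros. Everything named *_SKETCH or *_sketch below
is hypothetical; the sketch assumes only what the diff shows, namely that
the short form stands for mode 0644 plus the _name##_show and
_name##_store callbacks.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, much reduced stand-in for struct kobj_attribute. */
struct attr_sketch {
	const char *name;
	unsigned int mode;
	int (*show)(char *buf);
	int (*store)(const char *buf);
};

#define ATTR_SKETCH(_name, _mode, _show, _store) \
	{ .name = #_name, .mode = (_mode), .show = (_show), .store = (_store) }
/* Short form: derive mode and callback names from the attribute name. */
#define ATTR_RW_SKETCH(_name) \
	ATTR_SKETCH(_name, 0644, _name##_show, _name##_store)

static int sleep_millisecs_show(char *buf) { buf[0] = '\0'; return 0; }
static int sleep_millisecs_store(const char *buf) { return buf ? 0 : -1; }

static const struct attr_sketch long_form =
	ATTR_SKETCH(sleep_millisecs, 0644,
		    sleep_millisecs_show, sleep_millisecs_store);
static const struct attr_sketch short_form = ATTR_RW_SKETCH(sleep_millisecs);

/* Field-by-field comparison of two attribute descriptions. */
static inline int attr_equal(const struct attr_sketch *a,
			     const struct attr_sketch *b)
{
	assert(a && b);
	return strcmp(a->name, b->name) == 0 && a->mode == b->mode &&
	       a->show == b->show && a->store == b->store;
}
```

Both initializers describe the same attribute, so the shorter spelling
changes nothing but the amount of text in the definition.
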
Link: https://lkml.kernel.org/r/20220221115809.26381-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/ksm.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/mm/ksm.c~mm-ksm-use-helper-macro-__attr_rw
+++ a/mm/ksm.c
@@ -2829,8 +2829,7 @@ static void wait_while_offlining(void)
#define KSM_ATTR_RO(_name) \
static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
#define KSM_ATTR(_name) \
- static struct kobj_attribute _name##_attr = \
- __ATTR(_name, 0644, _name##_show, _name##_store)
+ static struct kobj_attribute _name##_attr = __ATTR_RW(_name)
static ssize_t sleep_millisecs_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
_
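As a rough illustration of why the two definitions are equivalent, here is a
user-space sketch. The struct layouts and the SKETCH_* macro names are
simplified stand-ins invented for this example (the kernel's real
kobj_attribute callbacks take more arguments); only the expansion shape,
mode 0644 plus <name>_show and <name>_store, is meant to mirror the kernel
helper.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct attribute {
	const char *name;
	unsigned short mode;
};

struct kobj_attribute {
	struct attribute attr;
	long (*show)(char *buf);
	long (*store)(const char *buf, size_t count);
};

/* Hypothetical names mirroring the shape of __ATTR and __ATTR_RW. */
#define SKETCH_ATTR(_name, _mode, _show, _store) \
	{ .attr = { .name = #_name, .mode = (_mode) }, \
	  .show = _show, .store = _store }
#define SKETCH_ATTR_RW(_name) \
	SKETCH_ATTR(_name, 0644, _name##_show, _name##_store)

/* The patched form of KSM_ATTR: one line instead of two. */
#define KSM_ATTR(_name) \
	static struct kobj_attribute _name##_attr = SKETCH_ATTR_RW(_name)

static long sleep_millisecs_show(char *buf)
{
	strcpy(buf, "20\n");	/* placeholder value for the sketch */
	return 3;
}

static long sleep_millisecs_store(const char *buf, size_t count)
{
	(void)buf;
	return (long)count;
}

KSM_ATTR(sleep_millisecs);
```

The read-write helper derives the mode and both callback names from the
attribute name, so the explicit 0644 and the _show/_store pair no longer need
to be spelled out at each use.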
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 160/227] mm/hwpoison: check the subpage, not the head page
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: shy828301, rientjes, naoya.horiguchi, mike.kravetz, willy, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: mm/hwpoison: check the subpage, not the head page
Hardware poison is tracked on a per-page basis, not on the head page.
Link: https://lkml.kernel.org/r/20220130013042.1906881-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/rmap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/rmap.c~mm-hwpoison-check-the-subpage-not-the-head-page
+++ a/mm/rmap.c
@@ -1553,7 +1553,7 @@ static bool try_to_unmap_one(struct page
/* Update high watermark before we lower rss */
update_hiwater_rss(mm);
- if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
+ if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) {
pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
if (PageHuge(page)) {
hugetlb_count_sub(compound_nr(page), mm);
@@ -1873,7 +1873,7 @@ static bool try_to_migrate_one(struct pa
* memory are supported.
*/
subpage = page;
- } else if (PageHWPoison(page)) {
+ } else if (PageHWPoison(subpage)) {
pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
if (PageHuge(page)) {
hugetlb_count_sub(compound_nr(page), mm);
_
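The difference between the two checks can be modelled in user space. This is
a toy sketch, not kernel code: struct toy_page and both helper names are
invented for the example, and it assumes a compound page can be represented
as an array whose element 0 is the head page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one poison flag per page; element 0 of the array is the head. */
struct toy_page {
	bool hwpoison;
};

/* Old check: consults only the head page, whatever subpage is mapped. */
static bool poisoned_via_head(const struct toy_page *head, size_t idx)
{
	(void)idx;
	return head[0].hwpoison;
}

/* New check: consults the subpage that the mapping actually refers to. */
static bool poisoned_via_subpage(const struct toy_page *head, size_t idx)
{
	return head[idx].hwpoison;
}
```

With poison recorded on subpage 2 only, the head-based check reports the page
as clean while the subpage-based check sees the poison, which is the mismatch
the patch addresses.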
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 161/227] mm/madvise: use vma_lookup() instead of find_vma()
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: david, linmiaohe, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/madvise: use vma_lookup() instead of find_vma()
Using vma_lookup() verifies that the start address is contained in the found
vma. This makes the code easier to read.
Link: https://lkml.kernel.org/r/20220311082731.63513-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/madvise.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/madvise.c~mm-madvise-use-vma_lookup-instead-of-find_vma
+++ a/mm/madvise.c
@@ -849,8 +849,8 @@ static long madvise_populate(struct vm_a
* our VMA might have been split.
*/
if (!vma || start >= vma->vm_end) {
- vma = find_vma(mm, start);
- if (!vma || start < vma->vm_start)
+ vma = vma_lookup(mm, start);
+ if (!vma)
return -ENOMEM;
}
_
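The semantic difference between the two lookups can be sketched in user
space as follows. The toy_* names and the sorted-array representation are
assumptions made for this example; the kernel uses its own VMA data
structure, but the containment rule is the same: find_vma() may return a VMA
that starts above the address, while vma_lookup() returns NULL in that case.

```c
#include <stddef.h>

/* Toy VMA covering the half-open range [vm_start, vm_end). */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * Like find_vma(): return the first VMA whose end lies above addr.
 * The result may start above addr, i.e. addr can sit in a hole before it.
 * The array is assumed to be sorted by address.
 */
static const struct toy_vma *toy_find_vma(const struct toy_vma *v, size_t n,
					  unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr < v[i].vm_end)
			return &v[i];
	return NULL;
}

/* Like vma_lookup(): return a VMA only if it actually contains addr. */
static const struct toy_vma *toy_vma_lookup(const struct toy_vma *v, size_t n,
					    unsigned long addr)
{
	const struct toy_vma *vma = toy_find_vma(v, n, addr);

	if (vma && addr < vma->vm_start)
		vma = NULL;
	return vma;
}
```

Folding the vm_start comparison into the lookup is what lets the caller in
madvise_populate() shrink its check to a plain NULL test.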
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 162/227] mm: madvise: return correct bytes advised with process_madvise
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: vbabka, surenb, stable, sfr, rientjes, nadav.amit, minchan,
mhocko, quic_charante, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Charan Teja Kalla <quic_charante@quicinc.com>
Subject: mm: madvise: return correct bytes advised with process_madvise
Patch series "mm: madvise: return correct bytes processed with
process_madvise", v2. With process_madvise(), always choose to return the
non-zero number of processed bytes over an error. This helps the user to
know which VMA, passed in the 'struct iovec' vector list, failed to be
advised, and thus to decide whether to retry or skip that VMA.
This patch (of 2):
The process_madvise() system call returns an error even after processing
some of the VMAs passed in the 'struct iovec' vector list, which leaves the
user unsure where to restart the advice. It also contradicts the syscall
man page[1], which says that the "return value may be less than the total
number of requested bytes, if an error occurred after some iovec elements
were already processed.".
Consider a user who passes 10 VMAs in the 'struct iovec' vector list, of
which 9 are processed and one fails. The call then just returns the error
caused by the failed VMA even though the first 9 VMAs were processed,
leaving the user unsure which VMA failed. Returning the number of bytes
processed instead tells the user which VMA failed, so that they can retry or
skip the advice on that VMA.
[1] https://man7.org/linux/man-pages/man2/process_madvise.2.html
Link: https://lkml.kernel.org/r/cover.1647008754.git.quic_charante@quicinc.com
Link: https://lkml.kernel.org/r/125b61a0edcee5c2db8658aed9d06a43a19ccafc.1647008754.git.quic_charante@quicinc.com
Fixes: ecb8ac8b1f14 ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/madvise.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/mm/madvise.c~mm-madvise-return-correct-bytes-advised-with-process_madvise
+++ a/mm/madvise.c
@@ -1435,8 +1435,7 @@ SYSCALL_DEFINE5(process_madvise, int, pi
iov_iter_advance(&iter, iovec.iov_len);
}
- if (ret == 0)
- ret = total_len - iov_iter_count(&iter);
+ ret = (total_len - iov_iter_count(&iter)) ? : ret;
release_mm:
mmput(mm);
_
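The changed return-value rule can be written out as two small functions. This
is a user-space sketch with invented names (ret_before, ret_after); the
parameter remaining stands in for iov_iter_count(&iter). The patch uses the
GNU "a ?: b" conditional, which yields a when a is non-zero and b otherwise;
the sketch spells that out in portable C.

```c
#include <errno.h>
#include <stddef.h>

/* Before the patch: any error hides the bytes that were already advised. */
static long ret_before(size_t total_len, size_t remaining, long ret)
{
	if (ret == 0)
		ret = (long)(total_len - remaining);
	return ret;
}

/* After the patch: a non-zero byte count takes priority over the error. */
static long ret_after(size_t total_len, size_t remaining, long ret)
{
	size_t done = total_len - remaining;

	return done ? (long)done : ret;
}
```

With ten 4096-byte ranges of which the last one fails with -EINVAL, the old
rule returns -EINVAL and the new rule returns 36864, so the caller can tell
that the first nine ranges were handled. If nothing at all was processed, the
error is still returned.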
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 163/227] mm: madvise: skip unmapped vma holes passed to process_madvise
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: vbabka, surenb, stable, sfr, rientjes, nadav.amit, minchan,
mhocko, quic_charante, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Charan Teja Kalla <quic_charante@quicinc.com>
Subject: mm: madvise: skip unmapped vma holes passed to process_madvise
The process_madvise() system call is expected to skip holes in the vma
ranges passed through the 'struct iovec' vector list. But do_madvise(), which
process_madvise() calls for each vma, returns ENOMEM in case of unmapped
holes, even though the VMA is processed.
Thus process_madvise() should treat ENOMEM as expected, consider the
VMA passed to it as processed, and continue processing the other vmas in the
vector list. If -ENOMEM is returned to the user even though the VMA is
processed, the user will be unable to figure out where to start the next
madvise.
Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.1647008754.git.quic_charante@quicinc.com
Fixes: ecb8ac8b1f14 ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/madvise.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/mm/madvise.c~mm-madvise-skip-unmapped-vma-holes-passed-to-process_madvise
+++ a/mm/madvise.c
@@ -1428,9 +1428,16 @@ SYSCALL_DEFINE5(process_madvise, int, pi
while (iov_iter_count(&iter)) {
iovec = iov_iter_iovec(&iter);
+ /*
+ * do_madvise returns ENOMEM if unmapped holes are present
+ * in the passed VMA. process_madvise() is expected to skip
+ * unmapped holes passed to it in the 'struct iovec' list
+ * and not fail because of them. Thus treat -ENOMEM return
+ * from do_madvise as valid and continue processing.
+ */
ret = do_madvise(mm, (unsigned long)iovec.iov_base,
iovec.iov_len, behavior);
- if (ret < 0)
+ if (ret < 0 && ret != -ENOMEM)
break;
iov_iter_advance(&iter, iovec.iov_len);
}
_
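The control flow of the patched loop can be modelled as below. This is a
sketch under assumptions: advise_all and its parameters are invented for the
example, and results[i] stands in for whatever do_madvise() would return for
the i-th iovec element.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Toy version of the loop: results[i] is 0 on success or a negative errno.
 * Returns how many elements were consumed; *last_ret receives the final
 * return code seen by the loop.
 */
static size_t advise_all(const int *results, size_t n, int *last_ret)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = results[i];
		if (ret < 0 && ret != -ENOMEM)
			break;	/* a real failure stops processing */
		/* success, or -ENOMEM from an unmapped hole: keep going */
	}
	*last_ret = ret;
	return i;
}
```

An element that reports -ENOMEM is counted as consumed, just as the patched
loop still calls iov_iter_advance() for it, whereas any other error stops the
loop at that element.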
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 164/227] mm, memory_hotplug: make arch_alloc_nodedata independent on CONFIG_MEMORY_HOTPLUG
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: tj, rppt, richard.weiyang, raquini, osalvador, npache,
eric.dumazet, dennis, david, cl, amakhalov, mhocko, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Michal Hocko <mhocko@suse.com>
Subject: mm, memory_hotplug: make arch_alloc_nodedata independent on CONFIG_MEMORY_HOTPLUG
Patch series "mm, memory_hotplug: handle unitialized numa node gracefully".
The core of the fix is patch 2, which also links existing bug reports. The
high-level goal is to have all possible numa nodes have their pgdat
allocated and initialized so that
	for_each_possible_node(nid)
		NODE_DATA(nid)
will never return garbage. This has proven to be a problem in several
places where an offline numa node is used for an allocation, only to find
that node_data, and therefore the allocation fallback zonelists, are not
initialized, and such an allocation request blows up.
There were attempts to address that by checking node_online in several
places, including the page allocator. This patchset approaches the problem
from a different perspective: instead of special casing, which just
adds runtime overhead, it allocates pglist_data for each possible node.
This can add some memory overhead on platforms with a high number of
possible nodes if they do not contain any memory. This should be a rather
rare configuration though.
How to test this? David has provided an excellent howto:
http://lkml.kernel.org/r/6e5ebc19-890c-b6dd-1924-9f25c441010d@redhat.com
Patches 1 and 3-6 are mostly cleanups. The patchset has been reviewed by
Rafael (thanks!) and the core fix tested by Rafael and Alexey (thanks to
both). David has tested as per the instructions above and hasn't found any
fallout in the memory hotplug scenarios.
This patch (of 6):
This is a preparatory patch and it doesn't introduce any functional
change. It merely moves arch_alloc_nodedata (and co) out from under
CONFIG_MEMORY_HOTPLUG because the following patch will need to call it
from the generic MM code.
Link: https://lkml.kernel.org/r/20220127085305.20890-1-mhocko@kernel.org
Link: https://lkml.kernel.org/r/20220127085305.20890-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rafael Aquini <raquini@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Alexey Makhalov <amakhalov@vmware.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/ia64/mm/discontig.c | 2
include/linux/memory_hotplug.h | 119 +++++++++++++++----------------
2 files changed, 59 insertions(+), 62 deletions(-)
--- a/arch/ia64/mm/discontig.c~mm-memory_hotplug-make-arch_alloc_nodedata-independent-on-config_memory_hotplug
+++ a/arch/ia64/mm/discontig.c
@@ -608,7 +608,6 @@ void __init paging_init(void)
zero_page_memmap_ptr = virt_to_page(ia64_imva(empty_zero_page));
}
-#ifdef CONFIG_MEMORY_HOTPLUG
pg_data_t *arch_alloc_nodedata(int nid)
{
unsigned long size = compute_pernodesize(nid);
@@ -626,7 +625,6 @@ void arch_refresh_nodedata(int update_no
pgdat_list[update_node] = update_pgdat;
scatter_node_data();
}
-#endif
#ifdef CONFIG_SPARSEMEM_VMEMMAP
int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
--- a/include/linux/memory_hotplug.h~mm-memory_hotplug-make-arch_alloc_nodedata-independent-on-config_memory_hotplug
+++ a/include/linux/memory_hotplug.h
@@ -16,6 +16,65 @@ struct memory_group;
struct resource;
struct vmem_altmap;
+#ifdef CONFIG_HAVE_ARCH_NODEDATA_EXTENSION
+/*
+ * For supporting node-hotadd, we have to allocate a new pgdat.
+ *
+ * If an arch has generic style NODE_DATA(),
+ * node_data[nid] = kzalloc() works well. But it depends on the architecture.
+ *
+ * In general, generic_alloc_nodedata() is used.
+ * Now, arch_free_nodedata() is just defined for error path of node_hot_add.
+ *
+ */
+extern pg_data_t *arch_alloc_nodedata(int nid);
+extern void arch_free_nodedata(pg_data_t *pgdat);
+extern void arch_refresh_nodedata(int nid, pg_data_t *pgdat);
+
+#else /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
+
+#define arch_alloc_nodedata(nid) generic_alloc_nodedata(nid)
+#define arch_free_nodedata(pgdat) generic_free_nodedata(pgdat)
+
+#ifdef CONFIG_NUMA
+/*
+ * XXX: node aware allocation can't work well to get new node's memory at this time.
+ * Because, pgdat for the new node is not allocated/initialized yet itself.
+ * To use new node's memory, more consideration will be necessary.
+ */
+#define generic_alloc_nodedata(nid) \
+({ \
+ kzalloc(sizeof(pg_data_t), GFP_KERNEL); \
+})
+/*
+ * This definition is just for error path in node hotadd.
+ * For node hotremove, we have to replace this.
+ */
+#define generic_free_nodedata(pgdat) kfree(pgdat)
+
+extern pg_data_t *node_data[];
+static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
+{
+ node_data[nid] = pgdat;
+}
+
+#else /* !CONFIG_NUMA */
+
+/* never called */
+static inline pg_data_t *generic_alloc_nodedata(int nid)
+{
+ BUG();
+ return NULL;
+}
+static inline void generic_free_nodedata(pg_data_t *pgdat)
+{
+}
+static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
+{
+}
+#endif /* CONFIG_NUMA */
+#endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
+
#ifdef CONFIG_MEMORY_HOTPLUG
struct page *pfn_to_online_page(unsigned long pfn);
@@ -154,66 +213,6 @@ int add_pages(int nid, unsigned long sta
struct mhp_params *params);
#endif /* ARCH_HAS_ADD_PAGES */
-#ifdef CONFIG_HAVE_ARCH_NODEDATA_EXTENSION
-/*
- * For supporting node-hotadd, we have to allocate a new pgdat.
- *
- * If an arch has generic style NODE_DATA(),
- * node_data[nid] = kzalloc() works well. But it depends on the architecture.
- *
- * In general, generic_alloc_nodedata() is used.
- * Now, arch_free_nodedata() is just defined for error path of node_hot_add.
- *
- */
-extern pg_data_t *arch_alloc_nodedata(int nid);
-extern void arch_free_nodedata(pg_data_t *pgdat);
-extern void arch_refresh_nodedata(int nid, pg_data_t *pgdat);
-
-#else /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
-
-#define arch_alloc_nodedata(nid) generic_alloc_nodedata(nid)
-#define arch_free_nodedata(pgdat) generic_free_nodedata(pgdat)
-
-#ifdef CONFIG_NUMA
-/*
- * If ARCH_HAS_NODEDATA_EXTENSION=n, this func is used to allocate pgdat.
- * XXX: kmalloc_node() can't work well to get new node's memory at this time.
- * Because, pgdat for the new node is not allocated/initialized yet itself.
- * To use new node's memory, more consideration will be necessary.
- */
-#define generic_alloc_nodedata(nid) \
-({ \
- kzalloc(sizeof(pg_data_t), GFP_KERNEL); \
-})
-/*
- * This definition is just for error path in node hotadd.
- * For node hotremove, we have to replace this.
- */
-#define generic_free_nodedata(pgdat) kfree(pgdat)
-
-extern pg_data_t *node_data[];
-static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
-{
- node_data[nid] = pgdat;
-}
-
-#else /* !CONFIG_NUMA */
-
-/* never called */
-static inline pg_data_t *generic_alloc_nodedata(int nid)
-{
- BUG();
- return NULL;
-}
-static inline void generic_free_nodedata(pg_data_t *pgdat)
-{
-}
-static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
-{
-}
-#endif /* CONFIG_NUMA */
-#endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
-
void get_online_mems(void);
void put_online_mems(void);
_
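The goal stated in the cover letter, a valid descriptor for every possible
node whether or not it is online, can be illustrated with a user-space toy.
Everything here is hypothetical (toy_pgdat, toy_node_data,
toy_init_possible_nodes, TOY_MAX_NODES); it models the idea behind the
series rather than this preparatory patch, which makes no functional change.

```c
#include <stdbool.h>
#include <stdlib.h>

#define TOY_MAX_NODES 4

/* Toy per-node descriptor standing in for pg_data_t. */
struct toy_pgdat {
	int node_id;
	bool online;
};

static struct toy_pgdat *toy_node_data[TOY_MAX_NODES];

/*
 * Allocate a zeroed descriptor for every possible node, online or not,
 * so that a lookup by node id never yields an uninitialized pointer.
 * Returns 0 on success, -1 if an allocation fails (earlier entries are
 * left in place; a fuller version would unwind them).
 */
static int toy_init_possible_nodes(const bool *online)
{
	for (int nid = 0; nid < TOY_MAX_NODES; nid++) {
		struct toy_pgdat *pgdat = calloc(1, sizeof(*pgdat));

		if (!pgdat)
			return -1;
		pgdat->node_id = nid;
		pgdat->online = online[nid];
		toy_node_data[nid] = pgdat;
	}
	return 0;
}
```

The trade-off matches the one described in the cover letter: a small fixed
allocation per possible node in exchange for dropping online checks from the
callers.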
^ permalink raw reply [flat|nested] 786+ messages in thread
-{
- BUG();
- return NULL;
-}
-static inline void generic_free_nodedata(pg_data_t *pgdat)
-{
-}
-static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
-{
-}
-#endif /* CONFIG_NUMA */
-#endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
-
void get_online_mems(void);
void put_online_mems(void);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 165/227] mm: handle uninitialized numa nodes gracefully
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: tj, rppt, richard.weiyang, raquini, osalvador, npache,
eric.dumazet, dennis, david, cl, amakhalov, mhocko, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Michal Hocko <mhocko@suse.com>
Subject: mm: handle uninitialized numa nodes gracefully
We have had several reports [1][2][3] that the page allocator blows up when an
allocation from a possible node is requested. The underlying reason is
that NODE_DATA for the specific node is not allocated.
NUMA specific initialization is arch specific and it can vary a lot. E.g.
x86 tries to initialize all nodes that have some cpu affinity (see
init_cpu_to_node) but this can be insufficient because the node might be
cpuless for example.
One way to address this problem would be to check for !node_online nodes
when trying to get a zonelist and silently fall back to another node.
Unfortunately, that adds a branch to the allocator hot path and it
doesn't handle any other potential NODE_DATA users.
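As a rough illustration of the alternative described above (not code from
any of the referenced patches), the fallback check could look like the
user-space sketch below. All model_* names and the node count are
hypothetical; the point is only that every lookup pays for the extra branch.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_NODES 4	/* hypothetical number of possible nodes */

/* Hypothetical stand-in for the kernel's online node state. */
static bool model_online[MODEL_MAX_NODES];

/*
 * Pick the node whose zonelist should be used. The !online test runs on
 * every call, which is the hot path cost the changelog objects to.
 */
static int model_zonelist_node(int preferred, int fallback)
{
	assert(preferred >= 0 && preferred < MODEL_MAX_NODES);
	assert(fallback >= 0 && fallback < MODEL_MAX_NODES);

	if (!model_online[preferred])
		return fallback;
	return preferred;
}
```

A check like this only covers the zonelist lookup; any other code that
dereferences the per-node data directly would still need its own guard.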
This patch takes a different approach (following the lead of [3]) and
preallocates a pgdat for all possible nodes in arch independent code -
free_area_init. All uninitialized nodes are treated as memoryless nodes.
The node_state of such a node is not changed because that would lead to
other side effects, e.g. a sysfs representation of the node, and from past
discussions [4] it is known that some tools might have problems digesting
that.
A newly allocated pgdat only gets minimal initialization; the rest of
the work is expected to be done by memory hotplug in hotadd_new_pgdat
(renamed to hotadd_init_pgdat).
generic_alloc_nodedata is changed to use the memblock allocator because
neither page nor slab allocators are available at the stage when all
pgdats are allocated. Hotplug doesn't allocate pgdat anymore so we can
use the early boot allocator. The only arch specific implementation is
ia64 and that is changed to use the early allocator as well.
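The scheme described above can be modelled in user space roughly as
follows. This is a simplified sketch, not the kernel code: the model_*
names, the node count and the use of calloc() in place of memblock and
per-cpu allocation are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_MAX_NODES 4	/* hypothetical number of possible nodes */

struct model_nodestat { long events; };

struct model_pgdat {
	int node_id;
	struct model_nodestat *nodestats;
};

/* Shared boot-time stats; also marks a pgdat as only minimally set up. */
static struct model_nodestat model_boot_nodestats;
static struct model_pgdat *model_node_data[MODEL_MAX_NODES];
static bool model_node_online[MODEL_MAX_NODES];

/*
 * Boot: every possible node gets a pgdat. Nodes outside online_mask are
 * treated as memoryless and stay offline. Returns how many pgdats are in
 * place afterwards.
 */
static int model_free_area_init(unsigned int online_mask)
{
	int nid, ready = 0;

	for (nid = 0; nid < MODEL_MAX_NODES; nid++) {
		struct model_pgdat *pgdat = calloc(1, sizeof(*pgdat));

		if (!pgdat)
			continue;	/* like the patch: skip this node, keep going */
		pgdat->node_id = nid;
		pgdat->nodestats = &model_boot_nodestats;
		model_node_data[nid] = pgdat;
		model_node_online[nid] = (online_mask >> nid) & 1u;
		ready++;
	}
	return ready;
}

/* Hotplug: finish a preallocated pgdat instead of allocating a new one. */
static struct model_pgdat *model_hotadd_init_pgdat(int nid)
{
	struct model_pgdat *pgdat;

	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	pgdat = model_node_data[nid];
	if (!pgdat)
		return NULL;

	if (pgdat->nodestats == &model_boot_nodestats) {
		/* First bring-up: replace the shared boot stats. */
		struct model_nodestat *stats = calloc(1, sizeof(*stats));

		if (!stats)
			return NULL;
		pgdat->nodestats = stats;
	} else {
		/* Recycled node: reuse the stats, just reset them. */
		pgdat->nodestats->events = 0;
	}
	return pgdat;
}
```

In this model a lookup of model_node_data[nid] for an offline node returns
a valid pointer after boot, which is the property the patch relies on to
keep NODE_DATA users from tripping over an unallocated pgdat.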
[1] http://lkml.kernel.org/r/20211101201312.11589-1-amakhalov@vmware.com
[2] http://lkml.kernel.org/r/20211207224013.880775-1-npache@redhat.com
[3] http://lkml.kernel.org/r/20190114082416.30939-1-mhocko@kernel.org
[4] http://lkml.kernel.org/r/20200428093836.27190-1-srikar@linux.vnet.ibm.com
[akpm@linux-foundation.org: replace comment, per Mike]
Link: https://lkml.kernel.org/r/Yfe7RBeLCijnWBON@dhcp22.suse.cz
Reported-by: Alexey Makhalov <amakhalov@vmware.com>
Tested-by: Alexey Makhalov <amakhalov@vmware.com>
Reported-by: Nico Pache <npache@redhat.com>
Acked-by: Rafael Aquini <raquini@redhat.com>
Tested-by: Rafael Aquini <raquini@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/ia64/mm/discontig.c | 4 +--
include/linux/memory_hotplug.h | 2 -
mm/internal.h | 2 +
mm/memory_hotplug.c | 21 ++++++----------
mm/page_alloc.c | 40 +++++++++++++++++++++++++++----
5 files changed, 50 insertions(+), 19 deletions(-)
--- a/arch/ia64/mm/discontig.c~mm-handle-uninitialized-numa-nodes-gracefully
+++ a/arch/ia64/mm/discontig.c
@@ -608,11 +608,11 @@ void __init paging_init(void)
zero_page_memmap_ptr = virt_to_page(ia64_imva(empty_zero_page));
}
-pg_data_t *arch_alloc_nodedata(int nid)
+pg_data_t * __init arch_alloc_nodedata(int nid)
{
unsigned long size = compute_pernodesize(nid);
- return kzalloc(size, GFP_KERNEL);
+ return memblock_alloc(size, SMP_CACHE_BYTES);
}
void arch_free_nodedata(pg_data_t *pgdat)
--- a/include/linux/memory_hotplug.h~mm-handle-uninitialized-numa-nodes-gracefully
+++ a/include/linux/memory_hotplug.h
@@ -44,7 +44,7 @@ extern void arch_refresh_nodedata(int ni
*/
#define generic_alloc_nodedata(nid) \
({ \
- kzalloc(sizeof(pg_data_t), GFP_KERNEL); \
+ memblock_alloc(sizeof(*pgdat), SMP_CACHE_BYTES); \
})
/*
* This definition is just for error path in node hotadd.
--- a/mm/internal.h~mm-handle-uninitialized-numa-nodes-gracefully
+++ a/mm/internal.h
@@ -707,4 +707,6 @@ void vunmap_range_noflush(unsigned long
int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
unsigned long addr, int page_nid, int *flags);
+DECLARE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
+
#endif /* __MM_INTERNAL_H */
--- a/mm/memory_hotplug.c~mm-handle-uninitialized-numa-nodes-gracefully
+++ a/mm/memory_hotplug.c
@@ -1162,19 +1162,21 @@ static void reset_node_present_pages(pg_
}
/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
-static pg_data_t __ref *hotadd_new_pgdat(int nid)
+static pg_data_t __ref *hotadd_init_pgdat(int nid)
{
struct pglist_data *pgdat;
pgdat = NODE_DATA(nid);
- if (!pgdat) {
- pgdat = arch_alloc_nodedata(nid);
- if (!pgdat)
- return NULL;
+ /*
+ * NODE_DATA is preallocated (free_area_init) but its internal
+ * state is not allocated completely. Add missing pieces.
+ * Completely offline nodes stay around and they just need
+ * reintialization.
+ */
+ if (pgdat->per_cpu_nodestats == &boot_nodestats) {
pgdat->per_cpu_nodestats =
alloc_percpu(struct per_cpu_nodestat);
- arch_refresh_nodedata(nid, pgdat);
} else {
int cpu;
/*
@@ -1193,8 +1195,6 @@ static pg_data_t __ref *hotadd_new_pgdat
}
}
- /* we can use NODE_DATA(nid) from here */
- pgdat->node_id = nid;
pgdat->node_start_pfn = 0;
/* init node's zones as empty zones, we don't have any present pages.*/
@@ -1246,7 +1246,7 @@ static int __try_online_node(int nid, bo
if (node_online(nid))
return 0;
- pgdat = hotadd_new_pgdat(nid);
+ pgdat = hotadd_init_pgdat(nid);
if (!pgdat) {
pr_err("Cannot online node %d due to NULL pgdat\n", nid);
ret = -ENOMEM;
@@ -1445,9 +1445,6 @@ int __ref add_memory_resource(int nid, s
return ret;
error:
- /* rollback pgdat allocation and others */
- if (new_node)
- rollback_node_hotadd(nid);
if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
memblock_remove(start, size);
error_mem_hotplug_end:
--- a/mm/page_alloc.c~mm-handle-uninitialized-numa-nodes-gracefully
+++ a/mm/page_alloc.c
@@ -6341,7 +6341,7 @@ static void per_cpu_pages_init(struct pe
#define BOOT_PAGESET_BATCH 1
static DEFINE_PER_CPU(struct per_cpu_pages, boot_pageset);
static DEFINE_PER_CPU(struct per_cpu_zonestat, boot_zonestats);
-static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
+DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
static void __build_all_zonelists(void *data)
{
@@ -6363,7 +6363,11 @@ static void __build_all_zonelists(void *
if (self && !node_online(self->node_id)) {
build_zonelists(self);
} else {
- for_each_online_node(nid) {
+ /*
+ * All possible nodes have pgdat preallocated
+ * in free_area_init
+ */
+ for_each_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);
build_zonelists(pgdat);
@@ -8063,8 +8067,36 @@ void __init free_area_init(unsigned long
/* Initialise every node */
mminit_verify_pageflags_layout();
setup_nr_node_ids();
- for_each_online_node(nid) {
- pg_data_t *pgdat = NODE_DATA(nid);
+ for_each_node(nid) {
+ pg_data_t *pgdat;
+
+ if (!node_online(nid)) {
+ pr_info("Initializing node %d as memoryless\n", nid);
+
+ /* Allocator not initialized yet */
+ pgdat = arch_alloc_nodedata(nid);
+ if (!pgdat) {
+ pr_err("Cannot allocate %zuB for node %d.\n",
+ sizeof(*pgdat), nid);
+ continue;
+ }
+ arch_refresh_nodedata(nid, pgdat);
+ free_area_init_memoryless_node(nid);
+
+ /*
+ * We do not want to confuse userspace by sysfs
+ * files/directories for node without any memory
+ * attached to it, so this node is not marked as
+ * N_MEMORY and not marked online so that no sysfs
+ * hierarchy will be created via register_one_node for
+ * it. The pgdat will get fully initialized by
+ * hotadd_init_pgdat() when memory is hotplugged into
+ * this node.
+ */
+ continue;
+ }
+
+ pgdat = NODE_DATA(nid);
free_area_init_node(nid);
/* Any memory on that node */
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 166/227] mm, memory_hotplug: drop arch_free_nodedata
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:46 UTC (permalink / raw)
To: tj, rppt, richard.weiyang, raquini, osalvador, npache,
eric.dumazet, dennis, david, cl, amakhalov, mhocko, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Michal Hocko <mhocko@suse.com>
Subject: mm, memory_hotplug: drop arch_free_nodedata
Prior to "mm: handle uninitialized numa nodes gracefully", memory hotplug
allocated the pgdat when memory was added to a node (hotadd_init_pgdat).
arch_free_nodedata was only used in the failure path, because once the
pgdat is exported (made visible through NODE_DATA(nid)) it cannot really
be freed: there is no synchronization available for that.
A pgdat is now allocated for each possible node, so memory hotplug never
needs to use arch_free_nodedata. Drop it.
This patch doesn't introduce any functional change.
Link: https://lkml.kernel.org/r/20220127085305.20890-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rafael Aquini <raquini@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Alexey Makhalov <amakhalov@vmware.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/ia64/mm/discontig.c | 5 -----
include/linux/memory_hotplug.h | 3 ---
mm/memory_hotplug.c | 10 ----------
3 files changed, 18 deletions(-)
--- a/arch/ia64/mm/discontig.c~mm-memory_hotplug-drop-arch_free_nodedata
+++ a/arch/ia64/mm/discontig.c
@@ -615,11 +615,6 @@ pg_data_t * __init arch_alloc_nodedata(i
return memblock_alloc(size, SMP_CACHE_BYTES);
}
-void arch_free_nodedata(pg_data_t *pgdat)
-{
- kfree(pgdat);
-}
-
void arch_refresh_nodedata(int update_node, pg_data_t *update_pgdat)
{
pgdat_list[update_node] = update_pgdat;
--- a/include/linux/memory_hotplug.h~mm-memory_hotplug-drop-arch_free_nodedata
+++ a/include/linux/memory_hotplug.h
@@ -24,17 +24,14 @@ struct vmem_altmap;
* node_data[nid] = kzalloc() works well. But it depends on the architecture.
*
* In general, generic_alloc_nodedata() is used.
- * Now, arch_free_nodedata() is just defined for error path of node_hot_add.
*
*/
extern pg_data_t *arch_alloc_nodedata(int nid);
-extern void arch_free_nodedata(pg_data_t *pgdat);
extern void arch_refresh_nodedata(int nid, pg_data_t *pgdat);
#else /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
#define arch_alloc_nodedata(nid) generic_alloc_nodedata(nid)
-#define arch_free_nodedata(pgdat) generic_free_nodedata(pgdat)
#ifdef CONFIG_NUMA
/*
--- a/mm/memory_hotplug.c~mm-memory_hotplug-drop-arch_free_nodedata
+++ a/mm/memory_hotplug.c
@@ -1217,16 +1217,6 @@ static pg_data_t __ref *hotadd_init_pgda
return pgdat;
}
-static void rollback_node_hotadd(int nid)
-{
- pg_data_t *pgdat = NODE_DATA(nid);
-
- arch_refresh_nodedata(nid, NULL);
- free_percpu(pgdat->per_cpu_nodestats);
- arch_free_nodedata(pgdat);
-}
-
-
/*
* __try_online_node - online a node if offlined
* @nid: the node ID
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 167/227] mm, memory_hotplug: reorganize new pgdat initialization
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: tj, rppt, richard.weiyang, raquini, osalvador, npache,
eric.dumazet, dennis, david, cl, amakhalov, mhocko, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Michal Hocko <mhocko@suse.com>
Subject: mm, memory_hotplug: reorganize new pgdat initialization
When a !node_online node is brought up it needs hotplug specific
initialization because the node could either be not yet initialized or
have been recycled after a previous hotremove. hotadd_init_pgdat is
responsible for that.
Internal pgdat state is currently initialized in two places:
- hotadd_init_pgdat
- free_area_init_core_hotplug
There is no clear cut rule for what should go where, but this patch moves
the whole internal state initialization into free_area_init_core_hotplug.
hotadd_init_pgdat is still responsible for pulling all the parts
together, most notably for initializing zonelists because those depend on
the overall topology.
This patch doesn't introduce any functional change.
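The consolidated initialization can be pictured with the user-space sketch
below. It is a model under stated assumptions, not the kernel function: the
model_* names, the fixed CPU count, calloc() standing in for the per-cpu
allocator and the int return value are all hypothetical. It shows the shape
of the change: the stats are allocated only when the pgdat still points at
the boot-time placeholder, and the reset then runs unconditionally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NR_CPUS 2		/* hypothetical number of online CPUs */

struct model_nodestat { long events; };

struct model_pgdat {
	int node_id;
	int nr_zones;
	int kswapd_order;
	unsigned long node_start_pfn;
	struct model_nodestat *nodestats;	/* MODEL_NR_CPUS entries */
};

/* Boot-time placeholder shared by all minimally initialized pgdats. */
static struct model_nodestat model_boot_nodestats[MODEL_NR_CPUS];

/*
 * Model of the single hotplug init path. Returns 0 on success and -1 if
 * the stats allocation fails (the return value is an addition of this
 * sketch).
 */
static int model_init_core_hotplug(struct model_pgdat *pgdat)
{
	int cpu;

	assert(pgdat);

	/* Never initialized before: swap the placeholder for real stats. */
	if (pgdat->nodestats == model_boot_nodestats) {
		struct model_nodestat *stats;

		stats = calloc(MODEL_NR_CPUS, sizeof(*stats));
		if (!stats)
			return -1;
		pgdat->nodestats = stats;
	}

	/* Reset state that may be stale when a node is being reused. */
	pgdat->nr_zones = 0;
	pgdat->kswapd_order = 0;
	pgdat->node_start_pfn = 0;
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		memset(&pgdat->nodestats[cpu], 0, sizeof(pgdat->nodestats[cpu]));

	return 0;
}
```

Running the reset on freshly allocated stats is redundant but harmless,
since they start out zeroed, so one code path can serve both a node that
was never initialized and one that is being reused after hotremove.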
Link: https://lkml.kernel.org/r/20220127085305.20890-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rafael Aquini <raquini@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Alexey Makhalov <amakhalov@vmware.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/memory_hotplug.h | 2 +-
mm/memory_hotplug.c | 28 +++-------------------------
mm/page_alloc.c | 25 +++++++++++++++++++++++--
3 files changed, 27 insertions(+), 28 deletions(-)
--- a/include/linux/memory_hotplug.h~mm-memory_hotplug-reorganize-new-pgdat-initialization
+++ a/include/linux/memory_hotplug.h
@@ -319,7 +319,7 @@ extern void set_zone_contiguous(struct z
extern void clear_zone_contiguous(struct zone *zone);
#ifdef CONFIG_MEMORY_HOTPLUG
-extern void __ref free_area_init_core_hotplug(int nid);
+extern void __ref free_area_init_core_hotplug(struct pglist_data *pgdat);
extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
extern int add_memory_resource(int nid, struct resource *resource,
--- a/mm/memory_hotplug.c~mm-memory_hotplug-reorganize-new-pgdat-initialization
+++ a/mm/memory_hotplug.c
@@ -1166,39 +1166,16 @@ static pg_data_t __ref *hotadd_init_pgda
{
struct pglist_data *pgdat;
- pgdat = NODE_DATA(nid);
-
/*
* NODE_DATA is preallocated (free_area_init) but its internal
* state is not allocated completely. Add missing pieces.
* Completely offline nodes stay around and they just need
* reintialization.
*/
- if (pgdat->per_cpu_nodestats == &boot_nodestats) {
- pgdat->per_cpu_nodestats =
- alloc_percpu(struct per_cpu_nodestat);
- } else {
- int cpu;
- /*
- * Reset the nr_zones, order and highest_zoneidx before reuse.
- * Note that kswapd will init kswapd_highest_zoneidx properly
- * when it starts in the near future.
- */
- pgdat->nr_zones = 0;
- pgdat->kswapd_order = 0;
- pgdat->kswapd_highest_zoneidx = 0;
- for_each_online_cpu(cpu) {
- struct per_cpu_nodestat *p;
-
- p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu);
- memset(p, 0, sizeof(*p));
- }
- }
-
- pgdat->node_start_pfn = 0;
+ pgdat = NODE_DATA(nid);
/* init node's zones as empty zones, we don't have any present pages.*/
- free_area_init_core_hotplug(nid);
+ free_area_init_core_hotplug(pgdat);
/*
* The node we allocated has no zone fallback lists. For avoiding
@@ -1210,6 +1187,7 @@ static pg_data_t __ref *hotadd_init_pgda
* When memory is hot-added, all the memory is in offline state. So
* clear all zones' present_pages because they will be updated in
* online_pages() and offline_pages().
+ * TODO: should be in free_area_init_core_hotplug?
*/
reset_node_managed_pages(pgdat);
reset_node_present_pages(pgdat);
--- a/mm/page_alloc.c~mm-memory_hotplug-reorganize-new-pgdat-initialization
+++ a/mm/page_alloc.c
@@ -7466,12 +7466,33 @@ static void __meminit zone_init_internal
* NOTE: this function is only called during memory hotplug
*/
#ifdef CONFIG_MEMORY_HOTPLUG
-void __ref free_area_init_core_hotplug(int nid)
+void __ref free_area_init_core_hotplug(struct pglist_data *pgdat)
{
+ int nid = pgdat->node_id;
enum zone_type z;
- pg_data_t *pgdat = NODE_DATA(nid);
+ int cpu;
pgdat_init_internals(pgdat);
+
+ if (pgdat->per_cpu_nodestats == &boot_nodestats)
+ pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat);
+
+ /*
+ * Reset the nr_zones, order and highest_zoneidx before reuse.
+ * Note that kswapd will init kswapd_highest_zoneidx properly
+ * when it starts in the near future.
+ */
+ pgdat->nr_zones = 0;
+ pgdat->kswapd_order = 0;
+ pgdat->kswapd_highest_zoneidx = 0;
+ pgdat->node_start_pfn = 0;
+ for_each_online_cpu(cpu) {
+ struct per_cpu_nodestat *p;
+
+ p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu);
+ memset(p, 0, sizeof(*p));
+ }
+
for (z = 0; z < MAX_NR_ZONES; z++)
zone_init_internals(&pgdat->node_zones[z], z, nid, 0);
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 168/227] mm: make free_area_init_node aware of memory less nodes
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: tj, rppt, richard.weiyang, raquini, osalvador, npache,
eric.dumazet, dennis, david, cl, amakhalov, mhocko, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Michal Hocko <mhocko@suse.com>
Subject: mm: make free_area_init_node aware of memory less nodes
free_area_init_node is also called from the memoryless node initialization
path (free_area_init_memoryless_node). It doesn't make much sense to
display the physical memory range for those nodes:
	Initmem setup node XX [mem 0x0000000000000000-0x0000000000000000]
Instead, be explicit that the node is memoryless:
	Initmem setup node XX as memoryless
Link: https://lkml.kernel.org/r/20220127085305.20890-6-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rafael Aquini <raquini@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Alexey Makhalov <amakhalov@vmware.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
--- a/mm/page_alloc.c~mm-make-free_area_init_node-aware-of-memory-less-nodes
+++ a/mm/page_alloc.c
@@ -7642,9 +7642,14 @@ static void __init free_area_init_node(i
pgdat->node_start_pfn = start_pfn;
pgdat->per_cpu_nodestats = NULL;
- pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
- (u64)start_pfn << PAGE_SHIFT,
- end_pfn ? ((u64)end_pfn << PAGE_SHIFT) - 1 : 0);
+ if (start_pfn != end_pfn) {
+ pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
+ (u64)start_pfn << PAGE_SHIFT,
+ end_pfn ? ((u64)end_pfn << PAGE_SHIFT) - 1 : 0);
+ } else {
+ pr_info("Initmem setup node %d as memoryless\n", nid);
+ }
+
calculate_node_totalpages(pgdat, start_pfn, end_pfn);
alloc_node_mem_map(pgdat);
_
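The message selection in the hunk above can be tried outside the kernel. The following is a minimal sketch, assuming 4 KiB pages; `format_initmem()` and `MODEL_PAGE_SHIFT` are hypothetical names. It uses `0x%016` rather than the kernel's `%#018Lx`, because the C library's `#` flag omits the `0x` prefix for a zero value.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Format the boot message for a node into buf. A node whose start and
 * end PFNs are equal spans no memory and is reported as memoryless.
 * Returns the value of snprintf().
 */
static int format_initmem(char *buf, size_t len, int nid,
			  unsigned long start_pfn, unsigned long end_pfn)
{
	assert(buf != NULL);

	if (start_pfn != end_pfn) {
		uint64_t start = (uint64_t)start_pfn << MODEL_PAGE_SHIFT;
		/* The printed range is inclusive, hence the -1. */
		uint64_t end = end_pfn ?
			((uint64_t)end_pfn << MODEL_PAGE_SHIFT) - 1 : 0;

		return snprintf(buf, len,
				"Initmem setup node %d [mem 0x%016" PRIx64
				"-0x%016" PRIx64 "]", nid, start, end);
	}
	return snprintf(buf, len, "Initmem setup node %d as memoryless", nid);
}
```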
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 169/227] memcg: do not tweak node in alloc_mem_cgroup_per_node_info
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: tj, rppt, raquini, osalvador, npache, mhocko, eric.dumazet,
dennis, david, cl, amakhalov, richard.weiyang, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: Wei Yang <richard.weiyang@gmail.com>
Subject: memcg: do not tweak node in alloc_mem_cgroup_per_node_info
alloc_mem_cgroup_per_node_info is called for each possible node and
this used to be a problem because !node_online nodes didn't have the
appropriate data structures allocated. This was changed by "mm: handle
uninitialized numa nodes gracefully", so we can drop the special casing
here.
Link: https://lkml.kernel.org/r/20220127085305.20890-7-mhocko@kernel.org
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Alexey Makhalov <amakhalov@vmware.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Rafael Aquini <raquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memcontrol.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)
--- a/mm/memcontrol.c~memcg-do-not-tweak-node-in-alloc_mem_cgroup_per_node_info
+++ a/mm/memcontrol.c
@@ -5020,18 +5020,8 @@ struct mem_cgroup *mem_cgroup_from_id(un
static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
{
struct mem_cgroup_per_node *pn;
- int tmp = node;
- /*
- * This routine is called against possible nodes.
- * But it's BUG to call kmalloc() against offline node.
- *
- * TODO: this routine can waste much memory for nodes which will
- * never be onlined. It's better to use memory hotplug callback
- * function.
- */
- if (!node_state(node, N_NORMAL_MEMORY))
- tmp = -1;
- pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
+
+ pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, node);
if (!pn)
return 1;
_
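The difference between the removed and the new node selection can be shown with a small model. This is a sketch only: the bitmap `model_normal_memory_mask` and the `alloc_node_hint_*()` helpers are hypothetical, and `MODEL_NO_NODE` mirrors the `-1` that the removed code passed to `kzalloc_node()`.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NO_NODE	(-1)	/* "no node preference", as in the removed code */
#define MODEL_MAX_NODES	8

/* Hypothetical state: bit n set means node n currently has normal memory. */
static unsigned int model_normal_memory_mask = 0x1;

static bool model_node_has_memory(int node)
{
	assert(node >= 0 && node < MODEL_MAX_NODES);
	return (model_normal_memory_mask & (1u << node)) != 0;
}

/* Before the patch: nodes without normal memory lose their node hint. */
static int alloc_node_hint_old(int node)
{
	return model_node_has_memory(node) ? node : MODEL_NO_NODE;
}

/* After the patch: the requested node is passed through unchanged. */
static int alloc_node_hint_new(int node)
{
	assert(node >= 0 && node < MODEL_MAX_NODES);
	return node;
}
```

The two helpers differ only for nodes without normal memory, which is the case the commit message says no longer needs special handling.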
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 170/227] drivers/base/memory: add memory block to memory group after registration succeeded
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: rafael, osalvador, mhocko, gregkh, david, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: David Hildenbrand <david@redhat.com>
Subject: drivers/base/memory: add memory block to memory group after registration succeeded
If register_memory() fails, we free the memory block even though it has
already been added to the group list, so the list is left pointing at
freed memory. Let's defer adding the block to the memory group until
after the memory block device has been registered.
We do handle it properly during unregister_memory(), but that's not
called when the registration fails.
Link: https://lkml.kernel.org/r/20220128144540.153902-1-david@redhat.com
Fixes: 028fc57a1c36 ("drivers/base/memory: introduce "memory groups" to logically group memory blocks")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
drivers/base/memory.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/base/memory.c~drivers-base-memory-add-memory-block-to-memory-group-after-registration-succeeded
+++ a/drivers/base/memory.c
@@ -665,14 +665,16 @@ static int init_memory_block(unsigned lo
mem->nr_vmemmap_pages = nr_vmemmap_pages;
INIT_LIST_HEAD(&mem->group_next);
+ ret = register_memory(mem);
+ if (ret)
+ return ret;
+
if (group) {
mem->group = group;
list_add(&mem->group_next, &group->memory_blocks);
}
- ret = register_memory(mem);
-
- return ret;
+ return 0;
}
static int add_memory_block(unsigned long base_section_nr)
_
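The ordering fix can be illustrated with a user-space model. This is a sketch under assumptions: `block_model`, `group_model`, `register_block_model()` and `init_block_model()` are hypothetical names, a singly linked list stands in for the kernel's `list_head`, and the registration stand-in frees the block on failure as the commit message describes for `register_memory()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct group_model;

struct block_model {
	struct block_model *next;	/* link in the group's block list */
	struct group_model *group;
};

struct group_model {
	struct block_model *blocks;
	int nr_blocks;
};

/* Stand-in for register_memory(): a failure frees the block. */
static int register_block_model(struct block_model *blk, bool fail)
{
	if (fail) {
		free(blk);
		return -1;
	}
	return 0;
}

/* Returns the new block, or NULL if allocation or registration failed. */
static struct block_model *init_block_model(struct group_model *group,
					    bool fail_register)
{
	struct block_model *blk = calloc(1, sizeof(*blk));

	if (!blk)
		return NULL;

	/* Register first: on failure the group has never seen the block. */
	if (register_block_model(blk, fail_register))
		return NULL;

	if (group) {
		blk->group = group;
		blk->next = group->blocks;
		group->blocks = blk;
		group->nr_blocks++;
	}
	return blk;
}
```

With the old ordering the list insertion would run before `register_block_model()`, so a failed registration would leave `group->blocks` referring to a freed block.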
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 171/227] drivers/base/node: consolidate node device subsystem initialization in node_dev_init()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: ysato, will, tsbogend, tglx, rppt, rafael, paul.walmsley, paulus,
palmer, osalvador, mpe, mingo, mhocko, matorola, hca, gregkh,
gor, davem, dave.hansen, dalias, catalin.marinas, bp, benh, aou,
david, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: David Hildenbrand <david@redhat.com>
Subject: drivers/base/node: consolidate node device subsystem initialization in node_dev_init()
... and call node_dev_init() after memory_dev_init() from driver_init(),
so before any of the existing arch/subsys calls. All online nodes should
be known at that point: early during boot, arch code determines node and
zone ranges and sets the relevant nodes online; usually this happens in
setup_arch().
This is in line with memory_dev_init(), which initializes the memory
device subsystem and creates all memory block devices.
Similar to memory_dev_init(), panic() if anything goes wrong: we don't
want to continue after such basic initialization errors.
The important part is that node_dev_init() gets called after
memory_dev_init() and after cpu_dev_init(), but before any of the relevant
archs call register_cpu() to register the new cpu device under the node
device. The latter should be the case for the current users of
topology_init().
Link: https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Anatoly Pugachev <matorola@gmail.com> (sparc64)
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/arm64/kernel/setup.c | 3 ---
arch/ia64/kernel/topology.c | 10 ----------
arch/mips/kernel/topology.c | 5 -----
arch/powerpc/kernel/sysfs.c | 17 -----------------
arch/riscv/kernel/setup.c | 3 ---
arch/s390/kernel/numa.c | 7 -------
arch/sh/kernel/topology.c | 5 -----
arch/sparc/kernel/sysfs.c | 12 ------------
arch/x86/kernel/topology.c | 5 -----
drivers/base/init.c | 1 +
drivers/base/node.c | 30 +++++++++++++++++-------------
include/linux/node.h | 4 ++++
12 files changed, 22 insertions(+), 80 deletions(-)
--- a/arch/arm64/kernel/setup.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/arm64/kernel/setup.c
@@ -406,9 +406,6 @@ static int __init topology_init(void)
{
int i;
- for_each_online_node(i)
- register_one_node(i);
-
for_each_possible_cpu(i) {
struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
cpu->hotpluggable = cpu_can_disable(i);
--- a/arch/ia64/kernel/topology.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/ia64/kernel/topology.c
@@ -70,16 +70,6 @@ static int __init topology_init(void)
{
int i, err = 0;
-#ifdef CONFIG_NUMA
- /*
- * MCD - Do we want to register all ONLINE nodes, or all POSSIBLE nodes?
- */
- for_each_online_node(i) {
- if ((err = register_one_node(i)))
- goto out;
- }
-#endif
-
sysfs_cpus = kcalloc(NR_CPUS, sizeof(struct ia64_cpu), GFP_KERNEL);
if (!sysfs_cpus)
panic("kzalloc in topology_init failed - NR_CPUS too big?");
--- a/arch/mips/kernel/topology.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/mips/kernel/topology.c
@@ -12,11 +12,6 @@ static int __init topology_init(void)
{
int i, ret;
-#ifdef CONFIG_NUMA
- for_each_online_node(i)
- register_one_node(i);
-#endif /* CONFIG_NUMA */
-
for_each_present_cpu(i) {
struct cpu *c = &per_cpu(cpu_devices, i);
--- a/arch/powerpc/kernel/sysfs.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/powerpc/kernel/sysfs.c
@@ -1110,14 +1110,6 @@ EXPORT_SYMBOL_GPL(cpu_remove_dev_attr_gr
/* NUMA stuff */
#ifdef CONFIG_NUMA
-static void __init register_nodes(void)
-{
- int i;
-
- for (i = 0; i < MAX_NUMNODES; i++)
- register_one_node(i);
-}
-
int sysfs_add_device_to_node(struct device *dev, int nid)
{
struct node *node = node_devices[nid];
@@ -1132,13 +1124,6 @@ void sysfs_remove_device_from_node(struc
sysfs_remove_link(&node->dev.kobj, kobject_name(&dev->kobj));
}
EXPORT_SYMBOL_GPL(sysfs_remove_device_from_node);
-
-#else
-static void __init register_nodes(void)
-{
- return;
-}
-
#endif
/* Only valid if CPU is present. */
@@ -1155,8 +1140,6 @@ static int __init topology_init(void)
{
int cpu, r;
- register_nodes();
-
for_each_possible_cpu(cpu) {
struct cpu *c = &per_cpu(cpu_devices, cpu);
--- a/arch/riscv/kernel/setup.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/riscv/kernel/setup.c
@@ -301,9 +301,6 @@ static int __init topology_init(void)
{
int i, ret;
- for_each_online_node(i)
- register_one_node(i);
-
for_each_possible_cpu(i) {
struct cpu *cpu = &per_cpu(cpu_devices, i);
--- a/arch/s390/kernel/numa.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/s390/kernel/numa.c
@@ -33,10 +33,3 @@ void __init numa_setup(void)
NODE_DATA(0)->node_spanned_pages = memblock_end_of_DRAM() >> PAGE_SHIFT;
NODE_DATA(0)->node_id = 0;
}
-
-static int __init numa_init_late(void)
-{
- register_one_node(0);
- return 0;
-}
-arch_initcall(numa_init_late);
--- a/arch/sh/kernel/topology.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/sh/kernel/topology.c
@@ -46,11 +46,6 @@ static int __init topology_init(void)
{
int i, ret;
-#ifdef CONFIG_NUMA
- for_each_online_node(i)
- register_one_node(i);
-#endif
-
for_each_present_cpu(i) {
struct cpu *c = &per_cpu(cpu_devices, i);
--- a/arch/sparc/kernel/sysfs.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/sparc/kernel/sysfs.c
@@ -244,22 +244,10 @@ static void __init check_mmu_stats(void)
mmu_stats_supported = 1;
}
-static void register_nodes(void)
-{
-#ifdef CONFIG_NUMA
- int i;
-
- for (i = 0; i < MAX_NUMNODES; i++)
- register_one_node(i);
-#endif
-}
-
static int __init topology_init(void)
{
int cpu, ret;
- register_nodes();
-
check_mmu_stats();
for_each_possible_cpu(cpu) {
--- a/arch/x86/kernel/topology.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/x86/kernel/topology.c
@@ -154,11 +154,6 @@ static int __init topology_init(void)
{
int i;
-#ifdef CONFIG_NUMA
- for_each_online_node(i)
- register_one_node(i);
-#endif
-
for_each_present_cpu(i)
arch_register_cpu(i);
--- a/drivers/base/init.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/drivers/base/init.c
@@ -35,5 +35,6 @@ void __init driver_init(void)
auxiliary_bus_init();
cpu_dev_init();
memory_dev_init();
+ node_dev_init();
container_dev_init();
}
--- a/drivers/base/node.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/drivers/base/node.c
@@ -1065,26 +1065,30 @@ static const struct attribute_group *cpu
};
#define NODE_CALLBACK_PRI 2 /* lower than SLAB */
-static int __init register_node_type(void)
+void __init node_dev_init(void)
{
- int ret;
+ static struct notifier_block node_memory_callback_nb = {
+ .notifier_call = node_memory_callback,
+ .priority = NODE_CALLBACK_PRI,
+ };
+ int ret, i;
BUILD_BUG_ON(ARRAY_SIZE(node_state_attr) != NR_NODE_STATES);
BUILD_BUG_ON(ARRAY_SIZE(node_state_attrs)-1 != NR_NODE_STATES);
ret = subsys_system_register(&node_subsys, cpu_root_attr_groups);
- if (!ret) {
- static struct notifier_block node_memory_callback_nb = {
- .notifier_call = node_memory_callback,
- .priority = NODE_CALLBACK_PRI,
- };
- register_hotmemory_notifier(&node_memory_callback_nb);
- }
+ if (ret)
+ panic("%s() failed to register subsystem: %d\n", __func__, ret);
+
+ register_hotmemory_notifier(&node_memory_callback_nb);
/*
- * Note: we're not going to unregister the node class if we fail
- * to register the node state class attribute files.
+ * Create all node devices, which will properly link the node
+ * to applicable memory block devices and already created cpu devices.
*/
- return ret;
+ for_each_online_node(i) {
+ ret = register_one_node(i);
+ if (ret)
+ panic("%s() failed to add node: %d\n", __func__, ret);
+ }
}
-postcore_initcall(register_node_type);
--- a/include/linux/node.h~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/include/linux/node.h
@@ -112,6 +112,7 @@ static inline void link_mem_sections(int
extern void unregister_node(struct node *node);
#ifdef CONFIG_NUMA
+extern void node_dev_init(void);
/* Core of the node registration - only memory hotplug should use this */
extern int __register_one_node(int nid);
@@ -149,6 +150,9 @@ extern void register_hugetlbfs_with_node
node_registration_func_t unregister);
#endif
#else
+static inline void node_dev_init(void)
+{
+}
static inline int __register_one_node(int nid)
{
return 0;
_
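The consolidated registration loop can be modelled outside the kernel. This is a sketch only, not the driver core code: `model_online_mask`, `model_registered`, `model_register_one_node()` and `model_node_dev_init()` are hypothetical, and `abort()` stands in for `panic()`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MODEL_MAX_NODES 8

/* Hypothetical boot state: nodes 0 and 2 were set online by arch code. */
static unsigned int model_online_mask = 0x5;
static int model_registered[MODEL_MAX_NODES];

/* Stand-in for register_one_node(); returns 0 on success. */
static int model_register_one_node(int nid)
{
	if (nid < 0 || nid >= MODEL_MAX_NODES)
		return -1;
	model_registered[nid] = 1;
	return 0;
}

/*
 * Models node_dev_init(): register every online node in one place and
 * refuse to continue on failure. Returns the number of nodes registered.
 */
static int model_node_dev_init(void)
{
	int nid, ret, count = 0;

	for (nid = 0; nid < MODEL_MAX_NODES; nid++) {
		if (!(model_online_mask & (1u << nid)))
			continue;	/* offline nodes are skipped */
		ret = model_register_one_node(nid);
		if (ret) {
			fprintf(stderr, "%s() failed to add node: %d\n",
				__func__, ret);
			abort();	/* models panic() */
		}
		count++;
	}
	return count;
}
```

Because the loop lives in one function called from a single place, per-architecture copies like the ones removed in the hunks above are no longer needed in this model either.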
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 171/227] drivers/base/node: consolidate node device subsystem initialization in node_dev_init()
@ 2022-03-22 21:47 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: ysato, will, tsbogend, tglx, rppt, rafael, paul.walmsley, paulus,
palmer, osalvador, mpe, mingo, mhocko, matorola, hca, gregkh,
gor, davem, dave.hansen, dalias, catalin.marinas, bp, benh, aou,
david, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: David Hildenbrand <david@redhat.com>
Subject: drivers/base/node: consolidate node device subsystem initialization in node_dev_init()
... and call node_dev_init() after memory_dev_init() from driver_init(),
so before any of the existing arch/subsys calls. All online nodes should
be known at that point: early during boot, arch code determines node and
zone ranges and sets the relevant nodes online; usually this happens in
setup_arch().
This is in line with memory_dev_init(), which initializes the memory
device subsystem and creates all memory block devices.
Similar to memory_dev_init(), panic() if anything goes wrong, we don't
want to continue with such basic initialization errors.
The important part is that node_dev_init() gets called after
memory_dev_init() and after cpu_dev_init(), but before any of the relevant
archs call register_cpu() to register the new cpu device under the node
device. The latter should be the case for the current users of
topology_init().
Link: https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Anatoly Pugachev <matorola@gmail.com> (sparc64)
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/arm64/kernel/setup.c | 3 ---
arch/ia64/kernel/topology.c | 10 ----------
arch/mips/kernel/topology.c | 5 -----
arch/powerpc/kernel/sysfs.c | 17 -----------------
arch/riscv/kernel/setup.c | 3 ---
arch/s390/kernel/numa.c | 7 -------
arch/sh/kernel/topology.c | 5 -----
arch/sparc/kernel/sysfs.c | 12 ------------
arch/x86/kernel/topology.c | 5 -----
drivers/base/init.c | 1 +
drivers/base/node.c | 30 +++++++++++++++++-------------
include/linux/node.h | 4 ++++
12 files changed, 22 insertions(+), 80 deletions(-)
--- a/arch/arm64/kernel/setup.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/arm64/kernel/setup.c
@@ -406,9 +406,6 @@ static int __init topology_init(void)
{
int i;
- for_each_online_node(i)
- register_one_node(i);
-
for_each_possible_cpu(i) {
struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
cpu->hotpluggable = cpu_can_disable(i);
--- a/arch/ia64/kernel/topology.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/ia64/kernel/topology.c
@@ -70,16 +70,6 @@ static int __init topology_init(void)
{
int i, err = 0;
-#ifdef CONFIG_NUMA
- /*
- * MCD - Do we want to register all ONLINE nodes, or all POSSIBLE nodes?
- */
- for_each_online_node(i) {
- if ((err = register_one_node(i)))
- goto out;
- }
-#endif
-
sysfs_cpus = kcalloc(NR_CPUS, sizeof(struct ia64_cpu), GFP_KERNEL);
if (!sysfs_cpus)
panic("kzalloc in topology_init failed - NR_CPUS too big?");
--- a/arch/mips/kernel/topology.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/mips/kernel/topology.c
@@ -12,11 +12,6 @@ static int __init topology_init(void)
{
int i, ret;
-#ifdef CONFIG_NUMA
- for_each_online_node(i)
- register_one_node(i);
-#endif /* CONFIG_NUMA */
-
for_each_present_cpu(i) {
struct cpu *c = &per_cpu(cpu_devices, i);
--- a/arch/powerpc/kernel/sysfs.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/powerpc/kernel/sysfs.c
@@ -1110,14 +1110,6 @@ EXPORT_SYMBOL_GPL(cpu_remove_dev_attr_gr
/* NUMA stuff */
#ifdef CONFIG_NUMA
-static void __init register_nodes(void)
-{
- int i;
-
- for (i = 0; i < MAX_NUMNODES; i++)
- register_one_node(i);
-}
-
int sysfs_add_device_to_node(struct device *dev, int nid)
{
struct node *node = node_devices[nid];
@@ -1132,13 +1124,6 @@ void sysfs_remove_device_from_node(struc
sysfs_remove_link(&node->dev.kobj, kobject_name(&dev->kobj));
}
EXPORT_SYMBOL_GPL(sysfs_remove_device_from_node);
-
-#else
-static void __init register_nodes(void)
-{
- return;
-}
-
#endif
/* Only valid if CPU is present. */
@@ -1155,8 +1140,6 @@ static int __init topology_init(void)
{
int cpu, r;
- register_nodes();
-
for_each_possible_cpu(cpu) {
struct cpu *c = &per_cpu(cpu_devices, cpu);
--- a/arch/riscv/kernel/setup.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/riscv/kernel/setup.c
@@ -301,9 +301,6 @@ static int __init topology_init(void)
{
int i, ret;
- for_each_online_node(i)
- register_one_node(i);
-
for_each_possible_cpu(i) {
struct cpu *cpu = &per_cpu(cpu_devices, i);
--- a/arch/s390/kernel/numa.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/s390/kernel/numa.c
@@ -33,10 +33,3 @@ void __init numa_setup(void)
NODE_DATA(0)->node_spanned_pages = memblock_end_of_DRAM() >> PAGE_SHIFT;
NODE_DATA(0)->node_id = 0;
}
-
-static int __init numa_init_late(void)
-{
- register_one_node(0);
- return 0;
-}
-arch_initcall(numa_init_late);
--- a/arch/sh/kernel/topology.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/sh/kernel/topology.c
@@ -46,11 +46,6 @@ static int __init topology_init(void)
{
int i, ret;
-#ifdef CONFIG_NUMA
- for_each_online_node(i)
- register_one_node(i);
-#endif
-
for_each_present_cpu(i) {
struct cpu *c = &per_cpu(cpu_devices, i);
--- a/arch/sparc/kernel/sysfs.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/sparc/kernel/sysfs.c
@@ -244,22 +244,10 @@ static void __init check_mmu_stats(void)
mmu_stats_supported = 1;
}
-static void register_nodes(void)
-{
-#ifdef CONFIG_NUMA
- int i;
-
- for (i = 0; i < MAX_NUMNODES; i++)
- register_one_node(i);
-#endif
-}
-
static int __init topology_init(void)
{
int cpu, ret;
- register_nodes();
-
check_mmu_stats();
for_each_possible_cpu(cpu) {
--- a/arch/x86/kernel/topology.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/arch/x86/kernel/topology.c
@@ -154,11 +154,6 @@ static int __init topology_init(void)
{
int i;
-#ifdef CONFIG_NUMA
- for_each_online_node(i)
- register_one_node(i);
-#endif
-
for_each_present_cpu(i)
arch_register_cpu(i);
--- a/drivers/base/init.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/drivers/base/init.c
@@ -35,5 +35,6 @@ void __init driver_init(void)
auxiliary_bus_init();
cpu_dev_init();
memory_dev_init();
+ node_dev_init();
container_dev_init();
}
--- a/drivers/base/node.c~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/drivers/base/node.c
@@ -1065,26 +1065,30 @@ static const struct attribute_group *cpu
};
#define NODE_CALLBACK_PRI 2 /* lower than SLAB */
-static int __init register_node_type(void)
+void __init node_dev_init(void)
{
- int ret;
+ static struct notifier_block node_memory_callback_nb = {
+ .notifier_call = node_memory_callback,
+ .priority = NODE_CALLBACK_PRI,
+ };
+ int ret, i;
BUILD_BUG_ON(ARRAY_SIZE(node_state_attr) != NR_NODE_STATES);
BUILD_BUG_ON(ARRAY_SIZE(node_state_attrs)-1 != NR_NODE_STATES);
ret = subsys_system_register(&node_subsys, cpu_root_attr_groups);
- if (!ret) {
- static struct notifier_block node_memory_callback_nb = {
- .notifier_call = node_memory_callback,
- .priority = NODE_CALLBACK_PRI,
- };
- register_hotmemory_notifier(&node_memory_callback_nb);
- }
+ if (ret)
+ panic("%s() failed to register subsystem: %d\n", __func__, ret);
+
+ register_hotmemory_notifier(&node_memory_callback_nb);
/*
- * Note: we're not going to unregister the node class if we fail
- * to register the node state class attribute files.
+ * Create all node devices, which will properly link the node
+ * to applicable memory block devices and already created cpu devices.
*/
- return ret;
+ for_each_online_node(i) {
+ ret = register_one_node(i);
+ if (ret)
+ panic("%s() failed to add node: %d\n", __func__, ret);
+ }
}
-postcore_initcall(register_node_type);
--- a/include/linux/node.h~drivers-base-node-consolidate-node-device-subsystem-initialization-in-node_dev_init
+++ a/include/linux/node.h
@@ -112,6 +112,7 @@ static inline void link_mem_sections(int
extern void unregister_node(struct node *node);
#ifdef CONFIG_NUMA
+extern void node_dev_init(void);
/* Core of the node registration - only memory hotplug should use this */
extern int __register_one_node(int nid);
@@ -149,6 +150,9 @@ extern void register_hugetlbfs_with_node
node_registration_func_t unregister);
#endif
#else
+static inline void node_dev_init(void)
+{
+}
static inline int __register_one_node(int nid)
{
return 0;
_
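
A minimal user-space sketch of the consolidation in the patch above, assuming nothing beyond what the diff shows: a single loop registers every online node, and a failure is reported once in one place. All names here (`model_node_online`, `model_register_one_node()`, `model_node_dev_init()`, `MODEL_MAX_NODES`) are hypothetical stand-ins, not kernel symbols, and the model returns an error where the real node_dev_init() panics.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_NODES 8   /* hypothetical; stands in for MAX_NUMNODES */

/* Stand-ins for the online node mask and for node_devices[]. */
static bool model_node_online[MODEL_MAX_NODES];
static bool model_node_registered[MODEL_MAX_NODES];

/* Stand-in for register_one_node(): returns 0 on success. */
static int model_register_one_node(int nid)
{
    assert(nid >= 0 && nid < MODEL_MAX_NODES);
    model_node_registered[nid] = true;
    return 0;
}

/*
 * Model of the single, central registration loop. Only online nodes are
 * registered. Returns the number of nodes registered, or -1 on the first
 * failure (the real function panics instead of returning).
 */
static int model_node_dev_init(void)
{
    int nid, count = 0;

    for (nid = 0; nid < MODEL_MAX_NODES; nid++) {
        if (!model_node_online[nid])
            continue;
        if (model_register_one_node(nid))
            return -1;
        count++;
    }
    return count;
}
```

With nodes 0 and 2 marked online, the model registers exactly those two and leaves node 1 untouched, which is the behaviour the per-architecture loops used to implement separately.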
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 172/227] mm/memory_hotplug: remove obsolete comment of __add_pages
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: david, linmiaohe, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory_hotplug: remove obsolete comment of __add_pages
Patch series "A few cleanup patches around memory_hotplug".
This series contains a few patches to fix obsolete and misplaced comments,
clean up the try_offline_node function and so on.
This patch (of 4):
Since commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded
memory to zones until online"), there is no need to pass in the zone.
[akpm@linux-foundation.org: remove the comment altogether, per David]
Link: https://lkml.kernel.org/r/20220207133643.23427-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220207133643.23427-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory_hotplug.c | 6 ------
1 file changed, 6 deletions(-)
--- a/mm/memory_hotplug.c~mm-memory_hotplug-remove-obsolete-comment-of-__add_pages
+++ a/mm/memory_hotplug.c
@@ -295,12 +295,6 @@ struct page *pfn_to_online_page(unsigned
}
EXPORT_SYMBOL_GPL(pfn_to_online_page);
-/*
- * Reasonably generic function for adding memory. It is
- * expected that archs that support memory hotplug will
- * call this function after deciding the zone to which to
- * add the new pages.
- */
int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
struct mhp_params *params)
{
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 173/227] mm/memory_hotplug: avoid calling zone_intersects() for ZONE_NORMAL
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: osalvador, david, linmiaohe, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory_hotplug: avoid calling zone_intersects() for ZONE_NORMAL
If zid reaches ZONE_NORMAL, the caller will always get the NORMAL zone no
matter what zone_intersects() returns. So we can save a few CPU cycles by
not calling zone_intersects() for ZONE_NORMAL.
Link: https://lkml.kernel.org/r/20220207133643.23427-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory_hotplug.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memory_hotplug.c~mm-memory_hotplug-avoid-calling-zone_intersects-for-zone_normal
+++ a/mm/memory_hotplug.c
@@ -823,7 +823,7 @@ static struct zone *default_kernel_zone_
struct pglist_data *pgdat = NODE_DATA(nid);
int zid;
- for (zid = 0; zid <= ZONE_NORMAL; zid++) {
+ for (zid = 0; zid < ZONE_NORMAL; zid++) {
struct zone *zone = &pgdat->node_zones[zid];
if (zone_intersects(zone, start_pfn, nr_pages))
_
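
The equivalence claimed in the changelog above can be checked with a small user-space model. This is a sketch under stated assumptions: the `model_*` names and the zone layout are hypothetical, the intersection test mirrors what zone_intersects() is understood to do (empty zones never intersect), and the fall-through to the NORMAL zone after the loop is assumed from the changelog's statement that the caller always gets the NORMAL zone.

```c
#include <assert.h>
#include <stdbool.h>

enum model_zone { MODEL_DMA, MODEL_DMA32, MODEL_NORMAL, MODEL_NR_ZONES };

struct model_zone_range {
    unsigned long start_pfn;
    unsigned long spanned_pages;
};

/* Does [start_pfn, start_pfn + nr_pages) overlap the zone's span? */
static bool model_zone_intersects(const struct model_zone_range *z,
                                  unsigned long start_pfn,
                                  unsigned long nr_pages)
{
    if (z->spanned_pages == 0)
        return false;
    if (start_pfn >= z->start_pfn + z->spanned_pages)
        return false;
    if (start_pfn + nr_pages <= z->start_pfn)
        return false;
    return true;
}

/*
 * Model of the zone selection loop. With inclusive == true the loop also
 * tests MODEL_NORMAL (the old "<=" bound); with false it stops before it
 * (the new "<" bound). Either way the fall-through result is MODEL_NORMAL.
 */
static int model_default_kernel_zone(const struct model_zone_range *zones,
                                     unsigned long start_pfn,
                                     unsigned long nr_pages,
                                     bool inclusive)
{
    int last = inclusive ? MODEL_NORMAL : MODEL_NORMAL - 1;
    int zid;

    assert(nr_pages > 0);
    for (zid = 0; zid <= last; zid++) {
        if (model_zone_intersects(&zones[zid], start_pfn, nr_pages))
            return zid;
    }
    return MODEL_NORMAL;
}
```

Both loop bounds return the same zone for every input, because the only thing the extra iteration could return is the same NORMAL zone the fall-through returns anyway.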
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 174/227] mm/memory_hotplug: clean up try_offline_node
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: osalvador, david, linmiaohe, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory_hotplug: clean up try_offline_node
We can use the helper macro node_spanned_pages() to check whether a node
spans pages. And we can change the parameter of check_cpu_on_node() to nid,
as that's all it really cares about. Thus we can get rid of the local
variable pgdat and improve the readability a bit.
Link: https://lkml.kernel.org/r/20220207133643.23427-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory_hotplug.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
--- a/mm/memory_hotplug.c~mm-memory_hotplug-clean-up-try_offline_node
+++ a/mm/memory_hotplug.c
@@ -2005,12 +2005,12 @@ static int get_nr_vmemmap_pages_cb(struc
return mem->nr_vmemmap_pages;
}
-static int check_cpu_on_node(pg_data_t *pgdat)
+static int check_cpu_on_node(int nid)
{
int cpu;
for_each_present_cpu(cpu) {
- if (cpu_to_node(cpu) == pgdat->node_id)
+ if (cpu_to_node(cpu) == nid)
/*
* the cpu on this node isn't removed, and we can't
* offline this node.
@@ -2044,7 +2044,6 @@ static int check_no_memblock_for_node_cb
*/
void try_offline_node(int nid)
{
- pg_data_t *pgdat = NODE_DATA(nid);
int rc;
/*
@@ -2052,7 +2051,7 @@ void try_offline_node(int nid)
* offline it. A node spans memory after move_pfn_range_to_zone(),
* e.g., after the memory block was onlined.
*/
- if (pgdat->node_spanned_pages)
+ if (node_spanned_pages(nid))
return;
/*
@@ -2064,7 +2063,7 @@ void try_offline_node(int nid)
if (rc)
return;
- if (check_cpu_on_node(pgdat))
+ if (check_cpu_on_node(nid))
return;
/*
_
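
A minimal sketch of why a plain node id is enough for the CPU check in the patch above. The tables `model_cpu_present` and `model_cpu_node` and the function `model_check_cpu_on_node()` are hypothetical user-space stand-ins for the present-CPU mask and cpu_to_node(); returning -EBUSY is an assumption about the part of the function the diff does not show.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4   /* hypothetical CPU count for the model */

/* Stand-ins for the present-CPU mask and for cpu_to_node(). */
static bool model_cpu_present[MODEL_NR_CPUS];
static int model_cpu_node[MODEL_NR_CPUS];

/*
 * Returns -EBUSY if any present CPU still belongs to node nid, 0 otherwise.
 * Only the node id is compared, so no per-node data structure is needed.
 */
static int model_check_cpu_on_node(int nid)
{
    int cpu;

    assert(nid >= 0);
    for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (model_cpu_present[cpu] && model_cpu_node[cpu] == nid)
            return -EBUSY;
    }
    return 0;
}
```

In this model a node with one present CPU is reported busy, while a node whose CPUs are all absent is not.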
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 175/227] mm/memory_hotplug: fix misplaced comment in offline_pages
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: osalvador, david, linmiaohe, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/memory_hotplug: fix misplaced comment in offline_pages
The comment has been misplaced since commit 7960509329c2 ("mm,
memory_hotplug: print reason for the offlining failure"). Move it to the
right place.
Link: https://lkml.kernel.org/r/20220207133643.23427-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory_hotplug.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memory_hotplug.c~mm-memory_hotplug-fix-misplaced-comment-in-offline_pages
+++ a/mm/memory_hotplug.c
@@ -1963,6 +1963,7 @@ int __ref offline_pages(unsigned long st
return 0;
failed_removal_isolated:
+ /* pushback to free area */
undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
memory_notify(MEM_CANCEL_OFFLINE, &arg);
failed_removal_pcplists_disabled:
@@ -1973,7 +1974,6 @@ failed_removal:
(unsigned long long) start_pfn << PAGE_SHIFT,
((unsigned long long) end_pfn << PAGE_SHIFT) - 1,
reason);
- /* pushback to free area */
mem_hotplug_done();
return ret;
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 176/227] drivers/base/node: rename link_mem_sections() to register_memory_block_under_node()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: rparrazo, rafael, osalvador, mhocko, gregkh, david, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: David Hildenbrand <david@redhat.com>
Subject: drivers/base/node: rename link_mem_sections() to register_memory_block_under_node()
Patch series "drivers/base/memory: determine and store zone for single-zone memory blocks", v2.
I remember talking to Michal in the past about removing
test_pages_in_a_zone(), which we use for:
* verifying that a memory block we intend to offline is really only managed
by a single zone. We don't support offlining of memory blocks that are
managed by multiple zones (e.g., multiple nodes, DMA and DMA32)
* exposing that zone to user space via
/sys/devices/system/memory/memory*/valid_zones
Now that I identified some more cases where test_pages_in_a_zone() might
go wrong, and we received a UBSAN report (see patch #2), let's get rid of
this PFN walker.
So instead of detecting the zone at runtime with test_pages_in_a_zone() by
scanning the memmap, let's determine and remember for each memory block if
it's managed by a single zone. The stored zone can then be used for the
above two cases, avoiding a manual lookup using test_pages_in_a_zone().
This avoids eventually stumbling over uninitialized memmaps in corner
cases, especially when ZONE_DEVICE ranges partly fall into memory blocks
(that are responsible for managing System RAM).
Handling memory onlining is easy, because we online to exactly one zone.
Handling boot memory is more tricky, because we want to avoid scanning all
zones of all nodes to detect possible zones that overlap with the physical
memory region of interest. Fortunately, we already have code that
determines the applicable nodes for a memory block, to create sysfs links
-- we'll hook into that.
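
The idea of remembering a single zone per memory block can be sketched in user space as follows. This is only an illustration of the bookkeeping described above, not the patch's implementation: `struct model_memory_block`, the `MODEL_ZONE_*` sentinels and the `model_block_*()` helpers are all hypothetical names.

```c
#include <assert.h>

#define MODEL_ZONE_NONE  (-1)   /* no zone recorded yet */
#define MODEL_ZONE_MULTI (-2)   /* block is covered by more than one zone */

struct model_memory_block {
    int zone_id;                /* single zone, or one of the sentinels */
};

static void model_block_init(struct model_memory_block *blk)
{
    blk->zone_id = MODEL_ZONE_NONE;
}

/*
 * Record that some part of the block belongs to zone_id. The first zone is
 * remembered; a second, different zone demotes the block to "multi-zone".
 */
static void model_block_note_zone(struct model_memory_block *blk, int zone_id)
{
    assert(zone_id >= 0);
    if (blk->zone_id == MODEL_ZONE_NONE)
        blk->zone_id = zone_id;
    else if (blk->zone_id != zone_id)
        blk->zone_id = MODEL_ZONE_MULTI;
}

/* Offlining is only allowed for blocks managed by exactly one zone. */
static int model_block_can_offline(const struct model_memory_block *blk)
{
    return blk->zone_id >= 0;
}
```

The stored value answers both questions from the list above (may this block be offlined, and which zone should be exposed) without walking any PFNs at the time of the query.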
Patch #1 is a simple cleanup I had lying around for a while.
Patch #2 contains the main logic to remove test_pages_in_a_zone() and
further details.
[1] https://lkml.kernel.org/r/20220128144540.153902-1-david@redhat.com
[2] https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com
This patch (of 2):
Let's adjust the stale terminology, making it match
unregister_memory_block_under_nodes() and
do_register_memory_block_under_node(). We're dealing with memory block
devices, which span 1..X memory sections.
Link: https://lkml.kernel.org/r/20220210184359.235565-1-david@redhat.com
Link: https://lkml.kernel.org/r/20220210184359.235565-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rafael Parra <rparrazo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
drivers/base/node.c | 5 +++--
include/linux/node.h | 16 ++++++++--------
mm/memory_hotplug.c | 6 +++---
3 files changed, 14 insertions(+), 13 deletions(-)
--- a/drivers/base/node.c~drivers-base-node-rename-link_mem_sections-to-register_memory_block_under_node
+++ a/drivers/base/node.c
@@ -892,8 +892,9 @@ void unregister_memory_block_under_nodes
kobject_name(&node_devices[mem_blk->nid]->dev.kobj));
}
-void link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn,
- enum meminit_context context)
+void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
+ unsigned long end_pfn,
+ enum meminit_context context)
{
walk_memory_blocks_func_t func;
--- a/include/linux/node.h~drivers-base-node-rename-link_mem_sections-to-register_memory_block_under_node
+++ a/include/linux/node.h
@@ -99,13 +99,13 @@ extern struct node *node_devices[];
typedef void (*node_registration_func_t)(struct node *);
#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA)
-void link_mem_sections(int nid, unsigned long start_pfn,
- unsigned long end_pfn,
- enum meminit_context context);
+void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
+ unsigned long end_pfn,
+ enum meminit_context context);
#else
-static inline void link_mem_sections(int nid, unsigned long start_pfn,
- unsigned long end_pfn,
- enum meminit_context context)
+static inline void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
+ unsigned long end_pfn,
+ enum meminit_context context)
{
}
#endif
@@ -129,8 +129,8 @@ static inline int register_one_node(int
error = __register_one_node(nid);
if (error)
return error;
- /* link memory sections under this node */
- link_mem_sections(nid, start_pfn, end_pfn, MEMINIT_EARLY);
+ register_memory_blocks_under_node(nid, start_pfn, end_pfn,
+ MEMINIT_EARLY);
}
return error;
--- a/mm/memory_hotplug.c~drivers-base-node-rename-link_mem_sections-to-register_memory_block_under_node
+++ a/mm/memory_hotplug.c
@@ -1383,9 +1383,9 @@ int __ref add_memory_resource(int nid, s
BUG_ON(ret);
}
- /* link memory sections under this node.*/
- link_mem_sections(nid, PFN_DOWN(start), PFN_UP(start + size - 1),
- MEMINIT_HOTPLUG);
+ register_memory_blocks_under_node(nid, PFN_DOWN(start),
+ PFN_UP(start + size - 1),
+ MEMINIT_HOTPLUG);
/* create new memmap entry */
if (!strcmp(res->name, "System RAM"))
_
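
The PFN range passed by the hotplug caller in the patch above, PFN_DOWN(start) to PFN_UP(start + size - 1), can be worked through in a small model. This is a sketch assuming 4 KiB pages (`MODEL_PAGE_SHIFT` of 12); the `model_*` names are hypothetical, and the two helpers follow the usual round-down and round-up definitions of PFN_DOWN() and PFN_UP().

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12                          /* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE  (UINT64_C(1) << MODEL_PAGE_SHIFT)

/* Round a physical address down to its page frame number. */
static uint64_t model_pfn_down(uint64_t addr)
{
    return addr >> MODEL_PAGE_SHIFT;
}

/* Round a physical address up to the next page frame number. */
static uint64_t model_pfn_up(uint64_t addr)
{
    return (addr + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
}

struct model_pfn_range {
    uint64_t start_pfn;
    uint64_t end_pfn;
};

/* PFN range covering the physical range [start, start + size). */
static struct model_pfn_range model_hotplug_pfn_range(uint64_t start,
                                                      uint64_t size)
{
    struct model_pfn_range r;

    assert(size > 0);
    r.start_pfn = model_pfn_down(start);
    r.end_pfn = model_pfn_up(start + size - 1);
    return r;
}
```

For a page-aligned 128 MiB region starting at 4 GiB, the model yields start_pfn 0x100000 and end_pfn 0x108000, so the end value is one past the last PFN of the region and the range spans 32768 pages.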
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 177/227] drivers/base/memory: determine and store zone for single-zone memory blocks
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: rparrazo, rafael, osalvador, mhocko, gregkh, david, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: David Hildenbrand <david@redhat.com>
Subject: drivers/base/memory: determine and store zone for single-zone memory blocks
test_pages_in_a_zone() is just another nasty PFN walker that can easily
stumble over ZONE_DEVICE memory ranges falling into the same memory block
as ordinary system RAM: the memmap of parts of these ranges might be
uninitialized. In fact, we observed the following UBSAN report (on an older kernel):
[ 7691.855626] UBSAN: Undefined behaviour in ./include/linux/mm.h:1133:50
[ 7691.862155] index 7 is out of range for type 'zone [5]'
[ 7691.867393] CPU: 121 PID: 35603 Comm: read_all Kdump: loaded Tainted: [...]
[ 7691.879990] Hardware name: Dell Inc. PowerEdge R7425/08V001, BIOS 1.12.2 11/15/2019
[ 7691.887643] Call Trace:
[ 7691.890107] dump_stack+0x9a/0xf0
[ 7691.893438] ubsan_epilogue+0x9/0x7a
[ 7691.897025] __ubsan_handle_out_of_bounds+0x13a/0x181
[ 7691.902086] ? __ubsan_handle_shift_out_of_bounds+0x289/0x289
[ 7691.907841] ? sched_clock_cpu+0x18/0x1e0
[ 7691.911867] ? __lock_acquire+0x610/0x38d0
[ 7691.915979] test_pages_in_a_zone+0x3c4/0x500
[ 7691.920357] show_valid_zones+0x1fa/0x380
[ 7691.924375] ? print_allowed_zone+0x80/0x80
[ 7691.928571] ? __lock_is_held+0xb4/0x140
[ 7691.932509] ? __lock_is_held+0xb4/0x140
[ 7691.936447] ? dev_attr_store+0x70/0x70
[ 7691.940296] dev_attr_show+0x43/0xb0
[ 7691.943884] ? memset+0x1f/0x40
[ 7691.947042] sysfs_kf_seq_show+0x1c5/0x440
[ 7691.951153] seq_read+0x49d/0x1190
[ 7691.954574] ? seq_escape+0x1f0/0x1f0
[ 7691.958249] ? fsnotify_first_mark+0x150/0x150
[ 7691.962713] vfs_read+0xff/0x300
[ 7691.965952] ksys_read+0xb8/0x170
[ 7691.969279] ? kernel_write+0x130/0x130
[ 7691.973126] ? entry_SYSCALL_64_after_hwframe+0x7a/0xdf
[ 7691.978365] ? do_syscall_64+0x22/0x4b0
[ 7691.982212] do_syscall_64+0xa5/0x4b0
[ 7691.985887] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[ 7691.990947] RIP: 0033:0x7f01f4439b52
We seem to stumble over a memmap that contains a garbage zone id. While
we could try inserting pfn_to_online_page() calls, that would just make
memory offlining slower, because we use test_pages_in_a_zone() to make
sure we're offlining pages that all belong to the same zone.
Let's just get rid of this PFN walker and determine the single zone of a
memory block -- if any -- for early memory blocks during boot. For memory
onlining, we know the single zone already. Let's avoid any additional
memmap scanning and just rely on the zone information available during
boot.
For memory hot(un)plug, we only really care about memory blocks that:
* span a single zone (and, thereby, a single node)
* are completely System RAM (IOW, no holes, no ZONE_DEVICE)
If one of these conditions is not met, we reject memory offlining.
Hotplugged memory blocks (starting out offline) always meet both
conditions.
There are three scenarios to handle:
(1) Memory hot(un)plug
A memory block with zone == NULL cannot be offlined, corresponding to
our previous test_pages_in_a_zone() check.
After successful memory onlining/offlining, we simply set the zone
accordingly.
* Memory onlining: set the zone we just used for onlining
* Memory offlining: set zone = NULL
So a hotplugged memory block starts with zone = NULL. Once memory
onlining is done, we set the proper zone.
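As a rough illustration of scenario (1), here is a userspace sketch of the
zone bookkeeping. It is not the kernel code: struct zone_model, struct
block_model, block_online() and block_offline() are hypothetical stand-ins
for struct zone, struct memory_block and the memory_block_online()/
memory_block_offline() paths, and the "already online" check is an
assumption of the model only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct zone and struct memory_block. */
struct zone_model {
	const char *name;
};

struct block_model {
	int online;			/* 1 while the block is online */
	const struct zone_model *zone;	/* single zone, or NULL */
};

/* Onlining records the zone that was just used for onlining. */
static int block_online(struct block_model *blk, const struct zone_model *zone)
{
	assert(zone != NULL);
	if (blk->online)
		return -EINVAL;		/* model-only guard */
	blk->online = 1;
	blk->zone = zone;
	return 0;
}

/* Offlining is rejected without a single zone; on success the zone is reset. */
static int block_offline(struct block_model *blk)
{
	if (!blk->online || !blk->zone)
		return -EINVAL;
	blk->online = 0;
	blk->zone = NULL;
	return 0;
}
```

In this model, an early memory block that is online but has zone == NULL
(because it spans multiple zones) can never be offlined, which matches the
behaviour the old test_pages_in_a_zone() check provided.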
(2) Boot memory with !CONFIG_NUMA
We know that there is just a single pgdat, so we simply scan all zones
of that pgdat for an intersection with our memory block PFN range when
adding the memory block. If more than one zone intersects (e.g., DMA and
DMA32 on x86 for the first memory block) we set zone = NULL and
consequently mimic what test_pages_in_a_zone() used to do.
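The scan described in (2) can be sketched in userspace roughly as follows.
This is only a model under stated assumptions: struct zone_model and both
helpers are hypothetical, a zone counts as populated when present_pages is
non-zero, and the PFN layout used in any caller is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a zone: a PFN span plus a count of present pages. */
struct zone_model {
	const char *name;
	unsigned long start_pfn;
	unsigned long spanned_pages;
	unsigned long present_pages;
};

/* True if [start_pfn, start_pfn + nr_pages) overlaps the zone's span. */
static bool model_zone_intersects(const struct zone_model *z,
				  unsigned long start_pfn,
				  unsigned long nr_pages)
{
	if (z->spanned_pages == 0)
		return false;
	if (start_pfn >= z->start_pfn + z->spanned_pages)
		return false;
	if (start_pfn + nr_pages <= z->start_pfn)
		return false;
	return true;
}

/* Return the only populated zone intersecting the block, else NULL. */
static const struct zone_model *
single_zone_for_block(const struct zone_model *zones, size_t nr_zones,
		      unsigned long start_pfn, unsigned long nr_pages)
{
	const struct zone_model *match = NULL;
	size_t i;

	assert(nr_pages > 0);
	for (i = 0; i < nr_zones; i++) {
		const struct zone_model *z = &zones[i];

		if (z->present_pages == 0)
			continue;	/* unpopulated zones never match */
		if (!model_zone_intersects(z, start_pfn, nr_pages))
			continue;
		if (match)
			return NULL;	/* second hit: spans multiple zones */
		match = z;
	}
	return match;
}
```

With a DMA zone covering PFNs [1, 4096) and a DMA32 zone starting at PFN
4096, a first memory block of 32768 pages starting at PFN 0 intersects
both zones, so the model returns NULL for it, just like the first memory
block on x86 mentioned above.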
(3) Boot memory with CONFIG_NUMA
At the point in time we create the memory block devices during boot, we
don't know yet which nodes *actually* span a memory block. While we could
scan all zones of all nodes for intersections, overlapping nodes complicate
the situation and scanning all nodes is possibly expensive. But that
problem has already been solved by the code that sets the node of a memory
block and creates the link in sysfs --
do_register_memory_block_under_node().
So, we hook into the code that sets the node id for a memory block. If
we already have a different node id set for the memory block, we know
that multiple nodes *actually* have PFNs falling into our memory block:
we set zone = NULL and consequently mimic what test_pages_in_a_zone() used
to do. If there is no node id set, we do the same as (2) for the given
node.
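The node-id hook in (3) can be modelled in userspace along these lines. It
is a sketch for the early (boot) context only, not the kernel function:
MODEL_NO_NODE stands in for NUMA_NO_NODE, the struct and function names are
hypothetical, and node_zone is assumed to be the result of the per-node
scan from (2), i.e. NULL if several zones of that node intersect the block.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NO_NODE	(-1)	/* stand-in for NUMA_NO_NODE */

/* Hypothetical stand-ins for struct zone and struct memory_block. */
struct zone_model {
	const char *name;
};

struct block_model {
	int nid;
	const struct zone_model *zone;
};

/*
 * Early-context model: the first node id set determines the zone; a
 * second, different node id invalidates it. The last node id wins.
 */
static void model_add_nid(struct block_model *blk, int nid,
			  const struct zone_model *node_zone)
{
	assert(nid >= 0);
	if (blk->nid != nid) {
		if (blk->nid == MODEL_NO_NODE)
			blk->zone = node_zone;
		else
			blk->zone = NULL;
	}
	blk->nid = nid;
}
```

Calling the model twice with the same node id leaves the detected zone
untouched; only a different node id resets it to NULL, which then blocks
offlining of that memory block.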
Note that the call order in driver_init() is:
-> memory_dev_init(): create memory block devices
-> node_dev_init(): link memory block devices to the node and set the
node id
So in summary, we detect whether a single zone is responsible for a
memory block; if so, we store that zone in the memory block and update
it during memory onlining/offlining.
Link: https://lkml.kernel.org/r/20220210184359.235565-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Rafael Parra <rparrazo@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rafael Parra <rparrazo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
drivers/base/memory.c | 101 +++++++++++++++++++++++++++++--
drivers/base/node.c | 13 +--
include/linux/memory.h | 12 +++
include/linux/memory_hotplug.h | 6 -
mm/memory_hotplug.c | 50 +++------------
5 files changed, 125 insertions(+), 57 deletions(-)
--- a/drivers/base/memory.c~drivers-base-memory-determine-and-store-zone-for-single-zone-memory-blocks
+++ a/drivers/base/memory.c
@@ -215,6 +215,7 @@ static int memory_block_online(struct me
adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
nr_vmemmap_pages);
+ mem->zone = zone;
return ret;
}
@@ -225,6 +226,9 @@ static int memory_block_offline(struct m
unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
int ret;
+ if (!mem->zone)
+ return -EINVAL;
+
/*
* Unaccount before offlining, such that unpopulated zone and kthreads
* can properly be torn down in offline_pages().
@@ -234,7 +238,7 @@ static int memory_block_offline(struct m
-nr_vmemmap_pages);
ret = offline_pages(start_pfn + nr_vmemmap_pages,
- nr_pages - nr_vmemmap_pages, mem->group);
+ nr_pages - nr_vmemmap_pages, mem->zone, mem->group);
if (ret) {
/* offline_pages() failed. Account back. */
if (nr_vmemmap_pages)
@@ -246,6 +250,7 @@ static int memory_block_offline(struct m
if (nr_vmemmap_pages)
mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
+ mem->zone = NULL;
return ret;
}
@@ -411,11 +416,10 @@ static ssize_t valid_zones_show(struct d
*/
if (mem->state == MEM_ONLINE) {
/*
- * The block contains more than one zone can not be offlined.
- * This can happen e.g. for ZONE_DMA and ZONE_DMA32
+ * If !mem->zone, the memory block spans multiple zones and
+ * cannot get offlined.
*/
- default_zone = test_pages_in_a_zone(start_pfn,
- start_pfn + nr_pages);
+ default_zone = mem->zone;
if (!default_zone)
return sysfs_emit(buf, "%s\n", "none");
len += sysfs_emit_at(buf, len, "%s", default_zone->name);
@@ -643,6 +647,82 @@ int register_memory(struct memory_block
return ret;
}
+static struct zone *early_node_zone_for_memory_block(struct memory_block *mem,
+ int nid)
+{
+ const unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
+ const unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
+ struct zone *zone, *matching_zone = NULL;
+ pg_data_t *pgdat = NODE_DATA(nid);
+ int i;
+
+ /*
+ * This logic only works for early memory, when the applicable zones
+ * already span the memory block. We don't expect overlapping zones on
+ * a single node for early memory. So if we're told that some PFNs
+ * of a node fall into this memory block, we can assume that all node
+ * zones that intersect with the memory block are actually applicable.
+ * No need to look at the memmap.
+ */
+ for (i = 0; i < MAX_NR_ZONES; i++) {
+ zone = pgdat->node_zones + i;
+ if (!populated_zone(zone))
+ continue;
+ if (!zone_intersects(zone, start_pfn, nr_pages))
+ continue;
+ if (!matching_zone) {
+ matching_zone = zone;
+ continue;
+ }
+ /* Spans multiple zones ... */
+ matching_zone = NULL;
+ break;
+ }
+ return matching_zone;
+}
+
+#ifdef CONFIG_NUMA
+/**
+ * memory_block_add_nid() - Indicate that system RAM falling into this memory
+ * block device (partially) belongs to the given node.
+ * @mem: The memory block device.
+ * @nid: The node id.
+ * @context: The memory initialization context.
+ *
+ * Indicate that system RAM falling into this memory block (partially) belongs
+ * to the given node. If the context indicates ("early") that we are adding the
+ * node during node device subsystem initialization, this will also properly
+ * set/adjust mem->zone based on the zone ranges of the given node.
+ */
+void memory_block_add_nid(struct memory_block *mem, int nid,
+ enum meminit_context context)
+{
+ if (context == MEMINIT_EARLY && mem->nid != nid) {
+ /*
+ * For early memory we have to determine the zone when setting
+ * the node id and handle multiple nodes spanning a single
+ * memory block by indicating via zone == NULL that we're not
+ * dealing with a single zone. So if we're setting the node id
+ * the first time, determine if there is a single zone. If we're
+ * setting the node id a second time to a different node,
+ * invalidate the single detected zone.
+ */
+ if (mem->nid == NUMA_NO_NODE)
+ mem->zone = early_node_zone_for_memory_block(mem, nid);
+ else
+ mem->zone = NULL;
+ }
+
+ /*
+ * If this memory block spans multiple nodes, we only indicate
+ * the last processed node. If we span multiple nodes (not applicable
+ * to hotplugged memory), zone == NULL will prohibit memory offlining
+ * and consequently unplug.
+ */
+ mem->nid = nid;
+}
+#endif
+
static int init_memory_block(unsigned long block_id, unsigned long state,
unsigned long nr_vmemmap_pages,
struct memory_group *group)
@@ -665,6 +745,17 @@ static int init_memory_block(unsigned lo
mem->nr_vmemmap_pages = nr_vmemmap_pages;
INIT_LIST_HEAD(&mem->group_next);
+#ifndef CONFIG_NUMA
+ if (state == MEM_ONLINE)
+ /*
+ * MEM_ONLINE at this point implies early memory. With NUMA,
+ * we'll determine the zone when setting the node id via
+ * memory_block_add_nid(). Memory hotplug updates the zone
+ * manually when memory onlining/offlining succeeds.
+ */
+ mem->zone = early_node_zone_for_memory_block(mem, NUMA_NO_NODE);
+#endif /* CONFIG_NUMA */
+
ret = register_memory(mem);
if (ret)
return ret;
--- a/drivers/base/node.c~drivers-base-memory-determine-and-store-zone-for-single-zone-memory-blocks
+++ a/drivers/base/node.c
@@ -796,15 +796,12 @@ static int __ref get_nid_for_pfn(unsigne
}
static void do_register_memory_block_under_node(int nid,
- struct memory_block *mem_blk)
+ struct memory_block *mem_blk,
+ enum meminit_context context)
{
int ret;
- /*
- * If this memory block spans multiple nodes, we only indicate
- * the last processed node.
- */
- mem_blk->nid = nid;
+ memory_block_add_nid(mem_blk, nid, context);
ret = sysfs_create_link_nowarn(&node_devices[nid]->dev.kobj,
&mem_blk->dev.kobj,
@@ -857,7 +854,7 @@ static int register_mem_block_under_node
if (page_nid != nid)
continue;
- do_register_memory_block_under_node(nid, mem_blk);
+ do_register_memory_block_under_node(nid, mem_blk, MEMINIT_EARLY);
return 0;
}
/* mem section does not span the specified node */
@@ -873,7 +870,7 @@ static int register_mem_block_under_node
{
int nid = *(int *)arg;
- do_register_memory_block_under_node(nid, mem_blk);
+ do_register_memory_block_under_node(nid, mem_blk, MEMINIT_HOTPLUG);
return 0;
}
--- a/include/linux/memory.h~drivers-base-memory-determine-and-store-zone-for-single-zone-memory-blocks
+++ a/include/linux/memory.h
@@ -70,6 +70,13 @@ struct memory_block {
unsigned long state; /* serialized by the dev->lock */
int online_type; /* for passing data to online routine */
int nid; /* NID for this memory block */
+ /*
+ * The single zone of this memory block if all PFNs of this memory block
+ * that are System RAM (not a memory hole, not ZONE_DEVICE ranges) are
+ * managed by a single zone. NULL if multiple zones (including nodes)
+ * apply.
+ */
+ struct zone *zone;
struct device dev;
/*
* Number of vmemmap pages. These pages
@@ -161,6 +168,11 @@ int walk_dynamic_memory_groups(int nid,
})
#define register_hotmemory_notifier(nb) register_memory_notifier(nb)
#define unregister_hotmemory_notifier(nb) unregister_memory_notifier(nb)
+
+#ifdef CONFIG_NUMA
+void memory_block_add_nid(struct memory_block *mem, int nid,
+ enum meminit_context context);
+#endif /* CONFIG_NUMA */
#endif /* CONFIG_MEMORY_HOTPLUG */
/*
--- a/include/linux/memory_hotplug.h~drivers-base-memory-determine-and-store-zone-for-single-zone-memory-blocks
+++ a/include/linux/memory_hotplug.h
@@ -163,8 +163,6 @@ extern int mhp_init_memmap_on_memory(uns
extern void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages);
extern int online_pages(unsigned long pfn, unsigned long nr_pages,
struct zone *zone, struct memory_group *group);
-extern struct zone *test_pages_in_a_zone(unsigned long start_pfn,
- unsigned long end_pfn);
extern void __offline_isolated_pages(unsigned long start_pfn,
unsigned long end_pfn);
@@ -293,7 +291,7 @@ static inline void pgdat_resize_init(str
extern void try_offline_node(int nid);
extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
- struct memory_group *group);
+ struct zone *zone, struct memory_group *group);
extern int remove_memory(u64 start, u64 size);
extern void __remove_memory(u64 start, u64 size);
extern int offline_and_remove_memory(u64 start, u64 size);
@@ -302,7 +300,7 @@ extern int offline_and_remove_memory(u64
static inline void try_offline_node(int nid) {}
static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
- struct memory_group *group)
+ struct zone *zone, struct memory_group *group)
{
return -EINVAL;
}
--- a/mm/memory_hotplug.c~drivers-base-memory-determine-and-store-zone-for-single-zone-memory-blocks
+++ a/mm/memory_hotplug.c
@@ -1549,38 +1549,6 @@ bool mhp_range_allowed(u64 start, u64 si
#ifdef CONFIG_MEMORY_HOTREMOVE
/*
- * Confirm all pages in a range [start, end) belong to the same zone (skipping
- * memory holes). When true, return the zone.
- */
-struct zone *test_pages_in_a_zone(unsigned long start_pfn,
- unsigned long end_pfn)
-{
- unsigned long pfn, sec_end_pfn;
- struct zone *zone = NULL;
- struct page *page;
-
- for (pfn = start_pfn, sec_end_pfn = SECTION_ALIGN_UP(start_pfn + 1);
- pfn < end_pfn;
- pfn = sec_end_pfn, sec_end_pfn += PAGES_PER_SECTION) {
- /* Make sure the memory section is present first */
- if (!present_section_nr(pfn_to_section_nr(pfn)))
- continue;
- for (; pfn < sec_end_pfn && pfn < end_pfn;
- pfn += MAX_ORDER_NR_PAGES) {
- /* Check if we got outside of the zone */
- if (zone && !zone_spans_pfn(zone, pfn))
- return NULL;
- page = pfn_to_page(pfn);
- if (zone && page_zone(page) != zone)
- return NULL;
- zone = page_zone(page);
- }
- }
-
- return zone;
-}
-
-/*
* Scan pfn range [start,end) to find movable/migratable pages (LRU pages,
* non-lru movable pages and hugepages). Will skip over most unmovable
* pages (esp., pages that can be skipped when offlining), but bail out on
@@ -1803,15 +1771,15 @@ static int count_system_ram_pages_cb(uns
}
int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
- struct memory_group *group)
+ struct zone *zone, struct memory_group *group)
{
const unsigned long end_pfn = start_pfn + nr_pages;
unsigned long pfn, system_ram_pages = 0;
+ const int node = zone_to_nid(zone);
unsigned long flags;
- struct zone *zone;
struct memory_notify arg;
- int ret, node;
char *reason;
+ int ret;
/*
* {on,off}lining is constrained to full memory sections (or more
@@ -1843,15 +1811,17 @@ int __ref offline_pages(unsigned long st
goto failed_removal;
}
- /* This makes hotplug much easier...and readable.
- we assume this for now. .*/
- zone = test_pages_in_a_zone(start_pfn, end_pfn);
- if (!zone) {
+ /*
+ * We only support offlining of memory blocks managed by a single zone,
+ * checked by calling code. This is just a sanity check that we might
+ * want to remove in the future.
+ */
+ if (WARN_ON_ONCE(page_zone(pfn_to_page(start_pfn)) != zone ||
+ page_zone(pfn_to_page(end_pfn - 1)) != zone)) {
ret = -EINVAL;
reason = "multizone range";
goto failed_removal;
}
- node = zone_to_nid(zone);
/*
* Disable pcplists so that page isolation cannot race with freeing
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 178/227] drivers/base/memory: clarify adding and removing of memory blocks
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: rafael, osalvador, mhocko, gregkh, david, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: David Hildenbrand <david@redhat.com>
Subject: drivers/base/memory: clarify adding and removing of memory blocks
Let's make it clearer at which places we actually add and remove memory
blocks -- streamlining the terminology -- and highlight which memory blocks
start out online and which start out offline.
* rename add_memory_block -> add_boot_memory_block
* rename init_memory_block -> add_memory_block
* rename unregister_memory -> remove_memory_block
* rename register_memory -> __add_memory_block
* add add_hotplug_memory_block
* mark add_boot_memory_block with __init (suggested by Oscar)
__add_memory_block() is a pure helper for add_memory_block(), so remove
the somewhat obvious comment.
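The layering that results from the renames above can be sketched in
userspace C. This is only a model under stated assumptions, not the kernel
code: the blocks[] array, MAX_BLOCKS and do_add_memory_block() (standing in
for __add_memory_block()) are hypothetical, and the memory_group argument
is left out.

```c
/* Hypothetical stand-ins for the kernel's memory block states. */
enum mem_state { MEM_ONLINE, MEM_OFFLINE };

struct memory_block {
	unsigned long id;
	enum mem_state state;
	unsigned long nr_vmemmap_pages;
};

#define MAX_BLOCKS 8			/* assumption: tiny fixed-size registry */
struct memory_block blocks[MAX_BLOCKS];
int nr_blocks;

/* Models __add_memory_block(): the pure registration helper. */
int do_add_memory_block(const struct memory_block *mem)
{
	if (nr_blocks == MAX_BLOCKS)
		return -1;
	blocks[nr_blocks++] = *mem;
	return 0;
}

/* Models add_memory_block(): common path, the caller picks the state. */
int add_memory_block(unsigned long block_id, enum mem_state state,
		     unsigned long nr_vmemmap_pages)
{
	struct memory_block mem = {
		.id = block_id,
		.state = state,
		.nr_vmemmap_pages = nr_vmemmap_pages,
	};

	return do_add_memory_block(&mem);
}

/* Boot memory blocks start out online and have no vmemmap pages. */
int add_boot_memory_block(unsigned long block_id)
{
	return add_memory_block(block_id, MEM_ONLINE, 0);
}

/* Hotplugged memory blocks start out offline. */
int add_hotplug_memory_block(unsigned long block_id,
			     unsigned long nr_vmemmap_pages)
{
	return add_memory_block(block_id, MEM_OFFLINE, nr_vmemmap_pages);
}
```

The point of the two thin wrappers is that the initial state is encoded in
the function name rather than passed by every caller.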
Link: https://lkml.kernel.org/r/20220221154531.11382-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
drivers/base/memory.c | 38 ++++++++++++++++++++------------------
1 file changed, 20 insertions(+), 18 deletions(-)
--- a/drivers/base/memory.c~drivers-base-memory-clarify-adding-and-removing-of-memory-blocks
+++ a/drivers/base/memory.c
@@ -619,11 +619,7 @@ static const struct attribute_group *mem
NULL,
};
-/*
- * register_memory - Setup a sysfs device for a memory block
- */
-static
-int register_memory(struct memory_block *memory)
+static int __add_memory_block(struct memory_block *memory)
{
int ret;
@@ -723,9 +719,9 @@ void memory_block_add_nid(struct memory_
}
#endif
-static int init_memory_block(unsigned long block_id, unsigned long state,
- unsigned long nr_vmemmap_pages,
- struct memory_group *group)
+static int add_memory_block(unsigned long block_id, unsigned long state,
+ unsigned long nr_vmemmap_pages,
+ struct memory_group *group)
{
struct memory_block *mem;
int ret = 0;
@@ -756,7 +752,7 @@ static int init_memory_block(unsigned lo
mem->zone = early_node_zone_for_memory_block(mem, NUMA_NO_NODE);
#endif /* CONFIG_NUMA */
- ret = register_memory(mem);
+ ret = __add_memory_block(mem);
if (ret)
return ret;
@@ -768,7 +764,7 @@ static int init_memory_block(unsigned lo
return 0;
}
-static int add_memory_block(unsigned long base_section_nr)
+static int __init add_boot_memory_block(unsigned long base_section_nr)
{
int section_count = 0;
unsigned long nr;
@@ -780,11 +776,18 @@ static int add_memory_block(unsigned lon
if (section_count == 0)
return 0;
- return init_memory_block(memory_block_id(base_section_nr),
- MEM_ONLINE, 0, NULL);
+ return add_memory_block(memory_block_id(base_section_nr),
+ MEM_ONLINE, 0, NULL);
+}
+
+static int add_hotplug_memory_block(unsigned long block_id,
+ unsigned long nr_vmemmap_pages,
+ struct memory_group *group)
+{
+ return add_memory_block(block_id, MEM_OFFLINE, nr_vmemmap_pages, group);
}
-static void unregister_memory(struct memory_block *memory)
+static void remove_memory_block(struct memory_block *memory)
{
if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
return;
@@ -823,8 +826,7 @@ int create_memory_block_devices(unsigned
return -EINVAL;
for (block_id = start_block_id; block_id != end_block_id; block_id++) {
- ret = init_memory_block(block_id, MEM_OFFLINE, vmemmap_pages,
- group);
+ ret = add_hotplug_memory_block(block_id, vmemmap_pages, group);
if (ret)
break;
}
@@ -835,7 +837,7 @@ int create_memory_block_devices(unsigned
mem = find_memory_block_by_id(block_id);
if (WARN_ON_ONCE(!mem))
continue;
- unregister_memory(mem);
+ remove_memory_block(mem);
}
}
return ret;
@@ -864,7 +866,7 @@ void remove_memory_block_devices(unsigne
if (WARN_ON_ONCE(!mem))
continue;
unregister_memory_block_under_nodes(mem);
- unregister_memory(mem);
+ remove_memory_block(mem);
}
}
@@ -924,7 +926,7 @@ void __init memory_dev_init(void)
*/
for (nr = 0; nr <= __highest_present_section_nr;
nr += sections_per_block) {
- ret = add_memory_block(nr);
+ ret = add_boot_memory_block(nr);
if (ret)
panic("%s() failed to add memory block: %d\n", __func__,
ret);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 179/227] mm: only re-generate demotion targets when a numa node changes its N_CPU state
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: ying.huang, stable, huntbag, dave.hansen, baolin.wang, osalvador,
akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Oscar Salvador <osalvador@suse.de>
Subject: mm: only re-generate demotion targets when a numa node changes its N_CPU state
Abhishek reported that after patch [1], hotplug operations are taking
~double the expected time. [2]
The reason is that the CPU callbacks that migrate_on_reclaim_init()
sets always call set_migration_target_nodes() whenever a CPU is brought
up/down.
But we only care about numa nodes going from having CPUs to being
cpuless, and vice versa, as that influences the demotion_target order.
We do already have two CPU callbacks (vmstat_cpu_online() and
vmstat_cpu_dead()) that check exactly that, so get rid of the CPU
callbacks in migrate_on_reclaim_init() and only call
set_migration_target_nodes() from vmstat_cpu_{dead,online}() whenever a
numa node changes its N_CPU state.
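The effect of rebuilding only on an N_CPU transition can be illustrated
with a small userspace model. Everything below is an assumption made for
illustration: the two-node, eight-CPU topology, the model_* function names,
and a set_migration_target_nodes() that merely counts how often it is
called instead of rebuilding the demotion order.

```c
#include <stdbool.h>

#define MAX_NODES 2
#define MAX_CPUS 8

/* Hypothetical topology: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static const int cpu_node[MAX_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

bool cpu_is_online[MAX_CPUS];
bool node_has_cpu[MAX_NODES];	/* models the N_CPU node state */
int rebuild_count;		/* how often the demotion order was rebuilt */

/* Stand-in for the real function: only counts invocations. */
void set_migration_target_nodes(void)
{
	rebuild_count++;
}

/* Models vmstat_cpu_online(): rebuild only if the node gains N_CPU. */
int model_cpu_online(unsigned int cpu)
{
	int node = cpu_node[cpu];

	cpu_is_online[cpu] = true;
	if (!node_has_cpu[node]) {
		node_has_cpu[node] = true;
		set_migration_target_nodes();
	}
	return 0;
}

/* Models vmstat_cpu_dead(): rebuild only if the node loses its last CPU. */
int model_cpu_dead(unsigned int cpu)
{
	int node = cpu_node[cpu];
	unsigned int i;

	cpu_is_online[cpu] = false;
	for (i = 0; i < MAX_CPUS; i++)
		if (cpu_is_online[i] && cpu_node[i] == node)
			return 0;	/* node still has CPUs, nothing to do */

	node_has_cpu[node] = false;
	set_migration_target_nodes();
	return 0;
}
```

In this model, bringing up all four CPUs of node 0 triggers one rebuild
instead of four, which is the kind of saving the patch is after.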
[1] https://lore.kernel.org/linux-mm/20210721063926.3024591-2-ying.huang@intel.com/
[2] https://lore.kernel.org/linux-mm/eb438ddd-2919-73d4-bd9f-b7eecdd9577a@linux.vnet.ibm.com/
[osalvador@suse.de: add feedback from Huang Ying]
Link: https://lkml.kernel.org/r/20220314150945.12694-1-osalvador@suse.de
Link: https://lkml.kernel.org/r/20220310120749.23077-1-osalvador@suse.de
Fixes: 884a6e5d1f93b ("mm/migrate: update node demotion order on hotplug events")
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reported-by: Abhishek Goel <huntbag@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Abhishek Goel <huntbag@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/migrate.h | 8 ++++++
mm/migrate.c | 47 ++++++++------------------------------
mm/vmstat.c | 13 +++++++++-
3 files changed, 30 insertions(+), 38 deletions(-)
--- a/include/linux/migrate.h~mm-only-re-generate-demotion-targets-when-a-numa-node-changes-its-n_cpu-state
+++ a/include/linux/migrate.h
@@ -48,7 +48,15 @@ int folio_migrate_mapping(struct address
struct folio *newfolio, struct folio *folio, int extra_count);
extern bool numa_demotion_enabled;
+extern void migrate_on_reclaim_init(void);
+#ifdef CONFIG_HOTPLUG_CPU
+extern void set_migration_target_nodes(void);
#else
+static inline void set_migration_target_nodes(void) {}
+#endif
+#else
+
+static inline void set_migration_target_nodes(void) {}
static inline void putback_movable_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t new,
--- a/mm/migrate.c~mm-only-re-generate-demotion-targets-when-a-numa-node-changes-its-n_cpu-state
+++ a/mm/migrate.c
@@ -3209,7 +3209,7 @@ again:
/*
* For callers that do not hold get_online_mems() already.
*/
-static void set_migration_target_nodes(void)
+void set_migration_target_nodes(void)
{
get_online_mems();
__set_migration_target_nodes();
@@ -3273,51 +3273,24 @@ static int __meminit migrate_on_reclaim_
return notifier_from_errno(0);
}
-/*
- * React to hotplug events that might affect the migration targets
- * like events that online or offline NUMA nodes.
- *
- * The ordering is also currently dependent on which nodes have
- * CPUs. That means we need CPU on/offline notification too.
- */
-static int migration_online_cpu(unsigned int cpu)
-{
- set_migration_target_nodes();
- return 0;
-}
-
-static int migration_offline_cpu(unsigned int cpu)
-{
- set_migration_target_nodes();
- return 0;
-}
-
-static int __init migrate_on_reclaim_init(void)
+void __init migrate_on_reclaim_init(void)
{
- int ret;
-
node_demotion = kmalloc_array(nr_node_ids,
sizeof(struct demotion_nodes),
GFP_KERNEL);
WARN_ON(!node_demotion);
- ret = cpuhp_setup_state_nocalls(CPUHP_MM_DEMOTION_DEAD, "mm/demotion:offline",
- NULL, migration_offline_cpu);
+ hotplug_memory_notifier(migrate_on_reclaim_callback, 100);
/*
- * In the unlikely case that this fails, the automatic
- * migration targets may become suboptimal for nodes
- * where N_CPU changes. With such a small impact in a
- * rare case, do not bother trying to do anything special.
+ * At this point, all numa nodes with memory/CPus have their state
+ * properly set, so we can build the demotion order now.
+ * Let us hold the cpu_hotplug lock just, as we could possibily have
+ * CPU hotplug events during boot.
*/
- WARN_ON(ret < 0);
- ret = cpuhp_setup_state(CPUHP_AP_MM_DEMOTION_ONLINE, "mm/demotion:online",
- migration_online_cpu, NULL);
- WARN_ON(ret < 0);
-
- hotplug_memory_notifier(migrate_on_reclaim_callback, 100);
- return 0;
+ cpus_read_lock();
+ set_migration_target_nodes();
+ cpus_read_unlock();
}
-late_initcall(migrate_on_reclaim_init);
#endif /* CONFIG_HOTPLUG_CPU */
bool numa_demotion_enabled = false;
--- a/mm/vmstat.c~mm-only-re-generate-demotion-targets-when-a-numa-node-changes-its-n_cpu-state
+++ a/mm/vmstat.c
@@ -28,6 +28,7 @@
#include <linux/mm_inline.h>
#include <linux/page_ext.h>
#include <linux/page_owner.h>
+#include <linux/migrate.h>
#include "internal.h"
@@ -2049,7 +2050,12 @@ static void __init init_cpu_node_state(v
static int vmstat_cpu_online(unsigned int cpu)
{
refresh_zone_stat_thresholds();
- node_set_state(cpu_to_node(cpu), N_CPU);
+
+ if (!node_state(cpu_to_node(cpu), N_CPU)) {
+ node_set_state(cpu_to_node(cpu), N_CPU);
+ set_migration_target_nodes();
+ }
+
return 0;
}
@@ -2072,6 +2078,8 @@ static int vmstat_cpu_dead(unsigned int
return 0;
node_clear_state(node, N_CPU);
+ set_migration_target_nodes();
+
return 0;
}
@@ -2103,6 +2111,9 @@ void __init init_mm_internals(void)
start_shepherd_timer();
#endif
+#if defined(CONFIG_MIGRATION) && defined(CONFIG_HOTPLUG_CPU)
+ migrate_on_reclaim_init();
+#endif
#ifdef CONFIG_PROC_FS
proc_create_seq("buddyinfo", 0444, NULL, &fragmentation_op);
proc_create_seq("pagetypeinfo", 0400, NULL, &pagetypeinfo_op);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 180/227] mm/thp: ClearPageDoubleMap in first page_add_file_rmap()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: shy828301, kirill.shutemov, hughd, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Hugh Dickins <hughd@google.com>
Subject: mm/thp: ClearPageDoubleMap in first page_add_file_rmap()
PageDoubleMap is maintained differently for anon and for shmem+file: the
shmem+file one was never cleared, because a safe place to do so could not
be found; so it would blight future use of the cached hugepage until
evicted.
See https://lore.kernel.org/lkml/1571938066-29031-1-git-send-email-yang.shi@linux.alibaba.com/
But page_add_file_rmap() does provide a safe place to do so (though later
than one might wish): allowing testing to return to an initial state
without a damaging drop_caches.
Link: https://lkml.kernel.org/r/61c5cf99-a962-9a25-597a-53ab1bd8fbc0@google.com
Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/rmap.c | 11 +++++++++++
1 file changed, 11 insertions(+)
--- a/mm/rmap.c~mm-thp-clearpagedoublemap-in-first-page_add_file_rmap
+++ a/mm/rmap.c
@@ -1252,6 +1252,17 @@ void page_add_file_rmap(struct page *pag
}
if (!atomic_inc_and_test(compound_mapcount_ptr(page)))
goto out;
+
+ /*
+ * It is racy to ClearPageDoubleMap in page_remove_file_rmap();
+ * but page lock is held by all page_add_file_rmap() compound
+ * callers, and SetPageDoubleMap below warns if !PageLocked:
+ * so here is a place that DoubleMap can be safely cleared.
+ */
+ VM_WARN_ON_ONCE(!PageLocked(page));
+ if (nr == nr_pages && PageDoubleMap(page))
+ ClearPageDoubleMap(page);
+
if (PageSwapBacked(page))
__mod_lruvec_page_state(page, NR_SHMEM_PMDMAPPED,
nr_pages);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 181/227] mm/zswap.c: allow handling just same-value filled pages
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: vitaly.wool, sjenning, ddstreet, maciej.szmigiero, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
Subject: mm/zswap.c: allow handling just same-value filled pages
Zswap has an ability to efficiently store same-value filled pages, which
can be turned on and off using the "same_filled_pages_enabled" parameter.
However, there is currently no way to enable just this (lightweight)
functionality, while not making use of the whole compressed page storage
machinery.
Add a "non_same_filled_pages_enabled" parameter which allows disabling
handling of pages that aren't same-value filled. This way zswap can be
run in a lightweight mode that stores only same-value filled pages.
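How the two parameters interact on the store path can be sketched as a
userspace model. The names model_store(), page_same_filled(), PAGE_WORDS
and the STORED_* results are hypothetical; only the order of the checks
(same-filled detection first, then the new early -EINVAL) is taken from
the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <errno.h>

#define PAGE_WORDS 512	/* assumption: a page modelled as 512 words */

/* Model the two module parameters; both are enabled by default. */
bool same_filled_pages_enabled = true;
bool non_same_filled_pages_enabled = true;

/* Hypothetical success codes for the model. */
enum store_result { STORED_SAME_FILLED, STORED_COMPRESSED };

/* Returns true and reports the value if every word of the page matches. */
bool page_same_filled(const unsigned long *page, unsigned long *value)
{
	size_t i;

	for (i = 1; i < PAGE_WORDS; i++)
		if (page[i] != page[0])
			return false;
	*value = page[0];
	return true;
}

/* Models the decision in the store path: a result code or -EINVAL. */
int model_store(const unsigned long *page)
{
	unsigned long value;

	/* With detection disabled, every page counts as non-same-filled. */
	if (same_filled_pages_enabled && page_same_filled(page, &value))
		return STORED_SAME_FILLED;

	/* The new parameter rejects everything that reaches this point. */
	if (!non_same_filled_pages_enabled)
		return -EINVAL;

	return STORED_COMPRESSED;
}
```

With both parameters disabled, every page in this model takes the -EINVAL
path, so no new pages are accepted at all.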
Link: https://lkml.kernel.org/r/7dbafa963e8bab43608189abbe2067f4b9287831.1641247624.git.maciej.szmigiero@oracle.com
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/admin-guide/mm/zswap.rst | 22 +++++++++++++++++++---
mm/zswap.c | 15 ++++++++++++++-
2 files changed, 33 insertions(+), 4 deletions(-)
--- a/Documentation/admin-guide/mm/zswap.rst~mm-zswapc-allow-handling-just-same-value-filled-pages
+++ a/Documentation/admin-guide/mm/zswap.rst
@@ -130,9 +130,25 @@ attribute, e.g.::
echo 1 > /sys/module/zswap/parameters/same_filled_pages_enabled
When zswap same-filled page identification is disabled at runtime, it will stop
-checking for the same-value filled pages during store operation. However, the
-existing pages which are marked as same-value filled pages remain stored
-unchanged in zswap until they are either loaded or invalidated.
+checking for the same-value filled pages during store operation.
+In other words, every page will be then considered non-same-value filled.
+However, the existing pages which are marked as same-value filled pages remain
+stored unchanged in zswap until they are either loaded or invalidated.
+
+In some circumstances it might be advantageous to make use of just the zswap
+ability to efficiently store same-filled pages without enabling the whole
+compressed page storage.
+In this case the handling of non-same-value pages by zswap (enabled by default)
+can be disabled by setting the ``non_same_filled_pages_enabled`` attribute
+to 0, e.g. ``zswap.non_same_filled_pages_enabled=0``.
+It can also be enabled and disabled at runtime using the sysfs
+``non_same_filled_pages_enabled`` attribute, e.g.::
+
+ echo 1 > /sys/module/zswap/parameters/non_same_filled_pages_enabled
+
+Disabling both ``zswap.same_filled_pages_enabled`` and
+``zswap.non_same_filled_pages_enabled`` effectively disables accepting any new
+pages by zswap.
To prevent zswap from shrinking pool when zswap is full and there's a high
pressure on swap (this will result in flipping pages in and out zswap pool
--- a/mm/zswap.c~mm-zswapc-allow-handling-just-same-value-filled-pages
+++ a/mm/zswap.c
@@ -120,11 +120,19 @@ static unsigned int zswap_accept_thr_per
module_param_named(accept_threshold_percent, zswap_accept_thr_percent,
uint, 0644);
-/* Enable/disable handling same-value filled pages (enabled by default) */
+/*
+ * Enable/disable handling same-value filled pages (enabled by default).
+ * If disabled every page is considered non-same-value filled.
+ */
static bool zswap_same_filled_pages_enabled = true;
module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled,
bool, 0644);
+/* Enable/disable handling non-same-value filled pages (enabled by default) */
+static bool zswap_non_same_filled_pages_enabled = true;
+module_param_named(non_same_filled_pages_enabled, zswap_non_same_filled_pages_enabled,
+ bool, 0644);
+
/*********************************
* data structures
**********************************/
@@ -1147,6 +1155,11 @@ static int zswap_frontswap_store(unsigne
kunmap_atomic(src);
}
+ if (!zswap_non_same_filled_pages_enabled) {
+ ret = -EINVAL;
+ goto freepage;
+ }
+
/* if entry is successfully added, it keeps the reference */
entry->pool = zswap_pool_current_get();
if (!entry->pool) {
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 182/227] mm: remove usercopy_warn()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: steve, songmuchun, linmiaohe, keescook, christophe.leroy, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Christophe Leroy <christophe.leroy@csgroup.eu>
Subject: mm: remove usercopy_warn()
Users of usercopy_warn() were removed by commit 53944f171a89 ("mm: remove
HARDENED_USERCOPY_FALLBACK").
Remove the now-unused function.
Link: https://lkml.kernel.org/r/5f26643fc70b05f8455b60b99c30c17d635fa640.1644231910.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Stephen Kitt <steve@sk2.org>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/uaccess.h | 2 --
mm/usercopy.c | 11 -----------
2 files changed, 13 deletions(-)
--- a/include/linux/uaccess.h~mm-remove-usercopy_warn
+++ a/include/linux/uaccess.h
@@ -401,8 +401,6 @@ static inline void user_access_restore(u
#endif
#ifdef CONFIG_HARDENED_USERCOPY
-void usercopy_warn(const char *name, const char *detail, bool to_user,
- unsigned long offset, unsigned long len);
void __noreturn usercopy_abort(const char *name, const char *detail,
bool to_user, unsigned long offset,
unsigned long len);
--- a/mm/usercopy.c~mm-remove-usercopy_warn
+++ a/mm/usercopy.c
@@ -70,17 +70,6 @@ static noinline int check_stack_object(c
* kmem_cache_create_usercopy() function to create the cache (and
* carefully audit the whitelist range).
*/
-void usercopy_warn(const char *name, const char *detail, bool to_user,
- unsigned long offset, unsigned long len)
-{
- WARN_ONCE(1, "Bad or missing usercopy whitelist? Kernel memory %s attempt detected %s %s%s%s%s (offset %lu, size %lu)!\n",
- to_user ? "exposure" : "overwrite",
- to_user ? "from" : "to",
- name ? : "unknown?!",
- detail ? " '" : "", detail ? : "", detail ? "'" : "",
- offset, len);
-}
-
void __noreturn usercopy_abort(const char *name, const char *detail,
bool to_user, unsigned long offset,
unsigned long len)
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 183/227] mm: uninline copy_overflow()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: David.Laight, anshuman.khandual, christophe.leroy, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: Christophe Leroy <christophe.leroy@csgroup.eu>
Subject: mm: uninline copy_overflow()
While building a small config with CONFIG_CC_OPTIMIZE_FOR_SIZE, I ended up
with more than 50 copies of the following function in vmlinux because GCC
doesn't honor the 'inline' keyword:
c00243bc <copy_overflow>:
c00243bc: 94 21 ff f0 stwu r1,-16(r1)
c00243c0: 7c 85 23 78 mr r5,r4
c00243c4: 7c 64 1b 78 mr r4,r3
c00243c8: 3c 60 c0 62 lis r3,-16286
c00243cc: 7c 08 02 a6 mflr r0
c00243d0: 38 63 5e e5 addi r3,r3,24293
c00243d4: 90 01 00 14 stw r0,20(r1)
c00243d8: 4b ff 82 45 bl c001c61c <__warn_printk>
c00243dc: 0f e0 00 00 twui r0,0
c00243e0: 80 01 00 14 lwz r0,20(r1)
c00243e4: 38 21 00 10 addi r1,r1,16
c00243e8: 7c 08 03 a6 mtlr r0
c00243ec: 4e 80 00 20 blr
With -Winline, GCC reports:
/include/linux/thread_info.h:212:20: warning: inlining failed in call to 'copy_overflow': call is unlikely and code size would grow [-Winline]
copy_overflow() is an unconditional warning called by
check_copy_size() on an error path.
check_copy_size() has to remain inlined in order to benefit
from constant folding, but copy_overflow() is not worth inlining.
Uninline the warning when CONFIG_BUG is selected.
When CONFIG_BUG is not selected, WARN() does nothing so skip it.
This reduces the size of vmlinux by almost 4 kbytes.
Link: https://lkml.kernel.org/r/e1723b9cfa924bcefcd41f69d0025b38e4c9364e.1644819985.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/thread_info.h | 5 ++++-
mm/maccess.c | 6 ++++++
2 files changed, 10 insertions(+), 1 deletion(-)
--- a/include/linux/thread_info.h~mm-uninline-copy_overflow
+++ a/include/linux/thread_info.h
@@ -209,9 +209,12 @@ __bad_copy_from(void);
extern void __compiletime_error("copy destination size is too small")
__bad_copy_to(void);
+void __copy_overflow(int size, unsigned long count);
+
static inline void copy_overflow(int size, unsigned long count)
{
- WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
+ if (IS_ENABLED(CONFIG_BUG))
+ __copy_overflow(size, count);
}
static __always_inline __must_check bool
--- a/mm/maccess.c~mm-uninline-copy_overflow
+++ a/mm/maccess.c
@@ -335,3 +335,9 @@ long strnlen_user_nofault(const void __u
return ret;
}
+
+void __copy_overflow(int size, unsigned long count)
+{
+ WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
+}
+EXPORT_SYMBOL(__copy_overflow);
_
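
The split used by the patch above (a small inlined check on the hot path, one out-of-line copy of the warning on the cold path) can be sketched in userspace like this. The sketch_* names are hypothetical, fprintf() stands in for WARN(), and the noinline/cold attributes assume GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts how often the cold path ran; present only so the sketch is observable. */
static unsigned int sketch_overflow_reports;

/*
 * Out-of-line cold path: a single copy exists no matter how many
 * call sites the inlined check below ends up in.
 */
static __attribute__((noinline, cold))
void sketch_copy_overflow(int size, unsigned long count)
{
	/* Callers only get here with a known (non-negative) object size. */
	assert(size >= 0);
	sketch_overflow_reports++;
	fprintf(stderr, "Buffer overflow detected (%d < %lu)!\n", size, count);
}

/*
 * Inlined hot path: stays small, so the compiler can fold it away
 * when size and count are compile-time constants. A negative size
 * means "object size unknown" and is accepted.
 */
static inline bool sketch_check_copy_size(int size, unsigned long count)
{
	if (size >= 0 && (unsigned long)size < count) {
		sketch_copy_overflow(size, count);
		return false;
	}
	return true;
}
```

Keeping only the comparison inline means each call site pays for a compare and a branch, while the message formatting is emitted once.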
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 184/227] mm/usercopy: return 1 from hardened_usercopy __setup() handler
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: keescook, i.zhbanov, crecklin, rdunlap, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Randy Dunlap <rdunlap@infradead.org>
Subject: mm/usercopy: return 1 from hardened_usercopy __setup() handler
__setup() handlers should return 1 if the command line option is handled
and 0 if not (or maybe never return 0; it just pollutes init's
environment). This prevents:
Unknown kernel command line parameters \
"BOOT_IMAGE=/boot/bzImage-517rc5 hardened_usercopy=off", will be \
passed to user space.
Run /sbin/init as init process
with arguments:
/sbin/init
with environment:
HOME=/
TERM=linux
BOOT_IMAGE=/boot/bzImage-517rc5
hardened_usercopy=off
or
hardened_usercopy=on
but when "hardened_usercopy=foo" is used, there is no Unknown kernel
command line parameter.
Return 1 to indicate that the boot option has been handled.
Print a warning if strtobool() returns an error on the option string,
but do not mark this as an unknown command line option and do not cause
init's environment to be polluted with this string.
Link: https://lkml.kernel.org/r/20220222034249.14795-1-rdunlap@infradead.org
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Fixes: b5cb15d9372ab ("usercopy: Allow boot cmdline disabling of hardening")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Acked-by: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/usercopy.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/usercopy.c~mm-usercopy-return-1-from-hardened_usercopy-__setup-handler
+++ a/mm/usercopy.c
@@ -284,7 +284,10 @@ static bool enable_checks __initdata = t
static int __init parse_hardened_usercopy(char *str)
{
- return strtobool(str, &enable_checks);
+ if (strtobool(str, &enable_checks))
+ pr_warn("Invalid option string for hardened_usercopy: '%s'\n",
+ str);
+ return 1;
}
__setup("hardened_usercopy=", parse_hardened_usercopy);
_
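
The handler convention described above (always report the option as handled, warn on a malformed value) can be sketched in userspace as follows. sketch_strtobool() is a simplified stand-in written for this example, not the kernel's strtobool(); the other sketch_* names are likewise hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified boolean parser: accepts y/n, 1/0 and on/off by prefix. */
static int sketch_strtobool(const char *s, bool *res)
{
	assert(res != NULL);
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		if (s[1] == 'n' || s[1] == 'N') {
			*res = true;
			return 0;
		}
		if (s[1] == 'f' || s[1] == 'F') {
			*res = false;
			return 0;
		}
		break;
	default:
		break;
	}
	return -EINVAL;
}

static bool sketch_enable_checks = true;

/*
 * Returns 1 ("handled") even for a malformed value, so the option is
 * never treated as unknown; the previous setting is kept in that case.
 */
static int sketch_parse_hardened_usercopy(const char *str)
{
	if (sketch_strtobool(str, &sketch_enable_checks))
		fprintf(stderr,
			"Invalid option string for hardened_usercopy: '%s'\n",
			str ? str : "(null)");
	return 1;
}
```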
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 185/227] mm/early_ioremap: declare early_memremap_pgprot_adjust()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: willy, mgorman, david, vbabka, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Vlastimil Babka <vbabka@suse.cz>
Subject: mm/early_ioremap: declare early_memremap_pgprot_adjust()
The mm/ directory can almost fully be built with W=1, which would help in
local development. One remaining issue is a missing prototype for
early_memremap_pgprot_adjust().
Thus add a declaration for this function. Use mm/internal.h instead of
asm/early_ioremap.h to avoid missing type definitions and unnecessary
exposure.
Link: https://lkml.kernel.org/r/20220314165724.16071-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/early_ioremap.c | 1 +
mm/internal.h | 6 ++++++
2 files changed, 7 insertions(+)
--- a/mm/early_ioremap.c~mm-early_ioremap-declare-early_memremap_pgprot_adjust
+++ a/mm/early_ioremap.c
@@ -17,6 +17,7 @@
#include <linux/vmalloc.h>
#include <asm/fixmap.h>
#include <asm/early_ioremap.h>
+#include "internal.h"
#ifdef CONFIG_MMU
static int early_ioremap_debug __initdata;
--- a/mm/internal.h~mm-early_ioremap-declare-early_memremap_pgprot_adjust
+++ a/mm/internal.h
@@ -155,6 +155,12 @@ extern unsigned long highest_memmap_pfn;
#define MAX_RECLAIM_RETRIES 16
/*
+ * in mm/early_ioremap.c
+ */
+pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
+ unsigned long size, pgprot_t prot);
+
+/*
* in mm/vmscan.c:
*/
extern int isolate_lru_page(struct page *page);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 186/227] highmem: document kunmap_local()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:47 UTC (permalink / raw)
To: ira.weiny, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Ira Weiny <ira.weiny@intel.com>
Subject: highmem: document kunmap_local()
Some users of kmap() add an offset to the kmap() address to be used
during the mapping.
When converting to kmap_local_page() the base address does not
need to be stored because any address within the page can be used in
kunmap_local(). However, this was not clear from the documentation and
caused some questions [1].
Document that any address in the page can be used in kunmap_local() to
clarify this for future users.
[1] https://lore.kernel.org/lkml/20211213154543.GM3538886@iweiny-DESK2.sc.intel.com/
[ira.weiny@intel.com: updates per Christoph]
Link: https://lkml.kernel.org/r/20220124182138.816693-1-ira.weiny@intel.com
Link: https://lkml.kernel.org/r/20220124013045.806718-1-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/highmem-internal.h | 10 ++++++++++
1 file changed, 10 insertions(+)
--- a/include/linux/highmem-internal.h~highmem-document-kunmap_local
+++ a/include/linux/highmem-internal.h
@@ -246,6 +246,16 @@ do { \
__kunmap_atomic(__addr); \
} while (0)
+/**
+ * kunmap_local - Unmap a page mapped via kmap_local_page().
+ * @__addr: An address within the page mapped
+ *
+ * @__addr can be any address within the mapped page. Commonly it is the
+ * address return from kmap_local_page(), but it can also include offsets.
+ *
+ * Unmapping should be done in the reverse order of the mapping. See
+ * kmap_local_page() for details.
+ */
#define kunmap_local(__addr) \
do { \
BUILD_BUG_ON(__same_type((__addr), struct page *)); \
_
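
One way to see why any address within the page is acceptable: if the unmap path rounds the address down to its page boundary before looking up the mapping (an assumption about the implementation, not something stated in this patch), then the base address and base plus offset are indistinguishable. A minimal userspace sketch, with a hypothetical 4 KiB page size and hypothetical sketch_* names:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL				/* assumed page size */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* The mask trick below only works for power-of-two page sizes. */
static_assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two");

/* Round an address down to the start of the page that contains it. */
static uintptr_t sketch_page_base(uintptr_t addr)
{
	return addr & SKETCH_PAGE_MASK;
}
```

Under that assumption a caller that did "p = kmap_local_page(page) + offset" can pass p straight to kunmap_local() without keeping the base pointer around.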
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 187/227] mm/highmem: remove unnecessary done label
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: songmuchun, rientjes, david, linmiaohe, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/highmem: remove unnecessary done label
Remove unnecessary done label to simplify the code.
Link: https://lkml.kernel.org/r/20220126092542.64659-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/highmem.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
--- a/mm/highmem.c~mm-highmem-remove-unnecessary-done-label
+++ a/mm/highmem.c
@@ -736,11 +736,11 @@ void *page_address(const struct page *pa
list_for_each_entry(pam, &pas->lh, list) {
if (pam->page == page) {
ret = pam->virtual;
- goto done;
+ break;
}
}
}
-done:
+
spin_unlock_irqrestore(&pas->lock, flags);
return ret;
}
@@ -773,13 +773,12 @@ void set_page_address(struct page *page,
list_for_each_entry(pam, &pas->lh, list) {
if (pam->page == page) {
list_del(&pam->list);
- spin_unlock_irqrestore(&pas->lock, flags);
- goto done;
+ break;
}
}
spin_unlock_irqrestore(&pas->lock, flags);
}
-done:
+
return;
}
_
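
The shape of the cleanup above, where the lookup loop leaves with break and falls into a single unlock instead of jumping to a label, can be sketched in userspace as follows. A counter stands in for the spinlock, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_map {
	const void *page;
	void *virtual_addr;
};

static int sketch_lock_depth;	/* stands in for the spinlock */

static void sketch_lock(void)
{
	sketch_lock_depth++;
}

static void sketch_unlock(void)
{
	assert(sketch_lock_depth > 0);
	sketch_lock_depth--;
}

/* Look up the mapping for a page; returns NULL if there is none. */
static void *sketch_page_address(const struct sketch_map *maps, size_t n,
				 const void *page)
{
	void *ret = NULL;

	sketch_lock();
	for (size_t i = 0; i < n; i++) {
		if (maps[i].page == page) {
			ret = maps[i].virtual_addr;
			break;	/* fall through to the one unlock below */
		}
	}
	sketch_unlock();

	return ret;
}
```

Both the found and not-found cases release the lock at the same place, so there is no second unlock to keep in sync.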
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 188/227] mm/page_table_check.c: use strtobool for param parsing
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: linux, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: "Dr. David Alan Gilbert" <linux@treblig.org>
Subject: mm/page_table_check.c: use strtobool for param parsing
Use strtobool rather than open coding "on" and "off" parsing.
Link: https://lkml.kernel.org/r/20220227181038.126926-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_table_check.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
--- a/mm/page_table_check.c~mm-use-strtobool-for-param-parsing
+++ a/mm/page_table_check.c
@@ -23,15 +23,7 @@ EXPORT_SYMBOL(page_table_check_disabled)
static int __init early_page_table_check_param(char *buf)
{
- if (!buf)
- return -EINVAL;
-
- if (strcmp(buf, "on") == 0)
- __page_table_check_enabled = true;
- else if (strcmp(buf, "off") == 0)
- __page_table_check_enabled = false;
-
- return 0;
+ return strtobool(buf, &__page_table_check_enabled);
}
early_param("page_table_check", early_page_table_check_param);
_
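
The conversion above is not purely cosmetic: the open-coded version silently returned 0 for anything other than "on" or "off", while a strtobool()-style parser accepts more spellings and reports an error for the rest. The userspace sketch below contrasts the two; sketch_new_parse() is a simplified model written for this example rather than the kernel helper, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Model of the removed open-coded parser. */
static int sketch_old_parse(const char *buf, bool *enabled)
{
	assert(enabled != NULL);
	if (!buf)
		return -EINVAL;

	if (strcmp(buf, "on") == 0)
		*enabled = true;
	else if (strcmp(buf, "off") == 0)
		*enabled = false;

	return 0;	/* unknown strings are ignored without an error */
}

/* Simplified model of a strtobool()-style parser (y/n, 1/0, on/off). */
static int sketch_new_parse(const char *buf, bool *enabled)
{
	assert(enabled != NULL);
	if (!buf)
		return -EINVAL;

	switch (buf[0]) {
	case 'y': case 'Y': case '1':
		*enabled = true;
		return 0;
	case 'n': case 'N': case '0':
		*enabled = false;
		return 0;
	case 'o': case 'O':
		if (buf[1] == 'n' || buf[1] == 'N') {
			*enabled = true;
			return 0;
		}
		if (buf[1] == 'f' || buf[1] == 'F') {
			*enabled = false;
			return 0;
		}
		break;
	default:
		break;
	}
	return -EINVAL;	/* unknown strings are reported to the caller */
}
```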
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 189/227] mm/kfence: remove unnecessary CONFIG_KFENCE option
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: glider, elver, dvyukov, tangmeng, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: tangmeng <tangmeng@uniontech.com>
Subject: mm/kfence: remove unnecessary CONFIG_KFENCE option
mm/Makefile already has:
obj-$(CONFIG_KFENCE) += kfence/
so mm/kfence/ is only built when CONFIG_KFENCE is set. The
'obj-$(CONFIG_KFENCE) :=' in mm/kfence/Makefile is therefore redundant;
replace it with 'obj-y :='.
Link: https://lkml.kernel.org/r/20220221065525.21344-1-tangmeng@uniontech.com
Signed-off-by: tangmeng <tangmeng@uniontech.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/kfence/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/kfence/Makefile~mm-kfence-remove-unnecessary-config_kfence-option
+++ a/mm/kfence/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_KFENCE) := core.o report.o
+obj-y := core.o report.o
CFLAGS_kfence_test.o := -g -fno-omit-frame-pointer -fno-optimize-sibling-calls
obj-$(CONFIG_KFENCE_KUNIT_TEST) += kfence_test.o
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 190/227] kfence: allow re-enabling KFENCE after system startup
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: glider, elver, dvyukov, dtcccc, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Tianchen Ding <dtcccc@linux.alibaba.com>
Subject: kfence: allow re-enabling KFENCE after system startup
Patch series "provide the flexibility to enable KFENCE", v3.
If CONFIG_CONTIG_ALLOC is not supported, we fall back to
alloc_pages_exact(). Allocations made this way are limited by
MAX_ORDER (default 11), so allocating the KFENCE pool after system
startup is not supported with a large KFENCE_NUM_OBJECTS.
When handling failures in kfence_init_pool_late(), we pair
free_pages_exact() with alloc_pages_exact() for compatibility,
though it actually does the same as free_contig_range().
This patch (of 2):
Once KFENCE is disabled by:
echo 0 > /sys/module/kfence/parameters/sample_interval
it cannot be re-enabled until the next reboot.
Allow re-enabling it by writing a positive number to sample_interval.
Link: https://lkml.kernel.org/r/20220307074516.6920-1-dtcccc@linux.alibaba.com
Link: https://lkml.kernel.org/r/20220307074516.6920-2-dtcccc@linux.alibaba.com
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/kfence/core.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
--- a/mm/kfence/core.c~kfence-allow-re-enabling-kfence-after-system-startup
+++ a/mm/kfence/core.c
@@ -38,14 +38,17 @@
#define KFENCE_WARN_ON(cond) \
({ \
const bool __cond = WARN_ON(cond); \
- if (unlikely(__cond)) \
+ if (unlikely(__cond)) { \
WRITE_ONCE(kfence_enabled, false); \
+ disabled_by_warn = true; \
+ } \
__cond; \
})
/* === Data ================================================================= */
static bool kfence_enabled __read_mostly;
+static bool disabled_by_warn __read_mostly;
unsigned long kfence_sample_interval __read_mostly = CONFIG_KFENCE_SAMPLE_INTERVAL;
EXPORT_SYMBOL_GPL(kfence_sample_interval); /* Export for test modules. */
@@ -55,6 +58,7 @@ EXPORT_SYMBOL_GPL(kfence_sample_interval
#endif
#define MODULE_PARAM_PREFIX "kfence."
+static int kfence_enable_late(void);
static int param_set_sample_interval(const char *val, const struct kernel_param *kp)
{
unsigned long num;
@@ -65,10 +69,11 @@ static int param_set_sample_interval(con
if (!num) /* Using 0 to indicate KFENCE is disabled. */
WRITE_ONCE(kfence_enabled, false);
- else if (!READ_ONCE(kfence_enabled) && system_state != SYSTEM_BOOTING)
- return -EINVAL; /* Cannot (re-)enable KFENCE on-the-fly. */
*((unsigned long *)kp->arg) = num;
+
+ if (num && !READ_ONCE(kfence_enabled) && system_state != SYSTEM_BOOTING)
+ return disabled_by_warn ? -EINVAL : kfence_enable_late();
return 0;
}
@@ -787,6 +792,16 @@ void __init kfence_init(void)
(void *)(__kfence_pool + KFENCE_POOL_SIZE));
}
+static int kfence_enable_late(void)
+{
+ if (!__kfence_pool)
+ return -EINVAL;
+
+ WRITE_ONCE(kfence_enabled, true);
+ queue_delayed_work(system_unbound_wq, &kfence_timer, 0);
+ return 0;
+}
+
void kfence_shutdown_cache(struct kmem_cache *s)
{
unsigned long flags;
_
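The decision made by param_set_sample_interval() after this patch can be modelled in plain C. The sketch below is only a userspace model: struct kfence_model and the model_* functions are hypothetical stand-ins for the kernel state named in the diff, and the string parsing done at the top of the real function is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state touched by the patch. */
struct kfence_model {
	bool enabled;          /* kfence_enabled */
	bool disabled_by_warn; /* set by KFENCE_WARN_ON() */
	bool have_pool;        /* __kfence_pool != NULL */
	bool booting;          /* system_state == SYSTEM_BOOTING */
	unsigned long sample_interval;
};

/* Mirrors kfence_enable_late() as of this patch: needs an existing pool. */
static int model_enable_late(struct kfence_model *k)
{
	if (!k->have_pool)
		return -EINVAL;
	k->enabled = true;	/* the kernel also requeues kfence_timer here */
	return 0;
}

/* Mirrors the control flow of param_set_sample_interval() after the patch. */
static int model_set_sample_interval(struct kfence_model *k, unsigned long num)
{
	if (!num)
		k->enabled = false;	/* 0 disables KFENCE */

	k->sample_interval = num;	/* stored before any late enable */

	if (num && !k->enabled && !k->booting)
		return k->disabled_by_warn ? -EINVAL : model_enable_late(k);
	return 0;
}
```

Two details are visible in the model: the new interval is stored even when the late enable is refused, and a disable caused by KFENCE_WARN_ON() stays in force no matter what is written afterwards.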
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 191/227] kfence: alloc kfence_pool after system startup
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: liupeng256, glider, elver, dvyukov, dtcccc, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: Tianchen Ding <dtcccc@linux.alibaba.com>
Subject: kfence: alloc kfence_pool after system startup
Allow enabling KFENCE after system startup by allocating its pool via the
page allocator. This provides the flexibility to enable KFENCE even if it
wasn't enabled at boot time.
Link: https://lkml.kernel.org/r/20220307074516.6920-3-dtcccc@linux.alibaba.com
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Peng Liu <liupeng256@huawei.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/kfence/core.c | 111 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 90 insertions(+), 21 deletions(-)
--- a/mm/kfence/core.c~kfence-alloc-kfence_pool-after-system-startup
+++ a/mm/kfence/core.c
@@ -96,7 +96,7 @@ static unsigned long kfence_skip_covered
module_param_named(skip_covered_thresh, kfence_skip_covered_thresh, ulong, 0644);
/* The pool of pages used for guard pages and objects. */
-char *__kfence_pool __ro_after_init;
+char *__kfence_pool __read_mostly;
EXPORT_SYMBOL(__kfence_pool); /* Export for test modules. */
/*
@@ -537,17 +537,19 @@ static void rcu_guarded_free(struct rcu_
kfence_guarded_free((void *)meta->addr, meta, false);
}
-static bool __init kfence_init_pool(void)
+/*
+ * Initialization of the KFENCE pool after its allocation.
+ * Returns 0 on success; otherwise returns the address up to
+ * which partial initialization succeeded.
+ */
+static unsigned long kfence_init_pool(void)
{
unsigned long addr = (unsigned long)__kfence_pool;
struct page *pages;
int i;
- if (!__kfence_pool)
- return false;
-
if (!arch_kfence_init_pool())
- goto err;
+ return addr;
pages = virt_to_page(addr);
@@ -565,7 +567,7 @@ static bool __init kfence_init_pool(void
/* Verify we do not have a compound head page. */
if (WARN_ON(compound_head(&pages[i]) != &pages[i]))
- goto err;
+ return addr;
__SetPageSlab(&pages[i]);
}
@@ -578,7 +580,7 @@ static bool __init kfence_init_pool(void
*/
for (i = 0; i < 2; i++) {
if (unlikely(!kfence_protect(addr)))
- goto err;
+ return addr;
addr += PAGE_SIZE;
}
@@ -595,7 +597,7 @@ static bool __init kfence_init_pool(void
/* Protect the right redzone. */
if (unlikely(!kfence_protect(addr + PAGE_SIZE)))
- goto err;
+ return addr;
addr += 2 * PAGE_SIZE;
}
@@ -608,9 +610,21 @@ static bool __init kfence_init_pool(void
*/
kmemleak_free(__kfence_pool);
- return true;
+ return 0;
+}
+
+static bool __init kfence_init_pool_early(void)
+{
+ unsigned long addr;
+
+ if (!__kfence_pool)
+ return false;
+
+ addr = kfence_init_pool();
+
+ if (!addr)
+ return true;
-err:
/*
* Only release unprotected pages, and do not try to go back and change
* page attributes due to risk of failing to do so as well. If changing
@@ -623,6 +637,26 @@ err:
return false;
}
+static bool kfence_init_pool_late(void)
+{
+ unsigned long addr, free_size;
+
+ addr = kfence_init_pool();
+
+ if (!addr)
+ return true;
+
+ /* Same as above. */
+ free_size = KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool);
+#ifdef CONFIG_CONTIG_ALLOC
+ free_contig_range(page_to_pfn(virt_to_page(addr)), free_size / PAGE_SIZE);
+#else
+ free_pages_exact((void *)addr, free_size);
+#endif
+ __kfence_pool = NULL;
+ return false;
+}
+
/* === DebugFS Interface ==================================================== */
static int stats_show(struct seq_file *seq, void *v)
@@ -771,31 +805,66 @@ void __init kfence_alloc_pool(void)
pr_err("failed to allocate pool\n");
}
+static void kfence_init_enable(void)
+{
+ if (!IS_ENABLED(CONFIG_KFENCE_STATIC_KEYS))
+ static_branch_enable(&kfence_allocation_key);
+ WRITE_ONCE(kfence_enabled, true);
+ queue_delayed_work(system_unbound_wq, &kfence_timer, 0);
+ pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE,
+ CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool,
+ (void *)(__kfence_pool + KFENCE_POOL_SIZE));
+}
+
void __init kfence_init(void)
{
+ stack_hash_seed = (u32)random_get_entropy();
+
/* Setting kfence_sample_interval to 0 on boot disables KFENCE. */
if (!kfence_sample_interval)
return;
- stack_hash_seed = (u32)random_get_entropy();
- if (!kfence_init_pool()) {
+ if (!kfence_init_pool_early()) {
pr_err("%s failed\n", __func__);
return;
}
- if (!IS_ENABLED(CONFIG_KFENCE_STATIC_KEYS))
- static_branch_enable(&kfence_allocation_key);
- WRITE_ONCE(kfence_enabled, true);
- queue_delayed_work(system_unbound_wq, &kfence_timer, 0);
- pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE,
- CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool,
- (void *)(__kfence_pool + KFENCE_POOL_SIZE));
+ kfence_init_enable();
+}
+
+static int kfence_init_late(void)
+{
+ const unsigned long nr_pages = KFENCE_POOL_SIZE / PAGE_SIZE;
+#ifdef CONFIG_CONTIG_ALLOC
+ struct page *pages;
+
+ pages = alloc_contig_pages(nr_pages, GFP_KERNEL, first_online_node, NULL);
+ if (!pages)
+ return -ENOMEM;
+ __kfence_pool = page_to_virt(pages);
+#else
+ if (nr_pages > MAX_ORDER_NR_PAGES) {
+ pr_warn("KFENCE_NUM_OBJECTS too large for buddy allocator\n");
+ return -EINVAL;
+ }
+ __kfence_pool = alloc_pages_exact(KFENCE_POOL_SIZE, GFP_KERNEL);
+ if (!__kfence_pool)
+ return -ENOMEM;
+#endif
+
+ if (!kfence_init_pool_late()) {
+ pr_err("%s failed\n", __func__);
+ return -EBUSY;
+ }
+
+ kfence_init_enable();
+ return 0;
}
static int kfence_enable_late(void)
{
if (!__kfence_pool)
- return -EINVAL;
+ return kfence_init_late();
WRITE_ONCE(kfence_enabled, true);
queue_delayed_work(system_unbound_wq, &kfence_timer, 0);
_
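The size limits involved in late pool allocation can be worked through with a little arithmetic. The sketch below assumes 4 KiB pages, MAX_ORDER = 11, MAX_ORDER_NR_PAGES = 1 << (MAX_ORDER - 1), and the pool layout KFENCE_POOL_SIZE = (CONFIG_KFENCE_NUM_OBJECTS + 1) * 2 * PAGE_SIZE from include/linux/kfence.h of that era; none of these definitions appear in the patch itself, and the SKETCH_* names and helper functions are hypothetical.

```c
#include <stdbool.h>

/* Assumed values for illustration: 4 KiB pages and MAX_ORDER = 11. */
#define SKETCH_PAGE_SIZE          4096UL
#define SKETCH_MAX_ORDER          11
#define SKETCH_MAX_ORDER_NR_PAGES (1UL << (SKETCH_MAX_ORDER - 1))

/*
 * Assumed pool layout: (num_objects + 1) pairs of pages, each pair being
 * an object page plus a guard page.
 */
static unsigned long pool_size(unsigned long num_objects)
{
	return (num_objects + 1) * 2 * SKETCH_PAGE_SIZE;
}

/* The !CONFIG_CONTIG_ALLOC check in kfence_init_late(). */
static bool fits_buddy_allocator(unsigned long num_objects)
{
	return pool_size(num_objects) / SKETCH_PAGE_SIZE <= SKETCH_MAX_ORDER_NR_PAGES;
}

/*
 * Bytes handed back by kfence_init_pool_late() when initialisation stopped
 * at fail_addr: everything from fail_addr to the end of the pool.
 */
static unsigned long late_free_size(unsigned long pool_start,
				    unsigned long fail_addr,
				    unsigned long num_objects)
{
	return pool_size(num_objects) - (fail_addr - pool_start);
}
```

Under these assumptions the default of 255 objects needs 512 pages (2 MiB) and fits, 511 objects is the largest count that still fits the 1024-page limit, and a setting such as 65535 can only be satisfied through alloc_contig_pages().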
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 192/227] kunit: fix UAF when run kfence test case test_gfpzero
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: wangkefeng.wang, glider, elver, dvyukov, dlatypov, davidgow,
brendanhiggins, liupeng256, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Peng Liu <liupeng256@huawei.com>
Subject: kunit: fix UAF when run kfence test case test_gfpzero
Patch series "kunit: fix a UAF bug and do some optimization", v2.
This series fixes a UAF (use after free) that occurs when running the
kfence test case test_gfpzero, which is time-consuming. The UAF is easily
triggered by setting CONFIG_KFENCE_NUM_OBJECTS = 65535. The series also
includes some optimizations for kunit tests.
This patch (of 3):
Kunit creates a new thread to run the actual test case, and the main
process waits for that thread to complete until a timeout expires. The
variable "struct kunit test" is local to function kunit_try_catch_run
and is used in the test case thread. The kunit_try_catch_run task frees
"struct kunit test" when the test times out, but the test case thread is
still running at that point, so a UAF is triggered.
The problem has been observed both on a physical machine and under QEMU
when running the kfence kunit tests. It can be triggered by setting
CONFIG_KFENCE_NUM_OBJECTS = 65535. With this setting, test_gfpzero takes
hours and kunit times out.
The panic log follows.
BUG: unable to handle page fault for address: ffffffff82d882e9
Call Trace:
kunit_log_append+0x58/0xd0
...
test_alloc.constprop.0.cold+0x6b/0x8a [kfence_test]
test_gfpzero.cold+0x61/0x8ab [kfence_test]
kunit_try_run_case+0x4c/0x70
kunit_generic_run_threadfn_adapter+0x11/0x20
kthread+0x166/0x190
ret_from_fork+0x22/0x30
Kernel panic - not syncing: Fatal exception
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
To solve this problem, stop the test case thread when the kunit
framework times out. The stop request is sent from
kunit_try_catch_run, and test_gfpzero handles it.
Link: https://lkml.kernel.org/r/20220309083753.1561921-1-liupeng256@huawei.com
Link: https://lkml.kernel.org/r/20220309083753.1561921-2-liupeng256@huawei.com
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Wang Kefeng <wangkefeng.wang@huawei.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
lib/kunit/try-catch.c | 1 +
mm/kfence/kfence_test.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
--- a/lib/kunit/try-catch.c~kunit-fix-uaf-when-run-kfence-test-case-test_gfpzero
+++ a/lib/kunit/try-catch.c
@@ -78,6 +78,7 @@ void kunit_try_catch_run(struct kunit_tr
if (time_remaining == 0) {
kunit_err(test, "try timed out\n");
try_catch->try_result = -ETIMEDOUT;
+ kthread_stop(task_struct);
}
exit_code = try_catch->try_result;
--- a/mm/kfence/kfence_test.c~kunit-fix-uaf-when-run-kfence-test-case-test_gfpzero
+++ a/mm/kfence/kfence_test.c
@@ -623,7 +623,7 @@ static void test_gfpzero(struct kunit *t
break;
test_free(buf2);
- if (i == CONFIG_KFENCE_NUM_OBJECTS) {
+ if (kthread_should_stop() || (i == CONFIG_KFENCE_NUM_OBJECTS)) {
kunit_warn(test, "giving up ... cannot get same object back\n");
return;
}
_
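The cooperative-stop pattern used by this patch can be illustrated without threads. The following is a single-threaded simulation under stated assumptions: struct worker, worker_should_stop and retry_until_stopped are hypothetical names, and the stop request that kthread_stop() would deliver from another task is simulated inside the loop itself.

```c
#include <stdbool.h>

/* Hypothetical stop flag standing in for the state kthread_should_stop() reads. */
struct worker {
	bool stop_requested;
	int stop_at;	/* iteration at which a stop is requested; -1 = never */
};

static bool worker_should_stop(const struct worker *w)
{
	return w->stop_requested;
}

/*
 * Retry loop shaped like the one in test_gfpzero(): give up either when the
 * retry budget is exhausted or when a stop has been requested. Returns the
 * number of iterations completed before giving up.
 */
static int retry_until_stopped(struct worker *w, int budget)
{
	int i;

	for (i = 0; ; i++) {
		if (i == w->stop_at)
			w->stop_requested = true;	/* simulated kthread_stop() */
		if (worker_should_stop(w) || i == budget)
			return i;
		/* ... one allocation attempt would happen here ... */
	}
}
```

The point of the pattern is that the requester cannot force the loop to end; the loop has to poll the flag itself. In the kernel, kthread_stop() additionally waits for the thread to exit, which is what keeps the test's data alive until the thread is done with it.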
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 193/227] kunit: make kunit_test_timeout compatible with comment
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: wangkefeng.wang, glider, elver, dvyukov, dlatypov, davidgow,
brendanhiggins, liupeng256, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Peng Liu <liupeng256@huawei.com>
Subject: kunit: make kunit_test_timeout compatible with comment
In function kunit_test_timeout, "300 * MSEC_PER_SEC" is documented as
representing 5 min. However, the value is used as a jiffies count, so it
is wrong on arm64, whose default HZ is 250, and in any other configuration
where HZ != 1000. Use msecs_to_jiffies to fix this so that
kunit_test_timeout works as intended.
Link: https://lkml.kernel.org/r/20220309083753.1561921-3-liupeng256@huawei.com
Fixes: 5f3e06208920 ("kunit: test: add support for test abort")
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Wang Kefeng <wangkefeng.wang@huawei.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
lib/kunit/try-catch.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/lib/kunit/try-catch.c~kunit-make-kunit_test_timeout-compatible-with-comment
+++ a/lib/kunit/try-catch.c
@@ -52,7 +52,7 @@ static unsigned long kunit_test_timeout(
* If tests timeout due to exceeding sysctl_hung_task_timeout_secs,
* the task will be killed and an oops generated.
*/
- return 300 * MSEC_PER_SEC; /* 5 min */
+ return 300 * msecs_to_jiffies(MSEC_PER_SEC); /* 5 min */
}
void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
_
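The HZ dependence described in the changelog above can be checked with a
small userspace sketch. This is not the kernel implementation:
sketch_msecs_to_jiffies() is a hypothetical stand-in that simply rounds up
to whole ticks, and the hz argument plays the role of the build-time HZ
constant.

```c
#include <stdio.h>

#define MSEC_PER_SEC 1000UL

/*
 * Hypothetical, simplified stand-in for msecs_to_jiffies(): convert
 * milliseconds to ticks at rate hz, rounding up. The real kernel helper
 * also handles overflow and HZ values above 1000.
 */
static unsigned long sketch_msecs_to_jiffies(unsigned long ms, unsigned long hz)
{
	return (ms * hz + MSEC_PER_SEC - 1) / MSEC_PER_SEC;
}

/* Wall-clock seconds of the old return value, interpreted as jiffies. */
static unsigned long old_timeout_secs(unsigned long hz)
{
	unsigned long ticks = 300 * MSEC_PER_SEC;

	return ticks / hz;
}

/* Wall-clock seconds of the new return value, interpreted as jiffies. */
static unsigned long new_timeout_secs(unsigned long hz)
{
	unsigned long ticks = 300 * sketch_msecs_to_jiffies(MSEC_PER_SEC, hz);

	return ticks / hz;
}
```

Under these assumptions the old expression lasts 1200 seconds (20 minutes)
at HZ = 250, while the new one lasts 300 seconds at HZ = 100, 250 and 1000
alike.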
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 194/227] kfence: test: try to avoid test_gfpzero trigger rcu_stall
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: wangkefeng.wang, glider, elver, dvyukov, dlatypov, davidgow,
brendanhiggins, liupeng256, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Peng Liu <liupeng256@huawei.com>
Subject: kfence: test: try to avoid test_gfpzero trigger rcu_stall
When CONFIG_KFENCE_NUM_OBJECTS is set to a large number, the KFENCE KUnit
test case test_gfpzero consumes nearly all of a CPU's time and an RCU
stall is reported, as in the following log excerpt from a physical
server.
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 68-....: (14422 ticks this GP) idle=6ce/1/0x4000000000000002
softirq=592/592 fqs=7500 (t=15004 jiffies g=10677 q=20019)
Task dump for CPU 68:
task:kunit_try_catch state:R running task
stack: 0 pid: 9728 ppid: 2 flags:0x0000020a
Call trace:
dump_backtrace+0x0/0x1e4
show_stack+0x20/0x2c
sched_show_task+0x148/0x170
...
rcu_sched_clock_irq+0x70/0x180
update_process_times+0x68/0xb0
tick_sched_handle+0x38/0x74
...
gic_handle_irq+0x78/0x2c0
el1_irq+0xb8/0x140
kfree+0xd8/0x53c
test_alloc+0x264/0x310 [kfence_test]
test_gfpzero+0xf4/0x840 [kfence_test]
kunit_try_run_case+0x48/0x20c
kunit_generic_run_threadfn_adapter+0x28/0x34
kthread+0x108/0x13c
ret_from_fork+0x10/0x18
To avoid the RCU stall and the unacceptable latency, add a scheduling
point (cond_resched()) to the retry loop in test_gfpzero.
Link: https://lkml.kernel.org/r/20220309083753.1561921-4-liupeng256@huawei.com
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Wang Kefeng <wangkefeng.wang@huawei.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/kfence/kfence_test.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/kfence/kfence_test.c~kfence-test-try-to-avoid-test_gfpzero-trigger-rcu_stall
+++ a/mm/kfence/kfence_test.c
@@ -627,6 +627,7 @@ static void test_gfpzero(struct kunit *t
kunit_warn(test, "giving up ... cannot get same object back\n");
return;
}
+ cond_resched();
}
for (i = 0; i < size; i++)
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 195/227] kfence: allow use of a deferrable timer
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: glider, dvyukov, elver, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Marco Elver <elver@google.com>
Subject: kfence: allow use of a deferrable timer
Allow the use of a deferrable timer, which does not force CPU wake-ups
when the system is idle. A consequence is that the sample interval
becomes very unpredictable, to the point that it is not guaranteed that
the KFENCE KUnit test still passes.
Nevertheless, on power-constrained systems this may be preferable, so
let's give the user the option should they accept the above trade-off.
Link: https://lkml.kernel.org/r/20220308141415.3168078-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/dev-tools/kfence.rst | 12 ++++++++++++
lib/Kconfig.kfence | 12 ++++++++++++
mm/kfence/core.c | 15 +++++++++++++--
3 files changed, 37 insertions(+), 2 deletions(-)
--- a/Documentation/dev-tools/kfence.rst~kfence-allow-use-of-a-deferrable-timer
+++ a/Documentation/dev-tools/kfence.rst
@@ -41,6 +41,18 @@ guarded by KFENCE. The default is config
``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
disables KFENCE.
+The sample interval controls a timer that sets up KFENCE allocations. By
+default, to keep the real sample interval predictable, the normal timer also
+causes CPU wake-ups when the system is completely idle. This may be undesirable
+on power-constrained systems. The boot parameter ``kfence.deferrable=1``
+instead switches to a "deferrable" timer which does not force CPU wake-ups on
+idle systems, at the risk of unpredictable sample intervals. The default is
+configurable via the Kconfig option ``CONFIG_KFENCE_DEFERRABLE``.
+
+.. warning::
+ The KUnit test suite is very likely to fail when using a deferrable timer
+ since it currently causes very unpredictable sample intervals.
+
The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default
255), the number of available guarded objects can be controlled. Each object
--- a/lib/Kconfig.kfence~kfence-allow-use-of-a-deferrable-timer
+++ a/lib/Kconfig.kfence
@@ -45,6 +45,18 @@ config KFENCE_NUM_OBJECTS
pages are required; with one containing the object and two adjacent
ones used as guard pages.
+config KFENCE_DEFERRABLE
+ bool "Use a deferrable timer to trigger allocations"
+ help
+ Use a deferrable timer to trigger allocations. This avoids forcing
+ CPU wake-ups if the system is idle, at the risk of a less predictable
+ sample interval.
+
+ Warning: The KUnit test suite fails with this option enabled - due to
+ the unpredictability of the sample interval!
+
+ Say N if you are unsure.
+
config KFENCE_STATIC_KEYS
bool "Use static keys to set up allocations" if EXPERT
depends on JUMP_LABEL
--- a/mm/kfence/core.c~kfence-allow-use-of-a-deferrable-timer
+++ a/mm/kfence/core.c
@@ -95,6 +95,10 @@ module_param_cb(sample_interval, &sample
static unsigned long kfence_skip_covered_thresh __read_mostly = 75;
module_param_named(skip_covered_thresh, kfence_skip_covered_thresh, ulong, 0644);
+/* If true, use a deferrable timer. */
+static bool kfence_deferrable __read_mostly = IS_ENABLED(CONFIG_KFENCE_DEFERRABLE);
+module_param_named(deferrable, kfence_deferrable, bool, 0444);
+
/* The pool of pages used for guard pages and objects. */
char *__kfence_pool __read_mostly;
EXPORT_SYMBOL(__kfence_pool); /* Export for test modules. */
@@ -740,6 +744,8 @@ late_initcall(kfence_debugfs_init);
/* === Allocation Gate Timer ================================================ */
+static struct delayed_work kfence_timer;
+
#ifdef CONFIG_KFENCE_STATIC_KEYS
/* Wait queue to wake up allocation-gate timer task. */
static DECLARE_WAIT_QUEUE_HEAD(allocation_wait);
@@ -762,7 +768,6 @@ static DEFINE_IRQ_WORK(wake_up_kfence_ti
* avoids IPIs, at the cost of not immediately capturing allocations if the
* instructions remain cached.
*/
-static struct delayed_work kfence_timer;
static void toggle_allocation_gate(struct work_struct *work)
{
if (!READ_ONCE(kfence_enabled))
@@ -790,7 +795,6 @@ static void toggle_allocation_gate(struc
queue_delayed_work(system_unbound_wq, &kfence_timer,
msecs_to_jiffies(kfence_sample_interval));
}
-static DECLARE_DELAYED_WORK(kfence_timer, toggle_allocation_gate);
/* === Public interface ===================================================== */
@@ -809,8 +813,15 @@ static void kfence_init_enable(void)
{
if (!IS_ENABLED(CONFIG_KFENCE_STATIC_KEYS))
static_branch_enable(&kfence_allocation_key);
+
+ if (kfence_deferrable)
+ INIT_DEFERRABLE_WORK(&kfence_timer, toggle_allocation_gate);
+ else
+ INIT_DELAYED_WORK(&kfence_timer, toggle_allocation_gate);
+
WRITE_ONCE(kfence_enabled, true);
queue_delayed_work(system_unbound_wq, &kfence_timer, 0);
+
pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE,
CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool,
(void *)(__kfence_pool + KFENCE_POOL_SIZE));
_
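Because the patch registers the option with module_param_named() and mode
0444, the chosen timer type should be readable from userspace at run time.
The sketch below assumes the usual sysfs location for module parameters,
/sys/module/kfence/parameters/deferrable, and the usual Y/N rendering of
boolean parameters. Both are assumptions that the patch itself does not
spell out, and read_bool_param() is a hypothetical helper.

```c
#include <stdio.h>

/* Assumed sysfs path; derived from the "kfence.deferrable" parameter name. */
#define KFENCE_DEFERRABLE_PARAM "/sys/module/kfence/parameters/deferrable"

/*
 * Read a boolean module parameter from the given sysfs file.
 * Returns 1 for true, 0 for false, and -1 if the file is missing
 * (for example on a kernel without this patch) or has unexpected content.
 */
static int read_bool_param(const char *path)
{
	FILE *f = fopen(path, "r");
	int c;

	if (!f)
		return -1;
	c = fgetc(f);
	fclose(f);

	switch (c) {
	case 'Y': case 'y': case '1':
		return 1;
	case 'N': case 'n': case '0':
		return 0;
	default:
		return -1;
	}
}
```

Calling read_bool_param(KFENCE_DEFERRABLE_PARAM) before running the KFENCE
KUnit suite is one way to notice that a deferrable timer is in use, which
the documentation hunk above warns is very likely to make the suite fail.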
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 196/227] mm/hmm.c: remove unneeded local variable ret
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: songmuchun, linmiaohe, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Miaohe Lin <linmiaohe@huawei.com>
Subject: mm/hmm.c: remove unneeded local variable ret
The local variable ret is always 0. Remove it and return 0 directly to
tighten the code.
Link: https://lkml.kernel.org/r/20220125124833.39718-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hmm.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/mm/hmm.c~mm-hmmc-remove-unneeded-local-variable-ret
+++ a/mm/hmm.c
@@ -417,7 +417,6 @@ static int hmm_vma_walk_pud(pud_t *pudp,
struct hmm_range *range = hmm_vma_walk->range;
unsigned long addr = start;
pud_t pud;
- int ret = 0;
spinlock_t *ptl = pud_trans_huge_lock(pudp, walk->vma);
if (!ptl)
@@ -466,7 +465,7 @@ static int hmm_vma_walk_pud(pud_t *pudp,
out_unlock:
spin_unlock(ptl);
- return ret;
+ return 0;
}
#else
#define hmm_vma_walk_pud NULL
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 197/227] mm/damon/dbgfs/init_regions: use target index instead of target id
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/dbgfs/init_regions: use target index instead of target id
Patch series "Remove the type-unclear target id concept".
DAMON requires each monitoring target ('struct damon_target') to have an
'unsigned long' integer called 'id', which should be unique among the
targets of the same monitoring context. Its meaning, however, is entirely
up to the monitoring primitives registered with the monitoring context.
For example, the virtual address space monitoring primitives treat the
id as a 'struct pid' pointer.
This makes the code flexible, but also ugly, poorly documented, and
type-unsafe[1]. Moreover, each target can be identified by its index
instead. For these reasons, this patchset removes the concept and uses a
clear type definition.
[1] https://lore.kernel.org/linux-mm/20211013154535.4aaeaaf9d0182922e405dd1e@linux-foundation.org/
This patch (of 4):
The target id is an 'unsigned long' value that each set of monitoring
primitives can interpret differently. For example, it is a 'struct pid *'
for virtual address space monitoring, while for physical address space
monitoring it is nothing but an integer displayed to users of the debugfs
interface. This is flexible, but it makes the code ugly and
type-unsafe[1].
To prepare for the eventual removal of the concept, this commit removes
one use of it, in the handling of the 'init_regions' debugfs file.
Specifically, this commit replaces the id with the index of each target
in the context's targets list.
[1] https://lore.kernel.org/linux-mm/20211013154535.4aaeaaf9d0182922e405dd1e@linux-foundation.org/
Link: https://lkml.kernel.org/r/20211230100723.2238-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211230100723.2238-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/dbgfs-test.h | 20 ++++++++++----------
mm/damon/dbgfs.c | 25 ++++++++++++-------------
2 files changed, 22 insertions(+), 23 deletions(-)
--- a/mm/damon/dbgfs.c~mm-damon-dbgfs-init_regions-use-target-index-instead-of-target-id
+++ a/mm/damon/dbgfs.c
@@ -440,18 +440,20 @@ static ssize_t sprint_init_regions(struc
{
struct damon_target *t;
struct damon_region *r;
+ int target_idx = 0;
int written = 0;
int rc;
damon_for_each_target(t, c) {
damon_for_each_region(r, t) {
rc = scnprintf(&buf[written], len - written,
- "%lu %lu %lu\n",
- t->id, r->ar.start, r->ar.end);
+ "%d %lu %lu\n",
+ target_idx, r->ar.start, r->ar.end);
if (!rc)
return -ENOMEM;
written += rc;
}
+ target_idx++;
}
return written;
}
@@ -485,22 +487,19 @@ out:
return len;
}
-static int add_init_region(struct damon_ctx *c,
- unsigned long target_id, struct damon_addr_range *ar)
+static int add_init_region(struct damon_ctx *c, int target_idx,
+ struct damon_addr_range *ar)
{
struct damon_target *t;
struct damon_region *r, *prev;
- unsigned long id;
+ unsigned long idx = 0;
int rc = -EINVAL;
if (ar->start >= ar->end)
return -EINVAL;
damon_for_each_target(t, c) {
- id = t->id;
- if (targetid_is_pid(c))
- id = (unsigned long)pid_vnr((struct pid *)id);
- if (id == target_id) {
+ if (idx++ == target_idx) {
r = damon_new_region(ar->start, ar->end);
if (!r)
return -ENOMEM;
@@ -523,7 +522,7 @@ static int set_init_regions(struct damon
struct damon_target *t;
struct damon_region *r, *next;
int pos = 0, parsed, ret;
- unsigned long target_id;
+ int target_idx;
struct damon_addr_range ar;
int err;
@@ -533,11 +532,11 @@ static int set_init_regions(struct damon
}
while (pos < len) {
- ret = sscanf(&str[pos], "%lu %lu %lu%n",
- &target_id, &ar.start, &ar.end, &parsed);
+ ret = sscanf(&str[pos], "%d %lu %lu%n",
+ &target_idx, &ar.start, &ar.end, &parsed);
if (ret != 3)
break;
- err = add_init_region(c, target_id, &ar);
+ err = add_init_region(c, target_idx, &ar);
if (err)
goto fail;
pos += parsed;
--- a/mm/damon/dbgfs-test.h~mm-damon-dbgfs-init_regions-use-target-index-instead-of-target-id
+++ a/mm/damon/dbgfs-test.h
@@ -113,19 +113,19 @@ static void damon_dbgfs_test_set_init_re
{
struct damon_ctx *ctx = damon_new_ctx();
unsigned long ids[] = {1, 2, 3};
- /* Each line represents one region in ``<target id> <start> <end>`` */
- char * const valid_inputs[] = {"2 10 20\n 2 20 30\n2 35 45",
- "2 10 20\n",
- "2 10 20\n1 39 59\n1 70 134\n 2 20 25\n",
+ /* Each line represents one region in ``<target idx> <start> <end>`` */
+ char * const valid_inputs[] = {"1 10 20\n 1 20 30\n1 35 45",
+ "1 10 20\n",
+ "1 10 20\n0 39 59\n0 70 134\n 1 20 25\n",
""};
/* Reading the file again will show sorted, clean output */
- char * const valid_expects[] = {"2 10 20\n2 20 30\n2 35 45\n",
- "2 10 20\n",
- "1 39 59\n1 70 134\n2 10 20\n2 20 25\n",
+ char * const valid_expects[] = {"1 10 20\n1 20 30\n1 35 45\n",
+ "1 10 20\n",
+ "0 39 59\n0 70 134\n1 10 20\n1 20 25\n",
""};
- char * const invalid_inputs[] = {"4 10 20\n", /* target not exists */
- "2 10 20\n 2 14 26\n", /* regions overlap */
- "1 10 20\n2 30 40\n 1 5 8"}; /* not sorted by address */
+ char * const invalid_inputs[] = {"3 10 20\n", /* target not exists */
+ "1 10 20\n 1 14 26\n", /* regions overlap */
+ "0 10 20\n1 30 40\n 0 5 8"}; /* not sorted by address */
char *input, *expect;
int i, rc;
char buf[256];
_
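The new input format can be tried out in userspace with a sketch that
reuses the sscanf() format string from set_init_regions() above. The names
region_sketch, parse_init_regions() and count_valid_regions() are
hypothetical, and nr_targets stands in for the length of the context's
targets list. The sketch only checks the index range and that start < end.
It does not reproduce the overlap and address-order checks that the kernel
code performs.

```c
#include <stdio.h>
#include <string.h>

struct region_sketch {
	int target_idx;		/* index into the targets list, from 0 */
	unsigned long start;
	unsigned long end;
};

/*
 * Parse "<target idx> <start> <end>" triples. Returns the number of
 * regions stored in out, or -1 if a triple names a target index outside
 * [0, nr_targets) or an empty address range.
 */
static int parse_init_regions(const char *str, int nr_targets,
			      struct region_sketch *out, int max)
{
	int len = (int)strlen(str);
	int pos = 0, parsed, ret, n = 0;
	struct region_sketch r;

	while (pos < len && n < max) {
		/* Same format as the patch; %n reports consumed characters. */
		ret = sscanf(&str[pos], "%d %lu %lu%n",
			     &r.target_idx, &r.start, &r.end, &parsed);
		if (ret != 3)
			break;
		if (r.target_idx < 0 || r.target_idx >= nr_targets)
			return -1;
		if (r.start >= r.end)
			return -1;
		out[n++] = r;
		pos += parsed;
	}
	return n;
}

/* Convenience wrapper for callers that only need the count. */
static int count_valid_regions(const char *str, int nr_targets)
{
	struct region_sketch buf[16];

	return parse_init_regions(str, nr_targets, buf, 16);
}
```

With three targets, as in the KUnit test above, the valid indices are 0
to 2. The input "1 10 20\n 1 20 30\n1 35 45" therefore yields three
regions, while "3 10 20\n" is rejected because no target has index 3.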
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 198/227] Docs/admin-guide/mm/damon/usage: update for changed initail_regions file input
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: Docs/admin-guide/mm/damon/usage: update for changed initail_regions file input
A previous commit made the init_regions debugfs file use the target index
instead of the target id to specify the target of the initial regions. This
commit updates the usage document to reflect the change.
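For readers who want to see the new input format in action, below is a
minimal userspace sketch of parsing ``<target idx> <start> <end>`` records
with the same "%d %lu %lu%n" format string that set_init_regions() uses.
The names 'struct addr_range' and count_init_regions() are hypothetical
stand-ins, not kernel symbols, and the sketch checks only the index range
and start < end; it does not reproduce the kernel's overlap or sort-order
checks.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's struct damon_addr_range. */
struct addr_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Parse "<target idx> <start> <end>" records from str and count them.
 * Returns the number of records, or -1 if a record names a target index
 * outside [0, nr_targets) or has start >= end.
 */
static int count_init_regions(const char *str, int nr_targets)
{
	size_t len = strlen(str);
	size_t pos = 0;
	int nr_regions = 0;

	while (pos < len) {
		struct addr_range ar;
		int target_idx, parsed;
		int ret = sscanf(&str[pos], "%d %lu %lu%n",
				 &target_idx, &ar.start, &ar.end, &parsed);

		if (ret != 3)
			break;	/* trailing whitespace or garbage ends parsing */
		if (target_idx < 0 || target_idx >= nr_targets)
			return -1;
		if (ar.start >= ar.end)
			return -1;
		nr_regions++;
		pos += (size_t)parsed;
	}
	return nr_regions;
}
```

With two targets registered, the four-record input from the updated usage
document is accepted, while an index equal to the number of targets is
rejected because indices start from 0.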
Link: https://lkml.kernel.org/r/20211230100723.2238-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/admin-guide/mm/damon/usage.rst | 24 +++++++++--------
1 file changed, 14 insertions(+), 10 deletions(-)
--- a/Documentation/admin-guide/mm/damon/usage.rst~docs-admin-guide-mm-damon-usage-update-for-changed-initail_regions-file-input
+++ a/Documentation/admin-guide/mm/damon/usage.rst
@@ -108,19 +108,23 @@ In such cases, users can explicitly set
as they want, by writing proper values to the ``init_regions`` file. Each line
of the input should represent one region in below form.::
- <target id> <start address> <end address>
+ <target idx> <start address> <end address>
-The ``target id`` should already in ``target_ids`` file, and the regions should
-be passed in address order. For example, below commands will set a couple of
-address ranges, ``1-100`` and ``100-200`` as the initial monitoring target
-region of process 42, and another couple of address ranges, ``20-40`` and
-``50-100`` as that of process 4242.::
+The ``target idx`` should be the index of the target in ``target_ids`` file,
+starting from ``0``, and the regions should be passed in address order. For
+example, below commands will set a couple of address ranges, ``1-100`` and
+``100-200`` as the initial monitoring target region of pid 42, which is the
+first one (index ``0``) in ``target_ids``, and another couple of address
+ranges, ``20-40`` and ``50-100`` as that of pid 4242, which is the second one
+(index ``1``) in ``target_ids``.::
# cd <debugfs>/damon
- # echo "42 1 100
- 42 100 200
- 4242 20 40
- 4242 50 100" > init_regions
+ # cat target_ids
+ 42 4242
+ # echo "0 1 100
+ 0 100 200
+ 1 20 40
+ 1 50 100" > init_regions
Note that this sets the initial monitoring target regions only. In case of
virtual memory monitoring, DAMON will automatically updates the boundary of the
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 199/227] mm/damon/core: move damon_set_targets() into dbgfs
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/core: move damon_set_targets() into dbgfs
The damon_set_targets() function is defined in the core for general use
cases, but is called only from dbgfs. Also, because the function is generic,
dbgfs has to do additional handling for the case where the target id is a
pid. To simplify the situation, this commit moves the function into dbgfs
and makes it handle the pid case on its own.
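The ownership rule this introduces (the setter itself drops the references
it was handed when it fails, instead of leaving that to the caller) can be
sketched in userspace as below. This is only an illustration under stated
assumptions: 'struct handle' stands in for a refcounted 'struct pid', and
'allocs_left' is a hypothetical test hook for simulating allocation
failure; none of these names exist in the kernel.

```c
#include <stdlib.h>

/* Hypothetical refcounted handle standing in for the kernel's struct pid. */
struct handle {
	int refs;
};

struct target {
	struct handle *h;
	struct target *next;
};

struct ctx {
	struct target *targets;
	int allocs_left;	/* test hook: < 0 means allocation never fails */
};

static void put_handle(struct handle *h)
{
	h->refs--;
}

static struct target *new_target(struct ctx *c, struct handle *h)
{
	struct target *t;

	if (c->allocs_left == 0)
		return NULL;	/* simulated allocation failure */
	if (c->allocs_left > 0)
		c->allocs_left--;
	t = malloc(sizeof(*t));
	if (t)
		t->h = h;
	return t;
}

/* Free all targets; drop their references too if put_refs is nonzero. */
static void clear_targets(struct ctx *c, int put_refs)
{
	while (c->targets) {
		struct target *t = c->targets;

		c->targets = t->next;
		if (put_refs)
			put_handle(t->h);
		free(t);
	}
}

/* Replace the targets of c; on failure, drop every reference in hs. */
static int set_targets(struct ctx *c, struct handle **hs, int nr)
{
	int i, j;

	clear_targets(c, 1);
	for (i = 0; i < nr; i++) {
		struct target *t = new_target(c, hs[i]);

		if (!t) {
			/* targets only; the references are dropped below */
			clear_targets(c, 0);
			for (j = 0; j < nr; j++)
				put_handle(hs[j]);
			return -1;
		}
		t->next = c->targets;
		c->targets = t;
	}
	return 0;
}
```

With this shape, a caller only needs to check the return value, which
mirrors how dbgfs_target_ids_write() shrinks in the diff below.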
Link: https://lkml.kernel.org/r/20211230100723.2238-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/damon.h | 2 -
mm/damon/core-test.h | 5 +++
mm/damon/core.c | 32 ------------------------
mm/damon/dbgfs-test.h | 14 +++++-----
mm/damon/dbgfs.c | 53 ++++++++++++++++++++++++++++++----------
5 files changed, 52 insertions(+), 54 deletions(-)
--- a/include/linux/damon.h~mm-damon-core-move-damon_set_targets-into-dbgfs
+++ a/include/linux/damon.h
@@ -484,8 +484,6 @@ unsigned int damon_nr_regions(struct dam
struct damon_ctx *damon_new_ctx(void);
void damon_destroy_ctx(struct damon_ctx *ctx);
-int damon_set_targets(struct damon_ctx *ctx,
- unsigned long *ids, ssize_t nr_ids);
int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
unsigned long aggr_int, unsigned long primitive_upd_int,
unsigned long min_nr_reg, unsigned long max_nr_reg);
--- a/mm/damon/core.c~mm-damon-core-move-damon_set_targets-into-dbgfs
+++ a/mm/damon/core.c
@@ -246,38 +246,6 @@ void damon_destroy_ctx(struct damon_ctx
}
/**
- * damon_set_targets() - Set monitoring targets.
- * @ctx: monitoring context
- * @ids: array of target ids
- * @nr_ids: number of entries in @ids
- *
- * This function should not be called while the kdamond is running.
- *
- * Return: 0 on success, negative error code otherwise.
- */
-int damon_set_targets(struct damon_ctx *ctx,
- unsigned long *ids, ssize_t nr_ids)
-{
- ssize_t i;
- struct damon_target *t, *next;
-
- damon_destroy_targets(ctx);
-
- for (i = 0; i < nr_ids; i++) {
- t = damon_new_target(ids[i]);
- if (!t) {
- /* The caller should do cleanup of the ids itself */
- damon_for_each_target_safe(t, next, ctx)
- damon_destroy_target(t);
- return -ENOMEM;
- }
- damon_add_target(ctx, t);
- }
-
- return 0;
-}
-
-/**
* damon_set_attrs() - Set attributes for the monitoring.
* @ctx: monitoring context
* @sample_int: time interval between samplings
--- a/mm/damon/core-test.h~mm-damon-core-move-damon_set_targets-into-dbgfs
+++ a/mm/damon/core-test.h
@@ -86,7 +86,10 @@ static void damon_test_aggregate(struct
struct damon_region *r;
int it, ir;
- damon_set_targets(ctx, target_ids, 3);
+ for (it = 0; it < 3; it++) {
+ t = damon_new_target(target_ids[it]);
+ damon_add_target(ctx, t);
+ }
it = 0;
damon_for_each_target(t, ctx) {
--- a/mm/damon/dbgfs.c~mm-damon-core-move-damon_set_targets-into-dbgfs
+++ a/mm/damon/dbgfs.c
@@ -358,11 +358,48 @@ static void dbgfs_put_pids(unsigned long
put_pid((struct pid *)ids[i]);
}
+/*
+ * dbgfs_set_targets() - Set monitoring targets.
+ * @ctx: monitoring context
+ * @ids: array of target ids
+ * @nr_ids: number of entries in @ids
+ *
+ * This function should not be called while the kdamond is running.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+static int dbgfs_set_targets(struct damon_ctx *ctx,
+ unsigned long *ids, ssize_t nr_ids)
+{
+ ssize_t i;
+ struct damon_target *t, *next;
+
+ damon_for_each_target_safe(t, next, ctx) {
+ if (targetid_is_pid(ctx))
+ put_pid((struct pid *)t->id);
+ damon_destroy_target(t);
+ }
+
+ for (i = 0; i < nr_ids; i++) {
+ t = damon_new_target(ids[i]);
+ if (!t) {
+ /* The caller should do cleanup of the ids itself */
+ damon_for_each_target_safe(t, next, ctx)
+ damon_destroy_target(t);
+ if (targetid_is_pid(ctx))
+ dbgfs_put_pids(ids, nr_ids);
+ return -ENOMEM;
+ }
+ damon_add_target(ctx, t);
+ }
+
+ return 0;
+}
+
static ssize_t dbgfs_target_ids_write(struct file *file,
const char __user *buf, size_t count, loff_t *ppos)
{
struct damon_ctx *ctx = file->private_data;
- struct damon_target *t, *next_t;
bool id_is_pid = true;
char *kbuf;
unsigned long *targets;
@@ -407,11 +444,7 @@ static ssize_t dbgfs_target_ids_write(st
}
/* remove previously set targets */
- damon_for_each_target_safe(t, next_t, ctx) {
- if (targetid_is_pid(ctx))
- put_pid((struct pid *)t->id);
- damon_destroy_target(t);
- }
+ dbgfs_set_targets(ctx, NULL, 0);
/* Configure the context for the address space type */
if (id_is_pid)
@@ -419,13 +452,9 @@ static ssize_t dbgfs_target_ids_write(st
else
damon_pa_set_primitives(ctx);
- ret = damon_set_targets(ctx, targets, nr_targets);
- if (ret) {
- if (id_is_pid)
- dbgfs_put_pids(targets, nr_targets);
- } else {
+ ret = dbgfs_set_targets(ctx, targets, nr_targets);
+ if (!ret)
ret = count;
- }
unlock_out:
mutex_unlock(&ctx->kdamond_lock);
--- a/mm/damon/dbgfs-test.h~mm-damon-core-move-damon_set_targets-into-dbgfs
+++ a/mm/damon/dbgfs-test.h
@@ -86,23 +86,23 @@ static void damon_dbgfs_test_set_targets
ctx->primitive.target_valid = NULL;
ctx->primitive.cleanup = NULL;
- damon_set_targets(ctx, ids, 3);
+ dbgfs_set_targets(ctx, ids, 3);
sprint_target_ids(ctx, buf, 64);
KUNIT_EXPECT_STREQ(test, (char *)buf, "1 2 3\n");
- damon_set_targets(ctx, NULL, 0);
+ dbgfs_set_targets(ctx, NULL, 0);
sprint_target_ids(ctx, buf, 64);
KUNIT_EXPECT_STREQ(test, (char *)buf, "\n");
- damon_set_targets(ctx, (unsigned long []){1, 2}, 2);
+ dbgfs_set_targets(ctx, (unsigned long []){1, 2}, 2);
sprint_target_ids(ctx, buf, 64);
KUNIT_EXPECT_STREQ(test, (char *)buf, "1 2\n");
- damon_set_targets(ctx, (unsigned long []){2}, 1);
+ dbgfs_set_targets(ctx, (unsigned long []){2}, 1);
sprint_target_ids(ctx, buf, 64);
KUNIT_EXPECT_STREQ(test, (char *)buf, "2\n");
- damon_set_targets(ctx, NULL, 0);
+ dbgfs_set_targets(ctx, NULL, 0);
sprint_target_ids(ctx, buf, 64);
KUNIT_EXPECT_STREQ(test, (char *)buf, "\n");
@@ -130,7 +130,7 @@ static void damon_dbgfs_test_set_init_re
int i, rc;
char buf[256];
- damon_set_targets(ctx, ids, 3);
+ dbgfs_set_targets(ctx, ids, 3);
/* Put valid inputs and check the results */
for (i = 0; i < ARRAY_SIZE(valid_inputs); i++) {
@@ -158,7 +158,7 @@ static void damon_dbgfs_test_set_init_re
KUNIT_EXPECT_STREQ(test, (char *)buf, "");
}
- damon_set_targets(ctx, NULL, 0);
+ dbgfs_set_targets(ctx, NULL, 0);
damon_destroy_ctx(ctx);
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 200/227] mm/damon: remove the target id concept
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon: remove the target id concept
DAMON requires each monitoring target ('struct damon_target') to have one
'unsigned long' integer called 'id', which should be unique among the
targets of the same monitoring context. Its meaning, however, is entirely up
to the monitoring primitives that are registered to the monitoring context.
For example, the virtual address spaces monitoring primitives treat the
id as a 'struct pid' pointer.
This makes the code flexible, but ugly, not well-documented, and
type-unsafe[1]. Also, each target can be identified by its index. For
these reasons, this commit removes the concept and uses a clear type
definition. For now, only a 'struct pid' pointer is used, for the
virtual address spaces monitoring. If DAMON is extended in the future so that
we need to put another identifier field in the struct, we will use a union
for such primitives-dependent fields and document which primitives are
using which type.
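As a rough userspace illustration of the two points above (a typed field
instead of an opaque 'unsigned long', and identification by position),
consider the sketch below. 'struct pid_stub', 'struct target' and
target_at() are hypothetical names introduced only for this example; the
real 'struct damon_target' keeps its targets on a 'struct list_head'.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pid. */
struct pid_stub {
	int nr;
};

/* The identifier is a typed pointer, so use sites need no casts. */
struct target {
	struct pid_stub *pid;
	struct target *next;
};

/*
 * Return the idx-th target of the list starting at head (counting from 0),
 * or NULL if idx is negative or past the end of the list.
 */
static struct target *target_at(struct target *head, int idx)
{
	struct target *t;
	int i = 0;

	if (idx < 0)
		return NULL;
	for (t = head; t; t = t->next) {
		if (i++ == idx)
			return t;
	}
	return NULL;
}
```

Because the field has a real type, passing something that is not a
'struct pid_stub *' fails at compile time rather than at run time.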
[1] https://lore.kernel.org/linux-mm/20211013154535.4aaeaaf9d0182922e405dd1e@linux-foundation.org/
Link: https://lkml.kernel.org/r/20211230100723.2238-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/damon.h | 11 +-
mm/damon/core-test.h | 18 ++--
mm/damon/core.c | 4 -
mm/damon/dbgfs-test.h | 63 +++++-----------
mm/damon/dbgfs.c | 152 +++++++++++++++++++++++-----------------
mm/damon/reclaim.c | 3
mm/damon/vaddr-test.h | 6 -
mm/damon/vaddr.c | 4 -
8 files changed, 133 insertions(+), 128 deletions(-)
--- a/include/linux/damon.h~mm-damon-remove-the-target-id-concept
+++ a/include/linux/damon.h
@@ -60,19 +60,18 @@ struct damon_region {
/**
* struct damon_target - Represents a monitoring target.
- * @id: Unique identifier for this target.
+ * @pid: The PID of the virtual address space to monitor.
* @nr_regions: Number of monitoring target regions of this target.
* @regions_list: Head of the monitoring target regions of this target.
* @list: List head for siblings.
*
* Each monitoring context could have multiple targets. For example, a context
* for virtual memory address spaces could have multiple target processes. The
- * @id of each target should be unique among the targets of the context. For
- * example, in the virtual address monitoring context, it could be a pidfd or
- * an address of an mm_struct.
+ * @pid should be set for appropriate address space monitoring primitives
+ * including the virtual address spaces monitoring primitives.
*/
struct damon_target {
- unsigned long id;
+ struct pid *pid;
unsigned int nr_regions;
struct list_head regions_list;
struct list_head list;
@@ -475,7 +474,7 @@ struct damos *damon_new_scheme(
void damon_add_scheme(struct damon_ctx *ctx, struct damos *s);
void damon_destroy_scheme(struct damos *s);
-struct damon_target *damon_new_target(unsigned long id);
+struct damon_target *damon_new_target(void);
void damon_add_target(struct damon_ctx *ctx, struct damon_target *t);
bool damon_targets_empty(struct damon_ctx *ctx);
void damon_free_target(struct damon_target *t);
--- a/mm/damon/core.c~mm-damon-remove-the-target-id-concept
+++ a/mm/damon/core.c
@@ -144,7 +144,7 @@ void damon_destroy_scheme(struct damos *
*
* Returns the pointer to the new struct if success, or NULL otherwise
*/
-struct damon_target *damon_new_target(unsigned long id)
+struct damon_target *damon_new_target(void)
{
struct damon_target *t;
@@ -152,7 +152,7 @@ struct damon_target *damon_new_target(un
if (!t)
return NULL;
- t->id = id;
+ t->pid = NULL;
t->nr_regions = 0;
INIT_LIST_HEAD(&t->regions_list);
--- a/mm/damon/core-test.h~mm-damon-remove-the-target-id-concept
+++ a/mm/damon/core-test.h
@@ -24,7 +24,7 @@ static void damon_test_regions(struct ku
KUNIT_EXPECT_EQ(test, 2ul, r->ar.end);
KUNIT_EXPECT_EQ(test, 0u, r->nr_accesses);
- t = damon_new_target(42);
+ t = damon_new_target();
KUNIT_EXPECT_EQ(test, 0u, damon_nr_regions(t));
damon_add_region(r, t);
@@ -52,8 +52,7 @@ static void damon_test_target(struct kun
struct damon_ctx *c = damon_new_ctx();
struct damon_target *t;
- t = damon_new_target(42);
- KUNIT_EXPECT_EQ(test, 42ul, t->id);
+ t = damon_new_target();
KUNIT_EXPECT_EQ(test, 0u, nr_damon_targets(c));
damon_add_target(c, t);
@@ -78,7 +77,6 @@ static void damon_test_target(struct kun
static void damon_test_aggregate(struct kunit *test)
{
struct damon_ctx *ctx = damon_new_ctx();
- unsigned long target_ids[] = {1, 2, 3};
unsigned long saddr[][3] = {{10, 20, 30}, {5, 42, 49}, {13, 33, 55} };
unsigned long eaddr[][3] = {{15, 27, 40}, {31, 45, 55}, {23, 44, 66} };
unsigned long accesses[][3] = {{42, 95, 84}, {10, 20, 30}, {0, 1, 2} };
@@ -87,7 +85,7 @@ static void damon_test_aggregate(struct
int it, ir;
for (it = 0; it < 3; it++) {
- t = damon_new_target(target_ids[it]);
+ t = damon_new_target();
damon_add_target(ctx, t);
}
@@ -125,7 +123,7 @@ static void damon_test_split_at(struct k
struct damon_target *t;
struct damon_region *r;
- t = damon_new_target(42);
+ t = damon_new_target();
r = damon_new_region(0, 100);
damon_add_region(r, t);
damon_split_region_at(c, t, r, 25);
@@ -146,7 +144,7 @@ static void damon_test_merge_two(struct
struct damon_region *r, *r2, *r3;
int i;
- t = damon_new_target(42);
+ t = damon_new_target();
r = damon_new_region(0, 100);
r->nr_accesses = 10;
damon_add_region(r, t);
@@ -194,7 +192,7 @@ static void damon_test_merge_regions_of(
unsigned long eaddrs[] = {112, 130, 156, 170, 230};
int i;
- t = damon_new_target(42);
+ t = damon_new_target();
for (i = 0; i < ARRAY_SIZE(sa); i++) {
r = damon_new_region(sa[i], ea[i]);
r->nr_accesses = nrs[i];
@@ -218,14 +216,14 @@ static void damon_test_split_regions_of(
struct damon_target *t;
struct damon_region *r;
- t = damon_new_target(42);
+ t = damon_new_target();
r = damon_new_region(0, 22);
damon_add_region(r, t);
damon_split_regions_of(c, t, 2);
KUNIT_EXPECT_LE(test, damon_nr_regions(t), 2u);
damon_free_target(t);
- t = damon_new_target(42);
+ t = damon_new_target();
r = damon_new_region(0, 220);
damon_add_region(r, t);
damon_split_regions_of(c, t, 4);
--- a/mm/damon/dbgfs.c~mm-damon-remove-the-target-id-concept
+++ a/mm/damon/dbgfs.c
@@ -275,7 +275,7 @@ out:
return ret;
}
-static inline bool targetid_is_pid(const struct damon_ctx *ctx)
+static inline bool target_has_pid(const struct damon_ctx *ctx)
{
return ctx->primitive.target_valid == damon_va_target_valid;
}
@@ -283,17 +283,19 @@ static inline bool targetid_is_pid(const
static ssize_t sprint_target_ids(struct damon_ctx *ctx, char *buf, ssize_t len)
{
struct damon_target *t;
- unsigned long id;
+ int id;
int written = 0;
int rc;
damon_for_each_target(t, ctx) {
- id = t->id;
- if (targetid_is_pid(ctx))
+ if (target_has_pid(ctx))
/* Show pid numbers to debugfs users */
- id = (unsigned long)pid_vnr((struct pid *)id);
+ id = pid_vnr(t->pid);
+ else
+ /* Show 42 for physical address space, just for fun */
+ id = 42;
- rc = scnprintf(&buf[written], len - written, "%lu ", id);
+ rc = scnprintf(&buf[written], len - written, "%d ", id);
if (!rc)
return -ENOMEM;
written += rc;
@@ -321,75 +323,114 @@ static ssize_t dbgfs_target_ids_read(str
}
/*
- * Converts a string into an array of unsigned long integers
+ * Converts a string into an array of integers
*
- * Returns an array of unsigned long integers if the conversion success, or
- * NULL otherwise.
+ * Returns an array of integers if the conversion succeeds, or NULL
+ * otherwise.
*/
-static unsigned long *str_to_target_ids(const char *str, ssize_t len,
- ssize_t *nr_ids)
+static int *str_to_ints(const char *str, ssize_t len, ssize_t *nr_ints)
{
- unsigned long *ids;
- const int max_nr_ids = 32;
- unsigned long id;
+ int *array;
+ const int max_nr_ints = 32;
+ int nr;
int pos = 0, parsed, ret;
- *nr_ids = 0;
- ids = kmalloc_array(max_nr_ids, sizeof(id), GFP_KERNEL);
- if (!ids)
+ *nr_ints = 0;
+ array = kmalloc_array(max_nr_ints, sizeof(*array), GFP_KERNEL);
+ if (!array)
return NULL;
- while (*nr_ids < max_nr_ids && pos < len) {
- ret = sscanf(&str[pos], "%lu%n", &id, &parsed);
+ while (*nr_ints < max_nr_ints && pos < len) {
+ ret = sscanf(&str[pos], "%d%n", &nr, &parsed);
pos += parsed;
if (ret != 1)
break;
- ids[*nr_ids] = id;
- *nr_ids += 1;
+ array[*nr_ints] = nr;
+ *nr_ints += 1;
}
- return ids;
+ return array;
}
-static void dbgfs_put_pids(unsigned long *ids, int nr_ids)
+static void dbgfs_put_pids(struct pid **pids, int nr_pids)
{
int i;
- for (i = 0; i < nr_ids; i++)
- put_pid((struct pid *)ids[i]);
+ for (i = 0; i < nr_pids; i++)
+ put_pid(pids[i]);
+}
+
+/*
+ * Converts a string into an array of struct pid pointers
+ *
+ * Returns an array of struct pid pointers if the conversion succeeds, or NULL
+ * otherwise.
+ */
+static struct pid **str_to_pids(const char *str, ssize_t len, ssize_t *nr_pids)
+{
+ int *ints;
+ ssize_t nr_ints;
+ struct pid **pids;
+
+ *nr_pids = 0;
+
+ ints = str_to_ints(str, len, &nr_ints);
+ if (!ints)
+ return NULL;
+
+ pids = kmalloc_array(nr_ints, sizeof(*pids), GFP_KERNEL);
+ if (!pids)
+ goto out;
+
+ for (; *nr_pids < nr_ints; (*nr_pids)++) {
+ pids[*nr_pids] = find_get_pid(ints[*nr_pids]);
+ if (!pids[*nr_pids]) {
+ dbgfs_put_pids(pids, *nr_pids);
+ kfree(ints);
+ kfree(pids);
+ return NULL;
+ }
+ }
+
+out:
+ kfree(ints);
+ return pids;
}
/*
* dbgfs_set_targets() - Set monitoring targets.
* @ctx: monitoring context
- * @ids: array of target ids
- * @nr_ids: number of entries in @ids
+ * @nr_targets: number of targets
+ * @pids: array of target pids (size is same to @nr_targets)
*
- * This function should not be called while the kdamond is running.
+ * This function should not be called while the kdamond is running. @pids is
+ * ignored if the context is not configured to have pid in each target. On
+ * failure, reference counts of all pids in @pids are decremented.
*
* Return: 0 on success, negative error code otherwise.
*/
-static int dbgfs_set_targets(struct damon_ctx *ctx,
- unsigned long *ids, ssize_t nr_ids)
+static int dbgfs_set_targets(struct damon_ctx *ctx, ssize_t nr_targets,
+ struct pid **pids)
{
ssize_t i;
struct damon_target *t, *next;
damon_for_each_target_safe(t, next, ctx) {
- if (targetid_is_pid(ctx))
- put_pid((struct pid *)t->id);
+ if (target_has_pid(ctx))
+ put_pid(t->pid);
damon_destroy_target(t);
}
- for (i = 0; i < nr_ids; i++) {
- t = damon_new_target(ids[i]);
+ for (i = 0; i < nr_targets; i++) {
+ t = damon_new_target();
if (!t) {
- /* The caller should do cleanup of the ids itself */
damon_for_each_target_safe(t, next, ctx)
damon_destroy_target(t);
- if (targetid_is_pid(ctx))
- dbgfs_put_pids(ids, nr_ids);
+ if (target_has_pid(ctx))
+ dbgfs_put_pids(pids, nr_targets);
return -ENOMEM;
}
+ if (target_has_pid(ctx))
+ t->pid = pids[i];
damon_add_target(ctx, t);
}
@@ -402,10 +443,9 @@ static ssize_t dbgfs_target_ids_write(st
struct damon_ctx *ctx = file->private_data;
bool id_is_pid = true;
char *kbuf;
- unsigned long *targets;
+ struct pid **target_pids = NULL;
ssize_t nr_targets;
ssize_t ret;
- int i;
kbuf = user_input_str(buf, count, ppos);
if (IS_ERR(kbuf))
@@ -413,38 +453,27 @@ static ssize_t dbgfs_target_ids_write(st
if (!strncmp(kbuf, "paddr\n", count)) {
id_is_pid = false;
- /* target id is meaningless here, but we set it just for fun */
- scnprintf(kbuf, count, "42 ");
- }
-
- targets = str_to_target_ids(kbuf, count, &nr_targets);
- if (!targets) {
- ret = -ENOMEM;
- goto out;
+ nr_targets = 1;
}
if (id_is_pid) {
- for (i = 0; i < nr_targets; i++) {
- targets[i] = (unsigned long)find_get_pid(
- (int)targets[i]);
- if (!targets[i]) {
- dbgfs_put_pids(targets, i);
- ret = -EINVAL;
- goto free_targets_out;
- }
+ target_pids = str_to_pids(kbuf, count, &nr_targets);
+ if (!target_pids) {
+ ret = -ENOMEM;
+ goto out;
}
}
mutex_lock(&ctx->kdamond_lock);
if (ctx->kdamond) {
if (id_is_pid)
- dbgfs_put_pids(targets, nr_targets);
+ dbgfs_put_pids(target_pids, nr_targets);
ret = -EBUSY;
goto unlock_out;
}
/* remove previously set targets */
- dbgfs_set_targets(ctx, NULL, 0);
+ dbgfs_set_targets(ctx, 0, NULL);
/* Configure the context for the address space type */
if (id_is_pid)
@@ -452,14 +481,13 @@ static ssize_t dbgfs_target_ids_write(st
else
damon_pa_set_primitives(ctx);
- ret = dbgfs_set_targets(ctx, targets, nr_targets);
+ ret = dbgfs_set_targets(ctx, nr_targets, target_pids);
if (!ret)
ret = count;
unlock_out:
mutex_unlock(&ctx->kdamond_lock);
-free_targets_out:
- kfree(targets);
+ kfree(target_pids);
out:
kfree(kbuf);
return ret;
@@ -688,12 +716,12 @@ static void dbgfs_before_terminate(struc
{
struct damon_target *t, *next;
- if (!targetid_is_pid(ctx))
+ if (!target_has_pid(ctx))
return;
mutex_lock(&ctx->kdamond_lock);
damon_for_each_target_safe(t, next, ctx) {
- put_pid((struct pid *)t->id);
+ put_pid(t->pid);
damon_destroy_target(t);
}
mutex_unlock(&ctx->kdamond_lock);
--- a/mm/damon/dbgfs-test.h~mm-damon-remove-the-target-id-concept
+++ a/mm/damon/dbgfs-test.h
@@ -12,66 +12,58 @@
#include <kunit/test.h>
-static void damon_dbgfs_test_str_to_target_ids(struct kunit *test)
+static void damon_dbgfs_test_str_to_ints(struct kunit *test)
{
char *question;
- unsigned long *answers;
- unsigned long expected[] = {12, 35, 46};
+ int *answers;
+ int expected[] = {12, 35, 46};
ssize_t nr_integers = 0, i;
question = "123";
- answers = str_to_target_ids(question, strlen(question),
- &nr_integers);
+ answers = str_to_ints(question, strlen(question), &nr_integers);
KUNIT_EXPECT_EQ(test, (ssize_t)1, nr_integers);
- KUNIT_EXPECT_EQ(test, 123ul, answers[0]);
+ KUNIT_EXPECT_EQ(test, 123, answers[0]);
kfree(answers);
question = "123abc";
- answers = str_to_target_ids(question, strlen(question),
- &nr_integers);
+ answers = str_to_ints(question, strlen(question), &nr_integers);
KUNIT_EXPECT_EQ(test, (ssize_t)1, nr_integers);
- KUNIT_EXPECT_EQ(test, 123ul, answers[0]);
+ KUNIT_EXPECT_EQ(test, 123, answers[0]);
kfree(answers);
question = "a123";
- answers = str_to_target_ids(question, strlen(question),
- &nr_integers);
+ answers = str_to_ints(question, strlen(question), &nr_integers);
KUNIT_EXPECT_EQ(test, (ssize_t)0, nr_integers);
kfree(answers);
question = "12 35";
- answers = str_to_target_ids(question, strlen(question),
- &nr_integers);
+ answers = str_to_ints(question, strlen(question), &nr_integers);
KUNIT_EXPECT_EQ(test, (ssize_t)2, nr_integers);
for (i = 0; i < nr_integers; i++)
KUNIT_EXPECT_EQ(test, expected[i], answers[i]);
kfree(answers);
question = "12 35 46";
- answers = str_to_target_ids(question, strlen(question),
- &nr_integers);
+ answers = str_to_ints(question, strlen(question), &nr_integers);
KUNIT_EXPECT_EQ(test, (ssize_t)3, nr_integers);
for (i = 0; i < nr_integers; i++)
KUNIT_EXPECT_EQ(test, expected[i], answers[i]);
kfree(answers);
question = "12 35 abc 46";
- answers = str_to_target_ids(question, strlen(question),
- &nr_integers);
+ answers = str_to_ints(question, strlen(question), &nr_integers);
KUNIT_EXPECT_EQ(test, (ssize_t)2, nr_integers);
for (i = 0; i < 2; i++)
KUNIT_EXPECT_EQ(test, expected[i], answers[i]);
kfree(answers);
question = "";
- answers = str_to_target_ids(question, strlen(question),
- &nr_integers);
+ answers = str_to_ints(question, strlen(question), &nr_integers);
KUNIT_EXPECT_EQ(test, (ssize_t)0, nr_integers);
kfree(answers);
question = "\n";
- answers = str_to_target_ids(question, strlen(question),
- &nr_integers);
+ answers = str_to_ints(question, strlen(question), &nr_integers);
KUNIT_EXPECT_EQ(test, (ssize_t)0, nr_integers);
kfree(answers);
}
@@ -79,30 +71,20 @@ static void damon_dbgfs_test_str_to_targ
static void damon_dbgfs_test_set_targets(struct kunit *test)
{
struct damon_ctx *ctx = dbgfs_new_ctx();
- unsigned long ids[] = {1, 2, 3};
char buf[64];
- /* Make DAMON consider target id as plain number */
- ctx->primitive.target_valid = NULL;
- ctx->primitive.cleanup = NULL;
+ /* Make DAMON consider target has no pid */
+ ctx->primitive = (struct damon_primitive){};
- dbgfs_set_targets(ctx, ids, 3);
- sprint_target_ids(ctx, buf, 64);
- KUNIT_EXPECT_STREQ(test, (char *)buf, "1 2 3\n");
-
- dbgfs_set_targets(ctx, NULL, 0);
+ dbgfs_set_targets(ctx, 0, NULL);
sprint_target_ids(ctx, buf, 64);
KUNIT_EXPECT_STREQ(test, (char *)buf, "\n");
- dbgfs_set_targets(ctx, (unsigned long []){1, 2}, 2);
- sprint_target_ids(ctx, buf, 64);
- KUNIT_EXPECT_STREQ(test, (char *)buf, "1 2\n");
-
- dbgfs_set_targets(ctx, (unsigned long []){2}, 1);
+ dbgfs_set_targets(ctx, 1, NULL);
sprint_target_ids(ctx, buf, 64);
- KUNIT_EXPECT_STREQ(test, (char *)buf, "2\n");
+ KUNIT_EXPECT_STREQ(test, (char *)buf, "42\n");
- dbgfs_set_targets(ctx, NULL, 0);
+ dbgfs_set_targets(ctx, 0, NULL);
sprint_target_ids(ctx, buf, 64);
KUNIT_EXPECT_STREQ(test, (char *)buf, "\n");
@@ -112,7 +94,6 @@ static void damon_dbgfs_test_set_targets
static void damon_dbgfs_test_set_init_regions(struct kunit *test)
{
struct damon_ctx *ctx = damon_new_ctx();
- unsigned long ids[] = {1, 2, 3};
/* Each line represents one region in ``<target idx> <start> <end>`` */
char * const valid_inputs[] = {"1 10 20\n 1 20 30\n1 35 45",
"1 10 20\n",
@@ -130,7 +111,7 @@ static void damon_dbgfs_test_set_init_re
int i, rc;
char buf[256];
- dbgfs_set_targets(ctx, ids, 3);
+ dbgfs_set_targets(ctx, 3, NULL);
/* Put valid inputs and check the results */
for (i = 0; i < ARRAY_SIZE(valid_inputs); i++) {
@@ -158,12 +139,12 @@ static void damon_dbgfs_test_set_init_re
KUNIT_EXPECT_STREQ(test, (char *)buf, "");
}
- dbgfs_set_targets(ctx, NULL, 0);
+ dbgfs_set_targets(ctx, 0, NULL);
damon_destroy_ctx(ctx);
}
static struct kunit_case damon_test_cases[] = {
- KUNIT_CASE(damon_dbgfs_test_str_to_target_ids),
+ KUNIT_CASE(damon_dbgfs_test_str_to_ints),
KUNIT_CASE(damon_dbgfs_test_set_targets),
KUNIT_CASE(damon_dbgfs_test_set_init_regions),
{},
--- a/mm/damon/reclaim.c~mm-damon-remove-the-target-id-concept
+++ a/mm/damon/reclaim.c
@@ -387,8 +387,7 @@ static int __init damon_reclaim_init(voi
damon_pa_set_primitives(ctx);
ctx->callback.after_aggregation = damon_reclaim_after_aggregation;
- /* 4242 means nothing but fun */
- target = damon_new_target(4242);
+ target = damon_new_target();
if (!target) {
damon_destroy_ctx(ctx);
return -ENOMEM;
--- a/mm/damon/vaddr.c~mm-damon-remove-the-target-id-concept
+++ a/mm/damon/vaddr.c
@@ -23,12 +23,12 @@
#endif
/*
- * 't->id' should be the pointer to the relevant 'struct pid' having reference
+ * 't->pid' should be the pointer to the relevant 'struct pid' having reference
* count. Caller must put the returned task, unless it is NULL.
*/
static inline struct task_struct *damon_get_task_struct(struct damon_target *t)
{
- return get_pid_task((struct pid *)t->id, PIDTYPE_PID);
+ return get_pid_task(t->pid, PIDTYPE_PID);
}
/*
--- a/mm/damon/vaddr-test.h~mm-damon-remove-the-target-id-concept
+++ a/mm/damon/vaddr-test.h
@@ -139,7 +139,7 @@ static void damon_do_test_apply_three_re
struct damon_region *r;
int i;
- t = damon_new_target(42);
+ t = damon_new_target();
for (i = 0; i < nr_regions / 2; i++) {
r = damon_new_region(regions[i * 2], regions[i * 2 + 1]);
damon_add_region(r, t);
@@ -251,7 +251,7 @@ static void damon_test_apply_three_regio
static void damon_test_split_evenly_fail(struct kunit *test,
unsigned long start, unsigned long end, unsigned int nr_pieces)
{
- struct damon_target *t = damon_new_target(42);
+ struct damon_target *t = damon_new_target();
struct damon_region *r = damon_new_region(start, end);
damon_add_region(r, t);
@@ -270,7 +270,7 @@ static void damon_test_split_evenly_fail
static void damon_test_split_evenly_succ(struct kunit *test,
unsigned long start, unsigned long end, unsigned int nr_pieces)
{
- struct damon_target *t = damon_new_target(42);
+ struct damon_target *t = damon_new_target();
struct damon_region *r = damon_new_region(start, end);
unsigned long expected_width = (end - start) / nr_pieces;
unsigned long i = 0;
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 201/227] mm/damon: remove redundant page validation
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: sj, rientjes, linmiaohe, jrdr.linux, dan.carpenter, baolin.wang,
akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Baolin Wang <baolin.wang@linux.alibaba.com>
Subject: mm/damon: remove redundant page validation
pte_page() never returns a NULL page, as discussed in thread [1], so
remove the redundant page validation to fix the Smatch static checker
warning below.
mm/damon/vaddr.c:405 damon_hugetlb_mkold()
warn: 'page' can't be NULL.
[1] https://lore.kernel.org/linux-mm/20220106091200.GA14564@kili/
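The reasoning above can be illustrated with a minimal userspace sketch.
It assumes the common definition of pte_page() as
pfn_to_page(pte_pfn(pte)) over a page array; every name below
(model_page, model_pte_t, mem_map_model, model_pte_pfn, model_pte_page,
MODEL_PAGE_SHIFT, MODEL_NR_PAGES) is a hypothetical stand-in and not
kernel code. The result is computed as base plus index, so it is never
NULL, and a NULL check on it cannot detect a bad entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct page and pte_t. */
struct model_page { unsigned long flags; };
typedef struct { uint64_t val; } model_pte_t;

#define MODEL_PAGE_SHIFT 12	/* assumed 4 KiB pages */
#define MODEL_NR_PAGES   16	/* size of the toy page array */

/* Toy equivalent of a flat page array indexed by page frame number. */
static struct model_page mem_map_model[MODEL_NR_PAGES];

/* Extract the page frame number from the entry (no flag bits modeled). */
static uint64_t model_pte_pfn(model_pte_t pte)
{
	return pte.val >> MODEL_PAGE_SHIFT;
}

/*
 * Model of pte_page(): plain pointer arithmetic from the array base.
 * The sketch only handles frame numbers inside the toy array.
 */
static struct model_page *model_pte_page(model_pte_t pte)
{
	uint64_t pfn = model_pte_pfn(pte);

	assert(pfn < MODEL_NR_PAGES);
	return mem_map_model + pfn;
}
```

Even an all-zero entry maps to the first element of the array rather
than to NULL, which is why the removed checks could never trigger.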
Link: https://lkml.kernel.org/r/6d32f7d201b8970d53f51b6c5717d472aed2987c.1642386715.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/vaddr.c | 6 ------
1 file changed, 6 deletions(-)
--- a/mm/damon/vaddr.c~mm-damon-remove-redundant-page-validation
+++ a/mm/damon/vaddr.c
@@ -402,9 +402,6 @@ static void damon_hugetlb_mkold(pte_t *p
pte_t entry = huge_ptep_get(pte);
struct page *page = pte_page(entry);
- if (!page)
- return;
-
get_page(page);
if (pte_young(entry)) {
@@ -564,9 +561,6 @@ static int damon_young_hugetlb_entry(pte
goto out;
page = pte_page(entry);
- if (!page)
- goto out;
-
get_page(page);
if (pte_young(entry) || !page_is_idle(page) ||
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 202/227] mm/damon: rename damon_primitives to damon_operations
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: xhao, rientjes, sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon: rename damon_primitives to damon_operations
Patch series "Allow DAMON user code independent of monitoring primitives".
In-kernel DAMON user code is required to configure the monitoring context
(struct damon_ctx) with proper monitoring primitives (struct
damon_primitive). This makes the user code dependent on all supported
monitoring primitives. For example, the DAMON debugfs interface depends on
both DAMON_VADDR and DAMON_PADDR, though some users are interested in only
one use case. As more monitoring primitives are introduced, the problem
will grow.
To minimize such unnecessary dependencies, this patchset allows monitoring
primitives to be registered by the implementing code and later dynamically
searched for and selected by the user code.
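The registration and selection flow described above can be sketched in
plain userspace C. This is only an illustration under stated
assumptions: the names sketch_ops_id, sketch_operations, sketch_ctx,
sketch_register_ops and sketch_select_ops are hypothetical and are not
the interface this series adds. Implementing code registers its
callback table under an id, and user code later asks for a table by id
without linking against a specific implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ids, one per available operations set. */
enum sketch_ops_id { SKETCH_OPS_VADDR, SKETCH_OPS_PADDR, NR_SKETCH_OPS };

/* Hypothetical callback table standing in for the real operations struct. */
struct sketch_operations {
	enum sketch_ops_id id;
	void (*init)(void *ctx);
	unsigned int (*check_accesses)(void *ctx);
};

/* Hypothetical monitoring context holding the selected operations. */
struct sketch_ctx {
	struct sketch_operations ops;
};

static struct sketch_operations sketch_registered[NR_SKETCH_OPS];
static int sketch_is_registered[NR_SKETCH_OPS];

/*
 * Called by the implementing code. Returns 0 on success, or -1 if the
 * id is out of range or an operations set is already registered for it.
 */
static int sketch_register_ops(const struct sketch_operations *ops)
{
	if (!ops || (unsigned int)ops->id >= NR_SKETCH_OPS)
		return -1;
	if (sketch_is_registered[ops->id])
		return -1;
	sketch_registered[ops->id] = *ops;
	sketch_is_registered[ops->id] = 1;
	return 0;
}

/*
 * Called by the user code. Copies the registered operations set into
 * the context, or returns -1 if nothing is registered for the id.
 */
static int sketch_select_ops(struct sketch_ctx *ctx, enum sketch_ops_id id)
{
	assert(ctx != NULL);
	if ((unsigned int)id >= NR_SKETCH_OPS || !sketch_is_registered[id])
		return -1;
	ctx->ops = sketch_registered[id];
	return 0;
}
```

With a table like this, user code that only cares about one address
space fails cleanly at selection time when that implementation is not
built in, instead of depending on every implementation at build time.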
In addition to that, this patchset renames monitoring primitives to
monitoring operations, which is more easy to intuitively understand what
it means and how it would be structed.
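The register-then-select idea described in this cover letter can be
sketched in plain userspace C. This is only an illustration of the pattern:
the later patches of the series are not shown in this message, so every
name below (demo_ops_id, demo_register_ops, demo_select_ops and so on) is
hypothetical and is not claimed to match the kernel code.

```c
#include <stddef.h>

/* Hypothetical identifiers for the available implementations. */
enum demo_ops_id { DEMO_OPS_VADDR, DEMO_OPS_PADDR, NR_DEMO_OPS };

/* Hypothetical, heavily trimmed operations table. */
struct demo_ops {
	enum demo_ops_id id;
	int (*check_accesses)(void);
};

struct demo_ctx {
	struct demo_ops ops;
};

/* Registry filled in by the implementing code, one slot per id. */
static struct demo_ops demo_registered[NR_DEMO_OPS];
static int demo_is_registered[NR_DEMO_OPS];

/* Called by the implementing code; rejects bad ids and double registration. */
static int demo_register_ops(const struct demo_ops *ops)
{
	if ((unsigned int)ops->id >= NR_DEMO_OPS)
		return -1;
	if (demo_is_registered[ops->id])
		return -2;
	demo_registered[ops->id] = *ops;
	demo_is_registered[ops->id] = 1;
	return 0;
}

/* Called by the user code; fails if nothing was registered for the id. */
static int demo_select_ops(struct demo_ctx *ctx, enum demo_ops_id id)
{
	if ((unsigned int)id >= NR_DEMO_OPS || !demo_is_registered[id])
		return -1;
	ctx->ops = demo_registered[id];
	return 0;
}

/* Stand-in for one implementation's callback. */
static int demo_va_check_accesses(void)
{
	return 42;
}
```

With a registry of this shape, the user code refers only to an id and never
to a symbol exported by a particular implementation, so it can still be
built when that implementation is configured out.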
This patch (of 8):
DAMON has a set of callback functions called monitoring primitives, which
lets it be configured with various implementations for easy extension to
different address spaces and usages. However, the word 'primitive' is not
very explicit. Meanwhile, many other structs that serve a similar purpose
call themselves 'operations'. To make the code easier to understand,
this commit renames 'damon_primitives' to 'damon_operations' before it is
too late to rename.
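For readers unfamiliar with the pattern, the following userspace sketch
shows what an 'operations' struct of optional callbacks looks like and how
a caller checks each slot before invoking it, in the style of the
ctx->ops.init checks in the diff below. The types and demo_* names are
simplified stand-ins invented for this example, not the kernel definitions.

```c
#include <stddef.h>

struct demo_ctx;

/* Hypothetical, trimmed-down analogue of struct damon_operations. */
struct demo_operations {
	void (*init)(struct demo_ctx *c);
	unsigned int (*check_accesses)(struct demo_ctx *c);
	void (*cleanup)(struct demo_ctx *c);
};

struct demo_ctx {
	struct demo_operations ops;
	int initialized;
};

static void demo_init(struct demo_ctx *c)
{
	c->initialized = 1;
}

static unsigned int demo_check_accesses(struct demo_ctx *c)
{
	return c->initialized ? 3 : 0;
}

/* Fill in the table; callbacks that are not needed stay NULL. */
static void demo_set_operations(struct demo_ctx *c)
{
	c->ops.init = demo_init;
	c->ops.check_accesses = demo_check_accesses;
	c->ops.cleanup = NULL;
}

/* One pass of a worker: every callback is optional, so test before calling. */
static unsigned int demo_run_once(struct demo_ctx *c)
{
	unsigned int max_nr_accesses = 0;

	if (c->ops.init)
		c->ops.init(c);
	if (c->ops.check_accesses)
		max_nr_accesses = c->ops.check_accesses(c);
	if (c->ops.cleanup)
		c->ops.cleanup(c);
	return max_nr_accesses;
}
```

Because the caller only ever touches the table through its field names, a
rename such as the one in this patch is purely mechanical and leaves
behavior unchanged.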
Link: https://lkml.kernel.org/r/20220215184603.1479-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20220215184603.1479-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/damon.h | 48 ++++++-------
mm/damon/Kconfig | 12 +--
mm/damon/Makefile | 4 -
mm/damon/core.c | 65 +++++++++---------
mm/damon/dbgfs-test.h | 2
mm/damon/dbgfs.c | 10 +-
mm/damon/{prmtv-common.c => ops-common.c} | 2 +-
mm/damon/{prmtv-common.h => ops-common.h} | 0
mm/damon/paddr.c | 22 +++---
mm/damon/reclaim.c | 2
mm/damon/vaddr-test.h | 2
mm/damon/vaddr.c | 22 +++---
14 files changed, 244 insertions(+), 243 deletions(-)
--- a/include/linux/damon.h~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/include/linux/damon.h
@@ -67,8 +67,8 @@ struct damon_region {
*
* Each monitoring context could have multiple targets. For example, a context
* for virtual memory address spaces could have multiple target processes. The
- * @pid should be set for appropriate address space monitoring primitives
- * including the virtual address spaces monitoring primitives.
+ * @pid should be set for appropriate &struct damon_operations including the
+ * virtual address spaces monitoring operations.
*/
struct damon_target {
struct pid *pid;
@@ -120,9 +120,9 @@ enum damos_action {
* uses smaller one as the effective quota.
*
* For selecting regions within the quota, DAMON prioritizes current scheme's
- * target memory regions using the &struct damon_primitive->get_scheme_score.
+ * target memory regions using the &struct damon_operations->get_scheme_score.
* You could customize the prioritization logic by setting &weight_sz,
- * &weight_nr_accesses, and &weight_age, because monitoring primitives are
+ * &weight_nr_accesses, and &weight_age, because monitoring operations are
* encouraged to respect those.
*/
struct damos_quota {
@@ -256,10 +256,10 @@ struct damos {
struct damon_ctx;
/**
- * struct damon_primitive - Monitoring primitives for given use cases.
+ * struct damon_operations - Monitoring operations for given use cases.
*
- * @init: Initialize primitive-internal data structures.
- * @update: Update primitive-internal data structures.
+ * @init: Initialize operations-related data structures.
+ * @update: Update operations-related data structures.
* @prepare_access_checks: Prepare next access check of target regions.
* @check_accesses: Check the accesses to target regions.
* @reset_aggregated: Reset aggregated accesses monitoring results.
@@ -269,18 +269,18 @@ struct damon_ctx;
* @cleanup: Clean up the context.
*
* DAMON can be extended for various address spaces and usages. For this,
- * users should register the low level primitives for their target address
- * space and usecase via the &damon_ctx.primitive. Then, the monitoring thread
+ * users should register the low level operations for their target address
+ * space and usecase via the &damon_ctx.ops. Then, the monitoring thread
* (&damon_ctx.kdamond) calls @init and @prepare_access_checks before starting
- * the monitoring, @update after each &damon_ctx.primitive_update_interval, and
+ * the monitoring, @update after each &damon_ctx.ops_update_interval, and
* @check_accesses, @target_valid and @prepare_access_checks after each
* &damon_ctx.sample_interval. Finally, @reset_aggregated is called after each
* &damon_ctx.aggr_interval.
*
- * @init should initialize primitive-internal data structures. For example,
+ * @init should initialize operations-related data structures. For example,
* this could be used to construct proper monitoring target regions and link
* those to @damon_ctx.adaptive_targets.
- * @update should update the primitive-internal data structures. For example,
+ * @update should update the operations-related data structures. For example,
* this could be used to update monitoring target regions for current status.
* @prepare_access_checks should manipulate the monitoring regions to be
* prepared for the next access check.
@@ -300,7 +300,7 @@ struct damon_ctx;
* monitoring.
* @cleanup is called from @kdamond just before its termination.
*/
-struct damon_primitive {
+struct damon_operations {
void (*init)(struct damon_ctx *context);
void (*update)(struct damon_ctx *context);
void (*prepare_access_checks)(struct damon_ctx *context);
@@ -354,15 +354,15 @@ struct damon_callback {
*
* @sample_interval: The time between access samplings.
* @aggr_interval: The time between monitor results aggregations.
- * @primitive_update_interval: The time between monitoring primitive updates.
+ * @ops_update_interval: The time between monitoring operations updates.
*
* For each @sample_interval, DAMON checks whether each region is accessed or
* not. It aggregates and keeps the access information (number of accesses to
* each region) for @aggr_interval time. DAMON also checks whether the target
* memory regions need update (e.g., by ``mmap()`` calls from the application,
* in case of virtual memory monitoring) and applies the changes for each
- * @primitive_update_interval. All time intervals are in micro-seconds.
- * Please refer to &struct damon_primitive and &struct damon_callback for more
+ * @ops_update_interval. All time intervals are in micro-seconds.
+ * Please refer to &struct damon_operations and &struct damon_callback for more
* detail.
*
* @kdamond: Kernel thread who does the monitoring.
@@ -374,7 +374,7 @@ struct damon_callback {
*
* Once started, the monitoring thread runs until explicitly required to be
* terminated or every monitoring target is invalid. The validity of the
- * targets is checked via the &damon_primitive.target_valid of @primitive. The
+ * targets is checked via the &damon_operations.target_valid of @ops. The
* termination can also be explicitly requested by writing non-zero to
* @kdamond_stop. The thread sets @kdamond to NULL when it terminates.
* Therefore, users can know whether the monitoring is ongoing or terminated by
@@ -384,7 +384,7 @@ struct damon_callback {
* Note that the monitoring thread protects only @kdamond and @kdamond_stop via
* @kdamond_lock. Accesses to other fields must be protected by themselves.
*
- * @primitive: Set of monitoring primitives for given use cases.
+ * @ops: Set of monitoring operations for given use cases.
* @callback: Set of callbacks for monitoring events notifications.
*
* @min_nr_regions: The minimum number of adaptive monitoring regions.
@@ -395,17 +395,17 @@ struct damon_callback {
struct damon_ctx {
unsigned long sample_interval;
unsigned long aggr_interval;
- unsigned long primitive_update_interval;
+ unsigned long ops_update_interval;
/* private: internal use only */
struct timespec64 last_aggregation;
- struct timespec64 last_primitive_update;
+ struct timespec64 last_ops_update;
/* public: */
struct task_struct *kdamond;
struct mutex kdamond_lock;
- struct damon_primitive primitive;
+ struct damon_operations ops;
struct damon_callback callback;
unsigned long min_nr_regions;
@@ -484,7 +484,7 @@ unsigned int damon_nr_regions(struct dam
struct damon_ctx *damon_new_ctx(void);
void damon_destroy_ctx(struct damon_ctx *ctx);
int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
- unsigned long aggr_int, unsigned long primitive_upd_int,
+ unsigned long aggr_int, unsigned long ops_upd_int,
unsigned long min_nr_reg, unsigned long max_nr_reg);
int damon_set_schemes(struct damon_ctx *ctx,
struct damos **schemes, ssize_t nr_schemes);
@@ -497,12 +497,12 @@ int damon_stop(struct damon_ctx **ctxs,
#ifdef CONFIG_DAMON_VADDR
bool damon_va_target_valid(void *t);
-void damon_va_set_primitives(struct damon_ctx *ctx);
+void damon_va_set_operations(struct damon_ctx *ctx);
#endif /* CONFIG_DAMON_VADDR */
#ifdef CONFIG_DAMON_PADDR
bool damon_pa_target_valid(void *t);
-void damon_pa_set_primitives(struct damon_ctx *ctx);
+void damon_pa_set_operations(struct damon_ctx *ctx);
#endif /* CONFIG_DAMON_PADDR */
#endif /* _DAMON_H */
--- a/mm/damon/core.c~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/core.c
@@ -204,10 +204,10 @@ struct damon_ctx *damon_new_ctx(void)
ctx->sample_interval = 5 * 1000;
ctx->aggr_interval = 100 * 1000;
- ctx->primitive_update_interval = 60 * 1000 * 1000;
+ ctx->ops_update_interval = 60 * 1000 * 1000;
ktime_get_coarse_ts64(&ctx->last_aggregation);
- ctx->last_primitive_update = ctx->last_aggregation;
+ ctx->last_ops_update = ctx->last_aggregation;
mutex_init(&ctx->kdamond_lock);
@@ -224,8 +224,8 @@ static void damon_destroy_targets(struct
{
struct damon_target *t, *next_t;
- if (ctx->primitive.cleanup) {
- ctx->primitive.cleanup(ctx);
+ if (ctx->ops.cleanup) {
+ ctx->ops.cleanup(ctx);
return;
}
@@ -250,7 +250,7 @@ void damon_destroy_ctx(struct damon_ctx
* @ctx: monitoring context
* @sample_int: time interval between samplings
* @aggr_int: time interval between aggregations
- * @primitive_upd_int: time interval between monitoring primitive updates
+ * @ops_upd_int: time interval between monitoring operations updates
* @min_nr_reg: minimal number of regions
* @max_nr_reg: maximum number of regions
*
@@ -260,7 +260,7 @@ void damon_destroy_ctx(struct damon_ctx
* Return: 0 on success, negative error code otherwise.
*/
int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
- unsigned long aggr_int, unsigned long primitive_upd_int,
+ unsigned long aggr_int, unsigned long ops_upd_int,
unsigned long min_nr_reg, unsigned long max_nr_reg)
{
if (min_nr_reg < 3)
@@ -270,7 +270,7 @@ int damon_set_attrs(struct damon_ctx *ct
ctx->sample_interval = sample_int;
ctx->aggr_interval = aggr_int;
- ctx->primitive_update_interval = primitive_upd_int;
+ ctx->ops_update_interval = ops_upd_int;
ctx->min_nr_regions = min_nr_reg;
ctx->max_nr_regions = max_nr_reg;
@@ -516,10 +516,10 @@ static bool damos_valid_target(struct da
{
bool ret = __damos_valid_target(r, s);
- if (!ret || !s->quota.esz || !c->primitive.get_scheme_score)
+ if (!ret || !s->quota.esz || !c->ops.get_scheme_score)
return ret;
- return c->primitive.get_scheme_score(c, t, r, s) >= s->quota.min_score;
+ return c->ops.get_scheme_score(c, t, r, s) >= s->quota.min_score;
}
static void damon_do_apply_schemes(struct damon_ctx *c,
@@ -576,7 +576,7 @@ static void damon_do_apply_schemes(struc
continue;
/* Apply the scheme */
- if (c->primitive.apply_scheme) {
+ if (c->ops.apply_scheme) {
if (quota->esz &&
quota->charged_sz + sz > quota->esz) {
sz = ALIGN_DOWN(quota->esz - quota->charged_sz,
@@ -586,7 +586,7 @@ static void damon_do_apply_schemes(struc
damon_split_region_at(c, t, r, sz);
}
ktime_get_coarse_ts64(&begin);
- sz_applied = c->primitive.apply_scheme(c, t, r, s);
+ sz_applied = c->ops.apply_scheme(c, t, r, s);
ktime_get_coarse_ts64(&end);
quota->total_charged_ns += timespec64_to_ns(&end) -
timespec64_to_ns(&begin);
@@ -660,7 +660,7 @@ static void kdamond_apply_schemes(struct
damos_set_effective_quota(quota);
}
- if (!c->primitive.get_scheme_score)
+ if (!c->ops.get_scheme_score)
continue;
/* Fill up the score histogram */
@@ -669,7 +669,7 @@ static void kdamond_apply_schemes(struct
damon_for_each_region(r, t) {
if (!__damos_valid_target(r, s))
continue;
- score = c->primitive.get_scheme_score(
+ score = c->ops.get_scheme_score(
c, t, r, s);
quota->histogram[score] +=
r->ar.end - r->ar.start;
@@ -848,14 +848,15 @@ static void kdamond_split_regions(struct
}
/*
- * Check whether it is time to check and apply the target monitoring regions
+ * Check whether it is time to check and apply the operations-related data
+ * structures.
*
* Returns true if it is.
*/
-static bool kdamond_need_update_primitive(struct damon_ctx *ctx)
+static bool kdamond_need_update_operations(struct damon_ctx *ctx)
{
- return damon_check_reset_time_interval(&ctx->last_primitive_update,
- ctx->primitive_update_interval);
+ return damon_check_reset_time_interval(&ctx->last_ops_update,
+ ctx->ops_update_interval);
}
/*
@@ -873,11 +874,11 @@ static bool kdamond_need_stop(struct dam
if (kthread_should_stop())
return true;
- if (!ctx->primitive.target_valid)
+ if (!ctx->ops.target_valid)
return false;
damon_for_each_target(t, ctx) {
- if (ctx->primitive.target_valid(t))
+ if (ctx->ops.target_valid(t))
return false;
}
@@ -976,8 +977,8 @@ static int kdamond_fn(void *data)
pr_debug("kdamond (%d) starts\n", current->pid);
- if (ctx->primitive.init)
- ctx->primitive.init(ctx);
+ if (ctx->ops.init)
+ ctx->ops.init(ctx);
if (ctx->callback.before_start && ctx->callback.before_start(ctx))
done = true;
@@ -987,16 +988,16 @@ static int kdamond_fn(void *data)
if (kdamond_wait_activation(ctx))
continue;
- if (ctx->primitive.prepare_access_checks)
- ctx->primitive.prepare_access_checks(ctx);
+ if (ctx->ops.prepare_access_checks)
+ ctx->ops.prepare_access_checks(ctx);
if (ctx->callback.after_sampling &&
ctx->callback.after_sampling(ctx))
done = true;
kdamond_usleep(ctx->sample_interval);
- if (ctx->primitive.check_accesses)
- max_nr_accesses = ctx->primitive.check_accesses(ctx);
+ if (ctx->ops.check_accesses)
+ max_nr_accesses = ctx->ops.check_accesses(ctx);
if (kdamond_aggregate_interval_passed(ctx)) {
kdamond_merge_regions(ctx,
@@ -1008,13 +1009,13 @@ static int kdamond_fn(void *data)
kdamond_apply_schemes(ctx);
kdamond_reset_aggregated(ctx);
kdamond_split_regions(ctx);
- if (ctx->primitive.reset_aggregated)
- ctx->primitive.reset_aggregated(ctx);
+ if (ctx->ops.reset_aggregated)
+ ctx->ops.reset_aggregated(ctx);
}
- if (kdamond_need_update_primitive(ctx)) {
- if (ctx->primitive.update)
- ctx->primitive.update(ctx);
+ if (kdamond_need_update_operations(ctx)) {
+ if (ctx->ops.update)
+ ctx->ops.update(ctx);
sz_limit = damon_region_sz_limit(ctx);
}
}
@@ -1025,8 +1026,8 @@ static int kdamond_fn(void *data)
if (ctx->callback.before_terminate)
ctx->callback.before_terminate(ctx);
- if (ctx->primitive.cleanup)
- ctx->primitive.cleanup(ctx);
+ if (ctx->ops.cleanup)
+ ctx->ops.cleanup(ctx);
pr_debug("kdamond (%d) finishes\n", current->pid);
mutex_lock(&ctx->kdamond_lock);
--- a/mm/damon/dbgfs.c~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/dbgfs.c
@@ -56,7 +56,7 @@ static ssize_t dbgfs_attrs_read(struct f
mutex_lock(&ctx->kdamond_lock);
ret = scnprintf(kbuf, ARRAY_SIZE(kbuf), "%lu %lu %lu %lu %lu\n",
ctx->sample_interval, ctx->aggr_interval,
- ctx->primitive_update_interval, ctx->min_nr_regions,
+ ctx->ops_update_interval, ctx->min_nr_regions,
ctx->max_nr_regions);
mutex_unlock(&ctx->kdamond_lock);
@@ -277,7 +277,7 @@ out:
static inline bool target_has_pid(const struct damon_ctx *ctx)
{
- return ctx->primitive.target_valid == damon_va_target_valid;
+ return ctx->ops.target_valid == damon_va_target_valid;
}
static ssize_t sprint_target_ids(struct damon_ctx *ctx, char *buf, ssize_t len)
@@ -477,9 +477,9 @@ static ssize_t dbgfs_target_ids_write(st
/* Configure the context for the address space type */
if (id_is_pid)
- damon_va_set_primitives(ctx);
+ damon_va_set_operations(ctx);
else
- damon_pa_set_primitives(ctx);
+ damon_pa_set_operations(ctx);
ret = dbgfs_set_targets(ctx, nr_targets, target_pids);
if (!ret)
@@ -735,7 +735,7 @@ static struct damon_ctx *dbgfs_new_ctx(v
if (!ctx)
return NULL;
- damon_va_set_primitives(ctx);
+ damon_va_set_operations(ctx);
ctx->callback.before_terminate = dbgfs_before_terminate;
return ctx;
}
--- a/mm/damon/dbgfs-test.h~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/dbgfs-test.h
@@ -74,7 +74,7 @@ static void damon_dbgfs_test_set_targets
char buf[64];
/* Make DAMON consider target has no pid */
- ctx->primitive = (struct damon_primitive){};
+ ctx->ops = (struct damon_operations){};
dbgfs_set_targets(ctx, 0, NULL);
sprint_target_ids(ctx, buf, 64);
--- a/mm/damon/Kconfig~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/Kconfig
@@ -25,27 +25,27 @@ config DAMON_KUNIT_TEST
If unsure, say N.
config DAMON_VADDR
- bool "Data access monitoring primitives for virtual address spaces"
+ bool "Data access monitoring operations for virtual address spaces"
depends on DAMON && MMU
select PAGE_IDLE_FLAG
help
- This builds the default data access monitoring primitives for DAMON
+ This builds the default data access monitoring operations for DAMON
that work for virtual address spaces.
config DAMON_PADDR
- bool "Data access monitoring primitives for the physical address space"
+ bool "Data access monitoring operations for the physical address space"
depends on DAMON && MMU
select PAGE_IDLE_FLAG
help
- This builds the default data access monitoring primitives for DAMON
+ This builds the default data access monitoring operations for DAMON
that works for the physical address space.
config DAMON_VADDR_KUNIT_TEST
- bool "Test for DAMON primitives" if !KUNIT_ALL_TESTS
+ bool "Test for DAMON operations" if !KUNIT_ALL_TESTS
depends on DAMON_VADDR && KUNIT=y
default KUNIT_ALL_TESTS
help
- This builds the DAMON virtual addresses primitives Kunit test suite.
+ This builds the DAMON virtual addresses operations Kunit test suite.
For more information on KUnit and unit tests in general, please refer
to the KUnit documentation.
--- a/mm/damon/Makefile~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_DAMON) := core.o
-obj-$(CONFIG_DAMON_VADDR) += prmtv-common.o vaddr.o
-obj-$(CONFIG_DAMON_PADDR) += prmtv-common.o paddr.o
+obj-$(CONFIG_DAMON_VADDR) += ops-common.o vaddr.o
+obj-$(CONFIG_DAMON_PADDR) += ops-common.o paddr.o
obj-$(CONFIG_DAMON_DBGFS) += dbgfs.o
obj-$(CONFIG_DAMON_RECLAIM) += reclaim.o
--- /dev/null
+++ a/mm/damon/ops-common.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Common Primitives for Data Access Monitoring
+ *
+ * Author: SeongJae Park <sj@kernel.org>
+ */
+
+#include <linux/mmu_notifier.h>
+#include <linux/page_idle.h>
+#include <linux/pagemap.h>
+#include <linux/rmap.h>
+
+#include "ops-common.h"
+
+/*
+ * Get an online page for a pfn if it's in the LRU list. Otherwise, returns
+ * NULL.
+ *
+ * The body of this function is stolen from the 'page_idle_get_page()'. We
+ * steal rather than reuse it because the code is quite simple.
+ */
+struct page *damon_get_page(unsigned long pfn)
+{
+ struct page *page = pfn_to_online_page(pfn);
+
+ if (!page || !PageLRU(page) || !get_page_unless_zero(page))
+ return NULL;
+
+ if (unlikely(!PageLRU(page))) {
+ put_page(page);
+ page = NULL;
+ }
+ return page;
+}
+
+void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm, unsigned long addr)
+{
+ bool referenced = false;
+ struct page *page = damon_get_page(pte_pfn(*pte));
+
+ if (!page)
+ return;
+
+ if (pte_young(*pte)) {
+ referenced = true;
+ *pte = pte_mkold(*pte);
+ }
+
+#ifdef CONFIG_MMU_NOTIFIER
+ if (mmu_notifier_clear_young(mm, addr, addr + PAGE_SIZE))
+ referenced = true;
+#endif /* CONFIG_MMU_NOTIFIER */
+
+ if (referenced)
+ set_page_young(page);
+
+ set_page_idle(page);
+ put_page(page);
+}
+
+void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm, unsigned long addr)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ bool referenced = false;
+ struct page *page = damon_get_page(pmd_pfn(*pmd));
+
+ if (!page)
+ return;
+
+ if (pmd_young(*pmd)) {
+ referenced = true;
+ *pmd = pmd_mkold(*pmd);
+ }
+
+#ifdef CONFIG_MMU_NOTIFIER
+ if (mmu_notifier_clear_young(mm, addr,
+ addr + ((1UL) << HPAGE_PMD_SHIFT)))
+ referenced = true;
+#endif /* CONFIG_MMU_NOTIFIER */
+
+ if (referenced)
+ set_page_young(page);
+
+ set_page_idle(page);
+ put_page(page);
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+}
+
+#define DAMON_MAX_SUBSCORE (100)
+#define DAMON_MAX_AGE_IN_LOG (32)
+
+int damon_pageout_score(struct damon_ctx *c, struct damon_region *r,
+ struct damos *s)
+{
+ unsigned int max_nr_accesses;
+ int freq_subscore;
+ unsigned int age_in_sec;
+ int age_in_log, age_subscore;
+ unsigned int freq_weight = s->quota.weight_nr_accesses;
+ unsigned int age_weight = s->quota.weight_age;
+ int hotness;
+
+ max_nr_accesses = c->aggr_interval / c->sample_interval;
+ freq_subscore = r->nr_accesses * DAMON_MAX_SUBSCORE / max_nr_accesses;
+
+ age_in_sec = (unsigned long)r->age * c->aggr_interval / 1000000;
+ for (age_in_log = 0; age_in_log < DAMON_MAX_AGE_IN_LOG && age_in_sec;
+ age_in_log++, age_in_sec >>= 1)
+ ;
+
+ /* If frequency is 0, higher age means it's colder */
+ if (freq_subscore == 0)
+ age_in_log *= -1;
+
+ /*
+ * Now age_in_log is in [-DAMON_MAX_AGE_IN_LOG, DAMON_MAX_AGE_IN_LOG].
+ * Scale it to be in [0, 100] and set it as age subscore.
+ */
+ age_in_log += DAMON_MAX_AGE_IN_LOG;
+ age_subscore = age_in_log * DAMON_MAX_SUBSCORE /
+ DAMON_MAX_AGE_IN_LOG / 2;
+
+ hotness = (freq_weight * freq_subscore + age_weight * age_subscore);
+ if (freq_weight + age_weight)
+ hotness /= freq_weight + age_weight;
+ /*
+ * Transform it to fit in [0, DAMOS_MAX_SCORE]
+ */
+ hotness = hotness * DAMOS_MAX_SCORE / DAMON_MAX_SUBSCORE;
+
+ /* Return coldness of the region */
+ return DAMOS_MAX_SCORE - hotness;
+}
--- /dev/null
+++ a/mm/damon/ops-common.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Common Primitives for Data Access Monitoring
+ *
+ * Author: SeongJae Park <sj@kernel.org>
+ */
+
+#include <linux/damon.h>
+
+struct page *damon_get_page(unsigned long pfn);
+
+void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm, unsigned long addr);
+void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm, unsigned long addr);
+
+int damon_pageout_score(struct damon_ctx *c, struct damon_region *r,
+ struct damos *s);
--- a/mm/damon/paddr.c~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/paddr.c
@@ -14,7 +14,7 @@
#include <linux/swap.h>
#include "../internal.h"
-#include "prmtv-common.h"
+#include "ops-common.h"
static bool __damon_pa_mkold(struct page *page, struct vm_area_struct *vma,
unsigned long addr, void *arg)
@@ -261,15 +261,15 @@ static int damon_pa_scheme_score(struct
return DAMOS_MAX_SCORE;
}
-void damon_pa_set_primitives(struct damon_ctx *ctx)
+void damon_pa_set_operations(struct damon_ctx *ctx)
{
- ctx->primitive.init = NULL;
- ctx->primitive.update = NULL;
- ctx->primitive.prepare_access_checks = damon_pa_prepare_access_checks;
- ctx->primitive.check_accesses = damon_pa_check_accesses;
- ctx->primitive.reset_aggregated = NULL;
- ctx->primitive.target_valid = damon_pa_target_valid;
- ctx->primitive.cleanup = NULL;
- ctx->primitive.apply_scheme = damon_pa_apply_scheme;
- ctx->primitive.get_scheme_score = damon_pa_scheme_score;
+ ctx->ops.init = NULL;
+ ctx->ops.update = NULL;
+ ctx->ops.prepare_access_checks = damon_pa_prepare_access_checks;
+ ctx->ops.check_accesses = damon_pa_check_accesses;
+ ctx->ops.reset_aggregated = NULL;
+ ctx->ops.target_valid = damon_pa_target_valid;
+ ctx->ops.cleanup = NULL;
+ ctx->ops.apply_scheme = damon_pa_apply_scheme;
+ ctx->ops.get_scheme_score = damon_pa_scheme_score;
}
--- a/mm/damon/prmtv-common.c
+++ /dev/null
@@ -1,133 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Common Primitives for Data Access Monitoring
- *
- * Author: SeongJae Park <sj@kernel.org>
- */
-
-#include <linux/mmu_notifier.h>
-#include <linux/page_idle.h>
-#include <linux/pagemap.h>
-#include <linux/rmap.h>
-
-#include "prmtv-common.h"
-
-/*
- * Get an online page for a pfn if it's in the LRU list. Otherwise, returns
- * NULL.
- *
- * The body of this function is stolen from the 'page_idle_get_page()'. We
- * steal rather than reuse it because the code is quite simple.
- */
-struct page *damon_get_page(unsigned long pfn)
-{
- struct page *page = pfn_to_online_page(pfn);
-
- if (!page || !PageLRU(page) || !get_page_unless_zero(page))
- return NULL;
-
- if (unlikely(!PageLRU(page))) {
- put_page(page);
- page = NULL;
- }
- return page;
-}
-
-void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm, unsigned long addr)
-{
- bool referenced = false;
- struct page *page = damon_get_page(pte_pfn(*pte));
-
- if (!page)
- return;
-
- if (pte_young(*pte)) {
- referenced = true;
- *pte = pte_mkold(*pte);
- }
-
-#ifdef CONFIG_MMU_NOTIFIER
- if (mmu_notifier_clear_young(mm, addr, addr + PAGE_SIZE))
- referenced = true;
-#endif /* CONFIG_MMU_NOTIFIER */
-
- if (referenced)
- set_page_young(page);
-
- set_page_idle(page);
- put_page(page);
-}
-
-void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm, unsigned long addr)
-{
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- bool referenced = false;
- struct page *page = damon_get_page(pmd_pfn(*pmd));
-
- if (!page)
- return;
-
- if (pmd_young(*pmd)) {
- referenced = true;
- *pmd = pmd_mkold(*pmd);
- }
-
-#ifdef CONFIG_MMU_NOTIFIER
- if (mmu_notifier_clear_young(mm, addr,
- addr + ((1UL) << HPAGE_PMD_SHIFT)))
- referenced = true;
-#endif /* CONFIG_MMU_NOTIFIER */
-
- if (referenced)
- set_page_young(page);
-
- set_page_idle(page);
- put_page(page);
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-}
-
-#define DAMON_MAX_SUBSCORE (100)
-#define DAMON_MAX_AGE_IN_LOG (32)
-
-int damon_pageout_score(struct damon_ctx *c, struct damon_region *r,
- struct damos *s)
-{
- unsigned int max_nr_accesses;
- int freq_subscore;
- unsigned int age_in_sec;
- int age_in_log, age_subscore;
- unsigned int freq_weight = s->quota.weight_nr_accesses;
- unsigned int age_weight = s->quota.weight_age;
- int hotness;
-
- max_nr_accesses = c->aggr_interval / c->sample_interval;
- freq_subscore = r->nr_accesses * DAMON_MAX_SUBSCORE / max_nr_accesses;
-
- age_in_sec = (unsigned long)r->age * c->aggr_interval / 1000000;
- for (age_in_log = 0; age_in_log < DAMON_MAX_AGE_IN_LOG && age_in_sec;
- age_in_log++, age_in_sec >>= 1)
- ;
-
- /* If frequency is 0, higher age means it's colder */
- if (freq_subscore == 0)
- age_in_log *= -1;
-
- /*
- * Now age_in_log is in [-DAMON_MAX_AGE_IN_LOG, DAMON_MAX_AGE_IN_LOG].
- * Scale it to be in [0, 100] and set it as age subscore.
- */
- age_in_log += DAMON_MAX_AGE_IN_LOG;
- age_subscore = age_in_log * DAMON_MAX_SUBSCORE /
- DAMON_MAX_AGE_IN_LOG / 2;
-
- hotness = (freq_weight * freq_subscore + age_weight * age_subscore);
- if (freq_weight + age_weight)
- hotness /= freq_weight + age_weight;
- /*
- * Transform it to fit in [0, DAMOS_MAX_SCORE]
- */
- hotness = hotness * DAMOS_MAX_SCORE / DAMON_MAX_SUBSCORE;
-
- /* Return coldness of the region */
- return DAMOS_MAX_SCORE - hotness;
-}
--- a/mm/damon/prmtv-common.h
+++ /dev/null
@@ -1,16 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Common Primitives for Data Access Monitoring
- *
- * Author: SeongJae Park <sj@kernel.org>
- */
-
-#include <linux/damon.h>
-
-struct page *damon_get_page(unsigned long pfn);
-
-void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm, unsigned long addr);
-void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm, unsigned long addr);
-
-int damon_pageout_score(struct damon_ctx *c, struct damon_region *r,
- struct damos *s);
--- a/mm/damon/reclaim.c~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/reclaim.c
@@ -384,7 +384,7 @@ static int __init damon_reclaim_init(voi
if (!ctx)
return -ENOMEM;
- damon_pa_set_primitives(ctx);
+ damon_pa_set_operations(ctx);
ctx->callback.after_aggregation = damon_reclaim_after_aggregation;
target = damon_new_target();
--- a/mm/damon/vaddr.c~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/vaddr.c
@@ -15,7 +15,7 @@
#include <linux/pagewalk.h>
#include <linux/sched/mm.h>
-#include "prmtv-common.h"
+#include "ops-common.h"
#ifdef CONFIG_DAMON_VADDR_KUNIT_TEST
#undef DAMON_MIN_REGION
@@ -739,17 +739,17 @@ static int damon_va_scheme_score(struct
return DAMOS_MAX_SCORE;
}
-void damon_va_set_primitives(struct damon_ctx *ctx)
+void damon_va_set_operations(struct damon_ctx *ctx)
{
- ctx->primitive.init = damon_va_init;
- ctx->primitive.update = damon_va_update;
- ctx->primitive.prepare_access_checks = damon_va_prepare_access_checks;
- ctx->primitive.check_accesses = damon_va_check_accesses;
- ctx->primitive.reset_aggregated = NULL;
- ctx->primitive.target_valid = damon_va_target_valid;
- ctx->primitive.cleanup = NULL;
- ctx->primitive.apply_scheme = damon_va_apply_scheme;
- ctx->primitive.get_scheme_score = damon_va_scheme_score;
+ ctx->ops.init = damon_va_init;
+ ctx->ops.update = damon_va_update;
+ ctx->ops.prepare_access_checks = damon_va_prepare_access_checks;
+ ctx->ops.check_accesses = damon_va_check_accesses;
+ ctx->ops.reset_aggregated = NULL;
+ ctx->ops.target_valid = damon_va_target_valid;
+ ctx->ops.cleanup = NULL;
+ ctx->ops.apply_scheme = damon_va_apply_scheme;
+ ctx->ops.get_scheme_score = damon_va_scheme_score;
}
#include "vaddr-test.h"
--- a/mm/damon/vaddr-test.h~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/vaddr-test.h
@@ -314,7 +314,7 @@ static struct kunit_case damon_test_case
};
static struct kunit_suite damon_test_suite = {
- .name = "damon-primitives",
+ .name = "damon-operations",
.test_cases = damon_test_cases,
};
kunit_test_suite(damon_test_suite);
_
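As a reading aid for damon_pageout_score() in the ops-common.c hunk above,
the arithmetic can be replayed in userspace with concrete numbers. This is
a sketch under stated assumptions: DEMO_MAX_SCORE stands in for
DAMOS_MAX_SCORE and is assumed to be 99 (its definition is not part of this
patch), the function takes plain arguments instead of the kernel structs,
and the name demo_pageout_score is made up for this example.

```c
/* Assumed value of DAMOS_MAX_SCORE; not defined anywhere in this patch. */
#define DEMO_MAX_SCORE		(99)
#define DEMO_MAX_SUBSCORE	(100)
#define DEMO_MAX_AGE_IN_LOG	(32)

/*
 * Same steps as damon_pageout_score(), with the struct fields passed in
 * directly. Intervals are in microseconds; age counts aggregation intervals.
 */
static int demo_pageout_score(unsigned long sample_interval,
		unsigned long aggr_interval, unsigned int nr_accesses,
		unsigned int age, unsigned int freq_weight,
		unsigned int age_weight)
{
	unsigned int max_nr_accesses;
	int freq_subscore;
	unsigned int age_in_sec;
	int age_in_log, age_subscore;
	int hotness;

	max_nr_accesses = aggr_interval / sample_interval;
	freq_subscore = nr_accesses * DEMO_MAX_SUBSCORE / max_nr_accesses;

	/* Roughly log2 of the age in seconds, capped at DEMO_MAX_AGE_IN_LOG. */
	age_in_sec = (unsigned long)age * aggr_interval / 1000000;
	for (age_in_log = 0; age_in_log < DEMO_MAX_AGE_IN_LOG && age_in_sec;
			age_in_log++, age_in_sec >>= 1)
		;

	/* If frequency is 0, higher age means it's colder. */
	if (freq_subscore == 0)
		age_in_log *= -1;

	/* Shift to [0, 2 * DEMO_MAX_AGE_IN_LOG], then scale to [0, 100]. */
	age_in_log += DEMO_MAX_AGE_IN_LOG;
	age_subscore = age_in_log * DEMO_MAX_SUBSCORE /
		DEMO_MAX_AGE_IN_LOG / 2;

	/* Weighted average of the two subscores. */
	hotness = (freq_weight * freq_subscore + age_weight * age_subscore);
	if (freq_weight + age_weight)
		hotness /= freq_weight + age_weight;

	/* Rescale to [0, DEMO_MAX_SCORE] and return coldness. */
	hotness = hotness * DEMO_MAX_SCORE / DEMO_MAX_SUBSCORE;
	return DEMO_MAX_SCORE - hotness;
}
```

With the default intervals set in damon_new_ctx() (5 ms sampling, 100 ms
aggregation) and equal weights of 1, a region idle for 100 aggregation
intervals scores 79, while a region accessed in every sample over the same
age scores 22, so the idle region is the preferred page-out candidate.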
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 202/227] mm/damon: rename damon_primitives to damon_operations
@ 2022-03-22 21:48 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: xhao, rientjes, sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon: rename damon_primitives to damon_operations
Patch series "Allow DAMON user code independent of monitoring primitives".
In-kernel DAMON user code is required to configure the monitoring context
(struct damon_ctx) with proper monitoring primitives (struct
damon_primitive). This makes the user code dependent on all supported
monitoring primitives. For example, the DAMON debugfs interface depends on
both DAMON_VADDR and DAMON_PADDR, though some users are interested in only
one use case. As more monitoring primitives are introduced, the problem
will grow.
To minimize such unnecessary dependencies, this patchset allows monitoring
primitives to be registered by the implementing code and later dynamically
searched and selected by the user code.
In addition, this patchset renames monitoring primitives to monitoring
operations, which makes it easier to intuitively understand what they mean
and how they are structured.
This patch (of 8):
DAMON has a set of callback functions called monitoring primitives, which
lets it be configured with various implementations for easy extension to
different address spaces and usages. However, the word 'primitive' is not
very explicit. Meanwhile, many other structs that serve a similar purpose
call themselves 'operations'. To make the code easier to understand,
this commit renames 'damon_primitives' to 'damon_operations' before it is
too late to rename.
Link: https://lkml.kernel.org/r/20220215184603.1479-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20220215184603.1479-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/damon.h | 48 ++++++-------
mm/damon/Kconfig | 12 +--
mm/damon/Makefile | 4 -
mm/damon/core.c | 65 +++++++++---------
mm/damon/dbgfs-test.h | 2
mm/damon/dbgfs.c | 10 +-
mm/damon/{prmtv-common.c => ops-common.c} | 2 +-
mm/damon/{prmtv-common.h => ops-common.h} | 0
mm/damon/paddr.c | 22 +++---
mm/damon/reclaim.c | 2
mm/damon/vaddr-test.h | 2
mm/damon/vaddr.c | 22 +++---
14 files changed, 244 insertions(+), 243 deletions(-)
--- a/include/linux/damon.h~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/include/linux/damon.h
@@ -67,8 +67,8 @@ struct damon_region {
*
* Each monitoring context could have multiple targets. For example, a context
* for virtual memory address spaces could have multiple target processes. The
- * @pid should be set for appropriate address space monitoring primitives
- * including the virtual address spaces monitoring primitives.
+ * @pid should be set for appropriate &struct damon_operations including the
+ * virtual address spaces monitoring operations.
*/
struct damon_target {
struct pid *pid;
@@ -120,9 +120,9 @@ enum damos_action {
* uses smaller one as the effective quota.
*
* For selecting regions within the quota, DAMON prioritizes current scheme's
- * target memory regions using the &struct damon_primitive->get_scheme_score.
+ * target memory regions using the &struct damon_operations->get_scheme_score.
* You could customize the prioritization logic by setting &weight_sz,
- * &weight_nr_accesses, and &weight_age, because monitoring primitives are
+ * &weight_nr_accesses, and &weight_age, because monitoring operations are
* encouraged to respect those.
*/
struct damos_quota {
@@ -256,10 +256,10 @@ struct damos {
struct damon_ctx;
/**
- * struct damon_primitive - Monitoring primitives for given use cases.
+ * struct damon_operations - Monitoring operations for given use cases.
*
- * @init: Initialize primitive-internal data structures.
- * @update: Update primitive-internal data structures.
+ * @init: Initialize operations-related data structures.
+ * @update: Update operations-related data structures.
* @prepare_access_checks: Prepare next access check of target regions.
* @check_accesses: Check the accesses to target regions.
* @reset_aggregated: Reset aggregated accesses monitoring results.
@@ -269,18 +269,18 @@ struct damon_ctx;
* @cleanup: Clean up the context.
*
* DAMON can be extended for various address spaces and usages. For this,
- * users should register the low level primitives for their target address
- * space and usecase via the &damon_ctx.primitive. Then, the monitoring thread
+ * users should register the low level operations for their target address
+ * space and usecase via the &damon_ctx.ops. Then, the monitoring thread
* (&damon_ctx.kdamond) calls @init and @prepare_access_checks before starting
- * the monitoring, @update after each &damon_ctx.primitive_update_interval, and
+ * the monitoring, @update after each &damon_ctx.ops_update_interval, and
* @check_accesses, @target_valid and @prepare_access_checks after each
* &damon_ctx.sample_interval. Finally, @reset_aggregated is called after each
* &damon_ctx.aggr_interval.
*
- * @init should initialize primitive-internal data structures. For example,
+ * @init should initialize operations-related data structures. For example,
* this could be used to construct proper monitoring target regions and link
* those to @damon_ctx.adaptive_targets.
- * @update should update the primitive-internal data structures. For example,
+ * @update should update the operations-related data structures. For example,
* this could be used to update monitoring target regions for current status.
* @prepare_access_checks should manipulate the monitoring regions to be
* prepared for the next access check.
@@ -300,7 +300,7 @@ struct damon_ctx;
* monitoring.
* @cleanup is called from @kdamond just before its termination.
*/
-struct damon_primitive {
+struct damon_operations {
void (*init)(struct damon_ctx *context);
void (*update)(struct damon_ctx *context);
void (*prepare_access_checks)(struct damon_ctx *context);
@@ -354,15 +354,15 @@ struct damon_callback {
*
* @sample_interval: The time between access samplings.
* @aggr_interval: The time between monitor results aggregations.
- * @primitive_update_interval: The time between monitoring primitive updates.
+ * @ops_update_interval: The time between monitoring operations updates.
*
* For each @sample_interval, DAMON checks whether each region is accessed or
* not. It aggregates and keeps the access information (number of accesses to
* each region) for @aggr_interval time. DAMON also checks whether the target
* memory regions need update (e.g., by ``mmap()`` calls from the application,
* in case of virtual memory monitoring) and applies the changes for each
- * @primitive_update_interval. All time intervals are in micro-seconds.
- * Please refer to &struct damon_primitive and &struct damon_callback for more
+ * @ops_update_interval. All time intervals are in micro-seconds.
+ * Please refer to &struct damon_operations and &struct damon_callback for more
* detail.
*
* @kdamond: Kernel thread who does the monitoring.
@@ -374,7 +374,7 @@ struct damon_callback {
*
* Once started, the monitoring thread runs until explicitly required to be
* terminated or every monitoring target is invalid. The validity of the
- * targets is checked via the &damon_primitive.target_valid of @primitive. The
+ * targets is checked via the &damon_operations.target_valid of @ops. The
* termination can also be explicitly requested by writing non-zero to
* @kdamond_stop. The thread sets @kdamond to NULL when it terminates.
* Therefore, users can know whether the monitoring is ongoing or terminated by
@@ -384,7 +384,7 @@ struct damon_callback {
* Note that the monitoring thread protects only @kdamond and @kdamond_stop via
* @kdamond_lock. Accesses to other fields must be protected by themselves.
*
- * @primitive: Set of monitoring primitives for given use cases.
+ * @ops: Set of monitoring operations for given use cases.
* @callback: Set of callbacks for monitoring events notifications.
*
* @min_nr_regions: The minimum number of adaptive monitoring regions.
@@ -395,17 +395,17 @@ struct damon_callback {
struct damon_ctx {
unsigned long sample_interval;
unsigned long aggr_interval;
- unsigned long primitive_update_interval;
+ unsigned long ops_update_interval;
/* private: internal use only */
struct timespec64 last_aggregation;
- struct timespec64 last_primitive_update;
+ struct timespec64 last_ops_update;
/* public: */
struct task_struct *kdamond;
struct mutex kdamond_lock;
- struct damon_primitive primitive;
+ struct damon_operations ops;
struct damon_callback callback;
unsigned long min_nr_regions;
@@ -484,7 +484,7 @@ unsigned int damon_nr_regions(struct dam
struct damon_ctx *damon_new_ctx(void);
void damon_destroy_ctx(struct damon_ctx *ctx);
int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
- unsigned long aggr_int, unsigned long primitive_upd_int,
+ unsigned long aggr_int, unsigned long ops_upd_int,
unsigned long min_nr_reg, unsigned long max_nr_reg);
int damon_set_schemes(struct damon_ctx *ctx,
struct damos **schemes, ssize_t nr_schemes);
@@ -497,12 +497,12 @@ int damon_stop(struct damon_ctx **ctxs,
#ifdef CONFIG_DAMON_VADDR
bool damon_va_target_valid(void *t);
-void damon_va_set_primitives(struct damon_ctx *ctx);
+void damon_va_set_operations(struct damon_ctx *ctx);
#endif /* CONFIG_DAMON_VADDR */
#ifdef CONFIG_DAMON_PADDR
bool damon_pa_target_valid(void *t);
-void damon_pa_set_primitives(struct damon_ctx *ctx);
+void damon_pa_set_operations(struct damon_ctx *ctx);
#endif /* CONFIG_DAMON_PADDR */
#endif /* _DAMON_H */
--- a/mm/damon/core.c~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/core.c
@@ -204,10 +204,10 @@ struct damon_ctx *damon_new_ctx(void)
ctx->sample_interval = 5 * 1000;
ctx->aggr_interval = 100 * 1000;
- ctx->primitive_update_interval = 60 * 1000 * 1000;
+ ctx->ops_update_interval = 60 * 1000 * 1000;
ktime_get_coarse_ts64(&ctx->last_aggregation);
- ctx->last_primitive_update = ctx->last_aggregation;
+ ctx->last_ops_update = ctx->last_aggregation;
mutex_init(&ctx->kdamond_lock);
@@ -224,8 +224,8 @@ static void damon_destroy_targets(struct
{
struct damon_target *t, *next_t;
- if (ctx->primitive.cleanup) {
- ctx->primitive.cleanup(ctx);
+ if (ctx->ops.cleanup) {
+ ctx->ops.cleanup(ctx);
return;
}
@@ -250,7 +250,7 @@ void damon_destroy_ctx(struct damon_ctx
* @ctx: monitoring context
* @sample_int: time interval between samplings
* @aggr_int: time interval between aggregations
- * @primitive_upd_int: time interval between monitoring primitive updates
+ * @ops_upd_int: time interval between monitoring operations updates
* @min_nr_reg: minimal number of regions
* @max_nr_reg: maximum number of regions
*
@@ -260,7 +260,7 @@ void damon_destroy_ctx(struct damon_ctx
* Return: 0 on success, negative error code otherwise.
*/
int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
- unsigned long aggr_int, unsigned long primitive_upd_int,
+ unsigned long aggr_int, unsigned long ops_upd_int,
unsigned long min_nr_reg, unsigned long max_nr_reg)
{
if (min_nr_reg < 3)
@@ -270,7 +270,7 @@ int damon_set_attrs(struct damon_ctx *ct
ctx->sample_interval = sample_int;
ctx->aggr_interval = aggr_int;
- ctx->primitive_update_interval = primitive_upd_int;
+ ctx->ops_update_interval = ops_upd_int;
ctx->min_nr_regions = min_nr_reg;
ctx->max_nr_regions = max_nr_reg;
@@ -516,10 +516,10 @@ static bool damos_valid_target(struct da
{
bool ret = __damos_valid_target(r, s);
- if (!ret || !s->quota.esz || !c->primitive.get_scheme_score)
+ if (!ret || !s->quota.esz || !c->ops.get_scheme_score)
return ret;
- return c->primitive.get_scheme_score(c, t, r, s) >= s->quota.min_score;
+ return c->ops.get_scheme_score(c, t, r, s) >= s->quota.min_score;
}
static void damon_do_apply_schemes(struct damon_ctx *c,
@@ -576,7 +576,7 @@ static void damon_do_apply_schemes(struc
continue;
/* Apply the scheme */
- if (c->primitive.apply_scheme) {
+ if (c->ops.apply_scheme) {
if (quota->esz &&
quota->charged_sz + sz > quota->esz) {
sz = ALIGN_DOWN(quota->esz - quota->charged_sz,
@@ -586,7 +586,7 @@ static void damon_do_apply_schemes(struc
damon_split_region_at(c, t, r, sz);
}
ktime_get_coarse_ts64(&begin);
- sz_applied = c->primitive.apply_scheme(c, t, r, s);
+ sz_applied = c->ops.apply_scheme(c, t, r, s);
ktime_get_coarse_ts64(&end);
quota->total_charged_ns += timespec64_to_ns(&end) -
timespec64_to_ns(&begin);
@@ -660,7 +660,7 @@ static void kdamond_apply_schemes(struct
damos_set_effective_quota(quota);
}
- if (!c->primitive.get_scheme_score)
+ if (!c->ops.get_scheme_score)
continue;
/* Fill up the score histogram */
@@ -669,7 +669,7 @@ static void kdamond_apply_schemes(struct
damon_for_each_region(r, t) {
if (!__damos_valid_target(r, s))
continue;
- score = c->primitive.get_scheme_score(
+ score = c->ops.get_scheme_score(
c, t, r, s);
quota->histogram[score] +=
r->ar.end - r->ar.start;
@@ -848,14 +848,15 @@ static void kdamond_split_regions(struct
}
/*
- * Check whether it is time to check and apply the target monitoring regions
+ * Check whether it is time to check and apply the operations-related data
+ * structures.
*
* Returns true if it is.
*/
-static bool kdamond_need_update_primitive(struct damon_ctx *ctx)
+static bool kdamond_need_update_operations(struct damon_ctx *ctx)
{
- return damon_check_reset_time_interval(&ctx->last_primitive_update,
- ctx->primitive_update_interval);
+ return damon_check_reset_time_interval(&ctx->last_ops_update,
+ ctx->ops_update_interval);
}
/*
@@ -873,11 +874,11 @@ static bool kdamond_need_stop(struct dam
if (kthread_should_stop())
return true;
- if (!ctx->primitive.target_valid)
+ if (!ctx->ops.target_valid)
return false;
damon_for_each_target(t, ctx) {
- if (ctx->primitive.target_valid(t))
+ if (ctx->ops.target_valid(t))
return false;
}
@@ -976,8 +977,8 @@ static int kdamond_fn(void *data)
pr_debug("kdamond (%d) starts\n", current->pid);
- if (ctx->primitive.init)
- ctx->primitive.init(ctx);
+ if (ctx->ops.init)
+ ctx->ops.init(ctx);
if (ctx->callback.before_start && ctx->callback.before_start(ctx))
done = true;
@@ -987,16 +988,16 @@ static int kdamond_fn(void *data)
if (kdamond_wait_activation(ctx))
continue;
- if (ctx->primitive.prepare_access_checks)
- ctx->primitive.prepare_access_checks(ctx);
+ if (ctx->ops.prepare_access_checks)
+ ctx->ops.prepare_access_checks(ctx);
if (ctx->callback.after_sampling &&
ctx->callback.after_sampling(ctx))
done = true;
kdamond_usleep(ctx->sample_interval);
- if (ctx->primitive.check_accesses)
- max_nr_accesses = ctx->primitive.check_accesses(ctx);
+ if (ctx->ops.check_accesses)
+ max_nr_accesses = ctx->ops.check_accesses(ctx);
if (kdamond_aggregate_interval_passed(ctx)) {
kdamond_merge_regions(ctx,
@@ -1008,13 +1009,13 @@ static int kdamond_fn(void *data)
kdamond_apply_schemes(ctx);
kdamond_reset_aggregated(ctx);
kdamond_split_regions(ctx);
- if (ctx->primitive.reset_aggregated)
- ctx->primitive.reset_aggregated(ctx);
+ if (ctx->ops.reset_aggregated)
+ ctx->ops.reset_aggregated(ctx);
}
- if (kdamond_need_update_primitive(ctx)) {
- if (ctx->primitive.update)
- ctx->primitive.update(ctx);
+ if (kdamond_need_update_operations(ctx)) {
+ if (ctx->ops.update)
+ ctx->ops.update(ctx);
sz_limit = damon_region_sz_limit(ctx);
}
}
@@ -1025,8 +1026,8 @@ static int kdamond_fn(void *data)
if (ctx->callback.before_terminate)
ctx->callback.before_terminate(ctx);
- if (ctx->primitive.cleanup)
- ctx->primitive.cleanup(ctx);
+ if (ctx->ops.cleanup)
+ ctx->ops.cleanup(ctx);
pr_debug("kdamond (%d) finishes\n", current->pid);
mutex_lock(&ctx->kdamond_lock);
--- a/mm/damon/dbgfs.c~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/dbgfs.c
@@ -56,7 +56,7 @@ static ssize_t dbgfs_attrs_read(struct f
mutex_lock(&ctx->kdamond_lock);
ret = scnprintf(kbuf, ARRAY_SIZE(kbuf), "%lu %lu %lu %lu %lu\n",
ctx->sample_interval, ctx->aggr_interval,
- ctx->primitive_update_interval, ctx->min_nr_regions,
+ ctx->ops_update_interval, ctx->min_nr_regions,
ctx->max_nr_regions);
mutex_unlock(&ctx->kdamond_lock);
@@ -277,7 +277,7 @@ out:
static inline bool target_has_pid(const struct damon_ctx *ctx)
{
- return ctx->primitive.target_valid == damon_va_target_valid;
+ return ctx->ops.target_valid == damon_va_target_valid;
}
static ssize_t sprint_target_ids(struct damon_ctx *ctx, char *buf, ssize_t len)
@@ -477,9 +477,9 @@ static ssize_t dbgfs_target_ids_write(st
/* Configure the context for the address space type */
if (id_is_pid)
- damon_va_set_primitives(ctx);
+ damon_va_set_operations(ctx);
else
- damon_pa_set_primitives(ctx);
+ damon_pa_set_operations(ctx);
ret = dbgfs_set_targets(ctx, nr_targets, target_pids);
if (!ret)
@@ -735,7 +735,7 @@ static struct damon_ctx *dbgfs_new_ctx(v
if (!ctx)
return NULL;
- damon_va_set_primitives(ctx);
+ damon_va_set_operations(ctx);
ctx->callback.before_terminate = dbgfs_before_terminate;
return ctx;
}
--- a/mm/damon/dbgfs-test.h~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/dbgfs-test.h
@@ -74,7 +74,7 @@ static void damon_dbgfs_test_set_targets
char buf[64];
/* Make DAMON consider target has no pid */
- ctx->primitive = (struct damon_primitive){};
+ ctx->ops = (struct damon_operations){};
dbgfs_set_targets(ctx, 0, NULL);
sprint_target_ids(ctx, buf, 64);
--- a/mm/damon/Kconfig~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/Kconfig
@@ -25,27 +25,27 @@ config DAMON_KUNIT_TEST
If unsure, say N.
config DAMON_VADDR
- bool "Data access monitoring primitives for virtual address spaces"
+ bool "Data access monitoring operations for virtual address spaces"
depends on DAMON && MMU
select PAGE_IDLE_FLAG
help
- This builds the default data access monitoring primitives for DAMON
+ This builds the default data access monitoring operations for DAMON
that work for virtual address spaces.
config DAMON_PADDR
- bool "Data access monitoring primitives for the physical address space"
+ bool "Data access monitoring operations for the physical address space"
depends on DAMON && MMU
select PAGE_IDLE_FLAG
help
- This builds the default data access monitoring primitives for DAMON
+ This builds the default data access monitoring operations for DAMON
that works for the physical address space.
config DAMON_VADDR_KUNIT_TEST
- bool "Test for DAMON primitives" if !KUNIT_ALL_TESTS
+ bool "Test for DAMON operations" if !KUNIT_ALL_TESTS
depends on DAMON_VADDR && KUNIT=y
default KUNIT_ALL_TESTS
help
- This builds the DAMON virtual addresses primitives Kunit test suite.
+ This builds the DAMON virtual addresses operations Kunit test suite.
For more information on KUnit and unit tests in general, please refer
to the KUnit documentation.
--- a/mm/damon/Makefile~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_DAMON) := core.o
-obj-$(CONFIG_DAMON_VADDR) += prmtv-common.o vaddr.o
-obj-$(CONFIG_DAMON_PADDR) += prmtv-common.o paddr.o
+obj-$(CONFIG_DAMON_VADDR) += ops-common.o vaddr.o
+obj-$(CONFIG_DAMON_PADDR) += ops-common.o paddr.o
obj-$(CONFIG_DAMON_DBGFS) += dbgfs.o
obj-$(CONFIG_DAMON_RECLAIM) += reclaim.o
--- /dev/null
+++ a/mm/damon/ops-common.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Common Primitives for Data Access Monitoring
+ *
+ * Author: SeongJae Park <sj@kernel.org>
+ */
+
+#include <linux/mmu_notifier.h>
+#include <linux/page_idle.h>
+#include <linux/pagemap.h>
+#include <linux/rmap.h>
+
+#include "ops-common.h"
+
+/*
+ * Get an online page for a pfn if it's in the LRU list. Otherwise, returns
+ * NULL.
+ *
+ * The body of this function is stolen from the 'page_idle_get_page()'. We
+ * steal rather than reuse it because the code is quite simple.
+ */
+struct page *damon_get_page(unsigned long pfn)
+{
+ struct page *page = pfn_to_online_page(pfn);
+
+ if (!page || !PageLRU(page) || !get_page_unless_zero(page))
+ return NULL;
+
+ if (unlikely(!PageLRU(page))) {
+ put_page(page);
+ page = NULL;
+ }
+ return page;
+}
+
+void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm, unsigned long addr)
+{
+ bool referenced = false;
+ struct page *page = damon_get_page(pte_pfn(*pte));
+
+ if (!page)
+ return;
+
+ if (pte_young(*pte)) {
+ referenced = true;
+ *pte = pte_mkold(*pte);
+ }
+
+#ifdef CONFIG_MMU_NOTIFIER
+ if (mmu_notifier_clear_young(mm, addr, addr + PAGE_SIZE))
+ referenced = true;
+#endif /* CONFIG_MMU_NOTIFIER */
+
+ if (referenced)
+ set_page_young(page);
+
+ set_page_idle(page);
+ put_page(page);
+}
+
+void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm, unsigned long addr)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ bool referenced = false;
+ struct page *page = damon_get_page(pmd_pfn(*pmd));
+
+ if (!page)
+ return;
+
+ if (pmd_young(*pmd)) {
+ referenced = true;
+ *pmd = pmd_mkold(*pmd);
+ }
+
+#ifdef CONFIG_MMU_NOTIFIER
+ if (mmu_notifier_clear_young(mm, addr,
+ addr + ((1UL) << HPAGE_PMD_SHIFT)))
+ referenced = true;
+#endif /* CONFIG_MMU_NOTIFIER */
+
+ if (referenced)
+ set_page_young(page);
+
+ set_page_idle(page);
+ put_page(page);
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+}
+
+#define DAMON_MAX_SUBSCORE (100)
+#define DAMON_MAX_AGE_IN_LOG (32)
+
+int damon_pageout_score(struct damon_ctx *c, struct damon_region *r,
+ struct damos *s)
+{
+ unsigned int max_nr_accesses;
+ int freq_subscore;
+ unsigned int age_in_sec;
+ int age_in_log, age_subscore;
+ unsigned int freq_weight = s->quota.weight_nr_accesses;
+ unsigned int age_weight = s->quota.weight_age;
+ int hotness;
+
+ max_nr_accesses = c->aggr_interval / c->sample_interval;
+ freq_subscore = r->nr_accesses * DAMON_MAX_SUBSCORE / max_nr_accesses;
+
+ age_in_sec = (unsigned long)r->age * c->aggr_interval / 1000000;
+ for (age_in_log = 0; age_in_log < DAMON_MAX_AGE_IN_LOG && age_in_sec;
+ age_in_log++, age_in_sec >>= 1)
+ ;
+
+ /* If frequency is 0, higher age means it's colder */
+ if (freq_subscore == 0)
+ age_in_log *= -1;
+
+ /*
+ * Now age_in_log is in [-DAMON_MAX_AGE_IN_LOG, DAMON_MAX_AGE_IN_LOG].
+ * Scale it to be in [0, 100] and set it as age subscore.
+ */
+ age_in_log += DAMON_MAX_AGE_IN_LOG;
+ age_subscore = age_in_log * DAMON_MAX_SUBSCORE /
+ DAMON_MAX_AGE_IN_LOG / 2;
+
+ hotness = (freq_weight * freq_subscore + age_weight * age_subscore);
+ if (freq_weight + age_weight)
+ hotness /= freq_weight + age_weight;
+ /*
+ * Transform it to fit in [0, DAMOS_MAX_SCORE]
+ */
+ hotness = hotness * DAMOS_MAX_SCORE / DAMON_MAX_SUBSCORE;
+
+ /* Return coldness of the region */
+ return DAMOS_MAX_SCORE - hotness;
+}
--- /dev/null
+++ a/mm/damon/ops-common.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Common Primitives for Data Access Monitoring
+ *
+ * Author: SeongJae Park <sj@kernel.org>
+ */
+
+#include <linux/damon.h>
+
+struct page *damon_get_page(unsigned long pfn);
+
+void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm, unsigned long addr);
+void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm, unsigned long addr);
+
+int damon_pageout_score(struct damon_ctx *c, struct damon_region *r,
+ struct damos *s);
--- a/mm/damon/paddr.c~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/paddr.c
@@ -14,7 +14,7 @@
#include <linux/swap.h>
#include "../internal.h"
-#include "prmtv-common.h"
+#include "ops-common.h"
static bool __damon_pa_mkold(struct page *page, struct vm_area_struct *vma,
unsigned long addr, void *arg)
@@ -261,15 +261,15 @@ static int damon_pa_scheme_score(struct
return DAMOS_MAX_SCORE;
}
-void damon_pa_set_primitives(struct damon_ctx *ctx)
+void damon_pa_set_operations(struct damon_ctx *ctx)
{
- ctx->primitive.init = NULL;
- ctx->primitive.update = NULL;
- ctx->primitive.prepare_access_checks = damon_pa_prepare_access_checks;
- ctx->primitive.check_accesses = damon_pa_check_accesses;
- ctx->primitive.reset_aggregated = NULL;
- ctx->primitive.target_valid = damon_pa_target_valid;
- ctx->primitive.cleanup = NULL;
- ctx->primitive.apply_scheme = damon_pa_apply_scheme;
- ctx->primitive.get_scheme_score = damon_pa_scheme_score;
+ ctx->ops.init = NULL;
+ ctx->ops.update = NULL;
+ ctx->ops.prepare_access_checks = damon_pa_prepare_access_checks;
+ ctx->ops.check_accesses = damon_pa_check_accesses;
+ ctx->ops.reset_aggregated = NULL;
+ ctx->ops.target_valid = damon_pa_target_valid;
+ ctx->ops.cleanup = NULL;
+ ctx->ops.apply_scheme = damon_pa_apply_scheme;
+ ctx->ops.get_scheme_score = damon_pa_scheme_score;
}
--- a/mm/damon/prmtv-common.c
+++ /dev/null
@@ -1,133 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Common Primitives for Data Access Monitoring
- *
- * Author: SeongJae Park <sj@kernel.org>
- */
-
-#include <linux/mmu_notifier.h>
-#include <linux/page_idle.h>
-#include <linux/pagemap.h>
-#include <linux/rmap.h>
-
-#include "prmtv-common.h"
-
-/*
- * Get an online page for a pfn if it's in the LRU list. Otherwise, returns
- * NULL.
- *
- * The body of this function is stolen from the 'page_idle_get_page()'. We
- * steal rather than reuse it because the code is quite simple.
- */
-struct page *damon_get_page(unsigned long pfn)
-{
- struct page *page = pfn_to_online_page(pfn);
-
- if (!page || !PageLRU(page) || !get_page_unless_zero(page))
- return NULL;
-
- if (unlikely(!PageLRU(page))) {
- put_page(page);
- page = NULL;
- }
- return page;
-}
-
-void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm, unsigned long addr)
-{
- bool referenced = false;
- struct page *page = damon_get_page(pte_pfn(*pte));
-
- if (!page)
- return;
-
- if (pte_young(*pte)) {
- referenced = true;
- *pte = pte_mkold(*pte);
- }
-
-#ifdef CONFIG_MMU_NOTIFIER
- if (mmu_notifier_clear_young(mm, addr, addr + PAGE_SIZE))
- referenced = true;
-#endif /* CONFIG_MMU_NOTIFIER */
-
- if (referenced)
- set_page_young(page);
-
- set_page_idle(page);
- put_page(page);
-}
-
-void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm, unsigned long addr)
-{
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- bool referenced = false;
- struct page *page = damon_get_page(pmd_pfn(*pmd));
-
- if (!page)
- return;
-
- if (pmd_young(*pmd)) {
- referenced = true;
- *pmd = pmd_mkold(*pmd);
- }
-
-#ifdef CONFIG_MMU_NOTIFIER
- if (mmu_notifier_clear_young(mm, addr,
- addr + ((1UL) << HPAGE_PMD_SHIFT)))
- referenced = true;
-#endif /* CONFIG_MMU_NOTIFIER */
-
- if (referenced)
- set_page_young(page);
-
- set_page_idle(page);
- put_page(page);
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-}
-
-#define DAMON_MAX_SUBSCORE (100)
-#define DAMON_MAX_AGE_IN_LOG (32)
-
-int damon_pageout_score(struct damon_ctx *c, struct damon_region *r,
- struct damos *s)
-{
- unsigned int max_nr_accesses;
- int freq_subscore;
- unsigned int age_in_sec;
- int age_in_log, age_subscore;
- unsigned int freq_weight = s->quota.weight_nr_accesses;
- unsigned int age_weight = s->quota.weight_age;
- int hotness;
-
- max_nr_accesses = c->aggr_interval / c->sample_interval;
- freq_subscore = r->nr_accesses * DAMON_MAX_SUBSCORE / max_nr_accesses;
-
- age_in_sec = (unsigned long)r->age * c->aggr_interval / 1000000;
- for (age_in_log = 0; age_in_log < DAMON_MAX_AGE_IN_LOG && age_in_sec;
- age_in_log++, age_in_sec >>= 1)
- ;
-
- /* If frequency is 0, higher age means it's colder */
- if (freq_subscore == 0)
- age_in_log *= -1;
-
- /*
- * Now age_in_log is in [-DAMON_MAX_AGE_IN_LOG, DAMON_MAX_AGE_IN_LOG].
- * Scale it to be in [0, 100] and set it as age subscore.
- */
- age_in_log += DAMON_MAX_AGE_IN_LOG;
- age_subscore = age_in_log * DAMON_MAX_SUBSCORE /
- DAMON_MAX_AGE_IN_LOG / 2;
-
- hotness = (freq_weight * freq_subscore + age_weight * age_subscore);
- if (freq_weight + age_weight)
- hotness /= freq_weight + age_weight;
- /*
- * Transform it to fit in [0, DAMOS_MAX_SCORE]
- */
- hotness = hotness * DAMOS_MAX_SCORE / DAMON_MAX_SUBSCORE;
-
- /* Return coldness of the region */
- return DAMOS_MAX_SCORE - hotness;
-}
--- a/mm/damon/prmtv-common.h
+++ /dev/null
@@ -1,16 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Common Primitives for Data Access Monitoring
- *
- * Author: SeongJae Park <sj@kernel.org>
- */
-
-#include <linux/damon.h>
-
-struct page *damon_get_page(unsigned long pfn);
-
-void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm, unsigned long addr);
-void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm, unsigned long addr);
-
-int damon_pageout_score(struct damon_ctx *c, struct damon_region *r,
- struct damos *s);
--- a/mm/damon/reclaim.c~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/reclaim.c
@@ -384,7 +384,7 @@ static int __init damon_reclaim_init(voi
if (!ctx)
return -ENOMEM;
- damon_pa_set_primitives(ctx);
+ damon_pa_set_operations(ctx);
ctx->callback.after_aggregation = damon_reclaim_after_aggregation;
target = damon_new_target();
--- a/mm/damon/vaddr.c~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/vaddr.c
@@ -15,7 +15,7 @@
#include <linux/pagewalk.h>
#include <linux/sched/mm.h>
-#include "prmtv-common.h"
+#include "ops-common.h"
#ifdef CONFIG_DAMON_VADDR_KUNIT_TEST
#undef DAMON_MIN_REGION
@@ -739,17 +739,17 @@ static int damon_va_scheme_score(struct
return DAMOS_MAX_SCORE;
}
-void damon_va_set_primitives(struct damon_ctx *ctx)
+void damon_va_set_operations(struct damon_ctx *ctx)
{
- ctx->primitive.init = damon_va_init;
- ctx->primitive.update = damon_va_update;
- ctx->primitive.prepare_access_checks = damon_va_prepare_access_checks;
- ctx->primitive.check_accesses = damon_va_check_accesses;
- ctx->primitive.reset_aggregated = NULL;
- ctx->primitive.target_valid = damon_va_target_valid;
- ctx->primitive.cleanup = NULL;
- ctx->primitive.apply_scheme = damon_va_apply_scheme;
- ctx->primitive.get_scheme_score = damon_va_scheme_score;
+ ctx->ops.init = damon_va_init;
+ ctx->ops.update = damon_va_update;
+ ctx->ops.prepare_access_checks = damon_va_prepare_access_checks;
+ ctx->ops.check_accesses = damon_va_check_accesses;
+ ctx->ops.reset_aggregated = NULL;
+ ctx->ops.target_valid = damon_va_target_valid;
+ ctx->ops.cleanup = NULL;
+ ctx->ops.apply_scheme = damon_va_apply_scheme;
+ ctx->ops.get_scheme_score = damon_va_scheme_score;
}
#include "vaddr-test.h"
--- a/mm/damon/vaddr-test.h~mm-damon-rename-damon_primitives-to-damon_operations
+++ a/mm/damon/vaddr-test.h
@@ -314,7 +314,7 @@ static struct kunit_case damon_test_case
};
static struct kunit_suite damon_test_suite = {
- .name = "damon-primitives",
+ .name = "damon-operations",
.test_cases = damon_test_cases,
};
kunit_test_suite(damon_test_suite);
_
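As a worked example of the damon_pageout_score() arithmetic that the patch
above moves into ops-common.c, here is a userspace sketch of the same
integer computation. It is a sketch under assumptions, not the kernel
function: the sketch_* names and the flattened parameter list are
hypothetical, and the value 99 for DAMOS_MAX_SCORE is assumed because it is
defined outside this diff.

```c
#include <assert.h>

#define SKETCH_MAX_SUBSCORE	100
#define SKETCH_MAX_AGE_IN_LOG	32
#define SKETCH_DAMOS_MAX_SCORE	99	/* assumption: not shown in this diff */

/*
 * Returns the "coldness" of a region: higher means a better pageout
 * candidate.  Intervals are in microseconds; age is counted in
 * aggregation intervals.
 */
int sketch_pageout_score(unsigned long aggr_interval,
			 unsigned long sample_interval,
			 unsigned int nr_accesses, unsigned int age,
			 int freq_weight, int age_weight)
{
	unsigned int max_nr_accesses;
	unsigned int age_in_sec;
	int freq_subscore, age_in_log, age_subscore, hotness;

	assert(sample_interval != 0 && aggr_interval >= sample_interval);

	max_nr_accesses = aggr_interval / sample_interval;
	freq_subscore = nr_accesses * SKETCH_MAX_SUBSCORE / max_nr_accesses;

	/* Count how many times the age in seconds can be halved. */
	age_in_sec = (unsigned long)age * aggr_interval / 1000000;
	for (age_in_log = 0;
	     age_in_log < SKETCH_MAX_AGE_IN_LOG && age_in_sec;
	     age_in_log++, age_in_sec >>= 1)
		;

	/* If frequency is 0, higher age means it's colder */
	if (freq_subscore == 0)
		age_in_log *= -1;

	/* Shift [-32, 32] to [0, 64], then scale to [0, 100]. */
	age_in_log += SKETCH_MAX_AGE_IN_LOG;
	age_subscore = age_in_log * SKETCH_MAX_SUBSCORE /
		SKETCH_MAX_AGE_IN_LOG / 2;

	hotness = freq_weight * freq_subscore + age_weight * age_subscore;
	if (freq_weight + age_weight)
		hotness /= freq_weight + age_weight;
	hotness = hotness * SKETCH_DAMOS_MAX_SCORE / SKETCH_MAX_SUBSCORE;

	return SKETCH_DAMOS_MAX_SCORE - hotness;
}
```

With the default intervals from damon_new_ctx() (100 ms aggregation, 5 ms
sampling) and equal weights, a region that was never accessed for 100
aggregation intervals gets a higher score than one accessed on every
sample, so it is preferred for pageout.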
* [patch 203/227] mm/damon: let monitoring operations can be registered and selected
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: xhao, rientjes, sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon: let monitoring operations can be registered and selected
In-kernel DAMON user code, such as the DAMON debugfs interface, has to set
the 'struct damon_operations' of its 'struct damon_ctx' on its own.
Therefore, the client code has to depend on every monitoring operations
implementation that it could use. For example, the DAMON debugfs interface
depends on both vaddr and paddr, while some users are not interested in
both.
To minimize such unnecessary dependencies, this commit lets monitoring
operations be registered by the implementing code and then dynamically
selected by the user code without a build-time dependency.
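The register-then-select flow described above can be modelled in userspace
roughly as follows. This is a sketch under assumptions, not the kernel
implementation: all sketch_* names are hypothetical, locking is omitted,
and an explicit flag array replaces the memcmp()-against-an-empty-struct
check that the patch itself uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the shape of 'enum damon_ops_id' from the patch. */
enum sketch_ops_id {
	SKETCH_OPS_VADDR,
	SKETCH_OPS_PADDR,
	SKETCH_NR_OPS,
};

struct sketch_ctx;

/* Hypothetical, reduced stand-in for 'struct damon_operations'. */
struct sketch_ops {
	enum sketch_ops_id id;
	void (*init)(struct sketch_ctx *ctx);
};

/* Hypothetical, reduced stand-in for 'struct damon_ctx'. */
struct sketch_ctx {
	struct sketch_ops ops;
	bool initialized;
};

static struct sketch_ops sketch_registered_ops[SKETCH_NR_OPS];
static bool sketch_is_registered[SKETCH_NR_OPS];

/* Example callback an implementation might register. */
void sketch_va_init(struct sketch_ctx *ctx)
{
	assert(ctx != NULL);
	ctx->initialized = true;
}

/* Called by the implementing code; fails for bad or duplicate ids. */
int sketch_register_ops(const struct sketch_ops *ops)
{
	if ((unsigned int)ops->id >= (unsigned int)SKETCH_NR_OPS)
		return -EINVAL;
	if (sketch_is_registered[ops->id])
		return -EINVAL;
	sketch_registered_ops[ops->id] = *ops;
	sketch_is_registered[ops->id] = true;
	return 0;
}

/* Called by the user code; copies the registered set into the context. */
int sketch_select_ops(struct sketch_ctx *ctx, enum sketch_ops_id id)
{
	if ((unsigned int)id >= (unsigned int)SKETCH_NR_OPS)
		return -EINVAL;
	if (!sketch_is_registered[id])
		return -EINVAL;
	ctx->ops = sketch_registered_ops[id];
	return 0;
}
```

In this model the user code refers only to an id, so it needs no symbol
from any particular implementation; selecting an id that nothing has
registered fails with -EINVAL instead of failing at link time.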
Link: https://lkml.kernel.org/r/20220215184603.1479-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/damon.h | 18 ++++++++++
mm/damon/core.c | 66 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 84 insertions(+)
--- a/include/linux/damon.h~mm-damon-let-monitoring-operations-can-be-registered-and-selected
+++ a/include/linux/damon.h
@@ -253,11 +253,24 @@ struct damos {
struct list_head list;
};
+/**
+ * enum damon_ops_id - Identifier for each monitoring operations implementation
+ *
+ * @DAMON_OPS_VADDR: Monitoring operations for virtual address spaces
+ * @DAMON_OPS_PADDR: Monitoring operations for the physical address space
+ */
+enum damon_ops_id {
+ DAMON_OPS_VADDR,
+ DAMON_OPS_PADDR,
+ NR_DAMON_OPS,
+};
+
struct damon_ctx;
/**
* struct damon_operations - Monitoring operations for given use cases.
*
+ * @id: Identifier of this operations set.
* @init: Initialize operations-related data structures.
* @update: Update operations-related data structures.
* @prepare_access_checks: Prepare next access check of target regions.
@@ -277,6 +290,8 @@ struct damon_ctx;
* &damon_ctx.sample_interval. Finally, @reset_aggregated is called after each
* &damon_ctx.aggr_interval.
*
+ * Each &struct damon_operations instance having valid @id can be registered
+ * via damon_register_ops() and selected by damon_select_ops() later.
* @init should initialize operations-related data structures. For example,
* this could be used to construct proper monitoring target regions and link
* those to @damon_ctx.adaptive_targets.
@@ -301,6 +316,7 @@ struct damon_ctx;
* @cleanup is called from @kdamond just before its termination.
*/
struct damon_operations {
+ enum damon_ops_id id;
void (*init)(struct damon_ctx *context);
void (*update)(struct damon_ctx *context);
void (*prepare_access_checks)(struct damon_ctx *context);
@@ -489,6 +505,8 @@ int damon_set_attrs(struct damon_ctx *ct
int damon_set_schemes(struct damon_ctx *ctx,
struct damos **schemes, ssize_t nr_schemes);
int damon_nr_running_ctxs(void);
+int damon_register_ops(struct damon_operations *ops);
+int damon_select_ops(struct damon_ctx *ctx, enum damon_ops_id id);
int damon_start(struct damon_ctx **ctxs, int nr_ctxs);
int damon_stop(struct damon_ctx **ctxs, int nr_ctxs);
--- a/mm/damon/core.c~mm-damon-let-monitoring-operations-can-be-registered-and-selected
+++ a/mm/damon/core.c
@@ -25,6 +25,72 @@
static DEFINE_MUTEX(damon_lock);
static int nr_running_ctxs;
+static DEFINE_MUTEX(damon_ops_lock);
+static struct damon_operations damon_registered_ops[NR_DAMON_OPS];
+
+/* Should be called under damon_ops_lock with id smaller than NR_DAMON_OPS */
+static bool damon_registered_ops_id(enum damon_ops_id id)
+{
+ struct damon_operations empty_ops = {};
+
+ if (!memcmp(&empty_ops, &damon_registered_ops[id], sizeof(empty_ops)))
+ return false;
+ return true;
+}
+
+/**
+ * damon_register_ops() - Register a monitoring operations set to DAMON.
+ * @ops: monitoring operations set to register.
+ *
+ * This function registers a monitoring operations set of valid &struct
+ * damon_operations->id so that others can find and use them later.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+int damon_register_ops(struct damon_operations *ops)
+{
+ int err = 0;
+
+ if (ops->id >= NR_DAMON_OPS)
+ return -EINVAL;
+ mutex_lock(&damon_ops_lock);
+ /* Fail for already registered ops */
+ if (damon_registered_ops_id(ops->id)) {
+ err = -EINVAL;
+ goto out;
+ }
+ damon_registered_ops[ops->id] = *ops;
+out:
+ mutex_unlock(&damon_ops_lock);
+ return err;
+}
+
+/**
+ * damon_select_ops() - Select a monitoring operations to use with the context.
+ * @ctx: monitoring context to use the operations.
+ * @id: id of the registered monitoring operations to select.
+ *
+ * This function finds registered monitoring operations set of @id and make
+ * @ctx to use it.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+int damon_select_ops(struct damon_ctx *ctx, enum damon_ops_id id)
+{
+ int err = 0;
+
+ if (id >= NR_DAMON_OPS)
+ return -EINVAL;
+
+ mutex_lock(&damon_ops_lock);
+ if (!damon_registered_ops_id(id))
+ err = -EINVAL;
+ else
+ ctx->ops = damon_registered_ops[id];
+ mutex_unlock(&damon_ops_lock);
+ return err;
+}
+
/*
* Construct a damon_region struct
*
_
^ permalink raw reply [flat|nested] 786+ messages in thread
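The register/select pair added by this patch can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel code: struct ops, struct ctx, register_ops(), select_ops() and example_target_valid() are hypothetical stand-ins, and a pthread mutex is assumed in place of the kernel mutex. It keeps the patch's approach of treating an all-zero table slot as "not registered".

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Mirrors enum damon_ops_id from the patch. */
enum ops_id { OPS_VADDR, OPS_PADDR, NR_OPS };

/* Hypothetical, trimmed-down stand-in for struct damon_operations. */
struct ops {
	enum ops_id id;
	bool (*target_valid)(void *target);
};

/* Hypothetical stand-in for struct damon_ctx. */
struct ctx {
	struct ops ops;
};

static pthread_mutex_t ops_lock = PTHREAD_MUTEX_INITIALIZER;
static struct ops registered_ops[NR_OPS];	/* zero-filled: all slots empty */

/* Caller holds ops_lock and has already checked id < NR_OPS. */
static bool ops_id_registered(enum ops_id id)
{
	struct ops empty;

	memset(&empty, 0, sizeof(empty));
	return memcmp(&empty, &registered_ops[id], sizeof(empty)) != 0;
}

/* Copy *ops into the table; fail if the id is bad or the slot is taken. */
int register_ops(const struct ops *ops)
{
	int err = 0;

	if (ops->id >= NR_OPS)
		return -EINVAL;
	pthread_mutex_lock(&ops_lock);
	if (ops_id_registered(ops->id))
		err = -EINVAL;
	else
		registered_ops[ops->id] = *ops;
	pthread_mutex_unlock(&ops_lock);
	return err;
}

/* Copy the registered set for id into ctx; fail if nothing is registered. */
int select_ops(struct ctx *ctx, enum ops_id id)
{
	int err = 0;

	if (id >= NR_OPS)
		return -EINVAL;
	pthread_mutex_lock(&ops_lock);
	if (!ops_id_registered(id))
		err = -EINVAL;
	else
		ctx->ops = registered_ops[id];
	pthread_mutex_unlock(&ops_lock);
	return err;
}

/* Hypothetical callback, only here so that a registered slot is non-zero. */
bool example_target_valid(void *target)
{
	(void)target;
	return true;
}
```

One consequence of the all-zero test, at least in this sketch: an operations set with id 0 and every callback NULL would be indistinguishable from an empty slot. Both real sets registered later in the series carry non-NULL callbacks, so they are not affected.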
* [patch 204/227] mm/damon/paddr,vaddr: register themselves to DAMON in subsys_initcall
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: xhao, rientjes, sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/paddr,vaddr: register themselves to DAMON in subsys_initcall
This commit makes the monitoring operations for the physical address space
and virtual address spaces register themselves with DAMON in the
subsys_initcall step. Later, in-kernel DAMON user code can use them via
damon_select_ops() without having to depend on every possible monitoring
operations implementation.
Link: https://lkml.kernel.org/r/20220215184603.1479-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/paddr.c | 20 ++++++++++++++++++++
mm/damon/vaddr.c | 20 ++++++++++++++++++++
2 files changed, 40 insertions(+)
--- a/mm/damon/paddr.c~mm-damon-paddrvaddr-register-themselves-to-damon-in-subsys_initcall
+++ a/mm/damon/paddr.c
@@ -273,3 +273,23 @@ void damon_pa_set_operations(struct damo
ctx->ops.apply_scheme = damon_pa_apply_scheme;
ctx->ops.get_scheme_score = damon_pa_scheme_score;
}
+
+static int __init damon_pa_initcall(void)
+{
+ struct damon_operations ops = {
+ .id = DAMON_OPS_PADDR,
+ .init = NULL,
+ .update = NULL,
+ .prepare_access_checks = damon_pa_prepare_access_checks,
+ .check_accesses = damon_pa_check_accesses,
+ .reset_aggregated = NULL,
+ .target_valid = damon_pa_target_valid,
+ .cleanup = NULL,
+ .apply_scheme = damon_pa_apply_scheme,
+ .get_scheme_score = damon_pa_scheme_score,
+ };
+
+ return damon_register_ops(&ops);
+};
+
+subsys_initcall(damon_pa_initcall);
--- a/mm/damon/vaddr.c~mm-damon-paddrvaddr-register-themselves-to-damon-in-subsys_initcall
+++ a/mm/damon/vaddr.c
@@ -752,4 +752,24 @@ void damon_va_set_operations(struct damo
ctx->ops.get_scheme_score = damon_va_scheme_score;
}
+static int __init damon_va_initcall(void)
+{
+ struct damon_operations ops = {
+ .id = DAMON_OPS_VADDR,
+ .init = damon_va_init,
+ .update = damon_va_update,
+ .prepare_access_checks = damon_va_prepare_access_checks,
+ .check_accesses = damon_va_check_accesses,
+ .reset_aggregated = NULL,
+ .target_valid = damon_va_target_valid,
+ .cleanup = NULL,
+ .apply_scheme = damon_va_apply_scheme,
+ .get_scheme_score = damon_va_scheme_score,
+ };
+
+ return damon_register_ops(&ops);
+};
+
+subsys_initcall(damon_va_initcall);
+
#include "vaddr-test.h"
_
^ permalink raw reply [flat|nested] 786+ messages in thread
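The self-registration pattern in this patch builds the operations set on the stack of an init function and hands its address to damon_register_ops(). That is safe because the registry copies the structure by value. A rough userspace analog is sketched here, assuming the GCC/Clang constructor attribute as a stand-in for subsys_initcall(); register_ops(), find_ops() and pa_check_accesses() are hypothetical names, and an explicit in_use flag replaces the kernel's memcmp() check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum ops_id { OPS_VADDR, OPS_PADDR, NR_OPS };

/* Hypothetical, trimmed-down stand-in for struct damon_operations. */
struct ops {
	enum ops_id id;
	unsigned int (*check_accesses)(void);
};

static struct ops registered_ops[NR_OPS];
static bool in_use[NR_OPS];	/* explicit flag instead of memcmp() */

/* No locking: constructors run single-threaded before main(). */
int register_ops(const struct ops *ops)
{
	if (ops->id >= NR_OPS || in_use[ops->id])
		return -EINVAL;
	registered_ops[ops->id] = *ops;	/* copied by value */
	in_use[ops->id] = true;
	return 0;
}

/* Return the registered set for id, or NULL if none was registered. */
const struct ops *find_ops(enum ops_id id)
{
	return (id < NR_OPS && in_use[id]) ? &registered_ops[id] : NULL;
}

/* Hypothetical callback standing in for damon_pa_check_accesses(). */
static unsigned int pa_check_accesses(void)
{
	return 0;
}

/* Userspace analog of subsys_initcall(): runs before main(). */
__attribute__((constructor))
static void pa_register(void)
{
	/* Stack-local, like the ops variable in damon_pa_initcall(). */
	struct ops ops = {
		.id = OPS_PADDR,
		.check_accesses = pa_check_accesses,
	};

	(void)register_ops(&ops);
}
```

By the time main() runs, find_ops(OPS_PADDR) returns the copied set even though the local variable in pa_register() is long gone, while find_ops(OPS_VADDR) returns NULL because nothing registered that id.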
* [patch 205/227] mm/damon/reclaim: use damon_select_ops() instead of damon_{v,p}a_set_operations()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: xhao, rientjes, sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/reclaim: use damon_select_ops() instead of damon_{v,p}a_set_operations()
This commit makes DAMON_RECLAIM select the registered monitoring
operations for the physical address space instead of setting them on its
own. This would allow DAMON_RECLAIM to be independent of DAMON_PADDR, but
the commit leaves the dependency as is: these are the only monitoring
operations it uses, so it makes no sense to build DAMON_RECLAIM without
DAMON_PADDR.
Link: https://lkml.kernel.org/r/20220215184603.1479-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/reclaim.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/damon/reclaim.c~mm-damon-reclaim-use-damon_select_ops-instead-of-damon_vpa_set_operations
+++ a/mm/damon/reclaim.c
@@ -384,7 +384,9 @@ static int __init damon_reclaim_init(voi
if (!ctx)
return -ENOMEM;
- damon_pa_set_operations(ctx);
+ if (damon_select_ops(ctx, DAMON_OPS_PADDR))
+ return -EINVAL;
+
ctx->callback.after_aggregation = damon_reclaim_after_aggregation;
target = damon_new_target();
_
^ permalink raw reply [flat|nested] 786+ messages in thread
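Since damon_select_ops() can fail, a caller that has already allocated a context has to decide what to do with it on the error path. The hunk above returns -EINVAL right after the failed selection. The sketch below shows one possible variant that releases the context and propagates the selector's return value. It is a userspace illustration under assumptions, not the patch's code: struct ctx, select_ops(), mark_registered() and reclaim_like_init() are hypothetical, and a bitmask stands in for the registry.

```c
#include <errno.h>
#include <stdlib.h>

enum ops_id { OPS_VADDR, OPS_PADDR, NR_OPS };

/* Hypothetical stand-in for struct damon_ctx. */
struct ctx {
	enum ops_id ops_id;
};

/* Hypothetical registry: bit N set means ops id N is registered. */
static unsigned int registered_mask;

void mark_registered(enum ops_id id)
{
	registered_mask |= 1u << id;
}

int select_ops(struct ctx *ctx, enum ops_id id)
{
	if (id >= NR_OPS || !(registered_mask & (1u << id)))
		return -EINVAL;
	ctx->ops_id = id;
	return 0;
}

/* Returns 0 and stores the new context in *out, or a negative errno. */
int reclaim_like_init(struct ctx **out)
{
	struct ctx *ctx = calloc(1, sizeof(*ctx));
	int err;

	if (!ctx)
		return -ENOMEM;

	err = select_ops(ctx, OPS_PADDR);
	if (err) {
		free(ctx);	/* do not keep the context if selection failed */
		return err;
	}

	*out = ctx;
	return 0;
}
```

If the physical address operations were never registered, the call fails with -EINVAL and leaves *out untouched; once they are registered, the same call succeeds and hands back a context that uses them.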
* [patch 206/227] mm/damon/dbgfs: use damon_select_ops() instead of damon_{v,p}a_set_operations()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:48 UTC (permalink / raw)
To: xhao, rientjes, sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/dbgfs: use damon_select_ops() instead of damon_{v,p}a_set_operations()
This commit makes the DAMON debugfs interface select the registered
monitoring operations for the physical address space or virtual address
spaces, depending on user requests, instead of setting them on its own.
Note that the DAMON debugfs interface still depends on DAMON_VADDR after
this change, because it also uses one of its symbols,
'damon_va_target_valid'.
Link: https://lkml.kernel.org/r/20220215184603.1479-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/dbgfs.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
--- a/mm/damon/dbgfs.c~mm-damon-dbgfs-use-damon_select_ops-instead-of-damon_vpa_set_operations
+++ a/mm/damon/dbgfs.c
@@ -474,12 +474,18 @@ static ssize_t dbgfs_target_ids_write(st
/* remove previously set targets */
dbgfs_set_targets(ctx, 0, NULL);
+ if (!nr_targets) {
+ ret = count;
+ goto unlock_out;
+ }
/* Configure the context for the address space type */
if (id_is_pid)
- damon_va_set_operations(ctx);
+ ret = damon_select_ops(ctx, DAMON_OPS_VADDR);
else
- damon_pa_set_operations(ctx);
+ ret = damon_select_ops(ctx, DAMON_OPS_PADDR);
+ if (ret)
+ goto unlock_out;
ret = dbgfs_set_targets(ctx, nr_targets, target_pids);
if (!ret)
@@ -735,7 +741,11 @@ static struct damon_ctx *dbgfs_new_ctx(v
if (!ctx)
return NULL;
- damon_va_set_operations(ctx);
+ if (damon_select_ops(ctx, DAMON_OPS_VADDR) && damon_select_ops(ctx,
+ DAMON_OPS_PADDR)) {
+ damon_destroy_ctx(ctx);
+ return NULL;
+ }
ctx->callback.before_terminate = dbgfs_before_terminate;
return ctx;
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
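The condition added to dbgfs_new_ctx() relies on short-circuit evaluation: the second damon_select_ops() call runs only if the first one returned non-zero, and the context is rejected only if both fail. In effect it means "prefer vaddr, fall back to paddr". A small userspace sketch of that behaviour follows; select_ops(), set_registered() and select_default_ops() are hypothetical helpers and the bitmask registry is an assumption made for brevity.

```c
#include <errno.h>

enum ops_id { OPS_VADDR, OPS_PADDR, NR_OPS };

/* Hypothetical stand-in for struct damon_ctx. */
struct ctx {
	enum ops_id ops_id;
};

/* Hypothetical registry: bit N set means ops id N is registered. */
static unsigned int registered_mask;

void set_registered(unsigned int mask)
{
	registered_mask = mask;
}

int select_ops(struct ctx *ctx, enum ops_id id)
{
	if (id >= NR_OPS || !(registered_mask & (1u << id)))
		return -EINVAL;
	ctx->ops_id = id;
	return 0;
}

/*
 * Same shape as the check in dbgfs_new_ctx(): try VADDR first, try PADDR
 * only if VADDR failed, and report an error only if both failed.
 */
int select_default_ops(struct ctx *ctx)
{
	if (select_ops(ctx, OPS_VADDR) && select_ops(ctx, OPS_PADDR))
		return -EINVAL;
	return 0;
}
```

With nothing registered the helper fails; with only paddr registered the context ends up on paddr; with both registered it stays on vaddr because the second call is never made.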
* [patch 207/227] mm/damon/dbgfs: use operations id for knowing if the target has pid
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, rientjes, sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/dbgfs: use operations id for knowing if the target has pid
The DAMON debugfs interface depends on the monitoring operations for
virtual address spaces because it decides whether the target has a pid by
checking whether the context is configured to use one of the virtual
address space monitoring operation functions. That check can now be
replaced with a comparison against 'enum damon_ops_id', which removes the
dependency. This commit makes the change.
Link: https://lkml.kernel.org/r/20220215184603.1479-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/dbgfs.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/mm/damon/dbgfs.c~mm-damon-dbgfs-use-operations-id-for-knowing-if-the-target-has-pid
+++ a/mm/damon/dbgfs.c
@@ -277,7 +277,7 @@ out:
static inline bool target_has_pid(const struct damon_ctx *ctx)
{
- return ctx->ops.target_valid == damon_va_target_valid;
+ return ctx->ops.id == DAMON_OPS_VADDR;
}
static ssize_t sprint_target_ids(struct damon_ctx *ctx, char *buf, ssize_t len)
@@ -741,8 +741,8 @@ static struct damon_ctx *dbgfs_new_ctx(v
if (!ctx)
return NULL;
- if (damon_select_ops(ctx, DAMON_OPS_VADDR) && damon_select_ops(ctx,
- DAMON_OPS_PADDR)) {
+ if (damon_select_ops(ctx, DAMON_OPS_VADDR) &&
+ damon_select_ops(ctx, DAMON_OPS_PADDR)) {
damon_destroy_ctx(ctx);
return NULL;
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
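The difference between the two forms of target_has_pid() is what the caller must be able to name. Comparing a callback pointer needs the callback's symbol to be visible to the caller; comparing the id needs only the enum. A compact userspace sketch, with hypothetical stand-ins for the DAMON types and for damon_va_target_valid():

```c
#include <stdbool.h>
#include <stddef.h>

enum ops_id { OPS_VADDR, OPS_PADDR, NR_OPS };

/* Hypothetical, trimmed-down stand-in for struct damon_operations. */
struct ops {
	enum ops_id id;
	bool (*target_valid)(void *target);
};

/* Hypothetical stand-in for struct damon_ctx. */
struct ctx {
	struct ops ops;
};

/* Stand-in for damon_va_target_valid(); the old check needs this symbol. */
bool va_target_valid(void *target)
{
	return target != NULL;
}

/* Old style: identify the operations set by one of its callbacks. */
bool target_has_pid_by_callback(const struct ctx *ctx)
{
	return ctx->ops.target_valid == va_target_valid;
}

/* New style: identify the operations set by its id alone. */
bool target_has_pid_by_id(const struct ctx *ctx)
{
	return ctx->ops.id == OPS_VADDR;
}
```

For a properly selected operations set the two checks agree, but only the id-based one would still compile if va_target_valid() were made static in another file.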
* [patch 208/227] mm/damon/dbgfs-test: fix is_target_id() change
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, rientjes, sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/dbgfs-test: fix is_target_id() change
The DAMON kunit tests for the DAMON debugfs interface fail because they
still assume that setting empty monitoring operations makes the debugfs
interface believe the target of the context has no pid. This commit fixes
the test failures by explicitly setting the context's monitoring
operations to those for the physical address space, which lets debugfs
know the target will not have a pid.
Link: https://lkml.kernel.org/r/20220215184603.1479-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/dbgfs-test.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/damon/dbgfs-test.h~mm-damon-dbgfs-test-fix-is_target_id-change
+++ a/mm/damon/dbgfs-test.h
@@ -74,7 +74,7 @@ static void damon_dbgfs_test_set_targets
char buf[64];
/* Make DAMON consider target has no pid */
- ctx->ops = (struct damon_operations){};
+ damon_select_ops(ctx, DAMON_OPS_PADDR);
dbgfs_set_targets(ctx, 0, NULL);
sprint_target_ids(ctx, buf, 64);
@@ -111,6 +111,8 @@ static void damon_dbgfs_test_set_init_re
int i, rc;
char buf[256];
+ damon_select_ops(ctx, DAMON_OPS_PADDR);
+
dbgfs_set_targets(ctx, 3, NULL);
/* Put valid inputs and check the results */
_
^ permalink raw reply [flat|nested] 786+ messages in thread
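Why an empty operations set stopped meaning "no pid" follows from the enum layout: DAMON_OPS_VADDR is the first enumerator, so its value is 0, and a zero-initialized struct damon_operations therefore carries the vaddr id. The userspace sketch below reproduces that effect with hypothetical stand-in types; clear_ops() mimics the old test setup and use_paddr() mimics the effect of the replacement call.

```c
#include <stdbool.h>

enum ops_id { OPS_VADDR, OPS_PADDR, NR_OPS };

/* Hypothetical, trimmed-down stand-in for struct damon_operations. */
struct ops {
	enum ops_id id;
	bool (*target_valid)(void *target);
};

/* Hypothetical stand-in for struct damon_ctx. */
struct ctx {
	struct ops ops;
};

/* Same check as target_has_pid() after the previous patch. */
bool target_has_pid(const struct ctx *ctx)
{
	return ctx->ops.id == OPS_VADDR;
}

/* What the old test did: wipe the ops, leaving id == 0 == OPS_VADDR. */
void clear_ops(struct ctx *ctx)
{
	ctx->ops = (struct ops){ 0 };
}

/* Effect of the fixed test: the context uses physical address ops. */
void use_paddr(struct ctx *ctx)
{
	ctx->ops.id = OPS_PADDR;
}
```

After clear_ops() the context reports that its target has a pid, which is the opposite of what the old test intended; only an explicit switch to the paddr id makes the check return false.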
* [patch 209/227] mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, rientjes, sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
Because the DAMON debugfs interface and DAMON-based proactive reclaim now
use monitoring operations via the registration mechanism, the
damon_{p,v}a_{target_valid,set_operations}() functions have no users. This
commit cleans them up.
Link: https://lkml.kernel.org/r/20220215184603.1479-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/damon.h | 10 ----------
mm/damon/paddr.c | 20 +-------------------
mm/damon/vaddr.c | 15 +--------------
3 files changed, 2 insertions(+), 43 deletions(-)
--- a/include/linux/damon.h~mm-damon-paddrvaddr-remove-damon_pva_target_validset_operations
+++ a/include/linux/damon.h
@@ -513,14 +513,4 @@ int damon_stop(struct damon_ctx **ctxs,
#endif /* CONFIG_DAMON */
-#ifdef CONFIG_DAMON_VADDR
-bool damon_va_target_valid(void *t);
-void damon_va_set_operations(struct damon_ctx *ctx);
-#endif /* CONFIG_DAMON_VADDR */
-
-#ifdef CONFIG_DAMON_PADDR
-bool damon_pa_target_valid(void *t);
-void damon_pa_set_operations(struct damon_ctx *ctx);
-#endif /* CONFIG_DAMON_PADDR */
-
#endif /* _DAMON_H */
--- a/mm/damon/paddr.c~mm-damon-paddrvaddr-remove-damon_pva_target_validset_operations
+++ a/mm/damon/paddr.c
@@ -208,11 +208,6 @@ static unsigned int damon_pa_check_acces
return max_nr_accesses;
}
-bool damon_pa_target_valid(void *t)
-{
- return true;
-}
-
static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
struct damon_target *t, struct damon_region *r,
struct damos *scheme)
@@ -261,19 +256,6 @@ static int damon_pa_scheme_score(struct
return DAMOS_MAX_SCORE;
}
-void damon_pa_set_operations(struct damon_ctx *ctx)
-{
- ctx->ops.init = NULL;
- ctx->ops.update = NULL;
- ctx->ops.prepare_access_checks = damon_pa_prepare_access_checks;
- ctx->ops.check_accesses = damon_pa_check_accesses;
- ctx->ops.reset_aggregated = NULL;
- ctx->ops.target_valid = damon_pa_target_valid;
- ctx->ops.cleanup = NULL;
- ctx->ops.apply_scheme = damon_pa_apply_scheme;
- ctx->ops.get_scheme_score = damon_pa_scheme_score;
-}
-
static int __init damon_pa_initcall(void)
{
struct damon_operations ops = {
@@ -283,7 +265,7 @@ static int __init damon_pa_initcall(void
.prepare_access_checks = damon_pa_prepare_access_checks,
.check_accesses = damon_pa_check_accesses,
.reset_aggregated = NULL,
- .target_valid = damon_pa_target_valid,
+ .target_valid = NULL,
.cleanup = NULL,
.apply_scheme = damon_pa_apply_scheme,
.get_scheme_score = damon_pa_scheme_score,
--- a/mm/damon/vaddr.c~mm-damon-paddrvaddr-remove-damon_pva_target_validset_operations
+++ a/mm/damon/vaddr.c
@@ -653,7 +653,7 @@ static unsigned int damon_va_check_acces
* Functions for the target validity check and cleanup
*/
-bool damon_va_target_valid(void *target)
+static bool damon_va_target_valid(void *target)
{
struct damon_target *t = target;
struct task_struct *task;
@@ -739,19 +739,6 @@ static int damon_va_scheme_score(struct
return DAMOS_MAX_SCORE;
}
-void damon_va_set_operations(struct damon_ctx *ctx)
-{
- ctx->ops.init = damon_va_init;
- ctx->ops.update = damon_va_update;
- ctx->ops.prepare_access_checks = damon_va_prepare_access_checks;
- ctx->ops.check_accesses = damon_va_check_accesses;
- ctx->ops.reset_aggregated = NULL;
- ctx->ops.target_valid = damon_va_target_valid;
- ctx->ops.cleanup = NULL;
- ctx->ops.apply_scheme = damon_va_apply_scheme;
- ctx->ops.get_scheme_score = damon_va_scheme_score;
-}
-
static int __init damon_va_initcall(void)
{
struct damon_operations ops = {
_
^ permalink raw reply [flat|nested] 786+ messages in thread
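This patch replaces the paddr callback that always returned true with a NULL .target_valid. That only preserves behaviour if whoever invokes the callback treats NULL as "always valid". The code that does so is not part of this section, so the following is a sketch under that assumption; target_is_valid() and non_null_target_valid() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct damon_operations. */
struct ops {
	bool (*target_valid)(void *target);
};

/*
 * Optional-callback convention assumed here: a NULL target_valid means
 * the operations set has no validity check, so every target is valid.
 */
bool target_is_valid(const struct ops *ops, void *target)
{
	if (!ops->target_valid)
		return true;
	return ops->target_valid(target);
}

/* Hypothetical callback for an ops set that does check its targets. */
bool non_null_target_valid(void *target)
{
	return target != NULL;
}
```

Under this convention a set with a NULL callback behaves exactly like one whose callback unconditionally returns true, which is the behaviour the removed damon_pa_target_valid() had.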
* [patch 209/227] mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
@ 2022-03-22 21:49 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, rientjes, sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
Because DAMON debugfs interface and DAMON-based proactive reclaim are now
using monitoring operations via registration mechanism,
damon_{p,v}a_{target_valid,set_operations}() functions have no user. This
commit clean them up.
Link: https://lkml.kernel.org/r/20220215184603.1479-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/damon.h | 10 ----------
mm/damon/paddr.c | 20 +-------------------
mm/damon/vaddr.c | 15 +--------------
3 files changed, 2 insertions(+), 43 deletions(-)
--- a/include/linux/damon.h~mm-damon-paddrvaddr-remove-damon_pva_target_validset_operations
+++ a/include/linux/damon.h
@@ -513,14 +513,4 @@ int damon_stop(struct damon_ctx **ctxs,
#endif /* CONFIG_DAMON */
-#ifdef CONFIG_DAMON_VADDR
-bool damon_va_target_valid(void *t);
-void damon_va_set_operations(struct damon_ctx *ctx);
-#endif /* CONFIG_DAMON_VADDR */
-
-#ifdef CONFIG_DAMON_PADDR
-bool damon_pa_target_valid(void *t);
-void damon_pa_set_operations(struct damon_ctx *ctx);
-#endif /* CONFIG_DAMON_PADDR */
-
#endif /* _DAMON_H */
--- a/mm/damon/paddr.c~mm-damon-paddrvaddr-remove-damon_pva_target_validset_operations
+++ a/mm/damon/paddr.c
@@ -208,11 +208,6 @@ static unsigned int damon_pa_check_acces
return max_nr_accesses;
}
-bool damon_pa_target_valid(void *t)
-{
- return true;
-}
-
static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
struct damon_target *t, struct damon_region *r,
struct damos *scheme)
@@ -261,19 +256,6 @@ static int damon_pa_scheme_score(struct
return DAMOS_MAX_SCORE;
}
-void damon_pa_set_operations(struct damon_ctx *ctx)
-{
- ctx->ops.init = NULL;
- ctx->ops.update = NULL;
- ctx->ops.prepare_access_checks = damon_pa_prepare_access_checks;
- ctx->ops.check_accesses = damon_pa_check_accesses;
- ctx->ops.reset_aggregated = NULL;
- ctx->ops.target_valid = damon_pa_target_valid;
- ctx->ops.cleanup = NULL;
- ctx->ops.apply_scheme = damon_pa_apply_scheme;
- ctx->ops.get_scheme_score = damon_pa_scheme_score;
-}
-
static int __init damon_pa_initcall(void)
{
struct damon_operations ops = {
@@ -283,7 +265,7 @@ static int __init damon_pa_initcall(void
.prepare_access_checks = damon_pa_prepare_access_checks,
.check_accesses = damon_pa_check_accesses,
.reset_aggregated = NULL,
- .target_valid = damon_pa_target_valid,
+ .target_valid = NULL,
.cleanup = NULL,
.apply_scheme = damon_pa_apply_scheme,
.get_scheme_score = damon_pa_scheme_score,
--- a/mm/damon/vaddr.c~mm-damon-paddrvaddr-remove-damon_pva_target_validset_operations
+++ a/mm/damon/vaddr.c
@@ -653,7 +653,7 @@ static unsigned int damon_va_check_acces
* Functions for the target validity check and cleanup
*/
-bool damon_va_target_valid(void *target)
+static bool damon_va_target_valid(void *target)
{
struct damon_target *t = target;
struct task_struct *task;
@@ -739,19 +739,6 @@ static int damon_va_scheme_score(struct
return DAMOS_MAX_SCORE;
}
-void damon_va_set_operations(struct damon_ctx *ctx)
-{
- ctx->ops.init = damon_va_init;
- ctx->ops.update = damon_va_update;
- ctx->ops.prepare_access_checks = damon_va_prepare_access_checks;
- ctx->ops.check_accesses = damon_va_check_accesses;
- ctx->ops.reset_aggregated = NULL;
- ctx->ops.target_valid = damon_va_target_valid;
- ctx->ops.cleanup = NULL;
- ctx->ops.apply_scheme = damon_va_apply_scheme;
- ctx->ops.get_scheme_score = damon_va_scheme_score;
-}
-
static int __init damon_va_initcall(void)
{
struct damon_operations ops = {
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 210/227] mm/damon: remove unnecessary CONFIG_DAMON option
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: sj, tangmeng, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: tangmeng <tangmeng@uniontech.com>
Subject: mm/damon: remove unnecessary CONFIG_DAMON option
mm/Makefile already has:
obj-$(CONFIG_DAMON) += damon/
so mm/damon/ is built only when CONFIG_DAMON is set. That makes
'obj-$(CONFIG_DAMON) :=' in mm/damon/Makefile unnecessary; replace it with
'obj-y :='.
Link: https://lkml.kernel.org/r/20220221065255.19991-1-tangmeng@uniontech.com
Signed-off-by: tangmeng <tangmeng@uniontech.com>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/damon/Makefile~mm-damon-remove-unnecessary-config_damon-option
+++ a/mm/damon/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_DAMON) := core.o
+obj-y := core.o
obj-$(CONFIG_DAMON_VADDR) += ops-common.o vaddr.o
obj-$(CONFIG_DAMON_PADDR) += ops-common.o paddr.o
obj-$(CONFIG_DAMON_DBGFS) += dbgfs.o
_
* [patch 211/227] Docs/vm/damon: call low level monitoring primitives the operations
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: corbet, sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: Docs/vm/damon: call low level monitoring primitives the operations
Patch series "Docs/damon: Update documents for better consistency".
Some DAMON documents have not been properly updated for the latest version.
This patchset updates those parts.
This patch (of 3):
The DAMON code calls the implementations of the low level monitoring
primitives the monitoring operations. The documentation could keep calling
them primitives implementations, because there is no real difference
between the concepts, but making it consistent with the code is better.
This commit therefore converts the sentences in the docs that specifically
refer to the implementations of the primitives so that they call them
monitoring operations.
Link: https://lkml.kernel.org/r/20220222170100.17068-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20220222170100.17068-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/vm/damon/design.rst | 24 ++++++++++++------------
Documentation/vm/damon/faq.rst | 2 +-
2 files changed, 13 insertions(+), 13 deletions(-)
--- a/Documentation/vm/damon/design.rst~docs-vm-damon-call-low-level-monitoring-primitives-the-operations
+++ a/Documentation/vm/damon/design.rst
@@ -13,12 +13,13 @@ primitives that dependent on and optimiz
the other hand, the accuracy and overhead tradeoff mechanism, which is the core
of DAMON, is in the pure logic space. DAMON separates the two parts in
different layers and defines its interface to allow various low level
-primitives implementations configurable with the core logic.
+primitives implementations configurable with the core logic. We call the low
+level primitives implementations monitoring operations.
Due to this separated design and the configurable interface, users can extend
-DAMON for any address space by configuring the core logics with appropriate low
-level primitive implementations. If appropriate one is not provided, users can
-implement the primitives on their own.
+DAMON for any address space by configuring the core logics with appropriate
+monitoring operations. If appropriate one is not provided, users can implement
+the operations on their own.
For example, physical memory, virtual memory, swap space, those for specific
processes, NUMA nodes, files, and backing memory devices would be supportable.
@@ -26,25 +27,24 @@ Also, if some architectures or devices s
primitives, those will be easily configurable.
-Reference Implementations of Address Space Specific Primitives
-==============================================================
+Reference Implementations of Address Space Specific Monitoring Operations
+=========================================================================
-The low level primitives for the fundamental access monitoring are defined in
-two parts:
+The monitoring operations are defined in two parts:
1. Identification of the monitoring target address range for the address space.
2. Access check of specific address range in the target space.
-DAMON currently provides the implementations of the primitives for the physical
+DAMON currently provides the implementations of the operations for the physical
and virtual address spaces. Below two subsections describe how those work.
VMA-based Target Address Range Construction
-------------------------------------------
-This is only for the virtual address space primitives implementation. That for
-the physical address space simply asks users to manually set the monitoring
-target address ranges.
+This is only for the virtual address space monitoring operations
+implementation. That for the physical address space simply asks users to
+manually set the monitoring target address ranges.
Only small parts in the super-huge virtual address space of the processes are
mapped to the physical memory and accessed. Thus, tracking the unmapped
--- a/Documentation/vm/damon/faq.rst~docs-vm-damon-call-low-level-monitoring-primitives-the-operations
+++ a/Documentation/vm/damon/faq.rst
@@ -31,7 +31,7 @@ Does DAMON support virtual memory only?
=======================================
No. The core of the DAMON is address space independent. The address space
-specific low level primitive parts including monitoring target regions
+specific monitoring operations including monitoring target regions
constructions and actual access checks can be implemented and configured on the
DAMON core by the users. In this way, DAMON users can monitor any address
space with any access check technique.
_
* [patch 212/227] Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: corbet, sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling
In DAMON's early development stage, before it was merged into the
mainline, it was designed to be mutually exclusive with Idle page tracking
so that the two could not interfere with each other. Later, but still
before the merge, because Idle page tracking is fully under the control of
sysadmins, we made resolving the conflict the responsibility of sysadmins.
The document was not updated for that change, though. This commit updates
the document accordingly.
Link: https://lkml.kernel.org/r/20220222170100.17068-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/vm/damon/design.rst | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/Documentation/vm/damon/design.rst~docs-vm-damon-design-update-damon-idle-page-tracking-interference-handling
+++ a/Documentation/vm/damon/design.rst
@@ -84,9 +84,10 @@ table having a mapping to the address.
and clear the bit(s) for next sampling target address and checks whether the
bit(s) set again after one sampling period. This could disturb other kernel
subsystems using the Accessed bits, namely Idle page tracking and the reclaim
-logic. To avoid such disturbances, DAMON makes it mutually exclusive with Idle
-page tracking and uses ``PG_idle`` and ``PG_young`` page flags to solve the
-conflict with the reclaim logic, as Idle page tracking does.
+logic. DAMON does nothing to avoid disturbing Idle page tracking, so handling
+the interference is the responsibility of sysadmins. However, it solves the
+conflict with the reclaim logic using ``PG_idle`` and ``PG_young`` page flags,
+as Idle page tracking does.
Address Space Independent Core Mechanisms
_
* [patch 213/227] Docs/damon: update outdated term 'regions update interval'
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: corbet, sj, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: Docs/damon: update outdated term 'regions update interval'
Before DAMON was merged into the mainline, the concept of 'regions update
interval' was generalized into the time interval for updating any
monitoring operations related data structure, but the documents were not
updated accordingly. This commit updates the documents for better
consistency.
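As a rough illustration of the generalized meaning, the sketch below models
a monitoring loop that invokes an operations-provided update callback once
per update interval, whatever data structure that callback maintains. All
names here (struct mon_ops, struct mon_attrs, run_monitor(),
count_update()) are hypothetical and the loop is a simplification, not
DAMON's implementation.

```c
#include <stddef.h>

/* Hypothetical monitoring operations: only the callback relevant here. */
struct mon_ops {
	void (*update)(void *data);	/* refresh ops-related data; may be NULL */
};

/* Hypothetical monitoring attributes, in microseconds. */
struct mon_attrs {
	unsigned long sample_us;	/* sampling interval */
	unsigned long update_us;	/* update interval */
};

/* Example callback: counts invocations in the counter that data points to. */
void count_update(void *data)
{
	(*(unsigned long *)data)++;
}

/*
 * Simulate nr_samples sampling steps and return how many times
 * ops->update() was invoked.
 */
unsigned long run_monitor(const struct mon_ops *ops,
			  const struct mon_attrs *attrs,
			  unsigned long nr_samples, void *data)
{
	unsigned long since_update = 0, nr_updates = 0, i;

	for (i = 0; i < nr_samples; i++) {
		/* ... access checks for this sample would go here ... */
		since_update += attrs->sample_us;
		if (since_update < attrs->update_us)
			continue;
		/* One full update interval has passed. */
		since_update = 0;
		if (ops->update) {
			ops->update(data);
			nr_updates++;
		}
	}
	return nr_updates;
}
```

With a 5 ms sampling interval and a 1,000 ms update interval, 400 samples
cover two seconds, so the callback runs twice in this model.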
Link: https://lkml.kernel.org/r/20220222170100.17068-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/admin-guide/mm/damon/usage.rst | 6 +++---
Documentation/vm/damon/design.rst | 12 +++++++-----
2 files changed, 10 insertions(+), 8 deletions(-)
--- a/Documentation/admin-guide/mm/damon/usage.rst~docs-damon-update-outdated-term-regions-update-interval
+++ a/Documentation/admin-guide/mm/damon/usage.rst
@@ -47,7 +47,7 @@ Attributes
----------
Users can get and set the ``sampling interval``, ``aggregation interval``,
-``regions update interval``, and min/max number of monitoring target regions by
+``update interval``, and min/max number of monitoring target regions by
reading from and writing to the ``attrs`` file. To know about the monitoring
attributes in detail, please refer to the :doc:`/vm/damon/design`. For
example, below commands set those values to 5 ms, 100 ms, 1,000 ms, 10 and
@@ -128,8 +128,8 @@ ranges, ``20-40`` and ``50-100`` as that
Note that this sets the initial monitoring target regions only. In case of
virtual memory monitoring, DAMON will automatically updates the boundary of the
-regions after one ``regions update interval``. Therefore, users should set the
-``regions update interval`` large enough in this case, if they don't want the
+regions after one ``update interval``. Therefore, users should set the
+``update interval`` large enough in this case, if they don't want the
update.
--- a/Documentation/vm/damon/design.rst~docs-damon-update-outdated-term-regions-update-interval
+++ a/Documentation/vm/damon/design.rst
@@ -95,8 +95,8 @@ Address Space Independent Core Mechanism
Below four sections describe each of the DAMON core mechanisms and the five
monitoring attributes, ``sampling interval``, ``aggregation interval``,
-``regions update interval``, ``minimum number of regions``, and ``maximum
-number of regions``.
+``update interval``, ``minimum number of regions``, and ``maximum number of
+regions``.
Access Frequency Monitoring
@@ -169,6 +169,8 @@ The monitoring target address range coul
virtual memory could be dynamically mapped and unmapped. Physical memory could
be hot-plugged.
-As the changes could be quite frequent in some cases, DAMON checks the dynamic
-memory mapping changes and applies it to the abstracted target area only for
-each of a user-specified time interval (``regions update interval``).
+As the changes could be quite frequent in some cases, DAMON allows the
+monitoring operations to check dynamic changes including memory mapping changes
+and applies it to monitoring operations-related data structures such as the
+abstracted monitoring target memory area only for each of a user-specified time
+interval (``update interval``).
_
* [patch 214/227] mm/damon/core: allow non-exclusive DAMON start/stop
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/core: allow non-exclusive DAMON start/stop
Patch series "Introduce DAMON sysfs interface", v3.
Introduction
============
DAMON's debugfs-based user interface (DAMON_DBGFS) has served very well so
far. However, it unnecessarily depends on debugfs, while DAMON is not
meant to be used only for debugging. Also, the interface receives
multiple values via one file. For example, the schemes file receives 18
values. As a result, it is inefficient, hard to use, and difficult to
extend. In particular, keeping backward compatibility with user space
tools is getting more and more challenging. It would be better to
implement another reliable and flexible interface and deprecate
DAMON_DBGFS in the long term.
For that reason, this patchset introduces a new sysfs-based user interface
for DAMON. The idea of the new interface is to use directory hierarchies
with one dedicated file for each value. As a short example, users can do
virtual address monitoring via the interface as below:
# cd /sys/kernel/mm/damon/admin/
# echo 1 > kdamonds/nr_kdamonds
# echo 1 > kdamonds/0/contexts/nr_contexts
# echo vaddr > kdamonds/0/contexts/0/operations
# echo 1 > kdamonds/0/contexts/0/targets/nr_targets
# echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid_target
# echo on > kdamonds/0/state
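The same sequence can also be driven from a program. The following C
sketch assumes only the file layout used by the commands above;
damon_sysfs_write(), damon_sysfs_monitor_vaddr() and the root parameter
are hypothetical helpers introduced for illustration and are not part of
any DAMON API.

```c
#include <stdio.h>

/*
 * Write one value to one file below root, similar to
 * "echo value > root/rel".  Returns 0 on success and -1 on failure.
 */
int damon_sysfs_write(const char *root, const char *rel, const char *value)
{
	char path[512];
	FILE *f;
	int n, err;

	n = snprintf(path, sizeof(path), "%s/%s", root, rel);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	err = fputs(value, f) == EOF;
	/* stdio buffers the data, so a write error may only show up here */
	err |= fclose(f) == EOF;
	return err ? -1 : 0;
}

/* Start virtual address monitoring of one process, one file per value. */
int damon_sysfs_monitor_vaddr(const char *root, const char *pid)
{
	/* { file, value } pairs; a NULL value stands for the pid argument */
	static const char * const steps[][2] = {
		{ "kdamonds/nr_kdamonds", "1" },
		{ "kdamonds/0/contexts/nr_contexts", "1" },
		{ "kdamonds/0/contexts/0/operations", "vaddr" },
		{ "kdamonds/0/contexts/0/targets/nr_targets", "1" },
		{ "kdamonds/0/contexts/0/targets/0/pid_target", NULL },
		{ "kdamonds/0/state", "on" },
	};
	size_t i;

	for (i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
		const char *value = steps[i][1] ? steps[i][1] : pid;

		if (damon_sysfs_write(root, steps[i][0], value))
			return -1;
	}
	return 0;
}
```

A caller with sufficient privileges would pass
"/sys/kernel/mm/damon/admin" as root and the target pid as a decimal
string. The order of the writes matters in this sketch because each
nr_* write is expected to create the numbered directories that the next
step uses.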
A brief representation of the file hierarchy of the DAMON sysfs interface
is as below. Children are represented with indentation, directories have
a '/' suffix, and files in each directory are separated by commas.
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/
│ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms
│ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low
│ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
Detailed usage of the files will be described in the final Documentation
patch of this patchset.
Main Difference Between DAMON_DBGFS and DAMON_SYSFS
---------------------------------------------------
At the moment, DAMON_DBGFS and DAMON_SYSFS provide the same features. One
important difference between them is their exclusiveness. DAMON_DBGFS
works in an exclusive manner, so that no other DAMON worker thread
(kdamond) in the system can run concurrently and interfere. For this
reason, DAMON_DBGFS asks users to construct all monitoring contexts and
start them at once. This is not a big problem, but it makes the operation
a little complex and inflexible.
For more flexible usage, DAMON_SYSFS moves the responsibility of
preventing any possible interference to the admins and works in a
non-exclusive manner. That is, users can configure and start contexts one
by one. Note that DAMON respects both exclusive groups and non-exclusive
groups of contexts, in a manner similar to that of reader-writer locks.
That is, if any exclusive monitoring contexts (e.g., contexts that were
started via DAMON_DBGFS) are running, DAMON_SYSFS does not start new
contexts, and vice versa.
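The reader-writer-like rule can be sketched as a small userspace model. This
is an illustration under assumptions, not the kernel code: struct
damon_model, model_start() and model_ctx_exit() are hypothetical names, the
two fields mirror the nr_running_ctxs and running_exclusive_ctxs variables
touched by the mm/damon/core.c change in this patch, locking is omitted, and
every context is assumed to start successfully.

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace model of the bookkeeping; the kernel guards it with a mutex. */
struct damon_model {
	int nr_running_ctxs;		/* contexts currently running */
	bool running_exclusive_ctxs;	/* true while an exclusive group runs */
};

/*
 * Model of the admission rule: an exclusive group needs an idle system,
 * and a non-exclusive group is refused only while an exclusive group is
 * running.
 */
static int model_start(struct damon_model *m, int nr_ctxs, bool exclusive)
{
	if ((exclusive && m->nr_running_ctxs) ||
	    (!exclusive && m->running_exclusive_ctxs))
		return -EBUSY;

	m->nr_running_ctxs += nr_ctxs;
	if (exclusive && m->nr_running_ctxs)
		m->running_exclusive_ctxs = true;
	return 0;
}

/* Model of one worker thread terminating. */
static void model_ctx_exit(struct damon_model *m)
{
	m->nr_running_ctxs--;
	if (!m->nr_running_ctxs && m->running_exclusive_ctxs)
		m->running_exclusive_ctxs = false;
}
```

In this model, two non-exclusive groups can run side by side, while an
exclusive group is admitted only once all of them have exited, and it blocks
every later start until its own last context exits.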
Future Plan of DAMON_DBGFS Deprecation
======================================
Once this patchset is merged, DAMON_DBGFS development will be frozen.
That is, we will maintain it to work as it does now so that no users will
be broken. However, it will not be extended to provide any new DAMON
features. Support will continue only until the next LTS release. After
that, we will drop DAMON_DBGFS.
User-space Tooling Compatibility
--------------------------------
As DAMON_SYSFS provides all features of DAMON_DBGFS, all user space
tooling can move to DAMON_SYSFS. As we will continue supporting
DAMON_DBGFS until the next LTS kernel release, user space tools will have
enough time to move to DAMON_SYSFS.
The official user space tool, damo[1], already supports both DAMON_SYSFS
and DAMON_DBGFS. Both the correctness tests[2] and the performance
tests[3] of DAMON also passed when using DAMON_SYSFS.
[1] https://github.com/awslabs/damo
[2] https://github.com/awslabs/damon-tests/tree/master/corr
[3] https://github.com/awslabs/damon-tests/tree/master/perf
Sequence of Patches
===================
The first two patches (patches 1-2) make core changes for DAMON_SYSFS.
The first one (patch 1) allows non-exclusive DAMON contexts so that
DAMON_SYSFS can work in non-exclusive mode, while the second one (patch 2)
adds the size of DAMON enum types so that DAMON API users can safely
iterate over the enums.
The third patch (patch 3) implements a basic sysfs stub for virtual
address space monitoring. Note that this implements only the sysfs files
and DAMON is not linked. The fourth patch (patch 4) links DAMON_SYSFS to
DAMON so that users can control DAMON using the sysfs files.
The following six patches (patches 5-10) implement, one by one, the other
DAMON features that DAMON_DBGFS supports (physical address space
monitoring, DAMON-based operation schemes, schemes quotas, schemes
prioritization weights, schemes watermarks, and schemes stats).
The following patch (patch 11) adds a simple selftest for DAMON_SYSFS, and
the final one (patch 12) documents DAMON_SYSFS.
This patch (of 13):
To avoid interference between DAMON contexts monitoring overlapping memory
regions, damon_start() works in an exclusive manner. That is,
damon_start() does nothing but fail if any context that was started by
another call of the function is still running. This makes its usage a
little restrictive. However, in some cases admins could be aware of each
DAMON usage and address such interference on their own.
This commit hence implements a non-exclusive mode of the function and
allows callers to select the mode. Note that exclusive groups and
non-exclusive groups of contexts respect each other in a manner similar to
that of reader-writer locks. Therefore, this commit does not cause any
behavioral change for the exclusive groups.
Link: https://lkml.kernel.org/r/20220228081314.5770-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20220228081314.5770-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Xin Hao <xhao@linux.alibaba.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/damon.h | 2 +-
mm/damon/core.c | 23 +++++++++++++++--------
mm/damon/dbgfs.c | 2 +-
mm/damon/reclaim.c | 2 +-
4 files changed, 18 insertions(+), 11 deletions(-)
--- a/include/linux/damon.h~mm-damon-core-allow-non-exclusive-damon-start-stop
+++ a/include/linux/damon.h
@@ -508,7 +508,7 @@ int damon_nr_running_ctxs(void);
int damon_register_ops(struct damon_operations *ops);
int damon_select_ops(struct damon_ctx *ctx, enum damon_ops_id id);
-int damon_start(struct damon_ctx **ctxs, int nr_ctxs);
+int damon_start(struct damon_ctx **ctxs, int nr_ctxs, bool exclusive);
int damon_stop(struct damon_ctx **ctxs, int nr_ctxs);
#endif /* CONFIG_DAMON */
--- a/mm/damon/core.c~mm-damon-core-allow-non-exclusive-damon-start-stop
+++ a/mm/damon/core.c
@@ -24,6 +24,7 @@
static DEFINE_MUTEX(damon_lock);
static int nr_running_ctxs;
+static bool running_exclusive_ctxs;
static DEFINE_MUTEX(damon_ops_lock);
static struct damon_operations damon_registered_ops[NR_DAMON_OPS];
@@ -434,22 +435,25 @@ static int __damon_start(struct damon_ct
* damon_start() - Starts the monitorings for a given group of contexts.
* @ctxs: an array of the pointers for contexts to start monitoring
* @nr_ctxs: size of @ctxs
+ * @exclusive: exclusiveness of this contexts group
*
* This function starts a group of monitoring threads for a group of monitoring
* contexts. One thread per each context is created and run in parallel. The
- * caller should handle synchronization between the threads by itself. If a
- * group of threads that created by other 'damon_start()' call is currently
- * running, this function does nothing but returns -EBUSY.
+ * caller should handle synchronization between the threads by itself. If
+ * @exclusive is true and a group of threads that created by other
+ * 'damon_start()' call is currently running, this function does nothing but
+ * returns -EBUSY.
*
* Return: 0 on success, negative error code otherwise.
*/
-int damon_start(struct damon_ctx **ctxs, int nr_ctxs)
+int damon_start(struct damon_ctx **ctxs, int nr_ctxs, bool exclusive)
{
int i;
int err = 0;
mutex_lock(&damon_lock);
- if (nr_running_ctxs) {
+ if ((exclusive && nr_running_ctxs) ||
+ (!exclusive && running_exclusive_ctxs)) {
mutex_unlock(&damon_lock);
return -EBUSY;
}
@@ -460,13 +464,15 @@ int damon_start(struct damon_ctx **ctxs,
break;
nr_running_ctxs++;
}
+ if (exclusive && nr_running_ctxs)
+ running_exclusive_ctxs = true;
mutex_unlock(&damon_lock);
return err;
}
/*
- * __damon_stop() - Stops monitoring of given context.
+ * __damon_stop() - Stops monitoring of a given context.
* @ctx: monitoring context
*
* Return: 0 on success, negative error code otherwise.
@@ -504,9 +510,8 @@ int damon_stop(struct damon_ctx **ctxs,
/* nr_running_ctxs is decremented in kdamond_fn */
err = __damon_stop(ctxs[i]);
if (err)
- return err;
+ break;
}
-
return err;
}
@@ -1102,6 +1107,8 @@ static int kdamond_fn(void *data)
mutex_lock(&damon_lock);
nr_running_ctxs--;
+ if (!nr_running_ctxs && running_exclusive_ctxs)
+ running_exclusive_ctxs = false;
mutex_unlock(&damon_lock);
return 0;
--- a/mm/damon/dbgfs.c~mm-damon-core-allow-non-exclusive-damon-start-stop
+++ a/mm/damon/dbgfs.c
@@ -967,7 +967,7 @@ static ssize_t dbgfs_monitor_on_write(st
return -EINVAL;
}
}
- ret = damon_start(dbgfs_ctxs, dbgfs_nr_ctxs);
+ ret = damon_start(dbgfs_ctxs, dbgfs_nr_ctxs, true);
} else if (!strncmp(kbuf, "off", count)) {
ret = damon_stop(dbgfs_ctxs, dbgfs_nr_ctxs);
} else {
--- a/mm/damon/reclaim.c~mm-damon-core-allow-non-exclusive-damon-start-stop
+++ a/mm/damon/reclaim.c
@@ -330,7 +330,7 @@ static int damon_reclaim_turn(bool on)
if (err)
goto free_scheme_out;
- err = damon_start(&ctx, 1);
+ err = damon_start(&ctx, 1, true);
if (!err) {
kdamond_pid = ctx->kdamond->pid;
return 0;
_
* [patch 215/227] mm/damon/core: add number of each enum type values
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/core: add number of each enum type values
This commit declares the number of legal values for each DAMON enum type
to make traversals of such DAMON enum types easy and safe.
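As an illustration of why such a count helps, the userspace C sketch below
walks one of the enums using its NR_* value as the loop bound. The enum
mirrors the layout in this patch; the name table, its strings, and
wmark_metric_from_name() are hypothetical and only serve the example.

```c
#include <string.h>

/* Mirrors the enum layout shown in this patch. */
enum damos_wmark_metric {
	DAMOS_WMARK_NONE,
	DAMOS_WMARK_FREE_MEM_RATE,
	NR_DAMOS_WMARK_METRICS,
};

/* Hypothetical name table; the strings are assumptions for illustration. */
static const char * const wmark_metric_names[] = {
	[DAMOS_WMARK_NONE] = "none",
	[DAMOS_WMARK_FREE_MEM_RATE] = "free_mem_rate",
};

/* Fail the build if the table size and the enum count diverge. */
_Static_assert(sizeof(wmark_metric_names) / sizeof(wmark_metric_names[0]) ==
	       NR_DAMOS_WMARK_METRICS, "name table out of sync with enum");

/* Map a name to its metric; returns -1 for an unknown name. */
static int wmark_metric_from_name(const char *name)
{
	int i;

	/* The NR_* value bounds the loop, so no magic number is needed. */
	for (i = 0; i < NR_DAMOS_WMARK_METRICS; i++) {
		if (!strcmp(name, wmark_metric_names[i]))
			return i;
	}
	return -1;
}
```

Because the count is the last enumerator, a value appended before it
updates the loop bound automatically, and the size check flags a table that
was not extended to match.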
Link: https://lkml.kernel.org/r/20220228081314.5770-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/damon.h | 4 ++++
1 file changed, 4 insertions(+)
--- a/include/linux/damon.h~mm-damon-core-add-number-of-each-enum-type-values
+++ a/include/linux/damon.h
@@ -87,6 +87,7 @@ struct damon_target {
* @DAMOS_HUGEPAGE: Call ``madvise()`` for the region with MADV_HUGEPAGE.
* @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
* @DAMOS_STAT: Do nothing but count the stat.
+ * @NR_DAMOS_ACTIONS: Total number of DAMOS actions
*/
enum damos_action {
DAMOS_WILLNEED,
@@ -95,6 +96,7 @@ enum damos_action {
DAMOS_HUGEPAGE,
DAMOS_NOHUGEPAGE,
DAMOS_STAT, /* Do nothing but only record the stat */
+ NR_DAMOS_ACTIONS,
};
/**
@@ -157,10 +159,12 @@ struct damos_quota {
*
* @DAMOS_WMARK_NONE: Ignore the watermarks of the given scheme.
* @DAMOS_WMARK_FREE_MEM_RATE: Free memory rate of the system in [0,1000].
+ * @NR_DAMOS_WMARK_METRICS: Total number of DAMOS watermark metrics
*/
enum damos_wmark_metric {
DAMOS_WMARK_NONE,
DAMOS_WMARK_FREE_MEM_RATE,
+ NR_DAMOS_WMARK_METRICS,
};
/**
_
* [patch 216/227] mm/damon: implement a minimal stub for sysfs-based DAMON interface
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, jiapeng.chong, gregkh, corbet, sj, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon: implement a minimal stub for sysfs-based DAMON interface
DAMON's debugfs-based user interface has served very well so far.
However, it unnecessarily depends on debugfs, while DAMON is not intended
to be used only for debugging. Also, the interface receives multiple
values via one file. For example, the schemes file receives 18 values
separated by white spaces. As a result, it is inefficient, hard to use,
and difficult to extend. In particular, keeping backward compatibility of
user space tools is becoming increasingly challenging. It would be better
to implement another reliable and flexible interface and deprecate the
debugfs interface in the long term.
To this end, this commit implements a stub of a part of the new user
interface of DAMON using sysfs. Specifically, this commit implements the
sysfs control parts for virtual address space monitoring.
More specifically, the idea of the new interface is to use directory
hierarchies and to have one file per value. The hierarchy that this
commit introduces is shown below. In the figure, parent-child relations
are represented with indentation, each directory has a ``/`` suffix, and
files in each directory are separated by commas (",").
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/
│ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
Writing a number <N> to each 'nr' file creates directories named <0> to
<N-1> in the directory of the 'nr' file. That's all this commit does.
Writing proper values to the relevant files will construct the DAMON
contexts, and writing a special keyword, 'on', to the 'state' file of each
kdamond will ask DAMON to start the constructed contexts.
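The behavior of an 'nr' file can be modeled in userspace as follows. This
is a simplified sketch, not the code of this commit: struct nr_dir and its
helpers are hypothetical, child directories are reduced to their names, and
strtol() stands in for the kernel's kstrtoint(), so every parse failure is
reported as -EINVAL here.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-in for a directory that owns an 'nr' file. */
struct nr_dir {
	char (*names)[12];	/* child directory names: "0", "1", ... */
	int nr;			/* number of children */
};

/* Drop all children, in the spirit of the *_rm_dirs() helpers. */
static void nr_dir_rm_dirs(struct nr_dir *dir)
{
	free(dir->names);
	dir->names = NULL;
	dir->nr = 0;
}

/*
 * Model of a write to an 'nr' file: parse the number, reject negative
 * values, remove the old children, then create children "0" .. "<N-1>".
 */
static int nr_dir_store(struct nr_dir *dir, const char *buf)
{
	char *end;
	long nr;
	int i;

	errno = 0;
	nr = strtol(buf, &end, 0);
	if (errno || end == buf)
		return -EINVAL;
	if (*end == '\n')	/* tolerate the newline that echo appends */
		end++;
	if (*end != '\0')
		return -EINVAL;
	if (nr < 0 || nr > INT_MAX)
		return -EINVAL;

	nr_dir_rm_dirs(dir);
	if (!nr)
		return 0;

	dir->names = calloc((size_t)nr, sizeof(*dir->names));
	if (!dir->names)
		return -ENOMEM;
	for (i = 0; i < nr; i++)
		snprintf(dir->names[i], sizeof(dir->names[i]), "%d", i);
	dir->nr = (int)nr;
	return 0;
}
```

Note that a successful write always replaces the previous children, so
writing a smaller number shrinks the set and writing 0 removes it.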
As a short example, the commands below could be used to monitor the
virtual address space of a given workload:
# cd /sys/kernel/mm/damon/admin/
# echo 1 > kdamonds/nr_kdamonds
# echo 1 > kdamonds/0/contexts/nr_contexts
# echo vaddr > kdamonds/0/contexts/0/operations
# echo 1 > kdamonds/0/contexts/0/targets/nr_targets
# echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid_target
# echo on > kdamonds/0/state
Please note that this commit implements only the sysfs part of the stub,
as mentioned above. This commit doesn't implement the special keywords
for 'state' files. Following commits will do that.
[jiapeng.chong@linux.alibaba.com: fix missing error code in damon_sysfs_attrs_add_dirs()]
Link: https://lkml.kernel.org/r/20220302111120.24984-1-jiapeng.chong@linux.alibaba.com
Link: https://lkml.kernel.org/r/20220228081314.5770-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/Kconfig | 7
mm/damon/Makefile | 1
mm/damon/sysfs.c | 1084 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 1092 insertions(+)
--- a/mm/damon/Kconfig~mm-damon-implement-a-minimal-stub-for-sysfs-based-damon-interface
+++ a/mm/damon/Kconfig
@@ -52,6 +52,13 @@ config DAMON_VADDR_KUNIT_TEST
If unsure, say N.
+config DAMON_SYSFS
+ bool "DAMON sysfs interface"
+ depends on DAMON && SYSFS
+ help
+ This builds the sysfs interface for DAMON. The user space can use
+ the interface for arbitrary data access monitoring.
+
config DAMON_DBGFS
bool "DAMON debugfs interface"
depends on DAMON_VADDR && DAMON_PADDR && DEBUG_FS
--- a/mm/damon/Makefile~mm-damon-implement-a-minimal-stub-for-sysfs-based-damon-interface
+++ a/mm/damon/Makefile
@@ -3,5 +3,6 @@
obj-y := core.o
obj-$(CONFIG_DAMON_VADDR) += ops-common.o vaddr.o
obj-$(CONFIG_DAMON_PADDR) += ops-common.o paddr.o
+obj-$(CONFIG_DAMON_SYSFS) += sysfs.o
obj-$(CONFIG_DAMON_DBGFS) += dbgfs.o
obj-$(CONFIG_DAMON_RECLAIM) += reclaim.o
--- /dev/null
+++ a/mm/damon/sysfs.c
@@ -0,0 +1,1084 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * DAMON sysfs Interface
+ *
+ * Copyright (c) 2022 SeongJae Park <sj@kernel.org>
+ */
+
+#include <linux/damon.h>
+#include <linux/kobject.h>
+#include <linux/pid.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+static DEFINE_MUTEX(damon_sysfs_lock);
+
+/*
+ * unsigned long range directory
+ */
+
+struct damon_sysfs_ul_range {
+ struct kobject kobj;
+ unsigned long min;
+ unsigned long max;
+};
+
+static struct damon_sysfs_ul_range *damon_sysfs_ul_range_alloc(
+ unsigned long min,
+ unsigned long max)
+{
+ struct damon_sysfs_ul_range *range = kmalloc(sizeof(*range),
+ GFP_KERNEL);
+
+ if (!range)
+ return NULL;
+ range->kobj = (struct kobject){};
+ range->min = min;
+ range->max = max;
+
+ return range;
+}
+
+static ssize_t min_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_ul_range *range = container_of(kobj,
+ struct damon_sysfs_ul_range, kobj);
+
+ return sysfs_emit(buf, "%lu\n", range->min);
+}
+
+static ssize_t min_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct damon_sysfs_ul_range *range = container_of(kobj,
+ struct damon_sysfs_ul_range, kobj);
+ unsigned long min;
+ int err;
+
+ err = kstrtoul(buf, 0, &min);
+ if (err)
+ return -EINVAL;
+
+ range->min = min;
+ return count;
+}
+
+static ssize_t max_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_ul_range *range = container_of(kobj,
+ struct damon_sysfs_ul_range, kobj);
+
+ return sysfs_emit(buf, "%lu\n", range->max);
+}
+
+static ssize_t max_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct damon_sysfs_ul_range *range = container_of(kobj,
+ struct damon_sysfs_ul_range, kobj);
+ unsigned long max;
+ int err;
+
+ err = kstrtoul(buf, 0, &max);
+ if (err)
+ return -EINVAL;
+
+ range->max = max;
+ return count;
+}
+
+static void damon_sysfs_ul_range_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_ul_range, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_ul_range_min_attr =
+ __ATTR_RW_MODE(min, 0600);
+
+static struct kobj_attribute damon_sysfs_ul_range_max_attr =
+ __ATTR_RW_MODE(max, 0600);
+
+static struct attribute *damon_sysfs_ul_range_attrs[] = {
+ &damon_sysfs_ul_range_min_attr.attr,
+ &damon_sysfs_ul_range_max_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_ul_range);
+
+static struct kobj_type damon_sysfs_ul_range_ktype = {
+ .release = damon_sysfs_ul_range_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_ul_range_groups,
+};
+
+/*
+ * target directory
+ */
+
+struct damon_sysfs_target {
+ struct kobject kobj;
+ int pid;
+};
+
+static struct damon_sysfs_target *damon_sysfs_target_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_target), GFP_KERNEL);
+}
+
+static ssize_t pid_target_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_target *target = container_of(kobj,
+ struct damon_sysfs_target, kobj);
+
+ return sysfs_emit(buf, "%d\n", target->pid);
+}
+
+static ssize_t pid_target_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_target *target = container_of(kobj,
+ struct damon_sysfs_target, kobj);
+ int err = kstrtoint(buf, 0, &target->pid);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static void damon_sysfs_target_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_target, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_target_pid_attr =
+ __ATTR_RW_MODE(pid_target, 0600);
+
+static struct attribute *damon_sysfs_target_attrs[] = {
+ &damon_sysfs_target_pid_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_target);
+
+static struct kobj_type damon_sysfs_target_ktype = {
+ .release = damon_sysfs_target_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_target_groups,
+};
+
+/*
+ * targets directory
+ */
+
+struct damon_sysfs_targets {
+ struct kobject kobj;
+ struct damon_sysfs_target **targets_arr;
+ int nr;
+};
+
+static struct damon_sysfs_targets *damon_sysfs_targets_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_targets), GFP_KERNEL);
+}
+
+static void damon_sysfs_targets_rm_dirs(struct damon_sysfs_targets *targets)
+{
+ struct damon_sysfs_target **targets_arr = targets->targets_arr;
+ int i;
+
+ for (i = 0; i < targets->nr; i++)
+ kobject_put(&targets_arr[i]->kobj);
+ targets->nr = 0;
+ kfree(targets_arr);
+ targets->targets_arr = NULL;
+}
+
+static int damon_sysfs_targets_add_dirs(struct damon_sysfs_targets *targets,
+ int nr_targets)
+{
+ struct damon_sysfs_target **targets_arr, *target;
+ int err, i;
+
+ damon_sysfs_targets_rm_dirs(targets);
+ if (!nr_targets)
+ return 0;
+
+ targets_arr = kmalloc_array(nr_targets, sizeof(*targets_arr),
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!targets_arr)
+ return -ENOMEM;
+ targets->targets_arr = targets_arr;
+
+ for (i = 0; i < nr_targets; i++) {
+ target = damon_sysfs_target_alloc();
+ if (!target) {
+ damon_sysfs_targets_rm_dirs(targets);
+ return -ENOMEM;
+ }
+
+ err = kobject_init_and_add(&target->kobj,
+ &damon_sysfs_target_ktype, &targets->kobj,
+ "%d", i);
+ if (err)
+ goto out;
+
+ targets_arr[i] = target;
+ targets->nr++;
+ }
+ return 0;
+
+out:
+ damon_sysfs_targets_rm_dirs(targets);
+ kobject_put(&target->kobj);
+ return err;
+}
+
+static ssize_t nr_targets_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_targets *targets = container_of(kobj,
+ struct damon_sysfs_targets, kobj);
+
+ return sysfs_emit(buf, "%d\n", targets->nr);
+}
+
+static ssize_t nr_targets_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_targets *targets = container_of(kobj,
+ struct damon_sysfs_targets, kobj);
+ int nr, err = kstrtoint(buf, 0, &nr);
+
+ if (err)
+ return err;
+ if (nr < 0)
+ return -EINVAL;
+
+ if (!mutex_trylock(&damon_sysfs_lock))
+ return -EBUSY;
+ err = damon_sysfs_targets_add_dirs(targets, nr);
+ mutex_unlock(&damon_sysfs_lock);
+ if (err)
+ return err;
+
+ return count;
+}
+
+static void damon_sysfs_targets_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_targets, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_targets_nr_attr =
+ __ATTR_RW_MODE(nr_targets, 0600);
+
+static struct attribute *damon_sysfs_targets_attrs[] = {
+ &damon_sysfs_targets_nr_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_targets);
+
+static struct kobj_type damon_sysfs_targets_ktype = {
+ .release = damon_sysfs_targets_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_targets_groups,
+};
+
+/*
+ * intervals directory
+ */
+
+struct damon_sysfs_intervals {
+ struct kobject kobj;
+ unsigned long sample_us;
+ unsigned long aggr_us;
+ unsigned long update_us;
+};
+
+static struct damon_sysfs_intervals *damon_sysfs_intervals_alloc(
+ unsigned long sample_us, unsigned long aggr_us,
+ unsigned long update_us)
+{
+ struct damon_sysfs_intervals *intervals = kmalloc(sizeof(*intervals),
+ GFP_KERNEL);
+
+ if (!intervals)
+ return NULL;
+
+ intervals->kobj = (struct kobject){};
+ intervals->sample_us = sample_us;
+ intervals->aggr_us = aggr_us;
+ intervals->update_us = update_us;
+ return intervals;
+}
+
+static ssize_t sample_us_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_intervals *intervals = container_of(kobj,
+ struct damon_sysfs_intervals, kobj);
+
+ return sysfs_emit(buf, "%lu\n", intervals->sample_us);
+}
+
+static ssize_t sample_us_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_intervals *intervals = container_of(kobj,
+ struct damon_sysfs_intervals, kobj);
+ unsigned long us;
+ int err = kstrtoul(buf, 0, &us);
+
+ if (err)
+ return -EINVAL;
+
+ intervals->sample_us = us;
+ return count;
+}
+
+static ssize_t aggr_us_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_intervals *intervals = container_of(kobj,
+ struct damon_sysfs_intervals, kobj);
+
+ return sysfs_emit(buf, "%lu\n", intervals->aggr_us);
+}
+
+static ssize_t aggr_us_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct damon_sysfs_intervals *intervals = container_of(kobj,
+ struct damon_sysfs_intervals, kobj);
+ unsigned long us;
+ int err = kstrtoul(buf, 0, &us);
+
+ if (err)
+ return -EINVAL;
+
+ intervals->aggr_us = us;
+ return count;
+}
+
+static ssize_t update_us_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_intervals *intervals = container_of(kobj,
+ struct damon_sysfs_intervals, kobj);
+
+ return sysfs_emit(buf, "%lu\n", intervals->update_us);
+}
+
+static ssize_t update_us_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_intervals *intervals = container_of(kobj,
+ struct damon_sysfs_intervals, kobj);
+ unsigned long us;
+ int err = kstrtoul(buf, 0, &us);
+
+ if (err)
+ return -EINVAL;
+
+ intervals->update_us = us;
+ return count;
+}
+
+static void damon_sysfs_intervals_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_intervals, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_intervals_sample_us_attr =
+ __ATTR_RW_MODE(sample_us, 0600);
+
+static struct kobj_attribute damon_sysfs_intervals_aggr_us_attr =
+ __ATTR_RW_MODE(aggr_us, 0600);
+
+static struct kobj_attribute damon_sysfs_intervals_update_us_attr =
+ __ATTR_RW_MODE(update_us, 0600);
+
+static struct attribute *damon_sysfs_intervals_attrs[] = {
+ &damon_sysfs_intervals_sample_us_attr.attr,
+ &damon_sysfs_intervals_aggr_us_attr.attr,
+ &damon_sysfs_intervals_update_us_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_intervals);
+
+static struct kobj_type damon_sysfs_intervals_ktype = {
+ .release = damon_sysfs_intervals_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_intervals_groups,
+};
+
+/*
+ * monitoring_attrs directory
+ */
+
+struct damon_sysfs_attrs {
+ struct kobject kobj;
+ struct damon_sysfs_intervals *intervals;
+ struct damon_sysfs_ul_range *nr_regions_range;
+};
+
+static struct damon_sysfs_attrs *damon_sysfs_attrs_alloc(void)
+{
+ struct damon_sysfs_attrs *attrs = kmalloc(sizeof(*attrs), GFP_KERNEL);
+
+ if (!attrs)
+ return NULL;
+ attrs->kobj = (struct kobject){};
+ return attrs;
+}
+
+static int damon_sysfs_attrs_add_dirs(struct damon_sysfs_attrs *attrs)
+{
+ struct damon_sysfs_intervals *intervals;
+ struct damon_sysfs_ul_range *nr_regions_range;
+ int err;
+
+ intervals = damon_sysfs_intervals_alloc(5000, 100000, 60000000);
+ if (!intervals)
+ return -ENOMEM;
+
+ err = kobject_init_and_add(&intervals->kobj,
+ &damon_sysfs_intervals_ktype, &attrs->kobj,
+ "intervals");
+ if (err)
+ goto put_intervals_out;
+ attrs->intervals = intervals;
+
+ nr_regions_range = damon_sysfs_ul_range_alloc(10, 1000);
+ if (!nr_regions_range) {
+ err = -ENOMEM;
+ goto put_intervals_out;
+ }
+
+ err = kobject_init_and_add(&nr_regions_range->kobj,
+ &damon_sysfs_ul_range_ktype, &attrs->kobj,
+ "nr_regions");
+ if (err)
+ goto put_nr_regions_intervals_out;
+ attrs->nr_regions_range = nr_regions_range;
+ return 0;
+
+put_nr_regions_intervals_out:
+ kobject_put(&nr_regions_range->kobj);
+ attrs->nr_regions_range = NULL;
+put_intervals_out:
+ kobject_put(&intervals->kobj);
+ attrs->intervals = NULL;
+ return err;
+}
+
+static void damon_sysfs_attrs_rm_dirs(struct damon_sysfs_attrs *attrs)
+{
+ kobject_put(&attrs->nr_regions_range->kobj);
+ kobject_put(&attrs->intervals->kobj);
+}
+
+static void damon_sysfs_attrs_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_attrs, kobj));
+}
+
+static struct attribute *damon_sysfs_attrs_attrs[] = {
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_attrs);
+
+static struct kobj_type damon_sysfs_attrs_ktype = {
+ .release = damon_sysfs_attrs_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_attrs_groups,
+};
+
+/*
+ * context directory
+ */
+
+/* This should match with enum damon_ops_id */
+static const char * const damon_sysfs_ops_strs[] = {
+ "vaddr",
+ "paddr",
+};
+
+struct damon_sysfs_context {
+ struct kobject kobj;
+ enum damon_ops_id ops_id;
+ struct damon_sysfs_attrs *attrs;
+ struct damon_sysfs_targets *targets;
+};
+
+static struct damon_sysfs_context *damon_sysfs_context_alloc(
+ enum damon_ops_id ops_id)
+{
+ struct damon_sysfs_context *context = kmalloc(sizeof(*context),
+ GFP_KERNEL);
+
+ if (!context)
+ return NULL;
+ context->kobj = (struct kobject){};
+ context->ops_id = ops_id;
+ return context;
+}
+
+static int damon_sysfs_context_set_attrs(struct damon_sysfs_context *context)
+{
+ struct damon_sysfs_attrs *attrs = damon_sysfs_attrs_alloc();
+ int err;
+
+ if (!attrs)
+ return -ENOMEM;
+ err = kobject_init_and_add(&attrs->kobj, &damon_sysfs_attrs_ktype,
+ &context->kobj, "monitoring_attrs");
+ if (err)
+ goto out;
+ err = damon_sysfs_attrs_add_dirs(attrs);
+ if (err)
+ goto out;
+ context->attrs = attrs;
+ return 0;
+
+out:
+ kobject_put(&attrs->kobj);
+ return err;
+}
+
+static int damon_sysfs_context_set_targets(struct damon_sysfs_context *context)
+{
+ struct damon_sysfs_targets *targets = damon_sysfs_targets_alloc();
+ int err;
+
+ if (!targets)
+ return -ENOMEM;
+ err = kobject_init_and_add(&targets->kobj, &damon_sysfs_targets_ktype,
+ &context->kobj, "targets");
+ if (err) {
+ kobject_put(&targets->kobj);
+ return err;
+ }
+ context->targets = targets;
+ return 0;
+}
+
+static int damon_sysfs_context_add_dirs(struct damon_sysfs_context *context)
+{
+ int err;
+
+ err = damon_sysfs_context_set_attrs(context);
+ if (err)
+ return err;
+
+ err = damon_sysfs_context_set_targets(context);
+ if (err)
+ goto put_attrs_out;
+ return 0;
+
+put_attrs_out:
+ kobject_put(&context->attrs->kobj);
+ context->attrs = NULL;
+ return err;
+}
+
+static void damon_sysfs_context_rm_dirs(struct damon_sysfs_context *context)
+{
+ damon_sysfs_attrs_rm_dirs(context->attrs);
+ kobject_put(&context->attrs->kobj);
+ damon_sysfs_targets_rm_dirs(context->targets);
+ kobject_put(&context->targets->kobj);
+}
+
+static ssize_t operations_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_context *context = container_of(kobj,
+ struct damon_sysfs_context, kobj);
+
+ return sysfs_emit(buf, "%s\n", damon_sysfs_ops_strs[context->ops_id]);
+}
+
+static ssize_t operations_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_context *context = container_of(kobj,
+ struct damon_sysfs_context, kobj);
+ enum damon_ops_id id;
+
+ for (id = 0; id < NR_DAMON_OPS; id++) {
+ if (sysfs_streq(buf, damon_sysfs_ops_strs[id])) {
+ /* Support only vaddr */
+ if (id != DAMON_OPS_VADDR)
+ return -EINVAL;
+ context->ops_id = id;
+ return count;
+ }
+ }
+ return -EINVAL;
+}
+
+static void damon_sysfs_context_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_context, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_context_operations_attr =
+ __ATTR_RW_MODE(operations, 0600);
+
+static struct attribute *damon_sysfs_context_attrs[] = {
+ &damon_sysfs_context_operations_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_context);
+
+static struct kobj_type damon_sysfs_context_ktype = {
+ .release = damon_sysfs_context_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_context_groups,
+};
+
+/*
+ * contexts directory
+ */
+
+struct damon_sysfs_contexts {
+ struct kobject kobj;
+ struct damon_sysfs_context **contexts_arr;
+ int nr;
+};
+
+static struct damon_sysfs_contexts *damon_sysfs_contexts_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_contexts), GFP_KERNEL);
+}
+
+static void damon_sysfs_contexts_rm_dirs(struct damon_sysfs_contexts *contexts)
+{
+ struct damon_sysfs_context **contexts_arr = contexts->contexts_arr;
+ int i;
+
+ for (i = 0; i < contexts->nr; i++) {
+ damon_sysfs_context_rm_dirs(contexts_arr[i]);
+ kobject_put(&contexts_arr[i]->kobj);
+ }
+ contexts->nr = 0;
+ kfree(contexts_arr);
+ contexts->contexts_arr = NULL;
+}
+
+static int damon_sysfs_contexts_add_dirs(struct damon_sysfs_contexts *contexts,
+ int nr_contexts)
+{
+ struct damon_sysfs_context **contexts_arr, *context;
+ int err, i;
+
+ damon_sysfs_contexts_rm_dirs(contexts);
+ if (!nr_contexts)
+ return 0;
+
+ contexts_arr = kmalloc_array(nr_contexts, sizeof(*contexts_arr),
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!contexts_arr)
+ return -ENOMEM;
+ contexts->contexts_arr = contexts_arr;
+
+ for (i = 0; i < nr_contexts; i++) {
+ context = damon_sysfs_context_alloc(DAMON_OPS_VADDR);
+ if (!context) {
+ damon_sysfs_contexts_rm_dirs(contexts);
+ return -ENOMEM;
+ }
+
+ err = kobject_init_and_add(&context->kobj,
+ &damon_sysfs_context_ktype, &contexts->kobj,
+ "%d", i);
+ if (err)
+ goto out;
+
+ err = damon_sysfs_context_add_dirs(context);
+ if (err)
+ goto out;
+
+ contexts_arr[i] = context;
+ contexts->nr++;
+ }
+ return 0;
+
+out:
+ damon_sysfs_contexts_rm_dirs(contexts);
+ kobject_put(&context->kobj);
+ return err;
+}
+
+static ssize_t nr_contexts_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_contexts *contexts = container_of(kobj,
+ struct damon_sysfs_contexts, kobj);
+
+ return sysfs_emit(buf, "%d\n", contexts->nr);
+}
+
+static ssize_t nr_contexts_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_contexts *contexts = container_of(kobj,
+ struct damon_sysfs_contexts, kobj);
+ int nr, err;
+
+ err = kstrtoint(buf, 0, &nr);
+ if (err)
+ return err;
+ /* TODO: support multiple contexts per kdamond */
+ if (nr < 0 || 1 < nr)
+ return -EINVAL;
+
+ if (!mutex_trylock(&damon_sysfs_lock))
+ return -EBUSY;
+ err = damon_sysfs_contexts_add_dirs(contexts, nr);
+ mutex_unlock(&damon_sysfs_lock);
+ if (err)
+ return err;
+
+ return count;
+}
+
+static void damon_sysfs_contexts_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_contexts, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_contexts_nr_attr
+ = __ATTR_RW_MODE(nr_contexts, 0600);
+
+static struct attribute *damon_sysfs_contexts_attrs[] = {
+ &damon_sysfs_contexts_nr_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_contexts);
+
+static struct kobj_type damon_sysfs_contexts_ktype = {
+ .release = damon_sysfs_contexts_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_contexts_groups,
+};
+
+/*
+ * kdamond directory
+ */
+
+struct damon_sysfs_kdamond {
+ struct kobject kobj;
+ struct damon_sysfs_contexts *contexts;
+ struct damon_ctx *damon_ctx;
+};
+
+static struct damon_sysfs_kdamond *damon_sysfs_kdamond_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_kdamond), GFP_KERNEL);
+}
+
+static int damon_sysfs_kdamond_add_dirs(struct damon_sysfs_kdamond *kdamond)
+{
+ struct damon_sysfs_contexts *contexts;
+ int err;
+
+ contexts = damon_sysfs_contexts_alloc();
+ if (!contexts)
+ return -ENOMEM;
+
+ err = kobject_init_and_add(&contexts->kobj,
+ &damon_sysfs_contexts_ktype, &kdamond->kobj,
+ "contexts");
+ if (err) {
+ kobject_put(&contexts->kobj);
+ return err;
+ }
+ kdamond->contexts = contexts;
+
+ return err;
+}
+
+static void damon_sysfs_kdamond_rm_dirs(struct damon_sysfs_kdamond *kdamond)
+{
+ damon_sysfs_contexts_rm_dirs(kdamond->contexts);
+ kobject_put(&kdamond->contexts->kobj);
+}
+
+static ssize_t state_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ return -EINVAL;
+}
+
+static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ return -EINVAL;
+}
+
+static ssize_t pid_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return -EINVAL;
+}
+
+static void damon_sysfs_kdamond_release(struct kobject *kobj)
+{
+ struct damon_sysfs_kdamond *kdamond = container_of(kobj,
+ struct damon_sysfs_kdamond, kobj);
+
+ if (kdamond->damon_ctx)
+ damon_destroy_ctx(kdamond->damon_ctx);
+ kfree(container_of(kobj, struct damon_sysfs_kdamond, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_kdamond_state_attr =
+ __ATTR_RW_MODE(state, 0600);
+
+static struct kobj_attribute damon_sysfs_kdamond_pid_attr =
+ __ATTR_RO_MODE(pid, 0400);
+
+static struct attribute *damon_sysfs_kdamond_attrs[] = {
+ &damon_sysfs_kdamond_state_attr.attr,
+ &damon_sysfs_kdamond_pid_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_kdamond);
+
+static struct kobj_type damon_sysfs_kdamond_ktype = {
+ .release = damon_sysfs_kdamond_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_kdamond_groups,
+};
+
+/*
+ * kdamonds directory
+ */
+
+struct damon_sysfs_kdamonds {
+ struct kobject kobj;
+ struct damon_sysfs_kdamond **kdamonds_arr;
+ int nr;
+};
+
+static struct damon_sysfs_kdamonds *damon_sysfs_kdamonds_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_kdamonds), GFP_KERNEL);
+}
+
+static void damon_sysfs_kdamonds_rm_dirs(struct damon_sysfs_kdamonds *kdamonds)
+{
+ struct damon_sysfs_kdamond **kdamonds_arr = kdamonds->kdamonds_arr;
+ int i;
+
+ for (i = 0; i < kdamonds->nr; i++) {
+ damon_sysfs_kdamond_rm_dirs(kdamonds_arr[i]);
+ kobject_put(&kdamonds_arr[i]->kobj);
+ }
+ kdamonds->nr = 0;
+ kfree(kdamonds_arr);
+ kdamonds->kdamonds_arr = NULL;
+}
+
+static int damon_sysfs_nr_running_ctxs(struct damon_sysfs_kdamond **kdamonds,
+ int nr_kdamonds)
+{
+ int nr_running_ctxs = 0;
+ int i;
+
+ for (i = 0; i < nr_kdamonds; i++) {
+ struct damon_ctx *ctx = kdamonds[i]->damon_ctx;
+
+ if (!ctx)
+ continue;
+ mutex_lock(&ctx->kdamond_lock);
+ if (ctx->kdamond)
+ nr_running_ctxs++;
+ mutex_unlock(&ctx->kdamond_lock);
+ }
+ return nr_running_ctxs;
+}
+
+static int damon_sysfs_kdamonds_add_dirs(struct damon_sysfs_kdamonds *kdamonds,
+ int nr_kdamonds)
+{
+ struct damon_sysfs_kdamond **kdamonds_arr, *kdamond;
+ int err, i;
+
+ if (damon_sysfs_nr_running_ctxs(kdamonds->kdamonds_arr, kdamonds->nr))
+ return -EBUSY;
+
+ damon_sysfs_kdamonds_rm_dirs(kdamonds);
+ if (!nr_kdamonds)
+ return 0;
+
+ kdamonds_arr = kmalloc_array(nr_kdamonds, sizeof(*kdamonds_arr),
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!kdamonds_arr)
+ return -ENOMEM;
+ kdamonds->kdamonds_arr = kdamonds_arr;
+
+ for (i = 0; i < nr_kdamonds; i++) {
+ kdamond = damon_sysfs_kdamond_alloc();
+ if (!kdamond) {
+ damon_sysfs_kdamonds_rm_dirs(kdamonds);
+ return -ENOMEM;
+ }
+
+ err = kobject_init_and_add(&kdamond->kobj,
+ &damon_sysfs_kdamond_ktype, &kdamonds->kobj,
+ "%d", i);
+ if (err)
+ goto out;
+
+ err = damon_sysfs_kdamond_add_dirs(kdamond);
+ if (err)
+ goto out;
+
+ kdamonds_arr[i] = kdamond;
+ kdamonds->nr++;
+ }
+ return 0;
+
+out:
+ damon_sysfs_kdamonds_rm_dirs(kdamonds);
+ kobject_put(&kdamond->kobj);
+ return err;
+}
+
+static ssize_t nr_kdamonds_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_kdamonds *kdamonds = container_of(kobj,
+ struct damon_sysfs_kdamonds, kobj);
+
+ return sysfs_emit(buf, "%d\n", kdamonds->nr);
+}
+
+static ssize_t nr_kdamonds_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_kdamonds *kdamonds = container_of(kobj,
+ struct damon_sysfs_kdamonds, kobj);
+ int nr, err;
+
+ err = kstrtoint(buf, 0, &nr);
+ if (err)
+ return err;
+ if (nr < 0)
+ return -EINVAL;
+
+ if (!mutex_trylock(&damon_sysfs_lock))
+ return -EBUSY;
+ err = damon_sysfs_kdamonds_add_dirs(kdamonds, nr);
+ mutex_unlock(&damon_sysfs_lock);
+ if (err)
+ return err;
+
+ return count;
+}
+
+static void damon_sysfs_kdamonds_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_kdamonds, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_kdamonds_nr_attr =
+ __ATTR_RW_MODE(nr_kdamonds, 0600);
+
+static struct attribute *damon_sysfs_kdamonds_attrs[] = {
+ &damon_sysfs_kdamonds_nr_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_kdamonds);
+
+static struct kobj_type damon_sysfs_kdamonds_ktype = {
+ .release = damon_sysfs_kdamonds_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_kdamonds_groups,
+};
+
+/*
+ * damon user interface directory
+ */
+
+struct damon_sysfs_ui_dir {
+ struct kobject kobj;
+ struct damon_sysfs_kdamonds *kdamonds;
+};
+
+static struct damon_sysfs_ui_dir *damon_sysfs_ui_dir_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_ui_dir), GFP_KERNEL);
+}
+
+static int damon_sysfs_ui_dir_add_dirs(struct damon_sysfs_ui_dir *ui_dir)
+{
+ struct damon_sysfs_kdamonds *kdamonds;
+ int err;
+
+ kdamonds = damon_sysfs_kdamonds_alloc();
+ if (!kdamonds)
+ return -ENOMEM;
+
+ err = kobject_init_and_add(&kdamonds->kobj,
+ &damon_sysfs_kdamonds_ktype, &ui_dir->kobj,
+ "kdamonds");
+ if (err) {
+ kobject_put(&kdamonds->kobj);
+ return err;
+ }
+ ui_dir->kdamonds = kdamonds;
+ return err;
+}
+
+static void damon_sysfs_ui_dir_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_ui_dir, kobj));
+}
+
+static struct attribute *damon_sysfs_ui_dir_attrs[] = {
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_ui_dir);
+
+static struct kobj_type damon_sysfs_ui_dir_ktype = {
+ .release = damon_sysfs_ui_dir_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_ui_dir_groups,
+};
+
+static int __init damon_sysfs_init(void)
+{
+ struct kobject *damon_sysfs_root;
+ struct damon_sysfs_ui_dir *admin;
+ int err;
+
+ damon_sysfs_root = kobject_create_and_add("damon", mm_kobj);
+ if (!damon_sysfs_root)
+ return -ENOMEM;
+
+ admin = damon_sysfs_ui_dir_alloc();
+ if (!admin) {
+ kobject_put(damon_sysfs_root);
+ return -ENOMEM;
+ }
+ err = kobject_init_and_add(&admin->kobj, &damon_sysfs_ui_dir_ktype,
+ damon_sysfs_root, "admin");
+ if (err)
+ goto out;
+ err = damon_sysfs_ui_dir_add_dirs(admin);
+ if (err)
+ goto out;
+ return 0;
+
+out:
+ kobject_put(&admin->kobj);
+ kobject_put(damon_sysfs_root);
+ return err;
+}
+subsys_initcall(damon_sysfs_init);
_
* [patch 216/227] mm/damon: implement a minimal stub for sysfs-based DAMON interface
@ 2022-03-22 21:49 ` Andrew Morton
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, jiapeng.chong, gregkh, corbet, sj, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon: implement a minimal stub for sysfs-based DAMON interface
DAMON's debugfs-based user interface has served very well so far. However,
it unnecessarily depends on debugfs, while DAMON is not aimed to be used
only for debugging. Also, the interface receives multiple values via one
file. For example, the schemes file receives 18 values separated by white
spaces. As a result, it is inefficient, hard to use, and difficult to
extend. In particular, keeping backward compatibility of user space
tools is becoming increasingly challenging. It would be better to
implement another reliable and flexible interface and deprecate the
debugfs interface in the long term.
To this end, this commit implements a stub of a part of the new user
interface of DAMON using sysfs. Specifically, this commit implements the
sysfs control parts for virtual address space monitoring.
More specifically, the idea of the new interface is to use directory
hierarchies and to make one file per value. The hierarchy that this
commit introduces is as below. In the figure below, parent-child
relations are represented with indentation, each directory has a
``/`` suffix, and files in each directory are separated by commas (",").
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/
│ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
Writing a number <N> to each 'nr' file creates directories named <0> to
<N-1> in the directory of the 'nr' file. That's all this commit does.
Writing proper values to the relevant files will construct the DAMON
contexts, and writing a special keyword, 'on', to the 'state' file of each
kdamond will ask DAMON to start the constructed contexts.
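The 'nr' file behaviour described above can be sketched as a small
userspace model. This is only an illustration, not code from this patch:
nr_file_child_names() and NAME_LEN are hypothetical names. It assumes only
what the patch shows, namely that children are named with "%d" and that a
negative count is rejected with -EINVAL.

```c
#include <errno.h>
#include <stdio.h>

#define NAME_LEN 16	/* hypothetical buffer size, enough for any int */

/*
 * Hypothetical userspace model of an 'nr' file: fill names[0..nr-1] with
 * "0", "1", ..., mirroring the "%d" names that the patch passes to
 * kobject_init_and_add().  Returns nr on success, or -EINVAL for a
 * negative count or one that does not fit in the caller's array.
 */
int nr_file_child_names(int nr, char names[][NAME_LEN], int capacity)
{
	int i;

	if (nr < 0 || nr > capacity)
		return -EINVAL;
	for (i = 0; i < nr; i++)
		snprintf(names[i], NAME_LEN, "%d", i);
	return nr;
}
```

Note that the capacity check exists only in this model. In the patch,
nr_contexts additionally accepts only 0 or 1, while the other 'nr' files
accept any non-negative int.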
As a short example, the commands below could be used to monitor the
virtual address space of a given workload:
# cd /sys/kernel/mm/damon/admin/
# echo 1 > kdamonds/nr_kdamonds
# echo 1 > kdamonds/0/contexts/nr_contexts
# echo vaddr > kdamonds/0/contexts/0/operations
# echo 1 > kdamonds/0/contexts/0/targets/nr_targets
# echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid_target
# echo on > kdamonds/0/state
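A user space tool could issue the same writes programmatically. The
following is a minimal sketch, assuming the one-value-per-file layout
described above; damon_write_value() is a hypothetical helper, not an
existing API, and it does what each 'echo <value> > <file>' line does.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "<value>\n" to the file <base>/<rel>, as
 * 'echo <value> > <base>/<rel>' would.  Returns 0 on success or a
 * negative errno-style code on failure.
 */
int damon_write_value(const char *base, const char *rel, const char *value)
{
	char path[512];
	FILE *f;
	int n, err = 0;

	n = snprintf(path, sizeof(path), "%s/%s", base, rel);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	f = fopen(path, "w");
	if (!f)
		return -errno;
	if (fprintf(f, "%s\n", value) < 0)
		err = -EIO;
	/* Buffered data reaches the file at fclose(), so check it too. */
	if (fclose(f) != 0 && !err)
		err = -errno;
	return err;
}
```

With base set to /sys/kernel/mm/damon/admin, calling the helper once per
command, in the same order as above, would mirror the shell session. The
order matters because each 'nr' file has to be written before the
numbered directories under it exist.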
Please note that this commit implements only the sysfs stub, as
mentioned above. This commit doesn't implement the special keywords for
the 'state' files. Following commits will do that.
[jiapeng.chong@linux.alibaba.com: fix missing error code in damon_sysfs_attrs_add_dirs()]
Link: https://lkml.kernel.org/r/20220302111120.24984-1-jiapeng.chong@linux.alibaba.com
Link: https://lkml.kernel.org/r/20220228081314.5770-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/Kconfig | 7
mm/damon/Makefile | 1
mm/damon/sysfs.c | 1084 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 1092 insertions(+)
--- a/mm/damon/Kconfig~mm-damon-implement-a-minimal-stub-for-sysfs-based-damon-interface
+++ a/mm/damon/Kconfig
@@ -52,6 +52,13 @@ config DAMON_VADDR_KUNIT_TEST
If unsure, say N.
+config DAMON_SYSFS
+ bool "DAMON sysfs interface"
+ depends on DAMON && SYSFS
+ help
+ This builds the sysfs interface for DAMON. The user space can use
+ the interface for arbitrary data access monitoring.
+
config DAMON_DBGFS
bool "DAMON debugfs interface"
depends on DAMON_VADDR && DAMON_PADDR && DEBUG_FS
--- a/mm/damon/Makefile~mm-damon-implement-a-minimal-stub-for-sysfs-based-damon-interface
+++ a/mm/damon/Makefile
@@ -3,5 +3,6 @@
obj-y := core.o
obj-$(CONFIG_DAMON_VADDR) += ops-common.o vaddr.o
obj-$(CONFIG_DAMON_PADDR) += ops-common.o paddr.o
+obj-$(CONFIG_DAMON_SYSFS) += sysfs.o
obj-$(CONFIG_DAMON_DBGFS) += dbgfs.o
obj-$(CONFIG_DAMON_RECLAIM) += reclaim.o
--- /dev/null
+++ a/mm/damon/sysfs.c
@@ -0,0 +1,1084 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * DAMON sysfs Interface
+ *
+ * Copyright (c) 2022 SeongJae Park <sj@kernel.org>
+ */
+
+#include <linux/damon.h>
+#include <linux/kobject.h>
+#include <linux/pid.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+static DEFINE_MUTEX(damon_sysfs_lock);
+
+/*
+ * unsigned long range directory
+ */
+
+struct damon_sysfs_ul_range {
+ struct kobject kobj;
+ unsigned long min;
+ unsigned long max;
+};
+
+static struct damon_sysfs_ul_range *damon_sysfs_ul_range_alloc(
+ unsigned long min,
+ unsigned long max)
+{
+ struct damon_sysfs_ul_range *range = kmalloc(sizeof(*range),
+ GFP_KERNEL);
+
+ if (!range)
+ return NULL;
+ range->kobj = (struct kobject){};
+ range->min = min;
+ range->max = max;
+
+ return range;
+}
+
+static ssize_t min_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_ul_range *range = container_of(kobj,
+ struct damon_sysfs_ul_range, kobj);
+
+ return sysfs_emit(buf, "%lu\n", range->min);
+}
+
+static ssize_t min_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct damon_sysfs_ul_range *range = container_of(kobj,
+ struct damon_sysfs_ul_range, kobj);
+ unsigned long min;
+ int err;
+
+ err = kstrtoul(buf, 0, &min);
+ if (err)
+ return -EINVAL;
+
+ range->min = min;
+ return count;
+}
+
+static ssize_t max_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_ul_range *range = container_of(kobj,
+ struct damon_sysfs_ul_range, kobj);
+
+ return sysfs_emit(buf, "%lu\n", range->max);
+}
+
+static ssize_t max_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct damon_sysfs_ul_range *range = container_of(kobj,
+ struct damon_sysfs_ul_range, kobj);
+ unsigned long max;
+ int err;
+
+ err = kstrtoul(buf, 0, &max);
+ if (err)
+ return -EINVAL;
+
+ range->max = max;
+ return count;
+}
+
+static void damon_sysfs_ul_range_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_ul_range, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_ul_range_min_attr =
+ __ATTR_RW_MODE(min, 0600);
+
+static struct kobj_attribute damon_sysfs_ul_range_max_attr =
+ __ATTR_RW_MODE(max, 0600);
+
+static struct attribute *damon_sysfs_ul_range_attrs[] = {
+ &damon_sysfs_ul_range_min_attr.attr,
+ &damon_sysfs_ul_range_max_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_ul_range);
+
+static struct kobj_type damon_sysfs_ul_range_ktype = {
+ .release = damon_sysfs_ul_range_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_ul_range_groups,
+};
+
+/*
+ * target directory
+ */
+
+struct damon_sysfs_target {
+ struct kobject kobj;
+ int pid;
+};
+
+static struct damon_sysfs_target *damon_sysfs_target_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_target), GFP_KERNEL);
+}
+
+static ssize_t pid_target_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_target *target = container_of(kobj,
+ struct damon_sysfs_target, kobj);
+
+ return sysfs_emit(buf, "%d\n", target->pid);
+}
+
+static ssize_t pid_target_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_target *target = container_of(kobj,
+ struct damon_sysfs_target, kobj);
+ int err = kstrtoint(buf, 0, &target->pid);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static void damon_sysfs_target_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_target, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_target_pid_attr =
+ __ATTR_RW_MODE(pid_target, 0600);
+
+static struct attribute *damon_sysfs_target_attrs[] = {
+ &damon_sysfs_target_pid_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_target);
+
+static struct kobj_type damon_sysfs_target_ktype = {
+ .release = damon_sysfs_target_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_target_groups,
+};
+
+/*
+ * targets directory
+ */
+
+struct damon_sysfs_targets {
+ struct kobject kobj;
+ struct damon_sysfs_target **targets_arr;
+ int nr;
+};
+
+static struct damon_sysfs_targets *damon_sysfs_targets_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_targets), GFP_KERNEL);
+}
+
+static void damon_sysfs_targets_rm_dirs(struct damon_sysfs_targets *targets)
+{
+ struct damon_sysfs_target **targets_arr = targets->targets_arr;
+ int i;
+
+ for (i = 0; i < targets->nr; i++)
+ kobject_put(&targets_arr[i]->kobj);
+ targets->nr = 0;
+ kfree(targets_arr);
+ targets->targets_arr = NULL;
+}
+
+static int damon_sysfs_targets_add_dirs(struct damon_sysfs_targets *targets,
+ int nr_targets)
+{
+ struct damon_sysfs_target **targets_arr, *target;
+ int err, i;
+
+ damon_sysfs_targets_rm_dirs(targets);
+ if (!nr_targets)
+ return 0;
+
+ targets_arr = kmalloc_array(nr_targets, sizeof(*targets_arr),
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!targets_arr)
+ return -ENOMEM;
+ targets->targets_arr = targets_arr;
+
+ for (i = 0; i < nr_targets; i++) {
+ target = damon_sysfs_target_alloc();
+ if (!target) {
+ damon_sysfs_targets_rm_dirs(targets);
+ return -ENOMEM;
+ }
+
+ err = kobject_init_and_add(&target->kobj,
+ &damon_sysfs_target_ktype, &targets->kobj,
+ "%d", i);
+ if (err)
+ goto out;
+
+ targets_arr[i] = target;
+ targets->nr++;
+ }
+ return 0;
+
+out:
+ damon_sysfs_targets_rm_dirs(targets);
+ kobject_put(&target->kobj);
+ return err;
+}
+
+static ssize_t nr_targets_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_targets *targets = container_of(kobj,
+ struct damon_sysfs_targets, kobj);
+
+ return sysfs_emit(buf, "%d\n", targets->nr);
+}
+
+static ssize_t nr_targets_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_targets *targets = container_of(kobj,
+ struct damon_sysfs_targets, kobj);
+ int nr, err = kstrtoint(buf, 0, &nr);
+
+ if (err)
+ return err;
+ if (nr < 0)
+ return -EINVAL;
+
+ if (!mutex_trylock(&damon_sysfs_lock))
+ return -EBUSY;
+ err = damon_sysfs_targets_add_dirs(targets, nr);
+ mutex_unlock(&damon_sysfs_lock);
+ if (err)
+ return err;
+
+ return count;
+}
+
+static void damon_sysfs_targets_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_targets, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_targets_nr_attr =
+ __ATTR_RW_MODE(nr_targets, 0600);
+
+static struct attribute *damon_sysfs_targets_attrs[] = {
+ &damon_sysfs_targets_nr_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_targets);
+
+static struct kobj_type damon_sysfs_targets_ktype = {
+ .release = damon_sysfs_targets_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_targets_groups,
+};
+
+/*
+ * intervals directory
+ */
+
+struct damon_sysfs_intervals {
+ struct kobject kobj;
+ unsigned long sample_us;
+ unsigned long aggr_us;
+ unsigned long update_us;
+};
+
+static struct damon_sysfs_intervals *damon_sysfs_intervals_alloc(
+ unsigned long sample_us, unsigned long aggr_us,
+ unsigned long update_us)
+{
+ struct damon_sysfs_intervals *intervals = kmalloc(sizeof(*intervals),
+ GFP_KERNEL);
+
+ if (!intervals)
+ return NULL;
+
+ intervals->kobj = (struct kobject){};
+ intervals->sample_us = sample_us;
+ intervals->aggr_us = aggr_us;
+ intervals->update_us = update_us;
+ return intervals;
+}
+
+static ssize_t sample_us_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_intervals *intervals = container_of(kobj,
+ struct damon_sysfs_intervals, kobj);
+
+ return sysfs_emit(buf, "%lu\n", intervals->sample_us);
+}
+
+static ssize_t sample_us_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_intervals *intervals = container_of(kobj,
+ struct damon_sysfs_intervals, kobj);
+ unsigned long us;
+ int err = kstrtoul(buf, 0, &us);
+
+ if (err)
+ return err;
+
+ intervals->sample_us = us;
+ return count;
+}
+
+static ssize_t aggr_us_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_intervals *intervals = container_of(kobj,
+ struct damon_sysfs_intervals, kobj);
+
+ return sysfs_emit(buf, "%lu\n", intervals->aggr_us);
+}
+
+static ssize_t aggr_us_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct damon_sysfs_intervals *intervals = container_of(kobj,
+ struct damon_sysfs_intervals, kobj);
+ unsigned long us;
+ int err = kstrtoul(buf, 0, &us);
+
+ if (err)
+ return err;
+
+ intervals->aggr_us = us;
+ return count;
+}
+
+static ssize_t update_us_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_intervals *intervals = container_of(kobj,
+ struct damon_sysfs_intervals, kobj);
+
+ return sysfs_emit(buf, "%lu\n", intervals->update_us);
+}
+
+static ssize_t update_us_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_intervals *intervals = container_of(kobj,
+ struct damon_sysfs_intervals, kobj);
+ unsigned long us;
+ int err = kstrtoul(buf, 0, &us);
+
+ if (err)
+ return err;
+
+ intervals->update_us = us;
+ return count;
+}
+
+static void damon_sysfs_intervals_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_intervals, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_intervals_sample_us_attr =
+ __ATTR_RW_MODE(sample_us, 0600);
+
+static struct kobj_attribute damon_sysfs_intervals_aggr_us_attr =
+ __ATTR_RW_MODE(aggr_us, 0600);
+
+static struct kobj_attribute damon_sysfs_intervals_update_us_attr =
+ __ATTR_RW_MODE(update_us, 0600);
+
+static struct attribute *damon_sysfs_intervals_attrs[] = {
+ &damon_sysfs_intervals_sample_us_attr.attr,
+ &damon_sysfs_intervals_aggr_us_attr.attr,
+ &damon_sysfs_intervals_update_us_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_intervals);
+
+static struct kobj_type damon_sysfs_intervals_ktype = {
+ .release = damon_sysfs_intervals_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_intervals_groups,
+};
+
+/*
+ * monitoring_attrs directory
+ */
+
+struct damon_sysfs_attrs {
+ struct kobject kobj;
+ struct damon_sysfs_intervals *intervals;
+ struct damon_sysfs_ul_range *nr_regions_range;
+};
+
+static struct damon_sysfs_attrs *damon_sysfs_attrs_alloc(void)
+{
+ struct damon_sysfs_attrs *attrs = kmalloc(sizeof(*attrs), GFP_KERNEL);
+
+ if (!attrs)
+ return NULL;
+ attrs->kobj = (struct kobject){};
+ return attrs;
+}
+
+static int damon_sysfs_attrs_add_dirs(struct damon_sysfs_attrs *attrs)
+{
+ struct damon_sysfs_intervals *intervals;
+ struct damon_sysfs_ul_range *nr_regions_range;
+ int err;
+
+ intervals = damon_sysfs_intervals_alloc(5000, 100000, 60000000);
+ if (!intervals)
+ return -ENOMEM;
+
+ err = kobject_init_and_add(&intervals->kobj,
+ &damon_sysfs_intervals_ktype, &attrs->kobj,
+ "intervals");
+ if (err)
+ goto put_intervals_out;
+ attrs->intervals = intervals;
+
+ nr_regions_range = damon_sysfs_ul_range_alloc(10, 1000);
+ if (!nr_regions_range) {
+ err = -ENOMEM;
+ goto put_intervals_out;
+ }
+
+ err = kobject_init_and_add(&nr_regions_range->kobj,
+ &damon_sysfs_ul_range_ktype, &attrs->kobj,
+ "nr_regions");
+ if (err)
+ goto put_nr_regions_intervals_out;
+ attrs->nr_regions_range = nr_regions_range;
+ return 0;
+
+put_nr_regions_intervals_out:
+ kobject_put(&nr_regions_range->kobj);
+ attrs->nr_regions_range = NULL;
+put_intervals_out:
+ kobject_put(&intervals->kobj);
+ attrs->intervals = NULL;
+ return err;
+}
+
+static void damon_sysfs_attrs_rm_dirs(struct damon_sysfs_attrs *attrs)
+{
+ kobject_put(&attrs->nr_regions_range->kobj);
+ kobject_put(&attrs->intervals->kobj);
+}
+
+static void damon_sysfs_attrs_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_attrs, kobj));
+}
+
+static struct attribute *damon_sysfs_attrs_attrs[] = {
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_attrs);
+
+static struct kobj_type damon_sysfs_attrs_ktype = {
+ .release = damon_sysfs_attrs_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_attrs_groups,
+};
+
+/*
+ * context directory
+ */
+
+/* This should match with enum damon_ops_id */
+static const char * const damon_sysfs_ops_strs[] = {
+ "vaddr",
+ "paddr",
+};
+
+struct damon_sysfs_context {
+ struct kobject kobj;
+ enum damon_ops_id ops_id;
+ struct damon_sysfs_attrs *attrs;
+ struct damon_sysfs_targets *targets;
+};
+
+static struct damon_sysfs_context *damon_sysfs_context_alloc(
+ enum damon_ops_id ops_id)
+{
+ struct damon_sysfs_context *context = kmalloc(sizeof(*context),
+ GFP_KERNEL);
+
+ if (!context)
+ return NULL;
+ context->kobj = (struct kobject){};
+ context->ops_id = ops_id;
+ return context;
+}
+
+static int damon_sysfs_context_set_attrs(struct damon_sysfs_context *context)
+{
+ struct damon_sysfs_attrs *attrs = damon_sysfs_attrs_alloc();
+ int err;
+
+ if (!attrs)
+ return -ENOMEM;
+ err = kobject_init_and_add(&attrs->kobj, &damon_sysfs_attrs_ktype,
+ &context->kobj, "monitoring_attrs");
+ if (err)
+ goto out;
+ err = damon_sysfs_attrs_add_dirs(attrs);
+ if (err)
+ goto out;
+ context->attrs = attrs;
+ return 0;
+
+out:
+ kobject_put(&attrs->kobj);
+ return err;
+}
+
+static int damon_sysfs_context_set_targets(struct damon_sysfs_context *context)
+{
+ struct damon_sysfs_targets *targets = damon_sysfs_targets_alloc();
+ int err;
+
+ if (!targets)
+ return -ENOMEM;
+ err = kobject_init_and_add(&targets->kobj, &damon_sysfs_targets_ktype,
+ &context->kobj, "targets");
+ if (err) {
+ kobject_put(&targets->kobj);
+ return err;
+ }
+ context->targets = targets;
+ return 0;
+}
+
+static int damon_sysfs_context_add_dirs(struct damon_sysfs_context *context)
+{
+ int err;
+
+ err = damon_sysfs_context_set_attrs(context);
+ if (err)
+ return err;
+
+ err = damon_sysfs_context_set_targets(context);
+ if (err)
+ goto put_attrs_out;
+ return 0;
+
+put_attrs_out:
+ kobject_put(&context->attrs->kobj);
+ context->attrs = NULL;
+ return err;
+}
+
+static void damon_sysfs_context_rm_dirs(struct damon_sysfs_context *context)
+{
+ damon_sysfs_attrs_rm_dirs(context->attrs);
+ kobject_put(&context->attrs->kobj);
+ damon_sysfs_targets_rm_dirs(context->targets);
+ kobject_put(&context->targets->kobj);
+}
+
+static ssize_t operations_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_context *context = container_of(kobj,
+ struct damon_sysfs_context, kobj);
+
+ return sysfs_emit(buf, "%s\n", damon_sysfs_ops_strs[context->ops_id]);
+}
+
+static ssize_t operations_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_context *context = container_of(kobj,
+ struct damon_sysfs_context, kobj);
+ enum damon_ops_id id;
+
+ for (id = 0; id < NR_DAMON_OPS; id++) {
+ if (sysfs_streq(buf, damon_sysfs_ops_strs[id])) {
+ /* Support only vaddr */
+ if (id != DAMON_OPS_VADDR)
+ return -EINVAL;
+ context->ops_id = id;
+ return count;
+ }
+ }
+ return -EINVAL;
+}
+
+static void damon_sysfs_context_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_context, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_context_operations_attr =
+ __ATTR_RW_MODE(operations, 0600);
+
+static struct attribute *damon_sysfs_context_attrs[] = {
+ &damon_sysfs_context_operations_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_context);
+
+static struct kobj_type damon_sysfs_context_ktype = {
+ .release = damon_sysfs_context_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_context_groups,
+};
+
+/*
+ * contexts directory
+ */
+
+struct damon_sysfs_contexts {
+ struct kobject kobj;
+ struct damon_sysfs_context **contexts_arr;
+ int nr;
+};
+
+static struct damon_sysfs_contexts *damon_sysfs_contexts_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_contexts), GFP_KERNEL);
+}
+
+static void damon_sysfs_contexts_rm_dirs(struct damon_sysfs_contexts *contexts)
+{
+ struct damon_sysfs_context **contexts_arr = contexts->contexts_arr;
+ int i;
+
+ for (i = 0; i < contexts->nr; i++) {
+ damon_sysfs_context_rm_dirs(contexts_arr[i]);
+ kobject_put(&contexts_arr[i]->kobj);
+ }
+ contexts->nr = 0;
+ kfree(contexts_arr);
+ contexts->contexts_arr = NULL;
+}
+
+static int damon_sysfs_contexts_add_dirs(struct damon_sysfs_contexts *contexts,
+ int nr_contexts)
+{
+ struct damon_sysfs_context **contexts_arr, *context;
+ int err, i;
+
+ damon_sysfs_contexts_rm_dirs(contexts);
+ if (!nr_contexts)
+ return 0;
+
+ contexts_arr = kmalloc_array(nr_contexts, sizeof(*contexts_arr),
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!contexts_arr)
+ return -ENOMEM;
+ contexts->contexts_arr = contexts_arr;
+
+ for (i = 0; i < nr_contexts; i++) {
+ context = damon_sysfs_context_alloc(DAMON_OPS_VADDR);
+ if (!context) {
+ damon_sysfs_contexts_rm_dirs(contexts);
+ return -ENOMEM;
+ }
+
+ err = kobject_init_and_add(&context->kobj,
+ &damon_sysfs_context_ktype, &contexts->kobj,
+ "%d", i);
+ if (err)
+ goto out;
+
+ err = damon_sysfs_context_add_dirs(context);
+ if (err)
+ goto out;
+
+ contexts_arr[i] = context;
+ contexts->nr++;
+ }
+ return 0;
+
+out:
+ damon_sysfs_contexts_rm_dirs(contexts);
+ kobject_put(&context->kobj);
+ return err;
+}
+
+static ssize_t nr_contexts_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_contexts *contexts = container_of(kobj,
+ struct damon_sysfs_contexts, kobj);
+
+ return sysfs_emit(buf, "%d\n", contexts->nr);
+}
+
+static ssize_t nr_contexts_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_contexts *contexts = container_of(kobj,
+ struct damon_sysfs_contexts, kobj);
+ int nr, err;
+
+ err = kstrtoint(buf, 0, &nr);
+ if (err)
+ return err;
+ /* TODO: support multiple contexts per kdamond */
+ if (nr < 0 || 1 < nr)
+ return -EINVAL;
+
+ if (!mutex_trylock(&damon_sysfs_lock))
+ return -EBUSY;
+ err = damon_sysfs_contexts_add_dirs(contexts, nr);
+ mutex_unlock(&damon_sysfs_lock);
+ if (err)
+ return err;
+
+ return count;
+}
+
+static void damon_sysfs_contexts_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_contexts, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_contexts_nr_attr
+ = __ATTR_RW_MODE(nr_contexts, 0600);
+
+static struct attribute *damon_sysfs_contexts_attrs[] = {
+ &damon_sysfs_contexts_nr_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_contexts);
+
+static struct kobj_type damon_sysfs_contexts_ktype = {
+ .release = damon_sysfs_contexts_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_contexts_groups,
+};
+
+/*
+ * kdamond directory
+ */
+
+struct damon_sysfs_kdamond {
+ struct kobject kobj;
+ struct damon_sysfs_contexts *contexts;
+ struct damon_ctx *damon_ctx;
+};
+
+static struct damon_sysfs_kdamond *damon_sysfs_kdamond_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_kdamond), GFP_KERNEL);
+}
+
+static int damon_sysfs_kdamond_add_dirs(struct damon_sysfs_kdamond *kdamond)
+{
+ struct damon_sysfs_contexts *contexts;
+ int err;
+
+ contexts = damon_sysfs_contexts_alloc();
+ if (!contexts)
+ return -ENOMEM;
+
+ err = kobject_init_and_add(&contexts->kobj,
+ &damon_sysfs_contexts_ktype, &kdamond->kobj,
+ "contexts");
+ if (err) {
+ kobject_put(&contexts->kobj);
+ return err;
+ }
+ kdamond->contexts = contexts;
+
+ return err;
+}
+
+static void damon_sysfs_kdamond_rm_dirs(struct damon_sysfs_kdamond *kdamond)
+{
+ damon_sysfs_contexts_rm_dirs(kdamond->contexts);
+ kobject_put(&kdamond->contexts->kobj);
+}
+
+static ssize_t state_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ return -EINVAL;
+}
+
+static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ return -EINVAL;
+}
+
+static ssize_t pid_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return -EINVAL;
+}
+
+static void damon_sysfs_kdamond_release(struct kobject *kobj)
+{
+ struct damon_sysfs_kdamond *kdamond = container_of(kobj,
+ struct damon_sysfs_kdamond, kobj);
+
+ if (kdamond->damon_ctx)
+ damon_destroy_ctx(kdamond->damon_ctx);
+ kfree(container_of(kobj, struct damon_sysfs_kdamond, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_kdamond_state_attr =
+ __ATTR_RW_MODE(state, 0600);
+
+static struct kobj_attribute damon_sysfs_kdamond_pid_attr =
+ __ATTR_RO_MODE(pid, 0400);
+
+static struct attribute *damon_sysfs_kdamond_attrs[] = {
+ &damon_sysfs_kdamond_state_attr.attr,
+ &damon_sysfs_kdamond_pid_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_kdamond);
+
+static struct kobj_type damon_sysfs_kdamond_ktype = {
+ .release = damon_sysfs_kdamond_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_kdamond_groups,
+};
+
+/*
+ * kdamonds directory
+ */
+
+struct damon_sysfs_kdamonds {
+ struct kobject kobj;
+ struct damon_sysfs_kdamond **kdamonds_arr;
+ int nr;
+};
+
+static struct damon_sysfs_kdamonds *damon_sysfs_kdamonds_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_kdamonds), GFP_KERNEL);
+}
+
+static void damon_sysfs_kdamonds_rm_dirs(struct damon_sysfs_kdamonds *kdamonds)
+{
+ struct damon_sysfs_kdamond **kdamonds_arr = kdamonds->kdamonds_arr;
+ int i;
+
+ for (i = 0; i < kdamonds->nr; i++) {
+ damon_sysfs_kdamond_rm_dirs(kdamonds_arr[i]);
+ kobject_put(&kdamonds_arr[i]->kobj);
+ }
+ kdamonds->nr = 0;
+ kfree(kdamonds_arr);
+ kdamonds->kdamonds_arr = NULL;
+}
+
+static int damon_sysfs_nr_running_ctxs(struct damon_sysfs_kdamond **kdamonds,
+ int nr_kdamonds)
+{
+ int nr_running_ctxs = 0;
+ int i;
+
+ for (i = 0; i < nr_kdamonds; i++) {
+ struct damon_ctx *ctx = kdamonds[i]->damon_ctx;
+
+ if (!ctx)
+ continue;
+ mutex_lock(&ctx->kdamond_lock);
+ if (ctx->kdamond)
+ nr_running_ctxs++;
+ mutex_unlock(&ctx->kdamond_lock);
+ }
+ return nr_running_ctxs;
+}
+
+static int damon_sysfs_kdamonds_add_dirs(struct damon_sysfs_kdamonds *kdamonds,
+ int nr_kdamonds)
+{
+ struct damon_sysfs_kdamond **kdamonds_arr, *kdamond;
+ int err, i;
+
+ if (damon_sysfs_nr_running_ctxs(kdamonds->kdamonds_arr, kdamonds->nr))
+ return -EBUSY;
+
+ damon_sysfs_kdamonds_rm_dirs(kdamonds);
+ if (!nr_kdamonds)
+ return 0;
+
+ kdamonds_arr = kmalloc_array(nr_kdamonds, sizeof(*kdamonds_arr),
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!kdamonds_arr)
+ return -ENOMEM;
+ kdamonds->kdamonds_arr = kdamonds_arr;
+
+ for (i = 0; i < nr_kdamonds; i++) {
+ kdamond = damon_sysfs_kdamond_alloc();
+ if (!kdamond) {
+ damon_sysfs_kdamonds_rm_dirs(kdamonds);
+ return -ENOMEM;
+ }
+
+ err = kobject_init_and_add(&kdamond->kobj,
+ &damon_sysfs_kdamond_ktype, &kdamonds->kobj,
+ "%d", i);
+ if (err)
+ goto out;
+
+ err = damon_sysfs_kdamond_add_dirs(kdamond);
+ if (err)
+ goto out;
+
+ kdamonds_arr[i] = kdamond;
+ kdamonds->nr++;
+ }
+ return 0;
+
+out:
+ damon_sysfs_kdamonds_rm_dirs(kdamonds);
+ kobject_put(&kdamond->kobj);
+ return err;
+}
+
+static ssize_t nr_kdamonds_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_kdamonds *kdamonds = container_of(kobj,
+ struct damon_sysfs_kdamonds, kobj);
+
+ return sysfs_emit(buf, "%d\n", kdamonds->nr);
+}
+
+static ssize_t nr_kdamonds_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_kdamonds *kdamonds = container_of(kobj,
+ struct damon_sysfs_kdamonds, kobj);
+ int nr, err;
+
+ err = kstrtoint(buf, 0, &nr);
+ if (err)
+ return err;
+ if (nr < 0)
+ return -EINVAL;
+
+ if (!mutex_trylock(&damon_sysfs_lock))
+ return -EBUSY;
+ err = damon_sysfs_kdamonds_add_dirs(kdamonds, nr);
+ mutex_unlock(&damon_sysfs_lock);
+ if (err)
+ return err;
+
+ return count;
+}
+
+static void damon_sysfs_kdamonds_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_kdamonds, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_kdamonds_nr_attr =
+ __ATTR_RW_MODE(nr_kdamonds, 0600);
+
+static struct attribute *damon_sysfs_kdamonds_attrs[] = {
+ &damon_sysfs_kdamonds_nr_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_kdamonds);
+
+static struct kobj_type damon_sysfs_kdamonds_ktype = {
+ .release = damon_sysfs_kdamonds_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_kdamonds_groups,
+};
+
+/*
+ * damon user interface directory
+ */
+
+struct damon_sysfs_ui_dir {
+ struct kobject kobj;
+ struct damon_sysfs_kdamonds *kdamonds;
+};
+
+static struct damon_sysfs_ui_dir *damon_sysfs_ui_dir_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_ui_dir), GFP_KERNEL);
+}
+
+static int damon_sysfs_ui_dir_add_dirs(struct damon_sysfs_ui_dir *ui_dir)
+{
+ struct damon_sysfs_kdamonds *kdamonds;
+ int err;
+
+ kdamonds = damon_sysfs_kdamonds_alloc();
+ if (!kdamonds)
+ return -ENOMEM;
+
+ err = kobject_init_and_add(&kdamonds->kobj,
+ &damon_sysfs_kdamonds_ktype, &ui_dir->kobj,
+ "kdamonds");
+ if (err) {
+ kobject_put(&kdamonds->kobj);
+ return err;
+ }
+ ui_dir->kdamonds = kdamonds;
+ return err;
+}
+
+static void damon_sysfs_ui_dir_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_ui_dir, kobj));
+}
+
+static struct attribute *damon_sysfs_ui_dir_attrs[] = {
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_ui_dir);
+
+static struct kobj_type damon_sysfs_ui_dir_ktype = {
+ .release = damon_sysfs_ui_dir_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_ui_dir_groups,
+};
+
+static int __init damon_sysfs_init(void)
+{
+ struct kobject *damon_sysfs_root;
+ struct damon_sysfs_ui_dir *admin;
+ int err;
+
+ damon_sysfs_root = kobject_create_and_add("damon", mm_kobj);
+ if (!damon_sysfs_root)
+ return -ENOMEM;
+
+ admin = damon_sysfs_ui_dir_alloc();
+ if (!admin) {
+ kobject_put(damon_sysfs_root);
+ return -ENOMEM;
+ }
+ err = kobject_init_and_add(&admin->kobj, &damon_sysfs_ui_dir_ktype,
+ damon_sysfs_root, "admin");
+ if (err)
+ goto out;
+ err = damon_sysfs_ui_dir_add_dirs(admin);
+ if (err)
+ goto out;
+ return 0;
+
+out:
+ kobject_put(&admin->kobj);
+ kobject_put(damon_sysfs_root);
+ return err;
+}
+subsys_initcall(damon_sysfs_init);
_
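For readers following the 'nr_*' handlers in the patch above, here is a
minimal userspace sketch of the pattern they share: parse the written
number, reject negatives, drop every existing entry, then create entries
'0' to 'N-1'. The names dir_entry, dir_set, dir_set_clear(),
dir_set_resize() and dir_set_store() are hypothetical, malloc()/free()
stand in for the kobject lifecycle, and strtol() stands in for
kstrtoint(), so this is an illustration under those assumptions rather
than the kernel code.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for one numbered sysfs directory. */
struct dir_entry {
	int index;	/* the directory would be named "%d" of this value */
};

/* Hypothetical stand-in for damon_sysfs_targets and its siblings. */
struct dir_set {
	struct dir_entry **arr;
	int nr;
};

/* Drop every entry, mirroring what the *_rm_dirs() helpers do. */
static void dir_set_clear(struct dir_set *set)
{
	int i;

	for (i = 0; i < set->nr; i++)
		free(set->arr[i]);
	set->nr = 0;
	free(set->arr);
	set->arr = NULL;
}

/* Replace the current entries with 'nr' fresh ones, as *_add_dirs() does. */
static int dir_set_resize(struct dir_set *set, int nr)
{
	int i;

	dir_set_clear(set);
	if (!nr)
		return 0;

	set->arr = calloc(nr, sizeof(*set->arr));
	if (!set->arr)
		return -ENOMEM;

	for (i = 0; i < nr; i++) {
		struct dir_entry *e = malloc(sizeof(*e));

		if (!e) {
			dir_set_clear(set);
			return -ENOMEM;
		}
		e->index = i;
		set->arr[i] = e;
		set->nr++;	/* count only fully set up entries */
	}
	return 0;
}

/*
 * Parse the written text and resize. Returns 0 or a negative errno.
 * Simplification: every parse failure is reported as -EINVAL here.
 */
static int dir_set_store(struct dir_set *set, const char *buf)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(buf, &end, 0);
	if (end == buf || errno == ERANGE || val > INT_MAX)
		return -EINVAL;
	if (*end == '\n')	/* tolerate the newline that "echo" appends */
		end++;
	if (*end != '\0' || val < 0)
		return -EINVAL;
	return dir_set_resize(set, (int)val);
}
```

Note that a rejected write leaves the existing entries alone, while any
accepted write rebuilds all of them, so per-entry settings do not
survive a resize in this model.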
* [patch 217/227] mm/damon/sysfs: link DAMON for virtual address spaces monitoring
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/sysfs: link DAMON for virtual address spaces monitoring
This commit links the DAMON sysfs interface to DAMON so that users can
control DAMON via the interface. In detail, writing 'on' to the 'state'
file now constructs a DAMON context from the values that users have
written to the relevant sysfs files and starts that context. Only
virtual address space monitoring is supported at the moment.
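The 'state' handling described above can be modelled in userspace
roughly as follows. This is a sketch under stated assumptions, not the
kernel code: kdamond_model, streq_nl(), model_turn_on(),
model_turn_off() and model_state_store() are hypothetical names,
streq_nl() is a stand-in for the kernel's sysfs_streq(), and building or
starting a real context is reduced to flipping a flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one kdamond directory. */
struct kdamond_model {
	bool has_ctx;		/* a context was built by an earlier 'on' */
	bool running;		/* the worker is currently running */
	int nr_contexts;	/* value last written to nr_contexts */
};

/*
 * Compare two strings, treating a single trailing newline on either one
 * as the end of the string. Stand-in for the kernel's sysfs_streq().
 */
static bool streq_nl(const char *a, const char *b)
{
	while (*a && *a == *b) {
		a++;
		b++;
	}
	if (*a == *b)
		return true;
	if (!*a && *b == '\n' && !b[1])
		return true;
	if (*a == '\n' && !a[1] && !*b)
		return true;
	return false;
}

/* Refuse to start twice; require exactly one configured context. */
static int model_turn_on(struct kdamond_model *k)
{
	if (k->has_ctx && k->running)
		return -EBUSY;
	if (k->nr_contexts != 1)
		return -EINVAL;
	k->has_ctx = true;
	k->running = true;
	return 0;
}

/*
 * Stop the worker but keep the context around. Simplification: stopping
 * an already stopped context succeeds here, which may differ from what
 * the real stop path returns.
 */
static int model_turn_off(struct kdamond_model *k)
{
	if (!k->has_ctx)
		return -EINVAL;
	k->running = false;
	return 0;
}

/* Returns 'count' on success or a negative errno, like a sysfs store. */
static long model_state_store(struct kdamond_model *k, const char *buf,
			      size_t count)
{
	int ret;

	if (streq_nl(buf, "on"))
		ret = model_turn_on(k);
	else if (streq_nl(buf, "off"))
		ret = model_turn_off(k);
	else
		ret = -EINVAL;
	return ret ? ret : (long)count;
}
```

Returning the full byte count on success is what lets a plain
"echo on > state" complete without the shell retrying the write.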
The file hierarchy of the DAMON sysfs interface after this commit is
shown below. In the figure, parent-child relations are represented with
indentation, each directory has a ``/`` suffix, and files in each
directory are separated by commas (",").
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/
│ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
Usage is straightforward. Writing a number ('N') to an 'nr_*' file
creates directories named '0' to 'N-1' next to it. Users construct a
DAMON context by writing the desired values to the files under those
directories, then start each kdamond by writing 'on' to
'kdamonds/<N>/state'.
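As an illustration of that sequence, the following userspace sketch
writes the files in the order the hierarchy requires in order to monitor
one process with one kdamond. The file names come from the figure above;
the helper names sysfs_write() and damon_monitor_pid(), and the 'root'
parameter (which would be /sys/kernel/mm/damon/admin on a kernel with
this interface), are assumptions made for the example.

```c
#include <errno.h>
#include <stdio.h>

/* Write 'val' to file 'rel' below 'root'; 0 on success, -errno on error. */
static int sysfs_write(const char *root, const char *rel, const char *val)
{
	char path[512];
	FILE *f;
	int n, err = 0;

	n = snprintf(path, sizeof(path), "%s/%s", root, rel);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	f = fopen(path, "w");
	if (!f)
		return -errno;
	if (fputs(val, f) == EOF)
		err = -EIO;
	/* stdio buffers the data, so a rejected write shows up at close */
	if (fclose(f) == EOF && !err)
		err = -errno;
	return err;
}

/* Set up kdamond 0 with one vaddr context targeting 'pid', then start it. */
static int damon_monitor_pid(const char *root, int pid)
{
	char pid_str[16];
	const char *steps[][2] = {
		{ "kdamonds/nr_kdamonds", "1" },
		{ "kdamonds/0/contexts/nr_contexts", "1" },
		{ "kdamonds/0/contexts/0/operations", "vaddr" },
		{ "kdamonds/0/contexts/0/targets/nr_targets", "1" },
		{ "kdamonds/0/contexts/0/targets/0/pid_target", pid_str },
		{ "kdamonds/0/state", "on" },
	};
	size_t i;
	int err;

	snprintf(pid_str, sizeof(pid_str), "%d", pid);
	for (i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
		err = sysfs_write(root, steps[i][0], steps[i][1]);
		if (err)
			return err;	/* stop at the first failing step */
	}
	return 0;
}
```

The order matters because each 'nr_*' write creates the directories that
the following steps write into. The files are mode 0600, so this needs
to run as root.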
Link: https://lkml.kernel.org/r/20220228081314.5770-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/sysfs.c | 192 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 189 insertions(+), 3 deletions(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-link-damon-for-virtual-address-spaces-monitoring
+++ a/mm/damon/sysfs.c
@@ -808,22 +808,208 @@ static void damon_sysfs_kdamond_rm_dirs(
kobject_put(&kdamond->contexts->kobj);
}
+static bool damon_sysfs_ctx_running(struct damon_ctx *ctx)
+{
+ bool running;
+
+ mutex_lock(&ctx->kdamond_lock);
+ running = ctx->kdamond != NULL;
+ mutex_unlock(&ctx->kdamond_lock);
+ return running;
+}
+
static ssize_t state_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
- return -EINVAL;
+ struct damon_sysfs_kdamond *kdamond = container_of(kobj,
+ struct damon_sysfs_kdamond, kobj);
+ struct damon_ctx *ctx = kdamond->damon_ctx;
+ bool running;
+
+ if (!ctx)
+ running = false;
+ else
+ running = damon_sysfs_ctx_running(ctx);
+
+ return sysfs_emit(buf, "%s\n", running ? "on" : "off");
+}
+
+static int damon_sysfs_set_attrs(struct damon_ctx *ctx,
+ struct damon_sysfs_attrs *sys_attrs)
+{
+ struct damon_sysfs_intervals *sys_intervals = sys_attrs->intervals;
+ struct damon_sysfs_ul_range *sys_nr_regions =
+ sys_attrs->nr_regions_range;
+
+ return damon_set_attrs(ctx, sys_intervals->sample_us,
+ sys_intervals->aggr_us, sys_intervals->update_us,
+ sys_nr_regions->min, sys_nr_regions->max);
+}
+
+static void damon_sysfs_destroy_targets(struct damon_ctx *ctx)
+{
+ struct damon_target *t, *next;
+
+ damon_for_each_target_safe(t, next, ctx) {
+ if (ctx->ops.id == DAMON_OPS_VADDR)
+ put_pid(t->pid);
+ damon_destroy_target(t);
+ }
+}
+
+static int damon_sysfs_set_targets(struct damon_ctx *ctx,
+ struct damon_sysfs_targets *sysfs_targets)
+{
+ int i;
+
+ for (i = 0; i < sysfs_targets->nr; i++) {
+ struct damon_sysfs_target *sys_target =
+ sysfs_targets->targets_arr[i];
+ struct damon_target *t = damon_new_target();
+
+ if (!t) {
+ damon_sysfs_destroy_targets(ctx);
+ return -ENOMEM;
+ }
+ if (ctx->ops.id == DAMON_OPS_VADDR) {
+ t->pid = find_get_pid(sys_target->pid);
+ if (!t->pid) {
+ damon_sysfs_destroy_targets(ctx);
+ return -EINVAL;
+ }
+ }
+ damon_add_target(ctx, t);
+ }
+ return 0;
+}
+
+static void damon_sysfs_before_terminate(struct damon_ctx *ctx)
+{
+ struct damon_target *t, *next;
+
+ if (ctx->ops.id != DAMON_OPS_VADDR)
+ return;
+
+ mutex_lock(&ctx->kdamond_lock);
+ damon_for_each_target_safe(t, next, ctx) {
+ put_pid(t->pid);
+ damon_destroy_target(t);
+ }
+ mutex_unlock(&ctx->kdamond_lock);
+}
+
+static struct damon_ctx *damon_sysfs_build_ctx(
+ struct damon_sysfs_context *sys_ctx)
+{
+ struct damon_ctx *ctx = damon_new_ctx();
+ int err;
+
+ if (!ctx)
+ return ERR_PTR(-ENOMEM);
+
+ err = damon_select_ops(ctx, sys_ctx->ops_id);
+ if (err)
+ goto out;
+ err = damon_sysfs_set_attrs(ctx, sys_ctx->attrs);
+ if (err)
+ goto out;
+ err = damon_sysfs_set_targets(ctx, sys_ctx->targets);
+ if (err)
+ goto out;
+
+ ctx->callback.before_terminate = damon_sysfs_before_terminate;
+ return ctx;
+
+out:
+ damon_destroy_ctx(ctx);
+ return ERR_PTR(err);
+}
+
+static int damon_sysfs_turn_damon_on(struct damon_sysfs_kdamond *kdamond)
+{
+ struct damon_ctx *ctx;
+ int err;
+
+ if (kdamond->damon_ctx &&
+ damon_sysfs_ctx_running(kdamond->damon_ctx))
+ return -EBUSY;
+ /* TODO: support multiple contexts per kdamond */
+ if (kdamond->contexts->nr != 1)
+ return -EINVAL;
+
+ if (kdamond->damon_ctx)
+ damon_destroy_ctx(kdamond->damon_ctx);
+ kdamond->damon_ctx = NULL;
+
+ ctx = damon_sysfs_build_ctx(kdamond->contexts->contexts_arr[0]);
+ if (IS_ERR(ctx))
+ return PTR_ERR(ctx);
+ err = damon_start(&ctx, 1, false);
+ if (err) {
+ damon_destroy_ctx(ctx);
+ return err;
+ }
+ kdamond->damon_ctx = ctx;
+ return err;
+}
+
+static int damon_sysfs_turn_damon_off(struct damon_sysfs_kdamond *kdamond)
+{
+ if (!kdamond->damon_ctx)
+ return -EINVAL;
+ return damon_stop(&kdamond->damon_ctx, 1);
+ /*
+ * To let users read the final monitoring results of an already
+ * turned-off DAMON, kdamond->damon_ctx is freed in the next
+ * damon_sysfs_turn_damon_on() or nr_kdamonds_store()
+ */
}
static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t count)
{
- return -EINVAL;
+ struct damon_sysfs_kdamond *kdamond = container_of(kobj,
+ struct damon_sysfs_kdamond, kobj);
+ ssize_t ret;
+
+ if (!mutex_trylock(&damon_sysfs_lock))
+ return -EBUSY;
+ if (sysfs_streq(buf, "on"))
+ ret = damon_sysfs_turn_damon_on(kdamond);
+ else if (sysfs_streq(buf, "off"))
+ ret = damon_sysfs_turn_damon_off(kdamond);
+ else
+ ret = -EINVAL;
+ mutex_unlock(&damon_sysfs_lock);
+ if (!ret)
+ ret = count;
+ return ret;
}
static ssize_t pid_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
{
- return -EINVAL;
+ struct damon_sysfs_kdamond *kdamond = container_of(kobj,
+ struct damon_sysfs_kdamond, kobj);
+ struct damon_ctx *ctx;
+ int pid;
+
+ if (!mutex_trylock(&damon_sysfs_lock))
+ return -EBUSY;
+ ctx = kdamond->damon_ctx;
+ if (!ctx) {
+ pid = -1;
+ goto out;
+ }
+ mutex_lock(&ctx->kdamond_lock);
+ if (!ctx->kdamond)
+ pid = -1;
+ else
+ pid = ctx->kdamond->pid;
+ mutex_unlock(&ctx->kdamond_lock);
+out:
+ mutex_unlock(&damon_sysfs_lock);
+ return sysfs_emit(buf, "%d\n", pid);
}
static void damon_sysfs_kdamond_release(struct kobject *kobj)
_
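damon_sysfs_build_ctx() in the patch above reports failure by encoding
an errno in the returned pointer, and its caller tests the result with
IS_ERR(). A small userspace model of that convention, combined with the
single 'out' label used for unwinding, might look like this. The names
model_err_ptr(), model_ptr_err(), model_is_err(), model_ctx and
model_build_ctx() are hypothetical; only the top 4095 address values are
treated as errors, which is an assumption that mirrors the kernel's
MAX_ERRNO.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Recover the negative errno from an error pointer. */
static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True when the pointer lies in the range reserved for errnos. */
static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical stand-in for a monitoring context. */
struct model_ctx {
	int ops_id;
	int nr_targets;
};

/* Build a context step by step; any failing step unwinds through 'out'. */
static struct model_ctx *model_build_ctx(int ops_id, int nr_targets)
{
	struct model_ctx *ctx = calloc(1, sizeof(*ctx));
	int err;

	if (!ctx)
		return model_err_ptr(-ENOMEM);

	if (ops_id != 0) {	/* only operations set 0 is supported here */
		err = -EINVAL;
		goto out;
	}
	ctx->ops_id = ops_id;

	if (nr_targets < 0) {
		err = -EINVAL;
		goto out;
	}
	ctx->nr_targets = nr_targets;
	return ctx;

out:
	free(ctx);
	return model_err_ptr(err);
}
```

The benefit of the convention is that the caller gets both the object
and the reason for failure through one return value, without an extra
output parameter.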
* [patch 217/227] mm/damon/sysfs: link DAMON for virtual address spaces monitoring
@ 2022-03-22 21:49 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/sysfs: link DAMON for virtual address spaces monitoring
This commit links the DAMON sysfs interface to DAMON so that users can
control DAMON via the interface. In detail, writing 'on' to the 'state'
file now constructs a DAMON context from the values that users have
written to the relevant sysfs files and starts that context. Only
virtual address space monitoring is supported at the moment.
The file hierarchy of the DAMON sysfs interface after this commit is
shown below. In the figure, parent-child relations are represented with
indentation, each directory has a ``/`` suffix, and files in each
directory are separated by commas (",").
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/
│ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
Usage is straightforward. Writing a number ('N') to an 'nr_*' file
creates directories named '0' to 'N-1' next to it. Users construct a
DAMON context by writing the desired values to the files under those
directories, then start each kdamond by writing 'on' to
'kdamonds/<N>/state'.
Link: https://lkml.kernel.org/r/20220228081314.5770-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/sysfs.c | 192 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 189 insertions(+), 3 deletions(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-link-damon-for-virtual-address-spaces-monitoring
+++ a/mm/damon/sysfs.c
@@ -808,22 +808,208 @@ static void damon_sysfs_kdamond_rm_dirs(
kobject_put(&kdamond->contexts->kobj);
}
+static bool damon_sysfs_ctx_running(struct damon_ctx *ctx)
+{
+ bool running;
+
+ mutex_lock(&ctx->kdamond_lock);
+ running = ctx->kdamond != NULL;
+ mutex_unlock(&ctx->kdamond_lock);
+ return running;
+}
+
static ssize_t state_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
- return -EINVAL;
+ struct damon_sysfs_kdamond *kdamond = container_of(kobj,
+ struct damon_sysfs_kdamond, kobj);
+ struct damon_ctx *ctx = kdamond->damon_ctx;
+ bool running;
+
+ if (!ctx)
+ running = false;
+ else
+ running = damon_sysfs_ctx_running(ctx);
+
+ return sysfs_emit(buf, "%s\n", running ? "on" : "off");
+}
+
+static int damon_sysfs_set_attrs(struct damon_ctx *ctx,
+ struct damon_sysfs_attrs *sys_attrs)
+{
+ struct damon_sysfs_intervals *sys_intervals = sys_attrs->intervals;
+ struct damon_sysfs_ul_range *sys_nr_regions =
+ sys_attrs->nr_regions_range;
+
+ return damon_set_attrs(ctx, sys_intervals->sample_us,
+ sys_intervals->aggr_us, sys_intervals->update_us,
+ sys_nr_regions->min, sys_nr_regions->max);
+}
+
+static void damon_sysfs_destroy_targets(struct damon_ctx *ctx)
+{
+ struct damon_target *t, *next;
+
+ damon_for_each_target_safe(t, next, ctx) {
+ if (ctx->ops.id == DAMON_OPS_VADDR)
+ put_pid(t->pid);
+ damon_destroy_target(t);
+ }
+}
+
+static int damon_sysfs_set_targets(struct damon_ctx *ctx,
+ struct damon_sysfs_targets *sysfs_targets)
+{
+ int i;
+
+ for (i = 0; i < sysfs_targets->nr; i++) {
+ struct damon_sysfs_target *sys_target =
+ sysfs_targets->targets_arr[i];
+ struct damon_target *t = damon_new_target();
+
+ if (!t) {
+ damon_sysfs_destroy_targets(ctx);
+ return -ENOMEM;
+ }
+ if (ctx->ops.id == DAMON_OPS_VADDR) {
+ t->pid = find_get_pid(sys_target->pid);
+ if (!t->pid) {
+ damon_sysfs_destroy_targets(ctx);
+ return -EINVAL;
+ }
+ }
+ damon_add_target(ctx, t);
+ }
+ return 0;
+}
+
+static void damon_sysfs_before_terminate(struct damon_ctx *ctx)
+{
+ struct damon_target *t, *next;
+
+ if (ctx->ops.id != DAMON_OPS_VADDR)
+ return;
+
+ mutex_lock(&ctx->kdamond_lock);
+ damon_for_each_target_safe(t, next, ctx) {
+ put_pid(t->pid);
+ damon_destroy_target(t);
+ }
+ mutex_unlock(&ctx->kdamond_lock);
+}
+
+static struct damon_ctx *damon_sysfs_build_ctx(
+ struct damon_sysfs_context *sys_ctx)
+{
+ struct damon_ctx *ctx = damon_new_ctx();
+ int err;
+
+ if (!ctx)
+ return ERR_PTR(-ENOMEM);
+
+ err = damon_select_ops(ctx, sys_ctx->ops_id);
+ if (err)
+ goto out;
+ err = damon_sysfs_set_attrs(ctx, sys_ctx->attrs);
+ if (err)
+ goto out;
+ err = damon_sysfs_set_targets(ctx, sys_ctx->targets);
+ if (err)
+ goto out;
+
+ ctx->callback.before_terminate = damon_sysfs_before_terminate;
+ return ctx;
+
+out:
+ damon_destroy_ctx(ctx);
+ return ERR_PTR(err);
+}
+
+static int damon_sysfs_turn_damon_on(struct damon_sysfs_kdamond *kdamond)
+{
+ struct damon_ctx *ctx;
+ int err;
+
+ if (kdamond->damon_ctx &&
+ damon_sysfs_ctx_running(kdamond->damon_ctx))
+ return -EBUSY;
+ /* TODO: support multiple contexts per kdamond */
+ if (kdamond->contexts->nr != 1)
+ return -EINVAL;
+
+ if (kdamond->damon_ctx)
+ damon_destroy_ctx(kdamond->damon_ctx);
+ kdamond->damon_ctx = NULL;
+
+ ctx = damon_sysfs_build_ctx(kdamond->contexts->contexts_arr[0]);
+ if (IS_ERR(ctx))
+ return PTR_ERR(ctx);
+ err = damon_start(&ctx, 1, false);
+ if (err) {
+ damon_destroy_ctx(ctx);
+ return err;
+ }
+ kdamond->damon_ctx = ctx;
+ return err;
+}
+
+static int damon_sysfs_turn_damon_off(struct damon_sysfs_kdamond *kdamond)
+{
+ if (!kdamond->damon_ctx)
+ return -EINVAL;
+ return damon_stop(&kdamond->damon_ctx, 1);
+ /*
+ * To let users read the final monitoring results of an already
+ * turned-off DAMON, kdamond->damon_ctx is freed by the next
+ * damon_sysfs_turn_damon_on() or by kdamonds_nr_store().
+ */
}
static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t count)
{
- return -EINVAL;
+ struct damon_sysfs_kdamond *kdamond = container_of(kobj,
+ struct damon_sysfs_kdamond, kobj);
+ ssize_t ret;
+
+ if (!mutex_trylock(&damon_sysfs_lock))
+ return -EBUSY;
+ if (sysfs_streq(buf, "on"))
+ ret = damon_sysfs_turn_damon_on(kdamond);
+ else if (sysfs_streq(buf, "off"))
+ ret = damon_sysfs_turn_damon_off(kdamond);
+ else
+ ret = -EINVAL;
+ mutex_unlock(&damon_sysfs_lock);
+ if (!ret)
+ ret = count;
+ return ret;
}
static ssize_t pid_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
{
- return -EINVAL;
+ struct damon_sysfs_kdamond *kdamond = container_of(kobj,
+ struct damon_sysfs_kdamond, kobj);
+ struct damon_ctx *ctx;
+ int pid;
+
+ if (!mutex_trylock(&damon_sysfs_lock))
+ return -EBUSY;
+ ctx = kdamond->damon_ctx;
+ if (!ctx) {
+ pid = -1;
+ goto out;
+ }
+ mutex_lock(&ctx->kdamond_lock);
+ if (!ctx->kdamond)
+ pid = -1;
+ else
+ pid = ctx->kdamond->pid;
+ mutex_unlock(&ctx->kdamond_lock);
+out:
+ mutex_unlock(&damon_sysfs_lock);
+ return sysfs_emit(buf, "%d\n", pid);
}
static void damon_sysfs_kdamond_release(struct kobject *kobj)
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 218/227] mm/damon/sysfs: support the physical address space monitoring
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 9864 bytes --]
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/sysfs: support the physical address space monitoring
This commit makes the DAMON sysfs interface support physical address
space monitoring. Specifically, it adds support for setting the initial
monitoring regions by adding a 'regions' directory under each target
directory, and makes the context 'operations' file accept 'paddr' in
addition to 'vaddr'.
As a result, the file hierarchy becomes as follows:
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/
│ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions <- NEW DIRECTORY
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
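The 'start' and 'end' values written under each regions/<N>/ directory are
only validated when the kdamond is turned on, by damon_sysfs_set_regions()
in the diff below. The following is a minimal userspace sketch of that
check, not kernel code: 'struct region_input' and check_regions() are
hypothetical names used only for illustration, and the sketch assumes the
regions are supplied in directory order (0, 1, 2, ...).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for one regions/<N>/ directory. */
struct region_input {
	unsigned long start;
	unsigned long end;
};

/*
 * Returns 0 if every region has start <= end and no region begins before
 * the previous one ends; returns -EINVAL otherwise.  This mirrors the two
 * -EINVAL conditions in damon_sysfs_set_regions().
 */
static int check_regions(const struct region_input *regions, size_t nr)
{
	size_t i;

	assert(regions != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		if (regions[i].start > regions[i].end)
			return -EINVAL;
		if (i > 0 && regions[i - 1].end > regions[i].start)
			return -EINVAL;
	}
	return 0;
}
```

Under these assumptions, adjacent regions such as [0, 4096) and
[4096, 8192) pass, while overlapping or unsorted regions are rejected, so
writing 'on' to the 'state' file would be expected to fail with -EINVAL.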
Link: https://lkml.kernel.org/r/20220228081314.5770-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/sysfs.c | 276 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 271 insertions(+), 5 deletions(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-support-the-physical-address-space-monitoring
+++ a/mm/damon/sysfs.c
@@ -114,11 +114,219 @@ static struct kobj_type damon_sysfs_ul_r
};
/*
+ * init region directory
+ */
+
+struct damon_sysfs_region {
+ struct kobject kobj;
+ unsigned long start;
+ unsigned long end;
+};
+
+static struct damon_sysfs_region *damon_sysfs_region_alloc(
+ unsigned long start,
+ unsigned long end)
+{
+ struct damon_sysfs_region *region = kmalloc(sizeof(*region),
+ GFP_KERNEL);
+
+ if (!region)
+ return NULL;
+ region->kobj = (struct kobject){};
+ region->start = start;
+ region->end = end;
+ return region;
+}
+
+static ssize_t start_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_region *region = container_of(kobj,
+ struct damon_sysfs_region, kobj);
+
+ return sysfs_emit(buf, "%lu\n", region->start);
+}
+
+static ssize_t start_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct damon_sysfs_region *region = container_of(kobj,
+ struct damon_sysfs_region, kobj);
+ int err = kstrtoul(buf, 0, &region->start);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static ssize_t end_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_region *region = container_of(kobj,
+ struct damon_sysfs_region, kobj);
+
+ return sysfs_emit(buf, "%lu\n", region->end);
+}
+
+static ssize_t end_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct damon_sysfs_region *region = container_of(kobj,
+ struct damon_sysfs_region, kobj);
+ int err = kstrtoul(buf, 0, &region->end);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static void damon_sysfs_region_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_region, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_region_start_attr =
+ __ATTR_RW_MODE(start, 0600);
+
+static struct kobj_attribute damon_sysfs_region_end_attr =
+ __ATTR_RW_MODE(end, 0600);
+
+static struct attribute *damon_sysfs_region_attrs[] = {
+ &damon_sysfs_region_start_attr.attr,
+ &damon_sysfs_region_end_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_region);
+
+static struct kobj_type damon_sysfs_region_ktype = {
+ .release = damon_sysfs_region_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_region_groups,
+};
+
+/*
+ * init_regions directory
+ */
+
+struct damon_sysfs_regions {
+ struct kobject kobj;
+ struct damon_sysfs_region **regions_arr;
+ int nr;
+};
+
+static struct damon_sysfs_regions *damon_sysfs_regions_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_regions), GFP_KERNEL);
+}
+
+static void damon_sysfs_regions_rm_dirs(struct damon_sysfs_regions *regions)
+{
+ struct damon_sysfs_region **regions_arr = regions->regions_arr;
+ int i;
+
+ for (i = 0; i < regions->nr; i++)
+ kobject_put(&regions_arr[i]->kobj);
+ regions->nr = 0;
+ kfree(regions_arr);
+ regions->regions_arr = NULL;
+}
+
+static int damon_sysfs_regions_add_dirs(struct damon_sysfs_regions *regions,
+ int nr_regions)
+{
+ struct damon_sysfs_region **regions_arr, *region;
+ int err, i;
+
+ damon_sysfs_regions_rm_dirs(regions);
+ if (!nr_regions)
+ return 0;
+
+ regions_arr = kmalloc_array(nr_regions, sizeof(*regions_arr),
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!regions_arr)
+ return -ENOMEM;
+ regions->regions_arr = regions_arr;
+
+ for (i = 0; i < nr_regions; i++) {
+ region = damon_sysfs_region_alloc(0, 0);
+ if (!region) {
+ damon_sysfs_regions_rm_dirs(regions);
+ return -ENOMEM;
+ }
+
+ err = kobject_init_and_add(&region->kobj,
+ &damon_sysfs_region_ktype, &regions->kobj,
+ "%d", i);
+ if (err) {
+ kobject_put(&region->kobj);
+ damon_sysfs_regions_rm_dirs(regions);
+ return err;
+ }
+
+ regions_arr[i] = region;
+ regions->nr++;
+ }
+ return 0;
+}
+
+static ssize_t nr_regions_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_regions *regions = container_of(kobj,
+ struct damon_sysfs_regions, kobj);
+
+ return sysfs_emit(buf, "%d\n", regions->nr);
+}
+
+static ssize_t nr_regions_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_regions *regions = container_of(kobj,
+ struct damon_sysfs_regions, kobj);
+ int nr, err = kstrtoint(buf, 0, &nr);
+
+ if (err)
+ return err;
+ if (nr < 0)
+ return -EINVAL;
+
+ if (!mutex_trylock(&damon_sysfs_lock))
+ return -EBUSY;
+ err = damon_sysfs_regions_add_dirs(regions, nr);
+ mutex_unlock(&damon_sysfs_lock);
+ if (err)
+ return err;
+
+ return count;
+}
+
+static void damon_sysfs_regions_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_regions, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_regions_nr_attr =
+ __ATTR_RW_MODE(nr_regions, 0600);
+
+static struct attribute *damon_sysfs_regions_attrs[] = {
+ &damon_sysfs_regions_nr_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_regions);
+
+static struct kobj_type damon_sysfs_regions_ktype = {
+ .release = damon_sysfs_regions_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_regions_groups,
+};
+
+/*
* target directory
*/
struct damon_sysfs_target {
struct kobject kobj;
+ struct damon_sysfs_regions *regions;
int pid;
};
@@ -127,6 +335,29 @@ static struct damon_sysfs_target *damon_
return kzalloc(sizeof(struct damon_sysfs_target), GFP_KERNEL);
}
+static int damon_sysfs_target_add_dirs(struct damon_sysfs_target *target)
+{
+ struct damon_sysfs_regions *regions = damon_sysfs_regions_alloc();
+ int err;
+
+ if (!regions)
+ return -ENOMEM;
+
+ err = kobject_init_and_add(&regions->kobj, &damon_sysfs_regions_ktype,
+ &target->kobj, "regions");
+ if (err)
+ kobject_put(&regions->kobj);
+ else
+ target->regions = regions;
+ return err;
+}
+
+static void damon_sysfs_target_rm_dirs(struct damon_sysfs_target *target)
+{
+ damon_sysfs_regions_rm_dirs(target->regions);
+ kobject_put(&target->regions->kobj);
+}
+
static ssize_t pid_target_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
{
@@ -188,8 +419,10 @@ static void damon_sysfs_targets_rm_dirs(
struct damon_sysfs_target **targets_arr = targets->targets_arr;
int i;
- for (i = 0; i < targets->nr; i++)
+ for (i = 0; i < targets->nr; i++) {
+ damon_sysfs_target_rm_dirs(targets_arr[i]);
kobject_put(&targets_arr[i]->kobj);
+ }
targets->nr = 0;
kfree(targets_arr);
targets->targets_arr = NULL;
@@ -224,6 +457,10 @@ static int damon_sysfs_targets_add_dirs(
if (err)
goto out;
+ err = damon_sysfs_target_add_dirs(target);
+ if (err)
+ goto out;
+
targets_arr[i] = target;
targets->nr++;
}
@@ -610,9 +847,6 @@ static ssize_t operations_store(struct k
for (id = 0; id < NR_DAMON_OPS; id++) {
if (sysfs_streq(buf, damon_sysfs_ops_strs[id])) {
- /* Support only vaddr */
- if (id != DAMON_OPS_VADDR)
- return -EINVAL;
context->ops_id = id;
return count;
}
@@ -857,10 +1091,37 @@ static void damon_sysfs_destroy_targets(
}
}
+static int damon_sysfs_set_regions(struct damon_target *t,
+ struct damon_sysfs_regions *sysfs_regions)
+{
+ int i;
+
+ for (i = 0; i < sysfs_regions->nr; i++) {
+ struct damon_sysfs_region *sys_region =
+ sysfs_regions->regions_arr[i];
+ struct damon_region *prev, *r;
+
+ if (sys_region->start > sys_region->end)
+ return -EINVAL;
+ r = damon_new_region(sys_region->start, sys_region->end);
+ if (!r)
+ return -ENOMEM;
+ damon_add_region(r, t);
+ if (damon_nr_regions(t) > 1) {
+ prev = damon_prev_region(r);
+ if (prev->ar.end > r->ar.start) {
+ damon_destroy_region(r, t);
+ return -EINVAL;
+ }
+ }
+ }
+ return 0;
+}
+
static int damon_sysfs_set_targets(struct damon_ctx *ctx,
struct damon_sysfs_targets *sysfs_targets)
{
- int i;
+ int i, err;
for (i = 0; i < sysfs_targets->nr; i++) {
struct damon_sysfs_target *sys_target =
@@ -879,6 +1140,11 @@ static int damon_sysfs_set_targets(struc
}
}
damon_add_target(ctx, t);
+ err = damon_sysfs_set_regions(t, sys_target->regions);
+ if (err) {
+ damon_sysfs_destroy_targets(ctx);
+ return err;
+ }
}
return 0;
}
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 219/227] mm/damon/sysfs: support DAMON-based Operation Schemes
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 13820 bytes --]
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/sysfs: support DAMON-based Operation Schemes
This commit makes the DAMON sysfs interface support the DAMON-based
operation schemes (DAMOS) feature. Specifically, it adds a 'schemes'
directory under each context directory, and makes writes to the kdamond
'state' file respect the contents of that directory.
Note that this commit doesn't support all features of DAMOS, only the
target access pattern and action. Support for quotas, prioritization,
and watermarks will follow.
As a result, the file hierarchy becomes as follows:
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes <- NEW DIRECTORY
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
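The 'action' file of each scheme accepts one of a fixed set of keywords,
matched by action_store() in the diff below; new schemes start out as
'stat'. The following is a minimal userspace sketch of that lookup, not
kernel code: streq_nl() is a hypothetical, simplified approximation of the
kernel's sysfs_streq() that only tolerates one trailing newline in the
written buffer, and parse_action() is a hypothetical helper that returns an
index rather than an enum damos_action value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Order mirrors damon_sysfs_damos_action_strs[] in the patch. */
static const char * const action_strs[] = {
	"willneed", "cold", "pageout", "hugepage", "nohugepage", "stat",
};
#define NR_ACTIONS (sizeof(action_strs) / sizeof(action_strs[0]))

/* Userspace approximation of sysfs_streq(): ignore one trailing newline. */
static int streq_nl(const char *buf, const char *word)
{
	size_t len = strlen(word);

	if (strncmp(buf, word, len) != 0)
		return 0;
	return buf[len] == '\0' || (buf[len] == '\n' && buf[len + 1] == '\0');
}

/* Returns the index of the matching action, or -1 if none matches. */
static int parse_action(const char *buf)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < NR_ACTIONS; i++) {
		if (streq_nl(buf, action_strs[i]))
			return (int)i;
	}
	return -1;
}
```

Under these assumptions, a write produced by 'echo pageout' (which appends
a newline) selects the third entry, while an unknown keyword maps to the
-EINVAL path of action_store().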
Link: https://lkml.kernel.org/r/20220228081314.5770-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/sysfs.c | 410 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 410 insertions(+)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-support-damon-based-operation-schemes
+++ a/mm/damon/sysfs.c
@@ -114,6 +114,347 @@ static struct kobj_type damon_sysfs_ul_r
};
/*
+ * access_pattern directory
+ */
+
+struct damon_sysfs_access_pattern {
+ struct kobject kobj;
+ struct damon_sysfs_ul_range *sz;
+ struct damon_sysfs_ul_range *nr_accesses;
+ struct damon_sysfs_ul_range *age;
+};
+
+static
+struct damon_sysfs_access_pattern *damon_sysfs_access_pattern_alloc(void)
+{
+ struct damon_sysfs_access_pattern *access_pattern =
+ kmalloc(sizeof(*access_pattern), GFP_KERNEL);
+
+ if (!access_pattern)
+ return NULL;
+ access_pattern->kobj = (struct kobject){};
+ return access_pattern;
+}
+
+static int damon_sysfs_access_pattern_add_range_dir(
+ struct damon_sysfs_access_pattern *access_pattern,
+ struct damon_sysfs_ul_range **range_dir_ptr,
+ char *name)
+{
+ struct damon_sysfs_ul_range *range = damon_sysfs_ul_range_alloc(0, 0);
+ int err;
+
+ if (!range)
+ return -ENOMEM;
+ err = kobject_init_and_add(&range->kobj, &damon_sysfs_ul_range_ktype,
+ &access_pattern->kobj, name);
+ if (err)
+ kobject_put(&range->kobj);
+ else
+ *range_dir_ptr = range;
+ return err;
+}
+
+static int damon_sysfs_access_pattern_add_dirs(
+ struct damon_sysfs_access_pattern *access_pattern)
+{
+ int err;
+
+ err = damon_sysfs_access_pattern_add_range_dir(access_pattern,
+ &access_pattern->sz, "sz");
+ if (err)
+ goto put_sz_out;
+
+ err = damon_sysfs_access_pattern_add_range_dir(access_pattern,
+ &access_pattern->nr_accesses, "nr_accesses");
+ if (err)
+ goto put_nr_accesses_sz_out;
+
+ err = damon_sysfs_access_pattern_add_range_dir(access_pattern,
+ &access_pattern->age, "age");
+ if (err)
+ goto put_age_nr_accesses_sz_out;
+ return 0;
+
+put_age_nr_accesses_sz_out:
+ kobject_put(&access_pattern->age->kobj);
+ access_pattern->age = NULL;
+put_nr_accesses_sz_out:
+ kobject_put(&access_pattern->nr_accesses->kobj);
+ access_pattern->nr_accesses = NULL;
+put_sz_out:
+ kobject_put(&access_pattern->sz->kobj);
+ access_pattern->sz = NULL;
+ return err;
+}
+
+static void damon_sysfs_access_pattern_rm_dirs(
+ struct damon_sysfs_access_pattern *access_pattern)
+{
+ kobject_put(&access_pattern->sz->kobj);
+ kobject_put(&access_pattern->nr_accesses->kobj);
+ kobject_put(&access_pattern->age->kobj);
+}
+
+static void damon_sysfs_access_pattern_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_access_pattern, kobj));
+}
+
+static struct attribute *damon_sysfs_access_pattern_attrs[] = {
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_access_pattern);
+
+static struct kobj_type damon_sysfs_access_pattern_ktype = {
+ .release = damon_sysfs_access_pattern_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_access_pattern_groups,
+};
+
+/*
+ * scheme directory
+ */
+
+struct damon_sysfs_scheme {
+ struct kobject kobj;
+ enum damos_action action;
+ struct damon_sysfs_access_pattern *access_pattern;
+};
+
+/* This should match with enum damos_action */
+static const char * const damon_sysfs_damos_action_strs[] = {
+ "willneed",
+ "cold",
+ "pageout",
+ "hugepage",
+ "nohugepage",
+ "stat",
+};
+
+static struct damon_sysfs_scheme *damon_sysfs_scheme_alloc(
+ enum damos_action action)
+{
+ struct damon_sysfs_scheme *scheme = kmalloc(sizeof(*scheme),
+ GFP_KERNEL);
+
+ if (!scheme)
+ return NULL;
+ scheme->kobj = (struct kobject){};
+ scheme->action = action;
+ return scheme;
+}
+
+static int damon_sysfs_scheme_set_access_pattern(
+ struct damon_sysfs_scheme *scheme)
+{
+ struct damon_sysfs_access_pattern *access_pattern;
+ int err;
+
+ access_pattern = damon_sysfs_access_pattern_alloc();
+ if (!access_pattern)
+ return -ENOMEM;
+ err = kobject_init_and_add(&access_pattern->kobj,
+ &damon_sysfs_access_pattern_ktype, &scheme->kobj,
+ "access_pattern");
+ if (err)
+ goto out;
+ err = damon_sysfs_access_pattern_add_dirs(access_pattern);
+ if (err)
+ goto out;
+ scheme->access_pattern = access_pattern;
+ return 0;
+
+out:
+ kobject_put(&access_pattern->kobj);
+ return err;
+}
+
+static int damon_sysfs_scheme_add_dirs(struct damon_sysfs_scheme *scheme)
+{
+ int err;
+
+ err = damon_sysfs_scheme_set_access_pattern(scheme);
+ if (err)
+ return err;
+ return 0;
+}
+
+static void damon_sysfs_scheme_rm_dirs(struct damon_sysfs_scheme *scheme)
+{
+ damon_sysfs_access_pattern_rm_dirs(scheme->access_pattern);
+ kobject_put(&scheme->access_pattern->kobj);
+}
+
+static ssize_t action_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_scheme *scheme = container_of(kobj,
+ struct damon_sysfs_scheme, kobj);
+
+ return sysfs_emit(buf, "%s\n",
+ damon_sysfs_damos_action_strs[scheme->action]);
+}
+
+static ssize_t action_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct damon_sysfs_scheme *scheme = container_of(kobj,
+ struct damon_sysfs_scheme, kobj);
+ enum damos_action action;
+
+ for (action = 0; action < NR_DAMOS_ACTIONS; action++) {
+ if (sysfs_streq(buf, damon_sysfs_damos_action_strs[action])) {
+ scheme->action = action;
+ return count;
+ }
+ }
+ return -EINVAL;
+}
+
+static void damon_sysfs_scheme_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_scheme, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_scheme_action_attr =
+ __ATTR_RW_MODE(action, 0600);
+
+static struct attribute *damon_sysfs_scheme_attrs[] = {
+ &damon_sysfs_scheme_action_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_scheme);
+
+static struct kobj_type damon_sysfs_scheme_ktype = {
+ .release = damon_sysfs_scheme_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_scheme_groups,
+};
+
+/*
+ * schemes directory
+ */
+
+struct damon_sysfs_schemes {
+ struct kobject kobj;
+ struct damon_sysfs_scheme **schemes_arr;
+ int nr;
+};
+
+static struct damon_sysfs_schemes *damon_sysfs_schemes_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_schemes), GFP_KERNEL);
+}
+
+static void damon_sysfs_schemes_rm_dirs(struct damon_sysfs_schemes *schemes)
+{
+ struct damon_sysfs_scheme **schemes_arr = schemes->schemes_arr;
+ int i;
+
+ for (i = 0; i < schemes->nr; i++) {
+ damon_sysfs_scheme_rm_dirs(schemes_arr[i]);
+ kobject_put(&schemes_arr[i]->kobj);
+ }
+ schemes->nr = 0;
+ kfree(schemes_arr);
+ schemes->schemes_arr = NULL;
+}
+
+static int damon_sysfs_schemes_add_dirs(struct damon_sysfs_schemes *schemes,
+ int nr_schemes)
+{
+ struct damon_sysfs_scheme **schemes_arr, *scheme;
+ int err, i;
+
+ damon_sysfs_schemes_rm_dirs(schemes);
+ if (!nr_schemes)
+ return 0;
+
+ schemes_arr = kmalloc_array(nr_schemes, sizeof(*schemes_arr),
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!schemes_arr)
+ return -ENOMEM;
+ schemes->schemes_arr = schemes_arr;
+
+ for (i = 0; i < nr_schemes; i++) {
+ scheme = damon_sysfs_scheme_alloc(DAMOS_STAT);
+ if (!scheme) {
+ damon_sysfs_schemes_rm_dirs(schemes);
+ return -ENOMEM;
+ }
+
+ err = kobject_init_and_add(&scheme->kobj,
+ &damon_sysfs_scheme_ktype, &schemes->kobj,
+ "%d", i);
+ if (err)
+ goto out;
+ err = damon_sysfs_scheme_add_dirs(scheme);
+ if (err)
+ goto out;
+
+ schemes_arr[i] = scheme;
+ schemes->nr++;
+ }
+ return 0;
+
+out:
+ damon_sysfs_schemes_rm_dirs(schemes);
+ kobject_put(&scheme->kobj);
+ return err;
+}
+
+static ssize_t nr_schemes_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_schemes *schemes = container_of(kobj,
+ struct damon_sysfs_schemes, kobj);
+
+ return sysfs_emit(buf, "%d\n", schemes->nr);
+}
+
+static ssize_t nr_schemes_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_schemes *schemes = container_of(kobj,
+ struct damon_sysfs_schemes, kobj);
+ int nr, err = kstrtoint(buf, 0, &nr);
+
+ if (err)
+ return err;
+ if (nr < 0)
+ return -EINVAL;
+
+ if (!mutex_trylock(&damon_sysfs_lock))
+ return -EBUSY;
+ err = damon_sysfs_schemes_add_dirs(schemes, nr);
+ mutex_unlock(&damon_sysfs_lock);
+ if (err)
+ return err;
+ return count;
+}
+
+static void damon_sysfs_schemes_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_schemes, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_schemes_nr_attr =
+ __ATTR_RW_MODE(nr_schemes, 0600);
+
+static struct attribute *damon_sysfs_schemes_attrs[] = {
+ &damon_sysfs_schemes_nr_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_schemes);
+
+static struct kobj_type damon_sysfs_schemes_ktype = {
+ .release = damon_sysfs_schemes_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_schemes_groups,
+};
+
+/*
* init region directory
*/
@@ -748,6 +1089,7 @@ struct damon_sysfs_context {
enum damon_ops_id ops_id;
struct damon_sysfs_attrs *attrs;
struct damon_sysfs_targets *targets;
+ struct damon_sysfs_schemes *schemes;
};
static struct damon_sysfs_context *damon_sysfs_context_alloc(
@@ -802,6 +1144,23 @@ static int damon_sysfs_context_set_targe
return 0;
}
+static int damon_sysfs_context_set_schemes(struct damon_sysfs_context *context)
+{
+ struct damon_sysfs_schemes *schemes = damon_sysfs_schemes_alloc();
+ int err;
+
+ if (!schemes)
+ return -ENOMEM;
+ err = kobject_init_and_add(&schemes->kobj, &damon_sysfs_schemes_ktype,
+ &context->kobj, "schemes");
+ if (err) {
+ kobject_put(&schemes->kobj);
+ return err;
+ }
+ context->schemes = schemes;
+ return 0;
+}
+
static int damon_sysfs_context_add_dirs(struct damon_sysfs_context *context)
{
int err;
@@ -813,8 +1172,15 @@ static int damon_sysfs_context_add_dirs(
err = damon_sysfs_context_set_targets(context);
if (err)
goto put_attrs_out;
+
+ err = damon_sysfs_context_set_schemes(context);
+ if (err)
+ goto put_targets_attrs_out;
return 0;
+put_targets_attrs_out:
+ kobject_put(&context->targets->kobj);
+ context->targets = NULL;
put_attrs_out:
kobject_put(&context->attrs->kobj);
context->attrs = NULL;
@@ -827,6 +1193,8 @@ static void damon_sysfs_context_rm_dirs(
kobject_put(&context->attrs->kobj);
damon_sysfs_targets_rm_dirs(context->targets);
kobject_put(&context->targets->kobj);
+ damon_sysfs_schemes_rm_dirs(context->schemes);
+ kobject_put(&context->schemes->kobj);
}
static ssize_t operations_show(struct kobject *kobj,
@@ -1149,6 +1517,45 @@ static int damon_sysfs_set_targets(struc
return 0;
}
+static struct damos *damon_sysfs_mk_scheme(
+ struct damon_sysfs_scheme *sysfs_scheme)
+{
+ struct damon_sysfs_access_pattern *pattern =
+ sysfs_scheme->access_pattern;
+ struct damos_quota quota = (struct damos_quota){};
+ struct damos_watermarks wmarks = {
+ .metric = DAMOS_WMARK_NONE,
+ .interval = 0,
+ .high = 0,
+ .mid = 0,
+ .low = 0,
+ };
+
+ return damon_new_scheme(pattern->sz->min, pattern->sz->max,
+ pattern->nr_accesses->min, pattern->nr_accesses->max,
+ pattern->age->min, pattern->age->max,
+ sysfs_scheme->action, &quota, &wmarks);
+}
+
+static int damon_sysfs_set_schemes(struct damon_ctx *ctx,
+ struct damon_sysfs_schemes *sysfs_schemes)
+{
+ int i;
+
+ for (i = 0; i < sysfs_schemes->nr; i++) {
+ struct damos *scheme, *next;
+
+ scheme = damon_sysfs_mk_scheme(sysfs_schemes->schemes_arr[i]);
+ if (!scheme) {
+ damon_for_each_scheme_safe(scheme, next, ctx)
+ damon_destroy_scheme(scheme);
+ return -ENOMEM;
+ }
+ damon_add_scheme(ctx, scheme);
+ }
+ return 0;
+}
+
static void damon_sysfs_before_terminate(struct damon_ctx *ctx)
{
struct damon_target *t, *next;
@@ -1182,6 +1589,9 @@ static struct damon_ctx *damon_sysfs_bui
err = damon_sysfs_set_targets(ctx, sys_ctx->targets);
if (err)
goto out;
+ err = damon_sysfs_set_schemes(ctx, sys_ctx->schemes);
+ if (err)
+ goto out;
ctx->callback.before_terminate = damon_sysfs_before_terminate;
return ctx;
_
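The action_store() handler in the patch above accepts an action by name and
maps it to its enum value by scanning a string table whose order must match
the enum. A minimal userspace sketch of that lookup follows. The enum here is
a local stand-in that mirrors the string table from the patch, and streq_nl()
and parse_action() are hypothetical helpers (streq_nl() only approximates
what sysfs_streq() does for a trailing newline); none of this is kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in; the order mirrors the string table in the patch. */
enum damos_action {
	DAMOS_WILLNEED,
	DAMOS_COLD,
	DAMOS_PAGEOUT,
	DAMOS_HUGEPAGE,
	DAMOS_NOHUGEPAGE,
	DAMOS_STAT,
	NR_DAMOS_ACTIONS,
};

static const char *const action_strs[] = {
	"willneed", "cold", "pageout", "hugepage", "nohugepage", "stat",
};

/* Hypothetical helper: string equality that ignores one trailing '\n'. */
static int streq_nl(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b);

	if (la && a[la - 1] == '\n')
		la--;
	if (lb && b[lb - 1] == '\n')
		lb--;
	return la == lb && memcmp(a, b, la) == 0;
}

/* Returns the matching action index, or -1 if buf names no known action. */
static int parse_action(const char *buf)
{
	int action;

	assert(buf != NULL);
	/* The table and the enum must stay in sync, as the patch comment says. */
	assert(sizeof(action_strs) / sizeof(action_strs[0]) ==
	       (size_t)NR_DAMOS_ACTIONS);

	for (action = 0; action < NR_DAMOS_ACTIONS; action++) {
		if (streq_nl(buf, action_strs[action]))
			return action;
	}
	return -1;
}
```

Tolerating the trailing newline matters because "echo pageout > action"
writes "pageout\n" to the file; an exact strcmp() would reject it.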
* [patch 219/227] mm/damon/sysfs: support DAMON-based Operation Schemes
@ 2022-03-22 21:49 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/sysfs: support DAMON-based Operation Schemes
This commit makes the DAMON sysfs interface support the DAMON-based operation
schemes (DAMOS) feature. Specifically, this commit adds a 'schemes'
directory under each context directory, and makes writes to the kdamond
'state' file respect the contents of that directory.
Note that this commit doesn't support all features of DAMOS, only the
target access pattern and the action. Support for quotas,
prioritization, and watermarks will follow.
As a result, the file hierarchy becomes as below:
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes <- NEW DIRECTORY
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
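As an illustration of how the hierarchy above maps to concrete file paths, the
following userspace sketch formats the path of a file under a given scheme
directory. damos_scheme_path() and the DAMON_SYSFS_ROOT macro are hypothetical
names introduced only for this example; the directory names are taken from
the tree above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DAMON_SYSFS_ROOT "/sys/kernel/mm/damon/admin"

/*
 * Hypothetical helper: format the path of one file below a scheme directory,
 * e.g. "action" or "access_pattern/sz/min".  Returns 0 on success and -1 if
 * the result does not fit in buf.
 */
static int damos_scheme_path(char *buf, size_t len, int kdamond, int context,
			     int scheme, const char *file)
{
	int n;

	assert(buf != NULL && file != NULL);
	n = snprintf(buf, len,
		     DAMON_SYSFS_ROOT "/kdamonds/%d/contexts/%d/schemes/%d/%s",
		     kdamond, context, scheme, file);
	/* snprintf() reports the untruncated length; treat truncation as error. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would typically write a count to nr_schemes first, so that the
numbered scheme directories exist, and only then open the paths built here.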
Link: https://lkml.kernel.org/r/20220228081314.5770-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/sysfs.c | 410 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 410 insertions(+)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-support-damon-based-operation-schemes
+++ a/mm/damon/sysfs.c
@@ -114,6 +114,347 @@ static struct kobj_type damon_sysfs_ul_r
};
/*
+ * access_pattern directory
+ */
+
+struct damon_sysfs_access_pattern {
+ struct kobject kobj;
+ struct damon_sysfs_ul_range *sz;
+ struct damon_sysfs_ul_range *nr_accesses;
+ struct damon_sysfs_ul_range *age;
+};
+
+static
+struct damon_sysfs_access_pattern *damon_sysfs_access_pattern_alloc(void)
+{
+ struct damon_sysfs_access_pattern *access_pattern =
+ kmalloc(sizeof(*access_pattern), GFP_KERNEL);
+
+ if (!access_pattern)
+ return NULL;
+ access_pattern->kobj = (struct kobject){};
+ return access_pattern;
+}
+
+static int damon_sysfs_access_pattern_add_range_dir(
+ struct damon_sysfs_access_pattern *access_pattern,
+ struct damon_sysfs_ul_range **range_dir_ptr,
+ char *name)
+{
+ struct damon_sysfs_ul_range *range = damon_sysfs_ul_range_alloc(0, 0);
+ int err;
+
+ if (!range)
+ return -ENOMEM;
+ err = kobject_init_and_add(&range->kobj, &damon_sysfs_ul_range_ktype,
+ &access_pattern->kobj, name);
+ if (err)
+ kobject_put(&range->kobj);
+ else
+ *range_dir_ptr = range;
+ return err;
+}
+
+static int damon_sysfs_access_pattern_add_dirs(
+ struct damon_sysfs_access_pattern *access_pattern)
+{
+ int err;
+
+ err = damon_sysfs_access_pattern_add_range_dir(access_pattern,
+ &access_pattern->sz, "sz");
+ if (err)
+ goto put_sz_out;
+
+ err = damon_sysfs_access_pattern_add_range_dir(access_pattern,
+ &access_pattern->nr_accesses, "nr_accesses");
+ if (err)
+ goto put_nr_accesses_sz_out;
+
+ err = damon_sysfs_access_pattern_add_range_dir(access_pattern,
+ &access_pattern->age, "age");
+ if (err)
+ goto put_age_nr_accesses_sz_out;
+ return 0;
+
+put_age_nr_accesses_sz_out:
+ kobject_put(&access_pattern->age->kobj);
+ access_pattern->age = NULL;
+put_nr_accesses_sz_out:
+ kobject_put(&access_pattern->nr_accesses->kobj);
+ access_pattern->nr_accesses = NULL;
+put_sz_out:
+ kobject_put(&access_pattern->sz->kobj);
+ access_pattern->sz = NULL;
+ return err;
+}
+
+static void damon_sysfs_access_pattern_rm_dirs(
+ struct damon_sysfs_access_pattern *access_pattern)
+{
+ kobject_put(&access_pattern->sz->kobj);
+ kobject_put(&access_pattern->nr_accesses->kobj);
+ kobject_put(&access_pattern->age->kobj);
+}
+
+static void damon_sysfs_access_pattern_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_access_pattern, kobj));
+}
+
+static struct attribute *damon_sysfs_access_pattern_attrs[] = {
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_access_pattern);
+
+static struct kobj_type damon_sysfs_access_pattern_ktype = {
+ .release = damon_sysfs_access_pattern_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_access_pattern_groups,
+};
+
+/*
+ * scheme directory
+ */
+
+struct damon_sysfs_scheme {
+ struct kobject kobj;
+ enum damos_action action;
+ struct damon_sysfs_access_pattern *access_pattern;
+};
+
+/* This should match with enum damos_action */
+static const char * const damon_sysfs_damos_action_strs[] = {
+ "willneed",
+ "cold",
+ "pageout",
+ "hugepage",
+ "nohugepage",
+ "stat",
+};
+
+static struct damon_sysfs_scheme *damon_sysfs_scheme_alloc(
+ enum damos_action action)
+{
+ struct damon_sysfs_scheme *scheme = kmalloc(sizeof(*scheme),
+ GFP_KERNEL);
+
+ if (!scheme)
+ return NULL;
+ scheme->kobj = (struct kobject){};
+ scheme->action = action;
+ return scheme;
+}
+
+static int damon_sysfs_scheme_set_access_pattern(
+ struct damon_sysfs_scheme *scheme)
+{
+ struct damon_sysfs_access_pattern *access_pattern;
+ int err;
+
+ access_pattern = damon_sysfs_access_pattern_alloc();
+ if (!access_pattern)
+ return -ENOMEM;
+ err = kobject_init_and_add(&access_pattern->kobj,
+ &damon_sysfs_access_pattern_ktype, &scheme->kobj,
+ "access_pattern");
+ if (err)
+ goto out;
+ err = damon_sysfs_access_pattern_add_dirs(access_pattern);
+ if (err)
+ goto out;
+ scheme->access_pattern = access_pattern;
+ return 0;
+
+out:
+ kobject_put(&access_pattern->kobj);
+ return err;
+}
+
+static int damon_sysfs_scheme_add_dirs(struct damon_sysfs_scheme *scheme)
+{
+ int err;
+
+ err = damon_sysfs_scheme_set_access_pattern(scheme);
+ if (err)
+ return err;
+ return 0;
+}
+
+static void damon_sysfs_scheme_rm_dirs(struct damon_sysfs_scheme *scheme)
+{
+ damon_sysfs_access_pattern_rm_dirs(scheme->access_pattern);
+ kobject_put(&scheme->access_pattern->kobj);
+}
+
+static ssize_t action_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_scheme *scheme = container_of(kobj,
+ struct damon_sysfs_scheme, kobj);
+
+ return sysfs_emit(buf, "%s\n",
+ damon_sysfs_damos_action_strs[scheme->action]);
+}
+
+static ssize_t action_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct damon_sysfs_scheme *scheme = container_of(kobj,
+ struct damon_sysfs_scheme, kobj);
+ enum damos_action action;
+
+ for (action = 0; action < NR_DAMOS_ACTIONS; action++) {
+ if (sysfs_streq(buf, damon_sysfs_damos_action_strs[action])) {
+ scheme->action = action;
+ return count;
+ }
+ }
+ return -EINVAL;
+}
+
+static void damon_sysfs_scheme_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_scheme, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_scheme_action_attr =
+ __ATTR_RW_MODE(action, 0600);
+
+static struct attribute *damon_sysfs_scheme_attrs[] = {
+ &damon_sysfs_scheme_action_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_scheme);
+
+static struct kobj_type damon_sysfs_scheme_ktype = {
+ .release = damon_sysfs_scheme_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_scheme_groups,
+};
+
+/*
+ * schemes directory
+ */
+
+struct damon_sysfs_schemes {
+ struct kobject kobj;
+ struct damon_sysfs_scheme **schemes_arr;
+ int nr;
+};
+
+static struct damon_sysfs_schemes *damon_sysfs_schemes_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_schemes), GFP_KERNEL);
+}
+
+static void damon_sysfs_schemes_rm_dirs(struct damon_sysfs_schemes *schemes)
+{
+ struct damon_sysfs_scheme **schemes_arr = schemes->schemes_arr;
+ int i;
+
+ for (i = 0; i < schemes->nr; i++) {
+ damon_sysfs_scheme_rm_dirs(schemes_arr[i]);
+ kobject_put(&schemes_arr[i]->kobj);
+ }
+ schemes->nr = 0;
+ kfree(schemes_arr);
+ schemes->schemes_arr = NULL;
+}
+
+static int damon_sysfs_schemes_add_dirs(struct damon_sysfs_schemes *schemes,
+ int nr_schemes)
+{
+ struct damon_sysfs_scheme **schemes_arr, *scheme;
+ int err, i;
+
+ damon_sysfs_schemes_rm_dirs(schemes);
+ if (!nr_schemes)
+ return 0;
+
+ schemes_arr = kmalloc_array(nr_schemes, sizeof(*schemes_arr),
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!schemes_arr)
+ return -ENOMEM;
+ schemes->schemes_arr = schemes_arr;
+
+ for (i = 0; i < nr_schemes; i++) {
+ scheme = damon_sysfs_scheme_alloc(DAMOS_STAT);
+ if (!scheme) {
+ damon_sysfs_schemes_rm_dirs(schemes);
+ return -ENOMEM;
+ }
+
+ err = kobject_init_and_add(&scheme->kobj,
+ &damon_sysfs_scheme_ktype, &schemes->kobj,
+ "%d", i);
+ if (err)
+ goto out;
+ err = damon_sysfs_scheme_add_dirs(scheme);
+ if (err)
+ goto out;
+
+ schemes_arr[i] = scheme;
+ schemes->nr++;
+ }
+ return 0;
+
+out:
+ damon_sysfs_schemes_rm_dirs(schemes);
+ kobject_put(&scheme->kobj);
+ return err;
+}
+
+static ssize_t nr_schemes_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_schemes *schemes = container_of(kobj,
+ struct damon_sysfs_schemes, kobj);
+
+ return sysfs_emit(buf, "%d\n", schemes->nr);
+}
+
+static ssize_t nr_schemes_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_schemes *schemes = container_of(kobj,
+ struct damon_sysfs_schemes, kobj);
+ int nr, err = kstrtoint(buf, 0, &nr);
+
+ if (err)
+ return err;
+ if (nr < 0)
+ return -EINVAL;
+
+ if (!mutex_trylock(&damon_sysfs_lock))
+ return -EBUSY;
+ err = damon_sysfs_schemes_add_dirs(schemes, nr);
+ mutex_unlock(&damon_sysfs_lock);
+ if (err)
+ return err;
+ return count;
+}
+
+static void damon_sysfs_schemes_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_schemes, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_schemes_nr_attr =
+ __ATTR_RW_MODE(nr_schemes, 0600);
+
+static struct attribute *damon_sysfs_schemes_attrs[] = {
+ &damon_sysfs_schemes_nr_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_schemes);
+
+static struct kobj_type damon_sysfs_schemes_ktype = {
+ .release = damon_sysfs_schemes_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_schemes_groups,
+};
+
+/*
* init region directory
*/
@@ -748,6 +1089,7 @@ struct damon_sysfs_context {
enum damon_ops_id ops_id;
struct damon_sysfs_attrs *attrs;
struct damon_sysfs_targets *targets;
+ struct damon_sysfs_schemes *schemes;
};
static struct damon_sysfs_context *damon_sysfs_context_alloc(
@@ -802,6 +1144,23 @@ static int damon_sysfs_context_set_targe
return 0;
}
+static int damon_sysfs_context_set_schemes(struct damon_sysfs_context *context)
+{
+ struct damon_sysfs_schemes *schemes = damon_sysfs_schemes_alloc();
+ int err;
+
+ if (!schemes)
+ return -ENOMEM;
+ err = kobject_init_and_add(&schemes->kobj, &damon_sysfs_schemes_ktype,
+ &context->kobj, "schemes");
+ if (err) {
+ kobject_put(&schemes->kobj);
+ return err;
+ }
+ context->schemes = schemes;
+ return 0;
+}
+
static int damon_sysfs_context_add_dirs(struct damon_sysfs_context *context)
{
int err;
@@ -813,8 +1172,15 @@ static int damon_sysfs_context_add_dirs(
err = damon_sysfs_context_set_targets(context);
if (err)
goto put_attrs_out;
+
+ err = damon_sysfs_context_set_schemes(context);
+ if (err)
+ goto put_targets_attrs_out;
return 0;
+put_targets_attrs_out:
+ kobject_put(&context->targets->kobj);
+ context->targets = NULL;
put_attrs_out:
kobject_put(&context->attrs->kobj);
context->attrs = NULL;
@@ -827,6 +1193,8 @@ static void damon_sysfs_context_rm_dirs(
kobject_put(&context->attrs->kobj);
damon_sysfs_targets_rm_dirs(context->targets);
kobject_put(&context->targets->kobj);
+ damon_sysfs_schemes_rm_dirs(context->schemes);
+ kobject_put(&context->schemes->kobj);
}
static ssize_t operations_show(struct kobject *kobj,
@@ -1149,6 +1517,45 @@ static int damon_sysfs_set_targets(struc
return 0;
}
+static struct damos *damon_sysfs_mk_scheme(
+ struct damon_sysfs_scheme *sysfs_scheme)
+{
+ struct damon_sysfs_access_pattern *pattern =
+ sysfs_scheme->access_pattern;
+ struct damos_quota quota = (struct damos_quota){};
+ struct damos_watermarks wmarks = {
+ .metric = DAMOS_WMARK_NONE,
+ .interval = 0,
+ .high = 0,
+ .mid = 0,
+ .low = 0,
+ };
+
+ return damon_new_scheme(pattern->sz->min, pattern->sz->max,
+ pattern->nr_accesses->min, pattern->nr_accesses->max,
+ pattern->age->min, pattern->age->max,
+ sysfs_scheme->action, &quota, &wmarks);
+}
+
+static int damon_sysfs_set_schemes(struct damon_ctx *ctx,
+ struct damon_sysfs_schemes *sysfs_schemes)
+{
+ int i;
+
+ for (i = 0; i < sysfs_schemes->nr; i++) {
+ struct damos *scheme, *next;
+
+ scheme = damon_sysfs_mk_scheme(sysfs_schemes->schemes_arr[i]);
+ if (!scheme) {
+ damon_for_each_scheme_safe(scheme, next, ctx)
+ damon_destroy_scheme(scheme);
+ return -ENOMEM;
+ }
+ damon_add_scheme(ctx, scheme);
+ }
+ return 0;
+}
+
static void damon_sysfs_before_terminate(struct damon_ctx *ctx)
{
struct damon_target *t, *next;
@@ -1182,6 +1589,9 @@ static struct damon_ctx *damon_sysfs_bui
err = damon_sysfs_set_targets(ctx, sys_ctx->targets);
if (err)
goto out;
+ err = damon_sysfs_set_schemes(ctx, sys_ctx->schemes);
+ if (err)
+ goto out;
ctx->callback.before_terminate = damon_sysfs_before_terminate;
return ctx;
_
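Writing to nr_schemes in the patch above discards the existing scheme
directories and builds the requested number of new ones, each defaulting to
the "stat" action, and tears everything down again if an allocation fails
part way through. The sketch below shows the same replace-with-rollback shape
in plain userspace C. The struct and function names are hypothetical, malloc()
stands in for the kobject machinery, and ACTION_STAT is a local stand-in for
DAMOS_STAT.

```c
#include <assert.h>
#include <stdlib.h>

enum { ACTION_STAT = 5 };	/* local stand-in for DAMOS_STAT */

struct scheme {
	int action;
};

struct schemes {
	struct scheme **arr;
	int nr;
};

/* Free every stored scheme and the array itself; safe when already empty. */
static void schemes_clear(struct schemes *s)
{
	int i;

	for (i = 0; i < s->nr; i++)
		free(s->arr[i]);
	free(s->arr);
	s->arr = NULL;
	s->nr = 0;
}

/* Replace the contents with nr default schemes; 0 on success, -1 on error. */
static int schemes_resize(struct schemes *s, int nr)
{
	int i;

	assert(s != NULL);
	if (nr < 0)
		return -1;	/* rejected before touching existing state */

	schemes_clear(s);
	if (nr == 0)
		return 0;

	s->arr = calloc((size_t)nr, sizeof(*s->arr));
	if (!s->arr)
		return -1;

	for (i = 0; i < nr; i++) {
		struct scheme *sc = malloc(sizeof(*sc));

		if (!sc) {
			/* Roll back: frees only the s->nr entries stored so far. */
			schemes_clear(s);
			return -1;
		}
		sc->action = ACTION_STAT;
		s->arr[i] = sc;
		s->nr++;	/* count grows only after the slot is valid */
	}
	return 0;
}
```

Incrementing the count only after a slot holds a valid pointer is what lets
the cleanup routine run safely at any point of a partially built array.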
* [patch 220/227] mm/damon/sysfs: support DAMOS quotas
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/sysfs: support DAMOS quotas
This commit makes the DAMON sysfs interface support the DAMOS quotas feature.
Specifically, this commit adds a 'quotas' directory under each scheme
directory and makes writes to the kdamond 'state' file respect the contents
of that directory.
As a result, the file hierarchy becomes as below:
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms <- NEW DIRECTORY
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
Link: https://lkml.kernel.org/r/20220228081314.5770-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/sysfs.c | 146 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 145 insertions(+), 1 deletion(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-support-damos-quotas
+++ a/mm/damon/sysfs.c
@@ -114,6 +114,113 @@ static struct kobj_type damon_sysfs_ul_r
};
/*
+ * quotas directory
+ */
+
+struct damon_sysfs_quotas {
+ struct kobject kobj;
+ unsigned long ms;
+ unsigned long sz;
+ unsigned long reset_interval_ms;
+};
+
+static struct damon_sysfs_quotas *damon_sysfs_quotas_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_quotas), GFP_KERNEL);
+}
+
+static ssize_t ms_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_quotas *quotas = container_of(kobj,
+ struct damon_sysfs_quotas, kobj);
+
+ return sysfs_emit(buf, "%lu\n", quotas->ms);
+}
+
+static ssize_t ms_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct damon_sysfs_quotas *quotas = container_of(kobj,
+ struct damon_sysfs_quotas, kobj);
+ int err = kstrtoul(buf, 0, &quotas->ms);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static ssize_t bytes_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_quotas *quotas = container_of(kobj,
+ struct damon_sysfs_quotas, kobj);
+
+ return sysfs_emit(buf, "%lu\n", quotas->sz);
+}
+
+static ssize_t bytes_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_quotas *quotas = container_of(kobj,
+ struct damon_sysfs_quotas, kobj);
+ int err = kstrtoul(buf, 0, &quotas->sz);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static ssize_t reset_interval_ms_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_quotas *quotas = container_of(kobj,
+ struct damon_sysfs_quotas, kobj);
+
+ return sysfs_emit(buf, "%lu\n", quotas->reset_interval_ms);
+}
+
+static ssize_t reset_interval_ms_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_quotas *quotas = container_of(kobj,
+ struct damon_sysfs_quotas, kobj);
+ int err = kstrtoul(buf, 0, &quotas->reset_interval_ms);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static void damon_sysfs_quotas_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_quotas, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_quotas_ms_attr =
+ __ATTR_RW_MODE(ms, 0600);
+
+static struct kobj_attribute damon_sysfs_quotas_sz_attr =
+ __ATTR_RW_MODE(bytes, 0600);
+
+static struct kobj_attribute damon_sysfs_quotas_reset_interval_ms_attr =
+ __ATTR_RW_MODE(reset_interval_ms, 0600);
+
+static struct attribute *damon_sysfs_quotas_attrs[] = {
+ &damon_sysfs_quotas_ms_attr.attr,
+ &damon_sysfs_quotas_sz_attr.attr,
+ &damon_sysfs_quotas_reset_interval_ms_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_quotas);
+
+static struct kobj_type damon_sysfs_quotas_ktype = {
+ .release = damon_sysfs_quotas_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_quotas_groups,
+};
+
+/*
* access_pattern directory
*/
@@ -220,6 +327,7 @@ struct damon_sysfs_scheme {
struct kobject kobj;
enum damos_action action;
struct damon_sysfs_access_pattern *access_pattern;
+ struct damon_sysfs_quotas *quotas;
};
/* This should match with enum damos_action */
@@ -270,6 +378,25 @@ out:
return err;
}
+static int damon_sysfs_scheme_set_quotas(struct damon_sysfs_scheme *scheme)
+{
+ struct damon_sysfs_quotas *quotas = damon_sysfs_quotas_alloc();
+ int err;
+
+ if (!quotas)
+ return -ENOMEM;
+ err = kobject_init_and_add(&quotas->kobj, &damon_sysfs_quotas_ktype,
+ &scheme->kobj, "quotas");
+ if (err)
+ goto out;
+ scheme->quotas = quotas;
+ return 0;
+
+out:
+ kobject_put(&quotas->kobj);
+ return err;
+}
+
static int damon_sysfs_scheme_add_dirs(struct damon_sysfs_scheme *scheme)
{
int err;
@@ -277,13 +404,22 @@ static int damon_sysfs_scheme_add_dirs(s
err = damon_sysfs_scheme_set_access_pattern(scheme);
if (err)
return err;
+ err = damon_sysfs_scheme_set_quotas(scheme);
+ if (err)
+ goto put_access_pattern_out;
return 0;
+
+put_access_pattern_out:
+ kobject_put(&scheme->access_pattern->kobj);
+ scheme->access_pattern = NULL;
+ return err;
}
static void damon_sysfs_scheme_rm_dirs(struct damon_sysfs_scheme *scheme)
{
damon_sysfs_access_pattern_rm_dirs(scheme->access_pattern);
kobject_put(&scheme->access_pattern->kobj);
+ kobject_put(&scheme->quotas->kobj);
}
static ssize_t action_show(struct kobject *kobj, struct kobj_attribute *attr,
@@ -1522,7 +1658,15 @@ static struct damos *damon_sysfs_mk_sche
{
struct damon_sysfs_access_pattern *pattern =
sysfs_scheme->access_pattern;
- struct damos_quota quota = (struct damos_quota){};
+ struct damon_sysfs_quotas *sysfs_quotas = sysfs_scheme->quotas;
+ struct damos_quota quota = {
+ .ms = sysfs_quotas->ms,
+ .sz = sysfs_quotas->sz,
+ .reset_interval = sysfs_quotas->reset_interval_ms,
+ .weight_sz = 1000,
+ .weight_nr_accesses = 1000,
+ .weight_age = 1000,
+ };
struct damos_watermarks wmarks = {
.metric = DAMOS_WMARK_NONE,
.interval = 0,
_
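The three quota store handlers in the patch above parse an unsigned long from
the written text with kstrtoul(buf, 0, ...) and report any failure as -EINVAL.
A rough userspace analog built on strtoul() is sketched below. parse_ul() is a
hypothetical helper and is not an exact reimplementation of kstrtoul(); for
instance, it rejects a leading '+', which the kernel function is documented
to accept.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical analog of kstrtoul(buf, 0, out): the base is auto-detected
 * (decimal, 0x hex, leading-0 octal), one trailing newline is tolerated, and
 * anything else after the digits is an error.  Returns 0 or -EINVAL; *out is
 * written only on success.
 */
static int parse_ul(const char *buf, unsigned long *out)
{
	char *end;
	unsigned long val;

	assert(buf != NULL && out != NULL);
	/* strtoul() would skip blanks and accept a sign; reject both here. */
	if (!isdigit((unsigned char)buf[0]))
		return -EINVAL;

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno == ERANGE || end == buf)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*out = val;
	return 0;
}
```

Leaving the destination untouched on failure means a rejected write keeps the
previously stored quota value rather than a partially parsed one.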
* [patch 220/227] mm/damon/sysfs: support DAMOS quotas
@ 2022-03-22 21:49 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 6936 bytes --]
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/sysfs: support DAMOS quotas
This commit makes DAMON sysfs interface supports the DAMOS quotas feature.
Specifically, this commit adds 'quotas' directory under each scheme
directory and makes kdamond 'state' file writing respects the contents in
the directory.
As a result, the files hierarchy becomes as below:
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms <- NEW DIRECTORY
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
Link: https://lkml.kernel.org/r/20220228081314.5770-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/sysfs.c | 146 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 145 insertions(+), 1 deletion(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-support-damos-quotas
+++ a/mm/damon/sysfs.c
@@ -114,6 +114,113 @@ static struct kobj_type damon_sysfs_ul_r
};
/*
+ * quotas directory
+ */
+
+struct damon_sysfs_quotas {
+ struct kobject kobj;
+ unsigned long ms;
+ unsigned long sz;
+ unsigned long reset_interval_ms;
+};
+
+static struct damon_sysfs_quotas *damon_sysfs_quotas_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_quotas), GFP_KERNEL);
+}
+
+static ssize_t ms_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_quotas *quotas = container_of(kobj,
+ struct damon_sysfs_quotas, kobj);
+
+ return sysfs_emit(buf, "%lu\n", quotas->ms);
+}
+
+static ssize_t ms_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct damon_sysfs_quotas *quotas = container_of(kobj,
+ struct damon_sysfs_quotas, kobj);
+ int err = kstrtoul(buf, 0, "as->ms);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static ssize_t bytes_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_quotas *quotas = container_of(kobj,
+ struct damon_sysfs_quotas, kobj);
+
+ return sysfs_emit(buf, "%lu\n", quotas->sz);
+}
+
+static ssize_t bytes_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_quotas *quotas = container_of(kobj,
+ struct damon_sysfs_quotas, kobj);
+ int err = kstrtoul(buf, 0, "as->sz);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static ssize_t reset_interval_ms_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_quotas *quotas = container_of(kobj,
+ struct damon_sysfs_quotas, kobj);
+
+ return sysfs_emit(buf, "%lu\n", quotas->reset_interval_ms);
+}
+
+static ssize_t reset_interval_ms_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_quotas *quotas = container_of(kobj,
+ struct damon_sysfs_quotas, kobj);
+ int err = kstrtoul(buf, 0, "as->reset_interval_ms);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static void damon_sysfs_quotas_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_quotas, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_quotas_ms_attr =
+ __ATTR_RW_MODE(ms, 0600);
+
+static struct kobj_attribute damon_sysfs_quotas_sz_attr =
+ __ATTR_RW_MODE(bytes, 0600);
+
+static struct kobj_attribute damon_sysfs_quotas_reset_interval_ms_attr =
+ __ATTR_RW_MODE(reset_interval_ms, 0600);
+
+static struct attribute *damon_sysfs_quotas_attrs[] = {
+ &damon_sysfs_quotas_ms_attr.attr,
+ &damon_sysfs_quotas_sz_attr.attr,
+ &damon_sysfs_quotas_reset_interval_ms_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_quotas);
+
+static struct kobj_type damon_sysfs_quotas_ktype = {
+ .release = damon_sysfs_quotas_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_quotas_groups,
+};
+
+/*
* access_pattern directory
*/
@@ -220,6 +327,7 @@ struct damon_sysfs_scheme {
struct kobject kobj;
enum damos_action action;
struct damon_sysfs_access_pattern *access_pattern;
+ struct damon_sysfs_quotas *quotas;
};
/* This should match with enum damos_action */
@@ -270,6 +378,25 @@ out:
return err;
}
+static int damon_sysfs_scheme_set_quotas(struct damon_sysfs_scheme *scheme)
+{
+ struct damon_sysfs_quotas *quotas = damon_sysfs_quotas_alloc();
+ int err;
+
+ if (!quotas)
+ return -ENOMEM;
+ err = kobject_init_and_add("as->kobj, &damon_sysfs_quotas_ktype,
+ &scheme->kobj, "quotas");
+ if (err)
+ goto out;
+ scheme->quotas = quotas;
+ return 0;
+
+out:
+ kobject_put("as->kobj);
+ return err;
+}
+
static int damon_sysfs_scheme_add_dirs(struct damon_sysfs_scheme *scheme)
{
int err;
@@ -277,13 +404,22 @@ static int damon_sysfs_scheme_add_dirs(s
err = damon_sysfs_scheme_set_access_pattern(scheme);
if (err)
return err;
+ err = damon_sysfs_scheme_set_quotas(scheme);
+ if (err)
+ goto put_access_pattern_out;
return 0;
+
+put_access_pattern_out:
+ kobject_put(&scheme->access_pattern->kobj);
+ scheme->access_pattern = NULL;
+ return err;
}
static void damon_sysfs_scheme_rm_dirs(struct damon_sysfs_scheme *scheme)
{
damon_sysfs_access_pattern_rm_dirs(scheme->access_pattern);
kobject_put(&scheme->access_pattern->kobj);
+ kobject_put(&scheme->quotas->kobj);
}
static ssize_t action_show(struct kobject *kobj, struct kobj_attribute *attr,
@@ -1522,7 +1658,15 @@ static struct damos *damon_sysfs_mk_sche
{
struct damon_sysfs_access_pattern *pattern =
sysfs_scheme->access_pattern;
- struct damos_quota quota = (struct damos_quota){};
+ struct damon_sysfs_quotas *sysfs_quotas = sysfs_scheme->quotas;
+ struct damos_quota quota = {
+ .ms = sysfs_quotas->ms,
+ .sz = sysfs_quotas->sz,
+ .reset_interval = sysfs_quotas->reset_interval_ms,
+ .weight_sz = 1000,
+ .weight_nr_accesses = 1000,
+ .weight_age = 1000,
+ };
struct damos_watermarks wmarks = {
.metric = DAMOS_WMARK_NONE,
.interval = 0,
_
* [patch 221/227] mm/damon/sysfs: support schemes prioritization
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/sysfs: support schemes prioritization
This commit makes the DAMON sysfs interface support the DAMOS regions
prioritization weights feature under quota limitation. Specifically,
this commit adds a 'weights' directory under the 'quotas' directory of each
scheme and makes writes to the kdamond 'state' file respect the contents of
that directory.
/sys/kernel/mm/damon/admin
│ kdamonds/nr
│ │ 0/state,pid
│ │ │ contexts/nr
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr
│ │ │ │ │ │ 0/pid
│ │ │ │ │ │ │ regions/nr
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms
│ │ │ │ │ │ │ │ weights/ <- NEW DIRECTORY
│ │ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
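For illustration only, the three permil files can be read as coefficients of
a weighted average over per-region subscores. The sketch below is plain
userspace C written under that assumption: struct weights_permil,
weighted_priority() and the 0..100 subscore range are hypothetical and are
not the kernel's scoring code. Note that this patch creates the files with a
value of 0, so the all-zero case has to be handled.

```c
/* Hypothetical mirror of the three sysfs weight files (values in permil). */
struct weights_permil {
	unsigned int sz;
	unsigned int nr_accesses;
	unsigned int age;
};

/*
 * Illustrative only: combine three per-region subscores (assumed 0..100)
 * into one priority value as a weighted average. This is NOT the kernel's
 * scoring function, just a sketch of what "weights" could mean.
 */
static unsigned int weighted_priority(const struct weights_permil *w,
		unsigned int sz_score, unsigned int freq_score,
		unsigned int age_score)
{
	unsigned long long total =
		(unsigned long long)w->sz + w->nr_accesses + w->age;
	unsigned long long sum;

	/* All weights zero: no property is preferred, avoid dividing by 0. */
	if (!total)
		return 0;

	sum = (unsigned long long)w->sz * sz_score +
		(unsigned long long)w->nr_accesses * freq_score +
		(unsigned long long)w->age * age_score;
	return (unsigned int)(sum / total);
}
```

With sz_permil=0, nr_accesses_permil=500 and age_permil=500, region size is
ignored and access frequency and age contribute equally.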
Link: https://lkml.kernel.org/r/20220228081314.5770-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/sysfs.c | 152 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 149 insertions(+), 3 deletions(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-support-schemes-prioritization
+++ a/mm/damon/sysfs.c
@@ -114,11 +114,129 @@ static struct kobj_type damon_sysfs_ul_r
};
/*
+ * scheme/weights directory
+ */
+
+struct damon_sysfs_weights {
+ struct kobject kobj;
+ unsigned int sz;
+ unsigned int nr_accesses;
+ unsigned int age;
+};
+
+static struct damon_sysfs_weights *damon_sysfs_weights_alloc(unsigned int sz,
+ unsigned int nr_accesses, unsigned int age)
+{
+ struct damon_sysfs_weights *weights = kmalloc(sizeof(*weights),
+ GFP_KERNEL);
+
+ if (!weights)
+ return NULL;
+ weights->kobj = (struct kobject){};
+ weights->sz = sz;
+ weights->nr_accesses = nr_accesses;
+ weights->age = age;
+ return weights;
+}
+
+static ssize_t sz_permil_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_weights *weights = container_of(kobj,
+ struct damon_sysfs_weights, kobj);
+
+ return sysfs_emit(buf, "%u\n", weights->sz);
+}
+
+static ssize_t sz_permil_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_weights *weights = container_of(kobj,
+ struct damon_sysfs_weights, kobj);
+ int err = kstrtouint(buf, 0, &weights->sz);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static ssize_t nr_accesses_permil_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_weights *weights = container_of(kobj,
+ struct damon_sysfs_weights, kobj);
+
+ return sysfs_emit(buf, "%u\n", weights->nr_accesses);
+}
+
+static ssize_t nr_accesses_permil_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_weights *weights = container_of(kobj,
+ struct damon_sysfs_weights, kobj);
+ int err = kstrtouint(buf, 0, &weights->nr_accesses);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static ssize_t age_permil_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_weights *weights = container_of(kobj,
+ struct damon_sysfs_weights, kobj);
+
+ return sysfs_emit(buf, "%u\n", weights->age);
+}
+
+static ssize_t age_permil_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_weights *weights = container_of(kobj,
+ struct damon_sysfs_weights, kobj);
+ int err = kstrtouint(buf, 0, &weights->age);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static void damon_sysfs_weights_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_weights, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_weights_sz_attr =
+ __ATTR_RW_MODE(sz_permil, 0600);
+
+static struct kobj_attribute damon_sysfs_weights_nr_accesses_attr =
+ __ATTR_RW_MODE(nr_accesses_permil, 0600);
+
+static struct kobj_attribute damon_sysfs_weights_age_attr =
+ __ATTR_RW_MODE(age_permil, 0600);
+
+static struct attribute *damon_sysfs_weights_attrs[] = {
+ &damon_sysfs_weights_sz_attr.attr,
+ &damon_sysfs_weights_nr_accesses_attr.attr,
+ &damon_sysfs_weights_age_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_weights);
+
+static struct kobj_type damon_sysfs_weights_ktype = {
+ .release = damon_sysfs_weights_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_weights_groups,
+};
+
+/*
* quotas directory
*/
struct damon_sysfs_quotas {
struct kobject kobj;
+ struct damon_sysfs_weights *weights;
unsigned long ms;
unsigned long sz;
unsigned long reset_interval_ms;
@@ -129,6 +247,29 @@ static struct damon_sysfs_quotas *damon_
return kzalloc(sizeof(struct damon_sysfs_quotas), GFP_KERNEL);
}
+static int damon_sysfs_quotas_add_dirs(struct damon_sysfs_quotas *quotas)
+{
+ struct damon_sysfs_weights *weights;
+ int err;
+
+ weights = damon_sysfs_weights_alloc(0, 0, 0);
+ if (!weights)
+ return -ENOMEM;
+
+ err = kobject_init_and_add(&weights->kobj, &damon_sysfs_weights_ktype,
+ &quotas->kobj, "weights");
+ if (err)
+ kobject_put(&weights->kobj);
+ else
+ quotas->weights = weights;
+ return err;
+}
+
+static void damon_sysfs_quotas_rm_dirs(struct damon_sysfs_quotas *quotas)
+{
+ kobject_put(&quotas->weights->kobj);
+}
+
static ssize_t ms_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
@@ -389,6 +530,9 @@ static int damon_sysfs_scheme_set_quotas
&scheme->kobj, "quotas");
if (err)
goto out;
+ err = damon_sysfs_quotas_add_dirs(quotas);
+ if (err)
+ goto out;
scheme->quotas = quotas;
return 0;
@@ -419,6 +563,7 @@ static void damon_sysfs_scheme_rm_dirs(s
{
damon_sysfs_access_pattern_rm_dirs(scheme->access_pattern);
kobject_put(&scheme->access_pattern->kobj);
+ damon_sysfs_quotas_rm_dirs(scheme->quotas);
kobject_put(&scheme->quotas->kobj);
}
@@ -1659,13 +1804,14 @@ static struct damos *damon_sysfs_mk_sche
struct damon_sysfs_access_pattern *pattern =
sysfs_scheme->access_pattern;
struct damon_sysfs_quotas *sysfs_quotas = sysfs_scheme->quotas;
+ struct damon_sysfs_weights *sysfs_weights = sysfs_quotas->weights;
struct damos_quota quota = {
.ms = sysfs_quotas->ms,
.sz = sysfs_quotas->sz,
.reset_interval = sysfs_quotas->reset_interval_ms,
- .weight_sz = 1000,
- .weight_nr_accesses = 1000,
- .weight_age = 1000,
+ .weight_sz = sysfs_weights->sz,
+ .weight_nr_accesses = sysfs_weights->nr_accesses,
+ .weight_age = sysfs_weights->age,
};
struct damos_watermarks wmarks = {
.metric = DAMOS_WMARK_NONE,
_
* [patch 222/227] mm/damon/sysfs: support DAMOS watermarks
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, colin.i.king, sj, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/sysfs: support DAMOS watermarks
This commit makes the DAMON sysfs interface support the DAMOS watermarks
feature. Specifically, this commit adds a 'watermarks' directory under each
scheme directory and makes writes to the kdamond 'state' file respect the
contents of that directory.
As a result, the file hierarchy becomes as below:
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms
│ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ │ watermarks/ <- NEW DIRECTORY
│ │ │ │ │ │ │ │ metric,interval_us,high,mid,low
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
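As a rough sketch of how the five files are meant to interact, the userspace
C below models the activation rule as I understand it from the DAMOS
watermarks description: above 'high' or below 'low' the scheme is
deactivated, below 'mid' (but not below 'low') it is activated, and between
'mid' and 'high' the previous state is kept. struct watermarks,
enum wmark_metric and wmarks_update() are hypothetical names for this
example, not kernel symbols.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the 'metric' file values. */
enum wmark_metric { WMARK_NONE, WMARK_FREE_MEM_RATE };

/* Hypothetical mirror of the watermarks directory plus the current state. */
struct watermarks {
	enum wmark_metric metric;
	unsigned long interval_us;	/* how often the metric is re-checked */
	unsigned long high;
	unsigned long mid;
	unsigned long low;
	bool activated;
};

/*
 * Update and return the activation state for one observed metric value.
 * Assumed rule (a sketch, not the kernel implementation):
 *   - metric "none": always active
 *   - value > high or value < low: deactivate
 *   - low <= value < mid: activate
 *   - mid <= value <= high: keep the previous state
 */
static bool wmarks_update(struct watermarks *w, unsigned long value)
{
	if (w->metric == WMARK_NONE) {
		w->activated = true;
		return true;
	}
	if (value > w->high || value < w->low)
		w->activated = false;
	else if (value < w->mid)
		w->activated = true;
	return w->activated;
}
```

The gap between 'mid' and 'high' acts as hysteresis, so a metric hovering
around a single threshold does not toggle the scheme on every check.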
[sj@kernel.org: fix out-of-bound array access for wmark_metric_strs[]]
Link: https://lkml.kernel.org/r/20220301185619.2904-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20220228081314.5770-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/sysfs.c | 220 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 215 insertions(+), 5 deletions(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-support-damos-watermarks
+++ a/mm/damon/sysfs.c
@@ -114,6 +114,189 @@ static struct kobj_type damon_sysfs_ul_r
};
/*
+ * watermarks directory
+ */
+
+struct damon_sysfs_watermarks {
+ struct kobject kobj;
+ enum damos_wmark_metric metric;
+ unsigned long interval_us;
+ unsigned long high;
+ unsigned long mid;
+ unsigned long low;
+};
+
+static struct damon_sysfs_watermarks *damon_sysfs_watermarks_alloc(
+ enum damos_wmark_metric metric, unsigned long interval_us,
+ unsigned long high, unsigned long mid, unsigned long low)
+{
+ struct damon_sysfs_watermarks *watermarks = kmalloc(
+ sizeof(*watermarks), GFP_KERNEL);
+
+ if (!watermarks)
+ return NULL;
+ watermarks->kobj = (struct kobject){};
+ watermarks->metric = metric;
+ watermarks->interval_us = interval_us;
+ watermarks->high = high;
+ watermarks->mid = mid;
+ watermarks->low = low;
+ return watermarks;
+}
+
+/* Should match with enum damos_wmark_metric */
+static const char * const damon_sysfs_wmark_metric_strs[] = {
+ "none",
+ "free_mem_rate",
+};
+
+static ssize_t metric_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+
+ return sysfs_emit(buf, "%s\n",
+ damon_sysfs_wmark_metric_strs[watermarks->metric]);
+}
+
+static ssize_t metric_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+ enum damos_wmark_metric metric;
+
+ for (metric = 0; metric < NR_DAMOS_WMARK_METRICS; metric++) {
+ if (sysfs_streq(buf, damon_sysfs_wmark_metric_strs[metric])) {
+ watermarks->metric = metric;
+ return count;
+ }
+ }
+ return -EINVAL;
+}
+
+static ssize_t interval_us_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+
+ return sysfs_emit(buf, "%lu\n", watermarks->interval_us);
+}
+
+static ssize_t interval_us_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+ int err = kstrtoul(buf, 0, &watermarks->interval_us);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static ssize_t high_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+
+ return sysfs_emit(buf, "%lu\n", watermarks->high);
+}
+
+static ssize_t high_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+ int err = kstrtoul(buf, 0, &watermarks->high);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static ssize_t mid_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+
+ return sysfs_emit(buf, "%lu\n", watermarks->mid);
+}
+
+static ssize_t mid_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+ int err = kstrtoul(buf, 0, &watermarks->mid);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static ssize_t low_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+
+ return sysfs_emit(buf, "%lu\n", watermarks->low);
+}
+
+static ssize_t low_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+ int err = kstrtoul(buf, 0, &watermarks->low);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static void damon_sysfs_watermarks_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_watermarks, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_watermarks_metric_attr =
+ __ATTR_RW_MODE(metric, 0600);
+
+static struct kobj_attribute damon_sysfs_watermarks_interval_us_attr =
+ __ATTR_RW_MODE(interval_us, 0600);
+
+static struct kobj_attribute damon_sysfs_watermarks_high_attr =
+ __ATTR_RW_MODE(high, 0600);
+
+static struct kobj_attribute damon_sysfs_watermarks_mid_attr =
+ __ATTR_RW_MODE(mid, 0600);
+
+static struct kobj_attribute damon_sysfs_watermarks_low_attr =
+ __ATTR_RW_MODE(low, 0600);
+
+static struct attribute *damon_sysfs_watermarks_attrs[] = {
+ &damon_sysfs_watermarks_metric_attr.attr,
+ &damon_sysfs_watermarks_interval_us_attr.attr,
+ &damon_sysfs_watermarks_high_attr.attr,
+ &damon_sysfs_watermarks_mid_attr.attr,
+ &damon_sysfs_watermarks_low_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_watermarks);
+
+static struct kobj_type damon_sysfs_watermarks_ktype = {
+ .release = damon_sysfs_watermarks_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_watermarks_groups,
+};
+
+/*
* scheme/weights directory
*/
@@ -469,6 +652,7 @@ struct damon_sysfs_scheme {
enum damos_action action;
struct damon_sysfs_access_pattern *access_pattern;
struct damon_sysfs_quotas *quotas;
+ struct damon_sysfs_watermarks *watermarks;
};
/* This should match with enum damos_action */
@@ -541,6 +725,24 @@ out:
return err;
}
+static int damon_sysfs_scheme_set_watermarks(struct damon_sysfs_scheme *scheme)
+{
+ struct damon_sysfs_watermarks *watermarks =
+ damon_sysfs_watermarks_alloc(DAMOS_WMARK_NONE, 0, 0, 0, 0);
+ int err;
+
+ if (!watermarks)
+ return -ENOMEM;
+ err = kobject_init_and_add(&watermarks->kobj,
+ &damon_sysfs_watermarks_ktype, &scheme->kobj,
+ "watermarks");
+ if (err)
+ kobject_put(&watermarks->kobj);
+ else
+ scheme->watermarks = watermarks;
+ return err;
+}
+
static int damon_sysfs_scheme_add_dirs(struct damon_sysfs_scheme *scheme)
{
int err;
@@ -551,8 +753,14 @@ static int damon_sysfs_scheme_add_dirs(s
err = damon_sysfs_scheme_set_quotas(scheme);
if (err)
goto put_access_pattern_out;
+ err = damon_sysfs_scheme_set_watermarks(scheme);
+ if (err)
+ goto put_quotas_access_pattern_out;
return 0;
+put_quotas_access_pattern_out:
+ kobject_put(&scheme->quotas->kobj);
+ scheme->quotas = NULL;
put_access_pattern_out:
kobject_put(&scheme->access_pattern->kobj);
scheme->access_pattern = NULL;
@@ -565,6 +773,7 @@ static void damon_sysfs_scheme_rm_dirs(s
kobject_put(&scheme->access_pattern->kobj);
damon_sysfs_quotas_rm_dirs(scheme->quotas);
kobject_put(&scheme->quotas->kobj);
+ kobject_put(&scheme->watermarks->kobj);
}
static ssize_t action_show(struct kobject *kobj, struct kobj_attribute *attr,
@@ -1805,6 +2014,7 @@ static struct damos *damon_sysfs_mk_sche
sysfs_scheme->access_pattern;
struct damon_sysfs_quotas *sysfs_quotas = sysfs_scheme->quotas;
struct damon_sysfs_weights *sysfs_weights = sysfs_quotas->weights;
+ struct damon_sysfs_watermarks *sysfs_wmarks = sysfs_scheme->watermarks;
struct damos_quota quota = {
.ms = sysfs_quotas->ms,
.sz = sysfs_quotas->sz,
@@ -1814,11 +2024,11 @@ static struct damos *damon_sysfs_mk_sche
.weight_age = sysfs_weights->age,
};
struct damos_watermarks wmarks = {
- .metric = DAMOS_WMARK_NONE,
- .interval = 0,
- .high = 0,
- .mid = 0,
- .low = 0,
+ .metric = sysfs_wmarks->metric,
+ .interval = sysfs_wmarks->interval_us,
+ .high = sysfs_wmarks->high,
+ .mid = sysfs_wmarks->mid,
+ .low = sysfs_wmarks->low,
};
return damon_new_scheme(pattern->sz->min, pattern->sz->max,
_
* [patch 222/227] mm/damon/sysfs: support DAMOS watermarks
@ 2022-03-22 21:49 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, colin.i.king, sj, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/sysfs: support DAMOS watermarks
This commit makes the DAMON sysfs interface support the DAMOS watermarks
feature. Specifically, this commit adds a 'watermarks' directory under each
scheme directory and makes writes to the kdamond 'state' file respect the
contents of that directory.
As a result, the file hierarchy becomes as below:
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ │ quotas/ms,sz,reset_interval_ms
│ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ │ watermarks/ <- NEW DIRECTORY
│ │ │ │ │ │ │ │ metric,interval_us,high,mid,low
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
[sj@kernel.org: fix out-of-bound array access for wmark_metric_strs[]]
Link: https://lkml.kernel.org/r/20220301185619.2904-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20220228081314.5770-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/sysfs.c | 220 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 215 insertions(+), 5 deletions(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-support-damos-watermarks
+++ a/mm/damon/sysfs.c
@@ -114,6 +114,189 @@ static struct kobj_type damon_sysfs_ul_r
};
/*
+ * watermarks directory
+ */
+
+struct damon_sysfs_watermarks {
+ struct kobject kobj;
+ enum damos_wmark_metric metric;
+ unsigned long interval_us;
+ unsigned long high;
+ unsigned long mid;
+ unsigned long low;
+};
+
+static struct damon_sysfs_watermarks *damon_sysfs_watermarks_alloc(
+ enum damos_wmark_metric metric, unsigned long interval_us,
+ unsigned long high, unsigned long mid, unsigned long low)
+{
+ struct damon_sysfs_watermarks *watermarks = kmalloc(
+ sizeof(*watermarks), GFP_KERNEL);
+
+ if (!watermarks)
+ return NULL;
+ watermarks->kobj = (struct kobject){};
+ watermarks->metric = metric;
+ watermarks->interval_us = interval_us;
+ watermarks->high = high;
+ watermarks->mid = mid;
+ watermarks->low = low;
+ return watermarks;
+}
+
+/* Should match with enum damos_wmark_metric */
+static const char * const damon_sysfs_wmark_metric_strs[] = {
+ "none",
+ "free_mem_rate",
+};
+
+static ssize_t metric_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+
+ return sysfs_emit(buf, "%s\n",
+ damon_sysfs_wmark_metric_strs[watermarks->metric]);
+}
+
+static ssize_t metric_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+ enum damos_wmark_metric metric;
+
+ for (metric = 0; metric < NR_DAMOS_WMARK_METRICS; metric++) {
+ if (sysfs_streq(buf, damon_sysfs_wmark_metric_strs[metric])) {
+ watermarks->metric = metric;
+ return count;
+ }
+ }
+ return -EINVAL;
+}
+
+static ssize_t interval_us_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+
+ return sysfs_emit(buf, "%lu\n", watermarks->interval_us);
+}
+
+static ssize_t interval_us_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+ int err = kstrtoul(buf, 0, &watermarks->interval_us);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static ssize_t high_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+
+ return sysfs_emit(buf, "%lu\n", watermarks->high);
+}
+
+static ssize_t high_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+ int err = kstrtoul(buf, 0, &watermarks->high);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static ssize_t mid_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+
+ return sysfs_emit(buf, "%lu\n", watermarks->mid);
+}
+
+static ssize_t mid_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+ int err = kstrtoul(buf, 0, &watermarks->mid);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static ssize_t low_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+
+ return sysfs_emit(buf, "%lu\n", watermarks->low);
+}
+
+static ssize_t low_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_watermarks *watermarks = container_of(kobj,
+ struct damon_sysfs_watermarks, kobj);
+ int err = kstrtoul(buf, 0, &watermarks->low);
+
+ if (err)
+ return -EINVAL;
+ return count;
+}
+
+static void damon_sysfs_watermarks_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_watermarks, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_watermarks_metric_attr =
+ __ATTR_RW_MODE(metric, 0600);
+
+static struct kobj_attribute damon_sysfs_watermarks_interval_us_attr =
+ __ATTR_RW_MODE(interval_us, 0600);
+
+static struct kobj_attribute damon_sysfs_watermarks_high_attr =
+ __ATTR_RW_MODE(high, 0600);
+
+static struct kobj_attribute damon_sysfs_watermarks_mid_attr =
+ __ATTR_RW_MODE(mid, 0600);
+
+static struct kobj_attribute damon_sysfs_watermarks_low_attr =
+ __ATTR_RW_MODE(low, 0600);
+
+static struct attribute *damon_sysfs_watermarks_attrs[] = {
+ &damon_sysfs_watermarks_metric_attr.attr,
+ &damon_sysfs_watermarks_interval_us_attr.attr,
+ &damon_sysfs_watermarks_high_attr.attr,
+ &damon_sysfs_watermarks_mid_attr.attr,
+ &damon_sysfs_watermarks_low_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_watermarks);
+
+static struct kobj_type damon_sysfs_watermarks_ktype = {
+ .release = damon_sysfs_watermarks_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_watermarks_groups,
+};
+
+/*
* scheme/weights directory
*/
@@ -469,6 +652,7 @@ struct damon_sysfs_scheme {
enum damos_action action;
struct damon_sysfs_access_pattern *access_pattern;
struct damon_sysfs_quotas *quotas;
+ struct damon_sysfs_watermarks *watermarks;
};
/* This should match with enum damos_action */
@@ -541,6 +725,24 @@ out:
return err;
}
+static int damon_sysfs_scheme_set_watermarks(struct damon_sysfs_scheme *scheme)
+{
+ struct damon_sysfs_watermarks *watermarks =
+ damon_sysfs_watermarks_alloc(DAMOS_WMARK_NONE, 0, 0, 0, 0);
+ int err;
+
+ if (!watermarks)
+ return -ENOMEM;
+ err = kobject_init_and_add(&watermarks->kobj,
+ &damon_sysfs_watermarks_ktype, &scheme->kobj,
+ "watermarks");
+ if (err)
+ kobject_put(&watermarks->kobj);
+ else
+ scheme->watermarks = watermarks;
+ return err;
+}
+
static int damon_sysfs_scheme_add_dirs(struct damon_sysfs_scheme *scheme)
{
int err;
@@ -551,8 +753,14 @@ static int damon_sysfs_scheme_add_dirs(s
err = damon_sysfs_scheme_set_quotas(scheme);
if (err)
goto put_access_pattern_out;
+ err = damon_sysfs_scheme_set_watermarks(scheme);
+ if (err)
+ goto put_quotas_access_pattern_out;
return 0;
+put_quotas_access_pattern_out:
+ kobject_put(&scheme->quotas->kobj);
+ scheme->quotas = NULL;
put_access_pattern_out:
kobject_put(&scheme->access_pattern->kobj);
scheme->access_pattern = NULL;
@@ -565,6 +773,7 @@ static void damon_sysfs_scheme_rm_dirs(s
kobject_put(&scheme->access_pattern->kobj);
damon_sysfs_quotas_rm_dirs(scheme->quotas);
kobject_put(&scheme->quotas->kobj);
+ kobject_put(&scheme->watermarks->kobj);
}
static ssize_t action_show(struct kobject *kobj, struct kobj_attribute *attr,
@@ -1805,6 +2014,7 @@ static struct damos *damon_sysfs_mk_sche
sysfs_scheme->access_pattern;
struct damon_sysfs_quotas *sysfs_quotas = sysfs_scheme->quotas;
struct damon_sysfs_weights *sysfs_weights = sysfs_quotas->weights;
+ struct damon_sysfs_watermarks *sysfs_wmarks = sysfs_scheme->watermarks;
struct damos_quota quota = {
.ms = sysfs_quotas->ms,
.sz = sysfs_quotas->sz,
@@ -1814,11 +2024,11 @@ static struct damos *damon_sysfs_mk_sche
.weight_age = sysfs_weights->age,
};
struct damos_watermarks wmarks = {
- .metric = DAMOS_WMARK_NONE,
- .interval = 0,
- .high = 0,
- .mid = 0,
- .low = 0,
+ .metric = sysfs_wmarks->metric,
+ .interval = sysfs_wmarks->interval_us,
+ .high = sysfs_wmarks->high,
+ .mid = sysfs_wmarks->mid,
+ .low = sysfs_wmarks->low,
};
return damon_new_scheme(pattern->sz->min, pattern->sz->max,
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 223/227] mm/damon/sysfs: support DAMOS stats
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: mm/damon/sysfs: support DAMOS stats
This commit makes the DAMON sysfs interface support the DAMOS stats feature.
Specifically, this commit adds a 'stats' directory under each scheme
directory, and updates the contents of the files under that directory
according to the latest monitoring results when the user writes the special
keyword 'update_schemes_stats' to the 'state' file of the kdamond.
As a result, the file hierarchy becomes as below:
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ │ quotas/ms,sz,reset_interval_ms
│ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low
│ │ │ │ │ │ │ stats/ <- NEW DIRECTORY
│ │ │ │ │ │ │ │ nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
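As a rough illustration of the workflow described above, the sketch below refreshes the stats of the first scheme of the first kdamond and prints them. The function name read_scheme_stats and its optional root-directory argument are assumptions made for this example and are not part of the patch; the file names and the 'update_schemes_stats' keyword are taken from the hierarchy and description above.

```shell
#!/bin/bash
# Hypothetical helper, not part of this patch: refresh and print the DAMOS
# stats of kdamond 0 / context 0 / scheme 0.
# $1 is the DAMON sysfs root (default: /sys/kernel/mm/damon/admin). It is a
# parameter only so that the sketch can be tried against a mock directory.
read_scheme_stats()
{
	admin="${1:-/sys/kernel/mm/damon/admin}"
	kdamond="$admin/kdamonds/0"
	stats="$kdamond/contexts/0/schemes/0/stats"

	# Ask the kdamond to copy the latest scheme stats into the sysfs files.
	if ! echo update_schemes_stats > "$kdamond/state"
	then
		echo "updating the stats failed" >&2
		return 1
	fi

	# Print each stats file as "name: value".
	for f in nr_tried sz_tried nr_applied sz_applied qt_exceeds
	do
		echo "$f: $(cat "$stats/$f")"
	done
}
```

On a real system this would be called without arguments, as root. Going by damon_sysfs_update_schemes_stats() in the diff below, the write to 'state' is expected to fail with -EINVAL when the kdamond has no DAMON context.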
Link: https://lkml.kernel.org/r/20220228081314.5770-11-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/sysfs.c | 150 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 150 insertions(+)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-support-damos-stats
+++ a/mm/damon/sysfs.c
@@ -114,6 +114,105 @@ static struct kobj_type damon_sysfs_ul_r
};
/*
+ * schemes/stats directory
+ */
+
+struct damon_sysfs_stats {
+ struct kobject kobj;
+ unsigned long nr_tried;
+ unsigned long sz_tried;
+ unsigned long nr_applied;
+ unsigned long sz_applied;
+ unsigned long qt_exceeds;
+};
+
+static struct damon_sysfs_stats *damon_sysfs_stats_alloc(void)
+{
+ return kzalloc(sizeof(struct damon_sysfs_stats), GFP_KERNEL);
+}
+
+static ssize_t nr_tried_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_stats *stats = container_of(kobj,
+ struct damon_sysfs_stats, kobj);
+
+ return sysfs_emit(buf, "%lu\n", stats->nr_tried);
+}
+
+static ssize_t sz_tried_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct damon_sysfs_stats *stats = container_of(kobj,
+ struct damon_sysfs_stats, kobj);
+
+ return sysfs_emit(buf, "%lu\n", stats->sz_tried);
+}
+
+static ssize_t nr_applied_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_stats *stats = container_of(kobj,
+ struct damon_sysfs_stats, kobj);
+
+ return sysfs_emit(buf, "%lu\n", stats->nr_applied);
+}
+
+static ssize_t sz_applied_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_stats *stats = container_of(kobj,
+ struct damon_sysfs_stats, kobj);
+
+ return sysfs_emit(buf, "%lu\n", stats->sz_applied);
+}
+
+static ssize_t qt_exceeds_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_stats *stats = container_of(kobj,
+ struct damon_sysfs_stats, kobj);
+
+ return sysfs_emit(buf, "%lu\n", stats->qt_exceeds);
+}
+
+static void damon_sysfs_stats_release(struct kobject *kobj)
+{
+ kfree(container_of(kobj, struct damon_sysfs_stats, kobj));
+}
+
+static struct kobj_attribute damon_sysfs_stats_nr_tried_attr =
+ __ATTR_RO_MODE(nr_tried, 0400);
+
+static struct kobj_attribute damon_sysfs_stats_sz_tried_attr =
+ __ATTR_RO_MODE(sz_tried, 0400);
+
+static struct kobj_attribute damon_sysfs_stats_nr_applied_attr =
+ __ATTR_RO_MODE(nr_applied, 0400);
+
+static struct kobj_attribute damon_sysfs_stats_sz_applied_attr =
+ __ATTR_RO_MODE(sz_applied, 0400);
+
+static struct kobj_attribute damon_sysfs_stats_qt_exceeds_attr =
+ __ATTR_RO_MODE(qt_exceeds, 0400);
+
+static struct attribute *damon_sysfs_stats_attrs[] = {
+ &damon_sysfs_stats_nr_tried_attr.attr,
+ &damon_sysfs_stats_sz_tried_attr.attr,
+ &damon_sysfs_stats_nr_applied_attr.attr,
+ &damon_sysfs_stats_sz_applied_attr.attr,
+ &damon_sysfs_stats_qt_exceeds_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(damon_sysfs_stats);
+
+static struct kobj_type damon_sysfs_stats_ktype = {
+ .release = damon_sysfs_stats_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = damon_sysfs_stats_groups,
+};
+
+/*
* watermarks directory
*/
@@ -653,6 +752,7 @@ struct damon_sysfs_scheme {
struct damon_sysfs_access_pattern *access_pattern;
struct damon_sysfs_quotas *quotas;
struct damon_sysfs_watermarks *watermarks;
+ struct damon_sysfs_stats *stats;
};
/* This should match with enum damos_action */
@@ -743,6 +843,22 @@ static int damon_sysfs_scheme_set_waterm
return err;
}
+static int damon_sysfs_scheme_set_stats(struct damon_sysfs_scheme *scheme)
+{
+ struct damon_sysfs_stats *stats = damon_sysfs_stats_alloc();
+ int err;
+
+ if (!stats)
+ return -ENOMEM;
+ err = kobject_init_and_add(&stats->kobj, &damon_sysfs_stats_ktype,
+ &scheme->kobj, "stats");
+ if (err)
+ kobject_put(&stats->kobj);
+ else
+ scheme->stats = stats;
+ return err;
+}
+
static int damon_sysfs_scheme_add_dirs(struct damon_sysfs_scheme *scheme)
{
int err;
@@ -756,8 +872,14 @@ static int damon_sysfs_scheme_add_dirs(s
err = damon_sysfs_scheme_set_watermarks(scheme);
if (err)
goto put_quotas_access_pattern_out;
+ err = damon_sysfs_scheme_set_stats(scheme);
+ if (err)
+ goto put_watermarks_quotas_access_pattern_out;
return 0;
+put_watermarks_quotas_access_pattern_out:
+ kobject_put(&scheme->watermarks->kobj);
+ scheme->watermarks = NULL;
put_quotas_access_pattern_out:
kobject_put(&scheme->quotas->kobj);
scheme->quotas = NULL;
@@ -774,6 +896,7 @@ static void damon_sysfs_scheme_rm_dirs(s
damon_sysfs_quotas_rm_dirs(scheme->quotas);
kobject_put(&scheme->quotas->kobj);
kobject_put(&scheme->watermarks->kobj);
+ kobject_put(&scheme->stats->kobj);
}
static ssize_t action_show(struct kobject *kobj, struct kobj_attribute *attr,
@@ -2141,6 +2264,31 @@ static int damon_sysfs_turn_damon_off(st
*/
}
+static int damon_sysfs_update_schemes_stats(struct damon_sysfs_kdamond *kdamond)
+{
+ struct damon_ctx *ctx = kdamond->damon_ctx;
+ struct damos *scheme;
+ int schemes_idx = 0;
+
+ if (!ctx)
+ return -EINVAL;
+ mutex_lock(&ctx->kdamond_lock);
+ damon_for_each_scheme(scheme, ctx) {
+ struct damon_sysfs_schemes *sysfs_schemes;
+ struct damon_sysfs_stats *sysfs_stats;
+
+ sysfs_schemes = kdamond->contexts->contexts_arr[0]->schemes;
+ sysfs_stats = sysfs_schemes->schemes_arr[schemes_idx++]->stats;
+ sysfs_stats->nr_tried = scheme->stat.nr_tried;
+ sysfs_stats->sz_tried = scheme->stat.sz_tried;
+ sysfs_stats->nr_applied = scheme->stat.nr_applied;
+ sysfs_stats->sz_applied = scheme->stat.sz_applied;
+ sysfs_stats->qt_exceeds = scheme->stat.qt_exceeds;
+ }
+ mutex_unlock(&ctx->kdamond_lock);
+ return 0;
+}
+
static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t count)
{
@@ -2154,6 +2302,8 @@ static ssize_t state_store(struct kobjec
ret = damon_sysfs_turn_damon_on(kdamond);
else if (sysfs_streq(buf, "off"))
ret = damon_sysfs_turn_damon_off(kdamond);
+ else if (sysfs_streq(buf, "update_schemes_stats"))
+ ret = damon_sysfs_update_schemes_stats(kdamond);
else
ret = -EINVAL;
mutex_unlock(&damon_sysfs_lock);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 224/227] selftests/damon: add a test for DAMON sysfs interface
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: selftests/damon: add a test for DAMON sysfs interface
This commit adds a selftest for the DAMON sysfs interface. It tests the
functionality of the 'nr' files and the existence of the files in each
directory of the hierarchy.
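The 'nr' file behaviour that the test exercises can be summarized as: after writing N to an 'nr_*' file, the directories 0 to N-1 should exist next to it, and directory N should not. A minimal sketch of such a check follows. The helper name expect_numbered_dirs is hypothetical and does not appear in sysfs.sh, which performs the equivalent checks with explicit ensure_dir calls.

```shell
#!/bin/bash
# Hypothetical helper, not part of sysfs.sh: check that directories
# 0 .. nr-1 exist under $parent and that directory nr does not.
# Returns 0 when both conditions hold, 1 otherwise.
expect_numbered_dirs()
{
	parent=$1
	nr=$2

	i=0
	while [ "$i" -lt "$nr" ]
	do
		if [ ! -d "$parent/$i" ]
		then
			echo "$parent/$i dir is expected but not found"
			return 1
		fi
		i=$((i + 1))
	done

	# The directory one past the end must not exist.
	if [ -d "$parent/$nr" ]
	then
		echo "$parent/$nr dir is not expected but found"
		return 1
	fi
	return 0
}
```

For example, after 'echo 2 > "$schemes_dir/nr_schemes"' one would call 'expect_numbered_dirs "$schemes_dir" 2'. Unlike the helpers in the script, this sketch returns an error code instead of exiting, so the caller decides how to report the failure.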
Link: https://lkml.kernel.org/r/20220228081314.5770-12-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
tools/testing/selftests/damon/Makefile | 1
tools/testing/selftests/damon/sysfs.sh | 306 +++++++++++++++++++++++
2 files changed, 307 insertions(+)
--- a/tools/testing/selftests/damon/Makefile~selftests-damon-add-a-test-for-damon-sysfs-interface
+++ a/tools/testing/selftests/damon/Makefile
@@ -6,5 +6,6 @@ TEST_GEN_FILES += huge_count_read_write
TEST_FILES = _chk_dependency.sh _debugfs_common.sh
TEST_PROGS = debugfs_attrs.sh debugfs_schemes.sh debugfs_target_ids.sh
TEST_PROGS += debugfs_empty_targets.sh debugfs_huge_count_read_write.sh
+TEST_PROGS += sysfs.sh
include ../lib.mk
--- /dev/null
+++ a/tools/testing/selftests/damon/sysfs.sh
@@ -0,0 +1,306 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+ensure_write_succ()
+{
+ file=$1
+ content=$2
+ reason=$3
+
+ if ! echo "$content" > "$file"
+ then
+ echo "writing $content to $file failed"
+ echo "expected success because $reason"
+ exit 1
+ fi
+}
+
+ensure_write_fail()
+{
+ file=$1
+ content=$2
+ reason=$3
+
+ if echo "$content" > "$file"
+ then
+ echo "writing $content to $file succeeded"
+ echo "expected failure because $reason"
+ exit 1
+ fi
+}
+
+ensure_dir()
+{
+ dir=$1
+ to_ensure=$2
+ if [ "$to_ensure" = "exist" ] && [ ! -d "$dir" ]
+ then
+ echo "$dir dir is expected but not found"
+ exit 1
+ elif [ "$to_ensure" = "not_exist" ] && [ -d "$dir" ]
+ then
+ echo "$dir dir is not expected but found"
+ exit 1
+ fi
+}
+
+ensure_file()
+{
+ file=$1
+ to_ensure=$2
+ permission=$3
+ if [ "$to_ensure" = "exist" ]
+ then
+ if [ ! -f "$file" ]
+ then
+ echo "$file is expected but not found"
+ exit 1
+ fi
+ perm=$(stat -c "%a" "$file")
+ if [ ! "$perm" = "$permission" ]
+ then
+ echo "$file permission: expected $permission but $perm"
+ exit 1
+ fi
+ elif [ "$to_ensure" = "not_exist" ] && [ -f "$file" ]
+ then
+ echo "$file is not expected but found"
+ exit 1
+ fi
+}
+
+test_range()
+{
+ range_dir=$1
+ ensure_dir "$range_dir" "exist"
+ ensure_file "$range_dir/min" "exist" 600
+ ensure_file "$range_dir/max" "exist" 600
+}
+
+test_stats()
+{
+ stats_dir=$1
+ ensure_dir "$stats_dir" "exist"
+ for f in nr_tried sz_tried nr_applied sz_applied qt_exceeds
+ do
+ ensure_file "$stats_dir/$f" "exist" "400"
+ done
+}
+
+test_watermarks()
+{
+ watermarks_dir=$1
+ ensure_dir "$watermarks_dir" "exist"
+ ensure_file "$watermarks_dir/metric" "exist" "600"
+ ensure_file "$watermarks_dir/interval_us" "exist" "600"
+ ensure_file "$watermarks_dir/high" "exist" "600"
+ ensure_file "$watermarks_dir/mid" "exist" "600"
+ ensure_file "$watermarks_dir/low" "exist" "600"
+}
+
+test_weights()
+{
+ weights_dir=$1
+ ensure_dir "$weights_dir" "exist"
+ ensure_file "$weights_dir/sz_permil" "exist" "600"
+ ensure_file "$weights_dir/nr_accesses_permil" "exist" "600"
+ ensure_file "$weights_dir/age_permil" "exist" "600"
+}
+
+test_quotas()
+{
+ quotas_dir=$1
+ ensure_dir "$quotas_dir" "exist"
+ ensure_file "$quotas_dir/ms" "exist" 600
+ ensure_file "$quotas_dir/bytes" "exist" 600
+ ensure_file "$quotas_dir/reset_interval_ms" "exist" 600
+ test_weights "$quotas_dir/weights"
+}
+
+test_access_pattern()
+{
+ access_pattern_dir=$1
+ ensure_dir "$access_pattern_dir" "exist"
+ test_range "$access_pattern_dir/age"
+ test_range "$access_pattern_dir/nr_accesses"
+ test_range "$access_pattern_dir/sz"
+}
+
+test_scheme()
+{
+ scheme_dir=$1
+ ensure_dir "$scheme_dir" "exist"
+ ensure_file "$scheme_dir/action" "exist" "600"
+ test_access_pattern "$scheme_dir/access_pattern"
+ test_quotas "$scheme_dir/quotas"
+ test_watermarks "$scheme_dir/watermarks"
+ test_stats "$scheme_dir/stats"
+}
+
+test_schemes()
+{
+ schemes_dir=$1
+ ensure_dir "$schemes_dir" "exist"
+ ensure_file "$schemes_dir/nr_schemes" "exist" 600
+
+ ensure_write_succ "$schemes_dir/nr_schemes" "1" "valid input"
+ test_scheme "$schemes_dir/0"
+
+ ensure_write_succ "$schemes_dir/nr_schemes" "2" "valid input"
+ test_scheme "$schemes_dir/0"
+ test_scheme "$schemes_dir/1"
+
+ ensure_write_succ "$schemes_dir/nr_schemes" "0" "valid input"
+ ensure_dir "$schemes_dir/0" "not_exist"
+ ensure_dir "$schemes_dir/1" "not_exist"
+}
+
+test_region()
+{
+ region_dir=$1
+ ensure_dir "$region_dir" "exist"
+ ensure_file "$region_dir/start" "exist" 600
+ ensure_file "$region_dir/end" "exist" 600
+}
+
+test_regions()
+{
+ regions_dir=$1
+ ensure_dir "$regions_dir" "exist"
+ ensure_file "$regions_dir/nr_regions" "exist" 600
+
+ ensure_write_succ "$regions_dir/nr_regions" "1" "valid input"
+ test_region "$regions_dir/0"
+
+ ensure_write_succ "$regions_dir/nr_regions" "2" "valid input"
+ test_region "$regions_dir/0"
+ test_region "$regions_dir/1"
+
+ ensure_write_succ "$regions_dir/nr_regions" "0" "valid input"
+ ensure_dir "$regions_dir/0" "not_exist"
+ ensure_dir "$regions_dir/1" "not_exist"
+}
+
+test_target()
+{
+ target_dir=$1
+ ensure_dir "$target_dir" "exist"
+ ensure_file "$target_dir/pid_target" "exist" "600"
+ test_regions "$target_dir/regions"
+}
+
+test_targets()
+{
+ targets_dir=$1
+ ensure_dir "$targets_dir" "exist"
+ ensure_file "$targets_dir/nr_targets" "exist" 600
+
+ ensure_write_succ "$targets_dir/nr_targets" "1" "valid input"
+ test_target "$targets_dir/0"
+
+ ensure_write_succ "$targets_dir/nr_targets" "2" "valid input"
+ test_target "$targets_dir/0"
+ test_target "$targets_dir/1"
+
+ ensure_write_succ "$targets_dir/nr_targets" "0" "valid input"
+ ensure_dir "$targets_dir/0" "not_exist"
+ ensure_dir "$targets_dir/1" "not_exist"
+}
+
+test_intervals()
+{
+ intervals_dir=$1
+ ensure_dir "$intervals_dir" "exist"
+ ensure_file "$intervals_dir/aggr_us" "exist" "600"
+ ensure_file "$intervals_dir/sample_us" "exist" "600"
+ ensure_file "$intervals_dir/update_us" "exist" "600"
+}
+
+test_monitoring_attrs()
+{
+ monitoring_attrs_dir=$1
+ ensure_dir "$monitoring_attrs_dir" "exist"
+ test_intervals "$monitoring_attrs_dir/intervals"
+ test_range "$monitoring_attrs_dir/nr_regions"
+}
+
+test_context()
+{
+ context_dir=$1
+ ensure_dir "$context_dir" "exist"
+ ensure_file "$context_dir/operations" "exist" 600
+ test_monitoring_attrs "$context_dir/monitoring_attrs"
+ test_targets "$context_dir/targets"
+ test_schemes "$context_dir/schemes"
+}
+
+test_contexts()
+{
+ contexts_dir=$1
+ ensure_dir "$contexts_dir" "exist"
+ ensure_file "$contexts_dir/nr_contexts" "exist" 600
+
+ ensure_write_succ "$contexts_dir/nr_contexts" "1" "valid input"
+ test_context "$contexts_dir/0"
+
+ ensure_write_fail "$contexts_dir/nr_contexts" "2" "only 0/1 are supported"
+ test_context "$contexts_dir/0"
+
+ ensure_write_succ "$contexts_dir/nr_contexts" "0" "valid input"
+ ensure_dir "$contexts_dir/0" "not_exist"
+}
+
+test_kdamond()
+{
+ kdamond_dir=$1
+ ensure_dir "$kdamond_dir" "exist"
+ ensure_file "$kdamond_dir/state" "exist" "600"
+ ensure_file "$kdamond_dir/pid" "exist" 400
+ test_contexts "$kdamond_dir/contexts"
+}
+
+test_kdamonds()
+{
+ kdamonds_dir=$1
+ ensure_dir "$kdamonds_dir" "exist"
+
+ ensure_file "$kdamonds_dir/nr_kdamonds" "exist" "600"
+
+ ensure_write_succ "$kdamonds_dir/nr_kdamonds" "1" "valid input"
+ test_kdamond "$kdamonds_dir/0"
+
+ ensure_write_succ "$kdamonds_dir/nr_kdamonds" "2" "valid input"
+ test_kdamond "$kdamonds_dir/0"
+ test_kdamond "$kdamonds_dir/1"
+
+ ensure_write_succ "$kdamonds_dir/nr_kdamonds" "0" "valid input"
+ ensure_dir "$kdamonds_dir/0" "not_exist"
+ ensure_dir "$kdamonds_dir/1" "not_exist"
+}
+
+test_damon_sysfs()
+{
+ damon_sysfs=$1
+ if [ ! -d "$damon_sysfs" ]
+ then
+ echo "$damon_sysfs not found"
+ exit $ksft_skip
+ fi
+
+ test_kdamonds "$damon_sysfs/kdamonds"
+}
+
+check_dependencies()
+{
+ if [ $EUID -ne 0 ]
+ then
+ echo "Run as root"
+ exit $ksft_skip
+ fi
+}
+
+check_dependencies
+test_damon_sysfs "/sys/kernel/mm/damon/admin"
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 224/227] selftests/damon: add a test for DAMON sysfs interface
@ 2022-03-22 21:49 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: selftests/damon: add a test for DAMON sysfs interface
This commit adds a selftest for the DAMON sysfs interface. It tests the
functionality of the 'nr' files and the existence of the files in each
directory of the hierarchy.
Link: https://lkml.kernel.org/r/20220228081314.5770-12-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
tools/testing/selftests/damon/Makefile | 1
tools/testing/selftests/damon/sysfs.sh | 306 +++++++++++++++++++++++
2 files changed, 307 insertions(+)
--- a/tools/testing/selftests/damon/Makefile~selftests-damon-add-a-test-for-damon-sysfs-interface
+++ a/tools/testing/selftests/damon/Makefile
@@ -6,5 +6,6 @@ TEST_GEN_FILES += huge_count_read_write
TEST_FILES = _chk_dependency.sh _debugfs_common.sh
TEST_PROGS = debugfs_attrs.sh debugfs_schemes.sh debugfs_target_ids.sh
TEST_PROGS += debugfs_empty_targets.sh debugfs_huge_count_read_write.sh
+TEST_PROGS += sysfs.sh
include ../lib.mk
--- /dev/null
+++ a/tools/testing/selftests/damon/sysfs.sh
@@ -0,0 +1,306 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+ensure_write_succ()
+{
+ file=$1
+ content=$2
+ reason=$3
+
+ if ! echo "$content" > "$file"
+ then
+ echo "writing $content to $file failed"
+ echo "expected success because $reason"
+ exit 1
+ fi
+}
+
+ensure_write_fail()
+{
+ file=$1
+ content=$2
+ reason=$3
+
+ if echo "$content" > "$file"
+ then
+ echo "writing $content to $file succeeded"
+ echo "expected failure because $reason"
+ exit 1
+ fi
+}
+
+ensure_dir()
+{
+ dir=$1
+ to_ensure=$2
+ if [ "$to_ensure" = "exist" ] && [ ! -d "$dir" ]
+ then
+ echo "$dir dir is expected but not found"
+ exit 1
+ elif [ "$to_ensure" = "not_exist" ] && [ -d "$dir" ]
+ then
+ echo "$dir dir is not expected but found"
+ exit 1
+ fi
+}
+
+ensure_file()
+{
+ file=$1
+ to_ensure=$2
+ permission=$3
+ if [ "$to_ensure" = "exist" ]
+ then
+ if [ ! -f "$file" ]
+ then
+ echo "$file is expected but not found"
+ exit 1
+ fi
+ perm=$(stat -c "%a" "$file")
+ if [ ! "$perm" = "$permission" ]
+ then
+ echo "$file permission: expected $permission but $perm"
+ exit 1
+ fi
+ elif [ "$to_ensure" = "not_exist" ] && [ -f "$file" ]
+ then
+ echo "$file is not expected but found"
+ exit 1
+ fi
+}
+
+test_range()
+{
+ range_dir=$1
+ ensure_dir "$range_dir" "exist"
+ ensure_file "$range_dir/min" "exist" 600
+ ensure_file "$range_dir/max" "exist" 600
+}
+
+test_stats()
+{
+ stats_dir=$1
+ ensure_dir "$stats_dir" "exist"
+ for f in nr_tried sz_tried nr_applied sz_applied qt_exceeds
+ do
+ ensure_file "$stats_dir/$f" "exist" "400"
+ done
+}
+
+test_watermarks()
+{
+ watermarks_dir=$1
+ ensure_dir "$watermarks_dir" "exist"
+ ensure_file "$watermarks_dir/metric" "exist" "600"
+ ensure_file "$watermarks_dir/interval_us" "exist" "600"
+ ensure_file "$watermarks_dir/high" "exist" "600"
+ ensure_file "$watermarks_dir/mid" "exist" "600"
+ ensure_file "$watermarks_dir/low" "exist" "600"
+}
+
+test_weights()
+{
+ weights_dir=$1
+ ensure_dir "$weights_dir" "exist"
+ ensure_file "$weights_dir/sz_permil" "exist" "600"
+ ensure_file "$weights_dir/nr_accesses_permil" "exist" "600"
+ ensure_file "$weights_dir/age_permil" "exist" "600"
+}
+
+test_quotas()
+{
+ quotas_dir=$1
+ ensure_dir "$quotas_dir" "exist"
+ ensure_file "$quotas_dir/ms" "exist" 600
+ ensure_file "$quotas_dir/bytes" "exist" 600
+ ensure_file "$quotas_dir/reset_interval_ms" "exist" 600
+ test_weights "$quotas_dir/weights"
+}
+
+test_access_pattern()
+{
+ access_pattern_dir=$1
+ ensure_dir "$access_pattern_dir" "exist"
+ test_range "$access_pattern_dir/age"
+ test_range "$access_pattern_dir/nr_accesses"
+ test_range "$access_pattern_dir/sz"
+}
+
+test_scheme()
+{
+ scheme_dir=$1
+ ensure_dir "$scheme_dir" "exist"
+ ensure_file "$scheme_dir/action" "exist" "600"
+ test_access_pattern "$scheme_dir/access_pattern"
+ test_quotas "$scheme_dir/quotas"
+ test_watermarks "$scheme_dir/watermarks"
+ test_stats "$scheme_dir/stats"
+}
+
+test_schemes()
+{
+ schemes_dir=$1
+ ensure_dir "$schemes_dir" "exist"
+ ensure_file "$schemes_dir/nr_schemes" "exist" 600
+
+ ensure_write_succ "$schemes_dir/nr_schemes" "1" "valid input"
+ test_scheme "$schemes_dir/0"
+
+ ensure_write_succ "$schemes_dir/nr_schemes" "2" "valid input"
+ test_scheme "$schemes_dir/0"
+ test_scheme "$schemes_dir/1"
+
+ ensure_write_succ "$schemes_dir/nr_schemes" "0" "valid input"
+ ensure_dir "$schemes_dir/0" "not_exist"
+ ensure_dir "$schemes_dir/1" "not_exist"
+}
+
+test_region()
+{
+ region_dir=$1
+ ensure_dir "$region_dir" "exist"
+ ensure_file "$region_dir/start" "exist" 600
+ ensure_file "$region_dir/end" "exist" 600
+}
+
+test_regions()
+{
+ regions_dir=$1
+ ensure_dir "$regions_dir" "exist"
+ ensure_file "$regions_dir/nr_regions" "exist" 600
+
+ ensure_write_succ "$regions_dir/nr_regions" "1" "valid input"
+ test_region "$regions_dir/0"
+
+ ensure_write_succ "$regions_dir/nr_regions" "2" "valid input"
+ test_region "$regions_dir/0"
+ test_region "$regions_dir/1"
+
+ ensure_write_succ "$regions_dir/nr_regions" "0" "valid input"
+ ensure_dir "$regions_dir/0" "not_exist"
+ ensure_dir "$regions_dir/1" "not_exist"
+}
+
+test_target()
+{
+ target_dir=$1
+ ensure_dir "$target_dir" "exist"
+ ensure_file "$target_dir/pid_target" "exist" "600"
+ test_regions "$target_dir/regions"
+}
+
+test_targets()
+{
+ targets_dir=$1
+ ensure_dir "$targets_dir" "exist"
+ ensure_file "$targets_dir/nr_targets" "exist" 600
+
+ ensure_write_succ "$targets_dir/nr_targets" "1" "valid input"
+ test_target "$targets_dir/0"
+
+ ensure_write_succ "$targets_dir/nr_targets" "2" "valid input"
+ test_target "$targets_dir/0"
+ test_target "$targets_dir/1"
+
+ ensure_write_succ "$targets_dir/nr_targets" "0" "valid input"
+ ensure_dir "$targets_dir/0" "not_exist"
+ ensure_dir "$targets_dir/1" "not_exist"
+}
+
+test_intervals()
+{
+ intervals_dir=$1
+ ensure_dir "$intervals_dir" "exist"
+ ensure_file "$intervals_dir/aggr_us" "exist" "600"
+ ensure_file "$intervals_dir/sample_us" "exist" "600"
+ ensure_file "$intervals_dir/update_us" "exist" "600"
+}
+
+test_monitoring_attrs()
+{
+ monitoring_attrs_dir=$1
+ ensure_dir "$monitoring_attrs_dir" "exist"
+ test_intervals "$monitoring_attrs_dir/intervals"
+ test_range "$monitoring_attrs_dir/nr_regions"
+}
+
+test_context()
+{
+ context_dir=$1
+ ensure_dir "$context_dir" "exist"
+ ensure_file "$context_dir/operations" "exist" 600
+ test_monitoring_attrs "$context_dir/monitoring_attrs"
+ test_targets "$context_dir/targets"
+ test_schemes "$context_dir/schemes"
+}
+
+test_contexts()
+{
+ contexts_dir=$1
+ ensure_dir "$contexts_dir" "exist"
+ ensure_file "$contexts_dir/nr_contexts" "exist" 600
+
+ ensure_write_succ "$contexts_dir/nr_contexts" "1" "valid input"
+ test_context "$contexts_dir/0"
+
+ ensure_write_fail "$contexts_dir/nr_contexts" "2" "only 0/1 are supported"
+ test_context "$contexts_dir/0"
+
+ ensure_write_succ "$contexts_dir/nr_contexts" "0" "valid input"
+ ensure_dir "$contexts_dir/0" "not_exist"
+}
+
+test_kdamond()
+{
+ kdamond_dir=$1
+ ensure_dir "$kdamond_dir" "exist"
+ ensure_file "$kdamond_dir/state" "exist" "600"
+ ensure_file "$kdamond_dir/pid" "exist" 400
+ test_contexts "$kdamond_dir/contexts"
+}
+
+test_kdamonds()
+{
+ kdamonds_dir=$1
+ ensure_dir "$kdamonds_dir" "exist"
+
+ ensure_file "$kdamonds_dir/nr_kdamonds" "exist" "600"
+
+ ensure_write_succ "$kdamonds_dir/nr_kdamonds" "1" "valid input"
+ test_kdamond "$kdamonds_dir/0"
+
+ ensure_write_succ "$kdamonds_dir/nr_kdamonds" "2" "valid input"
+ test_kdamond "$kdamonds_dir/0"
+ test_kdamond "$kdamonds_dir/1"
+
+ ensure_write_succ "$kdamonds_dir/nr_kdamonds" "0" "valid input"
+ ensure_dir "$kdamonds_dir/0" "not_exist"
+ ensure_dir "$kdamonds_dir/1" "not_exist"
+}
+
+test_damon_sysfs()
+{
+ damon_sysfs=$1
+ if [ ! -d "$damon_sysfs" ]
+ then
+ echo "$damon_sysfs not found"
+ exit $ksft_skip
+ fi
+
+ test_kdamonds "$damon_sysfs/kdamonds"
+}
+
+check_dependencies()
+{
+ if [ $EUID -ne 0 ]
+ then
+ echo "Run as root"
+ exit $ksft_skip
+ fi
+}
+
+check_dependencies
+test_damon_sysfs "/sys/kernel/mm/damon/admin"
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 225/227] Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 18133 bytes --]
From: SeongJae Park <sj@kernel.org>
Subject: Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
This commit adds detailed usage of DAMON sysfs interface in the
admin-guide document for DAMON.
Link: https://lkml.kernel.org/r/20220228081314.5770-13-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/admin-guide/mm/damon/usage.rst | 350 ++++++++++++++++-
1 file changed, 344 insertions(+), 6 deletions(-)
--- a/Documentation/admin-guide/mm/damon/usage.rst~docs-admin-guide-mm-damon-usage-document-damon-sysfs-interface
+++ a/Documentation/admin-guide/mm/damon/usage.rst
@@ -4,7 +4,7 @@
Detailed Usages
===============
-DAMON provides below three interfaces for different users.
+DAMON provides the interfaces below for different users.
- *DAMON user space tool.*
`This <https://github.com/awslabs/damo>`_ is for privileged people such as
@@ -14,17 +14,21 @@ DAMON provides below three interfaces fo
virtual and physical address spaces monitoring. For more detail, please
refer to its `usage document
<https://github.com/awslabs/damo/blob/next/USAGE.md>`_.
-- *debugfs interface.*
- :ref:`This <debugfs_interface>` is for privileged user space programmers who
+- *sysfs interface.*
+ :ref:`This <sysfs_interface>` is for privileged user space programmers who
want more optimized use of DAMON. Using this, users can use DAMON’s major
- features by reading from and writing to special debugfs files. Therefore,
- you can write and use your personalized DAMON debugfs wrapper programs that
- reads/writes the debugfs files instead of you. The `DAMON user space tool
+ features by reading from and writing to special sysfs files. Therefore,
+ you can write and use your personalized DAMON sysfs wrapper programs that
+ reads/writes the sysfs files instead of you. The `DAMON user space tool
<https://github.com/awslabs/damo>`_ is one example of such programs. It
supports both virtual and physical address spaces monitoring. Note that this
interface provides only simple :ref:`statistics <damos_stats>` for the
monitoring results. For detailed monitoring results, DAMON provides a
:ref:`tracepoint <tracepoint>`.
+- *debugfs interface.*
+ :ref:`This <debugfs_interface>` is almost identical to :ref:`sysfs interface
+ <sysfs_interface>`. This will be removed after next LTS kernel is released,
+ so users should move to the :ref:`sysfs interface <sysfs_interface>`.
- *Kernel Space Programming Interface.*
:doc:`This </vm/damon/api>` is for kernel space programmers. Using this,
users can utilize every feature of DAMON most flexibly and efficiently by
@@ -32,6 +36,340 @@ DAMON provides below three interfaces fo
DAMON for various address spaces. For detail, please refer to the interface
:doc:`document </vm/damon/api>`.
+.. _sysfs_interface:
+
+sysfs Interface
+===============
+
+DAMON sysfs interface is built when ``CONFIG_DAMON_SYSFS`` is defined. It
+creates multiple directories and files under its sysfs directory,
+``<sysfs>/kernel/mm/damon/``. You can control DAMON by writing to and reading
+from the files under the directory.
+
+For a short example, users can monitor the virtual address space of a given
+workload as below. ::
+
+ # cd /sys/kernel/mm/damon/admin/
+ # echo 1 > kdamonds/nr_kdamonds && echo 1 > kdamonds/0/contexts/nr_contexts
+ # echo vaddr > kdamonds/0/contexts/0/operations
+ # echo 1 > kdamonds/0/contexts/0/targets/nr_targets
+ # echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid_target
+ # echo on > kdamonds/0/state
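
The short example above can also be wrapped in a small function that stops at
the first failed write. The following is only a minimal sketch and not part of
the kernel tree: ``start_vaddr_monitoring`` and ``damon_root`` are hypothetical
names chosen for illustration, and the file names are the ones listed in the
files hierarchy of this document.

```shell
#!/bin/bash
# Sketch: start virtual address space monitoring of one process through
# the DAMON sysfs files. damon_root and start_vaddr_monitoring are
# hypothetical names; they are not provided by the kernel.

damon_root="${DAMON_SYSFS:-/sys/kernel/mm/damon/admin}"

start_vaddr_monitoring()
{
	local pid="$1"
	local kdamond="$damon_root/kdamonds/0"
	local ctx="$kdamond/contexts/0"

	# Each write creates the child directory used by the next step,
	# so stop as soon as one of them fails.
	echo 1 > "$damon_root/kdamonds/nr_kdamonds" || return 1
	echo 1 > "$kdamond/contexts/nr_contexts" || return 1
	echo vaddr > "$ctx/operations" || return 1
	echo 1 > "$ctx/targets/nr_targets" || return 1
	echo "$pid" > "$ctx/targets/0/pid_target" || return 1
	echo on > "$kdamond/state"
}
```

Run as root, a call such as ``start_vaddr_monitoring "$(pidof <workload>)"``
would perform the same writes as the commands above.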
+
+Files Hierarchy
+---------------
+
+The file hierarchy of the DAMON sysfs interface is shown below. In the
+figure, parent-child relations are represented with indentation, each
+directory has a ``/`` suffix, and files in each directory are separated by
+commas (","). ::
+
+ /sys/kernel/mm/damon/admin
+ │ kdamonds/nr_kdamonds
+ │ │ 0/state,pid
+ │ │ │ contexts/nr_contexts
+ │ │ │ │ 0/operations
+ │ │ │ │ │ monitoring_attrs/
+ │ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
+ │ │ │ │ │ │ nr_regions/min,max
+ │ │ │ │ │ targets/nr_targets
+ │ │ │ │ │ │ 0/pid_target
+ │ │ │ │ │ │ │ regions/nr_regions
+ │ │ │ │ │ │ │ │ 0/start,end
+ │ │ │ │ │ │ │ │ ...
+ │ │ │ │ │ │ ...
+ │ │ │ │ │ schemes/nr_schemes
+ │ │ │ │ │ │ 0/action
+ │ │ │ │ │ │ │ access_pattern/
+ │ │ │ │ │ │ │ │ sz/min,max
+ │ │ │ │ │ │ │ │ nr_accesses/min,max
+ │ │ │ │ │ │ │ │ age/min,max
+ │ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms
+ │ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
+ │ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low
+ │ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
+ │ │ │ │ │ │ ...
+ │ │ │ │ ...
+ │ │ ...
+
+Root
+----
+
+The root of the DAMON sysfs interface is ``<sysfs>/kernel/mm/damon/``, and it
+has one directory named ``admin``. The directory contains the files for
+privileged user space programs' control of DAMON. User space tools or daemons
+with root permission can use this directory.
+
+kdamonds/
+---------
+
+The monitoring-related information, including request specifications and
+results, is called a DAMON context. DAMON executes each context with a kernel
+thread called kdamond, and multiple kdamonds can run in parallel.
+
+Under the ``admin`` directory there is one directory, ``kdamonds``, which has
+files for controlling the kdamonds. In the beginning, this directory has only
+one file, ``nr_kdamonds``. Writing a number (``N``) to the file creates ``N``
+child directories named ``0`` to ``N-1``. Each directory represents one
+kdamond.
+
+kdamonds/<N>/
+-------------
+
+In each kdamond directory, two files (``state`` and ``pid``) and one directory
+(``contexts``) exist.
+
+Reading ``state`` returns ``on`` if the kdamond is currently running, or
+``off`` if it is not running. Writing ``on`` or ``off`` puts the kdamond in
+that state. Writing ``update_schemes_stats`` to the ``state`` file updates the
+contents of stats files for each DAMON-based operation scheme of the kdamond.
+For details of the stats, please refer to :ref:`stats section
+<sysfs_schemes_stats>`.
+
+If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread.
+
+``contexts`` directory contains files for controlling the monitoring contexts
+that this kdamond will execute.
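
As a rough illustration of how the ``state`` and ``pid`` files fit together, a
wrapper could report a kdamond's status as sketched below. ``kdamond_status``
is a hypothetical helper name, and the output format is an arbitrary choice
made for this example.

```shell
#!/bin/bash
# Sketch: report whether a kdamond is running, and its pid if so.
# kdamond_status is a hypothetical helper, not a kernel-provided tool.

kdamond_status()
{
	local kdamond_dir="$1"
	local state

	state=$(cat "$kdamond_dir/state") || return 1
	if [ "$state" = "on" ]
	then
		# The pid file is meaningful only while the state is "on".
		echo "running, pid $(cat "$kdamond_dir/pid")"
	else
		echo "not running"
	fi
}
```

It would be called with a kdamond directory, for example
``kdamond_status /sys/kernel/mm/damon/admin/kdamonds/0``.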
+
+kdamonds/<N>/contexts/
+----------------------
+
+In the beginning, this directory has only one file, ``nr_contexts``. Writing a
+number (``N``) to the file creates the number of child directories named as
+``0`` to ``N-1``. Each directory represents each monitoring context. At the
+moment, only one context per kdamond is supported, so only ``0`` or ``1`` can
+be written to the file.
+
+contexts/<N>/
+-------------
+
+In each context directory, one file (``operations``) and three directories
+(``monitoring_attrs``, ``targets``, and ``schemes``) exist.
+
+DAMON supports multiple types of monitoring operations, including those for
+virtual address space and the physical address space. You can set and get what
+type of monitoring operations DAMON will use for the context by writing one of
+below keywords to, and reading from the file.
+
+ - vaddr: Monitor virtual address spaces of specific processes
+ - paddr: Monitor the physical address space of the system
+
+contexts/<N>/monitoring_attrs/
+------------------------------
+
+Files for specifying attributes of the monitoring including required quality
+and efficiency of the monitoring are in ``monitoring_attrs`` directory.
+Specifically, two directories, ``intervals`` and ``nr_regions`` exist in this
+directory.
+
+Under ``intervals`` directory, three files for DAMON's sampling interval
+(``sample_us``), aggregation interval (``aggr_us``), and update interval
+(``update_us``) exist. You can set and get the values in micro-seconds by
+writing to and reading from the files.
+
+Under ``nr_regions`` directory, two files for the lower-bound and upper-bound
+of DAMON's monitoring regions (``min`` and ``max``, respectively), which
+control the monitoring overhead, exist. You can set and get the values by
+writing to and reading from the files.
+
+For more details about the intervals and monitoring regions range, please refer
+to the Design document (:doc:`/vm/damon/design`).
+
+contexts/<N>/targets/
+---------------------
+
+In the beginning, this directory has only one file, ``nr_targets``. Writing a
+number (``N``) to the file creates the number of child directories named ``0``
+to ``N-1``. Each directory represents each monitoring target.
+
+targets/<N>/
+------------
+
+In each target directory, one file (``pid_target``) and one directory
+(``regions``) exist.
+
+If you wrote ``vaddr`` to the ``contexts/<N>/operations``, each target should
+be a process. You can specify the process to DAMON by writing the pid of the
+process to the ``pid_target`` file.
+
+targets/<N>/regions
+-------------------
+
+When ``vaddr`` monitoring operations set is being used (``vaddr`` is written to
+the ``contexts/<N>/operations`` file), DAMON automatically sets and updates the
+monitoring target regions so that entire memory mappings of target processes
+can be covered. However, users could want to set the initial monitoring region
+to specific address ranges.
+
+In contrast, DAMON does not automatically set or update the monitoring target
+regions when the ``paddr`` monitoring operations set is being used (``paddr``
+is written to the ``contexts/<N>/operations`` file). Therefore, users should
+set the monitoring target regions themselves in that case.
+
+For such cases, users can explicitly set the initial monitoring target regions
+as they want, by writing proper values to the files under this directory.
+
+In the beginning, this directory has only one file, ``nr_regions``. Writing a
+number (``N``) to the file creates the number of child directories named ``0``
+to ``N-1``. Each directory represents each initial monitoring target region.
+
+regions/<N>/
+------------
+
+In each region directory, you will find two files (``start`` and ``end``). You
+can set and get the start and end addresses of the initial monitoring target
+region by writing to and reading from the files, respectively.
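
The ``nr_regions``, ``start`` and ``end`` files described above could be
filled in by a helper like the one sketched here. ``set_initial_regions`` is a
hypothetical name, and the ``<start>-<end>`` argument format is only a
convenience of this sketch, not something the sysfs interface defines.

```shell
#!/bin/bash
# Sketch: set the initial monitoring target regions of one target.
# set_initial_regions and its "<start>-<end>" arguments are hypothetical.

set_initial_regions()
{
	local regions_dir="$1"
	local idx=0
	local region

	shift
	# Writing the count asks for child directories 0 .. N-1.
	echo "$#" > "$regions_dir/nr_regions" || return 1

	for region in "$@"
	do
		# Text before the dash is the start, text after it the end.
		echo "${region%-*}" > "$regions_dir/$idx/start" || return 1
		echo "${region#*-}" > "$regions_dir/$idx/end" || return 1
		idx=$((idx + 1))
	done
}
```

For instance, ``set_initial_regions <target dir>/regions 4096-8192
16384-32768`` would request two regions; the addresses here are arbitrary
illustrative values.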
+
+contexts/<N>/schemes/
+---------------------
+
+For usual DAMON-based data access aware memory management optimizations, users
+would normally want the system to apply a memory management action to a memory
+region of a specific access pattern. DAMON receives such formalized operation
+schemes from the user and applies those to the target memory regions. Users
+can get and set the schemes by reading from and writing to files under this
+directory.
+
+In the beginning, this directory has only one file, ``nr_schemes``. Writing a
+number (``N``) to the file creates the number of child directories named ``0``
+to ``N-1``. Each directory represents each DAMON-based operation scheme.
+
+schemes/<N>/
+------------
+
+In each scheme directory, four directories (``access_pattern``, ``quotas``,
+``watermarks``, and ``stats``) and one file (``action``) exist.
+
+The ``action`` file is for setting and getting what action you want to apply to
+memory regions having specific access pattern of the interest. The keywords
+that can be written to and read from the file and their meaning are as below.
+
+ - ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``
+ - ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``
+ - ``pageout``: Call ``madvise()`` for the region with ``MADV_PAGEOUT``
+ - ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``
+ - ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``
+ - ``stat``: Do nothing but count the statistics
+
+schemes/<N>/access_pattern/
+---------------------------
+
+The target access pattern of each DAMON-based operation scheme is constructed
+with three ranges including the size of the region in bytes, number of
+monitored accesses per aggregate interval, and number of aggregated intervals
+for the age of the region.
+
+Under the ``access_pattern`` directory, three directories (``sz``,
+``nr_accesses``, and ``age``) each having two files (``min`` and ``max``)
+exist. You can set and get the access pattern for the given scheme by writing
+to and reading from the ``min`` and ``max`` files under ``sz``,
+``nr_accesses``, and ``age`` directories, respectively.
+
+schemes/<N>/quotas/
+-------------------
+
+The optimal ``target access pattern`` for each ``action`` is workload
+dependent, so it is not easy to find. Worse yet, setting a scheme for some
+actions too aggressively can cause severe overhead. To avoid such overhead,
+users can limit time and
+size quota for each scheme. In detail, users can ask DAMON to try to use only
+up to specific time (``time quota``) for applying the action, and to apply the
+action to only up to specific amount (``size quota``) of memory regions having
+the target access pattern within a given time interval (``reset interval``).
+
+When the quota limit is expected to be exceeded, DAMON prioritizes found memory
+regions of the ``target access pattern`` based on their size, access frequency,
+and age. For personalized prioritization, users can set the weights for the
+three properties.
+
+Under ``quotas`` directory, three files (``ms``, ``bytes``,
+``reset_interval_ms``) and one directory (``weights``) having three files
+(``sz_permil``, ``nr_accesses_permil``, and ``age_permil``) in it exist.
+
+You can set the ``time quota`` in milliseconds, ``size quota`` in bytes, and
+``reset interval`` in milliseconds by writing the values to the three files,
+respectively. You can also set the prioritization weights for size, access
+frequency, and age in per-thousand unit by writing the values to the three
+files under the ``weights`` directory.
+
+schemes/<N>/watermarks/
+-----------------------
+
+To allow easy activation and deactivation of each scheme based on system
+status, DAMON provides a feature called watermarks. The feature receives five
+values called ``metric``, ``interval``, ``high``, ``mid``, and ``low``. The
+``metric`` is the system metric such as free memory ratio that can be measured.
+If the metric value of the system is higher than the value in ``high`` or lower
+than ``low`` at the moment, the scheme is deactivated. If the value is lower
+than ``mid``, the scheme is activated.
+
+Under the watermarks directory, five files (``metric``, ``interval_us``,
+``high``, ``mid``, and ``low``) for setting each value exist. You can set and
+get the five values by writing to and reading from the files, respectively.
+
+Keywords and meanings of those that can be written to the ``metric`` file are
+as below.
+
+ - none: Ignore the watermarks
+ - free_mem_rate: System's free memory rate (per thousand)
+
+The ``interval`` should be written in microseconds.
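
The activation rule described in this section can be restated as a small
function. This is only a sketch of the rule as worded above, not the kernel
implementation: ``watermark_next_state`` is a hypothetical name, and keeping
the current state when the metric is between ``mid`` and ``high`` is an
assumption, since the text does not spell out that case.

```shell
#!/bin/bash
# Sketch: decide a scheme's next state from a metric and its watermarks.
# Arguments: metric high mid low current_state ("active" or "inactive").
# watermark_next_state is a hypothetical name used for illustration.

watermark_next_state()
{
	local metric="$1"
	local high="$2"
	local mid="$3"
	local low="$4"
	local current="$5"

	if [ "$metric" -gt "$high" ] || [ "$metric" -lt "$low" ]
	then
		# Above high or below low: the scheme is deactivated.
		echo inactive
	elif [ "$metric" -lt "$mid" ]
	then
		# Below mid (and not below low): the scheme is activated.
		echo active
	else
		# Between mid and high: assumed to keep the current state.
		echo "$current"
	fi
}
```

With ``free_mem_rate`` watermarks of 600, 500 and 300 (per thousand), a
measured value of 450 activates the scheme, while 650 or 250 deactivates it.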
+
+.. _sysfs_schemes_stats:
+
+schemes/<N>/stats/
+------------------
+
+DAMON counts the total number and bytes of regions that each scheme tried to
+be applied to, the same two numbers for the regions that each scheme was
+successfully applied to, and the total number of times the quota limit was
+exceeded. These statistics can be used for online analysis or tuning of the
+schemes.
+
+The statistics can be retrieved by reading the files under ``stats`` directory
+(``nr_tried``, ``sz_tried``, ``nr_applied``, ``sz_applied``, and
+``qt_exceeds``), respectively. The files are not updated in real time, so you
+should ask DAMON sysfs interface to update the content of the files for the
+stats by writing a special keyword, ``update_schemes_stats`` to the relevant
+``kdamonds/<N>/state`` file.
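
The update-then-read sequence described above might look like the following
sketch. ``read_scheme_stats`` is a hypothetical helper name and the
``name: value`` output format is an arbitrary choice for this example.

```shell
#!/bin/bash
# Sketch: refresh and print the stats of one DAMON-based operation scheme.
# read_scheme_stats is a hypothetical helper, not a kernel-provided tool.

read_scheme_stats()
{
	local kdamond_dir="$1"
	local scheme_dir="$2"
	local f

	# The stats files are not updated in real time, so request an
	# update through the kdamond's state file first.
	echo update_schemes_stats > "$kdamond_dir/state" || return 1

	for f in nr_tried sz_tried nr_applied sz_applied qt_exceeds
	do
		echo "$f: $(cat "$scheme_dir/stats/$f")"
	done
}
```

It takes the kdamond directory and the scheme directory, for example
``kdamonds/0`` and ``kdamonds/0/contexts/0/schemes/0`` under the ``admin``
directory.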
+
+Example
+~~~~~~~
+
+The commands below apply a scheme saying "If a memory region of size in [4KiB,
+8KiB] is showing accesses per aggregate interval in [0, 5] for aggregate
+interval in [10, 20], page out the region. For the paging out, use only up to
+10ms per second, and also don't page out more than 1GiB per second. Under the
+limitation, page out memory regions having longer age first. Also, check the
+free memory rate of the system every 5 seconds, start the monitoring and paging
+out when the free memory rate becomes lower than 50%, but stop it if the free
+memory rate becomes larger than 60%, or lower than 30%". ::
+
+ # cd <sysfs>/kernel/mm/damon/admin
+ # # populate directories
+ # echo 1 > kdamonds/nr_kdamonds; echo 1 > kdamonds/0/contexts/nr_contexts;
+ # echo 1 > kdamonds/0/contexts/0/schemes/nr_schemes
+ # cd kdamonds/0/contexts/0/schemes/0
+ # # set the basic access pattern and the action
+ # echo 4096 > access_pattern/sz/min
+ # echo 8192 > access_pattern/sz/max
+ # echo 0 > access_pattern/nr_accesses/min
+ # echo 5 > access_pattern/nr_accesses/max
+ # echo 10 > access_pattern/age/min
+ # echo 20 > access_pattern/age/max
+ # echo pageout > action
+ # # set quotas
+ # echo 10 > quotas/ms
+ # echo $((1024*1024*1024)) > quotas/bytes
+ # echo 1000 > quotas/reset_interval_ms
+ # # set watermark
+ # echo free_mem_rate > watermarks/metric
+ # echo 5000000 > watermarks/interval_us
+ # echo 600 > watermarks/high
+ # echo 500 > watermarks/mid
+ # echo 300 > watermarks/low
+
+Please note that it's highly recommended to use user space tools like `damo
+<https://github.com/awslabs/damo>`_ rather than manually reading and writing
+the files as above. The above is only an example.
.. _debugfs_interface:
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 225/227] Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
@ 2022-03-22 21:49 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 18133 bytes --]
From: SeongJae Park <sj@kernel.org>
Subject: Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
This commit adds detailed usage of DAMON sysfs interface in the
admin-guide document for DAMON.
Link: https://lkml.kernel.org/r/20220228081314.5770-13-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/admin-guide/mm/damon/usage.rst | 350 ++++++++++++++++-
1 file changed, 344 insertions(+), 6 deletions(-)
--- a/Documentation/admin-guide/mm/damon/usage.rst~docs-admin-guide-mm-damon-usage-document-damon-sysfs-interface
+++ a/Documentation/admin-guide/mm/damon/usage.rst
@@ -4,7 +4,7 @@
Detailed Usages
===============
-DAMON provides below three interfaces for different users.
+DAMON provides the interfaces below for different users.
- *DAMON user space tool.*
`This <https://github.com/awslabs/damo>`_ is for privileged people such as
@@ -14,17 +14,21 @@ DAMON provides below three interfaces fo
virtual and physical address spaces monitoring. For more detail, please
refer to its `usage document
<https://github.com/awslabs/damo/blob/next/USAGE.md>`_.
-- *debugfs interface.*
- :ref:`This <debugfs_interface>` is for privileged user space programmers who
+- *sysfs interface.*
+ :ref:`This <sysfs_interface>` is for privileged user space programmers who
want more optimized use of DAMON. Using this, users can use DAMON’s major
- features by reading from and writing to special debugfs files. Therefore,
- you can write and use your personalized DAMON debugfs wrapper programs that
- reads/writes the debugfs files instead of you. The `DAMON user space tool
+ features by reading from and writing to special sysfs files. Therefore,
+ you can write and use your personalized DAMON sysfs wrapper programs that
+ reads/writes the sysfs files instead of you. The `DAMON user space tool
<https://github.com/awslabs/damo>`_ is one example of such programs. It
supports both virtual and physical address spaces monitoring. Note that this
interface provides only simple :ref:`statistics <damos_stats>` for the
monitoring results. For detailed monitoring results, DAMON provides a
:ref:`tracepoint <tracepoint>`.
+- *debugfs interface.*
+ :ref:`This <debugfs_interface>` is almost identical to :ref:`sysfs interface
+ <sysfs_interface>`. This will be removed after next LTS kernel is released,
+ so users should move to the :ref:`sysfs interface <sysfs_interface>`.
- *Kernel Space Programming Interface.*
:doc:`This </vm/damon/api>` is for kernel space programmers. Using this,
users can utilize every feature of DAMON most flexibly and efficiently by
@@ -32,6 +36,340 @@ DAMON provides below three interfaces fo
DAMON for various address spaces. For detail, please refer to the interface
:doc:`document </vm/damon/api>`.
+.. _sysfs_interface:
+
+sysfs Interface
+===============
+
+DAMON sysfs interface is built when ``CONFIG_DAMON_SYSFS`` is defined. It
+creates multiple directories and files under its sysfs directory,
+``<sysfs>/kernel/mm/damon/``. You can control DAMON by writing to and reading
+from the files under the directory.
+
+For a short example, users can monitor the virtual address space of a given
+workload as below. ::
+
+ # cd /sys/kernel/mm/damon/admin/
+ # echo 1 > kdamonds/nr_kdamonds && echo 1 > kdamonds/0/contexts/nr_contexts
+ # echo vaddr > kdamonds/0/contexts/0/operations
+ # echo 1 > kdamonds/0/contexts/0/targets/nr_targets
+ # echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid_target
+ # echo on > kdamonds/0/state
+
+Files Hierarchy
+---------------
+
+The file hierarchy of the DAMON sysfs interface is shown below. In the
+figure, parent-child relations are represented with indentation, each
+directory has a ``/`` suffix, and files in each directory are separated by
+commas (","). ::
+
+ /sys/kernel/mm/damon/admin
+ │ kdamonds/nr_kdamonds
+ │ │ 0/state,pid
+ │ │ │ contexts/nr_contexts
+ │ │ │ │ 0/operations
+ │ │ │ │ │ monitoring_attrs/
+ │ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
+ │ │ │ │ │ │ nr_regions/min,max
+ │ │ │ │ │ targets/nr_targets
+ │ │ │ │ │ │ 0/pid_target
+ │ │ │ │ │ │ │ regions/nr_regions
+ │ │ │ │ │ │ │ │ 0/start,end
+ │ │ │ │ │ │ │ │ ...
+ │ │ │ │ │ │ ...
+ │ │ │ │ │ schemes/nr_schemes
+ │ │ │ │ │ │ 0/action
+ │ │ │ │ │ │ │ access_pattern/
+ │ │ │ │ │ │ │ │ sz/min,max
+ │ │ │ │ │ │ │ │ nr_accesses/min,max
+ │ │ │ │ │ │ │ │ age/min,max
+ │ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms
+ │ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
+ │ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low
+ │ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
+ │ │ │ │ │ │ ...
+ │ │ │ │ ...
+ │ │ ...
+
+Root
+----
+
+The root of the DAMON sysfs interface is ``<sysfs>/kernel/mm/damon/``, and it
+has one directory named ``admin``. The directory contains the files for
+privileged user space programs' control of DAMON. User space tools or daemons
+with root permission can use this directory.
+
+kdamonds/
+---------
+
+The monitoring-related information, including request specifications and
+results, is called a DAMON context. DAMON executes each context with a kernel
+thread called kdamond, and multiple kdamonds can run in parallel.
+
+Under the ``admin`` directory there is one directory, ``kdamonds``, which has
+files for controlling the kdamonds. In the beginning, this directory has only
+one file, ``nr_kdamonds``. Writing a number (``N``) to the file creates ``N``
+child directories named ``0`` to ``N-1``. Each directory represents one
+kdamond.
+
+kdamonds/<N>/
+-------------
+
+In each kdamond directory, two files (``state`` and ``pid``) and one directory
+(``contexts``) exist.
+
+Reading ``state`` returns ``on`` if the kdamond is currently running, or
+``off`` if it is not running. Writing ``on`` or ``off`` puts the kdamond in
+that state. Writing ``update_schemes_stats`` to the ``state`` file updates the
+contents of stats files for each DAMON-based operation scheme of the kdamond.
+For details of the stats, please refer to :ref:`stats section
+<sysfs_schemes_stats>`.
+
+If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread.
+
+``contexts`` directory contains files for controlling the monitoring contexts
+that this kdamond will execute.
+
+kdamonds/<N>/contexts/
+----------------------
+
+In the beginning, this directory has only one file, ``nr_contexts``. Writing a
+number (``N``) to the file creates the number of child directories named as
+``0`` to ``N-1``. Each directory represents each monitoring context. At the
+moment, only one context per kdamond is supported, so only ``0`` or ``1`` can
+be written to the file.
+
+contexts/<N>/
+-------------
+
+In each context directory, one file (``operations``) and three directories
+(``monitoring_attrs``, ``targets``, and ``schemes``) exist.
+
+DAMON supports multiple types of monitoring operations, including those for
+virtual address space and the physical address space. You can set and get what
+type of monitoring operations DAMON will use for the context by writing one of
+below keywords to, and reading from the file.
+
+ - vaddr: Monitor virtual address spaces of specific processes
+ - paddr: Monitor the physical address space of the system
+
+contexts/<N>/monitoring_attrs/
+------------------------------
+
+Files for specifying attributes of the monitoring, including the required
+quality and efficiency of the monitoring, are in the ``monitoring_attrs``
+directory. Specifically, two directories, ``intervals`` and ``nr_regions``,
+exist in this directory.
+
+Under the ``intervals`` directory, three files for DAMON's sampling interval
+(``sample_us``), aggregation interval (``aggr_us``), and update interval
+(``update_us``) exist. You can set and get the values in microseconds by
+writing to and reading from the files.
+
+Under the ``nr_regions`` directory, two files for the lower and upper bounds
+on the number of DAMON's monitoring regions (``min`` and ``max``,
+respectively), which control the monitoring overhead, exist. You can set and
+get the values by writing to and reading from the files.
+
+For more details about the intervals and monitoring regions range, please refer
+to the Design document (:doc:`/vm/damon/design`).
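+
+As a sketch, the commands below set the three intervals to 5 ms, 100 ms, and 1
+second, and bound the number of regions between 10 and 1000. The values are
+only illustrative assumptions, not recommendations, and a kdamond ``0`` with a
+context ``0`` is assumed to exist already. ::
+
+ # cd <sysfs>/kernel/mm/damon/admin/kdamonds/0/contexts/0/monitoring_attrs
+ # echo 5000 > intervals/sample_us
+ # echo 100000 > intervals/aggr_us
+ # echo 1000000 > intervals/update_us
+ # echo 10 > nr_regions/min
+ # echo 1000 > nr_regions/max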
+
+contexts/<N>/targets/
+---------------------
+
+In the beginning, this directory has only one file, ``nr_targets``. Writing a
+number (``N``) to the file creates ``N`` child directories named ``0`` to
+``N-1``. Each directory represents one monitoring target.
+
+targets/<N>/
+------------
+
+In each target directory, one file (``pid_target``) and one directory
+(``regions``) exist.
+
+If you wrote ``vaddr`` to the ``contexts/<N>/operations`` file, each target
+should be a process. You can specify the process to DAMON by writing the pid of
+the process to the ``pid_target`` file.
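+
+For example, assuming a kdamond ``0`` with a context ``0`` has already been
+created, and ``<pid>`` is a placeholder for the pid of the process to monitor,
+a single virtual address space target could be set up as below. ::
+
+ # cd <sysfs>/kernel/mm/damon/admin/kdamonds/0/contexts/0
+ # echo vaddr > operations
+ # echo 1 > targets/nr_targets
+ # echo <pid> > targets/0/pid_target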
+
+targets/<N>/regions/
+--------------------
+
+When the ``vaddr`` monitoring operations set is being used (``vaddr`` is
+written to the ``contexts/<N>/operations`` file), DAMON automatically sets and
+updates the monitoring target regions so that the entire memory mappings of the
+target processes are covered. However, users may want to set the initial
+monitoring regions to specific address ranges.
+
+In contrast, DAMON does not automatically set and update the monitoring target
+regions when the ``paddr`` monitoring operations set is being used (``paddr``
+is written to the ``contexts/<N>/operations`` file). Therefore, users should
+set the monitoring target regions by themselves in that case.
+
+For such cases, users can explicitly set the initial monitoring target regions
+as they want, by writing proper values to the files under this directory.
+
+In the beginning, this directory has only one file, ``nr_regions``. Writing a
+number (``N``) to the file creates ``N`` child directories named ``0`` to
+``N-1``. Each directory represents one initial monitoring target region.
+
+regions/<N>/
+------------
+
+In each region directory, you will find two files (``start`` and ``end``). You
+can set and get the start and end addresses of the initial monitoring target
+region by writing to and reading from the files, respectively.
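+
+For example, a sketch of monitoring one physical address range could look like
+below. The range from 4GiB to 6GiB is a made-up assumption for illustration;
+real ranges depend on the memory layout of the system (for example, the
+``System RAM`` entries in ``/proc/iomem``). ::
+
+ # cd <sysfs>/kernel/mm/damon/admin/kdamonds/0/contexts/0
+ # echo paddr > operations
+ # echo 1 > targets/nr_targets
+ # echo 1 > targets/0/regions/nr_regions
+ # echo $((4*1024*1024*1024)) > targets/0/regions/0/start
+ # echo $((6*1024*1024*1024)) > targets/0/regions/0/end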
+
+contexts/<N>/schemes/
+---------------------
+
+For usual DAMON-based data access aware memory management optimizations, users
+would normally want the system to apply a memory management action to a memory
+region of a specific access pattern. DAMON receives such formalized operation
+schemes from the user and applies those to the target memory regions. Users
+can get and set the schemes by reading from and writing to files under this
+directory.
+
+In the beginning, this directory has only one file, ``nr_schemes``. Writing a
+number (``N``) to the file creates ``N`` child directories named ``0`` to
+``N-1``. Each directory represents one DAMON-based operation scheme.
+
+schemes/<N>/
+------------
+
+In each scheme directory, four directories (``access_pattern``, ``quotas``,
+``watermarks``, and ``stats``) and one file (``action``) exist.
+
+The ``action`` file is for setting and getting the action you want to apply to
+memory regions that show the access pattern of interest. The keywords that
+can be written to and read from the file, and their meanings, are as below.
+
+ - ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``
+ - ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``
+ - ``pageout``: Call ``madvise()`` for the region with ``MADV_PAGEOUT``
+ - ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``
+ - ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``
+ - ``stat``: Do nothing but count the statistics
+
+schemes/<N>/access_pattern/
+---------------------------
+
+The target access pattern of each DAMON-based operation scheme consists of
+three ranges: the size of the region in bytes, the number of monitored
+accesses per aggregation interval, and the age of the region in number of
+aggregation intervals.
+
+Under the ``access_pattern`` directory, three directories (``sz``,
+``nr_accesses``, and ``age``), each having two files (``min`` and ``max``),
+exist. You can set and get the access pattern for the given scheme by writing
+to and reading from the ``min`` and ``max`` files under the ``sz``,
+``nr_accesses``, and ``age`` directories, respectively.
+
+schemes/<N>/quotas/
+-------------------
+
+The optimal ``target access pattern`` for each ``action`` is workload dependent,
+so it is not easy to find. Worse yet, setting a scheme too aggressively can
+cause severe overhead. To avoid such overhead, users can limit the time and
+size quota for each scheme. In detail, users can ask DAMON to use only up to a
+specific time (``time quota``) for applying the action, and to apply the action
+to only up to a specific amount (``size quota``) of memory regions having the
+target access pattern within a given time interval (``reset interval``).
+
+When the quota limit is expected to be exceeded, DAMON prioritizes found memory
+regions of the ``target access pattern`` based on their size, access frequency,
+and age. For personalized prioritization, users can set the weights for the
+three properties.
+
+Under the ``quotas`` directory, three files (``ms``, ``bytes``, and
+``reset_interval_ms``) and one directory (``weights``) containing three files
+(``sz_permil``, ``nr_accesses_permil``, and ``age_permil``) exist.
+
+You can set the ``time quota`` in milliseconds, ``size quota`` in bytes, and
+``reset interval`` in milliseconds by writing the values to the three files,
+respectively. You can also set the prioritization weights for size, access
+frequency, and age in per-thousand units by writing the values to the three
+files under the ``weights`` directory.
+
+schemes/<N>/watermarks/
+-----------------------
+
+To allow easy activation and deactivation of each scheme based on system
+status, DAMON provides a feature called watermarks. The feature receives five
+values called ``metric``, ``interval``, ``high``, ``mid``, and ``low``. The
+``metric`` is a measurable system metric such as the free memory ratio. If the
+metric value of the system is higher than the value in ``high`` or lower than
+``low`` at the moment, the scheme is deactivated. If the value is lower than
+``mid``, the scheme is activated.
+
+Under the ``watermarks`` directory, five files (``metric``, ``interval_us``,
+``high``, ``mid``, and ``low``) for setting each value exist. You can set and
+get the five values by writing to and reading from the files, respectively.
+
+The keywords that can be written to the ``metric`` file, and their meanings,
+are as below.
+
+ - none: Ignore the watermarks
+ - free_mem_rate: System's free memory rate (per thousand)
+
+The ``interval`` should be written in microseconds.
+
+.. _sysfs_schemes_stats:
+
+schemes/<N>/stats/
+------------------
+
+DAMON counts the number and total bytes of the regions that each scheme tried
+to apply its action to, the same two numbers for the regions it successfully
+applied the action to, and the number of times the quota limits were exceeded.
+These statistics can be used for online analysis or tuning of the schemes.
+
+The statistics can be retrieved by reading the files under the ``stats``
+directory (``nr_tried``, ``sz_tried``, ``nr_applied``, ``sz_applied``, and
+``qt_exceeds``), respectively. The files are not updated in real time, so you
+should ask the DAMON sysfs interface to update the content of the files for the
+stats by writing a special keyword, ``update_schemes_stats``, to the relevant
+``kdamonds/<N>/state`` file.
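+
+As a sketch, assuming kdamond ``0`` is running with one scheme in its context
+``0``, the stats of the scheme could be refreshed and then read as below. ::
+
+ # cd <sysfs>/kernel/mm/damon/admin/kdamonds/0
+ # echo update_schemes_stats > state
+ # cat contexts/0/schemes/0/stats/nr_tried
+ # cat contexts/0/schemes/0/stats/sz_tried
+ # cat contexts/0/schemes/0/stats/nr_applied
+ # cat contexts/0/schemes/0/stats/sz_applied
+ # cat contexts/0/schemes/0/stats/qt_exceeds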
+
+Example
+~~~~~~~
+
+The commands below apply a scheme saying "If a memory region of size in [4KiB,
+8KiB] is showing accesses per aggregate interval in [0, 5] for aggregate
+interval in [10, 20], page out the region. For the paging out, use only up to
+10ms per second, and also don't page out more than 1GiB per second. Under the
+limitation, page out memory regions having longer age first. Also, check the
+free memory rate of the system every 5 seconds, start the monitoring and paging
+out when the free memory rate becomes lower than 50%, but stop it if the free
+memory rate becomes larger than 60%, or lower than 30%". ::
+
+ # cd <sysfs>/kernel/mm/damon/admin
+ # # populate directories
+ # echo 1 > kdamonds/nr_kdamonds; echo 1 > kdamonds/0/contexts/nr_contexts
+ # echo 1 > kdamonds/0/contexts/0/schemes/nr_schemes
+ # cd kdamonds/0/contexts/0/schemes/0
+ # # set the basic access pattern and the action
+ # echo 4096 > access_pattern/sz/min
+ # echo 8192 > access_pattern/sz/max
+ # echo 0 > access_pattern/nr_accesses/min
+ # echo 5 > access_pattern/nr_accesses/max
+ # echo 10 > access_pattern/age/min
+ # echo 20 > access_pattern/age/max
+ # echo pageout > action
+ # # set quotas
+ # echo 10 > quotas/ms
+ # echo $((1024*1024*1024)) > quotas/bytes
+ # echo 1000 > quotas/reset_interval_ms
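+ # # prioritize by age under the quotas; these weights are illustrative
+ # # assumptions, not values taken from the description above
+ # echo 0 > quotas/weights/sz_permil
+ # echo 0 > quotas/weights/nr_accesses_permil
+ # echo 1000 > quotas/weights/age_permil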
+ # # set watermark
+ # echo free_mem_rate > watermarks/metric
+ # echo 5000000 > watermarks/interval_us
+ # echo 600 > watermarks/high
+ # echo 500 > watermarks/mid
+ # echo 300 > watermarks/low
+
+Please note that it's highly recommended to use user space tools like `damo
+<https://github.com/awslabs/damo>`_ rather than manually reading and writing
+the files as above. The above is only an example.
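+
+Note that the commands above only configure the scheme. As a sketch, assuming
+the monitoring operations set and the targets of the context have also been
+set, the kdamond could then be started and checked as below. ::
+
+ # cd <sysfs>/kernel/mm/damon/admin/kdamonds/0
+ # echo on > state
+ # cat state
+ # cat pid
+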
.. _debugfs_interface:
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 226/227] Docs/ABI/testing: add DAMON sysfs interface ABI document
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: Docs/ABI/testing: add DAMON sysfs interface ABI document
This commit adds DAMON sysfs interface ABI document under
Documentation/ABI/testing.
Link: https://lkml.kernel.org/r/20220228081314.5770-14-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/ABI/testing/sysfs-kernel-mm-damon | 274 ++++++++++++++
MAINTAINERS | 1
2 files changed, 275 insertions(+)
--- /dev/null
+++ a/Documentation/ABI/testing/sysfs-kernel-mm-damon
@@ -0,0 +1,274 @@
+What: /sys/kernel/mm/damon/
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Interface for Data Access MONitoring (DAMON). Contains files
+ for controlling DAMON. For more details on DAMON itself,
+ please refer to Documentation/admin-guide/mm/damon/index.rst.
+
+What: /sys/kernel/mm/damon/admin/
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Interface for privileged users of DAMON. Contains files for
+ controlling DAMON that are aimed to be used by privileged users.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/nr_kdamonds
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a number 'N' to this file creates 'N' directories for
+ controlling each DAMON worker thread (kdamond), named '0' to
+ 'N-1', under the kdamonds/ directory.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/state
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing 'on' or 'off' to this file makes the kdamond start or
+ stop, respectively. Reading the file returns the keyword
+ for the current status. Writing 'update_schemes_stats' to
+ the file updates contents of schemes stats files of the
+ kdamond.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/pid
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Reading this file returns the pid of the kdamond if it is
+ running.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/nr_contexts
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a number 'N' to this file creates 'N' directories for
+ controlling each DAMON context, named '0' to 'N-1', under the
+ contexts/ directory.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/operations
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a keyword for a monitoring operations set ('vaddr' for
+ virtual address spaces monitoring, and 'paddr' for the physical
+ address space monitoring) to this file makes the context use
+ the operations set. Reading the file returns the keyword for
+ the operations set the context is set to use.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/monitoring_attrs/intervals/sample_us
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a value to this file sets the sampling interval of the
+ DAMON context in microseconds as the value. Reading this file
+ returns the value.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/monitoring_attrs/intervals/aggr_us
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a value to this file sets the aggregation interval of
+ the DAMON context in microseconds as the value. Reading this
+ file returns the value.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/monitoring_attrs/intervals/update_us
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a value to this file sets the update interval of the
+ DAMON context in microseconds as the value. Reading this file
+ returns the value.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/monitoring_attrs/nr_regions/min
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a value to this file sets the minimum number of
+ monitoring regions of the DAMON context as the value. Reading
+ this file returns the value.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/monitoring_attrs/nr_regions/max
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a value to this file sets the maximum number of
+ monitoring regions of the DAMON context as the value. Reading
+ this file returns the value.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/nr_targets
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a number 'N' to this file creates 'N' directories for
+ controlling each DAMON target of the context, named '0' to
+ 'N-1', under the targets/ directory.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/pid_target
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the pid of
+ the target process if the context is for virtual address spaces
+ monitoring, respectively.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/regions/nr_regions
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a number 'N' to this file creates 'N' directories for
+ setting each DAMON target memory region of the context, named
+ '0' to 'N-1', under the regions/ directory. In
+ case of the virtual address space monitoring, DAMON
+ automatically sets the target memory region based on the target
+ processes' mappings.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/regions/<R>/start
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the start
+ address of the monitoring region.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/regions/<R>/end
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the end
+ address of the monitoring region.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/nr_schemes
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a number 'N' to this file creates 'N' directories for
+ controlling each DAMON-based operation scheme of the context,
+ named '0' to 'N-1', under the schemes/ directory.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/action
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the action
+ of the scheme.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/sz/min
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the minimum
+ size of the scheme's target regions in bytes.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/sz/max
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the maximum
+ size of the scheme's target regions in bytes.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/nr_accesses/min
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the minimum
+ 'nr_accesses' of the scheme's target regions.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/nr_accesses/max
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the maximum
+ 'nr_accesses' of the scheme's target regions.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/age/min
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the minimum
+ 'age' of the scheme's target regions.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/age/max
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the maximum
+ 'age' of the scheme's target regions.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/ms
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the time
+ quota of the scheme in milliseconds.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/bytes
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the size
+ quota of the scheme in bytes.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/reset_interval_ms
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the quotas
+ charge reset interval of the scheme in milliseconds.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/weights/sz_permil
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the
+ under-quota limit regions prioritization weight for 'size' in
+ permil.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/weights/nr_accesses_permil
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the
+ under-quota limit regions prioritization weight for
+ 'nr_accesses' in permil.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/weights/age_permil
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the
+ under-quota limit regions prioritization weight for 'age' in
+ permil.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/watermarks/metric
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the metric
+ of the watermarks for the scheme. The writable/readable
+ keywords for this file are 'none' for disabling the watermarks
+ feature, or 'free_mem_rate' for the system's global free memory
+ rate in permil.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/watermarks/interval_us
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the metric
+ check interval of the watermarks for the scheme in
+ microseconds.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/watermarks/high
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the high
+ watermark of the scheme in permil.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/watermarks/mid
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the mid
+ watermark of the scheme in permil.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/watermarks/low
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the low
+ watermark of the scheme in permil.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/nr_tried
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Reading this file returns the number of regions that the action
+ of the scheme has been tried to be applied to.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/sz_tried
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Reading this file returns the total size of regions that the
+ action of the scheme has been tried to be applied to, in bytes.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/nr_applied
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Reading this file returns the number of regions that the action
+ of the scheme has been successfully applied to.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/sz_applied
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Reading this file returns the total size of regions that the
+ action of the scheme has been successfully applied to, in bytes.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/qt_exceeds
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Reading this file returns the number of times the scheme's
+ quotas have been exceeded.
--- a/MAINTAINERS~docs-abi-testing-add-damon-sysfs-interface-abi-document
+++ a/MAINTAINERS
@@ -5317,6 +5317,7 @@ DATA ACCESS MONITOR
M: SeongJae Park <sj@kernel.org>
L: linux-mm@kvack.org
S: Maintained
+F: Documentation/ABI/testing/sysfs-kernel-mm-damon
F: Documentation/admin-guide/mm/damon/
F: Documentation/vm/damon/
F: include/linux/damon.h
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* [patch 226/227] Docs/ABI/testing: add DAMON sysfs interface ABI document
@ 2022-03-22 21:49 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:49 UTC (permalink / raw)
To: xhao, skhan, rientjes, gregkh, corbet, sj, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: SeongJae Park <sj@kernel.org>
Subject: Docs/ABI/testing: add DAMON sysfs interface ABI document
This commit adds DAMON sysfs interface ABI document under
Documentation/ABI/testing.
Link: https://lkml.kernel.org/r/20220228081314.5770-14-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/ABI/testing/sysfs-kernel-mm-damon | 274 ++++++++++++++
MAINTAINERS | 1
2 files changed, 275 insertions(+)
--- /dev/null
+++ a/Documentation/ABI/testing/sysfs-kernel-mm-damon
@@ -0,0 +1,274 @@
+What: /sys/kernel/mm/damon/
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Interface for Data Access MONitoring (DAMON). Contains files
+ for controlling DAMON. For more details on DAMON itself,
+ please refer to Documentation/admin-guide/mm/damon/index.rst.
+
+What: /sys/kernel/mm/damon/admin/
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Interface for privileged users of DAMON. Contains files for
+ controlling DAMON that are aimed to be used by privileged users.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/nr_kdamonds
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a number 'N' to this file creates 'N' directories for
+ controlling each DAMON worker thread (kdamond), named '0' to
+ 'N-1', under the kdamonds/ directory.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/state
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing 'on' or 'off' to this file makes the kdamond start or
+ stop, respectively. Reading the file returns the keyword
+ for the current status. Writing 'update_schemes_stats' to
+ the file updates contents of schemes stats files of the
+ kdamond.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/pid
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Reading this file returns the pid of the kdamond if it is
+ running.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/nr_contexts
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a number 'N' to this file creates 'N' directories for
+ controlling each DAMON context, named '0' to 'N-1', under the
+ contexts/ directory.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/operations
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a keyword for a monitoring operations set ('vaddr' for
+ virtual address spaces monitoring, and 'paddr' for the physical
+ address space monitoring) to this file makes the context use
+ the operations set. Reading the file returns the keyword for
+ the operations set the context is set to use.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/monitoring_attrs/intervals/sample_us
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a value to this file sets the sampling interval of the
+ DAMON context in microseconds as the value. Reading this file
+ returns the value.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/monitoring_attrs/intervals/aggr_us
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a value to this file sets the aggregation interval of
+ the DAMON context in microseconds as the value. Reading this
+ file returns the value.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/monitoring_attrs/intervals/update_us
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a value to this file sets the update interval of the
+ DAMON context in microseconds as the value. Reading this file
+ returns the value.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/monitoring_attrs/nr_regions/min
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a value to this file sets the minimum number of
+ monitoring regions of the DAMON context as the value. Reading
+ this file returns the value.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/monitoring_attrs/nr_regions/max
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a value to this file sets the maximum number of
+ monitoring regions of the DAMON context as the value. Reading
+ this file returns the value.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/nr_targets
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a number 'N' to this file creates 'N' directories for
+ controlling each DAMON target of the context, named '0' to
+ 'N-1', under the targets/ directory.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/pid_target
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the pid of
+ the target process if the context is for virtual address spaces
+ monitoring, respectively.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/regions/nr_regions
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a number 'N' to this file creates 'N' directories for
+ setting each DAMON target memory region of the context, named
+ '0' to 'N-1', under the regions/ directory. In
+ case of the virtual address space monitoring, DAMON
+ automatically sets the target memory region based on the target
+ processes' mappings.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/regions/<R>/start
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the start
+ address of the monitoring region.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/regions/<R>/end
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the end
+ address of the monitoring region.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/nr_schemes
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing a number 'N' to this file creates 'N' directories for
+ controlling each DAMON-based operation scheme of the context,
+ named '0' to 'N-1', under the schemes/ directory.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/action
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the action
+ of the scheme.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/sz/min
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the minimum
+ size of the scheme's target regions in bytes.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/sz/max
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the maximum
+ size of the scheme's target regions in bytes.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/nr_accesses/min
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the minimum
+ 'nr_accesses' of the scheme's target regions.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/nr_accesses/max
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the maximum
+ 'nr_accesses' of the scheme's target regions.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/age/min
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the minimum
+ 'age' of the scheme's target regions.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/age/max
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the maximum
+ 'age' of the scheme's target regions.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/ms
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the time
+ quota of the scheme in milliseconds.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/bytes
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the size
+ quota of the scheme in bytes.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/reset_interval_ms
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the quotas
+ charge reset interval of the scheme in milliseconds.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/weights/sz_permil
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the
+ under-quota limit regions prioritization weight for 'size' in
+ permil.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/weights/nr_accesses_permil
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the
+ under-quota limit regions prioritization weight for
+ 'nr_accesses' in permil.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/weights/age_permil
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the
+ under-quota limit regions prioritization weight for 'age' in
+ permil.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/watermarks/metric
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the metric
+ of the watermarks for the scheme. The writable/readable
+ keywords for this file are 'none' for disabling the watermarks
+ feature, or 'free_mem_rate' for the system's global free memory
+ rate in permil.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/watermarks/interval_us
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the metric
+ check interval of the watermarks for the scheme in
+ microseconds.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/watermarks/high
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the high
+ watermark of the scheme in permil.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/watermarks/mid
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the mid
+ watermark of the scheme in permil.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/watermarks/low
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the low
+ watermark of the scheme in permil.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/nr_tried
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Reading this file returns the number of regions to which the
+ scheme has tried to apply its action.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/sz_tried
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Reading this file returns the total size, in bytes, of the
+ regions to which the scheme has tried to apply its action.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/nr_applied
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Reading this file returns the number of regions to which the
+ action of the scheme has been successfully applied.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/sz_applied
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Reading this file returns the total size, in bytes, of the
+ regions to which the action of the scheme has been successfully
+ applied.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/qt_exceeds
+Date: Mar 2022
+Contact: SeongJae Park <sj@kernel.org>
+Description: Reading this file returns the number of times the scheme's
+ quotas have been exceeded.
--- a/MAINTAINERS~docs-abi-testing-add-damon-sysfs-interface-abi-document
+++ a/MAINTAINERS
@@ -5317,6 +5317,7 @@ DATA ACCESS MONITOR
M: SeongJae Park <sj@kernel.org>
L: linux-mm@kvack.org
S: Maintained
+F: Documentation/ABI/testing/sysfs-kernel-mm-damon
F: Documentation/admin-guide/mm/damon/
F: Documentation/vm/damon/
F: include/linux/damon.h
_
^ permalink raw reply [flat|nested] 786+ messages in thread
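For illustration, the directory layout described in the ABI document above can be exercised from userspace roughly as follows. This is a minimal sketch, not part of the patch: damon_scheme_path() and sysfs_write() are hypothetical helpers invented here, and only the path layout under /sys/kernel/mm/damon/admin/kdamonds is taken from the document.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Root of the DAMON sysfs interface, as named in the ABI document. */
#define DAMON_SYSFS_KDAMONDS "/sys/kernel/mm/damon/admin/kdamonds"

/*
 * Hypothetical helper (not a kernel or library API): format the path of a
 * per-scheme file such as "action" or "quotas/ms" for kdamond <K>,
 * context <C> and scheme <S>.  Returns 0 on success, -1 if buf is too small.
 */
static int damon_scheme_path(char *buf, size_t len, unsigned int kdamond,
			     unsigned int ctx, unsigned int scheme,
			     const char *leaf)
{
	int n;

	assert(buf && leaf);
	n = snprintf(buf, len, "%s/%u/contexts/%u/schemes/%u/%s",
		     DAMON_SYSFS_KDAMONDS, kdamond, ctx, scheme, leaf);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write a string value to a sysfs file.  Returns 0 on
 * success and -1 on any error (for example, when the file does not exist).
 */
static int sysfs_write(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return -1;
	if (fputs(val, f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```

Per the nr_schemes description, writing "2" to .../contexts/<C>/schemes/nr_schemes should create scheme directories 0 and 1. A call such as damon_scheme_path(buf, sizeof(buf), 0, 0, 1, "quotas/ms") then names the time-quota file of the second scheme, which could be passed to sysfs_write() on a kernel that provides this interface.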
* [patch 227/227] mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()
2022-03-22 21:38 incoming Andrew Morton
@ 2022-03-22 21:50 ` Andrew Morton
2022-03-22 21:38 ` Andrew Morton
` (225 subsequent siblings)
226 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-22 21:50 UTC (permalink / raw)
To: sj, xhao, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Xin Hao <xhao@linux.alibaba.com>
Subject: mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()
In damon_sysfs_kdamond_release(), we have already used container_of() to
get the "kdamond" pointer, so there is no need to get it again.
Link: https://lkml.kernel.org/r/20220303075314.22502-1-xhao@linux.alibaba.com
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/damon/sysfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-remove-repeat-container_of-in-damon_sysfs_kdamond_release
+++ a/mm/damon/sysfs.c
@@ -2345,7 +2345,7 @@ static void damon_sysfs_kdamond_release(
if (kdamond->damon_ctx)
damon_destroy_ctx(kdamond->damon_ctx);
- kfree(container_of(kobj, struct damon_sysfs_kdamond, kobj));
+ kfree(kdamond);
}
static struct kobj_attribute damon_sysfs_kdamond_state_attr =
_
^ permalink raw reply [flat|nested] 786+ messages in thread
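The change above can be illustrated outside the kernel. The following is a userspace sketch only: the container_of() macro is a simplified stand-in for the kernel's, and struct kobject, struct sketch_kdamond and sketch_kdamond_release() are hypothetical, cut-down stand-ins whose layout is an assumption made for the example, not the real definitions in mm/damon/sysfs.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct kobject {
	int refcount;
};

struct sketch_kdamond {
	int *damon_ctx;		/* placeholder for struct damon_ctx * */
	struct kobject kobj;	/* embedded member; offset is non-zero here */
};

static int nr_freed;

/* Release callback shaped like the patched function: one container_of(). */
static void sketch_kdamond_release(struct kobject *kobj)
{
	struct sketch_kdamond *kdamond =
		container_of(kobj, struct sketch_kdamond, kobj);

	if (kdamond->damon_ctx)
		free(kdamond->damon_ctx);
	free(kdamond);		/* reuse the pointer computed above */
	nr_freed++;
}
```

Because container_of() is pure pointer arithmetic on the embedded member, evaluating it a second time yields the same address; reusing the local variable only removes the repetition and does not change behavior.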
* Re: [patch 003/227] ntfs: add sanity check on allocation size
2022-03-22 21:38 ` Andrew Morton
(?)
@ 2022-03-22 22:13 ` Linus Torvalds
-1 siblings, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2022-03-22 22:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Anton Altaparmakov, mudongliangabcd, patches, Linux-MM, mm-commits
On Tue, Mar 22, 2022 at 2:38 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> From: Dongliang Mu <mudongliangabcd@gmail.com>
> Subject: ntfs: add sanity check on allocation size
>
> ntfs_read_inode_mount invokes ntfs_malloc_nofs with zero allocation size.
> It triggers one BUG in the __ntfs_malloc function.
Hmm. A more serious issue seems to be that cast to (u32).
ntfs_attr_size(a) returns a 's64', so it just randomly truncates a
possibly bad value..
Linus
^ permalink raw reply [flat|nested] 786+ messages in thread
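The truncation concern raised above can be shown with a small standalone sketch. It is not the ntfs code: truncate_to_u32() and size_fits_u32() are hypothetical helpers, and the only assumption carried over from the thread is that a signed 64-bit size is narrowed to 32 bits by a cast.

```c
#include <assert.h>
#include <stdint.h>

/* What a bare cast does: keep the low 32 bits, silently drop the rest. */
static uint32_t truncate_to_u32(int64_t size)
{
	return (uint32_t)size;
}

/*
 * Hypothetical check: accept a signed 64-bit size only if it is positive
 * and fits in 32 bits, testing the full value before narrowing it.
 * Returns 1 and stores the value in *out on success, 0 otherwise.
 */
static int size_fits_u32(int64_t size, uint32_t *out)
{
	assert(out);
	if (size <= 0 || size > (int64_t)UINT32_MAX)
		return 0;
	*out = (uint32_t)size;
	return 1;
}
```

With the bare cast, a bad value such as 0x100000000 becomes 0 and 0x100000010 becomes 16, so a check performed after the cast sees a plausible-looking number. Checking the 64-bit value first, as size_fits_u32() does, rejects both.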
* Re: [patch 163/227] mm: madvise: skip unmapped vma holes passed to process_madvise
2022-03-22 21:46 ` Andrew Morton
(?)
@ 2022-03-23 0:24 ` Minchan Kim
2022-03-23 2:08 ` Linus Torvalds
2022-03-23 8:28 ` Michal Hocko
-1 siblings, 2 replies; 786+ messages in thread
From: Minchan Kim @ 2022-03-23 0:24 UTC (permalink / raw)
To: Andrew Morton
Cc: vbabka, surenb, stable, sfr, rientjes, nadav.amit, mhocko,
quic_charante, patches, linux-mm, mm-commits, torvalds
On Tue, Mar 22, 2022 at 02:46:48PM -0700, Andrew Morton wrote:
> From: Charan Teja Kalla <quic_charante@quicinc.com>
> Subject: mm: madvise: skip unmapped vma holes passed to process_madvise
>
> The process_madvise() system call is expected to skip holes in vma passed
> through 'struct iovec' vector list. But do_madvise, which
> process_madvise() calls for each vma, returns ENOMEM in case of unmapped
> holes, despite the VMA is processed.
>
> Thus process_madvise() should treat ENOMEM as expected and consider the
> VMA passed to as processed and continue processing other vma's in the
> vector list. Returning -ENOMEM to user, despite the VMA is processed,
> will be unable to figure out where to start the next madvise.
>
> Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.1647008754.git.quic_charante@quicinc.com
I thought it was still under discussion and Charan would post the next
version along with the previous patch
"mm: madvise: return correct bytes advised with process_madvise"
https://lore.kernel.org/linux-mm/7207b2f5-6b3e-aea4-aa1b-9c6d849abe34@quicinc.com/
> Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
> Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Nadav Amit <nadav.amit@gmail.com>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
> mm/madvise.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> --- a/mm/madvise.c~mm-madvise-skip-unmapped-vma-holes-passed-to-process_madvise
> +++ a/mm/madvise.c
> @@ -1428,9 +1428,16 @@ SYSCALL_DEFINE5(process_madvise, int, pi
>
> while (iov_iter_count(&iter)) {
> iovec = iov_iter_iovec(&iter);
> + /*
> + * do_madvise returns ENOMEM if unmapped holes are present
> + * in the passed VMA. process_madvise() is expected to skip
> + * unmapped holes passed to it in the 'struct iovec' list
> + * and not fail because of them. Thus treat -ENOMEM return
> + * from do_madvise as valid and continue processing.
> + */
> ret = do_madvise(mm, (unsigned long)iovec.iov_base,
> iovec.iov_len, behavior);
> - if (ret < 0)
> + if (ret < 0 && ret != -ENOMEM)
> break;
> iov_iter_advance(&iter, iovec.iov_len);
> }
> _
^ permalink raw reply [flat|nested] 786+ messages in thread
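The loop-control change in the quoted hunk can be sketched in userspace as follows. This models only the break condition, not the real syscall: advise_ranges(), advise_fn and mock_hole_at_1() are hypothetical names, the callback stands in for do_madvise(), and the computation of the syscall's actual return value is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-range callback standing in for do_madvise(). */
typedef int (*advise_fn)(size_t index);

/*
 * Walk nr ranges the way the quoted loop does: stop on the first error
 * unless skip_enomem is set and the error is -ENOMEM.  Stores the number
 * of ranges stepped over in *done and returns the last value from
 * advise(), or 0 if nr is 0.
 */
static int advise_ranges(advise_fn advise, size_t nr, int skip_enomem,
			 size_t *done)
{
	int ret = 0;
	size_t i;

	assert(advise && done);
	for (i = 0; i < nr; i++) {
		ret = advise(i);
		if (ret < 0 && !(skip_enomem && ret == -ENOMEM))
			break;
	}
	*done = i;
	return ret;
}

/* Mock: the range at index 1 contains an unmapped hole. */
static int mock_hole_at_1(size_t index)
{
	return index == 1 ? -ENOMEM : 0;
}
```

With skip_enomem clear (the behavior before the patch), a hole in the second of three ranges stops the walk after one range. With it set, all three ranges are stepped over, which is the semantic change being questioned in this thread.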
* Re: [patch 163/227] mm: madvise: skip unmapped vma holes passed to process_madvise
2022-03-23 0:24 ` Minchan Kim
@ 2022-03-23 2:08 ` Linus Torvalds
2022-03-23 8:28 ` Michal Hocko
1 sibling, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2022-03-23 2:08 UTC (permalink / raw)
To: Minchan Kim, Charan Teja Kalla
Cc: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, stable,
Stephen Rothwell, David Rientjes, Nadav Amit, Michal Hocko,
patches, Linux-MM, mm-commits
On Tue, Mar 22, 2022 at 5:25 PM Minchan Kim <minchan@kernel.org> wrote:
>
> I thought it was still under discussion and Charan will post next
> version along with previous patch
> "mm: madvise: return correct bytes advised with process_madvise"
>
> https://lore.kernel.org/linux-mm/7207b2f5-6b3e-aea4-aa1b-9c6d849abe34@quicinc.com/
Hmm. It's merged now, as commit 08095d6310a7.
So any fixes please do it on top of that existing state ;(
Linus
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: [patch 163/227] mm: madvise: skip unmapped vma holes passed to process_madvise
2022-03-23 0:24 ` Minchan Kim
2022-03-23 2:08 ` Linus Torvalds
@ 2022-03-23 8:28 ` Michal Hocko
2022-03-23 15:47 ` Charan Teja Kalla
1 sibling, 1 reply; 786+ messages in thread
From: Michal Hocko @ 2022-03-23 8:28 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, vbabka, surenb, stable, sfr, rientjes, nadav.amit,
quic_charante, patches, linux-mm, mm-commits, torvalds
On Tue 22-03-22 17:24:58, Minchan Kim wrote:
> On Tue, Mar 22, 2022 at 02:46:48PM -0700, Andrew Morton wrote:
> > From: Charan Teja Kalla <quic_charante@quicinc.com>
> > Subject: mm: madvise: skip unmapped vma holes passed to process_madvise
> >
> > The process_madvise() system call is expected to skip holes in vma passed
> > through 'struct iovec' vector list. But do_madvise, which
> > process_madvise() calls for each vma, returns ENOMEM in case of unmapped
> > holes, despite the VMA is processed.
> >
> > Thus process_madvise() should treat ENOMEM as expected and consider the
> > VMA passed to as processed and continue processing other vma's in the
> > vector list. Returning -ENOMEM to user, despite the VMA is processed,
> > will be unable to figure out where to start the next madvise.
> >
> > Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.1647008754.git.quic_charante@quicinc.com
>
> I thought it was still under discussion and Charan will post next
> version along with previous patch
> "mm: madvise: return correct bytes advised with process_madvise"
>
> https://lore.kernel.org/linux-mm/7207b2f5-6b3e-aea4-aa1b-9c6d849abe34@quicinc.com/
Yes, I am not even sure the new semantic is sensible[1]. We should discuss
that and see all the consequences. Changing the semantic of an existing
syscall is always tricky; going back and forth is even worse.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: [patch 163/227] mm: madvise: skip unmapped vma holes passed to process_madvise
2022-03-23 8:28 ` Michal Hocko
@ 2022-03-23 15:47 ` Charan Teja Kalla
0 siblings, 0 replies; 786+ messages in thread
From: Charan Teja Kalla @ 2022-03-23 15:47 UTC (permalink / raw)
To: Michal Hocko, linux-kernel
Cc: Andrew Morton, vbabka, surenb, stable, sfr, rientjes, nadav.amit,
patches, linux-mm, mm-commits, torvalds
On 3/23/2022 1:58 PM, Michal Hocko wrote:
> On Tue 22-03-22 17:24:58, Minchan Kim wrote:
>> On Tue, Mar 22, 2022 at 02:46:48PM -0700, Andrew Morton wrote:
>>> From: Charan Teja Kalla <quic_charante@quicinc.com>
>>> Subject: mm: madvise: skip unmapped vma holes passed to process_madvise
>>>
>>> The process_madvise() system call is expected to skip holes in vma passed
>>> through 'struct iovec' vector list. But do_madvise, which
>>> process_madvise() calls for each vma, returns ENOMEM in case of unmapped
>>> holes, despite the VMA is processed.
>>>
>>> Thus process_madvise() should treat ENOMEM as expected and consider the
>>> VMA passed to as processed and continue processing other vma's in the
>>> vector list. Returning -ENOMEM to user, despite the VMA is processed,
>>> will be unable to figure out where to start the next madvise.
>>>
>>> Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.1647008754.git.quic_charante@quicinc.com
>> I thought it was still under discussion and Charan will post next
>> version along with previous patch
>> "mm: madvise: return correct bytes advised with process_madvise"
>>
>> https://lore.kernel.org/linux-mm/7207b2f5-6b3e-aea4-aa1b-9c6d849abe34@quicinc.com/
> Yes, I am not even sure the new semantic is sensible[1]. We should discuss
> that and see all the consequences. Changing the semantic of an existing
> syscall is always tricky going back and forth is even worse.
Starting the discussion @
https://lore.kernel.org/linux-mm/cover.1648046642.git.quic_charante@quicinc.com/
Thanks,
Charan
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2022-04-27 19:41 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-04-27 19:41 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits, patches
2 patches, based on d615b5416f8a1afeb82d13b238f8152c572d59c0.
Subsystems affected by this patch series:
mm/kasan
mm/debug
Subsystem: mm/kasan
Zqiang <qiang1.zhang@intel.com>:
kasan: prevent cpu_quarantine corruption when CPU offline and cache shrink occur at same time
Subsystem: mm/debug
Akira Yokosawa <akiyks@gmail.com>:
docs: vm/page_owner: use literal blocks for param description
Documentation/vm/page_owner.rst | 5 +++--
mm/kasan/quarantine.c | 7 +++++++
2 files changed, 10 insertions(+), 2 deletions(-)
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2022-04-21 23:35 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-04-21 23:35 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm, patches
13 patches, based on b253435746d9a4a701b5f09211b9c14d3370d0da.
Subsystems affected by this patch series:
mm/memory-failure
mm/memcg
mm/userfaultfd
mm/hugetlbfs
mm/mremap
mm/oom-kill
mm/kasan
kcov
mm/hmm
Subsystem: mm/memory-failure
Naoya Horiguchi <naoya.horiguchi@nec.com>:
mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
Xu Yu <xuyu@linux.alibaba.com>:
mm/memory-failure.c: skip huge_zero_page in memory_failure()
Subsystem: mm/memcg
Shakeel Butt <shakeelb@google.com>:
memcg: sync flush only if periodic flush is delayed
Subsystem: mm/userfaultfd
Nadav Amit <namit@vmware.com>:
userfaultfd: mark uffd_wp regardless of VM_WRITE flag
Subsystem: mm/hugetlbfs
Christophe Leroy <christophe.leroy@csgroup.eu>:
mm, hugetlb: allow for "high" userspace addresses
Subsystem: mm/mremap
Sidhartha Kumar <sidhartha.kumar@oracle.com>:
selftest/vm: verify mmap addr in mremap_test
selftest/vm: verify remap destination address in mremap_test
selftest/vm: support xfail in mremap_test
selftest/vm: add skip support to mremap_test
Subsystem: mm/oom-kill
Nico Pache <npache@redhat.com>:
oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup
Subsystem: mm/kasan
Vincenzo Frascino <vincenzo.frascino@arm.com>:
MAINTAINERS: add Vincenzo Frascino to KASAN reviewers
Subsystem: kcov
Aleksandr Nogikh <nogikh@google.com>:
kcov: don't generate a warning on vm_insert_page()'s failure
Subsystem: mm/hmm
Alistair Popple <apopple@nvidia.com>:
mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove()
MAINTAINERS | 1
fs/hugetlbfs/inode.c | 9 -
include/linux/hugetlb.h | 6 +
include/linux/memcontrol.h | 5
include/linux/mm.h | 8 +
include/linux/sched.h | 1
include/linux/sched/mm.h | 8 +
kernel/kcov.c | 7 -
mm/hugetlb.c | 10 +
mm/memcontrol.c | 12 ++
mm/memory-failure.c | 158 ++++++++++++++++++++++--------
mm/mmap.c | 8 -
mm/mmu_notifier.c | 14 ++
mm/oom_kill.c | 54 +++++++---
mm/userfaultfd.c | 15 +-
mm/workingset.c | 2
tools/testing/selftests/vm/mremap_test.c | 85 +++++++++++++++-
tools/testing/selftests/vm/run_vmtests.sh | 11 +-
18 files changed, 327 insertions(+), 87 deletions(-)
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2022-04-15 2:12 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-04-15 2:12 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits, patches
14 patches, based on 115acbb56978941bb7537a97dfc303da286106c1.
Subsystems affected by this patch series:
MAINTAINERS
mm/tmpfs
mm/secretmem
mm/kasan
mm/kfence
mm/pagealloc
mm/zram
mm/compaction
mm/hugetlb
binfmt
mm/vmalloc
mm/kmemleak
Subsystem: MAINTAINERS
Joe Perches <joe@perches.com>:
MAINTAINERS: Broadcom internal lists aren't maintainers
Subsystem: mm/tmpfs
Hugh Dickins <hughd@google.com>:
tmpfs: fix regressions from wider use of ZERO_PAGE
Subsystem: mm/secretmem
Axel Rasmussen <axelrasmussen@google.com>:
mm/secretmem: fix panic when growing a memfd_secret
Subsystem: mm/kasan
Zqiang <qiang1.zhang@intel.com>:
irq_work: use kasan_record_aux_stack_noalloc() record callstack
Vincenzo Frascino <vincenzo.frascino@arm.com>:
kasan: fix hw tags enablement when KUNIT tests are disabled
Subsystem: mm/kfence
Marco Elver <elver@google.com>:
mm, kfence: support kmem_dump_obj() for KFENCE objects
Subsystem: mm/pagealloc
Juergen Gross <jgross@suse.com>:
mm, page_alloc: fix build_zonerefs_node()
Subsystem: mm/zram
Minchan Kim <minchan@kernel.org>:
mm: fix unexpected zeroed page mapping with zram swap
Subsystem: mm/compaction
Charan Teja Kalla <quic_charante@quicinc.com>:
mm: compaction: fix compiler warning when CONFIG_COMPACTION=n
Subsystem: mm/hugetlb
Mike Kravetz <mike.kravetz@oracle.com>:
hugetlb: do not demote poisoned hugetlb pages
Subsystem: binfmt
Andrew Morton <akpm@linux-foundation.org>:
revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders"
revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"
Subsystem: mm/vmalloc
Omar Sandoval <osandov@fb.com>:
mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
Subsystem: mm/kmemleak
Patrick Wang <patrick.wang.shcn@gmail.com>:
mm: kmemleak: take a full lowmem check in kmemleak_*_phys()
MAINTAINERS | 64 ++++++++++++++++++++--------------------
arch/x86/include/asm/io.h | 2 -
arch/x86/kernel/crash_dump_64.c | 1
fs/binfmt_elf.c | 6 +--
include/linux/kfence.h | 24 +++++++++++++++
kernel/irq_work.c | 2 -
mm/compaction.c | 10 +++---
mm/filemap.c | 6 ---
mm/hugetlb.c | 17 ++++++----
mm/kasan/hw_tags.c | 5 +--
mm/kasan/kasan.h | 10 +++---
mm/kfence/core.c | 21 -------------
mm/kfence/kfence.h | 21 +++++++++++++
mm/kfence/report.c | 47 +++++++++++++++++++++++++++++
mm/kmemleak.c | 8 ++---
mm/page_alloc.c | 2 -
mm/page_io.c | 54 ---------------------------------
mm/secretmem.c | 17 ++++++++++
mm/shmem.c | 31 ++++++++++++-------
mm/slab.c | 2 -
mm/slab.h | 2 -
mm/slab_common.c | 9 +++++
mm/slob.c | 2 -
mm/slub.c | 2 -
mm/vmalloc.c | 11 ------
25 files changed, 207 insertions(+), 169 deletions(-)
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2022-04-08 20:08 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-04-08 20:08 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits, patches
9 patches, based on d00c50b35101b862c3db270ffeba53a63a1063d9.
Subsystems affected by this patch series:
mm/migration
mm/highmem
lz4
mm/sparsemem
mm/mremap
mm/mempolicy
mailmap
mm/memcg
MAINTAINERS
Subsystem: mm/migration
Zi Yan <ziy@nvidia.com>:
mm: migrate: use thp_order instead of HPAGE_PMD_ORDER for new page allocation.
Subsystem: mm/highmem
Max Filippov <jcmvbkbc@gmail.com>:
highmem: fix checks in __kmap_local_sched_{in,out}
Subsystem: lz4
Guo Xuenan <guoxuenan@huawei.com>:
lz4: fix LZ4_decompress_safe_partial read out of bound
Subsystem: mm/sparsemem
Waiman Long <longman@redhat.com>:
mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning
Subsystem: mm/mremap
Paolo Bonzini <pbonzini@redhat.com>:
mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0)
Subsystem: mm/mempolicy
Miaohe Lin <linmiaohe@huawei.com>:
mm/mempolicy: fix mpol_new leak in shared_policy_replace
Subsystem: mailmap
Vasily Averin <vasily.averin@linux.dev>:
mailmap: update Vasily Averin's email address
Subsystem: mm/memcg
Andrew Morton <akpm@linux-foundation.org>:
mm/list_lru.c: revert "mm/list_lru: optimize memcg_reparent_list_lru_node()"
Subsystem: MAINTAINERS
Tom Rix <trix@redhat.com>:
MAINTAINERS: add Tom as clang reviewer
.mailmap | 4 ++++
MAINTAINERS | 1 +
include/linux/mmzone.h | 11 +++++++----
lib/lz4/lz4_decompress.c | 8 ++++++--
mm/highmem.c | 4 ++--
mm/list_lru.c | 6 ------
mm/mempolicy.c | 3 ++-
mm/migrate.c | 2 +-
mm/mremap.c | 3 +++
9 files changed, 26 insertions(+), 16 deletions(-)
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2022-04-01 18:27 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-04-01 18:27 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits, patches
16 patches, based on e8b767f5e04097aaedcd6e06e2270f9fe5282696.
Subsystems affected by this patch series:
mm/madvise
ocfs2
nilfs2
mm/mlock
mm/kfence
mailmap
mm/memory-failure
mm/kasan
mm/debug
mm/kmemleak
mm/damon
Subsystem: mm/madvise
Charan Teja Kalla <quic_charante@quicinc.com>:
Revert "mm: madvise: skip unmapped vma holes passed to process_madvise"
Subsystem: ocfs2
Joseph Qi <joseph.qi@linux.alibaba.com>:
ocfs2: fix crash when mount with quota enabled
Subsystem: nilfs2
Ryusuke Konishi <konishi.ryusuke@gmail.com>:
Patch series "nilfs2 lockdep warning fixes":
nilfs2: fix lockdep warnings in page operations for btree nodes
nilfs2: fix lockdep warnings during disk space reclamation
nilfs2: get rid of nilfs_mapping_init()
Subsystem: mm/mlock
Hugh Dickins <hughd@google.com>:
mm/munlock: add lru_add_drain() to fix memcg_stat_test
mm/munlock: update Documentation/vm/unevictable-lru.rst
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
mm/munlock: protect the per-CPU pagevec by a local_lock_t
Subsystem: mm/kfence
Muchun Song <songmuchun@bytedance.com>:
mm: kfence: fix objcgs vector allocation
Subsystem: mailmap
Kirill Tkhai <kirill.tkhai@openvz.org>:
mailmap: update Kirill's email
Subsystem: mm/memory-failure
Rik van Riel <riel@surriel.com>:
mm,hwpoison: unmap poisoned page before invalidation
Subsystem: mm/kasan
Andrey Konovalov <andreyknvl@google.com>:
mm, kasan: fix __GFP_BITS_SHIFT definition breaking LOCKDEP
Subsystem: mm/debug
Yinan Zhang <zhangyinan2019@email.szu.edu.cn>:
tools/vm/page_owner_sort.c: remove -c option
doc/vm/page_owner.rst: remove content related to -c option
Subsystem: mm/kmemleak
Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>:
mm/kmemleak: reset tag when compare object pointer
Subsystem: mm/damon
Jonghyeon Kim <tome01@ajou.ac.kr>:
mm/damon: prevent activated scheme from sleeping by deactivated schemes
.mailmap | 1
Documentation/vm/page_owner.rst | 1
Documentation/vm/unevictable-lru.rst | 473 +++++++++++++++--------------------
fs/nilfs2/btnode.c | 23 +
fs/nilfs2/btnode.h | 1
fs/nilfs2/btree.c | 27 +
fs/nilfs2/dat.c | 4
fs/nilfs2/gcinode.c | 7
fs/nilfs2/inode.c | 167 +++++++++++-
fs/nilfs2/mdt.c | 45 ++-
fs/nilfs2/mdt.h | 6
fs/nilfs2/nilfs.h | 16 -
fs/nilfs2/page.c | 16 -
fs/nilfs2/page.h | 1
fs/nilfs2/segment.c | 9
fs/nilfs2/super.c | 5
fs/ocfs2/quota_global.c | 23 -
fs/ocfs2/quota_local.c | 2
include/linux/gfp.h | 4
mm/damon/core.c | 5
mm/gup.c | 10
mm/internal.h | 6
mm/kfence/core.c | 11
mm/kfence/kfence.h | 3
mm/kmemleak.c | 9
mm/madvise.c | 9
mm/memory.c | 12
mm/migrate.c | 2
mm/mlock.c | 46 ++-
mm/page_alloc.c | 1
mm/rmap.c | 4
mm/swap.c | 4
tools/vm/page_owner_sort.c | 6
33 files changed, 560 insertions(+), 399 deletions(-)
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2022-04-01 18:20 incoming Andrew Morton
@ 2022-04-01 18:27 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-04-01 18:27 UTC (permalink / raw)
To: Linus Torvalds, linux-mm, mm-commits, patches
Argh, messed up in-reply-to. Let me redo...
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2022-04-01 18:20 Andrew Morton
2022-04-01 18:27 ` incoming Andrew Morton
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2022-04-01 18:20 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits, patches
16 patches, based on e8b767f5e04097aaedcd6e06e2270f9fe5282696.
Subsystems affected by this patch series:
mm/madvise
ocfs2
nilfs2
mm/mlock
mm/kfence
mailmap
mm/memory-failure
mm/kasan
mm/debug
mm/kmemleak
mm/damon
Subsystem: mm/madvise
Charan Teja Kalla <quic_charante@quicinc.com>:
Revert "mm: madvise: skip unmapped vma holes passed to process_madvise"
Subsystem: ocfs2
Joseph Qi <joseph.qi@linux.alibaba.com>:
ocfs2: fix crash when mount with quota enabled
Subsystem: nilfs2
Ryusuke Konishi <konishi.ryusuke@gmail.com>:
Patch series "nilfs2 lockdep warning fixes":
nilfs2: fix lockdep warnings in page operations for btree nodes
nilfs2: fix lockdep warnings during disk space reclamation
nilfs2: get rid of nilfs_mapping_init()
Subsystem: mm/mlock
Hugh Dickins <hughd@google.com>:
mm/munlock: add lru_add_drain() to fix memcg_stat_test
mm/munlock: update Documentation/vm/unevictable-lru.rst
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
mm/munlock: protect the per-CPU pagevec by a local_lock_t
Subsystem: mm/kfence
Muchun Song <songmuchun@bytedance.com>:
mm: kfence: fix objcgs vector allocation
Subsystem: mailmap
Kirill Tkhai <kirill.tkhai@openvz.org>:
mailmap: update Kirill's email
Subsystem: mm/memory-failure
Rik van Riel <riel@surriel.com>:
mm,hwpoison: unmap poisoned page before invalidation
Subsystem: mm/kasan
Andrey Konovalov <andreyknvl@google.com>:
mm, kasan: fix __GFP_BITS_SHIFT definition breaking LOCKDEP
Subsystem: mm/debug
Yinan Zhang <zhangyinan2019@email.szu.edu.cn>:
tools/vm/page_owner_sort.c: remove -c option
doc/vm/page_owner.rst: remove content related to -c option
Subsystem: mm/kmemleak
Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>:
mm/kmemleak: reset tag when compare object pointer
Subsystem: mm/damon
Jonghyeon Kim <tome01@ajou.ac.kr>:
mm/damon: prevent activated scheme from sleeping by deactivated schemes
.mailmap | 1
Documentation/vm/page_owner.rst | 1
Documentation/vm/unevictable-lru.rst | 473 +++++++++++++++--------------------
fs/nilfs2/btnode.c | 23 +
fs/nilfs2/btnode.h | 1
fs/nilfs2/btree.c | 27 +
fs/nilfs2/dat.c | 4
fs/nilfs2/gcinode.c | 7
fs/nilfs2/inode.c | 167 +++++++++++-
fs/nilfs2/mdt.c | 45 ++-
fs/nilfs2/mdt.h | 6
fs/nilfs2/nilfs.h | 16 -
fs/nilfs2/page.c | 16 -
fs/nilfs2/page.h | 1
fs/nilfs2/segment.c | 9
fs/nilfs2/super.c | 5
fs/ocfs2/quota_global.c | 23 -
fs/ocfs2/quota_local.c | 2
include/linux/gfp.h | 4
mm/damon/core.c | 5
mm/gup.c | 10
mm/internal.h | 6
mm/kfence/core.c | 11
mm/kfence/kfence.h | 3
mm/kmemleak.c | 9
mm/madvise.c | 9
mm/memory.c | 12
mm/migrate.c | 2
mm/mlock.c | 46 ++-
mm/page_alloc.c | 1
mm/rmap.c | 4
mm/swap.c | 4
tools/vm/page_owner_sort.c | 6
33 files changed, 560 insertions(+), 399 deletions(-)
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2022-03-25 1:07 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-25 1:07 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm, patches
This is the material which was staged after willystuff in linux-next.
Everything applied seamlessly on your latest, all looks well.
114 patches, based on 52deda9551a01879b3562e7b41748e85c591f14c.
Subsystems affected by this patch series:
mm/debug
mm/selftests
mm/pagecache
mm/thp
mm/rmap
mm/migration
mm/kasan
mm/hugetlb
mm/pagemap
mm/madvise
selftests
Subsystem: mm/debug
Sean Anderson <seanga2@gmail.com>:
tools/vm/page_owner_sort.c: sort by stacktrace before culling
tools/vm/page_owner_sort.c: support sorting by stack trace
Yinan Zhang <zhangyinan2019@email.szu.edu.cn>:
tools/vm/page_owner_sort.c: add switch between culling by stacktrace and txt
Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>:
tools/vm/page_owner_sort.c: support sorting pid and time
Shenghong Han <hanshenghong2019@email.szu.edu.cn>:
tools/vm/page_owner_sort.c: two trivial fixes
Yixuan Cao <caoyixuan2019@email.szu.edu.cn>:
tools/vm/page_owner_sort.c: delete invalid duplicate code
Shenghong Han <hanshenghong2019@email.szu.edu.cn>:
Documentation/vm/page_owner.rst: update the documentation
Shuah Khan <skhan@linuxfoundation.org>:
Documentation/vm/page_owner.rst: fix unexpected indentation warns
Waiman Long <longman@redhat.com>:
Patch series "mm/page_owner: Extend page_owner to show memcg information", v4:
lib/vsprintf: avoid redundant work with 0 size
mm/page_owner: use scnprintf() to avoid excessive buffer overrun check
mm/page_owner: print memcg information
mm/page_owner: record task command name
Yixuan Cao <caoyixuan2019@email.szu.edu.cn>:
mm/page_owner.c: record tgid
tools/vm/page_owner_sort.c: fix the instructions for use
Jiajian Ye <yejiajian2018@email.szu.edu.cn>:
tools/vm/page_owner_sort.c: fix comments
tools/vm/page_owner_sort.c: add a security check
tools/vm/page_owner_sort.c: support sorting by tgid and update documentation
tools/vm/page_owner_sort: fix three trivival places
tools/vm/page_owner_sort: support for sorting by task command name
tools/vm/page_owner_sort.c: support for selecting by PID, TGID or task command name
tools/vm/page_owner_sort.c: support for user-defined culling rules
Christoph Hellwig <hch@lst.de>:
mm: unexport page_init_poison
Subsystem: mm/selftests
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
selftest/vm: add util.h and and move helper functions there
Mike Rapoport <rppt@kernel.org>:
selftest/vm: add helpers to detect PAGE_SIZE and PAGE_SHIFT
Subsystem: mm/pagecache
Hugh Dickins <hughd@google.com>:
mm: delete __ClearPageWaiters()
mm: filemap_unaccount_folio() large skip mapcount fixup
Subsystem: mm/thp
Hugh Dickins <hughd@google.com>:
mm/thp: fix NR_FILE_MAPPED accounting in page_*_file_rmap()
Subsystem: mm/rmap
Subsystem: mm/migration
Anshuman Khandual <anshuman.khandual@arm.com>:
Patch series "mm/migration: Add trace events", v3:
mm/migration: add trace events for THP migrations
mm/migration: add trace events for base page and HugeTLB migrations
Subsystem: mm/kasan
Andrey Konovalov <andreyknvl@google.com>:
Patch series "kasan, vmalloc, arm64: add vmalloc tagging support for SW/HW_TAGS", v6:
kasan, page_alloc: deduplicate should_skip_kasan_poison
kasan, page_alloc: move tag_clear_highpage out of kernel_init_free_pages
kasan, page_alloc: merge kasan_free_pages into free_pages_prepare
kasan, page_alloc: simplify kasan_poison_pages call site
kasan, page_alloc: init memory of skipped pages on free
kasan: drop skip_kasan_poison variable in free_pages_prepare
mm: clarify __GFP_ZEROTAGS comment
kasan: only apply __GFP_ZEROTAGS when memory is zeroed
kasan, page_alloc: refactor init checks in post_alloc_hook
kasan, page_alloc: merge kasan_alloc_pages into post_alloc_hook
kasan, page_alloc: combine tag_clear_highpage calls in post_alloc_hook
kasan, page_alloc: move SetPageSkipKASanPoison in post_alloc_hook
kasan, page_alloc: move kernel_init_free_pages in post_alloc_hook
kasan, page_alloc: rework kasan_unpoison_pages call site
kasan: clean up metadata byte definitions
kasan: define KASAN_VMALLOC_INVALID for SW_TAGS
kasan, x86, arm64, s390: rename functions for modules shadow
kasan, vmalloc: drop outdated VM_KASAN comment
kasan: reorder vmalloc hooks
kasan: add wrappers for vmalloc hooks
kasan, vmalloc: reset tags in vmalloc functions
kasan, fork: reset pointer tags of vmapped stacks
kasan, arm64: reset pointer tags of vmapped stacks
kasan, vmalloc: add vmalloc tagging for SW_TAGS
kasan, vmalloc, arm64: mark vmalloc mappings as pgprot_tagged
kasan, vmalloc: unpoison VM_ALLOC pages after mapping
kasan, mm: only define ___GFP_SKIP_KASAN_POISON with HW_TAGS
kasan, page_alloc: allow skipping unpoisoning for HW_TAGS
kasan, page_alloc: allow skipping memory init for HW_TAGS
kasan, vmalloc: add vmalloc tagging for HW_TAGS
kasan, vmalloc: only tag normal vmalloc allocations
kasan, arm64: don't tag executable vmalloc allocations
kasan: mark kasan_arg_stacktrace as __initdata
kasan: clean up feature flags for HW_TAGS mode
kasan: add kasan.vmalloc command line flag
kasan: allow enabling KASAN_VMALLOC and SW/HW_TAGS
arm64: select KASAN_VMALLOC for SW/HW_TAGS modes
kasan: documentation updates
kasan: improve vmalloc tests
kasan: test: support async (again) and asymm modes for HW_TAGS
tangmeng <tangmeng@uniontech.com>:
mm/kasan: remove unnecessary CONFIG_KASAN option
Peter Collingbourne <pcc@google.com>:
kasan: update function name in comments
Andrey Konovalov <andreyknvl@google.com>:
kasan: print virtual mapping info in reports
Patch series "kasan: report clean-ups and improvements":
kasan: drop addr check from describe_object_addr
kasan: more line breaks in reports
kasan: rearrange stack frame info in reports
kasan: improve stack frame info in reports
kasan: print basic stack frame info for SW_TAGS
kasan: simplify async check in end_report()
kasan: simplify kasan_update_kunit_status() and call sites
kasan: check CONFIG_KASAN_KUNIT_TEST instead of CONFIG_KUNIT
kasan: move update_kunit_status to start_report
kasan: move disable_trace_on_warning to start_report
kasan: split out print_report from __kasan_report
kasan: simplify kasan_find_first_bad_addr call sites
kasan: restructure kasan_report
kasan: merge __kasan_report into kasan_report
kasan: call print_report from kasan_report_invalid_free
kasan: move and simplify kasan_report_async
kasan: rename kasan_access_info to kasan_report_info
kasan: add comment about UACCESS regions to kasan_report
kasan: respect KASAN_BIT_REPORTED in all reporting routines
kasan: reorder reporting functions
kasan: move and hide kasan_save_enable/restore_multi_shot
kasan: disable LOCKDEP when printing reports
Subsystem: mm/hugetlb
Mike Kravetz <mike.kravetz@oracle.com>:
Patch series "Add hugetlb MADV_DONTNEED support", v3:
mm: enable MADV_DONTNEED for hugetlb mappings
selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test
userfaultfd/selftests: enable hugetlb remap and remove event testing
Miaohe Lin <linmiaohe@huawei.com>:
mm/huge_memory: make is_transparent_hugepage() static
Subsystem: mm/pagemap
David Hildenbrand <david@redhat.com>:
Patch series "mm: COW fixes part 1: fix the COW security issue for THP and swap", v3:
mm: optimize do_wp_page() for exclusive pages in the swapcache
mm: optimize do_wp_page() for fresh pages in local LRU pagevecs
mm: slightly clarify KSM logic in do_swap_page()
mm: streamline COW logic in do_swap_page()
mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()
mm/khugepaged: remove reuse_swap_page() usage
mm/swapfile: remove stale reuse_swap_page()
mm/huge_memory: remove stale page_trans_huge_mapcount()
mm/huge_memory: remove stale locking logic from __split_huge_pmd()
Hugh Dickins <hughd@google.com>:
mm: warn on deleting redirtied only if accounted
mm: unmap_mapping_range_tree() with i_mmap_rwsem shared
Anshuman Khandual <anshuman.khandual@arm.com>:
mm: generalize ARCH_HAS_FILTER_PGPROT
Subsystem: mm/madvise
Mauricio Faria de Oliveira <mfo@canonical.com>:
mm: fix race between MADV_FREE reclaim and blkdev direct IO read
Johannes Weiner <hannes@cmpxchg.org>:
mm: madvise: MADV_DONTNEED_LOCKED
Subsystem: selftests
Muhammad Usama Anjum <usama.anjum@collabora.com>:
selftests: vm: remove dependecy from internal kernel macros
Kees Cook <keescook@chromium.org>:
selftests: kselftest framework: provide "finished" helper
Documentation/dev-tools/kasan.rst | 17
Documentation/vm/page_owner.rst | 72 ++
arch/alpha/include/uapi/asm/mman.h | 2
arch/arm64/Kconfig | 2
arch/arm64/include/asm/vmalloc.h | 6
arch/arm64/include/asm/vmap_stack.h | 5
arch/arm64/kernel/module.c | 5
arch/arm64/mm/pageattr.c | 2
arch/arm64/net/bpf_jit_comp.c | 3
arch/mips/include/uapi/asm/mman.h | 2
arch/parisc/include/uapi/asm/mman.h | 2
arch/powerpc/mm/book3s64/trace.c | 1
arch/s390/kernel/module.c | 2
arch/x86/Kconfig | 3
arch/x86/kernel/module.c | 2
arch/x86/mm/init.c | 1
arch/xtensa/include/uapi/asm/mman.h | 2
include/linux/gfp.h | 53 +-
include/linux/huge_mm.h | 6
include/linux/kasan.h | 136 +++--
include/linux/mm.h | 5
include/linux/page-flags.h | 2
include/linux/pagemap.h | 3
include/linux/swap.h | 4
include/linux/vmalloc.h | 18
include/trace/events/huge_memory.h | 1
include/trace/events/migrate.h | 31 +
include/trace/events/mmflags.h | 18
include/trace/events/thp.h | 27 +
include/uapi/asm-generic/mman-common.h | 2
kernel/fork.c | 13
kernel/scs.c | 16
lib/Kconfig.kasan | 18
lib/test_kasan.c | 239 ++++++++-
lib/vsprintf.c | 8
mm/Kconfig | 3
mm/debug.c | 1
mm/filemap.c | 63 +-
mm/huge_memory.c | 109 ----
mm/kasan/Makefile | 2
mm/kasan/common.c | 4
mm/kasan/hw_tags.c | 243 +++++++---
mm/kasan/kasan.h | 76 ++-
mm/kasan/report.c | 516 +++++++++++----------
mm/kasan/report_generic.c | 34 -
mm/kasan/report_hw_tags.c | 1
mm/kasan/report_sw_tags.c | 16
mm/kasan/report_tags.c | 2
mm/kasan/shadow.c | 76 +--
mm/khugepaged.c | 11
mm/madvise.c | 57 +-
mm/memory.c | 129 +++--
mm/memremap.c | 2
mm/migrate.c | 4
mm/page-writeback.c | 18
mm/page_alloc.c | 270 ++++++-----
mm/page_owner.c | 86 ++-
mm/rmap.c | 62 +-
mm/swap.c | 4
mm/swapfile.c | 104 ----
mm/vmalloc.c | 167 ++++--
tools/testing/selftests/kselftest.h | 10
tools/testing/selftests/vm/.gitignore | 1
tools/testing/selftests/vm/Makefile | 1
tools/testing/selftests/vm/gup_test.c | 3
tools/testing/selftests/vm/hugetlb-madvise.c | 410 ++++++++++++++++
tools/testing/selftests/vm/ksm_tests.c | 38 -
tools/testing/selftests/vm/memfd_secret.c | 2
tools/testing/selftests/vm/run_vmtests.sh | 15
tools/testing/selftests/vm/transhuge-stress.c | 41 -
tools/testing/selftests/vm/userfaultfd.c | 72 +-
tools/testing/selftests/vm/util.h | 75 ++-
tools/vm/page_owner_sort.c | 628 +++++++++++++++++++++-----
73 files changed, 2797 insertions(+), 1288 deletions(-)
* incoming
@ 2022-03-23 23:04 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-23 23:04 UTC
To: Linus Torvalds; +Cc: mm-commits, linux-mm, patches
Various misc subsystems, before getting into the post-linux-next material.
This is all based on v5.17. I tested applying and compiling against
today's 1bc191051dca28fa6. One patch required an extra whack, all
looks good.
41 patches, based on f443e374ae131c168a065ea1748feac6b2e76613.
Subsystems affected by this patch series:
procfs
misc
core-kernel
lib
checkpatch
init
pipe
minix
fat
cgroups
kexec
kdump
taskstats
panic
kcov
resource
ubsan
Subsystem: procfs
Hao Lee <haolee.swjtu@gmail.com>:
proc: alloc PATH_MAX bytes for /proc/${pid}/fd/ symlinks
David Hildenbrand <david@redhat.com>:
proc/vmcore: fix possible deadlock on concurrent mmap and read
Yang Li <yang.lee@linux.alibaba.com>:
proc/vmcore: fix vmcore_alloc_buf() kernel-doc comment
Subsystem: misc
Bjorn Helgaas <bhelgaas@google.com>:
linux/types.h: remove unnecessary __bitwise__
Documentation/sparse: add hints about __CHECKER__
Subsystem: core-kernel
Miaohe Lin <linmiaohe@huawei.com>:
kernel/ksysfs.c: use helper macro __ATTR_RW
Subsystem: lib
Kees Cook <keescook@chromium.org>:
Kconfig.debug: make DEBUG_INFO selectable from a choice
Rasmus Villemoes <linux@rasmusvillemoes.dk>:
include: drop pointless __compiler_offsetof indirection
Christophe Leroy <christophe.leroy@csgroup.eu>:
ilog2: force inlining of __ilog2_u32() and __ilog2_u64()
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
bitfield: add explicit inclusions to the example
Feng Tang <feng.tang@intel.com>:
lib/Kconfig.debug: add ARCH dependency for FUNCTION_ALIGN option
Randy Dunlap <rdunlap@infradead.org>:
lib: bitmap: fix many kernel-doc warnings
Subsystem: checkpatch
Joe Perches <joe@perches.com>:
checkpatch: prefer MODULE_LICENSE("GPL") over MODULE_LICENSE("GPL v2")
checkpatch: add --fix option for some TRAILING_STATEMENTS
checkpatch: add early_param exception to blank line after struct/function test
Sagar Patel <sagarmp@cs.unc.edu>:
checkpatch: use python3 to find codespell dictionary
Subsystem: init
Mark-PK Tsai <mark-pk.tsai@mediatek.com>:
init: use ktime_us_delta() to make initcall_debug log more precise
Randy Dunlap <rdunlap@infradead.org>:
init.h: improve __setup and early_param documentation
init/main.c: return 1 from handled __setup() functions
Subsystem: pipe
Andrei Vagin <avagin@gmail.com>:
fs/pipe: use kvcalloc to allocate a pipe_buffer array
fs/pipe.c: local vars have to match types of proper pipe_inode_info fields
Subsystem: minix
Qinghua Jin <qhjin.dev@gmail.com>:
minix: fix bug when opening a file with O_DIRECT
Subsystem: fat
Helge Deller <deller@gmx.de>:
fat: use pointer to simple type in put_user()
Subsystem: cgroups
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
cgroup: use irqsave in cgroup_rstat_flush_locked().
cgroup: add a comment to cgroup_rstat_flush_locked().
Subsystem: kexec
Jisheng Zhang <jszhang@kernel.org>:
Patch series "kexec: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef", v2:
kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible
riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
Subsystem: kdump
Tiezhu Yang <yangtiezhu@loongson.cn>:
Patch series "Update doc and fix some issues about kdump", v2:
docs: kdump: update description about sysfs file system support
docs: kdump: add scp example to write out the dump file
panic: unset panic_on_warn inside panic()
ubsan: no need to unset panic_on_warn in ubsan_epilogue()
kasan: no need to unset panic_on_warn in end_report()
Subsystem: taskstats
Lukas Bulwahn <lukas.bulwahn@gmail.com>:
taskstats: remove unneeded dead assignment
Subsystem: panic
"Guilherme G. Piccoli" <gpiccoli@igalia.com>:
Patch series "Some improvements on panic_print":
docs: sysctl/kernel: add missing bit to panic_print
panic: add option to dump all CPUs backtraces in panic_print
panic: move panic_print before kmsg dumpers
Subsystem: kcov
Aleksandr Nogikh <nogikh@google.com>:
Patch series "kcov: improve mmap processing", v3:
kcov: split ioctl handling into locked and unlocked parts
kcov: properly handle subsequent mmap calls
Subsystem: resource
Miaohe Lin <linmiaohe@huawei.com>:
kernel/resource: fix kfree() of bootmem memory again
Subsystem: ubsan
Marco Elver <elver@google.com>:
Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang"
Documentation/admin-guide/kdump/kdump.rst | 10 +
Documentation/admin-guide/kernel-parameters.txt | 5
Documentation/admin-guide/sysctl/kernel.rst | 2
Documentation/dev-tools/sparse.rst | 2
arch/arm64/mm/init.c | 9 -
arch/riscv/mm/init.c | 6 -
arch/x86/kernel/setup.c | 10 -
fs/fat/dir.c | 2
fs/minix/inode.c | 3
fs/pipe.c | 13 +-
fs/proc/base.c | 8 -
fs/proc/vmcore.c | 43 +++----
include/linux/bitfield.h | 3
include/linux/compiler_types.h | 3
include/linux/init.h | 11 +
include/linux/kexec.h | 12 +-
include/linux/log2.h | 4
include/linux/stddef.h | 6 -
include/uapi/linux/types.h | 6 -
init/main.c | 14 +-
kernel/cgroup/rstat.c | 13 +-
kernel/kcov.c | 102 ++++++++---------
kernel/ksysfs.c | 3
kernel/panic.c | 37 ++++--
kernel/resource.c | 41 +-----
kernel/taskstats.c | 5
lib/Kconfig.debug | 142 ++++++++++++------------
lib/Kconfig.kcsan | 11 -
lib/Kconfig.ubsan | 12 --
lib/bitmap.c | 24 ++--
lib/ubsan.c | 10 -
mm/kasan/report.c | 10 -
scripts/checkpatch.pl | 31 ++++-
tools/include/linux/types.h | 5
34 files changed, 313 insertions(+), 305 deletions(-)
* incoming
@ 2022-03-16 23:14 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-16 23:14 UTC
To: Linus Torvalds; +Cc: mm-commits, linux-mm, patches
4 patches, based on 56e337f2cf1326323844927a04e9dbce9a244835.
Subsystems affected by this patch series:
mm/swap
kconfig
ocfs2
selftests
Subsystem: mm/swap
Guo Ziliang <guo.ziliang@zte.com.cn>:
mm: swap: get rid of deadloop in swapin readahead
Subsystem: kconfig
Qian Cai <quic_qiancai@quicinc.com>:
configs/debug: restore DEBUG_INFO=y for overriding
Subsystem: ocfs2
Joseph Qi <joseph.qi@linux.alibaba.com>:
ocfs2: fix crash when initialize filecheck kobj fails
Subsystem: selftests
Yosry Ahmed <yosryahmed@google.com>:
selftests: vm: fix clang build error multiple output files
fs/ocfs2/super.c | 22 +++++++++++-----------
kernel/configs/debug.config | 1 +
mm/swap_state.c | 2 +-
tools/testing/selftests/vm/Makefile | 6 ++----
4 files changed, 15 insertions(+), 16 deletions(-)
* incoming
@ 2022-03-05 4:28 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-03-05 4:28 UTC
To: Linus Torvalds; +Cc: mm-commits, linux-mm, patches
8 patches, based on 07ebd38a0da24d2534da57b4841346379db9f354.
Subsystems affected by this patch series:
mm/hugetlb
mm/pagemap
memfd
selftests
mm/userfaultfd
kconfig
Subsystem: mm/hugetlb
Mike Kravetz <mike.kravetz@oracle.com>:
selftests/vm: cleanup hugetlb file after mremap test
Subsystem: mm/pagemap
Suren Baghdasaryan <surenb@google.com>:
mm: refactor vm_area_struct::anon_vma_name usage code
mm: prevent vm_area_struct::anon_name refcount saturation
mm: fix use-after-free when anon vma name is used after vma is freed
Subsystem: memfd
Hugh Dickins <hughd@google.com>:
memfd: fix F_SEAL_WRITE after shmem huge page allocated
Subsystem: selftests
Chengming Zhou <zhouchengming@bytedance.com>:
kselftest/vm: fix tests build with old libc
Subsystem: mm/userfaultfd
Yun Zhou <yun.zhou@windriver.com>:
proc: fix documentation and description of pagemap
Subsystem: kconfig
Qian Cai <quic_qiancai@quicinc.com>:
configs/debug: set CONFIG_DEBUG_INFO=y properly
Documentation/admin-guide/mm/pagemap.rst | 2
fs/proc/task_mmu.c | 9 +-
fs/userfaultfd.c | 6 -
include/linux/mm.h | 7 +
include/linux/mm_inline.h | 105 ++++++++++++++++++---------
include/linux/mm_types.h | 5 +
kernel/configs/debug.config | 2
kernel/fork.c | 4 -
kernel/sys.c | 19 +++-
mm/madvise.c | 98 +++++++++----------------
mm/memfd.c | 40 +++++++---
mm/mempolicy.c | 2
mm/mlock.c | 2
mm/mmap.c | 12 +--
mm/mprotect.c | 2
tools/testing/selftests/vm/hugepage-mremap.c | 26 ++++--
tools/testing/selftests/vm/run_vmtests.sh | 3
tools/testing/selftests/vm/userfaultfd.c | 1
18 files changed, 201 insertions(+), 144 deletions(-)
* incoming
@ 2022-02-26 3:10 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-02-26 3:10 UTC
To: Linus Torvalds; +Cc: mm-commits, linux-mm, patches
12 patches, based on c47658311d60be064b839f329c0e4d34f5f0735b.
Subsystems affected by this patch series:
MAINTAINERS
mm/hugetlb
mm/kasan
mm/hugetlbfs
mm/pagemap
mm/selftests
mm/memcg
mm/slab
mailmap
memfd
Subsystem: MAINTAINERS
Luis Chamberlain <mcgrof@kernel.org>:
MAINTAINERS: add sysctl-next git tree
Subsystem: mm/hugetlb
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
mm/hugetlb: fix kernel crash with hugetlb mremap
Subsystem: mm/kasan
Andrey Konovalov <andreyknvl@google.com>:
kasan: test: prevent cache merging in kmem_cache_double_destroy
Subsystem: mm/hugetlbfs
Liu Yuntao <liuyuntao10@huawei.com>:
hugetlbfs: fix a truncation issue in hugepages parameter
Subsystem: mm/pagemap
Suren Baghdasaryan <surenb@google.com>:
mm: fix use-after-free bug when mm->mmap is reused after being freed
Subsystem: mm/selftests
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
selftest/vm: fix map_fixed_noreplace test failure
Subsystem: mm/memcg
Roman Gushchin <roman.gushchin@linux.dev>:
MAINTAINERS: add Roman as a memcg co-maintainer
Vladimir Davydov <vdavydov.dev@gmail.com>:
MAINTAINERS: remove Vladimir from memcg maintainers
Shakeel Butt <shakeelb@google.com>:
MAINTAINERS: add Shakeel as a memcg co-maintainer
Subsystem: mm/slab
Vlastimil Babka <vbabka@suse.cz>:
MAINTAINERS, SLAB: add Roman as reviewer, git tree
Subsystem: mailmap
Roman Gushchin <roman.gushchin@linux.dev>:
mailmap: update Roman Gushchin's email
Subsystem: memfd
Mike Kravetz <mike.kravetz@oracle.com>:
selftests/memfd: clean up mapping in mfd_fail_write
.mailmap | 3 +
MAINTAINERS | 6 ++
lib/test_kasan.c | 5 +-
mm/hugetlb.c | 11 ++---
mm/mmap.c | 1
tools/testing/selftests/memfd/memfd_test.c | 1
tools/testing/selftests/vm/map_fixed_noreplace.c | 49 +++++++++++++++++------
7 files changed, 56 insertions(+), 20 deletions(-)
* Re: incoming
2022-02-12 2:02 ` incoming Linus Torvalds
@ 2022-02-12 5:24 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-02-12 5:24 UTC
To: Linus Torvalds; +Cc: Linux-MM, mm-commits, patches
On Fri, 11 Feb 2022 18:02:53 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, Feb 11, 2022 at 4:27 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > 5 patches, based on f1baf68e1383f6ed93eb9cff2866d46562607a43.
>
> So this *completely* flummoxed 'b4', because you first sent the wrong
> series, and then sent the right one in the same thread.
>
> I fetched the emails manually, but honestly, this was confusing even
> then, with two "[PATCH x/5]" series where the only way to tell the
> right one was basically by date of email. They did arrive in the same
> order in my mailbox, but even that wouldn't have been guaranteed if
> there had been some mailer delays somewhere..
Yes, I wondered. Sorry bout that.
> So next time when you mess up, resend it all as a completely new
> series and completely new threading - so with a new header email too.
> Please?
Wilco.
> And since I'm here, let me just verify that yes, the series you
> actually want me to apply is this one (as described by the head
> email):
>
> Subject: [patch 1/5] fs/binfmt_elf: fix PT_LOAD p_align values ..
> Subject: [patch 2/5] fs/proc: task_mmu.c: don't read mapcount f..
> Subject: [patch 3/5] mm: vmscan: remove deadlock due to throttl..
> Subject: [patch 4/5] mm: memcg: synchronize objcg lists with a ..
> Subject: [patch 5/5] kfence: make test case compatible with run..
>
> and not the other one with GUP patches?
Those are the ones. Five fixes, three with cc:stable.
* Re: incoming
2022-02-12 0:27 incoming Andrew Morton
@ 2022-02-12 2:02 ` Linus Torvalds
2022-02-12 5:24 ` incoming Andrew Morton
0 siblings, 1 reply; 786+ messages in thread
From: Linus Torvalds @ 2022-02-12 2:02 UTC
To: Andrew Morton; +Cc: Linux-MM, mm-commits, patches
On Fri, Feb 11, 2022 at 4:27 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 5 patches, based on f1baf68e1383f6ed93eb9cff2866d46562607a43.
So this *completely* flummoxed 'b4', because you first sent the wrong
series, and then sent the right one in the same thread.
I fetched the emails manually, but honestly, this was confusing even
then, with two "[PATCH x/5]" series where the only way to tell the
right one was basically by date of email. They did arrive in the same
order in my mailbox, but even that wouldn't have been guaranteed if
there had been some mailer delays somewhere..
So next time when you mess up, resend it all as a completely new
series and completely new threading - so with a new header email too.
Please?
And since I'm here, let me just verify that yes, the series you
actually want me to apply is this one (as described by the head
email):
Subject: [patch 1/5] fs/binfmt_elf: fix PT_LOAD p_align values ..
Subject: [patch 2/5] fs/proc: task_mmu.c: don't read mapcount f..
Subject: [patch 3/5] mm: vmscan: remove deadlock due to throttl..
Subject: [patch 4/5] mm: memcg: synchronize objcg lists with a ..
Subject: [patch 5/5] kfence: make test case compatible with run..
and not the other one with GUP patches?
Linus
* incoming
@ 2022-02-12 0:27 Andrew Morton
2022-02-12 2:02 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2022-02-12 0:27 UTC
To: Linus Torvalds; +Cc: linux-mm, mm-commits, patches
5 patches, based on f1baf68e1383f6ed93eb9cff2866d46562607a43.
Subsystems affected by this patch series:
binfmt
procfs
mm/vmscan
mm/memcg
mm/kfence
Subsystem: binfmt
Mike Rapoport <rppt@linux.ibm.com>:
fs/binfmt_elf: fix PT_LOAD p_align values for loaders
Subsystem: procfs
Yang Shi <shy828301@gmail.com>:
fs/proc: task_mmu.c: don't read mapcount for migration entry
Subsystem: mm/vmscan
Mel Gorman <mgorman@suse.de>:
mm: vmscan: remove deadlock due to throttling failing to make progress
Subsystem: mm/memcg
Roman Gushchin <guro@fb.com>:
mm: memcg: synchronize objcg lists with a dedicated spinlock
Subsystem: mm/kfence
Peng Liu <liupeng256@huawei.com>:
kfence: make test case compatible with run time set sample interval
fs/binfmt_elf.c | 2 +-
fs/proc/task_mmu.c | 40 +++++++++++++++++++++++++++++++---------
include/linux/kfence.h | 2 ++
include/linux/memcontrol.h | 5 +++--
mm/kfence/core.c | 3 ++-
mm/kfence/kfence_test.c | 8 ++++----
mm/memcontrol.c | 10 +++++-----
mm/vmscan.c | 4 +++-
8 files changed, 51 insertions(+), 23 deletions(-)
* incoming
@ 2022-02-04 4:48 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-02-04 4:48 UTC
To: Linus Torvalds; +Cc: linux-mm, mm-commits
10 patches, based on 1f2cfdd349b7647f438c1e552dc1b983da86d830.
Subsystems affected by this patch series:
mm/vmscan
mm/debug
mm/pagemap
ipc
mm/kmemleak
MAINTAINERS
mm/selftests
Subsystem: mm/vmscan
Chen Wandun <chenwandun@huawei.com>:
Revert "mm/page_isolation: unset migratetype directly for non Buddy page"
Subsystem: mm/debug
Pasha Tatashin <pasha.tatashin@soleen.com>:
Patch series "page table check fixes and cleanups", v5:
mm/debug_vm_pgtable: remove pte entry from the page table
mm/page_table_check: use unsigned long for page counters and cleanup
mm/khugepaged: unify collapse pmd clear, flush and free
mm/page_table_check: check entries at pmd levels
Subsystem: mm/pagemap
Mike Rapoport <rppt@linux.ibm.com>:
mm/pgtable: define pte_index so that preprocessor could recognize it
Subsystem: ipc
Minghao Chi <chi.minghao@zte.com.cn>:
ipc/sem: do not sleep with a spin lock held
Subsystem: mm/kmemleak
Lang Yu <lang.yu@amd.com>:
mm/kmemleak: avoid scanning potential huge holes
Subsystem: MAINTAINERS
Mike Rapoport <rppt@linux.ibm.com>:
MAINTAINERS: update rppt's email
Subsystem: mm/selftests
Shuah Khan <skhan@linuxfoundation.org>:
kselftest/vm: revert "tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner"
MAINTAINERS | 2 -
include/linux/page_table_check.h | 19 ++++++++++
include/linux/pgtable.h | 1
ipc/sem.c | 4 +-
mm/debug_vm_pgtable.c | 2 +
mm/khugepaged.c | 37 +++++++++++---------
mm/kmemleak.c | 13 +++----
mm/page_isolation.c | 2 -
mm/page_table_check.c | 55 +++++++++++++++----------------
tools/testing/selftests/vm/userfaultfd.c | 11 ++++--
10 files changed, 89 insertions(+), 57 deletions(-)
* incoming
@ 2022-01-29 21:40 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-01-29 21:40 UTC
To: Linus Torvalds; +Cc: linux-mm, mm-commits
12 patches, based on f8c7e4ede46fe63ff10000669652648aab09d112.
Subsystems affected by this patch series:
sysctl
binfmt
ia64
mm/memory-failure
mm/folios
selftests
mm/kasan
mm/psi
ocfs2
Subsystem: sysctl
Andrew Morton <akpm@linux-foundation.org>:
include/linux/sysctl.h: fix register_sysctl_mount_point() return type
Subsystem: binfmt
Tong Zhang <ztong0001@gmail.com>:
binfmt_misc: fix crash when load/unload module
Subsystem: ia64
Randy Dunlap <rdunlap@infradead.org>:
ia64: make IA64_MCA_RECOVERY bool instead of tristate
Subsystem: mm/memory-failure
Joao Martins <joao.m.martins@oracle.com>:
memory-failure: fetch compound_head after pgmap_pfn_valid()
Subsystem: mm/folios
Wei Yang <richard.weiyang@gmail.com>:
mm: page->mapping folio->mapping should have the same offset
Subsystem: selftests
Maor Gottlieb <maorg@nvidia.com>:
tools/testing/scatterlist: add missing defines
Subsystem: mm/kasan
Marco Elver <elver@google.com>:
kasan: test: fix compatibility with FORTIFY_SOURCE
Peter Collingbourne <pcc@google.com>:
mm, kasan: use compare-exchange operation to set KASAN page tag
Subsystem: mm/psi
Suren Baghdasaryan <surenb@google.com>:
psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n
psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n
Subsystem: ocfs2
Joseph Qi <joseph.qi@linux.alibaba.com>:
Patch series "ocfs2: fix a deadlock case":
jbd2: export jbd2_journal_[grab|put]_journal_head
ocfs2: fix a deadlock when commit trans
arch/ia64/Kconfig | 2
fs/binfmt_misc.c | 8 +--
fs/jbd2/journal.c | 2
fs/ocfs2/suballoc.c | 25 ++++-------
include/linux/mm.h | 17 +++++--
include/linux/mm_types.h | 1
include/linux/psi.h | 11 ++--
include/linux/sysctl.h | 2
kernel/sched/psi.c | 79 ++++++++++++++++++-----------------
lib/test_kasan.c | 5 ++
mm/memory-failure.c | 6 ++
tools/testing/scatterlist/linux/mm.h | 3 -
12 files changed, 91 insertions(+), 70 deletions(-)
* Re: incoming
2022-01-29 4:25 ` incoming Matthew Wilcox
@ 2022-01-29 6:23 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2022-01-29 6:23 UTC
To: Matthew Wilcox; +Cc: Linus Torvalds, mm-commits, linux-mm
On Sat, 29 Jan 2022 04:25:33 +0000 Matthew Wilcox <willy@infradead.org> wrote:
> On Fri, Jan 28, 2022 at 06:13:41PM -0800, Andrew Morton wrote:
> > 12 patches, based on 169387e2aa291a4e3cb856053730fe99d6cec06f.
>   ^^
>
> I see 7?
Crap, sorry, ignore all this, shall redo tomorrow.
(It wasn't a good day over here. The thing with disk drives is that
the bigger they are, the harder they fall).
* Re: incoming
2022-01-29 2:13 incoming Andrew Morton
@ 2022-01-29 4:25 ` Matthew Wilcox
2022-01-29 6:23 ` incoming Andrew Morton
0 siblings, 1 reply; 786+ messages in thread
From: Matthew Wilcox @ 2022-01-29 4:25 UTC
To: Andrew Morton; +Cc: Linus Torvalds, mm-commits, linux-mm
On Fri, Jan 28, 2022 at 06:13:41PM -0800, Andrew Morton wrote:
> 12 patches, based on 169387e2aa291a4e3cb856053730fe99d6cec06f.
  ^^
I see 7?
* incoming
@ 2022-01-29 2:13 Andrew Morton
2022-01-29 4:25 ` incoming Matthew Wilcox
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2022-01-29 2:13 UTC
To: Linus Torvalds; +Cc: mm-commits, linux-mm
12 patches, based on 169387e2aa291a4e3cb856053730fe99d6cec06f.
Subsystems affected by this patch series:
sysctl
binfmt
ia64
mm/memory-failure
mm/folios
selftests
mm/kasan
mm/psi
ocfs2
Subsystem: sysctl
Andrew Morton <akpm@linux-foundation.org>:
include/linux/sysctl.h: fix register_sysctl_mount_point() return type
Subsystem: binfmt
Tong Zhang <ztong0001@gmail.com>:
binfmt_misc: fix crash when load/unload module
Subsystem: ia64
Randy Dunlap <rdunlap@infradead.org>:
ia64: make IA64_MCA_RECOVERY bool instead of tristate
Subsystem: mm/memory-failure
Joao Martins <joao.m.martins@oracle.com>:
memory-failure: fetch compound_head after pgmap_pfn_valid()
Subsystem: mm/folios
Wei Yang <richard.weiyang@gmail.com>:
mm: page->mapping folio->mapping should have the same offset
Subsystem: selftests
Maor Gottlieb <maorg@nvidia.com>:
tools/testing/scatterlist: add missing defines
Subsystem: mm/kasan
Marco Elver <elver@google.com>:
kasan: test: fix compatibility with FORTIFY_SOURCE
Peter Collingbourne <pcc@google.com>:
mm, kasan: use compare-exchange operation to set KASAN page tag
Subsystem: mm/psi
Suren Baghdasaryan <surenb@google.com>:
psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n
psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n
Subsystem: ocfs2
Joseph Qi <joseph.qi@linux.alibaba.com>:
Patch series "ocfs2: fix a deadlock case":
jbd2: export jbd2_journal_[grab|put]_journal_head
ocfs2: fix a deadlock when commit trans
arch/ia64/Kconfig | 2
fs/binfmt_misc.c | 8 +--
fs/jbd2/journal.c | 2
fs/ocfs2/suballoc.c | 25 ++++-------
include/linux/mm.h | 17 +++++--
include/linux/mm_types.h | 1
include/linux/psi.h | 11 ++--
include/linux/sysctl.h | 2
kernel/sched/psi.c | 79 ++++++++++++++++++-----------------
lib/test_kasan.c | 5 ++
mm/memory-failure.c | 6 ++
tools/testing/scatterlist/linux/mm.h | 3 -
12 files changed, 91 insertions(+), 70 deletions(-)
* incoming
@ 2022-01-22 6:10 Andrew Morton
From: Andrew Morton @ 2022-01-22 6:10 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
This is the post-linux-next queue: material which was based on or
dependent upon material which was in -next.
69 patches, based on 9b57f458985742bd1c585f4c7f36d04634ce1143.
Subsystems affected by this patch series:
mm/migration
sysctl
mm/zsmalloc
proc
lib
Subsystem: mm/migration
Alistair Popple <apopple@nvidia.com>:
mm/migrate.c: rework migration_entry_wait() to not take a pageref
Subsystem: sysctl
Xiaoming Ni <nixiaoming@huawei.com>:
Patch series "sysctl: first set of kernel/sysctl cleanups", v2:
sysctl: add a new register_sysctl_init() interface
sysctl: move some boundary constants from sysctl.c to sysctl_vals
hung_task: move hung_task sysctl interface to hung_task.c
watchdog: move watchdog sysctl interface to watchdog.c
Stephen Kitt <steve@sk2.org>:
sysctl: make ngroups_max const
Xiaoming Ni <nixiaoming@huawei.com>:
sysctl: use const for typically used max/min proc sysctls
sysctl: use SYSCTL_ZERO to replace some static int zero uses
aio: move aio sysctl to aio.c
dnotify: move dnotify sysctl to dnotify.c
Luis Chamberlain <mcgrof@kernel.org>:
Patch series "sysctl: second set of kernel/sysctl cleanups", v2:
hpet: simplify subdirectory registration with register_sysctl()
i915: simplify subdirectory registration with register_sysctl()
macintosh/mac_hid.c: simplify subdirectory registration with register_sysctl()
ocfs2: simplify subdirectory registration with register_sysctl()
test_sysctl: simplify subdirectory registration with register_sysctl()
Xiaoming Ni <nixiaoming@huawei.com>:
inotify: simplify subdirectory registration with register_sysctl()
Luis Chamberlain <mcgrof@kernel.org>:
cdrom: simplify subdirectory registration with register_sysctl()
Xiaoming Ni <nixiaoming@huawei.com>:
eventpoll: simplify sysctl declaration with register_sysctl()
Patch series "sysctl: 3rd set of kernel/sysctl cleanups", v2:
firmware_loader: move firmware sysctl to its own files
random: move the random sysctl declarations to its own file
Luis Chamberlain <mcgrof@kernel.org>:
sysctl: add helper to register a sysctl mount point
fs: move binfmt_misc sysctl to its own file
Xiaoming Ni <nixiaoming@huawei.com>:
printk: move printk sysctl to printk/sysctl.c
scsi/sg: move sg-big-buff sysctl to scsi/sg.c
stackleak: move stack_erasing sysctl to stackleak.c
Luis Chamberlain <mcgrof@kernel.org>:
sysctl: share unsigned long const values
Patch series "sysctl: 4th set of kernel/sysctl cleanups":
fs: move inode sysctls to its own file
fs: move fs stat sysctls to file_table.c
fs: move dcache sysctls to its own file
sysctl: move maxolduid as a sysctl specific const
fs: move shared sysctls to fs/sysctls.c
fs: move locking sysctls where they are used
fs: move namei sysctls to its own file
fs: move fs/exec.c sysctls into its own file
fs: move pipe sysctls to is own file
Patch series "sysctl: add and use base directory declarer and registration helper":
sysctl: add and use base directory declarer and registration helper
fs: move namespace sysctls and declare fs base directory
kernel/sysctl.c: rename sysctl_init() to sysctl_init_bases()
Xiaoming Ni <nixiaoming@huawei.com>:
printk: fix build warning when CONFIG_PRINTK=n
fs/coredump: move coredump sysctls into its own file
kprobe: move sysctl_kprobes_optimization to kprobes.c
Colin Ian King <colin.i.king@gmail.com>:
kernel/sysctl.c: remove unused variable ten_thousand
Baokun Li <libaokun1@huawei.com>:
sysctl: returns -EINVAL when a negative value is passed to proc_doulongvec_minmax
Subsystem: mm/zsmalloc
Minchan Kim <minchan@kernel.org>:
Patch series "zsmalloc: remove bit_spin_lock", v2:
zsmalloc: introduce some helper functions
zsmalloc: rename zs_stat_type to class_stat_type
zsmalloc: decouple class actions from zspage works
zsmalloc: introduce obj_allocated
zsmalloc: move huge compressed obj from page to zspage
zsmalloc: remove zspage isolation for migration
locking/rwlocks: introduce write_lock_nested
zsmalloc: replace per zpage lock with pool->migrate_lock
Mike Galbraith <umgwanakikbuti@gmail.com>:
zsmalloc: replace get_cpu_var with local_lock
Subsystem: proc
Muchun Song <songmuchun@bytedance.com>:
fs: proc: store PDE()->data into inode->i_private
proc: remove PDE_DATA() completely
Subsystem: lib
Vlastimil Babka <vbabka@suse.cz>:
lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()
lib/stackdepot: fix spelling mistake and grammar in pr_err message
lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup
lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup3
lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup4
Marco Elver <elver@google.com>:
lib/stackdepot: always do filter_irq_stacks() in stack_depot_save()
Christoph Hellwig <hch@lst.de>:
Patch series "remove Xen tmem leftovers":
mm: remove cleancache
frontswap: remove frontswap_writethrough
frontswap: remove frontswap_tmem_exclusive_gets
frontswap: remove frontswap_shrink
frontswap: remove frontswap_curr_pages
frontswap: simplify frontswap_init
frontswap: remove the frontswap exports
mm: simplify try_to_unuse
frontswap: remove frontswap_test
frontswap: simplify frontswap_register_ops
mm: mark swap_lock and swap_active_head static
frontswap: remove support for multiple ops
mm: hide the FRONTSWAP Kconfig symbol
Documentation/vm/cleancache.rst | 296 ------
Documentation/vm/frontswap.rst | 31
Documentation/vm/index.rst | 1
MAINTAINERS | 7
arch/alpha/kernel/srm_env.c | 4
arch/arm/configs/bcm2835_defconfig | 1
arch/arm/configs/qcom_defconfig | 1
arch/arm/kernel/atags_proc.c | 2
arch/arm/mm/alignment.c | 2
arch/ia64/kernel/salinfo.c | 10
arch/m68k/configs/amiga_defconfig | 1
arch/m68k/configs/apollo_defconfig | 1
arch/m68k/configs/atari_defconfig | 1
arch/m68k/configs/bvme6000_defconfig | 1
arch/m68k/configs/hp300_defconfig | 1
arch/m68k/configs/mac_defconfig | 1
arch/m68k/configs/multi_defconfig | 1
arch/m68k/configs/mvme147_defconfig | 1
arch/m68k/configs/mvme16x_defconfig | 1
arch/m68k/configs/q40_defconfig | 1
arch/m68k/configs/sun3_defconfig | 1
arch/m68k/configs/sun3x_defconfig | 1
arch/powerpc/kernel/proc_powerpc.c | 4
arch/s390/configs/debug_defconfig | 1
arch/s390/configs/defconfig | 1
arch/sh/mm/alignment.c | 4
arch/xtensa/platforms/iss/simdisk.c | 4
block/bdev.c | 5
drivers/acpi/proc.c | 2
drivers/base/firmware_loader/fallback.c | 7
drivers/base/firmware_loader/fallback.h | 11
drivers/base/firmware_loader/fallback_table.c | 25
drivers/cdrom/cdrom.c | 23
drivers/char/hpet.c | 22
drivers/char/random.c | 14
drivers/gpu/drm/drm_dp_mst_topology.c | 1
drivers/gpu/drm/drm_mm.c | 4
drivers/gpu/drm/drm_modeset_lock.c | 9
drivers/gpu/drm/i915/i915_perf.c | 22
drivers/gpu/drm/i915/intel_runtime_pm.c | 3
drivers/hwmon/dell-smm-hwmon.c | 4
drivers/macintosh/mac_hid.c | 24
drivers/net/bonding/bond_procfs.c | 8
drivers/net/wireless/cisco/airo.c | 22
drivers/net/wireless/intersil/hostap/hostap_ap.c | 16
drivers/net/wireless/intersil/hostap/hostap_download.c | 2
drivers/net/wireless/intersil/hostap/hostap_proc.c | 24
drivers/net/wireless/ray_cs.c | 2
drivers/nubus/proc.c | 36
drivers/parisc/led.c | 4
drivers/pci/proc.c | 10
drivers/platform/x86/thinkpad_acpi.c | 4
drivers/platform/x86/toshiba_acpi.c | 16
drivers/pnp/isapnp/proc.c | 2
drivers/pnp/pnpbios/proc.c | 4
drivers/scsi/scsi_proc.c | 4
drivers/scsi/sg.c | 35
drivers/usb/gadget/function/rndis.c | 4
drivers/zorro/proc.c | 2
fs/Makefile | 4
fs/afs/proc.c | 6
fs/aio.c | 31
fs/binfmt_misc.c | 6
fs/btrfs/extent_io.c | 10
fs/btrfs/super.c | 2
fs/coredump.c | 66 +
fs/dcache.c | 37
fs/eventpoll.c | 10
fs/exec.c | 145 +--
fs/ext4/mballoc.c | 14
fs/ext4/readpage.c | 6
fs/ext4/super.c | 3
fs/f2fs/data.c | 13
fs/file_table.c | 47 -
fs/inode.c | 39
fs/jbd2/journal.c | 2
fs/locks.c | 34
fs/mpage.c | 7
fs/namei.c | 58 +
fs/namespace.c | 24
fs/notify/dnotify/dnotify.c | 21
fs/notify/fanotify/fanotify_user.c | 10
fs/notify/inotify/inotify_user.c | 11
fs/ntfs3/ntfs_fs.h | 1
fs/ocfs2/stackglue.c | 25
fs/ocfs2/super.c | 2
fs/pipe.c | 64 +
fs/proc/generic.c | 6
fs/proc/inode.c | 1
fs/proc/internal.h | 5
fs/proc/proc_net.c | 8
fs/proc/proc_sysctl.c | 67 +
fs/super.c | 3
fs/sysctls.c | 47 -
include/linux/aio.h | 4
include/linux/cleancache.h | 124 --
include/linux/coredump.h | 10
include/linux/dcache.h | 10
include/linux/dnotify.h | 1
include/linux/fanotify.h | 2
include/linux/frontswap.h | 35
include/linux/fs.h | 18
include/linux/inotify.h | 3
include/linux/kprobes.h | 6
include/linux/migrate.h | 2
include/linux/mount.h | 3
include/linux/pipe_fs_i.h | 4
include/linux/poll.h | 2
include/linux/printk.h | 4
include/linux/proc_fs.h | 17
include/linux/ref_tracker.h | 2
include/linux/rwlock.h | 6
include/linux/rwlock_api_smp.h | 8
include/linux/rwlock_rt.h | 10
include/linux/sched/sysctl.h | 14
include/linux/seq_file.h | 2
include/linux/shmem_fs.h | 3
include/linux/spinlock_api_up.h | 1
include/linux/stackdepot.h | 25
include/linux/stackleak.h | 5
include/linux/swapfile.h | 3
include/linux/sysctl.h | 67 +
include/scsi/sg.h | 4
init/main.c | 9
ipc/util.c | 2
kernel/hung_task.c | 81 +
kernel/irq/proc.c | 8
kernel/kprobes.c | 30
kernel/locking/spinlock.c | 10
kernel/locking/spinlock_rt.c | 12
kernel/printk/Makefile | 5
kernel/printk/internal.h | 8
kernel/printk/printk.c | 4
kernel/printk/sysctl.c | 85 +
kernel/resource.c | 4
kernel/stackleak.c | 26
kernel/sysctl.c | 790 +----------------
kernel/watchdog.c | 101 ++
lib/Kconfig | 4
lib/Kconfig.kasan | 2
lib/stackdepot.c | 46
lib/test_sysctl.c | 22
mm/Kconfig | 40
mm/Makefile | 1
mm/cleancache.c | 315 ------
mm/filemap.c | 102 +-
mm/frontswap.c | 259 -----
mm/kasan/common.c | 1
mm/migrate.c | 38
mm/page_owner.c | 2
mm/shmem.c | 33
mm/swapfile.c | 90 -
mm/truncate.c | 15
mm/zsmalloc.c | 557 ++++-------
mm/zswap.c | 8
net/atm/proc.c | 4
net/bluetooth/af_bluetooth.c | 8
net/can/bcm.c | 2
net/can/proc.c | 2
net/core/neighbour.c | 6
net/core/pktgen.c | 6
net/ipv4/netfilter/ipt_CLUSTERIP.c | 6
net/ipv4/raw.c | 8
net/ipv4/tcp_ipv4.c | 2
net/ipv4/udp.c | 6
net/netfilter/x_tables.c | 10
net/netfilter/xt_hashlimit.c | 18
net/netfilter/xt_recent.c | 4
net/sunrpc/auth_gss/svcauth_gss.c | 4
net/sunrpc/cache.c | 24
net/sunrpc/stats.c | 2
sound/core/info.c | 4
172 files changed, 1877 insertions(+), 2931 deletions(-)
* incoming
@ 2022-01-20 2:07 Andrew Morton
From: Andrew Morton @ 2022-01-20 2:07 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
55 patches, based on df0cc57e057f18e44dac8e6c18aba47ab53202f9 ("Linux 5.16")
Subsystems affected by this patch series:
percpu
procfs
sysctl
misc
core-kernel
get_maintainer
lib
checkpatch
binfmt
nilfs2
hfs
fat
adfs
panic
delayacct
kconfig
kcov
ubsan
Subsystem: percpu
Kefeng Wang <wangkefeng.wang@huawei.com>:
Patch series "mm: percpu: Cleanup percpu first chunk function":
mm: percpu: generalize percpu related config
mm: percpu: add pcpu_fc_cpu_to_node_fn_t typedef
mm: percpu: add generic pcpu_fc_alloc/free funciton
mm: percpu: add generic pcpu_populate_pte() function
Subsystem: procfs
David Hildenbrand <david@redhat.com>:
proc/vmcore: don't fake reading zeroes on surprise vmcore_cb unregistration
Hans de Goede <hdegoede@redhat.com>:
proc: make the proc_create[_data]() stubs static inlines
Qi Zheng <zhengqi.arch@bytedance.com>:
proc: convert the return type of proc_fd_access_allowed() to be boolean
Subsystem: sysctl
Geert Uytterhoeven <geert+renesas@glider.be>:
sysctl: fix duplicate path separator in printed entries
luo penghao <luo.penghao@zte.com.cn>:
sysctl: remove redundant ret assignment
Subsystem: misc
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
include/linux/unaligned: replace kernel.h with the necessary inclusions
kernel.h: include a note to discourage people from including it in headers
Subsystem: core-kernel
Yafang Shao <laoar.shao@gmail.com>:
Patch series "task comm cleanups", v2:
fs/exec: replace strlcpy with strscpy_pad in __set_task_comm
fs/exec: replace strncpy with strscpy_pad in __get_task_comm
drivers/infiniband: replace open-coded string copy with get_task_comm
fs/binfmt_elf: replace open-coded string copy with get_task_comm
samples/bpf/test_overhead_kprobe_kern: replace bpf_probe_read_kernel with bpf_probe_read_kernel_str to get task comm
tools/bpf/bpftool/skeleton: replace bpf_probe_read_kernel with bpf_probe_read_kernel_str to get task comm
tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN
kthread: dynamically allocate memory to store kthread's full name
Davidlohr Bueso <dave@stgolabs.net>:
kernel/sys.c: only take tasklist_lock for get/setpriority(PRIO_PGRP)
Subsystem: get_maintainer
Randy Dunlap <rdunlap@infradead.org>:
get_maintainer: don't remind about no git repo when --nogit is used
Subsystem: lib
Alexey Dobriyan <adobriyan@gmail.com>:
kstrtox: uninline everything
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
list: introduce list_is_head() helper and re-use it in list.h
Zhen Lei <thunder.leizhen@huawei.com>:
lib/list_debug.c: print more list debugging context in __list_del_entry_valid()
Isabella Basso <isabbasso@riseup.net>:
Patch series "test_hash.c: refactor into KUnit", v3:
hash.h: remove unused define directive
test_hash.c: split test_int_hash into arch-specific functions
test_hash.c: split test_hash_init
lib/Kconfig.debug: properly split hash test kernel entries
test_hash.c: refactor into kunit
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
kunit: replace kernel.h with the necessary inclusions
uuid: discourage people from using UAPI header in new code
uuid: remove licence boilerplate text from the header
Andrey Konovalov <andreyknvl@google.com>:
lib/test_meminit: destroy cache in kmem_cache_alloc_bulk() test
Subsystem: checkpatch
Jerome Forissier <jerome@forissier.org>:
checkpatch: relax regexp for COMMIT_LOG_LONG_LINE
Joe Perches <joe@perches.com>:
checkpatch: improve Kconfig help test
Rikard Falkeborn <rikard.falkeborn@gmail.com>:
const_structs.checkpatch: add frequently used ops structs
Subsystem: binfmt
"H.J. Lu" <hjl.tools@gmail.com>:
fs/binfmt_elf: use PT_LOAD p_align values for static PIE
Subsystem: nilfs2
Colin Ian King <colin.i.king@gmail.com>:
nilfs2: remove redundant pointer sbufs
Subsystem: hfs
Kees Cook <keescook@chromium.org>:
hfsplus: use struct_group_attr() for memcpy() region
Subsystem: fat
"NeilBrown" <neilb@suse.de>:
FAT: use io_schedule_timeout() instead of congestion_wait()
Subsystem: adfs
Minghao Chi <chi.minghao@zte.com.cn>:
fs/adfs: remove unneeded variable make code cleaner
Subsystem: panic
Marco Elver <elver@google.com>:
panic: use error_report_end tracepoint on warnings
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
panic: remove oops_id
Subsystem: delayacct
Yang Yang <yang.yang29@zte.com.cn>:
delayacct: support swapin delay accounting for swapping without blkio
delayacct: fix incomplete disable operation when switch enable to disable
delayacct: cleanup flags in struct task_delay_info and functions use it
wangyong <wang.yong12@zte.com.cn>:
Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
delayacct: track delays from memory compact
Subsystem: kconfig
Qian Cai <quic_qiancai@quicinc.com>:
configs: introduce debug.config for CI-like setup
Nathan Chancellor <nathan@kernel.org>:
Patch series "Fix CONFIG_TEST_KMOD with 256kB page size":
arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
btrfs: use generic Kconfig option for 256kB page size limit
lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
Subsystem: kcov
Marco Elver <elver@google.com>:
kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
Subsystem: ubsan
Kees Cook <keescook@chromium.org>:
ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
Colin Ian King <colin.i.king@gmail.com>:
lib: remove redundant assignment to variable ret
Documentation/accounting/delay-accounting.rst | 63 +-
arch/Kconfig | 4
arch/arm64/Kconfig | 20
arch/ia64/Kconfig | 9
arch/mips/Kconfig | 10
arch/mips/mm/init.c | 28 -
arch/powerpc/Kconfig | 17
arch/powerpc/kernel/setup_64.c | 113 ----
arch/riscv/Kconfig | 10
arch/sparc/Kconfig | 12
arch/sparc/kernel/led.c | 8
arch/sparc/kernel/smp_64.c | 119 -----
arch/x86/Kconfig | 19
arch/x86/kernel/setup_percpu.c | 82 ---
drivers/base/arch_numa.c | 78 ---
drivers/infiniband/hw/qib/qib.h | 2
drivers/infiniband/hw/qib/qib_file_ops.c | 2
drivers/infiniband/sw/rxe/rxe_qp.c | 3
drivers/net/wireless/broadcom/brcm80211/brcmfmac/xtlv.c | 2
fs/adfs/inode.c | 4
fs/binfmt_elf.c | 6
fs/btrfs/Kconfig | 3
fs/exec.c | 5
fs/fat/file.c | 5
fs/hfsplus/hfsplus_raw.h | 12
fs/hfsplus/xattr.c | 4
fs/nilfs2/page.c | 4
fs/proc/array.c | 3
fs/proc/base.c | 4
fs/proc/proc_sysctl.c | 9
fs/proc/vmcore.c | 10
include/kunit/assert.h | 2
include/linux/delayacct.h | 107 ++--
include/linux/elfcore-compat.h | 5
include/linux/elfcore.h | 5
include/linux/hash.h | 5
include/linux/kernel.h | 9
include/linux/kthread.h | 1
include/linux/list.h | 36 -
include/linux/percpu.h | 21
include/linux/proc_fs.h | 12
include/linux/sched.h | 9
include/linux/unaligned/packed_struct.h | 2
include/trace/events/error_report.h | 8
include/uapi/linux/taskstats.h | 6
include/uapi/linux/uuid.h | 10
kernel/configs/debug.config | 105 ++++
kernel/delayacct.c | 49 +-
kernel/kthread.c | 32 +
kernel/panic.c | 21
kernel/sys.c | 16
lib/Kconfig.debug | 45 +
lib/Kconfig.ubsan | 13
lib/Makefile | 5
lib/asn1_encoder.c | 2
lib/kstrtox.c | 12
lib/list_debug.c | 8
lib/lz4/lz4defs.h | 2
lib/test_hash.c | 375 +++++++---------
lib/test_meminit.c | 1
lib/test_ubsan.c | 22
mm/Kconfig | 12
mm/memory.c | 4
mm/page_alloc.c | 3
mm/page_io.c | 3
mm/percpu.c | 168 +++++--
samples/bpf/offwaketime_kern.c | 4
samples/bpf/test_overhead_kprobe_kern.c | 11
samples/bpf/test_overhead_tp_kern.c | 5
scripts/Makefile.ubsan | 1
scripts/checkpatch.pl | 54 +-
scripts/const_structs.checkpatch | 23
scripts/get_maintainer.pl | 2
tools/accounting/getdelays.c | 8
tools/bpf/bpftool/skeleton/pid_iter.bpf.c | 4
tools/include/linux/hash.h | 5
tools/testing/selftests/bpf/progs/test_stacktrace_map.c | 6
tools/testing/selftests/bpf/progs/test_tracepoint.c | 6
78 files changed, 943 insertions(+), 992 deletions(-)
* incoming
@ 2022-01-14 22:02 Andrew Morton
From: Andrew Morton @ 2022-01-14 22:02 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
146 patches, based on df0cc57e057f18e44dac8e6c18aba47ab53202f9 ("Linux 5.16")
Subsystems affected by this patch series:
kthread
ia64
scripts
ntfs
squashfs
ocfs2
vfs
mm/slab-generic
mm/slab
mm/kmemleak
mm/dax
mm/kasan
mm/debug
mm/pagecache
mm/gup
mm/shmem
mm/frontswap
mm/memremap
mm/memcg
mm/selftests
mm/pagemap
mm/dma
mm/vmalloc
mm/memory-failure
mm/hugetlb
mm/userfaultfd
mm/vmscan
mm/mempolicy
mm/oom-kill
mm/hugetlbfs
mm/migration
mm/thp
mm/ksm
mm/page-poison
mm/percpu
mm/rmap
mm/zswap
mm/zram
mm/cleanups
mm/hmm
mm/damon
Subsystem: kthread
Cai Huoqing <caihuoqing@baidu.com>:
kthread: add the helper function kthread_run_on_cpu()
RDMA/siw: make use of the helper function kthread_run_on_cpu()
ring-buffer: make use of the helper function kthread_run_on_cpu()
rcutorture: make use of the helper function kthread_run_on_cpu()
trace/osnoise: make use of the helper function kthread_run_on_cpu()
trace/hwlat: make use of the helper function kthread_run_on_cpu()
Subsystem: ia64
Yang Guang <yang.guang5@zte.com.cn>:
ia64: module: use swap() to make code cleaner
arch/ia64/kernel/setup.c: use swap() to make code cleaner
Jason Wang <wangborong@cdjrlc.com>:
ia64: fix typo in a comment
Greg Kroah-Hartman <gregkh@linuxfoundation.org>:
ia64: topology: use default_groups in kobj_type
Subsystem: scripts
Drew Fustini <dfustini@baylibre.com>:
scripts/spelling.txt: add "oveflow"
Subsystem: ntfs
Yang Li <yang.lee@linux.alibaba.com>:
fs/ntfs/attrib.c: fix one kernel-doc comment
Subsystem: squashfs
Zheng Liang <zhengliang6@huawei.com>:
squashfs: provide backing_dev_info in order to disable read-ahead
Subsystem: ocfs2
Zhang Mingyu <zhang.mingyu@zte.com.cn>:
ocfs2: use BUG_ON instead of if condition followed by BUG.
Joseph Qi <joseph.qi@linux.alibaba.com>:
ocfs2: clearly handle ocfs2_grab_pages_for_write() return value
Greg Kroah-Hartman <gregkh@linuxfoundation.org>:
ocfs2: use default_groups in kobj_type
Colin Ian King <colin.i.king@gmail.com>:
ocfs2: remove redundant assignment to pointer root_bh
Greg Kroah-Hartman <gregkh@linuxfoundation.org>:
ocfs2: cluster: use default_groups in kobj_type
Colin Ian King <colin.i.king@gmail.com>:
ocfs2: remove redundant assignment to variable free_space
Subsystem: vfs
Amit Daniel Kachhap <amit.kachhap@arm.com>:
fs/ioctl: remove unnecessary __user annotation
Subsystem: mm/slab-generic
Marco Elver <elver@google.com>:
mm/slab_common: use WARN() if cache still has objects on destroy
Subsystem: mm/slab
Muchun Song <songmuchun@bytedance.com>:
mm: slab: make slab iterator functions static
Subsystem: mm/kmemleak
Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>:
kmemleak: fix kmemleak false positive report with HW tag-based kasan enable
Calvin Zhang <calvinzhang.cool@gmail.com>:
mm: kmemleak: alloc gray object for reserved region with direct map
Kefeng Wang <wangkefeng.wang@huawei.com>:
mm: defer kmemleak object creation of module_alloc()
Subsystem: mm/dax
Joao Martins <joao.m.martins@oracle.com>:
Patch series "mm, device-dax: Introduce compound pages in devmap", v7:
mm/page_alloc: split prep_compound_page into head and tail subparts
mm/page_alloc: refactor memmap_init_zone_device() page init
mm/memremap: add ZONE_DEVICE support for compound pages
device-dax: use ALIGN() for determining pgoff
device-dax: use struct_size()
device-dax: ensure dev_dax->pgmap is valid for dynamic devices
device-dax: factor out page mapping initialization
device-dax: set mapping prior to vmf_insert_pfn{,_pmd,pud}()
device-dax: remove pfn from __dev_dax_{pte,pmd,pud}_fault()
device-dax: compound devmap support
Subsystem: mm/kasan
Marco Elver <elver@google.com>:
kasan: test: add globals left-out-of-bounds test
kasan: add ability to detect double-kmem_cache_destroy()
kasan: test: add test case for double-kmem_cache_destroy()
Andrey Konovalov <andreyknvl@google.com>:
kasan: fix quarantine conflicting with init_on_free
Subsystem: mm/debug
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm,fs: split dump_mapping() out from dump_page()
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/debug_vm_pgtable: update comments regarding migration swap entries
Subsystem: mm/pagecache
chiminghao <chi.minghao@zte.com.cn>:
mm/truncate.c: remove unneeded variable
Subsystem: mm/gup
Christophe Leroy <christophe.leroy@csgroup.eu>:
gup: avoid multiple user access locking/unlocking in fault_in_{read/write}able
Li Xinhai <lixinhai.lxh@gmail.com>:
mm/gup.c: stricter check on THP migration entry during follow_pmd_mask
Subsystem: mm/shmem
Yang Shi <shy828301@gmail.com>:
mm: shmem: don't truncate page if memory failure happens
Gang Li <ligang.bdlg@bytedance.com>:
shmem: fix a race between shmem_unused_huge_shrink and shmem_evict_inode
Subsystem: mm/frontswap
Christophe JAILLET <christophe.jaillet@wanadoo.fr>:
mm/frontswap.c: use non-atomic '__set_bit()' when possible
Subsystem: mm/memremap
Subsystem: mm/memcg
Muchun Song <songmuchun@bytedance.com>:
mm: memcontrol: make cgroup_memory_nokmem static
Donghai Qiao <dqiao@redhat.com>:
mm/page_counter: remove an incorrect call to propagate_protected_usage()
Dan Schatzberg <schatzberg.dan@gmail.com>:
mm/memcg: add oom_group_kill memory event
Shakeel Butt <shakeelb@google.com>:
memcg: better bounds on the memcg stats updates
Wang Weiyang <wangweiyang2@huawei.com>:
mm/memcg: use struct_size() helper in kzalloc()
Shakeel Butt <shakeelb@google.com>:
memcg: add per-memcg vmalloc stat
Subsystem: mm/selftests
chiminghao <chi.minghao@zte.com.cn>:
tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner
Subsystem: mm/pagemap
Qi Zheng <zhengqi.arch@bytedance.com>:
mm: remove redundant check about FAULT_FLAG_ALLOW_RETRY bit
Colin Cross <ccross@google.com>:
Patch series "mm: rearrange madvise code to allow for reuse", v11:
mm: rearrange madvise code to allow for reuse
mm: add a field to store names for private anonymous memory
Suren Baghdasaryan <surenb@google.com>:
mm: add anonymous vma name refcounting
Arnd Bergmann <arnd@arndb.de>:
mm: move anon_vma declarations to linux/mm_inline.h
mm: move tlb_flush_pending inline helpers to mm_inline.h
Suren Baghdasaryan <surenb@google.com>:
mm: protect free_pgtables with mmap_lock write lock in exit_mmap
mm: document locking restrictions for vm_operations_struct::close
mm/oom_kill: allow process_mrelease to run under mmap_lock protection
Shuah Khan <skhan@linuxfoundation.org>:
docs/vm: add vmalloced-kernel-stacks document
Pasha Tatashin <pasha.tatashin@soleen.com>:
Patch series "page table check", v3:
mm: change page type prior to adding page table entry
mm: ptep_clear() page table helper
mm: page table check
x86: mm: add x86_64 support for page table check
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: remove last argument of reuse_swap_page()
mm: remove the total_mapcount argument from page_trans_huge_map_swapcount()
mm: remove the total_mapcount argument from page_trans_huge_mapcount()
Subsystem: mm/dma
Christian König <christian.koenig@amd.com>:
mm/dmapool.c: revert "make dma pool to use kmalloc_node"
Subsystem: mm/vmalloc
Michal Hocko <mhocko@suse.com>:
Patch series "extend vmalloc support for constrained allocations", v2:
mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc
mm/vmalloc: add support for __GFP_NOFAIL
mm/vmalloc: be more explicit about supported gfp flags.
mm: allow !GFP_KERNEL allocations for kvmalloc
mm: make slab and vmalloc allocators __GFP_NOLOCKDEP aware
"NeilBrown" <neilb@suse.de>:
mm: introduce memalloc_retry_wait()
Suren Baghdasaryan <surenb@google.com>:
mm/pagealloc: sysctl: change watermark_scale_factor max limit to 30%
Changcheng Deng <deng.changcheng@zte.com.cn>:
mm: fix boolreturn.cocci warning
Xiongwei Song <sxwjean@gmail.com>:
mm: page_alloc: fix building error on -Werror=array-compare
Michal Hocko <mhocko@suse.com>:
mm: drop node from alloc_pages_vma
Miles Chen <miles.chen@mediatek.com>:
include/linux/gfp.h: further document GFP_DMA32
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/page_alloc.c: modify the comment section for alloc_contig_pages()
Baoquan He <bhe@redhat.com>:
Patch series "Handle warning of allocation failure on DMA zone w/o managed pages", v4:
mm_zone: add function to check if managed dma zone exists
dma/pool: create dma atomic pool only if dma zone has managed pages
mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages
Subsystem: mm/memory-failure
Subsystem: mm/hugetlb
Mina Almasry <almasrymina@google.com>:
hugetlb: add hugetlb.*.numa_stat file
Yosry Ahmed <yosryahmed@google.com>:
mm, hugepages: make memory size variable in hugepage-mremap selftest
Yang Yang <yang.yang29@zte.com.cn>:
mm/vmstat: add events for THP max_ptes_* exceeds
Waiman Long <longman@redhat.com>:
selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting
Subsystem: mm/userfaultfd
Peter Xu <peterx@redhat.com>:
selftests/uffd: allow EINTR/EAGAIN
Mike Kravetz <mike.kravetz@oracle.com>:
userfaultfd/selftests: clean up hugetlb allocation code
Subsystem: mm/vmscan
Gang Li <ligang.bdlg@bytedance.com>:
vmscan: make drop_slab_node static
Chen Wandun <chenwandun@huawei.com>:
mm/page_isolation: unset migratetype directly for non Buddy page
Subsystem: mm/mempolicy
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
Patch series "mm: add new syscall set_mempolicy_home_node", v6:
mm/mempolicy: use policy_node helper with MPOL_PREFERRED_MANY
mm/mempolicy: add set_mempolicy_home_node syscall
mm/mempolicy: wire up syscall set_mempolicy_home_node
Randy Dunlap <rdunlap@infradead.org>:
mm/mempolicy: fix all kernel-doc warnings
Subsystem: mm/oom-kill
Jann Horn <jannh@google.com>:
mm, oom: OOM sysrq should always kill a process
Subsystem: mm/hugetlbfs
Sean Christopherson <seanjc@google.com>:
hugetlbfs: fix off-by-one error in hugetlb_vmdelete_list()
Subsystem: mm/migration
Baolin Wang <baolin.wang@linux.alibaba.com>:
Patch series "Improve the migration stats":
mm: migrate: fix the return value of migrate_pages()
mm: migrate: correct the hugetlb migration stats
mm: compaction: fix the migration stats in trace_mm_compaction_migratepages()
mm: migrate: support multiple target nodes demotion
mm: migrate: add more comments for selecting target node randomly
Huang Ying <ying.huang@intel.com>:
mm/migrate: move node demotion code to near its user
Colin Ian King <colin.i.king@gmail.com>:
mm/migrate: remove redundant variables used in a for-loop
Subsystem: mm/thp
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/thp: drop unused trace events hugepage_[invalidate|splitting]
Subsystem: mm/ksm
Nanyong Sun <sunnanyong@huawei.com>:
mm: ksm: fix use-after-free kasan report in ksm_might_need_to_copy
Subsystem: mm/page-poison
Naoya Horiguchi <naoya.horiguchi@nec.com>:
Patch series "mm/hwpoison: fix unpoison_memory()", v4:
mm/hwpoison: mf_mutex for soft offline and unpoison
mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE
mm/hwpoison: fix unpoison_memory()
Subsystem: mm/percpu
Qi Zheng <zhengqi.arch@bytedance.com>:
mm: memcg/percpu: account extra objcg space to memory cgroups
Subsystem: mm/rmap
Huang Ying <ying.huang@intel.com>:
mm/rmap: fix potential batched TLB flush race
Subsystem: mm/zswap
Zhaoyu Liu <zackary.liu.pro@gmail.com>:
zpool: remove the list of pools_head
Subsystem: mm/zram
Luis Chamberlain <mcgrof@kernel.org>:
zram: use ATTRIBUTE_GROUPS
Subsystem: mm/cleanups
Quanfa Fu <fuqf0919@gmail.com>:
mm: fix some comment errors
Ting Liu <liuting.0x7c00@bytedance.com>:
mm: make some vars and functions static or __init
Subsystem: mm/hmm
Alistair Popple <apopple@nvidia.com>:
mm/hmm.c: allow VM_MIXEDMAP to work with hmm_range_fault
Subsystem: mm/damon
Xin Hao <xhao@linux.alibaba.com>:
Patch series "mm/damon: Do some small changes", v4:
mm/damon: unified access_check function naming rules
mm/damon: add 'age' of region tracepoint support
mm/damon/core: use abs() instead of diff_of()
mm/damon: remove some unneeded function definitions in damon.h
Yihao Han <hanyihao@vivo.com>:
mm/damon/vaddr: remove swap_ranges() and replace it with swap()
Xin Hao <xhao@linux.alibaba.com>:
mm/damon/schemes: add the validity judgment of thresholds
mm/damon: move damon_rand() definition into damon.h
mm/damon: modify damon_rand() macro to static inline function
SeongJae Park <sj@kernel.org>:
Patch series "mm/damon: Misc cleanups":
mm/damon: convert macro functions to static inline functions
Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
Docs/admin-guide/mm/damon/usage: remove redundant information
Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
mm/damon: remove a mistakenly added comment for a future feature
Patch series "mm/damon/schemes: Extend stats for better online analysis and tuning":
mm/damon/schemes: account scheme actions that successfully applied
mm/damon/schemes: account how many times quota limit has exceeded
mm/damon/reclaim: provide reclamation statistics
Docs/admin-guide/mm/damon/reclaim: document statistics parameters
mm/damon/dbgfs: support all DAMOS stats
Docs/admin-guide/mm/damon/usage: update for schemes statistics
Baolin Wang <baolin.wang@linux.alibaba.com>:
mm/damon: add access checking for hugetlb pages
Guoqing Jiang <guoqing.jiang@linux.dev>:
mm/damon: move the implementation of damon_insert_region to damon.h
SeongJae Park <sj@kernel.org>:
Patch series "mm/damon: Hide unnecessary information disclosures":
mm/damon/dbgfs: remove an unnecessary variable
mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
mm/damon: hide kernel pointer from tracepoint event
Documentation/admin-guide/cgroup-v1/hugetlb.rst | 4
Documentation/admin-guide/cgroup-v2.rst | 11
Documentation/admin-guide/mm/damon/reclaim.rst | 25
Documentation/admin-guide/mm/damon/usage.rst | 235 +++++--
Documentation/admin-guide/mm/numa_memory_policy.rst | 16
Documentation/admin-guide/sysctl/vm.rst | 2
Documentation/filesystems/proc.rst | 6
Documentation/vm/arch_pgtable_helpers.rst | 20
Documentation/vm/index.rst | 2
Documentation/vm/page_migration.rst | 12
Documentation/vm/page_table_check.rst | 56 +
Documentation/vm/vmalloced-kernel-stacks.rst | 153 ++++
MAINTAINERS | 9
arch/Kconfig | 3
arch/alpha/kernel/syscalls/syscall.tbl | 1
arch/alpha/mm/fault.c | 16
arch/arc/mm/fault.c | 3
arch/arm/mm/fault.c | 2
arch/arm/tools/syscall.tbl | 1
arch/arm64/include/asm/unistd.h | 2
arch/arm64/include/asm/unistd32.h | 2
arch/arm64/kernel/module.c | 4
arch/arm64/mm/fault.c | 6
arch/hexagon/mm/vm_fault.c | 8
arch/ia64/kernel/module.c | 6
arch/ia64/kernel/setup.c | 5
arch/ia64/kernel/syscalls/syscall.tbl | 1
arch/ia64/kernel/topology.c | 3
arch/ia64/kernel/uncached.c | 2
arch/ia64/mm/fault.c | 16
arch/m68k/kernel/syscalls/syscall.tbl | 1
arch/m68k/mm/fault.c | 18
arch/microblaze/kernel/syscalls/syscall.tbl | 1
arch/microblaze/mm/fault.c | 18
arch/mips/kernel/syscalls/syscall_n32.tbl | 1
arch/mips/kernel/syscalls/syscall_n64.tbl | 1
arch/mips/kernel/syscalls/syscall_o32.tbl | 1
arch/mips/mm/fault.c | 19
arch/nds32/mm/fault.c | 16
arch/nios2/mm/fault.c | 18
arch/openrisc/mm/fault.c | 18
arch/parisc/kernel/syscalls/syscall.tbl | 1
arch/parisc/mm/fault.c | 18
arch/powerpc/kernel/syscalls/syscall.tbl | 1
arch/powerpc/mm/fault.c | 6
arch/riscv/mm/fault.c | 2
arch/s390/kernel/module.c | 5
arch/s390/kernel/syscalls/syscall.tbl | 1
arch/s390/mm/fault.c | 28
arch/sh/kernel/syscalls/syscall.tbl | 1
arch/sh/mm/fault.c | 18
arch/sparc/kernel/syscalls/syscall.tbl | 1
arch/sparc/mm/fault_32.c | 16
arch/sparc/mm/fault_64.c | 16
arch/um/kernel/trap.c | 8
arch/x86/Kconfig | 1
arch/x86/entry/syscalls/syscall_32.tbl | 1
arch/x86/entry/syscalls/syscall_64.tbl | 1
arch/x86/include/asm/pgtable.h | 31 -
arch/x86/kernel/module.c | 7
arch/x86/mm/fault.c | 3
arch/xtensa/kernel/syscalls/syscall.tbl | 1
arch/xtensa/mm/fault.c | 17
drivers/block/zram/zram_drv.c | 11
drivers/dax/bus.c | 32 +
drivers/dax/bus.h | 1
drivers/dax/device.c | 140 ++--
drivers/infiniband/sw/siw/siw_main.c | 7
drivers/of/fdt.c | 6
fs/ext4/extents.c | 8
fs/ext4/inline.c | 5
fs/ext4/page-io.c | 9
fs/f2fs/data.c | 4
fs/f2fs/gc.c | 5
fs/f2fs/inode.c | 4
fs/f2fs/node.c | 4
fs/f2fs/recovery.c | 6
fs/f2fs/segment.c | 9
fs/f2fs/super.c | 5
fs/hugetlbfs/inode.c | 7
fs/inode.c | 49 +
fs/ioctl.c | 2
fs/ntfs/attrib.c | 2
fs/ocfs2/alloc.c | 2
fs/ocfs2/aops.c | 26
fs/ocfs2/cluster/masklog.c | 11
fs/ocfs2/dir.c | 2
fs/ocfs2/filecheck.c | 3
fs/ocfs2/journal.c | 6
fs/proc/task_mmu.c | 13
fs/squashfs/super.c | 33 +
fs/userfaultfd.c | 8
fs/xfs/kmem.c | 3
fs/xfs/xfs_buf.c | 2
include/linux/ceph/libceph.h | 1
include/linux/damon.h | 93 +--
include/linux/fs.h | 1
include/linux/gfp.h | 12
include/linux/hugetlb.h | 4
include/linux/hugetlb_cgroup.h | 7
include/linux/kasan.h | 4
include/linux/kthread.h | 25
include/linux/memcontrol.h | 22
include/linux/mempolicy.h | 1
include/linux/memremap.h | 11
include/linux/mm.h | 76 --
include/linux/mm_inline.h | 136 ++++
include/linux/mm_types.h | 252 +++-----
include/linux/mmzone.h | 9
include/linux/page-flags.h | 6
include/linux/page_idle.h | 1
include/linux/page_table_check.h | 147 ++++
include/linux/pgtable.h | 8
include/linux/sched/mm.h | 26
include/linux/swap.h | 8
include/linux/syscalls.h | 3
include/linux/vm_event_item.h | 3
include/linux/vmalloc.h | 7
include/ras/ras_event.h | 2
include/trace/events/compaction.h | 24
include/trace/events/damon.h | 15
include/trace/events/thp.h | 35 -
include/uapi/asm-generic/unistd.h | 5
include/uapi/linux/prctl.h | 3
kernel/dma/pool.c | 4
kernel/fork.c | 3
kernel/kthread.c | 1
kernel/rcu/rcutorture.c | 7
kernel/sys.c | 63 ++
kernel/sys_ni.c | 1
kernel/sysctl.c | 3
kernel/trace/ring_buffer.c | 7
kernel/trace/trace_hwlat.c | 6
kernel/trace/trace_osnoise.c | 3
lib/test_hmm.c | 24
lib/test_kasan.c | 30
mm/Kconfig | 14
mm/Kconfig.debug | 24
mm/Makefile | 1
mm/compaction.c | 7
mm/damon/core.c | 45 -
mm/damon/dbgfs.c | 20
mm/damon/paddr.c | 24
mm/damon/prmtv-common.h | 4
mm/damon/reclaim.c | 46 +
mm/damon/vaddr.c | 186 ++++--
mm/debug.c | 52 -
mm/debug_vm_pgtable.c | 6
mm/dmapool.c | 2
mm/frontswap.c | 4
mm/gup.c | 31 -
mm/hmm.c | 5
mm/huge_memory.c | 32 -
mm/hugetlb.c | 6
mm/hugetlb_cgroup.c | 133 +++-
mm/internal.h | 7
mm/kasan/quarantine.c | 11
mm/kasan/shadow.c | 9
mm/khugepaged.c | 23
mm/kmemleak.c | 21
mm/ksm.c | 5
mm/madvise.c | 510 ++++++++++------
mm/mapping_dirty_helpers.c | 1
mm/memcontrol.c | 44 -
mm/memory-failure.c | 189 +++---
mm/memory.c | 12
mm/mempolicy.c | 95 ++-
mm/memremap.c | 18
mm/migrate.c | 527 ++++++++++-------
mm/mlock.c | 2
mm/mmap.c | 55 +
mm/mmu_gather.c | 1
mm/mprotect.c | 2
mm/oom_kill.c | 30
mm/page_alloc.c | 198 ++++--
mm/page_counter.c | 1
mm/page_ext.c | 8
mm/page_isolation.c | 2
mm/page_owner.c | 4
mm/page_table_check.c | 270 ++++++++
mm/percpu-internal.h | 18
mm/percpu.c | 10
mm/pgtable-generic.c | 1
mm/rmap.c | 43 +
mm/shmem.c | 91 ++
mm/slab.h | 5
mm/slab_common.c | 34 -
mm/swap.c | 2
mm/swapfile.c | 46 -
mm/truncate.c | 5
mm/userfaultfd.c | 5
mm/util.c | 15
mm/vmalloc.c | 75 +-
mm/vmscan.c | 2
mm/vmstat.c | 3
mm/zpool.c | 12
net/ceph/buffer.c | 4
net/ceph/ceph_common.c | 27
net/ceph/crypto.c | 2
net/ceph/messenger.c | 2
net/ceph/messenger_v2.c | 2
net/ceph/osdmap.c | 12
net/sunrpc/svc_xprt.c | 3
scripts/spelling.txt | 1
tools/testing/selftests/vm/charge_reserved_hugetlb.sh | 34 -
tools/testing/selftests/vm/hmm-tests.c | 42 +
tools/testing/selftests/vm/hugepage-mremap.c | 46 -
tools/testing/selftests/vm/hugetlb_reparenting_test.sh | 21
tools/testing/selftests/vm/run_vmtests.sh | 2
tools/testing/selftests/vm/userfaultfd.c | 33 -
tools/testing/selftests/vm/write_hugetlb_memory.sh | 2
211 files changed, 3980 insertions(+), 1759 deletions(-)
* incoming
@ 2021-12-31 4:12 Andrew Morton
From: Andrew Morton @ 2021-12-31 4:12 UTC
To: Linus Torvalds; +Cc: mm-commits, linux-mm
2 patches, based on 4f3d93c6eaff6b84e43b63e0d7a119c5920e1020.
Subsystems affected by this patch series:
mm/userfaultfd
mm/damon
Subsystem: mm/userfaultfd
Mike Kravetz <mike.kravetz@oracle.com>:
userfaultfd/selftests: fix hugetlb area allocations
Subsystem: mm/damon
SeongJae Park <sj@kernel.org>:
mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()'
mm/damon/dbgfs.c | 9 +++++++--
tools/testing/selftests/vm/userfaultfd.c | 16 ++++++++++------
2 files changed, 17 insertions(+), 8 deletions(-)
* incoming
@ 2021-12-25 5:11 Andrew Morton
From: Andrew Morton @ 2021-12-25 5:11 UTC
To: Linus Torvalds; +Cc: mm-commits, linux-mm
9 patches, based on bc491fb12513e79702c6f936c838f792b5389129.
Subsystems affected by this patch series:
mm/kfence
mm/mempolicy
core-kernel
MAINTAINERS
mm/memory-failure
mm/pagemap
mm/pagealloc
mm/damon
mm/memory-failure
Subsystem: mm/kfence
Baokun Li <libaokun1@huawei.com>:
kfence: fix memory leak when cat kfence objects
Subsystem: mm/mempolicy
Andrey Ryabinin <arbn@yandex-team.com>:
mm: mempolicy: fix THP allocations escaping mempolicy restrictions
Subsystem: core-kernel
Philipp Rudo <prudo@redhat.com>:
kernel/crash_core: suppress unknown crashkernel parameter warning
Subsystem: MAINTAINERS
Randy Dunlap <rdunlap@infradead.org>:
MAINTAINERS: mark more list instances as moderated
Subsystem: mm/memory-failure
Naoya Horiguchi <naoya.horiguchi@nec.com>:
mm, hwpoison: fix condition in free hugetlb page path
Subsystem: mm/pagemap
Hugh Dickins <hughd@google.com>:
mm: delete unsafe BUG from page_cache_add_speculative()
Subsystem: mm/pagealloc
Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr>:
mm/page_alloc: fix __alloc_size attribute for alloc_pages_exact_nid
Subsystem: mm/damon
SeongJae Park <sj@kernel.org>:
mm/damon/dbgfs: protect targets destructions with kdamond_lock
Subsystem: mm/memory-failure
Liu Shixin <liushixin2@huawei.com>:
mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()
MAINTAINERS | 4 ++--
include/linux/gfp.h | 2 +-
include/linux/pagemap.h | 1 -
kernel/crash_core.c | 11 +++++++++++
mm/damon/dbgfs.c | 2 ++
mm/kfence/core.c | 1 +
mm/memory-failure.c | 14 +++++---------
mm/mempolicy.c | 3 +--
8 files changed, 23 insertions(+), 15 deletions(-)
* incoming
@ 2021-12-10 22:45 Andrew Morton
From: Andrew Morton @ 2021-12-10 22:45 UTC
To: Linus Torvalds; +Cc: mm-commits, linux-mm
21 patches, based on c741e49150dbb0c0aebe234389f4aa8b47958fa8.
Subsystems affected by this patch series:
mm/mlock
MAINTAINERS
mailmap
mm/pagecache
mm/damon
mm/slub
mm/memcg
mm/hugetlb
mm/pagecache
Subsystem: mm/mlock
Drew DeVault <sir@cmpwn.com>:
Increase default MLOCK_LIMIT to 8 MiB
Subsystem: MAINTAINERS
Dave Young <dyoung@redhat.com>:
MAINTAINERS: update kdump maintainers
Subsystem: mailmap
Guo Ren <guoren@linux.alibaba.com>:
mailmap: update email address for Guo Ren
Subsystem: mm/pagecache
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
filemap: remove PageHWPoison check from next_uptodate_page()
Subsystem: mm/damon
SeongJae Park <sj@kernel.org>:
Patch series "mm/damon: Fix fake /proc/loadavg reports", v3:
timers: implement usleep_idle_range()
mm/damon/core: fix fake load reports due to uninterruptible sleeps
Patch series "mm/damon: Trivial fixups and improvements":
mm/damon/core: use better timer mechanisms selection threshold
mm/damon/dbgfs: remove an unnecessary error message
mm/damon/core: remove unnecessary error messages
mm/damon/vaddr: remove an unnecessary warning message
mm/damon/vaddr-test: split a test function having >1024 bytes frame size
mm/damon/vaddr-test: remove unnecessary variables
selftests/damon: skip test if DAMON is running
selftests/damon: test DAMON enabling with empty target_ids case
selftests/damon: test wrong DAMOS condition ranges input
selftests/damon: test debugfs file reads/writes with huge count
selftests/damon: split test cases
Subsystem: mm/slub
Gerald Schaefer <gerald.schaefer@linux.ibm.com>:
mm/slub: fix endianness bug for alloc/free_traces attributes
Subsystem: mm/memcg
Waiman Long <longman@redhat.com>:
mm/memcg: relocate mod_objcg_mlstate(), get_obj_stock() and put_obj_stock()
Subsystem: mm/hugetlb
Zhenguo Yao <yaozhenguo1@gmail.com>:
hugetlbfs: fix issue of preallocation of gigantic pages can't work
Subsystem: mm/pagecache
Manjong Lee <mj0123.lee@samsung.com>:
mm: bdi: initialize bdi_min_ratio when bdi is unregistered
.mailmap | 2
MAINTAINERS | 2
include/linux/delay.h | 14
include/uapi/linux/resource.h | 13
kernel/time/timer.c | 16 -
mm/backing-dev.c | 7
mm/damon/core.c | 20 -
mm/damon/dbgfs.c | 4
mm/damon/vaddr-test.h | 85 ++---
mm/damon/vaddr.c | 1
mm/filemap.c | 2
mm/hugetlb.c | 2
mm/memcontrol.c | 106 +++----
mm/slub.c | 15 -
tools/testing/selftests/damon/.gitignore | 2
tools/testing/selftests/damon/Makefile | 7
tools/testing/selftests/damon/_debugfs_common.sh | 52 +++
tools/testing/selftests/damon/debugfs_attrs.sh | 149 ++--------
tools/testing/selftests/damon/debugfs_empty_targets.sh | 13
tools/testing/selftests/damon/debugfs_huge_count_read_write.sh | 22 +
tools/testing/selftests/damon/debugfs_schemes.sh | 19 +
tools/testing/selftests/damon/debugfs_target_ids.sh | 19 +
tools/testing/selftests/damon/huge_count_read_write.c | 39 ++
23 files changed, 363 insertions(+), 248 deletions(-)
* incoming
@ 2021-11-20 0:42 Andrew Morton
From: Andrew Morton @ 2021-11-20 0:42 UTC
To: Linus Torvalds; +Cc: linux-mm, mm-commits
15 patches, based on a90af8f15bdc9449ee2d24e1d73fa3f7e8633f81.
Subsystems affected by this patch series:
mm/swap
ipc
mm/slab-generic
hexagon
mm/kmemleak
mm/hugetlb
mm/kasan
mm/damon
mm/highmem
proc
Subsystem: mm/swap
Matthew Wilcox <willy@infradead.org>:
mm/swap.c:put_pages_list(): reinitialise the page list
Subsystem: ipc
Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>:
Patch series "shm: shm_rmid_forced feature fixes":
ipc: WARN if trying to remove ipc object which is absent
shm: extend forced shm destroy to support objects from several IPC nses
Subsystem: mm/slab-generic
Yunfeng Ye <yeyunfeng@huawei.com>:
mm: emit the "free" trace report before freeing memory in kmem_cache_free()
Subsystem: hexagon
Nathan Chancellor <nathan@kernel.org>:
Patch series "Fixes for ARCH=hexagon allmodconfig", v2:
hexagon: export raw I/O routines for modules
hexagon: clean up timer-regs.h
hexagon: ignore vmlinux.lds
Subsystem: mm/kmemleak
Rustam Kovhaev <rkovhaev@gmail.com>:
mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag
Subsystem: mm/hugetlb
Bui Quang Minh <minhquangbui99@gmail.com>:
hugetlb: fix hugetlb cgroup refcounting during mremap
Mina Almasry <almasrymina@google.com>:
hugetlb, userfaultfd: fix reservation restore on userfaultfd error
Subsystem: mm/kasan
Kees Cook <keescook@chromium.org>:
kasan: test: silence intentional read overflow warnings
Subsystem: mm/damon
SeongJae Park <sj@kernel.org>:
Patch series "DAMON fixes":
mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation
mm/damon/dbgfs: fix missed use of damon_dbgfs_lock
Subsystem: mm/highmem
Ard Biesheuvel <ardb@kernel.org>:
kmap_local: don't assume kmap PTEs are linear arrays in memory
Subsystem: proc
David Hildenbrand <david@redhat.com>:
proc/vmcore: fix clearing user buffer by properly using clear_user()
arch/arm/Kconfig | 1
arch/hexagon/include/asm/timer-regs.h | 26 ----
arch/hexagon/include/asm/timex.h | 3
arch/hexagon/kernel/.gitignore | 1
arch/hexagon/kernel/time.c | 12 +-
arch/hexagon/lib/io.c | 4
fs/proc/vmcore.c | 20 ++-
include/linux/hugetlb_cgroup.h | 12 ++
include/linux/ipc_namespace.h | 15 ++
include/linux/sched/task.h | 2
ipc/shm.c | 189 +++++++++++++++++++++++++---------
ipc/util.c | 6 -
lib/test_kasan.c | 2
mm/Kconfig | 3
mm/damon/dbgfs.c | 20 ++-
mm/highmem.c | 32 +++--
mm/hugetlb.c | 11 +
mm/slab.c | 3
mm/slab.h | 2
mm/slob.c | 3
mm/slub.c | 2
mm/swap.c | 1
22 files changed, 254 insertions(+), 116 deletions(-)
* incoming
@ 2021-11-11 4:32 Andrew Morton
From: Andrew Morton @ 2021-11-11 4:32 UTC
To: Linus Torvalds; +Cc: linux-mm, mm-commits
The post-linux-next material.
7 patches, based on debe436e77c72fcee804fb867f275e6d31aa999c.
Subsystems affected by this patch series:
mm/debug
mm/slab-generic
mm/migration
mm/memcg
mm/kasan
Subsystem: mm/debug
Yixuan Cao <caoyixuan2019@email.szu.edu.cn>:
mm/page_owner.c: modify the type of argument "order" in some functions
Subsystem: mm/slab-generic
Ingo Molnar <mingo@kernel.org>:
mm: allow only SLUB on PREEMPT_RT
Subsystem: mm/migration
Baolin Wang <baolin.wang@linux.alibaba.com>:
mm: migrate: simplify the file-backed pages validation when migrating its mapping
Alistair Popple <apopple@nvidia.com>:
mm/migrate.c: remove MIGRATE_PFN_LOCKED
Subsystem: mm/memcg
Christoph Hellwig <hch@lst.de>:
Patch series "unexport memcg locking helpers":
mm: unexport folio_memcg_{,un}lock
mm: unexport {,un}lock_page_memcg
Subsystem: mm/kasan
Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>:
kasan: add kasan mode messages when kasan init
Documentation/vm/hmm.rst | 2
arch/arm64/mm/kasan_init.c | 2
arch/powerpc/kvm/book3s_hv_uvmem.c | 4
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2
drivers/gpu/drm/nouveau/nouveau_dmem.c | 4
include/linux/migrate.h | 1
include/linux/page_owner.h | 12 +-
init/Kconfig | 2
lib/test_hmm.c | 5 -
mm/kasan/hw_tags.c | 14 ++
mm/kasan/sw_tags.c | 2
mm/memcontrol.c | 4
mm/migrate.c | 151 +++++--------------------------
mm/page_owner.c | 6 -
14 files changed, 61 insertions(+), 150 deletions(-)
* incoming
@ 2021-11-09 2:30 Andrew Morton
From: Andrew Morton @ 2021-11-09 2:30 UTC
To: Linus Torvalds; +Cc: linux-mm, mm-commits
87 patches, based on 8bb7eca972ad531c9b149c0a51ab43a417385813, plus
previously sent material.
Subsystems affected by this patch series:
mm/pagecache
mm/hugetlb
procfs
misc
MAINTAINERS
lib
checkpatch
binfmt
kallsyms
ramfs
init
codafs
nilfs2
hfs
crash_dump
signals
seq_file
fork
sysvfs
kcov
gdb
resource
selftests
ipc
Subsystem: mm/pagecache
Johannes Weiner <hannes@cmpxchg.org>:
vfs: keep inodes with page cache off the inode shrinker LRU
Subsystem: mm/hugetlb
zhangyiru <zhangyiru3@huawei.com>:
mm,hugetlb: remove mlock ulimit for SHM_HUGETLB
Subsystem: procfs
Florian Weimer <fweimer@redhat.com>:
procfs: do not list TID 0 in /proc/<pid>/task
David Hildenbrand <david@redhat.com>:
x86/xen: update xen_oldmem_pfn_is_ram() documentation
x86/xen: simplify xen_oldmem_pfn_is_ram()
x86/xen: print a warning when HVMOP_get_mem_type fails
proc/vmcore: let pfn_is_ram() return a bool
proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks
virtio-mem: factor out hotplug specifics from virtio_mem_init() into virtio_mem_init_hotplug()
virtio-mem: factor out hotplug specifics from virtio_mem_probe() into virtio_mem_init_hotplug()
virtio-mem: factor out hotplug specifics from virtio_mem_remove() into virtio_mem_deinit_hotplug()
virtio-mem: kdump mode to sanitize /proc/vmcore access
Stephen Brennan <stephen.s.brennan@oracle.com>:
proc: allow pid_revalidate() during LOOKUP_RCU
Subsystem: misc
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
Patch series "kernel.h further split", v5:
kernel.h: drop unneeded <linux/kernel.h> inclusion from other headers
kernel.h: split out container_of() and typeof_member() macros
include/kunit/test.h: replace kernel.h with the necessary inclusions
include/linux/list.h: replace kernel.h with the necessary inclusions
include/linux/llist.h: replace kernel.h with the necessary inclusions
include/linux/plist.h: replace kernel.h with the necessary inclusions
include/media/media-entity.h: replace kernel.h with the necessary inclusions
include/linux/delay.h: replace kernel.h with the necessary inclusions
include/linux/sbitmap.h: replace kernel.h with the necessary inclusions
include/linux/radix-tree.h: replace kernel.h with the necessary inclusions
include/linux/generic-radix-tree.h: replace kernel.h with the necessary inclusions
Stephen Rothwell <sfr@canb.auug.org.au>:
kernel.h: split out instruction pointer accessors
Rasmus Villemoes <linux@rasmusvillemoes.dk>:
linux/container_of.h: switch to static_assert
Colin Ian King <colin.i.king@googlemail.com>:
mailmap: update email address for Colin King
Subsystem: MAINTAINERS
Kees Cook <keescook@chromium.org>:
MAINTAINERS: add "exec & binfmt" section with myself and Eric
Lukas Bulwahn <lukas.bulwahn@gmail.com>:
Patch series "Rectify file references for dt-bindings in MAINTAINERS", v5:
MAINTAINERS: rectify entry for ARM/TOSHIBA VISCONTI ARCHITECTURE
MAINTAINERS: rectify entry for HIKEY960 ONBOARD USB GPIO HUB DRIVER
MAINTAINERS: rectify entry for INTEL KEEM BAY DRM DRIVER
MAINTAINERS: rectify entry for ALLWINNER HARDWARE SPINLOCK SUPPORT
Subsystem: lib
Imran Khan <imran.f.khan@oracle.com>:
Patch series "lib, stackdepot: check stackdepot handle before accessing slabs", v2:
lib, stackdepot: check stackdepot handle before accessing slabs
lib, stackdepot: add helper to print stack entries
lib, stackdepot: add helper to print stack entries into buffer
Lucas De Marchi <lucas.demarchi@intel.com>:
include/linux/string_helpers.h: add linux/string.h for strlen()
Alexey Dobriyan <adobriyan@gmail.com>:
lib: uninline simple_strntoull() as well
Thomas Gleixner <tglx@linutronix.de>:
mm/scatterlist: replace the !preemptible warning in sg_miter_stop()
Subsystem: checkpatch
Rikard Falkeborn <rikard.falkeborn@gmail.com>:
const_structs.checkpatch: add a few sound ops structs
Joe Perches <joe@perches.com>:
checkpatch: improve EXPORT_SYMBOL test for EXPORT_SYMBOL_NS uses
Peter Ujfalusi <peter.ujfalusi@linux.intel.com>:
checkpatch: get default codespell dictionary path from package location
Subsystem: binfmt
Kees Cook <keescook@chromium.org>:
binfmt_elf: reintroduce using MAP_FIXED_NOREPLACE
Alexey Dobriyan <adobriyan@gmail.com>:
ELF: simplify STACK_ALLOC macro
Subsystem: kallsyms
Kefeng Wang <wangkefeng.wang@huawei.com>:
Patch series "sections: Unify kernel sections range check and use", v4:
kallsyms: remove arch specific text and data check
kallsyms: fix address-checks for kernel related range
sections: move and rename core_kernel_data() to is_kernel_core_data()
sections: move is_kernel_inittext() into sections.h
x86: mm: rename __is_kernel_text() to is_x86_32_kernel_text()
sections: provide internal __is_kernel() and __is_kernel_text() helper
mm: kasan: use is_kernel() helper
extable: use is_kernel_text() helper
powerpc/mm: use core_kernel_text() helper
microblaze: use is_kernel_text() helper
alpha: use is_kernel_text() helper
Subsystem: ramfs
yangerkun <yangerkun@huawei.com>:
ramfs: fix mount source show for ramfs
Subsystem: init
Andrew Halaney <ahalaney@redhat.com>:
init: make unknown command line param message clearer
Subsystem: codafs
Jan Harkes <jaharkes@cs.cmu.edu>:
Patch series "Coda updates for -next":
coda: avoid NULL pointer dereference from a bad inode
coda: check for async upcall request using local state
Alex Shi <alex.shi@linux.alibaba.com>:
coda: remove err which no one care
Jan Harkes <jaharkes@cs.cmu.edu>:
coda: avoid flagging NULL inodes
coda: avoid hidden code duplication in rename
coda: avoid doing bad things on inode type changes during revalidation
Xiyu Yang <xiyuyang19@fudan.edu.cn>:
coda: convert from atomic_t to refcount_t on coda_vm_ops->refcnt
Jing Yangyang <jing.yangyang@zte.com.cn>:
coda: use vmemdup_user to replace the open code
Jan Harkes <jaharkes@cs.cmu.edu>:
coda: bump module version to 7.2
Subsystem: nilfs2
Qing Wang <wangqing@vivo.com>:
Patch series "nilfs2 updates":
nilfs2: replace snprintf in show functions with sysfs_emit
Ryusuke Konishi <konishi.ryusuke@gmail.com>:
nilfs2: remove filenames from file comments
Subsystem: hfs
Arnd Bergmann <arnd@arndb.de>:
hfs/hfsplus: use WARN_ON for sanity check
Subsystem: crash_dump
Changcheng Deng <deng.changcheng@zte.com.cn>:
crash_dump: fix boolreturn.cocci warning
Ye Guojin <ye.guojin@zte.com.cn>:
crash_dump: remove duplicate include in crash_dump.h
Subsystem: signals
Ye Guojin <ye.guojin@zte.com.cn>:
signal: remove duplicate include in signal.h
Subsystem: seq_file
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
seq_file: move seq_escape() to a header
Muchun Song <songmuchun@bytedance.com>:
seq_file: fix passing wrong private data
Subsystem: fork
Ran Xiaokai <ran.xiaokai@zte.com.cn>:
kernel/fork.c: unshare(): use swap() to make code cleaner
Subsystem: sysvfs
Pavel Skripkin <paskripkin@gmail.com>:
sysv: use BUILD_BUG_ON instead of runtime check
Subsystem: kcov
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
Patch series "kcov: PREEMPT_RT fixup + misc", v2:
Documentation/kcov: include types.h in the example
Documentation/kcov: define `ip' in the example
kcov: allocate per-CPU memory on the relevant node
kcov: avoid enable+disable interrupts if !in_task()
kcov: replace local_irq_save() with a local_lock_t
Subsystem: gdb
Douglas Anderson <dianders@chromium.org>:
scripts/gdb: handle split debug for vmlinux
Subsystem: resource
David Hildenbrand <david@redhat.com>:
Patch series "virtio-mem: disallow mapping virtio-mem memory via /dev/mem", v5:
kernel/resource: clean up and optimize iomem_is_exclusive()
kernel/resource: disallow access to exclusive system RAM regions
virtio-mem: disallow mapping virtio-mem memory via /dev/mem
Subsystem: selftests
SeongJae Park <sjpark@amazon.de>:
selftests/kselftest/runner/run_one(): allow running non-executable files
Subsystem: ipc
Michal Clapinski <mclapinski@google.com>:
ipc: check checkpoint_restore_ns_capable() to modify C/R proc files
Manfred Spraul <manfred@colorfullife.com>:
ipc/ipc_sysctl.c: remove fallback for !CONFIG_PROC_SYSCTL
.mailmap | 2
Documentation/dev-tools/kcov.rst | 5
MAINTAINERS | 21 +
arch/alpha/kernel/traps.c | 4
arch/microblaze/mm/pgtable.c | 3
arch/powerpc/mm/pgtable_32.c | 7
arch/riscv/lib/delay.c | 4
arch/s390/include/asm/facility.h | 4
arch/x86/kernel/aperture_64.c | 13
arch/x86/kernel/unwind_orc.c | 2
arch/x86/mm/init_32.c | 14
arch/x86/xen/mmu_hvm.c | 39 --
drivers/gpu/drm/drm_dp_mst_topology.c | 5
drivers/gpu/drm/drm_mm.c | 5
drivers/gpu/drm/i915/i915_vma.c | 5
drivers/gpu/drm/i915/intel_runtime_pm.c | 20 -
drivers/media/dvb-frontends/cxd2880/cxd2880_common.h | 1
drivers/virtio/Kconfig | 1
drivers/virtio/virtio_mem.c | 321 +++++++++++++------
fs/binfmt_elf.c | 33 +
fs/coda/cnode.c | 13
fs/coda/coda_linux.c | 39 +-
fs/coda/coda_linux.h | 6
fs/coda/dir.c | 20 -
fs/coda/file.c | 12
fs/coda/psdev.c | 14
fs/coda/upcall.c | 3
fs/hfs/inode.c | 6
fs/hfsplus/inode.c | 12
fs/hugetlbfs/inode.c | 23 -
fs/inode.c | 46 +-
fs/internal.h | 1
fs/nilfs2/alloc.c | 2
fs/nilfs2/alloc.h | 2
fs/nilfs2/bmap.c | 2
fs/nilfs2/bmap.h | 2
fs/nilfs2/btnode.c | 2
fs/nilfs2/btnode.h | 2
fs/nilfs2/btree.c | 2
fs/nilfs2/btree.h | 2
fs/nilfs2/cpfile.c | 2
fs/nilfs2/cpfile.h | 2
fs/nilfs2/dat.c | 2
fs/nilfs2/dat.h | 2
fs/nilfs2/dir.c | 2
fs/nilfs2/direct.c | 2
fs/nilfs2/direct.h | 2
fs/nilfs2/file.c | 2
fs/nilfs2/gcinode.c | 2
fs/nilfs2/ifile.c | 2
fs/nilfs2/ifile.h | 2
fs/nilfs2/inode.c | 2
fs/nilfs2/ioctl.c | 2
fs/nilfs2/mdt.c | 2
fs/nilfs2/mdt.h | 2
fs/nilfs2/namei.c | 2
fs/nilfs2/nilfs.h | 2
fs/nilfs2/page.c | 2
fs/nilfs2/page.h | 2
fs/nilfs2/recovery.c | 2
fs/nilfs2/segbuf.c | 2
fs/nilfs2/segbuf.h | 2
fs/nilfs2/segment.c | 2
fs/nilfs2/segment.h | 2
fs/nilfs2/sufile.c | 2
fs/nilfs2/sufile.h | 2
fs/nilfs2/super.c | 2
fs/nilfs2/sysfs.c | 78 ++--
fs/nilfs2/sysfs.h | 2
fs/nilfs2/the_nilfs.c | 2
fs/nilfs2/the_nilfs.h | 2
fs/proc/base.c | 21 -
fs/proc/vmcore.c | 109 ++++--
fs/ramfs/inode.c | 11
fs/seq_file.c | 16
fs/sysv/super.c | 6
include/asm-generic/sections.h | 75 +++-
include/kunit/test.h | 13
include/linux/bottom_half.h | 3
include/linux/container_of.h | 52 ++-
include/linux/crash_dump.h | 30 +
include/linux/delay.h | 2
include/linux/fs.h | 1
include/linux/fwnode.h | 1
include/linux/generic-radix-tree.h | 3
include/linux/hugetlb.h | 6
include/linux/instruction_pointer.h | 8
include/linux/kallsyms.h | 21 -
include/linux/kernel.h | 39 --
include/linux/list.h | 4
include/linux/llist.h | 4
include/linux/pagemap.h | 50 ++
include/linux/plist.h | 5
include/linux/radix-tree.h | 4
include/linux/rwsem.h | 1
include/linux/sbitmap.h | 11
include/linux/seq_file.h | 19 +
include/linux/signal.h | 1
include/linux/smp.h | 1
include/linux/spinlock.h | 1
include/linux/stackdepot.h | 5
include/linux/string_helpers.h | 1
include/media/media-entity.h | 3
init/main.c | 4
ipc/ipc_sysctl.c | 42 +-
ipc/shm.c | 8
kernel/extable.c | 33 -
kernel/fork.c | 9
kernel/kcov.c | 40 +-
kernel/locking/lockdep.c | 3
kernel/resource.c | 54 ++-
kernel/trace/ftrace.c | 2
lib/scatterlist.c | 11
lib/stackdepot.c | 46 ++
lib/vsprintf.c | 3
mm/Kconfig | 7
mm/filemap.c | 8
mm/kasan/report.c | 17 -
mm/memfd.c | 4
mm/mmap.c | 3
mm/page_owner.c | 18 -
mm/truncate.c | 19 +
mm/vmscan.c | 7
mm/workingset.c | 10
net/sysctl_net.c | 2
scripts/checkpatch.pl | 33 +
scripts/const_structs.checkpatch | 4
scripts/gdb/linux/symbols.py | 3
tools/testing/selftests/kselftest/runner.sh | 28 +
tools/testing/selftests/proc/.gitignore | 1
tools/testing/selftests/proc/Makefile | 2
tools/testing/selftests/proc/proc-tid0.c | 81 ++++
132 files changed, 1206 insertions(+), 681 deletions(-)
* incoming
@ 2021-11-05 20:34 Andrew Morton
From: Andrew Morton @ 2021-11-05 20:34 UTC
To: Linus Torvalds; +Cc: mm-commits, linux-mm
262 patches, based on 8bb7eca972ad531c9b149c0a51ab43a417385813
Subsystems affected by this patch series:
scripts
ocfs2
vfs
mm/slab-generic
mm/slab
mm/slub
mm/kconfig
mm/dax
mm/kasan
mm/debug
mm/pagecache
mm/gup
mm/swap
mm/memcg
mm/pagemap
mm/mprotect
mm/mremap
mm/iomap
mm/tracing
mm/vmalloc
mm/pagealloc
mm/memory-failure
mm/hugetlb
mm/userfaultfd
mm/vmscan
mm/tools
mm/memblock
mm/oom-kill
mm/hugetlbfs
mm/migration
mm/thp
mm/readahead
mm/nommu
mm/ksm
mm/vmstat
mm/madvise
mm/memory-hotplug
mm/rmap
mm/zsmalloc
mm/highmem
mm/zram
mm/cleanups
mm/kfence
mm/damon
Subsystem: scripts
Colin Ian King <colin.king@canonical.com>:
scripts/spelling.txt: add more spellings to spelling.txt
Sven Eckelmann <sven@narfation.org>:
scripts/spelling.txt: fix "mistake" version of "synchronization"
weidonghui <weidonghui@allwinnertech.com>:
scripts/decodecode: fix faulting instruction no print when opps.file is DOS format
Subsystem: ocfs2
Chenyuan Mi <cymi20@fudan.edu.cn>:
ocfs2: fix handle refcount leak in two exception handling paths
Valentin Vidic <vvidic@valentin-vidic.from.hr>:
ocfs2: cleanup journal init and shutdown
Colin Ian King <colin.king@canonical.com>:
ocfs2/dlm: remove redundant assignment of variable ret
Jan Kara <jack@suse.cz>:
Patch series "ocfs2: Truncate data corruption fix":
ocfs2: fix data corruption on truncate
ocfs2: do not zero pages beyond i_size
Subsystem: vfs
Arnd Bergmann <arnd@arndb.de>:
fs/posix_acl.c: avoid -Wempty-body warning
Jia He <justin.he@arm.com>:
d_path: fix Kernel doc validator complaining
Subsystem: mm/slab-generic
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: move kvmalloc-related functions to slab.h
Subsystem: mm/slab
Shi Lei <shi_lei@massclouds.com>:
mm/slab.c: remove useless lines in enable_cpucache()
Subsystem: mm/slub
Kefeng Wang <wangkefeng.wang@huawei.com>:
slub: add back check for free nonslab objects
Vlastimil Babka <vbabka@suse.cz>:
mm, slub: change percpu partial accounting from objects to pages
mm/slub: increase default cpu partial list sizes
Hyeonggon Yoo <42.hyeyoo@gmail.com>:
mm, slub: use prefetchw instead of prefetch
Subsystem: mm/kconfig
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
mm: disable NUMA_BALANCING_DEFAULT_ENABLED and TRANSPARENT_HUGEPAGE on PREEMPT_RT
Subsystem: mm/dax
Christoph Hellwig <hch@lst.de>:
mm: don't include <linux/dax.h> in <linux/mempolicy.h>
Subsystem: mm/kasan
Marco Elver <elver@google.com>:
Patch series "stackdepot, kasan, workqueue: Avoid expanding stackdepot slabs when holding raw_spin_lock", v2:
lib/stackdepot: include gfp.h
lib/stackdepot: remove unused function argument
lib/stackdepot: introduce __stack_depot_save()
kasan: common: provide can_alloc in kasan_save_stack()
kasan: generic: introduce kasan_record_aux_stack_noalloc()
workqueue, kasan: avoid alloc_pages() when recording stack
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
kasan: fix tag for large allocations when using CONFIG_SLAB
Peter Collingbourne <pcc@google.com>:
kasan: test: add memcpy test that avoids out-of-bounds write
Subsystem: mm/debug
Peter Xu <peterx@redhat.com>:
Patch series "mm/smaps: Fixes and optimizations on shmem swap handling":
mm/smaps: fix shmem pte hole swap calculation
mm/smaps: use vma->vm_pgoff directly when counting partial swap
mm/smaps: simplify shmem handling of pte holes
Guo Ren <guoren@linux.alibaba.com>:
mm: debug_vm_pgtable: don't use __P000 directly
Kees Cook <keescook@chromium.org>:
kasan: test: bypass __alloc_size checks
Patch series "Add __alloc_size()", v3:
rapidio: avoid bogus __alloc_size warning
Compiler Attributes: add __alloc_size() for better bounds checking
slab: clean up function prototypes
slab: add __alloc_size attributes for better bounds checking
mm/kvmalloc: add __alloc_size attributes for better bounds checking
mm/vmalloc: add __alloc_size attributes for better bounds checking
mm/page_alloc: add __alloc_size attributes for better bounds checking
percpu: add __alloc_size attributes for better bounds checking
Yinan Zhang <zhangyinan2019@email.szu.edu.cn>:
mm/page_ext.c: fix a comment
Subsystem: mm/pagecache
David Howells <dhowells@redhat.com>:
mm: stop filemap_read() from grabbing a superfluous page
Christoph Hellwig <hch@lst.de>:
Patch series "simplify bdi unregistation":
mm: export bdi_unregister
mtd: call bdi_unregister explicitly
fs: explicitly unregister per-superblock BDIs
mm: don't automatically unregister bdis
mm: simplify bdi refcounting
Jens Axboe <axboe@kernel.dk>:
mm: don't read i_size of inode unless we need it
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/filemap.c: remove bogus VM_BUG_ON
Jens Axboe <axboe@kernel.dk>:
mm: move more expensive part of XA setup out of mapping check
Subsystem: mm/gup
John Hubbard <jhubbard@nvidia.com>:
mm/gup: further simplify __gup_device_huge()
Subsystem: mm/swap
Xu Wang <vulab@iscas.ac.cn>:
mm/swapfile: remove needless request_queue NULL pointer check
Rafael Aquini <aquini@redhat.com>:
mm/swapfile: fix an integer overflow in swap_show()
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: optimise put_pages_list()
Subsystem: mm/memcg
Peter Xu <peterx@redhat.com>:
mm/memcg: drop swp_entry_t* in mc_handle_file_pte()
Shakeel Butt <shakeelb@google.com>:
memcg: flush stats only if updated
memcg: unify memcg stat flushing
Waiman Long <longman@redhat.com>:
mm/memcg: remove obsolete memcg_free_kmem()
Len Baker <len.baker@gmx.com>:
mm/list_lru.c: prefer struct_size over open coded arithmetic
Shakeel Butt <shakeelb@google.com>:
memcg, kmem: further deprecate kmem.limit_in_bytes
Muchun Song <songmuchun@bytedance.com>:
mm: list_lru: remove holding lru lock
mm: list_lru: fix the return value of list_lru_count_one()
mm: memcontrol: remove kmemcg_id reparenting
mm: memcontrol: remove the kmem states
mm: list_lru: only add memcg-aware lrus to the global lru list
Vasily Averin <vvs@virtuozzo.com>:
Patch series "memcg: prohibit unconditional exceeding the limit of dying tasks", v3:
mm, oom: pagefault_out_of_memory: don't force global OOM for dying tasks
Michal Hocko <mhocko@suse.com>:
mm, oom: do not trigger out_of_memory from the #PF
Vasily Averin <vvs@virtuozzo.com>:
memcg: prohibit unconditional exceeding the limit of dying tasks
Subsystem: mm/pagemap
Peng Liu <liupeng256@huawei.com>:
mm/mmap.c: fix a data race of mm->total_vm
Rolf Eike Beer <eb@emlix.com>:
mm: use __pfn_to_section() instead of open coding it
Amit Daniel Kachhap <amit.kachhap@arm.com>:
mm/memory.c: avoid unnecessary kernel/user pointer conversion
Nadav Amit <namit@vmware.com>:
mm/memory.c: use correct VMA flags when freeing page-tables
Peter Xu <peterx@redhat.com>:
Patch series "mm: A few cleanup patches around zap, shmem and uffd", v4:
mm/shmem: unconditionally set pte dirty in mfill_atomic_install_pte
mm: clear vmf->pte after pte_unmap_same() returns
mm: drop first_index/last_index in zap_details
mm: add zap_skip_check_mapping() helper
Qi Zheng <zhengqi.arch@bytedance.com>:
Patch series "Do some code cleanups related to mm", v3:
mm: introduce pmd_install() helper
mm: remove redundant smp_wmb()
Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com>:
Documentation: update pagemap with shmem exceptions
Nicholas Piggin <npiggin@gmail.com>:
Patch series "shoot lazy tlbs", v4:
lazy tlb: introduce lazy mm refcount helper functions
lazy tlb: allow lazy tlb mm refcounting to be configurable
lazy tlb: shoot lazies, a non-refcounting lazy tlb option
powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN
Lukas Bulwahn <lukas.bulwahn@gmail.com>:
memory: remove unused CONFIG_MEM_BLOCK_SIZE
Subsystem: mm/mprotect
Liu Song <liu.song11@zte.com.cn>:
mm/mprotect.c: avoid repeated assignment in do_mprotect_pkey()
Subsystem: mm/mremap
Dmitry Safonov <dima@arista.com>:
mm/mremap: don't account pages in vma_to_resize()
Subsystem: mm/iomap
Lucas De Marchi <lucas.demarchi@intel.com>:
include/linux/io-mapping.h: remove fallback for writecombine
Subsystem: mm/tracing
Gang Li <ligang.bdlg@bytedance.com>:
mm: mmap_lock: remove redundant newline in TP_printk
mm: mmap_lock: use DECLARE_EVENT_CLASS and DEFINE_EVENT_FN
Subsystem: mm/vmalloc
Vasily Averin <vvs@virtuozzo.com>:
mm/vmalloc: repair warn_alloc()s in __vmalloc_area_node()
Peter Zijlstra <peterz@infradead.org>:
mm/vmalloc: don't allow VM_NO_GUARD on vmap()
Eric Dumazet <edumazet@google.com>:
mm/vmalloc: make show_numa_info() aware of hugepage mappings
mm/vmalloc: make sure to dump unpurged areas in /proc/vmallocinfo
"Uladzislau Rezki (Sony)" <urezki@gmail.com>:
mm/vmalloc: do not adjust the search size for alignment overhead
mm/vmalloc: check various alignments when debugging
Vasily Averin <vvs@virtuozzo.com>:
vmalloc: back off when the current task is OOM-killed
Kefeng Wang <wangkefeng.wang@huawei.com>:
vmalloc: choose a better start address in vm_area_register_early()
arm64: support page mapping percpu first chunk allocator
kasan: arm64: fix pcpu_page_first_chunk crash with KASAN_VMALLOC
Michal Hocko <mhocko@suse.com>:
mm/vmalloc: be more explicit about supported gfp flags
Chen Wandun <chenwandun@huawei.com>:
mm/vmalloc: introduce alloc_pages_bulk_array_mempolicy to accelerate memory allocation
Changcheng Deng <deng.changcheng@zte.com.cn>:
lib/test_vmalloc.c: use swap() to make code cleaner
Subsystem: mm/pagealloc
Eric Dumazet <edumazet@google.com>:
mm/large system hash: avoid possible NULL deref in alloc_large_system_hash
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Cleanups and fixup for page_alloc", v2:
mm/page_alloc.c: remove meaningless VM_BUG_ON() in pindex_to_order()
mm/page_alloc.c: simplify the code by using macro K()
mm/page_alloc.c: fix obsolete comment in free_pcppages_bulk()
mm/page_alloc.c: use helper function zone_spans_pfn()
mm/page_alloc.c: avoid allocating highmem pages via alloc_pages_exact[_nid]
Bharata B Rao <bharata@amd.com>:
Patch series "Fix NUMA nodes fallback list ordering":
mm/page_alloc: print node fallback order
Krupa Ramakrishnan <krupa.ramakrishnan@amd.com>:
mm/page_alloc: use accumulated load when building node fallback list
Geert Uytterhoeven <geert+renesas@glider.be>:
Patch series "Fix NUMA without SMP":
mm: move node_reclaim_distance to fix NUMA without SMP
mm: move fold_vm_numa_events() to fix NUMA without SMP
Eric Dumazet <edumazet@google.com>:
mm/page_alloc.c: do not acquire zone lock in is_free_buddy_page()
Feng Tang <feng.tang@intel.com>:
mm/page_alloc: detect allocation forbidden by cpuset and bail out early
Liangcai Fan <liangcaifan19@gmail.com>:
mm/page_alloc.c: show watermark_boost of zone in zoneinfo
Christophe Leroy <christophe.leroy@csgroup.eu>:
mm: create a new system state and fix core_kernel_text()
mm: make generic arch_is_kernel_initmem_freed() do what it says
powerpc: use generic version of arch_is_kernel_initmem_freed()
s390: use generic version of arch_is_kernel_initmem_freed()
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
mm: page_alloc: use migrate_disable() in drain_local_pages_wq()
Wang ShaoBo <bobo.shaobowang@huawei.com>:
mm/page_alloc: use clamp() to simplify code
Subsystem: mm/memory-failure
Marco Elver <elver@google.com>:
mm: fix data race in PagePoisoned()
Rikard Falkeborn <rikard.falkeborn@gmail.com>:
mm/memory_failure: constify static mm_walk_ops
Yang Shi <shy828301@gmail.com>:
Patch series "Solve silent data loss caused by poisoned page cache (shmem/tmpfs)", v5:
mm: filemap: coding style cleanup for filemap_map_pmd()
mm: hwpoison: refactor refcount check handling
mm: shmem: don't truncate page if memory failure happens
mm: hwpoison: handle non-anonymous THP correctly
Subsystem: mm/hugetlb
Peter Xu <peterx@redhat.com>:
mm/hugetlb: drop __unmap_hugepage_range definition from hugetlb.h
Mike Kravetz <mike.kravetz@oracle.com>:
Patch series "hugetlb: add demote/split page functionality", v4:
hugetlb: add demote hugetlb page sysfs interfaces
mm/cma: add cma_pages_valid to determine if pages are in CMA
hugetlb: be sure to free demoted CMA pages to CMA
hugetlb: add demote bool to gigantic page routines
hugetlb: add hugetlb demote page support
Liangcai Fan <liangcaifan19@gmail.com>:
mm: khugepaged: recalculate min_free_kbytes after stopping khugepaged
Mina Almasry <almasrymina@google.com>:
mm, hugepages: add mremap() support for hugepage backed vma
mm, hugepages: add hugetlb vma mremap() test
Baolin Wang <baolin.wang@linux.alibaba.com>:
hugetlb: support node specified when using cma for gigantic hugepages
Ran Jianping <ran.jianping@zte.com.cn>:
mm: remove duplicate include in hugepage-mremap.c
Baolin Wang <baolin.wang@linux.alibaba.com>:
Patch series "Some cleanups and improvements for hugetlb":
hugetlb_cgroup: remove unused hugetlb_cgroup_from_counter macro
hugetlb: replace the obsolete hugetlb_instantiation_mutex in the comments
hugetlb: remove redundant validation in has_same_uncharge_info()
hugetlb: remove redundant VM_BUG_ON() in add_reservation_in_range()
Mike Kravetz <mike.kravetz@oracle.com>:
hugetlb: remove unnecessary set_page_count in prep_compound_gigantic_page
Subsystem: mm/userfaultfd
Axel Rasmussen <axelrasmussen@google.com>:
Patch series "Small userfaultfd selftest fixups", v2:
userfaultfd/selftests: don't rely on GNU extensions for random numbers
userfaultfd/selftests: fix feature support detection
userfaultfd/selftests: fix calculation of expected ioctls
Subsystem: mm/vmscan
Miaohe Lin <linmiaohe@huawei.com>:
mm/page_isolation: fix potential missing call to unset_migratetype_isolate()
mm/page_isolation: guard against possible putback unisolated page
Kai Song <songkai01@inspur.com>:
mm/vmscan.c: fix -Wunused-but-set-variable warning
Mel Gorman <mgorman@techsingularity.net>:
Patch series "Remove dependency on congestion_wait in mm/", v5:
mm/vmscan: throttle reclaim until some writeback completes if congested
mm/vmscan: throttle reclaim and compaction when too may pages are isolated
mm/vmscan: throttle reclaim when no progress is being made
mm/writeback: throttle based on page writeback instead of congestion
mm/page_alloc: remove the throttling logic from the page allocator
mm/vmscan: centralise timeout values for reclaim_throttle
mm/vmscan: increase the timeout if page reclaim is not making progress
mm/vmscan: delay waking of tasks throttled on NOPROGRESS
Yuanzheng Song <songyuanzheng@huawei.com>:
mm/vmpressure: fix data-race with memcg->socket_pressure
Subsystem: mm/tools
Zhenliang Wei <weizhenliang@huawei.com>:
tools/vm/page_owner_sort.c: count and sort by mem
Naoya Horiguchi <naoya.horiguchi@nec.com>:
Patch series "tools/vm/page-types.c: a few improvements":
tools/vm/page-types.c: make walk_file() aware of address range option
tools/vm/page-types.c: move show_file() to summary output
tools/vm/page-types.c: print file offset in hexadecimal
Subsystem: mm/memblock
Mike Rapoport <rppt@linux.ibm.com>:
Patch series "memblock: cleanup memblock_free interface", v2:
arch_numa: simplify numa_distance allocation
xen/x86: free_p2m_page: use memblock_free_ptr() to free a virtual pointer
memblock: drop memblock_free_early_nid() and memblock_free_early()
memblock: stop aliasing __memblock_free_late with memblock_free_late
memblock: rename memblock_free to memblock_phys_free
memblock: use memblock_free for freeing virtual pointers
Subsystem: mm/oom-kill
Sultan Alsawaf <sultan@kerneltoast.com>:
mm: mark the OOM reaper thread as freezable
Subsystem: mm/hugetlbfs
Zhenguo Yao <yaozhenguo1@gmail.com>:
hugetlbfs: extend the definition of hugepages parameter to support node allocation
Subsystem: mm/migration
John Hubbard <jhubbard@nvidia.com>:
mm/migrate: de-duplicate migrate_reason strings
Yang Shi <shy828301@gmail.com>:
mm: migrate: make demotion knob depend on migration
Subsystem: mm/thp
"George G. Davis" <davis.george@siemens.com>:
selftests/vm/transhuge-stress: fix ram size thinko
Rongwei Wang <rongwei.wang@linux.alibaba.com>:
Patch series "fix two bugs for file THP":
mm, thp: lock filemap when truncating page cache
mm, thp: fix incorrect unmap behavior for private pages
Subsystem: mm/readahead
Lin Feng <linf@wangsu.com>:
mm/readahead.c: fix incorrect comments for get_init_ra_size
Subsystem: mm/nommu
Kefeng Wang <wangkefeng.wang@huawei.com>:
mm: nommu: kill arch_get_unmapped_area()
Subsystem: mm/ksm
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
selftest/vm: fix ksm selftest to run with different NUMA topologies
Pedro Demarchi Gomes <pedrodemargomes@gmail.com>:
selftests: vm: add KSM huge pages merging time test
Subsystem: mm/vmstat
Liu Shixin <liushixin2@huawei.com>:
mm/vmstat: annotate data race for zone->free_area[order].nr_free
Lin Feng <linf@wangsu.com>:
mm: vmstat.c: make extfrag_index show more pretty
Subsystem: mm/madvise
David Hildenbrand <david@redhat.com>:
selftests/vm: make MADV_POPULATE_(READ|WRITE) use in-tree headers
Subsystem: mm/memory-hotplug
Tang Yizhou <tangyizhou@huawei.com>:
mm/memory_hotplug: add static qualifier for online_policy_to_str()
David Hildenbrand <david@redhat.com>:
Patch series "memory-hotplug.rst: document the "auto-movable" online policy":
memory-hotplug.rst: fix two instances of "movablecore" that should be "movable_node"
memory-hotplug.rst: fix wrong /sys/module/memory_hotplug/parameters/ path
memory-hotplug.rst: document the "auto-movable" online policy
Patch series "mm/memory_hotplug: Kconfig and 32 bit cleanups":
mm/memory_hotplug: remove CONFIG_X86_64_ACPI_NUMA dependency from CONFIG_MEMORY_HOTPLUG
mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSE
mm/memory_hotplug: restrict CONFIG_MEMORY_HOTPLUG to 64 bit
mm/memory_hotplug: remove HIGHMEM leftovers
mm/memory_hotplug: remove stale function declarations
x86: remove memory hotplug support on X86_32
Patch series "mm/memory_hotplug: full support for add_memory_driver_managed() with CONFIG_ARCH_KEEP_MEMBLOCK", v2:
mm/memory_hotplug: handle memblock_add_node() failures in add_memory_resource()
memblock: improve MEMBLOCK_HOTPLUG documentation
memblock: allow to specify flags with memblock_add_node()
memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGED
mm/memory_hotplug: indicate MEMBLOCK_DRIVER_MANAGED with IORESOURCE_SYSRAM_DRIVER_MANAGED
Subsystem: mm/rmap
Alistair Popple <apopple@nvidia.com>:
mm/rmap.c: avoid double faults migrating device private pages
Subsystem: mm/zsmalloc
Miaohe Lin <linmiaohe@huawei.com>:
mm/zsmalloc.c: close race window between zs_pool_dec_isolated() and zs_unregister_migration()
Subsystem: mm/highmem
Ira Weiny <ira.weiny@intel.com>:
mm/highmem: remove deprecated kmap_atomic
Subsystem: mm/zram
Jaewon Kim <jaewon31.kim@samsung.com>:
zram_drv: allow reclaim on bio_alloc
Dan Carpenter <dan.carpenter@oracle.com>:
zram: off by one in read_block_state()
Brian Geffon <bgeffon@google.com>:
zram: introduce an aged idle interface
Subsystem: mm/cleanups
Stephen Kitt <steve@sk2.org>:
mm: remove HARDENED_USERCOPY_FALLBACK
Mianhan Liu <liumh1@shanghaitech.edu.cn>:
include/linux/mm.h: move nr_free_buffer_pages from swap.h to mm.h
Subsystem: mm/kfence
Marco Elver <elver@google.com>:
stacktrace: move filter_irq_stacks() to kernel/stacktrace.c
kfence: count unexpectedly skipped allocations
kfence: move saving stack trace of allocations into __kfence_alloc()
kfence: limit currently covered allocations when pool nearly full
kfence: add note to documentation about skipping covered allocations
kfence: test: use kunit_skip() to skip tests
kfence: shorten critical sections of alloc/free
kfence: always use static branches to guard kfence_alloc()
kfence: default to dynamic branch instead of static keys mode
Subsystem: mm/damon
Geert Uytterhoeven <geert@linux-m68k.org>:
mm/damon: grammar s/works/work/
SeongJae Park <sjpark@amazon.de>:
Documentation/vm: move user guides to admin-guide/mm/
SeongJae Park <sj@kernel.org>:
MAINTAINERS: update SeongJae's email address
SeongJae Park <sjpark@amazon.de>:
docs/vm/damon: remove broken reference
include/linux/damon.h: fix kernel-doc comments for 'damon_callback'
SeongJae Park <sj@kernel.org>:
mm/damon/core: print kdamond start log in debug mode only
Changbin Du <changbin.du@gmail.com>:
mm/damon: remove unnecessary do_exit() from kdamond
mm/damon: needn't hold kdamond_lock to print pid of kdamond
Colin Ian King <colin.king@canonical.com>:
mm/damon/core: nullify pointer ctx->kdamond with a NULL
SeongJae Park <sj@kernel.org>:
Patch series "Implement Data Access Monitoring-based Memory Operation Schemes":
mm/damon/core: account age of target regions
mm/damon/core: implement DAMON-based Operation Schemes (DAMOS)
mm/damon/vaddr: support DAMON-based Operation Schemes
mm/damon/dbgfs: support DAMON-based Operation Schemes
mm/damon/schemes: implement statistics feature
selftests/damon: add 'schemes' debugfs tests
Docs/admin-guide/mm/damon: document DAMON-based Operation Schemes
Patch series "DAMON: Support Physical Memory Address Space Monitoring":
mm/damon/dbgfs: allow users to set initial monitoring target regions
mm/damon/dbgfs-test: add a unit test case for 'init_regions'
Docs/admin-guide/mm/damon: document 'init_regions' feature
mm/damon/vaddr: separate commonly usable functions
mm/damon: implement primitives for physical address space monitoring
mm/damon/dbgfs: support physical memory monitoring
Docs/DAMON: document physical memory monitoring support
Rikard Falkeborn <rikard.falkeborn@gmail.com>:
mm/damon/vaddr: constify static mm_walk_ops
Rongwei Wang <rongwei.wang@linux.alibaba.com>:
mm/damon/dbgfs: remove unnecessary variables
SeongJae Park <sj@kernel.org>:
mm/damon/paddr: support the pageout scheme
mm/damon/schemes: implement size quota for schemes application speed control
mm/damon/schemes: skip already charged targets and regions
mm/damon/schemes: implement time quota
mm/damon/dbgfs: support quotas of schemes
mm/damon/selftests: support schemes quotas
mm/damon/schemes: prioritize regions within the quotas
mm/damon/vaddr,paddr: support pageout prioritization
mm/damon/dbgfs: support prioritization weights
tools/selftests/damon: update for regions prioritization of schemes
mm/damon/schemes: activate schemes based on a watermarks mechanism
mm/damon/dbgfs: support watermarks
selftests/damon: support watermarks
mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
Xin Hao <xhao@linux.alibaba.com>:
Patch series "mm/damon: Fix some small bugs", v4:
mm/damon: remove unnecessary variable initialization
mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
SeongJae Park <sj@kernel.org>:
Patch series "Fix trivial nits in Documentation/admin-guide/mm":
Docs/admin-guide/mm/damon/start: fix wrong example commands
Docs/admin-guide/mm/damon/start: fix a wrong link
Docs/admin-guide/mm/damon/start: simplify the content
Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
Changbin Du <changbin.du@gmail.com>:
mm/damon: simplify stop mechanism
Colin Ian King <colin.i.king@googlemail.com>:
mm/damon: fix a few spelling mistakes in comments and a pr_debug message
Changbin Du <changbin.du@gmail.com>:
mm/damon: remove return value from before_terminate callback
a/Documentation/admin-guide/blockdev/zram.rst | 8
a/Documentation/admin-guide/cgroup-v1/memory.rst | 11
a/Documentation/admin-guide/kernel-parameters.txt | 14
a/Documentation/admin-guide/mm/damon/index.rst | 1
a/Documentation/admin-guide/mm/damon/reclaim.rst | 235 +++
a/Documentation/admin-guide/mm/damon/start.rst | 140 +
a/Documentation/admin-guide/mm/damon/usage.rst | 117 +
a/Documentation/admin-guide/mm/hugetlbpage.rst | 42
a/Documentation/admin-guide/mm/memory-hotplug.rst | 147 +-
a/Documentation/admin-guide/mm/pagemap.rst | 75 -
a/Documentation/core-api/memory-hotplug.rst | 3
a/Documentation/dev-tools/kfence.rst | 23
a/Documentation/translations/zh_CN/core-api/memory-hotplug.rst | 4
a/Documentation/vm/damon/design.rst | 29
a/Documentation/vm/damon/faq.rst | 5
a/Documentation/vm/damon/index.rst | 1
a/Documentation/vm/page_owner.rst | 23
a/MAINTAINERS | 2
a/Makefile | 15
a/arch/Kconfig | 28
a/arch/alpha/kernel/core_irongate.c | 6
a/arch/arc/mm/init.c | 6
a/arch/arm/mach-hisi/platmcpm.c | 2
a/arch/arm/mach-rpc/ecard.c | 2
a/arch/arm/mm/init.c | 2
a/arch/arm64/Kconfig | 4
a/arch/arm64/mm/kasan_init.c | 16
a/arch/arm64/mm/mmu.c | 4
a/arch/ia64/mm/contig.c | 2
a/arch/ia64/mm/init.c | 2
a/arch/m68k/mm/mcfmmu.c | 3
a/arch/m68k/mm/motorola.c | 6
a/arch/mips/loongson64/init.c | 4
a/arch/mips/mm/init.c | 6
a/arch/mips/sgi-ip27/ip27-memory.c | 3
a/arch/mips/sgi-ip30/ip30-setup.c | 6
a/arch/powerpc/Kconfig | 1
a/arch/powerpc/configs/skiroot_defconfig | 1
a/arch/powerpc/include/asm/machdep.h | 2
a/arch/powerpc/include/asm/sections.h | 13
a/arch/powerpc/kernel/dt_cpu_ftrs.c | 8
a/arch/powerpc/kernel/paca.c | 8
a/arch/powerpc/kernel/setup-common.c | 4
a/arch/powerpc/kernel/setup_64.c | 6
a/arch/powerpc/kernel/smp.c | 2
a/arch/powerpc/mm/book3s64/radix_tlb.c | 4
a/arch/powerpc/mm/hugetlbpage.c | 9
a/arch/powerpc/platforms/powernv/pci-ioda.c | 4
a/arch/powerpc/platforms/powernv/setup.c | 4
a/arch/powerpc/platforms/pseries/setup.c | 2
a/arch/powerpc/platforms/pseries/svm.c | 9
a/arch/riscv/kernel/setup.c | 10
a/arch/s390/include/asm/sections.h | 12
a/arch/s390/kernel/setup.c | 11
a/arch/s390/kernel/smp.c | 6
a/arch/s390/kernel/uv.c | 2
a/arch/s390/mm/init.c | 3
a/arch/s390/mm/kasan_init.c | 2
a/arch/sh/boards/mach-ap325rxa/setup.c | 2
a/arch/sh/boards/mach-ecovec24/setup.c | 4
a/arch/sh/boards/mach-kfr2r09/setup.c | 2
a/arch/sh/boards/mach-migor/setup.c | 2
a/arch/sh/boards/mach-se/7724/setup.c | 4
a/arch/sparc/kernel/smp_64.c | 4
a/arch/um/kernel/mem.c | 4
a/arch/x86/Kconfig | 6
a/arch/x86/kernel/setup.c | 4
a/arch/x86/kernel/setup_percpu.c | 2
a/arch/x86/mm/init.c | 2
a/arch/x86/mm/init_32.c | 31
a/arch/x86/mm/kasan_init_64.c | 4
a/arch/x86/mm/numa.c | 2
a/arch/x86/mm/numa_emulation.c | 2
a/arch/x86/xen/mmu_pv.c | 8
a/arch/x86/xen/p2m.c | 4
a/arch/x86/xen/setup.c | 6
a/drivers/base/Makefile | 2
a/drivers/base/arch_numa.c | 96 +
a/drivers/base/node.c | 9
a/drivers/block/zram/zram_drv.c | 66
a/drivers/firmware/efi/memmap.c | 2
a/drivers/hwmon/occ/p9_sbe.c | 1
a/drivers/macintosh/smu.c | 2
a/drivers/mmc/core/mmc_test.c | 1
a/drivers/mtd/mtdcore.c | 1
a/drivers/of/kexec.c | 4
a/drivers/of/of_reserved_mem.c | 5
a/drivers/rapidio/devices/rio_mport_cdev.c | 9
a/drivers/s390/char/sclp_early.c | 4
a/drivers/usb/early/xhci-dbc.c | 10
a/drivers/virtio/Kconfig | 2
a/drivers/xen/swiotlb-xen.c | 4
a/fs/d_path.c | 8
a/fs/exec.c | 4
a/fs/ocfs2/alloc.c | 21
a/fs/ocfs2/dlm/dlmrecovery.c | 1
a/fs/ocfs2/file.c | 8
a/fs/ocfs2/inode.c | 4
a/fs/ocfs2/journal.c | 28
a/fs/ocfs2/journal.h | 3
a/fs/ocfs2/super.c | 40
a/fs/open.c | 16
a/fs/posix_acl.c | 3
a/fs/proc/task_mmu.c | 28
a/fs/super.c | 3
a/include/asm-generic/sections.h | 14
a/include/linux/backing-dev-defs.h | 3
a/include/linux/backing-dev.h | 1
a/include/linux/cma.h | 1
a/include/linux/compiler-gcc.h | 8
a/include/linux/compiler_attributes.h | 10
a/include/linux/compiler_types.h | 12
a/include/linux/cpuset.h | 17
a/include/linux/damon.h | 258 +++
a/include/linux/fs.h | 1
a/include/linux/gfp.h | 8
a/include/linux/highmem.h | 28
a/include/linux/hugetlb.h | 36
a/include/linux/io-mapping.h | 6
a/include/linux/kasan.h | 8
a/include/linux/kernel.h | 1
a/include/linux/kfence.h | 21
a/include/linux/memblock.h | 48
a/include/linux/memcontrol.h | 9
a/include/linux/memory.h | 26
a/include/linux/memory_hotplug.h | 3
a/include/linux/mempolicy.h | 5
a/include/linux/migrate.h | 23
a/include/linux/migrate_mode.h | 13
a/include/linux/mm.h | 57
a/include/linux/mm_types.h | 2
a/include/linux/mmzone.h | 41
a/include/linux/node.h | 4
a/include/linux/page-flags.h | 2
a/include/linux/percpu.h | 6
a/include/linux/sched/mm.h | 25
a/include/linux/slab.h | 181 +-
a/include/linux/slub_def.h | 13
a/include/linux/stackdepot.h | 8
a/include/linux/stacktrace.h | 1
a/include/linux/swap.h | 1
a/include/linux/vmalloc.h | 24
a/include/trace/events/mmap_lock.h | 50
a/include/trace/events/vmscan.h | 42
a/include/trace/events/writeback.h | 7
a/init/Kconfig | 2
a/init/initramfs.c | 4
a/init/main.c | 6
a/kernel/cgroup/cpuset.c | 23
a/kernel/cpu.c | 2
a/kernel/dma/swiotlb.c | 6
a/kernel/exit.c | 2
a/kernel/extable.c | 2
a/kernel/fork.c | 51
a/kernel/kexec_file.c | 5
a/kernel/kthread.c | 21
a/kernel/locking/lockdep.c | 15
a/kernel/printk/printk.c | 4
a/kernel/sched/core.c | 37
a/kernel/sched/sched.h | 4
a/kernel/sched/topology.c | 1
a/kernel/stacktrace.c | 30
a/kernel/tsacct.c | 2
a/kernel/workqueue.c | 2
a/lib/Kconfig.debug | 2
a/lib/Kconfig.kfence | 26
a/lib/bootconfig.c | 2
a/lib/cpumask.c | 6
a/lib/stackdepot.c | 76 -
a/lib/test_kasan.c | 26
a/lib/test_kasan_module.c | 2
a/lib/test_vmalloc.c | 6
a/mm/Kconfig | 10
a/mm/backing-dev.c | 65
a/mm/cma.c | 26
a/mm/compaction.c | 12
a/mm/damon/Kconfig | 24
a/mm/damon/Makefile | 4
a/mm/damon/core.c | 500 ++++++-
a/mm/damon/dbgfs-test.h | 56
a/mm/damon/dbgfs.c | 486 +++++-
a/mm/damon/paddr.c | 275 +++
a/mm/damon/prmtv-common.c | 133 +
a/mm/damon/prmtv-common.h | 20
a/mm/damon/reclaim.c | 356 ++++
a/mm/damon/vaddr-test.h | 2
a/mm/damon/vaddr.c | 167 +-
a/mm/debug.c | 20
a/mm/debug_vm_pgtable.c | 7
a/mm/filemap.c | 78 -
a/mm/gup.c | 5
a/mm/highmem.c | 6
a/mm/hugetlb.c | 713 +++++++++-
a/mm/hugetlb_cgroup.c | 3
a/mm/internal.h | 26
a/mm/kasan/common.c | 8
a/mm/kasan/generic.c | 16
a/mm/kasan/kasan.h | 2
a/mm/kasan/shadow.c | 5
a/mm/kfence/core.c | 214 ++-
a/mm/kfence/kfence.h | 2
a/mm/kfence/kfence_test.c | 14
a/mm/khugepaged.c | 10
a/mm/list_lru.c | 58
a/mm/memblock.c | 35
a/mm/memcontrol.c | 217 +--
a/mm/memory-failure.c | 117 +
a/mm/memory.c | 166 +-
a/mm/memory_hotplug.c | 57
a/mm/mempolicy.c | 143 +-
a/mm/migrate.c | 61
a/mm/mmap.c | 2
a/mm/mprotect.c | 5
a/mm/mremap.c | 86 -
a/mm/nommu.c | 6
a/mm/oom_kill.c | 27
a/mm/page-writeback.c | 13
a/mm/page_alloc.c | 119 -
a/mm/page_ext.c | 2
a/mm/page_isolation.c | 29
a/mm/percpu.c | 24
a/mm/readahead.c | 2
a/mm/rmap.c | 8
a/mm/shmem.c | 44
a/mm/slab.c | 16
a/mm/slab_common.c | 8
a/mm/slub.c | 117 -
a/mm/sparse-vmemmap.c | 2
a/mm/sparse.c | 6
a/mm/swap.c | 23
a/mm/swapfile.c | 6
a/mm/userfaultfd.c | 8
a/mm/vmalloc.c | 107 +
a/mm/vmpressure.c | 2
a/mm/vmscan.c | 194 ++
a/mm/vmstat.c | 76 -
a/mm/zsmalloc.c | 7
a/net/ipv4/tcp.c | 1
a/net/ipv4/udp.c | 1
a/net/netfilter/ipvs/ip_vs_ctl.c | 1
a/net/openvswitch/meter.c | 1
a/net/sctp/protocol.c | 1
a/scripts/checkpatch.pl | 3
a/scripts/decodecode | 2
a/scripts/spelling.txt | 18
a/security/Kconfig | 14
a/tools/testing/selftests/damon/debugfs_attrs.sh | 25
a/tools/testing/selftests/memory-hotplug/config | 1
a/tools/testing/selftests/vm/.gitignore | 1
a/tools/testing/selftests/vm/Makefile | 1
a/tools/testing/selftests/vm/hugepage-mremap.c | 161 ++
a/tools/testing/selftests/vm/ksm_tests.c | 154 ++
a/tools/testing/selftests/vm/madv_populate.c | 15
a/tools/testing/selftests/vm/run_vmtests.sh | 11
a/tools/testing/selftests/vm/transhuge-stress.c | 2
a/tools/testing/selftests/vm/userfaultfd.c | 157 +-
a/tools/vm/page-types.c | 38
a/tools/vm/page_owner_sort.c | 94 +
b/Documentation/admin-guide/mm/index.rst | 2
b/Documentation/vm/index.rst | 26
260 files changed, 6448 insertions(+), 2327 deletions(-)
* incoming
@ 2021-10-28 21:35 Andrew Morton
From: Andrew Morton @ 2021-10-28 21:35 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
11 patches, based on 411a44c24a561e449b592ff631b7ae321f1eb559.
Subsystems affected by this patch series:
mm/memcg
mm/memory-failure
mm/oom-kill
ocfs2
mm/secretmem
mm/vmalloc
mm/hugetlb
mm/damon
mm/tools
Subsystem: mm/memcg
Shakeel Butt <shakeelb@google.com>:
memcg: page_alloc: skip bulk allocator for __GFP_ACCOUNT
Subsystem: mm/memory-failure
Yang Shi <shy828301@gmail.com>:
mm: hwpoison: remove the unnecessary THP check
mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
Subsystem: mm/oom-kill
Suren Baghdasaryan <surenb@google.com>:
mm/oom_kill.c: prevent a race between process_mrelease and exit_mmap
Subsystem: ocfs2
Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>:
ocfs2: fix race between searching chunks and release journal_head from buffer_head
Subsystem: mm/secretmem
Kees Cook <keescook@chromium.org>:
mm/secretmem: avoid letting secretmem_users drop to zero
Subsystem: mm/vmalloc
Chen Wandun <chenwandun@huawei.com>:
mm/vmalloc: fix numa spreading for large hash tables
Subsystem: mm/hugetlb
Rongwei Wang <rongwei.wang@linux.alibaba.com>:
mm, thp: bail out early in collapse_file for writeback page
Yang Shi <shy828301@gmail.com>:
mm: khugepaged: skip huge page collapse for special files
Subsystem: mm/damon
SeongJae Park <sj@kernel.org>:
mm/damon/core-test: fix wrong expectations for 'damon_split_regions_of()'
Subsystem: mm/tools
David Yang <davidcomponentone@gmail.com>:
tools/testing/selftests/vm/split_huge_page_test.c: fix application of sizeof to pointer
fs/ocfs2/suballoc.c | 22 ++++++++++-------
include/linux/page-flags.h | 23 ++++++++++++++++++
mm/damon/core-test.h | 4 +--
mm/huge_memory.c | 2 +
mm/khugepaged.c | 26 +++++++++++++-------
mm/memory-failure.c | 28 +++++++++++-----------
mm/memory.c | 9 +++++++
mm/oom_kill.c | 23 +++++++++---------
mm/page_alloc.c | 8 +++++-
mm/secretmem.c | 2 -
mm/vmalloc.c | 15 +++++++----
tools/testing/selftests/vm/split_huge_page_test.c | 2 -
12 files changed, 110 insertions(+), 54 deletions(-)
* incoming
@ 2021-10-18 22:14 Andrew Morton
From: Andrew Morton @ 2021-10-18 22:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
19 patches, based on 519d81956ee277b4419c723adfb154603c2565ba.
Subsystems affected by this patch series:
mm/userfaultfd
mm/migration
ocfs2
mm/memblock
mm/mempolicy
mm/slub
binfmt
vfs
mm/secretmem
mm/thp
misc
Subsystem: mm/userfaultfd
Peter Xu <peterx@redhat.com>:
mm/userfaultfd: selftests: fix memory corruption with thp enabled
Nadav Amit <namit@vmware.com>:
userfaultfd: fix a race between writeprotect and exit_mmap()
Subsystem: mm/migration
Dave Hansen <dave.hansen@linux.intel.com>:
Patch series "mm/migrate: 5.15 fixes for automatic demotion", v2:
mm/migrate: optimize hotplug-time demotion order updates
mm/migrate: add CPU hotplug to demotion #ifdef
Huang Ying <ying.huang@intel.com>:
mm/migrate: fix CPUHP state to update node demotion order
Subsystem: ocfs2
Jan Kara <jack@suse.cz>:
ocfs2: fix data corruption after conversion from inline format
Valentin Vidic <vvidic@valentin-vidic.from.hr>:
ocfs2: mount fails with buffer overflow in strlen
Subsystem: mm/memblock
Peng Fan <peng.fan@nxp.com>:
memblock: check memory total_size
Subsystem: mm/mempolicy
Eric Dumazet <edumazet@google.com>:
mm/mempolicy: do not allow illegal MPOL_F_NUMA_BALANCING | MPOL_LOCAL in mbind()
Subsystem: mm/slub
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Fixups for slub":
mm, slub: fix two bugs in slab_debug_trace_open()
mm, slub: fix mismatch between reconstructed freelist depth and cnt
mm, slub: fix potential memoryleak in kmem_cache_open()
mm, slub: fix potential use-after-free in slab_debugfs_fops
mm, slub: fix incorrect memcg slab count for bulk free
Subsystem: binfmt
Lukas Bulwahn <lukas.bulwahn@gmail.com>:
elfcore: correct reference to CONFIG_UML
Subsystem: vfs
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
vfs: check fd has read access in kernel_read_file_from_fd()
Subsystem: mm/secretmem
Sean Christopherson <seanjc@google.com>:
mm/secretmem: fix NULL page->mapping dereference in page_is_secretmem()
Subsystem: mm/thp
Marek Szyprowski <m.szyprowski@samsung.com>:
mm/thp: decrease nr_thps in file's mapping on THP split
Subsystem: misc
Andrej Shadura <andrew.shadura@collabora.co.uk>:
mailmap: add Andrej Shadura
.mailmap | 2 +
fs/kernel_read_file.c | 2 -
fs/ocfs2/alloc.c | 46 ++++++-----------------
fs/ocfs2/super.c | 14 +++++--
fs/userfaultfd.c | 12 ++++--
include/linux/cpuhotplug.h | 4 ++
include/linux/elfcore.h | 2 -
include/linux/memory.h | 5 ++
include/linux/secretmem.h | 2 -
mm/huge_memory.c | 6 ++-
mm/memblock.c | 2 -
mm/mempolicy.c | 16 ++------
mm/migrate.c | 62 ++++++++++++++++++-------------
mm/page_ext.c | 4 --
mm/slab.c | 4 +-
mm/slub.c | 31 ++++++++++++---
tools/testing/selftests/vm/userfaultfd.c | 23 ++++++++++-
17 files changed, 138 insertions(+), 99 deletions(-)
* incoming
@ 2021-09-24 22:42 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-09-24 22:42 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
16 patches, based on 7d42e98182586f57f376406d033f05fe135edb75.
Subsystems affected by this patch series:
mm/memory-failure
mm/kasan
mm/damon
xtensa
mm/shmem
ocfs2
scripts
mm/tools
lib
mm/pagecache
mm/debug
sh
mm/kasan
mm/memory-failure
mm/pagemap
Subsystem: mm/memory-failure
Naoya Horiguchi <naoya.horiguchi@nec.com>:
mm, hwpoison: add is_free_buddy_page() in HWPoisonHandlable()
Subsystem: mm/kasan
Marco Elver <elver@google.com>:
kasan: fix Kconfig check of CC_HAS_WORKING_NOSANITIZE_ADDRESS
Subsystem: mm/damon
Adam Borowski <kilobyte@angband.pl>:
mm/damon: don't use strnlen() with known-bogus source length
Subsystem: xtensa
Guenter Roeck <linux@roeck-us.net>:
xtensa: increase size of gcc stack frame check
Subsystem: mm/shmem
Liu Yuntao <liuyuntao10@huawei.com>:
mm/shmem.c: fix judgment error in shmem_is_huge()
Subsystem: ocfs2
Wengang Wang <wen.gang.wang@oracle.com>:
ocfs2: drop acl cache for directories too
Subsystem: scripts
Miles Chen <miles.chen@mediatek.com>:
scripts/sorttable: riscv: fix undeclared identifier 'EM_RISCV' error
Subsystem: mm/tools
Changbin Du <changbin.du@gmail.com>:
tools/vm/page-types: remove dependency on opt_file for idle page tracking
Subsystem: lib
Paul Menzel <pmenzel@molgen.mpg.de>:
lib/zlib_inflate/inffast: check config in C to avoid unused function warning
Subsystem: mm/pagecache
Minchan Kim <minchan@kernel.org>:
mm: fs: invalidate bh_lrus for only cold path
Subsystem: mm/debug
Weizhao Ouyang <o451686892@gmail.com>:
mm/debug: sync up MR_CONTIG_RANGE and MR_LONGTERM_PIN
mm/debug: sync up latest migrate_reason to migrate_reason_names
Subsystem: sh
Geert Uytterhoeven <geert+renesas@glider.be>:
sh: pgtable-3level: fix cast to pointer from integer of different size
Subsystem: mm/kasan
Nathan Chancellor <nathan@kernel.org>:
kasan: always respect CONFIG_KASAN_STACK
Subsystem: mm/memory-failure
Qi Zheng <zhengqi.arch@bytedance.com>:
mm/memory_failure: fix the missing pte_unmap() call
Subsystem: mm/pagemap
Chen Jun <chenjun102@huawei.com>:
mm: fix uninitialized use in overcommit_policy_handler
arch/sh/include/asm/pgtable-3level.h | 2 +-
fs/buffer.c | 8 ++++++--
fs/ocfs2/dlmglue.c | 3 ++-
include/linux/buffer_head.h | 4 ++--
include/linux/migrate.h | 6 +++++-
lib/Kconfig.debug | 2 +-
lib/Kconfig.kasan | 2 ++
lib/zlib_inflate/inffast.c | 13 ++++++-------
mm/damon/dbgfs-test.h | 16 ++++++++--------
mm/debug.c | 4 +++-
mm/memory-failure.c | 12 ++++++------
mm/shmem.c | 4 ++--
mm/swap.c | 19 ++++++++++++++++---
mm/util.c | 4 ++--
scripts/Makefile.kasan | 3 ++-
scripts/sorttable.c | 4 ++++
tools/vm/page-types.c | 2 +-
17 files changed, 69 insertions(+), 39 deletions(-)
* Re: incoming
2021-09-10 17:11 ` incoming Kees Cook
@ 2021-09-10 20:13 ` Kees Cook
0 siblings, 0 replies; 786+ messages in thread
From: Kees Cook @ 2021-09-10 20:13 UTC (permalink / raw)
To: linux-kernel; +Cc: Linus Torvalds, Andrew Morton, linux-mm, mm-commits
On Fri, Sep 10, 2021 at 10:11:53AM -0700, Kees Cook wrote:
> On Thu, Sep 09, 2021 at 08:09:48PM -0700, Andrew Morton wrote:
> >
> > More post linux-next material.
> >
> > 9 patches, based on f154c806676ad7153c6e161f30c53a44855329d6.
> >
> > Subsystems affected by this patch series:
> >
> > mm/slab-generic
> > rapidio
> > mm/debug
> >
> > Subsystem: mm/slab-generic
> >
> > "Matthew Wilcox (Oracle)" <willy@infradead.org>:
> > mm: move kvmalloc-related functions to slab.h
> >
> > Subsystem: rapidio
> >
> > Kees Cook <keescook@chromium.org>:
> > rapidio: avoid bogus __alloc_size warning
> >
> > Subsystem: mm/debug
> >
> > Kees Cook <keescook@chromium.org>:
> > Patch series "Add __alloc_size() for better bounds checking", v2:
> > Compiler Attributes: add __alloc_size() for better bounds checking
> > checkpatch: add __alloc_size() to known $Attribute
> > slab: clean up function declarations
> > slab: add __alloc_size attributes for better bounds checking
> > mm/page_alloc: add __alloc_size attributes for better bounds checking
> > percpu: add __alloc_size attributes for better bounds checking
> > mm/vmalloc: add __alloc_size attributes for better bounds checking
>
> Hi,
>
> FYI, in overnight build testing I found yet another corner case in
> GCC's handling of the __alloc_size attribute. It's the gift that keeps
> on giving. The fix is here:
>
> https://lore.kernel.org/lkml/20210910165851.3296624-1-keescook@chromium.org/
I'm so glad it's Friday. Here's the v2 fix... *sigh*
https://lore.kernel.org/lkml/20210910201132.3809437-1-keescook@chromium.org/
-Kees
>
> >
> > Makefile | 15 +++
> > drivers/of/kexec.c | 1
> > drivers/rapidio/devices/rio_mport_cdev.c | 9 +-
> > include/linux/compiler_attributes.h | 6 +
> > include/linux/gfp.h | 2
> > include/linux/mm.h | 34 --------
> > include/linux/percpu.h | 3
> > include/linux/slab.h | 122 ++++++++++++++++++++++---------
> > include/linux/vmalloc.h | 11 ++
> > scripts/checkpatch.pl | 3
> > 10 files changed, 132 insertions(+), 74 deletions(-)
> >
>
> --
> Kees Cook
--
Kees Cook
* Re: incoming
2021-09-10 3:09 incoming Andrew Morton
@ 2021-09-10 17:11 ` Kees Cook
2021-09-10 20:13 ` incoming Kees Cook
0 siblings, 1 reply; 786+ messages in thread
From: Kees Cook @ 2021-09-10 17:11 UTC (permalink / raw)
To: Linus Torvalds, Andrew Morton; +Cc: linux-mm, mm-commits
On Thu, Sep 09, 2021 at 08:09:48PM -0700, Andrew Morton wrote:
>
> More post linux-next material.
>
> 9 patches, based on f154c806676ad7153c6e161f30c53a44855329d6.
>
> Subsystems affected by this patch series:
>
> mm/slab-generic
> rapidio
> mm/debug
>
> Subsystem: mm/slab-generic
>
> "Matthew Wilcox (Oracle)" <willy@infradead.org>:
> mm: move kvmalloc-related functions to slab.h
>
> Subsystem: rapidio
>
> Kees Cook <keescook@chromium.org>:
> rapidio: avoid bogus __alloc_size warning
>
> Subsystem: mm/debug
>
> Kees Cook <keescook@chromium.org>:
> Patch series "Add __alloc_size() for better bounds checking", v2:
> Compiler Attributes: add __alloc_size() for better bounds checking
> checkpatch: add __alloc_size() to known $Attribute
> slab: clean up function declarations
> slab: add __alloc_size attributes for better bounds checking
> mm/page_alloc: add __alloc_size attributes for better bounds checking
> percpu: add __alloc_size attributes for better bounds checking
> mm/vmalloc: add __alloc_size attributes for better bounds checking
Hi,
FYI, in overnight build testing I found yet another corner case in
GCC's handling of the __alloc_size attribute. It's the gift that keeps
on giving. The fix is here:
https://lore.kernel.org/lkml/20210910165851.3296624-1-keescook@chromium.org/
>
> Makefile | 15 +++
> drivers/of/kexec.c | 1
> drivers/rapidio/devices/rio_mport_cdev.c | 9 +-
> include/linux/compiler_attributes.h | 6 +
> include/linux/gfp.h | 2
> include/linux/mm.h | 34 --------
> include/linux/percpu.h | 3
> include/linux/slab.h | 122 ++++++++++++++++++++++---------
> include/linux/vmalloc.h | 11 ++
> scripts/checkpatch.pl | 3
> 10 files changed, 132 insertions(+), 74 deletions(-)
>
--
Kees Cook
* incoming
@ 2021-09-10 3:09 Andrew Morton
2021-09-10 17:11 ` incoming Kees Cook
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2021-09-10 3:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
More post linux-next material.
9 patches, based on f154c806676ad7153c6e161f30c53a44855329d6.
Subsystems affected by this patch series:
mm/slab-generic
rapidio
mm/debug
Subsystem: mm/slab-generic
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: move kvmalloc-related functions to slab.h
Subsystem: rapidio
Kees Cook <keescook@chromium.org>:
rapidio: avoid bogus __alloc_size warning
Subsystem: mm/debug
Kees Cook <keescook@chromium.org>:
Patch series "Add __alloc_size() for better bounds checking", v2:
Compiler Attributes: add __alloc_size() for better bounds checking
checkpatch: add __alloc_size() to known $Attribute
slab: clean up function declarations
slab: add __alloc_size attributes for better bounds checking
mm/page_alloc: add __alloc_size attributes for better bounds checking
percpu: add __alloc_size attributes for better bounds checking
mm/vmalloc: add __alloc_size attributes for better bounds checking
Makefile | 15 +++
drivers/of/kexec.c | 1
drivers/rapidio/devices/rio_mport_cdev.c | 9 +-
include/linux/compiler_attributes.h | 6 +
include/linux/gfp.h | 2
include/linux/mm.h | 34 --------
include/linux/percpu.h | 3
include/linux/slab.h | 122 ++++++++++++++++++++++---------
include/linux/vmalloc.h | 11 ++
scripts/checkpatch.pl | 3
10 files changed, 132 insertions(+), 74 deletions(-)
* incoming
@ 2021-09-09 1:08 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-09-09 1:08 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
A bunch of hotfixes, mostly cc:stable.
8 patches, based on 2d338201d5311bcd79d42f66df4cecbcbc5f4f2c.
Subsystems affected by this patch series:
mm/hmm
mm/hugetlb
mm/vmscan
mm/pagealloc
mm/pagemap
mm/kmemleak
mm/mempolicy
mm/memblock
Subsystem: mm/hmm
Li Zhijian <lizhijian@cn.fujitsu.com>:
mm/hmm: bypass devmap pte when all pfn requested flags are fulfilled
Subsystem: mm/hugetlb
Liu Zixian <liuzixian4@huawei.com>:
mm/hugetlb: initialize hugetlb_usage in mm_init
Subsystem: mm/vmscan
Rik van Riel <riel@surriel.com>:
mm,vmscan: fix divide by zero in get_scan_count
Subsystem: mm/pagealloc
Miaohe Lin <linmiaohe@huawei.com>:
mm/page_alloc.c: avoid accessing uninitialized pcp page migratetype
Subsystem: mm/pagemap
Liam Howlett <liam.howlett@oracle.com>:
mmap_lock: change trace and locking order
Subsystem: mm/kmemleak
Naohiro Aota <naohiro.aota@wdc.com>:
mm/kmemleak: allow __GFP_NOLOCKDEP passed to kmemleak's gfp
Subsystem: mm/mempolicy
yanghui <yanghui.def@bytedance.com>:
mm/mempolicy: fix a race between offset_il_node and mpol_rebind_task
Subsystem: mm/memblock
Mike Rapoport <rppt@linux.ibm.com>:
nds32/setup: remove unused memblock_region variable in setup_memory()
arch/nds32/kernel/setup.c | 1 -
include/linux/hugetlb.h | 9 +++++++++
include/linux/mmap_lock.h | 8 ++++----
kernel/fork.c | 1 +
mm/hmm.c | 5 ++++-
mm/kmemleak.c | 3 ++-
mm/mempolicy.c | 17 +++++++++++++----
mm/page_alloc.c | 4 +++-
mm/vmscan.c | 2 +-
9 files changed, 37 insertions(+), 13 deletions(-)
* incoming
@ 2021-09-08 22:17 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-09-08 22:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
This is the post-linux-next material, so it is based upon latest
upstream to catch the now-merged dependencies.
10 patches, based on 2d338201d5311bcd79d42f66df4cecbcbc5f4f2c.
Subsystems affected by this patch series:
mm/vmstat
mm/migration
compat
Subsystem: mm/vmstat
Ingo Molnar <mingo@elte.hu>:
mm/vmstat: protect per cpu variables with preempt disable on RT
Subsystem: mm/migration
Baolin Wang <baolin.wang@linux.alibaba.com>:
mm: migrate: introduce a local variable to get the number of pages
mm: migrate: fix the incorrect function name in comments
mm: migrate: change to use bool type for 'page_was_mapped'
Subsystem: compat
Arnd Bergmann <arnd@arndb.de>:
Patch series "compat: remove compat_alloc_user_space", v5:
kexec: move locking into do_kexec_load
kexec: avoid compat_alloc_user_space
mm: simplify compat_sys_move_pages
mm: simplify compat numa syscalls
compat: remove some compat entry points
arch: remove compat_alloc_user_space
arch/arm64/include/asm/compat.h | 5
arch/arm64/include/asm/uaccess.h | 11 -
arch/arm64/include/asm/unistd32.h | 10 -
arch/arm64/lib/Makefile | 2
arch/arm64/lib/copy_in_user.S | 77 ----------
arch/mips/cavium-octeon/octeon-memcpy.S | 2
arch/mips/include/asm/compat.h | 8 -
arch/mips/include/asm/uaccess.h | 26 ---
arch/mips/kernel/syscalls/syscall_n32.tbl | 10 -
arch/mips/kernel/syscalls/syscall_o32.tbl | 10 -
arch/mips/lib/memcpy.S | 11 -
arch/parisc/include/asm/compat.h | 6
arch/parisc/include/asm/uaccess.h | 2
arch/parisc/kernel/syscalls/syscall.tbl | 8 -
arch/parisc/lib/memcpy.c | 9 -
arch/powerpc/include/asm/compat.h | 16 --
arch/powerpc/kernel/syscalls/syscall.tbl | 10 -
arch/s390/include/asm/compat.h | 10 -
arch/s390/include/asm/uaccess.h | 3
arch/s390/kernel/syscalls/syscall.tbl | 10 -
arch/s390/lib/uaccess.c | 63 --------
arch/sparc/include/asm/compat.h | 19 --
arch/sparc/kernel/process_64.c | 2
arch/sparc/kernel/signal32.c | 12 -
arch/sparc/kernel/signal_64.c | 8 -
arch/sparc/kernel/syscalls/syscall.tbl | 10 -
arch/x86/entry/syscalls/syscall_32.tbl | 4
arch/x86/entry/syscalls/syscall_64.tbl | 2
arch/x86/include/asm/compat.h | 13 -
arch/x86/include/asm/uaccess_64.h | 7
include/linux/compat.h | 39 +----
include/linux/uaccess.h | 10 -
include/uapi/asm-generic/unistd.h | 10 -
kernel/compat.c | 21 --
kernel/kexec.c | 105 +++++---------
kernel/sys_ni.c | 5
mm/mempolicy.c | 213 +++++++-----------------------
mm/migrate.c | 69 +++++----
mm/vmstat.c | 48 ++++++
39 files changed, 243 insertions(+), 663 deletions(-)
* Re: incoming
2021-09-08 2:52 incoming Andrew Morton
@ 2021-09-08 8:57 ` Vlastimil Babka
0 siblings, 0 replies; 786+ messages in thread
From: Vlastimil Babka @ 2021-09-08 8:57 UTC (permalink / raw)
To: Andrew Morton, Linus Torvalds
Cc: linux-mm, mm-commits, Mike Galbraith, Mel Gorman
On 9/8/21 04:52, Andrew Morton wrote:
> Subsystem: mm/slub
>
> Vlastimil Babka <vbabka@suse.cz>:
> Patch series "SLUB: reduce irq disabled scope and make it RT compatible", v6:
> mm, slub: don't call flush_all() from slab_debug_trace_open()
> mm, slub: allocate private object map for debugfs listings
> mm, slub: allocate private object map for validate_slab_cache()
> mm, slub: don't disable irq for debug_check_no_locks_freed()
> mm, slub: remove redundant unfreeze_partials() from put_cpu_partial()
> mm, slub: extract get_partial() from new_slab_objects()
> mm, slub: dissolve new_slab_objects() into ___slab_alloc()
> mm, slub: return slab page from get_partial() and set c->page afterwards
> mm, slub: restructure new page checks in ___slab_alloc()
> mm, slub: simplify kmem_cache_cpu and tid setup
> mm, slub: move disabling/enabling irqs to ___slab_alloc()
> mm, slub: do initial checks in ___slab_alloc() with irqs enabled
> mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc()
> mm, slub: restore irqs around calling new_slab()
> mm, slub: validate slab from partial list or page allocator before making it cpu slab
> mm, slub: check new pages with restored irqs
> mm, slub: stop disabling irqs around get_partial()
> mm, slub: move reset of c->page and freelist out of deactivate_slab()
> mm, slub: make locking in deactivate_slab() irq-safe
> mm, slub: call deactivate_slab() without disabling irqs
> mm, slub: move irq control into unfreeze_partials()
> mm, slub: discard slabs in unfreeze_partials() without irqs disabled
> mm, slub: detach whole partial list at once in unfreeze_partials()
> mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing
> mm, slub: only disable irq with spin_lock in __unfreeze_partials()
> mm, slub: don't disable irqs in slub_cpu_dead()
> mm, slab: split out the cpu offline variant of flush_slab()
>
> Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
> mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context
> mm: slub: make object_map_lock a raw_spinlock_t
>
> Vlastimil Babka <vbabka@suse.cz>:
> mm, slub: make slab_lock() disable irqs with PREEMPT_RT
> mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg
> mm, slub: use migrate_disable() on PREEMPT_RT
> mm, slub: convert kmem_cpu_slab protection to local_lock
For my own peace of mind, I've checked that this part (patches 1 to 33)
is identical to the v6 posting [1] and git version [2] that Mel and
Mike tested (replies to [1]).
[1] https://lore.kernel.org/all/20210904105003.11688-1-vbabka@suse.cz/
[2] git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git
tags/mm-slub-5.15-rc1
* incoming
@ 2021-09-08 2:52 Andrew Morton
2021-09-08 8:57 ` incoming Vlastimil Babka
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2021-09-08 2:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
147 patches, based on 7d2a07b769330c34b4deabeed939325c77a7ec2f.
Subsystems affected by this patch series:
mm/slub
mm/memory-hotplug
mm/rmap
mm/ioremap
mm/highmem
mm/cleanups
mm/secretmem
mm/kfence
mm/damon
alpha
percpu
procfs
misc
core-kernel
MAINTAINERS
lib
bitops
checkpatch
epoll
init
nilfs2
coredump
fork
pids
criu
kconfig
selftests
ipc
mm/vmscan
scripts
Subsystem: mm/slub
Vlastimil Babka <vbabka@suse.cz>:
Patch series "SLUB: reduce irq disabled scope and make it RT compatible", v6:
mm, slub: don't call flush_all() from slab_debug_trace_open()
mm, slub: allocate private object map for debugfs listings
mm, slub: allocate private object map for validate_slab_cache()
mm, slub: don't disable irq for debug_check_no_locks_freed()
mm, slub: remove redundant unfreeze_partials() from put_cpu_partial()
mm, slub: extract get_partial() from new_slab_objects()
mm, slub: dissolve new_slab_objects() into ___slab_alloc()
mm, slub: return slab page from get_partial() and set c->page afterwards
mm, slub: restructure new page checks in ___slab_alloc()
mm, slub: simplify kmem_cache_cpu and tid setup
mm, slub: move disabling/enabling irqs to ___slab_alloc()
mm, slub: do initial checks in ___slab_alloc() with irqs enabled
mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc()
mm, slub: restore irqs around calling new_slab()
mm, slub: validate slab from partial list or page allocator before making it cpu slab
mm, slub: check new pages with restored irqs
mm, slub: stop disabling irqs around get_partial()
mm, slub: move reset of c->page and freelist out of deactivate_slab()
mm, slub: make locking in deactivate_slab() irq-safe
mm, slub: call deactivate_slab() without disabling irqs
mm, slub: move irq control into unfreeze_partials()
mm, slub: discard slabs in unfreeze_partials() without irqs disabled
mm, slub: detach whole partial list at once in unfreeze_partials()
mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing
mm, slub: only disable irq with spin_lock in __unfreeze_partials()
mm, slub: don't disable irqs in slub_cpu_dead()
mm, slab: split out the cpu offline variant of flush_slab()
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context
mm: slub: make object_map_lock a raw_spinlock_t
Vlastimil Babka <vbabka@suse.cz>:
mm, slub: make slab_lock() disable irqs with PREEMPT_RT
mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg
mm, slub: use migrate_disable() on PREEMPT_RT
mm, slub: convert kmem_cpu_slab protection to local_lock
Subsystem: mm/memory-hotplug
David Hildenbrand <david@redhat.com>:
Patch series "memory-hotplug.rst: complete admin-guide overhaul", v3:
memory-hotplug.rst: remove locking details from admin-guide
memory-hotplug.rst: complete admin-guide overhaul
Mike Rapoport <rppt@linux.ibm.com>:
Patch series "mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE":
mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE
mm: memory_hotplug: cleanup after removal of pfn_valid_within()
David Hildenbrand <david@redhat.com>:
Patch series "mm/memory_hotplug: preparatory patches for new online policy and memory":
mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range()
mm/memory_hotplug: remove nid parameter from arch_remove_memory()
mm/memory_hotplug: remove nid parameter from remove_memory() and friends
ACPI: memhotplug: memory resources cannot be enabled yet
Patch series "mm/memory_hotplug: "auto-movable" online policy and memory groups", v3:
mm: track present early pages per zone
mm/memory_hotplug: introduce "auto-movable" online policy
drivers/base/memory: introduce "memory groups" to logically group memory blocks
mm/memory_hotplug: track present pages in memory groups
ACPI: memhotplug: use a single static memory group for a single memory device
dax/kmem: use a single static memory group for a single probed unit
virtio-mem: use a single dynamic memory group for a single virtio-mem device
mm/memory_hotplug: memory group aware "auto-movable" online policy
mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Cleanup and fixups for memory hotplug":
mm/memory_hotplug: use helper zone_is_zone_device() to simplify the code
Subsystem: mm/rmap
Muchun Song <songmuchun@bytedance.com>:
mm: remove redundant compound_head() calling
Subsystem: mm/ioremap
Christoph Hellwig <hch@lst.de>:
riscv: only select GENERIC_IOREMAP if MMU support is enabled
Patch series "small ioremap cleanups":
mm: move ioremap_page_range to vmalloc.c
mm: don't allow executable ioremap mappings
Weizhao Ouyang <o451686892@gmail.com>:
mm/early_ioremap.c: remove redundant early_ioremap_shutdown()
Subsystem: mm/highmem
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
highmem: don't disable preemption on RT in kmap_atomic()
Subsystem: mm/cleanups
Changbin Du <changbin.du@gmail.com>:
mm: in_irq() cleanup
Muchun Song <songmuchun@bytedance.com>:
mm: introduce PAGEFLAGS_MASK to replace ((1UL << NR_PAGEFLAGS) - 1)
Subsystem: mm/secretmem
Jordy Zomer <jordy@jordyzomer.github.io>:
mm/secretmem: use refcount_t instead of atomic_t
Subsystem: mm/kfence
Marco Elver <elver@google.com>:
kfence: show cpu and timestamp in alloc/free info
kfence: test: fail fast if disabled at boot
Subsystem: mm/damon
SeongJae Park <sjpark@amazon.de>:
Patch series "Introduce Data Access MONitor (DAMON)", v34:
mm: introduce Data Access MONitor (DAMON)
mm/damon/core: implement region-based sampling
mm/damon: adaptively adjust regions
mm/idle_page_tracking: make PG_idle reusable
mm/damon: implement primitives for the virtual memory address spaces
mm/damon: add a tracepoint
mm/damon: implement a debugfs-based user space interface
mm/damon/dbgfs: export kdamond pid to the user space
mm/damon/dbgfs: support multiple contexts
Documentation: add documents for DAMON
mm/damon: add kunit tests
mm/damon: add user space selftests
MAINTAINERS: update for DAMON
Subsystem: alpha
Randy Dunlap <rdunlap@infradead.org>:
alpha: agp: make empty macros use do-while-0 style
alpha: pci-sysfs: fix all kernel-doc warnings
Subsystem: percpu
Greg Kroah-Hartman <gregkh@linuxfoundation.org>:
percpu: remove export of pcpu_base_addr
Subsystem: procfs
Feng Zhou <zhoufeng.zf@bytedance.com>:
fs/proc/kcore.c: add mmap interface
Christoph Hellwig <hch@lst.de>:
proc: stop using seq_get_buf in proc_task_name
Ohhoon Kwon <ohoono.kwon@samsung.com>:
connector: send event on write to /proc/[pid]/comm
Subsystem: misc
Colin Ian King <colin.king@canonical.com>:
arch: Kconfig: fix spelling mistake "seperate" -> "separate"
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
include/linux/once.h: fix trivia typo Not -> Note
Daniel Lezcano <daniel.lezcano@linaro.org>:
Patch series "Add Hz macros", v3:
units: change from 'L' to 'UL'
units: add the HZ macros
thermal/drivers/devfreq_cooling: use HZ macros
devfreq: use HZ macros
iio/drivers/as73211: use HZ macros
hwmon/drivers/mr75203: use HZ macros
iio/drivers/hid-sensor: use HZ macros
i2c/drivers/ov02q10: use HZ macros
mtd/drivers/nand: use HZ macros
phy/drivers/stm32: use HZ macros
Subsystem: core-kernel
Yang Yang <yang.yang29@zte.com.cn>:
kernel/acct.c: use dedicated helper to access rlimit values
Pavel Skripkin <paskripkin@gmail.com>:
profiling: fix shift-out-of-bounds bugs
Subsystem: MAINTAINERS
Nathan Chancellor <nathan@kernel.org>:
MAINTAINERS: update ClangBuiltLinux mailing list
Documentation/llvm: update mailing list
Documentation/llvm: update IRC location
Subsystem: lib
Geert Uytterhoeven <geert@linux-m68k.org>:
Patch series "math: RATIONAL and RATIONAL_KUNIT_TEST improvements":
math: make RATIONAL tristate
math: RATIONAL_KUNIT_TEST should depend on RATIONAL instead of selecting it
Matteo Croce <mcroce@microsoft.com>:
Patch series "lib/string: optimized mem* functions", v2:
lib/string: optimized memcpy
lib/string: optimized memmove
lib/string: optimized memset
Daniel Latypov <dlatypov@google.com>:
lib/test: convert test_sort.c to use KUnit
Randy Dunlap <rdunlap@infradead.org>:
lib/dump_stack: correct kernel-doc notation
lib/iov_iter.c: fix kernel-doc warnings
Subsystem: bitops
Yury Norov <yury.norov@gmail.com>:
Patch series "Resend bitmap patches":
bitops: protect find_first_{,zero}_bit properly
bitops: move find_bit_*_le functions from le.h to find.h
include: move find.h from asm_generic to linux
arch: remove GENERIC_FIND_FIRST_BIT entirely
lib: add find_first_and_bit()
cpumask: use find_first_and_bit()
all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
tools: sync tools/bitmap with mother linux
cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
include/linux: move for_each_bit() macros from bitops.h to find.h
find: micro-optimize for_each_{set,clear}_bit()
bitops: replace for_each_*_bit_from() with for_each_*_bit() where appropriate
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
tools: rename bitmap_alloc() to bitmap_zalloc()
Yury Norov <yury.norov@gmail.com>:
mm/percpu: micro-optimize pcpu_is_populated()
bitmap: unify find_bit operations
lib: bitmap: add performance test for bitmap_print_to_pagebuf
vsprintf: rework bitmap_list_string
Subsystem: checkpatch
Joe Perches <joe@perches.com>:
checkpatch: support wide strings
Mimi Zohar <zohar@linux.ibm.com>:
checkpatch: make email address check case insensitive
Joe Perches <joe@perches.com>:
checkpatch: improve GIT_COMMIT_ID test
Subsystem: epoll
Nicholas Piggin <npiggin@gmail.com>:
fs/epoll: use a per-cpu counter for user's watches count
Subsystem: init
Rasmus Villemoes <linux@rasmusvillemoes.dk>:
init: move usermodehelper_enable() to populate_rootfs()
Kefeng Wang <wangkefeng.wang@huawei.com>:
trap: cleanup trap_init()
Subsystem: nilfs2
Nanyong Sun <sunnanyong@huawei.com>:
Patch series "nilfs2: fix incorrect usage of kobject":
nilfs2: fix memory leak in nilfs_sysfs_create_device_group
nilfs2: fix NULL pointer in nilfs_##name##_attr_release
nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
Zhen Lei <thunder.leizhen@huawei.com>:
nilfs2: use refcount_dec_and_lock() to fix potential UAF
Subsystem: coredump
David Oberhollenzer <david.oberhollenzer@sigma-star.at>:
fs/coredump.c: log if a core dump is aborted due to changed file permissions
QiuXi <qiuxi1@huawei.com>:
coredump: fix memleak in dump_vma_snapshot()
Subsystem: fork
Christoph Hellwig <hch@lst.de>:
kernel/fork.c: unexport get_{mm,task}_exe_file
Subsystem: pids
Takahiro Itazuri <itazur@amazon.com>:
pid: cleanup the stale comment mentioning pidmap_init().
Subsystem: criu
Cyrill Gorcunov <gorcunov@gmail.com>:
prctl: allow to setup brk for et_dyn executables
Subsystem: kconfig
Zenghui Yu <yuzenghui@huawei.com>:
configs: remove the obsolete CONFIG_INPUT_POLLDEV
Lukas Bulwahn <lukas.bulwahn@gmail.com>:
Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
Subsystem: selftests
Greg Thelen <gthelen@google.com>:
selftests/memfd: remove unused variable
Subsystem: ipc
Rafael Aquini <aquini@redhat.com>:
ipc: replace costly bailout check in sysvipc_find_ipc()
Subsystem: mm/vmscan
Randy Dunlap <rdunlap@infradead.org>:
mm/workingset: correct kernel-doc notations
Subsystem: scripts
Randy Dunlap <rdunlap@infradead.org>:
scripts: check_extable: fix typo in user error message
a/Documentation/admin-guide/mm/damon/index.rst | 15
a/Documentation/admin-guide/mm/damon/start.rst | 114 +
a/Documentation/admin-guide/mm/damon/usage.rst | 112 +
a/Documentation/admin-guide/mm/index.rst | 1
a/Documentation/admin-guide/mm/memory-hotplug.rst | 842 ++++++-----
a/Documentation/dev-tools/kfence.rst | 98 -
a/Documentation/kbuild/llvm.rst | 5
a/Documentation/vm/damon/api.rst | 20
a/Documentation/vm/damon/design.rst | 166 ++
a/Documentation/vm/damon/faq.rst | 51
a/Documentation/vm/damon/index.rst | 30
a/Documentation/vm/index.rst | 1
a/MAINTAINERS | 17
a/arch/Kconfig | 2
a/arch/alpha/include/asm/agp.h | 4
a/arch/alpha/include/asm/bitops.h | 2
a/arch/alpha/kernel/pci-sysfs.c | 12
a/arch/arc/Kconfig | 1
a/arch/arc/include/asm/bitops.h | 1
a/arch/arc/kernel/traps.c | 5
a/arch/arm/configs/dove_defconfig | 1
a/arch/arm/configs/pxa_defconfig | 1
a/arch/arm/include/asm/bitops.h | 1
a/arch/arm/kernel/traps.c | 5
a/arch/arm64/Kconfig | 1
a/arch/arm64/include/asm/bitops.h | 1
a/arch/arm64/mm/mmu.c | 3
a/arch/csky/include/asm/bitops.h | 1
a/arch/h8300/include/asm/bitops.h | 1
a/arch/h8300/kernel/traps.c | 4
a/arch/hexagon/include/asm/bitops.h | 1
a/arch/hexagon/kernel/traps.c | 4
a/arch/ia64/include/asm/bitops.h | 2
a/arch/ia64/mm/init.c | 3
a/arch/m68k/include/asm/bitops.h | 2
a/arch/mips/Kconfig | 1
a/arch/mips/configs/lemote2f_defconfig | 1
a/arch/mips/configs/pic32mzda_defconfig | 1
a/arch/mips/configs/rt305x_defconfig | 1
a/arch/mips/configs/xway_defconfig | 1
a/arch/mips/include/asm/bitops.h | 1
a/arch/nds32/kernel/traps.c | 5
a/arch/nios2/kernel/traps.c | 5
a/arch/openrisc/include/asm/bitops.h | 1
a/arch/openrisc/kernel/traps.c | 5
a/arch/parisc/configs/generic-32bit_defconfig | 1
a/arch/parisc/include/asm/bitops.h | 2
a/arch/parisc/kernel/traps.c | 4
a/arch/powerpc/include/asm/bitops.h | 2
a/arch/powerpc/include/asm/cputhreads.h | 2
a/arch/powerpc/kernel/traps.c | 5
a/arch/powerpc/mm/mem.c | 3
a/arch/powerpc/platforms/pasemi/dma_lib.c | 4
a/arch/powerpc/platforms/pseries/hotplug-memory.c | 9
a/arch/riscv/Kconfig | 2
a/arch/riscv/include/asm/bitops.h | 1
a/arch/riscv/kernel/traps.c | 5
a/arch/s390/Kconfig | 1
a/arch/s390/include/asm/bitops.h | 1
a/arch/s390/kvm/kvm-s390.c | 2
a/arch/s390/mm/init.c | 3
a/arch/sh/include/asm/bitops.h | 1
a/arch/sh/mm/init.c | 3
a/arch/sparc/include/asm/bitops_32.h | 1
a/arch/sparc/include/asm/bitops_64.h | 2
a/arch/um/kernel/trap.c | 4
a/arch/x86/Kconfig | 1
a/arch/x86/configs/i386_defconfig | 1
a/arch/x86/configs/x86_64_defconfig | 1
a/arch/x86/include/asm/bitops.h | 2
a/arch/x86/kernel/apic/vector.c | 4
a/arch/x86/mm/init_32.c | 3
a/arch/x86/mm/init_64.c | 3
a/arch/x86/um/Kconfig | 1
a/arch/xtensa/include/asm/bitops.h | 1
a/block/blk-mq.c | 2
a/drivers/acpi/acpi_memhotplug.c | 46
a/drivers/base/memory.c | 231 ++-
a/drivers/base/node.c | 2
a/drivers/block/rnbd/rnbd-clt.c | 2
a/drivers/dax/kmem.c | 43
a/drivers/devfreq/devfreq.c | 2
a/drivers/dma/ti/edma.c | 2
a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 4
a/drivers/hwmon/ltc2992.c | 3
a/drivers/hwmon/mr75203.c | 2
a/drivers/iio/adc/ad7124.c | 2
a/drivers/iio/common/hid-sensors/hid-sensor-attributes.c | 3
a/drivers/iio/light/as73211.c | 3
a/drivers/infiniband/hw/irdma/hw.c | 16
a/drivers/media/cec/core/cec-core.c | 2
a/drivers/media/i2c/ov02a10.c | 2
a/drivers/media/mc/mc-devnode.c | 2
a/drivers/mmc/host/renesas_sdhi_core.c | 2
a/drivers/mtd/nand/raw/intel-nand-controller.c | 2
a/drivers/net/virtio_net.c | 2
a/drivers/pci/controller/dwc/pci-dra7xx.c | 2
a/drivers/phy/st/phy-stm32-usbphyc.c | 2
a/drivers/scsi/lpfc/lpfc_sli.c | 10
a/drivers/soc/fsl/qbman/bman_portal.c | 2
a/drivers/soc/fsl/qbman/qman_portal.c | 2
a/drivers/soc/ti/k3-ringacc.c | 4
a/drivers/thermal/devfreq_cooling.c | 2
a/drivers/tty/n_tty.c | 2
a/drivers/virt/acrn/ioreq.c | 3
a/drivers/virtio/virtio_mem.c | 26
a/fs/coredump.c | 15
a/fs/eventpoll.c | 18
a/fs/f2fs/segment.c | 8
a/fs/nilfs2/sysfs.c | 26
a/fs/nilfs2/the_nilfs.c | 9
a/fs/ocfs2/cluster/heartbeat.c | 2
a/fs/ocfs2/dlm/dlmdomain.c | 4
a/fs/ocfs2/dlm/dlmmaster.c | 18
a/fs/ocfs2/dlm/dlmrecovery.c | 2
a/fs/ocfs2/dlm/dlmthread.c | 2
a/fs/proc/array.c | 18
a/fs/proc/base.c | 5
a/fs/proc/kcore.c | 73
a/include/asm-generic/bitops.h | 1
a/include/asm-generic/bitops/find.h | 198 --
a/include/asm-generic/bitops/le.h | 64
a/include/asm-generic/early_ioremap.h | 6
a/include/linux/bitmap.h | 34
a/include/linux/bitops.h | 34
a/include/linux/cpumask.h | 46
a/include/linux/damon.h | 290 +++
a/include/linux/find.h | 134 +
a/include/linux/highmem-internal.h | 27
a/include/linux/memory.h | 55
a/include/linux/memory_hotplug.h | 40
a/include/linux/mmzone.h | 19
a/include/linux/once.h | 2
a/include/linux/page-flags.h | 17
a/include/linux/page_ext.h | 2
a/include/linux/page_idle.h | 6
a/include/linux/pagemap.h | 7
a/include/linux/sched/user.h | 3
a/include/linux/slub_def.h | 6
a/include/linux/threads.h | 2
a/include/linux/units.h | 10
a/include/linux/vmalloc.h | 3
a/include/trace/events/damon.h | 43
a/include/trace/events/mmflags.h | 2
a/include/trace/events/page_ref.h | 4
a/init/initramfs.c | 2
a/init/main.c | 3
a/init/noinitramfs.c | 2
a/ipc/util.c | 16
a/kernel/acct.c | 2
a/kernel/fork.c | 2
a/kernel/profile.c | 21
a/kernel/sys.c | 7
a/kernel/time/clocksource.c | 4
a/kernel/user.c | 25
a/lib/Kconfig | 3
a/lib/Kconfig.debug | 9
a/lib/dump_stack.c | 3
a/lib/find_bit.c | 21
a/lib/find_bit_benchmark.c | 21
a/lib/genalloc.c | 2
a/lib/iov_iter.c | 8
a/lib/math/Kconfig | 2
a/lib/math/rational.c | 3
a/lib/string.c | 130 +
a/lib/test_bitmap.c | 37
a/lib/test_printf.c | 2
a/lib/test_sort.c | 40
a/lib/vsprintf.c | 26
a/mm/Kconfig | 15
a/mm/Makefile | 4
a/mm/compaction.c | 20
a/mm/damon/Kconfig | 68
a/mm/damon/Makefile | 5
a/mm/damon/core-test.h | 253 +++
a/mm/damon/core.c | 748 ++++++++++
a/mm/damon/dbgfs-test.h | 126 +
a/mm/damon/dbgfs.c | 631 ++++++++
a/mm/damon/vaddr-test.h | 329 ++++
a/mm/damon/vaddr.c | 672 +++++++++
a/mm/early_ioremap.c | 5
a/mm/highmem.c | 2
a/mm/ioremap.c | 25
a/mm/kfence/core.c | 3
a/mm/kfence/kfence.h | 2
a/mm/kfence/kfence_test.c | 3
a/mm/kfence/report.c | 19
a/mm/kmemleak.c | 2
a/mm/memory_hotplug.c | 396 ++++-
a/mm/memremap.c | 5
a/mm/page_alloc.c | 27
a/mm/page_ext.c | 12
a/mm/page_idle.c | 10
a/mm/page_isolation.c | 7
a/mm/page_owner.c | 14
a/mm/percpu.c | 36
a/mm/rmap.c | 6
a/mm/secretmem.c | 9
a/mm/slab_common.c | 2
a/mm/slub.c | 1023 +++++++++-----
a/mm/vmalloc.c | 24
a/mm/workingset.c | 2
a/net/ncsi/ncsi-manage.c | 4
a/scripts/check_extable.sh | 2
a/scripts/checkpatch.pl | 93 -
a/tools/include/linux/bitmap.h | 4
a/tools/perf/bench/find-bit-bench.c | 2
a/tools/perf/builtin-c2c.c | 6
a/tools/perf/builtin-record.c | 2
a/tools/perf/tests/bitmap.c | 2
a/tools/perf/tests/mem2node.c | 2
a/tools/perf/util/affinity.c | 4
a/tools/perf/util/header.c | 4
a/tools/perf/util/metricgroup.c | 2
a/tools/perf/util/mmap.c | 4
a/tools/testing/selftests/damon/Makefile | 7
a/tools/testing/selftests/damon/_chk_dependency.sh | 28
a/tools/testing/selftests/damon/debugfs_attrs.sh | 75 +
a/tools/testing/selftests/kvm/dirty_log_perf_test.c | 2
a/tools/testing/selftests/kvm/dirty_log_test.c | 4
a/tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c | 2
a/tools/testing/selftests/memfd/memfd_test.c | 2
b/MAINTAINERS | 2
b/tools/include/asm-generic/bitops.h | 1
b/tools/include/linux/bitmap.h | 7
b/tools/include/linux/find.h | 81 +
b/tools/lib/find_bit.c | 20
227 files changed, 6695 insertions(+), 1875 deletions(-)
* Re: incoming
2021-09-02 21:48 incoming Andrew Morton
@ 2021-09-02 21:49 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-09-02 21:49 UTC
To: Linus Torvalds, linux-mm, mm-commits
On Thu, 2 Sep 2021 14:48:20 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> 212 patches, based on 4a3bb4200a5958d76cc26ebe4db4257efa56812b.
Make that "based on 7d2a07b769330c34b4deabeed939325c77a7ec2f".
* incoming
@ 2021-09-02 21:48 Andrew Morton
2021-09-02 21:49 ` incoming Andrew Morton
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2021-09-02 21:48 UTC
To: Linus Torvalds; +Cc: linux-mm, mm-commits
212 patches, based on 4a3bb4200a5958d76cc26ebe4db4257efa56812b.
Subsystems affected by this patch series:
ia64
ocfs2
block
mm/slub
mm/debug
mm/pagecache
mm/gup
mm/swap
mm/shmem
mm/memcg
mm/selftests
mm/pagemap
mm/mremap
mm/bootmem
mm/sparsemem
mm/vmalloc
mm/kasan
mm/pagealloc
mm/memory-failure
mm/hugetlb
mm/userfaultfd
mm/vmscan
mm/compaction
mm/mempolicy
mm/memblock
mm/oom-kill
mm/migration
mm/ksm
mm/percpu
mm/vmstat
mm/madvise
Subsystem: ia64
Jason Wang <wangborong@cdjrlc.com>:
ia64: fix typo in a comment
Geert Uytterhoeven <geert+renesas@glider.be>:
Patch series "ia64: Miscellaneous fixes and cleanups":
ia64: fix #endif comment for reserve_elfcorehdr()
ia64: make reserve_elfcorehdr() static
ia64: make num_rsvd_regions static
Subsystem: ocfs2
Dan Carpenter <dan.carpenter@oracle.com>:
ocfs2: remove an unnecessary condition
Tuo Li <islituo@gmail.com>:
ocfs2: quota_local: fix possible uninitialized-variable access in ocfs2_local_read_info()
Gang He <ghe@suse.com>:
ocfs2: ocfs2_downconvert_lock failure results in deadlock
Subsystem: block
kernel test robot <lkp@intel.com>:
arch/csky/kernel/probes/kprobes.c: fix bugon.cocci warnings
Subsystem: mm/slub
Vlastimil Babka <vbabka@suse.cz>:
Patch series "SLUB: reduce irq disabled scope and make it RT compatible", v4:
mm, slub: don't call flush_all() from slab_debug_trace_open()
mm, slub: allocate private object map for debugfs listings
mm, slub: allocate private object map for validate_slab_cache()
mm, slub: don't disable irq for debug_check_no_locks_freed()
mm, slub: remove redundant unfreeze_partials() from put_cpu_partial()
mm, slub: unify cmpxchg_double_slab() and __cmpxchg_double_slab()
mm, slub: extract get_partial() from new_slab_objects()
mm, slub: dissolve new_slab_objects() into ___slab_alloc()
mm, slub: return slab page from get_partial() and set c->page afterwards
mm, slub: restructure new page checks in ___slab_alloc()
mm, slub: simplify kmem_cache_cpu and tid setup
mm, slub: move disabling/enabling irqs to ___slab_alloc()
mm, slub: do initial checks in ___slab_alloc() with irqs enabled
mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc()
mm, slub: restore irqs around calling new_slab()
mm, slub: validate slab from partial list or page allocator before making it cpu slab
mm, slub: check new pages with restored irqs
mm, slub: stop disabling irqs around get_partial()
mm, slub: move reset of c->page and freelist out of deactivate_slab()
mm, slub: make locking in deactivate_slab() irq-safe
mm, slub: call deactivate_slab() without disabling irqs
mm, slub: move irq control into unfreeze_partials()
mm, slub: discard slabs in unfreeze_partials() without irqs disabled
mm, slub: detach whole partial list at once in unfreeze_partials()
mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing
mm, slub: only disable irq with spin_lock in __unfreeze_partials()
mm, slub: don't disable irqs in slub_cpu_dead()
mm, slab: make flush_slab() possible to call with irqs enabled
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context
mm: slub: make object_map_lock a raw_spinlock_t
Vlastimil Babka <vbabka@suse.cz>:
mm, slub: optionally save/restore irqs in slab_[un]lock()/
mm, slub: make slab_lock() disable irqs with PREEMPT_RT
mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg
mm, slub: use migrate_disable() on PREEMPT_RT
mm, slub: convert kmem_cpu_slab protection to local_lock
Subsystem: mm/debug
Gavin Shan <gshan@redhat.com>:
Patch series "mm/debug_vm_pgtable: Enhancements", v6:
mm/debug_vm_pgtable: introduce struct pgtable_debug_args
mm/debug_vm_pgtable: use struct pgtable_debug_args in basic tests
mm/debug_vm_pgtable: use struct pgtable_debug_args in leaf and savewrite tests
mm/debug_vm_pgtable: use struct pgtable_debug_args in protnone and devmap tests
mm/debug_vm_pgtable: use struct pgtable_debug_args in soft_dirty and swap tests
mm/debug_vm_pgtable: use struct pgtable_debug_args in migration and thp tests
mm/debug_vm_pgtable: use struct pgtable_debug_args in PTE modifying tests
mm/debug_vm_pgtable: use struct pgtable_debug_args in PMD modifying tests
mm/debug_vm_pgtable: use struct pgtable_debug_args in PUD modifying tests
mm/debug_vm_pgtable: use struct pgtable_debug_args in PGD and P4D modifying tests
mm/debug_vm_pgtable: remove unused code
mm/debug_vm_pgtable: fix corrupted page flag
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: report a more useful address for reclaim acquisition
liuhailong <liuhailong@oppo.com>:
mm: add kernel_misc_reclaimable in show_free_areas
Subsystem: mm/pagecache
Jan Kara <jack@suse.cz>:
Patch series "writeback: Fix bandwidth estimates", v4:
writeback: track number of inodes under writeback
writeback: reliably update bandwidth estimation
writeback: fix bandwidth estimate for spiky workload
writeback: rename domain_update_bandwidth()
writeback: use READ_ONCE for unlocked reads of writeback stats
Johannes Weiner <hannes@cmpxchg.org>:
mm: remove irqsave/restore locking from contexts with irqs enabled
fs: drop_caches: fix skipping over shadow cache inodes
fs: inode: count invalidated shadow pages in pginodesteal
Shakeel Butt <shakeelb@google.com>:
writeback: memcg: simplify cgroup_writeback_by_id
Jing Yangyang <jing.yangyang@zte.com.cn>:
include/linux/buffer_head.h: fix boolreturn.cocci warnings
Subsystem: mm/gup
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Cleanups and fixup for gup":
mm: gup: remove set but unused local variable major
mm: gup: remove unneed local variable orig_refs
mm: gup: remove useless BUG_ON in __get_user_pages()
mm: gup: fix potential pgmap refcnt leak in __gup_device_huge()
mm: gup: use helper PAGE_ALIGNED in populate_vma_page_range()
John Hubbard <jhubbard@nvidia.com>:
Patch series "A few gup refactorings and documentation updates", v3:
mm/gup: documentation corrections for gup/pup
mm/gup: small refactoring: simplify try_grab_page()
mm/gup: remove try_get_page(), call try_get_compound_head() directly
Subsystem: mm/swap
Hugh Dickins <hughd@google.com>:
fs, mm: fix race in unlinking swapfile
John Hubbard <jhubbard@nvidia.com>:
mm: delete unused get_kernel_page()
Subsystem: mm/shmem
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
shmem: use raw_spinlock_t for ->stat_lock
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Cleanups for shmem":
shmem: remove unneeded variable ret
shmem: remove unneeded header file
shmem: remove unneeded function forward declaration
shmem: include header file to declare swap_info
Hugh Dickins <hughd@google.com>:
Patch series "huge tmpfs: shmem_is_huge() fixes and cleanups":
huge tmpfs: fix fallocate(vanilla) advance over huge pages
huge tmpfs: fix split_huge_page() after FALLOC_FL_KEEP_SIZE
huge tmpfs: remove shrinklist addition from shmem_setattr()
huge tmpfs: revert shmem's use of transhuge_vma_enabled()
huge tmpfs: move shmem_huge_enabled() upwards
huge tmpfs: SGP_NOALLOC to stop collapse_file() on race
huge tmpfs: shmem_is_huge(vma, inode, index)
huge tmpfs: decide stat.st_blksize by shmem_is_huge()
shmem: shmem_writepage() split unlikely i915 THP
Subsystem: mm/memcg
Suren Baghdasaryan <surenb@google.com>:
mm, memcg: add mem_cgroup_disabled checks in vmpressure and swap-related functions
mm, memcg: inline mem_cgroup_{charge/uncharge} to improve disabled memcg config
mm, memcg: inline swap-related functions to improve disabled memcg config
Vasily Averin <vvs@virtuozzo.com>:
memcg: enable accounting for pids in nested pid namespaces
Shakeel Butt <shakeelb@google.com>:
memcg: switch lruvec stats to rstat
memcg: infrastructure to flush memcg stats
Yutian Yang <nglaive@gmail.com>:
memcg: charge fs_context and legacy_fs_context
Vasily Averin <vvs@virtuozzo.com>:
Patch series "memcg accounting from OpenVZ", v7:
memcg: enable accounting for mnt_cache entries
memcg: enable accounting for pollfd and select bits arrays
memcg: enable accounting for file lock caches
memcg: enable accounting for fasync_cache
memcg: enable accounting for new namesapces and struct nsproxy
memcg: enable accounting of ipc resources
memcg: enable accounting for signals
memcg: enable accounting for posix_timers_cache slab
memcg: enable accounting for ldt_struct objects
Shakeel Butt <shakeelb@google.com>:
memcg: cleanup racy sum avoidance code
Vasily Averin <vvs@virtuozzo.com>:
memcg: replace in_interrupt() by !in_task() in active_memcg()
Baolin Wang <baolin.wang@linux.alibaba.com>:
mm: memcontrol: set the correct memcg swappiness restriction
Miaohe Lin <linmiaohe@huawei.com>:
mm, memcg: remove unused functions
mm, memcg: save some atomic ops when flush is already true
Michal Hocko <mhocko@suse.com>:
memcg: fix up drain_local_stock comment
Shakeel Butt <shakeelb@google.com>:
memcg: make memcg->event_list_lock irqsafe
Subsystem: mm/selftests
Po-Hsu Lin <po-hsu.lin@canonical.com>:
selftests/vm: use kselftest skip code for skipped tests
Colin Ian King <colin.king@canonical.com>:
selftests: Fix spelling mistake "cann't" -> "cannot"
Subsystem: mm/pagemap
Nicholas Piggin <npiggin@gmail.com>:
Patch series "shoot lazy tlbs", v4:
lazy tlb: introduce lazy mm refcount helper functions
lazy tlb: allow lazy tlb mm refcounting to be configurable
lazy tlb: shoot lazies, a non-refcounting lazy tlb option
powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN
Christoph Hellwig <hch@lst.de>:
Patch series "_kernel_dcache_page fixes and removal":
mmc: JZ4740: remove the flush_kernel_dcache_page call in jz4740_mmc_read_data
mmc: mmc_spi: replace flush_kernel_dcache_page with flush_dcache_page
scatterlist: replace flush_kernel_dcache_page with flush_dcache_page
mm: remove flush_kernel_dcache_page
Huang Ying <ying.huang@intel.com>:
mm,do_huge_pmd_numa_page: remove unnecessary TLB flushing code
Greg Kroah-Hartman <gregkh@linuxfoundation.org>:
mm: change fault_in_pages_* to have an unsigned size parameter
Luigi Rizzo <lrizzo@google.com>:
mm/pagemap: add mmap_assert_locked() annotations to find_vma*()
"Liam R. Howlett" <Liam.Howlett@Oracle.com>:
remap_file_pages: Use vma_lookup() instead of find_vma()
Subsystem: mm/mremap
Chen Wandun <chenwandun@huawei.com>:
mm/mremap: fix memory account on do_munmap() failure
Subsystem: mm/bootmem
Muchun Song <songmuchun@bytedance.com>:
mm/bootmem_info.c: mark __init on register_page_bootmem_info_section
Subsystem: mm/sparsemem
Ohhoon Kwon <ohoono.kwon@samsung.com>:
Patch series "mm: sparse: remove __section_nr() function", v4:
mm: sparse: pass section_nr to section_mark_present
mm: sparse: pass section_nr to find_memory_block
mm: sparse: remove __section_nr() function
Naoya Horiguchi <naoya.horiguchi@nec.com>:
mm/sparse: set SECTION_NID_SHIFT to 6
Matthew Wilcox <willy@infradead.org>:
include/linux/mmzone.h: avoid a warning in sparse memory support
Miles Chen <miles.chen@mediatek.com>:
mm/sparse: clarify pgdat_to_phys
Subsystem: mm/vmalloc
"Uladzislau Rezki (Sony)" <urezki@gmail.com>:
mm/vmalloc: use batched page requests in bulk-allocator
mm/vmalloc: remove gfpflags_allow_blocking() check
lib/test_vmalloc.c: add a new 'nr_pages' parameter
Chen Wandun <chenwandun@huawei.com>:
mm/vmalloc: fix wrong behavior in vread
Subsystem: mm/kasan
Woody Lin <woodylin@google.com>:
mm/kasan: move kasan.fault to mm/kasan/report.c
Andrey Konovalov <andreyknvl@gmail.com>:
Patch series "kasan: test: avoid crashing the kernel with HW_TAGS", v2:
kasan: test: rework kmalloc_oob_right
kasan: test: avoid writing invalid memory
kasan: test: avoid corrupting memory via memset
kasan: test: disable kmalloc_memmove_invalid_size for HW_TAGS
kasan: test: only do kmalloc_uaf_memset for generic mode
kasan: test: clean up ksize_uaf
kasan: test: avoid corrupting memory in copy_user_test
kasan: test: avoid corrupting memory in kasan_rcu_uaf
Subsystem: mm/pagealloc
Mike Rapoport <rppt@linux.ibm.com>:
Patch series "mm: ensure consistency of memory map poisoning":
mm/page_alloc: always initialize memory map for the holes
microblaze: simplify pte_alloc_one_kernel()
mm: introduce memmap_alloc() to unify memory map allocation
memblock: stop poisoning raw allocations
Nico Pache <npache@redhat.com>:
mm/page_alloc.c: fix 'zone_id' may be used uninitialized in this function warning
Mike Rapoport <rppt@linux.ibm.com>:
mm/page_alloc: make alloc_node_mem_map() __init rather than __ref
Vasily Averin <vvs@virtuozzo.com>:
mm/page_alloc.c: use in_task()
"George G. Davis" <davis.george@siemens.com>:
mm/page_isolation: tracing: trace all test_pages_isolated failures
Subsystem: mm/memory-failure
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Cleanups and fixup for hwpoison":
mm/hwpoison: remove unneeded variable unmap_success
mm/hwpoison: fix potential pte_unmap_unlock pte error
mm/hwpoison: change argument struct page **hpagep to *hpage
mm/hwpoison: fix some obsolete comments
Yang Shi <shy828301@gmail.com>:
mm: hwpoison: don't drop slab caches for offlining non-LRU page
doc: hwpoison: correct the support for hugepage
mm: hwpoison: dump page for unhandlable page
Michael Wang <yun.wang@linux.alibaba.com>:
mm: fix panic caused by __page_handle_poison()
Subsystem: mm/hugetlb
Mike Kravetz <mike.kravetz@oracle.com>:
hugetlb: simplify prep_compound_gigantic_page ref count racing code
hugetlb: drop ref count earlier after page allocation
hugetlb: before freeing hugetlb page set dtor to appropriate value
hugetlb: fix hugetlb cgroup refcounting during vma split
Subsystem: mm/userfaultfd
Nadav Amit <namit@vmware.com>:
Patch series "userfaultfd: minor bug fixes":
userfaultfd: change mmap_changing to atomic
userfaultfd: prevent concurrent API initialization
selftests/vm/userfaultfd: wake after copy failure
Subsystem: mm/vmscan
Dave Hansen <dave.hansen@linux.intel.com>:
Patch series "Migrate Pages in lieu of discard", v11:
mm/numa: automatically generate node migration order
mm/migrate: update node demotion order on hotplug events
Yang Shi <yang.shi@linux.alibaba.com>:
mm/migrate: enable returning precise migrate_pages() success count
Dave Hansen <dave.hansen@linux.intel.com>:
mm/migrate: demote pages during reclaim
Yang Shi <yang.shi@linux.alibaba.com>:
mm/vmscan: add page demotion counter
Dave Hansen <dave.hansen@linux.intel.com>:
mm/vmscan: add helper for querying ability to age anonymous pages
Keith Busch <kbusch@kernel.org>:
mm/vmscan: Consider anonymous pages without swap
Dave Hansen <dave.hansen@linux.intel.com>:
mm/vmscan: never demote for memcg reclaim
Huang Ying <ying.huang@intel.com>:
mm/migrate: add sysfs interface to enable reclaim migration
Hui Su <suhui@zeku.com>:
mm/vmpressure: replace vmpressure_to_css() with vmpressure_to_memcg()
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Cleanups for vmscan", v2:
mm/vmscan: remove the PageDirty check after MADV_FREE pages are page_ref_freezed
mm/vmscan: remove misleading setting to sc->priority
mm/vmscan: remove unneeded return value of kswapd_run()
mm/vmscan: add 'else' to remove check_pending label
Vlastimil Babka <vbabka@suse.cz>:
mm, vmscan: guarantee drop_slab_node() termination
Subsystem: mm/compaction
Charan Teja Reddy <charante@codeaurora.org>:
mm: compaction: optimize proactive compaction deferrals
mm: compaction: support triggering of proactive compaction by user
Subsystem: mm/mempolicy
Baolin Wang <baolin.wang@linux.alibaba.com>:
mm/mempolicy: use readable NUMA_NO_NODE macro instead of magic number
Dave Hansen <dave.hansen@linux.intel.com>:
Patch series "Introduce multi-preference mempolicy", v7:
mm/mempolicy: add MPOL_PREFERRED_MANY for multiple preferred nodes
Feng Tang <feng.tang@intel.com>:
mm/memplicy: add page allocation function for MPOL_PREFERRED_MANY policy
Ben Widawsky <ben.widawsky@intel.com>:
mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
mm/mempolicy: advertise new MPOL_PREFERRED_MANY
Feng Tang <feng.tang@intel.com>:
mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
Vasily Averin <vvs@virtuozzo.com>:
mm/mempolicy.c: use in_task() in mempolicy_slab_node()
Subsystem: mm/memblock
Mike Rapoport <rppt@linux.ibm.com>:
memblock: make memblock_find_in_range method private
Subsystem: mm/oom-kill
Suren Baghdasaryan <surenb@google.com>:
mm: introduce process_mrelease system call
mm: wire up syscall process_mrelease
Subsystem: mm/migration
Randy Dunlap <rdunlap@infradead.org>:
mm/migrate: correct kernel-doc notation
Subsystem: mm/ksm
Zhansaya Bagdauletkyzy <zhansayabagdaulet@gmail.com>:
Patch series "add KSM selftests":
selftests: vm: add KSM merge test
selftests: vm: add KSM unmerge test
selftests: vm: add KSM zero page merging test
selftests: vm: add KSM merging across nodes test
mm: KSM: fix data type
Patch series "add KSM performance tests", v3:
selftests: vm: add KSM merging time test
selftests: vm: add COW time test for KSM pages
Subsystem: mm/percpu
Jing Xiangfeng <jingxiangfeng@huawei.com>:
mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
Subsystem: mm/vmstat
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Cleanup for vmstat":
mm/vmstat: correct some wrong comments
mm/vmstat: simplify the array size calculation
mm/vmstat: remove unneeded return value
Subsystem: mm/madvise
zhangkui <zhangkui@oppo.com>:
mm/madvise: add MADV_WILLNEED to process_madvise()
Documentation/ABI/testing/sysfs-kernel-mm-numa | 24
Documentation/admin-guide/mm/numa_memory_policy.rst | 15
Documentation/admin-guide/sysctl/vm.rst | 3
Documentation/core-api/cachetlb.rst | 86 -
Documentation/dev-tools/kasan.rst | 13
Documentation/translations/zh_CN/core-api/cachetlb.rst | 9
Documentation/vm/hwpoison.rst | 1
arch/Kconfig | 28
arch/alpha/kernel/syscalls/syscall.tbl | 2
arch/arm/include/asm/cacheflush.h | 4
arch/arm/kernel/setup.c | 20
arch/arm/mach-rpc/ecard.c | 2
arch/arm/mm/flush.c | 33
arch/arm/mm/nommu.c | 6
arch/arm/tools/syscall.tbl | 2
arch/arm64/include/asm/unistd.h | 2
arch/arm64/include/asm/unistd32.h | 2
arch/arm64/kvm/hyp/reserved_mem.c | 9
arch/arm64/mm/init.c | 38
arch/csky/abiv1/cacheflush.c | 11
arch/csky/abiv1/inc/abi/cacheflush.h | 4
arch/csky/kernel/probes/kprobes.c | 3
arch/ia64/include/asm/meminit.h | 2
arch/ia64/kernel/acpi.c | 2
arch/ia64/kernel/setup.c | 55
arch/ia64/kernel/syscalls/syscall.tbl | 2
arch/m68k/kernel/syscalls/syscall.tbl | 2
arch/microblaze/include/asm/page.h | 3
arch/microblaze/include/asm/pgtable.h | 2
arch/microblaze/kernel/syscalls/syscall.tbl | 2
arch/microblaze/mm/init.c | 12
arch/microblaze/mm/pgtable.c | 17
arch/mips/include/asm/cacheflush.h | 8
arch/mips/kernel/setup.c | 14
arch/mips/kernel/syscalls/syscall_n32.tbl | 2
arch/mips/kernel/syscalls/syscall_n64.tbl | 2
arch/mips/kernel/syscalls/syscall_o32.tbl | 2
arch/nds32/include/asm/cacheflush.h | 3
arch/nds32/mm/cacheflush.c | 9
arch/parisc/include/asm/cacheflush.h | 8
arch/parisc/kernel/cache.c | 3
arch/parisc/kernel/syscalls/syscall.tbl | 2
arch/powerpc/Kconfig | 1
arch/powerpc/kernel/smp.c | 2
arch/powerpc/kernel/syscalls/syscall.tbl | 2
arch/powerpc/mm/book3s64/radix_tlb.c | 4
arch/powerpc/platforms/pseries/hotplug-memory.c | 4
arch/riscv/mm/init.c | 44
arch/s390/kernel/setup.c | 9
arch/s390/kernel/syscalls/syscall.tbl | 2
arch/s390/mm/fault.c | 2
arch/sh/include/asm/cacheflush.h | 8
arch/sh/kernel/syscalls/syscall.tbl | 2
arch/sparc/kernel/syscalls/syscall.tbl | 2
arch/x86/entry/syscalls/syscall_32.tbl | 1
arch/x86/entry/syscalls/syscall_64.tbl | 1
arch/x86/kernel/aperture_64.c | 5
arch/x86/kernel/ldt.c | 6
arch/x86/mm/init.c | 23
arch/x86/mm/numa.c | 5
arch/x86/mm/numa_emulation.c | 5
arch/x86/realmode/init.c | 2
arch/xtensa/kernel/syscalls/syscall.tbl | 2
block/blk-map.c | 2
drivers/acpi/tables.c | 5
drivers/base/arch_numa.c | 5
drivers/base/memory.c | 4
drivers/mmc/host/jz4740_mmc.c | 4
drivers/mmc/host/mmc_spi.c | 2
drivers/of/of_reserved_mem.c | 12
fs/drop_caches.c | 3
fs/exec.c | 12
fs/fcntl.c | 3
fs/fs-writeback.c | 28
fs/fs_context.c | 4
fs/inode.c | 2
fs/locks.c | 6
fs/namei.c | 8
fs/namespace.c | 7
fs/ocfs2/dlmglue.c | 14
fs/ocfs2/quota_global.c | 1
fs/ocfs2/quota_local.c | 2
fs/pipe.c | 2
fs/select.c | 4
fs/userfaultfd.c | 116 -
include/linux/backing-dev-defs.h | 2
include/linux/backing-dev.h | 19
include/linux/buffer_head.h | 2
include/linux/compaction.h | 2
include/linux/highmem.h | 5
include/linux/hugetlb_cgroup.h | 12
include/linux/memblock.h | 2
include/linux/memcontrol.h | 118 +
include/linux/memory.h | 2
include/linux/mempolicy.h | 16
include/linux/migrate.h | 14
include/linux/mm.h | 17
include/linux/mmzone.h | 4
include/linux/page-flags.h | 9
include/linux/pagemap.h | 4
include/linux/sched/mm.h | 35
include/linux/shmem_fs.h | 25
include/linux/slub_def.h | 6
include/linux/swap.h | 28
include/linux/syscalls.h | 1
include/linux/userfaultfd_k.h | 8
include/linux/vm_event_item.h | 2
include/linux/vmpressure.h | 2
include/linux/writeback.h | 4
include/trace/events/migrate.h | 3
include/uapi/asm-generic/unistd.h | 4
include/uapi/linux/mempolicy.h | 1
ipc/msg.c | 2
ipc/namespace.c | 2
ipc/sem.c | 9
ipc/shm.c | 2
kernel/cgroup/namespace.c | 2
kernel/cpu.c | 2
kernel/exit.c | 2
kernel/fork.c | 51
kernel/kthread.c | 21
kernel/nsproxy.c | 2
kernel/pid_namespace.c | 5
kernel/sched/core.c | 37
kernel/sched/sched.h | 4
kernel/signal.c | 2
kernel/sys_ni.c | 1
kernel/sysctl.c | 2
kernel/time/namespace.c | 4
kernel/time/posix-timers.c | 4
kernel/user_namespace.c | 2
lib/scatterlist.c | 5
lib/test_kasan.c | 80 -
lib/test_kasan_module.c | 20
lib/test_vmalloc.c | 5
mm/backing-dev.c | 11
mm/bootmem_info.c | 4
mm/compaction.c | 69 -
mm/debug_vm_pgtable.c | 982 +++++++++------
mm/filemap.c | 15
mm/gup.c | 109 -
mm/huge_memory.c | 32
mm/hugetlb.c | 173 ++
mm/hwpoison-inject.c | 2
mm/internal.h | 9
mm/kasan/hw_tags.c | 43
mm/kasan/kasan.h | 1
mm/kasan/report.c | 29
mm/khugepaged.c | 2
mm/ksm.c | 8
mm/madvise.c | 1
mm/memblock.c | 22
mm/memcontrol.c | 234 +--
mm/memory-failure.c | 53
mm/memory_hotplug.c | 2
mm/mempolicy.c | 207 ++-
mm/migrate.c | 319 ++++
mm/mmap.c | 7
mm/mremap.c | 2
mm/oom_kill.c | 70 +
mm/page-writeback.c | 133 +-
mm/page_alloc.c | 62
mm/page_isolation.c | 13
mm/percpu.c | 3
mm/shmem.c | 309 ++--
mm/slab_common.c | 2
mm/slub.c | 1085 ++++++++++-------
mm/sparse.c | 46
mm/swap.c | 22
mm/swapfile.c | 14
mm/truncate.c | 28
mm/userfaultfd.c | 15
mm/vmalloc.c | 79 -
mm/vmpressure.c | 10
mm/vmscan.c | 220 ++-
mm/vmstat.c | 25
security/tomoyo/domain.c | 13
tools/testing/scatterlist/linux/mm.h | 1
tools/testing/selftests/vm/.gitignore | 1
tools/testing/selftests/vm/Makefile | 3
tools/testing/selftests/vm/charge_reserved_hugetlb.sh | 5
tools/testing/selftests/vm/hugetlb_reparenting_test.sh | 5
tools/testing/selftests/vm/ksm_tests.c | 696 ++++++++++
tools/testing/selftests/vm/mlock-random-test.c | 2
tools/testing/selftests/vm/run_vmtests.sh | 98 +
tools/testing/selftests/vm/userfaultfd.c | 13
186 files changed, 4488 insertions(+), 2281 deletions(-)
* incoming
@ 2021-08-25 19:17 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-08-25 19:17 UTC
To: Linus Torvalds; +Cc: mm-commits, linux-mm
2 patches, based on 6e764bcd1cf72a2846c0e53d3975a09b242c04c9.
Subsystems affected by this patch series:
mm/memory-hotplug
MAINTAINERS
Subsystem: mm/memory-hotplug
Miaohe Lin <linmiaohe@huawei.com>:
mm/memory_hotplug: fix potential permanent lru cache disable
Subsystem: MAINTAINERS
Namjae Jeon <namjae.jeon@samsung.com>:
MAINTAINERS: exfat: update my email address
MAINTAINERS         | 2 +-
mm/memory_hotplug.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
* incoming
@ 2021-08-20 2:03 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-08-20 2:03 UTC
To: Linus Torvalds; +Cc: linux-mm, mm-commits
10 patches, based on 614cb2751d3150850d459bee596c397f344a7936.
Subsystems affected by this patch series:
mm/shmem
mm/pagealloc
mm/tracing
MAINTAINERS
mm/memcg
mm/memory-failure
mm/vmscan
mm/kfence
mm/hugetlb
Subsystem: mm/shmem
Yang Shi <shy828301@gmail.com>:
Revert "mm/shmem: fix shmem_swapin() race with swapoff"
Revert "mm: swap: check if swap backing device is congested or not"
Subsystem: mm/pagealloc
Doug Berger <opendmb@gmail.com>:
mm/page_alloc: don't corrupt pcppage_migratetype
Subsystem: mm/tracing
Mike Rapoport <rppt@linux.ibm.com>:
mmflags.h: add missing __GFP_ZEROTAGS and __GFP_SKIP_KASAN_POISON names
Subsystem: MAINTAINERS
Nathan Chancellor <nathan@kernel.org>:
MAINTAINERS: update ClangBuiltLinux IRC chat
Subsystem: mm/memcg
Johannes Weiner <hannes@cmpxchg.org>:
mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
Subsystem: mm/memory-failure
Naoya Horiguchi <naoya.horiguchi@nec.com>:
mm/hwpoison: retry with shake_page() for unhandlable pages
Subsystem: mm/vmscan
Johannes Weiner <hannes@cmpxchg.org>:
mm: vmscan: fix missing psi annotation for node_reclaim()
Subsystem: mm/kfence
Marco Elver <elver@google.com>:
kfence: fix is_kfence_address() for addresses below KFENCE_POOL_SIZE
Subsystem: mm/hugetlb
Mike Kravetz <mike.kravetz@oracle.com>:
hugetlb: don't pass page cache pages to restore_reserve_on_error
MAINTAINERS                    |  2 +-
include/linux/kfence.h         |  7 ++++---
include/linux/memcontrol.h     | 29 +++++++++++++++--------------
include/trace/events/mmflags.h |  4 +++-
mm/hugetlb.c                   | 19 ++++++++++++++-----
mm/memory-failure.c            | 12 +++++++++---
mm/page_alloc.c                | 25 ++++++++++++-------------
mm/shmem.c                     | 14 +-------------
mm/swap_state.c                |  7 -------
mm/vmscan.c                    | 30 ++++++++++++++++++++++--------
10 files changed, 81 insertions(+), 68 deletions(-)
* incoming
@ 2021-08-13 23:53 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-08-13 23:53 UTC
To: Linus Torvalds; +Cc: mm-commits, linux-mm
7 patches, based on f8e6dfc64f6135d1b6c5215c14cd30b9b60a0008.
Subsystems affected by this patch series:
mm/kasan
mm/slub
mm/madvise
mm/memcg
lib
Subsystem: mm/kasan
Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>:
Patch series "kasan, slub: reset tag when printing address", v3:
kasan, kmemleak: reset tags when scanning block
kasan, slub: reset tag when printing address
Subsystem: mm/slub
Shakeel Butt <shakeelb@google.com>:
slub: fix kmalloc_pagealloc_invalid_free unit test
Vlastimil Babka <vbabka@suse.cz>:
mm: slub: fix slub_debug disabling for list of slabs
Subsystem: mm/madvise
David Hildenbrand <david@redhat.com>:
mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE)
Subsystem: mm/memcg
Waiman Long <longman@redhat.com>:
mm/memcg: fix incorrect flushing of lruvec data in obj_stock
Subsystem: lib
Liang Wang <wangliang101@huawei.com>:
lib: use PFN_PHYS() in devmem_is_allowed()
lib/devmem_is_allowed.c | 2 +-
mm/gup.c | 7 +++++--
mm/kmemleak.c | 6 +++---
mm/madvise.c | 4 +++-
mm/memcontrol.c | 6 ++++--
mm/slub.c | 25 ++++++++++++++-----------
6 files changed, 30 insertions(+), 20 deletions(-)
* incoming
@ 2021-07-29 21:52 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-07-29 21:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
7 patches, based on 7e96bf476270aecea66740a083e51b38c1371cd2.
Subsystems affected by this patch series:
lib
ocfs2
mm/memcg
mm/migration
mm/slub
mm/memcg
Subsystem: lib
Matteo Croce <mcroce@microsoft.com>:
lib/test_string.c: move string selftest in the Runtime Testing menu
Subsystem: ocfs2
Junxiao Bi <junxiao.bi@oracle.com>:
ocfs2: fix zero out valid data
ocfs2: issue zeroout to EOF blocks
Subsystem: mm/memcg
Johannes Weiner <hannes@cmpxchg.org>:
mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code
Subsystem: mm/migration
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
mm/migrate: fix NR_ISOLATED corruption on 64-bit
Subsystem: mm/slub
Shakeel Butt <shakeelb@google.com>:
slub: fix unreclaimable slab stat for bulk free
Subsystem: mm/memcg
Wang Hai <wanghai38@huawei.com>:
mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()
fs/ocfs2/file.c | 103 ++++++++++++++++++++++++++++++++----------------------
lib/Kconfig | 3 -
lib/Kconfig.debug | 3 +
mm/memcontrol.c | 3 +
mm/migrate.c | 2 -
mm/slab.h | 2 -
mm/slub.c | 22 ++++++-----
7 files changed, 81 insertions(+), 57 deletions(-)
* incoming
@ 2021-07-23 22:49 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-07-23 22:49 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
15 patches, based on 704f4cba43d4ed31ef4beb422313f1263d87bc55.
Subsystems affected by this patch series:
mm/userfaultfd
mm/kfence
mm/highmem
mm/pagealloc
mm/memblock
mm/pagecache
mm/secretmem
mm/pagemap
mm/hugetlbfs
Subsystem: mm/userfaultfd
Peter Collingbourne <pcc@google.com>:
Patch series "userfaultfd: do not untag user pointers", v5:
userfaultfd: do not untag user pointers
selftest: use mmap instead of posix_memalign to allocate memory
Subsystem: mm/kfence
Weizhao Ouyang <o451686892@gmail.com>:
kfence: defer kfence_test_init to ensure that kunit debugfs is created
Alexander Potapenko <glider@google.com>:
kfence: move the size check to the beginning of __kfence_alloc()
kfence: skip all GFP_ZONEMASK allocations
Subsystem: mm/highmem
Christoph Hellwig <hch@lst.de>:
mm: call flush_dcache_page() in memcpy_to_page() and memzero_page()
mm: use kmap_local_page in memzero_page
Subsystem: mm/pagealloc
Sergei Trofimovich <slyfox@gentoo.org>:
mm: page_alloc: fix page_poison=1 / INIT_ON_ALLOC_DEFAULT_ON interaction
Subsystem: mm/memblock
Mike Rapoport <rppt@linux.ibm.com>:
memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions
Subsystem: mm/pagecache
Roman Gushchin <guro@fb.com>:
writeback, cgroup: remove wb from offline list before releasing refcnt
writeback, cgroup: do not reparent dax inodes
Subsystem: mm/secretmem
Mike Rapoport <rppt@linux.ibm.com>:
mm/secretmem: wire up ->set_page_dirty
Subsystem: mm/pagemap
Muchun Song <songmuchun@bytedance.com>:
mm: mmap_lock: fix disabling preemption directly
Qi Zheng <zhengqi.arch@bytedance.com>:
mm: fix the deadlock in finish_fault()
Subsystem: mm/hugetlbfs
Mike Kravetz <mike.kravetz@oracle.com>:
hugetlbfs: fix mount mode command line processing
Documentation/arm64/tagged-address-abi.rst | 26 ++++++++++++++++++--------
fs/fs-writeback.c | 3 +++
fs/hugetlbfs/inode.c | 2 +-
fs/userfaultfd.c | 26 ++++++++++++--------------
include/linux/highmem.h | 6 ++++--
include/linux/memblock.h | 4 ++--
mm/backing-dev.c | 2 +-
mm/kfence/core.c | 19 ++++++++++++++++---
mm/kfence/kfence_test.c | 2 +-
mm/memblock.c | 3 ++-
mm/memory.c | 11 ++++++++++-
mm/mmap_lock.c | 4 ++--
mm/page_alloc.c | 29 ++++++++++++++++-------------
mm/secretmem.c | 1 +
tools/testing/selftests/vm/userfaultfd.c | 6 ++++--
15 files changed, 93 insertions(+), 51 deletions(-)
* incoming
@ 2021-07-15 4:26 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-07-15 4:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
13 patches, based on 40226a3d96ef8ab8980f032681c8bfd46d63874e.
Subsystems affected by this patch series:
mm/kasan
mm/pagealloc
mm/rmap
mm/hmm
hfs
mm/hugetlb
Subsystem: mm/kasan
Marco Elver <elver@google.com>:
mm: move helper to check slub_debug_enabled
Yee Lee <yee.lee@mediatek.com>:
kasan: add memzero init for unaligned size at DEBUG
Marco Elver <elver@google.com>:
kasan: fix build by including kernel.h
Subsystem: mm/pagealloc
Matteo Croce <mcroce@microsoft.com>:
Revert "mm/page_alloc: make should_fail_alloc_page() static"
Mel Gorman <mgorman@techsingularity.net>:
mm/page_alloc: avoid page allocator recursion with pagesets.lock held
Yanfei Xu <yanfei.xu@windriver.com>:
mm/page_alloc: correct return value when failing at preparing
Chuck Lever <chuck.lever@oracle.com>:
mm/page_alloc: further fix __alloc_pages_bulk() return value
Subsystem: mm/rmap
Christoph Hellwig <hch@lst.de>:
mm: fix the try_to_unmap prototype for !CONFIG_MMU
Subsystem: mm/hmm
Alistair Popple <apopple@nvidia.com>:
lib/test_hmm: remove set but unused page variable
Subsystem: hfs
Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>:
Patch series "hfs: fix various errors", v2:
hfs: add missing clean-up in hfs_fill_super
hfs: fix high memory mapping in hfs_bnode_read
hfs: add lock nesting notation to hfs_find_init
Subsystem: mm/hugetlb
Joao Martins <joao.m.martins@oracle.com>:
mm/hugetlb: fix refs calculation from unaligned @vaddr
fs/hfs/bfind.c | 14 +++++++++++++-
fs/hfs/bnode.c | 25 ++++++++++++++++++++-----
fs/hfs/btree.h | 7 +++++++
fs/hfs/super.c | 10 +++++-----
include/linux/kasan.h | 1 +
include/linux/rmap.h | 4 +++-
lib/test_hmm.c | 2 --
mm/hugetlb.c | 5 +++--
mm/kasan/kasan.h | 12 ++++++++++++
mm/page_alloc.c | 30 ++++++++++++++++++++++--------
mm/slab.h | 15 +++++++++++----
mm/slub.c | 14 --------------
12 files changed, 97 insertions(+), 42 deletions(-)
* incoming
@ 2021-07-08 0:59 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-07-08 0:59 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
54 patches, based on a931dd33d370896a683236bba67c0d6f3d01144d.
Subsystems affected by this patch series:
lib
mm/slub
mm/secretmem
mm/cleanups
mm/init
debug
mm/pagemap
mm/mremap
Subsystem: lib
Zhen Lei <thunder.leizhen@huawei.com>:
lib/test: fix spelling mistakes
lib: fix spelling mistakes
lib: fix spelling mistakes in header files
Subsystem: mm/slub
Nathan Chancellor <nathan@kernel.org>:
Patch series "hexagon: Fix build error with CONFIG_STACKDEPOT and select CONFIG_ARCH_WANT_LD_ORPHAN_WARN":
hexagon: handle {,SOFT}IRQENTRY_TEXT in linker script
hexagon: use common DISCARDS macro
hexagon: select ARCH_WANT_LD_ORPHAN_WARN
Oliver Glitta <glittao@gmail.com>:
mm/slub: use stackdepot to save stack trace in objects
Subsystem: mm/secretmem
Mike Rapoport <rppt@linux.ibm.com>:
Patch series "mm: introduce memfd_secret system call to create "secret" memory areas", v20:
mmap: make mlock_future_check() global
riscv/Kconfig: make direct map manipulation options depend on MMU
set_memory: allow querying whether set_direct_map_*() is actually enabled
mm: introduce memfd_secret system call to create "secret" memory areas
PM: hibernate: disable when there are active secretmem users
arch, mm: wire up memfd_secret system call where relevant
secretmem: test: add basic selftest for memfd_secret(2)
Subsystem: mm/cleanups
Zhen Lei <thunder.leizhen@huawei.com>:
mm: fix spelling mistakes in header files
Subsystem: mm/init
Kefeng Wang <wangkefeng.wang@huawei.com>:
Patch series "init_mm: cleanup ARCH's text/data/brk setup code", v3:
mm: add setup_initial_init_mm() helper
arc: convert to setup_initial_init_mm()
arm: convert to setup_initial_init_mm()
arm64: convert to setup_initial_init_mm()
csky: convert to setup_initial_init_mm()
h8300: convert to setup_initial_init_mm()
m68k: convert to setup_initial_init_mm()
nds32: convert to setup_initial_init_mm()
nios2: convert to setup_initial_init_mm()
openrisc: convert to setup_initial_init_mm()
powerpc: convert to setup_initial_init_mm()
riscv: convert to setup_initial_init_mm()
s390: convert to setup_initial_init_mm()
sh: convert to setup_initial_init_mm()
x86: convert to setup_initial_init_mm()
Subsystem: debug
Stephen Boyd <swboyd@chromium.org>:
Patch series "Add build ID to stacktraces", v6:
buildid: only consider GNU notes for build ID parsing
buildid: add API to parse build ID out of buffer
buildid: stash away kernels build ID on init
dump_stack: add vmlinux build ID to stack traces
module: add printk formats to add module build ID to stacktraces
arm64: stacktrace: use %pSb for backtrace printing
x86/dumpstack: use %pSb/%pBb for backtrace printing
scripts/decode_stacktrace.sh: support debuginfod
scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm
scripts/decode_stacktrace.sh: indicate 'auto' can be used for base path
buildid: mark some arguments const
buildid: fix kernel-doc notation
kdump: use vmlinux_build_id to simplify
Subsystem: mm/pagemap
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *
mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t *
Subsystem: mm/mremap
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
Patch series "mremap fixes", v2:
selftest/mremap_test: update the test to handle pagesize other than 4K
selftest/mremap_test: avoid crash with static build
mm/mremap: convert huge PUD move to separate helper
mm/mremap: don't enable optimized PUD move if page table levels is 2
mm/mremap: use pmd/pud_poplulate to update page table entries
mm/mremap: hold the rmap lock in write mode when moving page table entries.
Patch series "Speedup mremap on ppc64", v8:
mm/mremap: allow arch runtime override
powerpc/book3s64/mm: update flush_tlb_range to flush page walk cache
powerpc/mm: enable HAVE_MOVE_PMD support
Documentation/core-api/printk-formats.rst | 11
arch/alpha/include/asm/pgtable.h | 8
arch/arc/mm/init.c | 5
arch/arm/include/asm/pgtable-3level.h | 2
arch/arm/kernel/setup.c | 5
arch/arm64/include/asm/Kbuild | 1
arch/arm64/include/asm/cacheflush.h | 6
arch/arm64/include/asm/kfence.h | 2
arch/arm64/include/asm/pgtable.h | 8
arch/arm64/include/asm/set_memory.h | 17 +
arch/arm64/include/uapi/asm/unistd.h | 1
arch/arm64/kernel/machine_kexec.c | 1
arch/arm64/kernel/setup.c | 5
arch/arm64/kernel/stacktrace.c | 2
arch/arm64/mm/mmu.c | 7
arch/arm64/mm/pageattr.c | 13
arch/csky/kernel/setup.c | 5
arch/h8300/kernel/setup.c | 5
arch/hexagon/Kconfig | 1
arch/hexagon/kernel/vmlinux.lds.S | 9
arch/ia64/include/asm/pgtable.h | 4
arch/m68k/include/asm/motorola_pgtable.h | 2
arch/m68k/kernel/setup_mm.c | 5
arch/m68k/kernel/setup_no.c | 5
arch/mips/include/asm/pgtable-64.h | 8
arch/nds32/kernel/setup.c | 5
arch/nios2/kernel/setup.c | 5
arch/openrisc/kernel/setup.c | 5
arch/parisc/include/asm/pgtable.h | 4
arch/powerpc/include/asm/book3s/64/pgtable.h | 11
arch/powerpc/include/asm/book3s/64/tlbflush-radix.h | 2
arch/powerpc/include/asm/nohash/64/pgtable-4k.h | 6
arch/powerpc/include/asm/nohash/64/pgtable.h | 6
arch/powerpc/include/asm/tlb.h | 6
arch/powerpc/kernel/setup-common.c | 5
arch/powerpc/mm/book3s64/radix_hugetlbpage.c | 8
arch/powerpc/mm/book3s64/radix_pgtable.c | 6
arch/powerpc/mm/book3s64/radix_tlb.c | 44 +-
arch/powerpc/mm/pgtable_64.c | 4
arch/powerpc/platforms/Kconfig.cputype | 2
arch/riscv/Kconfig | 4
arch/riscv/include/asm/pgtable-64.h | 4
arch/riscv/include/asm/unistd.h | 1
arch/riscv/kernel/setup.c | 5
arch/s390/kernel/setup.c | 5
arch/sh/include/asm/pgtable-3level.h | 4
arch/sh/kernel/setup.c | 5
arch/sparc/include/asm/pgtable_32.h | 6
arch/sparc/include/asm/pgtable_64.h | 10
arch/um/include/asm/pgtable-3level.h | 2
arch/x86/entry/syscalls/syscall_32.tbl | 1
arch/x86/entry/syscalls/syscall_64.tbl | 1
arch/x86/include/asm/pgtable.h | 8
arch/x86/kernel/dumpstack.c | 2
arch/x86/kernel/setup.c | 5
arch/x86/mm/init_64.c | 4
arch/x86/mm/pat/set_memory.c | 4
arch/x86/mm/pgtable.c | 2
include/asm-generic/pgtable-nop4d.h | 2
include/asm-generic/pgtable-nopmd.h | 2
include/asm-generic/pgtable-nopud.h | 4
include/linux/bootconfig.h | 4
include/linux/buildid.h | 10
include/linux/compaction.h | 4
include/linux/cpumask.h | 2
include/linux/crash_core.h | 12
include/linux/debugobjects.h | 2
include/linux/hmm.h | 2
include/linux/hugetlb.h | 6
include/linux/kallsyms.h | 21 +
include/linux/list_lru.h | 4
include/linux/lru_cache.h | 8
include/linux/mm.h | 3
include/linux/mmu_notifier.h | 8
include/linux/module.h | 9
include/linux/nodemask.h | 6
include/linux/percpu-defs.h | 2
include/linux/percpu-refcount.h | 2
include/linux/pgtable.h | 4
include/linux/scatterlist.h | 2
include/linux/secretmem.h | 54 +++
include/linux/set_memory.h | 12
include/linux/shrinker.h | 2
include/linux/syscalls.h | 1
include/linux/vmalloc.h | 4
include/uapi/asm-generic/unistd.h | 7
include/uapi/linux/magic.h | 1
init/Kconfig | 1
init/main.c | 2
kernel/crash_core.c | 50 ---
kernel/kallsyms.c | 104 +++++--
kernel/module.c | 42 ++
kernel/power/hibernate.c | 5
kernel/sys_ni.c | 2
lib/Kconfig.debug | 17 -
lib/asn1_encoder.c | 2
lib/buildid.c | 80 ++++-
lib/devres.c | 2
lib/dump_stack.c | 13
lib/dynamic_debug.c | 2
lib/fonts/font_pearl_8x8.c | 2
lib/kfifo.c | 2
lib/list_sort.c | 2
lib/nlattr.c | 4
lib/oid_registry.c | 2
lib/pldmfw/pldmfw.c | 2
lib/reed_solomon/test_rslib.c | 2
lib/refcount.c | 2
lib/rhashtable.c | 2
lib/sbitmap.c | 2
lib/scatterlist.c | 4
lib/seq_buf.c | 2
lib/sort.c | 2
lib/stackdepot.c | 2
lib/test_bitops.c | 2
lib/test_bpf.c | 2
lib/test_kasan.c | 2
lib/test_kmod.c | 6
lib/test_scanf.c | 2
lib/vsprintf.c | 10
mm/Kconfig | 4
mm/Makefile | 1
mm/gup.c | 12
mm/init-mm.c | 9
mm/internal.h | 3
mm/mlock.c | 3
mm/mmap.c | 5
mm/mremap.c | 108 ++++++-
mm/secretmem.c | 254 +++++++++++++++++
mm/slub.c | 79 +++--
scripts/checksyscalls.sh | 4
scripts/decode_stacktrace.sh | 89 +++++-
tools/testing/selftests/vm/.gitignore | 1
tools/testing/selftests/vm/Makefile | 3
tools/testing/selftests/vm/memfd_secret.c | 296 ++++++++++++++++++++
tools/testing/selftests/vm/mremap_test.c | 116 ++++---
tools/testing/selftests/vm/run_vmtests.sh | 17 +
137 files changed, 1470 insertions(+), 442 deletions(-)
* Re: incoming
2021-07-03 0:28 ` incoming Linus Torvalds
@ 2021-07-03 1:06 ` Linus Torvalds
0 siblings, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2021-07-03 1:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux-MM, mm-commits
On Fri, Jul 2, 2021 at 5:28 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Commit e058a84bfddc42ba356a2316f2cf1141974625c9 is good, and looking
> at the pulls and merges I've done since, this -mm series looks like
> the obvious culprit.
No, unless my bisection is wrong, the -mm branch is innocent, and was
discarded from the suspects on the very first bisection trial.
So never mind.
Linus
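For readers unfamiliar with why a whole series can be cleared "on the very first bisection trial": each trial tests one commit near the middle of the suspect range and discards the half that cannot contain the regression. Below is a minimal sketch of that search. It assumes a linear history and a test that reports bad for every commit from the culprit onward. The commit names, the culprit position, and the helper first_bad() are hypothetical and not taken from this thread.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str],
              is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return the first bad commit and the number of trials used.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1   # lo: known good, hi: known bad
    trials = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        trials += 1
        if is_bad(commits[mid]):
            hi = mid               # regression is at mid or earlier
        else:
            lo = mid               # everything up to mid is innocent
    return commits[hi], trials


# Hypothetical linear history: c0 is the known-good commit, c15 the tip.
history = [f"c{i}" for i in range(16)]
culprit_index = 11                 # made-up position of the regression
culprit, trials = first_bad(history,
                            lambda c: int(c[1:]) >= culprit_index)
print(culprit, trials)
```

In this sketch the first trial tests c7, finds it good, and clears c1 through c7 in one step. A merged branch whose commits all fall in the cleared half is exonerated the same way. Real kernel history is not linear, so git bisect selects its midpoints from the commit graph rather than from a list index.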
* Re: incoming
2021-07-01 1:46 incoming Andrew Morton
@ 2021-07-03 0:28 ` Linus Torvalds
2021-07-03 1:06 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Linus Torvalds @ 2021-07-03 0:28 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux-MM, mm-commits
On Wed, Jun 30, 2021 at 6:46 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> This is the rest of the -mm tree, less 66 patches which are dependent on
> things which are (or were recently) in linux-next. I'll trickle that
> material over next week.
I haven't bisected this yet, but with the current -git I'm getting
watchdog: BUG: soft lockup - CPU#41 stuck for 49s!
and the common call chain seems to be in flush_tlb_mm_range ->
on_each_cpu_cond_mask.
Commit e058a84bfddc42ba356a2316f2cf1141974625c9 is good, and looking
at the pulls and merges I've done since, this -mm series looks like
the obvious culprit.
I'll go start bisection, but I thought I'd give a heads-up in case
somebody else has seen TLB-flush-related lockups and already figured
out the guilty party..
Linus
* incoming
@ 2021-07-01 1:46 Andrew Morton
2021-07-03 0:28 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2021-07-01 1:46 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
This is the rest of the -mm tree, less 66 patches which are dependent on
things which are (or were recently) in linux-next. I'll trickle that
material over next week.
192 patches, based on 7cf3dead1ad70c72edb03e2d98e1f3dcd332cdb2 plus the
June 28 sendings.
Subsystems affected by this patch series:
mm/hugetlb
mm/userfaultfd
mm/vmscan
mm/kconfig
mm/proc
mm/z3fold
mm/zbud
mm/ras
mm/mempolicy
mm/memblock
mm/migration
mm/thp
mm/nommu
mm/kconfig
mm/madvise
mm/memory-hotplug
mm/zswap
mm/zsmalloc
mm/zram
mm/cleanups
mm/kfence
mm/hmm
procfs
sysctl
misc
core-kernel
lib
lz4
checkpatch
init
kprobes
nilfs2
hfs
signals
exec
kcov
selftests
compress/decompress
ipc
Subsystem: mm/hugetlb
Muchun Song <songmuchun@bytedance.com>:
Patch series "Free some vmemmap pages of HugeTLB page", v23:
mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c
mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP
mm: hugetlb: gather discrete indexes of tail page
mm: hugetlb: free the vmemmap pages associated with each HugeTLB page
mm: hugetlb: defer freeing of HugeTLB pages
mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap
mm: memory_hotplug: disable memmap_on_memory when hugetlb_free_vmemmap enabled
mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate
Shixin Liu <liushixin2@huawei.com>:
mm/debug_vm_pgtable: move {pmd/pud}_huge_tests out of CONFIG_TRANSPARENT_HUGEPAGE
mm/debug_vm_pgtable: remove redundant pfn_{pmd/pte}() and fix one comment mistake
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Cleanup and fixup for huge_memory", v3:
mm/huge_memory.c: remove dedicated macro HPAGE_CACHE_INDEX_MASK
mm/huge_memory.c: use page->deferred_list
mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled()
mm/huge_memory.c: remove unnecessary tlb_remove_page_size() for huge zero pmd
mm/huge_memory.c: don't discard hugepage if other processes are mapping it
Christophe Leroy <christophe.leroy@csgroup.eu>:
Patch series "Implement huge VMAP and VMALLOC on powerpc 8xx", v2:
mm/hugetlb: change parameters of arch_make_huge_pte()
mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge
mm/vmalloc: enable mapping of huge pages at pte level in vmap
mm/vmalloc: enable mapping of huge pages at pte level in vmalloc
powerpc/8xx: add support for huge pages on VMAP and VMALLOC
Nanyong Sun <sunnanyong@huawei.com>:
khugepaged: selftests: remove debug_cow
Mina Almasry <almasrymina@google.com>:
mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY
Muchun Song <songmuchun@bytedance.com>:
Patch series "Split huge PMD mapping of vmemmap pages", v4:
mm: sparsemem: split the huge PMD mapping of vmemmap pages
mm: sparsemem: use huge PMD mapping for vmemmap pages
mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
Mike Kravetz <mike.kravetz@oracle.com>:
Patch series "Fix prep_compound_gigantic_page ref count adjustment":
hugetlb: remove prep_compound_huge_page cleanup
hugetlb: address ref count racing in prep_compound_gigantic_page
Naoya Horiguchi <naoya.horiguchi@nec.com>:
mm/hwpoison: disable pcp for page_handle_poison()
Subsystem: mm/userfaultfd
Peter Xu <peterx@redhat.com>:
Patch series "userfaultfd/selftests: A few cleanups", v2:
userfaultfd/selftests: use user mode only
userfaultfd/selftests: remove the time() check on delayed uffd
userfaultfd/selftests: dropping VERIFY check in locking_thread
userfaultfd/selftests: only dump counts if mode enabled
userfaultfd/selftests: unify error handling
Patch series "mm/uffd: Misc fix for uffd-wp and one more test":
mm/thp: simplify copying of huge zero page pmd when fork
mm/userfaultfd: fix uffd-wp special cases for fork()
mm/userfaultfd: fail uffd-wp registration if not supported
mm/pagemap: export uffd-wp protection information
userfaultfd/selftests: add pagemap uffd-wp test
Axel Rasmussen <axelrasmussen@google.com>:
Patch series "userfaultfd: add minor fault handling for shmem", v6:
userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
userfaultfd/shmem: support minor fault registration for shmem
userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
userfaultfd/shmem: advertise shmem minor fault support
userfaultfd/shmem: modify shmem_mfill_atomic_pte to use install_pte()
userfaultfd/selftests: use memfd_create for shmem test type
userfaultfd/selftests: create alias mappings in the shmem test
userfaultfd/selftests: reinitialize test context in each test
userfaultfd/selftests: exercise minor fault handling shmem support
Subsystem: mm/vmscan
Yu Zhao <yuzhao@google.com>:
mm/vmscan.c: fix potential deadlock in reclaim_pages()
include/trace/events/vmscan.h: remove mm_vmscan_inactive_list_is_low
Miaohe Lin <linmiaohe@huawei.com>:
mm: workingset: define macro WORKINGSET_SHIFT
Subsystem: mm/kconfig
Kefeng Wang <wangkefeng.wang@huawei.com>:
mm/kconfig: move HOLES_IN_ZONE into mm
Subsystem: mm/proc
Mike Rapoport <rppt@linux.ibm.com>:
docs: proc.rst: meminfo: briefly describe gaps in memory accounting
David Hildenbrand <david@redhat.com>:
Patch series "fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages", v3:
fs/proc/kcore: drop KCORE_REMAP and KCORE_OTHER
fs/proc/kcore: pfn_is_ram check only applies to KCORE_RAM
fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages
mm: introduce page_offline_(begin|end|freeze|thaw) to synchronize setting PageOffline()
virtio-mem: use page_offline_(start|end) when setting PageOffline()
fs/proc/kcore: use page_offline_(freeze|thaw)
Subsystem: mm/z3fold
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Cleanup and fixup for z3fold":
mm/z3fold: define macro NCHUNKS as TOTAL_CHUNKS - ZHDR_CHUNKS
mm/z3fold: avoid possible underflow in z3fold_alloc()
mm/z3fold: remove magic number in z3fold_create_pool()
mm/z3fold: remove unused function handle_to_z3fold_header()
mm/z3fold: fix potential memory leak in z3fold_destroy_pool()
mm/z3fold: use release_z3fold_page_locked() to release locked z3fold page
Subsystem: mm/zbud
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Cleanups for zbud", v2:
mm/zbud: reuse unbuddied[0] as buddied in zbud_pool
mm/zbud: don't export any zbud API
Subsystem: mm/ras
YueHaibing <yuehaibing@huawei.com>:
mm/compaction: use DEVICE_ATTR_WO macro
Liu Xiang <liu.xiang@zlingsmart.com>:
mm: compaction: remove duplicate !list_empty(&sublist) check
Wonhyuk Yang <vvghjk1234@gmail.com>:
mm/compaction: fix 'limit' in fast_isolate_freepages
Subsystem: mm/mempolicy
Feng Tang <feng.tang@intel.com>:
Patch series "mm/mempolicy: some fix and semantics cleanup", v4:
mm/mempolicy: cleanup nodemask intersection check for oom
mm/mempolicy: don't handle MPOL_LOCAL like a fake MPOL_PREFERRED policy
mm/mempolicy: unify the parameter sanity check for mbind and set_mempolicy
Yang Shi <shy828301@gmail.com>:
mm: mempolicy: don't have to split pmd for huge zero page
Ben Widawsky <ben.widawsky@intel.com>:
mm/mempolicy: use unified 'nodes' for bind/interleave/prefer policies
Subsystem: mm/memblock
Mike Rapoport <rppt@linux.ibm.com>:
Patch series "arm64: drop pfn_valid_within() and simplify pfn_valid()", v4:
include/linux/mmzone.h: add documentation for pfn_valid()
memblock: update initialization of reserved pages
arm64: decouple check whether pfn is in linear map from pfn_valid()
arm64: drop pfn_valid_within() and simplify pfn_valid()
Anshuman Khandual <anshuman.khandual@arm.com>:
arm64/mm: drop HAVE_ARCH_PFN_VALID
Subsystem: mm/migration
Muchun Song <songmuchun@bytedance.com>:
mm: migrate: fix missing update page_private to hugetlb_page_subpool
Subsystem: mm/thp
Collin Fijalkovich <cfijalkovich@google.com>:
mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs
Yang Shi <shy828301@gmail.com>:
mm: memory: add orig_pmd to struct vm_fault
mm: memory: make numa_migrate_prep() non-static
mm: thp: refactor NUMA fault handling
mm: migrate: account THP NUMA migration counters correctly
mm: migrate: don't split THP for misplaced NUMA page
mm: migrate: check mapcount for THP instead of refcount
mm: thp: skip make PMD PROT_NONE if THP migration is not supported
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/thp: make ARCH_ENABLE_SPLIT_PMD_PTLOCK dependent on PGTABLE_LEVELS > 2
Yang Shi <shy828301@gmail.com>:
mm: rmap: make try_to_unmap() void function
Hugh Dickins <hughd@google.com>:
mm/thp: remap_page() is only needed on anonymous THP
mm: hwpoison_user_mappings() try_to_unmap() with TTU_SYNC
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/thp: fix strncpy warning
Subsystem: mm/nommu
Chen Li <chenli@uniontech.com>:
nommu: remove __GFP_HIGHMEM in vmalloc/vzalloc
Liam Howlett <liam.howlett@oracle.com>:
mm/nommu: unexport do_munmap()
Subsystem: mm/kconfig
Kefeng Wang <wangkefeng.wang@huawei.com>:
mm: generalize ZONE_[DMA|DMA32]
Subsystem: mm/madvise
David Hildenbrand <david@redhat.com>:
Patch series "mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables", v2:
mm: make variable names for populate_vma_page_range() consistent
mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables
MAINTAINERS: add tools/testing/selftests/vm/ to MEMORY MANAGEMENT
selftests/vm: add protection_keys_32 / protection_keys_64 to gitignore
selftests/vm: add test for MADV_POPULATE_(READ|WRITE)
Subsystem: mm/memory-hotplug
Liam Mark <lmark@codeaurora.org>:
mm/memory_hotplug: rate limit page migration warnings
Oscar Salvador <osalvador@suse.de>:
mm,memory_hotplug: drop unneeded locking
Subsystem: mm/zswap
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Cleanup and fixup for zswap":
mm/zswap.c: remove unused function zswap_debugfs_exit()
mm/zswap.c: avoid unnecessary copy-in at map time
mm/zswap.c: fix two bugs in zswap_writeback_entry()
Subsystem: mm/zsmalloc
Zhaoyang Huang <zhaoyang.huang@unisoc.com>:
mm: zram: amend SLAB_RECLAIM_ACCOUNT on zspage_cachep
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Cleanup for zsmalloc":
mm/zsmalloc.c: remove confusing code in obj_free()
mm/zsmalloc.c: improve readability for async_free_zspage()
Subsystem: mm/zram
Yue Hu <huyue2@yulong.com>:
zram: move backing_dev under macro CONFIG_ZRAM_WRITEBACK
Subsystem: mm/cleanups
Hyeonggon Yoo <42.hyeyoo@gmail.com>:
mm: fix typos and grammar error in comments
Anshuman Khandual <anshuman.khandual@arm.com>:
mm: define default value for FIRST_USER_ADDRESS
Zhen Lei <thunder.leizhen@huawei.com>:
mm: fix spelling mistakes
Mel Gorman <mgorman@techsingularity.net>:
Patch series "Clean W=1 build warnings for mm/":
mm/vmscan: remove kerneldoc-like comment from isolate_lru_pages
mm/vmalloc: include header for prototype of set_iounmap_nonlazy
mm/page_alloc: make should_fail_alloc_page() static
mm/mapping_dirty_helpers: remove double Note in kerneldoc
mm/memcontrol.c: fix kerneldoc comment for mem_cgroup_calculate_protection
mm/memory_hotplug: fix kerneldoc comment for __try_online_node
mm/memory_hotplug: fix kerneldoc comment for __remove_memory
mm/zbud: add kerneldoc fields for zbud_pool
mm/z3fold: add kerneldoc fields for z3fold_pool
mm/swap: make swap_address_space an inline function
mm/mmap_lock: remove dead code for !CONFIG_TRACING configurations
mm/page_alloc: move prototype for find_suitable_fallback
mm/swap: make NODE_DATA an inline function on CONFIG_FLATMEM
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/thp: define default pmd_pgtable()
Subsystem: mm/kfence
Marco Elver <elver@google.com>:
kfence: unconditionally use unbound work queue
Subsystem: mm/hmm
Alistair Popple <apopple@nvidia.com>:
Patch series "Add support for SVM atomics in Nouveau", v11:
mm: remove special swap entry functions
mm/swapops: rework swap entry manipulation code
mm/rmap: split try_to_munlock from try_to_unmap
mm/rmap: split migration into its own function
mm: rename migrate_pgmap_owner
mm/memory.c: allow different return codes for copy_nonpresent_pte()
mm: device exclusive memory access
mm: selftests for exclusive device memory
nouveau/svm: refactor nouveau_range_fault
nouveau/svm: implement atomic SVM access
Subsystem: procfs
Marcelo Henrique Cerri <marcelo.cerri@canonical.com>:
proc: Avoid mixing integer types in mem_rw()
ZHOUFENG <zhoufeng.zf@bytedance.com>:
fs/proc/kcore.c: add mmap interface
Kalesh Singh <kaleshsingh@google.com>:
procfs: allow reading fdinfo with PTRACE_MODE_READ
procfs/dmabuf: add inode number to /proc/*/fdinfo
Subsystem: sysctl
Jiapeng Chong <jiapeng.chong@linux.alibaba.com>:
sysctl: remove redundant assignment to first
Subsystem: misc
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
drm: include only needed headers in ascii85.h
Subsystem: core-kernel
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
kernel.h: split out panic and oops helpers
Subsystem: lib
Zhen Lei <thunder.leizhen@huawei.com>:
lib: decompress_bunzip2: remove an unneeded semicolon
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
Patch series "lib/string_helpers: get rid of ugly *_escape_mem_ascii()", v3:
lib/string_helpers: switch to use BIT() macro
lib/string_helpers: move ESCAPE_NP check inside 'else' branch in a loop
lib/string_helpers: drop indentation level in string_escape_mem()
lib/string_helpers: introduce ESCAPE_NA for escaping non-ASCII
lib/string_helpers: introduce ESCAPE_NAP to escape non-ASCII and non-printable
lib/string_helpers: allow to append additional characters to be escaped
lib/test-string_helpers: print flags in hexadecimal format
lib/test-string_helpers: get rid of trailing comma in terminators
lib/test-string_helpers: add test cases for new features
MAINTAINERS: add myself as designated reviewer for generic string library
seq_file: introduce seq_escape_mem()
seq_file: add seq_escape_str() as replica of string_escape_str()
seq_file: convert seq_escape() to use seq_escape_str()
nfsd: avoid non-flexible API in seq_quote_mem()
seq_file: drop unused *_escape_mem_ascii()
Trent Piepho <tpiepho@gmail.com>:
lib/math/rational.c: fix divide by zero
lib/math/rational: add Kunit test cases
Zhen Lei <thunder.leizhen@huawei.com>:
lib/decompressors: fix spelling mistakes
lib/mpi: fix spelling mistakes
Alexey Dobriyan <adobriyan@gmail.com>:
lib: memscan() fixlet
lib: uninline simple_strtoull()
Matteo Croce <mcroce@microsoft.com>:
lib/test_string.c: allow module removal
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
kernel.h: split out kstrtox() and simple_strtox() to a separate header
Subsystem: lz4
Rajat Asthana <thisisrast7@gmail.com>:
lz4_decompress: declare LZ4_decompress_safe_withPrefix64k static
Dimitri John Ledkov <dimitri.ledkov@canonical.com>:
lib/decompress_unlz4.c: correctly handle zero-padding around initrds.
Subsystem: checkpatch
Guenter Roeck <linux@roeck-us.net>:
checkpatch: scripts/spdxcheck.py now requires python3
Joe Perches <joe@perches.com>:
checkpatch: improve the indented label test
Guenter Roeck <linux@roeck-us.net>:
checkpatch: do not complain about positive return values starting with EPOLL
Subsystem: init
Andrew Halaney <ahalaney@redhat.com>:
init: print out unknown kernel parameters
Subsystem: kprobes
Barry Song <song.bao.hua@hisilicon.com>:
kprobes: remove duplicated strong free_insn_page in x86 and s390
Subsystem: nilfs2
Colin Ian King <colin.king@canonical.com>:
nilfs2: remove redundant continue statement in a while-loop
Subsystem: hfs
Zhen Lei <thunder.leizhen@huawei.com>:
hfsplus: remove unnecessary oom message
Chung-Chiang Cheng <shepjeng@gmail.com>:
hfsplus: report create_date to kstat.btime
Subsystem: signals
Al Viro <viro@zeniv.linux.org.uk>:
x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
Subsystem: exec
Alexey Dobriyan <adobriyan@gmail.com>:
exec: remove checks in __register_bimfmt()
Subsystem: kcov
Marco Elver <elver@google.com>:
kcov: add __no_sanitize_coverage to fix noinstr for all architectures
Subsystem: selftests
Dave Hansen <dave.hansen@linux.intel.com>:
Patch series "selftests/vm/pkeys: Bug fixes and a new test":
selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
selftests/vm/pkeys: refill shadow register after implicit kernel write
selftests/vm/pkeys: exercise x86 XSAVE init state
Subsystem: compress/decompress
Yu Kuai <yukuai3@huawei.com>:
lib/decompressors: remove set but not used variabled 'level'
Subsystem: ipc
Vasily Averin <vvs@virtuozzo.com>:
Patch series "ipc: allocations cleanup", v2:
ipc sem: use kvmalloc for sem_undo allocation
ipc: use kmalloc for msg_queue and shmid_kernel
Manfred Spraul <manfred@colorfullife.com>:
ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
ipc/util.c: use binary search for max_idx
Documentation/admin-guide/kernel-parameters.txt | 35
Documentation/admin-guide/mm/hugetlbpage.rst | 11
Documentation/admin-guide/mm/memory-hotplug.rst | 13
Documentation/admin-guide/mm/pagemap.rst | 2
Documentation/admin-guide/mm/userfaultfd.rst | 3
Documentation/core-api/kernel-api.rst | 7
Documentation/filesystems/proc.rst | 48
Documentation/vm/hmm.rst | 19
Documentation/vm/unevictable-lru.rst | 33
MAINTAINERS | 10
arch/alpha/Kconfig | 5
arch/alpha/include/asm/pgalloc.h | 1
arch/alpha/include/asm/pgtable.h | 1
arch/alpha/include/uapi/asm/mman.h | 3
arch/alpha/kernel/setup.c | 2
arch/arc/include/asm/pgalloc.h | 2
arch/arc/include/asm/pgtable.h | 8
arch/arm/Kconfig | 3
arch/arm/include/asm/pgalloc.h | 1
arch/arm64/Kconfig | 15
arch/arm64/include/asm/hugetlb.h | 3
arch/arm64/include/asm/memory.h | 2
arch/arm64/include/asm/page.h | 4
arch/arm64/include/asm/pgalloc.h | 1
arch/arm64/include/asm/pgtable.h | 2
arch/arm64/kernel/setup.c | 1
arch/arm64/kvm/mmu.c | 2
arch/arm64/mm/hugetlbpage.c | 5
arch/arm64/mm/init.c | 51
arch/arm64/mm/ioremap.c | 4
arch/arm64/mm/mmu.c | 22
arch/csky/include/asm/pgalloc.h | 2
arch/csky/include/asm/pgtable.h | 1
arch/hexagon/include/asm/pgtable.h | 4
arch/ia64/Kconfig | 7
arch/ia64/include/asm/pal.h | 1
arch/ia64/include/asm/pgalloc.h | 1
arch/ia64/include/asm/pgtable.h | 1
arch/m68k/Kconfig | 5
arch/m68k/include/asm/mcf_pgalloc.h | 2
arch/m68k/include/asm/mcf_pgtable.h | 2
arch/m68k/include/asm/motorola_pgalloc.h | 1
arch/m68k/include/asm/motorola_pgtable.h | 2
arch/m68k/include/asm/pgtable_mm.h | 1
arch/m68k/include/asm/sun3_pgalloc.h | 1
arch/microblaze/Kconfig | 4
arch/microblaze/include/asm/pgalloc.h | 2
arch/microblaze/include/asm/pgtable.h | 2
arch/mips/Kconfig | 10
arch/mips/include/asm/pgalloc.h | 1
arch/mips/include/asm/pgtable-32.h | 1
arch/mips/include/asm/pgtable-64.h | 1
arch/mips/include/uapi/asm/mman.h | 3
arch/mips/kernel/relocate.c | 1
arch/mips/sgi-ip22/ip22-reset.c | 1
arch/mips/sgi-ip32/ip32-reset.c | 1
arch/nds32/include/asm/pgalloc.h | 5
arch/nios2/include/asm/pgalloc.h | 1
arch/nios2/include/asm/pgtable.h | 2
arch/openrisc/include/asm/pgalloc.h | 2
arch/openrisc/include/asm/pgtable.h | 1
arch/parisc/include/asm/pgalloc.h | 1
arch/parisc/include/asm/pgtable.h | 2
arch/parisc/include/uapi/asm/mman.h | 3
arch/parisc/kernel/pdc_chassis.c | 1
arch/powerpc/Kconfig | 6
arch/powerpc/include/asm/book3s/pgtable.h | 1
arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h | 5
arch/powerpc/include/asm/nohash/32/mmu-8xx.h | 43
arch/powerpc/include/asm/nohash/32/pgtable.h | 1
arch/powerpc/include/asm/nohash/64/pgtable.h | 2
arch/powerpc/include/asm/pgalloc.h | 5
arch/powerpc/include/asm/pgtable.h | 6
arch/powerpc/kernel/setup-common.c | 1
arch/powerpc/platforms/Kconfig.cputype | 1
arch/riscv/Kconfig | 5
arch/riscv/include/asm/pgalloc.h | 2
arch/riscv/include/asm/pgtable.h | 2
arch/s390/Kconfig | 6
arch/s390/include/asm/pgalloc.h | 3
arch/s390/include/asm/pgtable.h | 5
arch/s390/kernel/ipl.c | 1
arch/s390/kernel/kprobes.c | 5
arch/s390/mm/pgtable.c | 2
arch/sh/include/asm/pgalloc.h | 1
arch/sh/include/asm/pgtable.h | 2
arch/sparc/Kconfig | 5
arch/sparc/include/asm/pgalloc_32.h | 1
arch/sparc/include/asm/pgalloc_64.h | 1
arch/sparc/include/asm/pgtable_32.h | 3
arch/sparc/include/asm/pgtable_64.h | 8
arch/sparc/kernel/sstate.c | 1
arch/sparc/mm/hugetlbpage.c | 6
arch/sparc/mm/init_64.c | 1
arch/um/drivers/mconsole_kern.c | 1
arch/um/include/asm/pgalloc.h | 1
arch/um/include/asm/pgtable-2level.h | 1
arch/um/include/asm/pgtable-3level.h | 1
arch/um/kernel/um_arch.c | 1
arch/x86/Kconfig | 17
arch/x86/include/asm/desc.h | 1
arch/x86/include/asm/pgalloc.h | 2
arch/x86/include/asm/pgtable_types.h | 2
arch/x86/kernel/cpu/mshyperv.c | 1
arch/x86/kernel/kprobes/core.c | 6
arch/x86/kernel/setup.c | 1
arch/x86/mm/init_64.c | 21
arch/x86/mm/pgtable.c | 34
arch/x86/purgatory/purgatory.c | 2
arch/x86/xen/enlighten.c | 1
arch/xtensa/include/asm/pgalloc.h | 2
arch/xtensa/include/asm/pgtable.h | 1
arch/xtensa/include/uapi/asm/mman.h | 3
arch/xtensa/platforms/iss/setup.c | 1
drivers/block/zram/zram_drv.h | 2
drivers/bus/brcmstb_gisb.c | 1
drivers/char/ipmi/ipmi_msghandler.c | 1
drivers/clk/analogbits/wrpll-cln28hpc.c | 4
drivers/edac/altera_edac.c | 1
drivers/firmware/google/gsmi.c | 1
drivers/gpu/drm/nouveau/include/nvif/if000c.h | 1
drivers/gpu/drm/nouveau/nouveau_svm.c | 162 ++-
drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h | 1
drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c | 6
drivers/hv/vmbus_drv.c | 1
drivers/hwtracing/coresight/coresight-cpu-debug.c | 1
drivers/leds/trigger/ledtrig-activity.c | 1
drivers/leds/trigger/ledtrig-heartbeat.c | 1
drivers/leds/trigger/ledtrig-panic.c | 1
drivers/misc/bcm-vk/bcm_vk_dev.c | 1
drivers/misc/ibmasm/heartbeat.c | 1
drivers/misc/pvpanic/pvpanic.c | 1
drivers/net/ipa/ipa_smp2p.c | 1
drivers/parisc/power.c | 1
drivers/power/reset/ltc2952-poweroff.c | 1
drivers/remoteproc/remoteproc_core.c | 1
drivers/s390/char/con3215.c | 1
drivers/s390/char/con3270.c | 1
drivers/s390/char/sclp.c | 1
drivers/s390/char/sclp_con.c | 1
drivers/s390/char/sclp_vt220.c | 1
drivers/s390/char/zcore.c | 1
drivers/soc/bcm/brcmstb/pm/pm-arm.c | 1
drivers/staging/olpc_dcon/olpc_dcon.c | 1
drivers/video/fbdev/hyperv_fb.c | 1
drivers/virtio/virtio_mem.c | 2
fs/Kconfig | 15
fs/exec.c | 3
fs/hfsplus/inode.c | 5
fs/hfsplus/xattr.c | 1
fs/nfsd/nfs4state.c | 2
fs/nilfs2/btree.c | 1
fs/open.c | 13
fs/proc/base.c | 6
fs/proc/fd.c | 20
fs/proc/kcore.c | 136 ++
fs/proc/task_mmu.c | 34
fs/seq_file.c | 43
fs/userfaultfd.c | 15
include/asm-generic/bug.h | 3
include/linux/ascii85.h | 3
include/linux/bootmem_info.h | 68 +
include/linux/compat.h | 2
include/linux/compiler-clang.h | 17
include/linux/compiler-gcc.h | 6
include/linux/compiler_types.h | 2
include/linux/huge_mm.h | 74 -
include/linux/hugetlb.h | 80 +
include/linux/hugetlb_cgroup.h | 19
include/linux/kcore.h | 3
include/linux/kernel.h | 227 ----
include/linux/kprobes.h | 1
include/linux/kstrtox.h | 155 ++
include/linux/memblock.h | 4
include/linux/memory_hotplug.h | 27
include/linux/mempolicy.h | 9
include/linux/memremap.h | 2
include/linux/migrate.h | 27
include/linux/mm.h | 18
include/linux/mm_types.h | 2
include/linux/mmu_notifier.h | 26
include/linux/mmzone.h | 27
include/linux/mpi.h | 4
include/linux/page-flags.h | 22
include/linux/panic.h | 98 +
include/linux/panic_notifier.h | 12
include/linux/pgtable.h | 44
include/linux/rmap.h | 13
include/linux/seq_file.h | 10
include/linux/shmem_fs.h | 19
include/linux/signal.h | 2
include/linux/string.h | 7
include/linux/string_helpers.h | 31
include/linux/sunrpc/cache.h | 1
include/linux/swap.h | 19
include/linux/swapops.h | 171 +--
include/linux/thread_info.h | 1
include/linux/userfaultfd_k.h | 5
include/linux/vmalloc.h | 15
include/linux/zbud.h | 23
include/trace/events/vmscan.h | 41
include/uapi/asm-generic/mman-common.h | 3
include/uapi/linux/mempolicy.h | 1
include/uapi/linux/userfaultfd.h | 7
init/main.c | 42
ipc/msg.c | 6
ipc/sem.c | 25
ipc/shm.c | 6
ipc/util.c | 44
ipc/util.h | 3
kernel/hung_task.c | 1
kernel/kexec_core.c | 1
kernel/kprobes.c | 2
kernel/panic.c | 1
kernel/rcu/tree.c | 2
kernel/signal.c | 14
kernel/sysctl.c | 4
kernel/trace/trace.c | 1
lib/Kconfig.debug | 12
lib/decompress_bunzip2.c | 6
lib/decompress_unlz4.c | 8
lib/decompress_unlzo.c | 3
lib/decompress_unxz.c | 2
lib/decompress_unzstd.c | 4
lib/kstrtox.c | 5
lib/lz4/lz4_decompress.c | 2
lib/math/Makefile | 1
lib/math/rational-test.c | 56 +
lib/math/rational.c | 16
lib/mpi/longlong.h | 4
lib/mpi/mpicoder.c | 6
lib/mpi/mpiutil.c | 2
lib/parser.c | 1
lib/string.c | 2
lib/string_helpers.c | 142 +-
lib/test-string_helpers.c | 157 ++-
lib/test_hmm.c | 127 ++
lib/test_hmm_uapi.h | 2
lib/test_string.c | 5
lib/vsprintf.c | 1
lib/xz/xz_dec_bcj.c | 2
lib/xz/xz_dec_lzma2.c | 8
lib/zlib_inflate/inffast.c | 2
lib/zstd/huf.h | 2
mm/Kconfig | 16
mm/Makefile | 2
mm/bootmem_info.c | 127 ++
mm/compaction.c | 20
mm/debug_vm_pgtable.c | 109 --
mm/gup.c | 58 +
mm/hmm.c | 12
mm/huge_memory.c | 269 ++---
mm/hugetlb.c | 369 +++++--
mm/hugetlb_vmemmap.c | 332 ++++++
mm/hugetlb_vmemmap.h | 53 -
mm/internal.h | 29
mm/kfence/core.c | 4
mm/khugepaged.c | 20
mm/madvise.c | 66 +
mm/mapping_dirty_helpers.c | 2
mm/memblock.c | 28
mm/memcontrol.c | 4
mm/memory-failure.c | 38
mm/memory.c | 239 +++-
mm/memory_hotplug.c | 161 ---
mm/mempolicy.c | 323 ++----
mm/migrate.c | 268 +----
mm/mlock.c | 12
mm/mmap_lock.c | 59 -
mm/mprotect.c | 18
mm/nommu.c | 5
mm/oom_kill.c | 2
mm/page_alloc.c | 5
mm/page_vma_mapped.c | 15
mm/rmap.c | 644 +++++++++---
mm/shmem.c | 125 --
mm/sparse-vmemmap.c | 432 +++++++-
mm/sparse.c | 1
mm/swap.c | 2
mm/swapfile.c | 2
mm/userfaultfd.c | 249 ++--
mm/util.c | 40
mm/vmalloc.c | 37
mm/vmscan.c | 20
mm/workingset.c | 10
mm/z3fold.c | 39
mm/zbud.c | 235 ++--
mm/zsmalloc.c | 5
mm/zswap.c | 26
scripts/checkpatch.pl | 16
tools/testing/selftests/vm/.gitignore | 3
tools/testing/selftests/vm/Makefile | 5
tools/testing/selftests/vm/hmm-tests.c | 158 +++
tools/testing/selftests/vm/khugepaged.c | 4
tools/testing/selftests/vm/madv_populate.c | 342 ++++++
tools/testing/selftests/vm/pkey-x86.h | 1
tools/testing/selftests/vm/protection_keys.c | 85 +
tools/testing/selftests/vm/run_vmtests.sh | 16
tools/testing/selftests/vm/userfaultfd.c | 1094 ++++++++++-----------
299 files changed, 6277 insertions(+), 3183 deletions(-)
* incoming
@ 2021-06-29 2:32 Andrew Morton
From: Andrew Morton @ 2021-06-29 2:32 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
192 patches, based on 7cf3dead1ad70c72edb03e2d98e1f3dcd332cdb2.
Subsystems affected by this patch series:
mm/gup
mm/pagealloc
kthread
ia64
scripts
ntfs
squashfs
ocfs2
z
kernel/watchdog
mm/slab
mm/slub
mm/kmemleak
mm/dax
mm/debug
mm/pagecache
mm/gup
mm/swap
mm/memcg
mm/pagemap
mm/mprotect
mm/bootmem
mm/dma
mm/tracing
mm/vmalloc
mm/kasan
mm/initialization
mm/pagealloc
mm/memory-failure
Subsystem: mm/gup
Jann Horn <jannh@google.com>:
mm/gup: fix try_grab_compound_head() race with split_huge_page()
Subsystem: mm/pagealloc
Mike Rapoport <rppt@linux.ibm.com>:
mm/page_alloc: fix memory map initialization for descending nodes
Mel Gorman <mgorman@techsingularity.net>:
mm/page_alloc: correct return value of populated elements if bulk array is populated
Subsystem: kthread
Jonathan Neuschäfer <j.neuschaefer@gmx.net>:
kthread: switch to new kerneldoc syntax for named variable macro argument
Petr Mladek <pmladek@suse.com>:
kthread_worker: fix return value when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
Subsystem: ia64
Randy Dunlap <rdunlap@infradead.org>:
ia64: headers: drop duplicated words
Arnd Bergmann <arnd@arndb.de>:
ia64: mca_drv: fix incorrect array size calculation
Subsystem: scripts
"Steven Rostedt (VMware)" <rostedt@goodmis.org>:
Patch series "streamline_config.pl: Fix Perl spacing":
streamline_config.pl: make spacing consistent
streamline_config.pl: add softtabstop=4 for vim users
Colin Ian King <colin.king@canonical.com>:
scripts/spelling.txt: add more spellings to spelling.txt
Subsystem: ntfs
Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>:
ntfs: fix validity check for file name attribute
Subsystem: squashfs
Vincent Whitchurch <vincent.whitchurch@axis.com>:
squashfs: add option to panic on errors
Subsystem: ocfs2
Yang Yingliang <yangyingliang@huawei.com>:
ocfs2: remove unnecessary INIT_LIST_HEAD()
Subsystem: z
Dan Carpenter <dan.carpenter@oracle.com>:
ocfs2: fix snprintf() checking
Colin Ian King <colin.king@canonical.com>:
ocfs2: remove redundant assignment to pointer queue
Wan Jiabing <wanjiabing@vivo.com>:
ocfs2: remove repeated uptodate check for buffer
Chen Huang <chenhuang5@huawei.com>:
ocfs2: replace simple_strtoull() with kstrtoull()
Colin Ian King <colin.king@canonical.com>:
ocfs2: remove redundant initialization of variable ret
Subsystem: kernel/watchdog
Wang Qing <wangqing@vivo.com>:
kernel: watchdog: modify the explanation related to watchdog thread
doc: watchdog: modify the explanation related to watchdog thread
doc: watchdog: modify the doc related to "watchdog/%u"
Subsystem: mm/slab
gumingtao <gumingtao1225@gmail.com>:
slab: use __func__ to trace function name
Subsystem: mm/slub
Vlastimil Babka <vbabka@suse.cz>:
kunit: make test->lock irq safe
Oliver Glitta <glittao@gmail.com>:
mm/slub, kunit: add a KUnit test for SLUB debugging functionality
slub: remove resiliency_test() function
Hyeonggon Yoo <42.hyeyoo@gmail.com>:
mm, slub: change run-time assertion in kmalloc_index() to compile-time
Stephen Boyd <swboyd@chromium.org>:
slub: restore slub_debug=- behavior
slub: actually use 'message' in restore_bytes()
Joe Perches <joe@perches.com>:
slub: indicate slab_fix() uses printf formats
Stephen Boyd <swboyd@chromium.org>:
slub: force on no_hash_pointers when slub_debug is enabled
Faiyaz Mohammed <faiyazm@codeaurora.org>:
mm: slub: move sysfs slab alloc/free interfaces to debugfs
Georgi Djakov <quic_c_gdjako@quicinc.com>:
mm/slub: add taint after the errors are printed
Subsystem: mm/kmemleak
Yanfei Xu <yanfei.xu@windriver.com>:
mm/kmemleak: fix possible wrong memory scanning period
Subsystem: mm/dax
Jan Kara <jack@suse.cz>:
dax: fix ENOMEM handling in grab_mapping_entry()
Subsystem: mm/debug
Tang Bin <tangbin@cmss.chinamobile.com>:
tools/vm/page_owner_sort.c: check malloc() return
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/debug_vm_pgtable: ensure THP availability via has_transparent_hugepage()
Nicolas Saenz Julienne <nsaenzju@redhat.com>:
mm: mmap_lock: use local locks instead of disabling preemption
Gavin Shan <gshan@redhat.com>:
Patch series "mm/page_reporting: Make page reporting work on arm64 with 64KB page size", v4:
mm/page_reporting: fix code style in __page_reporting_request()
mm/page_reporting: export reporting order as module parameter
mm/page_reporting: allow driver to specify reporting order
virtio_balloon: specify page reporting order if needed
Subsystem: mm/pagecache
Kefeng Wang <wangkefeng.wang@huawei.com>:
mm: page-writeback: kill get_writeback_state() comments
Chi Wu <wuchi.zero@gmail.com>:
mm/page-writeback: Fix performance when BDI's share of ratio is 0.
mm/page-writeback: update the comment of Dirty position control
mm/page-writeback: use __this_cpu_inc() in account_page_dirtied()
Roman Gushchin <guro@fb.com>:
Patch series "cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups", v9:
writeback, cgroup: do not switch inodes with I_WILL_FREE flag
writeback, cgroup: add smp_mb() to cgroup_writeback_umount()
writeback, cgroup: increment isw_nr_in_flight before grabbing an inode
writeback, cgroup: switch to rcu_work API in inode_switch_wbs()
writeback, cgroup: keep list of inodes attached to bdi_writeback
writeback, cgroup: split out the functional part of inode_switch_wbs_work_fn()
writeback, cgroup: support switching multiple inodes at once
writeback, cgroup: release dying cgwbs by switching attached inodes
Christoph Hellwig <hch@lst.de>:
Patch series "remove the implicit .set_page_dirty default":
fs: unexport __set_page_dirty
fs: move ramfs_aops to libfs
mm: require ->set_page_dirty to be explicitly wired up
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "Further set_page_dirty cleanups":
mm/writeback: move __set_page_dirty() to core mm
mm/writeback: use __set_page_dirty in __set_page_dirty_nobuffers
iomap: use __set_page_dirty_nobuffers
fs: remove anon_set_page_dirty()
fs: remove noop_set_page_dirty()
mm: move page dirtying prototypes from mm.h
Subsystem: mm/gup
Peter Xu <peterx@redhat.com>:
Patch series "mm/gup: Fix pin page write cache bouncing on has_pinned", v2:
mm/gup_benchmark: support threading
Andrea Arcangeli <aarcange@redhat.com>:
mm: gup: allow FOLL_PIN to scale in SMP
mm: gup: pack has_pinned in MMF_HAS_PINNED
Christophe Leroy <christophe.leroy@csgroup.eu>:
mm: pagewalk: fix walk for hugepage tables
Subsystem: mm/swap
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "close various race windows for swap", v6:
mm/swapfile: use percpu_ref to serialize against concurrent swapoff
swap: fix do_swap_page() race with swapoff
mm/swap: remove confusing checking for non_swap_entry() in swap_ra_info()
mm/shmem: fix shmem_swapin() race with swapoff
Patch series "Cleanups for swap", v2:
mm/swapfile: move get_swap_page_of_type() under CONFIG_HIBERNATION
mm/swap: remove unused local variable nr_shadows
mm/swap_slots.c: delete meaningless forward declarations
Huang Ying <ying.huang@intel.com>:
mm, swap: remove unnecessary smp_rmb() in swap_type_to_swap_info()
mm: free idle swap cache page after COW
swap: check mapping_empty() for swap cache before being freed
Subsystem: mm/memcg
Waiman Long <longman@redhat.com>:
Patch series "mm/memcg: Reduce kmemcache memory accounting overhead", v6:
mm/memcg: move mod_objcg_state() to memcontrol.c
mm/memcg: cache vmstat data in percpu memcg_stock_pcp
mm/memcg: improve refill_obj_stock() performance
mm/memcg: optimize user context object stock access
Patch series "mm: memcg/slab: Fix objcg pointer array handling problem", v4:
mm: memcg/slab: properly set up gfp flags for objcg pointer array
mm: memcg/slab: create a new set of kmalloc-cg-<n> caches
mm: memcg/slab: disable cache merging for KMALLOC_NORMAL caches
Muchun Song <songmuchun@bytedance.com>:
mm: memcontrol: fix root_mem_cgroup charging
Patch series "memcontrol code cleanup and simplification", v3:
mm: memcontrol: fix page charging in page replacement
mm: memcontrol: bail out early when !mm in get_mem_cgroup_from_mm
mm: memcontrol: remove the pgdata parameter of mem_cgroup_page_lruvec
mm: memcontrol: simplify lruvec_holds_page_lru_lock
mm: memcontrol: rename lruvec_holds_page_lru_lock to page_matches_lruvec
mm: memcontrol: simplify the logic of objcg pinning memcg
mm: memcontrol: move obj_cgroup_uncharge_pages() out of css_set_lock
mm: vmscan: remove noinline_for_stack
wenhuizhang <wenhui@gwmail.gwu.edu>:
memcontrol: use flexible-array member
Dan Schatzberg <schatzberg.dan@gmail.com>:
Patch series "Charge loop device i/o to issuing cgroup", v14:
loop: use worker per cgroup instead of kworker
mm: charge active memcg when no mm is set
loop: charge i/o to mem and blk cg
Huilong Deng <denghuilong@cdjrlc.com>:
mm: memcontrol: remove trailing semicolon in macros
Subsystem: mm/pagemap
David Hildenbrand <david@redhat.com>:
Patch series "perf/binfmt/mm: remove in-tree usage of MAP_EXECUTABLE":
perf: MAP_EXECUTABLE does not indicate VM_MAYEXEC
binfmt: remove in-tree usage of MAP_EXECUTABLE
mm: ignore MAP_EXECUTABLE in ksys_mmap_pgoff()
Gonzalo Matias Juarez Tello <gmjuareztello@gmail.com>:
mm/mmap.c: logic of find_vma_intersection repeated in __do_munmap
Liam Howlett <liam.howlett@oracle.com>:
mm/mmap: introduce unlock_range() for code cleanup
mm/mmap: use find_vma_intersection() in do_mmap() for overlap
Liu Xiang <liu.xiang@zlingsmart.com>:
mm/memory.c: fix comment of finish_mkwrite_fault()
Liam Howlett <liam.howlett@oracle.com>:
Patch series "mm: Add vma_lookup()", v2:
mm: add vma_lookup(), update find_vma_intersection() comments
drm/i915/selftests: use vma_lookup() in __igt_mmap()
arch/arc/kernel/troubleshoot: use vma_lookup() instead of find_vma()
arch/arm64/kvm: use vma_lookup() instead of find_vma_intersection()
arch/powerpc/kvm/book3s_hv_uvmem: use vma_lookup() instead of find_vma_intersection()
arch/powerpc/kvm/book3s: use vma_lookup() in kvmppc_hv_setup_htab_rma()
arch/mips/kernel/traps: use vma_lookup() instead of find_vma()
arch/m68k/kernel/sys_m68k: use vma_lookup() in sys_cacheflush()
x86/sgx: use vma_lookup() in sgx_encl_find()
virt/kvm: use vma_lookup() instead of find_vma_intersection()
vfio: use vma_lookup() instead of find_vma_intersection()
net/ipv4/tcp: use vma_lookup() in tcp_zerocopy_receive()
drm/amdgpu: use vma_lookup() in amdgpu_ttm_tt_get_user_pages()
media: videobuf2: use vma_lookup() in get_vaddr_frames()
misc/sgi-gru/grufault: use vma_lookup() in gru_find_vma()
kernel/events/uprobes: use vma_lookup() in find_active_uprobe()
lib/test_hmm: use vma_lookup() in dmirror_migrate()
mm/ksm: use vma_lookup() in find_mergeable_vma()
mm/migrate: use vma_lookup() in do_pages_stat_array()
mm/mremap: use vma_lookup() in vma_to_resize()
mm/memory.c: use vma_lookup() in __access_remote_vm()
mm/mempolicy: use vma_lookup() in __access_remote_vm()
Chen Li <chenli@uniontech.com>:
mm: update legacy flush_tlb_* to use vma
Subsystem: mm/mprotect
Peter Collingbourne <pcc@google.com>:
mm: improve mprotect(R|W) efficiency on pages referenced once
Subsystem: mm/bootmem
Souptick Joarder <jrdr.linux@gmail.com>:
h8300: remove unused variable
Subsystem: mm/dma
YueHaibing <yuehaibing@huawei.com>:
mm/dmapool: use DEVICE_ATTR_RO macro
Subsystem: mm/tracing
Vincent Whitchurch <vincent.whitchurch@axis.com>:
mm, tracing: unify PFN format strings
Subsystem: mm/vmalloc
"Uladzislau Rezki (Sony)" <urezki@gmail.com>:
Patch series "vmalloc() vs bulk allocator", v2:
mm/page_alloc: add an alloc_pages_bulk_array_node() helper
mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()
mm/vmalloc: print a warning message first on failure
mm/vmalloc: remove quoted strings split across lines
Uladzislau Rezki <urezki@gmail.com>:
mm/vmalloc: fallback to a single page allocator
Rafael Aquini <aquini@redhat.com>:
mm: vmalloc: add cond_resched() in __vunmap()
Subsystem: mm/kasan
Alexander Potapenko <glider@google.com>:
printk: introduce dump_stack_lvl()
kasan: use dump_stack_lvl(KERN_ERR) to print stacks
David Gow <davidgow@google.com>:
kasan: test: improve failure message in KUNIT_EXPECT_KASAN_FAIL()
Daniel Axtens <dja@axtens.net>:
Patch series "KASAN core changes for ppc64 radix KASAN", v16:
kasan: allow an architecture to disable inline instrumentation
kasan: allow architectures to provide an outline readiness check
mm: define default MAX_PTRS_PER_* in include/pgtable.h
kasan: use MAX_PTRS_PER_* for early shadow tables
Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>:
Patch series "kasan: add memory corruption identification support for hw tag-based kasan", v4:
kasan: rename CONFIG_KASAN_SW_TAGS_IDENTIFY to CONFIG_KASAN_TAGS_IDENTIFY
kasan: integrate the common part of two KASAN tag-based modes
kasan: add memory corruption identification support for hardware tag-based mode
Subsystem: mm/initialization
Jungseung Lee <js07.lee@samsung.com>:
mm: report which part of mem is being freed on initmem case
Subsystem: mm/pagealloc
Mike Rapoport <rppt@linux.ibm.com>:
mm/mmzone.h: simplify is_highmem_idx()
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "Constify struct page arguments":
mm: make __dump_page static
Aaron Tomlin <atomlin@redhat.com>:
mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/debug: factor PagePoisoned out of __dump_page
mm/page_owner: constify dump_page_owner
mm: make compound_head const-preserving
mm: constify get_pfnblock_flags_mask and get_pfnblock_migratetype
mm: constify page_count and page_ref_count
mm: optimise nth_page for contiguous memmap
Heiner Kallweit <hkallweit1@gmail.com>:
mm/page_alloc: switch to pr_debug
Andrii Nakryiko <andrii@kernel.org>:
kbuild: skip per-CPU BTF generation for pahole v1.18-v1.21
Mel Gorman <mgorman@techsingularity.net>:
mm/page_alloc: split per cpu page lists and zone stats
mm/page_alloc: convert per-cpu list protection to local_lock
mm/vmstat: convert NUMA statistics to basic NUMA counters
mm/vmstat: inline NUMA event counter updates
mm/page_alloc: batch the accounting updates in the bulk allocator
mm/page_alloc: reduce duration that IRQs are disabled for VM counters
mm/page_alloc: explicitly acquire the zone lock in __free_pages_ok
mm/page_alloc: avoid conflating IRQs disabled with zone->lock
mm/page_alloc: update PGFREE outside the zone lock in __free_pages_ok
Minchan Kim <minchan@kernel.org>:
mm: page_alloc: dump migrate-failed pages only at -EBUSY
Mel Gorman <mgorman@techsingularity.net>:
Patch series "Calculate pcp->high based on zone sizes and active CPUs", v2:
mm/page_alloc: delete vm.percpu_pagelist_fraction
mm/page_alloc: disassociate the pcp->high from pcp->batch
mm/page_alloc: adjust pcp->high after CPU hotplug events
mm/page_alloc: scale the number of pages that are batch freed
mm/page_alloc: limit the number of pages on PCP lists when reclaim is active
mm/page_alloc: introduce vm.percpu_pagelist_high_fraction
Dong Aisheng <aisheng.dong@nxp.com>:
mm: drop SECTION_SHIFT in code comments
mm/page_alloc: improve memmap_pages dbg msg
Liu Shixin <liushixin2@huawei.com>:
mm/page_alloc: fix counting of managed_pages
Mel Gorman <mgorman@techsingularity.net>:
Patch series "Allow high order pages to be stored on PCP", v2:
mm/page_alloc: move free_the_page
Mike Rapoport <rppt@linux.ibm.com>:
Patch series "Remove DISCONTIGMEM memory model", v3:
alpha: remove DISCONTIGMEM and NUMA
arc: update comment about HIGHMEM implementation
arc: remove support for DISCONTIGMEM
m68k: remove support for DISCONTIGMEM
mm: remove CONFIG_DISCONTIGMEM
arch, mm: remove stale mentions of DISCONIGMEM
docs: remove description of DISCONTIGMEM
mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM
Mel Gorman <mgorman@techsingularity.net>:
mm/page_alloc: allow high-order pages to be stored on the per-cpu lists
mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes
Subsystem: mm/memory-failure
Naoya Horiguchi <naoya.horiguchi@nec.com>:
mm,hwpoison: send SIGBUS with error virutal address
mm,hwpoison: make get_hwpoison_page() call get_any_page()
Documentation/admin-guide/kernel-parameters.txt | 6
Documentation/admin-guide/lockup-watchdogs.rst | 4
Documentation/admin-guide/sysctl/kernel.rst | 10
Documentation/admin-guide/sysctl/vm.rst | 52 -
Documentation/dev-tools/kasan.rst | 9
Documentation/vm/memory-model.rst | 45
arch/alpha/Kconfig | 22
arch/alpha/include/asm/machvec.h | 6
arch/alpha/include/asm/mmzone.h | 100 --
arch/alpha/include/asm/pgtable.h | 4
arch/alpha/include/asm/topology.h | 39
arch/alpha/kernel/core_marvel.c | 53 -
arch/alpha/kernel/core_wildfire.c | 29
arch/alpha/kernel/pci_iommu.c | 29
arch/alpha/kernel/proto.h | 8
arch/alpha/kernel/setup.c | 16
arch/alpha/kernel/sys_marvel.c | 5
arch/alpha/kernel/sys_wildfire.c | 5
arch/alpha/mm/Makefile | 2
arch/alpha/mm/init.c | 3
arch/alpha/mm/numa.c | 223 ----
arch/arc/Kconfig | 13
arch/arc/include/asm/mmzone.h | 40
arch/arc/kernel/troubleshoot.c | 8
arch/arc/mm/init.c | 21
arch/arm/include/asm/tlbflush.h | 13
arch/arm/mm/tlb-v6.S | 2
arch/arm/mm/tlb-v7.S | 2
arch/arm64/Kconfig | 2
arch/arm64/kvm/mmu.c | 2
arch/h8300/kernel/setup.c | 2
arch/ia64/Kconfig | 2
arch/ia64/include/asm/pal.h | 2
arch/ia64/include/asm/spinlock.h | 2
arch/ia64/include/asm/uv/uv_hub.h | 2
arch/ia64/kernel/efi_stub.S | 2
arch/ia64/kernel/mca_drv.c | 2
arch/ia64/kernel/topology.c | 5
arch/ia64/mm/numa.c | 5
arch/m68k/Kconfig.cpu | 10
arch/m68k/include/asm/mmzone.h | 10
arch/m68k/include/asm/page.h | 2
arch/m68k/include/asm/page_mm.h | 35
arch/m68k/include/asm/tlbflush.h | 2
arch/m68k/kernel/sys_m68k.c | 4
arch/m68k/mm/init.c | 20
arch/mips/Kconfig | 2
arch/mips/include/asm/mmzone.h | 8
arch/mips/include/asm/page.h | 2
arch/mips/kernel/traps.c | 4
arch/mips/mm/init.c | 7
arch/nds32/include/asm/memory.h | 6
arch/openrisc/include/asm/tlbflush.h | 2
arch/powerpc/Kconfig | 2
arch/powerpc/include/asm/mmzone.h | 4
arch/powerpc/kernel/setup_64.c | 2
arch/powerpc/kernel/smp.c | 2
arch/powerpc/kexec/core.c | 4
arch/powerpc/kvm/book3s_hv.c | 4
arch/powerpc/kvm/book3s_hv_uvmem.c | 2
arch/powerpc/mm/Makefile | 2
arch/powerpc/mm/mem.c | 4
arch/riscv/Kconfig | 2
arch/s390/Kconfig | 2
arch/s390/include/asm/pgtable.h | 2
arch/sh/include/asm/mmzone.h | 4
arch/sh/kernel/topology.c | 2
arch/sh/mm/Kconfig | 2
arch/sh/mm/init.c | 2
arch/sparc/Kconfig | 2
arch/sparc/include/asm/mmzone.h | 4
arch/sparc/kernel/smp_64.c | 2
arch/sparc/mm/init_64.c | 12
arch/x86/Kconfig | 2
arch/x86/ia32/ia32_aout.c | 4
arch/x86/kernel/cpu/mce/core.c | 13
arch/x86/kernel/cpu/sgx/encl.h | 4
arch/x86/kernel/setup_percpu.c | 6
arch/x86/mm/init_32.c | 4
arch/xtensa/include/asm/page.h | 4
arch/xtensa/include/asm/tlbflush.h | 4
drivers/base/node.c | 18
drivers/block/loop.c | 270 ++++-
drivers/block/loop.h | 15
drivers/dax/device.c | 2
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4
drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 2
drivers/media/common/videobuf2/frame_vector.c | 2
drivers/misc/sgi-gru/grufault.c | 4
drivers/vfio/vfio_iommu_type1.c | 2
drivers/virtio/virtio_balloon.c | 17
fs/adfs/inode.c | 1
fs/affs/file.c | 2
fs/bfs/file.c | 1
fs/binfmt_aout.c | 4
fs/binfmt_elf.c | 2
fs/binfmt_elf_fdpic.c | 11
fs/binfmt_flat.c | 2
fs/block_dev.c | 1
fs/buffer.c | 25
fs/configfs/inode.c | 8
fs/dax.c | 3
fs/ecryptfs/mmap.c | 13
fs/exfat/inode.c | 1
fs/ext2/inode.c | 4
fs/ext4/inode.c | 2
fs/fat/inode.c | 1
fs/fs-writeback.c | 366 +++++---
fs/fuse/dax.c | 3
fs/gfs2/aops.c | 2
fs/gfs2/meta_io.c | 2
fs/hfs/inode.c | 2
fs/hfsplus/inode.c | 2
fs/hpfs/file.c | 1
fs/iomap/buffered-io.c | 27
fs/jfs/inode.c | 1
fs/kernfs/inode.c | 8
fs/libfs.c | 44
fs/minix/inode.c | 1
fs/nilfs2/mdt.c | 1
fs/ntfs/inode.c | 2
fs/ocfs2/aops.c | 4
fs/ocfs2/cluster/heartbeat.c | 7
fs/ocfs2/cluster/nodemanager.c | 2
fs/ocfs2/dlm/dlmmaster.c | 2
fs/ocfs2/filecheck.c | 6
fs/ocfs2/stackglue.c | 8
fs/omfs/file.c | 1
fs/proc/task_mmu.c | 2
fs/ramfs/inode.c | 9
fs/squashfs/block.c | 5
fs/squashfs/squashfs_fs_sb.h | 1
fs/squashfs/super.c | 86 +
fs/sysv/itree.c | 1
fs/udf/file.c | 1
fs/udf/inode.c | 1
fs/ufs/inode.c | 1
fs/xfs/xfs_aops.c | 4
fs/zonefs/super.c | 4
include/asm-generic/memory_model.h | 37
include/asm-generic/pgtable-nop4d.h | 1
include/asm-generic/topology.h | 2
include/kunit/test.h | 5
include/linux/backing-dev-defs.h | 20
include/linux/cpuhotplug.h | 2
include/linux/fs.h | 6
include/linux/gfp.h | 13
include/linux/iomap.h | 1
include/linux/kasan.h | 7
include/linux/kernel.h | 2
include/linux/kthread.h | 2
include/linux/memblock.h | 6
include/linux/memcontrol.h | 60 -
include/linux/mm.h | 53 -
include/linux/mm_types.h | 10
include/linux/mman.h | 2
include/linux/mmdebug.h | 3
include/linux/mmzone.h | 96 +-
include/linux/page-flags.h | 10
include/linux/page_owner.h | 6
include/linux/page_ref.h | 4
include/linux/page_reporting.h | 3
include/linux/pageblock-flags.h | 2
include/linux/pagemap.h | 4
include/linux/pgtable.h | 22
include/linux/printk.h | 5
include/linux/sched/coredump.h | 8
include/linux/slab.h | 59 +
include/linux/swap.h | 19
include/linux/swapops.h | 5
include/linux/vmstat.h | 69 -
include/linux/writeback.h | 1
include/trace/events/cma.h | 4
include/trace/events/filemap.h | 2
include/trace/events/kmem.h | 12
include/trace/events/page_pool.h | 4
include/trace/events/pagemap.h | 4
include/trace/events/vmscan.h | 2
kernel/cgroup/cgroup.c | 1
kernel/crash_core.c | 4
kernel/events/core.c | 2
kernel/events/uprobes.c | 4
kernel/fork.c | 1
kernel/kthread.c | 19
kernel/sysctl.c | 16
kernel/watchdog.c | 12
lib/Kconfig.debug | 15
lib/Kconfig.kasan | 16
lib/Makefile | 1
lib/dump_stack.c | 20
lib/kunit/test.c | 18
lib/slub_kunit.c | 152 +++
lib/test_hmm.c | 5
lib/test_kasan.c | 11
lib/vsprintf.c | 2
mm/Kconfig | 38
mm/backing-dev.c | 66 +
mm/compaction.c | 2
mm/debug.c | 27
mm/debug_vm_pgtable.c | 63 +
mm/dmapool.c | 5
mm/filemap.c | 2
mm/gup.c | 81 +
mm/hugetlb.c | 2
mm/internal.h | 9
mm/kasan/Makefile | 4
mm/kasan/common.c | 6
mm/kasan/generic.c | 3
mm/kasan/hw_tags.c | 22
mm/kasan/init.c | 6
mm/kasan/kasan.h | 12
mm/kasan/report.c | 6
mm/kasan/report_hw_tags.c | 5
mm/kasan/report_sw_tags.c | 45
mm/kasan/report_tags.c | 51 +
mm/kasan/shadow.c | 6
mm/kasan/sw_tags.c | 45
mm/kasan/tags.c | 59 +
mm/kfence/kfence_test.c | 5
mm/kmemleak.c | 18
mm/ksm.c | 6
mm/memblock.c | 8
mm/memcontrol.c | 385 ++++++--
mm/memory-failure.c | 344 +++++--
mm/memory.c | 22
mm/memory_hotplug.c | 6
mm/mempolicy.c | 4
mm/migrate.c | 4
mm/mmap.c | 54 -
mm/mmap_lock.c | 33
mm/mprotect.c | 52 +
mm/mremap.c | 5
mm/nommu.c | 2
mm/page-writeback.c | 89 +
mm/page_alloc.c | 950 +++++++++++++--------
mm/page_ext.c | 2
mm/page_owner.c | 2
mm/page_reporting.c | 19
mm/page_reporting.h | 5
mm/pagewalk.c | 58 +
mm/shmem.c | 18
mm/slab.h | 24
mm/slab_common.c | 60 -
mm/slub.c | 420 +++++----
mm/sparse.c | 2
mm/swap.c | 4
mm/swap_slots.c | 2
mm/swap_state.c | 20
mm/swapfile.c | 177 +--
mm/vmalloc.c | 181 ++--
mm/vmscan.c | 43
mm/vmstat.c | 282 ++----
mm/workingset.c | 2
net/ipv4/tcp.c | 4
scripts/kconfig/streamline_config.pl | 76 -
scripts/link-vmlinux.sh | 4
scripts/spelling.txt | 16
tools/testing/selftests/vm/gup_test.c | 96 +-
tools/vm/page_owner_sort.c | 4
virt/kvm/kvm_main.c | 2
260 files changed, 3989 insertions(+), 2996 deletions(-)
* incoming
@ 2021-06-25 1:38 Andrew Morton
From: Andrew Morton @ 2021-06-25 1:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
24 patches, based on 4a09d388f2ab382f217a764e6a152b3f614246f6.
Subsystems affected by this patch series:
mm/thp
nilfs2
mm/vmalloc
kthread
mm/hugetlb
mm/memory-failure
mm/pagealloc
MAINTAINERS
mailmap
Subsystem: mm/thp
Hugh Dickins <hughd@google.com>:
Patch series "mm: page_vma_mapped_walk() cleanup and THP fixes":
mm: page_vma_mapped_walk(): use page for pvmw->page
mm: page_vma_mapped_walk(): settle PageHuge on entry
mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
mm: page_vma_mapped_walk(): crossing page table boundary
mm: page_vma_mapped_walk(): add a level of indentation
mm: page_vma_mapped_walk(): use goto instead of while (1)
mm: page_vma_mapped_walk(): get vma_address_end() earlier
mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
Subsystem: nilfs2
Pavel Skripkin <paskripkin@gmail.com>:
nilfs2: fix memory leak in nilfs_sysfs_delete_device_group
Subsystem: mm/vmalloc
Claudio Imbrenda <imbrenda@linux.ibm.com>:
Patch series "mm: add vmalloc_no_huge and use it", v4:
mm/vmalloc: add vmalloc_no_huge
KVM: s390: prepare for hugepage vmalloc
Daniel Axtens <dja@axtens.net>:
mm/vmalloc: unbreak kasan vmalloc support
Subsystem: kthread
Petr Mladek <pmladek@suse.com>:
Patch series "kthread_worker: Fix race between kthread_mod_delayed_work():
kthread_worker: split code for canceling the delayed work timer
kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
Subsystem: mm/hugetlb
Hugh Dickins <hughd@google.com>:
mm, futex: fix shared futex pgoff on shmem huge page
Subsystem: mm/memory-failure
Tony Luck <tony.luck@intel.com>:
Patch series "mm,hwpoison: fix sending SIGBUS for Action Required MCE", v5:
mm/memory-failure: use a mutex to avoid memory_failure() races
Aili Yao <yaoaili@kingsoft.com>:
mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned
Naoya Horiguchi <naoya.horiguchi@nec.com>:
mm/hwpoison: do not lock page again when me_huge_page() successfully recovers
Subsystem: mm/pagealloc
Rasmus Villemoes <linux@rasmusvillemoes.dk>:
mm/page_alloc: __alloc_pages_bulk(): do bounds check before accessing array
Mel Gorman <mgorman@techsingularity.net>:
mm/page_alloc: do bulk array bounds check after checking populated elements
Subsystem: MAINTAINERS
Marek Behún <kabel@kernel.org>:
MAINTAINERS: fix Marek's identity again
Subsystem: mailmap
Marek Behún <kabel@kernel.org>:
mailmap: add Marek's other e-mail address and identity without diacritics
.mailmap | 2
MAINTAINERS | 4
arch/s390/kvm/pv.c | 7 +
fs/nilfs2/sysfs.c | 1
include/linux/hugetlb.h | 16 ---
include/linux/pagemap.h | 13 +-
include/linux/vmalloc.h | 1
kernel/futex.c | 3
kernel/kthread.c | 81 ++++++++++------
mm/hugetlb.c | 5 -
mm/memory-failure.c | 83 +++++++++++------
mm/page_alloc.c | 6 +
mm/page_vma_mapped.c | 233 +++++++++++++++++++++++++++---------------------
mm/vmalloc.c | 41 ++++++--
14 files changed, 297 insertions(+), 199 deletions(-)
* incoming
@ 2021-06-16 1:22 Andrew Morton
From: Andrew Morton @ 2021-06-16 1:22 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
18 patches, based on 94f0b2d4a1d0c52035aef425da5e022bd2cb1c71.
Subsystems affected by this patch series:
mm/memory-failure
mm/swap
mm/slub
mm/hugetlb
mm/memory-failure
coredump
mm/slub
mm/thp
mm/sparsemem
Subsystem: mm/memory-failure
Naoya Horiguchi <naoya.horiguchi@nec.com>:
mm,hwpoison: fix race with hugetlb page allocation
Subsystem: mm/swap
Peter Xu <peterx@redhat.com>:
mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare
Subsystem: mm/slub
Kees Cook <keescook@chromium.org>:
Patch series "Actually fix freelist pointer vs redzoning", v4:
mm/slub: clarify verification reporting
mm/slub: fix redzoning for small allocations
mm/slub: actually fix freelist pointer vs redzoning
Subsystem: mm/hugetlb
Mike Kravetz <mike.kravetz@oracle.com>:
mm/hugetlb: expand restore_reserve_on_error functionality
Subsystem: mm/memory-failure
yangerkun <yangerkun@huawei.com>:
mm/memory-failure: make sure wait for page writeback in memory_failure
Subsystem: coredump
Pingfan Liu <kernelfans@gmail.com>:
crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo
Subsystem: mm/slub
Andrew Morton <akpm@linux-foundation.org>:
mm/slub.c: include swab.h
Subsystem: mm/thp
Xu Yu <xuyu@linux.alibaba.com>:
mm, thp: use head page in __migration_entry_wait()
Hugh Dickins <hughd@google.com>:
Patch series "mm/thp: fix THP splitting unmap BUGs and related", v10:
mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
mm/thp: make is_huge_zero_pmd() safe and quicker
mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
mm/thp: fix vma_address() if virtual address below file offset
Jue Wang <juew@google.com>:
mm/thp: fix page_address_in_vma() on file THP tails
Hugh Dickins <hughd@google.com>:
mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
Yang Shi <shy828301@gmail.com>:
mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
Subsystem: mm/sparsemem
Miles Chen <miles.chen@mediatek.com>:
mm/sparse: fix check_usemap_section_nr warnings
Documentation/vm/slub.rst | 10 +--
fs/hugetlbfs/inode.c | 1
include/linux/huge_mm.h | 8 ++
include/linux/hugetlb.h | 8 ++
include/linux/mm.h | 3 +
include/linux/rmap.h | 1
include/linux/swapops.h | 15 +++--
kernel/crash_core.c | 1
mm/huge_memory.c | 58 ++++++++++---------
mm/hugetlb.c | 137 +++++++++++++++++++++++++++++++++++++---------
mm/internal.h | 51 ++++++++++++-----
mm/memory-failure.c | 36 +++++++++++-
mm/memory.c | 41 +++++++++++++
mm/migrate.c | 1
mm/page_vma_mapped.c | 27 +++++----
mm/pgtable-generic.c | 5 -
mm/rmap.c | 41 +++++++++----
mm/slab_common.c | 3 -
mm/slub.c | 37 +++++-------
mm/sparse.c | 13 +++-
mm/swapfile.c | 2
mm/truncate.c | 43 ++++++--------
22 files changed, 388 insertions(+), 154 deletions(-)
* incoming
@ 2021-06-05 3:00 Andrew Morton
From: Andrew Morton @ 2021-06-05 3:00 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
13 patches, based on 16f0596fc1d78a1f3ae4628cff962bb297dc908c.
Subsystems affected by this patch series:
mips
mm/kfence
init
mm/debug
mm/pagealloc
mm/memory-hotplug
mm/hugetlb
proc
mm/kasan
mm/hugetlb
lib
ocfs2
mailmap
Subsystem: mips
Thomas Bogendoerfer <tsbogend@alpha.franken.de>:
Revert "MIPS: make userspace mapping young by default"
Subsystem: mm/kfence
Marco Elver <elver@google.com>:
kfence: use TASK_IDLE when awaiting allocation
Subsystem: init
Mark Rutland <mark.rutland@arm.com>:
pid: take a reference when initializing `cad_pid`
Subsystem: mm/debug
Gerald Schaefer <gerald.schaefer@linux.ibm.com>:
mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
Subsystem: mm/pagealloc
Ding Hui <dinghui@sangfor.com.cn>:
mm/page_alloc: fix counting of free pages after take off from buddy
Subsystem: mm/memory-hotplug
David Hildenbrand <david@redhat.com>:
drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64
Subsystem: mm/hugetlb
Naoya Horiguchi <naoya.horiguchi@nec.com>:
hugetlb: pass head page to remove_hugetlb_page()
Subsystem: proc
David Matlack <dmatlack@google.com>:
proc: add .gitignore for proc-subset-pid selftest
Subsystem: mm/kasan
Yu Kuai <yukuai3@huawei.com>:
mm/kasan/init.c: fix doc warning
Subsystem: mm/hugetlb
Mina Almasry <almasrymina@google.com>:
mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
Subsystem: lib
YueHaibing <yuehaibing@huawei.com>:
lib: crc64: fix kernel-doc warning
Subsystem: ocfs2
Junxiao Bi <junxiao.bi@oracle.com>:
ocfs2: fix data corruption by fallocate
Subsystem: mailmap
Michel Lespinasse <michel@lespinasse.org>:
mailmap: use private address for Michel Lespinasse
.mailmap | 3 +
arch/mips/mm/cache.c | 30 ++++++++---------
drivers/base/memory.c | 6 +--
fs/ocfs2/file.c | 55 +++++++++++++++++++++++++++++---
include/linux/pgtable.h | 8 ++++
init/main.c | 2 -
lib/crc64.c | 2 -
mm/debug_vm_pgtable.c | 4 +-
mm/hugetlb.c | 16 +++++++--
mm/kasan/init.c | 4 +-
mm/kfence/core.c | 6 +--
mm/memory.c | 4 ++
mm/page_alloc.c | 2 +
tools/testing/selftests/proc/.gitignore | 1
14 files changed, 107 insertions(+), 36 deletions(-)
* incoming
@ 2021-05-23 0:41 Andrew Morton
From: Andrew Morton @ 2021-05-23 0:41 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
10 patches, based on 4ff2473bdb4cf2bb7d208ccf4418d3d7e6b1652c.
Subsystems affected by this patch series:
mm/pagealloc
mm/gup
ipc
selftests
mm/kasan
kernel/watchdog
bitmap
procfs
lib
mm/userfaultfd
Subsystem: mm/pagealloc
Arnd Bergmann <arnd@arndb.de>:
mm/shuffle: fix section mismatch warning
Subsystem: mm/gup
Michal Hocko <mhocko@suse.com>:
Revert "mm/gup: check page posion status for coredump."
Subsystem: ipc
Varad Gautam <varad.gautam@suse.com>:
ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry
Subsystem: selftests
Yang Yingliang <yangyingliang@huawei.com>:
tools/testing/selftests/exec: fix link error
Subsystem: mm/kasan
Alexander Potapenko <glider@google.com>:
kasan: slab: always reset the tag in get_freepointer_safe()
Subsystem: kernel/watchdog
Petr Mladek <pmladek@suse.com>:
watchdog: reliable handling of timestamps
Subsystem: bitmap
Rikard Falkeborn <rikard.falkeborn@gmail.com>:
linux/bits.h: fix compilation error with GENMASK
Subsystem: procfs
Alexey Dobriyan <adobriyan@gmail.com>:
proc: remove Alexey from MAINTAINERS
Subsystem: lib
Zhen Lei <thunder.leizhen@huawei.com>:
lib: kunit: suppress a compilation warning of frame size
Subsystem: mm/userfaultfd
Mike Kravetz <mike.kravetz@oracle.com>:
userfaultfd: hugetlbfs: fix new flag usage in error path
MAINTAINERS | 1 -
fs/hugetlbfs/inode.c | 2 +-
include/linux/bits.h | 2 +-
include/linux/const.h | 8 ++++++++
include/linux/minmax.h | 10 ++--------
ipc/mqueue.c | 6 ++++--
ipc/msg.c | 6 ++++--
ipc/sem.c | 6 ++++--
kernel/watchdog.c | 34 ++++++++++++++++++++--------------
lib/Makefile | 1 +
mm/gup.c | 4 ----
mm/internal.h | 20 --------------------
mm/shuffle.h | 4 ++--
mm/slub.c | 1 +
mm/userfaultfd.c | 28 ++++++++++++++--------------
tools/include/linux/bits.h | 2 +-
tools/include/linux/const.h | 8 ++++++++
tools/testing/selftests/exec/Makefile | 6 +++---
18 files changed, 74 insertions(+), 75 deletions(-)
* incoming
@ 2021-05-15 0:26 Andrew Morton
From: Andrew Morton @ 2021-05-15 0:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
13 patches, based on bd3c9cdb21a2674dd0db70199df884828e37abd4.
Subsystems affected by this patch series:
mm/hugetlb
mm/slub
resource
squashfs
mm/userfaultfd
mm/ksm
mm/pagealloc
mm/kasan
mm/pagemap
hfsplus
modprobe
mm/ioremap
Subsystem: mm/hugetlb
Peter Xu <peterx@redhat.com>:
Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2:
mm/hugetlb: fix F_SEAL_FUTURE_WRITE
mm/hugetlb: fix cow where page writtable in child
Subsystem: mm/slub
Vlastimil Babka <vbabka@suse.cz>:
mm, slub: move slub_debug static key enabling outside slab_mutex
Subsystem: resource
Alistair Popple <apopple@nvidia.com>:
kernel/resource: fix return code check in __request_free_mem_region
Subsystem: squashfs
Phillip Lougher <phillip@squashfs.org.uk>:
squashfs: fix divide error in calculate_skip()
Subsystem: mm/userfaultfd
Axel Rasmussen <axelrasmussen@google.com>:
userfaultfd: release page in error path to avoid BUG_ON
Subsystem: mm/ksm
Hugh Dickins <hughd@google.com>:
ksm: revert "use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()"
Subsystem: mm/pagealloc
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: fix struct page layout on 32-bit systems
Subsystem: mm/kasan
Peter Collingbourne <pcc@google.com>:
kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled
Subsystem: mm/pagemap
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/filemap: fix readahead return types
Subsystem: hfsplus
Jouni Roivas <jouni.roivas@tuxera.com>:
hfsplus: prevent corruption in shrinking truncate
Subsystem: modprobe
Rasmus Villemoes <linux@rasmusvillemoes.dk>:
docs: admin-guide: update description for kernel.modprobe sysctl
Subsystem: mm/ioremap
Christophe Leroy <christophe.leroy@csgroup.eu>:
mm/ioremap: fix iomap_max_page_shift
Documentation/admin-guide/sysctl/kernel.rst | 9 ++++---
fs/hfsplus/extents.c | 7 +++--
fs/hugetlbfs/inode.c | 5 ++++
fs/iomap/buffered-io.c | 4 +--
fs/squashfs/file.c | 6 ++--
include/linux/mm.h | 32 ++++++++++++++++++++++++++
include/linux/mm_types.h | 4 +--
include/linux/pagemap.h | 6 ++--
include/net/page_pool.h | 12 +++++++++
kernel/resource.c | 2 -
lib/test_kasan.c | 29 ++++++++++++++++++-----
mm/hugetlb.c | 1
mm/ioremap.c | 6 ++--
mm/ksm.c | 3 +-
mm/shmem.c | 34 ++++++++++++----------------
mm/slab_common.c | 10 ++++++++
mm/slub.c | 9 -------
net/core/page_pool.c | 12 +++++----
18 files changed, 129 insertions(+), 62 deletions(-)
* Re: incoming
2021-05-07 1:01 incoming Andrew Morton
@ 2021-05-07 7:12 ` Linus Torvalds
From: Linus Torvalds @ 2021-05-07 7:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: mm-commits, Linux-MM
On Thu, May 6, 2021 at 6:01 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> I've been wobbly about the secretmem patches due to doubts about
> whether the feature is sufficiently useful to justify inclusion, but
> developers are now weighing in with helpful information and I've asked Mike
> for an extensively updated [0/n] changelog. This will take a few days
> to play out so it is possible that I will prevail upon you for a post-rc1
> merge.
Oh, much too late for this release by now.
> If that's a problem, there's always 5.13-rc1.
5.13-rc1 is two days from now, it would be for 5.14-rc1.. How time -
and version numbers - fly.
Linus
* incoming
@ 2021-05-07 1:01 Andrew Morton
2021-05-07 7:12 ` incoming Linus Torvalds
From: Andrew Morton @ 2021-05-07 1:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
This is everything else from -mm for this merge window, with the
possible exception of Mike Rapoport's "secretmem" syscall patch series
(https://lkml.kernel.org/r/20210303162209.8609-1-rppt@kernel.org).
I've been wobbly about the secretmem patches due to doubts about
whether the feature is sufficiently useful to justify inclusion, but
developers are now weighing in with helpful information and I've asked Mike
for an extensively updated [0/n] changelog. This will take a few days
to play out so it is possible that I will prevail upon you for a post-rc1
merge. If that's a problem, there's always 5.13-rc1.
91 patches, based on 8ca5297e7e38f2dc8c753d33a5092e7be181fff0, plus
previously sent patches.
Thanks.
Subsystems affected by this patch series:
alpha
procfs
sysctl
misc
core-kernel
bitmap
lib
compat
checkpatch
epoll
isofs
nilfs2
hpfs
exit
fork
kexec
gcov
panic
delayacct
gdb
resource
selftests
async
initramfs
ipc
mm/cleanups
drivers/char
mm/slub
spelling
Subsystem: alpha
Randy Dunlap <rdunlap@infradead.org>:
alpha: eliminate old-style function definitions
alpha: csum_partial_copy.c: add function prototypes from <net/checksum.h>
Subsystem: procfs
Colin Ian King <colin.king@canonical.com>:
fs/proc/generic.c: fix incorrect pde_is_permanent check
Alexey Dobriyan <adobriyan@gmail.com>:
proc: save LOC in __xlate_proc_name()
proc: mandate ->proc_lseek in "struct proc_ops"
proc: delete redundant subset=pid check
selftests: proc: test subset=pid
Subsystem: sysctl
zhouchuangao <zhouchuangao@vivo.com>:
proc/sysctl: fix function name error in comments
Subsystem: misc
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
include: remove pagemap.h from blkdev.h
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
kernel.h: drop inclusion in bitmap.h
Wan Jiabing <wanjiabing@vivo.com>:
linux/profile.h: remove unnecessary declaration
Subsystem: core-kernel
Rasmus Villemoes <linux@rasmusvillemoes.dk>:
kernel/async.c: fix pr_debug statement
kernel/cred.c: make init_groups static
Subsystem: bitmap
Yury Norov <yury.norov@gmail.com>:
Patch series "lib/find_bit: fast path for small bitmaps", v6:
tools: disable -Wno-type-limits
tools: bitmap: sync function declarations with the kernel
tools: sync BITMAP_LAST_WORD_MASK() macro with the kernel
arch: rearrange headers inclusion order in asm/bitops for m68k, sh and h8300
lib: extend the scope of small_const_nbits() macro
tools: sync small_const_nbits() macro with the kernel
lib: inline _find_next_bit() wrappers
tools: sync find_next_bit implementation
lib: add fast path for find_next_*_bit()
lib: add fast path for find_first_*_bit() and find_last_bit()
tools: sync lib/find_bit implementation
MAINTAINERS: add entry for the bitmap API
Subsystem: lib
Bhaskar Chowdhury <unixbhaskar@gmail.com>:
lib/bch.c: fix a typo in the file bch.c
Wang Qing <wangqing@vivo.com>:
lib: fix inconsistent indenting in process_bit1()
ToastC <mrtoastcheng@gmail.com>:
lib/list_sort.c: fix typo in function description
Bhaskar Chowdhury <unixbhaskar@gmail.com>:
lib/genalloc.c: Fix a typo
Richard Fitzgerald <rf@opensource.cirrus.com>:
lib: crc8: pointer to data block should be const
Zqiang <qiang.zhang@windriver.com>:
lib: stackdepot: turn depot_lock spinlock to raw_spinlock
Alex Shi <alexs@kernel.org>:
lib/percpu_counter: tame kernel-doc compile warning
lib/genalloc: add parameter description to fix doc compile warning
Randy Dunlap <rdunlap@infradead.org>:
lib: parser: clean up kernel-doc
Subsystem: compat
Masahiro Yamada <masahiroy@kernel.org>:
include/linux/compat.h: remove unneeded declaration from COMPAT_SYSCALL_DEFINEx()
Subsystem: checkpatch
Joe Perches <joe@perches.com>:
checkpatch: warn when missing newline in return sysfs_emit() formats
Vincent Mailhol <mailhol.vincent@wanadoo.fr>:
checkpatch: exclude four preprocessor sub-expressions from MACRO_ARG_REUSE
Christophe JAILLET <christophe.jaillet@wanadoo.fr>:
checkpatch: improve ALLOC_ARRAY_ARGS test
Subsystem: epoll
Davidlohr Bueso <dave@stgolabs.net>:
Patch series "fs/epoll: restore user-visible behavior upon event ready":
kselftest: introduce new epoll test case
fs/epoll: restore waking from ep_done_scan()
Subsystem: isofs
"Gustavo A. R. Silva" <gustavoars@kernel.org>:
isofs: fix fall-through warnings for Clang
Subsystem: nilfs2
Liu xuzhi <liu.xuzhi@zte.com.cn>:
fs/nilfs2: fix misspellings using codespell tool
Lu Jialin <lujialin4@huawei.com>:
nilfs2: fix typos in comments
Subsystem: hpfs
"Gustavo A. R. Silva" <gustavoars@kernel.org>:
hpfs: replace one-element array with flexible-array member
Subsystem: exit
Jim Newsome <jnewsome@torproject.org>:
do_wait: make PIDTYPE_PID case O(1) instead of O(n)
Subsystem: fork
Rolf Eike Beer <eb@emlix.com>:
kernel/fork.c: simplify copy_mm()
Xiaofeng Cao <cxfcosmos@gmail.com>:
kernel/fork.c: fix typos
Subsystem: kexec
Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>:
kernel/crash_core: add crashkernel=auto for vmcore creation
Joe LeVeque <jolevequ@microsoft.com>:
kexec: Add kexec reboot string
Jia-Ju Bai <baijiaju1990@gmail.com>:
kernel: kexec_file: fix error return code of kexec_calculate_store_digests()
Pavel Tatashin <pasha.tatashin@soleen.com>:
kexec: dump kmessage before machine_kexec
Subsystem: gcov
Johannes Berg <johannes.berg@intel.com>:
gcov: combine common code
gcov: simplify buffer allocation
gcov: use kvmalloc()
Nick Desaulniers <ndesaulniers@google.com>:
gcov: clang: drop support for clang-10 and older
Subsystem: panic
He Ying <heying24@huawei.com>:
smp: kernel/panic.c - silence warnings
Subsystem: delayacct
Yafang Shao <laoar.shao@gmail.com>:
delayacct: clear right task's flag after blkio completes
Subsystem: gdb
Johannes Berg <johannes.berg@intel.com>:
gdb: lx-symbols: store the abspath()
Barry Song <song.bao.hua@hisilicon.com>:
Patch series "scripts/gdb: clarify the platforms supporting lx_current and add arm64 support", v2:
scripts/gdb: document lx_current is only supported by x86
scripts/gdb: add lx_current support for arm64
Subsystem: resource
David Hildenbrand <david@redhat.com>:
Patch series "kernel/resource: make walk_system_ram_res() and walk_mem_res() search the whole tree", v2:
kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources
kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources
kernel/resource: remove first_lvl / siblings_only logic
Alistair Popple <apopple@nvidia.com>:
kernel/resource: allow region_intersects users to hold resource_lock
kernel/resource: refactor __request_region to allow external locking
kernel/resource: fix locking in request_free_mem_region
Subsystem: selftests
Zhang Yunkai <zhang.yunkai@zte.com.cn>:
selftests: remove duplicate include
Subsystem: async
Rasmus Villemoes <linux@rasmusvillemoes.dk>:
kernel/async.c: stop guarding pr_debug() statements
kernel/async.c: remove async_unregister_domain()
Subsystem: initramfs
Rasmus Villemoes <linux@rasmusvillemoes.dk>:
Patch series "background initramfs unpacking, and CONFIG_MODPROBE_PATH", v3:
init/initramfs.c: do unpacking asynchronously
modules: add CONFIG_MODPROBE_PATH
Subsystem: ipc
Bhaskar Chowdhury <unixbhaskar@gmail.com>:
ipc/sem.c: mundane typo fixes
Subsystem: mm/cleanups
Shijie Luo <luoshijie1@huawei.com>:
mm: fix some typos and code style problems
Subsystem: drivers/char
David Hildenbrand <david@redhat.com>:
Patch series "drivers/char: remove /dev/kmem for good":
drivers/char: remove /dev/kmem for good
mm: remove xlate_dev_kmem_ptr()
mm/vmalloc: remove vwrite()
Subsystem: mm/slub
Maninder Singh <maninder1.s@samsung.com>:
arm: print alloc free paths for address in registers
Subsystem: spelling
Drew Fustini <drew@beagleboard.org>:
scripts/spelling.txt: add "overlfow"
zuoqilin <zuoqilin@yulong.com>:
scripts/spelling.txt: Add "diabled" typo
Drew Fustini <drew@beagleboard.org>:
scripts/spelling.txt: add "overflw"
Colin Ian King <colin.king@canonical.com>:
mm/slab.c: fix spelling mistake "disired" -> "desired"
Bhaskar Chowdhury <unixbhaskar@gmail.com>:
include/linux/pgtable.h: few spelling fixes
zhouchuangao <zhouchuangao@vivo.com>:
kernel/umh.c: fix some spelling mistakes
Xiaofeng Cao <cxfcosmos@gmail.com>:
kernel/user_namespace.c: fix typos
Bhaskar Chowdhury <unixbhaskar@gmail.com>:
kernel/up.c: fix typo
Xiaofeng Cao <caoxiaofeng@yulong.com>:
kernel/sys.c: fix typo
dingsenjie <dingsenjie@yulong.com>:
fs: fat: fix spelling typo of values
Bhaskar Chowdhury <unixbhaskar@gmail.com>:
ipc/sem.c: spelling fix
Masahiro Yamada <masahiroy@kernel.org>:
treewide: remove editor modelines and cruft
Ingo Molnar <mingo@kernel.org>:
mm: fix typos in comments
Lu Jialin <lujialin4@huawei.com>:
mm: fix typos in comments
Documentation/admin-guide/devices.txt | 2
Documentation/admin-guide/kdump/kdump.rst | 3
Documentation/admin-guide/kernel-parameters.txt | 18
Documentation/dev-tools/gdb-kernel-debugging.rst | 4
MAINTAINERS | 16
arch/Kconfig | 20
arch/alpha/include/asm/io.h | 5
arch/alpha/kernel/pc873xx.c | 4
arch/alpha/lib/csum_partial_copy.c | 1
arch/arm/configs/dove_defconfig | 1
arch/arm/configs/magician_defconfig | 1
arch/arm/configs/moxart_defconfig | 1
arch/arm/configs/mps2_defconfig | 1
arch/arm/configs/mvebu_v5_defconfig | 1
arch/arm/configs/xcep_defconfig | 1
arch/arm/include/asm/bug.h | 1
arch/arm/include/asm/io.h | 5
arch/arm/kernel/process.c | 11
arch/arm/kernel/traps.c | 1
arch/h8300/include/asm/bitops.h | 8
arch/hexagon/configs/comet_defconfig | 1
arch/hexagon/include/asm/io.h | 1
arch/ia64/include/asm/io.h | 1
arch/ia64/include/asm/uaccess.h | 18
arch/m68k/atari/time.c | 7
arch/m68k/configs/amcore_defconfig | 1
arch/m68k/include/asm/bitops.h | 6
arch/m68k/include/asm/io_mm.h | 5
arch/mips/include/asm/io.h | 5
arch/openrisc/configs/or1ksim_defconfig | 1
arch/parisc/include/asm/io.h | 5
arch/parisc/include/asm/pdc_chassis.h | 1
arch/powerpc/include/asm/io.h | 5
arch/s390/include/asm/io.h | 5
arch/sh/configs/edosk7705_defconfig | 1
arch/sh/configs/se7206_defconfig | 1
arch/sh/configs/sh2007_defconfig | 1
arch/sh/configs/sh7724_generic_defconfig | 1
arch/sh/configs/sh7770_generic_defconfig | 1
arch/sh/configs/sh7785lcr_32bit_defconfig | 1
arch/sh/include/asm/bitops.h | 5
arch/sh/include/asm/io.h | 5
arch/sparc/configs/sparc64_defconfig | 1
arch/sparc/include/asm/io_64.h | 5
arch/um/drivers/cow.h | 7
arch/xtensa/configs/xip_kc705_defconfig | 1
block/blk-settings.c | 1
drivers/auxdisplay/panel.c | 7
drivers/base/firmware_loader/main.c | 2
drivers/block/brd.c | 1
drivers/block/loop.c | 1
drivers/char/Kconfig | 10
drivers/char/mem.c | 231 --------
drivers/gpu/drm/qxl/qxl_drv.c | 1
drivers/isdn/capi/kcapi_proc.c | 1
drivers/md/bcache/super.c | 1
drivers/media/usb/pwc/pwc-uncompress.c | 3
drivers/net/ethernet/adaptec/starfire.c | 8
drivers/net/ethernet/amd/atarilance.c | 8
drivers/net/ethernet/amd/pcnet32.c | 7
drivers/net/wireless/intersil/hostap/hostap_proc.c | 1
drivers/net/wireless/intersil/orinoco/orinoco_nortel.c | 8
drivers/net/wireless/intersil/orinoco/orinoco_pci.c | 8
drivers/net/wireless/intersil/orinoco/orinoco_plx.c | 8
drivers/net/wireless/intersil/orinoco/orinoco_tmd.c | 8
drivers/nvdimm/btt.c | 1
drivers/nvdimm/pmem.c | 1
drivers/parport/parport_ip32.c | 12
drivers/platform/x86/dell/dell_rbu.c | 3
drivers/scsi/53c700.c | 1
drivers/scsi/53c700.h | 1
drivers/scsi/ch.c | 6
drivers/scsi/esas2r/esas2r_main.c | 1
drivers/scsi/ips.c | 20
drivers/scsi/ips.h | 20
drivers/scsi/lasi700.c | 1
drivers/scsi/megaraid/mbox_defs.h | 2
drivers/scsi/megaraid/mega_common.h | 2
drivers/scsi/megaraid/megaraid_mbox.c | 2
drivers/scsi/megaraid/megaraid_mbox.h | 2
drivers/scsi/qla1280.c | 12
drivers/scsi/scsicam.c | 1
drivers/scsi/sni_53c710.c | 1
drivers/video/fbdev/matrox/matroxfb_base.c | 9
drivers/video/fbdev/vga16fb.c | 10
fs/configfs/configfs_internal.h | 4
fs/configfs/dir.c | 4
fs/configfs/file.c | 4
fs/configfs/inode.c | 4
fs/configfs/item.c | 4
fs/configfs/mount.c | 4
fs/configfs/symlink.c | 4
fs/eventpoll.c | 6
fs/fat/fatent.c | 2
fs/hpfs/hpfs.h | 3
fs/isofs/rock.c | 1
fs/nfs/dir.c | 7
fs/nfs/nfs4proc.c | 6
fs/nfs/nfs4renewd.c | 6
fs/nfs/nfs4state.c | 6
fs/nfs/nfs4xdr.c | 6
fs/nfsd/nfs4proc.c | 6
fs/nfsd/nfs4xdr.c | 6
fs/nfsd/xdr4.h | 6
fs/nilfs2/cpfile.c | 2
fs/nilfs2/ioctl.c | 4
fs/nilfs2/segment.c | 4
fs/nilfs2/the_nilfs.c | 2
fs/ocfs2/acl.c | 4
fs/ocfs2/acl.h | 4
fs/ocfs2/alloc.c | 4
fs/ocfs2/alloc.h | 4
fs/ocfs2/aops.c | 4
fs/ocfs2/aops.h | 4
fs/ocfs2/blockcheck.c | 4
fs/ocfs2/blockcheck.h | 4
fs/ocfs2/buffer_head_io.c | 4
fs/ocfs2/buffer_head_io.h | 4
fs/ocfs2/cluster/heartbeat.c | 4
fs/ocfs2/cluster/heartbeat.h | 4
fs/ocfs2/cluster/masklog.c | 4
fs/ocfs2/cluster/masklog.h | 4
fs/ocfs2/cluster/netdebug.c | 4
fs/ocfs2/cluster/nodemanager.c | 4
fs/ocfs2/cluster/nodemanager.h | 4
fs/ocfs2/cluster/ocfs2_heartbeat.h | 4
fs/ocfs2/cluster/ocfs2_nodemanager.h | 4
fs/ocfs2/cluster/quorum.c | 4
fs/ocfs2/cluster/quorum.h | 4
fs/ocfs2/cluster/sys.c | 4
fs/ocfs2/cluster/sys.h | 4
fs/ocfs2/cluster/tcp.c | 4
fs/ocfs2/cluster/tcp.h | 4
fs/ocfs2/cluster/tcp_internal.h | 4
fs/ocfs2/dcache.c | 4
fs/ocfs2/dcache.h | 4
fs/ocfs2/dir.c | 4
fs/ocfs2/dir.h | 4
fs/ocfs2/dlm/dlmapi.h | 4
fs/ocfs2/dlm/dlmast.c | 4
fs/ocfs2/dlm/dlmcommon.h | 4
fs/ocfs2/dlm/dlmconvert.c | 4
fs/ocfs2/dlm/dlmconvert.h | 4
fs/ocfs2/dlm/dlmdebug.c | 4
fs/ocfs2/dlm/dlmdebug.h | 4
fs/ocfs2/dlm/dlmdomain.c | 4
fs/ocfs2/dlm/dlmdomain.h | 4
fs/ocfs2/dlm/dlmlock.c | 4
fs/ocfs2/dlm/dlmmaster.c | 4
fs/ocfs2/dlm/dlmrecovery.c | 4
fs/ocfs2/dlm/dlmthread.c | 4
fs/ocfs2/dlm/dlmunlock.c | 4
fs/ocfs2/dlmfs/dlmfs.c | 4
fs/ocfs2/dlmfs/userdlm.c | 4
fs/ocfs2/dlmfs/userdlm.h | 4
fs/ocfs2/dlmglue.c | 4
fs/ocfs2/dlmglue.h | 4
fs/ocfs2/export.c | 4
fs/ocfs2/export.h | 4
fs/ocfs2/extent_map.c | 4
fs/ocfs2/extent_map.h | 4
fs/ocfs2/file.c | 4
fs/ocfs2/file.h | 4
fs/ocfs2/filecheck.c | 4
fs/ocfs2/filecheck.h | 4
fs/ocfs2/heartbeat.c | 4
fs/ocfs2/heartbeat.h | 4
fs/ocfs2/inode.c | 4
fs/ocfs2/inode.h | 4
fs/ocfs2/journal.c | 4
fs/ocfs2/journal.h | 4
fs/ocfs2/localalloc.c | 4
fs/ocfs2/localalloc.h | 4
fs/ocfs2/locks.c | 4
fs/ocfs2/locks.h | 4
fs/ocfs2/mmap.c | 4
fs/ocfs2/move_extents.c | 4
fs/ocfs2/move_extents.h | 4
fs/ocfs2/namei.c | 4
fs/ocfs2/namei.h | 4
fs/ocfs2/ocfs1_fs_compat.h | 4
fs/ocfs2/ocfs2.h | 4
fs/ocfs2/ocfs2_fs.h | 4
fs/ocfs2/ocfs2_ioctl.h | 4
fs/ocfs2/ocfs2_lockid.h | 4
fs/ocfs2/ocfs2_lockingver.h | 4
fs/ocfs2/refcounttree.c | 4
fs/ocfs2/refcounttree.h | 4
fs/ocfs2/reservations.c | 4
fs/ocfs2/reservations.h | 4
fs/ocfs2/resize.c | 4
fs/ocfs2/resize.h | 4
fs/ocfs2/slot_map.c | 4
fs/ocfs2/slot_map.h | 4
fs/ocfs2/stack_o2cb.c | 4
fs/ocfs2/stack_user.c | 4
fs/ocfs2/stackglue.c | 4
fs/ocfs2/stackglue.h | 4
fs/ocfs2/suballoc.c | 4
fs/ocfs2/suballoc.h | 4
fs/ocfs2/super.c | 4
fs/ocfs2/super.h | 4
fs/ocfs2/symlink.c | 4
fs/ocfs2/symlink.h | 4
fs/ocfs2/sysfile.c | 4
fs/ocfs2/sysfile.h | 4
fs/ocfs2/uptodate.c | 4
fs/ocfs2/uptodate.h | 4
fs/ocfs2/xattr.c | 4
fs/ocfs2/xattr.h | 4
fs/proc/generic.c | 13
fs/proc/inode.c | 18
fs/proc/proc_sysctl.c | 2
fs/reiserfs/procfs.c | 10
include/asm-generic/bitops/find.h | 108 +++
include/asm-generic/bitops/le.h | 38 +
include/asm-generic/bitsperlong.h | 12
include/asm-generic/io.h | 11
include/linux/align.h | 15
include/linux/async.h | 1
include/linux/bitmap.h | 11
include/linux/bitops.h | 12
include/linux/blkdev.h | 1
include/linux/compat.h | 1
include/linux/configfs.h | 4
include/linux/crc8.h | 2
include/linux/cred.h | 1
include/linux/delayacct.h | 20
include/linux/fs.h | 2
include/linux/genl_magic_func.h | 1
include/linux/genl_magic_struct.h | 1
include/linux/gfp.h | 2
include/linux/init_task.h | 1
include/linux/initrd.h | 2
include/linux/kernel.h | 9
include/linux/mm.h | 2
include/linux/mmzone.h | 2
include/linux/pgtable.h | 10
include/linux/proc_fs.h | 1
include/linux/profile.h | 3
include/linux/smp.h | 8
include/linux/swap.h | 1
include/linux/vmalloc.h | 7
include/uapi/linux/if_bonding.h | 11
include/uapi/linux/nfs4.h | 6
include/xen/interface/elfnote.h | 10
include/xen/interface/hvm/hvm_vcpu.h | 10
include/xen/interface/io/xenbus.h | 10
init/Kconfig | 12
init/initramfs.c | 38 +
init/main.c | 1
ipc/sem.c | 12
kernel/async.c | 68 --
kernel/configs/android-base.config | 1
kernel/crash_core.c | 7
kernel/cred.c | 2
kernel/exit.c | 67 ++
kernel/fork.c | 23
kernel/gcov/Kconfig | 1
kernel/gcov/base.c | 49 +
kernel/gcov/clang.c | 282 ----------
kernel/gcov/fs.c | 146 ++++-
kernel/gcov/gcc_4_7.c | 173 ------
kernel/gcov/gcov.h | 14
kernel/kexec_core.c | 4
kernel/kexec_file.c | 4
kernel/kmod.c | 2
kernel/resource.c | 198 ++++---
kernel/sys.c | 14
kernel/umh.c | 8
kernel/up.c | 2
kernel/user_namespace.c | 6
lib/bch.c | 2
lib/crc8.c | 2
lib/decompress_unlzma.c | 2
lib/find_bit.c | 68 --
lib/genalloc.c | 7
lib/list_sort.c | 2
lib/parser.c | 61 +-
lib/percpu_counter.c | 2
lib/stackdepot.c | 6
mm/balloon_compaction.c | 4
mm/compaction.c | 4
mm/filemap.c | 2
mm/gup.c | 2
mm/highmem.c | 2
mm/huge_memory.c | 6
mm/hugetlb.c | 6
mm/internal.h | 2
mm/kasan/kasan.h | 8
mm/kasan/quarantine.c | 4
mm/kasan/shadow.c | 4
mm/kfence/report.c | 2
mm/khugepaged.c | 2
mm/ksm.c | 6
mm/madvise.c | 4
mm/memcontrol.c | 18
mm/memory-failure.c | 2
mm/memory.c | 18
mm/mempolicy.c | 6
mm/migrate.c | 8
mm/mmap.c | 4
mm/mprotect.c | 2
mm/mremap.c | 2
mm/nommu.c | 10
mm/oom_kill.c | 2
mm/page-writeback.c | 4
mm/page_alloc.c | 16
mm/page_owner.c | 2
mm/page_vma_mapped.c | 2
mm/percpu-internal.h | 2
mm/percpu.c | 2
mm/pgalloc-track.h | 6
mm/rmap.c | 2
mm/slab.c | 8
mm/slub.c | 2
mm/swap.c | 4
mm/swap_slots.c | 2
mm/swap_state.c | 2
mm/vmalloc.c | 124 ----
mm/vmstat.c | 2
mm/z3fold.c | 2
mm/zpool.c | 2
mm/zsmalloc.c | 6
samples/configfs/configfs_sample.c | 2
scripts/checkpatch.pl | 15
scripts/gdb/linux/cpus.py | 23
scripts/gdb/linux/symbols.py | 3
scripts/spelling.txt | 3
tools/include/asm-generic/bitops/find.h | 85 ++-
tools/include/asm-generic/bitsperlong.h | 3
tools/include/linux/bitmap.h | 18
tools/lib/bitmap.c | 4
tools/lib/find_bit.c | 56 -
tools/scripts/Makefile.include | 1
tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c | 44 +
tools/testing/selftests/kvm/lib/sparsebit.c | 1
tools/testing/selftests/mincore/mincore_selftest.c | 1
tools/testing/selftests/powerpc/mm/tlbie_test.c | 1
tools/testing/selftests/proc/Makefile | 1
tools/testing/selftests/proc/proc-subset-pid.c | 121 ++++
tools/testing/selftests/proc/read.c | 4
tools/usb/hcd-tests.sh | 2
343 files changed, 1383 insertions(+), 2119 deletions(-)
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2021-05-05 17:44 ` incoming Andrew Morton
@ 2021-05-06 3:19 ` Anshuman Khandual
0 siblings, 0 replies; 786+ messages in thread
From: Anshuman Khandual @ 2021-05-06 3:19 UTC (permalink / raw)
To: Andrew Morton, Linus Torvalds; +Cc: Konstantin Ryabitsev, Linux-MM, mm-commits
On 5/5/21 11:14 PM, Andrew Morton wrote:
> On Wed, 5 May 2021 10:10:33 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>> On Tue, May 4, 2021 at 8:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>>> Let me resend right now with the same in-reply-to. Hopefully they will
>>> land in the correct place.
>> Well, you re-sent it twice, and I have three copies in my own mailbox,
>> but they still don't show up on the mm-commits mailing list.
>>
>> So the list hates them for some odd reason.
>>
>> I've picked them up locally, but adding Konstantin to the participants
>> to see if he can see what's up.
>>
>> Konstantin: patches 103/106/107 are missing on lore out of Andrew's
>> series of 143. Odd.
> It's weird. They don't turn up on linux-mm either, and that's running
> at kvack.org, also majordomo. They don't get through when sent with
> either heirloom-mailx or with sylpheed.
>
> Also, it seems that when Anshuman originally sent the patch, linux-mm
> and linux-kernel didn't send it back out. So perhaps a spam filter
> triggered?
>
> I'm seeing
>
> https://lore.kernel.org/linux-arm-kernel/1615278790-18053-3-git-send-email-anshuman.khandual@arm.com/
>
> which is via linux-arm-kernel@lists.infradead.org but the linux-kernel
> server massacred that patch series. Searching
> https://lkml.org/lkml/2021/3/9 for "anshuman" only shows 3 of the 7
> email series.
Yeah, these patches had trouble getting onto the MM and LKML lists from the
very beginning, for some strange reason.
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2021-05-05 17:10 ` incoming Linus Torvalds
(?)
@ 2021-05-05 17:44 ` Andrew Morton
2021-05-06 3:19 ` incoming Anshuman Khandual
-1 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2021-05-05 17:44 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Konstantin Ryabitsev, Linux-MM, mm-commits
[-- Attachment #1: Type: text/plain, Size: 1387 bytes --]
On Wed, 5 May 2021 10:10:33 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Tue, May 4, 2021 at 8:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > Let me resend right now with the same in-reply-to. Hopefully they will
> > land in the correct place.
>
> Well, you re-sent it twice, and I have three copies in my own mailbox,
> but they still don't show up on the mm-commits mailing list.
>
> So the list hates them for some odd reason.
>
> I've picked them up locally, but adding Konstantin to the participants
> to see if he can see what's up.
>
> Konstantin: patches 103/106/107 are missing on lore out of Andrew's
> series of 143. Odd.
It's weird. They don't turn up on linux-mm either, and that's running
at kvack.org, also majordomo. They don't get through when sent with
either heirloom-mailx or with sylpheed.
Also, it seems that when Anshuman originally sent the patch, linux-mm
and linux-kernel didn't send it back out. So perhaps a spam filter
triggered?
I'm seeing
https://lore.kernel.org/linux-arm-kernel/1615278790-18053-3-git-send-email-anshuman.khandual@arm.com/
which is via linux-arm-kernel@lists.infradead.org but the linux-kernel
server massacred that patch series. Searching
https://lkml.org/lkml/2021/3/9 for "anshuman" only shows 3 of the 7
email series.
One of the emails (as sent by me) is attached, if that helps.
[-- Attachment #2: x.txt --]
[-- Type: text/plain, Size: 21048 bytes --]
Return-Path: <akpm@linux-foundation.org>
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on y
X-Spam-Level: (none)
X-Spam-Status: No, score=-101.5 required=2.5 tests=BAYES_00,T_DKIM_INVALID,
USER_IN_WHITELIST autolearn=ham autolearn_force=no version=3.4.1
Received: from localhost.localdomain (localhost.localdomain [127.0.0.1])
by localhost.localdomain (8.15.2/8.15.2/Debian-8ubuntu1) with ESMTP id 1453H2fk032202
for <akpm@localhost>; Tue, 4 May 2021 20:17:03 -0700
Received: from imap.fastmail.com [66.111.4.135]
by localhost.localdomain with IMAP (fetchmail-6.3.26)
for <akpm@localhost> (single-drop); Tue, 04 May 2021 20:17:03 -0700 (PDT)
Received: from compute1.internal (compute1.nyi.internal [10.202.2.41])
by sloti11d1t06 (Cyrus 3.5.0-alpha0-442-g5daca166b9-fm-20210428.001-g5daca166) with LMTPA;
Tue, 04 May 2021 23:16:31 -0400
X-Cyrus-Session-Id: sloti11d1t06-1620184591-1699471-2-6359664467419938249
X-Sieve: CMU Sieve 3.0
X-Resolved-to: akpm@mbx.kernel.org
X-Delivered-to: akpm@mbx.kernel.org
X-Mail-from: akpm@linux-foundation.org
Received: from mx6 ([10.202.2.205])
by compute1.internal (LMTPProxy); Tue, 04 May 2021 23:16:31 -0400
Received: from mx6.messagingengine.com (localhost [127.0.0.1])
by mailmx.nyi.internal (Postfix) with ESMTP id 40796C800E1
for <akpm@mbx.kernel.org>; Tue, 4 May 2021 23:16:31 -0400 (EDT)
Received: from mx6.messagingengine.com (localhost [127.0.0.1])
by mx6.messagingengine.com (Authentication Milter) with ESMTP
id 14870833D7F;
Tue, 4 May 2021 23:16:31 -0400
ARC-Seal: i=2; a=rsa-sha256; cv=pass; d=messagingengine.com; s=fm2; t=
1620184591; b=FBo7Gf3JFN+4QYg5Byan0oNm6RESv+sIf5HcaslVNsUd9SOTGS
yI0+IsXr1CUpGH783hE6fmgEq9SyfOwQVZjdikLaJS1+7u0JtfAYQFU3RORCtXlr
djJWrScfjVa8nAHX4rQCtzvtPYuzx5w7cTgGgeILgoJMxgLj7EC9xcT8BIf68+9W
Lw+ohAmcuiKhL2ez+de4SMuwdh3dh2FwAIHQOsSjEU1/NV+WGxMLwYbxWgTrqQGH
RQIzFNdq30qslW9huK47+e80uHOX2tXwxtshwbThFEn458bdV5LL6Y8Oh4ZWMbv1
tFgTt515DVedonZknxc07XsXtAjaJyB8bfHw==
ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=
messagingengine.com; h=date:from:to:subject:message-id
:in-reply-to; s=fm2; t=1620184591; bh=LuH7mbm3+zp863vKBEqKeoZtnp
uFxYpIb5oTVwf56Es=; b=m5E1fbz2b+an/X406oY3BuG0Zm4/W05vWAki8Lsnud
gPCc1LfPUFSuXaMppcEDPbLKprp4hH3T52itK4pivXMQCLEOyme7kVStaLMVTiky
Xxqh5ZdhOWvygBfda/GjfuLBSbbj2gfm8HPKpbL7CA5foelknIBhJHDzGkJyxetZ
YagZfVvtdo2OEwnC1mmjUCpKPO5+m5kaZO0ol6rPdl+TV0MKGhjLg+/i6Ia+0nFp
zDwV4VeACvVcGb2xY7KG5Z+BtqVxeVFn+w5JcqpWUtxEKoSBR4bWARzjwHg6eouh
7psOOKPTt/NzDKk+3f49lso5KlPiTF2xEU/+5SIttCkQ==
ARC-Authentication-Results: i=2; mx6.messagingengine.com;
arc=pass (as.1.google.com=pass, ams.1.google.com=pass)
smtp.remote-ip=209.85.215.198;
bimi=skipped (DMARC did not pass);
dkim=pass (1024-bit rsa key sha256) header.d=linux-foundation.org
header.i=@linux-foundation.org header.b=Gdz/3wY9 header.a=rsa-sha256
header.s=korg x-bits=1024;
dmarc=none policy.published-domain-policy=none
policy.applied-disposition=none policy.evaluated-disposition=none
(p=none,d=none,d.eval=none) policy.policy-from=p
header.from=linux-foundation.org;
iprev=pass smtp.remote-ip=209.85.215.198 (mail-pg1-f198.google.com);
spf=pass smtp.mailfrom=akpm@linux-foundation.org
smtp.helo=mail-pg1-f198.google.com;
x-aligned-from=pass (Address match);
x-arc-spf=pass
(google.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender)
smtp.mailfrom=akpm@linux-foundation.org x-arc-instance=1
x-arc-domain=google.com (Trusted from aar.1.google.com);
x-csa=none;
x-google-dkim=fail (message has been altered, 2048-bit rsa key)
header.d=1e100.net header.i=@1e100.net header.b=VZuDOxUf;
x-me-sender=none;
x-ptr=pass smtp.helo=mail-pg1-f198.google.com
policy.ptr=mail-pg1-f198.google.com;
x-return-mx=pass header.domain=linux-foundation.org policy.is_org=yes
(MX Records found: ASPMX.L.GOOGLE.COM,ALT1.ASPMX.L.GOOGLE.COM,ALT2.ASPMX.L.GOOGLE.COM,ALT3.ASPMX.L.GOOGLE.COM,ALT4.ASPMX.L.GOOGLE.COM);
x-return-mx=pass smtp.domain=linux-foundation.org policy.is_org=yes
(MX Records found: ASPMX.L.GOOGLE.COM,ALT1.ASPMX.L.GOOGLE.COM,ALT2.ASPMX.L.GOOGLE.COM,ALT3.ASPMX.L.GOOGLE.COM,ALT4.ASPMX.L.GOOGLE.COM);
x-tls=pass smtp.version=TLSv1.3 smtp.cipher=TLS_AES_256_GCM_SHA384
smtp.bits=256/256;
x-vs=clean score=40 state=0
Authentication-Results: mx6.messagingengine.com;
arc=pass (as.1.google.com=pass, ams.1.google.com=pass)
smtp.remote-ip=209.85.215.198;
bimi=skipped (DMARC did not pass);
dkim=pass (1024-bit rsa key sha256) header.d=linux-foundation.org
header.i=@linux-foundation.org header.b=Gdz/3wY9 header.a=rsa-sha256
header.s=korg x-bits=1024;
dmarc=none policy.published-domain-policy=none
policy.applied-disposition=none policy.evaluated-disposition=none
(p=none,d=none,d.eval=none) policy.policy-from=p
header.from=linux-foundation.org;
iprev=pass smtp.remote-ip=209.85.215.198 (mail-pg1-f198.google.com);
spf=pass smtp.mailfrom=akpm@linux-foundation.org
smtp.helo=mail-pg1-f198.google.com;
x-aligned-from=pass (Address match);
x-arc-spf=pass
(google.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender)
smtp.mailfrom=akpm@linux-foundation.org x-arc-instance=1
x-arc-domain=google.com (Trusted from aar.1.google.com);
x-csa=none;
x-google-dkim=fail (message has been altered, 2048-bit rsa key)
header.d=1e100.net header.i=@1e100.net header.b=VZuDOxUf;
x-me-sender=none;
x-ptr=pass smtp.helo=mail-pg1-f198.google.com
policy.ptr=mail-pg1-f198.google.com;
x-return-mx=pass header.domain=linux-foundation.org policy.is_org=yes
(MX Records found: ASPMX.L.GOOGLE.COM,ALT1.ASPMX.L.GOOGLE.COM,ALT2.ASPMX.L.GOOGLE.COM,ALT3.ASPMX.L.GOOGLE.COM,ALT4.ASPMX.L.GOOGLE.COM);
x-return-mx=pass smtp.domain=linux-foundation.org policy.is_org=yes
(MX Records found: ASPMX.L.GOOGLE.COM,ALT1.ASPMX.L.GOOGLE.COM,ALT2.ASPMX.L.GOOGLE.COM,ALT3.ASPMX.L.GOOGLE.COM,ALT4.ASPMX.L.GOOGLE.COM);
x-tls=pass smtp.version=TLSv1.3 smtp.cipher=TLS_AES_256_GCM_SHA384
smtp.bits=256/256;
x-vs=clean score=40 state=0
X-ME-VSCause: gggruggvucftvghtrhhoucdtuddrgeduledrvdefjedgieegucetufdoteggodetrfdotf
fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggvpdfu
rfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucgoufhorhhtvggutfgvtg
hiphdvucdlgedtmdenucfjughrpeffhffvuffkjggfsedttdertddtredtnecuhfhrohhm
peetnhgurhgvficuofhorhhtohhnuceorghkphhmsehlihhnuhigqdhfohhunhgurghtih
honhdrohhrgheqnecuggftrfgrthhtvghrnhepjeevfeduveffvddvudetkefhgeduveeu
geevvdfhhfevhfekkedtieefgfduheeinecuffhomhgrihhnpehkvghrnhgvlhdrohhrgh
enucfkphepvddtledrkeehrddvudehrdduleekpdduleekrddugeehrddvledrleelnecu
uegrugftvghpuhhtkfhppeduleekrddugeehrddvledrleelnecuvehluhhsthgvrhfuih
iivgeptdenucfrrghrrghmpehinhgvthepvddtledrkeehrddvudehrdduleekpdhhvghl
ohepmhgrihhlqdhpghduqdhfudelkedrghhoohhglhgvrdgtohhmpdhmrghilhhfrhhomh
epoegrkhhpmheslhhinhhugidqfhhouhhnuggrthhiohhnrdhorhhgqe
X-ME-VSScore: 40
X-ME-VSCategory: clean
X-ME-CSA: none
Received-SPF: pass
(linux-foundation.org: Sender is authorized to use 'akpm@linux-foundation.org' in 'mfrom' identity (mechanism 'include:_spf.google.com' matched))
receiver=mx6.messagingengine.com;
identity=mailfrom;
envelope-from="akpm@linux-foundation.org";
helo=mail-pg1-f198.google.com;
client-ip=209.85.215.198
Received: from mail-pg1-f198.google.com (mail-pg1-f198.google.com [209.85.215.198])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
(No client certificate requested)
by mx6.messagingengine.com (Postfix) with ESMTPS
for <akpm@mbx.kernel.org>; Tue, 4 May 2021 23:16:31 -0400 (EDT)
Received: by mail-pg1-f198.google.com with SMTP id g5-20020a63f4050000b02901f6c7b9a6d0so593624pgi.5
for <akpm@mbx.kernel.org>; Tue, 04 May 2021 20:16:30 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=1e100.net; s=20161025;
h=x-gm-message-state:dkim-signature:date:from:to:subject:message-id
:in-reply-to:user-agent;
bh=LuH7mbm3+zp863vKBEqKeoZtnpuFxYpIb5oTVwf56Es=;
b=VZuDOxUfeHXJz1/CiFfcxuMVHkmW5RznvqYS+Py8Ub6nHHXprQJGE9Ze3WgH+1ylSe
NJLEC7xgv15SR9A+e/MT4RTj3OVOwtd1Zi2vPav39a9K4tP+2uL2Ei+5d7FtT3LLZsjo
feek/DqCGSkJ/EC5woLyU9BBkfLUuQ9/2HiDCk10BMetEfWdor69Slb39NOXES8br02X
25Btabu9ZCWroyjQj7W5gwGr5Z6Hs2nbnnfAb+e92FalcUD/4ql77lNzRcWGi4/9TT8s
ntqI2g46Xv+k5LURaRH5CRBpxkkKgzcrioRPYFUHkEgOEWy1hPzg9QPk8ZO35Xm9R9d2
vl3Q==
X-Gm-Message-State: AOAM531IlYUTVWcMrsTunnxZWB7SKeeOmoZj5mZ1A5tl7N/JlZUueN8L
tvyRKnvxHr6a5mDaGHN9Tb1N/iCzT0U5oQgRVTxTnj1qFGibRa9+leLQNKX0aGlNg9JiaMfromb
xyOlCUpVXOlVvchuwTUSTn7rXum+Hh3PWQZm5II/EX+0AkzKqez62Z8U=
X-Received: by 2002:a17:90a:a581:: with SMTP id b1mr32203271pjq.53.1620184589161;
Tue, 04 May 2021 20:16:29 -0700 (PDT)
X-Google-Smtp-Source: ABdhPJxffoGdRqAjUagWoMVD5p/Lk1KTEDftEhkWh8ewatgDmZLlxh0lO1hxYIdYYwoO5dsJ/i0z
X-Received: by 2002:a17:90a:a581:: with SMTP id b1mr32203198pjq.53.1620184588109;
Tue, 04 May 2021 20:16:28 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1620184588; cv=none;
d=google.com; s=arc-20160816;
b=Fr2b2AMXJr6OeNpSql45tq1korkuDOunp7t+DpARuEBnwvQnKfagyipQ93jywsRf/c
/i/mP2eTmJwOLWNORClh1MGF/0VfBx1ULoB9W4CI3LpVgGFXGGFis8LTcvUYD5yvhlsV
50rm2j34iS9lyo04FB/hbhGkwLtUhz2PGkLGuqHspTd+pUpUCf5SLxGJbZC5uCcUEsbO
8WSDBWyvaCPjFzJQZK60gK70ticKW+fCG1xHtOG4qsFCbqEpFKBy8eVK83OBazo/dQDr
DOheWNWyw2o/WMP4GpZMvZuj30dx3j8xnBahIpnMIQJaog6wLMcVX9pkQ8UJym3/PGNm
pO/g==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
h=user-agent:in-reply-to:message-id:subject:to:from:date
:dkim-signature;
bh=LuH7mbm3+zp863vKBEqKeoZtnpuFxYpIb5oTVwf56Es=;
b=vVN16NPMKjoxSJQ6b36VXFCkZqnmG7wABfilgE069txZqmHpEMyZb8lRStkHy557LM
Kn7UfJFP3xwsP8ZTCipVDZ6tpFW/hYFU9o4th9G8asWs+MOf9xpWX2LQZ1FTmaao2Fg5
uCHypz39cnAh0Z1EJfNsTcaTGIrkbBd6zje+mtBgs8hnfH8HcWBYTPCHCCx950Z928tb
XOPd/Igs7yzD1ioBiGXZj/ciwPbWVTaZXBg4JOZSApxkDMfuMyfyLLOs++EVkyxJHUme
TmgwvLkixcwEtKF7gIeqEhwvOUSVvilLuJLFVaLumwTcjJ1amVfGcJhBE7LIM9C3SMpA
rOOg==
ARC-Authentication-Results: i=1; mx.google.com;
dkim=pass header.i=@linux-foundation.org header.s=korg header.b="Gdz/3wY9";
spf=pass (google.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org
Received: from mail.kernel.org (mail.kernel.org. [198.145.29.99])
by mx.google.com with ESMTPS id c85si20173199pfb.8.2021.05.04.20.16.27
(version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
Tue, 04 May 2021 20:16:28 -0700 (PDT)
Received-SPF: pass (google.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) client-ip=198.145.29.99;
Authentication-Results: mx.google.com;
dkim=pass header.i=@linux-foundation.org header.s=korg header.b="Gdz/3wY9";
spf=pass (google.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org
Received: by mail.kernel.org (Postfix) with ESMTPSA id A4DB4610D2;
Wed, 5 May 2021 03:16:26 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org;
s=korg; t=1620184587;
bh=TxN4wgKcKf2UUem+5pL09m9GL/7U592mEalo2U6vwAU=;
h=Date:From:To:Subject:In-Reply-To:From;
b=Gdz/3wY9ktH3hOmn2DAOkfh0JXwPdMJ8xsNQFa9eI25K39Z3iHdRGo9jX3QtMDtog
D4Zakt52CQCYsV91c9oCai8KnCTkkAjJq/Ez7p8UHpz97Go3yYYxqg6DDl6d8HCQvN
H47dTaZAgeH2sw29bjB9fRzNuTx7k4RAPlqZIpiE=
Date: Tue, 04 May 2021 20:16:26 -0700
From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, anshuman.khandual@arm.com, aou@eecs.berkeley.edu, arnd@arndb.de, benh@kernel.crashing.org, borntraeger@de.ibm.com, bp@alien8.de, catalin.marinas@arm.com, dalias@libc.org, deller@gmx.de, gor@linux.ibm.com, hca@linux.ibm.com, hpa@zytor.com, James.Bottomley@HansenPartnership.com, linux-mm@kvack.org, linux@armlinux.org.uk, mingo@redhat.com, mm-commits@vger.kernel.org, mpe@ellerman.id.au, palmerdabbelt@google.com, paul.walmsley@sifive.com, paulus@samba.org, tglx@linutronix.de, torvalds@linux-foundation.org, tsbogend@alpha.franken.de, vgupta@synopsys.com, viro@zeniv.linux.org.uk, will@kernel.org, ysato@users.osdn.me
Subject: [patch 103/143] mm: generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS)
Message-ID: <20210505031626.c8o4WL7KE%akpm@linux-foundation.org>
In-Reply-To: <20210504183219.a3cc46aee4013d77402276c5@linux-foundation.org>
User-Agent: s-nail v14.8.16
X-Gm-Original-To: akpm@linux-foundation.org
From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: mm: generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS)
SYS_SUPPORTS_HUGETLBFS config has duplicate definitions on platforms that
subscribe it. Instead, just make it a generic option which can be
selected on applicable platforms. Also rename it as
ARCH_SUPPORTS_HUGETLBFS instead. This reduces code duplication and makes
it cleaner.
Link: https://lkml.kernel.org/r/1617259448-22529-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> [riscv]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/arm/Kconfig | 5 +----
arch/arm64/Kconfig | 4 +---
arch/mips/Kconfig | 6 +-----
arch/parisc/Kconfig | 5 +----
arch/powerpc/Kconfig | 3 ---
arch/powerpc/platforms/Kconfig.cputype | 6 +++---
arch/riscv/Kconfig | 5 +----
arch/sh/Kconfig | 5 +----
fs/Kconfig | 5 ++++-
9 files changed, 13 insertions(+), 31 deletions(-)
--- a/arch/arm64/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/arch/arm64/Kconfig
@@ -73,6 +73,7 @@ config ARM64
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_USE_SYM_ANNOTATIONS
select ARCH_SUPPORTS_DEBUG_PAGEALLOC
+ select ARCH_SUPPORTS_HUGETLBFS
select ARCH_SUPPORTS_MEMORY_FAILURE
select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
select ARCH_SUPPORTS_LTO_CLANG if CPU_LITTLE_ENDIAN
@@ -1072,9 +1073,6 @@ config HW_PERF_EVENTS
def_bool y
depends on ARM_PMU
-config SYS_SUPPORTS_HUGETLBFS
- def_bool y
-
config ARCH_HAS_FILTER_PGPROT
def_bool y
--- a/arch/arm/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/arch/arm/Kconfig
@@ -31,6 +31,7 @@ config ARM
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7
select ARCH_SUPPORTS_ATOMIC_RMW
+ select ARCH_SUPPORTS_HUGETLBFS if ARM_LPAE
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_USE_MEMTEST
@@ -1511,10 +1512,6 @@ config HW_PERF_EVENTS
def_bool y
depends on ARM_PMU
-config SYS_SUPPORTS_HUGETLBFS
- def_bool y
- depends on ARM_LPAE
-
config HAVE_ARCH_TRANSPARENT_HUGEPAGE
def_bool y
depends on ARM_LPAE
--- a/arch/mips/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/arch/mips/Kconfig
@@ -19,6 +19,7 @@ config MIPS
select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
+ select ARCH_SUPPORTS_HUGETLBFS if CPU_SUPPORTS_HUGEPAGES
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
select ARCH_WANT_IPC_PARSE_VERSION
select ARCH_WANT_LD_ORPHAN_WARN
@@ -1287,11 +1288,6 @@ config SYS_SUPPORTS_BIG_ENDIAN
config SYS_SUPPORTS_LITTLE_ENDIAN
bool
-config SYS_SUPPORTS_HUGETLBFS
- bool
- depends on CPU_SUPPORTS_HUGEPAGES
- default y
-
config MIPS_HUGE_TLB_SUPPORT
def_bool HUGETLB_PAGE || TRANSPARENT_HUGEPAGE
--- a/arch/parisc/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/arch/parisc/Kconfig
@@ -12,6 +12,7 @@ config PARISC
select ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_NO_SG_CHAIN
+ select ARCH_SUPPORTS_HUGETLBFS if PA20
select ARCH_SUPPORTS_MEMORY_FAILURE
select DMA_OPS
select RTC_CLASS
@@ -138,10 +139,6 @@ config PGTABLE_LEVELS
default 3 if 64BIT && PARISC_PAGE_SIZE_4KB
default 2
-config SYS_SUPPORTS_HUGETLBFS
- def_bool y if PA20
-
-
menu "Processor type and features"
choice
--- a/arch/powerpc/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/arch/powerpc/Kconfig
@@ -697,9 +697,6 @@ config ARCH_SPARSEMEM_DEFAULT
def_bool y
depends on PPC_BOOK3S_64
-config SYS_SUPPORTS_HUGETLBFS
- bool
-
config ILLEGAL_POINTER_VALUE
hex
# This is roughly half way between the top of user space and the bottom
--- a/arch/powerpc/platforms/Kconfig.cputype~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/arch/powerpc/platforms/Kconfig.cputype
@@ -40,8 +40,8 @@ config PPC_85xx
config PPC_8xx
bool "Freescale 8xx"
+ select ARCH_SUPPORTS_HUGETLBFS
select FSL_SOC
- select SYS_SUPPORTS_HUGETLBFS
select PPC_HAVE_KUEP
select PPC_HAVE_KUAP
select HAVE_ARCH_VMAP_STACK
@@ -95,9 +95,9 @@ config PPC_BOOK3S_64
bool "Server processors"
select PPC_FPU
select PPC_HAVE_PMU_SUPPORT
- select SYS_SUPPORTS_HUGETLBFS
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
+ select ARCH_SUPPORTS_HUGETLBFS
select ARCH_SUPPORTS_NUMA_BALANCING
select IRQ_WORK
select PPC_MM_SLICES
@@ -278,9 +278,9 @@ config FSL_BOOKE
# this is for common code between PPC32 & PPC64 FSL BOOKE
config PPC_FSL_BOOK3E
bool
+ select ARCH_SUPPORTS_HUGETLBFS if PHYS_64BIT || PPC64
select FSL_EMB_PERFMON
select PPC_SMP_MUXED_IPI
- select SYS_SUPPORTS_HUGETLBFS if PHYS_64BIT || PPC64
select PPC_DOORBELL
default y if FSL_BOOKE
--- a/arch/riscv/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/arch/riscv/Kconfig
@@ -30,6 +30,7 @@ config RISCV
select ARCH_HAS_STRICT_KERNEL_RWX if MMU
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
+ select ARCH_SUPPORTS_HUGETLBFS if MMU
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
select ARCH_WANT_FRAME_POINTERS
select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
@@ -165,10 +166,6 @@ config ARCH_WANT_GENERAL_HUGETLB
config ARCH_SUPPORTS_UPROBES
def_bool y
-config SYS_SUPPORTS_HUGETLBFS
- depends on MMU
- def_bool y
-
config STACKTRACE_SUPPORT
def_bool y
--- a/arch/sh/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/arch/sh/Kconfig
@@ -101,9 +101,6 @@ config SYS_SUPPORTS_APM_EMULATION
bool
select ARCH_SUSPEND_POSSIBLE
-config SYS_SUPPORTS_HUGETLBFS
- bool
-
config SYS_SUPPORTS_SMP
bool
@@ -175,12 +172,12 @@ config CPU_SH3
config CPU_SH4
bool
+ select ARCH_SUPPORTS_HUGETLBFS if MMU
select CPU_HAS_INTEVT
select CPU_HAS_SR_RB
select CPU_HAS_FPU if !CPU_SH4AL_DSP
select SH_INTC
select SYS_SUPPORTS_SH_TMU
- select SYS_SUPPORTS_HUGETLBFS if MMU
config CPU_SH4A
bool
--- a/fs/Kconfig~mm-generalize-sys_supports_hugetlbfs-rename-as-arch_supports_hugetlbfs
+++ a/fs/Kconfig
@@ -223,10 +223,13 @@ config TMPFS_INODE64
If unsure, say N.
+config ARCH_SUPPORTS_HUGETLBFS
+ def_bool n
+
config HUGETLBFS
bool "HugeTLB file system support"
depends on X86 || IA64 || SPARC64 || (S390 && 64BIT) || \
- SYS_SUPPORTS_HUGETLBFS || BROKEN
+ ARCH_SUPPORTS_HUGETLBFS || BROKEN
help
hugetlbfs is a filesystem backing for HugeTLB pages, based on
ramfs. For architectures that support it, say Y here and read
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2021-05-05 3:16 ` incoming Andrew Morton
@ 2021-05-05 17:10 ` Linus Torvalds
0 siblings, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2021-05-05 17:10 UTC (permalink / raw)
To: Andrew Morton, Konstantin Ryabitsev; +Cc: Linux-MM, mm-commits
On Tue, May 4, 2021 at 8:16 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Let me resend right now with the same in-reply-to. Hopefully they will
> land in the correct place.
Well, you re-sent it twice, and I have three copies in my own mailbox,
but they still don't show up on the mm-commits mailing list.
So the list hates them for some odd reason.
I've picked them up locally, but adding Konstantin to the participants
to see if he can see what's up.
Konstantin: patches 103/106/107 are missing on lore out of Andrew's
series of 143. Odd.
Linus
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2021-05-05 1:47 ` incoming Linus Torvalds
@ 2021-05-05 3:16 ` Andrew Morton
2021-05-05 17:10 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2021-05-05 3:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Linux-MM, mm-commits
On Tue, 4 May 2021 18:47:19 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Tue, May 4, 2021 at 6:32 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > 143 patches
>
> Hmm. Only 140 seem to have made it to the list, with 103, 106 and 107 missing.
>
> Maybe just some mail delay? But at least right now
>
> https://lore.kernel.org/mm-commits/
>
> doesn't show them (and thus 'b4' doesn't work).
>
> I'll check again later.
>
Well that's strange. I see all three via cc:me, but not on linux-mm or
mm-commits.
Let me resend right now with the same in-reply-to. Hopefully they will
land in the correct place.
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2021-05-05 1:32 incoming Andrew Morton
@ 2021-05-05 1:47 ` Linus Torvalds
2021-05-05 3:16 ` incoming Andrew Morton
0 siblings, 1 reply; 786+ messages in thread
From: Linus Torvalds @ 2021-05-05 1:47 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux-MM, mm-commits
On Tue, May 4, 2021 at 6:32 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 143 patches
Hmm. Only 140 seem to have made it to the list, with 103, 106 and 107 missing.
Maybe just some mail delay? But at least right now
https://lore.kernel.org/mm-commits/
doesn't show them (and thus 'b4' doesn't work).
I'll check again later.
Linus
^ permalink raw reply [flat|nested] 786+ messages in thread
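The gap reported in the message above (140 of 143 patches archived, with 103, 106 and 107 absent) can be found mechanically from the subject lines. The following is a minimal sketch, assuming subjects follow the `[patch N/M]` form used in this series. The `missing_patches` helper and the sample subject list are hypothetical, and this is not a description of how lore or b4 detect incomplete series.

```python
import re

# Matches subjects such as "[patch 103/143] mm: ..." (the form used in this series).
SUBJECT_RE = re.compile(r"\[patch\s+(\d+)/(\d+)\]", re.IGNORECASE)


def missing_patches(subjects):
    """Return the sorted patch numbers absent from an N/M-numbered series."""
    seen = set()
    total = 0
    for subject in subjects:
        match = SUBJECT_RE.search(subject)
        if match is None:
            continue  # cover letters and replies carry no patch number
        seen.add(int(match.group(1)))
        total = max(total, int(match.group(2)))
    return [n for n in range(1, total + 1) if n not in seen]


# Hypothetical input: every subject of a 143-patch series except 103, 106 and 107.
archived = [
    f"[patch {n:03d}/143] example subject"
    for n in range(1, 144)
    if n not in (103, 106, 107)
]
print(missing_patches(archived))  # prints [103, 106, 107]
```

The series length is taken from the `/M` part of the subjects themselves, so the helper needs no prior knowledge of how many patches were sent. It returns an empty list when no numbered subjects are present.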
* incoming
@ 2021-05-05 1:32 Andrew Morton
2021-05-05 1:47 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2021-05-05 1:32 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
The remainder of the main mm/ queue.
143 patches, based on 8ca5297e7e38f2dc8c753d33a5092e7be181fff0, plus
previously sent patches.
Subsystems affected by this patch series:
mm/pagecache
mm/hugetlb
mm/userfaultfd
mm/vmscan
mm/compaction
mm/migration
mm/cma
mm/ksm
mm/vmstat
mm/mmap
mm/kconfig
mm/util
mm/memory-hotplug
mm/zswap
mm/zsmalloc
mm/highmem
mm/cleanups
mm/kfence
Subsystem: mm/pagecache
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "Remove nrexceptional tracking", v2:
mm: introduce and use mapping_empty()
mm: stop accounting shadow entries
dax: account DAX entries as nrpages
mm: remove nrexceptional from inode
Hugh Dickins <hughd@google.com>:
mm: remove nrexceptional from inode: remove BUG_ON
Subsystem: mm/hugetlb
Peter Xu <peterx@redhat.com>:
Patch series "hugetlb: Disable huge pmd unshare for uffd-wp", v4:
hugetlb: pass vma into huge_pte_alloc() and huge_pmd_share()
hugetlb/userfaultfd: forbid huge pmd sharing when uffd enabled
mm/hugetlb: move flush_hugetlb_tlb_range() into hugetlb.h
hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp
Miaohe Lin <linmiaohe@huawei.com>:
mm/hugetlb: remove redundant reservation check condition in alloc_huge_page()
Anshuman Khandual <anshuman.khandual@arm.com>:
mm: generalize HUGETLB_PAGE_SIZE_VARIABLE
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Some cleanups for hugetlb":
mm/hugetlb: use some helper functions to cleanup code
mm/hugetlb: optimize the surplus state transfer code in move_hugetlb_state()
mm/hugetlb_cgroup: remove unnecessary VM_BUG_ON_PAGE in hugetlb_cgroup_migrate()
mm/hugetlb: simplify the code when alloc_huge_page() failed in hugetlb_no_page()
mm/hugetlb: avoid calculating fault_mutex_hash in truncate_op case
Patch series "Cleanup and fixup for khugepaged", v2:
khugepaged: remove unneeded return value of khugepaged_collapse_pte_mapped_thps()
khugepaged: reuse the smp_wmb() inside __SetPageUptodate()
khugepaged: use helper khugepaged_test_exit() in __khugepaged_enter()
khugepaged: fix wrong result value for trace_mm_collapse_huge_page_isolate()
mm/huge_memory.c: remove unnecessary local variable ret2
Patch series "Some cleanups for huge_memory", v3:
mm/huge_memory.c: rework the function vma_adjust_trans_huge()
mm/huge_memory.c: make get_huge_zero_page() return bool
mm/huge_memory.c: rework the function do_huge_pmd_numa_page() slightly
mm/huge_memory.c: remove redundant PageCompound() check
mm/huge_memory.c: remove unused macro TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG
mm/huge_memory.c: use helper function migration_entry_to_page()
Yanfei Xu <yanfei.xu@windriver.com>:
mm/khugepaged.c: replace barrier() with READ_ONCE() for a selective variable
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Cleanup for khugepaged":
khugepaged: use helper function range_in_vma() in collapse_pte_mapped_thp()
khugepaged: remove unnecessary out label in collapse_huge_page()
khugepaged: remove meaningless !pte_present() check in khugepaged_scan_pmd()
Zi Yan <ziy@nvidia.com>:
mm: huge_memory: a new debugfs interface for splitting THP tests
mm: huge_memory: debugfs for file-backed THP split
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Cleanup and fixup for hugetlb", v2:
mm/hugeltb: remove redundant VM_BUG_ON() in region_add()
mm/hugeltb: simplify the return code of __vma_reservation_common()
mm/hugeltb: clarify (chg - freed) won't go negative in hugetlb_unreserve_pages()
mm/hugeltb: handle the error case in hugetlb_fix_reserve_counts()
mm/hugetlb: remove unused variable pseudo_vma in remove_inode_hugepages()
Mike Kravetz <mike.kravetz@oracle.com>:
Patch series "make hugetlb put_page safe for all calling contexts", v5:
mm/cma: change cma mutex to irq safe spinlock
hugetlb: no need to drop hugetlb_lock to call cma_release
hugetlb: add per-hstate mutex to synchronize user adjustments
hugetlb: create remove_hugetlb_page() to separate functionality
hugetlb: call update_and_free_page without hugetlb_lock
hugetlb: change free_pool_huge_page to remove_pool_huge_page
hugetlb: make free_huge_page irq safe
hugetlb: add lockdep_assert_held() calls for hugetlb_lock
Oscar Salvador <osalvador@suse.de>:
Patch series "Make alloc_contig_range handle Hugetlb pages", v10:
mm,page_alloc: bail out earlier on -ENOMEM in alloc_contig_migrate_range
mm,compaction: let isolate_migratepages_{range,block} return error codes
mm,hugetlb: drop clearing of flag from prep_new_huge_page
mm,hugetlb: split prep_new_huge_page functionality
mm: make alloc_contig_range handle free hugetlb pages
mm: make alloc_contig_range handle in-use hugetlb pages
mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig
Subsystem: mm/userfaultfd
Axel Rasmussen <axelrasmussen@google.com>:
Patch series "userfaultfd: add minor fault handling", v9:
userfaultfd: add minor fault registration mode
userfaultfd: disable huge PMD sharing for MINOR registered VMAs
userfaultfd: hugetlbfs: only compile UFFD helpers if config enabled
userfaultfd: add UFFDIO_CONTINUE ioctl
userfaultfd: update documentation to describe minor fault handling
userfaultfd/selftests: add test exercising minor fault handling
Subsystem: mm/vmscan
Dave Hansen <dave.hansen@linux.intel.com>:
mm/vmscan: move RECLAIM* bits to uapi header
mm/vmscan: replace implicit RECLAIM_ZONE checks with explicit checks
Yang Shi <shy828301@gmail.com>:
Patch series "Make shrinker's nr_deferred memcg aware", v10:
mm: vmscan: use nid from shrink_control for tracepoint
mm: vmscan: consolidate shrinker_maps handling code
mm: vmscan: use shrinker_rwsem to protect shrinker_maps allocation
mm: vmscan: remove memcg_shrinker_map_size
mm: vmscan: use kvfree_rcu instead of call_rcu
mm: memcontrol: rename shrinker_map to shrinker_info
mm: vmscan: add shrinker_info_protected() helper
mm: vmscan: use a new flag to indicate shrinker is registered
mm: vmscan: add per memcg shrinker nr_deferred
mm: vmscan: use per memcg nr_deferred of shrinker
mm: vmscan: don't need allocate shrinker->nr_deferred for memcg aware shrinkers
mm: memcontrol: reparent nr_deferred when memcg offline
mm: vmscan: shrink deferred objects proportional to priority
Subsystem: mm/compaction
Pintu Kumar <pintu@codeaurora.org>:
mm/compaction: remove unused variable sysctl_compact_memory
Charan Teja Reddy <charante@codeaurora.org>:
mm: compaction: update the COMPACT[STALL|FAIL] events properly
Subsystem: mm/migration
Minchan Kim <minchan@kernel.org>:
mm: disable LRU pagevec during the migration temporarily
mm: replace migrate_[prep|finish] with lru_cache_[disable|enable]
mm: fs: invalidate BH LRU during page migration
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Cleanup and fixup for mm/migrate.c", v3:
mm/migrate.c: make putback_movable_page() static
mm/migrate.c: remove unnecessary rc != MIGRATEPAGE_SUCCESS check in 'else' case
mm/migrate.c: fix potential indeterminate pte entry in migrate_vma_insert_page()
mm/migrate.c: use helper migrate_vma_collect_skip() in migrate_vma_collect_hole()
Revert "mm: migrate: skip shared exec THP for NUMA balancing"
Subsystem: mm/cma
Minchan Kim <minchan@kernel.org>:
mm: vmstat: add cma statistics
Baolin Wang <baolin.wang@linux.alibaba.com>:
mm: cma: use pr_err_ratelimited for CMA warning
Liam Mark <lmark@codeaurora.org>:
mm: cma: add trace events for CMA alloc perf testing
Minchan Kim <minchan@kernel.org>:
mm: cma: support sysfs
mm: cma: add the CMA instance name to cma trace events
mm: use proper type for cma_[alloc|release]
Subsystem: mm/ksm
Miaohe Lin <linmiaohe@huawei.com>:
Patch series "Cleanup and fixup for ksm":
ksm: remove redundant VM_BUG_ON_PAGE() on stable_tree_search()
ksm: use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()
ksm: remove dedicated macro KSM_FLAG_MASK
ksm: fix potential missing rmap_item for stable_node
Chengyang Fan <cy.fan@huawei.com>:
mm/ksm: remove unused parameter from remove_trailing_rmap_items()
Subsystem: mm/vmstat
Hugh Dickins <hughd@google.com>:
mm: restore node stat checking in /proc/sys/vm/stat_refresh
mm: no more EINVAL from /proc/sys/vm/stat_refresh
mm: /proc/sys/vm/stat_refresh skip checking known negative stats
mm: /proc/sys/vm/stat_refresh stop checking monotonic numa stats
Saravanan D <saravanand@fb.com>:
x86/mm: track linear mapping split events
Subsystem: mm/mmap
Liam Howlett <liam.howlett@oracle.com>:
mm/mmap.c: don't unlock VMAs in remap_file_pages()
Subsystem: mm/kconfig
Anshuman Khandual <anshuman.khandual@arm.com>:
Patch series "mm: some config cleanups", v2:
mm: generalize ARCH_HAS_CACHE_LINE_SIZE
mm: generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS)
mm: generalize ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE]
mm: drop redundant ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION
mm: drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCK
mm: drop redundant HAVE_ARCH_TRANSPARENT_HUGEPAGE
Subsystem: mm/util
Joe Perches <joe@perches.com>:
mm/util.c: reduce mem_dump_obj() object size
Bhaskar Chowdhury <unixbhaskar@gmail.com>:
mm/util.c: fix typo
Subsystem: mm/memory-hotplug
Pavel Tatashin <pasha.tatashin@soleen.com>:
Patch series "prohibit pinning pages in ZONE_MOVABLE", v11:
mm/gup: don't pin migrated cma pages in movable zone
mm/gup: check every subpage of a compound page during isolation
mm/gup: return an error on migration failure
mm/gup: check for isolation errors
mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN
mm: apply per-task gfp constraints in fast path
mm: honor PF_MEMALLOC_PIN for all movable pages
mm/gup: do not migrate zero page
mm/gup: migrate pinned pages out of movable zone
memory-hotplug.rst: add a note about ZONE_MOVABLE and page pinning
mm/gup: change index type to long as it counts pages
mm/gup: longterm pin migration cleanup
selftests/vm: gup_test: fix test flag
selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages
Mel Gorman <mgorman@techsingularity.net>:
mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove
Oscar Salvador <osalvador@suse.de>:
Patch series "Allocate memmap from hotadded memory (per device)", v10:
drivers/base/memory: introduce memory_block_{online,offline}
mm,memory_hotplug: relax fully spanned sections check
David Hildenbrand <david@redhat.com>:
mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count()
Oscar Salvador <osalvador@suse.de>:
mm,memory_hotplug: allocate memmap from the added memory range
acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported
mm,memory_hotplug: add kernel boot option to enable memmap_on_memory
x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
Subsystem: mm/zswap
Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
mm/zswap.c: switch from strlcpy to strscpy
Subsystem: mm/zsmalloc
zhouchuangao <zhouchuangao@vivo.com>:
mm/zsmalloc: use BUG_ON instead of if condition followed by BUG.
Subsystem: mm/highmem
Ira Weiny <ira.weiny@intel.com>:
Patch series "btrfs: Convert kmap/memset/kunmap to memzero_user()":
iov_iter: lift memzero_page() to highmem.h
btrfs: use memzero_page() instead of open coded kmap pattern
songqiang <songqiang@uniontech.com>:
mm/highmem.c: fix coding style issue
Subsystem: mm/cleanups
Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
mm/mempool: minor coding style tweaks
Zhang Yunkai <zhang.yunkai@zte.com.cn>:
mm/process_vm_access.c: remove duplicate include
Subsystem: mm/kfence
Marco Elver <elver@google.com>:
kfence: zero guard page after out-of-bounds access
Patch series "kfence: optimize timer scheduling", v2:
kfence: await for allocation using wait_event
kfence: maximize allocation wait timeout duration
kfence: use power-efficient work queue to run delayed work
Documentation/ABI/testing/sysfs-kernel-mm-cma | 25
Documentation/admin-guide/kernel-parameters.txt | 17
Documentation/admin-guide/mm/memory-hotplug.rst | 9
Documentation/admin-guide/mm/userfaultfd.rst | 105 +-
arch/arc/Kconfig | 9
arch/arm/Kconfig | 10
arch/arm64/Kconfig | 34
arch/arm64/mm/hugetlbpage.c | 7
arch/ia64/Kconfig | 14
arch/ia64/mm/hugetlbpage.c | 3
arch/mips/Kconfig | 6
arch/mips/mm/hugetlbpage.c | 4
arch/parisc/Kconfig | 5
arch/parisc/mm/hugetlbpage.c | 2
arch/powerpc/Kconfig | 17
arch/powerpc/mm/hugetlbpage.c | 3
arch/powerpc/platforms/Kconfig.cputype | 16
arch/riscv/Kconfig | 5
arch/s390/Kconfig | 12
arch/s390/mm/hugetlbpage.c | 2
arch/sh/Kconfig | 7
arch/sh/mm/Kconfig | 8
arch/sh/mm/hugetlbpage.c | 2
arch/sparc/mm/hugetlbpage.c | 2
arch/x86/Kconfig | 33
arch/x86/mm/pat/set_memory.c | 8
drivers/acpi/acpi_memhotplug.c | 5
drivers/base/memory.c | 105 ++
fs/Kconfig | 5
fs/block_dev.c | 2
fs/btrfs/compression.c | 5
fs/btrfs/extent_io.c | 22
fs/btrfs/inode.c | 33
fs/btrfs/reflink.c | 6
fs/btrfs/zlib.c | 5
fs/btrfs/zstd.c | 5
fs/buffer.c | 36
fs/dax.c | 8
fs/gfs2/glock.c | 3
fs/hugetlbfs/inode.c | 9
fs/inode.c | 11
fs/proc/task_mmu.c | 3
fs/userfaultfd.c | 149 +++
include/linux/buffer_head.h | 4
include/linux/cma.h | 4
include/linux/compaction.h | 1
include/linux/fs.h | 2
include/linux/gfp.h | 2
include/linux/highmem.h | 7
include/linux/huge_mm.h | 3
include/linux/hugetlb.h | 37
include/linux/memcontrol.h | 27
include/linux/memory.h | 8
include/linux/memory_hotplug.h | 15
include/linux/memremap.h | 2
include/linux/migrate.h | 11
include/linux/mm.h | 28
include/linux/mmzone.h | 20
include/linux/pagemap.h | 5
include/linux/pgtable.h | 12
include/linux/sched.h | 2
include/linux/sched/mm.h | 27
include/linux/shrinker.h | 7
include/linux/swap.h | 21
include/linux/userfaultfd_k.h | 55 +
include/linux/vm_event_item.h | 8
include/trace/events/cma.h | 92 +-
include/trace/events/migrate.h | 25
include/trace/events/mmflags.h | 7
include/uapi/linux/mempolicy.h | 7
include/uapi/linux/userfaultfd.h | 36
init/Kconfig | 5
kernel/sysctl.c | 2
lib/Kconfig.kfence | 1
lib/iov_iter.c | 8
mm/Kconfig | 28
mm/Makefile | 6
mm/cma.c | 70 +
mm/cma.h | 25
mm/cma_debug.c | 8
mm/cma_sysfs.c | 112 ++
mm/compaction.c | 113 ++
mm/filemap.c | 24
mm/frontswap.c | 12
mm/gup.c | 264 +++---
mm/gup_test.c | 29
mm/gup_test.h | 3
mm/highmem.c | 11
mm/huge_memory.c | 326 +++++++-
mm/hugetlb.c | 843 ++++++++++++++--------
mm/hugetlb_cgroup.c | 9
mm/internal.h | 10
mm/kfence/core.c | 61 +
mm/khugepaged.c | 63 -
mm/ksm.c | 17
mm/list_lru.c | 6
mm/memcontrol.c | 137 ---
mm/memory_hotplug.c | 220 +++++
mm/mempolicy.c | 16
mm/mempool.c | 2
mm/migrate.c | 103 --
mm/mlock.c | 4
mm/mmap.c | 18
mm/oom_kill.c | 2
mm/page_alloc.c | 83 +-
mm/process_vm_access.c | 1
mm/shmem.c | 2
mm/sparse.c | 4
mm/swap.c | 69 +
mm/swap_state.c | 4
mm/swapfile.c | 4
mm/truncate.c | 19
mm/userfaultfd.c | 39 -
mm/util.c | 26
mm/vmalloc.c | 2
mm/vmscan.c | 543 +++++++++-----
mm/vmstat.c | 45 -
mm/workingset.c | 1
mm/zsmalloc.c | 6
mm/zswap.c | 2
tools/testing/selftests/vm/.gitignore | 1
tools/testing/selftests/vm/Makefile | 1
tools/testing/selftests/vm/gup_test.c | 38
tools/testing/selftests/vm/split_huge_page_test.c | 400 ++++++++++
tools/testing/selftests/vm/userfaultfd.c | 164 ++++
125 files changed, 3596 insertions(+), 1668 deletions(-)
* incoming
@ 2021-04-30 5:52 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-04-30 5:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
A few misc subsystems and some of MM.
178 patches, based on 8ca5297e7e38f2dc8c753d33a5092e7be181fff0.
Subsystems affected by this patch series:
ia64
kbuild
scripts
sh
ocfs2
kfifo
vfs
kernel/watchdog
mm/slab-generic
mm/slub
mm/kmemleak
mm/debug
mm/pagecache
mm/msync
mm/gup
mm/memremap
mm/memcg
mm/pagemap
mm/mremap
mm/dma
mm/sparsemem
mm/vmalloc
mm/documentation
mm/kasan
mm/initialization
mm/pagealloc
mm/memory-failure
Subsystem: ia64
Zhang Yunkai <zhang.yunkai@zte.com.cn>:
arch/ia64/kernel/head.S: remove duplicate include
Bhaskar Chowdhury <unixbhaskar@gmail.com>:
arch/ia64/kernel/fsys.S: fix typos
arch/ia64/include/asm/pgtable.h: minor typo fixes
Valentin Schneider <valentin.schneider@arm.com>:
ia64: ensure proper NUMA distance and possible map initialization
Sergei Trofimovich <slyfox@gentoo.org>:
ia64: drop unused IA64_FW_EMU ifdef
ia64: simplify code flow around swiotlb init
Bhaskar Chowdhury <unixbhaskar@gmail.com>:
ia64: trivial spelling fixes
Sergei Trofimovich <slyfox@gentoo.org>:
ia64: fix EFI_DEBUG build
ia64: mca: always make IA64_MCA_DEBUG an expression
ia64: drop marked broken DISCONTIGMEM and VIRTUAL_MEM_MAP
ia64: module: fix symbolizer crash on fdescr
Subsystem: kbuild
Luc Van Oostenryck <luc.vanoostenryck@gmail.com>:
include/linux/compiler-gcc.h: sparse can do constant folding of __builtin_bswap*()
Subsystem: scripts
Tom Saeger <tom.saeger@oracle.com>:
scripts/spelling.txt: add entries for recent discoveries
Wan Jiabing <wanjiabing@vivo.com>:
scripts: a new script for checking duplicate struct declaration
Subsystem: sh
Zhang Yunkai <zhang.yunkai@zte.com.cn>:
arch/sh/include/asm/tlb.h: remove duplicate include
Subsystem: ocfs2
Yang Li <yang.lee@linux.alibaba.com>:
ocfs2: replace DEFINE_SIMPLE_ATTRIBUTE with DEFINE_DEBUGFS_ATTRIBUTE
Joseph Qi <joseph.qi@linux.alibaba.com>:
ocfs2: map flags directly in flags_to_o2dlm()
Bhaskar Chowdhury <unixbhaskar@gmail.com>:
ocfs2: fix a typo
Jiapeng Chong <jiapeng.chong@linux.alibaba.com>:
ocfs2/dlm: remove unused function
Subsystem: kfifo
Dan Carpenter <dan.carpenter@oracle.com>:
kfifo: fix ternary sign extension bugs
Subsystem: vfs
Randy Dunlap <rdunlap@infradead.org>:
vfs: fs_parser: clean up kernel-doc warnings
Subsystem: kernel/watchdog
Petr Mladek <pmladek@suse.com>:
Patch series "watchdog/softlockup: Report overall time and some cleanup", v2:
watchdog: rename __touch_watchdog() to a better descriptive name
watchdog: explicitly update timestamp when reporting softlockup
watchdog/softlockup: report the overall time of softlockups
watchdog/softlockup: remove logic that tried to prevent repeated reports
watchdog: fix barriers when printing backtraces from all CPUs
watchdog: cleanup handling of false positives
Subsystem: mm/slab-generic
Rafael Aquini <aquini@redhat.com>:
mm/slab_common: provide "slab_merge" option for !IS_ENABLED(CONFIG_SLAB_MERGE_DEFAULT) builds
Subsystem: mm/slub
Vlastimil Babka <vbabka@suse.cz>:
mm, slub: enable slub_debug static key when creating cache with explicit debug flags
Oliver Glitta <glittao@gmail.com>:
kunit: add a KUnit test for SLUB debugging functionality
slub: remove resiliency_test() function
Bhaskar Chowdhury <unixbhaskar@gmail.com>:
mm/slub.c: trivial typo fixes
Subsystem: mm/kmemleak
Bhaskar Chowdhury <unixbhaskar@gmail.com>:
mm/kmemleak.c: fix a typo
Subsystem: mm/debug
Georgi Djakov <georgi.djakov@linaro.org>:
mm/page_owner: record the timestamp of all pages during free
zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>:
mm, page_owner: remove unused parameter in __set_page_owner_handle
Sergei Trofimovich <slyfox@gentoo.org>:
mm: page_owner: fetch backtrace only for tracked pages
mm: page_owner: use kstrtobool() to parse bool option
mm: page_owner: detect page_owner recursion via task_struct
mm: page_poison: print page info when corruption is caught
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/memtest: add ARCH_USE_MEMTEST
Subsystem: mm/pagecache
Jens Axboe <axboe@kernel.dk>:
Patch series "Improve IOCB_NOWAIT O_DIRECT reads", v3:
mm: provide filemap_range_needs_writeback() helper
mm: use filemap_range_needs_writeback() for O_DIRECT reads
iomap: use filemap_range_needs_writeback() for O_DIRECT reads
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/filemap: use filemap_read_page in filemap_fault
mm/filemap: drop check for truncated page after I/O
Johannes Weiner <hannes@cmpxchg.org>:
mm: page-writeback: simplify memcg handling in test_clear_page_writeback()
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: move page_mapping_file to pagemap.h
Rui Sun <sunrui26@huawei.com>:
mm/filemap: update stale comment
Subsystem: mm/msync
Nikita Ermakov <sh1r4s3@mail.si-head.nl>:
mm/msync: exit early when the flags is an MS_ASYNC and start < vm_start
Subsystem: mm/gup
Joao Martins <joao.m.martins@oracle.com>:
Patch series "mm/gup: page unpining improvements", v4:
mm/gup: add compound page list iterator
mm/gup: decrement head page once for group of subpages
mm/gup: add a range variant of unpin_user_pages_dirty_lock()
RDMA/umem: batch page unpin in __ib_umem_release()
Yang Shi <shy828301@gmail.com>:
mm: gup: remove FOLL_SPLIT
Subsystem: mm/memremap
Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
mm/memremap.c: fix improper SPDX comment style
Subsystem: mm/memcg
Muchun Song <songmuchun@bytedance.com>:
mm: memcontrol: fix kernel stack account
Shakeel Butt <shakeelb@google.com>:
memcg: cleanup root memcg checks
memcg: enable memcg oom-kill for __GFP_NOFAIL
Johannes Weiner <hannes@cmpxchg.org>:
Patch series "mm: memcontrol: switch to rstat", v3:
mm: memcontrol: fix cpuhotplug statistics flushing
mm: memcontrol: kill mem_cgroup_nodeinfo()
mm: memcontrol: privatize memcg_page_state query functions
cgroup: rstat: support cgroup1
cgroup: rstat: punt root-level optimization to individual controllers
mm: memcontrol: switch to rstat
mm: memcontrol: consolidate lruvec stat flushing
kselftests: cgroup: update kmem test for new vmstat implementation
Shakeel Butt <shakeelb@google.com>:
memcg: charge before adding to swapcache on swapin
Muchun Song <songmuchun@bytedance.com>:
Patch series "Use obj_cgroup APIs to charge kmem pages", v5:
mm: memcontrol: slab: fix obtain a reference to a freeing memcg
mm: memcontrol: introduce obj_cgroup_{un}charge_pages
mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c
mm: memcontrol: change ug->dummy_page only if memcg changed
mm: memcontrol: use obj_cgroup APIs to charge kmem pages
mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages()
mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM
Wan Jiabing <wanjiabing@vivo.com>:
linux/memcontrol.h: remove duplicate struct declaration
Johannes Weiner <hannes@cmpxchg.org>:
mm: page_counter: mitigate consequences of a page_counter underflow
Subsystem: mm/pagemap
Wang Qing <wangqing@vivo.com>:
mm/memory.c: do_numa_page(): delete bool "migrated"
Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
mm/interval_tree: add comments to improve code readability
Oscar Salvador <osalvador@suse.de>:
Patch series "Cleanup and fixups for vmemmap handling", v6:
x86/vmemmap: drop handling of 4K unaligned vmemmap range
x86/vmemmap: drop handling of 1GB vmemmap ranges
x86/vmemmap: handle unpopulated sub-pmd ranges
x86/vmemmap: optimize for consecutive sections in partial populated PMDs
Ovidiu Panait <ovidiu.panait@windriver.com>:
mm, tracing: improve rss_stat tracepoint message
Christoph Hellwig <hch@lst.de>:
Patch series "add remap_pfn_range_notrack instead of reinventing it in i915", v2:
mm: add remap_pfn_range_notrack
mm: add a io_mapping_map_user helper
i915: use io_mapping_map_user
i915: fix remap_io_sg to verify the pgprot
Huang Ying <ying.huang@intel.com>:
NUMA balancing: reduce TLB flush via delaying mapping on hint page fault
Subsystem: mm/mremap
Brian Geffon <bgeffon@google.com>:
Patch series "mm: Extend MREMAP_DONTUNMAP to non-anonymous mappings", v5:
mm: extend MREMAP_DONTUNMAP to non-anonymous mappings
Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio"
selftests: add a MREMAP_DONTUNMAP selftest for shmem
Subsystem: mm/dma
Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
mm/dmapool: switch from strlcpy to strscpy
Subsystem: mm/sparsemem
Wang Wensheng <wangwensheng4@huawei.com>:
mm/sparse: add the missing sparse_buffer_fini() in error branch
Subsystem: mm/vmalloc
Christoph Hellwig <hch@lst.de>:
Patch series "remap_vmalloc_range cleanups":
samples/vfio-mdev/mdpy: use remap_vmalloc_range
mm: unexport remap_vmalloc_range_partial
Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>:
mm/vmalloc: use rb_tree instead of list for vread() lookups
Nicholas Piggin <npiggin@gmail.com>:
Patch series "huge vmalloc mappings", v13:
ARM: mm: add missing pud_page define to 2-level page tables
mm/vmalloc: fix HUGE_VMAP regression by enabling huge pages in vmalloc_to_page
mm: apply_to_pte_range warn and fail if a large pte is encountered
mm/vmalloc: rename vmap_*_range vmap_pages_*_range
mm/ioremap: rename ioremap_*_range to vmap_*_range
mm: HUGE_VMAP arch support cleanup
powerpc: inline huge vmap supported functions
arm64: inline huge vmap supported functions
x86: inline huge vmap supported functions
mm/vmalloc: provide fallback arch huge vmap support functions
mm: move vmap_range from mm/ioremap.c to mm/vmalloc.c
mm/vmalloc: add vmap_range_noflush variant
mm/vmalloc: hugepage vmalloc mappings
Patch series "mm/vmalloc: cleanup after hugepage series", v2:
mm/vmalloc: remove map_kernel_range
kernel/dma: remove unnecessary unmap_kernel_range
powerpc/xive: remove unnecessary unmap_kernel_range
mm/vmalloc: remove unmap_kernel_range
mm/vmalloc: improve allocation failure error messages
Vijayanand Jitta <vjitta@codeaurora.org>:
mm: vmalloc: prevent use after free in _vm_unmap_aliases
"Uladzislau Rezki (Sony)" <urezki@gmail.com>:
lib/test_vmalloc.c: remove two kvfree_rcu() tests
lib/test_vmalloc.c: add a new 'nr_threads' parameter
vm/test_vmalloc.sh: adapt for updated driver interface
mm/vmalloc: refactor the preloading loagic
mm/vmalloc: remove an empty line
Subsystem: mm/documentation
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/doc: fix fault_flag_allow_retry_first kerneldoc
mm/doc: fix page_maybe_dma_pinned kerneldoc
mm/doc: turn fault flags into an enum
mm/doc: add mm.h and mm_types.h to the mm-api document
Lukas Bulwahn <lukas.bulwahn@gmail.com>:
Patch series "kernel-doc and MAINTAINERS clean-up":
MAINTAINERS: assign pagewalk.h to MEMORY MANAGEMENT
pagewalk: prefix struct kernel-doc descriptions
Subsystem: mm/kasan
Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
mm/kasan: switch from strlcpy to strscpy
Peter Collingbourne <pcc@google.com>:
kasan: fix kasan_byte_accessible() to be consistent with actual checks
Andrey Konovalov <andreyknvl@google.com>:
kasan: initialize shadow to TAG_INVALID for SW_TAGS
mm, kasan: don't poison boot memory with tag-based modes
Patch series "kasan: integrate with init_on_alloc/free", v3:
arm64: kasan: allow to init memory when setting tags
kasan: init memory in kasan_(un)poison for HW_TAGS
kasan, mm: integrate page_alloc init with HW_TAGS
kasan, mm: integrate slab init_on_alloc with HW_TAGS
kasan, mm: integrate slab init_on_free with HW_TAGS
kasan: docs: clean up sections
kasan: docs: update overview section
kasan: docs: update usage section
kasan: docs: update error reports section
kasan: docs: update boot parameters section
kasan: docs: update GENERIC implementation details section
kasan: docs: update SW_TAGS implementation details section
kasan: docs: update HW_TAGS implementation details section
kasan: docs: update shadow memory section
kasan: docs: update ignoring accesses section
kasan: docs: update tests section
Walter Wu <walter-zh.wu@mediatek.com>:
kasan: record task_work_add() call stack
Andrey Konovalov <andreyknvl@google.com>:
kasan: detect false-positives in tests
Zqiang <qiang.zhang@windriver.com>:
irq_work: record irq_work_queue() call stack
Subsystem: mm/initialization
Kefeng Wang <wangkefeng.wang@huawei.com>:
mm: move mem_init_print_info() into mm_init()
Subsystem: mm/pagealloc
David Hildenbrand <david@redhat.com>:
mm/page_alloc: drop pr_info_ratelimited() in alloc_contig_range()
Minchan Kim <minchan@kernel.org>:
mm: remove lru_add_drain_all in alloc_contig_range
Yu Zhao <yuzhao@google.com>:
include/linux/page-flags-layout.h: correctly determine LAST_CPUPID_WIDTH
include/linux/page-flags-layout.h: cleanups
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "Rationalise __alloc_pages wrappers", v3:
mm/page_alloc: rename alloc_mask to alloc_gfp
mm/page_alloc: rename gfp_mask to gfp
mm/page_alloc: combine __alloc_pages and __alloc_pages_nodemask
mm/mempolicy: rename alloc_pages_current to alloc_pages
mm/mempolicy: rewrite alloc_pages documentation
mm/mempolicy: rewrite alloc_pages_vma documentation
mm/mempolicy: fix mpol_misplaced kernel-doc
Minchan Kim <minchan@kernel.org>:
mm: page_alloc: dump migrate-failed pages
Geert Uytterhoeven <geert@linux-m68k.org>:
mm/Kconfig: remove default DISCONTIGMEM_MANUAL
Kefeng Wang <wangkefeng.wang@huawei.com>:
mm, page_alloc: avoid page_to_pfn() in move_freepages()
zhouchuangao <zhouchuangao@vivo.com>:
mm/page_alloc: duplicate include linux/vmalloc.h
Mel Gorman <mgorman@techsingularity.net>:
Patch series "Introduce a bulk order-0 page allocator with two in-tree users", v6:
mm/page_alloc: rename alloced to allocated
mm/page_alloc: add a bulk page allocator
mm/page_alloc: add an array-based interface to the bulk page allocator
Jesper Dangaard Brouer <brouer@redhat.com>:
mm/page_alloc: optimize code layout for __alloc_pages_bulk
mm/page_alloc: inline __rmqueue_pcplist
Chuck Lever <chuck.lever@oracle.com>:
Patch series "SUNRPC consumer for the bulk page allocator":
SUNRPC: set rq_page_end differently
SUNRPC: refresh rq_pages using a bulk page allocator
Jesper Dangaard Brouer <brouer@redhat.com>:
net: page_pool: refactor dma_map into own function page_pool_dma_map
net: page_pool: use alloc_pages_bulk in refill code path
Sergei Trofimovich <slyfox@gentoo.org>:
mm: page_alloc: ignore init_on_free=1 for debug_pagealloc=1
huxiang <huxiang@uniontech.com>:
mm/page_alloc: redundant definition variables of pfn in for loop
Mike Rapoport <rppt@linux.ibm.com>:
mm/mmzone.h: fix existing kernel-doc comments and link them to core-api
Subsystem: mm/memory-failure
Jane Chu <jane.chu@oracle.com>:
mm/memory-failure: unnecessary amount of unmapping
Documentation/admin-guide/kernel-parameters.txt | 7
Documentation/admin-guide/mm/transhuge.rst | 2
Documentation/core-api/cachetlb.rst | 4
Documentation/core-api/mm-api.rst | 6
Documentation/dev-tools/kasan.rst | 355 +++++-----
Documentation/vm/page_owner.rst | 2
Documentation/vm/transhuge.rst | 5
MAINTAINERS | 1
arch/Kconfig | 11
arch/alpha/mm/init.c | 1
arch/arc/mm/init.c | 1
arch/arm/Kconfig | 1
arch/arm/include/asm/pgtable-3level.h | 2
arch/arm/include/asm/pgtable.h | 3
arch/arm/mm/copypage-v4mc.c | 1
arch/arm/mm/copypage-v6.c | 1
arch/arm/mm/copypage-xscale.c | 1
arch/arm/mm/init.c | 2
arch/arm64/Kconfig | 1
arch/arm64/include/asm/memory.h | 4
arch/arm64/include/asm/mte-kasan.h | 39 -
arch/arm64/include/asm/vmalloc.h | 38 -
arch/arm64/mm/init.c | 4
arch/arm64/mm/mmu.c | 36 -
arch/csky/abiv1/cacheflush.c | 1
arch/csky/mm/init.c | 1
arch/h8300/mm/init.c | 2
arch/hexagon/mm/init.c | 1
arch/ia64/Kconfig | 23
arch/ia64/configs/bigsur_defconfig | 1
arch/ia64/include/asm/meminit.h | 11
arch/ia64/include/asm/module.h | 6
arch/ia64/include/asm/page.h | 25
arch/ia64/include/asm/pgtable.h | 7
arch/ia64/kernel/Makefile | 2
arch/ia64/kernel/acpi.c | 7
arch/ia64/kernel/efi.c | 11
arch/ia64/kernel/fsys.S | 4
arch/ia64/kernel/head.S | 6
arch/ia64/kernel/ia64_ksyms.c | 12
arch/ia64/kernel/machine_kexec.c | 2
arch/ia64/kernel/mca.c | 4
arch/ia64/kernel/module.c | 29
arch/ia64/kernel/pal.S | 6
arch/ia64/mm/Makefile | 1
arch/ia64/mm/contig.c | 4
arch/ia64/mm/discontig.c | 21
arch/ia64/mm/fault.c | 15
arch/ia64/mm/init.c | 221 ------
arch/m68k/mm/init.c | 1
arch/microblaze/mm/init.c | 1
arch/mips/Kconfig | 1
arch/mips/loongson64/numa.c | 1
arch/mips/mm/cache.c | 1
arch/mips/mm/init.c | 1
arch/mips/sgi-ip27/ip27-memory.c | 1
arch/nds32/mm/init.c | 1
arch/nios2/mm/cacheflush.c | 1
arch/nios2/mm/init.c | 1
arch/openrisc/mm/init.c | 2
arch/parisc/mm/init.c | 2
arch/powerpc/Kconfig | 1
arch/powerpc/include/asm/vmalloc.h | 34 -
arch/powerpc/kernel/isa-bridge.c | 4
arch/powerpc/kernel/pci_64.c | 2
arch/powerpc/mm/book3s64/radix_pgtable.c | 29
arch/powerpc/mm/ioremap.c | 2
arch/powerpc/mm/mem.c | 1
arch/powerpc/sysdev/xive/common.c | 4
arch/riscv/mm/init.c | 1
arch/s390/mm/init.c | 2
arch/sh/include/asm/tlb.h | 10
arch/sh/mm/cache-sh4.c | 1
arch/sh/mm/cache-sh7705.c | 1
arch/sh/mm/init.c | 1
arch/sparc/include/asm/pgtable_32.h | 3
arch/sparc/mm/init_32.c | 2
arch/sparc/mm/init_64.c | 1
arch/sparc/mm/tlb.c | 1
arch/um/kernel/mem.c | 1
arch/x86/Kconfig | 1
arch/x86/include/asm/vmalloc.h | 42 -
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2
arch/x86/mm/init_32.c | 2
arch/x86/mm/init_64.c | 222 ++++--
arch/x86/mm/ioremap.c | 33
arch/x86/mm/pgtable.c | 13
arch/xtensa/Kconfig | 1
arch/xtensa/mm/init.c | 1
block/blk-cgroup.c | 17
drivers/gpu/drm/i915/Kconfig | 1
drivers/gpu/drm/i915/gem/i915_gem_mman.c | 9
drivers/gpu/drm/i915/i915_drv.h | 3
drivers/gpu/drm/i915/i915_mm.c | 117 ---
drivers/infiniband/core/umem.c | 12
drivers/pci/pci.c | 2
fs/aio.c | 5
fs/fs_parser.c | 2
fs/iomap/direct-io.c | 24
fs/ocfs2/blockcheck.c | 2
fs/ocfs2/dlm/dlmrecovery.c | 7
fs/ocfs2/stack_o2cb.c | 36 -
fs/ocfs2/stackglue.c | 2
include/linux/compiler-gcc.h | 8
include/linux/fs.h | 2
include/linux/gfp.h | 45 -
include/linux/io-mapping.h | 3
include/linux/io.h | 9
include/linux/kasan.h | 51 +
include/linux/memcontrol.h | 271 ++++----
include/linux/mm.h | 50 -
include/linux/mmzone.h | 43 -
include/linux/page-flags-layout.h | 64 -
include/linux/pagemap.h | 10
include/linux/pagewalk.h | 4
include/linux/sched.h | 4
include/linux/slab.h | 2
include/linux/slub_def.h | 2
include/linux/vmalloc.h | 73 +-
include/linux/vmstat.h | 24
include/net/page_pool.h | 2
include/trace/events/kmem.h | 24
init/main.c | 2
kernel/cgroup/cgroup.c | 34 -
kernel/cgroup/rstat.c | 61 +
kernel/dma/remap.c | 1
kernel/fork.c | 13
kernel/irq_work.c | 7
kernel/task_work.c | 3
kernel/watchdog.c | 102 +--
lib/Kconfig.debug | 14
lib/Makefile | 1
lib/test_kasan.c | 59 -
lib/test_slub.c | 124 +++
lib/test_vmalloc.c | 128 +--
mm/Kconfig | 4
mm/Makefile | 1
mm/debug_vm_pgtable.c | 4
mm/dmapool.c | 2
mm/filemap.c | 61 +
mm/gup.c | 145 +++-
mm/hugetlb.c | 2
mm/internal.h | 25
mm/interval_tree.c | 2
mm/io-mapping.c | 29
mm/ioremap.c | 361 ++--------
mm/kasan/common.c | 53 -
mm/kasan/generic.c | 12
mm/kasan/kasan.h | 28
mm/kasan/report_generic.c | 2
mm/kasan/shadow.c | 10
mm/kasan/sw_tags.c | 12
mm/kmemleak.c | 2
mm/memcontrol.c | 798 ++++++++++++------------
mm/memory-failure.c | 2
mm/memory.c | 191 +++--
mm/mempolicy.c | 78 --
mm/mempool.c | 4
mm/memremap.c | 2
mm/migrate.c | 2
mm/mm_init.c | 4
mm/mmap.c | 6
mm/mremap.c | 6
mm/msync.c | 6
mm/page-writeback.c | 9
mm/page_alloc.c | 430 +++++++++---
mm/page_counter.c | 8
mm/page_owner.c | 68 --
mm/page_poison.c | 6
mm/percpu-vm.c | 7
mm/slab.c | 43 -
mm/slab.h | 24
mm/slab_common.c | 10
mm/slub.c | 215 ++----
mm/sparse.c | 1
mm/swap_state.c | 13
mm/util.c | 10
mm/vmalloc.c | 728 ++++++++++++++++-----
net/core/page_pool.c | 127 ++-
net/sunrpc/svc_xprt.c | 38 -
samples/kfifo/bytestream-example.c | 8
samples/kfifo/inttype-example.c | 8
samples/kfifo/record-example.c | 8
samples/vfio-mdev/mdpy.c | 4
scripts/checkdeclares.pl | 53 +
scripts/spelling.txt | 26
tools/testing/selftests/cgroup/test_kmem.c | 22
tools/testing/selftests/vm/mremap_dontunmap.c | 52 +
tools/testing/selftests/vm/test_vmalloc.sh | 21
189 files changed, 3642 insertions(+), 3013 deletions(-)
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2021-04-23 21:28 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-04-23 21:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
5 patches, based on 5bfc75d92efd494db37f5c4c173d3639d4772966.
Subsystems affected by this patch series:
coda
overlayfs
mm/pagecache
mm/memcg
Subsystem: coda
Christian König <christian.koenig@amd.com>:
coda: fix reference counting in coda_file_mmap error path
Subsystem: overlayfs
Christian König <christian.koenig@amd.com>:
ovl: fix reference counting in ovl_mmap error path
Subsystem: mm/pagecache
Hugh Dickins <hughd@google.com>:
mm/filemap: fix find_lock_entries hang on 32-bit THP
mm/filemap: fix mapping_seek_hole_data on THP & 32-bit
Subsystem: mm/memcg
Vasily Averin <vvs@virtuozzo.com>:
tools/cgroup/slabinfo.py: updated to work on current kernel
fs/coda/file.c | 6 +++---
fs/overlayfs/file.c | 11 +----------
mm/filemap.c | 31 +++++++++++++++++++------------
tools/cgroup/memcg_slabinfo.py | 8 ++++----
4 files changed, 27 insertions(+), 29 deletions(-)
* incoming
@ 2021-04-16 22:45 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-04-16 22:45 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
12 patches, based on 06c2aac4014c38247256fe49c61b7f55890271e7.
Subsystems affected by this patch series:
mm/documentation
mm/kasan
csky
ia64
mm/pagemap
gcov
lib
Subsystem: mm/documentation
Randy Dunlap <rdunlap@infradead.org>:
mm: eliminate "expecting prototype" kernel-doc warnings
Subsystem: mm/kasan
Arnd Bergmann <arnd@arndb.de>:
kasan: fix hwasan build for gcc
Walter Wu <walter-zh.wu@mediatek.com>:
kasan: remove redundant config option
Subsystem: csky
Randy Dunlap <rdunlap@infradead.org>:
csky: change a Kconfig symbol name to fix e1000 build error
Subsystem: ia64
Randy Dunlap <rdunlap@infradead.org>:
ia64: remove duplicate entries in generic_defconfig
ia64: fix discontig.c section mismatches
John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>:
ia64: tools: remove inclusion of ia64-specific version of errno.h header
John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>:
ia64: tools: remove duplicate definition of ia64_mf() on ia64
Subsystem: mm/pagemap
Zack Rusin <zackr@vmware.com>:
mm/mapping_dirty_helpers: guard hugepage pud's usage
Christophe Leroy <christophe.leroy@csgroup.eu>:
mm: ptdump: fix build failure
Subsystem: gcov
Johannes Berg <johannes.berg@intel.com>:
gcov: clang: fix clang-11+ build
Subsystem: lib
Randy Dunlap <rdunlap@infradead.org>:
lib: remove "expecting prototype" kernel-doc warnings
arch/arm64/kernel/sleep.S | 2 +-
arch/csky/Kconfig | 2 +-
arch/csky/include/asm/page.h | 2 +-
arch/ia64/configs/generic_defconfig | 2 --
arch/ia64/mm/discontig.c | 6 +++---
arch/x86/kernel/acpi/wakeup_64.S | 2 +-
include/linux/kasan.h | 2 +-
kernel/gcov/clang.c | 2 +-
lib/Kconfig.kasan | 9 ++-------
lib/earlycpio.c | 4 ++--
lib/lru_cache.c | 3 ++-
lib/parman.c | 4 ++--
lib/radix-tree.c | 11 ++++++-----
mm/kasan/common.c | 2 +-
mm/kasan/kasan.h | 2 +-
mm/kasan/report_generic.c | 2 +-
mm/mapping_dirty_helpers.c | 2 ++
mm/mmu_gather.c | 29 +++++++++++++++++++----------
mm/oom_kill.c | 2 +-
mm/ptdump.c | 2 +-
mm/shuffle.c | 4 ++--
scripts/Makefile.kasan | 22 ++++++++++++++--------
security/Kconfig.hardening | 4 ++--
tools/arch/ia64/include/asm/barrier.h | 3 ---
tools/include/uapi/asm/errno.h | 2 --
25 files changed, 67 insertions(+), 60 deletions(-)
* incoming
@ 2021-04-09 20:26 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-04-09 20:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
16 patches, based on 17e7124aad766b3f158943acb51467f86220afe9.
Subsystems affected by this patch series:
MAINTAINERS
mailmap
mm/kasan
mm/gup
nds32
gcov
ocfs2
ia64
mm/pagecache
mm/kasan
mm/kfence
lib
Subsystem: MAINTAINERS
Marek Behún <kabel@kernel.org>:
MAINTAINERS: update CZ.NIC's Turris information
treewide: change my e-mail address, fix my name
Subsystem: mailmap
Jordan Crouse <jordan@cosmicpenguin.net>:
mailmap: update email address for Jordan Crouse
Matthew Wilcox <willy@infradead.org>:
.mailmap: fix old email addresses
Subsystem: mm/kasan
Arnd Bergmann <arnd@arndb.de>:
kasan: fix hwasan build for gcc
Walter Wu <walter-zh.wu@mediatek.com>:
kasan: remove redundant config option
Subsystem: mm/gup
Aili Yao <yaoaili@kingsoft.com>:
mm/gup: check page posion status for coredump.
Subsystem: nds32
Mike Rapoport <rppt@linux.ibm.com>:
nds32: flush_dcache_page: use page_mapping_file to avoid races with swapoff
Subsystem: gcov
Nick Desaulniers <ndesaulniers@google.com>:
gcov: re-fix clang-11+ support
Subsystem: ocfs2
Wengang Wang <wen.gang.wang@oracle.com>:
ocfs2: fix deadlock between setattr and dio_end_io_write
Subsystem: ia64
Sergei Trofimovich <slyfox@gentoo.org>:
ia64: fix user_stack_pointer() for ptrace()
Subsystem: mm/pagecache
Jack Qiu <jack.qiu@huawei.com>:
fs: direct-io: fix missing sdio->boundary
Subsystem: mm/kasan
Andrey Konovalov <andreyknvl@google.com>:
kasan: fix conflict with page poisoning
Andrew Morton <akpm@linux-foundation.org>:
lib/test_kasan_module.c: suppress unused var warning
Subsystem: mm/kfence
Marco Elver <elver@google.com>:
kfence, x86: fix preemptible warning on KPTI-enabled systems
Subsystem: lib
Julian Braha <julianbraha@gmail.com>:
lib: fix kconfig dependency on ARCH_WANT_FRAME_POINTERS
.mailmap | 7 ++
Documentation/ABI/testing/debugfs-moxtet | 4 -
Documentation/ABI/testing/debugfs-turris-mox-rwtm | 2
Documentation/ABI/testing/sysfs-bus-moxtet-devices | 6 +-
Documentation/ABI/testing/sysfs-class-led-driver-turris-omnia | 2
Documentation/ABI/testing/sysfs-firmware-turris-mox-rwtm | 10 +--
Documentation/devicetree/bindings/leds/cznic,turris-omnia-leds.yaml | 2
MAINTAINERS | 13 +++-
arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts | 2
arch/arm64/kernel/sleep.S | 2
arch/ia64/include/asm/ptrace.h | 8 --
arch/nds32/mm/cacheflush.c | 2
arch/x86/include/asm/kfence.h | 7 ++
arch/x86/kernel/acpi/wakeup_64.S | 2
drivers/bus/moxtet.c | 4 -
drivers/firmware/turris-mox-rwtm.c | 4 -
drivers/gpio/gpio-moxtet.c | 4 -
drivers/leds/leds-turris-omnia.c | 4 -
drivers/mailbox/armada-37xx-rwtm-mailbox.c | 4 -
drivers/watchdog/armada_37xx_wdt.c | 4 -
fs/direct-io.c | 5 +
fs/ocfs2/aops.c | 11 ---
fs/ocfs2/file.c | 8 ++
include/dt-bindings/bus/moxtet.h | 2
include/linux/armada-37xx-rwtm-mailbox.h | 2
include/linux/kasan.h | 2
include/linux/moxtet.h | 2
kernel/gcov/clang.c | 29 ++++++----
lib/Kconfig.debug | 6 +-
lib/Kconfig.kasan | 9 ---
lib/test_kasan_module.c | 2
mm/gup.c | 4 +
mm/internal.h | 20 ++++++
mm/kasan/common.c | 2
mm/kasan/kasan.h | 2
mm/kasan/report_generic.c | 2
mm/page_poison.c | 4 +
scripts/Makefile.kasan | 18 ++++--
security/Kconfig.hardening | 4 -
39 files changed, 136 insertions(+), 91 deletions(-)
* incoming
@ 2021-03-25 4:36 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-03-25 4:36 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
14 patches, based on 7acac4b3196caee5e21fb5ea53f8bc124e6a16fc.
Subsystems affected by this patch series:
mm/hugetlb
mm/kasan
mm/gup
mm/selftests
mm/z3fold
squashfs
ia64
gcov
mm/kfence
mm/memblock
mm/highmem
mailmap
Subsystem: mm/hugetlb
Miaohe Lin <linmiaohe@huawei.com>:
hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
Subsystem: mm/kasan
Andrey Konovalov <andreyknvl@google.com>:
kasan: fix per-page tags for non-page_alloc pages
Subsystem: mm/gup
Sean Christopherson <seanjc@google.com>:
mm/mmu_notifiers: ensure range_end() is paired with range_start()
Subsystem: mm/selftests
Rong Chen <rong.a.chen@intel.com>:
selftests/vm: fix out-of-tree build
Subsystem: mm/z3fold
Thomas Hebb <tommyhebb@gmail.com>:
z3fold: prevent reclaim/free race for headless pages
Subsystem: squashfs
Sean Nyekjaer <sean@geanix.com>:
squashfs: fix inode lookup sanity checks
Phillip Lougher <phillip@squashfs.org.uk>:
squashfs: fix xattr id and id lookup sanity checks
Subsystem: ia64
Sergei Trofimovich <slyfox@gentoo.org>:
ia64: mca: allocate early mca with GFP_ATOMIC
ia64: fix format strings for err_inject
Subsystem: gcov
Nick Desaulniers <ndesaulniers@google.com>:
gcov: fix clang-11+ support
Subsystem: mm/kfence
Marco Elver <elver@google.com>:
kfence: make compatible with kmemleak
Subsystem: mm/memblock
Mike Rapoport <rppt@linux.ibm.com>:
mm: memblock: fix section mismatch warning again
Subsystem: mm/highmem
Ira Weiny <ira.weiny@intel.com>:
mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
Subsystem: mailmap
Andrey Konovalov <andreyknvl@google.com>:
mailmap: update Andrey Konovalov's email address
.mailmap | 1
arch/ia64/kernel/err_inject.c | 22 +++++------
arch/ia64/kernel/mca.c | 2 -
fs/squashfs/export.c | 8 +++-
fs/squashfs/id.c | 6 ++-
fs/squashfs/squashfs_fs.h | 1
fs/squashfs/xattr_id.c | 6 ++-
include/linux/hugetlb_cgroup.h | 15 ++++++-
include/linux/memblock.h | 4 +-
include/linux/mm.h | 18 +++++++--
include/linux/mmu_notifier.h | 10 ++---
kernel/gcov/clang.c | 69 ++++++++++++++++++++++++++++++++++++
mm/highmem.c | 4 +-
mm/hugetlb.c | 41 +++++++++++++++++++--
mm/hugetlb_cgroup.c | 10 ++++-
mm/kfence/core.c | 9 ++++
mm/kmemleak.c | 3 +
mm/mmu_notifier.c | 23 ++++++++++++
mm/z3fold.c | 16 +++++++-
tools/testing/selftests/vm/Makefile | 4 +-
20 files changed, 230 insertions(+), 42 deletions(-)
* incoming
@ 2021-03-13 5:06 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-03-13 5:06 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
29 patches, based on f78d76e72a4671ea52d12752d92077788b4f5d50.
Subsystems affected by this patch series:
mm/memblock
core-kernel
kconfig
mm/pagealloc
fork
mm/hugetlb
mm/highmem
binfmt
MAINTAINERS
kbuild
mm/kfence
mm/oom-kill
mm/madvise
mm/kasan
mm/userfaultfd
mm/memory-failure
ia64
mm/memcg
mm/zram
Subsystem: mm/memblock
Arnd Bergmann <arnd@arndb.de>:
memblock: fix section mismatch warning
Subsystem: core-kernel
Arnd Bergmann <arnd@arndb.de>:
stop_machine: mark helpers __always_inline
Subsystem: kconfig
Masahiro Yamada <masahiroy@kernel.org>:
init/Kconfig: make COMPILE_TEST depend on HAS_IOMEM
Subsystem: mm/pagealloc
Mike Rapoport <rppt@linux.ibm.com>:
mm/page_alloc.c: refactor initialization of struct page for holes in memory layout
Subsystem: fork
Fenghua Yu <fenghua.yu@intel.com>:
mm/fork: clear PASID for new mm
Subsystem: mm/hugetlb
Peter Xu <peterx@redhat.com>:
Patch series "mm/hugetlb: Early cow on fork, and a few cleanups", v5:
hugetlb: dedup the code to add a new file_region
hugetlb: break earlier in add_reservation_in_range() when we can
mm: introduce page_needs_cow_for_dma() for deciding whether cow
mm: use is_cow_mapping() across tree where proper
hugetlb: do early cow when page pinned on src mm
Subsystem: mm/highmem
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
mm/highmem.c: fix zero_user_segments() with start > end
Subsystem: binfmt
Lior Ribak <liorribak@gmail.com>:
binfmt_misc: fix possible deadlock in bm_register_write
Subsystem: MAINTAINERS
Vlastimil Babka <vbabka@suse.cz>:
MAINTAINERS: exclude uapi directories in API/ABI section
Subsystem: kbuild
Arnd Bergmann <arnd@arndb.de>:
linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
Subsystem: mm/kfence
Marco Elver <elver@google.com>:
kfence: fix printk format for ptrdiff_t
kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations
kfence: fix reports if constant function prefixes exist
Subsystem: mm/oom-kill
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
include/linux/sched/mm.h: use rcu_dereference in in_vfork()
Subsystem: mm/madvise
Suren Baghdasaryan <surenb@google.com>:
mm/madvise: replace ptrace attach requirement for process_madvise
Subsystem: mm/kasan
Andrey Konovalov <andreyknvl@google.com>:
kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC
kasan: fix KASAN_STACK dependency for HW_TAGS
Subsystem: mm/userfaultfd
Nadav Amit <namit@vmware.com>:
mm/userfaultfd: fix memory corruption due to writeprotect
Subsystem: mm/memory-failure
Naoya Horiguchi <naoya.horiguchi@nec.com>:
mm, hwpoison: do not lock page again when me_huge_page() successfully recovers
Subsystem: ia64
Sergei Trofimovich <slyfox@gentoo.org>:
ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
Subsystem: mm/memcg
Zhou Guanghui <zhouguanghui1@huawei.com>:
mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument
mm/memcg: set memcg when splitting page
Subsystem: mm/zram
Minchan Kim <minchan@kernel.org>:
zram: fix return value on writeback_store
zram: fix broken page writeback
MAINTAINERS | 4
arch/ia64/include/asm/syscall.h | 2
arch/ia64/kernel/ptrace.c | 24 +++-
drivers/block/zram/zram_drv.c | 17 +-
drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c | 4
drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 2
fs/binfmt_misc.c | 29 ++---
fs/proc/task_mmu.c | 2
include/linux/compiler-clang.h | 6 +
include/linux/memblock.h | 4
include/linux/memcontrol.h | 6 -
include/linux/mm.h | 21 +++
include/linux/mm_types.h | 1
include/linux/sched/mm.h | 3
include/linux/stop_machine.h | 11 +
init/Kconfig | 3
kernel/fork.c | 8 +
lib/Kconfig.kasan | 1
mm/highmem.c | 17 ++
mm/huge_memory.c | 10 -
mm/hugetlb.c | 123 +++++++++++++++------
mm/internal.h | 5
mm/kfence/report.c | 30 +++--
mm/madvise.c | 13 ++
mm/memcontrol.c | 15 +-
mm/memory-failure.c | 4
mm/memory.c | 16 +-
mm/page_alloc.c | 167 ++++++++++++++---------------
mm/slab.c | 2
29 files changed, 334 insertions(+), 216 deletions(-)
* Re: incoming
2021-02-26 17:55 ` incoming Linus Torvalds
@ 2021-02-26 19:16 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-02-26 19:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, Linux-MM
On Fri, 26 Feb 2021 09:55:27 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, Feb 25, 2021 at 5:14 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > - The rest of MM.
> >
> > Includes kfence - another runtime memory validator. Not as
> > thorough as KASAN, but it has unmeasurable overhead and is intended
> > to be usable in production builds.
> >
> > - Everything else
>
> Just to clarify: you have nothing else really pending?
Yes, that's it from me for -rc1.
* Re: incoming
2021-02-26 1:14 incoming Andrew Morton
@ 2021-02-26 17:55 ` Linus Torvalds
2021-02-26 19:16 ` incoming Andrew Morton
0 siblings, 1 reply; 786+ messages in thread
From: Linus Torvalds @ 2021-02-26 17:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: mm-commits, Linux-MM
On Thu, Feb 25, 2021 at 5:14 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> - The rest of MM.
>
> Includes kfence - another runtime memory validator. Not as
> thorough as KASAN, but it has unmeasurable overhead and is intended
> to be usable in production builds.
>
> - Everything else
Just to clarify: you have nothing else really pending?
I'm hoping to just do -rc1 this weekend after all - despite my late
start due to loss of power for several days.
I'll allow late stragglers with good reason through, but the fewer of
those there are, the better, of course.
Thanks,
Linus
* incoming
@ 2021-02-26 1:14 Andrew Morton
2021-02-26 17:55 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2021-02-26 1:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
- The rest of MM.
Includes kfence - another runtime memory validator. Not as
thorough as KASAN, but it has unmeasurable overhead and is intended
to be usable in production builds.
- Everything else
118 patches, based on 6fbd6cf85a3be127454a1ad58525a3adcf8612ab.
Subsystems affected by this patch series:
mm/thp
mm/cma
mm/vmstat
mm/memory-hotplug
mm/mlock
mm/rmap
mm/zswap
mm/zsmalloc
mm/cleanups
mm/kfence
mm/kasan2
alpha
procfs
sysctl
misc
core-kernel
MAINTAINERS
lib
bitops
checkpatch
init
coredump
seq_file
gdb
ubsan
initramfs
mm/pagemap2
Subsystem: mm/thp
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "Overhaul multi-page lookups for THP", v4:
mm: make pagecache tagged lookups return only head pages
mm/shmem: use pagevec_lookup in shmem_unlock_mapping
mm/swap: optimise get_shadow_from_swap_cache
mm: add FGP_ENTRY
mm/filemap: rename find_get_entry to mapping_get_entry
mm/filemap: add helper for finding pages
mm/filemap: add mapping_seek_hole_data
iomap: use mapping_seek_hole_data
mm: add and use find_lock_entries
mm: add an 'end' parameter to find_get_entries
mm: add an 'end' parameter to pagevec_lookup_entries
mm: remove nr_entries parameter from pagevec_lookup_entries
mm: pass pvec directly to find_get_entries
mm: remove pagevec_lookup_entries
Rik van Riel <riel@surriel.com>:
Patch series "mm,thp,shm: limit shmem THP alloc gfp_mask", v6:
mm,thp,shmem: limit shmem THP alloc gfp_mask
mm,thp,shm: limit gfp mask to no more than specified
mm,thp,shmem: make khugepaged obey tmpfs mount flags
mm,shmem,thp: limit shmem THP allocations to requested zones
Subsystem: mm/cma
Roman Gushchin <guro@fb.com>:
mm: cma: allocate cma areas bottom-up
David Hildenbrand <david@redhat.com>:
mm/cma: expose all pages to the buddy if activation of an area fails
mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
Patrick Daly <pdaly@codeaurora.org>:
mm: cma: print region name on failure
Subsystem: mm/vmstat
Johannes Weiner <hannes@cmpxchg.org>:
mm: vmstat: fix NOHZ wakeups for node stat changes
mm: vmstat: add some comments on internal storage of byte items
Jiang Biao <benbjiang@tencent.com>:
mm/vmstat.c: erase latency in vmstat_shepherd
Subsystem: mm/memory-hotplug
Dan Williams <dan.j.williams@intel.com>:
Patch series "mm: Fix pfn_to_online_page() with respect to ZONE_DEVICE", v4:
mm: move pfn_to_online_page() out of line
mm: teach pfn_to_online_page() to consider subsection validity
mm: teach pfn_to_online_page() about ZONE_DEVICE section collisions
mm: fix memory_failure() handling of dax-namespace metadata
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/memory_hotplug: rename all existing 'memhp' into 'mhp'
David Hildenbrand <david@redhat.com>:
mm/memory_hotplug: MEMHP_MERGE_RESOURCE -> MHP_MERGE_RESOURCE
Miaohe Lin <linmiaohe@huawei.com>:
mm/memory_hotplug: use helper function zone_end_pfn() to get end_pfn
David Hildenbrand <david@redhat.com>:
drivers/base/memory: don't store phys_device in memory blocks
Documentation: sysfs/memory: clarify some memory block device properties
Anshuman Khandual <anshuman.khandual@arm.com>:
Patch series "mm/memory_hotplug: Pre-validate the address range with platform", v5:
mm/memory_hotplug: prevalidate the address range being added with platform
arm64/mm: define arch_get_mappable_range()
s390/mm: define arch_get_mappable_range()
David Hildenbrand <david@redhat.com>:
virtio-mem: check against mhp_get_pluggable_range() which memory we can hotplug
Subsystem: mm/mlock
Miaohe Lin <linmiaohe@huawei.com>:
mm/mlock: stop counting mlocked pages when none vma is found
Subsystem: mm/rmap
Miaohe Lin <linmiaohe@huawei.com>:
mm/rmap: correct some obsolete comments of anon_vma
mm/rmap: remove unneeded semicolon in page_not_mapped()
mm/rmap: fix obsolete comment in __page_check_anon_rmap()
mm/rmap: use page_not_mapped in try_to_unmap()
mm/rmap: correct obsolete comment of page_get_anon_vma()
mm/rmap: fix potential pte_unmap on an not mapped pte
Subsystem: mm/zswap
Randy Dunlap <rdunlap@infradead.org>:
mm: zswap: clean up confusing comment
Tian Tao <tiantao6@hisilicon.com>:
Patch series "Fix the compatibility of zsmalloc and zswap":
mm/zswap: add the flag can_sleep_mapped
mm: set the sleep_mapped to true for zbud and z3fold
Subsystem: mm/zsmalloc
Miaohe Lin <linmiaohe@huawei.com>:
mm/zsmalloc.c: convert to use kmem_cache_zalloc in cache_alloc_zspage()
Rokudo Yan <wu-yan@tcl.com>:
zsmalloc: account the number of compacted pages correctly
Miaohe Lin <linmiaohe@huawei.com>:
mm/zsmalloc.c: use page_private() to access page->private
Subsystem: mm/cleanups
Guo Ren <guoren@linux.alibaba.com>:
mm: page-flags.h: Typo fix (It -> If)
Daniel Vetter <daniel.vetter@ffwll.ch>:
mm/dmapool: use might_alloc()
mm/backing-dev.c: use might_alloc()
Stephen Zhang <stephenzhangzsd@gmail.com>:
mm/early_ioremap.c: use __func__ instead of function name
Subsystem: mm/kfence
Alexander Potapenko <glider@google.com>:
Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7:
mm: add Kernel Electric-Fence infrastructure
x86, kfence: enable KFENCE for x86
Marco Elver <elver@google.com>:
arm64, kfence: enable KFENCE for ARM64
kfence: use pt_regs to generate stack trace on faults
Alexander Potapenko <glider@google.com>:
mm, kfence: insert KFENCE hooks for SLAB
mm, kfence: insert KFENCE hooks for SLUB
kfence, kasan: make KFENCE compatible with KASAN
Marco Elver <elver@google.com>:
kfence, Documentation: add KFENCE documentation
kfence: add test suite
MAINTAINERS: add entry for KFENCE
kfence: report sensitive information based on no_hash_pointers
Alexander Potapenko <glider@google.com>:
Patch series "Add error_report_end tracepoint to KFENCE and KASAN", v3:
tracing: add error_report_end trace point
kfence: use error_report_end tracepoint
kasan: use error_report_end tracepoint
Subsystem: mm/kasan2
Andrey Konovalov <andreyknvl@google.com>:
Patch series "kasan: optimizations and fixes for HW_TAGS", v4:
kasan, mm: don't save alloc stacks twice
kasan, mm: optimize kmalloc poisoning
kasan: optimize large kmalloc poisoning
kasan: clean up setting free info in kasan_slab_free
kasan: unify large kfree checks
kasan: rework krealloc tests
kasan, mm: fail krealloc on freed objects
kasan, mm: optimize krealloc poisoning
kasan: ensure poisoning size alignment
arm64: kasan: simplify and inline MTE functions
kasan: inline HW_TAGS helper functions
kasan: clarify that only first bug is reported in HW_TAGS
Subsystem: alpha
Randy Dunlap <rdunlap@infradead.org>:
alpha: remove CONFIG_EXPERIMENTAL from defconfigs
Subsystem: procfs
Helge Deller <deller@gmx.de>:
proc/wchan: use printk format instead of lookup_symbol_name()
Josef Bacik <josef@toxicpanda.com>:
proc: use kvzalloc for our kernel buffer
Subsystem: sysctl
Lin Feng <linf@wangsu.com>:
sysctl.c: fix underflow value setting risk in vm_table
Subsystem: misc
Randy Dunlap <rdunlap@infradead.org>:
include/linux: remove repeated words
Miguel Ojeda <ojeda@kernel.org>:
treewide: Miguel has moved
Subsystem: core-kernel
Hubert Jasudowicz <hubert.jasudowicz@gmail.com>:
groups: use flexible-array member in struct group_info
groups: simplify struct group_info allocation
Randy Dunlap <rdunlap@infradead.org>:
kernel: delete repeated words in comments
Subsystem: MAINTAINERS
Vlastimil Babka <vbabka@suse.cz>:
MAINTAINERS: add uapi directories to API/ABI section
Subsystem: lib
Huang Shijie <sjhuang@iluvatar.ai>:
lib/genalloc.c: change return type to unsigned long for bitmap_set_ll
Francis Laniel <laniel_francis@privacyrequired.com>:
string.h: move fortified functions definitions in a dedicated header.
Yogesh Lal <ylal@codeaurora.org>:
lib: stackdepot: add support to configure STACK_HASH_SIZE
Vijayanand Jitta <vjitta@codeaurora.org>:
lib: stackdepot: add support to disable stack depot
lib: stackdepot: fix ignoring return value warning
Masahiro Yamada <masahiroy@kernel.org>:
lib/cmdline: remove an unneeded local variable in next_arg()
Subsystem: bitops
Geert Uytterhoeven <geert+renesas@glider.be>:
include/linux/bitops.h: spelling s/synomyn/synonym/
Subsystem: checkpatch
Joe Perches <joe@perches.com>:
checkpatch: improve blank line after declaration test
Peng Wang <rocking@linux.alibaba.com>:
checkpatch: ignore warning designated initializers using NR_CPUS
Dwaipayan Ray <dwaipayanray1@gmail.com>:
checkpatch: trivial style fixes
Joe Perches <joe@perches.com>:
checkpatch: prefer ftrace over function entry/exit printks
checkpatch: improve TYPECAST_INT_CONSTANT test message
Aditya Srivastava <yashsri421@gmail.com>:
checkpatch: add warning for avoiding .L prefix symbols in assembly files
Joe Perches <joe@perches.com>:
checkpatch: add kmalloc_array_node to unnecessary OOM message check
Chris Down <chris@chrisdown.name>:
checkpatch: don't warn about colon termination in linker scripts
Song Liu <songliubraving@fb.com>:
checkpatch: do not apply "initialise globals to 0" check to BPF progs
Subsystem: init
Masahiro Yamada <masahiroy@kernel.org>:
init/version.c: remove Version_<LINUX_VERSION_CODE> symbol
init: clean up early_param_on_off() macro
Bhaskar Chowdhury <unixbhaskar@gmail.com>:
init/Kconfig: fix a typo in CC_VERSION_TEXT help text
Subsystem: coredump
Ira Weiny <ira.weiny@intel.com>:
fs/coredump: use kmap_local_page()
Subsystem: seq_file
NeilBrown <neilb@suse.de>:
Patch series "Fix some seq_file users that were recently broken":
seq_file: document how per-entry resources are managed.
x86: fix seq_file iteration for pat/memtype.c
Subsystem: gdb
George Prekas <prekageo@amazon.com>:
scripts/gdb: fix list_for_each
Sumit Garg <sumit.garg@linaro.org>:
kgdb: fix to kill breakpoints on initmem after boot
Subsystem: ubsan
Andrey Ryabinin <ryabinin.a.a@gmail.com>:
ubsan: remove overflow checks
Subsystem: initramfs
Florian Fainelli <f.fainelli@gmail.com>:
initramfs: panic with memory information
Subsystem: mm/pagemap2
Huang Pei <huangpei@loongson.cn>:
MIPS: make userspace mapping young by default
.mailmap | 1
CREDITS | 9
Documentation/ABI/testing/sysfs-devices-memory | 58 -
Documentation/admin-guide/auxdisplay/cfag12864b.rst | 2
Documentation/admin-guide/auxdisplay/ks0108.rst | 2
Documentation/admin-guide/kernel-parameters.txt | 6
Documentation/admin-guide/mm/memory-hotplug.rst | 20
Documentation/dev-tools/index.rst | 1
Documentation/dev-tools/kasan.rst | 8
Documentation/dev-tools/kfence.rst | 318 +++++++
Documentation/filesystems/seq_file.rst | 6
MAINTAINERS | 26
arch/alpha/configs/defconfig | 1
arch/arm64/Kconfig | 1
arch/arm64/include/asm/cache.h | 1
arch/arm64/include/asm/kasan.h | 1
arch/arm64/include/asm/kfence.h | 26
arch/arm64/include/asm/mte-def.h | 2
arch/arm64/include/asm/mte-kasan.h | 65 +
arch/arm64/include/asm/mte.h | 2
arch/arm64/kernel/mte.c | 46 -
arch/arm64/lib/mte.S | 16
arch/arm64/mm/fault.c | 8
arch/arm64/mm/mmu.c | 23
arch/mips/mm/cache.c | 30
arch/s390/mm/init.c | 1
arch/s390/mm/vmem.c | 14
arch/x86/Kconfig | 1
arch/x86/include/asm/kfence.h | 76 +
arch/x86/mm/fault.c | 10
arch/x86/mm/pat/memtype.c | 4
drivers/auxdisplay/cfag12864b.c | 4
drivers/auxdisplay/cfag12864bfb.c | 4
drivers/auxdisplay/ks0108.c | 4
drivers/base/memory.c | 35
drivers/block/zram/zram_drv.c | 2
drivers/hv/hv_balloon.c | 2
drivers/virtio/virtio_mem.c | 43
drivers/xen/balloon.c | 2
fs/coredump.c | 4
fs/iomap/seek.c | 125 --
fs/proc/base.c | 21
fs/proc/proc_sysctl.c | 4
include/linux/bitops.h | 2
include/linux/cfag12864b.h | 2
include/linux/cred.h | 2
include/linux/fortify-string.h | 302 ++++++
include/linux/gfp.h | 2
include/linux/init.h | 4
include/linux/kasan.h | 25
include/linux/kfence.h | 230 +++++
include/linux/kgdb.h | 2
include/linux/khugepaged.h | 2
include/linux/ks0108.h | 2
include/linux/mdev.h | 2
include/linux/memory.h | 3
include/linux/memory_hotplug.h | 33
include/linux/memremap.h | 6
include/linux/mmzone.h | 49 -
include/linux/page-flags.h | 4
include/linux/pagemap.h | 10
include/linux/pagevec.h | 10
include/linux/pgtable.h | 8
include/linux/ptrace.h | 2
include/linux/rmap.h | 3
include/linux/slab_def.h | 3
include/linux/slub_def.h | 3
include/linux/stackdepot.h | 9
include/linux/string.h | 282 ------
include/linux/vmstat.h | 6
include/linux/zpool.h | 3
include/linux/zsmalloc.h | 2
include/trace/events/error_report.h | 74 +
include/uapi/linux/firewire-cdev.h | 2
include/uapi/linux/input.h | 2
init/Kconfig | 2
init/initramfs.c | 19
init/main.c | 6
init/version.c | 8
kernel/debug/debug_core.c | 11
kernel/events/core.c | 8
kernel/events/uprobes.c | 2
kernel/groups.c | 7
kernel/locking/rtmutex.c | 4
kernel/locking/rwsem.c | 2
kernel/locking/semaphore.c | 2
kernel/sched/fair.c | 2
kernel/sched/membarrier.c | 2
kernel/sysctl.c | 8
kernel/trace/Makefile | 1
kernel/trace/error_report-traces.c | 12
lib/Kconfig | 9
lib/Kconfig.debug | 1
lib/Kconfig.kfence | 84 +
lib/Kconfig.ubsan | 17
lib/cmdline.c | 7
lib/genalloc.c | 3
lib/stackdepot.c | 41
lib/test_kasan.c | 111 ++
lib/test_ubsan.c | 49 -
lib/ubsan.c | 68 -
mm/Makefile | 1
mm/backing-dev.c | 3
mm/cma.c | 64 -
mm/dmapool.c | 3
mm/early_ioremap.c | 12
mm/filemap.c | 361 +++++---
mm/huge_memory.c | 6
mm/internal.h | 6
mm/kasan/common.c | 213 +++-
mm/kasan/generic.c | 3
mm/kasan/hw_tags.c | 2
mm/kasan/kasan.h | 97 +-
mm/kasan/report.c | 8
mm/kasan/shadow.c | 78 +
mm/kfence/Makefile | 6
mm/kfence/core.c | 875 +++++++++++++++++++-
mm/kfence/kfence.h | 126 ++
mm/kfence/kfence_test.c | 860 +++++++++++++++++++
mm/kfence/report.c | 350 ++++++--
mm/khugepaged.c | 22
mm/memory-failure.c | 6
mm/memory.c | 4
mm/memory_hotplug.c | 178 +++-
mm/memremap.c | 23
mm/mlock.c | 2
mm/page_alloc.c | 1
mm/rmap.c | 24
mm/shmem.c | 160 +--
mm/slab.c | 38
mm/slab_common.c | 29
mm/slub.c | 63 +
mm/swap.c | 54 -
mm/swap_state.c | 7
mm/truncate.c | 141 ---
mm/vmstat.c | 35
mm/z3fold.c | 1
mm/zbud.c | 1
mm/zpool.c | 13
mm/zsmalloc.c | 22
mm/zswap.c | 57 +
samples/auxdisplay/cfag12864b-example.c | 2
scripts/Makefile.ubsan | 2
scripts/checkpatch.pl | 152 ++-
scripts/gdb/linux/lists.py | 5
145 files changed, 5046 insertions(+), 1682 deletions(-)
* Re: incoming
2021-02-25 9:12 ` incoming Andrey Ryabinin
(?)
@ 2021-02-25 11:07 ` Walter Wu
-1 siblings, 0 replies; 786+ messages in thread
From: Walter Wu @ 2021-02-25 11:07 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: Arnd Bergmann, Linus Torvalds, Andrew Morton, Dmitry Vyukov,
Nathan Chancellor, Arnd Bergmann, Andrey Konovalov, Linux-MM,
mm-commits, Andrey Ryabinin, Alexander Potapenko
Hi Andrey,
On Thu, 2021-02-25 at 12:12 +0300, Andrey Ryabinin wrote:
> On Thu, Feb 25, 2021 at 11:53 AM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > On Wed, Feb 24, 2021 at 10:37 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > On Wed, Feb 24, 2021 at 1:30 PM Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > > >
> > > > Hmm. I haven't bisected things yet, but I suspect it's something with
> > > > the KASAN patches. With this all applied, I get:
> > > >
> > > > lib/crypto/curve25519-hacl64.c: In function ‘ladder_cmult.constprop’:
> > > > lib/crypto/curve25519-hacl64.c:601:1: warning: the frame size of
> > > > 2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> > > >
> > > > and
> > > >
> > > > lib/bitfield_kunit.c: In function ‘test_bitfields_constants’:
> > > > lib/bitfield_kunit.c:93:1: warning: the frame size of 11200 bytes is
> > > > larger than 2048 bytes [-Wframe-larger-than=]
> > > >
> > > > which is obviously not really acceptable. An 11kB stack frame _will_
> > > > cause issues.
> > >
> > > A quick bisect shows that this was introduced by "[patch 101/173]
> > > kasan: remove redundant config option".
> > >
> > > I didn't check what part of that patch screws up, but it's definitely
> > > doing something bad.
> >
> > I'm not sure why that patch surfaced the bug, but it's worth pointing
> > out that the underlying problem is asan-stack in combination
> > with the structleak plugin. This will happen for every user of kunit.
> >
>
> The patch didn't update KASAN_STACK dependency in kconfig:
> config GCC_PLUGIN_STRUCTLEAK_BYREF
> ....
> depends on !(KASAN && KASAN_STACK=1)
>
> This 'depends on' stopped working with the patch
Thanks for pointing out this problem. I will re-send that patch.
Walter
* Re: incoming
2021-02-25 8:53 ` incoming Arnd Bergmann
@ 2021-02-25 9:12 ` Andrey Ryabinin
-1 siblings, 0 replies; 786+ messages in thread
From: Andrey Ryabinin @ 2021-02-25 9:12 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Linus Torvalds, Andrew Morton, Walter Wu, Dmitry Vyukov,
Nathan Chancellor, Arnd Bergmann, Andrey Konovalov, Linux-MM,
mm-commits, Andrey Ryabinin, Alexander Potapenko
On Thu, Feb 25, 2021 at 11:53 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Wed, Feb 24, 2021 at 10:37 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Wed, Feb 24, 2021 at 1:30 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > Hmm. I haven't bisected things yet, but I suspect it's something with
> > > the KASAN patches. With this all applied, I get:
> > >
> > > lib/crypto/curve25519-hacl64.c: In function ‘ladder_cmult.constprop’:
> > > lib/crypto/curve25519-hacl64.c:601:1: warning: the frame size of
> > > 2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> > >
> > > and
> > >
> > > lib/bitfield_kunit.c: In function ‘test_bitfields_constants’:
> > > lib/bitfield_kunit.c:93:1: warning: the frame size of 11200 bytes is
> > > larger than 2048 bytes [-Wframe-larger-than=]
> > >
> > > which is obviously not really acceptable. An 11kB stack frame _will_
> > > cause issues.
> >
> > A quick bisect shows that this was introduced by "[patch 101/173]
> > kasan: remove redundant config option".
> >
> > I didn't check what part of that patch screws up, but it's definitely
> > doing something bad.
>
> I'm not sure why that patch surfaced the bug, but it's worth pointing
> out that the underlying problem is asan-stack in combination
> with the structleak plugin. This will happen for every user of kunit.
>
The patch didn't update KASAN_STACK dependency in kconfig:
config GCC_PLUGIN_STRUCTLEAK_BYREF
....
depends on !(KASAN && KASAN_STACK=1)
This 'depends on' stopped working with the patch
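To illustrate how such a guard can silently stop matching, here is a minimal Python sketch. It assumes (the thread does not spell this out) that the patch turned KASAN_STACK from an int symbol with the values 0/1 into a bool symbol whose enabled value is y. The helper name and the dictionaries are hypothetical; they model only the one expression quoted above, not Kconfig's real evaluator.

```python
# Simplified model of: depends on !(KASAN && KASAN_STACK=1)
# Assumption: symbol values are compared as strings, and an enabled
# bool symbol has the value "y", never "1".

def structleak_byref_allowed(config):
    """Return True if the modelled 'depends on' expression is satisfied."""
    kasan = config.get("KASAN", "n") == "y"
    kasan_stack_is_1 = config.get("KASAN_STACK", "") == "1"
    return not (kasan and kasan_stack_is_1)

# Before (assumed): KASAN_STACK is an int symbol, 1 when stack
# instrumentation is on.
before = {"KASAN": "y", "KASAN_STACK": "1"}
# After (assumed): KASAN_STACK is a bool symbol, "y" when enabled.
after = {"KASAN": "y", "KASAN_STACK": "y"}

print(structleak_byref_allowed(before))  # guard blocks the plugin option
print(structleak_byref_allowed(after))   # guard no longer fires
```

Under these assumptions the comparison against 1 can never be true once the symbol is a bool, so the plugin option becomes selectable together with KASAN stack instrumentation.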
* Re: incoming
2021-02-24 21:37 ` incoming Linus Torvalds
@ 2021-02-25 8:53 ` Arnd Bergmann
-1 siblings, 0 replies; 786+ messages in thread
From: Arnd Bergmann @ 2021-02-25 8:53 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, Walter Wu, Dmitry Vyukov, Nathan Chancellor,
Arnd Bergmann, Andrey Konovalov, Linux-MM, mm-commits,
Andrey Ryabinin, Alexander Potapenko
On Wed, Feb 24, 2021 at 10:37 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, Feb 24, 2021 at 1:30 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Hmm. I haven't bisected things yet, but I suspect it's something with
> > the KASAN patches. With this all applied, I get:
> >
> > lib/crypto/curve25519-hacl64.c: In function ‘ladder_cmult.constprop’:
> > lib/crypto/curve25519-hacl64.c:601:1: warning: the frame size of
> > 2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> >
> > and
> >
> > lib/bitfield_kunit.c: In function ‘test_bitfields_constants’:
> > lib/bitfield_kunit.c:93:1: warning: the frame size of 11200 bytes is
> > larger than 2048 bytes [-Wframe-larger-than=]
> >
> > which is obviously not really acceptable. An 11kB stack frame _will_
> > cause issues.
>
> A quick bisect shows that this was introduced by "[patch 101/173]
> kasan: remove redundant config option".
>
> I didn't check what part of that patch screws up, but it's definitely
> doing something bad.
I'm not sure why that patch surfaced the bug, but it's worth pointing
out that the underlying problem is asan-stack in combination
with the structleak plugin. This will happen for every user of kunit.
I sent a series[1] out earlier this year to turn off the structleak
plugin as an alternative workaround, but need to follow up on
the remaining patches. Someone suggested adding a more
generic way to turn off the plugin for a file instead of open-coding
the CFLAGS_REMOVE_*.o Makefile bit, which would help.
I am also still hoping that someone can come up with a way
to make kunit work better with the structleak plugin, as there
shouldn't be a fundamental reason why it can't work, just that
the code pattern triggers a particularly bad case in the compiler.
Arnd
[1] https://lore.kernel.org/lkml/20210125124533.101339-1-arnd@kernel.org/
* Re: incoming
2021-02-24 21:30 ` incoming Linus Torvalds
@ 2021-02-24 21:37 ` Linus Torvalds
0 siblings, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2021-02-24 21:37 UTC (permalink / raw)
To: Andrew Morton, Walter Wu, Dmitry Vyukov, Nathan Chancellor,
Arnd Bergmann, Andrey Konovalov
Cc: Linux-MM, mm-commits, Andrey Ryabinin, Alexander Potapenko
On Wed, Feb 24, 2021 at 1:30 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Hmm. I haven't bisected things yet, but I suspect it's something with
> the KASAN patches. With this all applied, I get:
>
> lib/crypto/curve25519-hacl64.c: In function ‘ladder_cmult.constprop’:
> lib/crypto/curve25519-hacl64.c:601:1: warning: the frame size of
> 2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
>
> and
>
> lib/bitfield_kunit.c: In function ‘test_bitfields_constants’:
> lib/bitfield_kunit.c:93:1: warning: the frame size of 11200 bytes is
> larger than 2048 bytes [-Wframe-larger-than=]
>
> which is obviously not really acceptable. An 11kB stack frame _will_
> cause issues.
A quick bisect shows that this was introduced by "[patch 101/173]
kasan: remove redundant config option".
I didn't check what part of that patch screws up, but it's definitely
doing something bad.
I will drop that patch.
Linus
* Re: incoming
2021-02-24 19:58 incoming Andrew Morton
@ 2021-02-24 21:30 ` Linus Torvalds
2021-02-24 21:37 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Linus Torvalds @ 2021-02-24 21:30 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux-MM, mm-commits
On Wed, Feb 24, 2021 at 11:58 AM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> A few small subsystems and some of MM.
Hmm. I haven't bisected things yet, but I suspect it's something with
the KASAN patches. With this all applied, I get:
lib/crypto/curve25519-hacl64.c: In function ‘ladder_cmult.constprop’:
lib/crypto/curve25519-hacl64.c:601:1: warning: the frame size of
2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
and
lib/bitfield_kunit.c: In function ‘test_bitfields_constants’:
lib/bitfield_kunit.c:93:1: warning: the frame size of 11200 bytes is
larger than 2048 bytes [-Wframe-larger-than=]
which is obviously not really acceptable. An 11kB stack frame _will_
cause issues.
Linus
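For anyone wanting to spot such warnings in a long build log, a rough Python sketch follows. The function name and the regular expression are illustrative assumptions built around the two warnings quoted above; other compiler versions may word the message differently.

```python
import re

# GCC prints: "warning: the frame size of N bytes is larger than M bytes".
# Mail clients may wrap the message, so whitespace is normalised first.
FRAME_RE = re.compile(
    r"(\S+?):(\d+):(\d+): warning: the frame size of (\d+) bytes "
    r"is larger than (\d+) bytes"
)

def oversized_frames(log_text):
    """Return (file, line, frame_bytes, limit_bytes) tuples found in a log."""
    flat = " ".join(log_text.split())
    return [
        (m.group(1), int(m.group(2)), int(m.group(4)), int(m.group(5)))
        for m in FRAME_RE.finditer(flat)
    ]

log = """
lib/crypto/curve25519-hacl64.c:601:1: warning: the frame size of
2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
lib/bitfield_kunit.c:93:1: warning: the frame size of 11200 bytes is
larger than 2048 bytes [-Wframe-larger-than=]
"""

for path, line, size, limit in oversized_frames(log):
    print(f"{path}:{line} exceeds the limit by {size - limit} bytes")
```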
* incoming
@ 2021-02-24 19:58 Andrew Morton
2021-02-24 21:30 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2021-02-24 19:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
A few small subsystems and some of MM.
173 patches, based on c03c21ba6f4e95e406a1a7b4c34ef334b977c194.
Subsystems affected by this patch series:
hexagon
scripts
ntfs
ocfs2
vfs
mm/slab-generic
mm/slab
mm/slub
mm/debug
mm/pagecache
mm/swap
mm/memcg
mm/pagemap
mm/mprotect
mm/mremap
mm/page-reporting
mm/vmalloc
mm/kasan
mm/pagealloc
mm/memory-failure
mm/hugetlb
mm/vmscan
mm/z3fold
mm/compaction
mm/mempolicy
mm/oom-kill
mm/hugetlbfs
mm/migration
Subsystem: hexagon
Randy Dunlap <rdunlap@infradead.org>:
hexagon: remove CONFIG_EXPERIMENTAL from defconfigs
Subsystem: scripts
tangchunyou <tangchunyou@yulong.com>:
scripts/spelling.txt: increase error-prone spell checking
zuoqilin <zuoqilin@yulong.com>:
scripts/spelling.txt: check for "exeeds"
dingsenjie <dingsenjie@yulong.com>:
scripts/spelling.txt: add "allocted" and "exeeds" typo
Colin Ian King <colin.king@canonical.com>:
scripts/spelling.txt: add more spellings to spelling.txt
Subsystem: ntfs
Randy Dunlap <rdunlap@infradead.org>:
ntfs: layout.h: delete duplicated words
Rustam Kovhaev <rkovhaev@gmail.com>:
ntfs: check for valid standard information attribute
Subsystem: ocfs2
Yi Li <yili@winhong.com>:
ocfs2: remove redundant conditional before iput
guozh <guozh88@chinatelecom.cn>:
ocfs2: clean up some definitions which are not used any more
Dan Carpenter <dan.carpenter@oracle.com>:
ocfs2: fix a use after free on error
Jiapeng Chong <jiapeng.chong@linux.alibaba.com>:
ocfs2: simplify the calculation of variables
Subsystem: vfs
Randy Dunlap <rdunlap@infradead.org>:
fs: delete repeated words in comments
Alexey Dobriyan <adobriyan@gmail.com>:
ramfs: support O_TMPFILE
Subsystem: mm/slab-generic
Jacob Wen <jian.w.wen@oracle.com>:
mm, tracing: record slab name for kmem_cache_free()
Nikolay Borisov <nborisov@suse.com>:
mm/sl?b.c: remove ctor argument from kmem_cache_flags
Subsystem: mm/slab
Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
mm/slab: minor coding style tweaks
Subsystem: mm/slub
Johannes Berg <johannes.berg@intel.com>:
mm/slub: disable user tracing for kmemleak caches by default
Vlastimil Babka <vbabka@suse.cz>:
Patch series "mm, slab, slub: remove cpu and memory hotplug locks":
mm, slub: stop freeing kmem_cache_node structures on node offline
mm, slab, slub: stop taking memory hotplug lock
mm, slab, slub: stop taking cpu hotplug lock
mm, slub: splice cpu and page freelists in deactivate_slab()
mm, slub: remove slub_memcg_sysfs boot param and CONFIG_SLUB_MEMCG_SYSFS_ON
Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
mm/slub: minor coding style tweaks
Subsystem: mm/debug
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/debug: improve memcg debugging
Anshuman Khandual <anshuman.khandual@arm.com>:
Patch series "mm/debug_vm_pgtable: Some minor updates", v3:
mm/debug_vm_pgtable/basic: add validation for dirtiness after write protect
mm/debug_vm_pgtable/basic: iterate over entire protection_map[]
Miaohe Lin <linmiaohe@huawei.com>:
mm/page_owner: use helper function zone_end_pfn() to get end_pfn
Subsystem: mm/pagecache
Baolin Wang <baolin.wang@linux.alibaba.com>:
mm/filemap: remove unused parameter and change to void type for replace_page_cache_page()
Pavel Begunkov <asml.silence@gmail.com>:
mm/filemap: don't revert iter on -EIOCBQUEUED
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "Refactor generic_file_buffered_read", v5:
mm/filemap: rename generic_file_buffered_read subfunctions
mm/filemap: remove dynamically allocated array from filemap_read
mm/filemap: convert filemap_get_pages to take a pagevec
mm/filemap: use head pages in generic_file_buffered_read
mm/filemap: pass a sleep state to put_and_wait_on_page_locked
mm/filemap: support readpage splitting a page
mm/filemap: inline __wait_on_page_locked_async into caller
mm/filemap: don't call ->readpage if IOCB_WAITQ is set
mm/filemap: change filemap_read_page calling conventions
mm/filemap: change filemap_create_page calling conventions
mm/filemap: convert filemap_update_page to return an errno
mm/filemap: move the iocb checks into filemap_update_page
mm/filemap: add filemap_range_uptodate
mm/filemap: split filemap_readahead out of filemap_get_pages
mm/filemap: restructure filemap_get_pages
mm/filemap: don't relock the page after calling readpage
Christoph Hellwig <hch@lst.de>:
mm/filemap: rename generic_file_buffered_read to filemap_read
mm/filemap: simplify generic_file_read_iter
Yang Guo <guoyang2@huawei.com>:
fs/buffer.c: add checking buffer head stat before clear
Baolin Wang <baolin.wang@linux.alibaba.com>:
mm: backing-dev: Remove duplicated macro definition
Subsystem: mm/swap
Yang Li <abaci-bugfix@linux.alibaba.com>:
mm/swap_slots.c: remove redundant NULL check
Stephen Zhang <stephenzhangzsd@gmail.com>:
mm/swapfile.c: fix debugging information problem
Georgi Djakov <georgi.djakov@linaro.org>:
mm/page_io: use pr_alert_ratelimited for swap read/write errors
Rikard Falkeborn <rikard.falkeborn@gmail.com>:
mm/swap_state: constify static struct attribute_group
Yu Zhao <yuzhao@google.com>:
mm/swap: don't SetPageWorkingset unconditionally during swapin
Subsystem: mm/memcg
Roman Gushchin <guro@fb.com>:
mm: memcg/slab: pre-allocate obj_cgroups for slab caches with SLAB_ACCOUNT
Muchun Song <songmuchun@bytedance.com>:
mm: memcontrol: optimize per-lruvec stats counter memory usage
Patch series "Convert all THP vmstat counters to pages", v6:
mm: memcontrol: fix NR_ANON_THPS accounting in charge moving
mm: memcontrol: convert NR_ANON_THPS account to pages
mm: memcontrol: convert NR_FILE_THPS account to pages
mm: memcontrol: convert NR_SHMEM_THPS account to pages
mm: memcontrol: convert NR_SHMEM_PMDMAPPED account to pages
mm: memcontrol: convert NR_FILE_PMDMAPPED account to pages
mm: memcontrol: make the slab calculation consistent
Alex Shi <alex.shi@linux.alibaba.com>:
mm/memcg: revise the using condition of lock_page_lruvec function series
mm/memcg: remove rcu locking for lock_page_lruvec function series
Shakeel Butt <shakeelb@google.com>:
mm: memcg: add swapcache stat for memcg v2
Roman Gushchin <guro@fb.com>:
mm: kmem: make __memcg_kmem_(un)charge static
Feng Tang <feng.tang@intel.com>:
mm: page_counter: re-layout structure to reduce false sharing
Yang Li <abaci-bugfix@linux.alibaba.com>:
mm/memcontrol: remove redundant NULL check
Muchun Song <songmuchun@bytedance.com>:
mm: memcontrol: replace the loop with a list_for_each_entry()
Shakeel Butt <shakeelb@google.com>:
mm/list_lru.c: remove kvfree_rcu_local()
Johannes Weiner <hannes@cmpxchg.org>:
fs: buffer: use raw page_memcg() on locked page
Muchun Song <songmuchun@bytedance.com>:
mm: memcontrol: fix swap undercounting in cgroup2
mm: memcontrol: fix get_active_memcg return value
mm: memcontrol: fix slub memory accounting
Subsystem: mm/pagemap
Adrian Huang <ahuang12@lenovo.com>:
mm/mmap.c: remove unnecessary local variable
Miaohe Lin <linmiaohe@huawei.com>:
mm/memory.c: fix potential pte_unmap_unlock pte error
mm/pgtable-generic.c: simplify the VM_BUG_ON condition in pmdp_huge_clear_flush()
mm/pgtable-generic.c: optimize the VM_BUG_ON condition in pmdp_huge_clear_flush()
mm/memory.c: fix potential pte_unmap_unlock pte error
Subsystem: mm/mprotect
Tianjia Zhang <tianjia.zhang@linux.alibaba.com>:
mm/mprotect.c: optimize error detection in do_mprotect_pkey()
Subsystem: mm/mremap
Li Xinhai <lixinhai.lxh@gmail.com>:
mm: rmap: explicitly reset vma->anon_vma in unlink_anon_vmas()
mm: mremap: unlink anon_vmas when mremap with MREMAP_DONTUNMAP success
Subsystem: mm/page-reporting
sh <sh_def@163.com>:
mm/page_reporting: use list_entry_is_head() in page_reporting_cycle()
Subsystem: mm/vmalloc
Yang Li <abaci-bugfix@linux.alibaba.com>:
vmalloc: remove redundant NULL check
Subsystem: mm/kasan
Andrey Konovalov <andreyknvl@google.com>:
Patch series "kasan: HW_TAGS tests support and fixes", v4:
kasan: prefix global functions with kasan_
kasan: clarify HW_TAGS impact on TBI
kasan: clean up comments in tests
kasan: add macros to simplify checking test constraints
kasan: add match-all tag tests
kasan, arm64: allow using KUnit tests with HW_TAGS mode
kasan: rename CONFIG_TEST_KASAN_MODULE
kasan: add compiler barriers to KUNIT_EXPECT_KASAN_FAIL
kasan: adapt kmalloc_uaf2 test to HW_TAGS mode
kasan: fix memory corruption in kasan_bitops_tags test
kasan: move _RET_IP_ to inline wrappers
kasan: fix bug detection via ksize for HW_TAGS mode
kasan: add proper page allocator tests
kasan: add a test for kmem_cache_alloc/free_bulk
kasan: don't run tests when KASAN is not enabled
Walter Wu <walter-zh.wu@mediatek.com>:
kasan: remove redundant config option
Subsystem: mm/pagealloc
Baoquan He <bhe@redhat.com>:
Patch series "mm: clean up names and parameters of memmap_init_xxxx functions", v5:
mm: fix prototype warning from kernel test robot
mm: rename memmap_init() and memmap_init_zone()
mm: simplify parater of function memmap_init_zone()
mm: simplify parameter of setup_usemap()
mm: remove unneeded local variable in free_area_init_core
David Hildenbrand <david@redhat.com>:
Patch series "mm: simplify free_highmem_page() and free_reserved_page()":
video: fbdev: acornfb: remove free_unused_pages()
mm: simplify free_highmem_page() and free_reserved_page()
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/gfp: add kernel-doc for gfp_t
Subsystem: mm/memory-failure
Aili Yao <yaoaili@kingsoft.com>:
mm,hwpoison: send SIGBUS to PF_MCE_EARLY processes on action required events
Subsystem: mm/hugetlb
Bibo Mao <maobibo@loongson.cn>:
mm/huge_memory.c: update tlb entry if pmd is changed
MIPS: do not call flush_tlb_all when setting pmd entry
Miaohe Lin <linmiaohe@huawei.com>:
mm/hugetlb: fix potential double free in hugetlb_register_node() error path
Li Xinhai <lixinhai.lxh@gmail.com>:
mm/hugetlb.c: fix unnecessary address expansion of pmd sharing
Miaohe Lin <linmiaohe@huawei.com>:
mm/hugetlb: avoid unnecessary hugetlb_acct_memory() call
mm/hugetlb: use helper huge_page_order and pages_per_huge_page
mm/hugetlb: fix use after free when subpool max_hpages accounting is not enabled
Jiapeng Zhong <abaci-bugfix@linux.alibaba.com>:
mm/hugetlb: simplify the calculation of variables
Joao Martins <joao.m.martins@oracle.com>:
Patch series "mm/hugetlb: follow_hugetlb_page() improvements", v2:
mm/hugetlb: grab head page refcount once for group of subpages
mm/hugetlb: refactor subpage recording
Miaohe Lin <linmiaohe@huawei.com>:
mm/hugetlb: fix some comment typos
Yanfei Xu <yanfei.xu@windriver.com>:
mm/hugetlb: remove redundant check in preparing and destroying gigantic page
Zhiyuan Dai <daizhiyuan@phytium.com.cn>:
mm/hugetlb.c: fix typos in comments
Miaohe Lin <linmiaohe@huawei.com>:
mm/huge_memory.c: remove unused return value of set_huge_zero_page()
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
mm/pmem: avoid inserting hugepage PTE entry with fsdax if hugepage support is disabled
Miaohe Lin <linmiaohe@huawei.com>:
hugetlb_cgroup: use helper pages_per_huge_page() in hugetlb_cgroup
mm/hugetlb: use helper function range_in_vma() in page_table_shareable()
mm/hugetlb: remove unnecessary VM_BUG_ON_PAGE on putback_active_hugepage()
mm/hugetlb: use helper huge_page_size() to get hugepage size
Mike Kravetz <mike.kravetz@oracle.com>:
hugetlb: fix update_and_free_page contig page struct assumption
hugetlb: fix copy_huge_page_from_user contig page struct assumption
Chen Wandun <chenwandun@huawei.com>:
mm/hugetlb: suppress wrong warning info when alloc gigantic page
Subsystem: mm/vmscan
Alex Shi <alex.shi@linux.alibaba.com>:
mm/vmscan: __isolate_lru_page_prepare() cleanup
Miaohe Lin <linmiaohe@huawei.com>:
mm/workingset.c: avoid unnecessary max_nodes estimation in count_shadow_nodes()
Yu Zhao <yuzhao@google.com>:
Patch series "mm: lru related cleanups", v2:
mm/vmscan.c: use add_page_to_lru_list()
include/linux/mm_inline.h: shuffle lru list addition and deletion functions
mm: don't pass "enum lru_list" to lru list addition functions
mm/swap.c: don't pass "enum lru_list" to trace_mm_lru_insertion()
mm/swap.c: don't pass "enum lru_list" to del_page_from_lru_list()
mm: add __clear_page_lru_flags() to replace page_off_lru()
mm: VM_BUG_ON lru page flags
include/linux/mm_inline.h: fold page_lru_base_type() into its sole caller
include/linux/mm_inline.h: fold __update_lru_size() into its sole caller
mm/vmscan.c: make lruvec_lru_size() static
Oscar Salvador <osalvador@suse.de>:
mm: workingset: clarify eviction order and distance calculation
Mike Kravetz <mike.kravetz@oracle.com>:
Patch series "create hugetlb flags to consolidate state", v3:
hugetlb: use page.private for hugetlb specific page flags
hugetlb: convert page_huge_active() HPageMigratable flag
hugetlb: convert PageHugeTemporary() to HPageTemporary flag
hugetlb: convert PageHugeFreed to HPageFreed flag
include/linux/hugetlb.h: add synchronization information for new hugetlb specific flags
hugetlb: fix uninitialized subpool pointer
Dave Hansen <dave.hansen@linux.intel.com>:
mm/vmscan: restore zone_reclaim_mode ABI
Subsystem: mm/z3fold
Miaohe Lin <linmiaohe@huawei.com>:
z3fold: remove unused attribute for release_z3fold_page
z3fold: simplify the zhdr initialization code in init_z3fold_page()
Subsystem: mm/compaction
Alex Shi <alex.shi@linux.alibaba.com>:
mm/compaction: remove rcu_read_lock during page compaction
Miaohe Lin <linmiaohe@huawei.com>:
mm/compaction: remove duplicated VM_BUG_ON_PAGE !PageLocked
Charan Teja Reddy <charante@codeaurora.org>:
mm/compaction: correct deferral logic for proactive compaction
Wonhyuk Yang <vvghjk1234@gmail.com>:
mm/compaction: fix misbehaviors of fast_find_migrateblock()
Vlastimil Babka <vbabka@suse.cz>:
mm, compaction: make fast_isolate_freepages() stay within zone
Subsystem: mm/mempolicy
Huang Ying <ying.huang@intel.com>:
numa balancing: migrate on fault among multiple bound nodes
Miaohe Lin <linmiaohe@huawei.com>:
mm/mempolicy: use helper range_in_vma() in queue_pages_test_walk()
Subsystem: mm/oom-kill
Tang Yizhou <tangyizhou@huawei.com>:
mm, oom: fix a comment in dump_task()
Subsystem: mm/hugetlbfs
Mike Kravetz <mike.kravetz@oracle.com>:
mm/hugetlb: change hugetlb_reserve_pages() to type bool
hugetlbfs: remove special hugetlbfs_set_page_dirty()
Miaohe Lin <linmiaohe@huawei.com>:
hugetlbfs: remove useless BUG_ON(!inode) in hugetlbfs_setattr()
hugetlbfs: use helper macro default_hstate in init_hugetlbfs_fs
hugetlbfs: correct obsolete function name in hugetlbfs_read_iter()
hugetlbfs: remove meaningless variable avoid_reserve
hugetlbfs: make hugepage size conversion more readable
hugetlbfs: correct some obsolete comments about inode i_mutex
hugetlbfs: fix some comment typos
hugetlbfs: remove unneeded return value of hugetlb_vmtruncate()
Subsystem: mm/migration
Chengyang Fan <cy.fan@huawei.com>:
mm/migrate: remove unneeded semicolons
Documentation/admin-guide/cgroup-v2.rst | 4
Documentation/admin-guide/kernel-parameters.txt | 8
Documentation/admin-guide/sysctl/vm.rst | 10
Documentation/core-api/mm-api.rst | 7
Documentation/dev-tools/kasan.rst | 24
Documentation/vm/arch_pgtable_helpers.rst | 8
arch/arm64/include/asm/memory.h | 1
arch/arm64/include/asm/mte-kasan.h | 12
arch/arm64/kernel/mte.c | 12
arch/arm64/kernel/sleep.S | 2
arch/arm64/mm/fault.c | 20
arch/hexagon/configs/comet_defconfig | 1
arch/ia64/include/asm/pgtable.h | 6
arch/ia64/mm/init.c | 18
arch/mips/mm/pgtable-32.c | 1
arch/mips/mm/pgtable-64.c | 1
arch/x86/kernel/acpi/wakeup_64.S | 2
drivers/base/node.c | 33
drivers/video/fbdev/acornfb.c | 34
fs/block_dev.c | 2
fs/btrfs/file.c | 2
fs/buffer.c | 7
fs/dcache.c | 4
fs/direct-io.c | 4
fs/exec.c | 4
fs/fhandle.c | 2
fs/fuse/dev.c | 6
fs/hugetlbfs/inode.c | 72 --
fs/ntfs/inode.c | 6
fs/ntfs/layout.h | 4
fs/ocfs2/cluster/heartbeat.c | 8
fs/ocfs2/dlm/dlmast.c | 10
fs/ocfs2/dlm/dlmcommon.h | 4
fs/ocfs2/refcounttree.c | 2
fs/ocfs2/super.c | 2
fs/pipe.c | 2
fs/proc/meminfo.c | 10
fs/proc/vmcore.c | 7
fs/ramfs/inode.c | 13
include/linux/fs.h | 4
include/linux/gfp.h | 14
include/linux/highmem-internal.h | 5
include/linux/huge_mm.h | 15
include/linux/hugetlb.h | 98 ++
include/linux/kasan-checks.h | 6
include/linux/kasan.h | 39 -
include/linux/memcontrol.h | 43 -
include/linux/migrate.h | 2
include/linux/mm.h | 28
include/linux/mm_inline.h | 123 +--
include/linux/mmzone.h | 30
include/linux/page-flags.h | 6
include/linux/page_counter.h | 9
include/linux/pagemap.h | 5
include/linux/swap.h | 8
include/trace/events/kmem.h | 24
include/trace/events/pagemap.h | 11
include/uapi/linux/mempolicy.h | 4
init/Kconfig | 14
lib/Kconfig.kasan | 14
lib/Makefile | 2
lib/test_kasan.c | 446 ++++++++----
lib/test_kasan_module.c | 5
mm/backing-dev.c | 6
mm/compaction.c | 73 +-
mm/debug.c | 10
mm/debug_vm_pgtable.c | 86 ++
mm/filemap.c | 859 +++++++++++-------------
mm/gup.c | 5
mm/huge_memory.c | 28
mm/hugetlb.c | 376 ++++------
mm/hugetlb_cgroup.c | 6
mm/kasan/common.c | 60 -
mm/kasan/generic.c | 40 -
mm/kasan/hw_tags.c | 16
mm/kasan/kasan.h | 87 +-
mm/kasan/quarantine.c | 22
mm/kasan/report.c | 15
mm/kasan/report_generic.c | 10
mm/kasan/report_hw_tags.c | 8
mm/kasan/report_sw_tags.c | 8
mm/kasan/shadow.c | 27
mm/kasan/sw_tags.c | 22
mm/khugepaged.c | 6
mm/list_lru.c | 12
mm/memcontrol.c | 309 ++++----
mm/memory-failure.c | 34
mm/memory.c | 24
mm/memory_hotplug.c | 11
mm/mempolicy.c | 18
mm/mempool.c | 2
mm/migrate.c | 10
mm/mlock.c | 3
mm/mmap.c | 4
mm/mprotect.c | 7
mm/mremap.c | 8
mm/oom_kill.c | 5
mm/page_alloc.c | 70 -
mm/page_io.c | 12
mm/page_owner.c | 4
mm/page_reporting.c | 2
mm/pgtable-generic.c | 9
mm/rmap.c | 35
mm/shmem.c | 2
mm/slab.c | 21
mm/slab.h | 20
mm/slab_common.c | 40 -
mm/slob.c | 2
mm/slub.c | 169 ++--
mm/swap.c | 54 -
mm/swap_slots.c | 3
mm/swap_state.c | 31
mm/swapfile.c | 8
mm/vmscan.c | 100 +-
mm/vmstat.c | 14
mm/workingset.c | 7
mm/z3fold.c | 11
scripts/Makefile.kasan | 10
scripts/spelling.txt | 30
tools/objtool/check.c | 2
120 files changed, 2249 insertions(+), 1954 deletions(-)
* incoming
@ 2021-02-13 4:52 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-02-13 4:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
6 patches, based on dcc0b49040c70ad827a7f3d58a21b01fdb14e749.
Subsystems affected by this patch series:
mm/pagemap
scripts
MAINTAINERS
h8300
Subsystem: mm/pagemap
Mike Rapoport <rppt@linux.ibm.com>:
m68k: make __pfn_to_phys() and __phys_to_pfn() available for !MMU
Subsystem: scripts
Rong Chen <rong.a.chen@intel.com>:
scripts/recordmcount.pl: support big endian for ARCH sh
Subsystem: MAINTAINERS
Andrey Konovalov <andreyknvl@google.com>:
MAINTAINERS: update KASAN file list
MAINTAINERS: update Andrey Konovalov's email address
MAINTAINERS: add Andrey Konovalov to KASAN reviewers
Subsystem: h8300
Randy Dunlap <rdunlap@infradead.org>:
h8300: fix PREEMPTION build, TI_PRE_COUNT undefined
MAINTAINERS | 8 +++++---
arch/h8300/kernel/asm-offsets.c | 3 +++
arch/m68k/include/asm/page.h | 2 +-
scripts/recordmcount.pl | 6 +++++-
4 files changed, 14 insertions(+), 5 deletions(-)
* Re: incoming
2021-02-09 21:41 incoming Andrew Morton
@ 2021-02-10 19:30 ` Linus Torvalds
0 siblings, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2021-02-10 19:30 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux-MM, mm-commits
Hah. This series shows a small deficiency in your scripting wrt the diffstat:
On Tue, Feb 9, 2021 at 1:41 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> .mailmap | 1
...
> mm/slub.c | 18 +++++++++-
> 17 files changed, 172 insertions(+), 49 deletions(-)
It actually has 18 files changed, but one of them is a pure rename (no
change to the content), and apparently your diffstat tool can't handle
that case.
It *should* have ended with
...
mm/slub.c | 18 +++++-
.../selftests/vm/{run_vmtests => run_vmtests.sh} | 0
18 files changed, 172 insertions(+), 49 deletions(-)
rename tools/testing/selftests/vm/{run_vmtests => run_vmtests.sh} (100%)
if you'd done a proper "git diff -M --stat --summary" of the series.
[ Ok, by default git would actually have said
18 files changed, 171 insertions(+), 48 deletions(-)
but it looks like you use the patience diff option, which gives that
extra insertion/deletion line because it generates the diff a bit
differently ]
Not a big deal, but it made me briefly wonder "why doesn't my
diffstat match yours".
Linus
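The rename handling described above can be reproduced in a throwaway repository. This is a minimal sketch, not the tooling used to generate the diffstats in this thread (which is not shown): the repository, file contents and commit messages are made up, and only the file name run_vmtests is borrowed from the series. It assumes a reasonably recent git, where rename detection is enabled by default (diff.renames), so the undetected case is forced with --no-renames. It contrasts git's own two modes only; a separate diffstat tool may drop a pure rename entirely, as happened above.

```shell
#!/bin/sh
# Sketch: how a pure rename shows up in a diffstat with and without -M.
# The repository, file contents and commit messages are made up for illustration.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Identity is passed per command so the sketch does not depend on global config.
commit() {
    git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "$1"
}

printf 'echo running vm tests\n' > run_vmtests
printf 'one\n' > other.c
git add .
commit "base"

git mv run_vmtests run_vmtests.sh      # pure rename, content untouched
printf 'two\n' >> other.c              # one ordinary content change
git add .
commit "rename and edit"

# Rename detection off: the rename is counted as a delete plus an add.
plain=$(git diff --no-renames --stat HEAD~1 HEAD | tail -n 1)

# With -M the rename is listed as one file with no changed lines,
# and --summary adds the "rename ... (100%)" line.
detected=$(git diff -M --stat --summary HEAD~1 HEAD)

echo "without -M:$plain"
echo "with -M:"
echo "$detected"
```

With -M the renamed file still counts toward "files changed" but contributes no inserted or deleted lines, which is the shape of the corrected diffstat quoted in the message above.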
* incoming
@ 2021-02-09 21:41 Andrew Morton
2021-02-10 19:30 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2021-02-09 21:41 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
14 patches, based on e0756cfc7d7cd08c98a53b6009c091a3f6a50be6.
Subsystems affected by this patch series:
squashfs
mm/kasan
firmware
mm/mremap
mm/tmpfs
mm/selftests
MAINTAINERS
mm/memcg
mm/slub
nilfs2
Subsystem: squashfs
Phillip Lougher <phillip@squashfs.org.uk>:
Patch series "Squashfs: fix BIO migration regression and add sanity checks":
squashfs: avoid out of bounds writes in decompressors
squashfs: add more sanity checks in id lookup
squashfs: add more sanity checks in inode lookup
squashfs: add more sanity checks in xattr id lookup
Subsystem: mm/kasan
Andrey Konovalov <andreyknvl@google.com>:
kasan: fix stack traces dependency for HW_TAGS
Subsystem: firmware
Fangrui Song <maskray@google.com>:
firmware_loader: align .builtin_fw to 8
Subsystem: mm/mremap
Arnd Bergmann <arnd@arndb.de>:
mm/mremap: fix BUILD_BUG_ON() error in get_extent
Subsystem: mm/tmpfs
Seth Forshee <seth.forshee@canonical.com>:
tmpfs: disallow CONFIG_TMPFS_INODE64 on s390
tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
Subsystem: mm/selftests
Rong Chen <rong.a.chen@intel.com>:
selftests/vm: rename file run_vmtests to run_vmtests.sh
Subsystem: MAINTAINERS
Andrey Ryabinin <ryabinin.a.a@gmail.com>:
MAINTAINERS: update Andrey Ryabinin's email address
Subsystem: mm/memcg
Johannes Weiner <hannes@cmpxchg.org>:
Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"
Subsystem: mm/slub
Vlastimil Babka <vbabka@suse.cz>:
mm, slub: better heuristic for number of cpus when calculating slab order
Subsystem: nilfs2
Joachim Henke <joachim.henke@t-systems.com>:
nilfs2: make splice write available again
.mailmap | 1
Documentation/dev-tools/kasan.rst | 3 -
MAINTAINERS | 2 -
fs/Kconfig | 4 +-
fs/nilfs2/file.c | 1
fs/squashfs/block.c | 8 ++++
fs/squashfs/export.c | 41 +++++++++++++++++++----
fs/squashfs/id.c | 40 ++++++++++++++++++-----
fs/squashfs/squashfs_fs_sb.h | 1
fs/squashfs/super.c | 6 +--
fs/squashfs/xattr.h | 10 +++++
fs/squashfs/xattr_id.c | 66 ++++++++++++++++++++++++++++++++------
include/asm-generic/vmlinux.lds.h | 2 -
mm/kasan/hw_tags.c | 8 +---
mm/memcontrol.c | 5 +-
mm/mremap.c | 5 +-
mm/slub.c | 18 +++++++++-
17 files changed, 172 insertions(+), 49 deletions(-)
* incoming
@ 2021-02-05 2:31 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-02-05 2:31 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
18 patches, based on 5c279c4cf206e03995e04fd3404fa95ffd243a97.
Subsystems affected by this patch series:
mm/hugetlb
mm/compaction
mm/vmalloc
gcov
mm/shmem
mm/memblock
mailmap
mm/pagecache
mm/kasan
ubsan
mm/hugetlb
MAINTAINERS
Subsystem: mm/hugetlb
Muchun Song <songmuchun@bytedance.com>:
mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
mm: hugetlb: fix a race between freeing and dissolving the page
mm: hugetlb: fix a race between isolating and freeing page
mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
mm: migrate: do not migrate HugeTLB page whose refcount is one
Subsystem: mm/compaction
Rokudo Yan <wu-yan@tcl.com>:
mm, compaction: move high_pfn to the for loop scope
Subsystem: mm/vmalloc
Rick Edgecombe <rick.p.edgecombe@intel.com>:
mm/vmalloc: separate put pages and flush VM flags
Subsystem: gcov
Johannes Berg <johannes.berg@intel.com>:
init/gcov: allow CONFIG_CONSTRUCTORS on UML to fix module gcov
Subsystem: mm/shmem
Hugh Dickins <hughd@google.com>:
mm: thp: fix MADV_REMOVE deadlock on shmem THP
Subsystem: mm/memblock
Roman Gushchin <guro@fb.com>:
memblock: do not start bottom-up allocations with kernel_end
Subsystem: mailmap
Viresh Kumar <viresh.kumar@linaro.org>:
mailmap: fix name/email for Viresh Kumar
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>:
mailmap: add entries for Manivannan Sadhasivam
Subsystem: mm/pagecache
Waiman Long <longman@redhat.com>:
mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
Subsystem: mm/kasan
Vincenzo Frascino <vincenzo.frascino@arm.com>:
Patch series "kasan: Fix metadata detection for KASAN_HW_TAGS", v5:
kasan: add explicit preconditions to kasan_report()
kasan: make addr_has_metadata() return true for valid addresses
Subsystem: ubsan
Nathan Chancellor <nathan@kernel.org>:
ubsan: implement __ubsan_handle_alignment_assumption
Subsystem: mm/hugetlb
Muchun Song <songmuchun@bytedance.com>:
mm: hugetlb: fix missing put_page in gather_surplus_pages()
Subsystem: MAINTAINERS
Nathan Chancellor <nathan@kernel.org>:
MAINTAINERS/.mailmap: use my @kernel.org address
.mailmap | 5 ++++
MAINTAINERS | 2 -
fs/hugetlbfs/inode.c | 3 +-
include/linux/hugetlb.h | 2 +
include/linux/kasan.h | 7 ++++++
include/linux/vmalloc.h | 9 +-------
init/Kconfig | 1
init/main.c | 8 ++++++-
kernel/gcov/Kconfig | 2 -
lib/ubsan.c | 31 ++++++++++++++++++++++++++++
lib/ubsan.h | 6 +++++
mm/compaction.c | 3 +-
mm/filemap.c | 4 +++
mm/huge_memory.c | 37 ++++++++++++++++++++-------------
mm/hugetlb.c | 53 ++++++++++++++++++++++++++++++++++++++++++------
mm/kasan/kasan.h | 2 -
mm/memblock.c | 49 +++++---------------------------------------
mm/migrate.c | 6 +++++
18 files changed, 153 insertions(+), 77 deletions(-)
* incoming
@ 2021-01-24 5:00 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2021-01-24 5:00 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
19 patches, based on e1ae4b0be15891faf46d390e9f3dc9bd71a8cae1.
Subsystems affected by this patch series:
mm/pagealloc
mm/memcg
mm/kasan
ubsan
mm/memory-failure
mm/highmem
proc
MAINTAINERS
Subsystem: mm/pagealloc
Mike Rapoport <rppt@linux.ibm.com>:
Patch series "mm: fix initialization of struct page for holes in memory layout", v3:
x86/setup: don't remove E820_TYPE_RAM for pfn 0
mm: fix initialization of struct page for holes in memory layout
Subsystem: mm/memcg
Roman Gushchin <guro@fb.com>:
mm: memcg/slab: optimize objcg stock draining
Shakeel Butt <shakeelb@google.com>:
mm: memcg: fix memcg file_dirty numa stat
mm: fix numa stats for thp migration
Johannes Weiner <hannes@cmpxchg.org>:
mm: memcontrol: prevent starvation when writing memory.high
Subsystem: mm/kasan
Lecopzer Chen <lecopzer@gmail.com>:
kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow
kasan: fix incorrect arguments passing in kasan_add_zero_shadow
Andrey Konovalov <andreyknvl@google.com>:
kasan: fix HW_TAGS boot parameters
kasan, mm: fix conflicts with init_on_alloc/free
kasan, mm: fix resetting page_alloc tags for HW_TAGS
Subsystem: ubsan
Arnd Bergmann <arnd@arndb.de>:
ubsan: disable unsigned-overflow check for i386
Subsystem: mm/memory-failure
Dan Williams <dan.j.williams@intel.com>:
mm: fix page reference leak in soft_offline_page()
Subsystem: mm/highmem
Thomas Gleixner <tglx@linutronix.de>:
Patch series "mm/highmem: Fix fallout from generic kmap_local conversions":
sparc/mm/highmem: flush cache and TLB
mm/highmem: prepare for overriding set_pte_at()
mips/mm/highmem: use set_pte() for kmap_local()
powerpc/mm/highmem: use __set_pte_at() for kmap_local()
Subsystem: proc
Xiaoming Ni <nixiaoming@huawei.com>:
proc_sysctl: fix oops caused by incorrect command parameters
Subsystem: MAINTAINERS
Nathan Chancellor <natechancellor@gmail.com>:
MAINTAINERS: add a couple more files to the Clang/LLVM section
Documentation/dev-tools/kasan.rst | 27 ++---------
MAINTAINERS | 2
arch/mips/include/asm/highmem.h | 1
arch/powerpc/include/asm/highmem.h | 2
arch/sparc/include/asm/highmem.h | 9 ++-
arch/x86/kernel/setup.c | 20 +++-----
fs/proc/proc_sysctl.c | 7 ++-
lib/Kconfig.ubsan | 1
mm/highmem.c | 7 ++-
mm/kasan/hw_tags.c | 77 +++++++++++++--------------------
mm/kasan/init.c | 23 +++++----
mm/memcontrol.c | 11 +---
mm/memory-failure.c | 20 ++++++--
mm/migrate.c | 27 ++++++-----
mm/page_alloc.c | 86 ++++++++++++++++++++++---------------
mm/slub.c | 7 +--
16 files changed, 173 insertions(+), 154 deletions(-)
* Re: incoming
2021-01-12 23:48 incoming Andrew Morton
@ 2021-01-15 23:32 ` Linus Torvalds
0 siblings, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2021-01-15 23:32 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux-MM, mm-commits
On Tue, Jan 12, 2021 at 3:48 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 10 patches, based on e609571b5ffa3528bf85292de1ceaddac342bc1c.
Whee. I had completely dropped the ball on this - I had built my usual
"akpm" branch with the patches, but then had completely forgotten
about it after doing my basic build tests.
I tend to leave it for a while to see if people send belated ACK/NAK's
for the patches, but that "for a while" is typically "overnight", not
several days.
So if you ever notice that I haven't merged your patch submission, and
you haven't seen me comment on them, feel free to ping me to remind
me.
Because it might just have gotten lost in the shuffle for some random
reason. Admittedly it's rare - I think this is the first time I just
randomly noticed three days later that I'd never done the actual merge
of the patch-series.
Linus
* incoming
@ 2021-01-12 23:48 Andrew Morton
2021-01-15 23:32 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2021-01-12 23:48 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
10 patches, based on e609571b5ffa3528bf85292de1ceaddac342bc1c.
Subsystems affected by this patch series:
mm/slub
mm/pagealloc
mm/memcg
mm/kasan
mm/vmalloc
mm/migration
mm/hugetlb
MAINTAINERS
mm/memory-failure
mm/process_vm_access
Subsystem: mm/slub
Jann Horn <jannh@google.com>:
mm, slub: consider rest of partial list if acquire_slab() fails
Subsystem: mm/pagealloc
Hailong liu <liu.hailong6@zte.com.cn>:
mm/page_alloc: add a missing mm_page_alloc_zone_locked() tracepoint
Subsystem: mm/memcg
Hugh Dickins <hughd@google.com>:
mm/memcontrol: fix warning in mem_cgroup_page_lruvec()
Subsystem: mm/kasan
Hailong Liu <liu.hailong6@zte.com.cn>:
arm/kasan: fix the array size of kasan_early_shadow_pte[]
Subsystem: mm/vmalloc
Miaohe Lin <linmiaohe@huawei.com>:
mm/vmalloc.c: fix potential memory leak
Subsystem: mm/migration
Jan Stancek <jstancek@redhat.com>:
mm: migrate: initialize err in do_migrate_pages
Subsystem: mm/hugetlb
Miaohe Lin <linmiaohe@huawei.com>:
mm/hugetlb: fix potential missing huge page size info
Subsystem: MAINTAINERS
Vlastimil Babka <vbabka@suse.cz>:
MAINTAINERS: add Vlastimil as slab allocators maintainer
Subsystem: mm/memory-failure
Oscar Salvador <osalvador@suse.de>:
mm,hwpoison: fix printing of page flags
Subsystem: mm/process_vm_access
Andrew Morton <akpm@linux-foundation.org>:
mm/process_vm_access.c: include compat.h
MAINTAINERS | 1 +
include/linux/kasan.h | 6 +++++-
include/linux/memcontrol.h | 2 +-
mm/hugetlb.c | 2 +-
mm/kasan/init.c | 3 ++-
mm/memory-failure.c | 2 +-
mm/mempolicy.c | 2 +-
mm/page_alloc.c | 31 ++++++++++++++++---------------
mm/process_vm_access.c | 1 +
mm/slub.c | 2 +-
mm/vmalloc.c | 4 +++-
11 files changed, 33 insertions(+), 23 deletions(-)
* incoming
@ 2020-12-29 23:13 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-12-29 23:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
16 patches, based on dea8dcf2a9fa8cc540136a6cd885c3beece16ec3.
Subsystems affected by this patch series:
mm/selftests
mm/hugetlb
kbuild
checkpatch
mm/pagecache
mm/mremap
mm/kasan
misc
lib
mm/slub
Subsystem: mm/selftests
Harish <harish@linux.ibm.com>:
selftests/vm: fix building protection keys test
Subsystem: mm/hugetlb
Mike Kravetz <mike.kravetz@oracle.com>:
mm/hugetlb: fix deadlock in hugetlb_cow error path
Subsystem: kbuild
Masahiro Yamada <masahiroy@kernel.org>:
Revert "kbuild: avoid static_assert for genksyms"
Subsystem: checkpatch
Joe Perches <joe@perches.com>:
checkpatch: prefer strscpy to strlcpy
Subsystem: mm/pagecache
Souptick Joarder <jrdr.linux@gmail.com>:
mm: add prototype for __add_to_page_cache_locked()
Baoquan He <bhe@redhat.com>:
mm: memmap defer init doesn't work as expected
Subsystem: mm/mremap
Kalesh Singh <kaleshsingh@google.com>:
mm/mremap.c: fix extent calculation
Nicholas Piggin <npiggin@gmail.com>:
mm: generalise COW SMC TLB flushing race comment
Subsystem: mm/kasan
Walter Wu <walter-zh.wu@mediatek.com>:
kasan: fix null pointer dereference in kasan_record_aux_stack
Subsystem: misc
Randy Dunlap <rdunlap@infradead.org>:
local64.h: make <asm/local64.h> mandatory
Huang Shijie <sjhuang@iluvatar.ai>:
sizes.h: add SZ_8G/SZ_16G/SZ_32G macros
Josh Poimboeuf <jpoimboe@redhat.com>:
kdev_t: always inline major/minor helper functions
Subsystem: lib
Huang Shijie <sjhuang@iluvatar.ai>:
lib/genalloc: fix the overflow when size is too big
Ilya Leoshkevich <iii@linux.ibm.com>:
lib/zlib: fix inflating zlib streams on s390
Randy Dunlap <rdunlap@infradead.org>:
zlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c
Subsystem: mm/slub
Roman Gushchin <guro@fb.com>:
mm: slub: call account_slab_page() after slab page initialization
arch/alpha/include/asm/local64.h | 1 -
arch/arc/include/asm/Kbuild | 1 -
arch/arm/include/asm/Kbuild | 1 -
arch/arm64/include/asm/Kbuild | 1 -
arch/csky/include/asm/Kbuild | 1 -
arch/h8300/include/asm/Kbuild | 1 -
arch/hexagon/include/asm/Kbuild | 1 -
arch/ia64/include/asm/local64.h | 1 -
arch/ia64/mm/init.c | 4 ++--
arch/m68k/include/asm/Kbuild | 1 -
arch/microblaze/include/asm/Kbuild | 1 -
arch/mips/include/asm/Kbuild | 1 -
arch/nds32/include/asm/Kbuild | 1 -
arch/openrisc/include/asm/Kbuild | 1 -
arch/parisc/include/asm/Kbuild | 1 -
arch/powerpc/include/asm/Kbuild | 1 -
arch/riscv/include/asm/Kbuild | 1 -
arch/s390/include/asm/Kbuild | 1 -
arch/sh/include/asm/Kbuild | 1 -
arch/sparc/include/asm/Kbuild | 1 -
arch/x86/include/asm/local64.h | 1 -
arch/xtensa/include/asm/Kbuild | 1 -
include/asm-generic/Kbuild | 1 +
include/linux/build_bug.h | 5 -----
include/linux/kdev_t.h | 22 +++++++++++-----------
include/linux/mm.h | 12 ++++++++++--
include/linux/sizes.h | 3 +++
lib/genalloc.c | 25 +++++++++++++------------
lib/zlib_dfltcc/Makefile | 2 +-
lib/zlib_dfltcc/dfltcc.c | 6 +++++-
lib/zlib_dfltcc/dfltcc_deflate.c | 3 +++
lib/zlib_dfltcc/dfltcc_inflate.c | 4 ++--
lib/zlib_dfltcc/dfltcc_syms.c | 17 -----------------
mm/hugetlb.c | 22 +++++++++++++++++++++-
mm/kasan/generic.c | 2 ++
mm/memory.c | 8 +++++---
mm/memory_hotplug.c | 2 +-
mm/mremap.c | 4 +++-
mm/page_alloc.c | 8 +++++---
mm/slub.c | 5 ++---
scripts/checkpatch.pl | 6 ++++++
tools/testing/selftests/vm/Makefile | 10 +++++-----
42 files changed, 101 insertions(+), 91 deletions(-)
* Re: incoming
2020-12-22 19:58 incoming Andrew Morton
@ 2020-12-22 21:43 ` Linus Torvalds
0 siblings, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2020-12-22 21:43 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux-MM, mm-commits
On Tue, Dec 22, 2020 at 11:58 AM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> 60 patches, based on 8653b778e454a7708847aeafe689bce07aeeb94e.
I see that you enabled renaming in the patches. Lovely.
Can you also enable it in the diffstat?
> 74 files changed, 2869 insertions(+), 1553 deletions(-)
With -M in the diffstat, you should have seen
72 files changed, 2775 insertions(+), 1460 deletions(-)
and if you add "--summary", you'll also see the rename part of the file
create/delete summary:
rename mm/kasan/{tags_report.c => report_sw_tags.c} (78%)
which is often nice to see in addition to the line stats..
Linus
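The effect of -M on the totals can be sketched for a rename that also changes content. This is an illustration under stated assumptions: the file body is invented, only the path names mirror the rename quoted above, and the similarity percentage git reports here will differ from the 78% in the message. Because recent git versions detect renames by default (diff.renames), the undetected case is forced with --no-renames.

```shell
#!/bin/sh
# Sketch: a rename with a small edit, counted with and without rename detection.
# File contents are invented; only the path names mirror the thread.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Identity is passed per command so the sketch does not depend on global config.
commit() {
    git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "$1"
}

mkdir -p mm/kasan
i=1
while [ "$i" -le 20 ]; do
    printf 'static int helper_%d(void) { return %d; }\n' "$i" "$i"
    i=$((i + 1))
done > mm/kasan/tags_report.c
git add .
commit "base"

git mv mm/kasan/tags_report.c mm/kasan/report_sw_tags.c
printf '/* one line added after the rename */\n' >> mm/kasan/report_sw_tags.c
git add .
commit "rename with a small edit"

# Rename detection off: the old file is deleted in full, the new one added in full.
without_m=$(git diff --no-renames --stat HEAD~1 HEAD | tail -n 1)

# Rename detection on: one file, and only the real one-line change is counted.
with_m=$(git diff -M --stat --summary HEAD~1 HEAD)

echo "without -M:$without_m"
echo "with -M:"
echo "$with_m"
```

Both the file count and the insertion/deletion totals shrink once the rename is detected, which is the same kind of difference as between the two summary lines quoted in the message above.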
* incoming
@ 2020-12-22 19:58 Andrew Morton
2020-12-22 21:43 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2020-12-22 19:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
60 patches, based on 8653b778e454a7708847aeafe689bce07aeeb94e.
Subsystems affected by this patch series:
mm/kasan
Subsystem: mm/kasan
Andrey Konovalov <andreyknvl@google.com>:
Patch series "kasan: add hardware tag-based mode for arm64", v11:
kasan: drop unnecessary GPL text from comment headers
kasan: KASAN_VMALLOC depends on KASAN_GENERIC
kasan: group vmalloc code
kasan: shadow declarations only for software modes
kasan: rename (un)poison_shadow to (un)poison_range
kasan: rename KASAN_SHADOW_* to KASAN_GRANULE_*
kasan: only build init.c for software modes
kasan: split out shadow.c from common.c
kasan: define KASAN_MEMORY_PER_SHADOW_PAGE
kasan: rename report and tags files
kasan: don't duplicate config dependencies
kasan: hide invalid free check implementation
kasan: decode stack frame only with KASAN_STACK_ENABLE
kasan, arm64: only init shadow for software modes
kasan, arm64: only use kasan_depth for software modes
kasan, arm64: move initialization message
kasan, arm64: rename kasan_init_tags and mark as __init
kasan: rename addr_has_shadow to addr_has_metadata
kasan: rename print_shadow_for_address to print_memory_metadata
kasan: rename SHADOW layout macros to META
kasan: separate metadata_fetch_row for each mode
kasan: introduce CONFIG_KASAN_HW_TAGS
Vincenzo Frascino <vincenzo.frascino@arm.com>:
arm64: enable armv8.5-a asm-arch option
arm64: mte: add in-kernel MTE helpers
arm64: mte: reset the page tag in page->flags
arm64: mte: add in-kernel tag fault handler
arm64: kasan: allow enabling in-kernel MTE
arm64: mte: convert gcr_user into an exclude mask
arm64: mte: switch GCR_EL1 in kernel entry and exit
kasan, mm: untag page address in free_reserved_area
Andrey Konovalov <andreyknvl@google.com>:
arm64: kasan: align allocations for HW_TAGS
arm64: kasan: add arch layer for memory tagging helpers
kasan: define KASAN_GRANULE_SIZE for HW_TAGS
kasan, x86, s390: update undef CONFIG_KASAN
kasan, arm64: expand CONFIG_KASAN checks
kasan, arm64: implement HW_TAGS runtime
kasan, arm64: print report from tag fault handler
kasan, mm: reset tags when accessing metadata
kasan, arm64: enable CONFIG_KASAN_HW_TAGS
kasan: add documentation for hardware tag-based mode
Vincenzo Frascino <vincenzo.frascino@arm.com>:
kselftest/arm64: check GCR_EL1 after context switch
Andrey Konovalov <andreyknvl@google.com>:
Patch series "kasan: boot parameters for hardware tag-based mode", v4:
kasan: simplify quarantine_put call site
kasan: rename get_alloc/free_info
kasan: introduce set_alloc_info
kasan, arm64: unpoison stack only with CONFIG_KASAN_STACK
kasan: allow VMAP_STACK for HW_TAGS mode
kasan: remove __kasan_unpoison_stack
kasan: inline kasan_reset_tag for tag-based modes
kasan: inline random_tag for HW_TAGS
kasan: open-code kasan_unpoison_slab
kasan: inline (un)poison_range and check_invalid_free
kasan: add and integrate kasan boot parameters
kasan, mm: check kasan_enabled in annotations
kasan, mm: rename kasan_poison_kfree
kasan: don't round_up too much
kasan: simplify assign_tag and set_tag calls
kasan: clarify comment in __kasan_kfree_large
kasan: sanitize objects when metadata doesn't fit
kasan, mm: allow cache merging with no metadata
kasan: update documentation
Documentation/dev-tools/kasan.rst | 274 ++-
arch/Kconfig | 8
arch/arm64/Kconfig | 9
arch/arm64/Makefile | 7
arch/arm64/include/asm/assembler.h | 2
arch/arm64/include/asm/cache.h | 3
arch/arm64/include/asm/esr.h | 1
arch/arm64/include/asm/kasan.h | 17
arch/arm64/include/asm/memory.h | 15
arch/arm64/include/asm/mte-def.h | 16
arch/arm64/include/asm/mte-kasan.h | 67
arch/arm64/include/asm/mte.h | 22
arch/arm64/include/asm/processor.h | 2
arch/arm64/include/asm/string.h | 5
arch/arm64/include/asm/uaccess.h | 23
arch/arm64/kernel/asm-offsets.c | 3
arch/arm64/kernel/cpufeature.c | 3
arch/arm64/kernel/entry.S | 41
arch/arm64/kernel/head.S | 2
arch/arm64/kernel/hibernate.c | 5
arch/arm64/kernel/image-vars.h | 2
arch/arm64/kernel/kaslr.c | 3
arch/arm64/kernel/module.c | 6
arch/arm64/kernel/mte.c | 124 +
arch/arm64/kernel/setup.c | 2
arch/arm64/kernel/sleep.S | 2
arch/arm64/kernel/smp.c | 2
arch/arm64/lib/mte.S | 16
arch/arm64/mm/copypage.c | 9
arch/arm64/mm/fault.c | 59
arch/arm64/mm/kasan_init.c | 41
arch/arm64/mm/mteswap.c | 9
arch/arm64/mm/proc.S | 23
arch/arm64/mm/ptdump.c | 6
arch/s390/boot/string.c | 1
arch/x86/boot/compressed/misc.h | 1
arch/x86/kernel/acpi/wakeup_64.S | 2
include/linux/kasan-checks.h | 2
include/linux/kasan.h | 423 ++++-
include/linux/mm.h | 24
include/linux/moduleloader.h | 3
include/linux/page-flags-layout.h | 2
include/linux/sched.h | 2
include/linux/string.h | 2
init/init_task.c | 2
kernel/fork.c | 4
lib/Kconfig.kasan | 71
lib/test_kasan.c | 2
lib/test_kasan_module.c | 2
mm/kasan/Makefile | 33
mm/kasan/common.c | 1006 +++-----------
mm/kasan/generic.c | 72 -
mm/kasan/generic_report.c | 13
mm/kasan/hw_tags.c | 276 +++
mm/kasan/init.c | 25
mm/kasan/kasan.h | 195 ++
mm/kasan/quarantine.c | 35
mm/kasan/report.c | 363 +----
mm/kasan/report_generic.c | 169 ++
mm/kasan/report_hw_tags.c | 44
mm/kasan/report_sw_tags.c | 22
mm/kasan/shadow.c | 528 +++++++
mm/kasan/sw_tags.c | 34
mm/kasan/tags.c | 7
mm/kasan/tags_report.c | 7
mm/mempool.c | 4
mm/page_alloc.c | 9
mm/page_poison.c | 2
mm/ptdump.c | 13
mm/slab_common.c | 5
mm/slub.c | 29
scripts/Makefile.lib | 2
tools/testing/selftests/arm64/mte/Makefile | 2
tools/testing/selftests/arm64/mte/check_gcr_el1_cswitch.c | 155 ++
74 files changed, 2869 insertions(+), 1553 deletions(-)
* incoming
@ 2020-12-18 22:00 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-12-18 22:00 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
78 patches, based on a409ed156a90093a03fe6a93721ddf4c591eac87.
Subsystems affected by this patch series:
mm/memcg
epoll
mm/kasan
mm/cleanups
epoll
Subsystem: mm/memcg
Alex Shi <alex.shi@linux.alibaba.com>:
Patch series "bail out early for memcg disable":
mm/memcg: bail early from swap accounting if memcg disabled
mm/memcg: warning on !memcg after readahead page charged
Wei Yang <richard.weiyang@gmail.com>:
mm/memcg: remove unused definitions
Shakeel Butt <shakeelb@google.com>:
mm, kvm: account kvm_vcpu_mmap to kmemcg
Hui Su <sh_def@163.com>:
mm/memcontrol:rewrite mem_cgroup_page_lruvec()
Subsystem: epoll
Soheil Hassas Yeganeh <soheil@google.com>:
Patch series "simplify ep_poll":
epoll: check for events when removing a timed out thread from the wait queue
epoll: simplify signal handling
epoll: pull fatal signal checks into ep_send_events()
epoll: move eavail next to the list_empty_careful check
epoll: simplify and optimize busy loop logic
epoll: pull all code between fetch_events and send_event into the loop
epoll: replace gotos with a proper loop
epoll: eliminate unnecessary lock for zero timeout
Subsystem: mm/kasan
Andrey Konovalov <andreyknvl@google.com>:
Patch series "kasan: add hardware tag-based mode for arm64", v11:
kasan: drop unnecessary GPL text from comment headers
kasan: KASAN_VMALLOC depends on KASAN_GENERIC
kasan: group vmalloc code
kasan: shadow declarations only for software modes
kasan: rename (un)poison_shadow to (un)poison_range
kasan: rename KASAN_SHADOW_* to KASAN_GRANULE_*
kasan: only build init.c for software modes
kasan: split out shadow.c from common.c
kasan: define KASAN_MEMORY_PER_SHADOW_PAGE
kasan: rename report and tags files
kasan: don't duplicate config dependencies
kasan: hide invalid free check implementation
kasan: decode stack frame only with KASAN_STACK_ENABLE
kasan, arm64: only init shadow for software modes
kasan, arm64: only use kasan_depth for software modes
kasan, arm64: move initialization message
kasan, arm64: rename kasan_init_tags and mark as __init
kasan: rename addr_has_shadow to addr_has_metadata
kasan: rename print_shadow_for_address to print_memory_metadata
kasan: rename SHADOW layout macros to META
kasan: separate metadata_fetch_row for each mode
kasan: introduce CONFIG_KASAN_HW_TAGS
Vincenzo Frascino <vincenzo.frascino@arm.com>:
arm64: enable armv8.5-a asm-arch option
arm64: mte: add in-kernel MTE helpers
arm64: mte: reset the page tag in page->flags
arm64: mte: add in-kernel tag fault handler
arm64: kasan: allow enabling in-kernel MTE
arm64: mte: convert gcr_user into an exclude mask
arm64: mte: switch GCR_EL1 in kernel entry and exit
kasan, mm: untag page address in free_reserved_area
Andrey Konovalov <andreyknvl@google.com>:
arm64: kasan: align allocations for HW_TAGS
arm64: kasan: add arch layer for memory tagging helpers
kasan: define KASAN_GRANULE_SIZE for HW_TAGS
kasan, x86, s390: update undef CONFIG_KASAN
kasan, arm64: expand CONFIG_KASAN checks
kasan, arm64: implement HW_TAGS runtime
kasan, arm64: print report from tag fault handler
kasan, mm: reset tags when accessing metadata
kasan, arm64: enable CONFIG_KASAN_HW_TAGS
kasan: add documentation for hardware tag-based mode
Vincenzo Frascino <vincenzo.frascino@arm.com>:
kselftest/arm64: check GCR_EL1 after context switch
Andrey Konovalov <andreyknvl@google.com>:
Patch series "kasan: boot parameters for hardware tag-based mode", v4:
kasan: simplify quarantine_put call site
kasan: rename get_alloc/free_info
kasan: introduce set_alloc_info
kasan, arm64: unpoison stack only with CONFIG_KASAN_STACK
kasan: allow VMAP_STACK for HW_TAGS mode
kasan: remove __kasan_unpoison_stack
kasan: inline kasan_reset_tag for tag-based modes
kasan: inline random_tag for HW_TAGS
kasan: open-code kasan_unpoison_slab
kasan: inline (un)poison_range and check_invalid_free
kasan: add and integrate kasan boot parameters
kasan, mm: check kasan_enabled in annotations
kasan, mm: rename kasan_poison_kfree
kasan: don't round_up too much
kasan: simplify assign_tag and set_tag calls
kasan: clarify comment in __kasan_kfree_large
kasan: sanitize objects when metadata doesn't fit
kasan, mm: allow cache merging with no metadata
kasan: update documentation
Subsystem: mm/cleanups
Colin Ian King <colin.king@canonical.com>:
mm/Kconfig: fix spelling mistake "whats" -> "what's"
Subsystem: epoll
Willem de Bruijn <willemb@google.com>:
Patch series "add epoll_pwait2 syscall", v4:
epoll: convert internal api to timespec64
epoll: add syscall epoll_pwait2
epoll: wire up syscall epoll_pwait2
selftests/filesystems: expand epoll with epoll_pwait2
Documentation/dev-tools/kasan.rst | 274 +-
arch/Kconfig | 8
arch/alpha/kernel/syscalls/syscall.tbl | 1
arch/arm/tools/syscall.tbl | 1
arch/arm64/Kconfig | 9
arch/arm64/Makefile | 7
arch/arm64/include/asm/assembler.h | 2
arch/arm64/include/asm/cache.h | 3
arch/arm64/include/asm/esr.h | 1
arch/arm64/include/asm/kasan.h | 17
arch/arm64/include/asm/memory.h | 15
arch/arm64/include/asm/mte-def.h | 16
arch/arm64/include/asm/mte-kasan.h | 67
arch/arm64/include/asm/mte.h | 22
arch/arm64/include/asm/processor.h | 2
arch/arm64/include/asm/string.h | 5
arch/arm64/include/asm/uaccess.h | 23
arch/arm64/include/asm/unistd.h | 2
arch/arm64/include/asm/unistd32.h | 2
arch/arm64/kernel/asm-offsets.c | 3
arch/arm64/kernel/cpufeature.c | 3
arch/arm64/kernel/entry.S | 41
arch/arm64/kernel/head.S | 2
arch/arm64/kernel/hibernate.c | 5
arch/arm64/kernel/image-vars.h | 2
arch/arm64/kernel/kaslr.c | 3
arch/arm64/kernel/module.c | 6
arch/arm64/kernel/mte.c | 124 +
arch/arm64/kernel/setup.c | 2
arch/arm64/kernel/sleep.S | 2
arch/arm64/kernel/smp.c | 2
arch/arm64/lib/mte.S | 16
arch/arm64/mm/copypage.c | 9
arch/arm64/mm/fault.c | 59
arch/arm64/mm/kasan_init.c | 41
arch/arm64/mm/mteswap.c | 9
arch/arm64/mm/proc.S | 23
arch/arm64/mm/ptdump.c | 6
arch/ia64/kernel/syscalls/syscall.tbl | 1
arch/m68k/kernel/syscalls/syscall.tbl | 1
arch/microblaze/kernel/syscalls/syscall.tbl | 1
arch/mips/kernel/syscalls/syscall_n32.tbl | 1
arch/mips/kernel/syscalls/syscall_n64.tbl | 1
arch/mips/kernel/syscalls/syscall_o32.tbl | 1
arch/parisc/kernel/syscalls/syscall.tbl | 1
arch/powerpc/kernel/syscalls/syscall.tbl | 1
arch/s390/boot/string.c | 1
arch/s390/kernel/syscalls/syscall.tbl | 1
arch/sh/kernel/syscalls/syscall.tbl | 1
arch/sparc/kernel/syscalls/syscall.tbl | 1
arch/x86/boot/compressed/misc.h | 1
arch/x86/entry/syscalls/syscall_32.tbl | 1
arch/x86/entry/syscalls/syscall_64.tbl | 1
arch/x86/kernel/acpi/wakeup_64.S | 2
arch/x86/kvm/x86.c | 2
arch/xtensa/kernel/syscalls/syscall.tbl | 1
fs/eventpoll.c | 359 ++-
include/linux/compat.h | 6
include/linux/kasan-checks.h | 2
include/linux/kasan.h | 423 ++--
include/linux/memcontrol.h | 137 -
include/linux/mm.h | 24
include/linux/mmdebug.h | 13
include/linux/moduleloader.h | 3
include/linux/page-flags-layout.h | 2
include/linux/sched.h | 2
include/linux/string.h | 2
include/linux/syscalls.h | 5
include/uapi/asm-generic/unistd.h | 4
init/init_task.c | 2
kernel/fork.c | 4
kernel/sys_ni.c | 2
lib/Kconfig.kasan | 71
lib/test_kasan.c | 2
lib/test_kasan_module.c | 2
mm/Kconfig | 2
mm/kasan/Makefile | 33
mm/kasan/common.c | 1006 ++--------
mm/kasan/generic.c | 72
mm/kasan/generic_report.c | 13
mm/kasan/hw_tags.c | 294 ++
mm/kasan/init.c | 25
mm/kasan/kasan.h | 204 +-
mm/kasan/quarantine.c | 35
mm/kasan/report.c | 363 +--
mm/kasan/report_generic.c | 169 +
mm/kasan/report_hw_tags.c | 44
mm/kasan/report_sw_tags.c | 22
mm/kasan/shadow.c | 541 +++++
mm/kasan/sw_tags.c | 34
mm/kasan/tags.c | 7
mm/kasan/tags_report.c | 7
mm/memcontrol.c | 53
mm/mempool.c | 4
mm/page_alloc.c | 9
mm/page_poison.c | 2
mm/ptdump.c | 13
mm/slab_common.c | 5
mm/slub.c | 29
scripts/Makefile.lib | 2
tools/testing/selftests/arm64/mte/Makefile | 2
tools/testing/selftests/arm64/mte/check_gcr_el1_cswitch.c | 155 +
tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c | 72
virt/kvm/coalesced_mmio.c | 2
virt/kvm/kvm_main.c | 2
105 files changed, 3268 insertions(+), 1873 deletions(-)
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2020-12-16 4:41 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-12-16 4:41 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
- lots of little subsystems
- some post-linux-next MM material. Most of this awaits more merging
of other trees.
95 patches, based on 489e9fea66f31086f85d9a18e61e4791d94a56a4.
Subsystems affected by this patch series:
mm/swap
mm/memory-hotplug
alpha
procfs
misc
core-kernel
bitmap
lib
lz4
bitops
checkpatch
nilfs
kdump
rapidio
gcov
bfs
relay
resource
ubsan
reboot
fault-injection
lzo
apparmor
mm/pagemap
mm/cleanups
mm/gup
Subsystem: mm/swap
Zhaoyang Huang <huangzhaoyang@gmail.com>:
mm: fix a race on nr_swap_pages
Subsystem: mm/memory-hotplug
Laurent Dufour <ldufour@linux.ibm.com>:
mm/memory_hotplug: quieting offline operation
Subsystem: alpha
Thomas Gleixner <tglx@linutronix.de>:
alpha: replace bogus in_interrupt()
Subsystem: procfs
Randy Dunlap <rdunlap@infradead.org>:
procfs: delete duplicated words + other fixes
Anand K Mistry <amistry@google.com>:
proc: provide details on indirect branch speculation
Alexey Dobriyan <adobriyan@gmail.com>:
proc: fix lookup in /proc/net subdirectories after setns(2)
Hui Su <sh_def@163.com>:
fs/proc: make pde_get() return nothing
Subsystem: misc
Christophe Leroy <christophe.leroy@csgroup.eu>:
asm-generic: force inlining of get_order() to work around gcc10 poor decision
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
kernel.h: split out mathematical helpers
Subsystem: core-kernel
Hui Su <sh_def@163.com>:
kernel/acct.c: use #elif instead of #end and #elif
Subsystem: bitmap
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
include/linux/bitmap.h: convert bitmap_empty() / bitmap_full() to return boolean
"Ma, Jianpeng" <jianpeng.ma@intel.com>:
bitmap: remove unused function declaration
Subsystem: lib
Geert Uytterhoeven <geert@linux-m68k.org>:
lib/test_free_pages.c: add basic progress indicators
"Gustavo A. R. Silva" <gustavoars@kernel.org>:
Patch series "lib/stackdepot.c: Replace one-element array with flexible-array member":
lib/stackdepot.c: replace one-element array with flexible-array member
lib/stackdepot.c: use flex_array_size() helper in memcpy()
lib/stackdepot.c: use array_size() helper in jhash2()
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
lib/test_lockup.c: minimum fix to get it compiled on PREEMPT_RT
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
lib/list_kunit: follow new file name convention for KUnit tests
lib/linear_ranges_kunit: follow new file name convention for KUnit tests
lib/bits_kunit: follow new file name convention for KUnit tests
lib/cmdline: fix get_option() for strings starting with hyphen
lib/cmdline: allow NULL to be an output for get_option()
lib/cmdline_kunit: add a new test suite for cmdline API
Jakub Jelinek <jakub@redhat.com>:
ilog2: improve ilog2 for constant arguments
Nick Desaulniers <ndesaulniers@google.com>:
lib/string: remove unnecessary #undefs
Daniel Axtens <dja@axtens.net>:
Patch series "Fortify strscpy()", v7:
lib: string.h: detect intra-object overflow in fortified string functions
lkdtm: tests for FORTIFY_SOURCE
Francis Laniel <laniel_francis@privacyrequired.com>:
string.h: add FORTIFY coverage for strscpy()
drivers/misc/lkdtm: add new file in LKDTM to test fortified strscpy
drivers/misc/lkdtm/lkdtm.h: correct wrong filenames in comment
Alexey Dobriyan <adobriyan@gmail.com>:
lib: cleanup kstrto*() usage
Subsystem: lz4
Gao Xiang <hsiangkao@redhat.com>:
lib/lz4: explicitly support in-place decompression
Subsystem: bitops
Syed Nayyar Waris <syednwaris@gmail.com>:
Patch series "Introduce the for_each_set_clump macro", v12:
bitops: introduce the for_each_set_clump macro
lib/test_bitmap.c: add for_each_set_clump test cases
gpio: thunderx: utilize for_each_set_clump macro
gpio: xilinx: utilize generic bitmap_get_value and _set_value
Subsystem: checkpatch
Dwaipayan Ray <dwaipayanray1@gmail.com>:
checkpatch: add new exception to repeated word check
Aditya Srivastava <yashsri421@gmail.com>:
checkpatch: fix false positives in REPEATED_WORD warning
Łukasz Stelmach <l.stelmach@samsung.com>:
checkpatch: ignore generated CamelCase defines and enum values
Joe Perches <joe@perches.com>:
checkpatch: prefer static const declarations
checkpatch: allow --fix removal of unnecessary break statements
Dwaipayan Ray <dwaipayanray1@gmail.com>:
checkpatch: extend attributes check to handle more patterns
Tom Rix <trix@redhat.com>:
checkpatch: add a fixer for missing newline at eof
Joe Perches <joe@perches.com>:
checkpatch: update __attribute__((section("name"))) quote removal
Aditya Srivastava <yashsri421@gmail.com>:
checkpatch: add fix option for GERRIT_CHANGE_ID
Joe Perches <joe@perches.com>:
checkpatch: add __alias and __weak to suggested __attribute__ conversions
Dwaipayan Ray <dwaipayanray1@gmail.com>:
checkpatch: improve email parsing
checkpatch: fix spelling errors and remove repeated word
Aditya Srivastava <yashsri421@gmail.com>:
checkpatch: avoid COMMIT_LOG_LONG_LINE warning for signature tags
Dwaipayan Ray <dwaipayanray1@gmail.com>:
checkpatch: fix unescaped left brace
Aditya Srivastava <yashsri421@gmail.com>:
checkpatch: add fix option for ASSIGNMENT_CONTINUATIONS
checkpatch: add fix option for LOGICAL_CONTINUATIONS
checkpatch: add fix and improve warning msg for non-standard signature
Dwaipayan Ray <dwaipayanray1@gmail.com>:
checkpatch: add warning for unnecessary use of %h[xudi] and %hh[xudi]
checkpatch: add warning for lines starting with a '#' in commit log
checkpatch: fix TYPO_SPELLING check for words with apostrophe
Joe Perches <joe@perches.com>:
checkpatch: add printk_once and printk_ratelimit to prefer pr_<level> warning
Subsystem: nilfs
Alex Shi <alex.shi@linux.alibaba.com>:
fs/nilfs2: remove some unused macros to tame gcc
Subsystem: kdump
Alexander Egorenkov <egorenar@linux.ibm.com>:
kdump: append uts_namespace.name offset to VMCOREINFO
Subsystem: rapidio
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
rapidio: remove unused rio_get_asm() and rio_get_device()
Subsystem: gcov
Nick Desaulniers <ndesaulniers@google.com>:
gcov: remove support for GCC < 4.9
Alex Shi <alex.shi@linux.alibaba.com>:
gcov: fix kernel-doc markup issue
Subsystem: bfs
Randy Dunlap <rdunlap@infradead.org>:
bfs: don't use WARNING: string when it's just info.
Subsystem: relay
Jani Nikula <jani.nikula@intel.com>:
Patch series "relay: cleanup and const callbacks", v2:
relay: remove unused buf_mapped and buf_unmapped callbacks
relay: require non-NULL callbacks in relay_open()
relay: make create_buf_file and remove_buf_file callbacks mandatory
relay: allow the use of const callback structs
drm/i915: make relay callbacks const
ath10k: make relay callbacks const
ath11k: make relay callbacks const
ath9k: make relay callbacks const
blktrace: make relay callbacks const
Subsystem: resource
Mauro Carvalho Chehab <mchehab+huawei@kernel.org>:
kernel/resource.c: fix kernel-doc markups
Subsystem: ubsan
Kees Cook <keescook@chromium.org>:
Patch series "Clean up UBSAN Makefile", v2:
ubsan: remove redundant -Wno-maybe-uninitialized
ubsan: move cc-option tests into Kconfig
ubsan: disable object-size sanitizer under GCC
ubsan: disable UBSAN_TRAP for all*config
ubsan: enable for all*config builds
ubsan: remove UBSAN_MISC in favor of individual options
ubsan: expand tests and reporting
Dmitry Vyukov <dvyukov@google.com>:
kcov: don't instrument with UBSAN
Zou Wei <zou_wei@huawei.com>:
lib/ubsan.c: mark type_check_kinds with static keyword
Subsystem: reboot
Matteo Croce <mcroce@microsoft.com>:
reboot: refactor and comment the cpu selection code
reboot: allow to specify reboot mode via sysfs
reboot: remove cf9_safe from allowed types and rename cf9_force
Patch series "reboot: sysfs improvements":
reboot: allow to override reboot type if quirks are found
reboot: hide from sysfs not applicable settings
Subsystem: fault-injection
Barnabás Pőcze <pobrn@protonmail.com>:
fault-injection: handle EI_ETYPE_TRUE
Subsystem: lzo
Jason Yan <yanaijie@huawei.com>:
lib/lzo/lzo1x_compress.c: make lzogeneric1x_1_compress() static
Subsystem: apparmor
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
apparmor: remove duplicate macro list_entry_is_head()
Subsystem: mm/pagemap
Christoph Hellwig <hch@lst.de>:
Patch series "simplify follow_pte a bit":
mm: unexport follow_pte_pmd
mm: simplify follow_pte{,pmd}
Subsystem: mm/cleanups
Haitao Shi <shihaitao1@huawei.com>:
mm: fix some spelling mistakes in comments
Subsystem: mm/gup
Jann Horn <jannh@google.com>:
mmap locking API: don't check locking if the mm isn't live yet
mm/gup: assert that the mmap lock is held in __get_user_pages()
Documentation/ABI/testing/sysfs-kernel-reboot | 32
Documentation/admin-guide/kdump/vmcoreinfo.rst | 6
Documentation/dev-tools/ubsan.rst | 1
Documentation/filesystems/proc.rst | 2
MAINTAINERS | 5
arch/alpha/kernel/process.c | 2
arch/powerpc/kernel/vmlinux.lds.S | 4
arch/s390/pci/pci_mmio.c | 4
drivers/gpio/gpio-thunderx.c | 11
drivers/gpio/gpio-xilinx.c | 61 -
drivers/gpu/drm/i915/gt/uc/intel_guc_log.c | 2
drivers/misc/lkdtm/Makefile | 1
drivers/misc/lkdtm/bugs.c | 50 +
drivers/misc/lkdtm/core.c | 3
drivers/misc/lkdtm/fortify.c | 82 ++
drivers/misc/lkdtm/lkdtm.h | 19
drivers/net/wireless/ath/ath10k/spectral.c | 2
drivers/net/wireless/ath/ath11k/spectral.c | 2
drivers/net/wireless/ath/ath9k/common-spectral.c | 2
drivers/rapidio/rio.c | 81 --
fs/bfs/inode.c | 2
fs/dax.c | 9
fs/exec.c | 8
fs/nfs/callback_proc.c | 5
fs/nilfs2/segment.c | 5
fs/proc/array.c | 28
fs/proc/base.c | 2
fs/proc/generic.c | 24
fs/proc/internal.h | 10
fs/proc/proc_net.c | 20
include/asm-generic/bitops/find.h | 19
include/asm-generic/getorder.h | 2
include/linux/bitmap.h | 67 +-
include/linux/bitops.h | 24
include/linux/dcache.h | 1
include/linux/iommu-helper.h | 4
include/linux/kernel.h | 173 -----
include/linux/log2.h | 3
include/linux/math.h | 177 +++++
include/linux/mm.h | 6
include/linux/mm_types.h | 10
include/linux/mmap_lock.h | 16
include/linux/proc_fs.h | 8
include/linux/rcu_node_tree.h | 2
include/linux/relay.h | 29
include/linux/rio_drv.h | 3
include/linux/string.h | 75 +-
include/linux/units.h | 2
kernel/Makefile | 3
kernel/acct.c | 7
kernel/crash_core.c | 1
kernel/fail_function.c | 6
kernel/gcov/gcc_4_7.c | 10
kernel/reboot.c | 308 ++++++++-
kernel/relay.c | 111 ---
kernel/resource.c | 24
kernel/trace/blktrace.c | 2
lib/Kconfig.debug | 11
lib/Kconfig.ubsan | 154 +++-
lib/Makefile | 7
lib/bits_kunit.c | 75 ++
lib/cmdline.c | 20
lib/cmdline_kunit.c | 100 +++
lib/errname.c | 1
lib/error-inject.c | 2
lib/errseq.c | 1
lib/find_bit.c | 17
lib/linear_ranges_kunit.c | 228 +++++++
lib/list-test.c | 748 -----------------------
lib/list_kunit.c | 748 +++++++++++++++++++++++
lib/lz4/lz4_decompress.c | 6
lib/lz4/lz4defs.h | 1
lib/lzo/lzo1x_compress.c | 2
lib/math/div64.c | 4
lib/math/int_pow.c | 2
lib/math/int_sqrt.c | 3
lib/math/reciprocal_div.c | 9
lib/stackdepot.c | 11
lib/string.c | 4
lib/test_bitmap.c | 143 ++++
lib/test_bits.c | 75 --
lib/test_firmware.c | 9
lib/test_free_pages.c | 5
lib/test_kmod.c | 26
lib/test_linear_ranges.c | 228 -------
lib/test_lockup.c | 16
lib/test_ubsan.c | 74 ++
lib/ubsan.c | 2
mm/filemap.c | 2
mm/gup.c | 2
mm/huge_memory.c | 2
mm/khugepaged.c | 2
mm/memblock.c | 2
mm/memory.c | 36 -
mm/memory_hotplug.c | 2
mm/migrate.c | 2
mm/page_ext.c | 2
mm/swapfile.c | 11
scripts/Makefile.ubsan | 49 -
scripts/checkpatch.pl | 495 +++++++++++----
security/apparmor/apparmorfs.c | 3
tools/testing/selftests/lkdtm/tests.txt | 1
102 files changed, 3022 insertions(+), 1899 deletions(-)
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2020-12-15 22:49 ` incoming Linus Torvalds
@ 2020-12-15 22:55 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-12-15 22:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Linux-MM, mm-commits
On Tue, 15 Dec 2020 14:49:24 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Tue, Dec 15, 2020 at 2:48 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I will try to apply it on top of my merge of your previous series instead.
>
> Yes, then it applies cleanly. So apparently we just have different
> concepts of what really constitutes a "base" for applying your series.
>
oop, sorry, yes, the "based on" thing was wrong because I had two
series in flight simultaneously. I've never tried that before..
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2020-12-15 22:48 ` incoming Linus Torvalds
@ 2020-12-15 22:49 ` Linus Torvalds
2020-12-15 22:55 ` incoming Andrew Morton
0 siblings, 1 reply; 786+ messages in thread
From: Linus Torvalds @ 2020-12-15 22:49 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux-MM, mm-commits
On Tue, Dec 15, 2020 at 2:48 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I will try to apply it on top of my merge of your previous series instead.
Yes, then it applies cleanly. So apparently we just have different
concepts of what really constitutes a "base" for applying your series.
Linus
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2020-12-15 20:32 incoming Andrew Morton
2020-12-15 21:00 ` incoming Linus Torvalds
@ 2020-12-15 22:48 ` Linus Torvalds
2020-12-15 22:49 ` incoming Linus Torvalds
1 sibling, 1 reply; 786+ messages in thread
From: Linus Torvalds @ 2020-12-15 22:48 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux-MM, mm-commits
On Tue, Dec 15, 2020 at 12:32 PM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> - more MM work: a memcg scalability improvement
>
> 19 patches, based on 148842c98a24e508aecb929718818fbf4c2a6ff3.
With your re-send, I get all patches, but they don't actually apply cleanly.
Is that base correct?
I get
error: patch failed: mm/huge_memory.c:2750
error: mm/huge_memory.c: patch does not apply
Patch failed at 0004 mm/thp: narrow lru locking
for that patch "[patch 04/19] mm/thp: narrow lru locking", and that's
definitely true: the patch fragment has
    @@ -2750,7 +2751,7 @@ int split_huge_page_to_list(struct page
                                     __dec_lruvec_page_state(head, NR_FILE_THPS);
                     }
    -                __split_huge_page(page, list, end, flags);
    +                __split_huge_page(page, list, end);
                     ret = 0;
             } else {
                     if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
but that __dec_lruvec_page_state() conversion was done by your
previous commit series.
So I have the feeling that what you actually mean by "base" isn't
actually really the base for that series at all..
I will try to apply it on top of my merge of your previous series instead.
Linus
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2020-12-15 20:32 incoming Andrew Morton
@ 2020-12-15 21:00 ` Linus Torvalds
2020-12-15 22:48 ` incoming Linus Torvalds
1 sibling, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2020-12-15 21:00 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux-MM, mm-commits
On Tue, Dec 15, 2020 at 12:32 PM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> - more MM work: a memcg scalability improvement
>
> 19 patches, based on 148842c98a24e508aecb929718818fbf4c2a6ff3.
I'm not seeing patch 10/19 at all.
And patch 19/19 is corrupted and has an attachment with a '^P'
character in it. I could fix it up, but with the missing patch in the
middle I'm not going to even try. 'b4' is also very unhappy about that
patch 19/19.
I don't know what went wrong, but I'll ignore this send - please
re-send the series at your leisure, ok?
Linus
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2020-12-15 20:32 Andrew Morton
2020-12-15 21:00 ` incoming Linus Torvalds
2020-12-15 22:48 ` incoming Linus Torvalds
0 siblings, 2 replies; 786+ messages in thread
From: Andrew Morton @ 2020-12-15 20:32 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
- more MM work: a memcg scalability improvement
19 patches, based on 148842c98a24e508aecb929718818fbf4c2a6ff3.
Subsystems affected by this patch series:
Alex Shi <alex.shi@linux.alibaba.com>:
Patch series "per memcg lru lock", v21:
mm/thp: move lru_add_page_tail() to huge_memory.c
mm/thp: use head for head page in lru_add_page_tail()
mm/thp: simplify lru_add_page_tail()
mm/thp: narrow lru locking
mm/vmscan: remove unnecessary lruvec adding
mm/rmap: stop store reordering issue on page->mapping
Hugh Dickins <hughd@google.com>:
mm: page_idle_get_page() does not need lru_lock
Alex Shi <alex.shi@linux.alibaba.com>:
mm/memcg: add debug checking in lock_page_memcg
mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn
mm/lru: move lock into lru_note_cost
mm/vmscan: remove lruvec reget in move_pages_to_lru
mm/mlock: remove lru_lock on TestClearPageMlocked
mm/mlock: remove __munlock_isolate_lru_page()
mm/lru: introduce TestClearPageLRU()
mm/compaction: do page isolation first in compaction
mm/swap.c: serialize memcg changes in pagevec_lru_move_fn
mm/lru: replace pgdat lru_lock with lruvec lock
Alexander Duyck <alexander.h.duyck@linux.intel.com>:
mm/lru: introduce relock_page_lruvec()
Hugh Dickins <hughd@google.com>:
mm/lru: revise the comments of lru_lock
Documentation/admin-guide/cgroup-v1/memcg_test.rst | 15 -
Documentation/admin-guide/cgroup-v1/memory.rst | 23 -
Documentation/trace/events-kmem.rst | 2
Documentation/vm/unevictable-lru.rst | 22 -
include/linux/memcontrol.h | 110 +++++++
include/linux/mm_types.h | 2
include/linux/mmzone.h | 6
include/linux/page-flags.h | 1
include/linux/swap.h | 4
mm/compaction.c | 98 ++++---
mm/filemap.c | 4
mm/huge_memory.c | 109 ++++---
mm/memcontrol.c | 84 +++++-
mm/mlock.c | 93 ++----
mm/mmzone.c | 1
mm/page_alloc.c | 1
mm/page_idle.c | 4
mm/rmap.c | 12
mm/swap.c | 292 ++++++++-------------
mm/vmscan.c | 239 ++++++++---------
mm/workingset.c | 2
21 files changed, 644 insertions(+), 480 deletions(-)
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2020-12-15 3:30 ` incoming Linus Torvalds
(?)
@ 2020-12-15 14:04 ` Konstantin Ryabitsev
-1 siblings, 0 replies; 786+ messages in thread
From: Konstantin Ryabitsev @ 2020-12-15 14:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, mm-commits, Linux-MM
On Mon, Dec 14, 2020 at 07:30:54PM -0800, Linus Torvalds wrote:
> > All the patches except for _one_ get a nice little green check-mark
> > next to them when I use 'git am' on this series.
> >
> > The one that did not was [patch 192/200].
> >
> > I have no idea why
>
> Hmm. It looks like that patch is the only one in the series with the
> ">From" marker in the commit message, from the silly "clarify that
> this isn't the first line in a new message in mbox format".
>
> And "b4 am" has turned the single ">" into two, making the stupid
> marker worse, and actually corrupting the end result.
It's a bug in b4 that I overlooked. Public-inbox emits mboxrd-formatted
.mbox files, while Python's mailbox.mbox consumes mboxo only. The main
distinction between the two is precisely that mboxrd will convert
">From " into ">>From " in an attempt to avoid corruption during
escape/unescape (it didn't end up fixing the problem 100% and mostly
introduced incompatibilities like this one).
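To make the difference between the two escaping schemes concrete, here is a
minimal sketch in Python. It is not b4's or public-inbox's actual code; the
function names are hypothetical, and it only models the "From " quoting rule
described above, assuming message bodies with "\n" line endings.

```python
import re

# mboxo: only lines that start with "From " are quoted on write.
_MBOXO_FROM = re.compile(r"^From ", re.MULTILINE)
# mboxrd: lines that start with zero or more ">" followed by "From "
# get one extra ">" on write, so the quoting can be undone exactly.
_MBOXRD_FROM = re.compile(r"^(>*From )", re.MULTILINE)
_MBOXRD_ESCAPED = re.compile(r"^>(>*From )", re.MULTILINE)


def escape_mboxo(body: str) -> str:
    """Quote a body the mboxo way (hypothetical helper)."""
    return _MBOXO_FROM.sub(">From ", body)


def escape_mboxrd(body: str) -> str:
    """Quote a body the mboxrd way (hypothetical helper)."""
    return _MBOXRD_FROM.sub(r">\1", body)


def unescape_mboxrd(body: str) -> str:
    """Undo mboxrd quoting by stripping exactly one leading '>'."""
    return _MBOXRD_ESCAPED.sub(r"\1", body)


# A commit message that already contains a ">From " line.
body = "Intro line\n>From the silly marker\nFrom scratch\n"

as_mboxo = escape_mboxo(body)
as_mboxrd = escape_mboxrd(body)
round_trip = unescape_mboxrd(as_mboxrd)
```

In this sketch `as_mboxrd` carries ">>From the silly marker". A consumer that
assumes mboxo and never strips the extra ">" would keep that doubled marker in
the applied patch, which matches the symptom reported for [patch 192/200].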
I have a fix in master/stable-0.6.y and I'll release a 0.6.2 before the
end of the week.
Thanks for the report.
-K
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2020-12-15 3:25 ` incoming Linus Torvalds
@ 2020-12-15 3:30 ` Linus Torvalds
-1 siblings, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2020-12-15 3:30 UTC (permalink / raw)
To: Andrew Morton, Konstantin Ryabitsev; +Cc: mm-commits, Linux-MM
On Mon, Dec 14, 2020 at 7:25 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> All the patches except for _one_ get a nice little green check-mark
> next to them when I use 'git am' on this series.
>
> The one that did not was [patch 192/200].
>
> I have no idea why
Hmm. It looks like that patch is the only one in the series with the
">From" marker in the commit message, from the silly "clarify that
this isn't the first line in a new message in mbox format".
And "b4 am" has turned the single ">" into two, making the stupid
marker worse, and actually corrupting the end result.
Coincidence? Or cause?
Linus
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2020-12-15 3:02 incoming Andrew Morton
@ 2020-12-15 3:25 ` Linus Torvalds
0 siblings, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2020-12-15 3:25 UTC (permalink / raw)
To: Andrew Morton, Konstantin Ryabitsev; +Cc: mm-commits, Linux-MM
On Mon, Dec 14, 2020 at 7:02 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 200 patches, based on 2c85ebc57b3e1817b6ce1a6b703928e113a90442.
I haven't actually processed the patches yet, but I have a question
for Konstantin wrt b4.
All the patches except for _one_ get a nice little green check-mark
next to them when I use 'git am' on this series.
The one that did not was [patch 192/200].
I have no idea why - and it doesn't matter a lot to me, it just stood
out as being different. I'm assuming Andrew has started doing patch
attestation, and that patch failed. But if so, maybe Konstantin wants
to know what went wrong.
Konstantin?
Linus
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2020-12-15 3:02 Andrew Morton
2020-12-15 3:25 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2020-12-15 3:02 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
- a few random little subsystems
- almost all of the MM patches which are staged ahead of linux-next
material. I'll trickle the post-linux-next work in as the dependents
get merged up.
200 patches, based on 2c85ebc57b3e1817b6ce1a6b703928e113a90442.
Subsystems affected by this patch series:
kthread
kbuild
ide
ntfs
ocfs2
arch
mm/slab-generic
mm/slab
mm/slub
mm/dax
mm/debug
mm/pagecache
mm/gup
mm/swap
mm/shmem
mm/memcg
mm/pagemap
mm/mremap
mm/hmm
mm/vmalloc
mm/documentation
mm/kasan
mm/pagealloc
mm/memory-failure
mm/hugetlb
mm/vmscan
mm/z3fold
mm/compaction
mm/oom-kill
mm/migration
mm/cma
mm/page-poison
mm/userfaultfd
mm/zswap
mm/zsmalloc
mm/uaccess
mm/zram
mm/cleanups
Subsystem: kthread
Rob Clark <robdclark@chromium.org>:
kthread: add kthread_work tracepoints
Petr Mladek <pmladek@suse.com>:
kthread_worker: document CPU hotplug handling
Subsystem: kbuild
Petr Vorel <petr.vorel@gmail.com>:
uapi: move constants from <linux/kernel.h> to <linux/const.h>
Subsystem: ide
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
ide/falcon: remove in_interrupt() usage
ide: remove BUG_ON(in_interrupt() || irqs_disabled()) from ide_unregister()
Subsystem: ntfs
Alex Shi <alex.shi@linux.alibaba.com>:
fs/ntfs: remove unused varibles
fs/ntfs: remove unused variable attr_len
Subsystem: ocfs2
Tom Rix <trix@redhat.com>:
fs/ocfs2/cluster/tcp.c: remove unneeded break
Mauricio Faria de Oliveira <mfo@canonical.com>:
ocfs2: ratelimit the 'max lookup times reached' notice
Subsystem: arch
Colin Ian King <colin.king@canonical.com>:
arch/Kconfig: fix spelling mistakes
Subsystem: mm/slab-generic
Hui Su <sh_def@163.com>:
mm/slab_common.c: use list_for_each_entry in dump_unreclaimable_slab()
Bartosz Golaszewski <bgolaszewski@baylibre.com>:
Patch series "slab: provide and use krealloc_array()", v3:
mm: slab: clarify krealloc()'s behavior with __GFP_ZERO
mm: slab: provide krealloc_array()
ALSA: pcm: use krealloc_array()
vhost: vringh: use krealloc_array()
pinctrl: use krealloc_array()
edac: ghes: use krealloc_array()
drm: atomic: use krealloc_array()
hwtracing: intel: use krealloc_array()
dma-buf: use krealloc_array()
Vlastimil Babka <vbabka@suse.cz>:
mm, slab, slub: clear the slab_cache field when freeing page
Subsystem: mm/slab
Alexander Popov <alex.popov@linux.com>:
mm/slab: rerform init_on_free earlier
Subsystem: mm/slub
Vlastimil Babka <vbabka@suse.cz>:
mm, slub: use kmem_cache_debug_flags() in deactivate_slab()
Bharata B Rao <bharata@linux.ibm.com>:
mm/slub: let number of online CPUs determine the slub page order
Subsystem: mm/dax
Dan Williams <dan.j.williams@intel.com>:
device-dax/kmem: use struct_size()
Subsystem: mm/debug
Zhenhua Huang <zhenhuah@codeaurora.org>:
mm: fix page_owner initializing issue for arm32
Liam Mark <lmark@codeaurora.org>:
mm/page_owner: record timestamp and pid
Subsystem: mm/pagecache
Kent Overstreet <kent.overstreet@gmail.com>:
Patch series "generic_file_buffered_read() improvements", v2:
mm/filemap/c: break generic_file_buffered_read up into multiple functions
mm/filemap.c: generic_file_buffered_read() now uses find_get_pages_contig
Alex Shi <alex.shi@linux.alibaba.com>:
mm/truncate: add parameter explanation for invalidate_mapping_pagevec
Hailong Liu <carver4lio@163.com>:
mm/filemap.c: remove else after a return
Subsystem: mm/gup
John Hubbard <jhubbard@nvidia.com>:
Patch series "selftests/vm: gup_test, hmm-tests, assorted improvements", v3:
mm/gup_benchmark: rename to mm/gup_test
selftests/vm: use a common gup_test.h
selftests/vm: rename run_vmtests --> run_vmtests.sh
selftests/vm: minor cleanup: Makefile and gup_test.c
selftests/vm: only some gup_test items are really benchmarks
selftests/vm: gup_test: introduce the dump_pages() sub-test
selftests/vm: run_vmtests.sh: update and clean up gup_test invocation
selftests/vm: hmm-tests: remove the libhugetlbfs dependency
selftests/vm: 2x speedup for run_vmtests.sh
Barry Song <song.bao.hua@hisilicon.com>:
mm/gup_test.c: mark gup_test_init as __init function
mm/gup_test: GUP_TEST depends on DEBUG_FS
Jason Gunthorpe <jgg@nvidia.com>:
Patch series "Add a seqcount between gup_fast and copy_page_range()", v4:
mm/gup: reorganize internal_get_user_pages_fast()
mm/gup: prevent gup_fast from racing with COW during fork
mm/gup: remove the vma allocation from gup_longterm_locked()
mm/gup: combine put_compound_head() and unpin_user_page()
Subsystem: mm/swap
Ralph Campbell <rcampbell@nvidia.com>:
mm: handle zone device pages in release_pages()
Miaohe Lin <linmiaohe@huawei.com>:
mm/swapfile.c: use helper function swap_count() in add_swap_count_continuation()
mm/swap_state: skip meaningless swap cache readahead when ra_info.win == 0
mm/swapfile.c: remove unnecessary out label in __swap_duplicate()
mm/swapfile.c: use memset to fill the swap_map with SWAP_HAS_CACHE
Jeff Layton <jlayton@kernel.org>:
mm: remove pagevec_lookup_range_nr_tag()
Subsystem: mm/shmem
Hui Su <sh_def@163.com>:
mm/shmem.c: make shmem_mapping() inline
Randy Dunlap <rdunlap@infradead.org>:
tmpfs: fix Documentation nits
Subsystem: mm/memcg
Johannes Weiner <hannes@cmpxchg.org>:
mm: memcontrol: add file_thp, shmem_thp to memory.stat
Muchun Song <songmuchun@bytedance.com>:
mm: memcontrol: remove unused mod_memcg_obj_state()
Miaohe Lin <linmiaohe@huawei.com>:
mm: memcontrol: eliminate redundant check in __mem_cgroup_insert_exceeded()
Muchun Song <songmuchun@bytedance.com>:
mm: memcg/slab: fix return of child memcg objcg for root memcg
mm: memcg/slab: fix use after free in obj_cgroup_charge
Shakeel Butt <shakeelb@google.com>:
mm/rmap: always do TTU_IGNORE_ACCESS
Alex Shi <alex.shi@linux.alibaba.com>:
mm/memcg: update page struct member in comments
Roman Gushchin <guro@fb.com>:
mm: memcg: fix obsolete code comments
Patch series "mm: memcg: deprecate cgroup v1 non-hierarchical mode", v1:
mm: memcg: deprecate the non-hierarchical mode
docs: cgroup-v1: reflect the deprecation of the non-hierarchical mode
cgroup: remove obsoleted broken_hierarchy and warned_broken_hierarchy
Hui Su <sh_def@163.com>:
mm/page_counter: use page_counter_read in page_counter_set_max
Lukas Bulwahn <lukas.bulwahn@gmail.com>:
mm: memcg: remove obsolete memcg_has_children()
Muchun Song <songmuchun@bytedance.com>:
mm: memcg/slab: rename *_lruvec_slab_state to *_lruvec_kmem_state
Kaixu Xia <kaixuxia@tencent.com>:
mm: memcontrol: sssign boolean values to a bool variable
Alex Shi <alex.shi@linux.alibaba.com>:
mm/memcg: remove incorrect comment
Shakeel Butt <shakeelb@google.com>:
Patch series "memcg: add pagetable comsumption to memory.stat", v2:
mm: move lruvec stats update functions to vmstat.h
mm: memcontrol: account pagetables per node
Subsystem: mm/pagemap
Dan Williams <dan.j.williams@intel.com>:
xen/unpopulated-alloc: consolidate pgmap manipulation
Kalesh Singh <kaleshsingh@google.com>:
Patch series "Speed up mremap on large regions", v4:
kselftests: vm: add mremap tests
mm: speedup mremap on 1GB or larger regions
arm64: mremap speedup - enable HAVE_MOVE_PUD
x86: mremap speedup - Enable HAVE_MOVE_PUD
John Hubbard <jhubbard@nvidia.com>:
mm: cleanup: remove unused tsk arg from __access_remote_vm
Alex Shi <alex.shi@linux.alibaba.com>:
mm/mapping_dirty_helpers: enhance the kernel-doc markups
mm/page_vma_mapped.c: add colon to fix kernel-doc markups error for check_pte
Axel Rasmussen <axelrasmussen@google.com>:
mm: mmap_lock: add tracepoints around lock acquisition
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
sparc: fix handling of page table constructor failure
mm: move free_unref_page to mm/internal.h
Subsystem: mm/mremap
Dmitry Safonov <dima@arista.com>:
Patch series "mremap: move_vma() fixes":
mm/mremap: account memory on do_munmap() failure
mm/mremap: for MREMAP_DONTUNMAP check security_vm_enough_memory_mm()
mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio
vm_ops: rename .split() callback to .may_split()
mremap: check if it's possible to split original vma
mm: forbid splitting special mappings
Subsystem: mm/hmm
Daniel Vetter <daniel.vetter@ffwll.ch>:
mm: track mmu notifiers in fs_reclaim_acquire/release
mm: extract might_alloc() debug check
locking/selftests: add testcases for fs_reclaim
Subsystem: mm/vmalloc
Andrew Morton <akpm@linux-foundation.org>:
mm/vmalloc.c:__vmalloc_area_node(): avoid 32-bit overflow
"Uladzislau Rezki (Sony)" <urezki@gmail.com>:
mm/vmalloc: use free_vm_area() if an allocation fails
mm/vmalloc: rework the drain logic
Alex Shi <alex.shi@linux.alibaba.com>:
mm/vmalloc: add 'align' parameter explanation for pvm_determine_end_from_reverse
Baolin Wang <baolin.wang@linux.alibaba.com>:
mm/vmalloc.c: remove unnecessary return statement
Waiman Long <longman@redhat.com>:
mm/vmalloc: Fix unlock order in s_stop()
Subsystem: mm/documentation
Alex Shi <alex.shi@linux.alibaba.com>:
docs/vm: remove unused 3 items explanation for /proc/vmstat
Subsystem: mm/kasan
Vincenzo Frascino <vincenzo.frascino@arm.com>:
mm/vmalloc.c: fix kasan shadow poisoning size
Walter Wu <walter-zh.wu@mediatek.com>:
Patch series "kasan: add workqueue stack for generic KASAN", v5:
workqueue: kasan: record workqueue stack
kasan: print workqueue stack
lib/test_kasan.c: add workqueue test case
kasan: update documentation for generic kasan
Marco Elver <elver@google.com>:
lkdtm: disable KASAN for rodata.o
Subsystem: mm/pagealloc
Mike Rapoport <rppt@linux.ibm.com>:
Patch series "arch, mm: deprecate DISCONTIGMEM", v2:
alpha: switch from DISCONTIGMEM to SPARSEMEM
ia64: remove custom __early_pfn_to_nid()
ia64: remove 'ifdef CONFIG_ZONE_DMA32' statements
ia64: discontig: paging_init(): remove local max_pfn calculation
ia64: split virtual map initialization out of paging_init()
ia64: forbid using VIRTUAL_MEM_MAP with FLATMEM
ia64: make SPARSEMEM default and disable DISCONTIGMEM
arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
arm, arm64: move free_unused_memmap() to generic mm
arc: use FLATMEM with freeing of unused memory map instead of DISCONTIGMEM
m68k/mm: make node data and node setup depend on CONFIG_DISCONTIGMEM
m68k/mm: enable use of generic memory_model.h for !DISCONTIGMEM
m68k: deprecate DISCONTIGMEM
Patch series "arch, mm: improve robustness of direct map manipulation", v7:
mm: introduce debug_pagealloc_{map,unmap}_pages() helpers
PM: hibernate: make direct map manipulations more explicit
arch, mm: restore dependency of __kernel_map_pages() on DEBUG_PAGEALLOC
arch, mm: make kernel_page_present() always available
Vlastimil Babka <vbabka@suse.cz>:
Patch series "disable pcplists during memory offline", v3:
mm, page_alloc: clean up pageset high and batch update
mm, page_alloc: calculate pageset high and batch once per zone
mm, page_alloc: remove setup_pageset()
mm, page_alloc: simplify pageset_update()
mm, page_alloc: cache pageset high and batch in struct zone
mm, page_alloc: move draining pcplists to page isolation users
mm, page_alloc: disable pcplists during memory offline
Miaohe Lin <linmiaohe@huawei.com>:
include/linux/page-flags.h: remove unused __[Set|Clear]PagePrivate
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/page-flags: fix comment
mm/page_alloc: add __free_pages() documentation
Zou Wei <zou_wei@huawei.com>:
mm/page_alloc: mark some symbols with static keyword
David Hildenbrand <david@redhat.com>:
mm/page_alloc: clear all pages in post_alloc_hook() with init_on_alloc=1
Lin Feng <linf@wangsu.com>:
init/main: fix broken buffer_init when DEFERRED_STRUCT_PAGE_INIT set
Lorenzo Stoakes <lstoakes@gmail.com>:
mm: page_alloc: refactor setup_per_zone_lowmem_reserve()
Muchun Song <songmuchun@bytedance.com>:
mm/page_alloc: speed up the iteration of max_order
Subsystem: mm/memory-failure
Oscar Salvador <osalvador@suse.de>:
Patch series "HWpoison: further fixes and cleanups", v5:
mm,hwpoison: drain pcplists before bailing out for non-buddy zero-refcount page
mm,hwpoison: take free pages off the buddy freelists
mm,hwpoison: drop unneeded pcplist draining
Patch series "HWPoison: Refactor get page interface", v2:
mm,hwpoison: refactor get_any_page
mm,hwpoison: disable pcplists before grabbing a refcount
mm,hwpoison: remove drain_all_pages from shake_page
mm,memory_failure: always pin the page in madvise_inject_error
mm,hwpoison: return -EBUSY when migration fails
Subsystem: mm/hugetlb
Hui Su <sh_def@163.com>:
mm/hugetlb.c: just use put_page_testzero() instead of page_count()
Ralph Campbell <rcampbell@nvidia.com>:
include/linux/huge_mm.h: remove extern keyword
Alex Shi <alex.shi@linux.alibaba.com>:
khugepaged: add parameter explanations for kernel-doc markup
Liu Xiang <liu.xiang@zlingsmart.com>:
mm: hugetlb: fix type of delta parameter and related local variables in gather_surplus_pages()
Oscar Salvador <osalvador@suse.de>:
mm,hugetlb: remove unneeded initialization
Dan Carpenter <dan.carpenter@oracle.com>:
hugetlb: fix an error code in hugetlb_reserve_pages()
Subsystem: mm/vmscan
Johannes Weiner <hannes@cmpxchg.org>:
mm: don't wake kswapd prematurely when watermark boosting is disabled
Lukas Bulwahn <lukas.bulwahn@gmail.com>:
mm/vmscan: drop unneeded assignment in kswapd()
"logic.yu" <hymmsx.yu@gmail.com>:
mm/vmscan.c: remove the filename in the top of file comment
Muchun Song <songmuchun@bytedance.com>:
mm/page_isolation: do not isolate the max order page
Subsystem: mm/z3fold
Vitaly Wool <vitaly.wool@konsulko.com>:
Patch series "z3fold: stability / rt fixes":
z3fold: simplify freeing slots
z3fold: stricter locking and more careful reclaim
z3fold: remove preempt disabled sections for RT
Subsystem: mm/compaction
Yanfei Xu <yanfei.xu@windriver.com>:
mm/compaction: rename 'start_pfn' to 'iteration_start_pfn' in compact_zone()
Hui Su <sh_def@163.com>:
mm/compaction: move compaction_suitable's comment to right place
mm/compaction: make defer_compaction and compaction_deferred static
Subsystem: mm/oom-kill
Hui Su <sh_def@163.com>:
mm/oom_kill: change comment and rename is_dump_unreclaim_slabs()
Subsystem: mm/migration
Long Li <lonuxli.64@gmail.com>:
mm/migrate.c: fix comment spelling
Ralph Campbell <rcampbell@nvidia.com>:
mm/migrate.c: optimize migrate_vma_pages() mmu notifier
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: support THPs in zero_user_segments
Yang Shi <shy828301@gmail.com>:
Patch series "mm: misc migrate cleanup and improvement", v3:
mm: truncate_complete_page() does not exist any more
mm: migrate: simplify the logic for handling permanent failure
mm: migrate: skip shared exec THP for NUMA balancing
mm: migrate: clean up migrate_prep{_local}
mm: migrate: return -ENOSYS if THP migration is unsupported
Stephen Zhang <starzhangzsd@gmail.com>:
mm: migrate: remove unused parameter in migrate_vma_insert_page()
Subsystem: mm/cma
Lecopzer Chen <lecopzer.chen@mediatek.com>:
mm/cma.c: remove redundant cma_mutex lock
Charan Teja Reddy <charante@codeaurora.org>:
mm: cma: improve pr_debug log in cma_release()
Subsystem: mm/page-poison
Vlastimil Babka <vbabka@suse.cz>:
Patch series "cleanup page poisoning", v3:
mm, page_alloc: do not rely on the order of page_poison and init_on_alloc/free parameters
mm, page_poison: use static key more efficiently
kernel/power: allow hibernation with page_poison sanity checking
mm, page_poison: remove CONFIG_PAGE_POISONING_NO_SANITY
mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO
Subsystem: mm/userfaultfd
Lokesh Gidra <lokeshgidra@google.com>:
Patch series "Control over userfaultfd kernel-fault handling", v6:
userfaultfd: add UFFD_USER_MODE_ONLY
userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob
Axel Rasmussen <axelrasmussen@google.com>:
userfaultfd: selftests: make __{s,u}64 format specifiers portable
Peter Xu <peterx@redhat.com>:
Patch series "userfaultfd: selftests: Small fixes":
userfaultfd/selftests: always dump something in modes
userfaultfd/selftests: fix retval check for userfaultfd_open()
userfaultfd/selftests: hint the test runner on required privilege
Subsystem: mm/zswap
Joe Perches <joe@perches.com>:
mm/zswap: make struct kernel_param_ops definitions const
YueHaibing <yuehaibing@huawei.com>:
mm/zswap: fix passing zero to 'PTR_ERR' warning
Barry Song <song.bao.hua@hisilicon.com>:
mm/zswap: move to use crypto_acomp API for hardware acceleration
Subsystem: mm/zsmalloc
Miaohe Lin <linmiaohe@huawei.com>:
mm/zsmalloc.c: rework the list_add code in insert_zspage()
Subsystem: mm/uaccess
Colin Ian King <colin.king@canonical.com>:
mm/process_vm_access: remove redundant initialization of iov_r
Subsystem: mm/zram
Minchan Kim <minchan@kernel.org>:
zram: support page writeback
zram: add stat to gather incompressible pages since zram set up
Rui Salvaterra <rsalvaterra@gmail.com>:
zram: break the strict dependency from lzo
Subsystem: mm/cleanups
Mauro Carvalho Chehab <mchehab+huawei@kernel.org>:
mm: fix kernel-doc markups
Joe Perches <joe@perches.com>:
Patch series "mm: Convert sysfs sprintf family to sysfs_emit", v2:
mm: use sysfs_emit for struct kobject * uses
mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
mm:backing-dev: use sysfs_emit in macro defining functions
mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
"Gustavo A. R. Silva" <gustavoars@kernel.org>:
mm: fix fall-through warnings for Clang
Alexey Dobriyan <adobriyan@gmail.com>:
mm: cleanup kstrto*() usage
/mmap_lock.h | 107 ++
a/Documentation/admin-guide/blockdev/zram.rst | 6
a/Documentation/admin-guide/cgroup-v1/memcg_test.rst | 8
a/Documentation/admin-guide/cgroup-v1/memory.rst | 42
a/Documentation/admin-guide/cgroup-v2.rst | 11
a/Documentation/admin-guide/mm/transhuge.rst | 15
a/Documentation/admin-guide/sysctl/vm.rst | 15
a/Documentation/core-api/memory-allocation.rst | 4
a/Documentation/core-api/pin_user_pages.rst | 8
a/Documentation/dev-tools/kasan.rst | 5
a/Documentation/filesystems/tmpfs.rst | 8
a/Documentation/vm/memory-model.rst | 3
a/Documentation/vm/page_owner.rst | 12
a/arch/Kconfig | 21
a/arch/alpha/Kconfig | 8
a/arch/alpha/include/asm/mmzone.h | 14
a/arch/alpha/include/asm/page.h | 7
a/arch/alpha/include/asm/pgtable.h | 12
a/arch/alpha/include/asm/sparsemem.h | 18
a/arch/alpha/kernel/setup.c | 1
a/arch/arc/Kconfig | 3
a/arch/arc/include/asm/page.h | 20
a/arch/arc/mm/init.c | 29
a/arch/arm/Kconfig | 12
a/arch/arm/kernel/vdso.c | 9
a/arch/arm/mach-bcm/Kconfig | 1
a/arch/arm/mach-davinci/Kconfig | 1
a/arch/arm/mach-exynos/Kconfig | 1
a/arch/arm/mach-highbank/Kconfig | 1
a/arch/arm/mach-omap2/Kconfig | 1
a/arch/arm/mach-s5pv210/Kconfig | 1
a/arch/arm/mach-tango/Kconfig | 1
a/arch/arm/mm/init.c | 78 -
a/arch/arm64/Kconfig | 9
a/arch/arm64/include/asm/cacheflush.h | 1
a/arch/arm64/include/asm/pgtable.h | 1
a/arch/arm64/kernel/vdso.c | 41
a/arch/arm64/mm/init.c | 68 -
a/arch/arm64/mm/pageattr.c | 12
a/arch/ia64/Kconfig | 11
a/arch/ia64/include/asm/meminit.h | 2
a/arch/ia64/mm/contig.c | 88 --
a/arch/ia64/mm/discontig.c | 44 -
a/arch/ia64/mm/init.c | 14
a/arch/ia64/mm/numa.c | 30
a/arch/m68k/Kconfig.cpu | 31
a/arch/m68k/include/asm/page.h | 2
a/arch/m68k/include/asm/page_mm.h | 7
a/arch/m68k/include/asm/virtconvert.h | 7
a/arch/m68k/mm/init.c | 10
a/arch/mips/vdso/genvdso.c | 4
a/arch/nds32/mm/mm-nds32.c | 6
a/arch/powerpc/Kconfig | 5
a/arch/riscv/Kconfig | 4
a/arch/riscv/include/asm/pgtable.h | 2
a/arch/riscv/include/asm/set_memory.h | 1
a/arch/riscv/mm/pageattr.c | 31
a/arch/s390/Kconfig | 4
a/arch/s390/configs/debug_defconfig | 2
a/arch/s390/configs/defconfig | 2
a/arch/s390/kernel/vdso.c | 11
a/arch/sparc/Kconfig | 4
a/arch/sparc/mm/init_64.c | 2
a/arch/x86/Kconfig | 5
a/arch/x86/entry/vdso/vma.c | 17
a/arch/x86/include/asm/set_memory.h | 1
a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2
a/arch/x86/kernel/tboot.c | 1
a/arch/x86/mm/pat/set_memory.c | 6
a/drivers/base/node.c | 2
a/drivers/block/zram/Kconfig | 42
a/drivers/block/zram/zcomp.c | 2
a/drivers/block/zram/zram_drv.c | 29
a/drivers/block/zram/zram_drv.h | 1
a/drivers/dax/device.c | 4
a/drivers/dax/kmem.c | 2
a/drivers/dma-buf/sync_file.c | 3
a/drivers/edac/ghes_edac.c | 4
a/drivers/firmware/efi/efi.c | 1
a/drivers/gpu/drm/drm_atomic.c | 3
a/drivers/hwtracing/intel_th/msu.c | 2
a/drivers/ide/falconide.c | 2
a/drivers/ide/ide-probe.c | 3
a/drivers/misc/lkdtm/Makefile | 1
a/drivers/pinctrl/pinctrl-utils.c | 2
a/drivers/vhost/vringh.c | 3
a/drivers/virtio/virtio_balloon.c | 6
a/drivers/xen/unpopulated-alloc.c | 14
a/fs/aio.c | 5
a/fs/ntfs/file.c | 5
a/fs/ntfs/inode.c | 2
a/fs/ntfs/logfile.c | 3
a/fs/ocfs2/cluster/tcp.c | 1
a/fs/ocfs2/namei.c | 4
a/fs/proc/kcore.c | 2
a/fs/proc/meminfo.c | 2
a/fs/userfaultfd.c | 20
a/include/linux/cgroup-defs.h | 15
a/include/linux/compaction.h | 12
a/include/linux/fs.h | 2
a/include/linux/gfp.h | 2
a/include/linux/highmem.h | 19
a/include/linux/huge_mm.h | 93 --
a/include/linux/memcontrol.h | 148 ---
a/include/linux/migrate.h | 4
a/include/linux/mm.h | 118 +-
a/include/linux/mm_types.h | 8
a/include/linux/mmap_lock.h | 94 ++
a/include/linux/mmzone.h | 50 -
a/include/linux/page-flags.h | 6
a/include/linux/page_ext.h | 8
a/include/linux/pagevec.h | 3
a/include/linux/poison.h | 4
a/include/linux/rmap.h | 1
a/include/linux/sched/mm.h | 16
a/include/linux/set_memory.h | 5
a/include/linux/shmem_fs.h | 6
a/include/linux/slab.h | 18
a/include/linux/vmalloc.h | 8
a/include/linux/vmstat.h | 104 ++
a/include/trace/events/sched.h | 84 +
a/include/uapi/linux/const.h | 5
a/include/uapi/linux/ethtool.h | 2
a/include/uapi/linux/kernel.h | 9
a/include/uapi/linux/lightnvm.h | 2
a/include/uapi/linux/mroute6.h | 2
a/include/uapi/linux/netfilter/x_tables.h | 2
a/include/uapi/linux/netlink.h | 2
a/include/uapi/linux/sysctl.h | 2
a/include/uapi/linux/userfaultfd.h | 9
a/init/main.c | 6
a/ipc/shm.c | 8
a/kernel/cgroup/cgroup.c | 12
a/kernel/fork.c | 3
a/kernel/kthread.c | 29
a/kernel/power/hibernate.c | 2
a/kernel/power/power.h | 2
a/kernel/power/snapshot.c | 52 +
a/kernel/ptrace.c | 2
a/kernel/workqueue.c | 3
a/lib/locking-selftest.c | 47 +
a/lib/test_kasan_module.c | 29
a/mm/Kconfig | 25
a/mm/Kconfig.debug | 28
a/mm/Makefile | 4
a/mm/backing-dev.c | 8
a/mm/cma.c | 6
a/mm/compaction.c | 29
a/mm/filemap.c | 823 ++++++++++---------
a/mm/gup.c | 329 ++-----
a/mm/gup_benchmark.c | 210 ----
a/mm/gup_test.c | 299 ++++++
a/mm/gup_test.h | 40
a/mm/highmem.c | 52 +
a/mm/huge_memory.c | 86 +
a/mm/hugetlb.c | 28
a/mm/init-mm.c | 1
a/mm/internal.h | 5
a/mm/kasan/generic.c | 3
a/mm/kasan/report.c | 4
a/mm/khugepaged.c | 58 -
a/mm/ksm.c | 50 -
a/mm/madvise.c | 14
a/mm/mapping_dirty_helpers.c | 6
a/mm/memblock.c | 80 +
a/mm/memcontrol.c | 170 +--
a/mm/memory-failure.c | 322 +++----
a/mm/memory.c | 24
a/mm/memory_hotplug.c | 44 -
a/mm/mempolicy.c | 8
a/mm/migrate.c | 183 ++--
a/mm/mm_init.c | 1
a/mm/mmap.c | 22
a/mm/mmap_lock.c | 230 +++++
a/mm/mmu_notifier.c | 7
a/mm/mmzone.c | 14
a/mm/mremap.c | 282 ++++--
a/mm/nommu.c | 8
a/mm/oom_kill.c | 14
a/mm/page_alloc.c | 517 ++++++-----
a/mm/page_counter.c | 4
a/mm/page_ext.c | 10
a/mm/page_isolation.c | 18
a/mm/page_owner.c | 17
a/mm/page_poison.c | 56 -
a/mm/page_vma_mapped.c | 9
a/mm/process_vm_access.c | 2
a/mm/rmap.c | 9
a/mm/shmem.c | 39
a/mm/slab.c | 10
a/mm/slab.h | 9
a/mm/slab_common.c | 10
a/mm/slob.c | 6
a/mm/slub.c | 156 +--
a/mm/swap.c | 12
a/mm/swap_state.c | 7
a/mm/swapfile.c | 14
a/mm/truncate.c | 18
a/mm/vmalloc.c | 105 +-
a/mm/vmscan.c | 21
a/mm/vmstat.c | 6
a/mm/workingset.c | 8
a/mm/z3fold.c | 215 ++--
a/mm/zsmalloc.c | 11
a/mm/zswap.c | 193 +++-
a/sound/core/pcm_lib.c | 4
a/tools/include/linux/poison.h | 6
a/tools/testing/selftests/vm/.gitignore | 4
a/tools/testing/selftests/vm/Makefile | 41
a/tools/testing/selftests/vm/check_config.sh | 31
a/tools/testing/selftests/vm/config | 2
a/tools/testing/selftests/vm/gup_benchmark.c | 143 ---
a/tools/testing/selftests/vm/gup_test.c | 258 +++++
a/tools/testing/selftests/vm/hmm-tests.c | 10
a/tools/testing/selftests/vm/mremap_test.c | 344 +++++++
a/tools/testing/selftests/vm/run_vmtests | 51 -
a/tools/testing/selftests/vm/userfaultfd.c | 94 --
217 files changed, 4817 insertions(+), 3369 deletions(-)
* incoming
@ 2020-12-11 21:35 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-12-11 21:35 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
8 patches, based on 33dc9614dc208291d0c4bcdeb5d30d481dcd2c4c.
Subsystems affected by this patch series:
mm/pagecache
proc
selftests
kbuild
mm/kasan
mm/hugetlb
Subsystem: mm/pagecache
Andrew Morton <akpm@linux-foundation.org>:
revert "mm/filemap: add static for function __add_to_page_cache_locked"
Subsystem: proc
Miles Chen <miles.chen@mediatek.com>:
proc: use untagged_addr() for pagemap_read addresses
Subsystem: selftests
Arnd Bergmann <arnd@arndb.de>:
selftest/fpu: avoid clang warning
Subsystem: kbuild
Arnd Bergmann <arnd@arndb.de>:
kbuild: avoid static_assert for genksyms
initramfs: fix clang build failure
elfcore: fix building with clang
Subsystem: mm/kasan
Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>:
kasan: fix object remaining in offline per-cpu quarantine
Subsystem: mm/hugetlb
Gerald Schaefer <gerald.schaefer@linux.ibm.com>:
mm/hugetlb: clear compound_nr before freeing gigantic pages
fs/proc/task_mmu.c | 8 ++++++--
include/linux/build_bug.h | 5 +++++
include/linux/elfcore.h | 22 ++++++++++++++++++++++
init/initramfs.c | 2 +-
kernel/Makefile | 1 -
kernel/elfcore.c | 26 --------------------------
lib/Makefile | 3 ++-
mm/filemap.c | 2 +-
mm/hugetlb.c | 1 +
mm/kasan/quarantine.c | 39 +++++++++++++++++++++++++++++++++++++++
10 files changed, 77 insertions(+), 32 deletions(-)
* incoming
@ 2020-12-06 6:14 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-12-06 6:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
12 patches, based on 33256ce194110874d4bc90078b577c59f9076c59.
Subsystems affected by this patch series:
lib
coredump
mm/memcg
mm/zsmalloc
mm/swap
mailmap
mm/selftests
mm/pagecache
mm/hugetlb
mm/pagemap
Subsystem: lib
Randy Dunlap <rdunlap@infradead.org>:
zlib: export S390 symbols for zlib modules
Subsystem: coredump
Menglong Dong <dong.menglong@zte.com.cn>:
coredump: fix core_pattern parse error
Subsystem: mm/memcg
Roman Gushchin <guro@fb.com>:
mm: memcg/slab: fix obj_cgroup_charge() return value handling
Yang Shi <shy828301@gmail.com>:
mm: list_lru: set shrinker map bit when child nr_items is not zero
Subsystem: mm/zsmalloc
Minchan Kim <minchan@kernel.org>:
mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING
Subsystem: mm/swap
Qian Cai <qcai@redhat.com>:
mm/swapfile: do not sleep with a spin lock held
Subsystem: mailmap
Uwe Kleine-König <u.kleine-koenig@pengutronix.de>:
mailmap: add two more addresses of Uwe Kleine-König
Subsystem: mm/selftests
Xingxing Su <suxingxing@loongson.cn>:
tools/testing/selftests/vm: fix build error
Axel Rasmussen <axelrasmussen@google.com>:
userfaultfd: selftests: fix SIGSEGV if huge mmap fails
Subsystem: mm/pagecache
Alex Shi <alex.shi@linux.alibaba.com>:
mm/filemap: add static for function __add_to_page_cache_locked
Subsystem: mm/hugetlb
Mike Kravetz <mike.kravetz@oracle.com>:
hugetlb_cgroup: fix offline of hugetlb cgroup with reservations
Subsystem: mm/pagemap
Liu Zixian <liuzixian4@huawei.com>:
mm/mmap.c: fix mmap return value when vma is merged after call_mmap()
.mailmap | 2 +
arch/arm/configs/omap2plus_defconfig | 1
fs/coredump.c | 3 +
include/linux/zsmalloc.h | 1
lib/zlib_dfltcc/dfltcc_inflate.c | 3 +
mm/Kconfig | 13 -------
mm/filemap.c | 2 -
mm/hugetlb_cgroup.c | 8 +---
mm/list_lru.c | 10 ++---
mm/mmap.c | 26 ++++++--------
mm/slab.h | 40 +++++++++++++---------
mm/swapfile.c | 4 +-
mm/zsmalloc.c | 54 -------------------------------
tools/testing/selftests/vm/Makefile | 4 ++
tools/testing/selftests/vm/userfaultfd.c | 25 +++++++++-----
15 files changed, 75 insertions(+), 121 deletions(-)
* incoming
@ 2020-11-22 6:16 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-11-22 6:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
8 patches, based on a349e4c659609fd20e4beea89e5c4a4038e33a95.
Subsystems affected by this patch series:
mm/madvise
kbuild
mm/pagemap
mm/readahead
mm/memcg
mm/userfaultfd
vfs-akpm
mm/madvise
Subsystem: mm/madvise
Eric Dumazet <edumazet@google.com>:
mm/madvise: fix memory leak from process_madvise
Subsystem: kbuild
Nick Desaulniers <ndesaulniers@google.com>:
compiler-clang: remove version check for BPF Tracing
Subsystem: mm/pagemap
Dan Williams <dan.j.williams@intel.com>:
mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
Subsystem: mm/readahead
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: fix readahead_page_batch for retry entries
Subsystem: mm/memcg
Muchun Song <songmuchun@bytedance.com>:
mm: memcg/slab: fix root memcg vmstats
Subsystem: mm/userfaultfd
Gerald Schaefer <gerald.schaefer@linux.ibm.com>:
mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
Subsystem: vfs-akpm
Yicong Yang <yangyicong@hisilicon.com>:
libfs: fix error cast of negative value in simple_attr_write()
Subsystem: mm/madvise
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: fix madvise WILLNEED performance problem
arch/ia64/include/asm/sparsemem.h | 6 ++++++
arch/powerpc/include/asm/mmzone.h | 5 +++++
arch/powerpc/include/asm/sparsemem.h | 5 ++---
arch/powerpc/mm/mem.c | 1 +
arch/x86/include/asm/sparsemem.h | 10 ++++++++++
arch/x86/mm/numa.c | 2 ++
drivers/dax/Kconfig | 1 -
fs/libfs.c | 6 ++++--
include/linux/compiler-clang.h | 2 ++
include/linux/memory_hotplug.h | 14 --------------
include/linux/numa.h | 30 +++++++++++++++++++++++++++++-
include/linux/pagemap.h | 2 ++
mm/huge_memory.c | 9 ++++-----
mm/madvise.c | 4 +---
mm/memcontrol.c | 9 +++++++--
mm/memory_hotplug.c | 18 ------------------
16 files changed, 75 insertions(+), 49 deletions(-)
* incoming
@ 2020-11-14 6:51 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-11-14 6:51 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
14 patches, based on 9e6a39eae450b81c8b2c8cbbfbdf8218e9b40c81.
Subsystems affected by this patch series:
mm/migration
mm/vmscan
mailmap
mm/slub
mm/gup
kbuild
reboot
kernel/watchdog
mm/memcg
mm/hugetlbfs
panic
ocfs2
Subsystem: mm/migration
Zi Yan <ziy@nvidia.com>:
mm/compaction: count pages and stop correctly during page isolation
mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate
Subsystem: mm/vmscan
Nicholas Piggin <npiggin@gmail.com>:
mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit
Subsystem: mailmap
Dmitry Baryshkov <dbaryshkov@gmail.com>:
mailmap: fix entry for Dmitry Baryshkov/Eremin-Solenikov
Subsystem: mm/slub
Laurent Dufour <ldufour@linux.ibm.com>:
mm/slub: fix panic in slab_alloc_node()
Subsystem: mm/gup
Jason Gunthorpe <jgg@nvidia.com>:
mm/gup: use unpin_user_pages() in __gup_longterm_locked()
Subsystem: kbuild
Arvind Sankar <nivedita@alum.mit.edu>:
compiler.h: fix barrier_data() on clang
Subsystem: reboot
Matteo Croce <mcroce@microsoft.com>:
Patch series "fix parsing of reboot= cmdline", v3:
Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint"
reboot: fix overflow parsing reboot cpu number
Subsystem: kernel/watchdog
Santosh Sivaraj <santosh@fossix.org>:
kernel/watchdog: fix watchdog_allowed_mask not used warning
Subsystem: mm/memcg
Muchun Song <songmuchun@bytedance.com>:
mm: memcontrol: fix missing wakeup polling thread
Subsystem: mm/hugetlbfs
Mike Kravetz <mike.kravetz@oracle.com>:
hugetlbfs: fix anon huge page migration race
Subsystem: panic
Christophe Leroy <christophe.leroy@csgroup.eu>:
panic: don't dump stack twice on warn
Subsystem: ocfs2
Wengang Wang <wen.gang.wang@oracle.com>:
ocfs2: initialize ip_next_orphan
.mailmap | 5 +-
fs/ocfs2/super.c | 1
include/asm-generic/barrier.h | 1
include/linux/compiler-clang.h | 6 --
include/linux/compiler-gcc.h | 19 --------
include/linux/compiler.h | 18 +++++++-
include/linux/memcontrol.h | 11 ++++-
kernel/panic.c | 3 -
kernel/reboot.c | 28 ++++++------
kernel/watchdog.c | 4 -
mm/compaction.c | 12 +++--
mm/gup.c | 14 ++++--
mm/hugetlb.c | 90 ++---------------------------------------
mm/memory-failure.c | 36 +++++++---------
mm/migrate.c | 46 +++++++++++---------
mm/rmap.c | 5 --
mm/slub.c | 2
mm/vmscan.c | 5 +-
18 files changed, 119 insertions(+), 187 deletions(-)
* incoming
@ 2020-11-02 1:06 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-11-02 1:06 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
15 patches, based on 3cea11cd5e3b00d91caf0b4730194039b45c5891.
Subsystems affected by this patch series:
mm/memremap
mm/memcg
mm/slab-generic
mm/kasan
mm/mempolicy
signals
lib
mm/pagecache
kthread
mm/oom-kill
mm/pagemap
epoll
core-kernel
Subsystem: mm/memremap
Ralph Campbell <rcampbell@nvidia.com>:
mm/mremap_pages: fix static key devmap_managed_key updates
Subsystem: mm/memcg
Mike Kravetz <mike.kravetz@oracle.com>:
hugetlb_cgroup: fix reservation accounting
zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>:
mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg
Roman Gushchin <guro@fb.com>:
mm: memcg: link page counters to root if use_hierarchy is false
Subsystem: mm/slab-generic
Subsystem: mm/kasan
Andrey Konovalov <andreyknvl@google.com>:
kasan: adopt KUNIT tests to SW_TAGS mode
Subsystem: mm/mempolicy
Shijie Luo <luoshijie1@huawei.com>:
mm: mempolicy: fix potential pte_unmap_unlock pte error
Subsystem: signals
Oleg Nesterov <oleg@redhat.com>:
ptrace: fix task_join_group_stop() for the case when current is traced
Subsystem: lib
Vasily Gorbik <gor@linux.ibm.com>:
lib/crc32test: remove extra local_irq_disable/enable
Subsystem: mm/pagecache
Jason Yan <yanaijie@huawei.com>:
mm/truncate.c: make __invalidate_mapping_pages() static
Subsystem: kthread
Zqiang <qiang.zhang@windriver.com>:
kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled
Subsystem: mm/oom-kill
Charles Haithcock <chaithco@redhat.com>:
mm, oom: keep oom_adj under or at upper limit when printing
Subsystem: mm/pagemap
Jason Gunthorpe <jgg@nvidia.com>:
mm: always have io_remap_pfn_range() set pgprot_decrypted()
Subsystem: epoll
Soheil Hassas Yeganeh <soheil@google.com>:
epoll: check ep_events_available() upon timeout
epoll: add a selftest for epoll timeout race
Subsystem: core-kernel
Lukas Bulwahn <lukas.bulwahn@gmail.com>:
kernel/hung_task.c: make type annotations consistent
fs/eventpoll.c | 16 +
fs/proc/base.c | 2
include/linux/mm.h | 9
include/linux/pgtable.h | 4
kernel/hung_task.c | 3
kernel/kthread.c | 3
kernel/signal.c | 19 -
lib/crc32test.c | 4
lib/test_kasan.c | 149 +++++++---
mm/hugetlb.c | 20 -
mm/memcontrol.c | 25 +
mm/mempolicy.c | 6
mm/memremap.c | 39 +-
mm/truncate.c | 2
tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c | 95 ++++++
15 files changed, 290 insertions(+), 106 deletions(-)
* incoming
@ 2020-10-17 23:13 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-10-17 23:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
40 patches, based on 9d9af1007bc08971953ae915d88dc9bb21344b53.
Subsystems affected by this patch series:
ia64
mm/memcg
mm/migration
mm/pagemap
mm/gup
mm/madvise
mm/vmalloc
misc
Subsystem: ia64
Krzysztof Kozlowski <krzk@kernel.org>:
ia64: fix build error with !COREDUMP
Subsystem: mm/memcg
Roman Gushchin <guro@fb.com>:
mm, memcg: rework remote charging API to support nesting
Patch series "mm: kmem: kernel memory accounting in an interrupt context":
mm: kmem: move memcg_kmem_bypass() calls to get_mem/obj_cgroup_from_current()
mm: kmem: remove redundant checks from get_obj_cgroup_from_current()
mm: kmem: prepare remote memcg charging infra for interrupt contexts
mm: kmem: enable kernel memcg accounting from interrupt contexts
Subsystem: mm/migration
Joonsoo Kim <iamjoonsoo.kim@lge.com>:
mm/memory-failure: remove a wrapper for alloc_migration_target()
mm/memory_hotplug: remove a wrapper for alloc_migration_target()
Miaohe Lin <linmiaohe@huawei.com>:
mm/migrate: avoid possible unnecessary process right check in kernel_move_pages()
Subsystem: mm/pagemap
"Liam R. Howlett" <Liam.Howlett@Oracle.com>:
mm/mmap: add inline vma_next() for readability of mmap code
mm/mmap: add inline munmap_vma_range() for code readability
Subsystem: mm/gup
Jann Horn <jannh@google.com>:
mm/gup_benchmark: take the mmap lock around GUP
binfmt_elf: take the mmap lock around find_extend_vma()
mm/gup: assert that the mmap lock is held in __get_user_pages()
John Hubbard <jhubbard@nvidia.com>:
Patch series "selftests/vm: gup_test, hmm-tests, assorted improvements", v2:
mm/gup_benchmark: rename to mm/gup_test
selftests/vm: use a common gup_test.h
selftests/vm: rename run_vmtests --> run_vmtests.sh
selftests/vm: minor cleanup: Makefile and gup_test.c
selftests/vm: only some gup_test items are really benchmarks
selftests/vm: gup_test: introduce the dump_pages() sub-test
selftests/vm: run_vmtests.sh: update and clean up gup_test invocation
selftests/vm: hmm-tests: remove the libhugetlbfs dependency
selftests/vm: 10x speedup for hmm-tests
Subsystem: mm/madvise
Minchan Kim <minchan@kernel.org>:
Patch series "introduce memory hinting API for external process", v9:
mm/madvise: pass mm to do_madvise
pid: move pidfd_get_pid() to pid.c
mm/madvise: introduce process_madvise() syscall: an external memory hinting API
Subsystem: mm/vmalloc
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "remove alloc_vm_area", v4:
mm: update the documentation for vfree
Christoph Hellwig <hch@lst.de>:
mm: add a VM_MAP_PUT_PAGES flag for vmap
mm: add a vmap_pfn function
mm: allow a NULL fn callback in apply_to_page_range
zsmalloc: switch from alloc_vm_area to get_vm_area
drm/i915: use vmap in shmem_pin_map
drm/i915: stop using kmap in i915_gem_object_map
drm/i915: use vmap in i915_gem_object_map
xen/xenbus: use apply_to_page_range directly in xenbus_map_ring_pv
x86/xen: open code alloc_vm_area in arch_gnttab_valloc
mm: remove alloc_vm_area
Patch series "two small vmalloc cleanups":
mm: cleanup the gfp_mask handling in __vmalloc_area_node
mm: remove the filename in the top of file comment in vmalloc.c
Subsystem: misc
Tian Tao <tiantao6@hisilicon.com>:
mm: remove duplicate include statement in mmu.c
Documentation/core-api/pin_user_pages.rst | 8
arch/alpha/kernel/syscalls/syscall.tbl | 1
arch/arm/mm/mmu.c | 1
arch/arm/tools/syscall.tbl | 1
arch/arm64/include/asm/unistd.h | 2
arch/arm64/include/asm/unistd32.h | 2
arch/ia64/kernel/Makefile | 2
arch/ia64/kernel/syscalls/syscall.tbl | 1
arch/m68k/kernel/syscalls/syscall.tbl | 1
arch/microblaze/kernel/syscalls/syscall.tbl | 1
arch/mips/kernel/syscalls/syscall_n32.tbl | 1
arch/mips/kernel/syscalls/syscall_n64.tbl | 1
arch/mips/kernel/syscalls/syscall_o32.tbl | 1
arch/parisc/kernel/syscalls/syscall.tbl | 1
arch/powerpc/kernel/syscalls/syscall.tbl | 1
arch/s390/configs/debug_defconfig | 2
arch/s390/configs/defconfig | 2
arch/s390/kernel/syscalls/syscall.tbl | 1
arch/sh/kernel/syscalls/syscall.tbl | 1
arch/sparc/kernel/syscalls/syscall.tbl | 1
arch/x86/entry/syscalls/syscall_32.tbl | 1
arch/x86/entry/syscalls/syscall_64.tbl | 1
arch/x86/xen/grant-table.c | 27 +-
arch/xtensa/kernel/syscalls/syscall.tbl | 1
drivers/gpu/drm/i915/Kconfig | 1
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 136 ++++------
drivers/gpu/drm/i915/gt/shmem_utils.c | 78 +-----
drivers/xen/xenbus/xenbus_client.c | 30 +-
fs/binfmt_elf.c | 3
fs/buffer.c | 6
fs/io_uring.c | 2
fs/notify/fanotify/fanotify.c | 5
fs/notify/inotify/inotify_fsnotify.c | 5
include/linux/memcontrol.h | 12
include/linux/mm.h | 2
include/linux/pid.h | 1
include/linux/sched/mm.h | 43 +--
include/linux/syscalls.h | 2
include/linux/vmalloc.h | 7
include/uapi/asm-generic/unistd.h | 4
kernel/exit.c | 19 -
kernel/pid.c | 19 +
kernel/sys_ni.c | 1
mm/Kconfig | 24 +
mm/Makefile | 2
mm/gup.c | 2
mm/gup_benchmark.c | 225 ------------------
mm/gup_test.c | 295 +++++++++++++++++++++--
mm/gup_test.h | 40 ++-
mm/madvise.c | 125 ++++++++--
mm/memcontrol.c | 83 ++++--
mm/memory-failure.c | 18 -
mm/memory.c | 16 -
mm/memory_hotplug.c | 46 +--
mm/migrate.c | 71 +++--
mm/mmap.c | 74 ++++-
mm/nommu.c | 7
mm/percpu.c | 3
mm/slab.h | 3
mm/vmalloc.c | 147 +++++------
mm/zsmalloc.c | 10
tools/testing/selftests/vm/.gitignore | 3
tools/testing/selftests/vm/Makefile | 40 ++-
tools/testing/selftests/vm/check_config.sh | 31 ++
tools/testing/selftests/vm/config | 2
tools/testing/selftests/vm/gup_benchmark.c | 143 -----------
tools/testing/selftests/vm/gup_test.c | 260 ++++++++++++++++++--
tools/testing/selftests/vm/hmm-tests.c | 12
tools/testing/selftests/vm/run_vmtests | 334 --------------------------
tools/testing/selftests/vm/run_vmtests.sh | 350 +++++++++++++++++++++++++++-
70 files changed, 1580 insertions(+), 1224 deletions(-)
* Re: incoming
2020-10-16 2:40 incoming Andrew Morton
@ 2020-10-16 3:03 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-10-16 3:03 UTC (permalink / raw)
To: Linus Torvalds, mm-commits, linux-mm
And... I forgot to set in-reply-to :(
Shall resend, omitting linux-mm.
* incoming
@ 2020-10-16 2:40 Andrew Morton
2020-10-16 3:03 ` incoming Andrew Morton
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2020-10-16 2:40 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
- most of the rest of mm/
- various other subsystems
156 patches, based on 578a7155c5a1894a789d4ece181abf9d25dc6b0d.
Subsystems affected by this patch series:
mm/dax
mm/debug
mm/thp
mm/readahead
mm/page-poison
mm/util
mm/memory-hotplug
mm/zram
mm/cleanups
misc
core-kernel
get_maintainer
MAINTAINERS
lib
bitops
checkpatch
binfmt
ramfs
autofs
nilfs
rapidio
panic
relay
kgdb
ubsan
romfs
fault-injection
Subsystem: mm/dax
Dan Williams <dan.j.williams@intel.com>:
device-dax/kmem: fix resource release
Subsystem: mm/debug
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
Patch series "mm/debug_vm_pgtable fixes", v4:
powerpc/mm: add DEBUG_VM WARN for pmd_clear
powerpc/mm: move setting pte specific flags to pfn_pte
mm/debug_vm_pgtable/ppc64: avoid setting top bits in radom value
mm/debug_vm_pgtables/hugevmap: use the arch helper to identify huge vmap support.
mm/debug_vm_pgtable/savedwrite: enable savedwrite test with CONFIG_NUMA_BALANCING
mm/debug_vm_pgtable/THP: mark the pte entry huge before using set_pmd/pud_at
mm/debug_vm_pgtable/set_pte/pmd/pud: don't use set_*_at to update an existing pte entry
mm/debug_vm_pgtable/locks: move non page table modifying test together
mm/debug_vm_pgtable/locks: take correct page table lock
mm/debug_vm_pgtable/thp: use page table depost/withdraw with THP
mm/debug_vm_pgtable/pmd_clear: don't use pmd/pud_clear on pte entries
mm/debug_vm_pgtable/hugetlb: disable hugetlb test on ppc64
mm/debug_vm_pgtable: avoid none pte in pte_clear_test
mm/debug_vm_pgtable: avoid doing memory allocation with pgtable_t mapped.
Subsystem: mm/thp
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "Fix read-only THP for non-tmpfs filesystems":
XArray: add xa_get_order
XArray: add xas_split
mm/filemap: fix storing to a THP shadow entry
Patch series "Remove assumptions of THP size":
mm/filemap: fix page cache removal for arbitrary sized THPs
mm/memory: remove page fault assumption of compound page size
mm/page_owner: change split_page_owner to take a count
"Kirill A. Shutemov" <kirill@shutemov.name>:
mm/huge_memory: fix total_mapcount assumption of page size
mm/huge_memory: fix split assumption of page size
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/huge_memory: fix page_trans_huge_mapcount assumption of THP size
mm/huge_memory: fix can_split_huge_page assumption of THP size
mm/rmap: fix assumptions of THP size
mm/truncate: fix truncation for pages of arbitrary size
mm/page-writeback: support tail pages in wait_for_stable_page
mm/vmscan: allow arbitrary sized pages to be paged out
fs: add a filesystem flag for THPs
fs: do not update nr_thps for mappings which support THPs
Huang Ying <ying.huang@intel.com>:
mm: fix a race during THP splitting
Subsystem: mm/readahead
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "Readahead patches for 5.9/5.10":
mm/readahead: add DEFINE_READAHEAD
mm/readahead: make page_cache_ra_unbounded take a readahead_control
mm/readahead: make do_page_cache_ra take a readahead_control
David Howells <dhowells@redhat.com>:
mm/readahead: make ondemand_readahead take a readahead_control
mm/readahead: pass readahead_control to force_page_cache_ra
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/readahead: add page_cache_sync_ra and page_cache_async_ra
David Howells <dhowells@redhat.com>:
mm/filemap: fold ra_submit into do_sync_mmap_readahead
mm/readahead: pass a file_ra_state into force_page_cache_ra
Subsystem: mm/page-poison
Naoya Horiguchi <naoya.horiguchi@nec.com>:
Patch series "HWPOISON: soft offline rework", v7:
mm,hwpoison: cleanup unused PageHuge() check
mm, hwpoison: remove recalculating hpage
mm,hwpoison-inject: don't pin for hwpoison_filter
Oscar Salvador <osalvador@suse.de>:
mm,hwpoison: unexport get_hwpoison_page and make it static
mm,hwpoison: refactor madvise_inject_error
mm,hwpoison: kill put_hwpoison_page
mm,hwpoison: unify THP handling for hard and soft offline
mm,hwpoison: rework soft offline for free pages
mm,hwpoison: rework soft offline for in-use pages
mm,hwpoison: refactor soft_offline_huge_page and __soft_offline_page
mm,hwpoison: return 0 if the page is already poisoned in soft-offline
Naoya Horiguchi <naoya.horiguchi@nec.com>:
mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
mm,hwpoison: double-check page count in __get_any_page()
Oscar Salvador <osalvador@suse.de>:
mm,hwpoison: try to narrow window race for free pages
Mateusz Nosek <mateusznosek0@gmail.com>:
mm/page_poison.c: replace bool variable with static key
Miaohe Lin <linmiaohe@huawei.com>:
mm/vmstat.c: use helper macro abs()
Subsystem: mm/util
Bartosz Golaszewski <bgolaszewski@baylibre.com>:
mm/util.c: update the kerneldoc for kstrdup_const()
Jann Horn <jannh@google.com>:
mm/mmu_notifier: fix mmget() assert in __mmu_interval_notifier_insert
Subsystem: mm/memory-hotplug
David Hildenbrand <david@redhat.com>:
Patch series "mm/memory_hotplug: online_pages()/offline_pages() cleanups", v2:
mm/memory_hotplug: inline __offline_pages() into offline_pages()
mm/memory_hotplug: enforce section granularity when onlining/offlining
mm/memory_hotplug: simplify page offlining
mm/page_alloc: simplify __offline_isolated_pages()
mm/memory_hotplug: drop nr_isolate_pageblock in offline_pages()
mm/page_isolation: simplify return value of start_isolate_page_range()
mm/memory_hotplug: simplify page onlining
mm/page_alloc: drop stale pageblock comment in memmap_init_zone*()
mm: pass migratetype into memmap_init_zone() and move_pfn_range_to_zone()
mm/memory_hotplug: mark pageblocks MIGRATE_ISOLATE while onlining memory
Patch series "selective merging of system ram resources", v4:
kernel/resource: make release_mem_region_adjustable() never fail
kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED
mm/memory_hotplug: guard more declarations by CONFIG_MEMORY_HOTPLUG
mm/memory_hotplug: prepare passing flags to add_memory() and friends
mm/memory_hotplug: MEMHP_MERGE_RESOURCE to specify merging of System RAM resources
virtio-mem: try to merge system ram resources
xen/balloon: try to merge system ram resources
hv_balloon: try to merge system ram resources
kernel/resource: make iomem_resource implicit in release_mem_region_adjustable()
Laurent Dufour <ldufour@linux.ibm.com>:
mm: don't panic when links can't be created in sysfs
David Hildenbrand <david@redhat.com>:
Patch series "mm: place pages to the freelist tail when onlining and undoing isolation", v2:
mm/page_alloc: convert "report" flag of __free_one_page() to a proper flag
mm/page_alloc: place pages to tail in __putback_isolated_page()
mm/page_alloc: move pages to tail in move_to_free_list()
mm/page_alloc: place pages to tail in __free_pages_core()
mm/memory_hotplug: update comment regarding zone shuffling
Subsystem: mm/zram
Douglas Anderson <dianders@chromium.org>:
zram: failing to decompress is WARN_ON worthy
Subsystem: mm/cleanups
YueHaibing <yuehaibing@huawei.com>:
mm/slab.h: remove duplicate include
Wei Yang <richard.weiyang@linux.alibaba.com>:
mm/page_reporting.c: drop stale list head check in page_reporting_cycle
Ira Weiny <ira.weiny@intel.com>:
mm/highmem.c: clean up endif comments
Yu Zhao <yuzhao@google.com>:
mm: use self-explanatory macros rather than "2"
Miaohe Lin <linmiaohe@huawei.com>:
mm: fix some broken comments
Chen Tao <chentao3@hotmail.com>:
mm: fix some comments formatting
Xiaofei Tan <tanxiaofei@huawei.com>:
mm/workingset.c: fix some doc warnings
Miaohe Lin <linmiaohe@huawei.com>:
mm: use helper function put_write_access()
Mike Rapoport <rppt@linux.ibm.com>:
include/linux/mmzone.h: remove unused early_pfn_valid()
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: rename page_order() to buddy_order()
Subsystem: misc
Randy Dunlap <rdunlap@infradead.org>:
fs: configfs: delete repeated words in comments
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
kernel.h: split out min()/max() et al. helpers
Subsystem: core-kernel
Liao Pingfang <liao.pingfang@zte.com.cn>:
kernel/sys.c: replace do_brk with do_brk_flags in comment of prctl_set_mm_map()
Randy Dunlap <rdunlap@infradead.org>:
kernel/: fix repeated words in comments
kernel: acct.c: fix some kernel-doc nits
Subsystem: get_maintainer
Joe Perches <joe@perches.com>:
get_maintainer: add test for file in VCS
Subsystem: MAINTAINERS
Joe Perches <joe@perches.com>:
get_maintainer: exclude MAINTAINERS file(s) from --git-fallback
Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>:
MAINTAINERS: jarkko.sakkinen@linux.intel.com -> jarkko@kernel.org
Subsystem: lib
Randy Dunlap <rdunlap@infradead.org>:
lib: bitmap: delete duplicated words
lib: libcrc32c: delete duplicated words
lib: decompress_bunzip2: delete duplicated words
lib: dynamic_queue_limits: delete duplicated words + fix typo
lib: earlycpio: delete duplicated words
lib: radix-tree: delete duplicated words
lib: syscall: delete duplicated words
lib: test_sysctl: delete duplicated words
lib/mpi/mpi-bit.c: fix spello of "functions"
Stephen Boyd <swboyd@chromium.org>:
lib/idr.c: document calling context for IDA APIs mustn't use locks
lib/idr.c: document that ida_simple_{get,remove}() are deprecated
Christophe JAILLET <christophe.jaillet@wanadoo.fr>:
lib/scatterlist.c: avoid a double memset
Miaohe Lin <linmiaohe@huawei.com>:
lib/percpu_counter.c: use helper macro abs()
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
include/linux/list.h: add a macro to test if entry is pointing to the head
Dan Carpenter <dan.carpenter@oracle.com>:
lib/test_hmm.c: fix an error code in dmirror_allocate_chunk()
Tobias Jordan <kernel@cdqe.de>:
lib/crc32.c: fix trivial typo in preprocessor condition
Subsystem: bitops
Wei Yang <richard.weiyang@linux.alibaba.com>:
bitops: simplify get_count_order_long()
bitops: use the same mechanism for get_count_order[_long]
Subsystem: checkpatch
Jerome Forissier <jerome@forissier.org>:
checkpatch: add --kconfig-prefix
Joe Perches <joe@perches.com>:
checkpatch: move repeated word test
checkpatch: add test for comma use that should be semicolon
Rikard Falkeborn <rikard.falkeborn@gmail.com>:
const_structs.checkpatch: add phy_ops
Nicolas Boichat <drinkcat@chromium.org>:
checkpatch: warn if trace_printk and friends are called
Rikard Falkeborn <rikard.falkeborn@gmail.com>:
const_structs.checkpatch: add pinctrl_ops and pinmux_ops
Joe Perches <joe@perches.com>:
checkpatch: warn on self-assignments
checkpatch: allow not using -f with files that are in git
Dwaipayan Ray <dwaipayanray1@gmail.com>:
checkpatch: extend author Signed-off-by check for split From: header
Joe Perches <joe@perches.com>:
checkpatch: emit a warning on embedded filenames
Dwaipayan Ray <dwaipayanray1@gmail.com>:
checkpatch: fix multi-statement macro checks for while blocks.
Łukasz Stelmach <l.stelmach@samsung.com>:
checkpatch: fix false positive on empty block comment lines
Dwaipayan Ray <dwaipayanray1@gmail.com>:
checkpatch: add new warnings to author signoff checks.
Subsystem: binfmt
Chris Kennelly <ckennelly@google.com>:
Patch series "Selecting Load Addresses According to p_align", v3:
fs/binfmt_elf: use PT_LOAD p_align values for suitable start address
tools/testing/selftests: add self-test for verifying load alignment
Jann Horn <jannh@google.com>:
Patch series "Fix ELF / FDPIC ELF core dumping, and use mmap_lock properly in there", v5:
binfmt_elf_fdpic: stop using dump_emit() on user pointers on !MMU
coredump: let dump_emit() bail out on short writes
coredump: refactor page range dumping into common helper
coredump: rework elf/elf_fdpic vma_dump_size() into common helper
binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot
mm/gup: take mmap_lock in get_dump_page()
mm: remove the now-unnecessary mmget_still_valid() hack
Subsystem: ramfs
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
ramfs: fix nommu mmap with gaps in the page cache
Subsystem: autofs
Matthew Wilcox <willy@infradead.org>:
autofs: harden ioctl table
Subsystem: nilfs
Wang Hai <wanghai38@huawei.com>:
nilfs2: fix some kernel-doc warnings for nilfs2
Subsystem: rapidio
Souptick Joarder <jrdr.linux@gmail.com>:
rapidio: fix error handling path
Jing Xiangfeng <jingxiangfeng@huawei.com>:
rapidio: fix the missed put_device() for rio_mport_add_riodev
Subsystem: panic
Alexey Kardashevskiy <aik@ozlabs.ru>:
panic: dump registers on panic_on_warn
Subsystem: relay
Sudip Mukherjee <sudipm.mukherjee@gmail.com>:
kernel/relay.c: drop unneeded initialization
Subsystem: kgdb
Ritesh Harjani <riteshh@linux.ibm.com>:
scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts command
scripts/gdb/tasks: add headers and improve spacing format
Subsystem: ubsan
Elena Petrova <lenaptr@google.com>:
sched.h: drop in_ubsan field when UBSAN is in trap mode
George Popescu <georgepope@android.com>:
ubsan: introduce CONFIG_UBSAN_LOCAL_BOUNDS for Clang
Subsystem: romfs
Libing Zhou <libing.zhou@nokia-sbell.com>:
ROMFS: support inode blocks calculation
Subsystem: fault-injection
Albert van der Linde <alinde@google.com>:
Patch series "add fault injection to user memory access", v3:
lib, include/linux: add usercopy failure capability
lib, uaccess: add failure injection to usercopy functions
.mailmap | 1
Documentation/admin-guide/kernel-parameters.txt | 1
Documentation/core-api/xarray.rst | 14
Documentation/fault-injection/fault-injection.rst | 7
MAINTAINERS | 6
arch/ia64/mm/init.c | 4
arch/powerpc/include/asm/book3s/64/pgtable.h | 29 +
arch/powerpc/include/asm/nohash/pgtable.h | 5
arch/powerpc/mm/pgtable.c | 5
arch/powerpc/platforms/powernv/memtrace.c | 2
arch/powerpc/platforms/pseries/hotplug-memory.c | 2
drivers/acpi/acpi_memhotplug.c | 3
drivers/base/memory.c | 3
drivers/base/node.c | 33 +-
drivers/block/zram/zram_drv.c | 2
drivers/dax/kmem.c | 50 ++-
drivers/hv/hv_balloon.c | 4
drivers/infiniband/core/uverbs_main.c | 3
drivers/rapidio/devices/rio_mport_cdev.c | 18 -
drivers/s390/char/sclp_cmd.c | 2
drivers/vfio/pci/vfio_pci.c | 38 +-
drivers/virtio/virtio_mem.c | 5
drivers/xen/balloon.c | 4
fs/autofs/dev-ioctl.c | 8
fs/binfmt_elf.c | 267 +++-------------
fs/binfmt_elf_fdpic.c | 176 ++--------
fs/configfs/dir.c | 2
fs/configfs/file.c | 2
fs/coredump.c | 238 +++++++++++++-
fs/ext4/verity.c | 4
fs/f2fs/verity.c | 4
fs/inode.c | 2
fs/nilfs2/bmap.c | 2
fs/nilfs2/cpfile.c | 6
fs/nilfs2/page.c | 1
fs/nilfs2/sufile.c | 4
fs/proc/task_mmu.c | 18 -
fs/ramfs/file-nommu.c | 2
fs/romfs/super.c | 1
fs/userfaultfd.c | 28 -
include/linux/bitops.h | 13
include/linux/blkdev.h | 1
include/linux/bvec.h | 6
include/linux/coredump.h | 13
include/linux/fault-inject-usercopy.h | 22 +
include/linux/fs.h | 28 -
include/linux/idr.h | 13
include/linux/ioport.h | 15
include/linux/jiffies.h | 3
include/linux/kernel.h | 150 ---------
include/linux/list.h | 29 +
include/linux/memory_hotplug.h | 42 +-
include/linux/minmax.h | 153 +++++++++
include/linux/mm.h | 5
include/linux/mmzone.h | 17 -
include/linux/node.h | 16
include/linux/nodemask.h | 2
include/linux/page-flags.h | 6
include/linux/page_owner.h | 6
include/linux/pagemap.h | 111 ++++++
include/linux/sched.h | 2
include/linux/sched/mm.h | 25 -
include/linux/uaccess.h | 12
include/linux/vmstat.h | 2
include/linux/xarray.h | 22 +
include/ras/ras_event.h | 3
kernel/acct.c | 10
kernel/cgroup/cpuset.c | 2
kernel/dma/direct.c | 2
kernel/fork.c | 4
kernel/futex.c | 2
kernel/irq/timings.c | 2
kernel/jump_label.c | 2
kernel/kcsan/encoding.h | 2
kernel/kexec_core.c | 2
kernel/kexec_file.c | 2
kernel/kthread.c | 2
kernel/livepatch/state.c | 2
kernel/panic.c | 12
kernel/pid_namespace.c | 2
kernel/power/snapshot.c | 2
kernel/range.c | 3
kernel/relay.c | 2
kernel/resource.c | 114 +++++--
kernel/smp.c | 2
kernel/sys.c | 2
kernel/user_namespace.c | 2
lib/Kconfig.debug | 7
lib/Kconfig.ubsan | 14
lib/Makefile | 1
lib/bitmap.c | 2
lib/crc32.c | 2
lib/decompress_bunzip2.c | 2
lib/dynamic_queue_limits.c | 4
lib/earlycpio.c | 2
lib/fault-inject-usercopy.c | 39 ++
lib/find_bit.c | 1
lib/hexdump.c | 1
lib/idr.c | 9
lib/iov_iter.c | 5
lib/libcrc32c.c | 2
lib/math/rational.c | 2
lib/math/reciprocal_div.c | 1
lib/mpi/mpi-bit.c | 2
lib/percpu_counter.c | 2
lib/radix-tree.c | 2
lib/scatterlist.c | 2
lib/strncpy_from_user.c | 3
lib/syscall.c | 2
lib/test_hmm.c | 2
lib/test_sysctl.c | 2
lib/test_xarray.c | 65 ++++
lib/usercopy.c | 5
lib/xarray.c | 208 ++++++++++++
mm/Kconfig | 2
mm/compaction.c | 6
mm/debug_vm_pgtable.c | 267 ++++++++--------
mm/filemap.c | 58 ++-
mm/gup.c | 73 ++--
mm/highmem.c | 4
mm/huge_memory.c | 47 +-
mm/hwpoison-inject.c | 18 -
mm/internal.h | 47 +-
mm/khugepaged.c | 2
mm/madvise.c | 52 ---
mm/memory-failure.c | 357 ++++++++++------------
mm/memory.c | 7
mm/memory_hotplug.c | 223 +++++--------
mm/memremap.c | 3
mm/migrate.c | 11
mm/mmap.c | 7
mm/mmu_notifier.c | 2
mm/page-writeback.c | 1
mm/page_alloc.c | 289 +++++++++++------
mm/page_isolation.c | 16
mm/page_owner.c | 10
mm/page_poison.c | 20 -
mm/page_reporting.c | 4
mm/readahead.c | 174 ++++------
mm/rmap.c | 10
mm/shmem.c | 2
mm/shuffle.c | 2
mm/slab.c | 2
mm/slab.h | 1
mm/slub.c | 2
mm/sparse.c | 2
mm/swap_state.c | 2
mm/truncate.c | 6
mm/util.c | 3
mm/vmscan.c | 5
mm/vmstat.c | 8
mm/workingset.c | 2
scripts/Makefile.ubsan | 10
scripts/checkpatch.pl | 238 ++++++++++----
scripts/const_structs.checkpatch | 3
scripts/gdb/linux/proc.py | 15
scripts/gdb/linux/tasks.py | 9
scripts/get_maintainer.pl | 9
tools/testing/selftests/exec/.gitignore | 1
tools/testing/selftests/exec/Makefile | 9
tools/testing/selftests/exec/load_address.c | 68 ++++
161 files changed, 2532 insertions(+), 1864 deletions(-)
* incoming
@ 2020-10-13 23:46 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-10-13 23:46 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
181 patches, based on 029f56db6ac248769f2c260bfaf3c3c0e23e904c.
Subsystems affected by this patch series:
kbuild
scripts
ntfs
ocfs2
vfs
mm/slab
mm/slub
mm/kmemleak
mm/dax
mm/debug
mm/pagecache
mm/fadvise
mm/gup
mm/swap
mm/memremap
mm/memcg
mm/selftests
mm/pagemap
mm/mincore
mm/hmm
mm/dma
mm/memory-failure
mm/vmalloc
mm/documentation
mm/kasan
mm/pagealloc
mm/hugetlb
mm/vmscan
mm/z3fold
mm/zbud
mm/compaction
mm/mempolicy
mm/mempool
mm/memblock
mm/oom-kill
mm/migration
Subsystem: kbuild
Nick Desaulniers <ndesaulniers@google.com>:
Patch series "set clang minimum version to 10.0.1", v3:
compiler-clang: add build check for clang 10.0.1
Revert "kbuild: disable clang's default use of -fmerge-all-constants"
Revert "arm64: bti: Require clang >= 10.0.1 for in-kernel BTI support"
Revert "arm64: vdso: Fix compilation with clang older than 8"
Partially revert "ARM: 8905/1: Emit __gnu_mcount_nc when using Clang 10.0.0 or newer"
Marco Elver <elver@google.com>:
kasan: remove mentions of unsupported Clang versions
Nick Desaulniers <ndesaulniers@google.com>:
compiler-gcc: improve version error
compiler.h: avoid escaped section names
export.h: fix section name for CONFIG_TRIM_UNUSED_KSYMS for Clang
Lukas Bulwahn <lukas.bulwahn@gmail.com>:
kbuild: doc: describe proper script invocation
Subsystem: scripts
Wang Qing <wangqing@vivo.com>:
scripts/spelling.txt: increase error-prone spell checking
Naoki Hayama <naoki.hayama@lineo.co.jp>:
scripts/spelling.txt: add "arbitrary" typo
Borislav Petkov <bp@suse.de>:
scripts/decodecode: add the capability to supply the program counter
Subsystem: ntfs
Rustam Kovhaev <rkovhaev@gmail.com>:
ntfs: add check for mft record size in superblock
Subsystem: ocfs2
Randy Dunlap <rdunlap@infradead.org>:
ocfs2: delete repeated words in comments
Gang He <ghe@suse.com>:
ocfs2: fix potential soft lockup during fstrim
Subsystem: vfs
Randy Dunlap <rdunlap@infradead.org>:
fs/xattr.c: fix kernel-doc warnings for setxattr & removexattr
Luo Jiaxing <luojiaxing@huawei.com>:
fs_parse: mark fs_param_bad_value() as static
Subsystem: mm/slab
Mateusz Nosek <mateusznosek0@gmail.com>:
mm/slab.c: clean code by removing redundant if condition
tangjianqiang <wyqt1985@gmail.com>:
include/linux/slab.h: fix a typo error in comment
Subsystem: mm/slub
Abel Wu <wuyun.wu@huawei.com>:
mm/slub.c: branch optimization in free slowpath
mm/slub: fix missing ALLOC_SLOWPATH stat when bulk alloc
mm/slub: make add_full() condition more explicit
Subsystem: mm/kmemleak
Davidlohr Bueso <dave@stgolabs.net>:
mm/kmemleak: rely on rcu for task stack scanning
Hui Su <sh_def@163.com>:
mm,kmemleak-test.c: move kmemleak-test.c to samples dir
Subsystem: mm/dax
Dan Williams <dan.j.williams@intel.com>:
Patch series "device-dax: Support sub-dividing soft-reserved ranges", v5:
x86/numa: cleanup configuration dependent command-line options
x86/numa: add 'nohmat' option
efi/fake_mem: arrange for a resource entry per efi_fake_mem instance
ACPI: HMAT: refactor hmat_register_target_device to hmem_register_device
resource: report parent to walk_iomem_res_desc() callback
mm/memory_hotplug: introduce default phys_to_target_node() implementation
ACPI: HMAT: attach a device for each soft-reserved range
device-dax: drop the dax_region.pfn_flags attribute
device-dax: move instance creation parameters to 'struct dev_dax_data'
device-dax: make pgmap optional for instance creation
device-dax/kmem: introduce dax_kmem_range()
device-dax/kmem: move resource name tracking to drvdata
device-dax/kmem: replace release_resource() with release_mem_region()
device-dax: add an allocation interface for device-dax instances
device-dax: introduce 'struct dev_dax' typed-driver operations
device-dax: introduce 'seed' devices
drivers/base: make device_find_child_by_name() compatible with sysfs inputs
device-dax: add resize support
mm/memremap_pages: convert to 'struct range'
mm/memremap_pages: support multiple ranges per invocation
device-dax: add dis-contiguous resource support
device-dax: introduce 'mapping' devices
Joao Martins <joao.m.martins@oracle.com>:
device-dax: make align a per-device property
Dan Williams <dan.j.williams@intel.com>:
device-dax: add an 'align' attribute
Joao Martins <joao.m.martins@oracle.com>:
dax/hmem: introduce dax_hmem.region_idle parameter
device-dax: add a range mapping allocation attribute
Subsystem: mm/debug
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/debug.c: do not dereference i_ino blindly
John Hubbard <jhubbard@nvidia.com>:
mm, dump_page: rename head_mapcount() --> head_compound_mapcount()
Subsystem: mm/pagecache
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "Return head pages from find_*_entry", v2:
mm: factor find_get_incore_page out of mincore_page
mm: use find_get_incore_page in memcontrol
mm: optimise madvise WILLNEED
proc: optimise smaps for shmem entries
i915: use find_lock_page instead of find_lock_entry
mm: convert find_get_entry to return the head page
mm/shmem: return head page from find_lock_entry
mm: add find_lock_head
mm/filemap: fix filemap_map_pages for THP
Subsystem: mm/fadvise
Yafang Shao <laoar.shao@gmail.com>:
mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED
Subsystem: mm/gup
Barry Song <song.bao.hua@hisilicon.com>:
mm/gup_benchmark: update the documentation in Kconfig
mm/gup_benchmark: use pin_user_pages for FOLL_LONGTERM flag
mm/gup: don't permit users to call get_user_pages with FOLL_LONGTERM
John Hubbard <jhubbard@nvidia.com>:
mm/gup: protect unpin_user_pages() against npages==-ERRNO
Subsystem: mm/swap
Gao Xiang <hsiangkao@redhat.com>:
swap: rename SWP_FS to SWAP_FS_OPS to avoid ambiguity
Yu Zhao <yuzhao@google.com>:
mm: remove activate_page() from unuse_pte()
mm: remove superfluous __ClearPageActive()
Miaohe Lin <linmiaohe@huawei.com>:
mm/swap.c: fix confusing comment in release_pages()
mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache()
mm/page_io.c: remove useless out label in __swap_writepage()
mm/swap.c: fix incomplete comment in lru_cache_add_inactive_or_unevictable()
mm/swapfile.c: remove unnecessary goto out in _swap_info_get()
mm/swapfile.c: fix potential memory leak in sys_swapon
Subsystem: mm/memremap
Ira Weiny <ira.weiny@intel.com>:
mm/memremap.c: convert devmap static branch to {inc,dec}
Subsystem: mm/memcg
"Gustavo A. R. Silva" <gustavoars@kernel.org>:
mm: memcontrol: use flex_array_size() helper in memcpy()
mm: memcontrol: use the preferred form for passing the size of a structure type
Roman Gushchin <guro@fb.com>:
mm: memcg/slab: fix racy access to page->mem_cgroup in mem_cgroup_from_obj()
Miaohe Lin <linmiaohe@huawei.com>:
mm: memcontrol: correct the comment of mem_cgroup_iter()
Waiman Long <longman@redhat.com>:
Patch series "mm/memcg: Miscellaneous cleanups and streamlining", v2:
mm/memcg: clean up obsolete enum charge_type
mm/memcg: simplify mem_cgroup_get_max()
mm/memcg: unify swap and memsw page counters
Muchun Song <songmuchun@bytedance.com>:
mm: memcontrol: add the missing numa_stat interface for cgroup v2
Miaohe Lin <linmiaohe@huawei.com>:
mm/page_counter: correct the obsolete func name in the comment of page_counter_try_charge()
mm: memcontrol: reword obsolete comment of mem_cgroup_unmark_under_oom()
Bharata B Rao <bharata@linux.ibm.com>:
mm: memcg/slab: uncharge during kmem_cache_free_bulk()
Ralph Campbell <rcampbell@nvidia.com>:
mm/memcg: fix device private memcg accounting
Subsystem: mm/selftests
John Hubbard <jhubbard@nvidia.com>:
Patch series "selftests/vm: fix some minor aggravating factors in the Makefile":
selftests/vm: fix false build success on the second and later attempts
selftests/vm: fix incorrect gcc invocation in some cases
Subsystem: mm/pagemap
Matthew Wilcox <willy@infradead.org>:
mm: account PMD tables like PTE tables
Yanfei Xu <yanfei.xu@windriver.com>:
mm/memory.c: fix typo in __do_fault() comment
mm/memory.c: replace vmf->vma with variable vma
Wei Yang <richard.weiyang@linux.alibaba.com>:
mm/mmap: rename __vma_unlink_common() to __vma_unlink()
mm/mmap: leverage vma_rb_erase_ignore() to implement vma_rb_erase()
Chinwen Chang <chinwen.chang@mediatek.com>:
Patch series "Try to release mmap_lock temporarily in smaps_rollup", v4:
mmap locking API: add mmap_lock_is_contended()
mm: smaps*: extend smap_gather_stats to support specified beginning
mm: proc: smaps_rollup: do not stall write attempts on mmap_lock
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "Fix PageDoubleMap":
mm: move PageDoubleMap bit
mm: simplify PageDoubleMap with PF_SECOND policy
Wei Yang <richard.weiyang@linux.alibaba.com>:
mm/mmap: leave adjust_next as virtual address instead of page frame number
Randy Dunlap <rdunlap@infradead.org>:
mm/memory.c: fix spello of "function"
Wei Yang <richard.weiyang@linux.alibaba.com>:
mm/mmap: not necessary to check mapping separately
mm/mmap: check on file instead of the rb_root_cached of its address_space
Miaohe Lin <linmiaohe@huawei.com>:
mm: use helper function mapping_allow_writable()
mm/mmap.c: use helper function allow_write_access() in __remove_shared_vm_struct()
Liao Pingfang <liao.pingfang@zte.com.cn>:
mm/mmap.c: replace do_brk with do_brk_flags in comment of insert_vm_struct()
Peter Xu <peterx@redhat.com>:
mm: remove src/dst mm parameter in copy_page_range()
Subsystem: mm/mincore
yuleixzhang <yulei.kernel@gmail.com>:
include/linux/huge_mm.h: remove mincore_huge_pmd declaration
Subsystem: mm/hmm
Ralph Campbell <rcampbell@nvidia.com>:
tools/testing/selftests/vm/hmm-tests.c: use the new SKIP() macro
lib/test_hmm.c: remove unused dmirror_zero_page
Subsystem: mm/dma
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
mm/dmapool.c: replace open-coded list_for_each_entry_safe()
mm/dmapool.c: replace hard coded function name with __func__
Subsystem: mm/memory-failure
Xianting Tian <tian.xianting@h3c.com>:
mm/memory-failure: do pgoff calculation before for_each_process()
Alex Shi <alex.shi@linux.alibaba.com>:
mm/memory-failure.c: remove unused macro `writeback'
Subsystem: mm/vmalloc
Hui Su <sh_def@163.com>:
mm/vmalloc.c: update the comment in __vmalloc_area_node()
mm/vmalloc.c: fix the comment of find_vm_area
Subsystem: mm/documentation
Alexander Gordeev <agordeev@linux.ibm.com>:
docs/vm: fix 'mm_count' vs 'mm_users' counter confusion
Subsystem: mm/kasan
Patricia Alfonso <trishalfonso@google.com>:
Patch series "KASAN-KUnit Integration", v14:
kasan/kunit: add KUnit Struct to Current Task
KUnit: KASAN Integration
KASAN: port KASAN Tests to KUnit
KASAN: Testing Documentation
David Gow <davidgow@google.com>:
mm: kasan: do not panic if both panic_on_warn and kasan_multishot set
Subsystem: mm/pagealloc
David Hildenbrand <david@redhat.com>:
Patch series "mm / virtio-mem: support ZONE_MOVABLE", v5:
mm/page_alloc: tweak comments in has_unmovable_pages()
mm/page_isolation: exit early when pageblock is isolated in set_migratetype_isolate()
mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()
mm/page_isolation: cleanup set_migratetype_isolate()
virtio-mem: don't special-case ZONE_MOVABLE
mm: document semantics of ZONE_MOVABLE
Li Xinhai <lixinhai.lxh@gmail.com>:
mm, isolation: avoid checking unmovable pages across pageblock boundary
Mateusz Nosek <mateusznosek0@gmail.com>:
mm/page_alloc.c: clean code by removing unnecessary initialization
mm/page_alloc.c: micro-optimization remove unnecessary branch
mm/page_alloc.c: fix early params garbage value accesses
mm/page_alloc.c: clean code by merging two functions
Yanfei Xu <yanfei.xu@windriver.com>:
mm/page_alloc.c: __perform_reclaim should return 'unsigned long'
Mateusz Nosek <mateusznosek0@gmail.com>:
mmzone: clean code by removing unused macro parameter
Ralph Campbell <rcampbell@nvidia.com>:
mm: move call to compound_head() in release_pages()
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/page_alloc.c: fix freeing non-compound pages
Michal Hocko <mhocko@suse.com>:
include/linux/gfp.h: clarify usage of GFP_ATOMIC in !preemptible contexts
Subsystem: mm/hugetlb
Baoquan He <bhe@redhat.com>:
Patch series "mm/hugetlb: Small cleanup and improvement", v2:
mm/hugetlb.c: make is_hugetlb_entry_hwpoisoned return bool
mm/hugetlb.c: remove the unnecessary non_swap_entry()
doc/vm: fix typo in the hugetlb admin documentation
Wei Yang <richard.weiyang@linux.alibaba.com>:
Patch series "mm/hugetlb: code refine and simplification", v4:
mm/hugetlb: not necessary to coalesce regions recursively
mm/hugetlb: remove VM_BUG_ON(!nrg) in get_file_region_entry_from_cache()
mm/hugetlb: use list_splice to merge two list at once
mm/hugetlb: count file_region to be added when regions_needed != NULL
mm/hugetlb: a page from buddy is not on any list
mm/hugetlb: narrow the hugetlb_lock protection area during preparing huge page
mm/hugetlb: take the free hpage during the iteration directly
Mike Kravetz <mike.kravetz@oracle.com>:
hugetlb: add lockdep check for i_mmap_rwsem held in huge_pmd_share
Subsystem: mm/vmscan
Chunxin Zang <zangchunxin@bytedance.com>:
mm/vmscan: fix infinite loop in drop_slab_node
Hui Su <sh_def@163.com>:
mm/vmscan: fix comments for isolate_lru_page()
Subsystem: mm/z3fold
Hui Su <sh_def@163.com>:
mm/z3fold.c: use xx_zalloc instead xx_alloc and memset
Subsystem: mm/zbud
Xiang Chen <chenxiang66@hisilicon.com>:
mm/zbud: remove redundant initialization
Subsystem: mm/compaction
Mateusz Nosek <mateusznosek0@gmail.com>:
mm/compaction.c: micro-optimization remove unnecessary branch
include/linux/compaction.h: clean code by removing unused enum value
John Hubbard <jhubbard@nvidia.com>:
selftests/vm: 8x compaction_test speedup
Subsystem: mm/mempolicy
Wei Yang <richard.weiyang@linux.alibaba.com>:
mm/mempolicy: remove or narrow the lock on current
mm: remove unused alloc_page_vma_node()
Subsystem: mm/mempool
Miaohe Lin <linmiaohe@huawei.com>:
mm/mempool: add 'else' to split mutually exclusive case
Subsystem: mm/memblock
Mike Rapoport <rppt@linux.ibm.com>:
Patch series "memblock: seasonal cleaning^w cleanup", v3:
KVM: PPC: Book3S HV: simplify kvm_cma_reserve()
dma-contiguous: simplify cma_early_percent_memory()
arm, xtensa: simplify initialization of high memory pages
arm64: numa: simplify dummy_numa_init()
h8300, nds32, openrisc: simplify detection of memory extents
riscv: drop unneeded node initialization
mircoblaze: drop unneeded NUMA and sparsemem initializations
memblock: make for_each_memblock_type() iterator private
memblock: make memblock_debug and related functionality private
memblock: reduce number of parameters in for_each_mem_range()
arch, mm: replace for_each_memblock() with for_each_mem_pfn_range()
arch, drivers: replace for_each_membock() with for_each_mem_range()
x86/setup: simplify initrd relocation and reservation
x86/setup: simplify reserve_crashkernel()
memblock: remove unused memblock_mem_size()
memblock: implement for_each_reserved_mem_region() using __next_mem_region()
memblock: use separate iterators for memory and reserved regions
Subsystem: mm/oom-kill
Suren Baghdasaryan <surenb@google.com>:
mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary
Subsystem: mm/migration
Ralph Campbell <rcampbell@nvidia.com>:
mm/migrate: remove cpages-- in migrate_vma_finalize()
mm/migrate: remove obsolete comment about device public
.clang-format | 7
Documentation/admin-guide/cgroup-v2.rst | 69 +
Documentation/admin-guide/mm/hugetlbpage.rst | 2
Documentation/dev-tools/kasan.rst | 74 +
Documentation/dev-tools/kmemleak.rst | 2
Documentation/kbuild/makefiles.rst | 20
Documentation/vm/active_mm.rst | 2
Documentation/x86/x86_64/boot-options.rst | 4
MAINTAINERS | 2
Makefile | 9
arch/arm/Kconfig | 2
arch/arm/include/asm/tlb.h | 1
arch/arm/kernel/setup.c | 18
arch/arm/mm/init.c | 59 -
arch/arm/mm/mmu.c | 39
arch/arm/mm/pmsa-v7.c | 23
arch/arm/mm/pmsa-v8.c | 17
arch/arm/xen/mm.c | 7
arch/arm64/Kconfig | 2
arch/arm64/kernel/machine_kexec_file.c | 6
arch/arm64/kernel/setup.c | 4
arch/arm64/kernel/vdso/Makefile | 7
arch/arm64/mm/init.c | 11
arch/arm64/mm/kasan_init.c | 10
arch/arm64/mm/mmu.c | 11
arch/arm64/mm/numa.c | 15
arch/c6x/kernel/setup.c | 9
arch/h8300/kernel/setup.c | 8
arch/microblaze/mm/init.c | 23
arch/mips/cavium-octeon/dma-octeon.c | 14
arch/mips/kernel/setup.c | 31
arch/mips/netlogic/xlp/setup.c | 2
arch/nds32/kernel/setup.c | 8
arch/openrisc/kernel/setup.c | 9
arch/openrisc/mm/init.c | 8
arch/powerpc/kernel/fadump.c | 61 -
arch/powerpc/kexec/file_load_64.c | 16
arch/powerpc/kvm/book3s_hv_builtin.c | 12
arch/powerpc/kvm/book3s_hv_uvmem.c | 14
arch/powerpc/mm/book3s64/hash_utils.c | 16
arch/powerpc/mm/book3s64/radix_pgtable.c | 10
arch/powerpc/mm/kasan/kasan_init_32.c | 8
arch/powerpc/mm/mem.c | 31
arch/powerpc/mm/numa.c | 7
arch/powerpc/mm/pgtable_32.c | 8
arch/riscv/mm/init.c | 36
arch/riscv/mm/kasan_init.c | 10
arch/s390/kernel/setup.c | 27
arch/s390/mm/page-states.c | 6
arch/s390/mm/vmem.c | 7
arch/sh/mm/init.c | 9
arch/sparc/mm/init_64.c | 12
arch/x86/include/asm/numa.h | 8
arch/x86/kernel/e820.c | 16
arch/x86/kernel/setup.c | 56 -
arch/x86/mm/numa.c | 13
arch/x86/mm/numa_emulation.c | 3
arch/x86/xen/enlighten_pv.c | 2
arch/xtensa/mm/init.c | 55 -
drivers/acpi/numa/hmat.c | 76 -
drivers/acpi/numa/srat.c | 9
drivers/base/core.c | 2
drivers/bus/mvebu-mbus.c | 12
drivers/dax/Kconfig | 6
drivers/dax/Makefile | 3
drivers/dax/bus.c | 1237 +++++++++++++++++++++++----
drivers/dax/bus.h | 34
drivers/dax/dax-private.h | 74 +
drivers/dax/device.c | 164 +--
drivers/dax/hmem.c | 56 -
drivers/dax/hmem/Makefile | 8
drivers/dax/hmem/device.c | 100 ++
drivers/dax/hmem/hmem.c | 93 +-
drivers/dax/kmem.c | 236 ++---
drivers/dax/pmem/compat.c | 2
drivers/dax/pmem/core.c | 36
drivers/firmware/efi/x86_fake_mem.c | 12
drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4
drivers/gpu/drm/nouveau/nouveau_dmem.c | 15
drivers/irqchip/irq-gic-v3-its.c | 2
drivers/nvdimm/badrange.c | 26
drivers/nvdimm/claim.c | 13
drivers/nvdimm/nd.h | 3
drivers/nvdimm/pfn_devs.c | 13
drivers/nvdimm/pmem.c | 27
drivers/nvdimm/region.c | 21
drivers/pci/p2pdma.c | 12
drivers/virtio/virtio_mem.c | 47 -
drivers/xen/unpopulated-alloc.c | 45
fs/fs_parser.c | 2
fs/ntfs/inode.c | 6
fs/ocfs2/alloc.c | 6
fs/ocfs2/localalloc.c | 2
fs/proc/base.c | 3
fs/proc/task_mmu.c | 104 +-
fs/xattr.c | 22
include/acpi/acpi_numa.h | 14
include/kunit/test.h | 5
include/linux/acpi.h | 2
include/linux/compaction.h | 3
include/linux/compiler-clang.h | 8
include/linux/compiler-gcc.h | 2
include/linux/compiler.h | 2
include/linux/dax.h | 8
include/linux/export.h | 2
include/linux/fs.h | 4
include/linux/gfp.h | 6
include/linux/huge_mm.h | 3
include/linux/kasan.h | 6
include/linux/memblock.h | 90 +
include/linux/memcontrol.h | 13
include/linux/memory_hotplug.h | 23
include/linux/memremap.h | 15
include/linux/mm.h | 36
include/linux/mmap_lock.h | 5
include/linux/mmzone.h | 37
include/linux/numa.h | 11
include/linux/oom.h | 1
include/linux/page-flags.h | 42
include/linux/pagemap.h | 43
include/linux/range.h | 6
include/linux/sched.h | 4
include/linux/sched/coredump.h | 1
include/linux/slab.h | 2
include/linux/swap.h | 10
include/linux/swap_slots.h | 2
kernel/dma/contiguous.c | 11
kernel/fork.c | 25
kernel/resource.c | 11
lib/Kconfig.debug | 9
lib/Kconfig.kasan | 31
lib/Makefile | 5
lib/kunit/test.c | 13
lib/test_free_pages.c | 42
lib/test_hmm.c | 65 -
lib/test_kasan.c | 732 ++++++---------
lib/test_kasan_module.c | 111 ++
mm/Kconfig | 4
mm/Makefile | 1
mm/compaction.c | 5
mm/debug.c | 18
mm/dmapool.c | 46 -
mm/fadvise.c | 9
mm/filemap.c | 78 -
mm/gup.c | 44
mm/gup_benchmark.c | 23
mm/huge_memory.c | 4
mm/hugetlb.c | 100 +-
mm/internal.h | 3
mm/kasan/report.c | 34
mm/kmemleak-test.c | 99 --
mm/kmemleak.c | 8
mm/madvise.c | 21
mm/memblock.c | 102 --
mm/memcontrol.c | 262 +++--
mm/memory-failure.c | 5
mm/memory.c | 147 +--
mm/memory_hotplug.c | 10
mm/mempolicy.c | 8
mm/mempool.c | 18
mm/memremap.c | 344 ++++---
mm/migrate.c | 3
mm/mincore.c | 28
mm/mmap.c | 45
mm/oom_kill.c | 2
mm/page_alloc.c | 82 -
mm/page_counter.c | 2
mm/page_io.c | 14
mm/page_isolation.c | 41
mm/shmem.c | 19
mm/slab.c | 4
mm/slab.h | 50 -
mm/slub.c | 33
mm/sparse.c | 10
mm/swap.c | 14
mm/swap_slots.c | 3
mm/swap_state.c | 38
mm/swapfile.c | 12
mm/truncate.c | 58 -
mm/vmalloc.c | 6
mm/vmscan.c | 5
mm/z3fold.c | 3
mm/zbud.c | 1
samples/Makefile | 1
samples/kmemleak/Makefile | 3
samples/kmemleak/kmemleak-test.c | 99 ++
scripts/decodecode | 29
scripts/spelling.txt | 4
tools/testing/nvdimm/dax-dev.c | 28
tools/testing/nvdimm/test/iomap.c | 2
tools/testing/selftests/vm/Makefile | 17
tools/testing/selftests/vm/compaction_test.c | 11
tools/testing/selftests/vm/gup_benchmark.c | 14
tools/testing/selftests/vm/hmm-tests.c | 4
194 files changed, 4273 insertions(+), 2777 deletions(-)
* incoming
@ 2020-10-11 6:15 Andrew Morton
From: Andrew Morton @ 2020-10-11 6:15 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
5 patches, based on da690031a5d6d50a361e3f19f3eeabd086a6f20d.
Subsystems affected by this patch series:
MAINTAINERS
mm/pagemap
mm/swap
mm/hugetlb
Subsystem: MAINTAINERS
Kees Cook <keescook@chromium.org>:
MAINTAINERS: change hardening mailing list
Antoine Tenart <atenart@kernel.org>:
MAINTAINERS: Antoine Tenart's email address
Subsystem: mm/pagemap
Miaohe Lin <linmiaohe@huawei.com>:
mm: mmap: Fix general protection fault in unlink_file_vma()
Subsystem: mm/swap
Minchan Kim <minchan@kernel.org>:
mm: validate inode in mapping_set_error()
Subsystem: mm/hugetlb
Vijay Balakrishna <vijayb@linux.microsoft.com>:
mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
.mailmap | 4 +++-
MAINTAINERS | 8 ++++----
include/linux/khugepaged.h | 5 +++++
include/linux/pagemap.h | 3 ++-
mm/khugepaged.c | 13 +++++++++++--
mm/mmap.c | 6 +++++-
mm/page_alloc.c | 3 +++
7 files changed, 33 insertions(+), 9 deletions(-)
* incoming
@ 2020-10-03 5:20 Andrew Morton
From: Andrew Morton @ 2020-10-03 5:20 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
3 patches, based on d3d45f8220d60a0b2aaaacf8fb2be4e6ffd9008e.
Subsystems affected by this patch series:
mm/slub
mm/cma
scripts
Subsystem: mm/slub
Eric Farman <farman@linux.ibm.com>:
mm, slub: restore initial kmem_cache flags
Subsystem: mm/cma
Joonsoo Kim <iamjoonsoo.kim@lge.com>:
mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs
Subsystem: scripts
Eric Biggers <ebiggers@google.com>:
scripts/spelling.txt: fix malformed entry
mm/page_alloc.c | 19 ++++++++++++++++---
mm/slub.c | 6 +-----
scripts/spelling.txt | 2 +-
3 files changed, 18 insertions(+), 9 deletions(-)
* incoming
@ 2020-09-26 4:17 Andrew Morton
From: Andrew Morton @ 2020-09-26 4:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
9 patches, based on 7c7ec3226f5f33f9c050d85ec20f18419c622ad6.
Subsystems affected by this patch series:
mm/thp
mm/memcg
mm/gup
mm/migration
lib
x86
mm/memory-hotplug
Subsystem: mm/thp
Gao Xiang <hsiangkao@redhat.com>:
mm, THP, swap: fix allocating cluster for swapfile by mistake
Subsystem: mm/memcg
Muchun Song <songmuchun@bytedance.com>:
mm: memcontrol: fix missing suffix of workingset_restore
Subsystem: mm/gup
Vasily Gorbik <gor@linux.ibm.com>:
mm/gup: fix gup_fast with dynamic page table folding
Subsystem: mm/migration
Zi Yan <ziy@nvidia.com>:
mm/migrate: correct thp migration stats
Subsystem: lib
Nick Desaulniers <ndesaulniers@google.com>:
lib/string.c: implement stpcpy
Jason Yan <yanaijie@huawei.com>:
lib/memregion.c: include memregion.h
Subsystem: x86
Mikulas Patocka <mpatocka@redhat.com>:
arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback
Subsystem: mm/memory-hotplug
Laurent Dufour <ldufour@linux.ibm.com>:
Patch series "mm: fix memory to node bad links in sysfs", v3:
mm: replace memmap_context by meminit_context
mm: don't rely on system state to detect hot-plug operations
Documentation/admin-guide/cgroup-v2.rst | 25 ++++++---
arch/ia64/mm/init.c | 6 +-
arch/s390/include/asm/pgtable.h | 42 +++++++++++----
arch/x86/lib/usercopy_64.c | 2
drivers/base/node.c | 85 ++++++++++++++++++++------------
include/linux/mm.h | 2
include/linux/mmzone.h | 11 +++-
include/linux/node.h | 11 ++--
include/linux/pgtable.h | 10 +++
lib/memregion.c | 1
lib/string.c | 24 +++++++++
mm/gup.c | 18 +++---
mm/memcontrol.c | 4 -
mm/memory_hotplug.c | 5 +
mm/migrate.c | 7 +-
mm/page_alloc.c | 10 +--
mm/swapfile.c | 2
17 files changed, 181 insertions(+), 84 deletions(-)
* incoming
@ 2020-09-19 4:19 Andrew Morton
From: Andrew Morton @ 2020-09-19 4:19 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
15 patches, based on 92ab97adeefccf375de7ebaad9d5b75d4125fe8b.
Subsystems affected by this patch series:
mailmap
mm/hotfixes
mm/thp
mm/memory-hotplug
misc
kcsan
Subsystem: mailmap
Kees Cook <keescook@chromium.org>:
mailmap: add older email addresses for Kees Cook
Subsystem: mm/hotfixes
Hugh Dickins <hughd@google.com>:
Patch series "mm: fixes to past from future testing":
ksm: reinstate memcg charge on copied pages
mm: migration of hugetlbfs page skip memcg
shmem: shmem_writepage() split unlikely i915 THP
mm: fix check_move_unevictable_pages() on THP
mlock: fix unevictable_pgs event counts on THP
Byron Stanoszek <gandalf@winds.org>:
tmpfs: restore functionality of nr_inodes=0
Muchun Song <songmuchun@bytedance.com>:
kprobes: fix kill kprobe which has been marked as gone
Subsystem: mm/thp
Ralph Campbell <rcampbell@nvidia.com>:
mm/thp: fix __split_huge_pmd_locked() for migration PMD
Christophe Leroy <christophe.leroy@csgroup.eu>:
selftests/vm: fix display of page size in map_hugetlb
Subsystem: mm/memory-hotplug
Pavel Tatashin <pasha.tatashin@soleen.com>:
mm/memory_hotplug: drain per-cpu pages again during memory offline
Subsystem: misc
Tobias Klauser <tklauser@distanz.ch>:
ftrace: let ftrace_enable_sysctl take a kernel pointer buffer
stackleak: let stack_erasing_sysctl take a kernel pointer buffer
fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match prototype
Subsystem: kcsan
Changbin Du <changbin.du@gmail.com>:
kcsan: kconfig: move to menu 'Generic Kernel Debugging Instruments'
.mailmap | 4 ++
fs/fs-writeback.c | 2 -
include/linux/ftrace.h | 3 --
include/linux/stackleak.h | 2 -
kernel/kprobes.c | 9 +++++-
kernel/stackleak.c | 2 -
kernel/trace/ftrace.c | 3 --
lib/Kconfig.debug | 4 --
mm/huge_memory.c | 42 ++++++++++++++++---------------
mm/ksm.c | 4 ++
mm/memory_hotplug.c | 14 ++++++++++
mm/migrate.c | 3 +-
mm/mlock.c | 24 +++++++++++------
mm/page_isolation.c | 8 +++++
mm/shmem.c | 20 +++++++++++---
mm/swap.c | 6 ++--
mm/vmscan.c | 10 +++++--
tools/testing/selftests/vm/map_hugetlb.c | 2 -
18 files changed, 111 insertions(+), 51 deletions(-)
* incoming
@ 2020-09-04 23:34 Andrew Morton
From: Andrew Morton @ 2020-09-04 23:34 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
19 patches, based on 59126901f200f5fc907153468b03c64e0081b6e6.
Subsystems affected by this patch series:
mm/memcg
mm/slub
MAINTAINERS
mm/pagemap
ipc
fork
checkpatch
mm/madvise
mm/migration
mm/hugetlb
lib
Subsystem: mm/memcg
Michal Hocko <mhocko@suse.com>:
memcg: fix use-after-free in uncharge_batch
Xunlei Pang <xlpang@linux.alibaba.com>:
mm: memcg: fix memcg reclaim soft lockup
Subsystem: mm/slub
Eugeniu Rosca <erosca@de.adit-jv.com>:
mm: slub: fix conversion of freelist_corrupted()
Subsystem: MAINTAINERS
Robert Richter <rric@kernel.org>:
MAINTAINERS: update Cavium/Marvell entries
Nick Desaulniers <ndesaulniers@google.com>:
MAINTAINERS: add LLVM maintainers
Randy Dunlap <rdunlap@infradead.org>:
MAINTAINERS: IA64: mark Status as Odd Fixes only
Subsystem: mm/pagemap
Joerg Roedel <jroedel@suse.de>:
mm: track page table modifications in __apply_to_page_range()
Subsystem: ipc
Tobias Klauser <tklauser@distanz.ch>:
ipc: adjust proc_ipc_sem_dointvec definition to match prototype
Subsystem: fork
Tobias Klauser <tklauser@distanz.ch>:
fork: adjust sysctl_max_threads definition to match prototype
Subsystem: checkpatch
Mrinal Pandey <mrinalmni@gmail.com>:
checkpatch: fix the usage of capture group ( ... )
Subsystem: mm/madvise
Yang Shi <shy828301@gmail.com>:
mm: madvise: fix vma user-after-free
Subsystem: mm/migration
Alistair Popple <alistair@popple.id.au>:
mm/migrate: fixup setting UFFD_WP flag
mm/rmap: fixup copying of soft dirty and uffd ptes
Ralph Campbell <rcampbell@nvidia.com>:
Patch series "mm/migrate: preserve soft dirty in remove_migration_pte()":
mm/migrate: remove unnecessary is_zone_device_page() check
mm/migrate: preserve soft dirty in remove_migration_pte()
Subsystem: mm/hugetlb
Li Xinhai <lixinhai.lxh@gmail.com>:
mm/hugetlb: try preferred node first when alloc gigantic page from cma
Muchun Song <songmuchun@bytedance.com>:
mm/hugetlb: fix a race between hugetlb sysctl handlers
David Howells <dhowells@redhat.com>:
mm/khugepaged.c: fix khugepaged's request size in collapse_file
Subsystem: lib
Jason Gunthorpe <jgg@nvidia.com>:
include/linux/log2.h: add missing () around n in roundup_pow_of_two()
MAINTAINERS | 32 ++++++++++++++++----------------
include/linux/log2.h | 2 +-
ipc/ipc_sysctl.c | 2 +-
kernel/fork.c | 2 +-
mm/hugetlb.c | 49 +++++++++++++++++++++++++++++++++++++------------
mm/khugepaged.c | 2 +-
mm/madvise.c | 2 +-
mm/memcontrol.c | 6 ++++++
mm/memory.c | 37 ++++++++++++++++++++++++-------------
mm/migrate.c | 31 +++++++++++++++++++------------
mm/rmap.c | 9 +++++++--
mm/slub.c | 12 ++++++------
mm/vmscan.c | 8 ++++++++
scripts/checkpatch.pl | 4 ++--
14 files changed, 130 insertions(+), 68 deletions(-)
* incoming
@ 2020-08-21 0:41 Andrew Morton
From: Andrew Morton @ 2020-08-21 0:41 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
11 patches, based on 7eac66d0456fe12a462e5c14c68e97c7460989da.
Subsystems affected by this patch series:
misc
mm/hugetlb
mm/vmalloc
mm/misc
romfs
relay
uprobes
squashfs
mm/cma
mm/pagealloc
Subsystem: misc
Nick Desaulniers <ndesaulniers@google.com>:
mailmap: add Andi Kleen
Subsystem: mm/hugetlb
Xu Wang <vulab@iscas.ac.cn>:
hugetlb_cgroup: convert comma to semicolon
Hugh Dickins <hughd@google.com>:
khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
Subsystem: mm/vmalloc
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
mm/vunmap: add cond_resched() in vunmap_pmd_range
Subsystem: mm/misc
Leon Romanovsky <leonro@nvidia.com>:
mm/rodata_test.c: fix missing function declaration
Subsystem: romfs
Jann Horn <jannh@google.com>:
romfs: fix uninitialized memory leak in romfs_dev_read()
Subsystem: relay
Wei Yongjun <weiyongjun1@huawei.com>:
kernel/relay.c: fix memleak on destroy relay channel
Subsystem: uprobes
Hugh Dickins <hughd@google.com>:
uprobes: __replace_page() avoid BUG in munlock_vma_page()
Subsystem: squashfs
Phillip Lougher <phillip@squashfs.org.uk>:
squashfs: avoid bio_alloc() failure with 1Mbyte blocks
Subsystem: mm/cma
Doug Berger <opendmb@gmail.com>:
mm: include CMA pages in lowmem_reserve at boot
Subsystem: mm/pagealloc
Charan Teja Reddy <charante@codeaurora.org>:
mm, page_alloc: fix core hung in free_pcppages_bulk()
.mailmap | 1 +
fs/romfs/storage.c | 4 +---
fs/squashfs/block.c | 6 +++++-
kernel/events/uprobes.c | 2 +-
kernel/relay.c | 1 +
mm/hugetlb_cgroup.c | 4 ++--
mm/khugepaged.c | 2 +-
mm/page_alloc.c | 7 ++++++-
mm/rodata_test.c | 1 +
mm/vmalloc.c | 2 ++
10 files changed, 21 insertions(+), 9 deletions(-)
* incoming
@ 2020-08-15 0:29 Andrew Morton
From: Andrew Morton @ 2020-08-15 0:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
39 patches, based on b923f1247b72fc100b87792fd2129d026bb10e66.
Subsystems affected by this patch series:
mm/hotfixes
lz4
exec
mailmap
mm/thp
autofs
mm/madvise
sysctl
mm/kmemleak
mm/misc
lib
Subsystem: mm/hotfixes
Mike Rapoport <rppt@linux.ibm.com>:
asm-generic: pgalloc.h: use correct #ifdef to enable pud_alloc_one()
Baoquan He <bhe@redhat.com>:
Revert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone"
Subsystem: lz4
Nick Terrell <terrelln@fb.com>:
lz4: fix kernel decompression speed
Subsystem: exec
Kees Cook <keescook@chromium.org>:
Patch series "Fix S_ISDIR execve() errno":
exec: restore EACCES of S_ISDIR execve()
selftests/exec: add file type errno tests
Subsystem: mailmap
Greg Kurz <groug@kaod.org>:
mailmap: add entry for Greg Kurz
Subsystem: mm/thp
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "THP prep patches":
mm: store compound_nr as well as compound_order
mm: move page-flags include to top of file
mm: add thp_order
mm: add thp_size
mm: replace hpage_nr_pages with thp_nr_pages
mm: add thp_head
mm: introduce offset_in_thp
Subsystem: autofs
Randy Dunlap <rdunlap@infradead.org>:
fs: autofs: delete repeated words in comments
Subsystem: mm/madvise
Minchan Kim <minchan@kernel.org>:
Patch series "introduce memory hinting API for external process", v8:
mm/madvise: pass task and mm to do_madvise
pid: move pidfd_get_pid() to pid.c
mm/madvise: introduce process_madvise() syscall: an external memory hinting API
mm/madvise: check fatal signal pending of target process
Subsystem: sysctl
Xiaoming Ni <nixiaoming@huawei.com>:
all arch: remove system call sys_sysctl
Subsystem: mm/kmemleak
Qian Cai <cai@lca.pw>:
mm/kmemleak: silence KCSAN splats in checksum
Subsystem: mm/misc
Qian Cai <cai@lca.pw>:
mm/frontswap: mark various intentional data races
mm/page_io: mark various intentional data races
mm/swap_state: mark various intentional data races
Kirill A. Shutemov <kirill@shutemov.name>:
mm/filemap.c: fix a data race in filemap_fault()
Qian Cai <cai@lca.pw>:
mm/swapfile: fix and annotate various data races
mm/page_counter: fix various data races at memsw
mm/memcontrol: fix a data race in scan count
mm/list_lru: fix a data race in list_lru_count_one
mm/mempool: fix a data race in mempool_free()
mm/rmap: annotate a data race at tlb_flush_batched
mm/swap.c: annotate data races for lru_rotate_pvecs
mm: annotate a data race in page_zonenum()
Romain Naour <romain.naour@gmail.com>:
include/asm-generic/vmlinux.lds.h: align ro_after_init
Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>:
sh: clkfwk: remove r8/r16/r32
sh: use generic strncpy()
Subsystem: lib
Krzysztof Kozlowski <krzk@kernel.org>:
Patch series "iomap: Constify ioreadX() iomem argument", v3:
iomap: constify ioreadX() iomem argument (as in generic implementation)
rtl818x: constify ioreadX() iomem argument (as in generic implementation)
ntb: intel: constify ioreadX() iomem argument (as in generic implementation)
virtio: pci: constify ioreadX() iomem argument (as in generic implementation)
.mailmap | 1
arch/alpha/include/asm/core_apecs.h | 6
arch/alpha/include/asm/core_cia.h | 6
arch/alpha/include/asm/core_lca.h | 6
arch/alpha/include/asm/core_marvel.h | 4
arch/alpha/include/asm/core_mcpcia.h | 6
arch/alpha/include/asm/core_t2.h | 2
arch/alpha/include/asm/io.h | 12 -
arch/alpha/include/asm/io_trivial.h | 16 -
arch/alpha/include/asm/jensen.h | 2
arch/alpha/include/asm/machvec.h | 6
arch/alpha/kernel/core_marvel.c | 2
arch/alpha/kernel/io.c | 12 -
arch/alpha/kernel/syscalls/syscall.tbl | 3
arch/arm/configs/am200epdkit_defconfig | 1
arch/arm/tools/syscall.tbl | 3
arch/arm64/include/asm/unistd.h | 2
arch/arm64/include/asm/unistd32.h | 6
arch/ia64/kernel/syscalls/syscall.tbl | 3
arch/m68k/kernel/syscalls/syscall.tbl | 3
arch/microblaze/kernel/syscalls/syscall.tbl | 3
arch/mips/configs/cu1000-neo_defconfig | 1
arch/mips/kernel/syscalls/syscall_n32.tbl | 3
arch/mips/kernel/syscalls/syscall_n64.tbl | 3
arch/mips/kernel/syscalls/syscall_o32.tbl | 3
arch/parisc/include/asm/io.h | 4
arch/parisc/kernel/syscalls/syscall.tbl | 3
arch/parisc/lib/iomap.c | 72 +++---
arch/powerpc/kernel/iomap.c | 28 +-
arch/powerpc/kernel/syscalls/syscall.tbl | 3
arch/s390/kernel/syscalls/syscall.tbl | 3
arch/sh/configs/dreamcast_defconfig | 1
arch/sh/configs/espt_defconfig | 1
arch/sh/configs/hp6xx_defconfig | 1
arch/sh/configs/landisk_defconfig | 1
arch/sh/configs/lboxre2_defconfig | 1
arch/sh/configs/microdev_defconfig | 1
arch/sh/configs/migor_defconfig | 1
arch/sh/configs/r7780mp_defconfig | 1
arch/sh/configs/r7785rp_defconfig | 1
arch/sh/configs/rts7751r2d1_defconfig | 1
arch/sh/configs/rts7751r2dplus_defconfig | 1
arch/sh/configs/se7206_defconfig | 1
arch/sh/configs/se7343_defconfig | 1
arch/sh/configs/se7619_defconfig | 1
arch/sh/configs/se7705_defconfig | 1
arch/sh/configs/se7750_defconfig | 1
arch/sh/configs/se7751_defconfig | 1
arch/sh/configs/secureedge5410_defconfig | 1
arch/sh/configs/sh03_defconfig | 1
arch/sh/configs/sh7710voipgw_defconfig | 1
arch/sh/configs/sh7757lcr_defconfig | 1
arch/sh/configs/sh7763rdp_defconfig | 1
arch/sh/configs/shmin_defconfig | 1
arch/sh/configs/titan_defconfig | 1
arch/sh/include/asm/string_32.h | 26 --
arch/sh/kernel/iomap.c | 22 -
arch/sh/kernel/syscalls/syscall.tbl | 3
arch/sparc/kernel/syscalls/syscall.tbl | 3
arch/x86/entry/syscalls/syscall_32.tbl | 3
arch/x86/entry/syscalls/syscall_64.tbl | 4
arch/xtensa/kernel/syscalls/syscall.tbl | 3
drivers/mailbox/bcm-pdc-mailbox.c | 2
drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h | 6
drivers/ntb/hw/intel/ntb_hw_gen1.c | 2
drivers/ntb/hw/intel/ntb_hw_gen3.h | 2
drivers/ntb/hw/intel/ntb_hw_intel.h | 2
drivers/nvdimm/btt.c | 4
drivers/nvdimm/pmem.c | 6
drivers/sh/clk/cpg.c | 25 --
drivers/virtio/virtio_pci_modern.c | 6
fs/autofs/dev-ioctl.c | 4
fs/io_uring.c | 2
fs/namei.c | 4
include/asm-generic/iomap.h | 28 +-
include/asm-generic/pgalloc.h | 2
include/asm-generic/vmlinux.lds.h | 1
include/linux/compat.h | 5
include/linux/huge_mm.h | 58 ++++-
include/linux/io-64-nonatomic-hi-lo.h | 4
include/linux/io-64-nonatomic-lo-hi.h | 4
include/linux/memcontrol.h | 2
include/linux/mm.h | 16 -
include/linux/mm_inline.h | 6
include/linux/mm_types.h | 1
include/linux/pagemap.h | 6
include/linux/pid.h | 1
include/linux/syscalls.h | 4
include/linux/sysctl.h | 6
include/uapi/asm-generic/unistd.h | 4
kernel/Makefile | 2
kernel/exit.c | 17 -
kernel/pid.c | 17 +
kernel/sys_ni.c | 3
kernel/sysctl_binary.c | 171 --------------
lib/iomap.c | 30 +-
lib/lz4/lz4_compress.c | 4
lib/lz4/lz4_decompress.c | 18 -
lib/lz4/lz4defs.h | 10
lib/lz4/lz4hc_compress.c | 2
mm/compaction.c | 2
mm/filemap.c | 22 +
mm/frontswap.c | 8
mm/gup.c | 2
mm/internal.h | 4
mm/kmemleak.c | 2
mm/list_lru.c | 2
mm/madvise.c | 190 ++++++++++++++--
mm/memcontrol.c | 10
mm/memory.c | 4
mm/memory_hotplug.c | 7
mm/mempolicy.c | 2
mm/mempool.c | 2
mm/migrate.c | 18 -
mm/mlock.c | 9
mm/page_alloc.c | 5
mm/page_counter.c | 13 -
mm/page_io.c | 12 -
mm/page_vma_mapped.c | 6
mm/rmap.c | 10
mm/swap.c | 21 -
mm/swap_state.c | 10
mm/swapfile.c | 33 +-
mm/vmscan.c | 6
mm/vmstat.c | 12 -
mm/workingset.c | 6
tools/perf/arch/powerpc/entry/syscalls/syscall.tbl | 2
tools/perf/arch/s390/entry/syscalls/syscall.tbl | 2
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl | 2
tools/testing/selftests/exec/.gitignore | 1
tools/testing/selftests/exec/Makefile | 5
tools/testing/selftests/exec/non-regular.c | 196 +++++++++++++++++
132 files changed, 815 insertions(+), 614 deletions(-)
* incoming
@ 2020-08-12 1:29 Andrew Morton
From: Andrew Morton @ 2020-08-12 1:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
- Most of the rest of MM
- various other subsystems
165 patches, based on 00e4db51259a5f936fec1424b884f029479d3981.
Subsystems affected by this patch series:
mm/memcg
mm/hugetlb
mm/vmscan
mm/proc
mm/compaction
mm/mempolicy
mm/oom-kill
mm/hugetlbfs
mm/migration
mm/thp
mm/cma
mm/util
mm/memory-hotplug
mm/cleanups
mm/uaccess
alpha
misc
sparse
bitmap
lib
lz4
bitops
checkpatch
autofs
minix
nilfs
ufs
fat
signals
kmod
coredump
exec
kdump
rapidio
panic
kcov
kgdb
ipc
mm/migration
mm/gup
mm/pagemap
Subsystem: mm/memcg
Roman Gushchin <guro@fb.com>:
Patch series "mm: memcg accounting of percpu memory", v3:
percpu: return number of released bytes from pcpu_free_area()
mm: memcg/percpu: account percpu memory to memory cgroups
mm: memcg/percpu: per-memcg percpu memory statistics
mm: memcg: charge memcg percpu memory to the parent cgroup
kselftests: cgroup: add perpcu memory accounting test
Subsystem: mm/hugetlb
Muchun Song <songmuchun@bytedance.com>:
mm/hugetlb: add mempolicy check in the reservation routine
Subsystem: mm/vmscan
Joonsoo Kim <iamjoonsoo.kim@lge.com>:
Patch series "workingset protection/detection on the anonymous LRU list", v7:
mm/vmscan: make active/inactive ratio as 1:1 for anon lru
mm/vmscan: protect the workingset on anonymous LRU
mm/workingset: prepare the workingset detection infrastructure for anon LRU
mm/swapcache: support to handle the shadow entries
mm/swap: implement workingset detection for anonymous LRU
mm/vmscan: restore active/inactive ratio for anonymous LRU
Subsystem: mm/proc
Michal Koutný <mkoutny@suse.com>:
/proc/PID/smaps: consistent whitespace output format
Subsystem: mm/compaction
Nitin Gupta <nigupta@nvidia.com>:
mm: proactive compaction
mm: fix compile error due to COMPACTION_HPAGE_ORDER
mm: use unsigned types for fragmentation score
Alex Shi <alex.shi@linux.alibaba.com>:
mm/compaction: correct the comments of compact_defer_shift
Subsystem: mm/mempolicy
Krzysztof Kozlowski <krzk@kernel.org>:
mm: mempolicy: fix kerneldoc of numa_map_to_online_node()
Wenchao Hao <haowenchao22@gmail.com>:
mm/mempolicy.c: check parameters first in kernel_get_mempolicy
Yanfei Xu <yanfei.xu@windriver.com>:
include/linux/mempolicy.h: fix typo
Subsystem: mm/oom-kill
Yafang Shao <laoar.shao@gmail.com>:
mm, oom: make the calculation of oom badness more accurate
Michal Hocko <mhocko@suse.com>:
doc, mm: sync up oom_score_adj documentation
doc, mm: clarify /proc/<pid>/oom_score value range
Yafang Shao <laoar.shao@gmail.com>:
mm, oom: show process exiting information in __oom_kill_process()
Subsystem: mm/hugetlbfs
Mike Kravetz <mike.kravetz@oracle.com>:
hugetlbfs: prevent filesystem stacking of hugetlbfs
hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem
Subsystem: mm/migration
Ralph Campbell <rcampbell@nvidia.com>:
Patch series "mm/migrate: optimize migrate_vma_setup() for holes":
mm/migrate: optimize migrate_vma_setup() for holes
mm/migrate: add migrate-shared test for migrate_vma_*()
Subsystem: mm/thp
Yang Shi <yang.shi@linux.alibaba.com>:
mm: thp: remove debug_cow switch
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/vmstat: add events for THP migration without split
Subsystem: mm/cma
Jianqun Xu <jay.xu@rock-chips.com>:
mm/cma.c: fix NULL pointer dereference when cma could not be activated
Barry Song <song.bao.hua@hisilicon.com>:
Patch series "mm: fix the names of general cma and hugetlb cma", v2:
mm: cma: fix the name of CMA areas
mm: hugetlb: fix the name of hugetlb CMA
Mike Kravetz <mike.kravetz@oracle.com>:
cma: don't quit at first error when activating reserved areas
Subsystem: mm/util
Waiman Long <longman@redhat.com>:
include/linux/sched/mm.h: optimize current_gfp_context()
Krzysztof Kozlowski <krzk@kernel.org>:
mm: mmu_notifier: fix and extend kerneldoc
Subsystem: mm/memory-hotplug
Daniel Jordan <daniel.m.jordan@oracle.com>:
x86/mm: use max memory block size on bare metal
Jia He <justin.he@arm.com>:
mm/memory_hotplug: introduce default dummy memory_add_physaddr_to_nid()
mm/memory_hotplug: fix unpaired mem_hotplug_begin/done
Charan Teja Reddy <charante@codeaurora.org>:
mm, memory_hotplug: update pcp lists everytime onlining a memory block
Subsystem: mm/cleanups
Randy Dunlap <rdunlap@infradead.org>:
mm: drop duplicated words in <linux/pgtable.h>
mm: drop duplicated words in <linux/mm.h>
include/linux/highmem.h: fix duplicated words in a comment
include/linux/frontswap.h: drop duplicated word in a comment
include/linux/memcontrol.h: drop duplicate word and fix spello
Arvind Sankar <nivedita@alum.mit.edu>:
sh/mm: drop unused MAX_PHYSADDR_BITS
sparc: drop unused MAX_PHYSADDR_BITS
Randy Dunlap <rdunlap@infradead.org>:
mm/compaction.c: delete duplicated word
mm/filemap.c: delete duplicated word
mm/hmm.c: delete duplicated word
mm/hugetlb.c: delete duplicated words
mm/memcontrol.c: delete duplicated words
mm/memory.c: delete duplicated words
mm/migrate.c: delete duplicated word
mm/nommu.c: delete duplicated words
mm/page_alloc.c: delete or fix duplicated words
mm/shmem.c: delete duplicated word
mm/slab_common.c: delete duplicated word
mm/usercopy.c: delete duplicated word
mm/vmscan.c: delete or fix duplicated words
mm/zpool.c: delete duplicated word and fix grammar
mm/zsmalloc.c: fix duplicated words
Subsystem: mm/uaccess
Christoph Hellwig <hch@lst.de>:
Patch series "clean up address limit helpers", v2:
syscalls: use uaccess_kernel in addr_limit_user_check
nds32: use uaccess_kernel in show_regs
riscv: include <asm/pgtable.h> in <asm/uaccess.h>
uaccess: remove segment_eq
uaccess: add force_uaccess_{begin,end} helpers
exec: use force_uaccess_begin during exec and exit
Subsystem: alpha
Luc Van Oostenryck <luc.vanoostenryck@gmail.com>:
alpha: fix annotation of io{read,write}{16,32}be()
Subsystem: misc
Randy Dunlap <rdunlap@infradead.org>:
include/linux/compiler-clang.h: drop duplicated word in a comment
include/linux/exportfs.h: drop duplicated word in a comment
include/linux/async_tx.h: drop duplicated word in a comment
include/linux/xz.h: drop duplicated word
Christoph Hellwig <hch@lst.de>:
kernel: add a kernel_wait helper
Feng Tang <feng.tang@intel.com>:
./Makefile: add debug option to enable function aligned on 32 bytes
Arvind Sankar <nivedita@alum.mit.edu>:
kernel.h: remove duplicate include of asm/div64.h
"Alexander A. Klimov" <grandmaster@al2klimov.de>:
include/: replace HTTP links with HTTPS ones
Matthew Wilcox <willy@infradead.org>:
include/linux/poison.h: remove obsolete comment
Subsystem: sparse
Luc Van Oostenryck <luc.vanoostenryck@gmail.com>:
sparse: group the defines by functionality
Subsystem: bitmap
Stefano Brivio <sbrivio@redhat.com>:
Patch series "lib: Fix bitmap_cut() for overlaps, add test":
lib/bitmap.c: fix bitmap_cut() for partial overlapping case
lib/test_bitmap.c: add test for bitmap_cut()
Subsystem: lib
Luc Van Oostenryck <luc.vanoostenryck@gmail.com>:
lib/generic-radix-tree.c: remove unneeded __rcu
Geert Uytterhoeven <geert@linux-m68k.org>:
lib/test_bitops: do the full test during module init
Wei Yongjun <weiyongjun1@huawei.com>:
lib/test_lockup.c: make symbol 'test_works' static
Tiezhu Yang <yangtiezhu@loongson.cn>:
lib/Kconfig.debug: make TEST_LOCKUP depend on module
lib/test_lockup.c: fix return value of test_lockup_init()
"Alexander A. Klimov" <grandmaster@al2klimov.de>:
lib/: replace HTTP links with HTTPS ones
"Kars Mulder" <kerneldev@karsmulder.nl>:
kstrto*: correct documentation references to simple_strto*()
kstrto*: do not describe simple_strto*() as obsolete/replaced
Subsystem: lz4
Nick Terrell <terrelln@fb.com>:
lz4: fix kernel decompression speed
Subsystem: bitops
Rikard Falkeborn <rikard.falkeborn@gmail.com>:
lib/test_bits.c: add tests of GENMASK
Subsystem: checkpatch
Joe Perches <joe@perches.com>:
checkpatch: add test for possible misuse of IS_ENABLED() without CONFIG_
checkpatch: add --fix option for ASSIGN_IN_IF
Quentin Monnet <quentin@isovalent.com>:
checkpatch: fix CONST_STRUCT when const_structs.checkpatch is missing
Joe Perches <joe@perches.com>:
checkpatch: add test for repeated words
checkpatch: remove missing switch/case break test
Subsystem: autofs
Randy Dunlap <rdunlap@infradead.org>:
autofs: fix doubled word
Subsystem: minix
Eric Biggers <ebiggers@google.com>:
Patch series "fs/minix: fix syzbot bugs and set s_maxbytes":
fs/minix: check return value of sb_getblk()
fs/minix: don't allow getting deleted inodes
fs/minix: reject too-large maximum file size
fs/minix: set s_maxbytes correctly
fs/minix: fix block limit check for V1 filesystems
fs/minix: remove expected error message in block_to_path()
Subsystem: nilfs
Eric Biggers <ebiggers@google.com>:
Patch series "nilfs2 updates":
nilfs2: only call unlock_new_inode() if I_NEW
Joe Perches <joe@perches.com>:
nilfs2: convert __nilfs_msg to integrate the level and format
nilfs2: use a more common logging style
Subsystem: ufs
Colin Ian King <colin.king@canonical.com>:
fs/ufs: avoid potential u32 multiplication overflow
Subsystem: fat
Yubo Feng <fengyubo3@huawei.com>:
fatfs: switch write_lock to read_lock in fat_ioctl_get_attributes
"Alexander A. Klimov" <grandmaster@al2klimov.de>:
VFAT/FAT/MSDOS FILESYSTEM: replace HTTP links with HTTPS ones
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
fat: fix fat_ra_init() for data clusters == 0
Subsystem: signals
Helge Deller <deller@gmx.de>:
fs/signalfd.c: fix inconsistent return codes for signalfd4
Subsystem: kmod
Tiezhu Yang <yangtiezhu@loongson.cn>:
Patch series "kmod/umh: a few fixes":
selftests: kmod: use variable NAME in kmod_test_0001()
kmod: remove redundant "be an" in the comment
test_kmod: avoid potential double free in trigger_config_run_type()
Subsystem: coredump
Lepton Wu <ytht.net@gmail.com>:
coredump: add %f for executable filename
Subsystem: exec
Kees Cook <keescook@chromium.org>:
Patch series "Relocate execve() sanity checks", v2:
exec: change uselib(2) IS_SREG() failure to EACCES
exec: move S_ISREG() check earlier
exec: move path_noexec() check earlier
Subsystem: kdump
Vijay Balakrishna <vijayb@linux.microsoft.com>:
kdump: append kernel build-id string to VMCOREINFO
Subsystem: rapidio
"Gustavo A. R. Silva" <gustavoars@kernel.org>:
drivers/rapidio/devices/rio_mport_cdev.c: use struct_size() helper
drivers/rapidio/rio-scan.c: use struct_size() helper
rapidio/rio_mport_cdev: use array_size() helper in copy_{from,to}_user()
Subsystem: panic
Tiezhu Yang <yangtiezhu@loongson.cn>:
kernel/panic.c: make oops_may_print() return bool
lib/Kconfig.debug: fix typo in the help text of CONFIG_PANIC_TIMEOUT
Yue Hu <huyue2@yulong.com>:
panic: make print_oops_end_marker() static
Subsystem: kcov
Marco Elver <elver@google.com>:
kcov: unconditionally add -fno-stack-protector to compiler options
Wei Yongjun <weiyongjun1@huawei.com>:
kcov: make some symbols static
Subsystem: kgdb
Nick Desaulniers <ndesaulniers@google.com>:
scripts/gdb: fix python 3.8 SyntaxWarning
Subsystem: ipc
Alexey Dobriyan <adobriyan@gmail.com>:
ipc: uninline functions
Liao Pingfang <liao.pingfang@zte.com.cn>:
ipc/shm.c: remove the superfluous break
Subsystem: mm/migration
Joonsoo Kim <iamjoonsoo.kim@lge.com>:
Patch series "clean-up the migration target allocation functions", v5:
mm/page_isolation: prefer the node of the source page
mm/migrate: move migration helper from .h to .c
mm/hugetlb: unify migration callbacks
mm/migrate: clear __GFP_RECLAIM to make the migration callback consistent with regular THP allocations
mm/migrate: introduce a standard migration target allocation function
mm/mempolicy: use a standard migration target allocation callback
mm/page_alloc: remove a wrapper for alloc_migration_target()
Subsystem: mm/gup
Joonsoo Kim <iamjoonsoo.kim@lge.com>:
mm/gup: restrict CMA region by using allocation scope API
mm/hugetlb: make hugetlb migration callback CMA aware
mm/gup: use a standard migration target allocation callback
Subsystem: mm/pagemap
Peter Xu <peterx@redhat.com>:
Patch series "mm: Page fault accounting cleanups", v5:
mm: do page fault accounting in handle_mm_fault
mm/alpha: use general page fault accounting
mm/arc: use general page fault accounting
mm/arm: use general page fault accounting
mm/arm64: use general page fault accounting
mm/csky: use general page fault accounting
mm/hexagon: use general page fault accounting
mm/ia64: use general page fault accounting
mm/m68k: use general page fault accounting
mm/microblaze: use general page fault accounting
mm/mips: use general page fault accounting
mm/nds32: use general page fault accounting
mm/nios2: use general page fault accounting
mm/openrisc: use general page fault accounting
mm/parisc: use general page fault accounting
mm/powerpc: use general page fault accounting
mm/riscv: use general page fault accounting
mm/s390: use general page fault accounting
mm/sh: use general page fault accounting
mm/sparc32: use general page fault accounting
mm/sparc64: use general page fault accounting
mm/x86: use general page fault accounting
mm/xtensa: use general page fault accounting
mm: clean up the last pieces of page fault accountings
mm/gup: remove task_struct pointer for all gup code
Documentation/admin-guide/cgroup-v2.rst | 4
Documentation/admin-guide/sysctl/kernel.rst | 3
Documentation/admin-guide/sysctl/vm.rst | 15 +
Documentation/filesystems/proc.rst | 11 -
Documentation/vm/page_migration.rst | 27 +++
Makefile | 4
arch/alpha/include/asm/io.h | 8
arch/alpha/include/asm/uaccess.h | 2
arch/alpha/mm/fault.c | 10 -
arch/arc/include/asm/segment.h | 3
arch/arc/kernel/process.c | 2
arch/arc/mm/fault.c | 20 --
arch/arm/include/asm/uaccess.h | 4
arch/arm/kernel/signal.c | 2
arch/arm/mm/fault.c | 27 ---
arch/arm64/include/asm/uaccess.h | 2
arch/arm64/kernel/sdei.c | 2
arch/arm64/mm/fault.c | 31 ---
arch/arm64/mm/numa.c | 10 -
arch/csky/include/asm/segment.h | 2
arch/csky/mm/fault.c | 15 -
arch/h8300/include/asm/segment.h | 2
arch/hexagon/mm/vm_fault.c | 11 -
arch/ia64/include/asm/uaccess.h | 2
arch/ia64/mm/fault.c | 11 -
arch/ia64/mm/numa.c | 2
arch/m68k/include/asm/segment.h | 2
arch/m68k/include/asm/tlbflush.h | 6
arch/m68k/mm/fault.c | 16 -
arch/microblaze/include/asm/uaccess.h | 2
arch/microblaze/mm/fault.c | 11 -
arch/mips/include/asm/uaccess.h | 2
arch/mips/kernel/unaligned.c | 27 +--
arch/mips/mm/fault.c | 16 -
arch/nds32/include/asm/uaccess.h | 2
arch/nds32/kernel/process.c | 2
arch/nds32/mm/alignment.c | 7
arch/nds32/mm/fault.c | 21 --
arch/nios2/include/asm/uaccess.h | 2
arch/nios2/mm/fault.c | 16 -
arch/openrisc/include/asm/uaccess.h | 2
arch/openrisc/mm/fault.c | 11 -
arch/parisc/include/asm/uaccess.h | 2
arch/parisc/mm/fault.c | 10 -
arch/powerpc/include/asm/uaccess.h | 3
arch/powerpc/mm/copro_fault.c | 7
arch/powerpc/mm/fault.c | 13 -
arch/riscv/include/asm/uaccess.h | 6
arch/riscv/mm/fault.c | 18 --
arch/s390/include/asm/uaccess.h | 2
arch/s390/kvm/interrupt.c | 2
arch/s390/kvm/kvm-s390.c | 2
arch/s390/kvm/priv.c | 8
arch/s390/mm/fault.c | 18 --
arch/s390/mm/gmap.c | 4
arch/sh/include/asm/segment.h | 3
arch/sh/include/asm/sparsemem.h | 4
arch/sh/kernel/traps_32.c | 12 -
arch/sh/mm/fault.c | 13 -
arch/sh/mm/init.c | 9 -
arch/sparc/include/asm/sparsemem.h | 1
arch/sparc/include/asm/uaccess_32.h | 2
arch/sparc/include/asm/uaccess_64.h | 2
arch/sparc/mm/fault_32.c | 15 -
arch/sparc/mm/fault_64.c | 13 -
arch/um/kernel/trap.c | 6
arch/x86/include/asm/uaccess.h | 2
arch/x86/mm/fault.c | 19 --
arch/x86/mm/init_64.c | 9 +
arch/x86/mm/numa.c | 1
arch/xtensa/include/asm/uaccess.h | 2
arch/xtensa/mm/fault.c | 17 -
drivers/firmware/arm_sdei.c | 5
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 2
drivers/infiniband/core/umem_odp.c | 2
drivers/iommu/amd/iommu_v2.c | 2
drivers/iommu/intel/svm.c | 3
drivers/rapidio/devices/rio_mport_cdev.c | 7
drivers/rapidio/rio-scan.c | 8
drivers/vfio/vfio_iommu_type1.c | 4
fs/coredump.c | 17 +
fs/exec.c | 38 ++--
fs/fat/Kconfig | 2
fs/fat/fatent.c | 3
fs/fat/file.c | 4
fs/hugetlbfs/inode.c | 6
fs/minix/inode.c | 48 ++++-
fs/minix/itree_common.c | 8
fs/minix/itree_v1.c | 16 -
fs/minix/itree_v2.c | 15 -
fs/minix/minix.h | 1
fs/namei.c | 10 -
fs/nilfs2/alloc.c | 38 ++--
fs/nilfs2/btree.c | 42 ++--
fs/nilfs2/cpfile.c | 10 -
fs/nilfs2/dat.c | 14 -
fs/nilfs2/direct.c | 14 -
fs/nilfs2/gcinode.c | 2
fs/nilfs2/ifile.c | 4
fs/nilfs2/inode.c | 32 +--
fs/nilfs2/ioctl.c | 37 ++--
fs/nilfs2/mdt.c | 2
fs/nilfs2/namei.c | 6
fs/nilfs2/nilfs.h | 18 +-
fs/nilfs2/page.c | 11 -
fs/nilfs2/recovery.c | 32 +--
fs/nilfs2/segbuf.c | 2
fs/nilfs2/segment.c | 38 ++--
fs/nilfs2/sufile.c | 29 +--
fs/nilfs2/super.c | 73 ++++----
fs/nilfs2/sysfs.c | 29 +--
fs/nilfs2/the_nilfs.c | 85 ++++-----
fs/open.c | 6
fs/proc/base.c | 11 +
fs/proc/task_mmu.c | 4
fs/signalfd.c | 10 -
fs/ufs/super.c | 2
include/asm-generic/uaccess.h | 4
include/clocksource/timer-ti-dm.h | 2
include/linux/async_tx.h | 2
include/linux/btree.h | 2
include/linux/compaction.h | 6
include/linux/compiler-clang.h | 2
include/linux/compiler_types.h | 44 ++---
include/linux/crash_core.h | 6
include/linux/delay.h | 2
include/linux/dma/k3-psil.h | 2
include/linux/dma/k3-udma-glue.h | 2
include/linux/dma/ti-cppi5.h | 2
include/linux/exportfs.h | 2
include/linux/frontswap.h | 2
include/linux/fs.h | 10 +
include/linux/generic-radix-tree.h | 2
include/linux/highmem.h | 2
include/linux/huge_mm.h | 7
include/linux/hugetlb.h | 53 ++++--
include/linux/irqchip/irq-omap-intc.h | 2
include/linux/jhash.h | 2
include/linux/kernel.h | 12 -
include/linux/leds-ti-lmu-common.h | 2
include/linux/memcontrol.h | 12 +
include/linux/mempolicy.h | 18 +-
include/linux/migrate.h | 42 +---
include/linux/mm.h | 20 +-
include/linux/mmzone.h | 17 +
include/linux/oom.h | 4
include/linux/pgtable.h | 12 -
include/linux/platform_data/davinci-cpufreq.h | 2
include/linux/platform_data/davinci_asp.h | 2
include/linux/platform_data/elm.h | 2
include/linux/platform_data/gpio-davinci.h | 2
include/linux/platform_data/gpmc-omap.h | 2
include/linux/platform_data/mtd-davinci-aemif.h | 2
include/linux/platform_data/omap-twl4030.h | 2
include/linux/platform_data/uio_pruss.h | 2
include/linux/platform_data/usb-omap.h | 2
include/linux/poison.h | 4
include/linux/sched/mm.h | 8
include/linux/sched/task.h | 1
include/linux/soc/ti/k3-ringacc.h | 2
include/linux/soc/ti/knav_qmss.h | 2
include/linux/soc/ti/ti-msgmgr.h | 2
include/linux/swap.h | 25 ++
include/linux/syscalls.h | 2
include/linux/uaccess.h | 20 ++
include/linux/vm_event_item.h | 3
include/linux/wkup_m3_ipc.h | 2
include/linux/xxhash.h | 2
include/linux/xz.h | 4
include/linux/zlib.h | 2
include/soc/arc/aux.h | 2
include/trace/events/migrate.h | 17 +
include/uapi/linux/auto_dev-ioctl.h | 2
include/uapi/linux/elf.h | 2
include/uapi/linux/map_to_7segment.h | 2
include/uapi/linux/types.h | 2
include/uapi/linux/usb/ch9.h | 2
ipc/sem.c | 3
ipc/shm.c | 4
kernel/Makefile | 2
kernel/crash_core.c | 50 +++++
kernel/events/callchain.c | 5
kernel/events/core.c | 5
kernel/events/uprobes.c | 8
kernel/exit.c | 18 +-
kernel/futex.c | 2
kernel/kcov.c | 6
kernel/kmod.c | 5
kernel/kthread.c | 5
kernel/panic.c | 4
kernel/stacktrace.c | 5
kernel/sysctl.c | 11 +
kernel/umh.c | 29 ---
lib/Kconfig.debug | 27 ++-
lib/Makefile | 1
lib/bitmap.c | 4
lib/crc64.c | 2
lib/decompress_bunzip2.c | 2
lib/decompress_unlzma.c | 6
lib/kstrtox.c | 20 --
lib/lz4/lz4_compress.c | 4
lib/lz4/lz4_decompress.c | 18 +-
lib/lz4/lz4defs.h | 10 +
lib/lz4/lz4hc_compress.c | 2
lib/math/rational.c | 2
lib/rbtree.c | 2
lib/test_bitmap.c | 58 ++++++
lib/test_bitops.c | 18 +-
lib/test_bits.c | 75 ++++++++
lib/test_kmod.c | 2
lib/test_lockup.c | 6
lib/ts_bm.c | 2
lib/xxhash.c | 2
lib/xz/xz_crc32.c | 2
lib/xz/xz_dec_bcj.c | 2
lib/xz/xz_dec_lzma2.c | 2
lib/xz/xz_lzma2.h | 2
lib/xz/xz_stream.h | 2
mm/cma.c | 40 +---
mm/cma.h | 4
mm/compaction.c | 207 +++++++++++++++++++++--
mm/filemap.c | 2
mm/gup.c | 195 ++++++----------------
mm/hmm.c | 5
mm/huge_memory.c | 23 --
mm/hugetlb.c | 93 ++++------
mm/internal.h | 9 -
mm/khugepaged.c | 2
mm/ksm.c | 3
mm/maccess.c | 22 +-
mm/memcontrol.c | 42 +++-
mm/memory-failure.c | 7
mm/memory.c | 107 +++++++++---
mm/memory_hotplug.c | 30 ++-
mm/mempolicy.c | 49 +----
mm/migrate.c | 151 ++++++++++++++---
mm/mmu_notifier.c | 9 -
mm/nommu.c | 4
mm/oom_kill.c | 24 +-
mm/page_alloc.c | 14 +
mm/page_isolation.c | 21 --
mm/percpu-internal.h | 55 ++++++
mm/percpu-km.c | 5
mm/percpu-stats.c | 36 ++--
mm/percpu-vm.c | 5
mm/percpu.c | 208 +++++++++++++++++++++---
mm/process_vm_access.c | 2
mm/rmap.c | 2
mm/shmem.c | 5
mm/slab_common.c | 2
mm/swap.c | 13 -
mm/swap_state.c | 80 +++++++--
mm/swapfile.c | 4
mm/usercopy.c | 2
mm/userfaultfd.c | 2
mm/vmscan.c | 36 ++--
mm/vmstat.c | 32 +++
mm/workingset.c | 23 +-
mm/zpool.c | 8
mm/zsmalloc.c | 2
scripts/checkpatch.pl | 116 +++++++++----
scripts/gdb/linux/rbtree.py | 4
security/tomoyo/domain.c | 2
tools/testing/selftests/cgroup/test_kmem.c | 70 +++++++-
tools/testing/selftests/kmod/kmod.sh | 4
tools/testing/selftests/vm/hmm-tests.c | 35 ++++
virt/kvm/async_pf.c | 2
virt/kvm/kvm_main.c | 2
268 files changed, 2481 insertions(+), 1551 deletions(-)
* incoming
@ 2020-08-07 6:16 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-08-07 6:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
- A few MM hotfixes
- kthread, tools, scripts, ntfs and ocfs2
- Some of MM
163 patches, based on d6efb3ac3e6c19ab722b28bdb9252bae0b9676b6.
Subsystems affected by this patch series:
mm/pagemap
mm/hotfixes
mm/pagealloc
kthread
tools
scripts
ntfs
ocfs2
mm/slab-generic
mm/slab
mm/slub
mm/kcsan
mm/debug
mm/pagecache
mm/gup
mm/swap
mm/shmem
mm/memcg
mm/pagemap
mm/mremap
mm/mincore
mm/sparsemem
mm/vmalloc
mm/kasan
mm/pagealloc
mm/hugetlb
mm/vmscan
Subsystem: mm/pagemap
Yang Shi <yang.shi@linux.alibaba.com>:
mm/memory.c: avoid access flag update TLB flush for retried page fault
Subsystem: mm/hotfixes
Ralph Campbell <rcampbell@nvidia.com>:
mm/migrate: fix migrate_pgmap_owner w/o CONFIG_MMU_NOTIFIER
Subsystem: mm/pagealloc
David Hildenbrand <david@redhat.com>:
mm/shuffle: don't move pages between zones and don't read garbage memmaps
Subsystem: kthread
Peter Zijlstra <peterz@infradead.org>:
mm: fix kthread_use_mm() vs TLB invalidate
Ilias Stamatis <stamatis.iliass@gmail.com>:
kthread: remove incorrect comment in kthread_create_on_cpu()
Subsystem: tools
"Alexander A. Klimov" <grandmaster@al2klimov.de>:
tools/: replace HTTP links with HTTPS ones
Gaurav Singh <gaurav1086@gmail.com>:
tools/testing/selftests/cgroup/cgroup_util.c: cg_read_strcmp: fix null pointer dereference
Subsystem: scripts
Jialu Xu <xujialu@vimux.org>:
scripts/tags.sh: collect compiled source precisely
Nikolay Borisov <nborisov@suse.com>:
scripts/bloat-o-meter: Support comparing library archives
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
scripts/decode_stacktrace.sh: skip missing symbols
scripts/decode_stacktrace.sh: guess basepath if not specified
scripts/decode_stacktrace.sh: guess path to modules
scripts/decode_stacktrace.sh: guess path to vmlinux by release name
Joe Perches <joe@perches.com>:
const_structs.checkpatch: add regulator_ops
Colin Ian King <colin.king@canonical.com>:
scripts/spelling.txt: add more spellings to spelling.txt
Subsystem: ntfs
Luca Stefani <luca.stefani.ge1@gmail.com>:
ntfs: fix ntfs_test_inode and ntfs_init_locked_inode function type
Subsystem: ocfs2
Gang He <ghe@suse.com>:
ocfs2: fix remounting needed after setfacl command
Randy Dunlap <rdunlap@infradead.org>:
ocfs2: suballoc.h: delete a duplicated word
Junxiao Bi <junxiao.bi@oracle.com>:
ocfs2: change slot number type s16 to u16
"Alexander A. Klimov" <grandmaster@al2klimov.de>:
ocfs2: replace HTTP links with HTTPS ones
Pavel Machek <pavel@ucw.cz>:
ocfs2: fix unbalanced locking
Subsystem: mm/slab-generic
Waiman Long <longman@redhat.com>:
mm, treewide: rename kzfree() to kfree_sensitive()
William Kucharski <william.kucharski@oracle.com>:
mm: ksize() should silently accept a NULL pointer
Subsystem: mm/slab
Kees Cook <keescook@chromium.org>:
Patch series "mm: Expand CONFIG_SLAB_FREELIST_HARDENED to include SLAB":
mm/slab: expand CONFIG_SLAB_FREELIST_HARDENED to include SLAB
mm/slab: add naive detection of double free
Long Li <lonuxli.64@gmail.com>:
mm, slab: check GFP_SLAB_BUG_MASK before alloc_pages in kmalloc_order
Xiao Yang <yangx.jy@cn.fujitsu.com>:
mm/slab.c: update outdated kmem_list3 in a comment
Subsystem: mm/slub
Vlastimil Babka <vbabka@suse.cz>:
Patch series "slub_debug fixes and improvements":
mm, slub: extend slub_debug syntax for multiple blocks
mm, slub: make some slub_debug related attributes read-only
mm, slub: remove runtime allocation order changes
mm, slub: make remaining slub_debug related attributes read-only
mm, slub: make reclaim_account attribute read-only
mm, slub: introduce static key for slub_debug()
mm, slub: introduce kmem_cache_debug_flags()
mm, slub: extend checks guarded by slub_debug static key
mm, slab/slub: move and improve cache_from_obj()
mm, slab/slub: improve error reporting and overhead of cache_from_obj()
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
mm/slub.c: drop lockdep_assert_held() from put_map()
Subsystem: mm/kcsan
Marco Elver <elver@google.com>:
mm, kcsan: instrument SLAB/SLUB free with "ASSERT_EXCLUSIVE_ACCESS"
Subsystem: mm/debug
Anshuman Khandual <anshuman.khandual@arm.com>:
Patch series "mm/debug_vm_pgtable: Add some more tests", v5:
mm/debug_vm_pgtable: add tests validating arch helpers for core MM features
mm/debug_vm_pgtable: add tests validating advanced arch page table helpers
mm/debug_vm_pgtable: add debug prints for individual tests
Documentation/mm: add descriptions for arch page table helpers
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "Improvements for dump_page()", v2:
mm/debug: handle page->mapping better in dump_page
mm/debug: dump compound page information on a second line
mm/debug: print head flags in dump_page
mm/debug: switch dump_page to get_kernel_nofault
mm/debug: print the inode number in dump_page
mm/debug: print hashed address of struct page
John Hubbard <jhubbard@nvidia.com>:
mm, dump_page: do not crash with bad compound_mapcount()
Subsystem: mm/pagecache
Yang Shi <yang.shi@linux.alibaba.com>:
mm: filemap: clear idle flag for writes
mm: filemap: add missing FGP_ flags in kerneldoc comment for pagecache_get_page
Subsystem: mm/gup
Tang Yizhou <tangyizhou@huawei.com>:
mm/gup.c: fix the comment of return value for populate_vma_page_range()
Subsystem: mm/swap
Zhen Lei <thunder.leizhen@huawei.com>:
Patch series "clean up some functions in mm/swap_slots.c":
mm/swap_slots.c: simplify alloc_swap_slot_cache()
mm/swap_slots.c: simplify enable_swap_slots_cache()
mm/swap_slots.c: remove redundant check for swap_slot_cache_initialized
Krzysztof Kozlowski <krzk@kernel.org>:
mm: swap: fix kerneldoc of swap_vma_readahead()
Xianting Tian <xianting_tian@126.com>:
mm/page_io.c: use blk_io_schedule() for avoiding task hung in sync io
Subsystem: mm/shmem
Chris Down <chris@chrisdown.name>:
Patch series "tmpfs: inode: Reduce risk of inum overflow", v7:
tmpfs: per-superblock i_ino support
tmpfs: support 64-bit inums per-sb
Subsystem: mm/memcg
Roman Gushchin <guro@fb.com>:
mm: kmem: make memcg_kmem_enabled() irreversible
Patch series "The new cgroup slab memory controller", v7:
mm: memcg: factor out memcg- and lruvec-level changes out of __mod_lruvec_state()
mm: memcg: prepare for byte-sized vmstat items
mm: memcg: convert vmstat slab counters to bytes
mm: slub: implement SLUB version of obj_to_index()
Johannes Weiner <hannes@cmpxchg.org>:
mm: memcontrol: decouple reference counting from page accounting
Roman Gushchin <guro@fb.com>:
mm: memcg/slab: obj_cgroup API
mm: memcg/slab: allocate obj_cgroups for non-root slab pages
mm: memcg/slab: save obj_cgroup for non-root slab objects
mm: memcg/slab: charge individual slab objects instead of pages
mm: memcg/slab: deprecate memory.kmem.slabinfo
mm: memcg/slab: move memcg_kmem_bypass() to memcontrol.h
mm: memcg/slab: use a single set of kmem_caches for all accounted allocations
mm: memcg/slab: simplify memcg cache creation
mm: memcg/slab: remove memcg_kmem_get_cache()
mm: memcg/slab: deprecate slab_root_caches
mm: memcg/slab: remove redundant check in memcg_accumulate_slabinfo()
mm: memcg/slab: use a single set of kmem_caches for all allocations
kselftests: cgroup: add kernel memory accounting tests
tools/cgroup: add memcg_slabinfo.py tool
Shakeel Butt <shakeelb@google.com>:
mm: memcontrol: account kernel stack per node
Roman Gushchin <guro@fb.com>:
mm: memcg/slab: remove unused argument by charge_slab_page()
mm: slab: rename (un)charge_slab_page() to (un)account_slab_page()
mm: kmem: switch to static_branch_likely() in memcg_kmem_enabled()
mm: memcontrol: avoid workload stalls when lowering memory.high
Chris Down <chris@chrisdown.name>:
Patch series "mm, memcg: reclaim harder before high throttling", v2:
mm, memcg: reclaim more aggressively before high allocator throttling
mm, memcg: unify reclaim retry limits with page allocator
Yafang Shao <laoar.shao@gmail.com>:
Patch series "mm, memcg: memory.{low,min} reclaim fix & cleanup", v4:
mm, memcg: avoid stale protection values when cgroup is above protection
Chris Down <chris@chrisdown.name>:
mm, memcg: decouple e{low,min} state mutations from protection checks
Yafang Shao <laoar.shao@gmail.com>:
memcg, oom: check memcg margin for parallel oom
Johannes Weiner <hannes@cmpxchg.org>:
mm: memcontrol: restore proper dirty throttling when memory.high changes
mm: memcontrol: don't count limit-setting reclaim as memory pressure
Michal Koutný <mkoutny@suse.com>:
mm/page_counter.c: fix protection usage propagation
Subsystem: mm/pagemap
Ralph Campbell <rcampbell@nvidia.com>:
mm: remove redundant check non_swap_entry()
Alex Zhang <zhangalex@google.com>:
mm/memory.c: make remap_pfn_range() reject unaligned addr
Mike Rapoport <rppt@linux.ibm.com>:
Patch series "mm: cleanup usage of <asm/pgalloc.h>":
mm: remove unneeded includes of <asm/pgalloc.h>
opeinrisc: switch to generic version of pte allocation
xtensa: switch to generic version of pte allocation
asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()
asm-generic: pgalloc: provide generic pud_alloc_one() and pud_free_one()
asm-generic: pgalloc: provide generic pgd_free()
mm: move lib/ioremap.c to mm/
Joerg Roedel <jroedel@suse.de>:
mm: move p?d_alloc_track to separate header file
Zhen Lei <thunder.leizhen@huawei.com>:
mm/mmap: optimize a branch judgment in ksys_mmap_pgoff()
Feng Tang <feng.tang@intel.com>:
Patch series "make vm_committed_as_batch aware of vm overcommit policy", v6:
proc/meminfo: avoid open coded reading of vm_committed_as
mm/util.c: make vm_memory_committed() more accurate
percpu_counter: add percpu_counter_sync()
mm: adjust vm_committed_as_batch according to vm overcommit policy
Anshuman Khandual <anshuman.khandual@arm.com>:
Patch series "arm64: Enable vmemmap mapping from device memory", v4:
mm/sparsemem: enable vmem_altmap support in vmemmap_populate_basepages()
mm/sparsemem: enable vmem_altmap support in vmemmap_alloc_block_buf()
arm64/mm: enable vmem_altmap support for vmemmap mappings
Miaohe Lin <linmiaohe@huawei.com>:
mm: mmap: merge vma after call_mmap() if possible
Peter Collingbourne <pcc@google.com>:
mm: remove unnecessary wrapper function do_mmap_pgoff()
Subsystem: mm/mremap
Wei Yang <richard.weiyang@linux.alibaba.com>:
Patch series "mm/mremap: cleanup move_page_tables() a little", v5:
mm/mremap: it is sure to have enough space when extent meets requirement
mm/mremap: calculate extent in one place
mm/mremap: start addresses are properly aligned
Subsystem: mm/mincore
Ricardo Cañuelo <ricardo.canuelo@collabora.com>:
selftests: add mincore() tests
Subsystem: mm/sparsemem
Wei Yang <richard.weiyang@linux.alibaba.com>:
mm/sparse: never partially remove memmap for early section
mm/sparse: only sub-section aligned range would be populated
Mike Rapoport <rppt@linux.ibm.com>:
mm/sparse: cleanup the code surrounding memory_present()
Subsystem: mm/vmalloc
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
vmalloc: convert to XArray
"Uladzislau Rezki (Sony)" <urezki@gmail.com>:
mm/vmalloc: simplify merge_or_add_vmap_area()
mm/vmalloc: simplify augment_tree_propagate_check()
mm/vmalloc: switch to "propagate()" callback
mm/vmalloc: update the header about KVA rework
Mike Rapoport <rppt@linux.ibm.com>:
mm: vmalloc: remove redundant assignment in unmap_kernel_range_noflush()
"Uladzislau Rezki (Sony)" <urezki@gmail.com>:
mm/vmalloc.c: remove BUG() from the find_va_links()
Subsystem: mm/kasan
Marco Elver <elver@google.com>:
kasan: improve and simplify Kconfig.kasan
kasan: update required compiler versions in documentation
Walter Wu <walter-zh.wu@mediatek.com>:
Patch series "kasan: memorize and print call_rcu stack", v8:
rcu: kasan: record and print call_rcu() call stack
kasan: record and print the free track
kasan: add tests for call_rcu stack recording
kasan: update documentation for generic kasan
Vincenzo Frascino <vincenzo.frascino@arm.com>:
kasan: remove kasan_unpoison_stack_above_sp_to()
Walter Wu <walter-zh.wu@mediatek.com>:
lib/test_kasan.c: fix KASAN unit tests for tag-based KASAN
Andrey Konovalov <andreyknvl@google.com>:
Patch series "kasan: support stack instrumentation for tag-based mode", v2:
kasan: don't tag stacks allocated with pagealloc
efi: provide empty efi_enter_virtual_mode implementation
kasan, arm64: don't instrument functions that enable kasan
kasan: allow enabling stack tagging for tag-based mode
kasan: adjust kasan_stack_oob for tag-based mode
Subsystem: mm/pagealloc
Vlastimil Babka <vbabka@suse.cz>:
mm, page_alloc: use unlikely() in task_capc()
Jaewon Kim <jaewon31.kim@samsung.com>:
page_alloc: consider highatomic reserve in watermark fast
Charan Teja Reddy <charante@codeaurora.org>:
mm, page_alloc: skip ->waternark_boost for atomic order-0 allocations
David Hildenbrand <david@redhat.com>:
mm: remove vm_total_pages
mm/page_alloc: remove nr_free_pagecache_pages()
mm/memory_hotplug: document why shuffle_zone() is relevant
mm/shuffle: remove dynamic reconfiguration
Wei Yang <richard.weiyang@linux.alibaba.com>:
mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits
mm/page_alloc.c: extract the common part in pfn_to_bitidx()
mm/page_alloc.c: simplify pageblock bitmap access
mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask()
Qian Cai <cai@lca.pw>:
mm/page_alloc: silence a KASAN false positive
Wei Yang <richard.weiyang@linux.alibaba.com>:
mm/page_alloc: fallbacks at most has 3 elements
Muchun Song <songmuchun@bytedance.com>:
mm/page_alloc.c: skip setting nodemask when we are in interrupt
Joonsoo Kim <iamjoonsoo.kim@lge.com>:
mm/page_alloc: fix memalloc_nocma_{save/restore} APIs
Subsystem: mm/hugetlb
"Alexander A. Klimov" <grandmaster@al2klimov.de>:
mm: thp: replace HTTP links with HTTPS ones
Peter Xu <peterx@redhat.com>:
mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
Hugh Dickins <hughd@google.com>:
khugepaged: collapse_pte_mapped_thp() flush the right range
khugepaged: collapse_pte_mapped_thp() protect the pmd lock
khugepaged: retract_page_tables() remember to test exit
khugepaged: khugepaged_test_exit() check mmget_still_valid()
Subsystem: mm/vmscan
dylan-meiners <spacct.spacct@gmail.com>:
mm/vmscan.c: fix typo
Shakeel Butt <shakeelb@google.com>:
mm: vmscan: consistent update to pgrefill
Documentation/admin-guide/kernel-parameters.txt | 2
Documentation/dev-tools/kasan.rst | 10
Documentation/filesystems/dlmfs.rst | 2
Documentation/filesystems/ocfs2.rst | 2
Documentation/filesystems/tmpfs.rst | 18
Documentation/vm/arch_pgtable_helpers.rst | 258 +++++
Documentation/vm/memory-model.rst | 9
Documentation/vm/slub.rst | 51 -
arch/alpha/include/asm/pgalloc.h | 21
arch/alpha/include/asm/tlbflush.h | 1
arch/alpha/kernel/core_irongate.c | 1
arch/alpha/kernel/core_marvel.c | 1
arch/alpha/kernel/core_titan.c | 1
arch/alpha/kernel/machvec_impl.h | 2
arch/alpha/kernel/smp.c | 1
arch/alpha/mm/numa.c | 1
arch/arc/mm/fault.c | 1
arch/arc/mm/init.c | 1
arch/arm/include/asm/pgalloc.h | 12
arch/arm/include/asm/tlb.h | 1
arch/arm/kernel/machine_kexec.c | 1
arch/arm/kernel/smp.c | 1
arch/arm/kernel/suspend.c | 1
arch/arm/mach-omap2/omap-mpuss-lowpower.c | 1
arch/arm/mm/hugetlbpage.c | 1
arch/arm/mm/init.c | 9
arch/arm/mm/mmu.c | 1
arch/arm64/include/asm/pgalloc.h | 39
arch/arm64/kernel/setup.c | 2
arch/arm64/kernel/smp.c | 1
arch/arm64/mm/hugetlbpage.c | 1
arch/arm64/mm/init.c | 6
arch/arm64/mm/ioremap.c | 1
arch/arm64/mm/mmu.c | 63 -
arch/csky/include/asm/pgalloc.h | 7
arch/csky/kernel/smp.c | 1
arch/hexagon/include/asm/pgalloc.h | 7
arch/ia64/include/asm/pgalloc.h | 24
arch/ia64/include/asm/tlb.h | 1
arch/ia64/kernel/process.c | 1
arch/ia64/kernel/smp.c | 1
arch/ia64/kernel/smpboot.c | 1
arch/ia64/mm/contig.c | 1
arch/ia64/mm/discontig.c | 4
arch/ia64/mm/hugetlbpage.c | 1
arch/ia64/mm/tlb.c | 1
arch/m68k/include/asm/mmu_context.h | 2
arch/m68k/include/asm/sun3_pgalloc.h | 7
arch/m68k/kernel/dma.c | 2
arch/m68k/kernel/traps.c | 3
arch/m68k/mm/cache.c | 2
arch/m68k/mm/fault.c | 1
arch/m68k/mm/kmap.c | 2
arch/m68k/mm/mcfmmu.c | 1
arch/m68k/mm/memory.c | 1
arch/m68k/sun3x/dvma.c | 2
arch/microblaze/include/asm/pgalloc.h | 6
arch/microblaze/include/asm/tlbflush.h | 1
arch/microblaze/kernel/process.c | 1
arch/microblaze/kernel/signal.c | 1
arch/microblaze/mm/init.c | 3
arch/mips/include/asm/pgalloc.h | 19
arch/mips/kernel/setup.c | 8
arch/mips/loongson64/numa.c | 1
arch/mips/sgi-ip27/ip27-memory.c | 2
arch/mips/sgi-ip32/ip32-memory.c | 1
arch/nds32/mm/mm-nds32.c | 2
arch/nios2/include/asm/pgalloc.h | 7
arch/openrisc/include/asm/pgalloc.h | 33
arch/openrisc/include/asm/tlbflush.h | 1
arch/openrisc/kernel/or32_ksyms.c | 1
arch/parisc/include/asm/mmu_context.h | 1
arch/parisc/include/asm/pgalloc.h | 12
arch/parisc/kernel/cache.c | 1
arch/parisc/kernel/pci-dma.c | 1
arch/parisc/kernel/process.c | 1
arch/parisc/kernel/signal.c | 1
arch/parisc/kernel/smp.c | 1
arch/parisc/mm/hugetlbpage.c | 1
arch/parisc/mm/init.c | 5
arch/parisc/mm/ioremap.c | 2
arch/powerpc/include/asm/tlb.h | 1
arch/powerpc/mm/book3s64/hash_hugetlbpage.c | 1
arch/powerpc/mm/book3s64/hash_pgtable.c | 1
arch/powerpc/mm/book3s64/hash_tlb.c | 1
arch/powerpc/mm/book3s64/radix_hugetlbpage.c | 1
arch/powerpc/mm/init_32.c | 1
arch/powerpc/mm/init_64.c | 4
arch/powerpc/mm/kasan/8xx.c | 1
arch/powerpc/mm/kasan/book3s_32.c | 1
arch/powerpc/mm/mem.c | 3
arch/powerpc/mm/nohash/40x.c | 1
arch/powerpc/mm/nohash/8xx.c | 1
arch/powerpc/mm/nohash/fsl_booke.c | 1
arch/powerpc/mm/nohash/kaslr_booke.c | 1
arch/powerpc/mm/nohash/tlb.c | 1
arch/powerpc/mm/numa.c | 1
arch/powerpc/mm/pgtable.c | 1
arch/powerpc/mm/pgtable_64.c | 1
arch/powerpc/mm/ptdump/hashpagetable.c | 2
arch/powerpc/mm/ptdump/ptdump.c | 1
arch/powerpc/platforms/pseries/cmm.c | 1
arch/riscv/include/asm/pgalloc.h | 18
arch/riscv/mm/fault.c | 1
arch/riscv/mm/init.c | 3
arch/s390/crypto/prng.c | 4
arch/s390/include/asm/tlb.h | 1
arch/s390/include/asm/tlbflush.h | 1
arch/s390/kernel/machine_kexec.c | 1
arch/s390/kernel/ptrace.c | 1
arch/s390/kvm/diag.c | 1
arch/s390/kvm/priv.c | 1
arch/s390/kvm/pv.c | 1
arch/s390/mm/cmm.c | 1
arch/s390/mm/init.c | 1
arch/s390/mm/mmap.c | 1
arch/s390/mm/pgtable.c | 1
arch/sh/include/asm/pgalloc.h | 4
arch/sh/kernel/idle.c | 1
arch/sh/kernel/machine_kexec.c | 1
arch/sh/mm/cache-sh3.c | 1
arch/sh/mm/cache-sh7705.c | 1
arch/sh/mm/hugetlbpage.c | 1
arch/sh/mm/init.c | 7
arch/sh/mm/ioremap_fixed.c | 1
arch/sh/mm/numa.c | 3
arch/sh/mm/tlb-sh3.c | 1
arch/sparc/include/asm/ide.h | 1
arch/sparc/include/asm/tlb_64.h | 1
arch/sparc/kernel/leon_smp.c | 1
arch/sparc/kernel/process_32.c | 1
arch/sparc/kernel/signal_32.c | 1
arch/sparc/kernel/smp_32.c | 1
arch/sparc/kernel/smp_64.c | 1
arch/sparc/kernel/sun4m_irq.c | 1
arch/sparc/mm/highmem.c | 1
arch/sparc/mm/init_64.c | 1
arch/sparc/mm/io-unit.c | 1
arch/sparc/mm/iommu.c | 1
arch/sparc/mm/tlb.c | 1
arch/um/include/asm/pgalloc.h | 9
arch/um/include/asm/pgtable-3level.h | 3
arch/um/kernel/mem.c | 17
arch/x86/ia32/ia32_aout.c | 1
arch/x86/include/asm/mmu_context.h | 1
arch/x86/include/asm/pgalloc.h | 42
arch/x86/kernel/alternative.c | 1
arch/x86/kernel/apic/apic.c | 1
arch/x86/kernel/mpparse.c | 1
arch/x86/kernel/traps.c | 1
arch/x86/mm/fault.c | 1
arch/x86/mm/hugetlbpage.c | 1
arch/x86/mm/init_32.c | 2
arch/x86/mm/init_64.c | 12
arch/x86/mm/kaslr.c | 1
arch/x86/mm/pgtable_32.c | 1
arch/x86/mm/pti.c | 1
arch/x86/platform/uv/bios_uv.c | 1
arch/x86/power/hibernate.c | 2
arch/xtensa/include/asm/pgalloc.h | 46
arch/xtensa/kernel/xtensa_ksyms.c | 1
arch/xtensa/mm/cache.c | 1
arch/xtensa/mm/fault.c | 1
crypto/adiantum.c | 2
crypto/ahash.c | 4
crypto/api.c | 2
crypto/asymmetric_keys/verify_pefile.c | 4
crypto/deflate.c | 2
crypto/drbg.c | 10
crypto/ecc.c | 8
crypto/ecdh.c | 2
crypto/gcm.c | 2
crypto/gf128mul.c | 4
crypto/jitterentropy-kcapi.c | 2
crypto/rng.c | 2
crypto/rsa-pkcs1pad.c | 6
crypto/seqiv.c | 2
crypto/shash.c | 2
crypto/skcipher.c | 2
crypto/testmgr.c | 6
crypto/zstd.c | 2
drivers/base/node.c | 10
drivers/block/xen-blkback/common.h | 1
drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c | 2
drivers/crypto/allwinner/sun8i-ss/sun8i-ss-cipher.c | 2
drivers/crypto/amlogic/amlogic-gxl-cipher.c | 4
drivers/crypto/atmel-ecc.c | 2
drivers/crypto/caam/caampkc.c | 28
drivers/crypto/cavium/cpt/cptvf_main.c | 6
drivers/crypto/cavium/cpt/cptvf_reqmanager.c | 12
drivers/crypto/cavium/nitrox/nitrox_lib.c | 4
drivers/crypto/cavium/zip/zip_crypto.c | 6
drivers/crypto/ccp/ccp-crypto-rsa.c | 6
drivers/crypto/ccree/cc_aead.c | 4
drivers/crypto/ccree/cc_buffer_mgr.c | 4
drivers/crypto/ccree/cc_cipher.c | 6
drivers/crypto/ccree/cc_hash.c | 8
drivers/crypto/ccree/cc_request_mgr.c | 2
drivers/crypto/marvell/cesa/hash.c | 2
drivers/crypto/marvell/octeontx/otx_cptvf_main.c | 6
drivers/crypto/marvell/octeontx/otx_cptvf_reqmgr.h | 2
drivers/crypto/nx/nx.c | 4
drivers/crypto/virtio/virtio_crypto_algs.c | 12
drivers/crypto/virtio/virtio_crypto_core.c | 2
drivers/iommu/ipmmu-vmsa.c | 1
drivers/md/dm-crypt.c | 32
drivers/md/dm-integrity.c | 6
drivers/misc/ibmvmc.c | 6
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 2
drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 6
drivers/net/ppp/ppp_mppe.c | 6
drivers/net/wireguard/noise.c | 4
drivers/net/wireguard/peer.c | 2
drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 2
drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c | 6
drivers/net/wireless/intel/iwlwifi/pcie/tx.c | 6
drivers/net/wireless/intersil/orinoco/wext.c | 4
drivers/s390/crypto/ap_bus.h | 4
drivers/staging/ks7010/ks_hostif.c | 2
drivers/staging/rtl8723bs/core/rtw_security.c | 2
drivers/staging/wlan-ng/p80211netdev.c | 2
drivers/target/iscsi/iscsi_target_auth.c | 2
drivers/xen/balloon.c | 1
drivers/xen/privcmd.c | 1
fs/Kconfig | 21
fs/aio.c | 6
fs/binfmt_elf_fdpic.c | 1
fs/cifs/cifsencrypt.c | 2
fs/cifs/connect.c | 10
fs/cifs/dfs_cache.c | 2
fs/cifs/misc.c | 8
fs/crypto/inline_crypt.c | 5
fs/crypto/keyring.c | 6
fs/crypto/keysetup_v1.c | 4
fs/ecryptfs/keystore.c | 4
fs/ecryptfs/messaging.c | 2
fs/hugetlbfs/inode.c | 2
fs/ntfs/dir.c | 2
fs/ntfs/inode.c | 27
fs/ntfs/inode.h | 4
fs/ntfs/mft.c | 4
fs/ocfs2/Kconfig | 6
fs/ocfs2/acl.c | 2
fs/ocfs2/blockcheck.c | 2
fs/ocfs2/dlmglue.c | 8
fs/ocfs2/ocfs2.h | 4
fs/ocfs2/suballoc.c | 4
fs/ocfs2/suballoc.h | 2
fs/ocfs2/super.c | 4
fs/proc/meminfo.c | 10
include/asm-generic/pgalloc.h | 80 +
include/asm-generic/tlb.h | 1
include/crypto/aead.h | 2
include/crypto/akcipher.h | 2
include/crypto/gf128mul.h | 2
include/crypto/hash.h | 2
include/crypto/internal/acompress.h | 2
include/crypto/kpp.h | 2
include/crypto/skcipher.h | 2
include/linux/efi.h | 4
include/linux/fs.h | 17
include/linux/huge_mm.h | 2
include/linux/kasan.h | 4
include/linux/memcontrol.h | 209 +++-
include/linux/mm.h | 86 -
include/linux/mm_types.h | 5
include/linux/mman.h | 4
include/linux/mmu_notifier.h | 13
include/linux/mmzone.h | 54 -
include/linux/pageblock-flags.h | 30
include/linux/percpu_counter.h | 4
include/linux/sched/mm.h | 8
include/linux/shmem_fs.h | 3
include/linux/slab.h | 11
include/linux/slab_def.h | 9
include/linux/slub_def.h | 31
include/linux/swap.h | 2
include/linux/vmstat.h | 14
init/Kconfig | 9
init/main.c | 2
ipc/shm.c | 2
kernel/fork.c | 54 -
kernel/kthread.c | 8
kernel/power/snapshot.c | 2
kernel/rcu/tree.c | 2
kernel/scs.c | 2
kernel/sysctl.c | 2
lib/Kconfig.kasan | 39
lib/Makefile | 1
lib/ioremap.c | 287 -----
lib/mpi/mpiutil.c | 6
lib/percpu_counter.c | 19
lib/test_kasan.c | 87 +
mm/Kconfig | 6
mm/Makefile | 2
mm/debug.c | 103 +-
mm/debug_vm_pgtable.c | 666 +++++++++++++
mm/filemap.c | 9
mm/gup.c | 3
mm/huge_memory.c | 14
mm/hugetlb.c | 25
mm/ioremap.c | 289 +++++
mm/kasan/common.c | 41
mm/kasan/generic.c | 43
mm/kasan/generic_report.c | 1
mm/kasan/kasan.h | 25
mm/kasan/quarantine.c | 1
mm/kasan/report.c | 54 -
mm/kasan/tags.c | 37
mm/khugepaged.c | 75 -
mm/memcontrol.c | 832 ++++++++++-------
mm/memory.c | 15
mm/memory_hotplug.c | 11
mm/migrate.c | 6
mm/mm_init.c | 20
mm/mmap.c | 45
mm/mremap.c | 19
mm/nommu.c | 6
mm/oom_kill.c | 2
mm/page-writeback.c | 6
mm/page_alloc.c | 226 ++--
mm/page_counter.c | 6
mm/page_io.c | 2
mm/pgalloc-track.h | 51 +
mm/shmem.c | 133 ++
mm/shuffle.c | 46
mm/shuffle.h | 17
mm/slab.c | 129 +-
mm/slab.h | 755 ++++++---------
mm/slab_common.c | 829 ++--------------
mm/slob.c | 12
mm/slub.c | 680 ++++---------
mm/sparse-vmemmap.c | 62 -
mm/sparse.c | 31
mm/swap_slots.c | 45
mm/swap_state.c | 2
mm/util.c | 52 +
mm/vmalloc.c | 176 +--
mm/vmscan.c | 39
mm/vmstat.c | 38
mm/workingset.c | 6
net/atm/mpoa_caches.c | 4
net/bluetooth/ecdh_helper.c | 6
net/bluetooth/smp.c | 24
net/core/sock.c | 2
net/ipv4/tcp_fastopen.c | 2
net/mac80211/aead_api.c | 4
net/mac80211/aes_gmac.c | 2
net/mac80211/key.c | 2
net/mac802154/llsec.c | 20
net/sctp/auth.c | 2
net/sunrpc/auth_gss/gss_krb5_crypto.c | 4
net/sunrpc/auth_gss/gss_krb5_keys.c | 6
net/sunrpc/auth_gss/gss_krb5_mech.c | 2
net/tipc/crypto.c | 10
net/wireless/core.c | 2
net/wireless/ibss.c | 4
net/wireless/lib80211_crypt_tkip.c | 2
net/wireless/lib80211_crypt_wep.c | 2
net/wireless/nl80211.c | 24
net/wireless/sme.c | 6
net/wireless/util.c | 2
net/wireless/wext-sme.c | 2
scripts/Makefile.kasan | 3
scripts/bloat-o-meter | 2
scripts/coccinelle/free/devm_free.cocci | 4
scripts/coccinelle/free/ifnullfree.cocci | 4
scripts/coccinelle/free/kfree.cocci | 6
scripts/coccinelle/free/kfreeaddr.cocci | 2
scripts/const_structs.checkpatch | 1
scripts/decode_stacktrace.sh | 85 +
scripts/spelling.txt | 19
scripts/tags.sh | 18
security/apparmor/domain.c | 4
security/apparmor/include/file.h | 2
security/apparmor/policy.c | 24
security/apparmor/policy_ns.c | 6
security/apparmor/policy_unpack.c | 14
security/keys/big_key.c | 6
security/keys/dh.c | 14
security/keys/encrypted-keys/encrypted.c | 14
security/keys/trusted-keys/trusted_tpm1.c | 34
security/keys/user_defined.c | 6
tools/cgroup/memcg_slabinfo.py | 226 ++++
tools/include/linux/jhash.h | 2
tools/lib/rbtree.c | 2
tools/lib/traceevent/event-parse.h | 2
tools/testing/ktest/examples/README | 2
tools/testing/ktest/examples/crosstests.conf | 2
tools/testing/selftests/Makefile | 1
tools/testing/selftests/cgroup/.gitignore | 1
tools/testing/selftests/cgroup/Makefile | 2
tools/testing/selftests/cgroup/cgroup_util.c | 2
tools/testing/selftests/cgroup/test_kmem.c | 382 +++++++
tools/testing/selftests/mincore/.gitignore | 2
tools/testing/selftests/mincore/Makefile | 6
tools/testing/selftests/mincore/mincore_selftest.c | 361 +++++++
397 files changed, 5547 insertions(+), 4072 deletions(-)
* incoming
@ 2020-07-24 4:14 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-07-24 4:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
15 patches, based on f37e99aca03f63aa3f2bd13ceaf769455d12c4b0.
Subsystems affected by this patch series:
mm/pagemap
mm/shmem
mm/hotfixes
mm/memcg
mm/hugetlb
mailmap
squashfs
scripts
io-mapping
MAINTAINERS
gdb
Subsystem: mm/pagemap
Yang Shi <yang.shi@linux.alibaba.com>:
mm/memory.c: avoid access flag update TLB flush for retried page fault
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>:
mm/mmap.c: close race between munmap() and expand_upwards()/downwards()
Subsystem: mm/shmem
Chengguang Xu <cgxu519@mykernel.net>:
vfs/xattr: mm/shmem: kernfs: release simple xattr entry in a right way
Subsystem: mm/hotfixes
Tom Rix <trix@redhat.com>:
mm: initialize return of vm_insert_pages
Bhupesh Sharma <bhsharma@redhat.com>:
mm/memcontrol: fix OOPS inside mem_cgroup_get_nr_swap_pages()
Subsystem: mm/memcg
Hugh Dickins <hughd@google.com>:
mm/memcg: fix refcount error while moving and swapping
Muchun Song <songmuchun@bytedance.com>:
mm: memcg/slab: fix memory leak at non-root kmem_cache destroy
Subsystem: mm/hugetlb
Barry Song <song.bao.hua@hisilicon.com>:
mm/hugetlb: avoid hardcoding while checking if cma is enabled
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>:
khugepaged: fix null-pointer dereference due to race
Subsystem: mailmap
Mike Rapoport <rppt@linux.ibm.com>:
mailmap: add entry for Mike Rapoport
Subsystem: squashfs
Phillip Lougher <phillip@squashfs.org.uk>:
squashfs: fix length field overlap check in metadata reading
Subsystem: scripts
Pi-Hsun Shih <pihsun@chromium.org>:
scripts/decode_stacktrace: strip basepath from all paths
Subsystem: io-mapping
"Michael J. Ruhl" <michael.j.ruhl@intel.com>:
io-mapping: indicate mapping failure
Subsystem: MAINTAINERS
Andrey Konovalov <andreyknvl@google.com>:
MAINTAINERS: add KCOV section
Subsystem: gdb
Stefano Garzarella <sgarzare@redhat.com>:
scripts/gdb: fix lx-symbols 'gdb.error' while loading modules
.mailmap | 3 +++
MAINTAINERS | 11 +++++++++++
fs/squashfs/block.c | 2 +-
include/linux/io-mapping.h | 5 ++++-
include/linux/xattr.h | 3 ++-
mm/hugetlb.c | 15 ++++++++++-----
mm/khugepaged.c | 3 +++
mm/memcontrol.c | 13 ++++++++++---
mm/memory.c | 9 +++++++--
mm/mmap.c | 16 ++++++++++++++--
mm/shmem.c | 2 +-
mm/slab_common.c | 35 ++++++++++++++++++++++++++++-------
scripts/decode_stacktrace.sh | 4 ++--
scripts/gdb/linux/symbols.py | 2 +-
14 files changed, 97 insertions(+), 26 deletions(-)
* incoming
@ 2020-07-03 22:14 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-07-03 22:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
5 patches, based on cdd3bb54332f82295ed90cd0c09c78cd0c0ee822.
Subsystems affected by this patch series:
mm/hugetlb
samples
mm/cma
mm/vmalloc
mm/pagealloc
Subsystem: mm/hugetlb
Mike Kravetz <mike.kravetz@oracle.com>:
mm/hugetlb.c: fix pages per hugetlb calculation
Subsystem: samples
Kees Cook <keescook@chromium.org>:
samples/vfs: avoid warning in statx override
Subsystem: mm/cma
Barry Song <song.bao.hua@hisilicon.com>:
mm/cma.c: use exact_nid true to fix possible per-numa cma leak
Subsystem: mm/vmalloc
Christoph Hellwig <hch@lst.de>:
vmalloc: fix the owner argument for the new __vmalloc_node_range callers
Subsystem: mm/pagealloc
Joel Savitz <jsavitz@redhat.com>:
mm/page_alloc: fix documentation error
arch/arm64/kernel/probes/kprobes.c | 2 +-
arch/x86/hyperv/hv_init.c | 3 ++-
kernel/module.c | 2 +-
mm/cma.c | 4 ++--
mm/hugetlb.c | 2 +-
mm/page_alloc.c | 2 +-
samples/vfs/test-statx.c | 2 ++
7 files changed, 10 insertions(+), 7 deletions(-)
* Re: incoming
2020-06-26 17:39 ` incoming Konstantin Ryabitsev
@ 2020-06-26 17:40 ` Konstantin Ryabitsev
0 siblings, 0 replies; 786+ messages in thread
From: Konstantin Ryabitsev @ 2020-06-26 17:40 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, Linux-MM, mm-commits
On Fri, 26 Jun 2020 at 13:39, Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
> > Konstantin, maybe mm-commits could be on lore too and then they'd have
> > been caught that way?
>
> Yes, I already have a request from Kees for linux-mm addition, so that
> should show up in archives before long.
correction: mm-commits, that is
-K
* Re: incoming
2020-06-26 6:51 ` incoming Linus Torvalds
2020-06-26 7:31 ` incoming Linus Torvalds
@ 2020-06-26 17:39 ` Konstantin Ryabitsev
2020-06-26 17:40 ` incoming Konstantin Ryabitsev
1 sibling, 1 reply; 786+ messages in thread
From: Konstantin Ryabitsev @ 2020-06-26 17:39 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, Linux-MM, mm-commits
On Thu, Jun 25, 2020 at 11:51:06PM -0700, Linus Torvalds wrote:
> On Thu, Jun 25, 2020 at 8:28 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > 32 patches, based on 908f7d12d3ba51dfe0449b9723199b423f97ca9a.
>
> You didn't cc lkml, so now none of the nice 'b4' automation seems to
> work for this series..
>
> Yes, this cover-letter went to linux-mm (which is on lore), but the
> individual patches didn't.
>
> Konstantin, maybe mm-commits could be on lore too and then they'd have
> been caught that way?
Yes, I already have a request from Kees for linux-mm addition, so that
should show up in archives before long.
-K
* Re: incoming
2020-06-26 6:51 ` incoming Linus Torvalds
@ 2020-06-26 7:31 ` Linus Torvalds
2020-06-26 17:39 ` incoming Konstantin Ryabitsev
1 sibling, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2020-06-26 7:31 UTC (permalink / raw)
To: Andrew Morton, Konstantin Ryabitsev; +Cc: Linux-MM, mm-commits
On Thu, Jun 25, 2020 at 11:51 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> You didn't cc lkml, so now none of the nice 'b4' automation seems to
> work for this series..
Note that I've picked them up the old-fashioned way, so don't re-send them.
So more of a note for "please, next time..."
Linus
* Re: incoming
2020-06-26 3:28 incoming Andrew Morton
@ 2020-06-26 6:51 ` Linus Torvalds
2020-06-26 7:31 ` incoming Linus Torvalds
2020-06-26 17:39 ` incoming Konstantin Ryabitsev
0 siblings, 2 replies; 786+ messages in thread
From: Linus Torvalds @ 2020-06-26 6:51 UTC (permalink / raw)
To: Andrew Morton, Konstantin Ryabitsev; +Cc: Linux-MM, mm-commits
On Thu, Jun 25, 2020 at 8:28 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 32 patches, based on 908f7d12d3ba51dfe0449b9723199b423f97ca9a.
You didn't cc lkml, so now none of the nice 'b4' automation seems to
work for this series..
Yes, this cover-letter went to linux-mm (which is on lore), but the
individual patches didn't.
Konstantin, maybe mm-commits could be on lore too and then they'd have
been caught that way?
Linus
* incoming
@ 2020-06-26 3:28 Andrew Morton
2020-06-26 6:51 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2020-06-26 3:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
32 patches, based on 908f7d12d3ba51dfe0449b9723199b423f97ca9a.
Subsystems affected by this patch series:
hotfixes
mm/pagealloc
kexec
ocfs2
lib
misc
mm/slab
mm/slab
mm/slub
mm/swap
mm/pagemap
mm/vmalloc
mm/memcg
mm/gup
mm/thp
mm/vmscan
x86
mm/memory-hotplug
MAINTAINERS
Subsystem: hotfixes
Stafford Horne <shorne@gmail.com>:
openrisc: fix boot oops when DEBUG_VM is enabled
Michal Hocko <mhocko@suse.com>:
mm: do_swap_page(): fix up the error code
Subsystem: mm/pagealloc
Vlastimil Babka <vbabka@suse.cz>:
mm, compaction: make capture control handling safe wrt interrupts
Subsystem: kexec
Lianbo Jiang <lijiang@redhat.com>:
kexec: do not verify the signature without the lockdown or mandatory signature
Subsystem: ocfs2
Junxiao Bi <junxiao.bi@oracle.com>:
Patch series "ocfs2: fix nfsd over ocfs2 issues", v2:
ocfs2: avoid inode removal while nfsd is accessing it
ocfs2: load global_inode_alloc
ocfs2: fix panic on nfs server over ocfs2
ocfs2: fix value of OCFS2_INVALID_SLOT
Subsystem: lib
Randy Dunlap <rdunlap@infradead.org>:
lib: fix test_hmm.c reference after free
Subsystem: misc
Rikard Falkeborn <rikard.falkeborn@gmail.com>:
linux/bits.h: fix unsigned less than zero warnings
Subsystem: mm/slab
Waiman Long <longman@redhat.com>:
mm, slab: fix sign conversion problem in memcg_uncharge_slab()
Subsystem: mm/slab
Waiman Long <longman@redhat.com>:
mm/slab: use memzero_explicit() in kzfree()
Subsystem: mm/slub
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
slub: cure list_slab_objects() from double fix
Subsystem: mm/swap
Hugh Dickins <hughd@google.com>:
mm: fix swap cache node allocation mask
Subsystem: mm/pagemap
Arjun Roy <arjunroy@google.com>:
mm/memory.c: properly pte_offset_map_lock/unlock in vm_insert_pages()
Christophe Leroy <christophe.leroy@csgroup.eu>:
mm/debug_vm_pgtable: fix build failure with powerpc 8xx
Stephen Rothwell <sfr@canb.auug.org.au>:
make asm-generic/cacheflush.h more standalone
Nathan Chancellor <natechancellor@gmail.com>:
media: omap3isp: remove cacheflush.h
Subsystem: mm/vmalloc
Masanari Iida <standby24x7@gmail.com>:
mm/vmalloc.c: fix a warning while make xmldocs
Subsystem: mm/memcg
Johannes Weiner <hannes@cmpxchg.org>:
mm: memcontrol: handle div0 crash race condition in memory.low
Muchun Song <songmuchun@bytedance.com>:
mm/memcontrol.c: add missed css_put()
Chris Down <chris@chrisdown.name>:
mm/memcontrol.c: prevent missed memory.low load tears
Subsystem: mm/gup
Souptick Joarder <jrdr.linux@gmail.com>:
docs: mm/gup: minor documentation update
Subsystem: mm/thp
Yang Shi <yang.shi@linux.alibaba.com>:
doc: THP CoW fault no longer allocate THP
Subsystem: mm/vmscan
Johannes Weiner <hannes@cmpxchg.org>:
Patch series "fix for "mm: balance LRU lists based on relative thrashing" patchset":
mm: workingset: age nonresident information alongside anonymous pages
Joonsoo Kim <iamjoonsoo.kim@lge.com>:
mm/swap: fix for "mm: workingset: age nonresident information alongside anonymous pages"
mm/memory: fix IO cost for anonymous page
Subsystem: x86
Christoph Hellwig <hch@lst.de>:
Patch series "fix a hyperv W^X violation and remove vmalloc_exec":
x86/hyperv: allocate the hypercall page with only read and execute bits
arm64: use PAGE_KERNEL_ROX directly in alloc_insn_page
mm: remove vmalloc_exec
Subsystem: mm/memory-hotplug
Ben Widawsky <ben.widawsky@intel.com>:
mm/memory_hotplug.c: fix false softlockup during pfn range removal
Subsystem: MAINTAINERS
Luc Van Oostenryck <luc.vanoostenryck@gmail.com>:
MAINTAINERS: update info for sparse
Documentation/admin-guide/cgroup-v2.rst | 4 +-
Documentation/admin-guide/mm/transhuge.rst | 3 -
Documentation/core-api/pin_user_pages.rst | 2 -
MAINTAINERS | 4 +-
arch/arm64/kernel/probes/kprobes.c | 12 +------
arch/openrisc/kernel/dma.c | 5 +++
arch/x86/hyperv/hv_init.c | 4 +-
arch/x86/include/asm/pgtable_types.h | 2 +
drivers/media/platform/omap3isp/isp.c | 2 -
drivers/media/platform/omap3isp/ispvideo.c | 1
fs/ocfs2/dlmglue.c | 17 ++++++++++
fs/ocfs2/ocfs2.h | 1
fs/ocfs2/ocfs2_fs.h | 4 +-
fs/ocfs2/suballoc.c | 9 +++--
include/asm-generic/cacheflush.h | 5 +++
include/linux/bits.h | 3 +
include/linux/mmzone.h | 4 +-
include/linux/swap.h | 1
include/linux/vmalloc.h | 1
kernel/kexec_file.c | 36 ++++------------------
kernel/module.c | 4 +-
lib/test_hmm.c | 3 -
mm/compaction.c | 17 ++++++++--
mm/debug_vm_pgtable.c | 4 +-
mm/memcontrol.c | 18 ++++++++---
mm/memory.c | 33 +++++++++++++-------
mm/memory_hotplug.c | 13 ++++++--
mm/nommu.c | 17 ----------
mm/slab.h | 4 +-
mm/slab_common.c | 2 -
mm/slub.c | 19 ++---------
mm/swap.c | 3 -
mm/swap_state.c | 4 +-
mm/vmalloc.c | 21 -------------
mm/vmscan.c | 3 +
mm/workingset.c | 46 +++++++++++++++++------------
36 files changed, 168 insertions(+), 163 deletions(-)
* incoming
@ 2020-06-12 0:30 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-06-12 0:30 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
A few fixes and stragglers.
5 patches, based on 623f6dc593eaf98b91916836785278eddddaacf8.
Subsystems affected by this patch series:
mm/memory-failure
ocfs2
lib/lzo
misc
Subsystem: mm/memory-failure
Naoya Horiguchi <nao.horiguchi@gmail.com>:
Patch series "hwpoison: fixes signaling on memory error":
mm/memory-failure: prioritize prctl(PR_MCE_KILL) over vm.memory_failure_early_kill
mm/memory-failure: send SIGBUS(BUS_MCEERR_AR) only to current thread
Subsystem: ocfs2
Tom Seewald <tseewald@gmail.com>:
ocfs2: fix build failure when TCP/IP is disabled
Subsystem: lib/lzo
Dave Rodgman <dave.rodgman@arm.com>:
lib/lzo: fix ambiguous encoding bug in lzo-rle
Subsystem: misc
Christoph Hellwig <hch@lst.de>:
amdgpu: a NULL ->mm does not mean a thread is a kthread
Documentation/lzo.txt | 8 ++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 -
fs/ocfs2/Kconfig | 2 -
lib/lzo/lzo1x_compress.c | 13 ++++++++
mm/memory-failure.c | 43 +++++++++++++++++------------
5 files changed, 47 insertions(+), 21 deletions(-)
* incoming
@ 2020-06-11 1:40 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-06-11 1:40 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
- various hotfixes and minor things
- hch's use_mm/unuse_mm cleanups
- new syscall process_madvise(): perform madvise() on a process other
than self
25 patches, based on 6f630784cc0d92fb58ea326e2bc01aa056279ecb.
Subsystems affected by this patch series:
mm/hugetlb
scripts
kcov
lib
nilfs
checkpatch
lib
mm/debug
ocfs2
lib
misc
mm/madvise
Subsystem: mm/hugetlb
Dan Carpenter <dan.carpenter@oracle.com>:
khugepaged: selftests: fix timeout condition in wait_for_scan()
Subsystem: scripts
SeongJae Park <sjpark@amazon.de>:
scripts/spelling: add a few more typos
Subsystem: kcov
Andrey Konovalov <andreyknvl@google.com>:
kcov: check kcov_softirq in kcov_remote_stop()
Subsystem: lib
Joe Perches <joe@perches.com>:
lib/lz4/lz4_decompress.c: document deliberate use of `&'
Subsystem: nilfs
Ryusuke Konishi <konishi.ryusuke@gmail.com>:
nilfs2: fix null pointer dereference at nilfs_segctor_do_construct()
Subsystem: checkpatch
Tim Froidcoeur <tim.froidcoeur@tessares.net>:
checkpatch: correct check for kernel parameters doc
Subsystem: lib
Alexander Gordeev <agordeev@linux.ibm.com>:
lib: fix bitmap_parse() on 64-bit big endian archs
Subsystem: mm/debug
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
mm/debug_vm_pgtable: fix kernel crash by checking for THP support
Subsystem: ocfs2
Keyur Patel <iamkeyur96@gmail.com>:
ocfs2: fix spelling mistake and grammar
Ben Widawsky <ben.widawsky@intel.com>:
mm: add comments on pglist_data zones
Subsystem: lib
Wei Yang <richard.weiyang@gmail.com>:
lib: test get_count_order/long in test_bitops.c
Subsystem: misc
Walter Wu <walter-zh.wu@mediatek.com>:
stacktrace: cleanup inconsistent variable type
Christoph Hellwig <hch@lst.de>:
Patch series "improve use_mm / unuse_mm", v2:
kernel: move use_mm/unuse_mm to kthread.c
kernel: move use_mm/unuse_mm to kthread.c
kernel: better document the use_mm/unuse_mm API contract
kernel: set USER_DS in kthread_use_mm
Subsystem: mm/madvise
Minchan Kim <minchan@kernel.org>:
Patch series "introduce memory hinting API for external process", v7:
mm/madvise: pass task and mm to do_madvise
mm/madvise: introduce process_madvise() syscall: an external memory hinting API
mm/madvise: check fatal signal pending of target process
pid: move pidfd_get_pid() to pid.c
mm/madvise: support both pid and pidfd for process_madvise
Oleksandr Natalenko <oleksandr@redhat.com>:
mm/madvise: allow KSM hints for remote API
Minchan Kim <minchan@kernel.org>:
mm: support vector address ranges for process_madvise
mm: use only pidfd for process_madvise syscall
YueHaibing <yuehaibing@huawei.com>:
mm/madvise.c: remove duplicated include
arch/alpha/kernel/syscalls/syscall.tbl | 1
arch/arm/tools/syscall.tbl | 1
arch/arm64/include/asm/unistd.h | 2
arch/arm64/include/asm/unistd32.h | 4
arch/ia64/kernel/syscalls/syscall.tbl | 1
arch/m68k/kernel/syscalls/syscall.tbl | 1
arch/microblaze/kernel/syscalls/syscall.tbl | 1
arch/mips/kernel/syscalls/syscall_n32.tbl | 3
arch/mips/kernel/syscalls/syscall_n64.tbl | 1
arch/mips/kernel/syscalls/syscall_o32.tbl | 3
arch/parisc/kernel/syscalls/syscall.tbl | 3
arch/powerpc/kernel/syscalls/syscall.tbl | 3
arch/powerpc/platforms/powernv/vas-fault.c | 4
arch/s390/kernel/syscalls/syscall.tbl | 3
arch/sh/kernel/syscalls/syscall.tbl | 1
arch/sparc/kernel/syscalls/syscall.tbl | 3
arch/x86/entry/syscalls/syscall_32.tbl | 3
arch/x86/entry/syscalls/syscall_64.tbl | 5
arch/xtensa/kernel/syscalls/syscall.tbl | 1
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 5
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 1
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 1
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 2
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 2
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 2
drivers/gpu/drm/i915/gvt/kvmgt.c | 2
drivers/usb/gadget/function/f_fs.c | 10
drivers/usb/gadget/legacy/inode.c | 6
drivers/vfio/vfio_iommu_type1.c | 6
drivers/vhost/vhost.c | 8
fs/aio.c | 1
fs/io-wq.c | 15 -
fs/io_uring.c | 11
fs/nilfs2/segment.c | 2
fs/ocfs2/mmap.c | 2
include/linux/compat.h | 10
include/linux/kthread.h | 9
include/linux/mm.h | 3
include/linux/mmu_context.h | 5
include/linux/mmzone.h | 14
include/linux/pid.h | 1
include/linux/stacktrace.h | 2
include/linux/syscalls.h | 16 -
include/uapi/asm-generic/unistd.h | 7
kernel/exit.c | 17 -
kernel/kcov.c | 26 +
kernel/kthread.c | 95 +++++-
kernel/pid.c | 17 +
kernel/sys_ni.c | 2
lib/Kconfig.debug | 10
lib/bitmap.c | 9
lib/lz4/lz4_decompress.c | 3
lib/test_bitops.c | 53 +++
mm/Makefile | 2
mm/debug_vm_pgtable.c | 6
mm/madvise.c | 295 ++++++++++++++------
mm/mmu_context.c | 64 ----
mm/oom_kill.c | 6
mm/vmacache.c | 4
scripts/checkpatch.pl | 4
scripts/spelling.txt | 9
tools/testing/selftests/vm/khugepaged.c | 2
62 files changed, 526 insertions(+), 285 deletions(-)
* Re: incoming
2020-06-09 4:29 incoming Andrew Morton
@ 2020-06-09 16:58 ` Linus Torvalds
0 siblings, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2020-06-09 16:58 UTC (permalink / raw)
To: Andrew Morton; +Cc: mm-commits, Linux-MM
On Mon, Jun 8, 2020 at 9:29 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> 942 files changed, 4580 insertions(+), 5662 deletions(-)
If you use proper tools, add a "-M" to your diff script, so that you see
941 files changed, 2614 insertions(+), 3696 deletions(-)
because a big portion of the lines were due to a rename:
rename include/{asm-generic => linux}/pgtable.h (91%)
but at some earlier point you mentioned "diffstat", so I guess "proper
tools" isn't an option ;(
Linus
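As a rough illustration of the two diffstat summaries quoted in this message, the arithmetic can be checked in a few lines of Python. This is only a sketch: the totals are copied from the message, while the variable names and the reading that the whole difference comes from rename detection are assumptions made for illustration.

```python
# Diffstat totals quoted in the message, without and with rename detection (-M).
without_m = {"files": 942, "insertions": 4580, "deletions": 5662}
with_m = {"files": 941, "insertions": 2614, "deletions": 3696}

# Amount that drops out of each total once the rename is recognised.
saved = {key: without_m[key] - with_m[key] for key in without_m}

print(saved)
```

Without rename detection a renamed file is counted once as a deleted file and once as an added file, so its unchanged lines appear on both sides of the diffstat. The insertion and deletion deltas coming out equal, with one file fewer, is consistent with that.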
* incoming
@ 2020-06-09 4:29 Andrew Morton
2020-06-09 16:58 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2020-06-09 4:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
- a kernel-wide sweep of show_stack()
- pagetable cleanups
- abstract out accesses to mmap_sem - prep for mmap_sem scalability work
- hch's user access work
93 patches, based on abfbb29297c27e3f101f348dc9e467b0fe70f919:
Subsystems affected by this patch series:
debug
mm/pagemap
mm/maccess
mm/documentation
Subsystem: debug
Dmitry Safonov <dima@arista.com>:
Patch series "Add log level to show_stack()", v3:
kallsyms/printk: add loglvl to print_ip_sym()
alpha: add show_stack_loglvl()
arc: add show_stack_loglvl()
arm/asm: add loglvl to c_backtrace()
arm: add loglvl to unwind_backtrace()
arm: add loglvl to dump_backtrace()
arm: wire up dump_backtrace_{entry,stm}
arm: add show_stack_loglvl()
arm64: add loglvl to dump_backtrace()
arm64: add show_stack_loglvl()
c6x: add show_stack_loglvl()
csky: add show_stack_loglvl()
h8300: add show_stack_loglvl()
hexagon: add show_stack_loglvl()
ia64: pass log level as arg into ia64_do_show_stack()
ia64: add show_stack_loglvl()
m68k: add show_stack_loglvl()
microblaze: add loglvl to microblaze_unwind_inner()
microblaze: add loglvl to microblaze_unwind()
microblaze: add show_stack_loglvl()
mips: add show_stack_loglvl()
nds32: add show_stack_loglvl()
nios2: add show_stack_loglvl()
openrisc: add show_stack_loglvl()
parisc: add show_stack_loglvl()
powerpc: add show_stack_loglvl()
riscv: add show_stack_loglvl()
s390: add show_stack_loglvl()
sh: add loglvl to dump_mem()
sh: remove needless printk()
sh: add loglvl to printk_address()
sh: add loglvl to show_trace()
sh: add show_stack_loglvl()
sparc: add show_stack_loglvl()
um/sysrq: remove needless variable sp
um: add show_stack_loglvl()
unicore32: remove unused pmode argument in c_backtrace()
unicore32: add loglvl to c_backtrace()
unicore32: add show_stack_loglvl()
x86: add missing const qualifiers for log_lvl
x86: add show_stack_loglvl()
xtensa: add loglvl to show_trace()
xtensa: add show_stack_loglvl()
sysrq: use show_stack_loglvl()
x86/amd_gart: print stacktrace for a leak with KERN_ERR
power: use show_stack_loglvl()
kdb: don't play with console_loglevel
sched: print stack trace with KERN_INFO
kernel: use show_stack_loglvl()
kernel: rename show_stack_loglvl() => show_stack()
Subsystem: mm/pagemap
Mike Rapoport <rppt@linux.ibm.com>:
Patch series "mm: consolidate definitions of page table accessors", v2:
mm: don't include asm/pgtable.h if linux/mm.h is already included
mm: introduce include/linux/pgtable.h
mm: reorder includes after introduction of linux/pgtable.h
csky: replace definitions of __pXd_offset() with pXd_index()
m68k/mm/motorola: move comment about page table allocation funcitons
m68k/mm: move {cache,nocahe}_page() definitions close to their user
x86/mm: simplify init_trampoline() and surrounding logic
mm: pgtable: add shortcuts for accessing kernel PMD and PTE
mm: consolidate pte_index() and pte_offset_*() definitions
Michel Lespinasse <walken@google.com>:
mmap locking API: initial implementation as rwsem wrappers
MMU notifier: use the new mmap locking API
DMA reservations: use the new mmap locking API
mmap locking API: use coccinelle to convert mmap_sem rwsem call sites
mmap locking API: convert mmap_sem call sites missed by coccinelle
mmap locking API: convert nested write lock sites
mmap locking API: add mmap_read_trylock_non_owner()
mmap locking API: add MMAP_LOCK_INITIALIZER
mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
mmap locking API: rename mmap_sem to mmap_lock
mmap locking API: convert mmap_sem API comments
mmap locking API: convert mmap_sem comments
Subsystem: mm/maccess
Christoph Hellwig <hch@lst.de>:
Patch series "clean up and streamline probe_kernel_* and friends", v4:
maccess: unexport probe_kernel_write()
maccess: remove various unused weak aliases
maccess: remove duplicate kerneldoc comments
maccess: clarify kerneldoc comments
maccess: update the top of file comment
maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault
maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault
maccess: rename strnlen_unsafe_user to strnlen_user_nofault
maccess: remove probe_read_common and probe_write_common
maccess: unify the probe kernel arch hooks
bpf: factor out a bpf_trace_copy_string helper
bpf: handle the compat string in bpf_trace_copy_string better
Andrew Morton <akpm@linux-foundation.org>:
bpf:bpf_seq_printf(): handle potentially unsafe format string better
Christoph Hellwig <hch@lst.de>:
bpf: rework the compat kernel probe handling
tracing/kprobes: handle mixed kernel/userspace probes better
maccess: remove strncpy_from_unsafe
maccess: always use strict semantics for probe_kernel_read
maccess: move user access routines together
maccess: allow architectures to provide kernel probing directly
x86: use non-set_fs based maccess routines
maccess: return -ERANGE when probe_kernel_read() fails
Subsystem: mm/documentation
Luis Chamberlain <mcgrof@kernel.org>:
include/linux/cache.h: expand documentation over __read_mostly
Documentation/admin-guide/mm/numa_memory_policy.rst | 10
Documentation/admin-guide/mm/userfaultfd.rst | 2
Documentation/filesystems/locking.rst | 2
Documentation/vm/hmm.rst | 6
Documentation/vm/transhuge.rst | 4
arch/alpha/boot/bootp.c | 1
arch/alpha/boot/bootpz.c | 1
arch/alpha/boot/main.c | 1
arch/alpha/include/asm/io.h | 1
arch/alpha/include/asm/pgtable.h | 16
arch/alpha/kernel/process.c | 1
arch/alpha/kernel/proto.h | 4
arch/alpha/kernel/ptrace.c | 1
arch/alpha/kernel/setup.c | 1
arch/alpha/kernel/smp.c | 1
arch/alpha/kernel/sys_alcor.c | 1
arch/alpha/kernel/sys_cabriolet.c | 1
arch/alpha/kernel/sys_dp264.c | 1
arch/alpha/kernel/sys_eb64p.c | 1
arch/alpha/kernel/sys_eiger.c | 1
arch/alpha/kernel/sys_jensen.c | 1
arch/alpha/kernel/sys_marvel.c | 1
arch/alpha/kernel/sys_miata.c | 1
arch/alpha/kernel/sys_mikasa.c | 1
arch/alpha/kernel/sys_nautilus.c | 1
arch/alpha/kernel/sys_noritake.c | 1
arch/alpha/kernel/sys_rawhide.c | 1
arch/alpha/kernel/sys_ruffian.c | 1
arch/alpha/kernel/sys_rx164.c | 1
arch/alpha/kernel/sys_sable.c | 1
arch/alpha/kernel/sys_sio.c | 1
arch/alpha/kernel/sys_sx164.c | 1
arch/alpha/kernel/sys_takara.c | 1
arch/alpha/kernel/sys_titan.c | 1
arch/alpha/kernel/sys_wildfire.c | 1
arch/alpha/kernel/traps.c | 40
arch/alpha/mm/fault.c | 12
arch/alpha/mm/init.c | 1
arch/arc/include/asm/bug.h | 3
arch/arc/include/asm/pgtable.h | 24
arch/arc/kernel/process.c | 4
arch/arc/kernel/stacktrace.c | 29
arch/arc/kernel/troubleshoot.c | 6
arch/arc/mm/fault.c | 6
arch/arc/mm/highmem.c | 14
arch/arc/mm/tlbex.S | 4
arch/arm/include/asm/bug.h | 3
arch/arm/include/asm/efi.h | 3
arch/arm/include/asm/fixmap.h | 4
arch/arm/include/asm/idmap.h | 2
arch/arm/include/asm/pgtable-2level.h | 1
arch/arm/include/asm/pgtable-3level.h | 7
arch/arm/include/asm/pgtable-nommu.h | 3
arch/arm/include/asm/pgtable.h | 25
arch/arm/include/asm/traps.h | 3
arch/arm/include/asm/unwind.h | 3
arch/arm/kernel/head.S | 4
arch/arm/kernel/machine_kexec.c | 1
arch/arm/kernel/module.c | 1
arch/arm/kernel/process.c | 4
arch/arm/kernel/ptrace.c | 1
arch/arm/kernel/smp.c | 1
arch/arm/kernel/suspend.c | 4
arch/arm/kernel/swp_emulate.c | 4
arch/arm/kernel/traps.c | 61
arch/arm/kernel/unwind.c | 7
arch/arm/kernel/vdso.c | 2
arch/arm/kernel/vmlinux.lds.S | 4
arch/arm/lib/backtrace-clang.S | 9
arch/arm/lib/backtrace.S | 14
arch/arm/lib/uaccess_with_memcpy.c | 16
arch/arm/mach-ebsa110/core.c | 1
arch/arm/mach-footbridge/common.c | 1
arch/arm/mach-imx/mm-imx21.c | 1
arch/arm/mach-imx/mm-imx27.c | 1
arch/arm/mach-imx/mm-imx3.c | 1
arch/arm/mach-integrator/core.c | 4
arch/arm/mach-iop32x/i2c.c | 1
arch/arm/mach-iop32x/iq31244.c | 1
arch/arm/mach-iop32x/iq80321.c | 1
arch/arm/mach-iop32x/n2100.c | 1
arch/arm/mach-ixp4xx/common.c | 1
arch/arm/mach-keystone/platsmp.c | 4
arch/arm/mach-sa1100/assabet.c | 3
arch/arm/mach-sa1100/hackkit.c | 4
arch/arm/mach-tegra/iomap.h | 2
arch/arm/mach-zynq/common.c | 4
arch/arm/mm/copypage-v4mc.c | 1
arch/arm/mm/copypage-v6.c | 1
arch/arm/mm/copypage-xscale.c | 1
arch/arm/mm/dump.c | 1
arch/arm/mm/fault-armv.c | 1
arch/arm/mm/fault.c | 9
arch/arm/mm/highmem.c | 4
arch/arm/mm/idmap.c | 4
arch/arm/mm/ioremap.c | 31
arch/arm/mm/mm.h | 8
arch/arm/mm/mmu.c | 7
arch/arm/mm/pageattr.c | 1
arch/arm/mm/proc-arm1020.S | 4
arch/arm/mm/proc-arm1020e.S | 4
arch/arm/mm/proc-arm1022.S | 4
arch/arm/mm/proc-arm1026.S | 4
arch/arm/mm/proc-arm720.S | 4
arch/arm/mm/proc-arm740.S | 4
arch/arm/mm/proc-arm7tdmi.S | 4
arch/arm/mm/proc-arm920.S | 4
arch/arm/mm/proc-arm922.S | 4
arch/arm/mm/proc-arm925.S | 4
arch/arm/mm/proc-arm926.S | 4
arch/arm/mm/proc-arm940.S | 4
arch/arm/mm/proc-arm946.S | 4
arch/arm/mm/proc-arm9tdmi.S | 4
arch/arm/mm/proc-fa526.S | 4
arch/arm/mm/proc-feroceon.S | 4
arch/arm/mm/proc-mohawk.S | 4
arch/arm/mm/proc-sa110.S | 4
arch/arm/mm/proc-sa1100.S | 4
arch/arm/mm/proc-v6.S | 4
arch/arm/mm/proc-v7.S | 4
arch/arm/mm/proc-xsc3.S | 4
arch/arm/mm/proc-xscale.S | 4
arch/arm/mm/pv-fixup-asm.S | 4
arch/arm64/include/asm/io.h | 4
arch/arm64/include/asm/kernel-pgtable.h | 2
arch/arm64/include/asm/kvm_mmu.h | 4
arch/arm64/include/asm/mmu_context.h | 4
arch/arm64/include/asm/pgtable.h | 40
arch/arm64/include/asm/stacktrace.h | 3
arch/arm64/include/asm/stage2_pgtable.h | 2
arch/arm64/include/asm/vmap_stack.h | 4
arch/arm64/kernel/acpi.c | 4
arch/arm64/kernel/head.S | 4
arch/arm64/kernel/hibernate.c | 5
arch/arm64/kernel/kaslr.c | 4
arch/arm64/kernel/process.c | 2
arch/arm64/kernel/ptrace.c | 1
arch/arm64/kernel/smp.c | 1
arch/arm64/kernel/suspend.c | 4
arch/arm64/kernel/traps.c | 37
arch/arm64/kernel/vdso.c | 8
arch/arm64/kernel/vmlinux.lds.S | 3
arch/arm64/kvm/mmu.c | 14
arch/arm64/mm/dump.c | 1
arch/arm64/mm/fault.c | 9
arch/arm64/mm/kasan_init.c | 3
arch/arm64/mm/mmu.c | 8
arch/arm64/mm/pageattr.c | 1
arch/arm64/mm/proc.S | 4
arch/c6x/include/asm/pgtable.h | 3
arch/c6x/kernel/traps.c | 28
arch/csky/include/asm/io.h | 2
arch/csky/include/asm/pgtable.h | 37
arch/csky/kernel/module.c | 1
arch/csky/kernel/ptrace.c | 5
arch/csky/kernel/stacktrace.c | 20
arch/csky/kernel/vdso.c | 4
arch/csky/mm/fault.c | 10
arch/csky/mm/highmem.c | 2
arch/csky/mm/init.c | 7
arch/csky/mm/tlb.c | 1
arch/h8300/include/asm/pgtable.h | 1
arch/h8300/kernel/process.c | 1
arch/h8300/kernel/setup.c | 1
arch/h8300/kernel/signal.c | 1
arch/h8300/kernel/traps.c | 26
arch/h8300/mm/fault.c | 1
arch/h8300/mm/init.c | 1
arch/h8300/mm/memory.c | 1
arch/hexagon/include/asm/fixmap.h | 4
arch/hexagon/include/asm/pgtable.h | 55
arch/hexagon/kernel/traps.c | 39
arch/hexagon/kernel/vdso.c | 4
arch/hexagon/mm/uaccess.c | 2
arch/hexagon/mm/vm_fault.c | 9
arch/ia64/include/asm/pgtable.h | 34
arch/ia64/include/asm/ptrace.h | 1
arch/ia64/include/asm/uaccess.h | 2
arch/ia64/kernel/efi.c | 1
arch/ia64/kernel/entry.S | 4
arch/ia64/kernel/head.S | 5
arch/ia64/kernel/irq_ia64.c | 4
arch/ia64/kernel/ivt.S | 4
arch/ia64/kernel/kprobes.c | 4
arch/ia64/kernel/mca.c | 2
arch/ia64/kernel/mca_asm.S | 4
arch/ia64/kernel/perfmon.c | 8
arch/ia64/kernel/process.c | 37
arch/ia64/kernel/ptrace.c | 1
arch/ia64/kernel/relocate_kernel.S | 6
arch/ia64/kernel/setup.c | 4
arch/ia64/kernel/smp.c | 1
arch/ia64/kernel/smpboot.c | 1
arch/ia64/kernel/uncached.c | 4
arch/ia64/kernel/vmlinux.lds.S | 4
arch/ia64/mm/contig.c | 1
arch/ia64/mm/fault.c | 17
arch/ia64/mm/init.c | 12
arch/m68k/68000/m68EZ328.c | 2
arch/m68k/68000/m68VZ328.c | 4
arch/m68k/68000/timers.c | 1
arch/m68k/amiga/config.c | 1
arch/m68k/apollo/config.c | 1
arch/m68k/atari/atasound.c | 1
arch/m68k/atari/stram.c | 1
arch/m68k/bvme6000/config.c | 1
arch/m68k/include/asm/mcf_pgtable.h | 63
arch/m68k/include/asm/motorola_pgalloc.h | 8
arch/m68k/include/asm/motorola_pgtable.h | 84 -
arch/m68k/include/asm/pgtable_mm.h | 1
arch/m68k/include/asm/pgtable_no.h | 2
arch/m68k/include/asm/sun3_pgtable.h | 24
arch/m68k/include/asm/sun3xflop.h | 4
arch/m68k/kernel/head.S | 4
arch/m68k/kernel/process.c | 1
arch/m68k/kernel/ptrace.c | 1
arch/m68k/kernel/setup_no.c | 1
arch/m68k/kernel/signal.c | 1
arch/m68k/kernel/sys_m68k.c | 14
arch/m68k/kernel/traps.c | 27
arch/m68k/kernel/uboot.c | 1
arch/m68k/mac/config.c | 1
arch/m68k/mm/fault.c | 10
arch/m68k/mm/init.c | 2
arch/m68k/mm/mcfmmu.c | 1
arch/m68k/mm/motorola.c | 65
arch/m68k/mm/sun3kmap.c | 1
arch/m68k/mm/sun3mmu.c | 1
arch/m68k/mvme147/config.c | 1
arch/m68k/mvme16x/config.c | 1
arch/m68k/q40/config.c | 1
arch/m68k/sun3/config.c | 1
arch/m68k/sun3/dvma.c | 1
arch/m68k/sun3/mmu_emu.c | 1
arch/m68k/sun3/sun3dvma.c | 1
arch/m68k/sun3x/dvma.c | 1
arch/m68k/sun3x/prom.c | 1
arch/microblaze/include/asm/pgalloc.h | 4
arch/microblaze/include/asm/pgtable.h | 23
arch/microblaze/include/asm/uaccess.h | 2
arch/microblaze/include/asm/unwind.h | 3
arch/microblaze/kernel/hw_exception_handler.S | 4
arch/microblaze/kernel/module.c | 4
arch/microblaze/kernel/setup.c | 4
arch/microblaze/kernel/signal.c | 9
arch/microblaze/kernel/stacktrace.c | 4
arch/microblaze/kernel/traps.c | 28
arch/microblaze/kernel/unwind.c | 46
arch/microblaze/mm/fault.c | 17
arch/microblaze/mm/init.c | 9
arch/microblaze/mm/pgtable.c | 4
arch/mips/fw/arc/memory.c | 1
arch/mips/include/asm/fixmap.h | 3
arch/mips/include/asm/mach-generic/floppy.h | 1
arch/mips/include/asm/mach-jazz/floppy.h | 1
arch/mips/include/asm/pgtable-32.h | 22
arch/mips/include/asm/pgtable-64.h | 32
arch/mips/include/asm/pgtable.h | 2
arch/mips/jazz/irq.c | 4
arch/mips/jazz/jazzdma.c | 1
arch/mips/jazz/setup.c | 4
arch/mips/kernel/module.c | 1
arch/mips/kernel/process.c | 1
arch/mips/kernel/ptrace.c | 1
arch/mips/kernel/ptrace32.c | 1
arch/mips/kernel/smp-bmips.c | 1
arch/mips/kernel/traps.c | 58
arch/mips/kernel/vdso.c | 4
arch/mips/kvm/mips.c | 4
arch/mips/kvm/mmu.c | 20
arch/mips/kvm/tlb.c | 1
arch/mips/kvm/trap_emul.c | 2
arch/mips/lib/dump_tlb.c | 1
arch/mips/lib/r3k_dump_tlb.c | 1
arch/mips/mm/c-octeon.c | 1
arch/mips/mm/c-r3k.c | 11
arch/mips/mm/c-r4k.c | 11
arch/mips/mm/c-tx39.c | 11
arch/mips/mm/fault.c | 12
arch/mips/mm/highmem.c | 2
arch/mips/mm/init.c | 1
arch/mips/mm/page.c | 1
arch/mips/mm/pgtable-32.c | 1
arch/mips/mm/pgtable-64.c | 1
arch/mips/mm/sc-ip22.c | 1
arch/mips/mm/sc-mips.c | 1
arch/mips/mm/sc-r5k.c | 1
arch/mips/mm/tlb-r3k.c | 1
arch/mips/mm/tlb-r4k.c | 1
arch/mips/mm/tlbex.c | 4
arch/mips/sgi-ip27/ip27-init.c | 1
arch/mips/sgi-ip27/ip27-timer.c | 1
arch/mips/sgi-ip32/ip32-memory.c | 1
arch/nds32/include/asm/highmem.h | 3
arch/nds32/include/asm/pgtable.h | 22
arch/nds32/kernel/head.S | 4
arch/nds32/kernel/module.c | 2
arch/nds32/kernel/traps.c | 33
arch/nds32/kernel/vdso.c | 6
arch/nds32/mm/fault.c | 17
arch/nds32/mm/init.c | 13
arch/nds32/mm/proc.c | 7
arch/nios2/include/asm/pgtable.h | 24
arch/nios2/kernel/module.c | 1
arch/nios2/kernel/nios2_ksyms.c | 4
arch/nios2/kernel/traps.c | 35
arch/nios2/mm/fault.c | 14
arch/nios2/mm/init.c | 5
arch/nios2/mm/pgtable.c | 1
arch/nios2/mm/tlb.c | 1
arch/openrisc/include/asm/io.h | 3
arch/openrisc/include/asm/pgtable.h | 33
arch/openrisc/include/asm/tlbflush.h | 1
arch/openrisc/kernel/asm-offsets.c | 1
arch/openrisc/kernel/entry.S | 4
arch/openrisc/kernel/head.S | 4
arch/openrisc/kernel/or32_ksyms.c | 4
arch/openrisc/kernel/process.c | 1
arch/openrisc/kernel/ptrace.c | 1
arch/openrisc/kernel/setup.c | 1
arch/openrisc/kernel/traps.c | 27
arch/openrisc/mm/fault.c | 12
arch/openrisc/mm/init.c | 1
arch/openrisc/mm/ioremap.c | 4
arch/openrisc/mm/tlb.c | 1
arch/parisc/include/asm/io.h | 2
arch/parisc/include/asm/mmu_context.h | 1
arch/parisc/include/asm/pgtable.h | 33
arch/parisc/kernel/asm-offsets.c | 4
arch/parisc/kernel/entry.S | 4
arch/parisc/kernel/head.S | 4
arch/parisc/kernel/module.c | 1
arch/parisc/kernel/pacache.S | 4
arch/parisc/kernel/pci-dma.c | 2
arch/parisc/kernel/pdt.c | 4
arch/parisc/kernel/ptrace.c | 1
arch/parisc/kernel/smp.c | 1
arch/parisc/kernel/traps.c | 42
arch/parisc/lib/memcpy.c | 14
arch/parisc/mm/fault.c | 10
arch/parisc/mm/fixmap.c | 6
arch/parisc/mm/init.c | 1
arch/powerpc/include/asm/book3s/32/pgtable.h | 20
arch/powerpc/include/asm/book3s/64/pgtable.h | 43
arch/powerpc/include/asm/fixmap.h | 4
arch/powerpc/include/asm/io.h | 1
arch/powerpc/include/asm/kup.h | 2
arch/powerpc/include/asm/nohash/32/pgtable.h | 17
arch/powerpc/include/asm/nohash/64/pgtable-4k.h | 4
arch/powerpc/include/asm/nohash/64/pgtable.h | 22
arch/powerpc/include/asm/nohash/pgtable.h | 2
arch/powerpc/include/asm/pgtable.h | 28
arch/powerpc/include/asm/pkeys.h | 2
arch/powerpc/include/asm/tlb.h | 2
arch/powerpc/kernel/asm-offsets.c | 1
arch/powerpc/kernel/btext.c | 4
arch/powerpc/kernel/fpu.S | 3
arch/powerpc/kernel/head_32.S | 4
arch/powerpc/kernel/head_40x.S | 4
arch/powerpc/kernel/head_44x.S | 4
arch/powerpc/kernel/head_8xx.S | 4
arch/powerpc/kernel/head_fsl_booke.S | 4
arch/powerpc/kernel/io-workarounds.c | 4
arch/powerpc/kernel/irq.c | 4
arch/powerpc/kernel/mce_power.c | 4
arch/powerpc/kernel/paca.c | 4
arch/powerpc/kernel/process.c | 30
arch/powerpc/kernel/prom.c | 4
arch/powerpc/kernel/prom_init.c | 4
arch/powerpc/kernel/rtas_pci.c | 4
arch/powerpc/kernel/setup-common.c | 4
arch/powerpc/kernel/setup_32.c | 4
arch/powerpc/kernel/setup_64.c | 4
arch/powerpc/kernel/signal_32.c | 1
arch/powerpc/kernel/signal_64.c | 1
arch/powerpc/kernel/smp.c | 4
arch/powerpc/kernel/stacktrace.c | 2
arch/powerpc/kernel/traps.c | 1
arch/powerpc/kernel/vdso.c | 7
arch/powerpc/kvm/book3s_64_mmu_radix.c | 4
arch/powerpc/kvm/book3s_hv.c | 6
arch/powerpc/kvm/book3s_hv_nested.c | 4
arch/powerpc/kvm/book3s_hv_rm_xics.c | 4
arch/powerpc/kvm/book3s_hv_rm_xive.c | 4
arch/powerpc/kvm/book3s_hv_uvmem.c | 18
arch/powerpc/kvm/e500_mmu_host.c | 4
arch/powerpc/kvm/fpu.S | 4
arch/powerpc/lib/code-patching.c | 1
arch/powerpc/mm/book3s32/hash_low.S | 4
arch/powerpc/mm/book3s32/mmu.c | 2
arch/powerpc/mm/book3s32/tlb.c | 6
arch/powerpc/mm/book3s64/hash_hugetlbpage.c | 1
arch/powerpc/mm/book3s64/hash_native.c | 4
arch/powerpc/mm/book3s64/hash_pgtable.c | 5
arch/powerpc/mm/book3s64/hash_utils.c | 4
arch/powerpc/mm/book3s64/iommu_api.c | 4
arch/powerpc/mm/book3s64/radix_hugetlbpage.c | 1
arch/powerpc/mm/book3s64/radix_pgtable.c | 1
arch/powerpc/mm/book3s64/slb.c | 4
arch/powerpc/mm/book3s64/subpage_prot.c | 16
arch/powerpc/mm/copro_fault.c | 4
arch/powerpc/mm/fault.c | 23
arch/powerpc/mm/hugetlbpage.c | 1
arch/powerpc/mm/init-common.c | 4
arch/powerpc/mm/init_32.c | 1
arch/powerpc/mm/init_64.c | 1
arch/powerpc/mm/kasan/8xx.c | 4
arch/powerpc/mm/kasan/book3s_32.c | 2
arch/powerpc/mm/kasan/kasan_init_32.c | 8
arch/powerpc/mm/mem.c | 1
arch/powerpc/mm/nohash/40x.c | 5
arch/powerpc/mm/nohash/8xx.c | 2
arch/powerpc/mm/nohash/fsl_booke.c | 1
arch/powerpc/mm/nohash/tlb_low_64e.S | 4
arch/powerpc/mm/pgtable.c | 2
arch/powerpc/mm/pgtable_32.c | 5
arch/powerpc/mm/pgtable_64.c | 1
arch/powerpc/mm/ptdump/8xx.c | 2
arch/powerpc/mm/ptdump/bats.c | 4
arch/powerpc/mm/ptdump/book3s64.c | 2
arch/powerpc/mm/ptdump/hashpagetable.c | 1
arch/powerpc/mm/ptdump/ptdump.c | 1
arch/powerpc/mm/ptdump/shared.c | 2
arch/powerpc/oprofile/cell/spu_task_sync.c | 6
arch/powerpc/perf/callchain.c | 1
arch/powerpc/perf/callchain_32.c | 1
arch/powerpc/perf/callchain_64.c | 1
arch/powerpc/platforms/85xx/corenet_generic.c | 4
arch/powerpc/platforms/85xx/mpc85xx_cds.c | 4
arch/powerpc/platforms/85xx/qemu_e500.c | 4
arch/powerpc/platforms/85xx/sbc8548.c | 4
arch/powerpc/platforms/85xx/smp.c | 4
arch/powerpc/platforms/86xx/mpc86xx_smp.c | 4
arch/powerpc/platforms/8xx/cpm1.c | 1
arch/powerpc/platforms/8xx/micropatch.c | 1
arch/powerpc/platforms/cell/cbe_regs.c | 4
arch/powerpc/platforms/cell/interrupt.c | 4
arch/powerpc/platforms/cell/pervasive.c | 4
arch/powerpc/platforms/cell/setup.c | 1
arch/powerpc/platforms/cell/smp.c | 4
arch/powerpc/platforms/cell/spider-pic.c | 4
arch/powerpc/platforms/cell/spufs/file.c | 10
arch/powerpc/platforms/chrp/pci.c | 4
arch/powerpc/platforms/chrp/setup.c | 1
arch/powerpc/platforms/chrp/smp.c | 4
arch/powerpc/platforms/maple/setup.c | 1
arch/powerpc/platforms/maple/time.c | 1
arch/powerpc/platforms/powermac/setup.c | 1
arch/powerpc/platforms/powermac/smp.c | 4
arch/powerpc/platforms/powermac/time.c | 1
arch/powerpc/platforms/pseries/lpar.c | 4
arch/powerpc/platforms/pseries/setup.c | 1
arch/powerpc/platforms/pseries/smp.c | 4
arch/powerpc/sysdev/cpm2.c | 1
arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 2
arch/powerpc/sysdev/mpic.c | 4
arch/powerpc/xmon/xmon.c | 1
arch/riscv/include/asm/fixmap.h | 4
arch/riscv/include/asm/io.h | 4
arch/riscv/include/asm/kasan.h | 4
arch/riscv/include/asm/pgtable-64.h | 7
arch/riscv/include/asm/pgtable.h | 22
arch/riscv/kernel/module.c | 2
arch/riscv/kernel/setup.c | 1
arch/riscv/kernel/soc.c | 2
arch/riscv/kernel/stacktrace.c | 23
arch/riscv/kernel/vdso.c | 4
arch/riscv/mm/cacheflush.c | 3
arch/riscv/mm/fault.c | 14
arch/riscv/mm/init.c | 31
arch/riscv/mm/kasan_init.c | 4
arch/riscv/mm/pageattr.c | 6
arch/riscv/mm/ptdump.c | 2
arch/s390/boot/ipl_parm.c | 4
arch/s390/boot/kaslr.c | 4
arch/s390/include/asm/hugetlb.h | 4
arch/s390/include/asm/kasan.h | 4
arch/s390/include/asm/pgtable.h | 15
arch/s390/include/asm/tlbflush.h | 1
arch/s390/kernel/asm-offsets.c | 4
arch/s390/kernel/dumpstack.c | 25
arch/s390/kernel/machine_kexec.c | 1
arch/s390/kernel/ptrace.c | 1
arch/s390/kernel/uv.c | 4
arch/s390/kernel/vdso.c | 5
arch/s390/kvm/gaccess.c | 8
arch/s390/kvm/interrupt.c | 4
arch/s390/kvm/kvm-s390.c | 32
arch/s390/kvm/priv.c | 38
arch/s390/mm/dump_pagetables.c | 1
arch/s390/mm/extmem.c | 4
arch/s390/mm/fault.c | 17
arch/s390/mm/gmap.c | 80
arch/s390/mm/init.c | 1
arch/s390/mm/kasan_init.c | 4
arch/s390/mm/pageattr.c | 13
arch/s390/mm/pgalloc.c | 2
arch/s390/mm/pgtable.c | 1
arch/s390/mm/vmem.c | 1
arch/s390/pci/pci_mmio.c | 4
arch/sh/include/asm/io.h | 2
arch/sh/include/asm/kdebug.h | 6
arch/sh/include/asm/pgtable-3level.h | 7
arch/sh/include/asm/pgtable.h | 2
arch/sh/include/asm/pgtable_32.h | 25
arch/sh/include/asm/processor_32.h | 2
arch/sh/kernel/dumpstack.c | 54
arch/sh/kernel/machine_kexec.c | 1
arch/sh/kernel/process_32.c | 2
arch/sh/kernel/ptrace_32.c | 1
arch/sh/kernel/signal_32.c | 1
arch/sh/kernel/sys_sh.c | 6
arch/sh/kernel/traps.c | 4
arch/sh/kernel/vsyscall/vsyscall.c | 4
arch/sh/mm/cache-sh3.c | 1
arch/sh/mm/cache-sh4.c | 11
arch/sh/mm/cache-sh7705.c | 1
arch/sh/mm/fault.c | 16
arch/sh/mm/kmap.c | 5
arch/sh/mm/nommu.c | 1
arch/sh/mm/pmb.c | 4
arch/sparc/include/asm/floppy_32.h | 4
arch/sparc/include/asm/highmem.h | 4
arch/sparc/include/asm/ide.h | 2
arch/sparc/include/asm/io-unit.h | 4
arch/sparc/include/asm/pgalloc_32.h | 4
arch/sparc/include/asm/pgalloc_64.h | 2
arch/sparc/include/asm/pgtable_32.h | 34
arch/sparc/include/asm/pgtable_64.h | 32
arch/sparc/kernel/cpu.c | 4
arch/sparc/kernel/entry.S | 4
arch/sparc/kernel/head_64.S | 4
arch/sparc/kernel/ktlb.S | 4
arch/sparc/kernel/leon_smp.c | 1
arch/sparc/kernel/pci.c | 4
arch/sparc/kernel/process_32.c | 29
arch/sparc/kernel/process_64.c | 3
arch/sparc/kernel/ptrace_32.c | 1
arch/sparc/kernel/ptrace_64.c | 1
arch/sparc/kernel/setup_32.c | 1
arch/sparc/kernel/setup_64.c | 1
arch/sparc/kernel/signal32.c | 1
arch/sparc/kernel/signal_32.c | 1
arch/sparc/kernel/signal_64.c | 1
arch/sparc/kernel/smp_32.c | 1
arch/sparc/kernel/smp_64.c | 1
arch/sparc/kernel/sun4m_irq.c | 4
arch/sparc/kernel/trampoline_64.S | 4
arch/sparc/kernel/traps_32.c | 4
arch/sparc/kernel/traps_64.c | 24
arch/sparc/lib/clear_page.S | 4
arch/sparc/lib/copy_page.S | 2
arch/sparc/mm/fault_32.c | 21
arch/sparc/mm/fault_64.c | 17
arch/sparc/mm/highmem.c | 12
arch/sparc/mm/hugetlbpage.c | 1
arch/sparc/mm/init_32.c | 1
arch/sparc/mm/init_64.c | 7
arch/sparc/mm/io-unit.c | 11
arch/sparc/mm/iommu.c | 9
arch/sparc/mm/tlb.c | 1
arch/sparc/mm/tsb.c | 4
arch/sparc/mm/ultra.S | 4
arch/sparc/vdso/vma.c | 4
arch/um/drivers/mconsole_kern.c | 2
arch/um/include/asm/mmu_context.h | 5
arch/um/include/asm/pgtable-3level.h | 4
arch/um/include/asm/pgtable.h | 69
arch/um/kernel/maccess.c | 12
arch/um/kernel/mem.c | 10
arch/um/kernel/process.c | 1
arch/um/kernel/skas/mmu.c | 3
arch/um/kernel/skas/uaccess.c | 1
arch/um/kernel/sysrq.c | 35
arch/um/kernel/tlb.c | 5
arch/um/kernel/trap.c | 15
arch/um/kernel/um_arch.c | 1
arch/unicore32/include/asm/pgtable.h | 19
arch/unicore32/kernel/hibernate.c | 4
arch/unicore32/kernel/hibernate_asm.S | 4
arch/unicore32/kernel/module.c | 1
arch/unicore32/kernel/setup.h | 4
arch/unicore32/kernel/traps.c | 50
arch/unicore32/lib/backtrace.S | 24
arch/unicore32/mm/alignment.c | 4
arch/unicore32/mm/fault.c | 9
arch/unicore32/mm/mm.h | 10
arch/unicore32/mm/proc-ucv2.S | 4
arch/x86/boot/compressed/kaslr_64.c | 4
arch/x86/entry/vdso/vma.c | 14
arch/x86/events/core.c | 4
arch/x86/include/asm/agp.h | 2
arch/x86/include/asm/asm-prototypes.h | 4
arch/x86/include/asm/efi.h | 4
arch/x86/include/asm/iomap.h | 1
arch/x86/include/asm/kaslr.h | 2
arch/x86/include/asm/mmu.h | 2
arch/x86/include/asm/pgtable-3level.h | 8
arch/x86/include/asm/pgtable.h | 89 -
arch/x86/include/asm/pgtable_32.h | 11
arch/x86/include/asm/pgtable_64.h | 4
arch/x86/include/asm/setup.h | 12
arch/x86/include/asm/stacktrace.h | 2
arch/x86/include/asm/uaccess.h | 16
arch/x86/include/asm/xen/hypercall.h | 4
arch/x86/include/asm/xen/page.h | 1
arch/x86/kernel/acpi/boot.c | 4
arch/x86/kernel/acpi/sleep.c | 4
arch/x86/kernel/alternative.c | 1
arch/x86/kernel/amd_gart_64.c | 5
arch/x86/kernel/apic/apic_numachip.c | 4
arch/x86/kernel/cpu/bugs.c | 4
arch/x86/kernel/cpu/common.c | 4
arch/x86/kernel/cpu/intel.c | 4
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 6
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6
arch/x86/kernel/crash_core_32.c | 4
arch/x86/kernel/crash_core_64.c | 4
arch/x86/kernel/doublefault_32.c | 1
arch/x86/kernel/dumpstack.c | 21
arch/x86/kernel/early_printk.c | 4
arch/x86/kernel/espfix_64.c | 2
arch/x86/kernel/head64.c | 4
arch/x86/kernel/head_64.S | 4
arch/x86/kernel/i8259.c | 4
arch/x86/kernel/irqinit.c | 4
arch/x86/kernel/kprobes/core.c | 4
arch/x86/kernel/kprobes/opt.c | 4
arch/x86/kernel/ldt.c | 2
arch/x86/kernel/machine_kexec_32.c | 1
arch/x86/kernel/machine_kexec_64.c | 1
arch/x86/kernel/module.c | 1
arch/x86/kernel/paravirt.c | 4
arch/x86/kernel/process_32.c | 1
arch/x86/kernel/process_64.c | 1
arch/x86/kernel/ptrace.c | 1
arch/x86/kernel/reboot.c | 4
arch/x86/kernel/smpboot.c | 4
arch/x86/kernel/tboot.c | 3
arch/x86/kernel/vm86_32.c | 4
arch/x86/kvm/mmu/paging_tmpl.h | 8
arch/x86/mm/cpu_entry_area.c | 4
arch/x86/mm/debug_pagetables.c | 2
arch/x86/mm/dump_pagetables.c | 1
arch/x86/mm/fault.c | 22
arch/x86/mm/init.c | 22
arch/x86/mm/init_32.c | 27
arch/x86/mm/init_64.c | 1
arch/x86/mm/ioremap.c | 4
arch/x86/mm/kasan_init_64.c | 1
arch/x86/mm/kaslr.c | 37
arch/x86/mm/maccess.c | 44
arch/x86/mm/mem_encrypt_boot.S | 2
arch/x86/mm/mmio-mod.c | 4
arch/x86/mm/pat/cpa-test.c | 1
arch/x86/mm/pat/memtype.c | 1
arch/x86/mm/pat/memtype_interval.c | 4
arch/x86/mm/pgtable.c | 1
arch/x86/mm/pgtable_32.c | 1
arch/x86/mm/pti.c | 1
arch/x86/mm/setup_nx.c | 4
arch/x86/platform/efi/efi_32.c | 4
arch/x86/platform/efi/efi_64.c | 1
arch/x86/platform/olpc/olpc_ofw.c | 4
arch/x86/power/cpu.c | 4
arch/x86/power/hibernate.c | 4
arch/x86/power/hibernate_32.c | 4
arch/x86/power/hibernate_64.c | 4
arch/x86/realmode/init.c | 4
arch/x86/um/vdso/vma.c | 4
arch/x86/xen/enlighten_pv.c | 1
arch/x86/xen/grant-table.c | 1
arch/x86/xen/mmu_pv.c | 4
arch/x86/xen/smp_pv.c | 2
arch/xtensa/include/asm/fixmap.h | 12
arch/xtensa/include/asm/highmem.h | 4
arch/xtensa/include/asm/initialize_mmu.h | 2
arch/xtensa/include/asm/mmu_context.h | 4
arch/xtensa/include/asm/pgtable.h | 20
arch/xtensa/kernel/entry.S | 4
arch/xtensa/kernel/process.c | 1
arch/xtensa/kernel/ptrace.c | 1
arch/xtensa/kernel/setup.c | 1
arch/xtensa/kernel/traps.c | 42
arch/xtensa/kernel/vectors.S | 4
arch/xtensa/mm/cache.c | 4
arch/xtensa/mm/fault.c | 12
arch/xtensa/mm/highmem.c | 2
arch/xtensa/mm/ioremap.c | 4
arch/xtensa/mm/kasan_init.c | 10
arch/xtensa/mm/misc.S | 4
arch/xtensa/mm/mmu.c | 5
drivers/acpi/scan.c | 3
drivers/android/binder_alloc.c | 14
drivers/atm/fore200e.c | 4
drivers/base/power/main.c | 4
drivers/block/z2ram.c | 4
drivers/char/agp/frontend.c | 1
drivers/char/agp/generic.c | 1
drivers/char/bsr.c | 1
drivers/char/mspec.c | 3
drivers/dma-buf/dma-resv.c | 5
drivers/firmware/efi/arm-runtime.c | 4
drivers/firmware/efi/efi.c | 2
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 2
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 2
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 10
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 4
drivers/gpu/drm/drm_vm.c | 4
drivers/gpu/drm/etnaviv/etnaviv_gem.c | 2
drivers/gpu/drm/i915/gem/i915_gem_mman.c | 4
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 14
drivers/gpu/drm/i915/i915_mm.c | 1
drivers/gpu/drm/i915/i915_perf.c | 2
drivers/gpu/drm/nouveau/nouveau_svm.c | 22
drivers/gpu/drm/radeon/radeon_cs.c | 4
drivers/gpu/drm/radeon/radeon_gem.c | 6
drivers/gpu/drm/ttm/ttm_bo_vm.c | 10
drivers/infiniband/core/umem_odp.c | 4
drivers/infiniband/core/uverbs_main.c | 6
drivers/infiniband/hw/hfi1/mmu_rb.c | 2
drivers/infiniband/hw/mlx4/mr.c | 4
drivers/infiniband/hw/qib/qib_file_ops.c | 4
drivers/infiniband/hw/qib/qib_user_pages.c | 6
drivers/infiniband/hw/usnic/usnic_uiom.c | 4
drivers/infiniband/sw/rdmavt/mmap.c | 1
drivers/infiniband/sw/rxe/rxe_mmap.c | 1
drivers/infiniband/sw/siw/siw_mem.c | 4
drivers/iommu/amd_iommu_v2.c | 4
drivers/iommu/intel-svm.c | 4
drivers/macintosh/macio-adb.c | 4
drivers/macintosh/mediabay.c | 4
drivers/macintosh/via-pmu.c | 4
drivers/media/pci/bt8xx/bt878.c | 4
drivers/media/pci/bt8xx/btcx-risc.c | 4
drivers/media/pci/bt8xx/bttv-risc.c | 4
drivers/media/platform/davinci/vpbe_display.c | 1
drivers/media/v4l2-core/v4l2-common.c | 1
drivers/media/v4l2-core/videobuf-core.c | 4
drivers/media/v4l2-core/videobuf-dma-contig.c | 4
drivers/media/v4l2-core/videobuf-dma-sg.c | 10
drivers/media/v4l2-core/videobuf-vmalloc.c | 4
drivers/misc/cxl/cxllib.c | 9
drivers/misc/cxl/fault.c | 4
drivers/misc/genwqe/card_utils.c | 2
drivers/misc/sgi-gru/grufault.c | 25
drivers/misc/sgi-gru/grufile.c | 4
drivers/mtd/ubi/ubi.h | 2
drivers/net/ethernet/amd/7990.c | 4
drivers/net/ethernet/amd/hplance.c | 4
drivers/net/ethernet/amd/mvme147.c | 4
drivers/net/ethernet/amd/sun3lance.c | 4
drivers/net/ethernet/amd/sunlance.c | 4
drivers/net/ethernet/apple/bmac.c | 4
drivers/net/ethernet/apple/mace.c | 4
drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c | 4
drivers/net/ethernet/freescale/fs_enet/mac-fcc.c | 4
drivers/net/ethernet/freescale/fs_enet/mii-fec.c | 4
drivers/net/ethernet/i825xx/82596.c | 4
drivers/net/ethernet/korina.c | 4
drivers/net/ethernet/marvell/pxa168_eth.c | 4
drivers/net/ethernet/natsemi/jazzsonic.c | 4
drivers/net/ethernet/natsemi/macsonic.c | 4
drivers/net/ethernet/natsemi/xtsonic.c | 4
drivers/net/ethernet/sun/sunbmac.c | 4
drivers/net/ethernet/sun/sunhme.c | 1
drivers/net/ethernet/sun/sunqe.c | 4
drivers/oprofile/buffer_sync.c | 12
drivers/sbus/char/flash.c | 1
drivers/sbus/char/uctrl.c | 1
drivers/scsi/53c700.c | 4
drivers/scsi/a2091.c | 1
drivers/scsi/a3000.c | 1
drivers/scsi/arm/cumana_2.c | 4
drivers/scsi/arm/eesox.c | 4
drivers/scsi/arm/powertec.c | 4
drivers/scsi/dpt_i2o.c | 4
drivers/scsi/gvp11.c | 1
drivers/scsi/lasi700.c | 1
drivers/scsi/mac53c94.c | 4
drivers/scsi/mesh.c | 4
drivers/scsi/mvme147.c | 1
drivers/scsi/qlogicpti.c | 4
drivers/scsi/sni_53c710.c | 1
drivers/scsi/zorro_esp.c | 4
drivers/staging/android/ashmem.c | 4
drivers/staging/comedi/comedi_fops.c | 2
drivers/staging/kpc2000/kpc_dma/fileops.c | 4
drivers/staging/media/atomisp/pci/hmm/hmm_bo.c | 4
drivers/tee/optee/call.c | 4
drivers/tty/sysrq.c | 4
drivers/tty/vt/consolemap.c | 2
drivers/vfio/pci/vfio_pci.c | 22
drivers/vfio/vfio_iommu_type1.c | 8
drivers/vhost/vdpa.c | 4
drivers/video/console/newport_con.c | 1
drivers/video/fbdev/acornfb.c | 1
drivers/video/fbdev/atafb.c | 1
drivers/video/fbdev/cirrusfb.c | 1
drivers/video/fbdev/cyber2000fb.c | 1
drivers/video/fbdev/fb-puv3.c | 1
drivers/video/fbdev/hitfb.c | 1
drivers/video/fbdev/neofb.c | 1
drivers/video/fbdev/q40fb.c | 1
drivers/video/fbdev/savage/savagefb_driver.c | 1
drivers/xen/balloon.c | 1
drivers/xen/gntdev.c | 6
drivers/xen/grant-table.c | 1
drivers/xen/privcmd.c | 15
drivers/xen/xenbus/xenbus_probe.c | 1
drivers/xen/xenbus/xenbus_probe_backend.c | 1
drivers/xen/xenbus/xenbus_probe_frontend.c | 1
fs/aio.c | 4
fs/coredump.c | 8
fs/exec.c | 18
fs/ext2/file.c | 2
fs/ext4/super.c | 6
fs/hugetlbfs/inode.c | 2
fs/io_uring.c | 4
fs/kernfs/file.c | 4
fs/proc/array.c | 1
fs/proc/base.c | 24
fs/proc/meminfo.c | 1
fs/proc/nommu.c | 1
fs/proc/task_mmu.c | 34
fs/proc/task_nommu.c | 18
fs/proc/vmcore.c | 1
fs/userfaultfd.c | 46
fs/xfs/xfs_file.c | 2
fs/xfs/xfs_inode.c | 14
fs/xfs/xfs_iops.c | 4
include/asm-generic/io.h | 2
include/asm-generic/pgtable-nopmd.h | 1
include/asm-generic/pgtable-nopud.h | 1
include/asm-generic/pgtable.h | 1322 ----------------
include/linux/cache.h | 10
include/linux/crash_dump.h | 3
include/linux/dax.h | 1
include/linux/dma-noncoherent.h | 2
include/linux/fs.h | 4
include/linux/hmm.h | 2
include/linux/huge_mm.h | 2
include/linux/hugetlb.h | 2
include/linux/io-mapping.h | 4
include/linux/kallsyms.h | 4
include/linux/kasan.h | 4
include/linux/mempolicy.h | 2
include/linux/mm.h | 15
include/linux/mm_types.h | 4
include/linux/mmap_lock.h | 128 +
include/linux/mmu_notifier.h | 13
include/linux/pagemap.h | 2
include/linux/pgtable.h | 1444 +++++++++++++++++-
include/linux/rmap.h | 2
include/linux/sched/debug.h | 7
include/linux/sched/mm.h | 10
include/linux/uaccess.h | 62
include/xen/arm/page.h | 4
init/init_task.c | 1
ipc/shm.c | 8
kernel/acct.c | 6
kernel/bpf/stackmap.c | 21
kernel/bpf/syscall.c | 2
kernel/cgroup/cpuset.c | 4
kernel/debug/kdb/kdb_bt.c | 17
kernel/events/core.c | 10
kernel/events/uprobes.c | 20
kernel/exit.c | 11
kernel/fork.c | 15
kernel/futex.c | 4
kernel/locking/lockdep.c | 4
kernel/locking/rtmutex-debug.c | 4
kernel/power/snapshot.c | 1
kernel/relay.c | 2
kernel/sched/core.c | 10
kernel/sched/fair.c | 4
kernel/sys.c | 22
kernel/trace/bpf_trace.c | 176 +-
kernel/trace/ftrace.c | 8
kernel/trace/trace_kprobe.c | 80
kernel/trace/trace_output.c | 4
lib/dump_stack.c | 4
lib/ioremap.c | 1
lib/test_hmm.c | 14
lib/test_lockup.c | 16
mm/debug.c | 10
mm/debug_vm_pgtable.c | 1
mm/filemap.c | 46
mm/frame_vector.c | 6
mm/gup.c | 73
mm/hmm.c | 2
mm/huge_memory.c | 8
mm/hugetlb.c | 3
mm/init-mm.c | 6
mm/internal.h | 6
mm/khugepaged.c | 72
mm/ksm.c | 48
mm/maccess.c | 496 +++---
mm/madvise.c | 40
mm/memcontrol.c | 10
mm/memory.c | 61
mm/mempolicy.c | 36
mm/migrate.c | 16
mm/mincore.c | 8
mm/mlock.c | 22
mm/mmap.c | 74
mm/mmu_gather.c | 2
mm/mmu_notifier.c | 22
mm/mprotect.c | 22
mm/mremap.c | 14
mm/msync.c | 8
mm/nommu.c | 22
mm/oom_kill.c | 14
mm/page_io.c | 1
mm/page_reporting.h | 2
mm/pagewalk.c | 12
mm/pgtable-generic.c | 6
mm/process_vm_access.c | 4
mm/ptdump.c | 4
mm/rmap.c | 12
mm/shmem.c | 5
mm/sparse-vmemmap.c | 1
mm/sparse.c | 1
mm/swap_state.c | 5
mm/swapfile.c | 5
mm/userfaultfd.c | 26
mm/util.c | 12
mm/vmacache.c | 1
mm/zsmalloc.c | 4
net/ipv4/tcp.c | 8
net/xdp/xdp_umem.c | 4
security/keys/keyctl.c | 2
sound/core/oss/pcm_oss.c | 2
sound/core/sgbuf.c | 1
sound/pci/hda/hda_intel.c | 4
sound/soc/intel/common/sst-firmware.c | 4
sound/soc/intel/haswell/sst-haswell-pcm.c | 4
tools/include/linux/kallsyms.h | 2
virt/kvm/async_pf.c | 4
virt/kvm/kvm_main.c | 9
942 files changed, 4580 insertions(+), 5662 deletions(-)
* incoming
@ 2020-06-08 4:35 Andrew Morton
From: Andrew Morton @ 2020-06-08 4:35 UTC
To: Linus Torvalds; +Cc: mm-commits, linux-mm
Various trees. Mainly those parts of MM whose linux-next dependents
are now merged. I'm still sitting on ~160 patches which await merges
from -next.
54 patches, based on 9aa900c8094dba7a60dc805ecec1e9f720744ba1.
Subsystems affected by this patch series:
mm/proc
ipc
dynamic-debug
panic
lib
sysctl
mm/gup
mm/pagemap
Subsystem: mm/proc
SeongJae Park <sjpark@amazon.de>:
mm/page_idle.c: skip offline pages
Subsystem: ipc
Jules Irenge <jbi.octave@gmail.com>:
ipc/msg: add missing annotation for freeque()
Giuseppe Scrivano <gscrivan@redhat.com>:
ipc/namespace.c: use a work queue to free_ipc
Subsystem: dynamic-debug
Orson Zhai <orson.zhai@unisoc.com>:
dynamic_debug: add an option to enable dynamic debug for modules only
Subsystem: panic
Rafael Aquini <aquini@redhat.com>:
kernel: add panic_on_taint
Subsystem: lib
Manfred Spraul <manfred@colorfullife.com>:
xarray.h: correct return code documentation for xa_store_{bh,irq}()
Subsystem: sysctl
Vlastimil Babka <vbabka@suse.cz>:
Patch series "support setting sysctl parameters from kernel command line", v3:
kernel/sysctl: support setting sysctl parameters from kernel command line
kernel/sysctl: support handling command line aliases
kernel/hung_task convert hung_task_panic boot parameter to sysctl
tools/testing/selftests/sysctl/sysctl.sh: support CONFIG_TEST_SYSCTL=y
lib/test_sysctl: support testing of sysctl. boot parameter
"Guilherme G. Piccoli" <gpiccoli@canonical.com>:
kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases
kernel/hung_task.c: introduce sysctl to print all traces when a hung task is detected
panic: add sysctl to dump all CPUs backtraces on oops event
Rafael Aquini <aquini@redhat.com>:
kernel/sysctl.c: ignore out-of-range taint bits introduced via kernel.tainted
Subsystem: mm/gup
Souptick Joarder <jrdr.linux@gmail.com>:
mm/gup.c: convert to use get_user_{page|pages}_fast_only()
John Hubbard <jhubbard@nvidia.com>:
mm/gup: update pin_user_pages.rst for "case 3" (mmu notifiers)
Patch series "mm/gup: introduce pin_user_pages_locked(), use it in frame_vector.c", v2:
mm/gup: introduce pin_user_pages_locked()
mm/gup: frame_vector: convert get_user_pages() --> pin_user_pages()
mm/gup: documentation fix for pin_user_pages*() APIs
Patch series "vhost, docs: convert to pin_user_pages(), new "case 5"":
docs: mm/gup: pin_user_pages.rst: add a "case 5"
vhost: convert get_user_pages() --> pin_user_pages()
Subsystem: mm/pagemap
Alexander Gordeev <agordeev@linux.ibm.com>:
mm/mmap.c: add more sanity checks to get_unmapped_area()
mm/mmap.c: do not allow mappings outside of allowed limits
Christoph Hellwig <hch@lst.de>:
Patch series "sort out the flush_icache_range mess", v2:
arm: fix the flush_icache_range arguments in set_fiq_handler
nds32: unexport flush_icache_page
powerpc: unexport flush_icache_user_range
unicore32: remove flush_cache_user_range
asm-generic: fix the inclusion guards for cacheflush.h
asm-generic: don't include <linux/mm.h> in cacheflush.h
asm-generic: improve the flush_dcache_page stub
alpha: use asm-generic/cacheflush.h
arm64: use asm-generic/cacheflush.h
c6x: use asm-generic/cacheflush.h
hexagon: use asm-generic/cacheflush.h
ia64: use asm-generic/cacheflush.h
microblaze: use asm-generic/cacheflush.h
m68knommu: use asm-generic/cacheflush.h
openrisc: use asm-generic/cacheflush.h
powerpc: use asm-generic/cacheflush.h
riscv: use asm-generic/cacheflush.h
arm,sparc,unicore32: remove flush_icache_user_range
mm: rename flush_icache_user_range to flush_icache_user_page
asm-generic: add a flush_icache_user_range stub
sh: implement flush_icache_user_range
xtensa: implement flush_icache_user_range
arm: rename flush_cache_user_range to flush_icache_user_range
m68k: implement flush_icache_user_range
exec: only build read_code when needed
exec: use flush_icache_user_range in read_code
binfmt_flat: use flush_icache_user_range
nommu: use flush_icache_user_range in brk and mmap
module: move the set_fs hack for flush_icache_range to m68k
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
doc: cgroup: update note about conditions when oom killer is invoked
Documentation/admin-guide/cgroup-v2.rst | 17 +-
Documentation/admin-guide/dynamic-debug-howto.rst | 5
Documentation/admin-guide/kdump/kdump.rst | 8 +
Documentation/admin-guide/kernel-parameters.txt | 34 +++-
Documentation/admin-guide/sysctl/kernel.rst | 37 ++++
Documentation/core-api/pin_user_pages.rst | 47 ++++--
arch/alpha/include/asm/cacheflush.h | 38 +----
arch/alpha/kernel/smp.c | 2
arch/arm/include/asm/cacheflush.h | 7
arch/arm/kernel/fiq.c | 4
arch/arm/kernel/traps.c | 2
arch/arm64/include/asm/cacheflush.h | 46 ------
arch/c6x/include/asm/cacheflush.h | 19 --
arch/hexagon/include/asm/cacheflush.h | 19 --
arch/ia64/include/asm/cacheflush.h | 30 ----
arch/m68k/include/asm/cacheflush_mm.h | 6
arch/m68k/include/asm/cacheflush_no.h | 19 --
arch/m68k/mm/cache.c | 13 +
arch/microblaze/include/asm/cacheflush.h | 29 ---
arch/nds32/include/asm/cacheflush.h | 4
arch/nds32/mm/cacheflush.c | 3
arch/openrisc/include/asm/cacheflush.h | 33 ----
arch/powerpc/include/asm/cacheflush.h | 46 +-----
arch/powerpc/kvm/book3s_64_mmu_hv.c | 2
arch/powerpc/kvm/book3s_64_mmu_radix.c | 2
arch/powerpc/mm/mem.c | 3
arch/powerpc/perf/callchain_64.c | 4
arch/riscv/include/asm/cacheflush.h | 65 --------
arch/sh/include/asm/cacheflush.h | 1
arch/sparc/include/asm/cacheflush_32.h | 2
arch/sparc/include/asm/cacheflush_64.h | 1
arch/um/include/asm/tlb.h | 2
arch/unicore32/include/asm/cacheflush.h | 11 -
arch/x86/include/asm/cacheflush.h | 2
arch/xtensa/include/asm/cacheflush.h | 2
drivers/media/platform/omap3isp/ispvideo.c | 2
drivers/nvdimm/pmem.c | 3
drivers/vhost/vhost.c | 5
fs/binfmt_flat.c | 2
fs/exec.c | 5
fs/proc/proc_sysctl.c | 163 ++++++++++++++++++++--
include/asm-generic/cacheflush.h | 25 +--
include/linux/dev_printk.h | 6
include/linux/dynamic_debug.h | 2
include/linux/ipc_namespace.h | 2
include/linux/kernel.h | 9 +
include/linux/mm.h | 12 +
include/linux/net.h | 3
include/linux/netdevice.h | 6
include/linux/printk.h | 9 -
include/linux/sched/sysctl.h | 7
include/linux/sysctl.h | 4
include/linux/xarray.h | 4
include/rdma/ib_verbs.h | 6
init/main.c | 2
ipc/msg.c | 2
ipc/namespace.c | 24 ++-
kernel/events/core.c | 4
kernel/events/uprobes.c | 2
kernel/hung_task.c | 30 ++--
kernel/module.c | 8 -
kernel/panic.c | 45 ++++++
kernel/sysctl.c | 38 ++++-
kernel/watchdog.c | 37 +---
lib/Kconfig.debug | 12 +
lib/Makefile | 2
lib/dynamic_debug.c | 9 -
lib/test_sysctl.c | 13 +
mm/frame_vector.c | 7
mm/gup.c | 74 +++++++--
mm/mmap.c | 28 ++-
mm/nommu.c | 4
mm/page_alloc.c | 9 -
mm/page_idle.c | 7
tools/testing/selftests/sysctl/sysctl.sh | 44 +++++
virt/kvm/kvm_main.c | 8 -
76 files changed, 732 insertions(+), 517 deletions(-)
* incoming
@ 2020-06-04 23:45 Andrew Morton
From: Andrew Morton @ 2020-06-04 23:45 UTC
To: Linus Torvalds; +Cc: linux-mm, mm-commits
- More MM work. 100ish more to go. Mike's "mm: remove
__ARCH_HAS_5LEVEL_HACK" series should fix the current ppc issue.
- Various other little subsystems
127 patches, based on 6929f71e46bdddbf1c4d67c2728648176c67c555.
Subsystems affected by this patch series:
kcov
mm/pagemap
mm/vmalloc
mm/kmap
mm/util
mm/memory-hotplug
mm/cleanups
mm/zram
procfs
core-kernel
get_maintainer
lib
bitops
checkpatch
binfmt
init
fat
seq_file
exec
rapidio
relay
selftests
ubsan
Subsystem: kcov
Andrey Konovalov <andreyknvl@google.com>:
Patch series "kcov: collect coverage from usb soft interrupts", v4:
kcov: cleanup debug messages
kcov: fix potential use-after-free in kcov_remote_start
kcov: move t->kcov assignments into kcov_start/stop
kcov: move t->kcov_sequence assignment
kcov: use t->kcov_mode as enabled indicator
kcov: collect coverage from interrupts
usb: core: kcov: collect coverage from usb complete callback
Subsystem: mm/pagemap
Feng Tang <feng.tang@intel.com>:
mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
Mike Rapoport <rppt@linux.ibm.com>:
Patch series "mm: remove __ARCH_HAS_5LEVEL_HACK", v4:
h8300: remove usage of __ARCH_USE_5LEVEL_HACK
arm: add support for folded p4d page tables
arm64: add support for folded p4d page tables
hexagon: remove __ARCH_USE_5LEVEL_HACK
ia64: add support for folded p4d page tables
nios2: add support for folded p4d page tables
openrisc: add support for folded p4d page tables
powerpc: add support for folded p4d page tables
Geert Uytterhoeven <geert+renesas@glider.be>:
sh: fault: modernize printing of kernel messages
Mike Rapoport <rppt@linux.ibm.com>:
sh: drop __pXd_offset() macros that duplicate pXd_index() ones
sh: add support for folded p4d page tables
unicore32: remove __ARCH_USE_5LEVEL_HACK
asm-generic: remove pgtable-nop4d-hack.h
mm: remove __ARCH_HAS_5LEVEL_HACK and include/asm-generic/5level-fixup.h
Anshuman Khandual <anshuman.khandual@arm.com>:
Patch series "mm/debug: Add tests validating architecture page table:
x86/mm: define mm_p4d_folded()
mm/debug: add tests validating architecture page table helpers
Subsystem: mm/vmalloc
Jeongtae Park <jtp.park@samsung.com>:
mm/vmalloc: fix a typo in comment
Subsystem: mm/kmap
Ira Weiny <ira.weiny@intel.com>:
Patch series "Remove duplicated kmap code", v3:
arch/kmap: remove BUG_ON()
arch/xtensa: move kmap build bug out of the way
arch/kmap: remove redundant arch specific kmaps
arch/kunmap: remove duplicate kunmap implementations
{x86,powerpc,microblaze}/kmap: move preempt disable
arch/kmap_atomic: consolidate duplicate code
arch/kunmap_atomic: consolidate duplicate code
arch/kmap: ensure kmap_prot visibility
arch/kmap: don't hard code kmap_prot values
arch/kmap: define kmap_atomic_prot() for all arch's
drm: remove drm specific kmap_atomic code
kmap: remove kmap_atomic_to_page()
parisc/kmap: remove duplicate kmap code
sparc: remove unnecessary includes
kmap: consolidate kmap_prot definitions
Subsystem: mm/util
Waiman Long <longman@redhat.com>:
mm: add kvfree_sensitive() for freeing sensitive data objects
Subsystem: mm/memory-hotplug
Vishal Verma <vishal.l.verma@intel.com>:
mm/memory_hotplug: refrain from adding memory into an impossible node
David Hildenbrand <david@redhat.com>:
powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable()
mm/memory_hotplug: remove is_mem_section_removable()
Patch series "mm/memory_hotplug: handle memblocks only with:
mm/memory_hotplug: set node_start_pfn of hotadded pgdat to 0
mm/memory_hotplug: handle memblocks only with CONFIG_ARCH_KEEP_MEMBLOCK
Patch series "mm/memory_hotplug: Interface to add driver-managed system:
mm/memory_hotplug: introduce add_memory_driver_managed()
kexec_file: don't place kexec images on IORESOURCE_MEM_DRIVER_MANAGED
device-dax: add memory via add_memory_driver_managed()
Michal Hocko <mhocko@kernel.org>:
mm/memory_hotplug: disable the functionality for 32b
Subsystem: mm/cleanups
chenqiwu <chenqiwu@xiaomi.com>:
mm: replace zero-length array with flexible-array member
Ethon Paul <ethp@qq.com>:
mm/memory_hotplug: fix a typo in comment "recoreded"->"recorded"
mm: ksm: fix a typo in comment "alreaady"->"already"
mm: mmap: fix a typo in comment "compatbility"->"compatibility"
mm/hugetlb: fix a typos in comments
mm/vmsan: fix some typos in comment
mm/compaction: fix a typo in comment "pessemistic"->"pessimistic"
mm/memblock: fix a typo in comment "implict"->"implicit"
mm/list_lru: fix a typo in comment "numbesr"->"numbers"
mm/filemap: fix a typo in comment "unneccssary"->"unnecessary"
mm/frontswap: fix some typos in frontswap.c
mm, memcg: fix some typos in memcontrol.c
mm: fix a typo in comment "strucure"->"structure"
mm/slub: fix a typo in comment "disambiguiation"->"disambiguation"
mm/sparse: fix a typo in comment "convienence"->"convenience"
mm/page-writeback: fix a typo in comment "effictive"->"effective"
mm/memory: fix a typo in comment "attampt"->"attempt"
Zou Wei <zou_wei@huawei.com>:
mm: use false for bool variable
Jason Yan <yanaijie@huawei.com>:
include/linux/mm.h: return true in cpupid_pid_unset()
Subsystem: mm/zram
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
zcomp: Use ARRAY_SIZE() for backends list
Subsystem: procfs
Alexey Dobriyan <adobriyan@gmail.com>:
proc: rename "catch" function argument
Subsystem: core-kernel
Jason Yan <yanaijie@huawei.com>:
user.c: make uidhash_table static
Subsystem: get_maintainer
Joe Perches <joe@perches.com>:
get_maintainer: add email addresses from .yaml files
get_maintainer: fix unexpected behavior for path/to//file (double slashes)
Subsystem: lib
Christophe JAILLET <christophe.jaillet@wanadoo.fr>:
lib/math: avoid trailing newline hidden in pr_fmt()
KP Singh <kpsingh@chromium.org>:
lib: Add might_fault() to strncpy_from_user.
Jason Yan <yanaijie@huawei.com>:
lib/test_lockup.c: make test_inode static
Jann Horn <jannh@google.com>:
lib/zlib: remove outdated and incorrect pre-increment optimization
Joe Perches <joe@perches.com>:
lib/percpu-refcount.c: use a more common logging style
Tan Hu <tan.hu@zte.com.cn>:
lib/flex_proportions.c: cleanup __fprop_inc_percpu_max
Jesse Brandeburg <jesse.brandeburg@intel.com>:
lib: make a test module with set/clear bit
Subsystem: bitops
Arnd Bergmann <arnd@arndb.de>:
include/linux/bitops.h: avoid clang shift-count-overflow warnings
Subsystem: checkpatch
Joe Perches <joe@perches.com>:
checkpatch: additional MAINTAINER section entry ordering checks
checkpatch: look for c99 comments in ctx_locate_comment
checkpatch: disallow --git and --file/--fix
Geert Uytterhoeven <geert+renesas@glider.be>:
checkpatch: use patch subject when reading from stdin
Subsystem: binfmt
Anthony Iliopoulos <ailiop@suse.com>:
fs/binfmt_elf: remove redundant elf_map ifndef
Nick Desaulniers <ndesaulniers@google.com>:
elfnote: mark all .note sections SHF_ALLOC
Subsystem: init
Chris Down <chris@chrisdown.name>:
init: allow distribution configuration of default init
Subsystem: fat
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
fat: don't allow to mount if the FAT length == 0
fat: improve the readahead for FAT entries
Subsystem: seq_file
Joe Perches <joe@perches.com>:
fs/seq_file.c: seq_read: Update pr_info_ratelimited
Kefeng Wang <wangkefeng.wang@huawei.com>:
Patch series "seq_file: Introduce DEFINE_SEQ_ATTRIBUTE() helper macro":
include/linux/seq_file.h: introduce DEFINE_SEQ_ATTRIBUTE() helper macro
mm/vmstat.c: convert to use DEFINE_SEQ_ATTRIBUTE macro
kernel/kprobes.c: convert to use DEFINE_SEQ_ATTRIBUTE macro
Subsystem: exec
Christoph Hellwig <hch@lst.de>:
exec: simplify the copy_strings_kernel calling convention
exec: open code copy_string_kernel
Subsystem: rapidio
Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>:
rapidio: avoid data race between file operation callbacks and mport_cdev_add().
John Hubbard <jhubbard@nvidia.com>:
rapidio: convert get_user_pages() --> pin_user_pages()
Subsystem: relay
Daniel Axtens <dja@axtens.net>:
kernel/relay.c: handle alloc_percpu returning NULL in relay_open
Pengcheng Yang <yangpc@wangsu.com>:
kernel/relay.c: fix read_pos error when multiple readers
Subsystem: selftests
Ram Pai <linuxram@us.ibm.com>:
Patch series "selftests, powerpc, x86: Memory Protection Keys", v19:
selftests/x86/pkeys: move selftests to arch-neutral directory
selftests/vm/pkeys: rename all references to pkru to a generic name
selftests/vm/pkeys: move generic definitions to header file
Thiago Jung Bauermann <bauerman@linux.ibm.com>:
selftests/vm/pkeys: move some definitions to arch-specific header
selftests/vm/pkeys: make gcc check arguments of sigsafe_printf()
Sandipan Das <sandipan@linux.ibm.com>:
selftests: vm: pkeys: Use sane types for pkey register
selftests: vm: pkeys: add helpers for pkey bits
Ram Pai <linuxram@us.ibm.com>:
selftests/vm/pkeys: fix pkey_disable_clear()
selftests/vm/pkeys: fix assertion in pkey_disable_set/clear()
selftests/vm/pkeys: fix alloc_random_pkey() to make it really random
Sandipan Das <sandipan@linux.ibm.com>:
selftests: vm: pkeys: use the correct huge page size
Ram Pai <linuxram@us.ibm.com>:
selftests/vm/pkeys: introduce generic pkey abstractions
selftests/vm/pkeys: introduce powerpc support
"Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>:
selftests/vm/pkeys: fix number of reserved powerpc pkeys
Ram Pai <linuxram@us.ibm.com>:
selftests/vm/pkeys: fix assertion in test_pkey_alloc_exhaust()
selftests/vm/pkeys: improve checks to determine pkey support
selftests/vm/pkeys: associate key on a mapped page and detect access violation
selftests/vm/pkeys: associate key on a mapped page and detect write violation
selftests/vm/pkeys: detect write violation on a mapped access-denied-key page
selftests/vm/pkeys: introduce a sub-page allocator
selftests/vm/pkeys: test correct behaviour of pkey-0
selftests/vm/pkeys: override access right definitions on powerpc
Sandipan Das <sandipan@linux.ibm.com>:
selftests: vm: pkeys: use the correct page size on powerpc
selftests: vm: pkeys: fix multilib builds for x86
Jagadeesh Pagadala <jagdsh.linux@gmail.com>:
tools/testing/selftests/vm: remove duplicate headers
Subsystem: ubsan
Arnd Bergmann <arnd@arndb.de>:
lib/ubsan.c: fix gcc-10 warnings
Documentation/dev-tools/kcov.rst | 17
Documentation/features/debug/debug-vm-pgtable/arch-support.txt | 34
arch/arc/Kconfig | 1
arch/arc/include/asm/highmem.h | 20
arch/arc/mm/highmem.c | 34
arch/arm/include/asm/highmem.h | 9
arch/arm/include/asm/pgtable.h | 1
arch/arm/lib/uaccess_with_memcpy.c | 7
arch/arm/mach-sa1100/assabet.c | 2
arch/arm/mm/dump.c | 29
arch/arm/mm/fault-armv.c | 7
arch/arm/mm/fault.c | 22
arch/arm/mm/highmem.c | 41
arch/arm/mm/idmap.c | 3
arch/arm/mm/init.c | 2
arch/arm/mm/ioremap.c | 12
arch/arm/mm/mm.h | 2
arch/arm/mm/mmu.c | 35
arch/arm/mm/pgd.c | 40
arch/arm64/Kconfig | 1
arch/arm64/include/asm/kvm_mmu.h | 10
arch/arm64/include/asm/pgalloc.h | 10
arch/arm64/include/asm/pgtable-types.h | 5
arch/arm64/include/asm/pgtable.h | 37
arch/arm64/include/asm/stage2_pgtable.h | 48
arch/arm64/kernel/hibernate.c | 44
arch/arm64/kvm/mmu.c | 209
arch/arm64/mm/fault.c | 9
arch/arm64/mm/hugetlbpage.c | 15
arch/arm64/mm/kasan_init.c | 26
arch/arm64/mm/mmu.c | 52
arch/arm64/mm/pageattr.c | 7
arch/csky/include/asm/highmem.h | 12
arch/csky/mm/highmem.c | 64
arch/h8300/include/asm/pgtable.h | 1
arch/hexagon/include/asm/fixmap.h | 4
arch/hexagon/include/asm/pgtable.h | 1
arch/ia64/include/asm/pgalloc.h | 4
arch/ia64/include/asm/pgtable.h | 17
arch/ia64/mm/fault.c | 7
arch/ia64/mm/hugetlbpage.c | 18
arch/ia64/mm/init.c | 28
arch/microblaze/include/asm/highmem.h | 55
arch/microblaze/mm/highmem.c | 21
arch/microblaze/mm/init.c | 3
arch/mips/include/asm/highmem.h | 11
arch/mips/mm/cache.c | 6
arch/mips/mm/highmem.c | 62
arch/nds32/include/asm/highmem.h | 9
arch/nds32/mm/highmem.c | 49
arch/nios2/include/asm/pgtable.h | 3
arch/nios2/mm/fault.c | 9
arch/nios2/mm/ioremap.c | 6
arch/openrisc/include/asm/pgtable.h | 1
arch/openrisc/mm/fault.c | 10
arch/openrisc/mm/init.c | 4
arch/parisc/include/asm/cacheflush.h | 32
arch/powerpc/Kconfig | 1
arch/powerpc/include/asm/book3s/32/pgtable.h | 1
arch/powerpc/include/asm/book3s/64/hash.h | 4
arch/powerpc/include/asm/book3s/64/pgalloc.h | 4
arch/powerpc/include/asm/book3s/64/pgtable.h | 60
arch/powerpc/include/asm/book3s/64/radix.h | 6
arch/powerpc/include/asm/highmem.h | 56
arch/powerpc/include/asm/nohash/32/pgtable.h | 1
arch/powerpc/include/asm/nohash/64/pgalloc.h | 2
arch/powerpc/include/asm/nohash/64/pgtable-4k.h | 32
arch/powerpc/include/asm/nohash/64/pgtable.h | 6
arch/powerpc/include/asm/pgtable.h | 10
arch/powerpc/kvm/book3s_64_mmu_radix.c | 32
arch/powerpc/lib/code-patching.c | 7
arch/powerpc/mm/book3s64/hash_pgtable.c | 4
arch/powerpc/mm/book3s64/radix_pgtable.c | 26
arch/powerpc/mm/book3s64/subpage_prot.c | 6
arch/powerpc/mm/highmem.c | 26
arch/powerpc/mm/hugetlbpage.c | 28
arch/powerpc/mm/kasan/kasan_init_32.c | 2
arch/powerpc/mm/mem.c | 3
arch/powerpc/mm/nohash/book3e_pgtable.c | 15
arch/powerpc/mm/pgtable.c | 30
arch/powerpc/mm/pgtable_64.c | 10
arch/powerpc/mm/ptdump/hashpagetable.c | 20
arch/powerpc/mm/ptdump/ptdump.c | 12
arch/powerpc/platforms/pseries/hotplug-memory.c | 26
arch/powerpc/xmon/xmon.c | 27
arch/s390/Kconfig | 1
arch/sh/include/asm/pgtable-2level.h | 1
arch/sh/include/asm/pgtable-3level.h | 1
arch/sh/include/asm/pgtable_32.h | 5
arch/sh/include/asm/pgtable_64.h | 5
arch/sh/kernel/io_trapped.c | 7
arch/sh/mm/cache-sh4.c | 4
arch/sh/mm/cache-sh5.c | 7
arch/sh/mm/fault.c | 64
arch/sh/mm/hugetlbpage.c | 28
arch/sh/mm/init.c | 15
arch/sh/mm/kmap.c | 2
arch/sh/mm/tlbex_32.c | 6
arch/sh/mm/tlbex_64.c | 7
arch/sparc/include/asm/highmem.h | 29
arch/sparc/mm/highmem.c | 31
arch/sparc/mm/io-unit.c | 1
arch/sparc/mm/iommu.c | 1
arch/unicore32/include/asm/pgtable.h | 1
arch/unicore32/kernel/hibernate.c | 4
arch/x86/Kconfig | 1
arch/x86/include/asm/fixmap.h | 1
arch/x86/include/asm/highmem.h | 37
arch/x86/include/asm/pgtable_64.h | 6
arch/x86/mm/highmem_32.c | 52
arch/xtensa/include/asm/highmem.h | 31
arch/xtensa/mm/highmem.c | 28
drivers/block/zram/zcomp.c | 7
drivers/dax/dax-private.h | 1
drivers/dax/kmem.c | 28
drivers/gpu/drm/ttm/ttm_bo_util.c | 56
drivers/gpu/drm/vmwgfx/vmwgfx_blit.c | 17
drivers/rapidio/devices/rio_mport_cdev.c | 27
drivers/usb/core/hcd.c | 3
fs/binfmt_elf.c | 4
fs/binfmt_em86.c | 6
fs/binfmt_misc.c | 4
fs/binfmt_script.c | 6
fs/exec.c | 58
fs/fat/fatent.c | 103
fs/fat/inode.c | 6
fs/proc/array.c | 8
fs/seq_file.c | 7
include/asm-generic/5level-fixup.h | 59
include/asm-generic/pgtable-nop4d-hack.h | 64
include/asm-generic/pgtable-nopud.h | 4
include/drm/ttm/ttm_bo_api.h | 4
include/linux/binfmts.h | 3
include/linux/bitops.h | 2
include/linux/elfnote.h | 2
include/linux/highmem.h | 89
include/linux/ioport.h | 1
include/linux/memory_hotplug.h | 9
include/linux/mm.h | 12
include/linux/sched.h | 3
include/linux/seq_file.h | 19
init/Kconfig | 10
init/main.c | 10
kernel/kcov.c | 282 -
kernel/kexec_file.c | 5
kernel/kprobes.c | 34
kernel/relay.c | 22
kernel/user.c | 2
lib/Kconfig.debug | 44
lib/Makefile | 2
lib/flex_proportions.c | 7
lib/math/prime_numbers.c | 10
lib/percpu-refcount.c | 6
lib/strncpy_from_user.c | 1
lib/test_bitops.c | 60
lib/test_lockup.c | 2
lib/ubsan.c | 33
lib/zlib_inflate/inffast.c | 91
mm/Kconfig | 4
mm/Makefile | 1
mm/compaction.c | 2
mm/debug_vm_pgtable.c | 382 +
mm/filemap.c | 2
mm/frontswap.c | 6
mm/huge_memory.c | 2
mm/hugetlb.c | 16
mm/internal.h | 2
mm/kasan/init.c | 11
mm/ksm.c | 10
mm/list_lru.c | 2
mm/memblock.c | 2
mm/memcontrol.c | 4
mm/memory.c | 10
mm/memory_hotplug.c | 179
mm/mmap.c | 2
mm/mremap.c | 2
mm/page-writeback.c | 2
mm/slub.c | 2
mm/sparse.c | 2
mm/util.c | 22
mm/vmalloc.c | 2
mm/vmscan.c | 6
mm/vmstat.c | 32
mm/zbud.c | 2
scripts/checkpatch.pl | 62
scripts/get_maintainer.pl | 46
security/keys/internal.h | 11
security/keys/keyctl.c | 16
tools/testing/selftests/lib/config | 1
tools/testing/selftests/vm/.gitignore | 1
tools/testing/selftests/vm/Makefile | 75
tools/testing/selftests/vm/mremap_dontunmap.c | 1
tools/testing/selftests/vm/pkey-helpers.h | 557 +-
tools/testing/selftests/vm/pkey-powerpc.h | 153
tools/testing/selftests/vm/pkey-x86.h | 191
tools/testing/selftests/vm/protection_keys.c | 2370 ++++++++--
tools/testing/selftests/x86/.gitignore | 1
tools/testing/selftests/x86/Makefile | 2
tools/testing/selftests/x86/pkey-helpers.h | 219
tools/testing/selftests/x86/protection_keys.c | 1506 ------
200 files changed, 5182 insertions(+), 4033 deletions(-)
* incoming
@ 2020-06-03 22:55 Andrew Morton
From: Andrew Morton @ 2020-06-03 22:55 UTC
To: Linus Torvalds; +Cc: mm-commits, linux-mm
More mm/ work, plenty more to come.
131 patches, based on d6f9469a03d832dcd17041ed67774ffb5f3e73b3.
Subsystems affected by this patch series:
mm/slub
mm/memcg
mm/gup
mm/kasan
mm/pagealloc
mm/hugetlb
mm/vmscan
mm/tools
mm/mempolicy
mm/memblock
mm/hugetlbfs
mm/thp
mm/mmap
mm/kconfig
Subsystem: mm/slub
Wang Hai <wanghai38@huawei.com>:
mm/slub: fix a memory leak in sysfs_slab_add()
Subsystem: mm/memcg
Shakeel Butt <shakeelb@google.com>:
mm/memcg: optimize memory.numa_stat like memory.stat
Subsystem: mm/gup
John Hubbard <jhubbard@nvidia.com>:
Patch series "mm/gup, drm/i915: refactor gup_fast, convert to pin_user_pages()", v2:
mm/gup: move __get_user_pages_fast() down a few lines in gup.c
mm/gup: refactor and de-duplicate gup_fast() code
mm/gup: introduce pin_user_pages_fast_only()
drm/i915: convert get_user_pages() --> pin_user_pages()
mm/gup: might_lock_read(mmap_sem) in get_user_pages_fast()
Subsystem: mm/kasan
Daniel Axtens <dja@axtens.net>:
Patch series "Fix some incompatibilites between KASAN and FORTIFY_SOURCE", v4:
kasan: stop tests being eliminated as dead code with FORTIFY_SOURCE
string.h: fix incompatibility between FORTIFY_SOURCE and KASAN
Subsystem: mm/pagealloc
Michal Hocko <mhocko@suse.com>:
mm: clarify __GFP_MEMALLOC usage
Mike Rapoport <rppt@linux.ibm.com>:
Patch series "mm: rework free_area_init*() funcitons":
mm: memblock: replace dereferences of memblock_region.nid with API calls
mm: make early_pfn_to_nid() and related defintions close to each other
mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP option
mm: free_area_init: use maximal zone PFNs rather than zone sizes
mm: use free_area_init() instead of free_area_init_nodes()
alpha: simplify detection of memory zone boundaries
arm: simplify detection of memory zone boundaries
arm64: simplify detection of memory zone boundaries for UMA configs
csky: simplify detection of memory zone boundaries
m68k: mm: simplify detection of memory zone boundaries
parisc: simplify detection of memory zone boundaries
sparc32: simplify detection of memory zone boundaries
unicore32: simplify detection of memory zone boundaries
xtensa: simplify detection of memory zone boundaries
Baoquan He <bhe@redhat.com>:
mm: memmap_init: iterate over memblock regions rather that check each PFN
Mike Rapoport <rppt@linux.ibm.com>:
mm: remove early_pfn_in_nid() and CONFIG_NODES_SPAN_OTHER_NODES
mm: free_area_init: allow defining max_zone_pfn in descending order
mm: rename free_area_init_node() to free_area_init_memoryless_node()
mm: clean up free_area_init_node() and its helpers
mm: simplify find_min_pfn_with_active_regions()
docs/vm: update memory-models documentation
Wei Yang <richard.weiyang@gmail.com>:
Patch series "mm/page_alloc.c: cleanup on check page", v3:
mm/page_alloc.c: bad_[reason|flags] is not necessary when PageHWPoison
mm/page_alloc.c: bad_flags is not necessary for bad_page()
mm/page_alloc.c: rename free_pages_check_bad() to check_free_page_bad()
mm/page_alloc.c: rename free_pages_check() to check_free_page()
mm/page_alloc.c: extract check_[new|free]_page_bad() common part to page_bad_reason()
Roman Gushchin <guro@fb.com>:
mm,page_alloc,cma: conditionally prefer cma pageblocks for movable allocations
Baoquan He <bhe@redhat.com>:
mm/page_alloc.c: remove unused free_bootmem_with_active_regions
Patch series "improvements about lowmem_reserve and /proc/zoneinfo", v2:
mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when changing it
mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty
mm/vmstat.c: do not show lowmem reserve protection information of empty zone
Joonsoo Kim <iamjoonsoo.kim@lge.com>:
Patch series "integrate classzone_idx and high_zoneidx", v5:
mm/page_alloc: use ac->high_zoneidx for classzone_idx
mm/page_alloc: integrate classzone_idx and high_zoneidx
Wei Yang <richard.weiyang@gmail.com>:
mm/page_alloc.c: use NODE_MASK_NONE in build_zonelists()
mm: rename gfpflags_to_migratetype to gfp_migratetype for same convention
Sandipan Das <sandipan@linux.ibm.com>:
mm/page_alloc.c: reset numa stats for boot pagesets
Charan Teja Reddy <charante@codeaurora.org>:
mm, page_alloc: reset the zone->watermark_boost early
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/page_alloc: restrict and formalize compound_page_dtors[]
Daniel Jordan <daniel.m.jordan@oracle.com>:
Patch series "initialize deferred pages with interrupts enabled", v4:
mm/pagealloc.c: call touch_nmi_watchdog() on max order boundaries in deferred init
Pavel Tatashin <pasha.tatashin@soleen.com>:
mm: initialize deferred pages with interrupts enabled
mm: call cond_resched() from deferred_init_memmap()
Daniel Jordan <daniel.m.jordan@oracle.com>:
Patch series "padata: parallelize deferred page init", v3:
padata: remove exit routine
padata: initialize earlier
padata: allocate work structures for parallel jobs from a pool
padata: add basic support for multithreaded jobs
mm: don't track number of pages during deferred initialization
mm: parallelize deferred_init_memmap()
mm: make deferred init's max threads arch-specific
padata: document multithreaded jobs
Chen Tao <chentao107@huawei.com>:
mm/page_alloc.c: add missing newline
Subsystem: mm/hugetlb
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>:
Patch series "thp/khugepaged improvements and CoW semantics", v4:
khugepaged: add self test
khugepaged: do not stop collapse if less than half PTEs are referenced
khugepaged: drain all LRU caches before scanning pages
khugepaged: drain LRU add pagevec after swapin
khugepaged: allow to collapse a page shared across fork
khugepaged: allow to collapse PTE-mapped compound pages
thp: change CoW semantics for anon-THP
khugepaged: introduce 'max_ptes_shared' tunable
Mike Kravetz <mike.kravetz@oracle.com>:
Patch series "Clean up hugetlb boot command line processing", v4:
hugetlbfs: add arch_hugetlb_valid_size
hugetlbfs: move hugepagesz= parsing to arch independent code
hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate
hugetlbfs: clean up command line processing
hugetlbfs: fix changes to command line processing
Li Xinhai <lixinhai.lxh@gmail.com>:
mm/hugetlb: avoid unnecessary check on pud and pmd entry in huge_pte_offset
Anshuman Khandual <anshuman.khandual@arm.com>:
Patch series "mm/hugetlb: Add some new generic fallbacks", v3:
arm64/mm: drop __HAVE_ARCH_HUGE_PTEP_GET
mm/hugetlb: define a generic fallback for is_hugepage_only_range()
mm/hugetlb: define a generic fallback for arch_clear_hugepage_flags()
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: simplify calling a compound page destructor
Subsystem: mm/vmscan
Wei Yang <richard.weiyang@gmail.com>:
mm/vmscan.c: use update_lru_size() in update_lru_sizes()
Jaewon Kim <jaewon31.kim@samsung.com>:
mm/vmscan: count layzfree pages and fix nr_isolated_* mismatch
Maninder Singh <maninder1.s@samsung.com>:
mm/vmscan.c: change prototype for shrink_page_list
Qiwu Chen <qiwuchen55@gmail.com>:
mm/vmscan: update the comment of should_continue_reclaim()
Johannes Weiner <hannes@cmpxchg.org>:
Patch series "mm: memcontrol: charge swapin pages on instantiation", v2:
mm: fix NUMA node file count error in replace_page_cache()
mm: memcontrol: fix stat-corrupting race in charge moving
mm: memcontrol: drop @compound parameter from memcg charging API
mm: shmem: remove rare optimization when swapin races with hole punching
mm: memcontrol: move out cgroup swaprate throttling
mm: memcontrol: convert page cache to a new mem_cgroup_charge() API
mm: memcontrol: prepare uncharging for removal of private page type counters
mm: memcontrol: prepare move_account for removal of private page type counters
mm: memcontrol: prepare cgroup vmstat infrastructure for native anon counters
mm: memcontrol: switch to native NR_FILE_PAGES and NR_SHMEM counters
mm: memcontrol: switch to native NR_ANON_MAPPED counter
mm: memcontrol: switch to native NR_ANON_THPS counter
mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API
mm: memcontrol: drop unused try/commit/cancel charge API
mm: memcontrol: prepare swap controller setup for integration
mm: memcontrol: make swap tracking an integral part of memory control
mm: memcontrol: charge swapin pages on instantiation
Alex Shi <alex.shi@linux.alibaba.com>:
mm: memcontrol: document the new swap control behavior
Johannes Weiner <hannes@cmpxchg.org>:
mm: memcontrol: delete unused lrucare handling
mm: memcontrol: update page->mem_cgroup stability rules
mm: fix LRU balancing effect of new transparent huge pages
mm: keep separate anon and file statistics on page reclaim activity
mm: allow swappiness that prefers reclaiming anon over the file workingset
mm: fold and remove lru_cache_add_anon() and lru_cache_add_file()
mm: workingset: let cache workingset challenge anon
mm: remove use-once cache bias from LRU balancing
mm: vmscan: drop unnecessary div0 avoidance rounding in get_scan_count()
mm: base LRU balancing on an explicit cost model
mm: deactivations shouldn't bias the LRU balance
mm: only count actual rotations as LRU reclaim cost
mm: balance LRU lists based on relative thrashing
mm: vmscan: determine anon/file pressure balance at the reclaim root
mm: vmscan: reclaim writepage is IO cost
mm: vmscan: limit the range of LRU type balancing
Shakeel Butt <shakeelb@google.com>:
mm: swap: fix vmstats for huge pages
mm: swap: memcg: fix memcg stats for huge pages
Subsystem: mm/tools
Changhee Han <ch0.han@lge.com>:
tools/vm/page_owner_sort.c: filter out unneeded line
Subsystem: mm/mempolicy
Michal Hocko <mhocko@suse.com>:
mm, mempolicy: fix up gup usage in lookup_node
Subsystem: mm/memblock
chenqiwu <chenqiwu@xiaomi.com>:
include/linux/memblock.h: fix minor typo and unclear comment
Mike Rapoport <rppt@linux.ibm.com>:
sparc32: register memory occupied by kernel as memblock.memory
Subsystem: mm/hugetlbfs
Shijie Hu <hushijie3@huawei.com>:
hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
Subsystem: mm/thp
Yang Shi <yang.shi@linux.alibaba.com>:
mm: thp: don't need to drain lru cache when splitting and mlocking THP
Anshuman Khandual <anshuman.khandual@arm.com>:
Patch series "mm/thp: Rename pmd_mknotpresent() as pmd_mknotvalid()", v2:
powerpc/mm: drop platform defined pmd_mknotpresent()
mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
Subsystem: mm/mmap
Scott Cheloha <cheloha@linux.vnet.ibm.com>:
drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
Subsystem: mm/kconfig
Zong Li <zong.li@sifive.com>:
Patch series "Extract DEBUG_WX to shared use":
mm: add DEBUG_WX support
riscv: support DEBUG_WX
x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
Documentation/admin-guide/cgroup-v1/memory.rst | 19
Documentation/admin-guide/kernel-parameters.txt | 40
Documentation/admin-guide/mm/hugetlbpage.rst | 35
Documentation/admin-guide/mm/transhuge.rst | 7
Documentation/admin-guide/sysctl/vm.rst | 23
Documentation/core-api/padata.rst | 41
Documentation/features/vm/numa-memblock/arch-support.txt | 34
Documentation/vm/memory-model.rst | 9
Documentation/vm/page_owner.rst | 3
arch/alpha/mm/init.c | 16
arch/alpha/mm/numa.c | 22
arch/arc/include/asm/hugepage.h | 2
arch/arc/mm/init.c | 41
arch/arm/include/asm/hugetlb.h | 7
arch/arm/include/asm/pgtable-3level.h | 2
arch/arm/mm/init.c | 66
arch/arm64/Kconfig | 2
arch/arm64/Kconfig.debug | 29
arch/arm64/include/asm/hugetlb.h | 13
arch/arm64/include/asm/pgtable.h | 2
arch/arm64/mm/hugetlbpage.c | 48
arch/arm64/mm/init.c | 56
arch/arm64/mm/numa.c | 9
arch/c6x/mm/init.c | 8
arch/csky/kernel/setup.c | 26
arch/h8300/mm/init.c | 6
arch/hexagon/mm/init.c | 6
arch/ia64/Kconfig | 1
arch/ia64/include/asm/hugetlb.h | 5
arch/ia64/mm/contig.c | 2
arch/ia64/mm/discontig.c | 2
arch/m68k/mm/init.c | 6
arch/m68k/mm/mcfmmu.c | 9
arch/m68k/mm/motorola.c | 15
arch/m68k/mm/sun3mmu.c | 10
arch/microblaze/Kconfig | 1
arch/microblaze/mm/init.c | 2
arch/mips/Kconfig | 1
arch/mips/include/asm/hugetlb.h | 11
arch/mips/include/asm/pgtable.h | 2
arch/mips/loongson64/numa.c | 2
arch/mips/mm/init.c | 2
arch/mips/sgi-ip27/ip27-memory.c | 2
arch/nds32/mm/init.c | 11
arch/nios2/mm/init.c | 8
arch/openrisc/mm/init.c | 9
arch/parisc/include/asm/hugetlb.h | 10
arch/parisc/mm/init.c | 22
arch/powerpc/Kconfig | 10
arch/powerpc/include/asm/book3s/64/pgtable.h | 4
arch/powerpc/include/asm/hugetlb.h | 5
arch/powerpc/mm/hugetlbpage.c | 38
arch/powerpc/mm/mem.c | 2
arch/riscv/Kconfig | 2
arch/riscv/include/asm/hugetlb.h | 10
arch/riscv/include/asm/ptdump.h | 11
arch/riscv/mm/hugetlbpage.c | 44
arch/riscv/mm/init.c | 5
arch/s390/Kconfig | 1
arch/s390/include/asm/hugetlb.h | 8
arch/s390/mm/hugetlbpage.c | 34
arch/s390/mm/init.c | 2
arch/sh/Kconfig | 1
arch/sh/include/asm/hugetlb.h | 7
arch/sh/mm/init.c | 2
arch/sparc/Kconfig | 10
arch/sparc/include/asm/hugetlb.h | 10
arch/sparc/mm/init_32.c | 1
arch/sparc/mm/init_64.c | 67
arch/sparc/mm/srmmu.c | 21
arch/um/kernel/mem.c | 12
arch/unicore32/include/asm/memory.h | 2
arch/unicore32/include/mach/memory.h | 6
arch/unicore32/kernel/pci.c | 14
arch/unicore32/mm/init.c | 43
arch/x86/Kconfig | 11
arch/x86/Kconfig.debug | 27
arch/x86/include/asm/hugetlb.h | 10
arch/x86/include/asm/pgtable.h | 2
arch/x86/mm/hugetlbpage.c | 35
arch/x86/mm/init.c | 2
arch/x86/mm/init_64.c | 12
arch/x86/mm/kmmio.c | 2
arch/x86/mm/numa.c | 11
arch/xtensa/mm/init.c | 8
drivers/base/memory.c | 44
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 22
fs/cifs/file.c | 10
fs/fuse/dev.c | 2
fs/hugetlbfs/inode.c | 67
include/asm-generic/hugetlb.h | 2
include/linux/compaction.h | 9
include/linux/gfp.h | 7
include/linux/hugetlb.h | 16
include/linux/memblock.h | 15
include/linux/memcontrol.h | 102 -
include/linux/mm.h | 52
include/linux/mmzone.h | 46
include/linux/padata.h | 43
include/linux/string.h | 60
include/linux/swap.h | 17
include/linux/vm_event_item.h | 4
include/linux/vmstat.h | 2
include/trace/events/compaction.h | 22
include/trace/events/huge_memory.h | 3
include/trace/events/vmscan.h | 14
init/Kconfig | 17
init/main.c | 2
kernel/events/uprobes.c | 22
kernel/padata.c | 293 +++-
kernel/sysctl.c | 3
lib/test_kasan.c | 29
mm/Kconfig | 9
mm/Kconfig.debug | 32
mm/compaction.c | 70 -
mm/filemap.c | 55
mm/gup.c | 237 ++-
mm/huge_memory.c | 282 ----
mm/hugetlb.c | 260 ++-
mm/internal.h | 25
mm/khugepaged.c | 316 ++--
mm/memblock.c | 19
mm/memcontrol.c | 642 +++------
mm/memory.c | 103 -
mm/memory_hotplug.c | 10
mm/mempolicy.c | 5
mm/migrate.c | 30
mm/oom_kill.c | 4
mm/page_alloc.c | 735 ++++------
mm/page_owner.c | 7
mm/pgtable-generic.c | 2
mm/rmap.c | 53
mm/shmem.c | 156 --
mm/slab.c | 4
mm/slub.c | 8
mm/swap.c | 199 +-
mm/swap_cgroup.c | 10
mm/swap_state.c | 110 -
mm/swapfile.c | 39
mm/userfaultfd.c | 15
mm/vmscan.c | 344 ++--
mm/vmstat.c | 16
mm/workingset.c | 23
tools/testing/selftests/vm/.gitignore | 1
tools/testing/selftests/vm/Makefile | 1
tools/testing/selftests/vm/khugepaged.c | 1035 +++++++++++++++
tools/vm/page_owner_sort.c | 5
147 files changed, 3876 insertions(+), 3108 deletions(-)
* Re: incoming
2020-06-02 21:38 ` incoming Andrew Morton
@ 2020-06-02 22:18 ` Linus Torvalds
0 siblings, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2020-06-02 22:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: mm-commits, Linux-MM
On Tue, Jun 2, 2020 at 2:38 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 2 Jun 2020 13:45:49 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >
> > Hmm. I have no issues with conflicts, and already took your previous series.
>
> Well that's odd.
I meant "I saw the conflicts and had no issue with them". Nothing odd.
And I actually much prefer seeing conflicts from your series (against
other pulls I've done) over having you delay your patch bombs because
of any fear of them.
Linus
* Re: incoming
2020-06-02 20:45 ` incoming Linus Torvalds
@ 2020-06-02 21:38 ` Andrew Morton
2020-06-02 22:18 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2020-06-02 21:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, Linux-MM
On Tue, 2 Jun 2020 13:45:49 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Tue, Jun 2, 2020 at 1:08 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > The local_lock merge made rather a mess of all of this. I'm
> > cooking up a full resend of the same material.
>
> Hmm. I have no issues with conflicts, and already took your previous series.
Well that's odd.
> I've pushed it out now - does my tree match what you expect?
Yup, thanks.
* Re: incoming
2020-06-02 20:08 ` incoming Andrew Morton
@ 2020-06-02 20:45 ` Linus Torvalds
2020-06-02 21:38 ` incoming Andrew Morton
0 siblings, 1 reply; 786+ messages in thread
From: Linus Torvalds @ 2020-06-02 20:45 UTC (permalink / raw)
To: Andrew Morton; +Cc: mm-commits, Linux-MM
On Tue, Jun 2, 2020 at 1:08 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> The local_lock merge made rather a mess of all of this. I'm
> cooking up a full resend of the same material.
Hmm. I have no issues with conflicts, and already took your previous series.
I've pushed it out now - does my tree match what you expect?
Linus
* incoming
@ 2020-06-02 20:09 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-06-02 20:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
A few little subsystems and a start of a lot of MM patches.
128 patches, based on f359287765c04711ff54fbd11645271d8e5ff763:
Subsystems affected by this patch series:
squashfs
ocfs2
parisc
vfs
mm/slab-generic
mm/slub
mm/debug
mm/pagecache
mm/gup
mm/swap
mm/memcg
mm/pagemap
mm/memory-failure
mm/vmalloc
mm/kasan
Subsystem: squashfs
Philippe Liard <pliard@google.com>:
squashfs: migrate from ll_rw_block usage to BIO
Subsystem: ocfs2
Jules Irenge <jbi.octave@gmail.com>:
ocfs2: add missing annotation for dlm_empty_lockres()
Gang He <ghe@suse.com>:
ocfs2: mount shared volume without ha stack
Subsystem: parisc
Andrew Morton <akpm@linux-foundation.org>:
arch/parisc/include/asm/pgtable.h: remove unused `old_pte'
Subsystem: vfs
Jeff Layton <jlayton@redhat.com>:
Patch series "vfs: have syncfs() return error when there are writeback:
vfs: track per-sb writeback errors and report them to syncfs
fs/buffer.c: record blockdev write errors in super_block that it backs
Subsystem: mm/slab-generic
Vlastimil Babka <vbabka@suse.cz>:
usercopy: mark dma-kmalloc caches as usercopy caches
Subsystem: mm/slub
Dongli Zhang <dongli.zhang@oracle.com>:
mm/slub.c: fix corrupted freechain in deactivate_slab()
Christoph Lameter <cl@linux.com>:
slub: Remove userspace notifier for cache add/remove
Christopher Lameter <cl@linux.com>:
slub: remove kmalloc under list_lock from list_slab_objects() V2
Qian Cai <cai@lca.pw>:
mm/slub: fix stack overruns with SLUB_STATS
Andrew Morton <akpm@linux-foundation.org>:
Documentation/vm/slub.rst: s/Toggle/Enable/
Subsystem: mm/debug
Vlastimil Babka <vbabka@suse.cz>:
mm, dump_page(): do not crash with invalid mapping pointer
Subsystem: mm/pagecache
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "Change readahead API", v11:
mm: move readahead prototypes from mm.h
mm: return void from various readahead functions
mm: ignore return value of ->readpages
mm: move readahead nr_pages check into read_pages
mm: add new readahead_control API
mm: use readahead_control to pass arguments
mm: rename various 'offset' parameters to 'index'
mm: rename readahead loop variable to 'i'
mm: remove 'page_offset' from readahead loop
mm: put readahead pages in cache earlier
mm: add readahead address space operation
mm: move end_index check out of readahead loop
mm: add page_cache_readahead_unbounded
mm: document why we don't set PageReadahead
mm: use memalloc_nofs_save in readahead path
fs: convert mpage_readpages to mpage_readahead
btrfs: convert from readpages to readahead
erofs: convert uncompressed files from readpages to readahead
erofs: convert compressed files from readpages to readahead
ext4: convert from readpages to readahead
ext4: pass the inode to ext4_mpage_readpages
f2fs: convert from readpages to readahead
f2fs: pass the inode to f2fs_mpage_readpages
fuse: convert from readpages to readahead
iomap: convert from readpages to readahead
Guoqing Jiang <guoqing.jiang@cloud.ionos.com>:
Patch series "Introduce attach/detach_page_private to cleanup code":
include/linux/pagemap.h: introduce attach/detach_page_private
md: remove __clear_page_buffers and use attach/detach_page_private
btrfs: use attach/detach_page_private
fs/buffer.c: use attach/detach_page_private
f2fs: use attach/detach_page_private
iomap: use attach/detach_page_private
ntfs: replace attach_page_buffers with attach_page_private
orangefs: use attach/detach_page_private
buffer_head.h: remove attach_page_buffers
mm/migrate.c: call detach_page_private to cleanup code
mm_types.h: change set_page_private to inline function
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/filemap.c: remove misleading comment
Chao Yu <yuchao0@huawei.com>:
mm/page-writeback.c: remove unused variable
NeilBrown <neilb@suse.de>:
mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE
mm/writeback: discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead
Subsystem: mm/gup
Souptick Joarder <jrdr.linux@gmail.com>:
mm/gup.c: update the documentation
John Hubbard <jhubbard@nvidia.com>:
mm/gup: introduce pin_user_pages_unlocked
ivtv: convert get_user_pages() --> pin_user_pages()
Miles Chen <miles.chen@mediatek.com>:
mm/gup.c: further document vma_permits_fault()
Subsystem: mm/swap
chenqiwu <chenqiwu@xiaomi.com>:
mm/swapfile: use list_{prev,next}_entry() instead of open-coding
Qian Cai <cai@lca.pw>:
mm/swap_state: fix a data race in swapin_nr_pages
Andrea Righi <andrea.righi@canonical.com>:
mm: swap: properly update readahead statistics in unuse_pte_range()
Wei Yang <richard.weiyang@gmail.com>:
mm/swapfile.c: offset is only used when there is more slots
mm/swapfile.c: explicitly show ssd/non-ssd is handled mutually exclusive
mm/swapfile.c: remove the unnecessary goto for SSD case
mm/swapfile.c: simplify the calculation of n_goal
mm/swapfile.c: remove the extra check in scan_swap_map_slots()
mm/swapfile.c: found_free could be represented by (tmp < max)
mm/swapfile.c: tmp is always smaller than max
mm/swapfile.c: omit a duplicate code by compare tmp and max first
Huang Ying <ying.huang@intel.com>:
swap: try to scan more free slots even when fragmented
Wei Yang <richard.weiyang@gmail.com>:
mm/swapfile.c: classify SWAP_MAP_XXX to make it more readable
mm/swapfile.c: __swap_entry_free() always free 1 entry
Huang Ying <ying.huang@intel.com>:
mm/swapfile.c: use prandom_u32_max()
swap: reduce lock contention on swap cache from swap slots allocation
Randy Dunlap <rdunlap@infradead.org>:
mm: swapfile: fix /proc/swaps heading and Size/Used/Priority alignment
Miaohe Lin <linmiaohe@huawei.com>:
include/linux/swap.h: delete meaningless __add_to_swap_cache() declaration
Subsystem: mm/memcg
Yafang Shao <laoar.shao@gmail.com>:
mm, memcg: add workingset_restore in memory.stat
Kaixu Xia <kaixuxia@tencent.com>:
mm: memcontrol: simplify value comparison between count and limit
Shakeel Butt <shakeelb@google.com>:
memcg: expose root cgroup's memory.stat
Jakub Kicinski <kuba@kernel.org>:
Patch series "memcg: Slow down swap allocation as the available space gets:
mm/memcg: prepare for swap over-high accounting and penalty calculation
mm/memcg: move penalty delay clamping out of calculate_high_delay()
mm/memcg: move cgroup high memory limit setting into struct page_counter
mm/memcg: automatically penalize tasks with high swap use
Zefan Li <lizefan@huawei.com>:
memcg: fix memcg_kmem_bypass() for remote memcg charging
Subsystem: mm/pagemap
Steven Price <steven.price@arm.com>:
Patch series "Fix W+X debug feature on x86":
x86: mm: ptdump: calculate effective permissions correctly
mm: ptdump: expand type of 'val' in note_page()
Huang Ying <ying.huang@intel.com>:
/proc/PID/smaps: Add PMD migration entry parsing
chenqiwu <chenqiwu@xiaomi.com>:
mm/memory: remove unnecessary pte_devmap case in copy_one_pte()
Subsystem: mm/memory-failure
Wetp Zhang <wetp.zy@linux.alibaba.com>:
mm, memory_failure: don't send BUS_MCEERR_AO for action required error
Subsystem: mm/vmalloc
Christoph Hellwig <hch@lst.de>:
Patch series "decruft the vmalloc API", v2:
x86/hyperv: use vmalloc_exec for the hypercall page
x86: fix vmap arguments in map_irq_stack
staging: android: ion: use vmap instead of vm_map_ram
staging: media: ipu3: use vmap instead of reimplementing it
dma-mapping: use vmap insted of reimplementing it
powerpc: add an ioremap_phb helper
powerpc: remove __ioremap_at and __iounmap_at
mm: remove __get_vm_area
mm: unexport unmap_kernel_range_noflush
mm: rename CONFIG_PGTABLE_MAPPING to CONFIG_ZSMALLOC_PGTABLE_MAPPING
mm: only allow page table mappings for built-in zsmalloc
mm: pass addr as unsigned long to vb_free
mm: remove vmap_page_range_noflush and vunmap_page_range
mm: rename vmap_page_range to map_kernel_range
mm: don't return the number of pages from map_kernel_range{,_noflush}
mm: remove map_vm_range
mm: remove unmap_vmap_area
mm: remove the prot argument from vm_map_ram
mm: enforce that vmap can't map pages executable
gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc
mm: remove the pgprot argument to __vmalloc
mm: remove the prot argument to __vmalloc_node
mm: remove both instances of __vmalloc_node_flags
mm: remove __vmalloc_node_flags_caller
mm: switch the test_vmalloc module to use __vmalloc_node
mm: remove vmalloc_user_node_flags
arm64: use __vmalloc_node in arch_alloc_vmap_stack
powerpc: use __vmalloc_node in alloc_vm_stack
s390: use __vmalloc_node in stack_alloc
Joerg Roedel <jroedel@suse.de>:
Patch series "mm: Get rid of vmalloc_sync_(un)mappings()", v3:
mm: add functions to track page directory modifications
mm/vmalloc: track which page-table levels were modified
mm/ioremap: track which page-table levels were modified
x86/mm/64: implement arch_sync_kernel_mappings()
x86/mm/32: implement arch_sync_kernel_mappings()
mm: remove vmalloc_sync_(un)mappings()
x86/mm: remove vmalloc faulting
Subsystem: mm/kasan
Andrey Konovalov <andreyknvl@google.com>:
kasan: fix clang compilation warning due to stack protector
Kees Cook <keescook@chromium.org>:
ubsan: entirely disable alignment checks under UBSAN_TRAP
Jing Xia <jing.xia@unisoc.com>:
mm/mm_init.c: report kasan-tag information stored in page->flags
Andrey Konovalov <andreyknvl@google.com>:
kasan: move kasan_report() into report.c
Documentation/admin-guide/cgroup-v2.rst | 24 +
Documentation/core-api/cachetlb.rst | 2
Documentation/filesystems/locking.rst | 6
Documentation/filesystems/proc.rst | 4
Documentation/filesystems/vfs.rst | 15
Documentation/vm/slub.rst | 2
arch/arm/configs/omap2plus_defconfig | 2
arch/arm64/include/asm/pgtable.h | 3
arch/arm64/include/asm/vmap_stack.h | 6
arch/arm64/mm/dump.c | 2
arch/parisc/include/asm/pgtable.h | 2
arch/powerpc/include/asm/io.h | 10
arch/powerpc/include/asm/pci-bridge.h | 2
arch/powerpc/kernel/irq.c | 5
arch/powerpc/kernel/isa-bridge.c | 28 +
arch/powerpc/kernel/pci_64.c | 56 +-
arch/powerpc/mm/ioremap_64.c | 50 --
arch/riscv/include/asm/pgtable.h | 4
arch/riscv/mm/ptdump.c | 2
arch/s390/kernel/setup.c | 9
arch/sh/kernel/cpu/sh4/sq.c | 3
arch/x86/hyperv/hv_init.c | 5
arch/x86/include/asm/kvm_host.h | 3
arch/x86/include/asm/pgtable-2level_types.h | 2
arch/x86/include/asm/pgtable-3level_types.h | 2
arch/x86/include/asm/pgtable_64_types.h | 2
arch/x86/include/asm/pgtable_types.h | 8
arch/x86/include/asm/switch_to.h | 23 -
arch/x86/kernel/irq_64.c | 2
arch/x86/kernel/setup_percpu.c | 6
arch/x86/kvm/svm/sev.c | 3
arch/x86/mm/dump_pagetables.c | 35 +
arch/x86/mm/fault.c | 196 ----------
arch/x86/mm/init_64.c | 5
arch/x86/mm/pti.c | 8
arch/x86/mm/tlb.c | 37 -
block/blk-core.c | 1
drivers/acpi/apei/ghes.c | 6
drivers/base/node.c | 2
drivers/block/drbd/drbd_bitmap.c | 4
drivers/block/loop.c | 2
drivers/dax/device.c | 1
drivers/gpu/drm/drm_scatter.c | 11
drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4
drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c | 2
drivers/lightnvm/pblk-init.c | 5
drivers/md/dm-bufio.c | 4
drivers/md/md-bitmap.c | 12
drivers/media/common/videobuf2/videobuf2-dma-sg.c | 3
drivers/media/common/videobuf2/videobuf2-vmalloc.c | 3
drivers/media/pci/ivtv/ivtv-udma.c | 19 -
drivers/media/pci/ivtv/ivtv-yuv.c | 17
drivers/media/pci/ivtv/ivtvfb.c | 4
drivers/mtd/ubi/io.c | 4
drivers/pcmcia/electra_cf.c | 45 --
drivers/scsi/sd_zbc.c | 3
drivers/staging/android/ion/ion_heap.c | 4
drivers/staging/media/ipu3/ipu3-css-pool.h | 4
drivers/staging/media/ipu3/ipu3-dmamap.c | 30 -
fs/block_dev.c | 7
fs/btrfs/disk-io.c | 4
fs/btrfs/extent_io.c | 64 ---
fs/btrfs/extent_io.h | 3
fs/btrfs/inode.c | 39 --
fs/buffer.c | 23 -
fs/erofs/data.c | 41 --
fs/erofs/decompressor.c | 2
fs/erofs/zdata.c | 31 -
fs/exfat/inode.c | 7
fs/ext2/inode.c | 10
fs/ext4/ext4.h | 5
fs/ext4/inode.c | 25 -
fs/ext4/readpage.c | 25 -
fs/ext4/verity.c | 35 -
fs/f2fs/data.c | 56 +-
fs/f2fs/f2fs.h | 14
fs/f2fs/verity.c | 35 -
fs/fat/inode.c | 7
fs/file_table.c | 1
fs/fs-writeback.c | 1
fs/fuse/file.c | 100 +----
fs/gfs2/aops.c | 23 -
fs/gfs2/dir.c | 9
fs/gfs2/quota.c | 2
fs/hpfs/file.c | 7
fs/iomap/buffered-io.c | 113 +----
fs/iomap/trace.h | 2
fs/isofs/inode.c | 7
fs/jfs/inode.c | 7
fs/mpage.c | 38 --
fs/nfs/blocklayout/extent_tree.c | 2
fs/nfs/internal.h | 10
fs/nfs/write.c | 4
fs/nfsd/vfs.c | 9
fs/nilfs2/inode.c | 15
fs/ntfs/aops.c | 2
fs/ntfs/malloc.h | 2
fs/ntfs/mft.c | 2
fs/ocfs2/aops.c | 34 -
fs/ocfs2/dlm/dlmmaster.c | 1
fs/ocfs2/ocfs2.h | 4
fs/ocfs2/slot_map.c | 46 +-
fs/ocfs2/super.c | 21 +
fs/omfs/file.c | 7
fs/open.c | 3
fs/orangefs/inode.c | 32 -
fs/proc/meminfo.c | 3
fs/proc/task_mmu.c | 16
fs/qnx6/inode.c | 7
fs/reiserfs/inode.c | 8
fs/squashfs/block.c | 273 +++++++-------
fs/squashfs/decompressor.h | 5
fs/squashfs/decompressor_multi.c | 9
fs/squashfs/decompressor_multi_percpu.c | 17
fs/squashfs/decompressor_single.c | 9
fs/squashfs/lz4_wrapper.c | 17
fs/squashfs/lzo_wrapper.c | 17
fs/squashfs/squashfs.h | 4
fs/squashfs/xz_wrapper.c | 51 +-
fs/squashfs/zlib_wrapper.c | 63 +--
fs/squashfs/zstd_wrapper.c | 62 +--
fs/sync.c | 6
fs/ubifs/debug.c | 2
fs/ubifs/lprops.c | 2
fs/ubifs/lpt_commit.c | 4
fs/ubifs/orphan.c | 2
fs/udf/inode.c | 7
fs/xfs/kmem.c | 2
fs/xfs/xfs_aops.c | 13
fs/xfs/xfs_buf.c | 2
fs/zonefs/super.c | 7
include/asm-generic/5level-fixup.h | 5
include/asm-generic/pgtable.h | 27 +
include/linux/buffer_head.h | 8
include/linux/fs.h | 18
include/linux/iomap.h | 3
include/linux/memcontrol.h | 4
include/linux/mm.h | 67 ++-
include/linux/mm_types.h | 6
include/linux/mmzone.h | 1
include/linux/mpage.h | 4
include/linux/page_counter.h | 8
include/linux/pagemap.h | 193 ++++++++++
include/linux/ptdump.h | 3
include/linux/sched.h | 3
include/linux/swap.h | 17
include/linux/vmalloc.h | 49 +-
include/linux/zsmalloc.h | 2
include/trace/events/erofs.h | 6
include/trace/events/f2fs.h | 6
include/trace/events/writeback.h | 5
kernel/bpf/core.c | 6
kernel/bpf/syscall.c | 29 -
kernel/dma/remap.c | 48 --
kernel/groups.c | 2
kernel/module.c | 3
kernel/notifier.c | 1
kernel/sys.c | 2
kernel/trace/trace.c | 12
lib/Kconfig.ubsan | 2
lib/ioremap.c | 46 +-
lib/test_vmalloc.c | 26 -
mm/Kconfig | 4
mm/debug.c | 56 ++
mm/fadvise.c | 6
mm/filemap.c | 1
mm/gup.c | 77 +++-
mm/internal.h | 14
mm/kasan/Makefile | 21 -
mm/kasan/common.c | 19 -
mm/kasan/report.c | 22 +
mm/memcontrol.c | 198 +++++++---
mm/memory-failure.c | 15
mm/memory.c | 2
mm/migrate.c | 9
mm/mm_init.c | 16
mm/nommu.c | 52 +-
mm/page-writeback.c | 62 ++-
mm/page_alloc.c | 7
mm/percpu.c | 2
mm/ptdump.c | 17
mm/readahead.c | 349 ++++++++++--------
mm/slab_common.c | 3
mm/slub.c | 67 ++-
mm/swap_state.c | 5
mm/swapfile.c | 194 ++++++----
mm/util.c | 2
mm/vmalloc.c | 399 ++++++++-------------
mm/vmscan.c | 4
mm/vmstat.c | 11
mm/zsmalloc.c | 12
net/bridge/netfilter/ebtables.c | 6
net/ceph/ceph_common.c | 3
sound/core/memalloc.c | 2
sound/core/pcm_memory.c | 2
195 files changed, 2292 insertions(+), 2288 deletions(-)
* Re: incoming
2020-06-02 4:44 incoming Andrew Morton
@ 2020-06-02 20:08 ` Andrew Morton
2020-06-02 20:45 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2020-06-02 20:08 UTC (permalink / raw)
To: Linus Torvalds, mm-commits, linux-mm
The local_lock merge made rather a mess of all of this. I'm
cooking up a full resend of the same material.
* incoming
@ 2020-06-02 4:44 Andrew Morton
2020-06-02 20:08 ` incoming Andrew Morton
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2020-06-02 4:44 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
A few little subsystems and a start of a lot of MM patches.
128 patches, based on 9bf9511e3d9f328c03f6f79bfb741c3d18f2f2c0:
Subsystems affected by this patch series:
squashfs
ocfs2
parisc
vfs
mm/slab-generic
mm/slub
mm/debug
mm/pagecache
mm/gup
mm/swap
mm/memcg
mm/pagemap
mm/memory-failure
mm/vmalloc
mm/kasan
Subsystem: squashfs
Philippe Liard <pliard@google.com>:
squashfs: migrate from ll_rw_block usage to BIO
Subsystem: ocfs2
Jules Irenge <jbi.octave@gmail.com>:
ocfs2: add missing annotation for dlm_empty_lockres()
Gang He <ghe@suse.com>:
ocfs2: mount shared volume without ha stack
Subsystem: parisc
Andrew Morton <akpm@linux-foundation.org>:
arch/parisc/include/asm/pgtable.h: remove unused `old_pte'
Subsystem: vfs
Jeff Layton <jlayton@redhat.com>:
Patch series "vfs: have syncfs() return error when there are writeback:
vfs: track per-sb writeback errors and report them to syncfs
fs/buffer.c: record blockdev write errors in super_block that it backs
Subsystem: mm/slab-generic
Vlastimil Babka <vbabka@suse.cz>:
usercopy: mark dma-kmalloc caches as usercopy caches
Subsystem: mm/slub
Dongli Zhang <dongli.zhang@oracle.com>:
mm/slub.c: fix corrupted freechain in deactivate_slab()
Christoph Lameter <cl@linux.com>:
slub: Remove userspace notifier for cache add/remove
Christopher Lameter <cl@linux.com>:
slub: remove kmalloc under list_lock from list_slab_objects() V2
Qian Cai <cai@lca.pw>:
mm/slub: fix stack overruns with SLUB_STATS
Andrew Morton <akpm@linux-foundation.org>:
Documentation/vm/slub.rst: s/Toggle/Enable/
Subsystem: mm/debug
Vlastimil Babka <vbabka@suse.cz>:
mm, dump_page(): do not crash with invalid mapping pointer
Subsystem: mm/pagecache
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "Change readahead API", v11:
mm: move readahead prototypes from mm.h
mm: return void from various readahead functions
mm: ignore return value of ->readpages
mm: move readahead nr_pages check into read_pages
mm: add new readahead_control API
mm: use readahead_control to pass arguments
mm: rename various 'offset' parameters to 'index'
mm: rename readahead loop variable to 'i'
mm: remove 'page_offset' from readahead loop
mm: put readahead pages in cache earlier
mm: add readahead address space operation
mm: move end_index check out of readahead loop
mm: add page_cache_readahead_unbounded
mm: document why we don't set PageReadahead
mm: use memalloc_nofs_save in readahead path
fs: convert mpage_readpages to mpage_readahead
btrfs: convert from readpages to readahead
erofs: convert uncompressed files from readpages to readahead
erofs: convert compressed files from readpages to readahead
ext4: convert from readpages to readahead
ext4: pass the inode to ext4_mpage_readpages
f2fs: convert from readpages to readahead
f2fs: pass the inode to f2fs_mpage_readpages
fuse: convert from readpages to readahead
iomap: convert from readpages to readahead
Guoqing Jiang <guoqing.jiang@cloud.ionos.com>:
Patch series "Introduce attach/detach_page_private to cleanup code":
include/linux/pagemap.h: introduce attach/detach_page_private
md: remove __clear_page_buffers and use attach/detach_page_private
btrfs: use attach/detach_page_private
fs/buffer.c: use attach/detach_page_private
f2fs: use attach/detach_page_private
iomap: use attach/detach_page_private
ntfs: replace attach_page_buffers with attach_page_private
orangefs: use attach/detach_page_private
buffer_head.h: remove attach_page_buffers
mm/migrate.c: call detach_page_private to cleanup code
mm_types.h: change set_page_private to inline function
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/filemap.c: remove misleading comment
Chao Yu <yuchao0@huawei.com>:
mm/page-writeback.c: remove unused variable
NeilBrown <neilb@suse.de>:
mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE
mm/writeback: discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead
Subsystem: mm/gup
Souptick Joarder <jrdr.linux@gmail.com>:
mm/gup.c: update the documentation
John Hubbard <jhubbard@nvidia.com>:
mm/gup: introduce pin_user_pages_unlocked
ivtv: convert get_user_pages() --> pin_user_pages()
Miles Chen <miles.chen@mediatek.com>:
mm/gup.c: further document vma_permits_fault()
Subsystem: mm/swap
chenqiwu <chenqiwu@xiaomi.com>:
mm/swapfile: use list_{prev,next}_entry() instead of open-coding
Qian Cai <cai@lca.pw>:
mm/swap_state: fix a data race in swapin_nr_pages
Andrea Righi <andrea.righi@canonical.com>:
mm: swap: properly update readahead statistics in unuse_pte_range()
Wei Yang <richard.weiyang@gmail.com>:
mm/swapfile.c: offset is only used when there is more slots
mm/swapfile.c: explicitly show ssd/non-ssd is handled mutually exclusive
mm/swapfile.c: remove the unnecessary goto for SSD case
mm/swapfile.c: simplify the calculation of n_goal
mm/swapfile.c: remove the extra check in scan_swap_map_slots()
mm/swapfile.c: found_free could be represented by (tmp < max)
mm/swapfile.c: tmp is always smaller than max
mm/swapfile.c: omit a duplicate code by compare tmp and max first
Huang Ying <ying.huang@intel.com>:
swap: try to scan more free slots even when fragmented
Wei Yang <richard.weiyang@gmail.com>:
mm/swapfile.c: classify SWAP_MAP_XXX to make it more readable
mm/swapfile.c: __swap_entry_free() always free 1 entry
Huang Ying <ying.huang@intel.com>:
mm/swapfile.c: use prandom_u32_max()
swap: reduce lock contention on swap cache from swap slots allocation
Randy Dunlap <rdunlap@infradead.org>:
mm: swapfile: fix /proc/swaps heading and Size/Used/Priority alignment
Miaohe Lin <linmiaohe@huawei.com>:
include/linux/swap.h: delete meaningless __add_to_swap_cache() declaration
Subsystem: mm/memcg
Yafang Shao <laoar.shao@gmail.com>:
mm, memcg: add workingset_restore in memory.stat
Kaixu Xia <kaixuxia@tencent.com>:
mm: memcontrol: simplify value comparison between count and limit
Shakeel Butt <shakeelb@google.com>:
memcg: expose root cgroup's memory.stat
Jakub Kicinski <kuba@kernel.org>:
Patch series "memcg: Slow down swap allocation as the available space gets:
mm/memcg: prepare for swap over-high accounting and penalty calculation
mm/memcg: move penalty delay clamping out of calculate_high_delay()
mm/memcg: move cgroup high memory limit setting into struct page_counter
mm/memcg: automatically penalize tasks with high swap use
Zefan Li <lizefan@huawei.com>:
memcg: fix memcg_kmem_bypass() for remote memcg charging
Subsystem: mm/pagemap
Steven Price <steven.price@arm.com>:
Patch series "Fix W+X debug feature on x86":
x86: mm: ptdump: calculate effective permissions correctly
mm: ptdump: expand type of 'val' in note_page()
Huang Ying <ying.huang@intel.com>:
/proc/PID/smaps: Add PMD migration entry parsing
chenqiwu <chenqiwu@xiaomi.com>:
mm/memory: remove unnecessary pte_devmap case in copy_one_pte()
Subsystem: mm/memory-failure
Wetp Zhang <wetp.zy@linux.alibaba.com>:
mm, memory_failure: don't send BUS_MCEERR_AO for action required error
Subsystem: mm/vmalloc
Christoph Hellwig <hch@lst.de>:
Patch series "decruft the vmalloc API", v2:
x86/hyperv: use vmalloc_exec for the hypercall page
x86: fix vmap arguments in map_irq_stack
staging: android: ion: use vmap instead of vm_map_ram
staging: media: ipu3: use vmap instead of reimplementing it
dma-mapping: use vmap insted of reimplementing it
powerpc: add an ioremap_phb helper
powerpc: remove __ioremap_at and __iounmap_at
mm: remove __get_vm_area
mm: unexport unmap_kernel_range_noflush
mm: rename CONFIG_PGTABLE_MAPPING to CONFIG_ZSMALLOC_PGTABLE_MAPPING
mm: only allow page table mappings for built-in zsmalloc
mm: pass addr as unsigned long to vb_free
mm: remove vmap_page_range_noflush and vunmap_page_range
mm: rename vmap_page_range to map_kernel_range
mm: don't return the number of pages from map_kernel_range{,_noflush}
mm: remove map_vm_range
mm: remove unmap_vmap_area
mm: remove the prot argument from vm_map_ram
mm: enforce that vmap can't map pages executable
gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc
mm: remove the pgprot argument to __vmalloc
mm: remove the prot argument to __vmalloc_node
mm: remove both instances of __vmalloc_node_flags
mm: remove __vmalloc_node_flags_caller
mm: switch the test_vmalloc module to use __vmalloc_node
mm: remove vmalloc_user_node_flags
arm64: use __vmalloc_node in arch_alloc_vmap_stack
powerpc: use __vmalloc_node in alloc_vm_stack
s390: use __vmalloc_node in stack_alloc
Joerg Roedel <jroedel@suse.de>:
Patch series "mm: Get rid of vmalloc_sync_(un)mappings()", v3:
mm: add functions to track page directory modifications
mm/vmalloc: track which page-table levels were modified
mm/ioremap: track which page-table levels were modified
x86/mm/64: implement arch_sync_kernel_mappings()
x86/mm/32: implement arch_sync_kernel_mappings()
mm: remove vmalloc_sync_(un)mappings()
x86/mm: remove vmalloc faulting
Subsystem: mm/kasan
Andrey Konovalov <andreyknvl@google.com>:
kasan: fix clang compilation warning due to stack protector
Kees Cook <keescook@chromium.org>:
ubsan: entirely disable alignment checks under UBSAN_TRAP
Jing Xia <jing.xia@unisoc.com>:
mm/mm_init.c: report kasan-tag information stored in page->flags
Andrey Konovalov <andreyknvl@google.com>:
kasan: move kasan_report() into report.c
Documentation/admin-guide/cgroup-v2.rst | 24 +
Documentation/core-api/cachetlb.rst | 2
Documentation/filesystems/locking.rst | 6
Documentation/filesystems/proc.rst | 4
Documentation/filesystems/vfs.rst | 15
Documentation/vm/slub.rst | 2
arch/arm/configs/omap2plus_defconfig | 2
arch/arm64/include/asm/pgtable.h | 3
arch/arm64/include/asm/vmap_stack.h | 6
arch/arm64/mm/dump.c | 2
arch/parisc/include/asm/pgtable.h | 2
arch/powerpc/include/asm/io.h | 10
arch/powerpc/include/asm/pci-bridge.h | 2
arch/powerpc/kernel/irq.c | 5
arch/powerpc/kernel/isa-bridge.c | 28 +
arch/powerpc/kernel/pci_64.c | 56 +-
arch/powerpc/mm/ioremap_64.c | 50 --
arch/riscv/include/asm/pgtable.h | 4
arch/riscv/mm/ptdump.c | 2
arch/s390/kernel/setup.c | 9
arch/sh/kernel/cpu/sh4/sq.c | 3
arch/x86/hyperv/hv_init.c | 5
arch/x86/include/asm/kvm_host.h | 3
arch/x86/include/asm/pgtable-2level_types.h | 2
arch/x86/include/asm/pgtable-3level_types.h | 2
arch/x86/include/asm/pgtable_64_types.h | 2
arch/x86/include/asm/pgtable_types.h | 8
arch/x86/include/asm/switch_to.h | 23 -
arch/x86/kernel/irq_64.c | 2
arch/x86/kernel/setup_percpu.c | 6
arch/x86/kvm/svm/sev.c | 3
arch/x86/mm/dump_pagetables.c | 35 +
arch/x86/mm/fault.c | 196 ----------
arch/x86/mm/init_64.c | 5
arch/x86/mm/pti.c | 8
arch/x86/mm/tlb.c | 37 -
block/blk-core.c | 1
drivers/acpi/apei/ghes.c | 6
drivers/base/node.c | 2
drivers/block/drbd/drbd_bitmap.c | 4
drivers/block/loop.c | 2
drivers/dax/device.c | 1
drivers/gpu/drm/drm_scatter.c | 11
drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4
drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c | 2
drivers/lightnvm/pblk-init.c | 5
drivers/md/dm-bufio.c | 4
drivers/md/md-bitmap.c | 12
drivers/media/common/videobuf2/videobuf2-dma-sg.c | 3
drivers/media/common/videobuf2/videobuf2-vmalloc.c | 3
drivers/media/pci/ivtv/ivtv-udma.c | 19 -
drivers/media/pci/ivtv/ivtv-yuv.c | 17
drivers/media/pci/ivtv/ivtvfb.c | 4
drivers/mtd/ubi/io.c | 4
drivers/pcmcia/electra_cf.c | 45 --
drivers/scsi/sd_zbc.c | 3
drivers/staging/android/ion/ion_heap.c | 4
drivers/staging/media/ipu3/ipu3-css-pool.h | 4
drivers/staging/media/ipu3/ipu3-dmamap.c | 30 -
fs/block_dev.c | 7
fs/btrfs/disk-io.c | 4
fs/btrfs/extent_io.c | 64 ---
fs/btrfs/extent_io.h | 3
fs/btrfs/inode.c | 39 --
fs/buffer.c | 23 -
fs/erofs/data.c | 41 --
fs/erofs/decompressor.c | 2
fs/erofs/zdata.c | 31 -
fs/exfat/inode.c | 7
fs/ext2/inode.c | 10
fs/ext4/ext4.h | 5
fs/ext4/inode.c | 25 -
fs/ext4/readpage.c | 25 -
fs/ext4/verity.c | 35 -
fs/f2fs/data.c | 56 +-
fs/f2fs/f2fs.h | 14
fs/f2fs/verity.c | 35 -
fs/fat/inode.c | 7
fs/file_table.c | 1
fs/fs-writeback.c | 1
fs/fuse/file.c | 100 +----
fs/gfs2/aops.c | 23 -
fs/gfs2/dir.c | 9
fs/gfs2/quota.c | 2
fs/hpfs/file.c | 7
fs/iomap/buffered-io.c | 113 +----
fs/iomap/trace.h | 2
fs/isofs/inode.c | 7
fs/jfs/inode.c | 7
fs/mpage.c | 38 --
fs/nfs/blocklayout/extent_tree.c | 2
fs/nfs/internal.h | 10
fs/nfs/write.c | 4
fs/nfsd/vfs.c | 9
fs/nilfs2/inode.c | 15
fs/ntfs/aops.c | 2
fs/ntfs/malloc.h | 2
fs/ntfs/mft.c | 2
fs/ocfs2/aops.c | 34 -
fs/ocfs2/dlm/dlmmaster.c | 1
fs/ocfs2/ocfs2.h | 4
fs/ocfs2/slot_map.c | 46 +-
fs/ocfs2/super.c | 21 +
fs/omfs/file.c | 7
fs/open.c | 3
fs/orangefs/inode.c | 32 -
fs/proc/meminfo.c | 3
fs/proc/task_mmu.c | 16
fs/qnx6/inode.c | 7
fs/reiserfs/inode.c | 8
fs/squashfs/block.c | 273 +++++++-------
fs/squashfs/decompressor.h | 5
fs/squashfs/decompressor_multi.c | 9
fs/squashfs/decompressor_multi_percpu.c | 17
fs/squashfs/decompressor_single.c | 9
fs/squashfs/lz4_wrapper.c | 17
fs/squashfs/lzo_wrapper.c | 17
fs/squashfs/squashfs.h | 4
fs/squashfs/xz_wrapper.c | 51 +-
fs/squashfs/zlib_wrapper.c | 63 +--
fs/squashfs/zstd_wrapper.c | 62 +--
fs/sync.c | 6
fs/ubifs/debug.c | 2
fs/ubifs/lprops.c | 2
fs/ubifs/lpt_commit.c | 4
fs/ubifs/orphan.c | 2
fs/udf/inode.c | 7
fs/xfs/kmem.c | 2
fs/xfs/xfs_aops.c | 13
fs/xfs/xfs_buf.c | 2
fs/zonefs/super.c | 7
include/asm-generic/5level-fixup.h | 5
include/asm-generic/pgtable.h | 27 +
include/linux/buffer_head.h | 8
include/linux/fs.h | 18
include/linux/iomap.h | 3
include/linux/memcontrol.h | 4
include/linux/mm.h | 67 ++-
include/linux/mm_types.h | 6
include/linux/mmzone.h | 1
include/linux/mpage.h | 4
include/linux/page_counter.h | 8
include/linux/pagemap.h | 193 ++++++++++
include/linux/ptdump.h | 3
include/linux/sched.h | 3
include/linux/swap.h | 17
include/linux/vmalloc.h | 49 +-
include/linux/zsmalloc.h | 2
include/trace/events/erofs.h | 6
include/trace/events/f2fs.h | 6
include/trace/events/writeback.h | 5
kernel/bpf/core.c | 6
kernel/bpf/syscall.c | 29 -
kernel/dma/remap.c | 48 --
kernel/groups.c | 2
kernel/module.c | 3
kernel/notifier.c | 1
kernel/sys.c | 2
kernel/trace/trace.c | 12
lib/Kconfig.ubsan | 2
lib/ioremap.c | 46 +-
lib/test_vmalloc.c | 26 -
mm/Kconfig | 4
mm/debug.c | 56 ++
mm/fadvise.c | 6
mm/filemap.c | 1
mm/gup.c | 77 +++-
mm/internal.h | 14
mm/kasan/Makefile | 21 -
mm/kasan/common.c | 19 -
mm/kasan/report.c | 22 +
mm/memcontrol.c | 198 +++++++---
mm/memory-failure.c | 15
mm/memory.c | 2
mm/migrate.c | 9
mm/mm_init.c | 16
mm/nommu.c | 52 +-
mm/page-writeback.c | 62 ++-
mm/page_alloc.c | 7
mm/percpu.c | 2
mm/ptdump.c | 17
mm/readahead.c | 349 ++++++++++--------
mm/slab_common.c | 3
mm/slub.c | 67 ++-
mm/swap_state.c | 5
mm/swapfile.c | 194 ++++++----
mm/util.c | 2
mm/vmalloc.c | 399 ++++++++-------------
mm/vmscan.c | 4
mm/vmstat.c | 11
mm/zsmalloc.c | 12
net/bridge/netfilter/ebtables.c | 6
net/ceph/ceph_common.c | 3
sound/core/memalloc.c | 2
sound/core/pcm_memory.c | 2
195 files changed, 2292 insertions(+), 2288 deletions(-)
* Re: incoming
2020-05-29 21:12 ` incoming Andrew Morton
@ 2020-05-29 21:20 ` Linus Torvalds
0 siblings, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2020-05-29 21:20 UTC (permalink / raw)
To: Andrew Morton; +Cc: mm-commits, Linux-MM
On Fri, May 29, 2020 at 2:12 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Stupid diffstat. Means that basically all my diffstats are very wrong.
I'm actually used to diffstats not matching 100%.
Usually it's not due to this issue - a "git diff --stat" *will* give
the stat from the actual combined diff result - but with git diffstats
the issue is that I might have gotten a patch from another source.
So the diffstat I see after-the-merge is possibly different from the
pre-merge diffstat simply due to merge issues.
So then I usually take a look at "ok, why did that diffstat differ"
and go "Ahh".
In your case, when I looked at the diffstat, I couldn't for the life
of me see how you would have gotten the diffstat you did, since I only
saw a single patch with no merge issues.
> Thanks for spotting it.
>
> I can fix that...
I can also just live with it, knowing what your workflow is. The
diffstat matching exactly just isn't that important - in fact,
different versions of "diff" can give slightly different output anyway
depending on diff algorithms even when they are looking at the exact
same before/after state. There's not necessarily always only one way
to generate a valid diff.
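As a rough illustration of that last point, the Python sketch below checks three hand-written edit scripts against the same before/after state. The line contents, the script format (a one-character marker followed by the line text) and the helper names are all made up for this example; this is not how diff or git represent anything internally.

```python
def sides(script):
    """Split an edit script into the old and new file contents it implies."""
    old = [s[1:] for s in script if s[0] in " -"]
    new = [s[1:] for s in script if s[0] in " +"]
    return old, new


def stat(script):
    """Return (insertions, deletions) for an edit script."""
    return (sum(s[0] == "+" for s in script),
            sum(s[0] == "-" for s in script))


before = ["a", "b", "c"]
after = ["a", "c", "b"]

# Three different scripts, all hypothetical, all describing the same change.
move_b_down = [" a", "-b", " c", "+b"]
move_c_up = [" a", "+c", " b", "-c"]
rewrite_all = ["-a", "-b", "-c", "+a", "+c", "+b"]

for script in (move_b_down, move_c_up, rewrite_all):
    # Every script is valid: it consumes `before` and produces `after`.
    assert sides(script) == (before, after)
    print(stat(script))
```

All three scripts are valid for the same before/after pair, yet the third one reports +3/-3 where the first two report +1/-1, and the first two disagree about which line moved.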
So to me, the diffstat is more of a guide than a hard thing, and I
want to see the rough outline.
In fact, one reason I want to see it in pull requests is actually just
that I want to get a feel for what changes even before I do the pull
or merge, so it's not just a "match against what I get" thing.
Linus
* Re: incoming
2020-05-29 20:38 ` incoming Linus Torvalds
@ 2020-05-29 21:12 ` Andrew Morton
2020-05-29 21:20 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2020-05-29 21:12 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, Linux-MM
On Fri, 29 May 2020 13:38:35 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, May 29, 2020 at 1:31 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > Bah. I got lazy (didn't want to interrupt an ongoing build) so I
> > generated the diffstat prior to folding two patches into a single one.
> > Evidently diffstat isn't as smart as I had assumed!
>
> Ahh. Yes - given two patches, diffstat just adds up the line number
> counts for the individual diffs, it doesn't count some kind of
> "combined diff result" line counts.
Stupid diffstat. Means that basically all my diffstats are very wrong.
Thanks for spotting it.
I can fix that...
* Re: incoming
2020-05-29 20:31 ` incoming Andrew Morton
@ 2020-05-29 20:38 ` Linus Torvalds
2020-05-29 21:12 ` incoming Andrew Morton
0 siblings, 1 reply; 786+ messages in thread
From: Linus Torvalds @ 2020-05-29 20:38 UTC (permalink / raw)
To: Andrew Morton; +Cc: mm-commits, Linux-MM
On Fri, May 29, 2020 at 1:31 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Bah. I got lazy (didn't want to interrupt an ongoing build) so I
> generated the diffstat prior to folding two patches into a single one.
> Evidently diffstat isn't as smart as I had assumed!
Ahh. Yes - given two patches, diffstat just adds up the line number
counts for the individual diffs, it doesn't count some kind of
"combined diff result" line counts.
Linus
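The effect described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the file contents are invented, and the counting uses Python's difflib rather than the diffstat tool itself, so the exact numbers a real diff would report may differ.

```python
from difflib import SequenceMatcher


def stat(old, new):
    """Return (insertions, deletions) for one line-based diff of old -> new."""
    ins = dels = 0
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            dels += i2 - i1
        if tag in ("replace", "insert"):
            ins += j2 - j1
    return ins, dels


# Hypothetical file states: patch 1 adds two lines, patch 2 rewrites one of them.
base = ["a", "b", "c"]
after_patch1 = ["a", "b", "x", "y", "c"]
after_patch2 = ["a", "b", "x", "z", "c"]

p1 = stat(base, after_patch1)
p2 = stat(after_patch1, after_patch2)

# What you get by adding up the per-patch counts.
summed = (p1[0] + p2[0], p1[1] + p2[1])
# What you get from the single folded patch.
folded = stat(base, after_patch2)

print("summed:", summed, "folded:", folded)
```

Here the summed counts come out as +3/-1 while the folded patch is +2/-0. The net change is the same either way, which matches the numbers in this thread: +15/-4 and +13/-2 both net out to 11 lines.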
* Re: incoming
2020-05-28 20:10 ` incoming Linus Torvalds
@ 2020-05-29 20:31 ` Andrew Morton
2020-05-29 20:38 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2020-05-29 20:31 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, Linux-MM
On Thu, 28 May 2020 13:10:18 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Hmm..
>
> On Wed, May 27, 2020 at 10:20 PM Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > fs/binfmt_elf.c | 2 +-
> > include/asm-generic/topology.h | 2 +-
> > include/linux/mm.h | 19 +++++++++++++++----
> > mm/khugepaged.c | 1 +
> > mm/z3fold.c | 3 +++
> > 5 files changed, 21 insertions(+), 6 deletions(-)
>
> I wonder how you generate that diffstat.
>
> The change to <linux/mm.h> simply doesn't match what you sent me. The
> patch you sent me that changed mm.h had this:
>
> include/linux/mm.h | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> (note 15 lines changed: it's +13 and -2) but now suddenly in your
> overall diffstat you have that
>
> include/linux/mm.h | 19 +++++++++++++++----
>
> with +15/-4.
>
> So your diffstat simply doesn't match what you are sending. What's going on?
>
Bah. I got lazy (didn't want to interrupt an ongoing build) so I
generated the diffstat prior to folding two patches into a single one.
Evidently diffstat isn't as smart as I had assumed!
* Re: incoming
2020-05-28 5:20 incoming Andrew Morton
@ 2020-05-28 20:10 ` Linus Torvalds
2020-05-29 20:31 ` incoming Andrew Morton
0 siblings, 1 reply; 786+ messages in thread
From: Linus Torvalds @ 2020-05-28 20:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: mm-commits, Linux-MM
Hmm..
On Wed, May 27, 2020 at 10:20 PM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> fs/binfmt_elf.c | 2 +-
> include/asm-generic/topology.h | 2 +-
> include/linux/mm.h | 19 +++++++++++++++----
> mm/khugepaged.c | 1 +
> mm/z3fold.c | 3 +++
> 5 files changed, 21 insertions(+), 6 deletions(-)
I wonder how you generate that diffstat.
The change to <linux/mm.h> simply doesn't match what you sent me. The
patch you sent me that changed mm.h had this:
include/linux/mm.h | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
(note 15 lines changed: it's +13 and -2) but now suddenly in your
overall diffstat you have that
include/linux/mm.h | 19 +++++++++++++++----
with +15/-4.
So your diffstat simply doesn't match what you are sending. What's going on?
Linus
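For reference, the comparison above can be reproduced mechanically. The Python sketch below parses the two mm.h stat lines quoted in this mail and compares them; the regular expression and the function name are assumptions made for the example, and it assumes the +/- histogram is unscaled (one character per line), which holds for counts this small but not for large ones.

```python
import re

# Hypothetical pattern for a single "path | total +++---" stat line.
STAT_LINE = re.compile(
    r"^\s*(?P<path>\S+)\s*\|\s*(?P<total>\d+)\s*(?P<bar>[+-]*)\s*$"
)


def parse_stat_line(line):
    """Return (path, total, insertions, deletions) for one diffstat line."""
    m = STAT_LINE.match(line)
    if m is None:
        raise ValueError("not a diffstat line: %r" % line)
    bar = m.group("bar")
    return m.group("path"), int(m.group("total")), bar.count("+"), bar.count("-")


# The two lines from the mail, with the bars built from the stated counts.
per_patch = parse_stat_line("include/linux/mm.h | 15 " + "+" * 13 + "-" * 2)
summary = parse_stat_line("include/linux/mm.h | 19 " + "+" * 15 + "-" * 4)

for path, total, ins, dels in (per_patch, summary):
    # With an unscaled histogram the total is insertions plus deletions.
    assert total == ins + dels
    print(path, total, "+%d/-%d" % (ins, dels), "net", ins - dels)
```

Both lines are internally consistent and both net out to 11 lines; they differ only in the gross insertion and deletion counts.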
* incoming
@ 2020-05-28 5:20 Andrew Morton
2020-05-28 20:10 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2020-05-28 5:20 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
5 fixes, based on 444fc5cde64330661bf59944c43844e7d4c2ccd8:
Qian Cai <cai@lca.pw>:
mm/z3fold: silence kmemleak false positives of slots
Hugh Dickins <hughd@google.com>:
mm,thp: stop leaking unreleased file pages
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
mm: remove VM_BUG_ON(PageSlab()) from page_mapcount()
Alexander Potapenko <glider@google.com>:
fs/binfmt_elf.c: allocate initialized memory in fill_thread_core_info()
Arnd Bergmann <arnd@arndb.de>:
include/asm-generic/topology.h: guard cpumask_of_node() macro argument
fs/binfmt_elf.c | 2 +-
include/asm-generic/topology.h | 2 +-
include/linux/mm.h | 19 +++++++++++++++----
mm/khugepaged.c | 1 +
mm/z3fold.c | 3 +++
5 files changed, 21 insertions(+), 6 deletions(-)
* incoming
@ 2020-05-23 5:22 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-05-23 5:22 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
11 fixes, based on 444565650a5fe9c63ddf153e6198e31705dedeb2:
David Hildenbrand <david@redhat.com>:
device-dax: don't leak kernel memory to user space after unloading kmem
Nick Desaulniers <ndesaulniers@google.com>:
x86: bitops: fix build regression
John Hubbard <jhubbard@nvidia.com>:
rapidio: fix an error in get_user_pages_fast() error handling
selftests/vm/.gitignore: add mremap_dontunmap
selftests/vm/write_to_hugetlbfs.c: fix unused variable warning
Marco Elver <elver@google.com>:
kasan: disable branch tracing for core runtime
Arnd Bergmann <arnd@arndb.de>:
sh: include linux/time_types.h for sockios
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>:
MAINTAINERS: update email address for Naoya Horiguchi
Mike Rapoport <rppt@linux.ibm.com>:
sparc32: use PUD rather than PGD to get PMD in srmmu_nocache_init()
Uladzislau Rezki <uladzislau.rezki@sony.com>:
z3fold: fix use-after-free when freeing handles
Baoquan He <bhe@redhat.com>:
MAINTAINERS: add files related to kdump
MAINTAINERS | 7 ++++++-
arch/sh/include/uapi/asm/sockios.h | 2 ++
arch/sparc/mm/srmmu.c | 2 +-
arch/x86/include/asm/bitops.h | 12 ++++++------
drivers/dax/kmem.c | 14 +++++++++++---
drivers/rapidio/devices/rio_mport_cdev.c | 5 +++++
mm/kasan/Makefile | 16 ++++++++--------
mm/kasan/generic.c | 1 -
mm/kasan/tags.c | 1 -
mm/z3fold.c | 11 ++++++-----
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/write_to_hugetlbfs.c | 2 --
12 files changed, 46 insertions(+), 28 deletions(-)
* incoming
@ 2020-05-14 0:50 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-05-14 0:50 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
7 fixes, based on 24085f70a6e1b0cb647ec92623284641d8270637:
Yafang Shao <laoar.shao@gmail.com>:
mm, memcg: fix inconsistent oom event behavior
Roman Penyaev <rpenyaev@suse.de>:
epoll: call final ep_events_available() check under the lock
Peter Xu <peterx@redhat.com>:
mm/gup: fix fixup_user_fault() on multiple retries
Brian Geffon <bgeffon@google.com>:
userfaultfd: fix remap event with MREMAP_DONTUNMAP
Vasily Averin <vvs@virtuozzo.com>:
ipc/util.c: sysvipc_find_ipc() incorrectly updates position index
Andrey Konovalov <andreyknvl@google.com>:
kasan: consistently disable debugging features
kasan: add missing functions declarations to kasan.h
fs/eventpoll.c | 48 ++++++++++++++++++++++++++-------------------
include/linux/memcontrol.h | 2 +
ipc/util.c | 12 +++++------
mm/gup.c | 12 ++++++-----
mm/kasan/Makefile | 15 +++++++++-----
mm/kasan/kasan.h | 34 ++++++++++++++++++++++++++++++-
mm/mremap.c | 2 -
7 files changed, 86 insertions(+), 39 deletions(-)
* incoming
@ 2020-05-08 1:35 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-05-08 1:35 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
14 fixes and one selftest to verify the ipc fixes herein.
15 patches, based on a811c1fa0a02c062555b54651065899437bacdbe:
Oleg Nesterov <oleg@redhat.com>:
ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
Yafang Shao <laoar.shao@gmail.com>:
mm, memcg: fix error return value of mem_cgroup_css_alloc()
David Hildenbrand <david@redhat.com>:
mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
Maciej Grochowski <maciej.grochowski@pm.me>:
kernel/kcov.c: fix typos in kcov_remote_start documentation
Ivan Delalande <colona@arista.com>:
scripts/decodecode: fix trapping instruction formatting
Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>:
arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
Khazhismel Kumykov <khazhy@google.com>:
eventpoll: fix missing wakeup for ovflist in ep_poll_callback
Aymeric Agon-Rambosson <aymeric.agon@yandex.com>:
scripts/gdb: repair rb_first() and rb_last()
Waiman Long <longman@redhat.com>:
mm/slub: fix incorrect interpretation of s->offset
Filipe Manana <fdmanana@suse.com>:
percpu: make pcpu_alloc() aware of current gfp context
Roman Penyaev <rpenyaev@suse.de>:
kselftests: introduce new epoll60 testcase for catching lost wakeups
epoll: atomically remove wait entry on wake up
Qiwu Chen <qiwuchen55@gmail.com>:
mm/vmscan: remove unnecessary argument description of isolate_lru_pages()
Kees Cook <keescook@chromium.org>:
ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST
Henry Willard <henry.willard@oracle.com>:
mm: limit boost_watermark on small zones
arch/x86/kvm/svm/sev.c | 2
fs/eventpoll.c | 61 ++--
ipc/mqueue.c | 34 +-
kernel/kcov.c | 4
lib/Kconfig.ubsan | 15 -
mm/memcontrol.c | 15 -
mm/page_alloc.c | 9
mm/percpu.c | 14
mm/slub.c | 45 ++-
mm/vmscan.c | 1
scripts/decodecode | 2
scripts/gdb/linux/rbtree.py | 4
tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c | 146 ++++++++++
tools/testing/selftests/wireguard/qemu/debug.config | 1
14 files changed, 275 insertions(+), 78 deletions(-)
* incoming
@ 2020-04-21 1:13 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-04-21 1:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
15 fixes, based on ae83d0b416db002fe95601e7f97f64b59514d936:
Masahiro Yamada <masahiroy@kernel.org>:
sh: fix build error in mm/init.c
Kees Cook <keescook@chromium.org>:
slub: avoid redzone when choosing freepointer location
Peter Xu <peterx@redhat.com>:
mm/userfaultfd: disable userfaultfd-wp on x86_32
Bartosz Golaszewski <bgolaszewski@baylibre.com>:
MAINTAINERS: add an entry for kfifo
Longpeng <longpeng2@huawei.com>:
mm/hugetlb: fix a addressing exception caused by huge_pte_offset
Michal Hocko <mhocko@suse.com>:
mm, gup: return EINTR when gup is interrupted by fatal signals
Christophe JAILLET <christophe.jaillet@wanadoo.fr>:
checkpatch: fix a typo in the regex for $allocFunctions
George Burgess IV <gbiv@google.com>:
tools/build: tweak unused value workaround
Muchun Song <songmuchun@bytedance.com>:
mm/ksm: fix NULL pointer dereference when KSM zero page is enabled
Hugh Dickins <hughd@google.com>:
mm/shmem: fix build without THP
Jann Horn <jannh@google.com>:
vmalloc: fix remap_vmalloc_range() bounds checks
Hugh Dickins <hughd@google.com>:
shmem: fix possible deadlocks on shmlock_user_lock
Yang Shi <yang.shi@linux.alibaba.com>:
mm: shmem: disable interrupt when acquiring info->lock in userfaultfd_copy path
Sudip Mukherjee <sudipm.mukherjee@gmail.com>:
coredump: fix null pointer dereference on coredump
Lucas Stach <l.stach@pengutronix.de>:
tools/vm: fix cross-compile build
MAINTAINERS | 7 +++++++
arch/sh/mm/init.c | 2 +-
arch/x86/Kconfig | 2 +-
fs/coredump.c | 2 ++
fs/proc/vmcore.c | 5 +++--
include/linux/vmalloc.h | 2 +-
mm/gup.c | 2 +-
mm/hugetlb.c | 14 ++++++++------
mm/ksm.c | 12 ++++++++++--
mm/shmem.c | 13 ++++++++-----
mm/slub.c | 12 ++++++++++--
mm/vmalloc.c | 16 +++++++++++++---
samples/vfio-mdev/mdpy.c | 2 +-
scripts/checkpatch.pl | 2 +-
tools/build/feature/test-sync-compare-and-swap.c | 2 +-
tools/vm/Makefile | 2 ++
16 files changed, 70 insertions(+), 27 deletions(-)
* incoming
@ 2020-04-12 7:41 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-04-12 7:41 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
A straggler. This patch caused a lot of build errors on a lot of
architectures for a long time, but Anshuman believes it's all fixed up
now.
1 patch, based on GIT b032227c62939b5481bcd45442b36dfa263f4a7c.
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/debug: add tests validating architecture page table helpers
Documentation/features/debug/debug-vm-pgtable/arch-support.txt | 34
arch/arc/Kconfig | 1
arch/arm64/Kconfig | 1
arch/powerpc/Kconfig | 1
arch/s390/Kconfig | 1
arch/x86/Kconfig | 1
arch/x86/include/asm/pgtable_64.h | 6
include/linux/mmdebug.h | 5
init/main.c | 2
lib/Kconfig.debug | 26
mm/Makefile | 1
mm/debug_vm_pgtable.c | 392 ++++++++++
12 files changed, 471 insertions(+)
* incoming
@ 2020-04-10 21:30 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-04-10 21:30 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
Almost all of the rest of MM. Various other things.
35 patches, based on c0cc271173b2e1c2d8d0ceaef14e4dfa79eefc0d.
Subsystems affected by this patch series:
hfs
mm/memcg
mm/slab-generic
mm/slab
mm/pagealloc
mm/gup
ocfs2
mm/hugetlb
mm/pagemap
mm/memremap
kmod
misc
seqfile
Subsystem: hfs
Simon Gander <simon@tuxera.com>:
hfsplus: fix crash and filesystem corruption when deleting files
Subsystem: mm/memcg
Jakub Kicinski <kuba@kernel.org>:
mm, memcg: do not high throttle allocators based on wraparound
Subsystem: mm/slab-generic
Qiujun Huang <hqjagain@gmail.com>:
mm, slab_common: fix a typo in comment "eariler"->"earlier"
Subsystem: mm/slab
Mauro Carvalho Chehab <mchehab+huawei@kernel.org>:
docs: mm: slab.h: fix a broken cross-reference
Subsystem: mm/pagealloc
Randy Dunlap <rdunlap@infradead.org>:
mm/page_alloc.c: fix kernel-doc warning
Jason Yan <yanaijie@huawei.com>:
mm/page_alloc: make pcpu_drain_mutex and pcpu_drain static
Subsystem: mm/gup
Miles Chen <miles.chen@mediatek.com>:
mm/gup: fix null pointer dereference detected by coverity
Subsystem: ocfs2
Changwei Ge <chge@linux.alibaba.com>:
ocfs2: no need try to truncate file beyond i_size
Subsystem: mm/hugetlb
Aslan Bakirov <aslan@fb.com>:
mm: cma: NUMA node interface
Roman Gushchin <guro@fb.com>:
mm: hugetlb: optionally allocate gigantic hugepages using cma
Subsystem: mm/pagemap
Jaewon Kim <jaewon31.kim@samsung.com>:
mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area
Arjun Roy <arjunroy@google.com>:
mm/memory.c: refactor insert_page to prepare for batched-lock insert
mm: bring sparc pte_index() semantics inline with other platforms
mm: define pte_index as macro for x86
mm/memory.c: add vm_insert_pages()
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS
mm/vma: introduce VM_ACCESS_FLAGS
mm/special: create generic fallbacks for pte_special() and pte_mkspecial()
Subsystem: mm/memremap
Logan Gunthorpe <logang@deltatee.com>:
Patch series "Allow setting caching mode in arch_add_memory() for P2PDMA", v4:
mm/memory_hotplug: drop the flags field from struct mhp_restrictions
mm/memory_hotplug: rename mhp_restrictions to mhp_params
x86/mm: thread pgprot_t through init_memory_mapping()
x86/mm: introduce __set_memory_prot()
powerpc/mm: thread pgprot_t through create_section_mapping()
mm/memory_hotplug: add pgprot_t to mhp_params
mm/memremap: set caching mode for PCI P2PDMA memory to WC
Subsystem: kmod
Eric Biggers <ebiggers@google.com>:
Patch series "module autoloading fixes and cleanups", v5:
kmod: make request_module() return an error when autoloading is disabled
fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
docs: admin-guide: document the kernel.modprobe sysctl
selftests: kmod: fix handling test numbers above 9
selftests: kmod: test disabling module autoloading
Subsystem: misc
Pali Rohár <pali@kernel.org>:
change email address for Pali Rohár
kbuild test robot <lkp@intel.com>:
drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings
Subsystem: seqfile
Vasily Averin <vvs@virtuozzo.com>:
Patch series "seq_file .next functions should increase position index":
fs/seq_file.c: seq_read(): add info message about buggy .next functions
kernel/gcov/fs.c: gcov_seq_next() should increase position index
ipc/util.c: sysvipc_find_ipc() should increase position index
Documentation/ABI/testing/sysfs-platform-dell-laptop | 8
Documentation/admin-guide/kernel-parameters.txt | 8
Documentation/admin-guide/sysctl/kernel.rst | 21 ++
MAINTAINERS | 16 -
arch/alpha/include/asm/page.h | 3
arch/alpha/include/asm/pgtable.h | 2
arch/arc/include/asm/page.h | 2
arch/arm/include/asm/page.h | 4
arch/arm/include/asm/pgtable-2level.h | 2
arch/arm/include/asm/pgtable.h | 15 -
arch/arm/mach-omap2/omap-secure.c | 2
arch/arm/mach-omap2/omap-secure.h | 2
arch/arm/mach-omap2/omap-smc.S | 2
arch/arm/mm/fault.c | 2
arch/arm/mm/mmu.c | 14 +
arch/arm64/include/asm/page.h | 4
arch/arm64/mm/fault.c | 2
arch/arm64/mm/init.c | 6
arch/arm64/mm/mmu.c | 7
arch/c6x/include/asm/page.h | 5
arch/csky/include/asm/page.h | 3
arch/csky/include/asm/pgtable.h | 3
arch/h8300/include/asm/page.h | 2
arch/hexagon/include/asm/page.h | 3
arch/hexagon/include/asm/pgtable.h | 2
arch/ia64/include/asm/page.h | 5
arch/ia64/include/asm/pgtable.h | 2
arch/ia64/mm/init.c | 7
arch/m68k/include/asm/mcf_pgtable.h | 10 -
arch/m68k/include/asm/motorola_pgtable.h | 2
arch/m68k/include/asm/page.h | 3
arch/m68k/include/asm/sun3_pgtable.h | 2
arch/microblaze/include/asm/page.h | 2
arch/microblaze/include/asm/pgtable.h | 4
arch/mips/include/asm/page.h | 5
arch/mips/include/asm/pgtable.h | 44 +++-
arch/nds32/include/asm/page.h | 3
arch/nds32/include/asm/pgtable.h | 9 -
arch/nds32/mm/fault.c | 2
arch/nios2/include/asm/page.h | 3
arch/nios2/include/asm/pgtable.h | 3
arch/openrisc/include/asm/page.h | 5
arch/openrisc/include/asm/pgtable.h | 2
arch/parisc/include/asm/page.h | 3
arch/parisc/include/asm/pgtable.h | 2
arch/powerpc/include/asm/book3s/64/hash.h | 3
arch/powerpc/include/asm/book3s/64/radix.h | 3
arch/powerpc/include/asm/page.h | 9 -
arch/powerpc/include/asm/page_64.h | 7
arch/powerpc/include/asm/sparsemem.h | 3
arch/powerpc/mm/book3s64/hash_utils.c | 5
arch/powerpc/mm/book3s64/pgtable.c | 7
arch/powerpc/mm/book3s64/pkeys.c | 2
arch/powerpc/mm/book3s64/radix_pgtable.c | 18 +-
arch/powerpc/mm/mem.c | 12 -
arch/riscv/include/asm/page.h | 3
arch/s390/include/asm/page.h | 3
arch/s390/mm/fault.c | 2
arch/s390/mm/init.c | 9 -
arch/sh/include/asm/page.h | 3
arch/sh/mm/init.c | 7
arch/sparc/include/asm/page_32.h | 3
arch/sparc/include/asm/page_64.h | 3
arch/sparc/include/asm/pgtable_32.h | 7
arch/sparc/include/asm/pgtable_64.h | 10 -
arch/um/include/asm/pgtable.h | 10 -
arch/unicore32/include/asm/page.h | 3
arch/unicore32/include/asm/pgtable.h | 3
arch/unicore32/mm/fault.c | 2
arch/x86/include/asm/page_types.h | 7
arch/x86/include/asm/pgtable.h | 6
arch/x86/include/asm/set_memory.h | 1
arch/x86/kernel/amd_gart_64.c | 3
arch/x86/kernel/setup.c | 4
arch/x86/mm/init.c | 9 -
arch/x86/mm/init_32.c | 19 +-
arch/x86/mm/init_64.c | 42 ++--
arch/x86/mm/mm_internal.h | 3
arch/x86/mm/pat/set_memory.c | 13 +
arch/x86/mm/pkeys.c | 2
arch/x86/platform/uv/bios_uv.c | 3
arch/x86/um/asm/vm-flags.h | 10 -
arch/xtensa/include/asm/page.h | 3
arch/xtensa/include/asm/pgtable.h | 3
drivers/char/hw_random/omap3-rom-rng.c | 4
drivers/dma/tegra20-apb-dma.c | 1
drivers/hwmon/dell-smm-hwmon.c | 4
drivers/platform/x86/dell-laptop.c | 4
drivers/platform/x86/dell-rbtn.c | 4
drivers/platform/x86/dell-rbtn.h | 2
drivers/platform/x86/dell-smbios-base.c | 4
drivers/platform/x86/dell-smbios-smm.c | 2
drivers/platform/x86/dell-smbios.h | 2
drivers/platform/x86/dell-smo8800.c | 2
drivers/platform/x86/dell-wmi.c | 4
drivers/power/supply/bq2415x_charger.c | 4
drivers/power/supply/bq27xxx_battery.c | 2
drivers/power/supply/isp1704_charger.c | 2
drivers/power/supply/rx51_battery.c | 4
drivers/staging/gasket/gasket_core.c | 2
fs/filesystems.c | 4
fs/hfsplus/attributes.c | 4
fs/ocfs2/alloc.c | 4
fs/seq_file.c | 7
fs/udf/ecma_167.h | 2
fs/udf/osta_udf.h | 2
include/linux/cma.h | 14 +
include/linux/hugetlb.h | 12 +
include/linux/memblock.h | 3
include/linux/memory_hotplug.h | 21 +-
include/linux/mm.h | 34 +++
include/linux/power/bq2415x_charger.h | 2
include/linux/slab.h | 2
ipc/util.c | 2
kernel/gcov/fs.c | 2
kernel/kmod.c | 4
mm/cma.c | 16 +
mm/gup.c | 3
mm/hugetlb.c | 109 ++++++++++++
mm/memblock.c | 2
mm/memcontrol.c | 3
mm/memory.c | 168 +++++++++++++++++--
mm/memory_hotplug.c | 13 -
mm/memremap.c | 17 +
mm/mmap.c | 4
mm/mprotect.c | 4
mm/page_alloc.c | 5
mm/slab_common.c | 2
tools/laptop/freefall/freefall.c | 2
tools/testing/selftests/kmod/kmod.sh | 43 ++++
130 files changed, 710 insertions(+), 370 deletions(-)
* incoming
@ 2020-04-07 3:02 Andrew Morton
From: Andrew Morton @ 2020-04-07 3:02 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
- a lot more of MM, quite a bit more yet to come.
- various other subsystems
166 patches based on 7e63420847ae5f1036e4f7c42f0b3282e73efbc2.
Subsystems affected by this patch series:
mm/memcg
mm/pagemap
mm/vmalloc
mm/pagealloc
mm/migration
mm/thp
mm/ksm
mm/madvise
mm/virtio
mm/userfaultfd
mm/memory-hotplug
mm/shmem
mm/rmap
mm/zswap
mm/zsmalloc
mm/cleanups
procfs
misc
MAINTAINERS
bitops
lib
checkpatch
epoll
binfmt
kallsyms
reiserfs
kmod
gcov
kconfig
kcov
ubsan
fault-injection
ipc
Subsystem: mm/memcg
Chris Down <chris@chrisdown.name>:
mm, memcg: bypass high reclaim iteration for cgroup hierarchy root
Subsystem: mm/pagemap
Li Xinhai <lixinhai.lxh@gmail.com>:
Patch series "mm: Fix misuse of parent anon_vma in dup_mmap path":
mm: don't prepare anon_vma if vma has VM_WIPEONFORK
Revert "mm/rmap.c: reuse mergeable anon_vma as parent when fork"
mm: set vm_next and vm_prev to NULL in vm_area_dup()
Anshuman Khandual <anshuman.khandual@arm.com>:
Patch series "mm/vma: Use all available wrappers when possible", v2:
mm/vma: add missing VMA flag readable name for VM_SYNC
mm/vma: make vma_is_accessible() available for general use
mm/vma: replace all remaining open encodings with is_vm_hugetlb_page()
mm/vma: replace all remaining open encodings with vma_is_anonymous()
mm/vma: append unlikely() while testing VMA access permissions
Subsystem: mm/vmalloc
Qiujun Huang <hqjagain@gmail.com>:
mm/vmalloc: fix a typo in comment
Subsystem: mm/pagealloc
Michal Hocko <mhocko@suse.com>:
mm: make it clear that gfp reclaim modifiers are valid only for sleepable allocations
Subsystem: mm/migration
Wei Yang <richardw.yang@linux.intel.com>:
Patch series "cleanup on do_pages_move()", v5:
mm/migrate.c: no need to check for i > start in do_pages_move()
mm/migrate.c: wrap do_move_pages_to_node() and store_status()
mm/migrate.c: check pagelist in move_pages_and_store_status()
mm/migrate.c: unify "not queued for migration" handling in do_pages_move()
Yang Shi <yang.shi@linux.alibaba.com>:
mm/migrate.c: migrate PG_readahead flag
Subsystem: mm/thp
David Rientjes <rientjes@google.com>:
mm, shmem: add vmstat for hugepage fallback
mm, thp: track fallbacks due to failed memcg charges separately
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
include/linux/pagemap.h: optimise find_subpage for !THP
mm: remove CONFIG_TRANSPARENT_HUGE_PAGECACHE
Subsystem: mm/ksm
Li Chen <chenli@uniontech.com>:
mm/ksm.c: update get_user_pages() argument in comment
Subsystem: mm/madvise
Huang Ying <ying.huang@intel.com>:
mm: code cleanup for MADV_FREE
Subsystem: mm/virtio
Alexander Duyck <alexander.h.duyck@linux.intel.com>:
Patch series "mm / virtio: Provide support for free page reporting", v17:
mm: adjust shuffle code to allow for future coalescing
mm: use zone and order instead of free area in free_list manipulators
mm: add function __putback_isolated_page
mm: introduce Reported pages
virtio-balloon: pull page poisoning config out of free page hinting
virtio-balloon: add support for providing free page reports to host
mm/page_reporting: rotate reported pages to the tail of the list
mm/page_reporting: add budget limit on how many pages can be reported per pass
mm/page_reporting: add free page reporting documentation
David Hildenbrand <david@redhat.com>:
virtio-balloon: switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
Subsystem: mm/userfaultfd
Shaohua Li <shli@fb.com>:
Patch series "userfaultfd: write protection support", v6:
userfaultfd: wp: add helper for writeprotect check
Andrea Arcangeli <aarcange@redhat.com>:
userfaultfd: wp: hook userfault handler to write protection fault
userfaultfd: wp: add WP pagetable tracking to x86
userfaultfd: wp: userfaultfd_pte/huge_pmd_wp() helpers
userfaultfd: wp: add UFFDIO_COPY_MODE_WP
Peter Xu <peterx@redhat.com>:
mm: merge parameters for change_protection()
userfaultfd: wp: apply _PAGE_UFFD_WP bit
userfaultfd: wp: drop _PAGE_UFFD_WP properly when fork
userfaultfd: wp: add pmd_swp_*uffd_wp() helpers
userfaultfd: wp: support swap and page migration
khugepaged: skip collapse if uffd-wp detected
Shaohua Li <shli@fb.com>:
userfaultfd: wp: support write protection for userfault vma range
Andrea Arcangeli <aarcange@redhat.com>:
userfaultfd: wp: add the writeprotect API to userfaultfd ioctl
Shaohua Li <shli@fb.com>:
userfaultfd: wp: enabled write protection in userfaultfd API
Peter Xu <peterx@redhat.com>:
userfaultfd: wp: don't wake up when doing write protect
Martin Cracauer <cracauer@cons.org>:
userfaultfd: wp: UFFDIO_REGISTER_MODE_WP documentation update
Peter Xu <peterx@redhat.com>:
userfaultfd: wp: declare _UFFDIO_WRITEPROTECT conditionally
userfaultfd: selftests: refactor statistics
userfaultfd: selftests: add write-protect test
Subsystem: mm/memory-hotplug
David Hildenbrand <david@redhat.com>:
Patch series "mm: drop superfluous section checks when onlining/offlining":
drivers/base/memory.c: drop section_count
drivers/base/memory.c: drop pages_correctly_probed()
mm/page_ext.c: drop pfn_present() check when onlining
Baoquan He <bhe@redhat.com>:
mm/memory_hotplug.c: only respect mem= parameter during boot stage
David Hildenbrand <david@redhat.com>:
mm/memory_hotplug.c: simplify calculation of number of pages in __remove_pages()
mm/memory_hotplug.c: cleanup __add_pages()
Baoquan He <bhe@redhat.com>:
Patch series "mm/hotplug: Only use subsection map for VMEMMAP", v4:
mm/sparse.c: introduce new function fill_subsection_map()
mm/sparse.c: introduce a new function clear_subsection_map()
mm/sparse.c: only use subsection map in VMEMMAP case
mm/sparse.c: add note about only VMEMMAP supporting sub-section hotplug
mm/sparse.c: move subsection_map related functions together
David Hildenbrand <david@redhat.com>:
Patch series "mm/memory_hotplug: allow to specify a default online_type", v3:
drivers/base/memory: rename MMOP_ONLINE_KEEP to MMOP_ONLINE
drivers/base/memory: map MMOP_OFFLINE to 0
drivers/base/memory: store mapping between MMOP_* and string in an array
powernv/memtrace: always online added memory blocks
hv_balloon: don't check for memhp_auto_online manually
mm/memory_hotplug: unexport memhp_auto_online
mm/memory_hotplug: convert memhp_auto_online to store an online_type
mm/memory_hotplug: allow to specify a default online_type
chenqiwu <chenqiwu@xiaomi.com>:
mm/memory_hotplug.c: use __pfn_to_section() instead of open-coding
Subsystem: mm/shmem
Kees Cook <keescook@chromium.org>:
mm/shmem.c: distribute switch variables for initialization
Mateusz Nosek <mateusznosek0@gmail.com>:
mm/shmem.c: clean code by removing unnecessary assignment
Hugh Dickins <hughd@google.com>:
mm: huge tmpfs: try to split_huge_page() when punching hole
Subsystem: mm/rmap
Palmer Dabbelt <palmerdabbelt@google.com>:
mm: prevent a warning when casting void* -> enum
Subsystem: mm/zswap
"Maciej S. Szmigiero" <mail@maciej.szmigiero.name>:
mm/zswap: allow setting default status, compressor and allocator in Kconfig
Subsystem: mm/zsmalloc
Subsystem: mm/cleanups
Jules Irenge <jbi.octave@gmail.com>:
mm/compaction: add missing annotation for compact_lock_irqsave
mm/hugetlb: add missing annotation for gather_surplus_pages()
mm/mempolicy: add missing annotation for queue_pages_pmd()
mm/slub: add missing annotation for get_map()
mm/slub: add missing annotation for put_map()
mm/zsmalloc: add missing annotation for migrate_read_lock()
mm/zsmalloc: add missing annotation for migrate_read_unlock()
mm/zsmalloc: add missing annotation for pin_tag()
mm/zsmalloc: add missing annotation for unpin_tag()
chenqiwu <chenqiwu@xiaomi.com>:
mm: fix ambiguous comments for better code readability
Mateusz Nosek <mateusznosek0@gmail.com>:
mm/mm_init.c: clean code. Use BUILD_BUG_ON when comparing compile time constant
Joe Perches <joe@perches.com>:
mm: use fallthrough;
Steven Price <steven.price@arm.com>:
include/linux/swapops.h: correct guards for non_swap_entry()
Ira Weiny <ira.weiny@intel.com>:
include/linux/memremap.h: remove stale comments
Mateusz Nosek <mateusznosek0@gmail.com>:
mm/dmapool.c: micro-optimisation remove unnecessary branch
Waiman Long <longman@redhat.com>:
mm: remove dummy struct bootmem_data/bootmem_data_t
Subsystem: procfs
Jules Irenge <jbi.octave@gmail.com>:
fs/proc/inode.c: annotate close_pdeo() for sparse
Alexey Dobriyan <adobriyan@gmail.com>:
proc: faster open/read/close with "permanent" files
proc: speed up /proc/*/statm
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
proc: inline vma_stop into m_stop
proc: remove m_cache_vma
proc: use ppos instead of m->version
seq_file: remove m->version
proc: inline m_next_vma into m_next
Subsystem: misc
Michal Simek <michal.simek@xilinx.com>:
asm-generic: fix unistd_32.h generation format
Nathan Chancellor <natechancellor@gmail.com>:
kernel/extable.c: use address-of operator on section symbols
Masahiro Yamada <masahiroy@kernel.org>:
sparc,x86: vdso: remove meaningless undefining CONFIG_OPTIMIZE_INLINING
compiler: remove CONFIG_OPTIMIZE_INLINING entirely
Vegard Nossum <vegard.nossum@oracle.com>:
compiler.h: fix error in BUILD_BUG_ON() reporting
Subsystem: MAINTAINERS
Joe Perches <joe@perches.com>:
MAINTAINERS: list the section entries in the preferred order
Subsystem: bitops
Josh Poimboeuf <jpoimboe@redhat.com>:
bitops: always inline sign extension helpers
Subsystem: lib
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
lib/test_lockup: test module to generate lockups
Colin Ian King <colin.king@canonical.com>:
lib/test_lockup.c: fix spelling mistake "iteraions" -> "iterations"
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
lib/test_lockup.c: add parameters for locking generic vfs locks
"Gustavo A. R. Silva" <gustavo@embeddedor.com>:
lib/bch.c: replace zero-length array with flexible-array member
lib/ts_bm.c: replace zero-length array with flexible-array member
lib/ts_fsm.c: replace zero-length array with flexible-array member
lib/ts_kmp.c: replace zero-length array with flexible-array member
Geert Uytterhoeven <geert+renesas@glider.be>:
lib/scatterlist: fix sg_copy_buffer() kerneldoc
Kees Cook <keescook@chromium.org>:
lib: test_stackinit.c: XFAIL switch variable init tests
Alexander Potapenko <glider@google.com>:
lib/stackdepot.c: check depot_index before accessing the stack slab
lib/stackdepot.c: fix a condition in stack_depot_fetch()
lib/stackdepot.c: build with -fno-builtin
kasan: stackdepot: move filter_irq_stacks() to stackdepot.c
Qian Cai <cai@lca.pw>:
percpu_counter: fix a data race at vm_committed_as
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
lib/test_bitmap.c: make use of EXP2_IN_BITS
chenqiwu <chenqiwu@xiaomi.com>:
lib/rbtree: fix coding style of assignments
Dan Carpenter <dan.carpenter@oracle.com>:
lib/test_kmod.c: remove a NULL test
Rikard Falkeborn <rikard.falkeborn@gmail.com>:
linux/bits.h: add compile time sanity check of GENMASK inputs
Chris Wilson <chris@chris-wilson.co.uk>:
lib/list: prevent compiler reloads inside 'safe' list iteration
Nathan Chancellor <natechancellor@gmail.com>:
lib/dynamic_debug.c: use address-of operator on section symbols
Subsystem: checkpatch
Joe Perches <joe@perches.com>:
checkpatch: remove email address comment from email address comparisons
Lubomir Rintel <lkundrak@v3.sk>:
checkpatch: check SPDX tags in YAML files
John Hubbard <jhubbard@nvidia.com>:
checkpatch: support "base-commit:" format
Joe Perches <joe@perches.com>:
checkpatch: prefer fallthrough; over fallthrough comments
Antonio Borneo <borneo.antonio@gmail.com>:
checkpatch: fix minor typo and mixed space+tab in indentation
checkpatch: fix multiple const * types
checkpatch: add command-line option for TAB size
Joe Perches <joe@perches.com>:
checkpatch: improve Gerrit Change-Id: test
Lubomir Rintel <lkundrak@v3.sk>:
checkpatch: check proper licensing of Devicetree bindings
Joe Perches <joe@perches.com>:
checkpatch: avoid warning about uninitialized_var()
Subsystem: epoll
Roman Penyaev <rpenyaev@suse.de>:
kselftest: introduce new epoll test case
Jason Baron <jbaron@akamai.com>:
fs/epoll: make nesting accounting safe for -rt kernel
Subsystem: binfmt
Alexey Dobriyan <adobriyan@gmail.com>:
fs/binfmt_elf.c: delete "loc" variable
fs/binfmt_elf.c: allocate less for static executable
fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path
Subsystem: kallsyms
Will Deacon <will@kernel.org>:
Patch series "Unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()":
samples/hw_breakpoint: drop HW_BREAKPOINT_R when reporting writes
samples/hw_breakpoint: drop use of kallsyms_lookup_name()
kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()
Subsystem: reiserfs
Colin Ian King <colin.king@canonical.com>:
reiserfs: clean up several indentation issues
Subsystem: kmod
Qiujun Huang <hqjagain@gmail.com>:
kernel/kmod.c: fix a typo "assuems" -> "assumes"
Subsystem: gcov
"Gustavo A. R. Silva" <gustavo@embeddedor.com>:
gcov: gcc_4_7: replace zero-length array with flexible-array member
gcov: gcc_3_4: replace zero-length array with flexible-array member
kernel/gcov/fs.c: replace zero-length array with flexible-array member
Subsystem: kconfig
Krzysztof Kozlowski <krzk@kernel.org>:
init/Kconfig: clean up ANON_INODES and old IO schedulers options
Subsystem: kcov
Andrey Konovalov <andreyknvl@google.com>:
Patch series "kcov: collect coverage from usb soft interrupts", v4:
kcov: cleanup debug messages
kcov: fix potential use-after-free in kcov_remote_start
kcov: move t->kcov assignments into kcov_start/stop
kcov: move t->kcov_sequence assignment
kcov: use t->kcov_mode as enabled indicator
kcov: collect coverage from interrupts
usb: core: kcov: collect coverage from usb complete callback
Subsystem: ubsan
Kees Cook <keescook@chromium.org>:
Patch series "ubsan: Split out bounds checker", v5:
ubsan: add trap instrumentation option
ubsan: split "bounds" checker from other options
drivers/misc/lkdtm/bugs.c: add arithmetic overflow and array bounds checks
ubsan: check panic_on_warn
kasan: unset panic_on_warn before calling panic()
ubsan: include bug type in report header
Subsystem: fault-injection
Qiujun Huang <hqjagain@gmail.com>:
lib/Kconfig.debug: fix a typo "capabilitiy" -> "capability"
Subsystem: ipc
Somala Swaraj <somalaswaraj@gmail.com>:
ipc/mqueue.c: fix a brace coding style issue
Jason Yan <yanaijie@huawei.com>:
ipc/shm.c: make compat_ksys_shmctl() static
Documentation/admin-guide/kernel-parameters.txt | 13
Documentation/admin-guide/mm/transhuge.rst | 14
Documentation/admin-guide/mm/userfaultfd.rst | 51
Documentation/dev-tools/kcov.rst | 17
Documentation/vm/free_page_reporting.rst | 41
Documentation/vm/zswap.rst | 20
MAINTAINERS | 35
arch/alpha/include/asm/mmzone.h | 2
arch/alpha/kernel/syscalls/syscallhdr.sh | 2
arch/csky/mm/fault.c | 4
arch/ia64/kernel/syscalls/syscallhdr.sh | 2
arch/ia64/kernel/vmlinux.lds.S | 2
arch/m68k/mm/fault.c | 4
arch/microblaze/kernel/syscalls/syscallhdr.sh | 2
arch/mips/kernel/syscalls/syscallhdr.sh | 3
arch/mips/mm/fault.c | 4
arch/nds32/kernel/vmlinux.lds.S | 1
arch/parisc/kernel/syscalls/syscallhdr.sh | 2
arch/powerpc/kernel/syscalls/syscallhdr.sh | 3
arch/powerpc/kvm/e500_mmu_host.c | 2
arch/powerpc/mm/fault.c | 2
arch/powerpc/platforms/powernv/memtrace.c | 14
arch/sh/kernel/syscalls/syscallhdr.sh | 2
arch/sh/mm/fault.c | 2
arch/sparc/kernel/syscalls/syscallhdr.sh | 2
arch/sparc/vdso/vdso32/vclock_gettime.c | 4
arch/x86/Kconfig | 1
arch/x86/configs/i386_defconfig | 1
arch/x86/configs/x86_64_defconfig | 1
arch/x86/entry/vdso/vdso32/vclock_gettime.c | 4
arch/x86/include/asm/pgtable.h | 67 +
arch/x86/include/asm/pgtable_64.h | 8
arch/x86/include/asm/pgtable_types.h | 12
arch/x86/mm/fault.c | 2
arch/xtensa/kernel/syscalls/syscallhdr.sh | 2
drivers/base/memory.c | 138 --
drivers/hv/hv_balloon.c | 25
drivers/misc/lkdtm/bugs.c | 75 +
drivers/misc/lkdtm/core.c | 3
drivers/misc/lkdtm/lkdtm.h | 3
drivers/usb/core/hcd.c | 3
drivers/virtio/Kconfig | 1
drivers/virtio/virtio_balloon.c | 190 ++-
fs/binfmt_elf.c | 56
fs/eventpoll.c | 64 -
fs/proc/array.c | 39
fs/proc/cpuinfo.c | 1
fs/proc/generic.c | 31
fs/proc/inode.c | 188 ++-
fs/proc/internal.h | 6
fs/proc/kmsg.c | 1
fs/proc/stat.c | 1
fs/proc/task_mmu.c | 97 -
fs/reiserfs/do_balan.c | 2
fs/reiserfs/ioctl.c | 11
fs/reiserfs/namei.c | 10
fs/seq_file.c | 28
fs/userfaultfd.c | 116 +
include/asm-generic/pgtable.h | 1
include/asm-generic/pgtable_uffd.h | 66 +
include/asm-generic/tlb.h | 3
include/linux/bitops.h | 4
include/linux/bits.h | 22
include/linux/compiler.h | 2
include/linux/compiler_types.h | 11
include/linux/gfp.h | 2
include/linux/huge_mm.h | 2
include/linux/list.h | 50
include/linux/memory.h | 1
include/linux/memory_hotplug.h | 13
include/linux/memremap.h | 2
include/linux/mm.h | 25
include/linux/mm_inline.h | 15
include/linux/mm_types.h | 4
include/linux/mmzone.h | 47
include/linux/page-flags.h | 16
include/linux/page_reporting.h | 26
include/linux/pagemap.h | 4
include/linux/percpu_counter.h | 4
include/linux/proc_fs.h | 17
include/linux/sched.h | 3
include/linux/seq_file.h | 1
include/linux/shmem_fs.h | 10
include/linux/stackdepot.h | 2
include/linux/swapops.h | 5
include/linux/userfaultfd_k.h | 42
include/linux/vm_event_item.h | 5
include/trace/events/huge_memory.h | 1
include/trace/events/mmflags.h | 1
include/trace/events/vmscan.h | 2
include/uapi/linux/userfaultfd.h | 40
include/uapi/linux/virtio_balloon.h | 1
init/Kconfig | 8
ipc/mqueue.c | 5
ipc/shm.c | 2
ipc/util.c | 1
kernel/configs/tiny.config | 1
kernel/events/core.c | 3
kernel/extable.c | 3
kernel/fork.c | 10
kernel/gcov/fs.c | 2
kernel/gcov/gcc_3_4.c | 6
kernel/gcov/gcc_4_7.c | 2
kernel/kallsyms.c | 2
kernel/kcov.c | 282 +++-
kernel/kmod.c | 2
kernel/module.c | 1
kernel/sched/fair.c | 2
lib/Kconfig.debug | 35
lib/Kconfig.ubsan | 51
lib/Makefile | 8
lib/bch.c | 2
lib/dynamic_debug.c | 2
lib/rbtree.c | 4
lib/scatterlist.c | 2
lib/stackdepot.c | 39
lib/test_bitmap.c | 2
lib/test_kmod.c | 2
lib/test_lockup.c | 601 +++++++++-
lib/test_stackinit.c | 28
lib/ts_bm.c | 2
lib/ts_fsm.c | 2
lib/ts_kmp.c | 2
lib/ubsan.c | 47
mm/Kconfig | 135 ++
mm/Makefile | 1
mm/compaction.c | 3
mm/dmapool.c | 4
mm/filemap.c | 14
mm/gup.c | 9
mm/huge_memory.c | 36
mm/hugetlb.c | 1
mm/hugetlb_cgroup.c | 6
mm/internal.h | 2
mm/kasan/common.c | 23
mm/kasan/report.c | 10
mm/khugepaged.c | 39
mm/ksm.c | 5
mm/list_lru.c | 2
mm/memcontrol.c | 5
mm/memory-failure.c | 2
mm/memory.c | 42
mm/memory_hotplug.c | 53
mm/mempolicy.c | 11
mm/migrate.c | 122 +-
mm/mm_init.c | 2
mm/mmap.c | 10
mm/mprotect.c | 76 -
mm/page_alloc.c | 174 ++
mm/page_ext.c | 5
mm/page_isolation.c | 6
mm/page_reporting.c | 384 ++++++
mm/page_reporting.h | 54
mm/rmap.c | 23
mm/shmem.c | 168 +-
mm/shuffle.c | 12
mm/shuffle.h | 6
mm/slab_common.c | 1
mm/slub.c | 3
mm/sparse.c | 236 ++-
mm/swap.c | 20
mm/swapfile.c | 1
mm/userfaultfd.c | 98 +
mm/vmalloc.c | 2
mm/vmscan.c | 12
mm/vmstat.c | 3
mm/zsmalloc.c | 10
mm/zswap.c | 24
samples/hw_breakpoint/data_breakpoint.c | 11
scripts/Makefile.ubsan | 16
scripts/checkpatch.pl | 155 +-
tools/lib/rbtree.c | 4
tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c | 67 +
tools/testing/selftests/vm/userfaultfd.c | 233 +++
174 files changed, 3990 insertions(+), 1399 deletions(-)
* incoming
@ 2020-04-02 4:01 Andrew Morton
From: Andrew Morton @ 2020-04-02 4:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
A large amount of MM, plenty more to come.
155 patches, based on GIT 1a323ea5356edbb3073dc59d51b9e6b86908857d
Subsystems affected by this patch series:
tools
kthread
kbuild
scripts
ocfs2
vfs
mm/slub
mm/kmemleak
mm/pagecache
mm/gup
mm/swap
mm/memcg
mm/pagemap
mm/mremap
mm/sparsemem
mm/kasan
mm/pagealloc
mm/vmscan
mm/compaction
mm/mempolicy
mm/hugetlbfs
mm/hugetlb
Subsystem: tools
David Ahern <dsahern@kernel.org>:
tools/accounting/getdelays.c: fix netlink attribute length
Subsystem: kthread
Petr Mladek <pmladek@suse.com>:
kthread: mark timer used by delayed kthread works as IRQ safe
Subsystem: kbuild
Masahiro Yamada <masahiroy@kernel.org>:
asm-generic: make more kernel-space headers mandatory
Subsystem: scripts
Jonathan Neuschäfer <j.neuschaefer@gmx.net>:
scripts/spelling.txt: add syfs/sysfs pattern
Colin Ian King <colin.king@canonical.com>:
scripts/spelling.txt: add more spellings to spelling.txt
Subsystem: ocfs2
Alex Shi <alex.shi@linux.alibaba.com>:
ocfs2: remove FS_OCFS2_NM
ocfs2: remove unused macros
ocfs2: use OCFS2_SEC_BITS in macro
ocfs2: remove dlm_lock_is_remote
wangyan <wangyan122@huawei.com>:
ocfs2: there is no need to log twice in several functions
ocfs2: correct annotation from "l_next_rec" to "l_next_free_rec"
Alex Shi <alex.shi@linux.alibaba.com>:
ocfs2: remove useless err
Jules Irenge <jbi.octave@gmail.com>:
ocfs2: Add missing annotations for ocfs2_refcount_cache_lock() and ocfs2_refcount_cache_unlock()
"Gustavo A. R. Silva" <gustavo@embeddedor.com>:
ocfs2: replace zero-length array with flexible-array member
ocfs2: cluster: replace zero-length array with flexible-array member
ocfs2: dlm: replace zero-length array with flexible-array member
ocfs2: ocfs2_fs.h: replace zero-length array with flexible-array member
wangjian <wangjian161@huawei.com>:
ocfs2: roll back the reference count modification of the parent directory if an error occurs
Takashi Iwai <tiwai@suse.de>:
ocfs2: use scnprintf() for avoiding potential buffer overflow
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
ocfs2: use memalloc_nofs_save instead of memalloc_noio_save
Subsystem: vfs
Kees Cook <keescook@chromium.org>:
fs_parse: Remove pr_notice() about each validation
Subsystem: mm/slub
chenqiwu <chenqiwu@xiaomi.com>:
mm/slub.c: replace cpu_slab->partial with wrapped APIs
mm/slub.c: replace kmem_cache->cpu_partial with wrapped APIs
Kees Cook <keescook@chromium.org>:
slub: improve bit diffusion for freelist ptr obfuscation
slub: relocate freelist pointer to middle of object
Vlastimil Babka <vbabka@suse.cz>:
Revert "topology: add support for node_to_mem_node() to determine the fallback node"
Subsystem: mm/kmemleak
Nathan Chancellor <natechancellor@gmail.com>:
mm/kmemleak.c: use address-of operator on section symbols
Qian Cai <cai@lca.pw>:
mm/Makefile: disable KCSAN for kmemleak
Subsystem: mm/pagecache
Jan Kara <jack@suse.cz>:
mm/filemap.c: don't bother dropping mmap_sem for zero size readahead
Mauricio Faria de Oliveira <mfo@canonical.com>:
mm/page-writeback.c: write_cache_pages(): deduplicate identical checks
Xianting Tian <xianting_tian@126.com>:
mm/filemap.c: clear page error before actual read
Souptick Joarder <jrdr.linux@gmail.com>:
mm/filemap.c: remove unused argument from shrink_readahead_size_eio()
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm/filemap.c: use vm_fault error code directly
include/linux/pagemap.h: rename arguments to find_subpage
mm/page-writeback.c: use VM_BUG_ON_PAGE in clear_page_dirty_for_io
mm/filemap.c: unexport find_get_entry
mm/filemap.c: rewrite pagecache_get_page documentation
Subsystem: mm/gup
John Hubbard <jhubbard@nvidia.com>:
Patch series "mm/gup: track FOLL_PIN pages", v6:
mm/gup: split get_user_pages_remote() into two routines
mm/gup: pass a flags arg to __gup_device_* functions
mm: introduce page_ref_sub_return()
mm/gup: pass gup flags to two more routines
mm/gup: require FOLL_GET for get_user_pages_fast()
mm/gup: track FOLL_PIN pages
mm/gup: page->hpage_pinned_refcount: exact pin counts for huge pages
mm/gup: /proc/vmstat: pin_user_pages (FOLL_PIN) reporting
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN coverage
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: improve dump_page() for compound pages
John Hubbard <jhubbard@nvidia.com>:
mm: dump_page(): additional diagnostics for huge pinned pages
Claudio Imbrenda <imbrenda@linux.ibm.com>:
mm/gup/writeback: add callbacks for inaccessible pages
Pingfan Liu <kernelfans@gmail.com>:
mm/gup: rename nr as nr_pinned in get_user_pages_fast()
mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
Subsystem: mm/swap
Chen Wandun <chenwandun@huawei.com>:
mm/swapfile.c: fix comments for swapcache_prepare
Wei Yang <richardw.yang@linux.intel.com>:
mm/swap.c: not necessary to export __pagevec_lru_add()
Qian Cai <cai@lca.pw>:
mm/swapfile: fix data races in try_to_unuse()
Wei Yang <richard.weiyang@linux.alibaba.com>:
mm/swap_slots.c: assign|reset cache slot by value directly
Yang Shi <yang.shi@linux.alibaba.com>:
mm: swap: make page_evictable() inline
mm: swap: use smp_mb__after_atomic() to order LRU bit set
Wei Yang <richard.weiyang@gmail.com>:
mm/swap_state.c: use the same way to count page in [add_to|delete_from]_swap_cache
Subsystem: mm/memcg
Yafang Shao <laoar.shao@gmail.com>:
mm, memcg: fix build error around the usage of kmem_caches
Kirill Tkhai <ktkhai@virtuozzo.com>:
mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node
Roman Gushchin <guro@fb.com>:
mm: memcg/slab: use mem_cgroup_from_obj()
Patch series "mm: memcg: kmem API cleanup", v2:
mm: kmem: cleanup (__)memcg_kmem_charge_memcg() arguments
mm: kmem: cleanup memcg_kmem_uncharge_memcg() arguments
mm: kmem: rename memcg_kmem_(un)charge() into memcg_kmem_(un)charge_page()
mm: kmem: switch to nr_pages in (__)memcg_kmem_charge_memcg()
mm: memcg/slab: cache page number in memcg_(un)charge_slab()
mm: kmem: rename (__)memcg_kmem_(un)charge_memcg() to __memcg_kmem_(un)charge()
Johannes Weiner <hannes@cmpxchg.org>:
Patch series "mm: memcontrol: recursive memory.low protection", v3:
mm: memcontrol: fix memory.low proportional distribution
mm: memcontrol: clean up and document effective low/min calculations
mm: memcontrol: recursive memory.low protection
Shakeel Butt <shakeelb@google.com>:
memcg: css_tryget_online cleanups
Vincenzo Frascino <vincenzo.frascino@arm.com>:
mm/memcontrol.c: make mem_cgroup_id_get_many() __maybe_unused
Chris Down <chris@chrisdown.name>:
mm, memcg: prevent memory.high load/store tearing
mm, memcg: prevent memory.max load tearing
mm, memcg: prevent memory.low load/store tearing
mm, memcg: prevent memory.min load/store tearing
mm, memcg: prevent memory.swap.max load tearing
mm, memcg: prevent mem_cgroup_protected store tearing
Roman Gushchin <guro@fb.com>:
mm: memcg: make memory.oom.group tolerable to task migration
Subsystem: mm/pagemap
Thomas Hellstrom <thellstrom@vmware.com>:
mm/mapping_dirty_helpers: Update huge page-table entry callbacks
Anshuman Khandual <anshuman.khandual@arm.com>:
Patch series "mm/vma: some more minor changes", v2:
mm/vma: move VM_NO_KHUGEPAGED into generic header
mm/vma: make vma_is_foreign() available for general use
mm/vma: make is_vma_temporary_stack() available for general use
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: add pagemap.h to the fine documentation
Peter Xu <peterx@redhat.com>:
Patch series "mm: Page fault enhancements", v6:
mm/gup: rename "nonblocking" to "locked" where proper
mm/gup: fix __get_user_pages() on fault retry of hugetlb
mm: introduce fault_signal_pending()
x86/mm: use helper fault_signal_pending()
arc/mm: use helper fault_signal_pending()
arm64/mm: use helper fault_signal_pending()
powerpc/mm: use helper fault_signal_pending()
sh/mm: use helper fault_signal_pending()
mm: return faster for non-fatal signals in user mode faults
userfaultfd: don't retake mmap_sem to emulate NOPAGE
mm: introduce FAULT_FLAG_DEFAULT
mm: introduce FAULT_FLAG_INTERRUPTIBLE
mm: allow VM_FAULT_RETRY for multiple times
mm/gup: allow VM_FAULT_RETRY for multiple times
mm/gup: allow to react to fatal signals
mm/userfaultfd: honor FAULT_FLAG_KILLABLE in fault path
WANG Wenhu <wenhu.wang@vivo.com>:
mm: clarify a confusing comment for remap_pfn_range()
Wang Wenhu <wenhu.wang@vivo.com>:
mm/memory.c: clarify a confusing comment for vm_iomap_memory
Jaewon Kim <jaewon31.kim@samsung.com>:
Patch series "mm: mmap: add mmap trace point", v3:
mmap: remove inline of vm_unmapped_area
mm: mmap: add trace point of vm_unmapped_area
Subsystem: mm/mremap
Brian Geffon <bgeffon@google.com>:
mm/mremap: add MREMAP_DONTUNMAP to mremap()
selftests: add MREMAP_DONTUNMAP selftest
Subsystem: mm/sparsemem
Wei Yang <richardw.yang@linux.intel.com>:
mm/sparsemem: get address to page struct instead of address to pfn
Pingfan Liu <kernelfans@gmail.com>:
mm/sparse: rename pfn_present() to pfn_in_present_section()
Baoquan He <bhe@redhat.com>:
mm/sparse.c: use kvmalloc/kvfree to alloc/free memmap for the classic sparse
mm/sparse.c: allocate memmap preferring the given node
Subsystem: mm/kasan
Walter Wu <walter-zh.wu@mediatek.com>:
Patch series "fix the missing underflow in memory operation function", v4:
kasan: detect negative size in memory operation function
kasan: add test for invalid size in memmove
Subsystem: mm/pagealloc
Joel Savitz <jsavitz@redhat.com>:
mm/page_alloc: increase default min_free_kbytes bound
Mateusz Nosek <mateusznosek0@gmail.com>:
mm, pagealloc: micro-optimisation: save two branches on hot page allocation path
chenqiwu <chenqiwu@xiaomi.com>:
mm/page_alloc.c: use free_area_empty() instead of open-coding
Mateusz Nosek <mateusznosek0@gmail.com>:
mm/page_alloc.c: micro-optimisation Remove unnecessary branch
chenqiwu <chenqiwu@xiaomi.com>:
mm/page_alloc: simplify page_is_buddy() for better code readability
Subsystem: mm/vmscan
Yang Shi <yang.shi@linux.alibaba.com>:
mm: vmpressure: don't need call kfree if kstrndup fails
mm: vmpressure: use mem_cgroup_is_root API
mm: vmscan: replace open codings to NUMA_NO_NODE
Wei Yang <richardw.yang@linux.intel.com>:
mm/vmscan.c: remove cpu online notification for now
Qian Cai <cai@lca.pw>:
mm/vmscan.c: fix data races using kswapd_classzone_idx
Mateusz Nosek <mateusznosek0@gmail.com>:
mm/vmscan.c: Clean code by removing unnecessary assignment
Kirill Tkhai <ktkhai@virtuozzo.com>:
mm/vmscan.c: make may_enter_fs bool in shrink_page_list()
Mateusz Nosek <mateusznosek0@gmail.com>:
mm/vmscan.c: do_try_to_free_pages(): clean code by removing unnecessary assignment
Michal Hocko <mhocko@suse.com>:
selftests: vm: drop dependencies on page flags from mlock2 tests
Subsystem: mm/compaction
Rik van Riel <riel@surriel.com>:
Patch series "fix THP migration for CMA allocations", v2:
mm,compaction,cma: add alloc_contig flag to compact_control
mm,thp,compaction,cma: allow THP migration for CMA allocations
Vlastimil Babka <vbabka@suse.cz>:
mm, compaction: fully assume capture is not NULL in compact_zone_order()
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
mm/compaction: really limit compact_unevictable_allowed to 0 and 1
mm/compaction: Disable compact_unevictable_allowed on RT
Mateusz Nosek <mateusznosek0@gmail.com>:
mm/compaction.c: clean code by removing unnecessary assignment
Subsystem: mm/mempolicy
Li Xinhai <lixinhai.lxh@gmail.com>:
mm/mempolicy: support MPOL_MF_STRICT for huge page mapping
mm/mempolicy: check hugepage migration is supported by arch in vma_migratable()
Yang Shi <yang.shi@linux.alibaba.com>:
mm: mempolicy: use VM_BUG_ON_VMA in queue_pages_test_walk()
Randy Dunlap <rdunlap@infradead.org>:
mm: mempolicy: require at least one nodeid for MPOL_PREFERRED
Colin Ian King <colin.king@canonical.com>:
mm/memblock.c: remove redundant assignment to variable max_addr
Subsystem: mm/hugetlbfs
Mike Kravetz <mike.kravetz@oracle.com>:
Patch series "hugetlbfs: use i_mmap_rwsem for more synchronization", v2:
hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race
Subsystem: mm/hugetlb
Mina Almasry <almasrymina@google.com>:
hugetlb_cgroup: add hugetlb_cgroup reservation counter
hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations
mm/hugetlb_cgroup: fix hugetlb_cgroup migration
hugetlb_cgroup: add reservation accounting for private mappings
hugetlb: disable region_add file_region coalescing
hugetlb_cgroup: add accounting for shared mappings
hugetlb_cgroup: support noreserve mappings
hugetlb: support file_region coalescing again
hugetlb_cgroup: add hugetlb_cgroup reservation tests
hugetlb_cgroup: add hugetlb_cgroup reservation docs
Mateusz Nosek <mateusznosek0@gmail.com>:
mm/hugetlb.c: clean code by removing unnecessary initialization
Vlastimil Babka <vbabka@suse.cz>:
mm/hugetlb: remove unnecessary memory fetch in PageHeadHuge()
Christophe Leroy <christophe.leroy@c-s.fr>:
selftests/vm: fix map_hugetlb length used for testing read and write
mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
include/linux/huge_mm.h: check PageTail in hpage_nr_pages even when !THP
Documentation/admin-guide/cgroup-v1/hugetlb.rst | 103 +-
Documentation/admin-guide/cgroup-v2.rst | 11
Documentation/admin-guide/sysctl/vm.rst | 3
Documentation/core-api/mm-api.rst | 3
Documentation/core-api/pin_user_pages.rst | 86 +
arch/alpha/include/asm/Kbuild | 11
arch/alpha/mm/fault.c | 6
arch/arc/include/asm/Kbuild | 21
arch/arc/mm/fault.c | 37
arch/arm/include/asm/Kbuild | 12
arch/arm/mm/fault.c | 7
arch/arm64/include/asm/Kbuild | 18
arch/arm64/mm/fault.c | 26
arch/c6x/include/asm/Kbuild | 37
arch/csky/include/asm/Kbuild | 36
arch/h8300/include/asm/Kbuild | 46
arch/hexagon/include/asm/Kbuild | 33
arch/hexagon/mm/vm_fault.c | 5
arch/ia64/include/asm/Kbuild | 7
arch/ia64/mm/fault.c | 5
arch/m68k/include/asm/Kbuild | 24
arch/m68k/mm/fault.c | 7
arch/microblaze/include/asm/Kbuild | 29
arch/microblaze/mm/fault.c | 5
arch/mips/include/asm/Kbuild | 13
arch/mips/mm/fault.c | 5
arch/nds32/include/asm/Kbuild | 37
arch/nds32/mm/fault.c | 5
arch/nios2/include/asm/Kbuild | 38
arch/nios2/mm/fault.c | 7
arch/openrisc/include/asm/Kbuild | 36
arch/openrisc/mm/fault.c | 5
arch/parisc/include/asm/Kbuild | 18
arch/parisc/mm/fault.c | 8
arch/powerpc/include/asm/Kbuild | 4
arch/powerpc/mm/book3s64/pkeys.c | 12
arch/powerpc/mm/fault.c | 20
arch/powerpc/platforms/pseries/hotplug-memory.c | 2
arch/riscv/include/asm/Kbuild | 28
arch/riscv/mm/fault.c | 9
arch/s390/include/asm/Kbuild | 15
arch/s390/mm/fault.c | 10
arch/sh/include/asm/Kbuild | 16
arch/sh/mm/fault.c | 13
arch/sparc/include/asm/Kbuild | 14
arch/sparc/mm/fault_32.c | 5
arch/sparc/mm/fault_64.c | 5
arch/um/kernel/trap.c | 3
arch/unicore32/include/asm/Kbuild | 34
arch/unicore32/mm/fault.c | 8
arch/x86/include/asm/Kbuild | 2
arch/x86/include/asm/mmu_context.h | 15
arch/x86/mm/fault.c | 32
arch/xtensa/include/asm/Kbuild | 26
arch/xtensa/mm/fault.c | 5
drivers/base/node.c | 2
drivers/gpu/drm/ttm/ttm_bo_vm.c | 12
fs/fs_parser.c | 2
fs/hugetlbfs/inode.c | 30
fs/ocfs2/alloc.c | 3
fs/ocfs2/cluster/heartbeat.c | 12
fs/ocfs2/cluster/netdebug.c | 4
fs/ocfs2/cluster/tcp.c | 27
fs/ocfs2/cluster/tcp.h | 2
fs/ocfs2/dir.c | 4
fs/ocfs2/dlm/dlmcommon.h | 8
fs/ocfs2/dlm/dlmdebug.c | 100 -
fs/ocfs2/dlm/dlmmaster.c | 2
fs/ocfs2/dlm/dlmthread.c | 3
fs/ocfs2/dlmglue.c | 2
fs/ocfs2/journal.c | 2
fs/ocfs2/namei.c | 15
fs/ocfs2/ocfs2_fs.h | 18
fs/ocfs2/refcounttree.c | 2
fs/ocfs2/reservations.c | 3
fs/ocfs2/stackglue.c | 2
fs/ocfs2/suballoc.c | 5
fs/ocfs2/super.c | 46
fs/pipe.c | 2
fs/userfaultfd.c | 64 -
include/asm-generic/Kbuild | 52 +
include/linux/cgroup-defs.h | 5
include/linux/fs.h | 5
include/linux/gfp.h | 6
include/linux/huge_mm.h | 10
include/linux/hugetlb.h | 76 +
include/linux/hugetlb_cgroup.h | 175 +++
include/linux/kasan.h | 2
include/linux/kthread.h | 3
include/linux/memcontrol.h | 66 -
include/linux/mempolicy.h | 29
include/linux/mm.h | 243 +++-
include/linux/mm_types.h | 7
include/linux/mmzone.h | 6
include/linux/page_ref.h | 9
include/linux/pagemap.h | 29
include/linux/sched/signal.h | 18
include/linux/swap.h | 1
include/linux/topology.h | 17
include/trace/events/mmap.h | 48
include/uapi/linux/mman.h | 5
kernel/cgroup/cgroup.c | 17
kernel/fork.c | 9
kernel/sysctl.c | 31
lib/test_kasan.c | 19
mm/Makefile | 1
mm/compaction.c | 31
mm/debug.c | 54 -
mm/filemap.c | 77 -
mm/gup.c | 682 ++++++++++---
mm/gup_benchmark.c | 71 +
mm/huge_memory.c | 29
mm/hugetlb.c | 866 ++++++++++++-----
mm/hugetlb_cgroup.c | 347 +++++-
mm/internal.h | 32
mm/kasan/common.c | 26
mm/kasan/generic.c | 9
mm/kasan/generic_report.c | 11
mm/kasan/kasan.h | 2
mm/kasan/report.c | 5
mm/kasan/tags.c | 9
mm/kasan/tags_report.c | 11
mm/khugepaged.c | 4
mm/kmemleak.c | 2
mm/list_lru.c | 12
mm/mapping_dirty_helpers.c | 42
mm/memblock.c | 2
mm/memcontrol.c | 378 ++++---
mm/memory-failure.c | 29
mm/memory.c | 4
mm/mempolicy.c | 73 +
mm/migrate.c | 25
mm/mmap.c | 32
mm/mremap.c | 92 +
mm/page-writeback.c | 19
mm/page_alloc.c | 82 -
mm/page_counter.c | 29
mm/page_ext.c | 2
mm/rmap.c | 39
mm/shuffle.c | 2
mm/slab.h | 32
mm/slab_common.c | 2
mm/slub.c | 27
mm/sparse.c | 33
mm/swap.c | 5
mm/swap_slots.c | 12
mm/swap_state.c | 2
mm/swapfile.c | 10
mm/userfaultfd.c | 11
mm/vmpressure.c | 8
mm/vmscan.c | 111 --
mm/vmstat.c | 2
scripts/spelling.txt | 21
tools/accounting/getdelays.c | 2
tools/testing/selftests/vm/.gitignore | 1
tools/testing/selftests/vm/Makefile | 2
tools/testing/selftests/vm/charge_reserved_hugetlb.sh | 575 +++++++++++
tools/testing/selftests/vm/gup_benchmark.c | 15
tools/testing/selftests/vm/hugetlb_reparenting_test.sh | 244 ++++
tools/testing/selftests/vm/map_hugetlb.c | 14
tools/testing/selftests/vm/mlock2-tests.c | 233 ----
tools/testing/selftests/vm/mremap_dontunmap.c | 313 ++++++
tools/testing/selftests/vm/run_vmtests | 37
tools/testing/selftests/vm/write_hugetlb_memory.sh | 23
tools/testing/selftests/vm/write_to_hugetlbfs.c | 242 ++++
165 files changed, 5020 insertions(+), 2376 deletions(-)
* incoming
@ 2020-03-29 2:14 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-03-29 2:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
5 fixes, based on 83fd69c93340177dcd66fd26ce6441fb581c1dbf:
Naohiro Aota <naohiro.aota@wdc.com>:
mm/swapfile.c: move inode_lock out of claim_swapfile
David Hildenbrand <david@redhat.com>:
drivers/base/memory.c: indicate all memory blocks as removable
Mina Almasry <almasrymina@google.com>:
hugetlb_cgroup: fix illegal access to memory
Roman Gushchin <guro@fb.com>:
mm: fork: fix kernel_stack memcg stats for various stack implementations
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
mm/sparse: fix kernel crash with pfn_section_valid check
drivers/base/memory.c | 23 +++--------------------
include/linux/memcontrol.h | 12 ++++++++++++
kernel/fork.c | 4 ++--
mm/hugetlb_cgroup.c | 3 +--
mm/memcontrol.c | 38 ++++++++++++++++++++++++++++++++++++++
mm/sparse.c | 6 ++++++
mm/swapfile.c | 41 ++++++++++++++++++++---------------------
7 files changed, 82 insertions(+), 45 deletions(-)
* incoming
@ 2020-03-22 1:19 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-03-22 1:19 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
10 fixes, based on c63c50fc2ec9afc4de21ef9ead2eac64b178cce1:
Chunguang Xu <brookxu@tencent.com>:
memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
Baoquan He <bhe@redhat.com>:
mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
Qian Cai <cai@lca.pw>:
page-flags: fix a crash at SetPageError(THP_SWAP)
Chris Down <chris@chrisdown.name>:
mm, memcg: fix corruption on 64-bit divisor in memory.high throttling
mm, memcg: throttle allocators based on ancestral memory.high
Michal Hocko <mhocko@suse.com>:
mm: do not allow MADV_PAGEOUT for CoW pages
Roman Penyaev <rpenyaev@suse.de>:
epoll: fix possible lost wakeup on epoll_ctl() path
Qian Cai <cai@lca.pw>:
mm/mmu_notifier: silence PROVE_RCU_LIST warnings
Vlastimil Babka <vbabka@suse.cz>:
mm, slub: prevent kmalloc_node crashes and memory leaks
Joerg Roedel <jroedel@suse.de>:
x86/mm: split vmalloc_sync_all()
arch/x86/mm/fault.c | 26 ++++++++++-
drivers/acpi/apei/ghes.c | 2
fs/eventpoll.c | 8 +--
include/linux/page-flags.h | 2
include/linux/vmalloc.h | 5 +-
kernel/notifier.c | 2
mm/madvise.c | 12 +++--
mm/memcontrol.c | 105 ++++++++++++++++++++++++++++-----------------
mm/mmu_notifier.c | 27 +++++++----
mm/nommu.c | 10 +++-
mm/slub.c | 26 +++++++----
mm/sparse.c | 8 ++-
mm/vmalloc.c | 11 +++-
13 files changed, 165 insertions(+), 79 deletions(-)
* incoming
@ 2020-03-06 6:27 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-03-06 6:27 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
7 fixes, based on 9f65ed5fe41ce08ed1cb1f6a950f9ec694c142ad:
Mel Gorman <mgorman@techsingularity.net>:
mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa
Huang Ying <ying.huang@intel.com>:
mm: fix possible PMD dirty bit lost in set_pmd_migration_entry()
"Kirill A. Shutemov" <kirill@shutemov.name>:
mm: avoid data corruption on CoW fault into PFN-mapped VMA
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
fat: fix uninit-memory access for partial initialized inode
Sebastian Andrzej Siewior <bigeasy@linutronix.de>:
mm/z3fold.c: do not include rwlock.h directly
Vlastimil Babka <vbabka@suse.cz>:
mm, hotplug: fix page online with DEBUG_PAGEALLOC compiled but not enabled
Miroslav Benes <mbenes@suse.cz>:
arch/Kconfig: update HAVE_RELIABLE_STACKTRACE description
arch/Kconfig | 5 +++--
fs/fat/inode.c | 19 +++++++------------
include/linux/mm.h | 4 ++++
mm/huge_memory.c | 3 +--
mm/memory.c | 35 +++++++++++++++++++++++++++--------
mm/memory_hotplug.c | 8 +++++++-
mm/mprotect.c | 38 ++++++++++++++++++++++++++++++++++++--
mm/z3fold.c | 1 -
8 files changed, 85 insertions(+), 28 deletions(-)
* Re: incoming
2020-02-21 18:32 ` incoming Konstantin Ryabitsev
@ 2020-02-27 9:59 ` Vlastimil Babka
0 siblings, 0 replies; 786+ messages in thread
From: Vlastimil Babka @ 2020-02-27 9:59 UTC (permalink / raw)
To: Konstantin Ryabitsev, Linus Torvalds; +Cc: Andrew Morton, Linux-MM, mm-commits
On 2/21/20 7:32 PM, Konstantin Ryabitsev wrote:
> On Fri, Feb 21, 2020 at 10:21:19AM -0800, Linus Torvalds wrote:
>> On Thu, Feb 20, 2020 at 8:00 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>> >
>> > - A few y2038 fixes which missed the merge window while dependencies
>> > in NFS were being sorted out.
>> >
>> > - A bunch of fixes. Some minor, some not.
>>
>> Hmm. Konstantin's nice lore script _used_ to pick up your patches, but
>> now they don't.
>>
>> I'm not sure what changed. It worked with your big series of 118 patches.
>>
>> It doesn't work with this smaller series of fixes.
>>
>> I think the difference is that you've done something bad to your patch
>> sending. That big series was properly threaded with each of the
>> patches being a reply to the 'incoming' message.
>>
>> This series is not.
>
> This is correct -- each patch is posted without an in-reply-to, so
> public-inbox doesn't group them into a thread.
>
> E.g.:
> https://lore.kernel.org/linux-mm/20200221040350.84HaG%25akpm@linux-foundation.org/
>
>>
>> Please, Andrew, can you make your email flow more consistent so that I
>> can actually use the nice new tool to download a patch series?
>
> Andrew, I'll be happy to provide you with a helper tool if you can
> describe your workflow to me. E.g. if you have a quilt directory of patches
> plus a series file, it could easily be a tiny wrapper like:
>
> send-patches --base-commit 1234abcd --cover cover.txt patchdir/series
Once/if there is such a tool, could it perhaps, instead of mass e-mailing, create
git commits, push them to a korg repo and send a pull request?
Thanks,
Vlastimil
> -K
>
* Re: incoming
2020-02-21 18:21 ` incoming Linus Torvalds
2020-02-21 18:32 ` incoming Konstantin Ryabitsev
@ 2020-02-21 19:33 ` Linus Torvalds
1 sibling, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2020-02-21 19:33 UTC (permalink / raw)
To: Andrew Morton, Konstantin Ryabitsev; +Cc: Linux-MM, mm-commits
Side note: I've obviously picked it up the old-fashioned way, but I
had been looking forward to seeing if I could just automate this more.
Linus
On Fri, Feb 21, 2020 at 10:21 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Please, Andrew, can you make your email flow more consistent so that I
> can actually use the nice new tool to download a patch series?
>
> Linus
* Re: incoming
2020-02-21 18:21 ` incoming Linus Torvalds
@ 2020-02-21 18:32 ` Konstantin Ryabitsev
2020-02-27 9:59 ` incoming Vlastimil Babka
2020-02-21 19:33 ` incoming Linus Torvalds
1 sibling, 1 reply; 786+ messages in thread
From: Konstantin Ryabitsev @ 2020-02-21 18:32 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, Linux-MM, mm-commits
On Fri, Feb 21, 2020 at 10:21:19AM -0800, Linus Torvalds wrote:
> On Thu, Feb 20, 2020 at 8:00 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > - A few y2038 fixes which missed the merge window while dependencies
> > in NFS were being sorted out.
> >
> > - A bunch of fixes. Some minor, some not.
>
> Hmm. Konstantin's nice lore script _used_ to pick up your patches, but
> now they don't.
>
> I'm not sure what changed. It worked with your big series of 118 patches.
>
> It doesn't work with this smaller series of fixes.
>
> I think the difference is that you've done something bad to your patch
> sending. That big series was properly threaded with each of the
> patches being a reply to the 'incoming' message.
>
> This series is not.
This is correct -- each patch is posted without an in-reply-to, so
public-inbox doesn't group them into a thread.
E.g.:
https://lore.kernel.org/linux-mm/20200221040350.84HaG%25akpm@linux-foundation.org/
>
> Please, Andrew, can you make your email flow more consistent so that I
> can actually use the nice new tool to download a patch series?
Andrew, I'll be happy to provide you with a helper tool if you can
describe your workflow to me. E.g. if you have a quilt directory of patches
plus a series file, it could easily be a tiny wrapper like:
send-patches --base-commit 1234abcd --cover cover.txt patchdir/series
-K
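The threading requirement described above (every patch carrying an
In-Reply-To that points at the cover letter) can be sketched as follows.
This is only an illustration under stated assumptions, not the helper
offered in this message: build_series(), its arguments, the subject
format and the example.invalid domain are all hypothetical.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_series(cover_text, patches, sender, recipient, domain="example.invalid"):
    """Return [cover, patch1, ...] with every patch threaded under the cover.

    `patches` is a list of (subject, body) tuples. All names here are
    hypothetical and not part of any existing tool.
    """
    cover = EmailMessage()
    cover["From"] = sender
    cover["To"] = recipient
    cover["Subject"] = "incoming"
    cover["Message-ID"] = make_msgid(domain=domain)
    cover.set_content(cover_text)

    messages = [cover]
    total = len(patches)
    width = len(str(total))  # zero-pad patch numbers so subjects sort correctly
    for index, (subject, body) in enumerate(patches, start=1):
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = recipient
        msg["Subject"] = f"[patch {index:0{width}d}/{total}] {subject}"
        msg["Message-ID"] = make_msgid(domain=domain)
        # These two headers make the patch a reply to the cover letter,
        # which is what lets an archive group the series into one thread.
        msg["In-Reply-To"] = cover["Message-ID"]
        msg["References"] = cover["Message-ID"]
        msg.set_content(body)
        messages.append(msg)
    return messages
```

The cover letter is the only message without an In-Reply-To header, so it
becomes the root of the thread and every patch hangs directly off it.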
* Re: incoming
2020-02-21 4:00 incoming Andrew Morton
2020-02-21 4:03 ` incoming Andrew Morton
@ 2020-02-21 18:21 ` Linus Torvalds
2020-02-21 18:32 ` incoming Konstantin Ryabitsev
2020-02-21 19:33 ` incoming Linus Torvalds
1 sibling, 2 replies; 786+ messages in thread
From: Linus Torvalds @ 2020-02-21 18:21 UTC (permalink / raw)
To: Andrew Morton, Konstantin Ryabitsev; +Cc: Linux-MM, mm-commits
On Thu, Feb 20, 2020 at 8:00 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> - A few y2038 fixes which missed the merge window while dependencies
> in NFS were being sorted out.
>
> - A bunch of fixes. Some minor, some not.
Hmm. Konstantin's nice lore script _used_ to pick up your patches, but
now they don't.
I'm not sure what changed. It worked with your big series of 118 patches.
It doesn't work with this smaller series of fixes.
I think the difference is that you've done something bad to your patch
sending. That big series was properly threaded with each of the
patches being a reply to the 'incoming' message.
This series is not.
Please, Andrew, can you make your email flow more consistent so that I
can actually use the nice new tool to download a patch series?
Linus
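The difference between the two series described above comes down to
whether each patch names the 'incoming' message as its parent. A much
simplified sketch of that grouping follows; it is an assumption made for
illustration, not the algorithm used by the lore tooling, and
group_by_parent() is a hypothetical name.

```python
from collections import defaultdict
from email.message import EmailMessage


def group_by_parent(messages):
    """Map each parent Message-ID to the Message-IDs that reply to it.

    Messages without an In-Reply-To header are collected under the key
    None, meaning each of them starts a thread of its own.
    """
    children = defaultdict(list)
    for msg in messages:
        parent = msg["In-Reply-To"]  # None when the header is absent
        key = str(parent).strip() if parent is not None else None
        children[key].append(str(msg["Message-ID"]).strip())
    return dict(children)
```

For a properly threaded series the cover letter is the only entry under
None and all patches sit under the cover letter's Message-ID. For an
unthreaded series every message ends up under None, so there is nothing
to tie the patches to the cover letter.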
* Re: incoming
2020-02-21 4:00 incoming Andrew Morton
@ 2020-02-21 4:03 ` Andrew Morton
2020-02-21 18:21 ` incoming Linus Torvalds
1 sibling, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-02-21 4:03 UTC (permalink / raw)
To: Linus Torvalds, linux-mm, mm-commits
On Thu, 20 Feb 2020 20:00:30 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:
> - A few y2038 fixes which missed the merge window while dependencies
> in NFS were being sorted out.
>
> - A bunch of fixes. Some minor, some not.
15 patches, based on ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2
* incoming
@ 2020-02-21 4:00 Andrew Morton
2020-02-21 4:03 ` incoming Andrew Morton
2020-02-21 18:21 ` incoming Linus Torvalds
0 siblings, 2 replies; 786+ messages in thread
From: Andrew Morton @ 2020-02-21 4:00 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
- A few y2038 fixes which missed the merge window while dependencies
in NFS were being sorted out.
- A bunch of fixes. Some minor, some not.
Subsystems affected by this patch series:
Arnd Bergmann <arnd@arndb.de>:
y2038: remove ktime to/from timespec/timeval conversion
y2038: remove unused time32 interfaces
y2038: hide timeval/timespec/itimerval/itimerspec types
Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>:
Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"
Christian Borntraeger <borntraeger@de.ibm.com>:
include/uapi/linux/swab.h: fix userspace breakage, use __BITS_PER_LONG for swap
SeongJae Park <sjpark@amazon.de>:
selftests/vm: add missed tests in run_vmtests
Joe Perches <joe@perches.com>:
get_maintainer: remove uses of P: for maintainer name
Douglas Anderson <dianders@chromium.org>:
scripts/get_maintainer.pl: deprioritize old Fixes: addresses
Christoph Hellwig <hch@lst.de>:
mm/swapfile.c: fix a comment in sys_swapon()
Vasily Averin <vvs@virtuozzo.com>:
mm/memcontrol.c: lost css_put in memcg_expand_shrinker_maps()
Alexandru Ardelean <alexandru.ardelean@analog.com>:
lib/string.c: update match_string() doc-strings with correct behavior
Gavin Shan <gshan@redhat.com>:
mm/vmscan.c: don't round up scan size for online memory cgroup
Wei Yang <richardw.yang@linux.intel.com>:
mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEM
Alexander Potapenko <glider@google.com>:
lib/stackdepot.c: fix global out-of-bounds in stack_slabs
Randy Dunlap <rdunlap@infradead.org>:
MAINTAINERS: use tabs for SAFESETID
MAINTAINERS | 8 -
include/linux/compat.h | 29 ------
include/linux/ktime.h | 37 -------
include/linux/time32.h | 154 ---------------------------------
include/linux/timekeeping32.h | 32 ------
include/linux/types.h | 5 -
include/uapi/asm-generic/posix_types.h | 2
include/uapi/linux/swab.h | 4
include/uapi/linux/time.h | 22 ++--
ipc/sem.c | 6 -
kernel/compat.c | 64 -------------
kernel/time/time.c | 43 ---------
lib/stackdepot.c | 8 +
lib/string.c | 16 +++
mm/memcontrol.c | 4
mm/sparse.c | 2
mm/swapfile.c | 2
mm/vmscan.c | 9 +
scripts/get_maintainer.pl | 32 ------
tools/testing/selftests/vm/run_vmtests | 33 +++++++
20 files changed, 93 insertions(+), 419 deletions(-)
* Re: incoming
2020-02-04 2:46 ` incoming Andrew Morton
@ 2020-02-04 3:11 ` Linus Torvalds
0 siblings, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2020-02-04 3:11 UTC (permalink / raw)
To: Andrew Morton; +Cc: mm-commits, Linux-MM
On Tue, Feb 4, 2020 at 2:46 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 4 Feb 2020 02:27:48 +0000 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > What's the base? You've changed your scripts or something, and that
> > information is no longer in your cover letter..
>
> Crap, sorry, geriatric.
>
> d4e9056daedca3891414fe3c91de3449a5dad0f2
Ok, I've tentatively applied it with the MIME decoding fixes I found,
and I guess I'll let it build and sit for a while before merging it
into my tree.
I didn't find anything else odd in there. But...
Linus
* Re: incoming
2020-02-04 2:27 ` incoming Linus Torvalds
@ 2020-02-04 2:46 ` Andrew Morton
2020-02-04 3:11 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2020-02-04 2:46 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, Linux-MM
On Tue, 4 Feb 2020 02:27:48 +0000 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Tue, Feb 4, 2020 at 1:33 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > The rest of MM and the rest of everything else.
>
> What's the base? You've changed your scripts or something, and that
> information is no longer in your cover letter..
>
Crap, sorry, geriatric.
d4e9056daedca3891414fe3c91de3449a5dad0f2
* Re: incoming
2020-02-04 1:33 incoming Andrew Morton
@ 2020-02-04 2:27 ` Linus Torvalds
2020-02-04 2:46 ` incoming Andrew Morton
0 siblings, 1 reply; 786+ messages in thread
From: Linus Torvalds @ 2020-02-04 2:27 UTC (permalink / raw)
To: Andrew Morton; +Cc: mm-commits, Linux-MM
On Tue, Feb 4, 2020 at 1:33 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> The rest of MM and the rest of everything else.
What's the base? You've changed your scripts or something, and that
information is no longer in your cover letter..
Linus
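Other cover letters in this thread state the base on a line such as
"15 patches, based on <sha>" or "5 fixes, based on <sha>:". A minimal
sketch of checking a cover letter for that line before sending is shown
here. The regular expression and find_base() are assumptions made for
illustration; they are not taken from any script mentioned in the thread.

```python
import re

# Matches e.g. "15 patches, based on <40 hex digits>" with an optional
# trailing colon, anywhere in the cover letter.
BASE_LINE = re.compile(
    r"^\s*(?P<count>\d+)\s+(?P<kind>patch(?:es)?|fix(?:es)?),\s+based on\s+"
    r"(?P<sha>[0-9a-f]{40}):?\s*$",
    re.MULTILINE,
)


def find_base(cover_text):
    """Return (count, kind, sha) from the base line, or None if it is missing."""
    match = BASE_LINE.search(cover_text)
    if match is None:
        return None
    return int(match.group("count")), match.group("kind"), match.group("sha")
```

A sending script could refuse to mail a cover letter for which
find_base() returns None, which would catch the omission discussed here.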
* incoming
@ 2020-02-04 1:33 Andrew Morton
2020-02-04 2:27 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2020-02-04 1:33 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
The rest of MM and the rest of everything else.
Subsystems affected by this patch series:
hotfixes
mm/pagealloc
mm/memory-hotplug
ipc
misc
mm/cleanups
mm/pagemap
procfs
lib
cleanups
arm
Subsystem: hotfixes
Gang He <GHe@suse.com>:
ocfs2: fix oops when writing cloned file
David Hildenbrand <david@redhat.com>:
Patch series "mm: fix max_pfn not falling on section boundary", v2:
mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section
fs/proc/page.c: allow inspection of last section and fix end detection
mm/page_alloc.c: initialize memmap of unavailable memory directly
Subsystem: mm/pagealloc
David Hildenbrand <david@redhat.com>:
mm/page_alloc: fix and rework pfn handling in memmap_init_zone()
mm: factor out next_present_section_nr()
Subsystem: mm/memory-hotplug
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
Patch series "mm/memory_hotplug: Shrink zones before removing memory", v6:
mm/memmap_init: update variable name in memmap_init_zone
David Hildenbrand <david@redhat.com>:
mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()
mm/memory_hotplug: we always have a zone in find_(smallest|biggest)_section_pfn
mm/memory_hotplug: don't check for "all holes" in shrink_zone_span()
mm/memory_hotplug: drop local variables in shrink_zone_span()
mm/memory_hotplug: cleanup __remove_pages()
mm/memory_hotplug: drop valid_start/valid_end from test_pages_in_a_zone()
Subsystem: ipc
Manfred Spraul <manfred@colorfullife.com>:
smp_mb__{before,after}_atomic(): update Documentation
Davidlohr Bueso <dave@stgolabs.net>:
ipc/mqueue.c: remove duplicated code
Manfred Spraul <manfred@colorfullife.com>:
ipc/mqueue.c: update/document memory barriers
ipc/msg.c: update and document memory barriers
ipc/sem.c: document and update memory barriers
Lu Shuaibing <shuaibinglu@126.com>:
ipc/msg.c: consolidate all xxxctl_down() functions
Subsystem: misc
Andrew Morton <akpm@linux-foundation.org>:
drivers/block/null_blk_main.c: fix layout
drivers/block/null_blk_main.c: fix uninitialized var warnings
Randy Dunlap <rdunlap@infradead.org>:
pinctrl: fix pxa2xx.c build warnings
Subsystem: mm/cleanups
Florian Westphal <fw@strlen.de>:
mm: remove __krealloc
Subsystem: mm/pagemap
Steven Price <steven.price@arm.com>:
Patch series "Generic page walk and ptdump", v17:
mm: add generic p?d_leaf() macros
arc: mm: add p?d_leaf() definitions
arm: mm: add p?d_leaf() definitions
arm64: mm: add p?d_leaf() definitions
mips: mm: add p?d_leaf() definitions
powerpc: mm: add p?d_leaf() definitions
riscv: mm: add p?d_leaf() definitions
s390: mm: add p?d_leaf() definitions
sparc: mm: add p?d_leaf() definitions
x86: mm: add p?d_leaf() definitions
mm: pagewalk: add p4d_entry() and pgd_entry()
mm: pagewalk: allow walking without vma
mm: pagewalk: don't lock PTEs for walk_page_range_novma()
mm: pagewalk: fix termination condition in walk_pte_range()
mm: pagewalk: add 'depth' parameter to pte_hole
x86: mm: point to struct seq_file from struct pg_state
x86: mm+efi: convert ptdump_walk_pgd_level() to take a mm_struct
x86: mm: convert ptdump_walk_pgd_level_debugfs() to take an mm_struct
mm: add generic ptdump
x86: mm: convert dump_pagetables to use walk_page_range
arm64: mm: convert mm/dump.c to use walk_page_range()
arm64: mm: display non-present entries in ptdump
mm: ptdump: reduce level numbers by 1 in note_page()
x86: mm: avoid allocating struct mm_struct on the stack
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
Patch series "Fixup page directory freeing", v4:
powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case
Peter Zijlstra <peterz@infradead.org>:
mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush
asm-generic/tlb: avoid potential double flush
asm-gemeric/tlb: remove stray function declarations
asm-generic/tlb: add missing CONFIG symbol
asm-generic/tlb: rename HAVE_RCU_TABLE_FREE
asm-generic/tlb: rename HAVE_MMU_GATHER_PAGE_SIZE
asm-generic/tlb: rename HAVE_MMU_GATHER_NO_GATHER
asm-generic/tlb: provide MMU_GATHER_TABLE_FREE
Subsystem: procfs
Alexey Dobriyan <adobriyan@gmail.com>:
proc: decouple proc from VFS with "struct proc_ops"
proc: convert everything to "struct proc_ops"
Subsystem: lib
Yury Norov <yury.norov@gmail.com>:
Patch series "lib: rework bitmap_parse", v5:
lib/string: add strnchrnul()
bitops: more BITS_TO_* macros
lib: add test for bitmap_parse()
lib: make bitmap_parse_user a wrapper on bitmap_parse
lib: rework bitmap_parse()
lib: new testcases for bitmap_parse{_user}
include/linux/cpumask.h: don't calculate length of the input string
Subsystem: cleanups
Masahiro Yamada <masahiroy@kernel.org>:
treewide: remove redundant IS_ERR() before error code check
Subsystem: arm
Chen-Yu Tsai <wens@csie.org>:
ARM: dma-api: fix max_pfn off-by-one error in __dma_supported()
Documentation/memory-barriers.txt | 14
arch/Kconfig | 17
arch/alpha/kernel/srm_env.c | 17
arch/arc/include/asm/pgtable.h | 1
arch/arm/Kconfig | 2
arch/arm/include/asm/pgtable-2level.h | 1
arch/arm/include/asm/pgtable-3level.h | 1
arch/arm/include/asm/tlb.h | 6
arch/arm/kernel/atags_proc.c | 8
arch/arm/mm/alignment.c | 14
arch/arm/mm/dma-mapping.c | 2
arch/arm64/Kconfig | 3
arch/arm64/Kconfig.debug | 19
arch/arm64/include/asm/pgtable.h | 2
arch/arm64/include/asm/ptdump.h | 8
arch/arm64/mm/Makefile | 4
arch/arm64/mm/dump.c | 152 ++----
arch/arm64/mm/mmu.c | 4
arch/arm64/mm/ptdump_debugfs.c | 2
arch/ia64/kernel/salinfo.c | 24 -
arch/m68k/kernel/bootinfo_proc.c | 8
arch/mips/include/asm/pgtable.h | 5
arch/mips/lasat/picvue_proc.c | 31 -
arch/powerpc/Kconfig | 7
arch/powerpc/include/asm/book3s/32/pgalloc.h | 8
arch/powerpc/include/asm/book3s/64/pgalloc.h | 2
arch/powerpc/include/asm/book3s/64/pgtable.h | 3
arch/powerpc/include/asm/nohash/pgalloc.h | 8
arch/powerpc/include/asm/tlb.h | 11
arch/powerpc/kernel/proc_powerpc.c | 10
arch/powerpc/kernel/rtas-proc.c | 70 +--
arch/powerpc/kernel/rtas_flash.c | 34 -
arch/powerpc/kernel/rtasd.c | 14
arch/powerpc/mm/book3s64/pgtable.c | 7
arch/powerpc/mm/numa.c | 12
arch/powerpc/platforms/pseries/lpar.c | 24 -
arch/powerpc/platforms/pseries/lparcfg.c | 14
arch/powerpc/platforms/pseries/reconfig.c | 8
arch/powerpc/platforms/pseries/scanlog.c | 15
arch/riscv/include/asm/pgtable-64.h | 7
arch/riscv/include/asm/pgtable.h | 7
arch/s390/Kconfig | 4
arch/s390/include/asm/pgtable.h | 2
arch/sh/mm/alignment.c | 17
arch/sparc/Kconfig | 3
arch/sparc/include/asm/pgtable_64.h | 2
arch/sparc/include/asm/tlb_64.h | 11
arch/sparc/kernel/led.c | 15
arch/um/drivers/mconsole_kern.c | 9
arch/um/kernel/exitcode.c | 15
arch/um/kernel/process.c | 15
arch/x86/Kconfig | 3
arch/x86/Kconfig.debug | 20
arch/x86/include/asm/pgtable.h | 10
arch/x86/include/asm/tlb.h | 4
arch/x86/kernel/cpu/mtrr/if.c | 21
arch/x86/mm/Makefile | 4
arch/x86/mm/debug_pagetables.c | 18
arch/x86/mm/dump_pagetables.c | 418 +++++-------------
arch/x86/platform/efi/efi_32.c | 2
arch/x86/platform/efi/efi_64.c | 4
arch/x86/platform/uv/tlb_uv.c | 14
arch/xtensa/platforms/iss/simdisk.c | 10
crypto/af_alg.c | 2
drivers/acpi/battery.c | 15
drivers/acpi/proc.c | 15
drivers/acpi/scan.c | 2
drivers/base/memory.c | 9
drivers/block/null_blk_main.c | 58 +-
drivers/char/hw_random/bcm2835-rng.c | 2
drivers/char/hw_random/omap-rng.c | 4
drivers/clk/clk.c | 2
drivers/dma/mv_xor_v2.c | 2
drivers/firmware/efi/arm-runtime.c | 2
drivers/gpio/gpiolib-devres.c | 2
drivers/gpio/gpiolib-of.c | 8
drivers/gpio/gpiolib.c | 2
drivers/hwmon/dell-smm-hwmon.c | 15
drivers/i2c/busses/i2c-mv64xxx.c | 5
drivers/i2c/busses/i2c-synquacer.c | 2
drivers/ide/ide-proc.c | 19
drivers/input/input.c | 28 -
drivers/isdn/capi/kcapi_proc.c | 6
drivers/macintosh/via-pmu.c | 17
drivers/md/md.c | 15
drivers/misc/sgi-gru/gruprocfs.c | 42 -
drivers/mtd/ubi/build.c | 2
drivers/net/wireless/cisco/airo.c | 126 ++---
drivers/net/wireless/intel/ipw2x00/libipw_module.c | 15
drivers/net/wireless/intersil/hostap/hostap_hw.c | 4
drivers/net/wireless/intersil/hostap/hostap_proc.c | 14
drivers/net/wireless/intersil/hostap/hostap_wlan.h | 2
drivers/net/wireless/ray_cs.c | 20
drivers/of/device.c | 2
drivers/parisc/led.c | 17
drivers/pci/controller/pci-tegra.c | 2
drivers/pci/proc.c | 25 -
drivers/phy/phy-core.c | 4
drivers/pinctrl/pxa/pinctrl-pxa2xx.c | 1
drivers/platform/x86/thinkpad_acpi.c | 15
drivers/platform/x86/toshiba_acpi.c | 60 +-
drivers/pnp/isapnp/proc.c | 9
drivers/pnp/pnpbios/proc.c | 17
drivers/s390/block/dasd_proc.c | 15
drivers/s390/cio/blacklist.c | 14
drivers/s390/cio/css.c | 11
drivers/scsi/esas2r/esas2r_main.c | 9
drivers/scsi/scsi_devinfo.c | 15
drivers/scsi/scsi_proc.c | 29 -
drivers/scsi/sg.c | 30 -
drivers/spi/spi-orion.c | 3
drivers/staging/rtl8192u/ieee80211/ieee80211_module.c | 14
drivers/tty/sysrq.c | 8
drivers/usb/gadget/function/rndis.c | 17
drivers/video/fbdev/imxfb.c | 2
drivers/video/fbdev/via/viafbdev.c | 105 ++--
drivers/zorro/proc.c | 9
fs/cifs/cifs_debug.c | 108 ++--
fs/cifs/dfs_cache.c | 13
fs/cifs/dfs_cache.h | 2
fs/ext4/super.c | 2
fs/f2fs/node.c | 2
fs/fscache/internal.h | 2
fs/fscache/object-list.c | 11
fs/fscache/proc.c | 2
fs/jbd2/journal.c | 13
fs/jfs/jfs_debug.c | 14
fs/lockd/procfs.c | 12
fs/nfsd/nfsctl.c | 13
fs/nfsd/stats.c | 12
fs/ocfs2/file.c | 14
fs/ocfs2/suballoc.c | 2
fs/proc/cpuinfo.c | 12
fs/proc/generic.c | 38 -
fs/proc/inode.c | 76 +--
fs/proc/internal.h | 5
fs/proc/kcore.c | 13
fs/proc/kmsg.c | 14
fs/proc/page.c | 54 +-
fs/proc/proc_net.c | 32 -
fs/proc/proc_sysctl.c | 2
fs/proc/root.c | 2
fs/proc/stat.c | 12
fs/proc/task_mmu.c | 4
fs/proc/vmcore.c | 10
fs/sysfs/group.c | 2
include/asm-generic/pgtable.h | 20
include/asm-generic/tlb.h | 138 +++--
include/linux/bitmap.h | 8
include/linux/bitops.h | 4
include/linux/cpumask.h | 4
include/linux/memory_hotplug.h | 4
include/linux/mm.h | 6
include/linux/mmzone.h | 10
include/linux/pagewalk.h | 49 +-
include/linux/proc_fs.h | 23
include/linux/ptdump.h | 24 -
include/linux/seq_file.h | 13
include/linux/slab.h | 1
include/linux/string.h | 1
include/linux/sunrpc/stats.h | 4
ipc/mqueue.c | 123 ++++-
ipc/msg.c | 62 +-
ipc/sem.c | 66 +-
ipc/util.c | 14
kernel/configs.c | 9
kernel/irq/proc.c | 42 -
kernel/kallsyms.c | 12
kernel/latencytop.c | 14
kernel/locking/lockdep_proc.c | 15
kernel/module.c | 12
kernel/profile.c | 24 -
kernel/sched/psi.c | 48 +-
lib/bitmap.c | 195 ++++----
lib/string.c | 17
lib/test_bitmap.c | 105 ++++
mm/Kconfig.debug | 21
mm/Makefile | 1
mm/gup.c | 2
mm/hmm.c | 66 +-
mm/memory_hotplug.c | 104 +---
mm/memremap.c | 2
mm/migrate.c | 5
mm/mincore.c | 1
mm/mmu_gather.c | 158 ++++--
mm/page_alloc.c | 75 +--
mm/pagewalk.c | 167 +++++--
mm/ptdump.c | 159 ++++++
mm/slab_common.c | 37 -
mm/sparse.c | 10
mm/swapfile.c | 14
net/atm/mpoa_proc.c | 17
net/atm/proc.c | 8
net/core/dev.c | 2
net/core/filter.c | 2
net/core/pktgen.c | 44 -
net/ipv4/ipconfig.c | 10
net/ipv4/netfilter/ipt_CLUSTERIP.c | 16
net/ipv4/route.c | 24 -
net/netfilter/xt_recent.c | 17
net/sunrpc/auth_gss/svcauth_gss.c | 10
net/sunrpc/cache.c | 45 -
net/sunrpc/stats.c | 21
net/xfrm/xfrm_policy.c | 2
samples/kfifo/bytestream-example.c | 11
samples/kfifo/inttype-example.c | 11
samples/kfifo/record-example.c | 11
scripts/coccinelle/free/devm_free.cocci | 4
sound/core/info.c | 34 -
sound/soc/codecs/ak4104.c | 3
sound/soc/codecs/cs4270.c | 3
sound/soc/codecs/tlv320aic32x4.c | 6
sound/soc/sunxi/sun4i-spdif.c | 2
tools/include/linux/bitops.h | 9
214 files changed, 2589 insertions(+), 2227 deletions(-)
* incoming
@ 2020-01-31 6:10 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-01-31 6:10 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
Most of -mm and quite a number of other subsystems.
MM is fairly quiet this time. Holidays, I assume.
119 patches, based on 39bed42de2e7d74686a2d5a45638d6a5d7e7d473:
Subsystems affected by this patch series:
hotfixes
scripts
ocfs2
mm/slub
mm/kmemleak
mm/debug
mm/pagecache
mm/gup
mm/swap
mm/memcg
mm/pagemap
mm/tracing
mm/kasan
mm/initialization
mm/pagealloc
mm/vmscan
mm/tools
mm/memblock
mm/oom-kill
mm/hugetlb
mm/migration
mm/mmap
mm/memory-hotplug
mm/zswap
mm/cleanups
mm/zram
misc
lib
binfmt
init
reiserfs
exec
dma-mapping
kcov
Subsystem: hotfixes
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
lib/test_bitmap: correct test data offsets for 32-bit
"Theodore Ts'o" <tytso@mit.edu>:
memcg: fix a crash in wb_workfn when a device disappears
Dan Carpenter <dan.carpenter@oracle.com>:
mm/mempolicy.c: fix out of bounds write in mpol_parse_str()
Pingfan Liu <kernelfans@gmail.com>:
mm/sparse.c: reset section's mem_map when fully deactivated
Wei Yang <richardw.yang@linux.intel.com>:
mm/migrate.c: also overwrite error when it is bigger than zero
Dan Williams <dan.j.williams@intel.com>:
mm/memory_hotplug: fix remove_memory() lockdep splat
Wei Yang <richardw.yang@linux.intel.com>:
mm: thp: don't need care deferred split queue in memcg charge move path
Yang Shi <yang.shi@linux.alibaba.com>:
mm: move_pages: report the number of non-attempted pages
Subsystem: scripts
Xiong <xndchn@gmail.com>:
scripts/spelling.txt: add more spellings to spelling.txt
Luca Ceresoli <luca@lucaceresoli.net>:
scripts/spelling.txt: add "issus" typo
Subsystem: ocfs2
Aditya Pakki <pakki001@umn.edu>:
fs: ocfs: remove unnecessary assertion in dlm_migrate_lockres
zhengbin <zhengbin13@huawei.com>:
ocfs2: remove unneeded semicolons
Masahiro Yamada <masahiroy@kernel.org>:
ocfs2: make local header paths relative to C files
Colin Ian King <colin.king@canonical.com>:
ocfs2/dlm: remove redundant assignment to ret
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
ocfs2/dlm: move BITS_TO_BYTES() to bitops.h for wider use
wangyan <wangyan122@huawei.com>:
ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans()
ocfs2: use ocfs2_update_inode_fsync_trans() to access t_tid in handle->h_transaction
Subsystem: mm/slub
Yu Zhao <yuzhao@google.com>:
mm/slub.c: avoid slub allocation while holding list_lock
Subsystem: mm/kmemleak
He Zhe <zhe.he@windriver.com>:
mm/kmemleak: turn kmemleak_lock and object->lock to raw_spinlock_t
Subsystem: mm/debug
Vlastimil Babka <vbabka@suse.cz>:
mm/debug.c: always print flags in dump_page()
Subsystem: mm/pagecache
Ira Weiny <ira.weiny@intel.com>:
mm/filemap.c: clean up filemap_write_and_wait()
Subsystem: mm/gup
Qiujun Huang <hqjagain@gmail.com>:
mm: fix gup_pud_range
Wei Yang <richardw.yang@linux.intel.com>:
mm/gup.c: use is_vm_hugetlb_page() to check whether to follow huge
John Hubbard <jhubbard@nvidia.com>:
Patch series "mm/gup: prereqs to track dma-pinned pages: FOLL_PIN", v12:
mm/gup: factor out duplicate code from four routines
mm/gup: move try_get_compound_head() to top, fix minor issues
Dan Williams <dan.j.williams@intel.com>:
mm: Cleanup __put_devmap_managed_page() vs ->page_free()
John Hubbard <jhubbard@nvidia.com>:
mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages
goldish_pipe: rename local pin_user_pages() routine
mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM
vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call
mm/gup: allow FOLL_FORCE for get_user_pages_fast()
IB/umem: use get_user_pages_fast() to pin DMA pages
media/v4l2-core: set pages dirty upon releasing DMA buffers
mm/gup: introduce pin_user_pages*() and FOLL_PIN
goldish_pipe: convert to pin_user_pages() and put_user_page()
IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
drm/via: set FOLL_PIN via pin_user_pages_fast()
fs/io_uring: set FOLL_PIN via pin_user_pages()
net/xdp: set FOLL_PIN via pin_user_pages()
media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page() conversion
vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion
powerpc: book3s64: convert to pin_user_pages() and put_user_page()
mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding "1"
mm, tree-wide: rename put_user_page*() to unpin_user_page*()
Subsystem: mm/swap
Vasily Averin <vvs@virtuozzo.com>:
mm/swapfile.c: swap_next should increase position index
Subsystem: mm/memcg
Kaitao Cheng <pilgrimtao@gmail.com>:
mm/memcontrol.c: cleanup some useless code
Subsystem: mm/pagemap
Li Xinhai <lixinhai.lxh@gmail.com>:
mm/page_vma_mapped.c: explicitly compare pfn for normal, hugetlbfs and THP page
Subsystem: mm/tracing
Junyong Sun <sunjy516@gmail.com>:
mm, tracing: print symbol name for kmem_alloc_node call_site events
Subsystem: mm/kasan
"Gustavo A. R. Silva" <gustavo@embeddedor.com>:
lib/test_kasan.c: fix memory leak in kmalloc_oob_krealloc_more()
Subsystem: mm/initialization
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
mm/early_ioremap.c: use %pa to print resource_size_t variables
Subsystem: mm/pagealloc
"Kirill A. Shutemov" <kirill@shutemov.name>:
mm/page_alloc: skip non present sections on zone initialization
David Hildenbrand <david@redhat.com>:
mm: remove the memory isolate notifier
mm: remove "count" parameter from has_unmovable_pages()
Subsystem: mm/vmscan
Liu Song <liu.song11@zte.com.cn>:
mm/vmscan.c: remove unused return value of shrink_node
Alex Shi <alex.shi@linux.alibaba.com>:
mm/vmscan: remove prefetch_prev_lru_page
mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE
Subsystem: mm/tools
Daniel Wagner <dwagner@suse.de>:
tools/vm/slabinfo: fix sanity checks enabling
Subsystem: mm/memblock
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/memblock: define memblock_physmem_add()
memblock: Use __func__ in remaining memblock_dbg() call sites
Subsystem: mm/oom-kill
David Rientjes <rientjes@google.com>:
mm, oom: dump stack of victim when reaping failed
Subsystem: mm/hugetlb
Wei Yang <richardw.yang@linux.intel.com>:
mm/huge_memory.c: use head to check huge zero page
mm/huge_memory.c: use head to emphasize the purpose of page
mm/huge_memory.c: reduce critical section protected by split_queue_lock
Subsystem: mm/migration
Ralph Campbell <rcampbell@nvidia.com>:
mm/migrate: remove useless mask of start address
mm/migrate: clean up some minor coding style
mm/migrate: add stable check in migrate_vma_insert_page()
David Rientjes <rientjes@google.com>:
mm, thp: fix defrag setting if newline is not used
Subsystem: mm/mmap
Miaohe Lin <linmiaohe@huawei.com>:
mm/mmap.c: get rid of odd jump labels in find_mergeable_anon_vma()
Subsystem: mm/memory-hotplug
David Hildenbrand <david@redhat.com>:
Patch series "mm/memory_hotplug: pass in nid to online_pages()":
mm/memory_hotplug: pass in nid to online_pages()
Qian Cai <cai@lca.pw>:
mm/hotplug: silence a lockdep splat with printk()
mm/page_isolation: fix potential warning from user
Subsystem: mm/zswap
Vitaly Wool <vitaly.wool@konsulko.com>:
mm/zswap.c: add allocation hysteresis if pool limit is hit
Dan Carpenter <dan.carpenter@oracle.com>:
zswap: potential NULL dereference on error in init_zswap()
Subsystem: mm/cleanups
Yu Zhao <yuzhao@google.com>:
include/linux/mm.h: clean up obsolete check on space in page->flags
Wei Yang <richardw.yang@linux.intel.com>:
include/linux/mm.h: remove dead code totalram_pages_set()
Anshuman Khandual <anshuman.khandual@arm.com>:
include/linux/memory.h: drop fields 'hw' and 'phys_callback' from struct memory_block
Hao Lee <haolee.swjtu@gmail.com>:
mm: fix comments related to node reclaim
Subsystem: mm/zram
Taejoon Song <taejoon.song@lge.com>:
zram: try to avoid worst-case scenario on same element pages
Colin Ian King <colin.king@canonical.com>:
drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store
Subsystem: misc
Akinobu Mita <akinobu.mita@gmail.com>:
Patch series "add header file for kelvin to/from Celsius conversion:
include/linux/units.h: add helpers for kelvin to/from Celsius conversion
ACPI: thermal: switch to use <linux/units.h> helpers
platform/x86: asus-wmi: switch to use <linux/units.h> helpers
platform/x86: intel_menlow: switch to use <linux/units.h> helpers
thermal: int340x: switch to use <linux/units.h> helpers
thermal: intel_pch: switch to use <linux/units.h> helpers
nvme: hwmon: switch to use <linux/units.h> helpers
thermal: remove kelvin to/from Celsius conversion helpers from <linux/thermal.h>
iwlegacy: use <linux/units.h> helpers
iwlwifi: use <linux/units.h> helpers
thermal: armada: remove unused TO_MCELSIUS macro
iio: adc: qcom-vadc-common: use <linux/units.h> helpers
Subsystem: lib
Mikhail Zaslonko <zaslonko@linux.ibm.com>:
Patch series "S390 hardware support for kernel zlib", v3:
lib/zlib: add s390 hardware support for kernel zlib_deflate
s390/boot: rename HEAP_SIZE due to name collision
lib/zlib: add s390 hardware support for kernel zlib_inflate
s390/boot: add dfltcc= kernel command line parameter
lib/zlib: add zlib_deflate_dfltcc_enabled() function
btrfs: use larger zlib buffer for s390 hardware compression
Nathan Chancellor <natechancellor@gmail.com>:
lib/scatterlist.c: adjust indentation in __sg_alloc_table
Yury Norov <yury.norov@gmail.com>:
uapi: rename ext2_swab() to swab() and share globally in swab.h
lib/find_bit.c: join _find_next_bit{_le}
lib/find_bit.c: uninline helper _find_next_bit()
Subsystem: binfmt
Alexey Dobriyan <adobriyan@gmail.com>:
fs/binfmt_elf.c: smaller code generation around auxv vector fill
fs/binfmt_elf.c: fix ->start_code calculation
fs/binfmt_elf.c: don't copy ELF header around
fs/binfmt_elf.c: better codegen around current->mm
fs/binfmt_elf.c: make BAD_ADDR() unlikely
fs/binfmt_elf.c: coredump: allocate core ELF header on stack
fs/binfmt_elf.c: coredump: delete duplicated overflow check
fs/binfmt_elf.c: coredump: allow process with empty address space to coredump
Subsystem: init
Arvind Sankar <nivedita@alum.mit.edu>:
init/main.c: log arguments and environment passed to init
init/main.c: remove unnecessary repair_env_string in do_initcall_level
Patch series "init/main.c: minor cleanup/bugfix of envvar handling", v2:
init/main.c: fix quoted value handling in unknown_bootoption
Christophe Leroy <christophe.leroy@c-s.fr>:
init/main.c: fix misleading "This architecture does not have kernel memory protection" message
Subsystem: reiserfs
Yunfeng Ye <yeyunfeng@huawei.com>:
reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()
Subsystem: exec
Alexey Dobriyan <adobriyan@gmail.com>:
execve: warn if process starts with executable stack
Subsystem: dma-mapping
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc()
Subsystem: kcov
Dmitry Vyukov <dvyukov@google.com>:
kcov: ignore fault-inject and stacktrace
Documentation/admin-guide/kernel-parameters.txt | 12
Documentation/core-api/index.rst | 1
Documentation/core-api/pin_user_pages.rst | 234 +++++
Documentation/vm/zswap.rst | 13
arch/powerpc/mm/book3s64/iommu_api.c | 14
arch/s390/boot/compressed/decompressor.c | 8
arch/s390/boot/ipl_parm.c | 14
arch/s390/include/asm/setup.h | 7
arch/s390/kernel/setup.c | 14
drivers/acpi/thermal.c | 34
drivers/base/memory.c | 25
drivers/block/zram/zram_drv.c | 10
drivers/gpu/drm/via/via_dmablit.c | 6
drivers/iio/adc/qcom-vadc-common.c | 6
drivers/iio/adc/qcom-vadc-common.h | 1
drivers/infiniband/core/umem.c | 21
drivers/infiniband/core/umem_odp.c | 13
drivers/infiniband/hw/hfi1/user_pages.c | 4
drivers/infiniband/hw/mthca/mthca_memfree.c | 8
drivers/infiniband/hw/qib/qib_user_pages.c | 4
drivers/infiniband/hw/qib/qib_user_sdma.c | 8
drivers/infiniband/hw/usnic/usnic_uiom.c | 4
drivers/infiniband/sw/siw/siw_mem.c | 4
drivers/media/v4l2-core/videobuf-dma-sg.c | 20
drivers/net/ethernet/broadcom/bnx2x/bnx2x_init.h | 1
drivers/net/wireless/intel/iwlegacy/4965-mac.c | 3
drivers/net/wireless/intel/iwlegacy/4965.c | 17
drivers/net/wireless/intel/iwlegacy/common.h | 3
drivers/net/wireless/intel/iwlwifi/dvm/dev.h | 5
drivers/net/wireless/intel/iwlwifi/dvm/devices.c | 6
drivers/nvdimm/pmem.c | 6
drivers/nvme/host/hwmon.c | 13
drivers/platform/goldfish/goldfish_pipe.c | 39
drivers/platform/x86/asus-wmi.c | 7
drivers/platform/x86/intel_menlow.c | 9
drivers/thermal/armada_thermal.c | 2
drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c | 7
drivers/thermal/intel/intel_pch_thermal.c | 3
drivers/vfio/vfio_iommu_type1.c | 39
fs/binfmt_elf.c | 154 +--
fs/btrfs/compression.c | 2
fs/btrfs/zlib.c | 135 ++
fs/exec.c | 5
fs/fs-writeback.c | 2
fs/io_uring.c | 6
fs/ocfs2/cluster/quorum.c | 2
fs/ocfs2/dlm/Makefile | 2
fs/ocfs2/dlm/dlmast.c | 8
fs/ocfs2/dlm/dlmcommon.h | 4
fs/ocfs2/dlm/dlmconvert.c | 8
fs/ocfs2/dlm/dlmdebug.c | 8
fs/ocfs2/dlm/dlmdomain.c | 8
fs/ocfs2/dlm/dlmlock.c | 8
fs/ocfs2/dlm/dlmmaster.c | 10
fs/ocfs2/dlm/dlmrecovery.c | 10
fs/ocfs2/dlm/dlmthread.c | 8
fs/ocfs2/dlm/dlmunlock.c | 8
fs/ocfs2/dlmfs/Makefile | 2
fs/ocfs2/dlmfs/dlmfs.c | 4
fs/ocfs2/dlmfs/userdlm.c | 6
fs/ocfs2/dlmglue.c | 2
fs/ocfs2/journal.h | 8
fs/ocfs2/namei.c | 3
fs/reiserfs/stree.c | 3
include/linux/backing-dev.h | 10
include/linux/bitops.h | 1
include/linux/fs.h | 6
include/linux/io-mapping.h | 5
include/linux/memblock.h | 7
include/linux/memory.h | 29
include/linux/memory_hotplug.h | 3
include/linux/mm.h | 116 +-
include/linux/mmzone.h | 2
include/linux/page-isolation.h | 8
include/linux/swab.h | 1
include/linux/thermal.h | 11
include/linux/units.h | 84 +
include/linux/zlib.h | 6
include/trace/events/kmem.h | 4
include/trace/events/writeback.h | 37
include/uapi/linux/swab.h | 10
include/uapi/linux/sysctl.h | 2
init/main.c | 36
kernel/Makefile | 1
lib/Kconfig | 7
lib/Makefile | 2
lib/decompress_inflate.c | 13
lib/find_bit.c | 82 -
lib/scatterlist.c | 2
lib/test_bitmap.c | 9
lib/test_kasan.c | 1
lib/zlib_deflate/deflate.c | 85 +
lib/zlib_deflate/deflate_syms.c | 1
lib/zlib_deflate/deftree.c | 54 -
lib/zlib_deflate/defutil.h | 134 ++
lib/zlib_dfltcc/Makefile | 13
lib/zlib_dfltcc/dfltcc.c | 57 +
lib/zlib_dfltcc/dfltcc.h | 155 +++
lib/zlib_dfltcc/dfltcc_deflate.c | 280 ++++++
lib/zlib_dfltcc/dfltcc_inflate.c | 149 +++
lib/zlib_dfltcc/dfltcc_syms.c | 17
lib/zlib_dfltcc/dfltcc_util.h | 123 ++
lib/zlib_inflate/inflate.c | 32
lib/zlib_inflate/inflate.h | 8
lib/zlib_inflate/infutil.h | 18
mm/Makefile | 1
mm/backing-dev.c | 1
mm/debug.c | 18
mm/early_ioremap.c | 8
mm/filemap.c | 34
mm/gup.c | 503 ++++++-----
mm/gup_benchmark.c | 9
mm/huge_memory.c | 44
mm/kmemleak.c | 112 +-
mm/memblock.c | 22
mm/memcontrol.c | 25
mm/memory_hotplug.c | 24
mm/mempolicy.c | 6
mm/memremap.c | 95 --
mm/migrate.c | 77 +
mm/mmap.c | 30
mm/oom_kill.c | 2
mm/page_alloc.c | 83 +
mm/page_isolation.c | 69 -
mm/page_vma_mapped.c | 12
mm/process_vm_access.c | 32
mm/slub.c | 88 +
mm/sparse.c | 2
mm/swap.c | 27
mm/swapfile.c | 2
mm/vmscan.c | 24
mm/zswap.c | 88 +
net/xdp/xdp_umem.c | 4
scripts/spelling.txt | 14
tools/testing/selftests/vm/gup_benchmark.c | 6
tools/vm/slabinfo.c | 4
136 files changed, 2790 insertions(+), 1358 deletions(-)
* incoming
@ 2020-01-14 0:28 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-01-14 0:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
11 MM fixes, based on b3a987b0264d3ddbb24293ebff10eddfc472f653:
Vlastimil Babka <vbabka@suse.cz>:
mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations
David Hildenbrand <david@redhat.com>:
mm/memory_hotplug: don't free usage map when removing a re-added early section
"Kirill A. Shutemov" <kirill@shutemov.name>:
Patch series "Fix two above-47bit hint address vs. THP bugs":
mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment
mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment
Roman Gushchin <guro@fb.com>:
mm: memcg/slab: fix percpu slab vmstats flushing
Vlastimil Babka <vbabka@suse.cz>:
mm, debug_pagealloc: don't rely on static keys too early
Wen Yang <wenyang@linux.alibaba.com>:
Patch series "use div64_ul() instead of div_u64() if the divisor is:
mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio()
mm/page-writeback.c: use div64_ul() for u64-by-unsigned-long divide
mm/page-writeback.c: improve arithmetic divisions
Adrian Huang <ahuang12@lenovo.com>:
mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid
Yang Shi <yang.shi@linux.alibaba.com>:
mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATE
include/linux/mm.h | 18 +++++++++-
include/linux/mmzone.h | 5 +--
include/trace/events/huge_memory.h | 3 +
init/main.c | 1
mm/huge_memory.c | 38 ++++++++++++++---------
mm/memcontrol.c | 37 +++++-----------------
mm/mempolicy.c | 10 ++++--
mm/page-writeback.c | 10 +++---
mm/page_alloc.c | 61 ++++++++++---------------------------
mm/shmem.c | 7 ++--
mm/slab.c | 4 +-
mm/slab_common.c | 3 +
mm/slub.c | 2 -
mm/sparse.c | 9 ++++-
mm/vmalloc.c | 4 +-
15 files changed, 102 insertions(+), 110 deletions(-)
* incoming
@ 2020-01-04 20:55 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2020-01-04 20:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
17 fixes, based on 5613970af3f5f8372c596b138bd64f3918513515:
David Hildenbrand <david@redhat.com>:
mm/memory_hotplug: shrink zones when offlining memory
Chanho Min <chanho.min@lge.com>:
mm/zsmalloc.c: fix the migrated zspage statistics.
Andrey Konovalov <andreyknvl@google.com>:
kcov: fix struct layout for kcov_remote_arg
Shakeel Butt <shakeelb@google.com>:
memcg: account security cred as well to kmemcg
Yang Shi <yang.shi@linux.alibaba.com>:
mm: move_pages: return valid node id in status if the page is already on the target node
Eric Biggers <ebiggers@google.com>:
fs/direct-io.c: include fs/internal.h for missing prototype
fs/nsfs.c: include headers for missing declarations
fs/namespace.c: make to_mnt_ns() static
Nick Desaulniers <ndesaulniers@google.com>:
hexagon: parenthesize registers in asm predicates
hexagon: work around compiler crash
Randy Dunlap <rdunlap@infradead.org>:
fs/posix_acl.c: fix kernel-doc warnings
Ilya Dryomov <idryomov@gmail.com>:
mm/oom: fix pgtables units mismatch in Killed process message
Navid Emamdoost <navid.emamdoost@gmail.com>:
mm/gup: fix memory leak in __gup_benchmark_ioctl
Waiman Long <longman@redhat.com>:
mm/hugetlb: defer freeing of huge pages if in non-task context
Kai Li <li.kai4@h3c.com>:
ocfs2: call journal flush to mark journal as empty after journal recovery when mount
Gang He <GHe@suse.com>:
ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less
Nick Desaulniers <ndesaulniers@google.com>:
hexagon: define ioremap_uc
Documentation/dev-tools/kcov.rst | 10 +++----
arch/arm64/mm/mmu.c | 4 --
arch/hexagon/include/asm/atomic.h | 8 ++---
arch/hexagon/include/asm/bitops.h | 8 ++---
arch/hexagon/include/asm/cmpxchg.h | 2 -
arch/hexagon/include/asm/futex.h | 6 ++--
arch/hexagon/include/asm/io.h | 1
arch/hexagon/include/asm/spinlock.h | 20 +++++++-------
arch/hexagon/kernel/stacktrace.c | 4 --
arch/hexagon/kernel/vm_entry.S | 2 -
arch/ia64/mm/init.c | 4 --
arch/powerpc/mm/mem.c | 3 --
arch/s390/mm/init.c | 4 --
arch/sh/mm/init.c | 4 --
arch/x86/mm/init_32.c | 4 --
arch/x86/mm/init_64.c | 4 --
fs/direct-io.c | 2 +
fs/namespace.c | 2 -
fs/nsfs.c | 3 ++
fs/ocfs2/dlmglue.c | 1
fs/ocfs2/journal.c | 8 +++++
fs/posix_acl.c | 7 +++-
include/linux/memory_hotplug.h | 7 +++-
include/uapi/linux/kcov.h | 10 +++----
kernel/cred.c | 6 ++--
mm/gup_benchmark.c | 8 ++++-
mm/hugetlb.c | 51 +++++++++++++++++++++++++++++++++++-
mm/memory_hotplug.c | 31 +++++++++++----------
mm/memremap.c | 2 -
mm/migrate.c | 23 ++++++++++++----
mm/oom_kill.c | 2 -
mm/zsmalloc.c | 5 +++
32 files changed, 166 insertions(+), 90 deletions(-)
* incoming
@ 2019-12-18 4:50 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2019-12-18 4:50 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm, mm-commits
6 fixes based on 2187f215ebaac73ddbd814696d7c7fa34f0c3de0:
Andrey Ryabinin <aryabinin@virtuozzo.com>:
kasan: fix crashes on access to memory mapped by vm_map_ram()
Daniel Axtens <dja@axtens.net>:
mm/memory.c: add apply_to_existing_page_range() helper
kasan: use apply_to_existing_page_range() for releasing vmalloc shadow
kasan: don't assume percpu shadow allocations will succeed
Yang Shi <yang.shi@linux.alibaba.com>:
mm: vmscan: protect shrinker idr replace with CONFIG_MEMCG
Changbin Du <changbin.du@gmail.com>:
lib/Kconfig.debug: fix some messed up configurations
include/linux/kasan.h | 15 +++--
include/linux/mm.h | 3 +
lib/Kconfig.debug | 100 ++++++++++++++++++------------------
mm/kasan/common.c | 36 ++++++++-----
mm/memory.c | 136 ++++++++++++++++++++++++++++++++++----------------
mm/vmalloc.c | 133 ++++++++++++++++++++++++++++--------------------
mm/vmscan.c | 2
7 files changed, 260 insertions(+), 165 deletions(-)
* incoming
@ 2019-12-05 0:48 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2019-12-05 0:48 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
Most of the rest of MM and various other things. Some Kconfig rework
still awaits merges of dependent trees from linux-next.
86 patches, based on 63de37476ebd1e9bab6a9e17186dc5aa1da9ea99.
Subsystems affected by this patch series:
mm/hotfixes
mm/memcg
mm/vmstat
mm/thp
procfs
sysctl
misc
notifiers
core-kernel
bitops
lib
checkpatch
epoll
binfmt
init
rapidio
uaccess
kcov
ubsan
ipc
bitmap
mm/pagemap
Subsystem: mm/hotfixes
zhong jiang <zhongjiang@huawei.com>:
mm/kasan/common.c: fix compile error
Subsystem: mm/memcg
Roman Gushchin <guro@fb.com>:
mm: memcg/slab: wait for !root kmem_cache refcnt killing on root kmem_cache destruction
Subsystem: mm/vmstat
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
mm/vmstat: add helpers to get vmstat item names for each enum type
mm/memcontrol: use vmstat names for printing statistics
Subsystem: mm/thp
Yu Zhao <yuzhao@google.com>:
mm/memory.c: replace is_zero_pfn with is_huge_zero_pmd for thp
Subsystem: procfs
Alexey Dobriyan <adobriyan@gmail.com>:
proc: change ->nlink under proc_subdir_lock
fs/proc/generic.c: delete useless "len" variable
fs/proc/internal.h: shuffle "struct pde_opener"
Miaohe Lin <linmiaohe@huawei.com>:
include/linux/proc_fs.h: fix confusing macro arg name
Krzysztof Kozlowski <krzk@kernel.org>:
fs/proc/Kconfig: fix indentation
Subsystem: sysctl
Alessio Balsini <balsini@android.com>:
include/linux/sysctl.h: inline braces for ctl_table and ctl_table_header
Subsystem: misc
Stephen Boyd <swboyd@chromium.org>:
.gitattributes: use 'dts' diff driver for dts files
Rikard Falkeborn <rikard.falkeborn@gmail.com>:
linux/build_bug.h: change type to int
Masahiro Yamada <yamada.masahiro@socionext.com>:
linux/scc.h: make uapi linux/scc.h self-contained
Krzysztof Kozlowski <krzk@kernel.org>:
arch/Kconfig: fix indentation
Joe Perches <joe@perches.com>:
scripts/get_maintainer.pl: add signatures from Fixes: <badcommit> lines in commit message
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
kernel.h: update comment about simple_strto<foo>() functions
auxdisplay: charlcd: deduplicate simple_strtoul()
Subsystem: notifiers
Xiaoming Ni <nixiaoming@huawei.com>:
kernel/notifier.c: intercept duplicate registrations to avoid infinite loops
kernel/notifier.c: remove notifier_chain_cond_register()
kernel/notifier.c: remove blocking_notifier_chain_cond_register()
Subsystem: core-kernel
Nathan Chancellor <natechancellor@gmail.com>:
kernel/profile.c: use cpumask_available to check for NULL cpumask
Joe Perches <joe@perches.com>:
kernel/sys.c: avoid copying possible padding bytes in copy_to_user
Subsystem: bitops
William Breathitt Gray <vilhelm.gray@gmail.com>:
bitops: introduce the for_each_set_clump8 macro
lib/test_bitmap.c: add for_each_set_clump8 test cases
gpio: 104-dio-48e: utilize for_each_set_clump8 macro
gpio: 104-idi-48: utilize for_each_set_clump8 macro
gpio: gpio-mm: utilize for_each_set_clump8 macro
gpio: ws16c48: utilize for_each_set_clump8 macro
gpio: pci-idio-16: utilize for_each_set_clump8 macro
gpio: pcie-idio-24: utilize for_each_set_clump8 macro
gpio: uniphier: utilize for_each_set_clump8 macro
gpio: 74x164: utilize the for_each_set_clump8 macro
thermal: intel: intel_soc_dts_iosf: Utilize for_each_set_clump8 macro
gpio: pisosr: utilize the for_each_set_clump8 macro
gpio: max3191x: utilize the for_each_set_clump8 macro
gpio: pca953x: utilize the for_each_set_clump8 macro
Subsystem: lib
Wei Yang <richardw.yang@linux.intel.com>:
lib/rbtree: set successor's parent unconditionally
lib/rbtree: get successor's color directly
Laura Abbott <labbott@redhat.com>:
lib/test_meminit.c: add bulk alloc/free tests
Trent Piepho <tpiepho@gmail.com>:
lib/math/rational.c: fix possible incorrect result from rational fractions helper
Huang Shijie <sjhuang@iluvatar.ai>:
lib/genalloc.c: export symbol addr_in_gen_pool
lib/genalloc.c: rename addr_in_gen_pool to gen_pool_has_addr
Subsystem: checkpatch
Joe Perches <joe@perches.com>:
checkpatch: improve ignoring CamelCase SI style variants like mA
checkpatch: reduce is_maintained_obsolete lookup runtime
Subsystem: epoll
Jason Baron <jbaron@akamai.com>:
epoll: simplify ep_poll_safewake() for CONFIG_DEBUG_LOCK_ALLOC
Heiher <r@hev.cc>:
fs/epoll: remove unnecessary wakeups of nested epoll
selftests: add epoll selftests
Subsystem: binfmt
Alexey Dobriyan <adobriyan@gmail.com>:
fs/binfmt_elf.c: delete unused "interp_map_addr" argument
fs/binfmt_elf.c: extract elf_read() function
Subsystem: init
Krzysztof Kozlowski <krzk@kernel.org>:
init/Kconfig: fix indentation
Subsystem: rapidio
"Ben Dooks (Codethink)" <ben.dooks@codethink.co.uk>:
drivers/rapidio/rio-driver.c: fix missing include of <linux/rio_drv.h>
drivers/rapidio/rio-access.c: fix missing include of <linux/rio_drv.h>
Subsystem: uaccess
Daniel Vetter <daniel.vetter@ffwll.ch>:
drm: limit to INT_MAX in create_blob ioctl
Kees Cook <keescook@chromium.org>:
uaccess: disallow > INT_MAX copy sizes
Subsystem: kcov
Andrey Konovalov <andreyknvl@google.com>:
Patch series "kcov: collect coverage from usb and vhost", v3:
kcov: remote coverage support
usb, kcov: collect coverage from hub_event
vhost, kcov: collect coverage from vhost_worker
Subsystem: ubsan
Julien Grall <julien.grall@arm.com>:
lib/ubsan: don't serialize UBSAN report
Subsystem: ipc
Masahiro Yamada <yamada.masahiro@socionext.com>:
arch: ipcbuf.h: make uapi asm/ipcbuf.h self-contained
arch: msgbuf.h: make uapi asm/msgbuf.h self-contained
arch: sembuf.h: make uapi asm/sembuf.h self-contained
Subsystem: bitmap
Andy Shevchenko <andriy.shevchenko@linux.intel.com>:
Patch series "gpio: pca953x: Convert to bitmap (extended) API", v2:
lib/test_bitmap: force argument of bitmap_parselist_user() to proper address space
lib/test_bitmap: undefine macros after use
lib/test_bitmap: name EXP_BYTES properly
lib/test_bitmap: rename exp to exp1 to avoid ambiguous name
lib/test_bitmap: move exp1 and exp2 upper for others to use
lib/test_bitmap: fix comment about this file
lib/bitmap: introduce bitmap_replace() helper
gpio: pca953x: remove redundant variable and check in IRQ handler
gpio: pca953x: use input from regs structure in pca953x_irq_pending()
gpio: pca953x: convert to use bitmap API
gpio: pca953x: tighten up indentation
Subsystem: mm/pagemap
Mike Rapoport <rppt@linux.ibm.com>:
Patch series "mm: remove __ARCH_HAS_4LEVEL_HACK", v13:
alpha: use pgtable-nopud instead of 4level-fixup
arm: nommu: use pgtable-nopud instead of 4level-fixup
c6x: use pgtable-nopud instead of 4level-fixup
m68k: nommu: use pgtable-nopud instead of 4level-fixup
m68k: mm: use pgtable-nopXd instead of 4level-fixup
microblaze: use pgtable-nopmd instead of 4level-fixup
nds32: use pgtable-nopmd instead of 4level-fixup
parisc: use pgtable-nopXd instead of 4level-fixup
Helge Deller <deller@gmx.de>:
parisc/hugetlb: use pgtable-nopXd instead of 4level-fixup
Mike Rapoport <rppt@linux.ibm.com>:
sparc32: use pgtable-nopud instead of 4level-fixup
um: remove unused pxx_offset_proc() and addr_pte() functions
um: add support for folded p4d page tables
mm: remove __ARCH_HAS_4LEVEL_HACK and include/asm-generic/4level-fixup.h
.gitattributes | 2
Documentation/core-api/genalloc.rst | 2
Documentation/dev-tools/kcov.rst | 129
arch/Kconfig | 22
arch/alpha/include/asm/mmzone.h | 1
arch/alpha/include/asm/pgalloc.h | 4
arch/alpha/include/asm/pgtable.h | 24
arch/alpha/mm/init.c | 12
arch/arm/include/asm/pgtable.h | 2
arch/arm/mm/dma-mapping.c | 2
arch/c6x/include/asm/pgtable.h | 2
arch/m68k/include/asm/mcf_pgalloc.h | 7
arch/m68k/include/asm/mcf_pgtable.h | 28
arch/m68k/include/asm/mmu_context.h | 12
arch/m68k/include/asm/motorola_pgalloc.h | 4
arch/m68k/include/asm/motorola_pgtable.h | 32
arch/m68k/include/asm/page.h | 9
arch/m68k/include/asm/pgtable_mm.h | 11
arch/m68k/include/asm/pgtable_no.h | 2
arch/m68k/include/asm/sun3_pgalloc.h | 5
arch/m68k/include/asm/sun3_pgtable.h | 18
arch/m68k/kernel/sys_m68k.c | 10
arch/m68k/mm/init.c | 6
arch/m68k/mm/kmap.c | 39
arch/m68k/mm/mcfmmu.c | 16
arch/m68k/mm/motorola.c | 17
arch/m68k/sun3x/dvma.c | 7
arch/microblaze/include/asm/page.h | 3
arch/microblaze/include/asm/pgalloc.h | 16
arch/microblaze/include/asm/pgtable.h | 32
arch/microblaze/kernel/signal.c | 10
arch/microblaze/mm/init.c | 7
arch/microblaze/mm/pgtable.c | 13
arch/mips/include/uapi/asm/msgbuf.h | 1
arch/mips/include/uapi/asm/sembuf.h | 2
arch/nds32/include/asm/page.h | 3
arch/nds32/include/asm/pgalloc.h | 3
arch/nds32/include/asm/pgtable.h | 12
arch/nds32/include/asm/tlb.h | 1
arch/nds32/kernel/pm.c | 4
arch/nds32/mm/fault.c | 16
arch/nds32/mm/init.c | 11
arch/nds32/mm/mm-nds32.c | 6
arch/nds32/mm/proc.c | 26
arch/parisc/include/asm/page.h | 30
arch/parisc/include/asm/pgalloc.h | 41
arch/parisc/include/asm/pgtable.h | 52
arch/parisc/include/asm/tlb.h | 2
arch/parisc/include/uapi/asm/msgbuf.h | 1
arch/parisc/include/uapi/asm/sembuf.h | 1
arch/parisc/kernel/cache.c | 13
arch/parisc/kernel/pci-dma.c | 9
arch/parisc/mm/fixmap.c | 10
arch/parisc/mm/hugetlbpage.c | 18
arch/powerpc/include/uapi/asm/msgbuf.h | 2
arch/powerpc/include/uapi/asm/sembuf.h | 2
arch/s390/include/uapi/asm/ipcbuf.h | 2
arch/sparc/include/asm/pgalloc_32.h | 6
arch/sparc/include/asm/pgtable_32.h | 28
arch/sparc/include/uapi/asm/ipcbuf.h | 2
arch/sparc/include/uapi/asm/msgbuf.h | 2
arch/sparc/include/uapi/asm/sembuf.h | 2
arch/sparc/mm/fault_32.c | 11
arch/sparc/mm/highmem.c | 6
arch/sparc/mm/io-unit.c | 6
arch/sparc/mm/iommu.c | 6
arch/sparc/mm/srmmu.c | 51
arch/um/include/asm/pgtable-2level.h | 1
arch/um/include/asm/pgtable-3level.h | 1
arch/um/include/asm/pgtable.h | 3
arch/um/kernel/mem.c | 8
arch/um/kernel/skas/mmu.c | 12
arch/um/kernel/skas/uaccess.c | 7
arch/um/kernel/tlb.c | 85
arch/um/kernel/trap.c | 4
arch/x86/include/uapi/asm/msgbuf.h | 3
arch/x86/include/uapi/asm/sembuf.h | 2
arch/xtensa/include/uapi/asm/ipcbuf.h | 2
arch/xtensa/include/uapi/asm/msgbuf.h | 2
arch/xtensa/include/uapi/asm/sembuf.h | 1
drivers/auxdisplay/charlcd.c | 34
drivers/base/node.c | 9
drivers/gpio/gpio-104-dio-48e.c | 75
drivers/gpio/gpio-104-idi-48.c | 36
drivers/gpio/gpio-74x164.c | 19
drivers/gpio/gpio-gpio-mm.c | 75
drivers/gpio/gpio-max3191x.c | 19
drivers/gpio/gpio-pca953x.c | 209
drivers/gpio/gpio-pci-idio-16.c | 75
drivers/gpio/gpio-pcie-idio-24.c | 111
drivers/gpio/gpio-pisosr.c | 12
drivers/gpio/gpio-uniphier.c | 13
drivers/gpio/gpio-ws16c48.c | 73
drivers/gpu/drm/drm_property.c | 2
drivers/misc/sram-exec.c | 2
drivers/rapidio/rio-access.c | 2
drivers/rapidio/rio-driver.c | 1
drivers/thermal/intel/intel_soc_dts_iosf.c | 31
drivers/thermal/intel/intel_soc_dts_iosf.h | 2
drivers/usb/core/hub.c | 5
drivers/vhost/vhost.c | 6
drivers/vhost/vhost.h | 1
fs/binfmt_elf.c | 56
fs/eventpoll.c | 52
fs/proc/Kconfig | 8
fs/proc/generic.c | 37
fs/proc/internal.h | 2
include/asm-generic/4level-fixup.h | 39
include/asm-generic/bitops/find.h | 17
include/linux/bitmap.h | 51
include/linux/bitops.h | 12
include/linux/build_bug.h | 4
include/linux/genalloc.h | 2
include/linux/kcov.h | 23
include/linux/kernel.h | 19
include/linux/mm.h | 10
include/linux/notifier.h | 4
include/linux/proc_fs.h | 4
include/linux/rbtree_augmented.h | 6
include/linux/sched.h | 8
include/linux/sysctl.h | 6
include/linux/thread_info.h | 2
include/linux/vmstat.h | 54
include/uapi/asm-generic/ipcbuf.h | 2
include/uapi/asm-generic/msgbuf.h | 2
include/uapi/asm-generic/sembuf.h | 1
include/uapi/linux/kcov.h | 28
include/uapi/linux/scc.h | 1
init/Kconfig | 78
kernel/dma/remap.c | 2
kernel/kcov.c | 547 +
kernel/notifier.c | 45
kernel/profile.c | 6
kernel/sys.c | 4
lib/bitmap.c | 12
lib/find_bit.c | 14
lib/genalloc.c | 7
lib/math/rational.c | 63
lib/test_bitmap.c | 206
lib/test_meminit.c | 20
lib/ubsan.c | 64
mm/kasan/common.c | 1
mm/memcontrol.c | 52
mm/memory.c | 10
mm/slab_common.c | 12
mm/vmstat.c | 60
net/sunrpc/rpc_pipe.c | 2
scripts/checkpatch.pl | 13
scripts/get_maintainer.pl | 38
tools/testing/selftests/Makefile | 1
tools/testing/selftests/filesystems/epoll/.gitignore | 1
tools/testing/selftests/filesystems/epoll/Makefile | 7
tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c | 3074 ++++++++++
usr/include/Makefile | 4
154 files changed, 5270 insertions(+), 1360 deletions(-)
* Re: incoming
2019-12-01 21:07 ` incoming Linus Torvalds
@ 2019-12-02 8:21 ` Steven Price
0 siblings, 0 replies; 786+ messages in thread
From: Steven Price @ 2019-12-02 8:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, mm-commits, Linux-MM
On Sun, Dec 01, 2019 at 09:07:47PM +0000, Linus Torvalds wrote:
> On Sat, Nov 30, 2019 at 5:47 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > Steven Price <steven.price@arm.com>:
> > Patch series "Generic page walk and ptdump", v15:
> > mm: add generic p?d_leaf() macros
> > arc: mm: add p?d_leaf() definitions
> > arm: mm: add p?d_leaf() definitions
> > arm64: mm: add p?d_leaf() definitions
> > mips: mm: add p?d_leaf() definitions
> > powerpc: mm: add p?d_leaf() definitions
> > riscv: mm: add p?d_leaf() definitions
> > s390: mm: add p?d_leaf() definitions
> > sparc: mm: add p?d_leaf() definitions
> > x86: mm: add p?d_leaf() definitions
> > mm: pagewalk: add p4d_entry() and pgd_entry()
> > mm: pagewalk: allow walking without vma
> > mm: pagewalk: add test_p?d callbacks
> > mm: pagewalk: add 'depth' parameter to pte_hole
> > x86: mm: point to struct seq_file from struct pg_state
> > x86: mm+efi: convert ptdump_walk_pgd_level() to take a mm_struct
> > x86: mm: convert ptdump_walk_pgd_level_debugfs() to take an mm_struct
> > x86: mm: convert ptdump_walk_pgd_level_core() to take an mm_struct
> > mm: add generic ptdump
> > x86: mm: convert dump_pagetables to use walk_page_range
> > arm64: mm: convert mm/dump.c to use walk_page_range()
> > arm64: mm: display non-present entries in ptdump
> > mm: ptdump: reduce level numbers by 1 in note_page()
>
> I've dropped these, and since they clearly weren't ready I don't want
> to see them re-sent for 5.5.
Sorry about this, I'll try to track down the cause of this and hopefully
resubmit for 5.6.
Thanks,
Steve
> If somebody figures out the bug, trying again for 5.6 sounds fine.
>
> Linus
* Re: incoming
2019-12-01 1:47 incoming Andrew Morton
2019-12-01 5:17 ` incoming James Bottomley
@ 2019-12-01 21:07 ` Linus Torvalds
2019-12-02 8:21 ` incoming Steven Price
1 sibling, 1 reply; 786+ messages in thread
From: Linus Torvalds @ 2019-12-01 21:07 UTC (permalink / raw)
To: Andrew Morton, Steven Price; +Cc: mm-commits, Linux-MM
On Sat, Nov 30, 2019 at 5:47 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Steven Price <steven.price@arm.com>:
> Patch series "Generic page walk and ptdump", v15:
> mm: add generic p?d_leaf() macros
> arc: mm: add p?d_leaf() definitions
> arm: mm: add p?d_leaf() definitions
> arm64: mm: add p?d_leaf() definitions
> mips: mm: add p?d_leaf() definitions
> powerpc: mm: add p?d_leaf() definitions
> riscv: mm: add p?d_leaf() definitions
> s390: mm: add p?d_leaf() definitions
> sparc: mm: add p?d_leaf() definitions
> x86: mm: add p?d_leaf() definitions
> mm: pagewalk: add p4d_entry() and pgd_entry()
> mm: pagewalk: allow walking without vma
> mm: pagewalk: add test_p?d callbacks
> mm: pagewalk: add 'depth' parameter to pte_hole
> x86: mm: point to struct seq_file from struct pg_state
> x86: mm+efi: convert ptdump_walk_pgd_level() to take a mm_struct
> x86: mm: convert ptdump_walk_pgd_level_debugfs() to take an mm_struct
> x86: mm: convert ptdump_walk_pgd_level_core() to take an mm_struct
> mm: add generic ptdump
> x86: mm: convert dump_pagetables to use walk_page_range
> arm64: mm: convert mm/dump.c to use walk_page_range()
> arm64: mm: display non-present entries in ptdump
> mm: ptdump: reduce level numbers by 1 in note_page()
I've dropped these, and since they clearly weren't ready I don't want
to see them re-sent for 5.5.
If somebody figures out the bug, trying again for 5.6 sounds fine.
Linus
* Re: incoming
2019-12-01 1:47 incoming Andrew Morton
@ 2019-12-01 5:17 ` James Bottomley
2019-12-01 21:07 ` incoming Linus Torvalds
1 sibling, 0 replies; 786+ messages in thread
From: James Bottomley @ 2019-12-01 5:17 UTC (permalink / raw)
To: Andrew Morton, Linus Torvalds; +Cc: mm-commits, linux-mm
On Sat, 2019-11-30 at 17:47 -0800, Andrew Morton wrote:
> - a small number of updates to scripts/, ocfs2 and fs/buffer.c
>
> - most of MM. I still have quite a lot of material (mostly not MM)
> staged after linux-next due to -next dependencies. I'll send those
> across next week as the prerequisites get merged up.
>
> 158 patches, based on 32ef9553635ab1236c33951a8bd9b5af1c3b1646.
Hey, Andrew, would it be at all possible for you to thread these
patches under something like this incoming message? The selfish reason
I'm asking is so I can mark the thread as read instead of having to do
it individually for 158 messages ... my thumb would thank you for this.
Regards,
James
* incoming
@ 2019-12-01 1:47 Andrew Morton
2019-12-01 5:17 ` incoming James Bottomley
2019-12-01 21:07 ` incoming Linus Torvalds
0 siblings, 2 replies; 786+ messages in thread
From: Andrew Morton @ 2019-12-01 1:47 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
- a small number of updates to scripts/, ocfs2 and fs/buffer.c
- most of MM. I still have quite a lot of material (mostly not MM)
staged after linux-next due to -next dependencies. I'll send those
across next week as the prerequisites get merged up.
158 patches, based on 32ef9553635ab1236c33951a8bd9b5af1c3b1646.
Subsystems affected by this patch series:
scripts
ocfs2
vfs
mm/slab
mm/slub
mm/pagecache
mm/gup
mm/swap
mm/memcg
mm/pagemap
mm/memfd
mm/memory-failure
mm/memory-hotplug
mm/sparsemem
mm/vmalloc
mm/kasan
mm/pagealloc
mm/vmscan
mm/proc
mm/z3fold
mm/mempolicy
mm/memblock
mm/hugetlbfs
mm/hugetlb
mm/migration
mm/thp
mm/cma
mm/autonuma
mm/page-poison
mm/mmap
mm/madvise
mm/userfaultfd
mm/shmem
mm/cleanups
mm/support
Subsystem: scripts
Colin Ian King <colin.king@canonical.com>:
scripts/spelling.txt: add more spellings to spelling.txt
Subsystem: ocfs2
Ding Xiang <dingxiang@cmss.chinamobile.com>:
ocfs2: fix passing zero to 'PTR_ERR' warning
Subsystem: vfs
Saurav Girepunje <saurav.girepunje@gmail.com>:
fs/buffer.c: fix use true/false for bool type
Ben Dooks <ben.dooks@codethink.co.uk>:
fs/buffer.c: include internal.h for missing declarations
Subsystem: mm/slab
Pengfei Li <lpf.vector@gmail.com>:
Patch series "mm, slab: Make kmalloc_info[] contain all types of names", v6:
mm, slab: make kmalloc_info[] contain all types of names
mm, slab: remove unused kmalloc_size()
mm, slab_common: use enum kmalloc_cache_type to iterate over kmalloc caches
Subsystem: mm/slub
Miles Chen <miles.chen@mediatek.com>:
mm: slub: print the offset of fault addresses
Yu Zhao <yuzhao@google.com>:
mm/slub.c: update comments
mm/slub.c: clean up validate_slab()
Subsystem: mm/pagecache
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
mm/filemap.c: remove redundant cache invalidation after async direct-io write
fs/direct-io.c: keep dio_warn_stale_pagecache() when CONFIG_BLOCK=n
mm/filemap.c: warn if stale pagecache is left after direct write
Subsystem: mm/gup
zhong jiang <zhongjiang@huawei.com>:
mm/gup.c: allow CMA migration to propagate errors back to caller
Liu Xiang <liuxiang_1999@126.com>:
mm/gup.c: fix comments of __get_user_pages() and get_user_pages_remote()
Subsystem: mm/swap
Naohiro Aota <naohiro.aota@wdc.com>:
mm, swap: disallow swapon() on zoned block devices
Fengguang Wu <fengguang.wu@intel.com>:
mm/swap.c: trivial mark_page_accessed() cleanup
Subsystem: mm/memcg
Yafang Shao <laoar.shao@gmail.com>:
mm, memcg: clean up reclaim iter array
Johannes Weiner <hannes@cmpxchg.org>:
mm: memcontrol: remove dead code from memory_max_write()
mm: memcontrol: try harder to set a new memory.high
Hao Lee <haolee.swjtu@gmail.com>:
include/linux/memcontrol.h: fix comments based on per-node memcg
Shakeel Butt <shakeelb@google.com>:
mm: vmscan: memcontrol: remove mem_cgroup_select_victim_node()
Chris Down <chris@chrisdown.name>:
Documentation/admin-guide/cgroup-v2.rst: document why inactive_X + active_X may not equal X
Subsystem: mm/pagemap
Johannes Weiner <hannes@cmpxchg.org>:
mm: drop mmap_sem before calling balance_dirty_pages() in write fault
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>:
shmem: pin the file in shmem_fault() if mmap_sem is dropped
"Joel Fernandes (Google)" <joel@joelfernandes.org>:
mm: emit tracepoint when RSS changes
rss_stat: add support to detect RSS updates of external mm
Wei Yang <richardw.yang@linux.intel.com>:
mm/mmap.c: remove a never-triggered warning in __vma_adjust()
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
mm/swap.c: piggyback lru_add_drain_all() calls
Wei Yang <richardw.yang@linux.intel.com>:
mm/mmap.c: prev could be retrieved from vma->vm_prev
mm/mmap.c: __vma_unlink_prev() is not necessary now
mm/mmap.c: extract __vma_unlink_list() as counterpart for __vma_link_list()
mm/mmap.c: rb_parent is not necessary in __vma_link_list()
mm/rmap.c: don't reuse anon_vma if we just want a copy
mm/rmap.c: reuse mergeable anon_vma as parent when fork
Gaowei Pu <pugaowei@gmail.com>:
mm/mmap.c: use IS_ERR_VALUE to check return value of get_unmapped_area
Vineet Gupta <Vineet.Gupta1@synopsys.com>:
Patch series "elide extraneous generated code for folded p4d/pud/pmd", v3:
ARC: mm: remove __ARCH_USE_5LEVEL_HACK
asm-generic/tlb: stub out pud_free_tlb() if nopud ...
asm-generic/tlb: stub out p4d_free_tlb() if nop4d ...
asm-generic/tlb: stub out pmd_free_tlb() if nopmd
asm-generic/mm: stub out p{4,u}d_clear_bad() if __PAGETABLE_P{4,U}D_FOLDED
Miles Chen <miles.chen@mediatek.com>:
mm/rmap.c: fix outdated comment in page_get_anon_vma()
Yang Shi <yang.shi@linux.alibaba.com>:
mm/rmap.c: use VM_BUG_ON_PAGE() in __page_check_anon_rmap()
Thomas Hellstrom <thellstrom@vmware.com>:
mm: move the backup x_devmap() functions to asm-generic/pgtable.h
mm/memory.c: fix a huge pud insertion race during faulting
Steven Price <steven.price@arm.com>:
Patch series "Generic page walk and ptdump", v15:
mm: add generic p?d_leaf() macros
arc: mm: add p?d_leaf() definitions
arm: mm: add p?d_leaf() definitions
arm64: mm: add p?d_leaf() definitions
mips: mm: add p?d_leaf() definitions
powerpc: mm: add p?d_leaf() definitions
riscv: mm: add p?d_leaf() definitions
s390: mm: add p?d_leaf() definitions
sparc: mm: add p?d_leaf() definitions
x86: mm: add p?d_leaf() definitions
mm: pagewalk: add p4d_entry() and pgd_entry()
mm: pagewalk: allow walking without vma
mm: pagewalk: add test_p?d callbacks
mm: pagewalk: add 'depth' parameter to pte_hole
x86: mm: point to struct seq_file from struct pg_state
x86: mm+efi: convert ptdump_walk_pgd_level() to take a mm_struct
x86: mm: convert ptdump_walk_pgd_level_debugfs() to take an mm_struct
x86: mm: convert ptdump_walk_pgd_level_core() to take an mm_struct
mm: add generic ptdump
x86: mm: convert dump_pagetables to use walk_page_range
arm64: mm: convert mm/dump.c to use walk_page_range()
arm64: mm: display non-present entries in ptdump
mm: ptdump: reduce level numbers by 1 in note_page()
Subsystem: mm/memfd
Nicolas Geoffray <ngeoffray@google.com>:
mm, memfd: fix COW issue on MAP_PRIVATE and F_SEAL_FUTURE_WRITE mappings
"Joel Fernandes (Google)" <joel@joelfernandes.org>:
memfd: add test for COW on MAP_PRIVATE and F_SEAL_FUTURE_WRITE mappings
Subsystem: mm/memory-failure
Jane Chu <jane.chu@oracle.com>:
mm/memory-failure.c clean up around tk pre-allocation
Naoya Horiguchi <nao.horiguchi@gmail.com>:
mm, soft-offline: convert parameter to pfn
Yunfeng Ye <yeyunfeng@huawei.com>:
mm/memory-failure.c: use page_shift() in add_to_kill()
Subsystem: mm/memory-hotplug
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/hotplug: reorder memblock_[free|remove]() calls in try_remove_memory()
Alastair D'Silva <alastair@d-silva.org>:
mm/memory_hotplug.c: add a bounds check to __add_pages()
David Hildenbrand <david@redhat.com>:
Patch series "mm/memory_hotplug: Export generic_online_page()":
mm/memory_hotplug: export generic_online_page()
hv_balloon: use generic_online_page()
mm/memory_hotplug: remove __online_page_free() and __online_page_increment_counters()
Patch series "mm: Memory offlining + page isolation cleanups", v2:
mm/page_alloc.c: don't set pages PageReserved() when offlining
mm/page_isolation.c: convert SKIP_HWPOISON to MEMORY_OFFLINE
"Ben Dooks (Codethink)" <ben.dooks@codethink.co.uk>:
include/linux/memory_hotplug.h: move definitions of {set,clear}_zone_contiguous
David Hildenbrand <david@redhat.com>:
drivers/base/memory.c: drop the mem_sysfs_mutex
mm/memory_hotplug.c: don't allow to online/offline memory blocks with holes
Subsystem: mm/sparsemem
Vincent Whitchurch <vincent.whitchurch@axis.com>:
mm/sparse: consistently do not zero memmap
Ilya Leoshkevich <iii@linux.ibm.com>:
mm/sparse.c: mark populate_section_memmap as __meminit
Michal Hocko <mhocko@suse.com>:
mm/sparse.c: do not waste pre allocated memmap space
Subsystem: mm/vmalloc
Liu Xiang <liuxiang_1999@126.com>:
mm/vmalloc.c: remove unnecessary highmem_mask from parameter of gfpflags_allow_blocking()
"Uladzislau Rezki (Sony)" <urezki@gmail.com>:
mm/vmalloc: remove preempt_disable/enable when doing preloading
mm/vmalloc: respect passed gfp_mask when doing preloading
mm/vmalloc: add more comments to the adjust_va_to_fit_type()
Anders Roxell <anders.roxell@linaro.org>:
selftests: vm: add fragment CONFIG_TEST_VMALLOC
"Uladzislau Rezki (Sony)" <urezki@gmail.com>:
mm/vmalloc: rework vmap_area_lock
Subsystem: mm/kasan
Daniel Axtens <dja@axtens.net>:
Patch series "kasan: support backing vmalloc space with real shadow memory":
kasan: support backing vmalloc space with real shadow memory
kasan: add test for vmalloc
fork: support VMAP_STACK with KASAN_VMALLOC
x86/kasan: support KASAN_VMALLOC
Subsystem: mm/pagealloc
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/page_alloc: add alloc_contig_pages()
Mel Gorman <mgorman@techsingularity.net>:
mm, pcp: share common code between memory hotplug and percpu sysctl handler
mm, pcpu: make zone pcp updates and reset internal to the mm
Hao Lee <haolee.swjtu@gmail.com>:
include/linux/mmzone.h: fix comment for ISOLATE_UNMAPPED macro
lijiazi <jqqlijiazi@gmail.com>:
mm/page_alloc.c: print reserved_highatomic info
Subsystem: mm/vmscan
Andrey Ryabinin <aryabinin@virtuozzo.com>:
mm/vmscan: remove unused lru_pages argument
Yang Shi <yang.shi@linux.alibaba.com>:
mm/vmscan.c: remove unused scan_control parameter from pageout()
Johannes Weiner <hannes@cmpxchg.org>:
Patch series "mm: vmscan: cgroup-related cleanups":
mm: vmscan: simplify lruvec_lru_size()
mm: clean up and clarify lruvec lookup procedure
mm: vmscan: move inactive_list_is_low() swap check to the caller
mm: vmscan: naming fixes: global_reclaim() and sane_reclaim()
mm: vmscan: replace shrink_node() loop with a retry jump
mm: vmscan: turn shrink_node_memcg() into shrink_lruvec()
mm: vmscan: split shrink_node() into node part and memcgs part
mm: vmscan: harmonize writeback congestion tracking for nodes & memcgs
Patch series "mm: fix page aging across multiple cgroups":
mm: vmscan: move file exhaustion detection to the node level
mm: vmscan: detect file thrashing at the reclaim root
mm: vmscan: enforce inactive:active ratio at the reclaim root
Xianting Tian <xianting_tian@126.com>:
mm/vmscan.c: fix typo in comment
Subsystem: mm/proc
Johannes Weiner <hannes@cmpxchg.org>:
kernel: sysctl: make drop_caches write-only
Subsystem: mm/z3fold
Vitaly Wool <vitaly.wool@konsulko.com>:
mm/z3fold.c: add inter-page compaction
Subsystem: mm/mempolicy
Li Xinhai <lixinhai.lxh@gmail.com>:
Patch series "mm: Fix checking unmapped holes for mbind", v4:
mm/mempolicy.c: check range first in queue_pages_test_walk
mm/mempolicy.c: fix checking unmapped holes for mbind
Subsystem: mm/memblock
Cao jin <caoj.fnst@cn.fujitsu.com>:
mm/memblock.c: cleanup doc
mm/memblock: correct doc for function
Yunfeng Ye <yeyunfeng@huawei.com>:
mm: support memblock alloc on the exact node for sparse_buffer_init()
Subsystem: mm/hugetlbfs
Mike Kravetz <mike.kravetz@oracle.com>:
hugetlbfs: hugetlb_fault_mutex_hash() cleanup
mm/hugetlbfs: fix error handling when setting up mounts
Patch series "hugetlbfs: convert macros to static inline, fix sparse warning":
powerpc/mm: remove pmd_huge/pud_huge stubs and include hugetlb.h
hugetlbfs: convert macros to static inline, fix sparse warning
Piotr Sarna <p.sarna@tlen.pl>:
hugetlbfs: add O_TMPFILE support
Waiman Long <longman@redhat.com>:
hugetlbfs: take read_lock on i_mmap for PMD sharing
Subsystem: mm/hugetlb
Mina Almasry <almasrymina@google.com>:
hugetlb: region_chg provides only cache entry
hugetlb: remove duplicated code
Wei Yang <richardw.yang@linux.intel.com>:
hugetlb: remove unused hstate in hugetlb_fault_mutex_hash()
Zhigang Lu <tonnylu@tencent.com>:
mm/hugetlb: avoid looping to the same hugepage if !pages and !vmas
zhong jiang <zhongjiang@huawei.com>:
mm/huge_memory.c: split_huge_pages_fops should be defined with DEFINE_DEBUGFS_ATTRIBUTE
Subsystem: mm/migration
Yang Shi <yang.shi@linux.alibaba.com>:
mm/migrate.c: handle freed page at the first place
Subsystem: mm/thp
"Kirill A. Shutemov" <kirill@shutemov.name>:
mm, thp: do not queue fully unmapped pages for deferred split
Song Liu <songliubraving@fb.com>:
mm/thp: flush file for !is_shmem PageDirty() case in collapse_file()
Subsystem: mm/cma
Yunfeng Ye <yeyunfeng@huawei.com>:
mm/cma.c: switch to bitmap_zalloc() for cma bitmap allocation
zhong jiang <zhongjiang@huawei.com>:
mm/cma_debug.c: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops
Subsystem: mm/autonuma
Huang Ying <ying.huang@intel.com>:
autonuma: fix watermark checking in migrate_balanced_pgdat()
autonuma: reduce cache footprint when scanning page tables
Subsystem: mm/page-poison
zhong jiang <zhongjiang@huawei.com>:
mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops
Subsystem: mm/mmap
Wei Yang <richardw.yang@linux.intel.com>:
mm/mmap.c: make vma_merge() comment more easy to understand
Subsystem: mm/madvise
Yunfeng Ye <yeyunfeng@huawei.com>:
mm/madvise.c: replace with page_size() in madvise_inject_error()
Wei Yang <richardw.yang@linux.intel.com>:
mm/madvise.c: use PAGE_ALIGN[ED] for range checking
Subsystem: mm/userfaultfd
Wei Yang <richardw.yang@linux.intel.com>:
userfaultfd: use vma_pagesize for all huge page size calculation
userfaultfd: remove unnecessary WARN_ON() in __mcopy_atomic_hugetlb()
userfaultfd: wrap the common dst_vma check into an inlined function
Andrea Arcangeli <aarcange@redhat.com>:
fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register()
Mike Rapoport <rppt@linux.ibm.com>:
userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
Subsystem: mm/shmem
Colin Ian King <colin.king@canonical.com>:
mm/shmem.c: make array 'values' static const, makes object smaller
Yang Shi <yang.shi@linux.alibaba.com>:
mm: shmem: use proper gfp flags for shmem_writepage()
Chen Jun <chenjun102@huawei.com>:
mm/shmem.c: cast the type of unmap_start to u64
Subsystem: mm/cleanups
Hao Lee <haolee.swjtu@gmail.com>:
mm: fix struct member name in function comments
Wei Yang <richardw.yang@linux.intel.com>:
mm: fix typos in comments when calling __SetPageUptodate()
Souptick Joarder <jrdr.linux@gmail.com>:
mm/memory_hotplug.c: remove __online_page_set_limits()
Krzysztof Kozlowski <krzk@kernel.org>:
mm/Kconfig: fix indentation
Randy Dunlap <rdunlap@infradead.org>:
mm/Kconfig: fix trivial help text punctuation
Subsystem: mm/support
Minchan Kim <minchan@google.com>:
mm/page_io.c: annotate refault stalls from swap_readpage
Documentation/admin-guide/cgroup-v2.rst | 7
Documentation/dev-tools/kasan.rst | 63 +
arch/Kconfig | 9
arch/arc/include/asm/pgtable.h | 2
arch/arc/mm/fault.c | 10
arch/arc/mm/highmem.c | 4
arch/arm/include/asm/pgtable-2level.h | 1
arch/arm/include/asm/pgtable-3level.h | 1
arch/arm64/Kconfig | 1
arch/arm64/Kconfig.debug | 19
arch/arm64/include/asm/pgtable.h | 2
arch/arm64/include/asm/ptdump.h | 8
arch/arm64/mm/Makefile | 4
arch/arm64/mm/dump.c | 148 +---
arch/arm64/mm/mmu.c | 4
arch/arm64/mm/ptdump_debugfs.c | 2
arch/mips/include/asm/pgtable.h | 5
arch/powerpc/include/asm/book3s/64/pgtable-4k.h | 3
arch/powerpc/include/asm/book3s/64/pgtable-64k.h | 3
arch/powerpc/include/asm/book3s/64/pgtable.h | 30
arch/powerpc/mm/book3s64/radix_pgtable.c | 1
arch/riscv/include/asm/pgtable-64.h | 7
arch/riscv/include/asm/pgtable.h | 7
arch/s390/include/asm/pgtable.h | 2
arch/sparc/include/asm/pgtable_64.h | 2
arch/x86/Kconfig | 2
arch/x86/Kconfig.debug | 20
arch/x86/include/asm/pgtable.h | 10
arch/x86/mm/Makefile | 4
arch/x86/mm/debug_pagetables.c | 8
arch/x86/mm/dump_pagetables.c | 431 +++---------
arch/x86/mm/kasan_init_64.c | 61 +
arch/x86/platform/efi/efi_32.c | 2
arch/x86/platform/efi/efi_64.c | 4
drivers/base/memory.c | 40 -
drivers/firmware/efi/arm-runtime.c | 2
drivers/hv/hv_balloon.c | 4
drivers/xen/balloon.c | 1
fs/buffer.c | 6
fs/direct-io.c | 21
fs/hugetlbfs/inode.c | 67 +
fs/ocfs2/acl.c | 4
fs/proc/task_mmu.c | 4
fs/userfaultfd.c | 21
include/asm-generic/4level-fixup.h | 1
include/asm-generic/5level-fixup.h | 1
include/asm-generic/pgtable-nop4d.h | 2
include/asm-generic/pgtable-nopmd.h | 2
include/asm-generic/pgtable-nopud.h | 2
include/asm-generic/pgtable.h | 71 ++
include/asm-generic/tlb.h | 4
include/linux/fs.h | 6
include/linux/gfp.h | 2
include/linux/hugetlb.h | 142 +++-
include/linux/kasan.h | 31
include/linux/memblock.h | 3
include/linux/memcontrol.h | 51 -
include/linux/memory_hotplug.h | 11
include/linux/mm.h | 42 -
include/linux/mmzone.h | 34
include/linux/moduleloader.h | 2
include/linux/page-isolation.h | 4
include/linux/pagewalk.h | 42 -
include/linux/ptdump.h | 22
include/linux/slab.h | 20
include/linux/string.h | 2
include/linux/swap.h | 2
include/linux/vmalloc.h | 12
include/trace/events/kmem.h | 53 +
kernel/events/uprobes.c | 2
kernel/fork.c | 4
kernel/sysctl.c | 2
lib/Kconfig.kasan | 16
lib/test_kasan.c | 26
lib/vsprintf.c | 40 -
mm/Kconfig | 40 -
mm/Kconfig.debug | 21
mm/Makefile | 1
mm/cma.c | 6
mm/cma_debug.c | 10
mm/filemap.c | 56 -
mm/gup.c | 40 -
mm/hmm.c | 8
mm/huge_memory.c | 2
mm/hugetlb.c | 298 ++------
mm/hwpoison-inject.c | 4
mm/internal.h | 27
mm/kasan/common.c | 233 ++++++
mm/kasan/generic_report.c | 3
mm/kasan/kasan.h | 1
mm/khugepaged.c | 18
mm/madvise.c | 14
mm/memblock.c | 113 ++-
mm/memcontrol.c | 167 ----
mm/memory-failure.c | 61 -
mm/memory.c | 56 +
mm/memory_hotplug.c | 86 +-
mm/mempolicy.c | 59 +
mm/migrate.c | 21
mm/mincore.c | 1
mm/mmap.c | 75 --
mm/mprotect.c | 8
mm/mremap.c | 4
mm/nommu.c | 10
mm/page_alloc.c | 137 +++
mm/page_io.c | 15
mm/page_isolation.c | 12
mm/pagewalk.c | 126 ++-
mm/pgtable-generic.c | 9
mm/ptdump.c | 167 ++++
mm/rmap.c | 65 +
mm/shmem.c | 29
mm/slab.c | 7
mm/slab.h | 6
mm/slab_common.c | 101 +-
mm/slub.c | 36 -
mm/sparse.c | 22
mm/swap.c | 29
mm/swapfile.c | 7
mm/userfaultfd.c | 77 +-
mm/util.c | 22
mm/vmalloc.c | 196 +++--
mm/vmscan.c | 798 +++++++++++------------
mm/workingset.c | 75 +-
mm/z3fold.c | 375 ++++++++--
scripts/spelling.txt | 28
tools/testing/selftests/memfd/memfd_test.c | 36 +
tools/testing/selftests/vm/config | 1
128 files changed, 3409 insertions(+), 2121 deletions(-)
* incoming
@ 2019-11-22 1:53 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2019-11-22 1:53 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
4 fixes, based on 81429eb8d9ca40b0c65bb739d29fa856c5d5e958:
Vincent Whitchurch <vincent.whitchurch@axis.com>:
mm/sparse: consistently do not zero memmap
Joseph Qi <joseph.qi@linux.alibaba.com>:
Revert "fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()"
David Hildenbrand <david@redhat.com>:
mm/memory_hotplug: don't access uninitialized memmaps in shrink_zone_span()
Andrey Ryabinin <aryabinin@virtuozzo.com>:
mm/ksm.c: don't WARN if page is still mapped in remove_stable_node()
fs/ocfs2/xattr.c | 56 ++++++++++++++++++++++++++++++----------------------
mm/ksm.c | 14 ++++++-------
mm/memory_hotplug.c | 16 ++++++++++++--
mm/sparse.c | 2 -
4 files changed, 54 insertions(+), 34 deletions(-)
* incoming
@ 2019-11-16 1:34 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2019-11-16 1:34 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
11 fixes, based on 875fef493f21e54d20d71a581687990aaa50268c:
Yang Shi <yang.shi@linux.alibaba.com>:
mm: mempolicy: fix the wrong return value and potential pages leak of mbind
zhong jiang <zhongjiang@huawei.com>:
mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
Lasse Collin <lasse.collin@tukaani.org>:
lib/xz: fix XZ_DYNALLOC to avoid useless memory reallocations
Roman Gushchin <guro@fb.com>:
mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()
Laura Abbott <labbott@redhat.com>:
mm: slub: really fix slab walking for init_on_free
Song Liu <songliubraving@fb.com>:
mm,thp: recheck each page before collapsing file THP
David Hildenbrand <david@redhat.com>:
mm/memory_hotplug: fix try_offline_node()
Vinayak Menon <vinmenon@codeaurora.org>:
mm/page_io.c: do not free shared swap slots
Ralph Campbell <rcampbell@nvidia.com>:
mm/debug.c: __dump_page() prints an extra line
mm/debug.c: PageAnon() is true for PageKsm() pages
drivers/base/memory.c | 36 ++++++++++++++++++++++++++++++++++++
include/linux/memory.h | 1 +
lib/xz/xz_dec_lzma2.c | 1 +
mm/debug.c | 33 ++++++++++++++++++---------------
mm/hugetlb_cgroup.c | 2 +-
mm/khugepaged.c | 28 ++++++++++++++++------------
mm/madvise.c | 16 ++++++++++++----
mm/memcontrol.c | 2 +-
mm/memory_hotplug.c | 47 +++++++++++++++++++++++++++++------------------
mm/mempolicy.c | 14 +++++++++-----
mm/page_io.c | 6 +++---
mm/slub.c | 39 +++++++++------------------------------
12 files changed, 136 insertions(+), 89 deletions(-)
* incoming
@ 2019-11-06 5:16 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2019-11-06 5:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
17 fixes, based on 26bc672134241a080a83b2ab9aa8abede8d30e1c:
Shakeel Butt <shakeelb@google.com>:
mm: memcontrol: fix NULL-ptr deref in percpu stats flush
John Hubbard <jhubbard@nvidia.com>:
mm/gup_benchmark: fix MAP_HUGETLB case
Mel Gorman <mgorman@techsingularity.net>:
mm, meminit: recalculate pcpu batch and high limits after init completes
Yang Shi <yang.shi@linux.alibaba.com>:
mm: thp: handle page cache THP correctly in PageTransCompoundMap
Shuning Zhang <sunny.s.zhang@oracle.com>:
ocfs2: protect extent tree in ocfs2_prepare_inode_for_write()
Jason Gunthorpe <jgg@mellanox.com>:
mm/mmu_notifiers: use the right return code for WARN_ON
Michal Hocko <mhocko@suse.com>:
mm, vmstat: hide /proc/pagetypeinfo from normal users
mm, vmstat: reduce zone->lock holding time by /proc/pagetypeinfo
Ville Syrjälä <ville.syrjala@linux.intel.com>:
mm/khugepaged: fix might_sleep() warn with CONFIG_HIGHPTE=y
Johannes Weiner <hannes@cmpxchg.org>:
mm/page_alloc.c: ratelimit allocation failure warnings more aggressively
Vitaly Wool <vitaly.wool@konsulko.com>:
zswap: add Vitaly to the maintainers list
Kevin Hao <haokexin@gmail.com>:
dump_stack: avoid the livelock of the dump_lock
Song Liu <songliubraving@fb.com>:
MAINTAINERS: update information for "MEMORY MANAGEMENT"
Roman Gushchin <guro@fb.com>:
mm: slab: make page_cgroup_ino() to recognize non-compound slab pages properly
Ilya Leoshkevich <iii@linux.ibm.com>:
scripts/gdb: fix debugging modules compiled with hot/cold partitioning
David Hildenbrand <david@redhat.com>:
mm/memory_hotplug: fix updating the node span
Johannes Weiner <hannes@cmpxchg.org>:
mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges
MAINTAINERS | 5 +
fs/ocfs2/file.c | 125 ++++++++++++++++++++++-------
include/linux/mm.h | 5 -
include/linux/mm_types.h | 5 +
include/linux/page-flags.h | 20 ++++
lib/dump_stack.c | 7 +
mm/khugepaged.c | 7 -
mm/memcontrol.c | 23 +++--
mm/memory_hotplug.c | 8 +
mm/mmu_notifier.c | 2
mm/page_alloc.c | 17 ++-
mm/slab.h | 4
mm/vmstat.c | 25 ++++-
scripts/gdb/linux/symbols.py | 3
tools/testing/selftests/vm/gup_benchmark.c | 2
15 files changed, 197 insertions(+), 61 deletions(-)
* incoming
@ 2019-10-19 3:19 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2019-10-19 3:19 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
Rather a lot of fixes, almost all affecting mm/.
26 patches, based on b9959c7a347d6adbb558fba7e36e9fef3cba3b07:
David Hildenbrand <david@redhat.com>:
drivers/base/memory.c: don't access uninitialized memmaps in soft_offline_page_store()
fs/proc/page.c: don't access uninitialized memmaps in fs/proc/page.c
mm/memory-failure.c: don't access uninitialized memmaps in memory_failure()
Joel Colledge <joel.colledge@linbit.com>:
scripts/gdb: fix lx-dmesg when CONFIG_PRINTK_CALLER is set
Qian Cai <cai@lca.pw>:
mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo
David Hildenbrand <david@redhat.com>:
mm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span()
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>:
Patch series "mm/memory_hotplug: Shrink zones before removing memory", v6:
mm/memunmap: don't access uninitialized memmap in memunmap_pages()
Roman Gushchin <guro@fb.com>:
mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release
Chengguang Xu <cgxu519@mykernel.net>:
ocfs2: fix error handling in ocfs2_setattr()
John Hubbard <jhubbard@nvidia.com>:
mm/gup_benchmark: add a missing "w" to getopt string
mm/gup: fix a misnamed "write" argument, and a related bug
Honglei Wang <honglei.wang@oracle.com>:
mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size
Mike Rapoport <rppt@linux.ibm.com>:
mm: memblock: do not enforce current limit for memblock_phys* family
David Hildenbrand <david@redhat.com>:
hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()
Yi Li <yilikernel@gmail.com>:
ocfs2: fix panic due to ocfs2_wq is null
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
mm/memcontrol: update lruvec counters in mem_cgroup_move_account
Chenwandun <chenwandun@huawei.com>:
zram: fix race between backing_dev_show and backing_dev_store
Ben Dooks <ben.dooks@codethink.co.uk>:
mm: include <linux/huge_mm.h> for is_vma_temporary_stack
mm/filemap.c: include <linux/ramfs.h> for generic_file_vm_ops definition
"Ben Dooks (Codethink)" <ben.dooks@codethink.co.uk>:
mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>:
Patch series "Fixes for THP in page cache", v2:
proc/meminfo: fix output alignment
mm/thp: fix node page state in split_huge_page_to_list()
William Kucharski <william.kucharski@oracle.com>:
mm/vmscan.c: support removing arbitrary sized pages from mapping
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>:
mm/thp: allow dropping THP from page cache
Song Liu <songliubraving@fb.com>:
kernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe register
Ilya Leoshkevich <iii@linux.ibm.com>:
scripts/gdb: fix debugging modules on s390
drivers/base/memory.c | 3 +
drivers/block/zram/zram_drv.c | 5 +
fs/ocfs2/file.c | 2
fs/ocfs2/journal.c | 3 -
fs/ocfs2/localalloc.c | 3 -
fs/proc/meminfo.c | 4 -
fs/proc/page.c | 28 ++++++----
kernel/events/uprobes.c | 13 ++++-
mm/filemap.c | 1
mm/gup.c | 14 +++--
mm/huge_memory.c | 9 ++-
mm/hugetlb.c | 5 -
mm/init-mm.c | 1
mm/memblock.c | 6 +-
mm/memcontrol.c | 18 ++++---
mm/memory-failure.c | 14 +++--
mm/memory_hotplug.c | 74 ++++++-----------------------
mm/memremap.c | 11 ++--
mm/page_owner.c | 5 +
mm/rmap.c | 1
mm/slab_common.c | 9 +--
mm/truncate.c | 12 ++++
mm/vmscan.c | 14 ++---
scripts/gdb/linux/dmesg.py | 16 ++++--
scripts/gdb/linux/symbols.py | 8 ++-
scripts/gdb/linux/utils.py | 25 +++++----
tools/testing/selftests/vm/gup_benchmark.c | 2
27 files changed, 166 insertions(+), 140 deletions(-)
* incoming
@ 2019-10-14 21:11 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2019-10-14 21:11 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
The usual shower of hotfixes and some followups to the recently merged
page_owner enhancements.
16 patches, based on 2abd839aa7e615f2bbc50c8ba7deb9e40d186768.
Subsystems affected by this patch series:
Vlastimil Babka <vbabka@suse.cz>:
Patch series "followups to debug_pagealloc improvements through page_owner", v3:
mm, page_owner: fix off-by-one error in __set_page_owner_handle()
mm, page_owner: decouple freeing stack trace from debug_pagealloc
mm, page_owner: rename flag indicating that page is allocated
Qian Cai <cai@lca.pw>:
mm/slub: fix a deadlock in show_slab_objects()
Eric Biggers <ebiggers@google.com>:
lib/generic-radix-tree.c: add kmemleak annotations
Alexander Potapenko <glider@google.com>:
mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocations
lib/test_meminit: add a kmem_cache_alloc_bulk() test
David Rientjes <rientjes@google.com>:
mm, hugetlb: allow hugepage allocations to reclaim as needed
Vlastimil Babka <vbabka@suse.cz>:
mm, compaction: fix wrong pfn handling in __reset_isolation_pfn()
Randy Dunlap <rdunlap@infradead.org>:
fs/direct-io.c: fix kernel-doc warning
fs/libfs.c: fix kernel-doc warning
fs/fs-writeback.c: fix kernel-doc warning
bitmap.h: fix kernel-doc warning and typo
xarray.h: fix kernel-doc warning
mm/slab.c: fix kernel-doc warning for __ksize()
Jane Chu <jane.chu@oracle.com>:
mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once
Documentation/dev-tools/kasan.rst | 3 ++
fs/direct-io.c | 3 --
fs/fs-writeback.c | 2 -
fs/libfs.c | 3 --
include/linux/bitmap.h | 3 +-
include/linux/page_ext.h | 10 ++++++
include/linux/xarray.h | 4 +-
lib/generic-radix-tree.c | 32 +++++++++++++++++-----
lib/test_meminit.c | 27 ++++++++++++++++++
mm/compaction.c | 7 ++--
mm/memory-failure.c | 22 ++++++++-------
mm/page_alloc.c | 6 ++--
mm/page_ext.c | 23 ++++++---------
mm/page_owner.c | 55 +++++++++++++-------------------------
mm/slab.c | 3 ++
mm/slub.c | 35 ++++++++++++++++++------
16 files changed, 152 insertions(+), 86 deletions(-)
* incoming
@ 2019-10-07 0:57 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2019-10-07 0:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
The usual shower of hotfixes.
Chris's memcg patches aren't actually fixes - they're mature but a few
niggling review issues were late to arrive.
The ocfs2 fixes are quite old - those took some time to get
reviewer attention.
18 patches, based on 4ea655343ce4180fe9b2c7ec8cb8ef9884a47901.
Subsystems affected by this patch series:
ocfs2
hotfixes
mm/memcg
mm/slab-generic
Subsystem: ocfs2
Jia Guo <guojia12@huawei.com>:
ocfs2: clear zero in unaligned direct IO
Jia-Ju Bai <baijiaju1990@gmail.com>:
fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()
fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock()
fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc()
Subsystem: hotfixes
Will Deacon <will@kernel.org>:
panic: ensure preemption is disabled during panic()
Anshuman Khandual <anshuman.khandual@arm.com>:
mm/memremap: drop unused SECTION_SIZE and SECTION_MASK
Tejun Heo <tj@kernel.org>:
writeback: fix use-after-free in finish_writeback_work()
Yi Wang <wang.yi59@zte.com.cn>:
mm: fix -Wmissing-prototypes warnings
Baoquan He <bhe@redhat.com>:
memcg: only record foreign writebacks with dirty pages when memcg is not disabled
Michal Hocko <mhocko@suse.com>:
kernel/sysctl.c: do not override max_threads provided by userspace
Vitaly Wool <vitalywool@gmail.com>:
mm/z3fold.c: claim page in the beginning of free
Qian Cai <cai@lca.pw>:
mm/page_alloc.c: fix a crash in free_pages_prepare()
Dan Carpenter <dan.carpenter@oracle.com>:
mm/vmpressure.c: fix a signedness bug in vmpressure_register_event()
Subsystem: mm/memcg
Chris Down <chris@chrisdown.name>:
mm, memcg: proportional memory.{low,min} reclaim
mm, memcg: make memory.emin the baseline for utilisation determination
mm, memcg: make scan aggression always exclude protection
Subsystem: mm/slab-generic
Vlastimil Babka <vbabka@suse.cz>:
Patch series "guarantee natural alignment for kmalloc()", v2:
mm, sl[ou]b: improve memory accounting
mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
Documentation/admin-guide/cgroup-v2.rst | 20 +-
Documentation/core-api/memory-allocation.rst | 4
fs/fs-writeback.c | 9 -
fs/ocfs2/aops.c | 25 +++
fs/ocfs2/ioctl.c | 2
fs/ocfs2/xattr.c | 56 +++----
include/linux/memcontrol.h | 67 ++++++---
include/linux/slab.h | 4
kernel/fork.c | 4
kernel/panic.c | 1
mm/memcontrol.c | 5
mm/memremap.c | 2
mm/page_alloc.c | 8 -
mm/shuffle.c | 2
mm/slab_common.c | 19 ++
mm/slob.c | 62 ++++++--
mm/slub.c | 14 +
mm/sparse.c | 2
mm/vmpressure.c | 20 +-
mm/vmscan.c | 198 +++++++++++++++++----------
mm/z3fold.c | 10 +
21 files changed, 363 insertions(+), 171 deletions(-)
* incoming
@ 2019-09-25 23:45 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2019-09-25 23:45 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
- almost all of the rest of -mm
- various other subsystems
76 patches, based on 351c8a09b00b5c51c8f58b016fffe51f87e2d820:
Subsystems affected by this patch series:
memcg
misc
core-kernel
lib
checkpatch
reiserfs
fat
fork
cpumask
kexec
uaccess
kconfig
kgdb
bug
ipc
lzo
kasan
madvise
cleanups
pagemap
Subsystem: memcg
Michal Hocko <mhocko@suse.com>:
memcg, kmem: do not fail __GFP_NOFAIL charges
Subsystem: misc
Masahiro Yamada <yamada.masahiro@socionext.com>:
linux/coff.h: add include guard
Subsystem: core-kernel
Valdis Kletnieks <valdis.kletnieks@vt.edu>:
kernel/elfcore.c: include proper prototypes
Subsystem: lib
Michel Lespinasse <walken@google.com>:
rbtree: avoid generating code twice for the cached versions (tools copy)
Patch series "make RB_DECLARE_CALLBACKS more generic", v3:
augmented rbtree: add comments for RB_DECLARE_CALLBACKS macro
augmented rbtree: add new RB_DECLARE_CALLBACKS_MAX macro
augmented rbtree: rework the RB_DECLARE_CALLBACKS macro definition
Joe Perches <joe@perches.com>:
kernel-doc: core-api: include string.h into core-api
Qian Cai <cai@lca.pw>:
include/trace/events/writeback.h: fix -Wstringop-truncation warnings
Kees Cook <keescook@chromium.org>:
strscpy: reject buffer sizes larger than INT_MAX
Valdis Kletnieks <valdis.kletnieks@vt.edu>:
lib/generic-radix-tree.c: make 2 functions static inline
lib/extable.c: add missing prototypes
Stephen Boyd <swboyd@chromium.org>:
lib/hexdump: make print_hex_dump_bytes() a nop on !DEBUG builds
Subsystem: checkpatch
Joe Perches <joe@perches.com>:
checkpatch: don't interpret stack dumps as commit IDs
checkpatch: improve SPDX license checking
Matteo Croce <mcroce@redhat.com>:
checkpatch.pl: warn on invalid commit id
Brendan Jackman <brendan.jackman@bluwireless.co.uk>:
checkpatch: exclude sizeof sub-expressions from MACRO_ARG_REUSE
Joe Perches <joe@perches.com>:
checkpatch: prefer __section over __attribute__((section(...)))
checkpatch: allow consecutive close braces
Sean Christopherson <sean.j.christopherson@intel.com>:
checkpatch: remove obsolete period from "ambiguous SHA1" query
Joe Perches <joe@perches.com>:
checkpatch: make git output use LANGUAGE=en_US.utf8
Subsystem: reiserfs
Jia-Ju Bai <baijiaju1990@gmail.com>:
fs: reiserfs: remove unnecessary check of bh in remove_from_transaction()
zhengbin <zhengbin13@huawei.com>:
fs/reiserfs/journal.c: remove set but not used variables
fs/reiserfs/stree.c: remove set but not used variables
fs/reiserfs/lbalance.c: remove set but not used variables
fs/reiserfs/objectid.c: remove set but not used variables
fs/reiserfs/prints.c: remove set but not used variables
fs/reiserfs/fix_node.c: remove set but not used variables
fs/reiserfs/do_balan.c: remove set but not used variables
Jason Yan <yanaijie@huawei.com>:
fs/reiserfs/journal.c: remove set but not used variable
fs/reiserfs/do_balan.c: remove set but not used variable
Subsystem: fat
Markus Elfring <elfring@users.sourceforge.net>:
fat: delete an unnecessary check before brelse()
Subsystem: fork
Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>:
fork: improve error message for corrupted page tables
Subsystem: cpumask
Alexey Dobriyan <adobriyan@gmail.com>:
cpumask: nicer for_each_cpumask_and() signature
Subsystem: kexec
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>:
kexec: bail out upon SIGKILL when allocating memory.
Vasily Gorbik <gor@linux.ibm.com>:
kexec: restore arch_kexec_kernel_image_probe declaration
Subsystem: uaccess
Kees Cook <keescook@chromium.org>:
uaccess: add missing __must_check attributes
Subsystem: kconfig
Masahiro Yamada <yamada.masahiro@socionext.com>:
compiler: enable CONFIG_OPTIMIZE_INLINING forcibly
Subsystem: kgdb
Douglas Anderson <dianders@chromium.org>:
kgdb: don't use a notifier to enter kgdb at panic; call directly
scripts/gdb: handle split debug
Subsystem: bug
Kees Cook <keescook@chromium.org>:
Patch series "Clean up WARN() "cut here" handling", v2:
bug: refactor away warn_slowpath_fmt_taint()
bug: rename __WARN_printf_taint() to __WARN_printf()
bug: consolidate warn_slowpath_fmt() usage
bug: lift "cut here" out of __warn()
bug: clean up helper macros to remove __WARN_TAINT()
bug: consolidate __WARN_FLAGS usage
bug: move WARN_ON() "cut here" into exception handler
Subsystem: ipc
Markus Elfring <elfring@users.sourceforge.net>:
ipc/mqueue.c: delete an unnecessary check before the macro call dev_kfree_skb()
ipc/mqueue: improve exception handling in do_mq_notify()
"Joel Fernandes (Google)" <joel@joelfernandes.org>:
ipc/sem.c: convert to use built-in RCU list checking
Subsystem: lzo
Dave Rodgman <dave.rodgman@arm.com>:
lib/lzo/lzo1x_compress.c: fix alignment bug in lzo-rle
Subsystem: kasan
Andrey Konovalov <andreyknvl@google.com>:
Patch series "arm64: untag user pointers passed to the kernel", v19:
lib: untag user pointers in strn*_user
mm: untag user pointers passed to memory syscalls
mm: untag user pointers in mm/gup.c
mm: untag user pointers in get_vaddr_frames
fs/namespace: untag user pointers in copy_mount_options
userfaultfd: untag user pointers
drm/amdgpu: untag user pointers
drm/radeon: untag user pointers in radeon_gem_userptr_ioctl
media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get
tee/shm: untag user pointers in tee_shm_register
vfio/type1: untag user pointers in vaddr_get_pfn
Catalin Marinas <catalin.marinas@arm.com>:
mm: untag user pointers in mmap/munmap/mremap/brk
Subsystem: madvise
Minchan Kim <minchan@kernel.org>:
Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7:
mm: introduce MADV_COLD
mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM
mm: introduce MADV_PAGEOUT
mm: factor out common parts between MADV_COLD and MADV_PAGEOUT
Subsystem: cleanups
Mike Rapoport <rppt@linux.ibm.com>:
hexagon: drop empty and unused free_initrd_mem
Denis Efremov <efremov@linux.com>:
checkpatch: check for nested (un)?likely() calls
xen/events: remove unlikely() from WARN() condition
fs: remove unlikely() from WARN_ON() condition
wimax/i2400m: remove unlikely() from WARN*() condition
xfs: remove unlikely() from WARN_ON() condition
IB/hfi1: remove unlikely() from IS_ERR*() condition
ntfs: remove (un)?likely() from IS_ERR() conditions
Subsystem: pagemap
Mark Rutland <mark.rutland@arm.com>:
mm: treewide: clarify pgtable_page_{ctor,dtor}() naming
Documentation/core-api/kernel-api.rst | 3
Documentation/vm/split_page_table_lock.rst | 10
arch/alpha/include/uapi/asm/mman.h | 3
arch/arc/include/asm/pgalloc.h | 4
arch/arm/include/asm/tlb.h | 2
arch/arm/mm/mmu.c | 2
arch/arm64/include/asm/tlb.h | 2
arch/arm64/mm/mmu.c | 2
arch/csky/include/asm/pgalloc.h | 2
arch/hexagon/include/asm/pgalloc.h | 2
arch/hexagon/mm/init.c | 13
arch/m68k/include/asm/mcf_pgalloc.h | 6
arch/m68k/include/asm/motorola_pgalloc.h | 6
arch/m68k/include/asm/sun3_pgalloc.h | 2
arch/mips/include/asm/pgalloc.h | 2
arch/mips/include/uapi/asm/mman.h | 3
arch/nios2/include/asm/pgalloc.h | 2
arch/openrisc/include/asm/pgalloc.h | 6
arch/parisc/include/uapi/asm/mman.h | 3
arch/powerpc/mm/pgtable-frag.c | 6
arch/riscv/include/asm/pgalloc.h | 2
arch/s390/mm/pgalloc.c | 6
arch/sh/include/asm/pgalloc.h | 2
arch/sparc/include/asm/pgtable_64.h | 5
arch/sparc/mm/init_64.c | 4
arch/sparc/mm/srmmu.c | 4
arch/um/include/asm/pgalloc.h | 2
arch/unicore32/include/asm/tlb.h | 2
arch/x86/mm/pat_rbtree.c | 19
arch/x86/mm/pgtable.c | 2
arch/xtensa/include/asm/pgalloc.h | 4
arch/xtensa/include/uapi/asm/mman.h | 3
drivers/block/drbd/drbd_interval.c | 29 -
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2
drivers/gpu/drm/radeon/radeon_gem.c | 2
drivers/infiniband/hw/hfi1/verbs.c | 2
drivers/media/v4l2-core/videobuf-dma-contig.c | 9
drivers/net/wimax/i2400m/tx.c | 3
drivers/tee/tee_shm.c | 1
drivers/vfio/vfio_iommu_type1.c | 2
drivers/xen/events/events_base.c | 2
fs/fat/dir.c | 4
fs/namespace.c | 2
fs/ntfs/mft.c | 12
fs/ntfs/namei.c | 2
fs/ntfs/runlist.c | 2
fs/ntfs/super.c | 2
fs/open.c | 2
fs/reiserfs/do_balan.c | 15
fs/reiserfs/fix_node.c | 6
fs/reiserfs/journal.c | 22
fs/reiserfs/lbalance.c | 3
fs/reiserfs/objectid.c | 3
fs/reiserfs/prints.c | 3
fs/reiserfs/stree.c | 4
fs/userfaultfd.c | 22
fs/xfs/xfs_buf.c | 4
include/asm-generic/bug.h | 71 +-
include/asm-generic/pgalloc.h | 8
include/linux/cpumask.h | 14
include/linux/interval_tree_generic.h | 22
include/linux/kexec.h | 2
include/linux/kgdb.h | 2
include/linux/mm.h | 4
include/linux/mm_types_task.h | 4
include/linux/printk.h | 22
include/linux/rbtree_augmented.h | 114 +++-
include/linux/string.h | 5
include/linux/swap.h | 2
include/linux/thread_info.h | 2
include/linux/uaccess.h | 21
include/trace/events/writeback.h | 38 -
include/uapi/asm-generic/mman-common.h | 3
include/uapi/linux/coff.h | 5
ipc/mqueue.c | 22
ipc/sem.c | 3
kernel/debug/debug_core.c | 31 -
kernel/elfcore.c | 1
kernel/fork.c | 16
kernel/kexec_core.c | 2
kernel/panic.c | 48 -
lib/Kconfig.debug | 4
lib/bug.c | 11
lib/extable.c | 1
lib/generic-radix-tree.c | 4
lib/hexdump.c | 21
lib/lzo/lzo1x_compress.c | 14
lib/rbtree_test.c | 37 -
lib/string.c | 12
lib/strncpy_from_user.c | 3
lib/strnlen_user.c | 3
mm/frame_vector.c | 2
mm/gup.c | 4
mm/internal.h | 2
mm/madvise.c | 562 ++++++++++++++++-------
mm/memcontrol.c | 10
mm/mempolicy.c | 3
mm/migrate.c | 2
mm/mincore.c | 2
mm/mlock.c | 4
mm/mmap.c | 34 -
mm/mprotect.c | 2
mm/mremap.c | 13
mm/msync.c | 2
mm/oom_kill.c | 2
mm/swap.c | 42 +
mm/vmalloc.c | 5
mm/vmscan.c | 62 ++
scripts/checkpatch.pl | 69 ++
scripts/gdb/linux/symbols.py | 4
tools/include/linux/rbtree.h | 71 +-
tools/include/linux/rbtree_augmented.h | 145 +++--
tools/lib/rbtree.c | 37 -
114 files changed, 1195 insertions(+), 754 deletions(-)
* Re: incoming
2019-09-24 15:34 ` incoming Linus Torvalds
@ 2019-09-25 6:36 ` Michal Hocko
0 siblings, 0 replies; 786+ messages in thread
From: Michal Hocko @ 2019-09-25 6:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, David Rientjes, Vlastimil Babka, Andrea Arcangeli,
mm-commits, Linux-MM
On Tue 24-09-19 08:34:20, Linus Torvalds wrote:
> On Tue, Sep 24, 2019 at 12:48 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > The patch proposed by David is really non trivial wrt. potential side
> > effects.
>
> The thing is, that's not an argument when we know that the current
> state is garbage and has a lot of these non-trivial side effects that
> are bad.
>
> So the patch by David _fixes_ a non-trivial bad side effect.
>
> You can't then say "there may be other non-trivial side effects that I
> don't even know about" as an argument for saying it's bad. David at
> least has numbers and an argument for his patch.
All I am saying is that I am not able to wrap my head around this patch
to provide a competent Ack. I also believe that the fix is targeting the
wrong layer of the problem, as explained in my review feedback. Apart
from the reclaim/compaction interaction mentioned by Vlastimil, it seems
that an overly eager fallback to a remote node in the fast path
is causing a large part of the problem as well. Kcompactd is not
eager enough to keep high order allocations ready for the fast path.
This is not specific to THP; we have many other high order allocations
which are going to follow the same pattern, likely not visible in any
counters but still having performance implications.
Let's discuss technical details in the respective email thread.
--
Michal Hocko
SUSE Labs
* Re: incoming
2019-09-24 7:48 ` incoming Michal Hocko
2019-09-24 15:34 ` incoming Linus Torvalds
@ 2019-09-24 19:55 ` Vlastimil Babka
1 sibling, 0 replies; 786+ messages in thread
From: Vlastimil Babka @ 2019-09-24 19:55 UTC (permalink / raw)
To: Michal Hocko, Andrew Morton
Cc: Linus Torvalds, David Rientjes, Andrea Arcangeli, mm-commits, Linux-MM
On 9/24/19 9:48 AM, Michal Hocko wrote:
> On Mon 23-09-19 21:31:53, Andrew Morton wrote:
>> On Mon, 23 Sep 2019 17:55:24 -0700 Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>
>>> On Mon, Sep 23, 2019 at 3:31 PM Andrew Morton
>>> <akpm@linux-foundation.org> wrote:
>>>>
>>>> - almost all of -mm, as below.
>>>
>>> I was hoping that we could at least test the THP locality thing?
>>> Is it in your queue at all, or am I supposed to just do it
>>> myself?
>>>
>>
>> Confused. I saw a privately emailed patch from David which nobody
>> seems to have tested yet. I parked that for consideration after
>> -rc1. Or are you referring to something else?
>>
>> This thing keeps stalling. It would be nice to push this along and
>> get something nailed down which we can at least get into 5.4-rc,
>> perhaps with a backport-this tag?
>
> The patch proposed by David is really non trivial wrt. potential
> side effects. I have provided my review feedback [1] and it didn't
> get any reaction. I really believe that we need to debug this
> properly. A reproducer would be useful for others to work on that.
>
> There is a more fundamental problem here and we need to address it
> rather than to duct tape it and whack a mole afterwards.
I believe we found a problem when investigating over-reclaim in this
thread [1], where it seems a madvised THP allocation attempt can result in
4MB reclaimed if there is a small zone such as ZONE_DMA on the node. As
it happens, the patch "[patch 090/134] mm, reclaim: make
should_continue_reclaim perform dryrun detection" in Andrew's pile
should change this 4MB to 32 pages reclaimed (as a side-effect), but
that has to be tested. I'm also working on a patch to not reclaim even
those few pages. Of course there might be more fundamental issues with
the reclaim/compaction interaction, but this one hopefully seems to be
clear now.
[1]
https://lore.kernel.org/linux-mm/4b4ba042-3741-7b16-2292-198c569da2aa@profihost.ag/
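To put the two figures above on the same scale, here is a minimal sketch of the arithmetic. It assumes 4 KiB base pages, which the thread does not state; the names `PAGE_SIZE` and `bytes_to_pages` are used here for illustration only and are not taken from any patch in this series.

```python
# Assumption: 4 KiB base pages. The thread does not state the page size.
PAGE_SIZE = 4 * 1024


def bytes_to_pages(nbytes, page_size=PAGE_SIZE):
    """Return the number of base pages needed to cover nbytes (rounded up)."""
    return (nbytes + page_size - 1) // page_size


# The "4MB reclaimed" figure, expressed in base pages.
reclaimed_before = bytes_to_pages(4 * 1024 * 1024)
# The "32 pages reclaimed" figure expected after the dryrun-detection patch.
reclaimed_after = 32

print(reclaimed_before)                     # 1024
print(reclaimed_before // reclaimed_after)  # 32
```

Under that assumption, the expected change is from 1024 pages to 32 pages per allocation attempt, a 32-fold reduction. As noted above, that effect still has to be tested.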
> [1] http://lkml.kernel.org/r/20190909193020.GD2063@dhcp22.suse.cz
>
* Re: incoming
2019-09-24 7:48 ` incoming Michal Hocko
@ 2019-09-24 15:34 ` Linus Torvalds
2019-09-25 6:36 ` incoming Michal Hocko
2019-09-24 19:55 ` incoming Vlastimil Babka
1 sibling, 1 reply; 786+ messages in thread
From: Linus Torvalds @ 2019-09-24 15:34 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, David Rientjes, Vlastimil Babka, Andrea Arcangeli,
mm-commits, Linux-MM
On Tue, Sep 24, 2019 at 12:48 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> The patch proposed by David is really non trivial wrt. potential side
> effects.
The thing is, that's not an argument when we know that the current
state is garbage and has a lot of these non-trivial side effects that
are bad.
So the patch by David _fixes_ a non-trivial bad side effect.
You can't then say "there may be other non-trivial side effects that I
don't even know about" as an argument for saying it's bad. David at
least has numbers and an argument for his patch.
Linus
* Re: incoming
2019-09-24 4:31 ` incoming Andrew Morton
@ 2019-09-24 7:48 ` Michal Hocko
2019-09-24 15:34 ` incoming Linus Torvalds
2019-09-24 19:55 ` incoming Vlastimil Babka
0 siblings, 2 replies; 786+ messages in thread
From: Michal Hocko @ 2019-09-24 7:48 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, David Rientjes, Vlastimil Babka,
Andrea Arcangeli, mm-commits, Linux-MM
On Mon 23-09-19 21:31:53, Andrew Morton wrote:
> On Mon, 23 Sep 2019 17:55:24 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > On Mon, Sep 23, 2019 at 3:31 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > - almost all of -mm, as below.
> >
> > I was hoping that we could at least test the THP locality thing? Is it
> > in your queue at all, or am I supposed to just do it myself?
> >
>
> Confused. I saw a privately emailed patch from David which nobody
> seems to have tested yet. I parked that for consideration after -rc1.
> Or are you referring to something else?
>
> This thing keeps stalling. It would be nice to push this along and get
> something nailed down which we can at least get into 5.4-rc, perhaps
> with a backport-this tag?
The patch proposed by David is really non trivial wrt. potential side
effects. I have provided my review feedback [1] and it didn't get
any reaction. I really believe that we need to debug this properly. A
reproducer would be useful for others to work on that.
There is a more fundamental problem here and we need to address it
rather than to duct tape it and whack a mole afterwards.
[1] http://lkml.kernel.org/r/20190909193020.GD2063@dhcp22.suse.cz
--
Michal Hocko
SUSE Labs
* Re: incoming
2019-09-24 0:55 ` incoming Linus Torvalds
@ 2019-09-24 4:31 ` Andrew Morton
2019-09-24 7:48 ` incoming Michal Hocko
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2019-09-24 4:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: David Rientjes, Vlastimil Babka, Michal Hocko, Andrea Arcangeli,
mm-commits, Linux-MM
On Mon, 23 Sep 2019 17:55:24 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, Sep 23, 2019 at 3:31 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > - almost all of -mm, as below.
>
> I was hoping that we could at least test the THP locality thing? Is it
> in your queue at all, or am I supposed to just do it myself?
>
Confused. I saw a privately emailed patch from David which nobody
seems to have tested yet. I parked that for consideration after -rc1.
Or are you referring to something else?
This thing keeps stalling. It would be nice to push this along and get
something nailed down which we can at least get into 5.4-rc, perhaps
with a backport-this tag?
* Re: incoming
2019-09-23 22:31 incoming Andrew Morton
@ 2019-09-24 0:55 ` Linus Torvalds
2019-09-24 4:31 ` incoming Andrew Morton
0 siblings, 1 reply; 786+ messages in thread
From: Linus Torvalds @ 2019-09-24 0:55 UTC (permalink / raw)
To: Andrew Morton, David Rientjes, Vlastimil Babka, Michal Hocko,
Andrea Arcangeli
Cc: mm-commits, Linux-MM
On Mon, Sep 23, 2019 at 3:31 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> - almost all of -mm, as below.
I was hoping that we could at least test the THP locality thing? Is it
in your queue at all, or am I supposed to just do it myself?
Linus
* incoming
@ 2019-09-23 22:31 Andrew Morton
2019-09-24 0:55 ` incoming Linus Torvalds
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2019-09-23 22:31 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
- a few hot fixes
- ocfs2 updates
- almost all of -mm, as below.
134 patches, based on 619e17cf75dd58905aa67ccd494a6ba5f19d6cc6:
Subsystems affected by this patch series:
hotfixes
ocfs2
slab-generic
slab
slub
kmemleak
kasan
cleanups
debug
pagecache
memcg
gup
pagemap
memory-hotplug
sparsemem
vmalloc
initialization
z3fold
compaction
mempolicy
oom-kill
hugetlb
migration
thp
mmap
madvise
shmem
zswap
zsmalloc
Subsystem: hotfixes
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
fat: work around race with userspace's read via blockdev while mounting
Vitaly Wool <vitalywool@gmail.com>:
Revert "mm/z3fold.c: fix race between migration and destruction"
Arnd Bergmann <arnd@arndb.de>:
mm: add dummy can_do_mlock() helper
Vitaly Wool <vitalywool@gmail.com>:
z3fold: fix retry mechanism in page reclaim
Greg Thelen <gthelen@google.com>:
kbuild: clean compressed initramfs image
Subsystem: ocfs2
Joseph Qi <joseph.qi@linux.alibaba.com>:
ocfs2: use jbd2_inode dirty range scoping
jbd2: remove jbd2_journal_inode_add_[write|wait]
Greg Kroah-Hartman <gregkh@linuxfoundation.org>:
ocfs2: further debugfs cleanups
Guozhonghua <guozhonghua@h3c.com>:
ocfs2: remove unused ocfs2_calc_tree_trunc_credits()
ocfs2: remove unused ocfs2_orphan_scan_exit() declaration
zhengbin <zhengbin13@huawei.com>:
fs/ocfs2/namei.c: remove set but not used variables
fs/ocfs2/file.c: remove set but not used variables
fs/ocfs2/dir.c: remove set but not used variables
Markus Elfring <elfring@users.sourceforge.net>:
ocfs2: delete unnecessary checks before brelse()
Changwei Ge <gechangwei@live.cn>:
ocfs2: wait for recovering done after direct unlock request
ocfs2: checkpoint appending truncate log transaction before flushing
Colin Ian King <colin.king@canonical.com>:
ocfs2: fix spelling mistake "ambigous" -> "ambiguous"
Subsystem: slab-generic
Waiman Long <longman@redhat.com>:
mm, slab: extend slab/shrink to shrink all memcg caches
Subsystem: slab
Waiman Long <longman@redhat.com>:
mm, slab: move memcg_cache_params structure to mm/slab.h
Subsystem: slub
Qian Cai <cai@lca.pw>:
mm/slub.c: fix -Wunused-function compiler warnings
Subsystem: kmemleak
Nicolas Boichat <drinkcat@chromium.org>:
kmemleak: increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE default to 16K
Catalin Marinas <catalin.marinas@arm.com>:
Patch series "mm: kmemleak: Use a memory pool for kmemleak object:
mm: kmemleak: make the tool tolerant to struct scan_area allocation failures
mm: kmemleak: simple memory allocation pool for kmemleak objects
mm: kmemleak: use the memory pool for early allocations
Qian Cai <cai@lca.pw>:
mm/kmemleak.c: record the current memory pool size
mm/kmemleak: increase the max mem pool to 1M
Subsystem: kasan
Walter Wu <walter-zh.wu@mediatek.com>:
kasan: add memory corruption identification for software tag-based mode
Mark Rutland <mark.rutland@arm.com>:
lib/test_kasan.c: add roundtrip tests
Subsystem: cleanups
Christophe JAILLET <christophe.jaillet@wanadoo.fr>:
mm/page_poison.c: fix a typo in a comment
YueHaibing <yuehaibing@huawei.com>:
mm/rmap.c: remove set but not used variable 'cstart'
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
Patch series "Make working with compound pages easier", v2:
mm: introduce page_size()
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: introduce page_shift()
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: introduce compound_nr()
Yu Zhao <yuzhao@google.com>:
mm: replace list_move_tail() with add_page_to_lru_list_tail()
Subsystem: debug
Vlastimil Babka <vbabka@suse.cz>:
Patch series "debug_pagealloc improvements through page_owner", v2:
mm, page_owner: record page owner for each subpage
mm, page_owner: keep owner info when freeing the page
mm, page_owner, debug_pagealloc: save and dump freeing stack trace
Subsystem: pagecache
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>:
mm/filemap.c: don't initiate writeback if mapping has no dirty pages
mm/filemap.c: rewrite mapping_needs_writeback in less fancy manner
"Matthew Wilcox (Oracle)" <willy@infradead.org>:
mm: page cache: store only head pages in i_pages
Subsystem: memcg
Chris Down <chris@chrisdown.name>:
mm, memcg: throttle allocators when failing reclaim over memory.high
Roman Gushchin <guro@fb.com>:
mm: memcontrol: switch to rcu protection in drain_all_stock()
Johannes Weiner <hannes@cmpxchg.org>:
mm: vmscan: do not share cgroup iteration between reclaimers
Subsystem: gup
John Hubbard <jhubbard@nvidia.com>:
Patch series "mm/gup: add make_dirty arg to put_user_pages_dirty_lock()",:
mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
John Hubbard <jhubbard@nvidia.com>:
drivers/gpu/drm/via: convert put_page() to put_user_page*()
net/xdp: convert put_page() to put_user_page*()
Subsystem: pagemap
Wei Yang <richardw.yang@linux.intel.com>:
mm: remove redundant assignment of entry
Minchan Kim <minchan@kernel.org>:
mm: release the spinlock on zap_pte_range
Nicholas Piggin <npiggin@gmail.com>:
Patch series "mm: remove quicklist page table caches":
mm: remove quicklist page table caches
Mike Rapoport <rppt@linux.ibm.com>:
ia64: switch to generic version of pte allocation
sh: switch to generic version of pte allocation
microblaze: switch to generic version of pte allocation
mm: consolidate pgtable_cache_init() and pgd_cache_init()
Kefeng Wang <wangkefeng.wang@huawei.com>:
mm: do not hash address in print_bad_pte()
Subsystem: memory-hotplug
David Hildenbrand <david@redhat.com>:
mm/memory_hotplug: remove move_pfn_range()
drivers/base/node.c: simplify unregister_memory_block_under_nodes()
drivers/base/memory.c: fixup documentation of removable/phys_index/block_size_bytes
driver/base/memory.c: validate memory block size early
drivers/base/memory.c: don't store end_section_nr in memory blocks
Wei Yang <richardw.yang@linux.intel.com>:
mm/memory_hotplug.c: prevent memory leak when reusing pgdat
David Hildenbrand <david@redhat.com>:
Patch series "mm/memory_hotplug: online_pages() cleanups", v2:
mm/memory_hotplug.c: use PFN_UP / PFN_DOWN in walk_system_ram_range()
mm/memory_hotplug: drop PageReserved() check in online_pages_range()
mm/memory_hotplug: simplify online_pages_range()
mm/memory_hotplug: make sure the pfn is aligned to the order when onlining
mm/memory_hotplug: online_pages cannot be 0 in online_pages()
Alastair D'Silva <alastair@d-silva.org>:
Patch series "Add bounds check for Hotplugged memory", v3:
mm/memory_hotplug.c: add a bounds check to check_hotplug_memory_range()
mm/memremap.c: add a bounds check in devm_memremap_pages()
Souptick Joarder <jrdr.linux@gmail.com>:
mm/memory_hotplug.c: s/is/if
Subsystem: sparsemem
Lecopzer Chen <lecopzer.chen@mediatek.com>:
mm/sparse.c: fix memory leak of sparsemap_buf in aligned memory
mm/sparse.c: fix ALIGN() without power of 2 in sparse_buffer_alloc()
Wei Yang <richardw.yang@linux.intel.com>:
mm/sparse.c: use __nr_to_section(section_nr) to get mem_section
Alastair D'Silva <alastair@d-silva.org>:
mm/sparse.c: don't manually decrement num_poisoned_pages
"Alastair D'Silva" <alastair@d-silva.org>:
mm/sparse.c: remove NULL check in clear_hwpoisoned_pages()
Subsystem: vmalloc
"Uladzislau Rezki (Sony)" <urezki@gmail.com>:
mm/vmalloc: do not keep unpurged areas in the busy tree
Pengfei Li <lpf.vector@gmail.com>:
mm/vmalloc: modify struct vmap_area to reduce its size
Austin Kim <austindh.kim@gmail.com>:
mm/vmalloc.c: move 'area->pages' after if statement
Subsystem: initialization
Mike Rapoport <rppt@linux.ibm.com>:
mm: use CPU_BITS_NONE to initialize init_mm.cpu_bitmask
Qian Cai <cai@lca.pw>:
mm: silence -Woverride-init/initializer-overrides
Subsystem: z3fold
Vitaly Wool <vitalywool@gmail.com>:
z3fold: fix memory leak in kmem cache
Subsystem: compaction
Yafang Shao <laoar.shao@gmail.com>:
mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone
Pengfei Li <lpf.vector@gmail.com>:
mm/compaction.c: remove unnecessary zone parameter in isolate_migratepages()
Subsystem: mempolicy
Kefeng Wang <wangkefeng.wang@huawei.com>:
mm/mempolicy.c: remove unnecessary nodemask check in kernel_migrate_pages()
Subsystem: oom-kill
Joel Savitz <jsavitz@redhat.com>:
mm/oom_kill.c: add task UID to info message on an oom kill
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>:
memcg, oom: don't require __GFP_FS when invoking memcg OOM killer
Edward Chron <echron@arista.com>:
mm/oom: add oom_score_adj and pgtables to Killed process message
Yi Wang <wang.yi59@zte.com.cn>:
mm/oom_kill.c: fix oom_cpuset_eligible() comment
Michal Hocko <mhocko@suse.com>:
mm, oom: consider present pages for the node size
Qian Cai <cai@lca.pw>:
mm/memcontrol.c: fix a -Wunused-function warning
Michal Hocko <mhocko@suse.com>:
memcg, kmem: deprecate kmem.limit_in_bytes
Subsystem: hugetlb
Hillf Danton <hdanton@sina.com>:
Patch series "address hugetlb page allocation stalls", v2:
mm, reclaim: make should_continue_reclaim perform dryrun detection
Vlastimil Babka <vbabka@suse.cz>:
mm, reclaim: cleanup should_continue_reclaim()
mm, compaction: raise compaction priority after it withdrawns
Mike Kravetz <mike.kravetz@oracle.com>:
hugetlbfs: don't retry when pool page allocations start to fail
Subsystem: migration
Pingfan Liu <kernelfans@gmail.com>:
mm/migrate.c: clean up useless code in migrate_vma_collect_pmd()
Subsystem: thp
Kefeng Wang <wangkefeng.wang@huawei.com>:
thp: update split_huge_page_pmd() comment
Song Liu <songliubraving@fb.com>:
Patch series "Enable THP for text section of non-shmem files", v10:
filemap: check compound_head(page)->mapping in filemap_fault()
filemap: check compound_head(page)->mapping in pagecache_get_page()
filemap: update offset check in filemap_fault()
mm,thp: stats for file backed THP
khugepaged: rename collapse_shmem() and khugepaged_scan_shmem()
mm,thp: add read-only THP support for (non-shmem) FS
mm,thp: avoid writes to file with THP in pagecache
Yang Shi <yang.shi@linux.alibaba.com>:
Patch series "Make deferred split shrinker memcg aware", v6:
mm: thp: extract split_queue_* into a struct
mm: move mem_cgroup_uncharge out of __page_cache_release()
mm: shrinker: make shrinker not depend on memcg kmem
mm: thp: make deferred split shrinker memcg aware
Song Liu <songliubraving@fb.com>:
Patch series "THP aware uprobe", v13:
mm: move memcmp_pages() and pages_identical()
uprobe: use original page when all uprobes are removed
mm, thp: introduce FOLL_SPLIT_PMD
uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT
khugepaged: enable collapse pmd for pte-mapped THP
uprobe: collapse THP pmd after removing all uprobes
Subsystem: mmap
Alexandre Ghiti <alex@ghiti.fr>:
Patch series "Provide generic top-down mmap layout functions", v6:
mm, fs: move randomize_stack_top from fs to mm
arm64: make use of is_compat_task instead of hardcoding this test
arm64: consider stack randomization for mmap base only when necessary
arm64, mm: move generic mmap layout functions to mm
arm64, mm: make randomization selected by generic topdown mmap layout
arm: properly account for stack randomization and stack guard gap
arm: use STACK_TOP when computing mmap base address
arm: use generic mmap top-down layout and brk randomization
mips: properly account for stack randomization and stack guard gap
mips: use STACK_TOP when computing mmap base address
mips: adjust brk randomization offset to fit generic version
mips: replace arch specific way to determine 32bit task with generic version
mips: use generic mmap top-down layout and brk randomization
riscv: make mmap allocation top-down by default
Wei Yang <richardw.yang@linux.intel.com>:
mm/mmap.c: refine find_vma_prev() with rb_last()
Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>:
mm: mmap: increase sockets maximum memory size pgoff for 32bits
Subsystem: madvise
Mike Rapoport <rppt@linux.ibm.com>:
mm/madvise: reduce code duplication in error handling paths
Subsystem: shmem
Miles Chen <miles.chen@mediatek.com>:
shmem: fix obsolete comment in shmem_getpage_gfp()
Subsystem: zswap
Hui Zhu <teawaterz@linux.alibaba.com>:
zpool: add malloc_support_movable to zpool_driver
zswap: use movable memory if zpool support allocate movable memory
Vitaly Wool <vitalywool@gmail.com>:
zswap: do not map same object twice
Subsystem: zsmalloc
Qian Cai <cai@lca.pw>:
mm/zsmalloc.c: fix a -Wunused-function warning
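Two entries in the list above ("use PFN_UP / PFN_DOWN in walk_system_ram_range()" and "fix ALIGN() without power of 2 in sparse_buffer_alloc()") turn on small rounding helpers. The following is only a userspace sketch of that arithmetic, not the patches themselves. PAGE_SHIFT is assumed to be 12 here, and ALIGN_POW2 and roundup_any are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Assumption for illustration: 4 KiB pages. The kernel takes this from the arch. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Modeled on the kernel's PFN helpers: convert a byte address to a page frame number. */
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)  /* round up */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)                    /* round down */

/* Mask-based alignment (hypothetical name): only correct when a is a power of two. */
#define ALIGN_POW2(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Division-based alignment (hypothetical name): correct for any non-zero a. */
static inline unsigned long roundup_any(unsigned long x, unsigned long a)
{
	return ((x + a - 1) / a) * a;
}
```

With a = 48, which is not a power of two, ALIGN_POW2(50UL, 48UL) evaluates to 64, which is not a multiple of 48, while roundup_any(50UL, 48UL) gives 96. That is the kind of mismatch a mask-based ALIGN() produces when the size is not a power of two.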
Documentation/ABI/testing/sysfs-kernel-slab | 13
Documentation/admin-guide/cgroup-v1/memory.rst | 4
Documentation/admin-guide/kernel-parameters.txt | 2
arch/Kconfig | 11
arch/alpha/include/asm/pgalloc.h | 2
arch/alpha/include/asm/pgtable.h | 5
arch/arc/include/asm/pgalloc.h | 1
arch/arc/include/asm/pgtable.h | 5
arch/arm/Kconfig | 1
arch/arm/include/asm/pgalloc.h | 2
arch/arm/include/asm/pgtable-nommu.h | 5
arch/arm/include/asm/pgtable.h | 2
arch/arm/include/asm/processor.h | 2
arch/arm/kernel/process.c | 5
arch/arm/mm/flush.c | 7
arch/arm/mm/mmap.c | 80 -----
arch/arm64/Kconfig | 2
arch/arm64/include/asm/pgalloc.h | 2
arch/arm64/include/asm/pgtable.h | 2
arch/arm64/include/asm/processor.h | 2
arch/arm64/kernel/process.c | 8
arch/arm64/mm/flush.c | 3
arch/arm64/mm/mmap.c | 84 -----
arch/arm64/mm/pgd.c | 2
arch/c6x/include/asm/pgtable.h | 5
arch/csky/include/asm/pgalloc.h | 2
arch/csky/include/asm/pgtable.h | 5
arch/h8300/include/asm/pgtable.h | 6
arch/hexagon/include/asm/pgalloc.h | 2
arch/hexagon/include/asm/pgtable.h | 3
arch/hexagon/mm/Makefile | 2
arch/hexagon/mm/pgalloc.c | 10
arch/ia64/Kconfig | 4
arch/ia64/include/asm/pgalloc.h | 64 ----
arch/ia64/include/asm/pgtable.h | 5
arch/ia64/mm/init.c | 2
arch/m68k/include/asm/pgtable_mm.h | 7
arch/m68k/include/asm/pgtable_no.h | 7
arch/microblaze/include/asm/pgalloc.h | 128 --------
arch/microblaze/include/asm/pgtable.h | 7
arch/microblaze/mm/pgtable.c | 4
arch/mips/Kconfig | 2
arch/mips/include/asm/pgalloc.h | 2
arch/mips/include/asm/pgtable.h | 5
arch/mips/include/asm/processor.h | 5
arch/mips/mm/mmap.c | 124 +-------
arch/nds32/include/asm/pgalloc.h | 2
arch/nds32/include/asm/pgtable.h | 2
arch/nios2/include/asm/pgalloc.h | 2
arch/nios2/include/asm/pgtable.h | 2
arch/openrisc/include/asm/pgalloc.h | 2
arch/openrisc/include/asm/pgtable.h | 5
arch/parisc/include/asm/pgalloc.h | 2
arch/parisc/include/asm/pgtable.h | 2
arch/powerpc/include/asm/pgalloc.h | 2
arch/powerpc/include/asm/pgtable.h | 1
arch/powerpc/mm/book3s64/hash_utils.c | 2
arch/powerpc/mm/book3s64/iommu_api.c | 7
arch/powerpc/mm/hugetlbpage.c | 2
arch/riscv/Kconfig | 12
arch/riscv/include/asm/pgalloc.h | 4
arch/riscv/include/asm/pgtable.h | 5
arch/s390/include/asm/pgtable.h | 6
arch/sh/include/asm/pgalloc.h | 56 ---
arch/sh/include/asm/pgtable.h | 5
arch/sh/mm/Kconfig | 3
arch/sh/mm/nommu.c | 4
arch/sparc/include/asm/pgalloc_32.h | 2
arch/sparc/include/asm/pgalloc_64.h | 2
arch/sparc/include/asm/pgtable_32.h | 5
arch/sparc/include/asm/pgtable_64.h | 1
arch/sparc/mm/init_32.c | 1
arch/um/include/asm/pgalloc.h | 2
arch/um/include/asm/pgtable.h | 2
arch/unicore32/include/asm/pgalloc.h | 2
arch/unicore32/include/asm/pgtable.h | 2
arch/x86/include/asm/pgtable_32.h | 2
arch/x86/include/asm/pgtable_64.h | 3
arch/x86/mm/pgtable.c | 6
arch/xtensa/include/asm/pgtable.h | 1
arch/xtensa/include/asm/tlbflush.h | 3
drivers/base/memory.c | 44 +-
drivers/base/node.c | 55 +--
drivers/crypto/chelsio/chtls/chtls_io.c | 5
drivers/gpu/drm/via/via_dmablit.c | 10
drivers/infiniband/core/umem.c | 5
drivers/infiniband/hw/hfi1/user_pages.c | 5
drivers/infiniband/hw/qib/qib_user_pages.c | 5
drivers/infiniband/hw/usnic/usnic_uiom.c | 5
drivers/infiniband/sw/siw/siw_mem.c | 10
drivers/staging/android/ion/ion_system_heap.c | 4
drivers/target/tcm_fc/tfc_io.c | 3
drivers/vfio/vfio_iommu_spapr_tce.c | 8
fs/binfmt_elf.c | 20 -
fs/fat/dir.c | 13
fs/fat/fatent.c | 3
fs/inode.c | 3
fs/io_uring.c | 2
fs/jbd2/journal.c | 2
fs/jbd2/transaction.c | 12
fs/ocfs2/alloc.c | 20 +
fs/ocfs2/aops.c | 13
fs/ocfs2/blockcheck.c | 26 -
fs/ocfs2/cluster/heartbeat.c | 109 +------
fs/ocfs2/dir.c | 3
fs/ocfs2/dlm/dlmcommon.h | 1
fs/ocfs2/dlm/dlmdebug.c | 55 ---
fs/ocfs2/dlm/dlmdebug.h | 16 -
fs/ocfs2/dlm/dlmdomain.c | 7
fs/ocfs2/dlm/dlmunlock.c | 23 +
fs/ocfs2/dlmglue.c | 29 -
fs/ocfs2/extent_map.c | 3
fs/ocfs2/file.c | 13
fs/ocfs2/inode.c | 2
fs/ocfs2/journal.h | 42 --
fs/ocfs2/namei.c | 2
fs/ocfs2/ocfs2.h | 3
fs/ocfs2/super.c | 10
fs/open.c | 8
fs/proc/meminfo.c | 8
fs/proc/task_mmu.c | 6
include/asm-generic/pgalloc.h | 5
include/asm-generic/pgtable.h | 7
include/linux/compaction.h | 22 +
include/linux/fs.h | 32 ++
include/linux/huge_mm.h | 9
include/linux/hugetlb.h | 2
include/linux/jbd2.h | 2
include/linux/khugepaged.h | 12
include/linux/memcontrol.h | 23 -
include/linux/memory.h | 7
include/linux/memory_hotplug.h | 1
include/linux/mm.h | 37 ++
include/linux/mm_types.h | 1
include/linux/mmzone.h | 14
include/linux/page_ext.h | 1
include/linux/pagemap.h | 10
include/linux/quicklist.h | 94 ------
include/linux/shrinker.h | 7
include/linux/slab.h | 62 ----
include/linux/vmalloc.h | 20 -
include/linux/zpool.h | 3
init/main.c | 6
kernel/events/uprobes.c | 81 ++++-
kernel/resource.c | 4
kernel/sched/idle.c | 1
kernel/sysctl.c | 6
lib/Kconfig.debug | 15
lib/Kconfig.kasan | 8
lib/iov_iter.c | 2
lib/show_mem.c | 5
lib/test_kasan.c | 41 ++
mm/Kconfig | 16 -
mm/Kconfig.debug | 4
mm/Makefile | 4
mm/compaction.c | 50 +--
mm/filemap.c | 168 ++++------
mm/gup.c | 125 +++-----
mm/huge_memory.c | 129 ++++++--
mm/hugetlb.c | 89 +++++
mm/hugetlb_cgroup.c | 2
mm/init-mm.c | 2
mm/kasan/common.c | 32 +-
mm/kasan/kasan.h | 14
mm/kasan/report.c | 44 ++
mm/kasan/tags_report.c | 24 +
mm/khugepaged.c | 372 ++++++++++++++++++++----
mm/kmemleak.c | 338 +++++----------------
mm/ksm.c | 18 -
mm/madvise.c | 52 +--
mm/memcontrol.c | 188 ++++++++++--
mm/memfd.c | 2
mm/memory.c | 21 +
mm/memory_hotplug.c | 120 ++++---
mm/mempolicy.c | 4
mm/memremap.c | 5
mm/migrate.c | 13
mm/mmap.c | 12
mm/mmu_gather.c | 2
mm/nommu.c | 2
mm/oom_kill.c | 30 +
mm/page_alloc.c | 27 +
mm/page_owner.c | 127 +++++---
mm/page_poison.c | 2
mm/page_vma_mapped.c | 3
mm/quicklist.c | 103 ------
mm/rmap.c | 25 -
mm/shmem.c | 12
mm/slab.h | 64 ++++
mm/slab_common.c | 37 ++
mm/slob.c | 2
mm/slub.c | 22 -
mm/sparse.c | 25 +
mm/swap.c | 16 -
mm/swap_state.c | 6
mm/util.c | 126 +++++++-
mm/vmalloc.c | 84 +++--
mm/vmscan.c | 163 ++++------
mm/vmstat.c | 2
mm/z3fold.c | 154 ++-------
mm/zpool.c | 16 +
mm/zsmalloc.c | 23 -
mm/zswap.c | 15
net/xdp/xdp_umem.c | 9
net/xdp/xsk.c | 2
usr/Makefile | 3
206 files changed, 2385 insertions(+), 2533 deletions(-)
* incoming
@ 2019-08-30 23:04 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2019-08-30 23:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
7 fixes, based on 846d2db3e00048da3f650e0cfb0b8d67669cec3e:
Roman Gushchin <guro@fb.com>:
mm: memcontrol: flush percpu slab vmstats on kmem offlining
Andrew Morton <akpm@linux-foundation.org>:
mm/zsmalloc.c: fix build when CONFIG_COMPACTION=n
Roman Gushchin <guro@fb.com>:
mm, memcg: partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"
"Gustavo A. R. Silva" <gustavo@embeddedor.com>:
mm/z3fold.c: fix lock/unlock imbalance in z3fold_page_isolate
Dmitry Safonov <dima@arista.com>:
mailmap: add aliases for Dmitry Safonov
Michal Hocko <mhocko@suse.com>:
mm, memcg: do not set reclaim_state on soft limit reclaim
Shakeel Butt <shakeelb@google.com>:
mm: memcontrol: fix percpu vmstats and vmevents flush
.mailmap | 3 ++
include/linux/mmzone.h | 5 ++--
mm/memcontrol.c | 53 ++++++++++++++++++++++++++++++++-----------------
mm/vmscan.c | 5 ++--
mm/z3fold.c | 1
mm/zsmalloc.c | 2 +
6 files changed, 47 insertions(+), 22 deletions(-)
* incoming
@ 2019-08-25 0:54 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2019-08-25 0:54 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm
11 fixes, based on 361469211f876e67d7ca3d3d29e6d1c3e313d0f1:
Henry Burns <henryburns@google.com>:
mm/z3fold.c: fix race between migration and destruction
David Rientjes <rientjes@google.com>:
mm, page_alloc: move_freepages should not examine struct page of reserved memory
Qian Cai <cai@lca.pw>:
parisc: fix compilation errrors
Roman Gushchin <guro@fb.com>:
mm: memcontrol: flush percpu vmstats before releasing memcg
mm: memcontrol: flush percpu vmevents before releasing memcg
Jason Xing <kerneljasonxing@linux.alibaba.com>:
psi: get poll_work to run when calling poll syscall next time
Oleg Nesterov <oleg@redhat.com>:
userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
Vlastimil Babka <vbabka@suse.cz>:
mm, page_owner: handle THP splits correctly
Henry Burns <henryburns@google.com>:
mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely
mm/zsmalloc.c: fix race condition in zs_destroy_pool
Andrey Ryabinin <aryabinin@virtuozzo.com>:
mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y
* Re: incoming
[not found] <20190718155613.546f9056bbb57f486ab64307@linux-foundation.org>
@ 2019-07-19 10:42 ` Vlastimil Babka
0 siblings, 0 replies; 786+ messages in thread
From: Vlastimil Babka @ 2019-07-19 10:42 UTC (permalink / raw)
To: linux-kernel, Linus Torvalds, Andrew Morton
On 7/19/19 12:56 AM, Andrew Morton wrote:
>
> The rest of MM and a kernel-wide procfs cleanup.
>
>
>
> Summary of the more significant patches:
Thanks for that!
Perhaps now it would be nice if this went also to linux-mm and lkml, as
mm-commits is sort of hidden.
Vlastimil
* Re: incoming
2019-07-17 16:13 ` incoming Linus Torvalds
(?)
(?)
@ 2019-07-17 18:13 ` Vlastimil Babka
-1 siblings, 0 replies; 786+ messages in thread
From: Vlastimil Babka @ 2019-07-17 18:13 UTC (permalink / raw)
To: Linus Torvalds
Cc: Linux List Kernel Mailing, linux-mm, Jonathan Corbet, Thorsten Leemhuis
On 7/17/19 6:13 PM, Linus Torvalds wrote:
> On Wed, Jul 17, 2019 at 1:47 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> So I've tried now to provide an example of what I had in mind, below.
>
> I'll take it as a trial. I added one-line notes about coda and the
> PTRACE_GET_SYSCALL_INFO interface too.
Thanks.
> I do hope that eventually I'll just get pull requests,
Very much agree, that was also discussed at length in the LSF/MM mm
process session I've linked.
> and they'll
> have more of a "theme" than this all (*)
I'll check if the first patch bomb would be more amenable to that, as I
plan to fill in the mm part for 5.3 on LinuxChanges wiki, but for a
merge commit it's too late.
> Linus
>
> (*) Although in many ways, the theme for Andrew is "falls through the
> cracks otherwise" so I'm not really complaining. This has been working
> for years and years.
Never mind the misc stuff that much, but I think mm itself is more
important and deserves what other subsystems have.
* Re: incoming
2019-07-17 16:13 ` incoming Linus Torvalds
(?)
@ 2019-07-17 17:09 ` Christian Brauner
-1 siblings, 0 replies; 786+ messages in thread
From: Christian Brauner @ 2019-07-17 17:09 UTC (permalink / raw)
To: Linus Torvalds
Cc: Vlastimil Babka, Linux List Kernel Mailing, linux-mm,
Jonathan Corbet, Thorsten Leemhuis
On Wed, Jul 17, 2019 at 09:13:26AM -0700, Linus Torvalds wrote:
> On Wed, Jul 17, 2019 at 1:47 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > So I've tried now to provide an example of what I had in mind, below.
>
> I'll take it as a trial. I added one-line notes about coda and the
> PTRACE_GET_SYSCALL_INFO interface too.
>
> I do hope that eventually I'll just get pull requests, and they'll
> have more of a "theme" than this all (*)
>
> Linus
>
> (*) Although in many ways, the theme for Andrew is "falls through the
> cracks otherwise" so I'm not really complaining. This has been working
> for years and years.
I put all the pid{fd}/clone{3} work, which is mostly related to pid.c,
exit.c, and fork.c, into my tree and try to give it a consistent theme for
the PRs I send. At least from my perspective, that worked and was pretty
easy to coordinate with Andrew. That should hopefully make it a little
easier to theme the -mm tree overall going forward.
* Re: incoming
2019-07-17 8:47 ` incoming Vlastimil Babka
@ 2019-07-17 16:13 ` Linus Torvalds
2019-07-17 16:13 ` incoming Linus Torvalds
1 sibling, 0 replies; 786+ messages in thread
From: Linus Torvalds @ 2019-07-17 16:13 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Linux List Kernel Mailing, linux-mm, Jonathan Corbet, Thorsten Leemhuis
On Wed, Jul 17, 2019 at 1:47 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> So I've tried now to provide an example of what I had in mind, below.
I'll take it as a trial. I added one-line notes about coda and the
PTRACE_GET_SYSCALL_INFO interface too.
I do hope that eventually I'll just get pull requests, and they'll
have more of a "theme" than this all (*)
Linus
(*) Although in many ways, the theme for Andrew is "falls through the
cracks otherwise" so I'm not really complaining. This has been working
for years and years.
* Re: incoming
2019-07-17 8:47 ` incoming Vlastimil Babka
@ 2019-07-17 8:57 ` Bhaskar Chowdhury
2019-07-17 16:13 ` incoming Linus Torvalds
1 sibling, 0 replies; 786+ messages in thread
From: Bhaskar Chowdhury @ 2019-07-17 8:57 UTC (permalink / raw)
To: Vlastimil Babka
Cc: linux-kernel, Linus Torvalds, linux-mm, Jonathan Corbet,
Thorsten Leemhuis
Cool !!
On 10:47 Wed 17 Jul , Vlastimil Babka wrote:
>On 7/17/19 1:25 AM, Andrew Morton wrote:
>>
>> Most of the rest of MM and just about all of the rest of everything
>> else.
>
>Hi,
>
>as I've mentioned at LSF/MM [1], I think it would be nice if mm pull
>requests had summaries similar to other subsystems. I see they are now
>more structured (thanks!), but they are now probably hitting the limit
>of what scripting can do to produce a high-level summary for human
>readers (unless patch authors themselves provide a blurb that can be
>extracted later?).
>
>So I've tried now to provide an example of what I had in mind, below. Maybe
>it's too concise - if there were "larger" features in this pull request,
>they would probably benefit from more details. I'm CCing the known (to
>me) consumers of these mails to judge :) Note I've only covered mm, and
>core stuff that I think will be interesting to a wide audience (change in
>LIST_POISON2 value? I'm sure as hell glad to know about that one :)
>
>Feel free to include this in the merge commit, if you find it useful.
>
>Thanks,
>Vlastimil
>
>[1] https://lwn.net/Articles/787705/
>
>-----
>
>- z3fold fixes and enhancements by Henry Burns and Vitaly Wool
>- more accurate reclaimed slab caches calculations by Yafang Shao
>- fix MAP_UNINITIALIZED UAPI symbol to not depend on config, by
>Christoph Hellwig
>- !CONFIG_MMU fixes by Christoph Hellwig
>- new novmcoredd parameter to omit device dumps from vmcore, by Kairui Song
>- new test_meminit module for testing heap and pagealloc initialization,
>by Alexander Potapenko
>- ioremap improvements for huge mappings, by Anshuman Khandual
>- generalize kprobe page fault handling, by Anshuman Khandual
>- device-dax hotplug fixes and improvements, by Pavel Tatashin
>- enable synchronous DAX fault on powerpc, by Aneesh Kumar K.V
>- add pte_devmap() support for arm64, by Robin Murphy
>- unify locked_vm accounting with a helper, by Daniel Jordan
>- several misc fixes
>
>core/lib
>- new typeof_member() macro including some users, by Alexey Dobriyan
>- make BIT() and GENMASK() available in asm, by Masahiro Yamada
>- changed LIST_POISON2 on x86_64 to 0xdead000000000122 for better code
>generation, by Alexey Dobriyan
>- rbtree code size optimizations, by Michel Lespinasse
>- convert struct pid count to refcount_t, by Joel Fernandes
>
>get_maintainer.pl
>- add --no-moderated switch to skip moderated ML's, by Joe Perches
>
>
* Re: incoming
[not found] <20190716162536.bb52b8f34a8ecf5331a86a42@linux-foundation.org>
@ 2019-07-17 8:47 ` Vlastimil Babka
2019-07-17 8:57 ` incoming Bhaskar Chowdhury
2019-07-17 16:13 ` incoming Linus Torvalds
0 siblings, 2 replies; 786+ messages in thread
From: Vlastimil Babka @ 2019-07-17 8:47 UTC (permalink / raw)
To: linux-kernel, Linus Torvalds
Cc: linux-mm, Jonathan Corbet, Thorsten Leemhuis, LKML
On 7/17/19 1:25 AM, Andrew Morton wrote:
>
> Most of the rest of MM and just about all of the rest of everything
> else.
Hi,
as I've mentioned at LSF/MM [1], I think it would be nice if mm pull
requests had summaries similar to other subsystems. I see they are now
more structured (thanks!), but they are now probably hitting the limit
of what scripting can do to produce a high-level summary for human
readers (unless patch authors themselves provide a blurb that can be
extracted later?).
So I've tried now to provide an example of what I had in mind, below. Maybe
it's too concise - if there were "larger" features in this pull request,
they would probably benefit from more details. I'm CCing the known (to
me) consumers of these mails to judge :) Note I've only covered mm, and
core stuff that I think will be interesting to a wide audience (change in
LIST_POISON2 value? I'm sure as hell glad to know about that one :)
Feel free to include this in the merge commit, if you find it useful.
Thanks,
Vlastimil
[1] https://lwn.net/Articles/787705/
-----
- z3fold fixes and enhancements by Henry Burns and Vitaly Wool
- more accurate reclaimed slab caches calculations by Yafang Shao
- fix MAP_UNINITIALIZED UAPI symbol to not depend on config, by
Christoph Hellwig
- !CONFIG_MMU fixes by Christoph Hellwig
- new novmcoredd parameter to omit device dumps from vmcore, by Kairui Song
- new test_meminit module for testing heap and pagealloc initialization,
by Alexander Potapenko
- ioremap improvements for huge mappings, by Anshuman Khandual
- generalize kprobe page fault handling, by Anshuman Khandual
- device-dax hotplug fixes and improvements, by Pavel Tatashin
- enable synchronous DAX fault on powerpc, by Aneesh Kumar K.V
- add pte_devmap() support for arm64, by Robin Murphy
- unify locked_vm accounting with a helper, by Daniel Jordan
- several misc fixes
core/lib
- new typeof_member() macro including some users, by Alexey Dobriyan
- make BIT() and GENMASK() available in asm, by Masahiro Yamada
- changed LIST_POISON2 on x86_64 to 0xdead000000000122 for better code
generation, by Alexey Dobriyan
- rbtree code size optimizations, by Michel Lespinasse
- convert struct pid count to refcount_t, by Joel Fernandes
get_maintainer.pl
- add --no-moderated switch to skip moderated ML's, by Joe Perches
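For readers unfamiliar with the core/lib items in the summary above, here is a rough userspace sketch of what typeof_member(), BIT() and GENMASK() compute, plus the x86_64 LIST_POISON2 value quoted in the list. The macro bodies are written from memory in the usual kernel style and should be treated as an approximation rather than the merged code. struct sample, sample_name() and LIST_POISON2_X86_64 are hypothetical names used only for this illustration, and typeof is a GNU C extension.

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * 8)

/* Single-bit mask and contiguous bit-range mask, kernel style. */
#define BIT(nr) (1UL << (nr))
#define GENMASK(h, l) \
	(((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

/* Type of member m of type T, without needing an object (GNU C extension). */
#define typeof_member(T, m) __typeof__(((T *)0)->m)

/* Hypothetical structure, only here to exercise typeof_member(). */
struct sample {
	int id;
	const char *name;
};

/* The x86_64 value quoted in the summary; the constant name is hypothetical. */
#define LIST_POISON2_X86_64 0xdead000000000122ULL

/* The return type is derived from the member, so it follows any later change. */
static inline typeof_member(struct sample, name) sample_name(const struct sample *s)
{
	return s->name;
}
```

In this sketch GENMASK(7, 4) evaluates to 0xf0 and BIT(3) to 8, and sample_name() returns a const char * because that is the declared type of the name member.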
* incoming
@ 2018-02-06 23:34 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2018-02-06 23:34 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- kasan updates
- procfs
- lib/bitmap updates
- other lib/ updates
- checkpatch tweaks
- rapidio
- ubsan
- pipe fixes and cleanups
- lots of other misc bits
114 patches, based on e237f98a9c134c3d600353f21e07db915516875b:
Subject: kasan: don't emit builtin calls when sanitization is off
Subject: kasan: add compiler support for clang
Subject: kasan/Makefile: support LLVM style asan parameters
Subject: kasan: support alloca() poisoning
Subject: kasan: add tests for alloca poisoning
Subject: kasan: add functions for unpoisoning stack variables
Subject: kasan: detect invalid frees for large objects
Subject: kasan: don't use __builtin_return_address(1)
Subject: kasan: detect invalid frees for large mempool objects
Subject: kasan: unify code between kasan_slab_free() and kasan_poison_kfree()
Subject: kasan: detect invalid frees
Subject: kasan: fix prototype author email address
Subject: kasan: clean up KASAN_SHADOW_SCALE_SHIFT usage
Subject: kasan: remove redundant initialization of variable 'real_size'
Subject: proc: use %u for pid printing and slightly less stack
Subject: proc: don't use READ_ONCE/WRITE_ONCE for /proc/*/fail-nth
Subject: proc: fix /proc/*/map_files lookup
Subject: fs/proc/vmcore.c: simpler /proc/vmcore cleanup
Subject: proc: less memory for /proc/*/map_files readdir
Subject: fs/proc/array.c: delete children_seq_release()
Subject: fs/proc/kcore.c: use probe_kernel_read() instead of memcpy()
Subject: fs/proc/internal.h: rearrange struct proc_dir_entry
Subject: fs/proc/internal.h: fix up comment
Subject: fs/proc: use __ro_after_init
Subject: proc: spread likely/unlikely a bit
Subject: proc: rearrange args
Subject: fs/proc/consoles.c: use seq_putc() in show_console_dev()
Subject: Makefile: move stack-protector compiler breakage test earlier
Subject: Makefile: move stack-protector availability out of Kconfig
Subject: Makefile: introduce CONFIG_CC_STACKPROTECTOR_AUTO
Subject: uuid: cleanup <uapi/linux/uuid.h>
Subject: tools/lib/subcmd/pager.c: do not alias select() params
Subject: kernel/async.c: revert "async: simplify lowest_in_progress()"
Subject: MAINTAINERS: update sboyd's email address
Subject: bitmap: new bitmap_copy_safe and bitmap_{from,to}_arr32
Subject: bitmap: replace bitmap_{from,to}_u32array
Subject: lib/test_bitmap.c: add bitmap_zero()/bitmap_clear() test cases
Subject: lib/test_bitmap.c: add bitmap_fill()/bitmap_set() test cases
Subject: lib/test_bitmap.c: clean up test_zero_fill_copy() test case and rename
Subject: include/linux/bitmap.h: make bitmap_fill() and bitmap_zero() consistent
Subject: lib/stackdepot.c: use a non-instrumented version of memcmp()
Subject: lib/test_find_bit.c: rename to find_bit_benchmark.c
Subject: lib/find_bit_benchmark.c: improvements
Subject: lib: optimize cpumask_next_and()
Subject: lib/: make RUNTIME_TESTS a menuconfig to ease disabling it all
Subject: lib/test_sort.c: add module unload support
Subject: checkpatch: allow long lines containing URL
Subject: checkpatch: ignore some octal permissions of 0
Subject: checkpatch: improve quoted string and line continuation test
Subject: checkpatch: add a few DEVICE_ATTR style tests
Subject: checkpatch: improve the TABSTOP test to include declarations
Subject: checkpatch: exclude drivers/staging from if with unnecessary parentheses test
Subject: checkpatch: avoid some false positives for TABSTOP declaration test
Subject: checkpatch: improve OPEN_BRACE test
Subject: elf: fix NT_FILE integer overflow
Subject: kallsyms: let print_ip_sym() print raw addresses
Subject: nilfs2: use time64_t internally
Subject: hfsplus: honor setgid flag on directories
Subject: <asm-generic/siginfo.h>: fix language in comments
Subject: kernel/fork.c: check error and return early
Subject: kernel/fork.c: add comment about usage of CLONE_FS flags and namespaces
Subject: cpumask: make cpumask_size() return "unsigned int"
Subject: rapidio: delete an error message for a failed memory allocation in rio_init_mports()
Subject: rapidio: adjust 12 checks for null pointers
Subject: rapidio: adjust five function calls together with a variable assignment
Subject: rapidio: improve a size determination in five functions
Subject: rapidio: delete an unnecessary variable initialisation in three functions
Subject: rapidio: return an error code only as a constant in two functions
Subject: rapidio: move 12 EXPORT_SYMBOL_GPL() calls to function implementations
Subject: drivers/rapidio/devices/tsi721_dma.c: delete an error message for a failed memory allocation in tsi721_alloc_chan_resources()
Subject: drivers/rapidio/devices/tsi721_dma.c: delete an unnecessary variable initialisation in tsi721_alloc_chan_resources()
Subject: drivers/rapidio/devices/tsi721_dma.c: adjust six checks for null pointers
Subject: pids: introduce find_get_task_by_vpid() helper
Subject: pps: parport: use timespec64 instead of timespec
Subject: kernel/relay.c: revert "kernel/relay.c: fix potential memory leak"
Subject: kcov: detect double association with a single task
Subject: include/linux/genl_magic_func.h: remove own BUILD_BUG_ON*() defines
Subject: build_bug.h: remove BUILD_BUG_ON_NULL()
Subject: lib/ubsan.c: s/missaligned/misaligned/
Subject: lib/ubsan: add type mismatch handler for new GCC/Clang
Subject: lib/ubsan: remove returns-nonnull-attribute checks
Subject: ipc: fix ipc data structures inconsistency
Subject: ipc/mqueue.c: have RT tasks queue in by priority in wq_add()
Subject: arch/score/kernel/setup.c: combine two seq_printf() calls into one call in show_cpuinfo()
Subject: vfs: remove might_sleep() from clear_inode()
Subject: mm/userfaultfd.c: remove duplicate include
Subject: mm: remove unneeded kallsyms include
Subject: hrtimer: remove unneeded kallsyms include
Subject: genirq: remove unneeded kallsyms include
Subject: mm/memblock: memblock_is_map/region_memory can be boolean
Subject: lib/lockref: __lockref_is_dead can be boolean
Subject: kernel/cpuset: current_cpuset_is_being_rebound can be boolean
Subject: kernel/resource: iomem_is_exclusive can be boolean
Subject: kernel/module: module_is_live can be boolean
Subject: kernel/mutex: mutex_is_locked can be boolean
Subject: crash_dump: is_kdump_kernel can be boolean
Subject: kasan: rework Kconfig settings
Subject: pipe, sysctl: drop 'min' parameter from pipe-max-size converter
Subject: pipe, sysctl: remove pipe_proc_fn()
Subject: pipe: actually allow root to exceed the pipe buffer limits
Subject: pipe: fix off-by-one error when checking buffer limits
Subject: pipe: reject F_SETPIPE_SZ with size over UINT_MAX
Subject: pipe: simplify round_pipe_size()
Subject: pipe: read buffer limits atomically
Subject: mm: docs: fixup punctuation
Subject: mm: docs: fix parameter names mismatch
Subject: mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors
Subject: MAINTAINERS: remove ANDROID ION pattern
Subject: MAINTAINERS: remove ARM/CLKDEV SUPPORT file pattern
Subject: MAINTAINERS: update Cortina/Gemini patterns
Subject: MAINTAINERS: update "ARM/OXNAS platform support" patterns
Subject: MAINTAINERS: update various PALM patterns
Subject: MAINTAINERS: update ARM/QUALCOMM SUPPORT patterns
Subject: Documentation/sysctl/user.txt: fix typo
* Re: incoming
2018-02-01 0:13 incoming Andrew Morton
@ 2018-02-01 0:25 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2018-02-01 0:25 UTC (permalink / raw)
To: Linus Torvalds, mm-commits
And... [002/119] seems to have just disappeared. It was a standalone thing;
I'll resend next time.
* incoming
@ 2018-02-01 0:13 Andrew Morton
2018-02-01 0:25 ` incoming Andrew Morton
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2018-02-01 0:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- misc fixes
- ocfs2 updates
- most of MM
119 patches, based on 7b1cd95d65eb3b1e13f8a90eb757e0ea232c7899:
Subject: fs/dax.c: release PMD lock even when there is no PMD support in DAX
Subject: tools: fix cross-compile var clobbering
Subject: scripts/decodecode: make it take multiline Code line
Subject: scripts/tags.sh: change find_other_sources() for include directories
Subject: m32r: remove abort()
Subject: fs/ocfs2/dlm/dlmmaster.c: clean up dead code
Subject: ocfs2/cluster: neaten a member of o2net_msg_handler
Subject: ocfs2: give an obvious tip for mismatched cluster names
Subject: ocfs2/cluster: close a race that fence can't be triggered
Subject: ocfs2: use the OCFS2_XATTR_ROOT_SIZE macro in ocfs2_reflink_xattr_header()
Subject: ocfs2: clean dead code in suballoc.c
Subject: ocfs2: return -EROFS to mount.ocfs2 if inode block is invalid
Subject: ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE
Subject: ocfs2/xattr: assign errno to 'ret' in ocfs2_calc_xattr_init()
Subject: ocfs2: clean up dead code in alloc.c
Subject: ocfs2/acl: use 'ip_xattr_sem' to protect getting extended attribute
Subject: ocfs2: make metadata estimation accurate and clear
Subject: ocfs2: try to reuse extent block in dealloc without meta_alloc
Subject: ocfs2: add trimfs dlm lock resource
Subject: ocfs2: add trimfs lock to avoid duplicated trims in cluster
Subject: ocfs2: add ocfs2_try_rw_lock() and ocfs2_try_inode_lock()
Subject: ocfs2: add ocfs2_overwrite_io()
Subject: ocfs2: nowait aio support
Subject: ocfs2: unlock bh_state if bg check fails
Subject: ocfs2: return error when we attempt to access a dirty bh in jbd2
Subject: mm/slab_common.c: make calculate_alignment() static
Subject: mm/slab.c: remove redundant assignments for slab_state
Subject: mm/slub.c: fix wrong address during slab padding restoration
Subject: slub: remove obsolete comments of put_cpu_partial()
Subject: include/linux/sched/mm.h: uninline mmdrop_async(), etc
Subject: mm: kmemleak: remove unused hardirq.h
Subject: zswap: same-filled pages handling
Subject: mm: relax deferred struct page requirements
Subject: mm/mempolicy: remove redundant check in get_nodes
Subject: mm/mempolicy: fix the check of nodemask from user
Subject: mm/mempolicy: add nodes_empty check in SYSC_migrate_pages
Subject: mm: drop hotplug lock from lru_add_drain_all()
Subject: mm: show total hugetlb memory consumption in /proc/meminfo
Subject: mm: use sc->priority for slab shrink targets
Subject: mm: split deferred_init_range into initializing and freeing parts
Subject: mm/filemap.c: remove include of hardirq.h
Subject: mm: memcontrol: eliminate raw access to stat and event counters
Subject: mm: memcontrol: implement lruvec stat functions on top of each other
Subject: mm: memcontrol: fix excessive complexity in memory.stat reporting
Subject: mm/page_owner.c: use PTR_ERR_OR_ZERO()
Subject: mm/page_alloc.c: fix comment in __get_free_pages()
Subject: mm: do not stall register_shrinker()
Subject: selftests/vm: move 128TB mmap boundary test to generic directory
Subject: mm/interval_tree.c: use vma_pages() helper
Subject: mm: remove unused pgdat_reclaimable_pages()
Subject: mm, hugetlb: remove hugepages_treat_as_movable sysctl
Subject: mm/memory_hotplug.c: remove unnecesary check from register_page_bootmem_info_section()
Subject: mm: update comment describing tlb_gather_mmu
Subject: fs/proc/task_mmu.c: do not show VmExe bigger than total executable virtual memory
Subject: mm: memory_hotplug: remove second __nr_to_section in register_page_bootmem_info_section()
Subject: mm/huge_memory.c: fix comment in __split_huge_pmd_locked
Subject: mm, userfaultfd, THP: avoid waiting when PMD under THP migration
Subject: mm: add unmap_mapping_pages()
Subject: mm: get 7% more pages in a pagevec
Subject: asm-generic: provide generic_pmdp_establish()
Subject: arc: use generic_pmdp_establish as pmdp_establish
Subject: arm/mm: provide pmdp_establish() helper
Subject: arm64: provide pmdp_establish() helper
Subject: mips: use generic_pmdp_establish as pmdp_establish
Subject: powerpc/mm: update pmdp_invalidate to return old pmd value
Subject: s390/mm: modify pmdp_invalidate to return old value.
Subject: sparc64: update pmdp_invalidate() to return old pmd value
Subject: x86/mm: provide pmdp_establish() helper
Subject: mm: do not lose dirty and accessed bits in pmdp_invalidate()
Subject: mm: use updated pmdp_invalidate() interface to track dirty/accessed bits
Subject: mm/thp: remove pmd_huge_split_prepare()
Subject: mm: thp: use down_read_trylock() in khugepaged to avoid long block
Subject: mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks
Subject: mm, oom: avoid reaping only for mm's with blockable invalidate callbacks
Subject: mm/zsmalloc: simplify shrinker init/destroy
Subject: mm: align struct page more aesthetically
Subject: mm: de-indent struct page
Subject: mm: remove misleading alignment claims
Subject: mm: improve comment on page->mapping
Subject: mm: introduce _slub_counter_t
Subject: mm: store compound_dtor / compound_order as bytes
Subject: mm: document how to use struct page
Subject: mm: remove reference to PG_buddy
Subject: shmem: unexport shmem_add_seals()/shmem_get_seals()
Subject: shmem: rename functions that are memfd-related
Subject: hugetlb: expose hugetlbfs_inode_info in header
Subject: hugetlb: implement memfd sealing
Subject: shmem: add sealing support to hugetlb-backed memfd
Subject: memfd-test: test hugetlbfs sealing
Subject: memfd-test: add 'memfd-hugetlb:' prefix when testing hugetlbfs
Subject: memfd-test: move common code to a shared unit
Subject: memfd-test: run fuse test on hugetlb backend memory
Subject: userfaultfd: convert to use anon_inode_getfd()
Subject: mm: pin address_space before dereferencing it while isolating an LRU page
Subject: mm/fadvise: discard partial page if endbyte is also EOF
Subject: zswap: only save zswap header when necessary
Subject: memcg: refactor mem_cgroup_resize_limit()
Subject: mm/page_alloc.c: fix typos in comments
Subject: mm/page_owner.c: clean up init_pages_in_zone()
Subject: zsmalloc: use U suffix for negative literals being shifted
Subject: mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it
Subject: mm/compaction.c: fix comment for try_to_compact_pages()
Subject: include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer
Subject: mm/hmm: fix uninitialized use of 'entry' in hmm_vma_walk_pmd()
Subject: mm/memcontrol.c: make local symbol static
Subject: mm/memcontrol.c: try harder to decrease [memory,memsw].limit_in_bytes
Subject: mm, hugetlb: unify core page allocation accounting and initialization
Subject: mm, hugetlb: integrate giga hugetlb more naturally to the allocation path
Subject: mm, hugetlb: do not rely on overcommit limit during migration
Subject: mm, hugetlb: get rid of surplus page accounting tricks
Subject: mm, hugetlb: further simplify hugetlb allocation API
Subject: hugetlb, mempolicy: fix the mbind hugetlb migration
Subject: hugetlb, mbind: fall back to default policy if vma is NULL
Subject: mm: numa: do not trap faults on shared data section pages.
Subject: mm: correct comments regarding do_fault_around()
Subject: mm, memory_hotplug: fix memmap initialization
Subject: mm/swap.c: make functions and their kernel-doc agree
Subject: tools, vm: new option to specify kpageflags file
Subject: mm: remove PG_highmem description
* incoming
@ 2018-01-19 0:33 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2018-01-19 0:33 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
6 fixes, based on dda3e15231b35840fe6f0973f803cc70ddb86281:
Subject: mm/memory.c: release locked page in do_swap_page()
Subject: mm/page_owner.c: remove drain_all_pages from init_early_allocated_pages
Subject: scripts/decodecode: fix decoding for AArch64 (arm64) instructions
Subject: scripts/gdb/linux/tasks.py: fix get_thread_info
Subject: proc: fix coredump vs read /proc/*/stat race
Subject: sparse doesn't support struct randomization
* incoming
@ 2018-01-13 0:52 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2018-01-13 0:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
4 fixes, based on 1545dec46db3858bbce84c2065b579e2925706ab:
Subject: MAINTAINERS, nilfs2: change project home URLs
Subject: kmemleak: allow to coexist with fault injection
Subject: kdump: write correct address of mem_section into vmcoreinfo
Subject: tools/objtool/Makefile: don't assume sync-check.sh is executable
* incoming
@ 2018-01-05 0:17 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2018-01-05 0:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
9 fixes, based on e1915c8195b38393005be9b74bfa6a3a367c83b3:
Subject: mm: check pfn_valid first in zero_resv_unavail
Subject: kernel/acct.c: fix the acct->needcheck check in check_free_space()
Subject: mm/mprotect: add a cond_resched() inside change_pmd_range()
Subject: kernel/exit.c: export abort() to modules
Subject: mm/debug.c: provide useful debugging information for VM_BUG
Subject: mm/zsmalloc.c: include fs.h
Subject: mm/sparse.c: wrong allocation for mem_section
Subject: userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails
Subject: mailmap: update Mark Yao's email address
* incoming
@ 2017-12-14 23:32 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-12-14 23:32 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
17 fixes, based on 7c5cac1bc7170bfc726a69eb64947c55658d16ad:
Subject: include/linux/idr.h: add #include <linux/bug.h>
Subject: lib/rbtree,drm/mm: add rbtree_replace_node_cached()
Subject: mm/kmemleak.c: make cond_resched() rate-limiting more efficient
Subject: string.h: workaround for increased stack usage
Subject: autofs: fix careless error in recent commit
Subject: exec: avoid gcc-8 warning for get_task_comm
Subject: Documentation/vm/zswap.txt: update with same-value filled page feature
Subject: scripts/faddr2line: fix CROSS_COMPILE unset error
Subject: mm/memory.c: mark wp_huge_pmd() inline to prevent build failure
Subject: mm/page_alloc.c: avoid excessive IRQ disabled times in free_unref_page_list()
Subject: mm/slab.c: do not hash pointers when debugging slab
Subject: kcov: fix comparison callback signature
Subject: tools/slabinfo-gnuplot: force to use bash shell
Subject: mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()'
Subject: kernel: make groups_sort calling a responsibility group_info allocators
Subject: mm, oom_reaper: fix memory corruption
Subject: arch: define weak abort()
* incoming
@ 2017-11-30 0:09 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-11-30 0:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
28 fixes, based on 43570f0383d6d5879ae585e6c3cf027ba321546f:
Subject: mm, memory_hotplug: do not back off draining pcp free pages from kworker context
Subject: mm, oom_reaper: gather each vma to prevent leaking TLB entry
Subject: mm/cma: fix alloc_contig_range ret code/potential leak
Subject: mm: fix device-dax pud write-faults triggered by get_user_pages()
Subject: mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITE
Subject: mm: replace pud_write with pud_access_permitted in fault + gup paths
Subject: mm: replace pmd_write with pmd_access_permitted in fault + gup paths
Subject: mm: replace pte_write with pte_access_permitted in fault + gup paths
Subject: scripts/faddr2line: extend usage on generic arch
Subject: mm, hugetlbfs: introduce ->split() to vm_operations_struct
Subject: device-dax: implement ->split() to catch invalid munmap attempts
Subject: mm: introduce get_user_pages_longterm
Subject: mm: fail get_vaddr_frames() for filesystem-dax mappings
Subject: v4l2: disable filesystem-dax mapping support
Subject: IB/core: disable memory registration of filesystem-dax vmas
Subject: exec: avoid RLIMIT_STACK races with prlimit()
Subject: mm/madvise.c: fix madvise() infinite loop under special circumstances
Subject: Revert "mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical"
Subject: fs/mbcache.c: make count_objects() more robust
Subject: scripts/bloat-o-meter: don't fail with division by 0
Subject: kmemleak: add scheduling point to kmemleak_scan()
Subject: mm: migrate: fix an incorrect call of prep_transhuge_page()
Subject: mm, memcg: fix mem_cgroup_swapout() for THPs
Subject: fs/fat/inode.c: fix sb_rdonly() change
Subject: autofs: revert "autofs: take more care to not update last_used on path walk"
Subject: autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
Subject: mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine
Subject: fs/hugetlbfs/inode.c: change put_page/unlock_page order in hugetlbfs_fallocate()
* incoming
@ 2017-11-17 23:25 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-11-17 23:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- a bit more MM
- procfs updates
- dynamic-debug fixes
- lib/ updates
- checkpatch
- epoll
- nilfs2
- signals
- rapidio
- PID management cleanup and optimization
- kcov updates
- sysvipc updates
- quite a few misc things all over the place
94 patches, based on a3841f94c7ecb3ede0f888d3fcfe8fb6368ddd7a:
Subject: mm: fix nodemask printing
Subject: mm/z3fold.c: use kref to prevent page free/compact race
Subject: lib/dma-debug.c: fix incorrect pfn calculation
Subject: mm: shmem: remove unused info variable
Subject: mm, compaction: kcompactd should not ignore pageblock skip
Subject: mm, compaction: persistently skip hugetlbfs pageblocks
Subject: mm, compaction: extend pageblock_skip_persistent() to all compound pages
Subject: mm, compaction: split off flag for not updating skip hints
Subject: mm, compaction: remove unneeded pageblock_skip_persistent() checks
Subject: proc, coredump: add CoreDumping flag to /proc/pid/status
Subject: proc: : uninline name_to_int()
Subject: proc: use do-while in name_to_int()
Subject: spelling.txt: add "unnecessary" typo variants
Subject: sh/boot: add static stack-protector to pre-kernel
Subject: kernel debug: support resetting WARN*_ONCE
Subject: kernel debug: support resetting WARN_ONCE for all architectures
Subject: parse-maintainers: add ability to specify filenames
Subject: iopoll: avoid -Wint-in-bool-context warning
Subject: lkdtm: include WARN format string
Subject: bug: define the "cut here" string in a single place
Subject: bug: fix "cut here" location for __WARN_TAINT architectures
Subject: include/linux/compiler-clang.h: handle randomizable anonymous structs
Subject: kernel/umh.c: optimize 'proc_cap_handler()'
Subject: dynamic-debug-howto: fix optional/omitted ending line number to be LARGE instead of 0
Subject: dynamic_debug documentation: minor fixes
Subject: get_maintainer: add --self-test for internal consistency tests
Subject: get_maintainer: add more --self-test options
Subject: include/linux/bitfield.h: include <linux/build_bug.h> instead of <linux/bug.h>
Subject: include/linux/radix-tree.h: remove unneeded #include <linux/bug.h>
Subject: lib: add module support to string tests
Subject: lib/test: delete five error messages for failed memory allocations
Subject: lib/int_sqrt: optimize small argument
Subject: lib/int_sqrt: optimize initial value compute
Subject: lib/int_sqrt: adjust comments
Subject: lib/genalloc.c: make the avail variable an atomic_long_t
Subject: lib/nmi_backtrace.c: fix kernel text address leak
Subject: tools/lib/traceevent/parse-filter.c: clean up clang build warning
Subject: lib/rbtree-test: lower default params
Subject: lib: test module for find_*_bit() functions
Subject: checkpatch: support function pointers for unnamed function definition arguments
Subject: scripts/checkpatch.pl: avoid false warning missing break
Subject: checkpatch: printks always need a KERN_<LEVEL>
Subject: checkpatch: allow DEFINE_PER_CPU definitions to exceed line length
Subject: checkpatch: add TP_printk to list of logging functions
Subject: checkpatch: add --strict test for lines ending in [ or (
Subject: checkpatch: do not check missing blank line before builtin_*_driver
Subject: epoll: account epitem and eppoll_entry to kmemcg
Subject: epoll: avoid calling ep_call_nested() from ep_poll_safewake()
Subject: epoll: remove ep_call_nested() from ep_eventpoll_poll()
Subject: init/version.c: include <linux/export.h> instead of <linux/module.h>
Subject: autofs: don't fail mount for transient error
Subject: pipe: match pipe_max_size data type with procfs
Subject: pipe: avoid round_pipe_size() nr_pages overflow on 32-bit
Subject: pipe: add proc_dopipe_max_size() to safely assign pipe_max_size
Subject: sysctl: check for UINT_MAX before unsigned int min/max
Subject: fs/nilfs2: convert timers to use timer_setup()
Subject: nilfs2: fix race condition that causes file system corruption
Subject: fs, nilfs: convert nilfs_root.count from atomic_t to refcount_t
Subject: nilfs2: align block comments of nilfs_sufile_truncate_range() at *
Subject: nilfs2: use octal for unreadable permission macro
Subject: nilfs2: remove inode->i_version initialization
Subject: hfs/hfsplus: clean up unused variables in bnode.c
Subject: fat: remove redundant assignment of 0 to slots
Subject: kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
Subject: kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
Subject: kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
Subject: kdump: print a message in case parse_crashkernel_mem resulted in zero bytes
Subject: rapidio: constify rio_device_id
Subject: drivers/rapidio/devices/rio_mport_cdev.c: fix resource leak in error handling path in 'rio_dma_transfer()'
Subject: drivers/rapidio/devices/rio_mport_cdev.c: fix error handling in 'rio_dma_transfer()'
Subject: Documentation/sysctl/vm.txt: fix typo
Subject: kernel/sysctl.c: code cleanups
Subject: pid: replace pid bitmap implementation with IDR API
Subject: pid: remove pidhash
Subject: kernel/panic.c: add TAINT_AUX
Subject: kcov: remove pointless current != NULL check
Subject: kcov: support comparison operands collection
Subject: Makefile: support flag -fsanitizer-coverage=trace-cmp
Subject: kcov: update documentation
Subject: kernel/reboot.c: add devm_register_reboot_notifier()
Subject: drivers/watchdog: make use of devm_register_reboot_notifier()
Subject: initramfs: use time64_t timestamps
Subject: sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE
Subject: sysvipc: duplicate lock comments wrt ipc_addid()
Subject: sysvipc: properly name ipc_addid() limit parameter
Subject: sysvipc: make get_maxid O(1) again
Subject: mm: add infrastructure for get_user_pages_fast() benchmarking
Subject: drivers/pcmcia/sa1111_badge4.c: avoid unused function warning
Subject: arch/ia64/include/asm/topology.h: remove unused parent_node() macro
Subject: arch/sh/include/asm/topology.h: remove unused parent_node() macro
Subject: arch/sparc/include/asm/topology_64.h: remove unused parent_node() macro
Subject: arch/tile/include/asm/topology.h: remove unused parent_node() macro
Subject: include/asm-generic/topology.h: remove unused parent_node() macro
Subject: EXPERT Kconfig menu: fix broken EXPERT menu
* incoming
@ 2017-11-16 1:29 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-11-16 1:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- a few misc bits
- ocfs2 updates
- almost all of MM
131 patches, based on c9b012e5f4a1d01dfa8abc6318211a67ba7d5db2:
Subject: bloat-o-meter: provide 3 different arguments for data, function and All
Subject: m32r: fix endianness constraints
Subject: ocfs2: remove unused declaration ocfs2_publish_get_mount_state()
Subject: ocfs2: no need flush workqueue before destroying it
Subject: ocfs2: cleanup unused func declaration and assignment
Subject: ocfs2: fix cluster hang after a node dies
Subject: ocfs2: clean up some unused function declarations
Subject: ocfs2: should wait dio before inode lock in ocfs2_setattr()
Subject: ocfs2: ip_alloc_sem should be taken in ocfs2_get_block()
Subject: ocfs2: subsystem.su_mutex is required while accessing the item->ci_parent
Subject: ocfs2/dlm: get mle inuse only when it is initialized
Subject: ocfs2: remove unneeded goto in ocfs2_reserve_cluster_bitmap_bits()
Subject: tools: slabinfo: add "-U" option to show unreclaimable slabs only
Subject: mm: slabinfo: remove CONFIG_SLABINFO
Subject: mm: oom: show unreclaimable slab info when unreclaimable slabs > user memory
Subject: mm/slob.c: remove an unnecessary check for __GFP_ZERO
Subject: mm/slab.c: only set __GFP_RECLAIMABLE once
Subject: slab, slub, slob: add slab_flags_t
Subject: slab, slub, slob: convert slab_flags_t to 32-bit
Subject: slub: fix sysfs duplicate filename creation when slub_debug=O
Subject: include/linux/slab.h: add kmalloc_array_node() and kcalloc_node()
Subject: block/blk-mq.c: use kmalloc_array_node()
Subject: drivers/infiniband/hw/qib/qib_init.c: use kmalloc_array_node()
Subject: drivers/infiniband/sw/rdmavt/qp.c: use kmalloc_array_node()
Subject: mm/mempool.c: use kmalloc_array_node()
Subject: net/rds/ib_fmr.c: use kmalloc_array_node()
Subject: mm: update comments for struct page.mapping
Subject: zram: set BDI_CAP_STABLE_WRITES once
Subject: bdi: introduce BDI_CAP_SYNCHRONOUS_IO
Subject: mm, swap: introduce SWP_SYNCHRONOUS_IO
Subject: mm, swap: skip swapcache for swapin of synchronous device
Subject: mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference
Subject: mm, swap: fix false error message in __swp_swapcount()
Subject: mm/page-writeback.c: remove unused parameter from balance_dirty_pages()
Subject: mm: drop migrate type checks from has_unmovable_pages
Subject: mm: distinguish CMA and MOVABLE isolation in has_unmovable_pages()
Subject: mm, page_alloc: fail has_unmovable_pages when seeing reserved pages
Subject: mm, memory_hotplug: do not fail offlining too early
Subject: mm, memory_hotplug: remove timeout from __offline_memory
Subject: mm/memblock.c: make the index explicit argument of for_each_memblock_type
Subject: mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical
Subject: zram: add zstd to the supported algorithms list
Subject: zram: remove zlib from the list of recommended algorithms
Subject: fs/hugetlbfs/inode.c: remove redundant -ENIVAL return from hugetlbfs_setattr()
Subject: mm/hmm: constify hmm_devmem_page_get_drvdata() parameter
Subject: zsmalloc: calling zs_map_object() from irq is a bug
Subject: mm/mmu_notifier: avoid double notification when it is useless
Subject: mm/mmu_notifier: avoid call to invalidate_range() in range_end()
Subject: mm: remove unused pgdat->inactive_ratio
Subject: mm/swap_slots.c: fix race conditions in swap_slots cache init
Subject: mm, arch: remove empty_bad_page*
Subject: mm/cma.c: change pr_info to pr_err for cma_alloc fail log
Subject: mm/page_owner.c: reduce page_owner structure size
Subject: mm: implement find_get_pages_range_tag()
Subject: btrfs: use pagevec_lookup_range_tag()
Subject: ceph: use pagevec_lookup_range_tag()
Subject: ext4: use pagevec_lookup_range_tag()
Subject: f2fs: use pagevec_lookup_range_tag()
Subject: f2fs: simplify page iteration loops
Subject: f2fs: use find_get_pages_tag() for looking up single page
Subject: gfs2: use pagevec_lookup_range_tag()
Subject: nilfs2: use pagevec_lookup_range_tag()
Subject: mm: use pagevec_lookup_range_tag() in __filemap_fdatawait_range()
Subject: mm: use pagevec_lookup_range_tag() in write_cache_pages()
Subject: mm: add variant of pagevec_lookup_range_tag() taking number of pages
Subject: ceph: use pagevec_lookup_range_nr_tag()
Subject: mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()
Subject: afs: use find_get_pages_range_tag()
Subject: cifs: use find_get_pages_range_tag()
Subject: kmemleak: change /sys/kernel/debug/kmemleak permissions from 0444 to 0644
Subject: mm: account pud page tables
Subject: mm: introduce wrappers to access mm->nr_ptes
Subject: mm: consolidate page table accounting
Subject: fs, mm: account filp cache to kmemcg
Subject: mm/rmap.c: remove redundant variable cend
Subject: kmemcheck: remove annotations
Subject: kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACK
Subject: kmemcheck: remove whats left of NOTRACK flags
Subject: kmemcheck: rip it out
Subject: mm/swap_state.c: declare a few variables as __read_mostly
Subject: mm: deferred_init_memmap improvements
Subject: x86/mm: set fields in deferred pages
Subject: sparc64/mm: set fields in deferred pages
Subject: sparc64: simplify vmemmap_populate
Subject: mm: define memblock_virt_alloc_try_nid_raw
Subject: mm: zero reserved and unavailable struct pages
Subject: x86/mm/kasan: don't use vmemmap_populate() to initialize shadow
Subject: arm64/mm/kasan: don't use vmemmap_populate() to initialize shadow
Subject: mm: stop zeroing memory during allocation in vmemmap
Subject: sparc64: optimize struct page zeroing
Subject: mm/page_alloc: make sure __rmqueue() etc are always inline
Subject: userfaultfd: use mmgrab instead of open-coded increment of mm_count
Subject: mm, soft_offline: improve hugepage soft offlining error log
Subject: mm/page-writeback.c: convert timers to use timer_setup()
Subject: drivers/block/zram/zram_drv.c: make zram_page_end_io() static
Subject: mm: speed up cancel_dirty_page() for clean pages
Subject: mm: refactor truncate_complete_page()
Subject: mm: factor out page cache page freeing into a separate function
Subject: mm: move accounting updates before page_cache_tree_delete()
Subject: mm: move clearing of page->mapping to page_cache_tree_delete()
Subject: mm: factor out checks and accounting from __delete_from_page_cache()
Subject: mm: batch radix tree operations when truncating pages
Subject: mm, page_alloc: enable/disable IRQs once when freeing a list of pages
Subject: mm, truncate: do not check mapping for every page being truncated
Subject: mm, truncate: remove all exceptional entries from pagevec under one lock
Subject: mm: only drain per-cpu pagevecs once per pagevec usage
Subject: mm, pagevec: remove cold parameter for pagevecs
Subject: mm: remove cold parameter for release_pages
Subject: mm: remove cold parameter from free_hot_cold_page*
Subject: mm: remove __GFP_COLD
Subject: mm, page_alloc: simplify list handling in rmqueue_bulk()
Subject: mm, pagevec: rename pagevec drained field
Subject: Unify migrate_pages and move_pages access checks
Subject: shmem: convert shmem_init_inodecache() to void
Subject: mm, sysctl: make NUMA stats configurable
Subject: mm: mlock: remove lru_add_drain_all()
Subject: mm, page_alloc: fix potential false positive in __zone_watermark_ok
Subject: fs: fuse: account fuse_inode slab memory as reclaimable
Subject: mm: don't warn about allocations which stall for too long
Subject: mm/page_alloc.c: broken deferred calculation
Subject: mm/shmem.c: mark expected switch fall-through
Subject: mm/list_lru.c: mark expected switch fall-through
Subject: mm/hmm: remove redundant variable align_end
Subject: mm, sparse: do not swamp log with huge vmemmap allocation failures
Subject: mm: do not rely on preempt_count in print_vma_addr
Subject: writeback: remove unused function parameter
Subject: mm/page_ext.c: check if page_ext is not prepared
Subject: mm,oom_reaper: remove pointless kthread_run() error check
Subject: mm: simplify nodemask printing
Subject: mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
Subject: memory hotplug: fix comments when adding section
* incoming
@ 2017-11-09 21:38 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-11-09 21:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
2 fixes, based on 3fefc31843cfe2b5f072efe11ed9ccaf6a7a5092:
Subject: sysctl: add register_sysctl() dummy helper
Subject: MAINTAINERS: update TPM driver infrastructure changes
* incoming
@ 2017-11-02 22:59 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-11-02 22:59 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
7 fixes, based on 5cb0512c02ecd7e6214e912e4c150f4219ac78e0:
Subject: userfaultfd: hugetlbfs: prevent UFFDIO_COPY to fill beyond the end of i_size
Subject: mm, /proc/pid/pagemap: fix soft dirty marking for PMD migration entry
Subject: ocfs2: fstrim: Fix start offset of first cluster group during fstrim
Subject: fs/hugetlbfs/inode.c: fix hwpoison reserve accounting
Subject: initramfs: fix initramfs rebuilds w/ compression after disabling
Subject: mm/huge_memory.c: deposit page table when copying a PMD migration entry
Subject: mm, swap: fix race between swap count continuation operations
* incoming
@ 2017-10-13 22:57 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-10-13 22:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
18 fixes, based on 997301a860fca1a05ab8e383a8039b65f8abeb1e:
Subject: mm/migrate: fix indexing bug (off by one) and avoid out of bound access
Subject: lib/Kconfig.debug: kernel hacking menu: runtime testing: keep tests together
Subject: mm/madvise.c: add description for MADV_WIPEONFORK and MADV_KEEPONFORK
Subject: include/linux/of.h: provide of_n_{addr,size}_cells wrappers for !CONFIG_OF
Subject: mm/mempolicy: fix NUMA_INTERLEAVE_HIT counter
Subject: mm: remove unnecessary WARN_ONCE in page_vma_mapped_walk().
Subject: mm: only display online cpus of the numa node
Subject: userfaultfd: selftest: exercise -EEXIST only in background transfer
Subject: scripts/kallsyms.c: ignore symbol type 'n'
Subject: mm/cma.c: take __GFP_NOWARN into account in cma_alloc()
Subject: Revert "vmalloc: back off when the current task is killed"
Subject: tty: fall back to N_NULL if switching to N_TTY fails during hangup
Subject: linux/kernel.h: add/correct kernel-doc notation
Subject: fs/mpage.c: fix mpage_writepage() for pages with buffers
Subject: fs/binfmt_misc.c: node could be NULL when evicting inode
Subject: kmemleak: clear stale pointers from task stacks
Subject: mm: page_vma_mapped: ensure pmd is loaded with READ_ONCE outside of lock
Subject: mm, swap: use page-cluster as max window of VMA based swap readahead
* incoming
@ 2017-10-03 23:14 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-10-03 23:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
A lot of stuff, sorry about that. A week on a beach, then a bunch of
time catching up then more time letting it bake in -next. Shan't do
that again!
51 fixes, based on d81fa669e3de7eb8a631d7d95dac5fbcb2bf9d4e:
Subject: alpha: fix build failures
Subject: kernel/params.c: align add_sysfs_param documentation with code
Subject: scripts/spelling.txt: add more spelling mistakes to spelling.txt
Subject: include/linux/mm.h: fix typo in VM_MPX definition
Subject: ksm: fix unlocked iteration over vmas in cmp_and_merge_page()
Subject: mm, hugetlb, soft_offline: save compound page order before page migration
Subject: sh: sh7722: remove nonexistent GPIO_PTQ7 to fix pinctrl registration
Subject: sh: sh7757: remove nonexistent GPIO_PT[JLNQ]7_RESV to fix pinctrl registration
Subject: sh: sh7264: remove nonexistent GPIO_PH[0-7] to fix pinctrl registration
Subject: sh: sh7269: remove nonexistent GPIO_PH[0-7] to fix pinctrl registration
Subject: z3fold: fix potential race in z3fold_reclaim_page
Subject: mm, oom_reaper: skip mm structs with mmu notifiers
Subject: mm, memcg: remove hotplug locking from try_charge
Subject: mm/memcg: avoid page count check for zone device
Subject: android: binder: drop lru lock in isolate callback
Subject: mm,compaction: serialize waitqueue_active() checks (for real)
Subject: z3fold: fix stale list handling
Subject: mm: meminit: mark init_reserved_page as __meminit
Subject: rapidio: remove global irq spinlocks from the subsystem
Subject: mm: fix RODATA_TEST failure "rodata_test: test data was not read only"
Subject: zram: fix null dereference of handle
Subject: m32r: define CPU_BIG_ENDIAN
Subject: mm: have filemap_check_and_advance_wb_err clear AS_EIO/AS_ENOSPC
Subject: mm: avoid marking swap cached page as lazyfree
Subject: mm: fix data corruption caused by lazyfree page
Subject: mm/device-public-memory: fix edge case in _vm_normal_page()
Subject: userfaultfd: non-cooperative: fix fork use after free
Subject: exec: load_script: kill the onstack interp[BINPRM_BUF_SIZE] array
Subject: exec: binfmt_misc: don't nullify Node->dentry in kill_node()
Subject: exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()
Subject: exec: binfmt_misc: remove the confusing e->interp_file != NULL checks
Subject: exec: binfmt_misc: fix race between load_misc_binary() and kill_node()
Subject: exec: binfmt_misc: kill the onstack iname[BINPRM_BUF_SIZE] array
Subject: lib/lz4: make arrays static const, reduces object code size
Subject: include/linux/bitfield.h: remove 32bit from FIELD_GET comment block
Subject: kernel/sysctl.c: remove duplicate UINT_MAX check on do_proc_douintvec_conv()
Subject: mm: memcontrol: use vmalloc fallback for large kmem memcg arrays
Subject: lib/idr.c: fix comment for idr_replace()
Subject: mm, memory_hotplug: add scheduling point to __add_pages
Subject: mm, page_alloc: add scheduling point to memmap_init_zone
Subject: memremap: add scheduling point to devm_memremap_pages
Subject: kernel/kcmp.c: drop branch leftover typo
Subject: mm/memory_hotplug: change pfn_to_section_nr/section_nr_to_pfn macro to inline function
Subject: mm/memory_hotplug: define find_{smallest|biggest}_section_pfn as unsigned long
Subject: kernel/params.c: fix the maximum length in param_get_string
Subject: kernel/params.c: fix an overflow in param_attr_show
Subject: kernel/params.c: improve STANDARD_PARAM_DEF readability
Subject: lib/ratelimit.c: use deferred printk() version
Subject: m32r: fix build failure
Subject: checkpatch: fix ignoring cover-letter logic
Subject: include/linux/fs.h: fix comment about struct address_space
* incoming
@ 2017-09-13 23:28 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-09-13 23:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
A few leftovers. Now with fixed up locale stuff, fingers crossed.
9 patches, based on 46c1e79fee417f151547aa46fae04ab06cb666f4:
Subject: idr: remove WARN_ON_ONCE() when trying to replace negative ID
Subject: drivers/media/cec/cec-adap.c: fix build with gcc-4.4.4
Subject: procfs: remove unused variable
Subject: lib/test_bitmap.c: use ULL suffix for 64-bit constants
Subject: fscache: fix fscache_objlist_show format processing
Subject: IB/mlx4: fix sprintf format warning
Subject: mm: treewide: remove GFP_TEMPORARY allocation flag
Subject: arm64: stacktrace: avoid listing stacktrace functions in stacktrace
Subject: mm, page_owner: skip unnecessary stack_trace entries
* Re: incoming
[not found] ` <CA+55aFw+z3HDT4s1C41j=d5_0QTSu8NLSSpnk_jxZ39w34xgnA@mail.gmail.com>
@ 2017-09-09 18:09 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-09-09 18:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Stephen Rothwell, mm-commits
On Sat, 9 Sep 2017 10:40:21 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, Sep 8, 2017 at 6:27 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Git does all of this right. Your quilt scripts are garbage. Please
> > please start fixing this.
> >
> > I've worked around it by just editing the patch, but..
>
> .. and I just realized that your patches must obviously be ok in your
> tree, since you can apply them, and apparently Stephen can apply them
> in linux-next.
>
> I'm assuming Stephen applies them from your quilt series directly, and
> thus never saw the problem with bad locale conversion.
>
> Maybe we should just change the workflow, with you sending me a raw
> tar-ball of the quilt series (or whatever the equivalent quilt
> "bundle" is) as an attachment and we forego the traditional
> patch-bombing model?
>
> That would avoid the locale issues with email.
>
Leave it with me - I need to sit down and have a fiddle for a while. For
some reason I can't recall I had LOCALE=C set, and using en_US.UTF-8
changes things quite a lot.
And I need to figure out why the heck I did this:
iconv -f latin1 | mailx -s "$subject" "$all"
!
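For illustration only, here is a minimal Python sketch of how a pipeline
like the one above can mangle names. It assumes the mail body was already
UTF-8 when it reached iconv and that iconv's output encoding was UTF-8;
the thread confirms neither, and the helper name and sample string are
hypothetical.

```python
def iconv_from_latin1(raw: bytes) -> bytes:
    """Roughly mimic `iconv -f latin1 -t utf-8` (target encoding is an assumption).

    Every input byte is treated as one Latin-1 character and then
    re-encoded as UTF-8.
    """
    return raw.decode("latin-1").encode("utf-8")


# Hypothetical author name containing non-ASCII characters.
name = "Jérôme"

# The text is already UTF-8 before it enters the pipeline.
utf8_bytes = name.encode("utf-8")

# Converting "from latin1" a second time double-encodes each accented letter.
garbled = iconv_from_latin1(utf8_bytes).decode("utf-8")
print(garbled)  # JÃ©rÃ´me

# Pure-ASCII text passes through unchanged, which is why most patches look fine.
ascii_subject = "mm: fix typo"
assert iconv_from_latin1(ascii_subject.encode("utf-8")).decode("utf-8") == ascii_subject
```

Under these assumptions only patches containing non-ASCII bytes are
affected, which would be consistent with the problem showing up on a
handful of patches rather than the whole series.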
* incoming
@ 2017-09-08 23:10 Andrew Morton
[not found] ` <CA+55aFwRXB5_kSuN7o+tqN6Eft6w5oZuLG3B8Rns=0ZZa2ihgA@mail.gmail.com>
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2017-09-08 23:10 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
126 patches, based on 015a9e66b9b8c1f28097ed09bf9350708e26249a:
- most of the rest of MM
- a small number of misc things
- lib/ updates
- checkpatch
- autofs updates
- ipc/ updates
Subject: mm: mempolicy: add queue_pages_required()
Subject: mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1
Subject: mm: thp: introduce separate TTU flag for thp freezing
Subject: mm: thp: introduce CONFIG_ARCH_ENABLE_THP_MIGRATION
Subject: mm: thp: enable thp migration in generic path
Subject: mm: thp: check pmd migration entry in common path
Subject: mm: soft-dirty: keep soft-dirty bits over thp migration
Subject: mm: mempolicy: mbind and migrate_pages support thp migration
Subject: mm: migrate: move_pages() supports thp migration
Subject: mm: memory_hotplug: memory hotremove supports thp migration
Subject: hmm: heterogeneous memory management documentation
Subject: mm/hmm: heterogeneous memory management (HMM for short)
Subject: mm/hmm/mirror: mirror process address space on device with HMM helpers
Subject: mm/hmm/mirror: helper to snapshot CPU page table
Subject: mm/hmm/mirror: device page fault handler
Subject: mm/memory_hotplug: introduce add_pages
Subject: mm/ZONE_DEVICE: new type of ZONE_DEVICE for unaddressable memory
Subject: mm/ZONE_DEVICE: special case put_page() for device private pages
Subject: mm/memcontrol: allow to uncharge page without using page->lru field
Subject: mm/memcontrol: support MEMORY_DEVICE_PRIVATE
Subject: mm/hmm/devmem: device memory hotplug using ZONE_DEVICE
Subject: mm/hmm/devmem: dummy HMM device for ZONE_DEVICE memory
Subject: mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY
Subject: mm/migrate: new memory migration helper for use with device memory
Subject: mm/migrate: migrate_vma() unmap page from vma while collecting pages
Subject: mm/migrate: support un-addressable ZONE_DEVICE page in migration
Subject: mm/migrate: allow migrate_vma() to alloc new page on empty entry
Subject: mm/device-public-memory: device memory cache coherent with CPU
Subject: mm/hmm: add new helper to hotplug CDM memory region
Subject: mm/hmm: avoid bloating arch that do not make use of HMM
Subject: mm/hmm: fix build when HMM is disabled
Subject: mm: remove useless vma parameter to offset_il_node
Subject: userfaultfd: non-cooperative: closing the uffd without triggering SIGBUS
Subject: mm/memory.c: remove reduntant check for write access
Subject: mm: change the call sites of numa statistics items
Subject: mm: update NUMA counter threshold size
Subject: mm: consider the number in local CPUs when reading NUMA stats
Subject: mm/mlock.c: use page_zone() instead of page_zone_id()
Subject: mm/zsmalloc.c: change stat type parameter to int
Subject: mm: fadvise: avoid fadvise for fs without backing device
Subject: mm: memcontrol: use per-cpu stocks for socket memory uncharging
Subject: mm/memory.c: fix mem_cgroup_oom_disable() call missing
Subject: mm/sparse.c: fix typo in online_mem_sections
Subject: tools/testing/selftests/kcmp/kcmp_test.c: add KCMP_EPOLL_TFD testing
Subject: mm/page_alloc.c: apply gfp_allowed_mask before the first allocation attempt
Subject: mm: kvfree the swap cluster info if the swap file is unsatisfactory
Subject: mm/swapfile.c: fix swapon frontswap_map memory leak on error
Subject: mm/mempolicy.c: remove BUG_ON() checks for VMA inside mpol_misplaced()
Subject: fs, proc: remove priv argument from is_stack
Subject: proc: uninline proc_create()
Subject: fs, proc: unconditional cond_resched when reading smaps
Subject: linux/kernel.h: move DIV_ROUND_DOWN_ULL() macro
Subject: lib/string.c: add multibyte memset functions
Subject: lib/string.c: add testcases for memset16/32/64
Subject: x86: implement memset16, memset32 & memset64
Subject: ARM: implement memset32 & memset64
Subject: alpha: add support for memset16
Subject: drivers/block/zram/zram_drv.c: convert to using memset_l
Subject: drivers/scsi/sym53c8xx_2/sym_hipd.c: convert to use memset32
Subject: vga: optimise console scrolling
Subject: treewide: make "nr_cpu_ids" unsigned
Subject: arch: define CPU_BIG_ENDIAN for all fixed big endian archs
Subject: arch/microblaze: add choice for endianness and update Makefile
Subject: include: warn for inconsistent endian config definition
Subject: bitops: avoid integer overflow in GENMASK(_ULL)
Subject: rbtree: cache leftmost node internally
Subject: rbtree: optimize root-check during rebalancing loop
Subject: rbtree: add some additional comments for rebalancing cases
Subject: lib/rbtree_test.c: make input module parameters
Subject: lib/rbtree_test.c: add (inorder) traversal test
Subject: lib/rbtree_test.c: support rb_root_cached
Subject: sched/fair: replace cfs_rq->rb_leftmost
Subject: sched/deadline: replace earliest dl and rq leftmost caching
Subject: locking/rtmutex: replace top-waiter and pi_waiters leftmost caching
Subject: block/cfq: replace cfq_rb_root leftmost caching
Subject: lib/interval_tree: fast overlap detection
Subject: lib/interval-tree: correct comment wrt generic flavor
Subject: procfs: use faster rb_first_cached()
Subject: fs/epoll: use faster rb_first_cached()
Subject: mem/memcg: cache rightmost node
Subject: block/cfq: cache rightmost rb_node
Subject: lib/hexdump.c: return -EINVAL in case of error in hex2bin()
Subject: lib: add test module for CONFIG_DEBUG_VIRTUAL
Subject: lib/bitmap.c: make bitmap_parselist() thread-safe and much faster
Subject: lib/test_bitmap.c: add test for bitmap_parselist()
Subject: bitmap: introduce BITMAP_FROM_U64()
Subject: lib/rhashtable: fix comment on locks_mul default value
Subject: lib/string.c: check for kmalloc() failure
Subject: lib/cmdline.c: remove meaningless comment
Subject: radix-tree: must check __radix_tree_preload() return value
Subject: lib/oid_registry.c: X.509: fix the buffer overflow in the utility function for OID string
Subject: checkpatch: add --strict check for ifs with unnecessary parentheses
Subject: checkpatch: fix typo in comment
Subject: checkpatch: rename variables to avoid confusion
Subject: checkpatch: add 6 missing types to --list-types
Subject: binfmt_flat: delete two error messages for a failed memory allocation in decompress_exec()
Subject: init: move stack canary initialization after setup_arch
Subject: init/main.c: extract early boot entropy from the passed cmdline
Subject: autofs: fix AT_NO_AUTOMOUNT not being honored
Subject: autofs: make disc device user accessible
Subject: autofs: make dev ioctl version and ismountpoint user accessible
Subject: autofs: remove unused AUTOFS_IOC_EXPIRE_DIRECT/INDIRECT
Subject: autofs: non functional header inclusion cleanup
Subject: autofs: use AUTOFS_DEV_IOCTL_SIZE
Subject: autofs: drop wrong comment
Subject: autofs: use unsigned int/long instead of uint/ulong for ioctl args
Subject: vfat: deduplicate hex2bin()
Subject: test_kmod: remove paranoid UINT_MAX check on uint range processing
Subject: test_kmod: flip INT checks to be consistent
Subject: kmod: split out umh code into its own file
Subject: MAINTAINERS: clarify kmod is just a kernel module loader
Subject: kmod: split off umh headers into its own file
Subject: kmod: move #ifdef CONFIG_MODULES wrapper to Makefile
Subject: cpumask: make cpumask_next() out-of-line
Subject: drivers/pps: aesthetic tweaks to PPS-related content
Subject: drivers/pps: use surrounding "if PPS" to remove numerous dependency checks
Subject: m32r: defconfig: cleanup from old Kconfig options
Subject: mn10300: defconfig: cleanup from old Kconfig options
Subject: sh: defconfig: cleanup from old Kconfig options
Subject: kcov: support compat processes
Subject: ipc: convert ipc_namespace.count from atomic_t to refcount_t
Subject: ipc: convert sem_undo_list.refcnt from atomic_t to refcount_t
Subject: ipc: convert kern_ipc_perm.refcount from atomic_t to refcount_t
Subject: ipc/sem: drop sem_checkid helper
Subject: ipc/sem: play nicer with large nsops allocations
Subject: ipc: optimize semget/shmget/msgget for lots of keys
* incoming
@ 2017-09-06 23:17 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-09-06 23:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- various misc bits
- DAX updates
- OCFS2
- most of MM
119 patches, based on e7d0c41ecc2e372a81741a30894f556afec24315:
Subject: metag/numa: remove the unused parent_node() macro
Subject: mm: add vm_insert_mixed_mkwrite()
Subject: dax: relocate some dax functions
Subject: dax: use common 4k zero page for dax mmap reads
Subject: dax: remove DAX code from page_cache_tree_insert()
Subject: dax: move all DAX radix tree defs to fs/dax.c
Subject: dax: explain how read(2)/write(2) addresses are validated
Subject: dax: use PG_PMD_COLOUR instead of open coding
Subject: dax: initialize variable pfn before using it
Subject: modpost: simplify sec_name()
Subject: ocfs2: make ocfs2_set_acl() static
Subject: ocfs2: clean up some dead code
Subject: slub: tidy up initialization ordering
Subject: mm: add SLUB free list pointer obfuscation
Subject: mm/slub.c: add a naive detection of double free or corruption
Subject: mm: track actual nr_scanned during shrink_slab()
Subject: drm/i915: wire up shrinkctl->nr_scanned
Subject: mm/memory_hotplug: just build zonelist for newly added node
Subject: mm, memory_hotplug: display allowed zones in the preferred ordering
Subject: mm, memory_hotplug: remove zone restrictions
Subject: zram: clean up duplicated codes in __zram_bvec_write
Subject: zram: inline zram_compress
Subject: zram: rename zram_decompress_page to __zram_bvec_read
Subject: zram: add interface to specif backing device
Subject: zram: add free space management in backing device
Subject: zram: identify asynchronous IO's return value
Subject: zram: write incompressible pages to backing device
Subject: zram: read page from backing device
Subject: zram: add config and doc file for writeback feature
Subject: mm, page_alloc: rip out ZONELIST_ORDER_ZONE
Subject: mm, page_alloc: remove boot pageset initialization from memory hotplug
Subject: mm, page_alloc: do not set_cpu_numa_mem on empty nodes initialization
Subject: mm, memory_hotplug: drop zone from build_all_zonelists
Subject: mm, memory_hotplug: remove explicit build_all_zonelists from try_online_node
Subject: mm, page_alloc: simplify zonelist initialization
Subject: mm, page_alloc: remove stop_machine from build_all_zonelists
Subject: mm, memory_hotplug: get rid of zonelists_mutex
Subject: mm, sparse, page_ext: drop ugly N_HIGH_MEMORY branches for allocations
Subject: mm, page_owner: make init_pages_in_zone() faster
Subject: mm, page_ext: periodically reschedule during page_ext_init()
Subject: mm, page_owner: don't grab zone->lock for init_pages_in_zone()
Subject: mm/mremap: fail map duplication attempts for private mappings
Subject: mm/gup: make __gup_device_* require THP
Subject: mm/hugetlb.c: make huge_pte_offset() consistent and document behaviour
Subject: mm: always flush VMA ranges affected by zap_page_range
Subject: zsmalloc: zs_page_migrate: skip unnecessary loops but not return -EBUSY if zspage is not inuse
Subject: mm, vmscan: do not loop on too_many_isolated for ever
Subject: fscache: remove unused ->now_uncached callback
Subject: mm: make pagevec_lookup() update index
Subject: mm: implement find_get_pages_range()
Subject: fs: fix performance regression in clean_bdev_aliases()
Subject: ext4: use pagevec_lookup_range() in ext4_find_unwritten_pgoff()
Subject: ext4: use pagevec_lookup_range() in writeback code
Subject: hugetlbfs: use pagevec_lookup_range() in remove_inode_hugepages()
Subject: fs: use pagevec_lookup_range() in page_cache_seek_hole_data()
Subject: mm: use find_get_pages_range() in filemap_range_has_page()
Subject: mm: remove nr_pages argument from pagevec_lookup{,_range}()
Subject: mm, memcg: reset memory.low during memcg offlining
Subject: cgroup: revert fa06235b8eb0 ("cgroup: reset css on destruction")
Subject: mm/ksm.c: constify attribute_group structures
Subject: mm/slub.c: constify attribute_group structures
Subject: mm/page_idle.c: constify attribute_group structures
Subject: mm/huge_memory.c: constify attribute_group structures
Subject: mm/hugetlb.c: constify attribute_group structures
Subject: mm: memcontrol: use int for event/state parameter in several functions
Subject: mm, THP, swap: support to clear swap cache flag for THP swapped out
Subject: mm, THP, swap: support to reclaim swap space for THP swapped out
Subject: mm, THP, swap: make reuse_swap_page() works for THP swapped out
Subject: mm, THP, swap: don't allocate huge cluster for file backed swap device
Subject: block, THP: make block_device_operations.rw_page support THP
Subject: mm: test code to write THP to swap device as a whole
Subject: mm, THP, swap: support splitting THP for THP swap out
Subject: memcg, THP, swap: support move mem cgroup charge for THP swapped out
Subject: memcg, THP, swap: avoid to duplicated charge THP in swap cache
Subject: memcg, THP, swap: make mem_cgroup_swapout() support THP
Subject: mm, THP, swap: delay splitting THP after swapped out
Subject: mm, THP, swap: add THP swapping out fallback counting
Subject: shmem: shmem_charge: verify max_block is not exceeded before inode update
Subject: shmem: introduce shmem_inode_acct_block
Subject: userfaultfd: shmem: add shmem_mfill_zeropage_pte for userfaultfd support
Subject: userfaultfd: mcopy_atomic: introduce mfill_atomic_pte helper
Subject: userfaultfd: shmem: wire up shmem_mfill_zeropage_pte
Subject: userfaultfd: report UFFDIO_ZEROPAGE as available for shmem VMAs
Subject: userfaultfd: selftest: enable testing of UFFDIO_ZEROPAGE for shmem
Subject: fs/sync.c: remove unnecessary NULL f_mapping check in sync_file_range
Subject: include/linux/fs.h: remove unneeded forward definition of mm_struct
Subject: mm: hugetlb: define system call hugetlb size encodings in single file
Subject: mm: arch: consolidate mmap hugetlb size encodings
Subject: mm: shm: use new hugetlb size encoding definitions
Subject: mm: rename global_page_state to global_zone_page_state
Subject: mm: userfaultfd: add feature to request for a signal delivery
Subject: userfaultfd: selftest: add tests for UFFD_FEATURE_SIGBUS feature
Subject: userfaultfd: selftest: exercise UFFDIO_COPY/ZEROPAGE -EEXIST
Subject: userfaultfd: selftest: explicit failure if the SIGBUS test failed
Subject: userfaultfd: call userfaultfd_unmap_prep only if __split_vma succeeds
Subject: userfaultfd: provide pid in userfault msg
Subject: userfaultfd: provide pid in userfault msg - add feat union
Subject: mm, hugetlb: do not allocate non-migrateable gigantic pages from movable zones
Subject: mm/vmstat: fix divide error at __fragmentation_index
Subject: mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas()
Subject: mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups
Subject: mm/shmem: add hugetlbfs support to memfd_create()
Subject: selftests/memfd: add memfd_create hugetlbfs selftest
Subject: mm/vmstat.c: fix wrong comment
Subject: mm/vmalloc.c: don't reinvent the wheel but use existing llist API
Subject: mm, swap: add swap readahead hit statistics
Subject: mm, swap: fix swap readahead marking
Subject: mm, swap: VMA based swap readahead
Subject: mm, swap: add sysfs interface for VMA based swap readahead
Subject: mm, swap: don't use VMA based swap readahead if HDD is used as swap
Subject: z3fold: use per-cpu unbuddied lists
Subject: mm, oom: do not rely on TIF_MEMDIE for memory reserves access
Subject: mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
Subject: swap: choose swap device according to numa node
Subject: mm: oom: let oom_reap_task and exit_mmap run concurrently
Subject: mm: hugetlb: clear target sub-page last when clearing huge page
Subject: mm: add /proc/pid/smaps_rollup
Subject: x86,mpx: make mpx depend on x86-64 to free up VMA flag
Subject: mm,fork: introduce MADV_WIPEONFORK
* incoming
@ 2017-08-31 23:15 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-08-31 23:15 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
6 fixes, based on 42ff72cf27027fa28dd79acabe01d9196f1480a7:
Subject: mm,page_alloc: don't call __node_reclaim() with oom_lock held.
Subject: kernel/kthread.c: kthread_worker: don't hog the cpu
Subject: mm, uprobes: fix multiple free of ->uprobes_state.xol_area
Subject: mm, madvise: ensure poisoned pages are removed from per-cpu lists
Subject: include/linux/compiler.h: don't perform compiletime_assert with -O0
Subject: scripts/dtc: fix '%zx' warning
* incoming
@ 2017-08-25 22:55 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-08-25 22:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
6 fixes, based on 90a6cd503982bfd33ce8c70eb49bd2dd33bc6325:
Subject: PM/hibernate: touch NMI watchdog when creating snapshot
Subject: mm, shmem: fix handling /sys/kernel/mm/transparent_hugepage/shmem_enabled
Subject: dax: fix deadlock due to misaligned PMD faults
Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
Subject: fork: fix incorrect fput of ->exe_file causing use-after-free
Subject: mm/memblock.c: reversed logic in memblock_discard()
* incoming
@ 2017-08-18 22:15 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-08-18 22:15 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
14 fixes, based on 039a8e38473323ed9f6c4415b4c3a36777efac34:
Subject: mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback()
Subject: kernel/watchdog: fix Kconfig constraints for perf hardlockup watchdog
Subject: wait: add wait_event_killable_timeout()
Subject: kmod: fix wait on recursive loop
Subject: test_kmod: fix description for -s -and -c parameters
Subject: mm: discard memblock data later
Subject: slub: fix per memcg cache leak on css offline
Subject: mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS
Subject: mm, oom: fix potential data corruption when oom_reaper races with writer
Subject: signal: don't remove SIGNAL_UNKILLABLE for traced tasks.
Subject: mm/cma_debug.c: fix stack corruption due to sprintf usage
Subject: mm/mempolicy: fix use after free when calling get_mempolicy
Subject: mm/vmalloc.c: don't unconditonally use __GFP_HIGHMEM
Subject: mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes
* incoming
@ 2017-08-10 22:23 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-08-10 22:23 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
21 fixes, based on 26273939ace935dd7553b31d279eab30b40f7b9a:
Subject: mm: fix global NR_SLAB_.*CLAIMABLE counter reads
Subject: mm: ratelimit PFNs busy info message
Subject: userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case
Subject: test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY"
Subject: test_kmod: fix bug which allows negative values on two config options
Subject: test_kmod: fix the lock in register_test_dev_kmod()
Subject: test_kmod: fix small memory leak on filesystem tests
Subject: fault-inject: fix wrong should_fail() decision in task context
Subject: mm: migrate: prevent racy access to tlb_flush_pending
Subject: mm: migrate: fix barriers around tlb_flush_pending
Subject: Revert "mm: numa: defer TLB flush for THP migration as long as possible"
Subject: mm: refactor TLB gathering API
Subject: mm: make tlb_flush_pending global
Subject: mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
Subject: mm: fix KSM data corruption
Subject: MAINTAINERS: copy virtio on balloon_compaction.c
Subject: mm/balloon_compaction.c: don't zero ballooned pages
Subject: mm: fix list corruptions on shmem shrinklist
Subject: rmap: do not call mmu_notifier_invalidate_page() under ptl
Subject: zram: rework copy of compressor name in comp_algorithm_store()
Subject: userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage
* incoming
@ 2017-08-02 20:31 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-08-02 20:31 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
16 fixes, based on 4d3f5d04d69e9479a3df88ceb0e2cd8188a49366:
Subject: mm/hugetlb.c: __get_user_pages ignores certain follow_hugetlb_page errors
Subject: pid: kill pidhash_size in pidhash_init()
Subject: mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
Subject: userfaultfd: non-cooperative: notify about unmap of destination during mremap
Subject: kasan: avoid -Wmaybe-uninitialized warning
Subject: kthread: fix documentation build warning
Subject: zram: do not free pool->size_class
Subject: fortify: use WARN instead of BUG for now
Subject: mm/page_io.c: fix oops during block io poll in swapin path
Subject: mm: take memory hotplug lock within numa_zonelist_order_handler()
Subject: userfaultfd_zeropage: return -ENOSPC in case mm has gone
Subject: cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()
Subject: ipc: add missing container_of()s for randstruct
Subject: userfaultfd: non-cooperative: flush event_wqh at release time
Subject: mm: allow page_cache_get_speculative in interrupt context
Subject: ocfs2: don't clear SGID when inheriting ACLs
* incoming
@ 2017-07-14 21:46 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-07-14 21:46 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- a few leftovers
- fault-injector rework
- add a module loader test driver
13 patches, based on b86faee6d111294fa95a2e89b5f771b2da3c9782:
Subject: mm: fix overflow check in expand_upwards()
Subject: lib/atomic64_test.c: add a test that atomic64_inc_not_zero() returns an int
Subject: MAINTAINERS: move the befs tree to kernel.org
Subject: kernel/watchdog.c: use better pr_fmt prefix
Subject: fault-inject: automatically detect the number base for fail-nth write interface
Subject: fault-inject: parse as natural 1-based value for fail-nth write interface
Subject: fault-inject: make fail-nth read/write interface symmetric
Subject: fault-inject: simplify access check for fail-nth
Subject: fault-inject: add /proc/<pid>/fail-nth
Subject: xtensa: use generic fb.h
Subject: MAINTAINERS: give kmod some maintainer love
Subject: kmod: add test driver to stress test the module loader
Subject: kmod: throttle kmod thread limit
* incoming
@ 2017-07-12 21:32 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-07-12 21:32 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- various misc things
- kexec updates
- sysctl core updates
- scripts/gdb updates
- checkpoint-restart updates
- ipc updates
- kernel/watchdog updates
- Kees's "rough equivalent to the glibc _FORTIFY_SOURCE=1 feature"
- "stackprotector: ascii armor the stack canary"
- more MM bits
- checkpatch updates
96 patches, based on 235b84fc862ae2637dc0dabada18d97f1bfc18e1:
Subject: include/linux/dcache.h: use unsigned chars in struct name_snapshot
Subject: kernel.h: handle pointers to arrays better in container_of()
Subject: mm/memory.c: mark create_huge_pmd() inline to prevent build failure
Subject: kernel/fork.c: virtually mapped stacks: do not disable interrupts
Subject: kexec: move vmcoreinfo out of the kernel's .bss section
Subject: powerpc/fadump: use the correct VMCOREINFO_NOTE_SIZE for phdr
Subject: kdump: protect vmcoreinfo data under the crash memory
Subject: kexec/kdump: minor Documentation updates for arm64 and Image
Subject: sysctl: fix lax sysctl_check_table() sanity check
Subject: sysctl: kdoc'ify sysctl_writes_strict
Subject: sysctl: fold sysctl_writes_strict checks into helper
Subject: sysctl: simplify unsigned int support
Subject: sysctl: add unsigned int range support
Subject: test_sysctl: add dedicated proc sysctl test driver
Subject: test_sysctl: add generic script to expand on tests
Subject: test_sysctl: test against PAGE_SIZE for int
Subject: test_sysctl: add simple proc_dointvec() case
Subject: test_sysctl: add simple proc_douintvec() case
Subject: test_sysctl: test against int proc_dointvec() array support
Subject: kernel/sysctl_binary.c: check name array length in deprecated_sysctl_warning()
Subject: random: do not ignore early device randomness
Subject: bfs: fix sanity checks for empty files
Subject: fs/Kconfig: kill CONFIG_PERCPU_RWSEM some more
Subject: scripts/gdb: add lx-fdtdump command
Subject: scripts/gdb: lx-dmesg: cast log_buf to void* for addr fetch
Subject: scripts/gdb: lx-dmesg: use explicit encoding=utf8 errors=replace
Subject: kfifo: clean up example to not use page_link
Subject: procfs: fdinfo: extend information about epoll target files
Subject: kcmp: add KCMP_EPOLL_TFD mode to compare epoll target files
Subject: kcmp: fs/epoll: wrap kcmp code with CONFIG_CHECKPOINT_RESTORE
Subject: fault-inject: support systematic fault injection
Subject: ipc/sem.c: remove sem_base, embed struct sem
Subject: ipc: merge ipc_rcu and kern_ipc_perm
Subject: include/linux/sem.h: correctly document sem_ctime
Subject: ipc: drop non-RCU allocation
Subject: ipc/sem: do not use ipc_rcu_free()
Subject: ipc/shm: do not use ipc_rcu_free()
Subject: ipc/msg: do not use ipc_rcu_free()
Subject: ipc/util: drop ipc_rcu_free()
Subject: ipc/sem: avoid ipc_rcu_alloc()
Subject: ipc/shm: avoid ipc_rcu_alloc()
Subject: ipc/msg: avoid ipc_rcu_alloc()
Subject: ipc/util: drop ipc_rcu_alloc()
Subject: ipc/sem.c: avoid ipc_rcu_putref for failed ipc_addid()
Subject: ipc/shm.c: avoid ipc_rcu_putref for failed ipc_addid()
Subject: ipc/msg.c: avoid ipc_rcu_putref for failed ipc_addid()
Subject: ipc: move atomic_set() to where it is needed
Subject: ipc/shm: remove special shm_alloc/free
Subject: ipc/msg: remove special msg_alloc/free
Subject: ipc/sem: drop __sem_free()
Subject: ipc/util.h: update documentation for ipc_getref() and ipc_putref()
Subject: net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()
Subject: kernel/watchdog: remove unused declaration
Subject: kernel/watchdog: introduce arch_touch_nmi_watchdog()
Subject: kernel/watchdog: split up config options
Subject: kernel/watchdog: provide watchdog_nmi_reconfigure() for arch watchdogs
Subject: powerpc/64s: implement arch-specific hardlockup watchdog
Subject: efi: avoid fortify checks in EFI stub
Subject: kexec_file: adjust declaration of kexec_purgatory
Subject: IB/rxe: do not copy extra stack memory to skb
Subject: powerpc: don't fortify prom_init
Subject: powerpc: make feature-fixup tests fortify-safe
Subject: include/linux/string.h: add the option of fortified string.h functions
Subject: sh: mark end of BUG() implementation as unreachable
Subject: random,stackprotect: introduce get_random_canary function
Subject: fork,random: use get_random_canary() to set tsk->stack_canary
Subject: x86: ascii armor the x86_64 boot init stack canary
Subject: arm64: ascii armor the arm64 boot init stack canary
Subject: sh64: ascii armor the sh64 boot init stack canary
Subject: x86/mmap: properly account for stack randomization in mmap_base
Subject: arm64/mmap: properly account for stack randomization in mmap_base
Subject: powerpc,mmap: properly account for stack randomization in mmap_base
Subject: MIPS: do not use __GFP_REPEAT for order-0 request
Subject: mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
Subject: xfs: map KM_MAYFAIL to __GFP_RETRY_MAYFAIL
Subject: mm: kvmalloc support __GFP_RETRY_MAYFAIL for all sizes
Subject: drm/i915: use __GFP_RETRY_MAYFAIL
Subject: mm, migration: do not trigger OOM killer when migrating memory
Subject: checkpatch: improve the STORAGE_CLASS test
Subject: ARM: KVM: move asmlinkage before type
Subject: ARM: HP Jornada 7XX: move inline before return type
Subject: CRIS: gpio: move inline before return type
Subject: FRV: tlbflush: move asmlinkage before return type
Subject: ia64: move inline before return type
Subject: ia64: sn: pci: move inline before type
Subject: m68k: coldfire: move inline before return type
Subject: MIPS: SMP: move asmlinkage before return type
Subject: sh: move inline before return type
Subject: x86/efi: move asmlinkage before return type
Subject: drivers: s390: move static and inline before return type
Subject: drivers: tty: serial: move inline before return type
Subject: USB: serial: safe_serial: move __inline__ before return type
Subject: video: fbdev: intelfb: move inline before return type
Subject: video: fbdev: omap: move inline before return type
Subject: ARM: samsung: usb-ohci: move inline before return type
Subject: writeback: rework wb_[dec|inc]_stat family of functions
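For readers unfamiliar with the "ascii armor" idea behind the stack canary patches listed above, here is a minimal userspace sketch in C. It is not the kernel's code: armor_canary(), canary_string_prefix_len() and EXAMPLE_CANARY_MASK are hypothetical names, and the mask assumes a 64-bit canary where one byte is forced to zero. The point it illustrates is that a zero byte inside the canary makes string-style copies stop before they can read or rewrite the whole value.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative mask (assumption: 64-bit canary). It clears the least
 * significant byte, which is the lowest-addressed byte on a
 * little-endian machine. */
#define EXAMPLE_CANARY_MASK UINT64_C(0xffffffffffffff00)

/* Hypothetical helper: turn a random value into an "armored" canary. */
static uint64_t armor_canary(uint64_t random_value)
{
	return random_value & EXAMPLE_CANARY_MASK;
}

/* Number of canary bytes a C string function would walk over before
 * stopping at a NUL byte, given this machine's in-memory layout. */
static size_t canary_string_prefix_len(uint64_t canary)
{
	unsigned char bytes[sizeof(canary) + 1];

	memcpy(bytes, &canary, sizeof(canary));
	bytes[sizeof(canary)] = '\0';	/* guard so strlen() always terminates */
	return strlen((const char *)bytes);
}
```

With an armored value, canary_string_prefix_len() is always smaller than the canary size; on a little-endian machine it should be 0, because the zeroed byte comes first in memory. A value with no zero byte yields the full 8.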
* incoming
@ 2017-07-10 22:46 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-07-10 22:46 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- most of the rest of MM
- KASAN updates
- lib/ updates
- checkpatch updates
- some binfmt_elf changes
- various misc bits
115 patches, based on 9eb788800510ae1a6bc419636a66071ee4deafd5:
Subject: swap: add block io poll in swapin path
Subject: mm, page_alloc: fallback to smallest page when not stealing whole pageblock
Subject: mm/memory.c: convert to DEFINE_DEBUGFS_ATTRIBUTE
Subject: mm, vmscan: avoid thrashing anon lru when free + file is low
Subject: mm/memory_hotplug.c: add NULL check to avoid potential NULL pointer dereference
Subject: mm/zsmalloc.c: fix -Wunneeded-internal-declaration warning
Subject: fs/buffer.c: make bh_lru_install() more efficient
Subject: mm: hugetlb: prevent reuse of hwpoisoned free hugepages
Subject: mm: hugetlb: return immediately for hugetlb page in __delete_from_page_cache()
Subject: mm: hwpoison: change PageHWPoison behavior on hugetlb pages
Subject: mm: hugetlb: soft-offline: dissolve source hugepage after successful migration
Subject: mm: soft-offline: dissolve free hugepage if soft-offlined
Subject: mm: hwpoison: introduce memory_failure_hugetlb()
Subject: mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error
Subject: mm: hugetlb: delete dequeue_hwpoisoned_huge_page()
Subject: mm: hwpoison: introduce idenfity_page_state
Subject: mm, vmpressure: pass-through notification support
Subject: mm: make PR_SET_THP_DISABLE immediately active
Subject: mm/memcontrol: exclude @root from checks in mem_cgroup_low
Subject: vmalloc: show lazy-purged vma info in vmallocinfo
Subject: mm/cma.c: warn if the CMA area could not be activated
Subject: mm/hugetlb.c: warn the user when issues arise on boot due to hugepages
Subject: oom, trace: remove ENUM evaluation of COMPACTION_FEEDBACK
Subject: mm: improve readability of transparent_hugepage_enabled()
Subject: mm: always enable thp for dax mappings
Subject: include/linux/page_ref.h: ensure page_ref_unfreeze is ordered against prior accesses
Subject: mm/migrate.c: stabilise page count when migrating transparent hugepages
Subject: zram: use __sysfs_match_string() helper
Subject: mm, memory_hotplug: support movable_node for hotpluggable nodes
Subject: mm, memory_hotplug: simplify empty node mask handling in new_node_page
Subject: hugetlb, memory_hotplug: prefer to use reserved pages for migration
Subject: mm: unify new_node_page and alloc_migrate_target
Subject: mm, hugetlb: schedule when potentially allocating many hugepages
Subject: mm, memcg: fix potential undefined behavior in mem_cgroup_event_ratelimit()
Subject: mm/hugetlb.c: replace memfmt with string_get_size
Subject: mm/truncate.c: fix THP handling in invalidate_mapping_pages()
Subject: userfaultfd: non-cooperative: add madvise() event for MADV_FREE request
Subject: mm/oom_kill.c: add tracepoints for oom reaper-related events
Subject: mm, hugetlb: unclutter hugetlb allocation layers
Subject: hugetlb: add support for preferred node to alloc_huge_page_nodemask
Subject: mm, hugetlb, soft_offline: use new_page_nodemask for soft offline migration
Subject: mm: avoid taking zone lock in pagetypeinfo_showmixed()
Subject: mm: drop useless local parameters of __register_one_node()
Subject: fs/proc/task_mmu.c: remove obsolete comment in show_map_vma()
Subject: mm/page_alloc.c: eliminate unsigned confusion in __rmqueue_fallback
Subject: mm/swap_slots.c: don't disable preemption while taking the per-CPU cache
Subject: include/linux/mmzone.h: remove ancient/ambiguous comment
Subject: include/linux/backing-dev.h: simplify wb_stat_sum
Subject: mm: document highmem_is_dirtyable sysctl
Subject: mm/memory_hotplug.c: remove unused local zone_type from __remove_zone()
Subject: cma: fix calculation of aligned offset
Subject: mm/balloon_compaction.c: enqueue zero page to balloon device
Subject: mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack
Subject: mm/mmap.c: expand_downwards: don't require the gap if !vm_prev
Subject: mm/list_lru.c: fix list_lru_count_node() to be race free
Subject: fs/dcache.c: fix spin lockup issue on nlru->lock
Subject: mm: use dedicated helper to access rlimit value
Subject: mm: swap: provide lru_add_drain_all_cpuslocked()
Subject: mm/memory-hotplug: switch locking to a percpu rwsem
Subject: mm: disallow early_pfn_to_nid on configurations which do not implement it
Subject: zram: constify attribute_group structures.
Subject: mm/zsmalloc: simplify zs_max_alloc_size handling
Subject: mm/kasan/kasan_init.c: use kasan_zero_pud for p4d table
Subject: mm/kasan: get rid of speculative shadow checks
Subject: x86/kasan: don't allocate extra shadow memory
Subject: arm64/kasan: don't allocate extra shadow memory
Subject: mm/kasan: add support for memory hotplug
Subject: mm/kasan/kasan.c: rename XXX_is_zero to XXX_is_nonzero
Subject: kasan: make get_wild_bug_type() static
Subject: frv: remove wrapper header for asm/device.h
Subject: frv: use generic fb.h
Subject: frv: cmpxchg: implement cmpxchg64()
Subject: fs/proc/generic.c: switch to ida_simple_get/remove
Subject: asm-generic/bug.h: declare struct pt_regs; before function prototype
Subject: linux/bug.h: correct formatting of block comment
Subject: linux/bug.h: correct "(foo*)" should be "(foo *)"
Subject: linux/bug.h: correct "space required before that '-'"
Subject: bug: split BUILD_BUG stuff out into <linux/build_bug.h>
Subject: ARM: fix rd_size declaration
Subject: kernel/ksysfs.c: constify attribute_group structures.
Subject: kernel/groups.c: use sort library function
Subject: kernel/kallsyms.c: replace all_var with IS_ENABLED(CONFIG_KALLSYMS_ALL)
Subject: MAINTAINERS: give proc sysctl some maintainer love
Subject: lib/test_bitmap.c: add optimisation tests
Subject: bitmap: optimise bitmap_set and bitmap_clear of a single bit
Subject: include/linux/bitmap.h: turn bitmap_set and bitmap_clear into memset when possible
Subject: bitmap: use memcmp optimisation in more situations
Subject: lib/kstrtox.c: delete end-of-string test
Subject: lib/kstrtox.c: use "unsigned int" more
Subject: lib/interval_tree_test.c: allow the module to be compiled-in
Subject: lib/interval_tree_test.c: make test options module parameters
Subject: lib/interval_tree_test.c: allow users to limit scope of endpoint
Subject: lib/interval_tree_test.c: allow full tree search
Subject: lib/rhashtable.c: use kvzalloc() in bucket_table_alloc() when possible
Subject: lib/extable.c: use bsearch() library function in search_extable()
Subject: lib/bsearch.c: micro-optimize pivot position calculation
Subject: checkpatch: improve the unnecessary OOM message test
Subject: checkpatch: warn when a MAINTAINERS entry isn't [A-Z]:\t
Subject: checkpatch: [HLP]LIST_HEAD is also declaration
Subject: checkpatch: fix stepping through statements with $stat and ctx_statement_block
Subject: checkpatch: remove false warning for commit reference
Subject: checkpatch: improve tests for multiple line function definitions
Subject: checkpatch: silence perl 5.26.0 unescaped left brace warnings
Subject: checkpatch: change format of --color argument to --color[=WHEN]
Subject: checkpatch: improve macro reuse test
Subject: checkpatch: improve multi-line alignment test
Subject: fs, epoll: short circuit fetching events if thread has been killed
Subject: binfmt_elf: use ELF_ET_DYN_BASE only for PIE
Subject: arm: move ELF_ET_DYN_BASE to 4MB
Subject: arm64: move ELF_ET_DYN_BASE to 4GB / 4MB
Subject: powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
Subject: s390: reduce ELF_ET_DYN_BASE
Subject: binfmt_elf: safely increment argv pointers
Subject: kernel/signal.c: avoid undefined behaviour in kill_something_info
Subject: kernel/exit.c: avoid undefined behaviour when calling wait4()
* incoming
@ 2017-07-06 22:34 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-07-06 22:34 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- a few hotfixes
- various misc updates
- ocfs2 updates
- most of MM
108 patches, based on 9ced560b82606b35adb33a27012a148d418a4c1f:
Subject: compiler, clang: always inline when CONFIG_OPTIMIZE_INLINING is disabled
Subject: thp, mm: fix crash due race in MADV_FREE handling
Subject: kernel/extable.c: mark core_kernel_text notrace
Subject: mn10300: remove wrapper header for asm/device.h
Subject: mn10300: use generic fb.h
Subject: tile: provide default ioremap declaration
Subject: scripts/gen_initramfs_list.sh: teach INITRAMFS_ROOT_UID and INITRAMFS_ROOT_GID that -1 means "current user".
Subject: ramfs: clarify help text that compression applies to ramfs as well as legacy ramdisk.
Subject: scripts/spelling.txt: add a bunch more spelling mistakes
Subject: provide linux/set_memory.h
Subject: kernel/power/snapshot.c: use linux/set_memory.h
Subject: kernel/module.c: use linux/set_memory.h
Subject: include/linux/filter.h: use linux/set_memory.h
Subject: drivers/sh/intc/virq.c: delete an error message for a failed memory allocation in add_virq_to_pirq()
Subject: ocfs2: fix a static checker warning
Subject: ocfs2: use magic.h
Subject: ocfs2: free 'dummy_sc' in sc_fop_release() to prevent memory leak
Subject: ocfs2: constify attribute_group structures
Subject: fs/file.c: replace alloc_fdmem() with kvmalloc() alternative
Subject: mm/slub.c: remove a redundant assignment in ___slab_alloc()
Subject: mm/slub: reset cpu_slab's pointer in deactivate_slab()
Subject: mm/slub.c: pack red_left_pad with another int to save a word
Subject: mm/slub.c: wrap cpu_slab->partial in CONFIG_SLUB_CPU_PARTIAL
Subject: mm/slub.c: wrap kmem_cache->cpu_partial in config CONFIG_SLUB_CPU_PARTIAL
Subject: mm/slab.c: replace open-coded round-up code with ALIGN
Subject: mm: allow slab_nomerge to be set at build time
Subject: mm, sparsemem: break out of loops early
Subject: mm/mmap.c: mark protection_map as __ro_after_init
Subject: mm/vmscan.c: fix unsequenced modification and access warning
Subject: mm/nobootmem.c: return 0 when start_pfn equals end_pfn
Subject: ksm: introduce ksm_max_page_sharing per page deduplication limit
Subject: ksm: fix use after free with merge_across_nodes = 0
Subject: ksm: cleanup stable_node chain collapse case
Subject: ksm: swap the two output parameters of chain/chain_prune
Subject: ksm: optimize refile of stable_node_dup at the head of the chain
Subject: zram: count same page write as page_stored
Subject: mm/vmstat.c: standardize file operations variable names
Subject: mm, THP, swap: delay splitting THP during swap out
Subject: mm, THP, swap: unify swap slot free functions to put_swap_page
Subject: mm, THP, swap: move anonymous THP split logic to vmscan
Subject: mm, THP, swap: check whether THP can be split firstly
Subject: mm, THP, swap: enable THP swap optimization only if has compound map
Subject: mm: remove return value from init_currently_empty_zone
Subject: mm, memory_hotplug: use node instead of zone in can_online_high_movable
Subject: mm: drop page_initialized check from get_nid_for_pfn
Subject: mm, memory_hotplug: get rid of is_zone_device_section
Subject: mm, memory_hotplug: split up register_one_node()
Subject: mm, memory_hotplug: consider offline memblocks removable
Subject: mm: consider zone which is not fully populated to have holes
Subject: mm, compaction: skip over holes in __reset_isolation_suitable
Subject: mm: __first_valid_page skip over offline pages
Subject: mm, vmstat: skip reporting offline pages in pagetypeinfo
Subject: mm, memory_hotplug: do not associate hotadded memory to zones until online
Subject: mm, memory_hotplug: fix MMOP_ONLINE_KEEP behavior
Subject: mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone
Subject: mm, memory_hotplug: replace for_device by want_memblock in arch_add_memory
Subject: mm, memory_hotplug: fix the section mismatch warning
Subject: mm, memory_hotplug: remove unused cruft after memory hotplug rework
Subject: kernel/exit.c: don't include unused userfaultfd_k.h
Subject: fs/userfaultfd.c: drop dead code
Subject: mm/madvise: enable (soft|hard) offline of HugeTLB pages at PGD level
Subject: mm/hugetlb/migration: use set_huge_pte_at instead of set_pte_at
Subject: mm/follow_page_mask: split follow_page_mask to smaller functions.
Subject: mm/hugetlb: export hugetlb_entry_migration helper
Subject: mm/follow_page_mask: add support for hugetlb pgd entries
Subject: mm/hugetlb: move default definition of hugepd_t earlier in the header
Subject: mm/follow_page_mask: add support for hugepage directory entry
Subject: powerpc/hugetlb: add follow_huge_pd implementation for ppc64
Subject: powerpc/mm/hugetlb: remove follow_huge_addr for powerpc
Subject: powerpc/hugetlb: enable hugetlb migration for ppc64
Subject: mm: zero hash tables in allocator
Subject: mm: update callers to use HASH_ZERO flag
Subject: mm: adaptive hash table scaling
Subject: mm/hugetlb: clean up ARCH_HAS_GIGANTIC_PAGE
Subject: powerpc/mm/hugetlb: add support for 1G huge pages
Subject: mm/page_alloc.c: mark bad_range() and meminit_pfn_in_nid() as __maybe_unused
Subject: mm: drop NULL return check of pte_offset_map_lock()
Subject: arm64: hugetlb: refactor find_num_contig()
Subject: arm64: hugetlb: remove spurious calls to huge_ptep_offset()
Subject: mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages
Subject: mm, gup: ensure real head page is ref-counted when using hugepages
Subject: mm/hugetlb: add size parameter to huge_pte_offset()
Subject: mm/hugetlb: allow architectures to override huge_pte_clear()
Subject: mm/hugetlb: introduce set_huge_swap_pte_at() helper
Subject: mm: rmap: use correct helper when poisoning hugepages
Subject: mm, page_alloc: fix more premature OOM due to race with cpuset update
Subject: mm, mempolicy: stop adjusting current->il_next in mpol_rebind_nodemask()
Subject: mm, page_alloc: pass preferred nid instead of zonelist to allocator
Subject: mm, mempolicy: simplify rebinding mempolicies when updating cpusets
Subject: mm, cpuset: always use seqlock when changing task's nodemask
Subject: mm, mempolicy: don't check cpuset seqlock where it doesn't matter
Subject: mm: kmemleak: slightly reduce the size of some structures on 64-bit architectures
Subject: mm: kmemleak: factor object reference updating out of scan_block()
Subject: mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects
Subject: mm: per-cgroup memory reclaim stats
Subject: mm/oom_kill: count global and memory cgroup oom kills
Subject: mm/swapfile.c: sort swap entries before free
Subject: mm/zswap.c: delete an error message for a failed memory allocation in zswap_pool_create()
Subject: mm/zswap.c: improve a size determination in zswap_frontswap_init()
Subject: mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare()
Subject: mm: vmstat: move slab statistics from zone to node counters
Subject: mm: memcontrol: use the node-native slab memory counters
Subject: mm: memcontrol: use generic mod_memcg_page_state for kmem pages
Subject: mm: memcontrol: per-lruvec stats infrastructure
Subject: mm: memcontrol: account slab stats per lruvec
Subject: mm, memory_hotplug: drop artificial restriction on online/offline
Subject: mm, memory_hotplug: drop CONFIG_MOVABLE_NODE
Subject: mm, memory_hotplug: move movable_node to the hotplug proper
* incoming
@ 2017-06-23 22:08 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-06-23 22:08 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
8 fixes, based on a38371cba67539ce6a5d5324db34bc2ddaf66cc1:
Subject: mm, thp: remove cond_resched from __collapse_huge_page_copy
Subject: mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings
Subject: autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL
Subject: fs/dax.c: fix inefficiency in dax_writeback_mapping_range()
Subject: lib/cmdline.c: fix get_options() overflow while parsing ranges
Subject: slub: make sysfs file removal asynchronous
Subject: ocfs2: fix deadlock caused by recursive locking in xattr
Subject: fs/exec.c: account for argv/envp pointers
* incoming
@ 2017-06-16 21:02 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-06-16 21:02 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
5 fixes, based on ab2789b72df3cf7a01e30636ea86cbbf44ba2e99:
Subject: mm/memory-failure.c: use compound_head() flags for huge pages
Subject: swap: cond_resched in swap_cgroup_prepare()
Subject: mm: numa: avoid waiting on freed migrated pages
Subject: userfaultfd: shmem: handle coredumping in handle_userfault()
Subject: mm: correct the comment when reclaimed pages exceed the scanned pages
* incoming
@ 2017-06-02 21:45 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-06-02 21:45 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
15 fixes, based on c531577bcdac51225f50033e0c89644873f4dc6d:
Subject: ksm: prevent crash after write_protect_page fails
Subject: include/linux/gfp.h: fix ___GFP_NOLOCKDEP value
Subject: frv: declare jiffies to be located in the .data section
Subject: mm: clarify why we want kmalloc before falling backto vmallock
Subject: initramfs: fix disabling of initramfs (and its compression)
Subject: slub/memcg: cure the brainless abuse of sysfs attributes
Subject: pcmcia: remove left-over %Z format
Subject: mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once
Subject: mm: avoid spurious 'bad pmd' warning messages
Subject: dax: fix race between colliding PMD & PTE entries
Subject: mm/migrate: fix refcount handling when !hugepage_migration_supported()
Subject: mlock: fix mlock count can not decrease in race condition
Subject: mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified
Subject: mm: consider memblock reservations for deferred memory initialization sizing
Subject: scripts/gdb: make lx-dmesg command work (reliably)
* incoming
@ 2017-05-12 22:45 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-05-12 22:45 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
15 fixes, based on deac8429d62ca19c1571853e2a18f60e760ee04c:
Subject: hwpoison, memcg: forcibly uncharge LRU pages
Subject: time: delete current_fs_time()
Subject: mm, vmstat: Remove spurious WARN() during zoneinfo print
Subject: gcov: support GCC 7.1
Subject: mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin
Subject: mm, vmalloc: fix vmalloc users tracking properly
Subject: Tigran has moved
Subject: dax: prevent invalidation of mapped DAX entries
Subject: mm: fix data corruption due to stale mmap reads
Subject: ext4: return to starting transaction in ext4_dax_huge_fault()
Subject: dax: fix data corruption when fault races with write
Subject: dax: fix PMD data corruption when fault races with write
Subject: mm, thp: copying user pages must schedule on collapse
Subject: mm: vmscan: scan until it finds eligible pages
Subject: mm, docs: update memory.stat description with workingset* entries
* incoming
@ 2017-05-08 22:53 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-05-08 22:53 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- the rest of MM
- various misc things
- procfs updates
- lib/ updates
- checkpatch updates
- kdump/kexec updates
- add kvmalloc helpers, use them
- time helper updates for Y2038 issues. We're almost ready to remove
current_fs_time() but that awaits a btrfs merge.
- add tracepoints to DAX.
114 patches, based on 13e0988140374123bead1dd27c287354cb95108e:
Subject: mm, compaction: reorder fields in struct compact_control
Subject: mm, compaction: remove redundant watermark check in compact_finished()
Subject: mm, page_alloc: split smallest stolen page in fallback
Subject: mm, page_alloc: count movable pages when stealing from pageblock
Subject: mm, compaction: change migrate_async_suitable() to suitable_migration_source()
Subject: mm, compaction: add migratetype to compact_control
Subject: mm, compaction: restrict async compaction to pageblocks of same migratetype
Subject: mm, compaction: finish whole pageblock to reduce fragmentation
Subject: fs/proc/inode.c: remove cast from memory allocation
Subject: proc/sysctl: fix the int overflow for jiffies conversion
Subject: drivers/virt/fsl_hypervisor.c: use get_user_pages_unlocked()
Subject: jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp
Subject: make help: add tools help target
Subject: kernel/hung_task.c: defer showing held locks
Subject: drivers/misc/vmw_vmci/vmci_queue_pair.c: fix a couple integer overflow tests
Subject: drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR()
Subject: Revert "lib/test_sort.c: make it explicitly non-modular"
Subject: lib: add module support to array-based sort tests
Subject: lib: add module support to linked list sorting tests
Subject: firmware/Makefile: force recompilation if makefile changes
Subject: checkpatch: remove obsolete CONFIG_EXPERIMENTAL checks
Subject: checkpatch: add ability to find bad uses of vsprintf %p<foo> extensions
Subject: checkpatch: improve EMBEDDED_FUNCTION_NAME test
Subject: checkpatch: allow space leading blank lines in email headers
Subject: checkpatch: avoid suggesting struct definitions should be const
Subject: checkpatch: improve MULTISTATEMENT_MACRO_USE_DO_WHILE test
Subject: checkpatch: clarify the EMBEDDED_FUNCTION_NAME message
Subject: checkpatch: special audit for revert commit line
Subject: checkpatch: improve k.alloc with multiplication and sizeof test
Subject: checkpatch: add --typedefsfile
Subject: checkpatch: improve the embedded function name test for patch contexts
Subject: checkpatch: improve the SUSPECT_CODE_INDENT test
Subject: reiserfs: use designated initializers
Subject: fork: free vmapped stacks in cache when cpus are offline
Subject: cpumask: make "nr_cpumask_bits" unsigned
Subject: crash: move crashkernel parsing and vmcore related code under CONFIG_CRASH_CORE
Subject: ia64: reuse append_elf_note() and final_note() functions
Subject: powerpc/fadump: remove dependency with CONFIG_KEXEC
Subject: powerpc/fadump: reuse crashkernel parameter for fadump memory reservation
Subject: powerpc/fadump: update documentation about crashkernel parameter reuse
Subject: pidns: disable pid allocation if pid_ns_prepare_proc() is failed in alloc_pid()
Subject: ns: allow ns_entries to have custom symlink content
Subject: pidns: expose task pid_ns_for_children to userspace
Subject: taskstats: add e/u/stime for TGID command
Subject: kcov: simplify interrupt check
Subject: lib/fault-inject.c: use correct check for interrupts
Subject: lib/zlib_inflate/inftrees.c: fix potential buffer overflow
Subject: initramfs: provide a way to ignore image provided by bootloader
Subject: initramfs: use vfs_stat/lstat directly
Subject: ipc/shm: some shmat cleanups
Subject: sysv,ipc: cacheline align kern_ipc_perm
Subject: mm: introduce kv[mz]alloc helpers
Subject: mm, vmalloc: properly track vmalloc users
Subject: mm: support __GFP_REPEAT in kvmalloc_node for >32kB
Subject: lib/rhashtable.c: simplify a strange allocation pattern
Subject: net/ipv6/ila/ila_xlat.c: simplify a strange allocation pattern
Subject: fs/xattr.c: zero out memory copied to userspace in getxattr
Subject: treewide: use kv[mz]alloc* rather than opencoded variants
Subject: net: use kvmalloc with __GFP_REPEAT rather than open coded variant
Subject: drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded variant
Subject: drivers/md/bcache/super.c: use kvmalloc
Subject: mm, swap: use kvzalloc to allocate some swap data structures
Subject: mm, vmalloc: use __GFP_HIGHMEM implicitly
Subject: scripts/spelling.txt: add "memory" pattern and fix typos
Subject: scripts/spelling.txt: add regsiter -> register spelling mistake
Subject: scripts/spelling.txt: add "intialise(d)" pattern and fix typo instances
Subject: treewide: spelling: correct diffrent[iate] and banlance typos
Subject: treewide: move set_memory_* functions away from cacheflush.h
Subject: arm: use set_memory.h header
Subject: arm64: use set_memory.h header
Subject: s390: use set_memory.h header
Subject: x86: use set_memory.h header
Subject: agp: use set_memory.h header
Subject: drm: use set_memory.h header
Subject: drivers/hwtracing/intel_th/msu.c: use set_memory.h header
Subject: drivers/watchdog/hpwdt.c: use set_memory.h header
Subject: include/linux/filter.h: use set_memory.h header
Subject: kernel/module.c: use set_memory.h header
Subject: kernel/power/snapshot.c: use set_memory.h header
Subject: alsa: use set_memory.h header
Subject: drivers/misc/sram-exec.c: use set_memory.h header
Subject: drivers/video/fbdev/vermilion/vermilion.c: use set_memory.h header
Subject: drivers/staging/media/atomisp/pci/atomisp2: use set_memory.h
Subject: treewide: decouple cacheflush.h and set_memory.h
Subject: kref: remove WARN_ON for NULL release functions
Subject: drivers/scsi/megaraid: remove expensive inline from megasas_return_cmd
Subject: include/linux/uaccess.h: remove expensive WARN_ON in pagefault_disabled_dec
Subject: fs: semove set but not checked AOP_FLAG_UNINTERRUPTIBLE flag
Subject: Documentation/vm/transhuge.txt: fix trivial typos
Subject: format-security: move static strings to const
Subject: fs: f2fs: use ktime_get_real_seconds for sit_info times
Subject: trace: make trace_hwlat timestamp y2038 safe
Subject: fs: cifs: replace CURRENT_TIME by other appropriate apis
Subject: fs: ceph: CURRENT_TIME with ktime_get_real_ts()
Subject: fs: ufs: use ktime_get_real_ts64() for birthtime
Subject: fs: ubifs: replace CURRENT_TIME_SEC with current_time
Subject: lustre: replace CURRENT_TIME macro
Subject: apparmorfs: replace CURRENT_TIME with current_time()
Subject: gfs2: replace CURRENT_TIME with current_time
Subject: time: delete CURRENT_TIME_SEC and CURRENT_TIME
Subject: mm/huge_memory.c: use zap_deposited_table() more
Subject: mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required
Subject: mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC
Subject: mm: introduce memalloc_noreclaim_{save,restore}
Subject: treewide: convert PF_MEMALLOC manipulations to new helpers
Subject: mtd: nand: nandsim: convert to memalloc_noreclaim_*()
Subject: dax: add tracepoints to dax_iomap_pte_fault()
Subject: dax: add tracepoints to dax_pfn_mkwrite()
Subject: dax: add tracepoints to dax_load_hole()
Subject: dax: add tracepoints to dax_writeback_mapping_range()
Subject: dax: add tracepoint to dax_writeback_one()
Subject: dax: add tracepoint to dax_insert_mapping()
Subject: selftests/vm: add a test for virtual address range mapping
Subject: drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4
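The memalloc_noreclaim_{save,restore} patches above replace open-coded setting and clearing of PF_MEMALLOC with a save/restore pair. The following userspace model in C sketches why that matters for nested sections. Everything here is a stand-in: struct model_task, the model_* functions and MODEL_PF_MEMALLOC are hypothetical names, and the flag value is chosen only for illustration.

```c
/* Illustrative flag bit; stands in for the kernel's PF_MEMALLOC. */
#define MODEL_PF_MEMALLOC 0x0800u

/* Hypothetical stand-in for the current task's flags word. */
struct model_task {
	unsigned int flags;
};

/* Set the flag and return its previous state, so the caller can put
 * things back exactly as they were. */
static unsigned int model_noreclaim_save(struct model_task *t)
{
	unsigned int old = t->flags & MODEL_PF_MEMALLOC;

	t->flags |= MODEL_PF_MEMALLOC;
	return old;
}

/* Restore the flag to the saved state instead of clearing it blindly. */
static void model_noreclaim_restore(struct model_task *t, unsigned int old)
{
	t->flags = (t->flags & ~MODEL_PF_MEMALLOC) | old;
}

/* Nested use: the inner section must not clear a flag the outer one set.
 * Returns the flag state observed between the two restores. */
static unsigned int model_nested_sections(struct model_task *t)
{
	unsigned int outer = model_noreclaim_save(t);
	unsigned int inner = model_noreclaim_save(t);
	unsigned int seen_after_inner;

	model_noreclaim_restore(t, inner);
	seen_after_inner = t->flags & MODEL_PF_MEMALLOC;	/* still set */
	model_noreclaim_restore(t, outer);
	return seen_after_inner;
}
```

In this model, an inner section that simply cleared the bit on exit would leave the outer section running without it. Returning the old state from the save step is what keeps nesting safe, and unrelated flag bits are left untouched throughout.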
* incoming
@ 2017-05-03 21:50 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2017-05-03 21:50 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- a few misc things
- most of MM
- KASAN updates
102 patches, based on 46f0537b1ecf672052007c97f102a7e6bf0791e4:
Subject: lib/dma-debug.c: make locking work for RT
Subject: scripts/spelling.txt: add several more common spelling mistakes
Subject: blackfin: bf609: let clk_disable() return immediately if clk is NULL
Subject: fs/ocfs2/cluster: use setup_timer
Subject: ocfs2: o2hb: revert hb threshold to keep compatible
Subject: fs/ocfs2/cluster: use offset_in_page() macro
Subject: slab: avoid IPIs when creating kmem caches
Subject: mm: fix 100% CPU kswapd busyloop on unreclaimable nodes
Subject: mm: fix check for reclaimable pages in PF_MEMALLOC reclaim throttling
Subject: mm: remove seemingly spurious reclaimability check from laptop_mode gating
Subject: mm: remove unnecessary reclaimability check from NUMA balancing target
Subject: mm: don't avoid high-priority reclaim on unreclaimable nodes
Subject: mm: don't avoid high-priority reclaim on memcg limit reclaim
Subject: mm: delete NR_PAGES_SCANNED and pgdat_reclaimable()
Subject: Revert "mm, vmscan: account for skipped pages as a partial scan"
Subject: mm: remove unnecessary back-off function when retrying page reclaim
Subject: mm/page-writeback.c: use setup_deferrable_timer
Subject: mm: delete unnecessary TTU_* flags
Subject: mm: don't assume anonymous pages have SwapBacked flag
Subject: mm: move MADV_FREE pages into LRU_INACTIVE_FILE list
Subject: mm: reclaim MADV_FREE pages
Subject: mm: fix lazyfree BUG_ON check in try_to_unmap_one()
Subject: mm: enable MADV_FREE for swapless system
Subject: proc: show MADV_FREE pages info in smaps
Subject: mm: memcontrol: provide shmem statistics
Subject: mm, swap: Fix a race in free_swap_and_cache()
Subject: mm: use is_migrate_highatomic() to simplify the code
Subject: mm: use is_migrate_isolate_page() to simplify the code
Subject: mm, vmstat: print non-populated zones in zoneinfo
Subject: mm, vmstat: suppress pcp stats for unpopulated zones in zoneinfo
Subject: lockdep: teach lockdep about memalloc_noio_save
Subject: lockdep: allow to disable reclaim lockup detection
Subject: xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS
Subject: mm: introduce memalloc_nofs_{save,restore} API
Subject: xfs: use memalloc_nofs_{save,restore} instead of memalloc_noio*
Subject: jbd2: mark the transaction context with the scope GFP_NOFS context
Subject: jbd2: make the whole kjournald2 kthread NOFS safe
Subject: mm: tighten up the fault path a little
Subject: mm: remove rodata_test_data export, add pr_fmt
Subject: mm: do not use double negation for testing page flags
Subject: mm, vmscan: fix zone balance check in prepare_kswapd_sleep
Subject: mm, vmscan: only clear pgdat congested/dirty/writeback state when balanced
Subject: mm, vmscan: prevent kswapd sleeping prematurely due to mismatched classzone_idx
Subject: mm: page_alloc: __GFP_NOWARN shouldn't suppress stall warnings
Subject: mm/sparse: refine usemap_size() a little
Subject: mm/compaction: ignore block suitable after check large free page
Subject: mm/vmscan: more restrictive condition for retry in do_try_to_free_pages
Subject: mm: remove unncessary ret in page_referenced
Subject: mm: remove SWAP_DIRTY in ttu
Subject: mm: remove SWAP_MLOCK check for SWAP_SUCCESS in ttu
Subject: mm: make try_to_munlock() return void
Subject: mm: remove SWAP_MLOCK in ttu
Subject: mm: remove SWAP_AGAIN in ttu
Subject: mm: make ttu's return boolean
Subject: mm: make rmap_walk() return void
Subject: mm: make rmap_one boolean function
Subject: mm: remove SWAP_[SUCCESS|AGAIN|FAIL]
Subject: mm, swap: fix comment in __read_swap_cache_async
Subject: mm, swap: improve readability via make spin_lock/unlock balanced
Subject: mm, swap: avoid lock swap_avail_lock when held cluster lock
Subject: mm: enable page poisoning early at boot
Subject: include/linux/migrate.h: add arg names to prototype
Subject: mm/swap_slots.c: add warning if swap slots cache failed to initialize
Subject: mm: fix spelling error
Subject: userfaultfd: selftest: combine all cases into a single executable
Subject: oom: improve oom disable handling
Subject: mm/mmap: replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff
Subject: mm: vmscan: fix IO/refault regression in cache workingset transition
Subject: mm: memcontrol: clean up memory.events counting function
Subject: mm: memcontrol: re-use global VM event enum
Subject: mm: memcontrol: re-use node VM page state enum
Subject: mm: memcontrol: use node page state naming scheme for memcg
Subject: mm, swap: remove unused function prototype
Subject: Documentation: vm, add hugetlbfs reservation overview
Subject: mm/madvise.c: clean up MADV_SOFT_OFFLINE and MADV_HWPOISON
Subject: mm/madvise: move up the behavior parameter validation
Subject: mm/memory-failure.c: add page flag description in error paths
Subject: mm, page_alloc: remove debug_guardpage_minorder() test in warn_alloc()
Subject: zram: handle multiple pages attached bio's bvec
Subject: zram: partial IO refactoring
Subject: zram: use zram_slot_lock instead of raw bit_spin_lock op
Subject: zram: remove zram_meta structure
Subject: zram: introduce zram data accessor
Subject: zram: use zram_free_page instead of open-coded
Subject: zram: reduce load operation in page_same_filled
Subject: fs: fix data invalidation in the cleancache during direct IO
Subject: fs/block_dev: always invalidate cleancache in invalidate_bdev()
Subject: mm/truncate: bail out early from invalidate_inode_pages2_range() if mapping is empty
Subject: mm/truncate: avoid pointless cleancache_invalidate_inode() calls.
Subject: mm/gup.c: fix access_ok() argument type
Subject: mm/swapfile.c: fix swap space leak in error path of swap_free_entries()
Subject: mm: hwpoison: call shake_page() unconditionally
Subject: mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page
Subject: kasan: introduce helper functions for determining bug type
Subject: kasan: unify report headers
Subject: kasan: change allocation and freeing stack traces headers
Subject: kasan: simplify address description logic
Subject: kasan: change report header
Subject: kasan: improve slab object description
Subject: kasan: print page description after stacks
Subject: kasan: improve double-free report format
Subject: kasan: separate report parts by empty lines
* incoming
@ 2017-04-20 21:37 Andrew Morton
From: Andrew Morton @ 2017-04-20 21:37 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
2 fixes, based on f61143c45077df4fa78e2f1ba455a00bbe1d5b8c:
Subject: Revert "mm, page_alloc: only use per-cpu allocator for irq-safe requests"
Subject: mm: prevent NR_ISOLATE_* stats from going negative
* incoming
@ 2017-04-13 21:56 Andrew Morton
From: Andrew Morton @ 2017-04-13 21:56 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
11 fixes, based on 2760078203a6b46b96307f4b06030ab0b801c97e:
Subject: z3fold: fix page locking in z3fold_alloc()
Subject: thp: reduce indentation level in change_huge_pmd()
Subject: thp: fix MADV_DONTNEED vs. numa balancing race
Subject: mm: drop unused pmdp_huge_get_and_clear_notify()
Subject: thp: fix MADV_DONTNEED vs. MADV_FREE race
Subject: thp: fix MADV_DONTNEED vs clear soft dirty race
Subject: hugetlbfs: fix offset overflow in hugetlbfs mmap
Subject: zram: fix operator precedence to get offset
Subject: zram: do not use copy_page with non-page aligned address
Subject: zsmalloc: expand class bit
Subject: mailmap: add Martin Kepplinger's email
The presence of "thp: reduce indentation level in change_huge_pmd()" is
unfortunate. But the patchset had been decently reviewed and tested
before we decided it was needed in -stable and I felt it best not to
churn things at the last minute.
* incoming
@ 2017-04-07 23:04 Andrew Morton
From: Andrew Morton @ 2017-04-07 23:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
10 fixes, based on 81d4bab4ce87228c37ab14a885438544af5c9ce6:
Subject: mm: fix page_vma_mapped_walk() for ksm pages
Subject: userfaultfd: report actual registered features in fdinfo
Subject: mm/page_alloc.c: fix print order in show_free_areas()
Subject: vmlinux.lds: add missing VMLINUX_SYMBOL macros
Subject: ptrace: fix PTRACE_LISTEN race corrupting task->state
Subject: mm, thp: fix setting of defer+madvise thp defrag mode
Subject: dax: fix radix tree insertion race
Subject: mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
Subject: mailmap: update Yakir Yang email address
Subject: mm: move pcp and lru-pcp draining into single wq
* incoming
@ 2017-03-31 22:11 Andrew Morton
From: Andrew Morton @ 2017-03-31 22:11 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
11 fixes, based on d4562267b995fa3917717cc7773dad9c1f1ca658:
Subject: mm: migrate: fix remove_migration_pte() for ksm pages
Subject: mm: move mm_percpu_wq initialization earlier
Subject: mm: rmap: fix huge file mmap accounting in the memcg stats
Subject: mm: workingset: fix premature shadow node shrinking with cgroups
Subject: mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()
Subject: mm: fix section name for .data..ro_after_init
Subject: hugetlbfs: initialize shared policy as part of inode allocation
Subject: kasan: report only the first error by default
Subject: mm/hugetlb.c: don't call region_abort if region_chg fails
Subject: drivers/rapidio/devices/tsi721.c: make module parameter variable name unique
Subject: kasan: do not sanitize kexec purgatory
* incoming
@ 2017-03-16 23:40 Andrew Morton
From: Andrew Morton @ 2017-03-16 23:40 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
6 fixes, based on 69eea5a4ab9c705496e912b55a9d312325de19e6:
Subject: z3fold: fix spinlock unlocking in page reclaim
Subject: kasan: add a prototype of task_struct to avoid warning
Subject: mm, x86: fix native_pud_clear build error
Subject: mm: don't warn when vmalloc() fails due to a fatal signal
Subject: mm: add private lock to serialize memory hotplug operations
Subject: drivers core: remove assert_held_device_hotplug()
* incoming
@ 2017-03-10 0:15 Andrew Morton
From: Andrew Morton @ 2017-03-10 0:15 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
26 fixes, based on ea6200e84182989a3cce9687cf79a23ac44ec4db:
Subject: userfaultfd: shmem: __do_fault requires VM_FAULT_NOPAGE
Subject: scripts/spelling.txt: add "disble(d)" pattern and fix typo instances
Subject: scripts/spelling.txt: add "overide" pattern and fix typo instances
Subject: powerpc/mm: handle protnone ptes on fork
Subject: power/mm: update pte_write and pte_wrprotect to handle savedwrite
Subject: x86, mm: fix gup_pte_range() vs DAX mappings
Subject: x86, mm: unify exit paths in gup_pte_range()
Subject: userfaultfd: non-cooperative: rollback userfaultfd_exit
Subject: userfaultfd: non-cooperative: robustness check
Subject: userfaultfd: non-cooperative: release all ctx in dup_userfaultfd_complete
Subject: include/linux/fs.h: fix unsigned enum warning with gcc-4.2
Subject: mm/vmstats: add thp_split_pud event for clarity
Subject: drivers/md/bcache/util.h: remove duplicate inclusion of blkdev.h
Subject: mm/cgroup: avoid panic when init with low memory
Subject: userfaultfd: non-cooperative: fix fork fctx->new memleak
Subject: userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED
Subject: userfaultfd: selftest: vm: allow to build in vm/ directory
Subject: mm/memblock.c: fix memblock_next_valid_pfn()
Subject: rmap: fix NULL-pointer dereference on THP munlocking
Subject: thp: fix another corner case of munlock() vs. THPs
Subject: mm: do not call mem_cgroup_free() from within mem_cgroup_alloc()
Subject: kasan: resched in quarantine_remove_cache()
Subject: kasan: fix races in quarantine_remove_cache()
Subject: sh: cayman: IDE support fix
Subject: fat: fix using uninitialized fields of fat_inode/fsinfo_inode
Subject: userfaultfd: remove wrong comment from userfaultfd_ctx_get()
* incoming
@ 2017-02-27 22:25 Andrew Morton
From: Andrew Morton @ 2017-02-27 22:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- a few MM remainders
- misc things
- autofs updates
- signals
- affs updates
- ipc
- nilfs2
- spelling.txt updates
78 patches, based on e5d56efc97f8240d0b5d66c03949382b6d7e5570:
Subject: mm,fs,dax: mark dax_iomap_pmd_fault as const
Subject: zswap: allow initialization at boot without pool
Subject: zswap: clear compressor or zpool param if invalid at init
Subject: zswap: don't param_set_charp while holding spinlock
Subject: kprobes: move kprobe declarations to asm-generic/kprobes.h
Subject: autofs: remove wrong comment
Subject: autofs: fix typo in Documentation
Subject: autofs: fix wrong ioctl documentation regarding devid
Subject: autofs: update ioctl documentation regarding struct autofs_dev_ioctl
Subject: autofs: add command enum/macros for root-dir ioctls
Subject: autofs: remove duplicated AUTOFS_DEV_IOCTL_SIZE definition
Subject: autofs: take more care to not update last_used on path walk
Subject: hfsplus: atomically read inode size
Subject: fs/reiserfs: atomically read inode size
Subject: sigaltstack: support SS_AUTODISARM for CONFIG_COMPAT
Subject: tools/testing/selftests/sigaltstack/sas.c: improve output of sigaltstack testcase
Subject: /proc/kcore: update physical address for kcore ram and text
Subject: rapidio: use get_user_pages_unlocked()
Subject: include/linux/pid.h: use for_each_thread() in do_each_pid_thread()
Subject: fs,eventpoll: Don't test for bitfield with stack value
Subject: fs/affs: remove reference to affs_parent_ino()
Subject: fs/affs: add validation block function
Subject: fs/affs: make affs exportable
Subject: fs/affs: use octal for permissions
Subject: fs/affs: add prefix to some functions
Subject: fs/affs/namei.c: forward declarations clean-up
Subject: fs/affs: make export work with cold dcache
Subject: config: android-recommended: disable aio support
Subject: config: android-base: enable hardened usercopy and kernel ASLR
Subject: lib/fonts/Kconfig: keep non-Sparc fonts listed together
Subject: initramfs: finish fput() before accessing any binary from initramfs
Subject: ipc/sem.c: avoid using spin_unlock_wait()
Subject: ipc/sem: add hysteresis
Subject: ipc/mqueue: add missing sparse annotation
Subject: ipc/shm: Fix shmat mmap nil-page protection
Subject: scatterlist: reorder compound boolean expression
Subject: scatterlist: do not disable IRQs in sg_copy_buffer
Subject: fs: add i_blocksize()
Subject: nilfs2: use nilfs_btree_node_size()
Subject: nilfs2: use i_blocksize()
Subject: scripts/spelling.txt: add "swith" pattern and fix typo instances
Subject: scripts/spelling.txt: add "swithc" pattern and fix typo instances
Subject: scripts/spelling.txt: add "an user" pattern and fix typo instances
Subject: scripts/spelling.txt: add "an union" pattern and fix typo instances
Subject: scripts/spelling.txt: add "an one" pattern and fix typo instances
Subject: scripts/spelling.txt: add "partiton" pattern and fix typo instances
Subject: scripts/spelling.txt: add "aligment" pattern and fix typo instances
Subject: scripts/spelling.txt: add "algined" pattern and fix typo instances
Subject: scripts/spelling.txt: add "efective" pattern and fix typo instances
Subject: scripts/spelling.txt: add "varible" pattern and fix typo instances
Subject: scripts/spelling.txt: add "embeded" pattern and fix typo instances
Subject: scripts/spelling.txt: add "againt" pattern and fix typo instances
Subject: scripts/spelling.txt: add "neded" pattern and fix typo instances
Subject: scripts/spelling.txt: add "unneded" pattern and fix typo instances
Subject: scripts/spelling.txt: add "intialization" pattern and fix typo instances
Subject: scripts/spelling.txt: add "initialiazation" pattern and fix typo instances
Subject: scripts/spelling.txt: add "comsume(r)" pattern and fix typo instances
Subject: scripts/spelling.txt: add "overrided" pattern and fix typo instances
Subject: scripts/spelling.txt: add "configuartion" pattern and fix typo instances
Subject: scripts/spelling.txt: add "applys" pattern and fix typo instances
Subject: scripts/spelling.txt: add "explictely" pattern and fix typo instances
Subject: scripts/spelling.txt: add "omited" pattern and fix typo instances
Subject: scripts/spelling.txt: add "disassocation" pattern and fix typo instances
Subject: scripts/spelling.txt: add "deintialize(d)" pattern and fix typo instances
Subject: scripts/spelling.txt: add "overwritting" pattern and fix typo instances
Subject: scripts/spelling.txt: add "overwriten" pattern and fix typo instances
Subject: scripts/spelling.txt: add "therfore" pattern and fix typo instances
Subject: scripts/spelling.txt: add "followings" pattern and fix typo instances
Subject: scripts/spelling.txt: add some typo-words
Subject: lib/vsprintf.c: remove %Z support
Subject: checkpatch: warn when formats use %Z and suggest %z
Subject: mm: add new mmgrab() helper
Subject: mm: add new mmget() helper
Subject: mm: use mmget_not_zero() helper
Subject: mm: clarify mm_struct.mm_{users,count} documentation
Subject: hfs: atomically read inode size
Subject: mm: add arch-independent testcases for RODATA
Subject: mm, x86: fix HIGHMEM64 && PARAVIRT build config for native_pud_clear()
* incoming
@ 2017-02-24 22:55 Andrew Morton
From: Andrew Morton @ 2017-02-24 22:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- almost all of the rest of MM
- misc bits
- KASAN updates
- procfs
- lib/ updates
- checkpatch updates
124 patches, based on f1ef09fde17f9b77ca1435a5b53a28b203afb81c:
Subject: cris: use generic current.h
Subject: mm/ksm: improve deduplication of zero pages with colouring
Subject: mm, oom: header nodemask is NULL when cpusets are disabled
Subject: mm, devm_memremap_pages: hold device_hotplug lock over mem_hotplug_{begin, done}
Subject: mm: validate device_hotplug is held for memory hotplug
Subject: mm/memory_hotplug.c: unexport __remove_pages()
Subject: memblock: let memblock_type_name know about physmem type
Subject: memblock: also dump physmem list within __memblock_dump_all
Subject: memblock: embed memblock type name within struct memblock_type
Subject: userfaultfd: non-cooperative: rename *EVENT_MADVDONTNEED to *EVENT_REMOVE
Subject: userfaultfd: non-cooperative: add madvise() event for MADV_REMOVE request
Subject: userfaultfd: non-cooperative: selftest: enable REMOVE event test for shmem
Subject: mm: vmscan: scan dirty pages even in laptop mode
Subject: mm: vmscan: kick flushers when we encounter dirty pages on the LRU
Subject: mm: vmscan: remove old flusher wakeup from direct reclaim path
Subject: mm: vmscan: only write dirty pages that the scanner has seen twice
Subject: mm: vmscan: move dirty pages out of the way until they're flushed
Subject: mm, page_alloc: split buffered_rmqueue()
Subject: mm, page_alloc: split alloc_pages_nodemask()
Subject: mm, page_alloc: drain per-cpu pages from workqueue context
Subject: mm, page_alloc: do not depend on cpu hotplug locks inside the allocator
Subject: mm, page_alloc: only use per-cpu allocator for irq-safe requests
Subject: mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf
Subject: mm: fix comments for mmap_init()
Subject: zram: remove waitqueue for IO done
Subject: mm, page_alloc: remove redundant checks from alloc fastpath
Subject: mm, page_alloc: don't check cpuset allowed twice in fast-path
Subject: mm, page_alloc: use static global work_struct for draining per-cpu pages
Subject: mm,fs,dax: change ->pmd_fault to ->huge_fault
Subject: mm, x86: add support for PUD-sized transparent hugepages
Subject: dax: support for transparent PUD pages for device DAX
Subject: mm: replace FAULT_FLAG_SIZE with parameter to huge_fault
Subject: mm: fix get_user_pages() vs device-dax pud mappings
Subject: z3fold: make pages_nr atomic
Subject: z3fold: fix header size related issues
Subject: z3fold: extend compaction function
Subject: z3fold: use per-page spinlock
Subject: z3fold: add kref refcounting
Subject: mm/migration: make isolate_movable_page() return int type
Subject: mm/migration: make isolate_movable_page always defined
Subject: HWPOISON: soft offlining for non-lru movable page
Subject: mm/hotplug: enable memory hotplug for non-lru movable pages
Subject: uprobes: split THPs before trying to replace them
Subject: mm: introduce page_vma_mapped_walk()
Subject: mm: fix handling PTE-mapped THPs in page_referenced()
Subject: mm: fix handling PTE-mapped THPs in page_idle_clear_pte_refs()
Subject: mm, rmap: check all VMAs that PTE-mapped THP can be part of
Subject: mm: convert page_mkclean_one() to use page_vma_mapped_walk()
Subject: mm: convert try_to_unmap_one() to use page_vma_mapped_walk()
Subject: mm, ksm: convert write_protect_page() to use page_vma_mapped_walk()
Subject: mm, uprobes: convert __replace_page() to use page_vma_mapped_walk()
Subject: mm: convert page_mapped_in_vma() to use page_vma_mapped_walk()
Subject: mm: drop page_check_address{,_transhuge}
Subject: mm: convert remove_migration_pte() to use page_vma_mapped_walk()
Subject: mm: call vm_munmap in munmap syscall instead of using open coded version
Subject: userfaultfd: non-cooperative: add event for memory unmaps
Subject: userfaultfd: non-cooperative: add event for exit() notification
Subject: userfaultfd: mcopy_atomic: return -ENOENT when no compatible VMA found
Subject: userfaultfd_copy: return -ENOSPC in case mm has gone
Subject: userfaultfd: documentation update
Subject: mm: alloc_contig_range: allow to specify GFP mask
Subject: mm: cma_alloc: allow to specify GFP mask
Subject: mm: wire up GFP flag passing in dma_alloc_from_contiguous
Subject: mm, madvise: fail with ENOMEM when splitting vma will hit max_map_count
Subject: mm: cma: print allocation failure reason and bitmap status
Subject: vmalloc: back off when the current task is killed
Subject: mm/page_alloc.c: remove duplicate inclusion of page_ext.h
Subject: mm/memory.c: use NULL instead of literal 0
Subject: mm: codgin-style fixes
Subject: drm: remove unnecessary fault wrappers
Subject: mm, vmscan: clear PGDAT_WRITEBACK when zone is balanced
Subject: mm/shmem.c: fix unlikely() test of info->seals to test only for WRITE and GROW
Subject: mm/autonuma: don't use set_pte_at when updating protnone ptes
Subject: mm/autonuma: let architecture override how the write bit should be stashed in a protnone pte.
Subject: mm/ksm: handle protnone saved writes when making page write protect
Subject: powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write
Subject: mm/page-writeback.c: place "not" inside of unlikely() statement in wb_domain_writeout_inc()
Subject: zram: extend zero pages to same element pages
Subject: mm/memory_hotplug.c: fix overflow in test_pages_in_a_zone()
Subject: mm/page_alloc: fix nodes for reclaim in fast path
Subject: mm: remove shmem_mapping() shmem_zero_setup() duplicates
Subject: mm: vmpressure: fix sending wrong events on underflow
Subject: mm/zsmalloc: remove redundant SetPagePrivate2 in create_page_chain
Subject: mm/page_alloc.c: remove redundant init code for ZONE_MOVABLE
Subject: mm/zsmalloc: fix comment in zsmalloc
Subject: mm: cleanups for printing phys_addr_t and dma_addr_t
Subject: mm/gup: check for protnone only if it is a PTE entry
Subject: mm/thp/autonuma: use TNF flag instead of vm fault
Subject: mm: do not access page->mapping directly on page_endio
Subject: memory-hotplug: use dev_online for memhp_auto_online
Subject: kasan: drain quarantine of memcg slab objects
Subject: kasan: add memcg kmem_cache test
Subject: arch/frv/mb93090-mb00/pci-frv.c: fix build warning
Subject: alpha: use generic current.h
Subject: proc: use rb_entry()
Subject: proc: less code duplication in /proc/*/cmdline
Subject: procfs: use an enum for possible hidepid values
Subject: uapi: mqueue.h: add missing linux/types.h include
Subject: include/linux/iopoll.h: include <linux/ktime.h> instead of <linux/hrtimer.h>
Subject: compiler-gcc.h: add a new macro to wrap gcc attribute
Subject: m68k: replace gcc specific macros with ones from compiler.h
Subject: bug: switch data corruption check to __must_check
Subject: mm balloon: umount balloon_mnt when removing vb device
Subject: kernel/notifier.c: simplify expression
Subject: kernel/ksysfs.c: add __ro_after_init to bin_attribute structure
Subject: lib: add module support to crc32 tests
Subject: lib: add module support to glob tests
Subject: lib: add module support to atomic64 tests
Subject: lib/find_bit.c: micro-optimise find_next_*_bit
Subject: linux/kernel.h: fix DIV_ROUND_CLOSEST to support negative divisors
Subject: rbtree: use designated initializers
Subject: lib: add CONFIG_TEST_SORT to enable self-test of sort()
Subject: lib/test_sort.c: make it explicitly non-modular
Subject: lib: update LZ4 compressor module
Subject: lib/decompress_unlz4: change module to work with new LZ4 module version
Subject: crypto: change LZ4 modules to work with new LZ4 module version
Subject: fs/pstore: fs/squashfs: change usage of LZ4 to work with new LZ4 version
Subject: lib/lz4: remove back-compat wrappers
Subject: checkpatch: warn on embedded function names
Subject: checkpatch: warn on logging continuations
Subject: checkpatch: update $logFunctions
Subject: checkpatch: add another old address for the FSF
Subject: checkpatch: notice unbalanced else braces in a patch
Subject: checkpatch: remove false unbalanced braces warning
* incoming
@ 2017-02-22 23:38 Andrew Morton
From: Andrew Morton @ 2017-02-22 23:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
142 patches, based on 37c85961c3f87f2141c84e53df31e59db072fd2e:
- DAX updates
- various misc bits
- OCFS2 updates
- most of MM
Subject: tracing: add __print_flags_u64()
Subject: dax: add tracepoint infrastructure, PMD tracing
Subject: dax: update MAINTAINERS entries for FS DAX
Subject: dax: add tracepoints to dax_pmd_load_hole()
Subject: dax: add tracepoints to dax_pmd_insert_mapping()
Subject: mm, dax: make pmd_fault() and friends be the same as fault()
Subject: mm, dax: change pmd_fault() to take only vmf parameter
Subject: dma-debug: add comment for failed to check map error
Subject: tools/vm: add missing Makefile rules
Subject: scripts/spelling.txt: add several more common spelling mistakes
Subject: scripts/spelling.txt: fix incorrect typo-words
Subject: scripts/Lindent: clean up and optimize
Subject: scripts/checkstack.pl: add support for nios2
Subject: scripts/checkincludes.pl: add exit message for no duplicates found
Subject: scripts/tags.sh: include arch/Kconfig* for tags generation
Subject: m32r: use generic current.h
Subject: m32r: fix build warning
Subject: score: remove asm/current.h
Subject: ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
Subject: ocfs2: fix deadlock issue when taking inode lock at vfs entry points
Subject: parisc: use generic current.h
Subject: block: use for_each_thread() in sys_ioprio_set()/sys_ioprio_get()
Subject: 9p: fix a potential acl leak
Subject: kernel/watchdog.c: do not hardcode CPU 0 as the initial thread
Subject: slub: do not merge cache if slub_debug contains a never-merge flag
Subject: mm/slub: add a dump_stack() to the unexpected GFP check
Subject: mm, slab: rename kmalloc-node cache to kmalloc-<size>
Subject: Revert "slub: move synchronize_sched out of slab_mutex on shrink"
Subject: slub: separate out sysfs_slab_release() from sysfs_slab_remove()
Subject: slab: remove synchronous rcu_barrier() call in memcg cache release path
Subject: slab: reorganize memcg_cache_params
Subject: slab: link memcg kmem_caches on their associated memory cgroup
Subject: slab: implement slab_root_caches list
Subject: slab: introduce __kmemcg_cache_deactivate()
Subject: slab: remove synchronous synchronize_sched() from memcg cache deactivation path
Subject: slab: remove slub sysfs interface files early for empty memcg caches
Subject: slab: use memcg_kmem_cache_wq for slab destruction operations
Subject: slub: make sysfs directories for memcg sub-caches optional
Subject: tmpfs: change shmem_mapping() to test shmem_aops
Subject: mm: throttle show_mem() from warn_alloc()
Subject: mm, page_alloc: don't convert pfn to idx when merging
Subject: mm, page_alloc: avoid page_to_pfn() when merging buddies
Subject: mm/vmalloc.c: use rb_entry_safe
Subject: mm, trace: extract COMPACTION_STATUS and ZONE_TYPE to a common header
Subject: oom, trace: add oom detection tracepoints
Subject: oom, trace: add compaction retry tracepoint
Subject: userfaultfd: document _IOR/_IOW
Subject: userfaultfd: correct comment about UFFD_FEATURE_PAGEFAULT_FLAG_WP
Subject: userfaultfd: convert BUG() to WARN_ON_ONCE()
Subject: userfaultfd: use vma_is_anonymous
Subject: userfaultfd: non-cooperative: Split the find_userfault() routine
Subject: userfaultfd: non-cooperative: add ability to report non-PF events from uffd descriptor
Subject: userfaultfd: non-cooperative: report all available features to userland
Subject: userfaultfd: non-cooperative: Add fork() event
Subject: userfaultfd: non-cooperative: dup_userfaultfd: use mm_count instead of mm_users
Subject: userfaultfd: non-cooperative: add mremap() event
Subject: userfaultfd: non-cooperative: optimize mremap_userfaultfd_complete()
Subject: userfaultfd: non-cooperative: add madvise() event for MADV_DONTNEED request
Subject: userfaultfd: non-cooperative: avoid MADV_DONTNEED race condition
Subject: userfaultfd: non-cooperative: wake userfaults after UFFDIO_UNREGISTER
Subject: userfaultfd: hugetlbfs: add copy_huge_page_from_user for hugetlb userfaultfd support
Subject: userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support
Subject: userfaultfd: hugetlbfs: add __mcopy_atomic_hugetlb for huge page UFFDIO_COPY
Subject: userfaultfd: hugetlbfs: fix __mcopy_atomic_hugetlb retry/error processing
Subject: userfaultfd: hugetlbfs: add userfaultfd hugetlb hook
Subject: userfaultfd: hugetlbfs: allow registration of ranges containing huge pages
Subject: userfaultfd: hugetlbfs: add userfaultfd_hugetlb test
Subject: userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges
Subject: userfaultfd: hugetlbfs: gup: support VM_FAULT_RETRY
Subject: userfaultfd: hugetlbfs: reserve count on error in __mcopy_atomic_hugetlb
Subject: userfaultfd: hugetlbfs: UFFD_FEATURE_MISSING_HUGETLBFS
Subject: userfaultfd: introduce vma_can_userfault
Subject: userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support
Subject: userfaultfd: shmem: introduce vma_is_shmem
Subject: userfaultfd: shmem: add tlbflush.h header for microblaze
Subject: userfaultfd: shmem: use shmem_mcopy_atomic_pte for shared memory
Subject: userfaultfd: shmem: add userfaultfd hook for shared memory faults
Subject: userfaultfd: shmem: allow registration of shared memory ranges
Subject: userfaultfd: hugetlbfs: add UFFDIO_COPY support for shared mappings
Subject: userfaultfd: shmem: add userfaultfd_shmem test
Subject: userfaultfd: shmem: lock the page before adding it to pagecache
Subject: userfaultfd: shmem: avoid a lockup resulting from corrupted page->flags
Subject: userfaultfd: shmem: avoid leaking blocks and used blocks in UFFDIO_COPY
Subject: userfaultfd: hugetlbfs: UFFD_FEATURE_MISSING_SHMEM
Subject: userfaultfd: non-cooperative: selftest: introduce userfaultfd_open
Subject: userfaultfd: non-cooperative: selftest: add ufd parameter to copy_page
Subject: userfaultfd: non-cooperative: selftest: add test for FORK, MADVDONTNEED and REMAP events
Subject: userfaultfd: selftest: test UFFDIO_ZEROPAGE on all memory types
Subject: mm: mprotect: use pmd_trans_unstable instead of taking the pmd_lock
Subject: mm, vmscan: remove unused mm_vmscan_memcg_isolate
Subject: mm, vmscan: add active list aging tracepoint
Subject: mm, vmscan: show the number of skipped pages in mm_vmscan_lru_isolate
Subject: mm, vmscan: show LRU name in mm_vmscan_lru_isolate tracepoint
Subject: mm, vmscan: extract shrink_page_list reclaim counters into a struct
Subject: mm, vmscan: enhance mm_vmscan_lru_shrink_inactive tracepoint
Subject: mm, vmscan: add mm_vmscan_inactive_list_is_low tracepoint
Subject: trace-vmscan-postprocess: sync with tracepoints updates
Subject: nfs: no PG_private waiters remain, remove waker
Subject: mm: un-export wake_up_page functions
Subject: mm: fix filemap.c kernel-doc warnings
Subject: mm/mmzone.c: swap likely to unlikely as code logic is different for next_zones_zonelist()
Subject: mm, compaction: add vmstats for kcompactd work
Subject: mm: page_alloc: skip over regions of invalid pfns where possible
Subject: mm,compaction: serialize waitqueue_active() checks
Subject: mm/bootmem.c: cosmetic improvement of code readability
Subject: mm: fix some typos in mm/zsmalloc.c
Subject: mm/memblock.c: trivial code refine in memblock_is_region_memory()
Subject: mm/memblock.c: check return value of memblock_reserve() in memblock_virt_alloc_internal()
Subject: mm/sparse: use page_private() to get page->private value
Subject: mm/memory_hotplug: set magic number to page->freelist instead of page->lru.next
Subject: powerpc: do not make the entire heap executable
Subject: mm/swap: fix kernel message in swap_info_get()
Subject: mm/swap: add cluster lock
Subject: mm/swap: split swap cache into 64MB trunks
Subject: mm/swap: skip readahead for unreferenced swap slots
Subject: mm/swap: allocate swap slots in batches
Subject: mm/swap: free swap slots in batch
Subject: mm/swap: add cache for swap slots allocation
Subject: mm/swap: enable swap slots cache usage
Subject: mm/swap: skip readahead only when swap slot cache is enabled
Subject: mm, thp: add new defer+madvise defrag option
Subject: mm/backing-dev.c: use rb_entry()
Subject: mm, vmscan: do not count freed pages as PGDEACTIVATE
Subject: mm, vmscan: cleanup lru size claculations
Subject: mm, vmscan: consider eligible zones in get_scan_count
Subject: Revert "mm: bail out in shrink_inactive_list()"
Subject: mm, page_alloc: do not report all nodes in show_mem
Subject: mm, page_alloc: warn_alloc print nodemask
Subject: arch, mm: remove arch specific show_mem
Subject: lib/show_mem.c: teach show_mem to work with the given nodemask
Subject: mm: consolidate GFP_NOFAIL checks in the allocator slowpath
Subject: mm, oom: do not enforce OOM killer for __GFP_NOFAIL automatically
Subject: mm: help __GFP_NOFAIL allocations which do not trigger OOM killer
Subject: mm, page_alloc: warn_alloc nodemask is NULL when cpusets are disabled
Subject: mm: drop zap_details::ignore_dirty
Subject: mm: drop zap_details::check_swap_entries
Subject: mm: drop unused argument of zap_page_range()
Subject: oom-reaper: use madvise_dontneed() logic to decide if unmap the VMA
Subject: mm/memblock.c: remove unnecessary log and clean up
Subject: zram: remove obsolete sysfs attrs
Subject: mm: fix <linux/pagemap.h> stray kernel-doc notation
Subject: mm/z3fold.c: limit first_num to the actual range of possible buddy indexes
* incoming
@ 2017-02-18 11:42 Andrew Morton
From: Andrew Morton @ 2017-02-18 11:42 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
1 fix, based on 2fe1e8a7b2f4dcac3fcb07ff06b0ae7396201fd6:
Subject: printk: use rcuidle console tracepoint
* incoming
@ 2017-02-08 22:30 Andrew Morton
From: Andrew Morton @ 2017-02-08 22:30 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
4 fixes, based on 926af6273fc683cd98cd0ce7bf0d04a02eed6742:
Subject: kernel/ucount.c: mark user_header with kmemleak_ignore()
Subject: mm: avoid returning VM_FAULT_RETRY from ->page_mkwrite handlers
Subject: cpumask: use nr_cpumask_bits for parsing functions
Subject: mm/slub.c: fix random_seq offset destruction
* incoming
@ 2017-01-24 23:17 Andrew Morton
From: Andrew Morton @ 2017-01-24 23:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
26 fixes, based on a4685d2f58e2230d4e27fb2ee581d7ea35e5d046:
Subject: memory_hotplug: make zone_can_shift() return a boolean value
Subject: mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
Subject: dax: fix build warnings with FS_DAX and !FS_IOMAP
Subject: kernel/watchdog: prevent false hardlockup on overloaded system
Subject: drivers/memstick/core/memstick.c: avoid -Wnonnull warning
Subject: userfaultfd: fix SIGBUS resulting from false rwsem wakeups
Subject: mm/slub.c: trace free objects at KERN_INFO
Subject: mm: alloc_contig: re-allow CMA to compact FS pages
Subject: proc: add a schedule point in proc_pid_readdir()
Subject: mm, memcg: do not retry precharge charges
Subject: Documentation/filesystems/proc.txt: add VmPin
Subject: radix-tree: fix private list warnings
Subject: mm/mempolicy.c: do not put mempolicy before using its nodemask
Subject: frv: add atomic64_add_unless()
Subject: fbdev: color map copying bounds checking
Subject: kernel/panic.c: add missing \n
Subject: mm, page_alloc: fix check for NULL preferred_zone
Subject: mm, page_alloc: fix fast-path race with cpuset update or removal
Subject: mm, page_alloc: move cpuset seqcount checking to slowpath
Subject: mm, page_alloc: fix premature OOM when racing with cpuset mems update
Subject: frv: add missing atomic64 operations
Subject: romfs: use different way to generate fsid for BLOCK or MTD
Subject: mn10300: fix build error of missing fpu_save()
Subject: mm: do not export ioremap_page_range symbol for external module
Subject: MAINTAINERS: add Dan Streetman to zswap maintainers
Subject: MAINTAINERS: add Dan Streetman to zbud maintainers
* incoming
@ 2017-01-11 0:57 Andrew Morton
From: Andrew Morton @ 2017-01-11 0:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
27 fixes, based on bd5d7428f5e50cc10b98cf0abc13ccac391e1e33:
The three patches

  Subject: mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
  Subject: mm: rename __page_frag functions to __page_frag_cache, drop order from drain
  Subject: mm: add documentation for page fragment APIs

aren't actually fixes. They're simple function renamings which are
nice-to-have in mainline as ongoing net development depends on them.
Subject: MAINTAINERS: remove duplicate bug filling description
Subject: dax: fix deadlock with DAX 4k holes
Subject: mm/thp/pagecache/collapse: free the pte page table on collapse for thp page cache.
Subject: mm: add follow_pte_pmd()
Subject: dax: wrprotect pmd_t in dax_mapping_entry_mkclean
Subject: mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER
Subject: bpf: do not use KMALLOC_SHIFT_MAX
Subject: ocfs2: fix crash caused by stale lvb with fsdlm plugin
Subject: mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}
Subject: mm: fix remote numa hits statistics
Subject: mm: get rid of __GFP_OTHER_NODE
Subject: lib/Kconfig.debug: fix frv build failure
Subject: ipc/sem.c: fix incorrect sem_lock pairing
Subject: mm: pmd dirty emulation in page fault handler
Subject: signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
Subject: mailmap: add codeaurora.org names for nameless email commits
Subject: mm: don't dereference struct page fields of invalid pages
Subject: mm, memcg: fix the active list aging for lowmem requests when memcg is enabled
Subject: mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
Subject: mm: rename __page_frag functions to __page_frag_cache, drop order from drain
Subject: mm: add documentation for page fragment APIs
Subject: mm: support anonymous stable page
Subject: zram: revalidate disk under init_lock
Subject: zram: support BDI_CAP_STABLE_WRITES
Subject: mm/slab.c: fix SLAB freelist randomization duplicate entries
Subject: mm/hugetlb.c: fix reservation race when freeing surplus pages
Subject: timerfd: export defines to userspace
* incoming
@ 2016-12-20 0:22 Andrew Morton
From: Andrew Morton @ 2016-12-20 0:22 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- a series to make IMA play better across kexec
- a handful of random fixes
15 patches, based on e93b1cc8a8965da137ffea0b88e5f62fa1d2a9e6:
Subject: powerpc: ima: get the kexec buffer passed by the previous kernel
Subject: ima: on soft reboot, restore the measurement list
Subject: ima: permit duplicate measurement list entries
Subject: ima: maintain memory size needed for serializing the measurement list
Subject: powerpc: ima: send the kexec buffer to the next kernel
Subject: ima: on soft reboot, save the measurement list
Subject: ima: store the builtin/custom template definitions in a list
Subject: ima: support restoring multiple template formats
Subject: ima: define a canonical binary_runtime_measurements list format
Subject: ima: platform-independent hash value
Subject: mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED
Subject: arm64: setup: introduce kaslr_offset()
Subject: kcov: make kcov work properly with KASLR enabled
Subject: ratelimit: fix WARN_ON_RATELIMIT return value
Subject: printk: fix typo in CONSOLE_LOGLEVEL_DEFAULT help text
* incoming
@ 2016-12-14 23:04 Andrew Morton
From: Andrew Morton @ 2016-12-14 23:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- a few misc things
- kexec updates
- DMA-mapping updates to better support networking DMA operations
- IPC updates
- various MM changes to improve DAX fault handling
- lots of radix-tree changes, mainly to the test suite. All leading
  up to reimplementing the IDA/IDR code to be a wrapper layer over the
  radix-tree. However the final trigger-pulling patch is held off for
  4.11.
114 patches, based on 775a2e29c3bbcf853432f47d3caa9ff8808807ad:
Subject: btrfs: better handle btrfs_printk() defaults
Subject: kernel/watchdog: use nmi registers snapshot in hardlockup handler
Subject: mm, compaction: allow compaction for GFP_NOFS requests
Subject: signals: avoid unnecessary taking of sighand->siglock
Subject: coredump: clarify "unsafe core_pattern" warning
Subject: Revert "kdump, vmcoreinfo: report memory sections virtual addresses"
Subject: kexec: export the value of phys_base instead of symbol address
Subject: kexec: add cond_resched into kimage_alloc_crash_control_pages
Subject: sysctl: add KERN_CONT to deprecated_sysctl_warning()
Subject: arch/arc: add option to skip sync on DMA mapping
Subject: arch/arm: add option to skip sync on DMA map and unmap
Subject: arch/avr32: add option to skip sync on DMA map
Subject: arch/blackfin: add option to skip sync on DMA map
Subject: arch/c6x: add option to skip sync on DMA map and unmap
Subject: arch/frv: add option to skip sync on DMA map
Subject: arch/hexagon: Add option to skip DMA sync as a part of mapping
Subject: arch/m68k: add option to skip DMA sync as a part of mapping
Subject: arch/metag: add option to skip DMA sync as a part of map and unmap
Subject: arch/microblaze: add option to skip DMA sync as a part of map and unmap
Subject: arch/mips: add option to skip DMA sync as a part of map and unmap
Subject: arch/nios2: add option to skip DMA sync as a part of map and unmap
Subject: arch/openrisc: add option to skip DMA sync as a part of mapping
Subject: arch/parisc: add option to skip DMA sync as a part of map and unmap
Subject: arch/powerpc: add option to skip DMA sync as a part of mapping
Subject: arch/sh: add option to skip DMA sync as a part of mapping
Subject: arch/sparc: add option to skip DMA sync as a part of map and unmap
Subject: arch/tile: add option to skip DMA sync as a part of map and unmap
Subject: arch/xtensa: add option to skip DMA sync as a part of mapping
Subject: dma: add calls for dma_map_page_attrs and dma_unmap_page_attrs
Subject: mm: add support for releasing multiple instances of a page
Subject: igb: update driver to make use of DMA_ATTR_SKIP_CPU_SYNC
Subject: igb: update code to better handle incrementing page count
Subject: relay: check array offset before using it
Subject: Kconfig: lib/Kconfig.debug: fix references to Documenation
Subject: Kconfig: lib/Kconfig.ubsan fix reference to ubsan documentation
Subject: kcov: add more missing includes
Subject: kernel/debug/debug_core.c: more properly delay for secondary CPUs
Subject: kdb: remove unused kdb_event handling
Subject: kdb: properly synchronize vkdb_printf() calls with other CPUs
Subject: kdb: call vkdb_printf() from vprintk_default() only when wanted
Subject: initramfs: select builtin initram compression algorithm on KConfig instead of Makefile
Subject: initramfs: allow again choice of the embedded initram compression algorithm
Subject: ipc: msg, make msgrcv work with LONG_MIN
Subject: ipc/shm.c: coding style fixes
Subject: posix-timers: give lazy compilers some help optimizing code away
Subject: drivers/net/wireless/intel/iwlwifi/dvm/calib.c: simplfy min() expression
Subject: ktest.pl: fix english
Subject: kernel/watchdog.c: move shared definitions to nmi.h
Subject: kernel/watchdog.c: move hardlockup detector to separate file
Subject: sparc: implement watchdog_nmi_enable and watchdog_nmi_disable
Subject: ipc/sem: do not call wake_sem_queue_do() prematurely
Subject: ipc/sem: rework task wakeups
Subject: ipc/sem: optimize perform_atomic_semop()
Subject: ipc/sem: explicitly inline check_restart
Subject: ipc/sem: use proper list api for pending_list wakeups
Subject: ipc/sem: simplify wait-wake loop
Subject: ipc/sem: avoid idr tree lookup for interrupted semop
Subject: mm: add locked parameter to get_user_pages_remote()
Subject: mm: unexport __get_user_pages_unlocked()
Subject: mm: join struct fault_env and vm_fault
Subject: mm: use vmf->address instead of of vmf->virtual_address
Subject: mm: use pgoff in struct vm_fault instead of passing it separately
Subject: mm: use passed vm_fault structure in __do_fault()
Subject: mm: trim __do_fault() arguments
Subject: mm: use passed vm_fault structure for in wp_pfn_shared()
Subject: mm: add orig_pte field into vm_fault
Subject: mm: allow full handling of COW faults in ->fault handlers
Subject: mm: factor out functionality to finish page faults
Subject: mm: move handling of COW faults into DAX code
Subject: mm: factor out common parts of write fault handling
Subject: mm: pass vm_fault structure into do_page_mkwrite()
Subject: mm: use vmf->page during WP faults
Subject: mm: move part of wp_page_reuse() into the single call site
Subject: mm: provide helper for finishing mkwrite faults
Subject: mm: change return values of finish_mkwrite_fault()
Subject: mm: export follow_pte()
Subject: dax: make cache flushing protected by entry lock
Subject: dax: protect PTE modification on WP fault by radix tree entry lock
Subject: dax: clear dirty entry tags on cache flush
Subject: tools: add WARN_ON_ONCE
Subject: radix tree test suite: allow GFP_ATOMIC allocations to fail
Subject: radix tree test suite: track preempt_count
Subject: radix tree test suite: free preallocated nodes
Subject: radix tree test suite: make runs more reproducible
Subject: radix tree test suite: iteration test misuses RCU
Subject: radix tree test suite: benchmark for iterator
Subject: radix tree test suite: use rcu_barrier
Subject: radix tree test suite: handle exceptional entries
Subject: radix tree test suite: record order in each item
Subject: tools: add more bitmap functions
Subject: radix tree test suite: use common find-bit code
Subject: radix-tree: fix typo
Subject: radix-tree: move rcu_head into a union with private_list
Subject: radix-tree: create node_tag_set()
Subject: radix-tree: make radix_tree_find_next_bit more useful
Subject: radix-tree: improve dump output
Subject: btrfs: fix race in btrfs_free_dummy_fs_info()
Subject: radix-tree: improve multiorder iterators
Subject: radix-tree: delete radix_tree_locate_item()
Subject: radix-tree: delete radix_tree_range_tag_if_tagged()
Subject: radix-tree: add radix_tree_join
Subject: radix-tree: add radix_tree_split
Subject: radix-tree: add radix_tree_split_preload()
Subject: radix-tree: fix replacement for multiorder entries
Subject: radix tree test suite: check multiorder iteration
Subject: idr: add ida_is_empty
Subject: tpm: use idr_find(), not idr_find_slowpath()
Subject: rxrpc: abstract away knowledge of IDR internals
Subject: idr: reduce the number of bits per level from 8 to 6
Subject: radix tree test suite: add some more functionality
Subject: radix tree test suite: cache recently freed objects
Subject: radix-tree: ensure counts are initialised
Subject: radix tree test suite: add new tag check
Subject: radix tree test suite: delete unused rcupdate.c
* incoming
@ 2016-12-13 0:40 Andrew Morton
From: Andrew Morton @ 2016-12-13 0:40 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- various misc bits
- most of MM (quite a lot of MM material is awaiting the merge of
  linux-next dependencies)
- kasan
- printk updates
- procfs updates
- MAINTAINERS
- /lib updates
- checkpatch updates
123 patches, based on df5f0f0a028c9bf43949398a175dbaafaf513e14:
Subject: kthread: add __printf attributes
Subject: prctl: remove one-shot limitation for changing exe link
Subject: scripts/bloat-o-meter: don't use readlines()
Subject: scripts/bloat-o-meter: compile .NUMBER regex
Subject: scripts/tags.sh: handle OMAP platforms properly
Subject: m32r: add simple dma
Subject: m32r: fix build warning
Subject: drivers/pcmcia/m32r_pcc.c: check return from request_irq
Subject: drivers/pcmcia/m32r_pcc.c: use common error path
Subject: drivers/pcmcia/m32r_pcc.c: check return from add_pcc_socket
Subject: ocfs2/dlm: clean up useless BUG_ON default case in dlm_finalize_reco_handler()
Subject: ocfs2: delete redundant code and set the node bit into maybe_map directly
Subject: ocfs2/dlm: clean up deadcode in dlm_master_request_handler()
Subject: ocfs2: clean up unused 'page' parameter in ocfs2_write_end_nolock()
Subject: ocfs2: fix double put of recount tree in ocfs2_lock_refcount_tree()
Subject: ocfs2: use time64_t to represent orphan scan times
Subject: ocfs2: replace CURRENT_TIME macro
Subject: mm: memcontrol: use special workqueue for creating per-memcg caches
Subject: slub: move synchronize_sched out of slab_mutex on shrink
Subject: slub: avoid false-postive warning
Subject: mm/slab_common.c: check kmem_create_cache flags are common
Subject: mm, slab: faster active and free stats
Subject: mm, slab: maintain total slab count instead of active count
Subject: mm/mprotect.c: don't touch single threaded PTEs which are on the right node
Subject: mm/vmscan.c: set correct defer count for shrinker
Subject: mm/gup.c: make unnecessarily global vma_permits_fault() static
Subject: mm/hugetlb.c: use the right pte val for compare in hugetlb_cow
Subject: mm/hugetlb.c: use huge_pte_lock instead of opencoding the lock
Subject: kmemleak: fix reference to Documentation
Subject: mm: don't steal highatomic pageblock
Subject: mm: prevent double decrease of nr_reserved_highatomic
Subject: mm: try to exhaust highatomic reserve before the OOM
Subject: mm: make unreserve highatomic functions reliable
Subject: mm/vmalloc.c: simplify /proc/vmallocinfo implementation
Subject: mm, thp: avoid unlikely branches for split_huge_pmd
Subject: mm, mempolicy: clean up __GFP_THISNODE confusion in policy_zonelist
Subject: mm, compaction: fix NR_ISOLATED_* stats for pfn based migration
Subject: shmem: avoid maybe-uninitialized warning
Subject: mm: use the correct page size when removing the page
Subject: mm: update mmu_gather range correctly
Subject: mm/hugetlb: add tlb_remove_hugetlb_entry for handling hugetlb pages
Subject: mm: add tlb_remove_check_page_size_change to track page size change
Subject: mm: remove the page size change check in tlb_remove_page
Subject: mm: fix up get_user_pages* comments
Subject: mm/mempolicy.c: forbid static or relative flags for local NUMA mode
Subject: powerpc/mm: allow memory hotplug into a memoryless node
Subject: mm: remove x86-only restriction of movable_node
Subject: mm: enable CONFIG_MOVABLE_NODE on non-x86 arches
Subject: of/fdt: mark hotpluggable memory
Subject: dt: add documentation of "hotpluggable" memory property
Subject: mm/pkeys: generate pkey system call code only if ARCH_HAS_PKEYS is selected
Subject: mm: disable numa migration faults for dax vmas
Subject: mm: cma: make linux/cma.h standalone includible
Subject: mm/filemap.c: add comment for confusing logic in page_cache_tree_insert()
Subject: fs/fs-writeback.c: remove redundant if check
Subject: shmem: fix compilation warnings on unused functions
Subject: mm: don't cap request size based on read-ahead setting
Subject: include/linux/backing-dev-defs.h: shrink struct backing_dev_info
Subject: mm: khugepaged: close use-after-free race during shmem collapsing
Subject: mm: khugepaged: fix radix tree node leak in shmem collapse error path
Subject: mm: workingset: turn shadow node shrinker bugs into warnings
Subject: lib: radix-tree: native accounting of exceptional entries
Subject: lib: radix-tree: check accounting of existing slot replacement users
Subject: lib: radix-tree: add entry deletion support to __radix_tree_replace()
Subject: lib: radix-tree: update callback for changing leaf nodes
Subject: mm: workingset: move shadow entry tracking to radix tree exceptional tracking
Subject: mm: workingset: restore refault tracking for single-page files
Subject: mm: workingset: update shadow limit to reflect bigger active list
Subject: mm: remove free_unmap_vmap_area_noflush()
Subject: mm: remove free_unmap_vmap_area_addr()
Subject: mm: refactor __purge_vmap_area_lazy()
Subject: mm: add vfree_atomic()
Subject: kernel/fork: use vfree_atomic() to free thread stack
Subject: x86/ldt: use vfree_atomic() to free ldt entries
Subject: mm: mark all calls into the vmalloc subsystem as potentially sleeping
Subject: mm: turn vmap_purge_lock into a mutex
Subject: mm: add preempt points into __purge_vmap_area_lazy()
Subject: mm: move vma_is_anonymous check within pmd_move_must_withdraw
Subject: mm: THP page cache support for ppc64
Subject: mm, debug: print raw struct page data in __dump_page()
Subject: mm, rmap: handle anon_vma_prepare() common case inline
Subject: mm, page_alloc: keep pcp count and list contents in sync if struct page is corrupted
Subject: mm: add three more cond_resched() in swapoff
Subject: mm: add cond_resched() in gather_pte_stats()
Subject: mm: make transparent hugepage size public
Subject: kasan: support panic_on_warn
Subject: kasan: eliminate long stalls during quarantine reduction
Subject: kasan: turn on -fsanitize-address-use-after-scope
Subject: mm/percpu.c: fix panic triggered by BUG_ON() falsely
Subject: proc: report no_new_privs state
Subject: proc: make struct pid_entry::len unsigned
Subject: proc: make struct struct map_files_info::len unsigned int
Subject: proc: just list_del() struct pde_opener
Subject: proc: fix type of struct pde_opener::closing field
Subject: proc: kmalloc struct pde_opener
Subject: proc: tweak comments about 2 stage open and everything
Subject: fs/proc/array.c: slightly improve render_sigset_t
Subject: fs/proc/base.c: save decrement during lookup/readdir in /proc/$PID
Subject: fs/proc: calculate /proc/* and /proc/*/task/* nlink at init time
Subject: hung_task: decrement sysctl_hung_task_warnings only if it is positive
Subject: compiler-gcc.h: use "proved" instead of "proofed"
Subject: printk/NMI: fix up handling of the full nmi log buffer
Subject: printk/NMI: handle continuous lines and missing newline
Subject: printk/kdb: handle more message headers
Subject: printk/btrfs: handle more message headers
Subject: printk/sound: handle more message headers
Subject: printk: add Kconfig option to set default console loglevel
Subject: get_maintainer: look for arbitrary letter prefixes in sections
Subject: MAINTAINERS: add "B:" for URI where to file bugs
Subject: MAINTAINERS: add drm and drm/i915 bug filing info
Subject: MAINTAINERS: add "C:" for URI for chat where developers hang out
Subject: MAINTAINERS: add drm and drm/i915 irc channels
Subject: lib/Kconfig.debug: make CONFIG_STRICT_DEVMEM depend on CONFIG_DEVMEM
Subject: lib/rbtree.c: fix typo in comment of ____rb_erase_color
Subject: lib/ida: document locking requirements a bit better
Subject: checkpatch: don't try to get maintained status when --no-tree is given
Subject: scripts/checkpatch.pl: fix spelling
Subject: checkpatch: don't check .pl files, improve absolute path commit log test
Subject: checkpatch: avoid multiple line dereferences
Subject: checkpatch: don't check c99 types like uint8_t under tools
Subject: checkpatch: don't emit unified-diff error for rename-only patches
Subject: binfmt_elf: use vmalloc() for allocation of vma_filesz
Subject: init: reduce rootwait polling interval time to 5ms
* incoming
@ 2016-12-07 22:44 Andrew Morton
From: Andrew Morton @ 2016-12-07 22:44 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
3 fixes, based on ea5a9eff96fed8252f3a8c94a84959f981a93cae:
Subject: zram: restrict add/remove attributes to root only
Subject: radix tree test suite: fix compilation
Subject: kcov: add missing #include <linux/sched.h>
* incoming
@ 2016-12-03 1:26 Andrew Morton
From: Andrew Morton @ 2016-12-03 1:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
2 fixes, based on 8dc0f265d39a3933f4c1f846c7c694f12a2ab88a:
Subject: mm: workingset: fix NULL ptr in count_shadow_nodes
Subject: mm, vmscan: add cond_resched() into shrink_node_memcg()
* incoming
@ 2016-11-30 23:53 Andrew Morton
From: Andrew Morton @ 2016-11-30 23:53 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
7 fixes, based on ded6e842cf499ef04b0d611d92b859d5b846c497:
Subject: mm, thp: propagation of conditional compilation in khugepaged.c
Subject: thp: fix corner case of munlock() of PTE-mapped THPs
Subject: zram: fix unbalanced idr management at hot removal
Subject: lib/debugobjects: export for use in modules
Subject: kasan: update kasan_global for gcc 7
Subject: kasan: support use-after-scope detection
Subject: mm: fix false-positive WARN_ON() in truncate/invalidate for hugetlb
* incoming
@ 2016-11-10 18:45 Andrew Morton
From: Andrew Morton @ 2016-11-10 18:45 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
15 fixes, based on 27bcd37e0240bbe33f0efe244b5aad52104115b3:
Subject: mm: remove extra newline from allocation stall warning
Subject: mm, frontswap: make sure allocated frontswap map is assigned
Subject: shmem: fix pageflags after swapping DMA32 object
Subject: scripts/bloat-o-meter: fix SIGPIPE
Subject: mm/cma.c: check the max limit for cma allocation
Subject: swapfile: fix memory corruption via malformed swapfile
Subject: mm: hwpoison: fix thp split handling in memory_failure()
Subject: Revert "console: don't prefer first registered if DT specifies stdout-path"
Subject: ocfs2: fix not enough credit panic
Subject: mm/hugetlb: fix huge page reservation leak in private mapping error paths
Subject: mm/filemap: don't allow partially uptodate page for pipes
Subject: coredump: fix unfreezable coredumping task
Subject: memcg: prevent memcg caches to be both OFF_SLAB & OBJFREELIST_SLAB
Subject: mm: kmemleak: scan .data.ro_after_init
Subject: lib/stackdepot: export save/fetch stack for drivers
* incoming
@ 2016-10-11 20:49 Andrew Morton
From: Andrew Morton @ 2016-10-11 20:49 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- a few block updates that fell in my lap
- lib/ updates
- checkpatch
- autofs
- ipc
- A ton of misc other things
102 patches, based on 1689c73a739d094b544c680b0dfdebe52ffee8fb:
Subject: ocfs2: fix memory leak in dlm_migrate_request_handler()
Subject: block: invalidate the page cache when issuing BLKZEROOUT
Subject: block: require write_same and discard requests align to logical block size
Subject: block: implement (some of) fallocate for block devices
Subject: fs/select: add vmalloc fallback for select(2)
Subject: radix-tree: 'slot' can be NULL in radix_tree_next_slot()
Subject: radix-tree tests: add iteration test
Subject: radix-tree tests: properly initialize mutex
Subject: lib: harden strncpy_from_user
Subject: include/linux/ctype.h: make isdigit() table lookupless
Subject: lib/kstrtox.c: smaller _parse_integer()
Subject: lib/bitmap.c: enhance bitmap syntax
Subject: include/linux: provide a safe version of container_of()
Subject: llist: introduce llist_entry_safe()
Subject: checkpatch: see if modified files are marked obsolete in MAINTAINERS
Subject: checkpatch: look for symbolic permissions and suggest octal instead
Subject: checkpatch: test multiple line block comment alignment
Subject: checkpatch: don't test for prefer ether_addr_<foo>
Subject: checkpatch: externalize the structs that should be const
Subject: const_structs.checkpatch: add frequently used from Julia Lawall's list
Subject: checkpatch: speed up checking for filenames in sections marked obsolete
Subject: checkpatch: improve the block comment * alignment test
Subject: checkpatch: add --strict test for macro argument reuse
Subject: checkpatch: add --strict test for precedence challenged macro arguments
Subject: checkpatch: improve MACRO_ARG_PRECEDENCE test
Subject: checkpatch: add warning for unnamed function definition arguments
Subject: checkpatch: improve the octal permissions tests
Subject: kprobes: include <asm/sections.h> instead of <asm-generic/sections.h>
Subject: autofs: fix typos in Documentation/filesystems/autofs4.txt
Subject: autofs: drop unnecessary extern in autofs_i.h
Subject: autofs: test autofs versions first on sb initialization
Subject: autofs: fix autofs4_fill_super() error exit handling
Subject: autofs: add WARN_ON(1) for non dir/link inode case
Subject: autofs: remove ino free in autofs4_dir_symlink()
Subject: autofs: use autofs4_free_ino() to kfree dentry data
Subject: autofs: remove obsolete sb fields
Subject: autofs: don't fail to free_dev_ioctl(param)
Subject: autofs: remove AUTOFS_DEVID_LEN
Subject: autofs: fix Documentation regarding devid on ioctl
Subject: autofs: update struct autofs_dev_ioctl in Documentation
Subject: autofs: fix pr_debug() message
Subject: autofs: fix dev ioctl number range check
Subject: autofs: add autofs_dev_ioctl_version() for AUTOFS_DEV_IOCTL_VERSION_CMD
Subject: autofs: fix print format for ioctl warning message
Subject: autofs: move inclusion of linux/limits.h to uapi
Subject: autofs4: move linux/auto_dev-ioctl.h to uapi/linux
Subject: autofs: remove possibly misleading /* #define DEBUG */
Subject: autofs: refactor ioctl fn vector in iookup_dev_ioctl()
Subject: pipe: relocate round_pipe_size() above pipe_set_size()
Subject: pipe: move limit checking logic into pipe_set_size()
Subject: pipe: refactor argument for account_pipe_buffers()
Subject: pipe: fix limit checking in pipe_set_size()
Subject: pipe: simplify logic in alloc_pipe_info()
Subject: pipe: fix limit checking in alloc_pipe_info()
Subject: pipe: make account_pipe_buffers() return a value, and use it
Subject: pipe: cap initial pipe capacity according to pipe-max-size limit
Subject: ptrace: clear TIF_SYSCALL_TRACE on ptrace detach
Subject: rapidio/rio_cm: use memdup_user() instead of duplicating code
Subject: random: simplify API for random address requests
Subject: x86: use simpler API for random address requests
Subject: ARM: use simpler API for random address requests
Subject: arm64: use simpler API for random address requests
Subject: tile: use simpler API for random address requests
Subject: unicore32: use simpler API for random address requests
Subject: random: remove unused randomize_range()
Subject: dma-mapping: introduce the DMA_ATTR_NO_WARN attribute
Subject: powerpc: implement the DMA_ATTR_NO_WARN attribute
Subject: nvme: use the DMA_ATTR_NO_WARN attribute
Subject: x86/panic: replace smp_send_stop() with kdump friendly version in panic path
Subject: mips/panic: replace smp_send_stop() with kdump friendly version in panic path
Subject: pps: kc: fix non-tickless system config dependency
Subject: relay: Use irq_work instead of plain timer for deferred wakeup
Subject: config/android: Remove CONFIG_IPV6_PRIVACY
Subject: config: android: move device mapper options to recommended
Subject: config: android: set SELinux as default security mode
Subject: config: android: enable CONFIG_SECCOMP
Subject: kcov: do not instrument lib/stackdepot.c
Subject: ipc/sem.c: fix complex_count vs. simple op race
Subject: ipc/msg: implement lockless pipelined wakeups
Subject: ipc/msg: batch queue sender wakeups
Subject: ipc/msg: make ss_wakeup() kill arg boolean
Subject: ipc/msg: avoid waking sender upon full queue
Subject: ipc/sem.c: Add cond_resched in exit_sem()
Subject: kdump, vmcoreinfo: report memory sections virtual addresses
Subject: mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping
Subject: scripts/tags.sh: enable code completion in VIM
Subject: kthread: rename probe_kthread_data() to kthread_probe_data()
Subject: kthread: kthread worker API cleanup
Subject: kthread/smpboot: do not park in kthread_create_on_cpu()
Subject: kthread: allow to call __kthread_create_on_node() with va_list args
Subject: kthread: add kthread_create_worker*()
Subject: kthread: add kthread_destroy_worker()
Subject: kthread: detect when a kthread work is used by more workers
Subject: kthread: initial support for delayed kthread work
Subject: kthread: allow to cancel kthread work
Subject: kthread: allow to modify delayed kthread work
Subject: kthread: better support freezable kthread workers
Subject: kthread: add kerneldoc for kthread_create()
Subject: hung_task: allow hung_task_panic when hung_task_warnings is 0
Subject: treewide: remove redundant #include <linux/kconfig.h>
Subject: fs: use mapping_set_error instead of opencoded set_bit
Subject: mm: split gfp_mask and mapping flags into separate fields
* incoming
@ 2016-10-07 23:53 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-10-07 23:53 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- fsnotify updates
- ocfs2 updates
- all of MM
127 patches, based on 87840a2b7e048018d18d60bdac5c09224de85370:
Subject: fsnotify: drop notification_mutex before destroying event
Subject: fsnotify: convert notification_mutex to a spinlock
Subject: fanotify: use notification_lock instead of access_lock
Subject: fanotify: fix possible false warning when freeing events
Subject: fsnotify: clean up spinlock assertions
Subject: jiffies: add time comparison functions for 64 bit jiffies
Subject: fs/ocfs2/dlmfs: remove deprecated create_singlethread_workqueue()
Subject: fs/ocfs2/cluster: remove deprecated create_singlethread_workqueue()
Subject: fs/ocfs2/super: remove deprecated create_singlethread_workqueue()
Subject: fs/ocfs2/dlm: remove deprecated create_singlethread_workqueue()
Subject: ocfs2: fix undefined struct variable in inode.h
Subject: mm: oom: deduplicate victim selection code for memcg and global oom
Subject: mm/vmalloc.c: fix align value calculation error
Subject: mm: memcontrol: add sanity checks for memcg->id.ref on get/put
Subject: mm/oom_kill.c: fix task_will_free_mem() comment
Subject: mm, compaction: make whole_zone flag ignore cached scanner positions
Subject: mm, compaction: cleanup unused functions
Subject: mm, compaction: rename COMPACT_PARTIAL to COMPACT_SUCCESS
Subject: mm, compaction: don't recheck watermarks after COMPACT_SUCCESS
Subject: mm, compaction: add the ultimate direct compaction priority
Subject: mm, compaction: use correct watermark when checking compaction success
Subject: mm, compaction: create compact_gap wrapper
Subject: mm, compaction: use proper alloc_flags in __compaction_suitable()
Subject: mm, compaction: require only min watermarks for non-costly orders
Subject: mm, vmscan: make compaction_ready() more accurate and readable
Subject: mem-hotplug: fix node spanned pages when we have a movable node
Subject: mm: fix set pageblock migratetype in deferred struct page init
Subject: mm, vmscan: get rid of throttle_vm_writeout
Subject: mm/debug_pagealloc.c: clean-up guard page handling code
Subject: mm/debug_pagealloc.c: don't allocate page_ext if we don't use guard page
Subject: mm/page_owner: move page_owner specific function to page_owner.c
Subject: mm/page_ext: rename offset to index
Subject: mm/page_ext: support extra space allocation by page_ext user
Subject: mm/page_owner: don't define fields on struct page_ext by hard-coding
Subject: do_generic_file_read(): fail immediately if killed
Subject: mm: pagewalk: fix the comment for test_walk
Subject: mm: unrig VMA cache hit ratio
Subject: mm, swap: add swap_cluster_list
Subject: mm,oom_reaper: reduce find_lock_task_mm() usage
Subject: mm,oom_reaper: do not attempt to reap a task twice
Subject: oom: keep mm of the killed task available
Subject: kernel, oom: fix potential pgd_lock deadlock from __mmdrop
Subject: mm, oom: get rid of signal_struct::oom_victims
Subject: oom, suspend: fix oom_killer_disable vs. pm suspend properly
Subject: mm, oom: enforce exit_oom_victim on current task
Subject: mm: make sure that kthreads will not refault oom reaped memory
Subject: oom, oom_reaper: allow to reap mm shared by the kthreads
Subject: mm: use zonelist name instead of using hardcoded index
Subject: mm: introduce arch_reserved_kernel_pages()
Subject: mm/memblock.c: expose total reserved memory
Subject: powerpc: implement arch_reserved_kernel_pages
Subject: mm/nobootmem.c: remove duplicate macro ARCH_LOW_ADDRESS_LIMIT statements
Subject: mm/bootmem.c: replace kzalloc() by kzalloc_node()
Subject: mm: don't use radix tree writeback tags for pages in swap cache
Subject: oom: warn if we go OOM for higher order and compaction is disabled
Subject: mm: mlock: check against vma for actual mlock() size
Subject: mm: mlock: avoid increase mm->locked_vm on mlock() when already mlock2(,MLOCK_ONFAULT)
Subject: selftest: split mlock2_ funcs into separate mlock2.h
Subject: selftests/vm: add test for mlock() when areas are intersected
Subject: selftest: move seek_to_smaps_entry() out of mlock2-tests.c
Subject: selftests: expanding more mlock selftest
Subject: thp, dax: add thp_get_unmapped_area for pmd mappings
Subject: ext2/4, xfs: call thp_get_unmapped_area() for pmd mappings
Subject: cpu: fix node state for whether it contains CPU
Subject: fs/proc/task_mmu.c: make the task_mmu walk_page_range() limit in clear_refs_write() obvious
Subject: thp: reduce usage of huge zero page's atomic counter
Subject: mm/memcontrol.c: make the walk_page_range() limit obvious
Subject: memory-hotplug: fix store_mem_state() return value
Subject: mm: fix cache mode tracking in vm_insert_mixed()
Subject: mm, swap: use offset of swap entry as key of swap cache
Subject: mm: remove page_file_index
Subject: Revert "mm, oom: prevent premature OOM killer invocation for high order request"
Subject: mm, compaction: more reliably increase direct compaction priority
Subject: mm, compaction: restrict full priority to non-costly orders
Subject: mm, compaction: make full priority ignore pageblock suitability
Subject: mm, page_alloc: pull no_progress_loops update to should_reclaim_retry()
Subject: mm, compaction: ignore fragindex from compaction_zonelist_suitable()
Subject: mm, compaction: restrict fragindex to costly orders
Subject: mm: don't emit warning from pagefault_out_of_memory()
Subject: mm/page_io.c: replace some BUG_ON()s with VM_BUG_ON_PAGE()
Subject: mm: move phys_mem_access_prot_allowed() declaration to pgtable.h
Subject: mm: memcontrol: consolidate cgroup socket tracking
Subject: mm/shmem.c: constify anon_ops
Subject: mm: nobootmem: move the comment of free_all_bootmem
Subject: mm/hugetlb: fix memory offline with hugepage size > memory block size
Subject: mm/hugetlb: check for reserved hugepages during memory offline
Subject: mm/hugetlb: improve locking in dissolve_free_huge_pages()
Subject: mm/page_isolation: fix typo: "paes" -> "pages"
Subject: mm,ksm: add __GFP_HIGH to the allocation in alloc_stable_node()
Subject: mm: vm_page_prot: update with WRITE_ONCE/READ_ONCE
Subject: mm: vma_adjust: remove superfluous confusing update in remove_next == 1 case
Subject: mm: vma_merge: fix vm_page_prot SMP race condition against rmap_walk
Subject: mm: vma_adjust: remove superfluous check for next not NULL
Subject: mm: vma_adjust: minor comment correction
Subject: mm: vma_merge: correct false positive from __vma_unlink->validate_mm_rb
Subject: mm: clarify why we avoid page_mapcount() for slab pages in dump_page()
Subject: oom: print nodemask in the oom report
Subject: mm/hugetlb: introduce ARCH_HAS_GIGANTIC_PAGE
Subject: arm64 Kconfig: select gigantic page
Subject: vfs,mm: fix a dead loop in truncate_inode_pages_range()
Subject: mm: consolidate warn_alloc_failed users
Subject: mm: warn about allocations which stall for too long
Subject: mm: remove unnecessary condition in remove_inode_hugepages
Subject: linux/mm.h: canonicalize macro PAGE_ALIGNED() definition
Subject: ia64: implement atomic64_dec_if_positive
Subject: atomic64: no need for CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
Subject: proc: much faster /proc/vmstat
Subject: proc: faster /proc/*/status
Subject: seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char
Subject: meminfo: break apart a very long seq_printf with #ifdefs
Subject: proc: relax /proc/<tid>/timerslack_ns capability requirements
Subject: proc: add LSM hook checks to /proc/<tid>/timerslack_ns
Subject: proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self
Subject: mm, proc: fix region lost in /proc/self/smaps
Subject: Documentation/filesystems/proc.txt: add more description for maps/smaps
Subject: min/max: remove sparse warnings when they're nested
Subject: nmi_backtrace: add more trigger_*_cpu_backtrace() methods
Subject: nmi_backtrace: do a local dump_stack() instead of a self-NMI
Subject: arch/tile: adopt the new nmi_backtrace framework
Subject: nmi_backtrace: generate one-line reports for idle cpus
Subject: spelling.txt: "modeled" is spelt correctly
Subject: uprobes: remove function declarations from arch/{mips,s390}
Subject: .gitattributes: set git diff driver for C source code files
Subject: mailmap: add Johan Hovold
Subject: CREDITS: update Pavel's information, add GPG key, remove snail mail address
Subject: cred: simpler, 1D supplementary groups
Subject: console: don't prefer first registered if DT specifies stdout-path
* incoming
@ 2016-09-30 22:11 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-09-30 22:11 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
4 fixes, based on e3b3656ca63e23b5755183718df36fb9ff518b02:
Subject: mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()
Subject: ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
Subject: include/linux/property.h: fix typo/compile error
Subject: MAINTAINERS: Javi has moved
* incoming
@ 2016-09-28 22:22 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-09-28 22:22 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
5 fixes, based on 8ab293e3a1376574e11f9059c09cc0db212546cb:
Subject: mm,ksm: fix endless looping in allocating memory when ksm enable
Subject: dma-mapping.h: preserve unmap info for CONFIG_DMA_API_DEBUG
Subject: scripts/recordmcount.c: account for .softirqentry.text
Subject: mem-hotplug: use nodes that contain memory as mask in new_node_page()
Subject: MAINTAINERS: Mark has moved
* incoming
@ 2016-09-19 21:43 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-09-19 21:43 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
20 fixes, based on 3be7988674ab33565700a37b210f502563d932e6:
Subject: mem-hotplug: don't clear the only node in new_node_page()
Subject: ocfs2/dlm: fix race between convert and migration
Subject: MAINTAINERS: Maik has moved
Subject: khugepaged: fix use-after-free in collapse_huge_page()
Subject: mm, thp: fix leaking mapped pte in __collapse_huge_page_swapin()
Subject: mm: avoid endless recursion in dump_page()
Subject: MAINTAINERS: update email for VLYNQ bus entry
Subject: autofs: use dentry flags to block walks during expire
Subject: mm: fix the page_swap_info() BUG_ON check
Subject: ipc/shm: fix crash if CONFIG_SHMEM is not set
Subject: ocfs2: fix trans extend while flush truncate log
Subject: ocfs2: fix trans extend while free cached blocks
Subject: fsnotify: add a way to stop queueing events on group shutdown
Subject: fanotify: fix list corruption in fanotify_get_response()
Subject: ocfs2: fix double unlock in case retry after free truncate log
Subject: mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting
Subject: cgroup: duplicate cgroup reference when cloning sockets
Subject: ocfs2: fix start offset to ocfs2_zero_range_for_truncate()
Subject: Revert "ocfs2: bump up o2cb network protocol version"
Subject: rapidio/rio_cm: avoid GFP_KERNEL in atomic context
* incoming
@ 2016-09-01 23:14 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-09-01 23:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
14 fixes, based on 071e31e254e0e0c438eecba3dba1d6e2d0da36c2:
Subject: mm, oom: prevent premature OOM killer invocation for high order request
Subject: kexec: fix double-free when failing to relocate the purgatory
Subject: kconfig: tinyconfig: provide whole choice blocks to avoid warnings
Subject: lib/test_hash.c: fix warning in two-dimensional array init
Subject: lib/test_hash.c: fix warning in preprocessor symbol evaluation
Subject: mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator
Subject: drivers/scsi/wd719x.c: remove last declaration using DEFINE_PCI_DEVICE_TABLE
Subject: treewide: remove references to the now unnecessary DEFINE_PCI_DEVICE_TABLE
Subject: printk/nmi: avoid direct printk()-s from __printk_nmi_flush()
Subject: mm, mempolicy: task->mempolicy must be NULL before dropping final reference
Subject: MAINTAINERS: Vladimir has moved
Subject: kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
Subject: rapidio/documentation/mport_cdev: add missing parameter description
Subject: rapidio/tsi721: fix incorrect detection of address translation condition
* incoming
@ 2016-08-25 22:16 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-08-25 22:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
12 fixes, based on 61c04572de404e52a655a36752e696bbcb483cf5:
Subject: byteswap: don't use __builtin_bswap*() with sparse
Subject: get_maintainer: quiet noisy implicit -f vcs_file_exists checking
Subject: sysctl: handle error writing UINT_MAX to u32 fields
Subject: stackdepot: fix mempolicy use-after-free
Subject: soft_dirty: fix soft_dirty during THP split
Subject: printk: fix parsing of "brl=" option
Subject: treewide: replace config_enabled() with IS_ENABLED() (2nd round)
Subject: mm: clarify COMPACTION Kconfig text
Subject: mm: memcontrol: avoid unused function warning
Subject: fs/seq_file: fix out-of-bounds read
Subject: dax: fix device-dax region base
Subject: mm: silently skip readahead for DAX inodes
* incoming
@ 2016-08-11 22:32 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-08-11 22:32 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
7 fixes, based on 85e97be32c6242c98dbbc7a241b4a78c1b93327b:
Subject: mm/hugetlb: fix incorrect hugepages count during mem hotplug
Subject: proc, meminfo: use correct helpers for calculating LRU sizes in meminfo
Subject: mm: memcontrol: fix swap counter leak on swapout from offline cgroup
Subject: mm: memcontrol: fix memcg id ref counter on swap charge move
Subject: kasan: remove the unnecessary WARN_ONCE from quarantine.c
Subject: mm, oom: fix uninitialized ret in task_will_free_mem()
Subject: mm/memory_hotplug.c: initialize per_cpu_nodestats for hotadded pgdats
* incoming
@ 2016-08-04 22:31 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-08-04 22:31 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
A few late-breaking fixes.
7 fixes, based on c1ece76719205690f4b448460d9b85c130e8021b:
Subject: mm: disable CONFIG_MEMORY_HOTPLUG when KASAN is enabled
Subject: mm/memblock: fix a typo in a comment
Subject: mm: initialise per_cpu_nodestats for all online pgdats at boot
Subject: powerpc/fsl_rio: fix a missing error code
Subject: slub: drop bogus inline for fixup_red_left()
Subject: MAINTAINERS: update cgroup's document path
Subject: mm/memblock.c: fix NULL dereference error
* incoming
@ 2016-08-03 20:45 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-08-03 20:45 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- dma-mapping API cleanup
- a few cleanups and misc things
- use jump labels in dynamic-debug
18 patches, based on bf0f500bd0199aab613eb0ecb3412edd5472740d:
Subject: drivers/fpga/Kconfig: fix build failure
Subject: tree-wide: replace config_enabled() with IS_ENABLED()
Subject: include/linux/bitmap.h: cleanup
Subject: media: mtk-vcodec: remove unused dma_attrs
Subject: dma-mapping: use unsigned long for dma_attrs
Subject: samples/kprobe: convert the printk to pr_info/pr_err
Subject: samples/jprobe: convert the printk to pr_info/pr_err
Subject: samples/kretprobe: convert the printk to pr_info/pr_err
Subject: samples/kretprobe: fix the wrong type
Subject: block: remove BLK_DEV_DAX config option
Subject: MAINTAINERS: update email and list of Samsung HW driver maintainers
Subject: drivers/media/dvb-frontends/cxd2841er.c: avoid misleading gcc warning
Subject: powerpc: add explicit #include <asm/asm-compat.h> for jump label
Subject: sparc: support static_key usage in non-module __exit sections
Subject: tile: support static_key usage in non-module __exit sections
Subject: arm: jump label may reference text in __exit
Subject: jump_label: remove bug.h, atomic.h dependencies for HAVE_JUMP_LABEL
Subject: dynamic_debug: add jump label support
* incoming
@ 2016-08-02 21:01 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-08-02 21:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- the rest of ocfs2
- various hotfixes, mainly MM
- quite a bit of misc stuff - drivers, fork, exec, signals, etc.
- printk updates
- firmware
- checkpatch
- nilfs2
- more kexec stuff than usual
- rapidio updates
- w1 things
111 patches, based on f7b32e4c021fd788f13f6785e17efbc3eb05b351:
Subject: ocfs2: ensure that dlm lockspace is created by kernel module
Subject: ocfs2: retry on ENOSPC if sufficient space in truncate log
Subject: ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF is cleared before dlm_deref_lockres_done_handler
Subject: ocfs2/dlm: solve a BUG when deref failed in dlm_drop_lockres_ref
Subject: ocfs2/dlm: continue to purge recovery lockres when recovery master goes down
Subject: mm: fail prefaulting if page table allocation fails
Subject: mm: move swap-in anonymous page into active list
Subject: tools/testing/radix-tree/linux/gfp.h: fix bitrotted value
Subject: mm/hugetlb: avoid soft lockup in set_max_huge_pages()
Subject: mm, hugetlb: fix huge_pte_alloc BUG_ON
Subject: memcg: put soft limit reclaim out of way if the excess tree is empty
Subject: mm/kasan: fix corruptions and false positive reports
Subject: mm/kasan: don't reduce quarantine in atomic contexts
Subject: mm/kasan, slub: don't disable interrupts when object leaves quarantine
Subject: mm/kasan: get rid of ->alloc_size in struct kasan_alloc_meta
Subject: mm/kasan: get rid of ->state in struct kasan_alloc_meta
Subject: kasan: improve double-free reports
Subject: kasan: avoid overflowing quarantine size on low memory systems
Subject: radix-tree: account nodes to memcg only if explicitly requested
Subject: mm: vmscan: fix memcg-aware shrinkers not called on global reclaim
Subject: sysv, ipc: fix security-layer leaking
Subject: UBSAN: fix typo in format string
Subject: cgroup: update cgroup's document path
Subject: MAINTAINERS: befs: add new maintainers
Subject: proc_oom_score: remove tasklist_lock and pid_alive()
Subject: procfs: avoid 32-bit time_t in /proc/*/stat
Subject: fs/proc/task_mmu.c: suppress compilation warnings with W=1
Subject: init/Kconfig: make COMPILE_TEST depend on !UML
Subject: memstick: don't allocate unused major for ms_block
Subject: treewide: replace obsolete _refok by __ref
Subject: uapi: move forward declarations of internal structures
Subject: mailmap: add Linus Lüssing
Subject: include: mman: use bool instead of int for the return value of arch_validate_prot
Subject: task_work: use READ_ONCE/lockless_dereference, avoid pi_lock if !task_works
Subject: dynamic_debug: only add header when used
Subject: printk: do not include interrupt.h
Subject: printk: create pr_<level> functions
Subject: printk: introduce suppress_message_printing()
Subject: printk: include <asm/sections.h> instead of <asm-generic/sections.h>
Subject: fbdev/bfin_adv7393fb: move DRIVER_NAME before its first use
Subject: ratelimit: extend to print suppressed messages on release
Subject: printk: add kernel parameter to control writes to /dev/kmsg
Subject: get_maintainer.pl: reduce need for command-line option -f
Subject: lib/iommu-helper: skip to next segment
Subject: crc32: use ktime_get_ns() for measurement
Subject: radix-tree: fix comment about "exceptional" bits
Subject: firmware: consolidate kmap/read/write logic
Subject: firmware: provide infrastructure to make fw caching optional
Subject: firmware: support loading into a pre-allocated buffer
Subject: checkpatch: skip long lines that use an EFI_GUID macro
Subject: checkpatch: allow c99 style // comments
Subject: checkpatch: yet another commit id improvement
Subject: checkpatch: don't complain about BIT macro in uapi
Subject: checkpatch: improve 'bare use of' signed/unsigned types warning
Subject: checkpatch: check signoff when reading stdin
Subject: checkpatch: if no filenames then read stdin
Subject: binfmt_elf: fix calculations for bss padding
Subject: mm: refuse wrapped vm_brk requests
Subject: fs/binfmt_em86.c: fix incompatible pointer type
Subject: nilfs2: hide function name argument from nilfs_error()
Subject: nilfs2: add nilfs_msg() message interface
Subject: nilfs2: embed a back pointer to super block instance in nilfs object
Subject: nilfs2: reduce bare use of printk() with nilfs_msg()
Subject: nilfs2: replace nilfs_warning() with nilfs_msg()
Subject: nilfs2: emit error message when I/O error is detected
Subject: nilfs2: do not use yield()
Subject: nilfs2: refactor parser of snapshot mount option
Subject: nilfs2: fix misuse of a semaphore in sysfs code
Subject: nilfs2: use BIT() macro
Subject: nilfs2: move ioctl interface and disk layout to uapi separately
Subject: reiserfs: fix "new_insert_key may be used uninitialized ..."
Subject: signal: consolidate {TS,TLF}_RESTORE_SIGMASK code
Subject: kernel/exit.c: quieten greatest stack depth printk
Subject: cpumask: fix code comment
Subject: kexec: return error number directly
Subject: ARM: kdump: advertise boot aliased crash kernel resource
Subject: ARM: kexec: advertise location of bootable RAM
Subject: kexec: don't invoke OOM-killer for control page allocation
Subject: kexec: ensure user memory sizes do not wrap
Subject: kdump: arrange for paddr_vmcoreinfo_note() to return phys_addr_t
Subject: kexec: allow architectures to override boot mapping
Subject: ARM: keystone: dts: add psci command definition
Subject: ARM: kexec: fix kexec for Keystone 2
Subject: kexec: use core_param for crash_kexec_post_notifiers boot option
Subject: kexec: add a kexec_crash_loaded() function
Subject: kexec: allow kdump with crash_kexec_post_notifiers
Subject: kexec: add restriction on kexec_load() segment sizes
Subject: rapidio: add RapidIO channelized messaging driver
Subject: rapidio: remove unnecessary 0x prefixes before %pa extension uses
Subject: rapidio/documentation: fix mangled paragraph in mport_cdev
Subject: rapidio: fix return value description for dma_prep functions
Subject: rapidio/tsi721_dma: add channel mask and queue size parameters
Subject: rapidio/tsi721: add PCIe MRRS override parameter
Subject: rapidio/tsi721: add messaging mbox selector parameter
Subject: rapidio/tsi721_dma: advance queue processing from transfer submit call
Subject: rapidio: fix error handling in mbox request/release functions
Subject: rapidio/idt_gen2: fix locking warning
Subject: rapidio: change inbound window size type to u64
Subject: rapidio: modify for rev.3 specification changes
Subject: powerpc/fsl_rio: apply changes for RIO spec rev 3
Subject: rapidio/switches: add driver for IDT gen3 switches
Subject: w1: remove need for ida and use PLATFORM_DEVID_AUTO
Subject: w1: add helper macro module_w1_family
Subject: w1: omap_hdq: fix regression
Subject: init: allow blacklisting of module_init functions
Subject: relay: add global mode support for buffer-only channels
Subject: init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig
Subject: config: add android config fragments
Subject: init/Kconfig: add clarification for out-of-tree modules
Subject: kcov: allow more fine-grained coverage instrumentation
Subject: ipc: delete "nr_ipc_ns"
* incoming
@ 2016-07-28 22:42 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-07-28 22:42 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- the rest of MM
101 patches, based on 194dc870a5890e855ecffb30f3b80ba7c88f96d6:
Subject: proc, oom: drop bogus task_lock and mm check
Subject: proc, oom: drop bogus sighand lock
Subject: proc, oom_adj: extract oom_score_adj setting into a helper
Subject: mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj
Subject: mm, oom: skip vforked tasks from being selected
Subject: mm, oom: kill all tasks sharing the mm
Subject: mm, oom: fortify task_will_free_mem()
Subject: mm, oom: task_will_free_mem should skip oom_reaped tasks
Subject: mm, oom_reaper: do not attempt to reap a task more than twice
Subject: mm, oom: hide mm which is shared with kthread or global init
Subject: mm, oom: tighten task_will_free_mem() locking
Subject: mm: update the comment in __isolate_free_page
Subject: mm: fix vm-scalability regression in cgroup-aware workingset code
Subject: mm/compaction: remove unnecessary order check in try_to_compact_pages()
Subject: freezer, oom: check TIF_MEMDIE on the correct task
Subject: cpuset, mm: fix TIF_MEMDIE check in cpuset_change_task_nodemask
Subject: mm, meminit: remove early_page_nid_uninitialised
Subject: mm, vmstat: add infrastructure for per-node vmstats
Subject: mm, vmscan: move lru_lock to the node
Subject: mm, vmscan: move LRU lists to node
Subject: mm, mmzone: clarify the usage of zone padding
Subject: mm, vmscan: begin reclaiming pages on a per-node basis
Subject: mm, vmscan: have kswapd only scan based on the highest requested zone
Subject: mm, vmscan: make kswapd reclaim in terms of nodes
Subject: mm, vmscan: remove balance gap
Subject: mm, vmscan: simplify the logic deciding whether kswapd sleeps
Subject: mm, vmscan: by default have direct reclaim only shrink once per node
Subject: mm, vmscan: remove duplicate logic clearing node congestion and dirty state
Subject: mm: vmscan: do not reclaim from kswapd if there is any eligible zone
Subject: mm, vmscan: make shrink_node decisions more node-centric
Subject: mm, memcg: move memcg limit enforcement from zones to nodes
Subject: mm, workingset: make working set detection node-aware
Subject: mm, page_alloc: consider dirtyable memory in terms of nodes
Subject: mm: move page mapped accounting to the node
Subject: mm: rename NR_ANON_PAGES to NR_ANON_MAPPED
Subject: mm: move most file-based accounting to the node
Subject: mm: move vmscan writes and file write accounting to the node
Subject: mm, vmscan: only wakeup kswapd once per node for the requested classzone
Subject: mm, page_alloc: wake kswapd based on the highest eligible zone
Subject: mm: convert zone_reclaim to node_reclaim
Subject: mm, vmscan: avoid passing in classzone_idx unnecessarily to shrink_node
Subject: mm, vmscan: avoid passing in classzone_idx unnecessarily to compaction_ready
Subject: mm, vmscan: avoid passing in `remaining' unnecessarily to prepare_kswapd_sleep()
Subject: mm, vmscan: Have kswapd reclaim from all zones if reclaiming and buffer_heads_over_limit
Subject: mm, vmscan: add classzone information to tracepoints
Subject: mm, page_alloc: remove fair zone allocation policy
Subject: mm: page_alloc: cache the last node whose dirty limit is reached
Subject: mm: vmstat: replace __count_zone_vm_events with a zone id equivalent
Subject: mm: vmstat: account per-zone stalls and pages skipped during reclaim
Subject: mm, vmstat: print node-based stats in zoneinfo file
Subject: mm, vmstat: remove zone and node double accounting by approximating retries
Subject: mm, page_alloc: fix dirtyable highmem calculation
Subject: mm, pagevec: release/reacquire lru_lock on pgdat change
Subject: mm: show node_pages_scanned per node, not zone
Subject: mm, vmscan: Update all zone LRU sizes before updating memcg
Subject: mm, vmscan: remove redundant check in shrink_zones()
Subject: mm, vmscan: release/reacquire lru_lock on pgdat change
Subject: mm: add per-zone lru list stat
Subject: mm, vmscan: remove highmem_file_pages
Subject: mm: remove reclaim and compaction retry approximations
Subject: mm: consider whether to decivate based on eligible zones inactive ratio
Subject: mm, vmscan: account for skipped pages as a partial scan
Subject: mm: bail out in shrink_inactive_list()
Subject: mm/zsmalloc: use obj_index to keep consistent with others
Subject: mm/zsmalloc: take obj index back from find_alloced_obj
Subject: mm/zsmalloc: use class->objs_per_zspage to get num of max objects
Subject: mm/zsmalloc: avoid calculate max objects of zspage twice
Subject: mm/zsmalloc: keep comments consistent with code
Subject: mm/zsmalloc: add __init,__exit attribute
Subject: mm/zsmalloc: use helper to clear page->flags bit
Subject: mm, THP: clean up return value of madvise_free_huge_pmd
Subject: memblock: include <asm/sections.h> instead of <asm-generic/sections.h>
Subject: mm: CONFIG_ZONE_DEVICE stop depending on CONFIG_EXPERT
Subject: mm: cleanup ifdef guards for vmem_altmap
Subject: mm: track NR_KERNEL_STACK in KiB instead of number of stacks
Subject: mm: fix memcg stack accounting for sub-page stacks
Subject: kdb: use task_cpu() instead of task_thread_info()->cpu
Subject: printk: when dumping regs, show the stack, not thread_info
Subject: mm/memblock.c: add new infrastructure to address the mem limit issue
Subject: arm64:acpi: fix the acpi alignment exception when 'mem=' specified
Subject: kmemleak: don't hang if user disables scanning early
Subject: make __section_nr() more efficient
Subject: mm: hwpoison: remove incorrect comments
Subject: mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode
Subject: Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"
Subject: mm: add cond_resched() to generic_swapfile_activate()
Subject: mm: optimize copy_page_to/from_iter_iovec
Subject: mem-hotplug: alloc new page from a nearest neighbor node when mem-offline
Subject: mm/memblock.c: fix index adjustment error in __next_mem_range_rev()
Subject: zsmalloc: Delete an unnecessary check before the function call "iput"
Subject: mm: fix use-after-free if memory allocation failed in vma_adjust()
Subject: mm, kasan: account for object redzone in SLUB's nearest_obj()
Subject: mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB
Subject: lib/stackdepot.c: use __GFP_NOWARN for stack allocations
Subject: mm, page_alloc: set alloc_flags only once in slowpath
Subject: mm, page_alloc: don't retry initial attempt in slowpath
Subject: mm, page_alloc: restructure direct compaction handling in slowpath
Subject: mm, page_alloc: make THP-specific decisions more generic
Subject: mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
Subject: mm, compaction: introduce direct compaction priority
Subject: mm, compaction: simplify contended compaction handling
* incoming
@ 2016-07-26 22:16 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-07-26 22:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- a few misc bits
- ocfs2
- most(?) of MM
126 patches, based on e65805251f2db69c9f67ed8062ab82526be5a374:
Subject: arm: get rid of superfluous __GFP_REPEAT
Subject: dax: some small updates to dax.txt documentation
Subject: dax: remote unused fault wrappers
Subject: dma-debug: track bucket lock state for static checkers
Subject: fbmon: remove unused function argument
Subject: CFLAGS: add -Wunused-but-set-parameter
Subject: kbuild: abort build on bad stack protector flag
Subject: scripts/bloat-o-meter: fix percent on <1% changes
Subject: m32r: add __ucmpdi2 to fix build failure
Subject: debugobjects.h: fix trivial kernel doc warning
Subject: ocfs2: fix a redundant re-initialization
Subject: ocfs2: improve recovery performance
Subject: ocfs2: cleanup unneeded goto in ocfs2_create_new_inode_locks
Subject: ocfs2/dlm: fix memory leak of dlm_debug_ctxt
Subject: ocfs2: cleanup implemented prototypes
Subject: ocfs2: remove obscure BUG_ON in dlmglue
Subject: ocfs2/cluster: clean up unnecessary assignment for 'ret'
Subject: fs/fs-writeback.c: add a new writeback list for sync
Subject: fs/fs-writeback.c: inode writeback list tracking tracepoints
Subject: mm: reorganize SLAB freelist randomization
Subject: mm: SLUB freelist randomization
Subject: slab: make GFP_SLAB_BUG_MASK information more human readable
Subject: slab: do not panic on invalid gfp_mask
Subject: mm: faster kmalloc_array(), kcalloc()
Subject: mm/slab: use list_move instead of list_del/list_add
Subject: mm/memcontrol.c: remove the useless parameter for mc_handle_swap_pte
Subject: mm/init: fix zone boundary creation
Subject: memory-hotplug: add move_pfn_range()
Subject: memory-hotplug: more general validation of zone during online
Subject: memory-hotplug: use zone_can_shift() for sysfs valid_zones attribute
Subject: mm: zap ZONE_OOM_LOCKED
Subject: mm: oom: add memcg to oom_control
Subject: include/linux/mmdebug.h: add VM_WARN which maps to WARN()
Subject: powerpc/mm: check for irq disabled() only if DEBUG_VM is enabled
Subject: zram: rename zstrm find-release functions
Subject: zram: switch to crypto compress API
Subject: zram: use crypto api to check alg availability
Subject: zram: cosmetic: cleanup documentation
Subject: zram: delete custom lzo/lz4
Subject: zram: add more compression algorithms
Subject: zram: drop gfp_t from zcomp_strm_alloc()
Subject: mm: use put_page() to free page instead of putback_lru_page()
Subject: mm: migrate: support non-lru movable page migration
Subject: mm: balloon: use general non-lru movable page feature
Subject: zsmalloc: keep max_object in size_class
Subject: zsmalloc: use bit_spin_lock
Subject: zsmalloc: use accessor
Subject: zsmalloc: factor page chain functionality out
Subject: zsmalloc: introduce zspage structure
Subject: zsmalloc: separate free_zspage from putback_zspage
Subject: zsmalloc: use freeobj for index
Subject: zsmalloc: page migration support
Subject: zram: use __GFP_MOVABLE for memory allocation
Subject: zsmalloc: use OBJ_TAG_BIT for bit shifter
Subject: mm/compaction: split freepages without holding the zone lock
Subject: mm/page_owner: initialize page owner without holding the zone lock
Subject: mm/page_owner: copy last_migrate_reason in copy_page_owner()
Subject: mm/page_owner: introduce split_page_owner and replace manual handling
Subject: tools/vm/page_owner: increase temporary buffer size
Subject: mm/page_owner: use stackdepot to store stacktrace
Subject: mm/page_alloc: introduce post allocation processing on page allocator
Subject: mm/page_isolation: clean up confused code
Subject: mm: thp: check pmd_trans_unstable() after split_huge_pmd()
Subject: mm/hugetlb: simplify hugetlb unmap
Subject: mm: change the interface for __tlb_remove_page()
Subject: mm/mmu_gather: track page size with mmu gather and force flush if page size change
Subject: mm: remove pointless struct in struct page definition
Subject: mm: clean up non-standard page->_mapcount users
Subject: mm: memcontrol: cleanup kmem charge functions
Subject: mm: charge/uncharge kmemcg from generic page allocator paths
Subject: mm: memcontrol: teach uncharge_list to deal with kmem pages
Subject: arch: x86: charge page tables to kmemcg
Subject: pipe: account to kmemcg
Subject: af_unix: charge buffers to kmemcg
Subject: mm,oom: remove unused argument from oom_scan_process_thread().
Subject: mm, frontswap: convert frontswap_enabled to static key
Subject: mm: add NR_ZSMALLOC to vmstat
Subject: include/linux/memblock.h: Clean up code for several trivial details
Subject: mm, oom_reaper: make sure that mmput_async is called only when memory was reaped
Subject: mm, memcg: use consistent gfp flags during readahead
Subject: mm/memblock.c:memblock_add_range(): if nr_new is 0 just return
Subject: mm: make optimistic check for swapin readahead
Subject: mm: make swapin readahead to improve thp collapse rate
Subject: mm, thp: make swapin readahead under down_read of mmap_sem
Subject: mm, thp: fix locking inconsistency in collapse_huge_page
Subject: khugepaged: recheck pmd after mmap_sem re-acquired
Subject: thp, mlock: update unevictable-lru.txt
Subject: mm: do not pass mm_struct into handle_mm_fault
Subject: mm: introduce fault_env
Subject: mm: postpone page table allocation until we have page to map
Subject: rmap: support file thp
Subject: mm: introduce do_set_pmd()
Subject: thp, vmstats: add counters for huge file pages
Subject: thp: support file pages in zap_huge_pmd()
Subject: thp: handle file pages in split_huge_pmd()
Subject: thp: handle file COW faults
Subject: thp: skip file huge pmd on copy_huge_pmd()
Subject: thp: prepare change_huge_pmd() for file thp
Subject: thp: run vma_adjust_trans_huge() outside i_mmap_rwsem
Subject: thp: file pages support for split_huge_page()
Subject: thp, mlock: do not mlock PTE-mapped file huge pages
Subject: vmscan: split file huge pages before paging them out
Subject: page-flags: relax policy for PG_mappedtodisk and PG_reclaim
Subject: radix-tree: implement radix_tree_maybe_preload_order()
Subject: filemap: prepare find and delete operations for huge pages
Subject: truncate: handle file thp
Subject: mm, rmap: account shmem thp pages
Subject: shmem: prepare huge= mount option and sysfs knob
Subject: shmem: get_unmapped_area align huge page
Subject: shmem: add huge pages support
Subject: shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
Subject: thp: extract khugepaged from mm/huge_memory.c
Subject: khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
Subject: shmem: make shmem_inode_info::lock irq-safe
Subject: khugepaged: add support of collapse for tmpfs/shmem pages
Subject: thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
Subject: shmem: split huge pages beyond i_size under memory pressure
Subject: thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
Subject: mm, thp: fix comment inconsistency for swapin readahead functions
Subject: mm, thp: convert from optimistic swapin collapsing to conservative
Subject: mm: fix build warnings in <linux/compaction.h>
Subject: mm: memcontrol: remove BUG_ON in uncharge_list
Subject: mm: memcontrol: fix documentation for compound parameter
Subject: cgroup: fix idr leak for the first cgroup root
Subject: cgroup: remove unnecessary 0 check from css_from_id()
Subject: thp: fix comments of __pmd_trans_huge_lock()
* incoming
@ 2016-07-20 22:44 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-07-20 22:44 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
5 fixes, based on 47ef4ad2684d380dd6d596140fb79395115c3950:
Subject: mm: memcontrol: fix cgroup creation failure after many small jobs
Subject: radix-tree: fix radix_tree_iter_retry() for tagged iterators.
Subject: testing/radix-tree: fix a macro expansion bug
Subject: tools/vm/slabinfo: fix an unintentional printf
Subject: pps: do not crash when failed to register
* incoming
@ 2016-07-14 19:06 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-07-14 19:06 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
20 fixes, based on f97d10454e4da2aceb44dfa7c59bb43ba9f50199:
Subject: mm, compaction: prevent VM_BUG_ON when terminating freeing scanner
Subject: kasan: add newline to messages
Subject: scripts/gdb: silence 'nothing to do' message
Subject: scripts/gdb: rebuild constants.py on dependancy change
Subject: scripts/gdb: add constants.py to .gitignore
Subject: scripts/gdb: Perform path expansion to lx-symbol's arguments
Subject: Revert "scripts/gdb: add a Radix Tree Parser"
Subject: Revert "scripts/gdb: add documentation example for radix tree"
Subject: madvise_free, thp: fix madvise_free_huge_pmd return value after splitting
Subject: uapi: export lirc.h header
Subject: kasan/quarantine: fix bugs on qlist_move_cache()
Subject: mm, meminit: always return a valid node from early_pfn_to_nid
Subject: mm, meminit: ensure node is online before checking whether pages are uninitialised
Subject: gcov: add support for gcc version >= 6
Subject: vmlinux.lds: account for destructor sections
Subject: mm: thp: move pmd check inside ptl for freeze_page()
Subject: mm: rmap: call page_check_address() with sync enabled to avoid racy check
Subject: mm: thp: refix false positive BUG in page_move_anon_rmap()
Subject: mm: workingset: printk missing log level, use pr_info()
Subject: m32r: fix build warning about putc
* incoming
@ 2016-06-24 21:48 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-06-24 21:48 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
Two weeks' worth of fixes here.
41 fixes, based on 63c04ee7d3b7c8d8e2726cb7c5f8a5f6fcc1e3b2:
Subject: mm,oom_reaper: don't call mmput_async() without atomic_inc_not_zero()
Subject: oom_reaper: avoid pointless atomic_inc_not_zero usage.
Subject: selftests/vm/compaction_test: fix write to restore nr_hugepages
Subject: tmpfs: don't undo fallocate past its last page
Subject: tree wide: get rid of __GFP_REPEAT for order-0 allocations part I
Subject: x86: get rid of superfluous __GFP_REPEAT
Subject: x86/efi: get rid of superfluous __GFP_REPEAT
Subject: arm64: get rid of superfluous __GFP_REPEAT
Subject: arc: get rid of superfluous __GFP_REPEAT
Subject: mips: get rid of superfluous __GFP_REPEAT
Subject: nios2: get rid of superfluous __GFP_REPEAT
Subject: parisc: get rid of superfluous __GFP_REPEAT
Subject: score: get rid of superfluous __GFP_REPEAT
Subject: powerpc: get rid of superfluous __GFP_REPEAT
Subject: sparc: get rid of superfluous __GFP_REPEAT
Subject: s390: get rid of superfluous __GFP_REPEAT
Subject: sh: get rid of superfluous __GFP_REPEAT
Subject: tile: get rid of superfluous __GFP_REPEAT
Subject: unicore32: get rid of superfluous __GFP_REPEAT
Subject: jbd2: get rid of superfluous __GFP_REPEAT
Subject: MAINTAINERS: update Calgary IOMMU
Subject: mm: mempool: kasan: don't poot mempool objects in quarantine
Subject: mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
Subject: mailmap: add Antoine Tenart's email
Subject: mailmap: add Boris Brezillon's email
Subject: Revert "mm: make faultaround produce old ptes"
Subject: Revert "mm: disable fault around on emulated access bit architecture"
Subject: hugetlb: fix nr_pmds accounting with shared page tables
Subject: memcg: mem_cgroup_migrate() may be called with irq disabled
Subject: memcg: css_alloc should return an ERR_PTR value on error
Subject: mm/swap.c: flush lru pvecs on compound page arrival
Subject: mm/hugetlb: clear compound_mapcount when freeing gigantic pages
Subject: mm: prevent KASAN false positives in kmemleak
Subject: mm, compaction: abort free scanner if split fails
Subject: ocfs2: disable BUG assertions in reading blocks
Subject: oom, suspend: fix oom_reaper vs. oom_killer_disable race
Subject: fs/nilfs2: fix potential underflow in call to crc32_le
Subject: tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences"
Subject: mm/page_owner: avoid null pointer dereference
Subject: autofs: don't get stuck in a loop if vfs_write() returns an error
Subject: init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64
* incoming
@ 2016-06-08 22:33 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-06-08 22:33 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
7 fixes, based on c8ae067f2635be0f8c7e5db1bb74b757d623e05b:
Subject: mm/hugetlb: fix huge page reserve accounting for private mappings
Subject: kasan: change memory hot-add error messages to info messages
Subject: revert "mm: memcontrol: fix possible css ref leak on oom"
Subject: mm: thp: broken page count after commit aa88b68c
Subject: kernel/relay.c: fix potential memory leak
Subject: mm: introduce dedicated WQ_MEM_RECLAIM workqueue to do lru_add_drain_all
Subject: mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEED
* incoming
@ 2016-06-03 21:51 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-06-03 21:51 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
11 fixes, based on 4340fa55298d17049e71c7a34e04647379c269f3:
Subject: mm: fix overflow in vm_map_ram()
Subject: kdump: fix dmesg gdbmacro to work with record based printk
Subject: mm: check the return value of lookup_page_ext for all call sites
Subject: reiserfs: avoid uninitialized variable use
Subject: memcg: add RCU locking around css_for_each_descendant_pre() in memcg_offline_kmem()
Subject: mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup
Subject: checkpatch: reduce git commit description style false positives
Subject: mm, page_alloc: prevent infinite loop in buffered_rmqueue()
Subject: mm, oom_reaper: do not use siglock in try_oom_reaper()
Subject: mm, page_alloc: reset zonelist iterator after resetting fair zone allocation policy
Subject: mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies
* incoming
@ 2016-05-27 21:26 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-05-27 21:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- late-breaking ocfs2 updates
- random bunch of fixes
19 patches, based on dc03c0f9d12d85286d5e3623aa96d5c2a271b8e6:
Subject: ocfs2: o2hb: add negotiate timer
Subject: ocfs2: o2hb: add NEGO_TIMEOUT message
Subject: ocfs2: o2hb: add NEGOTIATE_APPROVE message
Subject: ocfs2: o2hb: add some user/debug log
Subject: ocfs2: o2hb: don't negotiate if last hb fail
Subject: ocfs2: o2hb: fix hb hung time
Subject: ocfs2: bump up o2cb network protocol version
Subject: direct-io: fix direct write stale data exposure from concurrent buffered read
Subject: mm: oom: do not reap task if there are live threads in threadgroup
Subject: MAINTAINERS: add kexec_core.c and kexec_file.c
Subject: MAINTAINERS: Kdump maintainers update
Subject: mm: use early_pfn_to_nid in page_ext_init
Subject: mm: use early_pfn_to_nid in register_page_bootmem_info_node
Subject: oom_reaper: close race with exiting task
Subject: mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()
Subject: mm/cma: silence warnings due to max() usage
Subject: mm/memcontrol.c: fix the margin computation in mem_cgroup_margin()
Subject: mm/memcontrol.c: move comments for get_mctgt_type() to proper position
Subject: mm: disable DEFERRED_STRUCT_PAGE_INIT on !NO_BOOTMEM
* incoming
@ 2016-05-26 22:15 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-05-26 22:15 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
10 fixes, based on ea8ea737c46cffa5d0ee74309f81e55a7e5e9c2a:
Subject: seqlock: fix raw_read_seqcount_latch()
Subject: mm: make CONFIG_DEFERRED_STRUCT_PAGE_INIT depends on !FLATMEM explicitly
Subject: mm: kasan: remove unused 'reserved' field from struct kasan_alloc_meta
Subject: mm: slub: remove unused virt_to_obj()
Subject: ocfs2: fix improper handling of return errno
Subject: memcg: fix mem_cgroup_out_of_memory() return value.
Subject: mm: oom_reaper: remove some bloat
Subject: dma-debug: avoid spinlock recursion when disabling dma-debug
Subject: update "mm/zsmalloc: don't fail if can't create debugfs info"
Subject: drivers/pinctrl/intel/pinctrl-baytrail.c: fix build with gcc-4.4
* incoming
@ 2016-05-23 23:21 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-05-23 23:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- Please have a think about Oleg's "wait/ptrace: assume __WALL if the
child is traced". It's a kernel-based workaround for existing
userspace issues and is a form of non-back-compatible change.
- A few hotfixes
- befs cleanups
- nilfs2 updates
- sys_wait() changes
- kexec updates
- kdump
- scripts/gdb updates
- the last of the MM queue
- a few other misc things
84 patches, based on 7639dad93a5564579987abded4ec05e3db13659d:
Subject: m32r: fix build failure
Subject: ELF/MIPS build fix
Subject: mm: memcontrol: fix possible css ref leak on oom
Subject: fs/befs/datastream.c:befs_read_datastream(): remove unneeded initialization to NULL
Subject: fs/befs/datastream.c:befs_read_lsymlink(): remove unneeded initialization to NULL
Subject: fs/befs/datastream.c:befs_find_brun_dblindirect(): remove unneeded initializations to NULL
Subject: fs/befs/linuxvfs.c:befs_get_block(): remove unneeded initialization to NULL
Subject: fs/befs/linuxvfs.c:befs_iget(): remove unneeded initialization to NULL
Subject: fs/befs/linuxvfs.c:befs_iget(): remove unneeded raw_inode initialization to NULL
Subject: fs/befs/linuxvfs.c:befs_iget(): remove unneeded befs_nio initialization to NULL
Subject: fs/befs/io.c:befs_bread_iaddr(): remove unneeded initialization to NULL
Subject: fs/befs/io.c:befs_bread(): remove unneeded initialization to NULL
Subject: nilfs2: constify nilfs_sc_operations structures
Subject: nilfs2: fix white space issue in nilfs_mount()
Subject: nilfs2: remove space before comma
Subject: nilfs2: remove FSF mailing address from GPL notices
Subject: nilfs2: clean up old e-mail addresses
Subject: MAINTAINERS: add web link for nilfs project
Subject: nilfs2: clarify permission to replicate the design
Subject: nilfs2: get rid of nilfs_mdt_mark_block_dirty()
Subject: nilfs2: move cleanup code of metadata file from inode routines
Subject: nilfs2: replace __attribute__((packed)) with __packed
Subject: nilfs2: add missing line spacing
Subject: nilfs2: clean trailing semicolons in macros
Subject: nilfs2: do not emit extra newline on nilfs_warning() and nilfs_error()
Subject: nilfs2: remove space before semicolon
Subject: nilfs2: fix code indent coding style issue
Subject: nilfs2: avoid bare use of 'unsigned'
Subject: nilfs2: remove unnecessary else after return or break
Subject: nilfs2: remove loops of single statement macros
Subject: nilfs2: fix block comments
Subject: wait/ptrace: assume __WALL if the child is traced
Subject: wait: allow sys_waitid() to accept __WNOTHREAD/__WCLONE/__WALL
Subject: signal: make oom_flags a bool
Subject: kernel/signal.c: convert printk(KERN_<LEVEL> ...) to pr_<level>(...)
Subject: signal: move the "sig < SIGRTMIN" check into siginmask(sig)
Subject: kernek/fork.c: allocate idle task for a CPU always on its local node
Subject: exec: remove the no longer needed remove_arg_zero()->free_arg_page()
Subject: kexec: introduce a protection mechanism for the crashkernel reserved memory
Subject: kexec: provide arch_kexec_protect(unprotect)_crashkres()
Subject: kexec: make a pair of map/unmap reserved pages in error path
Subject: kexec: do a cleanup for function kexec_load
Subject: s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres()
Subject: kdump: fix gdb macros work work with newer and 64-bit kernels
Subject: rtsx_usb_ms: use schedule_timeout_idle() in polling loop
Subject: drivers/memstick/core/mspro_block: use kmemdup
Subject: arch/defconfig: remove CONFIG_RESOURCE_COUNTERS
Subject: scripts/gdb: Adjust module reference counter reported by lx-lsmod
Subject: scripts/gdb: provide linux constants
Subject: scripts/gdb: provide kernel list item generators
Subject: scripts/gdb: convert modules usage to lists functions
Subject: scripts/gdb: provide exception catching parser
Subject: scripts/gdb: support !CONFIG_MODULES gracefully
Subject: scripts/gdb: provide a dentry_name VFS path helper
Subject: scripts/gdb: add io resource readers
Subject: scripts/gdb: add mount point list command
Subject: scripts/gdb: add cpu iterators
Subject: scripts/gdb: cast CPU numbers to integer
Subject: scripts/gdb: add a Radix Tree Parser
Subject: scripts/gdb: add documentation example for radix tree
Subject: scripts/gdb: add lx_thread_info_by_pid helper
Subject: scripts/gdb: improve types abstraction for gdb python scripts
Subject: scripts/gdb: fix issue with dmesg.py and python 3.X
Subject: scripts/gdb: decode bytestream on dmesg for Python3
Subject: MAINTAINERS: add co-maintainer for scripts/gdb
Subject: mm: make mmap_sem for write waits killable for mm syscalls
Subject: mm: make vm_mmap killable
Subject: mm: make vm_munmap killable
Subject: mm, aout: handle vm_brk failures
Subject: mm, elf: handle vm_brk error
Subject: mm: make vm_brk killable
Subject: mm, proc: make clear_refs killable
Subject: mm, fork: make dup_mmap wait for mmap_sem for write killable
Subject: ipc, shm: make shmem attach/detach wait for mmap_sem killable
Subject: vdso: make arch_setup_additional_pages wait for mmap_sem for write killable
Subject: coredump: make coredump_wait wait for mmap_sem for write killable
Subject: aio: make aio_setup_ring killable
Subject: exec: make exec path waiting for mmap_sem killable
Subject: prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable
Subject: uprobes: wait for mmap_sem for write killable
Subject: drm/i915: make i915_gem_mmap_ioctl wait for mmap_sem killable
Subject: drm/radeon: make radeon_mn_get wait for mmap_sem killable
Subject: drm/amdgpu: make amdgpu_mn_get wait for mmap_sem killable
Subject: kgdb: depends on VT
* incoming
@ 2016-05-20 23:55 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-05-20 23:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- the rest of MM
- KASAN updates
- procfs updates
- exit, fork updates
- printk updates
- lib/ updates
- radix-tree testsuite updates
- checkpatch updates
- kprobes updates
- a few other misc bits
162 patches, based on 6eb59af580dcffc6f6982ac8ef6d27a1a5f26b27
* incoming
@ 2016-05-20 0:07 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-05-20 0:07 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
- fsnotify fix
- poll() timeout fix
- a few scripts/ tweaks
- debugobjects updates
- the (small) ocfs2 queue
- Minor fixes to kernel/padata.c
- Maybe half of the MM queue
117 patches, based on 2600a46ee0ed57c0e0a382c2a37ebac64d374d20:
Subject: fsnotify: avoid spurious EMFILE errors from inotify_init()
Subject: time: add missing implementation for timespec64_add_safe()
Subject: fs: poll/select/recvmmsg: use timespec64 for timeout events
Subject: time: remove timespec_add_safe()
Subject: scripts/decode_stacktrace.sh: handle symbols in modules
Subject: scripts/spelling.txt: add "fimware" misspelling
Subject: scripts/bloat-o-meter: print percent change
Subject: debugobjects: make fixup functions return bool instead of int
Subject: debugobjects: correct the usage of fixup call results
Subject: workqueue: update debugobjects fixup callbacks return type
Subject: timer: update debugobjects fixup callbacks return type
Subject: rcu: update debugobjects fixup callbacks return type
Subject: percpu_counter: update debugobjects fixup callbacks return type
Subject: Documentation: update debugobjects doc
Subject: debugobjects: insulate non-fixup logic related to static obj from fixup callbacks
Subject: ocfs2: fix comment in struct ocfs2_extended_slot
Subject: ocfs2: clean up an unused variable 'wants_rotate' in ocfs2_truncate_rec
Subject: ocfs2: clean up unused parameter 'count' in o2hb_read_block_input()
Subject: ocfs2: clean up an unneeded goto in ocfs2_put_slot()
Subject: kernel/padata.c: removed unused code
Subject: kernel/padata.c: hide unused functions
Subject: mm/slab: fix the theoretical race by holding proper lock
Subject: mm/slab: remove BAD_ALIEN_MAGIC again
Subject: mm/slab: drain the free slab as much as possible
Subject: mm/slab: factor out kmem_cache_node initialization code
Subject: mm/slab: clean-up kmem_cache_node setup
Subject: mm/slab: don't keep free slabs if free_objects exceeds free_limit
Subject: mm/slab: racy access/modify the slab color
Subject: mm/slab: make cache_grow() handle the page allocated on arbitrary node
Subject: mm/slab: separate cache_grow() to two parts
Subject: mm/slab: refill cpu cache through a new slab without holding a node lock
Subject: mm/slab: lockless decision to grow cache
Subject: mm/slub.c: replace kick_all_cpus_sync() with synchronize_sched() in kmem_cache_shrink()
Subject: mm: SLAB freelist randomization
Subject: mm: slab: remove ZONE_DMA_FLAG
Subject: mm/slub.c: fix sysfs filename in comment
Subject: mm/page_ref: use page_ref helper instead of direct modification of _count
Subject: mm: rename _count, field of the struct page, to _refcount
Subject: compiler.h: add support for malloc attribute
Subject: include/linux: apply __malloc attribute
Subject: include/linux/nodemask.h: create next_node_in() helper
Subject: mm/hugetlb: optimize minimum size (min_size) accounting
Subject: mm/hugetlb: introduce hugetlb_bad_size()
Subject: arm64: mm: use hugetlb_bad_size()
Subject: metag: mm: use hugetlb_bad_size()
Subject: powerpc: mm: use hugetlb_bad_size()
Subject: tile: mm: use hugetlb_bad_size()
Subject: x86: mm: use hugetlb_bad_size()
Subject: mm/hugetlb: is_vm_hugetlb_page() can return bool
Subject: mm/memory_hotplug: is_mem_section_removable() can return bool
Subject: mm/vmalloc.c: is_vmalloc_addr() can return bool
Subject: mm/mempolicy.c: vma_migratable() can return bool
Subject: mm/memcontrol.c:mem_cgroup_select_victim_node(): clarify comment
Subject: mm/page_alloc: remove useless parameter of __free_pages_boot_core
Subject: mm/hugetlb.c: use first_memory_node
Subject: mm/mempolicy.c:offset_il_node() document and clarify
Subject: mm/rmap: replace BUG_ON(anon_vma->degree) with VM_WARN_ON
Subject: mm, compaction: wrap calculating first and last pfn of pageblock
Subject: mm, compaction: reduce spurious pcplist drains
Subject: mm, compaction: skip blocks where isolation fails in async direct compaction
Subject: mm/highmem: simplify is_highmem()
Subject: mm: uninline page_mapped()
Subject: mm/hugetlb: add same zone check in pfn_range_valid_gigantic()
Subject: mm/memory_hotplug: add comment to some functions related to memory hotplug
Subject: mm/vmstat: add zone range overlapping check
Subject: mm/page_owner: add zone range overlapping check
Subject: power: add zone range overlapping check
Subject: mm/writeback: correct dirty page calculation for highmem
Subject: mm/page_alloc: correct highmem memory statistics
Subject: mm/highmem: make nr_free_highpages() handles all highmem zones by itself
Subject: mm/vmstat: make node_page_state() handles all zones by itself
Subject: mm/mmap: kill hook arch_rebalance_pgtables()
Subject: mm: update_lru_size warn and reset bad lru_size
Subject: mm: update_lru_size do the __mod_zone_page_state
Subject: mm: use __SetPageSwapBacked and dont ClearPageSwapBacked
Subject: tmpfs: preliminary minor tidyups
Subject: tmpfs: mem_cgroup charge fault to vm_mm not current mm
Subject: mm: /proc/sys/vm/stat_refresh to force vmstat update
Subject: huge mm: move_huge_pmd does not need new_vma
Subject: huge pagecache: extend mremap pmd rmap lockout to files
Subject: arch: fix has_transparent_hugepage()
Subject: memory_hotplug: introduce CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE
Subject: memory_hotplug: introduce memhp_default_state= command line parameter
Subject: mm, oom: move GFP_NOFS check to out_of_memory
Subject: oom, oom_reaper: try to reap tasks which skip regular OOM killer path
Subject: mm, oom_reaper: clear TIF_MEMDIE for all tasks queued for oom_reaper
Subject: mm, page_alloc: only check PageCompound for high-order pages
Subject: mm, page_alloc: use new PageAnonHead helper in the free page fast path
Subject: mm, page_alloc: reduce branches in zone_statistics
Subject: mm, page_alloc: inline zone_statistics
Subject: mm, page_alloc: inline the fast path of the zonelist iterator
Subject: mm, page_alloc: use __dec_zone_state for order-0 page allocation
Subject: mm, page_alloc: avoid unnecessary zone lookups during pageblock operations
Subject: mm, page_alloc: convert alloc_flags to unsigned
Subject: mm, page_alloc: convert nr_fair_skipped to bool
Subject: mm, page_alloc: remove unnecessary local variable in get_page_from_freelist
Subject: mm, page_alloc: remove unnecessary initialisation in get_page_from_freelist
Subject: mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask()
Subject: mm, page_alloc: simplify last cpupid reset
Subject: mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath
Subject: mm, page_alloc: check once if a zone has isolated pageblocks
Subject: mm, page_alloc: shorten the page allocator fast path
Subject: mm, page_alloc: reduce cost of fair zone allocation policy retry
Subject: mm, page_alloc: shortcut watermark checks for order-0 pages
Subject: mm, page_alloc: avoid looking up the first zone in a zonelist twice
Subject: mm, page_alloc: remove field from alloc_context
Subject: mm, page_alloc: check multiple page fields with a single branch
Subject: mm, page_alloc: un-inline the bad part of free_pages_check
Subject: mm, page_alloc: pull out side effects from free_pages_check
Subject: mm, page_alloc: remove unnecessary variable from free_pcppages_bulk
Subject: mm, page_alloc: inline pageblock lookup in page free fast paths
Subject: cpuset: use static key better and convert to new API
Subject: mm, page_alloc: defer debugging checks of freed pages until a PCP drain
Subject: mm, page_alloc: defer debugging checks of pages allocated from the PCP
Subject: mm, page_alloc: don't duplicate code in free_pcp_prepare
Subject: mm, page_alloc: uninline the bad page part of check_new_page()
Subject: mm, page_alloc: restore the original nodemask if the fast path allocation failed
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2016-05-12 22:41 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-05-12 22:41 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
[-- Attachment #1: Type: text/plain, Size: 344 bytes --]
4 fixes, based on 422ce5a97570cb8a37d016b6bc2021ae4dac5499:
Subject: ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang
Subject: ocfs2: fix posix_acl_create deadlock
Subject: ksm: fix conflict between mmput and scan_get_next_rmap_item
Subject: mm: thp: calculate the mapcount correctly for THP pages during WP faults
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2016-05-09 23:28 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-05-09 23:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
[-- Attachment #1: Type: text/plain, Size: 302 bytes --]
3 fixes, based on 44549e8f5eea4e0a41b487b63e616cb089922b99:
Subject: Revert "proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan""
Subject: zsmalloc: fix zs_can_compact() integer overflow
Subject: compiler-gcc: require gcc 4.8 for powerpc __builtin_bswap16()
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2016-05-05 23:21 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2016-05-05 23:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits
[-- Attachment #1: Type: text/plain, Size: 987 bytes --]
14 fixes, based on c5e0666c5a3ccabdf16bb88451886cdf81849b66:
Subject: mm: thp: correct split_huge_pages file permission
Subject: mm: memcontrol: let v2 cgroups follow changes in system swappiness
Subject: rapidio/mport_cdev: fix uapi type definitions
Subject: huge pagecache: mmap_sem is unlocked when truncation splits pmd
Subject: mm: update min_free_kbytes from khugepaged after core initialization
Subject: mm, cma: prevent nr_isolated_* counters from going negative
Subject: MAINTAINERS: fix Rajendra Nayak's address
Subject: mm: thp: kvm: fix memory corruption in KVM with THP enabled
Subject: mm/zswap: provide unique zpool name
Subject: proc: prevent accessing /proc/<PID>/environ until it's ready
Subject: modpost: fix module autoloading for OF devices with generic compatible property
Subject: mm: fix kcompactd hang during memory offlining
Subject: lib/stackdepot: avoid to return 0 handle
Subject: byteswap: try to avoid __builtin_constant_p gcc bug
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2015-09-09 23:23 ` incoming Linus Torvalds
@ 2015-09-10 6:47 ` Rasmus Villemoes
0 siblings, 0 replies; 786+ messages in thread
From: Rasmus Villemoes @ 2015-09-10 6:47 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, Alexey Dobriyan, Linux Kernel Mailing List
On Thu, Sep 10 2015, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> The VERY FIRST conversion patch I looked at was buggy. That makes me
> angry. The whole *AND*ONLY* point of this whole thing was to get rid
> of bugs, and be an obviously safe interface, and then the first
> conversion patch proves it wrong.
>
> Let me show you:
>
> if (isdigit(*str)) {
> - io_tlb_nslabs = simple_strtoul(str, &str, 0);
> + str += parse_integer(str, 0, &io_tlb_nslabs);
>
> and obviously nobody spent even a *second* asking themselves "what if
> parse_integer returns an error".
[This is going to sound awfully self-glorifying. Oh well.] I did point
that out in another instance (memparse), which I think then got somewhat
fixed in a later version. Since Alexey and I seemed to disagree on what
guiding principles to use when doing the conversions and a number of
other points, I didn't have the energy to go through the entire series,
and the discussion died out.
http://thread.gmane.org/gmane.linux.kernel/1942623/focus=1944193
> I liked the automatic type-based templating it does, but I *don't*
> like the breakage that seems to be inevitable in any large-scale
> conversion from a previously used historical interface.
My words exactly.
Rasmus
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
[not found] <20150909153424.3feb1c403a841ab97b2d98ab@linux-foundation.org>
@ 2015-09-09 23:23 ` Linus Torvalds
2015-09-10 6:47 ` incoming Rasmus Villemoes
0 siblings, 1 reply; 786+ messages in thread
From: Linus Torvalds @ 2015-09-09 23:23 UTC (permalink / raw)
To: Andrew Morton, Alexey Dobriyan; +Cc: Linux Kernel Mailing List
On Wed, Sep 9, 2015 at 3:34 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> Subject: lib/: add parse_integer() (replacement for simple_strto*())
> Subject: parse_integer: add runtime testsuite
> Subject: parse-integer: rewrite kstrto*()
> Subject: parse_integer: add checkpatch.pl notice
> Subject: parse_integer: convert scanf()
> Subject: scanf: fix type range overflow
> Subject: parse_integer: convert lib/
> Subject: parse_integer: convert mm/
> Subject: parse_integer: convert fs/
> Subject: parse_integer: convert fs/cachefiles/
> Subject: parse_integer: convert ext2, ext4
> Subject: parse_integer: convert fs/ocfs2/
> Subject: parse_integer: convert fs/9p/
> Subject: parse_integer: convert fs/exofs/
No.
I'm not taking yet another broken "deprecate old interface, replace it
with new-and-improved one, and screw things up in the process".
The whole "kstrto*()" thing was a mistake. We had real bugs brought in
by the conversion to the "better" interface. The "even betterer" new
parse_integer() interface actually looks like a real improvement, and
talks about some of the brokenness of the old code, and I was really
wanting to like it, but then I saw the conversions.
The VERY FIRST conversion patch I looked at was buggy. That makes me
angry. The whole *AND*ONLY* point of this whole thing was to get rid
of bugs, and be an obviously safe interface, and then the first
conversion patch proves it wrong.
Let me show you:
if (isdigit(*str)) {
- io_tlb_nslabs = simple_strtoul(str, &str, 0);
+ str += parse_integer(str, 0, &io_tlb_nslabs);
and obviously nobody spent even a *second* asking themselves "what if
parse_integer returns an error".
The old code didn't fail catastrophically in the error case. The new one does.
And yes, parse_integer() really can return an error, even despite that
"isdigit(*str)" check. Think about it. Or just read the source code.
I really am very tired indeed of these "trivially obvious
improvements" that are buggy and actually introduce whole new ways to
write buggy code. Yes, the old code could miss an error. But the old
code wouldn't then create invalid pointers like the new code does.
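To make the failure mode concrete, here is a minimal userspace sketch. parse_uint_sketch() and parse_and_advance() are hypothetical names, not the parse_integer() from the proposed series; the stand-in is only assumed to share the convention of returning the number of characters consumed on success and a negative errno on failure. An over-long digit string passes an isdigit() check but still fails with -ERANGE, so adding the return value to the pointer unconditionally would move it backwards, in front of the buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the proposed interface (an assumption, not the
 * real parse_integer()): parse a decimal unsigned int from s, store it in
 * *val, and return the number of characters consumed, or a negative errno.
 */
static int parse_uint_sketch(const char *s, unsigned int *val)
{
	unsigned int acc = 0;
	int n = 0;

	if (!isdigit((unsigned char)s[0]))
		return -EINVAL;
	while (isdigit((unsigned char)s[n])) {
		unsigned int d = (unsigned int)(s[n] - '0');

		if (acc > (UINT_MAX - d) / 10)
			return -ERANGE;	/* value does not fit in unsigned int */
		acc = acc * 10 + d;
		n++;
	}
	*val = acc;
	return n;
}

/*
 * Advance *strp past a number only when parsing succeeded. The buggy
 * pattern "str += parse(...)" would instead add a negative errno to str.
 */
static int parse_and_advance(const char **strp, unsigned int *val)
{
	int rv;

	assert(strp && *strp && val);
	rv = parse_uint_sketch(*strp, val);
	if (rv < 0)
		return rv;	/* leave *strp and *val untouched on error */
	*strp += rv;
	return 0;
}

#ifdef SKETCH_DEMO
/* Build with -DSKETCH_DEMO to get a standalone demonstration program. */
int main(void)
{
	const char *bad = "99999999999999999999";	/* digits only, too large */
	const char *p = bad;
	unsigned int v = 0;
	int rv = parse_and_advance(&p, &v);

	printf("rv=%d pointer_moved=%d\n", rv, p != bad);
	return 0;
}
#endif
```

Keeping the error check and the pointer update in one helper means a caller cannot apply the update without first looking at the return value. That is only one possible arrangement and not something the thread settles on.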
I'm not thrilled about going through the rest of this sequence,
looking for other gotcha's. But I am *really* really tired of this
idiotic "let's make up a new interface that gets things right" and
then absolutely doesn't get it right at all. This is not just an issue
for number parsing - we had similar issues with the completely
moronic and misdesigned crap called "strlcpy()", which was introduced
for similar reasons, and also caused nasty bugs where the old code was
actually correct, and the "converted to better and safer interfaces"
code was actually buggy.
Mixing the error handling and the string update was a mistake.
Although *not* mixing it causes its own set of problems.
But whatever the final resolution to this is, I am *not* taking this
series. No way, no how. I liked the automatic type-based templating it
does, but I *don't* like the breakage that seems to be inevitable in
any large-scale conversion from a previously used historical
interface. People who implement new and improved interfaces always
seem to get that wrong.
Linus
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2007-05-04 19:24 ` incoming Greg KH
@ 2007-05-04 19:29 ` Roland McGrath
-1 siblings, 0 replies; 786+ messages in thread
From: Roland McGrath @ 2007-05-04 19:29 UTC (permalink / raw)
To: Greg KH
Cc: Andrew Morton, Linus Torvalds, Hugh Dickins, Christoph Lameter,
David S. Miller, Andi Kleen, Luck, Tony, Rik van Riel,
Benjamin Herrenschmidt, linux-kernel, linux-mm, Stephen Smalley
> ABI changes are not a problem for -stable, so don't let that stop anyone
> :)
In fact this is the harmless sort (changes only the error code of a
failure case) that might actually go in if there were any important
reason. But the smiley stands.
Thanks,
Roland
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2007-05-04 18:57 ` incoming Roland McGrath
@ 2007-05-04 19:24 ` Greg KH
-1 siblings, 0 replies; 786+ messages in thread
From: Greg KH @ 2007-05-04 19:24 UTC (permalink / raw)
To: Roland McGrath
Cc: Andrew Morton, Linus Torvalds, Hugh Dickins, Christoph Lameter,
David S. Miller, Andi Kleen, Luck, Tony, Rik van Riel,
Benjamin Herrenschmidt, linux-kernel, linux-mm, Stephen Smalley
On Fri, May 04, 2007 at 11:57:21AM -0700, Roland McGrath wrote:
> > Ah. The patch affects security code, but it doesn't actually address any
> > insecurity. I didn't think it was needed for -stable?
>
> I would not recommend it for -stable.
> It is an ABI change for the case of a security refusal.
ABI changes are not a problem for -stable, so don't let that stop anyone
:)
thanks,
greg k-h
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2007-05-04 16:14 ` incoming Andrew Morton
@ 2007-05-04 18:57 ` Roland McGrath
-1 siblings, 0 replies; 786+ messages in thread
From: Roland McGrath @ 2007-05-04 18:57 UTC (permalink / raw)
To: Andrew Morton
Cc: Greg KH, Linus Torvalds, Hugh Dickins, Christoph Lameter,
David S. Miller, Andi Kleen, Luck, Tony, Rik van Riel,
Benjamin Herrenschmidt, linux-kernel, linux-mm, Stephen Smalley
> Ah. The patch affects security code, but it doesn't actually address any
> insecurity. I didn't think it was needed for -stable?
I would not recommend it for -stable.
It is an ABI change for the case of a security refusal.
Thanks,
Roland
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2007-05-04 16:14 ` incoming Andrew Morton
@ 2007-05-04 17:02 ` Greg KH
-1 siblings, 0 replies; 786+ messages in thread
From: Greg KH @ 2007-05-04 17:02 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, Hugh Dickins, Christoph Lameter, David S. Miller,
Andi Kleen, Luck, Tony, Rik van Riel, Benjamin Herrenschmidt,
linux-kernel, linux-mm, Roland McGrath, Stephen Smalley
On Fri, May 04, 2007 at 09:14:34AM -0700, Andrew Morton wrote:
> On Fri, 4 May 2007 06:37:28 -0700 Greg KH <greg@kroah.com> wrote:
>
> > On Wed, May 02, 2007 at 03:02:52PM -0700, Andrew Morton wrote:
> > > - One little security patch
> >
> > Care to cc: linux-stable with it so we can do a new 2.6.21 release with
> > it if needed?
> >
>
> Ah. The patch affects security code, but it doesn't actually address any
> insecurity. I didn't think it was needed for -stable?
Ah, ok, I read "security" as fixing an insecure problem, my mistake :)
thanks,
greg k-h
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2007-05-04 13:37 ` incoming Greg KH
@ 2007-05-04 16:14 ` Andrew Morton
-1 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2007-05-04 16:14 UTC (permalink / raw)
To: Greg KH
Cc: Linus Torvalds, Hugh Dickins, Christoph Lameter, David S. Miller,
Andi Kleen, Luck, Tony, Rik van Riel, Benjamin Herrenschmidt,
linux-kernel, linux-mm, Roland McGrath, Stephen Smalley
On Fri, 4 May 2007 06:37:28 -0700 Greg KH <greg@kroah.com> wrote:
> On Wed, May 02, 2007 at 03:02:52PM -0700, Andrew Morton wrote:
> > - One little security patch
>
> Care to cc: linux-stable with it so we can do a new 2.6.21 release with
> it if needed?
>
Ah. The patch affects security code, but it doesn't actually address any
insecurity. I didn't think it was needed for -stable?
From: Roland McGrath <roland@redhat.com>
wait* syscalls return -ECHILD even when an individual PID of a live child
was requested explicitly, when security_task_wait denies the operation.
This means that something like a broken SELinux policy can produce an
unexpected failure that looks just like a bug with wait or ptrace or
something.
This patch makes do_wait return -EACCES (or other appropriate error returned
from security_task_wait()) instead of -ECHILD if some children were ruled out
solely because security_task_wait failed.
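The error selection described above can be sketched in userspace as follows. This is a simplified model, not the kernel code: wait_retval_sketch() is a hypothetical name, the per-child results are passed in as a plain array, and reaping, blocking and the `flag` variable are left out. It only shows which error is reported once nothing could be waited for.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Each entry models what eligible_child() returns after the patch:
 *    0  the child does not match the request
 *  > 0  the child matches and may be waited for
 *  < 0  the child matches, but security_task_wait() refused with this errno
 *
 * Returns the error reported when no child could be reaped.
 */
static int wait_retval_sketch(const int *results, size_t n)
{
	int allowed = 0, denied = 0;
	size_t i;

	assert(results || n == 0);
	for (i = 0; i < n; i++) {
		int ret = results[i];

		if (!ret)
			continue;
		if (ret < 0) {
			denied = ret;	/* remember the most recent refusal */
			continue;
		}
		allowed = 1;
	}

	/* Report the refusal only if every matching child was refused. */
	if (denied && !allowed)
		return denied;
	return -ECHILD;
}

#ifdef SKETCH_DEMO
/* Build with -DSKETCH_DEMO to get a standalone demonstration program. */
int main(void)
{
	const int only_denied[] = { 0, -EACCES };
	const int mixed[] = { -EACCES, 1 };

	printf("only denied: %d\n", wait_retval_sketch(only_denied, 2));
	printf("mixed:       %d\n", wait_retval_sketch(mixed, 2));
	return 0;
}
#endif
```

With one permitted child in the mix the result stays -ECHILD, matching the "solely because" wording of the changelog.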
[jmorris@namei.org: switch error code to EACCES]
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
kernel/exit.c | 17 +++++++++++++++--
1 files changed, 15 insertions(+), 2 deletions(-)
diff -puN kernel/exit.c~return-eperm-not-echild-on-security_task_wait-failure kernel/exit.c
--- a/kernel/exit.c~return-eperm-not-echild-on-security_task_wait-failure
+++ a/kernel/exit.c
@@ -1033,6 +1033,8 @@ asmlinkage void sys_exit_group(int error
static int eligible_child(pid_t pid, int options, struct task_struct *p)
{
+ int err;
+
if (pid > 0) {
if (p->pid != pid)
return 0;
@@ -1066,8 +1068,9 @@ static int eligible_child(pid_t pid, int
if (delay_group_leader(p))
return 2;
- if (security_task_wait(p))
- return 0;
+ err = security_task_wait(p);
+ if (err)
+ return err;
return 1;
}
@@ -1449,6 +1452,7 @@ static long do_wait(pid_t pid, int optio
DECLARE_WAITQUEUE(wait, current);
struct task_struct *tsk;
int flag, retval;
+ int allowed, denied;
add_wait_queue(&current->signal->wait_chldexit,&wait);
repeat:
@@ -1457,6 +1461,7 @@ repeat:
* match our criteria, even if we are not able to reap it yet.
*/
flag = 0;
+ allowed = denied = 0;
current->state = TASK_INTERRUPTIBLE;
read_lock(&tasklist_lock);
tsk = current;
@@ -1472,6 +1477,12 @@ repeat:
if (!ret)
continue;
+ if (unlikely(ret < 0)) {
+ denied = ret;
+ continue;
+ }
+ allowed = 1;
+
switch (p->state) {
case TASK_TRACED:
/*
@@ -1570,6 +1581,8 @@ check_continued:
goto repeat;
}
retval = -ECHILD;
+ if (unlikely(denied) && !allowed)
+ retval = denied;
end:
current->state = TASK_RUNNING;
remove_wait_queue(&current->signal->wait_chldexit,&wait);
_
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2007-05-02 22:02 ` incoming Andrew Morton
@ 2007-05-04 13:37 ` Greg KH
-1 siblings, 0 replies; 786+ messages in thread
From: Greg KH @ 2007-05-04 13:37 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, Hugh Dickins, Christoph Lameter, David S. Miller,
Andi Kleen, Luck, Tony, Rik van Riel, Benjamin Herrenschmidt,
linux-kernel, linux-mm
On Wed, May 02, 2007 at 03:02:52PM -0700, Andrew Morton wrote:
> - One little security patch
Care to cc: linux-stable with it so we can do a new 2.6.21 release with
it if needed?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2007-05-03 7:55 ` incoming Russell King
@ 2007-05-03 8:05 ` Andrew Morton
-1 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2007-05-03 8:05 UTC (permalink / raw)
To: Russell King
Cc: Linus Torvalds, Hugh Dickins, Christoph Lameter, David S. Miller,
Andi Kleen, Luck, Tony, Rik van Riel, Benjamin Herrenschmidt,
linux-kernel, linux-mm
On Thu, 3 May 2007 08:55:43 +0100 Russell King <rmk+lkml@arm.linux.org.uk> wrote:
> On Wed, May 02, 2007 at 03:02:52PM -0700, Andrew Morton wrote:
> > So this is what I have lined up for the first mm->2.6.22 batch. I won't be
> > sending it off for another 12-24 hours yet. To give people time for final
> > comment and to give me time to see if it actually works.
>
> I assume you're going to update this list with my comments I sent
> yesterday?
>
Serial drivers? Well you saw me drop a bunch of them. I now have:
serial-driver-pmc-msp71xx.patch
rm9000-serial-driver.patch
serial-define-fixed_port-flag-for-serial_core.patch
mpsc-serial-driver-tx-locking.patch
serial-serial_core-use-pr_debug.patch
I'll also be holding off on MADV_FREE - Nick has some performance things to
share and I'm assuming they're not as good as he'd like.
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2007-05-02 22:02 ` incoming Andrew Morton
@ 2007-05-03 7:55 ` Russell King
-1 siblings, 0 replies; 786+ messages in thread
From: Russell King @ 2007-05-03 7:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, Hugh Dickins, Christoph Lameter, David S. Miller,
Andi Kleen, Luck, Tony, Rik van Riel, Benjamin Herrenschmidt,
linux-kernel, linux-mm
On Wed, May 02, 2007 at 03:02:52PM -0700, Andrew Morton wrote:
> So this is what I have lined up for the first mm->2.6.22 batch. I won't be
> sending it off for another 12-24 hours yet. To give people time for final
> comment and to give me time to see if it actually works.
I assume you're going to update this list with my comments I sent
yesterday?
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2007-05-02 22:02 ` incoming Andrew Morton
@ 2007-05-02 22:31 ` Benjamin Herrenschmidt
-1 siblings, 0 replies; 786+ messages in thread
From: Benjamin Herrenschmidt @ 2007-05-02 22:31 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, Hugh Dickins, Christoph Lameter, David S. Miller,
Andi Kleen, Luck, Tony, Rik van Riel, linux-kernel, linux-mm
On Wed, 2007-05-02 at 15:02 -0700, Andrew Morton wrote:
> So this is what I have lined up for the first mm->2.6.22 batch. I won't be
> sending it off for another 12-24 hours yet. To give people time for final
> comment and to give me time to see if it actually works.
Thanks.
I have some powerpc bits that depend on that stuff that will go through
Paulus after these show up in git and I've rebased.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2007-05-02 22:02 ` Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2007-05-02 22:02 UTC (permalink / raw)
To: Linus Torvalds
Cc: Hugh Dickins, Christoph Lameter, David S. Miller, Andi Kleen,
Luck, Tony, Rik van Riel, Benjamin Herrenschmidt, linux-kernel,
linux-mm
So this is what I have lined up for the first mm->2.6.22 batch. I won't be
sending it off for another 12-24 hours yet. To give people time for final
comment and to give me time to see if it actually works.
- A few serial bits.
- A few pcmcia bits.
- Some of the MM queue. Includes:
- An enhancement to /proc/pid/smaps to permit monitoring of a running
program's working set.
There's another patchset which builds on this quite a lot from Matt
Mackall, but it's not quite ready yet.
- The SLUB allocator. It's pretty green but I do want to push ahead
with this pretty aggressively with a view to replacing slab altogether.
If it ends up not working out then we should remove slub altogether
again, but I doubt if that will occur.
If SLUB isn't in good shape by 2.6.22 we should hide it in Kconfig
to prevent people from hitting known problems. It'll remain
EXPERIMENTAL.
- generic pagetable quicklist management. We have x86_64 and ia64
and sparc64 implementations, but I'll only include David's sparc64
implementation here. I'll send the x86_64 and ia64 implementations
through maintainers.
- Various random MM bits
- Benh's teach-get_unmapped_area-about-MAP_FIXED changes
- madvise(MADV_FREE)
This means I'm holding back Mel's page allocator work, and Andy's
lumpy-reclaim.
A shame in a way - I have high hopes for lumpy reclaim against the
moveable zone, but these things are not to be done lightly.
A few MM things have been held back awaiting subsystem tree merges
(probably x86 - I didn't check).
- One little security patch
- the blackfin architecture
- small h8300 update
- small alpha update
- swsusp updates
- m68k bits
- cris updates
- Lots of UML updates
- v850, xtensa
slab-introduce-krealloc.patch
at91_cf-minor-fix.patch
add-new_id-to-pcmcia-drivers.patch
ide-cs-recognize-2gb-compactflash-from-transcend.patch
serial-driver-pmc-msp71xx.patch
rm9000-serial-driver.patch
serial-define-fixed_port-flag-for-serial_core.patch
serial-use-resource_size_t-for-serial-port-io-addresses.patch
mpsc-serial-driver-tx-locking.patch
8250_pci-fix-pci-must_checks.patch
serial-serial_core-use-pr_debug.patch
add-apply_to_page_range-which-applies-a-function-to-a-pte-range.patch
safer-nr_node_ids-and-nr_node_ids-determination-and-initial.patch
use-zvc-counters-to-establish-exact-size-of-dirtyable-pages.patch
proper-prototype-for-hugetlb_get_unmapped_area.patch
mm-remove-gcc-workaround.patch
slab-ensure-cache_alloc_refill-terminates.patch
mm-make-read_cache_page-synchronous.patch
fs-buffer-dont-pageuptodate-without-page-locked.patch
allow-oom_adj-of-saintly-processes.patch
introduce-config_has_dma.patch
mm-slabc-proper-prototypes.patch
add-pfn_valid_within-helper-for-sub-max_order-hole-detection.patch
mm-simplify-filemap_nopage.patch
add-unitialized_var-macro-for-suppressing-gcc-warnings.patch
i386-add-ptep_test_and_clear_dirtyyoung.patch
i386-use-pte_update_defer-in-ptep_test_and_clear_dirtyyoung.patch
smaps-extract-pmd-walker-from-smaps-code.patch
smaps-add-pages-referenced-count-to-smaps.patch
smaps-add-clear_refs-file-to-clear-reference.patch
readahead-improve-heuristic-detecting-sequential-reads.patch
readahead-code-cleanup.patch
slab-use-num_possible_cpus-in-enable_cpucache.patch
slab-dont-allocate-empty-shared-caches.patch
slab-numa-kmem_cache-diet.patch
do-not-disable-interrupts-when-reading-min_free_kbytes.patch
slab-mark-set_up_list3s-__init.patch
cpusets-allow-tif_memdie-threads-to-allocate-anywhere.patch
i386-use-page-allocator-to-allocate-thread_info-structure.patch
slub-core.patch
make-page-private-usable-in-compound-pages-v1.patch
optimize-compound_head-by-avoiding-a-shared-page.patch
add-virt_to_head_page-and-consolidate-code-in-slab-and-slub.patch
slub-fix-object-tracking.patch
slub-enable-tracking-of-full-slabs.patch
slub-validation-of-slabs-metadata-and-guard-zones.patch
slub-add-min_partial.patch
slub-add-ability-to-list-alloc--free-callers-per-slab.patch
slub-free-slabs-and-sort-partial-slab-lists-in-kmem_cache_shrink.patch
slub-remove-object-activities-out-of-checking-functions.patch
slub-user-documentation.patch
slub-add-slabinfo-tool.patch
quicklists-for-page-table-pages.patch
quicklist-support-for-sparc64.patch
slob-handle-slab_panic-flag.patch
include-kern_-constant-in-printk-calls-in-mm-slabc.patch
mm-madvise-avoid-exclusive-mmap_sem.patch
mm-remove-destroy_dirty_buffers-from-invalidate_bdev.patch
mm-optimize-kill_bdev.patch
mm-optimize-acorn-partition-truncate.patch
slab-allocators-remove-obsolete-slab_must_hwcache_align.patch
kmem_cache-simplify-slab-cache-creation.patch
slab-allocators-remove-multiple-alignment-specifications.patch
fault-injection-fix-failslab-with-config_numa.patch
mm-fix-handling-of-panic_on_oom-when-cpusets-are-in-use.patch
oom-fix-constraint-deadlock.patch
get_unmapped_area-handles-map_fixed-on-powerpc.patch
get_unmapped_area-handles-map_fixed-on-alpha.patch
get_unmapped_area-handles-map_fixed-on-arm.patch
get_unmapped_area-handles-map_fixed-on-frv.patch
get_unmapped_area-handles-map_fixed-on-i386.patch
get_unmapped_area-handles-map_fixed-on-ia64.patch
get_unmapped_area-handles-map_fixed-on-parisc.patch
get_unmapped_area-handles-map_fixed-on-sparc64.patch
get_unmapped_area-handles-map_fixed-on-x86_64.patch
get_unmapped_area-handles-map_fixed-in-hugetlbfs.patch
get_unmapped_area-handles-map_fixed-in-generic-code.patch
get_unmapped_area-doesnt-need-hugetlbfs-hacks-anymore.patch
slab-allocators-remove-slab_debug_initial-flag.patch
slab-allocators-remove-slab_ctor_atomic.patch
slab-allocators-remove-useless-__gfp_no_grow-flag.patch
lazy-freeing-of-memory-through-madv_free.patch
restore-madv_dontneed-to-its-original-linux-behaviour.patch
hugetlbfs-add-null-check-in-hugetlb_zero_setup.patch
slob-fix-page-order-calculation-on-not-4kb-page.patch
page-migration-only-migrate-pages-if-allocation-in-the-highest-zone-is-possible.patch
return-eperm-not-echild-on-security_task_wait-failure.patch
blackfin-arch.patch
driver_bfin_serial_core.patch
blackfin-on-chip-ethernet-mac-controller-driver.patch
blackfin-patch-add-blackfin-support-in-smc91x.patch
blackfin-on-chip-rtc-controller-driver.patch
blackfin-blackfin-on-chip-spi-controller-driver.patch
convert-h8-300-to-generic-timekeeping.patch
h8300-generic-irq.patch
h8300-add-zimage-support.patch
round_up-macro-cleanup-in-arch-alpha-kernel-osf_sysc.patch
alpha-fix-bootp-image-creation.patch
alpha-prctl-macros.patch
srmcons-fix-kmallocgfp_kernel-inside-spinlock.patch
arm26-remove-useless-config-option-generic_bust_spinlock.patch
fix-refrigerator-vs-thaw_process-race.patch
swsusp-use-inline-functions-for-changing-page-flags.patch
swsusp-do-not-use-page-flags.patch
mm-remove-unused-page-flags.patch
swsusp-fix-error-paths-in-snapshot_open.patch
swsusp-use-gfp_kernel-for-creating-basic-data-structures.patch
freezer-remove-pf_nofreeze-from-handle_initrd.patch
swsusp-use-rbtree-for-tracking-allocated-swap.patch
freezer-fix-racy-usage-of-try_to_freeze-in-kswapd.patch
remove-software_suspend.patch
power-management-change-sys-power-disk-display.patch
kconfig-mentioneds-hibernation-not-just-swsusp.patch
swsusp-fix-snapshot_release.patch
swsusp-free-more-memory.patch
remove-unused-header-file-arch-m68k-atari-atasoundh.patch
spin_lock_unlocked-cleanup-in-arch-m68k.patch
remove-unused-header-file-drivers-serial-crisv10h.patch
cris-check-for-memory-allocation.patch
cris-remove-code-related-to-pre-22-kernel.patch
uml-delete-unused-code.patch
uml-formatting-fixes.patch
uml-host_info-tidying.patch
uml-mark-tt-mode-code-for-future-removal.patch
uml-print-coredump-limits.patch
uml-handle-block-device-hotplug-errors.patch
uml-driver-formatting-fixes.patch
uml-driver-formatting-fixes-fix.patch
uml-network-interface-hotplug-error-handling.patch
array_size-check-for-type.patch
uml-move-sigio-testing-to-sigioc.patch
uml-create-archh.patch
uml-create-as-layouth.patch
uml-move-remaining-useful-contents-of-user_utilh.patch
uml-remove-user_utilh.patch
uml-add-missing-__init-declarations.patch
remove-unused-header-file-arch-um-kernel-tt-include-mode_kern-tth.patch
uml-improve-checking-and-diagnostics-of-ethernet-macs.patch
uml-eliminate-temporary-buffer-in-eth_configure.patch
uml-replace-one-element-array-with-zero-element-array.patch
uml-fix-umid-in-xterm-titles.patch
uml-speed-up-exec.patch
uml-no-locking-needed-in-tlsc.patch
uml-tidy-processc.patch
uml-remove-page_size.patch
uml-kernel_thread-shouldnt-panic.patch
uml-tidy-fault-code.patch
uml-kernel-segfaults-should-dump-proper-registers.patch
uml-comment-early-boot-locking.patch
uml-irq-locking-commentary.patch
uml-delete-host_frame_size.patch
uml-drivers-get-release-methods.patch
uml-dump-registers-on-ptrace-or-wait-failure.patch
uml-speed-up-page-table-walking.patch
uml-remove-unused-x86_64-code.patch
uml-start-fixing-os_read_file-and-os_write_file.patch
uml-tidy-libc-code.patch
uml-convert-libc-layer-to-call-read-and-write.patch
uml-batch-i-o-requests.patch
uml-send-pointers-instead-of-structures-to-i-o-thread.patch
uml-send-pointers-instead-of-structures-to-i-o-thread-fix.patch
uml-dump-core-on-panic.patch
uml-dont-try-to-handle-signals-on-initial-process-stack.patch
uml-change-remaining-callers-of-os_read_write_file.patch
uml-formatting-fixes-around-os_read_write_file-callers.patch
uml-remove-debugging-remnants.patch
uml-rename-os_read_write_file_k-back-to-os_read_write_file.patch
uml-aio-deadlock-avoidance.patch
uml-speed-page-fault-path.patch
uml-eliminate-a-piece-of-debugging-code.patch
uml-more-page-fault-path-trimming.patch
uml-only-flush-areas-covered-by-vma.patch
uml-out-of-tmpfs-space-error-clarification.patch
uml-virtualized-time-fix.patch
uml-fix-prototypes.patch
v850-generic-timekeeping-conversion.patch
xtensa-strlcpy-is-smart-enough.patch
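As a rough illustration of the /proc/pid/smaps enhancement described in the summary above, here is a minimal sketch of how a working-set estimate might be read from userspace. It assumes the new per-mapping field appears as a "Referenced:" line in kB and that writing to /proc/pid/clear_refs resets the referenced bits; the function names are hypothetical and not part of any patch in this batch.

```python
import re

# Matches lines such as "Referenced:          12 kB" in /proc/<pid>/smaps.
# The field name and unit are an assumption about the output format.
_REFERENCED = re.compile(r"^Referenced:\s+(\d+)\s+kB$", re.MULTILINE)


def referenced_kb(smaps_text):
    """Sum the Referenced fields over all mappings in an smaps dump."""
    return sum(int(kb) for kb in _REFERENCED.findall(smaps_text))


def read_working_set_kb(pid):
    """Hypothetical helper: total Referenced kB for a running process."""
    with open(f"/proc/{pid}/smaps") as f:
        return referenced_kb(f.read())


def clear_references(pid):
    """Hypothetical helper: reset the referenced bits via clear_refs."""
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write("1")
```

A monitoring loop would presumably call clear_references(), wait for an interval, then call read_working_set_kb() to see how much memory was touched during that interval.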
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2006-10-20 21:39 incoming Andrew Morton
@ 2006-10-20 22:31 ` Alan Cox
0 siblings, 0 replies; 786+ messages in thread
From: Alan Cox @ 2006-10-20 22:31 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jeff Garzik, linux-ide, Tejun Heo
On Fri, 2006-10-20 at 14:39 -0700, Andrew Morton wrote:
> I have 12 ata patches here - I'm not sure that Tejun's ones are the latest
> version, but I'll just send the whole lot as-is, see what happens...
Looks fine with respect to my bits and Tejun's. Now that Tejun's polling
identify is in, I'll have a pile of patches next week, as most PATA
controllers want to use this.
Alan
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2006-10-20 21:39 Andrew Morton
2006-10-20 22:31 ` incoming Alan Cox
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2006-10-20 21:39 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linux-ide, Tejun Heo, Alan Cox
I have 12 ata patches here - I'm not sure that Tejun's ones are the latest
version, but I'll just send the whole lot as-is, see what happens...
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2005-04-14 15:38 ` incoming Lee Revell
@ 2005-04-16 9:03 ` Paul Jackson
0 siblings, 0 replies; 786+ messages in thread
From: Paul Jackson @ 2005-04-16 9:03 UTC (permalink / raw)
To: Lee Revell; +Cc: geert, akpm, linux-kernel
> Looks like Andrew's patch bomb script needs some rate limiting ;-)
sendpatchset has that, already builtin ;)
http://www.speakeasy.org/~pj99/sgi/sendpatchset
Though the 5 second delay might not be enough for someone
publishing at the rate Andrew does.
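The pacing idea can be sketched as below. This is not sendpatchset's actual code, only a minimal illustration of a fixed delay between sends; the function name, the `send` callback and the injectable `sleep` parameter are assumptions made for the example.

```python
import time


def send_paced(messages, send, delay_seconds=5.0, sleep=time.sleep):
    """Call send(msg) for each message, pausing between consecutive sends."""
    sent = 0
    for index, msg in enumerate(messages):
        if index > 0:
            # Pause before every message except the first.
            sleep(delay_seconds)
        send(msg)
        sent += 1
    return sent
```

With a 5 second delay, 198 patches would take a little over 16 minutes to go out, which gives a feel for why a larger delay might be wanted for a batch of that size.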
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2005-04-12 11:10 ` incoming Andrew Morton
2005-04-12 11:33 ` incoming David Vrabel
2005-04-12 18:31 ` incoming Matthias Urlichs
@ 2005-04-16 8:59 ` Paul Jackson
2 siblings, 0 replies; 786+ messages in thread
From: Paul Jackson @ 2005-04-16 8:59 UTC (permalink / raw)
To: Andrew Morton; +Cc: dvrabel, torvalds, linux-kernel
Andrew wrote:
> I never got around to setting that up, plus the Subject:s pretty quickly
> become invisible when they're indented 198 columns in GUI MUAs.
My sendpatchset tool should be good for this. It sends all but the
first message with "References" and "In-Reply-To" headers that point
to the first message.
http://www.speakeasy.org/~pj99/sgi/sendpatchset
I use it when sending out multiple patches in sequence from a quilt
repository.
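The threading scheme described above, where every patch replies to the first message rather than to the previous one, might look roughly like this with Python's standard email package. It is a sketch under assumptions, not sendpatchset itself: the function name is hypothetical and "example.org" is a placeholder domain.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_patch_thread(subjects, sender, recipient, domain="example.org"):
    """Return messages where every patch replies to the first (cover) message."""
    messages = []
    cover_id = None
    for subject in subjects:
        msg_id = make_msgid(domain=domain)
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = recipient
        msg["Subject"] = subject
        msg["Message-ID"] = msg_id
        if cover_id is None:
            # The first message is the summary; it starts the thread.
            cover_id = msg_id
        else:
            # Every later message points at the summary, not at its predecessor,
            # so a threaded view shows a single level of indentation.
            msg["In-Reply-To"] = cover_id
            msg["References"] = cover_id
        messages.append(msg)
    return messages
```

Because no patch references the one before it, the thread stays one level deep no matter how many patches are in the series.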
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2005-04-14 11:48 ` incoming Geert Uytterhoeven
2005-04-14 11:57 ` incoming Paulo Marques
@ 2005-04-14 15:38 ` Lee Revell
2005-04-16 9:03 ` incoming Paul Jackson
1 sibling, 1 reply; 786+ messages in thread
From: Lee Revell @ 2005-04-14 15:38 UTC (permalink / raw)
To: Geert Uytterhoeven; +Cc: Andrew Morton, Linux Kernel Development
On Thu, 2005-04-14 at 13:48 +0200, Geert Uytterhoeven wrote:
> On Tue, 12 Apr 2005, Andrew Morton wrote:
> > As the commits list probably isn't working at present I'll cc linux-kernel
> > on this lot. Fairly cruel, sorry, but I don't like the idea of people not
> > knowing what's hitting the main tree.
>
> Is it me, or were really only 117 mails of the 198 sent to lkml?
The patch bombing seems to have really wedged vger. It took up to 24
hours to get all the messages.
Looks like Andrew's patch bomb script needs some rate limiting ;-)
Lee
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2005-04-14 11:48 ` incoming Geert Uytterhoeven
@ 2005-04-14 11:57 ` Paulo Marques
2005-04-14 15:38 ` incoming Lee Revell
1 sibling, 0 replies; 786+ messages in thread
From: Paulo Marques @ 2005-04-14 11:57 UTC (permalink / raw)
To: Geert Uytterhoeven; +Cc: Andrew Morton, Linux Kernel Development
Geert Uytterhoeven wrote:
> On Tue, 12 Apr 2005, Andrew Morton wrote:
>
>>As the commits list probably isn't working at present I'll cc linux-kernel
>>on this lot. Fairly cruel, sorry, but I don't like the idea of people not
>>knowing what's hitting the main tree.
>
>
> Is it me, or were really only 117 mails of the 198 sent to lkml?
(?)
I just double-checked, and I can say that I received all 198 emails from
vger...
--
Paulo Marques - www.grupopie.com
All that is necessary for the triumph of evil is that good men do nothing.
Edmund Burke (1729 - 1797)
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2005-04-12 10:23 incoming Andrew Morton
` (2 preceding siblings ...)
2005-04-12 20:55 ` incoming Russell King
@ 2005-04-14 11:48 ` Geert Uytterhoeven
2005-04-14 11:57 ` incoming Paulo Marques
2005-04-14 15:38 ` incoming Lee Revell
3 siblings, 2 replies; 786+ messages in thread
From: Geert Uytterhoeven @ 2005-04-14 11:48 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux Kernel Development
On Tue, 12 Apr 2005, Andrew Morton wrote:
> As the commits list probably isn't working at present I'll cc linux-kernel
> on this lot. Fairly cruel, sorry, but I don't like the idea of people not
> knowing what's hitting the main tree.
Is it me, or were really only 117 mails of the 198 sent to lkml?
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2005-04-12 21:08 ` incoming Andrew Morton
@ 2005-04-12 21:12 ` Russell King
0 siblings, 0 replies; 786+ messages in thread
From: Russell King @ 2005-04-12 21:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: torvalds, linux-kernel
On Tue, Apr 12, 2005 at 02:08:00PM -0700, Andrew Morton wrote:
> Russell King <rmk+lkml@arm.linux.org.uk> wrote:
> >
> > I don't see a patch which adds linux/pm.h to linux/sysdev.h, which is
> > required to fix ARM builds in -rc2 and onwards kernels.
>
> That fix is buried in [patch 105/198]
Great, thanks. I must have missed it, sorry.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2005-04-12 20:55 ` incoming Russell King
@ 2005-04-12 21:08 ` Andrew Morton
2005-04-12 21:12 ` incoming Russell King
0 siblings, 1 reply; 786+ messages in thread
From: Andrew Morton @ 2005-04-12 21:08 UTC (permalink / raw)
To: Russell King; +Cc: torvalds, linux-kernel
Russell King <rmk+lkml@arm.linux.org.uk> wrote:
>
> I don't see a patch which adds linux/pm.h to linux/sysdev.h, which is
> required to fix ARM builds in -rc2 and onwards kernels.
That fix is buried in [patch 105/198]
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2005-04-12 10:23 incoming Andrew Morton
2005-04-12 11:02 ` incoming David Vrabel
2005-04-12 14:38 ` incoming Chris Friesen
@ 2005-04-12 20:55 ` Russell King
2005-04-12 21:08 ` incoming Andrew Morton
2005-04-14 11:48 ` incoming Geert Uytterhoeven
3 siblings, 1 reply; 786+ messages in thread
From: Russell King @ 2005-04-12 20:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel
On Tue, Apr 12, 2005 at 03:23:22AM -0700, Andrew Morton wrote:
> As the commits list probably isn't working at present I'll cc linux-kernel
> on this lot. Fairly cruel, sorry, but I don't like the idea of people not
> knowing what's hitting the main tree.
I don't see a patch which adds linux/pm.h to linux/sysdev.h, which is
required to fix ARM builds in -rc2 and onwards kernels.
It is my understanding that you have such a patch, and if it isn't
going to be sent, I'd like to send my own fix so that ARM can start
building again in mainline.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2005-04-12 11:10 ` incoming Andrew Morton
2005-04-12 11:33 ` incoming David Vrabel
@ 2005-04-12 18:31 ` Matthias Urlichs
2005-04-16 8:59 ` incoming Paul Jackson
2 siblings, 0 replies; 786+ messages in thread
From: Matthias Urlichs @ 2005-04-12 18:31 UTC (permalink / raw)
To: linux-kernel
Hi, Andrew Morton wrote on Tue, 12 Apr 2005 04:10:45 -0700:
> David Vrabel <dvrabel@cantab.net> wrote:
>>
>> Is there any chance that in the future that these patch sets get posted
>> all to one thread?
>
> I never got around to setting that up, plus the Subject:s pretty quickly
> become invisible when they're indented 198 columns in GUI MUAs.
>
Umm, what stops you from letting all the parts refer to part zero,
instead of part n-1?
--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2005-04-12 10:23 incoming Andrew Morton
2005-04-12 11:02 ` incoming David Vrabel
@ 2005-04-12 14:38 ` Chris Friesen
2005-04-12 20:55 ` incoming Russell King
2005-04-14 11:48 ` incoming Geert Uytterhoeven
3 siblings, 0 replies; 786+ messages in thread
From: Chris Friesen @ 2005-04-12 14:38 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel
Andrew Morton wrote:
> As the commits list probably isn't working at present I'll cc linux-kernel
> on this lot. Fairly cruel, sorry, but I don't like the idea of people not
> knowing what's hitting the main tree.
I'd like to second the idea of having all the patches be replies to this
original posting (ie one level of indenting for all patches). That way
a threaded view will only have one subject line for all 198 patches.
Chris
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2005-04-12 11:10 ` incoming Andrew Morton
@ 2005-04-12 11:33 ` David Vrabel
2005-04-12 18:31 ` incoming Matthias Urlichs
2005-04-16 8:59 ` incoming Paul Jackson
2 siblings, 0 replies; 786+ messages in thread
From: David Vrabel @ 2005-04-12 11:33 UTC (permalink / raw)
To: Andrew Morton; +Cc: torvalds, linux-kernel
Andrew Morton wrote:
> David Vrabel <dvrabel@cantab.net> wrote:
>
>>Is there any chance that in the future these patch sets get posted
>> all to one thread?
>
> I never got around to setting that up, plus the Subject:s pretty quickly
> become invisible when they're indented 198 columns in GUI MUAs.
I meant something like this:
[patch 000/100] Foo-ize the baz.
  [patch 001/100] Frob the baz
  [patch 002/100] baz cleanups
  [patch 003/100] apply foo-ization to baz
Rather than
[patch 000/100] Foo-ize the baz.
  [patch 001/100] Frob the baz
    [patch 002/100] baz cleanups
      [patch 003/100] apply foo-ization to baz
Which would (as you rightly pointed out) be ludicrous.
i.e., all the patches are replies to the summary.
David Vrabel
^ permalink raw reply [flat|nested] 786+ messages in thread
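The difference between the two layouts discussed above comes down to which message each patch names in its In-Reply-To header: the summary (part zero) or the previous patch (part n-1). The following is a minimal sketch of that idea, not the tooling anyone in this thread actually used. The function names `build_series` and `thread_depth` and the `example.invalid` domain are hypothetical; only the Python standard library `email` package is assumed.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_series(cover_subject, patch_subjects, flat=True):
    """Return [cover, patch 1, patch 2, ...] with threading headers set.

    flat=True  -> every patch replies to the cover letter (part zero).
    flat=False -> every patch replies to the previous message (part n-1).
    """
    total = len(patch_subjects)

    cover = EmailMessage()
    cover["Subject"] = f"[patch 000/{total:03d}] {cover_subject}"
    cover["Message-ID"] = make_msgid(domain="example.invalid")  # placeholder domain
    series = [cover]

    for n, subject in enumerate(patch_subjects, start=1):
        msg = EmailMessage()
        msg["Subject"] = f"[patch {n:03d}/{total:03d}] {subject}"
        msg["Message-ID"] = make_msgid(domain="example.invalid")
        # The only difference between the two layouts is the parent chosen here.
        parent = cover if flat else series[-1]
        msg["In-Reply-To"] = str(parent["Message-ID"])
        series.append(msg)

    return series


def thread_depth(series, msg):
    """Count how many In-Reply-To hops separate msg from the thread root."""
    by_id = {str(m["Message-ID"]): m for m in series}
    depth = 0
    while msg["In-Reply-To"] is not None:
        msg = by_id[str(msg["In-Reply-To"])]
        depth += 1
    return depth
```

With three patches, the flat layout gives depths 0, 1, 1, 1, so a threaded mail reader indents every patch by one level. The chained layout gives depths 0, 1, 2, 3, and the indentation grows with each patch, which is what pushes the subjects off-screen in a long series.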
* Re: incoming
2005-04-12 11:02 ` incoming David Vrabel
@ 2005-04-12 11:10 ` Andrew Morton
2005-04-12 11:33 ` incoming David Vrabel
` (2 more replies)
0 siblings, 3 replies; 786+ messages in thread
From: Andrew Morton @ 2005-04-12 11:10 UTC (permalink / raw)
To: David Vrabel; +Cc: torvalds, linux-kernel
David Vrabel <dvrabel@cantab.net> wrote:
>
> Is there any chance that in the future these patch sets get posted
> all to one thread?
I never got around to setting that up, plus the Subject:s pretty quickly
become invisible when they're indented 198 columns in GUI MUAs.
Hopefully we'll have the commits list running next time...
^ permalink raw reply [flat|nested] 786+ messages in thread
* Re: incoming
2005-04-12 10:23 incoming Andrew Morton
@ 2005-04-12 11:02 ` David Vrabel
2005-04-12 11:10 ` incoming Andrew Morton
2005-04-12 14:38 ` incoming Chris Friesen
` (2 subsequent siblings)
3 siblings, 1 reply; 786+ messages in thread
From: David Vrabel @ 2005-04-12 11:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel
Andrew Morton wrote:
> As the commits list probably isn't working at present I'll cc linux-kernel
> on this lot. Fairly cruel, sorry, but I don't like the idea of people not
> knowing what's hitting the main tree.
Is there any chance that in the future these patch sets get posted
all to one thread? Perhaps as a reply to a summary? 1 thread to ignore
is preferable to 198.
David Vrabel
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2005-04-12 10:23 Andrew Morton
2005-04-12 11:02 ` incoming David Vrabel
` (3 more replies)
0 siblings, 4 replies; 786+ messages in thread
From: Andrew Morton @ 2005-04-12 10:23 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel
As the commits list probably isn't working at present I'll cc linux-kernel
on this lot. Fairly cruel, sorry, but I don't like the idea of people not
knowing what's hitting the main tree.
This is the first live test of Linus's git-importing ability. I'm about
to disappear for 1.5 weeks - hope we'll still have a kernel left when I
get back.
- As we're still a fair way from 2.6.12 and things are still backing up,
it's a relatively large update.
- Various arch updates
- Big x86_64 update, as discussed
- decent-sized ppc32, ppc64 updates
- big infiniband update
- very nearly the last batch of u32->pm_message_t conversions. Some
other bits of this will be sitting out in subsystem trees - this is just
the stuff which doesn't overlap.
- the important fixes from the md, nfs4 queues
- other random fixes and things we probably want to have in 2.6.12.
- I'd draw especial Linus attention to:
"fix crash in entry.S restore_all" and
"pci enumeration on ixp2000: overflow in kernel/resource.c"
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2004-11-11 0:02 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2004-11-11 0:02 UTC (permalink / raw)
To: David S. Miller, Jeff Garzik; +Cc: netdev
A bunch of tricky stuff which I picked up off the internets. They've been
in -mm for a while but I otherwise cannot vouch for them.
^ permalink raw reply [flat|nested] 786+ messages in thread
* incoming
@ 2004-10-28 7:19 Andrew Morton
0 siblings, 0 replies; 786+ messages in thread
From: Andrew Morton @ 2004-10-28 7:19 UTC (permalink / raw)
To: David S. Miller, Jeff Garzik; +Cc: netdev
A bunch of net patches which I've accumulated. I've made no
effort to test or review these...
^ permalink raw reply [flat|nested] 786+ messages in thread
end of thread, other threads:[~2022-04-27 19:41 UTC | newest]
Thread overview: 786+ messages
2022-03-22 21:38 incoming Andrew Morton
2022-03-22 21:38 ` [patch 001/227] linux/kthread.h: remove unused macros Andrew Morton
2022-03-22 21:38 ` Andrew Morton
2022-03-22 21:38 ` [patch 002/227] scripts/spelling.txt: add more spellings to spelling.txt Andrew Morton
2022-03-22 21:38 ` Andrew Morton
2022-03-22 21:38 ` [patch 003/227] ntfs: add sanity check on allocation size Andrew Morton
2022-03-22 21:38 ` Andrew Morton
2022-03-22 22:13 ` Linus Torvalds
2022-03-22 21:38 ` [patch 004/227] ocfs2: cleanup some return variables Andrew Morton
2022-03-22 21:38 ` Andrew Morton
2022-03-22 21:38 ` [patch 005/227] fs/ocfs2: fix comments mentioning i_mutex Andrew Morton
2022-03-22 21:38 ` Andrew Morton
2022-03-22 21:38 ` [patch 006/227] doc: convert 'subsection' to 'section' in gfp.h Andrew Morton
2022-03-22 21:38 ` Andrew Morton
2022-03-22 21:38 ` [patch 007/227] mm: document and polish read-ahead code Andrew Morton
2022-03-22 21:38 ` Andrew Morton
2022-03-22 21:38 ` [patch 008/227] mm: improve cleanup when ->readpages doesn't process all pages Andrew Morton
2022-03-22 21:38 ` Andrew Morton
2022-03-22 21:38 ` [patch 009/227] fuse: remove reliance on bdi congestion Andrew Morton
2022-03-22 21:38 ` Andrew Morton
2022-03-22 21:39 ` [patch 010/227] nfs: " Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 011/227] ceph: " Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 012/227] remove inode_congested() Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 013/227] remove bdi_congested() and wb_congested() and related functions Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 014/227] f2fs: replace congestion_wait() calls with io_schedule_timeout() Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 015/227] block/bfq-iosched.c: use "false" rather than "BLK_RW_ASYNC" Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 016/227] remove congestion tracking framework Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 017/227] mount: warn only once about timestamp range expiration Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 018/227] mm/memremap: avoid calling kasan_remove_zero_shadow() for device private memory Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 019/227] filemap: remove find_get_pages() Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 020/227] mm/writeback: minor clean up for highmem_dirtyable_memory Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 021/227] mm: fs: fix lru_cache_disabled race in bh_lru Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 022/227] mm: fix invalid page pointer returned with FOLL_PIN gups Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 023/227] mm/gup: follow_pfn_pte(): -EEXIST cleanup Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 024/227] mm/gup: remove unused pin_user_pages_locked() Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 025/227] mm: change lookup_node() to use get_user_pages_fast() Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 026/227] mm/gup: remove unused get_user_pages_locked() Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 027/227] mm/swap: fix confusing comment in folio_mark_accessed Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 028/227] tmpfs: support for file creation time Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:39 ` [patch 029/227] shmem: mapping_set_exiting() to help mapped resilience Andrew Morton
2022-03-22 21:39 ` Andrew Morton
2022-03-22 21:40 ` [patch 030/227] tmpfs: do not allocate pages on read Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 031/227] mm: shmem: use helper macro __ATTR_RW Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 032/227] memcg: replace in_interrupt() with !in_task() Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 033/227] memcg: add per-memcg total kernel memory stat Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 034/227] mm/memcg: mem_cgroup_per_node is already set to 0 on allocation Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 035/227] mm/memcg: retrieve parent memcg from css.parent Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 036/227] memcg: refactor mem_cgroup_oom Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 037/227] memcg: unify force charging conditions Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 038/227] selftests: memcg: test high limit for single entry allocation Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 039/227] memcg: synchronously enforce memory.high for large overcharges Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 040/227] mm/memcontrol: return 1 from cgroup.memory __setup() handler Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 041/227] mm/memcg: revert ("mm/memcg: optimize user context object stock access") Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 042/227] mm/memcg: disable threshold event handlers on PREEMPT_RT Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 043/227] mm/memcg: protect per-CPU counter by disabling preemption on PREEMPT_RT where needed Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 044/227] mm/memcg: opencode the inner part of obj_cgroup_uncharge_pages() in drain_obj_stock() Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 045/227] mm/memcg: protect memcg_stock with a local_lock_t Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 046/227] mm/memcg: disable migration instead of preemption in drain_all_stock() Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 047/227] mm: list_lru: transpose the array of per-node per-memcg lru lists Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:40 ` [patch 048/227] mm: introduce kmem_cache_alloc_lru Andrew Morton
2022-03-22 21:40 ` Andrew Morton
2022-03-22 21:41 ` [patch 049/227] fs: introduce alloc_inode_sb() to allocate filesystems specific inode Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 050/227] fs: allocate inode by using alloc_inode_sb() Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 051/227] f2fs: " Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 052/227] mm: dcache: use kmem_cache_alloc_lru() to allocate dentry Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 053/227] xarray: use kmem_cache_alloc_lru to allocate xa_node Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 054/227] mm: memcontrol: move memcg_online_kmem() to mem_cgroup_css_online() Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 055/227] mm: list_lru: allocate list_lru_one only when needed Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 056/227] mm: list_lru: rename memcg_drain_all_list_lrus to memcg_reparent_list_lrus Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 057/227] mm: list_lru: replace linear array with xarray Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 058/227] mm: memcontrol: reuse memory cgroup ID for kmem ID Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 059/227] mm: memcontrol: fix cannot alloc the maximum memcg ID Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 060/227] mm: list_lru: rename list_lru_per_memcg to list_lru_memcg Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 061/227] mm: memcontrol: rename memcg_cache_id to memcg_kmem_id Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 062/227] memcg: enable accounting for tty-related objects Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 063/227] selftests, x86: fix how check_cc.sh is being invoked Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 064/227] mm: merge pte_mkhuge() call into arch_make_huge_pte() Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 065/227] mm: remove mmu_gathers storage from remaining architectures Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 066/227] mm: thp: fix wrong cache flush in remove_migration_pmd() Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 067/227] mm: fix missing cache flush for all tail pages of compound page Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:41 ` [patch 068/227] mm: hugetlb: fix missing cache flush in copy_huge_page_from_user() Andrew Morton
2022-03-22 21:41 ` Andrew Morton
2022-03-22 21:42 ` [patch 069/227] mm: hugetlb: fix missing cache flush in hugetlb_mcopy_atomic_pte() Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 070/227] mm: shmem: fix missing cache flush in shmem_mfill_atomic_pte() Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 071/227] mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic() Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 072/227] mm: replace multiple dcache flush with flush_dcache_folio() Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 073/227] mm: don't skip swap entry even if zap_details specified Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 074/227] mm: rename zap_skip_check_mapping() to should_zap_page() Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 075/227] mm: change zap_details.zap_mapping into even_cows Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 076/227] mm: rework swap handling of zap_pte_range Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 077/227] mm/mmap: return 1 from stack_guard_gap __setup() handler Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 078/227] mm/memory.c: use helper function range_in_vma() Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 079/227] mm/memory.c: use helper macro min and max in unmap_mapping_range_tree() Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 080/227] mm: _install_special_mapping() apply VM_LOCKED_CLEAR_MASK Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 081/227] mm/mmap: remove obsolete comment in ksys_mmap_pgoff Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 082/227] mm/mremap:: use vma_lookup() instead of find_vma() Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 083/227] mm/sparse: make mminit_validate_memmodel_limits() static Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 084/227] mm/vmalloc: remove unneeded function forward declaration Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 085/227] mm/vmalloc: Move draining areas out of caller context Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 086/227] mm/vmalloc: add adjust_search_size parameter Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 087/227] mm/vmalloc: eliminate an extra orig_gfp_mask Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:42 ` [patch 088/227] mm/vmalloc.c: fix "unused function" warning Andrew Morton
2022-03-22 21:42 ` Andrew Morton
2022-03-22 21:43 ` [patch 089/227] mm/vmalloc: fix comments about vmap_area struct Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 090/227] mm: page_alloc: avoid merging non-fallbackable pageblocks with others Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 091/227] mm/mmzone.c: use try_cmpxchg() in page_cpupid_xchg_last() Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 092/227] mm/mmzone.h: remove unused macros Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 093/227] mm/page_alloc: don't pass pfn to free_unref_page_commit() Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 094/227] cma: factor out minimum alignment requirement Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 095/227] mm: enforce pageblock_order < MAX_ORDER Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 096/227] mm/page_alloc: mark pagesets as __maybe_unused Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 097/227] mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 098/227] mm/page_alloc: fetch the correct pcp buddy during bulk free Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 099/227] mm/page_alloc: track range of active PCP lists " Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 100/227] mm/page_alloc: simplify how many pages are selected per pcp list " Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 101/227] mm/page_alloc: drain the requested list first " Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 102/227] mm/page_alloc: free pages in a single pass " Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 103/227] mm/page_alloc: limit number of high-order pages on PCP " Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 104/227] mm/page_alloc: do not prefetch buddies " Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 105/227] arch/x86/mm/numa: Do not initialize nodes twice Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 106/227] mm: count time in drain_all_pages during direct reclaim as memory pressure Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:43 ` [patch 107/227] mm/page_alloc: call check_new_pages() while zone spinlock is not held Andrew Morton
2022-03-22 21:43 ` Andrew Morton
2022-03-22 21:44 ` [patch 108/227] mm/page_alloc: check high-order pages for corruption during PCP operations Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 109/227] mm/memory-failure.c: remove obsolete comment Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 110/227] mm/hwpoison: fix error page recovered but reported "not recovered" Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 111/227] mm: invalidate hwpoison page cache page in fault path Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 112/227] mm/memory-failure.c: minor clean up for memory_failure_dev_pagemap Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 113/227] mm/memory-failure.c: catch unexpected -EFAULT from vma_address() Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 114/227] mm/memory-failure.c: rework the signaling logic in kill_proc Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 115/227] mm/memory-failure.c: fix race with changing page more robustly Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 116/227] mm/memory-failure.c: remove PageSlab check in hwpoison_filter_dev Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 117/227] mm/memory-failure.c: rework the try_to_unmap logic in hwpoison_user_mappings() Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 118/227] mm/memory-failure.c: remove obsolete comment in __soft_offline_page Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 119/227] mm/memory-failure.c: remove unnecessary PageTransTail check Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 120/227] mm/hwpoison-inject: support injecting hwpoison to free page Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 121/227] mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handler Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 122/227] mm/hwpoison: add in-use hugepage hwpoison filter judgement Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 123/227] mm/memory-failure.c: fix race with changing page compound again Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 124/227] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 125/227] mm/memory-failure.c: make non-LRU movable pages unhandlable Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 126/227] mm, fault-injection: declare should_fail_alloc_page() Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:44 ` [patch 127/227] mm/mlock: fix potential imbalanced rlimit ucounts adjustment Andrew Morton
2022-03-22 21:44 ` Andrew Morton
2022-03-22 21:45 ` [patch 128/227] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 129/227] mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 130/227] mm: sparsemem: use page table lock to protect kernel pmd operations Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 131/227] selftests: vm: add a hugetlb test case Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 132/227] mm: sparsemem: move vmemmap related to HugeTLB to CONFIG_HUGETLB_PAGE_FREE_VMEMMAP Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 133/227] mm/hugetlb: generalize ARCH_WANT_GENERAL_HUGETLB Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 134/227] hugetlb: clean up potential spectre issue warnings Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 135/227] mm/hugetlb: use helper macro __ATTR_RW Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 136/227] mm/hugetlb.c: export PageHeadHuge() Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 137/227] mm: remove unneeded local variable follflags Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 138/227] userfaultfd: provide unmasked address on page-fault Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 139/227] userfaultfd/selftests: fix uninitialized_var.cocci warning Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 140/227] mm/fs: delete PF_SWAPWRITE Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 141/227] mm: __isolate_lru_page_prepare() in isolate_migratepages_block() Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 142/227] mm/list_lru: optimize memcg_reparent_list_lru_node() Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 143/227] mm: lru_cache_disable: replace work queue synchronization with synchronize_rcu Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 144/227] mm: workingset: replace IRQ-off check with a lockdep assert Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 145/227] mm: vmscan: fix documentation for page_check_references() Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 146/227] mm: compaction: cleanup the compaction trace events Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:45 ` [patch 147/227] mempolicy: mbind_range() set_policy() after vma_merge() Andrew Morton
2022-03-22 21:45 ` Andrew Morton
2022-03-22 21:46 ` [patch 148/227] mm/oom_kill: remove unneeded is_memcg_oom check Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 149/227] mm,migrate: fix establishing demotion target Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 150/227] mm/migrate: fix race between lock page and clear PG_Isolated Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 151/227] mm/thp: refix __split_huge_pmd_locked() for migration PMD Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 152/227] mm/cma: provide option to opt out from exposing pages on activation failure Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 153/227] powerpc/fadump: opt out from freeing pages on cma " Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 154/227] NUMA Balancing: add page promotion counter Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 155/227] NUMA balancing: optimize page placement for memory tiering system Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 156/227] memory tiering: skip to scan fast memory Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 157/227] mm: page_io: fix psi memory pressure error on cold swapins Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 158/227] mm/vmstat: add event for ksm swapping in copy Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 159/227] mm/ksm: use helper macro __ATTR_RW Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 160/227] mm/hwpoison: check the subpage, not the head page Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 161/227] mm/madvise: use vma_lookup() instead of find_vma() Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 162/227] mm: madvise: return correct bytes advised with process_madvise Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 163/227] mm: madvise: skip unmapped vma holes passed to process_madvise Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-23 0:24 ` Minchan Kim
2022-03-23 2:08 ` Linus Torvalds
2022-03-23 8:28 ` Michal Hocko
2022-03-23 15:47 ` Charan Teja Kalla
2022-03-22 21:46 ` [patch 164/227] mm, memory_hotplug: make arch_alloc_nodedata independent on CONFIG_MEMORY_HOTPLUG Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 165/227] mm: handle uninitialized numa nodes gracefully Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:46 ` [patch 166/227] mm, memory_hotplug: drop arch_free_nodedata Andrew Morton
2022-03-22 21:46 ` Andrew Morton
2022-03-22 21:47 ` [patch 167/227] mm, memory_hotplug: reorganize new pgdat initialization Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 168/227] mm: make free_area_init_node aware of memory less nodes Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 169/227] memcg: do not tweak node in alloc_mem_cgroup_per_node_info Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 170/227] drivers/base/memory: add memory block to memory group after registration succeeded Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 171/227] drivers/base/node: consolidate node device subsystem initialization in node_dev_init() Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 172/227] mm/memory_hotplug: remove obsolete comment of __add_pages Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 173/227] mm/memory_hotplug: avoid calling zone_intersects() for ZONE_NORMAL Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 174/227] mm/memory_hotplug: clean up try_offline_node Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 175/227] mm/memory_hotplug: fix misplaced comment in offline_pages Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 176/227] drivers/base/node: rename link_mem_sections() to register_memory_block_under_node() Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 177/227] drivers/base/memory: determine and store zone for single-zone memory blocks Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 178/227] drivers/base/memory: clarify adding and removing of " Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 179/227] mm: only re-generate demotion targets when a numa node changes its N_CPU state Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 180/227] mm/thp: ClearPageDoubleMap in first page_add_file_rmap() Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 181/227] mm/zswap.c: allow handling just same-value filled pages Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 182/227] mm: remove usercopy_warn() Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 183/227] mm: uninline copy_overflow() Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 184/227] mm/usercopy: return 1 from hardened_usercopy __setup() handler Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 185/227] mm/early_ioremap: declare early_memremap_pgprot_adjust() Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:47 ` [patch 186/227] highmem: document kunmap_local() Andrew Morton
2022-03-22 21:47 ` Andrew Morton
2022-03-22 21:48 ` [patch 187/227] mm/highmem: remove unnecessary done label Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 188/227] mm/page_table_check.c: use strtobool for param parsing Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 189/227] mm/kfence: remove unnecessary CONFIG_KFENCE option Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 190/227] kfence: allow re-enabling KFENCE after system startup Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 191/227] kfence: alloc kfence_pool " Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 192/227] kunit: fix UAF when run kfence test case test_gfpzero Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 193/227] kunit: make kunit_test_timeout compatible with comment Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 194/227] kfence: test: try to avoid test_gfpzero trigger rcu_stall Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 195/227] kfence: allow use of a deferrable timer Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 196/227] mm/hmm.c: remove unneeded local variable ret Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 197/227] mm/damon/dbgfs/init_regions: use target index instead of target id Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 198/227] Docs/admin-guide/mm/damon/usage: update for changed initail_regions file input Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 199/227] mm/damon/core: move damon_set_targets() into dbgfs Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 200/227] mm/damon: remove the target id concept Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 201/227] mm/damon: remove redundant page validation Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 202/227] mm/damon: rename damon_primitives to damon_operations Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 203/227] mm/damon: let monitoring operations can be registered and selected Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 204/227] mm/damon/paddr,vaddr: register themselves to DAMON in subsys_initcall Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 205/227] mm/damon/reclaim: use damon_select_ops() instead of damon_{v,p}a_set_operations() Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:48 ` [patch 206/227] mm/damon/dbgfs: " Andrew Morton
2022-03-22 21:48 ` Andrew Morton
2022-03-22 21:49 ` [patch 207/227] mm/damon/dbgfs: use operations id for knowing if the target has pid Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 208/227] mm/damon/dbgfs-test: fix is_target_id() change Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 209/227] mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 210/227] mm/damon: remove unnecessary CONFIG_DAMON option Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 211/227] Docs/vm/damon: call low level monitoring primitives the operations Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 212/227] Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 213/227] Docs/damon: update outdated term 'regions update interval' Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 214/227] mm/damon/core: allow non-exclusive DAMON start/stop Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 215/227] mm/damon/core: add number of each enum type values Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 216/227] mm/damon: implement a minimal stub for sysfs-based DAMON interface Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 217/227] mm/damon/sysfs: link DAMON for virtual address spaces monitoring Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 218/227] mm/damon/sysfs: support the physical address space monitoring Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 219/227] mm/damon/sysfs: support DAMON-based Operation Schemes Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 220/227] mm/damon/sysfs: support DAMOS quotas Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 221/227] mm/damon/sysfs: support schemes prioritization Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 222/227] mm/damon/sysfs: support DAMOS watermarks Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 223/227] mm/damon/sysfs: support DAMOS stats Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 224/227] selftests/damon: add a test for DAMON sysfs interface Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 225/227] Docs/admin-guide/mm/damon/usage: document " Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:49 ` [patch 226/227] Docs/ABI/testing: add DAMON sysfs interface ABI document Andrew Morton
2022-03-22 21:49 ` Andrew Morton
2022-03-22 21:50 ` [patch 227/227] mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Andrew Morton
2022-03-22 21:50 ` Andrew Morton
-- strict thread matches above, loose matches on Subject: below --
2022-04-27 19:41 incoming Andrew Morton
2022-04-21 23:35 incoming Andrew Morton
2022-04-15 2:12 incoming Andrew Morton
2022-04-08 20:08 incoming Andrew Morton
2022-04-01 18:27 incoming Andrew Morton
2022-04-01 18:20 incoming Andrew Morton
2022-04-01 18:27 ` incoming Andrew Morton
2022-03-25 1:07 incoming Andrew Morton
2022-03-23 23:04 incoming Andrew Morton
2022-03-16 23:14 incoming Andrew Morton
2022-03-05 4:28 incoming Andrew Morton
2022-02-26 3:10 incoming Andrew Morton
2022-02-12 0:27 incoming Andrew Morton
2022-02-12 2:02 ` incoming Linus Torvalds
2022-02-12 5:24 ` incoming Andrew Morton
2022-02-04 4:48 incoming Andrew Morton
2022-01-29 21:40 incoming Andrew Morton
2022-01-29 2:13 incoming Andrew Morton
2022-01-29 4:25 ` incoming Matthew Wilcox
2022-01-29 6:23 ` incoming Andrew Morton
2022-01-22 6:10 incoming Andrew Morton
2022-01-20 2:07 incoming Andrew Morton
2022-01-14 22:02 incoming Andrew Morton
2021-12-31 4:12 incoming Andrew Morton
2021-12-25 5:11 incoming Andrew Morton
2021-12-10 22:45 incoming Andrew Morton
2021-11-20 0:42 incoming Andrew Morton
2021-11-11 4:32 incoming Andrew Morton
2021-11-09 2:30 incoming Andrew Morton
2021-11-05 20:34 incoming Andrew Morton
2021-10-28 21:35 incoming Andrew Morton
2021-10-18 22:14 incoming Andrew Morton
2021-09-24 22:42 incoming Andrew Morton
2021-09-10 3:09 incoming Andrew Morton
2021-09-10 17:11 ` incoming Kees Cook
2021-09-10 20:13 ` incoming Kees Cook
2021-09-09 1:08 incoming Andrew Morton
2021-09-08 22:17 incoming Andrew Morton
2021-09-08 2:52 incoming Andrew Morton
2021-09-08 8:57 ` incoming Vlastimil Babka
2021-09-02 21:48 incoming Andrew Morton
2021-09-02 21:49 ` incoming Andrew Morton
2021-08-25 19:17 incoming Andrew Morton
2021-08-20 2:03 incoming Andrew Morton
2021-08-13 23:53 incoming Andrew Morton
2021-07-29 21:52 incoming Andrew Morton
2021-07-23 22:49 incoming Andrew Morton
2021-07-15 4:26 incoming Andrew Morton
2021-07-08 0:59 incoming Andrew Morton
2021-07-01 1:46 incoming Andrew Morton
2021-07-03 0:28 ` incoming Linus Torvalds
2021-07-03 1:06 ` incoming Linus Torvalds
2021-06-29 2:32 incoming Andrew Morton
2021-06-25 1:38 incoming Andrew Morton
2021-06-16 1:22 incoming Andrew Morton
2021-06-05 3:00 incoming Andrew Morton
2021-05-23 0:41 incoming Andrew Morton
2021-05-15 0:26 incoming Andrew Morton
2021-05-07 1:01 incoming Andrew Morton
2021-05-07 7:12 ` incoming Linus Torvalds
2021-05-05 1:32 incoming Andrew Morton
2021-05-05 1:47 ` incoming Linus Torvalds
2021-05-05 3:16 ` incoming Andrew Morton
2021-05-05 17:10 ` incoming Linus Torvalds
2021-05-05 17:10 ` incoming Linus Torvalds
2021-05-05 17:44 ` incoming Andrew Morton
2021-05-06 3:19 ` incoming Anshuman Khandual
2021-04-30 5:52 incoming Andrew Morton
2021-04-23 21:28 incoming Andrew Morton
2021-04-16 22:45 incoming Andrew Morton
2021-04-09 20:26 incoming Andrew Morton
2021-03-25 4:36 incoming Andrew Morton
2021-03-13 5:06 incoming Andrew Morton
2021-02-26 1:14 incoming Andrew Morton
2021-02-26 17:55 ` incoming Linus Torvalds
2021-02-26 19:16 ` incoming Andrew Morton
2021-02-24 19:58 incoming Andrew Morton
2021-02-24 21:30 ` incoming Linus Torvalds
2021-02-24 21:37 ` incoming Linus Torvalds
2021-02-24 21:37 ` incoming Linus Torvalds
2021-02-25 8:53 ` incoming Arnd Bergmann
2021-02-25 8:53 ` incoming Arnd Bergmann
2021-02-25 9:12 ` incoming Andrey Ryabinin
2021-02-25 9:12 ` incoming Andrey Ryabinin
2021-02-25 11:07 ` incoming Walter Wu
2021-02-13 4:52 incoming Andrew Morton
2021-02-09 21:41 incoming Andrew Morton
2021-02-10 19:30 ` incoming Linus Torvalds
2021-02-05 2:31 incoming Andrew Morton
2021-01-24 5:00 incoming Andrew Morton
2021-01-12 23:48 incoming Andrew Morton
2021-01-15 23:32 ` incoming Linus Torvalds
2020-12-29 23:13 incoming Andrew Morton
2020-12-22 19:58 incoming Andrew Morton
2020-12-22 21:43 ` incoming Linus Torvalds
2020-12-18 22:00 incoming Andrew Morton
2020-12-16 4:41 incoming Andrew Morton
2020-12-15 20:32 incoming Andrew Morton
2020-12-15 21:00 ` incoming Linus Torvalds
2020-12-15 22:48 ` incoming Linus Torvalds
2020-12-15 22:49 ` incoming Linus Torvalds
2020-12-15 22:55 ` incoming Andrew Morton
2020-12-15 3:02 incoming Andrew Morton
2020-12-15 3:25 ` incoming Linus Torvalds
2020-12-15 3:25 ` incoming Linus Torvalds
2020-12-15 3:30 ` incoming Linus Torvalds
2020-12-15 3:30 ` incoming Linus Torvalds
2020-12-15 14:04 ` incoming Konstantin Ryabitsev
2020-12-11 21:35 incoming Andrew Morton
2020-12-06 6:14 incoming Andrew Morton
2020-11-22 6:16 incoming Andrew Morton
2020-11-14 6:51 incoming Andrew Morton
2020-11-02 1:06 incoming Andrew Morton
2020-10-17 23:13 incoming Andrew Morton
2020-10-16 2:40 incoming Andrew Morton
2020-10-16 3:03 ` incoming Andrew Morton
2020-10-13 23:46 incoming Andrew Morton
2020-10-11 6:15 incoming Andrew Morton
2020-10-03 5:20 incoming Andrew Morton
2020-09-26 4:17 incoming Andrew Morton
2020-09-19 4:19 incoming Andrew Morton
2020-09-04 23:34 incoming Andrew Morton
2020-08-21 0:41 incoming Andrew Morton
2020-08-15 0:29 incoming Andrew Morton
2020-08-12 1:29 incoming Andrew Morton
2020-08-07 6:16 incoming Andrew Morton
2020-07-24 4:14 incoming Andrew Morton
2020-07-03 22:14 incoming Andrew Morton
2020-06-26 3:28 incoming Andrew Morton
2020-06-26 6:51 ` incoming Linus Torvalds
2020-06-26 7:31 ` incoming Linus Torvalds
2020-06-26 17:39 ` incoming Konstantin Ryabitsev
2020-06-26 17:40 ` incoming Konstantin Ryabitsev
2020-06-12 0:30 incoming Andrew Morton
2020-06-11 1:40 incoming Andrew Morton
2020-06-09 4:29 incoming Andrew Morton
2020-06-09 16:58 ` incoming Linus Torvalds
2020-06-08 4:35 incoming Andrew Morton
2020-06-04 23:45 incoming Andrew Morton
2020-06-03 22:55 incoming Andrew Morton
2020-06-02 20:09 incoming Andrew Morton
2020-06-02 4:44 incoming Andrew Morton
2020-06-02 20:08 ` incoming Andrew Morton
2020-06-02 20:45 ` incoming Linus Torvalds
2020-06-02 21:38 ` incoming Andrew Morton
2020-06-02 22:18 ` incoming Linus Torvalds
2020-05-28 5:20 incoming Andrew Morton
2020-05-28 20:10 ` incoming Linus Torvalds
2020-05-29 20:31 ` incoming Andrew Morton
2020-05-29 20:38 ` incoming Linus Torvalds
2020-05-29 21:12 ` incoming Andrew Morton
2020-05-29 21:20 ` incoming Linus Torvalds
2020-05-23 5:22 incoming Andrew Morton
2020-05-14 0:50 incoming Andrew Morton
2020-05-08 1:35 incoming Andrew Morton
2020-04-21 1:13 incoming Andrew Morton
2020-04-12 7:41 incoming Andrew Morton
2020-04-10 21:30 incoming Andrew Morton
2020-04-07 3:02 incoming Andrew Morton
2020-04-02 4:01 incoming Andrew Morton
2020-03-29 2:14 incoming Andrew Morton
2020-03-22 1:19 incoming Andrew Morton
2020-03-06 6:27 incoming Andrew Morton
2020-02-21 4:00 incoming Andrew Morton
2020-02-21 4:03 ` incoming Andrew Morton
2020-02-21 18:21 ` incoming Linus Torvalds
2020-02-21 18:32 ` incoming Konstantin Ryabitsev
2020-02-27 9:59 ` incoming Vlastimil Babka
2020-02-21 19:33 ` incoming Linus Torvalds
2020-02-04 1:33 incoming Andrew Morton
2020-02-04 2:27 ` incoming Linus Torvalds
2020-02-04 2:46 ` incoming Andrew Morton
2020-02-04 3:11 ` incoming Linus Torvalds
2020-01-31 6:10 incoming Andrew Morton
2020-01-14 0:28 incoming Andrew Morton
2020-01-04 20:55 incoming Andrew Morton
2019-12-18 4:50 incoming Andrew Morton
2019-12-05 0:48 incoming Andrew Morton
2019-12-01 1:47 incoming Andrew Morton
2019-12-01 5:17 ` incoming James Bottomley
2019-12-01 21:07 ` incoming Linus Torvalds
2019-12-02 8:21 ` incoming Steven Price
2019-11-22 1:53 incoming Andrew Morton
2019-11-16 1:34 incoming Andrew Morton
2019-11-06 5:16 incoming Andrew Morton
2019-10-19 3:19 incoming Andrew Morton
2019-10-14 21:11 incoming Andrew Morton
2019-10-07 0:57 incoming Andrew Morton
2019-09-25 23:45 incoming Andrew Morton
2019-09-23 22:31 incoming Andrew Morton
2019-09-24 0:55 ` incoming Linus Torvalds
2019-09-24 4:31 ` incoming Andrew Morton
2019-09-24 7:48 ` incoming Michal Hocko
2019-09-24 15:34 ` incoming Linus Torvalds
2019-09-25 6:36 ` incoming Michal Hocko
2019-09-24 19:55 ` incoming Vlastimil Babka
2019-08-30 23:04 incoming Andrew Morton
2019-08-25 0:54 incoming Andrew Morton
[not found] <20190718155613.546f9056bbb57f486ab64307@linux-foundation.org>
2019-07-19 10:42 ` incoming Vlastimil Babka
[not found] <20190716162536.bb52b8f34a8ecf5331a86a42@linux-foundation.org>
2019-07-17 8:47 ` incoming Vlastimil Babka
2019-07-17 8:57 ` incoming Bhaskar Chowdhury
2019-07-17 16:13 ` incoming Linus Torvalds
2019-07-17 16:13 ` incoming Linus Torvalds
2019-07-17 17:09 ` incoming Christian Brauner
2019-07-17 18:13 ` incoming Vlastimil Babka
2018-02-06 23:34 incoming Andrew Morton
2018-02-01 0:13 incoming Andrew Morton
2018-02-01 0:25 ` incoming Andrew Morton
2018-01-19 0:33 incoming Andrew Morton
2018-01-13 0:52 incoming Andrew Morton
2018-01-05 0:17 incoming Andrew Morton
2017-12-14 23:32 incoming Andrew Morton
2017-11-30 0:09 incoming Andrew Morton
2017-11-17 23:25 incoming Andrew Morton
2017-11-16 1:29 incoming Andrew Morton
2017-11-09 21:38 incoming Andrew Morton
2017-11-02 22:59 incoming Andrew Morton
2017-10-13 22:57 incoming Andrew Morton
2017-10-03 23:14 incoming Andrew Morton
2017-09-13 23:28 incoming Andrew Morton
2017-09-08 23:10 incoming Andrew Morton
[not found] ` <CA+55aFwRXB5_kSuN7o+tqN6Eft6w5oZuLG3B8Rns=0ZZa2ihgA@mail.gmail.com>
[not found] ` <CA+55aFw+z3HDT4s1C41j=d5_0QTSu8NLSSpnk_jxZ39w34xgnA@mail.gmail.com>
2017-09-09 18:09 ` incoming Andrew Morton
2017-09-06 23:17 incoming Andrew Morton
2017-08-31 23:15 incoming Andrew Morton
2017-08-25 22:55 incoming Andrew Morton
2017-08-18 22:15 incoming Andrew Morton
2017-08-10 22:23 incoming Andrew Morton
2017-08-02 20:31 incoming Andrew Morton
2017-07-14 21:46 incoming Andrew Morton
2017-07-12 21:32 incoming Andrew Morton
2017-07-10 22:46 incoming Andrew Morton
2017-07-06 22:34 incoming Andrew Morton
2017-06-23 22:08 incoming Andrew Morton
2017-06-16 21:02 incoming Andrew Morton
2017-06-02 21:45 incoming Andrew Morton
2017-05-12 22:45 incoming Andrew Morton
2017-05-08 22:53 incoming Andrew Morton
2017-05-03 21:50 incoming Andrew Morton
2017-04-20 21:37 incoming Andrew Morton
2017-04-13 21:56 incoming Andrew Morton
2017-04-07 23:04 incoming Andrew Morton
2017-03-31 22:11 incoming Andrew Morton
2017-03-16 23:40 incoming Andrew Morton
2017-03-10 0:15 incoming Andrew Morton
2017-02-27 22:25 incoming Andrew Morton
2017-02-24 22:55 incoming Andrew Morton
2017-02-22 23:38 incoming Andrew Morton
2017-02-18 11:42 incoming Andrew Morton
2017-02-08 22:30 incoming Andrew Morton
2017-01-24 23:17 incoming Andrew Morton
2017-01-11 0:57 incoming Andrew Morton
2016-12-20 0:22 incoming Andrew Morton
2016-12-14 23:04 incoming Andrew Morton
2016-12-13 0:40 incoming Andrew Morton
2016-12-07 22:44 incoming Andrew Morton
2016-12-03 1:26 incoming Andrew Morton
2016-11-30 23:53 incoming Andrew Morton
2016-11-10 18:45 incoming Andrew Morton
2016-10-11 20:49 incoming Andrew Morton
2016-10-07 23:53 incoming Andrew Morton
2016-09-30 22:11 incoming Andrew Morton
2016-09-28 22:22 incoming Andrew Morton
2016-09-19 21:43 incoming Andrew Morton
2016-09-01 23:14 incoming Andrew Morton
2016-08-25 22:16 incoming Andrew Morton
2016-08-11 22:32 incoming Andrew Morton
2016-08-04 22:31 incoming Andrew Morton
2016-08-03 20:45 incoming Andrew Morton
2016-08-02 21:01 incoming Andrew Morton
2016-07-28 22:42 incoming Andrew Morton
2016-07-26 22:16 incoming Andrew Morton
2016-07-20 22:44 incoming Andrew Morton
2016-07-14 19:06 incoming Andrew Morton
2016-06-24 21:48 incoming Andrew Morton
2016-06-08 22:33 incoming Andrew Morton
2016-06-03 21:51 incoming Andrew Morton
2016-05-27 21:26 incoming Andrew Morton
2016-05-26 22:15 incoming Andrew Morton
2016-05-23 23:21 incoming Andrew Morton
2016-05-20 23:55 incoming Andrew Morton
2016-05-20 0:07 incoming Andrew Morton
2016-05-12 22:41 incoming Andrew Morton
2016-05-09 23:28 incoming Andrew Morton
2016-05-05 23:21 incoming Andrew Morton
[not found] <20150909153424.3feb1c403a841ab97b2d98ab@linux-foundation.org>
2015-09-09 23:23 ` incoming Linus Torvalds
2015-09-10 6:47 ` incoming Rasmus Villemoes
2007-05-02 22:02 incoming Andrew Morton
2007-05-02 22:02 ` incoming Andrew Morton
2007-05-02 22:31 ` incoming Benjamin Herrenschmidt
2007-05-02 22:31 ` incoming Benjamin Herrenschmidt
2007-05-03 7:55 ` incoming Russell King
2007-05-03 7:55 ` incoming Russell King
2007-05-03 8:05 ` incoming Andrew Morton
2007-05-03 8:05 ` incoming Andrew Morton
2007-05-04 13:37 ` incoming Greg KH
2007-05-04 13:37 ` incoming Greg KH
2007-05-04 16:14 ` incoming Andrew Morton
2007-05-04 16:14 ` incoming Andrew Morton
2007-05-04 17:02 ` incoming Greg KH
2007-05-04 17:02 ` incoming Greg KH
2007-05-04 18:57 ` incoming Roland McGrath
2007-05-04 18:57 ` incoming Roland McGrath
2007-05-04 19:24 ` incoming Greg KH
2007-05-04 19:24 ` incoming Greg KH
2007-05-04 19:29 ` incoming Roland McGrath
2007-05-04 19:29 ` incoming Roland McGrath
2006-10-20 21:39 incoming Andrew Morton
2006-10-20 22:31 ` incoming Alan Cox
2005-04-12 10:23 incoming Andrew Morton
2005-04-12 11:02 ` incoming David Vrabel
2005-04-12 11:10 ` incoming Andrew Morton
2005-04-12 11:33 ` incoming David Vrabel
2005-04-12 18:31 ` incoming Matthias Urlichs
2005-04-16 8:59 ` incoming Paul Jackson
2005-04-12 14:38 ` incoming Chris Friesen
2005-04-12 20:55 ` incoming Russell King
2005-04-12 21:08 ` incoming Andrew Morton
2005-04-12 21:12 ` incoming Russell King
2005-04-14 11:48 ` incoming Geert Uytterhoeven
2005-04-14 11:57 ` incoming Paulo Marques
2005-04-14 15:38 ` incoming Lee Revell
2005-04-16 9:03 ` incoming Paul Jackson
2004-11-11 0:02 incoming Andrew Morton
2004-10-28 7:19 incoming Andrew Morton