* incoming
From: Andrew Morton @ 2022-03-05 4:28 UTC
To: Linus Torvalds; +Cc: mm-commits, linux-mm, patches

8 patches, based on 07ebd38a0da24d2534da57b4841346379db9f354.

Subsystems affected by this patch series:

  mm/hugetlb
  mm/pagemap
  memfd
  selftests
  mm/userfaultfd
  kconfig

Subsystem: mm/hugetlb

    Mike Kravetz <mike.kravetz@oracle.com>:
      selftests/vm: cleanup hugetlb file after mremap test

Subsystem: mm/pagemap

    Suren Baghdasaryan <surenb@google.com>:
      mm: refactor vm_area_struct::anon_vma_name usage code
      mm: prevent vm_area_struct::anon_name refcount saturation
      mm: fix use-after-free when anon vma name is used after vma is freed

Subsystem: memfd

    Hugh Dickins <hughd@google.com>:
      memfd: fix F_SEAL_WRITE after shmem huge page allocated

Subsystem: selftests

    Chengming Zhou <zhouchengming@bytedance.com>:
      kselftest/vm: fix tests build with old libc

Subsystem: mm/userfaultfd

    Yun Zhou <yun.zhou@windriver.com>:
      proc: fix documentation and description of pagemap

Subsystem: kconfig

    Qian Cai <quic_qiancai@quicinc.com>:
      configs/debug: set CONFIG_DEBUG_INFO=y properly

 Documentation/admin-guide/mm/pagemap.rst     |    2
 fs/proc/task_mmu.c                           |    9 +-
 fs/userfaultfd.c                             |    6 -
 include/linux/mm.h                           |    7 +
 include/linux/mm_inline.h                    |  105 ++++++++++++++++++---------
 include/linux/mm_types.h                     |    5 +
 kernel/configs/debug.config                  |    2
 kernel/fork.c                                |    4 -
 kernel/sys.c                                 |   19 +++-
 mm/madvise.c                                 |   98 +++++++++----------------
 mm/memfd.c                                   |   40 +++++++---
 mm/mempolicy.c                               |    2
 mm/mlock.c                                   |    2
 mm/mmap.c                                    |   12 +--
 mm/mprotect.c                                |    2
 tools/testing/selftests/vm/hugepage-mremap.c |   26 ++++--
 tools/testing/selftests/vm/run_vmtests.sh    |    3
 tools/testing/selftests/vm/userfaultfd.c     |    1
 18 files changed, 201 insertions(+), 144 deletions(-)
* [patch 1/8] selftests/vm: cleanup hugetlb file after mremap test 2022-03-05 4:28 incoming Andrew Morton @ 2022-03-05 4:28 ` Andrew Morton 2022-03-05 4:28 ` [patch 2/8] mm: refactor vm_area_struct::anon_vma_name usage code Andrew Morton ` (6 subsequent siblings) 7 siblings, 0 replies; 10+ messages in thread From: Andrew Morton @ 2022-03-05 4:28 UTC (permalink / raw) To: yosryahmed, songmuchun, skhan, almasrymina, mike.kravetz, akpm, patches, linux-mm, mm-commits, torvalds, akpm From: Mike Kravetz <mike.kravetz@oracle.com> Subject: selftests/vm: cleanup hugetlb file after mremap test The hugepage-mremap test will create a file in a hugetlb filesystem. In a default 'run_vmtests' run, the file will contain all the hugetlb pages. After the test, the file remains and there are no free hugetlb pages for subsequent tests. This causes those hugetlb tests to fail. Change hugepage-mremap to take the name of the hugetlb file as an argument. Unlink the file within the test, and just to be sure remove the file in the run_vmtests script. Link: https://lkml.kernel.org/r/20220201033459.156944-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Acked-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- tools/testing/selftests/vm/hugepage-mremap.c | 26 ++++++++++++----- tools/testing/selftests/vm/run_vmtests.sh | 3 + 2 files changed, 21 insertions(+), 8 deletions(-) --- a/tools/testing/selftests/vm/hugepage-mremap.c~selftests-vm-cleanup-hugetlb-file-after-mremap-test +++ a/tools/testing/selftests/vm/hugepage-mremap.c @@ -3,9 +3,10 @@ * hugepage-mremap: * * Example of remapping huge page memory in a user application using the - * mremap system call. Code assumes a hugetlbfs filesystem is mounted - * at './huge'. 
The amount of memory used by this test is decided by a command - * line argument in MBs. If missing, the default amount is 10MB. + * mremap system call. The path to a file in a hugetlbfs filesystem must + * be passed as the last argument to this test. The amount of memory used + * by this test in MBs can optionally be passed as an argument. If no memory + * amount is passed, the default amount is 10MB. * * To make sure the test triggers pmd sharing and goes through the 'unshare' * path in the mremap code use 1GB (1024) or more. @@ -25,7 +26,6 @@ #define DEFAULT_LENGTH_MB 10UL #define MB_TO_BYTES(x) (x * 1024 * 1024) -#define FILE_NAME "huge/hugepagefile" #define PROTECTION (PROT_READ | PROT_WRITE | PROT_EXEC) #define FLAGS (MAP_SHARED | MAP_ANONYMOUS) @@ -107,17 +107,26 @@ static void register_region_with_uffd(ch int main(int argc, char *argv[]) { + size_t length; + + if (argc != 2 && argc != 3) { + printf("Usage: %s [length_in_MB] <hugetlb_file>\n", argv[0]); + exit(1); + } + /* Read memory length as the first arg if valid, otherwise fallback to - * the default length. Any additional args are ignored. + * the default length. */ - size_t length = argc > 1 ? (size_t)atoi(argv[1]) : 0UL; + if (argc == 3) + length = argc > 2 ? (size_t)atoi(argv[1]) : 0UL; length = length > 0 ? 
length : DEFAULT_LENGTH_MB; length = MB_TO_BYTES(length); int ret = 0; - int fd = open(FILE_NAME, O_CREAT | O_RDWR, 0755); + /* last arg is the hugetlb file name */ + int fd = open(argv[argc-1], O_CREAT | O_RDWR, 0755); if (fd < 0) { perror("Open failed"); @@ -169,5 +178,8 @@ int main(int argc, char *argv[]) munmap(addr, length); + close(fd); + unlink(argv[argc-1]); + return ret; } --- a/tools/testing/selftests/vm/run_vmtests.sh~selftests-vm-cleanup-hugetlb-file-after-mremap-test +++ a/tools/testing/selftests/vm/run_vmtests.sh @@ -111,13 +111,14 @@ fi echo "-----------------------" echo "running hugepage-mremap" echo "-----------------------" -./hugepage-mremap 256 +./hugepage-mremap $mnt/huge_mremap if [ $? -ne 0 ]; then echo "[FAIL]" exitcode=1 else echo "[PASS]" fi +rm -f $mnt/huge_mremap echo "NOTE: The above hugetlb tests provide minimal coverage. Use" echo " https://github.com/libhugetlbfs/libhugetlbfs.git for" _ ^ permalink raw reply [flat|nested] 10+ messages in thread
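For readers following the argument handling in patch 1/8, here is a minimal standalone sketch of the "[length_in_MB] <hugetlb_file>" convention. The helper names parse_length_mb() and hugetlb_file_arg() are hypothetical and do not appear in the patch. In the hunk as posted, `length` is declared without an initializer and is assigned only when argc == 3, so the sketch initializes it to 0 to keep the argc == 2 path well defined. Unlike the posted code, it also treats a negative length as "use the default"; that is an assumption of this sketch, not behavior taken from the test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEFAULT_LENGTH_MB 10UL

/*
 * Hypothetical helper (not in the patch): returns the test length in MB for
 * "prog [length_in_MB] <hugetlb_file>", or 0 if the argument count is wrong.
 */
static size_t parse_length_mb(int argc, char *argv[])
{
	size_t length = 0;	/* initialized so the argc == 2 path is defined */

	if (argc != 2 && argc != 3)
		return 0;

	if (argc == 3) {
		int mb = atoi(argv[1]);

		/* Non-numeric or non-positive input falls back to the default. */
		if (mb > 0)
			length = (size_t)mb;
	}

	return length > 0 ? length : DEFAULT_LENGTH_MB;
}

/* Hypothetical helper: the hugetlb file name is always the last argument. */
static const char *hugetlb_file_arg(int argc, char *argv[])
{
	assert(argc >= 1 && argv != NULL);
	return argv[argc - 1];
}
```

With this shape, "./hugepage-mremap $mnt/huge_mremap" (as run_vmtests.sh now invokes it) resolves to the 10MB default, and a caller would open(), close() and unlink() the path returned by hugetlb_file_arg().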
* [patch 2/8] mm: refactor vm_area_struct::anon_vma_name usage code 2022-03-05 4:28 incoming Andrew Morton 2022-03-05 4:28 ` [patch 1/8] selftests/vm: cleanup hugetlb file after mremap test Andrew Morton @ 2022-03-05 4:28 ` Andrew Morton 2022-03-05 4:28 ` [patch 3/8] mm: prevent vm_area_struct::anon_name refcount saturation Andrew Morton ` (5 subsequent siblings) 7 siblings, 0 replies; 10+ messages in thread From: Andrew Morton @ 2022-03-05 4:28 UTC (permalink / raw) To: willy, vbabka, sumit.semwal, sashal, pcc, mhocko, legion, kirill.shutemov, keescook, hannes, gorcunov, ebiederm, david, dave, dave.hansen, chris.hyser, ccross, caoxiaofeng, brauner, surenb, akpm, patches, linux-mm, mm-commits, torvalds, akpm From: Suren Baghdasaryan <surenb@google.com> Subject: mm: refactor vm_area_struct::anon_vma_name usage code Avoid mixing strings and their anon_vma_name referenced pointers by using struct anon_vma_name whenever possible. This simplifies the code and allows easier sharing of anon_vma_name structures when they represent the same name. [surenb@google.com: fix comment] Link: https://lkml.kernel.org/r/20220223153613.835563-1-surenb@google.com Link: https://lkml.kernel.org/r/20220224231834.1481408-1-surenb@google.com Link: https://lkml.kernel.org/r/20220223153613.835563-1-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Colin Cross <ccross@google.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Eric W. 
Biederman" <ebiederm@xmission.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Alexey Gladkov <legion@kernel.org> Cc: Sasha Levin <sashal@kernel.org> Cc: Chris Hyser <chris.hyser@oracle.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Collingbourne <pcc@google.com> Cc: Xiaofeng Cao <caoxiaofeng@yulong.com> Cc: David Hildenbrand <david@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- fs/proc/task_mmu.c | 6 +- fs/userfaultfd.c | 6 +- include/linux/mm.h | 7 +- include/linux/mm_inline.h | 87 ++++++++++++++++++++++++------------ include/linux/mm_types.h | 5 +- kernel/fork.c | 4 - kernel/sys.c | 19 ++++--- mm/madvise.c | 87 ++++++++++++------------------------ mm/mempolicy.c | 2 mm/mlock.c | 2 mm/mmap.c | 12 ++-- mm/mprotect.c | 2 12 files changed, 125 insertions(+), 114 deletions(-) --- a/fs/proc/task_mmu.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code +++ a/fs/proc/task_mmu.c @@ -309,7 +309,7 @@ show_map_vma(struct seq_file *m, struct name = arch_vma_name(vma); if (!name) { - const char *anon_name; + struct anon_vma_name *anon_name; if (!mm) { name = "[vdso]"; @@ -327,10 +327,10 @@ show_map_vma(struct seq_file *m, struct goto done; } - anon_name = vma_anon_name(vma); + anon_name = anon_vma_name(vma); if (anon_name) { seq_pad(m, ' '); - seq_printf(m, "[anon:%s]", anon_name); + seq_printf(m, "[anon:%s]", anon_name->name); } } --- a/fs/userfaultfd.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code +++ a/fs/userfaultfd.c @@ -878,7 +878,7 @@ static int userfaultfd_release(struct in new_flags, vma->anon_vma, vma->vm_file, vma->vm_pgoff, vma_policy(vma), - NULL_VM_UFFD_CTX, vma_anon_name(vma)); + NULL_VM_UFFD_CTX, anon_vma_name(vma)); if (prev) vma = prev; else @@ -1438,7 +1438,7 @@ static int userfaultfd_register(struct u vma->anon_vma, vma->vm_file, vma->vm_pgoff, vma_policy(vma), ((struct vm_userfaultfd_ctx){ ctx }), - vma_anon_name(vma)); + anon_vma_name(vma)); if (prev) { vma = prev; goto 
next; @@ -1615,7 +1615,7 @@ static int userfaultfd_unregister(struct prev = vma_merge(mm, prev, start, vma_end, new_flags, vma->anon_vma, vma->vm_file, vma->vm_pgoff, vma_policy(vma), - NULL_VM_UFFD_CTX, vma_anon_name(vma)); + NULL_VM_UFFD_CTX, anon_vma_name(vma)); if (prev) { vma = prev; goto next; --- a/include/linux/mm.h~mm-refactor-vm_area_struct-anon_vma_name-usage-code +++ a/include/linux/mm.h @@ -2626,7 +2626,7 @@ static inline int vma_adjust(struct vm_a extern struct vm_area_struct *vma_merge(struct mm_struct *, struct vm_area_struct *prev, unsigned long addr, unsigned long end, unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t, - struct mempolicy *, struct vm_userfaultfd_ctx, const char *); + struct mempolicy *, struct vm_userfaultfd_ctx, struct anon_vma_name *); extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *); extern int __split_vma(struct mm_struct *, struct vm_area_struct *, unsigned long addr, int new_below); @@ -3372,11 +3372,12 @@ static inline int seal_check_future_writ #ifdef CONFIG_ANON_VMA_NAME int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, - unsigned long len_in, const char *name); + unsigned long len_in, + struct anon_vma_name *anon_name); #else static inline int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, - unsigned long len_in, const char *name) { + unsigned long len_in, struct anon_vma_name *anon_name) { return 0; } #endif --- a/include/linux/mm_inline.h~mm-refactor-vm_area_struct-anon_vma_name-usage-code +++ a/include/linux/mm_inline.h @@ -140,50 +140,81 @@ static __always_inline void del_page_fro #ifdef CONFIG_ANON_VMA_NAME /* - * mmap_lock should be read-locked when calling vma_anon_name() and while using - * the returned pointer. + * mmap_lock should be read-locked when calling anon_vma_name(). Caller should + * either keep holding the lock while using the returned pointer or it should + * raise anon_vma_name refcount before releasing the lock. 
*/ -extern const char *vma_anon_name(struct vm_area_struct *vma); +extern struct anon_vma_name *anon_vma_name(struct vm_area_struct *vma); +extern struct anon_vma_name *anon_vma_name_alloc(const char *name); +extern void anon_vma_name_free(struct kref *kref); -/* - * mmap_lock should be read-locked for orig_vma->vm_mm. - * mmap_lock should be write-locked for new_vma->vm_mm or new_vma should be - * isolated. - */ -extern void dup_vma_anon_name(struct vm_area_struct *orig_vma, - struct vm_area_struct *new_vma); +/* mmap_lock should be read-locked */ +static inline void anon_vma_name_get(struct anon_vma_name *anon_name) +{ + if (anon_name) + kref_get(&anon_name->kref); +} -/* - * mmap_lock should be write-locked or vma should have been isolated under - * write-locked mmap_lock protection. - */ -extern void free_vma_anon_name(struct vm_area_struct *vma); +static inline void anon_vma_name_put(struct anon_vma_name *anon_name) +{ + if (anon_name) + kref_put(&anon_name->kref, anon_vma_name_free); +} -/* mmap_lock should be read-locked */ -static inline bool is_same_vma_anon_name(struct vm_area_struct *vma, - const char *name) +static inline void dup_anon_vma_name(struct vm_area_struct *orig_vma, + struct vm_area_struct *new_vma) +{ + struct anon_vma_name *anon_name = anon_vma_name(orig_vma); + + if (anon_name) { + anon_vma_name_get(anon_name); + new_vma->anon_name = anon_name; + } +} + +static inline void free_anon_vma_name(struct vm_area_struct *vma) { - const char *vma_name = vma_anon_name(vma); + /* + * Not using anon_vma_name because it generates a warning if mmap_lock + * is not held, which might be the case here. 
+ */ + if (!vma->vm_file) + anon_vma_name_put(vma->anon_name); +} - /* either both NULL, or pointers to same string */ - if (vma_name == name) +static inline bool anon_vma_name_eq(struct anon_vma_name *anon_name1, + struct anon_vma_name *anon_name2) +{ + if (anon_name1 == anon_name2) return true; - return name && vma_name && !strcmp(name, vma_name); + return anon_name1 && anon_name2 && + !strcmp(anon_name1->name, anon_name2->name); } + #else /* CONFIG_ANON_VMA_NAME */ -static inline const char *vma_anon_name(struct vm_area_struct *vma) +static inline struct anon_vma_name *anon_vma_name(struct vm_area_struct *vma) +{ + return NULL; +} + +static inline struct anon_vma_name *anon_vma_name_alloc(const char *name) { return NULL; } -static inline void dup_vma_anon_name(struct vm_area_struct *orig_vma, - struct vm_area_struct *new_vma) {} -static inline void free_vma_anon_name(struct vm_area_struct *vma) {} -static inline bool is_same_vma_anon_name(struct vm_area_struct *vma, - const char *name) + +static inline void anon_vma_name_get(struct anon_vma_name *anon_name) {} +static inline void anon_vma_name_put(struct anon_vma_name *anon_name) {} +static inline void dup_anon_vma_name(struct vm_area_struct *orig_vma, + struct vm_area_struct *new_vma) {} +static inline void free_anon_vma_name(struct vm_area_struct *vma) {} + +static inline bool anon_vma_name_eq(struct anon_vma_name *anon_name1, + struct anon_vma_name *anon_name2) { return true; } + #endif /* CONFIG_ANON_VMA_NAME */ static inline void init_tlb_flush_pending(struct mm_struct *mm) --- a/include/linux/mm_types.h~mm-refactor-vm_area_struct-anon_vma_name-usage-code +++ a/include/linux/mm_types.h @@ -416,7 +416,10 @@ struct vm_area_struct { struct rb_node rb; unsigned long rb_subtree_last; } shared; - /* Serialized by mmap_sem. */ + /* + * Serialized by mmap_sem. Never use directly because it is + * valid only when vm_file is NULL. Use anon_vma_name instead. 
+ */ struct anon_vma_name *anon_name; }; --- a/kernel/fork.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code +++ a/kernel/fork.c @@ -366,14 +366,14 @@ struct vm_area_struct *vm_area_dup(struc *new = data_race(*orig); INIT_LIST_HEAD(&new->anon_vma_chain); new->vm_next = new->vm_prev = NULL; - dup_vma_anon_name(orig, new); + dup_anon_vma_name(orig, new); } return new; } void vm_area_free(struct vm_area_struct *vma) { - free_vma_anon_name(vma); + free_anon_vma_name(vma); kmem_cache_free(vm_area_cachep, vma); } --- a/kernel/sys.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code +++ a/kernel/sys.c @@ -7,6 +7,7 @@ #include <linux/export.h> #include <linux/mm.h> +#include <linux/mm_inline.h> #include <linux/utsname.h> #include <linux/mman.h> #include <linux/reboot.h> @@ -2286,15 +2287,16 @@ static int prctl_set_vma(unsigned long o { struct mm_struct *mm = current->mm; const char __user *uname; - char *name, *pch; + struct anon_vma_name *anon_name = NULL; int error; switch (opt) { case PR_SET_VMA_ANON_NAME: uname = (const char __user *)arg; if (uname) { - name = strndup_user(uname, ANON_VMA_NAME_MAX_LEN); + char *name, *pch; + name = strndup_user(uname, ANON_VMA_NAME_MAX_LEN); if (IS_ERR(name)) return PTR_ERR(name); @@ -2304,15 +2306,18 @@ static int prctl_set_vma(unsigned long o return -EINVAL; } } - } else { - /* Reset the name */ - name = NULL; + /* anon_vma has its own copy */ + anon_name = anon_vma_name_alloc(name); + kfree(name); + if (!anon_name) + return -ENOMEM; + } mmap_write_lock(mm); - error = madvise_set_anon_name(mm, addr, size, name); + error = madvise_set_anon_name(mm, addr, size, anon_name); mmap_write_unlock(mm); - kfree(name); + anon_vma_name_put(anon_name); break; default: error = -EINVAL; --- a/mm/madvise.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code +++ a/mm/madvise.c @@ -65,7 +65,7 @@ static int madvise_need_mmap_write(int b } #ifdef CONFIG_ANON_VMA_NAME -static struct anon_vma_name *anon_vma_name_alloc(const char *name) +struct 
anon_vma_name *anon_vma_name_alloc(const char *name) { struct anon_vma_name *anon_name; size_t count; @@ -81,78 +81,49 @@ static struct anon_vma_name *anon_vma_na return anon_name; } -static void vma_anon_name_free(struct kref *kref) +void anon_vma_name_free(struct kref *kref) { struct anon_vma_name *anon_name = container_of(kref, struct anon_vma_name, kref); kfree(anon_name); } -static inline bool has_vma_anon_name(struct vm_area_struct *vma) +struct anon_vma_name *anon_vma_name(struct vm_area_struct *vma) { - return !vma->vm_file && vma->anon_name; -} - -const char *vma_anon_name(struct vm_area_struct *vma) -{ - if (!has_vma_anon_name(vma)) - return NULL; - mmap_assert_locked(vma->vm_mm); - return vma->anon_name->name; -} - -void dup_vma_anon_name(struct vm_area_struct *orig_vma, - struct vm_area_struct *new_vma) -{ - if (!has_vma_anon_name(orig_vma)) - return; - - kref_get(&orig_vma->anon_name->kref); - new_vma->anon_name = orig_vma->anon_name; -} - -void free_vma_anon_name(struct vm_area_struct *vma) -{ - struct anon_vma_name *anon_name; - - if (!has_vma_anon_name(vma)) - return; + if (vma->vm_file) + return NULL; - anon_name = vma->anon_name; - vma->anon_name = NULL; - kref_put(&anon_name->kref, vma_anon_name_free); + return vma->anon_name; } /* mmap_lock should be write-locked */ -static int replace_vma_anon_name(struct vm_area_struct *vma, const char *name) +static int replace_anon_vma_name(struct vm_area_struct *vma, + struct anon_vma_name *anon_name) { - const char *anon_name; + struct anon_vma_name *orig_name = anon_vma_name(vma); - if (!name) { - free_vma_anon_name(vma); + if (!anon_name) { + vma->anon_name = NULL; + anon_vma_name_put(orig_name); return 0; } - anon_name = vma_anon_name(vma); - if (anon_name) { - /* Same name, nothing to do here */ - if (!strcmp(name, anon_name)) - return 0; + if (anon_vma_name_eq(orig_name, anon_name)) + return 0; - free_vma_anon_name(vma); - } - vma->anon_name = anon_vma_name_alloc(name); - if (!vma->anon_name) - return 
-ENOMEM; + anon_vma_name_get(anon_name); + vma->anon_name = anon_name; + anon_vma_name_put(orig_name); return 0; } #else /* CONFIG_ANON_VMA_NAME */ -static int replace_vma_anon_name(struct vm_area_struct *vma, const char *name) +static int replace_anon_vma_name(struct vm_area_struct *vma, + struct anon_vma_name *anon_name) { - if (name) + if (anon_name) return -EINVAL; return 0; @@ -165,13 +136,13 @@ static int replace_vma_anon_name(struct static int madvise_update_vma(struct vm_area_struct *vma, struct vm_area_struct **prev, unsigned long start, unsigned long end, unsigned long new_flags, - const char *name) + struct anon_vma_name *anon_name) { struct mm_struct *mm = vma->vm_mm; int error; pgoff_t pgoff; - if (new_flags == vma->vm_flags && is_same_vma_anon_name(vma, name)) { + if (new_flags == vma->vm_flags && anon_vma_name_eq(anon_vma_name(vma), anon_name)) { *prev = vma; return 0; } @@ -179,7 +150,7 @@ static int madvise_update_vma(struct vm_ pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT); *prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma), - vma->vm_userfaultfd_ctx, name); + vma->vm_userfaultfd_ctx, anon_name); if (*prev) { vma = *prev; goto success; @@ -209,7 +180,7 @@ success: */ vma->vm_flags = new_flags; if (!vma->vm_file) { - error = replace_vma_anon_name(vma, name); + error = replace_anon_vma_name(vma, anon_name); if (error) return error; } @@ -1041,7 +1012,7 @@ static int madvise_vma_behavior(struct v } error = madvise_update_vma(vma, prev, start, end, new_flags, - vma_anon_name(vma)); + anon_vma_name(vma)); out: /* @@ -1225,7 +1196,7 @@ int madvise_walk_vmas(struct mm_struct * static int madvise_vma_anon_name(struct vm_area_struct *vma, struct vm_area_struct **prev, unsigned long start, unsigned long end, - unsigned long name) + unsigned long anon_name) { int error; @@ -1234,7 +1205,7 @@ static int madvise_vma_anon_name(struct return -EBADF; error = madvise_update_vma(vma, prev, 
start, end, vma->vm_flags, - (const char *)name); + (struct anon_vma_name *)anon_name); /* * madvise() returns EAGAIN if kernel resources, such as @@ -1246,7 +1217,7 @@ static int madvise_vma_anon_name(struct } int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, - unsigned long len_in, const char *name) + unsigned long len_in, struct anon_vma_name *anon_name) { unsigned long end; unsigned long len; @@ -1266,7 +1237,7 @@ int madvise_set_anon_name(struct mm_stru if (end == start) return 0; - return madvise_walk_vmas(mm, start, end, (unsigned long)name, + return madvise_walk_vmas(mm, start, end, (unsigned long)anon_name, madvise_vma_anon_name); } #endif /* CONFIG_ANON_VMA_NAME */ --- a/mm/mempolicy.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code +++ a/mm/mempolicy.c @@ -814,7 +814,7 @@ static int mbind_range(struct mm_struct prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags, vma->anon_vma, vma->vm_file, pgoff, new_pol, vma->vm_userfaultfd_ctx, - vma_anon_name(vma)); + anon_vma_name(vma)); if (prev) { vma = prev; next = vma->vm_next; --- a/mm/mlock.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code +++ a/mm/mlock.c @@ -512,7 +512,7 @@ static int mlock_fixup(struct vm_area_st pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT); *prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma), - vma->vm_userfaultfd_ctx, vma_anon_name(vma)); + vma->vm_userfaultfd_ctx, anon_vma_name(vma)); if (*prev) { vma = *prev; goto success; --- a/mm/mmap.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code +++ a/mm/mmap.c @@ -1031,7 +1031,7 @@ again: static inline int is_mergeable_vma(struct vm_area_struct *vma, struct file *file, unsigned long vm_flags, struct vm_userfaultfd_ctx vm_userfaultfd_ctx, - const char *anon_name) + struct anon_vma_name *anon_name) { /* * VM_SOFTDIRTY should not prevent from VMA merging, if we @@ -1049,7 +1049,7 @@ static inline int is_mergeable_vma(struc return 0; if 
(!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx)) return 0; - if (!is_same_vma_anon_name(vma, anon_name)) + if (!anon_vma_name_eq(anon_vma_name(vma), anon_name)) return 0; return 1; } @@ -1084,7 +1084,7 @@ can_vma_merge_before(struct vm_area_stru struct anon_vma *anon_vma, struct file *file, pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx, - const char *anon_name) + struct anon_vma_name *anon_name) { if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) && is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) { @@ -1106,7 +1106,7 @@ can_vma_merge_after(struct vm_area_struc struct anon_vma *anon_vma, struct file *file, pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx, - const char *anon_name) + struct anon_vma_name *anon_name) { if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) && is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) { @@ -1167,7 +1167,7 @@ struct vm_area_struct *vma_merge(struct struct anon_vma *anon_vma, struct file *file, pgoff_t pgoff, struct mempolicy *policy, struct vm_userfaultfd_ctx vm_userfaultfd_ctx, - const char *anon_name) + struct anon_vma_name *anon_name) { pgoff_t pglen = (end - addr) >> PAGE_SHIFT; struct vm_area_struct *area, *next; @@ -3256,7 +3256,7 @@ struct vm_area_struct *copy_vma(struct v return NULL; /* should never get here */ new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags, vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma), - vma->vm_userfaultfd_ctx, vma_anon_name(vma)); + vma->vm_userfaultfd_ctx, anon_vma_name(vma)); if (new_vma) { /* * Source vma may have been merged into new_vma --- a/mm/mprotect.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code +++ a/mm/mprotect.c @@ -464,7 +464,7 @@ mprotect_fixup(struct vm_area_struct *vm pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT); *pprev = vma_merge(mm, *pprev, start, end, newflags, vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma), - vma->vm_userfaultfd_ctx, 
vma_anon_name(vma)); + vma->vm_userfaultfd_ctx, anon_vma_name(vma)); if (*pprev) { vma = *pprev; VM_WARN_ON((vma->vm_flags ^ newflags) & ~VM_SOFTDIRTY); _ ^ permalink raw reply [flat|nested] 10+ messages in thread
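The ownership model that patch 2/8 moves to can be illustrated outside the kernel. The following is a userspace sketch only: struct name_ref and its helpers are hypothetical stand-ins for struct anon_vma_name, anon_vma_name_alloc(), anon_vma_name_get(), anon_vma_name_put() and anon_vma_name_eq(), and a plain int replaces struct kref, so there is no atomicity and no locking here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for struct anon_vma_name; the int replaces struct kref. */
struct name_ref {
	int refcount;
	char name[];
};

static int name_ref_frees;	/* counts frees, for illustration only */

static struct name_ref *name_ref_alloc(const char *name)
{
	size_t count = strlen(name) + 1;	/* include the NUL terminator */
	struct name_ref *ref = malloc(sizeof(*ref) + count);

	if (ref) {
		ref->refcount = 1;	/* the caller owns the first reference */
		memcpy(ref->name, name, count);
	}
	return ref;
}

/* NULL-tolerant, like the get/put helpers added to mm_inline.h. */
static void name_ref_get(struct name_ref *ref)
{
	if (ref)
		ref->refcount++;
}

static void name_ref_put(struct name_ref *ref)
{
	if (!ref)
		return;
	assert(ref->refcount > 0);
	if (--ref->refcount == 0) {
		name_ref_frees++;
		free(ref);
	}
}

/* Same shape as anon_vma_name_eq(): pointer equality first, then strcmp(). */
static bool name_ref_eq(const struct name_ref *a, const struct name_ref *b)
{
	if (a == b)
		return true;
	return a && b && !strcmp(a->name, b->name);
}
```

This mirrors the flow in the prctl_set_vma() hunk: the caller allocates the object, hands it to the code that installs it (which takes its own reference), and then drops its own reference with a put.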
* [patch 3/8] mm: prevent vm_area_struct::anon_name refcount saturation
From: Andrew Morton @ 2022-03-05 4:28 UTC
To: willy, vbabka, sumit.semwal, sashal, pcc, mhocko, legion, kirill.shutemov, keescook, hannes, gorcunov, ebiederm, david, dave, dave.hansen, chris.hyser, ccross, caoxiaofeng, brauner, surenb, akpm, patches, linux-mm, mm-commits, torvalds, akpm

From: Suren Baghdasaryan <surenb@google.com>
Subject: mm: prevent vm_area_struct::anon_name refcount saturation

With a deep process chain and many vmas, the anon_vma_name refcount could
grow very high.  With default sysctl_max_map_count (64k) and default
pid_max (32k) the max number of vmas in the system is 2147450880 and the
refcounter has headroom of 1073774592 before it reaches
REFCOUNT_SATURATED (3221225472).

Therefore it's unlikely that an anonymous name refcounter will overflow
with these defaults.  Currently the max for pid_max is PID_MAX_LIMIT
(4194304) and for sysctl_max_map_count it's INT_MAX (2147483647).  In
this configuration anon_vma_name refcount overflow becomes theoretically
possible (though that would still require heavy sharing of that
anon_vma_name between processes).

The kref refcounting interface used in the anon_vma_name structure will
detect a counter overflow when it reaches the REFCOUNT_SATURATED value,
but will only generate a warning and freeze the ref counter.  This would
lead to the refcounted object never being freed.  A determined attacker
could leak memory like that, but it would be a rather expensive and
inefficient way to do so.

To ensure anon_vma_name refcount does not overflow, stop anon_vma_name
sharing when the refcount reaches REFCOUNT_MAX (2147483647), which still
leaves INT_MAX/2 (1073741823) values before the counter reaches
REFCOUNT_SATURATED.  This should provide enough headroom for raising the
refcounts temporarily.

Link: https://lkml.kernel.org/r/20220223153613.835563-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Colin Cross <ccross@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm_inline.h |   18 ++++++++++++++----
 mm/madvise.c              |    3 +--
 2 files changed, 15 insertions(+), 6 deletions(-)

--- a/include/linux/mm_inline.h~mm-prevent-vm_area_struct-anon_name-refcount-saturation
+++ a/include/linux/mm_inline.h
@@ -161,15 +161,25 @@ static inline void anon_vma_name_put(str
 		kref_put(&anon_name->kref, anon_vma_name_free);
 }
 
+static inline
+struct anon_vma_name *anon_vma_name_reuse(struct anon_vma_name *anon_name)
+{
+	/* Prevent anon_name refcount saturation early on */
+	if (kref_read(&anon_name->kref) < REFCOUNT_MAX) {
+		anon_vma_name_get(anon_name);
+		return anon_name;
+
+	}
+	return anon_vma_name_alloc(anon_name->name);
+}
+
 static inline void dup_anon_vma_name(struct vm_area_struct *orig_vma,
 				     struct vm_area_struct *new_vma)
 {
 	struct anon_vma_name *anon_name = anon_vma_name(orig_vma);
 
-	if (anon_name) {
-		anon_vma_name_get(anon_name);
-		new_vma->anon_name = anon_name;
-	}
+	if (anon_name)
+		new_vma->anon_name = anon_vma_name_reuse(anon_name);
 }
 
 static inline void free_anon_vma_name(struct vm_area_struct *vma)
--- a/mm/madvise.c~mm-prevent-vm_area_struct-anon_name-refcount-saturation
+++ a/mm/madvise.c
@@ -113,8 +113,7 @@ static int replace_anon_vma_name(struct
 	if (anon_vma_name_eq(orig_name, anon_name))
 		return 0;
 
-	anon_vma_name_get(anon_name);
-	vma->anon_name = anon_name;
+	vma->anon_name = anon_vma_name_reuse(anon_name);
 	anon_vma_name_put(orig_name);
 
 	return 0;
_
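The "share until a limit, then copy" idea in patch 3/8, together with the headroom arithmetic from its commit message, can be sketched in userspace as follows. Everything here is an assumption for illustration: struct name_ref, name_ref_alloc(), name_ref_reuse() and headroom() are hypothetical names, the counter is a plain unsigned int rather than a kref, and the limit is passed as a parameter so the behavior can be exercised with small values. The two constants are the values quoted in the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Values quoted in the commit message of patch 3/8. */
#define SKETCH_REFCOUNT_MAX		2147483647u	/* INT_MAX */
#define SKETCH_REFCOUNT_SATURATED	3221225472u

/* Userspace stand-in for struct anon_vma_name. */
struct name_ref {
	unsigned int refcount;
	char name[];
};

static struct name_ref *name_ref_alloc(const char *name)
{
	size_t count = strlen(name) + 1;	/* include the NUL terminator */
	struct name_ref *ref = malloc(sizeof(*ref) + count);

	if (ref) {
		ref->refcount = 1;
		memcpy(ref->name, name, count);
	}
	return ref;
}

/*
 * Share 'ref' while its count is below 'limit'; once the limit is reached,
 * stop sharing and hand out a fresh copy that starts at refcount 1.
 * May return NULL if the copy cannot be allocated.
 */
static struct name_ref *name_ref_reuse(struct name_ref *ref, unsigned int limit)
{
	assert(ref != NULL);
	if (ref->refcount < limit) {
		ref->refcount++;
		return ref;
	}
	return name_ref_alloc(ref->name);
}

/* Distance between a given count and the saturation value. */
static uint64_t headroom(uint64_t count)
{
	return (uint64_t)SKETCH_REFCOUNT_SATURATED - count;
}
```

The figures in the commit message check out arithmetically: 2147450880 is 65535 * 32768, headroom(2147450880) is 1073774592, and headroom(SKETCH_REFCOUNT_MAX) is 1073741825, which the message rounds to INT_MAX/2.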
* Re: [patch 3/8] mm: prevent vm_area_struct::anon_name refcount saturation
From: Linus Torvalds @ 2022-03-05 19:03 UTC
To: Andrew Morton
Cc: Matthew Wilcox, Vlastimil Babka, sumit.semwal, Sasha Levin, Peter Collingbourne, Michal Hocko, Alexey Gladkov, Kirill A. Shutemov, Kees Cook, Johannes Weiner, Cyrill Gorcunov, Eric W. Biederman, David Hildenbrand, Davidlohr Bueso, Dave Hansen, chris.hyser, Colin Cross, caoxiaofeng, Christian Brauner, Suren Baghdasaryan, patches, Linux-MM, mm-commits

On Fri, Mar 4, 2022 at 8:28 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> To ensure anon_vma_name refcount does not overflow, stop anon_vma_name
> sharing when the refcount reaches REFCOUNT_MAX (2147483647), which still
> leaves INT_MAX/2 (1073741823) values before the counter reaches
> REFCOUNT_SATURATED. This should provide enough headroom for raising the
> refcounts temporarily.

This is a classic case of kref simply being the wrong type for this.

We should move away from that idiotic "saturate with a warning" type,
and just codify that what the page refs do is the RightThing(tm) to do.

I've ranted against kref for years, I hate that damn thing. It's
literally broken by design with the known leaking behavior.

Oh well. I'm taking this patch as a "fix up refcount problems", and I
guess I need to some day just extract the page_ref code into a nice
type of its own so that it's usable outside of pages.

(Others have copied the page_ref code manually, but there's no "helper
type with functions to use it", which is why people then use that
mis-designed refcount stuff).

          Linus
* [patch 4/8] mm: fix use-after-free when anon vma name is used after vma is freed 2022-03-05 4:28 incoming Andrew Morton ` (2 preceding siblings ...) 2022-03-05 4:28 ` [patch 3/8] mm: prevent vm_area_struct::anon_name refcount saturation Andrew Morton @ 2022-03-05 4:28 ` Andrew Morton 2022-03-05 4:29 ` [patch 5/8] memfd: fix F_SEAL_WRITE after shmem huge page allocated Andrew Morton ` (3 subsequent siblings) 7 siblings, 0 replies; 10+ messages in thread From: Andrew Morton @ 2022-03-05 4:28 UTC (permalink / raw) To: willy, vbabka, sumit.semwal, sashal, pcc, mhocko, legion, kirill.shutemov, keescook, hannes, gorcunov, ebiederm, david, dave, dave.hansen, chris.hyser, ccross, caoxiaofeng, brauner, surenb, akpm, patches, linux-mm, mm-commits, torvalds, akpm

From: Suren Baghdasaryan <surenb@google.com>
Subject: mm: fix use-after-free when anon vma name is used after vma is freed

When adjacent vmas are being merged, the vma that was originally passed to madvise_update_vma can be destroyed. In the current implementation, the name parameter passed to madvise_update_vma points directly to vma->anon_name, and it is used after the call to vma_merge. When vma_merge merges the original vma and destroys it, this might result in a UAF. For that to happen, the original vma would have to hold the last reference to its anon_vma_name, and the following vma would need to contain a different anon_vma_name object with the same string. Such a scenario is shown below:

madvise_vma_behavior(vma)
  madvise_update_vma(vma, ..., anon_name == vma->anon_name)
    vma_merge(vma)
      __vma_adjust(vma) <-- merges vma with adjacent one
        vm_area_free(vma) <-- frees the original vma
    replace_anon_vma_name(anon_name) <-- UAF of vma->anon_name

Fix this by raising the name's refcount around the call, which keeps the name stable.
Link: https://lkml.kernel.org/r/20220224231834.1481408-3-surenb@google.com Link: https://lkml.kernel.org/r/20220223153613.835563-3-surenb@google.com Fixes: 9a10064f5625 ("mm: add a field to store names for private anonymous memory") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: syzbot+aa7b3d4b35f9dc46a366@syzkaller.appspotmail.com Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexey Gladkov <legion@kernel.org> Cc: Chris Hyser <chris.hyser@oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Colin Cross <ccross@google.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Sasha Levin <sashal@kernel.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xiaofeng Cao <caoxiaofeng@yulong.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/madvise.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) --- a/mm/madvise.c~mm-fix-use-after-free-when-anon-vma-name-is-used-after-vma-is-freed +++ a/mm/madvise.c @@ -131,6 +131,8 @@ static int replace_anon_vma_name(struct /* * Update the vm_flags on region of a vma, splitting it or merging it as * necessary. Must be called with mmap_sem held for writing; + * Caller should ensure anon_name stability by raising its refcount even when + * anon_name belongs to a valid vma because this function might free that vma. 
*/ static int madvise_update_vma(struct vm_area_struct *vma, struct vm_area_struct **prev, unsigned long start, @@ -945,6 +947,7 @@ static int madvise_vma_behavior(struct v unsigned long behavior) { int error; + struct anon_vma_name *anon_name; unsigned long new_flags = vma->vm_flags; switch (behavior) { @@ -1010,8 +1013,11 @@ static int madvise_vma_behavior(struct v break; } + anon_name = anon_vma_name(vma); + anon_vma_name_get(anon_name); error = madvise_update_vma(vma, prev, start, end, new_flags, - anon_vma_name(vma)); + anon_name); + anon_vma_name_put(anon_name); out: /* _ ^ permalink raw reply [flat|nested] 10+ messages in thread
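The get/call/put pattern that this patch adds to madvise_vma_behavior() can be sketched in user space. This is an illustration under stated assumptions rather than the kernel code: struct name_ref, struct region, merge_and_free(), update_region() and update_region_pinned() are hypothetical stand-ins for the anon_vma_name, the vma, the merge path, madvise_update_vma() and its caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for anon_vma_name and vm_area_struct. */
struct name_ref {
    int refs;
    char name[16];
};

struct region {
    struct name_ref *name;
};

static void name_get(struct name_ref *n)
{
    if (n)
        n->refs++;
}

/* Returns 1 if this call freed the object, 0 otherwise. */
static int name_put(struct name_ref *n)
{
    if (!n)
        return 0;
    assert(n->refs > 0);
    if (--n->refs > 0)
        return 0;
    free(n);
    return 1;
}

/* Models a merge that destroys the region and drops its name reference. */
static void merge_and_free(struct region *r)
{
    name_put(r->name);
    free(r);
}

/*
 * Models the callee: the region may be freed part-way through, yet the
 * name is still read afterwards.
 */
static size_t update_region(struct region *r, struct name_ref *n)
{
    merge_and_free(r);
    return n ? strlen(n->name) : 0;  /* safe only if the caller pinned n */
}

/* Caller-side pattern from the patch: get, call, put. */
static size_t update_region_pinned(struct region *r)
{
    struct name_ref *n = r->name;
    size_t len;

    name_get(n);                 /* pin the name across the call */
    len = update_region(r, n);   /* r must not be touched after this */
    name_put(n);                 /* drop the pin; may free the name */
    return len;
}
```

Without the name_get()/name_put() pair, the region in this sketch would hold the only reference, merge_and_free() would free the name, and the strlen() in update_region() would read freed memory. That is the same shape as the UAF described in the changelog.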
* [patch 5/8] memfd: fix F_SEAL_WRITE after shmem huge page allocated 2022-03-05 4:28 incoming Andrew Morton ` (3 preceding siblings ...) 2022-03-05 4:28 ` [patch 4/8] mm: fix use-after-free when anon vma name is used after vma is freed Andrew Morton @ 2022-03-05 4:29 ` Andrew Morton 2022-03-05 4:29 ` [patch 6/8] kselftest/vm: fix tests build with old libc Andrew Morton ` (2 subsequent siblings) 7 siblings, 0 replies; 10+ messages in thread From: Andrew Morton @ 2022-03-05 4:29 UTC (permalink / raw) To: zealci, yang.yang29, willy, wang.yong12, stable, songliubraving, mike.kravetz, kirill, cgel.zte, hughd, akpm, patches, linux-mm, mm-commits, torvalds, akpm

From: Hugh Dickins <hughd@google.com>
Subject: memfd: fix F_SEAL_WRITE after shmem huge page allocated

Wangyong reports: after enabling tmpfs filesystem to support transparent hugepage with the following command:

  echo always > /sys/kernel/mm/transparent_hugepage/shmem_enabled

the docker program tries to add F_SEAL_WRITE through the following command, but it fails unexpectedly with errno EBUSY:

  fcntl(5, F_ADD_SEALS, F_SEAL_WRITE) = -1.

That is because memfd_tag_pins() and memfd_wait_for_pins() were never updated for shmem huge pages: checking page_mapcount() against page_count() is hopeless on THP subpages - they need to check total_mapcount() against page_count() on THP heads only.

Make memfd_tag_pins() (compared > 1) as strict as memfd_wait_for_pins() (compared != 1): either can be justified, but given the non-atomic total_mapcount() calculation, it is better now to be strict. Bear in mind that total_mapcount() itself scans all of the THP subpages, when choosing to take an XA_CHECK_SCHED latency break.

Also fix the unlikely xa_is_value() case in memfd_wait_for_pins(): if a page has been swapped out since memfd_tag_pins(), then its refcount must have fallen, and so it can safely be untagged.
Link: https://lkml.kernel.org/r/a4f79248-df75-2c8c-3df-ba3317ccb5da@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Zeal Robot <zealci@zte.com.cn> Reported-by: wangyong <wang.yong12@zte.com.cn> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: CGEL ZTE <cgel.zte@gmail.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Song Liu <songliubraving@fb.com> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/memfd.c | 40 ++++++++++++++++++++++++++++------------ 1 file changed, 28 insertions(+), 12 deletions(-) --- a/mm/memfd.c~memfd-fix-f_seal_write-after-shmem-huge-page-allocated +++ a/mm/memfd.c @@ -31,20 +31,28 @@ static void memfd_tag_pins(struct xa_state *xas) { struct page *page; - unsigned int tagged = 0; + int latency = 0; + int cache_count; lru_add_drain(); xas_lock_irq(xas); xas_for_each(xas, page, ULONG_MAX) { - if (xa_is_value(page)) - continue; - page = find_subpage(page, xas->xa_index); - if (page_count(page) - page_mapcount(page) > 1) + cache_count = 1; + if (!xa_is_value(page) && + PageTransHuge(page) && !PageHuge(page)) + cache_count = HPAGE_PMD_NR; + + if (!xa_is_value(page) && + page_count(page) - total_mapcount(page) != cache_count) xas_set_mark(xas, MEMFD_TAG_PINNED); + if (cache_count != 1) + xas_set(xas, page->index + cache_count); - if (++tagged % XA_CHECK_SCHED) + latency += cache_count; + if (latency < XA_CHECK_SCHED) continue; + latency = 0; xas_pause(xas); xas_unlock_irq(xas); @@ -73,7 +81,8 @@ static int memfd_wait_for_pins(struct ad error = 0; for (scan = 0; scan <= LAST_SCAN; scan++) { - unsigned int tagged = 0; + int latency = 0; + int cache_count; if (!xas_marked(&xas, MEMFD_TAG_PINNED)) break; @@ -87,10 +96,14 @@ static int memfd_wait_for_pins(struct ad xas_lock_irq(&xas); xas_for_each_marked(&xas, page, ULONG_MAX, MEMFD_TAG_PINNED) { bool clear = true; - if (xa_is_value(page)) - 
continue; - page = find_subpage(page, xas.xa_index); - if (page_count(page) - page_mapcount(page) != 1) { + + cache_count = 1; + if (!xa_is_value(page) && + PageTransHuge(page) && !PageHuge(page)) + cache_count = HPAGE_PMD_NR; + + if (!xa_is_value(page) && cache_count != + page_count(page) - total_mapcount(page)) { /* * On the last scan, we clean up all those tags * we inserted; but make a note that we still @@ -103,8 +116,11 @@ static int memfd_wait_for_pins(struct ad } if (clear) xas_clear_mark(&xas, MEMFD_TAG_PINNED); - if (++tagged % XA_CHECK_SCHED) + + latency += cache_count; + if (latency < XA_CHECK_SCHED) continue; + latency = 0; xas_pause(&xas); xas_unlock_irq(&xas); _ ^ permalink raw reply [flat|nested] 10+ messages in thread
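The pin test that both loops now share reduces to one comparison, sketched below as plain C. This is a simplified model, not the kernel code: struct page_counts, expected_cache_refs() and looks_pinned() are hypothetical names, and SKETCH_HPAGE_PMD_NR assumes 512 subpages per huge page (the x86-64 value with 4 KiB base pages), standing in for the kernel's HPAGE_PMD_NR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value: 512 subpages per PMD-sized huge page (x86-64, 4 KiB pages). */
#define SKETCH_HPAGE_PMD_NR 512

/* Hypothetical snapshot of the counters read for one page-cache entry. */
struct page_counts {
    int page_count;      /* total references on the (head) page */
    int total_mapcount;  /* references that come from page-table mappings */
    bool is_thp_head;    /* true for a transparent huge page head */
};

/* References the page cache itself is expected to hold on this entry. */
static int expected_cache_refs(const struct page_counts *pc)
{
    assert(pc != NULL);
    return pc->is_thp_head ? SKETCH_HPAGE_PMD_NR : 1;
}

/* Any reference beyond mappings and the page cache is treated as a pin. */
static bool looks_pinned(const struct page_counts *pc)
{
    return pc->page_count - pc->total_mapcount != expected_cache_refs(pc);
}
```

In this model an unpinned huge page with no mappings has a page_count of 512. The old check compared the per-page difference against 1, so such a page would be reported as pinned, which matches the EBUSY in the report.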
* [patch 6/8] kselftest/vm: fix tests build with old libc 2022-03-05 4:28 incoming Andrew Morton ` (4 preceding siblings ...) 2022-03-05 4:29 ` [patch 5/8] memfd: fix F_SEAL_WRITE after shmem huge page allocated Andrew Morton @ 2022-03-05 4:29 ` Andrew Morton 2022-03-05 4:29 ` [patch 7/8] proc: fix documentation and description of pagemap Andrew Morton 2022-03-05 4:29 ` [patch 8/8] configs/debug: set CONFIG_DEBUG_INFO=y properly Andrew Morton 7 siblings, 0 replies; 10+ messages in thread From: Andrew Morton @ 2022-03-05 4:29 UTC (permalink / raw) To: skhan, zhouchengming, akpm, patches, linux-mm, mm-commits, torvalds, akpm

From: Chengming Zhou <zhouchengming@bytedance.com>
Subject: kselftest/vm: fix tests build with old libc

Building the vm tests on debian10 (GLIBC 2.28) fails with this error:

userfaultfd.c: In function `userfaultfd_pagemap_test':
userfaultfd.c:1393:37: error: `MADV_PAGEOUT' undeclared (first use in this function); did you mean `MADV_RANDOM'?
  if (madvise(area_dst, test_pgsize, MADV_PAGEOUT))
                                     ^~~~~~~~~~~~
                                     MADV_RANDOM

This patch includes the UAPI header linux/mman.h, which provides these newer definitions. That fixes the test build on systems whose glibc sys/mman.h does not define them.

Link: https://lkml.kernel.org/r/20220227055330.43087-2-zhouchengming@bytedance.com
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 tools/testing/selftests/vm/userfaultfd.c | 1 +
 1 file changed, 1 insertion(+)

--- a/tools/testing/selftests/vm/userfaultfd.c~kselftest-vm-fix-tests-build-with-old-libc
+++ a/tools/testing/selftests/vm/userfaultfd.c
@@ -46,6 +46,7 @@
 #include <signal.h>
 #include <poll.h>
 #include <string.h>
+#include <linux/mman.h>
 #include <sys/mman.h>
 #include <sys/syscall.h>
 #include <sys/ioctl.h>
_
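A small stand-alone program can use the same include order as the patch. The following is a sketch only: pageout_one_page() is a hypothetical helper, and it assumes the installed kernel UAPI headers are recent enough to define MADV_PAGEOUT even when the libc headers are not.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <linux/mman.h>  /* UAPI header: supplies newer MADV_* values */
#include <sys/mman.h>    /* libc header: declares mmap() and madvise() */

/*
 * Hypothetical helper: map one anonymous page, touch it, and ask the
 * kernel to reclaim it. Returns 0 on success or a negative errno value.
 */
static int pageout_one_page(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    char *p;
    int rc = 0;

    if (page_size <= 0)
        return -EINVAL;

    p = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -errno;

    p[0] = 1;  /* touch the page so there is something to reclaim */
    if (madvise(p, (size_t)page_size, MADV_PAGEOUT))
        rc = -errno;  /* e.g. EINVAL on kernels without MADV_PAGEOUT */

    munmap(p, (size_t)page_size);
    return rc;
}
```

Whether the call succeeds depends on the running kernel, so a caller should treat a negative return as "not supported here" rather than as a build problem. The point of the sketch is only that the file compiles against an older glibc once linux/mman.h is included.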
* [patch 7/8] proc: fix documentation and description of pagemap 2022-03-05 4:28 incoming Andrew Morton ` (5 preceding siblings ...) 2022-03-05 4:29 ` [patch 6/8] kselftest/vm: fix tests build with old libc Andrew Morton @ 2022-03-05 4:29 ` Andrew Morton 2022-03-05 4:29 ` [patch 8/8] configs/debug: set CONFIG_DEBUG_INFO=y properly Andrew Morton 7 siblings, 0 replies; 10+ messages in thread From: Andrew Morton @ 2022-03-05 4:29 UTC (permalink / raw) To: tiberiu.georgescu, sj, shy828301, peterx, linmiaohe, ivan.teterevkov, florian.schmidt, david, corbet, ccross, axelrasmussen, apopple, aarcange, yun.zhou, akpm, patches, linux-mm, mm-commits, torvalds, akpm From: Yun Zhou <yun.zhou@windriver.com> Subject: proc: fix documentation and description of pagemap Bit 57 has been exported to report uffd-wp write-protection since commit fb8e37f35a2f, but the documentation and the comment in fs/proc/task_mmu.c still describe bits 57-60 as zero. Fix both to avoid unnecessary confusion. Link: https://lkml.kernel.org/r/20220301044538.3042713-1-yun.zhou@windriver.com Fixes: fb8e37f35a2fe1 ("mm/pagemap: export uffd-wp protection information") Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com> Cc: Florian Schmidt <florian.schmidt@nutanix.com> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Cc: SeongJae Park <sj@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Colin Cross <ccross@google.com> Cc: Alistair Popple <apopple@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- Documentation/admin-guide/mm/pagemap.rst | 2 +- fs/proc/task_mmu.c | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) --- a/Documentation/admin-guide/mm/pagemap.rst~proc-fix-documentation-and-description-of-pagemap +++ a/Documentation/admin-guide/mm/pagemap.rst @@ -23,7 +23,7 @@ There are 
four components to pagemap: * Bit 56 page exclusively mapped (since 4.2) * Bit 57 pte is uffd-wp write-protected (since 5.13) (see :ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`) - * Bits 57-60 zero + * Bits 58-60 zero * Bit 61 page is file-page or shared-anon (since 3.5) * Bit 62 page swapped * Bit 63 page present --- a/fs/proc/task_mmu.c~proc-fix-documentation-and-description-of-pagemap +++ a/fs/proc/task_mmu.c @@ -1597,7 +1597,8 @@ static const struct mm_walk_ops pagemap_ * Bits 5-54 swap offset if swapped * Bit 55 pte is soft-dirty (see Documentation/admin-guide/mm/soft-dirty.rst) * Bit 56 page exclusively mapped - * Bits 57-60 zero + * Bit 57 pte is uffd-wp write-protected + * Bits 58-60 zero * Bit 61 page is file-page or shared-anon * Bit 62 page swapped * Bit 63 page present _ ^ permalink raw reply [flat|nested] 10+ messages in thread
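The corrected bit layout can be turned into a small decoder for one 64-bit pagemap entry. This is a sketch for illustration: the PME_* macros, struct pme_info and pme_decode() are hypothetical names rather than kernel identifiers, and the PFN field in bits 0-54 comes from the same pagemap documentation, not from the hunks above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PME_BIT(n)       (UINT64_C(1) << (n))
#define PME_PFN_MASK     (PME_BIT(55) - 1)   /* bits 0-54: PFN if present */
#define PME_SOFT_DIRTY   PME_BIT(55)
#define PME_EXCLUSIVE    PME_BIT(56)
#define PME_UFFD_WP      PME_BIT(57)         /* the bit this patch documents */
#define PME_RESERVED     (PME_BIT(58) | PME_BIT(59) | PME_BIT(60))
#define PME_FILE_SHARED  PME_BIT(61)
#define PME_SWAPPED      PME_BIT(62)
#define PME_PRESENT      PME_BIT(63)

/* Hypothetical decoded view of one pagemap entry. */
struct pme_info {
    bool present;
    bool swapped;
    bool file_or_shared_anon;
    bool uffd_wp;
    bool exclusive;
    bool soft_dirty;
    bool reserved_clear;  /* bits 58-60 are documented as zero */
    uint64_t pfn;         /* meaningful only when present is true */
};

static struct pme_info pme_decode(uint64_t entry)
{
    struct pme_info info;

    info.present = (entry & PME_PRESENT) != 0;
    info.swapped = (entry & PME_SWAPPED) != 0;
    info.file_or_shared_anon = (entry & PME_FILE_SHARED) != 0;
    info.uffd_wp = (entry & PME_UFFD_WP) != 0;
    info.exclusive = (entry & PME_EXCLUSIVE) != 0;
    info.soft_dirty = (entry & PME_SOFT_DIRTY) != 0;
    info.reserved_clear = (entry & PME_RESERVED) == 0;
    info.pfn = info.present ? (entry & PME_PFN_MASK) : 0;
    return info;
}
```

A reader that treated bits 57-60 as one reserved field, as the old text suggested, would flag every uffd-wp write-protected pte as malformed. Keeping bit 57 separate from bits 58-60 avoids that.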
* [patch 8/8] configs/debug: set CONFIG_DEBUG_INFO=y properly 2022-03-05 4:28 incoming Andrew Morton ` (6 preceding siblings ...) 2022-03-05 4:29 ` [patch 7/8] proc: fix documentation and description of pagemap Andrew Morton @ 2022-03-05 4:29 ` Andrew Morton 7 siblings, 0 replies; 10+ messages in thread From: Andrew Morton @ 2022-03-05 4:29 UTC (permalink / raw) To: quic_qiancai, akpm, patches, linux-mm, mm-commits, torvalds, akpm From: Qian Cai <quic_qiancai@quicinc.com> Subject: configs/debug: set CONFIG_DEBUG_INFO=y properly CONFIG_DEBUG_INFO can't be set by user directly, so set CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y instead. Otherwise, we end up with no debuginfo in vmlinux which is a big no-no for kernel debugging. Link: https://lkml.kernel.org/r/20220301202920.18488-1-quic_qiancai@quicinc.com Signed-off-by: Qian Cai <quic_qiancai@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- kernel/configs/debug.config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/kernel/configs/debug.config~configs-debug-set-config_debug_info=y-properly +++ a/kernel/configs/debug.config @@ -16,7 +16,7 @@ CONFIG_SYMBOLIC_ERRNAME=y # # Compile-time checks and compiler options # -CONFIG_DEBUG_INFO=y +CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y CONFIG_DEBUG_SECTION_MISMATCH=y CONFIG_FRAME_WARN=2048 CONFIG_SECTION_MISMATCH_WARN_ONLY=y _ ^ permalink raw reply [flat|nested] 10+ messages in thread