* incoming
@ 2022-03-05 4:28 Andrew Morton
2022-03-05 4:28 ` [patch 1/8] selftests/vm: cleanup hugetlb file after mremap test Andrew Morton
` (7 more replies)
0 siblings, 8 replies; 10+ messages in thread
From: Andrew Morton @ 2022-03-05 4:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: mm-commits, linux-mm, patches
8 patches, based on 07ebd38a0da24d2534da57b4841346379db9f354.
Subsystems affected by this patch series:
mm/hugetlb
mm/pagemap
memfd
selftests
mm/userfaultfd
kconfig
Subsystem: mm/hugetlb
Mike Kravetz <mike.kravetz@oracle.com>:
selftests/vm: cleanup hugetlb file after mremap test
Subsystem: mm/pagemap
Suren Baghdasaryan <surenb@google.com>:
mm: refactor vm_area_struct::anon_vma_name usage code
mm: prevent vm_area_struct::anon_name refcount saturation
mm: fix use-after-free when anon vma name is used after vma is freed
Subsystem: memfd
Hugh Dickins <hughd@google.com>:
memfd: fix F_SEAL_WRITE after shmem huge page allocated
Subsystem: selftests
Chengming Zhou <zhouchengming@bytedance.com>:
kselftest/vm: fix tests build with old libc
Subsystem: mm/userfaultfd
Yun Zhou <yun.zhou@windriver.com>:
proc: fix documentation and description of pagemap
Subsystem: kconfig
Qian Cai <quic_qiancai@quicinc.com>:
configs/debug: set CONFIG_DEBUG_INFO=y properly
Documentation/admin-guide/mm/pagemap.rst | 2
fs/proc/task_mmu.c | 9 +-
fs/userfaultfd.c | 6 -
include/linux/mm.h | 7 +
include/linux/mm_inline.h | 105 ++++++++++++++++++---------
include/linux/mm_types.h | 5 +
kernel/configs/debug.config | 2
kernel/fork.c | 4 -
kernel/sys.c | 19 +++-
mm/madvise.c | 98 +++++++++----------------
mm/memfd.c | 40 +++++++---
mm/mempolicy.c | 2
mm/mlock.c | 2
mm/mmap.c | 12 +--
mm/mprotect.c | 2
tools/testing/selftests/vm/hugepage-mremap.c | 26 ++++--
tools/testing/selftests/vm/run_vmtests.sh | 3
tools/testing/selftests/vm/userfaultfd.c | 1
18 files changed, 201 insertions(+), 144 deletions(-)
^ permalink raw reply [flat|nested] 10+ messages in thread
* [patch 1/8] selftests/vm: cleanup hugetlb file after mremap test
2022-03-05 4:28 incoming Andrew Morton
@ 2022-03-05 4:28 ` Andrew Morton
2022-03-05 4:28 ` [patch 2/8] mm: refactor vm_area_struct::anon_vma_name usage code Andrew Morton
` (6 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2022-03-05 4:28 UTC (permalink / raw)
To: yosryahmed, songmuchun, skhan, almasrymina, mike.kravetz, akpm,
patches, linux-mm, mm-commits, torvalds, akpm
From: Mike Kravetz <mike.kravetz@oracle.com>
Subject: selftests/vm: cleanup hugetlb file after mremap test
The hugepage-mremap test will create a file in a hugetlb filesystem. In a
default 'run_vmtests' run, the file will contain all the hugetlb pages.
After the test, the file remains and there are no free hugetlb pages for
subsequent tests. This causes those hugetlb tests to fail.
Change hugepage-mremap to take the name of the hugetlb file as an
argument. Unlink the file within the test, and just to be sure remove the
file in the run_vmtests script.
Link: https://lkml.kernel.org/r/20220201033459.156944-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
tools/testing/selftests/vm/hugepage-mremap.c | 26 ++++++++++++-----
tools/testing/selftests/vm/run_vmtests.sh | 3 +
2 files changed, 21 insertions(+), 8 deletions(-)
--- a/tools/testing/selftests/vm/hugepage-mremap.c~selftests-vm-cleanup-hugetlb-file-after-mremap-test
+++ a/tools/testing/selftests/vm/hugepage-mremap.c
@@ -3,9 +3,10 @@
* hugepage-mremap:
*
* Example of remapping huge page memory in a user application using the
- * mremap system call. Code assumes a hugetlbfs filesystem is mounted
- * at './huge'. The amount of memory used by this test is decided by a command
- * line argument in MBs. If missing, the default amount is 10MB.
+ * mremap system call. The path to a file in a hugetlbfs filesystem must
+ * be passed as the last argument to this test. The amount of memory used
+ * by this test in MBs can optionally be passed as an argument. If no memory
+ * amount is passed, the default amount is 10MB.
*
* To make sure the test triggers pmd sharing and goes through the 'unshare'
* path in the mremap code use 1GB (1024) or more.
@@ -25,7 +26,6 @@
#define DEFAULT_LENGTH_MB 10UL
#define MB_TO_BYTES(x) (x * 1024 * 1024)
-#define FILE_NAME "huge/hugepagefile"
#define PROTECTION (PROT_READ | PROT_WRITE | PROT_EXEC)
#define FLAGS (MAP_SHARED | MAP_ANONYMOUS)
@@ -107,17 +107,26 @@ static void register_region_with_uffd(ch
int main(int argc, char *argv[])
{
+ size_t length;
+
+ if (argc != 2 && argc != 3) {
+ printf("Usage: %s [length_in_MB] <hugetlb_file>\n", argv[0]);
+ exit(1);
+ }
+
/* Read memory length as the first arg if valid, otherwise fallback to
- * the default length. Any additional args are ignored.
+ * the default length.
*/
- size_t length = argc > 1 ? (size_t)atoi(argv[1]) : 0UL;
+ if (argc == 3)
+ length = argc > 2 ? (size_t)atoi(argv[1]) : 0UL;
length = length > 0 ? length : DEFAULT_LENGTH_MB;
length = MB_TO_BYTES(length);
int ret = 0;
- int fd = open(FILE_NAME, O_CREAT | O_RDWR, 0755);
+ /* last arg is the hugetlb file name */
+ int fd = open(argv[argc-1], O_CREAT | O_RDWR, 0755);
if (fd < 0) {
perror("Open failed");
@@ -169,5 +178,8 @@ int main(int argc, char *argv[])
munmap(addr, length);
+ close(fd);
+ unlink(argv[argc-1]);
+
return ret;
}
--- a/tools/testing/selftests/vm/run_vmtests.sh~selftests-vm-cleanup-hugetlb-file-after-mremap-test
+++ a/tools/testing/selftests/vm/run_vmtests.sh
@@ -111,13 +111,14 @@ fi
echo "-----------------------"
echo "running hugepage-mremap"
echo "-----------------------"
-./hugepage-mremap 256
+./hugepage-mremap $mnt/huge_mremap
if [ $? -ne 0 ]; then
echo "[FAIL]"
exitcode=1
else
echo "[PASS]"
fi
+rm -f $mnt/huge_mremap
echo "NOTE: The above hugetlb tests provide minimal coverage. Use"
echo " https://github.com/libhugetlbfs/libhugetlbfs.git for"
_
^ permalink raw reply [flat|nested] 10+ messages in thread
* [patch 2/8] mm: refactor vm_area_struct::anon_vma_name usage code
2022-03-05 4:28 incoming Andrew Morton
2022-03-05 4:28 ` [patch 1/8] selftests/vm: cleanup hugetlb file after mremap test Andrew Morton
@ 2022-03-05 4:28 ` Andrew Morton
2022-03-05 4:28 ` [patch 3/8] mm: prevent vm_area_struct::anon_name refcount saturation Andrew Morton
` (5 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2022-03-05 4:28 UTC (permalink / raw)
To: willy, vbabka, sumit.semwal, sashal, pcc, mhocko, legion,
kirill.shutemov, keescook, hannes, gorcunov, ebiederm, david,
dave, dave.hansen, chris.hyser, ccross, caoxiaofeng, brauner,
surenb, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Suren Baghdasaryan <surenb@google.com>
Subject: mm: refactor vm_area_struct::anon_vma_name usage code
Avoid mixing strings and their anon_vma_name referenced pointers by using
struct anon_vma_name whenever possible. This simplifies the code and
allows easier sharing of anon_vma_name structures when they represent the
same name.
[surenb@google.com: fix comment]
Link: https://lkml.kernel.org/r/20220223153613.835563-1-surenb@google.com
Link: https://lkml.kernel.org/r/20220224231834.1481408-1-surenb@google.com
Link: https://lkml.kernel.org/r/20220223153613.835563-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Colin Cross <ccross@google.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/proc/task_mmu.c | 6 +-
fs/userfaultfd.c | 6 +-
include/linux/mm.h | 7 +-
include/linux/mm_inline.h | 87 ++++++++++++++++++++++++------------
include/linux/mm_types.h | 5 +-
kernel/fork.c | 4 -
kernel/sys.c | 19 ++++---
mm/madvise.c | 87 ++++++++++++------------------------
mm/mempolicy.c | 2
mm/mlock.c | 2
mm/mmap.c | 12 ++--
mm/mprotect.c | 2
12 files changed, 125 insertions(+), 114 deletions(-)
--- a/fs/proc/task_mmu.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code
+++ a/fs/proc/task_mmu.c
@@ -309,7 +309,7 @@ show_map_vma(struct seq_file *m, struct
name = arch_vma_name(vma);
if (!name) {
- const char *anon_name;
+ struct anon_vma_name *anon_name;
if (!mm) {
name = "[vdso]";
@@ -327,10 +327,10 @@ show_map_vma(struct seq_file *m, struct
goto done;
}
- anon_name = vma_anon_name(vma);
+ anon_name = anon_vma_name(vma);
if (anon_name) {
seq_pad(m, ' ');
- seq_printf(m, "[anon:%s]", anon_name);
+ seq_printf(m, "[anon:%s]", anon_name->name);
}
}
--- a/fs/userfaultfd.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code
+++ a/fs/userfaultfd.c
@@ -878,7 +878,7 @@ static int userfaultfd_release(struct in
new_flags, vma->anon_vma,
vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
- NULL_VM_UFFD_CTX, vma_anon_name(vma));
+ NULL_VM_UFFD_CTX, anon_vma_name(vma));
if (prev)
vma = prev;
else
@@ -1438,7 +1438,7 @@ static int userfaultfd_register(struct u
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
((struct vm_userfaultfd_ctx){ ctx }),
- vma_anon_name(vma));
+ anon_vma_name(vma));
if (prev) {
vma = prev;
goto next;
@@ -1615,7 +1615,7 @@ static int userfaultfd_unregister(struct
prev = vma_merge(mm, prev, start, vma_end, new_flags,
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
- NULL_VM_UFFD_CTX, vma_anon_name(vma));
+ NULL_VM_UFFD_CTX, anon_vma_name(vma));
if (prev) {
vma = prev;
goto next;
--- a/include/linux/mm.h~mm-refactor-vm_area_struct-anon_vma_name-usage-code
+++ a/include/linux/mm.h
@@ -2626,7 +2626,7 @@ static inline int vma_adjust(struct vm_a
extern struct vm_area_struct *vma_merge(struct mm_struct *,
struct vm_area_struct *prev, unsigned long addr, unsigned long end,
unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
- struct mempolicy *, struct vm_userfaultfd_ctx, const char *);
+ struct mempolicy *, struct vm_userfaultfd_ctx, struct anon_vma_name *);
extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
unsigned long addr, int new_below);
@@ -3372,11 +3372,12 @@ static inline int seal_check_future_writ
#ifdef CONFIG_ANON_VMA_NAME
int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
- unsigned long len_in, const char *name);
+ unsigned long len_in,
+ struct anon_vma_name *anon_name);
#else
static inline int
madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
- unsigned long len_in, const char *name) {
+ unsigned long len_in, struct anon_vma_name *anon_name) {
return 0;
}
#endif
--- a/include/linux/mm_inline.h~mm-refactor-vm_area_struct-anon_vma_name-usage-code
+++ a/include/linux/mm_inline.h
@@ -140,50 +140,81 @@ static __always_inline void del_page_fro
#ifdef CONFIG_ANON_VMA_NAME
/*
- * mmap_lock should be read-locked when calling vma_anon_name() and while using
- * the returned pointer.
+ * mmap_lock should be read-locked when calling anon_vma_name(). Caller should
+ * either keep holding the lock while using the returned pointer or it should
+ * raise anon_vma_name refcount before releasing the lock.
*/
-extern const char *vma_anon_name(struct vm_area_struct *vma);
+extern struct anon_vma_name *anon_vma_name(struct vm_area_struct *vma);
+extern struct anon_vma_name *anon_vma_name_alloc(const char *name);
+extern void anon_vma_name_free(struct kref *kref);
-/*
- * mmap_lock should be read-locked for orig_vma->vm_mm.
- * mmap_lock should be write-locked for new_vma->vm_mm or new_vma should be
- * isolated.
- */
-extern void dup_vma_anon_name(struct vm_area_struct *orig_vma,
- struct vm_area_struct *new_vma);
+/* mmap_lock should be read-locked */
+static inline void anon_vma_name_get(struct anon_vma_name *anon_name)
+{
+ if (anon_name)
+ kref_get(&anon_name->kref);
+}
-/*
- * mmap_lock should be write-locked or vma should have been isolated under
- * write-locked mmap_lock protection.
- */
-extern void free_vma_anon_name(struct vm_area_struct *vma);
+static inline void anon_vma_name_put(struct anon_vma_name *anon_name)
+{
+ if (anon_name)
+ kref_put(&anon_name->kref, anon_vma_name_free);
+}
-/* mmap_lock should be read-locked */
-static inline bool is_same_vma_anon_name(struct vm_area_struct *vma,
- const char *name)
+static inline void dup_anon_vma_name(struct vm_area_struct *orig_vma,
+ struct vm_area_struct *new_vma)
+{
+ struct anon_vma_name *anon_name = anon_vma_name(orig_vma);
+
+ if (anon_name) {
+ anon_vma_name_get(anon_name);
+ new_vma->anon_name = anon_name;
+ }
+}
+
+static inline void free_anon_vma_name(struct vm_area_struct *vma)
{
- const char *vma_name = vma_anon_name(vma);
+ /*
+ * Not using anon_vma_name because it generates a warning if mmap_lock
+ * is not held, which might be the case here.
+ */
+ if (!vma->vm_file)
+ anon_vma_name_put(vma->anon_name);
+}
- /* either both NULL, or pointers to same string */
- if (vma_name == name)
+static inline bool anon_vma_name_eq(struct anon_vma_name *anon_name1,
+ struct anon_vma_name *anon_name2)
+{
+ if (anon_name1 == anon_name2)
return true;
- return name && vma_name && !strcmp(name, vma_name);
+ return anon_name1 && anon_name2 &&
+ !strcmp(anon_name1->name, anon_name2->name);
}
+
#else /* CONFIG_ANON_VMA_NAME */
-static inline const char *vma_anon_name(struct vm_area_struct *vma)
+static inline struct anon_vma_name *anon_vma_name(struct vm_area_struct *vma)
+{
+ return NULL;
+}
+
+static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
{
return NULL;
}
-static inline void dup_vma_anon_name(struct vm_area_struct *orig_vma,
- struct vm_area_struct *new_vma) {}
-static inline void free_vma_anon_name(struct vm_area_struct *vma) {}
-static inline bool is_same_vma_anon_name(struct vm_area_struct *vma,
- const char *name)
+
+static inline void anon_vma_name_get(struct anon_vma_name *anon_name) {}
+static inline void anon_vma_name_put(struct anon_vma_name *anon_name) {}
+static inline void dup_anon_vma_name(struct vm_area_struct *orig_vma,
+ struct vm_area_struct *new_vma) {}
+static inline void free_anon_vma_name(struct vm_area_struct *vma) {}
+
+static inline bool anon_vma_name_eq(struct anon_vma_name *anon_name1,
+ struct anon_vma_name *anon_name2)
{
return true;
}
+
#endif /* CONFIG_ANON_VMA_NAME */
static inline void init_tlb_flush_pending(struct mm_struct *mm)
--- a/include/linux/mm_types.h~mm-refactor-vm_area_struct-anon_vma_name-usage-code
+++ a/include/linux/mm_types.h
@@ -416,7 +416,10 @@ struct vm_area_struct {
struct rb_node rb;
unsigned long rb_subtree_last;
} shared;
- /* Serialized by mmap_sem. */
+ /*
+ * Serialized by mmap_sem. Never use directly because it is
+ * valid only when vm_file is NULL. Use anon_vma_name instead.
+ */
struct anon_vma_name *anon_name;
};
--- a/kernel/fork.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code
+++ a/kernel/fork.c
@@ -366,14 +366,14 @@ struct vm_area_struct *vm_area_dup(struc
*new = data_race(*orig);
INIT_LIST_HEAD(&new->anon_vma_chain);
new->vm_next = new->vm_prev = NULL;
- dup_vma_anon_name(orig, new);
+ dup_anon_vma_name(orig, new);
}
return new;
}
void vm_area_free(struct vm_area_struct *vma)
{
- free_vma_anon_name(vma);
+ free_anon_vma_name(vma);
kmem_cache_free(vm_area_cachep, vma);
}
--- a/kernel/sys.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code
+++ a/kernel/sys.c
@@ -7,6 +7,7 @@
#include <linux/export.h>
#include <linux/mm.h>
+#include <linux/mm_inline.h>
#include <linux/utsname.h>
#include <linux/mman.h>
#include <linux/reboot.h>
@@ -2286,15 +2287,16 @@ static int prctl_set_vma(unsigned long o
{
struct mm_struct *mm = current->mm;
const char __user *uname;
- char *name, *pch;
+ struct anon_vma_name *anon_name = NULL;
int error;
switch (opt) {
case PR_SET_VMA_ANON_NAME:
uname = (const char __user *)arg;
if (uname) {
- name = strndup_user(uname, ANON_VMA_NAME_MAX_LEN);
+ char *name, *pch;
+ name = strndup_user(uname, ANON_VMA_NAME_MAX_LEN);
if (IS_ERR(name))
return PTR_ERR(name);
@@ -2304,15 +2306,18 @@ static int prctl_set_vma(unsigned long o
return -EINVAL;
}
}
- } else {
- /* Reset the name */
- name = NULL;
+ /* anon_vma has its own copy */
+ anon_name = anon_vma_name_alloc(name);
+ kfree(name);
+ if (!anon_name)
+ return -ENOMEM;
+
}
mmap_write_lock(mm);
- error = madvise_set_anon_name(mm, addr, size, name);
+ error = madvise_set_anon_name(mm, addr, size, anon_name);
mmap_write_unlock(mm);
- kfree(name);
+ anon_vma_name_put(anon_name);
break;
default:
error = -EINVAL;
--- a/mm/madvise.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code
+++ a/mm/madvise.c
@@ -65,7 +65,7 @@ static int madvise_need_mmap_write(int b
}
#ifdef CONFIG_ANON_VMA_NAME
-static struct anon_vma_name *anon_vma_name_alloc(const char *name)
+struct anon_vma_name *anon_vma_name_alloc(const char *name)
{
struct anon_vma_name *anon_name;
size_t count;
@@ -81,78 +81,49 @@ static struct anon_vma_name *anon_vma_na
return anon_name;
}
-static void vma_anon_name_free(struct kref *kref)
+void anon_vma_name_free(struct kref *kref)
{
struct anon_vma_name *anon_name =
container_of(kref, struct anon_vma_name, kref);
kfree(anon_name);
}
-static inline bool has_vma_anon_name(struct vm_area_struct *vma)
+struct anon_vma_name *anon_vma_name(struct vm_area_struct *vma)
{
- return !vma->vm_file && vma->anon_name;
-}
-
-const char *vma_anon_name(struct vm_area_struct *vma)
-{
- if (!has_vma_anon_name(vma))
- return NULL;
-
mmap_assert_locked(vma->vm_mm);
- return vma->anon_name->name;
-}
-
-void dup_vma_anon_name(struct vm_area_struct *orig_vma,
- struct vm_area_struct *new_vma)
-{
- if (!has_vma_anon_name(orig_vma))
- return;
-
- kref_get(&orig_vma->anon_name->kref);
- new_vma->anon_name = orig_vma->anon_name;
-}
-
-void free_vma_anon_name(struct vm_area_struct *vma)
-{
- struct anon_vma_name *anon_name;
-
- if (!has_vma_anon_name(vma))
- return;
+ if (vma->vm_file)
+ return NULL;
- anon_name = vma->anon_name;
- vma->anon_name = NULL;
- kref_put(&anon_name->kref, vma_anon_name_free);
+ return vma->anon_name;
}
/* mmap_lock should be write-locked */
-static int replace_vma_anon_name(struct vm_area_struct *vma, const char *name)
+static int replace_anon_vma_name(struct vm_area_struct *vma,
+ struct anon_vma_name *anon_name)
{
- const char *anon_name;
+ struct anon_vma_name *orig_name = anon_vma_name(vma);
- if (!name) {
- free_vma_anon_name(vma);
+ if (!anon_name) {
+ vma->anon_name = NULL;
+ anon_vma_name_put(orig_name);
return 0;
}
- anon_name = vma_anon_name(vma);
- if (anon_name) {
- /* Same name, nothing to do here */
- if (!strcmp(name, anon_name))
- return 0;
+ if (anon_vma_name_eq(orig_name, anon_name))
+ return 0;
- free_vma_anon_name(vma);
- }
- vma->anon_name = anon_vma_name_alloc(name);
- if (!vma->anon_name)
- return -ENOMEM;
+ anon_vma_name_get(anon_name);
+ vma->anon_name = anon_name;
+ anon_vma_name_put(orig_name);
return 0;
}
#else /* CONFIG_ANON_VMA_NAME */
-static int replace_vma_anon_name(struct vm_area_struct *vma, const char *name)
+static int replace_anon_vma_name(struct vm_area_struct *vma,
+ struct anon_vma_name *anon_name)
{
- if (name)
+ if (anon_name)
return -EINVAL;
return 0;
@@ -165,13 +136,13 @@ static int replace_vma_anon_name(struct
static int madvise_update_vma(struct vm_area_struct *vma,
struct vm_area_struct **prev, unsigned long start,
unsigned long end, unsigned long new_flags,
- const char *name)
+ struct anon_vma_name *anon_name)
{
struct mm_struct *mm = vma->vm_mm;
int error;
pgoff_t pgoff;
- if (new_flags == vma->vm_flags && is_same_vma_anon_name(vma, name)) {
+ if (new_flags == vma->vm_flags && anon_vma_name_eq(anon_vma_name(vma), anon_name)) {
*prev = vma;
return 0;
}
@@ -179,7 +150,7 @@ static int madvise_update_vma(struct vm_
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
*prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx, name);
+ vma->vm_userfaultfd_ctx, anon_name);
if (*prev) {
vma = *prev;
goto success;
@@ -209,7 +180,7 @@ success:
*/
vma->vm_flags = new_flags;
if (!vma->vm_file) {
- error = replace_vma_anon_name(vma, name);
+ error = replace_anon_vma_name(vma, anon_name);
if (error)
return error;
}
@@ -1041,7 +1012,7 @@ static int madvise_vma_behavior(struct v
}
error = madvise_update_vma(vma, prev, start, end, new_flags,
- vma_anon_name(vma));
+ anon_vma_name(vma));
out:
/*
@@ -1225,7 +1196,7 @@ int madvise_walk_vmas(struct mm_struct *
static int madvise_vma_anon_name(struct vm_area_struct *vma,
struct vm_area_struct **prev,
unsigned long start, unsigned long end,
- unsigned long name)
+ unsigned long anon_name)
{
int error;
@@ -1234,7 +1205,7 @@ static int madvise_vma_anon_name(struct
return -EBADF;
error = madvise_update_vma(vma, prev, start, end, vma->vm_flags,
- (const char *)name);
+ (struct anon_vma_name *)anon_name);
/*
* madvise() returns EAGAIN if kernel resources, such as
@@ -1246,7 +1217,7 @@ static int madvise_vma_anon_name(struct
}
int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
- unsigned long len_in, const char *name)
+ unsigned long len_in, struct anon_vma_name *anon_name)
{
unsigned long end;
unsigned long len;
@@ -1266,7 +1237,7 @@ int madvise_set_anon_name(struct mm_stru
if (end == start)
return 0;
- return madvise_walk_vmas(mm, start, end, (unsigned long)name,
+ return madvise_walk_vmas(mm, start, end, (unsigned long)anon_name,
madvise_vma_anon_name);
}
#endif /* CONFIG_ANON_VMA_NAME */
--- a/mm/mempolicy.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code
+++ a/mm/mempolicy.c
@@ -814,7 +814,7 @@ static int mbind_range(struct mm_struct
prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags,
vma->anon_vma, vma->vm_file, pgoff,
new_pol, vma->vm_userfaultfd_ctx,
- vma_anon_name(vma));
+ anon_vma_name(vma));
if (prev) {
vma = prev;
next = vma->vm_next;
--- a/mm/mlock.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code
+++ a/mm/mlock.c
@@ -512,7 +512,7 @@ static int mlock_fixup(struct vm_area_st
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx, vma_anon_name(vma));
+ vma->vm_userfaultfd_ctx, anon_vma_name(vma));
if (*prev) {
vma = *prev;
goto success;
--- a/mm/mmap.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code
+++ a/mm/mmap.c
@@ -1031,7 +1031,7 @@ again:
static inline int is_mergeable_vma(struct vm_area_struct *vma,
struct file *file, unsigned long vm_flags,
struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- const char *anon_name)
+ struct anon_vma_name *anon_name)
{
/*
* VM_SOFTDIRTY should not prevent from VMA merging, if we
@@ -1049,7 +1049,7 @@ static inline int is_mergeable_vma(struc
return 0;
if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
return 0;
- if (!is_same_vma_anon_name(vma, anon_name))
+ if (!anon_vma_name_eq(anon_vma_name(vma), anon_name))
return 0;
return 1;
}
@@ -1084,7 +1084,7 @@ can_vma_merge_before(struct vm_area_stru
struct anon_vma *anon_vma, struct file *file,
pgoff_t vm_pgoff,
struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- const char *anon_name)
+ struct anon_vma_name *anon_name)
{
if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
@@ -1106,7 +1106,7 @@ can_vma_merge_after(struct vm_area_struc
struct anon_vma *anon_vma, struct file *file,
pgoff_t vm_pgoff,
struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- const char *anon_name)
+ struct anon_vma_name *anon_name)
{
if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
@@ -1167,7 +1167,7 @@ struct vm_area_struct *vma_merge(struct
struct anon_vma *anon_vma, struct file *file,
pgoff_t pgoff, struct mempolicy *policy,
struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
- const char *anon_name)
+ struct anon_vma_name *anon_name)
{
pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
struct vm_area_struct *area, *next;
@@ -3256,7 +3256,7 @@ struct vm_area_struct *copy_vma(struct v
return NULL; /* should never get here */
new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx, vma_anon_name(vma));
+ vma->vm_userfaultfd_ctx, anon_vma_name(vma));
if (new_vma) {
/*
* Source vma may have been merged into new_vma
--- a/mm/mprotect.c~mm-refactor-vm_area_struct-anon_vma_name-usage-code
+++ a/mm/mprotect.c
@@ -464,7 +464,7 @@ mprotect_fixup(struct vm_area_struct *vm
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
*pprev = vma_merge(mm, *pprev, start, end, newflags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx, vma_anon_name(vma));
+ vma->vm_userfaultfd_ctx, anon_vma_name(vma));
if (*pprev) {
vma = *pprev;
VM_WARN_ON((vma->vm_flags ^ newflags) & ~VM_SOFTDIRTY);
_
^ permalink raw reply [flat|nested] 10+ messages in thread
* [patch 3/8] mm: prevent vm_area_struct::anon_name refcount saturation
2022-03-05 4:28 incoming Andrew Morton
2022-03-05 4:28 ` [patch 1/8] selftests/vm: cleanup hugetlb file after mremap test Andrew Morton
2022-03-05 4:28 ` [patch 2/8] mm: refactor vm_area_struct::anon_vma_name usage code Andrew Morton
@ 2022-03-05 4:28 ` Andrew Morton
2022-03-05 19:03 ` Linus Torvalds
2022-03-05 4:28 ` [patch 4/8] mm: fix use-after-free when anon vma name is used after vma is freed Andrew Morton
` (4 subsequent siblings)
7 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2022-03-05 4:28 UTC (permalink / raw)
To: willy, vbabka, sumit.semwal, sashal, pcc, mhocko, legion,
kirill.shutemov, keescook, hannes, gorcunov, ebiederm, david,
dave, dave.hansen, chris.hyser, ccross, caoxiaofeng, brauner,
surenb, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Suren Baghdasaryan <surenb@google.com>
Subject: mm: prevent vm_area_struct::anon_name refcount saturation
A deep process chain with many vmas could grow really high. With default
sysctl_max_map_count (64k) and default pid_max (32k) the max number of
vmas in the system is 2147450880 and the refcounter has headroom of
1073774592 before it reaches REFCOUNT_SATURATED (3221225472).
Therefore it's unlikely that an anonymous name refcounter will overflow
with these defaults. Currently the max for pid_max is PID_MAX_LIMIT
(4194304) and for sysctl_max_map_count it's INT_MAX (2147483647). In this
configuration anon_vma_name refcount overflow becomes theoretically
possible (that still require heavy sharing of that anon_vma_name between
processes).
kref refcounting interface used in anon_vma_name structure will detect a
counter overflow when it reaches REFCOUNT_SATURATED value but will only
generate a warning and freeze the ref counter. This would lead to the
refcounted object never being freed. A determined attacker could leak
memory like that but it would be rather expensive and inefficient way to
do so.
To ensure anon_vma_name refcount does not overflow, stop anon_vma_name
sharing when the refcount reaches REFCOUNT_MAX (2147483647), which still
leaves INT_MAX/2 (1073741823) values before the counter reaches
REFCOUNT_SATURATED. This should provide enough headroom for raising the
refcounts temporarily.
Link: https://lkml.kernel.org/r/20220223153613.835563-2-surenb@google.com
Link: https://lkml.kernel.org/r/20220223153613.835563-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Colin Cross <ccross@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/mm_inline.h | 18 ++++++++++++++----
mm/madvise.c | 3 +--
2 files changed, 15 insertions(+), 6 deletions(-)
--- a/include/linux/mm_inline.h~mm-prevent-vm_area_struct-anon_name-refcount-saturation
+++ a/include/linux/mm_inline.h
@@ -161,15 +161,25 @@ static inline void anon_vma_name_put(str
kref_put(&anon_name->kref, anon_vma_name_free);
}
+static inline
+struct anon_vma_name *anon_vma_name_reuse(struct anon_vma_name *anon_name)
+{
+ /* Prevent anon_name refcount saturation early on */
+ if (kref_read(&anon_name->kref) < REFCOUNT_MAX) {
+ anon_vma_name_get(anon_name);
+ return anon_name;
+
+ }
+ return anon_vma_name_alloc(anon_name->name);
+}
+
static inline void dup_anon_vma_name(struct vm_area_struct *orig_vma,
struct vm_area_struct *new_vma)
{
struct anon_vma_name *anon_name = anon_vma_name(orig_vma);
- if (anon_name) {
- anon_vma_name_get(anon_name);
- new_vma->anon_name = anon_name;
- }
+ if (anon_name)
+ new_vma->anon_name = anon_vma_name_reuse(anon_name);
}
static inline void free_anon_vma_name(struct vm_area_struct *vma)
--- a/mm/madvise.c~mm-prevent-vm_area_struct-anon_name-refcount-saturation
+++ a/mm/madvise.c
@@ -113,8 +113,7 @@ static int replace_anon_vma_name(struct
if (anon_vma_name_eq(orig_name, anon_name))
return 0;
- anon_vma_name_get(anon_name);
- vma->anon_name = anon_name;
+ vma->anon_name = anon_vma_name_reuse(anon_name);
anon_vma_name_put(orig_name);
return 0;
_
^ permalink raw reply [flat|nested] 10+ messages in thread
* [patch 4/8] mm: fix use-after-free when anon vma name is used after vma is freed
2022-03-05 4:28 incoming Andrew Morton
` (2 preceding siblings ...)
2022-03-05 4:28 ` [patch 3/8] mm: prevent vm_area_struct::anon_name refcount saturation Andrew Morton
@ 2022-03-05 4:28 ` Andrew Morton
2022-03-05 4:29 ` [patch 5/8] memfd: fix F_SEAL_WRITE after shmem huge page allocated Andrew Morton
` (3 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2022-03-05 4:28 UTC (permalink / raw)
To: willy, vbabka, sumit.semwal, sashal, pcc, mhocko, legion,
kirill.shutemov, keescook, hannes, gorcunov, ebiederm, david,
dave, dave.hansen, chris.hyser, ccross, caoxiaofeng, brauner,
surenb, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Suren Baghdasaryan <surenb@google.com>
Subject: mm: fix use-after-free when anon vma name is used after vma is freed
When adjacent vmas are being merged it can result in the vma that was
originally passed to madvise_update_vma being destroyed. In the current
implementation, the name parameter passed to madvise_update_vma points
directly to vma->anon_name and it is used after the call to vma_merge. In
the cases when vma_merge merges the original vma and destroys it, this
might result in UAF. For that the original vma would have to hold the
anon_vma_name with the last reference. The following vma would need to
contain a different anon_vma_name object with the same string. Such
scenario is shown below:
madvise_vma_behavior(vma)
madvise_update_vma(vma, ..., anon_name == vma->anon_name)
vma_merge(vma)
__vma_adjust(vma) <-- merges vma with adjacent one
vm_area_free(vma) <-- frees the original vma
replace_vma_anon_name(anon_name) <-- UAF of vma->anon_name
Fix this by raising the name refcount and stabilizing it.
Link: https://lkml.kernel.org/r/20220224231834.1481408-3-surenb@google.com
Link: https://lkml.kernel.org/r/20220223153613.835563-3-surenb@google.com
Fixes: 9a10064f5625 ("mm: add a field to store names for private anonymous memory")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: syzbot+aa7b3d4b35f9dc46a366@syzkaller.appspotmail.com
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Colin Cross <ccross@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/madvise.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/mm/madvise.c~mm-fix-use-after-free-when-anon-vma-name-is-used-after-vma-is-freed
+++ a/mm/madvise.c
@@ -131,6 +131,8 @@ static int replace_anon_vma_name(struct
/*
* Update the vm_flags on region of a vma, splitting it or merging it as
* necessary. Must be called with mmap_sem held for writing;
+ * Caller should ensure anon_name stability by raising its refcount even when
+ * anon_name belongs to a valid vma because this function might free that vma.
*/
static int madvise_update_vma(struct vm_area_struct *vma,
struct vm_area_struct **prev, unsigned long start,
@@ -945,6 +947,7 @@ static int madvise_vma_behavior(struct v
unsigned long behavior)
{
int error;
+ struct anon_vma_name *anon_name;
unsigned long new_flags = vma->vm_flags;
switch (behavior) {
@@ -1010,8 +1013,11 @@ static int madvise_vma_behavior(struct v
break;
}
+ anon_name = anon_vma_name(vma);
+ anon_vma_name_get(anon_name);
error = madvise_update_vma(vma, prev, start, end, new_flags,
- anon_vma_name(vma));
+ anon_name);
+ anon_vma_name_put(anon_name);
out:
/*
_
^ permalink raw reply [flat|nested] 10+ messages in thread
* [patch 5/8] memfd: fix F_SEAL_WRITE after shmem huge page allocated
2022-03-05 4:28 incoming Andrew Morton
` (3 preceding siblings ...)
2022-03-05 4:28 ` [patch 4/8] mm: fix use-after-free when anon vma name is used after vma is freed Andrew Morton
@ 2022-03-05 4:29 ` Andrew Morton
2022-03-05 4:29 ` [patch 6/8] kselftest/vm: fix tests build with old libc Andrew Morton
` (2 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2022-03-05 4:29 UTC (permalink / raw)
To: zealci, yang.yang29, willy, wang.yong12, stable, songliubraving,
mike.kravetz, kirill, cgel.zte, hughd, akpm, patches, linux-mm,
mm-commits, torvalds, akpm
From: Hugh Dickins <hughd@google.com>
Subject: memfd: fix F_SEAL_WRITE after shmem huge page allocated
Wangyong reports: after enabling tmpfs filesystem to support transparent
hugepage with the following command:
echo always > /sys/kernel/mm/transparent_hugepage/shmem_enabled
the docker program tries to add F_SEAL_WRITE through the following
command, but it fails unexpectedly with errno EBUSY:
fcntl(5, F_ADD_SEALS, F_SEAL_WRITE) = -1.
That is because memfd_tag_pins() and memfd_wait_for_pins() were never
updated for shmem huge pages: checking page_mapcount() against
page_count() is hopeless on THP subpages - they need to check
total_mapcount() against page_count() on THP heads only.
Make memfd_tag_pins() (compared > 1) as strict as memfd_wait_for_pins()
(compared != 1): either can be justified, but given the non-atomic
total_mapcount() calculation, it is better now to be strict. Bear in mind
that total_mapcount() itself scans all of the THP subpages, when choosing
to take an XA_CHECK_SCHED latency break.
Also fix the unlikely xa_is_value() case in memfd_wait_for_pins(): if a
page has been swapped out since memfd_tag_pins(), then its refcount must
have fallen, and so it can safely be untagged.
Link: https://lkml.kernel.org/r/a4f79248-df75-2c8c-3df-ba3317ccb5da@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Reported-by: wangyong <wang.yong12@zte.com.cn>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: CGEL ZTE <cgel.zte@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memfd.c | 40 ++++++++++++++++++++++++++++------------
1 file changed, 28 insertions(+), 12 deletions(-)
--- a/mm/memfd.c~memfd-fix-f_seal_write-after-shmem-huge-page-allocated
+++ a/mm/memfd.c
@@ -31,20 +31,28 @@
static void memfd_tag_pins(struct xa_state *xas)
{
struct page *page;
- unsigned int tagged = 0;
+ int latency = 0;
+ int cache_count;
lru_add_drain();
xas_lock_irq(xas);
xas_for_each(xas, page, ULONG_MAX) {
- if (xa_is_value(page))
- continue;
- page = find_subpage(page, xas->xa_index);
- if (page_count(page) - page_mapcount(page) > 1)
+ cache_count = 1;
+ if (!xa_is_value(page) &&
+ PageTransHuge(page) && !PageHuge(page))
+ cache_count = HPAGE_PMD_NR;
+
+ if (!xa_is_value(page) &&
+ page_count(page) - total_mapcount(page) != cache_count)
xas_set_mark(xas, MEMFD_TAG_PINNED);
+ if (cache_count != 1)
+ xas_set(xas, page->index + cache_count);
- if (++tagged % XA_CHECK_SCHED)
+ latency += cache_count;
+ if (latency < XA_CHECK_SCHED)
continue;
+ latency = 0;
xas_pause(xas);
xas_unlock_irq(xas);
@@ -73,7 +81,8 @@ static int memfd_wait_for_pins(struct ad
error = 0;
for (scan = 0; scan <= LAST_SCAN; scan++) {
- unsigned int tagged = 0;
+ int latency = 0;
+ int cache_count;
if (!xas_marked(&xas, MEMFD_TAG_PINNED))
break;
@@ -87,10 +96,14 @@ static int memfd_wait_for_pins(struct ad
xas_lock_irq(&xas);
xas_for_each_marked(&xas, page, ULONG_MAX, MEMFD_TAG_PINNED) {
bool clear = true;
- if (xa_is_value(page))
- continue;
- page = find_subpage(page, xas.xa_index);
- if (page_count(page) - page_mapcount(page) != 1) {
+
+ cache_count = 1;
+ if (!xa_is_value(page) &&
+ PageTransHuge(page) && !PageHuge(page))
+ cache_count = HPAGE_PMD_NR;
+
+ if (!xa_is_value(page) && cache_count !=
+ page_count(page) - total_mapcount(page)) {
/*
* On the last scan, we clean up all those tags
* we inserted; but make a note that we still
@@ -103,8 +116,11 @@ static int memfd_wait_for_pins(struct ad
}
if (clear)
xas_clear_mark(&xas, MEMFD_TAG_PINNED);
- if (++tagged % XA_CHECK_SCHED)
+
+ latency += cache_count;
+ if (latency < XA_CHECK_SCHED)
continue;
+ latency = 0;
xas_pause(&xas);
xas_unlock_irq(&xas);
_
^ permalink raw reply [flat|nested] 10+ messages in thread
* [patch 6/8] kselftest/vm: fix tests build with old libc
2022-03-05 4:28 incoming Andrew Morton
` (4 preceding siblings ...)
2022-03-05 4:29 ` [patch 5/8] memfd: fix F_SEAL_WRITE after shmem huge page allocated Andrew Morton
@ 2022-03-05 4:29 ` Andrew Morton
2022-03-05 4:29 ` [patch 7/8] proc: fix documentation and description of pagemap Andrew Morton
2022-03-05 4:29 ` [patch 8/8] configs/debug: set CONFIG_DEBUG_INFO=y properly Andrew Morton
7 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2022-03-05 4:29 UTC (permalink / raw)
To: skhan, zhouchengming, akpm, patches, linux-mm, mm-commits,
torvalds, akpm
From: Chengming Zhou <zhouchengming@bytedance.com>
Subject: kselftest/vm: fix tests build with old libc
The error message when I build vm tests on debian10 (GLIBC 2.28):
userfaultfd.c: In function `userfaultfd_pagemap_test':
userfaultfd.c:1393:37: error: `MADV_PAGEOUT' undeclared (first use
in this function); did you mean `MADV_RANDOM'?
if (madvise(area_dst, test_pgsize, MADV_PAGEOUT))
^~~~~~~~~~~~
MADV_RANDOM
This patch includes these newer definitions from UAPI linux/mman.h, is
useful to fix tests build on systems without these definitions in glibc
sys/mman.h.
Link: https://lkml.kernel.org/r/20220227055330.43087-2-zhouchengming@bytedance.com
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
tools/testing/selftests/vm/userfaultfd.c | 1 +
1 file changed, 1 insertion(+)
--- a/tools/testing/selftests/vm/userfaultfd.c~kselftest-vm-fix-tests-build-with-old-libc
+++ a/tools/testing/selftests/vm/userfaultfd.c
@@ -46,6 +46,7 @@
#include <signal.h>
#include <poll.h>
#include <string.h>
+#include <linux/mman.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
_
^ permalink raw reply [flat|nested] 10+ messages in thread
* [patch 7/8] proc: fix documentation and description of pagemap
2022-03-05 4:28 incoming Andrew Morton
` (5 preceding siblings ...)
2022-03-05 4:29 ` [patch 6/8] kselftest/vm: fix tests build with old libc Andrew Morton
@ 2022-03-05 4:29 ` Andrew Morton
2022-03-05 4:29 ` [patch 8/8] configs/debug: set CONFIG_DEBUG_INFO=y properly Andrew Morton
7 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2022-03-05 4:29 UTC (permalink / raw)
To: tiberiu.georgescu, sj, shy828301, peterx, linmiaohe,
ivan.teterevkov, florian.schmidt, david, corbet, ccross,
axelrasmussen, apopple, aarcange, yun.zhou, akpm, patches,
linux-mm, mm-commits, torvalds, akpm
From: Yun Zhou <yun.zhou@windriver.com>
Subject: proc: fix documentation and description of pagemap
Since bit 57 was exported for uffd-wp write-protected(commit
fb8e37f35a2f), fixing it can reduce some unnecessary confusion.
Link: https://lkml.kernel.org/r/20220301044538.3042713-1-yun.zhou@windriver.com
Fixes: fb8e37f35a2fe1 ("mm/pagemap: export uffd-wp protection information")
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com>
Cc: Florian Schmidt <florian.schmidt@nutanix.com>
Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Colin Cross <ccross@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
Documentation/admin-guide/mm/pagemap.rst | 2 +-
fs/proc/task_mmu.c | 3 ++-
2 files changed, 3 insertions(+), 2 deletions(-)
--- a/Documentation/admin-guide/mm/pagemap.rst~proc-fix-documentation-and-description-of-pagemap
+++ a/Documentation/admin-guide/mm/pagemap.rst
@@ -23,7 +23,7 @@ There are four components to pagemap:
* Bit 56 page exclusively mapped (since 4.2)
* Bit 57 pte is uffd-wp write-protected (since 5.13) (see
:ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`)
- * Bits 57-60 zero
+ * Bits 58-60 zero
* Bit 61 page is file-page or shared-anon (since 3.5)
* Bit 62 page swapped
* Bit 63 page present
--- a/fs/proc/task_mmu.c~proc-fix-documentation-and-description-of-pagemap
+++ a/fs/proc/task_mmu.c
@@ -1597,7 +1597,8 @@ static const struct mm_walk_ops pagemap_
* Bits 5-54 swap offset if swapped
* Bit 55 pte is soft-dirty (see Documentation/admin-guide/mm/soft-dirty.rst)
* Bit 56 page exclusively mapped
- * Bits 57-60 zero
+ * Bit 57 pte is uffd-wp write-protected
+ * Bits 58-60 zero
* Bit 61 page is file-page or shared-anon
* Bit 62 page swapped
* Bit 63 page present
_
^ permalink raw reply [flat|nested] 10+ messages in thread
* [patch 8/8] configs/debug: set CONFIG_DEBUG_INFO=y properly
2022-03-05 4:28 incoming Andrew Morton
` (6 preceding siblings ...)
2022-03-05 4:29 ` [patch 7/8] proc: fix documentation and description of pagemap Andrew Morton
@ 2022-03-05 4:29 ` Andrew Morton
7 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2022-03-05 4:29 UTC (permalink / raw)
To: quic_qiancai, akpm, patches, linux-mm, mm-commits, torvalds, akpm
From: Qian Cai <quic_qiancai@quicinc.com>
Subject: configs/debug: set CONFIG_DEBUG_INFO=y properly
CONFIG_DEBUG_INFO can't be set by user directly, so set
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y instead. Otherwise, we end up
with no debuginfo in vmlinux which is a big no-no for kernel debugging.
Link: https://lkml.kernel.org/r/20220301202920.18488-1-quic_qiancai@quicinc.com
Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
kernel/configs/debug.config | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/configs/debug.config~configs-debug-set-config_debug_info=y-properly
+++ a/kernel/configs/debug.config
@@ -16,7 +16,7 @@ CONFIG_SYMBOLIC_ERRNAME=y
#
# Compile-time checks and compiler options
#
-CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_FRAME_WARN=2048
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
_
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch 3/8] mm: prevent vm_area_struct::anon_name refcount saturation
2022-03-05 4:28 ` [patch 3/8] mm: prevent vm_area_struct::anon_name refcount saturation Andrew Morton
@ 2022-03-05 19:03 ` Linus Torvalds
0 siblings, 0 replies; 10+ messages in thread
From: Linus Torvalds @ 2022-03-05 19:03 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox, Vlastimil Babka, sumit.semwal, Sasha Levin,
Peter Collingbourne, Michal Hocko, Alexey Gladkov,
Kirill A . Shutemov, Kees Cook, Johannes Weiner, Cyrill Gorcunov,
Eric W. Biederman, David Hildenbrand, Davidlohr Bueso,
Dave Hansen, chris.hyser, Colin Cross, caoxiaofeng,
Christian Brauner, Suren Baghdasaryan, patches, Linux-MM,
mm-commits
On Fri, Mar 4, 2022 at 8:28 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> To ensure anon_vma_name refcount does not overflow, stop anon_vma_name
> sharing when the refcount reaches REFCOUNT_MAX (2147483647), which still
> leaves INT_MAX/2 (1073741823) values before the counter reaches
> REFCOUNT_SATURATED. This should provide enough headroom for raising the
> refcounts temporarily.
This is a classic case of kref simply being the wrong type for this.
We sh ould move away from that idiotic "saturate with a warning" type,
and just codify that what the page refs do is the RightThing(tm) to
do.
I've ranted against kref for years, I hate that damn thing. It's
literally broken by design with the known leaking behavior.
Oh well. I'm taking this patch as a "fix up refcount problems", and I
guess I need to some day just extract the page_ref code into a nice
type of its own so that it's usable outside of pages.
(Others have copied the page_ref code manually, but there's no "helper
type with functions to use it", which is why people then use that
mis-designed refcount stuff).
Linus
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2022-03-05 19:04 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-05 4:28 incoming Andrew Morton
2022-03-05 4:28 ` [patch 1/8] selftests/vm: cleanup hugetlb file after mremap test Andrew Morton
2022-03-05 4:28 ` [patch 2/8] mm: refactor vm_area_struct::anon_vma_name usage code Andrew Morton
2022-03-05 4:28 ` [patch 3/8] mm: prevent vm_area_struct::anon_name refcount saturation Andrew Morton
2022-03-05 19:03 ` Linus Torvalds
2022-03-05 4:28 ` [patch 4/8] mm: fix use-after-free when anon vma name is used after vma is freed Andrew Morton
2022-03-05 4:29 ` [patch 5/8] memfd: fix F_SEAL_WRITE after shmem huge page allocated Andrew Morton
2022-03-05 4:29 ` [patch 6/8] kselftest/vm: fix tests build with old libc Andrew Morton
2022-03-05 4:29 ` [patch 7/8] proc: fix documentation and description of pagemap Andrew Morton
2022-03-05 4:29 ` [patch 8/8] configs/debug: set CONFIG_DEBUG_INFO=y properly Andrew Morton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).