* [PATCH v2 0/4] hugetlb: Disable huge pmd unshare for uffd-wp @ 2021-02-17 20:44 Peter Xu 2021-02-17 20:44 ` [PATCH v2 1/4] hugetlb: Pass vma into huge_pte_alloc() and huge_pmd_share() Peter Xu ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Peter Xu @ 2021-02-17 20:44 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: peterx, Axel Rasmussen, Mike Kravetz, Mike Rapoport, Matthew Wilcox, Andrea Arcangeli, Andrew Morton, Kirill A . Shutemov v2: - patch 4: move hugetlb_unshare_all_pmds() into mm/hugetlb.c, so it can be used even outside userfaultfd.c This series tries to disable huge pmd sharing of hugetlbfs-backed memory for uffd-wp. Although uffd-wp for hugetlbfs is still in the rfc stage, the idea of this series may be needed for multiple tasks (Axel's uffd minor fault series, and Mike's soft dirty series), so I picked it out from the larger series. Referenced works: Uffd shmem+hugetlbfs rfc: https://lore.kernel.org/lkml/20210115170907.24498-1-peterx@redhat.com/ Uffd minor mode for hugetlbfs: https://lore.kernel.org/lkml/20210212215403.3457686-1-axelrasmussen@google.com/ Soft dirty for hugetlbfs: https://lore.kernel.org/lkml/20210211000322.159437-1-mike.kravetz@oracle.com/ Please review, thanks. 
Peter Xu (4): hugetlb: Pass vma into huge_pte_alloc() and huge_pmd_share() hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp arch/arm64/mm/hugetlbpage.c | 7 ++- arch/ia64/mm/hugetlbpage.c | 3 +- arch/mips/mm/hugetlbpage.c | 4 +- arch/parisc/mm/hugetlbpage.c | 2 +- arch/powerpc/mm/hugetlbpage.c | 3 +- arch/s390/mm/hugetlbpage.c | 2 +- arch/sh/mm/hugetlbpage.c | 2 +- arch/sparc/mm/hugetlbpage.c | 1 + fs/userfaultfd.c | 4 ++ include/linux/hugetlb.h | 16 +++++- include/linux/userfaultfd_k.h | 9 ++++ mm/hugetlb.c | 94 +++++++++++++++++++++++++++-------- mm/userfaultfd.c | 2 +- 13 files changed, 114 insertions(+), 35 deletions(-) -- 2.26.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 1/4] hugetlb: Pass vma into huge_pte_alloc() and huge_pmd_share() 2021-02-17 20:44 [PATCH v2 0/4] hugetlb: Disable huge pmd unshare for uffd-wp Peter Xu @ 2021-02-17 20:44 ` Peter Xu 2021-02-17 20:46 ` [PATCH v2 2/4] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled Peter Xu 2021-02-18 18:54 ` [PATCH v2 0/4] hugetlb: Disable huge pmd unshare for uffd-wp Axel Rasmussen 2 siblings, 0 replies; 14+ messages in thread From: Peter Xu @ 2021-02-17 20:44 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: peterx, Axel Rasmussen, Mike Kravetz, Mike Rapoport, Matthew Wilcox, Andrea Arcangeli, Andrew Morton, Kirill A . Shutemov It is a preparation work to be able to behave differently in the per architecture huge_pte_alloc() according to different VMA attributes. Pass it deeper into huge_pmd_share() so that we can avoid the find_vma() call. Suggested-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Peter Xu <peterx@redhat.com> --- arch/arm64/mm/hugetlbpage.c | 4 ++-- arch/ia64/mm/hugetlbpage.c | 3 ++- arch/mips/mm/hugetlbpage.c | 4 ++-- arch/parisc/mm/hugetlbpage.c | 2 +- arch/powerpc/mm/hugetlbpage.c | 3 ++- arch/s390/mm/hugetlbpage.c | 2 +- arch/sh/mm/hugetlbpage.c | 2 +- arch/sparc/mm/hugetlbpage.c | 1 + include/linux/hugetlb.h | 5 +++-- mm/hugetlb.c | 15 ++++++++------- mm/userfaultfd.c | 2 +- 11 files changed, 24 insertions(+), 19 deletions(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 55ecf6de9ff7..6e3bcffe2837 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -252,7 +252,7 @@ void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr, set_pte(ptep, pte); } -pte_t *huge_pte_alloc(struct mm_struct *mm, +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz) { pgd_t *pgdp; @@ -286,7 +286,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, } else if (sz == PMD_SIZE) { 
if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) && pud_none(READ_ONCE(*pudp))) - ptep = huge_pmd_share(mm, addr, pudp); + ptep = huge_pmd_share(mm, vma, addr, pudp); else ptep = (pte_t *)pmd_alloc(mm, pudp, addr); } else if (sz == (CONT_PMD_SIZE)) { diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c index b331f94d20ac..f993cb36c062 100644 --- a/arch/ia64/mm/hugetlbpage.c +++ b/arch/ia64/mm/hugetlbpage.c @@ -25,7 +25,8 @@ unsigned int hpage_shift = HPAGE_SHIFT_DEFAULT; EXPORT_SYMBOL(hpage_shift); pte_t * -huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz) +huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long addr, unsigned long sz) { unsigned long taddr = htlbpage_to_page(addr); pgd_t *pgd; diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c index b9f76f433617..7eaff5b07873 100644 --- a/arch/mips/mm/hugetlbpage.c +++ b/arch/mips/mm/hugetlbpage.c @@ -21,8 +21,8 @@ #include <asm/tlb.h> #include <asm/tlbflush.h> -pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, - unsigned long sz) +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long addr, unsigned long sz) { pgd_t *pgd; p4d_t *p4d; diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c index d7ba014a7fbb..e141441bfa64 100644 --- a/arch/parisc/mm/hugetlbpage.c +++ b/arch/parisc/mm/hugetlbpage.c @@ -44,7 +44,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr, } -pte_t *huge_pte_alloc(struct mm_struct *mm, +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz) { pgd_t *pgd; diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index 8b3cc4d688e8..d57276b8791c 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -106,7 +106,8 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp, * At this point we do the placement change only for 
BOOK3S 64. This would * possibly work on other subarchs. */ -pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz) +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long addr, unsigned long sz) { pgd_t *pg; p4d_t *p4; diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c index 3b5a4d25ca9b..da36d13ffc16 100644 --- a/arch/s390/mm/hugetlbpage.c +++ b/arch/s390/mm/hugetlbpage.c @@ -189,7 +189,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, return pte; } -pte_t *huge_pte_alloc(struct mm_struct *mm, +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz) { pgd_t *pgdp; diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c index 220d7bc43d2b..999ab5916e69 100644 --- a/arch/sh/mm/hugetlbpage.c +++ b/arch/sh/mm/hugetlbpage.c @@ -21,7 +21,7 @@ #include <asm/tlbflush.h> #include <asm/cacheflush.h> -pte_t *huge_pte_alloc(struct mm_struct *mm, +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz) { pgd_t *pgd; diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c index ad4b42f04988..97e0824fdbe7 100644 --- a/arch/sparc/mm/hugetlbpage.c +++ b/arch/sparc/mm/hugetlbpage.c @@ -280,6 +280,7 @@ unsigned long pmd_leaf_size(pmd_t pmd) { return 1UL << tte_to_shift(*(pte_t *)&p unsigned long pte_leaf_size(pte_t pte) { return 1UL << tte_to_shift(pte); } pte_t *huge_pte_alloc(struct mm_struct *mm, +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz) { pgd_t *pgd; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index b5807f23caf8..a6113fa6d21d 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -152,7 +152,8 @@ void hugetlb_fix_reserve_counts(struct inode *inode); extern struct mutex *hugetlb_fault_mutex_table; u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t 
idx); -pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud); +pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long addr, pud_t *pud); struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage); @@ -161,7 +162,7 @@ extern struct list_head huge_boot_pages; /* arch callbacks */ -pte_t *huge_pte_alloc(struct mm_struct *mm, +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz); pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, unsigned long sz); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 4bdb58ab14cb..07bb9bdc3282 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3807,7 +3807,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, src_pte = huge_pte_offset(src, addr, sz); if (!src_pte) continue; - dst_pte = huge_pte_alloc(dst, addr, sz); + dst_pte = huge_pte_alloc(dst, vma, addr, sz); if (!dst_pte) { ret = -ENOMEM; break; @@ -4544,7 +4544,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, */ mapping = vma->vm_file->f_mapping; i_mmap_lock_read(mapping); - ptep = huge_pte_alloc(mm, haddr, huge_page_size(h)); + ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h)); if (!ptep) { i_mmap_unlock_read(mapping); return VM_FAULT_OOM; @@ -5334,9 +5334,9 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, * if !vma_shareable check at the beginning of the routine. i_mmap_rwsem is * only required for subsequent processing. 
*/ -pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud) +pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long addr, pud_t *pud) { - struct vm_area_struct *vma = find_vma(mm, addr); struct address_space *mapping = vma->vm_file->f_mapping; pgoff_t idx = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff; @@ -5414,7 +5414,8 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma, } #define want_pmd_share() (1) #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */ -pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud) +pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long addr, pud_t *pud) { return NULL; } @@ -5433,7 +5434,7 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */ #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB -pte_t *huge_pte_alloc(struct mm_struct *mm, +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz) { pgd_t *pgd; @@ -5452,7 +5453,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, } else { BUG_ON(sz != PMD_SIZE); if (want_pmd_share() && pud_none(*pud)) - pte = huge_pmd_share(mm, addr, pud); + pte = huge_pmd_share(mm, vma, addr, pud); else pte = (pte_t *)pmd_alloc(mm, pud, addr); } diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 9a3d451402d7..063cbb17e8d8 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -290,7 +290,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, mutex_lock(&hugetlb_fault_mutex_table[hash]); err = -ENOMEM; - dst_pte = huge_pte_alloc(dst_mm, dst_addr, vma_hpagesize); + dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize); if (!dst_pte) { mutex_unlock(&hugetlb_fault_mutex_table[hash]); i_mmap_unlock_read(mapping); -- 2.26.2 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 2/4] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled 2021-02-17 20:44 [PATCH v2 0/4] hugetlb: Disable huge pmd unshare for uffd-wp Peter Xu 2021-02-17 20:44 ` [PATCH v2 1/4] hugetlb: Pass vma into huge_pte_alloc() and huge_pmd_share() Peter Xu @ 2021-02-17 20:46 ` Peter Xu 2021-02-17 20:46 ` [PATCH v2 3/4] mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h Peter Xu ` (2 more replies) 2021-02-18 18:54 ` [PATCH v2 0/4] hugetlb: Disable huge pmd unshare for uffd-wp Axel Rasmussen 2 siblings, 3 replies; 14+ messages in thread From: Peter Xu @ 2021-02-17 20:46 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: Mike Kravetz, peterx, Mike Rapoport, Andrea Arcangeli, Axel Rasmussen, Matthew Wilcox, Kirill A . Shutemov, Andrew Morton Huge pmd sharing can cause problems for userfaultfd. Userfaultfd runs its logic based on special bits in page table entries, but huge pmd sharing can end up sharing page table entries across different address ranges. That can cause issues either way: - When sharing huge pmd page tables for an uffd write-protected range, the newly mapped huge pmd range will also be write protected unexpectedly, or, - When we try to write protect a huge pmd shared range, we first do huge_pmd_unshare() in hugetlb_change_protection(); however, that also means the UFFDIO_WRITEPROTECT could be silently skipped for the shared region, which could lead to data loss. While at it, a few other things are done altogether: - Move want_pmd_share() from mm/hugetlb.c into linux/hugetlb.h, because that's definitely something that arch code would like to use too - ARM64 currently checks directly against CONFIG_ARCH_WANT_HUGE_PMD_SHARE when trying to share huge pmds. Switch to the want_pmd_share() helper. While at it, move vma_shareable() from huge_pmd_share() into want_pmd_share(). 
Signed-off-by: Peter Xu <peterx@redhat.com> --- arch/arm64/mm/hugetlbpage.c | 3 +-- include/linux/hugetlb.h | 2 ++ include/linux/userfaultfd_k.h | 9 +++++++++ mm/hugetlb.c | 20 ++++++++++++++------ 4 files changed, 26 insertions(+), 8 deletions(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 6e3bcffe2837..58987a98e179 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -284,8 +284,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, */ ptep = pte_alloc_map(mm, pmdp, addr); } else if (sz == PMD_SIZE) { - if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) && - pud_none(READ_ONCE(*pudp))) + if (want_pmd_share(vma, addr) && pud_none(READ_ONCE(*pudp))) ptep = huge_pmd_share(mm, vma, addr, pudp); else ptep = (pte_t *)pmd_alloc(mm, pudp, addr); diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index a6113fa6d21d..bc86f2f516e7 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -950,4 +950,6 @@ static inline __init void hugetlb_cma_check(void) } #endif +bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr); + #endif /* _LINUX_HUGETLB_H */ diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index a8e5f3ea9bb2..c63ccdae3eab 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -52,6 +52,15 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, return vma->vm_userfaultfd_ctx.ctx == vm_ctx.ctx; } +/* + * Never enable huge pmd sharing on uffd-wp registered vmas, because uffd-wp + * protect information is per pgtable entry. 
+ */ +static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma) +{ + return vma->vm_flags & VM_UFFD_WP; +} + static inline bool userfaultfd_missing(struct vm_area_struct *vma) { return vma->vm_flags & VM_UFFD_MISSING; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 07bb9bdc3282..8e8e2f3dfe06 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5292,6 +5292,18 @@ static bool vma_shareable(struct vm_area_struct *vma, unsigned long addr) return false; } +bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr) +{ +#ifndef CONFIG_ARCH_WANT_HUGE_PMD_SHARE + return false; +#endif +#ifdef CONFIG_USERFAULTFD + if (uffd_disable_huge_pmd_share(vma)) + return false; +#endif + return vma_shareable(vma, addr); +} + /* * Determine if start,end range within vma could be mapped by shared pmd. * If yes, adjust start and end to cover range associated with possible @@ -5346,9 +5358,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, pte_t *pte; spinlock_t *ptl; - if (!vma_shareable(vma, addr)) - return (pte_t *)pmd_alloc(mm, pud, addr); - i_mmap_assert_locked(mapping); vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) { if (svma == vma) @@ -5412,7 +5421,7 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma, *addr = ALIGN(*addr, HPAGE_SIZE * PTRS_PER_PTE) - HPAGE_SIZE; return 1; } -#define want_pmd_share() (1) + #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, pud_t *pud) @@ -5430,7 +5439,6 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, unsigned long *start, unsigned long *end) { } -#define want_pmd_share() (0) #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */ #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB @@ -5452,7 +5460,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, pte = (pte_t *)pud; } else { BUG_ON(sz != PMD_SIZE); - if (want_pmd_share() && pud_none(*pud)) + if 
(want_pmd_share(vma, addr) && pud_none(*pud)) pte = huge_pmd_share(mm, vma, addr, pud); else pte = (pte_t *)pmd_alloc(mm, pud, addr); -- 2.26.2 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 3/4] mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h 2021-02-17 20:46 ` [PATCH v2 2/4] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled Peter Xu @ 2021-02-17 20:46 ` Peter Xu 2021-02-17 20:46 ` [PATCH v2 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp Peter Xu 2021-02-18 1:34 ` [PATCH v2 2/4] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled Mike Kravetz 2 siblings, 0 replies; 14+ messages in thread From: Peter Xu @ 2021-02-17 20:46 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: Mike Kravetz, peterx, Mike Rapoport, Andrea Arcangeli, Axel Rasmussen, Matthew Wilcox, Kirill A . Shutemov, Andrew Morton Prepare for it to be called outside of mm/hugetlb.c. Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Peter Xu <peterx@redhat.com> --- include/linux/hugetlb.h | 8 ++++++++ mm/hugetlb.c | 8 -------- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index bc86f2f516e7..3b4104021dd3 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -952,4 +952,12 @@ static inline __init void hugetlb_cma_check(void) bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr); +#ifndef __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE +/* + * ARCHes with special requirements for evicting HUGETLB backing TLB entries can + * implement this. + */ +#define flush_hugetlb_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) +#endif + #endif /* _LINUX_HUGETLB_H */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8e8e2f3dfe06..f53a0b852ed8 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4965,14 +4965,6 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, return i ? i : err; } -#ifndef __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE -/* - * ARCHes with special requirements for evicting HUGETLB backing TLB entries can - * implement this. 
- */ -#define flush_hugetlb_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) -#endif - unsigned long hugetlb_change_protection(struct vm_area_struct *vma, unsigned long address, unsigned long end, pgprot_t newprot) { -- 2.26.2 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp 2021-02-17 20:46 ` [PATCH v2 2/4] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled Peter Xu 2021-02-17 20:46 ` [PATCH v2 3/4] mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h Peter Xu @ 2021-02-17 20:46 ` Peter Xu 2021-02-18 1:46 ` Mike Kravetz 2021-02-18 18:32 ` Axel Rasmussen 1 sibling, 2 replies; 14+ messages in thread From: Peter Xu @ 2021-02-17 20:46 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: Mike Kravetz, peterx, Mike Rapoport, Andrea Arcangeli, Axel Rasmussen, Matthew Wilcox, Kirill A . Shutemov, Andrew Morton Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because userfaultfd-wp is always based on pgtable entries, so they cannot be shared. Walk the hugetlb range and unshare any such mappings, right before UFFDIO_REGISTER succeeds and returns to userspace. This will pair with want_pmd_share() in hugetlb code so that huge pmd sharing is completely disabled for the userfaultfd-wp registered range. 
Signed-off-by: Peter Xu <peterx@redhat.com> --- fs/userfaultfd.c | 4 ++++ include/linux/hugetlb.h | 1 + mm/hugetlb.c | 51 +++++++++++++++++++++++++++++++++++++++++ 3 files changed, 56 insertions(+) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 894cc28142e7..e259318fcae1 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -15,6 +15,7 @@ #include <linux/sched/signal.h> #include <linux/sched/mm.h> #include <linux/mm.h> +#include <linux/mmu_notifier.h> #include <linux/poll.h> #include <linux/slab.h> #include <linux/seq_file.h> @@ -1448,6 +1449,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, vma->vm_flags = new_flags; vma->vm_userfaultfd_ctx.ctx = ctx; + if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) + hugetlb_unshare_all_pmds(vma); + skip: prev = vma; start = vma->vm_end; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 3b4104021dd3..97ecfd4c20b2 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -188,6 +188,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, unsigned long address, unsigned long end, pgprot_t newprot); bool is_hugetlb_entry_migration(pte_t pte); +void hugetlb_unshare_all_pmds(struct vm_area_struct *vma); #else /* !CONFIG_HUGETLB_PAGE */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index f53a0b852ed8..83c006ea3ff9 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5723,4 +5723,55 @@ void __init hugetlb_cma_check(void) pr_warn("hugetlb_cma: the option isn't supported by current arch\n"); } +/* + * This function will unconditionally remove all the shared pmd pgtable entries + * within the specific vma for a hugetlbfs memory range. 
+ */ +void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) +{ + struct hstate *h = hstate_vma(vma); + unsigned long sz = huge_page_size(h); + struct mm_struct *mm = vma->vm_mm; + struct mmu_notifier_range range; + unsigned long address, start, end; + spinlock_t *ptl; + pte_t *ptep; + + if (!(vma->vm_flags & VM_MAYSHARE)) + return; + + start = ALIGN(vma->vm_start, PUD_SIZE); + end = ALIGN_DOWN(vma->vm_end, PUD_SIZE); + + if (start >= end) + return; + + /* + * No need to call adjust_range_if_pmd_sharing_possible(), because + * we're going to operate on the whole vma + */ + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, + vma->vm_start, vma->vm_end); + mmu_notifier_invalidate_range_start(&range); + i_mmap_lock_write(vma->vm_file->f_mapping); + for (address = start; address < end; address += PUD_SIZE) { + unsigned long tmp = address; + + ptep = huge_pte_offset(mm, address, sz); + if (!ptep) + continue; + ptl = huge_pte_lock(h, mm, ptep); + /* We don't want 'address' to be changed */ + huge_pmd_unshare(mm, vma, &tmp, ptep); + spin_unlock(ptl); + } + flush_hugetlb_tlb_range(vma, vma->vm_start, vma->vm_end); + i_mmap_unlock_write(vma->vm_file->f_mapping); + /* + * No need to call mmu_notifier_invalidate_range(), see + * Documentation/vm/mmu_notifier.rst. + */ + mmu_notifier_invalidate_range_end(&range); +} + #endif /* CONFIG_CMA */ -- 2.26.2 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp 2021-02-17 20:46 ` [PATCH v2 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp Peter Xu @ 2021-02-18 1:46 ` Mike Kravetz 2021-02-18 17:55 ` Peter Xu 2021-02-18 18:32 ` Axel Rasmussen 1 sibling, 1 reply; 14+ messages in thread From: Mike Kravetz @ 2021-02-18 1:46 UTC (permalink / raw) To: Peter Xu, linux-mm, linux-kernel Cc: Mike Rapoport, Andrea Arcangeli, Axel Rasmussen, Matthew Wilcox, Kirill A . Shutemov, Andrew Morton On 2/17/21 12:46 PM, Peter Xu wrote: > Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because > userfaultfd-wp is always based on pgtable entries, so they cannot be shared. > > Walk the hugetlb range and unshare all such mappings if there is, right before > UFFDIO_REGISTER will succeed and return to userspace. > > This will pair with want_pmd_share() in hugetlb code so that huge pmd sharing > is completely disabled for userfaultfd-wp registered range. 
> > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > fs/userfaultfd.c | 4 ++++ > include/linux/hugetlb.h | 1 + > mm/hugetlb.c | 51 +++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 56 insertions(+) > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > index 894cc28142e7..e259318fcae1 100644 > --- a/fs/userfaultfd.c > +++ b/fs/userfaultfd.c > @@ -15,6 +15,7 @@ > #include <linux/sched/signal.h> > #include <linux/sched/mm.h> > #include <linux/mm.h> > +#include <linux/mmu_notifier.h> > #include <linux/poll.h> > #include <linux/slab.h> > #include <linux/seq_file.h> > @@ -1448,6 +1449,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, > vma->vm_flags = new_flags; > vma->vm_userfaultfd_ctx.ctx = ctx; > > + if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) > + hugetlb_unshare_all_pmds(vma); > + > skip: > prev = vma; > start = vma->vm_end; > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index 3b4104021dd3..97ecfd4c20b2 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -188,6 +188,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, > unsigned long address, unsigned long end, pgprot_t newprot); > > bool is_hugetlb_entry_migration(pte_t pte); > +void hugetlb_unshare_all_pmds(struct vm_area_struct *vma); > > #else /* !CONFIG_HUGETLB_PAGE */ > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index f53a0b852ed8..83c006ea3ff9 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -5723,4 +5723,55 @@ void __init hugetlb_cma_check(void) > pr_warn("hugetlb_cma: the option isn't supported by current arch\n"); > } > > +/* > + * This function will unconditionally remove all the shared pmd pgtable entries > + * within the specific vma for a hugetlbfs memory range. > + */ Thanks for updating this! 
> +void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) > +{ > + struct hstate *h = hstate_vma(vma); > + unsigned long sz = huge_page_size(h); > + struct mm_struct *mm = vma->vm_mm; > + struct mmu_notifier_range range; > + unsigned long address, start, end; > + spinlock_t *ptl; > + pte_t *ptep; > + > + if (!(vma->vm_flags & VM_MAYSHARE)) > + return; > + > + start = ALIGN(vma->vm_start, PUD_SIZE); > + end = ALIGN_DOWN(vma->vm_end, PUD_SIZE); > + > + if (start >= end) > + return; > + > + /* > + * No need to call adjust_range_if_pmd_sharing_possible(), because > + * we're going to operate on the whole vma not necessary, but perhaps change to: * we're going to operate on ever PUD_SIZE aligned sized range * within the vma. > + * we're going to operate on the whole vma > + */ > + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, > + vma->vm_start, vma->vm_end); Should we use start, end here instead of vma->vm_start, vma->vm_end ? > + mmu_notifier_invalidate_range_start(&range); > + i_mmap_lock_write(vma->vm_file->f_mapping); > + for (address = start; address < end; address += PUD_SIZE) { > + unsigned long tmp = address; > + > + ptep = huge_pte_offset(mm, address, sz); > + if (!ptep) > + continue; > + ptl = huge_pte_lock(h, mm, ptep); > + /* We don't want 'address' to be changed */ > + huge_pmd_unshare(mm, vma, &tmp, ptep); > + spin_unlock(ptl); > + } > + flush_hugetlb_tlb_range(vma, vma->vm_start, vma->vm_end); start, end ? -- Mike Kravetz > + i_mmap_unlock_write(vma->vm_file->f_mapping); > + /* > + * No need to call mmu_notifier_invalidate_range(), see > + * Documentation/vm/mmu_notifier.rst. > + */ > + mmu_notifier_invalidate_range_end(&range); > +} > + > #endif /* CONFIG_CMA */ > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp 2021-02-18 1:46 ` Mike Kravetz @ 2021-02-18 17:55 ` Peter Xu 0 siblings, 0 replies; 14+ messages in thread From: Peter Xu @ 2021-02-18 17:55 UTC (permalink / raw) To: Mike Kravetz Cc: linux-mm, linux-kernel, Mike Rapoport, Andrea Arcangeli, Axel Rasmussen, Matthew Wilcox, Kirill A . Shutemov, Andrew Morton On Wed, Feb 17, 2021 at 05:46:30PM -0800, Mike Kravetz wrote: > On 2/17/21 12:46 PM, Peter Xu wrote: > > Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because > > userfaultfd-wp is always based on pgtable entries, so they cannot be shared. > > > > Walk the hugetlb range and unshare all such mappings if there is, right before > > UFFDIO_REGISTER will succeed and return to userspace. > > > > This will pair with want_pmd_share() in hugetlb code so that huge pmd sharing > > is completely disabled for userfaultfd-wp registered range. > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > fs/userfaultfd.c | 4 ++++ > > include/linux/hugetlb.h | 1 + > > mm/hugetlb.c | 51 +++++++++++++++++++++++++++++++++++++++++ > > 3 files changed, 56 insertions(+) > > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > > index 894cc28142e7..e259318fcae1 100644 > > --- a/fs/userfaultfd.c > > +++ b/fs/userfaultfd.c > > @@ -15,6 +15,7 @@ > > #include <linux/sched/signal.h> > > #include <linux/sched/mm.h> > > #include <linux/mm.h> > > +#include <linux/mmu_notifier.h> > > #include <linux/poll.h> > > #include <linux/slab.h> > > #include <linux/seq_file.h> > > @@ -1448,6 +1449,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, > > vma->vm_flags = new_flags; > > vma->vm_userfaultfd_ctx.ctx = ctx; > > > > + if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) > > + hugetlb_unshare_all_pmds(vma); > > + > > skip: > > prev = vma; > > start = vma->vm_end; > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > > index 3b4104021dd3..97ecfd4c20b2 
100644 > > --- a/include/linux/hugetlb.h > > +++ b/include/linux/hugetlb.h > > @@ -188,6 +188,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, > > unsigned long address, unsigned long end, pgprot_t newprot); > > > > bool is_hugetlb_entry_migration(pte_t pte); > > +void hugetlb_unshare_all_pmds(struct vm_area_struct *vma); > > > > #else /* !CONFIG_HUGETLB_PAGE */ > > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > index f53a0b852ed8..83c006ea3ff9 100644 > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -5723,4 +5723,55 @@ void __init hugetlb_cma_check(void) > > pr_warn("hugetlb_cma: the option isn't supported by current arch\n"); > > } > > > > +/* > > + * This function will unconditionally remove all the shared pmd pgtable entries > > + * within the specific vma for a hugetlbfs memory range. > > + */ > > Thanks for updating this! > > > +void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) > > +{ > > + struct hstate *h = hstate_vma(vma); > > + unsigned long sz = huge_page_size(h); > > + struct mm_struct *mm = vma->vm_mm; > > + struct mmu_notifier_range range; > > + unsigned long address, start, end; > > + spinlock_t *ptl; > > + pte_t *ptep; > > + > > + if (!(vma->vm_flags & VM_MAYSHARE)) > > + return; > > + > > + start = ALIGN(vma->vm_start, PUD_SIZE); > > + end = ALIGN_DOWN(vma->vm_end, PUD_SIZE); > > + > > + if (start >= end) > > + return; > > + > > + /* > > + * No need to call adjust_range_if_pmd_sharing_possible(), because > > + * we're going to operate on the whole vma > > not necessary, but perhaps change to: > * we're going to operate on ever PUD_SIZE aligned sized range > * within the vma. > > > + * we're going to operate on the whole vma > > + */ > > + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, > > + vma->vm_start, vma->vm_end); > > Should we use start, end here instead of vma->vm_start, vma->vm_end ? 
> > > + mmu_notifier_invalidate_range_start(&range); > > + i_mmap_lock_write(vma->vm_file->f_mapping); > > + for (address = start; address < end; address += PUD_SIZE) { > > + unsigned long tmp = address; > > + > > + ptep = huge_pte_offset(mm, address, sz); > > + if (!ptep) > > + continue; > > + ptl = huge_pte_lock(h, mm, ptep); > > + /* We don't want 'address' to be changed */ > > + huge_pmd_unshare(mm, vma, &tmp, ptep); > > + spin_unlock(ptl); > > + } > > + flush_hugetlb_tlb_range(vma, vma->vm_start, vma->vm_end); > > start, end ? Right we can even shrink the notifier, I'll respin shortly. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp 2021-02-17 20:46 ` [PATCH v2 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp Peter Xu 2021-02-18 1:46 ` Mike Kravetz @ 2021-02-18 18:32 ` Axel Rasmussen 2021-02-18 20:32 ` Peter Xu 1 sibling, 1 reply; 14+ messages in thread From: Axel Rasmussen @ 2021-02-18 18:32 UTC (permalink / raw) To: Peter Xu Cc: Linux MM, LKML, Mike Kravetz, Mike Rapoport, Andrea Arcangeli, Matthew Wilcox, Kirill A . Shutemov, Andrew Morton On Wed, Feb 17, 2021 at 12:46 PM Peter Xu <peterx@redhat.com> wrote: > > Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because > userfaultfd-wp is always based on pgtable entries, so they cannot be shared. > > Walk the hugetlb range and unshare all such mappings if there is, right before > UFFDIO_REGISTER will succeed and return to userspace. > > This will pair with want_pmd_share() in hugetlb code so that huge pmd sharing > is completely disabled for userfaultfd-wp registered range. 
> > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > fs/userfaultfd.c | 4 ++++ > include/linux/hugetlb.h | 1 + > mm/hugetlb.c | 51 +++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 56 insertions(+) > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > index 894cc28142e7..e259318fcae1 100644 > --- a/fs/userfaultfd.c > +++ b/fs/userfaultfd.c > @@ -15,6 +15,7 @@ > #include <linux/sched/signal.h> > #include <linux/sched/mm.h> > #include <linux/mm.h> > +#include <linux/mmu_notifier.h> > #include <linux/poll.h> > #include <linux/slab.h> > #include <linux/seq_file.h> > @@ -1448,6 +1449,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, > vma->vm_flags = new_flags; > vma->vm_userfaultfd_ctx.ctx = ctx; > > + if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) > + hugetlb_unshare_all_pmds(vma); This line yields the following error, if building with: # CONFIG_CMA is not set ./fs/userfaultfd.c:1459: undefined reference to `hugetlb_unshare_all_pmds' > + > skip: > prev = vma; > start = vma->vm_end; > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index 3b4104021dd3..97ecfd4c20b2 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -188,6 +188,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, > unsigned long address, unsigned long end, pgprot_t newprot); > > bool is_hugetlb_entry_migration(pte_t pte); > +void hugetlb_unshare_all_pmds(struct vm_area_struct *vma); > > #else /* !CONFIG_HUGETLB_PAGE */ > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index f53a0b852ed8..83c006ea3ff9 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -5723,4 +5723,55 @@ void __init hugetlb_cma_check(void) > pr_warn("hugetlb_cma: the option isn't supported by current arch\n"); > } > > +/* > + * This function will unconditionally remove all the shared pmd pgtable entries > + * within the specific vma for a hugetlbfs memory range. 
> + */ > +void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) > +{ > + struct hstate *h = hstate_vma(vma); > + unsigned long sz = huge_page_size(h); > + struct mm_struct *mm = vma->vm_mm; > + struct mmu_notifier_range range; > + unsigned long address, start, end; > + spinlock_t *ptl; > + pte_t *ptep; > + > + if (!(vma->vm_flags & VM_MAYSHARE)) > + return; > + > + start = ALIGN(vma->vm_start, PUD_SIZE); > + end = ALIGN_DOWN(vma->vm_end, PUD_SIZE); > + > + if (start >= end) > + return; > + > + /* > + * No need to call adjust_range_if_pmd_sharing_possible(), because > + * we're going to operate on the whole vma > + */ > + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, > + vma->vm_start, vma->vm_end); > + mmu_notifier_invalidate_range_start(&range); > + i_mmap_lock_write(vma->vm_file->f_mapping); > + for (address = start; address < end; address += PUD_SIZE) { > + unsigned long tmp = address; > + > + ptep = huge_pte_offset(mm, address, sz); > + if (!ptep) > + continue; > + ptl = huge_pte_lock(h, mm, ptep); > + /* We don't want 'address' to be changed */ > + huge_pmd_unshare(mm, vma, &tmp, ptep); > + spin_unlock(ptl); > + } > + flush_hugetlb_tlb_range(vma, vma->vm_start, vma->vm_end); > + i_mmap_unlock_write(vma->vm_file->f_mapping); > + /* > + * No need to call mmu_notifier_invalidate_range(), see > + * Documentation/vm/mmu_notifier.rst. > + */ > + mmu_notifier_invalidate_range_end(&range); > +} > + > #endif /* CONFIG_CMA */ > -- > 2.26.2 > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp 2021-02-18 18:32 ` Axel Rasmussen @ 2021-02-18 20:32 ` Peter Xu 2021-02-18 20:34 ` Axel Rasmussen 0 siblings, 1 reply; 14+ messages in thread From: Peter Xu @ 2021-02-18 20:32 UTC (permalink / raw) To: Axel Rasmussen Cc: Linux MM, LKML, Mike Kravetz, Mike Rapoport, Andrea Arcangeli, Matthew Wilcox, Kirill A . Shutemov, Andrew Morton On Thu, Feb 18, 2021 at 10:32:00AM -0800, Axel Rasmussen wrote: > > @@ -1448,6 +1449,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, > > vma->vm_flags = new_flags; > > vma->vm_userfaultfd_ctx.ctx = ctx; > > > > + if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) > > + hugetlb_unshare_all_pmds(vma); > > This line yields the following error, if building with: > # CONFIG_CMA is not set > > ./fs/userfaultfd.c:1459: undefined reference to `hugetlb_unshare_all_pmds' Ouch.. Axel, you mean CONFIG_HUGETLBFS rather than CONFIG_CMA, am I right? -- Peter Xu ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp 2021-02-18 20:32 ` Peter Xu @ 2021-02-18 20:34 ` Axel Rasmussen 2021-02-18 20:41 ` Peter Xu 0 siblings, 1 reply; 14+ messages in thread From: Axel Rasmussen @ 2021-02-18 20:34 UTC (permalink / raw) To: Peter Xu Cc: Linux MM, LKML, Mike Kravetz, Mike Rapoport, Andrea Arcangeli, Matthew Wilcox, Kirill A . Shutemov, Andrew Morton On Thu, Feb 18, 2021 at 12:32 PM Peter Xu <peterx@redhat.com> wrote: > > On Thu, Feb 18, 2021 at 10:32:00AM -0800, Axel Rasmussen wrote: > > > @@ -1448,6 +1449,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, > > > vma->vm_flags = new_flags; > > > vma->vm_userfaultfd_ctx.ctx = ctx; > > > > > > + if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) > > > + hugetlb_unshare_all_pmds(vma); > > > > This line yields the following error, if building with: > > # CONFIG_CMA is not set > > > > ./fs/userfaultfd.c:1459: undefined reference to `hugetlb_unshare_all_pmds' > > Ouch.. Axel, you mean CONFIG_HUGETLBFS rather than CONFIG_CMA, am I right? Surprisingly no, there's a "#ifdef CONFIG_CMA" line ~100 lines above where hugetlb_unshare_all_pmds is defined in hugetlb.c which causes this. My guess is that putting the function inside that block was accidental and it can just be moved. > > -- > Peter Xu > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp 2021-02-18 20:34 ` Axel Rasmussen @ 2021-02-18 20:41 ` Peter Xu 0 siblings, 0 replies; 14+ messages in thread From: Peter Xu @ 2021-02-18 20:41 UTC (permalink / raw) To: Axel Rasmussen Cc: Linux MM, LKML, Mike Kravetz, Mike Rapoport, Andrea Arcangeli, Matthew Wilcox, Kirill A . Shutemov, Andrew Morton On Thu, Feb 18, 2021 at 12:34:55PM -0800, Axel Rasmussen wrote: > On Thu, Feb 18, 2021 at 12:32 PM Peter Xu <peterx@redhat.com> wrote: > > > > On Thu, Feb 18, 2021 at 10:32:00AM -0800, Axel Rasmussen wrote: > > > > @@ -1448,6 +1449,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, > > > > vma->vm_flags = new_flags; > > > > vma->vm_userfaultfd_ctx.ctx = ctx; > > > > > > > > + if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) > > > > + hugetlb_unshare_all_pmds(vma); > > > > > > This line yields the following error, if building with: > > > # CONFIG_CMA is not set > > > > > > ./fs/userfaultfd.c:1459: undefined reference to `hugetlb_unshare_all_pmds' > > > > Ouch.. Axel, you mean CONFIG_HUGETLBFS rather than CONFIG_CMA, am I right? > > Surprisingly no, there's a "#ifdef CONFIG_CMA" line ~100 lines above > where hugetlb_unshare_all_pmds is defined in hugetlb.c which causes > this. My guess is that putting the function inside that block was > accidental and it can just be moved. Right, thanks for catching that, I actually need to fix both. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/4] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled 2021-02-17 20:46 ` [PATCH v2 2/4] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled Peter Xu 2021-02-17 20:46 ` [PATCH v2 3/4] mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h Peter Xu 2021-02-17 20:46 ` [PATCH v2 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp Peter Xu @ 2021-02-18 1:34 ` Mike Kravetz 2 siblings, 0 replies; 14+ messages in thread From: Mike Kravetz @ 2021-02-18 1:34 UTC (permalink / raw) To: Peter Xu, linux-mm, linux-kernel Cc: Mike Rapoport, Andrea Arcangeli, Axel Rasmussen, Matthew Wilcox, Kirill A . Shutemov, Andrew Morton On 2/17/21 12:46 PM, Peter Xu wrote: > Huge pmd sharing could bring problem to userfaultfd. The thing is that > userfaultfd is running its logic based on the special bits on page table > entries, however the huge pmd sharing could potentially share page table > entries for different address ranges. That could cause issues on either: > > - When sharing huge pmd page tables for an uffd write protected range, the > newly mapped huge pmd range will also be write protected unexpectedly, or, > > - When we try to write protect a range of huge pmd shared range, we'll first > do huge_pmd_unshare() in hugetlb_change_protection(), however that also > means the UFFDIO_WRITEPROTECT could be silently skipped for the shared > region, which could lead to data loss. > > Since at it, a few other things are done altogether: > > - Move want_pmd_share() from mm/hugetlb.c into linux/hugetlb.h, because > that's definitely something that arch code would like to use too > > - ARM64 currently directly check against CONFIG_ARCH_WANT_HUGE_PMD_SHARE when > trying to share huge pmd. Switch to the want_pmd_share() helper. > > Since at it, move vma_shareable() from huge_pmd_share() into want_pmd_share(). 
> > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > arch/arm64/mm/hugetlbpage.c | 3 +-- > include/linux/hugetlb.h | 2 ++ > include/linux/userfaultfd_k.h | 9 +++++++++ > mm/hugetlb.c | 20 ++++++++++++++------ > 4 files changed, 26 insertions(+), 8 deletions(-) Thanks, Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> -- Mike Kravetz ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 0/4] hugetlb: Disable huge pmd unshare for uffd-wp 2021-02-17 20:44 [PATCH v2 0/4] hugetlb: Disable huge pmd unshare for uffd-wp Peter Xu 2021-02-17 20:44 ` [PATCH v2 1/4] hugetlb: Pass vma into huge_pte_alloc() and huge_pmd_share() Peter Xu 2021-02-17 20:46 ` [PATCH v2 2/4] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled Peter Xu @ 2021-02-18 18:54 ` Axel Rasmussen 2021-02-18 20:33 ` Peter Xu 2 siblings, 1 reply; 14+ messages in thread From: Axel Rasmussen @ 2021-02-18 18:54 UTC (permalink / raw) To: Peter Xu Cc: Linux MM, LKML, Mike Kravetz, Mike Rapoport, Matthew Wilcox, Andrea Arcangeli, Andrew Morton, Kirill A . Shutemov I reviewed these patches, rebased my minor fault handling series on top of this series, and then ran some stress tests of minor fault handling. Other than the one comment I left about !CONFIG_CMA, I didn't spot any issues. So: Tested-By: Axel Rasmussen <axelrasmussen@google.com> (Or Reviewed-By: , if that makes more sense.) On Wed, Feb 17, 2021 at 12:44 PM Peter Xu <peterx@redhat.com> wrote: > > v2: > - patch 4: move hugetlb_unshare_all_pmds() into mm/hugetlb.c, so it can be used > even outside userfaultfd.c > > This series tries to disable huge pmd unshare of hugetlbfs backed memory for > uffd-wp. Although uffd-wp of hugetlbfs is still during rfc stage, the idea of > this series may be needed for multiple tasks (Axel's uffd minor fault series, > and Mike's soft dirty series), so I picked it out from the larger series. > > References works: > > Uffd shmem+hugetlbfs rfc: > https://lore.kernel.org/lkml/20210115170907.24498-1-peterx@redhat.com/ > > Uffd minor mode for hugetlbfs: > https://lore.kernel.org/lkml/20210212215403.3457686-1-axelrasmussen@google.com/ > > Soft dirty for hugetlbfs: > https://lore.kernel.org/lkml/20210211000322.159437-1-mike.kravetz@oracle.com/ > > Please review, thanks. 
> > Peter Xu (4): > hugetlb: Pass vma into huge_pte_alloc() and huge_pmd_share() > hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled > mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h > hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp > > arch/arm64/mm/hugetlbpage.c | 7 ++- > arch/ia64/mm/hugetlbpage.c | 3 +- > arch/mips/mm/hugetlbpage.c | 4 +- > arch/parisc/mm/hugetlbpage.c | 2 +- > arch/powerpc/mm/hugetlbpage.c | 3 +- > arch/s390/mm/hugetlbpage.c | 2 +- > arch/sh/mm/hugetlbpage.c | 2 +- > arch/sparc/mm/hugetlbpage.c | 1 + > fs/userfaultfd.c | 4 ++ > include/linux/hugetlb.h | 16 +++++- > include/linux/userfaultfd_k.h | 9 ++++ > mm/hugetlb.c | 94 +++++++++++++++++++++++++++-------- > mm/userfaultfd.c | 2 +- > 13 files changed, 114 insertions(+), 35 deletions(-) > > -- > 2.26.2 > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 0/4] hugetlb: Disable huge pmd unshare for uffd-wp 2021-02-18 18:54 ` [PATCH v2 0/4] hugetlb: Disable huge pmd unshare for uffd-wp Axel Rasmussen @ 2021-02-18 20:33 ` Peter Xu 0 siblings, 0 replies; 14+ messages in thread From: Peter Xu @ 2021-02-18 20:33 UTC (permalink / raw) To: Axel Rasmussen Cc: Linux MM, LKML, Mike Kravetz, Mike Rapoport, Matthew Wilcox, Andrea Arcangeli, Andrew Morton, Kirill A . Shutemov On Thu, Feb 18, 2021 at 10:54:41AM -0800, Axel Rasmussen wrote: > I reviewed these patches, rebased my minor fault handling series on > top of this series, and then ran some stress tests of minor fault > handling. Other than the one comment I left about !CONFIG_CMA, I > didn't spot any issues. So: > > Tested-By: Axel Rasmussen <axelrasmussen@google.com> > > (Or Reviewed-By: , if that makes more sense.) I'll add your r-b for the initial 3 patches, thanks Axel! -- Peter Xu ^ permalink raw reply [flat|nested] 14+ messages in thread