From patchwork Fri Oct 11 14:09:36 2019
X-Patchwork-Submitter: Jia He
X-Patchwork-Id: 1138647
From: Jia He
To: Catalin Marinas, Will Deacon, Mark Rutland, James Morse,
    Marc Zyngier, Matthew Wilcox, "Kirill A. Shutemov",
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Suzuki Poulose, Borislav Petkov,
    "H. Peter Anvin", x86@kernel.org
Cc: Thomas Gleixner, Andrew Morton, hejianet@gmail.com, Kaly Xin,
    nd@arm.com, Jia He
Subject: [PATCH v12 1/4] arm64: cpufeature: introduce helper cpu_has_hw_af()
Date: Fri, 11 Oct 2019 22:09:36 +0800
Message-Id: <20191011140939.6115-2-justin.he@arm.com>
In-Reply-To: <20191011140939.6115-1-justin.he@arm.com>
References: <20191011140939.6115-1-justin.he@arm.com>

We unconditionally set the HW_AFDBM capability and only enable it on
CPUs which really have the feature. But sometimes we need to know
whether this CPU has the hardware Access Flag (AF) capability. So
decouple AF from DBM with a new helper, cpu_has_hw_af().

If we later notice a potential performance issue on this path, we can
turn it into a static label as with other CPU features.
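
As a usage illustration (not part of this patch), the intended caller
is the arm64 arch_faults_on_old_pte() added in patch 2/4 of this
series; a minimal sketch:

/*
 * Sketch of the caller from patch 2/4. It must run with preemption
 * disabled, since cpu_has_hw_af() reads ID_AA64MMFR1_EL1 on the
 * current CPU.
 */
static inline bool arch_faults_on_old_pte(void)
{
        WARN_ON(preemptible());

        /* Faults on old ptes happen only when AF is software-managed */
        return !cpu_has_hw_af();
}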
Signed-off-by: Jia He
Suggested-by: Suzuki Poulose
Reviewed-by: Catalin Marinas
---
 arch/arm64/include/asm/cpufeature.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 9cde5d2e768f..4261d55e8506 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -659,6 +659,20 @@ static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
        default: return CONFIG_ARM64_PA_BITS;
        }
 }
+
+/* Check whether hardware update of the Access flag is supported */
+static inline bool cpu_has_hw_af(void)
+{
+       u64 mmfr1;
+
+       if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
+               return false;
+
+       mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
+       return cpuid_feature_extract_unsigned_field(mmfr1,
+                                               ID_AA64MMFR1_HADBS_SHIFT);
+}
+
 #endif /* __ASSEMBLY__ */

 #endif

From patchwork Fri Oct 11 14:09:37 2019
X-Patchwork-Submitter: Jia He
X-Patchwork-Id: 1138648
From: Jia He
Subject: [PATCH v12 2/4] arm64: mm: implement arch_faults_on_old_pte() on arm64
Date: Fri, 11 Oct 2019 22:09:37 +0800
Message-Id: <20191011140939.6115-3-justin.he@arm.com>
In-Reply-To: <20191011140939.6115-1-justin.he@arm.com>
References: <20191011140939.6115-1-justin.he@arm.com>

On arm64 without hardware Access Flag, copying from user will fail
because the pte is old and cannot be marked young. So we always end up
with a zeroed page after fork() + CoW for pfn mappings, since we don't
always have a hardware-managed Access Flag on arm64.

Hence implement arch_faults_on_old_pte() on arm64 to indicate that
accessing an old pte may cause a page fault.
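
For reference, the generic consumer added in patch 4/4 keys off this
helper roughly as sketched below (the pte lock and the pte_same()
recheck from that patch are elided here):

        /* In cow_user_page(), mm/memory.c (patch 4/4): force the pte
         * young before the in-kernel copy when a software-managed
         * Access Flag would otherwise re-fault. */
        if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
                pte_t entry = pte_mkyoung(vmf->orig_pte);

                if (ptep_set_access_flags(vma, addr, vmf->pte, entry, 0))
                        update_mmu_cache(vma, addr, vmf->pte);
        }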
Signed-off-by: Jia He
Reviewed-by: Catalin Marinas
---
 arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7576df00eb50..e96fb82f62de 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -885,6 +885,20 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 #define phys_to_ttbr(addr)     (addr)
 #endif

+/*
+ * On arm64 without hardware Access Flag, copying from user will fail
+ * because the pte is old and cannot be marked young. So we always end
+ * up with zeroed page after fork() + CoW for pfn mappings. We don't
+ * always have a hardware-managed access flag on arm64.
+ */
+static inline bool arch_faults_on_old_pte(void)
+{
+       WARN_ON(preemptible());
+
+       return !cpu_has_hw_af();
+}
+#define arch_faults_on_old_pte arch_faults_on_old_pte
+
 #endif /* !__ASSEMBLY__ */

 #endif /* __ASM_PGTABLE_H */

From patchwork Fri Oct 11 14:09:38 2019
X-Patchwork-Submitter: Jia He
X-Patchwork-Id: 1138649
From: Jia He
Subject: [PATCH v12 3/4] x86/mm: implement arch_faults_on_old_pte() stub on x86
Date: Fri, 11 Oct 2019 22:09:38 +0800
Message-Id: <20191011140939.6115-4-justin.he@arm.com>
In-Reply-To: <20191011140939.6115-1-justin.he@arm.com>
References: <20191011140939.6115-1-justin.he@arm.com>

arch_faults_on_old_pte() is a helper to indicate that accessing an old
pte might cause a page fault. On x86, however, the hardware sets the
pte's accessed flag itself, so no such fault is taken. Hence implement
an overriding stub which always returns false.
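
To make the override mechanism explicit: defining the macro to its own
name is what suppresses the generic fallback. Shown side by side (the
fallback is the one patch 4/4 adds to mm/memory.c):

/* arch/x86/include/asm/pgtable.h (this patch): x86 hardware sets the
 * accessed bit itself, so an old pte never faults on access. */
#define arch_faults_on_old_pte arch_faults_on_old_pte
static inline bool arch_faults_on_old_pte(void)
{
        return false;
}

/* mm/memory.c (patch 4/4): compiled only when no arch override is
 * defined; "true" is the conservative default. */
#ifndef arch_faults_on_old_pte
static inline bool arch_faults_on_old_pte(void)
{
        return true;
}
#endif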
Signed-off-by: Jia He
Suggested-by: Will Deacon
---
 arch/x86/include/asm/pgtable.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 0bc530c4eb13..ad97dc155195 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1463,6 +1463,12 @@ static inline bool arch_has_pfn_modify_check(void)
        return boot_cpu_has_bug(X86_BUG_L1TF);
 }

+#define arch_faults_on_old_pte arch_faults_on_old_pte
+static inline bool arch_faults_on_old_pte(void)
+{
+       return false;
+}
+
 #include <asm-generic/pgtable.h>

 #endif /* __ASSEMBLY__ */

From patchwork Fri Oct 11 14:09:39 2019
X-Patchwork-Submitter: Jia He
X-Patchwork-Id: 1138650
From: Jia He
Subject: [PATCH v12 4/4] mm: fix double page fault on arm64 if PTE_AF is cleared
Date: Fri, 11 Oct 2019 22:09:39 +0800
Message-Id: <20191011140939.6115-5-justin.he@arm.com>
In-Reply-To: <20191011140939.6115-1-justin.he@arm.com>
References: <20191011140939.6115-1-justin.he@arm.com>

When we tested the pmdk unit test [1] vmmalloc_fork TEST3 on an arm64
guest, we hit a double page fault in __copy_from_user_inatomic() of
cow_user_page().
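
In outline, the failing pattern looks like the sketch below. This is
not the pmdk test itself; the /mnt/pmem path and the DAX-backed file
are assumptions for illustration. The key ingredients are a pfn
mapping (no struct page), MAP_PRIVATE, fork(), and a CoW write while
the pte is old:

/* Hedged userspace reproducer sketch, assuming a DAX-mounted fs */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        /* Hypothetical pre-populated file on a DAX mount */
        int fd = open("/mnt/pmem/testfile", O_RDWR);
        char *p;

        if (fd < 0)
                return 1;
        p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED)
                return 1;

        (void)*(volatile char *)p;      /* read fault: map the pfn read-only */

        if (fork() == 0) {
                /*
                 * The child's write triggers wp_page_copy(). If the pte
                 * is old and the CPU has no hardware Access Flag, the
                 * in-kernel copy faults again, fails in-atomic, and the
                 * child used to see a zero-filled page here.
                 */
                p[1] = 1;
                printf("%#x\n", (unsigned char)p[0]);   /* expect file data */
                _exit(0);
        }
        wait(NULL);
        munmap(p, 4096);
        close(fd);
        return 0;
}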
To reproduce the bug, run the following command after you have
deployed everything:

  make -C src/test/vmmalloc_fork/ TEST_TIME=60m check

The call trace below was captured in arm64 do_page_fault() for
debugging:

[  110.016195] Call trace:
[  110.016826]  do_page_fault+0x5a4/0x690
[  110.017812]  do_mem_abort+0x50/0xb0
[  110.018726]  el1_da+0x20/0xc4
[  110.019492]  __arch_copy_from_user+0x180/0x280
[  110.020646]  do_wp_page+0xb0/0x860
[  110.021517]  __handle_mm_fault+0x994/0x1338
[  110.022606]  handle_mm_fault+0xe8/0x180
[  110.023584]  do_page_fault+0x240/0x690
[  110.024535]  do_mem_abort+0x50/0xb0
[  110.025423]  el0_da+0x20/0x24

The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):

[ffff9b007000] pgd=000000023d4f8003, pud=000000023da9b003,
               pmd=000000023d4b3003, pte=360000298607bd3

As told by Catalin: "On arm64 without hardware Access Flag, copying
from user will fail because the pte is old and cannot be marked young.
So we always end up with zeroed page after fork() + CoW for pfn
mappings. We don't always have a hardware-managed access flag on
arm64."

Fix it by calling pte_mkyoung() in cow_user_page(). The parameters of
cow_user_page() also change, because vmf now needs to be passed in.

Also add a WARN_ON_ONCE() when __copy_from_user_inatomic() returns an
error, in case there is some obscure use-case (suggested by Kirill).

[1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork

Signed-off-by: Jia He
Reported-by: Yibo Cai
Reviewed-by: Catalin Marinas
Acked-by: Kirill A. Shutemov
---
 mm/memory.c | 104 ++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 89 insertions(+), 15 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index b1ca51a079f2..b6a5d6a08438 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -118,6 +118,18 @@ int randomize_va_space __read_mostly = 2;
 #endif

+#ifndef arch_faults_on_old_pte
+static inline bool arch_faults_on_old_pte(void)
+{
+       /*
+        * Those arches which don't have hw access flag feature need to
+        * implement their own helper. By default, "true" means pagefault
+        * will be hit on old pte.
+        */
+       return true;
+}
+#endif
+
 static int __init disable_randmaps(char *s)
 {
        randomize_va_space = 0;
@@ -2145,32 +2157,82 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
        return same;
 }

-static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
+static inline bool cow_user_page(struct page *dst, struct page *src,
+                                struct vm_fault *vmf)
 {
+       bool ret;
+       void *kaddr;
+       void __user *uaddr;
+       bool force_mkyoung;
+       struct vm_area_struct *vma = vmf->vma;
+       struct mm_struct *mm = vma->vm_mm;
+       unsigned long addr = vmf->address;
+
        debug_dma_assert_idle(src);

+       if (likely(src)) {
+               copy_user_highpage(dst, src, addr, vma);
+               return true;
+       }
+
        /*
         * If the source page was a PFN mapping, we don't have
         * a "struct page" for it. We do a best-effort copy by
         * just copying from the original user address. If that
         * fails, we just zero-fill it. Live with it.
         */
-       if (unlikely(!src)) {
-               void *kaddr = kmap_atomic(dst);
-               void __user *uaddr = (void __user *)(va & PAGE_MASK);
+       kaddr = kmap_atomic(dst);
+       uaddr = (void __user *)(addr & PAGE_MASK);
+
+       /*
+        * On architectures with software "accessed" bits, we would
+        * take a double page fault, so mark it accessed here.
+        */
+       force_mkyoung = arch_faults_on_old_pte() && !pte_young(vmf->orig_pte);
+       if (force_mkyoung) {
+               pte_t entry;
+
+               vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
+               if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
+                       /*
+                        * Other thread has already handled the fault
+                        * and we don't need to do anything. If it's
+                        * not the case, the fault will be triggered
+                        * again on the same address.
+                        */
+                       ret = false;
+                       goto pte_unlock;
+               }
+               entry = pte_mkyoung(vmf->orig_pte);
+               if (ptep_set_access_flags(vma, addr, vmf->pte, entry, 0))
+                       update_mmu_cache(vma, addr, vmf->pte);
+       }
+
+       /*
+        * This really shouldn't fail, because the page is there
+        * in the page tables. But it might just be unreadable,
+        * in which case we just give up and fill the result with
+        * zeroes.
+        */
+       if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
                /*
-                * This really shouldn't fail, because the page is there
-                * in the page tables. But it might just be unreadable,
-                * in which case we just give up and fill the result with
-                * zeroes.
+                * Give a warn in case there can be some obscure
+                * use-case
                 */
-               if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
-                       clear_page(kaddr);
-               kunmap_atomic(kaddr);
-               flush_dcache_page(dst);
-       } else
-               copy_user_highpage(dst, src, va, vma);
+               WARN_ON_ONCE(1);
+               clear_page(kaddr);
+       }
+
+       ret = true;
+
+pte_unlock:
+       if (force_mkyoung)
+               pte_unmap_unlock(vmf->pte, vmf->ptl);
+       kunmap_atomic(kaddr);
+       flush_dcache_page(dst);
+
+       return ret;
 }

 static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
@@ -2327,7 +2389,19 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
                                vmf->address);
                if (!new_page)
                        goto oom;
-               cow_user_page(new_page, old_page, vmf->address, vma);
+
+               if (!cow_user_page(new_page, old_page, vmf)) {
+                       /*
+                        * COW failed, if the fault was solved by other,
+                        * it's fine. If not, userspace would re-fault on
+                        * the same address and we will handle the fault
+                        * from the second attempt.
+                        */
+                       put_page(new_page);
+                       if (old_page)
+                               put_page(old_page);
+                       return 0;
+               }
        }

        if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg, false))