From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B1E9CC4167E for ; Wed, 23 Feb 2022 05:28:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238706AbiBWF2O (ORCPT ); Wed, 23 Feb 2022 00:28:14 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58776 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238389AbiBWF0v (ORCPT ); Wed, 23 Feb 2022 00:26:51 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EC6006AA44 for ; Tue, 22 Feb 2022 21:25:14 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id g205-20020a2552d6000000b0061e1843b8edso26635752ybb.18 for ; Tue, 22 Feb 2022 21:25:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=z2A7zhGCAXM89Mm8qf+swCLr5Dn/epWs5SXVvIpUsek=; b=tRhCbGDitng8H5bHVni9LeLhw+OutL5exoDLV2q/On1vnMgXPuEEjcg+LbKI1cVqrD Vyo7V47FL7/0MDpSopFI7nReIbg7le3SY0oaOVkKdiMGKdDx7pdDSJXZaaKKRg410xu2 t4RPD/m0Rz3825891oGpbAo2zhoskX0LmYLk/s6S6UydDxkiw0PDjI+nunjVplgqDwrA +5Jz1StT9q9LpFCspfDpRycEYKD4vtMQDfcSS06Robo3QGT5pJP4suYnnRPaHrjD0ty7 l5qUqZO6zEun+3vP4Gykt/1OOPF+J93aOR22CevdnVkqI4Do06KPWQJ6jdbyHleq2VP2 sP7g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=z2A7zhGCAXM89Mm8qf+swCLr5Dn/epWs5SXVvIpUsek=; b=kXc4MM+FP9iLh+AtoIdKWczCPVElLVZ3vZGqeGJy95HHm4a3BD4pw99I7twQhPbdQZ eKh5HJFbjli9CeV/ywaXiYq3NEzyI4IdA2m1TBRsKPJBpsltVV/2EOEOYfAhm7XsyBnY 1PtYjieCKrBpfoGF+2wFBBBi6u48MW23IV5RtvIevQZOOhpM1ORQbGEE3RYakTPSPzu0 EgSgNMeXIye/gOK7Pqf+Y6VSL9//nstYXve3EeA+V/YpGf8GqsPiZ/1wW7E+b3XCYnH2 izojfUD6Y7abyRlBi9vBrWH2Q/XX6Vssx7Wnv8fPxvAzH4ds6uwT147HZGiNfaYoPl5Q j6Fw== X-Gm-Message-State: AOAM533pBLMtZPO3vmUnh3fLIs/GA3X/PkM+WrbWACC9ezzFgabOTa09 Gz0Rgfs4c/WiaH52XJhSqf8Mb6HvTtSVMuctiI0sh6qTuqqv3S6vH7RNdNOej20NCSKHwLLBhqY uuBtGjQ2ZMV4SYKCs0sUhzmTrvQNxK4jMp0z7MH4Hzv0RPX84z0wSEZ/e91/m0IwuLbZXX5lz X-Google-Smtp-Source: ABdhPJxQh2hzOmUCmk3wlhtqctWGzojDMn80W5x6yZGJyfVoEE9kdvLHRV4rxoztBsJ+fy5+R2LvtwTwKQ0F X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a0d:eb09:0:b0:2d1:e0df:5104 with SMTP id u9-20020a0deb09000000b002d1e0df5104mr27669944ywe.250.1645593903681; Tue, 22 Feb 2022 21:25:03 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:11 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-36-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 35/47] mm: asi: asi_exit() on PF, skip handling if address is accessible From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Ofir Weisse On a page-fault - do asi_exit(). Then check if now after the exit the address is accessible. We do this by refactoring spurious_kernel_fault() into two parts: 1. Verify that the error code value is something that could arise from a lazy TLB update. 2. Walk the page table and verify permissions, which is now called is_address_accessible_now(). We also define PTE_PRESENT() and PMD_PRESENT() which are suitable for checking userspace pages. For the sake of spurious faualts, pte_present() and pmd_present() are only good for kernelspace pages. This is because these macros might return true even if the present bit is 0 (only relevant for userspace). Signed-off-by: Ofir Weisse --- arch/x86/mm/fault.c | 60 ++++++++++++++++++++++++++++++++++------ include/linux/mm_types.h | 3 ++ 2 files changed, 55 insertions(+), 8 deletions(-) diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 8692eb50f4a5..d08021ba380b 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -982,6 +982,8 @@ static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte) return 1; } +static int is_address_accessible_now(unsigned long error_code, unsigned long address, + pgd_t *pgd); /* * Handle a spurious fault caused by a stale TLB entry. * @@ -1003,15 +1005,13 @@ static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte) * See Intel Developer's Manual Vol 3 Section 4.10.4.3, bullet 3 * (Optional Invalidation). */ +/* A spurious fault is also possible when Address Space Isolation (ASI) is in + * use. Specifically, code running withing an ASI domain touched memory outside + * the domain. This access causes a page-fault --> asi_exit() */ static noinline int spurious_kernel_fault(unsigned long error_code, unsigned long address) { pgd_t *pgd; - p4d_t *p4d; - pud_t *pud; - pmd_t *pmd; - pte_t *pte; - int ret; /* * Only writes to RO or instruction fetches from NX may cause @@ -1027,6 +1027,37 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address) return 0; pgd = init_mm.pgd + pgd_index(address); + return is_address_accessible_now(error_code, address, pgd); +} +NOKPROBE_SYMBOL(spurious_kernel_fault); + + +/* Check if an address (kernel or userspace) would cause a page fault if + * accessed now. + * + * For kernel addresses, pte_present and pmd_present are sufficioent. For + * userspace, we must use PTE_PRESENT and PMD_PRESENT, which will only check the + * present bits. + * The existing pmd_present() in arch/x86/include/asm/pgtable.h is misleading. + * The PMD page might be in the middle of split_huge_page with present bit + * clear, but pmd_present will still return true. We are inteerested in knowing + * if the page is accessible to hardware - that is - the present bit is 1. */ +#define PMD_PRESENT(pmd) (pmd_flags(pmd) & _PAGE_PRESENT) + +/* pte_present will return true is _PAGE_PROTNONE is 1. We care if the hardware + * can actually access the page right now. */ +#define PTE_PRESENT(pte) (pte_flags(pte) & _PAGE_PRESENT) + +static noinline int +is_address_accessible_now(unsigned long error_code, unsigned long address, + pgd_t *pgd) +{ + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + int ret; + if (!pgd_present(*pgd)) return 0; @@ -1045,14 +1076,14 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address) return spurious_kernel_fault_check(error_code, (pte_t *) pud); pmd = pmd_offset(pud, address); - if (!pmd_present(*pmd)) + if (!PMD_PRESENT(*pmd)) return 0; if (pmd_large(*pmd)) return spurious_kernel_fault_check(error_code, (pte_t *) pmd); pte = pte_offset_kernel(pmd, address); - if (!pte_present(*pte)) + if (!PTE_PRESENT(*pte)) return 0; ret = spurious_kernel_fault_check(error_code, pte); @@ -1068,7 +1099,6 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address) return ret; } -NOKPROBE_SYMBOL(spurious_kernel_fault); int show_unhandled_signals = 1; @@ -1504,6 +1534,20 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault) * the fixup on the next page fault. */ struct asi *asi = asi_get_current(); + if (asi) + asi_exit(); + + /* handle_page_fault() might call BUG() if we run it for a kernel + * address. This might be the case if we got here due to an ASI fault. + * We avoid this case by checking whether the address is now, after a + * potential asi_exit(), accessible by hardware. If it is - there's + * nothing to do. + */ + if (current && mm_asi_enabled(current->mm)) { + pgd_t *pgd = (pgd_t*)__va(read_cr3_pa()) + pgd_index(address); + if (is_address_accessible_now(error_code, address, pgd)) + return; + } prefetchw(¤t->mm->mmap_lock); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index c3f209720a84..560909e80841 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -707,6 +707,9 @@ extern struct mm_struct init_mm; #ifdef CONFIG_ADDRESS_SPACE_ISOLATION static inline bool mm_asi_enabled(struct mm_struct *mm) { + if (!mm) + return false; + return mm->asi_enabled; } #else -- 2.35.1.473.g83b2b277ed-goog