From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3481EC433F5 for ; Mon, 20 Sep 2021 20:04:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 17B3261244 for ; Mon, 20 Sep 2021 20:04:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231580AbhITUFr (ORCPT ); Mon, 20 Sep 2021 16:05:47 -0400 Received: from mga05.intel.com ([192.55.52.43]:60380 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230438AbhITUDo (ORCPT ); Mon, 20 Sep 2021 16:03:44 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10113"; a="308775425" X-IronPort-AV: E=Sophos;i="5.85,309,1624345200"; d="scan'208";a="308775425" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Sep 2021 13:02:15 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,309,1624345200"; d="scan'208";a="473779494" Received: from otcwcpicx3.sc.intel.com ([172.25.55.73]) by fmsmga007.fm.intel.com with ESMTP; 20 Sep 2021 13:02:15 -0700 From: Fenghua Yu To: "Thomas Gleixner" , "Ingo Molnar" , "Borislav Petkov" , "Peter Zijlstra" , "Andy Lutomirski" , "Dave Hansen" , "Tony Luck" , "Lu Baolu" , "Joerg Roedel" , Josh Poimboeuf , "Dave Jiang" , "Jacob Jun Pan" , "Ashok Raj" , "Ravi V Shankar" Cc: iommu@lists.linux-foundation.org, "x86" , "linux-kernel" , Fenghua Yu Subject: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting Date: Mon, 20 Sep 2021 19:23:46 +0000 Message-Id: <20210920192349.2602141-6-fenghua.yu@intel.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20210920192349.2602141-1-fenghua.yu@intel.com> References: <20210920192349.2602141-1-fenghua.yu@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org PASIDs are fundamentally hardware resources in a shared address space. There is a limited number of them to use ENQCMD on shared workqueue. They must be shared and managed. They can not, for instance, be statically allocated to processes. Free PASID eagerly by sending IPIs in unbind was disabled due to locking and other issues in commit 9bfecd058339 ("x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()"). Lazy PASID free is implemented in order to re-enable the ENQCMD feature. PASIDs are currently reference counted and are centered around device usage. To support lazy PASID free, reference counts are tracked in the following scenarios: 1. The PASID's reference count is initialized as 1 when the PASID is first allocated in bind. This is already implemented. 2. A reference is taken when a device is bound to the mm and dropped when the device is unbound from the mm. This reference tracks device usage of the PASID. This is already implemented. 3. A reference is taken when a task's IA32_PASID MSR is initialized in #GP fix up and dropped when the task exits. This reference tracks the task usage of the PASID. It is implemented here. Once a PASID is allocated to an mm in bind, it's associated to the mm until it's freed lazily when its reference count is dropped to zero in unbind or exit(2). ENQCMD requires a valid IA32_PASID MSR with the PASID value and a valid PASID table entry for the PASID. Lazy PASID free may cause the process still has the valid PASID but the PASID table entry is removed in unbind. In this case, workqueue submitted by ENQCMD cannot find the PASID table entry and will generate a DMAR fault. Here is a more detailed explanation of the life cycle of a PASID: All processes start out without a PASID allocated (because fork(2) clears the PASID in the child). A PASID is allocated on the first open of an accelerator device by a call to: iommu_sva_bind_device() -> intel_svm_bind() -> intel_svm_alloc_pasid() -> iommu_sva_alloc_pasid() -> ioasid_alloc() At this point mm->pasid for the process is initialized, the reference count on that PASID is 1, but as yet no tasks within the process have set up their MSR_IA32_PASID to be able to execute the ENQCMD instruction. When a task in the process does execute ENQCMD there is a #GP fault. The Linux handler notes that the process has a PASID allocated, and attempts to fix the #GP fault by initializing MSR_IA32_PASID for this task. It also increments the reference count for the PASID. Additional threads in the task may also execute ENQCMD, and each will add to the reference count of the PASID. Tasks within the process may open additional accelerator devices. In this case the call to iommu_sva_bind_device() merely increments the reference count for the PASID. Since all devices use the same PASID (all are accessing the same address space). So the reference count on a PASID is the sum of the number of open accelerator devices plus the number of threads that have tried to execute ENQCMD. The reverse happens as a process gives up resources. Each call to iommu_sva_unbind_device() will reduce the reference count on the PASID. Each task in the process that had set up MSR_IA32_PASID will reduce the reference count as it exits. When the reference count is dropped to 0 in either task exit or unbind, the PASID will be freed. Signed-off-by: Fenghua Yu Reviewed-by: Tony Luck --- arch/x86/include/asm/iommu.h | 6 +++++ arch/x86/include/asm/mmu_context.h | 2 ++ drivers/iommu/intel/svm.c | 39 ++++++++++++++++++++++++++++++ 3 files changed, 47 insertions(+) diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h index 9c4bf9b0702f..d00f0a3f32fb 100644 --- a/arch/x86/include/asm/iommu.h +++ b/arch/x86/include/asm/iommu.h @@ -28,4 +28,10 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr) bool __fixup_pasid_exception(void); +#ifdef CONFIG_INTEL_IOMMU_SVM +void pasid_put(struct task_struct *tsk, struct mm_struct *mm); +#else +static inline void pasid_put(struct task_struct *tsk, struct mm_struct *mm) { } +#endif + #endif /* _ASM_X86_IOMMU_H */ diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 27516046117a..3a2de87e98a9 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -12,6 +12,7 @@ #include #include #include +#include extern atomic64_t last_mm_ctx_id; @@ -146,6 +147,7 @@ do { \ #else #define deactivate_mm(tsk, mm) \ do { \ + pasid_put(tsk, mm); \ load_gs_index(0); \ loadsegment(fs, 0); \ } while (0) diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c index ab65020019b6..8b6b8007ba2c 100644 --- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -1187,6 +1187,7 @@ int intel_svm_page_response(struct device *dev, bool __fixup_pasid_exception(void) { u32 pasid; + int ret; /* * This function is called only when this #GP was triggered from user @@ -1205,9 +1206,47 @@ bool __fixup_pasid_exception(void) if (current->has_valid_pasid) return false; + mutex_lock(&pasid_mutex); + /* The mm's pasid has been allocated. Take a reference to it. */ + ret = iommu_sva_alloc_pasid(current->mm, PASID_MIN, + intel_pasid_max_id - 1); + mutex_unlock(&pasid_mutex); + if (ret) + return false; + /* Fix up the MSR by the PASID in the mm. */ fpu__pasid_write(pasid); current->has_valid_pasid = 1; return true; } + +/* + * pasid_put - On task exit release a reference to the mm's PASID + * and free the PASID if no more reference + * @mm: the mm + * + * When the task exits, release a reference to the mm's PASID if it was + * allocated and the IA32_PASID MSR was fixed up. + * + * If there is no reference, the PASID is freed and can be allocated to + * any process later. + */ +void pasid_put(struct task_struct *tsk, struct mm_struct *mm) +{ + if (!cpu_feature_enabled(X86_FEATURE_ENQCMD)) + return; + + /* + * Nothing to do if this task doesn't have a reference to the PASID. + */ + if (tsk->has_valid_pasid) { + mutex_lock(&pasid_mutex); + /* + * The PASID's reference was taken during fix up. Release it + * now. If the reference count is 0, the PASID is freed. + */ + iommu_sva_free_pasid(mm); + mutex_unlock(&pasid_mutex); + } +} -- 2.33.0