From: Lu Baolu <baolu.lu@linux.intel.com>
To: Fenghua Yu <fenghua.yu@intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Peter Zijlstra <peterz@infradead.org>,
Andy Lutomirski <luto@kernel.org>,
Dave Hansen <dave.hansen@intel.com>,
Tony Luck <tony.luck@intel.com>, Joerg Roedel <joro@8bytes.org>,
Josh Poimboeuf <jpoimboe@redhat.com>,
Dave Jiang <dave.jiang@intel.com>,
Jacob Jun Pan <jacob.jun.pan@intel.com>,
Ashok Raj <ashok.raj@intel.com>,
Ravi V Shankar <ravi.v.shankar@intel.com>
Cc: baolu.lu@linux.intel.com, iommu@lists.linux-foundation.org,
x86 <x86@kernel.org>, linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting
Date: Thu, 23 Sep 2021 13:43:32 +0800
Message-ID: <3156573d-0d25-db0f-57ae-b6406763a8e9@linux.intel.com>
In-Reply-To: <20210920192349.2602141-6-fenghua.yu@intel.com>

Hi Fenghua,

On 9/21/21 3:23 AM, Fenghua Yu wrote:
> PASIDs are fundamentally hardware resources in a shared address space.
> Only a limited number of them are available for using ENQCMD on shared
> workqueues, so they must be shared and managed. They cannot, for
> instance, be statically allocated to processes.
>
> Eagerly freeing the PASID by sending IPIs in unbind was disabled, due to
> locking and other issues, in commit 9bfecd058339 ("x86/cpufeatures: Force
> disable X86_FEATURE_ENQCMD and remove update_pasid()").
>
> Lazy PASID free is implemented in order to re-enable the ENQCMD feature.
> PASIDs are currently reference counted and are centered around device
> usage. To support lazy PASID free, reference counts are tracked in the
> following scenarios:
>
> 1. The PASID's reference count is initialized to 1 when the PASID is first
> allocated in bind. This is already implemented.
> 2. A reference is taken when a device is bound to the mm and dropped
> when the device is unbound from the mm. This reference tracks device
> usage of the PASID. This is already implemented.
> 3. A reference is taken when a task's IA32_PASID MSR is initialized in
> #GP fix up and dropped when the task exits. This reference tracks
> the task usage of the PASID. It is implemented here.
>
> Once a PASID is allocated to an mm in bind, it stays associated with the
> mm until it is freed lazily, when its reference count drops to zero in
> unbind or exit(2).
>
> ENQCMD requires a valid IA32_PASID MSR with the PASID value and a valid
> PASID table entry for the PASID. With lazy PASID free, a process may
> still hold a valid PASID after the PASID table entry has been removed in
> unbind. In this case, work submitted by ENQCMD cannot find the PASID
> table entry and will generate a DMAR fault.
>
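
To check my reading of the paragraph above, here is a small userspace
sketch of the two conditions ENQCMD depends on. It is only a model under
my own assumptions: every name in it (model_state, model_enqcmd(),
model_unbind_device()) is hypothetical and none of it is kernel code.

```c
#include <stdbool.h>

/* Hypothetical model of what one ENQCMD attempt can result in. */
enum model_enqcmd_result {
	MODEL_GP_FAULT,		/* MSR not valid: #GP, the fix-up may run */
	MODEL_DMAR_FAULT,	/* MSR valid, but no PASID table entry */
	MODEL_SUBMITTED,	/* both conditions hold */
};

struct model_state {
	bool msr_valid;		/* the task's IA32_PASID MSR holds the PASID */
	bool table_entry;	/* the PASID table has an entry for the PASID */
};

/* ENQCMD needs a valid MSR first, then a PASID table entry. */
static enum model_enqcmd_result model_enqcmd(const struct model_state *s)
{
	if (!s->msr_valid)
		return MODEL_GP_FAULT;
	if (!s->table_entry)
		return MODEL_DMAR_FAULT;
	return MODEL_SUBMITTED;
}

/*
 * With lazy free, unbind removes the table entry but leaves the task's
 * MSR (and its reference on the PASID) alone.
 */
static void model_unbind_device(struct model_state *s)
{
	s->table_entry = false;
}
```

In this model a task that was set up before unbind gets MODEL_SUBMITTED,
and the same task gets MODEL_DMAR_FAULT after model_unbind_device(),
which is the case the commit message describes.
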
> Here is a more detailed explanation of the life cycle of a PASID:
>
> All processes start out without a PASID allocated (because fork(2)
> clears the PASID in the child).
>
> A PASID is allocated on the first open of an accelerator device by
> a call to:
> iommu_sva_bind_device()
> -> intel_svm_bind()
> -> intel_svm_alloc_pasid()
> -> iommu_sva_alloc_pasid()
> -> ioasid_alloc()
>
> At this point mm->pasid for the process is initialized, the reference
> count on that PASID is 1, but as yet no tasks within the process have
> set up their MSR_IA32_PASID to be able to execute the ENQCMD instruction.
>
> When a task in the process does execute ENQCMD there is a #GP fault.
> The Linux handler notes that the process has a PASID allocated, and
> attempts to fix the #GP fault by initializing MSR_IA32_PASID for this
> task. It also increments the reference count for the PASID.
>
> Additional threads in the process may also execute ENQCMD, and each
> will add to the reference count of the PASID.
>
> Tasks within the process may open additional accelerator devices.
> In this case the call to iommu_sva_bind_device() merely increments
> the reference count for the PASID, since all devices use the same
> PASID (all are accessing the same address space).
>
> So the reference count on a PASID is the sum of the number of open
> accelerator devices plus the number of threads that have tried to
> execute ENQCMD.
>
> The reverse happens as a process gives up resources. Each call to
> iommu_sva_unbind_device() will reduce the reference count on the
> PASID. Each task in the process that had set up MSR_IA32_PASID will
> reduce the reference count as it exits.
>
> When the reference count is dropped to 0 in either task exit or
> unbind, the PASID will be freed.
>
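
The counting rules in this life cycle can be sketched as a tiny userspace
model. This is only an illustration of the description above, not the
implementation in the patch: the model_* names are hypothetical, there is
no locking, and a PASID value of 0 is assumed to mean "none allocated".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_mm {
	unsigned int pasid;	/* 0 means no PASID is allocated */
	unsigned int refs;	/* bound devices + tasks with the MSR set up */
};

struct model_task {
	struct model_mm *mm;
	bool has_valid_pasid;	/* mirrors the per-task flag in the patch */
};

/* Drop one reference; free the PASID when the count reaches zero. */
static void model_put(struct model_mm *mm)
{
	assert(mm->refs > 0);
	if (--mm->refs == 0)
		mm->pasid = 0;
}

/* First bind allocates the PASID (count 1); later binds only add one. */
static void model_bind(struct model_mm *mm, unsigned int new_pasid)
{
	if (!mm->pasid)
		mm->pasid = new_pasid;
	mm->refs++;
}

static void model_unbind(struct model_mm *mm)
{
	model_put(mm);
}

/* #GP fix-up: take a task reference once, only if the mm has a PASID. */
static bool model_fixup(struct model_task *t)
{
	if (!t->mm->pasid || t->has_valid_pasid)
		return false;
	t->mm->refs++;
	t->has_valid_pasid = true;
	return true;
}

/* Task exit: drop the reference taken in the fix-up, if any. */
static void model_task_exit(struct model_task *t)
{
	if (t->has_valid_pasid) {
		t->has_valid_pasid = false;
		model_put(t->mm);
	}
}

/* Walk through the life cycle; returns 0 if the counts match the text. */
static int model_lifecycle_demo(void)
{
	struct model_mm mm = { 0, 0 };
	struct model_task a = { &mm, false };
	struct model_task b = { &mm, false };

	model_bind(&mm, 42);		/* first open: refs == 1 */
	if (!model_fixup(&a) || !model_fixup(&b))
		return -1;		/* two tasks ran ENQCMD: refs == 3 */
	model_bind(&mm, 99);		/* second device: refs == 4 */
	if (mm.pasid != 42 || mm.refs != 4)
		return -1;

	model_unbind(&mm);
	model_unbind(&mm);		/* refs == 2, tasks still hold it */
	if (mm.pasid != 42)
		return -1;
	model_task_exit(&a);
	model_task_exit(&b);		/* refs == 0, PASID freed */
	return (mm.refs == 0 && mm.pasid == 0) ? 0 : -1;
}
```

In this model the count is always the number of bound devices plus the
number of tasks that went through the fix-up, and the PASID outlives
unbind for as long as any such task is still alive.
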
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/include/asm/iommu.h | 6 +++++
> arch/x86/include/asm/mmu_context.h | 2 ++
> drivers/iommu/intel/svm.c | 39 ++++++++++++++++++++++++++++++
> 3 files changed, 47 insertions(+)
>
> diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
> index 9c4bf9b0702f..d00f0a3f32fb 100644
> --- a/arch/x86/include/asm/iommu.h
> +++ b/arch/x86/include/asm/iommu.h
> @@ -28,4 +28,10 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
>
> bool __fixup_pasid_exception(void);
>
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +void pasid_put(struct task_struct *tsk, struct mm_struct *mm);
> +#else
> +static inline void pasid_put(struct task_struct *tsk, struct mm_struct *mm) { }
> +#endif
> +
> #endif /* _ASM_X86_IOMMU_H */
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index 27516046117a..3a2de87e98a9 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -12,6 +12,7 @@
> #include <asm/tlbflush.h>
> #include <asm/paravirt.h>
> #include <asm/debugreg.h>
> +#include <asm/iommu.h>
>
> extern atomic64_t last_mm_ctx_id;
>
> @@ -146,6 +147,7 @@ do { \
> #else
> #define deactivate_mm(tsk, mm) \
> do { \
> + pasid_put(tsk, mm); \
> load_gs_index(0); \
> loadsegment(fs, 0); \
> } while (0)
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index ab65020019b6..8b6b8007ba2c 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -1187,6 +1187,7 @@ int intel_svm_page_response(struct device *dev,
> bool __fixup_pasid_exception(void)
> {
> u32 pasid;
> + int ret;
>
> /*
> * This function is called only when this #GP was triggered from user
> @@ -1205,9 +1206,47 @@ bool __fixup_pasid_exception(void)
> if (current->has_valid_pasid)
> return false;
>
> + mutex_lock(&pasid_mutex);
> + /* The mm's pasid has been allocated. Take a reference to it. */
> + ret = iommu_sva_alloc_pasid(current->mm, PASID_MIN,
> + intel_pasid_max_id - 1);
> + mutex_unlock(&pasid_mutex);
> + if (ret)
> + return false;
> +
> /* Fix up the MSR by the PASID in the mm. */
> fpu__pasid_write(pasid);
> current->has_valid_pasid = 1;
>
> return true;
> }
> +
> +/*
> + * pasid_put - On task exit release a reference to the mm's PASID
> + * and free the PASID if no more reference
> + * @mm: the mm
> + *
> + * When the task exits, release a reference to the mm's PASID if it was
> + * allocated and the IA32_PASID MSR was fixed up.
> + *
> + * If there is no reference, the PASID is freed and can be allocated to
> + * any process later.
> + */
> +void pasid_put(struct task_struct *tsk, struct mm_struct *mm)
> +{
> + if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
> + return;
> +
> + /*
> + * Nothing to do if this task doesn't have a reference to the PASID.
> + */
> + if (tsk->has_valid_pasid) {
> + mutex_lock(&pasid_mutex);
> + /*
> + * The PASID's reference was taken during fix up. Release it
> + * now. If the reference count is 0, the PASID is freed.
> + */
> + iommu_sva_free_pasid(mm);
> + mutex_unlock(&pasid_mutex);
> + }
> +}
>

It looks odd that both __fixup_pasid_exception() and pasid_put() are
defined in the vendor IOMMU driver but called from the arch/x86 code.

Is it feasible to move these two helpers to the files where they are
called? The IA32_PASID MSR fixup and release are not part of the IOMMU
implementation.

Best regards,
baolu