From: Fenghua Yu <fenghua.yu@intel.com>
To: "Thomas Gleixner" <tglx@linutronix.de>,
"Ingo Molnar" <mingo@redhat.com>,
"Borislav Petkov" <bp@alien8.de>,
"Peter Zijlstra" <peterz@infradead.org>,
"Andy Lutomirski" <luto@kernel.org>,
"Dave Hansen" <dave.hansen@intel.com>,
"Tony Luck" <tony.luck@intel.com>,
"Lu Baolu" <baolu.lu@linux.intel.com>,
"Joerg Roedel" <joro@8bytes.org>,
Josh Poimboeuf <jpoimboe@redhat.com>,
"Dave Jiang" <dave.jiang@intel.com>,
"Jacob Jun Pan" <jacob.jun.pan@intel.com>,
"Ashok Raj" <ashok.raj@intel.com>,
"Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: iommu@lists.linux-foundation.org, "x86" <x86@kernel.org>,
"linux-kernel" <linux-kernel@vger.kernel.org>,
Fenghua Yu <fenghua.yu@intel.com>
Subject: [PATCH 8/8] docs: x86: Change documentation for SVA (Shared Virtual Addressing)
Date: Mon, 20 Sep 2021 19:23:49 +0000
Message-ID: <20210920192349.2602141-9-fenghua.yu@intel.com>
In-Reply-To: <20210920192349.2602141-1-fenghua.yu@intel.com>
Since PASID allocation, freeing and fixup have changed, update the
documentation to reflect the changes.
Originally-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
Documentation/x86/sva.rst | 81 +++++++++++++++++++++++++++++++++++----
1 file changed, 74 insertions(+), 7 deletions(-)
diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
index 076efd51ef1f..868ed4b25002 100644
--- a/Documentation/x86/sva.rst
+++ b/Documentation/x86/sva.rst
@@ -106,16 +106,83 @@ process share the same page tables, thus the same MSR value.
PASID is cleared when a process is created. The PASID allocation and MSR
programming may occur long after a process and its threads have been created.
-One thread must call iommu_sva_bind_device() to allocate the PASID for the
-process. If a thread uses ENQCMD without the MSR first being populated, a #GP
-will be raised. The kernel will update the PASID MSR with the PASID for all
-threads in the process. A single process PASID can be used simultaneously
+One thread must call iommu_sva_bind_device() to allocate the PASID for the
+process. If a thread uses ENQCMD before its PASID MSR is populated, a #GP is
+raised. The kernel fixes up the #GP by writing the process-wide PASID into the
+faulting thread's MSR. A single process PASID can be used simultaneously
with multiple devices since they all share the same address space.
-One thread can call iommu_sva_unbind_device() to free the allocated PASID.
-The kernel will clear the PASID MSR for all threads belonging to the process.
+The PASID MSR value is cleared at thread creation and is never inherited from
+the parent. This ensures consistent child behavior regardless of whether the
+thread is created before or after the PASID is allocated (and the MSR written).
-New threads inherit the MSR value from the parent.
+
+PASID Lifecycle Management
+==========================
+
+Only processes that access SVA-capable devices need to have a PASID
+allocated. The PASID is allocated when a process opens/binds an SVA-capable
+device and no PASID has yet been allocated for the process. Subsequent binds
+of the same device or of other devices share the same PASID.
+
+Although the PASID is allocated to the process by opening a device,
+it is not active in any of the threads of that process. It is loaded into the
+IA32_PASID MSR lazily when a thread first tries to submit a work descriptor
+to a device using the ENQCMD instruction.
+
+That first access will trigger a #GP fault because the IA32_PASID MSR
+has not been initialized with the PASID value assigned to the process
+when the device was opened. The Linux #GP handler notes that a PASID has
+been allocated for the process, and so initializes the IA32_PASID MSR, takes
+a reference to the PASID and returns so that the ENQCMD instruction is
+re-executed.
+
+On fork(2) or exec(2) the PASID is removed from the process as it no
+longer has the same address space that it had when the device was opened.
+
+On clone(2) the new task shares the same address space, so it will be
+able to use the PASID allocated to the process. The IA32_PASID MSR is not
+preemptively initialized because the PASID might not be allocated yet, the
+kernel does not know whether this thread is going to access the device, and
+a cleared IA32_PASID MSR reduces context switch overhead through the xstate
+init optimization. Since #GP faults have to be handled for any threads that
+were created before the PASID was assigned to the mm of the process, newly
+created threads might as well be treated in a consistent way.
+
+Because freeing the PASID and clearing the IA32_PASID MSRs of all threads
+in unbind is complex, the PASID is instead freed lazily once it has no users.
+The PASID's reference count is tracked as follows:
+
+- Track device usage of the PASID: The PASID's reference count is initialized
+  to 1 when the PASID is allocated in the first bind. Each subsequent bind
+  takes a reference and each unbind drops one.
+- Track task usage of the PASID: Fixing up the IA32_PASID MSR in #GP takes a
+  reference and exit(2) drops it. Once the MSR is fixed up, the PASID value
+  stays in the MSR for the rest of the task's life.
+
+The PASID is freed lazily in exit(2) or unbind when the last reference is
+dropped. Once freed, the PASID can be allocated to any process.
+
+ENQCMD has at least two requirements: a valid IA32_PASID MSR with the
+PASID value of the process and a valid PASID table entry for the PASID.
+To execute ENQCMD, the user must ensure the device is bound to the
+process so that the kernel can guarantee that both requirements are met.
+
+Lazy PASID freeing may leave a task with the PASID value in IA32_PASID
+while there is no PASID table entry for the PASID. A work descriptor submitted
+by ENQCMD in this scenario cannot find the PASID table entry and generates
+a DMAR fault. Currently the DMAR fault handler just prints the fault reason.
+A future DMAR fault handler might notify the user of the failed submission.
+Here are two detailed cases:
+
+- Unbind removes the PASID table entry, but the process still owns the PASID
+  and the task's IA32_PASID MSR still holds the PASID value. A work descriptor
+  submitted by ENQCMD in this task will generate a DMAR fault.
+- Unbind removes the PASID table entry, but the process still owns the PASID
+  because some task took a reference during fixup. ENQCMD executed in a task
+  whose IA32_PASID MSR has not been fixed up will first raise a #GP to get
+  its IA32_PASID MSR fixed up, and then the submitted work descriptor will
+  generate a DMAR fault.
Relationships
=============
--
2.33.0
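The reference counting described in the patch can be illustrated with a small user-space model. This is a minimal sketch under stated assumptions, not the kernel implementation: `mm_model`, `task_model`, `bind_device()`, `unbind_device()`, `clone_thread()`, `fixup_pasid_gp()` and `task_exit()` are hypothetical names, PASID 0 is used only as an "unallocated" marker, and the toy allocator never reuses values.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the PASID lifecycle in sva.rst.
 * None of these names are kernel APIs.
 */
#define PASID_INVALID 0u /* assumption: 0 marks "no PASID allocated" */

struct mm_model {
	unsigned int pasid;    /* PASID_INVALID until the first bind */
	unsigned int refcount; /* device binds + tasks with a fixed-up MSR */
};

struct task_model {
	struct mm_model *mm;     /* shared by threads created with clone(2) */
	unsigned int pasid_msr;  /* models IA32_PASID; 0 means not populated */
	bool pasid_activated;    /* set once the #GP fixup took a reference */
};

static unsigned int next_pasid = 1; /* toy allocator, never reuses */

/* Drop one reference; the last user frees the PASID lazily. */
static void pasid_put(struct mm_model *mm)
{
	assert(mm->refcount > 0);
	if (--mm->refcount == 0)
		mm->pasid = PASID_INVALID;
}

/* First bind allocates with refcount 1; later binds take a reference. */
static void bind_device(struct mm_model *mm)
{
	if (mm->pasid == PASID_INVALID) {
		mm->pasid = next_pasid++;
		mm->refcount = 1;
	} else {
		mm->refcount++;
	}
}

/* Unbind drops the reference taken by the matching bind. */
static void unbind_device(struct mm_model *mm)
{
	pasid_put(mm);
}

/* New threads share the mm but start with a cleared MSR and no reference. */
static struct task_model clone_thread(const struct task_model *parent)
{
	struct task_model child = { parent->mm, 0, false };
	return child;
}

/*
 * #GP fixup: if the mm owns a PASID and this task has not been fixed up
 * yet, load the MSR, take a reference and return true so that ENQCMD is
 * retried. Otherwise the #GP is genuine and false is returned.
 */
static bool fixup_pasid_gp(struct task_model *t)
{
	if (t->pasid_activated || t->mm->pasid == PASID_INVALID)
		return false;
	t->pasid_msr = t->mm->pasid;
	t->pasid_activated = true;
	t->mm->refcount++;
	return true;
}

/* exit(2) drops the task's reference, if it ever took one. */
static void task_exit(struct task_model *t)
{
	if (t->pasid_activated) {
		t->pasid_activated = false;
		pasid_put(t->mm);
	}
}
```

In this model, after `unbind_device()` the count is still 1 and a fixed-up thread's modeled MSR still holds the PASID. That is the window in which, per the documentation, ENQCMD submissions end in a DMAR fault; the PASID is only released when that thread exits.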
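The two ENQCMD requirements and the DMAR-fault cases listed in the patch can be modeled as below. This is a hypothetical sketch, not kernel or hardware code: `enqcmd_once()`, `enqcmd_with_fixup()` and `struct pasid_table` are illustrative names, and the MSR layout used (PASID in bits 19:0, valid flag in bit 31) is an assumption to verify against the Intel SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed IA32_PASID layout: PASID in bits 19:0, valid flag in bit 31. */
#define PASID_MASK  UINT64_C(0xfffff)
#define PASID_VALID (UINT64_C(1) << 31)
#define MAX_PASIDS  64u /* toy PASID table size */

enum enqcmd_result {
	ENQCMD_OK,         /* both requirements met */
	ENQCMD_GP,         /* IA32_PASID not populated: #GP */
	ENQCMD_DMAR_FAULT, /* MSR valid but no PASID table entry */
};

/* entry[p] is true while a device bound to the process uses PASID p. */
struct pasid_table {
	bool entry[MAX_PASIDS];
};

/*
 * One ENQCMD attempt with no kernel involvement. In reality the DMAR
 * fault is reported later by the IOMMU, not by ENQCMD itself; the model
 * folds it into the return value for illustration.
 */
static enum enqcmd_result enqcmd_once(uint64_t msr, const struct pasid_table *tbl)
{
	uint64_t pasid;

	if (!(msr & PASID_VALID))
		return ENQCMD_GP;
	pasid = msr & PASID_MASK;
	if (pasid >= MAX_PASIDS || !tbl->entry[pasid])
		return ENQCMD_DMAR_FAULT;
	return ENQCMD_OK;
}

/*
 * ENQCMD plus the lazy #GP fixup: if the process still owns a PASID
 * (mm_pasid >= 0), load it into the MSR and retry once.
 */
static enum enqcmd_result enqcmd_with_fixup(uint64_t *msr, int64_t mm_pasid,
					    const struct pasid_table *tbl)
{
	enum enqcmd_result r = enqcmd_once(*msr, tbl);

	if (r == ENQCMD_GP && mm_pasid >= 0) {
		*msr = PASID_VALID | ((uint64_t)mm_pasid & PASID_MASK);
		r = enqcmd_once(*msr, tbl);
		assert(r != ENQCMD_GP); /* the MSR is valid after the fixup */
	}
	return r;
}
```

With `tbl.entry[5]` cleared after an unbind, both a task whose MSR was already fixed up and a fresh task that first takes the #GP end in `ENQCMD_DMAR_FAULT`, matching the two cases in the patch. A task in a process that owns no PASID keeps getting a genuine #GP.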