linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Fenghua Yu <fenghua.yu@intel.com>
To: "Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>, "H Peter Anvin" <hpa@zytor.com>,
	"David Woodhouse" <dwmw2@infradead.org>,
	"Lu Baolu" <baolu.lu@linux.intel.com>,
	"Dave Hansen" <dave.hansen@intel.com>,
	"Tony Luck" <tony.luck@intel.com>,
	"Ashok Raj" <ashok.raj@intel.com>,
	"Jacob Jun Pan" <jacob.jun.pan@intel.com>,
	"Dave Jiang" <dave.jiang@intel.com>,
	"Sohil Mehta" <sohil.mehta@intel.com>,
	"Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "linux-kernel" <linux-kernel@vger.kernel.org>,
	"x86" <x86@kernel.org>,
	iommu@lists.linux-foundation.org,
	Fenghua Yu <fenghua.yu@intel.com>
Subject: [PATCH 1/7] docs: x86: Add a documentation for ENQCMD
Date: Mon, 30 Mar 2020 12:33:02 -0700	[thread overview]
Message-ID: <1585596788-193989-2-git-send-email-fenghua.yu@intel.com> (raw)
In-Reply-To: <1585596788-193989-1-git-send-email-fenghua.yu@intel.com>

From: Ashok Raj <ashok.raj@intel.com>

ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
features are a complicated stack with lots of interconnected pieces.
This documentation provides a big picture overview for all of the
features.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 Documentation/x86/enqcmd.rst | 185 +++++++++++++++++++++++++++++++++++
 1 file changed, 185 insertions(+)
 create mode 100644 Documentation/x86/enqcmd.rst

diff --git a/Documentation/x86/enqcmd.rst b/Documentation/x86/enqcmd.rst
new file mode 100644
index 000000000000..414ef7d24028
--- /dev/null
+++ b/Documentation/x86/enqcmd.rst
@@ -0,0 +1,185 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Improved Device Interaction Overview
+
+== Background ==
+
+Shared Virtual Addressing (SVA) allows the processor and device to use the
+same virtual addresses avoiding the need for software to translate virtual
+addresses to physical addresses. ENQCMD is a new instruction on Intel
+platforms that allows user applications to directly notify hardware of new
+work, much like doorbells are used in some hardware, but carries a payload
+that carries the PASID and some additional device specific commands
+along with it.
+
+== Address Space Tagging ==
+
+A new MSR (MSR_IA32_PASID) allows an application address space to be
+associated with what the PCIe spec calls a Process Address Space ID
+(PASID). This PASID tag is carried along with all requests between
+applications and devices and allows devices to interact with the process
+address space.
+
+This MSR is managed with the XSAVE feature set as "supervisor state".
+
+== PASID Management ==
+
+The kernel must allocate a PASID on behalf of each process and program it
+into the new MSR to communicate the process identity to platform hardware.
+ENQCMD uses the PASID stored in this MSR to tag requests from this process.
+Requests for DMA from the device are also tagged with the same PASID. The
+platform IOMMU uses the PASID in the transaction to perform address
+translation. The IOMMU api's setup the corresponding PASID entry in IOMMU
+with the process address used by the CPU (for e.g cr3 in x86).
+
+The MSR must be configured on each logical CPU before any application
+thread can interact with a device.  Threads that belong to the same
+process share the same page tables, thus the same MSR value.
+
+The PASID allocation and MSR programming may occur long after a process and
+its threads have been created. If a thread uses ENQCMD without the MSR
+first being populated, it will #GP.  The kernel will fix up the #GP by
+writing the process-wide PASID into the thread that took the #GP. A single
+process PASID can be used simultaneously with multiple devices since they
+all share the same address space.
+
+New threads could inherit the MSR value from the parent. But this would
+involve additional state management for those threads which may never use
+ENQCMD. Clearing the MSR at thread creation permits all threads to have a
+consistent behavior; the PASID is only programmed when the thread calls
+ENQCMD for the first time.
+
+Although ENQCMD can be executed in the kernel, there isn't any usage yet.
+Currently #GP handler doesn't fix up #GP triggered from ENQCMD executed
+in the kernel
+
+== Relationships ==
+
+ * Each process has many threads, but only one PASID
+ * Devices have a limited number (~10's to 1000's) of hardware
+   workqueues and each portal maps down to a single workqueue.
+   The device driver manages allocating hardware workqueues.
+ * A single mmap() maps a single hardware workqueue as a "portal"
+ * For each device with which a process interacts, there must be
+   one or more mmap()'d portals.
+ * Many threads within a process can share a single portal to access
+   a single device.
+ * Multiple processes can separately mmap() the same portal, in
+   which case they still share one device hardware workqueue.
+ * The single process-wide PASID is used by all threads to interact
+   with all devices.  There is not, for instance, a PASID for each
+   thread or each thread<->device pair.
+
+== FAQ ==
+
+* What is SVA/SVM?
+
+Shared Virtual Addressing (SVA) permits I/O hardware and the processor to
+work in the same address space. In short, sharing the address space. Some
+call it Shared Virtual Memory (SVM), but Linux community wanted to avoid
+it with Posix Shared Memory and Secure Virtual Machines which were terms
+already in circulation.
+
+* What is a PASID?
+
+A Process Address Space ID (PASID) is a PCIe-defined TLP Prefix. A PASID is
+a 20 bit number allocated and managed by the OS. PASID is included in all
+transactions between the platform and the device.
+
+* How are shared work queues different?
+
+Traditionally to allow user space applications interact with hardware,
+there is a separate instance required per process. For e.g. consider
+doorbells as a mechanism of informing hardware about work to process. Each
+doorbell is required to be spaced 4k (or page-size) apart for process
+isolation. This requires hardware to provision that space and reserve in
+MMIO. This doesn't scale as the number of threads becomes quite large. The
+hardware also manages the queue depth for Shared Work Queues (SWQ), and
+consumers don't need to track queue depth. If there is no space to accept
+a command, the device will return an error indicating retry. Also
+submitting a command to an MMIO address that can't accept ENQCMD will
+return retry in response.
+
+SWQ allows hardware to provision just a single address in the device. When
+used with ENQCMD to submit work, the device can distinguish the process
+submitting the work since it will include the PASID assigned to that
+process. This decreases the pressure of hardware requiring to support
+hardware to scale to a large number of processes.
+
+* Is this the same as a user space device driver?
+
+Communicating with the device via the shared work queue is much simpler
+than a full blown user space driver. The kernel driver does all the
+initialization of the hardware. User space only needs to worry about
+submitting work and processing completions.
+
+* Is this the same as SR-IOV?
+
+Single Root I/O Virtualization (SR-IOV) focuses on providing independent
+hardware interfaces for virtualizing hardware. Hence its required to be
+almost fully functional interface to software supporting the traditional
+BAR's, space for interrupts via MSI-x, its own register layout. Creating
+of this Virtual Functions (VFs) is assisted by the Physical Function (PF)
+driver.
+
+Scalable I/O Virtualization builds on the PASID concept to create device
+instances for virtualization. SIOV requires host software to assist in
+creating virtual devices, each virtual device is represented by a PASID.
+This allows device hardware to optimize device resource creation and can
+grow dynamically on demand. SR-IOV creation and management is very static
+in nature. Consult references below for more details.
+
+* Why not just create a virtual function for each app?
+
+Creating PCIe SRIOV type virtual functions (VF) are expensive. They create
+duplicated hardware for PCI config space requirements, Interrupts such as
+MSIx for instance. Resources such as interrupts have to be hard partitioned
+between VF's at creation time, and cannot scale dynamically on demand. The
+VF's are not completely independent from the Physical function (PF). Most
+VF's require some communication and assistance from the PF driver. SIOV
+creates a software defined device. Where all the configuration and control
+aspects are mediated via the slow path. The work submission and completion
+happen without any mediation.
+
+* Does this support virtualization?
+
+ENQCMD can be used from within a guest VM. In these cases the VMM helps
+with setting up a translation table to translate from Guest PASID to Host
+PASID. Please consult the ENQCMD instruction set reference for more
+details.
+
+* Does memory need to be pinned?
+
+When devices support SVA, along with platform hardware such as IOMMU
+supporting such devices, there is no need to pin memory for DMA purposes.
+Devices that support SVA also support other PCIe features that remove the
+pinning requirement for memory.
+
+Device TLB support - Device requests the IOMMU to lookup an address before
+use via Address Translation Service (ATS) requests.  If the mapping exists
+but there is no page allocated by the OS, IOMMU hardware returns that no
+mapping exists.
+
+Device requests that virtual address to be mapped via Page Request
+Interface (PRI). Once the OS has successfully completed  the mapping, it
+returns the response back to the device. The device continues again to
+request for a translation and continues.
+
+IOMMU works with the OS in managing consistency of page-tables with the
+device. When removing pages, it interacts with the device to remove any
+device-tlb that might have been cached before removing the mappings from
+the OS.
+
+== References ==
+
+VT-D:
+https://01.org/blogs/ashokraj/2018/recent-enhancements-intel-virtualization-technology-directed-i/o-intel-vt-d
+
+SIOV:
+https://01.org/blogs/2019/assignable-interfaces-intel-scalable-i/o-virtualization-linux
+
+ENQCMD in ISE:
+https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
+
+DSA spec:
+https://software.intel.com/sites/default/files/341204-intel-data-streaming-accelerator-spec.pdf
-- 
2.19.1


  reply	other threads:[~2020-03-30 20:38 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-30 19:33 [PATCH 0/7] x86: tag application address space for devices Fenghua Yu
2020-03-30 19:33 ` Fenghua Yu [this message]
2020-04-26 11:02   ` [PATCH 1/7] docs: x86: Add a documentation for ENQCMD Thomas Gleixner
2020-04-27 20:13     ` Fenghua Yu
2020-03-30 19:33 ` [PATCH 2/7] x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions Fenghua Yu
2020-04-26 11:06   ` Thomas Gleixner
2020-04-27 20:17     ` Fenghua Yu
2020-03-30 19:33 ` [PATCH 3/7] x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature Fenghua Yu
2020-04-26 11:17   ` Thomas Gleixner
2020-04-27 20:33     ` Fenghua Yu
2020-03-30 19:33 ` [PATCH 4/7] x86/msr-index: Define IA32_PASID MSR Fenghua Yu
2020-04-26 11:22   ` Thomas Gleixner
2020-04-27 20:50     ` Fenghua Yu
2020-03-30 19:33 ` [PATCH 5/7] x86/mmu: Allocate/free PASID Fenghua Yu
2020-04-26 14:55   ` Thomas Gleixner
2020-04-27 22:18     ` Fenghua Yu
2020-04-27 23:44       ` Thomas Gleixner
2020-04-28 18:21     ` Jacob Pan (Jun)
2020-04-28 18:54       ` Thomas Gleixner
2020-04-28 19:07         ` Luck, Tony
2020-04-28 20:42           ` Jacob Pan (Jun)
2020-04-28 20:59             ` Luck, Tony
2020-04-28 22:13               ` Jacob Pan (Jun)
2020-04-28 22:32                 ` Luck, Tony
2020-04-28 20:40         ` Jacob Pan (Jun)
2020-04-28 20:57     ` Fenghua Yu
2020-03-30 19:33 ` [PATCH 6/7] x86/traps: Fix up invalid PASID Fenghua Yu
2020-04-26 15:25   ` Thomas Gleixner
2020-04-27 20:11     ` Fenghua Yu
2020-04-28  0:13       ` Thomas Gleixner
2020-04-27 22:46     ` Raj, Ashok
2020-04-27 23:08       ` Luck, Tony
2020-04-28  0:20         ` Thomas Gleixner
2020-04-28  0:54       ` Thomas Gleixner
2020-04-28  1:08         ` Raj, Ashok
2020-03-30 19:33 ` [PATCH 7/7] x86/process: Clear PASID state for a newly forked/cloned thread Fenghua Yu
2020-04-22 20:41 ` [PATCH 0/7] x86: tag application address space for devices Fenghua Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1585596788-193989-2-git-send-email-fenghua.yu@intel.com \
    --to=fenghua.yu@intel.com \
    --cc=ashok.raj@intel.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dwmw2@infradead.org \
    --cc=hpa@zytor.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jacob.jun.pan@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=ravi.v.shankar@intel.com \
    --cc=sohil.mehta@intel.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).