linux-pci.vger.kernel.org archive mirror
* [PATCH 0/6] Add shared workqueue support for idxd driver
@ 2020-03-30 21:26 Dave Jiang
  2020-03-30 21:26 ` [PATCH 1/6] x86/asm: add iosubmit_cmds512_sync() based on enqcmds Dave Jiang
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Dave Jiang @ 2020-03-30 21:26 UTC (permalink / raw)
  To: vkoul, tglx, mingo, bp, hpa, bhelgaas, gregkh, arnd
  Cc: linux-kernel, x86, dmaengine, dan.j.williams, ashok.raj,
	fenghua.yu, linux-pci, tony.luck, jing.lin, sanjay.k.kumar

The patch series breaks down into the following parts:
Patch 1: x86 arch, add a new I/O accessor based on ENQCMDS
Patches 2,3: PCI
Patch 4: device
Patches 5,6: idxd driver shared WQ support

Driver stage 1 postings for context: [1]

This patch series is dependent on Fenghua's "Tag application address space
for devices" patch series for the ENQCMD CPU command enumeration and the
PASID MSR support. [2]

This patch series introduces shared workqueue (swq) support for the idxd
driver, which drives the Intel Data Streaming Accelerator (DSA). The stage 1
patch series posting supported only dedicated workqueues (dwq). The other
major feature being introduced is Shared Virtual Memory (SVM) support.

In this patch series, we introduce the iosubmit_cmds512_sync() helper.
This function enables calling the new ENQCMDS CPU instruction on Intel
CPUs. ENQCMDS is a ring 0 CPU instruction that performs a similar function
to ENQCMD. The CPUID capability bit enumerates ENQCMD support, which
implies that ENQCMDS is supported as well. The instruction creates a
non-posted write command on the PCIe bus and atomically writes all 64 bytes of a
command descriptor to the accelerator device's special BAR address. The device can
reject the command by setting the zero flag. See the Intel SDM [3] for additional
details. The ENQCMDS instruction is used for submitting command descriptors
to the swq. With this instruction, multiple "users" in the kernel can submit
to the swq with the synchronization done by the hardware. When the swq is
full, a 1 is returned to indicate that the wq is busy. This differs from a
dwq, where a submission to a full queue is silently dropped with no
response, so software must track the queue depth of a dwq itself.
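The contrast between the two submission models can be sketched in plain C.
This is a userspace model, not driver code; struct model_wq and the
swq_submit()/dwq_submit() names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

struct model_wq {
	int entries;	/* descriptors currently queued */
	int depth;	/* hardware queue depth */
	int dropped;	/* dwq only: silently dropped submissions */
};

/*
 * Shared WQ model: like ENQCMDS, a full queue is reported back to the
 * submitter (return 1, mirroring the zero flag), so no software
 * occupancy tracking is needed.
 */
static int swq_submit(struct model_wq *wq)
{
	if (wq->entries == wq->depth)
		return 1;	/* busy: caller may retry */
	wq->entries++;
	return 0;
}

/*
 * Dedicated WQ model: a write to a full queue is silently dropped with
 * no feedback, so the driver must track occupancy itself.
 */
static void dwq_submit(struct model_wq *wq)
{
	if (wq->entries == wq->depth) {
		wq->dropped++;	/* submitter never learns of this */
		return;
	}
	wq->entries++;
}
```

The model shows why a dwq driver must bound in-flight descriptors in
software while swq submitters can simply retry on busy.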

The cmdmem device capability attribute is being introduced to the PCI
device, with ongoing discussion of adding the ability to enumerate a PCIe
capability that advertises support for such device MMIO regions. A
device_supports_cmdmem(struct device *) helper function is introduced to
initially support devices that have this capability. Since a standardized
way to detect the capability is not yet available, a PCI quirk is defined
for the DSA device.

ENQCMDS moves a command descriptor (64 bytes) from memory to specific MMIO
regions on a device that supports this capability. Like MMIO, the "handle"
for these queues is just a pointer to the address of the queue in MMIO memory.
These specific MMIO regions on the device are called portals. To support
portals, wrapper functions around ioremap() are introduced to make it clear
that the special MMIO regions that expect a response from a non-posted PCI
write are being ioremapped.

SVM support is done through the Process Address Space ID (PASID). With
SVM support, the DMA descriptors can be programmed with virtual addresses
for source and destination. When a page fault is encountered, the device can
fault in the memory page needed to complete the operation through the
IOMMU.  This makes the dma_(un)map_* API calls unnecessary, which reduces
some software latency. Both swq and dwq can support SVM.

Swq enabling has been added to the idxd char device driver to allow
exposure of the swq to user space. User apps can call ENQCMD to submit
descriptors directly to a swq after calling mmap() on the portal. ENQCMD is
similar to the ENQCMDS instruction, but is only available to ring 3
(application) code. The primary difference is that ENQCMD obtains the PASID for the
request from the IA32_PASID MSR to ensure that all virtual addresses specified by
the user are interpreted in the address space of the process that executed
ENQCMD. With SVM enabled, the user no longer has to pin memory and program
an IO Virtual Address (IOVA) as the source and destination addresses. Virtual
addresses can be programmed into the command descriptor and submitted to the
device. The SVM handling is done through the use of PASID. For a more detailed
explanation of how ENQCMD and PASID interact, please refer to the cover letter
of Fenghua's submission [2]. Multiple user apps can easily share a swq with the
ENQCMD instruction without needing software synchronization.
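As a rough illustration of the user-space flow, here is a hedged sketch:
the descriptor layout below is a placeholder (the real format is defined in
the DSA specification [5]), the opcode value is illustrative, and the actual
submission step appears only in comments since ENQCMD requires hardware and
an mmap()ed portal:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Placeholder 64-byte command descriptor; field names and the opcode
 * value are illustrative, not the real DSA layout.
 */
struct dsa_desc {
	uint32_t flags;
	uint32_t opcode;
	uint64_t completion_addr;
	uint64_t src_addr;	/* plain virtual addresses under SVM */
	uint64_t dst_addr;
	uint32_t xfer_size;
	uint8_t osv[28];	/* pad the structure out to 64 bytes */
};

static void build_memmove_desc(struct dsa_desc *d, void *dst,
			       const void *src, uint32_t len)
{
	memset(d, 0, sizeof(*d));
	d->opcode = 0x3;	/* illustrative opcode value */
	d->src_addr = (uint64_t)(uintptr_t)src;
	d->dst_addr = (uint64_t)(uintptr_t)dst;
	d->xfer_size = len;
	/*
	 * Submission (hardware only, sketched):
	 *   portal = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED, wq_fd, 0);
	 *   retry ENQCMD(portal, d) while the zero flag reports busy;
	 * the PASID is supplied implicitly from the IA32_PASID MSR.
	 */
}
```

Note that only the destination (the portal) has an alignment requirement;
the source descriptor in memory does not, though it must be exactly 64
bytes.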

[1]: https://lore.kernel.org/lkml/157965011794.73301.15960052071729101309.stgit@djiang5-desk3.ch.intel.com/
[2]: https://lore.kernel.org/lkml/1585596788-193989-1-git-send-email-fenghua.yu@intel.com/
[3]: https://software.intel.com/en-us/articles/intel-sdm
[4]: https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[5]: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification
[6]: https://01.org/blogs/2019/introducing-intel-data-streaming-accelerator
[7]: https://intel.github.io/idxd/
[8]: https://github.com/intel/idxd-driver idxd-stage2

---

Dave Jiang (6):
      x86/asm: add iosubmit_cmds512_sync() based on enqcmds
      device/pci: add cmdmem cap to pci_dev
      pci: add PCI quirk cmdmem fixup for Intel DSA device
      device: add cmdmem support for MMIO address
      dmaengine: idxd: add shared workqueue support
      dmaengine: idxd: add ABI documentation for shared wq


 Documentation/ABI/stable/sysfs-driver-dma-idxd |   14 ++
 arch/x86/include/asm/io.h                      |   37 ++++++
 drivers/base/core.c                            |   13 ++
 drivers/dma/Kconfig                            |    4 +
 drivers/dma/idxd/cdev.c                        |   46 +++++++-
 drivers/dma/idxd/device.c                      |  122 ++++++++++++++++++--
 drivers/dma/idxd/dma.c                         |    2 
 drivers/dma/idxd/idxd.h                        |   13 ++
 drivers/dma/idxd/init.c                        |   92 ++++++++++++---
 drivers/dma/idxd/irq.c                         |  147 ++++++++++++++++++++++--
 drivers/dma/idxd/submit.c                      |  119 ++++++++++++++-----
 drivers/dma/idxd/sysfs.c                       |  133 ++++++++++++++++++++++
 drivers/pci/quirks.c                           |   11 ++
 include/linux/device.h                         |    2 
 include/linux/io.h                             |    4 +
 include/linux/pci.h                            |    1 
 lib/devres.c                                   |   36 ++++++
 17 files changed, 721 insertions(+), 75 deletions(-)

--

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/6] x86/asm: add iosubmit_cmds512_sync() based on enqcmds
  2020-03-30 21:26 [PATCH 0/6] Add shared workqueue support for idxd driver Dave Jiang
@ 2020-03-30 21:26 ` Dave Jiang
  2020-03-30 21:27 ` [PATCH 2/6] device/pci: add cmdmem cap to pci_dev Dave Jiang
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Dave Jiang @ 2020-03-30 21:26 UTC (permalink / raw)
  To: vkoul, tglx, mingo, bp, hpa, bhelgaas, gregkh, arnd
  Cc: linux-kernel, x86, dmaengine, dan.j.williams, ashok.raj,
	fenghua.yu, linux-pci, tony.luck, jing.lin, sanjay.k.kumar

ENQCMDS is a non-posted instruction introduced to submit 64B descriptors to
accelerator devices. The CPU instruction sets the zero flag to 1 if the
device rejects the submission. The zero flag is also set to 1 if the
destination is not MMIO and/or the device does not respond.
iosubmit_cmds512_sync() is introduced to support this CPU instruction and
to allow multiple descriptors to be copied to the same MMIO location. This
allows the caller to issue multiple descriptors that are virtually
contiguous in memory if desired.

ENQCMDS requires the destination address to be 64-byte aligned. No
alignment restriction is enforced for the source operand.

See Intel Software Developer’s Manual for more information on the
instruction.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 arch/x86/include/asm/io.h |   37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e1aa17a468a8..349e97766c02 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -435,4 +435,41 @@ static inline void iosubmit_cmds512(void __iomem *__dst, const void *src,
 	}
 }
 
+/**
+ * iosubmit_cmds512_sync - copy data to single MMIO location, in 512-bit units
+ * @dst: destination, in MMIO space (must be 512-bit aligned)
+ * @src: source
+ * @count: number of 512-bit quantities to submit
+ *
+ * Submit data from kernel space to MMIO space, in units of 512 bits at a
+ * time. Order of access is not guaranteed, nor is a memory barrier
+ * performed afterwards. On failure, the number of units not yet submitted
+ * is returned; 0 is returned on success.
+ *
+ * Warning: Do not use this helper unless your driver has checked that the CPU
+ * instruction is supported on the platform.
+ */
+static inline size_t iosubmit_cmds512_sync(void __iomem *dst, const void *src,
+					   size_t count)
+{
+	const u8 *from = src;
+	const u8 *end = from + count * 64;
+	size_t remain = count;
+	bool retry;
+
+	while (from < end) {
+		/* ENQCMDS [rdx], rax */
+		asm volatile(".byte 0xf3, 0x0f, 0x38, 0xf8, 0x02, 0x66, 0x90\t\n"
+			     "setz %0\t\n"
+			     : "=r"(retry) : "a" (dst), "d" (from));
+		if (retry)
+			return remain;
+
+		from += 64;
+		remain--;
+	}
+
+	return 0;
+}
+
 #endif /* _ASM_X86_IO_H */
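The return convention (remaining count on busy, 0 on success) suggests a
retry loop in callers. Below is a userspace model of that convention:
submit_model() stands in for iosubmit_cmds512_sync(), which needs
ENQCMDS-capable hardware, and submit_all() is an illustrative caller
pattern, not part of the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for iosubmit_cmds512_sync(): accepts 64-byte units while the
 * modeled queue has room, then reports how many were not submitted,
 * mirroring the helper's return convention.
 */
static size_t submit_model(size_t *queue_room, const uint8_t *src,
			   size_t count)
{
	size_t remain = count;

	while (remain) {
		if (*queue_room == 0)
			return remain;	/* busy: zero flag was set */
		(*queue_room)--;
		src += 64;
		remain--;
	}
	return 0;
}

/* Illustrative caller pattern: resubmit the tail that did not make it in. */
static size_t submit_all(size_t *queue_room, const uint8_t *src,
			 size_t count, int max_retries)
{
	size_t remain = count;

	while (remain && max_retries--) {
		size_t done = count - remain;

		remain = submit_model(queue_room, src + done * 64, remain);
	}
	return remain;	/* 0 if everything was eventually accepted */
}
```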


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/6] device/pci: add cmdmem cap to pci_dev
  2020-03-30 21:26 [PATCH 0/6] Add shared workqueue support for idxd driver Dave Jiang
  2020-03-30 21:26 ` [PATCH 1/6] x86/asm: add iosubmit_cmds512_sync() based on enqcmds Dave Jiang
@ 2020-03-30 21:27 ` Dave Jiang
  2020-03-31 10:04   ` Greg KH
  2020-03-31 16:03   ` Bjorn Helgaas
  2020-03-30 21:27 ` [PATCH 3/6] pci: add PCI quirk cmdmem fixup for Intel DSA device Dave Jiang
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 22+ messages in thread
From: Dave Jiang @ 2020-03-30 21:27 UTC (permalink / raw)
  To: vkoul, tglx, mingo, bp, hpa, bhelgaas, gregkh, arnd
  Cc: linux-kernel, x86, dmaengine, dan.j.williams, ashok.raj,
	fenghua.yu, linux-pci, tony.luck, jing.lin, sanjay.k.kumar

Since current accelerator devices do not yet have standard PCIe capability
enumeration for accepting ENQCMDS, for now a pdev->cmdmem attribute has
been added to struct pci_dev.  Currently a PCI quirk must be used for the
devices that have the capability until a PCI capability is standardized. Add
a helper function to check whether a device supports the cmdmem capability.

Such a capability is expected to be added to PCIe device capability
enumeration in the future.
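The intent of the helper can be modeled in plain C. This is a userspace
sketch, with the illustrative struct model_dev standing in for the real
struct device / struct pci_dev pair:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the relevant device state. */
struct model_dev {
	bool is_pci;	/* models dev_is_pci() */
	bool cmdmem;	/* set by the PCI quirk for known devices */
};

/* Mirrors device_supports_cmdmem(): only PCI devices can have the cap. */
static bool model_supports_cmdmem(const struct model_dev *dev)
{
	if (!dev->is_pci)
		return false;
	return dev->cmdmem;
}
```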

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/base/core.c    |   13 +++++++++++++
 include/linux/device.h |    2 ++
 include/linux/pci.h    |    1 +
 3 files changed, 16 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index dbb0f9130f42..cd9f5b040ed4 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -27,6 +27,7 @@
 #include <linux/netdevice.h>
 #include <linux/sched/signal.h>
 #include <linux/sysfs.h>
+#include <linux/pci.h>
 
 #include "base.h"
 #include "power/power.h"
@@ -3790,3 +3791,15 @@ int device_match_any(struct device *dev, const void *unused)
 	return 1;
 }
 EXPORT_SYMBOL_GPL(device_match_any);
+
+bool device_supports_cmdmem(struct device *dev)
+{
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(dev))
+		return false;
+
+	pdev = to_pci_dev(dev);
+	return pdev->cmdmem;
+}
+EXPORT_SYMBOL_GPL(device_supports_cmdmem);
diff --git a/include/linux/device.h b/include/linux/device.h
index fa04dfd22bbc..3e787d3de435 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -809,6 +809,8 @@ static inline bool dev_has_sync_state(struct device *dev)
 	return false;
 }
 
+extern bool device_supports_cmdmem(struct device *dev);
+
 /*
  * High level routines for use by the bus drivers
  */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3840a541a9de..0bc5d581f20e 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -422,6 +422,7 @@ struct pci_dev {
 	unsigned int	is_probed:1;		/* Device probing in progress */
 	unsigned int	link_active_reporting:1;/* Device capable of reporting link active */
 	unsigned int	no_vf_scan:1;		/* Don't scan for VFs after IOV enablement */
+	unsigned int	cmdmem:1;		/* MMIO writes support command mem region with synchronous write notification */
 	pci_dev_flags_t dev_flags;
 	atomic_t	enable_cnt;	/* pci_enable_device has been called */
 


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/6] pci: add PCI quirk cmdmem fixup for Intel DSA device
  2020-03-30 21:26 [PATCH 0/6] Add shared workqueue support for idxd driver Dave Jiang
  2020-03-30 21:26 ` [PATCH 1/6] x86/asm: add iosubmit_cmds512_sync() based on enqcmds Dave Jiang
  2020-03-30 21:27 ` [PATCH 2/6] device/pci: add cmdmem cap to pci_dev Dave Jiang
@ 2020-03-30 21:27 ` Dave Jiang
  2020-03-31 15:59   ` Bjorn Helgaas
  2020-04-01  7:18   ` Christoph Hellwig
  2020-03-30 21:27 ` [PATCH 4/6] device: add cmdmem support for MMIO address Dave Jiang
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 22+ messages in thread
From: Dave Jiang @ 2020-03-30 21:27 UTC (permalink / raw)
  To: vkoul, tglx, mingo, bp, hpa, bhelgaas, gregkh, arnd
  Cc: linux-kernel, x86, dmaengine, dan.j.williams, ashok.raj,
	fenghua.yu, linux-pci, tony.luck, jing.lin, sanjay.k.kumar

Since there is no standard way to identify a PCI device that receives
descriptors or commands with synchronous write operations, add a quirk to
set cmdmem for the Intel accelerator device that supports it.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/pci/quirks.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 29f473ebf20f..ba0572b9b9c8 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5461,3 +5461,14 @@ static void quirk_reset_lenovo_thinkpad_p50_nvgpu(struct pci_dev *pdev)
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, 0x13b1,
 			      PCI_CLASS_DISPLAY_VGA, 8,
 			      quirk_reset_lenovo_thinkpad_p50_nvgpu);
+
+/*
+ * Until the PCI SIG defines a standard capability check that indicates a
+ * device has cmdmem with synchronous write capability, add a quirk
+ * for devices that support it.
+ */
+static void device_cmdmem_fixup(struct pci_dev *pdev)
+{
+	pdev->cmdmem = 1;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0b25, device_cmdmem_fixup);


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/6] device: add cmdmem support for MMIO address
  2020-03-30 21:26 [PATCH 0/6] Add shared workqueue support for idxd driver Dave Jiang
                   ` (2 preceding siblings ...)
  2020-03-30 21:27 ` [PATCH 3/6] pci: add PCI quirk cmdmem fixup for Intel DSA device Dave Jiang
@ 2020-03-30 21:27 ` Dave Jiang
  2020-04-01  7:19   ` Christoph Hellwig
  2020-03-30 21:27 ` [PATCH 5/6] dmaengine: idxd: add shared workqueue support Dave Jiang
  2020-03-30 21:27 ` [PATCH 6/6] dmaengine: idxd: add ABI documentation for shared wq Dave Jiang
  5 siblings, 1 reply; 22+ messages in thread
From: Dave Jiang @ 2020-03-30 21:27 UTC (permalink / raw)
  To: vkoul, tglx, mingo, bp, hpa, bhelgaas, gregkh, arnd
  Cc: linux-kernel, x86, dmaengine, dan.j.williams, ashok.raj,
	fenghua.yu, linux-pci, tony.luck, jing.lin, sanjay.k.kumar

With the introduction of the ENQCMDS CPU instruction on Intel CPUs, a
number of accelerator devices that support accepting data via ENQCMDS will
be arriving. Add devm_cmdmem_remap/unmap() wrappers to remap BAR memory and
to specifically denote that these regions are of the cmdmem behavior type
even though they are iomem.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 include/linux/io.h |    4 ++++
 lib/devres.c       |   36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/include/linux/io.h b/include/linux/io.h
index b1c44bb4b2d7..2b3356244553 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -79,6 +79,10 @@ void devm_memunmap(struct device *dev, void *addr);
 
 void *__devm_memremap_pages(struct device *dev, struct resource *res);
 
+void __iomem *devm_cmdmem_remap(struct device *dev, resource_size_t offset,
+				 resource_size_t size);
+void devm_cmdmem_unmap(struct device *dev, void __iomem *addr);
+
 #ifdef CONFIG_PCI
 /*
  * The PCI specifications (Rev 3.0, 3.2.5 "Transaction Ordering and
diff --git a/lib/devres.c b/lib/devres.c
index 6ef51f159c54..a14a49087b37 100644
--- a/lib/devres.c
+++ b/lib/devres.c
@@ -218,6 +218,42 @@ void __iomem *devm_of_iomap(struct device *dev, struct device_node *node, int in
 }
 EXPORT_SYMBOL(devm_of_iomap);
 
+/**
+ * devm_cmdmem_remap - Managed wrapper for cmdmem ioremap()
+ * @dev: Generic device to remap IO address for
+ * @offset: Resource address to map
+ * @size: Size of map
+ *
+ * Managed cmdmem ioremap() wrapper.  Map is automatically unmapped on
+ * driver detach.
+ */
+void __iomem *devm_cmdmem_remap(struct device *dev, resource_size_t offset,
+				 resource_size_t size)
+{
+	if (!device_supports_cmdmem(dev))
+		return NULL;
+
+	return devm_ioremap(dev, offset, size);
+}
+EXPORT_SYMBOL(devm_cmdmem_remap);
+
+/**
+ * devm_cmdmem_unmap - Managed wrapper for cmdmem iounmap()
+ * @dev: Generic device to unmap for
+ * @addr: Address to unmap
+ *
+ * Managed cmdmem iounmap().  @addr must have been mapped using
+ * devm_cmdmem_remap*().
+ */
+void devm_cmdmem_unmap(struct device *dev, void __iomem *addr)
+{
+	if (!device_supports_cmdmem(dev))
+		return;
+
+	devm_iounmap(dev, addr);
+}
+EXPORT_SYMBOL(devm_cmdmem_unmap);
+
 #ifdef CONFIG_HAS_IOPORT_MAP
 /*
  * Generic iomap devres


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/6] dmaengine: idxd: add shared workqueue support
  2020-03-30 21:26 [PATCH 0/6] Add shared workqueue support for idxd driver Dave Jiang
                   ` (3 preceding siblings ...)
  2020-03-30 21:27 ` [PATCH 4/6] device: add cmdmem support for MMIO address Dave Jiang
@ 2020-03-30 21:27 ` Dave Jiang
  2020-03-30 21:27 ` [PATCH 6/6] dmaengine: idxd: add ABI documentation for shared wq Dave Jiang
  5 siblings, 0 replies; 22+ messages in thread
From: Dave Jiang @ 2020-03-30 21:27 UTC (permalink / raw)
  To: vkoul, tglx, mingo, bp, hpa, bhelgaas, gregkh, arnd
  Cc: linux-kernel, x86, dmaengine, dan.j.williams, ashok.raj,
	fenghua.yu, linux-pci, tony.luck, jing.lin, sanjay.k.kumar

Add shared workqueue support, including support for Shared Virtual
Memory (SVM), also known as On Demand Paging (ODP). The shared
workqueue uses the ENQCMDS instruction in the kernel and responds with a
retry if the workqueue is full. Shared workqueues only work when the
IOMMU provides PASID support.
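The retry-on-full policy can be sketched as a userspace model. Names here
are illustrative (the real logic is in idxd_submit_desc() in the diff
below), and the busy counter mocks the hardware's zero-flag response:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum model_op { OP_NONBLOCK, OP_BLOCK };	/* models enum idxd_op_type */

/* Returns true once the modeled shared WQ accepts the descriptor. */
static bool try_submit(int *busy_count)
{
	if (*busy_count > 0) {
		(*busy_count)--;	/* ENQCMDS reported busy */
		return false;
	}
	return true;
}

/*
 * Submission policy model: nonblocking callers see -EBUSY immediately,
 * blocking callers retry until the queue accepts the descriptor.
 */
static int model_submit(int *busy_count, enum model_op op)
{
	while (!try_submit(busy_count)) {
		if (op == OP_NONBLOCK)
			return -EBUSY;
		/* OP_BLOCK: in the driver this would cpu_relax() and retry */
	}
	return 0;
}
```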

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/Kconfig       |    4 +
 drivers/dma/idxd/cdev.c   |   46 ++++++++++++++
 drivers/dma/idxd/device.c |  122 +++++++++++++++++++++++++++++++++++--
 drivers/dma/idxd/dma.c    |    2 -
 drivers/dma/idxd/idxd.h   |   13 +++-
 drivers/dma/idxd/init.c   |   92 +++++++++++++++++++++++-----
 drivers/dma/idxd/irq.c    |  147 +++++++++++++++++++++++++++++++++++++++++----
 drivers/dma/idxd/submit.c |  119 +++++++++++++++++++++++++++---------
 drivers/dma/idxd/sysfs.c  |  133 +++++++++++++++++++++++++++++++++++++++++
 9 files changed, 603 insertions(+), 75 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 5142da401db3..81b7848c1edb 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -286,6 +286,10 @@ config INTEL_IDXD
 	depends on PCI && X86_64
 	select DMA_ENGINE
 	select SBITMAP
+	select INTEL_IOMMU_SVM
+	select PCI_PRI
+	select PCI_PASID
+	select PCI_IOV
 	help
 	  Enable support for the Intel(R) data accelerators present
 	  in Intel Xeon CPU.
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index ff49847e37a8..27be9250606d 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -31,6 +31,7 @@ static struct idxd_cdev_context ictx[IDXD_TYPE_MAX] = {
 
 struct idxd_user_context {
 	struct idxd_wq *wq;
+	int pasid;
 	struct task_struct *task;
 	unsigned int flags;
 };
@@ -74,6 +75,7 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
 	struct idxd_device *idxd;
 	struct idxd_wq *wq;
 	struct device *dev;
+	int rc;
 
 	wq = inode_wq(inode);
 	idxd = wq->idxd;
@@ -90,8 +92,34 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
 
 	ctx->wq = wq;
 	filp->private_data = ctx;
+
+	if (idxd->pasid_enabled) {
+		ctx->task = current;
+		get_task_struct(current);
+
+		rc = intel_svm_bind_mm(dev, &ctx->pasid, 0, NULL);
+		if (rc < 0) {
+			dev_err(dev, "pasid allocation failed: %d\n", rc);
+			goto failed;
+		}
+
+		if (wq_dedicated(wq)) {
+			rc = idxd_wq_set_pasid(wq, ctx->pasid);
+			if (rc < 0) {
+				dev_err(dev, "wq set pasid failed: %d\n", rc);
+				goto failed;
+			}
+		}
+	}
+
 	idxd_wq_get(wq);
 	return 0;
+
+failed:
+	if (ctx->task)
+		put_task_struct(current);
+	kfree(ctx);
+	return rc;
 }
 
 static int idxd_cdev_release(struct inode *node, struct file *filep)
@@ -100,10 +128,26 @@ static int idxd_cdev_release(struct inode *node, struct file *filep)
 	struct idxd_wq *wq = ctx->wq;
 	struct idxd_device *idxd = wq->idxd;
 	struct device *dev = &idxd->pdev->dev;
+	int rc;
 
 	dev_dbg(dev, "%s called\n", __func__);
 	filep->private_data = NULL;
 
+	if (idxd->pasid_enabled) {
+		rc = idxd_device_drain_pasid(idxd, ctx->pasid);
+		if (rc < 0)
+			dev_err(dev, "Failed to drain pasid: %d\n",
+				ctx->pasid);
+		intel_svm_unbind_mm(&idxd->pdev->dev, ctx->pasid);
+		put_task_struct(ctx->task);
+
+		if (wq_dedicated(wq)) {
+			rc = idxd_wq_disable_pasid(wq);
+			if (rc < 0)
+				dev_err(dev, "wq disable pasid failed.\n");
+		}
+	}
+
 	kfree(ctx);
 	idxd_wq_put(wq);
 	return 0;
@@ -140,7 +184,7 @@ static int idxd_cdev_mmap(struct file *filp, struct vm_area_struct *vma)
 	if (rc < 0)
 		return rc;
 
-	vma->vm_flags |= VM_DONTCOPY;
+	vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_WIPEONFORK;
 	pfn = (base + idxd_get_wq_portal_full_offset(wq->id,
 				IDXD_PORTAL_LIMITED)) >> PAGE_SHIFT;
 	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index f6f49f0f6fae..9a785fef8879 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -298,10 +298,19 @@ int idxd_wq_map_portal(struct idxd_wq *wq)
 	start = pci_resource_start(pdev, IDXD_WQ_BAR);
 	start = start + wq->id * IDXD_PORTAL_SIZE;
 
-	wq->dportal = devm_ioremap(dev, start, IDXD_PORTAL_SIZE);
-	if (!wq->dportal)
-		return -ENOMEM;
-	dev_dbg(dev, "wq %d portal mapped at %p\n", wq->id, wq->dportal);
+	if (wq_dedicated(wq)) {
+		wq->portal = devm_ioremap(dev, start, IDXD_PORTAL_SIZE);
+		if (!wq->portal)
+			return -ENOMEM;
+		dev_dbg(dev, "dedicated wq %d portal mapped at %p\n",
+			wq->id, wq->portal);
+	} else {
+		wq->portal = devm_cmdmem_remap(dev, start, IDXD_PORTAL_SIZE);
+		if (!wq->portal)
+			return -ENOMEM;
+		dev_dbg(dev, "shared wq %d portal mapped at %p\n",
+			wq->id, wq->portal);
+	}
 
 	return 0;
 }
@@ -310,7 +319,63 @@ void idxd_wq_unmap_portal(struct idxd_wq *wq)
 {
 	struct device *dev = &wq->idxd->pdev->dev;
 
-	devm_iounmap(dev, wq->dportal);
+	if (wq_dedicated(wq))
+		devm_iounmap(dev, wq->portal);
+	else
+		devm_cmdmem_unmap(dev, wq->portal);
+}
+
+int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid)
+{
+	struct idxd_device *idxd = wq->idxd;
+	int rc;
+	union wqcfg wqcfg;
+	unsigned int offset;
+
+	lockdep_assert_held(&idxd->dev_lock);
+
+	rc = idxd_wq_disable(wq);
+	if (rc < 0)
+		return rc;
+
+	offset = idxd->wqcfg_offset + wq->id * sizeof(wqcfg);
+	offset += sizeof(u32) * 2;
+	wqcfg.bits[2] = ioread32(idxd->reg_base + offset);
+	wqcfg.pasid_en = 1;
+	wqcfg.pasid = pasid;
+	iowrite32(wqcfg.bits[2], idxd->reg_base + offset);
+
+	rc = idxd_wq_enable(wq);
+	if (rc < 0)
+		return rc;
+
+	return 0;
+}
+
+int idxd_wq_disable_pasid(struct idxd_wq *wq)
+{
+	struct idxd_device *idxd = wq->idxd;
+	int rc;
+	union wqcfg wqcfg;
+	unsigned int offset;
+
+	lockdep_assert_held(&idxd->dev_lock);
+
+	rc = idxd_wq_disable(wq);
+	if (rc < 0)
+		return rc;
+
+	offset = idxd->wqcfg_offset + wq->id * sizeof(wqcfg);
+	wqcfg.bits[2] = ioread32(idxd->reg_base + offset);
+	wqcfg.pasid_en = 0;
+	wqcfg.pasid = 0;
+	iowrite32(wqcfg.bits[2], idxd->reg_base + offset);
+
+	rc = idxd_wq_enable(wq);
+	if (rc < 0)
+		return rc;
+
+	return 0;
 }
 
 /* Device control bits */
@@ -454,6 +519,35 @@ int idxd_device_reset(struct idxd_device *idxd)
 	return rc;
 }
 
+int idxd_device_drain_pasid(struct idxd_device *idxd, int pasid)
+{
+	int rc;
+	struct device *dev = &idxd->pdev->dev;
+	u32 operand, status;
+
+	lockdep_assert_held(&idxd->dev_lock);
+
+	dev_dbg(dev, "Drain pasid %d\n", pasid);
+
+	operand = pasid;
+	dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_DRAIN_PASID, operand);
+	rc = idxd_cmd_send(idxd, IDXD_CMD_DRAIN_PASID, operand);
+	if (rc < 0)
+		return rc;
+
+	rc = idxd_cmd_wait(idxd, &status, IDXD_DRAIN_TIMEOUT);
+	if (rc < 0)
+		return rc;
+
+	if (status != IDXD_CMDSTS_SUCCESS) {
+		dev_dbg(dev, "pasid drain failed: %#x\n", status);
+		return -ENXIO;
+	}
+
+	dev_dbg(dev, "pasid %d drained\n", pasid);
+	return 0;
+}
+
 /* Device configuration bits */
 static void idxd_group_config_write(struct idxd_group *group)
 {
@@ -539,11 +633,21 @@ static int idxd_wq_config_write(struct idxd_wq *wq)
 	wq->wqcfg.wq_thresh = wq->threshold;
 
 	/* byte 8-11 */
-	wq->wqcfg.priv = !!(wq->type == IDXD_WQT_KERNEL);
-	wq->wqcfg.mode = 1;
+	wq->wqcfg.priv = wq->type == IDXD_WQT_KERNEL ? 1 : 0;
+	if (wq_dedicated(wq))
+		wq->wqcfg.mode = 1;
+
+	if (idxd->pasid_enabled) {
+		wq->wqcfg.pasid_en = 1;
+		wq->wqcfg.pasid = idxd->pasid;
+	}
 
 	wq->wqcfg.priority = wq->priority;
 
+	if (idxd->hw.gen_cap.block_on_fault &&
+	    test_bit(WQ_FLAG_BOF, &wq->flags))
+		wq->wqcfg.bof = 1;
+
 	/* bytes 12-15 */
 	wq->wqcfg.max_xfer_shift = idxd->hw.gen_cap.max_xfer_shift;
 	wq->wqcfg.max_batch_shift = idxd->hw.gen_cap.max_batch_shift;
@@ -651,8 +755,8 @@ static int idxd_wqs_setup(struct idxd_device *idxd)
 		if (!wq->size)
 			continue;
 
-		if (!wq_dedicated(wq)) {
-			dev_warn(dev, "No shared workqueue support.\n");
+		if (!wq_dedicated(wq) && !idxd->pasid_enabled) {
+			dev_warn(dev, "No pasid support but shared queue.\n");
 			return -EINVAL;
 		}
 
diff --git a/drivers/dma/idxd/dma.c b/drivers/dma/idxd/dma.c
index c64c1429d160..9a4f78519e57 100644
--- a/drivers/dma/idxd/dma.c
+++ b/drivers/dma/idxd/dma.c
@@ -154,7 +154,7 @@ dma_cookie_t idxd_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 
 	cookie = dma_cookie_assign(tx);
 
-	rc = idxd_submit_desc(wq, desc);
+	rc = idxd_submit_desc(wq, desc, IDXD_OP_BLOCK);
 	if (rc < 0) {
 		idxd_free_desc(wq, desc);
 		return rc;
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index b8f8a363b4a7..2e96dec04eda 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -59,6 +59,7 @@ enum idxd_wq_state {
 
 enum idxd_wq_flag {
 	WQ_FLAG_DEDICATED = 0,
+	WQ_FLAG_BOF,
 };
 
 enum idxd_wq_type {
@@ -86,10 +87,11 @@ enum idxd_op_type {
 enum idxd_complete_type {
 	IDXD_COMPLETE_NORMAL = 0,
 	IDXD_COMPLETE_ABORT,
+	IDXD_COMPLETE_DEV_FAIL,
 };
 
 struct idxd_wq {
-	void __iomem *dportal;
+	void __iomem *portal;
 	struct device conf_dev;
 	struct idxd_cdev idxd_cdev;
 	struct idxd_device *idxd;
@@ -165,6 +167,9 @@ struct idxd_device {
 	struct idxd_wq *wqs;
 	struct idxd_engine *engines;
 
+	bool pasid_enabled;
+	int pasid;
+
 	int num_groups;
 
 	u32 msix_perm_offset;
@@ -282,6 +287,7 @@ int __idxd_device_reset(struct idxd_device *idxd);
 void idxd_device_cleanup(struct idxd_device *idxd);
 int idxd_device_config(struct idxd_device *idxd);
 void idxd_device_wqs_clear_state(struct idxd_device *idxd);
+int idxd_device_drain_pasid(struct idxd_device *idxd, int pasid);
 
 /* work queue control */
 int idxd_wq_alloc_resources(struct idxd_wq *wq);
@@ -290,9 +296,12 @@ int idxd_wq_enable(struct idxd_wq *wq);
 int idxd_wq_disable(struct idxd_wq *wq);
 int idxd_wq_map_portal(struct idxd_wq *wq);
 void idxd_wq_unmap_portal(struct idxd_wq *wq);
+int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid);
+int idxd_wq_disable_pasid(struct idxd_wq *wq);
 
 /* submission */
-int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc);
+int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc,
+		     enum idxd_op_type optype);
 struct idxd_desc *idxd_alloc_desc(struct idxd_wq *wq, enum idxd_op_type optype);
 void idxd_free_desc(struct idxd_wq *wq, struct idxd_desc *desc);
 
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 7778c05deb5d..f3afd47ec782 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -14,6 +14,7 @@
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/device.h>
 #include <linux/idr.h>
+#include <linux/intel-svm.h>
 #include <uapi/linux/idxd.h>
 #include <linux/dmaengine.h>
 #include "../dmaengine.h"
@@ -53,6 +54,7 @@ static int idxd_setup_interrupts(struct idxd_device *idxd)
 	struct idxd_irq_entry *irq_entry;
 	int i, msixcnt;
 	int rc = 0;
+	union msix_perm mperm;
 
 	msixcnt = pci_msix_vec_count(pdev);
 	if (msixcnt < 0) {
@@ -131,6 +133,14 @@ static int idxd_setup_interrupts(struct idxd_device *idxd)
 
 	idxd_unmask_error_interrupts(idxd);
 
+	/* Setup MSIX permission table */
+	mperm.bits = 0;
+	mperm.pasid = idxd->pasid;
+	mperm.pasid_en = idxd->pasid_enabled;
+	for (i = 1; i < msixcnt; i++)
+		iowrite32(mperm.bits, idxd->reg_base +
+				idxd->msix_perm_offset + i * 8);
+
 	return 0;
 
  err_no_irq:
@@ -272,8 +282,7 @@ static void idxd_read_caps(struct idxd_device *idxd)
 	}
 }
 
-static struct idxd_device *idxd_alloc(struct pci_dev *pdev,
-				      void __iomem * const *iomap)
+static struct idxd_device *idxd_alloc(struct pci_dev *pdev)
 {
 	struct device *dev = &pdev->dev;
 	struct idxd_device *idxd;
@@ -283,12 +292,55 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev,
 		return NULL;
 
 	idxd->pdev = pdev;
-	idxd->reg_base = iomap[IDXD_MMIO_BAR];
 	spin_lock_init(&idxd->dev_lock);
 
 	return idxd;
 }
 
+static void idxd_enable_system_pasid(struct idxd_device *idxd)
+{
+	int rc, flags, pasid;
+	struct pci_dev *pdev = idxd->pdev;
+
+	/*
+	 * If CPU does not have enqcmds support then there's no point in
+	 * enabling pasid.
+	 */
+	if (!pdev->cmdmem)
+		return;
+
+	flags = SVM_FLAG_SUPERVISOR_MODE;
+
+	rc = intel_svm_bind_mm(&idxd->pdev->dev, &pasid, flags, NULL);
+	if (rc < 0) {
+		dev_warn(&idxd->pdev->dev,
+			 "system pasid allocation failed: %d\n", rc);
+		idxd->pasid_enabled = false;
+		return;
+	}
+
+	idxd->pasid_enabled = true;
+	idxd->pasid = pasid;
+	dev_dbg(&idxd->pdev->dev, "system pasid: %d\n", pasid);
+}
+
+static int idxd_disable_system_pasid(struct idxd_device *idxd)
+{
+	int rc;
+
+	if (idxd->pasid_enabled) {
+		rc = intel_svm_unbind_mm(&idxd->pdev->dev, idxd->pasid);
+		if (rc < 0) {
+			dev_warn(&idxd->pdev->dev,
+				 "system pasid unbind failed: %d\n",
+					rc);
+			return rc;
+		}
+	}
+	idxd->pasid_enabled = false;
+	return 0;
+}
+
 static int idxd_probe(struct idxd_device *idxd)
 {
 	struct pci_dev *pdev = idxd->pdev;
@@ -301,6 +353,7 @@ static int idxd_probe(struct idxd_device *idxd)
 		return rc;
 	dev_dbg(dev, "IDXD reset complete\n");
 
+	idxd_enable_system_pasid(idxd);
 	idxd_read_caps(idxd);
 	idxd_read_table_offsets(idxd);
 
@@ -331,29 +384,31 @@ static int idxd_probe(struct idxd_device *idxd)
 	idxd_mask_error_interrupts(idxd);
 	idxd_mask_msix_vectors(idxd);
  err_setup:
+	idxd_disable_system_pasid(idxd);
 	return rc;
 }
 
 static int idxd_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
-	void __iomem * const *iomap;
 	struct device *dev = &pdev->dev;
 	struct idxd_device *idxd;
 	int rc;
-	unsigned int mask;
 
 	rc = pcim_enable_device(pdev);
 	if (rc)
 		return rc;
 
-	dev_dbg(dev, "Mapping BARs\n");
-	mask = (1 << IDXD_MMIO_BAR);
-	rc = pcim_iomap_regions(pdev, mask, DRV_NAME);
-	if (rc)
-		return rc;
+	dev_dbg(dev, "Alloc IDXD context\n");
+	idxd = idxd_alloc(pdev);
+	if (!idxd)
+		return -ENOMEM;
+
+	if (!pdev->cmdmem)
+		dev_dbg(dev, "Device does not have cmdmem support\n");
 
-	iomap = pcim_iomap_table(pdev);
-	if (!iomap)
+	dev_dbg(dev, "Mapping BARs\n");
+	idxd->reg_base = pcim_iomap(pdev, IDXD_MMIO_BAR, 0);
+	if (!idxd->reg_base)
 		return -ENOMEM;
 
 	dev_dbg(dev, "Set DMA masks\n");
@@ -369,11 +424,6 @@ static int idxd_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (rc)
 		return rc;
 
-	dev_dbg(dev, "Alloc IDXD context\n");
-	idxd = idxd_alloc(pdev, iomap);
-	if (!idxd)
-		return -ENOMEM;
-
 	idxd_set_type(idxd);
 
 	dev_dbg(dev, "Set PCI master\n");
@@ -463,6 +513,7 @@ static void idxd_remove(struct pci_dev *pdev)
 	idxd_cleanup_sysfs(idxd);
 	idxd_shutdown(pdev);
 	idxd_wqs_free_lock(idxd);
+	idxd_disable_system_pasid(idxd);
 	mutex_lock(&idxd_idr_lock);
 	idr_remove(&idxd_idrs[idxd->type], idxd->id);
 	mutex_unlock(&idxd_idr_lock);
@@ -481,7 +532,7 @@ static int __init idxd_init_module(void)
 	int err, i;
 
 	/*
-	 * If the CPU does not support write512, there's no point in
+	 * If the CPU does not support MOVDIR64B or ENQCMDS, there's no point in
 	 * enumerating the device. We can not utilize it.
 	 */
 	if (!boot_cpu_has(X86_FEATURE_MOVDIR64B)) {
@@ -489,6 +540,11 @@ static int __init idxd_init_module(void)
 		return -ENODEV;
 	}
 
+	if (!boot_cpu_has(X86_FEATURE_ENQCMD)) {
+		pr_warn("idxd module failed to load without ENQCMD.\n");
+		return -ENODEV;
+	}
+
 	pr_info("%s: Intel(R) Accelerator Devices Driver %s\n",
 		DRV_NAME, IDXD_DRIVER_VERSION);
 
diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c
index d6fcd2e60103..37ad927d6944 100644
--- a/drivers/dma/idxd/irq.c
+++ b/drivers/dma/idxd/irq.c
@@ -11,6 +11,38 @@
 #include "idxd.h"
 #include "registers.h"
 
+enum irq_work_type {
+	IRQ_WORK_NORMAL = 0,
+	IRQ_WORK_PROCESS_FAULT,
+};
+
+static int irq_process_work_list(struct idxd_irq_entry *irq_entry,
+				 enum irq_work_type wtype,
+				 int *processed, u64 data);
+static int irq_process_pending_llist(struct idxd_irq_entry *irq_entry,
+				     enum irq_work_type wtype,
+				     int *processed, u64 data);
+
+static void idxd_mask_and_sync_wq_msix_vectors(struct idxd_device *idxd)
+{
+	int irqcnt = idxd->num_wq_irqs + 1;
+	int i;
+
+	for (i = 1; i < irqcnt; i++) {
+		idxd_mask_msix_vector(idxd, i);
+		synchronize_irq(idxd->msix_entries[i].vector);
+	}
+}
+
+static void idxd_unmask_wq_msix_vectors(struct idxd_device *idxd)
+{
+	int irqcnt = idxd->num_wq_irqs + 1;
+	int i;
+
+	for (i = 1; i < irqcnt; i++)
+		idxd_unmask_msix_vector(idxd, i);
+}
+
 void idxd_device_wqs_clear_state(struct idxd_device *idxd)
 {
 	int i;
@@ -62,6 +94,45 @@ static int idxd_restart(struct idxd_device *idxd)
 	return rc;
 }
 
+static void idxd_device_complete_fault_desc(struct idxd_device *idxd,
+					    u64 fault_addr)
+{
+	unsigned long flags;
+	struct idxd_irq_entry *ie;
+	int i, found = 0;
+	int irqcnt = idxd->num_wq_irqs + 1;
+
+	idxd_mask_and_sync_wq_msix_vectors(idxd);
+
+	spin_lock_irqsave(&idxd->dev_lock, flags);
+
+	/*
+	 * At this point, all MSIX vectors used by workqueues should be masked
+	 * and all threaded irq handlers should be quieted. We should be able
+	 * to touch the pending descriptor lists.
+	 */
+
+	for (i = 1; i < irqcnt; i++) {
+		ie = &idxd->irq_entries[i];
+		irq_process_work_list(ie, IRQ_WORK_PROCESS_FAULT,
+				      &found, fault_addr);
+		if (found) {
+			spin_unlock_irqrestore(&idxd->dev_lock, flags);
+			return;
+		}
+
+		irq_process_pending_llist(ie, IRQ_WORK_PROCESS_FAULT,
+					  &found, fault_addr);
+		if (found) {
+			spin_unlock_irqrestore(&idxd->dev_lock, flags);
+			return;
+		}
+	}
+
+	idxd_unmask_wq_msix_vectors(idxd);
+	spin_unlock_irqrestore(&idxd->dev_lock, flags);
+}
+
 irqreturn_t idxd_irq_handler(int vec, void *data)
 {
 	struct idxd_irq_entry *irq_entry = data;
@@ -136,13 +207,24 @@ irqreturn_t idxd_misc_thread(int vec, void *data)
 
 	val ^= cause;
 	if (val)
-		dev_warn_once(dev, "Unexpected interrupt cause bits set: %#x\n",
+		dev_warn_once(dev,
+			      "Unexpected interrupt cause bits set: %#x\n",
 			      val);
 
 	iowrite32(cause, idxd->reg_base + IDXD_INTCAUSE_OFFSET);
 	if (!err)
 		return IRQ_HANDLED;
 
+	/*
+	 * This case should rarely happen and typically is due to software
+	 * programming error by the driver.
+	 */
+	if (idxd->sw_err.valid &&
+	    idxd->sw_err.desc_valid &&
+	    idxd->sw_err.fault_addr)
+		idxd_device_complete_fault_desc(idxd,
+						idxd->sw_err.fault_addr);
+
 	gensts.bits = ioread32(idxd->reg_base + IDXD_GENSTATS_OFFSET);
 	if (gensts.state == IDXD_DEVICE_STATE_HALT) {
 		spin_lock_bh(&idxd->dev_lock);
@@ -166,24 +248,56 @@ irqreturn_t idxd_misc_thread(int vec, void *data)
 	return IRQ_HANDLED;
 }
 
+static bool process_fault(struct idxd_desc *desc, u64 fault_addr)
+{
+	if ((u64)desc->hw == fault_addr ||
+	    (u64)desc->completion == fault_addr) {
+		idxd_dma_complete_txd(desc, IDXD_COMPLETE_DEV_FAIL);
+		return true;
+	}
+
+	return false;
+}
+
+static bool complete_desc(struct idxd_desc *desc)
+{
+	if (desc->completion->status) {
+		idxd_dma_complete_txd(desc, IDXD_COMPLETE_NORMAL);
+		return true;
+	}
+
+	return false;
+}
+
 static int irq_process_pending_llist(struct idxd_irq_entry *irq_entry,
-				     int *processed)
+				     enum irq_work_type wtype,
+				     int *processed, u64 data)
 {
 	struct idxd_desc *desc, *t;
 	struct llist_node *head;
 	int queued = 0;
+	bool completed = false;
 
 	head = llist_del_all(&irq_entry->pending_llist);
-	if (!head)
+	if (!head) {
+		*processed = 0;
 		return 0;
+	}
 
 	llist_for_each_entry_safe(desc, t, head, llnode) {
-		if (desc->completion->status) {
-			idxd_dma_complete_txd(desc, IDXD_COMPLETE_NORMAL);
+		if (wtype == IRQ_WORK_NORMAL)
+			completed = complete_desc(desc);
+		else if (wtype == IRQ_WORK_PROCESS_FAULT)
+			completed = process_fault(desc, data);
+
+		if (completed) {
 			idxd_free_desc(desc->wq, desc);
 			(*processed)++;
+			if (wtype == IRQ_WORK_PROCESS_FAULT)
+				break;
 		} else {
-			list_add_tail(&desc->list, &irq_entry->work_list);
+			list_add_tail(&desc->list,
+				      &irq_entry->work_list);
 			queued++;
 		}
 	}
@@ -192,10 +306,12 @@ static int irq_process_pending_llist(struct idxd_irq_entry *irq_entry,
 }
 
 static int irq_process_work_list(struct idxd_irq_entry *irq_entry,
-				 int *processed)
+				 enum irq_work_type wtype,
+				 int *processed, u64 data)
 {
 	struct list_head *node, *next;
 	int queued = 0;
+	bool completed = false;
 
 	if (list_empty(&irq_entry->work_list))
 		return 0;
@@ -204,12 +320,17 @@ static int irq_process_work_list(struct idxd_irq_entry *irq_entry,
 		struct idxd_desc *desc =
 			container_of(node, struct idxd_desc, list);
 
-		if (desc->completion->status) {
+		if (wtype == IRQ_WORK_NORMAL)
+			completed = complete_desc(desc);
+		else if (wtype == IRQ_WORK_PROCESS_FAULT)
+			completed = process_fault(desc, data);
+
+		if (completed) {
 			list_del(&desc->list);
-			/* process and callback */
-			idxd_dma_complete_txd(desc, IDXD_COMPLETE_NORMAL);
 			idxd_free_desc(desc->wq, desc);
 			(*processed)++;
+			if (wtype == IRQ_WORK_PROCESS_FAULT)
+				break;
 		} else {
 			queued++;
 		}
@@ -243,13 +364,15 @@ irqreturn_t idxd_wq_thread(int irq, void *data)
 	 * 5. Repeat until no more descriptors.
 	 */
 	do {
-		rc = irq_process_work_list(irq_entry, &processed);
+		rc = irq_process_work_list(irq_entry, IRQ_WORK_NORMAL,
+					   &processed, 0);
 		if (rc != 0) {
 			retry++;
 			continue;
 		}
 
-		rc = irq_process_pending_llist(irq_entry, &processed);
+		rc = irq_process_pending_llist(irq_entry, IRQ_WORK_NORMAL,
+					       &processed, 0);
 	} while (rc != 0 && retry != 10);
 
 	idxd_unmask_msix_vector(irq_entry->idxd, irq_entry->id);
diff --git a/drivers/dma/idxd/submit.c b/drivers/dma/idxd/submit.c
index 45a0c5869a0a..741bc3aa7267 100644
--- a/drivers/dma/idxd/submit.c
+++ b/drivers/dma/idxd/submit.c
@@ -8,41 +8,44 @@
 #include "idxd.h"
 #include "registers.h"
 
-struct idxd_desc *idxd_alloc_desc(struct idxd_wq *wq, enum idxd_op_type optype)
+struct idxd_desc *idxd_alloc_desc(struct idxd_wq *wq,
+				  enum idxd_op_type optype)
 {
+	struct idxd_device *idxd = wq->idxd;
 	struct idxd_desc *desc;
 	int idx;
-	struct idxd_device *idxd = wq->idxd;
 
 	if (idxd->state != IDXD_DEV_ENABLED)
 		return ERR_PTR(-EIO);
 
-	if (optype == IDXD_OP_BLOCK)
-		percpu_down_read(&wq->submit_lock);
-	else if (!percpu_down_read_trylock(&wq->submit_lock))
-		return ERR_PTR(-EBUSY);
+	if (wq_dedicated(wq)) {
+		if (optype == IDXD_OP_BLOCK)
+			percpu_down_read(&wq->submit_lock);
+		else if (!percpu_down_read_trylock(&wq->submit_lock))
+			return ERR_PTR(-EBUSY);
+
+		if (!atomic_add_unless(&wq->dq_count, 1, wq->size)) {
+			int rc;
 
-	if (!atomic_add_unless(&wq->dq_count, 1, wq->size)) {
-		int rc;
+			if (optype == IDXD_OP_NONBLOCK) {
+				percpu_up_read(&wq->submit_lock);
+				return ERR_PTR(-EAGAIN);
+			}
 
-		if (optype == IDXD_OP_NONBLOCK) {
 			percpu_up_read(&wq->submit_lock);
-			return ERR_PTR(-EAGAIN);
+			percpu_down_write(&wq->submit_lock);
+			rc = wait_event_interruptible(wq->submit_waitq,
+					atomic_add_unless(&wq->dq_count, 1,
+							  wq->size) ||
+					idxd->state != IDXD_DEV_ENABLED);
+			percpu_up_write(&wq->submit_lock);
+			if (rc < 0)
+				return ERR_PTR(-EINTR);
+			if (idxd->state != IDXD_DEV_ENABLED)
+				return ERR_PTR(-EIO);
+		} else {
+			percpu_up_read(&wq->submit_lock);
 		}
-
-		percpu_up_read(&wq->submit_lock);
-		percpu_down_write(&wq->submit_lock);
-		rc = wait_event_interruptible(wq->submit_waitq,
-					      atomic_add_unless(&wq->dq_count,
-								1, wq->size) ||
-					       idxd->state != IDXD_DEV_ENABLED);
-		percpu_up_write(&wq->submit_lock);
-		if (rc < 0)
-			return ERR_PTR(-EINTR);
-		if (idxd->state != IDXD_DEV_ENABLED)
-			return ERR_PTR(-EIO);
-	} else {
-		percpu_up_read(&wq->submit_lock);
 	}
 
 	idx = sbitmap_get(&wq->sbmap, 0, false);
@@ -59,29 +62,81 @@ struct idxd_desc *idxd_alloc_desc(struct idxd_wq *wq, enum idxd_op_type optype)
 
 void idxd_free_desc(struct idxd_wq *wq, struct idxd_desc *desc)
 {
-	atomic_dec(&wq->dq_count);
+	if (wq_dedicated(wq))
+		atomic_dec(&wq->dq_count);
 
 	sbitmap_clear_bit(&wq->sbmap, desc->id);
 	wake_up(&wq->submit_waitq);
 }
 
-int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc)
+static int idxd_iosubmit_cmd_sync(struct idxd_wq *wq, void __iomem *portal,
+				  struct dsa_hw_desc *hw,
+				  enum idxd_op_type optype)
 {
 	struct idxd_device *idxd = wq->idxd;
-	int vec = desc->hw->int_handle;
-	void __iomem *portal;
+	int rc;
 
-	if (idxd->state != IDXD_DEV_ENABLED)
-		return -EIO;
+	if (optype == IDXD_OP_BLOCK)
+		percpu_down_read(&wq->submit_lock);
+	else if (!percpu_down_read_trylock(&wq->submit_lock))
+		return -EBUSY;
 
-	portal = wq->dportal + idxd_get_wq_portal_offset(IDXD_PORTAL_UNLIMITED);
 	/*
 	 * The wmb() flushes writes to coherent DMA data before possibly
 	 * triggering a DMA read. The wmb() is necessary even on UP because
 	 * the recipient is a device.
 	 */
 	wmb();
-	iosubmit_cmds512(portal, desc->hw, 1);
+	rc = iosubmit_cmds512_sync(portal, hw, 1);
+	if (rc) {
+		if (optype == IDXD_OP_NONBLOCK)
+			return -EBUSY;
+		if (idxd->state != IDXD_DEV_ENABLED)
+			return -EIO;
+		percpu_up_read(&wq->submit_lock);
+		percpu_down_write(&wq->submit_lock);
+		rc = wait_event_interruptible(wq->submit_waitq,
+					      !iosubmit_cmds512_sync(portal,
+								     hw, 1) ||
+					      idxd->state != IDXD_DEV_ENABLED);
+		percpu_up_write(&wq->submit_lock);
+		if (rc < 0)
+			return -EINTR;
+		if (idxd->state != IDXD_DEV_ENABLED)
+			return -EIO;
+	} else {
+		percpu_up_read(&wq->submit_lock);
+	}
+
+	return 0;
+}
+
+int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc,
+		     enum idxd_op_type optype)
+{
+	struct idxd_device *idxd = wq->idxd;
+	int vec = desc->hw->int_handle;
+	int rc;
+	void __iomem *portal;
+
+	if (idxd->state != IDXD_DEV_ENABLED)
+		return -EIO;
+
+	portal = wq->portal +
+		 idxd_get_wq_portal_offset(IDXD_PORTAL_UNLIMITED);
+	if (wq_dedicated(wq)) {
+		/*
+		 * The wmb() flushes writes to coherent DMA data before
+		 * possibly triggering a DMA read. The wmb() is necessary
+		 * even on UP because the recipient is a device.
+		 */
+		wmb();
+		iosubmit_cmds512(portal, desc->hw, 1);
+	} else {
+		rc = idxd_iosubmit_cmd_sync(wq, portal, desc->hw, optype);
+		if (rc < 0)
+			return rc;
+	}
 
 	/*
 	 * Pending the descriptor to the lockless list for the irq_entry
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 3999827970ab..dc38172be42e 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -179,6 +179,30 @@ static int idxd_config_bus_probe(struct device *dev)
 			return -EINVAL;
 		}
 
+		/* Shared WQ checks */
+		if (!wq_dedicated(wq)) {
+			if (!idxd->pasid_enabled) {
+				dev_warn(dev,
+					 "PASID not enabled and shared WQ.\n");
+				mutex_unlock(&wq->wq_lock);
+				return -ENXIO;
+			}
+			/*
+			 * Shared wq with the threshold set to 0 means the user
+			 * did not set the threshold or transitioned from a
+			 * dedicated wq but did not set threshold. A value
+			 * of 0 would effectively disable the shared wq. The
+			 * driver does not allow a value of 0 to be set for
+			 * threshold via sysfs.
+			 */
+			if (wq->threshold == 0) {
+				dev_warn(dev,
+					 "Shared WQ and threshold 0.\n");
+				mutex_unlock(&wq->wq_lock);
+				return -EINVAL;
+			}
+		}
+
 		rc = idxd_wq_alloc_resources(wq);
 		if (rc < 0) {
 			mutex_unlock(&wq->wq_lock);
@@ -880,6 +904,8 @@ static ssize_t wq_mode_store(struct device *dev,
 	if (sysfs_streq(buf, "dedicated")) {
 		set_bit(WQ_FLAG_DEDICATED, &wq->flags);
 		wq->threshold = 0;
+	} else if (sysfs_streq(buf, "shared") && idxd->pasid_enabled) {
+		clear_bit(WQ_FLAG_DEDICATED, &wq->flags);
 	} else {
 		return -EINVAL;
 	}
@@ -978,6 +1004,91 @@ static ssize_t wq_priority_store(struct device *dev,
 static struct device_attribute dev_attr_wq_priority =
 		__ATTR(priority, 0644, wq_priority_show, wq_priority_store);
 
+static ssize_t wq_block_on_fault_show(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	struct idxd_wq *wq = container_of(dev, struct idxd_wq, conf_dev);
+
+	return sprintf(buf, "%u\n", test_bit(WQ_FLAG_BOF, &wq->flags));
+}
+
+static ssize_t wq_block_on_fault_store(struct device *dev,
+				       struct device_attribute *attr,
+				       const char *buf, size_t count)
+{
+	struct idxd_wq *wq = container_of(dev, struct idxd_wq, conf_dev);
+	struct idxd_device *idxd = wq->idxd;
+	unsigned long val;
+	int rc;
+
+	if (!test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags))
+		return -EPERM;
+
+	if (wq->state != IDXD_WQ_DISABLED)
+		return -EPERM;
+
+	rc = kstrtoul(buf, 10, &val);
+	if (rc < 0)
+		return -EINVAL;
+
+	if (val == 1)
+		set_bit(WQ_FLAG_BOF, &wq->flags);
+	else if (val == 0)
+		clear_bit(WQ_FLAG_BOF, &wq->flags);
+	else
+		return -EINVAL;
+
+	return count;
+}
+
+static struct device_attribute dev_attr_wq_block_on_fault =
+		__ATTR(block_on_fault, 0644, wq_block_on_fault_show,
+		       wq_block_on_fault_store);
+
+static ssize_t wq_threshold_show(struct device *dev,
+				 struct device_attribute *attr, char *buf)
+{
+	struct idxd_wq *wq = container_of(dev, struct idxd_wq, conf_dev);
+
+	return sprintf(buf, "%u\n", wq->threshold);
+}
+
+static ssize_t wq_threshold_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count)
+{
+	struct idxd_wq *wq = container_of(dev, struct idxd_wq, conf_dev);
+	struct idxd_device *idxd = wq->idxd;
+	unsigned long val;
+	int rc;
+
+	rc = kstrtoul(buf, 10, &val);
+	if (rc < 0)
+		return -EINVAL;
+
+	if (val > wq->size || val <= 0)
+		return -EINVAL;
+
+	if (!test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags))
+		return -EPERM;
+
+	if (wq->state != IDXD_WQ_DISABLED)
+		return -EPERM;
+
+	if (test_bit(WQ_FLAG_DEDICATED, &wq->flags))
+		return -EINVAL;
+
+	if (val > wq->size)
+		return -EINVAL;
+
+	wq->threshold = val;
+
+	return count;
+}
+
+static struct device_attribute dev_attr_wq_threshold =
+		__ATTR(threshold, 0644, wq_threshold_show, wq_threshold_store);
+
 static ssize_t wq_type_show(struct device *dev,
 			    struct device_attribute *attr, char *buf)
 {
@@ -1049,6 +1160,15 @@ static ssize_t wq_name_store(struct device *dev,
 	if (strlen(buf) > WQ_NAME_SIZE || strlen(buf) == 0)
 		return -EINVAL;
 
+	/*
+	 * This is temporarily placed here until we implement the direct
+	 * submission API through dmaengine with SVM support.
+	 */
+	if (sysfs_streq(buf, "dmaengine") &&
+	    wq->type == IDXD_WQT_KERNEL &&
+	    wq->idxd->pasid_enabled)
+		return -EOPNOTSUPP;
+
 	memset(wq->name, 0, WQ_NAME_SIZE + 1);
 	strncpy(wq->name, buf, WQ_NAME_SIZE);
 	strreplace(wq->name, '\n', '\0');
@@ -1076,6 +1196,8 @@ static struct attribute *idxd_wq_attributes[] = {
 	&dev_attr_wq_mode.attr,
 	&dev_attr_wq_size.attr,
 	&dev_attr_wq_priority.attr,
+	&dev_attr_wq_block_on_fault.attr,
+	&dev_attr_wq_threshold.attr,
 	&dev_attr_wq_type.attr,
 	&dev_attr_wq_name.attr,
 	&dev_attr_wq_cdev_minor.attr,
@@ -1215,6 +1337,16 @@ static ssize_t clients_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(clients);
 
+static ssize_t pasid_enabled_show(struct device *dev,
+				  struct device_attribute *attr, char *buf)
+{
+	struct idxd_device *idxd =
+		container_of(dev, struct idxd_device, conf_dev);
+
+	return sprintf(buf, "%u\n", idxd->pasid_enabled);
+}
+static DEVICE_ATTR_RO(pasid_enabled);
+
 static ssize_t state_show(struct device *dev,
 			  struct device_attribute *attr, char *buf)
 {
@@ -1324,6 +1456,7 @@ static struct attribute *idxd_device_attributes[] = {
 	&dev_attr_gen_cap.attr,
 	&dev_attr_configurable.attr,
 	&dev_attr_clients.attr,
+	&dev_attr_pasid_enabled.attr,
 	&dev_attr_state.attr,
 	&dev_attr_errors.attr,
 	&dev_attr_max_tokens.attr,


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 6/6] dmaengine: idxd: add ABI documentation for shared wq
  2020-03-30 21:26 [PATCH 0/6] Add shared workqueue support for idxd driver Dave Jiang
                   ` (4 preceding siblings ...)
  2020-03-30 21:27 ` [PATCH 5/6] dmaengine: idxd: add shared workqueue support Dave Jiang
@ 2020-03-30 21:27 ` Dave Jiang
  5 siblings, 0 replies; 22+ messages in thread
From: Dave Jiang @ 2020-03-30 21:27 UTC (permalink / raw)
  To: vkoul, tglx, mingo, bp, hpa, bhelgaas, gregkh, arnd
  Cc: linux-kernel, x86, dmaengine, dan.j.williams, ashok.raj,
	fenghua.yu, linux-pci, tony.luck, jing.lin, sanjay.k.kumar

Add the sysfs attribute bits in ABI/stable for shared wq support.

Signed-off-by: Jing Lin <jing.lin@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 Documentation/ABI/stable/sysfs-driver-dma-idxd |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-driver-dma-idxd b/Documentation/ABI/stable/sysfs-driver-dma-idxd
index f4be46cc6cb6..c1adddde23c2 100644
--- a/Documentation/ABI/stable/sysfs-driver-dma-idxd
+++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd
@@ -71,6 +71,13 @@ Contact:        dmaengine@vger.kernel.org
 Description:    The operation capability bit mask specify the operation types
 		supported by the this device.
 
+What:		sys/bus/dsa/devices/dsa<m>/pasid_enabled
+Date:		Jan 30, 2020
+KernelVersion:	5.7.0
+Contact:	dmaengine@vger.kernel.org
+Description:	To indicate if PASID (process address space identifier) is
+		enabled or not for this device.
+
 What:           sys/bus/dsa/devices/dsa<m>/state
 Date:           Oct 25, 2019
 KernelVersion:  5.6.0
@@ -110,6 +117,13 @@ Description:    The maximum number of bandwidth tokens that may be in use at
 		one time by operations that access low bandwidth memory in the
 		device.
 
+What:		sys/bus/dsa/devices/wq<m>.<n>/block_on_fault
+Date:		Jan 30, 2020
+KernelVersion:	5.7.0
+Contact:	dmaengine@vger.kernel.org
+Description:	To indicate block on fault is allowed or not for the work queue
+		to support on demand paging.
+
 What:           sys/bus/dsa/devices/wq<m>.<n>/group_id
 Date:           Oct 25, 2019
 KernelVersion:  5.6.0



* Re: [PATCH 2/6] device/pci: add cmdmem cap to pci_dev
  2020-03-30 21:27 ` [PATCH 2/6] device/pci: add cmdmem cap to pci_dev Dave Jiang
@ 2020-03-31 10:04   ` Greg KH
  2020-03-31 17:07     ` Dave Jiang
  2020-03-31 16:03   ` Bjorn Helgaas
  1 sibling, 1 reply; 22+ messages in thread
From: Greg KH @ 2020-03-31 10:04 UTC (permalink / raw)
  To: Dave Jiang
  Cc: vkoul, tglx, mingo, bp, hpa, bhelgaas, arnd, linux-kernel, x86,
	dmaengine, dan.j.williams, ashok.raj, fenghua.yu, linux-pci,
	tony.luck, jing.lin, sanjay.k.kumar

On Mon, Mar 30, 2020 at 02:27:00PM -0700, Dave Jiang wrote:
> Since the current accelerator devices do not have standard PCIe capability
> enumeration for accepting ENQCMDS yet, for now an attribute of pdev->cmdmem has
> been added to struct pci_dev.  Currently a PCI quirk must be used for the
> devices that have such cap until the PCI cap is standardized. Add a helper
> function to provide the check if a device supports the cmdmem capability.
> 
> Such capability is expected to be added to PCIe device cap enumeration in
> the future.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  drivers/base/core.c    |   13 +++++++++++++
>  include/linux/device.h |    2 ++
>  include/linux/pci.h    |    1 +
>  3 files changed, 16 insertions(+)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index dbb0f9130f42..cd9f5b040ed4 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -27,6 +27,7 @@
>  #include <linux/netdevice.h>
>  #include <linux/sched/signal.h>
>  #include <linux/sysfs.h>
> +#include <linux/pci.h>
>  
>  #include "base.h"
>  #include "power/power.h"
> @@ -3790,3 +3791,15 @@ int device_match_any(struct device *dev, const void *unused)
>  	return 1;
>  }
>  EXPORT_SYMBOL_GPL(device_match_any);
> +
> +bool device_supports_cmdmem(struct device *dev)
> +{
> +	struct pci_dev *pdev;
> +
> +	if (!dev_is_pci(dev))
> +		return false;
> +
> +	pdev = to_pci_dev(dev);
> +	return pdev->cmdmem;
> +}
> +EXPORT_SYMBOL_GPL(device_supports_cmdmem);

Why would a pci-specific function like this be ok to have in the driver
core?  Please keep it in the pci core code instead.

thanks,

greg k-h


* Re: [PATCH 3/6] pci: add PCI quirk cmdmem fixup for Intel DSA device
  2020-03-30 21:27 ` [PATCH 3/6] pci: add PCI quirk cmdmem fixup for Intel DSA device Dave Jiang
@ 2020-03-31 15:59   ` Bjorn Helgaas
  2020-03-31 18:02     ` Dave Jiang
  2020-04-01  7:18   ` Christoph Hellwig
  1 sibling, 1 reply; 22+ messages in thread
From: Bjorn Helgaas @ 2020-03-31 15:59 UTC (permalink / raw)
  To: Dave Jiang
  Cc: vkoul, tglx, mingo, bp, hpa, gregkh, arnd, linux-kernel, x86,
	dmaengine, dan.j.williams, ashok.raj, fenghua.yu, linux-pci,
	tony.luck, jing.lin, sanjay.k.kumar

Take a look and make yours match (applies to other patches in the
series as well):

  $ git log --oneline drivers/pci/quirks.c
  299bd044a6f3 ("PCI: Add ACS quirk for Zhaoxin Root/Downstream Ports")
  0325837c51cb ("PCI: Add ACS quirk for Zhaoxin multi-function devices")
  2880325bda8d ("PCI: Avoid ASMedia XHCI USB PME# from D0 defect")
  b88bf6c3b6ff ("PCI: Add boot interrupt quirk mechanism for Xeon chipsets")
  5e89cd303e3a ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken")
  7b90dfc4873b ("PCI: Add DMA alias quirk for PLX PEX NTB")
  09298542cd89 ("PCI: Add nr_devfns parameter to pci_add_dma_alias()")

There's no need to mention "PCI" twice.  Also no need for both "quirk"
and "fixup".  This is all in the interest of putting more information
in the small space of the subject line.

On Mon, Mar 30, 2020 at 02:27:06PM -0700, Dave Jiang wrote:
> Since there is no standard way that defines a PCI device that receives
> descriptors or commands with synchronous write operations, add quirk to set
> cmdmem for the Intel accelerator device that supports it.
>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  drivers/pci/quirks.c |   11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 29f473ebf20f..ba0572b9b9c8 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5461,3 +5461,14 @@ static void quirk_reset_lenovo_thinkpad_p50_nvgpu(struct pci_dev *pdev)
>  DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, 0x13b1,
>  			      PCI_CLASS_DISPLAY_VGA, 8,
>  			      quirk_reset_lenovo_thinkpad_p50_nvgpu);
> +
> +/*
> + * Until the PCI Sig defines a standard capaiblity check that indicates a
> + * device has cmdmem with synchronous write capability, we'll add a quirk
> + * for device that supports it.

s/PCI Sig/PCI-SIG/
s/capaiblity/capability/

It's not clear why this would need to be in drivers/pci/quirks.c as
opposed to being in the driver itself.

> + */
> +static void device_cmdmem_fixup(struct pci_dev *pdev)
> +{
> +	pdev->cmdmem = 1;
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0b25, device_cmdmem_fixup);
> 


* Re: [PATCH 2/6] device/pci: add cmdmem cap to pci_dev
  2020-03-30 21:27 ` [PATCH 2/6] device/pci: add cmdmem cap to pci_dev Dave Jiang
  2020-03-31 10:04   ` Greg KH
@ 2020-03-31 16:03   ` Bjorn Helgaas
  2020-03-31 21:44     ` Dave Jiang
  1 sibling, 1 reply; 22+ messages in thread
From: Bjorn Helgaas @ 2020-03-31 16:03 UTC (permalink / raw)
  To: Dave Jiang
  Cc: vkoul, tglx, mingo, bp, hpa, gregkh, arnd, linux-kernel, x86,
	dmaengine, dan.j.williams, ashok.raj, fenghua.yu, linux-pci,
	tony.luck, jing.lin, sanjay.k.kumar

On Mon, Mar 30, 2020 at 02:27:00PM -0700, Dave Jiang wrote:
> Since the current accelerator devices do not have standard PCIe capability
> enumeration for accepting ENQCMDS yet, for now an attribute of pdev->cmdmem has
> been added to struct pci_dev.  Currently a PCI quirk must be used for the
> devices that have such cap until the PCI cap is standardized. Add a helper
> function to provide the check if a device supports the cmdmem capability.
> 
> Such capability is expected to be added to PCIe device cap enumeration in
> the future.

This needs some sort of thumbnail description of what "synchronous
write notification" and "cmdmem" mean.

Do you have a pointer to a PCI-SIG ECR or similar?

Your window size seems to be 85 or so.  It would be easier if you used
80 and wrapped the commit log to fit in 75 columns so it looks decent
when "git log" indents it by 4.


* Re: [PATCH 2/6] device/pci: add cmdmem cap to pci_dev
  2020-03-31 10:04   ` Greg KH
@ 2020-03-31 17:07     ` Dave Jiang
  2020-03-31 17:24       ` Greg KH
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Jiang @ 2020-03-31 17:07 UTC (permalink / raw)
  To: Greg KH
  Cc: vkoul, tglx, mingo, bp, hpa, bhelgaas, arnd, linux-kernel, x86,
	dmaengine, dan.j.williams, ashok.raj, fenghua.yu, linux-pci,
	tony.luck, jing.lin, sanjay.k.kumar


On 3/31/2020 3:04 AM, Greg KH wrote:
> On Mon, Mar 30, 2020 at 02:27:00PM -0700, Dave Jiang wrote:
>> Since the current accelerator devices do not have standard PCIe capability
>> enumeration for accepting ENQCMDS yet, for now an attribute of pdev->cmdmem has
>> been added to struct pci_dev.  Currently a PCI quirk must be used for the
>> devices that have such cap until the PCI cap is standardized. Add a helper
>> function to provide the check if a device supports the cmdmem capability.
>>
>> Such capability is expected to be added to PCIe device cap enumeration in
>> the future.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>>   drivers/base/core.c    |   13 +++++++++++++
>>   include/linux/device.h |    2 ++
>>   include/linux/pci.h    |    1 +
>>   3 files changed, 16 insertions(+)
>>
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index dbb0f9130f42..cd9f5b040ed4 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -27,6 +27,7 @@
>>   #include <linux/netdevice.h>
>>   #include <linux/sched/signal.h>
>>   #include <linux/sysfs.h>
>> +#include <linux/pci.h>
>>   
>>   #include "base.h"
>>   #include "power/power.h"
>> @@ -3790,3 +3791,15 @@ int device_match_any(struct device *dev, const void *unused)
>>   	return 1;
>>   }
>>   EXPORT_SYMBOL_GPL(device_match_any);
>> +
>> +bool device_supports_cmdmem(struct device *dev)
>> +{
>> +	struct pci_dev *pdev;
>> +
>> +	if (!dev_is_pci(dev))
>> +		return false;
>> +
>> +	pdev = to_pci_dev(dev);
>> +	return pdev->cmdmem;
>> +}
>> +EXPORT_SYMBOL_GPL(device_supports_cmdmem);
> Why would a pci-specific function like this be ok to have in the driver
> core?  Please keep it in the pci core code instead.

The original thought was to introduce a new arch level memory mapping 
semantic. If you feel this should be PCI exclusive, should we make the 
ioremap routines for this memory type pci specific as well?


>
> thanks,
>
> greg k-h


* Re: [PATCH 2/6] device/pci: add cmdmem cap to pci_dev
  2020-03-31 17:07     ` Dave Jiang
@ 2020-03-31 17:24       ` Greg KH
  2020-03-31 17:38         ` Dave Jiang
  0 siblings, 1 reply; 22+ messages in thread
From: Greg KH @ 2020-03-31 17:24 UTC (permalink / raw)
  To: Dave Jiang
  Cc: vkoul, tglx, mingo, bp, hpa, bhelgaas, arnd, linux-kernel, x86,
	dmaengine, dan.j.williams, ashok.raj, fenghua.yu, linux-pci,
	tony.luck, jing.lin, sanjay.k.kumar

On Tue, Mar 31, 2020 at 10:07:07AM -0700, Dave Jiang wrote:
> 
> On 3/31/2020 3:04 AM, Greg KH wrote:
> > On Mon, Mar 30, 2020 at 02:27:00PM -0700, Dave Jiang wrote:
> > > Since the current accelerator devices do not have standard PCIe capability
> > > enumeration for accepting ENQCMDS yet, for now an attribute of pdev->cmdmem has
> > > been added to struct pci_dev.  Currently a PCI quirk must be used for the
> > > devices that have such cap until the PCI cap is standardized. Add a helper
> > > function to provide the check if a device supports the cmdmem capability.
> > > 
> > > Such capability is expected to be added to PCIe device cap enumeration in
> > > the future.
> > > 
> > > Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> > > ---
> > >   drivers/base/core.c    |   13 +++++++++++++
> > >   include/linux/device.h |    2 ++
> > >   include/linux/pci.h    |    1 +
> > >   3 files changed, 16 insertions(+)
> > > 
> > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > index dbb0f9130f42..cd9f5b040ed4 100644
> > > --- a/drivers/base/core.c
> > > +++ b/drivers/base/core.c
> > > @@ -27,6 +27,7 @@
> > >   #include <linux/netdevice.h>
> > >   #include <linux/sched/signal.h>
> > >   #include <linux/sysfs.h>
> > > +#include <linux/pci.h>
> > >   #include "base.h"
> > >   #include "power/power.h"
> > > @@ -3790,3 +3791,15 @@ int device_match_any(struct device *dev, const void *unused)
> > >   	return 1;
> > >   }
> > >   EXPORT_SYMBOL_GPL(device_match_any);
> > > +
> > > +bool device_supports_cmdmem(struct device *dev)
> > > +{
> > > +	struct pci_dev *pdev;
> > > +
> > > +	if (!dev_is_pci(dev))
> > > +		return false;
> > > +
> > > +	pdev = to_pci_dev(dev);
> > > +	return pdev->cmdmem;
> > > +}
> > > +EXPORT_SYMBOL_GPL(device_supports_cmdmem);
> > Why would a pci-specific function like this be ok to have in the driver
> > core?  Please keep it in the pci core code instead.
> 
> The original thought was to introduce a new arch level memory mapping
> semantic.

Please do not.  Also, that's not what you are doing here from what I can
tell.

> If you feel this should be PCI exclusive, should we make the ioremap
> routines for this memory type pci specific as well?

Why wouldn't it be?  Is this needed anywhere else?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/6] device/pci: add cmdmem cap to pci_dev
  2020-03-31 17:24       ` Greg KH
@ 2020-03-31 17:38         ` Dave Jiang
  0 siblings, 0 replies; 22+ messages in thread
From: Dave Jiang @ 2020-03-31 17:38 UTC (permalink / raw)
  To: Greg KH
  Cc: vkoul, tglx, mingo, bp, hpa, bhelgaas, arnd, linux-kernel, x86,
	dmaengine, dan.j.williams, ashok.raj, fenghua.yu, linux-pci,
	tony.luck, jing.lin, sanjay.k.kumar


On 3/31/2020 10:24 AM, Greg KH wrote:
> On Tue, Mar 31, 2020 at 10:07:07AM -0700, Dave Jiang wrote:
>> On 3/31/2020 3:04 AM, Greg KH wrote:
>>> On Mon, Mar 30, 2020 at 02:27:00PM -0700, Dave Jiang wrote:
>>>> Since the current accelerator devices do not have standard PCIe capability
>>>> enumeration for accepting ENQCMDS yet, for now an attribute of pdev->cmdmem has
>>>> been added to struct pci_dev.  Currently a PCI quirk must be used for the
>>>> devices that have such cap until the PCI cap is standardized. Add a helper
>>>> function to provide the check if a device supports the cmdmem capability.
>>>>
>>>> Such capability is expected to be added to PCIe device cap enumeration in
>>>> the future.
>>>>
>>>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>>>> ---
>>>>    drivers/base/core.c    |   13 +++++++++++++
>>>>    include/linux/device.h |    2 ++
>>>>    include/linux/pci.h    |    1 +
>>>>    3 files changed, 16 insertions(+)
>>>>
>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>>> index dbb0f9130f42..cd9f5b040ed4 100644
>>>> --- a/drivers/base/core.c
>>>> +++ b/drivers/base/core.c
>>>> @@ -27,6 +27,7 @@
>>>>    #include <linux/netdevice.h>
>>>>    #include <linux/sched/signal.h>
>>>>    #include <linux/sysfs.h>
>>>> +#include <linux/pci.h>
>>>>    #include "base.h"
>>>>    #include "power/power.h"
>>>> @@ -3790,3 +3791,15 @@ int device_match_any(struct device *dev, const void *unused)
>>>>    	return 1;
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(device_match_any);
>>>> +
>>>> +bool device_supports_cmdmem(struct device *dev)
>>>> +{
>>>> +	struct pci_dev *pdev;
>>>> +
>>>> +	if (!dev_is_pci(dev))
>>>> +		return false;
>>>> +
>>>> +	pdev = to_pci_dev(dev);
>>>> +	return pdev->cmdmem;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(device_supports_cmdmem);
>>> Why would a pci-specific function like this be ok to have in the driver
>>> core?  Please keep it in the pci core code instead.
>> The original thought was to introduce a new arch level memory mapping
>> semantic.
> Please do not.  Also, that's not what you are doing here from what I can
> tell.
>
>> If you feel this should be PCI exclusive, should we make the ioremap
>> routines for this memory type pci specific as well?
> Why wouldn't it be?  Is this needed anywhere else?

Ok, I'll make this PCI specific.


>
> thanks,
>
> greg k-h


* Re: [PATCH 3/6] pci: add PCI quirk cmdmem fixup for Intel DSA device
  2020-03-31 15:59   ` Bjorn Helgaas
@ 2020-03-31 18:02     ` Dave Jiang
  0 siblings, 0 replies; 22+ messages in thread
From: Dave Jiang @ 2020-03-31 18:02 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: vkoul, tglx, mingo, bp, hpa, gregkh, arnd, linux-kernel, x86,
	dmaengine, dan.j.williams, ashok.raj, fenghua.yu, linux-pci,
	tony.luck, jing.lin, sanjay.k.kumar


On 3/31/2020 8:59 AM, Bjorn Helgaas wrote:
> Take a look and make yours match (applies to other patches in the
> series as well):
>
>    $ git log --oneline drivers/pci/quirks.c
>    299bd044a6f3 ("PCI: Add ACS quirk for Zhaoxin Root/Downstream Ports")
>    0325837c51cb ("PCI: Add ACS quirk for Zhaoxin multi-function devices")
>    2880325bda8d ("PCI: Avoid ASMedia XHCI USB PME# from D0 defect")
>    b88bf6c3b6ff ("PCI: Add boot interrupt quirk mechanism for Xeon chipsets")
>    5e89cd303e3a ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken")
>    7b90dfc4873b ("PCI: Add DMA alias quirk for PLX PEX NTB")
>    09298542cd89 ("PCI: Add nr_devfns parameter to pci_add_dma_alias()")
>
> There's no need to mention "PCI" twice.  Also no need for both "quirk"
> and "fixup".  This is all in the interest of putting more information
> in the small space of the subject line.
Ok I'll fix up.
>
> On Mon, Mar 30, 2020 at 02:27:06PM -0700, Dave Jiang wrote:
>> Since there is no standard way that defines a PCI device that receives
>> descriptors or commands with synchronous write operations, add quirk to set
>> cmdmem for the Intel accelerator device that supports it.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>>   drivers/pci/quirks.c |   11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index 29f473ebf20f..ba0572b9b9c8 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -5461,3 +5461,14 @@ static void quirk_reset_lenovo_thinkpad_p50_nvgpu(struct pci_dev *pdev)
>>   DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, 0x13b1,
>>   			      PCI_CLASS_DISPLAY_VGA, 8,
>>   			      quirk_reset_lenovo_thinkpad_p50_nvgpu);
>> +
>> +/*
>> + * Until the PCI Sig defines a standard capaiblity check that indicates a
>> + * device has cmdmem with synchronous write capability, we'll add a quirk
>> + * for device that supports it.
> s/PCI Sig/PCI-SIG/
> s/capaiblity/capability/
>
> It's not clear why this would need to be in drivers/pci/quirks.c as
> opposed to being in the driver itself.

That would mean the driver sets the cap bit in the PCI device struct instead
of it being set at enumeration time, right? And if the driver isn't loaded,
the cap wouldn't be set. That may be an issue if user space wants to
discover this information in the future.



>
>> + */
>> +static void device_cmdmem_fixup(struct pci_dev *pdev)
>> +{
>> +	pdev->cmdmem = 1;
>> +}
>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0b25, device_cmdmem_fixup);
>>


* Re: [PATCH 2/6] device/pci: add cmdmem cap to pci_dev
  2020-03-31 16:03   ` Bjorn Helgaas
@ 2020-03-31 21:44     ` Dave Jiang
  0 siblings, 0 replies; 22+ messages in thread
From: Dave Jiang @ 2020-03-31 21:44 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: vkoul, tglx, mingo, bp, hpa, gregkh, arnd, linux-kernel, x86,
	dmaengine, dan.j.williams, ashok.raj, fenghua.yu, linux-pci,
	tony.luck, jing.lin, sanjay.k.kumar



On 3/31/2020 9:03 AM, Bjorn Helgaas wrote:
> On Mon, Mar 30, 2020 at 02:27:00PM -0700, Dave Jiang wrote:
>> Since the current accelerator devices do not have standard PCIe capability
>> enumeration for accepting ENQCMDS yet, for now an attribute of pdev->cmdmem has
>> been added to struct pci_dev.  Currently a PCI quirk must be used for the
>> devices that have such cap until the PCI cap is standardized. Add a helper
>> function to provide the check if a device supports the cmdmem capability.
>>
>> Such capability is expected to be added to PCIe device cap enumeration in
>> the future.

Re-send. My misconfigured mail client caused the mailing lists to bounce
the original message.

> 
> This needs some sort of thumbnail description of what "synchronous
> write notification" and "cmdmem" mean.

I will add more explanation.

> 
> Do you have a pointer to a PCI-SIG ECR or similar?



Deferrable Memory Write (DMWr) ECR


https://members.pcisig.com/wg/PCI-SIG/document/13747

 From what I'm told it should be available for public review by EOW.

> 
> Your window size seems to be 85 or so.  It would be easier if you used
> 80 and wrapped the commit log to fit in 75 columns so it looks decent
> when "git log" indents it by 4.
> 
Ok I will fix.


* Re: [PATCH 3/6] pci: add PCI quirk cmdmem fixup for Intel DSA device
  2020-03-30 21:27 ` [PATCH 3/6] pci: add PCI quirk cmdmem fixup for Intel DSA device Dave Jiang
  2020-03-31 15:59   ` Bjorn Helgaas
@ 2020-04-01  7:18   ` Christoph Hellwig
  2020-04-02  2:20     ` Dan Williams
  1 sibling, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2020-04-01  7:18 UTC (permalink / raw)
  To: Dave Jiang
  Cc: vkoul, tglx, mingo, bp, hpa, bhelgaas, gregkh, arnd,
	linux-kernel, x86, dmaengine, dan.j.williams, ashok.raj,
	fenghua.yu, linux-pci, tony.luck, jing.lin, sanjay.k.kumar

On Mon, Mar 30, 2020 at 02:27:06PM -0700, Dave Jiang wrote:
> Since there is no standard way that defines a PCI device that receives
> descriptors or commands with synchronous write operations, add quirk to set
> cmdmem for the Intel accelerator device that supports it.

Why do we need a quirk for this?  Even assuming a flag is needed in
struct pci_dev (and I don't really understand why that is needed to
start with), it could be set in ->probe.


* Re: [PATCH 4/6] device: add cmdmem support for MMIO address
  2020-03-30 21:27 ` [PATCH 4/6] device: add cmdmem support for MMIO address Dave Jiang
@ 2020-04-01  7:19   ` Christoph Hellwig
  0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2020-04-01  7:19 UTC (permalink / raw)
  To: Dave Jiang
  Cc: vkoul, tglx, mingo, bp, hpa, bhelgaas, gregkh, arnd,
	linux-kernel, x86, dmaengine, dan.j.williams, ashok.raj,
	fenghua.yu, linux-pci, tony.luck, jing.lin, sanjay.k.kumar

> +/**
> + * devm_cmdmem_remap - Managed wrapper for cmdmem ioremap()
> + * @dev: Generic device to remap IO address for
> + * @offset: Resource address to map
> + * @size: Size of map
> + *
> + * Managed cmdmem ioremap() wrapper.  Map is automatically unmapped on
> + * driver detach.
> + */
> +void __iomem *devm_cmdmem_remap(struct device *dev, resource_size_t offset,
> +				 resource_size_t size)
> +{
> +	if (!device_supports_cmdmem(dev))
> +		return NULL;
> +
> +	return devm_ioremap(dev, offset, size);

All this could be trivially open coded in the caller.


* Re: [PATCH 3/6] pci: add PCI quirk cmdmem fixup for Intel DSA device
  2020-04-01  7:18   ` Christoph Hellwig
@ 2020-04-02  2:20     ` Dan Williams
  2020-04-02  7:39       ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Dan Williams @ 2020-04-02  2:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Jiang, Vinod Koul, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Bjorn Helgaas, Greg KH,
	Arnd Bergmann, Linux Kernel Mailing List, X86 ML, dmaengine, Raj,
	Ashok, Fenghua Yu, linux-pci, Luck, Tony, Jing Lin,
	Sanjay K Kumar

On Wed, Apr 1, 2020 at 12:19 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Mar 30, 2020 at 02:27:06PM -0700, Dave Jiang wrote:
> > Since there is no standard way that defines a PCI device that receives
> > descriptors or commands with synchronous write operations, add quirk to set
> > cmdmem for the Intel accelerator device that supports it.
>
> Why do we need a quirk for this?  Even assuming a flag is needed in
> struct pci_dev (and I don't really understand why that is needed to
> start with), it could be set in ->probe.

The consideration in my mind is whether this new memory type and
instruction combination warrants a new __attribute__((noderef,
address_space(X))) separate from __iomem. If it stays a single device
concept layered on __iomem then yes, I agree it can all live locally
in the driver. However, when / if this facility becomes wider spread,
as the PCI ECR in question is trending, it might warrant general
annotation.

The enqcmds instruction does not operate on typical x86 mmio
addresses, only these special device portals offer the ability to have
non-posted writes with immediate results in the cpu condition code
flags.


* Re: [PATCH 3/6] pci: add PCI quirk cmdmem fixup for Intel DSA device
  2020-04-02  2:20     ` Dan Williams
@ 2020-04-02  7:39       ` Christoph Hellwig
  0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2020-04-02  7:39 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Dave Jiang, Vinod Koul, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Bjorn Helgaas,
	Greg KH, Arnd Bergmann, Linux Kernel Mailing List, X86 ML,
	dmaengine, Raj, Ashok, Fenghua Yu, linux-pci, Luck, Tony,
	Jing Lin, Sanjay K Kumar

On Wed, Apr 01, 2020 at 07:20:59PM -0700, Dan Williams wrote:
> On Wed, Apr 1, 2020 at 12:19 AM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Mon, Mar 30, 2020 at 02:27:06PM -0700, Dave Jiang wrote:
> > > Since there is no standard way that defines a PCI device that receives
> > > descriptors or commands with synchronous write operations, add quirk to set
> > > cmdmem for the Intel accelerator device that supports it.
> >
> > Why do we need a quirk for this?  Even assuming a flag is needed in
> > struct pci_dev (and I don't really understand why that is needed to
> > start with), it could be set in ->probe.
> 
> The consideration in my mind is whether this new memory type and
> instruction combination warrants a new __attribute__((noderef,
> address_space(X))) separate from __iomem. If it stays a single device
> concept layered on __iomem then yes, I agree it can all live locally
> in the driver. However, when / if this facility becomes wider spread,
> as the PCI ECR in question is trending, it might warrant general
> annotation.
> 
> The enqcmds instruction does not operate on typical x86 mmio
> addresses, only these special device portals offer the ability to have
> non-posted writes with immediate results in the cpu condition code
> flags.

But that is not what this series does at all.  And I think it makes
sense to wait until it gains adoption to think about a different address
space.  In this series we could just trivially kill patches 2-4 and make
it much easier to understand.


* Re: [PATCH 2/6] device/pci: add cmdmem cap to pci_dev
  2020-03-31 22:00 ` [PATCH 2/6] device/pci: add cmdmem cap to pci_dev Bjorn Helgaas
  2020-03-31 22:34   ` Thomas Gleixner
@ 2020-04-01 16:37   ` Dave Jiang
  1 sibling, 0 replies; 22+ messages in thread
From: Dave Jiang @ 2020-04-01 16:37 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: vkoul, tglx, mingo, bp, hpa, gregkh, arnd, linux-kernel, x86,
	dmaengine, dan.j.williams, ashok.raj, fenghua.yu, linux-pci,
	tony.luck, jing.lin, sanjay.k.kumar



On 3/31/2020 3:00 PM, Bjorn Helgaas wrote:
> On Tue, Mar 31, 2020 at 10:59:44AM -0700, Dave Jiang wrote:
>> On 3/31/2020 9:03 AM, Bjorn Helgaas wrote:
>>> On Mon, Mar 30, 2020 at 02:27:00PM -0700, Dave Jiang wrote:
>>>> Since the current accelerator devices do not have standard PCIe capability
>>>> enumeration for accepting ENQCMDS yet, for now an attribute of pdev->cmdmem has
>>>> been added to struct pci_dev.  Currently a PCI quirk must be used for the
>>>> devices that have such cap until the PCI cap is standardized. Add a helper
>>>> function to provide the check if a device supports the cmdmem capability.
>>>>
>>>> Such capability is expected to be added to PCIe device cap enumeration in
>>>> the future.
>>> This needs some sort of thumbnail description of what "synchronous
>>> write notification" and "cmdmem" mean.
>>
>> I will add more explanation.
>>
>>> Do you have a pointer to a PCI-SIG ECR or similar?
>>
>> Deferrable Memory Write (DMWr) ECR
>>
>> https://members.pcisig.com/wg/PCI-SIG/document/13747
>>
>>  From what I'm told it should be available for public review by EOW.
> 
> Please use terminology from the spec instead of things like
> "synchronous write notification".
> 
> AIUI, ENQCMDS is an x86 instruction.  That would have no meaning in
> the PCIe domain.
> 
> I'm not committing to acking any part of this before the ECR is
> accepted, but if you're adding support for the feature described by
> the ECR, you might as well add support for discovering the DMWr
> capability via Device Capabilities 2 as described in the ECR.

I'll look into adding the support for the ECR.

> 
> If you have hardware that supports DMWr but doesn't advertise it via
> Device Capabilities 2, you can always have a quirk that works around
> that lack.
> 


* Re: [PATCH 2/6] device/pci: add cmdmem cap to pci_dev
  2020-03-31 22:00 ` [PATCH 2/6] device/pci: add cmdmem cap to pci_dev Bjorn Helgaas
@ 2020-03-31 22:34   ` Thomas Gleixner
  2020-04-01 16:37   ` Dave Jiang
  1 sibling, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2020-03-31 22:34 UTC (permalink / raw)
  To: Bjorn Helgaas, Dave Jiang
  Cc: vkoul, mingo, bp, hpa, gregkh, arnd, linux-kernel, x86,
	dmaengine, dan.j.williams, ashok.raj, fenghua.yu, linux-pci,
	tony.luck, jing.lin, sanjay.k.kumar

Bjorn Helgaas <helgaas@kernel.org> writes:
> On Tue, Mar 31, 2020 at 10:59:44AM -0700, Dave Jiang wrote:
>> On 3/31/2020 9:03 AM, Bjorn Helgaas wrote:
>> > On Mon, Mar 30, 2020 at 02:27:00PM -0700, Dave Jiang wrote:
>> > > Since the current accelerator devices do not have standard PCIe capability
>> > > enumeration for accepting ENQCMDS yet, for now an attribute of pdev->cmdmem has
>> > > been added to struct pci_dev.  Currently a PCI quirk must be used for the
>> > > devices that have such cap until the PCI cap is standardized. Add a helper
>> > > function to provide the check if a device supports the cmdmem capability.
>> > > 
>> > > Such capability is expected to be added to PCIe device cap enumeration in
>> > > the future.
>> > This needs some sort of thumbnail description of what "synchronous
>> > write notification" and "cmdmem" mean.
>> 
>> I will add more explanation.
>> 
>> > Do you have a pointer to a PCI-SIG ECR or similar?
>> 
>> Deferrable Memory Write (DMWr) ECR
>> 
>> https://members.pcisig.com/wg/PCI-SIG/document/13747
>> 
>> From what I'm told it should be available for public review by EOW.
>
> Please use terminology from the spec instead of things like
> "synchronous write notification".
>
> AIUI, ENQCMDS is an x86 instruction.  That would have no meaning in
> the PCIe domain.
>
> I'm not committing to acking any part of this before the ECR is
> accepted, but if you're adding support for the feature described by
> the ECR, you might as well add support for discovering the DMWr
> capability via Device Capabilities 2 as described in the ECR.

Don't worry. There is nothing to decide and ack before the basic
architecture support for ENQCMD[S] is discussed and accepted. The
patches providing this support have been posted 2 hrs before this pile
hit the mailing lists yesterday.

Thanks,

        tglx



* Re: [PATCH 2/6] device/pci: add cmdmem cap to pci_dev
       [not found] <d798a7eb-ceb0-d81e-5422-f9e41058a098@intel.com>
@ 2020-03-31 22:00 ` Bjorn Helgaas
  2020-03-31 22:34   ` Thomas Gleixner
  2020-04-01 16:37   ` Dave Jiang
  0 siblings, 2 replies; 22+ messages in thread
From: Bjorn Helgaas @ 2020-03-31 22:00 UTC (permalink / raw)
  To: Dave Jiang
  Cc: vkoul, tglx, mingo, bp, hpa, gregkh, arnd, linux-kernel, x86,
	dmaengine, dan.j.williams, ashok.raj, fenghua.yu, linux-pci,
	tony.luck, jing.lin, sanjay.k.kumar

On Tue, Mar 31, 2020 at 10:59:44AM -0700, Dave Jiang wrote:
> On 3/31/2020 9:03 AM, Bjorn Helgaas wrote:
> > On Mon, Mar 30, 2020 at 02:27:00PM -0700, Dave Jiang wrote:
> > > Since the current accelerator devices do not have standard PCIe capability
> > > enumeration for accepting ENQCMDS yet, for now an attribute of pdev->cmdmem has
> > > been added to struct pci_dev.  Currently a PCI quirk must be used for the
> > > devices that have such cap until the PCI cap is standardized. Add a helper
> > > function to provide the check if a device supports the cmdmem capability.
> > > 
> > > Such capability is expected to be added to PCIe device cap enumeration in
> > > the future.
> > This needs some sort of thumbnail description of what "synchronous
> > write notification" and "cmdmem" mean.
> 
> I will add more explanation.
> 
> > Do you have a pointer to a PCI-SIG ECR or similar?
> 
> Deferrable Memory Write (DMWr) ECR
> 
> https://members.pcisig.com/wg/PCI-SIG/document/13747
> 
> From what I'm told it should be available for public review by EOW.

Please use terminology from the spec instead of things like
"synchronous write notification".

AIUI, ENQCMDS is an x86 instruction.  That would have no meaning in
the PCIe domain.

I'm not committing to acking any part of this before the ECR is
accepted, but if you're adding support for the feature described by
the ECR, you might as well add support for discovering the DMWr
capability via Device Capabilities 2 as described in the ECR.

If you have hardware that supports DMWr but doesn't advertise it via
Device Capabilities 2, you can always have a quirk that works around
that lack.


end of thread, other threads:[~2020-04-02  7:40 UTC | newest]

Thread overview: 22+ messages
2020-03-30 21:26 [PATCH 0/6] Add shared workqueue support for idxd driver Dave Jiang
2020-03-30 21:26 ` [PATCH 1/6] x86/asm: add iosubmit_cmds512_sync() based on enqcmds Dave Jiang
2020-03-30 21:27 ` [PATCH 2/6] device/pci: add cmdmem cap to pci_dev Dave Jiang
2020-03-31 10:04   ` Greg KH
2020-03-31 17:07     ` Dave Jiang
2020-03-31 17:24       ` Greg KH
2020-03-31 17:38         ` Dave Jiang
2020-03-31 16:03   ` Bjorn Helgaas
2020-03-31 21:44     ` Dave Jiang
2020-03-30 21:27 ` [PATCH 3/6] pci: add PCI quirk cmdmem fixup for Intel DSA device Dave Jiang
2020-03-31 15:59   ` Bjorn Helgaas
2020-03-31 18:02     ` Dave Jiang
2020-04-01  7:18   ` Christoph Hellwig
2020-04-02  2:20     ` Dan Williams
2020-04-02  7:39       ` Christoph Hellwig
2020-03-30 21:27 ` [PATCH 4/6] device: add cmdmem support for MMIO address Dave Jiang
2020-04-01  7:19   ` Christoph Hellwig
2020-03-30 21:27 ` [PATCH 5/6] dmaengine: idxd: add shared workqueue support Dave Jiang
2020-03-30 21:27 ` [PATCH 6/6] dmaengine: idxd: add ABI documentation for shared wq Dave Jiang
     [not found] <d798a7eb-ceb0-d81e-5422-f9e41058a098@intel.com>
2020-03-31 22:00 ` [PATCH 2/6] device/pci: add cmdmem cap to pci_dev Bjorn Helgaas
2020-03-31 22:34   ` Thomas Gleixner
2020-04-01 16:37   ` Dave Jiang
