All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features
@ 2023-01-03 16:34 Fenghua Yu
  2023-01-03 16:34 ` [PATCH 01/17] dmaengine: idxd: make misc interrupt one shot Fenghua Yu
                   ` (16 more replies)
  0 siblings, 17 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:34 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang; +Cc: Fenghua Yu, dmaengine

Applications can send 64B descriptors to the DSA device via CPU instructions
MOVDIR64B or ENQCMD. The application can choose to have the device write
back a completion record (CR) in system memory to indicate the status of the
descriptor submitted on completion.

With the DSA hardware, the device is able to do on demand paging through
the hardware by faulting in the user pages that do not have physical memory
page backing with assistance from IOMMU. In the spec this was designated as
the block on fault feature. While this hardware feature made operation
simpler, it also stalls the device engines while the memory pages are being
faulted in through Page Request Service (PRS). For applications sharing the
same workqueue (wq) or wqs in the same group, operations are stalled if
there are no free engines. To avoid slowing the performance of all other
running applications sharing the same device engine(s), PRS can to be
disabled and software can deal with partial completion.

The block on fault feature on DSA 1.0 can be disabled for the wq. However,
PRS is not completely disabled for the whole path. It is not disabled for
CRs or batch list for a batch operation.

The other issue is the DSA 1.0 error reporting mechanism, SWERROR register.
The SWERROR register can only report a single error at a time until the
driver reads and acknowledges the error. The follow on errors cannot be
reported until the current error is "cleared" by the software by writing
a bit to the SWERR register. If a large number of faults arrive and the
software cannot clear them fast enough, overflowed errors will be dropped
by the device.

A CR is the optional 32 bytes (DSA) or 64 bytes (IAA) status that is
written back for a submitted descriptor. If the address for the CR faults,
the error is reported to the SWERROR register instead.

With DSA 2.0 hardware [1], the event log feature is added. All errors are
reported as an entry in a circular buffer reside in the system memory.
The system admin is responsible to configure the size of the circular
buffer large enough per device to handle the potential errors that may be
reported. If the buffer is full and another error needs to be reported,
the device engine will block until there's a free slot in the buffer.
An event log entry for a faulted CR will contain the error information,
the CR address that faulted, and the expected CR content the device had
originally intended to write.

DSA 2.0 also introduces per wq PRS disable knob. This will disable all PRS
operations for the specific wq. The device will still have Address
Translation Service (ATS) on. When ATS fails on a memory address for a CR,
an eventlog entry will be written by the hardware into the event log
ring buffer. The driver software is expected to parse the event log entry,
fault in the address of the CR, and the write the content of the CR to
the memory address.

This patch series will implement the DSA 2 event log support. The support
for the handling of the faulted user CR is added. The driver is also
adding the same support for batch operation descriptors. With a batch
operation the handling of the event log entry is a bit more complex.
The faulting CR could be for the batch descriptor or any of the operation
descriptors within the batch. The hardware generates a batch identifier
that is used by the driver software to correlate the event log entries for
the relevant descriptors of that batch.

The faulting of source and destination addresses for the operation is not
handled by the driver. That is left to be handled by the user application
by faulting in the memory and re-submit the remaining operation.

This series consists of three parts:
1. Patch 1: Make misc interrupt one shot. Event Log interrupt depends on
   this patch. This patch was released before but is not in upstream yet:
   https://lore.kernel.org/dmaengine/165125374675.311834.10460196228320964350.stgit@djiang5-desk3.ch.intel.com/
2. Patches 2-16: Enable Event Log and Completion Record faulting.
3. Patch 17: Configure PRS disable per WQ.

[1] DSA 2.0 spec: https://cdrdv2-public.intel.com/671116/341204-intel-data-streaming-accelerator-spec.pdf

Dave Jiang (17):
  dmaengine: idxd: make misc interrupt one shot
  dmaengine: idxd: add event log size sysfs attribute
  dmaengine: idxd: setup event log configuration
  dmaengine: idxd: add interrupt handling for event log
  dmanegine: idxd: add debugfs for event log dump
  dmaengine: idxd: add per DSA wq workqueue for processing cr faults
  dmaengine: idxd: create kmem cache for event log fault items
  iommu: expose iommu_sva_find() to common header
  mm: export access_remote_vm() symbol
  dmaengine: idxd: process user page faults for completion record
  dmaengine: idxd: add descs_completed field for completion record
  dmaengine: idxd: process batch descriptor completion record faults
  dmaengine: idxd: add per file user counters for completion record
    faults
  dmaengine: idxd: add a device to represent the file opened
  dmaengine: idxd: expose fault counters to sysfs
  dmaengine: idxd: add pid to exported sysfs attribute for opened file
  dmaengine: idxd: add per wq PRS disable

 .../ABI/stable/sysfs-driver-dma-idxd          |  43 +++
 drivers/dma/Kconfig                           |   1 +
 drivers/dma/idxd/Makefile                     |   2 +-
 drivers/dma/idxd/cdev.c                       | 264 ++++++++++++++++--
 drivers/dma/idxd/debugfs.c                    | 138 +++++++++
 drivers/dma/idxd/device.c                     | 113 +++++++-
 drivers/dma/idxd/idxd.h                       |  63 +++++
 drivers/dma/idxd/init.c                       |  53 ++++
 drivers/dma/idxd/irq.c                        | 207 ++++++++++++--
 drivers/dma/idxd/registers.h                  | 105 ++++++-
 drivers/dma/idxd/sysfs.c                      | 112 +++++++-
 drivers/iommu/iommu-sva.h                     |   1 -
 include/linux/iommu.h                         |   7 +
 include/uapi/linux/idxd.h                     |  15 +-
 mm/memory.c                                   |   1 +
 15 files changed, 1059 insertions(+), 66 deletions(-)
 create mode 100644 drivers/dma/idxd/debugfs.c

-- 
2.32.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 01/17] dmaengine: idxd: make misc interrupt one shot
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
@ 2023-01-03 16:34 ` Fenghua Yu
  2023-01-03 16:34 ` [PATCH 02/17] dmaengine: idxd: add event log size sysfs attribute Fenghua Yu
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:34 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang; +Cc: Fenghua Yu, dmaengine, Sanjay Kumar, Tony Zhu

From: Dave Jiang <dave.jiang@intel.com>

Current code continuously processes the interrupt as long as the hardware
is setting the status bit. There's no reason to do that since the threaded
handler will get called again if another interrupt is asserted.

Also through testing, it has shown that if a misprogrammed (or malicious)
agent can continuously submit descriptors with bad completion record and
causes errors to be reported via the misc interrupt. Continuous processing
by the thread can cause software hang watchdog to kick off since the thread
isn't giving up the CPU.

Reported-by: Sanjay Kumar <sanjay.k.kumar@intel.com>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 drivers/dma/idxd/irq.c | 38 ++++++++++++--------------------------
 1 file changed, 12 insertions(+), 26 deletions(-)

diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c
index aa314ebec587..0d639303b515 100644
--- a/drivers/dma/idxd/irq.c
+++ b/drivers/dma/idxd/irq.c
@@ -217,13 +217,22 @@ static void idxd_int_handle_revoke(struct work_struct *work)
 	kfree(revoke);
 }
 
-static int process_misc_interrupts(struct idxd_device *idxd, u32 cause)
+irqreturn_t idxd_misc_thread(int vec, void *data)
 {
+	struct idxd_irq_entry *irq_entry = data;
+	struct idxd_device *idxd = ie_to_idxd(irq_entry);
 	struct device *dev = &idxd->pdev->dev;
 	union gensts_reg gensts;
 	u32 val = 0;
 	int i;
 	bool err = false;
+	u32 cause;
+
+	cause = ioread32(idxd->reg_base + IDXD_INTCAUSE_OFFSET);
+	if (!cause)
+		return IRQ_NONE;
+
+	iowrite32(cause, idxd->reg_base + IDXD_INTCAUSE_OFFSET);
 
 	if (cause & IDXD_INTC_HALT_STATE)
 		goto halt;
@@ -301,7 +310,7 @@ static int process_misc_interrupts(struct idxd_device *idxd, u32 cause)
 			      val);
 
 	if (!err)
-		return 0;
+		goto out;
 
 halt:
 	gensts.bits = ioread32(idxd->reg_base + IDXD_GENSTATS_OFFSET);
@@ -324,33 +333,10 @@ static int process_misc_interrupts(struct idxd_device *idxd, u32 cause)
 				"idxd halted, need %s.\n",
 				gensts.reset_type == IDXD_DEVICE_RESET_FLR ?
 				"FLR" : "system reset");
-			return -ENXIO;
 		}
 	}
 
-	return 0;
-}
-
-irqreturn_t idxd_misc_thread(int vec, void *data)
-{
-	struct idxd_irq_entry *irq_entry = data;
-	struct idxd_device *idxd = ie_to_idxd(irq_entry);
-	int rc;
-	u32 cause;
-
-	cause = ioread32(idxd->reg_base + IDXD_INTCAUSE_OFFSET);
-	if (cause)
-		iowrite32(cause, idxd->reg_base + IDXD_INTCAUSE_OFFSET);
-
-	while (cause) {
-		rc = process_misc_interrupts(idxd, cause);
-		if (rc < 0)
-			break;
-		cause = ioread32(idxd->reg_base + IDXD_INTCAUSE_OFFSET);
-		if (cause)
-			iowrite32(cause, idxd->reg_base + IDXD_INTCAUSE_OFFSET);
-	}
-
+out:
 	return IRQ_HANDLED;
 }
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 02/17] dmaengine: idxd: add event log size sysfs attribute
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
  2023-01-03 16:34 ` [PATCH 01/17] dmaengine: idxd: make misc interrupt one shot Fenghua Yu
@ 2023-01-03 16:34 ` Fenghua Yu
  2023-01-03 16:34 ` [PATCH 03/17] dmaengine: idxd: setup event log configuration Fenghua Yu
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:34 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang; +Cc: Fenghua Yu, dmaengine, Tony Zhu

From: Dave Jiang <dave.jiang@intel.com>

Add support for changing of the event log size. Event log is a
feature added to DSA 2.0 hardware to improve error reporting.
It supersedes the SWERROR register on DSA 1.0 hardware and hope
to prevent loss of reported errors.

The error log size determines how many error entries supported for
the device. It can be configured by the user via sysfs attribute.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 .../ABI/stable/sysfs-driver-dma-idxd          |  8 +++
 drivers/dma/idxd/idxd.h                       |  5 ++
 drivers/dma/idxd/init.c                       | 23 ++++++++
 drivers/dma/idxd/registers.h                  |  7 ++-
 drivers/dma/idxd/sysfs.c                      | 52 +++++++++++++++++++
 5 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/stable/sysfs-driver-dma-idxd b/Documentation/ABI/stable/sysfs-driver-dma-idxd
index 3becc9a82bdf..f7acb14bf255 100644
--- a/Documentation/ABI/stable/sysfs-driver-dma-idxd
+++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd
@@ -136,6 +136,14 @@ Description:	The last executed device administrative command's status/error.
 		Also last configuration error overloaded.
 		Writing to it will clear the status.
 
+What:		/sys/bus/dsa/devices/dsa<m>/event_log_size
+Date:		Sept 14, 2022
+KernelVersion: 6.2.0
+Contact:	dmaengine@vger.kernel.org
+Description:	The event log size to be configured. Default is 64 entries and
+		occupies 4k size if the evl entry is 64 bytes. It's visible
+		only on platforms that support the capability.
+
 What:		/sys/bus/dsa/devices/wq<m>.<n>/block_on_fault
 Date:		Oct 27, 2020
 KernelVersion:	5.11.0
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 7ced8d283d98..f42f87c490b4 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -260,6 +260,10 @@ struct idxd_driver_data {
 	int align;
 };
 
+struct idxd_evl {
+	u16 size;
+};
+
 struct idxd_device {
 	struct idxd_dev idxd_dev;
 	struct idxd_driver_data *data;
@@ -316,6 +320,7 @@ struct idxd_device {
 	struct idxd_pmu *idxd_pmu;
 
 	unsigned long *opcap_bmap;
+	struct idxd_evl *evl;
 };
 
 /* IDXD software descriptor */
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 529ea09c9094..3f4540741d11 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -327,6 +327,23 @@ static void idxd_cleanup_internals(struct idxd_device *idxd)
 	destroy_workqueue(idxd->wq);
 }
 
+static int idxd_init_evl(struct idxd_device *idxd)
+{
+	struct device *dev = &idxd->pdev->dev;
+	struct idxd_evl *evl;
+
+	if (idxd->hw.gen_cap.evl_support == 0)
+		return 0;
+
+	evl = kzalloc_node(sizeof(*evl), GFP_KERNEL, dev_to_node(dev));
+	if (!evl)
+		return -ENOMEM;
+
+	evl->size = IDXD_EVL_SIZE_MIN;
+	idxd->evl = evl;
+	return 0;
+}
+
 static int idxd_setup_internals(struct idxd_device *idxd)
 {
 	struct device *dev = &idxd->pdev->dev;
@@ -352,8 +369,14 @@ static int idxd_setup_internals(struct idxd_device *idxd)
 		goto err_wkq_create;
 	}
 
+	rc = idxd_init_evl(idxd);
+	if (rc < 0)
+		goto err_evl;
+
 	return 0;
 
+ err_evl:
+	destroy_workqueue(idxd->wq);
  err_wkq_create:
 	for (i = 0; i < idxd->max_groups; i++)
 		put_device(group_confdev(idxd->groups[i]));
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index fe3b8d04f9db..b36a743a94a4 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -31,7 +31,9 @@ union gen_cap_reg {
 		u64 rsvd:3;
 		u64 dest_readback:1;
 		u64 drain_readback:1;
-		u64 rsvd2:6;
+		u64 rsvd2:3;
+		u64 evl_support:2;
+		u64 rsvd4:1;
 		u64 max_xfer_shift:5;
 		u64 max_batch_shift:4;
 		u64 max_ims_mult:6;
@@ -276,6 +278,9 @@ union sw_err_reg {
 	u64 bits[4];
 } __packed;
 
+#define IDXD_EVL_SIZE_MIN	0x0040
+#define IDXD_EVL_SIZE_MAX	0xffff
+
 union msix_perm {
 	struct {
 		u32 rsvd:2;
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 3229dfc78650..3e27cd4d189f 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -1563,6 +1563,46 @@ static ssize_t cmd_status_store(struct device *dev, struct device_attribute *att
 }
 static DEVICE_ATTR_RW(cmd_status);
 
+static ssize_t event_log_size_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct idxd_device *idxd = confdev_to_idxd(dev);
+
+	if (!idxd->evl)
+		return -EOPNOTSUPP;
+
+	return sysfs_emit(buf, "%u\n", idxd->evl->size);
+}
+
+static ssize_t event_log_size_store(struct device *dev,
+				    struct device_attribute *attr,
+				    const char *buf, size_t count)
+{
+	struct idxd_device *idxd = confdev_to_idxd(dev);
+	unsigned long val;
+	int rc;
+
+	if (!idxd->evl)
+		return -EOPNOTSUPP;
+
+	rc = kstrtoul(buf, 10, &val);
+	if (rc < 0)
+		return -EINVAL;
+
+	if (idxd->state == IDXD_DEV_ENABLED)
+		return -EPERM;
+
+	if (!test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags))
+		return -EPERM;
+
+	if (val < IDXD_EVL_SIZE_MIN || val > IDXD_EVL_SIZE_MAX)
+		return -EINVAL;
+
+	idxd->evl->size = val;
+	return count;
+}
+static DEVICE_ATTR_RW(event_log_size);
+
 static bool idxd_device_attr_max_batch_size_invisible(struct attribute *attr,
 						      struct idxd_device *idxd)
 {
@@ -1585,6 +1625,13 @@ static bool idxd_device_attr_read_buffers_invisible(struct attribute *attr,
 		idxd->data->type == IDXD_TYPE_IAX;
 }
 
+static bool idxd_device_attr_event_log_size_invisible(struct attribute *attr,
+						      struct idxd_device *idxd)
+{
+	return (attr == &dev_attr_event_log_size.attr &&
+		!idxd->hw.gen_cap.evl_support);
+}
+
 static umode_t idxd_device_attr_visible(struct kobject *kobj,
 					struct attribute *attr, int n)
 {
@@ -1597,6 +1644,9 @@ static umode_t idxd_device_attr_visible(struct kobject *kobj,
 	if (idxd_device_attr_read_buffers_invisible(attr, idxd))
 		return 0;
 
+	if (idxd_device_attr_event_log_size_invisible(attr, idxd))
+		return 0;
+
 	return attr->mode;
 }
 
@@ -1622,6 +1672,7 @@ static struct attribute *idxd_device_attributes[] = {
 	&dev_attr_read_buffer_limit.attr,
 	&dev_attr_cdev_major.attr,
 	&dev_attr_cmd_status.attr,
+	&dev_attr_event_log_size.attr,
 	NULL,
 };
 
@@ -1643,6 +1694,7 @@ static void idxd_conf_device_release(struct device *dev)
 	bitmap_free(idxd->wq_enable_map);
 	kfree(idxd->wqs);
 	kfree(idxd->engines);
+	kfree(idxd->evl);
 	ida_free(&idxd_ida, idxd->id);
 	bitmap_free(idxd->opcap_bmap);
 	kfree(idxd);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 03/17] dmaengine: idxd: setup event log configuration
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
  2023-01-03 16:34 ` [PATCH 01/17] dmaengine: idxd: make misc interrupt one shot Fenghua Yu
  2023-01-03 16:34 ` [PATCH 02/17] dmaengine: idxd: add event log size sysfs attribute Fenghua Yu
@ 2023-01-03 16:34 ` Fenghua Yu
  2023-01-03 16:34 ` [PATCH 04/17] dmaengine: idxd: add interrupt handling for event log Fenghua Yu
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:34 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang; +Cc: Fenghua Yu, dmaengine, Tony Zhu

From: Dave Jiang <dave.jiang@intel.com>

Add setup of event log feature for supported device. Event log addresses
error reporting that was lacking in gen 1 DSA devices where a second error
event does not get reported when a first event is pending software
handling. The event log allows a circular buffer that the device can push
error events to. It is up to the user to create a large enough event log
ring in order to capture the expected events. The evl size can be set in
the device sysfs attribute. By default 64 entries are supported as minimal
when event log is enabled.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 drivers/dma/idxd/device.c    | 89 +++++++++++++++++++++++++++++++++++-
 drivers/dma/idxd/idxd.h      | 19 ++++++++
 drivers/dma/idxd/init.c      |  1 +
 drivers/dma/idxd/registers.h | 72 ++++++++++++++++++++++++++++-
 drivers/dma/idxd/sysfs.c     |  3 +-
 include/uapi/linux/idxd.h    |  1 +
 6 files changed, 181 insertions(+), 4 deletions(-)

diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 06f5d3783d77..307bbb6c80d8 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -748,6 +748,83 @@ void idxd_device_clear_state(struct idxd_device *idxd)
 	spin_unlock(&idxd->dev_lock);
 }
 
+static int idxd_device_evl_setup(struct idxd_device *idxd)
+{
+	union gencfg_reg gencfg;
+	union evlcfg_reg evlcfg;
+	union genctrl_reg genctrl;
+	struct device *dev = &idxd->pdev->dev;
+	void *addr;
+	dma_addr_t dma_addr;
+	int size;
+	struct idxd_evl *evl = idxd->evl;
+
+	if (!evl)
+		return 0;
+
+	size = evl_size(idxd);
+	/*
+	 * Address needs to be page aligned. However, dma_alloc_coherent() provides
+	 * at minimal page size aligned address. No manual alignment required.
+	 */
+	addr = dma_alloc_coherent(dev, size, &dma_addr, GFP_KERNEL);
+	if (!addr)
+		return -ENOMEM;
+
+	memset(addr, 0, size);
+
+	spin_lock(&evl->lock);
+	evl->log = addr;
+	evl->dma = dma_addr;
+	evl->log_size = size;
+
+	memset(&evlcfg, 0, sizeof(evlcfg));
+	evlcfg.bits[0] = dma_addr & GENMASK(63, 12);
+	evlcfg.size = evl->size;
+
+	iowrite64(evlcfg.bits[0], idxd->reg_base + IDXD_EVLCFG_OFFSET);
+	iowrite64(evlcfg.bits[1], idxd->reg_base + IDXD_EVLCFG_OFFSET + 8);
+
+	genctrl.bits = ioread32(idxd->reg_base + IDXD_GENCTRL_OFFSET);
+	genctrl.evl_int_en = 1;
+	iowrite32(genctrl.bits, idxd->reg_base + IDXD_GENCTRL_OFFSET);
+
+	gencfg.bits = ioread32(idxd->reg_base + IDXD_GENCFG_OFFSET);
+	gencfg.evl_en = 1;
+	iowrite32(gencfg.bits, idxd->reg_base + IDXD_GENCFG_OFFSET);
+
+	spin_unlock(&evl->lock);
+	return 0;
+}
+
+static void idxd_device_evl_free(struct idxd_device *idxd)
+{
+	union gencfg_reg gencfg;
+	union genctrl_reg genctrl;
+	struct device *dev = &idxd->pdev->dev;
+	struct idxd_evl *evl = idxd->evl;
+
+	gencfg.bits = ioread32(idxd->reg_base + IDXD_GENCFG_OFFSET);
+	if (!gencfg.evl_en)
+		return;
+
+	spin_lock(&evl->lock);
+	gencfg.evl_en = 0;
+	iowrite32(gencfg.bits, idxd->reg_base + IDXD_GENCFG_OFFSET);
+
+	genctrl.bits = ioread32(idxd->reg_base + IDXD_GENCTRL_OFFSET);
+	genctrl.evl_int_en = 0;
+	iowrite32(genctrl.bits, idxd->reg_base + IDXD_GENCTRL_OFFSET);
+
+	iowrite64(0, idxd->reg_base + IDXD_EVLCFG_OFFSET);
+	iowrite64(0, idxd->reg_base + IDXD_EVLCFG_OFFSET + 8);
+
+	dma_free_coherent(dev, evl->log_size, evl->log, evl->dma);
+	evl->log = NULL;
+	evl->size = IDXD_EVL_SIZE_MIN;
+	spin_unlock(&evl->lock);
+}
+
 static void idxd_group_config_write(struct idxd_group *group)
 {
 	struct idxd_device *idxd = group->idxd;
@@ -1441,15 +1518,24 @@ int idxd_device_drv_probe(struct idxd_dev *idxd_dev)
 	if (rc < 0)
 		return -ENXIO;
 
+	rc = idxd_device_evl_setup(idxd);
+	if (rc < 0) {
+		idxd->cmd_status = IDXD_SCMD_DEV_EVL_ERR;
+		return rc;
+	}
+
 	/* Start device */
 	rc = idxd_device_enable(idxd);
-	if (rc < 0)
+	if (rc < 0) {
+		idxd_device_evl_free(idxd);
 		return rc;
+	}
 
 	/* Setup DMA device without channels */
 	rc = idxd_register_dma_device(idxd);
 	if (rc < 0) {
 		idxd_device_disable(idxd);
+		idxd_device_evl_free(idxd);
 		idxd->cmd_status = IDXD_SCMD_DEV_DMA_ERR;
 		return rc;
 	}
@@ -1478,6 +1564,7 @@ void idxd_device_drv_remove(struct idxd_dev *idxd_dev)
 	idxd_device_disable(idxd);
 	if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags))
 		idxd_device_reset(idxd);
+	idxd_device_evl_free(idxd);
 }
 
 static enum idxd_dev_type dev_types[] = {
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index f42f87c490b4..55124e400559 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -261,7 +261,15 @@ struct idxd_driver_data {
 };
 
 struct idxd_evl {
+	/* Lock to protect event log access. */
+	spinlock_t lock;
+	void *log;
+	dma_addr_t dma;
+	/* Total size of event log = number of entries * entry size. */
+	unsigned int log_size;
+	/* The number of entries in the event log. */
 	u16 size;
+	u16 head;
 };
 
 struct idxd_device {
@@ -323,6 +331,17 @@ struct idxd_device {
 	struct idxd_evl *evl;
 };
 
+static inline unsigned int evl_ent_size(struct idxd_device *idxd)
+{
+	return idxd->hw.gen_cap.evl_support ?
+	       (32 * (1 << idxd->hw.gen_cap.evl_support)) : 0;
+}
+
+static inline unsigned int evl_size(struct idxd_device *idxd)
+{
+	return idxd->evl->size * evl_ent_size(idxd);
+}
+
 /* IDXD software descriptor */
 struct idxd_desc {
 	union {
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 3f4540741d11..7cef0559ef15 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -339,6 +339,7 @@ static int idxd_init_evl(struct idxd_device *idxd)
 	if (!evl)
 		return -ENOMEM;
 
+	spin_lock_init(&evl->lock);
 	evl->size = IDXD_EVL_SIZE_MIN;
 	idxd->evl = evl;
 	return 0;
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index b36a743a94a4..8ea0c45cbe2c 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -3,6 +3,8 @@
 #ifndef _IDXD_REGISTERS_H_
 #define _IDXD_REGISTERS_H_
 
+#include <uapi/linux/idxd.h>
+
 /* PCI Config */
 #define PCI_DEVICE_ID_INTEL_DSA_SPR0	0x0b25
 #define PCI_DEVICE_ID_INTEL_IAX_SPR0	0x0cfe
@@ -119,7 +121,8 @@ union gencfg_reg {
 		u32 rdbuf_limit:8;
 		u32 rsvd:4;
 		u32 user_int_en:1;
-		u32 rsvd2:19;
+		u32 evl_en:1;
+		u32 rsvd2:18;
 	};
 	u32 bits;
 } __packed;
@@ -129,7 +132,8 @@ union genctrl_reg {
 	struct {
 		u32 softerr_int_en:1;
 		u32 halt_int_en:1;
-		u32 rsvd:30;
+		u32 evl_int_en:1;
+		u32 rsvd:29;
 	};
 	u32 bits;
 } __packed;
@@ -278,6 +282,21 @@ union sw_err_reg {
 	u64 bits[4];
 } __packed;
 
+#define IDXD_EVLCFG_OFFSET	0xe0
+union evlcfg_reg {
+	struct {
+		u64 pasid_en:1;
+		u64 priv:1;
+		u64 rsvd:10;
+		u64 base_addr:52;
+
+		u64 size:16;
+		u64 pasid:20;
+		u64 rsvd2:28;
+	};
+	u64 bits[2];
+} __packed;
+
 #define IDXD_EVL_SIZE_MIN	0x0040
 #define IDXD_EVL_SIZE_MAX	0xffff
 
@@ -518,4 +537,53 @@ union filter_cfg {
 	u64 val;
 } __packed;
 
+struct __evl_entry {
+	u64 rsvd:2;
+	u64 desc_valid:1;
+	u64 wq_idx_valid:1;
+	u64 batch:1;
+	u64 fault_rw:1;
+	u64 priv:1;
+	u64 err_info_valid:1;
+	u64 error:8;
+	u64 wq_idx:8;
+	u64 batch_id:8;
+	u64 operation:8;
+	u64 pasid:20;
+	u64 rsvd2:4;
+
+	u16 batch_idx;
+	u16 rsvd3;
+	union {
+		/* Invalid Flags 0x11 */
+		u32 invalid_flags;
+		/* Invalid Int Handle 0x19 */
+		/* Page fault 0x1a */
+		/* Page fault 0x06, 0x1f, only operand_id */
+		/* Page fault before drain or in batch, 0x26, 0x27 */
+		struct {
+			u16 int_handle;
+			u16 rci:1;
+			u16 ims:1;
+			u16 rcr:1;
+			u16 first_err_in_batch:1;
+			u16 rsvd4_2:9;
+			u16 operand_id:3;
+		};
+	};
+	u64 fault_addr;
+	u64 rsvd5;
+} __packed;
+
+struct dsa_evl_entry {
+	struct __evl_entry e;
+	struct dsa_completion_record cr;
+} __packed;
+
+struct iax_evl_entry {
+	struct __evl_entry e;
+	u64 rsvd[4];
+	struct iax_completion_record cr;
+} __packed;
+
 #endif
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 3e27cd4d189f..3e8c4ecd40ee 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -1595,7 +1595,8 @@ static ssize_t event_log_size_store(struct device *dev,
 	if (!test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags))
 		return -EPERM;
 
-	if (val < IDXD_EVL_SIZE_MIN || val > IDXD_EVL_SIZE_MAX)
+	if (val < IDXD_EVL_SIZE_MIN || val > IDXD_EVL_SIZE_MAX ||
+	    (val * evl_ent_size(idxd) > ULONG_MAX - idxd->evl->dma))
 		return -EINVAL;
 
 	idxd->evl->size = val;
diff --git a/include/uapi/linux/idxd.h b/include/uapi/linux/idxd.h
index 1d553bedbdb5..96b552614ee7 100644
--- a/include/uapi/linux/idxd.h
+++ b/include/uapi/linux/idxd.h
@@ -30,6 +30,7 @@ enum idxd_scmd_stat {
 	IDXD_SCMD_WQ_NO_PRIV = 0x800f0000,
 	IDXD_SCMD_WQ_IRQ_ERR = 0x80100000,
 	IDXD_SCMD_WQ_USER_NO_IOMMU = 0x80110000,
+	IDXD_SCMD_DEV_EVL_ERR = 0x80120000,
 };
 
 #define IDXD_SCMD_SOFTERR_MASK	0x80000000
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 04/17] dmaengine: idxd: add interrupt handling for event log
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
                   ` (2 preceding siblings ...)
  2023-01-03 16:34 ` [PATCH 03/17] dmaengine: idxd: setup event log configuration Fenghua Yu
@ 2023-01-03 16:34 ` Fenghua Yu
  2023-01-03 16:34 ` [PATCH 05/17] dmanegine: idxd: add debugfs for event log dump Fenghua Yu
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:34 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang; +Cc: Fenghua Yu, dmaengine, Tony Zhu

From: Dave Jiang <dave.jiang@intel.com>

An event log interrupt is raised in the misc interrupt INTCAUSE register
when an event is written by the hardware. Add basic event log processing
support to the interrupt handler. The event log is a ring where the
hardware owns the tail and the software owns the head. The hardware will
advance the tail index when an additional event has been pushed to memory.
The software will process the log entry and then advances the head. The
log is full when (tail + 1) % log_size = head. The hardware will stop
writing when the log is full. The user is expected to create a log size
large enough to handle all the expected events.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 drivers/dma/idxd/irq.c       | 48 ++++++++++++++++++++++++++++++++++++
 drivers/dma/idxd/registers.h | 19 ++++++++++++++
 include/uapi/linux/idxd.h    |  1 +
 3 files changed, 68 insertions(+)

diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c
index 0d639303b515..52b8b7d9db22 100644
--- a/drivers/dma/idxd/irq.c
+++ b/drivers/dma/idxd/irq.c
@@ -217,6 +217,49 @@ static void idxd_int_handle_revoke(struct work_struct *work)
 	kfree(revoke);
 }
 
+static void process_evl_entry(struct idxd_device *idxd, struct __evl_entry *entry_head)
+{
+	struct device *dev = &idxd->pdev->dev;
+	u8 status;
+
+	status = DSA_COMP_STATUS(entry_head->error);
+	dev_warn_ratelimited(dev, "Device error %#x operation: %#x fault addr: %#llx\n",
+			     status, entry_head->operation, entry_head->fault_addr);
+}
+
+static void process_evl_entries(struct idxd_device *idxd)
+{
+	union evl_status_reg evl_status;
+	unsigned int h, t;
+	struct idxd_evl *evl = idxd->evl;
+	struct __evl_entry *entry_head;
+	unsigned int ent_size = evl_ent_size(idxd);
+	u32 size;
+
+	evl_status.bits = 0;
+	evl_status.int_pending = 1;
+
+	spin_lock(&evl->lock);
+	/* Clear interrupt pending bit */
+	iowrite32(evl_status.bits_upper32,
+		  idxd->reg_base + IDXD_EVLSTATUS_OFFSET + sizeof(u32));
+	h = evl->head;
+	evl_status.bits = ioread64(idxd->reg_base + IDXD_EVLSTATUS_OFFSET);
+	t = evl_status.tail;
+	size = idxd->evl->size;
+
+	while (h != t) {
+		entry_head = (struct __evl_entry *)(evl->log + (h * ent_size));
+		process_evl_entry(idxd, entry_head);
+		h = (h + 1) % size;
+	}
+
+	evl->head = h;
+	evl_status.head = h;
+	iowrite32(evl_status.bits_lower32, idxd->reg_base + IDXD_EVLSTATUS_OFFSET);
+	spin_unlock(&evl->lock);
+}
+
 irqreturn_t idxd_misc_thread(int vec, void *data)
 {
 	struct idxd_irq_entry *irq_entry = data;
@@ -304,6 +347,11 @@ irqreturn_t idxd_misc_thread(int vec, void *data)
 		perfmon_counter_overflow(idxd);
 	}
 
+	if (cause & IDXD_INTC_EVL) {
+		val |= IDXD_INTC_EVL;
+		process_evl_entries(idxd);
+	}
+
 	val ^= cause;
 	if (val)
 		dev_warn_once(dev, "Unexpected interrupt cause bits set: %#x\n",
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index 8ea0c45cbe2c..a3c56847b38a 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -168,6 +168,7 @@ enum idxd_device_reset_type {
 #define IDXD_INTC_OCCUPY			0x04
 #define IDXD_INTC_PERFMON_OVFL		0x08
 #define IDXD_INTC_HALT_STATE		0x10
+#define IDXD_INTC_EVL			0x20
 #define IDXD_INTC_INT_HANDLE_REVOKED	0x80000000
 
 #define IDXD_CMD_OFFSET			0xa0
@@ -537,6 +538,24 @@ union filter_cfg {
 	u64 val;
 } __packed;
 
+#define IDXD_EVLSTATUS_OFFSET		0xf0
+
+union evl_status_reg {
+	struct {
+		u32 head:16;
+		u32 rsvd:16;
+		u32 tail:16;
+		u32 rsvd2:14;
+		u32 int_pending:1;
+		u32 rsvd3:1;
+	};
+	struct {
+		u32 bits_lower32;
+		u32 bits_upper32;
+	};
+	u64 bits;
+} __packed;
+
 struct __evl_entry {
 	u64 rsvd:2;
 	u64 desc_valid:1;
diff --git a/include/uapi/linux/idxd.h b/include/uapi/linux/idxd.h
index 96b552614ee7..1b33834336ab 100644
--- a/include/uapi/linux/idxd.h
+++ b/include/uapi/linux/idxd.h
@@ -168,6 +168,7 @@ enum iax_completion_status {
 
 #define DSA_COMP_STATUS_MASK		0x7f
 #define DSA_COMP_STATUS_WRITE		0x80
+#define DSA_COMP_STATUS(status)		((status) & DSA_COMP_STATUS_MASK)
 
 struct dsa_hw_desc {
 	uint32_t	pasid:20;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 05/17] dmanegine: idxd: add debugfs for event log dump
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
                   ` (3 preceding siblings ...)
  2023-01-03 16:34 ` [PATCH 04/17] dmaengine: idxd: add interrupt handling for event log Fenghua Yu
@ 2023-01-03 16:34 ` Fenghua Yu
  2023-01-03 16:34 ` [PATCH 06/17] dmaengine: idxd: add per DSA wq workqueue for processing cr faults Fenghua Yu
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:34 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang; +Cc: Fenghua Yu, dmaengine, Tony Zhu

From: Dave Jiang <dave.jiang@intel.com>

Add debugfs entry to dump the content of the event log for debugging. The
function will dump all non-zero entries in the event log. It will note
which entries are processed and which entries are still pending processing
at the time of the dump. The entries may not always be in chronological
order due to the log is a circular buffer.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 drivers/dma/idxd/Makefile  |   2 +-
 drivers/dma/idxd/debugfs.c | 138 +++++++++++++++++++++++++++++++++++++
 drivers/dma/idxd/idxd.h    |   9 +++
 drivers/dma/idxd/init.c    |  12 ++++
 include/uapi/linux/idxd.h  |   6 +-
 5 files changed, 164 insertions(+), 3 deletions(-)
 create mode 100644 drivers/dma/idxd/debugfs.c

diff --git a/drivers/dma/idxd/Makefile b/drivers/dma/idxd/Makefile
index a1e9f2b3a37c..dc096839ac63 100644
--- a/drivers/dma/idxd/Makefile
+++ b/drivers/dma/idxd/Makefile
@@ -1,7 +1,7 @@
 ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=IDXD
 
 obj-$(CONFIG_INTEL_IDXD) += idxd.o
-idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o
+idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o debugfs.o
 
 idxd-$(CONFIG_INTEL_IDXD_PERFMON) += perfmon.o
 
diff --git a/drivers/dma/idxd/debugfs.c b/drivers/dma/idxd/debugfs.c
new file mode 100644
index 000000000000..9cfbd9b14c4c
--- /dev/null
+++ b/drivers/dma/idxd/debugfs.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2021 Intel Corporation. All rights rsvd. */
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/debugfs.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <uapi/linux/idxd.h>
+#include "idxd.h"
+#include "registers.h"
+
+static struct dentry *idxd_debugfs_dir;
+
+static void dump_event_entry(struct idxd_device *idxd, struct seq_file *s,
+			     u16 index, int *count, bool processed)
+{
+	struct idxd_evl *evl = idxd->evl;
+	struct dsa_evl_entry *entry;
+	struct dsa_completion_record *cr;
+	u64 *raw;
+	int i;
+	int evl_strides = evl_ent_size(idxd) / sizeof(u64);
+
+	entry = (struct dsa_evl_entry *)evl->log + index;
+
+	if (!entry->e.desc_valid)
+		return;
+
+	seq_printf(s, "Event Log entry %d (real index %u) processed: %u\n",
+		   *count, index, processed);
+
+	seq_printf(s, "desc valid %u wq idx valid %u\n"
+		   "batch %u fault rw %u priv %u error 0x%x\n"
+		   "wq idx %u op %#x pasid %u batch idx %u\n"
+		   "fault addr %#llx\n",
+		   entry->e.desc_valid, entry->e.wq_idx_valid,
+		   entry->e.batch, entry->e.fault_rw, entry->e.priv,
+		   entry->e.error, entry->e.wq_idx, entry->e.operation,
+		   entry->e.pasid, entry->e.batch_idx, entry->e.fault_addr);
+
+	cr = &entry->cr;
+	seq_printf(s, "status %#x result %#x fault_info %#x bytes_completed %u\n"
+		   "fault addr %#llx inv flags %#x\n\n",
+		   cr->status, cr->result, cr->fault_info, cr->bytes_completed,
+		   cr->fault_addr, cr->invalid_flags);
+
+	raw = (u64 *)entry;
+
+	for (i = 0; i < evl_strides; i++)
+		seq_printf(s, "entry[%d] = %#llx\n", i, raw[i]);
+
+	seq_puts(s, "\n");
+	*count += 1;
+}
+
+static int debugfs_evl_show(struct seq_file *s, void *d)
+{
+	struct idxd_device *idxd = s->private;
+	struct idxd_evl *evl = idxd->evl;
+	union evl_status_reg evl_status;
+	u16 h, t, evl_size, i;
+	int count = 0;
+	bool processed = true;
+
+	if (!evl || !evl->log)
+		return 0;
+
+	spin_lock(&evl->lock);
+
+	h = evl->head;
+	evl_status.bits = ioread64(idxd->reg_base + IDXD_EVLSTATUS_OFFSET);
+	t = evl_status.tail;
+	evl_size = evl->size;
+
+	seq_printf(s, "Event Log head %u tail %u interrupt pending %u\n\n",
+		   evl_status.head, evl_status.tail, evl_status.int_pending);
+
+	i = t;
+	while (1) {
+		i = (i + 1) % evl_size;
+		if (i == t)
+			break;
+
+		if (processed && i == h)
+			processed = false;
+		dump_event_entry(idxd, s, i, &count, processed);
+	}
+
+	spin_unlock(&evl->lock);
+	return 0;
+}
+
+DEFINE_SHOW_ATTRIBUTE(debugfs_evl);
+
+int idxd_device_init_debugfs(struct idxd_device *idxd)
+{
+	if (IS_ERR_OR_NULL(idxd_debugfs_dir))
+		return 0;
+
+	idxd->dbgfs_dir = debugfs_create_dir(dev_name(idxd_confdev(idxd)), idxd_debugfs_dir);
+	if (IS_ERR(idxd->dbgfs_dir))
+		return PTR_ERR(idxd->dbgfs_dir);
+
+	if (idxd->evl) {
+		idxd->dbgfs_evl_file = debugfs_create_file("event_log", 0400,
+							   idxd->dbgfs_dir, idxd,
+							   &debugfs_evl_fops);
+		if (IS_ERR(idxd->dbgfs_evl_file)) {
+			debugfs_remove_recursive(idxd->dbgfs_dir);
+			idxd->dbgfs_dir = NULL;
+			return PTR_ERR(idxd->dbgfs_evl_file);
+		}
+	}
+
+	return 0;
+}
+
+void idxd_device_remove_debugfs(struct idxd_device *idxd)
+{
+	debugfs_remove_recursive(idxd->dbgfs_dir);
+}
+
+int idxd_init_debugfs(void)
+{
+	if (!debugfs_initialized())
+		return 0;
+
+	idxd_debugfs_dir = debugfs_create_dir(KBUILD_MODNAME, NULL);
+	if (IS_ERR(idxd_debugfs_dir))
+		return  PTR_ERR(idxd_debugfs_dir);
+	return 0;
+}
+
+void idxd_remove_debugfs(void)
+{
+	debugfs_remove_recursive(idxd_debugfs_dir);
+}
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 55124e400559..0503c6f1629b 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -329,6 +329,9 @@ struct idxd_device {
 
 	unsigned long *opcap_bmap;
 	struct idxd_evl *evl;
+
+	struct dentry *dbgfs_dir;
+	struct dentry *dbgfs_evl_file;
 };
 
 static inline unsigned int evl_ent_size(struct idxd_device *idxd)
@@ -702,4 +705,10 @@ static inline void perfmon_init(void) {}
 static inline void perfmon_exit(void) {}
 #endif
 
+/* debugfs */
+int idxd_device_init_debugfs(struct idxd_device *idxd);
+void idxd_device_remove_debugfs(struct idxd_device *idxd);
+int idxd_init_debugfs(void);
+void idxd_remove_debugfs(void);
+
 #endif
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 7cef0559ef15..9a5ab223d8ac 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -661,6 +661,10 @@ static int idxd_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto err_dev_register;
 	}
 
+	rc = idxd_device_init_debugfs(idxd);
+	if (rc)
+		dev_warn(dev, "IDXD debugfs failed to setup\n");
+
 	dev_info(&pdev->dev, "Intel(R) Accelerator Device (v%x)\n",
 		 idxd->hw.version);
 
@@ -723,6 +727,7 @@ static void idxd_remove(struct pci_dev *pdev)
 	idxd_shutdown(pdev);
 	if (device_pasid_enabled(idxd))
 		idxd_disable_system_pasid(idxd);
+	idxd_device_remove_debugfs(idxd);
 
 	irq_entry = idxd_get_ie(idxd, 0);
 	free_irq(irq_entry->vector, irq_entry);
@@ -780,6 +785,10 @@ static int __init idxd_init_module(void)
 	if (err)
 		goto err_cdev_register;
 
+	err = idxd_init_debugfs();
+	if (err)
+		goto err_debugfs;
+
 	err = pci_register_driver(&idxd_pci_driver);
 	if (err)
 		goto err_pci_register;
@@ -787,6 +796,8 @@ static int __init idxd_init_module(void)
 	return 0;
 
 err_pci_register:
+	idxd_remove_debugfs();
+err_debugfs:
 	idxd_cdev_remove();
 err_cdev_register:
 	idxd_driver_unregister(&idxd_user_drv);
@@ -807,5 +818,6 @@ static void __exit idxd_exit_module(void)
 	pci_unregister_driver(&idxd_pci_driver);
 	idxd_cdev_remove();
 	perfmon_exit();
+	idxd_remove_debugfs();
 }
 module_exit(idxd_exit_module);
diff --git a/include/uapi/linux/idxd.h b/include/uapi/linux/idxd.h
index 1b33834336ab..9f66a40287b7 100644
--- a/include/uapi/linux/idxd.h
+++ b/include/uapi/linux/idxd.h
@@ -286,7 +286,8 @@ struct dsa_completion_record {
 		uint8_t		result;
 		uint8_t		dif_status;
 	};
-	uint16_t		rsvd;
+	uint8_t			fault_info;
+	uint8_t			rsvd;
 	uint32_t		bytes_completed;
 	uint64_t		fault_addr;
 	union {
@@ -335,7 +336,8 @@ struct dsa_raw_completion_record {
 struct iax_completion_record {
 	volatile uint8_t        status;
 	uint8_t                 error_code;
-	uint16_t                rsvd;
+	uint8_t			fault_info;
+	uint8_t			rsvd;
 	uint32_t                bytes_completed;
 	uint64_t                fault_addr;
 	uint32_t                invalid_flags;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 06/17] dmaengine: idxd: add per DSA wq workqueue for processing cr faults
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
                   ` (4 preceding siblings ...)
  2023-01-03 16:34 ` [PATCH 05/17] dmanegine: idxd: add debugfs for event log dump Fenghua Yu
@ 2023-01-03 16:34 ` Fenghua Yu
  2023-01-03 16:34 ` [PATCH 07/17] dmaengine: idxd: create kmem cache for event log fault items Fenghua Yu
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:34 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang; +Cc: Fenghua Yu, dmaengine, Tony Zhu

From: Dave Jiang <dave.jiang@intel.com>

Add a workqueue for user submitted completion record fault processing.
The workqueue creation and destruction lifetime will be tied to the user
sub-driver since it will only be used when the wq is a user type.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 drivers/dma/idxd/cdev.c | 11 +++++++++++
 drivers/dma/idxd/idxd.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index e13e92609943..493e4db150d5 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -330,6 +330,13 @@ static int idxd_user_drv_probe(struct idxd_dev *idxd_dev)
 	}
 
 	mutex_lock(&wq->wq_lock);
+
+	wq->wq = create_workqueue(dev_name(wq_confdev(wq)));
+	if (!wq->wq) {
+		rc = -ENOMEM;
+		goto wq_err;
+	}
+
 	wq->type = IDXD_WQT_USER;
 	rc = drv_enable_wq(wq);
 	if (rc < 0)
@@ -348,7 +355,9 @@ static int idxd_user_drv_probe(struct idxd_dev *idxd_dev)
 err_cdev:
 	drv_disable_wq(wq);
 err:
+	destroy_workqueue(wq->wq);
 	wq->type = IDXD_WQT_NONE;
+wq_err:
 	mutex_unlock(&wq->wq_lock);
 	return rc;
 }
@@ -361,6 +370,8 @@ static void idxd_user_drv_remove(struct idxd_dev *idxd_dev)
 	idxd_wq_del_cdev(wq);
 	drv_disable_wq(wq);
 	wq->type = IDXD_WQT_NONE;
+	destroy_workqueue(wq->wq);
+	wq->wq = NULL;
 	mutex_unlock(&wq->wq_lock);
 }
 
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 0503c6f1629b..3f182540b040 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -185,6 +185,7 @@ struct idxd_wq {
 	struct idxd_dev idxd_dev;
 	struct idxd_cdev *idxd_cdev;
 	struct wait_queue_head err_queue;
+	struct workqueue_struct *wq;
 	struct idxd_device *idxd;
 	int id;
 	struct idxd_irq_entry ie;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 07/17] dmaengine: idxd: create kmem cache for event log fault items
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
                   ` (5 preceding siblings ...)
  2023-01-03 16:34 ` [PATCH 06/17] dmaengine: idxd: add per DSA wq workqueue for processing cr faults Fenghua Yu
@ 2023-01-03 16:34 ` Fenghua Yu
  2023-01-03 16:34 ` [PATCH 08/17] iommu: expose iommu_sva_find() to common header Fenghua Yu
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:34 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang; +Cc: Fenghua Yu, dmaengine, Tony Zhu

From: Dave Jiang <dave.jiang@intel.com>

Add a kmem cache per device for allocating event log fault context. The
context allows an event log entry to be copied and passed to a software
workqueue to be processed. Due to each device can have different sized
event log entry depending on device type, it's not possible to have a
global kmem cache.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 drivers/dma/idxd/idxd.h  | 10 ++++++++++
 drivers/dma/idxd/init.c  |  9 +++++++++
 drivers/dma/idxd/sysfs.c |  1 +
 3 files changed, 20 insertions(+)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 3f182540b040..b4fa1051a482 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -273,6 +273,15 @@ struct idxd_evl {
 	u16 head;
 };
 
+struct idxd_evl_fault {
+	struct work_struct work;
+	struct idxd_wq *wq;
+	u8 status;
+
+	/* make this last member always */
+	struct __evl_entry entry[];
+};
+
 struct idxd_device {
 	struct idxd_dev idxd_dev;
 	struct idxd_driver_data *data;
@@ -330,6 +339,7 @@ struct idxd_device {
 
 	unsigned long *opcap_bmap;
 	struct idxd_evl *evl;
+	struct kmem_cache *evl_cache;
 
 	struct dentry *dbgfs_dir;
 	struct dentry *dbgfs_evl_file;
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 9a5ab223d8ac..74d6a12d4bd3 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -341,6 +341,15 @@ static int idxd_init_evl(struct idxd_device *idxd)
 
 	spin_lock_init(&evl->lock);
 	evl->size = IDXD_EVL_SIZE_MIN;
+
+	idxd->evl_cache = kmem_cache_create(dev_name(idxd_confdev(idxd)),
+					    sizeof(struct idxd_evl_fault) + evl_ent_size(idxd),
+					    0, 0, NULL);
+	if (!idxd->evl_cache) {
+		kfree(evl);
+		return -ENOMEM;
+	}
+
 	idxd->evl = evl;
 	return 0;
 }
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 3e8c4ecd40ee..ea696e3c5680 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -1696,6 +1696,7 @@ static void idxd_conf_device_release(struct device *dev)
 	kfree(idxd->wqs);
 	kfree(idxd->engines);
 	kfree(idxd->evl);
+	kmem_cache_destroy(idxd->evl_cache);
 	ida_free(&idxd_ida, idxd->id);
 	bitmap_free(idxd->opcap_bmap);
 	kfree(idxd);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 08/17] iommu: expose iommu_sva_find() to common header
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
                   ` (6 preceding siblings ...)
  2023-01-03 16:34 ` [PATCH 07/17] dmaengine: idxd: create kmem cache for event log fault items Fenghua Yu
@ 2023-01-03 16:34 ` Fenghua Yu
  2023-01-03 16:34 ` [PATCH 09/17] mm: export access_remote_vm() symbol Fenghua Yu
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:34 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang
  Cc: Fenghua Yu, dmaengine, Tony Zhu, Joerg Roedel, Will Deacon,
	Robin Murphy, iommu

From: Dave Jiang <dave.jiang@intel.com>

Move iommu_sva_find() prototype to global header from local header in order
to allow drivers to call the function. Used by idxd driver to find the mm
from device reported PASID. The symbol is already exported.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: iommu@lists.linux.dev
---
 drivers/iommu/iommu-sva.h | 1 -
 include/linux/iommu.h     | 7 +++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu-sva.h b/drivers/iommu/iommu-sva.h
index 7215a761b962..102eae1817a2 100644
--- a/drivers/iommu/iommu-sva.h
+++ b/drivers/iommu/iommu-sva.h
@@ -9,7 +9,6 @@
 #include <linux/mm_types.h>
 
 int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max);
-struct mm_struct *iommu_sva_find(ioasid_t pasid);
 
 /* I/O Page fault */
 struct device;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 46e1347bfa22..7db16ca3f519 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -704,6 +704,8 @@ void iommu_release_device(struct device *dev);
 int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features f);
 int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);
 
+struct mm_struct *iommu_sva_find(ioasid_t pasid);
+
 int iommu_device_use_default_domain(struct device *dev);
 void iommu_device_unuse_default_domain(struct device *dev);
 
@@ -1042,6 +1044,11 @@ iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
 	return -ENODEV;
 }
 
+static inline struct mm_struct *iommu_sva_find(ioasid_t pasid)
+{
+	return ERR_PTR(-ENODEV);
+}
+
 static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
 {
 	return NULL;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
                   ` (7 preceding siblings ...)
  2023-01-03 16:34 ` [PATCH 08/17] iommu: expose iommu_sva_find() to common header Fenghua Yu
@ 2023-01-03 16:34 ` Fenghua Yu
  2023-01-03 17:45   ` Lorenzo Stoakes
  2023-01-03 16:34 ` [PATCH 10/17] dmaengine: idxd: process user page faults for completion record Fenghua Yu
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:34 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang
  Cc: Fenghua Yu, dmaengine, Tony Zhu, Andrew Morton, linux-mm

From: Dave Jiang <dave.jiang@intel.com>

Export access_remote_vm() symbol for driver usage. The idxd driver would
like to use it to write the user's completion record that the hardware
device is not able to write to due to user page fault.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
---
 mm/memory.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/memory.c b/mm/memory.c
index aad226daf41b..caae4deff987 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5579,6 +5579,7 @@ int access_remote_vm(struct mm_struct *mm, unsigned long addr,
 {
 	return __access_remote_vm(mm, addr, buf, len, gup_flags);
 }
+EXPORT_SYMBOL_GPL(access_remote_vm);
 
 /*
  * Access another process' address space.
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 10/17] dmaengine: idxd: process user page faults for completion record
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
                   ` (8 preceding siblings ...)
  2023-01-03 16:34 ` [PATCH 09/17] mm: export access_remote_vm() symbol Fenghua Yu
@ 2023-01-03 16:34 ` Fenghua Yu
  2023-01-03 16:34 ` [PATCH 11/17] dmaengine: idxd: add descs_completed field " Fenghua Yu
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:34 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang; +Cc: Fenghua Yu, dmaengine, Tony Zhu

From: Dave Jiang <dave.jiang@intel.com>

DSA supports page fault handling through PRS. However, the DMA engine
that's processing the descriptor is blocked until the PRS response is
received. Other workqueues sharing the engine are also blocked.
Page fault handing by the driver with PRS disabled can be used to
mitigate the stalling.

With PRS disabled while ATS remain enabled, DSA handles page faults on
a completion record by reporting an event in the event log. In this
instance, the descriptor is completed and the event log contains the
completion record address and the contents of the completion record. Add
support to the event log handling code to fault in the completion record
and copy the content of the completion record to user memory.

A bitmap is introduced to keep track of discarded event log entries. When
the user process initiates ->release() of the char device, it no longer is
interested in any remaing event log entries tied to the relevant wq and
PASID. The driver will mark the event log entry index in the bitmap. Upon
encountering the entries during processing, the event log handler will just
clear the bitmap bit and skip the entry rather than attempt to process the
event log entry.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 drivers/dma/Kconfig       |  1 +
 drivers/dma/idxd/cdev.c   | 34 +++++++++++++++-
 drivers/dma/idxd/device.c | 22 +++++++++-
 drivers/dma/idxd/idxd.h   |  2 +
 drivers/dma/idxd/init.c   |  2 +
 drivers/dma/idxd/irq.c    | 84 ++++++++++++++++++++++++++++++++++++---
 include/uapi/linux/idxd.h |  1 +
 7 files changed, 138 insertions(+), 8 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index b6d48d54f42f..b95197fd7137 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -297,6 +297,7 @@ config INTEL_IDXD
 	depends on PCI_PASID
 	depends on SBITMAP
 	select DMA_ENGINE
+	select IOMMU_SVA
 	help
 	  Enable support for the Intel(R) data accelerators present
 	  in Intel Xeon CPU.
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 493e4db150d5..eed0bc15951a 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -136,6 +136,35 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
 	return rc;
 }
 
+static void idxd_cdev_evl_drain_pasid(struct idxd_wq *wq, u32 pasid)
+{
+	struct idxd_device *idxd = wq->idxd;
+	struct idxd_evl *evl = idxd->evl;
+	union evl_status_reg status;
+	u16 h, t, size;
+	int ent_size = evl_ent_size(idxd);
+	struct __evl_entry *entry_head;
+
+	if (!evl)
+		return;
+
+	spin_lock(&evl->lock);
+	status.bits = ioread64(idxd->reg_base + IDXD_EVLSTATUS_OFFSET);
+	t = status.tail;
+	h = evl->head;
+	size = evl->size;
+
+	while (h != t) {
+		entry_head = (struct __evl_entry *)(evl->log + (h * ent_size));
+		if (entry_head->pasid == pasid && entry_head->wq_idx == wq->id)
+			set_bit(h, evl->bmap);
+		h = (h + 1) % size;
+	}
+	spin_unlock(&evl->lock);
+
+	drain_workqueue(wq->wq);
+}
+
 static int idxd_cdev_release(struct inode *node, struct file *filep)
 {
 	struct idxd_user_context *ctx = filep->private_data;
@@ -161,8 +190,11 @@ static int idxd_cdev_release(struct inode *node, struct file *filep)
 		}
 	}
 
-	if (ctx->sva)
+	if (ctx->sva) {
+		idxd_cdev_evl_drain_pasid(wq, ctx->pasid);
 		iommu_sva_unbind_device(ctx->sva);
+	}
+
 	kfree(ctx);
 	mutex_lock(&wq->wq_lock);
 	idxd_wq_put(wq);
diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 307bbb6c80d8..cef3f22dd48b 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -758,18 +758,29 @@ static int idxd_device_evl_setup(struct idxd_device *idxd)
 	dma_addr_t dma_addr;
 	int size;
 	struct idxd_evl *evl = idxd->evl;
+	unsigned long *bmap;
+	int rc;
 
 	if (!evl)
 		return 0;
 
 	size = evl_size(idxd);
+
+	bmap = bitmap_zalloc(size, GFP_KERNEL);
+	if (!bmap) {
+		rc = -ENOMEM;
+		goto err_bmap;
+	}
+
 	/*
 	 * Address needs to be page aligned. However, dma_alloc_coherent() provides
 	 * at minimal page size aligned address. No manual alignment required.
 	 */
 	addr = dma_alloc_coherent(dev, size, &dma_addr, GFP_KERNEL);
-	if (!addr)
-		return -ENOMEM;
+	if (!addr) {
+		rc = -ENOMEM;
+		goto err_alloc;
+	}
 
 	memset(addr, 0, size);
 
@@ -777,6 +788,7 @@ static int idxd_device_evl_setup(struct idxd_device *idxd)
 	evl->log = addr;
 	evl->dma = dma_addr;
 	evl->log_size = size;
+	evl->bmap = bmap;
 
 	memset(&evlcfg, 0, sizeof(evlcfg));
 	evlcfg.bits[0] = dma_addr & GENMASK(63, 12);
@@ -795,6 +807,11 @@ static int idxd_device_evl_setup(struct idxd_device *idxd)
 
 	spin_unlock(&evl->lock);
 	return 0;
+
+err_alloc:
+	bitmap_free(bmap);
+err_bmap:
+	return rc;
 }
 
 static void idxd_device_evl_free(struct idxd_device *idxd)
@@ -820,6 +837,7 @@ static void idxd_device_evl_free(struct idxd_device *idxd)
 	iowrite64(0, idxd->reg_base + IDXD_EVLCFG_OFFSET + 8);
 
 	dma_free_coherent(dev, evl->log_size, evl->log, evl->dma);
+	bitmap_free(evl->bmap);
 	evl->log = NULL;
 	evl->size = IDXD_EVL_SIZE_MIN;
 	spin_unlock(&evl->lock);
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index b4fa1051a482..61c0616828c0 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -259,6 +259,7 @@ struct idxd_driver_data {
 	struct device_type *dev_type;
 	int compl_size;
 	int align;
+	int evl_cr_off;
 };
 
 struct idxd_evl {
@@ -271,6 +272,7 @@ struct idxd_evl {
 	/* The number of entries in the event log. */
 	u16 size;
 	u16 head;
+	unsigned long *bmap;
 };
 
 struct idxd_evl_fault {
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 74d6a12d4bd3..c0fffc31c6e4 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -47,6 +47,7 @@ static struct idxd_driver_data idxd_driver_data[] = {
 		.compl_size = sizeof(struct dsa_completion_record),
 		.align = 32,
 		.dev_type = &dsa_device_type,
+		.evl_cr_off = offsetof(struct dsa_evl_entry, cr),
 	},
 	[IDXD_TYPE_IAX] = {
 		.name_prefix = "iax",
@@ -54,6 +55,7 @@ static struct idxd_driver_data idxd_driver_data[] = {
 		.compl_size = sizeof(struct iax_completion_record),
 		.align = 64,
 		.dev_type = &iax_device_type,
+		.evl_cr_off = offsetof(struct iax_evl_entry, cr),
 	},
 };
 
diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c
index 52b8b7d9db22..44b5ed808b31 100644
--- a/drivers/dma/idxd/irq.c
+++ b/drivers/dma/idxd/irq.c
@@ -7,6 +7,9 @@
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/dmaengine.h>
 #include <linux/delay.h>
+#include <linux/ioasid.h>
+#include <linux/iommu.h>
+#include <linux/sched/mm.h>
 #include <uapi/linux/idxd.h>
 #include "../dmaengine.h"
 #include "idxd.h"
@@ -217,14 +220,85 @@ static void idxd_int_handle_revoke(struct work_struct *work)
 	kfree(revoke);
 }
 
-static void process_evl_entry(struct idxd_device *idxd, struct __evl_entry *entry_head)
+static void idxd_evl_fault_work(struct work_struct *work)
 {
+	struct idxd_evl_fault *fault = container_of(work, struct idxd_evl_fault, work);
+	struct idxd_wq *wq = fault->wq;
+	struct idxd_device *idxd = wq->idxd;
 	struct device *dev = &idxd->pdev->dev;
+	struct mm_struct *mm;
+	int copied;
+	struct __evl_entry *entry_head = fault->entry;
+	void *cr = (void *)entry_head + idxd->data->evl_cr_off;
+	int cr_size = idxd->data->compl_size;
+
+	mm = iommu_sva_find(entry_head->pasid);
+	if (IS_ERR_OR_NULL(mm)) {
+		dev_err(dev, "Page request for invalid mm (pasid: %u).\n",
+			entry_head->pasid);
+		kfree(fault);
+		return;
+	}
+
+	switch (fault->status) {
+	case DSA_COMP_CRA_XLAT:
+	case DSA_COMP_DRAIN_EVL:
+		copied = access_remote_vm(mm, entry_head->fault_addr, cr, cr_size,
+					  FOLL_WRITE | FOLL_REMOTE);
+		if (copied != cr_size)
+			dev_err(dev, "Failed to write to completion record. (%d:%d)\n",
+				cr_size, copied);
+		break;
+	default:
+		dev_err(dev, "Unrecognized error code: %#x\n",
+			DSA_COMP_STATUS(entry_head->error));
+		break;
+	}
+
+	mmput(mm);
+	kmem_cache_free(idxd->evl_cache, fault);
+}
+
+static void process_evl_entry(struct idxd_device *idxd,
+			      struct __evl_entry *entry_head, unsigned int index)
+{
+	struct device *dev = &idxd->pdev->dev;
+	struct idxd_evl *evl = idxd->evl;
 	u8 status;
 
-	status = DSA_COMP_STATUS(entry_head->error);
-	dev_warn_ratelimited(dev, "Device error %#x operation: %#x fault addr: %#llx\n",
-			     status, entry_head->operation, entry_head->fault_addr);
+	if (test_bit(index, evl->bmap)) {
+		clear_bit(index, evl->bmap);
+	} else {
+		status = DSA_COMP_STATUS(entry_head->error);
+
+		if (status == DSA_COMP_CRA_XLAT || status == DSA_COMP_DRAIN_EVL) {
+			struct idxd_evl_fault *fault;
+			int ent_size = evl_ent_size(idxd);
+
+			if (entry_head->rci)
+				dev_dbg(dev, "Completion Int Req set, ignoring!\n");
+
+			if (!entry_head->rcr && status == DSA_COMP_DRAIN_EVL)
+				return;
+
+			fault = kmem_cache_alloc(idxd->evl_cache, GFP_ATOMIC);
+			if (fault) {
+				struct idxd_wq *wq = idxd->wqs[entry_head->wq_idx];
+
+				fault->wq = wq;
+				fault->status = status;
+				memcpy(&fault->entry, entry_head, ent_size);
+				INIT_WORK(&fault->work, idxd_evl_fault_work);
+				queue_work(wq->wq, &fault->work);
+			} else {
+				dev_warn(dev, "Failed to service fault work.\n");
+			}
+		} else {
+			dev_warn_ratelimited(dev, "Device error %#x operation: %#x fault addr: %#llx\n",
+					     status, entry_head->operation,
+					     entry_head->fault_addr);
+		}
+	}
 }
 
 static void process_evl_entries(struct idxd_device *idxd)
@@ -250,7 +324,7 @@ static void process_evl_entries(struct idxd_device *idxd)
 
 	while (h != t) {
 		entry_head = (struct __evl_entry *)(evl->log + (h * ent_size));
-		process_evl_entry(idxd, entry_head);
+		process_evl_entry(idxd, entry_head, h);
 		h = (h + 1) % size;
 	}
 
diff --git a/include/uapi/linux/idxd.h b/include/uapi/linux/idxd.h
index 9f66a40287b7..685440a2c4bc 100644
--- a/include/uapi/linux/idxd.h
+++ b/include/uapi/linux/idxd.h
@@ -133,6 +133,7 @@ enum dsa_completion_status {
 	DSA_COMP_HW_ERR1,
 	DSA_COMP_HW_ERR_DRB,
 	DSA_COMP_TRANSLATION_FAIL,
+	DSA_COMP_DRAIN_EVL = 0x26,
 };
 
 enum iax_completion_status {
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 11/17] dmaengine: idxd: add descs_completed field for completion record
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
                   ` (9 preceding siblings ...)
  2023-01-03 16:34 ` [PATCH 10/17] dmaengine: idxd: process user page faults for completion record Fenghua Yu
@ 2023-01-03 16:34 ` Fenghua Yu
  2023-01-03 16:35 ` [PATCH 12/17] dmaengine: idxd: process batch descriptor completion record faults Fenghua Yu
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:34 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang; +Cc: Fenghua Yu, dmaengine, Tony Zhu

From: Dave Jiang <dave.jiang@intel.com>

The descs_completed field for a completion record is part of a batch
descriptor completion record. It takes the same location as bytes_completed
in a normal descriptor field. Add to expose to user.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 include/uapi/linux/idxd.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/idxd.h b/include/uapi/linux/idxd.h
index 685440a2c4bc..37732016f3b0 100644
--- a/include/uapi/linux/idxd.h
+++ b/include/uapi/linux/idxd.h
@@ -289,7 +289,10 @@ struct dsa_completion_record {
 	};
 	uint8_t			fault_info;
 	uint8_t			rsvd;
-	uint32_t		bytes_completed;
+	union {
+		uint32_t		bytes_completed;
+		uint32_t		descs_completed;
+	};
 	uint64_t		fault_addr;
 	union {
 		/* common record */
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 12/17] dmaengine: idxd: process batch descriptor completion record faults
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
                   ` (10 preceding siblings ...)
  2023-01-03 16:34 ` [PATCH 11/17] dmaengine: idxd: add descs_completed field " Fenghua Yu
@ 2023-01-03 16:35 ` Fenghua Yu
  2023-01-03 16:35 ` [PATCH 13/17] dmaengine: idxd: add per file user counters for " Fenghua Yu
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:35 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang; +Cc: Fenghua Yu, dmaengine, Tony Zhu

From: Dave Jiang <dave.jiang@intel.com>

Add event log processing for faulting of user batch descriptor completion
record.

When encountering an event log entry for a page fault on a completion
record, the driver is expected to do the following:
1. If the "first error in batch" bit in event log entry error info is
set, discard any previously recorded errors associated with the
"batch identifier".
2. Fix the page fault according to the fault address in the event log. If
successful, write the completion record to the fault address in user space.
3. If an error is encountered while writing the completion record and it is
associated to a descriptor in the batch, the driver associates the error
with the batch identifier of the event log entry and tracks it until the
event log entry for the corresponding batch desc is encountered.

While processing an event log entry for a batch descriptor with error
indicating that one or more descs in the batch had event log entries,
the driver will do the following before writing the batch completion
record:
1. If the status field of the completion record is 0x1, the driver will
change it to error code 0x5 (one or more operations in batch completed
with status not successful) and changes the result field to 1.
2. If the status is error code 0x6 (page fault on batch descriptor list
address), change the result field to 1.
3. If status is any other value, the completion record is not changed.
4. Clear the recorded error in preparation for next batch with same batch
identifier.

The result field is for user software to determine whether to set the
"Batch Error" flag bit in the descriptor for continuation of partial
batch descriptor completion. See DSA spec 2.0 for additional information.

If no error has been recorded for the batch, the batch completion record is
written to user space as is.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 drivers/dma/idxd/idxd.h      |  3 ++
 drivers/dma/idxd/init.c      |  4 +++
 drivers/dma/idxd/irq.c       | 60 ++++++++++++++++++++++++++++++------
 drivers/dma/idxd/registers.h |  4 ++-
 include/uapi/linux/idxd.h    |  1 +
 5 files changed, 62 insertions(+), 10 deletions(-)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 61c0616828c0..00edf68e3528 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -260,6 +260,8 @@ struct idxd_driver_data {
 	int compl_size;
 	int align;
 	int evl_cr_off;
+	int cr_status_off;
+	int cr_result_off;
 };
 
 struct idxd_evl {
@@ -273,6 +275,7 @@ struct idxd_evl {
 	u16 size;
 	u16 head;
 	unsigned long *bmap;
+	bool batch_fail[IDXD_MAX_BATCH_IDENT];
 };
 
 struct idxd_evl_fault {
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index c0fffc31c6e4..ddb9b13f3c3c 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -48,6 +48,8 @@ static struct idxd_driver_data idxd_driver_data[] = {
 		.align = 32,
 		.dev_type = &dsa_device_type,
 		.evl_cr_off = offsetof(struct dsa_evl_entry, cr),
+		.cr_status_off = offsetof(struct dsa_completion_record, status),
+		.cr_result_off = offsetof(struct dsa_completion_record, result),
 	},
 	[IDXD_TYPE_IAX] = {
 		.name_prefix = "iax",
@@ -56,6 +58,8 @@ static struct idxd_driver_data idxd_driver_data[] = {
 		.align = 64,
 		.dev_type = &iax_device_type,
 		.evl_cr_off = offsetof(struct iax_evl_entry, cr),
+		.cr_status_off = offsetof(struct iax_completion_record, status),
+		.cr_result_off = offsetof(struct iax_completion_record, error_code),
 	},
 };
 
diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c
index 44b5ed808b31..18245f82d367 100644
--- a/drivers/dma/idxd/irq.c
+++ b/drivers/dma/idxd/irq.c
@@ -226,11 +226,15 @@ static void idxd_evl_fault_work(struct work_struct *work)
 	struct idxd_wq *wq = fault->wq;
 	struct idxd_device *idxd = wq->idxd;
 	struct device *dev = &idxd->pdev->dev;
+	struct idxd_evl *evl = idxd->evl;
 	struct mm_struct *mm;
-	int copied;
+	int copied, copy_size;
 	struct __evl_entry *entry_head = fault->entry;
 	void *cr = (void *)entry_head + idxd->data->evl_cr_off;
 	int cr_size = idxd->data->compl_size;
+	u8 *status = (u8 *)cr + idxd->data->cr_status_off;
+	u8 *result = (u8 *)cr + idxd->data->cr_result_off;
+	bool *bf;
 
 	mm = iommu_sva_find(entry_head->pasid);
 	if (IS_ERR_OR_NULL(mm)) {
@@ -242,16 +246,53 @@ static void idxd_evl_fault_work(struct work_struct *work)
 
 	switch (fault->status) {
 	case DSA_COMP_CRA_XLAT:
+		if (entry_head->batch && entry_head->first_err_in_batch)
+			evl->batch_fail[entry_head->batch_id] = false;
+
+		copy_size = cr_size;
+		break;
+	case DSA_COMP_BATCH_EVL_ERR:
+		bf = &evl->batch_fail[entry_head->batch_id];
+
+		copy_size = entry_head->rcr || *bf ? cr_size : 0;
+		if (*bf) {
+			if (*status == DSA_COMP_SUCCESS)
+				*status = DSA_COMP_BATCH_FAIL;
+			*result = 1;
+			*bf = false;
+		}
+		break;
 	case DSA_COMP_DRAIN_EVL:
-		copied = access_remote_vm(mm, entry_head->fault_addr, cr, cr_size,
-					  FOLL_WRITE | FOLL_REMOTE);
-		if (copied != cr_size)
-			dev_err(dev, "Failed to write to completion record. (%d:%d)\n",
-				cr_size, copied);
+		copy_size = cr_size;
 		break;
 	default:
-		dev_err(dev, "Unrecognized error code: %#x\n",
-			DSA_COMP_STATUS(entry_head->error));
+		copy_size = 0;
+		dev_err(dev, "Unrecognized error code: %#x\n", fault->status);
+		break;
+	}
+
+	if (copy_size)
+		copied = access_remote_vm(mm, entry_head->fault_addr, cr, copy_size,
+					  FOLL_WRITE | FOLL_REMOTE);
+
+	switch (fault->status) {
+	case DSA_COMP_CRA_XLAT:
+		if (copied != copy_size) {
+			dev_err(dev, "Failed to write to completion record: (%d:%d)\n",
+				copy_size, copied);
+			if (entry_head->batch)
+				evl->batch_fail[entry_head->batch_id] = true;
+		}
+		break;
+	case DSA_COMP_BATCH_EVL_ERR:
+		if (copied != copy_size)
+			dev_err(dev, "Failed to write to batch completion record: (%d:%d)\n",
+				copy_size, copied);
+		break;
+	case DSA_COMP_DRAIN_EVL:
+		if (copied != copy_size)
+			dev_err(dev, "Failed to write to drain completion record: (%d:%d)\n",
+				copy_size, copied);
 		break;
 	}
 
@@ -271,7 +312,8 @@ static void process_evl_entry(struct idxd_device *idxd,
 	} else {
 		status = DSA_COMP_STATUS(entry_head->error);
 
-		if (status == DSA_COMP_CRA_XLAT || status == DSA_COMP_DRAIN_EVL) {
+		if (status == DSA_COMP_CRA_XLAT || status == DSA_COMP_DRAIN_EVL ||
+		    status == DSA_COMP_BATCH_EVL_ERR) {
 			struct idxd_evl_fault *fault;
 			int ent_size = evl_ent_size(idxd);
 
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index a3c56847b38a..a772116b4c0d 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -35,7 +35,7 @@ union gen_cap_reg {
 		u64 drain_readback:1;
 		u64 rsvd2:3;
 		u64 evl_support:2;
-		u64 rsvd4:1;
+		u64 batch_continuation:1;
 		u64 max_xfer_shift:5;
 		u64 max_batch_shift:4;
 		u64 max_ims_mult:6;
@@ -556,6 +556,8 @@ union evl_status_reg {
 	u64 bits;
 } __packed;
 
+#define IDXD_MAX_BATCH_IDENT	256
+
 struct __evl_entry {
 	u64 rsvd:2;
 	u64 desc_valid:1;
diff --git a/include/uapi/linux/idxd.h b/include/uapi/linux/idxd.h
index 37732016f3b0..2645fa8662cc 100644
--- a/include/uapi/linux/idxd.h
+++ b/include/uapi/linux/idxd.h
@@ -134,6 +134,7 @@ enum dsa_completion_status {
 	DSA_COMP_HW_ERR_DRB,
 	DSA_COMP_TRANSLATION_FAIL,
 	DSA_COMP_DRAIN_EVL = 0x26,
+	DSA_COMP_BATCH_EVL_ERR,
 };
 
 enum iax_completion_status {
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 13/17] dmaengine: idxd: add per file user counters for completion record faults
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
                   ` (11 preceding siblings ...)
  2023-01-03 16:35 ` [PATCH 12/17] dmaengine: idxd: process batch descriptor completion record faults Fenghua Yu
@ 2023-01-03 16:35 ` Fenghua Yu
  2023-01-03 16:35 ` [PATCH 14/17] dmaengine: idxd: add a device to represent the file opened Fenghua Yu
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:35 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang; +Cc: Fenghua Yu, dmaengine, Tony Zhu

From: Dave Jiang <dave.jiang@intel.com>

Add counters per opened file for the char device in order to keep track how
many completion record faults occurred and how many of those faults failed
the writeback by the driver after attempt to fault in the page. An xarray
is added to associate the PASID with the struct idxd_user_context so the
counters can be managed.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 drivers/dma/idxd/cdev.c  | 50 +++++++++++++++++++++++++++++++++++++---
 drivers/dma/idxd/idxd.h  | 11 +++++++++
 drivers/dma/idxd/init.c  |  2 ++
 drivers/dma/idxd/irq.c   |  7 +++++-
 drivers/dma/idxd/sysfs.c |  1 +
 5 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index eed0bc15951a..baec95911182 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -11,6 +11,7 @@
 #include <linux/fs.h>
 #include <linux/poll.h>
 #include <linux/iommu.h>
+#include <linux/xarray.h>
 #include <uapi/linux/idxd.h>
 #include "registers.h"
 #include "idxd.h"
@@ -36,6 +37,7 @@ struct idxd_user_context {
 	unsigned int pasid;
 	unsigned int flags;
 	struct iommu_sva *sva;
+	u64 counters[COUNTER_MAX];
 };
 
 static void idxd_cdev_dev_release(struct device *dev)
@@ -68,6 +70,36 @@ static inline struct idxd_wq *inode_wq(struct inode *inode)
 	return idxd_cdev->wq;
 }
 
+void idxd_user_counter_increment(struct idxd_wq *wq, u32 pasid, int index)
+{
+	struct idxd_user_context *ctx;
+
+	if (index >= COUNTER_MAX)
+		return;
+
+	mutex_lock(&wq->uc_lock);
+	ctx = xa_load(&wq->upasid_xa, pasid);
+	if (!ctx) {
+		mutex_unlock(&wq->uc_lock);
+		return;
+	}
+	ctx->counters[index]++;
+	mutex_unlock(&wq->uc_lock);
+}
+
+static void idxd_xa_pasid_remove(struct idxd_user_context *ctx)
+{
+	struct idxd_wq *wq = ctx->wq;
+	void *ptr;
+
+	mutex_lock(&wq->uc_lock);
+	ptr = xa_cmpxchg(&wq->upasid_xa, ctx->pasid, ctx, NULL, GFP_KERNEL);
+	if (ptr != (void *)ctx)
+		dev_warn(&wq->idxd->pdev->dev, "xarray cmpxchg failed for pasid %u\n",
+			 ctx->pasid);
+	mutex_unlock(&wq->uc_lock);
+}
+
 static int idxd_cdev_open(struct inode *inode, struct file *filp)
 {
 	struct idxd_user_context *ctx;
@@ -108,20 +140,25 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
 
 		pasid = iommu_sva_get_pasid(sva);
 		if (pasid == IOMMU_PASID_INVALID) {
-			iommu_sva_unbind_device(sva);
 			rc = -EINVAL;
-			goto failed;
+			goto failed_get_pasid;
 		}
 
 		ctx->sva = sva;
 		ctx->pasid = pasid;
 
+		mutex_lock(&wq->uc_lock);
+		rc = xa_insert(&wq->upasid_xa, pasid, ctx, GFP_KERNEL);
+		mutex_unlock(&wq->uc_lock);
+		if (rc < 0)
+			dev_warn(dev, "PASID entry already exist in xarray.\n");
+
 		if (wq_dedicated(wq)) {
 			rc = idxd_wq_set_pasid(wq, pasid);
 			if (rc < 0) {
 				iommu_sva_unbind_device(sva);
 				dev_err(dev, "wq set pasid failed: %d\n", rc);
-				goto failed;
+				goto failed_set_pasid;
 			}
 		}
 	}
@@ -130,6 +167,12 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
 	mutex_unlock(&wq->wq_lock);
 	return 0;
 
+ failed_set_pasid:
+	if (device_user_pasid_enabled(idxd))
+		idxd_xa_pasid_remove(ctx);
+ failed_get_pasid:
+	if (device_user_pasid_enabled(idxd))
+		iommu_sva_unbind_device(sva);
  failed:
 	mutex_unlock(&wq->wq_lock);
 	kfree(ctx);
@@ -193,6 +236,7 @@ static int idxd_cdev_release(struct inode *node, struct file *filep)
 	if (ctx->sva) {
 		idxd_cdev_evl_drain_pasid(wq, ctx->pasid);
 		iommu_sva_unbind_device(ctx->sva);
+		idxd_xa_pasid_remove(ctx);
 	}
 
 	kfree(ctx);
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 00edf68e3528..3e0f113368fa 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -127,6 +127,12 @@ struct idxd_pmu {
 
 #define IDXD_MAX_PRIORITY	0xf
 
+enum {
+	COUNTER_FAULTS = 0,
+	COUNTER_FAULT_FAILS,
+	COUNTER_MAX
+};
+
 enum idxd_wq_state {
 	IDXD_WQ_DISABLED = 0,
 	IDXD_WQ_ENABLED,
@@ -215,6 +221,10 @@ struct idxd_wq {
 	char name[WQ_NAME_SIZE + 1];
 	u64 max_xfer_bytes;
 	u32 max_batch_size;
+
+	/* Lock to protect upasid_xa access. */
+	struct mutex uc_lock;
+	struct xarray upasid_xa;
 };
 
 struct idxd_engine {
@@ -705,6 +715,7 @@ void idxd_cdev_remove(void);
 int idxd_cdev_get_major(struct idxd_device *idxd);
 int idxd_wq_add_cdev(struct idxd_wq *wq);
 void idxd_wq_del_cdev(struct idxd_wq *wq);
+void idxd_user_counter_increment(struct idxd_wq *wq, u32 pasid, int index);
 
 /* perfmon */
 #if IS_ENABLED(CONFIG_INTEL_IDXD_PERFMON)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index ddb9b13f3c3c..564c025b9b7e 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -206,6 +206,8 @@ static int idxd_setup_wqs(struct idxd_device *idxd)
 			}
 			bitmap_copy(wq->opcap_bmap, idxd->opcap_bmap, IDXD_MAX_OPCAP_BITS);
 		}
+		mutex_init(&wq->uc_lock);
+		xa_init(&wq->upasid_xa);
 		idxd->wqs[i] = wq;
 	}
 
diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c
index 18245f82d367..241ece8eb9fc 100644
--- a/drivers/dma/idxd/irq.c
+++ b/drivers/dma/idxd/irq.c
@@ -250,6 +250,7 @@ static void idxd_evl_fault_work(struct work_struct *work)
 			evl->batch_fail[entry_head->batch_id] = false;
 
 		copy_size = cr_size;
+		idxd_user_counter_increment(wq, entry_head->pasid, COUNTER_FAULTS);
 		break;
 	case DSA_COMP_BATCH_EVL_ERR:
 		bf = &evl->batch_fail[entry_head->batch_id];
@@ -261,6 +262,7 @@ static void idxd_evl_fault_work(struct work_struct *work)
 			*result = 1;
 			*bf = false;
 		}
+		idxd_user_counter_increment(wq, entry_head->pasid, COUNTER_FAULTS);
 		break;
 	case DSA_COMP_DRAIN_EVL:
 		copy_size = cr_size;
@@ -278,6 +280,7 @@ static void idxd_evl_fault_work(struct work_struct *work)
 	switch (fault->status) {
 	case DSA_COMP_CRA_XLAT:
 		if (copied != copy_size) {
+			idxd_user_counter_increment(wq, entry_head->pasid, COUNTER_FAULT_FAILS);
 			dev_err(dev, "Failed to write to completion record: (%d:%d)\n",
 				copy_size, copied);
 			if (entry_head->batch)
@@ -285,9 +288,11 @@ static void idxd_evl_fault_work(struct work_struct *work)
 		}
 		break;
 	case DSA_COMP_BATCH_EVL_ERR:
-		if (copied != copy_size)
+		if (copied != copy_size) {
+			idxd_user_counter_increment(wq, entry_head->pasid, COUNTER_FAULT_FAILS);
 			dev_err(dev, "Failed to write to batch completion record: (%d:%d)\n",
 				copy_size, copied);
+		}
 		break;
 	case DSA_COMP_DRAIN_EVL:
 		if (copied != copy_size)
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index ea696e3c5680..137126781362 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -1292,6 +1292,7 @@ static void idxd_conf_wq_release(struct device *dev)
 
 	bitmap_free(wq->opcap_bmap);
 	kfree(wq->wqcfg);
+	xa_destroy(&wq->upasid_xa);
 	kfree(wq);
 }
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 14/17] dmaengine: idxd: add a device to represent the file opened
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
                   ` (12 preceding siblings ...)
  2023-01-03 16:35 ` [PATCH 13/17] dmaengine: idxd: add per file user counters for " Fenghua Yu
@ 2023-01-03 16:35 ` Fenghua Yu
  2023-01-03 16:35 ` [PATCH 15/17] dmaengine: idxd: expose fault counters to sysfs Fenghua Yu
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:35 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang; +Cc: Fenghua Yu, dmaengine, Tony Zhu

From: Dave Jiang <dave.jiang@intel.com>

Embed a struct device for the user file context in order to export sysfs
attributes related with the opened file. Tie the lifetime of the file
context to the device. The sysfs entry will be added under the char device.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 drivers/dma/idxd/cdev.c | 122 +++++++++++++++++++++++++++++++---------
 drivers/dma/idxd/idxd.h |   2 +
 2 files changed, 97 insertions(+), 27 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index baec95911182..46ef05de6828 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -22,6 +22,13 @@ struct idxd_cdev_context {
 	struct ida minor_ida;
 };
 
+/*
+ * Since user file names are global in DSA devices, define their ida's as
+ * global to avoid conflict file names.
+ */
+static DEFINE_IDA(file_ida);
+static DEFINE_MUTEX(ida_lock);
+
 /*
  * ictx is an array based off of accelerator types. enum idxd_type
  * is used as index
@@ -37,7 +44,60 @@ struct idxd_user_context {
 	unsigned int pasid;
 	unsigned int flags;
 	struct iommu_sva *sva;
+	struct idxd_dev idxd_dev;
 	u64 counters[COUNTER_MAX];
+	int id;
+};
+
+static void idxd_cdev_evl_drain_pasid(struct idxd_wq *wq, u32 pasid);
+static void idxd_xa_pasid_remove(struct idxd_user_context *ctx);
+
+static inline struct idxd_user_context *dev_to_uctx(struct device *dev)
+{
+	struct idxd_dev *idxd_dev = confdev_to_idxd_dev(dev);
+
+	return container_of(idxd_dev, struct idxd_user_context, idxd_dev);
+}
+
+static void idxd_file_dev_release(struct device *dev)
+{
+	struct idxd_user_context *ctx = dev_to_uctx(dev);
+	struct idxd_wq *wq = ctx->wq;
+	struct idxd_device *idxd = wq->idxd;
+	int rc;
+
+	mutex_lock(&ida_lock);
+	ida_free(&file_ida, ctx->id);
+	mutex_unlock(&ida_lock);
+
+	/* Wait for in-flight operations to complete. */
+	if (wq_shared(wq)) {
+		idxd_device_drain_pasid(idxd, ctx->pasid);
+	} else {
+		if (device_user_pasid_enabled(idxd)) {
+			/* The wq disable in the disable pasid function will drain the wq */
+			rc = idxd_wq_disable_pasid(wq);
+			if (rc < 0)
+				dev_err(dev, "wq disable pasid failed.\n");
+		} else {
+			idxd_wq_drain(wq);
+		}
+	}
+
+	if (ctx->sva) {
+		idxd_cdev_evl_drain_pasid(wq, ctx->pasid);
+		iommu_sva_unbind_device(ctx->sva);
+		idxd_xa_pasid_remove(ctx);
+	}
+	kfree(ctx);
+	mutex_lock(&wq->wq_lock);
+	idxd_wq_put(wq);
+	mutex_unlock(&wq->wq_lock);
+}
+
+static struct device_type idxd_cdev_file_type = {
+	.name = "idxd_file",
+	.release = idxd_file_dev_release,
 };
 
 static void idxd_cdev_dev_release(struct device *dev)
@@ -105,10 +165,11 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
 	struct idxd_user_context *ctx;
 	struct idxd_device *idxd;
 	struct idxd_wq *wq;
-	struct device *dev;
+	struct device *dev, *fdev;
 	int rc = 0;
 	struct iommu_sva *sva;
 	unsigned int pasid;
+	struct idxd_cdev *idxd_cdev;
 
 	wq = inode_wq(inode);
 	idxd = wq->idxd;
@@ -163,10 +224,41 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
 		}
 	}
 
+	idxd_cdev = wq->idxd_cdev;
+	mutex_lock(&ida_lock);
+	ctx->id = ida_alloc(&file_ida, GFP_KERNEL);
+	mutex_unlock(&ida_lock);
+	if (ctx->id < 0) {
+		dev_warn(dev, "ida alloc failure\n");
+		goto failed_ida;
+	}
+	ctx->idxd_dev.type  = IDXD_DEV_CDEV_FILE;
+	fdev = user_ctx_dev(ctx);
+	device_initialize(fdev);
+	fdev->parent = cdev_dev(idxd_cdev);
+	fdev->bus = &dsa_bus_type;
+	fdev->type = &idxd_cdev_file_type;
+
+	rc = dev_set_name(fdev, "file%d", ctx->id);
+	if (rc < 0) {
+		dev_warn(dev, "set name failure\n");
+		goto failed_dev_name;
+	}
+
+	rc = device_add(fdev);
+	if (rc < 0) {
+		dev_warn(dev, "file device add failure\n");
+		goto failed_dev_add;
+	}
+
 	idxd_wq_get(wq);
 	mutex_unlock(&wq->wq_lock);
 	return 0;
 
+ failed_dev_add:
+ failed_dev_name:
+	put_device(fdev);
+ failed_ida:
  failed_set_pasid:
 	if (device_user_pasid_enabled(idxd))
 		idxd_xa_pasid_remove(ctx);
@@ -214,35 +306,10 @@ static int idxd_cdev_release(struct inode *node, struct file *filep)
 	struct idxd_wq *wq = ctx->wq;
 	struct idxd_device *idxd = wq->idxd;
 	struct device *dev = &idxd->pdev->dev;
-	int rc;
 
 	dev_dbg(dev, "%s called\n", __func__);
 	filep->private_data = NULL;
-
-	/* Wait for in-flight operations to complete. */
-	if (wq_shared(wq)) {
-		idxd_device_drain_pasid(idxd, ctx->pasid);
-	} else {
-		if (device_user_pasid_enabled(idxd)) {
-			/* The wq disable in the disable pasid function will drain the wq */
-			rc = idxd_wq_disable_pasid(wq);
-			if (rc < 0)
-				dev_err(dev, "wq disable pasid failed.\n");
-		} else {
-			idxd_wq_drain(wq);
-		}
-	}
-
-	if (ctx->sva) {
-		idxd_cdev_evl_drain_pasid(wq, ctx->pasid);
-		iommu_sva_unbind_device(ctx->sva);
-		idxd_xa_pasid_remove(ctx);
-	}
-
-	kfree(ctx);
-	mutex_lock(&wq->wq_lock);
-	idxd_wq_put(wq);
-	mutex_unlock(&wq->wq_lock);
+	device_unregister(user_ctx_dev(ctx));
 	return 0;
 }
 
@@ -373,6 +440,7 @@ void idxd_wq_del_cdev(struct idxd_wq *wq)
 	struct idxd_cdev *idxd_cdev;
 
 	idxd_cdev = wq->idxd_cdev;
+	ida_destroy(&file_ida);
 	wq->idxd_cdev = NULL;
 	cdev_device_del(&idxd_cdev->cdev, cdev_dev(idxd_cdev));
 	put_device(cdev_dev(idxd_cdev));
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 3e0f113368fa..1ad4d9e1f06d 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -32,6 +32,7 @@ enum idxd_dev_type {
 	IDXD_DEV_GROUP,
 	IDXD_DEV_ENGINE,
 	IDXD_DEV_CDEV,
+	IDXD_DEV_CDEV_FILE,
 	IDXD_DEV_MAX_TYPE,
 };
 
@@ -404,6 +405,7 @@ enum idxd_completion_status {
 #define engine_confdev(engine) &engine->idxd_dev.conf_dev
 #define group_confdev(group) &group->idxd_dev.conf_dev
 #define cdev_dev(cdev) &cdev->idxd_dev.conf_dev
+#define user_ctx_dev(ctx) (&(ctx)->idxd_dev.conf_dev)
 
 #define confdev_to_idxd_dev(dev) container_of(dev, struct idxd_dev, conf_dev)
 #define idxd_dev_to_idxd(idxd_dev) container_of(idxd_dev, struct idxd_device, idxd_dev)
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 15/17] dmaengine: idxd: expose fault counters to sysfs
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
                   ` (13 preceding siblings ...)
  2023-01-03 16:35 ` [PATCH 14/17] dmaengine: idxd: add a device to represent the file opened Fenghua Yu
@ 2023-01-03 16:35 ` Fenghua Yu
  2023-01-03 16:35 ` [PATCH 16/17] dmaengine: idxd: add pid to exported sysfs attribute for opened file Fenghua Yu
  2023-01-03 16:35 ` [PATCH 17/17] dmaengine: idxd: add per wq PRS disable Fenghua Yu
  16 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:35 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang; +Cc: Fenghua Yu, dmaengine, Tony Zhu

From: Dave Jiang <dave.jiang@intel.com>

Expose cr_faults and cr_fault_failures counters to the user space. This
allows a user app to keep track of how many fault the application is
causing with the completion record (CR) and also the number of failures
of the CR writeback. Having a high number of cr_fault_failures is bad as
the app is submitting descriptors with the CR addresses that are bad. User
monitoring daemon may want to consider killing the application as it may be
malicious and attempting to flood the device event log.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 .../ABI/stable/sysfs-driver-dma-idxd          | 17 +++++++
 drivers/dma/idxd/cdev.c                       | 46 +++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-driver-dma-idxd b/Documentation/ABI/stable/sysfs-driver-dma-idxd
index f7acb14bf255..a3ad253efbbc 100644
--- a/Documentation/ABI/stable/sysfs-driver-dma-idxd
+++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd
@@ -310,3 +310,20 @@ Description:	Allows control of the number of batch descriptors that can be
 		1 (1/2 of max value), 2 (1/4 of the max value), and 3 (1/8 of
 		the max value). It's visible only on platforms that support
 		the capability.
+
+What:		/sys/bus/dsa/devices/wq<m>.<n>/dsa<x>\!wq<m>.<n>/file<y>/cr_faults
+Date:		Sept 14, 2022
+KernelVersion:	6.2.0
+Contact:	dmaengine@vger.kernel.org
+Description:	Show the number of Completion Record (CR) faults this application
+		has caused.
+
+What:		/sys/bus/dsa/devices/wq<m>.<n>/dsa<x>\!wq<m>.<n>/file<y>/cr_fault_failures
+Date:		Sept 14, 2022
+KernelVersion:	6.2.0
+Contact:	dmaengine@vger.kernel.org
+Description:	Show the number of Completion Record (CR) faults failures that this
+		application has caused. The failure counter is incremented when the
+		driver cannot fault in the address for the CR. Typically this is caused
+		by a bad address programmed in the submitted descriptor or a malicious
+		submitter is using bad CR address on purpose.
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 46ef05de6828..121231a89060 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -59,6 +59,51 @@ static inline struct idxd_user_context *dev_to_uctx(struct device *dev)
 	return container_of(idxd_dev, struct idxd_user_context, idxd_dev);
 }
 
+static ssize_t cr_faults_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct idxd_user_context *ctx = dev_to_uctx(dev);
+
+	return sysfs_emit(buf, "%llu\n", ctx->counters[COUNTER_FAULTS]);
+}
+static DEVICE_ATTR_RO(cr_faults);
+
+static ssize_t cr_fault_failures_show(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	struct idxd_user_context *ctx = dev_to_uctx(dev);
+
+	return sysfs_emit(buf, "%llu\n", ctx->counters[COUNTER_FAULT_FAILS]);
+}
+static DEVICE_ATTR_RO(cr_fault_failures);
+
+static struct attribute *cdev_file_attributes[] = {
+	&dev_attr_cr_faults.attr,
+	&dev_attr_cr_fault_failures.attr,
+	NULL
+};
+
+static umode_t cdev_file_attr_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+	struct device *dev = container_of(kobj, typeof(*dev), kobj);
+	struct idxd_user_context *ctx = dev_to_uctx(dev);
+	struct idxd_wq *wq = ctx->wq;
+
+	if (!wq_pasid_enabled(wq))
+		return 0;
+
+	return a->mode;
+}
+
+static const struct attribute_group cdev_file_attribute_group = {
+	.attrs = cdev_file_attributes,
+	.is_visible = cdev_file_attr_visible,
+};
+
+static const struct attribute_group *cdev_file_attribute_groups[] = {
+	&cdev_file_attribute_group,
+	NULL
+};
+
 static void idxd_file_dev_release(struct device *dev)
 {
 	struct idxd_user_context *ctx = dev_to_uctx(dev);
@@ -98,6 +143,7 @@ static void idxd_file_dev_release(struct device *dev)
 static struct device_type idxd_cdev_file_type = {
 	.name = "idxd_file",
 	.release = idxd_file_dev_release,
+	.groups = cdev_file_attribute_groups,
 };
 
 static void idxd_cdev_dev_release(struct device *dev)
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 16/17] dmaengine: idxd: add pid to exported sysfs attribute for opened file
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
                   ` (14 preceding siblings ...)
  2023-01-03 16:35 ` [PATCH 15/17] dmaengine: idxd: expose fault counters to sysfs Fenghua Yu
@ 2023-01-03 16:35 ` Fenghua Yu
  2023-01-03 16:35 ` [PATCH 17/17] dmaengine: idxd: add per wq PRS disable Fenghua Yu
  16 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:35 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang; +Cc: Fenghua Yu, dmaengine, Tony Zhu

From: Dave Jiang <dave.jiang@intel.com>

Provide the pid of the application for the opened file. This allows the
monitor daemon to easily correlate which app opened the file and easily
kill the app by pid if that is desired action.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 Documentation/ABI/stable/sysfs-driver-dma-idxd |  8 ++++++++
 drivers/dma/idxd/cdev.c                        | 11 +++++++++++
 2 files changed, 19 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-driver-dma-idxd b/Documentation/ABI/stable/sysfs-driver-dma-idxd
index a3ad253efbbc..be12612bd76e 100644
--- a/Documentation/ABI/stable/sysfs-driver-dma-idxd
+++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd
@@ -327,3 +327,11 @@ Description:	Show the number of Completion Record (CR) faults failures that this
 		driver cannot fault in the address for the CR. Typically this is caused
 		by a bad address programmed in the submitted descriptor or a malicious
 		submitter is using bad CR address on purpose.
+
+What:		/sys/bus/dsa/devices/wq<m>.<n>/dsa<x>\!wq<m>.<n>/file<y>/pid
+Date:		Sept 14, 2022
+KernelVersion:	6.2.0
+Contact:	dmaengine@vger.kernel.org
+Description:	Show the process id of the application that opened the file. This is
+		helpful information for a monitor daemon that wants to kill the
+		application that opened the file.
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 121231a89060..7a82ce25c505 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -47,6 +47,7 @@ struct idxd_user_context {
 	struct idxd_dev idxd_dev;
 	u64 counters[COUNTER_MAX];
 	int id;
+	pid_t pid;
 };
 
 static void idxd_cdev_evl_drain_pasid(struct idxd_wq *wq, u32 pasid);
@@ -76,9 +77,18 @@ static ssize_t cr_fault_failures_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(cr_fault_failures);
 
+static ssize_t pid_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct idxd_user_context *ctx = dev_to_uctx(dev);
+
+	return sysfs_emit(buf, "%u\n", ctx->pid);
+}
+static DEVICE_ATTR_RO(pid);
+
 static struct attribute *cdev_file_attributes[] = {
 	&dev_attr_cr_faults.attr,
 	&dev_attr_cr_fault_failures.attr,
+	&dev_attr_pid.attr,
 	NULL
 };
 
@@ -236,6 +246,7 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
 
 	ctx->wq = wq;
 	filp->private_data = ctx;
+	ctx->pid = current->pid;
 
 	if (device_user_pasid_enabled(idxd)) {
 		sva = iommu_sva_bind_device(dev, current->mm);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 17/17] dmaengine: idxd: add per wq PRS disable
  2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
                   ` (15 preceding siblings ...)
  2023-01-03 16:35 ` [PATCH 16/17] dmaengine: idxd: add pid to exported sysfs attribute for opened file Fenghua Yu
@ 2023-01-03 16:35 ` Fenghua Yu
  16 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:35 UTC (permalink / raw)
  To: Vinod Koul, Dave Jiang; +Cc: Fenghua Yu, dmaengine, Tony Zhu

From: Dave Jiang <dave.jiang@intel.com>

Add sysfs knob for per wq Page Request Service disable. This knob
disables PRS support for the specific wq. When this bit is set,
it also overrides the wq's block on fault enabling.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 .../ABI/stable/sysfs-driver-dma-idxd          | 10 ++++
 drivers/dma/idxd/device.c                     |  6 +-
 drivers/dma/idxd/idxd.h                       |  1 +
 drivers/dma/idxd/registers.h                  |  5 +-
 drivers/dma/idxd/sysfs.c                      | 57 ++++++++++++++++++-
 5 files changed, 74 insertions(+), 5 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-driver-dma-idxd b/Documentation/ABI/stable/sysfs-driver-dma-idxd
index be12612bd76e..603869112887 100644
--- a/Documentation/ABI/stable/sysfs-driver-dma-idxd
+++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd
@@ -227,6 +227,16 @@ Contact:	dmaengine@vger.kernel.org
 Description:	Indicate whether ATS disable is turned on for the workqueue.
 		0 indicates ATS is on, and 1 indicates ATS is off for the workqueue.
 
+What:		/sys/bus/dsa/devices/wq<m>.<n>/prs_disable
+Date:		Sept 14, 2022
+KernelVersion: 6.2.0
+Contact:	dmaengine@vger.kernel.org
+Description:	Controls whether PRS disable is turned on for the workqueue.
+		0 indicates PRS is on, and 1 indicates PRS is off for the
+		workqueue. This option overrides block_on_fault attribute
+		if set. It's visible only on platforms that support the
+		capability.
+
 What:		/sys/bus/dsa/devices/wq<m>.<n>/occupancy
 Date		May 25, 2021
 KernelVersion:	5.14.0
diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index cef3f22dd48b..26f0463cff7e 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -963,12 +963,16 @@ static int idxd_wq_config_write(struct idxd_wq *wq)
 	wq->wqcfg->priority = wq->priority;
 
 	if (idxd->hw.gen_cap.block_on_fault &&
-	    test_bit(WQ_FLAG_BLOCK_ON_FAULT, &wq->flags))
+	    test_bit(WQ_FLAG_BLOCK_ON_FAULT, &wq->flags) &&
+	    !test_bit(WQ_FLAG_PRS_DISABLE, &wq->flags))
 		wq->wqcfg->bof = 1;
 
 	if (idxd->hw.wq_cap.wq_ats_support)
 		wq->wqcfg->wq_ats_disable = test_bit(WQ_FLAG_ATS_DISABLE, &wq->flags);
 
+	if (idxd->hw.wq_cap.wq_prs_support)
+		wq->wqcfg->wq_prs_disable = test_bit(WQ_FLAG_PRS_DISABLE, &wq->flags);
+
 	/* bytes 12-15 */
 	wq->wqcfg->max_xfer_shift = ilog2(wq->max_xfer_bytes);
 	idxd_wqcfg_set_max_batch_shift(idxd->data->type, wq->wqcfg, ilog2(wq->max_batch_size));
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 1ad4d9e1f06d..f92f8ec83722 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -143,6 +143,7 @@ enum idxd_wq_flag {
 	WQ_FLAG_DEDICATED = 0,
 	WQ_FLAG_BLOCK_ON_FAULT,
 	WQ_FLAG_ATS_DISABLE,
+	WQ_FLAG_PRS_DISABLE,
 };
 
 enum idxd_wq_type {
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index a772116b4c0d..6155dc7d2152 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -59,7 +59,8 @@ union wq_cap_reg {
 		u64 occupancy:1;
 		u64 occupancy_int:1;
 		u64 op_config:1;
-		u64 rsvd3:9;
+		u64 wq_prs_support:1;
+		u64 rsvd4:8;
 	};
 	u64 bits;
 } __packed;
@@ -350,7 +351,7 @@ union wqcfg {
 		u32 mode:1;	/* shared or dedicated */
 		u32 bof:1;	/* block on fault */
 		u32 wq_ats_disable:1;
-		u32 rsvd2:1;
+		u32 wq_prs_disable:1;
 		u32 priority:4;
 		u32 pasid:20;
 		u32 pasid_en:1;
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 137126781362..c772172d4ceb 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -822,10 +822,14 @@ static ssize_t wq_block_on_fault_store(struct device *dev,
 	if (rc < 0)
 		return rc;
 
-	if (bof)
+	if (bof) {
+		if (test_bit(WQ_FLAG_PRS_DISABLE, &wq->flags))
+			return -EOPNOTSUPP;
+
 		set_bit(WQ_FLAG_BLOCK_ON_FAULT, &wq->flags);
-	else
+	} else {
 		clear_bit(WQ_FLAG_BLOCK_ON_FAULT, &wq->flags);
+	}
 
 	return count;
 }
@@ -1109,6 +1113,44 @@ static ssize_t wq_ats_disable_store(struct device *dev, struct device_attribute
 static struct device_attribute dev_attr_wq_ats_disable =
 		__ATTR(ats_disable, 0644, wq_ats_disable_show, wq_ats_disable_store);
 
+static ssize_t wq_prs_disable_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct idxd_wq *wq = confdev_to_wq(dev);
+
+	return sysfs_emit(buf, "%u\n", test_bit(WQ_FLAG_PRS_DISABLE, &wq->flags));
+}
+
+static ssize_t wq_prs_disable_store(struct device *dev, struct device_attribute *attr,
+				    const char *buf, size_t count)
+{
+	struct idxd_wq *wq = confdev_to_wq(dev);
+	struct idxd_device *idxd = wq->idxd;
+	bool prs_dis;
+	int rc;
+
+	if (wq->state != IDXD_WQ_DISABLED)
+		return -EPERM;
+
+	if (!idxd->hw.wq_cap.wq_prs_support)
+		return -EOPNOTSUPP;
+
+	rc = kstrtobool(buf, &prs_dis);
+	if (rc < 0)
+		return rc;
+
+	if (prs_dis) {
+		set_bit(WQ_FLAG_PRS_DISABLE, &wq->flags);
+		/* when PRS is disabled, BOF needs to be off as well */
+		clear_bit(WQ_FLAG_BLOCK_ON_FAULT, &wq->flags);
+	} else {
+		clear_bit(WQ_FLAG_PRS_DISABLE, &wq->flags);
+	}
+	return count;
+}
+
+static struct device_attribute dev_attr_wq_prs_disable =
+		__ATTR(prs_disable, 0644, wq_prs_disable_show, wq_prs_disable_store);
+
 static ssize_t wq_occupancy_show(struct device *dev, struct device_attribute *attr, char *buf)
 {
 	struct idxd_wq *wq = confdev_to_wq(dev);
@@ -1239,6 +1281,7 @@ static struct attribute *idxd_wq_attributes[] = {
 	&dev_attr_wq_max_transfer_size.attr,
 	&dev_attr_wq_max_batch_size.attr,
 	&dev_attr_wq_ats_disable.attr,
+	&dev_attr_wq_prs_disable.attr,
 	&dev_attr_wq_occupancy.attr,
 	&dev_attr_wq_enqcmds_retries.attr,
 	&dev_attr_wq_op_config.attr,
@@ -1260,6 +1303,13 @@ static bool idxd_wq_attr_max_batch_size_invisible(struct attribute *attr,
 	       idxd->data->type == IDXD_TYPE_IAX;
 }
 
+static bool idxd_wq_attr_wq_prs_disable_invisible(struct attribute *attr,
+						  struct idxd_device *idxd)
+{
+	return attr == &dev_attr_wq_prs_disable.attr &&
+	       !idxd->hw.wq_cap.wq_prs_support;
+}
+
 static umode_t idxd_wq_attr_visible(struct kobject *kobj,
 				    struct attribute *attr, int n)
 {
@@ -1273,6 +1323,9 @@ static umode_t idxd_wq_attr_visible(struct kobject *kobj,
 	if (idxd_wq_attr_max_batch_size_invisible(attr, idxd))
 		return 0;
 
+	if (idxd_wq_attr_wq_prs_disable_invisible(attr, idxd))
+		return 0;
+
 	return attr->mode;
 }
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-03 16:34 ` [PATCH 09/17] mm: export access_remote_vm() symbol Fenghua Yu
@ 2023-01-03 17:45   ` Lorenzo Stoakes
  2023-01-03 17:50     ` Lorenzo Stoakes
  2023-01-08 17:36     ` Christoph Hellwig
  0 siblings, 2 replies; 37+ messages in thread
From: Lorenzo Stoakes @ 2023-01-03 17:45 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Vinod Koul, Dave Jiang, dmaengine, Tony Zhu, Andrew Morton,
	linux-mm, Christoph Hellwig

On Tue, Jan 03, 2023 at 08:34:57AM -0800, Fenghua Yu wrote:
> From: Dave Jiang <dave.jiang@intel.com>
>
> Export access_remote_vm() symbol for driver usage. The idxd driver would
> like to use it to write the user's completion record that the hardware
> device is not able to write to due to user page fault.
>
> Tested-by: Tony Zhu <tony.zhu@intel.com>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org
> ---
>  mm/memory.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index aad226daf41b..caae4deff987 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5579,6 +5579,7 @@ int access_remote_vm(struct mm_struct *mm, unsigned long addr,
>  {
>  	return __access_remote_vm(mm, addr, buf, len, gup_flags);
>  }
> +EXPORT_SYMBOL_GPL(access_remote_vm);
>
>  /*
>   * Access another process' address space.
> --
> 2.32.0
>

Can you explain what use case you have for exporting this? Currently this is
only used by procfs.

Additionally, it relies on a reference count being held on the mm which seems a
little risky exposing to a driver.

Is there a reason you can't use access_process_vm() which is exported and
additionally handles the refrencing?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-03 17:45   ` Lorenzo Stoakes
@ 2023-01-03 17:50     ` Lorenzo Stoakes
  2023-01-03 19:20       ` Yu, Fenghua
  2023-01-08 17:36     ` Christoph Hellwig
  1 sibling, 1 reply; 37+ messages in thread
From: Lorenzo Stoakes @ 2023-01-03 17:50 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Vinod Koul, Dave Jiang, dmaengine, Tony Zhu, Andrew Morton,
	linux-mm, Christoph Hellwig

On Tue, Jan 03, 2023 at 05:45:30PM +0000, Lorenzo Stoakes wrote:
> Can you explain what use case you have for exporting this? Currently this is
> only used by procfs.
>

Ugh I hit send too soon ! I see you've explained the _why_ (i.e. idxd). The
other two points remain, however.

> Additionally, it relies on a reference count being held on the mm which seems a
> little risky exposing to a driver.
>
> Is there a reason you can't use access_process_vm() which is exported and
> additionally handles the refrencing?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-03 17:50     ` Lorenzo Stoakes
@ 2023-01-03 19:20       ` Yu, Fenghua
  2023-01-03 20:13         ` Lorenzo Stoakes
  0 siblings, 1 reply; 37+ messages in thread
From: Yu, Fenghua @ 2023-01-03 19:20 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Vinod Koul, Jiang, Dave, dmaengine, Zhu, Tony, Andrew Morton,
	linux-mm, Christoph Hellwig

Hi, Lorenzo,

> On Tue, Jan 03, 2023 at 09:46:00AM -0800, Lorenzo Stoakes wrote:
> On Tue, Jan 03, 2023 at 05:45:30PM +0000, Lorenzo Stoakes wrote:
> > Can you explain what use case you have for exporting this? Currently
> > this is only used by procfs.
> >
> 
> Ugh I hit send too soon ! I see you've explained the _why_ (i.e. idxd). The other
> two points remain, however.
> 
> > Additionally, it relies on a reference count being held on the mm
> > which seems a little risky exposing to a driver.

access_remote_vm(mm) directly call __access_remote_vm(mm).
access_process_vm(tsk) calls mm=get_task_mm() then __access_remote_vm(mm).

So instead of access_remote_vm(mm), it's access_process_vm(tsk) that holds
a reference count on the mm, right?

> >
> > Is there a reason you can't use access_process_vm() which is exported
> > and additionally handles the refrencing?

IDXD interrupt handler starts a work which needs to access remote vm.
The remote mm is found by PASID which is saved in device event log.

In the work, it's hard to get the remote mm from a task because mm->owner could be NULL
but the mm is still existing.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-03 19:20       ` Yu, Fenghua
@ 2023-01-03 20:13         ` Lorenzo Stoakes
  2023-01-04  5:06           ` Yu, Fenghua
  0 siblings, 1 reply; 37+ messages in thread
From: Lorenzo Stoakes @ 2023-01-03 20:13 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: Vinod Koul, Jiang, Dave, dmaengine, Zhu, Tony, Andrew Morton,
	linux-mm, Christoph Hellwig

On Tue, Jan 03, 2023 at 07:20:11PM +0000, Yu, Fenghua wrote:
> Hi, Lorenzo,
>

Hey Fenghua :)

> access_remote_vm(mm) directly call __access_remote_vm(mm).
> access_process_vm(tsk) calls mm=get_task_mm() then __access_remote_vm(mm).
>
> So instead of access_remote_vm(mm), it's access_process_vm(tsk) that holds
> a reference count on the mm, right?

Indeed!

>
> > >
> > > Is there a reason you can't use access_process_vm() which is exported
> > > and additionally handles the refrencing?
>
> IDXD interrupt handler starts a work which needs to access remote vm.
> The remote mm is found by PASID which is saved in device event log.
>
> In the work, it's hard to get the remote mm from a task because mm->owner could be NULL
> but the mm is still existing.

That makes sense, however I do feel nervous about exporting something that that
relies on this reference.

The issue is ensuring that the mm can't be taken from underneath you, the only
user of access_remote_vm(), procfs, does a careful dance using get_task_mm() and
mm_access() to ensure this can't happen, if _sometimes_ the remote mm might have
an owner and _sometimes_ not it feels like any exported function needs to be
equally careful?

I definitely don't feel as if simply exporting this is a safe option, and you
would in that case need a new function that handles different scenarios of mm
ownership/not.

I may be missing something here and I will wait for others to chime in but I
think we would definitely need something more than simply exporting this.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-03 20:13         ` Lorenzo Stoakes
@ 2023-01-04  5:06           ` Yu, Fenghua
  2023-01-04  6:12             ` Alistair Popple
  0 siblings, 1 reply; 37+ messages in thread
From: Yu, Fenghua @ 2023-01-04  5:06 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Vinod Koul, Jiang, Dave, dmaengine, Zhu, Tony, Andrew Morton,
	linux-mm, Christoph Hellwig, Shankar, Ravi V

Hi, Lorenzo,

> Hey Fenghua :)
> 
> > access_remote_vm(mm) directly call __access_remote_vm(mm).
> > access_process_vm(tsk) calls mm=get_task_mm() then
> __access_remote_vm(mm).
> >
> > So instead of access_remote_vm(mm), it's access_process_vm(tsk) that
> > holds a reference count on the mm, right?
> 
> Indeed!
> 
> >
> > > >
> > > > Is there a reason you can't use access_process_vm() which is
> > > > exported and additionally handles the refrencing?
> >
> > IDXD interrupt handler starts a work which needs to access remote vm.
> > The remote mm is found by PASID which is saved in device event log.
> >
> > In the work, it's hard to get the remote mm from a task because
> > mm->owner could be NULL but the mm is still existing.
> 
> That makes sense, however I do feel nervous about exporting something that
> that relies on this reference.
> 
> The issue is ensuring that the mm can't be taken from underneath you, the only
> user of access_remote_vm(), procfs, does a careful dance using get_task_mm()
> and
> mm_access() to ensure this can't happen, if _sometimes_ the remote mm might
> have an owner and _sometimes_ not it feels like any exported function needs to
> be equally careful?
> 
> I definitely don't feel as if simply exporting this is a safe option, and you would in
> that case need a new function that handles different scenarios of mm
> ownership/not.
> 
> I may be missing something here and I will wait for others to chime in but I think
> we would definitely need something more than simply exporting this.

I may define and export a new wrapper access_remote_vm_ref() which will hold
mm's reference count before accessing it:
int access_remote_vm_ref(mm)
{
   int ret;

   if (mm == &init_mm)
        return 0;

   mmget(mm);
   ret = access_remote_vm(mm);
   mmput(mm);

   return ret;
}
EXPORT_SYMBOL_GPL(access_remote_vm_ref);

IDXD or any driver calls this and holds mm reference count while accesses the mm.
This is useful for caller to directly access mm even if mm's owner could be NULL.

Do you think this is sufficient to take care of the mm reference and is a good way to go?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-04  5:06           ` Yu, Fenghua
@ 2023-01-04  6:12             ` Alistair Popple
  2023-01-04 19:00               ` Yu, Fenghua
  2023-01-04 19:56               ` Lorenzo Stoakes
  0 siblings, 2 replies; 37+ messages in thread
From: Alistair Popple @ 2023-01-04  6:12 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: Lorenzo Stoakes, Vinod Koul, Jiang, Dave, dmaengine, Zhu, Tony,
	Andrew Morton, linux-mm, Christoph Hellwig, Shankar, Ravi V


"Yu, Fenghua" <fenghua.yu@intel.com> writes:

> Hi, Lorenzo,
>
>> Hey Fenghua :)
>> 
>> > access_remote_vm(mm) directly call __access_remote_vm(mm).
>> > access_process_vm(tsk) calls mm=get_task_mm() then
>> __access_remote_vm(mm).
>> >
>> > So instead of access_remote_vm(mm), it's access_process_vm(tsk) that
>> > holds a reference count on the mm, right?
>> 
>> Indeed!
>> 
>> >
>> > > >
>> > > > Is there a reason you can't use access_process_vm() which is
>> > > > exported and additionally handles the refrencing?
>> >
>> > IDXD interrupt handler starts a work which needs to access remote vm.
>> > The remote mm is found by PASID which is saved in device event log.
>> >
>> > In the work, it's hard to get the remote mm from a task because
>> > mm->owner could be NULL but the mm is still existing.
>> 
>> That makes sense, however I do feel nervous about exporting something that
>> that relies on this reference.
>> 
>> The issue is ensuring that the mm can't be taken from underneath you, the only
>> user of access_remote_vm(), procfs, does a careful dance using get_task_mm()
>> and
>> mm_access() to ensure this can't happen, if _sometimes_ the remote mm might
>> have an owner and _sometimes_ not it feels like any exported function needs to
>> be equally careful?

I think the point is the remote mm should be valid as long as the PASID
is valid because it doesn't make sense to have a PASID without
associated memory map. iommu_sva_find() does mmget_not_zero() to ensure
that.

Obviously something must still be holding a mmgrab() though. That should
happen as part of the PASID allocation done by iommu_sva_bind_device().
 
>> I definitely don't feel as if simply exporting this is a safe option, and you would in
>> that case need a new function that handles different scenarios of mm
>> ownership/not.

Note this isn't that different from get_user_pages_remote().
 
>> I may be missing something here and I will wait for others to chime in but I think
>> we would definitely need something more than simply exporting this.
>
> I may define and export a new wrapper access_remote_vm_ref() which will hold
> mm's reference count before accessing it:
> int access_remote_vm_ref(mm)
> {
>    int ret;
>
>    if (mm == &init_mm)
>         return 0;
>
>    mmget(mm);
>    ret = access_remote_vm(mm);
>    mmput(mm);
>
>    return ret;
> }
> EXPORT_SYMBOL_GPL(access_remote_vm_ref);
>
> IDXD or any driver calls this and holds mm reference count while accesses the mm.
> This is useful for caller to directly access mm even if mm's owner could be NULL.

I'm not sure that helps much. A driver would still need to hold a
mm_count to ensure the struct_mm itself can't go away anyway so it may
as well do the mmget() IMHO (although it really should be
mmget_not_zero()).

In any case though iommu_sva_find() already takes care of doing
mmget_not_zero(). I wonder if it makes more sense to define a wrapper
(eg. iommu_access_pasid) that takes a PASID and does the mm
lookup/access_vm/mmput?

> Do you think this is sufficient to take care of the mm reference and is a good way to go?
>
> Thanks.
>
> -Fenghua


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-04  6:12             ` Alistair Popple
@ 2023-01-04 19:00               ` Yu, Fenghua
  2023-01-04 20:00                 ` Lorenzo Stoakes
  2023-01-04 19:56               ` Lorenzo Stoakes
  1 sibling, 1 reply; 37+ messages in thread
From: Yu, Fenghua @ 2023-01-04 19:00 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Lorenzo Stoakes, Vinod Koul, Jiang, Dave, dmaengine, Zhu, Tony,
	Andrew Morton, linux-mm, Christoph Hellwig, Shankar, Ravi V

Hi, Alistair and Lorenzo,

> >> > access_remote_vm(mm) directly call __access_remote_vm(mm).
> >> > access_process_vm(tsk) calls mm=get_task_mm() then
> >> __access_remote_vm(mm).
> >> >
> >> > So instead of access_remote_vm(mm), it's access_process_vm(tsk)
> >> > that holds a reference count on the mm, right?
> >>
> >> Indeed!
> >>
> >> >
> >> > > >
> >> > > > Is there a reason you can't use access_process_vm() which is
> >> > > > exported and additionally handles the refrencing?
> >> >
> >> > IDXD interrupt handler starts a work which needs to access remote vm.
> >> > The remote mm is found by PASID which is saved in device event log.
> >> >
> >> > In the work, it's hard to get the remote mm from a task because
> >> > mm->owner could be NULL but the mm is still existing.
> >>
> >> That makes sense, however I do feel nervous about exporting something
> >> that that relies on this reference.
> >>
> >> The issue is ensuring that the mm can't be taken from underneath you,
> >> the only user of access_remote_vm(), procfs, does a careful dance
> >> using get_task_mm() and
> >> mm_access() to ensure this can't happen, if _sometimes_ the remote mm
> >> might have an owner and _sometimes_ not it feels like any exported
> >> function needs to be equally careful?
> 
> I think the point is the remote mm should be valid as long as the PASID is valid
> because it doesn't make sense to have a PASID without associated memory map.
> iommu_sva_find() does mmget_not_zero() to ensure that.
> 
> Obviously something must still be holding a mmgrab() though. That should
> happen as part of the PASID allocation done by iommu_sva_bind_device().
> 
> >> I definitely don't feel as if simply exporting this is a safe option,
> >> and you would in that case need a new function that handles different
> >> scenarios of mm ownership/not.
> 
> Note this isn't that different from get_user_pages_remote().
> 
> >> I may be missing something here and I will wait for others to chime
> >> in but I think we would definitely need something more than simply exporting
> this.
> >
> > I may define and export a new wrapper access_remote_vm_ref() which
> > will hold mm's reference count before accessing it:
> > int access_remote_vm_ref(mm)
> > {
> >    int ret;
> >
> >    if (mm == &init_mm)
> >         return 0;
> >
> >    mmget(mm);
> >    ret = access_remote_vm(mm);
> >    mmput(mm);
> >
> >    return ret;
> > }
> > EXPORT_SYMBOL_GPL(access_remote_vm_ref);
> >
> > IDXD or any driver calls this and holds mm reference count while accesses the
> mm.
> > This is useful for caller to directly access mm even if mm's owner could be
> NULL.
> 
> I'm not sure that helps much. A driver would still need to hold a mm_count to
> ensure the struct_mm itself can't go away anyway so it may as well do the
> mmget() IMHO (although it really should be mmget_not_zero()).
> 
> In any case though iommu_sva_find() already takes care of doing
> mmget_not_zero().

That's right. IDXD driver calls iommu_sva_find() which holds mm reference before
access_remote_vm(mm) and puts the count after.

And comment of access_remote_vm() explicitly says "The caller must hold a reference on @mm.".
IDXD follows the mm reference policy. There is no need to have a different wrapper.

So the current patch is good without any change, right?

> I wonder if it makes more sense to define a wrapper (eg.
> iommu_access_pasid) that takes a PASID and does the mm
> lookup/access_vm/mmput?

Currently access_remove_vm() is called only once in IDXD. And the calling code
is clearly to have mmget() and mmput() already. The proposed
wrapper iommu_access_pasid() may not be very useful. We may add the wrapper
in the future if there are more usages. Is that OK?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-04  6:12             ` Alistair Popple
  2023-01-04 19:00               ` Yu, Fenghua
@ 2023-01-04 19:56               ` Lorenzo Stoakes
  2023-01-04 21:05                 ` Lorenzo Stoakes
  2023-01-04 23:57                 ` Alistair Popple
  1 sibling, 2 replies; 37+ messages in thread
From: Lorenzo Stoakes @ 2023-01-04 19:56 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Yu, Fenghua, Vinod Koul, Jiang, Dave, dmaengine, Zhu, Tony,
	Andrew Morton, linux-mm, Christoph Hellwig, Shankar, Ravi V

On Wed, Jan 04, 2023 at 05:12:31PM +1100, Alistair Popple wrote:
> Obviously something must still be holding a mmgrab() though. That should
> happen as part of the PASID allocation done by iommu_sva_bind_device().

I'm not familiar with the iommu code, but a brief glance suggests that no
mmgrab() is being performed for intel devices? I may have missed something
though.

We do need to be absolutely sure the mm sticks around (hence the
grab) but if we need userland mappings that we have to have a subsequent
mmget_not_zero().

> >> I definitely don't feel as if simply exporting this is a safe option, and you would in
> >> that case need a new function that handles different scenarios of mm
> >> ownership/not.
>
> Note this isn't that different from get_user_pages_remote().

get_user_pages_remote() differs in that, most importantly, it requires the
mm_lock is held on invocation (implying that the mm will stick around), which is
not the case for access_remote_vm() (as __access_remote_vm() subsequently
obtains it), but also in that it pins pages but doesn't copy things to a buffer
(rather returning VMAs or struct page objects).

Also note the comment around get_user_pages_remote() saying nobody should be
using it due to lack of FAULT_FLAG_ALLOW_RETRY handling :) yes it feels like GUP
is a bit of a mess.

> In any case though iommu_sva_find() already takes care of doing
> mmget_not_zero(). I wonder if it makes more sense to define a wrapper
> (eg. iommu_access_pasid) that takes a PASID and does the mm
> lookup/access_vm/mmput?

My concern is exposing something highly delicate _which accesses remote mas a public API with implicit
assumptions whose one and only (core kernel) user treats with enormous
caution. Even if this iommu code were to use it correctly, we'd end up with an
interface which could be subject to real risks which other drivers may misuse.

Another question is - why can't we:-

1. mmgrab() [or safely assume we already have a reference] + mmget_not_zero()
2. acquire mm read lock to stop VMAs disappearing underneath us and pin pages with get_user_pages_remote()
3. copy what we need using e.g. copy_from_user()/copy_to_user()
4. unwind

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-04 19:00               ` Yu, Fenghua
@ 2023-01-04 20:00                 ` Lorenzo Stoakes
  0 siblings, 0 replies; 37+ messages in thread
From: Lorenzo Stoakes @ 2023-01-04 20:00 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: Alistair Popple, Vinod Koul, Jiang, Dave, dmaengine, Zhu, Tony,
	Andrew Morton, linux-mm, Christoph Hellwig, Shankar, Ravi V

On Wed, Jan 04, 2023 at 07:00:55PM +0000, Yu, Fenghua wrote:
> So the current patch is good without any change, right?

No, you'd definitely need mmget_not_zero() (and a grab in advance of this,
though from the sounds of it you may already have it), I definitely don't think
the patch as-is works, see my reply to Alistair.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-04 19:56               ` Lorenzo Stoakes
@ 2023-01-04 21:05                 ` Lorenzo Stoakes
  2023-01-04 23:57                 ` Alistair Popple
  1 sibling, 0 replies; 37+ messages in thread
From: Lorenzo Stoakes @ 2023-01-04 21:05 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Yu, Fenghua, Vinod Koul, Jiang, Dave, dmaengine, Zhu, Tony,
	Andrew Morton, linux-mm, Christoph Hellwig, Shankar, Ravi V

On Wed, Jan 04, 2023 at 07:56:35PM +0000, Lorenzo Stoakes wrote:
> Another question is - why can't we:-
>
> 1. mmgrab() [or safely assume we already have a reference] + mmget_not_zero()
> 2. acquire mm read lock to stop VMAs disappearing underneath us and pin pages with get_user_pages_remote()
> 3. copy what we need using e.g. copy_from_user()/copy_to_user()
> 4. unwind

OK looking at __access_remote_vm() I just accidentally described exactly what it
does other than step 1 :)

Perhaps then the answer is a wrapper that gets the reference before
invoking __access_remote_vm()? I guess we could assume grab there.

It strikes me that access_remote_vm() being quite literally a pass through to
__access_remote_vm() means we could:-

a. change all callers of access_remote_vm() to use __access_remote_vm()
b. Update access_remote_vm() to be safer
c. finally, export access_remote_vm()

e.g.:-

int access_remote_vm(struct mm_struct *mm, unsigned long addr,
		void *buf, int len, unsigned int gup_flags)
{
	int ret;

	if (!mmget_not_zero(mm))
		return 0;

	ret = __access_remote_vm(mm, addr, buf, len, gup_flags);

	mmput(mm);
}
EXPORT_SYMBOL_GPL(access_remote_vm)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-04 19:56               ` Lorenzo Stoakes
  2023-01-04 21:05                 ` Lorenzo Stoakes
@ 2023-01-04 23:57                 ` Alistair Popple
  2023-01-05  3:08                   ` Yu, Fenghua
  2023-01-05  7:26                   ` Lorenzo Stoakes
  1 sibling, 2 replies; 37+ messages in thread
From: Alistair Popple @ 2023-01-04 23:57 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Yu, Fenghua, Vinod Koul, Jiang, Dave, dmaengine, Zhu, Tony,
	Andrew Morton, linux-mm, Christoph Hellwig, Shankar, Ravi V


Lorenzo Stoakes <lstoakes@gmail.com> writes:

> On Wed, Jan 04, 2023 at 05:12:31PM +1100, Alistair Popple wrote:
>> Obviously something must still be holding a mmgrab() though. That should
>> happen as part of the PASID allocation done by iommu_sva_bind_device().
>
> I'm not familiar with the iommu code, but a brief glance suggests that no
> mmgrab() is being performed for intel devices? I may have missed something
> though.

I'm more familiar with the ARM side of things, but we can safely assume
we always have a valid mmgrab()/mm_count while the PASID is bound because
iommu_sva_bind_device() -> iommu_sva_domain_alloc() -> mmgrab().

> We do need to be absolutely sure the mm sticks around (hence the
> grab) but if we need userland mappings that we have to have a subsequent
> mmget_not_zero().

Yeah, iommu_sva_find() does take care of that though:

 * On success a reference to the mm is taken, and must be released with mmput().

>> >> I definitely don't feel as if simply exporting this is a safe option, and you would in
>> >> that case need a new function that handles different scenarios of mm
>> >> ownership/not.
>>
>> Note this isn't that different from get_user_pages_remote().
>
> get_user_pages_remote() differs in that, most importantly, it requires the
> mm_lock is held on invocation (implying that the mm will stick around), which is
> not the case for access_remote_vm() (as __access_remote_vm() subsequently
> obtains it), but also in that it pins pages but doesn't copy things to a buffer
> (rather returning VMAs or struct page objects).

Oh that makes sense.

> Also note the comment around get_user_pages_remote() saying nobody should be
> using it due to lack of FAULT_FLAG_ALLOW_RETRY handling :) yes it feels like GUP
> is a bit of a mess.
>
>> In any case though iommu_sva_find() already takes care of doing
>> mmget_not_zero(). I wonder if it makes more sense to define a wrapper
>> (eg. iommu_access_pasid) that takes a PASID and does the mm
>> lookup/access_vm/mmput?
>
> My concern is exposing something highly delicate _which accesses remote mas a public API with implicit
> assumptions whose one and only (core kernel) user treats with enormous
> caution. Even if this iommu code were to use it correctly, we'd end up with an
> interface which could be subject to real risks which other drivers may misuse.

Ok, although I think making this an iommu specific wrapper taking a
PASID rather than mm_struct would make the API more specific and less
likely to be misused as the mm_count/users lifetime issues could be
dealt with inside the core IOMMU code.

> Another question is - why can't we:-
>
> 1. mmgrab() [or safely assume we already have a reference] + mmget_not_zero()
> 2. acquire mm read lock to stop VMAs disappearing underneath us and pin pages with get_user_pages_remote()
> 3. copy what we need using e.g. copy_from_user()/copy_to_user()
> 4. unwind

Seems reasonable to me at least, but I don't have any strong opinions as
I only noticed this thread while trying to catch up on IOMMU
developments.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-04 23:57                 ` Alistair Popple
@ 2023-01-05  3:08                   ` Yu, Fenghua
  2023-01-05  3:22                     ` Alistair Popple
  2023-01-05  7:26                   ` Lorenzo Stoakes
  1 sibling, 1 reply; 37+ messages in thread
From: Yu, Fenghua @ 2023-01-05  3:08 UTC (permalink / raw)
  To: Alistair Popple, Lorenzo Stoakes
  Cc: Vinod Koul, Jiang, Dave, dmaengine, Zhu, Tony, Andrew Morton,
	linux-mm, Christoph Hellwig, Shankar, Ravi V

Hi, Alistair,

> Alistair Popple <apopple@nvidia.com> writes:
> Lorenzo Stoakes <lstoakes@gmail.com> writes:
> > My concern is exposing something highly delicate _which accesses
> > remote mas a public API with implicit assumptions whose one and only
> > (core kernel) user treats with enormous caution. Even if this iommu
> > code were to use it correctly, we'd end up with an interface which could be
> subject to real risks which other drivers may misuse.
> 
> Ok, although I think making this an iommu specific wrapper taking a PASID
> rather than mm_struct would make the API more specific and less likely to be
> misused as the mm_count/users lifetime issues could be dealt with inside the
> core IOMMU code.

The iommu specific wrapper still needs to call access_remote_vm() which is
in generic mm. We cannot avoid to export access_remote_vm(), right?
Are you saying the iommu specific wrapper doesn't need to the mm code?
 
Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-05  3:08                   ` Yu, Fenghua
@ 2023-01-05  3:22                     ` Alistair Popple
  2023-01-05 20:58                       ` Yu, Fenghua
  0 siblings, 1 reply; 37+ messages in thread
From: Alistair Popple @ 2023-01-05  3:22 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: Lorenzo Stoakes, Vinod Koul, Jiang, Dave, dmaengine, Zhu, Tony,
	Andrew Morton, linux-mm, Christoph Hellwig, Shankar, Ravi V


"Yu, Fenghua" <fenghua.yu@intel.com> writes:

> Hi, Alistair,
>> Alistair Popple <apopple@nvidia.com> writes:
>> Lorenzo Stoakes <lstoakes@gmail.com> writes:
>> > My concern is exposing something highly delicate _which accesses
>> > remote mas a public API with implicit assumptions whose one and only
>> > (core kernel) user treats with enormous caution. Even if this iommu
>> > code were to use it correctly, we'd end up with an interface which could be
>> subject to real risks which other drivers may misuse.
>> 
>> Ok, although I think making this an iommu specific wrapper taking a PASID
>> rather than mm_struct would make the API more specific and less likely to be
>> misused as the mm_count/users lifetime issues could be dealt with inside the
>> core IOMMU code.
>
> The iommu specific wrapper still needs to call access_remote_vm() which is
> in generic mm. We cannot avoid to export access_remote_vm(), right?

The wrapper still needs to call access_remote_vm(), but that doesn't
imply access_remote_vm() needs to be exported. I think the logical place
to put the wrapper would be in iommu-sva.c which isn't built as a
module, so you would only have to EXPORT_SYMBOL_GPL the wrapper and not
access_remote_vm().

> Are you saying the iommu specific wrapper doesn't need to the mm code?
>
> Thanks.
>
> -Fenghua


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-04 23:57                 ` Alistair Popple
  2023-01-05  3:08                   ` Yu, Fenghua
@ 2023-01-05  7:26                   ` Lorenzo Stoakes
  1 sibling, 0 replies; 37+ messages in thread
From: Lorenzo Stoakes @ 2023-01-05  7:26 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Yu, Fenghua, Vinod Koul, Jiang, Dave, dmaengine, Zhu, Tony,
	Andrew Morton, linux-mm, Christoph Hellwig, Shankar, Ravi V

On Thu, Jan 05, 2023 at 10:57:02AM +1100, Alistair Popple wrote:
> Ok, although I think making this an iommu specific wrapper taking a
> PASID rather than mm_struct would make the API more specific and less
> likely to be misused as the mm_count/users lifetime issues could be
> dealt with inside the core IOMMU code.

This sounds like the ideal approach, as long as we either mmgrab() or otherwise
ensure that the mm won't disappear underneath us, then use mmget_not_zero() to
get a reference in this wrapper function, I think we're gravy.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-05  3:22                     ` Alistair Popple
@ 2023-01-05 20:58                       ` Yu, Fenghua
  2023-01-05 21:04                         ` Lorenzo Stoakes
  0 siblings, 1 reply; 37+ messages in thread
From: Yu, Fenghua @ 2023-01-05 20:58 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Lorenzo Stoakes, Vinod Koul, Jiang, Dave, dmaengine, Zhu, Tony,
	Andrew Morton, linux-mm, Christoph Hellwig, Shankar, Ravi V

Hi, Alistair,

> >> Alistair Popple <apopple@nvidia.com> writes:
> >> Lorenzo Stoakes <lstoakes@gmail.com> writes:
> >> > My concern is exposing something highly delicate _which accesses
> >> > remote mas a public API with implicit assumptions whose one and
> >> > only (core kernel) user treats with enormous caution. Even if this
> >> > iommu code were to use it correctly, we'd end up with an interface
> >> > which could be
> >> subject to real risks which other drivers may misuse.
> >>
> >> Ok, although I think making this an iommu specific wrapper taking a
> >> PASID rather than mm_struct would make the API more specific and less
> >> likely to be misused as the mm_count/users lifetime issues could be
> >> dealt with inside the core IOMMU code.
> >
> > The iommu specific wrapper still needs to call access_remote_vm()
> > which is in generic mm. We cannot avoid to export access_remote_vm(), right?
> 
> The wrapper still needs to call access_remote_vm(), but that doesn't imply
> access_remote_vm() needs to be exported. I think the logical place to put the
> wrapper would be in iommu-sva.c which isn't built as a module, so you would
> only have to EXPORT_SYMBOL_GPL the wrapper and not access_remote_vm().

This looks better than exporting access_remote_vm(). I will remove this patch
and write the IOMMU wrapper to call access_remote_vm() in v2.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-05 20:58                       ` Yu, Fenghua
@ 2023-01-05 21:04                         ` Lorenzo Stoakes
  0 siblings, 0 replies; 37+ messages in thread
From: Lorenzo Stoakes @ 2023-01-05 21:04 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: Alistair Popple, Vinod Koul, Jiang, Dave, dmaengine, Zhu, Tony,
	Andrew Morton, linux-mm, Christoph Hellwig, Shankar, Ravi V

On Thu, Jan 05, 2023 at 08:58:08PM +0000, Yu, Fenghua wrote:
> This looks better than exporting access_remote_vm(). I will remove this patch
> and write the IOMMU wrapper to call access_remote_vm() in v2.

Thank you both for a very constructive conversation! I definitely think this is
the right way to go.

All the best, Lorenzo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-03 17:45   ` Lorenzo Stoakes
  2023-01-03 17:50     ` Lorenzo Stoakes
@ 2023-01-08 17:36     ` Christoph Hellwig
  2023-03-01 23:39       ` Fenghua Yu
  1 sibling, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2023-01-08 17:36 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Fenghua Yu, Vinod Koul, Dave Jiang, dmaengine, Tony Zhu,
	Andrew Morton, linux-mm, Christoph Hellwig

Exporting access_remote_vm just seems like a horrible idea.

If a driver needs to access a different VM from a completion path
in some thread or workqueue (which I assume this does, if not please
explain the use case), it should use kthread_use_mm to associate the
mm struct and then just use all the normal uaccess helpers.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/17] mm: export access_remote_vm() symbol
  2023-01-08 17:36     ` Christoph Hellwig
@ 2023-03-01 23:39       ` Fenghua Yu
  0 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-03-01 23:39 UTC (permalink / raw)
  To: Christoph Hellwig, Lorenzo Stoakes
  Cc: Vinod Koul, Dave Jiang, dmaengine, Tony Zhu, Andrew Morton, linux-mm

Hi, Christoph,

On 1/8/23 09:36, Christoph Hellwig wrote:
> Exporting access_remote_vm just seems like a horrible idea.
> 
> If a driver needs to access a different VM from a completion path
> in some thread or workqueue (which I assume this does, if not please
> explain the use case), it should use kthread_use_mm to associate the
> mm struct and then just use all the normal uaccess helpers.

To trigger fault by IDXD device, user app frees the fault_addr pages.
After kthread_use_mm(), the IDXD driver cannot directly call
copy_to_user() to copy data to fault_addr because
the pages are freed by the app. If the IDXD driver
tries to copy to the app's fault_addr, it needs to get all the pages,
kmap, copy_to_user_page() etc, basically majority of the 
__access_remote_vm() code. Without exporting access_remote_vm(),
the driver has to re-implement the function for the copy to work
in IDXD driver.

Maybe a simple exported IOMMU wrapper in iommu-sva.c can help here?
CONFIG_IOMMU_SVA is bool and needs to be set for Event Log to work.
So the wrapper is exported and can be called the IDXD driver or any
driver that is on top of IOMMU SVA.

No need to export access_remote_vm() any more.

Plus with this wrapper, no need to export iommu_sva_find()
(in an earlier patch). So the code will be more clean and concise.

Is this OK for you?

struct mm_struct
*iommu_access_remote_vm(ioasid_t pasid, unsigned long addr,
			void *buf, int len, unsigned int gup_flags, int *copied)
{
	struct mm_struct *mm;

	mm = iommu_sva_find(pasid);
	if (!IS_ERR_OR_NULL(mm)) {
		/* A reference on @mm has been held. */
		*copied = access_remote_vm(mm, addr, buf, len, gup_flags);
	}

	return mm;
}
EXPORT_SYMBOL_GPL(iommu_access_remote_vm);

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 08/17] iommu: expose iommu_sva_find() to common header
       [not found] <20230103162920.1569002-1-fenghua.yu@intel.com>
@ 2023-01-03 16:29 ` Fenghua Yu
  0 siblings, 0 replies; 37+ messages in thread
From: Fenghua Yu @ 2023-01-03 16:29 UTC (permalink / raw)
  To: Vinod Koul, devejiang
  Cc: Fenghua Yu, Dave Jiang, Tony Zhu, Joerg Roedel, Will Deacon,
	Robin Murphy, iommu

From: Dave Jiang <dave.jiang@intel.com>

Move iommu_sva_find() prototype to global header from local header in order
to allow drivers to call the function. Used by idxd driver to find the mm
from device reported PASID. The symbol is already exported.

Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: iommu@lists.linux.dev
---
 drivers/iommu/iommu-sva.h | 1 -
 include/linux/iommu.h     | 7 +++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu-sva.h b/drivers/iommu/iommu-sva.h
index 7215a761b962..102eae1817a2 100644
--- a/drivers/iommu/iommu-sva.h
+++ b/drivers/iommu/iommu-sva.h
@@ -9,7 +9,6 @@
 #include <linux/mm_types.h>
 
 int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max);
-struct mm_struct *iommu_sva_find(ioasid_t pasid);
 
 /* I/O Page fault */
 struct device;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 46e1347bfa22..7db16ca3f519 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -704,6 +704,8 @@ void iommu_release_device(struct device *dev);
 int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features f);
 int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);
 
+struct mm_struct *iommu_sva_find(ioasid_t pasid);
+
 int iommu_device_use_default_domain(struct device *dev);
 void iommu_device_unuse_default_domain(struct device *dev);
 
@@ -1042,6 +1044,11 @@ iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
 	return -ENODEV;
 }
 
+static inline struct mm_struct *iommu_sva_find(ioasid_t pasid)
+{
+	return ERR_PTR(-ENODEV);
+}
+
 static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
 {
 	return NULL;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2023-03-01 23:40 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-03 16:34 [PATCH 00/17] Enable DSA 2.0 Event Log and completion record faulting features Fenghua Yu
2023-01-03 16:34 ` [PATCH 01/17] dmaengine: idxd: make misc interrupt one shot Fenghua Yu
2023-01-03 16:34 ` [PATCH 02/17] dmaengine: idxd: add event log size sysfs attribute Fenghua Yu
2023-01-03 16:34 ` [PATCH 03/17] dmaengine: idxd: setup event log configuration Fenghua Yu
2023-01-03 16:34 ` [PATCH 04/17] dmaengine: idxd: add interrupt handling for event log Fenghua Yu
2023-01-03 16:34 ` [PATCH 05/17] dmanegine: idxd: add debugfs for event log dump Fenghua Yu
2023-01-03 16:34 ` [PATCH 06/17] dmaengine: idxd: add per DSA wq workqueue for processing cr faults Fenghua Yu
2023-01-03 16:34 ` [PATCH 07/17] dmaengine: idxd: create kmem cache for event log fault items Fenghua Yu
2023-01-03 16:34 ` [PATCH 08/17] iommu: expose iommu_sva_find() to common header Fenghua Yu
2023-01-03 16:34 ` [PATCH 09/17] mm: export access_remote_vm() symbol Fenghua Yu
2023-01-03 17:45   ` Lorenzo Stoakes
2023-01-03 17:50     ` Lorenzo Stoakes
2023-01-03 19:20       ` Yu, Fenghua
2023-01-03 20:13         ` Lorenzo Stoakes
2023-01-04  5:06           ` Yu, Fenghua
2023-01-04  6:12             ` Alistair Popple
2023-01-04 19:00               ` Yu, Fenghua
2023-01-04 20:00                 ` Lorenzo Stoakes
2023-01-04 19:56               ` Lorenzo Stoakes
2023-01-04 21:05                 ` Lorenzo Stoakes
2023-01-04 23:57                 ` Alistair Popple
2023-01-05  3:08                   ` Yu, Fenghua
2023-01-05  3:22                     ` Alistair Popple
2023-01-05 20:58                       ` Yu, Fenghua
2023-01-05 21:04                         ` Lorenzo Stoakes
2023-01-05  7:26                   ` Lorenzo Stoakes
2023-01-08 17:36     ` Christoph Hellwig
2023-03-01 23:39       ` Fenghua Yu
2023-01-03 16:34 ` [PATCH 10/17] dmaengine: idxd: process user page faults for completion record Fenghua Yu
2023-01-03 16:34 ` [PATCH 11/17] dmaengine: idxd: add descs_completed field " Fenghua Yu
2023-01-03 16:35 ` [PATCH 12/17] dmaengine: idxd: process batch descriptor completion record faults Fenghua Yu
2023-01-03 16:35 ` [PATCH 13/17] dmaengine: idxd: add per file user counters for " Fenghua Yu
2023-01-03 16:35 ` [PATCH 14/17] dmaengine: idxd: add a device to represent the file opened Fenghua Yu
2023-01-03 16:35 ` [PATCH 15/17] dmaengine: idxd: expose fault counters to sysfs Fenghua Yu
2023-01-03 16:35 ` [PATCH 16/17] dmaengine: idxd: add pid to exported sysfs attribute for opened file Fenghua Yu
2023-01-03 16:35 ` [PATCH 17/17] dmaengine: idxd: add per wq PRS disable Fenghua Yu
     [not found] <20230103162920.1569002-1-fenghua.yu@intel.com>
2023-01-03 16:29 ` [PATCH 08/17] iommu: expose iommu_sva_find() to common header Fenghua Yu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.