All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2 0/5] ocxl: Mmio invalidation support
@ 2020-11-20 17:32 Christophe Lombard
  2020-11-20 17:32 ` [PATCH V2 1/5] ocxl: Assign a register set to a Logical Partition Christophe Lombard
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Christophe Lombard @ 2020-11-20 17:32 UTC (permalink / raw)
  To: linuxppc-dev, fbarrat, ajd

OpenCAPI 4.0/5.0 with TLBI/SLBI Snooping, is not used due to performance
problems caused by the PAU having to process all incoming TLBI/SLBI
commands which will cause them to back up on the PowerBus.

When the Address Translation Mode requires TLB operations to be initiated
using MMIO registers, a set of registers like the following is used:
• XTS MMIO ATSD0 LPARID register
• XTS MMIO ATSD0 AVA register
• XTS MMIO ATSD0 launch register, write access initiates a shoot down
• XTS MMIO ATSD0 status register

The MMIO based mechanism also blocks the NPU/PAU from snooping TLBIE
commands from the PowerBus.

The Shootdown commands (ATSD) will be generated using MMIO registers
in the NPU/PAU and sent to the device.

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>

---
Changelog[v2]
 - Rebase to latest upstream.
 - Create a set of smaller patches
 - Move the device tree parsing and ioremap() for the shootdown page in a
   platform-specific file (powernv)
 - Release the shootdown page in release_xsl()
 - Initialize atsd_lock
 - Move the code to initiate the TLB Invalidate command in a
   platform-specific file (powernv)
 - Use the notifier invalidate_range
---

Christophe Lombard (5):
  ocxl: Assign a register set to a Logical Partition
  ocxl: Initiate a TLB invalidate command
  ocxl: Update the Process Element Entry
  ocxl: Add mmu notifier
  ocxl: Add new kernel traces

 arch/powerpc/include/asm/pnv-ocxl.h   |  53 +++++++++++++
 arch/powerpc/platforms/powernv/ocxl.c | 103 ++++++++++++++++++++++++++
 drivers/misc/ocxl/context.c           |   4 +-
 drivers/misc/ocxl/link.c              |  66 ++++++++++++++++-
 drivers/misc/ocxl/ocxl_internal.h     |   4 +-
 drivers/misc/ocxl/trace.h             |  64 ++++++++++++++++
 drivers/scsi/cxlflash/ocxl_hw.c       |   6 +-
 include/misc/ocxl.h                   |   2 +-
 8 files changed, 295 insertions(+), 7 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH V2 1/5] ocxl: Assign a register set to a Logical Partition
  2020-11-20 17:32 [PATCH V2 0/5] ocxl: Mmio invalidation support Christophe Lombard
@ 2020-11-20 17:32 ` Christophe Lombard
  2020-11-23 10:35   ` Frederic Barrat
  2020-11-20 17:32 ` [PATCH V2 2/5] ocxl: Initiate a TLB invalidate command Christophe Lombard
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Christophe Lombard @ 2020-11-20 17:32 UTC (permalink / raw)
  To: linuxppc-dev, fbarrat, ajd

Platform specific function to assign a register set to a Logical Partition.
The "ibm,mmio-atsd" property, provided by the firmware, contains the 16
base ATSD physical addresses (ATSD0 through ATSD15) of the set of MMIO
registers (XTS MMIO ATSDx LPARID/AVA/launch/status register).

For the time being, the ATSD0 set of registers is used by default.

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pnv-ocxl.h   |  3 ++
 arch/powerpc/platforms/powernv/ocxl.c | 48 +++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
index d37ededca3ee..3f38aed7100c 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -28,4 +28,7 @@ int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask, void **p
 void pnv_ocxl_spa_release(void *platform_data);
 int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
 
+int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
+		      uint64_t lpcr, void __iomem **arva);
+void pnv_ocxl_unmap_lpar(void __iomem **arva);
 #endif /* _ASM_PNV_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index ecdad219d704..bc20cf867900 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -483,3 +483,51 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
 	return rc;
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe_from_cache);
+
+int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
+		      uint64_t lpcr, void __iomem **arva)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	u64 mmio_atsd;
+	int rc;
+
+	/* ATSD physical address.
+	 * ATSD LAUNCH register: write access initiates a shoot down to
+	 * initiate the TLB Invalidate command.
+	 */
+	rc = of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
+					0, &mmio_atsd);
+	if (rc) {
+		dev_info(&dev->dev, "No available ATSD found\n");
+		return rc;
+	}
+
+	/* Assign a register set to a Logical Partition and MMIO ATSD
+	 * LPARID register to the required value.
+	 */
+	if (mmio_atsd)
+		rc = opal_npu_map_lpar(phb->opal_id, pci_dev_id(dev),
+				       lparid, lpcr);
+	if (rc) {
+		dev_err(&dev->dev, "Error mapping device to LPAR: %d\n", rc);
+		return rc;
+	}
+
+	if (mmio_atsd) {
+		*arva = ioremap(mmio_atsd, 24);
+		if (!(*arva)) {
+			dev_warn(&dev->dev, "ioremap failed - mmio_atsd: %#llx\n", mmio_atsd);
+			rc = -ENOMEM;
+		}
+	}
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_map_lpar);
+
+void pnv_ocxl_unmap_lpar(void __iomem **arva)
+{
+	iounmap(*arva);
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V2 2/5] ocxl: Initiate a TLB invalidate command
  2020-11-20 17:32 [PATCH V2 0/5] ocxl: Mmio invalidation support Christophe Lombard
  2020-11-20 17:32 ` [PATCH V2 1/5] ocxl: Assign a register set to a Logical Partition Christophe Lombard
@ 2020-11-20 17:32 ` Christophe Lombard
  2020-11-23 10:37   ` Frederic Barrat
  2020-11-20 17:32 ` [PATCH V2 3/5] ocxl: Update the Process Element Entry Christophe Lombard
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Christophe Lombard @ 2020-11-20 17:32 UTC (permalink / raw)
  To: linuxppc-dev, fbarrat, ajd

When a TLB Invalidate is required for the Logical Partition, the following
sequence has to be performed:

1. Load MMIO ATSD AVA register with the necessary value, if required.
2. Write the MMIO ATSD launch register to initiate the TLB Invalidate
   command.
3. Poll the MMIO ATSD status register to determine when the TLB Invalidate
   has been completed.

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pnv-ocxl.h   | 50 ++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/ocxl.c | 55 +++++++++++++++++++++++++++
 2 files changed, 105 insertions(+)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
index 3f38aed7100c..9c90e87e7659 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -3,12 +3,59 @@
 #ifndef _ASM_PNV_OCXL_H
 #define _ASM_PNV_OCXL_H
 
+#include <linux/bitfield.h>
 #include <linux/pci.h>
 
 #define PNV_OCXL_TL_MAX_TEMPLATE        63
 #define PNV_OCXL_TL_BITS_PER_RATE       4
 #define PNV_OCXL_TL_RATE_BUF_SIZE       ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8)
 
+#define PNV_OCXL_ATSD_TIMEOUT		1
+
+/* TLB Management Instructions */
+#define PNV_OCXL_ATSD_LNCH		0x00
+/* Radix Invalidate */
+#define   PNV_OCXL_ATSD_LNCH_R		PPC_BIT(0)
+/* Radix Invalidation Control
+ * 0b00 Just invalidate TLB.
+ * 0b01 Invalidate just Page Walk Cache.
+ * 0b10 Invalidate TLB, Page Walk Cache, and any
+ * caching of Partition and Process Table Entries.
+ */
+#define   PNV_OCXL_ATSD_LNCH_RIC	PPC_BITMASK(1, 2)
+/* Number and Page Size of translations to be invalidated */
+#define   PNV_OCXL_ATSD_LNCH_LP		PPC_BITMASK(3, 10)
+/* Invalidation Criteria
+ * 0b00 Invalidate just the target VA.
+ * 0b01 Invalidate matching PID.
+ */
+#define   PNV_OCXL_ATSD_LNCH_IS		PPC_BITMASK(11, 12)
+/* 0b1: Process Scope, 0b0: Partition Scope */
+#define   PNV_OCXL_ATSD_LNCH_PRS	PPC_BIT(13)
+/* Invalidation Flag */
+#define   PNV_OCXL_ATSD_LNCH_B		PPC_BIT(14)
+/* Actual Page Size to be invalidated
+ * 000 4KB
+ * 101 64KB
+ * 001 2MB
+ * 010 1GB
+ */
+#define   PNV_OCXL_ATSD_LNCH_AP		PPC_BITMASK(15, 17)
+/* Defines the large page select
+ * L=0b0 for 4KB pages
+ * L=0b1 for large pages)
+ */
+#define   PNV_OCXL_ATSD_LNCH_L		PPC_BIT(18)
+/* Process ID */
+#define   PNV_OCXL_ATSD_LNCH_PID	PPC_BITMASK(19, 38)
+/* NoFlush – Assumed to be 0b0 */
+#define   PNV_OCXL_ATSD_LNCH_F		PPC_BIT(39)
+#define   PNV_OCXL_ATSD_LNCH_OCAPI_SLBI	PPC_BIT(40)
+#define   PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON	PPC_BIT(41)
+#define PNV_OCXL_ATSD_AVA		0x08
+#define   PNV_OCXL_ATSD_AVA_AVA		PPC_BITMASK(0, 51)
+#define PNV_OCXL_ATSD_STAT		0x10
+
 int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16 *enabled, u16 *supported);
 int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count);
 
@@ -31,4 +78,7 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
 int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
 		      uint64_t lpcr, void __iomem **arva);
 void pnv_ocxl_unmap_lpar(void __iomem **arva);
+void pnv_ocxl_tlb_invalidate(void __iomem **arva,
+			     unsigned long pid,
+			     unsigned long addr);
 #endif /* _ASM_PNV_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index bc20cf867900..07878496954b 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -531,3 +531,58 @@ void pnv_ocxl_unmap_lpar(void __iomem **arva)
 	iounmap(*arva);
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar);
+
+void pnv_ocxl_tlb_invalidate(void __iomem **arva,
+			     unsigned long pid,
+			     unsigned long addr)
+{
+	unsigned long timeout = jiffies + (HZ * PNV_OCXL_ATSD_TIMEOUT);
+	uint64_t val = 0ull;
+	int pend;
+
+	if (!(*arva))
+		return;
+
+	if (addr) {
+		/* load Abbreviated Virtual Address register with
+		 * the necessary value
+		 */
+		val |= FIELD_PREP(PNV_OCXL_ATSD_AVA_AVA, addr >> (63-51));
+		out_be64(*arva + PNV_OCXL_ATSD_AVA, val);
+	}
+
+	/* Write access initiates a shoot down to initiate the
+	 * TLB Invalidate command
+	 */
+	val = PNV_OCXL_ATSD_LNCH_R;
+	if (addr) {
+		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_RIC, 0b00);
+		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b00);
+	} else {
+		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_RIC, 0b10);
+		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b01);
+		val |= PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON;
+	}
+	val |= PNV_OCXL_ATSD_LNCH_PRS;
+	val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_AP, 0b101);
+	val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_PID, pid);
+	out_be64(*arva + PNV_OCXL_ATSD_LNCH, val);
+
+	/* Poll the ATSD status register to determine when the
+	 * TLB Invalidate has been completed.
+	 */
+	val = in_be64(*arva + PNV_OCXL_ATSD_STAT);
+	pend = val >> 63;
+
+	while (pend) {
+		if (time_after_eq(jiffies, timeout)) {
+			pr_err("%s - Timeout while reading XTS MMIO ATSD status register (val=%#llx, pidr=0x%lx)\n",
+			       __func__, val, pid);
+			return;
+		}
+		cpu_relax();
+		val = in_be64(*arva + PNV_OCXL_ATSD_STAT);
+		pend = val >> 63;
+	}
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_tlb_invalidate);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V2 3/5] ocxl: Update the Process Element Entry
  2020-11-20 17:32 [PATCH V2 0/5] ocxl: Mmio invalidation support Christophe Lombard
  2020-11-20 17:32 ` [PATCH V2 1/5] ocxl: Assign a register set to a Logical Partition Christophe Lombard
  2020-11-20 17:32 ` [PATCH V2 2/5] ocxl: Initiate a TLB invalidate command Christophe Lombard
@ 2020-11-20 17:32 ` Christophe Lombard
  2020-11-23 10:38   ` Frederic Barrat
  2020-11-20 17:32 ` [PATCH V2 4/5] ocxl: Add mmu notifier Christophe Lombard
  2020-11-20 17:32 ` [PATCH V2 5/5] ocxl: Add new kernel traces Christophe Lombard
  4 siblings, 1 reply; 13+ messages in thread
From: Christophe Lombard @ 2020-11-20 17:32 UTC (permalink / raw)
  To: linuxppc-dev, fbarrat, ajd

To complete the MMIO based mechanism, the fields: PASID, bus, device and
function of the Process Element Entry have to be filled. (See
OpenCAPI Power Platform Architecture document)

                   Hypervisor Process Element Entry
Word
    0 1 .... 7  8  ...... 12  13 ..15  16.... 19  20 ........... 31
0                  OSL Configuration State (0:31)
1                  OSL Configuration State (32:63)
2               PASID                      |    Reserved
3       Bus   |   Device    |Function |        Reserved
4                             Reserved
5                             Reserved
6                               ....

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
---
 drivers/misc/ocxl/context.c       | 4 +++-
 drivers/misc/ocxl/link.c          | 4 +++-
 drivers/misc/ocxl/ocxl_internal.h | 4 +++-
 drivers/scsi/cxlflash/ocxl_hw.c   | 6 ++++--
 include/misc/ocxl.h               | 2 +-
 5 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index c21f65a5c762..9eb0d93b01c6 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -70,6 +70,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm)
 {
 	int rc;
 	unsigned long pidr = 0;
+	struct pci_dev *dev;
 
 	// Locks both status & tidr
 	mutex_lock(&ctx->status_mutex);
@@ -81,8 +82,9 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm)
 	if (mm)
 		pidr = mm->context.id;
 
+	dev = to_pci_dev(ctx->afu->fn->dev.parent);
 	rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid, pidr, ctx->tidr,
-			      amr, mm, xsl_fault_error, ctx);
+			      amr, pci_dev_id(dev), mm, xsl_fault_error, ctx);
 	if (rc)
 		goto out;
 
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index fd73d3bc0eb6..20444db8a2bb 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -494,7 +494,7 @@ static u64 calculate_cfg_state(bool kernel)
 }
 
 int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
-		u64 amr, struct mm_struct *mm,
+		u64 amr, u64 bdf, struct mm_struct *mm,
 		void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
 		void *xsl_err_data)
 {
@@ -529,6 +529,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 
 	memset(pe, 0, sizeof(struct ocxl_process_element));
 	pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
+	pe->pasid = cpu_to_be32(pasid << (31 - 19));
+	pe->bdf = cpu_to_be32(bdf << (31 - 15));
 	pe->lpid = cpu_to_be32(mfspr(SPRN_LPID));
 	pe->pid = cpu_to_be32(pidr);
 	pe->tid = cpu_to_be32(tidr);
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index 0bad0a123af6..c9ce2af21d6f 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -84,7 +84,9 @@ struct ocxl_context {
 
 struct ocxl_process_element {
 	__be64 config_state;
-	__be32 reserved1[11];
+	__be32 pasid;
+	__be32 bdf;
+	__be32 reserved1[9];
 	__be32 lpid;
 	__be32 tid;
 	__be32 pid;
diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c
index e4e0d767b98e..244fc27215dc 100644
--- a/drivers/scsi/cxlflash/ocxl_hw.c
+++ b/drivers/scsi/cxlflash/ocxl_hw.c
@@ -329,6 +329,7 @@ static int start_context(struct ocxlflash_context *ctx)
 	struct ocxl_hw_afu *afu = ctx->hw_afu;
 	struct ocxl_afu_config *acfg = &afu->acfg;
 	void *link_token = afu->link_token;
+	struct pci_dev *pdev = afu->pdev;
 	struct device *dev = afu->dev;
 	bool master = ctx->master;
 	struct mm_struct *mm;
@@ -360,8 +361,9 @@ static int start_context(struct ocxlflash_context *ctx)
 		mm = current->mm;
 	}
 
-	rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0, mm,
-			      ocxlflash_xsl_fault, ctx);
+	rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0,
+			      pci_dev_id(pdev), mm, ocxlflash_xsl_fault,
+			      ctx);
 	if (unlikely(rc)) {
 		dev_err(dev, "%s: ocxl_link_add_pe failed rc=%d\n",
 			__func__, rc);
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index e013736e275d..d0f101f428dd 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -447,7 +447,7 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle);
  * defined
  */
 int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
-		u64 amr, struct mm_struct *mm,
+		u64 amr, u64 bdf, struct mm_struct *mm,
 		void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
 		void *xsl_err_data);
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V2 4/5] ocxl: Add mmu notifier
  2020-11-20 17:32 [PATCH V2 0/5] ocxl: Mmio invalidation support Christophe Lombard
                   ` (2 preceding siblings ...)
  2020-11-20 17:32 ` [PATCH V2 3/5] ocxl: Update the Process Element Entry Christophe Lombard
@ 2020-11-20 17:32 ` Christophe Lombard
  2020-11-23 10:40   ` Frederic Barrat
  2020-11-24  9:17   ` Christoph Hellwig
  2020-11-20 17:32 ` [PATCH V2 5/5] ocxl: Add new kernel traces Christophe Lombard
  4 siblings, 2 replies; 13+ messages in thread
From: Christophe Lombard @ 2020-11-20 17:32 UTC (permalink / raw)
  To: linuxppc-dev, fbarrat, ajd

Add invalidate_range mmu notifier, when required (ATSD access of MMIO
registers is available), to initiate TLB invalidation commands.
For the time being, the ATSD0 set of registers is used by default.

The pasid and bdf values have to be configured in the Process Element
Entry.
The PEE must be set up to match the BDF/PASID of the AFU.

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
---
 drivers/misc/ocxl/link.c | 58 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 20444db8a2bb..100bdfe9ec37 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -2,8 +2,10 @@
 // Copyright 2017 IBM Corp.
 #include <linux/sched/mm.h>
 #include <linux/mutex.h>
+#include <linux/mm.h>
 #include <linux/mm_types.h>
 #include <linux/mmu_context.h>
+#include <linux/mmu_notifier.h>
 #include <asm/copro.h>
 #include <asm/pnv-ocxl.h>
 #include <asm/xive.h>
@@ -33,6 +35,7 @@
 
 #define SPA_PE_VALID		0x80000000
 
+struct ocxl_link;
 
 struct pe_data {
 	struct mm_struct *mm;
@@ -41,6 +44,8 @@ struct pe_data {
 	/* opaque pointer to be passed to the above callback */
 	void *xsl_err_data;
 	struct rcu_head rcu;
+	struct ocxl_link *link;
+	struct mmu_notifier mmu_notifier;
 };
 
 struct spa {
@@ -83,6 +88,8 @@ struct ocxl_link {
 	int domain;
 	int bus;
 	int dev;
+	void __iomem *arva;     /* ATSD register virtual address */
+	spinlock_t atsd_lock;   /* to serialize shootdowns */
 	atomic_t irq_available;
 	struct spa *spa;
 	void *platform_data;
@@ -403,6 +410,11 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l
 	if (rc)
 		goto err_xsl_irq;
 
+	rc = pnv_ocxl_map_lpar(dev, mfspr(SPRN_LPID), 0,
+					  &link->arva);
+	if (!rc)
+		spin_lock_init(&link->atsd_lock);
+
 	*out_link = link;
 	return 0;
 
@@ -454,6 +466,11 @@ static void release_xsl(struct kref *ref)
 {
 	struct ocxl_link *link = container_of(ref, struct ocxl_link, ref);
 
+	if (link->arva) {
+		pnv_ocxl_unmap_lpar(&link->arva);
+		link->arva = NULL;
+	}
+
 	list_del(&link->list);
 	/* call platform code before releasing data */
 	pnv_ocxl_spa_release(link->platform_data);
@@ -470,6 +487,26 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle)
 }
 EXPORT_SYMBOL_GPL(ocxl_link_release);
 
+static void invalidate_range(struct mmu_notifier *mn,
+			     struct mm_struct *mm,
+			     unsigned long start, unsigned long end)
+{
+	struct pe_data *pe_data = container_of(mn, struct pe_data, mmu_notifier);
+	struct ocxl_link *link = pe_data->link;
+	unsigned long addr, pid, page_size = PAGE_SIZE;
+
+	pid = mm->context.id;
+
+	spin_lock(&link->atsd_lock);
+	for (addr = start; addr < end; addr += page_size)
+		pnv_ocxl_tlb_invalidate(&link->arva, pid, addr);
+	spin_unlock(&link->atsd_lock);
+}
+
+static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = {
+	.invalidate_range = invalidate_range,
+};
+
 static u64 calculate_cfg_state(bool kernel)
 {
 	u64 state;
@@ -526,6 +563,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 	pe_data->mm = mm;
 	pe_data->xsl_err_cb = xsl_err_cb;
 	pe_data->xsl_err_data = xsl_err_data;
+	pe_data->link = link;
+	pe_data->mmu_notifier.ops = &ocxl_mmu_notifier_ops;
 
 	memset(pe, 0, sizeof(struct ocxl_process_element));
 	pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
@@ -542,8 +581,16 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 	 * by the nest MMU. If we have a kernel context, TLBIs are
 	 * already global.
 	 */
-	if (mm)
+	if (mm) {
 		mm_context_add_copro(mm);
+		if (link->arva) {
+			/* Use MMIO registers for the TLB Invalidate
+			 * operations.
+			 */
+			mmu_notifier_register(&pe_data->mmu_notifier, mm);
+		}
+	}
+
 	/*
 	 * Barrier is to make sure PE is visible in the SPA before it
 	 * is used by the device. It also helps with the global TLBI
@@ -674,6 +721,15 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
 		WARN(1, "Couldn't find pe data when removing PE\n");
 	} else {
 		if (pe_data->mm) {
+			if (link->arva) {
+				mmu_notifier_unregister(&pe_data->mmu_notifier,
+							pe_data->mm);
+				spin_lock(&link->atsd_lock);
+				pnv_ocxl_tlb_invalidate(&link->arva,
+							pe_data->mm->context.id,
+							0ull);
+				spin_unlock(&link->atsd_lock);
+			}
 			mm_context_remove_copro(pe_data->mm);
 			mmdrop(pe_data->mm);
 		}
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V2 5/5] ocxl: Add new kernel traces
  2020-11-20 17:32 [PATCH V2 0/5] ocxl: Mmio invalidation support Christophe Lombard
                   ` (3 preceding siblings ...)
  2020-11-20 17:32 ` [PATCH V2 4/5] ocxl: Add mmu notifier Christophe Lombard
@ 2020-11-20 17:32 ` Christophe Lombard
  4 siblings, 0 replies; 13+ messages in thread
From: Christophe Lombard @ 2020-11-20 17:32 UTC (permalink / raw)
  To: linuxppc-dev, fbarrat, ajd

Add specific kernel traces which provide information on mmu notifier and on
pages range.

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
---
 drivers/misc/ocxl/link.c  |  4 +++
 drivers/misc/ocxl/trace.h | 64 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 100bdfe9ec37..2e0eb46d032e 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -496,6 +496,7 @@ static void invalidate_range(struct mmu_notifier *mn,
 	unsigned long addr, pid, page_size = PAGE_SIZE;
 
 	pid = mm->context.id;
+	trace_ocxl_mmu_notifier_range(start, end, pid);
 
 	spin_lock(&link->atsd_lock);
 	for (addr = start; addr < end; addr += page_size)
@@ -587,6 +588,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 			/* Use MMIO registers for the TLB Invalidate
 			 * operations.
 			 */
+			trace_ocxl_init_mmu_notifier(pasid, mm->context.id);
 			mmu_notifier_register(&pe_data->mmu_notifier, mm);
 		}
 	}
@@ -722,6 +724,8 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
 	} else {
 		if (pe_data->mm) {
 			if (link->arva) {
+				trace_ocxl_release_mmu_notifier(pasid,
+								pe_data->mm->context.id);
 				mmu_notifier_unregister(&pe_data->mmu_notifier,
 							pe_data->mm);
 				spin_lock(&link->atsd_lock);
diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h
index 17e21cb2addd..a33a5094ff6c 100644
--- a/drivers/misc/ocxl/trace.h
+++ b/drivers/misc/ocxl/trace.h
@@ -8,6 +8,70 @@
 
 #include <linux/tracepoint.h>
 
+
+TRACE_EVENT(ocxl_mmu_notifier_range,
+	TP_PROTO(unsigned long start, unsigned long end, unsigned long pidr),
+	TP_ARGS(start, end, pidr),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, start)
+		__field(unsigned long, end)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->start = start;
+		__entry->end = end;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("start=0x%lx end=0x%lx pidr=0x%lx",
+		__entry->start,
+		__entry->end,
+		__entry->pidr
+	)
+);
+
+TRACE_EVENT(ocxl_init_mmu_notifier,
+	TP_PROTO(int pasid, unsigned long pidr),
+	TP_ARGS(pasid, pidr),
+
+	TP_STRUCT__entry(
+		__field(int, pasid)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->pasid = pasid;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("pasid=%d, pidr=0x%lx",
+		__entry->pasid,
+		__entry->pidr
+	)
+);
+
+TRACE_EVENT(ocxl_release_mmu_notifier,
+	TP_PROTO(int pasid, unsigned long pidr),
+	TP_ARGS(pasid, pidr),
+
+	TP_STRUCT__entry(
+		__field(int, pasid)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->pasid = pasid;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("pasid=%d, pidr=0x%lx",
+		__entry->pasid,
+		__entry->pidr
+	)
+);
+
 DECLARE_EVENT_CLASS(ocxl_context,
 	TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
 	TP_ARGS(pid, spa, pasid, pidr, tidr),
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 1/5] ocxl: Assign a register set to a Logical Partition
  2020-11-20 17:32 ` [PATCH V2 1/5] ocxl: Assign a register set to a Logical Partition Christophe Lombard
@ 2020-11-23 10:35   ` Frederic Barrat
  0 siblings, 0 replies; 13+ messages in thread
From: Frederic Barrat @ 2020-11-23 10:35 UTC (permalink / raw)
  To: Christophe Lombard, linuxppc-dev, fbarrat, ajd



On 20/11/2020 18:32, Christophe Lombard wrote:
> Platform specific function to assign a register set to a Logical Partition.
> The "ibm,mmio-atsd" property, provided by the firmware, contains the 16
> base ATSD physical addresses (ATSD0 through ATSD15) of the set of MMIO
> registers (XTS MMIO ATSDx LPARID/AVA/launch/status register).
> 
> For the time being, the ATSD0 set of registers is used by default.
> 
> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/pnv-ocxl.h   |  3 ++
>   arch/powerpc/platforms/powernv/ocxl.c | 48 +++++++++++++++++++++++++++
>   2 files changed, 51 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
> index d37ededca3ee..3f38aed7100c 100644
> --- a/arch/powerpc/include/asm/pnv-ocxl.h
> +++ b/arch/powerpc/include/asm/pnv-ocxl.h
> @@ -28,4 +28,7 @@ int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask, void **p
>   void pnv_ocxl_spa_release(void *platform_data);
>   int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
> 
> +int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
> +		      uint64_t lpcr, void __iomem **arva);
> +void pnv_ocxl_unmap_lpar(void __iomem **arva);
>   #endif /* _ASM_PNV_OCXL_H */
> diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
> index ecdad219d704..bc20cf867900 100644
> --- a/arch/powerpc/platforms/powernv/ocxl.c
> +++ b/arch/powerpc/platforms/powernv/ocxl.c
> @@ -483,3 +483,51 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
>   	return rc;
>   }
>   EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe_from_cache);
> +
> +int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
> +		      uint64_t lpcr, void __iomem **arva)
> +{
> +	struct pci_controller *hose = pci_bus_to_host(dev->bus);
> +	struct pnv_phb *phb = hose->private_data;
> +	u64 mmio_atsd;
> +	int rc;
> +
> +	/* ATSD physical address.
> +	 * ATSD LAUNCH register: write access initiates a shoot down to
> +	 * initiate the TLB Invalidate command.
> +	 */
> +	rc = of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
> +					0, &mmio_atsd);
> +	if (rc) {
> +		dev_info(&dev->dev, "No available ATSD found\n");
> +		return rc;
> +	}
> +
> +	/* Assign a register set to a Logical Partition and MMIO ATSD
> +	 * LPARID register to the required value.
> +	 */
> +	if (mmio_atsd)


If we don't have the "ibm,mmio-atsd", i.e on P9, then we've already 
exited above. So why not consider mmio_atsd as an error?


> +		rc = opal_npu_map_lpar(phb->opal_id, pci_dev_id(dev),
> +				       lparid, lpcr);
> +	if (rc) {
> +		dev_err(&dev->dev, "Error mapping device to LPAR: %d\n", rc);
> +		return rc;
> +	}
> +
> +	if (mmio_atsd) {


Same here


> +		*arva = ioremap(mmio_atsd, 24);
> +		if (!(*arva)) {
> +			dev_warn(&dev->dev, "ioremap failed - mmio_atsd: %#llx\n", mmio_atsd);
> +			rc = -ENOMEM;
> +		}
> +	}
> +
> +	return rc;
> +}
> +EXPORT_SYMBOL_GPL(pnv_ocxl_map_lpar);
> +
> +void pnv_ocxl_unmap_lpar(void __iomem **arva)


The arva argument doesn't need to be a double pointer. Void * is enough.

   Fred


> +{
> +	iounmap(*arva);
> +}
> +EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar);
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 2/5] ocxl: Initiate a TLB invalidate command
  2020-11-20 17:32 ` [PATCH V2 2/5] ocxl: Initiate a TLB invalidate command Christophe Lombard
@ 2020-11-23 10:37   ` Frederic Barrat
  0 siblings, 0 replies; 13+ messages in thread
From: Frederic Barrat @ 2020-11-23 10:37 UTC (permalink / raw)
  To: Christophe Lombard, linuxppc-dev, fbarrat, ajd



On 20/11/2020 18:32, Christophe Lombard wrote:
> When a TLB Invalidate is required for the Logical Partition, the following
> sequence has to be performed:
> 
> 1. Load MMIO ATSD AVA register with the necessary value, if required.
> 2. Write the MMIO ATSD launch register to initiate the TLB Invalidate
>     command.
> 3. Poll the MMIO ATSD status register to determine when the TLB Invalidate
>     has been completed.
> 
> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/pnv-ocxl.h   | 50 ++++++++++++++++++++++++
>   arch/powerpc/platforms/powernv/ocxl.c | 55 +++++++++++++++++++++++++++
>   2 files changed, 105 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
> index 3f38aed7100c..9c90e87e7659 100644
> --- a/arch/powerpc/include/asm/pnv-ocxl.h
> +++ b/arch/powerpc/include/asm/pnv-ocxl.h
> @@ -3,12 +3,59 @@
>   #ifndef _ASM_PNV_OCXL_H
>   #define _ASM_PNV_OCXL_H
> 
> +#include <linux/bitfield.h>
>   #include <linux/pci.h>
> 
>   #define PNV_OCXL_TL_MAX_TEMPLATE        63
>   #define PNV_OCXL_TL_BITS_PER_RATE       4
>   #define PNV_OCXL_TL_RATE_BUF_SIZE       ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8)
> 
> +#define PNV_OCXL_ATSD_TIMEOUT		1
> +
> +/* TLB Management Instructions */
> +#define PNV_OCXL_ATSD_LNCH		0x00
> +/* Radix Invalidate */
> +#define   PNV_OCXL_ATSD_LNCH_R		PPC_BIT(0)
> +/* Radix Invalidation Control
> + * 0b00 Just invalidate TLB.
> + * 0b01 Invalidate just Page Walk Cache.
> + * 0b10 Invalidate TLB, Page Walk Cache, and any
> + * caching of Partition and Process Table Entries.
> + */
> +#define   PNV_OCXL_ATSD_LNCH_RIC	PPC_BITMASK(1, 2)
> +/* Number and Page Size of translations to be invalidated */
> +#define   PNV_OCXL_ATSD_LNCH_LP		PPC_BITMASK(3, 10)
> +/* Invalidation Criteria
> + * 0b00 Invalidate just the target VA.
> + * 0b01 Invalidate matching PID.
> + */
> +#define   PNV_OCXL_ATSD_LNCH_IS		PPC_BITMASK(11, 12)
> +/* 0b1: Process Scope, 0b0: Partition Scope */
> +#define   PNV_OCXL_ATSD_LNCH_PRS	PPC_BIT(13)
> +/* Invalidation Flag */
> +#define   PNV_OCXL_ATSD_LNCH_B		PPC_BIT(14)
> +/* Actual Page Size to be invalidated
> + * 000 4KB
> + * 101 64KB
> + * 001 2MB
> + * 010 1GB
> + */
> +#define   PNV_OCXL_ATSD_LNCH_AP		PPC_BITMASK(15, 17)
> +/* Defines the large page select
> + * L=0b0 for 4KB pages
> + * L=0b1 for large pages)
> + */
> +#define   PNV_OCXL_ATSD_LNCH_L		PPC_BIT(18)
> +/* Process ID */
> +#define   PNV_OCXL_ATSD_LNCH_PID	PPC_BITMASK(19, 38)
> +/* NoFlush – Assumed to be 0b0 */
> +#define   PNV_OCXL_ATSD_LNCH_F		PPC_BIT(39)
> +#define   PNV_OCXL_ATSD_LNCH_OCAPI_SLBI	PPC_BIT(40)
> +#define   PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON	PPC_BIT(41)
> +#define PNV_OCXL_ATSD_AVA		0x08
> +#define   PNV_OCXL_ATSD_AVA_AVA		PPC_BITMASK(0, 51)
> +#define PNV_OCXL_ATSD_STAT		0x10
> +
>   int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16 *enabled, u16 *supported);
>   int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count);
> 
> @@ -31,4 +78,7 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
>   int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
>   		      uint64_t lpcr, void __iomem **arva);
>   void pnv_ocxl_unmap_lpar(void __iomem **arva);
> +void pnv_ocxl_tlb_invalidate(void __iomem **arva,
> +			     unsigned long pid,
> +			     unsigned long addr);
>   #endif /* _ASM_PNV_OCXL_H */
> diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
> index bc20cf867900..07878496954b 100644
> --- a/arch/powerpc/platforms/powernv/ocxl.c
> +++ b/arch/powerpc/platforms/powernv/ocxl.c
> @@ -531,3 +531,58 @@ void pnv_ocxl_unmap_lpar(void __iomem **arva)
>   	iounmap(*arva);
>   }
>   EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar);
> +
> +void pnv_ocxl_tlb_invalidate(void __iomem **arva,


Similarly to the previous patch, there's no reason why arva should be a 
double-pointer.


> +			     unsigned long pid,
> +			     unsigned long addr)
> +{
> +	unsigned long timeout = jiffies + (HZ * PNV_OCXL_ATSD_TIMEOUT);
> +	uint64_t val = 0ull;
> +	int pend;
> +
> +	if (!(*arva))
> +		return;
> +
> +	if (addr) {
> +		/* load Abbreviated Virtual Address register with
> +		 * the necessary value
> +		 */
> +		val |= FIELD_PREP(PNV_OCXL_ATSD_AVA_AVA, addr >> (63-51));
> +		out_be64(*arva + PNV_OCXL_ATSD_AVA, val);
> +	}
> +
> +	/* Write access initiates a shoot down to initiate the
> +	 * TLB Invalidate command
> +	 */
> +	val = PNV_OCXL_ATSD_LNCH_R;
> +	if (addr) {
> +		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_RIC, 0b00);
> +		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b00);
> +	} else {
> +		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_RIC, 0b10);
> +		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b01);
> +		val |= PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON;
> +	}
> +	val |= PNV_OCXL_ATSD_LNCH_PRS;
> +	val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_AP, 0b101);



So we hard code a page size of 64k. The mmu notifier loops over 
PAGE_SIZE. It would be cleaner to pass the page size as an argument and 
code AP based on it.

   Fred


> +	val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_PID, pid);
> +	out_be64(*arva + PNV_OCXL_ATSD_LNCH, val);
> +
> +	/* Poll the ATSD status register to determine when the
> +	 * TLB Invalidate has been completed.
> +	 */
> +	val = in_be64(*arva + PNV_OCXL_ATSD_STAT);
> +	pend = val >> 63;
> +
> +	while (pend) {
> +		if (time_after_eq(jiffies, timeout)) {
> +			pr_err("%s - Timeout while reading XTS MMIO ATSD status register (val=%#llx, pidr=0x%lx)\n",
> +			       __func__, val, pid);
> +			return;
> +		}
> +		cpu_relax();
> +		val = in_be64(*arva + PNV_OCXL_ATSD_STAT);
> +		pend = val >> 63;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(pnv_ocxl_tlb_invalidate);
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 3/5] ocxl: Update the Process Element Entry
  2020-11-20 17:32 ` [PATCH V2 3/5] ocxl: Update the Process Element Entry Christophe Lombard
@ 2020-11-23 10:38   ` Frederic Barrat
  0 siblings, 0 replies; 13+ messages in thread
From: Frederic Barrat @ 2020-11-23 10:38 UTC (permalink / raw)
  To: Christophe Lombard, linuxppc-dev, fbarrat, ajd



On 20/11/2020 18:32, Christophe Lombard wrote:
> To complete the MMIO based mechanism, the fields: PASID, bus, device and
> function of the Process Element Entry have to be filled. (See
> OpenCAPI Power Platform Architecture document)
> 
>                     Hypervisor Process Element Entry
> Word
>      0 1 .... 7  8  ...... 12  13 ..15  16.... 19  20 ........... 31
> 0                  OSL Configuration State (0:31)
> 1                  OSL Configuration State (32:63)
> 2               PASID                      |    Reserved
> 3       Bus   |   Device    |Function |        Reserved
> 4                             Reserved
> 5                             Reserved
> 6                               ....
> 
> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
> ---
>   drivers/misc/ocxl/context.c       | 4 +++-
>   drivers/misc/ocxl/link.c          | 4 +++-
>   drivers/misc/ocxl/ocxl_internal.h | 4 +++-
>   drivers/scsi/cxlflash/ocxl_hw.c   | 6 ++++--
>   include/misc/ocxl.h               | 2 +-
>   5 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
> index c21f65a5c762..9eb0d93b01c6 100644
> --- a/drivers/misc/ocxl/context.c
> +++ b/drivers/misc/ocxl/context.c
> @@ -70,6 +70,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm)
>   {
>   	int rc;
>   	unsigned long pidr = 0;
> +	struct pci_dev *dev;
> 
>   	// Locks both status & tidr
>   	mutex_lock(&ctx->status_mutex);
> @@ -81,8 +82,9 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm)
>   	if (mm)
>   		pidr = mm->context.id;
> 
> +	dev = to_pci_dev(ctx->afu->fn->dev.parent);
>   	rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid, pidr, ctx->tidr,
> -			      amr, mm, xsl_fault_error, ctx);
> +			      amr, pci_dev_id(dev), mm, xsl_fault_error, ctx);
>   	if (rc)
>   		goto out;
> 
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index fd73d3bc0eb6..20444db8a2bb 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -494,7 +494,7 @@ static u64 calculate_cfg_state(bool kernel)
>   }
> 
>   int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
> -		u64 amr, struct mm_struct *mm,
> +		u64 amr, u64 bdf, struct mm_struct *mm,


bdf could/should be a u16, since that's per the PCI spec.

   Fred


>   		void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
>   		void *xsl_err_data)
>   {
> @@ -529,6 +529,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
> 
>   	memset(pe, 0, sizeof(struct ocxl_process_element));
>   	pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
> +	pe->pasid = cpu_to_be32(pasid << (31 - 19));
> +	pe->bdf = cpu_to_be32(bdf << (31 - 15));
>   	pe->lpid = cpu_to_be32(mfspr(SPRN_LPID));
>   	pe->pid = cpu_to_be32(pidr);
>   	pe->tid = cpu_to_be32(tidr);
> diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
> index 0bad0a123af6..c9ce2af21d6f 100644
> --- a/drivers/misc/ocxl/ocxl_internal.h
> +++ b/drivers/misc/ocxl/ocxl_internal.h
> @@ -84,7 +84,9 @@ struct ocxl_context {
> 
>   struct ocxl_process_element {
>   	__be64 config_state;
> -	__be32 reserved1[11];
> +	__be32 pasid;
> +	__be32 bdf;
> +	__be32 reserved1[9];
>   	__be32 lpid;
>   	__be32 tid;
>   	__be32 pid;
> diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c
> index e4e0d767b98e..244fc27215dc 100644
> --- a/drivers/scsi/cxlflash/ocxl_hw.c
> +++ b/drivers/scsi/cxlflash/ocxl_hw.c
> @@ -329,6 +329,7 @@ static int start_context(struct ocxlflash_context *ctx)
>   	struct ocxl_hw_afu *afu = ctx->hw_afu;
>   	struct ocxl_afu_config *acfg = &afu->acfg;
>   	void *link_token = afu->link_token;
> +	struct pci_dev *pdev = afu->pdev;
>   	struct device *dev = afu->dev;
>   	bool master = ctx->master;
>   	struct mm_struct *mm;
> @@ -360,8 +361,9 @@ static int start_context(struct ocxlflash_context *ctx)
>   		mm = current->mm;
>   	}
> 
> -	rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0, mm,
> -			      ocxlflash_xsl_fault, ctx);
> +	rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0,
> +			      pci_dev_id(pdev), mm, ocxlflash_xsl_fault,
> +			      ctx);
>   	if (unlikely(rc)) {
>   		dev_err(dev, "%s: ocxl_link_add_pe failed rc=%d\n",
>   			__func__, rc);
> diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
> index e013736e275d..d0f101f428dd 100644
> --- a/include/misc/ocxl.h
> +++ b/include/misc/ocxl.h
> @@ -447,7 +447,7 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle);
>    * defined
>    */
>   int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
> -		u64 amr, struct mm_struct *mm,
> +		u64 amr, u64 bdf, struct mm_struct *mm,
>   		void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
>   		void *xsl_err_data);
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 4/5] ocxl: Add mmu notifier
  2020-11-20 17:32 ` [PATCH V2 4/5] ocxl: Add mmu notifier Christophe Lombard
@ 2020-11-23 10:40   ` Frederic Barrat
  2020-11-24  9:17   ` Christoph Hellwig
  1 sibling, 0 replies; 13+ messages in thread
From: Frederic Barrat @ 2020-11-23 10:40 UTC (permalink / raw)
  To: Christophe Lombard, linuxppc-dev, fbarrat, ajd



On 20/11/2020 18:32, Christophe Lombard wrote:
> Add invalidate_range mmu notifier, when required (ATSD access of MMIO
> registers is available), to initiate TLB invalidation commands.
> For the time being, the ATSD0 set of registers is used by default.
> 
> The pasid and bdf values have to be configured in the Process Element
> Entry.
> The PEE must be set up to match the BDF/PASID of the AFU.
> 
> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
> ---
>   drivers/misc/ocxl/link.c | 58 +++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index 20444db8a2bb..100bdfe9ec37 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -2,8 +2,10 @@
>   // Copyright 2017 IBM Corp.
>   #include <linux/sched/mm.h>
>   #include <linux/mutex.h>
> +#include <linux/mm.h>
>   #include <linux/mm_types.h>
>   #include <linux/mmu_context.h>
> +#include <linux/mmu_notifier.h>
>   #include <asm/copro.h>
>   #include <asm/pnv-ocxl.h>
>   #include <asm/xive.h>
> @@ -33,6 +35,7 @@
> 
>   #define SPA_PE_VALID		0x80000000
> 
> +struct ocxl_link;
> 
>   struct pe_data {
>   	struct mm_struct *mm;
> @@ -41,6 +44,8 @@ struct pe_data {
>   	/* opaque pointer to be passed to the above callback */
>   	void *xsl_err_data;
>   	struct rcu_head rcu;
> +	struct ocxl_link *link;
> +	struct mmu_notifier mmu_notifier;
>   };
> 
>   struct spa {
> @@ -83,6 +88,8 @@ struct ocxl_link {
>   	int domain;
>   	int bus;
>   	int dev;
> +	void __iomem *arva;     /* ATSD register virtual address */
> +	spinlock_t atsd_lock;   /* to serialize shootdowns */
>   	atomic_t irq_available;
>   	struct spa *spa;
>   	void *platform_data;
> @@ -403,6 +410,11 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l
>   	if (rc)
>   		goto err_xsl_irq;
> 
> +	rc = pnv_ocxl_map_lpar(dev, mfspr(SPRN_LPID), 0,
> +					  &link->arva);
> +	if (!rc)
> +		spin_lock_init(&link->atsd_lock);
> +


We could use a comment to say that if arva = 0, then we don't need mmio 
shootdowns and we rely on hardware snooping.

Also, we could always initialize the spin lock, it doesn't hurt and make 
the code more readable.

   Fred


>   	*out_link = link;
>   	return 0;
> 
> @@ -454,6 +466,11 @@ static void release_xsl(struct kref *ref)
>   {
>   	struct ocxl_link *link = container_of(ref, struct ocxl_link, ref);
> 
> +	if (link->arva) {
> +		pnv_ocxl_unmap_lpar(&link->arva);
> +		link->arva = NULL;
> +	}
> +
>   	list_del(&link->list);
>   	/* call platform code before releasing data */
>   	pnv_ocxl_spa_release(link->platform_data);
> @@ -470,6 +487,26 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle)
>   }
>   EXPORT_SYMBOL_GPL(ocxl_link_release);
> 
> +static void invalidate_range(struct mmu_notifier *mn,
> +			     struct mm_struct *mm,
> +			     unsigned long start, unsigned long end)
> +{
> +	struct pe_data *pe_data = container_of(mn, struct pe_data, mmu_notifier);
> +	struct ocxl_link *link = pe_data->link;
> +	unsigned long addr, pid, page_size = PAGE_SIZE;
> +
> +	pid = mm->context.id;
> +
> +	spin_lock(&link->atsd_lock);
> +	for (addr = start; addr < end; addr += page_size)
> +		pnv_ocxl_tlb_invalidate(&link->arva, pid, addr);
> +	spin_unlock(&link->atsd_lock);
> +}
> +
> +static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = {
> +	.invalidate_range = invalidate_range,
> +};
> +
>   static u64 calculate_cfg_state(bool kernel)
>   {
>   	u64 state;
> @@ -526,6 +563,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
>   	pe_data->mm = mm;
>   	pe_data->xsl_err_cb = xsl_err_cb;
>   	pe_data->xsl_err_data = xsl_err_data;
> +	pe_data->link = link;
> +	pe_data->mmu_notifier.ops = &ocxl_mmu_notifier_ops;
> 
>   	memset(pe, 0, sizeof(struct ocxl_process_element));
>   	pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
> @@ -542,8 +581,16 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
>   	 * by the nest MMU. If we have a kernel context, TLBIs are
>   	 * already global.
>   	 */
> -	if (mm)
> +	if (mm) {
>   		mm_context_add_copro(mm);
> +		if (link->arva) {
> +			/* Use MMIO registers for the TLB Invalidate
> +			 * operations.
> +			 */
> +			mmu_notifier_register(&pe_data->mmu_notifier, mm);
> +		}
> +	}
> +
>   	/*
>   	 * Barrier is to make sure PE is visible in the SPA before it
>   	 * is used by the device. It also helps with the global TLBI
> @@ -674,6 +721,15 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
>   		WARN(1, "Couldn't find pe data when removing PE\n");
>   	} else {
>   		if (pe_data->mm) {
> +			if (link->arva) {
> +				mmu_notifier_unregister(&pe_data->mmu_notifier,
> +							pe_data->mm);
> +				spin_lock(&link->atsd_lock);
> +				pnv_ocxl_tlb_invalidate(&link->arva,
> +							pe_data->mm->context.id,
> +							0ull);
> +				spin_unlock(&link->atsd_lock);
> +			}
>   			mm_context_remove_copro(pe_data->mm);
>   			mmdrop(pe_data->mm);
>   		}
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 4/5] ocxl: Add mmu notifier
  2020-11-20 17:32 ` [PATCH V2 4/5] ocxl: Add mmu notifier Christophe Lombard
  2020-11-23 10:40   ` Frederic Barrat
@ 2020-11-24  9:17   ` Christoph Hellwig
  2020-11-24 13:45     ` Jason Gunthorpe
  1 sibling, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2020-11-24  9:17 UTC (permalink / raw)
  To: Christophe Lombard; +Cc: linuxppc-dev, ajd, fbarrat, Jason Gunthorpe

You probably want to add Jason for an audit of new notifier uses.

On Fri, Nov 20, 2020 at 06:32:40PM +0100, Christophe Lombard wrote:
> Add invalidate_range mmu notifier, when required (ATSD access of MMIO
> registers is available), to initiate TLB invalidation commands.
> For the time being, the ATSD0 set of registers is used by default.
> 
> The pasid and bdf values have to be configured in the Process Element
> Entry.
> The PEE must be set up to match the BDF/PASID of the AFU.
> 
> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
> ---
>  drivers/misc/ocxl/link.c | 58 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index 20444db8a2bb..100bdfe9ec37 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -2,8 +2,10 @@
>  // Copyright 2017 IBM Corp.
>  #include <linux/sched/mm.h>
>  #include <linux/mutex.h>
> +#include <linux/mm.h>
>  #include <linux/mm_types.h>
>  #include <linux/mmu_context.h>
> +#include <linux/mmu_notifier.h>
>  #include <asm/copro.h>
>  #include <asm/pnv-ocxl.h>
>  #include <asm/xive.h>
> @@ -33,6 +35,7 @@
>  
>  #define SPA_PE_VALID		0x80000000
>  
> +struct ocxl_link;
>  
>  struct pe_data {
>  	struct mm_struct *mm;
> @@ -41,6 +44,8 @@ struct pe_data {
>  	/* opaque pointer to be passed to the above callback */
>  	void *xsl_err_data;
>  	struct rcu_head rcu;
> +	struct ocxl_link *link;
> +	struct mmu_notifier mmu_notifier;
>  };
>  
>  struct spa {
> @@ -83,6 +88,8 @@ struct ocxl_link {
>  	int domain;
>  	int bus;
>  	int dev;
> +	void __iomem *arva;     /* ATSD register virtual address */
> +	spinlock_t atsd_lock;   /* to serialize shootdowns */
>  	atomic_t irq_available;
>  	struct spa *spa;
>  	void *platform_data;
> @@ -403,6 +410,11 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l
>  	if (rc)
>  		goto err_xsl_irq;
>  
> +	rc = pnv_ocxl_map_lpar(dev, mfspr(SPRN_LPID), 0,
> +					  &link->arva);
> +	if (!rc)
> +		spin_lock_init(&link->atsd_lock);
> +
>  	*out_link = link;
>  	return 0;
>  
> @@ -454,6 +466,11 @@ static void release_xsl(struct kref *ref)
>  {
>  	struct ocxl_link *link = container_of(ref, struct ocxl_link, ref);
>  
> +	if (link->arva) {
> +		pnv_ocxl_unmap_lpar(&link->arva);
> +		link->arva = NULL;
> +	}
> +
>  	list_del(&link->list);
>  	/* call platform code before releasing data */
>  	pnv_ocxl_spa_release(link->platform_data);
> @@ -470,6 +487,26 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle)
>  }
>  EXPORT_SYMBOL_GPL(ocxl_link_release);
>  
> +static void invalidate_range(struct mmu_notifier *mn,
> +			     struct mm_struct *mm,
> +			     unsigned long start, unsigned long end)
> +{
> +	struct pe_data *pe_data = container_of(mn, struct pe_data, mmu_notifier);
> +	struct ocxl_link *link = pe_data->link;
> +	unsigned long addr, pid, page_size = PAGE_SIZE;
> +
> +	pid = mm->context.id;
> +
> +	spin_lock(&link->atsd_lock);
> +	for (addr = start; addr < end; addr += page_size)
> +		pnv_ocxl_tlb_invalidate(&link->arva, pid, addr);
> +	spin_unlock(&link->atsd_lock);
> +}
> +
> +static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = {
> +	.invalidate_range = invalidate_range,
> +};
> +
>  static u64 calculate_cfg_state(bool kernel)
>  {
>  	u64 state;
> @@ -526,6 +563,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
>  	pe_data->mm = mm;
>  	pe_data->xsl_err_cb = xsl_err_cb;
>  	pe_data->xsl_err_data = xsl_err_data;
> +	pe_data->link = link;
> +	pe_data->mmu_notifier.ops = &ocxl_mmu_notifier_ops;
>  
>  	memset(pe, 0, sizeof(struct ocxl_process_element));
>  	pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
> @@ -542,8 +581,16 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
>  	 * by the nest MMU. If we have a kernel context, TLBIs are
>  	 * already global.
>  	 */
> -	if (mm)
> +	if (mm) {
>  		mm_context_add_copro(mm);
> +		if (link->arva) {
> +			/* Use MMIO registers for the TLB Invalidate
> +			 * operations.
> +			 */
> +			mmu_notifier_register(&pe_data->mmu_notifier, mm);
> +		}
> +	}
> +
>  	/*
>  	 * Barrier is to make sure PE is visible in the SPA before it
>  	 * is used by the device. It also helps with the global TLBI
> @@ -674,6 +721,15 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
>  		WARN(1, "Couldn't find pe data when removing PE\n");
>  	} else {
>  		if (pe_data->mm) {
> +			if (link->arva) {
> +				mmu_notifier_unregister(&pe_data->mmu_notifier,
> +							pe_data->mm);
> +				spin_lock(&link->atsd_lock);
> +				pnv_ocxl_tlb_invalidate(&link->arva,
> +							pe_data->mm->context.id,
> +							0ull);
> +				spin_unlock(&link->atsd_lock);
> +			}
>  			mm_context_remove_copro(pe_data->mm);
>  			mmdrop(pe_data->mm);
>  		}
> -- 
> 2.28.0
> 
---end quoted text---

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 4/5] ocxl: Add mmu notifier
  2020-11-24  9:17   ` Christoph Hellwig
@ 2020-11-24 13:45     ` Jason Gunthorpe
  2020-11-24 16:48       ` Christophe Lombard
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2020-11-24 13:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linuxppc-dev, Christophe Lombard, fbarrat, ajd

On Tue, Nov 24, 2020 at 09:17:38AM +0000, Christoph Hellwig wrote:

> > @@ -470,6 +487,26 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle)
> >  }
> >  EXPORT_SYMBOL_GPL(ocxl_link_release);
> >  
> > +static void invalidate_range(struct mmu_notifier *mn,
> > +			     struct mm_struct *mm,
> > +			     unsigned long start, unsigned long end)
> > +{
> > +	struct pe_data *pe_data = container_of(mn, struct pe_data, mmu_notifier);
> > +	struct ocxl_link *link = pe_data->link;
> > +	unsigned long addr, pid, page_size = PAGE_SIZE;

The page_size variable seems unnecessary

> > +
> > +	pid = mm->context.id;
> > +
> > +	spin_lock(&link->atsd_lock);
> > +	for (addr = start; addr < end; addr += page_size)
> > +		pnv_ocxl_tlb_invalidate(&link->arva, pid, addr);
> > +	spin_unlock(&link->atsd_lock);
> > +}
> > +
> > +static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = {
> > +	.invalidate_range = invalidate_range,
> > +};
> > +
> >  static u64 calculate_cfg_state(bool kernel)
> >  {
> >  	u64 state;
> > @@ -526,6 +563,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
> >  	pe_data->mm = mm;
> >  	pe_data->xsl_err_cb = xsl_err_cb;
> >  	pe_data->xsl_err_data = xsl_err_data;
> > +	pe_data->link = link;
> > +	pe_data->mmu_notifier.ops = &ocxl_mmu_notifier_ops;
> >  
> >  	memset(pe, 0, sizeof(struct ocxl_process_element));
> >  	pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
> > @@ -542,8 +581,16 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
> >  	 * by the nest MMU. If we have a kernel context, TLBIs are
> >  	 * already global.
> >  	 */
> > -	if (mm)
> > +	if (mm) {
> >  		mm_context_add_copro(mm);
> > +		if (link->arva) {
> > +			/* Use MMIO registers for the TLB Invalidate
> > +			 * operations.
> > +			 */
> > +			mmu_notifier_register(&pe_data->mmu_notifier, mm);

Every other place doing stuff like this is de-duplicating the
notifier. If you have multiple clients this will do multiple redundant
invalidations?

The notifier get/put API is designed to solve that problem, you'd get
a single notifier for the mm and then add the impacted arva's to some
list at the notifier.

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V2 4/5] ocxl: Add mmu notifier
  2020-11-24 13:45     ` Jason Gunthorpe
@ 2020-11-24 16:48       ` Christophe Lombard
  0 siblings, 0 replies; 13+ messages in thread
From: Christophe Lombard @ 2020-11-24 16:48 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Hellwig; +Cc: linuxppc-dev, ajd, fbarrat

[-- Attachment #1: Type: text/plain, Size: 2485 bytes --]


Le 24/11/2020 à 14:45, Jason Gunthorpe a écrit :
> On Tue, Nov 24, 2020 at 09:17:38AM +0000, Christoph Hellwig wrote:
>
>>> @@ -470,6 +487,26 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle)
>>>   }
>>>   EXPORT_SYMBOL_GPL(ocxl_link_release);
>>>   
>>> +static void invalidate_range(struct mmu_notifier *mn,
>>> +			     struct mm_struct *mm,
>>> +			     unsigned long start, unsigned long end)
>>> +{
>>> +	struct pe_data *pe_data = container_of(mn, struct pe_data, mmu_notifier);
>>> +	struct ocxl_link *link = pe_data->link;
>>> +	unsigned long addr, pid, page_size = PAGE_SIZE;
> The page_size variable seems unnecessary
>
>>> +
>>> +	pid = mm->context.id;
>>> +
>>> +	spin_lock(&link->atsd_lock);
>>> +	for (addr = start; addr < end; addr += page_size)
>>> +		pnv_ocxl_tlb_invalidate(&link->arva, pid, addr);
>>> +	spin_unlock(&link->atsd_lock);
>>> +}
>>> +
>>> +static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = {
>>> +	.invalidate_range = invalidate_range,
>>> +};
>>> +
>>>   static u64 calculate_cfg_state(bool kernel)
>>>   {
>>>   	u64 state;
>>> @@ -526,6 +563,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
>>>   	pe_data->mm = mm;
>>>   	pe_data->xsl_err_cb = xsl_err_cb;
>>>   	pe_data->xsl_err_data = xsl_err_data;
>>> +	pe_data->link = link;
>>> +	pe_data->mmu_notifier.ops = &ocxl_mmu_notifier_ops;
>>>   
>>>   	memset(pe, 0, sizeof(struct ocxl_process_element));
>>>   	pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
>>> @@ -542,8 +581,16 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
>>>   	 * by the nest MMU. If we have a kernel context, TLBIs are
>>>   	 * already global.
>>>   	 */
>>> -	if (mm)
>>> +	if (mm) {
>>>   		mm_context_add_copro(mm);
>>> +		if (link->arva) {
>>> +			/* Use MMIO registers for the TLB Invalidate
>>> +			 * operations.
>>> +			 */
>>> +			mmu_notifier_register(&pe_data->mmu_notifier, mm);
> Every other place doing stuff like this is de-duplicating the
> notifier. If you have multiple clients this will do multiple redundant
> invalidations?

We could have multiple clients, although not something that we have often.
We have only one attach per process. But if not, we must still have 
invalidation for each.

>
> The notifier get/put API is designed to solve that problem, you'd get
> a single notifier for the mm and then add the impacted arva's to some
> list at the notifier.

Thanks for the information.
>
> Jason

[-- Attachment #2: Type: text/html, Size: 3655 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-11-24 20:36 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-20 17:32 [PATCH V2 0/5] ocxl: Mmio invalidation support Christophe Lombard
2020-11-20 17:32 ` [PATCH V2 1/5] ocxl: Assign a register set to a Logical Partition Christophe Lombard
2020-11-23 10:35   ` Frederic Barrat
2020-11-20 17:32 ` [PATCH V2 2/5] ocxl: Initiate a TLB invalidate command Christophe Lombard
2020-11-23 10:37   ` Frederic Barrat
2020-11-20 17:32 ` [PATCH V2 3/5] ocxl: Update the Process Element Entry Christophe Lombard
2020-11-23 10:38   ` Frederic Barrat
2020-11-20 17:32 ` [PATCH V2 4/5] ocxl: Add mmu notifier Christophe Lombard
2020-11-23 10:40   ` Frederic Barrat
2020-11-24  9:17   ` Christoph Hellwig
2020-11-24 13:45     ` Jason Gunthorpe
2020-11-24 16:48       ` Christophe Lombard
2020-11-20 17:32 ` [PATCH V2 5/5] ocxl: Add new kernel traces Christophe Lombard

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.