Linux-PCI Archive on lore.kernel.org
* [PATCH v5 0/7] Fix PF/VF dependency issue
@ 2019-08-02  0:05 sathyanarayanan.kuppuswamy
  2019-08-02  0:05 ` [PATCH v5 1/7] PCI/ATS: Fix pci_prg_resp_pasid_required() dependency issues sathyanarayanan.kuppuswamy
                   ` (6 more replies)
  0 siblings, 7 replies; 36+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2019-08-02  0:05 UTC
  To: bhelgaas
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch,
	sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

The current implementation of ATS, PASID, and PRI does not handle VF
dependencies correctly. The following patches address this issue.

Changes since v4:
 * Defined empty functions for pci_pri_init() and pci_pasid_init() for cases
   where CONFIG_PCI_PRI and CONFIG_PCI_PASID are not enabled.

Changes since v3:
 * Fixed critical path (lock context) in pci_restore_*_state functions.

Changes since v2:
 * Added locking mechanism to synchronize accessing PF registers in VF.
 * Removed spec compliance checks in patches.
 * Addressed comments from Bjorn Helgaas.

Changes since v1:
 * Added more details about the patches in commit log.
 * Removed bulk spec check patch.
 * Addressed comments from Bjorn Helgaas.

Kuppuswamy Sathyanarayanan (7):
  PCI/ATS: Fix pci_prg_resp_pasid_required() dependency issues
  PCI/ATS: Initialize PRI in pci_ats_init()
  PCI/ATS: Initialize PASID in pci_ats_init()
  PCI/ATS: Add PRI support for PCIe VF devices
  PCI/ATS: Add PASID support for PCIe VF devices
  PCI/ATS: Disable PF/VF ATS service independently
  PCI: Skip Enhanced Allocation (EA) initialization for VF device

 drivers/pci/ats.c       | 373 ++++++++++++++++++++++++++++++----------
 drivers/pci/pci.c       |   7 +
 include/linux/pci-ats.h |  22 ++-
 include/linux/pci.h     |   7 +-
 4 files changed, 315 insertions(+), 94 deletions(-)

-- 
2.21.0



* [PATCH v5 1/7] PCI/ATS: Fix pci_prg_resp_pasid_required() dependency issues
  2019-08-02  0:05 [PATCH v5 0/7] Fix PF/VF dependency issue sathyanarayanan.kuppuswamy
@ 2019-08-02  0:05 ` sathyanarayanan.kuppuswamy
  2019-08-12 20:04   ` Bjorn Helgaas
  2019-08-02  0:05 ` [PATCH v5 2/7] PCI/ATS: Initialize PRI in pci_ats_init() sathyanarayanan.kuppuswamy
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2019-08-02  0:05 UTC
  To: bhelgaas
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch,
	sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Since pci_prg_resp_pasid_required() depends on both PASID and PRI,
define it only if both the CONFIG_PCI_PRI and CONFIG_PCI_PASID config
options are enabled.
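
The guard described above follows the usual "real function or inline stub" pattern. A minimal standalone sketch, assuming hypothetical names (SKETCH_PRI and SKETCH_PASID stand in for the two config options, struct sketch_dev for struct pci_dev, and the status bit value is an assumption for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; not the kernel definition. */
struct sketch_dev {
	unsigned short pri_status;	/* models the PRI Status register */
};

/* Assumed bit position of "PRG Response PASID Required", for illustration. */
#define SKETCH_PRI_STATUS_PASID	0x8000

/*
 * SKETCH_PRI and SKETCH_PASID stand in for CONFIG_PCI_PRI and
 * CONFIG_PCI_PASID. Build with -DSKETCH_PRI -DSKETCH_PASID to get the
 * real implementation; with either one missing, the stub is used.
 */
#if defined(SKETCH_PRI) && defined(SKETCH_PASID)
static inline int sketch_prg_resp_pasid_required(const struct sketch_dev *dev)
{
	return (dev->pri_status & SKETCH_PRI_STATUS_PASID) ? 1 : 0;
}
#else
static inline int sketch_prg_resp_pasid_required(const struct sketch_dev *dev)
{
	(void)dev;	/* feature compiled out: always report "not required" */
	return 0;
}
#endif
```

Callers never need their own #ifdef: with either option disabled they simply get 0, which matches the stub added to pci-ats.h in this patch.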

Fixes: e5567f5f6762 ("PCI/ATS: Add pci_prg_resp_pasid_required() interface.")
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
 drivers/pci/ats.c       | 10 ++++++----
 include/linux/pci-ats.h | 12 +++++++++---
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index e18499243f84..cdd936d10f68 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -395,6 +395,8 @@ int pci_pasid_features(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL_GPL(pci_pasid_features);
 
+#ifdef CONFIG_PCI_PRI
+
 /**
  * pci_prg_resp_pasid_required - Return PRG Response PASID Required bit
  *				 status.
@@ -402,10 +404,8 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
  *
  * Returns 1 if PASID is required in PRG Response Message, 0 otherwise.
  *
- * Even though the PRG response PASID status is read from PRI Status
- * Register, since this API will mainly be used by PASID users, this
- * function is defined within #ifdef CONFIG_PCI_PASID instead of
- * CONFIG_PCI_PRI.
+ * Since this API has dependency on both PRI and PASID, protect it
+ * with both CONFIG_PCI_PRI and CONFIG_PCI_PASID.
  */
 int pci_prg_resp_pasid_required(struct pci_dev *pdev)
 {
@@ -425,6 +425,8 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
 
+#endif
+
 #define PASID_NUMBER_SHIFT	8
 #define PASID_NUMBER_MASK	(0x1f << PASID_NUMBER_SHIFT)
 /**
diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index 1ebb88e7c184..1a0bdaee2f32 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -40,7 +40,6 @@ void pci_disable_pasid(struct pci_dev *pdev);
 void pci_restore_pasid_state(struct pci_dev *pdev);
 int pci_pasid_features(struct pci_dev *pdev);
 int pci_max_pasids(struct pci_dev *pdev);
-int pci_prg_resp_pasid_required(struct pci_dev *pdev);
 
 #else  /* CONFIG_PCI_PASID */
 
@@ -67,11 +66,18 @@ static inline int pci_max_pasids(struct pci_dev *pdev)
 	return -EINVAL;
 }
 
+#endif /* CONFIG_PCI_PASID */
+
+#if defined(CONFIG_PCI_PRI) && defined(CONFIG_PCI_PASID)
+
+int pci_prg_resp_pasid_required(struct pci_dev *pdev);
+
+#else /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
+
 static inline int pci_prg_resp_pasid_required(struct pci_dev *pdev)
 {
 	return 0;
 }
-#endif /* CONFIG_PCI_PASID */
-
+#endif
 
 #endif /* LINUX_PCI_ATS_H*/
-- 
2.21.0



* [PATCH v5 2/7] PCI/ATS: Initialize PRI in pci_ats_init()
  2019-08-02  0:05 [PATCH v5 0/7] Fix PF/VF dependency issue sathyanarayanan.kuppuswamy
  2019-08-02  0:05 ` [PATCH v5 1/7] PCI/ATS: Fix pci_prg_resp_pasid_required() dependency issues sathyanarayanan.kuppuswamy
@ 2019-08-02  0:05 ` sathyanarayanan.kuppuswamy
  2019-08-12 20:04   ` Bjorn Helgaas
  2019-08-15  4:46   ` Bjorn Helgaas
  2019-08-02  0:06 ` [PATCH v5 3/7] PCI/ATS: Initialize PASID " sathyanarayanan.kuppuswamy
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 36+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2019-08-02  0:05 UTC
  To: bhelgaas
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch,
	sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Currently, the PRI Capability lookup is repeated in every PRI API.
Instead, cache the capability offset in pci_pri_init() and use it in
the other PRI APIs. Also, since PRI is a resource shared between the
PF and its VFs, initialize default values for the common PRI features
in pci_pri_init().
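
The caching idea can be illustrated outside the kernel. A minimal sketch, assuming a hypothetical device model (struct sketch_dev, sketch_find_pri_cap() and the lookups counter are illustrative and not kernel APIs):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a device; fields are illustrative only. */
struct sketch_dev {
	uint16_t pri_cap_in_hw;	/* where the capability lives, 0 if absent */
	uint16_t pri_cap;	/* cached offset, 0 means "no PRI capability" */
	int is_virtfn;
	unsigned int lookups;	/* counts simulated capability-list walks */
};

/* Stands in for pci_find_ext_capability(): a comparatively costly walk. */
static uint16_t sketch_find_pri_cap(struct sketch_dev *dev)
{
	dev->lookups++;
	return dev->pri_cap_in_hw;
}

/* Called once at enumeration time, like pci_pri_init() in this patch. */
static void sketch_pri_init(struct sketch_dev *dev)
{
	if (dev->is_virtfn)
		return;		/* VFs do not implement PRI themselves */
	dev->pri_cap = sketch_find_pri_cap(dev);
}

/* Later callers test the cached offset instead of searching again. */
static int sketch_reset_pri(struct sketch_dev *dev)
{
	if (!dev->pri_cap)
		return -1;	/* the patch returns -EINVAL here */
	/* a real implementation would write at pri_cap + PCI_PRI_CTRL */
	return 0;
}
```

However many times sketch_reset_pri() runs, the capability list is walked only once, during init.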

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
 drivers/pci/ats.c       | 80 ++++++++++++++++++++++++++++-------------
 include/linux/pci-ats.h |  5 +++
 include/linux/pci.h     |  1 +
 3 files changed, 61 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index cdd936d10f68..280be911f190 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -28,6 +28,8 @@ void pci_ats_init(struct pci_dev *dev)
 		return;
 
 	dev->ats_cap = pos;
+
+	pci_pri_init(dev);
 }
 
 /**
@@ -170,36 +172,72 @@ int pci_ats_page_aligned(struct pci_dev *pdev)
 EXPORT_SYMBOL_GPL(pci_ats_page_aligned);
 
 #ifdef CONFIG_PCI_PRI
+
+void pci_pri_init(struct pci_dev *pdev)
+{
+	u32 max_requests;
+	int pos;
+
+	/*
+	 * As per PCIe r4.0, sec 9.3.7.11, only PF is permitted to
+	 * implement PRI and all associated VFs can only use it.
+	 * Since PF already initialized the PRI parameters there is
+	 * no need to proceed further.
+	 */
+	if (pdev->is_virtfn)
+		return;
+
+	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
+	if (!pos)
+		return;
+
+	pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests);
+
+	/*
+	 * Since PRI is a shared resource between PF and VF, we must not
+	 * configure Outstanding Page Allocation Quota as a per device
+	 * resource in pci_enable_pri(). So use maximum value possible
+	 * as default value.
+	 */
+	pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, max_requests);
+
+	pdev->pri_reqs_alloc = max_requests;
+	pdev->pri_cap = pos;
+}
+
 /**
  * pci_enable_pri - Enable PRI capability
  * @ pdev: PCI device structure
  *
  * Returns 0 on success, negative value on error
+ *
+ * TODO: Since PRI is a shared resource between PF/VF, don't update
+ * Outstanding Page Allocation Quota in the same API as a per device
+ * feature.
  */
 int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
 {
 	u16 control, status;
 	u32 max_requests;
-	int pos;
 
 	if (WARN_ON(pdev->pri_enabled))
 		return -EBUSY;
 
-	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
-	if (!pos)
+	if (!pdev->pri_cap)
 		return -EINVAL;
 
-	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
+	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
 	if (!(status & PCI_PRI_STATUS_STOPPED))
 		return -EBUSY;
 
-	pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests);
+	pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,
+			      &max_requests);
 	reqs = min(max_requests, reqs);
 	pdev->pri_reqs_alloc = reqs;
-	pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
+	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
 
 	control = PCI_PRI_CTRL_ENABLE;
-	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
+	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
 
 	pdev->pri_enabled = 1;
 
@@ -216,18 +254,16 @@ EXPORT_SYMBOL_GPL(pci_enable_pri);
 void pci_disable_pri(struct pci_dev *pdev)
 {
 	u16 control;
-	int pos;
 
 	if (WARN_ON(!pdev->pri_enabled))
 		return;
 
-	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
-	if (!pos)
+	if (!pdev->pri_cap)
 		return;
 
-	pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
+	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, &control);
 	control &= ~PCI_PRI_CTRL_ENABLE;
-	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
+	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
 
 	pdev->pri_enabled = 0;
 }
@@ -241,17 +277,15 @@ void pci_restore_pri_state(struct pci_dev *pdev)
 {
 	u16 control = PCI_PRI_CTRL_ENABLE;
 	u32 reqs = pdev->pri_reqs_alloc;
-	int pos;
 
 	if (!pdev->pri_enabled)
 		return;
 
-	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
-	if (!pos)
+	if (!pdev->pri_cap)
 		return;
 
-	pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
-	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
+	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
+	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
 }
 EXPORT_SYMBOL_GPL(pci_restore_pri_state);
 
@@ -265,17 +299,15 @@ EXPORT_SYMBOL_GPL(pci_restore_pri_state);
 int pci_reset_pri(struct pci_dev *pdev)
 {
 	u16 control;
-	int pos;
 
 	if (WARN_ON(pdev->pri_enabled))
 		return -EBUSY;
 
-	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
-	if (!pos)
+	if (!pdev->pri_cap)
 		return -EINVAL;
 
 	control = PCI_PRI_CTRL_RESET;
-	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
+	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
 
 	return 0;
 }
@@ -410,13 +442,11 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
 int pci_prg_resp_pasid_required(struct pci_dev *pdev)
 {
 	u16 status;
-	int pos;
 
-	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
-	if (!pos)
+	if (!pdev->pri_cap)
 		return 0;
 
-	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
+	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
 
 	if (status & PCI_PRI_STATUS_PASID)
 		return 1;
diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index 1a0bdaee2f32..33653d4ca94f 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -6,6 +6,7 @@
 
 #ifdef CONFIG_PCI_PRI
 
+void pci_pri_init(struct pci_dev *pdev);
 int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
 void pci_disable_pri(struct pci_dev *pdev);
 void pci_restore_pri_state(struct pci_dev *pdev);
@@ -13,6 +14,10 @@ int pci_reset_pri(struct pci_dev *pdev);
 
 #else /* CONFIG_PCI_PRI */
 
+static inline void pci_pri_init(struct pci_dev *pdev)
+{
+}
+
 static inline int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
 {
 	return -ENODEV;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 9e700d9f9f28..56b55db099fc 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -455,6 +455,7 @@ struct pci_dev {
 	atomic_t	ats_ref_cnt;	/* Number of VFs with ATS enabled */
 #endif
 #ifdef CONFIG_PCI_PRI
+	u16		pri_cap;	/* PRI Capability offset */
 	u32		pri_reqs_alloc; /* Number of PRI requests allocated */
 #endif
 #ifdef CONFIG_PCI_PASID
-- 
2.21.0



* [PATCH v5 3/7] PCI/ATS: Initialize PASID in pci_ats_init()
  2019-08-02  0:05 [PATCH v5 0/7] Fix PF/VF dependency issue sathyanarayanan.kuppuswamy
  2019-08-02  0:05 ` [PATCH v5 1/7] PCI/ATS: Fix pci_prg_resp_pasid_required() dependency issues sathyanarayanan.kuppuswamy
  2019-08-02  0:05 ` [PATCH v5 2/7] PCI/ATS: Initialize PRI in pci_ats_init() sathyanarayanan.kuppuswamy
@ 2019-08-02  0:06 ` " sathyanarayanan.kuppuswamy
  2019-08-12 20:04   ` Bjorn Helgaas
                     ` (2 more replies)
  2019-08-02  0:06 ` [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices sathyanarayanan.kuppuswamy
                   ` (3 subsequent siblings)
  6 siblings, 3 replies; 36+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2019-08-02  0:06 UTC
  To: bhelgaas
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch,
	sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Currently, the PASID Capability lookup is repeated in every PASID API.
Instead, cache the capability offset in pci_pasid_init() and use it in
the other PASID APIs. Also, since PASID is a resource shared between
the PF and its VFs, initialize the PASID features with default values
in pci_pasid_init().
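
The feature handling in this patch comes down to two bitmask operations: masking the capability register down to the optional features, and checking that a requested set is a subset of what is supported. A minimal sketch, assuming the bit values below (they are meant to mirror PCI_PASID_CAP_EXEC and PCI_PASID_CAP_PRIV but should be treated as assumptions, and the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit values, for illustration only. */
#define SKETCH_PASID_CAP_EXEC	0x02
#define SKETCH_PASID_CAP_PRIV	0x04

/* Keep only the optional feature bits from a capability register value. */
static uint16_t sketch_pasid_default_features(uint16_t cap_reg)
{
	return cap_reg & (SKETCH_PASID_CAP_EXEC | SKETCH_PASID_CAP_PRIV);
}

/* Returns 1 if every requested feature bit is also set in supported. */
static int sketch_pasid_features_ok(uint16_t supported, uint16_t requested)
{
	return (supported & requested) == requested;
}
```

The subset test is the same expression pci_enable_pasid() uses to reject unsupported feature requests with -EINVAL.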

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
 drivers/pci/ats.c       | 74 +++++++++++++++++++++++++++++------------
 include/linux/pci-ats.h |  5 +++
 include/linux/pci.h     |  1 +
 3 files changed, 59 insertions(+), 21 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 280be911f190..1f4be27a071d 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -30,6 +30,8 @@ void pci_ats_init(struct pci_dev *dev)
 	dev->ats_cap = pos;
 
 	pci_pri_init(dev);
+
+	pci_pasid_init(dev);
 }
 
 /**
@@ -315,6 +317,40 @@ EXPORT_SYMBOL_GPL(pci_reset_pri);
 #endif /* CONFIG_PCI_PRI */
 
 #ifdef CONFIG_PCI_PASID
+
+void pci_pasid_init(struct pci_dev *pdev)
+{
+	u16 supported;
+	int pos;
+
+	/*
+	 * As per PCIe r4.0, sec 9.3.7.14, only PF is permitted to
+	 * implement PASID Capability and all associated VFs can
+	 * only use it. Since PF already initialized the PASID
+	 * parameters there is no need to proceed further.
+	 */
+	if (pdev->is_virtfn)
+		return;
+
+	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
+	if (!pos)
+		return;
+
+	pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
+
+	supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
+
+	/*
+	 * Enable all supported features. Since PASID is a shared
+	 * resource between PF/VF, we must not set this feature as
+	 * a per device property in pci_enable_pasid().
+	 */
+	pci_write_config_word(pdev, pos + PCI_PASID_CTRL, supported);
+
+	pdev->pasid_features = supported;
+	pdev->pasid_cap = pos;
+}
+
 /**
  * pci_enable_pasid - Enable the PASID capability
  * @pdev: PCI device structure
@@ -323,11 +359,13 @@ EXPORT_SYMBOL_GPL(pci_reset_pri);
  * Returns 0 on success, negative value on error. This function checks
  * whether the features are actually supported by the device and returns
  * an error if not.
+ *
+ * TODO: Since PASID is a shared resource between PF/VF, don't update
+ * PASID features in the same API as a per device feature.
  */
 int pci_enable_pasid(struct pci_dev *pdev, int features)
 {
 	u16 control, supported;
-	int pos;
 
 	if (WARN_ON(pdev->pasid_enabled))
 		return -EBUSY;
@@ -335,11 +373,11 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
 	if (!pdev->eetlp_prefix_path)
 		return -EINVAL;
 
-	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
-	if (!pos)
+	if (!pdev->pasid_cap)
 		return -EINVAL;
 
-	pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
+	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
+			     &supported);
 	supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
 
 	/* User wants to enable anything unsupported? */
@@ -349,7 +387,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
 	control = PCI_PASID_CTRL_ENABLE | features;
 	pdev->pasid_features = features;
 
-	pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
+	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
 
 	pdev->pasid_enabled = 1;
 
@@ -364,16 +402,14 @@ EXPORT_SYMBOL_GPL(pci_enable_pasid);
 void pci_disable_pasid(struct pci_dev *pdev)
 {
 	u16 control = 0;
-	int pos;
 
 	if (WARN_ON(!pdev->pasid_enabled))
 		return;
 
-	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
-	if (!pos)
+	if (!pdev->pasid_cap)
 		return;
 
-	pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
+	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
 
 	pdev->pasid_enabled = 0;
 }
@@ -386,17 +422,15 @@ EXPORT_SYMBOL_GPL(pci_disable_pasid);
 void pci_restore_pasid_state(struct pci_dev *pdev)
 {
 	u16 control;
-	int pos;
 
 	if (!pdev->pasid_enabled)
 		return;
 
-	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
-	if (!pos)
+	if (!pdev->pasid_cap)
 		return;
 
 	control = PCI_PASID_CTRL_ENABLE | pdev->pasid_features;
-	pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
+	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
 }
 EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
 
@@ -413,13 +447,12 @@ EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
 int pci_pasid_features(struct pci_dev *pdev)
 {
 	u16 supported;
-	int pos;
 
-	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
-	if (!pos)
+	if (!pdev->pasid_cap)
 		return -EINVAL;
 
-	pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
+	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
+			     &supported);
 
 	supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
 
@@ -469,13 +502,12 @@ EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
 int pci_max_pasids(struct pci_dev *pdev)
 {
 	u16 supported;
-	int pos;
 
-	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
-	if (!pos)
+	if (!pdev->pasid_cap)
 		return -EINVAL;
 
-	pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
+	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
+			     &supported);
 
 	supported = (supported & PASID_NUMBER_MASK) >> PASID_NUMBER_SHIFT;
 
diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index 33653d4ca94f..bc7f815d38ff 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -40,6 +40,7 @@ static inline int pci_reset_pri(struct pci_dev *pdev)
 
 #ifdef CONFIG_PCI_PASID
 
+void pci_pasid_init(struct pci_dev *pdev);
 int pci_enable_pasid(struct pci_dev *pdev, int features);
 void pci_disable_pasid(struct pci_dev *pdev);
 void pci_restore_pasid_state(struct pci_dev *pdev);
@@ -48,6 +49,10 @@ int pci_max_pasids(struct pci_dev *pdev);
 
 #else  /* CONFIG_PCI_PASID */
 
+static inline void pci_pasid_init(struct pci_dev *pdev)
+{
+}
+
 static inline int pci_enable_pasid(struct pci_dev *pdev, int features)
 {
 	return -EINVAL;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 56b55db099fc..27224c0db849 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -459,6 +459,7 @@ struct pci_dev {
 	u32		pri_reqs_alloc; /* Number of PRI requests allocated */
 #endif
 #ifdef CONFIG_PCI_PASID
+	u16		pasid_cap;	/* PASID Capability offset */
 	u16		pasid_features;
 #endif
 #ifdef CONFIG_PCI_P2PDMA
-- 
2.21.0



* [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices
  2019-08-02  0:05 [PATCH v5 0/7] Fix PF/VF dependency issue sathyanarayanan.kuppuswamy
                   ` (2 preceding siblings ...)
  2019-08-02  0:06 ` [PATCH v5 3/7] PCI/ATS: Initialize PASID " sathyanarayanan.kuppuswamy
@ 2019-08-02  0:06 ` sathyanarayanan.kuppuswamy
  2019-08-12 20:04   ` Bjorn Helgaas
                     ` (2 more replies)
  2019-08-02  0:06 ` [PATCH v5 5/7] PCI/ATS: Add PASID " sathyanarayanan.kuppuswamy
                   ` (2 subsequent siblings)
  6 siblings, 3 replies; 36+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2019-08-02  0:06 UTC
  To: bhelgaas
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch,
	sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

When the IOMMU driver tries to enable the Page Request Interface (PRI)
for a VF in iommu_enable_dev_iotlb(), it always fails because PRI
support for PCIe VFs is currently broken. The current implementation
expects every PCIe device (PF or VF) to implement the PRI Capability
before PRI can be enabled, but this assumption is incorrect. Per PCIe
r4.0, sec 9.3.7.11, VFs do not implement the PRI Capability; all VFs
associated with a PF share the PRI of that PF. Hence, VFs need special
handling when PRI is enabled.

Also, since PRI is a resource shared between the PF and its VFs, the
following rules should apply:

1. Take the PF's lock before accessing or modifying PF resources in a
   VF's PRI enable/disable call.
2. Use a reference count to track users of the PRI resource.
3. Disable PRI only when the PRI reference count (pri_ref_cnt) drops
   to zero.
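
Rules 2 and 3 describe ordinary reference counting on a shared hardware resource. The sketch below is a deliberately simplified model, assuming hypothetical names (struct sketch_pf, sketch_pri_get(), sketch_pri_put()); it is not the exact accounting in the diff that follows, and it leaves out the locking from rule 1, which in the kernel would wrap both functions in the PF's pri_lock:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the PF-owned PRI state; not the kernel's layout. */
struct sketch_pf {
	int ref_cnt;		/* number of functions (PF or VF) using PRI */
	bool hw_enabled;	/* models the Enable bit in PRI Control */
};

/* First user turns the hardware on; later users only take a reference. */
static void sketch_pri_get(struct sketch_pf *pf)
{
	if (pf->ref_cnt++ == 0)
		pf->hw_enabled = true;
}

/* Last user turns the hardware off; earlier callers only drop a reference. */
static void sketch_pri_put(struct sketch_pf *pf)
{
	assert(pf->ref_cnt > 0);	/* unbalanced put would be a caller bug */
	if (--pf->ref_cnt == 0)
		pf->hw_enabled = false;
}
```

With two users, the first put leaves the hardware enabled and only the second one clears it, which is the behavior rule 3 asks for.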

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
 drivers/pci/ats.c   | 143 ++++++++++++++++++++++++++++++++++----------
 include/linux/pci.h |   2 +
 2 files changed, 112 insertions(+), 33 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 1f4be27a071d..079dc5444444 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -189,6 +189,8 @@ void pci_pri_init(struct pci_dev *pdev)
 	if (pdev->is_virtfn)
 		return;
 
+	mutex_init(&pdev->pri_lock);
+
 	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
 	if (!pos)
 		return;
@@ -221,29 +223,57 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
 {
 	u16 control, status;
 	u32 max_requests;
+	int ret = 0;
+	struct pci_dev *pf = pci_physfn(pdev);
 
-	if (WARN_ON(pdev->pri_enabled))
-		return -EBUSY;
+	mutex_lock(&pf->pri_lock);
 
-	if (!pdev->pri_cap)
-		return -EINVAL;
+	if (WARN_ON(pdev->pri_enabled)) {
+		ret = -EBUSY;
+		goto pri_unlock;
+	}
 
-	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
-	if (!(status & PCI_PRI_STATUS_STOPPED))
-		return -EBUSY;
+	if (!pf->pri_cap) {
+		ret = -EINVAL;
+		goto pri_unlock;
+	}
+
+	if (pdev->is_virtfn && pf->pri_enabled)
+		goto update_status;
+
+	/*
+	 * Before updating PRI registers, make sure there are no
+	 * outstanding PRI requests.
+	 */
+	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, &status);
+	if (!(status & PCI_PRI_STATUS_STOPPED)) {
+		ret = -EBUSY;
+		goto pri_unlock;
+	}
 
-	pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,
-			      &max_requests);
+	pci_read_config_dword(pf, pf->pri_cap + PCI_PRI_MAX_REQ, &max_requests);
 	reqs = min(max_requests, reqs);
-	pdev->pri_reqs_alloc = reqs;
-	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
+	pf->pri_reqs_alloc = reqs;
+	pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
 
 	control = PCI_PRI_CTRL_ENABLE;
-	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
+	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
 
-	pdev->pri_enabled = 1;
+	/*
+	 * If PRI is not already enabled in PF, increment the PF
+	 * pri_ref_cnt to track the usage of PRI interface.
+	 */
+	if (pdev->is_virtfn && !pf->pri_enabled) {
+		atomic_inc(&pf->pri_ref_cnt);
+		pf->pri_enabled = 1;
+	}
 
-	return 0;
+update_status:
+	atomic_inc(&pf->pri_ref_cnt);
+	pdev->pri_enabled = 1;
+pri_unlock:
+	mutex_unlock(&pf->pri_lock);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(pci_enable_pri);
 
@@ -256,18 +286,30 @@ EXPORT_SYMBOL_GPL(pci_enable_pri);
 void pci_disable_pri(struct pci_dev *pdev)
 {
 	u16 control;
+	struct pci_dev *pf = pci_physfn(pdev);
 
-	if (WARN_ON(!pdev->pri_enabled))
-		return;
+	mutex_lock(&pf->pri_lock);
 
-	if (!pdev->pri_cap)
-		return;
+	if (WARN_ON(!pdev->pri_enabled) || !pf->pri_cap)
+		goto pri_unlock;
+
+	atomic_dec(&pf->pri_ref_cnt);
 
-	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, &control);
+	/*
+	 * If pri_ref_cnt is not zero, then don't modify hardware
+	 * registers.
+	 */
+	if (atomic_read(&pf->pri_ref_cnt))
+		goto done;
+
+	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, &control);
 	control &= ~PCI_PRI_CTRL_ENABLE;
-	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
+	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
 
+done:
 	pdev->pri_enabled = 0;
+pri_unlock:
+	mutex_unlock(&pf->pri_lock);
 }
 EXPORT_SYMBOL_GPL(pci_disable_pri);
 
@@ -277,17 +319,31 @@ EXPORT_SYMBOL_GPL(pci_disable_pri);
  */
 void pci_restore_pri_state(struct pci_dev *pdev)
 {
-	u16 control = PCI_PRI_CTRL_ENABLE;
-	u32 reqs = pdev->pri_reqs_alloc;
+	u16 control;
+	u32 reqs;
+	struct pci_dev *pf = pci_physfn(pdev);
 
 	if (!pdev->pri_enabled)
 		return;
 
-	if (!pdev->pri_cap)
+	if (!pf->pri_cap)
 		return;
 
-	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
-	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
+	mutex_lock(&pf->pri_lock);
+
+	/* If PRI is already enabled by other VF's or PF, return */
+	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, &control);
+	if (control & PCI_PRI_CTRL_ENABLE)
+		goto pri_unlock;
+
+	reqs = pf->pri_reqs_alloc;
+	control = PCI_PRI_CTRL_ENABLE;
+
+	pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
+	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
+
+pri_unlock:
+	mutex_unlock(&pf->pri_lock);
 }
 EXPORT_SYMBOL_GPL(pci_restore_pri_state);
 
@@ -300,18 +356,32 @@ EXPORT_SYMBOL_GPL(pci_restore_pri_state);
  */
 int pci_reset_pri(struct pci_dev *pdev)
 {
+	struct pci_dev *pf = pci_physfn(pdev);
 	u16 control;
+	int ret = 0;
 
-	if (WARN_ON(pdev->pri_enabled))
-		return -EBUSY;
+	mutex_lock(&pf->pri_lock);
 
-	if (!pdev->pri_cap)
-		return -EINVAL;
+	if (WARN_ON(pdev->pri_enabled)) {
+		ret = -EBUSY;
+		goto done;
+	}
+
+	if (!pf->pri_cap) {
+		ret = -EINVAL;
+		goto done;
+	}
+
+	/* If PRI is already enabled by other VF's or PF, return 0 */
+	if (pf->pri_enabled)
+		goto done;
 
 	control = PCI_PRI_CTRL_RESET;
-	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
 
-	return 0;
+	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
+done:
+	mutex_unlock(&pf->pri_lock);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(pci_reset_pri);
 #endif /* CONFIG_PCI_PRI */
@@ -475,11 +545,18 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
 int pci_prg_resp_pasid_required(struct pci_dev *pdev)
 {
 	u16 status;
+	struct pci_dev *pf = pci_physfn(pdev);
+
+	mutex_lock(&pf->pri_lock);
 
-	if (!pdev->pri_cap)
+	if (!pf->pri_cap) {
+		mutex_unlock(&pf->pri_lock);
 		return 0;
+	}
+
+	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, &status);
 
-	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
+	mutex_unlock(&pf->pri_lock);
 
 	if (status & PCI_PRI_STATUS_PASID)
 		return 1;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 27224c0db849..3c9c4c82be27 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -455,8 +455,10 @@ struct pci_dev {
 	atomic_t	ats_ref_cnt;	/* Number of VFs with ATS enabled */
 #endif
 #ifdef CONFIG_PCI_PRI
+	struct mutex	pri_lock;	/* PRI enable lock */
 	u16		pri_cap;	/* PRI Capability offset */
 	u32		pri_reqs_alloc; /* Number of PRI requests allocated */
+	atomic_t	pri_ref_cnt;	/* Number of PF/VF PRI users */
 #endif
 #ifdef CONFIG_PCI_PASID
 	u16		pasid_cap;	/* PASID Capability offset */
-- 
2.21.0



* [PATCH v5 5/7] PCI/ATS: Add PASID support for PCIe VF devices
  2019-08-02  0:05 [PATCH v5 0/7] Fix PF/VF dependency issue sathyanarayanan.kuppuswamy
                   ` (3 preceding siblings ...)
  2019-08-02  0:06 ` [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices sathyanarayanan.kuppuswamy
@ 2019-08-02  0:06 ` " sathyanarayanan.kuppuswamy
  2019-08-12 20:05   ` Bjorn Helgaas
  2019-08-02  0:06 ` [PATCH v5 6/7] PCI/ATS: Disable PF/VF ATS service independently sathyanarayanan.kuppuswamy
  2019-08-02  0:06 ` [PATCH v5 7/7] PCI: Skip Enhanced Allocation (EA) initialization for VF device sathyanarayanan.kuppuswamy
  6 siblings, 1 reply; 36+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2019-08-02  0:06 UTC
  To: bhelgaas
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch,
	sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

When the IOMMU driver tries to enable PASID for a VF in
iommu_enable_dev_iotlb(), it always fails because PASID support for
PCIe VFs is currently broken in the PCIe driver. The current
implementation expects every PCIe device (PF or VF) to implement the
PASID Capability before PASID can be enabled, but this assumption is
incorrect. Per PCIe r4.0, sec 9.3.7.14, VFs do not implement the PASID
Capability; all VFs associated with a PF share the PASID of that PF.
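
In code, "use the PASID of the PF" means resolving a VF to its owning PF before looking at capability state, which is what the pci_physfn() calls in the diff do. A minimal sketch of that resolution step, assuming a hypothetical device model (struct sketch_dev, sketch_physfn() and sketch_has_pasid() are illustrative names):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a PCI function; not the kernel's struct pci_dev. */
struct sketch_dev {
	struct sketch_dev *physfn;	/* owning PF, meaningful only for a VF */
	int is_virtfn;
	unsigned short pasid_cap;	/* nonzero only on a PF with PASID */
};

/* Resolve a VF to its PF; a PF (or non-SR-IOV device) resolves to itself. */
static struct sketch_dev *sketch_physfn(struct sketch_dev *dev)
{
	return dev->is_virtfn ? dev->physfn : dev;
}

/* A function can use PASID if the PF behind it implements the capability. */
static int sketch_has_pasid(struct sketch_dev *dev)
{
	return sketch_physfn(dev)->pasid_cap != 0;
}
```

A VF whose own pasid_cap is 0 still reports PASID support when its PF implements the capability.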

Also, since PASID is a resource shared between the PF and its VFs, the
following rules should apply:

1. Take the PF's lock before accessing or modifying PF resources in a
   VF's PASID enable/disable call.
2. Use a reference count to track users of the PASID resource.
3. Disable PASID only when the PASID reference count (pasid_ref_cnt)
   drops to zero.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
 drivers/pci/ats.c   | 113 ++++++++++++++++++++++++++++++++++----------
 include/linux/pci.h |   2 +
 2 files changed, 90 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 079dc5444444..9384afd7d00e 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -402,6 +402,8 @@ void pci_pasid_init(struct pci_dev *pdev)
 	if (pdev->is_virtfn)
 		return;
 
+	mutex_init(&pdev->pasid_lock);
+
 	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
 	if (!pos)
 		return;
@@ -436,32 +438,57 @@ void pci_pasid_init(struct pci_dev *pdev)
 int pci_enable_pasid(struct pci_dev *pdev, int features)
 {
 	u16 control, supported;
+	int ret = 0;
+	struct pci_dev *pf = pci_physfn(pdev);
 
-	if (WARN_ON(pdev->pasid_enabled))
-		return -EBUSY;
+	mutex_lock(&pf->pasid_lock);
 
-	if (!pdev->eetlp_prefix_path)
-		return -EINVAL;
+	if (WARN_ON(pdev->pasid_enabled)) {
+		ret = -EBUSY;
+		goto pasid_unlock;
+	}
 
-	if (!pdev->pasid_cap)
-		return -EINVAL;
+	if (!pdev->eetlp_prefix_path) {
+		ret = -EINVAL;
+		goto pasid_unlock;
+	}
 
-	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
-			     &supported);
+	if (!pf->pasid_cap) {
+		ret = -EINVAL;
+		goto pasid_unlock;
+	}
+
+	if (pdev->is_virtfn && pf->pasid_enabled)
+		goto update_status;
+
+	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP, &supported);
 	supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
 
 	/* User wants to enable anything unsupported? */
-	if ((supported & features) != features)
-		return -EINVAL;
+	if ((supported & features) != features) {
+		ret = -EINVAL;
+		goto pasid_unlock;
+	}
 
 	control = PCI_PASID_CTRL_ENABLE | features;
-	pdev->pasid_features = features;
-
+	pf->pasid_features = features;
 	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
 
-	pdev->pasid_enabled = 1;
+	/*
+	 * If PASID is not already enabled in PF, increment pasid_ref_cnt
+	 * to count PF PASID usage.
+	 */
+	if (pdev->is_virtfn && !pf->pasid_enabled) {
+		atomic_inc(&pf->pasid_ref_cnt);
+		pf->pasid_enabled = 1;
+	}
 
-	return 0;
+update_status:
+	atomic_inc(&pf->pasid_ref_cnt);
+	pdev->pasid_enabled = 1;
+pasid_unlock:
+	mutex_unlock(&pf->pasid_lock);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(pci_enable_pasid);
 
@@ -472,16 +499,29 @@ EXPORT_SYMBOL_GPL(pci_enable_pasid);
 void pci_disable_pasid(struct pci_dev *pdev)
 {
 	u16 control = 0;
+	struct pci_dev *pf = pci_physfn(pdev);
+
+	mutex_lock(&pf->pasid_lock);
 
 	if (WARN_ON(!pdev->pasid_enabled))
-		return;
+		goto pasid_unlock;
 
-	if (!pdev->pasid_cap)
-		return;
+	if (!pf->pasid_cap)
+		goto pasid_unlock;
 
-	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
+	atomic_dec(&pf->pasid_ref_cnt);
 
+	if (atomic_read(&pf->pasid_ref_cnt))
+		goto done;
+
+	/* Disable PASID only if pasid_ref_cnt is zero */
+	pci_write_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, control);
+
+done:
 	pdev->pasid_enabled = 0;
+pasid_unlock:
+	mutex_unlock(&pf->pasid_lock);
+
 }
 EXPORT_SYMBOL_GPL(pci_disable_pasid);
 
@@ -492,15 +532,25 @@ EXPORT_SYMBOL_GPL(pci_disable_pasid);
 void pci_restore_pasid_state(struct pci_dev *pdev)
 {
 	u16 control;
+	struct pci_dev *pf = pci_physfn(pdev);
 
 	if (!pdev->pasid_enabled)
 		return;
 
-	if (!pdev->pasid_cap)
+	if (!pf->pasid_cap)
 		return;
 
+	mutex_lock(&pf->pasid_lock);
+
+	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, &control);
+	if (control & PCI_PASID_CTRL_ENABLE)
+		goto pasid_unlock;
+
 	control = PCI_PASID_CTRL_ENABLE | pdev->pasid_features;
-	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
+	pci_write_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, control);
+
+pasid_unlock:
+	mutex_unlock(&pf->pasid_lock);
 }
 EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
 
@@ -517,15 +567,22 @@ EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
 int pci_pasid_features(struct pci_dev *pdev)
 {
 	u16 supported;
+	struct pci_dev *pf = pci_physfn(pdev);
+
+	mutex_lock(&pf->pasid_lock);
 
-	if (!pdev->pasid_cap)
+	if (!pf->pasid_cap) {
+		mutex_unlock(&pf->pasid_lock);
 		return -EINVAL;
+	}
 
-	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
+	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP,
 			     &supported);
 
 	supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
 
+	mutex_unlock(&pf->pasid_lock);
+
 	return supported;
 }
 EXPORT_SYMBOL_GPL(pci_pasid_features);
@@ -579,15 +636,21 @@ EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
 int pci_max_pasids(struct pci_dev *pdev)
 {
 	u16 supported;
+	struct pci_dev *pf = pci_physfn(pdev);
+
+	mutex_lock(&pf->pasid_lock);
 
-	if (!pdev->pasid_cap)
+	if (!pf->pasid_cap) {
+		mutex_unlock(&pf->pasid_lock);
 		return -EINVAL;
+	}
 
-	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
-			     &supported);
+	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP, &supported);
 
 	supported = (supported & PASID_NUMBER_MASK) >> PASID_NUMBER_SHIFT;
 
+	mutex_unlock(&pf->pasid_lock);
+
 	return (1 << supported);
 }
 EXPORT_SYMBOL_GPL(pci_max_pasids);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3c9c4c82be27..4bfcca045afd 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -461,8 +461,10 @@ struct pci_dev {
 	atomic_t	pri_ref_cnt;	/* Number of PF/VF PRI users */
 #endif
 #ifdef CONFIG_PCI_PASID
+	struct mutex	pasid_lock;	/* PASID enable lock */
 	u16		pasid_cap;	/* PASID Capability offset */
 	u16		pasid_features;
+	atomic_t	pasid_ref_cnt;	/* Number of VFs with PASID enabled */
 #endif
 #ifdef CONFIG_PCI_P2PDMA
 	struct pci_p2pdma *p2pdma;
-- 
2.21.0
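
The reference counting in the patch above can be sketched as a small
userspace model. This is an illustration only, not kernel code: struct
model_dev, model_enable_pasid() and model_disable_pasid() are
hypothetical names; the mutex, the feature checks and the config-space
accesses are left out; and hw_enabled stands in for the Enable bit in
the PF's PASID Control register.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PASID-related fields of struct pci_dev. */
struct model_dev {
	struct model_dev *pf;	/* owning PF for a VF, NULL for a PF */
	bool pasid_enabled;	/* per-function software state */
	int pasid_ref_cnt;	/* only meaningful on the PF */
	bool hw_enabled;	/* models the PF's PASID Control Enable bit */
};

/* Like pci_physfn(): a PF is its own physical function. */
static struct model_dev *model_physfn(struct model_dev *dev)
{
	return dev->pf ? dev->pf : dev;
}

static int model_enable_pasid(struct model_dev *dev)
{
	struct model_dev *pf = model_physfn(dev);
	bool is_vf = dev->pf != NULL;

	if (dev->pasid_enabled)
		return -EBUSY;

	/* A VF whose PF is already enabled only takes a reference. */
	if (!(is_vf && pf->pasid_enabled)) {
		pf->hw_enabled = true;	/* program the shared PF capability */
		if (is_vf) {
			/* Extra reference that accounts for the PF itself. */
			pf->pasid_ref_cnt++;
			pf->pasid_enabled = true;
		}
	}

	pf->pasid_ref_cnt++;
	dev->pasid_enabled = true;
	return 0;
}

static void model_disable_pasid(struct model_dev *dev)
{
	struct model_dev *pf = model_physfn(dev);

	if (!dev->pasid_enabled)
		return;

	assert(pf->pasid_ref_cnt > 0);

	/* Only the last user clears the shared Enable bit. */
	if (--pf->pasid_ref_cnt == 0)
		pf->hw_enabled = false;

	dev->pasid_enabled = false;
}
```

In this model, a VF that enables PASID before its PF leaves the count
at 2, so the Enable bit is cleared only after the PF has been disabled
as well.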



* [PATCH v5 6/7] PCI/ATS: Disable PF/VF ATS service independently
  2019-08-02  0:05 [PATCH v5 0/7] Fix PF/VF dependency issue sathyanarayanan.kuppuswamy
                   ` (4 preceding siblings ...)
  2019-08-02  0:06 ` [PATCH v5 5/7] PCI/ATS: Add PASID " sathyanarayanan.kuppuswamy
@ 2019-08-02  0:06 ` sathyanarayanan.kuppuswamy
  2019-08-02  0:06 ` [PATCH v5 7/7] PCI: Skip Enhanced Allocation (EA) initialization for VF device sathyanarayanan.kuppuswamy
  6 siblings, 0 replies; 36+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2019-08-02  0:06 UTC (permalink / raw)
  To: bhelgaas
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch,
	sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Currently, all VFs need to disable their ATS service before the ATS
service can be disabled in the corresponding PF device. This logic is
incorrect and does not align with the spec. It might also have some
power and performance impact on the system. As per PCIe spec r4.0,
sec 9.3.7.8, ATS Capabilities in VFs and their associated PFs may be
enabled/disabled independently. So remove this dependency logic from
the enable/disable code.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
 drivers/pci/ats.c   | 11 -----------
 include/linux/pci.h |  1 -
 2 files changed, 12 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 9384afd7d00e..df2d20079e38 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -64,8 +64,6 @@ int pci_enable_ats(struct pci_dev *dev, int ps)
 		pdev = pci_physfn(dev);
 		if (pdev->ats_stu != ps)
 			return -EINVAL;
-
-		atomic_inc(&pdev->ats_ref_cnt);  /* count enabled VFs */
 	} else {
 		dev->ats_stu = ps;
 		ctrl |= PCI_ATS_CTRL_STU(dev->ats_stu - PCI_ATS_MIN_STU);
@@ -83,20 +81,11 @@ EXPORT_SYMBOL_GPL(pci_enable_ats);
  */
 void pci_disable_ats(struct pci_dev *dev)
 {
-	struct pci_dev *pdev;
 	u16 ctrl;
 
 	if (WARN_ON(!dev->ats_enabled))
 		return;
 
-	if (atomic_read(&dev->ats_ref_cnt))
-		return;		/* VFs still enabled */
-
-	if (dev->is_virtfn) {
-		pdev = pci_physfn(dev);
-		atomic_dec(&pdev->ats_ref_cnt);
-	}
-
 	pci_read_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, &ctrl);
 	ctrl &= ~PCI_ATS_CTRL_ENABLE;
 	pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4bfcca045afd..84301db2fbef 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -452,7 +452,6 @@ struct pci_dev {
 	};
 	u16		ats_cap;	/* ATS Capability offset */
 	u8		ats_stu;	/* ATS Smallest Translation Unit */
-	atomic_t	ats_ref_cnt;	/* Number of VFs with ATS enabled */
 #endif
 #ifdef CONFIG_PCI_PRI
 	struct mutex	pri_lock;	/* PRI enable lock */
-- 
2.21.0
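
The effect of dropping ats_ref_cnt can be sketched in userspace as
follows. This is a minimal model under stated assumptions, not the
kernel implementation: struct model_ats_fn and the model_*() helpers
are hypothetical, config-space writes are reduced to a boolean, and
only the STU check visible in the diff context is modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ATS-related fields of struct pci_dev. */
struct model_ats_fn {
	struct model_ats_fn *pf;	/* owning PF for a VF, NULL for a PF */
	bool ats_enabled;		/* models this function's own Enable bit */
	int ats_stu;			/* Smallest Translation Unit, set via the PF */
};

static int model_enable_ats(struct model_ats_fn *fn, int ps)
{
	if (fn->pf) {
		/* A VF must use the STU already programmed in its PF. */
		if (fn->pf->ats_stu != ps)
			return -EINVAL;
	} else {
		fn->ats_stu = ps;
	}

	fn->ats_enabled = true;
	return 0;
}

static void model_disable_ats(struct model_ats_fn *fn)
{
	if (!fn->ats_enabled)
		return;

	/* No VF reference count: each function clears only its own bit. */
	fn->ats_enabled = false;
}
```

With this shape, disabling ATS on the PF takes effect immediately and
leaves the Enable state of its VFs untouched.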



* [PATCH v5 7/7] PCI: Skip Enhanced Allocation (EA) initialization for VF device
  2019-08-02  0:05 [PATCH v5 0/7] Fix PF/VF dependency issue sathyanarayanan.kuppuswamy
                   ` (5 preceding siblings ...)
  2019-08-02  0:06 ` [PATCH v5 6/7] PCI/ATS: Disable PF/VF ATS service independently sathyanarayanan.kuppuswamy
@ 2019-08-02  0:06 ` sathyanarayanan.kuppuswamy
  6 siblings, 0 replies; 36+ messages in thread
From: sathyanarayanan.kuppuswamy @ 2019-08-02  0:06 UTC (permalink / raw)
  To: bhelgaas
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch,
	sathyanarayanan.kuppuswamy

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

As per PCIe r4.0, sec 9.3.6, a VF must not implement the Enhanced
Allocation Capability. So skip pci_ea_init() for VF devices.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
 drivers/pci/pci.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 29ed5ec1ac27..4b2844c3606c 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3020,6 +3020,13 @@ void pci_ea_init(struct pci_dev *dev)
 	int offset;
 	int i;
 
+	/*
+	 * Per PCIe r4.0, sec 9.3.6, VF must not implement Enhanced
+	 * Allocation Capability.
+	 */
+	if (dev->is_virtfn)
+		return;
+
 	/* find PCI EA capability in list */
 	ea = pci_find_capability(dev, PCI_CAP_ID_EA);
 	if (!ea)
-- 
2.21.0



* Re: [PATCH v5 1/7] PCI/ATS: Fix pci_prg_resp_pasid_required() dependency issues
  2019-08-02  0:05 ` [PATCH v5 1/7] PCI/ATS: Fix pci_prg_resp_pasid_required() dependency issues sathyanarayanan.kuppuswamy
@ 2019-08-12 20:04   ` Bjorn Helgaas
  2019-08-12 20:20     ` sathyanarayanan kuppuswamy
  0 siblings, 1 reply; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-12 20:04 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Thu, Aug 01, 2019 at 05:05:58PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> Since pci_prg_resp_pasid_required() function has dependency on both
> PASID and PRI, define it only if both CONFIG_PCI_PRI and
> CONFIG_PCI_PASID config options are enabled.

I don't really like this.  It makes the #ifdefs more complicated and I
don't think it really buys us anything.  Will anything break if we
just drop this patch?

> Fixes: e5567f5f6762 ("PCI/ATS: Add pci_prg_resp_pasid_required()
> interface.")
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
>  drivers/pci/ats.c       | 10 ++++++----
>  include/linux/pci-ats.h | 12 +++++++++---
>  2 files changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index e18499243f84..cdd936d10f68 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -395,6 +395,8 @@ int pci_pasid_features(struct pci_dev *pdev)
>  }
>  EXPORT_SYMBOL_GPL(pci_pasid_features);
>  
> +#ifdef CONFIG_PCI_PRI
> +
>  /**
>   * pci_prg_resp_pasid_required - Return PRG Response PASID Required bit
>   *				 status.
> @@ -402,10 +404,8 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
>   *
>   * Returns 1 if PASID is required in PRG Response Message, 0 otherwise.
>   *
> - * Even though the PRG response PASID status is read from PRI Status
> - * Register, since this API will mainly be used by PASID users, this
> - * function is defined within #ifdef CONFIG_PCI_PASID instead of
> - * CONFIG_PCI_PRI.
> + * Since this API has dependency on both PRI and PASID, protect it
> + * with both CONFIG_PCI_PRI and CONFIG_PCI_PASID.
>   */
>  int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>  {
> @@ -425,6 +425,8 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>  }
>  EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
>  
> +#endif
> +
>  #define PASID_NUMBER_SHIFT	8
>  #define PASID_NUMBER_MASK	(0x1f << PASID_NUMBER_SHIFT)
>  /**
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index 1ebb88e7c184..1a0bdaee2f32 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -40,7 +40,6 @@ void pci_disable_pasid(struct pci_dev *pdev);
>  void pci_restore_pasid_state(struct pci_dev *pdev);
>  int pci_pasid_features(struct pci_dev *pdev);
>  int pci_max_pasids(struct pci_dev *pdev);
> -int pci_prg_resp_pasid_required(struct pci_dev *pdev);
>  
>  #else  /* CONFIG_PCI_PASID */
>  
> @@ -67,11 +66,18 @@ static inline int pci_max_pasids(struct pci_dev *pdev)
>  	return -EINVAL;
>  }
>  
> +#endif /* CONFIG_PCI_PASID */
> +
> +#if defined(CONFIG_PCI_PRI) && defined(CONFIG_PCI_PASID)
> +
> +int pci_prg_resp_pasid_required(struct pci_dev *pdev);
> +
> +#else /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
> +
>  static inline int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>  {
>  	return 0;
>  }
> -#endif /* CONFIG_PCI_PASID */
> -
> +#endif
>  
>  #endif /* LINUX_PCI_ATS_H*/
> -- 
> 2.21.0
> 


* Re: [PATCH v5 2/7] PCI/ATS: Initialize PRI in pci_ats_init()
  2019-08-02  0:05 ` [PATCH v5 2/7] PCI/ATS: Initialize PRI in pci_ats_init() sathyanarayanan.kuppuswamy
@ 2019-08-12 20:04   ` Bjorn Helgaas
  2019-08-12 21:35     ` sathyanarayanan kuppuswamy
  2019-08-15  4:46   ` Bjorn Helgaas
  1 sibling, 1 reply; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-12 20:04 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Thu, Aug 01, 2019 at 05:05:59PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> Currently, PRI Capability checks are repeated across all PRI API's.
> Instead, cache the capability check result in pci_pri_init() and use it
> in other PRI API's. Also, since PRI is a shared resource between PF/VF,
> initialize default values for common PRI features in pci_pri_init().

This patch does two things, and it would be better if they were split:

  1) Cache the PRI capability offset
  2) Separate the PF and VF paths
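
Item 1 on its own can be illustrated with a userspace sketch: look the
capability up once at init time and reuse the cached offset afterwards.
The names below (struct model_pri_dev, model_find_cap(), the offset
0x100) are hypothetical, and the capability walk is replaced by a
counter so the number of lookups is visible.

```c
#include <errno.h>

/* Hypothetical stand-in for the PRI-related fields of struct pci_dev. */
struct model_pri_dev {
	int has_pri;	/* whether the modelled device implements PRI */
	int pri_cap;	/* cached capability offset, 0 if absent */
	int lookups;	/* how many capability walks were performed */
};

/* Stand-in for pci_find_ext_capability(); 0x100 is an arbitrary offset. */
static int model_find_cap(struct model_pri_dev *dev)
{
	dev->lookups++;
	return dev->has_pri ? 0x100 : 0;
}

/* Done once per device, like pci_pri_init() in the quoted patch. */
static void model_pri_init(struct model_pri_dev *dev)
{
	dev->pri_cap = model_find_cap(dev);
}

/* Later calls test the cached offset instead of walking the list again. */
static int model_reset_pri(struct model_pri_dev *dev)
{
	if (!dev->pri_cap)
		return -EINVAL;

	return 0;
}
```

A patch limited to this change would leave the PF/VF handling exactly
as it was, which is what makes it easy to review separately.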

> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
>  drivers/pci/ats.c       | 80 ++++++++++++++++++++++++++++-------------
>  include/linux/pci-ats.h |  5 +++
>  include/linux/pci.h     |  1 +
>  3 files changed, 61 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index cdd936d10f68..280be911f190 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -28,6 +28,8 @@ void pci_ats_init(struct pci_dev *dev)
>  		return;
>  
>  	dev->ats_cap = pos;
> +
> +	pci_pri_init(dev);
>  }
>  
>  /**
> @@ -170,36 +172,72 @@ int pci_ats_page_aligned(struct pci_dev *pdev)
>  EXPORT_SYMBOL_GPL(pci_ats_page_aligned);
>  
>  #ifdef CONFIG_PCI_PRI
> +
> +void pci_pri_init(struct pci_dev *pdev)
> +{
> +	u32 max_requests;
> +	int pos;
> +
> +	/*
> +	 * As per PCIe r4.0, sec 9.3.7.11, only PF is permitted to
> +	 * implement PRI and all associated VFs can only use it.
> +	 * Since PF already initialized the PRI parameters there is
> +	 * no need to proceed further.
> +	 */
> +	if (pdev->is_virtfn)
> +		return;
> +
> +	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> +	if (!pos)
> +		return;
> +
> +	pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests);
> +
> +	/*
> +	 * Since PRI is a shared resource between PF and VF, we must not
> +	 * configure Outstanding Page Allocation Quota as a per device
> +	 * resource in pci_enable_pri(). So use maximum value possible
> +	 * as default value.
> +	 */
> +	pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, max_requests);
> +
> +	pdev->pri_reqs_alloc = max_requests;
> +	pdev->pri_cap = pos;
> +}
> +
>  /**
>   * pci_enable_pri - Enable PRI capability
>   * @ pdev: PCI device structure
>   *
>   * Returns 0 on success, negative value on error
> + *
> + * TODO: Since PRI is a shared resource between PF/VF, don't update
> + * Outstanding Page Allocation Quota in the same API as a per device
> + * feature.
>   */
>  int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>  {
>  	u16 control, status;
>  	u32 max_requests;
> -	int pos;
>  
>  	if (WARN_ON(pdev->pri_enabled))
>  		return -EBUSY;
>  
> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> -	if (!pos)
> +	if (!pdev->pri_cap)
>  		return -EINVAL;
>  
> -	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> +	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
>  	if (!(status & PCI_PRI_STATUS_STOPPED))
>  		return -EBUSY;
>  
> -	pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests);
> +	pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,
> +			      &max_requests);
>  	reqs = min(max_requests, reqs);
>  	pdev->pri_reqs_alloc = reqs;
> -	pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
> +	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);

The comment above says "don't update Outstanding Page Allocation
Quota" but it looks like that's what this is doing.

>  	control = PCI_PRI_CTRL_ENABLE;
> -	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> +	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>  
>  	pdev->pri_enabled = 1;
>  
> @@ -216,18 +254,16 @@ EXPORT_SYMBOL_GPL(pci_enable_pri);
>  void pci_disable_pri(struct pci_dev *pdev)
>  {
>  	u16 control;
> -	int pos;
>  
>  	if (WARN_ON(!pdev->pri_enabled))
>  		return;
>  
> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> -	if (!pos)
> +	if (!pdev->pri_cap)
>  		return;
>  
> -	pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
> +	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, &control);
>  	control &= ~PCI_PRI_CTRL_ENABLE;
> -	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> +	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>  
>  	pdev->pri_enabled = 0;
>  }
> @@ -241,17 +277,15 @@ void pci_restore_pri_state(struct pci_dev *pdev)
>  {
>  	u16 control = PCI_PRI_CTRL_ENABLE;
>  	u32 reqs = pdev->pri_reqs_alloc;
> -	int pos;
>  
>  	if (!pdev->pri_enabled)
>  		return;
>  
> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> -	if (!pos)
> +	if (!pdev->pri_cap)
>  		return;
>  
> -	pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
> -	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> +	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> +	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>  }
>  EXPORT_SYMBOL_GPL(pci_restore_pri_state);
>  
> @@ -265,17 +299,15 @@ EXPORT_SYMBOL_GPL(pci_restore_pri_state);
>  int pci_reset_pri(struct pci_dev *pdev)
>  {
>  	u16 control;
> -	int pos;
>  
>  	if (WARN_ON(pdev->pri_enabled))
>  		return -EBUSY;
>  
> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> -	if (!pos)
> +	if (!pdev->pri_cap)
>  		return -EINVAL;
>  
>  	control = PCI_PRI_CTRL_RESET;
> -	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> +	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>  
>  	return 0;
>  }
> @@ -410,13 +442,11 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
>  int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>  {
>  	u16 status;
> -	int pos;
>  
> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> -	if (!pos)
> +	if (!pdev->pri_cap)
>  		return 0;
>  
> -	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> +	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
>  
>  	if (status & PCI_PRI_STATUS_PASID)
>  		return 1;
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index 1a0bdaee2f32..33653d4ca94f 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -6,6 +6,7 @@
>  
>  #ifdef CONFIG_PCI_PRI
>  
> +void pci_pri_init(struct pci_dev *pdev);

I think this could be moved to drivers/pci/pci.h, since it doesn't
need to be visible outside drivers/pci/.

>  int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
>  void pci_disable_pri(struct pci_dev *pdev);
>  void pci_restore_pri_state(struct pci_dev *pdev);
> @@ -13,6 +14,10 @@ int pci_reset_pri(struct pci_dev *pdev);
>  
>  #else /* CONFIG_PCI_PRI */
>  
> +static inline void pci_pri_init(struct pci_dev *pdev)
> +{
> +}
> +
>  static inline int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>  {
>  	return -ENODEV;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 9e700d9f9f28..56b55db099fc 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -455,6 +455,7 @@ struct pci_dev {
>  	atomic_t	ats_ref_cnt;	/* Number of VFs with ATS enabled */
>  #endif
>  #ifdef CONFIG_PCI_PRI
> +	u16		pri_cap;	/* PRI Capability offset */
>  	u32		pri_reqs_alloc; /* Number of PRI requests allocated */
>  #endif
>  #ifdef CONFIG_PCI_PASID
> -- 
> 2.21.0
> 


* Re: [PATCH v5 3/7] PCI/ATS: Initialize PASID in pci_ats_init()
  2019-08-02  0:06 ` [PATCH v5 3/7] PCI/ATS: Initialize PASID " sathyanarayanan.kuppuswamy
@ 2019-08-12 20:04   ` Bjorn Helgaas
  2019-08-15  4:48   ` Bjorn Helgaas
  2019-08-15  4:56   ` Bjorn Helgaas
  2 siblings, 0 replies; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-12 20:04 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Thu, Aug 01, 2019 at 05:06:00PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> Currently, PASID Capability checks are repeated across all PASID API's.
> Instead, cache the capability check result in pci_pasid_init() and use
> it in other PASID API's. Also, since PASID is a shared resource between
> PF/VF, initialize PASID features with default values in pci_pasid_init().

Split into two patches as for PRI.

> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
>  drivers/pci/ats.c       | 74 +++++++++++++++++++++++++++++------------
>  include/linux/pci-ats.h |  5 +++
>  include/linux/pci.h     |  1 +
>  3 files changed, 59 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 280be911f190..1f4be27a071d 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -30,6 +30,8 @@ void pci_ats_init(struct pci_dev *dev)
>  	dev->ats_cap = pos;
>  
>  	pci_pri_init(dev);
> +

Superfluous blank line; you can remove it.

> +	pci_pasid_init(dev);
>  }
>  
>  /**
> @@ -315,6 +317,40 @@ EXPORT_SYMBOL_GPL(pci_reset_pri);
>  #endif /* CONFIG_PCI_PRI */
>  
>  #ifdef CONFIG_PCI_PASID
> +
> +void pci_pasid_init(struct pci_dev *pdev)
> +{
> +	u16 supported;
> +	int pos;
> +
> +	/*
> +	 * As per PCIe r4.0, sec 9.3.7.14, only PF is permitted to
> +	 * implement PASID Capability and all associated VFs can
> +	 * only use it. Since PF already initialized the PASID
> +	 * parameters there is no need to proceed further.
> +	 */
> +	if (pdev->is_virtfn)
> +		return;
> +
> +	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> +	if (!pos)
> +		return;
> +
> +	pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
> +
> +	supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
> +
> +	/*
> +	 * Enable all supported features. Since PASID is a shared
> +	 * resource between PF/VF, we must not set this feature as
> +	 * a per device property in pci_enable_pasid().

But pci_enable_pasid() *does* set PCI_PASID_CTRL.  Either the comments
or the code needs to be updated.

> +	 */
> +	pci_write_config_word(pdev, pos + PCI_PASID_CTRL, supported);
> +
> +	pdev->pasid_features = supported;
> +	pdev->pasid_cap = pos;
> +}
> +
>  /**
>   * pci_enable_pasid - Enable the PASID capability
>   * @pdev: PCI device structure
> @@ -323,11 +359,13 @@ EXPORT_SYMBOL_GPL(pci_reset_pri);
>   * Returns 0 on success, negative value on error. This function checks
>   * whether the features are actually supported by the device and returns
>   * an error if not.
> + *
> + * TODO: Since PASID is a shared resource between PF/VF, don't update
> + * PASID features in the same API as a per device feature.
>   */
>  int pci_enable_pasid(struct pci_dev *pdev, int features)
>  {
>  	u16 control, supported;
> -	int pos;
>  
>  	if (WARN_ON(pdev->pasid_enabled))
>  		return -EBUSY;
> @@ -335,11 +373,11 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>  	if (!pdev->eetlp_prefix_path)
>  		return -EINVAL;
>  
> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> -	if (!pos)
> +	if (!pdev->pasid_cap)
>  		return -EINVAL;
>  
> -	pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
> +	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
> +			     &supported);
>  	supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
>  
>  	/* User wants to enable anything unsupported? */
> @@ -349,7 +387,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>  	control = PCI_PASID_CTRL_ENABLE | features;
>  	pdev->pasid_features = features;
>  
> -	pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
> +	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
>  
>  	pdev->pasid_enabled = 1;
>  
> @@ -364,16 +402,14 @@ EXPORT_SYMBOL_GPL(pci_enable_pasid);
>  void pci_disable_pasid(struct pci_dev *pdev)
>  {
>  	u16 control = 0;
> -	int pos;
>  
>  	if (WARN_ON(!pdev->pasid_enabled))
>  		return;
>  
> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> -	if (!pos)
> +	if (!pdev->pasid_cap)
>  		return;
>  
> -	pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
> +	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
>  
>  	pdev->pasid_enabled = 0;
>  }
> @@ -386,17 +422,15 @@ EXPORT_SYMBOL_GPL(pci_disable_pasid);
>  void pci_restore_pasid_state(struct pci_dev *pdev)
>  {
>  	u16 control;
> -	int pos;
>  
>  	if (!pdev->pasid_enabled)
>  		return;
>  
> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> -	if (!pos)
> +	if (!pdev->pasid_cap)
>  		return;
>  
>  	control = PCI_PASID_CTRL_ENABLE | pdev->pasid_features;
> -	pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
> +	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
>  }
>  EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
>  
> @@ -413,13 +447,12 @@ EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
>  int pci_pasid_features(struct pci_dev *pdev)
>  {
>  	u16 supported;
> -	int pos;
>  
> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> -	if (!pos)
> +	if (!pdev->pasid_cap)
>  		return -EINVAL;
>  
> -	pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
> +	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
> +			     &supported);
>  
>  	supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
>  
> @@ -469,13 +502,12 @@ EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
>  int pci_max_pasids(struct pci_dev *pdev)
>  {
>  	u16 supported;
> -	int pos;
>  
> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> -	if (!pos)
> +	if (!pdev->pasid_cap)
>  		return -EINVAL;
>  
> -	pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
> +	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
> +			     &supported);
>  
>  	supported = (supported & PASID_NUMBER_MASK) >> PASID_NUMBER_SHIFT;
>  
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index 33653d4ca94f..bc7f815d38ff 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -40,6 +40,7 @@ static inline int pci_reset_pri(struct pci_dev *pdev)
>  
>  #ifdef CONFIG_PCI_PASID
>  
> +void pci_pasid_init(struct pci_dev *pdev);

Move to drivers/pci/pci.h.

>  int pci_enable_pasid(struct pci_dev *pdev, int features);
>  void pci_disable_pasid(struct pci_dev *pdev);
>  void pci_restore_pasid_state(struct pci_dev *pdev);
> @@ -48,6 +49,10 @@ int pci_max_pasids(struct pci_dev *pdev);
>  
>  #else  /* CONFIG_PCI_PASID */
>  
> +static inline void pci_pasid_init(struct pci_dev *pdev)
> +{
> +}
> +
>  static inline int pci_enable_pasid(struct pci_dev *pdev, int features)
>  {
>  	return -EINVAL;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 56b55db099fc..27224c0db849 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -459,6 +459,7 @@ struct pci_dev {
>  	u32		pri_reqs_alloc; /* Number of PRI requests allocated */
>  #endif
>  #ifdef CONFIG_PCI_PASID
> +	u16		pasid_cap;	/* PASID Capability offset */
>  	u16		pasid_features;
>  #endif
>  #ifdef CONFIG_PCI_P2PDMA
> -- 
> 2.21.0
> 


* Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices
  2019-08-02  0:06 ` [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices sathyanarayanan.kuppuswamy
@ 2019-08-12 20:04   ` Bjorn Helgaas
  2019-08-12 21:40     ` sathyanarayanan kuppuswamy
  2019-08-13  4:16   ` Bjorn Helgaas
  2019-08-15 22:20   ` Bjorn Helgaas
  2 siblings, 1 reply; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-12 20:04 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Thu, Aug 01, 2019 at 05:06:01PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> When IOMMU tries to enable Page Request Interface (PRI) for VF device
> in iommu_enable_dev_iotlb(), it always fails because PRI support for
> PCIe VF device is currently broken. Current implementation expects
> the given PCIe device (PF & VF) to implement PRI capability before
> enabling the PRI support. But this assumption is incorrect. As per PCIe
> spec r4.0, sec 9.3.7.11, all VFs associated with PF can only use the
> PRI of the PF and not implement it. Hence we need to create exception
> for handling the PRI support for PCIe VF device.
> 
> Also, since PRI is a shared resource between PF/VF, following rules
> should apply.
> 
> 1. Use proper locking before accessing/modifying PF resources in VF
>    PRI enable/disable call.
> 2. Use reference count logic to track the usage of PRI resource.
> 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.
> 
> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Suggested-by: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
>  drivers/pci/ats.c   | 143 ++++++++++++++++++++++++++++++++++----------
>  include/linux/pci.h |   2 +
>  2 files changed, 112 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 1f4be27a071d..079dc5444444 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -189,6 +189,8 @@ void pci_pri_init(struct pci_dev *pdev)
>  	if (pdev->is_virtfn)
>  		return;
>  
> +	mutex_init(&pdev->pri_lock);
> +
>  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>  	if (!pos)
>  		return;
> @@ -221,29 +223,57 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>  {
>  	u16 control, status;
>  	u32 max_requests;
> +	int ret = 0;
> +	struct pci_dev *pf = pci_physfn(pdev);
>  
> -	if (WARN_ON(pdev->pri_enabled))
> -		return -EBUSY;
> +	mutex_lock(&pf->pri_lock);
>  
> -	if (!pdev->pri_cap)
> -		return -EINVAL;
> +	if (WARN_ON(pdev->pri_enabled)) {
> +		ret = -EBUSY;
> +		goto pri_unlock;
> +	}
>  
> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
> -	if (!(status & PCI_PRI_STATUS_STOPPED))
> -		return -EBUSY;
> +	if (!pf->pri_cap) {
> +		ret = -EINVAL;
> +		goto pri_unlock;
> +	}
> +
> +	if (pdev->is_virtfn && pf->pri_enabled)
> +		goto update_status;
> +
> +	/*
> +	 * Before updating PRI registers, make sure there is no
> +	 * outstanding PRI requests.
> +	 */
> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, &status);
> +	if (!(status & PCI_PRI_STATUS_STOPPED)) {
> +		ret = -EBUSY;
> +		goto pri_unlock;
> +	}
>  
> -	pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,
> -			      &max_requests);
> +	pci_read_config_dword(pf, pf->pri_cap + PCI_PRI_MAX_REQ, &max_requests);
>  	reqs = min(max_requests, reqs);
> -	pdev->pri_reqs_alloc = reqs;
> -	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> +	pf->pri_reqs_alloc = reqs;
> +	pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
>  
>  	control = PCI_PRI_CTRL_ENABLE;
> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
>  
> -	pdev->pri_enabled = 1;
> +	/*
> +	 * If PRI is not already enabled in PF, increment the PF
> +	 * pri_ref_cnt to track the usage of PRI interface.
> +	 */
> +	if (pdev->is_virtfn && !pf->pri_enabled) {
> +		atomic_inc(&pf->pri_ref_cnt);
> +		pf->pri_enabled = 1;
> +	}
>  
> -	return 0;
> +update_status:
> +	atomic_inc(&pf->pri_ref_cnt);
> +	pdev->pri_enabled = 1;
> +pri_unlock:
> +	mutex_unlock(&pf->pri_lock);
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(pci_enable_pri);
>  
> @@ -256,18 +286,30 @@ EXPORT_SYMBOL_GPL(pci_enable_pri);
>  void pci_disable_pri(struct pci_dev *pdev)
>  {
>  	u16 control;
> +	struct pci_dev *pf = pci_physfn(pdev);
>  
> -	if (WARN_ON(!pdev->pri_enabled))
> -		return;
> +	mutex_lock(&pf->pri_lock);
>  
> -	if (!pdev->pri_cap)
> -		return;
> +	if (WARN_ON(!pdev->pri_enabled) || !pf->pri_cap)
> +		goto pri_unlock;
> +
> +	atomic_dec(&pf->pri_ref_cnt);
>  
> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, &control);
> +	/*
> +	 * If pri_ref_cnt is not zero, then don't modify hardware
> +	 * registers.
> +	 */
> +	if (atomic_read(&pf->pri_ref_cnt))
> +		goto done;
> +
> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, &control);
>  	control &= ~PCI_PRI_CTRL_ENABLE;
> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
>  
> +done:
>  	pdev->pri_enabled = 0;
> +pri_unlock:
> +	mutex_unlock(&pf->pri_lock);
>  }
>  EXPORT_SYMBOL_GPL(pci_disable_pri);
>  
> @@ -277,17 +319,31 @@ EXPORT_SYMBOL_GPL(pci_disable_pri);
>   */
>  void pci_restore_pri_state(struct pci_dev *pdev)
>  {
> -	u16 control = PCI_PRI_CTRL_ENABLE;
> -	u32 reqs = pdev->pri_reqs_alloc;
> +	u16 control;
> +	u32 reqs;
> +	struct pci_dev *pf = pci_physfn(pdev);
>  
>  	if (!pdev->pri_enabled)
>  		return;
>  
> -	if (!pdev->pri_cap)
> +	if (!pf->pri_cap)
>  		return;
>  
> -	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> +	mutex_lock(&pf->pri_lock);
> +
> +	/* If PRI is already enabled by other VF's or PF, return */
> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, &control);
> +	if (control & PCI_PRI_CTRL_ENABLE)
> +		goto pri_unlock;
> +
> +	reqs = pf->pri_reqs_alloc;
> +	control = PCI_PRI_CTRL_ENABLE;
> +
> +	pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);

Why use "control" here instead of just PCI_PRI_CTRL_ENABLE?

> +pri_unlock:
> +	mutex_unlock(&pf->pri_lock);
>  }
>  EXPORT_SYMBOL_GPL(pci_restore_pri_state);
>  
> @@ -300,18 +356,32 @@ EXPORT_SYMBOL_GPL(pci_restore_pri_state);
>   */
>  int pci_reset_pri(struct pci_dev *pdev)
>  {
> +	struct pci_dev *pf = pci_physfn(pdev);
>  	u16 control;
> +	int ret = 0;
>  
> -	if (WARN_ON(pdev->pri_enabled))
> -		return -EBUSY;
> +	mutex_lock(&pf->pri_lock);
>  
> -	if (!pdev->pri_cap)
> -		return -EINVAL;
> +	if (WARN_ON(pdev->pri_enabled)) {
> +		ret = -EBUSY;
> +		goto done;
> +	}
> +
> +	if (!pf->pri_cap) {
> +		ret = -EINVAL;
> +		goto done;
> +	}
> +
> +	/* If PRI is already enabled by other VF's or PF, return 0 */
> +	if (pf->pri_enabled)
> +		goto done;
>  
>  	control = PCI_PRI_CTRL_RESET;
> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>  
> -	return 0;
> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);

Also here (you didn't add this one, but "control" is completely
pointless in this function).
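
A minimal userspace sketch of the simplification, not kernel code. The
cfg_model structure and the cfg_write_word()/cfg_read_word() helpers are
hypothetical stand-ins for config space and the pci_*_config_word()
accessors, and the register values are assumed to match pci_regs.h.

```c
#include <stdint.h>
#include <string.h>

/* Assumed values, mirroring what pci_regs.h is believed to define. */
#define PCI_PRI_CTRL		0x04
#define PCI_PRI_CTRL_RESET	0x0002

/* Hypothetical stand-in for a device's config space. */
struct cfg_model {
	uint8_t space[256];
};

/* Hypothetical stand-in for pci_write_config_word(). */
static void cfg_write_word(struct cfg_model *cfg, int where, uint16_t val)
{
	memcpy(&cfg->space[where], &val, sizeof(val));
}

/* Hypothetical stand-in for pci_read_config_word(). */
static uint16_t cfg_read_word(const struct cfg_model *cfg, int where)
{
	uint16_t val;

	memcpy(&val, &cfg->space[where], sizeof(val));
	return val;
}

/* With a temporary, as the patch does. */
static void reset_pri_with_temp(struct cfg_model *cfg, int pri_cap)
{
	uint16_t control;

	control = PCI_PRI_CTRL_RESET;
	cfg_write_word(cfg, pri_cap + PCI_PRI_CTRL, control);
}

/* Passing the constant directly: same register write, one local fewer. */
static void reset_pri_direct(struct cfg_model *cfg, int pri_cap)
{
	cfg_write_word(cfg, pri_cap + PCI_PRI_CTRL, PCI_PRI_CTRL_RESET);
}
```

Both variants leave the modelled config space in the same state, so the
change would be purely cosmetic.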

> +done:
> +	mutex_unlock(&pf->pri_lock);
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(pci_reset_pri);
>  #endif /* CONFIG_PCI_PRI */
> @@ -475,11 +545,18 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
>  int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>  {
>  	u16 status;
> +	struct pci_dev *pf = pci_physfn(pdev);
> +
> +	mutex_lock(&pf->pri_lock);
>  
> -	if (!pdev->pri_cap)
> +	if (!pf->pri_cap) {
> +		mutex_unlock(&pf->pri_lock);
>  		return 0;
> +	}
> +
> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, &status);
>  
> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
> +	mutex_unlock(&pf->pri_lock);
>  
>  	if (status & PCI_PRI_STATUS_PASID)
>  		return 1;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 27224c0db849..3c9c4c82be27 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -455,8 +455,10 @@ struct pci_dev {
>  	atomic_t	ats_ref_cnt;	/* Number of VFs with ATS enabled */
>  #endif
>  #ifdef CONFIG_PCI_PRI
> +	struct mutex	pri_lock;	/* PRI enable lock */
>  	u16		pri_cap;	/* PRI Capability offset */
>  	u32		pri_reqs_alloc; /* Number of PRI requests allocated */
> +	atomic_t	pri_ref_cnt;	/* Number of PF/VF PRI users */
>  #endif
>  #ifdef CONFIG_PCI_PASID
>  	u16		pasid_cap;	/* PASID Capability offset */
> -- 
> 2.21.0
> 

* Re: [PATCH v5 5/7] PCI/ATS: Add PASID support for PCIe VF devices
  2019-08-02  0:06 ` [PATCH v5 5/7] PCI/ATS: Add PASID " sathyanarayanan.kuppuswamy
@ 2019-08-12 20:05   ` Bjorn Helgaas
  2019-08-13 22:19     ` Kuppuswamy Sathyanarayanan
  0 siblings, 1 reply; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-12 20:05 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Thu, Aug 01, 2019 at 05:06:02PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> When IOMMU tries to enable PASID for VF device in
> iommu_enable_dev_iotlb(), it always fails because PASID support for PCIe
> VF device is currently broken in PCIE driver. Current implementation
> expects the given PCIe device (PF & VF) to implement PASID capability
> before enabling the PASID support. But this assumption is incorrect. As
> per PCIe spec r4.0, sec 9.3.7.14, all VFs associated with PF can only
> use the PASID of the PF and not implement it.
> 
> Also, since PASID is a shared resource between PF/VF, following rules
> should apply.
> 
> 1. Use proper locking before accessing/modifying PF resources in VF
>    PASID enable/disable call.
> 2. Use reference count logic to track the usage of PASID resource.
> 3. Disable PASID only if the PASID reference count (pasid_ref_cnt) is zero.
> 
> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Suggested-by: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
>  drivers/pci/ats.c   | 113 ++++++++++++++++++++++++++++++++++----------
>  include/linux/pci.h |   2 +
>  2 files changed, 90 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 079dc5444444..9384afd7d00e 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -402,6 +402,8 @@ void pci_pasid_init(struct pci_dev *pdev)
>  	if (pdev->is_virtfn)
>  		return;
>  
> +	mutex_init(&pdev->pasid_lock);
> +
>  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
>  	if (!pos)
>  		return;
> @@ -436,32 +438,57 @@ void pci_pasid_init(struct pci_dev *pdev)
>  int pci_enable_pasid(struct pci_dev *pdev, int features)
>  {
>  	u16 control, supported;
> +	int ret = 0;
> +	struct pci_dev *pf = pci_physfn(pdev);
>  
> -	if (WARN_ON(pdev->pasid_enabled))
> -		return -EBUSY;
> +	mutex_lock(&pf->pasid_lock);
>  
> -	if (!pdev->eetlp_prefix_path)
> -		return -EINVAL;
> +	if (WARN_ON(pdev->pasid_enabled)) {
> +		ret = -EBUSY;
> +		goto pasid_unlock;
> +	}
>  
> -	if (!pdev->pasid_cap)
> -		return -EINVAL;
> +	if (!pdev->eetlp_prefix_path) {
> +		ret = -EINVAL;
> +		goto pasid_unlock;
> +	}
>  
> -	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
> -			     &supported);
> +	if (!pf->pasid_cap) {
> +		ret = -EINVAL;
> +		goto pasid_unlock;
> +	}
> +
> +	if (pdev->is_virtfn && pf->pasid_enabled)
> +		goto update_status;
> +
> +	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP, &supported);
>  	supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
>  
>  	/* User wants to enable anything unsupported? */
> -	if ((supported & features) != features)
> -		return -EINVAL;
> +	if ((supported & features) != features) {
> +		ret = -EINVAL;
> +		goto pasid_unlock;
> +	}
>  
>  	control = PCI_PASID_CTRL_ENABLE | features;
> -	pdev->pasid_features = features;
> -
> +	pf->pasid_features = features;
>  	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
>  
> -	pdev->pasid_enabled = 1;
> +	/*
> +	 * If PASID is not already enabled in PF, increment pasid_ref_cnt
> +	 * to count PF PASID usage.
> +	 */
> +	if (pdev->is_virtfn && !pf->pasid_enabled) {
> +		atomic_inc(&pf->pasid_ref_cnt);
> +		pf->pasid_enabled = 1;
> +	}
>  
> -	return 0;
> +update_status:
> +	atomic_inc(&pf->pasid_ref_cnt);
> +	pdev->pasid_enabled = 1;
> +pasid_unlock:
> +	mutex_unlock(&pf->pasid_lock);
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(pci_enable_pasid);
>  
> @@ -472,16 +499,29 @@ EXPORT_SYMBOL_GPL(pci_enable_pasid);
>  void pci_disable_pasid(struct pci_dev *pdev)
>  {
>  	u16 control = 0;
> +	struct pci_dev *pf = pci_physfn(pdev);
> +
> +	mutex_lock(&pf->pasid_lock);
>  
>  	if (WARN_ON(!pdev->pasid_enabled))
> -		return;
> +		goto pasid_unlock;
>  
> -	if (!pdev->pasid_cap)
> -		return;
> +	if (!pf->pasid_cap)
> +		goto pasid_unlock;
>  
> -	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
> +	atomic_dec(&pf->pasid_ref_cnt);
>  
> +	if (atomic_read(&pf->pasid_ref_cnt))
> +		goto done;
> +
> +	/* Disable PASID only if pasid_ref_cnt is zero */
> +	pci_write_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, control);
> +
> +done:
>  	pdev->pasid_enabled = 0;
> +pasid_unlock:
> +	mutex_unlock(&pf->pasid_lock);
> +
>  }
>  EXPORT_SYMBOL_GPL(pci_disable_pasid);
>  
> @@ -492,15 +532,25 @@ EXPORT_SYMBOL_GPL(pci_disable_pasid);
>  void pci_restore_pasid_state(struct pci_dev *pdev)
>  {
>  	u16 control;
> +	struct pci_dev *pf = pci_physfn(pdev);
>  
>  	if (!pdev->pasid_enabled)
>  		return;
>  
> -	if (!pdev->pasid_cap)
> +	if (!pf->pasid_cap)
>  		return;
>  
> +	mutex_lock(&pf->pasid_lock);
> +
> +	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, &control);
> +	if (control & PCI_PASID_CTRL_ENABLE)
> +		goto pasid_unlock;
> +
>  	control = PCI_PASID_CTRL_ENABLE | pdev->pasid_features;
> -	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
> +	pci_write_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, control);
> +
> +pasid_unlock:
> +	mutex_unlock(&pf->pasid_lock);
>  }
>  EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
>  
> @@ -517,15 +567,22 @@ EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
>  int pci_pasid_features(struct pci_dev *pdev)
>  {
>  	u16 supported;
> +	struct pci_dev *pf = pci_physfn(pdev);
> +
> +	mutex_lock(&pf->pasid_lock);
>  
> -	if (!pdev->pasid_cap)
> +	if (!pf->pasid_cap) {
> +		mutex_unlock(&pf->pasid_lock);
>  		return -EINVAL;
> +	}
>  
> -	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
> +	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP,
>  			     &supported);
>  
>  	supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
>  
> +	mutex_unlock(&pf->pasid_lock);
> +
>  	return supported;
>  }
>  EXPORT_SYMBOL_GPL(pci_pasid_features);
> @@ -579,15 +636,21 @@ EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
>  int pci_max_pasids(struct pci_dev *pdev)
>  {
>  	u16 supported;
> +	struct pci_dev *pf = pci_physfn(pdev);
> +
> +	mutex_lock(&pf->pasid_lock);
>  
> -	if (!pdev->pasid_cap)
> +	if (!pf->pasid_cap) {
> +		mutex_unlock(&pf->pasid_lock);
>  		return -EINVAL;
> +	}
>  
> -	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
> -			     &supported);
> +	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP, &supported);
>  
>  	supported = (supported & PASID_NUMBER_MASK) >> PASID_NUMBER_SHIFT;
>  
> +	mutex_unlock(&pf->pasid_lock);
> +
>  	return (1 << supported);
>  }
>  EXPORT_SYMBOL_GPL(pci_max_pasids);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 3c9c4c82be27..4bfcca045afd 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -461,8 +461,10 @@ struct pci_dev {
>  	atomic_t	pri_ref_cnt;	/* Number of PF/VF PRI users */
>  #endif
>  #ifdef CONFIG_PCI_PASID
> +	struct mutex	pasid_lock;	/* PASID enable lock */

I think these locks are finer-grained than necessary.  I'm not sure
it's worth having two mutexes for every device (one for PRI and
another for PASID).  Is there really a performance benefit for having
two?

Does it (or do they) need to be in struct pci_dev?  You only use the PF
mutexes, so maybe it could be in struct pci_sriov, of which I think
there is only one per PF.
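
A rough userspace sketch of the single-lock idea, under stated
assumptions: struct sriov_model is a hypothetical stand-in for per-PF
state such as struct pci_sriov, the *_hw_on flags model the PF enable
bits, and a pthread mutex stands in for the kernel mutex. None of these
names exist in the kernel.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-PF shared state, loosely modelled on struct pci_sriov. */
struct sriov_model {
	pthread_mutex_t lock;	/* one lock covering both PRI and PASID */
	int pri_users;		/* functions (PF or VFs) with PRI enabled */
	int pasid_users;	/* functions (PF or VFs) with PASID enabled */
	bool pri_hw_on;		/* models the PF's PRI enable bit */
	bool pasid_hw_on;	/* models the PF's PASID enable bit */
};

/* Static initializer; counters and flags start out as zero/false. */
#define SRIOV_MODEL_INIT { .lock = PTHREAD_MUTEX_INITIALIZER }

static void model_enable_pri(struct sriov_model *s)
{
	pthread_mutex_lock(&s->lock);
	if (s->pri_users++ == 0)
		s->pri_hw_on = true;	/* first user programs the hardware */
	pthread_mutex_unlock(&s->lock);
}

static void model_disable_pri(struct sriov_model *s)
{
	pthread_mutex_lock(&s->lock);
	if (s->pri_users > 0 && --s->pri_users == 0)
		s->pri_hw_on = false;	/* last user turns it off */
	pthread_mutex_unlock(&s->lock);
}

static void model_enable_pasid(struct sriov_model *s)
{
	pthread_mutex_lock(&s->lock);
	if (s->pasid_users++ == 0)
		s->pasid_hw_on = true;
	pthread_mutex_unlock(&s->lock);
}

static void model_disable_pasid(struct sriov_model *s)
{
	pthread_mutex_lock(&s->lock);
	if (s->pasid_users > 0 && --s->pasid_users == 0)
		s->pasid_hw_on = false;
	pthread_mutex_unlock(&s->lock);
}
```

Because every counter update happens under the one lock, plain integers
are enough in this model; whether the same holds for the real code
depends on which paths take the lock.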

>  	u16		pasid_cap;	/* PASID Capability offset */
>  	u16		pasid_features;
> +	atomic_t	pasid_ref_cnt;	/* Number of VFs with PASID enabled */
>  #endif
>  #ifdef CONFIG_PCI_P2PDMA
>  	struct pci_p2pdma *p2pdma;
> -- 
> 2.21.0
> 

* Re: [PATCH v5 1/7] PCI/ATS: Fix pci_prg_resp_pasid_required() dependency issues
  2019-08-12 20:04   ` Bjorn Helgaas
@ 2019-08-12 20:20     ` sathyanarayanan kuppuswamy
  2019-08-13  3:51       ` Bjorn Helgaas
  0 siblings, 1 reply; 36+ messages in thread
From: sathyanarayanan kuppuswamy @ 2019-08-12 20:20 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, ashok.raj, keith.busch


On 8/12/19 1:04 PM, Bjorn Helgaas wrote:
> On Thu, Aug 01, 2019 at 05:05:58PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
>> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>
>> Since pci_prg_resp_pasid_required() function has dependency on both
>> PASID and PRI, define it only if both CONFIG_PCI_PRI and
>> CONFIG_PCI_PASID config options are enabled.
> I don't really like this.  It makes the #ifdefs more complicated and I
> don't think it really buys us anything.  Will anything break if we
> just drop this patch?
Yes. This function uses the "pri_lock" mutex, which is only defined when
CONFIG_PCI_PRI is enabled, so leaving the function outside the
CONFIG_PCI_PRI guard would break the build.
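
A small standalone sketch of the dependency, not the actual header.
struct pci_dev_model, its fields, and prg_resp_pasid_required_model()
are hypothetical names, and 0x8000 is assumed to be the value of
PCI_PRI_STATUS_PASID. The point is only that code touching a
PRI-only member has to sit behind the same guard as that member.

```c
/*
 * Build with -DCONFIG_PCI_PRI -DCONFIG_PCI_PASID to select the real
 * implementation; with either option missing, callers get the stub.
 */
struct pci_dev_model {		/* hypothetical stand-in for struct pci_dev */
#ifdef CONFIG_PCI_PRI
	int pri_lock;			/* placeholder for the PRI-only mutex */
	unsigned short pri_status;	/* models the PRI Status register */
#endif
	int is_virtfn;
};

#define PRI_STATUS_PASID_MODEL	0x8000	/* assumed PCI_PRI_STATUS_PASID */

#if defined(CONFIG_PCI_PRI) && defined(CONFIG_PCI_PASID)

/* May use pri_lock and pri_status: both exist in this configuration. */
int prg_resp_pasid_required_model(struct pci_dev_model *pdev)
{
	return (pdev->pri_status & PRI_STATUS_PASID_MODEL) ? 1 : 0;
}

#else /* !(CONFIG_PCI_PRI && CONFIG_PCI_PASID) */

/* Stub: must not reference members that only exist under CONFIG_PCI_PRI. */
static inline int prg_resp_pasid_required_model(struct pci_dev_model *pdev)
{
	(void)pdev;
	return 0;
}

#endif
```

If the real implementation were compiled without CONFIG_PCI_PRI in this
model, the reference to pri_status would fail to compile, which is the
kind of breakage described above.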
>
>> Fixes: e5567f5f6762 ("PCI/ATS: Add pci_prg_resp_pasid_required()
>> interface.")
>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>> ---
>>   drivers/pci/ats.c       | 10 ++++++----
>>   include/linux/pci-ats.h | 12 +++++++++---
>>   2 files changed, 15 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
>> index e18499243f84..cdd936d10f68 100644
>> --- a/drivers/pci/ats.c
>> +++ b/drivers/pci/ats.c
>> @@ -395,6 +395,8 @@ int pci_pasid_features(struct pci_dev *pdev)
>>   }
>>   EXPORT_SYMBOL_GPL(pci_pasid_features);
>>   
>> +#ifdef CONFIG_PCI_PRI
>> +
>>   /**
>>    * pci_prg_resp_pasid_required - Return PRG Response PASID Required bit
>>    *				 status.
>> @@ -402,10 +404,8 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
>>    *
>>    * Returns 1 if PASID is required in PRG Response Message, 0 otherwise.
>>    *
>> - * Even though the PRG response PASID status is read from PRI Status
>> - * Register, since this API will mainly be used by PASID users, this
>> - * function is defined within #ifdef CONFIG_PCI_PASID instead of
>> - * CONFIG_PCI_PRI.
>> + * Since this API has dependency on both PRI and PASID, protect it
>> + * with both CONFIG_PCI_PRI and CONFIG_PCI_PASID.
>>    */
>>   int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>>   {
>> @@ -425,6 +425,8 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>>   }
>>   EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
>>   
>> +#endif
>> +
>>   #define PASID_NUMBER_SHIFT	8
>>   #define PASID_NUMBER_MASK	(0x1f << PASID_NUMBER_SHIFT)
>>   /**
>> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
>> index 1ebb88e7c184..1a0bdaee2f32 100644
>> --- a/include/linux/pci-ats.h
>> +++ b/include/linux/pci-ats.h
>> @@ -40,7 +40,6 @@ void pci_disable_pasid(struct pci_dev *pdev);
>>   void pci_restore_pasid_state(struct pci_dev *pdev);
>>   int pci_pasid_features(struct pci_dev *pdev);
>>   int pci_max_pasids(struct pci_dev *pdev);
>> -int pci_prg_resp_pasid_required(struct pci_dev *pdev);
>>   
>>   #else  /* CONFIG_PCI_PASID */
>>   
>> @@ -67,11 +66,18 @@ static inline int pci_max_pasids(struct pci_dev *pdev)
>>   	return -EINVAL;
>>   }
>>   
>> +#endif /* CONFIG_PCI_PASID */
>> +
>> +#if defined(CONFIG_PCI_PRI) && defined(CONFIG_PCI_PASID)
>> +
>> +int pci_prg_resp_pasid_required(struct pci_dev *pdev);
>> +
>> +#else /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
>> +
>>   static inline int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>>   {
>>   	return 0;
>>   }
>> -#endif /* CONFIG_PCI_PASID */
>> -
>> +#endif
>>   
>>   #endif /* LINUX_PCI_ATS_H*/
>> -- 
>> 2.21.0
>>
-- 
Sathyanarayanan Kuppuswamy
Linux kernel developer


* Re: [PATCH v5 2/7] PCI/ATS: Initialize PRI in pci_ats_init()
  2019-08-12 20:04   ` Bjorn Helgaas
@ 2019-08-12 21:35     ` sathyanarayanan kuppuswamy
  2019-08-13  4:10       ` Bjorn Helgaas
  0 siblings, 1 reply; 36+ messages in thread
From: sathyanarayanan kuppuswamy @ 2019-08-12 21:35 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

Hi,

Thanks for the review.

On 8/12/19 1:04 PM, Bjorn Helgaas wrote:
> On Thu, Aug 01, 2019 at 05:05:59PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
>> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>
>> Currently, PRI Capability checks are repeated across all PRI API's.
>> Instead, cache the capability check result in pci_pri_init() and use it
>> in other PRI API's. Also, since PRI is a shared resource between PF/VF,
>> initialize default values for common PRI features in pci_pri_init().
> This patch does two things, and it would be better if they were split:
>
>    1) Cache the PRI capability offset
>    2) Separate the PF and VF paths
OK. I will split it into two patches in the next version.
>
>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>> ---
>>   drivers/pci/ats.c       | 80 ++++++++++++++++++++++++++++-------------
>>   include/linux/pci-ats.h |  5 +++
>>   include/linux/pci.h     |  1 +
>>   3 files changed, 61 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
>> index cdd936d10f68..280be911f190 100644
>> --- a/drivers/pci/ats.c
>> +++ b/drivers/pci/ats.c
>> @@ -28,6 +28,8 @@ void pci_ats_init(struct pci_dev *dev)
>>   		return;
>>   
>>   	dev->ats_cap = pos;
>> +
>> +	pci_pri_init(dev);
>>   }
>>   
>>   /**
>> @@ -170,36 +172,72 @@ int pci_ats_page_aligned(struct pci_dev *pdev)
>>   EXPORT_SYMBOL_GPL(pci_ats_page_aligned);
>>   
>>   #ifdef CONFIG_PCI_PRI
>> +
>> +void pci_pri_init(struct pci_dev *pdev)
>> +{
>> +	u32 max_requests;
>> +	int pos;
>> +
>> +	/*
>> +	 * As per PCIe r4.0, sec 9.3.7.11, only PF is permitted to
>> +	 * implement PRI and all associated VFs can only use it.
>> +	 * Since PF already initialized the PRI parameters there is
>> +	 * no need to proceed further.
>> +	 */
>> +	if (pdev->is_virtfn)
>> +		return;
>> +
>> +	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>> +	if (!pos)
>> +		return;
>> +
>> +	pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests);
>> +
>> +	/*
>> +	 * Since PRI is a shared resource between PF and VF, we must not
>> +	 * configure Outstanding Page Allocation Quota as a per device
>> +	 * resource in pci_enable_pri(). So use maximum value possible
>> +	 * as default value.
>> +	 */
>> +	pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, max_requests);
>> +
>> +	pdev->pri_reqs_alloc = max_requests;
>> +	pdev->pri_cap = pos;
>> +}
>> +
>>   /**
>>    * pci_enable_pri - Enable PRI capability
>>    * @ pdev: PCI device structure
>>    *
>>    * Returns 0 on success, negative value on error
>> + *
>> + * TODO: Since PRI is a shared resource between PF/VF, don't update
>> + * Outstanding Page Allocation Quota in the same API as a per device
>> + * feature.
>>    */
>>   int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>>   {
>>   	u16 control, status;
>>   	u32 max_requests;
>> -	int pos;
>>   
>>   	if (WARN_ON(pdev->pri_enabled))
>>   		return -EBUSY;
>>   
>> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>> -	if (!pos)
>> +	if (!pdev->pri_cap)
>>   		return -EINVAL;
>>   
>> -	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
>> +	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
>>   	if (!(status & PCI_PRI_STATUS_STOPPED))
>>   		return -EBUSY;
>>   
>> -	pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests);
>> +	pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,
>> +			      &max_requests);
>>   	reqs = min(max_requests, reqs);
>>   	pdev->pri_reqs_alloc = reqs;
>> -	pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
>> +	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> The comment above says "don't update Outstanding Page Allocation
> Quota" but it looks like that's what this is doing.

I don't want to fix it in the current patch set because it needs further
scrutiny. That's why I added the TODO comment.

Currently, the intel-iommu and amd-iommu drivers (the only users of
pci_enable_pri()) hard-code 32 as the default value for Outstanding Page
Allocation Quota. The only exception is that amd-iommu sets this value
to 1 for devices with erratum AMD_PRI_DEV_ERRATUM_LIMIT_REQ_ONE. There is
no comment or spec reference that explains why 32 was chosen as the
default. Also, configuring 32 as a per-device maximum will break for
PF/VF devices, since they share the PRI interface. So without a clear
history, I don't want to make any changes that might affect their
functionality.

IMO, the correct way is to configure the Outstanding Page Allocation
Quota with the maximum value in pci_pri_init(). Then, even if the IOMMU
can't handle more than 32 page requests per device, it can fail properly
without affecting functionality.

I have added the proper configuration for Outstanding Page Allocation
Quota in pci_pri_init(), but it does not serve any purpose until we fix
the remaining part of the issue in pci_enable_pri(). If you want, I can
remove it for now and add it back when fixing the issue in
pci_enable_pri().
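
To make the concern concrete, here is a small userspace model, assuming
every function that shares the PF's PRI interface were allowed to
program the quota. struct pri_regs_model and the model_* helpers are
hypothetical; max_req and alloc_req stand for the registers at
PCI_PRI_MAX_REQ and PCI_PRI_ALLOC_REQ.

```c
#include <stdint.h>

/* Hypothetical model of the PF's PRI registers, shared with its VFs. */
struct pri_regs_model {
	uint32_t max_req;	/* register at PCI_PRI_MAX_REQ (read-only) */
	uint32_t alloc_req;	/* register at PCI_PRI_ALLOC_REQ */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Current behaviour: each caller programs its own clamped quota. */
static void model_enable_pri_per_caller(struct pri_regs_model *r,
					uint32_t reqs)
{
	r->alloc_req = min_u32(r->max_req, reqs);
}

/* Suggested behaviour: program the maximum once at init time. */
static void model_pri_init(struct pri_regs_model *r)
{
	r->alloc_req = r->max_req;
}
```

In this model the last caller of model_enable_pri_per_caller() wins, so
a caller asking for 1 would silently shrink the quota for a caller that
asked for 32 earlier, while model_pri_init() leaves one value that no
later caller overrides.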
>
>>   	control = PCI_PRI_CTRL_ENABLE;
>> -	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>> +	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>>   
>>   	pdev->pri_enabled = 1;
>>   
>> @@ -216,18 +254,16 @@ EXPORT_SYMBOL_GPL(pci_enable_pri);
>>   void pci_disable_pri(struct pci_dev *pdev)
>>   {
>>   	u16 control;
>> -	int pos;
>>   
>>   	if (WARN_ON(!pdev->pri_enabled))
>>   		return;
>>   
>> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>> -	if (!pos)
>> +	if (!pdev->pri_cap)
>>   		return;
>>   
>> -	pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
>> +	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, &control);
>>   	control &= ~PCI_PRI_CTRL_ENABLE;
>> -	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>> +	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>>   
>>   	pdev->pri_enabled = 0;
>>   }
>> @@ -241,17 +277,15 @@ void pci_restore_pri_state(struct pci_dev *pdev)
>>   {
>>   	u16 control = PCI_PRI_CTRL_ENABLE;
>>   	u32 reqs = pdev->pri_reqs_alloc;
>> -	int pos;
>>   
>>   	if (!pdev->pri_enabled)
>>   		return;
>>   
>> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>> -	if (!pos)
>> +	if (!pdev->pri_cap)
>>   		return;
>>   
>> -	pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
>> -	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>> +	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
>> +	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>>   }
>>   EXPORT_SYMBOL_GPL(pci_restore_pri_state);
>>   
>> @@ -265,17 +299,15 @@ EXPORT_SYMBOL_GPL(pci_restore_pri_state);
>>   int pci_reset_pri(struct pci_dev *pdev)
>>   {
>>   	u16 control;
>> -	int pos;
>>   
>>   	if (WARN_ON(pdev->pri_enabled))
>>   		return -EBUSY;
>>   
>> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>> -	if (!pos)
>> +	if (!pdev->pri_cap)
>>   		return -EINVAL;
>>   
>>   	control = PCI_PRI_CTRL_RESET;
>> -	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>> +	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>>   
>>   	return 0;
>>   }
>> @@ -410,13 +442,11 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
>>   int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>>   {
>>   	u16 status;
>> -	int pos;
>>   
>> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>> -	if (!pos)
>> +	if (!pdev->pri_cap)
>>   		return 0;
>>   
>> -	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
>> +	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
>>   
>>   	if (status & PCI_PRI_STATUS_PASID)
>>   		return 1;
>> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
>> index 1a0bdaee2f32..33653d4ca94f 100644
>> --- a/include/linux/pci-ats.h
>> +++ b/include/linux/pci-ats.h
>> @@ -6,6 +6,7 @@
>>   
>>   #ifdef CONFIG_PCI_PRI
>>   
>> +void pci_pri_init(struct pci_dev *pdev);
> I think this could be moved to drivers/pci/pci.h, since it doesn't
> need to be visible outside drivers/pci/.
Makes sense. I will move it.
>
>>   int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
>>   void pci_disable_pri(struct pci_dev *pdev);
>>   void pci_restore_pri_state(struct pci_dev *pdev);
>> @@ -13,6 +14,10 @@ int pci_reset_pri(struct pci_dev *pdev);
>>   
>>   #else /* CONFIG_PCI_PRI */
>>   
>> +static inline void pci_pri_init(struct pci_dev *pdev)
>> +{
>> +}
>> +
>>   static inline int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>>   {
>>   	return -ENODEV;
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 9e700d9f9f28..56b55db099fc 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -455,6 +455,7 @@ struct pci_dev {
>>   	atomic_t	ats_ref_cnt;	/* Number of VFs with ATS enabled */
>>   #endif
>>   #ifdef CONFIG_PCI_PRI
>> +	u16		pri_cap;	/* PRI Capability offset */
>>   	u32		pri_reqs_alloc; /* Number of PRI requests allocated */
>>   #endif
>>   #ifdef CONFIG_PCI_PASID
>> -- 
>> 2.21.0
>>
-- 
Sathyanarayanan Kuppuswamy
Linux kernel developer


* Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices
  2019-08-12 20:04   ` Bjorn Helgaas
@ 2019-08-12 21:40     ` sathyanarayanan kuppuswamy
  0 siblings, 0 replies; 36+ messages in thread
From: sathyanarayanan kuppuswamy @ 2019-08-12 21:40 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

Hi,

On 8/12/19 1:04 PM, Bjorn Helgaas wrote:
> On Thu, Aug 01, 2019 at 05:06:01PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
>> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>
>> When IOMMU tries to enable Page Request Interface (PRI) for VF device
>> in iommu_enable_dev_iotlb(), it always fails because PRI support for
>> PCIe VF device is currently broken. Current implementation expects
>> the given PCIe device (PF & VF) to implement PRI capability before
>> enabling the PRI support. But this assumption is incorrect. As per PCIe
>> spec r4.0, sec 9.3.7.11, all VFs associated with PF can only use the
>> PRI of the PF and not implement it. Hence we need to create exception
>> for handling the PRI support for PCIe VF device.
>>
>> Also, since PRI is a shared resource between PF/VF, following rules
>> should apply.
>>
>> 1. Use proper locking before accessing/modifying PF resources in VF
>>     PRI enable/disable call.
>> 2. Use reference count logic to track the usage of PRI resource.
>> 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.
>>
>> Cc: Ashok Raj <ashok.raj@intel.com>
>> Cc: Keith Busch <keith.busch@intel.com>
>> Suggested-by: Ashok Raj <ashok.raj@intel.com>
>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>> ---
>>   drivers/pci/ats.c   | 143 ++++++++++++++++++++++++++++++++++----------
>>   include/linux/pci.h |   2 +
>>   2 files changed, 112 insertions(+), 33 deletions(-)
>>
>> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
>> index 1f4be27a071d..079dc5444444 100644
>> --- a/drivers/pci/ats.c
>> +++ b/drivers/pci/ats.c
>> @@ -189,6 +189,8 @@ void pci_pri_init(struct pci_dev *pdev)
>>   	if (pdev->is_virtfn)
>>   		return;
>>   
>> +	mutex_init(&pdev->pri_lock);
>> +
>>   	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>>   	if (!pos)
>>   		return;
>> @@ -221,29 +223,57 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>>   {
>>   	u16 control, status;
>>   	u32 max_requests;
>> +	int ret = 0;
>> +	struct pci_dev *pf = pci_physfn(pdev);
>>   
>> -	if (WARN_ON(pdev->pri_enabled))
>> -		return -EBUSY;
>> +	mutex_lock(&pf->pri_lock);
>>   
>> -	if (!pdev->pri_cap)
>> -		return -EINVAL;
>> +	if (WARN_ON(pdev->pri_enabled)) {
>> +		ret = -EBUSY;
>> +		goto pri_unlock;
>> +	}
>>   
>> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
>> -	if (!(status & PCI_PRI_STATUS_STOPPED))
>> -		return -EBUSY;
>> +	if (!pf->pri_cap) {
>> +		ret = -EINVAL;
>> +		goto pri_unlock;
>> +	}
>> +
>> +	if (pdev->is_virtfn && pf->pri_enabled)
>> +		goto update_status;
>> +
>> +	/*
>> +	 * Before updating PRI registers, make sure there is no
>> +	 * outstanding PRI requests.
>> +	 */
>> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, &status);
>> +	if (!(status & PCI_PRI_STATUS_STOPPED)) {
>> +		ret = -EBUSY;
>> +		goto pri_unlock;
>> +	}
>>   
>> -	pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,
>> -			      &max_requests);
>> +	pci_read_config_dword(pf, pf->pri_cap + PCI_PRI_MAX_REQ, &max_requests);
>>   	reqs = min(max_requests, reqs);
>> -	pdev->pri_reqs_alloc = reqs;
>> -	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
>> +	pf->pri_reqs_alloc = reqs;
>> +	pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
>>   
>>   	control = PCI_PRI_CTRL_ENABLE;
>> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
>>   
>> -	pdev->pri_enabled = 1;
>> +	/*
>> +	 * If PRI is not already enabled in PF, increment the PF
>> +	 * pri_ref_cnt to track the usage of PRI interface.
>> +	 */
>> +	if (pdev->is_virtfn && !pf->pri_enabled) {
>> +		atomic_inc(&pf->pri_ref_cnt);
>> +		pf->pri_enabled = 1;
>> +	}
>>   
>> -	return 0;
>> +update_status:
>> +	atomic_inc(&pf->pri_ref_cnt);
>> +	pdev->pri_enabled = 1;
>> +pri_unlock:
>> +	mutex_unlock(&pf->pri_lock);
>> +	return ret;
>>   }
>>   EXPORT_SYMBOL_GPL(pci_enable_pri);
>>   
>> @@ -256,18 +286,30 @@ EXPORT_SYMBOL_GPL(pci_enable_pri);
>>   void pci_disable_pri(struct pci_dev *pdev)
>>   {
>>   	u16 control;
>> +	struct pci_dev *pf = pci_physfn(pdev);
>>   
>> -	if (WARN_ON(!pdev->pri_enabled))
>> -		return;
>> +	mutex_lock(&pf->pri_lock);
>>   
>> -	if (!pdev->pri_cap)
>> -		return;
>> +	if (WARN_ON(!pdev->pri_enabled) || !pf->pri_cap)
>> +		goto pri_unlock;
>> +
>> +	atomic_dec(&pf->pri_ref_cnt);
>>   
>> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, &control);
>> +	/*
>> +	 * If pri_ref_cnt is not zero, then don't modify hardware
>> +	 * registers.
>> +	 */
>> +	if (atomic_read(&pf->pri_ref_cnt))
>> +		goto done;
>> +
>> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, &control);
>>   	control &= ~PCI_PRI_CTRL_ENABLE;
>> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
>>   
>> +done:
>>   	pdev->pri_enabled = 0;
>> +pri_unlock:
>> +	mutex_unlock(&pf->pri_lock);
>>   }
>>   EXPORT_SYMBOL_GPL(pci_disable_pri);
>>   
>> @@ -277,17 +319,31 @@ EXPORT_SYMBOL_GPL(pci_disable_pri);
>>    */
>>   void pci_restore_pri_state(struct pci_dev *pdev)
>>   {
>> -	u16 control = PCI_PRI_CTRL_ENABLE;
>> -	u32 reqs = pdev->pri_reqs_alloc;
>> +	u16 control;
>> +	u32 reqs;
>> +	struct pci_dev *pf = pci_physfn(pdev);
>>   
>>   	if (!pdev->pri_enabled)
>>   		return;
>>   
>> -	if (!pdev->pri_cap)
>> +	if (!pf->pri_cap)
>>   		return;
>>   
>> -	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
>> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>> +	mutex_lock(&pf->pri_lock);
>> +
>> +	/* If PRI is already enabled by other VF's or PF, return */
>> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, &control);
>> +	if (control & PCI_PRI_CTRL_ENABLE)
>> +		goto pri_unlock;
>> +
>> +	reqs = pf->pri_reqs_alloc;
>> +	control = PCI_PRI_CTRL_ENABLE;
>> +
>> +	pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
>> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
> Why use "control" here instead of just PCI_PRI_CTRL_ENABLE?
It can be done. Even in the original code, "control" did not serve any
purpose; I just left the implementation as it was.
>
>> +pri_unlock:
>> +	mutex_unlock(&pf->pri_lock);
>>   }
>>   EXPORT_SYMBOL_GPL(pci_restore_pri_state);
>>   
>> @@ -300,18 +356,32 @@ EXPORT_SYMBOL_GPL(pci_restore_pri_state);
>>    */
>>   int pci_reset_pri(struct pci_dev *pdev)
>>   {
>> +	struct pci_dev *pf = pci_physfn(pdev);
>>   	u16 control;
>> +	int ret = 0;
>>   
>> -	if (WARN_ON(pdev->pri_enabled))
>> -		return -EBUSY;
>> +	mutex_lock(&pf->pri_lock);
>>   
>> -	if (!pdev->pri_cap)
>> -		return -EINVAL;
>> +	if (WARN_ON(pdev->pri_enabled)) {
>> +		ret = -EBUSY;
>> +		goto done;
>> +	}
>> +
>> +	if (!pf->pri_cap) {
>> +		ret = -EINVAL;
>> +		goto done;
>> +	}
>> +
>> +	/* If PRI is already enabled by other VF's or PF, return 0 */
>> +	if (pf->pri_enabled)
>> +		goto done;
>>   
>>   	control = PCI_PRI_CTRL_RESET;
>> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>>   
>> -	return 0;
>> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
> Also here (you didn't add this one, but "control" is completely
> pointless in this function).
>
>> +done:
>> +	mutex_unlock(&pf->pri_lock);
>> +	return ret;
>>   }
>>   EXPORT_SYMBOL_GPL(pci_reset_pri);
>>   #endif /* CONFIG_PCI_PRI */
>> @@ -475,11 +545,18 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
>>   int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>>   {
>>   	u16 status;
>> +	struct pci_dev *pf = pci_physfn(pdev);
>> +
>> +	mutex_lock(&pf->pri_lock);
>>   
>> -	if (!pdev->pri_cap)
>> +	if (!pf->pri_cap) {
>> +		mutex_unlock(&pf->pri_lock);
>>   		return 0;
>> +	}
>> +
>> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, &status);
>>   
>> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
>> +	mutex_unlock(&pf->pri_lock);
>>   
>>   	if (status & PCI_PRI_STATUS_PASID)
>>   		return 1;
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 27224c0db849..3c9c4c82be27 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -455,8 +455,10 @@ struct pci_dev {
>>   	atomic_t	ats_ref_cnt;	/* Number of VFs with ATS enabled */
>>   #endif
>>   #ifdef CONFIG_PCI_PRI
>> +	struct mutex	pri_lock;	/* PRI enable lock */
>>   	u16		pri_cap;	/* PRI Capability offset */
>>   	u32		pri_reqs_alloc; /* Number of PRI requests allocated */
>> +	atomic_t	pri_ref_cnt;	/* Number of PF/VF PRI users */
>>   #endif
>>   #ifdef CONFIG_PCI_PASID
>>   	u16		pasid_cap;	/* PASID Capability offset */
>> -- 
>> 2.21.0
>>
-- 
Sathyanarayanan Kuppuswamy
Linux kernel developer



* Re: [PATCH v5 1/7] PCI/ATS: Fix pci_prg_resp_pasid_required() dependency issues
  2019-08-12 20:20     ` sathyanarayanan kuppuswamy
@ 2019-08-13  3:51       ` Bjorn Helgaas
  2019-08-16 18:06         ` Kuppuswamy Sathyanarayanan
  0 siblings, 1 reply; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-13  3:51 UTC (permalink / raw)
  To: sathyanarayanan kuppuswamy
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Mon, Aug 12, 2019 at 01:20:55PM -0700, sathyanarayanan kuppuswamy wrote:
> On 8/12/19 1:04 PM, Bjorn Helgaas wrote:
> > On Thu, Aug 01, 2019 at 05:05:58PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > 
> > > Since pci_prg_resp_pasid_required() function has dependency on both
> > > PASID and PRI, define it only if both CONFIG_PCI_PRI and
> > > CONFIG_PCI_PASID config options are enabled.

> > I don't really like this.  It makes the #ifdefs more complicated and I
> > don't think it really buys us anything.  Will anything break if we
> > just drop this patch?

> Yes, this function uses the "pri_lock" mutex, which is only defined if
> CONFIG_PCI_PRI is enabled. So not protecting this function within
> CONFIG_PCI_PRI will lead to compilation errors.

Ah, OK.  That helps a lot.  "pri_lock" doesn't exist at this point in
the series, so the patch makes no sense without knowing that.

I'm still not convinced this is the right thing because I'm not sure
the lock is necessary.  I'll respond to the patch that adds the lock.

> > > Fixes: e5567f5f6762 ("PCI/ATS: Add pci_prg_resp_pasid_required()
> > > interface.")
> > > Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > ---
> > >   drivers/pci/ats.c       | 10 ++++++----
> > >   include/linux/pci-ats.h | 12 +++++++++---
> > >   2 files changed, 15 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> > > index e18499243f84..cdd936d10f68 100644
> > > --- a/drivers/pci/ats.c
> > > +++ b/drivers/pci/ats.c
> > > @@ -395,6 +395,8 @@ int pci_pasid_features(struct pci_dev *pdev)
> > >   }
> > >   EXPORT_SYMBOL_GPL(pci_pasid_features);
> > > +#ifdef CONFIG_PCI_PRI
> > > +
> > >   /**
> > >    * pci_prg_resp_pasid_required - Return PRG Response PASID Required bit
> > >    *				 status.
> > > @@ -402,10 +404,8 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
> > >    *
> > >    * Returns 1 if PASID is required in PRG Response Message, 0 otherwise.
> > >    *
> > > - * Even though the PRG response PASID status is read from PRI Status
> > > - * Register, since this API will mainly be used by PASID users, this
> > > - * function is defined within #ifdef CONFIG_PCI_PASID instead of
> > > - * CONFIG_PCI_PRI.
> > > + * Since this API has dependency on both PRI and PASID, protect it
> > > + * with both CONFIG_PCI_PRI and CONFIG_PCI_PASID.
> > >    */
> > >   int pci_prg_resp_pasid_required(struct pci_dev *pdev)
> > >   {
> > > @@ -425,6 +425,8 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
> > >   }
> > >   EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
> > > +#endif
> > > +
> > >   #define PASID_NUMBER_SHIFT	8
> > >   #define PASID_NUMBER_MASK	(0x1f << PASID_NUMBER_SHIFT)
> > >   /**
> > > diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> > > index 1ebb88e7c184..1a0bdaee2f32 100644
> > > --- a/include/linux/pci-ats.h
> > > +++ b/include/linux/pci-ats.h
> > > @@ -40,7 +40,6 @@ void pci_disable_pasid(struct pci_dev *pdev);
> > >   void pci_restore_pasid_state(struct pci_dev *pdev);
> > >   int pci_pasid_features(struct pci_dev *pdev);
> > >   int pci_max_pasids(struct pci_dev *pdev);
> > > -int pci_prg_resp_pasid_required(struct pci_dev *pdev);
> > >   #else  /* CONFIG_PCI_PASID */
> > > @@ -67,11 +66,18 @@ static inline int pci_max_pasids(struct pci_dev *pdev)
> > >   	return -EINVAL;
> > >   }
> > > +#endif /* CONFIG_PCI_PASID */
> > > +
> > > +#if defined(CONFIG_PCI_PRI) && defined(CONFIG_PCI_PASID)
> > > +
> > > +int pci_prg_resp_pasid_required(struct pci_dev *pdev);
> > > +
> > > +#else /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
> > > +
> > >   static inline int pci_prg_resp_pasid_required(struct pci_dev *pdev)
> > >   {
> > >   	return 0;
> > >   }
> > > -#endif /* CONFIG_PCI_PASID */
> > > -
> > > +#endif
> > >   #endif /* LINUX_PCI_ATS_H*/
> > > -- 
> > > 2.21.0
> > > 
> -- 
> Sathyanarayanan Kuppuswamy
> Linux kernel developer
> 


* Re: [PATCH v5 2/7] PCI/ATS: Initialize PRI in pci_ats_init()
  2019-08-12 21:35     ` sathyanarayanan kuppuswamy
@ 2019-08-13  4:10       ` Bjorn Helgaas
  0 siblings, 0 replies; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-13  4:10 UTC (permalink / raw)
  To: sathyanarayanan kuppuswamy
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Mon, Aug 12, 2019 at 02:35:32PM -0700, sathyanarayanan kuppuswamy wrote:
> On 8/12/19 1:04 PM, Bjorn Helgaas wrote:
> > On Thu, Aug 01, 2019 at 05:05:59PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > 
> > > Currently, PRI Capability checks are repeated across all PRI API's.
> > > Instead, cache the capability check result in pci_pri_init() and use it
> > > in other PRI API's. Also, since PRI is a shared resource between PF/VF,
> > > initialize default values for common PRI features in pci_pri_init().
> > This patch does two things, and it would be better if they were split:
> > 
> >    1) Cache the PRI capability offset
> >    2) Separate the PF and VF paths
> Ok. I will split it into two patches in next version.
> > 
> > > Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > ---
> > >   drivers/pci/ats.c       | 80 ++++++++++++++++++++++++++++-------------
> > >   include/linux/pci-ats.h |  5 +++
> > >   include/linux/pci.h     |  1 +
> > >   3 files changed, 61 insertions(+), 25 deletions(-)
> > > 
> > > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> > > index cdd936d10f68..280be911f190 100644
> > > --- a/drivers/pci/ats.c
> > > +++ b/drivers/pci/ats.c
> > > @@ -28,6 +28,8 @@ void pci_ats_init(struct pci_dev *dev)
> > >   		return;
> > >   	dev->ats_cap = pos;
> > > +
> > > +	pci_pri_init(dev);
> > >   }
> > >   /**
> > > @@ -170,36 +172,72 @@ int pci_ats_page_aligned(struct pci_dev *pdev)
> > >   EXPORT_SYMBOL_GPL(pci_ats_page_aligned);
> > >   #ifdef CONFIG_PCI_PRI
> > > +
> > > +void pci_pri_init(struct pci_dev *pdev)
> > > +{
> > > +	u32 max_requests;
> > > +	int pos;
> > > +
> > > +	/*
> > > +	 * As per PCIe r4.0, sec 9.3.7.11, only PF is permitted to
> > > +	 * implement PRI and all associated VFs can only use it.
> > > +	 * Since PF already initialized the PRI parameters there is
> > > +	 * no need to proceed further.
> > > +	 */
> > > +	if (pdev->is_virtfn)
> > > +		return;
> > > +
> > > +	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> > > +	if (!pos)
> > > +		return;
> > > +
> > > +	pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests);
> > > +
> > > +	/*
> > > +	 * Since PRI is a shared resource between PF and VF, we must not
> > > +	 * configure Outstanding Page Allocation Quota as a per device
> > > +	 * resource in pci_enable_pri(). So use maximum value possible
> > > +	 * as default value.
> > > +	 */
> > > +	pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, max_requests);
> > > +
> > > +	pdev->pri_reqs_alloc = max_requests;
> > > +	pdev->pri_cap = pos;
> > > +}
> > > +
> > >   /**
> > >    * pci_enable_pri - Enable PRI capability
> > >    * @ pdev: PCI device structure
> > >    *
> > >    * Returns 0 on success, negative value on error
> > > + *
> > > + * TODO: Since PRI is a shared resource between PF/VF, don't update
> > > + * Outstanding Page Allocation Quota in the same API as a per device
> > > + * feature.
> > >    */
> > >   int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
> > >   {
> > >   	u16 control, status;
> > >   	u32 max_requests;
> > > -	int pos;
> > >   	if (WARN_ON(pdev->pri_enabled))
> > >   		return -EBUSY;
> > > -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> > > -	if (!pos)
> > > +	if (!pdev->pri_cap)
> > >   		return -EINVAL;
> > > -	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> > > +	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
> > >   	if (!(status & PCI_PRI_STATUS_STOPPED))
> > >   		return -EBUSY;
> > > -	pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests);
> > > +	pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,
> > > +			      &max_requests);
> > >   	reqs = min(max_requests, reqs);
> > >   	pdev->pri_reqs_alloc = reqs;
> > > -	pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
> > > +	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);

> > The comment above says "don't update Outstanding Page Allocation
> > Quota" but it looks like that's what this is doing.
> 
> I don't want to fix it in the current patch-set. It needs further scrutiny.
> That's why I have added the TODO comment for it.

You don't have to fix everything in this patch set, but the comment
should match what the code does.  If you desire, it can go on to
explain why the current behavior is incorrect.  But the current
comment is confusing.

I think the series would read better if the patch that changed from
trying to use the PRI capability on the VF (which always fails) to
using the one on the PF were *first*, i.e., if this change:

  - pci_write_config_dword(pdev, ... + PCI_PRI_ALLOC_REQ, reqs);
  + pci_write_config_dword(pf, ... + PCI_PRI_ALLOC_REQ, reqs);

were before adding the pri_cap cache:

  - pci_write_config_dword(pf, pos + PCI_PRI_ALLOC_REQ, reqs);
  + pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);

In the current order, you add the cache to several lines of code that
are never executed.

> Currently, the intel-iommu and amd-iommu drivers (the only users of
> pci_enable_pri()) hard-code 32 as the default value for Outstanding Page
> Allocation Quota. The only exception is that amd-iommu sets this value to
> 1 for devices with erratum AMD_PRI_DEV_ERRATUM_LIMIT_REQ_ONE. There is no
> comment or spec reference that explains why 32 was chosen as the default.
> Also, configuring 32 as a per-device maximum will break for PF/VF devices
> since they share the PRI interface. So without a clear history, I don't
> want to make any changes which might affect their functionality.
> 
> IMO, the correct way is to configure the Outstanding Page Allocation Quota
> with the maximum value in pci_pri_init(). Then, even if the IOMMU can't
> handle more than 32 page requests per device, it can fail properly and it
> should not affect the functionality.
> 
> I have added proper configuration for Outstanding Page Allocation Quota in
> pci_pri_init(), but it does not serve any purpose until we fix the part of
> the issue in pci_enable_pri(). If you want, I can remove it for now, and add
> it when fixing the issue in pci_enable_pri().

If it doesn't serve any purpose, please remove it for now.  I think
it's better to keep all the pieces related to that fix together so we
can test them together and backport them together.

> > >   	control = PCI_PRI_CTRL_ENABLE;
> > > -	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> > > +	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> > >   	pdev->pri_enabled = 1;
> > > @@ -216,18 +254,16 @@ EXPORT_SYMBOL_GPL(pci_enable_pri);
> > >   void pci_disable_pri(struct pci_dev *pdev)
> > >   {
> > >   	u16 control;
> > > -	int pos;
> > >   	if (WARN_ON(!pdev->pri_enabled))
> > >   		return;
> > > -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> > > -	if (!pos)
> > > +	if (!pdev->pri_cap)
> > >   		return;
> > > -	pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
> > > +	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, &control);
> > >   	control &= ~PCI_PRI_CTRL_ENABLE;
> > > -	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> > > +	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> > >   	pdev->pri_enabled = 0;
> > >   }
> > > @@ -241,17 +277,15 @@ void pci_restore_pri_state(struct pci_dev *pdev)
> > >   {
> > >   	u16 control = PCI_PRI_CTRL_ENABLE;
> > >   	u32 reqs = pdev->pri_reqs_alloc;
> > > -	int pos;
> > >   	if (!pdev->pri_enabled)
> > >   		return;
> > > -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> > > -	if (!pos)
> > > +	if (!pdev->pri_cap)
> > >   		return;
> > > -	pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
> > > -	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> > > +	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> > > +	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> > >   }
> > >   EXPORT_SYMBOL_GPL(pci_restore_pri_state);
> > > @@ -265,17 +299,15 @@ EXPORT_SYMBOL_GPL(pci_restore_pri_state);
> > >   int pci_reset_pri(struct pci_dev *pdev)
> > >   {
> > >   	u16 control;
> > > -	int pos;
> > >   	if (WARN_ON(pdev->pri_enabled))
> > >   		return -EBUSY;
> > > -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> > > -	if (!pos)
> > > +	if (!pdev->pri_cap)
> > >   		return -EINVAL;
> > >   	control = PCI_PRI_CTRL_RESET;
> > > -	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> > > +	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> > >   	return 0;
> > >   }
> > > @@ -410,13 +442,11 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
> > >   int pci_prg_resp_pasid_required(struct pci_dev *pdev)
> > >   {
> > >   	u16 status;
> > > -	int pos;
> > > -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> > > -	if (!pos)
> > > +	if (!pdev->pri_cap)
> > >   		return 0;
> > > -	pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> > > +	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
> > >   	if (status & PCI_PRI_STATUS_PASID)
> > >   		return 1;
> > > diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> > > index 1a0bdaee2f32..33653d4ca94f 100644
> > > --- a/include/linux/pci-ats.h
> > > +++ b/include/linux/pci-ats.h
> > > @@ -6,6 +6,7 @@
> > >   #ifdef CONFIG_PCI_PRI
> > > +void pci_pri_init(struct pci_dev *pdev);
> > I think this could be moved to drivers/pci/pci.h, since it doesn't
> > need to be visible outside drivers/pci/.
> Makes sense. I will move it.
> > 
> > >   int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
> > >   void pci_disable_pri(struct pci_dev *pdev);
> > >   void pci_restore_pri_state(struct pci_dev *pdev);
> > > @@ -13,6 +14,10 @@ int pci_reset_pri(struct pci_dev *pdev);
> > >   #else /* CONFIG_PCI_PRI */
> > > +static inline void pci_pri_init(struct pci_dev *pdev)
> > > +{
> > > +}
> > > +
> > >   static inline int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
> > >   {
> > >   	return -ENODEV;
> > > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > > index 9e700d9f9f28..56b55db099fc 100644
> > > --- a/include/linux/pci.h
> > > +++ b/include/linux/pci.h
> > > @@ -455,6 +455,7 @@ struct pci_dev {
> > >   	atomic_t	ats_ref_cnt;	/* Number of VFs with ATS enabled */
> > >   #endif
> > >   #ifdef CONFIG_PCI_PRI
> > > +	u16		pri_cap;	/* PRI Capability offset */
> > >   	u32		pri_reqs_alloc; /* Number of PRI requests allocated */
> > >   #endif
> > >   #ifdef CONFIG_PCI_PASID
> > > -- 
> > > 2.21.0
> > > 
> -- 
> Sathyanarayanan Kuppuswamy
> Linux kernel developer
> 


* Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices
  2019-08-02  0:06 ` [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices sathyanarayanan.kuppuswamy
  2019-08-12 20:04   ` Bjorn Helgaas
@ 2019-08-13  4:16   ` Bjorn Helgaas
  2019-08-15 22:20   ` Bjorn Helgaas
  2 siblings, 0 replies; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-13  4:16 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Thu, Aug 01, 2019 at 05:06:01PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> When the IOMMU tries to enable the Page Request Interface (PRI) for a VF
> device in iommu_enable_dev_iotlb(), it always fails because PRI support
> for PCIe VF devices is currently broken. The current implementation
> expects the given PCIe device (PF & VF) to implement the PRI capability
> before enabling PRI support. But this assumption is incorrect. As per PCIe
> spec r4.0, sec 9.3.7.11, all VFs associated with a PF can only use the
> PRI of the PF and must not implement it. Hence we need to create an
> exception for handling PRI support for PCIe VF devices.
> 
> Also, since PRI is a shared resource between PF/VF, the following rules
> should apply.
> 
> 1. Use proper locking before accessing/modifying PF resources in VF
>    PRI enable/disable call.
> 2. Use reference count logic to track the usage of PRI resource.
> 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.
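
For readers following along, rules 2 and 3 amount to the usual
first-user-enables / last-user-disables pattern.  A rough userspace
sketch, assuming a hypothetical "struct mock_pri_state" in place of the
PF's struct pci_dev and a plain bool in place of the PCI_PRI_CTRL_ENABLE
bit (neither is kernel code), and leaving out the locking from rule 1
because the sketch is single-threaded:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the PF state; not struct pci_dev. */
struct mock_pri_state {
	int pri_ref_cnt;	/* number of functions (PF or VF) using PRI */
	bool hw_enabled;	/* models PCI_PRI_CTRL_ENABLE in the PF */
};

/* Rule 2: take a reference; only the first user touches "hardware". */
void mock_pri_get(struct mock_pri_state *pf)
{
	if (pf->pri_ref_cnt++ == 0)
		pf->hw_enabled = true;
}

/* Rule 3: drop a reference; only the last user disables "hardware". */
void mock_pri_put(struct mock_pri_state *pf)
{
	assert(pf->pri_ref_cnt > 0);	/* unbalanced put would be a bug */
	if (--pf->pri_ref_cnt == 0)
		pf->hw_enabled = false;
}
```

In this model each function takes exactly one reference, so the enable
bit is cleared only when the last user drops out.
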
> 
> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Suggested-by: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
>  drivers/pci/ats.c   | 143 ++++++++++++++++++++++++++++++++++----------
>  include/linux/pci.h |   2 +
>  2 files changed, 112 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 1f4be27a071d..079dc5444444 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -189,6 +189,8 @@ void pci_pri_init(struct pci_dev *pdev)
>  	if (pdev->is_virtfn)
>  		return;
>  
> +	mutex_init(&pdev->pri_lock);
> +
>  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>  	if (!pos)
>  		return;
> @@ -221,29 +223,57 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>  {
>  	u16 control, status;
>  	u32 max_requests;
> +	int ret = 0;
> +	struct pci_dev *pf = pci_physfn(pdev);
>  
> -	if (WARN_ON(pdev->pri_enabled))
> -		return -EBUSY;
> +	mutex_lock(&pf->pri_lock);
>  
> -	if (!pdev->pri_cap)
> -		return -EINVAL;
> +	if (WARN_ON(pdev->pri_enabled)) {
> +		ret = -EBUSY;
> +		goto pri_unlock;
> +	}
>  
> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
> -	if (!(status & PCI_PRI_STATUS_STOPPED))
> -		return -EBUSY;
> +	if (!pf->pri_cap) {
> +		ret = -EINVAL;
> +		goto pri_unlock;
> +	}
> +
> +	if (pdev->is_virtfn && pf->pri_enabled)
> +		goto update_status;
> +
> +	/*
> +	 * Before updating PRI registers, make sure there is no
> +	 * outstanding PRI requests.
> +	 */
> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, &status);
> +	if (!(status & PCI_PRI_STATUS_STOPPED)) {
> +		ret = -EBUSY;
> +		goto pri_unlock;
> +	}
>  
> -	pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,
> -			      &max_requests);
> +	pci_read_config_dword(pf, pf->pri_cap + PCI_PRI_MAX_REQ, &max_requests);
>  	reqs = min(max_requests, reqs);
> -	pdev->pri_reqs_alloc = reqs;
> -	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> +	pf->pri_reqs_alloc = reqs;
> +	pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
>  
>  	control = PCI_PRI_CTRL_ENABLE;
> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
>  
> -	pdev->pri_enabled = 1;
> +	/*
> +	 * If PRI is not already enabled in PF, increment the PF
> +	 * pri_ref_cnt to track the usage of PRI interface.
> +	 */
> +	if (pdev->is_virtfn && !pf->pri_enabled) {
> +		atomic_inc(&pf->pri_ref_cnt);
> +		pf->pri_enabled = 1;
> +	}
>  
> -	return 0;
> +update_status:
> +	atomic_inc(&pf->pri_ref_cnt);
> +	pdev->pri_enabled = 1;
> +pri_unlock:
> +	mutex_unlock(&pf->pri_lock);
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(pci_enable_pri);
>  
> @@ -256,18 +286,30 @@ EXPORT_SYMBOL_GPL(pci_enable_pri);
>  void pci_disable_pri(struct pci_dev *pdev)
>  {
>  	u16 control;
> +	struct pci_dev *pf = pci_physfn(pdev);
>  
> -	if (WARN_ON(!pdev->pri_enabled))
> -		return;
> +	mutex_lock(&pf->pri_lock);
>  
> -	if (!pdev->pri_cap)
> -		return;
> +	if (WARN_ON(!pdev->pri_enabled) || !pf->pri_cap)
> +		goto pri_unlock;
> +
> +	atomic_dec(&pf->pri_ref_cnt);
>  
> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, &control);
> +	/*
> +	 * If pri_ref_cnt is not zero, then don't modify hardware
> +	 * registers.
> +	 */
> +	if (atomic_read(&pf->pri_ref_cnt))
> +		goto done;
> +
> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, &control);
>  	control &= ~PCI_PRI_CTRL_ENABLE;
> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
>  
> +done:
>  	pdev->pri_enabled = 0;
> +pri_unlock:
> +	mutex_unlock(&pf->pri_lock);
>  }
>  EXPORT_SYMBOL_GPL(pci_disable_pri);
>  
> @@ -277,17 +319,31 @@ EXPORT_SYMBOL_GPL(pci_disable_pri);
>   */
>  void pci_restore_pri_state(struct pci_dev *pdev)
>  {
> -	u16 control = PCI_PRI_CTRL_ENABLE;
> -	u32 reqs = pdev->pri_reqs_alloc;
> +	u16 control;
> +	u32 reqs;
> +	struct pci_dev *pf = pci_physfn(pdev);
>  
>  	if (!pdev->pri_enabled)
>  		return;
>  
> -	if (!pdev->pri_cap)
> +	if (!pf->pri_cap)
>  		return;
>  
> -	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> +	mutex_lock(&pf->pri_lock);
> +
> +	/* If PRI is already enabled by other VF's or PF, return */
> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, &control);
> +	if (control & PCI_PRI_CTRL_ENABLE)
> +		goto pri_unlock;
> +
> +	reqs = pf->pri_reqs_alloc;
> +	control = PCI_PRI_CTRL_ENABLE;
> +
> +	pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
> +
> +pri_unlock:
> +	mutex_unlock(&pf->pri_lock);
>  }
>  EXPORT_SYMBOL_GPL(pci_restore_pri_state);
>  
> @@ -300,18 +356,32 @@ EXPORT_SYMBOL_GPL(pci_restore_pri_state);
>   */
>  int pci_reset_pri(struct pci_dev *pdev)
>  {
> +	struct pci_dev *pf = pci_physfn(pdev);
>  	u16 control;
> +	int ret = 0;
>  
> -	if (WARN_ON(pdev->pri_enabled))
> -		return -EBUSY;
> +	mutex_lock(&pf->pri_lock);
>  
> -	if (!pdev->pri_cap)
> -		return -EINVAL;
> +	if (WARN_ON(pdev->pri_enabled)) {
> +		ret = -EBUSY;
> +		goto done;
> +	}
> +
> +	if (!pf->pri_cap) {
> +		ret = -EINVAL;
> +		goto done;
> +	}
> +
> +	/* If PRI is already enabled by other VF's or PF, return 0 */
> +	if (pf->pri_enabled)
> +		goto done;
>  
>  	control = PCI_PRI_CTRL_RESET;
> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>  
> -	return 0;
> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
> +done:
> +	mutex_unlock(&pf->pri_lock);
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(pci_reset_pri);
>  #endif /* CONFIG_PCI_PRI */
> @@ -475,11 +545,18 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
>  int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>  {
>  	u16 status;
> +	struct pci_dev *pf = pci_physfn(pdev);
> +
> +	mutex_lock(&pf->pri_lock);

I don't believe this lock is necessary.  What is it protecting?

We're only doing a single read of PCI_PRI_STATUS here, and with the
lock, we'll read it either before or after the critical sections in
pci_enable_pri(), pci_disable_pri(), pci_restore_pri_state(), and
pci_reset_pri().

And without the lock, we'll *also* read PCI_PRI_STATUS either before
or after anything in those functions that affects it.
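
To make that concrete, a lock-free version might look roughly like the
userspace sketch below.  "struct mock_pf_regs" and
mock_prg_resp_pasid_required() are made-up stand-ins for the PF's
struct pci_dev and the real function, and MOCK_PRI_STATUS_PASID is
assumed to mirror PCI_PRI_STATUS_PASID; this is an illustration, not a
proposed patch:

```c
#include <stdint.h>

/* Assumed to mirror PCI_PRI_STATUS_PASID (PRG Response PASID Required). */
#define MOCK_PRI_STATUS_PASID	0x8000

/* Hypothetical stand-in for the PF; not struct pci_dev. */
struct mock_pf_regs {
	uint16_t pri_cap;	/* 0 means no PRI capability */
	uint16_t pri_status;	/* stands in for the PCI_PRI_STATUS register */
};

/* One read and no read-modify-write, so there is nothing to serialize. */
int mock_prg_resp_pasid_required(const struct mock_pf_regs *pf)
{
	uint16_t status;

	if (!pf->pri_cap)
		return 0;

	status = pf->pri_status;	/* models a single pci_read_config_word() */

	return (status & MOCK_PRI_STATUS_PASID) ? 1 : 0;
}
```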

> -	if (!pdev->pri_cap)
> +	if (!pf->pri_cap) {
> +		mutex_unlock(&pf->pri_lock);
>  		return 0;
> +	}
> +
> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, &status);
>  
> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
> +	mutex_unlock(&pf->pri_lock);
>  
>  	if (status & PCI_PRI_STATUS_PASID)
>  		return 1;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 27224c0db849..3c9c4c82be27 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -455,8 +455,10 @@ struct pci_dev {
>  	atomic_t	ats_ref_cnt;	/* Number of VFs with ATS enabled */
>  #endif
>  #ifdef CONFIG_PCI_PRI
> +	struct mutex	pri_lock;	/* PRI enable lock */
>  	u16		pri_cap;	/* PRI Capability offset */
>  	u32		pri_reqs_alloc; /* Number of PRI requests allocated */
> +	atomic_t	pri_ref_cnt;	/* Number of PF/VF PRI users */
>  #endif
>  #ifdef CONFIG_PCI_PASID
>  	u16		pasid_cap;	/* PASID Capability offset */
> -- 
> 2.21.0
> 


* Re: [PATCH v5 5/7] PCI/ATS: Add PASID support for PCIe VF devices
  2019-08-12 20:05   ` Bjorn Helgaas
@ 2019-08-13 22:19     ` Kuppuswamy Sathyanarayanan
  2019-08-15  5:04       ` Bjorn Helgaas
  0 siblings, 1 reply; 36+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2019-08-13 22:19 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Mon, Aug 12, 2019 at 03:05:08PM -0500, Bjorn Helgaas wrote:
> On Thu, Aug 01, 2019 at 05:06:02PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > 
> > When the IOMMU tries to enable PASID for a VF device in
> > iommu_enable_dev_iotlb(), it always fails because PASID support for PCIe
> > VF devices is currently broken in the PCIe driver. The current
> > implementation expects the given PCIe device (PF & VF) to implement the
> > PASID capability before enabling PASID support. But this assumption is
> > incorrect. As per PCIe spec r4.0, sec 9.3.7.14, all VFs associated with a
> > PF can only use the PASID of the PF and must not implement it.
> > 
> > Also, since PASID is a shared resource between PF/VF, the following rules
> > should apply.
> > 
> > 1. Use proper locking before accessing/modifying PF resources in VF
> >    PASID enable/disable call.
> > 2. Use reference count logic to track the usage of PASID resource.
> > 3. Disable PASID only if the PASID reference count (pasid_ref_cnt) is zero.
> > 
> > Cc: Ashok Raj <ashok.raj@intel.com>
> > Cc: Keith Busch <keith.busch@intel.com>
> > Suggested-by: Ashok Raj <ashok.raj@intel.com>
> > Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > ---
> >  drivers/pci/ats.c   | 113 ++++++++++++++++++++++++++++++++++----------
> >  include/linux/pci.h |   2 +
> >  2 files changed, 90 insertions(+), 25 deletions(-)
> > 
> > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> > index 079dc5444444..9384afd7d00e 100644
> > --- a/drivers/pci/ats.c
> > +++ b/drivers/pci/ats.c
> > @@ -402,6 +402,8 @@ void pci_pasid_init(struct pci_dev *pdev)
> >  	if (pdev->is_virtfn)
> >  		return;
> >  
> > +	mutex_init(&pdev->pasid_lock);
> > +
> >  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> >  	if (!pos)
> >  		return;
> > @@ -436,32 +438,57 @@ void pci_pasid_init(struct pci_dev *pdev)
> >  int pci_enable_pasid(struct pci_dev *pdev, int features)
> >  {
> >  	u16 control, supported;
> > +	int ret = 0;
> > +	struct pci_dev *pf = pci_physfn(pdev);
> >  
> > -	if (WARN_ON(pdev->pasid_enabled))
> > -		return -EBUSY;
> > +	mutex_lock(&pf->pasid_lock);
> >  
> > -	if (!pdev->eetlp_prefix_path)
> > -		return -EINVAL;
> > +	if (WARN_ON(pdev->pasid_enabled)) {
> > +		ret = -EBUSY;
> > +		goto pasid_unlock;
> > +	}
> >  
> > -	if (!pdev->pasid_cap)
> > -		return -EINVAL;
> > +	if (!pdev->eetlp_prefix_path) {
> > +		ret = -EINVAL;
> > +		goto pasid_unlock;
> > +	}
> >  
> > -	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
> > -			     &supported);
> > +	if (!pf->pasid_cap) {
> > +		ret = -EINVAL;
> > +		goto pasid_unlock;
> > +	}
> > +
> > +	if (pdev->is_virtfn && pf->pasid_enabled)
> > +		goto update_status;
> > +
> > +	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP, &supported);
> >  	supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
> >  
> >  	/* User wants to enable anything unsupported? */
> > -	if ((supported & features) != features)
> > -		return -EINVAL;
> > +	if ((supported & features) != features) {
> > +		ret = -EINVAL;
> > +		goto pasid_unlock;
> > +	}
> >  
> >  	control = PCI_PASID_CTRL_ENABLE | features;
> > -	pdev->pasid_features = features;
> > -
> > +	pf->pasid_features = features;
> >  	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
> >  
> > -	pdev->pasid_enabled = 1;
> > +	/*
> > +	 * If PASID is not already enabled in PF, increment pasid_ref_cnt
> > +	 * to count PF PASID usage.
> > +	 */
> > +	if (pdev->is_virtfn && !pf->pasid_enabled) {
> > +		atomic_inc(&pf->pasid_ref_cnt);
> > +		pf->pasid_enabled = 1;
> > +	}
> >  
> > -	return 0;
> > +update_status:
> > +	atomic_inc(&pf->pasid_ref_cnt);
> > +	pdev->pasid_enabled = 1;
> > +pasid_unlock:
> > +	mutex_unlock(&pf->pasid_lock);
> > +	return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(pci_enable_pasid);
> >  
> > @@ -472,16 +499,29 @@ EXPORT_SYMBOL_GPL(pci_enable_pasid);
> >  void pci_disable_pasid(struct pci_dev *pdev)
> >  {
> >  	u16 control = 0;
> > +	struct pci_dev *pf = pci_physfn(pdev);
> > +
> > +	mutex_lock(&pf->pasid_lock);
> >  
> >  	if (WARN_ON(!pdev->pasid_enabled))
> > -		return;
> > +		goto pasid_unlock;
> >  
> > -	if (!pdev->pasid_cap)
> > -		return;
> > +	if (!pf->pasid_cap)
> > +		goto pasid_unlock;
> >  
> > -	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
> > +	atomic_dec(&pf->pasid_ref_cnt);
> >  
> > +	if (atomic_read(&pf->pasid_ref_cnt))
> > +		goto done;
> > +
> > +	/* Disable PASID only if pasid_ref_cnt is zero */
> > +	pci_write_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, control);
> > +
> > +done:
> >  	pdev->pasid_enabled = 0;
> > +pasid_unlock:
> > +	mutex_unlock(&pf->pasid_lock);
> > +
> >  }
> >  EXPORT_SYMBOL_GPL(pci_disable_pasid);
> >  
> > @@ -492,15 +532,25 @@ EXPORT_SYMBOL_GPL(pci_disable_pasid);
> >  void pci_restore_pasid_state(struct pci_dev *pdev)
> >  {
> >  	u16 control;
> > +	struct pci_dev *pf = pci_physfn(pdev);
> >  
> >  	if (!pdev->pasid_enabled)
> >  		return;
> >  
> > -	if (!pdev->pasid_cap)
> > +	if (!pf->pasid_cap)
> >  		return;
> >  
> > +	mutex_lock(&pf->pasid_lock);
> > +
> > +	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, &control);
> > +	if (control & PCI_PASID_CTRL_ENABLE)
> > +		goto pasid_unlock;
> > +
> >  	control = PCI_PASID_CTRL_ENABLE | pdev->pasid_features;
> > -	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
> > +	pci_write_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, control);
> > +
> > +pasid_unlock:
> > +	mutex_unlock(&pf->pasid_lock);
> >  }
> >  EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
> >  
> > @@ -517,15 +567,22 @@ EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
> >  int pci_pasid_features(struct pci_dev *pdev)
> >  {
> >  	u16 supported;
> > +	struct pci_dev *pf = pci_physfn(pdev);
> > +
> > +	mutex_lock(&pf->pasid_lock);
> >  
> > -	if (!pdev->pasid_cap)
> > +	if (!pf->pasid_cap) {
> > +		mutex_unlock(&pf->pasid_lock);
> >  		return -EINVAL;
> > +	}
> >  
> > -	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
> > +	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP,
> >  			     &supported);
> >  
> >  	supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
> >  
> > +	mutex_unlock(&pf->pasid_lock);
> > +
> >  	return supported;
> >  }
> >  EXPORT_SYMBOL_GPL(pci_pasid_features);
> > @@ -579,15 +636,21 @@ EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
> >  int pci_max_pasids(struct pci_dev *pdev)
> >  {
> >  	u16 supported;
> > +	struct pci_dev *pf = pci_physfn(pdev);
> > +
> > +	mutex_lock(&pf->pasid_lock);
> >  
> > -	if (!pdev->pasid_cap)
> > +	if (!pf->pasid_cap) {
> > +		mutex_unlock(&pf->pasid_lock);
> >  		return -EINVAL;
> > +	}
> >  
> > -	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
> > -			     &supported);
> > +	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP, &supported);
> >  
> >  	supported = (supported & PASID_NUMBER_MASK) >> PASID_NUMBER_SHIFT;
> >  
> > +	mutex_unlock(&pf->pasid_lock);
> > +
> >  	return (1 << supported);
> >  }
> >  EXPORT_SYMBOL_GPL(pci_max_pasids);
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 3c9c4c82be27..4bfcca045afd 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -461,8 +461,10 @@ struct pci_dev {
> >  	atomic_t	pri_ref_cnt;	/* Number of PF/VF PRI users */
> >  #endif
> >  #ifdef CONFIG_PCI_PASID
> > +	struct mutex	pasid_lock;	/* PASID enable lock */
> 
> I think these locks are finer-grained than necessary.  I'm not sure
> it's worth having two mutexes for every device (one for PRI and
> another for PASID).  Is there really a performance benefit for having
> two?
The performance benefit should be minimal. But PRI and PASID are functionally
independent, so I don't think it's correct to protect their resources with
a common lock. Let me know your comments.
> 
> Does it (or do they) need to be in struct pci_dev?  You only use the PF
> mutexes, so maybe it could be in the struct pci_sriov, which I think
> is only one per PF.
It's possible to move it to the pci_sriov structure. But is that the right
place for it? This lock is only used for protecting PRI and PASID feature
updates, and PRI/PASID do not depend on the IOV feature. Let me know your
comments.

If you want to move this lock to the pci_sriov structure and use one lock
for both PRI and PASID, then the implementation would look like the following.
We could create physfn lock/unlock functions in include/linux/pci.h, similar
to the pci_physfn() function.

#ifdef CONFIG_PCI_IOV
static inline void pci_physfn_reslock(struct pci_dev *dev)
{
    struct pci_dev *pf = pci_physfn(dev);

    if (!pf->is_physfn)
        return;

    mutex_lock(&pf->sriov->reslock);

}
#else
static inline void pci_physfn_reslock(struct pci_dev *dev) {}
#endif

> 
> >  	u16		pasid_cap;	/* PASID Capability offset */
> >  	u16		pasid_features;
> > +	atomic_t	pasid_ref_cnt;	/* Number of VFs with PASID enabled */
> >  #endif
> >  #ifdef CONFIG_PCI_P2PDMA
> >  	struct pci_p2pdma *p2pdma;
> > -- 
> > 2.21.0
> > 

-- 
Sathyanarayanan Kuppuswamy
Linux kernel developer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 2/7] PCI/ATS: Initialize PRI in pci_ats_init()
  2019-08-02  0:05 ` [PATCH v5 2/7] PCI/ATS: Initialize PRI in pci_ats_init() sathyanarayanan.kuppuswamy
  2019-08-12 20:04   ` Bjorn Helgaas
@ 2019-08-15  4:46   ` Bjorn Helgaas
  2019-08-15 17:30     ` Kuppuswamy Sathyanarayanan
  1 sibling, 1 reply; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-15  4:46 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Thu, Aug 01, 2019 at 05:05:59PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> Currently, PRI Capability checks are repeated across all PRI API's.
> Instead, cache the capability check result in pci_pri_init() and use it
> in other PRI API's. Also, since PRI is a shared resource between PF/VF,
> initialize default values for common PRI features in pci_pri_init().
> 
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
>  drivers/pci/ats.c       | 80 ++++++++++++++++++++++++++++-------------
>  include/linux/pci-ats.h |  5 +++
>  include/linux/pci.h     |  1 +
>  3 files changed, 61 insertions(+), 25 deletions(-)
> 

> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index cdd936d10f68..280be911f190 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c

> @@ -28,6 +28,8 @@ void pci_ats_init(struct pci_dev *dev)
>  		return;
>  
>  	dev->ats_cap = pos;
> +
> +	pci_pri_init(dev);
>  }
>  
>  /**
> @@ -170,36 +172,72 @@ int pci_ats_page_aligned(struct pci_dev *pdev)
>  EXPORT_SYMBOL_GPL(pci_ats_page_aligned);
>  
>  #ifdef CONFIG_PCI_PRI
> +
> +void pci_pri_init(struct pci_dev *pdev)
> +{
> ...
> +}

> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index 1a0bdaee2f32..33653d4ca94f 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -6,6 +6,7 @@
>  
>  #ifdef CONFIG_PCI_PRI
>  
> +void pci_pri_init(struct pci_dev *pdev);

pci_pri_init() is implemented and called in drivers/pci/ats.c.  Unless
there's a need to call this from outside ats.c, it should be static
and should not be declared here.

If you can make it static, please also reorder the code so you don't
need a forward declaration in ats.c.
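
A minimal user-space sketch of that layout, for illustration only. The
struct pci_dev below is a cut-down stand-in rather than the kernel type,
and the capability offsets are made up (the real code would look them up
with pci_find_ext_capability()). It shows pci_pri_init() as a static
function defined ahead of its only caller, with an empty stub in the same
file, so the call site needs neither an #ifdef nor a forward declaration.

```c
#include <stdint.h>

/* Cut-down stand-in for the kernel's struct pci_dev (assumption). */
struct pci_dev {
	uint16_t ats_cap;
	uint16_t pri_cap;
	unsigned int is_virtfn:1;
};

#define CONFIG_PCI_PRI 1	/* remove this line to build the empty stub */

#ifdef CONFIG_PCI_PRI
/* Defined before its only caller, so no forward declaration is needed. */
static void pci_pri_init(struct pci_dev *pdev)
{
	/* VFs do not implement PRI; they use the PF's capability. */
	if (pdev->is_virtfn)
		return;

	pdev->pri_cap = 0x100;	/* hypothetical offset, not read from hardware */
}
#else
/* Empty stub keeps the call site below free of #ifdefs. */
static inline void pci_pri_init(struct pci_dev *pdev) { }
#endif

void pci_ats_init(struct pci_dev *dev)
{
	dev->ats_cap = 0x80;	/* hypothetical offset, not read from hardware */

	pci_pri_init(dev);
}
```

With this arrangement nothing about pci_pri_init() has to appear in
include/linux/pci-ats.h or drivers/pci/pci.h.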

>  int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
>  void pci_disable_pri(struct pci_dev *pdev);
>  void pci_restore_pri_state(struct pci_dev *pdev);
> @@ -13,6 +14,10 @@ int pci_reset_pri(struct pci_dev *pdev);
>  
>  #else /* CONFIG_PCI_PRI */
>  
> +static inline void pci_pri_init(struct pci_dev *pdev)
> +{
> +}
> +
>  static inline int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>  {
>  	return -ENODEV;

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 3/7] PCI/ATS: Initialize PASID in pci_ats_init()
  2019-08-02  0:06 ` [PATCH v5 3/7] PCI/ATS: Initialize PASID " sathyanarayanan.kuppuswamy
  2019-08-12 20:04   ` Bjorn Helgaas
@ 2019-08-15  4:48   ` Bjorn Helgaas
  2019-08-15  4:56   ` Bjorn Helgaas
  2 siblings, 0 replies; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-15  4:48 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Thu, Aug 01, 2019 at 05:06:00PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> Currently, PASID Capability checks are repeated across all PASID API's.
> Instead, cache the capability check result in pci_pasid_init() and use
> it in other PASID API's. Also, since PASID is a shared resource between
> PF/VF, initialize PASID features with default values in pci_pasid_init().
> 
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
>  drivers/pci/ats.c       | 74 +++++++++++++++++++++++++++++------------
>  include/linux/pci-ats.h |  5 +++
>  include/linux/pci.h     |  1 +
>  3 files changed, 59 insertions(+), 21 deletions(-)
> 

> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 280be911f190..1f4be27a071d 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -30,6 +30,8 @@ void pci_ats_init(struct pci_dev *dev)
>  	dev->ats_cap = pos;
>  
>  	pci_pri_init(dev);
> +
> +	pci_pasid_init(dev);
>  }
>  
>  /**
> @@ -315,6 +317,40 @@ EXPORT_SYMBOL_GPL(pci_reset_pri);
>  #endif /* CONFIG_PCI_PRI */
>  
>  #ifdef CONFIG_PCI_PASID
> +
> +void pci_pasid_init(struct pci_dev *pdev)
> +{
> ...
> +}

> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index 33653d4ca94f..bc7f815d38ff 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -40,6 +40,7 @@ static inline int pci_reset_pri(struct pci_dev *pdev)
>  
>  #ifdef CONFIG_PCI_PASID
>  
> +void pci_pasid_init(struct pci_dev *pdev);

This also looks like it should be static in ats.c.

>  int pci_enable_pasid(struct pci_dev *pdev, int features);
>  void pci_disable_pasid(struct pci_dev *pdev);
>  void pci_restore_pasid_state(struct pci_dev *pdev);
> @@ -48,6 +49,10 @@ int pci_max_pasids(struct pci_dev *pdev);
>  
>  #else  /* CONFIG_PCI_PASID */
>  
> +static inline void pci_pasid_init(struct pci_dev *pdev)
> +{
> +}
> +
>  static inline int pci_enable_pasid(struct pci_dev *pdev, int features)
>  {
>  	return -EINVAL;

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 3/7] PCI/ATS: Initialize PASID in pci_ats_init()
  2019-08-02  0:06 ` [PATCH v5 3/7] PCI/ATS: Initialize PASID " sathyanarayanan.kuppuswamy
  2019-08-12 20:04   ` Bjorn Helgaas
  2019-08-15  4:48   ` Bjorn Helgaas
@ 2019-08-15  4:56   ` Bjorn Helgaas
  2019-08-15 17:31     ` Kuppuswamy Sathyanarayanan
  2 siblings, 1 reply; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-15  4:56 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Thu, Aug 01, 2019 at 05:06:00PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> Currently, PASID Capability checks are repeated across all PASID API's.
> Instead, cache the capability check result in pci_pasid_init() and use
> it in other PASID API's. Also, since PASID is a shared resource between
> PF/VF, initialize PASID features with default values in pci_pasid_init().
> 
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

> + * TODO: Since PASID is a shared resource between PF/VF, don't update
> + * PASID features in the same API as a per device feature.

This comment is slightly misleading (at least, it misled *me* :))
because it hints that PASID might be specific to SR-IOV.  But I don't
think that's true, so if you keep a comment like this, please reword
it along the lines of "for SR-IOV devices, the PF's PASID is shared
between the PF and all VFs" so it leaves open the possibility of
non-SR-IOV devices using PASID as well.

Bjorn

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 5/7] PCI/ATS: Add PASID support for PCIe VF devices
  2019-08-13 22:19     ` Kuppuswamy Sathyanarayanan
@ 2019-08-15  5:04       ` Bjorn Helgaas
  2019-08-16  1:21         ` Kuppuswamy Sathyanarayanan
  0 siblings, 1 reply; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-15  5:04 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Tue, Aug 13, 2019 at 03:19:58PM -0700, Kuppuswamy Sathyanarayanan wrote:
> On Mon, Aug 12, 2019 at 03:05:08PM -0500, Bjorn Helgaas wrote:
> > On Thu, Aug 01, 2019 at 05:06:02PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > 
> > > When IOMMU tries to enable PASID for VF device in
> > > iommu_enable_dev_iotlb(), it always fails because PASID support for PCIe
> > > VF device is currently broken in PCIE driver. Current implementation
> > > expects the given PCIe device (PF & VF) to implement PASID capability
> > > before enabling the PASID support. But this assumption is incorrect. As
> > > per PCIe spec r4.0, sec 9.3.7.14, all VFs associated with PF can only
> > > use the PASID of the PF and not implement it.
> > > 
> > > Also, since PASID is a shared resource between PF/VF, following rules
> > > should apply.
> > > 
> > > 1. Use proper locking before accessing/modifying PF resources in VF
> > >    PASID enable/disable call.
> > > 2. Use reference count logic to track the usage of PASID resource.
> > > 3. Disable PASID only if the PASID reference count (pasid_ref_cnt) is zero.
> > > 
> > > Cc: Ashok Raj <ashok.raj@intel.com>
> > > Cc: Keith Busch <keith.busch@intel.com>
> > > Suggested-by: Ashok Raj <ashok.raj@intel.com>
> > > Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > ---
> > >  drivers/pci/ats.c   | 113 ++++++++++++++++++++++++++++++++++----------
> > >  include/linux/pci.h |   2 +
> > >  2 files changed, 90 insertions(+), 25 deletions(-)
> > > 
> > > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> > > index 079dc5444444..9384afd7d00e 100644
> > > --- a/drivers/pci/ats.c
> > > +++ b/drivers/pci/ats.c
> > > @@ -402,6 +402,8 @@ void pci_pasid_init(struct pci_dev *pdev)
> > >  	if (pdev->is_virtfn)
> > >  		return;
> > >  
> > > +	mutex_init(&pdev->pasid_lock);
> > > +
> > >  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> > >  	if (!pos)
> > >  		return;
> > > @@ -436,32 +438,57 @@ void pci_pasid_init(struct pci_dev *pdev)
> > >  int pci_enable_pasid(struct pci_dev *pdev, int features)
> > >  {
> > >  	u16 control, supported;
> > > +	int ret = 0;
> > > +	struct pci_dev *pf = pci_physfn(pdev);
> > >  
> > > -	if (WARN_ON(pdev->pasid_enabled))
> > > -		return -EBUSY;
> > > +	mutex_lock(&pf->pasid_lock);
> > >  
> > > -	if (!pdev->eetlp_prefix_path)
> > > -		return -EINVAL;
> > > +	if (WARN_ON(pdev->pasid_enabled)) {
> > > +		ret = -EBUSY;
> > > +		goto pasid_unlock;
> > > +	}
> > >  
> > > -	if (!pdev->pasid_cap)
> > > -		return -EINVAL;
> > > +	if (!pdev->eetlp_prefix_path) {
> > > +		ret = -EINVAL;
> > > +		goto pasid_unlock;
> > > +	}
> > >  
> > > -	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
> > > -			     &supported);
> > > +	if (!pf->pasid_cap) {
> > > +		ret = -EINVAL;
> > > +		goto pasid_unlock;
> > > +	}
> > > +
> > > +	if (pdev->is_virtfn && pf->pasid_enabled)
> > > +		goto update_status;
> > > +
> > > +	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP, &supported);
> > >  	supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
> > >  
> > >  	/* User wants to enable anything unsupported? */
> > > -	if ((supported & features) != features)
> > > -		return -EINVAL;
> > > +	if ((supported & features) != features) {
> > > +		ret = -EINVAL;
> > > +		goto pasid_unlock;
> > > +	}
> > >  
> > >  	control = PCI_PASID_CTRL_ENABLE | features;
> > > -	pdev->pasid_features = features;
> > > -
> > > +	pf->pasid_features = features;
> > >  	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
> > >  
> > > -	pdev->pasid_enabled = 1;
> > > +	/*
> > > +	 * If PASID is not already enabled in PF, increment pasid_ref_cnt
> > > +	 * to count PF PASID usage.
> > > +	 */
> > > +	if (pdev->is_virtfn && !pf->pasid_enabled) {
> > > +		atomic_inc(&pf->pasid_ref_cnt);
> > > +		pf->pasid_enabled = 1;
> > > +	}
> > >  
> > > -	return 0;
> > > +update_status:
> > > +	atomic_inc(&pf->pasid_ref_cnt);
> > > +	pdev->pasid_enabled = 1;
> > > +pasid_unlock:
> > > +	mutex_unlock(&pf->pasid_lock);
> > > +	return ret;
> > >  }
> > >  EXPORT_SYMBOL_GPL(pci_enable_pasid);
> > >  
> > > @@ -472,16 +499,29 @@ EXPORT_SYMBOL_GPL(pci_enable_pasid);
> > >  void pci_disable_pasid(struct pci_dev *pdev)
> > >  {
> > >  	u16 control = 0;
> > > +	struct pci_dev *pf = pci_physfn(pdev);
> > > +
> > > +	mutex_lock(&pf->pasid_lock);
> > >  
> > >  	if (WARN_ON(!pdev->pasid_enabled))
> > > -		return;
> > > +		goto pasid_unlock;
> > >  
> > > -	if (!pdev->pasid_cap)
> > > -		return;
> > > +	if (!pf->pasid_cap)
> > > +		goto pasid_unlock;
> > >  
> > > -	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
> > > +	atomic_dec(&pf->pasid_ref_cnt);
> > >  
> > > +	if (atomic_read(&pf->pasid_ref_cnt))
> > > +		goto done;
> > > +
> > > +	/* Disable PASID only if pasid_ref_cnt is zero */
> > > +	pci_write_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, control);
> > > +
> > > +done:
> > >  	pdev->pasid_enabled = 0;
> > > +pasid_unlock:
> > > +	mutex_unlock(&pf->pasid_lock);
> > > +
> > >  }
> > >  EXPORT_SYMBOL_GPL(pci_disable_pasid);
> > >  
> > > @@ -492,15 +532,25 @@ EXPORT_SYMBOL_GPL(pci_disable_pasid);
> > >  void pci_restore_pasid_state(struct pci_dev *pdev)
> > >  {
> > >  	u16 control;
> > > +	struct pci_dev *pf = pci_physfn(pdev);
> > >  
> > >  	if (!pdev->pasid_enabled)
> > >  		return;
> > >  
> > > -	if (!pdev->pasid_cap)
> > > +	if (!pf->pasid_cap)
> > >  		return;
> > >  
> > > +	mutex_lock(&pf->pasid_lock);
> > > +
> > > +	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, &control);
> > > +	if (control & PCI_PASID_CTRL_ENABLE)
> > > +		goto pasid_unlock;
> > > +
> > >  	control = PCI_PASID_CTRL_ENABLE | pdev->pasid_features;
> > > -	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
> > > +	pci_write_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, control);
> > > +
> > > +pasid_unlock:
> > > +	mutex_unlock(&pf->pasid_lock);
> > >  }
> > >  EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
> > >  
> > > @@ -517,15 +567,22 @@ EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
> > >  int pci_pasid_features(struct pci_dev *pdev)
> > >  {
> > >  	u16 supported;
> > > +	struct pci_dev *pf = pci_physfn(pdev);
> > > +
> > > +	mutex_lock(&pf->pasid_lock);
> > >  
> > > -	if (!pdev->pasid_cap)
> > > +	if (!pf->pasid_cap) {
> > > +		mutex_unlock(&pf->pasid_lock);
> > >  		return -EINVAL;
> > > +	}
> > >  
> > > -	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
> > > +	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP,
> > >  			     &supported);
> > >  
> > >  	supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
> > >  
> > > +	mutex_unlock(&pf->pasid_lock);
> > > +
> > >  	return supported;
> > >  }
> > >  EXPORT_SYMBOL_GPL(pci_pasid_features);
> > > @@ -579,15 +636,21 @@ EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
> > >  int pci_max_pasids(struct pci_dev *pdev)
> > >  {
> > >  	u16 supported;
> > > +	struct pci_dev *pf = pci_physfn(pdev);
> > > +
> > > +	mutex_lock(&pf->pasid_lock);
> > >  
> > > -	if (!pdev->pasid_cap)
> > > +	if (!pf->pasid_cap) {
> > > +		mutex_unlock(&pf->pasid_lock);
> > >  		return -EINVAL;
> > > +	}
> > >  
> > > -	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
> > > -			     &supported);
> > > +	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP, &supported);
> > >  
> > >  	supported = (supported & PASID_NUMBER_MASK) >> PASID_NUMBER_SHIFT;
> > >  
> > > +	mutex_unlock(&pf->pasid_lock);
> > > +
> > >  	return (1 << supported);
> > >  }
> > >  EXPORT_SYMBOL_GPL(pci_max_pasids);
> > > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > > index 3c9c4c82be27..4bfcca045afd 100644
> > > --- a/include/linux/pci.h
> > > +++ b/include/linux/pci.h
> > > @@ -461,8 +461,10 @@ struct pci_dev {
> > >  	atomic_t	pri_ref_cnt;	/* Number of PF/VF PRI users */
> > >  #endif
> > >  #ifdef CONFIG_PCI_PASID
> > > +	struct mutex	pasid_lock;	/* PASID enable lock */
> > 
> > I think these locks are finer-grained than necessary.  I'm not sure
> > it's worth having two mutexes for every device (one for PRI and
> > another for PASID).  Is there really a performance benefit for having
> > two?

> The performance benefit should be minimal. But PRI and PASID are functionally
> independent, so I don't think it's correct to protect their resources with
> a common lock. Let me know your comments.

I'm not an expert on PRI and PASID, but if we can figure out a place
to put it and a way to manage it, I think it's OK to have a lock that
protects both.  I'm thinking about the size of the pci_dev -- I'm not
sure the benefit of having two locks is commensurate with the size
cost.

> > Does it (or do they) need to be in struct pci_dev?  You only use the PF
> > mutexes, so maybe it could be in the struct pci_sriov, which I think
> > is only one per PF.

> It's possible to move it to the pci_sriov structure. But is that the right
> place for it? This lock is only used for protecting PRI and PASID feature
> updates, and PRI/PASID do not depend on the IOV feature. Let me know your
> comments.

Hmm.  I misunderstood the use of these.  I had the impression they
were only used for PFs.  If that were the case, pci_sriov might make
sense because we only allocate that for PFs (when we enable SR-IOV in
sriov_init()).  But IIUC that's *not* the case: even non-SR-IOV
devices can use PRI/PASID; it's just that if a *VF* uses them, the VF
is actually using the PRI/PASID of the PF.

> If you want to move this lock to the pci_sriov structure and use one lock
> for both PRI and PASID, then the implementation would look like the following.
> We could create physfn lock/unlock functions in include/linux/pci.h, similar
> to the pci_physfn() function.

> #ifdef CONFIG_PCI_IOV
> static inline void pci_physfn_reslock(struct pci_dev *dev)
> {
>     struct pci_dev *pf = pci_physfn(dev);
> 
>     if (!pf->is_physfn)
>         return;
> 
>     mutex_lock(&pf->sriov->reslock);
> 
> }
> #else
> static inline void pci_physfn_reslock(struct pci_dev *dev) {}
> #endif

Yeah, that's not a pretty solution.  IIUC, we don't need to lock at
all for non-SR-IOV devices, because we're operating on our own device
and nobody else should be touching it.  Right?

Only the SR-IOV case (operating on a PF with SR-IOV enabled or on one
of its VFs) needs locking because these are all sharing one resource.

So it's kind of a shame to allocate the lock for *every* pci_dev, when
we only need it for PFs with SR-IOV enabled.
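
To make the idea concrete, here is a rough user-space model, not kernel
code. Everything in it is an assumption for illustration: mock_mutex
stands in for struct mutex, the "reslock" member of pci_sriov does not
exist today, and the reference counting is simplified to one count per
user instead of the scheme in the patch. It shows the lock living in the
per-PF SR-IOV structure, being skipped for non-SR-IOV devices, and the
shared enable bit being cleared only when the last user goes away.

```c
/* Stand-in for struct mutex so the sketch builds outside the kernel. */
struct mock_mutex { int held; };
static void mock_lock(struct mock_mutex *m)   { m->held = 1; }
static void mock_unlock(struct mock_mutex *m) { m->held = 0; }

/* Allocated only for PFs with SR-IOV enabled; "reslock" is hypothetical. */
struct pci_sriov { struct mock_mutex reslock; };

struct pci_dev {
	struct pci_dev *physfn;		/* non-NULL for VFs only */
	struct pci_sriov *sriov;	/* non-NULL for SR-IOV PFs only */
	int pasid_ref_cnt;		/* users sharing this function's PASID */
	int pasid_hw_enabled;		/* models PCI_PASID_CTRL_ENABLE */
	int pasid_enabled;		/* per-device software state */
};

/* Mirrors pci_physfn(): a VF maps to its PF, anything else to itself. */
static struct pci_dev *mock_physfn(struct pci_dev *dev)
{
	return dev->physfn ? dev->physfn : dev;
}

/* Lock only when the resource is really shared (SR-IOV PF or its VFs). */
static void res_lock(struct pci_dev *pf)
{
	if (pf->sriov)
		mock_lock(&pf->sriov->reslock);
}

static void res_unlock(struct pci_dev *pf)
{
	if (pf->sriov)
		mock_unlock(&pf->sriov->reslock);
}

int mock_enable_pasid(struct pci_dev *dev)
{
	struct pci_dev *pf = mock_physfn(dev);
	int ret = 0;

	res_lock(pf);
	if (dev->pasid_enabled) {
		ret = -1;			/* -EBUSY in the kernel */
		goto unlock;
	}
	if (pf->pasid_ref_cnt++ == 0)
		pf->pasid_hw_enabled = 1;	/* first user programs the PF */
	dev->pasid_enabled = 1;
unlock:
	res_unlock(pf);
	return ret;
}

void mock_disable_pasid(struct pci_dev *dev)
{
	struct pci_dev *pf = mock_physfn(dev);

	res_lock(pf);
	if (dev->pasid_enabled) {
		if (--pf->pasid_ref_cnt == 0)
			pf->pasid_hw_enabled = 0;	/* last user turns it off */
		dev->pasid_enabled = 0;
	}
	res_unlock(pf);
}
```

In this model a plain device with no SR-IOV never touches a lock at all,
and struct pci_dev grows by nothing beyond the existing sriov pointer.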

Bjorn

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 2/7] PCI/ATS: Initialize PRI in pci_ats_init()
  2019-08-15  4:46   ` Bjorn Helgaas
@ 2019-08-15 17:30     ` Kuppuswamy Sathyanarayanan
  2019-08-16 17:31       ` Bjorn Helgaas
  0 siblings, 1 reply; 36+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2019-08-15 17:30 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Wed, Aug 14, 2019 at 11:46:57PM -0500, Bjorn Helgaas wrote:
> On Thu, Aug 01, 2019 at 05:05:59PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > 
> > Currently, PRI Capability checks are repeated across all PRI API's.
> > Instead, cache the capability check result in pci_pri_init() and use it
> > in other PRI API's. Also, since PRI is a shared resource between PF/VF,
> > initialize default values for common PRI features in pci_pri_init().
> > 
> > Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > ---
> >  drivers/pci/ats.c       | 80 ++++++++++++++++++++++++++++-------------
> >  include/linux/pci-ats.h |  5 +++
> >  include/linux/pci.h     |  1 +
> >  3 files changed, 61 insertions(+), 25 deletions(-)
> > 
> 
> > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> > index cdd936d10f68..280be911f190 100644
> > --- a/drivers/pci/ats.c
> > +++ b/drivers/pci/ats.c
> 
> > @@ -28,6 +28,8 @@ void pci_ats_init(struct pci_dev *dev)
> >  		return;
> >  
> >  	dev->ats_cap = pos;
> > +
> > +	pci_pri_init(dev);
> >  }
> >  
> >  /**
> > @@ -170,36 +172,72 @@ int pci_ats_page_aligned(struct pci_dev *pdev)
> >  EXPORT_SYMBOL_GPL(pci_ats_page_aligned);
> >  
> >  #ifdef CONFIG_PCI_PRI
> > +
> > +void pci_pri_init(struct pci_dev *pdev)
> > +{
> > ...
> > +}
> 
> > diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> > index 1a0bdaee2f32..33653d4ca94f 100644
> > --- a/include/linux/pci-ats.h
> > +++ b/include/linux/pci-ats.h
> > @@ -6,6 +6,7 @@
> >  
> >  #ifdef CONFIG_PCI_PRI
> >  
> > +void pci_pri_init(struct pci_dev *pdev);
> 
> pci_pri_init() is implemented and called in drivers/pci/ats.c.  Unless
> there's a need to call this from outside ats.c, it should be static
> and should not be declared here.
> 
> If you can make it static, please also reorder the code so you don't
> need a forward declaration in ats.c.
Initially I did implement it as a static function in drivers/pci/ats.c
and protected the call to pci_pri_init() with #ifdef CONFIG_PCI_PRI.
But Keith did not like the implementation using #ifdefs and asked me to
define empty functions. That's the reason for moving it to the header file.
In your previous review of this patch, since it is not used outside ats.c,
you asked me to move the declaration to drivers/pci/pci.h. So I was planning
to move it to drivers/pci/pci.h in the next version. Let me know if you
agree.
> 
> >  int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
> >  void pci_disable_pri(struct pci_dev *pdev);
> >  void pci_restore_pri_state(struct pci_dev *pdev);
> > @@ -13,6 +14,10 @@ int pci_reset_pri(struct pci_dev *pdev);
> >  
> >  #else /* CONFIG_PCI_PRI */
> >  
> > +static inline void pci_pri_init(struct pci_dev *pdev)
> > +{
> > +}
> > +
> >  static inline int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
> >  {
> >  	return -ENODEV;

-- 
Sathyanarayanan Kuppuswamy
Linux kernel developer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 3/7] PCI/ATS: Initialize PASID in pci_ats_init()
  2019-08-15  4:56   ` Bjorn Helgaas
@ 2019-08-15 17:31     ` Kuppuswamy Sathyanarayanan
  0 siblings, 0 replies; 36+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2019-08-15 17:31 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Wed, Aug 14, 2019 at 11:56:59PM -0500, Bjorn Helgaas wrote:
> On Thu, Aug 01, 2019 at 05:06:00PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > 
> > Currently, PASID Capability checks are repeated across all PASID API's.
> > Instead, cache the capability check result in pci_pasid_init() and use
> > it in other PASID API's. Also, since PASID is a shared resource between
> > PF/VF, initialize PASID features with default values in pci_pasid_init().
> > 
> > Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> > + * TODO: Since PASID is a shared resource between PF/VF, don't update
> > + * PASID features in the same API as a per device feature.
> 
> This comment is slightly misleading (at least, it misled *me* :))
> because it hints that PASID might be specific to SR-IOV.  But I don't
> think that's true, so if you keep a comment like this, please reword
> it along the lines of "for SR-IOV devices, the PF's PASID is shared
> between the PF and all VFs" so it leaves open the possibility of
> non-SR-IOV devices using PASID as well.
OK. I will fix it in the next version.
> 
> Bjorn

-- 
Sathyanarayanan Kuppuswamy
Linux kernel developer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices
  2019-08-02  0:06 ` [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices sathyanarayanan.kuppuswamy
  2019-08-12 20:04   ` Bjorn Helgaas
  2019-08-13  4:16   ` Bjorn Helgaas
@ 2019-08-15 22:20   ` Bjorn Helgaas
  2019-08-15 22:39     ` Kuppuswamy Sathyanarayanan
  2 siblings, 1 reply; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-15 22:20 UTC (permalink / raw)
  To: sathyanarayanan.kuppuswamy
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch, Joerg Roedel,
	David Woodhouse, iommu

[+cc Joerg, David, iommu list: because IOMMU drivers are the only
callers of pci_enable_pri() and pci_enable_pasid()]

On Thu, Aug 01, 2019 at 05:06:01PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> When IOMMU tries to enable Page Request Interface (PRI) for VF device
> in iommu_enable_dev_iotlb(), it always fails because PRI support for
> PCIe VF device is currently broken. Current implementation expects
> the given PCIe device (PF & VF) to implement PRI capability before
> enabling the PRI support. But this assumption is incorrect. As per PCIe
> spec r4.0, sec 9.3.7.11, all VFs associated with PF can only use the
> PRI of the PF and not implement it. Hence we need to create exception
> for handling the PRI support for PCIe VF device.
> 
> Also, since PRI is a shared resource between PF/VF, following rules
> should apply.
> 
> 1. Use proper locking before accessing/modifying PF resources in VF
>    PRI enable/disable call.
> 2. Use reference count logic to track the usage of PRI resource.
> 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.

Wait, why do we need this at all?  I agree the spec says VFs may not
implement PRI or PASID capabilities and that VFs use the PRI and
PASID of the PF.

But why do we need to support pci_enable_pri() and pci_enable_pasid()
for VFs?  There's nothing interesting we can *do* in the VF, and
passing it off to the PF adds all this locking mess.  For VFs, can we
just make them do nothing or return -EINVAL?  What functionality would
we be missing if we did that?

(Obviously returning -EINVAL would require tweaks in the callers to
either avoid the call for VFs or handle the -EINVAL gracefully.)
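
A rough user-space sketch of what that alternative could look like,
purely as an illustration. The struct pci_dev here is a stand-in, and
both mock_enable_pri() and mock_dev_can_use_pri() are hypothetical names,
not existing kernel or IOMMU driver functions. The enable path refuses
VFs outright, and the caller decides whether a VF can use PRI by looking
at the state of its PF.

```c
#include <errno.h>

/* Cut-down stand-in for the kernel's struct pci_dev (assumption). */
struct pci_dev {
	struct pci_dev *physfn;		/* PF of a VF, NULL otherwise */
	unsigned int is_virtfn:1;
	unsigned int pri_enabled:1;
};

/* Models pci_enable_pri() if it simply rejected VFs. */
int mock_enable_pri(struct pci_dev *pdev)
{
	if (pdev->is_virtfn)
		return -EINVAL;		/* VFs have no PRI capability */
	if (pdev->pri_enabled)
		return -EBUSY;

	pdev->pri_enabled = 1;		/* the kernel would program PCI_PRI_CTRL */
	return 0;
}

/* Hypothetical caller-side check: returns 1 if the device can use PRI. */
int mock_dev_can_use_pri(struct pci_dev *pdev)
{
	/* A VF uses the PF's PRI, so only the PF's state matters. */
	if (pdev->is_virtfn)
		return pdev->physfn->pri_enabled;

	if (pdev->pri_enabled)
		return 1;

	return mock_enable_pri(pdev) == 0;
}
```

Under this model the PF's PRI has to be enabled before any of its VFs
can rely on it, and no locking or reference counting is needed in the
enable path.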

> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Suggested-by: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
>  drivers/pci/ats.c   | 143 ++++++++++++++++++++++++++++++++++----------
>  include/linux/pci.h |   2 +
>  2 files changed, 112 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 1f4be27a071d..079dc5444444 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -189,6 +189,8 @@ void pci_pri_init(struct pci_dev *pdev)
>  	if (pdev->is_virtfn)
>  		return;
>  
> +	mutex_init(&pdev->pri_lock);
> +
>  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>  	if (!pos)
>  		return;
> @@ -221,29 +223,57 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>  {
>  	u16 control, status;
>  	u32 max_requests;
> +	int ret = 0;
> +	struct pci_dev *pf = pci_physfn(pdev);
>  
> -	if (WARN_ON(pdev->pri_enabled))
> -		return -EBUSY;
> +	mutex_lock(&pf->pri_lock);
>  
> -	if (!pdev->pri_cap)
> -		return -EINVAL;
> +	if (WARN_ON(pdev->pri_enabled)) {
> +		ret = -EBUSY;
> +		goto pri_unlock;
> +	}
>  
> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
> -	if (!(status & PCI_PRI_STATUS_STOPPED))
> -		return -EBUSY;
> +	if (!pf->pri_cap) {
> +		ret = -EINVAL;
> +		goto pri_unlock;
> +	}
> +
> +	if (pdev->is_virtfn && pf->pri_enabled)
> +		goto update_status;
> +
> +	/*
> +	 * Before updating PRI registers, make sure there is no
> +	 * outstanding PRI requests.
> +	 */
> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, &status);
> +	if (!(status & PCI_PRI_STATUS_STOPPED)) {
> +		ret = -EBUSY;
> +		goto pri_unlock;
> +	}
>  
> -	pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,
> -			      &max_requests);
> +	pci_read_config_dword(pf, pf->pri_cap + PCI_PRI_MAX_REQ, &max_requests);
>  	reqs = min(max_requests, reqs);
> -	pdev->pri_reqs_alloc = reqs;
> -	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> +	pf->pri_reqs_alloc = reqs;
> +	pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
>  
>  	control = PCI_PRI_CTRL_ENABLE;
> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
>  
> -	pdev->pri_enabled = 1;
> +	/*
> +	 * If PRI is not already enabled in PF, increment the PF
> +	 * pri_ref_cnt to track the usage of PRI interface.
> +	 */
> +	if (pdev->is_virtfn && !pf->pri_enabled) {
> +		atomic_inc(&pf->pri_ref_cnt);
> +		pf->pri_enabled = 1;
> +	}
>  
> -	return 0;
> +update_status:
> +	atomic_inc(&pf->pri_ref_cnt);
> +	pdev->pri_enabled = 1;
> +pri_unlock:
> +	mutex_unlock(&pf->pri_lock);
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(pci_enable_pri);
>  
> @@ -256,18 +286,30 @@ EXPORT_SYMBOL_GPL(pci_enable_pri);
>  void pci_disable_pri(struct pci_dev *pdev)
>  {
>  	u16 control;
> +	struct pci_dev *pf = pci_physfn(pdev);
>  
> -	if (WARN_ON(!pdev->pri_enabled))
> -		return;
> +	mutex_lock(&pf->pri_lock);
>  
> -	if (!pdev->pri_cap)
> -		return;
> +	if (WARN_ON(!pdev->pri_enabled) || !pf->pri_cap)
> +		goto pri_unlock;
> +
> +	atomic_dec(&pf->pri_ref_cnt);
>  
> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, &control);
> +	/*
> +	 * If pri_ref_cnt is not zero, then don't modify hardware
> +	 * registers.
> +	 */
> +	if (atomic_read(&pf->pri_ref_cnt))
> +		goto done;
> +
> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, &control);
>  	control &= ~PCI_PRI_CTRL_ENABLE;
> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
>  
> +done:
>  	pdev->pri_enabled = 0;
> +pri_unlock:
> +	mutex_unlock(&pf->pri_lock);
>  }
>  EXPORT_SYMBOL_GPL(pci_disable_pri);
>  
> @@ -277,17 +319,31 @@ EXPORT_SYMBOL_GPL(pci_disable_pri);
>   */
>  void pci_restore_pri_state(struct pci_dev *pdev)
>  {
> -	u16 control = PCI_PRI_CTRL_ENABLE;
> -	u32 reqs = pdev->pri_reqs_alloc;
> +	u16 control;
> +	u32 reqs;
> +	struct pci_dev *pf = pci_physfn(pdev);
>  
>  	if (!pdev->pri_enabled)
>  		return;
>  
> -	if (!pdev->pri_cap)
> +	if (!pf->pri_cap)
>  		return;
>  
> -	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
> +	mutex_lock(&pf->pri_lock);
> +
> +	/* If PRI is already enabled by other VF's or PF, return */
> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, &control);
> +	if (control & PCI_PRI_CTRL_ENABLE)
> +		goto pri_unlock;
> +
> +	reqs = pf->pri_reqs_alloc;
> +	control = PCI_PRI_CTRL_ENABLE;
> +
> +	pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
> +
> +pri_unlock:
> +	mutex_unlock(&pf->pri_lock);
>  }
>  EXPORT_SYMBOL_GPL(pci_restore_pri_state);
>  
> @@ -300,18 +356,32 @@ EXPORT_SYMBOL_GPL(pci_restore_pri_state);
>   */
>  int pci_reset_pri(struct pci_dev *pdev)
>  {
> +	struct pci_dev *pf = pci_physfn(pdev);
>  	u16 control;
> +	int ret = 0;
>  
> -	if (WARN_ON(pdev->pri_enabled))
> -		return -EBUSY;
> +	mutex_lock(&pf->pri_lock);
>  
> -	if (!pdev->pri_cap)
> -		return -EINVAL;
> +	if (WARN_ON(pdev->pri_enabled)) {
> +		ret = -EBUSY;
> +		goto done;
> +	}
> +
> +	if (!pf->pri_cap) {
> +		ret = -EINVAL;
> +		goto done;
> +	}
> +
> +	/* If PRI is already enabled by other VF's or PF, return 0 */
> +	if (pf->pri_enabled)
> +		goto done;
>  
>  	control = PCI_PRI_CTRL_RESET;
> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>  
> -	return 0;
> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
> +done:
> +	mutex_unlock(&pf->pri_lock);
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(pci_reset_pri);
>  #endif /* CONFIG_PCI_PRI */
> @@ -475,11 +545,18 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
>  int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>  {
>  	u16 status;
> +	struct pci_dev *pf = pci_physfn(pdev);
> +
> +	mutex_lock(&pf->pri_lock);
>  
> -	if (!pdev->pri_cap)
> +	if (!pf->pri_cap) {
> +		mutex_unlock(&pf->pri_lock);
>  		return 0;
> +	}
> +
> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, &status);
>  
> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
> +	mutex_unlock(&pf->pri_lock);
>  
>  	if (status & PCI_PRI_STATUS_PASID)
>  		return 1;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 27224c0db849..3c9c4c82be27 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -455,8 +455,10 @@ struct pci_dev {
>  	atomic_t	ats_ref_cnt;	/* Number of VFs with ATS enabled */
>  #endif
>  #ifdef CONFIG_PCI_PRI
> +	struct mutex	pri_lock;	/* PRI enable lock */
>  	u16		pri_cap;	/* PRI Capability offset */
>  	u32		pri_reqs_alloc; /* Number of PRI requests allocated */
> +	atomic_t	pri_ref_cnt;	/* Number of PF/VF PRI users */
>  #endif
>  #ifdef CONFIG_PCI_PASID
>  	u16		pasid_cap;	/* PASID Capability offset */
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices
  2019-08-15 22:20   ` Bjorn Helgaas
@ 2019-08-15 22:39     ` Kuppuswamy Sathyanarayanan
  2019-08-19 14:15       ` Bjorn Helgaas
  0 siblings, 1 reply; 36+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2019-08-15 22:39 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch, Joerg Roedel,
	David Woodhouse, iommu


On 8/15/19 3:20 PM, Bjorn Helgaas wrote:
> [+cc Joerg, David, iommu list: because IOMMU drivers are the only
> callers of pci_enable_pri() and pci_enable_pasid()]
>
> On Thu, Aug 01, 2019 at 05:06:01PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
>> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>
>> When IOMMU tries to enable Page Request Interface (PRI) for VF device
>> in iommu_enable_dev_iotlb(), it always fails because PRI support for
>> PCIe VF device is currently broken. Current implementation expects
>> the given PCIe device (PF & VF) to implement PRI capability before
>> enabling the PRI support. But this assumption is incorrect. As per PCIe
>> spec r4.0, sec 9.3.7.11, all VFs associated with PF can only use the
>> PRI of the PF and not implement it. Hence we need to create exception
>> for handling the PRI support for PCIe VF device.
>>
>> Also, since PRI is a shared resource between PF/VF, following rules
>> should apply.
>>
>> 1. Use proper locking before accessing/modifying PF resources in VF
>>     PRI enable/disable call.
>> 2. Use reference count logic to track the usage of PRI resource.
>> 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.
> Wait, why do we need this at all?  I agree the spec says VFs may not
> implement PRI or PASID capabilities and that VFs use the PRI and
> PASID of the PF.
>
> But why do we need to support pci_enable_pri() and pci_enable_pasid()
> for VFs?  There's nothing interesting we can *do* in the VF, and
> passing it off to the PF adds all this locking mess.  For VFs, can we
> just make them do nothing or return -EINVAL?  What functionality would
> we be missing if we did that?

Currently, PRI/PASID capabilities are not enabled by default. The IOMMU
can enable PRI/PASID for a VF first (without enabling it for the PF). In
that case, doing nothing for the VF device will break the functionality.

Also, PRI/PASID config options like "PRI Outstanding Page Request
Allocation", "PASID Execute Permission", or "PASID Privileged Mode" are
currently configured as per device feature. Hence there is a chance for
the VF and PF to use different values for these options.
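
To make the "VF enabled first" case concrete, here is a minimal userspace
sketch of reference-counted enable/disable on a resource owned by the PF.
It is a simplified model, not the logic of the patch under review: the
names pf_model, fn_model, model_enable_pri() and model_disable_pri() are
hypothetical, a pthread mutex stands in for the PF-side lock, and a bool
stands in for the Enable bit in the PF's PRI control register.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct pf_model {
	pthread_mutex_t lock;	/* stands in for the PF-side lock */
	int users;		/* functions (PF or VFs) currently using PRI */
	bool hw_enabled;	/* models the Enable bit in the PF's PRI control */
};

struct fn_model {
	struct pf_model *pf;	/* shared state owned by the PF */
	bool enabled;		/* per-function software state */
};

/* Enable PRI for one function; the first user programs the hardware. */
static int model_enable_pri(struct fn_model *fn)
{
	struct pf_model *pf = fn->pf;
	int ret = 0;

	pthread_mutex_lock(&pf->lock);
	if (fn->enabled) {
		ret = -1;	/* already enabled for this function */
		goto unlock;
	}
	if (pf->users++ == 0)
		pf->hw_enabled = true;
	fn->enabled = true;
unlock:
	pthread_mutex_unlock(&pf->lock);
	return ret;
}

/* Disable PRI for one function; only the last user touches the hardware. */
static void model_disable_pri(struct fn_model *fn)
{
	struct pf_model *pf = fn->pf;

	pthread_mutex_lock(&pf->lock);
	if (!fn->enabled)
		goto unlock;
	fn->enabled = false;
	if (--pf->users == 0)
		pf->hw_enabled = false;
unlock:
	pthread_mutex_unlock(&pf->lock);
}
```

In this model a VF can be the first caller and still end up with the
shared hardware enabled, and the hardware stays enabled until the last
user has called the disable routine.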

> (Obviously returning -EINVAL would require tweaks in the callers to
> either avoid the call for VFs or handle the -EINVAL gracefully.)
>
>> Cc: Ashok Raj <ashok.raj@intel.com>
>> Cc: Keith Busch <keith.busch@intel.com>
>> Suggested-by: Ashok Raj <ashok.raj@intel.com>
>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>> ---
>>   drivers/pci/ats.c   | 143 ++++++++++++++++++++++++++++++++++----------
>>   include/linux/pci.h |   2 +
>>   2 files changed, 112 insertions(+), 33 deletions(-)
>>
>> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
>> index 1f4be27a071d..079dc5444444 100644
>> --- a/drivers/pci/ats.c
>> +++ b/drivers/pci/ats.c
>> @@ -189,6 +189,8 @@ void pci_pri_init(struct pci_dev *pdev)
>>   	if (pdev->is_virtfn)
>>   		return;
>>   
>> +	mutex_init(&pdev->pri_lock);
>> +
>>   	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>>   	if (!pos)
>>   		return;
>> @@ -221,29 +223,57 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>>   {
>>   	u16 control, status;
>>   	u32 max_requests;
>> +	int ret = 0;
>> +	struct pci_dev *pf = pci_physfn(pdev);
>>   
>> -	if (WARN_ON(pdev->pri_enabled))
>> -		return -EBUSY;
>> +	mutex_lock(&pf->pri_lock);
>>   
>> -	if (!pdev->pri_cap)
>> -		return -EINVAL;
>> +	if (WARN_ON(pdev->pri_enabled)) {
>> +		ret = -EBUSY;
>> +		goto pri_unlock;
>> +	}
>>   
>> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
>> -	if (!(status & PCI_PRI_STATUS_STOPPED))
>> -		return -EBUSY;
>> +	if (!pf->pri_cap) {
>> +		ret = -EINVAL;
>> +		goto pri_unlock;
>> +	}
>> +
>> +	if (pdev->is_virtfn && pf->pri_enabled)
>> +		goto update_status;
>> +
>> +	/*
>> +	 * Before updating PRI registers, make sure there is no
>> +	 * outstanding PRI requests.
>> +	 */
>> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, &status);
>> +	if (!(status & PCI_PRI_STATUS_STOPPED)) {
>> +		ret = -EBUSY;
>> +		goto pri_unlock;
>> +	}
>>   
>> -	pci_read_config_dword(pdev, pdev->pri_cap + PCI_PRI_MAX_REQ,
>> -			      &max_requests);
>> +	pci_read_config_dword(pf, pf->pri_cap + PCI_PRI_MAX_REQ, &max_requests);
>>   	reqs = min(max_requests, reqs);
>> -	pdev->pri_reqs_alloc = reqs;
>> -	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
>> +	pf->pri_reqs_alloc = reqs;
>> +	pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
>>   
>>   	control = PCI_PRI_CTRL_ENABLE;
>> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
>>   
>> -	pdev->pri_enabled = 1;
>> +	/*
>> +	 * If PRI is not already enabled in PF, increment the PF
>> +	 * pri_ref_cnt to track the usage of PRI interface.
>> +	 */
>> +	if (pdev->is_virtfn && !pf->pri_enabled) {
>> +		atomic_inc(&pf->pri_ref_cnt);
>> +		pf->pri_enabled = 1;
>> +	}
>>   
>> -	return 0;
>> +update_status:
>> +	atomic_inc(&pf->pri_ref_cnt);
>> +	pdev->pri_enabled = 1;
>> +pri_unlock:
>> +	mutex_unlock(&pf->pri_lock);
>> +	return ret;
>>   }
>>   EXPORT_SYMBOL_GPL(pci_enable_pri);
>>   
>> @@ -256,18 +286,30 @@ EXPORT_SYMBOL_GPL(pci_enable_pri);
>>   void pci_disable_pri(struct pci_dev *pdev)
>>   {
>>   	u16 control;
>> +	struct pci_dev *pf = pci_physfn(pdev);
>>   
>> -	if (WARN_ON(!pdev->pri_enabled))
>> -		return;
>> +	mutex_lock(&pf->pri_lock);
>>   
>> -	if (!pdev->pri_cap)
>> -		return;
>> +	if (WARN_ON(!pdev->pri_enabled) || !pf->pri_cap)
>> +		goto pri_unlock;
>> +
>> +	atomic_dec(&pf->pri_ref_cnt);
>>   
>> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, &control);
>> +	/*
>> +	 * If pri_ref_cnt is not zero, then don't modify hardware
>> +	 * registers.
>> +	 */
>> +	if (atomic_read(&pf->pri_ref_cnt))
>> +		goto done;
>> +
>> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, &control);
>>   	control &= ~PCI_PRI_CTRL_ENABLE;
>> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
>>   
>> +done:
>>   	pdev->pri_enabled = 0;
>> +pri_unlock:
>> +	mutex_unlock(&pf->pri_lock);
>>   }
>>   EXPORT_SYMBOL_GPL(pci_disable_pri);
>>   
>> @@ -277,17 +319,31 @@ EXPORT_SYMBOL_GPL(pci_disable_pri);
>>    */
>>   void pci_restore_pri_state(struct pci_dev *pdev)
>>   {
>> -	u16 control = PCI_PRI_CTRL_ENABLE;
>> -	u32 reqs = pdev->pri_reqs_alloc;
>> +	u16 control;
>> +	u32 reqs;
>> +	struct pci_dev *pf = pci_physfn(pdev);
>>   
>>   	if (!pdev->pri_enabled)
>>   		return;
>>   
>> -	if (!pdev->pri_cap)
>> +	if (!pf->pri_cap)
>>   		return;
>>   
>> -	pci_write_config_dword(pdev, pdev->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
>> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>> +	mutex_lock(&pf->pri_lock);
>> +
>> +	/* If PRI is already enabled by other VF's or PF, return */
>> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, &control);
>> +	if (control & PCI_PRI_CTRL_ENABLE)
>> +		goto pri_unlock;
>> +
>> +	reqs = pf->pri_reqs_alloc;
>> +	control = PCI_PRI_CTRL_ENABLE;
>> +
>> +	pci_write_config_dword(pf, pf->pri_cap + PCI_PRI_ALLOC_REQ, reqs);
>> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
>> +
>> +pri_unlock:
>> +	mutex_unlock(&pf->pri_lock);
>>   }
>>   EXPORT_SYMBOL_GPL(pci_restore_pri_state);
>>   
>> @@ -300,18 +356,32 @@ EXPORT_SYMBOL_GPL(pci_restore_pri_state);
>>    */
>>   int pci_reset_pri(struct pci_dev *pdev)
>>   {
>> +	struct pci_dev *pf = pci_physfn(pdev);
>>   	u16 control;
>> +	int ret = 0;
>>   
>> -	if (WARN_ON(pdev->pri_enabled))
>> -		return -EBUSY;
>> +	mutex_lock(&pf->pri_lock);
>>   
>> -	if (!pdev->pri_cap)
>> -		return -EINVAL;
>> +	if (WARN_ON(pdev->pri_enabled)) {
>> +		ret = -EBUSY;
>> +		goto done;
>> +	}
>> +
>> +	if (!pf->pri_cap) {
>> +		ret = -EINVAL;
>> +		goto done;
>> +	}
>> +
>> +	/* If PRI is already enabled by other VF's or PF, return 0 */
>> +	if (pf->pri_enabled)
>> +		goto done;
>>   
>>   	control = PCI_PRI_CTRL_RESET;
>> -	pci_write_config_word(pdev, pdev->pri_cap + PCI_PRI_CTRL, control);
>>   
>> -	return 0;
>> +	pci_write_config_word(pf, pf->pri_cap + PCI_PRI_CTRL, control);
>> +done:
>> +	mutex_unlock(&pf->pri_lock);
>> +	return ret;
>>   }
>>   EXPORT_SYMBOL_GPL(pci_reset_pri);
>>   #endif /* CONFIG_PCI_PRI */
>> @@ -475,11 +545,18 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
>>   int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>>   {
>>   	u16 status;
>> +	struct pci_dev *pf = pci_physfn(pdev);
>> +
>> +	mutex_lock(&pf->pri_lock);
>>   
>> -	if (!pdev->pri_cap)
>> +	if (!pf->pri_cap) {
>> +		mutex_unlock(&pf->pri_lock);
>>   		return 0;
>> +	}
>> +
>> +	pci_read_config_word(pf, pf->pri_cap + PCI_PRI_STATUS, &status);
>>   
>> -	pci_read_config_word(pdev, pdev->pri_cap + PCI_PRI_STATUS, &status);
>> +	mutex_unlock(&pf->pri_lock);
>>   
>>   	if (status & PCI_PRI_STATUS_PASID)
>>   		return 1;
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 27224c0db849..3c9c4c82be27 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -455,8 +455,10 @@ struct pci_dev {
>>   	atomic_t	ats_ref_cnt;	/* Number of VFs with ATS enabled */
>>   #endif
>>   #ifdef CONFIG_PCI_PRI
>> +	struct mutex	pri_lock;	/* PRI enable lock */
>>   	u16		pri_cap;	/* PRI Capability offset */
>>   	u32		pri_reqs_alloc; /* Number of PRI requests allocated */
>> +	atomic_t	pri_ref_cnt;	/* Number of PF/VF PRI users */
>>   #endif
>>   #ifdef CONFIG_PCI_PASID
>>   	u16		pasid_cap;	/* PASID Capability offset */
>> -- 
>> 2.21.0
>>
-- 
Sathyanarayanan Kuppuswamy
Linux kernel developer


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 5/7] PCI/ATS: Add PASID support for PCIe VF devices
  2019-08-15  5:04       ` Bjorn Helgaas
@ 2019-08-16  1:21         ` Kuppuswamy Sathyanarayanan
  0 siblings, 0 replies; 36+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2019-08-16  1:21 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

[-- Attachment #1: Type: text/plain, Size: 11796 bytes --]

On Thu, Aug 15, 2019 at 12:04:30AM -0500, Bjorn Helgaas wrote:
> On Tue, Aug 13, 2019 at 03:19:58PM -0700, Kuppuswamy Sathyanarayanan wrote:
> > On Mon, Aug 12, 2019 at 03:05:08PM -0500, Bjorn Helgaas wrote:
> > > On Thu, Aug 01, 2019 at 05:06:02PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > > 
> > > > When IOMMU tries to enable PASID for VF device in
> > > > iommu_enable_dev_iotlb(), it always fails because PASID support for PCIe
> > > > VF device is currently broken in PCIE driver. Current implementation
> > > > expects the given PCIe device (PF & VF) to implement PASID capability
> > > > before enabling the PASID support. But this assumption is incorrect. As
> > > > per PCIe spec r4.0, sec 9.3.7.14, all VFs associated with PF can only
> > > > use the PASID of the PF and not implement it.
> > > > 
> > > > Also, since PASID is a shared resource between PF/VF, following rules
> > > > should apply.
> > > > 
> > > > 1. Use proper locking before accessing/modifying PF resources in VF
> > > >    PASID enable/disable call.
> > > > 2. Use reference count logic to track the usage of PASID resource.
> > > > 3. Disable PASID only if the PASID reference count (pasid_ref_cnt) is zero.
> > > > 
> > > > Cc: Ashok Raj <ashok.raj@intel.com>
> > > > Cc: Keith Busch <keith.busch@intel.com>
> > > > Suggested-by: Ashok Raj <ashok.raj@intel.com>
> > > > Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > > ---
> > > >  drivers/pci/ats.c   | 113 ++++++++++++++++++++++++++++++++++----------
> > > >  include/linux/pci.h |   2 +
> > > >  2 files changed, 90 insertions(+), 25 deletions(-)
> > > > 
> > > > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> > > > index 079dc5444444..9384afd7d00e 100644
> > > > --- a/drivers/pci/ats.c
> > > > +++ b/drivers/pci/ats.c
> > > > @@ -402,6 +402,8 @@ void pci_pasid_init(struct pci_dev *pdev)
> > > >  	if (pdev->is_virtfn)
> > > >  		return;
> > > >  
> > > > +	mutex_init(&pdev->pasid_lock);
> > > > +
> > > >  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> > > >  	if (!pos)
> > > >  		return;
> > > > @@ -436,32 +438,57 @@ void pci_pasid_init(struct pci_dev *pdev)
> > > >  int pci_enable_pasid(struct pci_dev *pdev, int features)
> > > >  {
> > > >  	u16 control, supported;
> > > > +	int ret = 0;
> > > > +	struct pci_dev *pf = pci_physfn(pdev);
> > > >  
> > > > -	if (WARN_ON(pdev->pasid_enabled))
> > > > -		return -EBUSY;
> > > > +	mutex_lock(&pf->pasid_lock);
> > > >  
> > > > -	if (!pdev->eetlp_prefix_path)
> > > > -		return -EINVAL;
> > > > +	if (WARN_ON(pdev->pasid_enabled)) {
> > > > +		ret = -EBUSY;
> > > > +		goto pasid_unlock;
> > > > +	}
> > > >  
> > > > -	if (!pdev->pasid_cap)
> > > > -		return -EINVAL;
> > > > +	if (!pdev->eetlp_prefix_path) {
> > > > +		ret = -EINVAL;
> > > > +		goto pasid_unlock;
> > > > +	}
> > > >  
> > > > -	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
> > > > -			     &supported);
> > > > +	if (!pf->pasid_cap) {
> > > > +		ret = -EINVAL;
> > > > +		goto pasid_unlock;
> > > > +	}
> > > > +
> > > > +	if (pdev->is_virtfn && pf->pasid_enabled)
> > > > +		goto update_status;
> > > > +
> > > > +	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP, &supported);
> > > >  	supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
> > > >  
> > > >  	/* User wants to enable anything unsupported? */
> > > > -	if ((supported & features) != features)
> > > > -		return -EINVAL;
> > > > +	if ((supported & features) != features) {
> > > > +		ret = -EINVAL;
> > > > +		goto pasid_unlock;
> > > > +	}
> > > >  
> > > >  	control = PCI_PASID_CTRL_ENABLE | features;
> > > > -	pdev->pasid_features = features;
> > > > -
> > > > +	pf->pasid_features = features;
> > > >  	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
> > > >  
> > > > -	pdev->pasid_enabled = 1;
> > > > +	/*
> > > > +	 * If PASID is not already enabled in PF, increment pasid_ref_cnt
> > > > +	 * to count PF PASID usage.
> > > > +	 */
> > > > +	if (pdev->is_virtfn && !pf->pasid_enabled) {
> > > > +		atomic_inc(&pf->pasid_ref_cnt);
> > > > +		pf->pasid_enabled = 1;
> > > > +	}
> > > >  
> > > > -	return 0;
> > > > +update_status:
> > > > +	atomic_inc(&pf->pasid_ref_cnt);
> > > > +	pdev->pasid_enabled = 1;
> > > > +pasid_unlock:
> > > > +	mutex_unlock(&pf->pasid_lock);
> > > > +	return ret;
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(pci_enable_pasid);
> > > >  
> > > > @@ -472,16 +499,29 @@ EXPORT_SYMBOL_GPL(pci_enable_pasid);
> > > >  void pci_disable_pasid(struct pci_dev *pdev)
> > > >  {
> > > >  	u16 control = 0;
> > > > +	struct pci_dev *pf = pci_physfn(pdev);
> > > > +
> > > > +	mutex_lock(&pf->pasid_lock);
> > > >  
> > > >  	if (WARN_ON(!pdev->pasid_enabled))
> > > > -		return;
> > > > +		goto pasid_unlock;
> > > >  
> > > > -	if (!pdev->pasid_cap)
> > > > -		return;
> > > > +	if (!pf->pasid_cap)
> > > > +		goto pasid_unlock;
> > > >  
> > > > -	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
> > > > +	atomic_dec(&pf->pasid_ref_cnt);
> > > >  
> > > > +	if (atomic_read(&pf->pasid_ref_cnt))
> > > > +		goto done;
> > > > +
> > > > +	/* Disable PASID only if pasid_ref_cnt is zero */
> > > > +	pci_write_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, control);
> > > > +
> > > > +done:
> > > >  	pdev->pasid_enabled = 0;
> > > > +pasid_unlock:
> > > > +	mutex_unlock(&pf->pasid_lock);
> > > > +
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(pci_disable_pasid);
> > > >  
> > > > @@ -492,15 +532,25 @@ EXPORT_SYMBOL_GPL(pci_disable_pasid);
> > > >  void pci_restore_pasid_state(struct pci_dev *pdev)
> > > >  {
> > > >  	u16 control;
> > > > +	struct pci_dev *pf = pci_physfn(pdev);
> > > >  
> > > >  	if (!pdev->pasid_enabled)
> > > >  		return;
> > > >  
> > > > -	if (!pdev->pasid_cap)
> > > > +	if (!pf->pasid_cap)
> > > >  		return;
> > > >  
> > > > +	mutex_lock(&pf->pasid_lock);
> > > > +
> > > > +	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, &control);
> > > > +	if (control & PCI_PASID_CTRL_ENABLE)
> > > > +		goto pasid_unlock;
> > > > +
> > > >  	control = PCI_PASID_CTRL_ENABLE | pdev->pasid_features;
> > > > -	pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control);
> > > > +	pci_write_config_word(pf, pf->pasid_cap + PCI_PASID_CTRL, control);
> > > > +
> > > > +pasid_unlock:
> > > > +	mutex_unlock(&pf->pasid_lock);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
> > > >  
> > > > @@ -517,15 +567,22 @@ EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
> > > >  int pci_pasid_features(struct pci_dev *pdev)
> > > >  {
> > > >  	u16 supported;
> > > > +	struct pci_dev *pf = pci_physfn(pdev);
> > > > +
> > > > +	mutex_lock(&pf->pasid_lock);
> > > >  
> > > > -	if (!pdev->pasid_cap)
> > > > +	if (!pf->pasid_cap) {
> > > > +		mutex_unlock(&pf->pasid_lock);
> > > >  		return -EINVAL;
> > > > +	}
> > > >  
> > > > -	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
> > > > +	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP,
> > > >  			     &supported);
> > > >  
> > > >  	supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
> > > >  
> > > > +	mutex_unlock(&pf->pasid_lock);
> > > > +
> > > >  	return supported;
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(pci_pasid_features);
> > > > @@ -579,15 +636,21 @@ EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
> > > >  int pci_max_pasids(struct pci_dev *pdev)
> > > >  {
> > > >  	u16 supported;
> > > > +	struct pci_dev *pf = pci_physfn(pdev);
> > > > +
> > > > +	mutex_lock(&pf->pasid_lock);
> > > >  
> > > > -	if (!pdev->pasid_cap)
> > > > +	if (!pf->pasid_cap) {
> > > > +		mutex_unlock(&pf->pasid_lock);
> > > >  		return -EINVAL;
> > > > +	}
> > > >  
> > > > -	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP,
> > > > -			     &supported);
> > > > +	pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP, &supported);
> > > >  
> > > >  	supported = (supported & PASID_NUMBER_MASK) >> PASID_NUMBER_SHIFT;
> > > >  
> > > > +	mutex_unlock(&pf->pasid_lock);
> > > > +
> > > >  	return (1 << supported);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(pci_max_pasids);
> > > > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > > > index 3c9c4c82be27..4bfcca045afd 100644
> > > > --- a/include/linux/pci.h
> > > > +++ b/include/linux/pci.h
> > > > @@ -461,8 +461,10 @@ struct pci_dev {
> > > >  	atomic_t	pri_ref_cnt;	/* Number of PF/VF PRI users */
> > > >  #endif
> > > >  #ifdef CONFIG_PCI_PASID
> > > > +	struct mutex	pasid_lock;	/* PASID enable lock */
> > > 
> > > I think these locks are finer-grained than necessary.  I'm not sure
> > > it's worth having two mutexes for every device (one for PRI and
> > > another for PASID).  Is there really a performance benefit for having
> > > two?
> 
> > Performance benefit should be minimal. But, PRI and PASID are functionally
> > independent. So I don't think its correct to protect its resources with
> > a common lock. Let me know your comments.
> 
> I'm not an expert on PRI and PASID, but if we can figure out a place
> to put it and a way to manage it, I think it's OK to have a lock that
> protects both.  I'm thinking about the size of the pci_dev -- I'm not
> sure the benefit of having two locks is commensurate with the size
> cost.
> 
> > > Do it (or do they) need to be in struct pci_dev?  You only use the PF
> > > mutexes, so maybe it could be in the struct pci_sriov, which I think
> > > is only one per PF.
> 
> > Its possible to move it to pci_sriov structure. But is that the right
> > place for it? This lock is only used for protecting PRI and PASID feature
> > updates and PRI/PASID are not dependent on IOV feature. Let me know your
> > comments.
> 
> Hmm.  I misunderstood the use of these.  I had the impression they
> were only used for PFs.  If that were the case, pci_sriov might make
> sense because we only allocate that for PFs (when we enable SR-IOV in
> sriov_init()).  But IIUC that's *not* the case: even non-SR-IOV
> devices can use PRI/PASID; it's just that if a *VF* uses them, the VF
> is actually using the PRI of the PF.
Yes, your current interpretation is correct. Even non SR-IOV devices can
use PRI/PASID. But the race condition issue only exists in SR-IOV
(PF/VF) devices.
> 
> > If you want to move this lock to pci_sriov structure and use one lock
> > for both PRI/PASID, then the implementation would look like following. We
> > could create physfn lock/unlock functions in include/linux/pci.h similar
> > to pci_physfn() function.
> 
> > #ifdef CONFIG_PCI_IOV
> > static inline void pci_physfn_reslock(struct pci_dev *dev)
> > {
> >     struct pci_dev *pf = pci_physfn(dev);
> > 
> >     if (!pf->is_physfn)
> >         return;
> > 
> >     mutex_lock(&pf->sriov->reslock);
> > 
> > }
> > #else
> > static inline void pci_physfn_reslock(struct pci_dev *dev) {}; 
> > #endif
> 
> Yeah, that's not a pretty solution.  IIUC, we don't need to lock at
> all for non-SR-IOV devices, because we're operating on our own device
> and nobody else should be touching it.  Right?
Yes, we don't need to lock for non-SR-IOV devices.
> 
> Only the SR-IOV case (operating on a PF with SR-IOV enabled or on one
> of its VFs) needs locking because these are all sharing one resource.
> 
> So it's kind of a shame to allocate the lock for *every* pci_dev, when
> we only need it for PFs with SR-IOV enabled.
If not the pci_dev structure, then the next appropriate place to add this
lock is struct pci_sriov.
Since the issue is specific to SR-IOV devices, even if PASID/PRI has no
dependency on SR-IOV, I think we can add the reslock to the pci_sriov
structure. Please check the attached patch for a sample implementation.
> 
> Bjorn

-- 
Sathyanarayanan Kuppuswamy
Linux kernel developer

[-- Attachment #2: 0001-PCI-IOV-Add-pci_physfn_reslock-unlock-interfaces.patch --]
[-- Type: text/x-diff, Size: 3787 bytes --]

From 7ef4ea0e5ef761286602daac5b6913ad610e37ce Mon Sep 17 00:00:00 2001
From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Date: Thu, 15 Aug 2019 17:34:06 -0700
Subject: [PATCH v1 1/1] PCI/IOV: Add pci_physfn_reslock/unlock() interfaces

As per PCIe spec r5.0, sec 9.3.7, in SR-IOV devices, capabilities like
PASID, PRI, VC, etc are shared between PF and its associated VFs. So, to
prevent race conditions between PF/VF while updating configuration
registers of these shared capabilities, a new synchronization mechanism
is required.

As a first step, create a shared resource lock and expose the
pci_physfn_reslock/unlock() APIs. Users of these shared capabilities can
use these lock/unlock interfaces to synchronize their access.

Since the shared capability is always implemented by PF, reslock mutex
has been added to pci_sriov structure which only exists for PF.

NOTE: Currently this reslock is common for all capabilities shared
between PF/VF. In the future, if any performance impact is noticed, we
should create individual locks for each of the shared capabilities.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
 drivers/pci/iov.c |  1 +
 drivers/pci/pci.h | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 525fd3f272b3..004e7076b065 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -507,6 +507,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
 	else
 		iov->dev = dev;
 
+	mutex_init(&iov->reslock);
 	dev->sriov = iov;
 	dev->is_physfn = 1;
 	rc = compute_max_vf_buses(dev);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index a0941ade88eb..512d286ed8d6 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -304,6 +304,19 @@ struct pci_sriov {
 	u16		subsystem_device; /* VF subsystem device */
 	resource_size_t	barsz[PCI_SRIOV_NUM_BARS];	/* VF BAR size */
 	bool		drivers_autoprobe; /* Auto probing of VFs by driver */
+	/*
+	 * reslock mutex is used for synchronizing updates to resources
+	 * shared between PF and all associated VFs. For example, in
+	 * SR-IOV devices, PRI and PASID interfaces are shared between
+	 * PF and all VFs, and hence we need a proper locking mechanism to
+	 * prevent the PF and a VF from updating the PRI or PASID
+	 * configuration registers at the same time.
+	 * NOTE: Currently, this lock is shared by all capabilities that
+	 * have shared resources between PF and VFs. If there is any
+	 * performance impact then perhaps we need to create a separate
+	 * lock for each independent capability that has shared resources.
+	 */
+	struct mutex	reslock;	/* PF/VF shared resource lock */
 };
 
 /**
@@ -449,6 +462,27 @@ void pci_iov_update_resource(struct pci_dev *dev, int resno);
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno);
 void pci_restore_iov_state(struct pci_dev *dev);
 int pci_iov_bus_range(struct pci_bus *bus);
+static inline void pci_physfn_reslock(struct pci_dev *dev)
+{
+	struct pci_dev *pf = pci_physfn(dev);
+
+	/* For non SRIOV devices, locking is not needed */
+	if (!pf->is_physfn)
+		return;
+
+	mutex_lock(&pf->sriov->reslock);
+}
+
+static inline void pci_physfn_resunlock(struct pci_dev *dev)
+{
+	struct pci_dev *pf = pci_physfn(dev);
+
+	/* For non SRIOV devices, reslock is never held */
+	if (!pf->is_physfn)
+		return;
+
+	mutex_unlock(&pf->sriov->reslock);
+}
 
 #else
 static inline int pci_iov_init(struct pci_dev *dev)
@@ -469,6 +503,12 @@ static inline int pci_iov_bus_range(struct pci_bus *bus)
 {
 	return 0;
 }
+static inline void pci_physfn_reslock(struct pci_dev *dev)
+{
+}
+static inline void pci_physfn_resunlock(struct pci_dev *dev)
+{
+}
 
 #endif /* CONFIG_PCI_IOV */
 
-- 
2.21.0
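
For readers following the thread, the locking and reference-count rules listed in the PRI VF patch description can be sketched in user space roughly as follows. This is only an illustration, not the code from the patch: `struct pf_shared_sketch`, `shared_pri_get()` and `shared_pri_put()` are hypothetical names, a pthread mutex stands in for `iov->reslock`, and a boolean stands in for the PRI Enable bit in the PF.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PF-side state that all VFs share. */
struct pf_shared_sketch {
	pthread_mutex_t reslock;	/* plays the role of iov->reslock */
	unsigned int pri_ref_cnt;	/* functions currently using PRI */
	bool pri_enabled;		/* models the PRI Enable bit in the PF */
};

static void pf_shared_init(struct pf_shared_sketch *pf)
{
	pthread_mutex_init(&pf->reslock, NULL);
	pf->pri_ref_cnt = 0;
	pf->pri_enabled = false;
}

/* Called on behalf of the PF or any VF: the first user turns PRI on. */
static void shared_pri_get(struct pf_shared_sketch *pf)
{
	pthread_mutex_lock(&pf->reslock);
	if (pf->pri_ref_cnt++ == 0)
		pf->pri_enabled = true;
	pthread_mutex_unlock(&pf->reslock);
}

/*
 * Drop one reference; PRI is turned off only when the count reaches
 * zero. Returns 0 on success, -1 if there was no reference to drop.
 */
static int shared_pri_put(struct pf_shared_sketch *pf)
{
	int ret = 0;

	pthread_mutex_lock(&pf->reslock);
	if (pf->pri_ref_cnt == 0)
		ret = -1;
	else if (--pf->pri_ref_cnt == 0)
		pf->pri_enabled = false;
	pthread_mutex_unlock(&pf->reslock);

	return ret;
}
```

With this shape, the order in which the PF and the VFs call get/put does not matter, which is the property the reference count is meant to provide; the cost is the extra lock and counter that the review comments question.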



* Re: [PATCH v5 2/7] PCI/ATS: Initialize PRI in pci_ats_init()
  2019-08-15 17:30     ` Kuppuswamy Sathyanarayanan
@ 2019-08-16 17:31       ` Bjorn Helgaas
  0 siblings, 0 replies; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-16 17:31 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Thu, Aug 15, 2019 at 10:30:03AM -0700, Kuppuswamy Sathyanarayanan wrote:
> On Wed, Aug 14, 2019 at 11:46:57PM -0500, Bjorn Helgaas wrote:
> > On Thu, Aug 01, 2019 at 05:05:59PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > 
> > > Currently, PRI Capability checks are repeated across all PRI API's.
> > > Instead, cache the capability check result in pci_pri_init() and use it
> > > in other PRI API's. Also, since PRI is a shared resource between PF/VF,
> > > initialize default values for common PRI features in pci_pri_init().
> > > 
> > > Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > ---
> > >  drivers/pci/ats.c       | 80 ++++++++++++++++++++++++++++-------------
> > >  include/linux/pci-ats.h |  5 +++
> > >  include/linux/pci.h     |  1 +
> > >  3 files changed, 61 insertions(+), 25 deletions(-)
> > > 
> > 
> > > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> > > index cdd936d10f68..280be911f190 100644
> > > --- a/drivers/pci/ats.c
> > > +++ b/drivers/pci/ats.c
> > 
> > > @@ -28,6 +28,8 @@ void pci_ats_init(struct pci_dev *dev)
> > >  		return;
> > >  
> > >  	dev->ats_cap = pos;
> > > +
> > > +	pci_pri_init(dev);
> > >  }
> > >  
> > >  /**
> > > @@ -170,36 +172,72 @@ int pci_ats_page_aligned(struct pci_dev *pdev)
> > >  EXPORT_SYMBOL_GPL(pci_ats_page_aligned);
> > >  
> > >  #ifdef CONFIG_PCI_PRI
> > > +
> > > +void pci_pri_init(struct pci_dev *pdev)
> > > +{
> > > ...
> > > +}
> > 
> > > diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> > > index 1a0bdaee2f32..33653d4ca94f 100644
> > > --- a/include/linux/pci-ats.h
> > > +++ b/include/linux/pci-ats.h
> > > @@ -6,6 +6,7 @@
> > >  
> > >  #ifdef CONFIG_PCI_PRI
> > >  
> > > +void pci_pri_init(struct pci_dev *pdev);
> > 
> > pci_pri_init() is implemented and called in drivers/pci/ats.c.  Unless
> > there's a need to call this from outside ats.c, it should be static
> > and should not be declared here.
> > 
> > If you can make it static, please also reorder the code so you don't
> > need a forward declaration in ats.c.

> Initially I did implement it as static function in drivers/pci/ats.c
> and protected the calling of pci_pri_init() with #ifdef CONFIG_PCI_PRI.
> But Keith did not like the implementation using #ifdefs and asked me to
> define empty functions. That's the reason for moving it to header file.

Defining empty functions doesn't mean it has to be in a header file.
It's only needed inside ats.c, so the whole thing should be static
there.  You can easily #ifdef the implementation, e.g., do the
following in ats.c:

  static void pci_pri_init(struct pci_dev *pdev)
  {
  #ifdef CONFIG_PCI_PRI
    ...
  #endif
  }
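
A self-contained user-space sketch of the same pattern, in case it helps to see it compile. The names here are assumptions made for illustration only: `CONFIG_PCI_PRI_SKETCH` stands in for the real config option, `struct pci_dev_sketch` for `struct pci_dev`, and the value `0x100` for whatever capability offset a real lookup would return.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev; only the field used here. */
struct pci_dev_sketch {
	uint16_t pri_cap;	/* 0 means "no PRI capability cached" */
};

/*
 * The function always exists and is static, so callers need no #ifdef
 * and no header declaration. Only the body depends on the config.
 */
static void pri_init_sketch(struct pci_dev_sketch *pdev)
{
#ifdef CONFIG_PCI_PRI_SKETCH
	pdev->pri_cap = 0x100;	/* placeholder for a real capability lookup */
#else
	(void)pdev;		/* body compiled out */
#endif
}

static void ats_init_sketch(struct pci_dev_sketch *pdev)
{
	pdev->pri_cap = 0;
	pri_init_sketch(pdev);	/* unconditional call site */
}
```

Built without `-DCONFIG_PCI_PRI_SKETCH`, the call collapses to a no-op and `pri_cap` stays 0; built with it, the cached value is filled in.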


* Re: [PATCH v5 1/7] PCI/ATS: Fix pci_prg_resp_pasid_required() dependency issues
  2019-08-13  3:51       ` Bjorn Helgaas
@ 2019-08-16 18:06         ` Kuppuswamy Sathyanarayanan
  0 siblings, 0 replies; 36+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2019-08-16 18:06 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, ashok.raj, keith.busch

On Mon, Aug 12, 2019 at 10:51:48PM -0500, Bjorn Helgaas wrote:
> On Mon, Aug 12, 2019 at 01:20:55PM -0700, sathyanarayanan kuppuswamy wrote:
> > On 8/12/19 1:04 PM, Bjorn Helgaas wrote:
> > > On Thu, Aug 01, 2019 at 05:05:58PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > > 
> > > > Since pci_prg_resp_pasid_required() function has dependency on both
> > > > PASID and PRI, define it only if both CONFIG_PCI_PRI and
> > > > CONFIG_PCI_PASID config options are enabled.
> 
> > > I don't really like this.  It makes the #ifdefs more complicated and I
> > > don't think it really buys us anything.  Will anything break if we
> > > just drop this patch?
> 
> > Yes, this function uses "pri_lock" mutex which is only defined if
> > CONFIG_PCI_PRI is enabled. So not protecting this function within
> > CONFIG_PCI_PRI will lead to compilation issues.
> 
> Ah, OK.  That helps a lot.  "pri_lock" doesn't exist at this point in
> the series, so the patch makes no sense without knowing that.
> 
> I'm still not convinced this is the right thing because I'm not sure
> the lock is necessary.  I'll respond to the patch that adds the lock.
It's not only pri_lock. This function also uses "pri_cap", which is also
only defined for CONFIG_PCI_PRI. "pri_cap" is added by the next patch in
the series, which adds caching support for the PRI capability check. So
this patch is still required even if we remove the use of pri_lock in
this function.
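
To illustrate the dependency, here is a reduced sketch with made-up names: `SKETCH_PCI_PRI` and `SKETCH_PCI_PASID` stand in for the two config options, and `struct pri_dev_sketch` for `struct pci_dev`. It is not the kernel code, only the shape of the guard.

```c
/* Hypothetical switches standing in for the two kernel config options. */
#define SKETCH_PCI_PASID 1
/* SKETCH_PCI_PRI is intentionally left undefined in this sketch. */

struct pri_dev_sketch {
#ifdef SKETCH_PCI_PRI
	unsigned short pri_cap;	/* exists only when PRI support is built */
#endif
	int unused;		/* keeps the struct non-empty */
};

#if defined(SKETCH_PCI_PRI) && defined(SKETCH_PCI_PASID)
/* Real version: reads pri_cap, so it cannot build without PRI. */
static int prg_resp_pasid_required_sketch(struct pri_dev_sketch *pdev)
{
	return pdev->pri_cap != 0;
}
#else
/* Stub used when either option is missing. */
static inline int prg_resp_pasid_required_sketch(struct pri_dev_sketch *pdev)
{
	(void)pdev;
	return 0;
}
#endif
```

If the guard checked only the PASID switch, the first variant would be selected here and the build would fail on the missing `pri_cap` member.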
> 
> > > > Fixes: e5567f5f6762 ("PCI/ATS: Add pci_prg_resp_pasid_required()
> > > > interface.")
> > > > Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > > ---
> > > >   drivers/pci/ats.c       | 10 ++++++----
> > > >   include/linux/pci-ats.h | 12 +++++++++---
> > > >   2 files changed, 15 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> > > > index e18499243f84..cdd936d10f68 100644
> > > > --- a/drivers/pci/ats.c
> > > > +++ b/drivers/pci/ats.c
> > > > @@ -395,6 +395,8 @@ int pci_pasid_features(struct pci_dev *pdev)
> > > >   }
> > > >   EXPORT_SYMBOL_GPL(pci_pasid_features);
> > > > +#ifdef CONFIG_PCI_PRI
> > > > +
> > > >   /**
> > > >    * pci_prg_resp_pasid_required - Return PRG Response PASID Required bit
> > > >    *				 status.
> > > > @@ -402,10 +404,8 @@ EXPORT_SYMBOL_GPL(pci_pasid_features);
> > > >    *
> > > >    * Returns 1 if PASID is required in PRG Response Message, 0 otherwise.
> > > >    *
> > > > - * Even though the PRG response PASID status is read from PRI Status
> > > > - * Register, since this API will mainly be used by PASID users, this
> > > > - * function is defined within #ifdef CONFIG_PCI_PASID instead of
> > > > - * CONFIG_PCI_PRI.
> > > > + * Since this API has dependency on both PRI and PASID, protect it
> > > > + * with both CONFIG_PCI_PRI and CONFIG_PCI_PASID.
> > > >    */
> > > >   int pci_prg_resp_pasid_required(struct pci_dev *pdev)
> > > >   {
> > > > @@ -425,6 +425,8 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
> > > >   }
> > > >   EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
> > > > +#endif
> > > > +
> > > >   #define PASID_NUMBER_SHIFT	8
> > > >   #define PASID_NUMBER_MASK	(0x1f << PASID_NUMBER_SHIFT)
> > > >   /**
> > > > diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> > > > index 1ebb88e7c184..1a0bdaee2f32 100644
> > > > --- a/include/linux/pci-ats.h
> > > > +++ b/include/linux/pci-ats.h
> > > > @@ -40,7 +40,6 @@ void pci_disable_pasid(struct pci_dev *pdev);
> > > >   void pci_restore_pasid_state(struct pci_dev *pdev);
> > > >   int pci_pasid_features(struct pci_dev *pdev);
> > > >   int pci_max_pasids(struct pci_dev *pdev);
> > > > -int pci_prg_resp_pasid_required(struct pci_dev *pdev);
> > > >   #else  /* CONFIG_PCI_PASID */
> > > > @@ -67,11 +66,18 @@ static inline int pci_max_pasids(struct pci_dev *pdev)
> > > >   	return -EINVAL;
> > > >   }
> > > > +#endif /* CONFIG_PCI_PASID */
> > > > +
> > > > +#if defined(CONFIG_PCI_PRI) && defined(CONFIG_PCI_PASID)
> > > > +
> > > > +int pci_prg_resp_pasid_required(struct pci_dev *pdev);
> > > > +
> > > > +#else /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
> > > > +
> > > >   static inline int pci_prg_resp_pasid_required(struct pci_dev *pdev)
> > > >   {
> > > >   	return 0;
> > > >   }
> > > > -#endif /* CONFIG_PCI_PASID */
> > > > -
> > > > +#endif
> > > >   #endif /* LINUX_PCI_ATS_H*/
> > > > -- 
> > > > 2.21.0
> > > > 
> > -- 
> > Sathyanarayanan Kuppuswamy
> > Linux kernel developer
> > 

-- 
Sathyanarayanan Kuppuswamy
Linux kernel developer


* Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices
  2019-08-15 22:39     ` Kuppuswamy Sathyanarayanan
@ 2019-08-19 14:15       ` Bjorn Helgaas
  2019-08-19 22:53         ` Kuppuswamy Sathyanarayanan
  0 siblings, 1 reply; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-19 14:15 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch, Joerg Roedel,
	David Woodhouse, iommu

On Thu, Aug 15, 2019 at 03:39:03PM -0700, Kuppuswamy Sathyanarayanan wrote:
> On 8/15/19 3:20 PM, Bjorn Helgaas wrote:
> > [+cc Joerg, David, iommu list: because IOMMU drivers are the only
> > callers of pci_enable_pri() and pci_enable_pasid()]
> > 
> > On Thu, Aug 01, 2019 at 05:06:01PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > 
> > > When IOMMU tries to enable Page Request Interface (PRI) for VF device
> > > in iommu_enable_dev_iotlb(), it always fails because PRI support for
> > > PCIe VF device is currently broken. Current implementation expects
> > > the given PCIe device (PF & VF) to implement PRI capability before
> > > enabling the PRI support. But this assumption is incorrect. As per PCIe
> > > spec r4.0, sec 9.3.7.11, all VFs associated with PF can only use the
> > > PRI of the PF and not implement it. Hence we need to create exception
> > > for handling the PRI support for PCIe VF device.
> > > 
> > > Also, since PRI is a shared resource between PF/VF, following rules
> > > should apply.
> > > 
> > > 1. Use proper locking before accessing/modifying PF resources in VF
> > >     PRI enable/disable call.
> > > 2. Use reference count logic to track the usage of PRI resource.
> > > 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.

> > Wait, why do we need this at all?  I agree the spec says VFs may not
> > implement PRI or PASID capabilities and that VFs use the PRI and
> > PASID of the PF.
> > 
> > But why do we need to support pci_enable_pri() and pci_enable_pasid()
> > for VFs?  There's nothing interesting we can *do* in the VF, and
> > passing it off to the PF adds all this locking mess.  For VFs, can we
> > just make them do nothing or return -EINVAL?  What functionality would
> > we be missing if we did that?
> 
> Currently PRI/PASID capabilities are not enabled by default. IOMMU can
> enable PRI/PASID for VF first (and not enable it for PF). In this case,
> doing nothing for VF device will break the functionality.

What is the path where we can enable PRI/PASID for VF but not for the
PF?  The call chains leading to pci_enable_pri() go through the
iommu_ops.add_device interface, which makes me think this is part of
the device enumeration done by the PCI core, and in that case I would
think it should be done for the PF before VFs.  But maybe this
path isn't exercised until a driver does a DMA map or something
similar?

Bjorn


* Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices
  2019-08-19 14:15       ` Bjorn Helgaas
@ 2019-08-19 22:53         ` Kuppuswamy Sathyanarayanan
  2019-08-19 23:19           ` Bjorn Helgaas
  0 siblings, 1 reply; 36+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2019-08-19 22:53 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch, Joerg Roedel,
	David Woodhouse, iommu

On Mon, Aug 19, 2019 at 09:15:00AM -0500, Bjorn Helgaas wrote:
> On Thu, Aug 15, 2019 at 03:39:03PM -0700, Kuppuswamy Sathyanarayanan wrote:
> > On 8/15/19 3:20 PM, Bjorn Helgaas wrote:
> > > [+cc Joerg, David, iommu list: because IOMMU drivers are the only
> > > callers of pci_enable_pri() and pci_enable_pasid()]
> > > 
> > > On Thu, Aug 01, 2019 at 05:06:01PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > > 
> > > > When IOMMU tries to enable Page Request Interface (PRI) for VF device
> > > > in iommu_enable_dev_iotlb(), it always fails because PRI support for
> > > > PCIe VF device is currently broken. Current implementation expects
> > > > the given PCIe device (PF & VF) to implement PRI capability before
> > > > enabling the PRI support. But this assumption is incorrect. As per PCIe
> > > > spec r4.0, sec 9.3.7.11, all VFs associated with PF can only use the
> > > > PRI of the PF and not implement it. Hence we need to create exception
> > > > for handling the PRI support for PCIe VF device.
> > > > 
> > > > Also, since PRI is a shared resource between PF/VF, following rules
> > > > should apply.
> > > > 
> > > > 1. Use proper locking before accessing/modifying PF resources in VF
> > > >     PRI enable/disable call.
> > > > 2. Use reference count logic to track the usage of PRI resource.
> > > > 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.
> 
> > > Wait, why do we need this at all?  I agree the spec says VFs may not
> > > implement PRI or PASID capabilities and that VFs use the PRI and
> > > PASID of the PF.
> > > 
> > > But why do we need to support pci_enable_pri() and pci_enable_pasid()
> > > for VFs?  There's nothing interesting we can *do* in the VF, and
> > > passing it off to the PF adds all this locking mess.  For VFs, can we
> > > just make them do nothing or return -EINVAL?  What functionality would
> > > we be missing if we did that?
> > 
> > Currently PRI/PASID capabilities are not enabled by default. IOMMU can
> > enable PRI/PASID for VF first (and not enable it for PF). In this case,
> > doing nothing for VF device will break the functionality.
> 
> What is the path where we can enable PRI/PASID for VF but not for the
> PF?  The call chains leading to pci_enable_pri() go through the
> iommu_ops.add_device interface, which makes me think this is part of
> the device enumeration done by the PCI core, and in that case I would
> think this it should be done for the PF before VFs.  But maybe this
> path isn't exercised until a driver does a DMA map or something
> similar?
AFAIK, this path is only exercised when the device does DMA, so there is
no specific order in which PRI/PASID is enabled for the PF and VFs.
In fact, v2 of this patch set had a check to ensure that PF PRI/PASID
enable happens before a VF attempts PRI/PASID enable/disable. But I had
to remove it in a later version of this series due to a failure case
reported by one of the testers of this code.
> 
> Bjorn

-- 
Sathyanarayanan Kuppuswamy
Linux kernel developer


* Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices
  2019-08-19 22:53         ` Kuppuswamy Sathyanarayanan
@ 2019-08-19 23:19           ` Bjorn Helgaas
  2019-08-28 18:21             ` Kuppuswamy Sathyanarayanan
  0 siblings, 1 reply; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-19 23:19 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch, Joerg Roedel,
	David Woodhouse, iommu

On Mon, Aug 19, 2019 at 03:53:31PM -0700, Kuppuswamy Sathyanarayanan wrote:
> On Mon, Aug 19, 2019 at 09:15:00AM -0500, Bjorn Helgaas wrote:
> > On Thu, Aug 15, 2019 at 03:39:03PM -0700, Kuppuswamy Sathyanarayanan wrote:
> > > On 8/15/19 3:20 PM, Bjorn Helgaas wrote:
> > > > [+cc Joerg, David, iommu list: because IOMMU drivers are the only
> > > > callers of pci_enable_pri() and pci_enable_pasid()]
> > > > 
> > > > On Thu, Aug 01, 2019 at 05:06:01PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > > > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > > > 
> > > > > When IOMMU tries to enable Page Request Interface (PRI) for VF device
> > > > > in iommu_enable_dev_iotlb(), it always fails because PRI support for
> > > > > PCIe VF device is currently broken. Current implementation expects
> > > > > the given PCIe device (PF & VF) to implement PRI capability before
> > > > > enabling the PRI support. But this assumption is incorrect. As per PCIe
> > > > > spec r4.0, sec 9.3.7.11, all VFs associated with PF can only use the
> > > > > PRI of the PF and not implement it. Hence we need to create exception
> > > > > for handling the PRI support for PCIe VF device.
> > > > > 
> > > > > Also, since PRI is a shared resource between PF/VF, following rules
> > > > > should apply.
> > > > > 
> > > > > 1. Use proper locking before accessing/modifying PF resources in VF
> > > > >     PRI enable/disable call.
> > > > > 2. Use reference count logic to track the usage of PRI resource.
> > > > > 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.
> > 
> > > > Wait, why do we need this at all?  I agree the spec says VFs may not
> > > > implement PRI or PASID capabilities and that VFs use the PRI and
> > > > PASID of the PF.
> > > > 
> > > > But why do we need to support pci_enable_pri() and pci_enable_pasid()
> > > > for VFs?  There's nothing interesting we can *do* in the VF, and
> > > > passing it off to the PF adds all this locking mess.  For VFs, can we
> > > > just make them do nothing or return -EINVAL?  What functionality would
> > > > we be missing if we did that?
> > > 
> > > Currently PRI/PASID capabilities are not enabled by default. IOMMU can
> > > enable PRI/PASID for VF first (and not enable it for PF). In this case,
> > > doing nothing for VF device will break the functionality.
> > 
> > What is the path where we can enable PRI/PASID for VF but not for the
> > PF?  The call chains leading to pci_enable_pri() go through the
> > iommu_ops.add_device interface, which makes me think this is part of
> > the device enumeration done by the PCI core, and in that case I would
> > think this it should be done for the PF before VFs.  But maybe this
> > path isn't exercised until a driver does a DMA map or something
> > similar?

> AFAIK, this path will only get exercised when the device does DMA and
> hence there is no specific order in which PRI/PASID is enabled in PF/VF.
> In fact, my v2 version of this patch set had a check to ensure PF
> PRI/PASID enable is happened before VF attempts PRI/PASID
> enable/disable. But I had to remove it in later version of this series
> due to failure case reported by one the tester of this code. 

What's the path?  And does that path make sense?

I got this far before giving up:

    iommu_go_to_state                           # AMD
      state_next
        amd_iommu_init_pci
          amd_iommu_init_api
            bus_set_iommu
              iommu_bus_init
                bus_for_each_dev(..., add_iommu_group)
                  add_iommu_group
                    iommu_probe_device
                      amd_iommu_add_device                      # amd_iommu_ops.add_device
                        init_iommu_group
                          iommu_group_get_for_dev
                            iommu_group_add_device
                              __iommu_attach_device
                                amd_iommu_attach_device         # amd_iommu_ops.attach_dev
                                  attach_device                 # amd_iommu
                                    pdev_iommuv2_enable
                                      pci_enable_pri


    iommu_probe_device
      intel_iommu_add_device                    # intel_iommu_ops.add_device
        domain_add_dev_info
          dmar_insert_one_dev_info
            domain_context_mapping
              domain_context_mapping_one
                iommu_enable_dev_iotlb
                  pci_enable_pri


These *look* like enumeration paths, not DMA setup paths.  But I could
be wrong, since I gave up before getting to the source.

I don't want to add all this complexity because we *think* we need it.
I want to think about whether it makes *sense*.  Maybe it's sensible
for the PF enumeration or a PF driver to enable the hardware it owns.

If we leave it to the VFs, then we have issues with coordinating
between VFs that want different settings, etc.

If we understand the whole picture and it needs to be in the VFs,
that's fine.  But I don't think we understand the whole picture yet.

Bjorn


* Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices
  2019-08-19 23:19           ` Bjorn Helgaas
@ 2019-08-28 18:21             ` Kuppuswamy Sathyanarayanan
  2019-08-28 18:57               ` Bjorn Helgaas
  0 siblings, 1 reply; 36+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2019-08-28 18:21 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch, Joerg Roedel,
	David Woodhouse, iommu

On Mon, Aug 19, 2019 at 06:19:25PM -0500, Bjorn Helgaas wrote:
> On Mon, Aug 19, 2019 at 03:53:31PM -0700, Kuppuswamy Sathyanarayanan wrote:
> > On Mon, Aug 19, 2019 at 09:15:00AM -0500, Bjorn Helgaas wrote:
> > > On Thu, Aug 15, 2019 at 03:39:03PM -0700, Kuppuswamy Sathyanarayanan wrote:
> > > > On 8/15/19 3:20 PM, Bjorn Helgaas wrote:
> > > > > [+cc Joerg, David, iommu list: because IOMMU drivers are the only
> > > > > callers of pci_enable_pri() and pci_enable_pasid()]
> > > > > 
> > > > > On Thu, Aug 01, 2019 at 05:06:01PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > > > > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > > > > 
> > > > > > When IOMMU tries to enable Page Request Interface (PRI) for VF device
> > > > > > in iommu_enable_dev_iotlb(), it always fails because PRI support for
> > > > > > PCIe VF device is currently broken. Current implementation expects
> > > > > > the given PCIe device (PF & VF) to implement PRI capability before
> > > > > > enabling the PRI support. But this assumption is incorrect. As per PCIe
> > > > > > spec r4.0, sec 9.3.7.11, all VFs associated with PF can only use the
> > > > > > PRI of the PF and not implement it. Hence we need to create exception
> > > > > > for handling the PRI support for PCIe VF device.
> > > > > > 
> > > > > > Also, since PRI is a shared resource between PF/VF, following rules
> > > > > > should apply.
> > > > > > 
> > > > > > 1. Use proper locking before accessing/modifying PF resources in VF
> > > > > >     PRI enable/disable call.
> > > > > > 2. Use reference count logic to track the usage of PRI resource.
> > > > > > 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.
> > > 
> > > > > Wait, why do we need this at all?  I agree the spec says VFs may not
> > > > > implement PRI or PASID capabilities and that VFs use the PRI and
> > > > > PASID of the PF.
> > > > > 
> > > > > But why do we need to support pci_enable_pri() and pci_enable_pasid()
> > > > > for VFs?  There's nothing interesting we can *do* in the VF, and
> > > > > passing it off to the PF adds all this locking mess.  For VFs, can we
> > > > > just make them do nothing or return -EINVAL?  What functionality would
> > > > > we be missing if we did that?
> > > > 
> > > > Currently PRI/PASID capabilities are not enabled by default. IOMMU can
> > > > enable PRI/PASID for VF first (and not enable it for PF). In this case,
> > > > doing nothing for VF device will break the functionality.
> > > 
> > > What is the path where we can enable PRI/PASID for VF but not for the
> > > PF?  The call chains leading to pci_enable_pri() go through the
> > > iommu_ops.add_device interface, which makes me think this is part of
> > > the device enumeration done by the PCI core, and in that case I would
> > > think this it should be done for the PF before VFs.  But maybe this
> > > path isn't exercised until a driver does a DMA map or something
> > > similar?
> 
> > AFAIK, this path will only get exercised when the device does DMA and
> > hence there is no specific order in which PRI/PASID is enabled in PF/VF.
> > In fact, my v2 version of this patch set had a check to ensure PF
> > PRI/PASID enable is happened before VF attempts PRI/PASID
> > enable/disable. But I had to remove it in later version of this series
> > due to failure case reported by one the tester of this code. 
> 
> What's the path?  And does that path make sense?
> 
> I got this far before giving up:
> 
>     iommu_go_to_state                           # AMD
>       state_next
>         amd_iommu_init_pci
>           amd_iommu_init_api
>             bus_set_iommu
>               iommu_bus_init
>                 bus_for_each_dev(..., add_iommu_group)
>                   add_iommu_group
>                     iommu_probe_device
>                       amd_iommu_add_device                      # amd_iommu_ops.add_device
>                         init_iommu_group
>                           iommu_group_get_for_dev
>                             iommu_group_add_device
>                               __iommu_attach_device
>                                 amd_iommu_attach_device         # amd_iommu_ops.attach_dev
>                                   attach_device                 # amd_iommu
>                                     pdev_iommuv2_enable
>                                       pci_enable_pri
> 
> 
>     iommu_probe_device
>       intel_iommu_add_device                    # intel_iommu_ops.add_device
>         domain_add_dev_info
>           dmar_insert_one_dev_info
>             domain_context_mapping
>               domain_context_mapping_one
>                 iommu_enable_dev_iotlb
>                   pci_enable_pri
> 
> 
> These *look* like enumeration paths, not DMA setup paths.  But I could
> be wrong, since I gave up before getting to the source.
> 
> I don't want to add all this complexity because we *think* we need it.
> I want to think about whether it makes *sense*.  Maybe it's sensible
> for the PF enumeration or a PF driver to enable the hardware it owns.
> 
> If we leave it to the VFs, then we have issues with coordinating
> between VFs that want different settings, etc.
> 
> If we understand the whole picture and it needs to be in the VFs,
> that's fine.  But I don't think we understand the whole picture yet.

After re-analyzing the code paths, I also could not find a use case
where PF/VF PRI/PASID is enabled out of order (VF first and then PF).
Also, I had no luck finding the old bug report email that prompted
me to come up with this complicated fix. As per my current analysis, as
you mentioned, PF/VF PRI/PASID enable seems to happen only at
device creation time.

The following are some of the possible code paths.

The VF PRI/PASID enable path is:

[ 8367.161880]  iommu_enable_dev_iotlb+0x83/0x180
[ 8367.168061]  domain_context_mapping_one+0x44f/0x500
[ 8367.174264]  ? domain_context_mapping_one+0x500/0x500
[ 8367.180429]  pci_for_each_dma_alias+0x30/0x170
[ 8367.186368]  dmar_insert_one_dev_info+0x43f/0x4d0
[ 8367.192288]  domain_add_dev_info+0x50/0x90
[ 8367.197973]  intel_iommu_attach_device+0x9c/0x130
[ 8367.203726]  __iommu_attach_device+0x47/0xb0
[ 8367.209292]  ? _cond_resched+0x15/0x40
[ 8367.214643]  iommu_group_add_device+0x13a/0x2c0
[ 8367.220102]  iommu_group_get_for_dev+0xa8/0x220
[ 8367.225460]  intel_iommu_add_device+0x61/0x590
[ 8367.230708]  iommu_bus_notifier+0xb1/0xe0
[ 8367.235768]  notifier_call_chain+0x47/0x70
[ 8367.240757]  blocking_notifier_call_chain+0x3e/0x60
[ 8367.245854]  device_add+0x3ec/0x690
[ 8367.250533]  pci_device_add+0x26b/0x660
[ 8367.255207]  pci_iov_add_virtfn+0x1ce/0x3b0
[ 8367.259873]  sriov_enable+0x254/0x410
[ 8367.264323]  dev_fops_ioctl+0x1378/0x1520 [sad8]
[ 8367.322115]  init_fops_ioctl+0x12c/0x150 [sad8]
[ 8367.324921]  do_vfs_ioctl+0xa4/0x630
[ 8367.327415]  ksys_ioctl+0x70/0x80
[ 8367.329822]  __x64_sys_ioctl+0x16/0x20
[ 8367.332310]  do_syscall_64+0x5b/0x1a0
[ 8367.334771]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

The PF PRI/PASID enable path is:

[   11.084005] Call Trace:
[   11.084005]  dump_stack+0x5c/0x7b
[   11.084005]  iommu_enable_dev_iotlb+0x83/0x180
[   11.084005]  domain_context_mapping_one+0x44f/0x500
[   11.084005]  ? domain_context_mapping_one+0x500/0x500
[   11.084005]  pci_for_each_dma_alias+0x30/0x170
[   11.084005]  dmar_insert_one_dev_info+0x43f/0x4d0
[   11.084005]  domain_add_dev_info+0x50/0x90
[   11.084005]  intel_iommu_attach_device+0x9c/0x130
[   11.084005]  __iommu_attach_device+0x47/0xb0
[   11.084005]  ? _cond_resched+0x15/0x40
[   11.084005]  iommu_group_add_device+0x13a/0x2c0
[   11.084005]  iommu_group_get_for_dev+0xa8/0x220
[   11.084005]  intel_iommu_add_device+0x61/0x590
[   11.084005]  ? iommu_probe_device+0x40/0x40
[   11.084005]  add_iommu_group+0xa/0x20
[   11.084005]  bus_for_each_dev+0x76/0xc0
[   11.084005]  bus_set_iommu+0x85/0xc0
[   11.084005]  intel_iommu_init+0xfe5/0x11c1
[   11.084005]  ? __fput+0x134/0x220
[   11.084005]  ? set_debug_rodata+0x11/0x11
[   11.084005]  ? e820__memblock_setup+0x60/0x60
[   11.084005]  ? pci_iommu_init+0x16/0x3f
[   11.084005]  pci_iommu_init+0x16/0x3f
[   11.084005]  do_one_initcall+0x46/0x1f4
[   11.084005]  kernel_init_freeable+0x1ba/0x283
[   11.084005]  ? rest_init+0xb0/0xb0
[   11.084005]  kernel_init+0xa/0x120
[   11.084005]  ret_from_fork+0x1f/0x40

Similarly, the possible PF/VF PRI/PASID disable paths are:

iommu_hotplug_path->disable_dmar_iommu->__dmar_remove_one_dev_info->iommu_disable_dev_iotlb

domain_exit()->domain_remove_dev_info->iommu_disable_dev_iotlb

vfio_iommu_type1_detach_group()->iommu_detach_group()->intel_iommu_detach_device->dmar_remove_one_dev_info

But even in all of these paths, PF/VF PRI/PASID disable has to happen
in order (VF first and then PF).

So we can implement the logic of doing nothing for a VF in its
PRI/PASID-related calls. But my question is, is it safe to rely on
these assumptions? Since none of the dependencies we have found are
explicitly defined, if someone breaks them, it will also affect the
PRI/PASID logic. Let me know your comments.
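
One possible shape for the simpler approach, sketched in user space with hypothetical names (`struct fn_sketch`, `enable_pri_sketch()`). Instead of silently doing nothing, the VF branch here checks the assumed ordering and fails with -EINVAL if the PF has not enabled PRI yet, so a broken assumption would at least be visible. This is an illustration of the idea under discussion, not an agreed design.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI function (PF or VF). */
struct fn_sketch {
	struct fn_sketch *physfn;	/* NULL for a PF, owning PF for a VF */
	bool has_pri_cap;		/* only meaningful for a PF */
	bool pri_enabled;		/* models the PRI Enable bit in the PF */
};

static int enable_pri_sketch(struct fn_sketch *fn)
{
	/*
	 * VF: it has no PRI registers of its own, so there is nothing to
	 * program. Succeed only if the PF already enabled PRI, which
	 * relies on the PF being set up before its VFs.
	 */
	if (fn->physfn)
		return fn->physfn->pri_enabled ? 0 : -EINVAL;

	if (!fn->has_pri_cap)
		return -EINVAL;
	if (fn->pri_enabled)
		return -EBUSY;

	fn->pri_enabled = true;	/* PF: program the (modelled) hardware */
	return 0;
}
```

No lock or reference count is needed in this variant, because a VF never writes PF state; the trade-off is the dependency on enable order raised in the question above.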



> 
> Bjorn

-- 
Sathyanarayanan Kuppuswamy
Linux kernel developer


* Re: [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices
  2019-08-28 18:21             ` Kuppuswamy Sathyanarayanan
@ 2019-08-28 18:57               ` Bjorn Helgaas
  0 siblings, 0 replies; 36+ messages in thread
From: Bjorn Helgaas @ 2019-08-28 18:57 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: linux-pci, linux-kernel, ashok.raj, keith.busch, Joerg Roedel,
	David Woodhouse, iommu

On Wed, Aug 28, 2019 at 11:21:53AM -0700, Kuppuswamy Sathyanarayanan wrote:
> On Mon, Aug 19, 2019 at 06:19:25PM -0500, Bjorn Helgaas wrote:
> > On Mon, Aug 19, 2019 at 03:53:31PM -0700, Kuppuswamy Sathyanarayanan wrote:
> > > On Mon, Aug 19, 2019 at 09:15:00AM -0500, Bjorn Helgaas wrote:
> > > > On Thu, Aug 15, 2019 at 03:39:03PM -0700, Kuppuswamy Sathyanarayanan wrote:
> > > > > On 8/15/19 3:20 PM, Bjorn Helgaas wrote:
> > > > > > [+cc Joerg, David, iommu list: because IOMMU drivers are the only
> > > > > > callers of pci_enable_pri() and pci_enable_pasid()]
> > > > > > 
> > > > > > On Thu, Aug 01, 2019 at 05:06:01PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > > > > > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > > > > > 
> > > > > > > When IOMMU tries to enable Page Request Interface (PRI) for VF device
> > > > > > > in iommu_enable_dev_iotlb(), it always fails because PRI support for
> > > > > > > PCIe VF device is currently broken. Current implementation expects
> > > > > > > the given PCIe device (PF & VF) to implement PRI capability before
> > > > > > > enabling the PRI support. But this assumption is incorrect. As per PCIe
> > > > > > > spec r4.0, sec 9.3.7.11, all VFs associated with PF can only use the
> > > > > > > PRI of the PF and not implement it. Hence we need to create exception
> > > > > > > for handling the PRI support for PCIe VF device.
> > > > > > > 
> > > > > > > Also, since PRI is a shared resource between PF/VF, following rules
> > > > > > > should apply.
> > > > > > > 
> > > > > > > 1. Use proper locking before accessing/modifying PF resources in VF
> > > > > > >     PRI enable/disable call.
> > > > > > > 2. Use reference count logic to track the usage of PRI resource.
> > > > > > > 3. Disable PRI only if the PRI reference count (pri_ref_cnt) is zero.
> > > > 
> > > > > > Wait, why do we need this at all?  I agree the spec says VFs may not
> > > > > > implement PRI or PASID capabilities and that VFs use the PRI and
> > > > > > PASID of the PF.
> > > > > > 
> > > > > > But why do we need to support pci_enable_pri() and pci_enable_pasid()
> > > > > > for VFs?  There's nothing interesting we can *do* in the VF, and
> > > > > > passing it off to the PF adds all this locking mess.  For VFs, can we
> > > > > > just make them do nothing or return -EINVAL?  What functionality would
> > > > > > we be missing if we did that?
> > > > > 
> > > > > Currently PRI/PASID capabilities are not enabled by default. IOMMU can
> > > > > enable PRI/PASID for VF first (and not enable it for PF). In this case,
> > > > > doing nothing for VF device will break the functionality.
> > > > 
> > > > What is the path where we can enable PRI/PASID for VF but not for the
> > > > PF?  The call chains leading to pci_enable_pri() go through the
> > > > iommu_ops.add_device interface, which makes me think this is part of
> > > > the device enumeration done by the PCI core, and in that case I would
> > > > think this it should be done for the PF before VFs.  But maybe this
> > > > path isn't exercised until a driver does a DMA map or something
> > > > similar?
> > 
> > > AFAIK, this path will only get exercised when the device does DMA and
> > > hence there is no specific order in which PRI/PASID is enabled in PF/VF.
> > > In fact, my v2 version of this patch set had a check to ensure PF
> > > PRI/PASID enable is happened before VF attempts PRI/PASID
> > > enable/disable. But I had to remove it in later version of this series
> > > due to failure case reported by one the tester of this code. 
> > 
> > What's the path?  And does that path make sense?
> > 
> > I got this far before giving up:
> > 
> >     iommu_go_to_state                           # AMD
> >       state_next
> >         amd_iommu_init_pci
> >           amd_iommu_init_api
> >             bus_set_iommu
> >               iommu_bus_init
> >                 bus_for_each_dev(..., add_iommu_group)
> >                   add_iommu_group
> >                     iommu_probe_device
> >                       amd_iommu_add_device                      # amd_iommu_ops.add_device
> >                         init_iommu_group
> >                           iommu_group_get_for_dev
> >                             iommu_group_add_device
> >                               __iommu_attach_device
> >                                 amd_iommu_attach_device         # amd_iommu_ops.attach_dev
> >                                   attach_device                 # amd_iommu
> >                                     pdev_iommuv2_enable
> >                                       pci_enable_pri
> > 
> > 
> >     iommu_probe_device
> >       intel_iommu_add_device                    # intel_iommu_ops.add_device
> >         domain_add_dev_info
> >           dmar_insert_one_dev_info
> >             domain_context_mapping
> >               domain_context_mapping_one
> >                 iommu_enable_dev_iotlb
> >                   pci_enable_pri
> > 
> > 
> > These *look* like enumeration paths, not DMA setup paths.  But I could
> > be wrong, since I gave up before getting to the source.
> > 
> > I don't want to add all this complexity because we *think* we need it.
> > I want to think about whether it makes *sense*.  Maybe it's sensible
> > for the PF enumeration or a PF driver to enable the hardware it owns.
> > 
> > If we leave it to the VFs, then we have issues with coordinating
> > between VFs that want different settings, etc.
> > 
> > If we understand the whole picture and it needs to be in the VFs,
> > that's fine.  But I don't think we understand the whole picture yet.
> 
> After re-analyzing the code paths, I also could not find a use case
> where PF/VF PRI/PASID is enabled out of order (VF first and then PF).
> Also, I had no luck finding the old bug report email that led me to
> come up with this complicated fix. As per my current analysis, as
> you have mentioned, PF/VF PRI/PASID enable seems to happen only at
> device creation time.
> 
> Following are some of the possible code paths:
> 
> VF PRI/PASID enable path is,
> 
> [ 8367.161880]  iommu_enable_dev_iotlb+0x83/0x180
> [ 8367.168061]  domain_context_mapping_one+0x44f/0x500
> [ 8367.174264]  ? domain_context_mapping_one+0x500/0x500
> [ 8367.180429]  pci_for_each_dma_alias+0x30/0x170
> [ 8367.186368]  dmar_insert_one_dev_info+0x43f/0x4d0
> [ 8367.192288]  domain_add_dev_info+0x50/0x90
> [ 8367.197973]  intel_iommu_attach_device+0x9c/0x130
> [ 8367.203726]  __iommu_attach_device+0x47/0xb0
> [ 8367.209292]  ? _cond_resched+0x15/0x40
> [ 8367.214643]  iommu_group_add_device+0x13a/0x2c0
> [ 8367.220102]  iommu_group_get_for_dev+0xa8/0x220
> [ 8367.225460]  intel_iommu_add_device+0x61/0x590
> [ 8367.230708]  iommu_bus_notifier+0xb1/0xe0
> [ 8367.235768]  notifier_call_chain+0x47/0x70
> [ 8367.240757]  blocking_notifier_call_chain+0x3e/0x60
> [ 8367.245854]  device_add+0x3ec/0x690
> [ 8367.250533]  pci_device_add+0x26b/0x660
> [ 8367.255207]  pci_iov_add_virtfn+0x1ce/0x3b0
> [ 8367.259873]  sriov_enable+0x254/0x410
> [ 8367.264323]  dev_fops_ioctl+0x1378/0x1520 [sad8]
> [ 8367.322115]  init_fops_ioctl+0x12c/0x150 [sad8]
> [ 8367.324921]  do_vfs_ioctl+0xa4/0x630
> [ 8367.327415]  ksys_ioctl+0x70/0x80
> [ 8367.329822]  __x64_sys_ioctl+0x16/0x20
> [ 8367.332310]  do_syscall_64+0x5b/0x1a0
> [ 8367.334771]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> PF PRI/PASID enable path is,
> 
> [   11.084005] Call Trace:
> [   11.084005]  dump_stack+0x5c/0x7b
> [   11.084005]  iommu_enable_dev_iotlb+0x83/0x180
> [   11.084005]  domain_context_mapping_one+0x44f/0x500
> [   11.084005]  ? domain_context_mapping_one+0x500/0x500
> [   11.084005]  pci_for_each_dma_alias+0x30/0x170
> [   11.084005]  dmar_insert_one_dev_info+0x43f/0x4d0
> [   11.084005]  domain_add_dev_info+0x50/0x90
> [   11.084005]  intel_iommu_attach_device+0x9c/0x130
> [   11.084005]  __iommu_attach_device+0x47/0xb0
> [   11.084005]  ? _cond_resched+0x15/0x40
> [   11.084005]  iommu_group_add_device+0x13a/0x2c0
> [   11.084005]  iommu_group_get_for_dev+0xa8/0x220
> [   11.084005]  intel_iommu_add_device+0x61/0x590
> [   11.084005]  ? iommu_probe_device+0x40/0x40
> [   11.084005]  add_iommu_group+0xa/0x20
> [   11.084005]  bus_for_each_dev+0x76/0xc0
> [   11.084005]  bus_set_iommu+0x85/0xc0
> [   11.084005]  intel_iommu_init+0xfe5/0x11c1
> [   11.084005]  ? __fput+0x134/0x220
> [   11.084005]  ? set_debug_rodata+0x11/0x11
> [   11.084005]  ? e820__memblock_setup+0x60/0x60
> [   11.084005]  ? pci_iommu_init+0x16/0x3f
> [   11.084005]  pci_iommu_init+0x16/0x3f
> [   11.084005]  do_one_initcall+0x46/0x1f4
> [   11.084005]  kernel_init_freeable+0x1ba/0x283
> [   11.084005]  ? rest_init+0xb0/0xb0
> [   11.084005]  kernel_init+0xa/0x120
> [   11.084005]  ret_from_fork+0x1f/0x40
> 
> Similarly PF/VF PRI/PASID possible disable paths are,
> 
> iommu_hotplug_path->disable_dmar_iommu->__dmar_remove_one_dev_info->iommu_disable_dev_iotlb
> 
> domain_exit()->domain_remove_dev_info->iommu_disable_dev_iotlb
> 
> vfio_iommu_type1_detach_group()->iommu_detach_group()->intel_iommu_detach_device->dmar_remove_one_dev_info
> 
> But even in all of these paths, PF/VF PRI/PASID disable has to happen
> in order (VF first and then PF).
> 
> So we can implement the logic of doing nothing for the VF in its
> PRI/PASID enable/disable calls. But my question is, is it safe to go
> with these assumptions? Since none of the dependencies we have found are
> explicitly defined, if someone breaks them it will also affect the
> PRI/PASID logic. Let me know your comments.

I think we should assume PRI/PASID will be controlled via the PF.
That's true today because we initialize them via the IOMMU binding
path.  If the IOMMU path changes so that's no longer feasible, we
could probably do the initialization in the PCI core.  These features
are implemented in the PF, so I think the code will be simpler if it
mirrors that instead of trying to provide the illusion that they're in
the VF.

Bjorn
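
A minimal sketch of the "controlled via the PF" idea discussed above,
written as standalone userspace C rather than kernel code. The struct
fake_pci_dev type and the fake_*() helpers are hypothetical and only
mirror the shape of the discussion; they are not the pci_enable_pri()
implementation. The point it illustrates: the VF path touches no
registers and takes no locks, it only reports whether the PF already has
PRI enabled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI function; not struct pci_dev. */
struct fake_pci_dev {
	bool is_virtfn;			/* true for a VF */
	bool pri_enabled;		/* only meaningful for a PF */
	struct fake_pci_dev *physfn;	/* PF backing this VF, if any */
};

static int fake_enable_pri(struct fake_pci_dev *dev)
{
	if (dev->is_virtfn) {
		/* Nothing to program in the VF; defer to the PF's state. */
		if (!dev->physfn || !dev->physfn->pri_enabled)
			return -EINVAL;
		return 0;
	}

	if (dev->pri_enabled)
		return -EBUSY;

	/* A real PF would program its PRI capability here. */
	dev->pri_enabled = true;
	return 0;
}

/* PRI is usable by a VF exactly when its PF has it enabled. */
static bool fake_pri_active(const struct fake_pci_dev *dev)
{
	if (dev->is_virtfn)
		return dev->physfn && dev->physfn->pri_enabled;
	return dev->pri_enabled;
}
```

In this sketch a VF enable issued before the PF enable fails with
-EINVAL, and one issued afterwards succeeds without changing any VF
state, so no PF/VF locking or reference counting is needed.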


Thread overview: 36+ messages
2019-08-02  0:05 [PATCH v5 0/7] Fix PF/VF dependency issue sathyanarayanan.kuppuswamy
2019-08-02  0:05 ` [PATCH v5 1/7] PCI/ATS: Fix pci_prg_resp_pasid_required() dependency issues sathyanarayanan.kuppuswamy
2019-08-12 20:04   ` Bjorn Helgaas
2019-08-12 20:20     ` sathyanarayanan kuppuswamy
2019-08-13  3:51       ` Bjorn Helgaas
2019-08-16 18:06         ` Kuppuswamy Sathyanarayanan
2019-08-02  0:05 ` [PATCH v5 2/7] PCI/ATS: Initialize PRI in pci_ats_init() sathyanarayanan.kuppuswamy
2019-08-12 20:04   ` Bjorn Helgaas
2019-08-12 21:35     ` sathyanarayanan kuppuswamy
2019-08-13  4:10       ` Bjorn Helgaas
2019-08-15  4:46   ` Bjorn Helgaas
2019-08-15 17:30     ` Kuppuswamy Sathyanarayanan
2019-08-16 17:31       ` Bjorn Helgaas
2019-08-02  0:06 ` [PATCH v5 3/7] PCI/ATS: Initialize PASID " sathyanarayanan.kuppuswamy
2019-08-12 20:04   ` Bjorn Helgaas
2019-08-15  4:48   ` Bjorn Helgaas
2019-08-15  4:56   ` Bjorn Helgaas
2019-08-15 17:31     ` Kuppuswamy Sathyanarayanan
2019-08-02  0:06 ` [PATCH v5 4/7] PCI/ATS: Add PRI support for PCIe VF devices sathyanarayanan.kuppuswamy
2019-08-12 20:04   ` Bjorn Helgaas
2019-08-12 21:40     ` sathyanarayanan kuppuswamy
2019-08-13  4:16   ` Bjorn Helgaas
2019-08-15 22:20   ` Bjorn Helgaas
2019-08-15 22:39     ` Kuppuswamy Sathyanarayanan
2019-08-19 14:15       ` Bjorn Helgaas
2019-08-19 22:53         ` Kuppuswamy Sathyanarayanan
2019-08-19 23:19           ` Bjorn Helgaas
2019-08-28 18:21             ` Kuppuswamy Sathyanarayanan
2019-08-28 18:57               ` Bjorn Helgaas
2019-08-02  0:06 ` [PATCH v5 5/7] PCI/ATS: Add PASID " sathyanarayanan.kuppuswamy
2019-08-12 20:05   ` Bjorn Helgaas
2019-08-13 22:19     ` Kuppuswamy Sathyanarayanan
2019-08-15  5:04       ` Bjorn Helgaas
2019-08-16  1:21         ` Kuppuswamy Sathyanarayanan
2019-08-02  0:06 ` [PATCH v5 6/7] PCI/ATS: Disable PF/VF ATS service independently sathyanarayanan.kuppuswamy
2019-08-02  0:06 ` [PATCH v5 7/7] PCI: Skip Enhanced Allocation (EA) initialization for VF device sathyanarayanan.kuppuswamy
