linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH next V3 0/5] Report PCI device link status
@ 2018-03-12 12:06 Tal Gilboa
  2018-03-12 12:06 ` [PATCH next V3 1/5] PCI: Add a query function for PCI device's speed cap Tal Gilboa
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Tal Gilboa @ 2018-03-12 12:06 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Linux PCI, Tariq Toukan, Tal Gilboa

These patches introduce the ability to report PCI link width
and speed for all PCI devices in kernel log.
This would reduce code duplication between drivers and unify
the approach for reporting PCI link status.
Implemented and tested for Mellanox devices.

v3: Add Reviewed-by: Tariq Toukan to all commits.

v2: Remove chain BW calculation. This might be a nice feature, but
without the ability to know the exact limiting part, it is more
confusing than useful.
Remove warnings for failed PCI query actions, leaving only the status
and gaps from max capabilities report.
Use pci_warn()/pci_info() instead of dev_warn()/dev_info().
(suggested by Bjorn Helgaas).

v1: Split to multiple patches, calculate BW limitation and remove
MACRO definition for LNKCAP shift (suggested by Bjorn Helgaas).
Multiple fixes - conventions, typos, function naming and functional
(suggested by Tariq Toukan).

Tal Gilboa (5):
  PCI: Add a query function for PCI device's speed cap
  PCI: Add a query function for PCI device's width cap
  PCI: Print PCI device link status in kernel log
  net/mlx4_core: Report PCI properties using dedicated function
  net/mlx5: Report device PCI link status and issues

 drivers/net/ethernet/mellanox/mlx4/main.c      | 81 +----------------------
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  2 +
 drivers/pci/pci-sysfs.c                        | 28 +++-----
 drivers/pci/pci.c                              | 90 ++++++++++++++++++++++++++
 include/linux/pci.h                            |  9 +++
 5 files changed, 110 insertions(+), 100 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH next V3 1/5] PCI: Add a query function for PCI device's speed cap
  2018-03-12 12:06 [PATCH next V3 0/5] Report PCI device link status Tal Gilboa
@ 2018-03-12 12:06 ` Tal Gilboa
  2018-03-12 12:06 ` [PATCH next V3 2/5] PCI: Add a query function for PCI device's width cap Tal Gilboa
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Tal Gilboa @ 2018-03-12 12:06 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Linux PCI, Tariq Toukan, Tal Gilboa

pcie_get_speed_cap() implements the logic for querying a PCI device's
maximum speed capability. Change max_link_speed_show() function to
use pcie_get_speed_cap().

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/pci/pci-sysfs.c | 22 +++++-----------------
 drivers/pci/pci.c       | 42 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h     |  7 +++++++
 3 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 4933f027..c8b4854 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -159,29 +159,17 @@ static ssize_t max_link_speed_show(struct device *dev,
 				   struct device_attribute *attr, char *buf)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
-	u32 linkcap;
+	enum pci_bus_speed speed;
+	const char *speed_str;
 	int err;
-	const char *speed;
 
-	err = pcie_capability_read_dword(pci_dev, PCI_EXP_LNKCAP, &linkcap);
+	err = pcie_get_speed_cap(pci_dev, &speed);
 	if (err)
 		return -EINVAL;
 
-	switch (linkcap & PCI_EXP_LNKCAP_SLS) {
-	case PCI_EXP_LNKCAP_SLS_8_0GB:
-		speed = "8 GT/s";
-		break;
-	case PCI_EXP_LNKCAP_SLS_5_0GB:
-		speed = "5 GT/s";
-		break;
-	case PCI_EXP_LNKCAP_SLS_2_5GB:
-		speed = "2.5 GT/s";
-		break;
-	default:
-		speed = "Unknown speed";
-	}
+	speed_str = PCIE_SPEED2STR(speed);
 
-	return sprintf(buf, "%s\n", speed);
+	return sprintf(buf, "%s\n", speed_str);
 }
 static DEVICE_ATTR_RO(max_link_speed);
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8c71d1a..7620cc9 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5164,6 +5164,48 @@ int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
 EXPORT_SYMBOL(pcie_get_minimum_link);
 
 /**
+ * pcie_get_speed_cap - queries for the PCI device's link speed capability
+ * @dev: PCI device to query
+ * @speed: storage for link speed
+ *
+ * This function queries the PCI device speed capability.
+ */
+int pcie_get_speed_cap(struct pci_dev *dev, enum pci_bus_speed *speed)
+{
+	u32 lnkcap;
+	int err1, err2;
+
+	*speed = PCI_SPEED_UNKNOWN;
+
+	err1 = pcie_capability_read_dword(dev, PCI_EXP_LNKCAP,
+					  &lnkcap);
+	if (!err1 && lnkcap) {
+		if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB)
+			*speed = PCIE_SPEED_8_0GT;
+		else if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB)
+			*speed = PCIE_SPEED_5_0GT;
+		else if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB)
+			*speed = PCIE_SPEED_2_5GT;
+		return 0;
+	}
+
+	err2 = pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2,
+					  &lnkcap);
+	if (!err2 && lnkcap) { /* PCIe r3.0-compliant */
+		if (lnkcap & PCI_EXP_LNKCAP2_SLS_8_0GB)
+			*speed = PCIE_SPEED_8_0GT;
+		else if (lnkcap & PCI_EXP_LNKCAP2_SLS_5_0GB)
+			*speed = PCIE_SPEED_5_0GT;
+		else if (lnkcap & PCI_EXP_LNKCAP2_SLS_2_5GB)
+			*speed = PCIE_SPEED_2_5GT;
+		return 0;
+	}
+
+	return err1 ? err1 : err2;
+}
+EXPORT_SYMBOL(pcie_get_speed_cap);
+
+/**
  * pci_select_bars - Make BAR mask from the type of resource
  * @dev: the PCI device for which BAR mask is made
  * @flags: resource type mask to be selected
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e057e8c..54443e4 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -259,6 +259,12 @@ enum pci_bus_speed {
 	PCI_SPEED_UNKNOWN		= 0xff,
 };
 
+#define PCIE_SPEED2STR(speed) \
+	((speed) == PCIE_SPEED_8_0GT ? "8 GT/s" : \
+	 (speed) == PCIE_SPEED_5_0GT ? "5 GT/s" : \
+	 (speed) == PCIE_SPEED_2_5GT ? "2.5 GT/s" : \
+	 "Unknown speed")
+
 struct pci_cap_saved_data {
 	u16		cap_nr;
 	bool		cap_extended;
@@ -1077,6 +1083,7 @@ static inline int pci_is_managed(struct pci_dev *pdev)
 int pcie_set_mps(struct pci_dev *dev, int mps);
 int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
 			  enum pcie_link_width *width);
+int pcie_get_speed_cap(struct pci_dev *dev, enum pci_bus_speed *speed);
 int pcie_flr(struct pci_dev *dev);
 int __pci_reset_function_locked(struct pci_dev *dev);
 int pci_reset_function(struct pci_dev *dev);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH next V3 2/5] PCI: Add a query function for PCI device's width cap
  2018-03-12 12:06 [PATCH next V3 0/5] Report PCI device link status Tal Gilboa
  2018-03-12 12:06 ` [PATCH next V3 1/5] PCI: Add a query function for PCI device's speed cap Tal Gilboa
@ 2018-03-12 12:06 ` Tal Gilboa
  2018-03-12 12:06 ` [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log Tal Gilboa
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Tal Gilboa @ 2018-03-12 12:06 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Linux PCI, Tariq Toukan, Tal Gilboa

pcie_get_width_cap() implements the logic for querying a PCI device's
maximum width capability. Change max_link_width_show() function to
use pcie_get_width_cap().

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/pci/pci-sysfs.c |  6 +++---
 drivers/pci/pci.c       | 23 +++++++++++++++++++++++
 include/linux/pci.h     |  1 +
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index c8b4854..ae30ba2 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -177,14 +177,14 @@ static ssize_t max_link_width_show(struct device *dev,
 				   struct device_attribute *attr, char *buf)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
-	u32 linkcap;
+	enum pcie_link_width width;
 	int err;
 
-	err = pcie_capability_read_dword(pci_dev, PCI_EXP_LNKCAP, &linkcap);
+	err = pcie_get_width_cap(pci_dev, &width);
 	if (err)
 		return -EINVAL;
 
-	return sprintf(buf, "%u\n", (linkcap & PCI_EXP_LNKCAP_MLW) >> 4);
+	return sprintf(buf, "%u\n", width);
 }
 static DEVICE_ATTR_RO(max_link_width);
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7620cc9..48b9fd6 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5206,6 +5206,29 @@ int pcie_get_speed_cap(struct pci_dev *dev, enum pci_bus_speed *speed)
 EXPORT_SYMBOL(pcie_get_speed_cap);
 
 /**
+ * pcie_get_width_cap - queries for the PCI device's link width capability.
+ * @dev: PCI device to query
+ * @width: storage for link width
+ *
+ * This function queries the PCI device width capability.
+ */
+int pcie_get_width_cap(struct pci_dev *dev, enum pcie_link_width *width)
+{
+	u32 lnkcap;
+	int err;
+
+	*width = PCIE_LNK_WIDTH_UNKNOWN;
+
+	err = pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+	if (!err && lnkcap)
+		/* Shift start of width mask by 4 to get actual speed cap */
+		*width = (lnkcap & PCI_EXP_LNKCAP_MLW) >> 4;
+
+	return err;
+}
+EXPORT_SYMBOL(pcie_get_width_cap);
+
+/**
  * pci_select_bars - Make BAR mask from the type of resource
  * @dev: the PCI device for which BAR mask is made
  * @flags: resource type mask to be selected
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 54443e4..8242d3d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1084,6 +1084,7 @@ static inline int pci_is_managed(struct pci_dev *pdev)
 int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
 			  enum pcie_link_width *width);
 int pcie_get_speed_cap(struct pci_dev *dev, enum pci_bus_speed *speed);
+int pcie_get_width_cap(struct pci_dev *dev, enum pcie_link_width *width);
 int pcie_flr(struct pci_dev *dev);
 int __pci_reset_function_locked(struct pci_dev *dev);
 int pci_reset_function(struct pci_dev *dev);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log
  2018-03-12 12:06 [PATCH next V3 0/5] Report PCI device link status Tal Gilboa
  2018-03-12 12:06 ` [PATCH next V3 1/5] PCI: Add a query function for PCI device's speed cap Tal Gilboa
  2018-03-12 12:06 ` [PATCH next V3 2/5] PCI: Add a query function for PCI device's width cap Tal Gilboa
@ 2018-03-12 12:06 ` Tal Gilboa
  2018-03-20 14:05   ` Bjorn Helgaas
  2018-03-12 12:06 ` [PATCH next V3 4/5] net/mlx4_core: Report PCI properties using dedicated function Tal Gilboa
  2018-03-12 12:06 ` [PATCH next V3 5/5] net/mlx5: Report device PCI link status and issues Tal Gilboa
  4 siblings, 1 reply; 15+ messages in thread
From: Tal Gilboa @ 2018-03-12 12:06 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Linux PCI, Tariq Toukan, Tal Gilboa

Add pcie_print_link_status() function for querying and verifying
a PCI device link status. The PCI speed and width are reported
in kernel log.
This provides a unified method for all PCI devices to
report status and issues, instead of each device reporting in a
different way, using different code.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/pci/pci.c   | 25 +++++++++++++++++++++++++
 include/linux/pci.h |  1 +
 2 files changed, 26 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 48b9fd6..ac876c4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5229,6 +5229,31 @@ int pcie_get_width_cap(struct pci_dev *dev, enum pcie_link_width *width)
 EXPORT_SYMBOL(pcie_get_width_cap);
 
 /**
+ * pcie_print_link_status - Reports the PCI device's link speed and width.
+ * @dev: PCI device to query
+ *
+ * This function checks whether the PCI device current speed and width are equal
+ * to the maximum PCI device capabilities.
+ */
+void pcie_print_link_status(struct pci_dev *dev)
+{
+	enum pcie_link_width width, width_cap;
+	enum pci_bus_speed speed, speed_cap;
+
+	pcie_get_speed_cap(dev, &speed_cap);
+	pcie_get_width_cap(dev, &width_cap);
+	pcie_get_minimum_link(dev, &speed, &width);
+
+	if (speed == speed_cap && width == width_cap)
+		pci_info(dev, "%s x%d link\n", PCIE_SPEED2STR(speed), width);
+	else
+		pci_info(dev, "%s x%d link (capable of %s x%d)\n",
+			 PCIE_SPEED2STR(speed), width,
+			 PCIE_SPEED2STR(speed_cap), width_cap);
+}
+EXPORT_SYMBOL(pcie_print_link_status);
+
+/**
  * pci_select_bars - Make BAR mask from the type of resource
  * @dev: the PCI device for which BAR mask is made
  * @flags: resource type mask to be selected
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8242d3d..4a20870 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1085,6 +1085,7 @@ int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
 			  enum pcie_link_width *width);
 int pcie_get_speed_cap(struct pci_dev *dev, enum pci_bus_speed *speed);
 int pcie_get_width_cap(struct pci_dev *dev, enum pcie_link_width *width);
+void pcie_print_link_status(struct pci_dev *dev);
 int pcie_flr(struct pci_dev *dev);
 int __pci_reset_function_locked(struct pci_dev *dev);
 int pci_reset_function(struct pci_dev *dev);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH next V3 4/5] net/mlx4_core: Report PCI properties using dedicated function
  2018-03-12 12:06 [PATCH next V3 0/5] Report PCI device link status Tal Gilboa
                   ` (2 preceding siblings ...)
  2018-03-12 12:06 ` [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log Tal Gilboa
@ 2018-03-12 12:06 ` Tal Gilboa
  2018-03-12 12:06 ` [PATCH next V3 5/5] net/mlx5: Report device PCI link status and issues Tal Gilboa
  4 siblings, 0 replies; 15+ messages in thread
From: Tal Gilboa @ 2018-03-12 12:06 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Linux PCI, Tariq Toukan, Tal Gilboa

Change mlx4 method of checking and reporting PCI status and
maximum capabilities to use the pci driver functions instead
of implementing them in the driver code.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 81 +------------------------------
 1 file changed, 1 insertion(+), 80 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 4d84cab..30cacac 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -623,85 +623,6 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	return 0;
 }
 
-static int mlx4_get_pcie_dev_link_caps(struct mlx4_dev *dev,
-				       enum pci_bus_speed *speed,
-				       enum pcie_link_width *width)
-{
-	u32 lnkcap1, lnkcap2;
-	int err1, err2;
-
-#define  PCIE_MLW_CAP_SHIFT 4	/* start of MLW mask in link capabilities */
-
-	*speed = PCI_SPEED_UNKNOWN;
-	*width = PCIE_LNK_WIDTH_UNKNOWN;
-
-	err1 = pcie_capability_read_dword(dev->persist->pdev, PCI_EXP_LNKCAP,
-					  &lnkcap1);
-	err2 = pcie_capability_read_dword(dev->persist->pdev, PCI_EXP_LNKCAP2,
-					  &lnkcap2);
-	if (!err2 && lnkcap2) { /* PCIe r3.0-compliant */
-		if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_8_0GB)
-			*speed = PCIE_SPEED_8_0GT;
-		else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_5_0GB)
-			*speed = PCIE_SPEED_5_0GT;
-		else if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_2_5GB)
-			*speed = PCIE_SPEED_2_5GT;
-	}
-	if (!err1) {
-		*width = (lnkcap1 & PCI_EXP_LNKCAP_MLW) >> PCIE_MLW_CAP_SHIFT;
-		if (!lnkcap2) { /* pre-r3.0 */
-			if (lnkcap1 & PCI_EXP_LNKCAP_SLS_5_0GB)
-				*speed = PCIE_SPEED_5_0GT;
-			else if (lnkcap1 & PCI_EXP_LNKCAP_SLS_2_5GB)
-				*speed = PCIE_SPEED_2_5GT;
-		}
-	}
-
-	if (*speed == PCI_SPEED_UNKNOWN || *width == PCIE_LNK_WIDTH_UNKNOWN) {
-		return err1 ? err1 :
-			err2 ? err2 : -EINVAL;
-	}
-	return 0;
-}
-
-static void mlx4_check_pcie_caps(struct mlx4_dev *dev)
-{
-	enum pcie_link_width width, width_cap;
-	enum pci_bus_speed speed, speed_cap;
-	int err;
-
-#define PCIE_SPEED_STR(speed) \
-	(speed == PCIE_SPEED_8_0GT ? "8.0GT/s" : \
-	 speed == PCIE_SPEED_5_0GT ? "5.0GT/s" : \
-	 speed == PCIE_SPEED_2_5GT ? "2.5GT/s" : \
-	 "Unknown")
-
-	err = mlx4_get_pcie_dev_link_caps(dev, &speed_cap, &width_cap);
-	if (err) {
-		mlx4_warn(dev,
-			  "Unable to determine PCIe device BW capabilities\n");
-		return;
-	}
-
-	err = pcie_get_minimum_link(dev->persist->pdev, &speed, &width);
-	if (err || speed == PCI_SPEED_UNKNOWN ||
-	    width == PCIE_LNK_WIDTH_UNKNOWN) {
-		mlx4_warn(dev,
-			  "Unable to determine PCI device chain minimum BW\n");
-		return;
-	}
-
-	if (width != width_cap || speed != speed_cap)
-		mlx4_warn(dev,
-			  "PCIe BW is different than device's capability\n");
-
-	mlx4_info(dev, "PCIe link speed is %s, device supports %s\n",
-		  PCIE_SPEED_STR(speed), PCIE_SPEED_STR(speed_cap));
-	mlx4_info(dev, "PCIe link width is x%d, device supports x%d\n",
-		  width, width_cap);
-	return;
-}
-
 /*The function checks if there are live vf, return the num of them*/
 static int mlx4_how_many_lives_vf(struct mlx4_dev *dev)
 {
@@ -3475,7 +3396,7 @@ static int mlx4_load_one(struct pci_dev *pdev, int pci_dev_data,
 	 * express device capabilities are under-satisfied by the bus.
 	 */
 	if (!mlx4_is_slave(dev))
-		mlx4_check_pcie_caps(dev);
+		pcie_print_link_status(dev->persist->pdev);
 
 	/* In master functions, the communication channel must be initialized
 	 * after obtaining its address from fw */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH next V3 5/5] net/mlx5: Report device PCI link status and issues
  2018-03-12 12:06 [PATCH next V3 0/5] Report PCI device link status Tal Gilboa
                   ` (3 preceding siblings ...)
  2018-03-12 12:06 ` [PATCH next V3 4/5] net/mlx4_core: Report PCI properties using dedicated function Tal Gilboa
@ 2018-03-12 12:06 ` Tal Gilboa
  4 siblings, 0 replies; 15+ messages in thread
From: Tal Gilboa @ 2018-03-12 12:06 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Linux PCI, Tariq Toukan, Tal Gilboa

Add a kernel log print for mlx5 PCI device's link status.

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 2ef641c9..6adad72 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1043,6 +1043,8 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	dev_info(&pdev->dev, "firmware version: %d.%d.%d\n", fw_rev_maj(dev),
 		 fw_rev_min(dev), fw_rev_sub(dev));
 
+	pcie_print_link_status(dev->pdev);
+
 	/* on load removing any previous indication of internal error, device is
 	 * up
 	 */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log
  2018-03-12 12:06 ` [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log Tal Gilboa
@ 2018-03-20 14:05   ` Bjorn Helgaas
  2018-03-20 15:57     ` Keller, Jacob E
  2018-03-21  7:43     ` Tal Gilboa
  0 siblings, 2 replies; 15+ messages in thread
From: Bjorn Helgaas @ 2018-03-20 14:05 UTC (permalink / raw)
  To: Tal Gilboa; +Cc: Bjorn Helgaas, Linux PCI, Tariq Toukan, Jacob Keller

[+cc Jacob]

On Mon, Mar 12, 2018 at 02:06:08PM +0200, Tal Gilboa wrote:
> Add pcie_print_link_status() function for querying and verifying
> a PCI device link status. The PCI speed and width are reported
> in kernel log.
> This provides a unified method for all PCI devices to
> report status and issues, instead of each device reporting in a
> different way, using different code.
> 
> Signed-off-by: Tal Gilboa <talgi@mellanox.com>
> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
> ---
>  drivers/pci/pci.c   | 25 +++++++++++++++++++++++++
>  include/linux/pci.h |  1 +
>  2 files changed, 26 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 48b9fd6..ac876c4 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5229,6 +5229,31 @@ int pcie_get_width_cap(struct pci_dev *dev, enum pcie_link_width *width)
>  EXPORT_SYMBOL(pcie_get_width_cap);
>  
>  /**
> + * pcie_print_link_status - Reports the PCI device's link speed and width.
> + * @dev: PCI device to query
> + *
> + * This function checks whether the PCI device current speed and width are equal
> + * to the maximum PCI device capabilities.
> + */
> +void pcie_print_link_status(struct pci_dev *dev)
> +{
> +	enum pcie_link_width width, width_cap;
> +	enum pci_bus_speed speed, speed_cap;
> +
> +	pcie_get_speed_cap(dev, &speed_cap);
> +	pcie_get_width_cap(dev, &width_cap);
> +	pcie_get_minimum_link(dev, &speed, &width);
> +
> +	if (speed == speed_cap && width == width_cap)
> +		pci_info(dev, "%s x%d link\n", PCIE_SPEED2STR(speed), width);
> +	else
> +		pci_info(dev, "%s x%d link (capable of %s x%d)\n",
> +			 PCIE_SPEED2STR(speed), width,
> +			 PCIE_SPEED2STR(speed_cap), width_cap);

I think pcie_get_minimum_link() is misleading.  This new use of
it is very similar to the existing uses, but I think we should clean
this up before using it in more places.

pcie_get_minimum_link() finds the minimum speed and minimum link width
(separately) across all the links in the path, and I don't think
that's what we really want.  What we *want* is the total bandwidth
available to the device, because we're trying to learn if the device
is capable of more than the fabric can deliver.

That means we want to look at all the links in the path leading to the
device and find the link with the lowest bandwidth, i.e., width *
speed.  Obviously this is not necessarily the link with the fewest
lanes or the lowest link speed.

I hinted at this before, but you thought it would result in more
questions than answers [1], and I failed to continue the conversation.

Let's continue it now :)

Here are some straw-man interfaces to return a device's capabilities
and the available bandwidth in MB/s:

  unsigned int pcie_bandwidth_capable(struct pci_dev *dev,
                                      enum pci_bus_speed *speed,
                                      enum pcie_link_width *width);
  unsigned int pcie_bandwidth_available(struct pci_dev *dev);

Then we can compare the result with what "dev" is capable of, e.g.,

  my_bw = pcie_bandwidth_capable(dev, &my_speed, &my_width);
  avail_bw = pcie_bandwidth_available(dev);
  if (avail_bw >= my_bw)
    pci_info(dev, "%d MB/s available bandwidth (%s x%d link)\n",
             my_bw, PCIE_SPEED2STR(my_speed), my_width);
  else
    pci_info(dev, "%d MB/s available bandwidth (capable of %d MB/s, %s x%d)\n",
             avail_bw, my_bw, PCIE_SPEED2STR(my_speed), my_width);

Using GB/s would be nicer; I lazily used MB/s to avoid dealing with
decimals, e.g., 2.5GB/s.

[1] https://lkml.kernel.org/r/46f54e24-e65e-08c6-60b3-e34ffd79e91a@mellanox.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log
  2018-03-20 14:05   ` Bjorn Helgaas
@ 2018-03-20 15:57     ` Keller, Jacob E
  2018-03-21  7:43     ` Tal Gilboa
  1 sibling, 0 replies; 15+ messages in thread
From: Keller, Jacob E @ 2018-03-20 15:57 UTC (permalink / raw)
  To: Bjorn Helgaas, Tal Gilboa; +Cc: Bjorn Helgaas, Linux PCI, Tariq Toukan

> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> Sent: Tuesday, March 20, 2018 7:05 AM
> To: Tal Gilboa <talgi@mellanox.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>; Linux PCI <linux-
> pci@vger.kernel.org>; Tariq Toukan <tariqt@mellanox.com>; Keller, Jacob E
> <jacob.e.keller@intel.com>
> Subject: Re: [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log
> 
> [+cc Jacob]
> 
> >  /**
> > + * pcie_print_link_status - Reports the PCI device's link speed and width.
> > + * @dev: PCI device to query
> > + *
> > + * This function checks whether the PCI device current speed and width are
> equal
> > + * to the maximum PCI device capabilities.
> > + */
> > +void pcie_print_link_status(struct pci_dev *dev)
> > +{
> > +	enum pcie_link_width width, width_cap;
> > +	enum pci_bus_speed speed, speed_cap;
> > +
> > +	pcie_get_speed_cap(dev, &speed_cap);
> > +	pcie_get_width_cap(dev, &width_cap);
> > +	pcie_get_minimum_link(dev, &speed, &width);
> > +
> > +	if (speed == speed_cap && width == width_cap)
> > +		pci_info(dev, "%s x%d link\n", PCIE_SPEED2STR(speed), width);
> > +	else
> > +		pci_info(dev, "%s x%d link (capable of %s x%d)\n",
> > +			 PCIE_SPEED2STR(speed), width,
> > +			 PCIE_SPEED2STR(speed_cap), width_cap);
> 
> I think pcie_get_minimum_link() is misleading.  This new use of
> it is very similar to the existing uses, but I think we should clean
> this up before using it in more places.
> 

Based on your outline here, I completely agree!

> pcie_get_minimum_link() finds the minimum speed and minimum link width
> (separately) across all the links in the path, and I don't think
> that's what we really want.  What we *want* is the total bandwidth
> available to the device, because we're trying to learn if the device
> is capable of more than the fabric can deliver.
> 
> That means we want to look at all the links in the path leading to the
> device and find the link with the lowest bandwidth, i.e., width *
> speed.  Obviously this is not necessarily the link with the fewest
> lanes or the lowest link speed.
> 

Wow, yep you're right! That is a very subtle difference, and one I wish I'd been more aware of prior to writing the get_minimum_link code. I *intended* the code to give a minimum bandwidth calculation. IMHO, this is a bug in pcie_get_minimum_link, and should be fixed there.

> I hinted at this before, but you thought it would result in more
> questions than answers [1], and I failed to continue the conversation.
> 
> Let's continue it now :)
> 
> Here are some straw-man interfaces to return a device's capabilities
> and the available bandwidth in MB/s:
> 
>   unsigned int pcie_bandwidth_capable(struct pci_dev *dev,
>                                       enum pci_bus_speed *speed,
>                                       enum pcie_link_width *width);
>   unsigned int pcie_bandwidth_available(struct pci_dev *dev);
> 
> Then we can compare the result with what "dev" is capable of, e.g.,
> 
>   my_bw = pcie_bandwidth_capable(dev, &my_speed, &my_width);
>   avail_bw = pcie_bandwidth_available(dev);
>   if (avail_bw >= my_bw)
>     pci_info(dev, "%d MB/s available bandwidth (%s x%d link)\n",
>              my_bw, PCIE_SPEED2STR(my_speed), my_width);
>   else
>     pci_info(dev, "%d MB/s available bandwidth (capable of %d MB/s, %s x%d)\n",
>              avail_bw, my_bw, PCIE_SPEED2STR(my_speed), my_width);
> 
> Using GB/s would be nicer; I lazily used MB/s to avoid dealing with
> decimals, e.g., 2.5GB/s.

MBs is better, imo for the same reason.

I like the names here, they're easier to understand. Additionally, we can deprecate and remove pcie_get_minimum_link after fixing up the uses, since this functionality is what the pcie_get_minimum_link was *supposed* to provide.

Thanks,
Jake

> 
> [1] https://lkml.kernel.org/r/46f54e24-e65e-08c6-60b3-
> e34ffd79e91a@mellanox.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log
  2018-03-20 14:05   ` Bjorn Helgaas
  2018-03-20 15:57     ` Keller, Jacob E
@ 2018-03-21  7:43     ` Tal Gilboa
  2018-03-21 19:47       ` Bjorn Helgaas
  1 sibling, 1 reply; 15+ messages in thread
From: Tal Gilboa @ 2018-03-21  7:43 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Bjorn Helgaas, Linux PCI, Tariq Toukan, Jacob Keller

On 3/20/2018 4:05 PM, Bjorn Helgaas wrote:
> [+cc Jacob]
> 
> On Mon, Mar 12, 2018 at 02:06:08PM +0200, Tal Gilboa wrote:
>> Add pcie_print_link_status() function for querying and verifying
>> a PCI device link status. The PCI speed and width are reported
>> in kernel log.
>> This provides a unified method for all PCI devices to
>> report status and issues, instead of each device reporting in a
>> different way, using different code.
>>
>> Signed-off-by: Tal Gilboa <talgi@mellanox.com>
>> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
>> ---
>>   drivers/pci/pci.c   | 25 +++++++++++++++++++++++++
>>   include/linux/pci.h |  1 +
>>   2 files changed, 26 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 48b9fd6..ac876c4 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -5229,6 +5229,31 @@ int pcie_get_width_cap(struct pci_dev *dev, enum pcie_link_width *width)
>>   EXPORT_SYMBOL(pcie_get_width_cap);
>>   
>>   /**
>> + * pcie_print_link_status - Reports the PCI device's link speed and width.
>> + * @dev: PCI device to query
>> + *
>> + * This function checks whether the PCI device current speed and width are equal
>> + * to the maximum PCI device capabilities.
>> + */
>> +void pcie_print_link_status(struct pci_dev *dev)
>> +{
>> +	enum pcie_link_width width, width_cap;
>> +	enum pci_bus_speed speed, speed_cap;
>> +
>> +	pcie_get_speed_cap(dev, &speed_cap);
>> +	pcie_get_width_cap(dev, &width_cap);
>> +	pcie_get_minimum_link(dev, &speed, &width);
>> +
>> +	if (speed == speed_cap && width == width_cap)
>> +		pci_info(dev, "%s x%d link\n", PCIE_SPEED2STR(speed), width);
>> +	else
>> +		pci_info(dev, "%s x%d link (capable of %s x%d)\n",
>> +			 PCIE_SPEED2STR(speed), width,
>> +			 PCIE_SPEED2STR(speed_cap), width_cap);
> 
> I think pcie_get_minimum_link() is misleading.  This new use of
> it is very similar to the existing uses, but I think we should clean
> this up before using it in more places.
> 
> pcie_get_minimum_link() finds the minimum speed and minimum link width
> (separately) across all the links in the path, and I don't think
> that's what we really want.  What we *want* is the total bandwidth
> available to the device, because we're trying to learn if the device
> is capable of more than the fabric can deliver.
> 
> That means we want to look at all the links in the path leading to the
> device and find the link with the lowest bandwidth, i.e., width *
> speed.  Obviously this is not necessarily the link with the fewest
> lanes or the lowest link speed.

The problem I see here is we need to consider differences in PCIe gens. 
Let's consider this example: gen3 x4 device which down the chain goes 
over gen2 x8 pcie bridge. The device's bandwidth is D=4*8*(gen3 
encoding) and the bridge's bandwidth is B=8*5*(gen2 encoding). D~=31.5 
and B~=32. So it would seem like there's no issue, but don't we want to 
know we actually went up as gen2?

> 
> I hinted at this before, but you thought it would result in more
> questions than answers [1], and I failed to continue the conversation.

You raised the concern of not knowing the device limiting the total 
bandwidth of the chain. I thought it was an argument against the idea, 
but maybe I misunderstood. Let's agree on a design (below) before I make 
anymore changes :).

> 
> Let's continue it now :)
> 
> Here are some straw-man interfaces to return a device's capabilities
> and the available bandwidth in MB/s:
> 
>    unsigned int pcie_bandwidth_capable(struct pci_dev *dev,
>                                        enum pci_bus_speed *speed,
>                                        enum pcie_link_width *width);
>    unsigned int pcie_bandwidth_available(struct pci_dev *dev);
> 
> Then we can compare the result with what "dev" is capable of, e.g.,
> 
>    my_bw = pcie_bandwidth_capable(dev, &my_speed, &my_width);
>    avail_bw = pcie_bandwidth_available(dev);
>    if (avail_bw >= my_bw)

Can it really be >?

>      pci_info(dev, "%d MB/s available bandwidth (%s x%d link)\n",
>               my_bw, PCIE_SPEED2STR(my_speed), my_width);
>    else
>      pci_info(dev, "%d MB/s available bandwidth (capable of %d MB/s, %s x%d)\n",
>               avail_bw, my_bw, PCIE_SPEED2STR(my_speed), my_width);
> 
> Using GB/s would be nicer; I lazily used MB/s to avoid dealing with
> decimals, e.g., 2.5GB/s.

So let's see if we agree on the steps:
1. my_speed_cap, my_width_cap <- calculate device PCIe caps
2. avail_bw, limiting_dev <- calculate PCIe chain bandwidth
3. my_bw <- my_speed_cap * my_width_cap
4. If avail_bw == my_bw print available bandwidth + PCIe caps
5. Else print available bandwidth + limited by + capable bandwidth + 
PCIe caps

What do you think?

> 
> [1] https://lkml.kernel.org/r/46f54e24-e65e-08c6-60b3-e34ffd79e91a@mellanox.com
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log
  2018-03-21  7:43     ` Tal Gilboa
@ 2018-03-21 19:47       ` Bjorn Helgaas
  2018-03-21 19:59         ` Keller, Jacob E
  0 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2018-03-21 19:47 UTC (permalink / raw)
  To: Tal Gilboa; +Cc: Bjorn Helgaas, Linux PCI, Tariq Toukan, Jacob Keller

On Wed, Mar 21, 2018 at 09:43:48AM +0200, Tal Gilboa wrote:
> On 3/20/2018 4:05 PM, Bjorn Helgaas wrote:
> > [+cc Jacob]
> > 
> > On Mon, Mar 12, 2018 at 02:06:08PM +0200, Tal Gilboa wrote:
> > > Add pcie_print_link_status() function for querying and verifying
> > > a PCI device link status. The PCI speed and width are reported
> > > in kernel log.
> > > This provides a unified method for all PCI devices to
> > > report status and issues, instead of each device reporting in a
> > > different way, using different code.
> > > 
> > > Signed-off-by: Tal Gilboa <talgi@mellanox.com>
> > > Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
> > > ---
> > >   drivers/pci/pci.c   | 25 +++++++++++++++++++++++++
> > >   include/linux/pci.h |  1 +
> > >   2 files changed, 26 insertions(+)
> > > 
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 48b9fd6..ac876c4 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -5229,6 +5229,31 @@ int pcie_get_width_cap(struct pci_dev *dev, enum pcie_link_width *width)
> > >   EXPORT_SYMBOL(pcie_get_width_cap);
> > >   /**
> > > + * pcie_print_link_status - Reports the PCI device's link speed and width.
> > > + * @dev: PCI device to query
> > > + *
> > > + * This function checks whether the PCI device current speed and width are equal
> > > + * to the maximum PCI device capabilities.
> > > + */
> > > +void pcie_print_link_status(struct pci_dev *dev)
> > > +{
> > > +	enum pcie_link_width width, width_cap;
> > > +	enum pci_bus_speed speed, speed_cap;
> > > +
> > > +	pcie_get_speed_cap(dev, &speed_cap);
> > > +	pcie_get_width_cap(dev, &width_cap);
> > > +	pcie_get_minimum_link(dev, &speed, &width);
> > > +
> > > +	if (speed == speed_cap && width == width_cap)
> > > +		pci_info(dev, "%s x%d link\n", PCIE_SPEED2STR(speed), width);
> > > +	else
> > > +		pci_info(dev, "%s x%d link (capable of %s x%d)\n",
> > > +			 PCIE_SPEED2STR(speed), width,
> > > +			 PCIE_SPEED2STR(speed_cap), width_cap);
> > 
> > I think pcie_get_minimum_link() is misleading.  This new use of
> > it is very similar to the existing uses, but I think we should clean
> > this up before using it in more places.
> > 
> > pcie_get_minimum_link() finds the minimum speed and minimum link width
> > (separately) across all the links in the path, and I don't think
> > that's what we really want.  What we *want* is the total bandwidth
> > available to the device, because we're trying to learn if the device
> > is capable of more than the fabric can deliver.
> > 
> > That means we want to look at all the links in the path leading to the
> > device and find the link with the lowest bandwidth, i.e., width *
> > speed.  Obviously this is not necessarily the link with the fewest
> > lanes or the lowest link speed.
> 
> The problem I see here is we need to consider differences in PCIe
> gens.  Let's consider this example: gen3 x4 device which down the
> chain goes over gen2 x8 pcie bridge. The device's bandwidth is
> D=4*8*(gen3 encoding) and the bridge's bandwidth is B=8*5*(gen2
> encoding). D~=31.5 and B~=32. So it would seem like there's no
> issue, but don't we want to know we actually went up as gen2?

If you're saying the "bandwidth = width * speed" equation should be
enhanced to consider the effect of encoding, I would agree with that.
fm10k_slot_warn() has an example of doing that.

I don't think we specifically care about whether it is gen2/gen3/etc.
If a device with a narrow gen3 link can't keep a wide gen2 link
upstream from it saturated, the device has nothing to complain about.

> > I hinted at this before, but you thought it would result in more
> > questions than answers [1], and I failed to continue the conversation.
> 
> You raised the concern of not knowing the device limiting the total
> bandwidth of the chain. I thought it was an argument against the
> idea, but maybe I misunderstood. Let's agree on a design (below)
> before I make anymore changes :).

I was basically asking whether you wanted to know which device was
the limiting factor.  If we care about that, we could easily return
it, but I don't see any drivers that care about it now.

> > Let's continue it now :)
> > 
> > Here are some straw-man interfaces to return a device's capabilities
> > and the available bandwidth in MB/s:
> > 
> >    unsigned int pcie_bandwidth_capable(struct pci_dev *dev,
> >                                        enum pci_bus_speed *speed,
> >                                        enum pcie_link_width *width);
> >    unsigned int pcie_bandwidth_available(struct pci_dev *dev);
> > 
> > Then we can compare the result with what "dev" is capable of, e.g.,
> > 
> >    my_bw = pcie_bandwidth_capable(dev, &my_speed, &my_width);
> >    avail_bw = pcie_bandwidth_available(dev);
> >    if (avail_bw >= my_bw)
> 
> Can it really be >?

If pcie_bandwidth_available() looks at my device or the immediate
upstream bridge (two ends of the last link), then no, the available
bandwidth (based on current link settings) should never be greater
than the bandwidth my device could support (based on link
capabilities).

> >      pci_info(dev, "%d MB/s available bandwidth (%s x%d link)\n",
> >               my_bw, PCIE_SPEED2STR(my_speed), my_width);
> >    else
> >      pci_info(dev, "%d MB/s available bandwidth (capable of %d MB/s, %s x%d)\n",
> >               avail_bw, my_bw, PCIE_SPEED2STR(my_speed), my_width);
> > 
> > Using GB/s would be nicer; I lazily used MB/s to avoid dealing with
> > decimals, e.g., 2.5GB/s.
> 
> So let's see if we agree on the steps:
> 1. my_speed_cap, my_width_cap <- calculate device PCIe caps
> 2. avail_bw, limiting_dev <- calculate PCIe chain bandwidth
> 3. my_bw <- my_speed_cap * my_width_cap
> 4. If avail_bw == my_bw print available bandwidth + PCIe caps
> 5. Else print available bandwidth + limited by + capable bandwidth + PCIe
> caps
> 
> What do you think?

Steps 2 and 3 might need to be smart enough to apply the effect of
encoding differences between generations.

In step 2, we don't have any current user of the "limiting_dev"
information, so I'd omit it until we have somebody who wants it.

In step 5, we don't know the "limited by" part (unless you want to add
that).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log
  2018-03-21 19:47       ` Bjorn Helgaas
@ 2018-03-21 19:59         ` Keller, Jacob E
  2018-03-21 20:10           ` Bjorn Helgaas
  0 siblings, 1 reply; 15+ messages in thread
From: Keller, Jacob E @ 2018-03-21 19:59 UTC (permalink / raw)
  To: Bjorn Helgaas, Tal Gilboa; +Cc: Bjorn Helgaas, Linux PCI, Tariq Toukan



> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> Sent: Wednesday, March 21, 2018 12:48 PM
> To: Tal Gilboa <talgi@mellanox.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>; Linux PCI <linux-
> pci@vger.kernel.org>; Tariq Toukan <tariqt@mellanox.com>; Keller, Jacob E
> <jacob.e.keller@intel.com>
> Subject: Re: [PATCH next V3 3/5] PCI: Print PCI device link status in ker=
nel log
>=20
> On Wed, Mar 21, 2018 at 09:43:48AM +0200, Tal Gilboa wrote:
> > On 3/20/2018 4:05 PM, Bjorn Helgaas wrote:
> > > [+cc Jacob]
> > >
> > > On Mon, Mar 12, 2018 at 02:06:08PM +0200, Tal Gilboa wrote:
> > > > Add pcie_print_link_status() function for querying and verifying
> > > > a PCI device link status. The PCI speed and width are reported
> > > > in kernel log.
> > > > This provides a unified method for all PCI devices to
> > > > report status and issues, instead of each device reporting in a
> > > > different way, using different code.
> > > >
> > > > Signed-off-by: Tal Gilboa <talgi@mellanox.com>
> > > > Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
> > > > ---
> > > >   drivers/pci/pci.c   | 25 +++++++++++++++++++++++++
> > > >   include/linux/pci.h |  1 +
> > > >   2 files changed, 26 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > index 48b9fd6..ac876c4 100644
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -5229,6 +5229,31 @@ int pcie_get_width_cap(struct pci_dev *dev,
> enum pcie_link_width *width)
> > > >   EXPORT_SYMBOL(pcie_get_width_cap);
> > > >   /**
> > > > + * pcie_print_link_status - Reports the PCI device's link speed an=
d width.
> > > > + * @dev: PCI device to query
> > > > + *
> > > > + * This function checks whether the PCI device current speed and w=
idth
> are equal
> > > > + * to the maximum PCI device capabilities.
> > > > + */
> > > > +void pcie_print_link_status(struct pci_dev *dev)
> > > > +{
> > > > +	enum pcie_link_width width, width_cap;
> > > > +	enum pci_bus_speed speed, speed_cap;
> > > > +
> > > > +	pcie_get_speed_cap(dev, &speed_cap);
> > > > +	pcie_get_width_cap(dev, &width_cap);
> > > > +	pcie_get_minimum_link(dev, &speed, &width);
> > > > +
> > > > +	if (speed =3D=3D speed_cap && width =3D=3D width_cap)
> > > > +		pci_info(dev, "%s x%d link\n", PCIE_SPEED2STR(speed), width);
> > > > +	else
> > > > +		pci_info(dev, "%s x%d link (capable of %s x%d)\n",
> > > > +			 PCIE_SPEED2STR(speed), width,
> > > > +			 PCIE_SPEED2STR(speed_cap), width_cap);
> > >
> > > I think pcie_get_minimum_link() is misleading.  This new use of
> > > it is very similar to the existing uses, but I think we should clean
> > > this up before using it in more places.
> > >
> > > pcie_get_minimum_link() finds the minimum speed and minimum link widt=
h
> > > (separately) across all the links in the path, and I don't think
> > > that's what we really want.  What we *want* is the total bandwidth
> > > available to the device, because we're trying to learn if the device
> > > is capable of more than the fabric can deliver.
> > >
> > > That means we want to look at all the links in the path leading to th=
e
> > > device and find the link with the lowest bandwidth, i.e., width *
> > > speed.  Obviously this is not necessarily the link with the fewest
> > > lanes or the lowest link speed.
> >
> > The problem I see here is we need to consider differences in PCIe
> > gens.  Let's consider this example: gen3 x4 device which down the
> > chain goes over gen2 x8 pcie bridge. The device's bandwidth is
> > D=3D4*8*(gen3 encoding) and the bridge's bandwidth is B=3D8*5*(gen2
> > encoding). D~=3D31.5 and B~=3D32. So it would seem like there's no
> > issue, but don't we want to know we actually went up as gen2?
>=20
> If you're saying the "bandwidth =3D width * speed" equation should be
> enhanced to consider the effect of encoding, I would agree with that.
> fm10k_slot_warn() has an example of doing that.
>=20
> I don't think we specifically care about whether it is gen2/gen3/etc.
> If a device with a narrow gen3 link can't keep a wide gen2 link
> upstream from it saturated, the device has nothing to complain about.
>=20
> > > I hinted at this before, but you thought it would result in more
> > > questions than answers [1], and I failed to continue the conversation=
.
> >
> > You raised the concern of not knowing the device limiting the total
> > bandwidth of the chain. I thought it was an argument against the
> > idea, but maybe I misunderstood. Let's agree on a design (below)
> > before I make anymore changes :).
>=20
> I was basically asking whether you wanted to know which device was
> the limiting factor.  If we care about that, we could easily return
> it, but I don't see any drivers that care about it now.
>=20
> > > Let's continue it now :)
> > >
> > > Here are some straw-man interfaces to return a device's capabilities
> > > and the available bandwidth in MB/s:
> > >
> > >    unsigned int pcie_bandwidth_capable(struct pci_dev *dev,
> > >                                        enum pci_bus_speed *speed,
> > >                                        enum pcie_link_width *width);
> > >    unsigned int pcie_bandwidth_available(struct pci_dev *dev);
> > >
> > > Then we can compare the result with what "dev" is capable of, e.g.,
> > >
> > >    my_bw =3D pcie_bandwidth_capable(dev, &my_speed, &my_width);
> > >    avail_bw =3D pcie_bandwidth_available(dev);
> > >    if (avail_bw >=3D my_bw)
> >
> > Can it really be >?
>=20
> If pcie_bandwidth_available() looks at my device or the immediate
> upstream bridge (two ends of the last link), then no, the available
> bandwidth (based on current link settings) should never be greater
> than the bandwidth my device could support (based on link
> capabilities).
>=20
> > >      pci_info(dev, "%d MB/s available bandwidth (%s x%d link)\n",
> > >               my_bw, PCIE_SPEED2STR(my_speed), my_width);
> > >    else
> > >      pci_info(dev, "%d MB/s available bandwidth (capable of %d MB/s, =
%s
> x%d)\n",
> > >               avail_bw, my_bw, PCIE_SPEED2STR(my_speed), my_width);
> > >
> > > Using GB/s would be nicer; I lazily used MB/s to avoid dealing with
> > > decimals, e.g., 2.5GB/s.
> >
> > So let's see if we agree on the steps:
> > 1. my_speed_cap, my_width_cap <- calculate device PCIe caps
> > 2. avail_bw, limiting_dev <- calculate PCIe chain bandwidth
> > 3. my_bw <- my_speed_cap * my_width_cap
> > 4. If avail_bw =3D=3D my_bw print available bandwidth + PCIe caps
> > 5. Else print available bandwidth + limited by + capable bandwidth + PC=
Ie
> > caps
> >
> > What do you think?
>=20
> Steps 2 and 3 might need to be smart enough to apply the effect of
> encoding differences between generations.
>=20
> In step 2, we don't have any current user of the "limiting_dev"
> information, so I'd omit it until we have somebody who wants it.
>=20
> In step 5, we don't know the "limited by" part (unless you want to add
> that).

It might be useful to have the limited by information printed, even if no d=
river yet bothered to do it today.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log
  2018-03-21 19:59         ` Keller, Jacob E
@ 2018-03-21 20:10           ` Bjorn Helgaas
  2018-03-27 17:17             ` Tal Gilboa
  0 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2018-03-21 20:10 UTC (permalink / raw)
  To: Keller, Jacob E; +Cc: Tal Gilboa, Bjorn Helgaas, Linux PCI, Tariq Toukan

On Wed, Mar 21, 2018 at 07:59:21PM +0000, Keller, Jacob E wrote:
> > -----Original Message-----
> > From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> > Sent: Wednesday, March 21, 2018 12:48 PM
> > To: Tal Gilboa <talgi@mellanox.com>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>; Linux PCI <linux-
> > pci@vger.kernel.org>; Tariq Toukan <tariqt@mellanox.com>; Keller, Jacob E
> > <jacob.e.keller@intel.com>
> > Subject: Re: [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log
> > 
> > On Wed, Mar 21, 2018 at 09:43:48AM +0200, Tal Gilboa wrote:
> > > On 3/20/2018 4:05 PM, Bjorn Helgaas wrote:
> > > > [+cc Jacob]
> > > >
> > > > On Mon, Mar 12, 2018 at 02:06:08PM +0200, Tal Gilboa wrote:
> > > > > Add pcie_print_link_status() function for querying and verifying
> > > > > a PCI device link status. The PCI speed and width are reported
> > > > > in kernel log.
> > > > > This provides a unified method for all PCI devices to
> > > > > report status and issues, instead of each device reporting in a
> > > > > different way, using different code.

> > > So let's see if we agree on the steps:
> > > 1. my_speed_cap, my_width_cap <- calculate device PCIe caps
> > > 2. avail_bw, limiting_dev <- calculate PCIe chain bandwidth
> > > 3. my_bw <- my_speed_cap * my_width_cap
> > > 4. If avail_bw == my_bw print available bandwidth + PCIe caps
> > > 5. Else print available bandwidth + limited by + capable bandwidth + PCIe
> > > caps
> > >
> > > What do you think?
> > 
> > Steps 2 and 3 might need to be smart enough to apply the effect of
> > encoding differences between generations.
> > 
> > In step 2, we don't have any current user of the "limiting_dev"
> > information, so I'd omit it until we have somebody who wants it.
> > 
> > In step 5, we don't know the "limited by" part (unless you want to add
> > that).
> 
> It might be useful to have the limited by information printed, even
> if no driver yet bothered to do it today.

I wouldn't object to printing that information (although it increases
the challenge of making the message pithy), and it's basically free to
collect it.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log
  2018-03-21 20:10           ` Bjorn Helgaas
@ 2018-03-27 17:17             ` Tal Gilboa
  2018-03-27 17:28               ` Bjorn Helgaas
  0 siblings, 1 reply; 15+ messages in thread
From: Tal Gilboa @ 2018-03-27 17:17 UTC (permalink / raw)
  To: Bjorn Helgaas, Keller, Jacob E; +Cc: Bjorn Helgaas, Linux PCI, Tariq Toukan

On 3/21/2018 10:10 PM, Bjorn Helgaas wrote:
> On Wed, Mar 21, 2018 at 07:59:21PM +0000, Keller, Jacob E wrote:
>>> -----Original Message-----
>>> From: Bjorn Helgaas [mailto:helgaas@kernel.org]
>>> Sent: Wednesday, March 21, 2018 12:48 PM
>>> To: Tal Gilboa <talgi@mellanox.com>
>>> Cc: Bjorn Helgaas <bhelgaas@google.com>; Linux PCI <linux-
>>> pci@vger.kernel.org>; Tariq Toukan <tariqt@mellanox.com>; Keller, Jacob E
>>> <jacob.e.keller@intel.com>
>>> Subject: Re: [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log
>>>
>>> On Wed, Mar 21, 2018 at 09:43:48AM +0200, Tal Gilboa wrote:
>>>> On 3/20/2018 4:05 PM, Bjorn Helgaas wrote:
>>>>> [+cc Jacob]
>>>>>
>>>>> On Mon, Mar 12, 2018 at 02:06:08PM +0200, Tal Gilboa wrote:
>>>>>> Add pcie_print_link_status() function for querying and verifying
>>>>>> a PCI device link status. The PCI speed and width are reported
>>>>>> in kernel log.
>>>>>> This provides a unified method for all PCI devices to
>>>>>> report status and issues, instead of each device reporting in a
>>>>>> different way, using different code.
> 
>>>> So let's see if we agree on the steps:
>>>> 1. my_speed_cap, my_width_cap <- calculate device PCIe caps
>>>> 2. avail_bw, limiting_dev <- calculate PCIe chain bandwidth
>>>> 3. my_bw <- my_speed_cap * my_width_cap
>>>> 4. If avail_bw == my_bw print available bandwidth + PCIe caps
>>>> 5. Else print available bandwidth + limited by + capable bandwidth + PCIe
>>>> caps
>>>>
>>>> What do you think?
>>>
>>> Steps 2 and 3 might need to be smart enough to apply the effect of
>>> encoding differences between generations.
>>>
>>> In step 2, we don't have any current user of the "limiting_dev"
>>> information, so I'd omit it until we have somebody who wants it.
>>>
>>> In step 5, we don't know the "limited by" part (unless you want to add
>>> that).
>>
>> It might be useful to have the limited by information printed, even
>> if no driver yet bothered to do it today.
> 
> I wouldn't object to printing that information (although it increases
> the challenge of making the message pithy), and it's basically free to
> collect it.
> 

Coding done, currently under internal review. Will submit right after.
I'm having some trouble printing the limiting device bus. Any 
recommendations on which format to use? dev->bus->name gives me the 6 
first digits (e.g. "0000:07"). How do I get the last 3 (e.g. "00.0")? 
dev->bus->primary and dev->bus->number seem like good candidates but the 
actual values I get seem off.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log
  2018-03-27 17:17             ` Tal Gilboa
@ 2018-03-27 17:28               ` Bjorn Helgaas
  2018-03-27 17:31                 ` Bjorn Helgaas
  0 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2018-03-27 17:28 UTC (permalink / raw)
  To: Tal Gilboa; +Cc: Keller, Jacob E, Bjorn Helgaas, Linux PCI, Tariq Toukan

On Tue, Mar 27, 2018 at 08:17:26PM +0300, Tal Gilboa wrote:
> On 3/21/2018 10:10 PM, Bjorn Helgaas wrote:
> > On Wed, Mar 21, 2018 at 07:59:21PM +0000, Keller, Jacob E wrote:
> > > > -----Original Message-----
> > > > From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> > > > Sent: Wednesday, March 21, 2018 12:48 PM
> > > > To: Tal Gilboa <talgi@mellanox.com>
> > > > Cc: Bjorn Helgaas <bhelgaas@google.com>; Linux PCI <linux-
> > > > pci@vger.kernel.org>; Tariq Toukan <tariqt@mellanox.com>; Keller, Jacob E
> > > > <jacob.e.keller@intel.com>
> > > > Subject: Re: [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log
> > > > 
> > > > On Wed, Mar 21, 2018 at 09:43:48AM +0200, Tal Gilboa wrote:
> > > > > On 3/20/2018 4:05 PM, Bjorn Helgaas wrote:
> > > > > > [+cc Jacob]
> > > > > > 
> > > > > > On Mon, Mar 12, 2018 at 02:06:08PM +0200, Tal Gilboa wrote:
> > > > > > > Add pcie_print_link_status() function for querying and verifying
> > > > > > > a PCI device link status. The PCI speed and width are reported
> > > > > > > in kernel log.
> > > > > > > This provides a unified method for all PCI devices to
> > > > > > > report status and issues, instead of each device reporting in a
> > > > > > > different way, using different code.
> > 
> > > > > So let's see if we agree on the steps:
> > > > > 1. my_speed_cap, my_width_cap <- calculate device PCIe caps
> > > > > 2. avail_bw, limiting_dev <- calculate PCIe chain bandwidth
> > > > > 3. my_bw <- my_speed_cap * my_width_cap
> > > > > 4. If avail_bw == my_bw print available bandwidth + PCIe caps
> > > > > 5. Else print available bandwidth + limited by + capable bandwidth + PCIe
> > > > > caps
> > > > > 
> > > > > What do you think?
> > > > 
> > > > Steps 2 and 3 might need to be smart enough to apply the effect of
> > > > encoding differences between generations.
> > > > 
> > > > In step 2, we don't have any current user of the "limiting_dev"
> > > > information, so I'd omit it until we have somebody who wants it.
> > > > 
> > > > In step 5, we don't know the "limited by" part (unless you want to add
> > > > that).
> > > 
> > > It might be useful to have the limited by information printed, even
> > > if no driver yet bothered to do it today.
> > 
> > I wouldn't object to printing that information (although it increases
> > the challenge of making the message pithy), and it's basically free to
> > collect it.
> > 
> 
> Coding done, currently under internal review. Will submit right after.
> I'm having some trouble printing the limiting device bus. Any
> recommendations on which format to use? dev->bus->name gives me the 6 first
> digits (e.g. "0000:07"). How do I get the last 3 (e.g. "00.0")?
> dev->bus->primary and dev->bus->number seem like good candidates but the
> actual values I get seem off.

device_name(pdev) should be what you want.  You should see two devices
with the lowest bandwidth, i.e., the upstream and downstream ends of
one link.  I think it would make the most sense to print the upstream
end.

Bjorn

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log
  2018-03-27 17:28               ` Bjorn Helgaas
@ 2018-03-27 17:31                 ` Bjorn Helgaas
  0 siblings, 0 replies; 15+ messages in thread
From: Bjorn Helgaas @ 2018-03-27 17:31 UTC (permalink / raw)
  To: Tal Gilboa; +Cc: Keller, Jacob E, Bjorn Helgaas, Linux PCI, Tariq Toukan

On Tue, Mar 27, 2018 at 12:28:42PM -0500, Bjorn Helgaas wrote:
> On Tue, Mar 27, 2018 at 08:17:26PM +0300, Tal Gilboa wrote:
> > On 3/21/2018 10:10 PM, Bjorn Helgaas wrote:
> > > On Wed, Mar 21, 2018 at 07:59:21PM +0000, Keller, Jacob E wrote:
> > > > > -----Original Message-----
> > > > > From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> > > > > Sent: Wednesday, March 21, 2018 12:48 PM
> > > > > To: Tal Gilboa <talgi@mellanox.com>
> > > > > Cc: Bjorn Helgaas <bhelgaas@google.com>; Linux PCI <linux-
> > > > > pci@vger.kernel.org>; Tariq Toukan <tariqt@mellanox.com>; Keller, Jacob E
> > > > > <jacob.e.keller@intel.com>
> > > > > Subject: Re: [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log
> > > > > 
> > > > > On Wed, Mar 21, 2018 at 09:43:48AM +0200, Tal Gilboa wrote:
> > > > > > On 3/20/2018 4:05 PM, Bjorn Helgaas wrote:
> > > > > > > [+cc Jacob]
> > > > > > > 
> > > > > > > On Mon, Mar 12, 2018 at 02:06:08PM +0200, Tal Gilboa wrote:
> > > > > > > > Add pcie_print_link_status() function for querying and verifying
> > > > > > > > a PCI device link status. The PCI speed and width are reported
> > > > > > > > in kernel log.
> > > > > > > > This provides a unified method for all PCI devices to
> > > > > > > > report status and issues, instead of each device reporting in a
> > > > > > > > different way, using different code.
> > > 
> > > > > > So let's see if we agree on the steps:
> > > > > > 1. my_speed_cap, my_width_cap <- calculate device PCIe caps
> > > > > > 2. avail_bw, limiting_dev <- calculate PCIe chain bandwidth
> > > > > > 3. my_bw <- my_speed_cap * my_width_cap
> > > > > > 4. If avail_bw == my_bw print available bandwidth + PCIe caps
> > > > > > 5. Else print available bandwidth + limited by + capable bandwidth + PCIe
> > > > > > caps
> > > > > > 
> > > > > > What do you think?
> > > > > 
> > > > > Steps 2 and 3 might need to be smart enough to apply the effect of
> > > > > encoding differences between generations.
> > > > > 
> > > > > In step 2, we don't have any current user of the "limiting_dev"
> > > > > information, so I'd omit it until we have somebody who wants it.
> > > > > 
> > > > > In step 5, we don't know the "limited by" part (unless you want to add
> > > > > that).
> > > > 
> > > > It might be useful to have the limited by information printed, even
> > > > if no driver yet bothered to do it today.
> > > 
> > > I wouldn't object to printing that information (although it increases
> > > the challenge of making the message pithy), and it's basically free to
> > > collect it.
> > > 
> > 
> > Coding done, currently under internal review. Will submit right after.
> > I'm having some trouble printing the limiting device bus. Any
> > recommendations on which format to use? dev->bus->name gives me the 6 first
> > digits (e.g. "0000:07"). How do I get the last 3 (e.g. "00.0")?
> > dev->bus->primary and dev->bus->number seem like good candidates but the
> > actual values I get seem off.
> 
> device_name(pdev) should be what you want.  You should see two devices
> with the lowest bandwidth, i.e., the upstream and downstream ends of
> one link.  I think it would make the most sense to print the upstream
> end.

Sorry, it would help if I actually looked at the code first.
pci_name(pdev), which calls dev_name(&pdev->dev), is what you want.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-03-27 17:31 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-12 12:06 [PATCH next V3 0/5] Report PCI device link status Tal Gilboa
2018-03-12 12:06 ` [PATCH next V3 1/5] PCI: Add a query function for PCI device's speed cap Tal Gilboa
2018-03-12 12:06 ` [PATCH next V3 2/5] PCI: Add a query function for PCI device's width cap Tal Gilboa
2018-03-12 12:06 ` [PATCH next V3 3/5] PCI: Print PCI device link status in kernel log Tal Gilboa
2018-03-20 14:05   ` Bjorn Helgaas
2018-03-20 15:57     ` Keller, Jacob E
2018-03-21  7:43     ` Tal Gilboa
2018-03-21 19:47       ` Bjorn Helgaas
2018-03-21 19:59         ` Keller, Jacob E
2018-03-21 20:10           ` Bjorn Helgaas
2018-03-27 17:17             ` Tal Gilboa
2018-03-27 17:28               ` Bjorn Helgaas
2018-03-27 17:31                 ` Bjorn Helgaas
2018-03-12 12:06 ` [PATCH next V3 4/5] net/mlx4_core: Report PCI properties using dedicated function Tal Gilboa
2018-03-12 12:06 ` [PATCH next V3 5/5] net/mlx5: Report device PCI link status and issues Tal Gilboa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).