* [Intel-wired-lan] [net-next 0/3] fm10k: driver updates
@ 2018-10-15 19:18 Jacob Keller
  2018-10-15 19:18 ` [Intel-wired-lan] [net-next 1/3] fm10k: fix SM mailbox full condition Jacob Keller
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jacob Keller @ 2018-10-15 19:18 UTC (permalink / raw)
  To: intel-wired-lan

This series includes a few driver updates for the fm10k host interface
driver. In addition to a fix for strict platforms that can trigger
machine check exceptions during VF resets, we also add the device IDs
for the SDI adapters, which were never upstreamed even though the
out-of-tree driver supports them and the IDs are already in the
pci.ids database.

Jacob Keller (2):
  fm10k: ensure completer aborts are marked as non-fatal after a resume
  fm10k: add missing device IDs to the upstream driver

Ngai-Mint Kwan (1):
  fm10k: fix SM mailbox full condition

 drivers/net/ethernet/intel/fm10k/fm10k_iov.c  | 51 +++++++++++--------
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c  |  2 +
 drivers/net/ethernet/intel/fm10k/fm10k_type.h |  2 +
 3 files changed, 34 insertions(+), 21 deletions(-)

-- 
2.18.0.219.gaf81d287a9da


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Intel-wired-lan] [net-next 1/3] fm10k: fix SM mailbox full condition
  2018-10-15 19:18 [Intel-wired-lan] [net-next 0/3] fm10k: driver updates Jacob Keller
@ 2018-10-15 19:18 ` Jacob Keller
  2018-11-01 18:00   ` Singh, Krishneil K
  2018-10-15 19:18 ` [Intel-wired-lan] [net-next 2/3] fm10k: ensure completer aborts are marked as non-fatal after a resume Jacob Keller
  2018-10-15 19:18 ` [Intel-wired-lan] [net-next 3/3] fm10k: add missing device IDs to the upstream driver Jacob Keller
  2 siblings, 1 reply; 7+ messages in thread
From: Jacob Keller @ 2018-10-15 19:18 UTC (permalink / raw)
  To: intel-wired-lan

From: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>

The current condition always incorrectly reports a full SM mailbox when
no IES API application is running. Because of this, the
"fm10k_service_task" is queued into the driver's workqueue indefinitely.
This, in turn, causes a "kworker" thread to report 100% CPU utilization
and can lead to "soft lockup" events or system crashes.

To fix this issue, a new condition is added to check that the SM
mailbox is in the FM10K_STATE_OPEN state before proceeding; in other
words, an instance of the IES API must be running. If it is, the
remainder of the flow stays the same: determine whether the SM mailbox
capacity has been exceeded and take the appropriate action.
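
The guard added by this patch can be illustrated with a small standalone
sketch; the enum, struct, and function names below are hypothetical
stand-ins for illustration, not the real definitions from the fm10k
mailbox code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's mailbox state machine. */
enum mbx_state { STATE_CLOSED, STATE_OPEN };

struct mbx {
	enum mbx_state state;
	bool (*tx_ready)(const struct mbx *mbx, unsigned int mtu);
};

/* Before the fix: any failed tx_ready() check is treated as a full
 * mailbox, even when the mailbox is not open because no IES API
 * application is running.
 */
static bool sm_mbx_full_old(const struct mbx *mbx)
{
	return !mbx->tx_ready(mbx, 512);
}

/* After the fix: only a mailbox in the open state can be considered
 * full, so a closed mailbox no longer causes the service task to be
 * requeued forever.
 */
static bool sm_mbx_full_new(const struct mbx *mbx)
{
	return mbx->state == STATE_OPEN && !mbx->tx_ready(mbx, 512);
}
```

With a closed mailbox whose tx_ready() always fails, the old predicate
reports "full" (the false positive described above) while the new one
does not; an open mailbox with no free space is still reported as full.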

Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index e707d717012f..74160c2095ee 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -244,7 +244,8 @@ s32 fm10k_iov_mbx(struct fm10k_intfc *interface)
 		}
 
 		/* guarantee we have free space in the SM mailbox */
-		if (!hw->mbx.ops.tx_ready(&hw->mbx, FM10K_VFMBX_MSG_MTU)) {
+		if (hw->mbx.state == FM10K_STATE_OPEN &&
+		    !hw->mbx.ops.tx_ready(&hw->mbx, FM10K_VFMBX_MSG_MTU)) {
 			/* keep track of how many times this occurs */
 			interface->hw_sm_mbx_full++;
 
-- 
2.18.0.219.gaf81d287a9da



* [Intel-wired-lan] [net-next 2/3] fm10k: ensure completer aborts are marked as non-fatal after a resume
  2018-10-15 19:18 [Intel-wired-lan] [net-next 0/3] fm10k: driver updates Jacob Keller
  2018-10-15 19:18 ` [Intel-wired-lan] [net-next 1/3] fm10k: fix SM mailbox full condition Jacob Keller
@ 2018-10-15 19:18 ` Jacob Keller
  2018-11-01 17:59   ` Singh, Krishneil K
  2018-10-15 19:18 ` [Intel-wired-lan] [net-next 3/3] fm10k: add missing device IDs to the upstream driver Jacob Keller
  2 siblings, 1 reply; 7+ messages in thread
From: Jacob Keller @ 2018-10-15 19:18 UTC (permalink / raw)
  To: intel-wired-lan

VF drivers can trigger PCIe completer aborts any time they read a queue
that they don't own. Even in nominal circumstances, it is not possible
to prevent a VF driver from reading queues it doesn't own. A VF driver
may attempt to read queues it previously owned, but which it no longer
does due to a PF reset.

Normally these completer aborts aren't an issue. However, on some
platforms these trigger machine check errors. This is true even if we
lower their severity from fatal to non-fatal. Indeed, we already have
code for lowering the severity.

We could attempt to mask these errors conditionally around resets, which
is the most common time they would occur. However, this would essentially
be a race between the PF and VF drivers, and we might still occasionally
see machine check exceptions on these strictly configured platforms.

Instead, mask the errors entirely any time we resume VFs. By doing so,
we prevent the completer aborts from being sent to the parent PCIe
device, and thus these strict platforms will not upgrade them into
machine check errors.

Additionally, we don't lose any information by masking these errors:
VFs that attempt to access queues they don't own are still reported
via the FUM_BAD_VF_QACCESS errors.

Without this change, on platforms where completer aborts cause machine
check exceptions, the VF reading queues it doesn't own could crash the
host system. Masking the completer abort prevents this, so we should
mask it for good, and not just around a PCIe reset. Otherwise malicious
or misconfigured VFs could cause the host system to crash.

Because we are masking the error entirely, there is little reason to
also keep setting the severity bit, so that code is also removed.
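
The read-modify-write that the patch performs on the AER Uncorrectable
Error Mask register can be modeled on a plain integer. The two register
constants below match include/uapi/linux/pci_regs.h; mask_comp_abort()
is a hypothetical helper for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Values from include/uapi/linux/pci_regs.h */
#define PCI_ERR_UNCOR_MASK	0x08        /* offset into the AER capability */
#define PCI_ERR_UNC_COMP_ABORT	0x00008000  /* Completer Abort */

/* The same OR the patch applies to the value read from config space.
 * Setting the mask bit stops the device from signaling completer
 * aborts upstream at all, unlike the severity register, which only
 * downgrades them from fatal to non-fatal.
 */
static uint32_t mask_comp_abort(uint32_t err_mask)
{
	return err_mask | PCI_ERR_UNC_COMP_ABORT;
}
```

The operation is idempotent and preserves every other mask bit, which is
why a plain read-modify-write of the register is sufficient here.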

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 48 ++++++++++++--------
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index 74160c2095ee..5d4f1761dc0c 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -303,6 +303,28 @@ void fm10k_iov_suspend(struct pci_dev *pdev)
 	}
 }
 
+static void fm10k_mask_aer_comp_abort(struct pci_dev *pdev)
+{
+	u32 err_mask;
+	int pos;
+
+	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
+	if (!pos)
+		return;
+
+	/* Mask the completion abort bit in the ERR_UNCOR_MASK register,
+	 * preventing the device from reporting these errors to the upstream
+	 * PCIe root device. This avoids bringing down platforms which upgrade
+	 * non-fatal completer aborts into machine check exceptions. Completer
+	 * aborts can occur whenever a VF reads a queue it doesn't own.
+	 */
+	pci_read_config_dword(pdev, pos + PCI_ERR_UNCOR_MASK, &err_mask);
+	err_mask |= PCI_ERR_UNC_COMP_ABORT;
+	pci_write_config_dword(pdev, pos + PCI_ERR_UNCOR_MASK, err_mask);
+
+	mmiowb();
+}
+
 int fm10k_iov_resume(struct pci_dev *pdev)
 {
 	struct fm10k_intfc *interface = pci_get_drvdata(pdev);
@@ -318,6 +340,12 @@ int fm10k_iov_resume(struct pci_dev *pdev)
 	if (!iov_data)
 		return -ENOMEM;
 
+	/* Lower severity of completer abort error reporting as
+	 * the VFs can trigger this any time they read a queue
+	 * that they don't own.
+	 */
+	fm10k_mask_aer_comp_abort(pdev);
+
 	/* allocate hardware resources for the VFs */
 	hw->iov.ops.assign_resources(hw, num_vfs, num_vfs);
 
@@ -461,20 +489,6 @@ void fm10k_iov_disable(struct pci_dev *pdev)
 	fm10k_iov_free_data(pdev);
 }
 
-static void fm10k_disable_aer_comp_abort(struct pci_dev *pdev)
-{
-	u32 err_sev;
-	int pos;
-
-	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
-	if (!pos)
-		return;
-
-	pci_read_config_dword(pdev, pos + PCI_ERR_UNCOR_SEVER, &err_sev);
-	err_sev &= ~PCI_ERR_UNC_COMP_ABORT;
-	pci_write_config_dword(pdev, pos + PCI_ERR_UNCOR_SEVER, err_sev);
-}
-
 int fm10k_iov_configure(struct pci_dev *pdev, int num_vfs)
 {
 	int current_vfs = pci_num_vf(pdev);
@@ -496,12 +510,6 @@ int fm10k_iov_configure(struct pci_dev *pdev, int num_vfs)
 
 	/* allocate VFs if not already allocated */
 	if (num_vfs && num_vfs != current_vfs) {
-		/* Disable completer abort error reporting as
-		 * the VFs can trigger this any time they read a queue
-		 * that they don't own.
-		 */
-		fm10k_disable_aer_comp_abort(pdev);
-
 		err = pci_enable_sriov(pdev, num_vfs);
 		if (err) {
 			dev_err(&pdev->dev,
-- 
2.18.0.219.gaf81d287a9da



* [Intel-wired-lan] [net-next 3/3] fm10k: add missing device IDs to the upstream driver
  2018-10-15 19:18 [Intel-wired-lan] [net-next 0/3] fm10k: driver updates Jacob Keller
  2018-10-15 19:18 ` [Intel-wired-lan] [net-next 1/3] fm10k: fix SM mailbox full condition Jacob Keller
  2018-10-15 19:18 ` [Intel-wired-lan] [net-next 2/3] fm10k: ensure completer aborts are marked as non-fatal after a resume Jacob Keller
@ 2018-10-15 19:18 ` Jacob Keller
  2018-10-19 16:11   ` Bowers, AndrewX
  2 siblings, 1 reply; 7+ messages in thread
From: Jacob Keller @ 2018-10-15 19:18 UTC (permalink / raw)
  To: intel-wired-lan

The device IDs for the Ethernet SDI Adapter devices were never added to
the upstream driver. The IDs are already in the pci.ids database, and
are supported by the out-of-tree driver.

Add the device IDs now, so that the upstream driver can recognize and
load these devices.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c  | 2 ++
 drivers/net/ethernet/intel/fm10k/fm10k_type.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index c859ababeed5..8f1345b11aa9 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -23,6 +23,8 @@ static const struct fm10k_info *fm10k_info_tbl[] = {
  */
 static const struct pci_device_id fm10k_pci_tbl[] = {
 	{ PCI_VDEVICE(INTEL, FM10K_DEV_ID_PF), fm10k_device_pf },
+	{ PCI_VDEVICE(INTEL, FM10K_DEV_ID_SDI_FM10420_QDA2), fm10k_device_pf },
+	{ PCI_VDEVICE(INTEL, FM10K_DEV_ID_SDI_FM10420_DA2), fm10k_device_pf },
 	{ PCI_VDEVICE(INTEL, FM10K_DEV_ID_VF), fm10k_device_vf },
 	/* required last entry */
 	{ 0, }
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_type.h b/drivers/net/ethernet/intel/fm10k/fm10k_type.h
index 3e608e493f9d..9fb9fca375e3 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_type.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_type.h
@@ -15,6 +15,8 @@ struct fm10k_hw;
 
 #define FM10K_DEV_ID_PF			0x15A4
 #define FM10K_DEV_ID_VF			0x15A5
+#define FM10K_DEV_ID_SDI_FM10420_QDA2	0x15D0
+#define FM10K_DEV_ID_SDI_FM10420_DA2	0x15D5
 
 #define FM10K_MAX_QUEUES		256
 #define FM10K_MAX_QUEUES_PF		128
-- 
2.18.0.219.gaf81d287a9da



* [Intel-wired-lan] [net-next 3/3] fm10k: add missing device IDs to the upstream driver
  2018-10-15 19:18 ` [Intel-wired-lan] [net-next 3/3] fm10k: add missing device IDs to the upstream driver Jacob Keller
@ 2018-10-19 16:11   ` Bowers, AndrewX
  0 siblings, 0 replies; 7+ messages in thread
From: Bowers, AndrewX @ 2018-10-19 16:11 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at osuosl.org] On
> Behalf Of Jacob Keller
> Sent: Monday, October 15, 2018 12:18 PM
> To: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> Subject: [Intel-wired-lan] [net-next 3/3] fm10k: add missing device IDs to the
> upstream driver
> 
> The device IDs for the Ethernet SDI Adapter devices were never added to
> the upstream driver. The IDs are already in the pci.ids database, and are
> supported by the out-of-tree driver.
> 
> Add the device IDs now, so that the upstream driver can recognize and load
> these devices.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---
>  drivers/net/ethernet/intel/fm10k/fm10k_pci.c  | 2 ++
> drivers/net/ethernet/intel/fm10k/fm10k_type.h | 2 ++
>  2 files changed, 4 insertions(+)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>




* [Intel-wired-lan] [net-next 2/3] fm10k: ensure completer aborts are marked as non-fatal after a resume
  2018-10-15 19:18 ` [Intel-wired-lan] [net-next 2/3] fm10k: ensure completer aborts are marked as non-fatal after a resume Jacob Keller
@ 2018-11-01 17:59   ` Singh, Krishneil K
  0 siblings, 0 replies; 7+ messages in thread
From: Singh, Krishneil K @ 2018-11-01 17:59 UTC (permalink / raw)
  To: intel-wired-lan


> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at osuosl.org] On Behalf
> Of Jacob Keller
> Sent: Monday, October 15, 2018 12:18 PM
> To: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> Subject: [Intel-wired-lan] [net-next 2/3] fm10k: ensure completer aborts are
> marked as non-fatal after a resume
> 
> VF drivers can trigger PCIe completer aborts any time they read a queue
> that they don't own. Even in nominal circumstances, it is not possible
> to prevent a VF driver from reading queues it doesn't own. A VF driver
> may attempt to read queues it previously owned, but which it no longer
> does due to a PF reset.
> 
> Normally these completer aborts aren't an issue. However, on some
> platforms these trigger machine check errors. This is true even if we
> lower their severity from fatal to non-fatal. Indeed, we already have
> code for lowering the severity.
> 
> We could attempt to mask these errors conditionally around resets, which
> is the most common time they would occur. However, this would essentially
> be a race between the PF and VF drivers, and we might still occasionally
> see machine check exceptions on these strictly configured platforms.
> 
> Instead, mask the errors entirely any time we resume VFs. By doing so,
> we prevent the completer aborts from being sent to the parent PCIe
> device, and thus these strict platforms will not upgrade them into
> machine check errors.
> 
> Additionally, we don't lose any information by masking these errors:
> VFs that attempt to access queues they don't own are still reported
> via the FUM_BAD_VF_QACCESS errors.
> 
> Without this change, on platforms where completer aborts cause machine
> check exceptions, the VF reading queues it doesn't own could crash the
> host system. Masking the completer abort prevents this, so we should
> mask it for good, and not just around a PCIe reset. Otherwise malicious
> or misconfigured VFs could cause the host system to crash.
> 
> Because we are masking the error entirely, there is little reason to
> also keep setting the severity bit, so that code is also removed.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---

Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>





* [Intel-wired-lan] [net-next 1/3] fm10k: fix SM mailbox full condition
  2018-10-15 19:18 ` [Intel-wired-lan] [net-next 1/3] fm10k: fix SM mailbox full condition Jacob Keller
@ 2018-11-01 18:00   ` Singh, Krishneil K
  0 siblings, 0 replies; 7+ messages in thread
From: Singh, Krishneil K @ 2018-11-01 18:00 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at osuosl.org] On Behalf
> Of Jacob Keller
> Sent: Monday, October 15, 2018 12:18 PM
> To: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> Cc: Kwan, Ngai-mint <ngai-mint.kwan@intel.com>
> Subject: [Intel-wired-lan] [net-next 1/3] fm10k: fix SM mailbox full condition
> 
> From: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
> 
> The current condition always incorrectly reports a full SM mailbox when
> no IES API application is running. Because of this, the
> "fm10k_service_task" is queued into the driver's workqueue indefinitely.
> This, in turn, causes a "kworker" thread to report 100% CPU utilization
> and can lead to "soft lockup" events or system crashes.
> 
> To fix this issue, a new condition is added to check that the SM
> mailbox is in the FM10K_STATE_OPEN state before proceeding; in other
> words, an instance of the IES API must be running. If it is, the
> remainder of the flow stays the same: determine whether the SM mailbox
> capacity has been exceeded and take the appropriate action.
> 
> Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---

Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>


