* [PATCH net 0/3] nfp: fix SR-IOV deadlock and representor bugs
@ 2017-08-23  6:22 Jakub Kicinski
  2017-08-23  6:22 ` [PATCH net 1/3] nfp: don't hold PF lock while enabling SR-IOV Jakub Kicinski
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Jakub Kicinski @ 2017-08-23  6:22 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski

This series tackles the bug I already tried to fix in commit
6d48ceb27af1 ("nfp: allocate a private workqueue for driver work").
There I created a separate workqueue to avoid a possible deadlock,
and the lockdep warning disappeared, but only coincidentally: given
the way workqueues operate, a separate workqueue does not necessarily
mean a separate thread of execution.  Luckily we can safely forgo
the lock instead.
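
To make the lock inversion concrete, here is a minimal sketch of the
two paths involved.  pf->lock, pci_enable_sriov() and the fact that
the PCI side schedules and flushes work are taken from the driver and
PCI core; the function names and bodies below are simplified
illustrations, not the actual call chains.

#include <linux/mutex.h>
#include <linux/pci.h>
#include <linux/workqueue.h>

/* stand-in for the relevant part of struct nfp_pf */
struct pf_sketch {
	struct mutex lock;
};

static struct pf_sketch pf_sk;

/* Path A: asynchronous driver work queued on the nfp workqueue. */
static void nfp_async_work_sketch(struct work_struct *work)
{
	mutex_lock(&pf_sk.lock);	/* blocks while path B holds it */
	/* ... port refresh or similar driver work ... */
	mutex_unlock(&pf_sk.lock);
}

/* Path B: the sriov_configure callback before this series. */
static int sriov_enable_sketch(struct pci_dev *pdev, int num_vfs)
{
	mutex_lock(&pf_sk.lock);
	/*
	 * pci_enable_sriov() may schedule a work item and flush a
	 * workqueue.  Worker threads are shared between workqueues, so
	 * that flush can end up waiting behind nfp_async_work_sketch(),
	 * which is itself waiting for pf_sk.lock held here: deadlock.
	 */
	pci_enable_sriov(pdev, num_vfs);
	mutex_unlock(&pf_sk.lock);
	return num_vfs;
}

Patch 1 resolves this by calling pci_enable_sriov() before taking the
lock, rather than by trying to isolate the workqueues.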

The second fix changes the order in which vNIC netdevs and representors
are created and destroyed.  The fix is kept small and should be
sufficient for net given how flower uses representors; a more thorough
fix will be targeted at net-next.

The third fix avoids leaking mapped frame buffers when the FW sends a
frame with an unknown portid.

Jakub Kicinski (3):
  nfp: don't hold PF lock while enabling SR-IOV
  nfp: make sure representors are destroyed before their lower netdev
  nfp: avoid buffer leak when representor is missing

 drivers/net/ethernet/netronome/nfp/nfp_main.c      | 16 +++++++---------
 .../net/ethernet/netronome/nfp/nfp_net_common.c    | 10 +++++-----
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  | 22 ++++++++++++++--------
 3 files changed, 26 insertions(+), 22 deletions(-)

-- 
2.11.0


* [PATCH net 1/3] nfp: don't hold PF lock while enabling SR-IOV
  2017-08-23  6:22 [PATCH net 0/3] nfp: fix SR-IOV deadlock and representor bugs Jakub Kicinski
@ 2017-08-23  6:22 ` Jakub Kicinski
  2017-08-23  6:22 ` [PATCH net 2/3] nfp: make sure representors are destroyed before their lower netdev Jakub Kicinski
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2017-08-23  6:22 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski

Enabling SR-IOV VFs causes the PCI subsystem to schedule a work item
and flush its workqueue.  Since the nfp driver schedules its own work,
which may need the PF lock, we can't enable VFs while holding that
lock: the flush may end up waiting behind our work item, which in turn
waits for the lock we hold.  Commit 6d48ceb27af1 ("nfp: allocate a
private workqueue for driver work") tried to avoid this deadlock by
creating a separate workqueue.  Unfortunately, due to the architecture
of the workqueue subsystem, this does not guarantee a separate thread
of execution.  Luckily we can simply move pci_enable_sriov() out from
under the driver lock.

Move pci_disable_sriov() out from under the lock too, for symmetry.

Fixes: 6d48ceb27af1 ("nfp: allocate a private workqueue for driver work")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_main.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index d67969d3e484..3f199db2002e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -98,21 +98,20 @@ static int nfp_pcie_sriov_enable(struct pci_dev *pdev, int num_vfs)
 	struct nfp_pf *pf = pci_get_drvdata(pdev);
 	int err;
 
-	mutex_lock(&pf->lock);
-
 	if (num_vfs > pf->limit_vfs) {
 		nfp_info(pf->cpp, "Firmware limits number of VFs to %u\n",
 			 pf->limit_vfs);
-		err = -EINVAL;
-		goto err_unlock;
+		return -EINVAL;
 	}
 
 	err = pci_enable_sriov(pdev, num_vfs);
 	if (err) {
 		dev_warn(&pdev->dev, "Failed to enable PCI SR-IOV: %d\n", err);
-		goto err_unlock;
+		return err;
 	}
 
+	mutex_lock(&pf->lock);
+
 	err = nfp_app_sriov_enable(pf->app, num_vfs);
 	if (err) {
 		dev_warn(&pdev->dev,
@@ -129,9 +128,8 @@ static int nfp_pcie_sriov_enable(struct pci_dev *pdev, int num_vfs)
 	return num_vfs;
 
 err_sriov_disable:
-	pci_disable_sriov(pdev);
-err_unlock:
 	mutex_unlock(&pf->lock);
+	pci_disable_sriov(pdev);
 	return err;
 #endif
 	return 0;
@@ -158,10 +156,10 @@ static int nfp_pcie_sriov_disable(struct pci_dev *pdev)
 
 	pf->num_vfs = 0;
 
+	mutex_unlock(&pf->lock);
+
 	pci_disable_sriov(pdev);
 	dev_dbg(&pdev->dev, "Removed VFs.\n");
-
-	mutex_unlock(&pf->lock);
 #endif
 	return 0;
 }
-- 
2.11.0


* [PATCH net 2/3] nfp: make sure representors are destroyed before their lower netdev
  2017-08-23  6:22 [PATCH net 0/3] nfp: fix SR-IOV deadlock and representor bugs Jakub Kicinski
  2017-08-23  6:22 ` [PATCH net 1/3] nfp: don't hold PF lock while enabling SR-IOV Jakub Kicinski
@ 2017-08-23  6:22 ` Jakub Kicinski
  2017-08-23  6:22 ` [PATCH net 3/3] nfp: avoid buffer leak when representor is missing Jakub Kicinski
  2017-08-24  3:40 ` [PATCH net 0/3] nfp: fix SR-IOV deadlock and representor bugs David Miller
  3 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2017-08-23  6:22 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski

The app start/stop callbacks can perform application initialization.
Unfortunately, the flower app started using them for creating and
destroying representors.  This can lead to a situation where the lower
vNIC netdev is destroyed while representors still try to pass traffic,
which will most likely cause a NULL-dereference on the lower netdev TX
path.

Move the start/stop callback invocations so that representors are
created after the vNICs are fully initialized and destroyed before the
vNICs are torn down.
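
For context, a representor netdev has no queues of its own; on
transmit it simply redirects the frame to the lower vNIC netdev it was
created on top of.  Below is a rough sketch of that path; the names
follow the driver's representor code, but the structure and function
here are condensed stand-ins, not the actual implementation.

#include <linux/netdevice.h>
#include <net/dst_metadata.h>

/* condensed stand-in for struct nfp_repr */
struct repr_sketch {
	struct metadata_dst *dst;	/* mux id + pointer to lower netdev */
};

static netdev_tx_t repr_xmit_sketch(struct sk_buff *skb,
				    struct net_device *netdev)
{
	struct repr_sketch *repr = netdev_priv(netdev);

	skb_dst_drop(skb);
	dst_hold((struct dst_entry *)repr->dst);
	skb_dst_set(skb, (struct dst_entry *)repr->dst);

	/* If the lower vNIC netdev has already been destroyed this
	 * pointer is stale or NULL and the xmit below crashes.
	 */
	skb->dev = repr->dst->u.port_info.lower_dev;

	return dev_queue_xmit(skb);
}

Hence the ordering requirement: representors must be destroyed before
the vNIC netdevs they point at, and created only once those netdevs
exist.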

Fixes: 5de73ee46704 ("nfp: general representor implementation")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 5797dbf2b507..1aca4e57bf41 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -456,10 +456,6 @@ static int nfp_net_pf_app_start(struct nfp_pf *pf)
 {
 	int err;
 
-	err = nfp_net_pf_app_start_ctrl(pf);
-	if (err)
-		return err;
-
 	err = nfp_app_start(pf->app, pf->ctrl_vnic);
 	if (err)
 		goto err_ctrl_stop;
@@ -484,7 +480,6 @@ static void nfp_net_pf_app_stop(struct nfp_pf *pf)
 	if (pf->num_vfs)
 		nfp_app_sriov_disable(pf->app);
 	nfp_app_stop(pf->app);
-	nfp_net_pf_app_stop_ctrl(pf);
 }
 
 static void nfp_net_pci_unmap_mem(struct nfp_pf *pf)
@@ -559,7 +554,7 @@ static int nfp_net_pci_map_mem(struct nfp_pf *pf)
 
 static void nfp_net_pci_remove_finish(struct nfp_pf *pf)
 {
-	nfp_net_pf_app_stop(pf);
+	nfp_net_pf_app_stop_ctrl(pf);
 	/* stop app first, to avoid double free of ctrl vNIC's ddir */
 	nfp_net_debugfs_dir_clean(&pf->ddir);
 
@@ -690,6 +685,7 @@ int nfp_net_pci_probe(struct nfp_pf *pf)
 {
 	struct nfp_net_fw_version fw_ver;
 	u8 __iomem *ctrl_bar, *qc_bar;
+	struct nfp_net *nn;
 	int stride;
 	int err;
 
@@ -766,7 +762,7 @@ int nfp_net_pci_probe(struct nfp_pf *pf)
 	if (err)
 		goto err_free_vnics;
 
-	err = nfp_net_pf_app_start(pf);
+	err = nfp_net_pf_app_start_ctrl(pf);
 	if (err)
 		goto err_free_irqs;
 
@@ -774,12 +770,20 @@ int nfp_net_pci_probe(struct nfp_pf *pf)
 	if (err)
 		goto err_stop_app;
 
+	err = nfp_net_pf_app_start(pf);
+	if (err)
+		goto err_clean_vnics;
+
 	mutex_unlock(&pf->lock);
 
 	return 0;
 
+err_clean_vnics:
+	list_for_each_entry(nn, &pf->vnics, vnic_list)
+		if (nfp_net_is_data_vnic(nn))
+			nfp_net_pf_clean_vnic(pf, nn);
 err_stop_app:
-	nfp_net_pf_app_stop(pf);
+	nfp_net_pf_app_stop_ctrl(pf);
 err_free_irqs:
 	nfp_net_pf_free_irqs(pf);
 err_free_vnics:
@@ -803,6 +807,8 @@ void nfp_net_pci_remove(struct nfp_pf *pf)
 	if (list_empty(&pf->vnics))
 		goto out;
 
+	nfp_net_pf_app_stop(pf);
+
 	list_for_each_entry(nn, &pf->vnics, vnic_list)
 		if (nfp_net_is_data_vnic(nn))
 			nfp_net_pf_clean_vnic(pf, nn);
-- 
2.11.0


* [PATCH net 3/3] nfp: avoid buffer leak when representor is missing
  2017-08-23  6:22 [PATCH net 0/3] nfp: fix SR-IOV deadlock and representor bugs Jakub Kicinski
  2017-08-23  6:22 ` [PATCH net 1/3] nfp: don't hold PF lock while enabling SR-IOV Jakub Kicinski
  2017-08-23  6:22 ` [PATCH net 2/3] nfp: make sure representors are destroyed before their lower netdev Jakub Kicinski
@ 2017-08-23  6:22 ` Jakub Kicinski
  2017-08-24  3:40 ` [PATCH net 0/3] nfp: fix SR-IOV deadlock and representor bugs David Miller
  3 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2017-08-23  6:22 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski

When the driver receives a muxed frame but can't find the representor
netdev it is destined to, it will try to "drop" that frame, i.e. reuse
its buffer.  The issue is that the replacement buffer has already been
allocated at this point, so reusing the buffer from the received frame
leaks the replacement.  Change the code to put the new buffer on the
ring earlier and not reuse the old buffer (pass NULL as the buffer
parameter to nfp_net_rx_drop()).
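
Passing NULL is enough because nfp_net_rx_drop() only recycles a
buffer when it is given one.  A condensed sketch of its behaviour as I
read it (the real function lives in nfp_net_common.c and also handles
statistics and page reference counting, omitted here):

static void rx_drop_sketch(const struct nfp_net_dp *dp,
			   struct nfp_net_r_vector *r_vec,
			   struct nfp_net_rx_ring *rx_ring,
			   struct nfp_net_rx_buf *rxbuf,
			   struct sk_buff *skb)
{
	/* ... bump the drop counter ... */

	/* Only a buffer that was passed in gets put back on the ring.
	 * With rxbuf == NULL the ring slot has already been refilled by
	 * the caller with the freshly allocated replacement, so all
	 * that is left is to free the skb, which owns the old fragment.
	 */
	if (rxbuf)
		nfp_net_rx_give_one(dp, rx_ring, rxbuf->frag,
				    rxbuf->dma_addr);
	if (skb)
		dev_kfree_skb_any(skb);
}

With the old code the fresh buffer was neither placed on the ring nor
freed on this path, hence the leak.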

Fixes: 91bf82ca9eed ("nfp: add support for tx/rx with metadata portid")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 9f77ce038a4a..1ff0c577819e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1751,6 +1751,10 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 			continue;
 		}
 
+		nfp_net_dma_unmap_rx(dp, rxbuf->dma_addr);
+
+		nfp_net_rx_give_one(dp, rx_ring, new_frag, new_dma_addr);
+
 		if (likely(!meta.portid)) {
 			netdev = dp->netdev;
 		} else {
@@ -1759,16 +1763,12 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 			nn = netdev_priv(dp->netdev);
 			netdev = nfp_app_repr_get(nn->app, meta.portid);
 			if (unlikely(!netdev)) {
-				nfp_net_rx_drop(dp, r_vec, rx_ring, rxbuf, skb);
+				nfp_net_rx_drop(dp, r_vec, rx_ring, NULL, skb);
 				continue;
 			}
 			nfp_repr_inc_rx_stats(netdev, pkt_len);
 		}
 
-		nfp_net_dma_unmap_rx(dp, rxbuf->dma_addr);
-
-		nfp_net_rx_give_one(dp, rx_ring, new_frag, new_dma_addr);
-
 		skb_reserve(skb, pkt_off);
 		skb_put(skb, pkt_len);
 
-- 
2.11.0


* Re: [PATCH net 0/3] nfp: fix SR-IOV deadlock and representor bugs
  2017-08-23  6:22 [PATCH net 0/3] nfp: fix SR-IOV deadlock and representor bugs Jakub Kicinski
                   ` (2 preceding siblings ...)
  2017-08-23  6:22 ` [PATCH net 3/3] nfp: avoid buffer leak when representor is missing Jakub Kicinski
@ 2017-08-24  3:40 ` David Miller
  3 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2017-08-24  3:40 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: netdev, oss-drivers

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Tue, 22 Aug 2017 23:22:41 -0700

> This series tackles the bug I already tried to fix in commit
> 6d48ceb27af1 ("nfp: allocate a private workqueue for driver work").
> There I created a separate workqueue to avoid a possible deadlock,
> and the lockdep warning disappeared, but only coincidentally: given
> the way workqueues operate, a separate workqueue does not necessarily
> mean a separate thread of execution.  Luckily we can safely forgo
> the lock instead.
> 
> The second fix changes the order in which vNIC netdevs and representors
> are created and destroyed.  The fix is kept small and should be
> sufficient for net given how flower uses representors; a more thorough
> fix will be targeted at net-next.
> 
> The third fix avoids leaking mapped frame buffers when the FW sends a
> frame with an unknown portid.

Series applied, thanks.

