* Remove private locks to avoid possible deadlock @ 2016-12-07 20:05 Lino Sanfilippo 2016-12-07 20:05 ` [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock Lino Sanfilippo 2016-12-07 20:05 ` [PATCH 2/2] net: ethernet: stmmac: " Lino Sanfilippo 0 siblings, 2 replies; 36+ messages in thread From: Lino Sanfilippo @ 2016-12-07 20:05 UTC (permalink / raw) To: bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue Cc: pavel, davem, linux-kernel, netdev Hi, these patches fix possible deadlock situations in the sxgbe and stmmac driver. Please note that the patches are only compile tested so it would be great if someone could do tests with the concerning HW. Regards, Lino ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-07 20:05 Remove private locks to avoid possible deadlock Lino Sanfilippo @ 2016-12-07 20:05 ` Lino Sanfilippo 2016-12-07 23:15 ` Francois Romieu 2016-12-07 20:05 ` [PATCH 2/2] net: ethernet: stmmac: " Lino Sanfilippo 1 sibling, 1 reply; 36+ messages in thread From: Lino Sanfilippo @ 2016-12-07 20:05 UTC (permalink / raw) To: bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue Cc: pavel, davem, linux-kernel, netdev, Lino Sanfilippo The driver uses a private lock for synchronization between the xmit function and the xmit completion handler, but since the NETIF_F_LLTX flag is not set, the xmit function is also called with the xmit_lock held. On the other hand the xmit completion handler first takes the private lock and (in case that the tx queue has been stopped) the xmit_lock, leading to a reverse locking order and the potential danger of a deadlock. Fix this by removing the private lock completely and synchronizing the xmit function and completion handler solely by means of the xmit_lock. By doing this also remove the now unnecessary double check for a stopped tx queue. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> --- drivers/net/ethernet/samsung/sxgbe/sxgbe_common.h | 1 - drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c | 27 +++++------------------ 2 files changed, 6 insertions(+), 22 deletions(-) diff --git a/drivers/net/ethernet/samsung/sxgbe/sxgbe_common.h b/drivers/net/ethernet/samsung/sxgbe/sxgbe_common.h index 5cb51b6..c61f260 100644 --- a/drivers/net/ethernet/samsung/sxgbe/sxgbe_common.h +++ b/drivers/net/ethernet/samsung/sxgbe/sxgbe_common.h @@ -384,7 +384,6 @@ struct sxgbe_tx_queue { dma_addr_t *tx_skbuff_dma; struct sk_buff **tx_skbuff; struct timer_list txtimer; - spinlock_t tx_lock; /* lock for tx queues */ unsigned int cur_tx; unsigned int dirty_tx; u32 tx_count_frames; diff --git a/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c b/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c index ea44a24..22d3b0b 100644 --- a/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c +++ b/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c @@ -426,9 +426,6 @@ static int init_tx_ring(struct device *dev, u8 queue_no, tx_ring->dirty_tx = 0; tx_ring->cur_tx = 0; - /* initialise TX queue lock */ - spin_lock_init(&tx_ring->tx_lock); - return 0; dmamem_err: @@ -743,7 +740,7 @@ static void sxgbe_tx_queue_clean(struct sxgbe_tx_queue *tqueue) dev_txq = netdev_get_tx_queue(priv->dev, queue_no); - spin_lock(&tqueue->tx_lock); + __netif_tx_lock(dev_txq, smp_processor_id()); priv->xstats.tx_clean++; while (tqueue->dirty_tx != tqueue->cur_tx) { @@ -781,18 +778,13 @@ static void sxgbe_tx_queue_clean(struct sxgbe_tx_queue *tqueue) /* wake up queue */ if (unlikely(netif_tx_queue_stopped(dev_txq) && - sxgbe_tx_avail(tqueue, tx_rsize) > SXGBE_TX_THRESH(priv))) { - netif_tx_lock(priv->dev); - if (netif_tx_queue_stopped(dev_txq) && - sxgbe_tx_avail(tqueue, tx_rsize) > SXGBE_TX_THRESH(priv)) { - if (netif_msg_tx_done(priv)) - pr_debug("%s: restart transmit\n", __func__); - netif_tx_wake_queue(dev_txq); - } - netif_tx_unlock(priv->dev); + sxgbe_tx_avail(tqueue, tx_rsize) > SXGBE_TX_THRESH(priv))) { + if (netif_msg_tx_done(priv)) + pr_debug("%s: restart transmit\n", __func__); + netif_tx_wake_queue(dev_txq); } - spin_unlock(&tqueue->tx_lock); + __netif_tx_unlock(dev_txq); } /** @@ -1304,9 +1296,6 @@ static netdev_tx_t sxgbe_xmit(struct sk_buff *skb, struct net_device *dev) tqueue->hwts_tx_en))) ctxt_desc_req = 1; - /* get the spinlock */ - spin_lock(&tqueue->tx_lock); - if (priv->tx_path_in_lpi_mode) sxgbe_disable_eee_mode(priv); @@ -1316,8 +1305,6 @@ static netdev_tx_t sxgbe_xmit(struct sk_buff *skb, struct net_device *dev) netdev_err(dev, "%s: Tx Ring is full when %d queue is awake\n", __func__, txq_index); } - /* release the spin lock in case of BUSY */ - spin_unlock(&tqueue->tx_lock); return NETDEV_TX_BUSY; } @@ -1436,8 +1423,6 @@ static netdev_tx_t sxgbe_xmit(struct sk_buff *skb, struct net_device *dev) priv->hw->dma->enable_dma_transmission(priv->ioaddr, txq_index); - spin_unlock(&tqueue->tx_lock); - return NETDEV_TX_OK; } -- 1.9.1 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-07 20:05 ` [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock Lino Sanfilippo @ 2016-12-07 23:15 ` Francois Romieu 2016-12-08 20:32 ` Lino Sanfilippo 0 siblings, 1 reply; 36+ messages in thread From: Francois Romieu @ 2016-12-07 23:15 UTC (permalink / raw) To: Lino Sanfilippo Cc: bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, pavel, davem, linux-kernel, netdev Lino Sanfilippo <LinoSanfilippo@gmx.de> : > The driver uses a private lock for synchronization between the xmit > function and the xmit completion handler, but since the NETIF_F_LLTX flag > is not set, the xmit function is also called with the xmit_lock held. > > On the other hand the xmit completion handler first takes the private lock > and (in case that the tx queue has been stopped) the xmit_lock, leading > to a reverse locking order and the potential danger of a deadlock. netif_tx_stop_queue is used by: 1. xmit function before releasing lock and returning. 2. sxgbe_restart_tx_queue() <- sxgbe_tx_interrupt <- sxgbe_reset_all_tx_queues() <- sxgbe_tx_timeout() Given xmit won't be called again until tx queue is enabled, it's not clear how a deadlock could happen due to #1. Regardless of deadlocks anywhere else, #2 has some serious problem due to the lack of exclusion between the tx queue restart handler and the xmit handler. -- Ueimor ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-07 23:15 ` Francois Romieu @ 2016-12-08 20:32 ` Lino Sanfilippo 2016-12-08 21:54 ` Pavel Machek 2016-12-08 23:19 ` Francois Romieu 0 siblings, 2 replies; 36+ messages in thread From: Lino Sanfilippo @ 2016-12-08 20:32 UTC (permalink / raw) To: Francois Romieu Cc: bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, pavel, davem, linux-kernel, netdev Hi, On 08.12.2016 00:15, Francois Romieu wrote: > Lino Sanfilippo <LinoSanfilippo@gmx.de> : >> The driver uses a private lock for synchronization between the xmit >> function and the xmit completion handler, but since the NETIF_F_LLTX flag >> is not set, the xmit function is also called with the xmit_lock held. >> >> On the other hand the xmit completion handler first takes the private lock >> and (in case that the tx queue has been stopped) the xmit_lock, leading >> to a reverse locking order and the potential danger of a deadlock. > > netif_tx_stop_queue is used by: > 1. xmit function before releasing lock and returning. > 2. sxgbe_restart_tx_queue() > <- sxgbe_tx_interrupt > <- sxgbe_reset_all_tx_queues() > <- sxgbe_tx_timeout() > > Given xmit won't be called again until tx queue is enabled, it's not clear > how a deadlock could happen due to #1. > After spending more thoughts on this I tend to agree with you. Yes, we have the different locking order for the xmit_lock and the private lock in two concurrent threads. And one of the first things one learns about locking is that this is a good way to create a deadlock sooner or later. But in our case the deadlock can only occur if the xmit function and the tx completion handler perceive different states for the tx queue, or to be more specific: the completion handler sees the tx queue in state "stopped" while the xmit handler sees it in state "running" at the same time. Only then both functions would try to take both locks, which could lead to a deadlock. OTOH Pavel said that he actually could produce a deadlock. Now I wonder if this is caused by that locking scheme (in a way I have not figured out yet) or if it is a different issue. Regards, Lino ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-08 20:32 ` Lino Sanfilippo @ 2016-12-08 21:54 ` Pavel Machek 2016-12-08 22:12 ` Lino Sanfilippo 2016-12-08 23:19 ` Francois Romieu 1 sibling, 1 reply; 36+ messages in thread From: Pavel Machek @ 2016-12-08 21:54 UTC (permalink / raw) To: Lino Sanfilippo Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 2207 bytes --] On Thu 2016-12-08 21:32:12, Lino Sanfilippo wrote: > Hi, > > On 08.12.2016 00:15, Francois Romieu wrote: > > Lino Sanfilippo <LinoSanfilippo@gmx.de> : > >> The driver uses a private lock for synchronization between the xmit > >> function and the xmit completion handler, but since the NETIF_F_LLTX flag > >> is not set, the xmit function is also called with the xmit_lock held. > >> > >> On the other hand the xmit completion handler first takes the private lock > >> and (in case that the tx queue has been stopped) the xmit_lock, leading > >> to a reverse locking order and the potential danger of a deadlock. > > > > netif_tx_stop_queue is used by: > > 1. xmit function before releasing lock and returning. > > 2. sxgbe_restart_tx_queue() > > <- sxgbe_tx_interrupt > > <- sxgbe_reset_all_tx_queues() > > <- sxgbe_tx_timeout() > > > > Given xmit won't be called again until tx queue is enabled, it's not clear > > how a deadlock could happen due to #1. > > > > > After spending more thoughts on this I tend to agree with you. Yes, we have the > different locking order for the xmit_lock and the private lock in two concurrent > threads. And one of the first things one learns about locking is that this is a > good way to create a deadlock sooner or later. But in our case the deadlock > can only occur if the xmit function and the tx completion handler perceive different > states for the tx queue, or to be more specific: > the completion handler sees the tx queue in state "stopped" while the xmit handler > sees it in state "running" at the same time. Only then both functions would try to > take both locks, which could lead to a deadlock. > > OTOH Pavel said that he actually could produce a deadlock. Now I wonder if this is caused > by that locking scheme (in a way I have not figured out yet) or if it is a different issue. Pavel has some problems, but that's on different hardware.. and it is possible that it is deadlock (or something else) somewhere else. Best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-08 21:54 ` Pavel Machek @ 2016-12-08 22:12 ` Lino Sanfilippo 2016-12-08 22:18 ` Pavel Machek 0 siblings, 1 reply; 36+ messages in thread From: Lino Sanfilippo @ 2016-12-08 22:12 UTC (permalink / raw) To: Pavel Machek Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev Hi, On 08.12.2016 22:54, Pavel Machek wrote: > On Thu 2016-12-08 21:32:12, Lino Sanfilippo wrote: >> Hi, >> >> On 08.12.2016 00:15, Francois Romieu wrote: >> > Lino Sanfilippo <LinoSanfilippo@gmx.de> : >> >> The driver uses a private lock for synchronization between the xmit >> >> function and the xmit completion handler, but since the NETIF_F_LLTX flag >> >> is not set, the xmit function is also called with the xmit_lock held. >> >> >> >> On the other hand the xmit completion handler first takes the private lock >> >> and (in case that the tx queue has been stopped) the xmit_lock, leading >> >> to a reverse locking order and the potential danger of a deadlock. >> > >> > netif_tx_stop_queue is used by: >> > 1. xmit function before releasing lock and returning. >> > 2. sxgbe_restart_tx_queue() >> > <- sxgbe_tx_interrupt >> > <- sxgbe_reset_all_tx_queues() >> > <- sxgbe_tx_timeout() >> > >> > Given xmit won't be called again until tx queue is enabled, it's not clear >> > how a deadlock could happen due to #1. >> > >> >> >> After spending more thoughts on this I tend to agree with you. Yes, we have the >> different locking order for the xmit_lock and the private lock in two concurrent >> threads. And one of the first things one learns about locking is that this is a >> good way to create a deadlock sooner or later. But in our case the deadlock >> can only occur if the xmit function and the tx completion handler perceive different >> states for the tx queue, or to be more specific: >> the completion handler sees the tx queue in state "stopped" while the xmit handler >> sees it in state "running" at the same time. Only then both functions would try to >> take both locks, which could lead to a deadlock. >> >> OTOH Pavel said that he actually could produce a deadlock. Now I wonder if this is caused >> by that locking scheme (in a way I have not figured out yet) or if it is a different issue. > > Pavel has some problems, but that's on different hardware.. and it is > possible that it is deadlock (or something else) somewhere else. > Right, it is different hardware. But the locking situation in xmit function and tx completion handler is very similar in both drivers. So if a deadlock is not possible in sxgbe it should also not be possible in stmmac (at least not due to the different locking order). So maybe there is no real issue that we could fix with removing the private lock and we should keep it as it is. Regards, Lino ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-08 22:12 ` Lino Sanfilippo @ 2016-12-08 22:18 ` Pavel Machek 2016-12-08 22:45 ` Lino Sanfilippo 0 siblings, 1 reply; 36+ messages in thread From: Pavel Machek @ 2016-12-08 22:18 UTC (permalink / raw) To: Lino Sanfilippo Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 2997 bytes --] On Thu 2016-12-08 23:12:10, Lino Sanfilippo wrote: > Hi, > > On 08.12.2016 22:54, Pavel Machek wrote: > > On Thu 2016-12-08 21:32:12, Lino Sanfilippo wrote: > >> Hi, > >> > >> On 08.12.2016 00:15, Francois Romieu wrote: > >> > Lino Sanfilippo <LinoSanfilippo@gmx.de> : > >> >> The driver uses a private lock for synchronization between the xmit > >> >> function and the xmit completion handler, but since the NETIF_F_LLTX flag > >> >> is not set, the xmit function is also called with the xmit_lock held. > >> >> > >> >> On the other hand the xmit completion handler first takes the private lock > >> >> and (in case that the tx queue has been stopped) the xmit_lock, leading > >> >> to a reverse locking order and the potential danger of a deadlock. > >> > > >> > netif_tx_stop_queue is used by: > >> > 1. xmit function before releasing lock and returning. > >> > 2. sxgbe_restart_tx_queue() > >> > <- sxgbe_tx_interrupt > >> > <- sxgbe_reset_all_tx_queues() > >> > <- sxgbe_tx_timeout() > >> > > >> > Given xmit won't be called again until tx queue is enabled, it's not clear > >> > how a deadlock could happen due to #1. > >> > > >> > >> > >> After spending more thoughts on this I tend to agree with you. Yes, we have the > >> different locking order for the xmit_lock and the private lock in two concurrent > >> threads. And one of the first things one learns about locking is that this is a > >> good way to create a deadlock sooner or later. But in our case the deadlock > >> can only occur if the xmit function and the tx completion handler perceive different > >> states for the tx queue, or to be more specific: > >> the completion handler sees the tx queue in state "stopped" while the xmit handler > >> sees it in state "running" at the same time. Only then both functions would try to > >> take both locks, which could lead to a deadlock. > >> > >> OTOH Pavel said that he actually could produce a deadlock. Now I wonder if this is caused > >> by that locking scheme (in a way I have not figured out yet) or if it is a different issue. > > > > Pavel has some problems, but that's on different hardware.. and it is > > possible that it is deadlock (or something else) somewhere else. > > > > Right, it is different hardware. But the locking situation in xmit function and tx completion handler > is very similar in both drivers. So if a deadlock is not possible in sxgbe it should > also not be possible in stmmac (at least not due to the different locking order). > So maybe there is no real issue that we could fix with removing the private lock and we should > keep it as it is. Well.. the locking is pretty confused there. Having private lock that mirrors lock from network layer is confusing and ugly... that should be reason to fix it. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-08 22:18 ` Pavel Machek @ 2016-12-08 22:45 ` Lino Sanfilippo 0 siblings, 0 replies; 36+ messages in thread From: Lino Sanfilippo @ 2016-12-08 22:45 UTC (permalink / raw) To: Pavel Machek Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev On 08.12.2016 23:18, Pavel Machek wrote: > On Thu 2016-12-08 23:12:10, Lino Sanfilippo wrote: >> Hi, >> >> On 08.12.2016 22:54, Pavel Machek wrote: >> > On Thu 2016-12-08 21:32:12, Lino Sanfilippo wrote: >> >> Hi, >> >> >> >> On 08.12.2016 00:15, Francois Romieu wrote: >> >> > Lino Sanfilippo <LinoSanfilippo@gmx.de> : >> >> >> The driver uses a private lock for synchronization between the xmit >> >> >> function and the xmit completion handler, but since the NETIF_F_LLTX flag >> >> >> is not set, the xmit function is also called with the xmit_lock held. >> >> >> >> >> >> On the other hand the xmit completion handler first takes the private lock >> >> >> and (in case that the tx queue has been stopped) the xmit_lock, leading >> >> >> to a reverse locking order and the potential danger of a deadlock. >> >> > >> >> > netif_tx_stop_queue is used by: >> >> > 1. xmit function before releasing lock and returning. >> >> > 2. sxgbe_restart_tx_queue() >> >> > <- sxgbe_tx_interrupt >> >> > <- sxgbe_reset_all_tx_queues() >> >> > <- sxgbe_tx_timeout() >> >> > >> >> > Given xmit won't be called again until tx queue is enabled, it's not clear >> >> > how a deadlock could happen due to #1. >> >> > >> >> >> >> >> >> After spending more thoughts on this I tend to agree with you. Yes, we have the >> >> different locking order for the xmit_lock and the private lock in two concurrent >> >> threads. And one of the first things one learns about locking is that this is a >> >> good way to create a deadlock sooner or later. But in our case the deadlock >> >> can only occur if the xmit function and the tx completion handler perceive different >> >> states for the tx queue, or to be more specific: >> >> the completion handler sees the tx queue in state "stopped" while the xmit handler >> >> sees it in state "running" at the same time. Only then both functions would try to >> >> take both locks, which could lead to a deadlock. >> >> >> >> OTOH Pavel said that he actually could produce a deadlock. Now I wonder if this is caused >> >> by that locking scheme (in a way I have not figured out yet) or if it is a different issue. >> > >> > Pavel has some problems, but that's on different hardware.. and it is >> > possible that it is deadlock (or something else) somewhere else. >> > >> >> Right, it is different hardware. But the locking situation in xmit function and tx completion handler >> is very similar in both drivers. So if a deadlock is not possible in sxgbe it should >> also not be possible in stmmac (at least not due to the different locking order). >> So maybe there is no real issue that we could fix with removing the private lock and we should >> keep it as it is. > > Well.. the locking is pretty confused there. Having private lock that > mirrors lock from network layer is confusing and ugly... that should > be reason to fix it. > Pavel > Ok. Then I will resend the patches for both drivers with a different (less dramatic) commit message in which the change is not longer described as a fix for deadlock but rather as a code cleanup/improvement, ok? Regards, Lino ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-08 20:32 ` Lino Sanfilippo 2016-12-08 21:54 ` Pavel Machek @ 2016-12-08 23:19 ` Francois Romieu 2016-12-09 11:21 ` Pavel Machek 1 sibling, 1 reply; 36+ messages in thread From: Francois Romieu @ 2016-12-08 23:19 UTC (permalink / raw) To: Lino Sanfilippo Cc: bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, pavel, davem, linux-kernel, netdev Lino Sanfilippo <LinoSanfilippo@gmx.de> : [...] > OTOH Pavel said that he actually could produce a deadlock. Now I wonder if > this is caused by that locking scheme (in a way I have not figured out yet) > or if it is a different issue. stmmac_tx_err races with stmmac_xmit. -- Ueimor ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-08 23:19 ` Francois Romieu @ 2016-12-09 11:21 ` Pavel Machek 2016-12-10 2:25 ` Lino Sanfilippo 0 siblings, 1 reply; 36+ messages in thread From: Pavel Machek @ 2016-12-09 11:21 UTC (permalink / raw) To: Francois Romieu Cc: Lino Sanfilippo, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 849 bytes --] On Fri 2016-12-09 00:19:43, Francois Romieu wrote: > Lino Sanfilippo <LinoSanfilippo@gmx.de> : > [...] > > OTOH Pavel said that he actually could produce a deadlock. Now I wonder if > > this is caused by that locking scheme (in a way I have not figured out yet) > > or if it is a different issue. > > stmmac_tx_err races with stmmac_xmit. Umm, yes, that looks real. And that means that removing tx_lock will not be completely trivial :-(. Lino, any ideas there? netif_tx_lock_irqsave() would help, but afaict that one does not exist. Plus, does someone know how to trigger the status == tx_hard_error? I tried powering down the switch, but that did not do it. Thanks, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-09 11:21 ` Pavel Machek @ 2016-12-10 2:25 ` Lino Sanfilippo 2016-12-11 20:11 ` Pavel Machek 0 siblings, 1 reply; 36+ messages in thread From: Lino Sanfilippo @ 2016-12-10 2:25 UTC (permalink / raw) To: Pavel Machek, Francois Romieu Cc: bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev Hi, On 09.12.2016 12:21, Pavel Machek wrote: > On Fri 2016-12-09 00:19:43, Francois Romieu wrote: >> Lino Sanfilippo <LinoSanfilippo@gmx.de> : >> [...] >> > OTOH Pavel said that he actually could produce a deadlock. Now I wonder if >> > this is caused by that locking scheme (in a way I have not figured out yet) >> > or if it is a different issue. >> >> stmmac_tx_err races with stmmac_xmit. > > Umm, yes, that looks real. > > And that means that removing tx_lock will not be completely trivial > :-(. Lino, any ideas there? > Ok, the race is there but it looks like a problem that is not related to the use or removal of the private lock. By a glimpse into other drivers (e.g sky2 or e1000), a possible way to handle a tx error is to start a separate task and restart the tx path in that task instead the irq handler (or timer in case of the watchdog). In that task we could do: 1. deactivate napi 2. deactivate irqs 3. wait for running napi/irqs do complete (_sync) 4. call stmmac_tx_err() 5. reenable napi 6. reenable irqs We have to ensure that no xmit() is executing while stmmac_tx_err() does the cleanup, so stmmac_tx_err() should IMO rather call netif_tx_disable() instead of netif_stop_queue() (the former grabs the xmit lock before it sets __QUEUE_STATE_DRV_XOFF to disable the queue). Regards, Lino ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-10 2:25 ` Lino Sanfilippo @ 2016-12-11 20:11 ` Pavel Machek 2016-12-15 19:27 ` Lino Sanfilippo 0 siblings, 1 reply; 36+ messages in thread From: Pavel Machek @ 2016-12-11 20:11 UTC (permalink / raw) To: Lino Sanfilippo Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 2356 bytes --] Hi! > On 09.12.2016 12:21, Pavel Machek wrote: > > On Fri 2016-12-09 00:19:43, Francois Romieu wrote: > >> Lino Sanfilippo <LinoSanfilippo@gmx.de> : > >> [...] > >> > OTOH Pavel said that he actually could produce a deadlock. Now I wonder if > >> > this is caused by that locking scheme (in a way I have not figured out yet) > >> > or if it is a different issue. > >> > >> stmmac_tx_err races with stmmac_xmit. > > > > Umm, yes, that looks real. > > > > And that means that removing tx_lock will not be completely trivial > > :-(. Lino, any ideas there? > > > > Ok, the race is there but it looks like a problem that is not related to > the use or removal of the private lock. Well, removal of the private lock will make it trickier to fix :-(. > By a glimpse into other drivers (e.g sky2 or e1000), a possible way to handle a > tx error is to start a separate task and restart the tx path in that task instead > the irq handler (or timer in case of the watchdog). > > In that task we could do: > 1. deactivate napi > 2. deactivate irqs > 3. wait for running napi/irqs do complete (_sync) > 4. call stmmac_tx_err() > 5. reenable napi > 6. reenable irqs > > We have to ensure that no xmit() is executing while stmmac_tx_err() does the cleanup, > so stmmac_tx_err() should IMO rather call netif_tx_disable() instead of netif_stop_queue() > (the former grabs the xmit lock before it sets __QUEUE_STATE_DRV_XOFF to disable > the queue). Do you understand what stmmac_tx_err(priv); is supposed to do? In particular, if it is called while the driver is working ok -- should the driver survive that? Because it does not currently, and I don't know how to test that code. Unplugging the cable does not provoke that. I tried } else if (unlikely(status == tx_hard_error)) stmmac_tx_err(priv); + + { + static int i; + i++; + if (i==1000) { + i = 0; + printk("Simulated error\n"); + stmmac_tx_err(priv); + } + } } but the driver does not survive that :-(. Best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-11 20:11 ` Pavel Machek @ 2016-12-15 19:27 ` Lino Sanfilippo 2016-12-15 21:03 ` Pavel Machek 0 siblings, 1 reply; 36+ messages in thread From: Lino Sanfilippo @ 2016-12-15 19:27 UTC (permalink / raw) To: Pavel Machek Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev Hi Pavel, sorry for the late reply. On 11.12.2016 21:11, Pavel Machek wrote: > > Do you understand what stmmac_tx_err(priv); is supposed to do? In > particular, if it is called while the driver is working ok -- should > the driver survive that? As far as I understood it is supposed to fixup an errorneous tx path, e.g. a missing tx completion for transmitted frames. Some drivers do this by restarting only the HW parts responsible for tx, some others by restarting the complete hardware. But IMO it should also be ok to be called if the HW is still working fine. > Because it does not currently, and I don't know how to test that > code. Unplugging the cable does not provoke that. > > I tried > > } else if (unlikely(status == tx_hard_error)) > stmmac_tx_err(priv); > + > + { > + static int i; > + i++; > + if (i==1000) { > + i = 0; > + printk("Simulated error\n"); > + stmmac_tx_err(priv); > + } > + } > } > Ok, there is this race that Francois mentioned so it is not surprising that the driver does not survive the call of stmmac_tx_err() as it is called now. Thats why I suggested to do a proper shutdown and restart of the tx path to avoid the race. Regards, Lino ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-15 19:27 ` Lino Sanfilippo @ 2016-12-15 21:03 ` Pavel Machek 2016-12-15 21:32 ` Lino Sanfilippo 0 siblings, 1 reply; 36+ messages in thread From: Pavel Machek @ 2016-12-15 21:03 UTC (permalink / raw) To: Lino Sanfilippo Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 1784 bytes --] Hi! > sorry for the late reply. No problem. Thanks for the help. > On 11.12.2016 21:11, Pavel Machek wrote: > > > > Do you understand what stmmac_tx_err(priv); is supposed to do? In > > particular, if it is called while the driver is working ok -- should > > the driver survive that? > > As far as I understood it is supposed to fixup an errorneous tx path, e.g. a > missing tx completion for transmitted frames. > > Some drivers do this by restarting only the HW parts responsible for tx, some > others by restarting the complete hardware. > But IMO it should also be ok to be called if the HW is still working fine. > > > Because it does not currently, and I don't know how to test that > > code. Unplugging the cable does not provoke that. > > > > I tried > > > > } else if (unlikely(status == tx_hard_error)) > > stmmac_tx_err(priv); > > + > > + { > > + static int i; > > + i++; > > + if (i==1000) { > > + i = 0; > > + printk("Simulated error\n"); > > + stmmac_tx_err(priv); > > + } > > + } > > } > > > > Ok, there is this race that Francois mentioned so it is not surprising that > the driver does not survive the call of stmmac_tx_err() as it is called now. > Thats why I suggested to do a proper shutdown and restart of the tx path to > avoid the race. I actually did experiment with adding locking there, too, and no, no luck. It seems stmmac_tx_err() is more broken than just locking. Best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-15 21:03 ` Pavel Machek @ 2016-12-15 21:32 ` Lino Sanfilippo 2016-12-15 22:33 ` Lino Sanfilippo 0 siblings, 1 reply; 36+ messages in thread From: Lino Sanfilippo @ 2016-12-15 21:32 UTC (permalink / raw) To: Pavel Machek Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev On 15.12.2016 22:03, Pavel Machek wrote: > > I actually did experiment with adding locking there, too, and no, no > luck. It seems stmmac_tx_err() is more broken than just locking. > Ah ok. Then maybe priv->hw->dma->stop_tx() does not do the job correctly (stop the tx path properly) and the HW is still active on the tx path while the tx buffers are freed. OTOH stmmac_release() also stops the phy before the tx (and rx) paths are stopped. Did you try to stop the phy fist in stmmac_tx_err_work(), too? Regards, Lino ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-15 21:32 ` Lino Sanfilippo @ 2016-12-15 22:33 ` Lino Sanfilippo 2016-12-17 17:31 ` Pavel Machek 0 siblings, 1 reply; 36+ messages in thread From: Lino Sanfilippo @ 2016-12-15 22:33 UTC (permalink / raw) To: Pavel Machek Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 556 bytes --] On 15.12.2016 22:32, Lino Sanfilippo wrote: > Ah ok. Then maybe priv->hw->dma->stop_tx() does not do the job correctly (stop the > tx path properly) and the HW is still active on the tx path while the tx buffers are > freed. OTOH stmmac_release() also stops the phy before the tx (and rx) paths are stopped. > Did you try to stop the phy fist in stmmac_tx_err_work(), too? > > Regards, > Lino > And this is the "sledgehammer" approach: Do a complete shutdown and restart of the hardware in case of tx error (against net-next and only compile tested). [-- Attachment #2: 0001-Sledgehammer.patch --] [-- Type: text/x-patch, Size: 4140 bytes --] >From 0eda87ce6cbc2fb6d25653f30121f30f89332199 Mon Sep 17 00:00:00 2001 From: Lino Sanfilippo <LinoSanfilippo@gmx.de> Date: Thu, 15 Dec 2016 23:18:15 +0100 Subject: [PATCH] Sledgehammer --- drivers/net/ethernet/stmicro/stmmac/stmmac.h | 1 + drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 70 ++++++++++------------- 2 files changed, 31 insertions(+), 40 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h index eab04ae..9c240d7 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h @@ -131,6 +131,7 @@ struct stmmac_priv { u32 rx_tail_addr; u32 tx_tail_addr; u32 mss; + struct work_struct tx_err_work; #ifdef CONFIG_DEBUG_FS struct dentry *dbgfs_dir; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 3e40578..7546574 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -1403,37 +1403,6 @@ static inline void stmmac_disable_dma_irq(struct stmmac_priv *priv) } /** - * stmmac_tx_err - to manage the tx error - * @priv: driver private structure - * Description: it cleans the descriptors and restarts the transmission - * in case of transmission errors. - */ -static void stmmac_tx_err(struct stmmac_priv *priv) -{ - int i; - netif_stop_queue(priv->dev); - - priv->hw->dma->stop_tx(priv->ioaddr); - dma_free_tx_skbufs(priv); - for (i = 0; i < DMA_TX_SIZE; i++) - if (priv->extend_desc) - priv->hw->desc->init_tx_desc(&priv->dma_etx[i].basic, - priv->mode, - (i == DMA_TX_SIZE - 1)); - else - priv->hw->desc->init_tx_desc(&priv->dma_tx[i], - priv->mode, - (i == DMA_TX_SIZE - 1)); - priv->dirty_tx = 0; - priv->cur_tx = 0; - netdev_reset_queue(priv->dev); - priv->hw->dma->start_tx(priv->ioaddr); - - priv->dev->stats.tx_errors++; - netif_wake_queue(priv->dev); -} - -/** * stmmac_dma_interrupt - DMA ISR * @priv: driver private structure * Description: this is the DMA ISR. It is called by the main ISR. @@ -1466,7 +1435,7 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv) priv->xstats.threshold = tc; } } else if (unlikely(status == tx_hard_error)) - stmmac_tx_err(priv); + schedule_work(&priv->tx_err_work); } /** @@ -1870,13 +1839,7 @@ static int stmmac_open(struct net_device *dev) return ret; } -/** - * stmmac_release - close entry point of the driver - * @dev : device pointer. - * Description: - * This is the stop entry point of the driver. - */ -static int stmmac_release(struct net_device *dev) +static int stmmac_do_release(struct net_device *dev) { struct stmmac_priv *priv = netdev_priv(dev); @@ -1919,10 +1882,36 @@ static int stmmac_release(struct net_device *dev) #endif stmmac_release_ptp(priv); +} + +/** + * stmmac_release - close entry point of the driver + * @dev : device pointer. + * Description: + * This is the stop entry point of the driver. + */ +static int stmmac_release(struct net_device *dev) +{ + struct stmmac_priv *priv = netdev_priv(dev); + + cancel_work_sync(&priv->tx_err_work); + + stmmac_do_release(dev); return 0; } +static void stmmac_tx_err_work(struct work_struct *work) +{ + struct stmmac_priv *priv = container_of(work, struct stmmac_priv, + tx_err_work); + /* restart netdev */ + rtnl_lock(); + stmmac_release(priv->dev); + stmmac_open(priv->dev); + rtnl_unlock(); +} + /** * stmmac_tso_allocator - close entry point of the driver * @priv: driver private structure @@ -2688,7 +2677,7 @@ static void stmmac_tx_timeout(struct net_device *dev) struct stmmac_priv *priv = netdev_priv(dev); /* Clear Tx resources and restart transmitting again */ - stmmac_tx_err(priv); + schedule_work(&priv->tx_err_work); } /** @@ -3338,6 +3327,7 @@ int stmmac_dvr_probe(struct device *device, netif_napi_add(ndev, &priv->napi, stmmac_poll, 64); spin_lock_init(&priv->lock); + INIT_WORK(&priv->tx_err_work, stmmac_tx_err_work); ret = register_netdev(ndev); if (ret) { -- 1.9.1 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-15 22:33 ` Lino Sanfilippo @ 2016-12-17 17:31 ` Pavel Machek 2016-12-18 0:15 ` Francois Romieu 0 siblings, 1 reply; 36+ messages in thread From: Pavel Machek @ 2016-12-17 17:31 UTC (permalink / raw) To: Lino Sanfilippo Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 1355 bytes --] On Thu 2016-12-15 23:33:22, Lino Sanfilippo wrote: > On 15.12.2016 22:32, Lino Sanfilippo wrote: > > > Ah ok. Then maybe priv->hw->dma->stop_tx() does not do the job correctly (stop the > > tx path properly) and the HW is still active on the tx path while the tx buffers are > > freed. OTOH stmmac_release() also stops the phy before the tx (and rx) paths are stopped. > > Did you try to stop the phy fist in stmmac_tx_err_work(), too? > > > > Regards, > > Lino > > > > And this is the "sledgehammer" approach: Do a complete shutdown and restart > of the hardware in case of tx error (against net-next and only >compile tested). Wow, thanks a lot. I'll try to get the driver back to the non-working state, and try it. I believe I have some idea what is wrong there. (Missing memory barriers). > +static void stmmac_tx_err_work(struct work_struct *work) > +{ > + struct stmmac_priv *priv = container_of(work, struct stmmac_priv, > + tx_err_work); > + /* restart netdev */ > + rtnl_lock(); > + stmmac_release(priv->dev); > + stmmac_open(priv->dev); > + rtnl_unlock(); > +} Won't this up/down the interface, in a way userspace can observe? Best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-17 17:31 ` Pavel Machek @ 2016-12-18 0:15 ` Francois Romieu 2016-12-18 16:15 ` Lino Sanfilippo 2016-12-19 10:02 ` Pavel Machek 0 siblings, 2 replies; 36+ messages in thread From: Francois Romieu @ 2016-12-18 0:15 UTC (permalink / raw) To: Pavel Machek Cc: Lino Sanfilippo, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev Pavel Machek <pavel@ucw.cz> : [...] > Won't this up/down the interface, in a way userspace can observe? It won't up/down the interface as it doesn't exactly mimic what the network code does (there's more than rtnl_lock). For the same reason it's broken if it races with the transmit path: it can release driver resources while the transmit path uses these. Btw the points below may not matter/hurt much for a proof a concept but they would need to be addressed as well: 1) unchecked (and avoidable) extra error paths due to stmmac_release() 2) racy cancel_work_sync. Low probability as it is, an irq + error could take place right after cancel_work_sync Lino, have you considered via-rhine.c since its "move work from irq to workqueue context" changes that started in 7ab87ff4c770eed71e3777936299292739fcd0fe [*] ? It's a shameless plug - originated in r8169.c - but it should be rather close to what the sxgbe and friends require. Thought / opinion ? [*] Including fixes/changes in: - 3a5a883a8a663b930908cae4abe5ec913b9b2fd2 - e1efa87241272104d6a12c8b9fcdc4f62634d447 - 810f19bcb862f8889b27e0c9d9eceac9593925dd - e45af497950a89459a0c4b13ffd91e1729fffef4 - a926592f5e4e900f3fa903298c4619a131e60963 - 559bcac35facfed49ab4f408e162971612dcfdf3 -- Ueimor ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-18 0:15 ` Francois Romieu @ 2016-12-18 16:15 ` Lino Sanfilippo 2016-12-18 17:23 ` Pavel Machek ` (2 more replies) 2016-12-19 10:02 ` Pavel Machek 1 sibling, 3 replies; 36+ messages in thread From: Lino Sanfilippo @ 2016-12-18 16:15 UTC (permalink / raw) To: Francois Romieu, Pavel Machek Cc: bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev Hi, On 18.12.2016 01:15, Francois Romieu wrote: > Pavel Machek <pavel@ucw.cz> : > [...] >> Won't this up/down the interface, in a way userspace can observe? > > It won't up/down the interface as it doesn't exactly mimic what the > network code does (there's more than rtnl_lock). > Right. Userspace wont see link down/up, but it will see carrier off/on. But this is AFAIK acceptable for a rare situation like a tx error. > For the same reason it's broken if it races with the transmit path: it > can release driver resources while the transmit path uses these. > > Btw the points below may not matter/hurt much for a proof a concept > but they would need to be addressed as well: > 1) unchecked (and avoidable) extra error paths due to stmmac_release() > 2) racy cancel_work_sync. Low probability as it is, an irq + error could > take place right after cancel_work_sync It was indeed only meant as a proof of concept. Nevertheless the race is not good, since one can run into it when faking the tx error for testings purposes. So below is a slightly improved version of the restart handling. Its not meant as a final version either. But maybe we can use it as a starting point. > Lino, have you considered via-rhine.c since its "move work from irq to > workqueue context" changes that started in > 7ab87ff4c770eed71e3777936299292739fcd0fe [*] ? > It's a shameless plug - originated in r8169.c - but it should be rather > close to what the sxgbe and friends require. Thought / opinion ? > Not really. There are a few drivers that I use to look into if I want to know how certain things are done correctly (e.g the sky2 or the intel drivers) because I think they are well implemented. But you seem to have put some thoughts into various race condition problems in the via-rhine driver and I can image that sxgbe and stmmac still have some of these issues, too. > [*] Including fixes/changes in: > - 3a5a883a8a663b930908cae4abe5ec913b9b2fd2 Ok, the issues you fixed here are concerning the tx_curr and tx_dirty pointers. For now this is not needed in stmmac and sxgbe since the tx completion handlers in both drivers are not lock-free like in the via-rhine.c but are synchronized with xmit by means of the xmit_lock. > - e1efa87241272104d6a12c8b9fcdc4f62634d447 Yep, a sync of the dma descriptors before the hardware gets ownership of the tx tail idx is missing in the stmmac, too. > - 810f19bcb862f8889b27e0c9d9eceac9593925dd > - e45af497950a89459a0c4b13ffd91e1729fffef4 > - a926592f5e4e900f3fa903298c4619a131e60963 I think we should use netif_tx_disable() instead of netif_stop_queue(), too, in case of restart to avoid a possible schedule of the xmit function while we restart. So this is also part of the new patch. Again the patch is only compile tested. Regards, Lino --- drivers/net/ethernet/stmicro/stmmac/stmmac.h | 1 + drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 95 +++++++++++++++-------- 2 files changed, 63 insertions(+), 33 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h index eab04ae..9c240d7 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h @@ -131,6 +131,7 @@ struct stmmac_priv { u32 rx_tail_addr; u32 tx_tail_addr; u32 mss; + struct work_struct tx_err_work; #ifdef CONFIG_DEBUG_FS struct dentry *dbgfs_dir; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 3e40578..5762750 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -1403,37 +1403,6 @@ static inline void stmmac_disable_dma_irq(struct stmmac_priv *priv) } /** - * stmmac_tx_err - to manage the tx error - * @priv: driver private structure - * Description: it cleans the descriptors and restarts the transmission - * in case of transmission errors. - */ -static void stmmac_tx_err(struct stmmac_priv *priv) -{ - int i; - netif_stop_queue(priv->dev); - - priv->hw->dma->stop_tx(priv->ioaddr); - dma_free_tx_skbufs(priv); - for (i = 0; i < DMA_TX_SIZE; i++) - if (priv->extend_desc) - priv->hw->desc->init_tx_desc(&priv->dma_etx[i].basic, - priv->mode, - (i == DMA_TX_SIZE - 1)); - else - priv->hw->desc->init_tx_desc(&priv->dma_tx[i], - priv->mode, - (i == DMA_TX_SIZE - 1)); - priv->dirty_tx = 0; - priv->cur_tx = 0; - netdev_reset_queue(priv->dev); - priv->hw->dma->start_tx(priv->ioaddr); - - priv->dev->stats.tx_errors++; - netif_wake_queue(priv->dev); -} - -/** * stmmac_dma_interrupt - DMA ISR * @priv: driver private structure * Description: this is the DMA ISR. It is called by the main ISR. @@ -1466,7 +1435,7 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv) priv->xstats.threshold = tc; } } else if (unlikely(status == tx_hard_error)) - stmmac_tx_err(priv); + schedule_work(&priv->tx_err_work); } /** @@ -1902,6 +1871,7 @@ static int stmmac_release(struct net_device *dev) if (priv->lpi_irq > 0) free_irq(priv->lpi_irq, dev); + cancel_work_sync(&priv->tx_err_work); /* Stop TX/RX DMA and clear the descriptors */ priv->hw->dma->stop_tx(priv->ioaddr); priv->hw->dma->stop_rx(priv->ioaddr); @@ -1920,9 +1890,67 @@ static int stmmac_release(struct net_device *dev) stmmac_release_ptp(priv); + return 0; } +static void stmmac_shutdown(struct net_device *dev) +{ + struct stmmac_priv *priv = netdev_priv(dev); + + /* make sure xmit is not scheduled any more */ + netif_tx_disable(dev); + + if (priv->eee_enabled) + del_timer_sync(&priv->eee_ctrl_timer); + + /* Stop and disconnect the PHY */ + if (dev->phydev) { + phy_stop(dev->phydev); + phy_disconnect(dev->phydev); + } + + napi_disable(&priv->napi); + + del_timer_sync(&priv->txtimer); + + /* Free the IRQ lines */ + free_irq(dev->irq, dev); + if (priv->wol_irq != dev->irq) + free_irq(priv->wol_irq, dev); + if (priv->lpi_irq > 0) + free_irq(priv->lpi_irq, dev); + + /* Stop TX/RX DMA and clear the descriptors */ + priv->hw->dma->stop_tx(priv->ioaddr); + priv->hw->dma->stop_rx(priv->ioaddr); + + /* Release and free the Rx/Tx resources */ + free_dma_desc_resources(priv); + + /* Disable the MAC Rx/Tx */ + stmmac_set_mac(priv->ioaddr, false); + + netif_carrier_off(dev); + +#ifdef CONFIG_DEBUG_FS + stmmac_exit_fs(dev); +#endif + + stmmac_release_ptp(priv); +} + +static void stmmac_tx_err_work(struct work_struct *work) +{ + struct stmmac_priv *priv = container_of(work, struct stmmac_priv, + tx_err_work); + /* restart netdev */ + rtnl_lock(); + stmmac_shutdown(priv->dev); + stmmac_open(priv->dev); + rtnl_unlock(); +} + /** * stmmac_tso_allocator - close entry point of the driver * @priv: driver private structure @@ -2688,7 +2716,7 @@ static void stmmac_tx_timeout(struct net_device *dev) struct stmmac_priv *priv = netdev_priv(dev); /* Clear Tx resources and restart transmitting again */ - stmmac_tx_err(priv); + schedule_work(&priv->tx_err_work); } /** @@ -3338,6 +3366,7 @@ int stmmac_dvr_probe(struct device *device, netif_napi_add(ndev, &priv->napi, stmmac_poll, 64); spin_lock_init(&priv->lock); + INIT_WORK(&priv->tx_err_work, stmmac_tx_err_work); ret = register_netdev(ndev); if (ret) { -- 1.9.1 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-18 16:15 ` Lino Sanfilippo @ 2016-12-18 17:23 ` Pavel Machek 2016-12-18 18:30 ` Pavel Machek 2016-12-18 20:16 ` Pavel Machek 2 siblings, 0 replies; 36+ messages in thread From: Pavel Machek @ 2016-12-18 17:23 UTC (permalink / raw) To: Lino Sanfilippo Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 590 bytes --] Hi! > > - e1efa87241272104d6a12c8b9fcdc4f62634d447 > > Yep, a sync of the dma descriptors before the hardware gets ownership of the tx tail > idx is missing in the stmmac, too. Thanks for the hint. Actually, the driver uses smp_wmb() which is completely crazy, and probably misses rmb() in clean_tx, too. Anyway, I did not notice there are dma_ variants, too... we clearly need them. Thanks and best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-18 16:15 ` Lino Sanfilippo 2016-12-18 17:23 ` Pavel Machek @ 2016-12-18 18:30 ` Pavel Machek 2016-12-19 22:49 ` Lino Sanfilippo 2016-12-18 20:16 ` Pavel Machek 2 siblings, 1 reply; 36+ messages in thread From: Pavel Machek @ 2016-12-18 18:30 UTC (permalink / raw) To: Lino Sanfilippo Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 811 bytes --] Hi! > > - e1efa87241272104d6a12c8b9fcdc4f62634d447 > > Yep, a sync of the dma descriptors before the hardware gets ownership of the tx tail > idx is missing in the stmmac, too. I can reproduce failure with 4.4 fairly easily. I tried with dma_ variant of barriers, and it failed, too [ 1018.410012] stmmac: early irq [ 1023.939702] fpga_cmd_read:wait_event timed out! [ 1033.128692] ------------[ cut here ]------------ [ 1033.133329] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x264/0x284() [ 1033.141748] NETDEV WATCHDOG: eth0 (socfpga-dwmac): transmit queue 0 timed out [ 1033.148861] Modules linked in: Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-18 18:30 ` Pavel Machek @ 2016-12-19 22:49 ` Lino Sanfilippo 0 siblings, 0 replies; 36+ messages in thread From: Lino Sanfilippo @ 2016-12-19 22:49 UTC (permalink / raw) To: Pavel Machek Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev Hi, On 18.12.2016 19:30, Pavel Machek wrote: > Hi! > >> > - e1efa87241272104d6a12c8b9fcdc4f62634d447 >> >> Yep, a sync of the dma descriptors before the hardware gets ownership of the tx tail >> idx is missing in the stmmac, too. > > I can reproduce failure with 4.4 fairly easily. I tried with dma_ > variant of barriers, and it failed, too > > [ 1018.410012] stmmac: early irq > [ 1023.939702] fpga_cmd_read:wait_event timed out! > [ 1033.128692] ------------[ cut here ]------------ > [ 1033.133329] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 > dev_watchdog+0x264/0x284() > [ 1033.141748] NETDEV WATCHDOG: eth0 (socfpga-dwmac): transmit queue 0 > timed out > [ 1033.148861] Modules linked in: This watchdog warning clearly says that for some reason the tx queue was stopped but never woken up in a certain timespan (5 sec in our case, which is a lot). Does this occur after the queue has been stopped and woken up again a few times or is it already the first time the queue is stopped (and never woken up)? Regards, Lino ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-18 16:15 ` Lino Sanfilippo 2016-12-18 17:23 ` Pavel Machek 2016-12-18 18:30 ` Pavel Machek @ 2016-12-18 20:16 ` Pavel Machek 2 siblings, 0 replies; 36+ messages in thread From: Pavel Machek @ 2016-12-18 20:16 UTC (permalink / raw) To: Lino Sanfilippo Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 1266 bytes --] Hi! > > For the same reason it's broken if it races with the transmit path: it > > can release driver resources while the transmit path uses these. > > > > Btw the points below may not matter/hurt much for a proof a concept > > but they would need to be addressed as well: > > 1) unchecked (and avoidable) extra error paths due to stmmac_release() > > 2) racy cancel_work_sync. Low probability as it is, an irq + error could > > take place right after cancel_work_sync > > It was indeed only meant as a proof of concept. Nevertheless the race is not > good, since one can run into it when faking the tx error for testings purposes. > So below is a slightly improved version of the restart handling. > Its not meant as a final version either. But maybe we can use it as a starting > point. Certainly works better than version we currently have in tree. I'm running it in a loop, and it survived 10 minutes of testing so far. (Previous version killed the hardware at first iteration.) > Again the patch is only compile tested. Tested-by: Pavel Machek <pavel@denx.de> Thanks! Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-18 0:15 ` Francois Romieu 2016-12-18 16:15 ` Lino Sanfilippo @ 2016-12-19 10:02 ` Pavel Machek 2016-12-20 0:05 ` Francois Romieu 1 sibling, 1 reply; 36+ messages in thread From: Pavel Machek @ 2016-12-19 10:02 UTC (permalink / raw) To: Francois Romieu Cc: Lino Sanfilippo, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 2119 bytes --] Hi! > Lino, have you considered via-rhine.c since its "move work from irq to > workqueue context" changes that started in > 7ab87ff4c770eed71e3777936299292739fcd0fe [*] ? > > It's a shameless plug - originated in r8169.c - but it should be rather > close to what the sxgbe and friends require. Thought / opinion ? > > [*] Including fixes/changes in: > - 3a5a883a8a663b930908cae4abe5ec913b9b2fd2 > - e1efa87241272104d6a12c8b9fcdc4f62634d447 > - 810f19bcb862f8889b27e0c9d9eceac9593925dd > - e45af497950a89459a0c4b13ffd91e1729fffef4 > - a926592f5e4e900f3fa903298c4619a131e60963 > - 559bcac35facfed49ab4f408e162971612dcfdf3 Considering the memory barriers... is something like this neccessary in the via-rhine? AFAICT... we need a barrier after making sure that descriptor is no longer owned by DMA (to make sure we don't get stale data in rest of descriptor)... and we need a barrier before giving the descriptor to the dma, to make sure DMA engine sees the complete update....? Thanks, Pavel diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c index ba5c542..3806e72 100644 --- a/drivers/net/ethernet/via/via-rhine.c +++ b/drivers/net/ethernet/via/via-rhine.c @@ -1952,6 +1952,7 @@ static void rhine_tx(struct net_device *dev) entry, txstatus); if (txstatus & DescOwn) break; + dma_rmb(); skb = rp->tx_skbuff[entry]; if (txstatus & 0x8000) { netif_dbg(rp, tx_done, dev, @@ -2061,6 +2062,7 @@ static int rhine_rx(struct net_device *dev, int limit) if (desc_status & DescOwn) break; + dma_rmb(); netif_dbg(rp, rx_status, dev, "%s() status %08x\n", __func__, desc_status); @@ -2146,6 +2148,7 @@ static int rhine_rx(struct net_device *dev, int limit) u64_stats_update_end(&rp->rx_stats.syncp); } give_descriptor_to_nic: + dma_wmb(); desc->rx_status = cpu_to_le32(DescOwn); entry = (++rp->cur_rx) % RX_RING_SIZE; } -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock 2016-12-19 10:02 ` Pavel Machek @ 2016-12-20 0:05 ` Francois Romieu 0 siblings, 0 replies; 36+ messages in thread From: Francois Romieu @ 2016-12-20 0:05 UTC (permalink / raw) To: Pavel Machek Cc: Lino Sanfilippo, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev Pavel Machek <pavel@ucw.cz> : [...] > Considering the memory barriers... is something like this neccessary > in the via-rhine ? Yes. > AFAICT... we need a barrier after making sure that descriptor is no > longer owned by DMA (to make sure we don't get stale data in rest of > descriptor)... and we need a barrier before giving the descriptor to > the dma, to make sure DMA engine sees the complete update....? I would not expect stale data while processing a single transmit descriptor as the transmit completion does not use the rest of the descriptor at all in the via-rhine driver. However I agree that transmit descriptors should be read by the cpu with adequate ordering so the dma_rmb() should stay. Same kind of narrative for dma_wmb rhine_rx (s/read/written/ and s/cpu/device/). > diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c > index ba5c542..3806e72 100644 > --- a/drivers/net/ethernet/via/via-rhine.c > +++ b/drivers/net/ethernet/via/via-rhine.c [...] > @@ -2061,6 +2062,7 @@ static int rhine_rx(struct net_device *dev, int limit) > > if (desc_status & DescOwn) > break; > + dma_rmb(); > I agree with your explanation for this one (late vlan processing in a different word from the same descriptor). -- Ueimor ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 2/2] net: ethernet: stmmac: remove private tx queue lock 2016-12-07 20:05 Remove private locks to avoid possible deadlock Lino Sanfilippo 2016-12-07 20:05 ` [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock Lino Sanfilippo @ 2016-12-07 20:05 ` Lino Sanfilippo 2016-12-07 20:55 ` Pavel Machek ` (2 more replies) 1 sibling, 3 replies; 36+ messages in thread From: Lino Sanfilippo @ 2016-12-07 20:05 UTC (permalink / raw) To: bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue Cc: pavel, davem, linux-kernel, netdev, Lino Sanfilippo The driver uses a private lock for synchronization between the xmit function and the xmit completion handler, but since the NETIF_F_LLTX flag is not set, the xmit function is also called with the xmit_lock held. On the other hand the xmit completion handler first takes the private lock and (in case that the tx queue has been stopped) the xmit_lock, leading to a reverse locking order and the potential danger of a deadlock. Fix this by removing the private lock completely and synchronizing the xmit function and completion handler solely by means of the xmit_lock. By doing this remove also the now unnecessary double check for a stopped tx queue. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> --- drivers/net/ethernet/stmicro/stmmac/stmmac.h | 1 - drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 28 +++++------------------ 2 files changed, 6 insertions(+), 23 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h index 4d2a759..7e69b11 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h @@ -64,7 +64,6 @@ struct stmmac_priv { dma_addr_t dma_tx_phy; int tx_coalesce; int hwts_tx_en; - spinlock_t tx_lock; bool tx_path_in_lpi_mode; struct timer_list txtimer; bool tso; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index caf069a..db46ec4 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -1307,7 +1307,7 @@ static void stmmac_tx_clean(struct stmmac_priv *priv) unsigned int bytes_compl = 0, pkts_compl = 0; unsigned int entry = priv->dirty_tx; - spin_lock(&priv->tx_lock); + netif_tx_lock(priv->dev); priv->xstats.tx_clean++; @@ -1378,22 +1378,17 @@ static void stmmac_tx_clean(struct stmmac_priv *priv) netdev_completed_queue(priv->dev, pkts_compl, bytes_compl); if (unlikely(netif_queue_stopped(priv->dev) && - stmmac_tx_avail(priv) > STMMAC_TX_THRESH)) { - netif_tx_lock(priv->dev); - if (netif_queue_stopped(priv->dev) && - stmmac_tx_avail(priv) > STMMAC_TX_THRESH) { - if (netif_msg_tx_done(priv)) - pr_debug("%s: restart transmit\n", __func__); - netif_wake_queue(priv->dev); - } - netif_tx_unlock(priv->dev); + stmmac_tx_avail(priv) > STMMAC_TX_THRESH)) { + if (netif_msg_tx_done(priv)) + pr_debug("%s: restart transmit\n", __func__); + netif_wake_queue(priv->dev); } if ((priv->eee_enabled) && (!priv->tx_path_in_lpi_mode)) { stmmac_enable_eee_mode(priv); mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_T(eee_timer)); } - spin_unlock(&priv->tx_lock); + netif_tx_unlock(priv->dev); } static inline void stmmac_enable_dma_irq(struct stmmac_priv *priv) @@ -1998,8 +1993,6 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev) u8 proto_hdr_len; int i; - spin_lock(&priv->tx_lock); - /* Compute header lengths */ proto_hdr_len = skb_transport_offset(skb) + tcp_hdrlen(skb); @@ -2011,7 +2004,6 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev) /* This is a hard error, log it. */ pr_err("%s: Tx Ring full when queue awake\n", __func__); } - spin_unlock(&priv->tx_lock); return NETDEV_TX_BUSY; } @@ -2146,11 +2138,9 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev) priv->hw->dma->set_tx_tail_ptr(priv->ioaddr, priv->tx_tail_addr, STMMAC_CHAN0); - spin_unlock(&priv->tx_lock); return NETDEV_TX_OK; dma_map_err: - spin_unlock(&priv->tx_lock); dev_err(priv->device, "Tx dma map failed\n"); dev_kfree_skb(skb); priv->dev->stats.tx_dropped++; @@ -2182,10 +2172,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev) return stmmac_tso_xmit(skb, dev); } - spin_lock(&priv->tx_lock); - if (unlikely(stmmac_tx_avail(priv) < nfrags + 1)) { - spin_unlock(&priv->tx_lock); if (!netif_queue_stopped(dev)) { netif_stop_queue(dev); /* This is a hard error, log it. */ @@ -2357,11 +2344,9 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev) priv->hw->dma->set_tx_tail_ptr(priv->ioaddr, priv->tx_tail_addr, STMMAC_CHAN0); - spin_unlock(&priv->tx_lock); return NETDEV_TX_OK; dma_map_err: - spin_unlock(&priv->tx_lock); dev_err(priv->device, "Tx dma map failed\n"); dev_kfree_skb(skb); priv->dev->stats.tx_dropped++; @@ -3347,7 +3332,6 @@ int stmmac_dvr_probe(struct device *device, netif_napi_add(ndev, &priv->napi, stmmac_poll, 64); spin_lock_init(&priv->lock); - spin_lock_init(&priv->tx_lock); ret = register_netdev(ndev); if (ret) { -- 1.9.1 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] net: ethernet: stmmac: remove private tx queue lock 2016-12-07 20:05 ` [PATCH 2/2] net: ethernet: stmmac: " Lino Sanfilippo @ 2016-12-07 20:55 ` Pavel Machek 2016-12-07 20:59 ` Pavel Machek 2016-12-07 21:37 ` Pavel Machek 2 siblings, 0 replies; 36+ messages in thread From: Pavel Machek @ 2016-12-07 20:55 UTC (permalink / raw) To: Lino Sanfilippo Cc: bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 1719 bytes --] Hi! > The driver uses a private lock for synchronization between the xmit > function and the xmit completion handler, but since the NETIF_F_LLTX flag > is not set, the xmit function is also called with the xmit_lock held. > > On the other hand the xmit completion handler first takes the private lock > and (in case that the tx queue has been stopped) the xmit_lock, leading to > a reverse locking order and the potential danger of a deadlock. > > Fix this by removing the private lock completely and synchronizing the xmit > function and completion handler solely by means of the xmit_lock. By doing > this remove also the now unnecessary double check for a stopped tx queue. > > Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Does not seem to apply to net-next based on adc176c5472214971d77c1a61c83db9b01e9cdc7. Aha, that's the printk() changes, probably would apply to mainline. > index caf069a..db46ec4 100644 > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > @@ -1307,7 +1307,7 @@ static void stmmac_tx_clean(struct stmmac_priv *priv) > unsigned int bytes_compl = 0, pkts_compl = 0; > unsigned int entry = priv->dirty_tx; > > - spin_lock(&priv->tx_lock); > + netif_tx_lock(priv->dev); > > priv->xstats.tx_clean++; > Should it use "netif_tx_lock_bh"? I could not reproduce the deadlock without this patch, nor can I detect anything wrong with this patch, so I guess that is: Tested-by: Pavel Machek <pavel@denx.de> Thanks, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] net: ethernet: stmmac: remove private tx queue lock 2016-12-07 20:05 ` [PATCH 2/2] net: ethernet: stmmac: " Lino Sanfilippo 2016-12-07 20:55 ` Pavel Machek @ 2016-12-07 20:59 ` Pavel Machek 2016-12-07 21:37 ` Pavel Machek 2 siblings, 0 replies; 36+ messages in thread From: Pavel Machek @ 2016-12-07 20:59 UTC (permalink / raw) To: Lino Sanfilippo Cc: bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 1881 bytes --] Hi! > The driver uses a private lock for synchronization between the xmit > function and the xmit completion handler, but since the NETIF_F_LLTX flag > is not set, the xmit function is also called with the xmit_lock held. > > On the other hand the xmit completion handler first takes the private lock > and (in case that the tx queue has been stopped) the xmit_lock, leading to > a reverse locking order and the potential danger of a deadlock. > > Fix this by removing the private lock completely and synchronizing the xmit > function and completion handler solely by means of the xmit_lock. By doing > this remove also the now unnecessary double check for a stopped tx queue. > > Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Oops, sorry no, that broke the driver after a while: (So please ignore my tested-by:) root@wagabuibui:/data/tmp/udpt# ./udp-test raw 10.0.0.6 1234 1000 100 30 Sending 100 packets (1000b each) at an interval of 30ms, expected data rate:3333333b/s (3373333b/s incl udp overhead) [ 30.948626] socfpga-dwmac ff702000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx [ 31.076064] Link is Up - 100/Full [ 32.979526] random: crng init done [ 262.244030] ------------[ cut here ]------------ [ 262.248669] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x254/0x26c [ 262.256916] NETDEV WATCHDOG: eth0 (socfpga-dwmac): transmit queue 0 timed out [ 262.264028] Modules linked in: [ 262.267102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc7-118095-g2d70d9b-dirty #339 [ 262.275328] Hardware name: Altera SOCFPGA [ 262.279352] [<8010f758>] (unwind_backtrace) from [<8010affc>] (show_stack+0x10/0x14) Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] net: ethernet: stmmac: remove private tx queue lock 2016-12-07 20:05 ` [PATCH 2/2] net: ethernet: stmmac: " Lino Sanfilippo 2016-12-07 20:55 ` Pavel Machek 2016-12-07 20:59 ` Pavel Machek @ 2016-12-07 21:37 ` Pavel Machek 2016-12-07 21:43 ` Lino Sanfilippo 2016-12-07 23:41 ` David Miller 2 siblings, 2 replies; 36+ messages in thread From: Pavel Machek @ 2016-12-07 21:37 UTC (permalink / raw) To: Lino Sanfilippo Cc: bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 5197 bytes --] On Wed 2016-12-07 21:05:38, Lino Sanfilippo wrote: > The driver uses a private lock for synchronization between the xmit > function and the xmit completion handler, but since the NETIF_F_LLTX flag > is not set, the xmit function is also called with the xmit_lock held. > > On the other hand the xmit completion handler first takes the private lock > and (in case that the tx queue has been stopped) the xmit_lock, leading to > a reverse locking order and the potential danger of a deadlock. > > Fix this by removing the private lock completely and synchronizing the xmit > function and completion handler solely by means of the xmit_lock. By doing > this remove also the now unnecessary double check for a stopped tx queue. > FYI, here's modified version. I believe _bh versions are needed, and I'm testing that version now. (Oh and I also ported it to net-next). It survived 30 minutes of testing so far... Best regards, Pavel Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Pavel Machek <pavel@denx.de> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h index dbacb80..eab04ae 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h @@ -64,7 +64,6 @@ struct stmmac_priv { dma_addr_t dma_tx_phy; int tx_coalesce; int hwts_tx_en; - spinlock_t tx_lock; bool tx_path_in_lpi_mode; struct timer_list txtimer; bool tso; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 982c952..7415bc2 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -1308,7 +1308,7 @@ static void stmmac_tx_clean(struct stmmac_priv *priv) unsigned int bytes_compl = 0, pkts_compl = 0; unsigned int entry = priv->dirty_tx; - spin_lock(&priv->tx_lock); + netif_tx_lock_bh(priv->dev); priv->xstats.tx_clean++; @@ -1378,23 +1378,18 @@ static void stmmac_tx_clean(struct stmmac_priv *priv) netdev_completed_queue(priv->dev, pkts_compl, bytes_compl); - if (unlikely(netif_queue_stopped(priv->dev) && - stmmac_tx_avail(priv) > STMMAC_TX_THRESH)) { - netif_tx_lock(priv->dev); - if (netif_queue_stopped(priv->dev) && - stmmac_tx_avail(priv) > STMMAC_TX_THRESH) { - netif_dbg(priv, tx_done, priv->dev, - "%s: restart transmit\n", __func__); - netif_wake_queue(priv->dev); - } - netif_tx_unlock(priv->dev); + if (netif_queue_stopped(priv->dev) && + stmmac_tx_avail(priv) > STMMAC_TX_THRESH) { + netif_dbg(priv, tx_done, priv->dev, + "%s: restart transmit\n", __func__); + netif_wake_queue(priv->dev); } if ((priv->eee_enabled) && (!priv->tx_path_in_lpi_mode)) { stmmac_enable_eee_mode(priv); mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_T(eee_timer)); } - spin_unlock(&priv->tx_lock); + netif_tx_unlock_bh(priv->dev); } static inline void stmmac_enable_dma_irq(struct stmmac_priv *priv) @@ -2006,8 +2001,6 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev) u8 proto_hdr_len; int i; - spin_lock(&priv->tx_lock); - /* Compute header lengths */ proto_hdr_len = skb_transport_offset(skb) + tcp_hdrlen(skb); @@ -2021,7 +2014,6 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev) "%s: Tx Ring full when queue awake\n", __func__); } - spin_unlock(&priv->tx_lock); return NETDEV_TX_BUSY; } @@ -2156,11 +2148,9 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev) priv->hw->dma->set_tx_tail_ptr(priv->ioaddr, priv->tx_tail_addr, STMMAC_CHAN0); - spin_unlock(&priv->tx_lock); return NETDEV_TX_OK; dma_map_err: - spin_unlock(&priv->tx_lock); dev_err(priv->device, "Tx dma map failed\n"); dev_kfree_skb(skb); priv->dev->stats.tx_dropped++; @@ -2192,10 +2182,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev) return stmmac_tso_xmit(skb, dev); } - spin_lock(&priv->tx_lock); - if (unlikely(stmmac_tx_avail(priv) < nfrags + 1)) { - spin_unlock(&priv->tx_lock); if (!netif_queue_stopped(dev)) { netif_stop_queue(dev); /* This is a hard error, log it. */ @@ -2366,11 +2353,9 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev) priv->hw->dma->set_tx_tail_ptr(priv->ioaddr, priv->tx_tail_addr, STMMAC_CHAN0); - spin_unlock(&priv->tx_lock); return NETDEV_TX_OK; dma_map_err: - spin_unlock(&priv->tx_lock); netdev_err(priv->dev, "Tx DMA map failed\n"); dev_kfree_skb(skb); priv->dev->stats.tx_dropped++; @@ -3357,7 +3342,6 @@ int stmmac_dvr_probe(struct device *device, netif_napi_add(ndev, &priv->napi, stmmac_poll, 64); spin_lock_init(&priv->lock); - spin_lock_init(&priv->tx_lock); ret = register_netdev(ndev); if (ret) { -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] net: ethernet: stmmac: remove private tx queue lock 2016-12-07 21:37 ` Pavel Machek @ 2016-12-07 21:43 ` Lino Sanfilippo 2016-12-07 22:34 ` Lino Sanfilippo 2016-12-07 23:41 ` David Miller 1 sibling, 1 reply; 36+ messages in thread From: Lino Sanfilippo @ 2016-12-07 21:43 UTC (permalink / raw) To: Pavel Machek Cc: bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev Hi Pavel, On 07.12.2016 22:37, Pavel Machek wrote: > On Wed 2016-12-07 21:05:38, Lino Sanfilippo wrote: >> The driver uses a private lock for synchronization between the xmit >> function and the xmit completion handler, but since the NETIF_F_LLTX flag >> is not set, the xmit function is also called with the xmit_lock held. >> >> On the other hand the xmit completion handler first takes the private lock >> and (in case that the tx queue has been stopped) the xmit_lock, leading to >> a reverse locking order and the potential danger of a deadlock. >> >> Fix this by removing the private lock completely and synchronizing the xmit >> function and completion handler solely by means of the xmit_lock. By doing >> this remove also the now unnecessary double check for a stopped tx queue. >> > > FYI, here's modified version. I believe _bh versions are needed, and > I'm testing that version now. (Oh and I also ported it to net-next). > > It survived 30 minutes of testing so far... > First off, thanks for testing. Hmm. I dont understand why _bh would be needed. We call that function from BH context only (napi poll and timer). Any idea? Lino ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] net: ethernet: stmmac: remove private tx queue lock 2016-12-07 21:43 ` Lino Sanfilippo @ 2016-12-07 22:34 ` Lino Sanfilippo 2016-12-07 23:21 ` Pavel Machek 0 siblings, 1 reply; 36+ messages in thread From: Lino Sanfilippo @ 2016-12-07 22:34 UTC (permalink / raw) To: Pavel Machek Cc: bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev On 07.12.2016 22:43, Lino Sanfilippo wrote: > Hi Pavel, > > On 07.12.2016 22:37, Pavel Machek wrote: >> On Wed 2016-12-07 21:05:38, Lino Sanfilippo wrote: >>> The driver uses a private lock for synchronization between the xmit >>> function and the xmit completion handler, but since the NETIF_F_LLTX flag >>> is not set, the xmit function is also called with the xmit_lock held. >>> >>> On the other hand the xmit completion handler first takes the private lock >>> and (in case that the tx queue has been stopped) the xmit_lock, leading to >>> a reverse locking order and the potential danger of a deadlock. >>> >>> Fix this by removing the private lock completely and synchronizing the xmit >>> function and completion handler solely by means of the xmit_lock. By doing >>> this remove also the now unnecessary double check for a stopped tx queue. >>> >> >> FYI, here's modified version. I believe _bh versions are needed, and >> I'm testing that version now. (Oh and I also ported it to net-next). >> >> It survived 30 minutes of testing so far... >> > > First off, thanks for testing. > Hmm. I dont understand why _bh would be needed. We call that function from > BH context only (napi poll and timer). > Any idea? > Could this once again be caused by irq coalescing? When the tx queue has been stopped the cleanup handler has to wakeup the queue within a certain time span, otherwise the watchdog will complain (as it happened in your test). Could you retest this with irq coalescing disabled? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] net: ethernet: stmmac: remove private tx queue lock 2016-12-07 22:34 ` Lino Sanfilippo @ 2016-12-07 23:21 ` Pavel Machek 0 siblings, 0 replies; 36+ messages in thread From: Pavel Machek @ 2016-12-07 23:21 UTC (permalink / raw) To: Lino Sanfilippo Cc: bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, davem, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 1945 bytes --] On Wed 2016-12-07 23:34:19, Lino Sanfilippo wrote: > On 07.12.2016 22:43, Lino Sanfilippo wrote: > > Hi Pavel, > > > > On 07.12.2016 22:37, Pavel Machek wrote: > >> On Wed 2016-12-07 21:05:38, Lino Sanfilippo wrote: > >>> The driver uses a private lock for synchronization between the xmit > >>> function and the xmit completion handler, but since the NETIF_F_LLTX flag > >>> is not set, the xmit function is also called with the xmit_lock held. > >>> > >>> On the other hand the xmit completion handler first takes the private lock > >>> and (in case that the tx queue has been stopped) the xmit_lock, leading to > >>> a reverse locking order and the potential danger of a deadlock. > >>> > >>> Fix this by removing the private lock completely and synchronizing the xmit > >>> function and completion handler solely by means of the xmit_lock. By doing > >>> this remove also the now unnecessary double check for a stopped tx queue. > >>> > >> > >> FYI, here's modified version. I believe _bh versions are needed, and > >> I'm testing that version now. (Oh and I also ported it to net-next). > >> > >> It survived 30 minutes of testing so far... > >> > > > > First off, thanks for testing. > > Hmm. I dont understand why _bh would be needed. We call that function from > > BH context only (napi poll and timer). > > Any idea? > > > > Could this once again be caused by irq coalescing? When the tx queue has been stopped > the cleanup handler has to wakeup the queue within a certain time span, otherwise the > watchdog will complain (as it happened in your test). Could you retest this with > irq coalescing disabled? I actually had TX coalescing disabled, with -#define STMMAC_TX_FRAMES 64 +#define STMMAC_TX_FRAMES 0 Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] net: ethernet: stmmac: remove private tx queue lock 2016-12-07 21:37 ` Pavel Machek 2016-12-07 21:43 ` Lino Sanfilippo @ 2016-12-07 23:41 ` David Miller 2016-12-08 14:08 ` Pavel Machek 1 sibling, 1 reply; 36+ messages in thread From: David Miller @ 2016-12-07 23:41 UTC (permalink / raw) To: pavel Cc: LinoSanfilippo, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, linux-kernel, netdev From: Pavel Machek <pavel@ucw.cz> Date: Wed, 7 Dec 2016 22:37:57 +0100 > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > index 982c952..7415bc2 100644 > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > @@ -1308,7 +1308,7 @@ static void stmmac_tx_clean(struct stmmac_priv *priv) > unsigned int bytes_compl = 0, pkts_compl = 0; > unsigned int entry = priv->dirty_tx; > > - spin_lock(&priv->tx_lock); > + netif_tx_lock_bh(priv->dev); > > priv->xstats.tx_clean++; > stmmac_tx_clean() runs from either the timer or the NAPI poll handler, both execute from software interrupts, therefore _bh() should be unnecessary. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] net: ethernet: stmmac: remove private tx queue lock 2016-12-07 23:41 ` David Miller @ 2016-12-08 14:08 ` Pavel Machek 2016-12-08 15:26 ` David Miller 0 siblings, 1 reply; 36+ messages in thread From: Pavel Machek @ 2016-12-08 14:08 UTC (permalink / raw) To: David Miller Cc: LinoSanfilippo, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 1160 bytes --] On Wed 2016-12-07 18:41:11, David Miller wrote: > From: Pavel Machek <pavel@ucw.cz> > Date: Wed, 7 Dec 2016 22:37:57 +0100 > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > index 982c952..7415bc2 100644 > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > @@ -1308,7 +1308,7 @@ static void stmmac_tx_clean(struct stmmac_priv *priv) > > unsigned int bytes_compl = 0, pkts_compl = 0; > > unsigned int entry = priv->dirty_tx; > > > > - spin_lock(&priv->tx_lock); > > + netif_tx_lock_bh(priv->dev); > > > > priv->xstats.tx_clean++; > > > > stmmac_tx_clean() runs from either the timer or the NAPI poll handler, > both execute from software interrupts, therefore _bh() should be > unnecessary. I've tried the test again with netif_tx_lock() (not _bh()) and it survived for more then four hours. Strange... Best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] net: ethernet: stmmac: remove private tx queue lock 2016-12-08 14:08 ` Pavel Machek @ 2016-12-08 15:26 ` David Miller 2016-12-08 15:46 ` Pavel Machek 0 siblings, 1 reply; 36+ messages in thread From: David Miller @ 2016-12-08 15:26 UTC (permalink / raw) To: pavel Cc: LinoSanfilippo, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, linux-kernel, netdev From: Pavel Machek <pavel@ucw.cz> Date: Thu, 8 Dec 2016 15:08:46 +0100 > On Wed 2016-12-07 18:41:11, David Miller wrote: >> From: Pavel Machek <pavel@ucw.cz> >> Date: Wed, 7 Dec 2016 22:37:57 +0100 >> >> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >> > index 982c952..7415bc2 100644 >> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >> > @@ -1308,7 +1308,7 @@ static void stmmac_tx_clean(struct stmmac_priv *priv) >> > unsigned int bytes_compl = 0, pkts_compl = 0; >> > unsigned int entry = priv->dirty_tx; >> > >> > - spin_lock(&priv->tx_lock); >> > + netif_tx_lock_bh(priv->dev); >> > >> > priv->xstats.tx_clean++; >> > >> >> stmmac_tx_clean() runs from either the timer or the NAPI poll handler, >> both execute from software interrupts, therefore _bh() should be >> unnecessary. > > I've tried the test again with netif_tx_lock() (not _bh()) and it > survived for more then four hours. Strange... It's not strange, it's completely expected. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] net: ethernet: stmmac: remove private tx queue lock 2016-12-08 15:26 ` David Miller @ 2016-12-08 15:46 ` Pavel Machek 0 siblings, 0 replies; 36+ messages in thread From: Pavel Machek @ 2016-12-08 15:46 UTC (permalink / raw) To: David Miller Cc: LinoSanfilippo, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro, alexandre.torgue, linux-kernel, netdev [-- Attachment #1: Type: text/plain, Size: 1518 bytes --] On Thu 2016-12-08 10:26:41, David Miller wrote: > From: Pavel Machek <pavel@ucw.cz> > Date: Thu, 8 Dec 2016 15:08:46 +0100 > > > On Wed 2016-12-07 18:41:11, David Miller wrote: > >> From: Pavel Machek <pavel@ucw.cz> > >> Date: Wed, 7 Dec 2016 22:37:57 +0100 > >> > >> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > >> > index 982c952..7415bc2 100644 > >> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > >> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > >> > @@ -1308,7 +1308,7 @@ static void stmmac_tx_clean(struct stmmac_priv *priv) > >> > unsigned int bytes_compl = 0, pkts_compl = 0; > >> > unsigned int entry = priv->dirty_tx; > >> > > >> > - spin_lock(&priv->tx_lock); > >> > + netif_tx_lock_bh(priv->dev); > >> > > >> > priv->xstats.tx_clean++; > >> > > >> > >> stmmac_tx_clean() runs from either the timer or the NAPI poll handler, > >> both execute from software interrupts, therefore _bh() should be > >> unnecessary. > > > > I've tried the test again with netif_tx_lock() (not _bh()) and it > > survived for more then four hours. Strange... > > It's not strange, it's completely expected. Well, I tried that exact test before, and it survived for something like 10 minutes. So yes... this surprised me. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2016-12-20 0:05 UTC | newest] Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-12-07 20:05 Remove private locks to avoid possible deadlock Lino Sanfilippo 2016-12-07 20:05 ` [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock Lino Sanfilippo 2016-12-07 23:15 ` Francois Romieu 2016-12-08 20:32 ` Lino Sanfilippo 2016-12-08 21:54 ` Pavel Machek 2016-12-08 22:12 ` Lino Sanfilippo 2016-12-08 22:18 ` Pavel Machek 2016-12-08 22:45 ` Lino Sanfilippo 2016-12-08 23:19 ` Francois Romieu 2016-12-09 11:21 ` Pavel Machek 2016-12-10 2:25 ` Lino Sanfilippo 2016-12-11 20:11 ` Pavel Machek 2016-12-15 19:27 ` Lino Sanfilippo 2016-12-15 21:03 ` Pavel Machek 2016-12-15 21:32 ` Lino Sanfilippo 2016-12-15 22:33 ` Lino Sanfilippo 2016-12-17 17:31 ` Pavel Machek 2016-12-18 0:15 ` Francois Romieu 2016-12-18 16:15 ` Lino Sanfilippo 2016-12-18 17:23 ` Pavel Machek 2016-12-18 18:30 ` Pavel Machek 2016-12-19 22:49 ` Lino Sanfilippo 2016-12-18 20:16 ` Pavel Machek 2016-12-19 10:02 ` Pavel Machek 2016-12-20 0:05 ` Francois Romieu 2016-12-07 20:05 ` [PATCH 2/2] net: ethernet: stmmac: " Lino Sanfilippo 2016-12-07 20:55 ` Pavel Machek 2016-12-07 20:59 ` Pavel Machek 2016-12-07 21:37 ` Pavel Machek 2016-12-07 21:43 ` Lino Sanfilippo 2016-12-07 22:34 ` Lino Sanfilippo 2016-12-07 23:21 ` Pavel Machek 2016-12-07 23:41 ` David Miller 2016-12-08 14:08 ` Pavel Machek 2016-12-08 15:26 ` David Miller 2016-12-08 15:46 ` Pavel Machek
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).