* [PATCH net] tg3: Fix soft lockup when tg3_reset_task() fails.
From: Michael Chan @ 2020-09-03 18:28 UTC
  To: davem; +Cc: netdev, kuba, drc, baptiste

If tg3_reset_task() fails, the device is left in an inconsistent
state with IFF_RUNNING still set but NAPI state not enabled.  A
subsequent operation, such as ifdown or an AER error, can cause it
to soft lock up when it tries to disable NAPI state.
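
[Editor's sketch, not part of the patch.] The lockup described above can
be modelled in userspace, assuming the usual NAPI behaviour in which
napi_disable() waits until it can take ownership of the NAPI_STATE_SCHED
bit and napi_enable() releases it. The names toy_napi, toy_napi_disable()
and toy_napi_enable() are hypothetical, and the retry loop is bounded only
so that the example terminates; the kernel function keeps waiting.

```c
#include <stdbool.h>

/* Toy model, not kernel code: one flag standing in for NAPI_STATE_SCHED. */
struct toy_napi {
	bool sched;
};

/*
 * Models napi_disable(): it must take ownership of the SCHED flag.
 * The kernel function retries without bound; this model gives up after
 * max_tries so the example terminates.
 * Returns true if the flag was taken, false if it would have kept waiting.
 */
static bool toy_napi_disable(struct toy_napi *n, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (!n->sched) {
			n->sched = true;
			return true;
		}
	}
	return false;
}

/* Models napi_enable(): releases the flag taken by toy_napi_disable(). */
static void toy_napi_enable(struct toy_napi *n)
{
	n->sched = false;
}
```

In this model a second toy_napi_disable() with no toy_napi_enable() in
between never succeeds, which corresponds to the state left behind by a
failed reset. Calling toy_napi_enable() first restores the balance, and
that appears to be why the patch calls tg3_napi_enable() before closing
the device.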

Fix it by bringing the device down to the !IFF_RUNNING state when
tg3_reset_task() fails.  tg3_reset_task(), running from the workqueue,
will now call tg3_close() (via dev_close()) when the reset fails.  We
need to
modify tg3_reset_task_cancel() slightly to avoid tg3_close()
calling cancel_work_sync() to cancel tg3_reset_task().  Otherwise
cancel_work_sync() will wait forever for tg3_reset_task() to
finish.
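
[Editor's sketch, not part of the patch.] The cancel guard can be
illustrated with a small userspace model. All toy_* names and the
cancel_result enum are hypothetical, and the flag handling here is
not atomic, unlike test_and_clear_bit() in the kernel.

```c
#include <stdbool.h>

/* Toy model, not kernel code. */
struct toy_dev {
	bool reset_pending;	/* stands in for RESET_TASK_PENDING */
	bool in_reset_task;	/* true while the work function is running */
};

enum cancel_result {
	CANCEL_SKIPPED,		/* flag was clear, no wait attempted */
	CANCEL_WAITED,		/* waited for the work item to finish */
	CANCEL_WOULD_HANG	/* waited on the work item from inside itself */
};

/* Non-atomic model of test_and_clear_bit(): returns the old value. */
static bool toy_test_and_clear(bool *flag)
{
	bool old = *flag;

	*flag = false;
	return old;
}

/* Models the patched tg3_reset_task_cancel(): wait only if pending. */
static enum cancel_result toy_reset_task_cancel(struct toy_dev *d)
{
	if (!toy_test_and_clear(&d->reset_pending))
		return CANCEL_SKIPPED;
	return d->in_reset_task ? CANCEL_WOULD_HANG : CANCEL_WAITED;
}

/* Models the failure path of the patched tg3_reset_task(). */
static enum cancel_result toy_reset_task_fail_path(struct toy_dev *d)
{
	enum cancel_result r;

	d->in_reset_task = true;
	d->reset_pending = false;	/* cleared before closing the device */
	r = toy_reset_task_cancel(d);	/* reached through the close path */
	d->in_reset_task = false;
	return r;
}
```

With the flag cleared first, the failure path takes the CANCEL_SKIPPED
branch. If the flag were still set while the work function is running,
the model reports CANCEL_WOULD_HANG, which corresponds to
cancel_work_sync() waiting for the task that called it.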

Reported-by: David Christensen <drc@linux.vnet.ibm.com>
Reported-by: Baptiste Covolato <baptiste@arista.com>
Fixes: db2199737990 ("tg3: Schedule at most one tg3_reset_task run")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/tg3.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index ebff1fc..4515804 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7221,8 +7221,8 @@ static inline void tg3_reset_task_schedule(struct tg3 *tp)
 
 static inline void tg3_reset_task_cancel(struct tg3 *tp)
 {
-	cancel_work_sync(&tp->reset_task);
-	tg3_flag_clear(tp, RESET_TASK_PENDING);
+	if (test_and_clear_bit(TG3_FLAG_RESET_TASK_PENDING, tp->tg3_flags))
+		cancel_work_sync(&tp->reset_task);
 	tg3_flag_clear(tp, TX_RECOVERY_PENDING);
 }
 
@@ -11209,18 +11209,27 @@ static void tg3_reset_task(struct work_struct *work)
 
 	tg3_halt(tp, RESET_KIND_SHUTDOWN, 0);
 	err = tg3_init_hw(tp, true);
-	if (err)
+	if (err) {
+		tg3_full_unlock(tp);
+		tp->irq_sync = 0;
+		tg3_napi_enable(tp);
+		/* Clear this flag so that tg3_reset_task_cancel() will not
+		 * call cancel_work_sync() and wait forever.
+		 */
+		tg3_flag_clear(tp, RESET_TASK_PENDING);
+		dev_close(tp->dev);
 		goto out;
+	}
 
 	tg3_netif_start(tp);
 
-out:
 	tg3_full_unlock(tp);
 
 	if (!err)
 		tg3_phy_start(tp);
 
 	tg3_flag_clear(tp, RESET_TASK_PENDING);
+out:
 	rtnl_unlock();
 }
 
-- 
1.8.3.1


* Re: [PATCH net] tg3: Fix soft lockup when tg3_reset_task() fails.
From: Tomas Charvat @ 2020-10-07 16:54 UTC
  To: CABb8VeHA8yEmi-iDs3O-eRfOucWqGM+9p6gj87NLdjeQHfJROA; +Cc: netdev

Greetings,
tg3 with kernel 5.4.69 can get into a state where it resets the link
forever. From mii-tool -w XXX I can see that the link goes up and down
every 3-8 seconds.
Taking the interface down and up doesn't help.
However, there is no error in dmesg or anywhere else.

The tested kernel is not modular; only a reboot helped.
NIC details are:
driver=tg3 driverversion=3.137 firmware=5720-v1.39 NCSI v1.5.1.0

Best regards
-- 
Tomas Charvat <tc@excello.cz>
Excello s.r.o.


Thread overview: 7+ messages
2020-09-03 18:28 [PATCH net] tg3: Fix soft lockup when tg3_reset_task() fails Michael Chan
2020-09-03 19:24 ` David Miller
2020-09-04 23:20 ` Baptiste Covolato
2020-09-05  9:02   ` Michael Chan
2020-09-10 23:00     ` Baptiste Covolato
2020-09-08 16:55 ` David Christensen
2020-10-07 16:54 Tomas Charvat