All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/11] Fix deadlocks caused by del_timer_sync()
@ 2022-04-07  6:33 Duoming Zhou
  2022-04-07  6:33 ` [PATCH 01/11] drivers: tty: serial: Fix deadlock in sa1100_set_termios() Duoming Zhou
                   ` (10 more replies)
  0 siblings, 11 replies; 30+ messages in thread
From: Duoming Zhou @ 2022-04-07  6:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: chris, jcmvbkbc, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	davem, kuba, pabeni, jes, gregkh, jirislaby, alexander.deucher,
	linux-xtensa, linux-rdma, linux-can, netdev, linux-hippi,
	linux-staging, linux-serial, linux-usb, Duoming Zhou

If the timer handlers need a lock owned by the thread calling 
del_timer_sync(), then, the caller thread will block forever.

Duoming Zhou (11):
  drivers: tty: serial: Fix deadlock in sa1100_set_termios()
  drivers: usb: host: Fix deadlock in oxu_bus_suspend()
  drivers: staging: rtl8192u: Fix deadlock in ieee80211_beacons_stop()
  drivers: staging: rtl8723bs: Fix deadlock in
    rtw_surveydone_event_callback()
  drivers: staging: rtl8192e: Fix deadlock in rtllib_beacons_stop()
  drivers: staging: rtl8192e: Fix deadlock in
    rtw_joinbss_event_prehandle()
  drivers: net: hippi: Fix deadlock in rr_close()
  drivers: net: can: Fix deadlock in grcan_close()
  drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core()
  arch: xtensa: platforms: Fix deadlock in iss_net_close()
  arch: xtensa: platforms: Fix deadlock in rs_close()

 arch/xtensa/platforms/iss/console.c                    | 4 +++-
 arch/xtensa/platforms/iss/network.c                    | 2 ++
 drivers/infiniband/hw/irdma/cm.c                       | 5 ++++-
 drivers/net/can/grcan.c                                | 2 ++
 drivers/net/hippi/rrunner.c                            | 2 ++
 drivers/staging/rtl8192e/rtllib_softmac.c              | 2 +-
 drivers/staging/rtl8192u/ieee80211/ieee80211_softmac.c | 2 +-
 drivers/staging/rtl8723bs/core/rtw_mlme.c              | 4 ++++
 drivers/tty/serial/sa1100.c                            | 2 ++
 drivers/usb/host/oxu210hp-hcd.c                        | 2 ++
 10 files changed, 23 insertions(+), 4 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 01/11] drivers: tty: serial: Fix deadlock in sa1100_set_termios()
  2022-04-07  6:33 [PATCH 00/11] Fix deadlocks caused by del_timer_sync() Duoming Zhou
@ 2022-04-07  6:33 ` Duoming Zhou
  2022-04-07  7:02   ` Jiri Slaby
  2022-04-07  6:33 ` [PATCH 02/11] drivers: usb: host: Fix deadlock in oxu_bus_suspend() Duoming Zhou
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Duoming Zhou @ 2022-04-07  6:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: chris, jcmvbkbc, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	davem, kuba, pabeni, jes, gregkh, jirislaby, alexander.deucher,
	linux-xtensa, linux-rdma, linux-can, netdev, linux-hippi,
	linux-staging, linux-serial, linux-usb, Duoming Zhou

There is a deadlock in sa1100_set_termios(), which is shown
below:

   (Thread 1)              |      (Thread 2)
                           | sa1100_enable_ms()
sa1100_set_termios()       |  mod_timer()
 spin_lock_irqsave() //(1) |  (wait a time)
 ...                       | sa1100_timeout()
 del_timer_sync()          |  spin_lock_irqsave() //(2)
 (wait timer to stop)      |  ...

We hold sport->port.lock in position (1) of thread 1 and
use del_timer_sync() to wait timer to stop, but timer handler
also need sport->port.lock in position (2) of thread 2. As a result,
sa1100_set_termios() will block forever.

This patch extracts del_timer_sync() from the protection of
spin_lock_irqsave(), which could let timer handler to obtain
the needed lock.

Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 drivers/tty/serial/sa1100.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/tty/serial/sa1100.c b/drivers/tty/serial/sa1100.c
index 5fe6cccfc1a..3a5f12ced0b 100644
--- a/drivers/tty/serial/sa1100.c
+++ b/drivers/tty/serial/sa1100.c
@@ -476,7 +476,9 @@ sa1100_set_termios(struct uart_port *port, struct ktermios *termios,
 				UTSR1_TO_SM(UTSR1_ROR);
 	}
 
+	spin_unlock_irqrestore(&sport->port.lock, flags);
 	del_timer_sync(&sport->timer);
+	spin_lock_irqsave(&sport->port.lock, flags);
 
 	/*
 	 * Update the per-port timeout.
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 02/11] drivers: usb: host: Fix deadlock in oxu_bus_suspend()
  2022-04-07  6:33 [PATCH 00/11] Fix deadlocks caused by del_timer_sync() Duoming Zhou
  2022-04-07  6:33 ` [PATCH 01/11] drivers: tty: serial: Fix deadlock in sa1100_set_termios() Duoming Zhou
@ 2022-04-07  6:33 ` Duoming Zhou
  2022-04-07  8:01   ` Oliver Neukum
  2022-04-07  6:33 ` [PATCH 03/11] drivers: staging: rtl8192u: Fix deadlock in ieee80211_beacons_stop() Duoming Zhou
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Duoming Zhou @ 2022-04-07  6:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: chris, jcmvbkbc, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	davem, kuba, pabeni, jes, gregkh, jirislaby, alexander.deucher,
	linux-xtensa, linux-rdma, linux-can, netdev, linux-hippi,
	linux-staging, linux-serial, linux-usb, Duoming Zhou

There is a deadlock in oxu_bus_suspend(), which is shown below:

   (Thread 1)              |      (Thread 2)
                           | timer_action()
oxu_bus_suspend()          |  mod_timer()
 spin_lock_irq() //(1)     |  (wait a time)
 ...                       | oxu_watchdog()
 del_timer_sync()          |  spin_lock_irq() //(2)
 (wait timer to stop)      |  ...

We hold oxu->lock in position (1) of thread 1, and use
del_timer_sync() to wait timer to stop, but timer handler
also need oxu->lock in position (2) of thread 2. As a result,
oxu_bus_suspend() will block forever.

This patch extracts del_timer_sync() from the protection of
spin_lock_irq(), which could let timer handler to obtain
the needed lock.

Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 drivers/usb/host/oxu210hp-hcd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/usb/host/oxu210hp-hcd.c b/drivers/usb/host/oxu210hp-hcd.c
index b741670525e..ee403df3309 100644
--- a/drivers/usb/host/oxu210hp-hcd.c
+++ b/drivers/usb/host/oxu210hp-hcd.c
@@ -3909,8 +3909,10 @@ static int oxu_bus_suspend(struct usb_hcd *hcd)
 		}
 	}
 
+	spin_unlock_irq(&oxu->lock);
 	/* turn off now-idle HC */
 	del_timer_sync(&oxu->watchdog);
+	spin_lock_irq(&oxu->lock);
 	ehci_halt(oxu);
 	hcd->state = HC_STATE_SUSPENDED;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 03/11] drivers: staging: rtl8192u: Fix deadlock in ieee80211_beacons_stop()
  2022-04-07  6:33 [PATCH 00/11] Fix deadlocks caused by del_timer_sync() Duoming Zhou
  2022-04-07  6:33 ` [PATCH 01/11] drivers: tty: serial: Fix deadlock in sa1100_set_termios() Duoming Zhou
  2022-04-07  6:33 ` [PATCH 02/11] drivers: usb: host: Fix deadlock in oxu_bus_suspend() Duoming Zhou
@ 2022-04-07  6:33 ` Duoming Zhou
  2022-04-07  6:33 ` [PATCH 04/11] drivers: staging: rtl8723bs: Fix deadlock in rtw_surveydone_event_callback() Duoming Zhou
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Duoming Zhou @ 2022-04-07  6:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: chris, jcmvbkbc, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	davem, kuba, pabeni, jes, gregkh, jirislaby, alexander.deucher,
	linux-xtensa, linux-rdma, linux-can, netdev, linux-hippi,
	linux-staging, linux-serial, linux-usb, Duoming Zhou

There is a deadlock in ieee80211_beacons_stop(), which is shown below:

   (Thread 1)              |      (Thread 2)
                           | ieee80211_send_beacon()
ieee80211_beacons_stop()   |  mod_timer()
 spin_lock_irqsave() //(1) |  (wait a time)
 ...                       | ieee80211_send_beacon_cb()
 del_timer_sync()          |  spin_lock_irqsave() //(2)
 (wait timer to stop)      |  ...

We hold ieee->beacon_lock in position (1) of thread 1 and use
del_timer_sync() to wait timer to stop, but timer handler
also need ieee->beacon_lock in position (2) of thread 2.
As a result, ieee80211_beacons_stop() will block forever.

This patch extracts del_timer_sync() from the protection of
spin_lock_irqsave(), which could let timer handler to obtain
the needed lock.

Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 drivers/staging/rtl8192u/ieee80211/ieee80211_softmac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211_softmac.c b/drivers/staging/rtl8192u/ieee80211/ieee80211_softmac.c
index 1a43979939a..79f3fbe2555 100644
--- a/drivers/staging/rtl8192u/ieee80211/ieee80211_softmac.c
+++ b/drivers/staging/rtl8192u/ieee80211/ieee80211_softmac.c
@@ -528,9 +528,9 @@ static void ieee80211_beacons_stop(struct ieee80211_device *ieee)
 	spin_lock_irqsave(&ieee->beacon_lock, flags);
 
 	ieee->beacon_txing = 0;
-	del_timer_sync(&ieee->beacon_timer);
 
 	spin_unlock_irqrestore(&ieee->beacon_lock, flags);
+	del_timer_sync(&ieee->beacon_timer);
 }
 
 void ieee80211_stop_send_beacons(struct ieee80211_device *ieee)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 04/11] drivers: staging: rtl8723bs: Fix deadlock in rtw_surveydone_event_callback()
  2022-04-07  6:33 [PATCH 00/11] Fix deadlocks caused by del_timer_sync() Duoming Zhou
                   ` (2 preceding siblings ...)
  2022-04-07  6:33 ` [PATCH 03/11] drivers: staging: rtl8192u: Fix deadlock in ieee80211_beacons_stop() Duoming Zhou
@ 2022-04-07  6:33 ` Duoming Zhou
  2022-04-07  6:36 ` [PATCH 05/11] drivers: staging: rtl8192e: Fix deadlock in rtllib_beacons_stop() Duoming Zhou
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Duoming Zhou @ 2022-04-07  6:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: chris, jcmvbkbc, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	davem, kuba, pabeni, jes, gregkh, jirislaby, alexander.deucher,
	linux-xtensa, linux-rdma, linux-can, netdev, linux-hippi,
	linux-staging, linux-serial, linux-usb, Duoming Zhou

There is a deadlock in rtw_surveydone_event_callback(),
which is shown below:

   (Thread 1)                  |      (Thread 2)
                               | _set_timer()
rtw_surveydone_event_callback()|  mod_timer()
 spin_lock_bh() //(1)          |  (wait a time)
 ...                           | rtw_scan_timeout_handler()
 del_timer_sync()              |  spin_lock_bh() //(2)
 (wait timer to stop)          |  ...

We hold pmlmepriv->lock in position (1) of thread 1 and use
del_timer_sync() to wait timer to stop, but timer handler
also need pmlmepriv->lock in position (2) of thread 2.
As a result, rtw_surveydone_event_callback() will block forever.

This patch extracts del_timer_sync() from the protection of
spin_lock_bh(), which could let timer handler to obtain
the needed lock.

Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 drivers/staging/rtl8723bs/core/rtw_mlme.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/staging/rtl8723bs/core/rtw_mlme.c b/drivers/staging/rtl8723bs/core/rtw_mlme.c
index ed2d3b7d44d..d559eefe633 100644
--- a/drivers/staging/rtl8723bs/core/rtw_mlme.c
+++ b/drivers/staging/rtl8723bs/core/rtw_mlme.c
@@ -751,7 +751,9 @@ void rtw_surveydone_event_callback(struct adapter	*adapter, u8 *pbuf)
 	}
 
 	if (check_fwstate(pmlmepriv, _FW_UNDER_SURVEY)) {
+		spin_unlock_bh(&pmlmepriv->lock);
 		del_timer_sync(&pmlmepriv->scan_to_timer);
+		spin_lock_bh(&pmlmepriv->lock);
 		_clr_fwstate_(pmlmepriv, _FW_UNDER_SURVEY);
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 05/11] drivers: staging: rtl8192e: Fix deadlock in rtllib_beacons_stop()
  2022-04-07  6:33 [PATCH 00/11] Fix deadlocks caused by del_timer_sync() Duoming Zhou
                   ` (3 preceding siblings ...)
  2022-04-07  6:33 ` [PATCH 04/11] drivers: staging: rtl8723bs: Fix deadlock in rtw_surveydone_event_callback() Duoming Zhou
@ 2022-04-07  6:36 ` Duoming Zhou
  2022-04-07  6:36 ` [PATCH 06/11] drivers: staging: rtl8192e: Fix deadlock in rtw_joinbss_event_prehandle() Duoming Zhou
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Duoming Zhou @ 2022-04-07  6:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: chris, jcmvbkbc, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	davem, kuba, pabeni, jes, gregkh, jirislaby, alexander.deucher,
	linux-xtensa, linux-rdma, linux-can, netdev, linux-hippi,
	linux-staging, linux-serial, linux-usb, Duoming Zhou

There is a deadlock in rtllib_beacons_stop(), which is shown
below:

   (Thread 1)              |      (Thread 2)
                           | rtllib_send_beacon()
rtllib_beacons_stop()      |  mod_timer()
 spin_lock_irqsave() //(1) |  (wait a time)
 ...                       | rtllib_send_beacon_cb()
 del_timer_sync()          |  spin_lock_irqsave() //(2)
 (wait timer to stop)      |  ...

We hold ieee->beacon_lock in position (1) of thread 1 and
use del_timer_sync() to wait timer to stop, but timer handler
also need ieee->beacon_lock in position (2) of thread 2.
As a result, rtllib_beacons_stop() will block forever.

This patch extracts del_timer_sync() from the protection of
spin_lock_irqsave(), which could let timer handler to obtain
the needed lock.

Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 drivers/staging/rtl8192e/rtllib_softmac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/rtl8192e/rtllib_softmac.c b/drivers/staging/rtl8192e/rtllib_softmac.c
index 4b6c2295a3c..3ea542e9008 100644
--- a/drivers/staging/rtl8192e/rtllib_softmac.c
+++ b/drivers/staging/rtl8192e/rtllib_softmac.c
@@ -651,10 +651,10 @@ static void rtllib_beacons_stop(struct rtllib_device *ieee)
 	spin_lock_irqsave(&ieee->beacon_lock, flags);
 
 	ieee->beacon_txing = 0;
-	del_timer_sync(&ieee->beacon_timer);
 
 	spin_unlock_irqrestore(&ieee->beacon_lock, flags);
 
+	del_timer_sync(&ieee->beacon_timer);
 }
 
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 06/11] drivers: staging: rtl8192e: Fix deadlock in rtw_joinbss_event_prehandle()
  2022-04-07  6:33 [PATCH 00/11] Fix deadlocks caused by del_timer_sync() Duoming Zhou
                   ` (4 preceding siblings ...)
  2022-04-07  6:36 ` [PATCH 05/11] drivers: staging: rtl8192e: Fix deadlock in rtllib_beacons_stop() Duoming Zhou
@ 2022-04-07  6:36 ` Duoming Zhou
  2022-04-07  6:36 ` [PATCH 07/11] drivers: net: hippi: Fix deadlock in rr_close() Duoming Zhou
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Duoming Zhou @ 2022-04-07  6:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: chris, jcmvbkbc, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	davem, kuba, pabeni, jes, gregkh, jirislaby, alexander.deucher,
	linux-xtensa, linux-rdma, linux-can, netdev, linux-hippi,
	linux-staging, linux-serial, linux-usb, Duoming Zhou

There is a deadlock in rtw_joinbss_event_prehandle(), which is shown
below:

   (Thread 1)                |      (Thread 2)
                             | _set_timer()
rtw_joinbss_event_prehandle()|  mod_timer()
 spin_lock_bh() //(1)        |  (wait a time)
 ...                         | _rtw_join_timeout_handler()
 del_timer_sync()            |  spin_lock_bh() //(2)
 (wait timer to stop)        |  ...

We hold pmlmepriv->lock in position (1) of thread 1 and
use del_timer_sync() to wait timer to stop, but timer handler
also need pmlmepriv->lock in position (2) of thread 2.
As a result, rtw_joinbss_event_prehandle() will block forever.

This patch extracts del_timer_sync() from the protection of
spin_lock_bh(), which could let timer handler to obtain
the needed lock.

Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 drivers/staging/rtl8723bs/core/rtw_mlme.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/staging/rtl8723bs/core/rtw_mlme.c b/drivers/staging/rtl8723bs/core/rtw_mlme.c
index d559eefe633..5030556929d 100644
--- a/drivers/staging/rtl8723bs/core/rtw_mlme.c
+++ b/drivers/staging/rtl8723bs/core/rtw_mlme.c
@@ -1240,8 +1240,10 @@ void rtw_joinbss_event_prehandle(struct adapter *adapter, u8 *pbuf)
 
 			spin_unlock_bh(&pmlmepriv->scanned_queue.lock);
 
+			spin_unlock_bh(&pmlmepriv->lock);
 			/* s5. Cancel assoc_timer */
 			del_timer_sync(&pmlmepriv->assoc_timer);
+			spin_lock_bh(&pmlmepriv->lock);
 		} else {
 			spin_unlock_bh(&(pmlmepriv->scanned_queue.lock));
 		}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 07/11] drivers: net: hippi: Fix deadlock in rr_close()
  2022-04-07  6:33 [PATCH 00/11] Fix deadlocks caused by del_timer_sync() Duoming Zhou
                   ` (5 preceding siblings ...)
  2022-04-07  6:36 ` [PATCH 06/11] drivers: staging: rtl8192e: Fix deadlock in rtw_joinbss_event_prehandle() Duoming Zhou
@ 2022-04-07  6:36 ` Duoming Zhou
  2022-04-07  6:37 ` [PATCH 08/11] drivers: net: can: Fix deadlock in grcan_close() Duoming Zhou
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Duoming Zhou @ 2022-04-07  6:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: chris, jcmvbkbc, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	davem, kuba, pabeni, jes, gregkh, jirislaby, alexander.deucher,
	linux-xtensa, linux-rdma, linux-can, netdev, linux-hippi,
	linux-staging, linux-serial, linux-usb, Duoming Zhou

There is a deadlock in rr_close(), which is shown below:

   (Thread 1)                |      (Thread 2)
                             | rr_open()
rr_close()                   |  add_timer()
 spin_lock_irqsave() //(1)   |  (wait a time)
 ...                         | rr_timer()
 del_timer_sync()            |  spin_lock_irqsave() //(2)
 (wait timer to stop)        |  ...

We hold rrpriv->lock in position (1) of thread 1 and
use del_timer_sync() to wait timer to stop, but timer handler
also need rrpriv->lock in position (2) of thread 2.
As a result, rr_close() will block forever.

This patch extracts del_timer_sync() from the protection of
spin_lock_irqsave(), which could let timer handler to obtain
the needed lock.

Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 drivers/net/hippi/rrunner.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/hippi/rrunner.c b/drivers/net/hippi/rrunner.c
index 16105292b14..74e845fa2e0 100644
--- a/drivers/net/hippi/rrunner.c
+++ b/drivers/net/hippi/rrunner.c
@@ -1355,7 +1355,9 @@ static int rr_close(struct net_device *dev)
 
 	rrpriv->fw_running = 0;
 
+	spin_unlock_irqrestore(&rrpriv->lock, flags);
 	del_timer_sync(&rrpriv->timer);
+	spin_lock_irqsave(&rrpriv->lock, flags);
 
 	writel(0, &regs->TxPi);
 	writel(0, &regs->IpRxPi);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 08/11] drivers: net: can: Fix deadlock in grcan_close()
  2022-04-07  6:33 [PATCH 00/11] Fix deadlocks caused by del_timer_sync() Duoming Zhou
                   ` (6 preceding siblings ...)
  2022-04-07  6:36 ` [PATCH 07/11] drivers: net: hippi: Fix deadlock in rr_close() Duoming Zhou
@ 2022-04-07  6:37 ` Duoming Zhou
  2022-04-07  6:37 ` [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core() Duoming Zhou
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Duoming Zhou @ 2022-04-07  6:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: chris, jcmvbkbc, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	davem, kuba, pabeni, jes, gregkh, jirislaby, alexander.deucher,
	linux-xtensa, linux-rdma, linux-can, netdev, linux-hippi,
	linux-staging, linux-serial, linux-usb, Duoming Zhou

There are deadlocks caused by del_timer_sync(&priv->hang_timer)
and del_timer_sync(&priv->rr_timer) in grcan_close(), one of
the deadlocks are shown below:

   (Thread 1)              |      (Thread 2)
                           | grcan_reset_timer()
grcan_close()              |  mod_timer()
 spin_lock_irqsave() //(1) |  (wait a time)
 ...                       | grcan_initiate_running_reset()
 del_timer_sync()          |  spin_lock_irqsave() //(2)
 (wait timer to stop)      |  ...

We hold priv->lock in position (1) of thread 1 and use
del_timer_sync() to wait timer to stop, but timer handler
also need priv->lock in position (2) of thread 2.
As a result, grcan_close() will block forever.

This patch extracts del_timer_sync() from the protection of
spin_lock_irqsave(), which could let timer handler to obtain
the needed lock.

Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 drivers/net/can/grcan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/can/grcan.c b/drivers/net/can/grcan.c
index d0c5a7a60da..1189057b5d6 100644
--- a/drivers/net/can/grcan.c
+++ b/drivers/net/can/grcan.c
@@ -1102,8 +1102,10 @@ static int grcan_close(struct net_device *dev)
 
 	priv->closing = true;
 	if (priv->need_txbug_workaround) {
+		spin_unlock_irqrestore(&priv->lock, flags);
 		del_timer_sync(&priv->hang_timer);
 		del_timer_sync(&priv->rr_timer);
+		spin_lock_irqsave(&priv->lock, flags);
 	}
 	netif_stop_queue(dev);
 	grcan_stop_hardware(dev);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core()
  2022-04-07  6:33 [PATCH 00/11] Fix deadlocks caused by del_timer_sync() Duoming Zhou
                   ` (7 preceding siblings ...)
  2022-04-07  6:37 ` [PATCH 08/11] drivers: net: can: Fix deadlock in grcan_close() Duoming Zhou
@ 2022-04-07  6:37 ` Duoming Zhou
  2022-04-07 11:24   ` Dan Carpenter
  2022-04-07  6:37 ` [PATCH 10/11] arch: xtensa: platforms: Fix deadlock in iss_net_close() Duoming Zhou
  2022-04-07  6:37 ` [PATCH 11/11] arch: xtensa: platforms: Fix deadlock in rs_close() Duoming Zhou
  10 siblings, 1 reply; 30+ messages in thread
From: Duoming Zhou @ 2022-04-07  6:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: chris, jcmvbkbc, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	davem, kuba, pabeni, jes, gregkh, jirislaby, alexander.deucher,
	linux-xtensa, linux-rdma, linux-can, netdev, linux-hippi,
	linux-staging, linux-serial, linux-usb, Duoming Zhou

There is a deadlock in irdma_cleanup_cm_core(), which is shown
below:

   (Thread 1)              |      (Thread 2)
                           | irdma_schedule_cm_timer()
irdma_cleanup_cm_core()    |  add_timer()
 spin_lock_irqsave() //(1) |  (wait a time)
 ...                       | irdma_cm_timer_tick()
 del_timer_sync()          |  spin_lock_irqsave() //(2)
 (wait timer to stop)      |  ...

We hold cm_core->ht_lock in position (1) of thread 1 and
use del_timer_sync() to wait timer to stop, but timer handler
also need cm_core->ht_lock in position (2) of thread 2.
As a result, irdma_cleanup_cm_core() will block forever.

This patch extracts del_timer_sync() from the protection of
spin_lock_irqsave(), which could let timer handler to obtain
the needed lock.

Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 drivers/infiniband/hw/irdma/cm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/irdma/cm.c b/drivers/infiniband/hw/irdma/cm.c
index dedb3b7edd8..019dd8bfe08 100644
--- a/drivers/infiniband/hw/irdma/cm.c
+++ b/drivers/infiniband/hw/irdma/cm.c
@@ -3252,8 +3252,11 @@ void irdma_cleanup_cm_core(struct irdma_cm_core *cm_core)
 		return;
 
 	spin_lock_irqsave(&cm_core->ht_lock, flags);
-	if (timer_pending(&cm_core->tcp_timer))
+	if (timer_pending(&cm_core->tcp_timer)) {
+		spin_unlock_irqrestore(&cm_core->ht_lock, flags);
 		del_timer_sync(&cm_core->tcp_timer);
+		spin_lock_irqsave(&cm_core->ht_lock, flags);
+	}
 	spin_unlock_irqrestore(&cm_core->ht_lock, flags);
 
 	destroy_workqueue(cm_core->event_wq);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 10/11] arch: xtensa: platforms: Fix deadlock in iss_net_close()
  2022-04-07  6:33 [PATCH 00/11] Fix deadlocks caused by del_timer_sync() Duoming Zhou
                   ` (8 preceding siblings ...)
  2022-04-07  6:37 ` [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core() Duoming Zhou
@ 2022-04-07  6:37 ` Duoming Zhou
  2022-04-07  6:37 ` [PATCH 11/11] arch: xtensa: platforms: Fix deadlock in rs_close() Duoming Zhou
  10 siblings, 0 replies; 30+ messages in thread
From: Duoming Zhou @ 2022-04-07  6:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: chris, jcmvbkbc, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	davem, kuba, pabeni, jes, gregkh, jirislaby, alexander.deucher,
	linux-xtensa, linux-rdma, linux-can, netdev, linux-hippi,
	linux-staging, linux-serial, linux-usb, Duoming Zhou

There is a deadlock in iss_net_close(), which is shown
below:

   (Thread 1)              |      (Thread 2)
                           | iss_net_open()
iss_net_close()            |  mod_timer()
 spin_lock_bh() //(1)      |  (wait a time)
 ...                       | iss_net_timer()
 del_timer_sync()          |  spin_lock() //(2)
 (wait timer to stop)      |  ...

We hold lp->lock in position (1) of thread 1 and use
del_timer_sync() to wait timer to stop, but timer handler
also need lp->lock in position (2) of thread 2. As a result,
iss_net_close() will block forever.

This patch extracts del_timer_sync() from the protection of
spin_lock_bh(), which could let timer handler to obtain
the needed lock.

Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 arch/xtensa/platforms/iss/network.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/xtensa/platforms/iss/network.c b/arch/xtensa/platforms/iss/network.c
index be3aaaad8be..48340f17e39 100644
--- a/arch/xtensa/platforms/iss/network.c
+++ b/arch/xtensa/platforms/iss/network.c
@@ -403,7 +403,9 @@ static int iss_net_close(struct net_device *dev)
 	list_del(&opened);
 	spin_unlock(&opened_lock);
 
+	spin_unlock_bh(&lp->lock);
 	del_timer_sync(&lp->timer);
+	spin_lock_bh(&lp->lock);
 
 	lp->tp.close(lp);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 11/11] arch: xtensa: platforms: Fix deadlock in rs_close()
  2022-04-07  6:33 [PATCH 00/11] Fix deadlocks caused by del_timer_sync() Duoming Zhou
                   ` (9 preceding siblings ...)
  2022-04-07  6:37 ` [PATCH 10/11] arch: xtensa: platforms: Fix deadlock in iss_net_close() Duoming Zhou
@ 2022-04-07  6:37 ` Duoming Zhou
  2022-04-07  7:21   ` Max Filippov
  2022-04-07  9:42   ` Sergey Shtylyov
  10 siblings, 2 replies; 30+ messages in thread
From: Duoming Zhou @ 2022-04-07  6:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: chris, jcmvbkbc, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	davem, kuba, pabeni, jes, gregkh, jirislaby, alexander.deucher,
	linux-xtensa, linux-rdma, linux-can, netdev, linux-hippi,
	linux-staging, linux-serial, linux-usb, Duoming Zhou

There is a deadlock in rs_close(), which is shown
below:

   (Thread 1)              |      (Thread 2)
                           | rs_open()
rs_close()                 |  mod_timer()
 spin_lock_bh() //(1)      |  (wait a time)
 ...                       | rs_poll()
 del_timer_sync()          |  spin_lock() //(2)
 (wait timer to stop)      |  ...

We hold timer_lock in position (1) of thread 1 and
use del_timer_sync() to wait timer to stop, but timer handler
also need timer_lock in position (2) of thread 2.
As a result, rs_close() will block forever.

This patch extracts del_timer_sync() from the protection of
spin_lock_bh(), which could let timer handler to obtain
the needed lock.

Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
 arch/xtensa/platforms/iss/console.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/xtensa/platforms/iss/console.c b/arch/xtensa/platforms/iss/console.c
index 81d7c7e8f7e..d431b61ae3c 100644
--- a/arch/xtensa/platforms/iss/console.c
+++ b/arch/xtensa/platforms/iss/console.c
@@ -51,8 +51,10 @@ static int rs_open(struct tty_struct *tty, struct file * filp)
 static void rs_close(struct tty_struct *tty, struct file * filp)
 {
 	spin_lock_bh(&timer_lock);
-	if (tty->count == 1)
+	if (tty->count == 1) {
+		spin_unlock_bh(&timer_lock);
 		del_timer_sync(&serial_timer);
+	}
 	spin_unlock_bh(&timer_lock);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 01/11] drivers: tty: serial: Fix deadlock in sa1100_set_termios()
  2022-04-07  6:33 ` [PATCH 01/11] drivers: tty: serial: Fix deadlock in sa1100_set_termios() Duoming Zhou
@ 2022-04-07  7:02   ` Jiri Slaby
  2022-04-07 14:03     ` duoming
  0 siblings, 1 reply; 30+ messages in thread
From: Jiri Slaby @ 2022-04-07  7:02 UTC (permalink / raw)
  To: Duoming Zhou, linux-kernel
  Cc: chris, jcmvbkbc, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	davem, kuba, pabeni, jes, gregkh, alexander.deucher,
	linux-xtensa, linux-rdma, linux-can, netdev, linux-hippi,
	linux-staging, linux-serial, linux-usb, Russell King - ARM Linux

On 07. 04. 22, 8:33, Duoming Zhou wrote:
> There is a deadlock in sa1100_set_termios(), which is shown
> below:
> 
>     (Thread 1)              |      (Thread 2)
>                             | sa1100_enable_ms()
> sa1100_set_termios()       |  mod_timer()
>   spin_lock_irqsave() //(1) |  (wait a time)
>   ...                       | sa1100_timeout()
>   del_timer_sync()          |  spin_lock_irqsave() //(2)
>   (wait timer to stop)      |  ...
> 
> We hold sport->port.lock in position (1) of thread 1 and
> use del_timer_sync() to wait timer to stop, but timer handler
> also need sport->port.lock in position (2) of thread 2. As a result,
> sa1100_set_termios() will block forever.
> 
> This patch extracts del_timer_sync() from the protection of
> spin_lock_irqsave(), which could let timer handler to obtain
> the needed lock.
> 
> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> ---
>   drivers/tty/serial/sa1100.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/tty/serial/sa1100.c b/drivers/tty/serial/sa1100.c
> index 5fe6cccfc1a..3a5f12ced0b 100644
> --- a/drivers/tty/serial/sa1100.c
> +++ b/drivers/tty/serial/sa1100.c
> @@ -476,7 +476,9 @@ sa1100_set_termios(struct uart_port *port, struct ktermios *termios,
>   				UTSR1_TO_SM(UTSR1_ROR);
>   	}
>   
> +	spin_unlock_irqrestore(&sport->port.lock, flags);

Unlocking the lock at this point doesn't look safe at all. Maybe moving 
the timer deletion before the lock? There is no current maintainer to 
ask. Most of the driver originates from rmk. Ccing him just in case.

FWIW the lock was moved by this commit around linux 2.5.55 (from 
full-history-linux [1])
commit f38aef3e62c26a33ea360a86fde9b27e183a3748
Author: Russell King <rmk@flint.arm.linux.org.uk>
Date:   Fri Jan 3 15:42:09 2003 +0000

     [SERIAL] Convert change_speed() to settermios()

[1] 
https://archive.org/download/git-history-of-linux/full-history-linux.git.tar

>   	del_timer_sync(&sport->timer);
> +	spin_lock_irqsave(&sport->port.lock, flags);
>   
>   	/*
>   	 * Update the per-port timeout.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 11/11] arch: xtensa: platforms: Fix deadlock in rs_close()
  2022-04-07  6:37 ` [PATCH 11/11] arch: xtensa: platforms: Fix deadlock in rs_close() Duoming Zhou
@ 2022-04-07  7:21   ` Max Filippov
  2022-04-07 11:05     ` duoming
  2022-04-07  9:42   ` Sergey Shtylyov
  1 sibling, 1 reply; 30+ messages in thread
From: Max Filippov @ 2022-04-07  7:21 UTC (permalink / raw)
  To: Duoming Zhou
  Cc: LKML, Chris Zankel, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	David S. Miller, Jakub Kicinski, pabeni, jes, Greg Kroah-Hartman,
	Jiri Slaby, alexander.deucher,
	open list:TENSILICA XTENSA PORT (xtensa),
	linux-rdma, linux-can, netdev, linux-hippi, linux-staging,
	linux-serial, USB list

Hi Duoming,

On Wed, Apr 6, 2022 at 11:38 PM Duoming Zhou <duoming@zju.edu.cn> wrote:
>
> There is a deadlock in rs_close(), which is shown
> below:
>
>    (Thread 1)              |      (Thread 2)
>                            | rs_open()
> rs_close()                 |  mod_timer()
>  spin_lock_bh() //(1)      |  (wait a time)
>  ...                       | rs_poll()
>  del_timer_sync()          |  spin_lock() //(2)
>  (wait timer to stop)      |  ...
>
> We hold timer_lock in position (1) of thread 1 and
> use del_timer_sync() to wait timer to stop, but timer handler
> also need timer_lock in position (2) of thread 2.
> As a result, rs_close() will block forever.

I agree with this.

> This patch extracts del_timer_sync() from the protection of
> spin_lock_bh(), which could let timer handler to obtain
> the needed lock.

Looking at the timer_lock I don't really understand what it protects.
It looks like it is not needed at all.

Also, I see that rs_poll rewinds the timer regardless of whether del_timer_sync
was called or not, which violates del_timer_sync requirements.

> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> ---
>  arch/xtensa/platforms/iss/console.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/xtensa/platforms/iss/console.c b/arch/xtensa/platforms/iss/console.c
> index 81d7c7e8f7e..d431b61ae3c 100644
> --- a/arch/xtensa/platforms/iss/console.c
> +++ b/arch/xtensa/platforms/iss/console.c
> @@ -51,8 +51,10 @@ static int rs_open(struct tty_struct *tty, struct file * filp)
>  static void rs_close(struct tty_struct *tty, struct file * filp)
>  {
>         spin_lock_bh(&timer_lock);
> -       if (tty->count == 1)
> +       if (tty->count == 1) {
> +               spin_unlock_bh(&timer_lock);
>                 del_timer_sync(&serial_timer);
> +       }
>         spin_unlock_bh(&timer_lock);

Now in case tty->count == 1 the timer_lock would be unlocked twice.

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 02/11] drivers: usb: host: Fix deadlock in oxu_bus_suspend()
  2022-04-07  6:33 ` [PATCH 02/11] drivers: usb: host: Fix deadlock in oxu_bus_suspend() Duoming Zhou
@ 2022-04-07  8:01   ` Oliver Neukum
  2022-04-07 12:02     ` duoming
  0 siblings, 1 reply; 30+ messages in thread
From: Oliver Neukum @ 2022-04-07  8:01 UTC (permalink / raw)
  To: Duoming Zhou, linux-kernel
  Cc: chris, jcmvbkbc, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	davem, kuba, pabeni, jes, gregkh, jirislaby, alexander.deucher,
	linux-xtensa, linux-rdma, linux-can, netdev, linux-hippi,
	linux-staging, linux-serial, linux-usb



On 07.04.22 08:33, Duoming Zhou wrote:
> There is a deadlock in oxu_bus_suspend(), which is shown below:
>
>    (Thread 1)              |      (Thread 2)
>                            | timer_action()
> oxu_bus_suspend()          |  mod_timer()
>  spin_lock_irq() //(1)     |  (wait a time)
>  ...                       | oxu_watchdog()
>  del_timer_sync()          |  spin_lock_irq() //(2)
>  (wait timer to stop)      |  ...
>
> We hold oxu->lock in position (1) of thread 1, and use
> del_timer_sync() to wait timer to stop, but timer handler
> also need oxu->lock in position (2) of thread 2. As a result,
> oxu_bus_suspend() will block forever.
>
> This patch extracts del_timer_sync() from the protection of
> spin_lock_irq(), which could let timer handler to obtain
> the needed lock.
Good catch.
> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> ---
>  drivers/usb/host/oxu210hp-hcd.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/usb/host/oxu210hp-hcd.c b/drivers/usb/host/oxu210hp-hcd.c
> index b741670525e..ee403df3309 100644
> --- a/drivers/usb/host/oxu210hp-hcd.c
> +++ b/drivers/usb/host/oxu210hp-hcd.c
> @@ -3909,8 +3909,10 @@ static int oxu_bus_suspend(struct usb_hcd *hcd)
>  		}
>  	}
>  
> +	spin_unlock_irq(&oxu->lock);
>  	/* turn off now-idle HC */
>  	del_timer_sync(&oxu->watchdog);
> +	spin_lock_irq(&oxu->lock);
>  	ehci_halt(oxu);
>  	hcd->state = HC_STATE_SUSPENDED;
>  

What is the lock protecting at that stage? Why not just drop it earlier.

    Regards
        Oliver


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 11/11] arch: xtensa: platforms: Fix deadlock in rs_close()
  2022-04-07  6:37 ` [PATCH 11/11] arch: xtensa: platforms: Fix deadlock in rs_close() Duoming Zhou
  2022-04-07  7:21   ` Max Filippov
@ 2022-04-07  9:42   ` Sergey Shtylyov
  2022-04-07 11:12     ` duoming
  1 sibling, 1 reply; 30+ messages in thread
From: Sergey Shtylyov @ 2022-04-07  9:42 UTC (permalink / raw)
  To: Duoming Zhou, linux-kernel
  Cc: chris, jcmvbkbc, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	davem, kuba, pabeni, jes, gregkh, jirislaby, alexander.deucher,
	linux-xtensa, linux-rdma, linux-can, netdev, linux-hippi,
	linux-staging, linux-serial, linux-usb

Hello!

On 4/7/22 9:37 AM, Duoming Zhou wrote:

> There is a deadlock in rs_close(), which is shown
> below:
> 
>    (Thread 1)              |      (Thread 2)
>                            | rs_open()
> rs_close()                 |  mod_timer()
>  spin_lock_bh() //(1)      |  (wait a time)
>  ...                       | rs_poll()
>  del_timer_sync()          |  spin_lock() //(2)
>  (wait timer to stop)      |  ...
> 
> We hold timer_lock in position (1) of thread 1 and
> use del_timer_sync() to wait timer to stop, but timer handler
> also need timer_lock in position (2) of thread 2.
> As a result, rs_close() will block forever.
> 
> This patch extracts del_timer_sync() from the protection of
> spin_lock_bh(), which could let timer handler to obtain
> the needed lock.
> 
> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> ---
>  arch/xtensa/platforms/iss/console.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/xtensa/platforms/iss/console.c b/arch/xtensa/platforms/iss/console.c
> index 81d7c7e8f7e..d431b61ae3c 100644
> --- a/arch/xtensa/platforms/iss/console.c
> +++ b/arch/xtensa/platforms/iss/console.c
> @@ -51,8 +51,10 @@ static int rs_open(struct tty_struct *tty, struct file * filp)
>  static void rs_close(struct tty_struct *tty, struct file * filp)
>  {
>  	spin_lock_bh(&timer_lock);
> -	if (tty->count == 1)
> +	if (tty->count == 1) {
> +		spin_unlock_bh(&timer_lock);
>  		del_timer_sync(&serial_timer);
> +	}
>  	spin_unlock_bh(&timer_lock);

   Double unlock iff tty->count == 1?

[...]

MBR, Sergey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: [PATCH 11/11] arch: xtensa: platforms: Fix deadlock in rs_close()
  2022-04-07  7:21   ` Max Filippov
@ 2022-04-07 11:05     ` duoming
  0 siblings, 0 replies; 30+ messages in thread
From: duoming @ 2022-04-07 11:05 UTC (permalink / raw)
  To: Max Filippov
  Cc: LKML, Chris Zankel, mustafa.ismail, shiraz.saleem, jgg, wg, mkl,
	David S. Miller, Jakub Kicinski, pabeni, jes, Greg Kroah-Hartman,
	Jiri Slaby, alexander.deucher,
	open list:TENSILICA XTENSA PORT (xtensa),
	linux-rdma, linux-can, netdev, linux-hippi, linux-staging,
	linux-serial, USB list, linma

Hello,

On Thu, 7 Apr 2022 00:21:58 -0700 Max Filippov wrote:

> > There is a deadlock in rs_close(), which is shown
> > below:
> >
> >    (Thread 1)              |      (Thread 2)
> >                            | rs_open()
> > rs_close()                 |  mod_timer()
> >  spin_lock_bh() //(1)      |  (wait a time)
> >  ...                       | rs_poll()
> >  del_timer_sync()          |  spin_lock() //(2)
> >  (wait timer to stop)      |  ...
> >
> > We hold timer_lock in position (1) of thread 1 and
> > use del_timer_sync() to wait timer to stop, but timer handler
> > also need timer_lock in position (2) of thread 2.
> > As a result, rs_close() will block forever.
> 
> I agree with this.
> 
> > This patch extracts del_timer_sync() from the protection of
> > spin_lock_bh(), which could let timer handler to obtain
> > the needed lock.
> 
> Looking at the timer_lock I don't really understand what it protects.
> It looks like it is not needed at all.

There is no race condition between rs_close and rs_poll(timer handler),
I think we could remove the timer_lock in rs_close(), rs_open() and rs_poll().

> Also, I see that rs_poll rewinds the timer regardless of whether del_timer_sync
> was called or not, which violates del_timer_sync requirements.

I wrote a kernel module to test whether del_timer_sync() could finish a timer handler
that use mod_timer() to rewind itself. The following is the result.

# insmod del_timer_sync.ko 
[  929.374405] my_timer will be create.
[  929.374738] the jiffies is :4295595572
[  930.411581] In my_timer_function
[  930.411956] the jiffies is 4295596609
[  935.466643] In my_timer_function
[  935.467505] the jiffies is 4295601665
[  940.586538] In my_timer_function
[  940.586916] the jiffies is 4295606784
[  945.706579] In my_timer_function
[  945.706885] the jiffies is 4295611904

# 
# rmmod del_timer_sync.ko
[  948.507692] the del_timer_sync is :1
[  948.507692] 
# 
# 

The result of the experiment shows that the timer handler could
be killed after we execute del_timer_sync().

> > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> > ---
> >  arch/xtensa/platforms/iss/console.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/xtensa/platforms/iss/console.c b/arch/xtensa/platforms/iss/console.c
> > index 81d7c7e8f7e..d431b61ae3c 100644
> > --- a/arch/xtensa/platforms/iss/console.c
> > +++ b/arch/xtensa/platforms/iss/console.c
> > @@ -51,8 +51,10 @@ static int rs_open(struct tty_struct *tty, struct file * filp)
> >  static void rs_close(struct tty_struct *tty, struct file * filp)
> >  {
> >         spin_lock_bh(&timer_lock);
> > -       if (tty->count == 1)
> > +       if (tty->count == 1) {
> > +               spin_unlock_bh(&timer_lock);
> >                 del_timer_sync(&serial_timer);
> > +       }
> >         spin_unlock_bh(&timer_lock);
> 
> Now in case tty->count == 1 the timer_lock would be unlocked twice.

I will remove the timer_lock in rs_close(), rs_open() and rs_poll().

Thanks a lot for your time and advice!

Best regards,
Duoming Zhou

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: [PATCH 11/11] arch: xtensa: platforms: Fix deadlock in rs_close()
  2022-04-07  9:42   ` Sergey Shtylyov
@ 2022-04-07 11:12     ` duoming
  0 siblings, 0 replies; 30+ messages in thread
From: duoming @ 2022-04-07 11:12 UTC (permalink / raw)
  To: Sergey Shtylyov
  Cc: linux-kernel, chris, jcmvbkbc, mustafa.ismail, shiraz.saleem,
	jgg, wg, mkl, davem, kuba, pabeni, jes, gregkh, jirislaby,
	alexander.deucher, linux-xtensa, linux-rdma, linux-can, netdev,
	linux-hippi, linux-staging, linux-serial, linux-usb, linma

Hello,

On Thu, 7 Apr 2022 12:42:31 +0300 Sergey Shtylyov wrote:

> > There is a deadlock in rs_close(), which is shown
> > below:
> > 
> >    (Thread 1)              |      (Thread 2)
> >                            | rs_open()
> > rs_close()                 |  mod_timer()
> >  spin_lock_bh() //(1)      |  (wait a time)
> >  ...                       | rs_poll()
> >  del_timer_sync()          |  spin_lock() //(2)
> >  (wait timer to stop)      |  ...
> > 
> > We hold timer_lock in position (1) of thread 1 and
> > use del_timer_sync() to wait timer to stop, but timer handler
> > also need timer_lock in position (2) of thread 2.
> > As a result, rs_close() will block forever.
> > 
> > This patch extracts del_timer_sync() from the protection of
> > spin_lock_bh(), which could let timer handler to obtain
> > the needed lock.
> > 
> > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> > ---
> >  arch/xtensa/platforms/iss/console.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/xtensa/platforms/iss/console.c b/arch/xtensa/platforms/iss/console.c
> > index 81d7c7e8f7e..d431b61ae3c 100644
> > --- a/arch/xtensa/platforms/iss/console.c
> > +++ b/arch/xtensa/platforms/iss/console.c
> > @@ -51,8 +51,10 @@ static int rs_open(struct tty_struct *tty, struct file * filp)
> >  static void rs_close(struct tty_struct *tty, struct file * filp)
> >  {
> >  	spin_lock_bh(&timer_lock);
> > -	if (tty->count == 1)
> > +	if (tty->count == 1) {
> > +		spin_unlock_bh(&timer_lock);
> >  		del_timer_sync(&serial_timer);
> > +	}
> >  	spin_unlock_bh(&timer_lock);
> 
>    Double unlock iff tty->count == 1?

Yes, Thanks a lot for your timer and advice. I found there is no race condition
between rs_close and rs_poll(timer handler), I think we could remove the timer_lock
in rs_close(), rs_open() and rs_poll().

Best regards,
Duoming Zhou

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core()
  2022-04-07  6:37 ` [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core() Duoming Zhou
@ 2022-04-07 11:24   ` Dan Carpenter
  2022-04-07 12:54     ` duoming
  0 siblings, 1 reply; 30+ messages in thread
From: Dan Carpenter @ 2022-04-07 11:24 UTC (permalink / raw)
  To: Duoming Zhou
  Cc: linux-kernel, chris, jcmvbkbc, mustafa.ismail, shiraz.saleem,
	jgg, wg, mkl, davem, kuba, pabeni, jes, gregkh, jirislaby,
	alexander.deucher, linux-xtensa, linux-rdma, linux-can, netdev,
	linux-hippi, linux-staging, linux-serial, linux-usb

On Thu, Apr 07, 2022 at 02:37:12PM +0800, Duoming Zhou wrote:
> There is a deadlock in irdma_cleanup_cm_core(), which is shown
> below:
> 
>    (Thread 1)              |      (Thread 2)
>                            | irdma_schedule_cm_timer()
> irdma_cleanup_cm_core()    |  add_timer()
>  spin_lock_irqsave() //(1) |  (wait a time)
>  ...                       | irdma_cm_timer_tick()
>  del_timer_sync()          |  spin_lock_irqsave() //(2)
>  (wait timer to stop)      |  ...
> 
> We hold cm_core->ht_lock in position (1) of thread 1 and
> use del_timer_sync() to wait timer to stop, but timer handler
> also need cm_core->ht_lock in position (2) of thread 2.
> As a result, irdma_cleanup_cm_core() will block forever.
> 
> This patch extracts del_timer_sync() from the protection of
> spin_lock_irqsave(), which could let timer handler to obtain
> the needed lock.
> 
> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> ---
>  drivers/infiniband/hw/irdma/cm.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/hw/irdma/cm.c b/drivers/infiniband/hw/irdma/cm.c
> index dedb3b7edd8..019dd8bfe08 100644
> --- a/drivers/infiniband/hw/irdma/cm.c
> +++ b/drivers/infiniband/hw/irdma/cm.c
> @@ -3252,8 +3252,11 @@ void irdma_cleanup_cm_core(struct irdma_cm_core *cm_core)
>  		return;
>  
>  	spin_lock_irqsave(&cm_core->ht_lock, flags);
> -	if (timer_pending(&cm_core->tcp_timer))
> +	if (timer_pending(&cm_core->tcp_timer)) {
> +		spin_unlock_irqrestore(&cm_core->ht_lock, flags);
>  		del_timer_sync(&cm_core->tcp_timer);
> +		spin_lock_irqsave(&cm_core->ht_lock, flags);
> +	}
>  	spin_unlock_irqrestore(&cm_core->ht_lock, flags);

This lock doesn't seem to be protecting anything.  Also do we need to
check timer_pending()?  I think the del_timer_sync() function will just
return directly if there isn't a pending lock?

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: [PATCH 02/11] drivers: usb: host: Fix deadlock in oxu_bus_suspend()
  2022-04-07  8:01   ` Oliver Neukum
@ 2022-04-07 12:02     ` duoming
  0 siblings, 0 replies; 30+ messages in thread
From: duoming @ 2022-04-07 12:02 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: linux-kernel, chris, jcmvbkbc, mustafa.ismail, shiraz.saleem,
	jgg, wg, mkl, davem, kuba, pabeni, jes, gregkh, jirislaby,
	alexander.deucher, linux-xtensa, linux-rdma, linux-can, netdev,
	linux-hippi, linux-staging, linux-serial, linux-usb, linma

Hello,

On Thu, 7 Apr 2022 10:01:43 +0200 Oliver Neukum wrote:

> On 07.04.22 08:33, Duoming Zhou wrote:
> > There is a deadlock in oxu_bus_suspend(), which is shown below:
> >
> >    (Thread 1)              |      (Thread 2)
> >                            | timer_action()
> > oxu_bus_suspend()          |  mod_timer()
> >  spin_lock_irq() //(1)     |  (wait a time)
> >  ...                       | oxu_watchdog()
> >  del_timer_sync()          |  spin_lock_irq() //(2)
> >  (wait timer to stop)      |  ...
> >
> > We hold oxu->lock in position (1) of thread 1, and use
> > del_timer_sync() to wait timer to stop, but timer handler
> > also need oxu->lock in position (2) of thread 2. As a result,
> > oxu_bus_suspend() will block forever.
> >
> > This patch extracts del_timer_sync() from the protection of
> > spin_lock_irq(), which could let timer handler to obtain
> > the needed lock.
> Good catch.
> > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> > ---
> >  drivers/usb/host/oxu210hp-hcd.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/usb/host/oxu210hp-hcd.c b/drivers/usb/host/oxu210hp-hcd.c
> > index b741670525e..ee403df3309 100644
> > --- a/drivers/usb/host/oxu210hp-hcd.c
> > +++ b/drivers/usb/host/oxu210hp-hcd.c
> > @@ -3909,8 +3909,10 @@ static int oxu_bus_suspend(struct usb_hcd *hcd)
> >  		}
> >  	}
> >  
> > +	spin_unlock_irq(&oxu->lock);
> >  	/* turn off now-idle HC */
> >  	del_timer_sync(&oxu->watchdog);
> > +	spin_lock_irq(&oxu->lock);
> >  	ehci_halt(oxu);
> >  	hcd->state = HC_STATE_SUSPENDED;
> >  
> 
> What is the lock protecting at that stage? Why not just drop it earlier.

I think there is a race condition between oxu_bus_suspend() and oxu_stop(),
so I think we could not drop the oxu->lock earlier.

               (Thread 1)               |         (Thread 2)
 oxu_bus_suspend()                      | oxu_stop()
                                        | 
 hcd->state = HC_STATE_SUSPENDED;       |  spin_lock_irq(&oxu->lock);
 ...                                    | 
 writel(mask, &oxu->regs->intr_enable); |  ...
                                        |  writel(0, &oxu->regs->intr_enable);
 readl(&oxu->regs->intr_enable);        |

The oxu->regs->intr_enable is set to 0 in oxu_stop(), and the readl() in
oxu_bus_suspend() will read the wrong value.

Thanks a lot for your time and advice. If you have questions, welcome to ask me.

Best regards,
Duoming Zhou

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core()
  2022-04-07 11:24   ` Dan Carpenter
@ 2022-04-07 12:54     ` duoming
  2022-04-07 14:23       ` Jason Gunthorpe
  2022-04-07 15:24       ` Dan Carpenter
  0 siblings, 2 replies; 30+ messages in thread
From: duoming @ 2022-04-07 12:54 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: linux-kernel, chris, jcmvbkbc, mustafa.ismail, shiraz.saleem,
	jgg, wg, mkl, davem, kuba, pabeni, jes, gregkh, jirislaby,
	alexander.deucher, linux-xtensa, linux-rdma, linux-can, netdev,
	linux-hippi, linux-staging, linux-serial, linux-usb

Hello,

On Thu, 7 Apr 2022 14:24:56 +0300 Dan Carpenter wrote:

> > There is a deadlock in irdma_cleanup_cm_core(), which is shown
> > below:
> > 
> >    (Thread 1)              |      (Thread 2)
> >                            | irdma_schedule_cm_timer()
> > irdma_cleanup_cm_core()    |  add_timer()
> >  spin_lock_irqsave() //(1) |  (wait a time)
> >  ...                       | irdma_cm_timer_tick()
> >  del_timer_sync()          |  spin_lock_irqsave() //(2)
> >  (wait timer to stop)      |  ...
> > 
> > We hold cm_core->ht_lock in position (1) of thread 1 and
> > use del_timer_sync() to wait timer to stop, but timer handler
> > also need cm_core->ht_lock in position (2) of thread 2.
> > As a result, irdma_cleanup_cm_core() will block forever.
> > 
> > This patch extracts del_timer_sync() from the protection of
> > spin_lock_irqsave(), which could let timer handler to obtain
> > the needed lock.
> > 
> > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> > ---
> >  drivers/infiniband/hw/irdma/cm.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/infiniband/hw/irdma/cm.c b/drivers/infiniband/hw/irdma/cm.c
> > index dedb3b7edd8..019dd8bfe08 100644
> > --- a/drivers/infiniband/hw/irdma/cm.c
> > +++ b/drivers/infiniband/hw/irdma/cm.c
> > @@ -3252,8 +3252,11 @@ void irdma_cleanup_cm_core(struct irdma_cm_core *cm_core)
> >  		return;
> >  
> >  	spin_lock_irqsave(&cm_core->ht_lock, flags);
> > -	if (timer_pending(&cm_core->tcp_timer))
> > +	if (timer_pending(&cm_core->tcp_timer)) {
> > +		spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> >  		del_timer_sync(&cm_core->tcp_timer);
> > +		spin_lock_irqsave(&cm_core->ht_lock, flags);
> > +	}
> >  	spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> 
> This lock doesn't seem to be protecting anything.  Also do we need to
> check timer_pending()?  I think the del_timer_sync() function will just
> return directly if there isn't a pending lock?

Thanks a lot for your advice, I will remove the timer_pending() and the
redundant lock.

Best regards,
Duoming Zhou

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: [PATCH 01/11] drivers: tty: serial: Fix deadlock in sa1100_set_termios()
  2022-04-07  7:02   ` Jiri Slaby
@ 2022-04-07 14:03     ` duoming
  0 siblings, 0 replies; 30+ messages in thread
From: duoming @ 2022-04-07 14:03 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: linux-kernel, chris, jcmvbkbc, mustafa.ismail, shiraz.saleem,
	jgg, wg, mkl, davem, kuba, pabeni, jes, gregkh,
	alexander.deucher, linux-xtensa, linux-rdma, linux-can, netdev,
	linux-hippi, linux-staging, linux-serial, linux-usb,
	Russell King - ARM Linux, linma, rmk

Hello,

On Thu, 7 Apr 2022 09:02:05 +0200 Jiri Slaby wrote:

> > There is a deadlock in sa1100_set_termios(), which is shown
> > below:
> > 
> >     (Thread 1)              |      (Thread 2)
> >                             | sa1100_enable_ms()
> > sa1100_set_termios()       |  mod_timer()
> >   spin_lock_irqsave() //(1) |  (wait a time)
> >   ...                       | sa1100_timeout()
> >   del_timer_sync()          |  spin_lock_irqsave() //(2)
> >   (wait timer to stop)      |  ...
> > 
> > We hold sport->port.lock in position (1) of thread 1 and
> > use del_timer_sync() to wait timer to stop, but timer handler
> > also need sport->port.lock in position (2) of thread 2. As a result,
> > sa1100_set_termios() will block forever.
> > 
> > This patch extracts del_timer_sync() from the protection of
> > spin_lock_irqsave(), which could let timer handler to obtain
> > the needed lock.
> > 
> > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> > ---
> >   drivers/tty/serial/sa1100.c | 2 ++
> >   1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/tty/serial/sa1100.c b/drivers/tty/serial/sa1100.c
> > index 5fe6cccfc1a..3a5f12ced0b 100644
> > --- a/drivers/tty/serial/sa1100.c
> > +++ b/drivers/tty/serial/sa1100.c
> > @@ -476,7 +476,9 @@ sa1100_set_termios(struct uart_port *port, struct ktermios *termios,
> >   				UTSR1_TO_SM(UTSR1_ROR);
> >   	}
> >   
> > +	spin_unlock_irqrestore(&sport->port.lock, flags);
> 
> Unlocking the lock at this point doesn't look safe at all. Maybe moving 
> the timer deletion before the lock? There is no current maintainer to 
> ask. Most of the driver originates from rmk. Ccing him just in case.

Thanks a lot for your time and advice. I think moving the del_timer_sync()
before the lock is good. Because we may use "sa1100_enable_ms(&sport->port)"
to start the timer after we have set termios.

> FWIW the lock was moved by this commit around linux 2.5.55 (from 
> full-history-linux [1])
> commit f38aef3e62c26a33ea360a86fde9b27e183a3748
> Author: Russell King <rmk@flint.arm.linux.org.uk>
> Date:   Fri Jan 3 15:42:09 2003 +0000
> 
>      [SERIAL] Convert change_speed() to settermios()
> 
> [1] 
> https://archive.org/download/git-history-of-linux/full-history-linux.git.tar
> 
> >   	del_timer_sync(&sport->timer);
> > +	spin_lock_irqsave(&sport->port.lock, flags);
> >   
> >   	/*
> >   	 * Update the per-port timeout.


Best regards,
Duoming Zhou

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core()
  2022-04-07 12:54     ` duoming
@ 2022-04-07 14:23       ` Jason Gunthorpe
  2022-04-07 14:38         ` duoming
  2022-04-07 15:24       ` Dan Carpenter
  1 sibling, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2022-04-07 14:23 UTC (permalink / raw)
  To: duoming
  Cc: Dan Carpenter, linux-kernel, chris, jcmvbkbc, mustafa.ismail,
	shiraz.saleem, wg, mkl, davem, kuba, pabeni, jes, gregkh,
	jirislaby, alexander.deucher, linux-xtensa, linux-rdma,
	linux-can, netdev, linux-hippi, linux-staging, linux-serial,
	linux-usb

On Thu, Apr 07, 2022 at 08:54:13PM +0800, duoming@zju.edu.cn wrote:
> > > diff --git a/drivers/infiniband/hw/irdma/cm.c b/drivers/infiniband/hw/irdma/cm.c
> > > index dedb3b7edd8..019dd8bfe08 100644
> > > +++ b/drivers/infiniband/hw/irdma/cm.c
> > > @@ -3252,8 +3252,11 @@ void irdma_cleanup_cm_core(struct irdma_cm_core *cm_core)
> > >  		return;
> > >  
> > >  	spin_lock_irqsave(&cm_core->ht_lock, flags);
> > > -	if (timer_pending(&cm_core->tcp_timer))
> > > +	if (timer_pending(&cm_core->tcp_timer)) {
> > > +		spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> > >  		del_timer_sync(&cm_core->tcp_timer);
> > > +		spin_lock_irqsave(&cm_core->ht_lock, flags);
> > > +	}
> > >  	spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> > 
> > This lock doesn't seem to be protecting anything.  Also do we need to
> > check timer_pending()?  I think the del_timer_sync() function will just
> > return directly if there isn't a pending lock?
> 
> Thanks a lot for your advice, I will remove the timer_pending() and the
> redundant lock.

Does del_timer_sync work with a self-rescheduling timer like this has?

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: Re: [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core()
  2022-04-07 14:23       ` Jason Gunthorpe
@ 2022-04-07 14:38         ` duoming
  2022-04-07 17:35           ` Saleem, Shiraz
  0 siblings, 1 reply; 30+ messages in thread
From: duoming @ 2022-04-07 14:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dan Carpenter, linux-kernel, chris, jcmvbkbc, mustafa.ismail,
	shiraz.saleem, wg, mkl, davem, kuba, pabeni, jes, gregkh,
	jirislaby, alexander.deucher, linux-xtensa, linux-rdma,
	linux-can, netdev, linux-hippi, linux-staging, linux-serial,
	linux-usb

Hello,

On Thu, 7 Apr 2022 11:23:55 -0300 Jason Gunthorpe wrote:

> > > > diff --git a/drivers/infiniband/hw/irdma/cm.c b/drivers/infiniband/hw/irdma/cm.c
> > > > index dedb3b7edd8..019dd8bfe08 100644
> > > > +++ b/drivers/infiniband/hw/irdma/cm.c
> > > > @@ -3252,8 +3252,11 @@ void irdma_cleanup_cm_core(struct irdma_cm_core *cm_core)
> > > >  		return;
> > > >  
> > > >  	spin_lock_irqsave(&cm_core->ht_lock, flags);
> > > > -	if (timer_pending(&cm_core->tcp_timer))
> > > > +	if (timer_pending(&cm_core->tcp_timer)) {
> > > > +		spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> > > >  		del_timer_sync(&cm_core->tcp_timer);
> > > > +		spin_lock_irqsave(&cm_core->ht_lock, flags);
> > > > +	}
> > > >  	spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> > > 
> > > This lock doesn't seem to be protecting anything.  Also do we need to
> > > check timer_pending()?  I think the del_timer_sync() function will just
> > > return directly if there isn't a pending lock?
> > 
> > Thanks a lot for your advice, I will remove the timer_pending() and the
> > redundant lock.
> 
> Does del_timer_sync work with a self-rescheduling timer like this has?

The del_timer_sync() will kill the timer although it is self-rescheduling.
We could use other functions to arouse timer again besides timer handler itself.

Best regards,
Duoming Zhou

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core()
  2022-04-07 12:54     ` duoming
  2022-04-07 14:23       ` Jason Gunthorpe
@ 2022-04-07 15:24       ` Dan Carpenter
  2022-04-07 17:36         ` Saleem, Shiraz
  1 sibling, 1 reply; 30+ messages in thread
From: Dan Carpenter @ 2022-04-07 15:24 UTC (permalink / raw)
  To: duoming
  Cc: linux-kernel, chris, jcmvbkbc, mustafa.ismail, shiraz.saleem,
	jgg, wg, mkl, davem, kuba, pabeni, jes, gregkh, jirislaby,
	alexander.deucher, linux-xtensa, linux-rdma, linux-can, netdev,
	linux-hippi, linux-staging, linux-serial, linux-usb

On Thu, Apr 07, 2022 at 08:54:13PM +0800, duoming@zju.edu.cn wrote:
> Hello,
> 
> On Thu, 7 Apr 2022 14:24:56 +0300 Dan Carpenter wrote:
> 
> > > There is a deadlock in irdma_cleanup_cm_core(), which is shown
> > > below:
> > > 
> > >    (Thread 1)              |      (Thread 2)
> > >                            | irdma_schedule_cm_timer()
> > > irdma_cleanup_cm_core()    |  add_timer()
> > >  spin_lock_irqsave() //(1) |  (wait a time)
> > >  ...                       | irdma_cm_timer_tick()
> > >  del_timer_sync()          |  spin_lock_irqsave() //(2)
> > >  (wait timer to stop)      |  ...
> > > 
> > > We hold cm_core->ht_lock in position (1) of thread 1 and
> > > use del_timer_sync() to wait timer to stop, but timer handler
> > > also need cm_core->ht_lock in position (2) of thread 2.
> > > As a result, irdma_cleanup_cm_core() will block forever.
> > > 
> > > This patch extracts del_timer_sync() from the protection of
> > > spin_lock_irqsave(), which could let timer handler to obtain
> > > the needed lock.
> > > 
> > > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> > > ---
> > >  drivers/infiniband/hw/irdma/cm.c | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/infiniband/hw/irdma/cm.c b/drivers/infiniband/hw/irdma/cm.c
> > > index dedb3b7edd8..019dd8bfe08 100644
> > > --- a/drivers/infiniband/hw/irdma/cm.c
> > > +++ b/drivers/infiniband/hw/irdma/cm.c
> > > @@ -3252,8 +3252,11 @@ void irdma_cleanup_cm_core(struct irdma_cm_core *cm_core)
> > >  		return;
> > >  
> > >  	spin_lock_irqsave(&cm_core->ht_lock, flags);
> > > -	if (timer_pending(&cm_core->tcp_timer))
> > > +	if (timer_pending(&cm_core->tcp_timer)) {
> > > +		spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> > >  		del_timer_sync(&cm_core->tcp_timer);
> > > +		spin_lock_irqsave(&cm_core->ht_lock, flags);
> > > +	}
> > >  	spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> > 
> > This lock doesn't seem to be protecting anything.  Also do we need to
> > check timer_pending()?  I think the del_timer_sync() function will just
> > return directly if there isn't a pending lock?
> 
> Thanks a lot for your advice, I will remove the timer_pending() and the
> redundant lock.

I didn't give any advice. :P I only ask questions when I don't know the
answers.  Someone probably needs to look at &cm_core->ht_lock and figure
out what it's protecting.

regards,
dan carpenter



^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: Re: Re: [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core()
  2022-04-07 14:38         ` duoming
@ 2022-04-07 17:35           ` Saleem, Shiraz
  0 siblings, 0 replies; 30+ messages in thread
From: Saleem, Shiraz @ 2022-04-07 17:35 UTC (permalink / raw)
  To: duoming, Jason Gunthorpe
  Cc: Dan Carpenter, linux-kernel, chris, jcmvbkbc, Ismail, Mustafa,
	wg, mkl, davem, kuba, pabeni, jes, gregkh, jirislaby,
	alexander.deucher, linux-xtensa, linux-rdma, linux-can, netdev,
	linux-hippi, linux-staging, linux-serial, linux-usb

> Subject: Re: Re: Re: [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in
> irdma_cleanup_cm_core()
> 
> Hello,
> 
> On Thu, 7 Apr 2022 11:23:55 -0300 Jason Gunthorpe wrote:
> 
> > > > > diff --git a/drivers/infiniband/hw/irdma/cm.c
> > > > > b/drivers/infiniband/hw/irdma/cm.c
> > > > > index dedb3b7edd8..019dd8bfe08 100644
> > > > > +++ b/drivers/infiniband/hw/irdma/cm.c
> > > > > @@ -3252,8 +3252,11 @@ void irdma_cleanup_cm_core(struct
> irdma_cm_core *cm_core)
> > > > >  		return;
> > > > >
> > > > >  	spin_lock_irqsave(&cm_core->ht_lock, flags);
> > > > > -	if (timer_pending(&cm_core->tcp_timer))
> > > > > +	if (timer_pending(&cm_core->tcp_timer)) {
> > > > > +		spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> > > > >  		del_timer_sync(&cm_core->tcp_timer);
> > > > > +		spin_lock_irqsave(&cm_core->ht_lock, flags);
> > > > > +	}
> > > > >  	spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> > > >
> > > > This lock doesn't seem to be protecting anything.  Also do we need
> > > > to check timer_pending()?  I think the del_timer_sync() function
> > > > will just return directly if there isn't a pending lock?
> > >
> > > Thanks a lot for your advice, I will remove the timer_pending() and
> > > the redundant lock.
> >
> > Does del_timer_sync work with a self-rescheduling timer like this has?
> 
> The del_timer_sync() will kill the timer although it is self-rescheduling.
> We could use other functions to arouse timer again besides timer handler itself.
> 

By the time we execute, irdma_cleanup_cm_core all cm_nodes should be culled and there will be no timer add from the timer handler.

And the secondary path to add timer, irdma_schedule_timer is guaranteed to not run.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: Re: [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core()
  2022-04-07 15:24       ` Dan Carpenter
@ 2022-04-07 17:36         ` Saleem, Shiraz
  2022-04-08  0:35           ` duoming
  0 siblings, 1 reply; 30+ messages in thread
From: Saleem, Shiraz @ 2022-04-07 17:36 UTC (permalink / raw)
  To: Dan Carpenter, duoming
  Cc: linux-kernel, chris, jcmvbkbc, Ismail, Mustafa, jgg, wg, mkl,
	davem, kuba, pabeni, jes, gregkh, jirislaby, alexander.deucher,
	linux-xtensa, linux-rdma, linux-can, netdev, linux-hippi,
	linux-staging, linux-serial, linux-usb

> Subject: Re: Re: [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in
> irdma_cleanup_cm_core()
> 
> On Thu, Apr 07, 2022 at 08:54:13PM +0800, duoming@zju.edu.cn wrote:
> > Hello,
> >
> > On Thu, 7 Apr 2022 14:24:56 +0300 Dan Carpenter wrote:
> >
> > > > There is a deadlock in irdma_cleanup_cm_core(), which is shown
> > > > below:
> > > >
> > > >    (Thread 1)              |      (Thread 2)
> > > >                            | irdma_schedule_cm_timer()
> > > > irdma_cleanup_cm_core()    |  add_timer()
> > > >  spin_lock_irqsave() //(1) |  (wait a time)
> > > >  ...                       | irdma_cm_timer_tick()
> > > >  del_timer_sync()          |  spin_lock_irqsave() //(2)
> > > >  (wait timer to stop)      |  ...
> > > >
> > > > We hold cm_core->ht_lock in position (1) of thread 1 and use
> > > > del_timer_sync() to wait timer to stop, but timer handler also
> > > > need cm_core->ht_lock in position (2) of thread 2.
> > > > As a result, irdma_cleanup_cm_core() will block forever.
> > > >
> > > > This patch extracts del_timer_sync() from the protection of
> > > > spin_lock_irqsave(), which could let timer handler to obtain the
> > > > needed lock.
> > > >
> > > > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> > > > ---
> > > >  drivers/infiniband/hw/irdma/cm.c | 5 ++++-
> > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/infiniband/hw/irdma/cm.c
> > > > b/drivers/infiniband/hw/irdma/cm.c
> > > > index dedb3b7edd8..019dd8bfe08 100644
> > > > --- a/drivers/infiniband/hw/irdma/cm.c
> > > > +++ b/drivers/infiniband/hw/irdma/cm.c
> > > > @@ -3252,8 +3252,11 @@ void irdma_cleanup_cm_core(struct
> irdma_cm_core *cm_core)
> > > >  		return;
> > > >
> > > >  	spin_lock_irqsave(&cm_core->ht_lock, flags);
> > > > -	if (timer_pending(&cm_core->tcp_timer))
> > > > +	if (timer_pending(&cm_core->tcp_timer)) {
> > > > +		spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> > > >  		del_timer_sync(&cm_core->tcp_timer);
> > > > +		spin_lock_irqsave(&cm_core->ht_lock, flags);
> > > > +	}
> > > >  	spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> > >
> > > This lock doesn't seem to be protecting anything.  Also do we need
> > > to check timer_pending()?  I think the del_timer_sync() function
> > > will just return directly if there isn't a pending lock?
> >
> > Thanks a lot for your advice, I will remove the timer_pending() and
> > the redundant lock.
> 
> I didn't give any advice. :P I only ask questions when I don't know the answers.
> Someone probably needs to look at &cm_core->ht_lock and figure out what it's
> protecting.
> 
Agreed on this fix.

We should not lock around del_timer_sync or need to check on timer_pending.

However, we do need serialize addition of a timer which can be called from multiple paths, i.e. the timer handler and irdma_schedule_cm_timer.

Shiraz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: RE: Re: [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core()
  2022-04-07 17:36         ` Saleem, Shiraz
@ 2022-04-08  0:35           ` duoming
  2022-04-08  2:21             ` Saleem, Shiraz
  0 siblings, 1 reply; 30+ messages in thread
From: duoming @ 2022-04-08  0:35 UTC (permalink / raw)
  To: Saleem, Shiraz
  Cc: Dan Carpenter, linux-kernel, chris, jcmvbkbc, Ismail, Mustafa,
	jgg, wg, mkl, davem, kuba, pabeni, jes, gregkh, jirislaby,
	alexander.deucher, linux-xtensa, linux-rdma, linux-can, netdev,
	linux-hippi, linux-staging, linux-serial, linux-usb

Hello,

On Thu, 7 Apr 2022 17:36:12 +0000 Saleem, Shiraz wrote:
 
> > Subject: Re: Re: [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in
> > irdma_cleanup_cm_core()
> > 
> > On Thu, Apr 07, 2022 at 08:54:13PM +0800, duoming@zju.edu.cn wrote:
> > > Hello,
> > >
> > > On Thu, 7 Apr 2022 14:24:56 +0300 Dan Carpenter wrote:
> > >
> > > > > There is a deadlock in irdma_cleanup_cm_core(), which is shown
> > > > > below:
> > > > >
> > > > >    (Thread 1)              |      (Thread 2)
> > > > >                            | irdma_schedule_cm_timer()
> > > > > irdma_cleanup_cm_core()    |  add_timer()
> > > > >  spin_lock_irqsave() //(1) |  (wait a time)
> > > > >  ...                       | irdma_cm_timer_tick()
> > > > >  del_timer_sync()          |  spin_lock_irqsave() //(2)
> > > > >  (wait timer to stop)      |  ...
> > > > >
> > > > > We hold cm_core->ht_lock in position (1) of thread 1 and use
> > > > > del_timer_sync() to wait timer to stop, but timer handler also
> > > > > need cm_core->ht_lock in position (2) of thread 2.
> > > > > As a result, irdma_cleanup_cm_core() will block forever.
> > > > >
> > > > > This patch extracts del_timer_sync() from the protection of
> > > > > spin_lock_irqsave(), which could let timer handler to obtain the
> > > > > needed lock.
> > > > >
> > > > > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> > > > > ---
> > > > >  drivers/infiniband/hw/irdma/cm.c | 5 ++++-
> > > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/infiniband/hw/irdma/cm.c
> > > > > b/drivers/infiniband/hw/irdma/cm.c
> > > > > index dedb3b7edd8..019dd8bfe08 100644
> > > > > --- a/drivers/infiniband/hw/irdma/cm.c
> > > > > +++ b/drivers/infiniband/hw/irdma/cm.c
> > > > > @@ -3252,8 +3252,11 @@ void irdma_cleanup_cm_core(struct
> > irdma_cm_core *cm_core)
> > > > >  		return;
> > > > >
> > > > >  	spin_lock_irqsave(&cm_core->ht_lock, flags);
> > > > > -	if (timer_pending(&cm_core->tcp_timer))
> > > > > +	if (timer_pending(&cm_core->tcp_timer)) {
> > > > > +		spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> > > > >  		del_timer_sync(&cm_core->tcp_timer);
> > > > > +		spin_lock_irqsave(&cm_core->ht_lock, flags);
> > > > > +	}
> > > > >  	spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> > > >
> > > > This lock doesn't seem to be protecting anything.  Also do we need
> > > > to check timer_pending()?  I think the del_timer_sync() function
> > > > will just return directly if there isn't a pending lock?
> > >
> > > Thanks a lot for your advice, I will remove the timer_pending() and
> > > the redundant lock.
> > 
> > I didn't give any advice. :P I only ask questions when I don't know the answers.
> > Someone probably needs to look at &cm_core->ht_lock and figure out what it's
> > protecting.
> > 
> Agreed on this fix.
> 
> We should not lock around del_timer_sync or need to check on timer_pending.
> 
> However, we do need serialize addition of a timer which can be called from multiple paths, i.e. the timer handler and irdma_schedule_cm_timer.

I think we should replace the check "if (!timer_pending(&cm_core->tcp_timer))" to
"if (timer_pending(&cm_core->tcp_timer))" in irdma_cm_timer_tick(), and replace
"if (!was_timer_set)" to "if (was_timer_set)" in irdma_schedule_cm_timer() in order
to guarantee the timer could be executed. I will send the modified patch as soon as 
possible.

Best regards,
Duoming Zhou


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: RE: Re: [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core()
  2022-04-08  0:35           ` duoming
@ 2022-04-08  2:21             ` Saleem, Shiraz
  2022-04-08  2:52               ` duoming
  0 siblings, 1 reply; 30+ messages in thread
From: Saleem, Shiraz @ 2022-04-08  2:21 UTC (permalink / raw)
  To: duoming
  Cc: Dan Carpenter, linux-kernel, chris, jcmvbkbc, Ismail, Mustafa,
	jgg, wg, mkl, davem, kuba, pabeni, jes, gregkh, jirislaby,
	alexander.deucher, linux-xtensa, linux-rdma, linux-can, netdev,
	linux-hippi, linux-staging, linux-serial, linux-usb

> Subject: Re: RE: Re: [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in
> irdma_cleanup_cm_core()
> 
> Hello,
> 
> On Thu, 7 Apr 2022 17:36:12 +0000 Saleem, Shiraz wrote:
> 
> > > Subject: Re: Re: [PATCH 09/11] drivers: infiniband: hw: Fix deadlock
> > > in
> > > irdma_cleanup_cm_core()
> > >
> > > On Thu, Apr 07, 2022 at 08:54:13PM +0800, duoming@zju.edu.cn wrote:
> > > > Hello,
> > > >
> > > > On Thu, 7 Apr 2022 14:24:56 +0300 Dan Carpenter wrote:
> > > >
> > > > > > There is a deadlock in irdma_cleanup_cm_core(), which is shown
> > > > > > below:
> > > > > >
> > > > > >    (Thread 1)              |      (Thread 2)
> > > > > >                            | irdma_schedule_cm_timer()
> > > > > > irdma_cleanup_cm_core()    |  add_timer()
> > > > > >  spin_lock_irqsave() //(1) |  (wait a time)
> > > > > >  ...                       | irdma_cm_timer_tick()
> > > > > >  del_timer_sync()          |  spin_lock_irqsave() //(2)
> > > > > >  (wait timer to stop)      |  ...
> > > > > >
> > > > > > We hold cm_core->ht_lock in position (1) of thread 1 and use
> > > > > > del_timer_sync() to wait timer to stop, but timer handler also
> > > > > > need cm_core->ht_lock in position (2) of thread 2.
> > > > > > As a result, irdma_cleanup_cm_core() will block forever.
> > > > > >
> > > > > > This patch extracts del_timer_sync() from the protection of
> > > > > > spin_lock_irqsave(), which could let timer handler to obtain
> > > > > > the needed lock.
> > > > > >
> > > > > > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> > > > > > ---
> > > > > >  drivers/infiniband/hw/irdma/cm.c | 5 ++++-
> > > > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/infiniband/hw/irdma/cm.c
> > > > > > b/drivers/infiniband/hw/irdma/cm.c
> > > > > > index dedb3b7edd8..019dd8bfe08 100644
> > > > > > --- a/drivers/infiniband/hw/irdma/cm.c
> > > > > > +++ b/drivers/infiniband/hw/irdma/cm.c
> > > > > > @@ -3252,8 +3252,11 @@ void irdma_cleanup_cm_core(struct
> > > irdma_cm_core *cm_core)
> > > > > >  		return;
> > > > > >
> > > > > >  	spin_lock_irqsave(&cm_core->ht_lock, flags);
> > > > > > -	if (timer_pending(&cm_core->tcp_timer))
> > > > > > +	if (timer_pending(&cm_core->tcp_timer)) {
> > > > > > +		spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> > > > > >  		del_timer_sync(&cm_core->tcp_timer);
> > > > > > +		spin_lock_irqsave(&cm_core->ht_lock, flags);
> > > > > > +	}
> > > > > >  	spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> > > > >
> > > > > This lock doesn't seem to be protecting anything.  Also do we
> > > > > need to check timer_pending()?  I think the del_timer_sync()
> > > > > function will just return directly if there isn't a pending lock?
> > > >
> > > > Thanks a lot for your advice, I will remove the timer_pending()
> > > > and the redundant lock.
> > >
> > > I didn't give any advice. :P I only ask questions when I don't know the answers.
> > > Someone probably needs to look at &cm_core->ht_lock and figure out
> > > what it's protecting.
> > >
> > Agreed on this fix.
> >
> > We should not lock around del_timer_sync or need to check on timer_pending.
> >
> > However, we do need serialize addition of a timer which can be called from
> multiple paths, i.e. the timer handler and irdma_schedule_cm_timer.
> 
> I think we should replace the check "if (!timer_pending(&cm_core->tcp_timer))" to
> "if (timer_pending(&cm_core->tcp_timer))" in irdma_cm_timer_tick(), and replace "if
> (!was_timer_set)" to "if (was_timer_set)" in irdma_schedule_cm_timer() in order to
> guarantee the timer could be executed. I will send the modified patch as soon as
> possible.
> 

No we don’t arm the timer if there's is one pending. Its also a bug to do so. 

https://elixir.bootlin.com/linux/v5.18-rc1/source/kernel/time/timer.c#L1143

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: RE: RE: Re: [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core()
  2022-04-08  2:21             ` Saleem, Shiraz
@ 2022-04-08  2:52               ` duoming
  0 siblings, 0 replies; 30+ messages in thread
From: duoming @ 2022-04-08  2:52 UTC (permalink / raw)
  To: Saleem, Shiraz
  Cc: Dan Carpenter, linux-kernel, chris, jcmvbkbc, Ismail, Mustafa,
	jgg, wg, mkl, davem, kuba, pabeni, jes, gregkh, jirislaby,
	alexander.deucher, linux-xtensa, linux-rdma, linux-can, netdev,
	linux-hippi, linux-staging, linux-serial, linux-usb

Hello,

On Fri, 8 Apr 2022 02:21:59 +0000 Saleem, Shiraz wrote:

> > > > > > > There is a deadlock in irdma_cleanup_cm_core(), which is shown
> > > > > > > below:
> > > > > > >
> > > > > > >    (Thread 1)              |      (Thread 2)
> > > > > > >                            | irdma_schedule_cm_timer()
> > > > > > > irdma_cleanup_cm_core()    |  add_timer()
> > > > > > >  spin_lock_irqsave() //(1) |  (wait a time)
> > > > > > >  ...                       | irdma_cm_timer_tick()
> > > > > > >  del_timer_sync()          |  spin_lock_irqsave() //(2)
> > > > > > >  (wait timer to stop)      |  ...
> > > > > > >
> > > > > > > We hold cm_core->ht_lock in position (1) of thread 1 and use
> > > > > > > del_timer_sync() to wait timer to stop, but timer handler also
> > > > > > > need cm_core->ht_lock in position (2) of thread 2.
> > > > > > > As a result, irdma_cleanup_cm_core() will block forever.
> > > > > > >
> > > > > > > This patch extracts del_timer_sync() from the protection of
> > > > > > > spin_lock_irqsave(), which could let timer handler to obtain
> > > > > > > the needed lock.
> > > > > > >
> > > > > > > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> > > > > > > ---
> > > > > > >  drivers/infiniband/hw/irdma/cm.c | 5 ++++-
> > > > > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/drivers/infiniband/hw/irdma/cm.c
> > > > > > > b/drivers/infiniband/hw/irdma/cm.c
> > > > > > > index dedb3b7edd8..019dd8bfe08 100644
> > > > > > > --- a/drivers/infiniband/hw/irdma/cm.c
> > > > > > > +++ b/drivers/infiniband/hw/irdma/cm.c
> > > > > > > @@ -3252,8 +3252,11 @@ void irdma_cleanup_cm_core(struct
> > > > irdma_cm_core *cm_core)
> > > > > > >  		return;
> > > > > > >
> > > > > > >  	spin_lock_irqsave(&cm_core->ht_lock, flags);
> > > > > > > -	if (timer_pending(&cm_core->tcp_timer))
> > > > > > > +	if (timer_pending(&cm_core->tcp_timer)) {
> > > > > > > +		spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> > > > > > >  		del_timer_sync(&cm_core->tcp_timer);
> > > > > > > +		spin_lock_irqsave(&cm_core->ht_lock, flags);
> > > > > > > +	}
> > > > > > >  	spin_unlock_irqrestore(&cm_core->ht_lock, flags);
> > > > > >
> > > > > > This lock doesn't seem to be protecting anything.  Also do we
> > > > > > need to check timer_pending()?  I think the del_timer_sync()
> > > > > > function will just return directly if there isn't a pending lock?
> > > > >
> > > > > Thanks a lot for your advice, I will remove the timer_pending()
> > > > > and the redundant lock.
> > > >
> > > > I didn't give any advice. :P I only ask questions when I don't know the answers.
> > > > Someone probably needs to look at &cm_core->ht_lock and figure out
> > > > what it's protecting.
> > > >
> > > Agreed on this fix.
> > >
> > > We should not lock around del_timer_sync or need to check on timer_pending.
> > >
> > > However, we do need serialize addition of a timer which can be called from
> > multiple paths, i.e. the timer handler and irdma_schedule_cm_timer.
> > 
> > I think we should replace the check "if (!timer_pending(&cm_core->tcp_timer))" to
> > "if (timer_pending(&cm_core->tcp_timer))" in irdma_cm_timer_tick(), and replace "if
> > (!was_timer_set)" to "if (was_timer_set)" in irdma_schedule_cm_timer() in order to
> > guarantee the timer could be executed. I will send the modified patch as soon as
> > possible.
> > 
> 
> No we don’t arm the timer if there's is one pending. Its also a bug to do so. 
> 
> https://elixir.bootlin.com/linux/v5.18-rc1/source/kernel/time/timer.c#L1143

You are right, I think we could add "mod_timer" in irdma_schedule_cm_timer and
irdma_cm_timer_tick() in order to start timer. 

I will send [PATCH V4 09/11] drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core().


Best regards,
Duoming Zhou

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-04-08  2:52 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-07  6:33 [PATCH 00/11] Fix deadlocks caused by del_timer_sync() Duoming Zhou
2022-04-07  6:33 ` [PATCH 01/11] drivers: tty: serial: Fix deadlock in sa1100_set_termios() Duoming Zhou
2022-04-07  7:02   ` Jiri Slaby
2022-04-07 14:03     ` duoming
2022-04-07  6:33 ` [PATCH 02/11] drivers: usb: host: Fix deadlock in oxu_bus_suspend() Duoming Zhou
2022-04-07  8:01   ` Oliver Neukum
2022-04-07 12:02     ` duoming
2022-04-07  6:33 ` [PATCH 03/11] drivers: staging: rtl8192u: Fix deadlock in ieee80211_beacons_stop() Duoming Zhou
2022-04-07  6:33 ` [PATCH 04/11] drivers: staging: rtl8723bs: Fix deadlock in rtw_surveydone_event_callback() Duoming Zhou
2022-04-07  6:36 ` [PATCH 05/11] drivers: staging: rtl8192e: Fix deadlock in rtllib_beacons_stop() Duoming Zhou
2022-04-07  6:36 ` [PATCH 06/11] drivers: staging: rtl8192e: Fix deadlock in rtw_joinbss_event_prehandle() Duoming Zhou
2022-04-07  6:36 ` [PATCH 07/11] drivers: net: hippi: Fix deadlock in rr_close() Duoming Zhou
2022-04-07  6:37 ` [PATCH 08/11] drivers: net: can: Fix deadlock in grcan_close() Duoming Zhou
2022-04-07  6:37 ` [PATCH 09/11] drivers: infiniband: hw: Fix deadlock in irdma_cleanup_cm_core() Duoming Zhou
2022-04-07 11:24   ` Dan Carpenter
2022-04-07 12:54     ` duoming
2022-04-07 14:23       ` Jason Gunthorpe
2022-04-07 14:38         ` duoming
2022-04-07 17:35           ` Saleem, Shiraz
2022-04-07 15:24       ` Dan Carpenter
2022-04-07 17:36         ` Saleem, Shiraz
2022-04-08  0:35           ` duoming
2022-04-08  2:21             ` Saleem, Shiraz
2022-04-08  2:52               ` duoming
2022-04-07  6:37 ` [PATCH 10/11] arch: xtensa: platforms: Fix deadlock in iss_net_close() Duoming Zhou
2022-04-07  6:37 ` [PATCH 11/11] arch: xtensa: platforms: Fix deadlock in rs_close() Duoming Zhou
2022-04-07  7:21   ` Max Filippov
2022-04-07 11:05     ` duoming
2022-04-07  9:42   ` Sergey Shtylyov
2022-04-07 11:12     ` duoming

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.