All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net v2] tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH.
@ 2021-12-06 12:02 Sebastian Andrzej Siewior
  2021-12-09 20:06 ` Martin KaFai Lau
  0 siblings, 1 reply; 4+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-06 12:02 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Kuniyuki Iwashima, eric.dumazet, kafai, davem, dsahern, efault,
	netdev, tglx, yoshfuji

Commit
   9652dc2eb9e40 ("tcp: relax listening_hash operations")

removed the need to disable bottom half while acquiring
listening_hash.lock. There are still two callers left which disable
bottom half before the lock is acquired.

On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
as a lock to ensure that resources, that are protected by disabling
bottom halves, remain protected.
This leads to a circular locking dependency if the lock acquired with
disabled bottom halves is also acquired with enabled bottom halves
followed by disabling bottom halves. This is the reverse locking order.
It has been observed with inet_listen_hashbucket::lock:

local_bh_disable() + spin_lock(&ilb->lock):
  inet_listen()
    inet_csk_listen_start()
      sk->sk_prot->hash() := inet_hash()
	local_bh_disable()
	__inet_hash()
	  spin_lock(&ilb->lock);
	    acquire(&ilb->lock);

Reverse order: spin_lock(&ilb->lock) + local_bh_disable():
  tcp_seq_next()
    listening_get_next()
      spin_lock(&ilb->lock);
	acquire(&ilb->lock);

  tcp4_seq_show()
    get_tcp4_sock()
      sock_i_ino()
	read_lock_bh(&sk->sk_callback_lock);
	  acquire(softirq_ctrl)	// <---- whoops
	  acquire(&sk->sk_callback_lock)

Drop local_bh_disable() around __inet_hash() which acquires
listening_hash->lock. Split inet_unhash() and acquire the
listen_hashbucket lock without disabling bottom halves; the inet_ehash
lock with disabled bottom halves.

Reported-by: Mike Galbraith <efault@gmx.de>
Fixes: 9652dc2eb9e40 ("tcp: relax listening_hash operations")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@gmx.de
Link: https://lkml.kernel.org/r/X9CheYjuXWc75Spa@hirez.programming.kicks-ass.net
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
---
On 2021-12-03 18:23:20 [-0800], Jakub Kicinski wrote:
> On Fri, 3 Dec 2021 10:39:34 +0900 Kuniyuki Iwashima wrote:
> > I think this patch is for the net tree and needs a Fixes: tag of the commit
> > mentioned in the description.
> > The patch itself looks good to me.
> > 
> > Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
> 
> Makes sense, please repost for net with the Fixes tag and CC Eric.

v1…v2:

Reposted with fixes and net-tree as requested. Please keep in mind that
this only effects the PREEMPT_RT preemption model and I'm posting this
as part of the merging efforts. Therefore I didn't add the Fixes: tag
and used net-next as I didn't expect any -stable backports (but then
Greg sometimes backports RT-only patches since "it makes the life of
some folks easier" as he puts it).

 net/ipv4/inet_hashtables.c  | 53 ++++++++++++++++++++++---------------
 net/ipv6/inet6_hashtables.c |  5 +---
 2 files changed, 33 insertions(+), 25 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 75737267746f8..7bd1e10086f0a 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -637,7 +637,9 @@ int __inet_hash(struct sock *sk, struct sock *osk)
 	int err = 0;
 
 	if (sk->sk_state != TCP_LISTEN) {
+		local_bh_disable();
 		inet_ehash_nolisten(sk, osk, NULL);
+		local_bh_enable();
 		return 0;
 	}
 	WARN_ON(!sk_unhashed(sk));
@@ -669,45 +671,54 @@ int inet_hash(struct sock *sk)
 {
 	int err = 0;
 
-	if (sk->sk_state != TCP_CLOSE) {
-		local_bh_disable();
+	if (sk->sk_state != TCP_CLOSE)
 		err = __inet_hash(sk, NULL);
-		local_bh_enable();
-	}
 
 	return err;
 }
 EXPORT_SYMBOL_GPL(inet_hash);
 
-void inet_unhash(struct sock *sk)
+static void __inet_unhash(struct sock *sk, struct inet_listen_hashbucket *ilb)
 {
-	struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
-	struct inet_listen_hashbucket *ilb = NULL;
-	spinlock_t *lock;
-
 	if (sk_unhashed(sk))
 		return;
 
-	if (sk->sk_state == TCP_LISTEN) {
-		ilb = &hashinfo->listening_hash[inet_sk_listen_hashfn(sk)];
-		lock = &ilb->lock;
-	} else {
-		lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
-	}
-	spin_lock_bh(lock);
-	if (sk_unhashed(sk))
-		goto unlock;
-
 	if (rcu_access_pointer(sk->sk_reuseport_cb))
 		reuseport_stop_listen_sock(sk);
 	if (ilb) {
+		struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
+
 		inet_unhash2(hashinfo, sk);
 		ilb->count--;
 	}
 	__sk_nulls_del_node_init_rcu(sk);
 	sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
-unlock:
-	spin_unlock_bh(lock);
+}
+
+void inet_unhash(struct sock *sk)
+{
+	struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
+
+	if (sk_unhashed(sk))
+		return;
+
+	if (sk->sk_state == TCP_LISTEN) {
+		struct inet_listen_hashbucket *ilb;
+
+		ilb = &hashinfo->listening_hash[inet_sk_listen_hashfn(sk)];
+		/* Don't disable bottom halves while acquiring the lock to
+		 * avoid circular locking dependency on PREEMPT_RT.
+		 */
+		spin_lock(&ilb->lock);
+		__inet_unhash(sk, ilb);
+		spin_unlock(&ilb->lock);
+	} else {
+		spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
+
+		spin_lock_bh(lock);
+		__inet_unhash(sk, NULL);
+		spin_unlock_bh(lock);
+	}
 }
 EXPORT_SYMBOL_GPL(inet_unhash);
 
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 67c9114835c84..0a2e7f2283911 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -333,11 +333,8 @@ int inet6_hash(struct sock *sk)
 {
 	int err = 0;
 
-	if (sk->sk_state != TCP_CLOSE) {
-		local_bh_disable();
+	if (sk->sk_state != TCP_CLOSE)
 		err = __inet_hash(sk, NULL);
-		local_bh_enable();
-	}
 
 	return err;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net v2] tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH.
  2021-12-06 12:02 [PATCH net v2] tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH Sebastian Andrzej Siewior
@ 2021-12-09 20:06 ` Martin KaFai Lau
  2021-12-10 16:21   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 4+ messages in thread
From: Martin KaFai Lau @ 2021-12-09 20:06 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Jakub Kicinski, Kuniyuki Iwashima, eric.dumazet, davem, dsahern,
	efault, netdev, tglx, yoshfuji

On Mon, Dec 06, 2021 at 01:02:16PM +0100, Sebastian Andrzej Siewior wrote:
> Commit
>    9652dc2eb9e40 ("tcp: relax listening_hash operations")
> 
> removed the need to disable bottom half while acquiring
> listening_hash.lock. There are still two callers left which disable
> bottom half before the lock is acquired.
> 
> On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
> as a lock to ensure that resources, that are protected by disabling
> bottom halves, remain protected.
> This leads to a circular locking dependency if the lock acquired with
> disabled bottom halves is also acquired with enabled bottom halves
> followed by disabling bottom halves. This is the reverse locking order.
> It has been observed with inet_listen_hashbucket::lock:
> 
> local_bh_disable() + spin_lock(&ilb->lock):
>   inet_listen()
>     inet_csk_listen_start()
>       sk->sk_prot->hash() := inet_hash()
> 	local_bh_disable()
> 	__inet_hash()
> 	  spin_lock(&ilb->lock);
> 	    acquire(&ilb->lock);
> 
> Reverse order: spin_lock(&ilb->lock) + local_bh_disable():
>   tcp_seq_next()
>     listening_get_next()
>       spin_lock(&ilb->lock);
The net tree has already been using ilb2 instead of ilb.
It does not change the problem though but updating
the commit log will be useful to avoid future confusion.

iiuc, established_get_next() does not hit this because
it calls spin_lock_bh() which then keeps the
local_bh_disable() => spin_lock() ordering?

The patch lgtm.
Acked-by: Martin KaFai Lau <kafai@fb.com>

> 	acquire(&ilb->lock);
> 
>   tcp4_seq_show()
>     get_tcp4_sock()
>       sock_i_ino()
> 	read_lock_bh(&sk->sk_callback_lock);
> 	  acquire(softirq_ctrl)	// <---- whoops
> 	  acquire(&sk->sk_callback_lock)
> 
> Drop local_bh_disable() around __inet_hash() which acquires
> listening_hash->lock. Split inet_unhash() and acquire the
> listen_hashbucket lock without disabling bottom halves; the inet_ehash
> lock with disabled bottom halves.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net v2] tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH.
  2021-12-09 20:06 ` Martin KaFai Lau
@ 2021-12-10 16:21   ` Sebastian Andrzej Siewior
  2021-12-10 21:22     ` Martin KaFai Lau
  0 siblings, 1 reply; 4+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-12-10 16:21 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Jakub Kicinski, Kuniyuki Iwashima, eric.dumazet, davem, dsahern,
	efault, netdev, tglx, yoshfuji

On 2021-12-09 12:06:32 [-0800], Martin KaFai Lau wrote:
> > local_bh_disable() + spin_lock(&ilb->lock):
> >   inet_listen()
> >     inet_csk_listen_start()
> >       sk->sk_prot->hash() := inet_hash()
> > 	local_bh_disable()
> > 	__inet_hash()
> > 	  spin_lock(&ilb->lock);
> > 	    acquire(&ilb->lock);
> > 
> > Reverse order: spin_lock(&ilb->lock) + local_bh_disable():
> >   tcp_seq_next()
> >     listening_get_next()
> >       spin_lock(&ilb->lock);
> The net tree has already been using ilb2 instead of ilb.
> It does not change the problem though but updating
> the commit log will be useful to avoid future confusion.

You think so? But having ilb2 and ilb might suggest that these two are
different locks while they are the same. I could repost it early next
week if you this actually confuses more…

> iiuc, established_get_next() does not hit this because
> it calls spin_lock_bh() which then keeps the
> local_bh_disable() => spin_lock() ordering?

Yes. BH was disabled before the spin_lock was acquired so it is the same
ordering. The important part is not to disable BH after the spin_lock
was acquired and in the reverse order.

> The patch lgtm.
> Acked-by: Martin KaFai Lau <kafai@fb.com>

Sebastian

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net v2] tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH.
  2021-12-10 16:21   ` Sebastian Andrzej Siewior
@ 2021-12-10 21:22     ` Martin KaFai Lau
  0 siblings, 0 replies; 4+ messages in thread
From: Martin KaFai Lau @ 2021-12-10 21:22 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Jakub Kicinski, Kuniyuki Iwashima, eric.dumazet, davem, dsahern,
	efault, netdev, tglx, yoshfuji

On Fri, Dec 10, 2021 at 05:21:13PM +0100, Sebastian Andrzej Siewior wrote:
> On 2021-12-09 12:06:32 [-0800], Martin KaFai Lau wrote:
> > > local_bh_disable() + spin_lock(&ilb->lock):
> > >   inet_listen()
> > >     inet_csk_listen_start()
> > >       sk->sk_prot->hash() := inet_hash()
> > > 	local_bh_disable()
> > > 	__inet_hash()
> > > 	  spin_lock(&ilb->lock);
> > > 	    acquire(&ilb->lock);
> > > 
> > > Reverse order: spin_lock(&ilb->lock) + local_bh_disable():
> > >   tcp_seq_next()
> > >     listening_get_next()
> > >       spin_lock(&ilb->lock);
> > The net tree has already been using ilb2 instead of ilb.
> > It does not change the problem though but updating
> > the commit log will be useful to avoid future confusion.
> 
> You think so? But having ilb2 and ilb might suggest that these two are
> different locks while they are the same. I could repost it early next
> week if you this actually confuses more…
Yes, they are different locks.  ilb2->lock is also taken
in the inet_listen() path.  ilb->lock is not even taken
in the listening_get_next() side.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-12-10 21:23 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-06 12:02 [PATCH net v2] tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH Sebastian Andrzej Siewior
2021-12-09 20:06 ` Martin KaFai Lau
2021-12-10 16:21   ` Sebastian Andrzej Siewior
2021-12-10 21:22     ` Martin KaFai Lau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.