netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
@ 2022-11-19 13:03 Jakub Sitnicki
  2022-11-19 13:52 ` Tetsuo Handa
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Sitnicki @ 2022-11-19 13:03 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Tom Parkin, Tetsuo Handa, syzbot+703d9e154b3b58277261,
	syzbot+50680ced9e98a61f7698, syzbot+de987172bb74a381879b

When holding a reader-writer spin lock we cannot sleep. Calling
setup_udp_tunnel_sock() with write lock held violates this rule, because we
end up calling percpu_down_read(), which might sleep, as syzbot reports
[1]:

 __might_resched.cold+0x222/0x26b kernel/sched/core.c:9890
 percpu_down_read include/linux/percpu-rwsem.h:49 [inline]
 cpus_read_lock+0x1b/0x140 kernel/cpu.c:310
 static_key_slow_inc+0x12/0x20 kernel/jump_label.c:158
 udp_tunnel_encap_enable include/net/udp_tunnel.h:187 [inline]
 setup_udp_tunnel_sock+0x43d/0x550 net/ipv4/udp_tunnel_core.c:81
 l2tp_tunnel_register+0xc51/0x1210 net/l2tp/l2tp_core.c:1509
 pppol2tp_connect+0xcdc/0x1a10 net/l2tp/l2tp_ppp.c:723

Trim the writer-side critical section for sk_callback_lock down to the
minimum, so that it covers only operations on sk_user_data.

Also, when grabbing the sk_callback_lock, we always need to disable BH, as
Eric points out. Failing to do so leads to deadlocks because we acquire
sk_callback_lock in softirq context, which can get stuck waiting on us if:

1) it runs on the same CPU, or

       CPU0
       ----
  lock(clock-AF_INET6);
  <Interrupt>
    lock(clock-AF_INET6);

2) lock ordering leads to priority inversion

       CPU0                    CPU1
       ----                    ----
  lock(clock-AF_INET6);
                               local_irq_disable();
                               lock(&tcp_hashinfo.bhash[i].lock);
                               lock(clock-AF_INET6);
  <Interrupt>
    lock(&tcp_hashinfo.bhash[i].lock);

... as syzbot reports [2,3]. Use the _bh variants for write_(un)lock.

[1] https://lore.kernel.org/netdev/0000000000004e78ec05eda79749@google.com/
[2] https://lore.kernel.org/netdev/000000000000e38b6605eda76f98@google.com/
[3] https://lore.kernel.org/netdev/000000000000dfa31e05eda76f75@google.com/

Cc: Tom Parkin <tparkin@katalix.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Fixes: b68777d54fac ("l2tp: Serialize access to sk_user_data with sk_callback_lock")
Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+703d9e154b3b58277261@syzkaller.appspotmail.com
Reported-by: syzbot+50680ced9e98a61f7698@syzkaller.appspotmail.com
Reported-by: syzbot+de987172bb74a381879b@syzkaller.appspotmail.com
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/l2tp/l2tp_core.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 754fdda8a5f5..100d17908196 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1474,11 +1474,20 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
 	}
 
 	sk = sock->sk;
-	write_lock(&sk->sk_callback_lock);
-
+	write_lock_bh(&sk->sk_callback_lock);
 	ret = l2tp_validate_socket(sk, net, tunnel->encap);
 	if (ret < 0)
-		goto err_sock;
+		goto err_inval_sock;
+
+	switch (tunnel->encap) {
+	case L2TP_ENCAPTYPE_IP:
+		rcu_assign_sk_user_data(sk, tunnel);
+		break;
+	case L2TP_ENCAPTYPE_UDP:
+		/* nothing to do */
+		break;
+	}
+	write_unlock_bh(&sk->sk_callback_lock);
 
 	tunnel->l2tp_net = net;
 	pn = l2tp_pernet(net);
@@ -1507,8 +1516,6 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
 		};
 
 		setup_udp_tunnel_sock(net, sock, &udp_cfg);
-	} else {
-		rcu_assign_sk_user_data(sk, tunnel);
 	}
 
 	tunnel->old_sk_destruct = sk->sk_destruct;
@@ -1522,16 +1529,18 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
 	if (tunnel->fd >= 0)
 		sockfd_put(sock);
 
-	write_unlock(&sk->sk_callback_lock);
 	return 0;
 
 err_sock:
+	write_lock_bh(&sk->sk_callback_lock);
+	rcu_assign_sk_user_data(sk, NULL);
+err_inval_sock:
+	write_unlock_bh(&sk->sk_callback_lock);
+
 	if (tunnel->fd < 0)
 		sock_release(sock);
 	else
 		sockfd_put(sock);
-
-	write_unlock(&sk->sk_callback_lock);
 err:
 	return ret;
 }
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
  2022-11-19 13:03 [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock Jakub Sitnicki
@ 2022-11-19 13:52 ` Tetsuo Handa
  2022-11-19 14:27   ` Tetsuo Handa
  2022-11-21  9:00   ` Jakub Sitnicki
  0 siblings, 2 replies; 15+ messages in thread
From: Tetsuo Handa @ 2022-11-19 13:52 UTC (permalink / raw)
  To: Jakub Sitnicki, netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Tom Parkin, syzbot+703d9e154b3b58277261,
	syzbot+50680ced9e98a61f7698, syzbot+de987172bb74a381879b

On 2022/11/19 22:03, Jakub Sitnicki wrote:
> When holding a reader-writer spin lock we cannot sleep. Calling
> setup_udp_tunnel_sock() with write lock held violates this rule, because we
> end up calling percpu_down_read(), which might sleep, as syzbot reports
> [1]:
> 
>  __might_resched.cold+0x222/0x26b kernel/sched/core.c:9890
>  percpu_down_read include/linux/percpu-rwsem.h:49 [inline]
>  cpus_read_lock+0x1b/0x140 kernel/cpu.c:310
>  static_key_slow_inc+0x12/0x20 kernel/jump_label.c:158
>  udp_tunnel_encap_enable include/net/udp_tunnel.h:187 [inline]
>  setup_udp_tunnel_sock+0x43d/0x550 net/ipv4/udp_tunnel_core.c:81
>  l2tp_tunnel_register+0xc51/0x1210 net/l2tp/l2tp_core.c:1509
>  pppol2tp_connect+0xcdc/0x1a10 net/l2tp/l2tp_ppp.c:723
> 
> Trim the writer-side critical section for sk_callback_lock down to the
> minimum, so that it covers only operations on sk_user_data.

This patch does not look correct.

Since l2tp_validate_socket() checks that sk->sk_user_data == NULL with
sk->sk_callback_lock held, you need to call rcu_assign_sk_user_data(sk, tunnel)
before releasing sk->sk_callback_lock.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
  2022-11-19 13:52 ` Tetsuo Handa
@ 2022-11-19 14:27   ` Tetsuo Handa
  2022-11-21  9:00     ` Jakub Sitnicki
  2022-11-21  9:00   ` Jakub Sitnicki
  1 sibling, 1 reply; 15+ messages in thread
From: Tetsuo Handa @ 2022-11-19 14:27 UTC (permalink / raw)
  To: Jakub Sitnicki, netdev
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Tom Parkin, syzbot+703d9e154b3b58277261,
	syzbot+50680ced9e98a61f7698, syzbot+de987172bb74a381879b

On 2022/11/19 22:52, Tetsuo Handa wrote:
> On 2022/11/19 22:03, Jakub Sitnicki wrote:
>> When holding a reader-writer spin lock we cannot sleep. Calling
>> setup_udp_tunnel_sock() with write lock held violates this rule, because we
>> end up calling percpu_down_read(), which might sleep, as syzbot reports
>> [1]:
>>
>>  __might_resched.cold+0x222/0x26b kernel/sched/core.c:9890
>>  percpu_down_read include/linux/percpu-rwsem.h:49 [inline]
>>  cpus_read_lock+0x1b/0x140 kernel/cpu.c:310
>>  static_key_slow_inc+0x12/0x20 kernel/jump_label.c:158
>>  udp_tunnel_encap_enable include/net/udp_tunnel.h:187 [inline]
>>  setup_udp_tunnel_sock+0x43d/0x550 net/ipv4/udp_tunnel_core.c:81
>>  l2tp_tunnel_register+0xc51/0x1210 net/l2tp/l2tp_core.c:1509
>>  pppol2tp_connect+0xcdc/0x1a10 net/l2tp/l2tp_ppp.c:723
>>
>> Trim the writer-side critical section for sk_callback_lock down to the
>> minimum, so that it covers only operations on sk_user_data.
> 
> This patch does not look correct.
> 
> Since l2tp_validate_socket() checks that sk->sk_user_data == NULL with
> sk->sk_callback_lock held, you need to call rcu_assign_sk_user_data(sk, tunnel)
> before releasing sk->sk_callback_lock.
> 

Is it safe to temporarily set a dummy pointer like below?
If it is not safe, what makes assignments done by setup_udp_tunnel_sock() safe?

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 754fdda8a5f5..198d38d8fceb 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1474,11 +1474,12 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
 	}
 
 	sk = sock->sk;
-	write_lock(&sk->sk_callback_lock);
-
+	write_lock_bh(&sk->sk_callback_lock);
 	ret = l2tp_validate_socket(sk, net, tunnel->encap);
 	if (ret < 0)
 		goto err_sock;
+	rcu_assign_sk_user_data(sk, (void *) 1);
+	write_unlock_bh(&sk->sk_callback_lock);
 
 	tunnel->l2tp_net = net;
 	pn = l2tp_pernet(net);
@@ -1492,6 +1493,8 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
 			spin_unlock_bh(&pn->l2tp_tunnel_list_lock);
 			sock_put(sk);
 			ret = -EEXIST;
+			write_lock_bh(&sk->sk_callback_lock);
+			rcu_assign_sk_user_data(sk, NULL);
 			goto err_sock;
 		}
 	}
@@ -1522,16 +1525,15 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
 	if (tunnel->fd >= 0)
 		sockfd_put(sock);
 
-	write_unlock(&sk->sk_callback_lock);
 	return 0;
 
 err_sock:
+	write_unlock_bh(&sk->sk_callback_lock);
+
 	if (tunnel->fd < 0)
 		sock_release(sock);
 	else
 		sockfd_put(sock);
-
-	write_unlock(&sk->sk_callback_lock);
 err:
 	return ret;
 }


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
  2022-11-19 13:52 ` Tetsuo Handa
  2022-11-19 14:27   ` Tetsuo Handa
@ 2022-11-21  9:00   ` Jakub Sitnicki
  1 sibling, 0 replies; 15+ messages in thread
From: Jakub Sitnicki @ 2022-11-21  9:00 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Tom Parkin, syzbot+703d9e154b3b58277261,
	syzbot+50680ced9e98a61f7698, syzbot+de987172bb74a381879b

On Sat, Nov 19, 2022 at 10:52 PM +09, Tetsuo Handa wrote:
> On 2022/11/19 22:03, Jakub Sitnicki wrote:
>> When holding a reader-writer spin lock we cannot sleep. Calling
>> setup_udp_tunnel_sock() with write lock held violates this rule, because we
>> end up calling percpu_down_read(), which might sleep, as syzbot reports
>> [1]:
>> 
>>  __might_resched.cold+0x222/0x26b kernel/sched/core.c:9890
>>  percpu_down_read include/linux/percpu-rwsem.h:49 [inline]
>>  cpus_read_lock+0x1b/0x140 kernel/cpu.c:310
>>  static_key_slow_inc+0x12/0x20 kernel/jump_label.c:158
>>  udp_tunnel_encap_enable include/net/udp_tunnel.h:187 [inline]
>>  setup_udp_tunnel_sock+0x43d/0x550 net/ipv4/udp_tunnel_core.c:81
>>  l2tp_tunnel_register+0xc51/0x1210 net/l2tp/l2tp_core.c:1509
>>  pppol2tp_connect+0xcdc/0x1a10 net/l2tp/l2tp_ppp.c:723
>> 
>> Trim the writer-side critical section for sk_callback_lock down to the
>> minimum, so that it covers only operations on sk_user_data.
>
> This patch does not look correct.
>
> Since l2tp_validate_socket() checks that sk->sk_user_data == NULL with
> sk->sk_callback_lock held, you need to call rcu_assign_sk_user_data(sk, tunnel)
> before releasing sk->sk_callback_lock.

You're right. v2 posted:

https://lore.kernel.org/netdev/20221121085426.21315-1-jakub@cloudflare.com/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
  2022-11-19 14:27   ` Tetsuo Handa
@ 2022-11-21  9:00     ` Jakub Sitnicki
  2022-11-21 10:03       ` Tetsuo Handa
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Sitnicki @ 2022-11-21  9:00 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Tom Parkin, syzbot+703d9e154b3b58277261,
	syzbot+50680ced9e98a61f7698, syzbot+de987172bb74a381879b

On Sat, Nov 19, 2022 at 11:27 PM +09, Tetsuo Handa wrote:
> On 2022/11/19 22:52, Tetsuo Handa wrote:
>> On 2022/11/19 22:03, Jakub Sitnicki wrote:
>>> When holding a reader-writer spin lock we cannot sleep. Calling
>>> setup_udp_tunnel_sock() with write lock held violates this rule, because we
>>> end up calling percpu_down_read(), which might sleep, as syzbot reports
>>> [1]:
>>>
>>>  __might_resched.cold+0x222/0x26b kernel/sched/core.c:9890
>>>  percpu_down_read include/linux/percpu-rwsem.h:49 [inline]
>>>  cpus_read_lock+0x1b/0x140 kernel/cpu.c:310
>>>  static_key_slow_inc+0x12/0x20 kernel/jump_label.c:158
>>>  udp_tunnel_encap_enable include/net/udp_tunnel.h:187 [inline]
>>>  setup_udp_tunnel_sock+0x43d/0x550 net/ipv4/udp_tunnel_core.c:81
>>>  l2tp_tunnel_register+0xc51/0x1210 net/l2tp/l2tp_core.c:1509
>>>  pppol2tp_connect+0xcdc/0x1a10 net/l2tp/l2tp_ppp.c:723
>>>
>>> Trim the writer-side critical section for sk_callback_lock down to the
>>> minimum, so that it covers only operations on sk_user_data.
>> 
>> This patch does not look correct.
>> 
>> Since l2tp_validate_socket() checks that sk->sk_user_data == NULL with
>> sk->sk_callback_lock held, you need to call rcu_assign_sk_user_data(sk, tunnel)
>> before releasing sk->sk_callback_lock.
>> 
>
> Is it safe to temporarily set a dummy pointer like below?
> If it is not safe, what makes assignments done by
> setup_udp_tunnel_sock() safe?

Yes, I think so. Great idea. I've used it in v2.

We can check & assign sk_user_data under sk_callback_lock, and then just
let setup_udp_tunnel_sock overwrite it with the same value, without
holding the lock.

I still think that it's best to keep the critical section as short as
possible, though.

[...]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
  2022-11-21  9:00     ` Jakub Sitnicki
@ 2022-11-21 10:03       ` Tetsuo Handa
  2022-11-21 21:55         ` Jakub Sitnicki
  0 siblings, 1 reply; 15+ messages in thread
From: Tetsuo Handa @ 2022-11-21 10:03 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Tom Parkin, syzbot+703d9e154b3b58277261,
	syzbot+50680ced9e98a61f7698, syzbot+de987172bb74a381879b

On 2022/11/21 18:00, Jakub Sitnicki wrote:
>> Is it safe to temporarily set a dummy pointer like below?
>> If it is not safe, what makes assignments done by
>> setup_udp_tunnel_sock() safe?
> 
> Yes, I think so. Great idea. I've used it in v2.

So, you are sure that e.g.

	udp_sk(sk)->gro_receive = cfg->gro_receive;

in setup_udp_tunnel_sock() (where the caller is passing cfg->gro_receive == NULL)
never races with e.g. below check (because the socket might be sending/receiving
in progress since the socket is retrieved via user-specified file descriptor) ?

Then, v2 patch would be OK for fixing this regression. (But I think we still should
avoid retrieving a socket from user-specified file descriptor in order to avoid
lockdep race window.)


struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
                                struct udphdr *uh, struct sock *sk)
{
	(...snipped...)
        if (!sk || !udp_sk(sk)->gro_receive) {
		(...snipped...)
                /* no GRO, be sure flush the current packet */
                goto out;
        }
	(...snipped...)
        pp = call_gro_receive_sk(udp_sk(sk)->gro_receive, sk, head, skb);
out:
        skb_gro_flush_final(skb, pp, flush);
        return pp;
}

> 
> We can check & assign sk_user_data under sk_callback_lock, and then just
> let setup_udp_tunnel_sock overwrite it with the same value, without
> holding the lock.

Given that sk_user_data is RCU-protected on reader-side, don't we need to
wait for RCU grace period after resetting to NULL?

> 
> I still think that it's best to keep the critical section as short as
> possible, though.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
  2022-11-21 10:03       ` Tetsuo Handa
@ 2022-11-21 21:55         ` Jakub Sitnicki
  2022-11-22  9:48           ` Tetsuo Handa
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Sitnicki @ 2022-11-21 21:55 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Tom Parkin, syzbot+703d9e154b3b58277261,
	syzbot+50680ced9e98a61f7698, syzbot+de987172bb74a381879b

On Mon, Nov 21, 2022 at 07:03 PM +09, Tetsuo Handa wrote:
> On 2022/11/21 18:00, Jakub Sitnicki wrote:
>>> Is it safe to temporarily set a dummy pointer like below?
>>> If it is not safe, what makes assignments done by
>>> setup_udp_tunnel_sock() safe?
>> 
>> Yes, I think so. Great idea. I've used it in v2.
>
> So, you are sure that e.g.
>
> 	udp_sk(sk)->gro_receive = cfg->gro_receive;
>
> in setup_udp_tunnel_sock() (where the caller is passing cfg->gro_receive == NULL)
> never races with e.g. below check (because the socket might be sending/receiving
> in progress since the socket is retrieved via user-specified file descriptor) ?
>
> Then, v2 patch would be OK for fixing this regression. (But I think we still should
> avoid retrieving a socket from user-specified file descriptor in order to avoid
> lockdep race window.)
>
>
> struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
>                                 struct udphdr *uh, struct sock *sk)
> {
> 	(...snipped...)
>         if (!sk || !udp_sk(sk)->gro_receive) {
> 		(...snipped...)
>                 /* no GRO, be sure flush the current packet */
>                 goto out;
>         }
> 	(...snipped...)
>         pp = call_gro_receive_sk(udp_sk(sk)->gro_receive, sk, head, skb);
> out:
>         skb_gro_flush_final(skb, pp, flush);
>         return pp;
> }
>

First, let me say, that I get the impression that setup_udp_tunnel_sock
was not really meant to be used on pre-existing sockets created by
user-space. Even though l2tp and gtp seem to be doing that.

That is because, I don't see how it could be used properly. Given that
we need to check-and-set sk_user_data under sk_callback_lock, which
setup_udp_tunnel_sock doesn't grab itself. At the same time it might
sleep. There is no way to apply it without resorting to tricks, like we
did here.

So - yeah - there may be other problems. But if there are, they are not
related to the faulty commit b68777d54fac ("l2tp: Serialize access to
sk_user_data with sk_callback_lock"), which we're trying to fix. There
was no locking present in l2tp_tunnel_register before that point.

>> 
>> We can check & assign sk_user_data under sk_callback_lock, and then just
>> let setup_udp_tunnel_sock overwrite it with the same value, without
>> holding the lock.
>
> Given that sk_user_data is RCU-protected on reader-side, don't we need to
> wait for RCU grace period after resetting to NULL?

Who would be the reader?

If you have l2tp_udp_encap_recv in mind - the encap_rcv callback has not
been set yet, if we are on the error handling path in
l2tp_tunnel_register.

In general, though, yes - I think you are right. We must synchronize_rcu
before we kfree the tunnel.

However, that is also not related to the race to check-and-set
sk_user_data, which commit b68777d54fac is trying to fix.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
  2022-11-21 21:55         ` Jakub Sitnicki
@ 2022-11-22  9:48           ` Tetsuo Handa
  2022-11-22 10:46             ` Jakub Sitnicki
  0 siblings, 1 reply; 15+ messages in thread
From: Tetsuo Handa @ 2022-11-22  9:48 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Tom Parkin, syzbot+703d9e154b3b58277261,
	syzbot+50680ced9e98a61f7698, syzbot+de987172bb74a381879b

On 2022/11/22 6:55, Jakub Sitnicki wrote:
> First, let me say, that I get the impression that setup_udp_tunnel_sock
> was not really meant to be used on pre-existing sockets created by
> user-space. Even though l2tp and gtp seem to be doing that.
> 
> That is because, I don't see how it could be used properly. Given that
> we need to check-and-set sk_user_data under sk_callback_lock, which
> setup_udp_tunnel_sock doesn't grab itself. At the same time it might
> sleep. There is no way to apply it without resorting to tricks, like we
> did here.
> 
> So - yeah - there may be other problems. But if there are, they are not
> related to the faulty commit b68777d54fac ("l2tp: Serialize access to
> sk_user_data with sk_callback_lock"), which we're trying to fix. There
> was no locking present in l2tp_tunnel_register before that point.

https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360 is the one
where changing lockdep class is concurrently done on pre-existing sockets.

I think we need to always create a new socket inside l2tp_tunnel_register(),
rather than trying to serialize setting of sk_user_data under sk_callback_lock.

> However, that is also not related to the race to check-and-set
> sk_user_data, which commit b68777d54fac is trying to fix.

Therefore, I feel that reverting commit b68777d54fac "l2tp: Serialize access
to sk_user_data with sk_callback_lock" might be the better choice.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
  2022-11-22  9:48           ` Tetsuo Handa
@ 2022-11-22 10:46             ` Jakub Sitnicki
  2022-11-22 11:14               ` Tetsuo Handa
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Sitnicki @ 2022-11-22 10:46 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Tom Parkin, syzbot+703d9e154b3b58277261,
	syzbot+50680ced9e98a61f7698, syzbot+de987172bb74a381879b

On Tue, Nov 22, 2022 at 06:48 PM +09, Tetsuo Handa wrote:
> On 2022/11/22 6:55, Jakub Sitnicki wrote:
>> First, let me say, that I get the impression that setup_udp_tunnel_sock
>> was not really meant to be used on pre-existing sockets created by
>> user-space. Even though l2tp and gtp seem to be doing that.
>> 
>> That is because, I don't see how it could be used properly. Given that
>> we need to check-and-set sk_user_data under sk_callback_lock, which
>> setup_udp_tunnel_sock doesn't grab itself. At the same time it might
>> sleep. There is no way to apply it without resorting to tricks, like we
>> did here.
>> 
>> So - yeah - there may be other problems. But if there are, they are not
>> related to the faulty commit b68777d54fac ("l2tp: Serialize access to
>> sk_user_data with sk_callback_lock"), which we're trying to fix. There
>> was no locking present in l2tp_tunnel_register before that point.
>
> https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360 is the one
> where changing lockdep class is concurrently done on pre-existing sockets.
>
> I think we need to always create a new socket inside l2tp_tunnel_register(),
> rather than trying to serialize setting of sk_user_data under sk_callback_lock.

While that would be easier to handle, I don't see how it can be done in
a backward-compatible way. User-space is allowed to pass a socket to
l2tp today [1].

>
>> However, that is also not related to the race to check-and-set
>> sk_user_data, which commit b68777d54fac is trying to fix.
>
> Therefore, I feel that reverting commit b68777d54fac "l2tp: Serialize access
> to sk_user_data with sk_callback_lock" might be the better choice.

I'm okay with that. Providing we can come up with have an alternative
fix to the race between l2tp and other sk_user_data users.

[1] https://elixir.bootlin.com/linux/v6.1-rc6/source/net/l2tp/l2tp_netlink.c#L220

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
  2022-11-22 10:46             ` Jakub Sitnicki
@ 2022-11-22 11:14               ` Tetsuo Handa
  2022-11-22 14:10                 ` Guillaume Nault
  0 siblings, 1 reply; 15+ messages in thread
From: Tetsuo Handa @ 2022-11-22 11:14 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Tom Parkin, syzbot+703d9e154b3b58277261,
	syzbot+50680ced9e98a61f7698, syzbot+de987172bb74a381879b

On 2022/11/22 19:46, Jakub Sitnicki wrote:
>> https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360 is the one
>> where changing lockdep class is concurrently done on pre-existing sockets.
>>
>> I think we need to always create a new socket inside l2tp_tunnel_register(),
>> rather than trying to serialize setting of sk_user_data under sk_callback_lock.
> 
> While that would be easier to handle, I don't see how it can be done in
> a backward-compatible way. User-space is allowed to pass a socket to
> l2tp today [1].

What is the expected usage of the socket which was passed to l2tp_tunnel_register() ?
Is the userspace supposed to just close() that socket? Or, is the userspace allowed to
continue using the socket?

If the userspace might continue using the socket, we would

  create a new socket, copy required attributes (the source and destination addresses?) from
  the socket fetched via sockfd_lookup(), and call replace_fd() like e.g. umh_pipe_setup() does

inside l2tp_tunnel_register(). i-node number of the socket would change, but I assume that
the process which called l2tp_tunnel_register() is not using that i-node number.

Since the socket is a datagram socket, I think we can copy required attributes. But since
I'm not familiar with networking code, I don't know what attributes need to be copied. Thus,
I leave implementing it to netdev people.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
  2022-11-22 11:14               ` Tetsuo Handa
@ 2022-11-22 14:10                 ` Guillaume Nault
  2022-11-22 14:28                   ` Tetsuo Handa
  0 siblings, 1 reply; 15+ messages in thread
From: Guillaume Nault @ 2022-11-22 14:10 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jakub Sitnicki, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Tom Parkin,
	syzbot+703d9e154b3b58277261, syzbot+50680ced9e98a61f7698,
	syzbot+de987172bb74a381879b

On Tue, Nov 22, 2022 at 08:14:33PM +0900, Tetsuo Handa wrote:
> On 2022/11/22 19:46, Jakub Sitnicki wrote:
> >> https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360 is the one
> >> where changing lockdep class is concurrently done on pre-existing sockets.
> >>
> >> I think we need to always create a new socket inside l2tp_tunnel_register(),
> >> rather than trying to serialize setting of sk_user_data under sk_callback_lock.
> > 
> > While that would be easier to handle, I don't see how it can be done in
> > a backward-compatible way. User-space is allowed to pass a socket to
> > l2tp today [1].
> 
> What is the expected usage of the socket which was passed to l2tp_tunnel_register() ?

It receives L2TP packets. Those can be either control or data ones.
Data packets are processed by the kernel. Control packets are queued to
user space.

> Is the userspace supposed to just close() that socket? Or, is the userspace allowed to
> continue using the socket?

User space uses this socket to send and receive L2TP control packets
(tunnel and session configuration, keep alive and tear down). Therefore
it absolutely needs to continue using this socket after the
registration phase.

> If the userspace might continue using the socket, we would
> 
>   create a new socket, copy required attributes (the source and destination addresses?) from
>   the socket fetched via sockfd_lookup(), and call replace_fd() like e.g. umh_pipe_setup() does
> 
> inside l2tp_tunnel_register(). i-node number of the socket would change, but I assume that
> the process which called l2tp_tunnel_register() is not using that i-node number.
> 
> Since the socket is a datagram socket, I think we can copy required attributes. But since
> I'm not familiar with networking code, I don't know what attributes need to be copied. Thus,
> I leave implementing it to netdev people.

That looks fragile to me. If the problem is that setup_udp_tunnel_sock()
can sleep, we can just drop the udp_tunnel_encap_enable() call from
setup_udp_tunnel_sock(), rename it __udp_tunnel_encap_enable() and make
make udp_tunnel_encap_enable() a wrapper around it that'd also call
udp_tunnel_encap_enable().


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
  2022-11-22 14:10                 ` Guillaume Nault
@ 2022-11-22 14:28                   ` Tetsuo Handa
  2022-11-23 15:24                     ` Guillaume Nault
  0 siblings, 1 reply; 15+ messages in thread
From: Tetsuo Handa @ 2022-11-22 14:28 UTC (permalink / raw)
  To: Guillaume Nault
  Cc: Jakub Sitnicki, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Tom Parkin,
	syzbot+703d9e154b3b58277261, syzbot+50680ced9e98a61f7698,
	syzbot+de987172bb74a381879b

On 2022/11/22 23:10, Guillaume Nault wrote:
> User space uses this socket to send and receive L2TP control packets
> (tunnel and session configuration, keep alive and tear down). Therefore
> it absolutely needs to continue using this socket after the
> registration phase.

Thank you for explanation.

>> If the userspace might continue using the socket, we would
>>
>>   create a new socket, copy required attributes (the source and destination addresses?) from
>>   the socket fetched via sockfd_lookup(), and call replace_fd() like e.g. umh_pipe_setup() does
>>
>> inside l2tp_tunnel_register(). i-node number of the socket would change, but I assume that
>> the process which called l2tp_tunnel_register() is not using that i-node number.
>>
>> Since the socket is a datagram socket, I think we can copy required attributes. But since
>> I'm not familiar with networking code, I don't know what attributes need to be copied. Thus,
>> I leave implementing it to netdev people.
> 
> That looks fragile to me. If the problem is that setup_udp_tunnel_sock()
> can sleep, we can just drop the udp_tunnel_encap_enable() call from
> setup_udp_tunnel_sock(), rename it __udp_tunnel_encap_enable() and make
> make udp_tunnel_encap_enable() a wrapper around it that'd also call
> udp_tunnel_encap_enable().
> 

That's what I thought at https://lkml.kernel.org/r/c64284f4-2c2a-ecb9-a08e-9e49d49c720b@I-love.SAKURA.ne.jp .

But the problem is not that setup_udp_tunnel_sock() can sleep. The problem is that lockdep
gets confused due to changing lockdep class after the socket is already published. We need
to avoid calling lockdep_set_class_and_name() on a socket retrieved via sockfd_lookup().


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
  2022-11-22 14:28                   ` Tetsuo Handa
@ 2022-11-23 15:24                     ` Guillaume Nault
  2022-11-24 10:07                       ` Tom Parkin
  0 siblings, 1 reply; 15+ messages in thread
From: Guillaume Nault @ 2022-11-23 15:24 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jakub Sitnicki, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Tom Parkin,
	syzbot+703d9e154b3b58277261, syzbot+50680ced9e98a61f7698,
	syzbot+de987172bb74a381879b

On Tue, Nov 22, 2022 at 11:28:45PM +0900, Tetsuo Handa wrote:
> On 2022/11/22 23:10, Guillaume Nault wrote:
> > User space uses this socket to send and receive L2TP control packets
> > (tunnel and session configuration, keep alive and tear down). Therefore
> > it absolutely needs to continue using this socket after the
> > registration phase.
> 
> Thank you for explanation.
> 
> >> If the userspace might continue using the socket, we would
> >>
> >>   create a new socket, copy required attributes (the source and destination addresses?) from
> >>   the socket fetched via sockfd_lookup(), and call replace_fd() like e.g. umh_pipe_setup() does
> >>
> >> inside l2tp_tunnel_register(). i-node number of the socket would change, but I assume that
> >> the process which called l2tp_tunnel_register() is not using that i-node number.
> >>
> >> Since the socket is a datagram socket, I think we can copy required attributes. But since
> >> I'm not familiar with networking code, I don't know what attributes need to be copied. Thus,
> >> I leave implementing it to netdev people.
> > 
> > That looks fragile to me. If the problem is that setup_udp_tunnel_sock()
> > can sleep, we can just drop the udp_tunnel_encap_enable() call from
> > setup_udp_tunnel_sock(), rename it __udp_tunnel_encap_enable() and make
> > make udp_tunnel_encap_enable() a wrapper around it that'd also call
> > udp_tunnel_encap_enable().
> > 
> 
> That's what I thought at https://lkml.kernel.org/r/c64284f4-2c2a-ecb9-a08e-9e49d49c720b@I-love.SAKURA.ne.jp .
> 
> But the problem is not that setup_udp_tunnel_sock() can sleep. The problem is that lockdep
> gets confused due to changing lockdep class after the socket is already published. We need
> to avoid calling lockdep_set_class_and_name() on a socket retrieved via sockfd_lookup().

This is a second problem. The problem of setting sk_user_data under
sk_callback_lock write protection (while still calling
udp_tunnel_encap_enable() from sleepable context) still remains.

For lockdep_set_class_and_name(), maybe we could store the necessary
socket information (addresses, ports and checksum configuration) in the
l2tp_tunnel structure, thus avoiding the need to read them from the
socket. This way, we could stop locking the user space socket in
l2tp_xmit_core() and drop the lockdep_set_class_and_name() call.
I think either you or Jakub proposed something like this in another
thread.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
  2022-11-23 15:24                     ` Guillaume Nault
@ 2022-11-24 10:07                       ` Tom Parkin
  2022-11-24 10:27                         ` Guillaume Nault
  0 siblings, 1 reply; 15+ messages in thread
From: Tom Parkin @ 2022-11-24 10:07 UTC (permalink / raw)
  To: Guillaume Nault
  Cc: Tetsuo Handa, Jakub Sitnicki, netdev, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	syzbot+703d9e154b3b58277261, syzbot+50680ced9e98a61f7698,
	syzbot+de987172bb74a381879b

[-- Attachment #1: Type: text/plain, Size: 1726 bytes --]

On  Wed, Nov 23, 2022 at 16:24:00 +0100, Guillaume Nault wrote:
> On Tue, Nov 22, 2022 at 11:28:45PM +0900, Tetsuo Handa wrote:
> > That's what I thought at https://lkml.kernel.org/r/c64284f4-2c2a-ecb9-a08e-9e49d49c720b@I-love.SAKURA.ne.jp .
> > 
> > But the problem is not that setup_udp_tunnel_sock() can sleep. The problem is that lockdep
> > gets confused due to changing lockdep class after the socket is already published. We need
> > to avoid calling lockdep_set_class_and_name() on a socket retrieved via sockfd_lookup().
> 
> This is a second problem. The problem of setting sk_user_data under
> sk_callback_lock write protection (while still calling
> udp_tunnel_encap_enable() from sleepable context) still remains.
> 
> For lockdep_set_class_and_name(), maybe we could store the necessary
> socket information (addresses, ports and checksum configuration) in the
> l2tp_tunnel structure, thus avoiding the need to read them from the
> socket. This way, we could stop locking the user space socket in
> l2tp_xmit_core() and drop the lockdep_set_class_and_name() call.
> I think either you or Jakub proposed something like this in another
> thread.

I note that l2tp_xmit_core calls ip_queue_xmit which expects a socket
atomic context*.

It also accesses struct inet_sock corking data which may also need locks
to safely access.

Possibly we could somehow work around that, but on the face of it we'd
need to do a bit more work to avoid the socket lock in the tx path.

* davem fixed locking in the l2tp xmit path in:

6af88da14ee2 ("l2tp: Fix locking in l2tp_core.c")
-- 
Tom Parkin
Katalix Systems Ltd
https://katalix.com
Catalysts for your Embedded Linux software development

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock
  2022-11-24 10:07                       ` Tom Parkin
@ 2022-11-24 10:27                         ` Guillaume Nault
  0 siblings, 0 replies; 15+ messages in thread
From: Guillaume Nault @ 2022-11-24 10:27 UTC (permalink / raw)
  To: Tom Parkin
  Cc: Tetsuo Handa, Jakub Sitnicki, netdev, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	syzbot+703d9e154b3b58277261, syzbot+50680ced9e98a61f7698,
	syzbot+de987172bb74a381879b

On Thu, Nov 24, 2022 at 10:07:51AM +0000, Tom Parkin wrote:
> On  Wed, Nov 23, 2022 at 16:24:00 +0100, Guillaume Nault wrote:
> > On Tue, Nov 22, 2022 at 11:28:45PM +0900, Tetsuo Handa wrote:
> > > That's what I thought at https://lkml.kernel.org/r/c64284f4-2c2a-ecb9-a08e-9e49d49c720b@I-love.SAKURA.ne.jp .
> > > 
> > > But the problem is not that setup_udp_tunnel_sock() can sleep. The problem is that lockdep
> > > gets confused due to changing lockdep class after the socket is already published. We need
> > > to avoid calling lockdep_set_class_and_name() on a socket retrieved via sockfd_lookup().
> > 
> > This is a second problem. The problem of setting sk_user_data under
> > sk_callback_lock write protection (while still calling
> > udp_tunnel_encap_enable() from sleepable context) still remains.
> > 
> > For lockdep_set_class_and_name(), maybe we could store the necessary
> > socket information (addresses, ports and checksum configuration) in the
> > l2tp_tunnel structure, thus avoiding the need to read them from the
> > socket. This way, we could stop locking the user space socket in
> > l2tp_xmit_core() and drop the lockdep_set_class_and_name() call.
> > I think either you or Jakub proposed something like this in another
> > thread.
> 
> I note that l2tp_xmit_core calls ip_queue_xmit which expects a socket
> atomic context*.
> 
> It also accesses struct inet_sock corking data which may also need locks
> to safely access.
> 
> Possibly we could somehow work around that, but on the face of it we'd
> need to do a bit more work to avoid the socket lock in the tx path.

I was thinking of avoiding using the socket entirely, which indeed
means replacing ip_queue_xmit(). We should probably use the different
variants of udp_tunnel_xmit_skb() instead.

> * davem fixed locking in the l2tp xmit path in:
> 
> 6af88da14ee2 ("l2tp: Fix locking in l2tp_core.c")
> -- 
> Tom Parkin
> Katalix Systems Ltd
> https://katalix.com
> Catalysts for your Embedded Linux software development


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-11-24 10:28 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-19 13:03 [PATCH net] l2tp: Don't sleep and disable BH under writer-side sk_callback_lock Jakub Sitnicki
2022-11-19 13:52 ` Tetsuo Handa
2022-11-19 14:27   ` Tetsuo Handa
2022-11-21  9:00     ` Jakub Sitnicki
2022-11-21 10:03       ` Tetsuo Handa
2022-11-21 21:55         ` Jakub Sitnicki
2022-11-22  9:48           ` Tetsuo Handa
2022-11-22 10:46             ` Jakub Sitnicki
2022-11-22 11:14               ` Tetsuo Handa
2022-11-22 14:10                 ` Guillaume Nault
2022-11-22 14:28                   ` Tetsuo Handa
2022-11-23 15:24                     ` Guillaume Nault
2022-11-24 10:07                       ` Tom Parkin
2022-11-24 10:27                         ` Guillaume Nault
2022-11-21  9:00   ` Jakub Sitnicki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).