bpf.vger.kernel.org archive mirror
* [PATCH bpf] bpf: Fix a race in reuseport_array_free()
@ 2019-09-27 16:52 Martin KaFai Lau
  2019-09-27 17:24 ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Martin KaFai Lau @ 2019-09-27 16:52 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, David Miller, kernel-team

In reuseport_array_free(), the rcu_read_lock() cannot ensure that sk is
still valid.  This is because bpf_sk_reuseport_detach() can be called
from __sk_destruct(), which is invoked through call_rcu(..., __sk_destruct),
i.e. after the grace period has already elapsed.

This patch takes the reuseport_lock in reuseport_array_free(), which
is not a fast path.  The lock is taken and released inside the loop in
case the bpf map is big.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 kernel/bpf/reuseport_array.c | 27 +++++----------------------
 1 file changed, 5 insertions(+), 22 deletions(-)

diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 50c083ba978c..9e593ac31ad7 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -103,29 +103,11 @@ static void reuseport_array_free(struct bpf_map *map)
 	 * array now. Hence, this function only races with
 	 * bpf_sk_reuseport_detach() which was triggerred by
 	 * close() or disconnect().
-	 *
-	 * This function and bpf_sk_reuseport_detach() are
-	 * both removing sk from "array".  Who removes it
-	 * first does not matter.
-	 *
-	 * The only concern here is bpf_sk_reuseport_detach()
-	 * may access "array" which is being freed here.
-	 * bpf_sk_reuseport_detach() access this "array"
-	 * through sk->sk_user_data _and_ with sk->sk_callback_lock
-	 * held which is enough because this "array" is not freed
-	 * until all sk->sk_user_data has stopped referencing this "array".
-	 *
-	 * Hence, due to the above, taking "reuseport_lock" is not
-	 * needed here.
 	 */
-
-	/*
-	 * Since reuseport_lock is not taken, sk is accessed under
-	 * rcu_read_lock()
-	 */
-	rcu_read_lock();
 	for (i = 0; i < map->max_entries; i++) {
-		sk = rcu_dereference(array->ptrs[i]);
+		spin_lock_bh(&reuseport_lock);
+		sk = rcu_dereference_protected(array->ptrs[i],
+					lockdep_is_held(&reuseport_lock));
 		if (sk) {
 			write_lock_bh(&sk->sk_callback_lock);
 			/*
@@ -137,8 +119,9 @@ static void reuseport_array_free(struct bpf_map *map)
 			write_unlock_bh(&sk->sk_callback_lock);
 			RCU_INIT_POINTER(array->ptrs[i], NULL);
 		}
+		spin_unlock_bh(&reuseport_lock);
+		cond_resched();
 	}
-	rcu_read_unlock();
 
 	/*
 	 * Once reaching here, all sk->sk_user_data is not
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH bpf] bpf: Fix a race in reuseport_array_free()
  2019-09-27 16:52 [PATCH bpf] bpf: Fix a race in reuseport_array_free() Martin KaFai Lau
@ 2019-09-27 17:24 ` Eric Dumazet
  2019-09-27 18:17   ` Martin Lau
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2019-09-27 17:24 UTC (permalink / raw)
  To: Martin KaFai Lau, bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, David Miller, kernel-team



On 9/27/19 9:52 AM, Martin KaFai Lau wrote:
> In reuseport_array_free(), the rcu_read_lock() cannot ensure sk is still
> valid.  It is because bpf_sk_reuseport_detach() can be called from
> __sk_destruct() which is invoked through call_rcu(..., __sk_destruct).

We could question why reuseport_detach_sock(sk) is called from __sk_destruct()
(i.e. after the rcu grace period) instead of from sk_destruct()?

> 
> This patch takes the reuseport_lock in reuseport_array_free() which
> is not the fast path.  The lock is taken inside the loop in case
> that the bpf map is big.
> 
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>

Fixes: 5dc4c4b7d4e8 ("bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY")



* Re: [PATCH bpf] bpf: Fix a race in reuseport_array_free()
  2019-09-27 17:24 ` Eric Dumazet
@ 2019-09-27 18:17   ` Martin Lau
  2019-09-27 20:47     ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Lau @ 2019-09-27 18:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, David Miller,
	Kernel Team

On Fri, Sep 27, 2019 at 10:24:49AM -0700, Eric Dumazet wrote:
> 
> 
> On 9/27/19 9:52 AM, Martin KaFai Lau wrote:
> > In reuseport_array_free(), the rcu_read_lock() cannot ensure sk is still
> > valid.  It is because bpf_sk_reuseport_detach() can be called from
> > __sk_destruct() which is invoked through call_rcu(..., __sk_destruct).
> 
> We could question why reuseport_detach_sock(sk) is called from __sk_destruct()
> (after the rcu grace period) instead of sk_destruct() ?
Agreed.  That is another way to fix it.

In this patch, I chose to avoid singling out reuseport_detach_sock() for
special treatment in sk_destruct().

I am happy either way.  What do you think?

> 
> > 
> > This patch takes the reuseport_lock in reuseport_array_free() which
> > is not the fast path.  The lock is taken inside the loop in case
> > that the bpf map is big.
> > 
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> 
> Fixes: 5dc4c4b7d4e8 ("bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY")
Ah...missed that.  Thanks!



* Re: [PATCH bpf] bpf: Fix a race in reuseport_array_free()
  2019-09-27 18:17   ` Martin Lau
@ 2019-09-27 20:47     ` Eric Dumazet
  2019-09-27 21:22       ` Martin Lau
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2019-09-27 20:47 UTC (permalink / raw)
  To: Martin Lau, Eric Dumazet
  Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, David Miller,
	Kernel Team



On 9/27/19 11:17 AM, Martin Lau wrote:
> On Fri, Sep 27, 2019 at 10:24:49AM -0700, Eric Dumazet wrote:
>>
>>
>> On 9/27/19 9:52 AM, Martin KaFai Lau wrote:
>>> In reuseport_array_free(), the rcu_read_lock() cannot ensure sk is still
>>> valid.  It is because bpf_sk_reuseport_detach() can be called from
>>> __sk_destruct() which is invoked through call_rcu(..., __sk_destruct).
>>
>> We could question why reuseport_detach_sock(sk) is called from __sk_destruct()
>> (after the rcu grace period) instead of sk_destruct() ?
> Agree.  It is another way to fix it.
> 
> In this patch, I chose to avoid the need to single out a special treatment for
> reuseport_detach_sock() in sk_destruct().
> 
> I am happy either way.  What do you think?

It seems that, since we call reuseport_detach_sock() after the rcu grace
period, another cpu could catch the sk pointer in the reuse->socks[] array
and use it right up until our cpu frees the socket.

I think the RCU rules are not properly applied here.

The rules for deletion are:

1) unpublish object from various lists/arrays/hashes.
2) rcu_grace_period
3) free the object.

If we fix the unpublish (which we need to do anyway, to make the data
path safe), then your patch is not needed?

What about (totally untested, might be horribly wrong)

diff --git a/net/core/sock.c b/net/core/sock.c
index 07863edbe6fc4842e47ebebf00bc21bc406d9264..d31a4b094797f73ef89110c954aa0a164879362d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1700,8 +1700,6 @@ static void __sk_destruct(struct rcu_head *head)
                sk_filter_uncharge(sk, filter);
                RCU_INIT_POINTER(sk->sk_filter, NULL);
        }
-       if (rcu_access_pointer(sk->sk_reuseport_cb))
-               reuseport_detach_sock(sk);
 
        sock_disable_timestamp(sk, SK_FLAGS_TIMESTAMP);
 
@@ -1728,7 +1726,13 @@ static void __sk_destruct(struct rcu_head *head)
 
 void sk_destruct(struct sock *sk)
 {
-       if (sock_flag(sk, SOCK_RCU_FREE))
+       bool use_call_rcu = sock_flag(sk, SOCK_RCU_FREE);
+
+       if (rcu_access_pointer(sk->sk_reuseport_cb)) {
+               reuseport_detach_sock(sk);
+               use_call_rcu = true;
+       }
+       if (use_call_rcu)
                call_rcu(&sk->sk_rcu, __sk_destruct);
        else
                __sk_destruct(&sk->sk_rcu);


* Re: [PATCH bpf] bpf: Fix a race in reuseport_array_free()
  2019-09-27 20:47     ` Eric Dumazet
@ 2019-09-27 21:22       ` Martin Lau
  0 siblings, 0 replies; 5+ messages in thread
From: Martin Lau @ 2019-09-27 21:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, David Miller,
	Kernel Team

On Fri, Sep 27, 2019 at 01:47:32PM -0700, Eric Dumazet wrote:
> 
> 
> On 9/27/19 11:17 AM, Martin Lau wrote:
> > On Fri, Sep 27, 2019 at 10:24:49AM -0700, Eric Dumazet wrote:
> >>
> >>
> >> On 9/27/19 9:52 AM, Martin KaFai Lau wrote:
> >>> In reuseport_array_free(), the rcu_read_lock() cannot ensure sk is still
> >>> valid.  It is because bpf_sk_reuseport_detach() can be called from
> >>> __sk_destruct() which is invoked through call_rcu(..., __sk_destruct).
> >>
> >> We could question why reuseport_detach_sock(sk) is called from __sk_destruct()
> >> (after the rcu grace period) instead of sk_destruct() ?
> > Agree.  It is another way to fix it.
> > 
> > In this patch, I chose to avoid the need to single out a special treatment for
> > reuseport_detach_sock() in sk_destruct().
> > 
> > I am happy either way.  What do you think?
> 
> It seems that since we call reuseport_detach_sock() after the rcu grace period,
> another cpu could catch the sk pointer in reuse->socks[] array and use
> it right before our cpu frees the socket.
> 
> RCU rules are not properly applied here I think.
> 
> The rules for deletion are :
> 
> 1) unpublish object from various lists/arrays/hashes.
Thanks for the analysis.  Agreed.  Indeed, there is an issue:
reuse->socks[] is shared with other sockets, and they may pick up the
already-destructed sk from it.

> 2) rcu_grace_period
> 3) free the object.
> 
> If we fix the unpublish (we need to anyway to make the data path safe),
> then your patch is not needed ?
Correct, not needed.

> 
> What about (totally untested, might be horribly wrong)
I also had something similar in mind.  I will take a closer look and
re-spin a v2.

> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 07863edbe6fc4842e47ebebf00bc21bc406d9264..d31a4b094797f73ef89110c954aa0a164879362d 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1700,8 +1700,6 @@ static void __sk_destruct(struct rcu_head *head)
>                 sk_filter_uncharge(sk, filter);
>                 RCU_INIT_POINTER(sk->sk_filter, NULL);
>         }
> -       if (rcu_access_pointer(sk->sk_reuseport_cb))
> -               reuseport_detach_sock(sk);
>  
>         sock_disable_timestamp(sk, SK_FLAGS_TIMESTAMP);
>  
> @@ -1728,7 +1726,13 @@ static void __sk_destruct(struct rcu_head *head)
>  
>  void sk_destruct(struct sock *sk)
>  {
> -       if (sock_flag(sk, SOCK_RCU_FREE))
> +       bool use_call_rcu = sock_flag(sk, SOCK_RCU_FREE);
> +
> +       if (rcu_access_pointer(sk->sk_reuseport_cb)) {
> +               reuseport_detach_sock(sk);
> +               use_call_rcu = true;
> +       }
> +       if (use_call_rcu)
>                 call_rcu(&sk->sk_rcu, __sk_destruct);
>         else
>                 __sk_destruct(&sk->sk_rcu);

