* [PATCH bpf] bpf: Fix a race in reuseport_array_free()
@ 2019-09-27 16:52 Martin KaFai Lau
From: Martin KaFai Lau @ 2019-09-27 16:52 UTC (permalink / raw)
To: bpf, netdev
Cc: Alexei Starovoitov, Daniel Borkmann, David Miller, kernel-team
In reuseport_array_free(), the rcu_read_lock() cannot ensure that sk is
still valid, because bpf_sk_reuseport_detach() can be called from
__sk_destruct() which is invoked through call_rcu(..., __sk_destruct).
This patch takes the reuseport_lock in reuseport_array_free(), which is
not a fast path. The lock is taken and released inside the loop so that
it is not held for too long in case the bpf map is big.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
kernel/bpf/reuseport_array.c | 27 +++++----------------------
1 file changed, 5 insertions(+), 22 deletions(-)
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 50c083ba978c..9e593ac31ad7 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -103,29 +103,11 @@ static void reuseport_array_free(struct bpf_map *map)
* array now. Hence, this function only races with
* bpf_sk_reuseport_detach() which was triggerred by
* close() or disconnect().
- *
- * This function and bpf_sk_reuseport_detach() are
- * both removing sk from "array". Who removes it
- * first does not matter.
- *
- * The only concern here is bpf_sk_reuseport_detach()
- * may access "array" which is being freed here.
- * bpf_sk_reuseport_detach() access this "array"
- * through sk->sk_user_data _and_ with sk->sk_callback_lock
- * held which is enough because this "array" is not freed
- * until all sk->sk_user_data has stopped referencing this "array".
- *
- * Hence, due to the above, taking "reuseport_lock" is not
- * needed here.
*/
-
- /*
- * Since reuseport_lock is not taken, sk is accessed under
- * rcu_read_lock()
- */
- rcu_read_lock();
for (i = 0; i < map->max_entries; i++) {
- sk = rcu_dereference(array->ptrs[i]);
+ spin_lock_bh(&reuseport_lock);
+ sk = rcu_dereference_protected(array->ptrs[i],
+ lockdep_is_held(&reuseport_lock));
if (sk) {
write_lock_bh(&sk->sk_callback_lock);
/*
@@ -137,8 +119,9 @@ static void reuseport_array_free(struct bpf_map *map)
write_unlock_bh(&sk->sk_callback_lock);
RCU_INIT_POINTER(array->ptrs[i], NULL);
}
+ spin_unlock_bh(&reuseport_lock);
+ cond_resched();
}
- rcu_read_unlock();
/*
* Once reaching here, all sk->sk_user_data is not
--
2.17.1
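The per-entry locking pattern the patch adopts (lock, work on one slot, unlock, reschedule) can be sketched in userspace terms. This is an illustrative model, not kernel code: a pthread mutex stands in for the bh-disabling spinlock, and `ptrs`/`detach_all` are hypothetical names.

```c
#include <assert.h>   /* for the usage example */
#include <pthread.h>
#include <stddef.h>

#define MAX_ENTRIES 4

/* Stand-ins for the kernel objects: the global reuseport_lock and the
 * map's array of socket pointers. */
static pthread_mutex_t reuseport_lock = PTHREAD_MUTEX_INITIALIZER;
static void *ptrs[MAX_ENTRIES];

/* Take the lock inside the loop, as the patch does, so a large map does
 * not keep reuseport_lock held (with BHs disabled) for the whole walk. */
static int detach_all(void)
{
	int detached = 0;
	size_t i;

	for (i = 0; i < MAX_ENTRIES; i++) {
		pthread_mutex_lock(&reuseport_lock);    /* spin_lock_bh() */
		if (ptrs[i]) {
			ptrs[i] = NULL;   /* RCU_INIT_POINTER(..., NULL) */
			detached++;
		}
		pthread_mutex_unlock(&reuseport_lock);  /* spin_unlock_bh() */
		/* cond_resched() sits here in the kernel loop. */
	}
	return detached;
}
```

Dropping and re-taking the lock per slot bounds each critical section to one entry, which is the point of moving spin_lock_bh() inside the loop.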
* Re: [PATCH bpf] bpf: Fix a race in reuseport_array_free()
From: Eric Dumazet @ 2019-09-27 17:24 UTC (permalink / raw)
To: Martin KaFai Lau, bpf, netdev
Cc: Alexei Starovoitov, Daniel Borkmann, David Miller, kernel-team
On 9/27/19 9:52 AM, Martin KaFai Lau wrote:
> In reuseport_array_free(), the rcu_read_lock() cannot ensure sk is still
> valid. It is because bpf_sk_reuseport_detach() can be called from
> __sk_destruct() which is invoked through call_rcu(..., __sk_destruct).
We could question why reuseport_detach_sock(sk) is called from __sk_destruct()
(after the rcu grace period) instead of sk_destruct() ?
>
> This patch takes the reuseport_lock in reuseport_array_free() which
> is not the fast path. The lock is taken inside the loop in case
> that the bpf map is big.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Fixes: 5dc4c4b7d4e8 ("bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY")
* Re: [PATCH bpf] bpf: Fix a race in reuseport_array_free()
From: Martin Lau @ 2019-09-27 18:17 UTC (permalink / raw)
To: Eric Dumazet
Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, David Miller,
Kernel Team
On Fri, Sep 27, 2019 at 10:24:49AM -0700, Eric Dumazet wrote:
>
>
> On 9/27/19 9:52 AM, Martin KaFai Lau wrote:
> > In reuseport_array_free(), the rcu_read_lock() cannot ensure sk is still
> > valid. It is because bpf_sk_reuseport_detach() can be called from
> > __sk_destruct() which is invoked through call_rcu(..., __sk_destruct).
>
> We could question why reuseport_detach_sock(sk) is called from __sk_destruct()
> (after the rcu grace period) instead of sk_destruct() ?
Agree. That is another way to fix it.
In this patch, I chose to avoid singling out reuseport_detach_sock()
for special treatment in sk_destruct().
I am happy either way. What do you think?
>
> >
> > This patch takes the reuseport_lock in reuseport_array_free() which
> > is not the fast path. The lock is taken inside the loop in case
> > that the bpf map is big.
> >
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
>
> Fixes: 5dc4c4b7d4e8 ("bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY")
Ah...missed that. Thanks!
* Re: [PATCH bpf] bpf: Fix a race in reuseport_array_free()
From: Eric Dumazet @ 2019-09-27 20:47 UTC (permalink / raw)
To: Martin Lau, Eric Dumazet
Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, David Miller,
Kernel Team
On 9/27/19 11:17 AM, Martin Lau wrote:
> On Fri, Sep 27, 2019 at 10:24:49AM -0700, Eric Dumazet wrote:
>>
>>
>> On 9/27/19 9:52 AM, Martin KaFai Lau wrote:
>>> In reuseport_array_free(), the rcu_read_lock() cannot ensure sk is still
>>> valid. It is because bpf_sk_reuseport_detach() can be called from
>>> __sk_destruct() which is invoked through call_rcu(..., __sk_destruct).
>>
>> We could question why reuseport_detach_sock(sk) is called from __sk_destruct()
>> (after the rcu grace period) instead of sk_destruct() ?
> Agree. It is another way to fix it.
>
> In this patch, I chose to avoid the need to single out a special treatment for
> reuseport_detach_sock() in sk_destruct().
>
> I am happy either way. What do you think?
It seems that since we call reuseport_detach_sock() after the rcu grace period,
another cpu could catch the sk pointer in reuse->socks[] array and use
it right before our cpu frees the socket.
RCU rules are not properly applied here I think.
The rules for deletion are :
1) unpublish object from various lists/arrays/hashes.
2) rcu_grace_period
3) free the object.
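The three deletion rules can be modeled in a short userspace sketch: C11 atomics and a no-op function stand in for the real RCU primitives, and all names (`published`, `delete_published`, ...) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct obj { int data; };

/* The published pointer that readers dereference under rcu_read_lock(). */
static _Atomic(struct obj *) published;

/* Stand-in for synchronize_rcu(): in the kernel this waits until every
 * pre-existing reader has left its RCU read-side critical section. */
static void fake_synchronize_rcu(void) { }

static void publish(struct obj *o)
{
	atomic_store(&published, o);	/* rcu_assign_pointer() */
}

/* Deletion in the required order: unpublish first, wait out the grace
 * period, only then free. Freeing before unpublishing would let another
 * cpu pick up a pointer to memory that is about to be recycled. */
static void delete_published(void)
{
	struct obj *o = atomic_exchange(&published, NULL); /* 1) unpublish */
	fake_synchronize_rcu();                            /* 2) grace period */
	free(o);                                           /* 3) free */
}
```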
If we fix the unpublish (we need to anyway to make the data path safe),
then your patch is not needed ?
What about (totally untested, might be horribly wrong)
diff --git a/net/core/sock.c b/net/core/sock.c
index 07863edbe6fc4842e47ebebf00bc21bc406d9264..d31a4b094797f73ef89110c954aa0a164879362d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1700,8 +1700,6 @@ static void __sk_destruct(struct rcu_head *head)
sk_filter_uncharge(sk, filter);
RCU_INIT_POINTER(sk->sk_filter, NULL);
}
- if (rcu_access_pointer(sk->sk_reuseport_cb))
- reuseport_detach_sock(sk);
sock_disable_timestamp(sk, SK_FLAGS_TIMESTAMP);
@@ -1728,7 +1726,13 @@ static void __sk_destruct(struct rcu_head *head)
void sk_destruct(struct sock *sk)
{
- if (sock_flag(sk, SOCK_RCU_FREE))
+ bool use_call_rcu = sock_flag(sk, SOCK_RCU_FREE);
+
+ if (rcu_access_pointer(sk->sk_reuseport_cb)) {
+ reuseport_detach_sock(sk);
+ use_call_rcu = true;
+ }
+ if (use_call_rcu)
call_rcu(&sk->sk_rcu, __sk_destruct);
else
__sk_destruct(&sk->sk_rcu);
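The decision the sketch above encodes — defer the free through call_rcu() whenever the socket is SOCK_RCU_FREE or still has a reuseport_cb to unpublish — reduces to a small predicate. This is a hypothetical userspace model of that decision, not the kernel function:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the sk_destruct() decision in the diff above: the free must
 * go through call_rcu() when the socket is SOCK_RCU_FREE, or when
 * reuseport_detach_sock() has just unpublished it from reuse->socks[]
 * and concurrent readers may still hold the pointer. */
static bool must_defer_free(bool sock_rcu_free, bool has_reuseport_cb)
{
	bool use_call_rcu = sock_rcu_free;

	if (has_reuseport_cb) {
		/* reuseport_detach_sock(sk) runs here, before the grace
		 * period, so the unpublish precedes the free. */
		use_call_rcu = true;
	}
	return use_call_rcu;
}
```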
* Re: [PATCH bpf] bpf: Fix a race in reuseport_array_free()
From: Martin Lau @ 2019-09-27 21:22 UTC (permalink / raw)
To: Eric Dumazet
Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, David Miller,
Kernel Team
On Fri, Sep 27, 2019 at 01:47:32PM -0700, Eric Dumazet wrote:
>
>
> On 9/27/19 11:17 AM, Martin Lau wrote:
> > On Fri, Sep 27, 2019 at 10:24:49AM -0700, Eric Dumazet wrote:
> >>
> >>
> >> On 9/27/19 9:52 AM, Martin KaFai Lau wrote:
> >>> In reuseport_array_free(), the rcu_read_lock() cannot ensure sk is still
> >>> valid. It is because bpf_sk_reuseport_detach() can be called from
> >>> __sk_destruct() which is invoked through call_rcu(..., __sk_destruct).
> >>
> >> We could question why reuseport_detach_sock(sk) is called from __sk_destruct()
> >> (after the rcu grace period) instead of sk_destruct() ?
> > Agree. It is another way to fix it.
> >
> > In this patch, I chose to avoid the need to single out a special treatment for
> > reuseport_detach_sock() in sk_destruct().
> >
> > I am happy either way. What do you think?
>
> It seems that since we call reuseport_detach_sock() after the rcu grace period,
> another cpu could catch the sk pointer in reuse->socks[] array and use
> it right before our cpu frees the socket.
>
> RCU rules are not properly applied here I think.
>
> The rules for deletion are :
>
> 1) unpublish object from various lists/arrays/hashes.
Thanks for the analysis. Agreed. Indeed, reuse->socks[] is shared with
other sockets, and they may pick up the destructed sk from it.
> 2) rcu_grace_period
> 3) free the object.
>
> If we fix the unpublish (we need to anyway to make the data path safe),
> then your patch is not needed ?
Correct, not needed.
>
> What about (totally untested, might be horribly wrong)
I had something similar in mind also. I will take a closer look and
re-spin v2.
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 07863edbe6fc4842e47ebebf00bc21bc406d9264..d31a4b094797f73ef89110c954aa0a164879362d 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1700,8 +1700,6 @@ static void __sk_destruct(struct rcu_head *head)
> sk_filter_uncharge(sk, filter);
> RCU_INIT_POINTER(sk->sk_filter, NULL);
> }
> - if (rcu_access_pointer(sk->sk_reuseport_cb))
> - reuseport_detach_sock(sk);
>
> sock_disable_timestamp(sk, SK_FLAGS_TIMESTAMP);
>
> @@ -1728,7 +1726,13 @@ static void __sk_destruct(struct rcu_head *head)
>
> void sk_destruct(struct sock *sk)
> {
> - if (sock_flag(sk, SOCK_RCU_FREE))
> + bool use_call_rcu = sock_flag(sk, SOCK_RCU_FREE);
> +
> + if (rcu_access_pointer(sk->sk_reuseport_cb)) {
> + reuseport_detach_sock(sk);
> + use_call_rcu = true;
> + }
> + if (use_call_rcu)
> call_rcu(&sk->sk_rcu, __sk_destruct);
> else
> __sk_destruct(&sk->sk_rcu);