All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yonghong Song <yhs@fb.com>
To: Lorenz Bauer <lmb@cloudflare.com>,
	John Fastabend <john.fastabend@gmail.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	kernel-team <kernel-team@cloudflare.com>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH bpf-next 5/6] bpf: sockmap: allow update from BPF
Date: Thu, 20 Aug 2020 07:45:02 -0700	[thread overview]
Message-ID: <80223f16-efe3-7a43-cd88-4ec323d2c477@fb.com> (raw)
In-Reply-To: <CACAyw9_oa5BKq+0gLS6pAuGu6pj9MsRHhEAxFvts167DwpdhLw@mail.gmail.com>



On 8/20/20 4:33 AM, Lorenz Bauer wrote:
> On Wed, 19 Aug 2020 at 23:41, John Fastabend <john.fastabend@gmail.com> wrote:
>>
>> John Fastabend wrote:
>>> Lorenz Bauer wrote:
>>>> Allow calling bpf_map_update_elem on sockmap and sockhash from a BPF
>>>> context. The synchronization required for this is a bit fiddly: we
>>>> need to prevent the socket from changing it's state while we add it
>>>> to the sockmap, since we rely on getting a callback via
>>>> sk_prot->unhash. However, we can't just lock_sock like in
>>>> sock_map_sk_acquire because that might sleep. So instead we disable
>>>> softirq processing and use bh_lock_sock to prevent further
>>>> modification.
>>>>
>>>> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
>>>> ---
>>>>   kernel/bpf/verifier.c |  6 ++++--
>>>>   net/core/sock_map.c   | 24 ++++++++++++++++++++++++
>>>>   2 files changed, 28 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>> index 47f9b94bb9d4..421fccf18dea 100644
>>>> --- a/kernel/bpf/verifier.c
>>>> +++ b/kernel/bpf/verifier.c
>>>> @@ -4254,7 +4254,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
>>>>                  func_id != BPF_FUNC_map_delete_elem &&
>>>>                  func_id != BPF_FUNC_msg_redirect_map &&
>>>>                  func_id != BPF_FUNC_sk_select_reuseport &&
>>>> -               func_id != BPF_FUNC_map_lookup_elem)
>>>> +               func_id != BPF_FUNC_map_lookup_elem &&
>>>> +               func_id != BPF_FUNC_map_update_elem)
>>>>                      goto error;
>>>>              break;
>>>>      case BPF_MAP_TYPE_SOCKHASH:
>>>> @@ -4263,7 +4264,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
>>>>                  func_id != BPF_FUNC_map_delete_elem &&
>>>>                  func_id != BPF_FUNC_msg_redirect_hash &&
>>>>                  func_id != BPF_FUNC_sk_select_reuseport &&
>>>> -               func_id != BPF_FUNC_map_lookup_elem)
>>>> +               func_id != BPF_FUNC_map_lookup_elem &&
>>>> +               func_id != BPF_FUNC_map_update_elem)
>>>
>>> I lost track of a detail here, map_lookup_elem should return
>>> PTR_TO_MAP_VALUE_OR_NULL but if we want to feed that back into
>>> the map_update_elem() we need to return PTR_TO_SOCKET_OR_NULL
>>> and then presumably have a null check to get a PTR_TO_SOCKET
>>> type as expect.
>>>
>>> Can we use the same logic for expected arg (previous patch) on the
>>> ret_type. Or did I miss it:/ Need some coffee I guess.
>>
>> OK, I tracked this down. It looks like we rely on mark_ptr_or_null_reg()
>> to update the reg->tyype to PTR_TO_SOCKET. I do wonder if it would be
>> a bit more straight forward to do something similar to the previous
>> patch and refine it earlier to PTR_TO_SOCKET_OR_NULL, but should be
>> safe as-is for now.
> 
> Yes, it took me a while to figure this out as well. I think we can use
> the same approach, but I wanted to keep this series simple.
> 
>> I still have the below question though.
>>
>>>
>>>>                      goto error;
>>>>              break;
>>>>      case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY:
>>>> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
>>>> index 018367fb889f..b2c886c34566 100644
>>>> --- a/net/core/sock_map.c
>>>> +++ b/net/core/sock_map.c
>>>> @@ -603,6 +603,28 @@ int sock_map_update_elem_sys(struct bpf_map *map, void *key,
>>>>      return ret;
>>>>   }
>>>>
>>>> +static int sock_map_update_elem(struct bpf_map *map, void *key,
>>>> +                           void *value, u64 flags)
>>>> +{
>>>> +   struct sock *sk = (struct sock *)value;
>>>> +   int ret;
>>>> +
>>>> +   if (!sock_map_sk_is_suitable(sk))
>>>> +           return -EOPNOTSUPP;
>>>> +
>>>> +   local_bh_disable();
>>>> +   bh_lock_sock(sk);
>>>
>>> How do ensure we are not being called from some context which
>>> already has the bh_lock_sock() held? It seems we can call map_update_elem()
>>> from any context, kprobes, tc, xdp, etc.?
> 
> Yeah, to be honest I'm not entirely sure.
> 
> XDP, TC, sk_lookup are fine I think. We have bpf_sk_lookup_tcp and
> friends, but these aren't locked, and the BPF doesn't run in a context
> where there is a locked socket.
> 
> As you point out, kprobes / tracing is problematic because the probe
> _can_ run at a point where an sk is locked. If the tracing program
> somehow gets a hold of this socket via sk_lookup_* or
> a sockmap the program could deadlock.

Thanks for John to bring this up. I looked at codes a few times
but somehow missed the potential deadlock issues.

kprobes/non-iter tracing/perf_event, freplace of these kprobes etc. 
programs, may have issues. tracepoint probably not since people
probably won't add tracepoint inside a spinlock.

> 
> bpf_sock_ops is also problematic since ctx->sk is in various states of
> locking. For example, BPF_SOCK_OPS_TCP_LISTEN_CB is called with
> lock_sock held, so unproblematic. BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB
> on the other hand is called with the spinlock held.
> 
> It seems to me like the only option is to instead only allow updates
> from "safe" contexts, such as XDP, tc, bpf_iter etc.

This should be okay, I think. You can start from small and then
grows as more use cases emerge.

> 
> Am I missing something?
> 
> 
>>>
>>>> +   if (!sock_map_sk_state_allowed(sk))
>>>> +           ret = -EOPNOTSUPP;
>>>> +   else if (map->map_type == BPF_MAP_TYPE_SOCKMAP)
>>>> +           ret = sock_map_update_common(map, *(u32 *)key, sk, flags);
>>>> +   else
>>>> +           ret = sock_hash_update_common(map, key, sk, flags);
>>>> +   bh_unlock_sock(sk);
>>>> +   local_bh_enable();
>>>> +   return ret;
>>>> +}
>>>> +
>>>>   BPF_CALL_4(bpf_sock_map_update, struct bpf_sock_ops_kern *, sops,
>>>>         struct bpf_map *, map, void *, key, u64, flags)
>>>>   {
>>>> @@ -687,6 +709,7 @@ const struct bpf_map_ops sock_map_ops = {
>>>>      .map_free               = sock_map_free,
>>>>      .map_get_next_key       = sock_map_get_next_key,
>>>>      .map_lookup_elem_sys_only = sock_map_lookup_sys,
>>>> +   .map_update_elem        = sock_map_update_elem,
>>>>      .map_delete_elem        = sock_map_delete_elem,
>>>>      .map_lookup_elem        = sock_map_lookup,
>>>>      .map_release_uref       = sock_map_release_progs,
>>>> @@ -1180,6 +1203,7 @@ const struct bpf_map_ops sock_hash_ops = {
>>>>      .map_alloc              = sock_hash_alloc,
>>>>      .map_free               = sock_hash_free,
>>>>      .map_get_next_key       = sock_hash_get_next_key,
>>>> +   .map_update_elem        = sock_map_update_elem,
>>>>      .map_delete_elem        = sock_hash_delete_elem,
>>>>      .map_lookup_elem        = sock_hash_lookup,
>>>>      .map_lookup_elem_sys_only = sock_hash_lookup_sys,
>>>> --
>>>> 2.25.1
>>>>
>>>
>>>
>>
>>
> 
> 
> --
> Lorenz Bauer  |  Systems Engineer
> 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
> 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.cloudflare.com&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=DA8e1B5r073vIqRrFz7MRA&m=1Gk85bZFwViEPzlsGBklXLgdbxI4Q9F505taA25KfBI&s=pcsCdyC4ZCnSqXgJJc4rjmFx1C8Hiz49KCOxDf6gagg&e=
> 

  reply	other threads:[~2020-08-20 14:45 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-19  9:24 [PATCH bpf-next 0/6] Allow updating sockmap / sockhash from BPF Lorenz Bauer
2020-08-19  9:24 ` [PATCH bpf-next 1/6] net: sk_msg: simplify sk_psock initialization Lorenz Bauer
2020-08-19 20:05   ` John Fastabend
2020-08-19  9:24 ` [PATCH bpf-next 2/6] bpf: sockmap: merge sockmap and sockhash update functions Lorenz Bauer
2020-08-19 15:48   ` Jakub Kicinski
2020-08-19 18:50   ` Yonghong Song
2020-08-19 20:11   ` John Fastabend
2020-08-19  9:24 ` [PATCH bpf-next 3/6] bpf: sockmap: call sock_map_update_elem directly Lorenz Bauer
2020-08-19 19:15   ` Yonghong Song
2020-08-19 20:29   ` John Fastabend
2020-08-19  9:24 ` [PATCH bpf-next 4/6] bpf: override the meaning of ARG_PTR_TO_MAP_VALUE for sockmap and sockhash Lorenz Bauer
2020-08-19 20:13   ` Yonghong Song
2020-08-20 12:33     ` Lorenz Bauer
2020-08-19 20:51   ` John Fastabend
2020-08-19  9:24 ` [PATCH bpf-next 5/6] bpf: sockmap: allow update from BPF Lorenz Bauer
2020-08-19 20:35   ` Yonghong Song
2020-08-19 21:22   ` John Fastabend
2020-08-19 22:41     ` John Fastabend
2020-08-20 11:33       ` Lorenz Bauer
2020-08-20 14:45         ` Yonghong Song [this message]
2020-08-19  9:24 ` [PATCH bpf-next 6/6] selftests: bpf: test sockmap " Lorenz Bauer
2020-08-19 20:45   ` Yonghong Song
2020-08-20 11:58     ` Lorenz Bauer
2020-08-20 14:49       ` Yonghong Song
2020-08-20 16:12         ` Lorenz Bauer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=80223f16-efe3-7a43-cd88-4ec323d2c477@fb.com \
    --to=yhs@fb.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=jakub@cloudflare.com \
    --cc=john.fastabend@gmail.com \
    --cc=kernel-team@cloudflare.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lmb@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.