bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Borkmann <daniel@iogearbox.net>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	john fastabend <john.fastabend@gmail.com>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next v2 5/8] bpf: add poke dependency tracking for prog array maps
Date: Sat, 23 Nov 2019 00:06:56 +0100	[thread overview]
Message-ID: <d019af80-e6ac-fbd5-f4fd-4743d4df5119@iogearbox.net> (raw)
In-Reply-To: <CAEf4BzaZf+_WyARsmZ_rgO_+Ug1iSKsqaoWpB-dPXS6uejT=Qg@mail.gmail.com>

On 11/22/19 11:55 PM, Andrii Nakryiko wrote:
> On Fri, Nov 22, 2019 at 12:08 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> This work adds program tracking to prog array maps. This is needed such
>> that upon prog array updates/deletions we can fix up all programs which
>> make use of this tail call map. We add ops->map_poke_{un,}track()
>> helpers to maps to maintain the list of programs and ops->map_poke_run()
>> for triggering the actual update.
>>
>> bpf_array_aux is extended to contain the list head and poke_mutex in
>> order to serialize program patching during updates/deletions.
>> bpf_free_used_maps() will untrack the program shortly before dropping
>> the reference to the map. For clearing out the prog array once all urefs
>> are dropped we need to use schedule_work() to have a sleepable context.
>>
>> The prog_array_map_poke_run() is triggered during updates/deletions and
>> walks the maintained prog list. It checks in their poke_tabs whether the
>> map and key is matching and runs the actual bpf_arch_text_poke() for
>> patching in the nop or new jmp location. Depending on the type of update,
>> we use one of BPF_MOD_{NOP_TO_JUMP,JUMP_TO_NOP,JUMP_TO_JUMP}.
>>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> ---
>>   include/linux/bpf.h   |  12 +++
>>   kernel/bpf/arraymap.c | 183 +++++++++++++++++++++++++++++++++++++++++-
>>   kernel/bpf/core.c     |   9 ++-
>>   kernel/bpf/syscall.c  |  20 +++--
>>   4 files changed, 212 insertions(+), 12 deletions(-)
>>
> 
> [...]
> 
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 5a9873e58a01..bb002f15b32a 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -26,12 +26,13 @@
>>   #include <linux/audit.h>
>>   #include <uapi/linux/btf.h>
>>
>> -#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY || \
>> -                          (map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
>> -                          (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
>> -                          (map)->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
>> +#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
>> +                         (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
>> +                         (map)->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
>> +#define IS_FD_PROG_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY)
>>   #define IS_FD_HASH(map) ((map)->map_type == BPF_MAP_TYPE_HASH_OF_MAPS)
>> -#define IS_FD_MAP(map) (IS_FD_ARRAY(map) || IS_FD_HASH(map))
>> +#define IS_FD_MAP(map) (IS_FD_ARRAY(map) || IS_FD_PROG_ARRAY(map) || \
>> +                       IS_FD_HASH(map))
>>
>>   #define BPF_OBJ_FLAG_MASK   (BPF_F_RDONLY | BPF_F_WRONLY)
>>
>> @@ -878,7 +879,7 @@ static int map_lookup_elem(union bpf_attr *attr)
>>                  err = bpf_percpu_cgroup_storage_copy(map, key, value);
>>          } else if (map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
>>                  err = bpf_stackmap_copy(map, key, value);
>> -       } else if (IS_FD_ARRAY(map)) {
>> +       } else if (IS_FD_ARRAY(map) || IS_FD_PROG_ARRAY(map)) {
> 
> Why BPF_MAP_TYPE_PROG_ARRAY couldn't still stay as "IS_FD_ARRAY"?
> Seems like it's still handled the same here and is technically an
> array containing FDs, no? You can still have more precise

For the update and delete method we need to hold the mutex in prog array
which is why we cannot be under RCU there. For the lookup itself it can
stay as-is since it's not a modification hence the or above.

> IS_FD_PROG_ARRAY, for cases like in map_update_elem(), where you need
> to special-handle just that map type.
> 
>>                  err = bpf_fd_array_map_lookup_elem(map, key, value);
>>          } else if (IS_FD_HASH(map)) {
>>                  err = bpf_fd_htab_map_lookup_elem(map, key, value);
>> @@ -1005,6 +1006,10 @@ static int map_update_elem(union bpf_attr *attr)
>>                     map->map_type == BPF_MAP_TYPE_SOCKMAP) {
>>                  err = map->ops->map_update_elem(map, key, value, attr->flags);
>>                  goto out;
>> +       } else if (IS_FD_PROG_ARRAY(map)) {
>> +               err = bpf_fd_array_map_update_elem(map, f.file, key, value,
>> +                                                  attr->flags);
>> +               goto out;
>>          }
>>
>>          /* must increment bpf_prog_active to avoid kprobe+bpf triggering from
>> @@ -1087,6 +1092,9 @@ static int map_delete_elem(union bpf_attr *attr)
>>          if (bpf_map_is_dev_bound(map)) {
>>                  err = bpf_map_offload_delete_elem(map, key);
>>                  goto out;
>> +       } else if (IS_FD_PROG_ARRAY(map)) {
>> +               err = map->ops->map_delete_elem(map, key);
> 
> map->ops->map_delete_elem would be called below anyways, except under
> rcu_read_lock() with preempt_disable() (maybe_wait_bpf_programs() is
> no-op for prog_array). So if there is specific reason we want to avoid
> preempt_disable and rcu_read_lock(), would be nice to have a comment
> explaining that.

See answer above. I didn't add an explicit comment there since it's not done
for all the others either, but seems obvious when looking at the map implementation
and prog_array_map_poke_run() / bpf_arch_text_poke().

Thanks,
Daniel

  reply	other threads:[~2019-11-22 23:07 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-22 20:07 [PATCH bpf-next v2 0/8] Optimize BPF tail calls for direct jumps Daniel Borkmann
2019-11-22 20:07 ` [PATCH bpf-next v2 1/8] bpf, x86: generalize and extend bpf_arch_text_poke " Daniel Borkmann
2019-11-22 20:07 ` [PATCH bpf-next v2 2/8] bpf: move bpf_free_used_maps into sleepable section Daniel Borkmann
2019-11-22 20:07 ` [PATCH bpf-next v2 3/8] bpf: move owner type,jited info into array auxiliary data Daniel Borkmann
2019-11-22 20:07 ` [PATCH bpf-next v2 4/8] bpf: add initial poke descriptor table for jit images Daniel Borkmann
2019-11-22 20:07 ` [PATCH bpf-next v2 5/8] bpf: add poke dependency tracking for prog array maps Daniel Borkmann
2019-11-22 22:55   ` Andrii Nakryiko
2019-11-22 23:06     ` Daniel Borkmann [this message]
2019-11-22 23:10       ` Andrii Nakryiko
2019-11-22 20:07 ` [PATCH bpf-next v2 6/8] bpf: constant map key tracking for prog array pokes Daniel Borkmann
2019-11-22 22:57   ` Andrii Nakryiko
2019-11-23 10:39   ` Jakub Sitnicki
2019-11-22 20:08 ` [PATCH bpf-next v2 7/8] bpf, x86: emit patchable direct jump as tail call Daniel Borkmann
2019-11-22 23:09   ` Andrii Nakryiko
2019-11-22 23:25     ` Daniel Borkmann
2019-11-23  2:28       ` Alexei Starovoitov
2019-11-23  5:00         ` Andrii Nakryiko
2019-11-23  6:18           ` Alexei Starovoitov
2019-11-23  9:24             ` Daniel Borkmann
2019-11-22 20:08 ` [PATCH bpf-next v2 8/8] bpf, testing: add various tail call test cases Daniel Borkmann
2019-11-22 23:14   ` Andrii Nakryiko
2019-11-23  2:22 ` [PATCH bpf-next v2 0/8] Optimize BPF tail calls for direct jumps Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d019af80-e6ac-fbd5-f4fd-4743d4df5119@iogearbox.net \
    --to=daniel@iogearbox.net \
    --cc=andrii.nakryiko@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).