All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martynas Pumputis <m@lambda.lt>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Joe Stringer <joe@wand.net.nz>
Subject: Re: [PATCH bpf] libbpf: fix race when pinning maps in parallel
Date: Thu, 15 Jul 2021 12:17:14 +0200	[thread overview]
Message-ID: <b716a52c-0a94-9de1-b3fe-51e00540e964@lambda.lt> (raw)
In-Reply-To: <CAEf4BzYaQsD6NaEUij6ttDeKYP7oEB0=c0D9_xdAKw6FYb7h1g@mail.gmail.com>



On 7/8/21 10:33 PM, Andrii Nakryiko wrote:
> On Thu, Jul 8, 2021 at 8:50 AM Martynas Pumputis <m@lambda.lt> wrote:
>>
>>
>>
>> On 7/8/21 12:38 AM, Andrii Nakryiko wrote:
>>> On Mon, Jul 5, 2021 at 12:08 PM Martynas Pumputis <m@lambda.lt> wrote:
>>>>
>>>> When loading in parallel multiple programs which use the same to-be
>>>> pinned map, it is possible that two instances of the loader will call
>>>> bpf_object__create_maps() at the same time. If the map doesn't exist
>>>> when both instances call bpf_object__reuse_map(), then one of the
>>>> instances will fail with EEXIST when calling bpf_map__pin().
>>>>
>>>> Fix the race by retrying creating a map if bpf_map__pin() returns
>>>> EEXIST. The fix is similar to the one in iproute2: e4c4685fd6e4 ("bpf:
>>>> Fix race condition with map pinning").
>>>>
>>>> Cc: Joe Stringer <joe@wand.net.nz>
>>>> Signed-off-by: Martynas Pumputis <m@lambda.lt>
>>>> ---
>>>>    tools/lib/bpf/libbpf.c | 8 +++++++-
>>>>    1 file changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>>>> index 1e04ce724240..7a31c7c3cd21 100644
>>>> --- a/tools/lib/bpf/libbpf.c
>>>> +++ b/tools/lib/bpf/libbpf.c
>>>> @@ -4616,10 +4616,12 @@ bpf_object__create_maps(struct bpf_object *obj)
>>>>           char *cp, errmsg[STRERR_BUFSIZE];
>>>>           unsigned int i, j;
>>>>           int err;
>>>> +       bool retried = false;
>>>
>>> retried has to be reset for each map, so just move it inside the for
>>> loop? you can also generalize it to retry_cnt (> 1 attempts) to allow
>>> for more extreme cases of multiple loaders fighting very heavily
>>
>> If we move "retried = false" to inside the loop, then there is no need
>> for retry_cnt. Single retry for each map should be enough to resolve the
>> race. In any case, I'm going to move "retried = false", as you suggested.
> 
> Right, I was originally thinking about the case where already pinned
> map might get unpinned. But then subsequently rejected the idea of
> re-creating the map :) So single retry should do.
> 
>>
>>>
>>>>
>>>>           for (i = 0; i < obj->nr_maps; i++) {
>>>>                   map = &obj->maps[i];
>>>>
>>>> +retry:
>>>>                   if (map->pin_path) {
>>>>                           err = bpf_object__reuse_map(map);
>>>>                           if (err) {
>>>> @@ -4660,9 +4662,13 @@ bpf_object__create_maps(struct bpf_object *obj)
>>>>                   if (map->pin_path && !map->pinned) {
>>>>                           err = bpf_map__pin(map, NULL);
>>>>                           if (err) {
>>>> +                               zclose(map->fd);
>>>> +                               if (!retried && err == EEXIST) {
>>>
>>> so I'm also wondering... should we commit at this point to trying to
>>> pin and not attempt to re-create the map? I'm worried that
>>> bpf_object__create_map() is not designed and tested to be called
>>> multiple times for the same bpf_map, but it's technically possible for
>>> it to be called multiple times in this scenario. Check the inner map
>>
>> Good call. I'm going to add "if (retried && map->fd < 0) { return
>> -ENOENT; }" after the "if (map->pinned) { err = bpf_object__reuse_map()
>> ... }" statement. This should prevent from invoking
>> bpf_object__create_map() multiple times.
>>
>>> creation scenario, for example (btw, I think there is a bug in
>>> bpf_object__create_map clean up for inner map, care to take a look at
>>> that as well?).
>>
>> In the case of the inner map, it should be destroyed inside
>> bpf_object__create_map() after a successful BPF_MAP_CREATE. So AFAIU,
>> there should be no need for the cleanup. Or do I miss something?
> 
> But if outer map creation fails, we won't do
> bpf_map__destroy(map->inner_map);, which is one bug. And then with
> your retry logic we also don't clean up the internal state of the
> bpf_map, which is another one. It would be good to add a self-test
> simulating such situations (e.g., by specifying wrong key_size for
> outer_map, but correct inner_map definition). Not sure how to reliably
> simulate this pinning race, though.
> 
> Can you please add at least the first test case?

Yep, I've sent the patch with a test case for the first bug. Thanks for 
explaining.

Anyway, regarding the proposed retry, I think the safest approach is to 
bail before invoking bpf_object__create_map() for the second time (when 
we retry). This would avoid any issues with idempotence of 
bpf_object__create_map() and should solve most of the cases (except when 
a map gets unpinned before the retry, but I expect this to be a very 
unusual and rare situation).

> 
>>
>>>
>>> So unless we want to allow map re-creation if (in a highly unlikely
>>> scenario) someone already unpinned the other instance, I'd say we
>>> should just bpf_map__pin() here directly, maybe in a short loop to
>>> allow for a few attempts.
>>>
>>>> +                                       retried = true;
>>>> +                                       goto retry;
>>>> +                               }
>>>>                                   pr_warn("map '%s': failed to auto-pin at '%s': %d\n",
>>>>                                           map->name, map->pin_path, err);
>>>> -                               zclose(map->fd);
>>>>                                   goto err_out;
>>>>                           }
>>>>                   }
>>>> --
>>>> 2.32.0
>>>>

  reply	other threads:[~2021-07-15 10:15 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-05 19:09 [PATCH bpf] libbpf: fix race when pinning maps in parallel Martynas Pumputis
2021-07-07 22:38 ` Andrii Nakryiko
2021-07-08 15:52   ` Martynas Pumputis
2021-07-08 20:33     ` Andrii Nakryiko
2021-07-15 10:17       ` Martynas Pumputis [this message]
2021-07-22 13:56       ` Martynas Pumputis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b716a52c-0a94-9de1-b3fe-51e00540e964@lambda.lt \
    --to=m@lambda.lt \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=joe@wand.net.nz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.