From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Song Liu <songliubraving@fb.com>
Cc: Andrii Nakryiko <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@fb.com>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	Kernel Team <Kernel-team@fb.com>, Rik van Riel <riel@surriel.com>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH bpf-next 1/3] bpf: add mmap() support for BPF_MAP_TYPE_ARRAY
Date: Fri, 8 Nov 2019 11:34:31 -0800
Message-ID: <CAEf4BzY2gp9DR+cdcr4DFhOYc8xkHOOSSf9MiJ6P+54USa8zog@mail.gmail.com>
In-Reply-To: <94BD3FAC-CA98-4448-B467-3FC7307174F9@fb.com>

On Thu, Nov 7, 2019 at 10:39 PM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Nov 7, 2019, at 8:20 PM, Andrii Nakryiko <andriin@fb.com> wrote:
> >
> > Add the ability to memory-map the contents of a BPF array map. This is
> > extremely useful for working with BPF global data from userspace programs.
> > It allows avoiding the typical bpf_map_{lookup,update}_elem operations,
> > improving both performance and usability.
> >
> > There had to be special considerations for map freezing, to avoid having a
> > writable memory view into a frozen map. To solve this, map freezing and
> > mmap-ing now happen under a mutex:
> >  - if the map is already frozen, no writable mapping is allowed;
> >  - if the map has writable memory mappings active (accounted in
> >    map->writecnt), map freezing will keep failing with -EBUSY;
> >  - once the number of writable memory mappings drops to zero, map
> >    freezing can be performed again.
> >
> > Only non-per-CPU arrays are supported right now. Maps with spinlocks
> > can't be memory-mapped either.
> >
> > Cc: Rik van Riel <riel@surriel.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>
> Acked-by: Song Liu <songliubraving@fb.com>
>
> With one nit below.
>
>
> [...]
>
> > -     if (percpu)
> > +     data_size = 0;
> > +     if (percpu) {
> >               array_size += (u64) max_entries * sizeof(void *);
> > -     else
> > -             array_size += (u64) max_entries * elem_size;
>
> > +     } else {
> > +             if (attr->map_flags & BPF_F_MMAPABLE) {
> > +                     data_size = (u64) max_entries * elem_size;
> > +                     data_size = round_up(data_size, PAGE_SIZE);
> > +             } else {
> > +                     array_size += (u64) max_entries * elem_size;
> > +             }
> > +     }
> >
> >       /* make sure there is no u32 overflow later in round_up() */
> > -     cost = array_size;
> > +     cost = array_size + data_size;
>
>
>
> This is a little confusing. Maybe we can do
>

I don't think I can do that without even bigger code churn. In the
non-mmap()-able case, array_size specifies the size of a single chunk of
memory, consisting of sizeof(struct bpf_array) bytes followed by the actual
data, and everything is done in one allocation. That's the current behavior
for arrays.
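
To illustrate, the existing single-allocation layout is conceptually this
(a rough sketch, not the exact patch code):

        /* sketch: non-mmapable case, header and data live in one chunk */
        array_size = sizeof(struct bpf_array) + (u64) max_entries * elem_size;
        array = bpf_map_area_alloc(array_size, numa_node);
        /* element i then lives at array->value + i * elem_size */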

For the BPF_F_MMAPABLE case, though, we have to do two separate allocations,
to make sure that the mmap()-able part is allocated with vmalloc() and is
page-aligned. So array_size tracks the number of bytes allocated for struct
bpf_array plus, optionally, per-CPU or non-mmapable array data, while
data_size is exclusively for the vmalloc()-ed, mmap()-able chunk of data.
If not for this, I'd just keep adjusting array_size.

So the invariant for the per-CPU and non-mmapable cases is: data_size = 0,
array_size = sizeof(struct bpf_array) + however much data we need. For the
mmapable case: array_size = sizeof(struct bpf_array), data_size = the actual
amount of array data, rounded up to page size.
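
Conceptually the BPF_F_MMAPABLE path then becomes (again a sketch; whether
it's vmalloc_user() or some other vmalloc variant is a detail here):

        /* sketch: mmapable case, header and data are separate allocations */
        array_size = sizeof(struct bpf_array);
        data_size  = round_up((u64) max_entries * elem_size, PAGE_SIZE);

        array = bpf_map_area_alloc(array_size, numa_node);
        data  = vmalloc_user(data_size);   /* page-aligned, mmap()-able */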


>         data_size = (u64) max_entries * (percpu ? sizeof(void *) : elem_size);
>         if (attr->map_flags & BPF_F_MMAPABLE)
>                 data_size = round_up(data_size, PAGE_SIZE);
>
>         cost = array_size + data_size;
>
> So we use data_size in all cases.
>
> Maybe also rename array_size.
>
>
> >       if (percpu)
> >               cost += (u64)attr->max_entries * elem_size * num_possible_cpus();
>
> And maybe we can also include this in data_size.

see above.

>
> [...]
>
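
For completeness, the freeze vs. writable-mmap interplay described in the
commit message boils down to roughly this (sketch only; field names such as
freeze_mutex follow the patch description, and the mmap()-side error code is
illustrative):

        /* sketch: BPF_MAP_FREEZE side */
        mutex_lock(&map->freeze_mutex);
        if (map->writecnt)
                err = -EBUSY;           /* writable mmap()s still active */
        else
                map->frozen = true;
        mutex_unlock(&map->freeze_mutex);

        /* sketch: mmap() side */
        mutex_lock(&map->freeze_mutex);
        if (writable && map->frozen)
                err = -EPERM;           /* no writable view of a frozen map */
        else if (writable)
                map->writecnt++;        /* decremented when the VMA closes */
        mutex_unlock(&map->freeze_mutex);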
