bpf.vger.kernel.org archive mirror
From: Jakub Kicinski <jakub.kicinski@netronome.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Andrii Nakryiko <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
	Networking <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>, Rik van Riel <riel@surriel.com>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v2 bpf-next 1/3] bpf: add mmap() support for BPF_MAP_TYPE_ARRAY
Date: Tue, 12 Nov 2019 14:38:17 -0800
Message-ID: <20191112143817.0c0eb768@cakuba>
In-Reply-To: <CAEf4Bzbx0WvgX9uGF4U1HM41m6kfdvWHCeYBSBRnQhR3egGy5w@mail.gmail.com>

On Tue, 12 Nov 2019 14:03:50 -0800, Andrii Nakryiko wrote:
> On Tue, Nov 12, 2019 at 11:17 AM Jakub Kicinski wrote:
> > On Mon, 11 Nov 2019 18:06:42 -0800, Andrii Nakryiko wrote:  
> > > So let's say if sizeof(struct bpf_array) is 300, then I'd have to either:
> > >
> > > - somehow make sure that I allocate 4k (for data) + 300 (for struct
> > > bpf_array) in such a way that those 4k of data are 4k-aligned. Is
> > > there any way to do that?
> > > - assuming there isn't, then another way would be to allocate entire
> > > 4k page for struct bpf_array itself, but put it at the end of that
> > > page, so that 4k of data is 4k-aligned. While wasteful, the bigger
> > > problem is that pointer to bpf_array is not a pointer to allocated
> > > memory anymore, so we'd need to remember that and adjust address
> > > before calling vfree().
> > >
> > > Were you suggesting #2 as a solution? Or am I missing some other way to do this?  
> >
> > I am suggesting #2, that's the way to do it in the kernel.  
> So I'm concerned about this approach, because it feels like a bunch of
> unnecessarily wasted memory. While there is no way around doing
> round_up(PAGE_SIZE) for the data itself, it certainly is not necessary
> to waste almost an entire page on struct bpf_array. And given this is
> going to be used for BPF maps backing global variables, there will
> most probably be at least 3 (.data, .bss, .rodata) per program, and
> could be more. Also, while on x86_64 a page is 4k, on other
> architectures it can be up to 64KB, so this seems wasteful.

With the extra mutex and int you grew struct bpf_map from 192B to 256B;
that's for every map on the system, unconditionally, and the array map
carries an extra pointer even when it doesn't need it.

Increasing "wasted" space when an opt-in feature is selected doesn't
seem all that terrible to me, especially since the overhead of rounding
the map size up to page size is already necessary.

> What's your concern exactly with the way it's implemented in this patch?

Judging by other threads we seem to care about the performance of BPF
(rightly so). Doing an extra pointer deref for every static data access
seems like an obvious waste.

But then again, it's just an obvious suggestion, take it or leave it.

Thread overview: 15+ messages
2019-11-09  8:06 [PATCH v2 bpf-next 0/3] Add support for memory-mapping BPF array maps Andrii Nakryiko
2019-11-09  8:06 ` [PATCH v2 bpf-next 1/3] bpf: add mmap() support for BPF_MAP_TYPE_ARRAY Andrii Nakryiko
2019-11-11 16:40   ` Song Liu
2019-11-11 18:37   ` Jakub Kicinski
2019-11-12  2:06     ` Andrii Nakryiko
2019-11-12 19:17       ` Jakub Kicinski
2019-11-12 22:03         ` Andrii Nakryiko
2019-11-12 22:38           ` Jakub Kicinski [this message]
2019-11-13  3:19             ` Andrii Nakryiko
2019-11-11 18:39   ` Jakub Kicinski
2019-11-12  2:01     ` Andrii Nakryiko
2019-11-09  8:06 ` [PATCH v2 bpf-next 2/3] libbpf: make global data internal arrays mmap()-able, if possible Andrii Nakryiko
2019-11-11 18:40   ` Jakub Kicinski
2019-11-12  2:11     ` Andrii Nakryiko
2019-11-09  8:06 ` [PATCH v2 bpf-next 3/3] selftests/bpf: add BPF_TYPE_MAP_ARRAY mmap() tests Andrii Nakryiko
