Re: [PATCH bpf] libbpf: count present CPUs, not theoretically possible

From: Alan Maguire <alan.maguire@oracle.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Alan Maguire <alan.maguire@oracle.com>,
	Andrii Nakryiko <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
	Networking <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf] libbpf: count present CPUs, not theoretically possible
Date: Sat, 28 Sep 2019 18:46:22 +0100 (BST)	[thread overview]
Message-ID: <alpine.LRH.2.20.1909281844480.5332@dhcp-10-175-218-65.vpn.oracle.com> (raw)
In-Reply-To: <CAEf4BzYTQbVVUiQOHEcjL8mZ=iJBSGHFRNxoayRpaMw5ys+iDw@mail.gmail.com>

On Sat, 28 Sep 2019, Andrii Nakryiko wrote:

> On Sat, Sep 28, 2019 at 4:20 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >
> > On Fri, 27 Sep 2019, Andrii Nakryiko wrote:
> >
> > > This patch switches libbpf_num_possible_cpus() from using possible CPU
> > > set to present CPU set. This fixes issues with incorrect auto-sizing of
> > > PERF_EVENT_ARRAY map on HOTPLUG-enabled systems.
> > >
> > > On HOTPLUG enabled systems, /sys/devices/system/cpu/possible is going to
> > > be a set of any representable (i.e., potentially possible) CPU, which is
> > > normally way higher than real amount of CPUs (e.g., 0-127 on VM I've
> > > tested on, while there were just two CPU cores actually present).
> > > /sys/devices/system/cpu/present, on the other hand, will only contain
> > > CPUs that are physically present in the system (even if not online yet),
> > > which is what we really want, especially when creating per-CPU maps or
> > > perf events.
> > >
> > > On systems with HOTPLUG disabled, present and possible are identical, so
> > > there is no change of behavior there.
> > >
> >
> > Just curious - is there a reason for not adding a new libbpf_num_present_cpus()
> > function to cover this  case, and switching to using that in various places?
> 
> The reason is that libbpf_num_possible_cpus() is useless on HOTPLUG
> systems and never worked as intended. If you rely on this function to
> create perf_buffer and/or PERF_EVENT_ARRAY, it will simply fail due to
> specifying more CPUs than are present. I didn't want to keep adding
> new APIs for no good reason, while also leaving useless ones, so I
> fixed the existing API to behave as expected. It's unfortunate that
> name doesn't match sysfs file we are reading it from, of course, but
> having people to choose between libbpf_num_possible_cpus() vs
> libbpf_num_present_cpus() seems like even bigger problem, as
> differences are non-obvious.
> 
> The good thing, it won't break all the non-HOTPLUG systems for sure,
> which seems to be the only cases that are used right now (otherwise
> someone would already complain about broken
> libbpf_num_possible_cpus()).
>

Understood, thanks for the explanation. 

> >
> > Looking at the places libbpf_num_possible_cpus() is called in libbpf
> >
> > - __perf_buffer__new(): this could just change to use the number of
> >   present CPUs, since perf_buffer__new_raw() with a cpu_cnt in struct
> >   perf_buffer_raw_ops
> >
> > - bpf_object__create_maps(), which is called via bpf_oject__load_xattr().
> >   In this case it seems like switching to num present makes sense, though
> >   it might make sense to add a field to struct bpf_object_load_attr * to
> >   allow users to explicitly set another max value.
> 
> I believe more knobs is not always better for API. Plus, adding any
> field to those xxx_xattr structs is an ABI breakage and requires
> adding new APIs, so I don't think this is good enough reason to add
> new flag. See discussion in another thread about this whole API design
> w/ current attributes and ABI consequences of adding anything new to
> them.
> 
> >
> > This would give the desired default behaviour, while still giving users
> > a way of specifying the possible number. What do you think? Thanks!
> 
> BTW, if user wants to override the size of maps, they can do it easily
> either in map definition or programmatically after bpf_object__open,
> but before bpf_object__load, so there is no need for flags, it's all
> easily achievable with existing API.
> 

Ah, I missed that. Thanks for clarifying!

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>

> >
> > Alan
> >
> > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > ---
> > >  tools/lib/bpf/libbpf.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > index e0276520171b..45351c074e45 100644
> > > --- a/tools/lib/bpf/libbpf.c
> > > +++ b/tools/lib/bpf/libbpf.c
> > > @@ -5899,7 +5899,7 @@ void bpf_program__bpil_offs_to_addr(struct bpf_prog_info_linear *info_linear)
> > >
> > >  int libbpf_num_possible_cpus(void)
> > >  {
> > > -     static const char *fcpu = "/sys/devices/system/cpu/possible";
> > > +     static const char *fcpu = "/sys/devices/system/cpu/present";
> > >       int len = 0, n = 0, il = 0, ir = 0;
> > >       unsigned int start = 0, end = 0;
> > >       int tmp_cpus = 0;
> > > --
> > > 2.17.1
> > >
> > >
>