From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Yonghong Song <yhs@meta.com>, Hou Tao <houtao@huaweicloud.com>,
bpf <bpf@vger.kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Andrii Nakryiko <andrii@kernel.org>, Song Liu <song@kernel.org>,
Hao Luo <haoluo@google.com>, Yonghong Song <yhs@fb.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Jiri Olsa <jolsa@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
rcu@vger.kernel.org, Hou Tao <houtao1@huawei.com>
Subject: Re: [RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc
Date: Fri, 10 Feb 2023 13:06:52 -0800 [thread overview]
Message-ID: <CAADnVQLVi7CcW9ci62Dps4mxCEqHOYvYJ-Fant-0kSy0vPZ3AA@mail.gmail.com> (raw)
In-Reply-To: <20230210163258.phekigglpquitq33@apollo>
On Fri, Feb 10, 2023 at 8:33 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Wed, Jan 04, 2023 at 07:26:12PM CET, Alexei Starovoitov wrote:
> > On Tue, Jan 3, 2023 at 11:14 PM Yonghong Song <yhs@meta.com> wrote:
> > >
> > >
> > >
> > > On 1/3/23 10:30 PM, Hou Tao wrote:
> > > > Hi,
> > > >
> > > > On 1/4/2023 2:10 PM, Yonghong Song wrote:
> > > >>
> > > >>
> > > >> On 1/3/23 5:47 AM, Hou Tao wrote:
> > > >>> Hi,
> > > >>>
> > > >>> On 1/2/2023 2:48 AM, Yonghong Song wrote:
> > > >>>>
> > > >>>>
> > > >>>> On 12/31/22 5:26 PM, Alexei Starovoitov wrote:
> > > >>>>> On Fri, Dec 30, 2022 at 12:11:45PM +0800, Hou Tao wrote:
> > > >>>>>> From: Hou Tao <houtao1@huawei.com>
> > > >>>>>>
> > > >>>>>> Hi,
> > > >>>>>>
> > > >>>>>> The patchset tries to fix the problems found when checking how htab map
> > > >>>>>> handles element reuse in bpf memory allocator. The immediate reuse of
> > > >>>>>> freed elements may lead to two problems in htab map:
> > > >>>>>>
> > > >>>>>> (1) reuse will reinitialize special fields (e.g., bpf_spin_lock) in
> > > >>>>>> htab map value and it may corrupt lookup procedure with BFP_F_LOCK
> > > >>>>>> flag which acquires bpf-spin-lock during value copying. The
> > > >>>>>> corruption of bpf-spin-lock may result in hard lock-up.
> > > >>>>>> (2) lookup procedure may get incorrect map value if the found element is
> > > >>>>>> freed and then reused.
> > > >>>>>>
> > > >>>>>> Because the type of htab map elements are the same, so problem #1 can be
> > > >>>>>> fixed by supporting ctor in bpf memory allocator. The ctor initializes
> > > >>>>>> these special fields in map element only when the map element is newly
> > > >>>>>> allocated. If it is just a reused element, there will be no
> > > >>>>>> reinitialization.
> > > >>>>>
> > > >>>>> Instead of adding the overhead of ctor callback let's just
> > > >>>>> add __GFP_ZERO to flags in __alloc().
> > > >>>>> That will address the issue 1 and will make bpf_mem_alloc behave just
> > > >>>>> like percpu_freelist, so hashmap with BPF_F_NO_PREALLOC and default
> > > >>>>> will behave the same way.
> > > >>>>
> > > >>>> Patch https://lore.kernel.org/all/20220809213033.24147-3-memxor@gmail.com/
> > > >>>> tried to address a similar issue for lru hash table.
> > > >>>> Maybe we need to do similar things after bpf_mem_cache_alloc() for
> > > >>>> hash table?
> > > >>> IMO ctor or __GFP_ZERO will fix the issue. Did I miss something here ?
> > > >>
> > > >> The following is my understanding:
> > > >> in function alloc_htab_elem() (hashtab.c), we have
> > > >>
> > > >> if (is_map_full(htab))
> > > >> if (!old_elem)
> > > >> /* when map is full and update() is replacing
> > > >> * old element, it's ok to allocate, since
> > > >> * old element will be freed immediately.
> > > >> * Otherwise return an error
> > > >> */
> > > >> return ERR_PTR(-E2BIG);
> > > >> inc_elem_count(htab);
> > > >> l_new = bpf_mem_cache_alloc(&htab->ma);
> > > >> if (!l_new) {
> > > >> l_new = ERR_PTR(-ENOMEM);
> > > >> goto dec_count;
> > > >> }
> > > >> check_and_init_map_value(&htab->map,
> > > >> l_new->key + round_up(key_size, 8));
> > > >>
> > > >> In the above check_and_init_map_value() intends to do initializing
> > > >> for an element from bpf_mem_cache_alloc (could be reused from the free list).
> > > >>
> > > >> The check_and_init_map_value() looks like below (in include/linux/bpf.h)
> > > >>
> > > >> static inline void bpf_obj_init(const struct btf_field_offs *foffs, void *obj)
> > > >> {
> > > >> int i;
> > > >>
> > > >> if (!foffs)
> > > >> return;
> > > >> for (i = 0; i < foffs->cnt; i++)
> > > >> memset(obj + foffs->field_off[i], 0, foffs->field_sz[i]);
> > > >> }
> > > >>
> > > >> static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> > > >> {
> > > >> bpf_obj_init(map->field_offs, dst);
> > > >> }
> > > >>
> > > >> IIUC, bpf_obj_init() will bzero those fields like spin_lock, timer,
> > > >> list_head, list_node, etc.
> > > >>
> > > >> This is the problem for above problem #1.
> > > >> Maybe I missed something?
> > > > Yes. It is the problem patch #1 tries to fix exactly. Patch #1 tries to fix the
> > > > problem by only calling check_and_init_map_value() once for the newly-allocated
> > > > element, so if a freed element is reused, its special fields will not be zeroed
> > > > again. Is there any other cases which are not covered by the solution or any
> > > > other similar problems in hash-tab ?
> > >
> > > No, I checked all cases of check_and_init_map_value() and didn't find
> > > any other instances.
> >
> > check_and_init_map_value() is called in two other cases:
> > lookup_and_delete[_batch].
> > There the zeroing of the fields is necessary because the 'value'
> > is a temp buffer that is going to be copied to user space.
> > I think the way forward is to add GFP_ZERO to mem_alloc
> > (to make it equivalent to prealloc), remove one case
> > of check_and_init_map_value from hashmap, add short comments
> > to two other cases and add a big comment to check_and_init_map_value()
> > that should say that 'dst' must be a temp buffer and should not
> > point to memory that could be used in parallel by a bpf prog.
> > It feels like we've dealt with this issue a couple times already
> > and keep repeating this mistake, so the more comments the better.
>
> Hou, are you plannning to resubmit this change? I also hit this while testing my
> changes on bpf-next.
Are you talking about the whole patch set or just GFP_ZERO in mem_alloc?
The former will take a long time to settle.
The latter is trivial.
To unblock yourself just add GFP_ZERO in an extra patch?
next prev parent reply other threads:[~2023-02-10 21:07 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-30 4:11 [RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc Hou Tao
2022-12-30 4:11 ` [RFC PATCH bpf-next 1/6] bpf: Support ctor in bpf memory allocator Hou Tao
2022-12-30 4:11 ` [RFC PATCH bpf-next 2/6] bpf: Factor out a common helper free_llist() Hou Tao
2022-12-30 4:11 ` [RFC PATCH bpf-next 3/6] bpf: Pass bitwise flags to bpf_mem_alloc_init() Hou Tao
2022-12-30 4:11 ` [RFC PATCH bpf-next 4/6] bpf: Introduce BPF_MA_NO_REUSE for bpf memory allocator Hou Tao
2022-12-30 4:11 ` [RFC PATCH bpf-next 5/6] bpf: Use BPF_MA_NO_REUSE in htab map Hou Tao
2022-12-30 4:11 ` [RFC PATCH bpf-next 6/6] selftests/bpf: Add test case for element reuse " Hou Tao
2023-01-01 1:26 ` [RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc Alexei Starovoitov
2023-01-01 18:48 ` Yonghong Song
2023-01-03 13:47 ` Hou Tao
2023-01-04 6:10 ` Yonghong Song
2023-01-04 6:30 ` Hou Tao
2023-01-04 7:14 ` Yonghong Song
2023-01-04 18:26 ` Alexei Starovoitov
2023-02-10 16:32 ` Kumar Kartikeya Dwivedi
2023-02-10 21:06 ` Alexei Starovoitov [this message]
2023-02-11 1:09 ` Hou Tao
2023-02-11 16:33 ` Alexei Starovoitov
2023-02-11 16:34 ` Alexei Starovoitov
2023-02-15 1:54 ` Martin KaFai Lau
2023-02-15 4:02 ` Hou Tao
2023-02-15 7:22 ` Martin KaFai Lau
2023-02-16 2:11 ` Hou Tao
2023-02-16 7:47 ` Martin KaFai Lau
2023-02-16 8:18 ` Hou Tao
2023-02-16 13:55 ` Hou Tao
2023-02-16 16:35 ` Alexei Starovoitov
2023-02-17 1:19 ` Hou Tao
2023-02-22 19:30 ` Alexei Starovoitov
2023-02-15 2:35 ` Hou Tao
2023-02-15 2:42 ` Alexei Starovoitov
2023-02-15 3:00 ` Hou Tao
2023-01-03 13:40 ` Hou Tao
2023-01-03 19:38 ` Alexei Starovoitov
2023-01-10 6:26 ` Martin KaFai Lau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAADnVQLVi7CcW9ci62Dps4mxCEqHOYvYJ-Fant-0kSy0vPZ3AA@mail.gmail.com \
--to=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=haoluo@google.com \
--cc=houtao1@huawei.com \
--cc=houtao@huaweicloud.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=martin.lau@linux.dev \
--cc=memxor@gmail.com \
--cc=paulmck@kernel.org \
--cc=rcu@vger.kernel.org \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=yhs@fb.com \
--cc=yhs@meta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).