From: Yonghong Song <yhs@meta.com>
To: David Vernet <void@manifault.com>
Cc: Yonghong Song <yhs@fb.com>,
	bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com, KP Singh <kpsingh@kernel.org>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next v2 2/6] bpf: Implement cgroup storage available to non-cgroup-attached bpf progs
Date: Sun, 23 Oct 2022 09:45:35 -0700	[thread overview]
Message-ID: <95ff1fa3-124b-6886-64e0-adcf40085e55@meta.com> (raw)
In-Reply-To: <Y1NdLah/c38isGT+@maniforge.dhcp.thefacebook.com>



On 10/21/22 8:02 PM, David Vernet wrote:
> On Fri, Oct 21, 2022 at 03:57:15PM -0700, Yonghong Song wrote:
> 
> [...]
> 
>>>>>> +	 * could be modifying the local_storage->list now.
>>>>>> +	 * Thus, no elem can be added-to or deleted-from the
>>>>>> +	 * local_storage->list by the bpf_prog or by the bpf-map's syscall.
>>>>>> +	 *
>>>>>> +	 * It is racing with bpf_local_storage_map_free() alone
>>>>>> +	 * when unlinking elem from the local_storage->list and
>>>>>> +	 * the map's bucket->list.
>>>>>> +	 */
>>>>>> +	bpf_cgrp_storage_lock();
>>>>>> +	raw_spin_lock_irqsave(&local_storage->lock, flags);
>>>>>> +	hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
>>>>>> +		bpf_selem_unlink_map(selem);
>>>>>> +		free_cgroup_storage =
>>>>>> +			bpf_selem_unlink_storage_nolock(local_storage, selem, false, false);
>>>>>
>>>>> This still requires a comment explaining why it's OK to overwrite
>>>>> free_cgroup_storage with a previous value from calling
>>>>> bpf_selem_unlink_storage_nolock(). Even if that is safe, this looks like
>>>>> a pretty weird programming pattern, and IMO doing this feels more
>>>>> intentional and future-proof:
>>>>>
>>>>> if (bpf_selem_unlink_storage_nolock(local_storage, selem, false, false))
>>>>> 	free_cgroup_storage = true;
>>>>
>>>> We have a comment a few lines below.
>>>>     /* free_cgroup_storage should always be true as long as
>>>>      * local_storage->list was non-empty.
>>>>      */
>>>>     if (free_cgroup_storage)
>>>> 	kfree_rcu(local_storage, rcu);
>>>
>>> IMO that comment doesn't provide much useful information -- it states an
>>> assumption, but doesn't give a reason for it.
>>>
>>>> I will add more explanation in the above code like
>>>>
>>>> 	bpf_selem_unlink_map(selem);
>>>> 	/* If the local_storage list has only one element,
>>>> 	 * bpf_selem_unlink_storage_nolock() will return true.
>>>> 	 * Otherwise, it will return false. This loop intends to
>>>> 	 * remove all local storage, so the last iteration of the
>>>> 	 * loop will set free_cgroup_storage to true.
>>>> 	 */
>>>> 	free_cgroup_storage =
>>>> 		bpf_selem_unlink_storage_nolock(local_storage, selem, false, false);
>>>
>>> Thanks, this is the type of comment I was looking for.
>>>
>>> Also, I realize this was copy-pasted from a number of other possible
>>> locations in the codebase which are doing the same thing, but I still
>>> think this pattern is an odd and brittle way to do this. We're relying
>>> on an abstracted implementation detail of
>>> bpf_selem_unlink_storage_nolock() for correctness, which IMO is a signal
>>> that bpf_selem_unlink_storage_nolock() should probably be the one
>>> invoking kfree_rcu() on behalf of callers in the first place.  It looks
>>> like all of the callers end up calling kfree_rcu() on the struct
>>> bpf_local_storage * if bpf_selem_unlink_storage_nolock() returns true,
>>> so can we just move the responsibility of freeing the local storage
>>> object down into bpf_selem_unlink_storage_nolock() where it's unlinked?
>>
>> We probably cannot do this. bpf_selem_unlink_storage_nolock()
>> is inside the rcu_read_lock() region. We do kfree_rcu() outside
>> the rcu_read_lock() region.
> 
> kfree_rcu() is non-blocking and is safe to invoke from within an RCU
> read region. If you invoke it within an RCU read region, the object will
> not be kfree'd until (at least) you exit the current read region, so I
> believe that the net effect here should be the same whether it's done in
> bpf_selem_unlink_storage_nolock(), or in the caller after the RCU read
> region is exited.

Okay. We probably still want to do kfree_rcu() outside
bpf_selem_unlink_storage_nolock(), since that function's purpose is to
unlink the storage for a particular selem.

We could move
	if (free_cgroup_storage)
		kfree_rcu(local_storage, rcu);
immediately after the hlist_for_each_entry_safe() loop, but I think
putting that 'if' statement after rcu_read_unlock() is slightly better,
as it avoids adding code inside the lock region.
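
In other words, the free path keeps the shape it already has in the
patch. A condensed sketch of the intended ordering, assembled from the
code quoted above:

	rcu_read_lock();
	...
	bpf_cgrp_storage_lock();
	raw_spin_lock_irqsave(&local_storage->lock, flags);
	hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
		bpf_selem_unlink_map(selem);
		/* Becomes true on the iteration that unlinks the last selem. */
		free_cgroup_storage =
			bpf_selem_unlink_storage_nolock(local_storage, selem, false, false);
	}
	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
	bpf_cgrp_storage_unlock();
	rcu_read_unlock();

	/* Outside both the lock region and the RCU read section. */
	if (free_cgroup_storage)
		kfree_rcu(local_storage, rcu);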

> 
>>> IMO this can be done in a separate patch set, if we decide it's worth
>>> doing at all.
>>>
>>>>>
>>>>>> +	}
>>>>>> +	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
>>>>>> +	bpf_cgrp_storage_unlock();
>>>>>> +	rcu_read_unlock();
>>>>>> +
>>>>>> +	/* free_cgroup_storage should always be true as long as
>>>>>> +	 * local_storage->list was non-empty.
>>>>>> +	 */
>>>>>> +	if (free_cgroup_storage)
>>>>>> +		kfree_rcu(local_storage, rcu);
>>>>>> +}
>>>>>> +
>>>>>> +static struct bpf_local_storage_data *
>>>>>> +cgroup_storage_lookup(struct cgroup *cgroup, struct bpf_map *map, bool cacheit_lockit)
>>>>>> +{
>>>>>> +	struct bpf_local_storage *cgroup_storage;
>>>>>> +	struct bpf_local_storage_map *smap;
>>>>>> +
>>>>>> +	cgroup_storage = rcu_dereference_check(cgroup->bpf_cgrp_storage,
>>>>>> +					       bpf_rcu_lock_held());
>>>>>> +	if (!cgroup_storage)
>>>>>> +		return NULL;
>>>>>> +
>>>>>> +	smap = (struct bpf_local_storage_map *)map;
>>>>>> +	return bpf_local_storage_lookup(cgroup_storage, smap, cacheit_lockit);
>>>>>> +}
>>>>>> +
>>>>>> +static void *bpf_cgrp_storage_lookup_elem(struct bpf_map *map, void *key)
>>>>>> +{
>>>>>> +	struct bpf_local_storage_data *sdata;
>>>>>> +	struct cgroup *cgroup;
>>>>>> +	int fd;
>>>>>> +
>>>>>> +	fd = *(int *)key;
>>>>>> +	cgroup = cgroup_get_from_fd(fd);
>>>>>> +	if (IS_ERR(cgroup))
>>>>>> +		return ERR_CAST(cgroup);
>>>>>> +
>>>>>> +	bpf_cgrp_storage_lock();
>>>>>> +	sdata = cgroup_storage_lookup(cgroup, map, true);
>>>>>> +	bpf_cgrp_storage_unlock();
>>>>>> +	cgroup_put(cgroup);
>>>>>> +	return sdata ? sdata->data : NULL;
>>>>>> +}
>>>>>
>>>>> Stanislav pointed out in the v1 revision that there's a lot of very
>>>>> similar logic in task storage, and I think you'd mentioned that you were
>>>>> going to think about generalizing some of that. Have you had a chance to
>>>>> consider?
>>>>
>>>> It is hard to have a common function for
>>>> lookup_elem/update_elem/delete_elem(). They are quite different, as
>>>> each heavily involves task/cgroup-specific functions.
>>>
>>> Yes agreed, each implementation is acquiring their own references, and
>>> finding the backing element in whatever way it was implemented, etc.
>>>
>>>> but map_alloc and map_free could have common helpers.
>>>
>>> Agreed, and many of the static functions that are invoked on those
>>> paths, such as bpf_cgrp_storage_free(), bpf_cgrp_storage_lock(), etc.,
>>> possibly could be shared as well. In general this feels like something
>>> we could pretty easily simplify using something like a structure with
>>> callbacks to implement the pieces of logic that are specific to each
>>> local storage type, such as getting the struct bpf_local_storage __rcu *
>>> pointer from some context (e.g. cgroup_storage_ptr()). It doesn't
>>> necessarily need to block this change, but IMO we should clean this up
>>> soon, because a lot of this is nearly a 100% copy-paste of other local
>>> storage implementations.
>>
>> Further refactoring is possible. Martin is working to simplify the
>> locking mechanism. We can wait for that to be done before doing the
>> refactoring.
> 
> Sounds great, thanks!
> 
> - David
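
For the record, the callback-based generalization described above could
look something along these lines. This is only a sketch of the
direction; the struct and member names below are hypothetical, not an
existing kernel interface:

	/* Hypothetical per-storage-type ops, so that the generic code in
	 * bpf_local_storage.c could share map_alloc/map_free and the
	 * surrounding lock/unlock and owner-pointer plumbing.
	 */
	struct bpf_local_storage_type_ops {
		/* Resolve the owner's storage pointer, e.g. what
		 * cgroup_storage_ptr() does for a struct cgroup.
		 */
		struct bpf_local_storage __rcu **(*storage_ptr)(void *owner);
		/* Type-specific locking, e.g. bpf_cgrp_storage_lock() and
		 * bpf_cgrp_storage_unlock() for cgroup storage.
		 */
		void (*lock)(void);
		void (*unlock)(void);
	};

Each of task, inode and cgroup local storage would then fill in its own
ops, and the shared code would call through them instead of each type
duplicating the boilerplate.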

Thread overview: 23+ messages
2022-10-20 22:12 [PATCH bpf-next v2 0/6] bpf: Implement cgroup local storage available to non-cgroup-attached bpf progs Yonghong Song
2022-10-20 22:13 ` [PATCH bpf-next v2 1/6] bpf: Make struct cgroup btf id global Yonghong Song
2022-10-20 22:13 ` [PATCH bpf-next v2 2/6] bpf: Implement cgroup storage available to non-cgroup-attached bpf progs Yonghong Song
2022-10-21  5:22   ` David Vernet
2022-10-21  5:26     ` David Vernet
2022-10-21 17:33     ` Yonghong Song
2022-10-21 19:57       ` David Vernet
2022-10-21 22:57         ` Yonghong Song
2022-10-22  3:02           ` David Vernet
2022-10-23 16:45             ` Yonghong Song [this message]
2022-10-23 21:14               ` David Vernet
     [not found]   ` <202210210932.nHqTyTmx-lkp@intel.com>
2022-10-21 16:51     ` Yonghong Song
2022-10-21 19:29   ` Yosry Ahmed
2022-10-21 21:05     ` Yonghong Song
2022-10-20 22:13 ` [PATCH bpf-next v2 3/6] libbpf: Support new cgroup local storage Yonghong Song
2022-10-21 23:10   ` Andrii Nakryiko
2022-10-22  0:32     ` Yonghong Song
2022-10-22  1:05       ` Tejun Heo
2022-10-20 22:13 ` [PATCH bpf-next v2 4/6] bpftool: " Yonghong Song
2022-10-20 22:13 ` [PATCH bpf-next v2 5/6] selftests/bpf: Add selftests for " Yonghong Song
2022-10-20 22:13 ` [PATCH bpf-next v2 6/6] docs/bpf: Add documentation for map type BPF_MAP_TYPE_CGRP_STORAGE Yonghong Song
2022-10-21  7:12   ` David Vernet
2022-10-21 17:46     ` Yonghong Song
