From: "Paul E. McKenney" <paulmck@kernel.org>
To: KP Singh <kpsingh@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>, Jann Horn <jannh@google.com>,
	Florent Revest <revest@chromium.org>,
	Brendan Jackman <jackmanb@chromium.org>,
	Yonghong Song <yhs@fb.com>
Subject: Re: [PATCH bpf-next 1/2] bpf: Allow bpf_local_storage to be used by sleepable programs
Date: Fri, 3 Dec 2021 17:01:52 -0800
Message-ID: <20211204010152.GA3967770@paulmck-ThinkPad-P17-Gen-1>
In-Reply-To: <20211130225129.GB641268@paulmck-ThinkPad-P17-Gen-1>

On Tue, Nov 30, 2021 at 02:51:29PM -0800, Paul E. McKenney wrote:
> On Tue, Nov 30, 2021 at 05:22:25PM +0100, KP Singh wrote:
> > On Tue, Nov 30, 2021 at 3:34 AM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Wed, Nov 24, 2021 at 11:20:40PM +0100, KP Singh wrote:
> > > > On Tue, Nov 23, 2021 at 11:30 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > >
> > > > > On Tue, Nov 23, 2021 at 10:22:04AM -0800, Paul E. McKenney wrote:
> > > > > > On Tue, Nov 23, 2021 at 06:11:14PM +0100, KP Singh wrote:
> > > > > > > On Thu, Sep 2, 2021 at 6:45 AM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > > > > I think the global lock will be an issue for the current non-sleepable
> > > > > > > > netdev bpf-prog, which could be triggered by external traffic, so a flag
> > > > > > > > is needed here to provide a fast path.  I suspect other non-prealloc maps
> > > > > > > > may need it in the future, so probably
> > > > > > > > s/BPF_F_SLEEPABLE_STORAGE/BPF_F_SLEEPABLE/ instead.
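
(For illustration only: a BTF-defined map declaration carrying such a
flag might look like the sketch below.  BPF_F_SLEEPABLE is the flag
being proposed in this thread, not an existing UAPI flag, and
struct local_value is made up for the example.)

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct local_value {
            __u64 counter;          /* example payload */
    };

    /* BTF-defined inode local storage map; local storage maps must use
     * BPF_F_NO_PREALLOC, and the proposed BPF_F_SLEEPABLE would opt
     * this map in to tasks-trace-RCU-deferred frees.
     */
    struct {
            __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
            __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_SLEEPABLE);
            __type(key, int);
            __type(value, struct local_value);
    } inode_map SEC(".maps");
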
> > > > > > >
> > > > > > > I was re-working the patches and had a couple of questions.
> > > > > > >
> > > > > > > There are two data structures that get freed under RCU here:
> > > > > > >
> > > > > > > struct bpf_local_storage
> > > > > > > struct bpf_local_storage_elem
> > > > > > >
> > > > > > > We can choose to free the bpf_local_storage_elem under
> > > > > > > call_rcu_tasks_trace based on
> > > > > > > whether the map it belongs to is sleepable, with something like:
> > > > > > >
> > > > > > > if (selem->sdata.smap->map.map_flags & BPF_F_SLEEPABLE_STORAGE)
> > > > > Paul's current work (mentioned in his previous email) will improve the
> > > > > performance of call_rcu_tasks_trace, so it can probably avoid the
> > > > > new BPF_F_SLEEPABLE flag and make it easier to use.
> > > > >
> > > > > > >     call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_rcu);
> > > > > > > else
> > > > > > >     kfree_rcu(selem, rcu);
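
(For reference, a minimal sketch of what such a callback could look
like, assuming the element embeds a struct rcu_head named "rcu"; the
actual bpf_selem_free_rcu in the patch may differ:)

    static void bpf_selem_free_rcu(struct rcu_head *rcu)
    {
            struct bpf_local_storage_elem *selem;

            /* Recover the element from its embedded rcu_head.  By the
             * time this callback runs, the grace period of whichever
             * RCU flavor it was queued on has elapsed, so no reader of
             * that flavor can still hold a reference.
             */
            selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
            kfree(selem);
    }
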
> > > > > > >
> > > > > > > Questions:
> > > > > > >
> > > > > > > * Can we free bpf_local_storage under kfree_rcu by ensuring it's
> > > > > > >   always accessed in a classical RCU critical section?
> > > > > > >   Or maybe I am missing something and this also needs to be freed
> > > > > > >   under trace RCU if any of the selems are from a sleepable map.
> > > > > In inode_storage_lookup() in this patch:
> > > > >
> > > > > +#define bpf_local_storage_rcu_lock_held()                      \
> > > > > +       (rcu_read_lock_held() || rcu_read_lock_trace_held() ||  \
> > > > > +        rcu_read_lock_bh_held())
> > > > >
> > > > > @@ -44,7 +45,8 @@ static struct bpf_local_storage_data *inode_storage_lookup(struct inode *inode,
> > > > >         if (!bsb)
> > > > >                 return NULL;
> > > > >
> > > > > -       inode_storage = rcu_dereference(bsb->storage);
> > > > > +       inode_storage = rcu_dereference_protected(bsb->storage,
> > > > > +                                                 bpf_local_storage_rcu_lock_held());
> > > > >
> > > > > Thus, it is not always in a classical RCU critical section.
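
(To make that concrete, a sketch of the sleepable case, with the
surrounding context assumed rather than taken from the patch: the
program holds only rcu_read_lock_trace(), so an object freed with
plain kfree_rcu() could be reclaimed out from under it:)

    rcu_read_lock_trace();  /* sleepable BPF program context */
    sdata = inode_storage_lookup(inode, map, false);
    /* ... the program may sleep here, with no rcu_read_lock() held,
     * before it is done with sdata ...
     */
    rcu_read_unlock_trace();
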
> > > > >
> > > > > > >
> > > > > > > * There is an issue with nested raw spinlocks, e.g. in
> > > > > > > bpf_inode_storage.c:bpf_inode_storage_free
> > > > > > >
> > > > > > >   hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
> > > > > > >           /* Always unlink from map before unlinking from
> > > > > > >            * local_storage.
> > > > > > >            */
> > > > > > >           bpf_selem_unlink_map(selem);
> > > > > > >           free_inode_storage = bpf_selem_unlink_storage_nolock(
> > > > > > >                   local_storage, selem, false);
> > > > > > >   }
> > > > > > >   raw_spin_unlock_bh(&local_storage->lock);
> > > > > > >
> > > > > > > in bpf_selem_unlink_storage_nolock (if we add the above logic with the
> > > > > > > flag in place of kfree_rcu):
> > > > > > > call_rcu_tasks_trace grabs a spinlock, and spinlocks cannot be nested
> > > > > > > inside a raw spinlock.
> > > > > > >
> > > > > > > I am moving the freeing code out of the spinlock, saving the selems on
> > > > > > > a local list and then doing the free RCU (trace or normal) callbacks
> > > > > > > at the end. WDYT?
> > > > > There could be more than one selem to save.
> > > >
> > > > Yes, that's why I was saving them on a local list and then calling
> > > > kfree_rcu or call_rcu_tasks_trace after unlocking the raw_spin_lock
> > > >
> > > > INIT_HLIST_HEAD(&free_list);
> > > > raw_spin_lock_irqsave(&local_storage->lock, flags);
> > > > hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
> > > >     bpf_selem_unlink_map(selem);
> > > >     free_local_storage = bpf_selem_unlink_storage_nolock(
> > > >         local_storage, selem, false);
> > > >     hlist_add_head(&selem->snode, &free_list);
> > > > }
> > > > raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> > > >
> > > > /* The elements need to be freed outside the raw spinlock because
> > > > * spinlocks cannot nest inside raw spinlocks, and call_rcu_tasks_trace
> > > > * grabs a spinlock when the RCU code calls into the scheduler.
> > > > *
> > > > * free_local_storage should always be true as long as
> > > > * local_storage->list was non-empty.
> > > > */
> > > > hlist_for_each_entry_safe(selem, n, &free_list, snode) {
> > > >     if (selem->sdata.smap->map.map_flags & BPF_F_SLEEPABLE_STORAGE)
> > > >         call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_rcu);
> > > >     else
> > > >         kfree_rcu(selem, rcu);
> > > > }
> > > >
> > > > But... we won't need this anymore.
> > > Yep, Paul's work (thanks!) will make this piece simpler.
> > 
> > +100
> > 
> > >
> > > KP, this set functionally does not depend on Paul's changes.
> > > Do you want to spin a new version so that it can be reviewed in parallel?
> > 
> > Sure, I will fix the remaining issues (i.e. with RCU locks and renames) and
> > spin a new version.
> > 
> > > When the rcu-tasks changes land in -next, they can probably
> > > be merged into bpf-next first before landing the sleepable
> > > bpf storage work.
> 
> And I just now got both the expand-queues and shrink-queues code at
> least pretending to work, and it will be picked up in the next -next.
> I was probably too late for today's edition, but there is always tomorrow.
> 
> There are probably still bugs, but it is passing much nastier tests than
> a couple of weeks ago, so here is hoping...

And this is now in -next.  Please let me know how it goes!

								Thanx, Paul


Thread overview: 28+ messages
2021-08-26 23:51 [PATCH bpf-next 0/2] Sleepable local storage KP Singh
2021-08-26 23:51 ` [PATCH bpf-next 1/2] bpf: Allow bpf_local_storage to be used by sleepable programs KP Singh
2021-08-27 20:55   ` Martin KaFai Lau
2021-08-29 21:52     ` KP Singh
2021-08-31  2:11       ` Martin KaFai Lau
2021-08-31  9:50         ` KP Singh
2021-08-31 18:22           ` Martin KaFai Lau
2021-08-31 19:38             ` KP Singh
2021-09-01  6:32               ` Martin KaFai Lau
2021-09-01 20:26                 ` Paul E. McKenney
2021-09-02  4:44                   ` Martin KaFai Lau
2021-11-23 17:11                     ` KP Singh
2021-11-23 18:22                       ` Paul E. McKenney
2021-11-23 22:29                         ` Martin KaFai Lau
2021-11-23 23:14                           ` KP Singh
2021-11-24  0:18                             ` Martin KaFai Lau
2021-11-24 22:20                           ` KP Singh
2021-11-30  2:34                             ` Martin KaFai Lau
2021-11-30 16:22                               ` KP Singh
2021-11-30 22:51                                 ` Paul E. McKenney
2021-12-04  1:01                                   ` Paul E. McKenney [this message]
2021-12-05  2:27                                     ` KP Singh
2021-12-05  3:52                                       ` Paul E. McKenney
2021-11-23 23:11                         ` KP Singh
2021-11-25  3:47                           ` Paul E. McKenney
2021-09-30 18:46                 ` Martin KaFai Lau
2021-11-02 16:00                   ` KP Singh
2021-08-26 23:51 ` [PATCH bpf-next 2/2] bpf/selftests: Update local storage selftest for " KP Singh
