From: Yan Zhai <yan@cloudflare.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>,
	netdev@vger.kernel.org,  "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	 Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>, Jiri Pirko <jiri@resnulli.us>,
	 Simon Horman <horms@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	 Lorenzo Bianconi <lorenzo@kernel.org>,
	Coco Li <lixiaoyan@google.com>, Wei Wang <weiwan@google.com>,
	 Alexander Duyck <alexanderduyck@fb.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	 linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
	bpf@vger.kernel.org,  kernel-team@cloudflare.com,
	Joel Fernandes <joel@joelfernandes.org>,
	 Toke Hoiland-Jorgensen <toke@redhat.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	 Steven Rostedt <rostedt@goodmis.org>,
	Jesper Dangaard Brouer <hawk@kernel.org>
Subject: Re: [PATCH v4 net 1/3] rcu: add a helper to report consolidated flavor QS
Date: Mon, 18 Mar 2024 21:39:42 -0500
Message-ID: <CAO3-PbrssUurD5dpMjxNduYhUj8dAikuwOHgZDn78o+Jqv_dBA@mail.gmail.com>
In-Reply-To: <CAO3-Pbp6fCayWeJ11U6JtqHn-Rs3OFXoZ9uMohUefSYUvSGUKA@mail.gmail.com>

On Mon, Mar 18, 2024 at 9:32 PM Yan Zhai <yan@cloudflare.com> wrote:
>
> On Mon, Mar 18, 2024 at 5:59 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Fri, Mar 15, 2024 at 10:40:56PM -0700, Paul E. McKenney wrote:
> > > On Fri, Mar 15, 2024 at 12:55:03PM -0700, Yan Zhai wrote:
> > > > > There are several scenarios in network processing that can run
> > > > > extensively under heavy traffic. In such situations, RCU synchronization
> > > > > might not observe the desired quiescent states for an indefinitely long
> > > > > period. Create a helper to safely raise the desired RCU quiescent states
> > > > > for such scenarios.
> > > >
> > > > Currently the frequency is locked at HZ/10, i.e. 100ms, which is
> > > > sufficient to address existing problems around RCU tasks. It's unclear
> > > > yet if there is any future scenario for it to be further tuned down.
> > >
> > > I suggest something like the following for the commit log:
> > >
> > > ------------------------------------------------------------------------
> > >
> > > When under heavy load, network processing can run CPU-bound for many tens
> > > of seconds.  Even in preemptible kernels, this can block RCU Tasks grace
> > > periods, which can cause trace-event removal to take more than a minute,
> > > which is unacceptably long.
> > >
> > > This commit therefore creates a new helper function that passes
> > > through both RCU and RCU-Tasks quiescent states every 100 milliseconds.
> > > This hard-coded value suffices for current workloads.
> >
> > FWIW, this sounds good to me.
> >
> > >
> > > ------------------------------------------------------------------------
> > >
> > > > Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> > > > Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
> > > > Signed-off-by: Yan Zhai <yan@cloudflare.com>
> > > > ---
> > > > v3->v4: comment fixup
> > > >
> > > > ---
> > > >  include/linux/rcupdate.h | 24 ++++++++++++++++++++++++
> > > >  1 file changed, 24 insertions(+)
> > > >
> > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > > index 0746b1b0b663..da224706323e 100644
> > > > --- a/include/linux/rcupdate.h
> > > > +++ b/include/linux/rcupdate.h
> > > > @@ -247,6 +247,30 @@ do { \
> > > >     cond_resched(); \
> > > >  } while (0)
> > > >
> > > > +/**
> > > > + * rcu_softirq_qs_periodic - Periodically report consolidated quiescent states
> > > > + * @old_ts: last jiffies when QS was reported. Might be modified in the macro.
> > > > + *
> > > > + * This helper is for network processing in non-RT kernels, where there could
> > > > + * be busy polling threads that block RCU synchronization indefinitely.  In
> > > > + * such context, simply calling cond_resched is insufficient, so give it a
> > > > + * stronger push to eliminate all potential blockage of all RCU types.
> > > > + *
> > > > + * NOTE: unless absolutely sure, this helper should in general be called
> > > > + * outside of bh lock section to avoid reporting a surprising QS to updaters,
> > > > + * who could be expecting RCU read critical section to end at local_bh_enable().
> > > > + */
> > >
> > > How about something like this for the kernel-doc comment?
> > >
> > > /**
> > >  * rcu_softirq_qs_periodic - Report RCU and RCU-Tasks quiescent states
> > >  * @old_ts: jiffies at start of processing.
> > >  *
> > >  * This helper is for long-running softirq handlers, such as those
> > >  * in networking.  The caller should initialize the variable passed in
> > >  * as @old_ts at the beginning of the softirq handler.  When invoked
> > >  * frequently, this macro will invoke rcu_softirq_qs() every 100
> > >  * milliseconds thereafter, which will provide both RCU and RCU-Tasks
> > >  * quiescent states.  Note that this macro modifies its old_ts argument.
> > >  *
> > >  * Note that although cond_resched() provides RCU quiescent states,
> > >  * it does not provide RCU-Tasks quiescent states.
> > >  *
> > >  * Because regions of code that have disabled softirq act as RCU
> > >  * read-side critical sections, this macro should be invoked with softirq
> > >  * (and preemption) enabled.
> > >  *
> > >  * This macro has no effect in CONFIG_PREEMPT_RT kernels.
> > >  */
> >
> > > Considering the note about cond_resched(), does cond_resched() actually
> > provide an RCU quiescent state for fully-preemptible kernels? IIUC for those
> > cond_resched() expands to:
> >
> >         __might_resched();
> >         klp_sched_try_switch()
> >
> > ... and AFAICT neither reports an RCU quiescent state.
> >
> > So maybe it's worth dropping the note?
> >
> > > Separately, what's the rationale for not doing this on PREEMPT_RT? Does that
> > > avoid the problem through other means, or are people just not running affected
> > > workloads on that?
> >
> > It's a bit counterintuitive, but yes, the RT kernel avoids the problem.
> > This is because "schedule()" actually reports a Tasks RCU QS, and on an RT
> > kernel a cond_resched() call won't call "__cond_resched()" or
> > "__schedule(PREEMPT)", as you already pointed out, either of which would
> > clear the need-resched flag. This then allows "schedule()" to be called at
> > hard IRQ exit time from time to time.
>

These are excellent questions, and I should have addressed them in the
comment originally. Thanks for bringing them up.
Let me send another version tomorrow, which leaves time for any further
thoughts on this.

thanks
Yan

> Yan
>
> > Mark.
> >
> > >
> > >                                                       Thanx, Paul
> > >
> > > > +#define rcu_softirq_qs_periodic(old_ts) \
> > > > +do { \
> > > > +   if (!IS_ENABLED(CONFIG_PREEMPT_RT) && \
> > > > +       time_after(jiffies, (old_ts) + HZ / 10)) { \
> > > > +           preempt_disable(); \
> > > > +           rcu_softirq_qs(); \
> > > > +           preempt_enable(); \
> > > > +           (old_ts) = jiffies; \
> > > > +   } \
> > > > +} while (0)
> > > > +
> > > >  /*
> > > >   * Infrastructure to implement the synchronize_() primitives in
> > > >   * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
> > > > --
> > > > 2.30.2
> > > >
> > > >
