linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Huacai Chen <chenhuacai@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Z qiang <qiang.zhang1211@gmail.com>,
	Huacai Chen <chenhuacai@loongson.cn>,
	Frederic Weisbecker <frederic@kernel.org>,
	Neeraj Upadhyay <quic_neeraju@quicinc.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Boqun Feng <boqun.feng@gmail.com>, Ingo Molnar <mingo@kernel.org>,
	John Stultz <jstultz@google.com>, Stephen Boyd <sboyd@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org, Binbin Zhou <zhoubinbin@loongson.cn>
Subject: Re: [PATCH V4 2/2] rcu: Update jiffies in rcu_cpu_stall_reset()
Date: Mon, 28 Aug 2023 04:54:59 -0700	[thread overview]
Message-ID: <75b1d69e-b93f-43ba-8289-9465b9fa39a8@paulmck-laptop> (raw)
In-Reply-To: <CAAhV-H6hkyAe616BuuE2H2TqNYFXqyHpJVc_Zn4307JszuVKYQ@mail.gmail.com>

On Mon, Aug 28, 2023 at 07:30:43PM +0800, Huacai Chen wrote:
> Hi, Paul and Joel,
> 
> On Mon, Aug 28, 2023 at 6:47 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Sun, Aug 27, 2023 at 06:11:40PM -0400, Joel Fernandes wrote:
> > > On Sun, Aug 27, 2023 at 1:51 AM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > [..]
> > > > > > > > The only way I know of to avoid these sorts of false positives is for
> > > > > > > > the user to manually suppress all timeouts (perhaps using a kernel-boot
> > > > > > > > parameter for your early-boot case), do the gdb work, and then unsuppress
> > > > > > > > all stalls.  Even that won't work for networking, because the other
> > > > > > > > system's clock will be running throughout.
> > > > > > > >
> > > > > > > > In other words, from what I know now, there is no perfect solution.
> > > > > > > > Therefore, there are sharp limits to the complexity of any solution that
> > > > > > > > I will be willing to accept.
> > > > > > > I think the simplest solution is (I hope Joel will not angry):
> > > > > >
> > > > > > Not angry at all, just want to help. ;-). The problem is the 300*HZ solution
> > > > > > will also effect the VM workloads which also do a similar reset.  Allow me few
> > > > > > days to see if I can take a shot at fixing it slightly differently. I am
> > > > > > trying Paul's idea of setting jiffies at a later time. I think it is doable.
> > > > > > I think the advantage of doing this is it will make stall detection more
> > > > > > robust in this face of these gaps in jiffie update. And that solution does
> > > > > > not even need us to rely on ktime (and all the issues that come with that).
> > > > > >
> > > > >
> > > > > I wrote a patch similar to Paul's idea and sent it out for review, the
> > > > > advantage being it purely is based on jiffies. Could you try it out
> > > > > and let me know?
> > > > If you can cc my gmail <chenhuacai@gmail.com>, that could be better.
> > >
> > > Sure, will do.
> > >
> > > > I have read your patch, maybe the counter (nr_fqs_jiffies_stall)
> > > > should be atomic_t and we should use atomic operation to decrement its
> > > > value. Because rcu_gp_fqs() can be run concurrently, and we may miss
> > > > the (nr_fqs == 1) condition.
> > >
> > > I don't think so. There is only 1 place where RMW operation happens
> > > and rcu_gp_fqs() is called only from the GP kthread. So a concurrent
> > > RMW (and hence a lost update) is not possible.
> >
> > Huacai, is your concern that the gdb user might have created a script
> > (for example, printing a variable or two, then automatically continuing),
> > so that breakpoints could happen in quick successsion, such that the
> > second breakpoint might run concurrently with rcu_gp_fqs()?
> >
> > If this can really happen, the point that Joel makes is a good one, namely
> > that rcu_gp_fqs() is single-threaded and (absent rcutorture) runs only
> > once every few jiffies.  And gdb breakpoints, even with scripting, should
> > also be rather rare.  So if this is an issue, a global lock should do the
> > trick, perhaps even one of the existing locks in the rcu_state structure.
> > The result should then be just as performant/scalable and a lot simpler
> > than use of atomics.
> 
> Sorry, I made a mistake. Yes, there is no concurrent issue, and this
> approach probably works. But I have another problem: how to ensure
> that there is a jiffies update in three calls to rcu_gp_fqs()? Or in
> other word, is three also a magic number here?

Each of the three calls to rcu_gp_fqs() involves a wakeup of and a
context switch to RCU's grace-period kthread, each of which should be
sufficient to update jiffies if initially in an out-of-date-jiffies state.
The three is to some extent magic, the idea being to avoid a situation
where an currently running FQS reenables stall warnings immediately
after gdb disables them.

Obviously, if your testing shows that some other value works better,
please do let us know so that we can update!  But we have to start
somewhere.

> And I rechecked the commit message of a80be428fbc1f1f3bc9e ("rcu: Do
> not disable GP stall detection in rcu_cpu_stall_reset()"). I don't
> know why Sergey said that the original code disables stall-detection
> forever, in fact it only disables the detection in the current GP.

Well, it does disable stall detection forever in the case where the
current grace period lasts forever, which if I recall correctly was the
case that Sergey was encountering.

							Thanx, Paul

> Huacai
> 
> >
> > > Could you test the patch for the issue you are seeing and provide your
> > > Tested-by tag? Thanks,
> >
> > Either way, testing would of course be very good!  ;-)
> >
> >                                                         Thanx, Paul

  reply	other threads:[~2023-08-28 11:55 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-14  2:00 [PATCH V4 1/2] tick: Rename tick_do_update_jiffies64() and allow external usage Huacai Chen
2023-08-14  2:00 ` [PATCH V4 2/2] rcu: Update jiffies in rcu_cpu_stall_reset() Huacai Chen
2023-08-14 16:15   ` Paul E. McKenney
2023-08-15  6:05     ` Huacai Chen
2023-08-16  3:16       ` Z qiang
2023-08-16  4:53         ` Huacai Chen
2023-08-16  5:09           ` Z qiang
2023-08-16  9:33             ` Huacai Chen
2023-08-16 10:06               ` Z qiang
2023-08-16 12:28                 ` Huacai Chen
2023-08-16 15:56                   ` Alan Huang
2023-08-16 16:13                     ` Huacai Chen
2023-08-16 16:52                       ` Alan Huang
2023-08-17  4:04                         ` Huacai Chen
2023-08-23 21:41                     ` Thomas Gleixner
2023-08-16 19:27                   ` Joel Fernandes
2023-08-17  8:06                     ` Huacai Chen
2023-08-23 22:03                       ` Thomas Gleixner
2023-08-23 22:41                         ` Paul E. McKenney
2023-08-24  2:50                           ` Huacai Chen
2023-08-24 11:40                             ` Paul E. McKenney
2023-08-24 12:40                               ` Huacai Chen
2023-08-24 13:24                                 ` Paul E. McKenney
2023-08-24 15:43                                   ` Huacai Chen
2023-08-24 18:28                                     ` Paul E. McKenney
2023-08-25 11:15                                       ` Huacai Chen
2023-08-25 23:28                                         ` Joel Fernandes
2023-08-27  3:27                                           ` Joel Fernandes
2023-08-27  5:50                                             ` Huacai Chen
2023-08-27 22:11                                               ` Joel Fernandes
2023-08-28 10:47                                                 ` Paul E. McKenney
2023-08-28 11:30                                                   ` Huacai Chen
2023-08-28 11:54                                                     ` Paul E. McKenney [this message]
2023-08-28 13:33                                                   ` Joel Fernandes
2023-08-28 14:02                                                     ` Paul E. McKenney
2023-08-28 14:37                                                       ` Huacai Chen
2023-08-28 14:50                                                         ` Joel Fernandes
2023-08-28 15:12                                                           ` Huacai Chen
2023-08-28 20:47                                                             ` Joel Fernandes
2023-08-29  4:07                                                               ` Huacai Chen
2023-08-29 14:46                                                                 ` Joel Fernandes
2023-08-30  4:25                                                                   ` Huacai Chen
2023-08-30 10:18                                                                     ` Joel Fernandes
2023-08-30  1:04                                                             ` Joel Fernandes
2023-08-26  1:45                                         ` Paul E. McKenney
2023-08-24 13:09                               ` Joel Fernandes
2023-08-24 13:28                                 ` Paul E. McKenney
2023-08-24 16:03                                   ` Huacai Chen
2023-08-24 16:32                                     ` Huacai Chen
2023-08-24 16:34                                     ` Paul E. McKenney
2023-08-24  2:47                         ` Huacai Chen
2023-08-24  9:39                           ` Thomas Gleixner
2023-08-24 13:21                         ` Joel Fernandes
2023-08-24 13:29                           ` Paul E. McKenney
2023-08-24 16:15                           ` Huacai Chen
2023-08-23 21:36                 ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=75b1d69e-b93f-43ba-8289-9465b9fa39a8@paulmck-laptop \
    --to=paulmck@kernel.org \
    --cc=boqun.feng@gmail.com \
    --cc=chenhuacai@kernel.org \
    --cc=chenhuacai@loongson.cn \
    --cc=frederic@kernel.org \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=jstultz@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=qiang.zhang1211@gmail.com \
    --cc=quic_neeraju@quicinc.com \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=sboyd@kernel.org \
    --cc=senozhatsky@chromium.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=zhoubinbin@loongson.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).