linux-kernel.vger.kernel.org archive mirror
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Tejun Heo <tj@kernel.org>
Cc: jiangshanlai@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: WARN_ON_ONCE() in process_one_work()?
Date: Tue, 20 Jun 2017 09:45:23 -0700
Message-ID: <20170620164523.GI3721@linux.vnet.ibm.com>
In-Reply-To: <20170618104000.GC28042@htj.duckdns.org>

On Sun, Jun 18, 2017 at 06:40:00AM -0400, Tejun Heo wrote:
> Hello,
> 
> On Sat, Jun 17, 2017 at 10:31:05AM -0700, Paul E. McKenney wrote:
> > On Sat, Jun 17, 2017 at 07:53:14AM -0400, Tejun Heo wrote:
> > > Hello,
> > > 
> > > On Fri, Jun 16, 2017 at 10:36:58AM -0700, Paul E. McKenney wrote:
> > > > And no test failures from yesterday evening.  So it looks like we get
> > > > somewhere on the order of one failure per 138 hours of TREE07 rcutorture
> > > > runtime with your printk() in the mix.
> > > >
> > > > Was the above output from your printk() output of any help?
> > > 
> > > Yeah, if my suspicion is correct, it'd require new kworker creation
> > > racing against CPU offline, which would explain why it's so difficult
> > > to repro.  Can you please see whether the following patch resolves the
> > > issue?
> > 
> > That could explain why only Steve Rostedt and I saw the issue.  As far
> > as I know, we are the only ones who regularly run CPU-hotplug stress
> > tests.  ;-)
> 
> I was a bit confused.  It has to be racing against either new kworker
> being created on the wrong CPU or rescuer trying to migrate to the
> CPU, and it looks like we're mostly seeing the rescuer condition, but,
> yeah, this would only get triggered rarely.  Another contributing
> factor could be the vmstat work recently being put on a workqueue w/
> rescuer.  It runs quite often, so probably has increased the chance
> of hitting the right condition.

Sounds like too much fun!  ;-)

But more constructively...  If I understand correctly, it is now possible
to take a CPU partially offline and put it back online again.  This should
allow much more intense testing of this sort of interaction.
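
The partial offline/online cycling described above can be sketched against
the CPU-hotplug sysfs files.  This is a minimal sketch, assuming a kernel
built with CONFIG_CPU_HOTPLUG_STATE_CONTROL (needed to target intermediate
states), root privileges, and the hotplug/states and cpuN/hotplug/target
files under /sys/devices/system/cpu.  The helper names and the choice of
intermediate state are illustrative assumptions, not something taken from
this thread.

```python
from pathlib import Path

# Assumed location of the CPU-hotplug sysfs files.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_states(text):
    """Map hotplug state names to numbers from '<num>: <name>' lines."""
    states = {}
    for line in text.splitlines():
        # Split on the first ':' only, since state names may contain ':'.
        num, sep, name = line.partition(":")
        if sep and num.strip().isdigit():
            states[name.strip()] = int(num)
    return states


def partial_cycle(cpu, via, root=SYSFS_CPU):
    """Take `cpu` down to the state named `via`, then back to 'online'.

    Hypothetical helper: `via` must be a name listed in hotplug/states
    on the running kernel.
    """
    states = parse_states((root / "hotplug" / "states").read_text())
    target = root / f"cpu{cpu}" / "hotplug" / "target"
    down, up = states[via], states["online"]
    target.write_text(f"{down}\n")  # partially offline the CPU
    target.write_text(f"{up}\n")    # bring it fully back online
    return down, up
```

State names and numbers differ between kernel versions, so the `via`
argument should be picked from the hotplug/states file on the machine
under test rather than hard-coded.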

And no, I haven't yet tried this with RCU because I would probably need
to do some mix of just-RCU online/offline and full-up online/offline.
Plus RCU requires pretty much a full online/offline cycle to fully
exercise it.  :-/

> > I have a weekend-long run going, but will give this a shot overnight on
> > Monday, Pacific Time.  Thank you for putting it together, looking forward
> > to seeing what it does!
> 
> Thanks a lot for the testing and patience.  Sorry that it took so
> long.  I'm not completely sure the patch is correct.  It might have to
> be more specific about which type of migration or require further
> synchronization around migration, but hopefully it'll at least be able
> to show that this was the cause of the problem.

And last night's tests had no failures.  Which might actually mean
something; I will get more info when I run without your patch this
evening.  ;-)
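
How much a failure-free run means can be estimated with a little
arithmetic.  The following is a rough sketch, assuming failures arrive as
a Poisson process at the rate quoted earlier in the thread (about one per
138 hours of TREE07 rcutorture runtime, measured with the printk() in
place).  The function names are illustrative, and the number of test
hours in any particular run is not given here, so it is left as a
parameter.

```python
import math

# From the thread: roughly one failure per 138 hours of TREE07 runtime.
MEAN_HOURS_PER_FAILURE = 138.0


def p_no_failure(hours, mtbf=MEAN_HOURS_PER_FAILURE):
    """Chance of a failure-free run of `hours` if the bug is still present.

    Assumes a Poisson failure process, which is a modeling assumption.
    """
    return math.exp(-hours / mtbf)


def hours_for_confidence(confidence, mtbf=MEAN_HOURS_PER_FAILURE):
    """Failure-free hours needed so that P(no failure | bug) <= 1 - confidence."""
    return -mtbf * math.log(1.0 - confidence)
```

Under these assumptions, a single 138-hour failure-free run would still
happen about 37% of the time with the bug present, and 99% confidence
would take roughly 636 failure-free hours.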

							Thanx, Paul

Thread overview: 27+ messages
2017-05-01 16:57 WARN_ON_ONCE() in process_one_work()? Paul E. McKenney
2017-05-01 18:38 ` Paul E. McKenney
2017-05-01 18:44   ` Tejun Heo
2017-05-01 18:58     ` Paul E. McKenney
2017-05-05 17:11       ` Paul E. McKenney
2017-06-13 20:58         ` Tejun Heo
2017-06-13 22:31           ` Paul E. McKenney
2017-06-14 15:15             ` Paul E. McKenney
2017-06-15 15:38               ` Paul E. McKenney
2017-06-16 17:36                 ` Paul E. McKenney
2017-06-17 11:53                   ` Tejun Heo
2017-06-17 17:31                     ` Paul E. McKenney
2017-06-18 10:40                       ` Tejun Heo
2017-06-20 16:45                         ` Paul E. McKenney [this message]
2017-06-21 15:30                           ` Paul E. McKenney
2017-06-23 16:41                             ` Paul E. McKenney
2017-06-27 16:27                               ` Paul E. McKenney
2017-05-01 18:42 ` Tejun Heo
2017-05-01 19:42   ` Steven Rostedt
2017-05-01 19:50     ` Tejun Heo
2017-05-01 20:02       ` Steven Rostedt
2018-06-20 19:29 Paul E. McKenney
2018-07-02 21:05 ` Tejun Heo
2018-07-03  4:05   ` Paul E. McKenney
2018-07-03 16:40     ` Paul E. McKenney
2018-07-03 20:12       ` Tejun Heo
2018-07-03 21:44         ` Paul E. McKenney
