From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Tejun Heo <tj@kernel.org>
Cc: jiangshanlai@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: WARN_ON_ONCE() in process_one_work()?
Date: Tue, 20 Jun 2017 09:45:23 -0700
Message-ID: <20170620164523.GI3721@linux.vnet.ibm.com>
In-Reply-To: <20170618104000.GC28042@htj.duckdns.org>

On Sun, Jun 18, 2017 at 06:40:00AM -0400, Tejun Heo wrote:
> Hello,
> 
> On Sat, Jun 17, 2017 at 10:31:05AM -0700, Paul E. McKenney wrote:
> > On Sat, Jun 17, 2017 at 07:53:14AM -0400, Tejun Heo wrote:
> > > Hello,
> > > 
> > > On Fri, Jun 16, 2017 at 10:36:58AM -0700, Paul E. McKenney wrote:
> > > > And no test failures from yesterday evening.  So it looks like we get
> > > > somewhere on the order of one failure per 138 hours of TREE07 rcutorture
> > > > runtime with your printk() in the mix.
> > > >
> > > > Was the above output from your printk() of any help?
> > > 
> > > Yeah, if my suspicion is correct, it'd require new kworker creation
> > > racing against CPU offline, which would explain why it's so difficult
> > > to repro.  Can you please see whether the following patch resolves the
> > > issue?
> > 
> > That could explain why only Steve Rostedt and I saw the issue.  As far
> > as I know, we are the only ones who regularly run CPU-hotplug stress
> > tests.  ;-)
> 
> I was a bit confused.  It has to be racing against either a new
> kworker being created on the wrong CPU or the rescuer trying to
> migrate to the CPU, and it looks like we're mostly seeing the rescuer
> condition, but, yeah, this would only get triggered rarely.  Another
> contributing factor could be the vmstat work recently being put on a
> workqueue with a rescuer.  It runs quite often, so it has probably
> increased the chance of hitting the right condition.

Sounds like too much fun!  ;-)
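
For anyone who wants to follow along, the check we keep tripping is the
one near the top of process_one_work() in kernel/workqueue.c, which
(roughly, in kernels of this vintage) reads:

	/* ensure we're on the correct CPU */
	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
		     raw_smp_processor_id() != pool->cpu);

In other words, a worker found itself running on some CPU other than its
pool's CPU while the pool was still marked as bound.  And the rescuer
angle fits: workqueues created with WQ_MEM_RECLAIM get a rescuer kthread,
which is exactly what the vmstat work picked up when it moved to such a
workqueue.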

But more constructively...  If I understand correctly, it is now possible
to take a CPU partially offline and put it back online again.  This should
allow much more intense testing of this sort of interaction.
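
Something like the following userspace loop is what I have in mind.
Please treat it as a sketch only: it assumes a kernel built with
CONFIG_CPU_HOTPLUG_STATE_CONTROL=y, and both the CPU number and the
state numbers below are made up for illustration; the real ones are
listed in /sys/devices/system/cpu/hotplug/states.

	/* Bounce cpu1 partway down the hotplug state machine and back. */
	#include <stdio.h>

	static int write_target(const char *val)
	{
		FILE *f = fopen("/sys/devices/system/cpu/cpu1/hotplug/target", "w");

		if (!f)
			return -1;
		fputs(val, f);
		return fclose(f);
	}

	int main(void)
	{
		int i;

		for (i = 0; i < 1000; i++) {
			if (write_target("50"))		/* illustrative partial state */
				return 1;
			if (write_target("169"))	/* illustrative fully-online state */
				return 1;
		}
		return 0;
	}

Run as root, of course, and expect the exact state numbers to differ
from kernel to kernel.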

And no, I haven't yet tried this with RCU because I would probably need
to do some mix of RCU-only online/offline and full-up online/offline.
Plus RCU requires pretty much a full online/offline cycle to fully
exercise it.  :-/

> > I have a weekend-long run going, but will give this a shot overnight on
> > Monday, Pacific Time.  Thank you for putting it together; looking forward
> > to seeing what it does!
> 
> Thanks a lot for the testing and patience.  Sorry that it took so
> long.  I'm not completely sure the patch is correct.  It might have to
> be more specific about which type of migration or require further
> synchronization around migration, but hopefully it'll at least be able
> to show that this was the cause of the problem.

And last night's tests had no failures, which might actually mean
something.  I will get more info when I run without your patch this
evening.  ;-)

							Thanx, Paul


Thread overview: 27+ messages
2017-05-01 16:57 WARN_ON_ONCE() in process_one_work()? Paul E. McKenney
2017-05-01 18:38 ` Paul E. McKenney
2017-05-01 18:44   ` Tejun Heo
2017-05-01 18:58     ` Paul E. McKenney
2017-05-05 17:11       ` Paul E. McKenney
2017-06-13 20:58         ` Tejun Heo
2017-06-13 22:31           ` Paul E. McKenney
2017-06-14 15:15             ` Paul E. McKenney
2017-06-15 15:38               ` Paul E. McKenney
2017-06-16 17:36                 ` Paul E. McKenney
2017-06-17 11:53                   ` Tejun Heo
2017-06-17 17:31                     ` Paul E. McKenney
2017-06-18 10:40                       ` Tejun Heo
2017-06-20 16:45                         ` Paul E. McKenney [this message]
2017-06-21 15:30                           ` Paul E. McKenney
2017-06-23 16:41                             ` Paul E. McKenney
2017-06-27 16:27                               ` Paul E. McKenney
2017-05-01 18:42 ` Tejun Heo
2017-05-01 19:42   ` Steven Rostedt
2017-05-01 19:50     ` Tejun Heo
2017-05-01 20:02       ` Steven Rostedt
2018-06-20 19:29 Paul E. McKenney
2018-07-02 21:05 ` Tejun Heo
2018-07-03  4:05   ` Paul E. McKenney
2018-07-03 16:40     ` Paul E. McKenney
2018-07-03 20:12       ` Tejun Heo
2018-07-03 21:44         ` Paul E. McKenney
