linux-kernel.vger.kernel.org archive mirror
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-rt-users <linux-rt-users@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [patch-rt] hotplug, hrtimer: Migrate expired/deferred timers during cpu offline
Date: Thu, 17 Aug 2017 18:50:42 +0200
Message-ID: <20170817165041.3agf2btidfdcspiq@linutronix.de>
In-Reply-To: <1502697588.12319.199.camel@gmx.de>

On 2017-08-14 09:59:48 [+0200], Mike Galbraith wrote:
> On Fri, 2017-08-11 at 10:15 +0200, Mike Galbraith wrote:
> > On Fri, 2017-08-11 at 09:55 +0200, Mike Galbraith wrote:
> > > The below fixes the list debug explosion up.
> > > 
> > > If we do not migrate expired/deferred timers during cpu offline, ->cb_entry
> > > will be corrupted by online initialization of base->expired, leading to a
> > > loud list debug complaint should someone call __remove_hrtimer() thereafter.
> > > 
> > > Signed-off-by: Mike Galvraith <efault@gmx.de>
> > ahem.....................b
> 
> (actually, I shouldn't have signed, question being why we now leave
> them lying about when we _apparently_ previously did not)

takedown_cpu() invokes smpboot_park_threads() early, which parks/stops
ksoftirqd. That means every hrtimer that fires after that point and is
not marked as irqsafe won't be processed but just enqueued onto the
->expired list. The timer would be processed once the CPU goes back
online. But not really. The thing is that once the CPU goes back online
the "expired" list head is initialized again and that timer is lost.
Once you try to cancel it, it removes itself from the expired list and
this is when the list corruption is noticed.
My guess here is that the hotplug rework changed the timing and the bug
is more obvious now: if you cancel the timer before the CPU goes back
online then nothing happens.
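
The lost-timer scenario above can be reproduced in user space. What
follows is only a minimal sketch, assuming a simplified re-implementation
of struct list_head; fake_timer, entry_is_consistent() and
entry_consistent_after() are hypothetical names, and the check only
approximates what a list-debug kernel verifies before unlinking an entry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Roughly the sanity check a list-debug build makes before unlinking. */
static bool entry_is_consistent(const struct list_head *entry)
{
	return entry->prev->next == entry && entry->next->prev == entry;
}

/* Hypothetical stand-in for a timer that was deferred to the softirq. */
struct fake_timer {
	struct list_head cb_entry;
};

/*
 * Link one timer onto an "expired" list, optionally re-initialize the
 * list head while the timer is still linked, and report whether the
 * timer's entry still passes the consistency check.
 */
static bool entry_consistent_after(bool reinit_head)
{
	struct list_head expired;
	struct fake_timer timer;

	init_list_head(&expired);
	list_add_tail(&timer.cb_entry, &expired);	/* timer fired, deferred */

	if (reinit_head)
		init_list_head(&expired);		/* CPU comes back online */

	return entry_is_consistent(&timer.cb_entry);
}
```

Without the re-initialization the entry stays consistent. With it, the
entry still points at the head but the head no longer points back, which
is the state a later removal trips over.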

> > > ---
> > >  kernel/time/hrtimer.c |   13 +++++++++++++
> > >  1 file changed, 13 insertions(+)
> > > 
> > > --- a/kernel/time/hrtimer.c
> > > +++ b/kernel/time/hrtimer.c
> > > @@ -1802,6 +1802,19 @@ static void migrate_hrtimer_list(struct
> > >  		 */
> > >  		enqueue_hrtimer(timer, new_base);
> > >  	}
> > > +
> > > +	/*
> > > +	 * Finally, migrate any expired timers deferred by RT.
> > > +	 */
> > > +	while (!list_empty(&old_base->expired)) {
> > > +		struct list_head *entry = old_base->expired.next;
> > > +
> > > +		timer = container_of(entry, struct hrtimer, cb_entry);
> 
> (oops, forgot to change that back too. [scribble scribble])
> 
> > > +		/* XXX: hm, perhaps defer again instead of enqueueing. */
> > > +		__remove_hrtimer(timer, old_base, HRTIMER_STATE_ENQUEUED, 0);
> > > +		timer->base = new_base;
> > > +		enqueue_hrtimer(timer, new_base);

__remove_hrtimer() shouldn't be required because it has been done
already. It should be enough to just list_splice() the old list onto
the new one and raise the softirq afterwards.
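
As a rough user-space illustration of the splice idea, and not the
actual patch: fake_base, migrate_expired() and list_count() are
hypothetical, the list helpers are simplified re-implementations, and
the return value merely signals that the caller would want to raise the
softirq.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Append every entry of 'from' to the tail of 'to', then empty 'from'. */
static void list_splice_tail_init(struct list_head *from, struct list_head *to)
{
	if (list_empty(from))
		return;

	struct list_head *first = from->next;
	struct list_head *last = from->prev;
	struct list_head *tail = to->prev;

	tail->next = first;
	first->prev = tail;
	last->next = to;
	to->prev = last;

	init_list_head(from);
}

/* Hypothetical, heavily reduced stand-in for a per-CPU hrtimer base. */
struct fake_base {
	struct list_head expired;
};

/*
 * Move the deferred timers of the dead CPU over in one step. Returns
 * true when the new base has pending entries, i.e. when the caller
 * should raise the softirq so they get processed.
 */
static bool migrate_expired(struct fake_base *old_base,
			    struct fake_base *new_base)
{
	list_splice_tail_init(&old_base->expired, &new_base->expired);
	return !list_empty(&new_base->expired);
}

/* Count the entries on a list; only used to inspect the result. */
static size_t list_count(const struct list_head *head)
{
	size_t n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

The entries are never unlinked one by one, so nothing is removed twice.
The old head ends up empty, and a later re-initialization of it cannot
orphan anything.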

> > > +	}
> > >  }
> > >  
> > >  int hrtimers_dead_cpu(unsigned int scpu)

Sebastian

Thread overview: 19+ messages
2017-08-04 17:38 [ANNOUNCE] v4.11.12-rt9 Sebastian Andrzej Siewior
2017-08-05  6:13 ` Mike Galbraith
2017-08-05 14:57   ` Mike Galbraith
2017-08-07  7:33     ` Sebastian Andrzej Siewior
2017-08-07  8:22       ` Mike Galbraith
2017-08-08 10:00         ` Mike Galbraith
2017-08-11  7:55           ` [patch-rt] hotplug, hrtimer: Migrate expired/deferred timers during cpu offline Mike Galbraith
2017-08-11  8:15             ` Mike Galbraith
2017-08-14  7:59               ` Mike Galbraith
2017-08-17 16:50                 ` Sebastian Andrzej Siewior [this message]
2017-08-17 17:17                   ` Sebastian Andrzej Siewior
2017-08-17 17:26                     ` Mike Galbraith
2017-08-17 17:37                       ` Mike Galbraith
2017-08-17 18:43                     ` Mike Galbraith
2017-08-07  7:52   ` [ANNOUNCE] v4.11.12-rt9 Sebastian Andrzej Siewior
2017-08-07  8:38     ` Mike Galbraith
2017-08-09 12:04       ` [patch-rt] locking, rwlock-rt: do not save state multiple times in __write_rt_lock() Mike Galbraith
2017-08-18  9:06         ` Sebastian Andrzej Siewior
2017-08-07  9:10     ` [ANNOUNCE] v4.11.12-rt9 Mike Galbraith
