linux-pci.vger.kernel.org archive mirror
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Pavel Machek <pavel@ucw.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	wanpeng.li@hotmail.com, Peter Zijlstra <peterz@infradead.org>,
	Rik van Riel <riel@redhat.com>, "# .39.x" <stable@kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Alan Stern <stern@rowland.harvard.edu>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	USB list <linux-usb@vger.kernel.org>
Subject: Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot
Date: Fri, 17 Feb 2017 19:43:28 +0100
Message-ID: <20170217184327.GD4521@lerouge>
In-Reply-To: <20170217170508.GA20884@amd>

On Fri, Feb 17, 2017 at 06:05:08PM +0100, Pavel Machek wrote:
> On Fri 2017-02-17 17:37:47, Thomas Gleixner wrote:
> > On Fri, 17 Feb 2017, Frederic Weisbecker wrote:
> > > On Thu, Feb 16, 2017 at 08:34:45PM +0100, Thomas Gleixner wrote:
> > > > On Thu, 16 Feb 2017, Frederic Weisbecker wrote:
> > > > > On Thu, Feb 16, 2017 at 10:20:14AM -0800, Linus Torvalds wrote:
> > > > > > On Thu, Feb 16, 2017 at 10:13 AM, Frederic Weisbecker
> > > > > > <fweisbec@gmail.com> wrote:
> > > > > > >
> > > > > > > I haven't followed the discussion but this patch has a known issue which is fixed
> > > > > > > with:
> > > > > > >     7bdb59f1ad474bd7161adc8f923cdef10f2638d1
> > > > > > >     "tick/nohz: Fix possible missing clock reprog after tick soft restart"
> > > > > > >
> > > > > > > I hope this fixes your issue.
> > > > > > 
> > > > > > No, Pavel saw the problem with rc8 too, which already has that fix.
> > > > > > 
> > > > > > So I think we'll just need to revert that original patch (and that
> > > > > > means that we have to revert the commit you point to as well, since
> > > > > > that ->next_tick field was added by the original commit).
> > > > > 
> > > > > Aw too bad, but indeed that late we don't have the choice.
> > > > 
> > > > Hint: Look for CPU hotplug interaction of these patches. I bet something
> > > > becomes stale when the CPU goes down and does not get reset when it comes
> > > > back online.
> > > 
> > > Indeed I should check that. But Pavel is seeing this on boot, where the
> > 
> > I don't think so. He observed it on suspend resume and by doing hotplug
> > operations in a loop. But I might be wrong as usual.
> 
> These are different bugs.
> 
> On x60, I see failures doing hotplug/unplug in a loop, or lot of
> suspends. Someone seen it in v4.8-stable etc. Old bug. Rare to hit.
> 
> Desktop machine was failing to boot, and had some fun with
> suspend/resume too. Boot hang was reproducible with right
> procedure. (Hard poweroff, cold boot.). That one was introduced in
> 4.10-rc cycle.

Pavel, is there any chance you could apply this patch on top of the latest Linus tree
and send me your resulting dmesg log? This has the two reverted patches plus some
debugging code. The amount of printk output shouldn't be too large; I tested it at
home without issue.

If you can't manage to dump the dmesg, please try to take a picture of your screen
so that I can see the last messages starting with "NEXT_TICK_READ".

Thanks!

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 2c115fd..504cb41 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -658,6 +658,8 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
 		tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
 }
 
+static DEFINE_PER_CPU(u64, prev_next_tick);
+
 static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 					 ktime_t now, int cpu)
 {
@@ -725,6 +727,11 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 		 */
 		if (delta == 0) {
 			tick_nohz_restart(ts, now);
+			/*
+			 * Make sure next tick stop doesn't get fooled by past
+			 * clock deadline
+			 */
+			ts->next_tick = 0;
 			goto out;
 		}
 	}
@@ -767,8 +774,15 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	tick = expires;
 
 	/* Skip reprogram of event if its not changed */
-	if (ts->tick_stopped && (expires == dev->next_event))
-		goto out;
+	if (ts->tick_stopped) {
+		if (system_state == SYSTEM_BOOTING) {
+			if (ts->next_tick != this_cpu_read(prev_next_tick))
+				printk("NEXT_TICK_READ: CPU: %d Expires: %llu ts->next_tick:%llu\n", smp_processor_id(), expires, ts->next_tick);
+			this_cpu_write(prev_next_tick, ts->next_tick);
+		}
+		if (expires == ts->next_tick)
+			goto out;
+	}
 
 	/*
 	 * nohz_stop_sched_tick can be called several times before
@@ -787,6 +801,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 		trace_tick_stop(1, TICK_DEP_MASK_NONE);
 	}
 
+	ts->next_tick = tick;
+
 	/*
 	 * If the expiration time == KTIME_MAX, then we simply stop
 	 * the tick timer.
@@ -802,7 +818,10 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	else
 		tick_program_event(tick, 1);
 out:
-	/* Update the estimated sleep length */
+	/*
+	 * Update the estimated sleep length until the next timer
+	 * (not only the tick).
+	 */
 	ts->sleep_length = ktime_sub(dev->next_event, now);
 	return tick;
 }
diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index bf38226..075444e 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -27,6 +27,7 @@ enum tick_nohz_mode {
  *			timer is modified for nohz sleeps. This is necessary
  *			to resume the tick timer operation in the timeline
  *			when the CPU returns from nohz sleep.
+ * @next_tick:		Next tick to be fired when in dynticks mode.
  * @tick_stopped:	Indicator that the idle tick has been stopped
  * @idle_jiffies:	jiffies at the entry to idle for idle time accounting
  * @idle_calls:		Total number of idle calls
@@ -44,6 +45,7 @@ struct tick_sched {
 	unsigned long			check_clocks;
 	enum tick_nohz_mode		nohz_mode;
 	ktime_t				last_tick;
+	ktime_t				next_tick;
 	int				inidle;
 	int				tick_stopped;
 	unsigned long			idle_jiffies;
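
For readers following the patch, here is a minimal userspace sketch of the
deadline caching it restores. It is a simplified model, not the kernel code:
struct fake_tick_sched, fake_stop_tick(), fake_restart_tick() and
stale_skip_after_restart() are hypothetical names, and the deadline value
1000 is arbitrary. The model assumes, as the comment added in the delta == 0
hunk suggests, that restarting the tick leaves the clock event device
programmed to something other than the cached ts->next_tick.

```c
#include <stdint.h>

/* Hypothetical stand-in for the struct tick_sched fields the patch uses. */
struct fake_tick_sched {
	int tick_stopped;	/* models ts->tick_stopped */
	uint64_t next_tick;	/* models ts->next_tick, the cached deadline */
};

/*
 * Models the "skip reprogram" check: when the tick is already stopped and
 * the requested deadline equals the cached one, nothing is programmed.
 * Returns 1 if the device would be reprogrammed, 0 if the call is skipped.
 */
static int fake_stop_tick(struct fake_tick_sched *ts, uint64_t expires)
{
	if (ts->tick_stopped && expires == ts->next_tick)
		return 0;

	ts->tick_stopped = 1;
	ts->next_tick = expires;	/* models "ts->next_tick = tick" */
	return 1;
}

/*
 * Models the delta == 0 path: the tick is restarted, so the device gets a
 * new deadline. With reset_cache set, the cached deadline is dropped, as
 * the "ts->next_tick = 0" line in the patch does.
 */
static void fake_restart_tick(struct fake_tick_sched *ts, int reset_cache)
{
	if (reset_cache)
		ts->next_tick = 0;
}

/* Returns 1 if a stop request issued after a restart is wrongly skipped. */
int stale_skip_after_restart(int reset_cache)
{
	struct fake_tick_sched ts = { 0 };

	fake_stop_tick(&ts, 1000);		/* first stop: programs deadline 1000 */
	fake_stop_tick(&ts, 1000);		/* same deadline: legitimately skipped */
	fake_restart_tick(&ts, reset_cache);	/* device no longer holds 1000 */

	/* The request below has to reach the device again. */
	return fake_stop_tick(&ts, 1000) == 0;
}
```

In this model stale_skip_after_restart(0) returns 1 (the request is skipped
because the cached deadline is stale) and stale_skip_after_restart(1) returns
0 (the cache was cleared, so the device is reprogrammed). Whether that stale
value has anything to do with the cold boot hang is exactly what the
NEXT_TICK_READ output is meant to show.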
