From: Frederic Weisbecker <fweisbec@gmail.com>
To: Pavel Machek <pavel@ucw.cz>
Cc: Alan Stern <stern@rowland.harvard.edu>,
	torvalds@linux-foundation.org,
	kernel list <linux-kernel@vger.kernel.org>,
	linux-usb@vger.kernel.org, gregkh@linuxfoundation.org,
	bhelgaas@google.com, linux-pci@vger.kernel.org
Subject: Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot
Date: Wed, 12 Apr 2017 17:08:35 +0200
Message-ID: <20170412150832.GE21309@lerouge>
In-Reply-To: <20170403182050.GA6555@amd>

On Mon, Apr 03, 2017 at 08:20:50PM +0200, Pavel Machek wrote:
> > > > > > ...1d.7: PCI fixup... pass 2
> > > > > > ...1d.7: PCI fixup... pass 3
> > > > > > ...1d.7: PCI fixup... pass 3 done
> > > > > > 
> > > > > > ...followed by hang. So yes, it looks USB related.
> > > > > > 
> > > > > > (Sometimes it hangs with some kind backtrace involving secondary CPU
> > > > > > startup, unfortunately useful info is off screen at that point).
> > > > > 
> > > > > Forgot to say, 1d.7 is EHCI controller.
> > > > > 
> > > > > 00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI
> > > > > Controller (rev 01)
> > > > 
> > > > Ok, I should have access soon to a EeePc 1015CX (which seem to have this controller).
> > > > I hope I'll be able to reproduce the issue there. If not, I'm sorry but I'll have to
> > > > burden you again :-)
> > > 
> > > Go through more mails. It is only reproducible after cold boot. .. so
> > > I doubt it will be easy to reproduce on another machine.
> > > 
> > > Now... I do have serial port, and I even might have serial cable
> > > somewhere, but.... Giving how sensitive it is, it is probably going to
> > > go away with console on ttyS...
> > 
> > I also tried on an eeepc (which has ICH7/NM10 as well), with your config.
> > I even plugged a usb keyboard but even then I have been unable to
> > reproduce either :-(
> 
> Ok, give me some time. I'm no longer using the affected machine, so no
> promises.

Actually, someone recently reported an issue to me that is very similar to yours. It's probably
the same one, and I have a potential fix.

The scenario is a bit tricky again, and still theoretical. If you're interested in the gory details:
a tick scheduled at jiffies = N + 1, in order to expire a timer_list timer, fires a tiny bit too
early (i.e. a few microseconds in advance). So it doesn't update jiffies on IRQ entry and still
sees jiffies = N. The timer_list timer doesn't expire yet, and on IRQ exit we try to reschedule
the tick for the same expiry time. But ts->next_tick already holds that value, so we don't
reprogram the clockevent, and it is left unprogrammed.
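
To make the sequence easier to follow, here is a minimal userspace sketch of it. It is not
kernel code: struct model_tick, hw_armed and the model_* functions are hypothetical names made
up for illustration, and it assumes a one-shot clockevent that is disarmed once it fires and a
reprogram path that skips the hardware write when the requested expiry equals the cached
next_tick. Only the tick_stopped path is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the per-CPU tick state. */
struct model_tick {
	uint64_t next_tick;	/* expiry we believe is programmed, 0 = none */
	bool tick_stopped;	/* running tickless (idle or full dynticks) */
	bool hw_armed;		/* assumption: true while the one-shot device is armed */
};

/* Reprogram path: skip the hardware write if the expiry looks unchanged. */
static void model_program_tick(struct model_tick *ts, uint64_t expires)
{
	if (ts->next_tick == expires)
		return;		/* cached value matches, nothing is written */
	ts->next_tick = expires;
	ts->hw_armed = true;
}

/* Tick handler, tick_stopped path only: the one-shot event has been consumed. */
static void model_tick_handler(struct model_tick *ts, bool with_fix)
{
	ts->hw_armed = false;
	if (ts->tick_stopped) {
		if (with_fix)
			ts->next_tick = 0;	/* forget the stale cached expiry */
		return;
	}
}

/* Replays the early-tick scenario; returns true if the device ends up armed. */
static bool model_early_tick(bool with_fix)
{
	struct model_tick ts = { .next_tick = 0, .tick_stopped = true, .hw_armed = false };
	const uint64_t expires = 1000;	/* arbitrary time standing for jiffies = N + 1 */

	model_program_tick(&ts, expires);	/* tick armed for N + 1 */
	assert(ts.hw_armed);

	model_tick_handler(&ts, with_fix);	/* fires slightly early, jiffies still N */
	model_program_tick(&ts, expires);	/* IRQ exit: timer not expired, same expiry */

	return ts.hw_armed;
}
```

In this model, model_early_tick(false) leaves the device unarmed, which corresponds to the hang,
while model_early_tick(true) rearms it because the handler cleared the cached expiry first.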

So in case you have the time and opportunity to test the fix, you'll need to:

1) Revert back to the offending change:
   git revert 558e8e27e73f53f8a512485be538b07115fe5f3c

2) Apply a delta fix:

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a3b8154..ae66515 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1071,8 +1071,10 @@ static void tick_nohz_handler(struct clock_event_device *dev)
 	tick_sched_handle(ts, regs);
 
 	/* No need to reprogram if we are running tickless  */
-	if (unlikely(ts->tick_stopped))
+	if (unlikely(ts->tick_stopped)) {
+		ts->next_tick = 0;
 		return;
+	}
 
 	hrtimer_forward(&ts->sched_timer, now, tick_period);
 	tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
@@ -1172,8 +1174,10 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 		tick_sched_handle(ts, regs);
 
 	/* No need to reprogram if we are in idle or full dynticks mode */
-	if (unlikely(ts->tick_stopped))
+	if (unlikely(ts->tick_stopped)) {
+		ts->next_tick = 0;
 		return HRTIMER_NORESTART;
+	}
 
 	hrtimer_forward(timer, now, tick_period);
 


Thanks!
