* v4.10-rc6 boot regression on Intel desktop, maybe related to EHCI hadnoff?
@ 2017-02-03 19:04 Pavel Machek
  2017-02-03 19:21 ` Alan Stern
  0 siblings, 1 reply; 47+ messages in thread
From: Pavel Machek @ 2017-02-03 19:04 UTC (permalink / raw)
  To: kernel list; +Cc: stern, linux-usb, gregkh

[-- Attachment #1: Type: text/plain, Size: 2560 bytes --]

Hi!

Hmm. I moved keyboard between USB ports, and now 4.10-rc6 no longer
boots. v4.6 works ok. Let me try with keyboard unplugged... no, I
could not get it to work. I believe v4.9 and some v4.10-rc's worked,
but I'll have to double check.

Machine is small Intel desktop:

00:00.0 Host bridge: Intel Corporation 4 Series Chipset DRAM Controller (rev 03)
00:01.0 PCI bridge: Intel Corporation 4 Series Chipset PCI Express Root Port (rev 03)
00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03)
00:1b.0 Audio device: Intel Corporation NM10/ICH7 Family High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 1 (rev 01)
00:1c.1 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 2 (rev 01)
00:1d.0 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #1 (rev 01)
00:1d.1 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #2 (rev 01)
00:1d.2 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #3 (rev 01)
00:1d.3 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #4 (rev 01)
00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 IDE interface: Intel Corporation NM10/ICH7 Family SATA Controller [IDE mode] (rev 01)
00:1f.3 SMBus: Intel Corporation NM10/ICH7 Family SMBus Controller (rev 01)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 03)

Last message I can see on console is something like:

pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]

...and then blinking cursor. According to dmesg, v4.6 kernel prints
this just after this message:

pci 0000:00:1d.7: EHCI: BIOS handoff failed (BIOS bug?) 01010001

Any ideas? Let me try to update to current Linus' tree. I guess I
could try to boot with CONFIG_USB=n, but that will be pretty useless
configuration.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc6 boot regression on Intel desktop, maybe related to EHCI hadnoff? 2017-02-03 19:04 v4.10-rc6 boot regression on Intel desktop, maybe related to EHCI hadnoff? Pavel Machek @ 2017-02-03 19:21 ` Alan Stern 2017-02-03 20:51 ` v4.10-rc6 boot regression on Intel desktop, does not boot after cold boots, boots after reboot Pavel Machek 0 siblings, 1 reply; 47+ messages in thread From: Alan Stern @ 2017-02-03 19:21 UTC (permalink / raw) To: Pavel Machek; +Cc: kernel list, linux-usb, gregkh On Fri, 3 Feb 2017, Pavel Machek wrote: > Hi! > > Hmm. I moved keyboard between USB ports, and now 4.10-rc6 no longer > boots. v4.6 works ok. Let me try with keyboard unplugged... no, I > could not get it to work. I believe v4.9 and some v4.10-rc's worked, > but I'll have to double check. But all the kernel versions worked when the keyboard was plugged into its original USB port? > Machine is small Intel desktop: > > 00:00.0 Host bridge: Intel Corporation 4 Series Chipset DRAM Controller (rev 03) > 00:01.0 PCI bridge: Intel Corporation 4 Series Chipset PCI Express Root Port (rev 03) > 00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03) > 00:02.1 Display controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03) > 00:1b.0 Audio device: Intel Corporation NM10/ICH7 Family High Definition Audio Controller (rev 01) > 00:1c.0 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 1 (rev 01) > 00:1c.1 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 2 (rev 01) > 00:1d.0 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #1 (rev 01) > 00:1d.1 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #2 (rev 01) > 00:1d.2 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #3 (rev 01) > 00:1d.3 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #4 (rev 01) > 00:1d.7 USB controller: Intel Corporation 
NM10/ICH7 Family USB2 EHCI Controller (rev 01) > 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) > 00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01) > 00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01) > 00:1f.2 IDE interface: Intel Corporation NM10/ICH7 Family SATA Controller [IDE mode] (rev 01) > 00:1f.3 SMBus: Intel Corporation NM10/ICH7 Family SMBus Controller (rev 01) > 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 03) > > Last message I can see on console is something like: > > pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff] > > ...and then blinking cursor. According to dmesg, v4.6 kernel prints > this just after this message: > > pci 0000:00:1d.7: EHCI: BIOS handoff failed (BIOS bug?) 01010001 That message doesn't necessarily mean anything; it crops up pretty regularly on systems that nevertheless work fine. But I'm surprised that moving the keyboard from one port to another could cause the system to hang, whether before or after the BIOS handoff fails. > Any ideas? Let me try to update to current Linus' tree. I guess I > could try to boot with CONFIG_USB=n, but that will be pretty useless > configuration. If necessary, you could disable EHCI in the BIOS. That also would be pretty drastic, though. What does the dmesg log in 4.10-rc6 say when the keyboard is plugged into its original port? No obvious explanations spring to mind. You may have to bisect. Alan Stern ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc6 boot regression on Intel desktop, does not boot after cold boots, boots after reboot
  2017-02-03 19:21 ` Alan Stern
@ 2017-02-03 20:51 ` Pavel Machek
  2017-02-03 21:18 ` Pavel Machek
  0 siblings, 1 reply; 47+ messages in thread
From: Pavel Machek @ 2017-02-03 20:51 UTC (permalink / raw)
  To: Alan Stern; +Cc: kernel list, linux-usb, gregkh, bhelgaas, linux-pci

[-- Attachment #1: Type: text/plain, Size: 1937 bytes --]

Hi!

> > Hmm. I moved keyboard between USB ports, and now 4.10-rc6 no longer
> > boots. v4.6 works ok. Let me try with keyboard unplugged... no, I
> > could not get it to work. I believe v4.9 and some v4.10-rc's worked,
> > but I'll have to double check.
>
> But all the kernel versions worked when the keyboard was plugged into
> its original USB port?

Aha. So it looks like the difference is probably not in "where is keyboard
plugged in" but in "reboot" vs. "cold boot". I did not do a cold boot in quite
a while :-(.

Booting to grub, then hitting ctrl-alt-del is enough to make it work. Ouch.

It happens with current Linus' tree.

"usbcore.nousb usbcore.nousb=1" on kernel command line does not
help. So maybe it is not USB after all. (Sorry).

> Last message I can see on console is something like:
>
> pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]

If I do the reboot dance with 4.10-rc6, messages around the hang are:

[ 0.281948] RPC: Registered named UNIX socket transport module.
[ 0.282009] RPC: Registered udp transport module.
[ 0.282068] RPC: Registered tcp transport module.
[ 0.282126] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 0.282205] pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[ 0.316266] PCI: CLS 64 bytes, default 64
[ 0.316330] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 0.316395] software IO TLB [mem 0xb987e000-0xbd87e000] (64MB) mapped at [ffff8800b987e000-ffff8800bd87dfff]
[ 0.317912] futex hash table entries: 1024 (order: 5, 131072 bytes)
[ 0.318646] workingset: timestamp_bits=62 max_order=20 bucket_order=0

So I guess this is something with PCI subsystem? Ideas welcome...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc6 boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-03 20:51 ` v4.10-rc6 boot regression on Intel desktop, does not boot after cold boots, boots after reboot Pavel Machek @ 2017-02-03 21:18 ` Pavel Machek 2017-02-03 21:59 ` Alan Stern 2017-02-14 17:59 ` v4.10-rc8 (-rc6) " Pavel Machek 0 siblings, 2 replies; 47+ messages in thread From: Pavel Machek @ 2017-02-03 21:18 UTC (permalink / raw) To: Alan Stern; +Cc: kernel list, linux-usb, gregkh, bhelgaas, linux-pci [-- Attachment #1: Type: text/plain, Size: 880 bytes --] Hi! > > > Hmm. I moved keyboard between USB ports, and now 4.10-rc6 no longer > > > boots. v4.6 works ok. Let me try with keyboard unplugged... no, I > > > could not get it to work. I believe v4.9 and some v4.10-rc's worked, > > > but I'll have to double check. > > > > But all the kernel versions worked when the keyboard was plugged into > > its original USB port? > > Aha. So it looks difference is probably in "where is keyboard plugged > in" but in "reboot" vs. "cold boot". I did not do a cold boot in quite > a while :-(. > > Booting to grub, then hitting ctrl-alt-del is enough to make it work. Ouch. > > It happens with current Linus' tree. v4.10-rc6-feb3 : broken v4.9 : ok (v4.6 : ok) Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc6 boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-03 21:18 ` Pavel Machek @ 2017-02-03 21:59 ` Alan Stern 2017-02-03 22:43 ` Pavel Machek 2017-02-14 17:59 ` v4.10-rc8 (-rc6) " Pavel Machek 1 sibling, 1 reply; 47+ messages in thread From: Alan Stern @ 2017-02-03 21:59 UTC (permalink / raw) To: Pavel Machek; +Cc: kernel list, linux-usb, gregkh, bhelgaas, linux-pci On Fri, 3 Feb 2017, Pavel Machek wrote: > Hi! > > > > > Hmm. I moved keyboard between USB ports, and now 4.10-rc6 no longer > > > > boots. v4.6 works ok. Let me try with keyboard unplugged... no, I > > > > could not get it to work. I believe v4.9 and some v4.10-rc's worked, > > > > but I'll have to double check. > > > > > > But all the kernel versions worked when the keyboard was plugged into > > > its original USB port? > > > > Aha. So it looks difference is probably in "where is keyboard plugged > > in" but in "reboot" vs. "cold boot". I did not do a cold boot in quite > > a while :-(. > > > > Booting to grub, then hitting ctrl-alt-del is enough to make it work. Ouch. > > > > It happens with current Linus' tree. > > v4.10-rc6-feb3 : broken > v4.9 : ok > (v4.6 : ok) All I can suggest is git bisect. :-( But I agree that the problem is unlikely to be in the USB layer. Alan Stern ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc6 boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-03 21:59 ` Alan Stern @ 2017-02-03 22:43 ` Pavel Machek 2017-02-04 8:48 ` Pavel Machek 0 siblings, 1 reply; 47+ messages in thread From: Pavel Machek @ 2017-02-03 22:43 UTC (permalink / raw) To: Alan Stern; +Cc: kernel list, linux-usb, gregkh, bhelgaas, linux-pci [-- Attachment #1: Type: text/plain, Size: 1431 bytes --] On Fri 2017-02-03 16:59:05, Alan Stern wrote: > On Fri, 3 Feb 2017, Pavel Machek wrote: > > > Hi! > > > > > > > Hmm. I moved keyboard between USB ports, and now 4.10-rc6 no longer > > > > > boots. v4.6 works ok. Let me try with keyboard unplugged... no, I > > > > > could not get it to work. I believe v4.9 and some v4.10-rc's worked, > > > > > but I'll have to double check. > > > > > > > > But all the kernel versions worked when the keyboard was plugged into > > > > its original USB port? > > > > > > Aha. So it looks difference is probably in "where is keyboard plugged > > > in" but in "reboot" vs. "cold boot". I did not do a cold boot in quite > > > a while :-(. > > > > > > Booting to grub, then hitting ctrl-alt-del is enough to make it work. Ouch. > > > > > > It happens with current Linus' tree. > > > > v4.10-rc6-feb3 : broken > > v4.9 : ok > > (v4.6 : ok) > > All I can suggest is git bisect. :-( But I agree that the problem is > unlikely to be in the USB layer. Yep. I'm hoping PCI people speak up.... adding printks there should be possibility, too. (And I guess I should remove you and usb people from the cc-list... in the next mails). (I verified it happens with 32bit configuration, too, FWIW). Thanks, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc6 boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-03 22:43 ` Pavel Machek @ 2017-02-04 8:48 ` Pavel Machek 2017-02-04 16:52 ` Pavel Machek 2017-02-12 12:00 ` Pavel Machek 0 siblings, 2 replies; 47+ messages in thread From: Pavel Machek @ 2017-02-04 8:48 UTC (permalink / raw) To: pavel; +Cc: kernel list, bhelgaas, linux-pci [-- Attachment #1: Type: text/plain, Size: 1574 bytes --] On Fri 2017-02-03 23:43:09, Pavel Machek wrote: > On Fri 2017-02-03 16:59:05, Alan Stern wrote: > > On Fri, 3 Feb 2017, Pavel Machek wrote: > > > > > Hi! > > > > > > > > > Hmm. I moved keyboard between USB ports, and now 4.10-rc6 no longer > > > > > > boots. v4.6 works ok. Let me try with keyboard unplugged... no, I > > > > > > could not get it to work. I believe v4.9 and some v4.10-rc's worked, > > > > > > but I'll have to double check. > > > > > > > > > > But all the kernel versions worked when the keyboard was plugged into > > > > > its original USB port? > > > > > > > > Aha. So it looks difference is probably in "where is keyboard plugged > > > > in" but in "reboot" vs. "cold boot". I did not do a cold boot in quite > > > > a while :-(. > > > > > > > > Booting to grub, then hitting ctrl-alt-del is enough to make it work. Ouch. > > > > > > > > It happens with current Linus' tree. > > > > > > v4.10-rc6-feb3 : broken > > > v4.9 : ok > > > (v4.6 : ok) > > > > All I can suggest is git bisect. :-( But I agree that the problem is > > unlikely to be in the USB layer. > > Yep. I'm hoping PCI people speak up.... adding printks there should be > possibility, too. > > (And I guess I should remove you and usb people from the cc-list... in > the next mails). > > (I verified it happens with 32bit configuration, too, FWIW). v4.10-rc1 seems to work. 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc6 boot regression on Intel desktop, does not boot after cold boots, boots after reboot
  2017-02-04 8:48 ` Pavel Machek
@ 2017-02-04 16:52 ` Pavel Machek
  2017-02-12 12:00 ` Pavel Machek
  1 sibling, 0 replies; 47+ messages in thread
From: Pavel Machek @ 2017-02-04 16:52 UTC (permalink / raw)
  To: kernel list, bhelgaas, linux-pci, tglx, mingo, hpa, x86

[-- Attachment #1: Type: text/plain, Size: 1283 bytes --]

Hi!

> > > > > Aha. So it looks difference is probably in "where is keyboard plugged
> > > > > in" but in "reboot" vs. "cold boot". I did not do a cold boot in quite
> > > > > a while :-(.
> > > > >
> > > > > Booting to grub, then hitting ctrl-alt-del is enough to make it work. Ouch.
> > > > >
> > > > > It happens with current Linus' tree.
> > > >
> > > > v4.10-rc6-feb3 : broken
> > > > v4.9 : ok
> > > > (v4.6 : ok)

> v4.10-rc1 seems to work.

v4.10-rc3: ok
v4.10-rc5: bad, I get some kind of backtrace? Next boot results in
hang. It boots ok twice in between.
v4.10-rc6: ok ?!
34e00accf612bc5448ae709245c2b408edf39f46 : ok?
v4.10-rc6-feb3: ok?! (with config.ok, and with original config. Hmm?)

This machine was pretty reliable (except rowhammer), but I admit their
EC code may leave something to be desired. Sometimes it runs without
power LED lit (usually after strange crashes), sometimes _two_ long
presses are necessary to power it off.

So... yesterday it hung on cold boot, but not on reboot. Verified 5
times or more. Today, I can't reproduce the hang. Strange :-(.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc6 boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-04 8:48 ` Pavel Machek 2017-02-04 16:52 ` Pavel Machek @ 2017-02-12 12:00 ` Pavel Machek 1 sibling, 0 replies; 47+ messages in thread From: Pavel Machek @ 2017-02-12 12:00 UTC (permalink / raw) To: kernel list, bhelgaas, linux-pci [-- Attachment #1: Type: text/plain, Size: 1261 bytes --] Hi! > > > > > Aha. So it looks difference is probably in "where is keyboard plugged > > > > > in" but in "reboot" vs. "cold boot". I did not do a cold boot in quite > > > > > a while :-(. > > > > > > > > > > Booting to grub, then hitting ctrl-alt-del is enough to make it work. Ouch. > > > > > > > > > > It happens with current Linus' tree. > > > > > > > > v4.10-rc6-feb3 : broken > > > > v4.9 : ok > > > > (v4.6 : ok) > > > > > > All I can suggest is git bisect. :-( But I agree that the problem is > > > unlikely to be in the USB layer. > > > > Yep. I'm hoping PCI people speak up.... adding printks there should be > > possibility, too. > > > > (And I guess I should remove you and usb people from the cc-list... in > > the next mails). > > > > (I verified it happens with 32bit configuration, too, FWIW). > > v4.10-rc1 seems to work. No, it is somehow harder to test. It is only problem after some cold boots. Today machine was off for few hours, and then v4.10-rc7 failed to boot. On next reboot, I waited for grub, hit ctrl-alt-del, and it booted ok. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot
  2017-02-03 21:18 ` Pavel Machek
  2017-02-03 21:59 ` Alan Stern
@ 2017-02-14 17:59 ` Pavel Machek
  2017-02-14 19:27 ` Pavel Machek
  [not found] ` <CA+55aFyYAztA+Onquy9ODeC9_YBL_fXAd-RgeUVUhpsjK81ZVQ@mail.gmail.com>
  1 sibling, 2 replies; 47+ messages in thread
From: Pavel Machek @ 2017-02-14 17:59 UTC (permalink / raw)
  To: Alan Stern, torvalds; +Cc: kernel list, linux-usb, gregkh, bhelgaas, linux-pci

[-- Attachment #1: Type: text/plain, Size: 2021 bytes --]

Hi!

> > > > Hmm. I moved keyboard between USB ports, and now 4.10-rc6 no longer
> > > > boots. v4.6 works ok. Let me try with keyboard unplugged... no, I
> > > > could not get it to work. I believe v4.9 and some v4.10-rc's worked,
> > > > but I'll have to double check.
> > >
> > > But all the kernel versions worked when the keyboard was plugged into
> > > its original USB port?
> >
> > Aha. So it looks difference is probably in "where is keyboard plugged
> > in" but in "reboot" vs. "cold boot". I did not do a cold boot in quite
> > a while :-(.
> >
> > Booting to grub, then hitting ctrl-alt-del is enough to make it work. Ouch.
> >
> > It happens with current Linus' tree.
>
> v4.10-rc6-feb3 : broken
> v4.9 : ok
> (v4.6 : ok)

Hmm. It hangs during PCI fixups, and it hangs in v4.10-rc8, too.

With debug patch below, I get

...1d.7: PCI fixup... pass 2
...1d.7: PCI fixup... pass 3
...1d.7: PCI fixup... pass 3 done

...followed by hang. So yes, it looks USB related.

(Sometimes it hangs with some kind of backtrace involving secondary CPU
startup; unfortunately useful info is off screen at that point).

Any ideas?

Pavel

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 1800bef..060ad79 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3510,6 +3510,8 @@ void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev)
 {
 	struct pci_fixup *start, *end;
 
+	dev_info(&dev->dev, "PCI fixup device %p, pass %d\n", dev, pass);
+
 	switch (pass) {
 	case pci_fixup_early:
 		start = __start_pci_fixups_early;
@@ -3558,6 +3560,7 @@ void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev)
 		return;
 	}
 	pci_do_fixups(dev, start, end);
+	dev_info(&dev->dev, "PCI fixup device %p, pass %d, done\n", dev, pass);
 }
 EXPORT_SYMBOL(pci_fixup_device);
 
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply related [flat|nested] 47+ messages in thread
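
[Editorial note] The debug patch in the message above brackets the whole of pci_fixup_device() with one message before and one after, which is what lets the "pass 3 done" line show that the call returned for 1d.7 before the machine hung. The same bracketing idea can be applied one level down, around each individual hook. The following is only a user-space sketch of that technique, not kernel code: struct fixup_entry, run_fixups(), the demo hooks and the two counters are all hypothetical names invented for illustration.

```c
#include <stdio.h>

/* Hypothetical user-space stand-in for one entry of a fixup table. */
struct fixup_entry {
	const char *name;          /* label printed around the call */
	void (*hook)(int dev_id);  /* the fixup itself */
};

/* Index of the last hook that was entered / that returned; -1 if none. */
static int last_started = -1;
static int last_finished = -1;

/* Run every hook in order, logging before and after each call. */
static int run_fixups(const struct fixup_entry *table, int count, int dev_id)
{
	int done = 0;

	for (int i = 0; i < count; i++) {
		last_started = i;
		printf("dev %d: fixup %s start\n", dev_id, table[i].name);
		fflush(stdout);  /* get the line out before a possible hang */
		table[i].hook(dev_id);
		last_finished = i;
		printf("dev %d: fixup %s done\n", dev_id, table[i].name);
		done++;
	}
	return done;
}

/* Two placeholder hooks; they do nothing, real fixups would touch hardware. */
static void demo_hook_a(int dev_id) { (void)dev_id; }
static void demo_hook_b(int dev_id) { (void)dev_id; }

/* Runs both placeholder hooks for an arbitrary device id. */
static int run_demo(void)
{
	static const struct fixup_entry table[] = {
		{ "demo_hook_a", demo_hook_a },
		{ "demo_hook_b", demo_hook_b },
	};

	return run_fixups(table, 2, 7);
}
```

If a hook never returns, the last line printed is its "start" line and last_started is one ahead of last_finished, which names the culprit. If every "done" line appears and the hang still follows, as in the output quoted above, the hang is after the bracketed region.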
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-14 17:59 ` v4.10-rc8 (-rc6) " Pavel Machek @ 2017-02-14 19:27 ` Pavel Machek 2017-02-14 19:54 ` Alan Stern 2017-02-23 16:28 ` Frederic Weisbecker [not found] ` <CA+55aFyYAztA+Onquy9ODeC9_YBL_fXAd-RgeUVUhpsjK81ZVQ@mail.gmail.com> 1 sibling, 2 replies; 47+ messages in thread From: Pavel Machek @ 2017-02-14 19:27 UTC (permalink / raw) To: Alan Stern, torvalds; +Cc: kernel list, linux-usb, gregkh, bhelgaas, linux-pci [-- Attachment #1: Type: text/plain, Size: 1564 bytes --] On Tue 2017-02-14 18:59:56, Pavel Machek wrote: > Hi! > > > > > > Hmm. I moved keyboard between USB ports, and now 4.10-rc6 no longer > > > > > boots. v4.6 works ok. Let me try with keyboard unplugged... no, I > > > > > could not get it to work. I believe v4.9 and some v4.10-rc's worked, > > > > > but I'll have to double check. > > > > > > > > But all the kernel versions worked when the keyboard was plugged into > > > > its original USB port? > > > > > > Aha. So it looks difference is probably in "where is keyboard plugged > > > in" but in "reboot" vs. "cold boot". I did not do a cold boot in quite > > > a while :-(. > > > > > > Booting to grub, then hitting ctrl-alt-del is enough to make it work. Ouch. > > > > > > It happens with current Linus' tree. > > > > v4.10-rc6-feb3 : broken > > v4.9 : ok > > (v4.6 : ok) > > Hmm. It hangs during PCI fixups, and it hangs in v4.10-rc8, too. > > With debug patch below, I get > > ...1d.7: PCI fixup... pass 2 > ...1d.7: PCI fixup... pass 3 > ...1d.7: PCI fixup... pass 3 done > > ...followed by hang. So yes, it looks USB related. > > (Sometimes it hangs with some kind backtrace involving secondary CPU > startup, unfortunately useful info is off screen at that point). Forgot to say, 1d.7 is EHCI controller. 
00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI Controller (rev 01) Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-14 19:27 ` Pavel Machek @ 2017-02-14 19:54 ` Alan Stern 2017-02-23 16:28 ` Frederic Weisbecker 1 sibling, 0 replies; 47+ messages in thread From: Alan Stern @ 2017-02-14 19:54 UTC (permalink / raw) To: Pavel Machek Cc: torvalds, kernel list, linux-usb, gregkh, bhelgaas, linux-pci On Tue, 14 Feb 2017, Pavel Machek wrote: > On Tue 2017-02-14 18:59:56, Pavel Machek wrote: > > Hi! > > > > > > > > Hmm. I moved keyboard between USB ports, and now 4.10-rc6 no longer > > > > > > boots. v4.6 works ok. Let me try with keyboard unplugged... no, I > > > > > > could not get it to work. I believe v4.9 and some v4.10-rc's worked, > > > > > > but I'll have to double check. > > > > > > > > > > But all the kernel versions worked when the keyboard was plugged into > > > > > its original USB port? > > > > > > > > Aha. So it looks difference is probably in "where is keyboard plugged > > > > in" but in "reboot" vs. "cold boot". I did not do a cold boot in quite > > > > a while :-(. > > > > > > > > Booting to grub, then hitting ctrl-alt-del is enough to make it work. Ouch. > > > > > > > > It happens with current Linus' tree. > > > > > > v4.10-rc6-feb3 : broken > > > v4.9 : ok > > > (v4.6 : ok) > > > > Hmm. It hangs during PCI fixups, and it hangs in v4.10-rc8, too. > > > > With debug patch below, I get > > > > ...1d.7: PCI fixup... pass 2 > > ...1d.7: PCI fixup... pass 3 > > ...1d.7: PCI fixup... pass 3 done > > > > ...followed by hang. So yes, it looks USB related. > > > > (Sometimes it hangs with some kind backtrace involving secondary CPU > > startup, unfortunately useful info is off screen at that point). > > Forgot to say, 1d.7 is EHCI controller. > > 00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI > Controller (rev 01) So this looks like a problem in the PCI subsystem affecting a USB controller. 
Linus is right; bisection is the best approach now that you know a reliable trigger. Alan Stern ^ permalink raw reply [flat|nested] 47+ messages in thread
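
[Editorial note] Earlier messages in this thread report that the hang shows up only after some cold boots, so during a bisection a single clean boot is weak evidence that a commit is good. A rough way to size the test is sketched below, under two loud assumptions that the thread does not establish: that each cold boot of a bad kernel hangs independently with some fixed probability p_fail, and that a miss rate of alpha is acceptable. The function name and both parameters are hypothetical.

```c
/*
 * Smallest number of consecutive clean cold boots after which a kernel that
 * really hangs with probability p_fail per cold boot would have been caught
 * with probability at least 1 - alpha.
 *
 * Assumes independent boots with a fixed failure rate (an assumption, not a
 * measured property of the machine in this thread).
 * Returns -1 when either argument lies outside the open interval (0, 1).
 */
static int clean_boots_needed(double p_fail, double alpha)
{
	double p_all_pass = 1.0;  /* chance that a bad kernel passed every boot so far */
	int n = 0;

	if (p_fail <= 0.0 || p_fail >= 1.0 || alpha <= 0.0 || alpha >= 1.0)
		return -1;

	while (p_all_pass > alpha) {
		p_all_pass *= 1.0 - p_fail;
		n++;
	}
	return n;
}
```

For example, if a bad kernel were to hang on half of all cold boots, five clean cold boots in a row would be needed before marking a commit good at a 5% miss rate; at a 20% hang rate the count rises to fourteen. A single hang is of course enough to mark a commit bad.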
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-14 19:27 ` Pavel Machek 2017-02-14 19:54 ` Alan Stern @ 2017-02-23 16:28 ` Frederic Weisbecker 2017-02-23 18:40 ` Pavel Machek 1 sibling, 1 reply; 47+ messages in thread From: Frederic Weisbecker @ 2017-02-23 16:28 UTC (permalink / raw) To: Pavel Machek Cc: Alan Stern, torvalds, kernel list, linux-usb, gregkh, bhelgaas, linux-pci On Tue, Feb 14, 2017 at 08:27:43PM +0100, Pavel Machek wrote: > On Tue 2017-02-14 18:59:56, Pavel Machek wrote: > > Hi! > > > > > > > > Hmm. I moved keyboard between USB ports, and now 4.10-rc6 no longer > > > > > > boots. v4.6 works ok. Let me try with keyboard unplugged... no, I > > > > > > could not get it to work. I believe v4.9 and some v4.10-rc's worked, > > > > > > but I'll have to double check. > > > > > > > > > > But all the kernel versions worked when the keyboard was plugged into > > > > > its original USB port? > > > > > > > > Aha. So it looks difference is probably in "where is keyboard plugged > > > > in" but in "reboot" vs. "cold boot". I did not do a cold boot in quite > > > > a while :-(. > > > > > > > > Booting to grub, then hitting ctrl-alt-del is enough to make it work. Ouch. > > > > > > > > It happens with current Linus' tree. > > > > > > v4.10-rc6-feb3 : broken > > > v4.9 : ok > > > (v4.6 : ok) > > > > Hmm. It hangs during PCI fixups, and it hangs in v4.10-rc8, too. > > > > With debug patch below, I get > > > > ...1d.7: PCI fixup... pass 2 > > ...1d.7: PCI fixup... pass 3 > > ...1d.7: PCI fixup... pass 3 done > > > > ...followed by hang. So yes, it looks USB related. > > > > (Sometimes it hangs with some kind backtrace involving secondary CPU > > startup, unfortunately useful info is off screen at that point). > > Forgot to say, 1d.7 is EHCI controller. 
>
> 00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI
> Controller (rev 01)

Ok, I should have access soon to an EeePc 1015CX (which seems to have this controller).
I hope I'll be able to reproduce the issue there. If not, I'm sorry but I'll have to
burden you again :-)

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-23 16:28 ` Frederic Weisbecker @ 2017-02-23 18:40 ` Pavel Machek 2017-02-25 3:28 ` Frederic Weisbecker ` (2 more replies) 0 siblings, 3 replies; 47+ messages in thread From: Pavel Machek @ 2017-02-23 18:40 UTC (permalink / raw) To: Frederic Weisbecker Cc: Alan Stern, torvalds, kernel list, linux-usb, gregkh, bhelgaas, linux-pci [-- Attachment #1: Type: text/plain, Size: 2354 bytes --] On Thu 2017-02-23 17:28:26, Frederic Weisbecker wrote: > On Tue, Feb 14, 2017 at 08:27:43PM +0100, Pavel Machek wrote: > > On Tue 2017-02-14 18:59:56, Pavel Machek wrote: > > > Hi! > > > > > > > > > > Hmm. I moved keyboard between USB ports, and now 4.10-rc6 no longer > > > > > > > boots. v4.6 works ok. Let me try with keyboard unplugged... no, I > > > > > > > could not get it to work. I believe v4.9 and some v4.10-rc's worked, > > > > > > > but I'll have to double check. > > > > > > > > > > > > But all the kernel versions worked when the keyboard was plugged into > > > > > > its original USB port? > > > > > > > > > > Aha. So it looks difference is probably in "where is keyboard plugged > > > > > in" but in "reboot" vs. "cold boot". I did not do a cold boot in quite > > > > > a while :-(. > > > > > > > > > > Booting to grub, then hitting ctrl-alt-del is enough to make it work. Ouch. > > > > > > > > > > It happens with current Linus' tree. > > > > > > > > v4.10-rc6-feb3 : broken > > > > v4.9 : ok > > > > (v4.6 : ok) > > > > > > Hmm. It hangs during PCI fixups, and it hangs in v4.10-rc8, too. > > > > > > With debug patch below, I get > > > > > > ...1d.7: PCI fixup... pass 2 > > > ...1d.7: PCI fixup... pass 3 > > > ...1d.7: PCI fixup... pass 3 done > > > > > > ...followed by hang. So yes, it looks USB related. > > > > > > (Sometimes it hangs with some kind backtrace involving secondary CPU > > > startup, unfortunately useful info is off screen at that point). 
> >
> > Forgot to say, 1d.7 is EHCI controller.
> >
> > 00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI
> > Controller (rev 01)
>
> Ok, I should have access soon to a EeePc 1015CX (which seem to have this controller).
> I hope I'll be able to reproduce the issue there. If not, I'm sorry but I'll have to
> burden you again :-)

Go through more mails.

It is only reproducible after cold boot, so I doubt it will be easy to reproduce on another machine.

Now... I do have serial port, and I even might have serial cable
somewhere, but.... Given how sensitive it is, it is probably going to
go away with console on ttyS...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-23 18:40 ` Pavel Machek @ 2017-02-25 3:28 ` Frederic Weisbecker 2017-03-18 14:42 ` Frederic Weisbecker 2017-04-03 15:38 ` Frederic Weisbecker 2 siblings, 0 replies; 47+ messages in thread From: Frederic Weisbecker @ 2017-02-25 3:28 UTC (permalink / raw) To: Pavel Machek Cc: Alan Stern, torvalds, kernel list, linux-usb, gregkh, bhelgaas, linux-pci On Thu, Feb 23, 2017 at 07:40:13PM +0100, Pavel Machek wrote: > On Thu 2017-02-23 17:28:26, Frederic Weisbecker wrote: > > On Tue, Feb 14, 2017 at 08:27:43PM +0100, Pavel Machek wrote: > > > On Tue 2017-02-14 18:59:56, Pavel Machek wrote: > > > > Hi! > > > > > > > > > > > > Hmm. I moved keyboard between USB ports, and now 4.10-rc6 no longer > > > > > > > > boots. v4.6 works ok. Let me try with keyboard unplugged... no, I > > > > > > > > could not get it to work. I believe v4.9 and some v4.10-rc's worked, > > > > > > > > but I'll have to double check. > > > > > > > > > > > > > > But all the kernel versions worked when the keyboard was plugged into > > > > > > > its original USB port? > > > > > > > > > > > > Aha. So it looks difference is probably in "where is keyboard plugged > > > > > > in" but in "reboot" vs. "cold boot". I did not do a cold boot in quite > > > > > > a while :-(. > > > > > > > > > > > > Booting to grub, then hitting ctrl-alt-del is enough to make it work. Ouch. > > > > > > > > > > > > It happens with current Linus' tree. > > > > > > > > > > v4.10-rc6-feb3 : broken > > > > > v4.9 : ok > > > > > (v4.6 : ok) > > > > > > > > Hmm. It hangs during PCI fixups, and it hangs in v4.10-rc8, too. > > > > > > > > With debug patch below, I get > > > > > > > > ...1d.7: PCI fixup... pass 2 > > > > ...1d.7: PCI fixup... pass 3 > > > > ...1d.7: PCI fixup... pass 3 done > > > > > > > > ...followed by hang. So yes, it looks USB related. 
> > > >
> > > > (Sometimes it hangs with some kind backtrace involving secondary CPU
> > > > startup, unfortunately useful info is off screen at that point).
> > >
> > > Forgot to say, 1d.7 is EHCI controller.
> > >
> > > 00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI
> > > Controller (rev 01)
> >
> > Ok, I should have access soon to a EeePc 1015CX (which seem to have this controller).
> > I hope I'll be able to reproduce the issue there. If not, I'm sorry but I'll have to
> > burden you again :-)
>
> Go through more mails.

I've read the whole thread several times, but I couldn't find any more clues.

> It is only reproducible after cold boot. .. so I doubt it will be easy to reproduce on another machine.

I have no idea. That's my only hope for now.

>
> Now... I do have serial port, and I even might have serial cable
> somewhere, but.... Giving how sensitive it is, it is probably going to
> go away with console on ttyS...

We'll see how it goes. I'll be off next week and then I should get the eeepc.
I'll get back to it then.

What surprises me is that the tick doesn't even fire yet at PCI quirk time,
at least not on my machine, where the clocksource is set up afterward.
That said, if some of the PCI quirks are async works, that might explain a
later interaction with the tick.

Thanks.

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-23 18:40 ` Pavel Machek 2017-02-25 3:28 ` Frederic Weisbecker @ 2017-03-18 14:42 ` Frederic Weisbecker 2017-04-03 15:38 ` Frederic Weisbecker 2 siblings, 0 replies; 47+ messages in thread From: Frederic Weisbecker @ 2017-03-18 14:42 UTC (permalink / raw) To: Pavel Machek Cc: Alan Stern, torvalds, kernel list, linux-usb, gregkh, bhelgaas, linux-pci On Thu, Feb 23, 2017 at 07:40:13PM +0100, Pavel Machek wrote: > On Thu 2017-02-23 17:28:26, Frederic Weisbecker wrote: > > On Tue, Feb 14, 2017 at 08:27:43PM +0100, Pavel Machek wrote: > > > On Tue 2017-02-14 18:59:56, Pavel Machek wrote: > > > > Hi! > > > > > > > > > > > > Hmm. I moved keyboard between USB ports, and now 4.10-rc6 no longer > > > > > > > > boots. v4.6 works ok. Let me try with keyboard unplugged... no, I > > > > > > > > could not get it to work. I believe v4.9 and some v4.10-rc's worked, > > > > > > > > but I'll have to double check. > > > > > > > > > > > > > > But all the kernel versions worked when the keyboard was plugged into > > > > > > > its original USB port? > > > > > > > > > > > > Aha. So it looks difference is probably in "where is keyboard plugged > > > > > > in" but in "reboot" vs. "cold boot". I did not do a cold boot in quite > > > > > > a while :-(. > > > > > > > > > > > > Booting to grub, then hitting ctrl-alt-del is enough to make it work. Ouch. > > > > > > > > > > > > It happens with current Linus' tree. > > > > > > > > > > v4.10-rc6-feb3 : broken > > > > > v4.9 : ok > > > > > (v4.6 : ok) > > > > > > > > Hmm. It hangs during PCI fixups, and it hangs in v4.10-rc8, too. > > > > > > > > With debug patch below, I get > > > > > > > > ...1d.7: PCI fixup... pass 2 > > > > ...1d.7: PCI fixup... pass 3 > > > > ...1d.7: PCI fixup... pass 3 done > > > > > > > > ...followed by hang. So yes, it looks USB related. 
> > > >
> > > > (Sometimes it hangs with some kind backtrace involving secondary CPU
> > > > startup, unfortunately useful info is off screen at that point).
> > >
> > > Forgot to say, 1d.7 is EHCI controller.
> > >
> > > 00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI
> > > Controller (rev 01)
> >
> > Ok, I should have access soon to a EeePc 1015CX (which seem to have this controller).
> > I hope I'll be able to reproduce the issue there. If not, I'm sorry but I'll have to
> > burden you again :-)
>
> Go through more mails. It is only reproducible after cold boot. .. so
> I doubt it will be easy to reproduce on another machine.
>
> Now... I do have serial port, and I even might have serial cable
> somewhere, but.... Giving how sensitive it is, it is probably going to
> go away with console on ttyS...

So I had access to a machine with an NM10/ICH7 chipset and I failed to
reproduce the hang. Which machine are you using? I fear you're my last resort.

I suspect something is programming the clockevent behind the tick's back.
I thought it could be the clockevents switch code but I can't find any
issue there.

I see you have CONFIG_HIGH_RES_TIMERS=n. Could you try with it enabled?

For a quick rewind:

    git reset --hard v4.10
    git revert 558e8e27e73f53f8a512485be538b07115fe5f3c

Thanks!

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-23 18:40 ` Pavel Machek 2017-02-25 3:28 ` Frederic Weisbecker 2017-03-18 14:42 ` Frederic Weisbecker @ 2017-04-03 15:38 ` Frederic Weisbecker 2017-04-03 18:20 ` Pavel Machek 2 siblings, 1 reply; 47+ messages in thread From: Frederic Weisbecker @ 2017-04-03 15:38 UTC (permalink / raw) To: Pavel Machek Cc: Alan Stern, torvalds, kernel list, linux-usb, gregkh, bhelgaas, linux-pci On Thu, Feb 23, 2017 at 07:40:13PM +0100, Pavel Machek wrote: > On Thu 2017-02-23 17:28:26, Frederic Weisbecker wrote: > > On Tue, Feb 14, 2017 at 08:27:43PM +0100, Pavel Machek wrote: > > > On Tue 2017-02-14 18:59:56, Pavel Machek wrote: > > > > Hi! > > > > > > > > > > > > Hmm. I moved keyboard between USB ports, and now 4.10-rc6 no longer > > > > > > > > boots. v4.6 works ok. Let me try with keyboard unplugged... no, I > > > > > > > > could not get it to work. I believe v4.9 and some v4.10-rc's worked, > > > > > > > > but I'll have to double check. > > > > > > > > > > > > > > But all the kernel versions worked when the keyboard was plugged into > > > > > > > its original USB port? > > > > > > > > > > > > Aha. So it looks difference is probably in "where is keyboard plugged > > > > > > in" but in "reboot" vs. "cold boot". I did not do a cold boot in quite > > > > > > a while :-(. > > > > > > > > > > > > Booting to grub, then hitting ctrl-alt-del is enough to make it work. Ouch. > > > > > > > > > > > > It happens with current Linus' tree. > > > > > > > > > > v4.10-rc6-feb3 : broken > > > > > v4.9 : ok > > > > > (v4.6 : ok) > > > > > > > > Hmm. It hangs during PCI fixups, and it hangs in v4.10-rc8, too. > > > > > > > > With debug patch below, I get > > > > > > > > ...1d.7: PCI fixup... pass 2 > > > > ...1d.7: PCI fixup... pass 3 > > > > ...1d.7: PCI fixup... pass 3 done > > > > > > > > ...followed by hang. So yes, it looks USB related. 
> > > > > > > > (Sometimes it hangs with some kind backtrace involving secondary CPU > > > > startup, unfortunately useful info is off screen at that point). > > > > > > Forgot to say, 1d.7 is EHCI controller. > > > > > > 00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI > > > Controller (rev 01) > > > > Ok, I should have access soon to a EeePc 1015CX (which seem to have this controller). > > I hope I'll be able to reproduce the issue there. If not, I'm sorry but I'll have to > > burden you again :-) > > Go through more mails. It is only reproducible after cold boot. .. so > I doubt it will be easy to reproduce on another machine. > > Now... I do have serial port, and I even might have serial cable > somewhere, but.... Giving how sensitive it is, it is probably going to > go away with console on ttyS... I also tried on an eeepc (which has ICH7/NM10 as well), with your config. I even plugged a usb keyboard but even then I have been unable to reproduce either :-( ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-04-03 15:38 ` Frederic Weisbecker @ 2017-04-03 18:20 ` Pavel Machek 2017-04-12 15:08 ` Frederic Weisbecker 0 siblings, 1 reply; 47+ messages in thread From: Pavel Machek @ 2017-04-03 18:20 UTC (permalink / raw) To: Frederic Weisbecker Cc: Alan Stern, torvalds, kernel list, linux-usb, gregkh, bhelgaas, linux-pci [-- Attachment #1: Type: text/plain, Size: 1514 bytes --] > > > > > ...1d.7: PCI fixup... pass 2 > > > > > ...1d.7: PCI fixup... pass 3 > > > > > ...1d.7: PCI fixup... pass 3 done > > > > > > > > > > ...followed by hang. So yes, it looks USB related. > > > > > > > > > > (Sometimes it hangs with some kind backtrace involving secondary CPU > > > > > startup, unfortunately useful info is off screen at that point). > > > > > > > > Forgot to say, 1d.7 is EHCI controller. > > > > > > > > 00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI > > > > Controller (rev 01) > > > > > > Ok, I should have access soon to a EeePc 1015CX (which seem to have this controller). > > > I hope I'll be able to reproduce the issue there. If not, I'm sorry but I'll have to > > > burden you again :-) > > > > Go through more mails. It is only reproducible after cold boot. .. so > > I doubt it will be easy to reproduce on another machine. > > > > Now... I do have serial port, and I even might have serial cable > > somewhere, but.... Giving how sensitive it is, it is probably going to > > go away with console on ttyS... > > I also tried on an eeepc (which has ICH7/NM10 as well), with your config. > I even plugged a usb keyboard but even then I have been unable to > reproduce either :-( Ok, give me some time. I'm no longer using the affected machine, so no promises. 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot
  2017-04-03 18:20 ` Pavel Machek
@ 2017-04-12 15:08 ` Frederic Weisbecker
  2017-04-15 21:34 ` Pavel Machek
  0 siblings, 1 reply; 47+ messages in thread
From: Frederic Weisbecker @ 2017-04-12 15:08 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Alan Stern, torvalds, kernel list, linux-usb, gregkh, bhelgaas, linux-pci

[-- Attachment #1: Type: text/plain, Size: 3256 bytes --]

On Mon, Apr 03, 2017 at 08:20:50PM +0200, Pavel Machek wrote:
> > > > > > ...1d.7: PCI fixup... pass 2
> > > > > > ...1d.7: PCI fixup... pass 3
> > > > > > ...1d.7: PCI fixup... pass 3 done
> > > > > >
> > > > > > ...followed by hang. So yes, it looks USB related.
> > > > > >
> > > > > > (Sometimes it hangs with some kind backtrace involving secondary CPU
> > > > > > startup, unfortunately useful info is off screen at that point).
> > > > >
> > > > > Forgot to say, 1d.7 is EHCI controller.
> > > > >
> > > > > 00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI
> > > > > Controller (rev 01)
> > > >
> > > > Ok, I should have access soon to a EeePc 1015CX (which seem to have this controller).
> > > > I hope I'll be able to reproduce the issue there. If not, I'm sorry but I'll have to
> > > > burden you again :-)
> > >
> > > Go through more mails. It is only reproducible after cold boot. .. so
> > > I doubt it will be easy to reproduce on another machine.
> > >
> > > Now... I do have serial port, and I even might have serial cable
> > > somewhere, but.... Giving how sensitive it is, it is probably going to
> > > go away with console on ttyS...
> >
> > I also tried on an eeepc (which has ICH7/NM10 as well), with your config.
> > I even plugged a usb keyboard but even then I have been unable to
> > reproduce either :-(
>
> Ok, give me some time. I'm no longer using the affected machine, so no
> promises.

Actually, someone recently reported an issue very similar to yours.
It's probably the same. And I have a potential fix.

The scenario is a bit tricky again, and still theoretical. If you're interested
in the gory details: a tick which is scheduled at jiffies = N + 1, in order to
expire a timer_list timer, fires a tiny bit too early (i.e. a few microseconds
in advance). So it doesn't update jiffies on IRQ entry and still sees
jiffies = N. The timer_list timer doesn't expire yet, and on IRQ exit we
reschedule the tick at the same time. But we see that ts->next_tick already
has that value, therefore we don't reprogram it again, leaving the clockevent
unprogrammed.

So in case you have the time and opportunity to test the fix, you'll need to:

1) Bring back the offending change (i.e. revert its revert):

    git revert 558e8e27e73f53f8a512485be538b07115fe5f3c

2) Apply a delta fix:

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a3b8154..ae66515 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1071,8 +1071,10 @@ static void tick_nohz_handler(struct clock_event_device *dev)
 	tick_sched_handle(ts, regs);
 
 	/* No need to reprogram if we are running tickless */
-	if (unlikely(ts->tick_stopped))
+	if (unlikely(ts->tick_stopped)) {
+		ts->next_tick = 0;
 		return;
+	}
 
 	hrtimer_forward(&ts->sched_timer, now, tick_period);
 	tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
@@ -1172,8 +1174,10 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 		tick_sched_handle(ts, regs);
 
 	/* No need to reprogram if we are in idle or full dynticks mode */
-	if (unlikely(ts->tick_stopped))
+	if (unlikely(ts->tick_stopped)) {
+		ts->next_tick = 0;
 		return HRTIMER_NORESTART;
+	}
 
 	hrtimer_forward(timer, now, tick_period);
 

Thanks!

[-- Attachment #2: pavel.diff --]
[-- Type: text/x-diff, Size: 927 bytes --]

^ permalink raw reply related	[flat|nested] 47+ messages in thread
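
For readers following along, the scenario described in the message above can be modeled in a few lines of user-space C. This is only a sketch under stated assumptions, not the kernel code: `struct toy_tick`, `toy_program_tick()`, `toy_tick_irq()` and `toy_early_tick_leaves_timer_armed()` are hypothetical names invented for illustration, the expiry value is arbitrary, and the model only keeps the two pieces of state the explanation relies on (the cached `next_tick` and whether the one-shot clockevent is still armed).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of the per-CPU tick state. The field names echo the thread
 * (next_tick, tick_stopped), but the struct itself is hypothetical.
 */
struct toy_tick {
	uint64_t next_tick;	/* cached expiry of the last programmed tick, 0 = none */
	bool tick_stopped;	/* tick is in "stopped" (nohz) mode */
	bool hw_armed;		/* is the one-shot clockevent currently programmed? */
};

/* Program the clockevent, unless the cache claims it is already set. */
static void toy_program_tick(struct toy_tick *ts, uint64_t expires)
{
	if (ts->next_tick == expires)
		return;		/* looks redundant, so skip the reprogramming */

	ts->next_tick = expires;
	ts->hw_armed = true;
}

/* Tick interrupt: a one-shot device is no longer armed once it has fired. */
static void toy_tick_irq(struct toy_tick *ts, bool with_fix)
{
	ts->hw_armed = false;

	if (ts->tick_stopped) {
		if (with_fix)
			ts->next_tick = 0;	/* invalidate the stale cache */
		return;
	}
	/* The periodic re-arm path is not exercised by this model. */
}

/*
 * Replay the sequence from the explanation: the tick is programmed for a
 * pending timer, fires slightly early (so the timer has not expired yet),
 * and the IRQ exit path asks for the very same expiry again.
 * Returns true if the clockevent ends up armed.
 */
bool toy_early_tick_leaves_timer_armed(bool with_fix)
{
	const uint64_t expiry = 1000;	/* arbitrary stand-in for "jiffies = N + 1" */
	struct toy_tick ts = { .tick_stopped = true };

	toy_program_tick(&ts, expiry);	/* tick scheduled for the timer */
	toy_tick_irq(&ts, with_fix);	/* fires early, jiffies still N */
	toy_program_tick(&ts, expiry);	/* IRQ exit: same expiry requested */

	return ts.hw_armed;
}
```

In this model, `toy_early_tick_leaves_timer_armed(false)` returns false (the cached value matches, nothing is reprogrammed, and no further tick will fire), while `toy_early_tick_leaves_timer_armed(true)` returns true, which is the effect the `ts->next_tick = 0` lines in the delta fix are meant to have.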
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-04-12 15:08 ` Frederic Weisbecker @ 2017-04-15 21:34 ` Pavel Machek 2017-04-20 14:52 ` Frederic Weisbecker 0 siblings, 1 reply; 47+ messages in thread From: Pavel Machek @ 2017-04-15 21:34 UTC (permalink / raw) To: Frederic Weisbecker Cc: Alan Stern, torvalds, kernel list, linux-usb, gregkh, bhelgaas, linux-pci [-- Attachment #1: Type: text/plain, Size: 2050 bytes --] On Wed 2017-04-12 17:08:35, Frederic Weisbecker wrote: > On Mon, Apr 03, 2017 at 08:20:50PM +0200, Pavel Machek wrote: > > > > > > > ...1d.7: PCI fixup... pass 2 > > > > > > > ...1d.7: PCI fixup... pass 3 > > > > > > > ...1d.7: PCI fixup... pass 3 done > > > > > > > > > > > > > > ...followed by hang. So yes, it looks USB related. > > > > > > > > > > > > > > (Sometimes it hangs with some kind backtrace involving secondary CPU > > > > > > > startup, unfortunately useful info is off screen at that point). > > > > > > > > > > > > Forgot to say, 1d.7 is EHCI controller. > > > > > > > > > > > > 00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI > > > > > > Controller (rev 01) > > > > > > > > > > Ok, I should have access soon to a EeePc 1015CX (which seem to have this controller). > > > > > I hope I'll be able to reproduce the issue there. If not, I'm sorry but I'll have to > > > > > burden you again :-) > > > > > > > > Go through more mails. It is only reproducible after cold boot. .. so > > > > I doubt it will be easy to reproduce on another machine. > > > > > > > > Now... I do have serial port, and I even might have serial cable > > > > somewhere, but.... Giving how sensitive it is, it is probably going to > > > > go away with console on ttyS... > > > > > > I also tried on an eeepc (which has ICH7/NM10 as well), with your config. > > > I even plugged a usb keyboard but even then I have been unable to > > > reproduce either :-( > > > > Ok, give me some time. 
I'm no longer using the affected machine, so no > > promises. > > Actually someone reported me a very similar issue than yours lately. It's probably > the same. And I have a potential fix. Got the machine back to work -- I guess it will be useful for distcc. And yes, you seem to have right fix :-). Tested-by: Pavel Machek <pavel@ucw.cz> Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-04-15 21:34 ` Pavel Machek @ 2017-04-20 14:52 ` Frederic Weisbecker 0 siblings, 0 replies; 47+ messages in thread From: Frederic Weisbecker @ 2017-04-20 14:52 UTC (permalink / raw) To: Pavel Machek Cc: Alan Stern, torvalds, kernel list, linux-usb, gregkh, bhelgaas, linux-pci On Sat, Apr 15, 2017 at 11:34:47PM +0200, Pavel Machek wrote: > On Wed 2017-04-12 17:08:35, Frederic Weisbecker wrote: > > On Mon, Apr 03, 2017 at 08:20:50PM +0200, Pavel Machek wrote: > > > > > > > > ...1d.7: PCI fixup... pass 2 > > > > > > > > ...1d.7: PCI fixup... pass 3 > > > > > > > > ...1d.7: PCI fixup... pass 3 done > > > > > > > > > > > > > > > > ...followed by hang. So yes, it looks USB related. > > > > > > > > > > > > > > > > (Sometimes it hangs with some kind backtrace involving secondary CPU > > > > > > > > startup, unfortunately useful info is off screen at that point). > > > > > > > > > > > > > > Forgot to say, 1d.7 is EHCI controller. > > > > > > > > > > > > > > 00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI > > > > > > > Controller (rev 01) > > > > > > > > > > > > Ok, I should have access soon to a EeePc 1015CX (which seem to have this controller). > > > > > > I hope I'll be able to reproduce the issue there. If not, I'm sorry but I'll have to > > > > > > burden you again :-) > > > > > > > > > > Go through more mails. It is only reproducible after cold boot. .. so > > > > > I doubt it will be easy to reproduce on another machine. > > > > > > > > > > Now... I do have serial port, and I even might have serial cable > > > > > somewhere, but.... Giving how sensitive it is, it is probably going to > > > > > go away with console on ttyS... > > > > > > > > I also tried on an eeepc (which has ICH7/NM10 as well), with your config. 
> > > > I even plugged a usb keyboard but even then I have been unable to > > > > reproduce either :-( > > > > > > Ok, give me some time. I'm no longer using the affected machine, so no > > > promises. > > > > Actually someone reported me a very similar issue than yours lately. It's probably > > the same. And I have a potential fix. > > Got the machine back to work -- I guess it will be useful for distcc. > > And yes, you seem to have right fix :-). > > Tested-by: Pavel Machek <pavel@ucw.cz> Thanks a lot! I'm posting the fix. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot [not found] ` <CA+55aFxuXgsCyMgrRDHdM6BQaej68QoU8TwdM=3LYu9LMBf4fQ@mail.gmail.com> @ 2017-02-15 17:23 ` Pavel Machek 2017-02-15 23:20 ` Pavel Machek 0 siblings, 1 reply; 47+ messages in thread From: Pavel Machek @ 2017-02-15 17:23 UTC (permalink / raw) To: Linus Torvalds Cc: linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list [-- Attachment #1: Type: text/plain, Size: 1011 bytes --] On Tue 2017-02-14 11:12:26, Linus Torvalds wrote: > On Feb 14, 2017 9:59 AM, "Pavel Machek" <pavel@ucw.cz> wrote: > > Hi! > > > > > > > Booting to grub, then hitting ctrl-alt-del is enough to make it work. > Ouch. > > > > > > It happens with current Linus' tree. > > > > v4.10-rc6-feb3 : broken > > v4.9 : ok > > I wonder if you could bisect it now that you've figured out the rules for > when it breaks... I guess that's what I'll need to do. It is my main machine, so it is a bit painful. Anyway, it seems that "nosmp" makes it hang at similar place, but makes it hang reliably, reboot or cold poweroff. So I guess that's what I'll use for bisection -- should be possible to do automatically that way. > I don't think I've seen any similar reports, so we don't have a lot of > clues to go by otherwise, I think. :-(. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot
  2017-02-15 17:23 ` Pavel Machek
@ 2017-02-15 23:20 ` Pavel Machek
  2017-02-15 23:34 ` Linus Torvalds
  0 siblings, 1 reply; 47+ messages in thread
From: Pavel Machek @ 2017-02-15 23:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list

[-- Attachment #1: Type: text/plain, Size: 1189 bytes --]

On Wed 2017-02-15 18:23:03, Pavel Machek wrote:
> On Tue 2017-02-14 11:12:26, Linus Torvalds wrote:
> > On Feb 14, 2017 9:59 AM, "Pavel Machek" <pavel@ucw.cz> wrote:
> >
> > Hi!
> >
> > > >
> > > > Booting to grub, then hitting ctrl-alt-del is enough to make it work.
> > Ouch.
> > > >
> > > > It happens with current Linus' tree.
> > >
> > > v4.10-rc6-feb3 : broken
> > > v4.9 : ok
> >
> > I wonder if you could bisect it now that you've figured out the rules for
> > when it breaks...
>
> I guess that's what I'll need to do. It is my main machine, so it is a
> bit painful.
>
> Anyway, it seems that "nosmp" makes it hang at similar place, but
> makes it hang reliably, reboot or cold poweroff. So I guess that's
> what I'll use for bisection -- should be possible to do automatically
> that way.

I was mistaken. "nosmp" does not seem to make the hang reliable.

my-4.10-r8+	broken
4.10-rc8	broken
4.10-rc4	broken
4.10-rc3	ok
4.10-rc2	ok?

I started bisect, 168 revisions to go.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-15 23:20 ` Pavel Machek @ 2017-02-15 23:34 ` Linus Torvalds 2017-02-16 11:11 ` Pavel Machek 0 siblings, 1 reply; 47+ messages in thread From: Linus Torvalds @ 2017-02-15 23:34 UTC (permalink / raw) To: Pavel Machek Cc: linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list On Wed, Feb 15, 2017 at 3:20 PM, Pavel Machek <pavel@ucw.cz> wrote: > 4.10-rc4 broken > 4.10-rc3 ok Hmm. If those actually end up being reliable, then there's not a whole lot in between them wrt PCI or USB. What looked like the most likely candidate seems to be xhci-specific, though. But maybe it's something that isn't directly in drivers/{pci,usb}/ and just interacts badly. Linus ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot
  2017-02-15 23:34 ` Linus Torvalds
@ 2017-02-16 11:11 ` Pavel Machek
  2017-02-16 17:25 ` Pavel Machek
  0 siblings, 1 reply; 47+ messages in thread
From: Pavel Machek @ 2017-02-16 11:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list

[-- Attachment #1: Type: text/plain, Size: 2291 bytes --]

Hi!

On Wed 2017-02-15 15:34:27, Linus Torvalds wrote:
> On Wed, Feb 15, 2017 at 3:20 PM, Pavel Machek <pavel@ucw.cz> wrote:
> > 4.10-rc4 broken
> > 4.10-rc3 ok
>
> Hmm. If those actually end up being reliable, then there's not a whole
> lot in between them wrt PCI or USB.
>
> What looked like the most likely candidate seems to be xhci-specific, though.
>
> But maybe it's something that isn't directly in drivers/{pci,usb}/ and
> just interacts badly.

Ok. I _hope_ my tests are ok. Bisect log so far is:

pavel@half:/data/l/linux$ git bisect log
# bad: [49def1853334396f948dcb4cedb9347abb318df5] Linux 4.10-rc4
# good: [a121103c922847ba5010819a3f250f1f7fc84ab8] Linux 4.10-rc3
git bisect start 'v4.10-rc4' 'v4.10-rc3'
# good: [557ed56cc75e0a33c15ba438734a280bac23bd32] Merge tag 'sound-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good 557ed56cc75e0a33c15ba438734a280bac23bd32
# good: [f4d3935e4f4884ba80561db5549394afb8eef8f7] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect good f4d3935e4f4884ba80561db5549394afb8eef8f7
# bad: [83346fbc07d267de777e2597552f785174ad0373] Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 83346fbc07d267de777e2597552f785174ad0373
# good: [18e7a45af91acdde99d3aa1372cc40e1f8142f7b] perf/x86: Reject non sampling events with precise_ip
git bisect good 18e7a45af91acdde99d3aa1372cc40e1f8142f7b
# good: [84936118bdf37bda513d4a361c38181a216427e0] x86/unwind: Disable KASAN checks for non-current tasks
git bisect good 84936118bdf37bda513d4a361c38181a216427e0
# good: [79078c53baabee12dfefb0cfe00ca94cb2c35570] Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 79078c53baabee12dfefb0cfe00ca94cb2c35570
# good: [695085b4bc7603551db0b3da897b8bf9893ca218] x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc()
git bisect good 695085b4bc7603551db0b3da897b8bf9893ca218

I should go now, but I should be able to finish it today.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot
  2017-02-16 11:11 ` Pavel Machek
@ 2017-02-16 17:25 ` Pavel Machek
  2017-02-16 18:13 ` Frederic Weisbecker
  2017-02-16 19:06 ` Pavel Machek
  0 siblings, 2 replies; 47+ messages in thread
From: Pavel Machek @ 2017-02-16 17:25 UTC (permalink / raw)
  To: Linus Torvalds, fweisbec, wanpeng.li, peterz, riel, tglx, stable
  Cc: linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list

[-- Attachment #1: Type: text/plain, Size: 923 bytes --]

Hi!

> > > 4.10-rc4 broken
> > > 4.10-rc3 ok
> >
> > Hmm. If those actually end up being reliable, then there's not a whole
> > lot in between them wrt PCI or USB.
> >
> > What looked like the most likely candidate seems to be xhci-specific, though.
> >
> > But maybe it's something that isn't directly in drivers/{pci,usb}/ and
> > just interacts badly.
>
> Ok. I _hope_ my tests are ok. Bisect log so far is:

And the winner is:

pavel@half:/data/l/linux$ git bisect bad
24b91e360ef521a2808771633d76ebc68bd5604b is the first bad commit
commit 24b91e360ef521a2808771633d76ebc68bd5604b
Author: Frederic Weisbecker <fweisbec@gmail.com>
Date:   Wed Jan 4 15:12:04 2017 +0100

    nohz: Fix collision between tick and other hrtimers

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-16 17:25 ` Pavel Machek @ 2017-02-16 18:13 ` Frederic Weisbecker 2017-02-16 18:20 ` Linus Torvalds 2017-02-16 19:06 ` Pavel Machek 1 sibling, 1 reply; 47+ messages in thread From: Frederic Weisbecker @ 2017-02-16 18:13 UTC (permalink / raw) To: Pavel Machek Cc: Linus Torvalds, wanpeng.li, peterz, riel, tglx, stable, linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list On Thu, Feb 16, 2017 at 06:25:35PM +0100, Pavel Machek wrote: > Hi! > > > > > 4.10-rc4 broken > > > > 4.10-rc3 ok > > > > > > Hmm. If those actually end up being reliable, then there's not a whole > > > lot in between them wrt PCI or USB. > > > > > > What looked like the most likely candidate seems to be xhci-specific, though. > > > > > > But maybe it's something that isn't directly in drivers/{pci,usb}/ and > > > just interacts badly. > > > > Ok. I _hope_ my tests are ok. Bisect log so far is: > > And the winner is: > > pavel@half:/data/l/linux$ git bisect bad > 24b91e360ef521a2808771633d76ebc68bd5604b is the first bad commit > commit 24b91e360ef521a2808771633d76ebc68bd5604b > Author: Frederic Weisbecker <fweisbec@gmail.com> > Date: Wed Jan 4 15:12:04 2017 +0100 > > nohz: Fix collision between tick and other hrtimers I haven't followed the discussion but this patch has a known issue which is fixed with: 7bdb59f1ad474bd7161adc8f923cdef10f2638d1 "tick/nohz: Fix possible missing clock reprog after tick soft restart" I hope this fixes your issue. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot
  2017-02-16 18:13 ` Frederic Weisbecker
@ 2017-02-16 18:20 ` Linus Torvalds
  2017-02-16 18:34 ` Frederic Weisbecker
  0 siblings, 1 reply; 47+ messages in thread
From: Linus Torvalds @ 2017-02-16 18:20 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Pavel Machek, wanpeng.li, Peter Zijlstra, Rik van Riel, Thomas Gleixner, # .39.x, linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list

On Thu, Feb 16, 2017 at 10:13 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> I haven't followed the discussion but this patch has a known issue which is fixed
> with:
>     7bdb59f1ad474bd7161adc8f923cdef10f2638d1
>     "tick/nohz: Fix possible missing clock reprog after tick soft restart"
>
> I hope this fixes your issue.

No, Pavel saw the problem with rc8 too, which already has that fix.

So I think we'll just need to revert that original patch (and that
means that we have to revert the commit you point to as well, since
that ->next_tick field was added by the original commit).

Pavel, can you verify that rc8 with both

    24b91e360ef521a2808771633d76ebc68bd5604b
    7bdb59f1ad474bd7161adc8f923cdef10f2638d1

reverted works reliably for you?

                Linus

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-16 18:20 ` Linus Torvalds @ 2017-02-16 18:34 ` Frederic Weisbecker 2017-02-16 19:34 ` Thomas Gleixner 0 siblings, 1 reply; 47+ messages in thread From: Frederic Weisbecker @ 2017-02-16 18:34 UTC (permalink / raw) To: Linus Torvalds Cc: Pavel Machek, wanpeng.li, Peter Zijlstra, Rik van Riel, Thomas Gleixner, # .39.x, linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list On Thu, Feb 16, 2017 at 10:20:14AM -0800, Linus Torvalds wrote: > On Thu, Feb 16, 2017 at 10:13 AM, Frederic Weisbecker > <fweisbec@gmail.com> wrote: > > > > I haven't followed the discussion but this patch has a known issue which is fixed > > with: > > 7bdb59f1ad474bd7161adc8f923cdef10f2638d1 > > "tick/nohz: Fix possible missing clock reprog after tick soft restart" > > > > I hope this fixes your issue. > > No, Pavel saw the problem with rc8 too, which already has that fix. > > So I think we'll just need to revert that original patch (and that > means that we have to revert the commit you point to as well, since > that ->next_tick field was added by the original commit). Aw too bad, but indeed that late we don't have the choice. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-16 18:34 ` Frederic Weisbecker @ 2017-02-16 19:34 ` Thomas Gleixner 2017-02-16 20:06 ` Pavel Machek 2017-02-17 14:04 ` Frederic Weisbecker 0 siblings, 2 replies; 47+ messages in thread From: Thomas Gleixner @ 2017-02-16 19:34 UTC (permalink / raw) To: Frederic Weisbecker Cc: Linus Torvalds, Pavel Machek, wanpeng.li, Peter Zijlstra, Rik van Riel, # .39.x, linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list On Thu, 16 Feb 2017, Frederic Weisbecker wrote: > On Thu, Feb 16, 2017 at 10:20:14AM -0800, Linus Torvalds wrote: > > On Thu, Feb 16, 2017 at 10:13 AM, Frederic Weisbecker > > <fweisbec@gmail.com> wrote: > > > > > > I haven't followed the discussion but this patch has a known issue which is fixed > > > with: > > > 7bdb59f1ad474bd7161adc8f923cdef10f2638d1 > > > "tick/nohz: Fix possible missing clock reprog after tick soft restart" > > > > > > I hope this fixes your issue. > > > > No, Pavel saw the problem with rc8 too, which already has that fix. > > > > So I think we'll just need to revert that original patch (and that > > means that we have to revert the commit you point to as well, since > > that ->next_tick field was added by the original commit). > > Aw too bad, but indeed that late we don't have the choice. Hint: Look for CPU hotplug interaction of these patches. I bet something becomes stale when the CPU goes down and does not get reset when it comes back online. Thanks, tglx ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-16 19:34 ` Thomas Gleixner @ 2017-02-16 20:06 ` Pavel Machek 2017-02-16 20:21 ` Linus Torvalds 2017-02-17 1:11 ` Greg Kroah-Hartman 2017-02-17 14:04 ` Frederic Weisbecker 1 sibling, 2 replies; 47+ messages in thread From: Pavel Machek @ 2017-02-16 20:06 UTC (permalink / raw) To: Thomas Gleixner Cc: Frederic Weisbecker, Linus Torvalds, wanpeng.li, Peter Zijlstra, Rik van Riel, # .39.x, linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list [-- Attachment #1: Type: text/plain, Size: 1766 bytes --] On Thu 2017-02-16 20:34:45, Thomas Gleixner wrote: > On Thu, 16 Feb 2017, Frederic Weisbecker wrote: > > On Thu, Feb 16, 2017 at 10:20:14AM -0800, Linus Torvalds wrote: > > > On Thu, Feb 16, 2017 at 10:13 AM, Frederic Weisbecker > > > <fweisbec@gmail.com> wrote: > > > > > > > > I haven't followed the discussion but this patch has a known issue which is fixed > > > > with: > > > > 7bdb59f1ad474bd7161adc8f923cdef10f2638d1 > > > > "tick/nohz: Fix possible missing clock reprog after tick soft restart" > > > > > > > > I hope this fixes your issue. > > > > > > No, Pavel saw the problem with rc8 too, which already has that fix. > > > > > > So I think we'll just need to revert that original patch (and that > > > means that we have to revert the commit you point to as well, since > > > that ->next_tick field was added by the original commit). (I already said that elsewhere, but yes, revert of 7bdb59f1ad474b and 24b91e360ef5 fixes boot problems for me. Hmm, and 24b9 was marked for stable... I don't know how to contact all the stable maintainers, but probably it should not go to stable just yet...) > > Aw too bad, but indeed that late we don't have the choice. > > Hint: Look for CPU hotplug interaction of these patches. I bet something > becomes stale when the CPU goes down and does not get reset when it comes > back online. 
Hmm, that would explain problems at boot _and_ problems during suspend/resume. Note that this can be used to test the hotplug... cd /sys/devices/system/cpu/cpu1 while true; do echo 0 > online; echo 1 > online; done Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
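A bounded variant of the hotplug loop quoted above may be easier to leave running unattended. The sketch below is not from the thread: `toggle_cpu` is a hypothetical helper name, and the argument order and default cycle count are assumptions. It writes 0 and then 1 to a CPU's `online` control file a fixed number of times and stops at the first write that fails.

```shell
#!/bin/sh
# Bounded version of the "echo 0 > online; echo 1 > online" loop.
# toggle_cpu is a hypothetical helper, not something posted in the thread.
#   $1: path to a CPU's "online" control file,
#       e.g. /sys/devices/system/cpu/cpu1/online
#   $2: number of offline/online cycles (assumed default: 100)
toggle_cpu() {
    ctl=$1
    cycles=${2:-100}
    i=0
    while [ "$i" -lt "$cycles" ]; do
        echo 0 > "$ctl" || return 1   # take the CPU offline
        echo 1 > "$ctl" || return 1   # bring it back online
        i=$((i + 1))
    done
    echo "completed $i cycles"
}

# On a real machine this needs root, for example:
# toggle_cpu /sys/devices/system/cpu/cpu1/online 1000
```

If the machine hangs partway through, the final "completed" line never appears, so its absence is the signal to look for.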
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-16 20:06 ` Pavel Machek @ 2017-02-16 20:21 ` Linus Torvalds 2017-02-16 20:48 ` Pavel Machek 2017-02-18 8:55 ` Pavel Machek 2017-02-17 1:11 ` Greg Kroah-Hartman 1 sibling, 2 replies; 47+ messages in thread From: Linus Torvalds @ 2017-02-16 20:21 UTC (permalink / raw) To: Pavel Machek Cc: Thomas Gleixner, Frederic Weisbecker, wanpeng.li, Peter Zijlstra, Rik van Riel, # .39.x, linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list On Thu, Feb 16, 2017 at 12:06 PM, Pavel Machek <pavel@ucw.cz> wrote: > > Hmm, that would explain problems at boot _and_ problems during > suspend/resume. I've committed the revert, and I'm just assuming that the revert also fixed your suspend/resume issues, but I wanted to just double-check that since it's only implied, not stated explicitly. Linus ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-16 20:21 ` Linus Torvalds @ 2017-02-16 20:48 ` Pavel Machek 2017-02-18 8:55 ` Pavel Machek 1 sibling, 0 replies; 47+ messages in thread From: Pavel Machek @ 2017-02-16 20:48 UTC (permalink / raw) To: Linus Torvalds Cc: Thomas Gleixner, Frederic Weisbecker, wanpeng.li, Peter Zijlstra, Rik van Riel, # .39.x, linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list [-- Attachment #1: Type: text/plain, Size: 768 bytes --] On Thu 2017-02-16 12:21:13, Linus Torvalds wrote: > On Thu, Feb 16, 2017 at 12:06 PM, Pavel Machek <pavel@ucw.cz> wrote: > > > > Hmm, that would explain problems at boot _and_ problems during > > suspend/resume. > > I've committed the revert, and I'm just assuming that the revert also > fixed your suspend/resume issues, but I wanted to just double-check > that since it's only implied, no staed explicitly.. Thanks! I don't yet know if suspend/resume issues are fixed. Those are somehow tricky to reproduce -- fun stuff does not happen on every suspend. I should know within a week or so... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-16 20:21 ` Linus Torvalds 2017-02-16 20:48 ` Pavel Machek @ 2017-02-18 8:55 ` Pavel Machek 1 sibling, 0 replies; 47+ messages in thread From: Pavel Machek @ 2017-02-18 8:55 UTC (permalink / raw) To: Linus Torvalds Cc: Thomas Gleixner, Frederic Weisbecker, wanpeng.li, Peter Zijlstra, Rik van Riel, # .39.x, linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list [-- Attachment #1: Type: text/plain, Size: 809 bytes --] On Thu 2017-02-16 12:21:13, Linus Torvalds wrote: > On Thu, Feb 16, 2017 at 12:06 PM, Pavel Machek <pavel@ucw.cz> wrote: > > > > Hmm, that would explain problems at boot _and_ problems during > > suspend/resume. > > I've committed the revert, and I'm just assuming that the revert also > fixed your suspend/resume issues, but I wanted to just double-check > that since it's only implied, no staed explicitly.. So boot issue is fixed, but it hung on resume, again. v4.9 worked ok. Display is restored when it hangs on resume, but mouse is dead; I guess that means there should be some chance to get debugging messages during the resume. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-16 20:06 ` Pavel Machek 2017-02-16 20:21 ` Linus Torvalds @ 2017-02-17 1:11 ` Greg Kroah-Hartman 1 sibling, 0 replies; 47+ messages in thread From: Greg Kroah-Hartman @ 2017-02-17 1:11 UTC (permalink / raw) To: Pavel Machek Cc: Thomas Gleixner, Frederic Weisbecker, Linus Torvalds, wanpeng.li, Peter Zijlstra, Rik van Riel, # .39.x, linux-pci, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list On Thu, Feb 16, 2017 at 09:06:24PM +0100, Pavel Machek wrote: > On Thu 2017-02-16 20:34:45, Thomas Gleixner wrote: > > On Thu, 16 Feb 2017, Frederic Weisbecker wrote: > > > On Thu, Feb 16, 2017 at 10:20:14AM -0800, Linus Torvalds wrote: > > > > On Thu, Feb 16, 2017 at 10:13 AM, Frederic Weisbecker > > > > <fweisbec@gmail.com> wrote: > > > > > > > > > > I haven't followed the discussion but this patch has a known issue which is fixed > > > > > with: > > > > > 7bdb59f1ad474bd7161adc8f923cdef10f2638d1 > > > > > "tick/nohz: Fix possible missing clock reprog after tick soft restart" > > > > > > > > > > I hope this fixes your issue. > > > > > > > > No, Pavel saw the problem with rc8 too, which already has that fix. > > > > > > > > So I think we'll just need to revert that original patch (and that > > > > means that we have to revert the commit you point to as well, since > > > > that ->next_tick field was added by the original commit). > > (I already said that elsewhere, but yes, revert of 7bdb59f1ad474b and > 24b91e360ef5 fixes boot problems for me. Hmm, and 24b9 was marked for > stable... I don't know how to contact all the stable maintainers, but > probably it should not go to stable just yet...) It tried to get into the stable trees, but it broke the build, so it was dropped. So the stable trees are safe for now. thanks, greg k-h ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-16 19:34 ` Thomas Gleixner 2017-02-16 20:06 ` Pavel Machek @ 2017-02-17 14:04 ` Frederic Weisbecker 2017-02-17 16:37 ` Thomas Gleixner 1 sibling, 1 reply; 47+ messages in thread From: Frederic Weisbecker @ 2017-02-17 14:04 UTC (permalink / raw) To: Thomas Gleixner Cc: Linus Torvalds, Pavel Machek, wanpeng.li, Peter Zijlstra, Rik van Riel, # .39.x, linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list On Thu, Feb 16, 2017 at 08:34:45PM +0100, Thomas Gleixner wrote: > On Thu, 16 Feb 2017, Frederic Weisbecker wrote: > > On Thu, Feb 16, 2017 at 10:20:14AM -0800, Linus Torvalds wrote: > > > On Thu, Feb 16, 2017 at 10:13 AM, Frederic Weisbecker > > > <fweisbec@gmail.com> wrote: > > > > > > > > I haven't followed the discussion but this patch has a known issue which is fixed > > > > with: > > > > 7bdb59f1ad474bd7161adc8f923cdef10f2638d1 > > > > "tick/nohz: Fix possible missing clock reprog after tick soft restart" > > > > > > > > I hope this fixes your issue. > > > > > > No, Pavel saw the problem with rc8 too, which already has that fix. > > > > > > So I think we'll just need to revert that original patch (and that > > > means that we have to revert the commit you point to as well, since > > > that ->next_tick field was added by the original commit). > > > > Aw too bad, but indeed that late we don't have the choice. > > Hint: Look for CPU hotplug interaction of these patches. I bet something > becomes stale when the CPU goes down and does not get reset when it comes > back online. Indeed I should check that. But Pavel is seeing this on boot, where the only hotplug operations that happen are CPU UP without preceding CPU DOWN that may have retained stale values. I think the value of ts->next_tick should be initially 0 for all CPUs. So perhaps that 0 value confuses stuff. But looking at the code I don't see how. 
It may be something more subtle. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-17 14:04 ` Frederic Weisbecker @ 2017-02-17 16:37 ` Thomas Gleixner 2017-02-17 17:05 ` Pavel Machek 0 siblings, 1 reply; 47+ messages in thread From: Thomas Gleixner @ 2017-02-17 16:37 UTC (permalink / raw) To: Frederic Weisbecker Cc: Linus Torvalds, Pavel Machek, wanpeng.li, Peter Zijlstra, Rik van Riel, # .39.x, linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list On Fri, 17 Feb 2017, Frederic Weisbecker wrote: > On Thu, Feb 16, 2017 at 08:34:45PM +0100, Thomas Gleixner wrote: > > On Thu, 16 Feb 2017, Frederic Weisbecker wrote: > > > On Thu, Feb 16, 2017 at 10:20:14AM -0800, Linus Torvalds wrote: > > > > On Thu, Feb 16, 2017 at 10:13 AM, Frederic Weisbecker > > > > <fweisbec@gmail.com> wrote: > > > > > > > > > > I haven't followed the discussion but this patch has a known issue which is fixed > > > > > with: > > > > > 7bdb59f1ad474bd7161adc8f923cdef10f2638d1 > > > > > "tick/nohz: Fix possible missing clock reprog after tick soft restart" > > > > > > > > > > I hope this fixes your issue. > > > > > > > > No, Pavel saw the problem with rc8 too, which already has that fix. > > > > > > > > So I think we'll just need to revert that original patch (and that > > > > means that we have to revert the commit you point to as well, since > > > > that ->next_tick field was added by the original commit). > > > > > > Aw too bad, but indeed that late we don't have the choice. > > > > Hint: Look for CPU hotplug interaction of these patches. I bet something > > becomes stale when the CPU goes down and does not get reset when it comes > > back online. > > Indeed I should check that. But Pavel is seeing this on boot, where the I don't think so. He observed it on suspend resume and by doing hotplug operations in a loop. But I might be wrong as usual. 
> only hotplug operations that happen are CPU UP without preceding CPU DOWN > that may have retained stale values. I think the value of ts->next_tick should > be initially 0 for all CPUs. So perhaps that 0 value confuses stuff. But > looking at the code I don't see how. It maybe something more subtle. > ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-17 16:37 ` Thomas Gleixner @ 2017-02-17 17:05 ` Pavel Machek 2017-02-17 18:43 ` Frederic Weisbecker 0 siblings, 1 reply; 47+ messages in thread From: Pavel Machek @ 2017-02-17 17:05 UTC (permalink / raw) To: Thomas Gleixner Cc: Frederic Weisbecker, Linus Torvalds, wanpeng.li, Peter Zijlstra, Rik van Riel, # .39.x, linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list [-- Attachment #1: Type: text/plain, Size: 2080 bytes --] On Fri 2017-02-17 17:37:47, Thomas Gleixner wrote: > On Fri, 17 Feb 2017, Frederic Weisbecker wrote: > > On Thu, Feb 16, 2017 at 08:34:45PM +0100, Thomas Gleixner wrote: > > > On Thu, 16 Feb 2017, Frederic Weisbecker wrote: > > > > On Thu, Feb 16, 2017 at 10:20:14AM -0800, Linus Torvalds wrote: > > > > > On Thu, Feb 16, 2017 at 10:13 AM, Frederic Weisbecker > > > > > <fweisbec@gmail.com> wrote: > > > > > > > > > > > > I haven't followed the discussion but this patch has a known issue which is fixed > > > > > > with: > > > > > > 7bdb59f1ad474bd7161adc8f923cdef10f2638d1 > > > > > > "tick/nohz: Fix possible missing clock reprog after tick soft restart" > > > > > > > > > > > > I hope this fixes your issue. > > > > > > > > > > No, Pavel saw the problem with rc8 too, which already has that fix. > > > > > > > > > > So I think we'll just need to revert that original patch (and that > > > > > means that we have to revert the commit you point to as well, since > > > > > that ->next_tick field was added by the original commit). > > > > > > > > Aw too bad, but indeed that late we don't have the choice. > > > > > > Hint: Look for CPU hotplug interaction of these patches. I bet something > > > becomes stale when the CPU goes down and does not get reset when it comes > > > back online. > > > > Indeed I should check that. But Pavel is seeing this on boot, where the > > I don't think so. 
He observed it on suspend resume and by doing hotplug > operations in a loop. But I might be wrong as usual. These are different bugs. On the x60, I see failures when doing hotplug/unplug in a loop, or after a lot of suspends. Someone has seen it in v4.8-stable etc. It is an old bug and rare to hit. The desktop machine was failing to boot, and had some fun with suspend/resume too. The boot hang was reproducible with the right procedure (hard poweroff, cold boot). That one was introduced in the 4.10-rc cycle. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-17 17:05 ` Pavel Machek @ 2017-02-17 18:43 ` Frederic Weisbecker 2017-02-18 9:39 ` next_tick hang was " Pavel Machek 0 siblings, 1 reply; 47+ messages in thread From: Frederic Weisbecker @ 2017-02-17 18:43 UTC (permalink / raw) To: Pavel Machek Cc: Thomas Gleixner, Linus Torvalds, wanpeng.li, Peter Zijlstra, Rik van Riel, # .39.x, linux-pci, Greg Kroah-Hartman, Alan Stern, Linux Kernel Mailing List, Bjorn Helgaas, USB list On Fri, Feb 17, 2017 at 06:05:08PM +0100, Pavel Machek wrote: > On Fri 2017-02-17 17:37:47, Thomas Gleixner wrote: > > On Fri, 17 Feb 2017, Frederic Weisbecker wrote: > > > On Thu, Feb 16, 2017 at 08:34:45PM +0100, Thomas Gleixner wrote: > > > > On Thu, 16 Feb 2017, Frederic Weisbecker wrote: > > > > > On Thu, Feb 16, 2017 at 10:20:14AM -0800, Linus Torvalds wrote: > > > > > > On Thu, Feb 16, 2017 at 10:13 AM, Frederic Weisbecker > > > > > > <fweisbec@gmail.com> wrote: > > > > > > > > > > > > > > I haven't followed the discussion but this patch has a known issue which is fixed > > > > > > > with: > > > > > > > 7bdb59f1ad474bd7161adc8f923cdef10f2638d1 > > > > > > > "tick/nohz: Fix possible missing clock reprog after tick soft restart" > > > > > > > > > > > > > > I hope this fixes your issue. > > > > > > > > > > > > No, Pavel saw the problem with rc8 too, which already has that fix. > > > > > > > > > > > > So I think we'll just need to revert that original patch (and that > > > > > > means that we have to revert the commit you point to as well, since > > > > > > that ->next_tick field was added by the original commit). > > > > > > > > > > Aw too bad, but indeed that late we don't have the choice. > > > > > > > > Hint: Look for CPU hotplug interaction of these patches. I bet something > > > > becomes stale when the CPU goes down and does not get reset when it comes > > > > back online. > > > > > > Indeed I should check that. 
But Pavel is seeing this on boot, where the > > > > I don't think so. He observed it on suspend resume and by doing hotplug > > operations in a loop. But I might be wrong as usual. > > These are different bugs. > > On x60, I see failures doing hotplug/unplug in a loop, or lot of > suspends. Someone seen it in v4.8-stable etc. Old bug. Rare to hit. > > Desktop machine was failing to boot, and had some fun with > suspend/resume too. Boot hang was reproducible with right > procedure. (Hard poweroff, cold boot.). That one was introduced in > 4.10-rc cycle. Pavel, is there any chance you could apply this patch on top of latest linus tree and send me your resulting dmesg log? This has the two reverted patches plus some debugging code. The amount of printk shouldn't be too big, I tested it home without issue. If you can't manage to dump the dmesg, please try to take a picture of your screen so that I can see the last messages starting with "NEXT_TICK_READ". Thanks!

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 2c115fd..504cb41 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -658,6 +658,8 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
 		tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
 }
 
+static DEFINE_PER_CPU(u64, prev_next_tick);
+
 static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 					 ktime_t now, int cpu)
 {
@@ -725,6 +727,11 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 		 */
 		if (delta == 0) {
 			tick_nohz_restart(ts, now);
+			/*
+			 * Make sure next tick stop doesn't get fooled by past
+			 * clock deadline
+			 */
+			ts->next_tick = 0;
 			goto out;
 		}
 	}
@@ -767,8 +774,15 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	tick = expires;
 
 	/* Skip reprogram of event if its not changed */
-	if (ts->tick_stopped && (expires == dev->next_event))
-		goto out;
+	if (ts->tick_stopped) {
+		if (system_state == SYSTEM_BOOTING) {
+			if (ts->next_tick != this_cpu_read(prev_next_tick))
+				printk("NEXT_TICK_READ: CPU: %d Expires: %llu ts->next_tick:%llu\n", smp_processor_id(), expires, ts->next_tick);
+			this_cpu_write(prev_next_tick, ts->next_tick);
+		}
+		if (expires == ts->next_tick)
+			goto out;
+	}
 
 	/*
 	 * nohz_stop_sched_tick can be called several times before
@@ -787,6 +801,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 		trace_tick_stop(1, TICK_DEP_MASK_NONE);
 	}
 
+	ts->next_tick = tick;
+
 	/*
 	 * If the expiration time == KTIME_MAX, then we simply stop
 	 * the tick timer.
@@ -802,7 +818,10 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	else
 		tick_program_event(tick, 1);
 out:
-	/* Update the estimated sleep length */
+	/*
+	 * Update the estimated sleep length until the next timer
+	 * (not only the tick).
+	 */
 	ts->sleep_length = ktime_sub(dev->next_event, now);
 	return tick;
 }
diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index bf38226..075444e 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -27,6 +27,7 @@ enum tick_nohz_mode {
  *			timer is modified for nohz sleeps. This is necessary
  *			to resume the tick timer operation in the timeline
  *			when the CPU returns from nohz sleep.
+ * @next_tick:		Next tick to be fired when in dynticks mode.
  * @tick_stopped:	Indicator that the idle tick has been stopped
  * @idle_jiffies:	jiffies at the entry to idle for idle time accounting
  * @idle_calls:		Total number of idle calls
@@ -44,6 +45,7 @@ struct tick_sched {
 	unsigned long			check_clocks;
 	enum tick_nohz_mode		nohz_mode;
 	ktime_t				last_tick;
+	ktime_t				next_tick;
 	int				inidle;
 	int				tick_stopped;
 	unsigned long			idle_jiffies;

^ permalink raw reply related [flat|nested] 47+ messages in thread
* next_tick hang was Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-17 18:43 ` Frederic Weisbecker @ 2017-02-18 9:39 ` Pavel Machek 2017-02-18 14:50 ` Frederic Weisbecker [not found] ` <20170218102339.GA3544@amd> 0 siblings, 2 replies; 47+ messages in thread From: Pavel Machek @ 2017-02-18 9:39 UTC (permalink / raw) To: Frederic Weisbecker Cc: Thomas Gleixner, wanpeng.li, Peter Zijlstra, Rik van Riel, Linux Kernel Mailing List [-- Attachment #1.1: Type: text/plain, Size: 1172 bytes --] Hi! [I dropped some CCs here, you may want to check the CC list]. > > These are different bugs. > > > > On x60, I see failures doing hotplug/unplug in a loop, or lot of > > suspends. Someone seen it in v4.8-stable etc. Old bug. Rare to hit. > > > > Desktop machine was failing to boot, and had some fun with > > suspend/resume too. Boot hang was reproducible with right > > procedure. (Hard poweroff, cold boot.). That one was introduced in > > 4.10-rc cycle. > > Pavel, is there any chance you could apply this patch on top of latest linus tree > and send me your resulting dmesg log? This has the two reverted patches plus some > debugging code. The amount of printk shouldn't be too big, I tested it home without > issue. > > If you can't manage to dump the dmesg, please try to take a picture of your screen > so that I can see the last messages starting with "NEXT_TICK_READ". > > Thanks! I guess I can. But I'll only have one 80x25 screen to look at... .config is attached. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #1.2: config.gz --] [-- Type: application/gzip, Size: 26599 bytes --] [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: next_tick hang was Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-18 9:39 ` next_tick hang was " Pavel Machek @ 2017-02-18 14:50 ` Frederic Weisbecker 2017-02-18 18:05 ` Pavel Machek [not found] ` <20170218102339.GA3544@amd> 1 sibling, 1 reply; 47+ messages in thread From: Frederic Weisbecker @ 2017-02-18 14:50 UTC (permalink / raw) To: Pavel Machek Cc: Thomas Gleixner, wanpeng.li, Peter Zijlstra, Rik van Riel, Linux Kernel Mailing List On Sat, Feb 18, 2017 at 10:39:17AM +0100, Pavel Machek wrote: > Hi! > > [I droped some CCs here, you may want to check the CC list]. Good! > > > > These are different bugs. > > > > > > On x60, I see failures doing hotplug/unplug in a loop, or lot of > > > suspends. Someone seen it in v4.8-stable etc. Old bug. Rare to hit. > > > > > > Desktop machine was failing to boot, and had some fun with > > > suspend/resume too. Boot hang was reproducible with right > > > procedure. (Hard poweroff, cold boot.). That one was introduced in > > > 4.10-rc cycle. > > > > Pavel, is there any chance you could apply this patch on top of latest linus tree > > and send me your resulting dmesg log? This has the two reverted patches plus some > > debugging code. The amount of printk shouldn't be too big, I tested it home without > > issue. > > > > If you can't manage to dump the dmesg, please try to take a picture of your screen > > so that I can see the last messages starting with "NEXT_TICK_READ". > > > > Thanks! > > I guess I can. But I'll only have one 80x25 screen to look at... > > .config is attached. Ah this is x86-32, interesting! I'm going to try to boot that, we never know. Thanks a lot! ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: next_tick hang was Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-18 14:50 ` Frederic Weisbecker @ 2017-02-18 18:05 ` Pavel Machek 2017-02-20 14:05 ` Frederic Weisbecker 0 siblings, 1 reply; 47+ messages in thread From: Pavel Machek @ 2017-02-18 18:05 UTC (permalink / raw) To: Frederic Weisbecker Cc: Thomas Gleixner, wanpeng.li, Peter Zijlstra, Rik van Riel, Linux Kernel Mailing List [-- Attachment #1: Type: text/plain, Size: 554 bytes --] Hi! > > I guess I can. But I'll only have one 80x25 screen to look at... > > > > .config is attached. > > Ah this is x86-32, interesting! I'm going to try to boot that, we never know. > > Thanks a lot! Happens on x86-64, too; I'm running that normally, but for testing, 32-bit kernel is easier. thinkpad x60 works fine for me, so it is unlikely that .config is all it takes... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: next_tick hang was Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot 2017-02-18 18:05 ` Pavel Machek @ 2017-02-20 14:05 ` Frederic Weisbecker 0 siblings, 0 replies; 47+ messages in thread From: Frederic Weisbecker @ 2017-02-20 14:05 UTC (permalink / raw) To: Pavel Machek Cc: Thomas Gleixner, wanpeng.li, Peter Zijlstra, Rik van Riel, Linux Kernel Mailing List On Sat, Feb 18, 2017 at 07:05:20PM +0100, Pavel Machek wrote: > Hi! > > > > I guess I can. But I'll only have one 80x25 screen to look at... > > > > > > .config is attached. > > > > Ah this is x86-32, interesting! I'm going to try to boot that, we never know. > > > > Thanks a lot! > > Happens on x86-64, too; I'm running that normally, but for testing, > 32-bit kernel is easier. Ah! And you've seen that on only one machine? What kind machine is it? Ideally I would need a dump of all pending timer list timers (no sysrq key for that though, but I can do a quick patch) and a stacktrace of all tasks. But I guess you have no access to any serial port, right? > > thinkpad x60 works fine for me, so it is unlikely that .config is all > it takes... Yeah I booted the .config and it reached the root filesystem mounting without problem. So I think it's specific to some hardware. ^ permalink raw reply [flat|nested] 47+ messages in thread
[parent not found: <20170218102339.GA3544@amd>]
* Re: next_tick hang was Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot [not found] ` <20170218102339.GA3544@amd> @ 2017-02-22 3:08 ` Frederic Weisbecker 2017-02-23 14:22 ` Pavel Machek 0 siblings, 1 reply; 47+ messages in thread From: Frederic Weisbecker @ 2017-02-22 3:08 UTC (permalink / raw) To: Pavel Machek Cc: Thomas Gleixner, wanpeng.li, Peter Zijlstra, Rik van Riel, Linux Kernel Mailing List On Sat, Feb 18, 2017 at 11:23:39AM +0100, Pavel Machek wrote: > On Sat 2017-02-18 10:39:17, Pavel Machek wrote: > > Hi! > > > > [I droped some CCs here, you may want to check the CC list]. > > > > > > These are different bugs. > > > > > > > > On x60, I see failures doing hotplug/unplug in a loop, or lot of > > > > suspends. Someone seen it in v4.8-stable etc. Old bug. Rare to hit. > > > > > > > > Desktop machine was failing to boot, and had some fun with > > > > suspend/resume too. Boot hang was reproducible with right > > > > procedure. (Hard poweroff, cold boot.). That one was introduced in > > > > 4.10-rc cycle. > > > > > > Pavel, is there any chance you could apply this patch on top of latest linus tree > > > and send me your resulting dmesg log? This has the two reverted patches plus some > > > debugging code. The amount of printk shouldn't be too big, I tested it home without > > > issue. > > > > > > If you can't manage to dump the dmesg, please try to take a picture of your screen > > > so that I can see the last messages starting with "NEXT_TICK_READ". > > > > > > Thanks! > > > > I guess I can. But I'll only have one 80x25 screen to look at... > > Ok, here it is. Thanks, I haven't been able to deduce much though, except that the pending timer on CPU 0 looks quite far away. Could you please add "initcall_debug" in your kernel parameters to identify if we are blocking in a specific initcall? If so it should tell us which one. Thanks! ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: next_tick hang was Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot
  2017-02-22  3:08 ` Frederic Weisbecker
@ 2017-02-23 14:22   ` Pavel Machek
  0 siblings, 0 replies; 47+ messages in thread
From: Pavel Machek @ 2017-02-23 14:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, wanpeng.li, Peter Zijlstra, Rik van Riel,
      Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1961 bytes --]

On Wed 2017-02-22 04:08:58, Frederic Weisbecker wrote:
> On Sat, Feb 18, 2017 at 11:23:39AM +0100, Pavel Machek wrote:
> > On Sat 2017-02-18 10:39:17, Pavel Machek wrote:
> > > Hi!
> > >
> > > [I droped some CCs here, you may want to check the CC list].
> > >
> > > > > These are different bugs.
> > > > >
> > > > > On x60, I see failures doing hotplug/unplug in a loop, or lot of
> > > > > suspends. Someone seen it in v4.8-stable etc. Old bug. Rare to hit.
> > > > >
> > > > > Desktop machine was failing to boot, and had some fun with
> > > > > suspend/resume too. Boot hang was reproducible with right
> > > > > procedure. (Hard poweroff, cold boot.). That one was introduced in
> > > > > 4.10-rc cycle.
> > > >
> > > > Pavel, is there any chance you could apply this patch on top of latest linus tree
> > > > and send me your resulting dmesg log? This has the two reverted patches plus some
> > > > debugging code. The amount of printk shouldn't be too big, I tested it home without
> > > > issue.
> > > >
> > > > If you can't manage to dump the dmesg, please try to take a picture of your screen
> > > > so that I can see the last messages starting with "NEXT_TICK_READ".
> > > >
> > > > Thanks!
> > >
> > > I guess I can. But I'll only have one 80x25 screen to look at...
> >
> > Ok, here it is.
>
> Thanks, I haven't been able to deduce much though, except that the pending timer on CPU 0
> looks quite far away.
>
> Could you please add "initcall_debug" in your kernel parameters to identify if we are blocking in
> a specific initcall? If so it should tell us which one.

Please see

Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot
after cold boots, boots after reboot

thread. Hang was traced down to the USB handoff code.

                                                                Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot
  2017-02-16 17:25 ` Pavel Machek
  2017-02-16 18:13   ` Frederic Weisbecker
@ 2017-02-16 19:06   ` Pavel Machek
  2017-02-17 14:40     ` Frederic Weisbecker
  1 sibling, 1 reply; 47+ messages in thread
From: Pavel Machek @ 2017-02-16 19:06 UTC (permalink / raw)
  To: Linus Torvalds, fweisbec, wanpeng.li, peterz, riel, tglx, stable
  Cc: linux-pci, Greg Kroah-Hartman, Alan Stern,
      Linux Kernel Mailing List, Bjorn Helgaas, USB list

[-- Attachment #1: Type: text/plain, Size: 1237 bytes --]

On Thu 2017-02-16 18:25:35, Pavel Machek wrote:
> Hi!
>
> > > > 4.10-rc4 broken
> > > > 4.10-rc3 ok
> > >
> > > Hmm. If those actually end up being reliable, then there's not a whole
> > > lot in between them wrt PCI or USB.
> > >
> > > What looked like the most likely candidate seems to be xhci-specific, though.
> > >
> > > But maybe it's something that isn't directly in drivers/{pci,usb}/ and
> > > just interacts badly.
> >
> > Ok. I _hope_ my tests are ok. Bisect log so far is:
>
> And the winner is:
>
> pavel@half:/data/l/linux$ git bisect bad
> 24b91e360ef521a2808771633d76ebc68bd5604b is the first bad commit
> commit 24b91e360ef521a2808771633d76ebc68bd5604b
> Author: Frederic Weisbecker <fweisbec@gmail.com>
> Date:   Wed Jan 4 15:12:04 2017 +0100
>
>     nohz: Fix collision between tick and other hrtimers
>

I had to revert 7bdb59f1ad474bd7161adc8f923cdef10f2638d1, too,
otherwise -rc8 does not compile.

With 24b91e360ef521a28087716 and 7bdb59f1ad474 reverted, it seems to
boot ok. (I did few tries.)

Best regards,
                                                                Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot
  2017-02-16 19:06   ` Pavel Machek
@ 2017-02-17 14:40     ` Frederic Weisbecker
  0 siblings, 0 replies; 47+ messages in thread
From: Frederic Weisbecker @ 2017-02-17 14:40 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, wanpeng.li, peterz, riel, tglx, stable,
      linux-pci, Greg Kroah-Hartman, Alan Stern,
      Linux Kernel Mailing List, Bjorn Helgaas, USB list

On Thu, Feb 16, 2017 at 08:06:04PM +0100, Pavel Machek wrote:
> On Thu 2017-02-16 18:25:35, Pavel Machek wrote:
> > Hi!
> >
> > > > > 4.10-rc4 broken
> > > > > 4.10-rc3 ok
> > > >
> > > > Hmm. If those actually end up being reliable, then there's not a whole
> > > > lot in between them wrt PCI or USB.
> > > >
> > > > What looked like the most likely candidate seems to be xhci-specific, though.
> > > >
> > > > But maybe it's something that isn't directly in drivers/{pci,usb}/ and
> > > > just interacts badly.
> > >
> > > Ok. I _hope_ my tests are ok. Bisect log so far is:
> >
> > And the winner is:
> >
> > pavel@half:/data/l/linux$ git bisect bad
> > 24b91e360ef521a2808771633d76ebc68bd5604b is the first bad commit
> > commit 24b91e360ef521a2808771633d76ebc68bd5604b
> > Author: Frederic Weisbecker <fweisbec@gmail.com>
> > Date:   Wed Jan 4 15:12:04 2017 +0100
> >
> >     nohz: Fix collision between tick and other hrtimers
> >
>
> I had to revert 7bdb59f1ad474bd7161adc8f923cdef10f2638d1, too,
> otherwise -rc8 does not compile.
>
> With 24b91e360ef521a28087716 and 7bdb59f1ad474 reverted, it seems to
> boot ok. (I did few tries.)

Do you still have the config that triggered this? I don't have much expectations
about reproducing, this has almost never worked for me, but at least I could
narrow down the context.

Thanks.

^ permalink raw reply [flat|nested] 47+ messages in thread