* [PATCH V3 0/2] rtc-cmos: Workaround unwanted interrupt generation
@ 2016-08-16 3:25 Pratyush Anand
  2016-08-16 3:25 ` [PATCH V3 1/2] rtc/hpet: Factorize hpet_rtc_timer_init() Pratyush Anand
  2016-08-16 3:25 ` [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered Pratyush Anand
  0 siblings, 2 replies; 11+ messages in thread
From: Pratyush Anand @ 2016-08-16 3:25 UTC
  To: mingo, alexandre.belloni, tglx, hpa, x86
  Cc: rtc-linux, linux-kernel, prarit, dzickus, dyoung, a.zummo, Pratyush Anand

We have observed on a few machines with rtc-cmos devices that the device
generates an interrupt before the hpet_rtc_timer_init() call has finished.
As a result, hpet_rtc_interrupt() is called before the HPET RTC emulation
is fully initialized, and the while-loop around hpet_cnt_ahead() in
hpet_rtc_timer_reinit() never completes. This leads to "NMI watchdog:
Watchdog detected hard LOCKUP on cpu 0".

This patch set initializes hpet_default_delta and hpet_t1_cmp before an
interrupt can be raised.

Changes since V2:
- Improved commit log further

Changes since RFC:
- Commit log of patches has been improved.

Pratyush Anand (2):
  rtc/hpet: Factorize hpet_rtc_timer_init()
  rtc/rtc-cmos: Initialize software counters before irq is registered

 arch/x86/include/asm/hpet.h |  2 ++
 arch/x86/kernel/hpet.c      | 41 +++++++++++++++++++++++++++++++++++------
 drivers/rtc/rtc-cmos.c      | 13 ++++++++++++-
 3 files changed, 49 insertions(+), 7 deletions(-)

-- 
2.5.5
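
[Editor's sketch, not part of the posted series] The failure mode in the cover letter can be modelled in user space. The sketch below assumes that the re-arm loop keeps adding a delta to the comparator until the comparator is ahead of the free-running counter, as the cover letter's description of hpet_rtc_timer_reinit() suggests. The names cnt_ahead() and reinit_iterations(), the tick parameter and the iteration cap are illustrative assumptions and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe check: is the comparator value ahead of the current counter? */
static bool cnt_ahead(uint32_t cmp, uint32_t now)
{
	return (int32_t)(now - cmp) < 0;
}

/*
 * Toy model of the re-arm loop. The comparator is advanced by 'delta'
 * on every pass while the counter advances by 'tick'. Returns the number
 * of passes needed, or -1 if the loop is still spinning after 'max_iter'
 * passes (the real loop has no such cap).
 */
int reinit_iterations(uint32_t cmp, uint32_t delta,
		      uint32_t now, uint32_t tick, int max_iter)
{
	int passes = 0;

	assert(max_iter > 0);

	do {
		cmp += delta;	/* with delta == 0 the comparator never moves */
		now += tick;	/* the hardware counter keeps running */
		if (++passes >= max_iter)
			return -1;
	} while (!cnt_ahead(cmp, now));

	return passes;
}
```

With an uninitialized (zero) delta and comparator, reinit_iterations(0, 0, 1000, 10, 100000) hits the cap because the comparator can never get ahead of the counter. With values set up beforehand, for example reinit_iterations(1500, 500, 1000, 10, 100000), the loop exits on its first pass.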
* [PATCH V3 1/2] rtc/hpet: Factorize hpet_rtc_timer_init()
  2016-08-16 3:25 [PATCH V3 0/2] rtc-cmos: Workaround unwanted interrupt generation Pratyush Anand
@ 2016-08-16 3:25 ` Pratyush Anand
  2016-08-16 3:25 ` [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered Pratyush Anand
  1 sibling, 0 replies; 11+ messages in thread
From: Pratyush Anand @ 2016-08-16 3:25 UTC
  To: mingo, alexandre.belloni, tglx, hpa, x86
  Cc: rtc-linux, linux-kernel, prarit, dzickus, dyoung, a.zummo,
	Pratyush Anand, Andy Lutomirski, Anna-Maria Gleixner,
	Arnd Bergmann, Borislav Petkov, Denys Vlasenko, Ingo Molnar,
	Jan Beulich, Sebastian Andrzej Siewior

We need the ability to initialize the hpet_default_delta and hpet_t1_cmp
counters before the irq can be enabled. This patch splits
hpet_rtc_timer_init() into two functions, hpet_rtc_timer_counter_init()
and hpet_rtc_timer_enable(), so that this can be achieved. The next patch
explains the need for this in detail.

No functional change in this patch.

Signed-off-by: Pratyush Anand <panand@redhat.com>
[dzickus@redhat.com: edited the patch's summary]
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 arch/x86/include/asm/hpet.h |  2 ++
 arch/x86/kernel/hpet.c      | 41 +++++++++++++++++++++++++++++++++++------
 2 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index cc285ec4b2c1..8eecb31bebcb 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -96,6 +96,8 @@ extern int hpet_set_alarm_time(unsigned char hrs, unsigned char min,
 			       unsigned char sec);
 extern int hpet_set_periodic_freq(unsigned long freq);
 extern int hpet_rtc_dropped_irq(void);
+extern int hpet_rtc_timer_counter_init(void);
+extern int hpet_rtc_timer_enable(void);
 extern int hpet_rtc_timer_init(void);
 extern irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id);
 extern int hpet_register_irq_handler(rtc_irq_handler handler);
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index ed16e58658a4..6f6d21059b1b 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -1074,14 +1074,12 @@ void hpet_unregister_irq_handler(rtc_irq_handler handler)
 EXPORT_SYMBOL_GPL(hpet_unregister_irq_handler);
 
 /*
- * Timer 1 for RTC emulation. We use one shot mode, as periodic mode
- * is not supported by all HPET implementations for timer 1.
- *
- * hpet_rtc_timer_init() is called when the rtc is initialized.
+ * hpet_rtc_timer_counter_init() is called before interrupt can be
+ * registered
  */
-int hpet_rtc_timer_init(void)
+int hpet_rtc_timer_counter_init(void)
 {
-	unsigned int cfg, cnt, delta;
+	unsigned int cnt, delta;
 	unsigned long flags;
 
 	if (!is_hpet_enabled())
@@ -1106,6 +1104,22 @@ int hpet_rtc_timer_init(void)
 	hpet_writel(cnt, HPET_T1_CMP);
 	hpet_t1_cmp = cnt;
 
+	local_irq_restore(flags);
+
+	return 1;
+}
+EXPORT_SYMBOL_GPL(hpet_rtc_timer_counter_init);
+
+/*
+ * hpet_rtc_timer_enable() is called during RTC initialization
+ */
+int hpet_rtc_timer_enable(void)
+{
+	unsigned int cfg;
+	unsigned long flags;
+
+	local_irq_save(flags);
+
 	cfg = hpet_readl(HPET_T1_CFG);
 	cfg &= ~HPET_TN_PERIODIC;
 	cfg |= HPET_TN_ENABLE | HPET_TN_32BIT;
@@ -1115,6 +1129,21 @@ int hpet_rtc_timer_init(void)
 
 	return 1;
 }
+EXPORT_SYMBOL_GPL(hpet_rtc_timer_enable);
+
+/*
+ * Timer 1 for RTC emulation. We use one shot mode, as periodic mode
+ * is not supported by all HPET implementations for timer 1.
+ *
+ * hpet_rtc_timer_init() is called when the rtc is initialized.
+ */
+int hpet_rtc_timer_init(void)
+{
+	if (!hpet_rtc_timer_counter_init())
+		return 0;
+
+	return hpet_rtc_timer_enable();
+}
 EXPORT_SYMBOL_GPL(hpet_rtc_timer_init);
 
 static void hpet_disable_rtc_channel(void)
-- 
2.5.5
* [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered
  2016-08-16 3:25 [PATCH V3 0/2] rtc-cmos: Workaround unwanted interrupt generation Pratyush Anand
  2016-08-16 3:25 ` [PATCH V3 1/2] rtc/hpet: Factorize hpet_rtc_timer_init() Pratyush Anand
@ 2016-08-16 3:25 ` Pratyush Anand
  2016-08-30 8:22 ` Dave Young
  2016-09-06 9:58 ` Thomas Gleixner
  1 sibling, 2 replies; 11+ messages in thread
From: Pratyush Anand @ 2016-08-16 3:25 UTC
  To: mingo, alexandre.belloni, tglx, hpa, x86
  Cc: rtc-linux, linux-kernel, prarit, dzickus, dyoung, a.zummo, Pratyush Anand

We have observed on a few x86 machines with an rtc-cmos device that
hpet_rtc_interrupt() is called just after irq registration and before
cmos_do_probe() could call hpet_rtc_timer_init().

So, neither hpet_default_delta nor hpet_t1_cmp is initialized by the time
the interrupt is raised in this situation, and this results in an NMI
watchdog LOCKUP.

It has only been observed sporadically on kdump secondary kernels.

See the call trace:
---<-snip->---
[ 27.913194] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[ 27.915371] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-342.el7.x86_64 #1
[ 27.917503] Hardware name: HP ProLiant DL160 Gen8, BIOS J03 02/10/2014
[ 27.919455] ffffffff8186a728 0000000059c82488 ffff880034e05af0 ffffffff81637bd4
[ 27.921870] ffff880034e05b70 ffffffff8163144a 0000000000000010 ffff880034e05b80
[ 27.924257] ffff880034e05b20 0000000059c82488 0000000000000000 0000000000000000
[ 27.926599] Call Trace:
[ 27.927352] <NMI> [<ffffffff81637bd4>] dump_stack+0x19/0x1b
[ 27.929080] [<ffffffff8163144a>] panic+0xd8/0x1e7
[ 27.930588] [<ffffffff8111d3e0>] ? restart_watchdog_hrtimer+0x50/0x50
[ 27.932502] [<ffffffff8111d4a2>] watchdog_overflow_callback+0xc2/0xd0
[ 27.934427] [<ffffffff811612c1>] __perf_event_overflow+0xa1/0x250
[ 27.936232] [<ffffffff81161d94>] perf_event_overflow+0x14/0x20
[ 27.937957] [<ffffffff81032ae8>] intel_pmu_handle_irq+0x1e8/0x470
[ 27.939799] [<ffffffff8164164b>] perf_event_nmi_handler+0x2b/0x50
[ 27.941649] [<ffffffff81640d99>] nmi_handle.isra.0+0x69/0xb0
[ 27.943348] [<ffffffff81640f49>] do_nmi+0x169/0x340
[ 27.944802] [<ffffffff816401d3>] end_repeat_nmi+0x1e/0x2e
[ 27.946424] [<ffffffff81056ee5>] ? hpet_rtc_interrupt+0x85/0x380
[ 27.948197] [<ffffffff81056ee5>] ? hpet_rtc_interrupt+0x85/0x380
[ 27.949992] [<ffffffff81056ee5>] ? hpet_rtc_interrupt+0x85/0x380
[ 27.951816] <<EOE>> <IRQ> [<ffffffff8108f5a3>] ? run_timer_softirq+0x43/0x340
[ 27.954114] [<ffffffff8111e24e>] handle_irq_event_percpu+0x3e/0x1e0
[ 27.955962] [<ffffffff8111e42d>] handle_irq_event+0x3d/0x60
[ 27.957635] [<ffffffff811210c7>] handle_edge_irq+0x77/0x130
[ 27.959332] [<ffffffff8101704f>] handle_irq+0xbf/0x150
[ 27.960949] [<ffffffff8164a86f>] do_IRQ+0x4f/0xf0
[ 27.962434] [<ffffffff8163faed>] common_interrupt+0x6d/0x6d
[ 27.964101] <EOI> [<ffffffff8163f43b>] ? _raw_spin_unlock_irqrestore+0x1b/0x40
[ 27.966308] [<fffff8111ff07>] __setup_irq+0x2a7/0x570
[ 28.067859] [<ffffffff81056e60>] ? hpet_cpuhp_notify+0x140/0x140
[ 28.069709] [<ffffffff8112032c>] request_threaded_irq+0xcc/0x170
[ 28.071585] [<ffffffff814b24a6>] cmos_do_probe+0x1e6/0x450
[ 28.073240] [<ffffffff814b2710>] ? cmos_do_probe+0x450/0x450
[ 28.074911] [<ffffffff814b27cb>] cmos_pnp_probe+0xbb/0xc0
[ 28.076533] [<ffffffff8139b245>] pnp_device_probe+0x65/0xd0
[ 28.078198] [<ffffffff813f8ca7>] driver_probe_device+0x87/0x390
[ 28.079971] [<ffffffff813f9083>] __driver_attach+0x93/0xa0
[ 28.081660] [<ffffffff813f8ff0>] ? __device_attach+0x40/0x40
[ 28.083662] [<ffffffff813f6a13>] bus_for_each_dev+0x73/0xc0
[ 28.085370] [<ffffffff813f86fe>] driver_attach+0x1e/0x20
[ 28.086974] [<ffffffff813f8250>] bus_add_driver+0x200/0x2d0
[ 28.088634] [<ffffffff81ade49a>] ? rtc_sysfs_init+0xe/0xe
[ 28.090349] [<ffffffff813f9704>] driver_register+0x64/0xf0
[ 28.091989] [<ffffffff8139b070>] pnp_register_driver+0x20/0x30
[ 28.093707] [<ffffffff81ade4ab>] cmos_init+0x11/0x71
---<-snip->---

The previous patch split hpet_rtc_timer_init() into
hpet_rtc_timer_counter_init() and hpet_rtc_timer_enable().

Therefore, this patch moves the hpet_rtc_timer_counter_init() call before
IRQ registration, so that we can gracefully handle such spurious
interrupts.

Without this patch, we were able to reproduce the problem within at most
15 trials of kdump secondary kernel boot on an hp-dl160gen8 machine.
After applying this patch, more than 35 trials went fine.

Signed-off-by: Pratyush Anand <panand@redhat.com>
[dzickus@redhat.com: edited the patch's summary]
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 drivers/rtc/rtc-cmos.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 43745cac0141..089d987f2638 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -129,6 +129,16 @@ static inline int hpet_rtc_dropped_irq(void)
 	return 0;
 }
 
+static inline int hpet_rtc_timer_counter_init(void)
+{
+	return 0;
+}
+
+static inline int hpet_rtc_timer_enable(void)
+{
+	return 0;
+}
+
 static inline int hpet_rtc_timer_init(void)
 {
 	return 0;
@@ -707,6 +717,7 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
 		goto cleanup1;
 	}
 
+	hpet_rtc_timer_counter_init();
 	if (is_valid_irq(rtc_irq)) {
 		irq_handler_t rtc_cmos_int_handler;
 
@@ -729,7 +740,7 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
 			goto cleanup1;
 		}
 	}
-	hpet_rtc_timer_init();
+	hpet_rtc_timer_enable();
 
 	/* export at least the first block of NVRAM */
 	nvram.size = address_space - NVRAM_OFFSET;
-- 
2.5.5
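
[Editor's sketch, not part of the posted series] The ordering change made by this patch can be illustrated with a small user-space model. It assumes, as the commit message describes, that on the affected machines the handler may run as soon as the irq is registered. Every toy_* name and both probe_*_order() functions are hypothetical stand-ins; none of this is the rtc-cmos driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Models hpet_default_delta / hpet_t1_cmp having been set up. */
static bool counters_ready;
/* What the early interrupt observed when it ran. */
static bool handler_ran;
static bool handler_saw_ready;

static void toy_reset(void)
{
	counters_ready = false;
	handler_ran = false;
	handler_saw_ready = false;
}

static void toy_counter_init(void)
{
	counters_ready = true;
}

static void toy_timer_enable(void)
{
	/* Enabling the timer does not matter for this model. */
}

static void toy_irq_handler(void)
{
	handler_ran = true;
	handler_saw_ready = counters_ready;
}

/* Models irq registration on a machine where an interrupt fires at once. */
static void toy_request_irq(void (*handler)(void))
{
	handler();
}

/* Old order: register the irq first, initialize everything afterwards. */
bool probe_old_order(void)
{
	toy_reset();
	toy_request_irq(toy_irq_handler);
	toy_counter_init();
	toy_timer_enable();
	assert(handler_ran);
	return handler_saw_ready;
}

/* New order: counters first, then the irq, then enable the timer. */
bool probe_new_order(void)
{
	toy_reset();
	toy_counter_init();
	toy_request_irq(toy_irq_handler);
	toy_timer_enable();
	assert(handler_ran);
	return handler_saw_ready;
}
```

In this model probe_old_order() reports that the handler ran against uninitialized counters, while probe_new_order() reports that the same early interrupt found them set up, which is the property the patch relies on.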
* Re: [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered
  2016-08-16 3:25 ` [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered Pratyush Anand
@ 2016-08-30 8:22 ` Dave Young
  2016-08-30 8:38 ` Dave Young
  2016-08-30 9:54 ` Pratyush Anand
  2016-09-06 9:58 ` Thomas Gleixner
  1 sibling, 2 replies; 11+ messages in thread
From: Dave Young @ 2016-08-30 8:22 UTC
  To: Pratyush Anand
  Cc: mingo, alexandre.belloni, tglx, hpa, x86, rtc-linux, linux-kernel,
	prarit, dzickus, a.zummo

Hi, Pratyush

On 08/16/16 at 08:55am, Pratyush Anand wrote:
> We have observed on a few x86 machines with an rtc-cmos device that
> hpet_rtc_interrupt() is called just after irq registration and before
> cmos_do_probe() could call hpet_rtc_timer_init().
>
> [...]
>
> diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
> index 43745cac0141..089d987f2638 100644
> --- a/drivers/rtc/rtc-cmos.c
> +++ b/drivers/rtc/rtc-cmos.c
> @@ -129,6 +129,16 @@ static inline int hpet_rtc_dropped_irq(void)
>  	return 0;
>  }
>  
> +static inline int hpet_rtc_timer_counter_init(void)
> +{
> +	return 0;
> +}
> +
> +static inline int hpet_rtc_timer_enable(void)
> +{
> +	return 0;
> +}
> +

Can these dummy functions go to /usr/include/linux/hpet.h along with
the #ifdef etc.?

> [...]

Thanks
Dave
* Re: [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered
  2016-08-30 8:22 ` Dave Young
@ 2016-08-30 8:38 ` Dave Young
  2016-08-30 9:10 ` Dave Young
  2016-08-30 9:54 ` Pratyush Anand
  1 sibling, 1 reply; 11+ messages in thread
From: Dave Young @ 2016-08-30 8:38 UTC
  To: Pratyush Anand
  Cc: mingo, alexandre.belloni, tglx, hpa, x86, rtc-linux, linux-kernel,
	prarit, dzickus, a.zummo

On 08/30/16 at 04:22pm, Dave Young wrote:
> Hi, Pratyush
>
> On 08/16/16 at 08:55am, Pratyush Anand wrote:
> > [...]
> > +static inline int hpet_rtc_timer_counter_init(void)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline int hpet_rtc_timer_enable(void)
> > +{
> > +	return 0;
> > +}
> > +
>
> Can these dummy functions go to /usr/include/linux/hpet.h along with
> the #ifdef etc.?

Hmm, it seems CONFIG_HPET_EMULATE_RTC is x86 only, so maybe moving them
to asm/hpet.h would be better.

> [...]
>
> Thanks
> Dave
* Re: [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered
  2016-08-30 8:38 ` Dave Young
@ 2016-08-30 9:10 ` Dave Young
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Young @ 2016-08-30 9:10 UTC
  To: Pratyush Anand
  Cc: mingo, alexandre.belloni, tglx, hpa, x86, rtc-linux, linux-kernel,
	prarit, dzickus, a.zummo

On 08/30/16 at 04:38pm, Dave Young wrote:
> On 08/30/16 at 04:22pm, Dave Young wrote:
> > [...]
> > Can these dummy functions go to /usr/include/linux/hpet.h along with
> > the #ifdef etc.?
>
> Hmm, it seems CONFIG_HPET_EMULATE_RTC is x86 only, so maybe moving them
> to asm/hpet.h would be better.

Oops, asm/hpet.h will not work since rtc-cmos is also used on other
arches. Please ignore the comment.

Thanks
Dave
* Re: [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered
  2016-08-30  8:22   ` Dave Young
  2016-08-30  8:38     ` Dave Young
@ 2016-08-30  9:54     ` Pratyush Anand
  2016-08-31  4:56       ` Dave Young
  1 sibling, 1 reply; 11+ messages in thread
From: Pratyush Anand @ 2016-08-30  9:54 UTC (permalink / raw)
  To: Dave Young
  Cc: mingo, alexandre.belloni, tglx, hpa, x86, rtc-linux, linux-kernel,
	prarit, dzickus, a.zummo

Hi Dave,

On 30/08/2016:04:22:30 PM, Dave Young wrote:
> Hi, Pratyush
>
> On 08/16/16 at 08:55am, Pratyush Anand wrote:
> > We have observed on few x86 machines with rtc-cmos device that
> > hpet_rtc_interrupt() is called just after irq registration and before
> > cmos_do_probe() could call hpet_rtc_timer_init().
> >
> > So, neither hpet_default_delta nor hpet_t1_cmp is initialized by the time
> > interrupt is raised in the given situation, and this results in NMI
> > watchdog LOCKUP.
> >
> > It has only been observed sporadically on kdump secondary kernels.
> >
> > See the call trace:
> > ---<-snip->---
> > [ 27.913194] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
> > [ 27.915371] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-342.el7.x86_64 #1
> > [ 27.917503] Hardware name: HP ProLiant DL160 Gen8, BIOS J03 02/10/2014
> > [ 27.919455] ffffffff8186a728 0000000059c82488 ffff880034e05af0 ffffffff81637bd4
> > [ 27.921870] ffff880034e05b70 ffffffff8163144a 0000000000000010 ffff880034e05b80
> > [ 27.924257] ffff880034e05b20 0000000059c82488 0000000000000000 0000000000000000
> > [ 27.926599] Call Trace:
> > [ 27.927352] <NMI> [<ffffffff81637bd4>] dump_stack+0x19/0x1b
> > [ 27.929080] [<ffffffff8163144a>] panic+0xd8/0x1e7
> > [ 27.930588] [<ffffffff8111d3e0>] ? restart_watchdog_hrtimer+0x50/0x50
> > [ 27.932502] [<ffffffff8111d4a2>] watchdog_overflow_callback+0xc2/0xd0
> > [ 27.934427] [<ffffffff811612c1>] __perf_event_overflow+0xa1/0x250
> > [ 27.936232] [<ffffffff81161d94>] perf_event_overflow+0x14/0x20
> > [ 27.937957] [<ffffffff81032ae8>] intel_pmu_handle_irq+0x1e8/0x470
> > [ 27.939799] [<ffffffff8164164b>] perf_event_nmi_handler+0x2b/0x50
> > [ 27.941649] [<ffffffff81640d99>] nmi_handle.isra.0+0x69/0xb0
> > [ 27.943348] [<ffffffff81640f49>] do_nmi+0x169/0x340
> > [ 27.944802] [<ffffffff816401d3>] end_repeat_nmi+0x1e/0x2e
> > [ 27.946424] [<ffffffff81056ee5>] ? hpet_rtc_interrupt+0x85/0x380
> > [ 27.948197] [<ffffffff81056ee5>] ? hpet_rtc_interrupt+0x85/0x380
> > [ 27.949992] [<ffffffff81056ee5>] ? hpet_rtc_interrupt+0x85/0x380
> > [ 27.951816] <<EOE>> <IRQ> [<ffffffff8108f5a3>] ? run_timer_softirq+0x43/0x340
> > [ 27.954114] [<ffffffff8111e24e>] handle_irq_event_percpu+0x3e/0x1e0
> > [ 27.955962] [<ffffffff8111e42d>] handle_irq_event+0x3d/0x60
> > [ 27.957635] [<ffffffff811210c7>] handle_edge_irq+0x77/0x130
> > [ 27.959332] [<ffffffff8101704f>] handle_irq+0xbf/0x150
> > [ 27.960949] [<ffffffff8164a86f>] do_IRQ+0x4f/0xf0
> > [ 27.962434] [<ffffffff8163faed>] common_interrupt+0x6d/0x6d
> > [ 27.964101] <EOI> [<ffffffff8163f43b>] ? _raw_spin_unlock_irqrestore+0x1b/0x40
> > [ 27.966308] [<fffff8111ff07>] __setup_irq+0x2a7/0x570
> > [ 28.067859] [<ffffffff81056e60>] ? hpet_cpuhp_notify+0x140/0x140
> > [ 28.069709] [<ffffffff8112032c>] request_threaded_irq+0xcc/0x170
> > [ 28.071585] [<ffffffff814b24a6>] cmos_do_probe+0x1e6/0x450
> > [ 28.073240] [<ffffffff814b2710>] ? cmos_do_probe+0x450/0x450
> > [ 28.074911] [<ffffffff814b27cb>] cmos_pnp_probe+0xbb/0xc0
> > [ 28.076533] [<ffffffff8139b245>] pnp_device_probe+0x65/0xd0
> > [ 28.078198] [<ffffffff813f8ca7>] driver_probe_device+0x87/0x390
> > [ 28.079971] [<ffffffff813f9083>] __driver_attach+0x93/0xa0
> > [ 28.081660] [<ffffffff813f8ff0>] ? __device_attach+0x40/0x40
> > [ 28.083662] [<ffffffff813f6a13>] bus_for_each_dev+0x73/0xc0
> > [ 28.085370] [<ffffffff813f86fe>] driver_attach+0x1e/0x20
> > [ 28.086974] [<ffffffff813f8250>] bus_add_driver+0x200/0x2d0
> > [ 28.088634] [<ffffffff81ade49a>] ? rtc_sysfs_init+0xe/0xe
> > [ 28.090349] [<ffffffff813f9704>] driver_register+0x64/0xf0
> > [ 28.091989] [<ffffffff8139b070>] pnp_register_driver+0x20/0x30
> > [ 28.093707] [<ffffffff81ade4ab>] cmos_init+0x11/0x71
> > ---<-snip->---
> >
> > The previous patch split hpet_rtc_timer_init into
> > hpet_rtc_timer_counter_init() and hpet_rtc_timer_enable().
> >
> > Therefore, this patch moved hpet_rtc_timer_counter_init() before IRQ
> > registration, so that we can gracefully handle such spurious interrupts.
> >
> > We were able to reproduce the problem in maximum 15 trials of kdump
> > secondary kernel boot on an hp-dl160gen8 machine without this patch.
> > However, more than 35 trials went fine after applying this patch.
> >
> > Signed-off-by: Pratyush Anand <panand@redhat.com>
> > [dzickus@redhat.com: edited the patch's summary]
> > Signed-off-by: Don Zickus <dzickus@redhat.com>
> > ---
> >  drivers/rtc/rtc-cmos.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
> > index 43745cac0141..089d987f2638 100644
> > --- a/drivers/rtc/rtc-cmos.c
> > +++ b/drivers/rtc/rtc-cmos.c
> > @@ -129,6 +129,16 @@ static inline int hpet_rtc_dropped_irq(void)
> >  	return 0;
> >  }
> >
> > +static inline int hpet_rtc_timer_counter_init(void)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline int hpet_rtc_timer_enable(void)
> > +{
> > +	return 0;
> > +}
> > +
>
> Can these dummy functions go to /usr/include/linux/hpet.h along with
> the #ifdef etc.

I kept them here because similar functions, such as hpet_set_alarm_time(),
were already there. So if you are suggesting that I first add a cleanup patch
that moves the existing #ifdef block to include/linux/hpet.h, and that this
patch then adds its inline stubs in linux/hpet.h, then maybe I can do that.
But I am not sure whether anything more needs to be done to help the
maintainer take it.

~Pratyush

>
> >  static inline int hpet_rtc_timer_init(void)
> >  {
> >  	return 0;
> > @@ -707,6 +717,7 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
> >  		goto cleanup1;
> >  	}
> >
> > +	hpet_rtc_timer_counter_init();
> >  	if (is_valid_irq(rtc_irq)) {
> >  		irq_handler_t rtc_cmos_int_handler;
> >
> > @@ -729,7 +740,7 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
> >  			goto cleanup1;
> >  		}
> >  	}
> > -	hpet_rtc_timer_init();
> > +	hpet_rtc_timer_enable();
> >
> >  	/* export at least the first block of NVRAM */
> >  	nvram.size = address_space - NVRAM_OFFSET;
> > --
> > 2.5.5
> >
>
> Thanks
> Dave

^ permalink raw reply	[flat|nested] 11+ messages in thread
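For readers following the stub discussion above, here is a minimal sketch of the fallback pattern being debated, written as one C file rather than a real kernel header. SKETCH_HPET_EMULATE_RTC, sketch_probe(), register_ok() and register_fail() are hypothetical names used only for illustration; the two hpet_rtc_timer_*() stubs mirror the ones added by the quoted patch. It is an illustration under those assumptions, not the patch itself.

```c
/*
 * Hypothetical guard macro: stands in for whatever config symbol selects
 * the real HPET RTC emulation. It is NOT the kernel's actual symbol.
 */
#ifdef SKETCH_HPET_EMULATE_RTC
/* Real implementations would be provided elsewhere. */
int hpet_rtc_timer_counter_init(void);
int hpet_rtc_timer_enable(void);
#else
/* No-op fallbacks so callers need no #ifdef of their own. */
static inline int hpet_rtc_timer_counter_init(void)
{
	return 0;
}

static inline int hpet_rtc_timer_enable(void)
{
	return 0;
}
#endif

/* Hypothetical stand-ins for an IRQ registration that succeeds or fails. */
static int register_ok(void)
{
	return 0;
}

static int register_fail(void)
{
	return -16;
}

/*
 * Probe-like ordering from the quoted patch: set up the software counters
 * first, register the handler next, and enable the timer last.
 */
static int sketch_probe(int (*register_handler)(void))
{
	int ret;

	hpet_rtc_timer_counter_init();

	ret = register_handler();
	if (ret)
		return ret;	/* timer is never enabled on failure */

	hpet_rtc_timer_enable();
	return 0;
}
```

Calling sketch_probe(register_ok) walks the full sequence, while sketch_probe(register_fail) returns the registration error before the enable step. Whether the stubs live in the driver or in a shared header does not change this call sequence; it only changes where the #ifdef is maintained.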
* Re: [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered
  2016-08-30  9:54     ` Pratyush Anand
@ 2016-08-31  4:56       ` Dave Young
  2016-08-31  6:44         ` Alexandre Belloni
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Young @ 2016-08-31  4:56 UTC (permalink / raw)
  To: Pratyush Anand
  Cc: mingo, alexandre.belloni, tglx, hpa, x86, rtc-linux, linux-kernel,
	prarit, dzickus, a.zummo, akpm

Hi, Pratyush,

I'm not sure who the maintainer is to review and take the patches.
In the MAINTAINERS file, x86 hpet is orphaned. rtc-cmos may go to the rtc
maintainer, Alessandro Zummo.

Ccing Andrew; maybe he can also take the patches for the orphaned component.

On 08/30/16 at 03:24pm, Pratyush Anand wrote:
> [...]

Thanks
Dave

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered
  2016-08-31  4:56       ` Dave Young
@ 2016-08-31  6:44         ` Alexandre Belloni
  0 siblings, 0 replies; 11+ messages in thread
From: Alexandre Belloni @ 2016-08-31  6:44 UTC (permalink / raw)
  To: Dave Young
  Cc: Pratyush Anand, mingo, tglx, hpa, x86, rtc-linux, linux-kernel,
	prarit, dzickus, a.zummo, akpm

On 31/08/2016 at 12:56:17 +0800, Dave Young wrote :
> Hi, Pratyush,
>
> I'm not sure who the maintainer is to review and take the patches.
> In the MAINTAINERS file, x86 hpet is orphaned. rtc-cmos may go to the rtc
> maintainer, Alessandro Zummo.
>
> Ccing Andrew; maybe he can also take the patches for the orphaned component.
>

Well, my point was that I can take them, but I'd like to get some review
from the x86 maintainers, as this is x86-specific and I don't know much
about the hpet.

> On 08/30/16 at 03:24pm, Pratyush Anand wrote:
> [...]

--
Alexandre Belloni, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered
  2016-08-16  3:25 ` [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered Pratyush Anand
  2016-08-30  8:22   ` Dave Young
@ 2016-09-06  9:58   ` Thomas Gleixner
  2016-09-06 10:40     ` Pratyush Anand
  1 sibling, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2016-09-06  9:58 UTC (permalink / raw)
  To: Pratyush Anand
  Cc: mingo, alexandre.belloni, hpa, x86, rtc-linux, linux-kernel,
	prarit, dzickus, dyoung, a.zummo

On Tue, 16 Aug 2016, Pratyush Anand wrote:

That's a lot of churn to fix that simple problem. The two liner below
should fix that as well, right?

Thanks,

	tglx

diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 43745cac0141..cb8dfc3ee012 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -707,6 +707,7 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
 		goto cleanup1;
 	}

+	hpet_rtc_timer_init();
 	if (is_valid_irq(rtc_irq)) {
 		irq_handler_t rtc_cmos_int_handler;

@@ -714,6 +715,7 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
 			rtc_cmos_int_handler = hpet_rtc_interrupt;
 			retval = hpet_register_irq_handler(cmos_interrupt);
 			if (retval) {
+				hpet_mask_rtc_irq_bit(RTC_IRQMASK);
 				dev_warn(dev, "hpet_register_irq_handler "
 						" failed in rtc_init().");
 				goto cleanup1;
@@ -729,7 +731,6 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
 			goto cleanup1;
 		}
 	}
-	hpet_rtc_timer_init();

 	/* export at least the first block of NVRAM */
 	nvram.size = address_space - NVRAM_OFFSET;

^ permalink raw reply related	[flat|nested] 11+ messages in thread
* Re: [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered
  2016-09-06  9:58   ` Thomas Gleixner
@ 2016-09-06 10:40     ` Pratyush Anand
  0 siblings, 0 replies; 11+ messages in thread
From: Pratyush Anand @ 2016-09-06 10:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: mingo, alexandre.belloni, hpa, x86, rtc-linux, linux-kernel,
	prarit, dzickus, dyoung, a.zummo

Hi Thomas,

On 06/09/2016:11:58:08 AM, Thomas Gleixner wrote:
> On Tue, 16 Aug 2016, Pratyush Anand wrote:
>
> That's a lot of churn to fix that simple problem. The two liner below
> should fix that as well, right?

Thanks a lot for your reply.

Yes, that should fix it. I wasn't sure whether "setting HPET_TN_ENABLE before
IRQ is registered" would be the right step.

I can give it a try; however, it is pretty obvious that this two-liner should
be able to solve the issue. So if this solution has no side effects, it is
fine with me as well.

~Pratyush

>
> Thanks,
>
> 	tglx
>
> [...]

^ permalink raw reply	[flat|nested] 11+ messages in thread
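To make the failure mode discussed in this thread concrete, here is a small user-space model; it is not kernel code. It assumes, going by the cover letter's description, that the interrupt path advances a software comparator by a fixed delta until the comparator is ahead of the hardware counter. struct rtc_model, cnt_ahead(), model_counter_init() and model_reinit() are hypothetical names, and the max_steps bound exists only so that the model terminates where the described loop would spin.

```c
#include <stdint.h>

/* Hypothetical model state; the comments name the variables it imitates. */
struct rtc_model {
	uint32_t counter;	/* free-running hardware counter */
	uint32_t cmp;		/* software comparator (cf. hpet_t1_cmp) */
	uint32_t delta;		/* tick interval (cf. hpet_default_delta) */
};

/* Wrap-safe comparison: non-zero when c1 lies ahead of c2. */
static int cnt_ahead(uint32_t c1, uint32_t c2)
{
	return (int32_t)(c2 - c1) < 0;
}

/* What must happen before an interrupt may arrive: pick a non-zero delta. */
static void model_counter_init(struct rtc_model *m, uint32_t delta)
{
	m->delta = delta;
	m->cmp = m->counter + delta;
}

/*
 * Interrupt-side catch-up: step the comparator until it is ahead of the
 * counter. Returns the number of steps taken, or -1 when max_steps is
 * exhausted (the stand-in for a loop that would never finish).
 */
static int model_reinit(struct rtc_model *m, int max_steps)
{
	int steps = 0;

	do {
		if (steps == max_steps)
			return -1;
		m->cmp += m->delta;	/* with delta == 0 this makes no progress */
		steps++;
	} while (!cnt_ahead(m->cmp, m->counter));

	return steps;
}
```

With delta still zero, that is, an interrupt arriving before the counters are initialized, model_reinit() exhausts its step budget because the comparator never moves. Once model_counter_init() has run, a late interrupt catches up in a handful of steps, which is the ordering that both proposed fixes aim for.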
end of thread, other threads:[~2016-09-06 10:40 UTC | newest]

Thread overview: 11+ messages
2016-08-16  3:25 [PATCH V3 0/2] rtc-cmos: Workaround unwanted interrupt generation Pratyush Anand
2016-08-16  3:25 ` [PATCH V3 1/2] rtc/hpet: Factorize hpet_rtc_timer_init() Pratyush Anand
2016-08-16  3:25 ` [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered Pratyush Anand
2016-08-30  8:22   ` Dave Young
2016-08-30  8:38     ` Dave Young
2016-08-30  9:10       ` Dave Young
2016-08-30  9:54     ` Pratyush Anand
2016-08-31  4:56       ` Dave Young
2016-08-31  6:44         ` Alexandre Belloni
2016-09-06  9:58   ` Thomas Gleixner
2016-09-06 10:40     ` Pratyush Anand