From: Dave Young <dyoung@redhat.com>
To: Pratyush Anand <panand@redhat.com>
Cc: mingo@kernel.org, alexandre.belloni@free-electrons.com,
	tglx@linutronix.de, hpa@zytor.com, x86@kernel.org,
	rtc-linux@googlegroups.com, linux-kernel@vger.kernel.org,
	prarit@redhat.com, dzickus@redhat.com, a.zummo@towertech.it
Subject: Re: [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered
Date: Tue, 30 Aug 2016 16:38:14 +0800	[thread overview]
Message-ID: <20160830083814.GA7308@dhcp-128-65.nay.redhat.com> (raw)
In-Reply-To: <20160830082230.GA7000@dhcp-128-65.nay.redhat.com>

On 08/30/16 at 04:22pm, Dave Young wrote:
> Hi, Pratyush
> 
> On 08/16/16 at 08:55am, Pratyush Anand wrote:
> > We have observed on a few x86 machines with an rtc-cmos device that
> > hpet_rtc_interrupt() is called just after irq registration and before
> > cmos_do_probe() could call hpet_rtc_timer_init().
> > 
> > So neither hpet_default_delta nor hpet_t1_cmp is initialized by the time
> > the interrupt is raised, and this results in an NMI watchdog hard LOCKUP.
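
A minimal userspace sketch of why an interrupt arriving this early can hang the CPU. It assumes, from the x86 HPET RTC-emulation code rather than from anything stated in this thread, that the handler re-arms the timer by repeatedly adding hpet_default_delta to hpet_t1_cmp until the comparator is ahead of the free-running counter. struct hpet_rtc_state, cnt_ahead(), reinit_comparator() and the iteration cap are all hypothetical; the cap only stands in for a CPU that would otherwise keep spinning in IRQ context until the watchdog fires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's software state. */
struct hpet_rtc_state {
	uint32_t default_delta;	/* models hpet_default_delta */
	uint32_t t1_cmp;	/* models hpet_t1_cmp */
};

/* Wrap-safe "a is ahead of b" test on a 32-bit counter. */
static bool cnt_ahead(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/*
 * Models the comparator catch-up loop: bump the comparator by delta until
 * it is ahead of the counter.  Returns the number of iterations needed, or
 * -1 if no progress was made within max_iters (the simulated "lockup").
 */
static long reinit_comparator(struct hpet_rtc_state *st, uint32_t counter,
			      long max_iters)
{
	assert(max_iters > 0);
	for (long i = 1; i <= max_iters; i++) {
		st->t1_cmp += st->default_delta;
		if (cnt_ahead(st->t1_cmp, counter))
			return i;
		counter++;	/* the real counter keeps ticking meanwhile */
	}
	return -1;
}
```

With both fields still zero the comparator never moves, so the loop cannot terminate on its own; once delta and the comparator are initialized it catches up within a few iterations.
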
> > 
> > It has only been observed sporadically on kdump secondary kernels.
> > 
> > See the call trace:
> > ---<-snip->---
> > [   27.913194] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
> > [   27.915371] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-342.el7.x86_64 #1
> > [   27.917503] Hardware name: HP ProLiant DL160 Gen8, BIOS J03 02/10/2014
> > [   27.919455]  ffffffff8186a728 0000000059c82488 ffff880034e05af0 ffffffff81637bd4
> > [   27.921870]  ffff880034e05b70 ffffffff8163144a 0000000000000010 ffff880034e05b80
> > [   27.924257]  ffff880034e05b20 0000000059c82488 0000000000000000 0000000000000000
> > [   27.926599] Call Trace:
> > [   27.927352]  <NMI>  [<ffffffff81637bd4>] dump_stack+0x19/0x1b
> > [   27.929080]  [<ffffffff8163144a>] panic+0xd8/0x1e7
> > [   27.930588]  [<ffffffff8111d3e0>] ? restart_watchdog_hrtimer+0x50/0x50
> > [   27.932502]  [<ffffffff8111d4a2>] watchdog_overflow_callback+0xc2/0xd0
> > [   27.934427]  [<ffffffff811612c1>] __perf_event_overflow+0xa1/0x250
> > [   27.936232]  [<ffffffff81161d94>] perf_event_overflow+0x14/0x20
> > [   27.937957]  [<ffffffff81032ae8>] intel_pmu_handle_irq+0x1e8/0x470
> > [   27.939799]  [<ffffffff8164164b>] perf_event_nmi_handler+0x2b/0x50
> > [   27.941649]  [<ffffffff81640d99>] nmi_handle.isra.0+0x69/0xb0
> > [   27.943348]  [<ffffffff81640f49>] do_nmi+0x169/0x340
> > [   27.944802]  [<ffffffff816401d3>] end_repeat_nmi+0x1e/0x2e
> > [   27.946424]  [<ffffffff81056ee5>] ? hpet_rtc_interrupt+0x85/0x380
> > [   27.948197]  [<ffffffff81056ee5>] ? hpet_rtc_interrupt+0x85/0x380
> > [   27.949992]  [<ffffffff81056ee5>] ? hpet_rtc_interrupt+0x85/0x380
> > [   27.951816]  <<EOE>>  <IRQ>  [<ffffffff8108f5a3>] ? run_timer_softirq+0x43/0x340
> > [   27.954114]  [<ffffffff8111e24e>] handle_irq_event_percpu+0x3e/0x1e0
> > [   27.955962]  [<ffffffff8111e42d>] handle_irq_event+0x3d/0x60
> > [   27.957635]  [<ffffffff811210c7>] handle_edge_irq+0x77/0x130
> > [   27.959332]  [<ffffffff8101704f>] handle_irq+0xbf/0x150
> > [   27.960949]  [<ffffffff8164a86f>] do_IRQ+0x4f/0xf0
> > [   27.962434]  [<ffffffff8163faed>] common_interrupt+0x6d/0x6d
> > [   27.964101]  <EOI>  [<ffffffff8163f43b>] ? _raw_spin_unlock_irqrestore+0x1b/0x40
> > [   27.966308]  [<fffff8111ff07>] __setup_irq+0x2a7/0x570
> > [   28.067859]  [<ffffffff81056e60>] ? hpet_cpuhp_notify+0x140/0x140
> > [   28.069709]  [<ffffffff8112032c>] request_threaded_irq+0xcc/0x170
> > [   28.071585]  [<ffffffff814b24a6>] cmos_do_probe+0x1e6/0x450
> > [   28.073240]  [<ffffffff814b2710>] ? cmos_do_probe+0x450/0x450
> > [   28.074911]  [<ffffffff814b27cb>] cmos_pnp_probe+0xbb/0xc0
> > [   28.076533]  [<ffffffff8139b245>] pnp_device_probe+0x65/0xd0
> > [   28.078198]  [<ffffffff813f8ca7>] driver_probe_device+0x87/0x390
> > [   28.079971]  [<ffffffff813f9083>] __driver_attach+0x93/0xa0
> > [   28.081660]  [<ffffffff813f8ff0>] ? __device_attach+0x40/0x40
> > [   28.083662]  [<ffffffff813f6a13>] bus_for_each_dev+0x73/0xc0
> > [   28.085370]  [<ffffffff813f86fe>] driver_attach+0x1e/0x20
> > [   28.086974]  [<ffffffff813f8250>] bus_add_driver+0x200/0x2d0
> > [   28.088634]  [<ffffffff81ade49a>] ? rtc_sysfs_init+0xe/0xe
> > [   28.090349]  [<ffffffff813f9704>] driver_register+0x64/0xf0
> > [   28.091989]  [<ffffffff8139b070>] pnp_register_driver+0x20/0x30
> > [   28.093707]  [<ffffffff81ade4ab>] cmos_init+0x11/0x71
> > ---<-snip->---
> > 
> > The previous patch split hpet_rtc_timer_init() into
> > hpet_rtc_timer_counter_init() and hpet_rtc_timer_enable().
> > 
> > This patch therefore moves hpet_rtc_timer_counter_init() before IRQ
> > registration, so that such spurious interrupts are handled gracefully.
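
The reordering can be sketched in userspace as below. Every name here (struct hpet_rtc_state, fake_counter, timer_counter_init(), timer_enable(), request_irq_with_pending(), probe()) is a hypothetical stand-in, not kernel API. The only behaviour modelled is that an interrupt already pending when the handler is installed runs before the rest of probe, so the software-only half must run first and the hardware half afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct hpet_rtc_state {
	uint32_t default_delta;	/* models hpet_default_delta */
	uint32_t t1_cmp;	/* models hpet_t1_cmp */
	bool hw_enabled;	/* models the timer interrupt being armed */
	bool handler_ok;	/* could the simulated handler make progress? */
};

static uint32_t fake_counter = 5000;	/* hypothetical free-running counter */

/* Software-only half: safe to run before the IRQ is requested. */
static void timer_counter_init(struct hpet_rtc_state *st, uint32_t delta)
{
	assert(delta > 0);
	st->default_delta = delta;
	st->t1_cmp = fake_counter + delta;
}

/* Hardware half: arm the timer once the handler is in place. */
static void timer_enable(struct hpet_rtc_state *st)
{
	st->hw_enabled = true;
}

/* Simulated handler: in this model, progress requires a non-zero delta. */
static void rtc_handler(struct hpet_rtc_state *st)
{
	st->handler_ok = st->default_delta != 0;
}

/* Stand-in for IRQ registration that delivers a pending interrupt at once. */
static void request_irq_with_pending(struct hpet_rtc_state *st)
{
	rtc_handler(st);
}

static bool probe(struct hpet_rtc_state *st, bool fixed_order)
{
	if (fixed_order)
		timer_counter_init(st, 64);	/* patched: before the IRQ */
	request_irq_with_pending(st);
	if (!fixed_order)
		timer_counter_init(st, 64);	/* old: counters set too late */
	timer_enable(st);
	return st->handler_ok;
}
```

In the old order the early handler sees zeroed state; in the patched order the counters are already valid when it runs, and the timer is still only armed at the end.
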
> > 
> > Without this patch, we could reproduce the problem within at most 15
> > kdump secondary kernel boots on an hp-dl160gen8 machine. With the patch
> > applied, more than 35 boots completed without the lockup.
> > 
> > Signed-off-by: Pratyush Anand <panand@redhat.com>
> > [dzickus@redhat.com: edited the patch's summary]
> > Signed-off-by: Don Zickus <dzickus@redhat.com>
> > ---
> >  drivers/rtc/rtc-cmos.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
> > index 43745cac0141..089d987f2638 100644
> > --- a/drivers/rtc/rtc-cmos.c
> > +++ b/drivers/rtc/rtc-cmos.c
> > @@ -129,6 +129,16 @@ static inline int hpet_rtc_dropped_irq(void)
> >  	return 0;
> >  }
> >  
> > +static inline int hpet_rtc_timer_counter_init(void)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline int hpet_rtc_timer_enable(void)
> > +{
> > +	return 0;
> > +}
> > +
> 
> Can these dummy functions go to /usr/include/linux/hpet.h along with
> the #ifdef etc.?

Hmm, it seems CONFIG_HPET_EMULATE_RTC is x86-only, so moving them to
asm/hpet.h would probably be better.
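
A sketch of the layout being suggested, assuming the stubs sit next to the CONFIG_HPET_EMULATE_RTC declarations in a header. It is illustrative only; which header they belong in is exactly what this thread is still deciding, and the surrounding declarations of the real file are omitted.

```c
/* Hypothetical header excerpt; not the actual contents of asm/hpet.h. */
#ifdef CONFIG_HPET_EMULATE_RTC
/* Real implementations would live in the x86 HPET code. */
extern int hpet_rtc_timer_counter_init(void);
extern int hpet_rtc_timer_enable(void);
#else
/* No RTC emulation: callers such as rtc-cmos can call these unconditionally. */
static inline int hpet_rtc_timer_counter_init(void)
{
	return 0;
}

static inline int hpet_rtc_timer_enable(void)
{
	return 0;
}
#endif
```

With the stubs in a shared header, the dummy copies this patch adds to rtc-cmos.c would no longer be needed.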

> 
> >  static inline int hpet_rtc_timer_init(void)
> >  {
> >  	return 0;
> > @@ -707,6 +717,7 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
> >  		goto cleanup1;
> >  	}
> >  
> > +	hpet_rtc_timer_counter_init();
> >  	if (is_valid_irq(rtc_irq)) {
> >  		irq_handler_t rtc_cmos_int_handler;
> >  
> > @@ -729,7 +740,7 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
> >  			goto cleanup1;
> >  		}
> >  	}
> > -	hpet_rtc_timer_init();
> > +	hpet_rtc_timer_enable();
> >  
> >  	/* export at least the first block of NVRAM */
> >  	nvram.size = address_space - NVRAM_OFFSET;
> > -- 
> > 2.5.5
> > 
> 
> Thanks
> Dave

Thread overview: 22+ messages
2016-08-16  3:25 [PATCH V3 0/2] rtc-cmos: Workaround unwanted interrupt generation Pratyush Anand
2016-08-16  3:25 ` [rtc-linux] " Pratyush Anand
2016-08-16  3:25 ` [PATCH V3 1/2] rtc/hpet: Factorize hpet_rtc_timer_init() Pratyush Anand
2016-08-16  3:25   ` [rtc-linux] " Pratyush Anand
2016-08-16  3:25 ` [PATCH V3 2/2] rtc/rtc-cmos: Initialize software counters before irq is registered Pratyush Anand
2016-08-16  3:25   ` [rtc-linux] " Pratyush Anand
2016-08-30  8:22   ` Dave Young
2016-08-30  8:22     ` [rtc-linux] " Dave Young
2016-08-30  8:38     ` Dave Young [this message]
2016-08-30  8:38       ` Dave Young
2016-08-30  9:10       ` Dave Young
2016-08-30  9:10         ` [rtc-linux] " Dave Young
2016-08-30  9:54     ` Pratyush Anand
2016-08-30  9:54       ` [rtc-linux] " Pratyush Anand
2016-08-31  4:56       ` Dave Young
2016-08-31  4:56         ` [rtc-linux] " Dave Young
2016-08-31  6:44         ` Alexandre Belloni
2016-08-31  6:44           ` [rtc-linux] " Alexandre Belloni
2016-09-06  9:58   ` Thomas Gleixner
2016-09-06  9:58     ` [rtc-linux] " Thomas Gleixner
2016-09-06 10:40     ` Pratyush Anand
2016-09-06 10:40       ` [rtc-linux] " Pratyush Anand
