linux-kernel.vger.kernel.org archive mirror
* CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure)
       [not found]       ` <50485698.1070905@linaro.org>
@ 2012-09-06  9:22         ` Daniel Lezcano
  2012-09-06 20:04           ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Lezcano @ 2012-09-06  9:22 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: xen-devel, linaro-dev, Konrad Rzeszutek Wilk, linux-pm,
	linux-acpi, lenb, Frederic Weisbecker, Linux Kernel Mailing List

On 09/06/2012 09:54 AM, Daniel Lezcano wrote:
> On 09/05/2012 03:41 PM, Rafael J. Wysocki wrote:
>> On Saturday, September 01, 2012, Rafael J. Wysocki wrote:
>>> On Friday, August 31, 2012, Daniel Lezcano wrote:
>>>> On 07/24/2012 11:06 PM, Konrad Rzeszutek Wilk wrote:
>>>>> On Tue, Jul 24, 2012 at 11:12:29PM +0200, Daniel Lezcano wrote:
>>>>>> Remove the power field as it is not used.
>>>>>>
>>>>>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>>>>>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>>>> Acked.
>>>> Hi Rafael,
>>>>
>>>> I did not see this patch going in. Is it possible to merge it ?
>>> I think so.  I'll take care of it when I get back from LinuxCon/Plumbers Conf.
>>> (early next week).
>> Applied to the linux-next branch of the linux-pm.git tree as v3.7 material.
> Thanks Rafael.
>
>> Are there any other patches you want me to consider for v3.7?
> Yes please, I have the per cpu latencies ready to be submitted but I
> want to do extra testing before. Unfortunately, the linux-pm-next hangs
> at boot time on my intel dual core (not related to the patchset).
>
> I am git bisecting right now.

I found the culprit. It is not related to the linux-pm tree but to
net-next.
The following patch introduced the issue.

commit 6bdb7fe31046ac50b47e83c35cd6c6b6160a475d
Author: Amerigo Wang <amwang@redhat.com>
Date:   Fri Aug 10 01:24:50 2012 +0000

    netpoll: re-enable irq in poll_napi()
   
    napi->poll() needs IRQ enabled, so we have to re-enable IRQ before
    calling it.
   
    Cc: David Miller <davem@davemloft.net>
    Signed-off-by: Cong Wang <amwang@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

AFAICS, it has been fixed by commit
072a9c48600409d72aeb0d5b29fbb75861a06631 which is not yet in linux-pm-next.

I hit this issue because NETCONSOLE is set; disabling it allowed
me to go further.

Unfortunately, I am now facing some random freezes on the system, which
seem to be related to CONFIG_NO_HZ=y and CONFIG_CPU_IDLE=y.

Disabling either one of them makes the freezes disappear.

Is it a known issue?

Thanks in advance
  -- Daniel






* Re: CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure)
  2012-09-06  9:22         ` CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure) Daniel Lezcano
@ 2012-09-06 20:04           ` Rafael J. Wysocki
  2012-09-06 20:35             ` Daniel Lezcano
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2012-09-06 20:04 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: xen-devel, linaro-dev, Konrad Rzeszutek Wilk, linux-pm,
	linux-acpi, lenb, Frederic Weisbecker, Linux Kernel Mailing List

On Thursday, September 06, 2012, Daniel Lezcano wrote:
> On 09/06/2012 09:54 AM, Daniel Lezcano wrote:
> > On 09/05/2012 03:41 PM, Rafael J. Wysocki wrote:
> >> On Saturday, September 01, 2012, Rafael J. Wysocki wrote:
> >>> On Friday, August 31, 2012, Daniel Lezcano wrote:
> >>>> On 07/24/2012 11:06 PM, Konrad Rzeszutek Wilk wrote:
> >>>>> On Tue, Jul 24, 2012 at 11:12:29PM +0200, Daniel Lezcano wrote:
> >>>>>> Remove the power field as it is not used.
> >>>>>>
> >>>>>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> >>>>>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >>>>> Acked.
> >>>> Hi Rafael,
> >>>>
> >>>> I did not see this patch going in. Is it possible to merge it ?
> >>> I think so.  I'll take care of it when I get back from LinuxCon/Plumbers Conf.
> >>> (early next week).
> >> Applied to the linux-next branch of the linux-pm.git tree as v3.7 material.
> > Thanks Rafael.
> >
> >> Are there any other patches you want me to consider for v3.7?
> > Yes please, I have the per cpu latencies ready to be submitted but I
> > want to do extra testing before. Unfortunately, the linux-pm-next hangs
> > at boot time on my intel dual core (not related to the patchset).
> >
> > I am git bisecting right now.
> 
> I found the culprit. This is not related to the linux-pm tree but with
> net-next.
> The following patch introduced the issue.
> 
> commit 6bdb7fe31046ac50b47e83c35cd6c6b6160a475d
> Author: Amerigo Wang <amwang@redhat.com>
> Date:   Fri Aug 10 01:24:50 2012 +0000
> 
>     netpoll: re-enable irq in poll_napi()
>    
>     napi->poll() needs IRQ enabled, so we have to re-enable IRQ before
>     calling it.
>    
>     Cc: David Miller <davem@davemloft.net>
>     Signed-off-by: Cong Wang <amwang@redhat.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> AFAICS, it has been fixed by commit
> 072a9c48600409d72aeb0d5b29fbb75861a06631 which is not yet in linux-pm-next.

If it is present in the current Linus' tree, you can just pull this one
and merge linux-pm-next into it.  It should merge without conflicts.

> I fall into this issue because NETCONSOLE is set, disabling it allowed
> me to go further.
> 
> Unfortunately I am facing to some random freeze on the system which
> seems to be related to CONFIG_NO_HZ=y and CONFIG_CPU_IDLE=y.
> 
> Disabling one of them, make the freezes to disappear.
> 
> Is it a known issue ?

Well, there are systems having problems with this configuration, but they
should be exceptional.  What system is that?

Rafael


* Re: CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure)
  2012-09-06 20:04           ` Rafael J. Wysocki
@ 2012-09-06 20:35             ` Daniel Lezcano
  2012-09-06 21:18               ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Lezcano @ 2012-09-06 20:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: xen-devel, linaro-dev, Konrad Rzeszutek Wilk, linux-pm,
	linux-acpi, lenb, Frederic Weisbecker, Linux Kernel Mailing List

On 09/06/2012 10:04 PM, Rafael J. Wysocki wrote:
> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>> On 09/06/2012 09:54 AM, Daniel Lezcano wrote:
>>> On 09/05/2012 03:41 PM, Rafael J. Wysocki wrote:
>>>> On Saturday, September 01, 2012, Rafael J. Wysocki wrote:
>>>>> On Friday, August 31, 2012, Daniel Lezcano wrote:
>>>>>> On 07/24/2012 11:06 PM, Konrad Rzeszutek Wilk wrote:
>>>>>>> On Tue, Jul 24, 2012 at 11:12:29PM +0200, Daniel Lezcano wrote:
>>>>>>>> Remove the power field as it is not used.
>>>>>>>>
>>>>>>>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>>>>>>>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>>>>>> Acked.
>>>>>> Hi Rafael,
>>>>>>
>>>>>> I did not see this patch going in. Is it possible to merge it ?
>>>>> I think so.  I'll take care of it when I get back from LinuxCon/Plumbers Conf.
>>>>> (early next week).
>>>> Applied to the linux-next branch of the linux-pm.git tree as v3.7 material.
>>> Thanks Rafael.
>>>
>>>> Are there any other patches you want me to consider for v3.7?
>>> Yes please, I have the per cpu latencies ready to be submitted but I
>>> want to do extra testing before. Unfortunately, the linux-pm-next hangs
>>> at boot time on my intel dual core (not related to the patchset).
>>>
>>> I am git bisecting right now.
>>
>> I found the culprit. This is not related to the linux-pm tree but with
>> net-next.
>> The following patch introduced the issue.
>>
>> commit 6bdb7fe31046ac50b47e83c35cd6c6b6160a475d
>> Author: Amerigo Wang <amwang@redhat.com>
>> Date:   Fri Aug 10 01:24:50 2012 +0000
>>
>>     netpoll: re-enable irq in poll_napi()
>>    
>>     napi->poll() needs IRQ enabled, so we have to re-enable IRQ before
>>     calling it.
>>    
>>     Cc: David Miller <davem@davemloft.net>
>>     Signed-off-by: Cong Wang <amwang@redhat.com>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>> AFAICS, it has been fixed by commit
>> 072a9c48600409d72aeb0d5b29fbb75861a06631 which is not yet in linux-pm-next.
> 
> If it is present in the current Linus' tree, you can just pull this one
> and merge linux-pm-next into it.  It should merge without conflicts.

Ok, thanks.

>> I fall into this issue because NETCONSOLE is set, disabling it allowed
>> me to go further.
>>
>> Unfortunately I am facing to some random freeze on the system which
>> seems to be related to CONFIG_NO_HZ=y and CONFIG_CPU_IDLE=y.
>>
>> Disabling one of them, make the freezes to disappear.
>>
>> Is it a known issue ?
> 
> Well, there are systems having problems with this configuration, but they
> should be exceptional.  What system is that?

It is a T61p laptop with a Core 2 Duo T9500. Nothing exceptional, I
believe. Has anyone else seen the same issue?

-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog



* Re: CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure)
  2012-09-06 20:35             ` Daniel Lezcano
@ 2012-09-06 21:18               ` Rafael J. Wysocki
  2012-09-07 14:20                 ` Daniel Lezcano
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2012-09-06 21:18 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: xen-devel, linaro-dev, Konrad Rzeszutek Wilk, linux-pm,
	linux-acpi, lenb, Frederic Weisbecker, Linux Kernel Mailing List

On Thursday, September 06, 2012, Daniel Lezcano wrote:
> On 09/06/2012 10:04 PM, Rafael J. Wysocki wrote:
> > On Thursday, September 06, 2012, Daniel Lezcano wrote:
> >> On 09/06/2012 09:54 AM, Daniel Lezcano wrote:
> >>> On 09/05/2012 03:41 PM, Rafael J. Wysocki wrote:
> >>>> On Saturday, September 01, 2012, Rafael J. Wysocki wrote:
> >>>>> On Friday, August 31, 2012, Daniel Lezcano wrote:
> >>>>>> On 07/24/2012 11:06 PM, Konrad Rzeszutek Wilk wrote:
> >>>>>>> On Tue, Jul 24, 2012 at 11:12:29PM +0200, Daniel Lezcano wrote:
> >>>>>>>> Remove the power field as it is not used.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> >>>>>>>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >>>>>>> Acked.
> >>>>>> Hi Rafael,
> >>>>>>
> >>>>>> I did not see this patch going in. Is it possible to merge it ?
> >>>>> I think so.  I'll take care of it when I get back from LinuxCon/Plumbers Conf.
> >>>>> (early next week).
> >>>> Applied to the linux-next branch of the linux-pm.git tree as v3.7 material.
> >>> Thanks Rafael.
> >>>
> >>>> Are there any other patches you want me to consider for v3.7?
> >>> Yes please, I have the per cpu latencies ready to be submitted but I
> >>> want to do extra testing before. Unfortunately, the linux-pm-next hangs
> >>> at boot time on my intel dual core (not related to the patchset).
> >>>
> >>> I am git bisecting right now.
> >>
> >> I found the culprit. This is not related to the linux-pm tree but with
> >> net-next.
> >> The following patch introduced the issue.
> >>
> >> commit 6bdb7fe31046ac50b47e83c35cd6c6b6160a475d
> >> Author: Amerigo Wang <amwang@redhat.com>
> >> Date:   Fri Aug 10 01:24:50 2012 +0000
> >>
> >>     netpoll: re-enable irq in poll_napi()
> >>    
> >>     napi->poll() needs IRQ enabled, so we have to re-enable IRQ before
> >>     calling it.
> >>    
> >>     Cc: David Miller <davem@davemloft.net>
> >>     Signed-off-by: Cong Wang <amwang@redhat.com>
> >>     Signed-off-by: David S. Miller <davem@davemloft.net>
> >>
> >> AFAICS, it has been fixed by commit
> >> 072a9c48600409d72aeb0d5b29fbb75861a06631 which is not yet in linux-pm-next.
> > 
> > If it is present in the current Linus' tree, you can just pull this one
> > and merge linux-pm-next into it.  It should merge without conflicts.
> 
> Ok, thanks.
> 
> >> I fall into this issue because NETCONSOLE is set, disabling it allowed
> >> me to go further.
> >>
> >> Unfortunately I am facing to some random freeze on the system which
> >> seems to be related to CONFIG_NO_HZ=y and CONFIG_CPU_IDLE=y.
> >>
> >> Disabling one of them, make the freezes to disappear.
> >>
> >> Is it a known issue ?
> > 
> > Well, there are systems having problems with this configuration, but they
> > should be exceptional.  What system is that?
> 
> It is a laptop T61p with a Core 2 Duo T9500. Nothing exceptional I
> believe. Maybe someone got the same issue ?

Is it a regression for you?


* Re: CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure)
  2012-09-06 21:18               ` Rafael J. Wysocki
@ 2012-09-07 14:20                 ` Daniel Lezcano
  2012-09-07 17:22                   ` John Stultz
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Lezcano @ 2012-09-07 14:20 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: xen-devel, linaro-dev, Konrad Rzeszutek Wilk, linux-pm,
	linux-acpi, lenb, Frederic Weisbecker, Linux Kernel Mailing List,
	john.stultz, mingo, Peter Zijlstra, richardcochran, prarit,
	Thomas Gleixner

On 09/06/2012 11:18 PM, Rafael J. Wysocki wrote:
> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>> On 09/06/2012 10:04 PM, Rafael J. Wysocki wrote:
>>> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>>>> On 09/06/2012 09:54 AM, Daniel Lezcano wrote:
>>>>> On 09/05/2012 03:41 PM, Rafael J. Wysocki wrote:
>>>>>> On Saturday, September 01, 2012, Rafael J. Wysocki wrote:
>>>>>>> On Friday, August 31, 2012, Daniel Lezcano wrote:
>>>>>>>> On 07/24/2012 11:06 PM, Konrad Rzeszutek Wilk wrote:
>>>>>>>>> On Tue, Jul 24, 2012 at 11:12:29PM +0200, Daniel Lezcano wrote:
>>>>>>>>>> Remove the power field as it is not used.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>>>>>>>>>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>>>>>>>> Acked.
>>>>>>>> Hi Rafael,
>>>>>>>>
>>>>>>>> I did not see this patch going in. Is it possible to merge it ?
>>>>>>> I think so.  I'll take care of it when I get back from LinuxCon/Plumbers Conf.
>>>>>>> (early next week).
>>>>>> Applied to the linux-next branch of the linux-pm.git tree as v3.7 material.
>>>>> Thanks Rafael.
>>>>>
>>>>>> Are there any other patches you want me to consider for v3.7?
>>>>> Yes please, I have the per cpu latencies ready to be submitted but I
>>>>> want to do extra testing before. Unfortunately, the linux-pm-next hangs
>>>>> at boot time on my intel dual core (not related to the patchset).
>>>>>
>>>>> I am git bisecting right now.
>>>>
>>>> I found the culprit. This is not related to the linux-pm tree but with
>>>> net-next.
>>>> The following patch introduced the issue.
>>>>
>>>> commit 6bdb7fe31046ac50b47e83c35cd6c6b6160a475d
>>>> Author: Amerigo Wang <amwang@redhat.com>
>>>> Date:   Fri Aug 10 01:24:50 2012 +0000
>>>>
>>>>     netpoll: re-enable irq in poll_napi()
>>>>    
>>>>     napi->poll() needs IRQ enabled, so we have to re-enable IRQ before
>>>>     calling it.
>>>>    
>>>>     Cc: David Miller <davem@davemloft.net>
>>>>     Signed-off-by: Cong Wang <amwang@redhat.com>
>>>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>>>
>>>> AFAICS, it has been fixed by commit
>>>> 072a9c48600409d72aeb0d5b29fbb75861a06631 which is not yet in linux-pm-next.
>>>
>>> If it is present in the current Linus' tree, you can just pull this one
>>> and merge linux-pm-next into it.  It should merge without conflicts.
>>
>> Ok, thanks.
>>
>>>> I fall into this issue because NETCONSOLE is set, disabling it allowed
>>>> me to go further.
>>>>
>>>> Unfortunately I am facing to some random freeze on the system which
>>>> seems to be related to CONFIG_NO_HZ=y and CONFIG_CPU_IDLE=y.
>>>>
>>>> Disabling one of them, make the freezes to disappear.
>>>>
>>>> Is it a known issue ?
>>>
>>> Well, there are systems having problems with this configuration, but they
>>> should be exceptional.  What system is that?
>>
>> It is a laptop T61p with a Core 2 Duo T9500. Nothing exceptional I
>> believe. Maybe someone got the same issue ?
> 
> Is it a regression for you?

Yes, I think so. The issue appears between v3.5 and v3.6-rc1.

It is not easy to reproduce but after taking some time to dig, it seems
to appear with this commit:

1e75fa8be9fb61e1af46b5b3b176347a4c958ca1 is the first bad commit
commit 1e75fa8be9fb61e1af46b5b3b176347a4c958ca1
Author: John Stultz <john.stultz@linaro.org>
Date:   Fri Jul 13 01:21:53 2012 -0400

    time: Condense timekeeper.xtime into xtime_sec

    The timekeeper struct has a xtime_nsec, which keeps the
    sub-nanosecond remainder.  This ends up being somewhat
    duplicative of the timekeeper.xtime.tv_nsec value, and we
    have to do extra work to keep them apart, copying the full
    nsec portion out and back in over and over.

    This patch simplifies some of the logic by taking the timekeeper
    xtime value and splitting it into timekeeper.xtime_sec and
    reuses the timekeeper.xtime_nsec for the sub-second portion
    (stored in higher res shifted nanoseconds).

    This simplifies some of the accumulation logic. And will
    allow for more accurate timekeeping once the vsyscall code
    is updated to use the shifted nanosecond remainder.

    Signed-off-by: John Stultz <john.stultz@linaro.org>
    Reviewed-by: Ingo Molnar <mingo@kernel.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Richard Cochran <richardcochran@gmail.com>
    Cc: Prarit Bhargava <prarit@redhat.com>
    Link: http://lkml.kernel.org/r/1342156917-25092-5-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

:040000 040000 4d6541ac1f6075d7adee1eef494b31a0cbda0934 dc5708bc738af695f092bf822809b13a1da104b6 M	kernel
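
[Editor's note: the commit message quoted above describes splitting the
wall time into whole seconds (xtime_sec) plus a sub-second part kept in
"shifted nanoseconds" (xtime_nsec). The following is only a minimal
userspace sketch of that representation, not the kernel code. The struct
name tk_sketch, the helper names, and the example shift value are
assumptions made for illustration.]

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for the two fields named in the commit message. */
struct tk_sketch {
	uint64_t xtime_sec;   /* whole seconds */
	uint64_t xtime_nsec;  /* sub-second part, stored as nanoseconds << shift */
	uint32_t shift;       /* clocksource shift (assumed small, e.g. 8) */
};

/* Add a shifted-nanosecond delta and carry whole seconds into xtime_sec. */
static void tk_accumulate(struct tk_sketch *tk, uint64_t delta_shifted_ns)
{
	assert(tk->shift < 32); /* keeps NSEC_PER_SEC << shift within 64 bits */
	uint64_t nsec_per_sec_shifted = NSEC_PER_SEC << tk->shift;

	tk->xtime_nsec += delta_shifted_ns;
	while (tk->xtime_nsec >= nsec_per_sec_shifted) {
		tk->xtime_nsec -= nsec_per_sec_shifted;
		tk->xtime_sec++;
	}
}

/* Full nanoseconds of the current second; the sub-ns remainder is dropped. */
static uint64_t tk_full_nsec(const struct tk_sketch *tk)
{
	return tk->xtime_nsec >> tk->shift;
}
```

[With shift = 8, accumulating (1500000000 << 8) + 5 onto a zero sub-second
part carries one second into xtime_sec and leaves 500000000 full
nanoseconds, with the 5 low bits kept as the sub-nanosecond remainder.]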

How to reproduce: on a T61p laptop with a Core 2 Duo, I boot the
kernel into busybox and wait a few minutes before typing something on the
console. At that point nothing appears on the console, and the
characters are echoed several seconds later (it could be 1, 5, or 10
seconds or more).

This happens when CONFIG_CPU_IDLE and CONFIG_NO_HZ are both set. With
either one of them disabled, the issue does not appear.

Thanks
  -- Daniel
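
[Editor's note: the symptom described above is a wakeup that arrives
seconds late on an idle system. One way to put a number on it, sketched
here as an assumption and not something used in this thread, is to
request a short sleep and measure how far past the deadline the task
actually wakes. The helper names are hypothetical; only standard POSIX
calls (clock_gettime, nanosleep) are used.]

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to a single nanosecond count. */
static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Sleep for request_ns and return how late the wakeup was, in ns. */
static int64_t sleep_overshoot_ns(int64_t request_ns)
{
	assert(request_ns >= 0);
	struct timespec req = {
		.tv_sec = request_ns / 1000000000LL,
		.tv_nsec = request_ns % 1000000000LL,
	};
	struct timespec before, after;

	clock_gettime(CLOCK_MONOTONIC, &before);
	while (nanosleep(&req, &req) == -1 && errno == EINTR)
		; /* resume with the remaining time after a signal */
	clock_gettime(CLOCK_MONOTONIC, &after);

	return ts_to_ns(&after) - ts_to_ns(&before) - request_ns;
}
```

[Calling sleep_overshoot_ns() in a loop on an otherwise idle machine and
logging the largest value would show whether timer wakeups, and not only
console echo, are delayed. On a busybox-only system the program would
presumably need to be linked statically.]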




* Re: CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure)
  2012-09-07 14:20                 ` Daniel Lezcano
@ 2012-09-07 17:22                   ` John Stultz
  2012-09-07 21:35                     ` Daniel Lezcano
  0 siblings, 1 reply; 13+ messages in thread
From: John Stultz @ 2012-09-07 17:22 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Rafael J. Wysocki, xen-devel, linaro-dev, Konrad Rzeszutek Wilk,
	linux-pm, linux-acpi, lenb, Frederic Weisbecker,
	Linux Kernel Mailing List, mingo, Peter Zijlstra, richardcochran,
	prarit, Thomas Gleixner

On 09/07/2012 07:20 AM, Daniel Lezcano wrote:
> On 09/06/2012 11:18 PM, Rafael J. Wysocki wrote:
>> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>>> On 09/06/2012 10:04 PM, Rafael J. Wysocki wrote:
>>>> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>>>>> On 09/06/2012 09:54 AM, Daniel Lezcano wrote:
>>>>> I fall into this issue because NETCONSOLE is set, disabling it allowed
>>>>> me to go further.
>>>>>
>>>>> Unfortunately I am facing to some random freeze on the system which
>>>>> seems to be related to CONFIG_NO_HZ=y and CONFIG_CPU_IDLE=y.
>>>>>
>>>>> Disabling one of them, make the freezes to disappear.
>>>>>
>>>>> Is it a known issue ?
>>>> Well, there are systems having problems with this configuration, but they
>>>> should be exceptional.  What system is that?
>>> It is a laptop T61p with a Core 2 Duo T9500. Nothing exceptional I
>>> believe. Maybe someone got the same issue ?
>> Is it a regression for you?
> Yes, I think so. The issue appears between v3.5 and v3.6-rc1.
>
> It is not easy to reproduce but after taking some time to dig, it seems
> to appear with this commit:
>
> 1e75fa8be9fb61e1af46b5b3b176347a4c958ca1 is the first bad commit
> commit 1e75fa8be9fb61e1af46b5b3b176347a4c958ca1
> Author: John Stultz <john.stultz@linaro.org>
> Date:   Fri Jul 13 01:21:53 2012 -0400
>
>      time: Condense timekeeper.xtime into xtime_sec
>
>      The timekeeper struct has a xtime_nsec, which keeps the
>      sub-nanosecond remainder.  This ends up being somewhat
>      duplicative of the timekeeper.xtime.tv_nsec value, and we
>      have to do extra work to keep them apart, copying the full
>      nsec portion out and back in over and over.
>
>      This patch simplifies some of the logic by taking the timekeeper
>      xtime value and splitting it into timekeeper.xtime_sec and
>      reuses the timekeeper.xtime_nsec for the sub-second portion
>      (stored in higher res shifted nanoseconds).
>
>      This simplifies some of the accumulation logic. And will
>      allow for more accurate timekeeping once the vsyscall code
>      is updated to use the shifted nanosecond remainder.
>
>      Signed-off-by: John Stultz <john.stultz@linaro.org>
>      Reviewed-by: Ingo Molnar <mingo@kernel.org>
>      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>      Cc: Richard Cochran <richardcochran@gmail.com>
>      Cc: Prarit Bhargava <prarit@redhat.com>
>      Link:
> http://lkml.kernel.org/r/1342156917-25092-5-git-send-email-john.stultz@linaro.org
>      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
> :040000 040000 4d6541ac1f6075d7adee1eef494b31a0cbda0934
> dc5708bc738af695f092bf822809b13a1da104b6 M	kernel
>
> How to reproduce: with a laptop T61p, with a Core 2 Duo. I boot the
> kernel in busybox and wait some minutes before writing something in the
> console. At this moment, nothing appears to the console but the
> characters are echo'ed several seconds later (could be 1, 5, or 10 secs
> or more).
>
> That happens when CONFIG_CPU_IDLE and CONFIG_NO_HZ are set. Disabling
> one of them, the issue does not appear.

Thanks for bisecting this down and the heads up!

Right off I can't see what might be causing this.  Bunch of questions:

Is this a 32 or 64 bit kernel?

By your description above, it sounds like the system is still 
functioning, but there's just a high latency for key-input. Is that right?

Are other things on the system happening slowly?

Does generating interrupts by hitting/holding down the ctrl key make the 
system respond faster?

Is there any dmesg output near when it occurs?

If you don't wait that minute after boot before typing anything, does it 
still trigger later? (or is it tied to early boot?)

On a whim, does the patch below avoid the problem?

thanks
-john

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 34e5eac..2fa0e52 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1179,6 +1179,7 @@ static void update_wall_time(void)
  	timekeeping_adjust(tk, offset);
  
  
+#if 0
  	/*
  	* Store only full nanoseconds into xtime_nsec after rounding
  	* it up and add the remainder to the error difference.
@@ -1192,6 +1193,7 @@ static void update_wall_time(void)
  	tk->xtime_nsec -= remainder;
  	tk->xtime_nsec += 1ULL << tk->shift;
  	tk->ntp_error += remainder << tk->ntp_error_shift;
+#endif
  
  	/*
  	 * Finally, make sure that after the rounding



* Re: CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure)
  2012-09-07 17:22                   ` John Stultz
@ 2012-09-07 21:35                     ` Daniel Lezcano
  2012-09-10 17:14                       ` John Stultz
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Lezcano @ 2012-09-07 21:35 UTC (permalink / raw)
  To: John Stultz
  Cc: Rafael J. Wysocki, xen-devel, linaro-dev, Konrad Rzeszutek Wilk,
	linux-pm, linux-acpi, lenb, Frederic Weisbecker,
	Linux Kernel Mailing List, mingo, Peter Zijlstra, richardcochran,
	prarit, Thomas Gleixner

On 09/07/2012 07:22 PM, John Stultz wrote:
> On 09/07/2012 07:20 AM, Daniel Lezcano wrote:
>> On 09/06/2012 11:18 PM, Rafael J. Wysocki wrote:
>>> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>>>> On 09/06/2012 10:04 PM, Rafael J. Wysocki wrote:
>>>>> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>>>>>> On 09/06/2012 09:54 AM, Daniel Lezcano wrote:
>>>>>> I fall into this issue because NETCONSOLE is set, disabling it
>>>>>> allowed
>>>>>> me to go further.
>>>>>>
>>>>>> Unfortunately I am facing to some random freeze on the system which
>>>>>> seems to be related to CONFIG_NO_HZ=y and CONFIG_CPU_IDLE=y.
>>>>>>
>>>>>> Disabling one of them, make the freezes to disappear.
>>>>>>
>>>>>> Is it a known issue ?
>>>>> Well, there are systems having problems with this configuration,
>>>>> but they
>>>>> should be exceptional.  What system is that?
>>>> It is a laptop T61p with a Core 2 Duo T9500. Nothing exceptional I
>>>> believe. Maybe someone got the same issue ?
>>> Is it a regression for you?
>> Yes, I think so. The issue appears between v3.5 and v3.6-rc1.
>>
>> It is not easy to reproduce but after taking some time to dig, it seems
>> to appear with this commit:
>>
>> 1e75fa8be9fb61e1af46b5b3b176347a4c958ca1 is the first bad commit
>> commit 1e75fa8be9fb61e1af46b5b3b176347a4c958ca1
>> Author: John Stultz <john.stultz@linaro.org>
>> Date:   Fri Jul 13 01:21:53 2012 -0400
>>
>>      time: Condense timekeeper.xtime into xtime_sec
>>
>>      The timekeeper struct has a xtime_nsec, which keeps the
>>      sub-nanosecond remainder.  This ends up being somewhat
>>      duplicative of the timekeeper.xtime.tv_nsec value, and we
>>      have to do extra work to keep them apart, copying the full
>>      nsec portion out and back in over and over.
>>
>>      This patch simplifies some of the logic by taking the timekeeper
>>      xtime value and splitting it into timekeeper.xtime_sec and
>>      reuses the timekeeper.xtime_nsec for the sub-second portion
>>      (stored in higher res shifted nanoseconds).
>>
>>      This simplifies some of the accumulation logic. And will
>>      allow for more accurate timekeeping once the vsyscall code
>>      is updated to use the shifted nanosecond remainder.
>>
>>      Signed-off-by: John Stultz <john.stultz@linaro.org>
>>      Reviewed-by: Ingo Molnar <mingo@kernel.org>
>>      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>>      Cc: Richard Cochran <richardcochran@gmail.com>
>>      Cc: Prarit Bhargava <prarit@redhat.com>
>>      Link:
>> http://lkml.kernel.org/r/1342156917-25092-5-git-send-email-john.stultz@linaro.org
>>
>>      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>
>> :040000 040000 4d6541ac1f6075d7adee1eef494b31a0cbda0934
>> dc5708bc738af695f092bf822809b13a1da104b6 M    kernel
>>
>> How to reproduce: with a laptop T61p, with a Core 2 Duo. I boot the
>> kernel in busybox and wait some minutes before writing something in the
>> console. At this moment, nothing appears to the console but the
>> characters are echo'ed several seconds later (could be 1, 5, or 10 secs
>> or more).
>>
>> That happens when CONFIG_CPU_IDLE and CONFIG_NO_HZ are set. Disabling
>> one of them, the issue does not appear.
> 
> Thanks for bisecting this down and the heads up!
> 
> Right off I can't see what might be causing this.  Bunch of questions:
> 
> Is this a 32 or 64 bit kernel?

It is a 32 bit kernel.

> By your description above, it sounds like the system is still
> functioning, but there's just a high latency for key-input. Is that right?

Yes, that's correct, but that is not all. During the freeze, I can't ping
the host. Once the output is echoed, ping works again.

But if I ping the host continuously, it does not freeze and the console
echoes without any problem.

> Are other things on the system happening slowly?

I have a very minimal system but at the first glance when it is not frozen

> Does generating interrupts by hitting/holding down the ctrl key make the
> system respond faster?

no.

> Is there any dmesg output near when it occurs?

no.

> If you don't wait that minute after boot before typing anything, does it
> still trigger later? (or is it tied to early boot?)

It depends; it can happen immediately or later. It is more or less
random.

> On a whim, does the patch below avoid the problem?

Nope, same issue :/

Thanks
  -- Daniel

> 
> thanks
> -john
> 
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 34e5eac..2fa0e52 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -1179,6 +1179,7 @@ static void update_wall_time(void)
>      timekeeping_adjust(tk, offset);
>  
>  
> +#if 0
>      /*
>      * Store only full nanoseconds into xtime_nsec after rounding
>      * it up and add the remainder to the error difference.
> @@ -1192,6 +1193,7 @@ static void update_wall_time(void)
>      tk->xtime_nsec -= remainder;
>      tk->xtime_nsec += 1ULL << tk->shift;
>      tk->ntp_error += remainder << tk->ntp_error_shift;
> +#endif
>  
>      /*
>       * Finally, make sure that after the rounding
> 





* Re: CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure)
  2012-09-07 21:35                     ` Daniel Lezcano
@ 2012-09-10 17:14                       ` John Stultz
  2012-09-10 19:45                         ` Daniel Lezcano
  0 siblings, 1 reply; 13+ messages in thread
From: John Stultz @ 2012-09-10 17:14 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Rafael J. Wysocki, xen-devel, linaro-dev, Konrad Rzeszutek Wilk,
	linux-pm, linux-acpi, lenb, Frederic Weisbecker,
	Linux Kernel Mailing List, mingo, Peter Zijlstra, richardcochran,
	prarit, Thomas Gleixner

On 09/07/2012 02:35 PM, Daniel Lezcano wrote:
> On 09/07/2012 07:22 PM, John Stultz wrote:
>> On 09/07/2012 07:20 AM, Daniel Lezcano wrote:
>>> On 09/06/2012 11:18 PM, Rafael J. Wysocki wrote:
>>>> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>>>>> On 09/06/2012 10:04 PM, Rafael J. Wysocki wrote:
>>>>>> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>>>>>>> On 09/06/2012 09:54 AM, Daniel Lezcano wrote:
>>>>>>> I fall into this issue because NETCONSOLE is set, disabling it
>>>>>>> allowed
>>>>>>> me to go further.
>>>>>>>
>>>>>>> Unfortunately I am facing to some random freeze on the system which
>>>>>>> seems to be related to CONFIG_NO_HZ=y and CONFIG_CPU_IDLE=y.
>>>>>>>
>>>>>>> Disabling one of them, make the freezes to disappear.
>>>>>>>
>>>>>>> Is it a known issue ?
>>>>>> Well, there are systems having problems with this configuration,
>>>>>> but they
>>>>>> should be exceptional.  What system is that?
>>>>> It is a laptop T61p with a Core 2 Duo T9500. Nothing exceptional I
>>>>> believe. Maybe someone got the same issue ?
>>>> Is it a regression for you?
>>> Yes, I think so. The issue appears between v3.5 and v3.6-rc1.
>>>
>>> It is not easy to reproduce but after taking some time to dig, it seems
>>> to appear with this commit:
>>>
>>> 1e75fa8be9fb61e1af46b5b3b176347a4c958ca1 is the first bad commit
>>> commit 1e75fa8be9fb61e1af46b5b3b176347a4c958ca1
>>> Author: John Stultz <john.stultz@linaro.org>
>>> Date:   Fri Jul 13 01:21:53 2012 -0400
>>>
>>>       time: Condense timekeeper.xtime into xtime_sec
>>>
>>>       The timekeeper struct has a xtime_nsec, which keeps the
>>>       sub-nanosecond remainder.  This ends up being somewhat
>>>       duplicative of the timekeeper.xtime.tv_nsec value, and we
>>>       have to do extra work to keep them apart, copying the full
>>>       nsec portion out and back in over and over.
>>>
>>>       This patch simplifies some of the logic by taking the timekeeper
>>>       xtime value and splitting it into timekeeper.xtime_sec and
>>>       reuses the timekeeper.xtime_nsec for the sub-second portion
>>>       (stored in higher res shifted nanoseconds).
>>>
>>>       This simplifies some of the accumulation logic. And will
>>>       allow for more accurate timekeeping once the vsyscall code
>>>       is updated to use the shifted nanosecond remainder.
>>>
>>>       Signed-off-by: John Stultz <john.stultz@linaro.org>
>>>       Reviewed-by: Ingo Molnar <mingo@kernel.org>
>>>       Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>>>       Cc: Richard Cochran <richardcochran@gmail.com>
>>>       Cc: Prarit Bhargava <prarit@redhat.com>
>>>       Link:
>>> http://lkml.kernel.org/r/1342156917-25092-5-git-send-email-john.stultz@linaro.org
>>>
>>>       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>
>>> :040000 040000 4d6541ac1f6075d7adee1eef494b31a0cbda0934
>>> dc5708bc738af695f092bf822809b13a1da104b6 M    kernel
>>>
>>> How to reproduce: with a laptop T61p, with a Core 2 Duo. I boot the
>>> kernel in busybox and wait some minutes before writing something in the
>>> console. At this moment, nothing appears to the console but the
>>> characters are echo'ed several seconds later (could be 1, 5, or 10 secs
>>> or more).
>>>
>>> That happens when CONFIG_CPU_IDLE and CONFIG_NO_HZ are set. Disabling
>>> one of them, the issue does not appear.
>> Thanks for bisecting this down and the heads up!
>>
>> Right off I can't see what might be causing this.  Bunch of questions:
>>
>> Is this a 32 or 64 bit kernel?
> It is a 32 bit kernel.

Thanks for your answers! Has this been seen on 3.6-rc4+ kernels?
There were a few casting fixes that landed in 3.6-rc4 that would affect
32bit systems.

In the meantime, I'll try to reproduce on my T61. If you could send me 
your .config, I'd appreciate it.

thanks!
-john



* Re: CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure)
  2012-09-10 17:14                       ` John Stultz
@ 2012-09-10 19:45                         ` Daniel Lezcano
  2012-09-11  0:18                           ` John Stultz
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Lezcano @ 2012-09-10 19:45 UTC (permalink / raw)
  To: John Stultz
  Cc: prarit, xen-devel, linaro-dev, Peter Zijlstra, linux-pm,
	Frederic Weisbecker, richardcochran, Konrad Rzeszutek Wilk,
	Linux Kernel Mailing List, Rafael J. Wysocki, linux-acpi,
	Thomas Gleixner, mingo, lenb

On 09/10/2012 07:14 PM, John Stultz wrote:
> On 09/07/2012 02:35 PM, Daniel Lezcano wrote:
>> On 09/07/2012 07:22 PM, John Stultz wrote:
>>> On 09/07/2012 07:20 AM, Daniel Lezcano wrote:
>>>> On 09/06/2012 11:18 PM, Rafael J. Wysocki wrote:
>>>>> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>>>>>> On 09/06/2012 10:04 PM, Rafael J. Wysocki wrote:
>>>>>>> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>>>>>>>> On 09/06/2012 09:54 AM, Daniel Lezcano wrote:
>>>>>>>> I ran into this issue because NETCONSOLE is set; disabling it
>>>>>>>> allowed me to go further.
>>>>>>>>
>>>>>>>> Unfortunately I am seeing random freezes on the system, which
>>>>>>>> seem to be related to CONFIG_NO_HZ=y and CONFIG_CPU_IDLE=y.
>>>>>>>>
>>>>>>>> Disabling either one makes the freezes disappear.
>>>>>>>>
>>>>>>>> Is it a known issue?
>>>>>>> Well, there are systems having problems with this configuration,
>>>>>>> but they
>>>>>>> should be exceptional. What system is that?
>>>>>> It is a laptop T61p with a Core 2 Duo T9500. Nothing exceptional I
>>>>>> believe. Maybe someone got the same issue ?
>>>>> Is it a regression for you?
>>>> Yes, I think so. The issue appears between v3.5 and v3.6-rc1.
>>>>
>>>> It is not easy to reproduce but after taking some time to dig, it
>>>> seems
>>>> to appear with this commit:
>>>>
>>>> 1e75fa8be9fb61e1af46b5b3b176347a4c958ca1 is the first bad commit
>>>> commit 1e75fa8be9fb61e1af46b5b3b176347a4c958ca1
>>>> Author: John Stultz <john.stultz@linaro.org>
>>>> Date: Fri Jul 13 01:21:53 2012 -0400
>>>>
>>>> time: Condense timekeeper.xtime into xtime_sec
>>>>
>>>> The timekeeper struct has a xtime_nsec, which keeps the
>>>> sub-nanosecond remainder. This ends up being somewhat
>>>> duplicative of the timekeeper.xtime.tv_nsec value, and we
>>>> have to do extra work to keep them apart, copying the full
>>>> nsec portion out and back in over and over.
>>>>
>>>> This patch simplifies some of the logic by taking the timekeeper
>>>> xtime value and splitting it into timekeeper.xtime_sec and
>>>> reuses the timekeeper.xtime_nsec for the sub-second portion
>>>> (stored in higher res shifted nanoseconds).
>>>>
>>>> This simplifies some of the accumulation logic. And will
>>>> allow for more accurate timekeeping once the vsyscall code
>>>> is updated to use the shifted nanosecond remainder.
>>>>
>>>> Signed-off-by: John Stultz <john.stultz@linaro.org>
>>>> Reviewed-by: Ingo Molnar <mingo@kernel.org>
>>>> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>>>> Cc: Richard Cochran <richardcochran@gmail.com>
>>>> Cc: Prarit Bhargava <prarit@redhat.com>
>>>> Link:
>>>> http://lkml.kernel.org/r/1342156917-25092-5-git-send-email-john.stultz@linaro.org
>>>>
>>>>
>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>>
>>>> :040000 040000 4d6541ac1f6075d7adee1eef494b31a0cbda0934
>>>> dc5708bc738af695f092bf822809b13a1da104b6 M kernel
>>>>
>>>> How to reproduce: with a laptop T61p, with a Core 2 Duo. I boot the
>>>> kernel in busybox and wait some minutes before writing something in
>>>> the
>>>> console. At this moment, nothing appears to the console but the
>>>> characters are echo'ed several seconds later (could be 1, 5, or 10
>>>> secs
>>>> or more).
>>>>
>>>> That happens when CONFIG_CPU_IDLE and CONFIG_NO_HZ are set. Disabling
>>>> one of them, the issue does not appear.
>>> Thanks for bisecting this down and the heads up!
>>>
>>> Right off I can't see what might be causing this. Bunch of questions:
>>>
>>> Is this a 32 or 64 bit kernel?
>> It is a 32 bit kernel.
>
> Thanks for your answers! Has this been seen on 3.6-rc4+ kernels?
> There were a few casting fixes that landed in 3.6-rc4 that would
> affect 32bit systems.

Ok, I have to check that. Unfortunately not before Wednesday.

>
> In the meantime, I'll try to reproduce on my T61. If you could send me
> your .config, I'd appreciate it.

http://pastebin.com/qSxqfdDK

The header of the config file says v3.5-rc7 because it comes from the
git-bisect run. Using this config file with the latest kernel should
reproduce the problem.

Let me know if you were able to reproduce the problem.

Thanks
-- Daniel





* Re: CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure)
  2012-09-10 19:45                         ` Daniel Lezcano
@ 2012-09-11  0:18                           ` John Stultz
  2012-09-11  6:58                             ` Daniel Lezcano
  2012-09-11 21:27                             ` Daniel Lezcano
  0 siblings, 2 replies; 13+ messages in thread
From: John Stultz @ 2012-09-11  0:18 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: prarit, xen-devel, linaro-dev, Peter Zijlstra, linux-pm,
	Frederic Weisbecker, richardcochran, Konrad Rzeszutek Wilk,
	Linux Kernel Mailing List, Rafael J. Wysocki, linux-acpi,
	Thomas Gleixner, mingo, lenb

On 09/10/2012 12:45 PM, Daniel Lezcano wrote:
> On 09/10/2012 07:14 PM, John Stultz wrote:
>> In the meantime, I'll try to reproduce on my T61. If you could send me
>> your .config, I'd appreciate it.
> http://pastebin.com/qSxqfdDK
>
> The header of the config file says v3.5-rc7 because it comes from the
> git-bisect run. Using this config file with the latest kernel should
> reproduce the problem.
>
> Let me know if you were able to reproduce the problem.
Great! With this I was able to quickly reproduce the problem and I think 
I have a fix.

Would you mind testing the following patch? It seems to resolve the 
issue, but I've not yet run it through my test suite to make sure it 
didn't break anything else.

If both your and my testing comes back ok, I'll submit it to Thomas.

thanks
-john

 From f10a285a5b532a14d3330f6e60e4d7bd5627932a Mon Sep 17 00:00:00 2001
From: John Stultz <john.stultz@linaro.org>
Date: Mon, 10 Sep 2012 20:00:15 -0400
Subject: [PATCH] time: Fix timekeeping_get_ns overflow on 32bit systems

Daniel Lezcano reported seeing multi-second stalls from
keyboard input on his T61 laptop when NOHZ and CPU_IDLE
were enabled on a 32bit kernel.

He bisected the problem down to
1e75fa8be9fb61e1af46b5b3b176347a4c958ca1 (time: Condense
timekeeper.xtime into xtime_sec).

After reproducing this issue, I narrowed the problem down
to the fact that timekeeping_get_ns() returns a 64bit
nsec value that hasn't been accumulated. In some cases
this value was being then stored in timespec.tv_nsec
(which is a long).

On 32bit systems, with idle times larger than 4 seconds
(or less, depending on the value of xtime_nsec), the
returned nsec value would overflow 32bits. This kept
time from increasing, causing timers to not expire.

The fix is to make sure we don't directly store the
result of timekeeping_get_ns() into a tv_nsec field,
instead using a 64bit nsec value which can then be
added into the timespec via timespec_add_ns().

With this patch I cannot reproduce the issue.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Reported-and-bisected-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
  kernel/time/timekeeping.c |   19 ++++++++++++-------
  1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 34e5eac..d3b91e7 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -303,10 +303,11 @@ void getnstimeofday(struct timespec *ts)
  		seq = read_seqbegin(&tk->lock);
  
  		ts->tv_sec = tk->xtime_sec;
-		ts->tv_nsec = timekeeping_get_ns(tk);
+		nsecs = timekeeping_get_ns(tk);
  
  	} while (read_seqretry(&tk->lock, seq));
  
+	ts->tv_nsec = 0;
  	timespec_add_ns(ts, nsecs);
  }
  EXPORT_SYMBOL(getnstimeofday);
@@ -345,6 +346,7 @@ void ktime_get_ts(struct timespec *ts)
  {
  	struct timekeeper *tk = &timekeeper;
  	struct timespec tomono;
+	s64 nsec;
  	unsigned int seq;
  
  	WARN_ON(timekeeping_suspended);
@@ -352,13 +354,14 @@ void ktime_get_ts(struct timespec *ts)
  	do {
  		seq = read_seqbegin(&tk->lock);
  		ts->tv_sec = tk->xtime_sec;
-		ts->tv_nsec = timekeeping_get_ns(tk);
+		nsec = timekeeping_get_ns(tk);
  		tomono = tk->wall_to_monotonic;
  
  	} while (read_seqretry(&tk->lock, seq));
  
-	set_normalized_timespec(ts, ts->tv_sec + tomono.tv_sec,
-				ts->tv_nsec + tomono.tv_nsec);
+	ts->tv_sec += tomono.tv_sec;
+	ts->tv_nsec = 0;
+	timespec_add_ns(ts, nsec + tomono.tv_nsec);
  }
  EXPORT_SYMBOL_GPL(ktime_get_ts);
  
@@ -1244,6 +1247,7 @@ void get_monotonic_boottime(struct timespec *ts)
  {
  	struct timekeeper *tk = &timekeeper;
  	struct timespec tomono, sleep;
+	s64 nsec;
  	unsigned int seq;
  
  	WARN_ON(timekeeping_suspended);
@@ -1251,14 +1255,15 @@ void get_monotonic_boottime(struct timespec *ts)
  	do {
  		seq = read_seqbegin(&tk->lock);
  		ts->tv_sec = tk->xtime_sec;
-		ts->tv_nsec = timekeeping_get_ns(tk);
+		nsec = timekeeping_get_ns(tk);
  		tomono = tk->wall_to_monotonic;
  		sleep = tk->total_sleep_time;
  
  	} while (read_seqretry(&tk->lock, seq));
  
-	set_normalized_timespec(ts, ts->tv_sec + tomono.tv_sec + sleep.tv_sec,
-			ts->tv_nsec + tomono.tv_nsec + sleep.tv_nsec);
+	ts->tv_sec += tomono.tv_sec + sleep.tv_sec;
+	ts->tv_nsec = 0;
+	timespec_add_ns(ts, nsec + tomono.tv_nsec + sleep.tv_nsec);
  }
  EXPORT_SYMBOL_GPL(get_monotonic_boottime);
  
-- 
1.7.9.5
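
For illustration only, here is a minimal user-space sketch of the failure
mode the changelog above describes. It is not kernel code: struct ts32,
wrap_to_s32(), store_direct() and store_via_add_ns() are hypothetical names,
the 32-bit long is modelled with int32_t, and the last helper mirrors only
the idea behind timespec_add_ns(), assuming a non-negative count.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for struct timespec on a 32-bit kernel,
 * where both fields are 32-bit signed longs. */
struct ts32 {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/* Keep only the low 32 bits of a 64-bit value, reinterpreted as signed.
 * Written out by hand so the result does not rely on
 * implementation-defined narrowing conversions. */
int32_t wrap_to_s32(int64_t v)
{
	int64_t low = (int64_t)(uint32_t)v;	/* v modulo 2^32 */

	if (low > INT32_MAX)
		low -= 4294967296LL;
	return (int32_t)low;
}

/* Old pattern: the 64-bit nanosecond count goes straight into tv_nsec. */
struct ts32 store_direct(int32_t sec, int64_t nsec)
{
	struct ts32 ts = { sec, wrap_to_s32(nsec) };

	return ts;
}

/* Patched pattern: zero tv_nsec, then fold the 64-bit count into both
 * fields. Assumes nsec >= 0; not the kernel implementation. */
struct ts32 store_via_add_ns(int32_t sec, int64_t nsec)
{
	struct ts32 ts = { sec, 0 };

	ts.tv_sec += (int32_t)(nsec / NSEC_PER_SEC);
	ts.tv_nsec = (int32_t)(nsec % NSEC_PER_SEC);
	return ts;
}
```

With sec = 100 and an unaccumulated count of 5000000000 ns (five seconds of
idle), store_direct() ends up at 100 s + 705032704 ns, so more than four
seconds are lost, while store_via_add_ns() ends up at 105 s + 0 ns.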



* Re: CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure)
  2012-09-11  0:18                           ` John Stultz
@ 2012-09-11  6:58                             ` Daniel Lezcano
  2012-09-11 17:26                               ` John Stultz
  2012-09-11 21:27                             ` Daniel Lezcano
  1 sibling, 1 reply; 13+ messages in thread
From: Daniel Lezcano @ 2012-09-11  6:58 UTC (permalink / raw)
  To: John Stultz
  Cc: prarit, xen-devel, linaro-dev, Peter Zijlstra, linux-pm,
	Frederic Weisbecker, richardcochran, Konrad Rzeszutek Wilk,
	Linux Kernel Mailing List, Rafael J. Wysocki, linux-acpi,
	Thomas Gleixner, mingo, lenb

On 09/11/2012 02:18 AM, John Stultz wrote:
> On 09/10/2012 12:45 PM, Daniel Lezcano wrote:
>> On 09/10/2012 07:14 PM, John Stultz wrote:
>>> In the meantime, I'll try to reproduce on my T61. If you could send me
>>> your .config, I'd appreciate it.
>> http://pastebin.com/qSxqfdDK
>>
>> The header of the config file says v3.5-rc7 because it comes from the
>> git-bisect run. Using this config file with the latest kernel should
>> reproduce the problem.
>>
>> Let me know if you were able to reproduce the problem.
> Great! With this I was able to quickly reproduce the problem and I think
> I have a fix.

Cool !

> Would you mind testing the following patch? It seems to resolve the
> issue, but I've not yet run it through my test suite to make sure it
> didn't break anything else.

No problem, I will try it this evening.

Does this problem affect all 32bit architectures?

Thanks !

  -- Daniel

> If both your and my testing comes back ok, I'll submit it to Thomas.
> 
> thanks
> -john
> 
> From f10a285a5b532a14d3330f6e60e4d7bd5627932a Mon Sep 17 00:00:00 2001
> From: John Stultz <john.stultz@linaro.org>
> Date: Mon, 10 Sep 2012 20:00:15 -0400
> Subject: [PATCH] time: Fix timekeeping_get_ns overflow on 32bit systems
> 
> Daniel Lezcano reported seeing multi-second stalls from
> keyboard input on his T61 laptop when NOHZ and CPU_IDLE
> were enabled on a 32bit kernel.
> 
> He bisected the problem down to
> 1e75fa8be9fb61e1af46b5b3b176347a4c958ca1 (time: Condense
> timekeeper.xtime into xtime_sec).
> 
> After reproducing this issue, I narrowed the problem down
> to the fact that timekeeping_get_ns() returns a 64bit
> nsec value that hasn't been accumulated. In some cases
> this value was being then stored in timespec.tv_nsec
> (which is a long).
> 
> On 32bit systems, with idle times larger than 4 seconds
> (or less, depending on the value of xtime_nsec), the
> returned nsec value would overflow 32bits. This kept
> time from increasing, causing timers to not expire.
> 
> The fix is to make sure we don't directly store the
> result of timekeeping_get_ns() into a tv_nsec field,
> instead using a 64bit nsec value which can then be
> added into the timespec via timespec_add_ns().
> 
> With this patch I cannot reproduce the issue.
> 
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Reported-and-bisected-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  kernel/time/timekeeping.c |   19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 34e5eac..d3b91e7 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -303,10 +303,11 @@ void getnstimeofday(struct timespec *ts)
>          seq = read_seqbegin(&tk->lock);
>  
>          ts->tv_sec = tk->xtime_sec;
> -        ts->tv_nsec = timekeeping_get_ns(tk);
> +        nsecs = timekeeping_get_ns(tk);
>  
>      } while (read_seqretry(&tk->lock, seq));
>  
> +    ts->tv_nsec = 0;
>      timespec_add_ns(ts, nsecs);
>  }
>  EXPORT_SYMBOL(getnstimeofday);
> @@ -345,6 +346,7 @@ void ktime_get_ts(struct timespec *ts)
>  {
>      struct timekeeper *tk = &timekeeper;
>      struct timespec tomono;
> +    s64 nsec;
>      unsigned int seq;
>  
>      WARN_ON(timekeeping_suspended);
> @@ -352,13 +354,14 @@ void ktime_get_ts(struct timespec *ts)
>      do {
>          seq = read_seqbegin(&tk->lock);
>          ts->tv_sec = tk->xtime_sec;
> -        ts->tv_nsec = timekeeping_get_ns(tk);
> +        nsec = timekeeping_get_ns(tk);
>          tomono = tk->wall_to_monotonic;
>  
>      } while (read_seqretry(&tk->lock, seq));
>  
> -    set_normalized_timespec(ts, ts->tv_sec + tomono.tv_sec,
> -                ts->tv_nsec + tomono.tv_nsec);
> +    ts->tv_sec += tomono.tv_sec;
> +    ts->tv_nsec = 0;
> +    timespec_add_ns(ts, nsec + tomono.tv_nsec);
>  }
>  EXPORT_SYMBOL_GPL(ktime_get_ts);
>  
> @@ -1244,6 +1247,7 @@ void get_monotonic_boottime(struct timespec *ts)
>  {
>      struct timekeeper *tk = &timekeeper;
>      struct timespec tomono, sleep;
> +    s64 nsec;
>      unsigned int seq;
>  
>      WARN_ON(timekeeping_suspended);
> @@ -1251,14 +1255,15 @@ void get_monotonic_boottime(struct timespec *ts)
>      do {
>          seq = read_seqbegin(&tk->lock);
>          ts->tv_sec = tk->xtime_sec;
> -        ts->tv_nsec = timekeeping_get_ns(tk);
> +        nsec = timekeeping_get_ns(tk);
>          tomono = tk->wall_to_monotonic;
>          sleep = tk->total_sleep_time;
>  
>      } while (read_seqretry(&tk->lock, seq));
>  
> -    set_normalized_timespec(ts, ts->tv_sec + tomono.tv_sec + sleep.tv_sec,
> -            ts->tv_nsec + tomono.tv_nsec + sleep.tv_nsec);
> +    ts->tv_sec += tomono.tv_sec + sleep.tv_sec;
> +    ts->tv_nsec = 0;
> +    timespec_add_ns(ts, nsec + tomono.tv_nsec + sleep.tv_nsec);
>  }
>  EXPORT_SYMBOL_GPL(get_monotonic_boottime);
>  





* Re: CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure)
  2012-09-11  6:58                             ` Daniel Lezcano
@ 2012-09-11 17:26                               ` John Stultz
  0 siblings, 0 replies; 13+ messages in thread
From: John Stultz @ 2012-09-11 17:26 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: prarit, xen-devel, linaro-dev, Peter Zijlstra, linux-pm,
	Frederic Weisbecker, richardcochran, Konrad Rzeszutek Wilk,
	Linux Kernel Mailing List, Rafael J. Wysocki, linux-acpi,
	Thomas Gleixner, mingo, lenb

On 09/10/2012 11:58 PM, Daniel Lezcano wrote:
>> Would you mind testing the following patch? It seems to resolve the
>> issue, but I've not yet run it through my test suite to make sure it
>> didn't break anything else.
> No problem, I will try it this evening.
>
> Does this problem affect all 32bit architectures?
I believe so. It didn't appear in my 32bit testing w/ kvm, but I suspect
that is because my distro userland sets lots of timers, so we never hit
the multi-second idle times that could overflow 32bit nanoseconds. It may
also be some other kvm quirk.
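
As a rough back-of-the-envelope sketch of why only multi-second idle periods
matter, the two helpers below compute how much further idle time a running
nanosecond count can absorb before it stops fitting in 32 bits. The function
names are made up for this example, and the count is treated as plain
nanoseconds, ignoring the shifted representation the timekeeper uses.

```c
#include <stdint.h>

/* Nanoseconds of further idle time before a running nanosecond count,
 * currently at start_nsec, no longer fits in a signed 32-bit long.
 * start_nsec is assumed to be in [0, INT32_MAX]. */
int64_t ns_until_s32_overflow(int32_t start_nsec)
{
	return (int64_t)INT32_MAX - start_nsec + 1;
}

/* Same question for the point where the count wraps past 32 bits
 * entirely (2^32 ns). */
int64_t ns_until_u32_wrap(int32_t start_nsec)
{
	return 4294967296LL - start_nsec;
}
```

Starting from zero, the signed limit is reached after 2147483648 ns (about
2.1 seconds) and the full 32-bit wrap after 4294967296 ns (about 4.3
seconds). Starting from 999999999 ns, the wrap comes after 3294967297 ns,
roughly one second earlier. A userland that arms timers more often than that
would never idle long enough to get there.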

Anyway, let me know if your testing goes well.

Thanks so much again for noticing and bisecting this down.
-john



* Re: CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure)
  2012-09-11  0:18                           ` John Stultz
  2012-09-11  6:58                             ` Daniel Lezcano
@ 2012-09-11 21:27                             ` Daniel Lezcano
  1 sibling, 0 replies; 13+ messages in thread
From: Daniel Lezcano @ 2012-09-11 21:27 UTC (permalink / raw)
  To: John Stultz
  Cc: prarit, xen-devel, linaro-dev, Peter Zijlstra, linux-pm,
	Frederic Weisbecker, richardcochran, Konrad Rzeszutek Wilk,
	Linux Kernel Mailing List, Rafael J. Wysocki, linux-acpi,
	Thomas Gleixner, mingo, lenb

On 09/11/2012 02:18 AM, John Stultz wrote:
> On 09/10/2012 12:45 PM, Daniel Lezcano wrote:
>> On 09/10/2012 07:14 PM, John Stultz wrote:
>>> In the meantime, I'll try to reproduce on my T61. If you could send me
>>> your .config, I'd appreciate it.
>> http://pastebin.com/qSxqfdDK
>>
>> The header of the config file says v3.5-rc7 because it comes from the
>> git-bisect run. Using this config file with the latest kernel should
>> reproduce the problem.
>>
>> Let me know if you were able to reproduce the problem.
> Great! With this I was able to quickly reproduce the problem and I think
> I have a fix.
> 
> Would you mind testing the following patch? It seems to resolve the
> issue, but I've not yet run it through my test suite to make sure it
> didn't break anything else.
> 
> If both your and my testing comes back ok, I'll submit it to Thomas.

Sounds like this solves the problem. Without enough background on timers
in general, I don't have an opinion about the patch itself but I can
confirm the issue is no longer occurring.

Thanks
  -- Daniel




end of thread, other threads:[~2012-09-11 21:28 UTC | newest]

Thread overview: 13+ messages
     [not found] <1343164349-28550-1-git-send-email-daniel.lezcano@linaro.org>
     [not found] ` <50410811.70500@linaro.org>
     [not found]   ` <201209010754.58999.rjw@sisk.pl>
     [not found]     ` <201209051541.58909.rjw@sisk.pl>
     [not found]       ` <50485698.1070905@linaro.org>
2012-09-06  9:22         ` CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure) Daniel Lezcano
2012-09-06 20:04           ` Rafael J. Wysocki
2012-09-06 20:35             ` Daniel Lezcano
2012-09-06 21:18               ` Rafael J. Wysocki
2012-09-07 14:20                 ` Daniel Lezcano
2012-09-07 17:22                   ` John Stultz
2012-09-07 21:35                     ` Daniel Lezcano
2012-09-10 17:14                       ` John Stultz
2012-09-10 19:45                         ` Daniel Lezcano
2012-09-11  0:18                           ` John Stultz
2012-09-11  6:58                             ` Daniel Lezcano
2012-09-11 17:26                               ` John Stultz
2012-09-11 21:27                             ` Daniel Lezcano
