All of lore.kernel.org
 help / color / mirror / Atom feed
* [Issue] timer callback registered with mod_timer is getting called beforetime
       [not found] <CGME20210924065310epcms5p69dd47a510faaa6bf68c243e02f2d0186@epcms5p6>
@ 2021-09-24  6:53 ` Maninder Singh
  2021-09-24 10:38   ` Frederic Weisbecker
                     ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Maninder Singh @ 2021-09-24  6:53 UTC (permalink / raw)
  To: fweisbec, tglx, mingo
  Cc: linux-kernel, Vaneet Narang, AMIT SAHRAWAT, Chung-Ki Woo

Hi,


we are facing issue of early invocation of timer callback(kernel versio 5.4), Is it known issue for timer callback?

As we checekd code if CPU responsible for updating jiffies value(lets say CPU0) has interrupts disabled for 60 ms
and in mean time other CPU(CPU1) registers new timer for (jiffies + x) value. its callback will be called before time.

As it reads old jiffies value and register callback for x jiffies afterwards, but instead it is called
at x - y jiffies. (y is lapsed period which CPU 1 does not know from jiffies)


CPU 0                             CPU 1

// last jiffies updated
local_irq_disable();              do_some_work()
do_work()                         ....
...                               ....
...                               ....
... // 60 ms                      mod_timer(&timer, jiffies + x); // read old jiffies value
local_irq_enable()                ...
// new jiffies updated with       // timer interrupt reads updated jiffies and calls callback function where x was not actually passed on HW clock.
// jiffies + 16


we tried with TC (with 500 ms irq disable just to check behaviour, in actual issue it was 60 ms IRQ disable)

// IRQs disabled on same CPU0 responsible for jiffies updation for 500 mili seconds

[23.5598] ##### disable IRQs
[24.0600] ##### registered timer 25025334960 5887  //registered timer at HW clock 25.025 seconds for 600 ms at 5887 jiffies on CPU1

[24.0604] ##### ticks change abruptly 5887 124   //IQRS enabled, CPU0 increments 124 jiffies colllectively because of irq disable long time

[24.1595] ##### timer called 25124783406 6037   // as per jiffies callback trigger at 6037 jiffies i.e. after 150 jiffies(600ms)
                                               // but HW clock is passed only 100 milliseconds.



Is it known behaviour for timers?
because only 1 CPU is assigned to update jiffies work to call do_timer utill unless it goes to idle state and pass ownership to other CPU.

we tried by making all CPU to handle code for jiffies updation (it will add performance hit)
but then no issue of abrupt jiffies change occured on system.


Thanks and Regards
----------------------
Maninder Singh

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Issue] timer callback registered with mod_timer is getting called beforetime
  2021-09-24  6:53 ` [Issue] timer callback registered with mod_timer is getting called beforetime Maninder Singh
@ 2021-09-24 10:38   ` Frederic Weisbecker
  2021-09-24 12:08   ` Thomas Gleixner
       [not found]   ` <CGME20210924065310epcms5p69dd47a510faaa6bf68c243e02f2d0186@epcms5p2>
  2 siblings, 0 replies; 4+ messages in thread
From: Frederic Weisbecker @ 2021-09-24 10:38 UTC (permalink / raw)
  To: Maninder Singh
  Cc: fweisbec, tglx, mingo, linux-kernel, Vaneet Narang,
	AMIT SAHRAWAT, Chung-Ki Woo

Hi Maninder,

On Fri, Sep 24, 2021 at 12:23:10PM +0530, Maninder Singh wrote:
> Hi,
> 
> 
> we are facing issue of early invocation of timer callback(kernel versio 5.4), Is it known issue for timer callback?
> 
> As we checekd code if CPU responsible for updating jiffies value(lets say CPU0) has interrupts disabled for 60 ms
> and in mean time other CPU(CPU1) registers new timer for (jiffies + x) value. its callback will be called before time.
> 
> As it reads old jiffies value and register callback for x jiffies afterwards, but instead it is called
> at x - y jiffies. (y is lapsed period which CPU 1 does not know from jiffies)
> 
> 
> CPU 0                             CPU 1
> 
> // last jiffies updated
> local_irq_disable();              do_some_work()
> do_work()                         ....
> ...                               ....
> ...                               ....
> ... // 60 ms                      mod_timer(&timer, jiffies + x); // read old jiffies value
> local_irq_enable()                ...
> // new jiffies updated with       // timer interrupt reads updated jiffies and calls callback function where x was not actually passed on HW clock.
> // jiffies + 16
> 
> 
> we tried with TC (with 500 ms irq disable just to check behaviour, in actual issue it was 60 ms IRQ disable)
> 
> // IRQs disabled on same CPU0 responsible for jiffies updation for 500 mili seconds
> 
> [23.5598] ##### disable IRQs
> [24.0600] ##### registered timer 25025334960 5887  //registered timer at HW clock 25.025 seconds for 600 ms at 5887 jiffies on CPU1
> 
> [24.0604] ##### ticks change abruptly 5887 124   //IQRS enabled, CPU0 increments 124 jiffies colllectively because of irq disable long time
> 
> [24.1595] ##### timer called 25124783406 6037   // as per jiffies callback trigger at 6037 jiffies i.e. after 150 jiffies(600ms)
>                                                // but HW clock is passed only 100 milliseconds.
> 
> 
> 
> Is it known behaviour for timers?
> because only 1 CPU is assigned to update jiffies work to call do_timer utill unless it goes to idle state and pass ownership to other CPU.
> 
> we tried by making all CPU to handle code for jiffies updation (it will add performance hit)
> but then no issue of abrupt jiffies change occured on system.

First of all, are you meeting this issue specifically on NOHZ_FULL? Because
there is a pending fix for a related matter there:

      https://lore.kernel.org/lkml/20210915142303.24297-1-frederic@kernel.org/

As for what you're reporting here, I think the core problem is the fact that the
timekeeper (jiffies updater) is stuck with IRQs disabled for way too long. Even
one millisecond is too much to tolerate. Do you have any idea about the source of
that situation?

Thanks.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Issue] timer callback registered with mod_timer is getting called beforetime
  2021-09-24  6:53 ` [Issue] timer callback registered with mod_timer is getting called beforetime Maninder Singh
  2021-09-24 10:38   ` Frederic Weisbecker
@ 2021-09-24 12:08   ` Thomas Gleixner
       [not found]   ` <CGME20210924065310epcms5p69dd47a510faaa6bf68c243e02f2d0186@epcms5p2>
  2 siblings, 0 replies; 4+ messages in thread
From: Thomas Gleixner @ 2021-09-24 12:08 UTC (permalink / raw)
  To: maninder1.s, fweisbec, mingo
  Cc: linux-kernel, Vaneet Narang, AMIT SAHRAWAT, Chung-Ki Woo

On Fri, Sep 24 2021 at 12:23, Maninder Singh wrote:
?
> As we checekd code if CPU responsible for updating jiffies value(lets
> say CPU0) has interrupts disabled for 60 ms and in mean time other
> CPU(CPU1) registers new timer for (jiffies + x) value. its callback
> will be called before time.

So you run some stuff for 60ms with interrupts disabled and now you are
asking what's wrong?

The answer is simply: Keeping interrupts disabled for 60ms is wrong.

Anything else is just a symptom.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE:(2) [Issue] timer callback registered with mod_timer is getting called beforetime
       [not found]   ` <CGME20210924065310epcms5p69dd47a510faaa6bf68c243e02f2d0186@epcms5p2>
@ 2021-09-24 14:04     ` Maninder Singh
  0 siblings, 0 replies; 4+ messages in thread
From: Maninder Singh @ 2021-09-24 14:04 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: fweisbec, tglx, mingo, linux-kernel, Vaneet Narang,
	AMIT SAHRAWAT, Chung-Ki Woo, Vishal Goel

[-- Attachment #1: Type: text/plain, Size: 1877 bytes --]


Hi Frederic,


> > Is it known behaviour for timers?
> > because only 1 CPU is assigned to update jiffies work to call do_timer utill unless it goes to idle state and pass ownership to other CPU.
> > 
> > we tried by making all CPU to handle code for jiffies updation (it will add performance hit)
> > but then no issue of abrupt jiffies change occured on system.
>  
> First of all, are you meeting this issue specifically on NOHZ_FULL? Because
> there is a pending fix for a related matter there:

No, this is not our case.

>  
>       https://lore.kernel.org/lkml/20210915142303.24297-1-frederic@kernel.org/
>  
> As for what you're reporting here, I think the core problem is the fact that the
> timekeeper (jiffies updater) is stuck with IRQs disabled for way too long. Even
> one millisecond is too much to tolerate. Do you have any idea about the source of
> that situation?
>  

Yes, definately interrupts should not be disabled for so long,
but sometimes there are 3rd party drivers/vendors module code can cause issue,
and it can be the same case and we are trying to reproduce issue again and check code path.

So we had 2 doubts:
(1) In this explained case timer callback will be called early right? 
(2) What if jiffies updation can be done by any of the CPU rather that making one
CPU owner? can it cause any side effectes? one we know is performance, there will be redundant calls
from other CPUs.

        /* Check, if the jiffies need an update */
        if (tick_do_timer_cpu == cpu)
                tick_do_update_jiffies64(now);


On our target, there is a race condition when irq_disable code path scheduled on same CPU
which is responsible for jiffies updation and in parallel CPU1 registers evet callback for 20/30 ms.
and due to abrupt jiffies change callback triggered within 1 ms of actual time, which cause actual
issue.

Thanks
Maninder Singh
 
 

[-- Attachment #2: rcptInfo.txt --]
[-- Type: application/octet-stream, Size: 1577 bytes --]


   =================================================================================================================================
      Subject    : Re: [Issue] timer callback registered with mod_timer is getting called beforetime
      From       : null
      Sent Date  : 2021-09-24 16:08  GMT+5:30
   =================================================================================================================================
                  Name                Type          Job Title                       Dept.                               Company                
   =================================================================================================================================
      Maninder Singh                 TO         Staff Engineer             System S/W Group /SRI-Delhi               Samsung Electronics 
      fweisbec@gmail.com             CC
      tglx@linutronix.de             CC
      mingo@kernel.org               CC
      linux-kernel@vger.kernel.org   CC
      Vaneet Narang                  CC         Staff Engineer             System S/W Group /SRI-Delhi               Samsung Electronics
      AMIT SAHRAWAT                  CC         Staff Engineer/Head o...   System S/W Group /SRI-Delhi               Samsung Electronics
      Chung-Ki Woo                   CC         Principal Engineer         S/W Platform R&D Lab.                     Samsung Electronics
   =================================================================================================================================

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-09-24 14:04 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20210924065310epcms5p69dd47a510faaa6bf68c243e02f2d0186@epcms5p6>
2021-09-24  6:53 ` [Issue] timer callback registered with mod_timer is getting called beforetime Maninder Singh
2021-09-24 10:38   ` Frederic Weisbecker
2021-09-24 12:08   ` Thomas Gleixner
     [not found]   ` <CGME20210924065310epcms5p69dd47a510faaa6bf68c243e02f2d0186@epcms5p2>
2021-09-24 14:04     ` Maninder Singh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.