linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] sky2: Use deferrable timer for watchdog
@ 2007-12-19  1:13 Parag Warudkar
  2007-12-20 17:16 ` Stephen Hemminger
  0 siblings, 1 reply; 14+ messages in thread
From: Parag Warudkar @ 2007-12-19  1:13 UTC (permalink / raw)
  To: netdev; +Cc: shemminger, akpm, linux-kernel


sky2 can use deferrable timer for watchdog - reduces wakeups from idle per 
second.

Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com>

--- linux-2.6/drivers/net/sky2.c	2007-12-07 10:04:39.000000000 -0500
+++ linux-2.6-work/drivers/net/sky2.c	2007-12-18 20:07:58.000000000 -0500
@@ -4230,7 +4230,10 @@
  			sky2_show_addr(dev1);
  	}

-	setup_timer(&hw->watchdog_timer, sky2_watchdog, (unsigned long) hw);
+	hw->watchdog_timer.function = sky2_watchdog;
+	hw->watchdog_timer.data = (unsigned long) hw;
+	init_timer_deferrable(&hw->watchdog_timer);
+
  	INIT_WORK(&hw->restart_work, sky2_restart);

  	pci_set_drvdata(pdev, hw);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sky2: Use deferrable timer for watchdog
  2007-12-19  1:13 [PATCH] sky2: Use deferrable timer for watchdog Parag Warudkar
@ 2007-12-20 17:16 ` Stephen Hemminger
       [not found]   ` <823114761-1198171803-cardhu_decombobulator_blackberry.rim.net-937108990-@bxe019.bisx.prod.on.blackberry>
  0 siblings, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2007-12-20 17:16 UTC (permalink / raw)
  To: parag.warudkar; +Cc: netdev, akpm, linux-kernel

On Tue, 18 Dec 2007 20:13:28 -0500 (EST)
Parag Warudkar <parag.warudkar@gmail.com> wrote:

> 
> sky2 can use deferrable timer for watchdog - reduces wakeups from idle per 
> second.
> 
> Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com>
> 
> --- linux-2.6/drivers/net/sky2.c	2007-12-07 10:04:39.000000000 -0500
> +++ linux-2.6-work/drivers/net/sky2.c	2007-12-18 20:07:58.000000000 -0500
> @@ -4230,7 +4230,10 @@
>   			sky2_show_addr(dev1);
>   	}
> 
> -	setup_timer(&hw->watchdog_timer, sky2_watchdog, (unsigned long) hw);
> +	hw->watchdog_timer.function = sky2_watchdog;
> +	hw->watchdog_timer.data = (unsigned long) hw;
> +	init_timer_deferrable(&hw->watchdog_timer);
> +
>   	INIT_WORK(&hw->restart_work, sky2_restart);
> 
>   	pci_set_drvdata(pdev, hw);

Does it really reduce the wakeup's or only change who gets charged by powertop?
The system is going to wakeup once a second anyway. Looks to me that if the
timer is using round_jiffies(), that setting deferrable just changes the accounting.

My interpretation of the api is:
   * round_jiffies()  - timer wants to wakeup but isn't precise about when so schedule
                        on next second when system will wake up anyway;
                        e.g why meetings are usually scheduled on the hour

   * deferrable       - timer doesn't have to really wakeup but wants to happen near
                        a particular time. e.g. "I'll meet you at the pub around 8pm"

Therefore doing deferrable is unnecessary for timers using round_jiffies unless system
is so good at doing timers that it is going to skip doing timer once per second.

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sky2: Use deferrable timer for watchdog
       [not found]   ` <823114761-1198171803-cardhu_decombobulator_blackberry.rim.net-937108990-@bxe019.bisx.prod.on.blackberry>
@ 2007-12-20 17:51     ` Stephen Hemminger
  2007-12-20 18:05       ` Parag Warudkar
  2007-12-20 19:09       ` Kok, Auke
  0 siblings, 2 replies; 14+ messages in thread
From: Stephen Hemminger @ 2007-12-20 17:51 UTC (permalink / raw)
  To: parag.warudkar; +Cc: netdev, akpm, linux-kernel

On Thu, 20 Dec 2007 17:29:23 +0000

> 
> -----Original Message-----
> From: Stephen Hemminger <shemminger@linux-foundation.org>
> 
> Date: Thu, 20 Dec 2007 09:16:03 
> To:parag.warudkar@gmail.com
> Cc:netdev@vger.kernel.org, akpm@linux-foundation.org,       linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog
> 
> 
> On Tue, 18 Dec 2007 20:13:28 -0500 (EST)
> Parag Warudkar <parag.warudkar@gmail.com> wrote:
> 
> > 
> > sky2 can use deferrable timer for watchdog - reduces wakeups from idle per 
> > second.
> > 
> > Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com>
> > 
> > --- linux-2.6/drivers/net/sky2.c	2007-12-07 10:04:39.000000000 -0500
> > +++ linux-2.6-work/drivers/net/sky2.c	2007-12-18 20:07:58.000000000 -0500
> > @@ -4230,7 +4230,10 @@
> >   			sky2_show_addr(dev1);
> >   	}
> > 
> > -	setup_timer(&hw->watchdog_timer, sky2_watchdog, (unsigned long) hw);
> > +	hw->watchdog_timer.function = sky2_watchdog;
> > +	hw->watchdog_timer.data = (unsigned long) hw;
> > +	init_timer_deferrable(&hw->watchdog_timer);
> > +
> >   	INIT_WORK(&hw->restart_work, sky2_restart);
> > 
> >   	pci_set_drvdata(pdev, hw);
> 
> Does it really reduce the wakeup's or only change who gets charged by powertop?
> The system is going to wakeup once a second anyway. Looks to me that if the
> timer is using round_jiffies(), that setting deferrable just changes the accounting.
> 
> My interpretation of the api is:
>    * round_jiffies()  - timer wants to wakeup but isn't precise about when so schedule
>                         on next second when system will wake up anyway;
>                         e.g why meetings are usually scheduled on the hour
> 
>    * deferrable       - timer doesn't have to really wakeup but wants to happen near
>                         a particular time. e.g. "I'll meet you at the pub around 8pm"
> 
> Therefore doing deferrable is unnecessary for timers using round_jiffies unless system
> is so good at doing timers that it is going to skip doing timer once per second.
> 

parag.warudkar@gmail.com wrote:

> NO_HZ kernels don't do timers every second - if you do round_jiffies() the kernel will wakeup and run the timer at that time no matter what. 
> 
> The reason deferrable was introduced is to avoid waking up the kernel just for this one timer that can be called when the CPU is not idle for some reason other than this timer.
> 
> In other words let's say there were two timers - one non-deferrable expiring in 3 seconds and other deferrable, expiring in 1.5 seconds. The kernel will not wake up twice - once for 1.5 second and other for 3 second - it will wake up once at expiry of 3 second timer and execute both the 1.5 second and 3 second timers.
> 
> And this is not just powertop accounting thing - like I said the total num of wakeups per second go down with this patch.
> 
> Parag
> 
> Sent via BlackBerry from T-Mobile


Quit top-posting!

If this is the case then the whole usage of round_jiffies() is bogus. All users of round_jiffies()
should just be converted to deferrable??  I am a bit concerned that if deferrable gets used everywhere
then a strange situation would occur where all timers were waiting for some other timer to finally
happen, kind of a wierd timelock situation. Like the old chip/dale cartoon:
 "you first, no you first, after you mister chip, no after you mister dale,..."

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sky2: Use deferrable timer for watchdog
  2007-12-20 17:51     ` Stephen Hemminger
@ 2007-12-20 18:05       ` Parag Warudkar
  2007-12-20 19:09       ` Kok, Auke
  1 sibling, 0 replies; 14+ messages in thread
From: Parag Warudkar @ 2007-12-20 18:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, akpm, linux-kernel

On Dec 20, 2007 12:51 PM, Stephen Hemminger
<shemminger@linux-foundation.org> wrote:
>
> Quit top-posting!
>
> If this is the case then the whole usage of round_jiffies() is bogus. All users of round_jiffies()
> should just be converted to deferrable??  I am a bit concerned that if deferrable gets used everywhere
> then a strange situation would occur where all timers were waiting for some other timer to finally
> happen, kind of a wierd timelock situation. Like the old chip/dale cartoon:
>  "you first, no you first, after you mister chip, no after you mister dale,..."
>

Haha - I thought about this too. I think there should be mechanism
where the machine does not idle infinitely even if there are no
non-deferrable timers. Something like an affordable QoS for non
deferrable timers - the kernel wakes up after that interval and runs
all deferrable timers  even if nothing non-deferrable is set to run.
So we still get advantage of not having to wake individually for each
timer and the non-deferrable timers do get all run in reasonable
amount of time.

Who knows Thomas/Ingo already built in something of that nature or effect?!

Parag

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sky2: Use deferrable timer for watchdog
  2007-12-20 17:51     ` Stephen Hemminger
  2007-12-20 18:05       ` Parag Warudkar
@ 2007-12-20 19:09       ` Kok, Auke
  2007-12-20 19:11         ` Arjan van de Ven
  1 sibling, 1 reply; 14+ messages in thread
From: Kok, Auke @ 2007-12-20 19:09 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: parag.warudkar, netdev, akpm, linux-kernel, Arjan van de Ven

Stephen Hemminger wrote:
> On Thu, 20 Dec 2007 17:29:23 +0000
> 
>> -----Original Message-----
>> From: Stephen Hemminger <shemminger@linux-foundation.org>
>>
>> Date: Thu, 20 Dec 2007 09:16:03 
>> To:parag.warudkar@gmail.com
>> Cc:netdev@vger.kernel.org, akpm@linux-foundation.org,       linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog
>>
>>
>> On Tue, 18 Dec 2007 20:13:28 -0500 (EST)
>> Parag Warudkar <parag.warudkar@gmail.com> wrote:
>>
>>> sky2 can use deferrable timer for watchdog - reduces wakeups from idle per 
>>> second.
>>>
>>> Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com>
>>>
>>> --- linux-2.6/drivers/net/sky2.c	2007-12-07 10:04:39.000000000 -0500
>>> +++ linux-2.6-work/drivers/net/sky2.c	2007-12-18 20:07:58.000000000 -0500
>>> @@ -4230,7 +4230,10 @@
>>>   			sky2_show_addr(dev1);
>>>   	}
>>>
>>> -	setup_timer(&hw->watchdog_timer, sky2_watchdog, (unsigned long) hw);
>>> +	hw->watchdog_timer.function = sky2_watchdog;
>>> +	hw->watchdog_timer.data = (unsigned long) hw;
>>> +	init_timer_deferrable(&hw->watchdog_timer);
>>> +
>>>   	INIT_WORK(&hw->restart_work, sky2_restart);
>>>
>>>   	pci_set_drvdata(pdev, hw);
>> Does it really reduce the wakeup's or only change who gets charged by powertop?
>> The system is going to wakeup once a second anyway. Looks to me that if the
>> timer is using round_jiffies(), that setting deferrable just changes the accounting.
>>
>> My interpretation of the api is:
>>    * round_jiffies()  - timer wants to wakeup but isn't precise about when so schedule
>>                         on next second when system will wake up anyway;
>>                         e.g why meetings are usually scheduled on the hour
>>
>>    * deferrable       - timer doesn't have to really wakeup but wants to happen near
>>                         a particular time. e.g. "I'll meet you at the pub around 8pm"
>>
>> Therefore doing deferrable is unnecessary for timers using round_jiffies unless system
>> is so good at doing timers that it is going to skip doing timer once per second.
>>
> 
> parag.warudkar@gmail.com wrote:
> 
>> NO_HZ kernels don't do timers every second - if you do round_jiffies() the kernel will wakeup and run the timer at that time no matter what. 
>>
>> The reason deferrable was introduced is to avoid waking up the kernel just for this one timer that can be called when the CPU is not idle for some reason other than this timer.
>>
>> In other words let's say there were two timers - one non-deferrable expiring in 3 seconds and other deferrable, expiring in 1.5 seconds. The kernel will not wake up twice - once for 1.5 second and other for 3 second - it will wake up once at expiry of 3 second timer and execute both the 1.5 second and 3 second timers.
>>
>> And this is not just powertop accounting thing - like I said the total num of wakeups per second go down with this patch.
>>
>> Parag
>>
>> Sent via BlackBerry from T-Mobile
> 
> 
> Quit top-posting!
> 
> If this is the case then the whole usage of round_jiffies() is bogus. All users of round_jiffies()
> should just be converted to deferrable??  I am a bit concerned that if deferrable gets used everywhere
> then a strange situation would occur where all timers were waiting for some other timer to finally
> happen, kind of a wierd timelock situation. Like the old chip/dale cartoon:
>  "you first, no you first, after you mister chip, no after you mister dale,..."



that's a dangerous situation indeed and I'd really like to know what the limits
are for deferring deferrable timers.... Arjan, do you know? Anyone?

I don't see a danger just yet on normal systems - I get something like 10 wakeups
per second from just the kernel (acpi, ahci, usb) on most my systems which
guarantees that the watchdog runs often enough, but for embedded systems and
critical timers in other drivers this may be an issue quickly


Auke

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sky2: Use deferrable timer for watchdog
  2007-12-20 19:09       ` Kok, Auke
@ 2007-12-20 19:11         ` Arjan van de Ven
  2007-12-20 19:22           ` Kok, Auke
  0 siblings, 1 reply; 14+ messages in thread
From: Arjan van de Ven @ 2007-12-20 19:11 UTC (permalink / raw)
  To: Kok, Auke; +Cc: Stephen Hemminger, parag.warudkar, netdev, akpm, linux-kernel


>>> My interpretation of the api is:
>>>    * round_jiffies()  - timer wants to wakeup but isn't precise about when so schedule
>>>                         on next second when system will wake up anyway;
>>>                         e.g why meetings are usually scheduled on the hour
>>>
>>>    * deferrable       - timer doesn't have to really wakeup but wants to happen near
>>>                         a particular time. e.g. "I'll meet you at the pub around 8pm"

this is not correct.

deferrable means "if you're busy wake me up at this time. But if not, don't bother waking up for me, get to it
later".

The "later" can be a LONG time later, several seconds easily, if not more.
(timers are on a per cpu bases, and you may end up with a several-core system where the common timers are all on another cpu
than this one)



>> If this is the case then the whole usage of round_jiffies() is bogus. All users of round_jiffies()
>> should just be converted to deferrable??  I am a bit concerned that if deferrable gets used everywhere
>> then a strange situation would occur where all timers were waiting for some other timer to finally
>> happen, kind of a wierd timelock situation. Like the old chip/dale cartoon:
>>  "you first, no you first, after you mister chip, no after you mister dale,..."
> 
> 
> 
> that's a dangerous situation indeed and I'd really like to know what the limits
> are for deferring deferrable timers.... Arjan, do you know? Anyone?

there is NO limit to deferring a timer. Do NOT use a deferrable timer if you can't afford the timer to not happen
within.. 10 to 100 seconds! (or more)
They are really meant for things where you CAN afford for it to not happen when you're idle....


> 
> I don't see a danger just yet on normal systems - I get something like 10 wakeups
> per second from just the kernel (acpi, ahci, usb) on most my systems which
> guarantees that the watchdog runs often enough, but for embedded systems and
> critical timers in other drivers this may be an issue quickly

on my work desktop test box the average time between cpu wakeups is 1.4 seconds
(and that's single core). It would be higher if it wasn't for some hpet limit issues.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sky2: Use deferrable timer for watchdog
  2007-12-20 19:11         ` Arjan van de Ven
@ 2007-12-20 19:22           ` Kok, Auke
  2007-12-20 19:42             ` Arjan van de Ven
  2007-12-20 20:00             ` Parag Warudkar
  0 siblings, 2 replies; 14+ messages in thread
From: Kok, Auke @ 2007-12-20 19:22 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Stephen Hemminger, parag.warudkar, netdev, akpm, linux-kernel

Arjan van de Ven wrote:
> 
>>>> My interpretation of the api is:
>>>>    * round_jiffies()  - timer wants to wakeup but isn't precise
>>>> about when so schedule
>>>>                         on next second when system will wake up anyway;
>>>>                         e.g why meetings are usually scheduled on
>>>> the hour
>>>>
>>>>    * deferrable       - timer doesn't have to really wakeup but
>>>> wants to happen near
>>>>                         a particular time. e.g. "I'll meet you at
>>>> the pub around 8pm"
> 
> this is not correct.
> 
> deferrable means "if you're busy wake me up at this time. But if not,
> don't bother waking up for me, get to it
> later".
> 
> The "later" can be a LONG time later, several seconds easily, if not more.
> (timers are on a per cpu bases, and you may end up with a several-core
> system where the common timers are all on another cpu
> than this one)
> 
> 
> 
>>> If this is the case then the whole usage of round_jiffies() is bogus.
>>> All users of round_jiffies()
>>> should just be converted to deferrable??  I am a bit concerned that
>>> if deferrable gets used everywhere
>>> then a strange situation would occur where all timers were waiting
>>> for some other timer to finally
>>> happen, kind of a wierd timelock situation. Like the old chip/dale
>>> cartoon:
>>>  "you first, no you first, after you mister chip, no after you mister
>>> dale,..."
>>
>>
>>
>> that's a dangerous situation indeed and I'd really like to know what
>> the limits
>> are for deferring deferrable timers.... Arjan, do you know? Anyone?
> 
> there is NO limit to deferring a timer. Do NOT use a deferrable timer if
> you can't afford the timer to not happen
> within.. 10 to 100 seconds! (or more)
> They are really meant for things where you CAN afford for it to not
> happen when you're idle....

ok, that's just bad and if there's no user-defineable limit to the deferral I
definately don't like this change.

Can I safely assume that any irq will cause all deferred timers to run?

If this is the case then for e1000 this patch is still OK since the watchdog needs
to run (1) after a link up/down interrupt or (2) to update statistics. Those
statistics won't increase if there is no traffic of course...

Auke

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sky2: Use deferrable timer for watchdog
  2007-12-20 19:22           ` Kok, Auke
@ 2007-12-20 19:42             ` Arjan van de Ven
  2007-12-20 20:00             ` Parag Warudkar
  1 sibling, 0 replies; 14+ messages in thread
From: Arjan van de Ven @ 2007-12-20 19:42 UTC (permalink / raw)
  To: Kok, Auke; +Cc: Stephen Hemminger, parag.warudkar, netdev, akpm, linux-kernel

Kok, Auke wrote:

> ok, that's just bad and if there's no user-defineable limit to the deferral I
> definately don't like this change.
> 
> Can I safely assume that any irq will cause all deferred timers to run?

*on that cpu*. Timers are per cpu, as are interrupts. Just not per se the same one ...


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sky2: Use deferrable timer for watchdog
  2007-12-20 19:22           ` Kok, Auke
  2007-12-20 19:42             ` Arjan van de Ven
@ 2007-12-20 20:00             ` Parag Warudkar
  2007-12-20 20:04               ` Arjan van de Ven
  2007-12-20 20:17               ` Krzysztof Oledzki
  1 sibling, 2 replies; 14+ messages in thread
From: Parag Warudkar @ 2007-12-20 20:00 UTC (permalink / raw)
  To: Kok, Auke; +Cc: Arjan van de Ven, Stephen Hemminger, netdev, akpm, linux-kernel

On Dec 20, 2007 2:22 PM, Kok, Auke <auke-jan.h.kok@intel.com> wrote:
> ok, that's just bad and if there's no user-defineable limit to the deferral I
> definately don't like this change.
>
> Can I safely assume that any irq will cause all deferred timers to run?

I think even other causes for wakeup like process related ones will
cause the CPU to go busy and run the timers.
This, coupled with the fact that no one is yet able to reach 0 wakeups
per second makes it pretty unlikely that deferrable timers will be
deferred indefinitely.

>
> If this is the case then for e1000 this patch is still OK since the watchdog needs
> to run (1) after a link up/down interrupt or (2) to update statistics. Those
> statistics won't increase if there is no traffic of course...
>

I think it is reasonable for Network driver watchdogs to use a
deferrable timer - if the machine is 100% IDLE there is no one needing
the network to be up. If there is something running even on the other
CPU - that is going to cause an IPI, reschedule, TLB invalidation etc.
which will make it very likely in practice that each CPU will be
interrupted in reasonable amount of time.

Of course there are theoretical cases where we could land into a
situation where a CPU in a multiprocessor machine is IDLE infinitely
and that causes the watchdog that happens to be bound to run on the
same CPU to not run. To take care of these unlikely cases I think the
timer mechanism should have a reasonable limit on how long a CPU can
go IDLE if there are deferrable timers.

Parag

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sky2: Use deferrable timer for watchdog
  2007-12-20 20:00             ` Parag Warudkar
@ 2007-12-20 20:04               ` Arjan van de Ven
  2007-12-20 20:36                 ` Parag Warudkar
  2007-12-20 20:17               ` Krzysztof Oledzki
  1 sibling, 1 reply; 14+ messages in thread
From: Arjan van de Ven @ 2007-12-20 20:04 UTC (permalink / raw)
  To: Parag Warudkar; +Cc: Kok, Auke, Stephen Hemminger, netdev, akpm, linux-kernel

Parag Warudkar wrote:
> On Dec 20, 2007 2:22 PM, Kok, Auke <auke-jan.h.kok@intel.com> wrote:
>> ok, that's just bad and if there's no user-defineable limit to the deferral I
>> definately don't like this change.
>>
>> Can I safely assume that any irq will cause all deferred timers to run?
> 
> I think even other causes for wakeup like process related ones will
> cause the CPU to go busy and run the timers.
> This, coupled with the fact that no one is yet able to reach 0 wakeups
> per second makes it pretty unlikely that deferrable timers will be
> deferred indefinitely.

0.8 is easy on single core today.
multicore just increases how idle you can be for a given core.

> 
>> If this is the case then for e1000 this patch is still OK since the watchdog needs
>> to run (1) after a link up/down interrupt or (2) to update statistics. Those
>> statistics won't increase if there is no traffic of course...
>>
> 
> I think it is reasonable for Network driver watchdogs to use a
> deferrable timer - if the machine is 100% IDLE there is no one needing
> the network to be up. If there is something running even on the other
> CPU - that is going to cause an IPI, reschedule, TLB invalidation etc.
> which will make it very likely in practice that each CPU will be
> interrupted in reasonable amount of time.

this is not correct; many machines are idle waiting for network data. Think of webservers...

> 
> Of course there are theoretical cases where we could land into a
> situation where a CPU in a multiprocessor machine is IDLE infinitely
> and that causes the watchdog that happens to be bound to run on the
> same CPU to not run. To take care of these unlikely cases I think the
> timer mechanism should have a reasonable limit on how long a CPU can
> go IDLE if there are deferrable timers.

how about something else instead: a timer mechanism that takes a range instead..
that at least has defined semantics; the deferrable semantics really are "indefinite".
Lets keep at least the semantics clear and clean.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sky2: Use deferrable timer for watchdog
  2007-12-20 20:00             ` Parag Warudkar
  2007-12-20 20:04               ` Arjan van de Ven
@ 2007-12-20 20:17               ` Krzysztof Oledzki
  1 sibling, 0 replies; 14+ messages in thread
From: Krzysztof Oledzki @ 2007-12-20 20:17 UTC (permalink / raw)
  To: Parag Warudkar
  Cc: Kok, Auke, Arjan van de Ven, Stephen Hemminger, netdev, akpm,
	linux-kernel



On Thu, 20 Dec 2007, Parag Warudkar wrote:

> On Dec 20, 2007 2:22 PM, Kok, Auke <auke-jan.h.kok@intel.com> wrote:
>> ok, that's just bad and if there's no user-defineable limit to the deferral I
>> definately don't like this change.
>>
>> Can I safely assume that any irq will cause all deferred timers to run?
>
> I think even other causes for wakeup like process related ones will
> cause the CPU to go busy and run the timers.
> This, coupled with the fact that no one is yet able to reach 0 wakeups
> per second makes it pretty unlikely that deferrable timers will be
> deferred indefinitely.
>
>>
>> If this is the case then for e1000 this patch is still OK since the watchdog needs
>> to run (1) after a link up/down interrupt or (2) to update statistics. Those
>> statistics won't increase if there is no traffic of course...
>>
>
> I think it is reasonable for Network driver watchdogs to use a
> deferrable timer - if the machine is 100% IDLE there is no one needing
> the network to be up.

Please note tha being connected to a network does not only mean to send 
but also to receive.

Best regards,

 				Krzysztof Oledzki

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sky2: Use deferrable timer for watchdog
  2007-12-20 20:04               ` Arjan van de Ven
@ 2007-12-20 20:36                 ` Parag Warudkar
  2007-12-20 21:08                   ` Stephen Hemminger
  0 siblings, 1 reply; 14+ messages in thread
From: Parag Warudkar @ 2007-12-20 20:36 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Kok, Auke, Stephen Hemminger, netdev, akpm, linux-kernel

On Dec 20, 2007 3:04 PM, Arjan van de Ven <arjan@linux.intel.com> wrote:
> > I think it is reasonable for Network driver watchdogs to use a
> > deferrable timer - if the machine is 100% IDLE there is no one needing
> > the network to be up. If there is something running even on the other
> > CPU - that is going to cause an IPI, reschedule, TLB invalidation etc.
> > which will make it very likely in practice that each CPU will be
> > interrupted in reasonable amount of time.
>
> this is not correct; many machines are idle waiting for network data. Think of webservers...

Yes, I forgot the receive case. So if a server was 100% IDLE and a web
server was listening for network data and we reach 0 wakeups per
second on the CPU where the network watchdog timer is scheduled to run
deferred _and_ the network link went down, it would cause the watchdog
to not run and redo the link until some one else wakes up that CPU
later.
So as long as we make sure we don't convert every timer to deferrable
we should be ok - may be this can be resolved easily by having a
non-deferrable "dont-allow-deferring-for-too-long" timer on each CPU
that just causes at least one wake up in some reasonable time delta
from the previous wakeup (whoever caused that one.) It is still
beneficial in that all deferrable timers would run at once without
needing to have separate wakeup for each.

>
> >
> > Of course there are theoretical cases where we could land into a
> > situation where a CPU in a multiprocessor machine is IDLE infinitely
> > and that causes the watchdog that happens to be bound to run on the
> > same CPU to not run. To take care of these unlikely cases I think the
> > timer mechanism should have a reasonable limit on how long a CPU can
> > go IDLE if there are deferrable timers.
>
> how about something else instead: a timer mechanism that takes a range instead..
> that at least has defined semantics; the deferrable semantics really are "indefinite".
> Lets keep at least the semantics clear and clean.
>

Would not the simpler solution of installing a non-deferrable timer
per cpu which will not allow the CPU to go IDLE for more than x units
of time at once  (or something to that effect) work? Range would
complicate the thing and I am not sure how many cases will know
reasonably correct range for their normal operation. In this instance
of the e1000 watchdog what range could it give and be successful at
what it wants to do - bring up the link in reasonable amount of time,
while also realizing the power savings?

Perhaps depending on Server/Laptop/Desktop machine (may be based on
Preemption) we could have normal or deferrable timers but that'll
exclude Servers from power savings and I am not sure Data center folks
will like that :) .

Parag

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sky2: Use deferrable timer for watchdog
  2007-12-20 20:36                 ` Parag Warudkar
@ 2007-12-20 21:08                   ` Stephen Hemminger
  2007-12-20 21:24                     ` Kok, Auke
  0 siblings, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2007-12-20 21:08 UTC (permalink / raw)
  To: Parag Warudkar; +Cc: Arjan van de Ven, Kok, Auke, netdev, akpm, linux-kernel

On Thu, 20 Dec 2007 15:36:13 -0500
"Parag Warudkar" <parag.warudkar@gmail.com> wrote:

> On Dec 20, 2007 3:04 PM, Arjan van de Ven <arjan@linux.intel.com> wrote:
> > > I think it is reasonable for Network driver watchdogs to use a
> > > deferrable timer - if the machine is 100% IDLE there is no one needing
> > > the network to be up. If there is something running even on the other
> > > CPU - that is going to cause an IPI, reschedule, TLB invalidation etc.
> > > which will make it very likely in practice that each CPU will be
> > > interrupted in reasonable amount of time.
> >
> > this is not correct; many machines are idle waiting for network data. Think of webservers...
> 
> Yes, I forgot the receive case. So if a server was 100% IDLE and a web
> server was listening for network data and we reach 0 wakeups per
> second on the CPU where the network watchdog timer is scheduled to run
> deferred _and_ the network link went down, it would cause the watchdog
> to not run and redo the link until some one else wakes up that CPU
> later.
> So as long as we make sure we don't convert every timer to deferrable
> we should be ok - may be this can be resolved easily by having a
> non-deferrable "dont-allow-deferring-for-too-long" timer on each CPU
> that just causes at least one wake up in some reasonable time delta
> from the previous wakeup (whoever caused that one.) It is still
> beneficial in that all deferrable timers would run at once without
> needing to have separate wakeup for each.
> 
> >
> > >
> > > Of course there are theoretical cases where we could land into a
> > > situation where a CPU in a multiprocessor machine is IDLE infinitely
> > > and that causes the watchdog that happens to be bound to run on the
> > > same CPU to not run. To take care of these unlikely cases I think the
> > > timer mechanism should have a reasonable limit on how long a CPU can
> > > go IDLE if there are deferrable timers.
> >
> > how about something else instead: a timer mechanism that takes a range instead..
> > that at least has defined semantics; the deferrable semantics really are "indefinite".
> > Lets keep at least the semantics clear and clean.
> >
> 
> Would not the simpler solution of installing a non-deferrable timer
> per cpu which will not allow the CPU to go IDLE for more than x units
> of time at once  (or something to that effect) work? Range would
> complicate the thing and I am not sure how many cases will know
> reasonably correct range for their normal operation. In this instance
> of the e1000 watchdog what range could it give and be successful at
> what it wants to do - bring up the link in reasonable amount of time,
> while also realizing the power savings?
> 
> Perhaps depending on Server/Laptop/Desktop machine (may be based on
> Preemption) we could have normal or deferrable timers but that'll
> exclude Servers from power savings and I am not sure Data center folks
> will like that :) .
> 
> Parag


The problem is that on a server the receiver will go deaf if the chip
bug that the watchdog is looking for triggers.  Yes, no packets in
and it happily will just sit there.

So for now, I am not going to apply your simple patch and work on a 
two stage timer per arjan's suggestion for a later release.

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sky2: Use deferrable timer for watchdog
  2007-12-20 21:08                   ` Stephen Hemminger
@ 2007-12-20 21:24                     ` Kok, Auke
  0 siblings, 0 replies; 14+ messages in thread
From: Kok, Auke @ 2007-12-20 21:24 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Parag Warudkar, Arjan van de Ven, netdev, akpm, linux-kernel

Stephen Hemminger wrote:
> On Thu, 20 Dec 2007 15:36:13 -0500
> "Parag Warudkar" <parag.warudkar@gmail.com> wrote:
> 
>> On Dec 20, 2007 3:04 PM, Arjan van de Ven <arjan@linux.intel.com> wrote:
>>>> I think it is reasonable for Network driver watchdogs to use a
>>>> deferrable timer - if the machine is 100% IDLE there is no one needing
>>>> the network to be up. If there is something running even on the other
>>>> CPU - that is going to cause an IPI, reschedule, TLB invalidation etc.
>>>> which will make it very likely in practice that each CPU will be
>>>> interrupted in reasonable amount of time.
>>> this is not correct; many machines are idle waiting for network data. Think of webservers...
>> Yes, I forgot the receive case. So if a server was 100% IDLE and a web
>> server was listening for network data and we reach 0 wakeups per
>> second on the CPU where the network watchdog timer is scheduled to run
>> deferred _and_ the network link went down, it would cause the watchdog
>> to not run and redo the link until some one else wakes up that CPU
>> later.
>> So as long as we make sure we don't convert every timer to deferrable
>> we should be ok - may be this can be resolved easily by having a
>> non-deferrable "dont-allow-deferring-for-too-long" timer on each CPU
>> that just causes at least one wake up in some reasonable time delta
>> from the previous wakeup (whoever caused that one.) It is still
>> beneficial in that all deferrable timers would run at once without
>> needing to have separate wakeup for each.
>>
>>>> Of course there are theoretical cases where we could land into a
>>>> situation where a CPU in a multiprocessor machine is IDLE infinitely
>>>> and that causes the watchdog that happens to be bound to run on the
>>>> same CPU to not run. To take care of these unlikely cases I think the
>>>> timer mechanism should have a reasonable limit on how long a CPU can
>>>> go IDLE if there are deferrable timers.
>>> how about something else instead: a timer mechanism that takes a range instead..
>>> that at least has defined semantics; the deferrable semantics really are "indefinite".
>>> Lets keep at least the semantics clear and clean.
>>>
>> Would not the simpler solution of installing a non-deferrable timer
>> per cpu which will not allow the CPU to go IDLE for more than x units
>> of time at once  (or something to that effect) work? Range would
>> complicate the thing and I am not sure how many cases will know
>> reasonably correct range for their normal operation. In this instance
>> of the e1000 watchdog what range could it give and be successful at
>> what it wants to do - bring up the link in reasonable amount of time,
>> while also realizing the power savings?
>>
>> Perhaps depending on Server/Laptop/Desktop machine (may be based on
>> Preemption) we could have normal or deferrable timers but that'll
>> exclude Servers from power savings and I am not sure Data center folks
>> will like that :) .
>>
>> Parag
> 
> 
> The problem is that on a server the receiver will go deaf if the chip
> bug that the watchdog is looking for triggers.  Yes, no packets in
> and it happily will just sit there.
> 
> So for now, I am not going to apply your simple patch and work on a 
> two stage timer per arjan's suggestion for a later release.

I also think that's the right way to go for now. I'll ask jeff to hold off on the
two patches for now.

Auke


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2007-12-20 21:32 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-12-19  1:13 [PATCH] sky2: Use deferrable timer for watchdog Parag Warudkar
2007-12-20 17:16 ` Stephen Hemminger
     [not found]   ` <823114761-1198171803-cardhu_decombobulator_blackberry.rim.net-937108990-@bxe019.bisx.prod.on.blackberry>
2007-12-20 17:51     ` Stephen Hemminger
2007-12-20 18:05       ` Parag Warudkar
2007-12-20 19:09       ` Kok, Auke
2007-12-20 19:11         ` Arjan van de Ven
2007-12-20 19:22           ` Kok, Auke
2007-12-20 19:42             ` Arjan van de Ven
2007-12-20 20:00             ` Parag Warudkar
2007-12-20 20:04               ` Arjan van de Ven
2007-12-20 20:36                 ` Parag Warudkar
2007-12-20 21:08                   ` Stephen Hemminger
2007-12-20 21:24                     ` Kok, Auke
2007-12-20 20:17               ` Krzysztof Oledzki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).