* Upper bound mode for kernel timers
@ 2021-03-02  0:10 Josh Poimboeuf
From: Josh Poimboeuf @ 2021-03-02  0:10 UTC
  To: Thomas Gleixner; +Cc: Artem Savkov, linux-kernel, Anna-Maria Behnsen

Hi Thomas,

As discussed on IRC:

We had a report of a regression in the TCP keepalive timer.  The user
had a 3600s keepalive timer for preventing firewall disconnects (on a
3650s interval).  They observed keepalive timers coming in up to four
minutes late, causing unexpected disconnects.

The regression was observed to have come from the timer wheel rewrite
from almost five years ago:

  500462a9de65 ("timers: Switch to a non-cascading wheel")

As you mentioned, with a HZ of 1000, the granularity for a one-hour
timer is four minutes, which matches the seen behavior.
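
For reference, the four-minute figure can be reproduced with a small model of the wheel's level selection. This is only a sketch: the constants (LVL_CLK_SHIFT = 3, 64 buckets per level, 9 levels for HZ > 100) are assumed from the layout described in kernel/time/timer.c, and the Python function names are illustrative, not kernel APIs.

```python
# Simplified model of level selection in the non-cascading timer wheel.
# Constants are assumptions taken from kernel/time/timer.c (HZ > 100 case).
LVL_CLK_SHIFT = 3            # each level is 8x coarser than the one below
LVL_BITS = 6
LVL_SIZE = 1 << LVL_BITS     # 64 buckets per level
LVL_DEPTH = 9                # number of levels when HZ > 100


def lvl_start(level):
    """First delta (in jiffies) that is queued at `level` (level >= 1)."""
    return (LVL_SIZE - 1) << ((level - 1) * LVL_CLK_SHIFT)


def wheel_level(delta_jiffies):
    """Level a timer with the given relative expiry would be queued at."""
    for level in range(1, LVL_DEPTH):
        if delta_jiffies < lvl_start(level):
            return level - 1
    return LVL_DEPTH - 1


def granularity_jiffies(level):
    """Bucket width of `level`, in jiffies."""
    return 1 << (level * LVL_CLK_SHIFT)


HZ = 1000
delta = 3600 * HZ                        # one-hour keepalive timer
level = wheel_level(delta)
gran_seconds = granularity_jiffies(level) / HZ
print(level, gran_seconds)               # 6 262.144 (about 4.4 minutes)
```

Under these assumptions a 3600s timer lands in level 6, whose buckets are 262144 jiffies wide, i.e. roughly 4 minutes 22 seconds at HZ=1000.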

To "fix" it, the user can just lower the timeout value by four minutes,
but that's a workaround, because the keepalive timer isn't working as
advertised.

One potential fix would be an "upper bound mode" in the timer, i.e. give
the user a way to specify that the given 'expires' value is an upper
bound rather than a lower bound.
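
To make the difference concrete, here is a rough sketch of the two rounding behaviors. It assumes the same wheel constants as kernel/time/timer.c (LVL_CLK_SHIFT = 3, 64 buckets per level, 9 levels), assumes the wheel clock is at 0 when the timer is armed, and models the current behavior as rounding up to the next bucket boundary. The "upper bound" function is purely hypothetical; no such interface exists today, and the real lateness depends on where the wheel clock is when the timer is queued.

```python
# Sketch: lower-bound (current) vs. hypothetical upper-bound bucket rounding.
# Constants are assumptions based on kernel/time/timer.c; names are illustrative.
LVL_CLK_SHIFT = 3
LVL_SIZE = 64
LVL_DEPTH = 9


def wheel_level(delta_jiffies):
    """Level a timer with the given relative expiry would be queued at."""
    for level in range(1, LVL_DEPTH):
        if delta_jiffies < (LVL_SIZE - 1) << ((level - 1) * LVL_CLK_SHIFT):
            return level - 1
    return LVL_DEPTH - 1


def fire_time_lower_bound(expires):
    """Current semantics: never early, so round up to the next bucket."""
    shift = wheel_level(expires) * LVL_CLK_SHIFT
    return ((expires + (1 << shift)) >> shift) << shift


def fire_time_upper_bound(expires):
    """Hypothetical semantics: never late, so round down to the bucket start."""
    shift = wheel_level(expires) * LVL_CLK_SHIFT
    return (expires >> shift) << shift


HZ = 1000
expires = 3600 * HZ                          # wheel clock assumed to be 0
late = fire_time_lower_bound(expires)        # 3670016 jiffies = 3670.016s
early = fire_time_upper_bound(expires)       # 3407872 jiffies = 3407.872s
print(late / HZ, early / HZ)
```

In this particular phase the lower-bound rounding fires at about 3670s, past the 3650s firewall interval, while rounding down would fire at about 3408s, inside it.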

As you graciously offered, if you or Anna-Maria can implement that new
interface, we (Artem or I) can write up a patch to use it for the
keepalive timer.

-- 
Josh



* RE: Upper bound mode for kernel timers
@ 2021-03-03 14:06 ` David Laight
From: David Laight @ 2021-03-03 14:06 UTC
  To: 'Josh Poimboeuf', Thomas Gleixner
  Cc: Artem Savkov, linux-kernel, Anna-Maria Behnsen

From: Josh Poimboeuf
> Sent: 02 March 2021 00:11
> 
> We had a report of a regression in the TCP keepalive timer.  The user
> had a 3600s keepalive timer for preventing firewall disconnects (on a
> 3650s interval).  They observed keepalive timers coming in up to four
> minutes late, causing unexpected disconnects.
> 
> The regression was observed to have come from the timer wheel rewrite
> from almost five years ago:
> 
>   500462a9de65 ("timers: Switch to a non-cascading wheel")
> 
> As you mentioned, with a HZ of 1000, the granularity for a one-hour
> timer is four minutes, which matches the seen behavior.

That seems horribly broken - if technically valid.

Reading the big comment, even the 32sec granularity of the next finer
wheel level seems a little coarse for a 1h timer.
The level two steps finer has 4sec resolution, which is probably reasonable.

	David

