linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Artem Savkov <asavkov@redhat.com>,
	jpoimboe@redhat.com, netdev@vger.kernel.org
Cc: davem@davemloft.net, yoshfuji@linux-ipv6.org, dsahern@kernel.org,
	linux-kernel@vger.kernel.org, Artem Savkov <asavkov@redhat.com>,
	Anna-Maria Gleixner <anna-maria@linutronix.de>
Subject: Re: [PATCH 1/2] timer: introduce upper bound timers
Date: Sat, 26 Mar 2022 22:13:52 +0100	[thread overview]
Message-ID: <87zglcfmcv.ffs@tglx> (raw)
In-Reply-To: <87tubn8rgk.ffs@tglx>

Artem,

On Thu, Mar 24 2022 at 13:28, Thomas Gleixner wrote:
> On Wed, Mar 23 2022 at 12:16, Artem Savkov wrote:
>>  	 * Round up with level granularity to prevent this.
>> +	 * Do not perform round up in case of upper bound timer.
>>  	 */
>> -	expires = (expires + LVL_GRAN(lvl)) >> LVL_SHIFT(lvl);
>> +	if (upper_bound)
>> +		expires = expires >> LVL_SHIFT(lvl);
>> +	else
>> +		expires = (expires + LVL_GRAN(lvl)) >> LVL_SHIFT(lvl);
>
> While this "works", I fundamentally hate this because it adds an extra

actually it cannot not work. At least not in a realiable and predictable
way.

The timer wheel is fundamentaly relying on moving the timer one bucket
out. Let's look how this all works.

The wheel has N levels of bucket arrays. Each level has 64 buckets and
the granularity increases by a factor of 8 per level, i.e. the worst
case deviation is 12.5% per level.

The original timer wheel was able to fire the timer at expiry time + one
tick for the price of cascading timers every so often from one level to
the next lower level. Here are a few pointers:

    https://lwn.net/Articles/152436/
    https://lwn.net/Articles/646950/

The accuracy of the original wheel implementation was weakened already
in course of the NOHZ development where the concept of slack
(algorithmic batching at enqueue time for the price of overhead) was
introduced. It had the same 12.5% worst case deviation of the resulting
granularity level, though the batching was not enforced and only worked
most of the time. So in theory the same issue could have been seen back
then already.

The enqueue placement and the expiry is driven by base::clock, which
is nothing else than jiffies. When a timer is queued then base::clock is
the time on which the next tick and expiry check happens.

Now let's look how the expiry mechanism works. The first level (0) is
obviously expired every tick. On every eigth tick the second level (1) is
expired, on every 64th tick the third level (2)...

IOW, the expiry events of a level happen at 8^index(level) intervals.

Let's assume that base::clk is 0. That means at the next tick (which
could be imminent) in _all_ levels bucket[0] is due for expiry.

Now let's enqueue a timer with expiry value of 64:

    delta = 64 - 0 = 64

That means the timer ends up in the second level in bucket[0], which
makes it eligible for expiry at the next tick. The same is true for any
expiry value of 8^N.

Not what you are trying to achieve, right? You try to enforce an upper
bound, but you expect that the timer does not fire earlier than 12.5% of
the granularity level of that upper bound, right?

IOW, you created a expiry lottery and there is no way to prevent that
except with more conditionals and heuristics which are hardely welcomed.

You've also seen the outcome of a timer firing unexpectedly due to the
bit abuse, right?

Now let's take a step back and look at the problem at hand.

TCP alive timer, which is affected by the batching and the resulting
inaccuracy, is (re)armed with a relative timeout, which is known
upfront, right?

The important part is *relative* timeout. Why?

Because the timeout is relative it's trivial to calculate a relative
timeout value from the given accurate relative timeout value, which is
guaranteed to not expire late and within the timerwheel's error margin,
and use that for the actual rearming.

I'm pretty sure that you can come up with a conversion function for that
and use this function in the TCP code at the point where the TCP
keepalive timer is configured.

Hint 1: The output of calc_wheel_index() should be good enough to figure
        that out.

Hint 2: If you get frustrated, try

        git grep johnstul arch/x86/ | awk '{ $1=$2=$3=""; print $0 }'

Hint 3: Feel free to ask questions anytime

Thanks,

        tglx

  parent reply	other threads:[~2022-03-26 21:14 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-23 11:16 [PATCH 0/2] Upper bound mode for kernel timers Artem Savkov
2022-03-23 11:16 ` [PATCH 1/2] timer: introduce upper bound timers Artem Savkov
2022-03-23 18:40   ` Josh Poimboeuf
2022-03-24  9:14     ` [PATCH v2 0/2] Upper bound mode for kernel timers Artem Savkov
2022-03-24  9:14       ` [PATCH v2 1/2] timer: introduce upper bound timers Artem Savkov
2022-03-24  9:15       ` [PATCH v2 2/2] net: make tcp keepalive timer upper bound Artem Savkov
2022-03-24 12:28   ` [PATCH 1/2] timer: introduce upper bound timers Thomas Gleixner
2022-03-24 13:54     ` Thomas Gleixner
2022-03-26 21:13     ` Thomas Gleixner [this message]
2022-03-30  8:20       ` [PATCH v3 0/2] Upper bound kernel timers Artem Savkov
2022-03-30  8:20         ` [PATCH v3 1/2] timer: add a function to adjust timeouts to be upper bound Artem Savkov
2022-03-30 13:40           ` Anna-Maria Behnsen
2022-04-02  6:55             ` Artem Savkov
2022-04-05 15:33               ` Thomas Gleixner
2022-04-07  7:52                 ` [PATCH v4 0/2] Upper bound kernel timers Artem Savkov
2022-04-07  7:52                   ` [PATCH v4 1/2] timer: add a function to adjust timeouts to be upper bound Artem Savkov
2022-04-08  0:37                     ` Thomas Gleixner
2022-04-08  5:39                       ` Josh Poimboeuf
2022-04-12 13:42                       ` Artem Savkov
2022-05-05 13:18                       ` [PATCH v5 0/2] Upper bound kernel timers Artem Savkov
2022-05-05 13:18                         ` [PATCH v5 1/2] timer: add a function to adjust timeouts to be upper bound Artem Savkov
2022-05-05 13:18                         ` [PATCH v5 2/2] net: make tcp keepalive timer " Artem Savkov
2022-05-05 17:56                           ` Josh Poimboeuf
2022-05-06  6:39                             ` Artem Savkov
2022-05-06 16:24                               ` Josh Poimboeuf
2022-07-26 22:42                         ` [PATCH v5 0/2] Upper bound kernel timers Josh Poimboeuf
2022-04-07  7:52                   ` [PATCH v4 2/2] net: make tcp keepalive timer upper bound Artem Savkov
     [not found]                 ` <Yk1i3WrcVIICAiF0@samus.usersys.redhat.com>
2022-04-07 23:26                   ` [PATCH v3 1/2] timer: add a function to adjust timeouts to be " Thomas Gleixner
2022-03-30  8:20         ` [PATCH v3 2/2] net: make tcp keepalive timer " Artem Savkov
2022-04-02  3:09           ` [net] 6ef3f95797: UBSAN:shift-out-of-bounds_in_kernel/time/timer.c kernel test robot
2022-04-02  7:11             ` Artem Savkov
2022-03-30 10:28         ` [PATCH v3 0/2] Upper bound kernel timers David Laight
2022-03-25  7:38   ` [timer] d41e0719d5: UBSAN:shift-out-of-bounds_in_lib/flex_proportions.c kernel test robot
2022-03-25 19:14     ` Thomas Gleixner
2022-03-23 11:16 ` [PATCH 2/2] net: make tcp keepalive timer upper bound Artem Savkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zglcfmcv.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=anna-maria@linutronix.de \
    --cc=asavkov@redhat.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=jpoimboe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).