linux-kernel.vger.kernel.org archive mirror
From: Thomas Gleixner <tglx@linutronix.de>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Alexey Perevalov <a.perevalov@samsung.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	John Stultz <john.stultz@linaro.org>,
	Anton Vorontsov <anton@enomsg.org>,
	Kyungmin Park <kyungmin.park@samsung.com>,
	cw00.choi@samsung.com, Andrew Morton <akpm@linux-foundation.org>,
	Anton Vorontsov <anton.vorontsov@linaro.org>
Subject: Re: [PATCH v4 5/6] timerfd: Add support for deferrable timers
Date: Wed, 5 Mar 2014 12:40:25 +0100 (CET)
Message-ID: <alpine.DEB.2.02.1403050146560.18573@ionos.tec.linutronix.de>
In-Reply-To: <CALCETrVxvCaLUyeMoaEHXvUzOgj_531HENu1G90_WKnS3dE4zA@mail.gmail.com>

On Tue, 4 Mar 2014, Andy Lutomirski wrote:
> On Tue, Mar 4, 2014 at 4:10 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > A slacked timer still gets enqueued into the main timer queue. It just
> > relies on the fact that it gets batched with some other expiring
> > timer. But that's completely different from the deferrable approach.
> >
> >        start_timer(timer, expiry, slack);
> >
> >            timer.hard_expiry = expiry + slack;
> >            timer.soft_expiry = expiry;
> >            enqueue_timer(timer, timer.hard_expiry);
> >
> > The enqueueing code puts it into the queue by looking at the
> > hard_expiry value. The expiry code looks at the timer.soft_expiry
> > value to expire a timer early.
> >
> > Now assume the following:
> >
> >        start_timer(timer, +100ms, 100s);
> >
> > So that puts the timer into the queue at its hard expiry, 100.1 seconds
> > from now. If the CPU is busy and firing a lot of timers, your timer
> > could be delayed up to the hard expiry time, i.e. 100.1 seconds from
> > now, which has completely different semantics from the deferrable
> > timers.
> 
> Erk.  I didn't realize that.  Is that really the desired behavior?  I

It's the implemented behaviour for a reason.

> assumed that a timer with slack would fire at the earliest time after
> the soft timeout at which the system wasn't idle.  The idea is to
> batch wakeups, right?

Correct. And that's why the slack thing was invented. Not the best
invention, but it solved a problem without creating a cast-in-stone
new user space ABI. And it was simple to do with the existing
RB-Tree. Otherwise you'd need a Priority Search Tree which handles
overlapping expiry ranges.
 
> > The deferrable timer is guaranteed to expire (roughly) on time when
> > the system is active and does not prevent the system from going idle,
> > but it expires right away when the system comes back out of idle.
> >
> > The slack timers are just a batching mechanism to align expiry times
> > of non-deferrable timers to a common time.
> >
> > So how do you map those together?
> 
> By thinking of what semantics are actually useful for userspace developers.
> 
> I think that most userspace developers probably want the semantics
> that I thought that timer slack had: I want to do work between time A
> and time B.  Before A is too early, but I'm willing to wait until time
> B if it improves power consumption.

Well, that's what slack actually does.

But your assumption that this is what most userspace developers
probably want is wrong. A lot of them want the following:

   Fire me on time when the CPU/system is busy, otherwise ignore me
   for a time X, where X might be infinite.

And you cannot map this to slack. See below.
 
> Presumably, if the kernel chooses *not* to fire the timer just after
> time A even if the system is awake, then it's risking an unnecessary
> wakeup at time B.
> 
> (I admit that I don't really understand the hrtimer code.  I guess
> that two indexes on the list of timers would be needed.)

The real problem is that we want to cover the following cases:

    1) Expire me no matter what at X

    2) Expire me no matter what at X + Slack (wakeup batching)

    3) Expire me close to X when the system/cpu is busy, otherwise
       expire me at X + Slack at the latest

    4) Expire me close to X when the system/cpu is busy, otherwise
       ignore me

#1 and #2 are handled today; #1 is just #2 with Slack = 0.

#4 is what I implemented with the extra internal queues and the extra
flag. We can make the internal implementation handle #3 as well,
but we do not have a user space interface for that.

> >> Once we agree on a solution to the Y2038 issue on 32-bit with a unified
> >> 32/64-bit syscall interface, which simply gets rid of the timespec/timeval
> >> nonsense and takes a simple u64 nsec value, we can add the slack
> >> property to that without any further inconvenience.
> >
> > Ignoring this won't get you anywhere.
> 
> I'm not entirely sure why per-timer slack can't be added without
> simultaneously fixing Y2038 (and presumably leap seconds, too) but a
> new flag can be.

The additional flag is fine as it does not introduce a completely new
ABI; it merely extends the existing ABI.

But adding a per-call slack is going to introduce a new ABI and I
really don't want to go there, as we need to introduce a new ABI for the
Y2038 issue anyway. And that's way more than the few direct
timer-related syscalls. Basically we have to look at all syscalls which
take a timespec/timeval.

So no, we are not going to add an ad hoc intermediate ABI which we need
to support forever.

Thanks,

	tglx


Thread overview: 19+ messages
2014-02-20 16:23 [PATCH v4 0/6] Deferrable timers support for hrtimers/timerfd API Alexey Perevalov
2014-02-20 16:23 ` [PATCH v4 1/6] Replace ternary operator to macro Alexey Perevalov
2014-02-20 20:49   ` Thomas Gleixner
2014-02-20 16:23 ` [PATCH v4 2/6] tracing/trivial: Add CLOCK_BOOTIME and CLOCK_TAI for human readable clockid trace Alexey Perevalov
2014-02-20 20:49   ` Thomas Gleixner
2014-02-20 16:23 ` [PATCH v4 3/6] hrtimer: Add support for deferrable timer into the hrtimer Alexey Perevalov
2014-02-20 16:23 ` [PATCH v4 4/6] timerfd: Move repeated logic into timerfd_rearm() Alexey Perevalov
2014-02-20 21:13   ` Thomas Gleixner
2014-02-20 16:23 ` [PATCH v4 5/6] timerfd: Add support for deferrable timers Alexey Perevalov
2014-02-26  2:53   ` Andy Lutomirski
2014-03-04 20:58     ` Thomas Gleixner
2014-03-04 21:53       ` Andy Lutomirski
2014-03-04 22:11         ` Thomas Gleixner
2014-03-04 22:43           ` Andy Lutomirski
2014-03-05  0:10             ` Thomas Gleixner
2014-03-05  0:42               ` Andy Lutomirski
2014-03-05 11:40                 ` Thomas Gleixner [this message]
2014-03-05  9:42           ` Richard Cochran
2014-02-20 16:23 ` [PATCH v4 6/6] tracing/trivial: Add CLOCK_*_DEFERRABLE for tracing clockids Alexey Perevalov
