From: Thomas Gleixner <tglx@linutronix.de>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Alexey Perevalov <a.perevalov@samsung.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
John Stultz <john.stultz@linaro.org>,
Anton Vorontsov <anton@enomsg.org>,
Kyungmin Park <kyungmin.park@samsung.com>,
cw00.choi@samsung.com, Andrew Morton <akpm@linux-foundation.org>,
Anton Vorontsov <anton.vorontsov@linaro.org>
Subject: Re: [PATCH v4 5/6] timerfd: Add support for deferrable timers
Date: Wed, 5 Mar 2014 12:40:25 +0100 (CET)
Message-ID: <alpine.DEB.2.02.1403050146560.18573@ionos.tec.linutronix.de>
In-Reply-To: <CALCETrVxvCaLUyeMoaEHXvUzOgj_531HENu1G90_WKnS3dE4zA@mail.gmail.com>

On Tue, 4 Mar 2014, Andy Lutomirski wrote:
> On Tue, Mar 4, 2014 at 4:10 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > A slacked timer still gets enqueued into the main timer queue. It just
> > relies on the fact that it gets batched with some other expiring
> > timer. But that's completely different to the deferrable approach.
> >
> > start_timer(timer, expiry, slack);
> >
> > timer.hard_expiry = expiry + slack;
> > timer.soft_expiry = expiry;
> > enqueue_timer(timer, timer.hard_expiry);
> >
> > The enqueueing code puts it into the queue by looking at the
> > hard_expiry value. And the expiry code looks at the timer.soft_expiry
> > value to expire a timer early.
> >
> > Now assume the following:
> >
> > start_timer(timer, +100ms, 100s);
> >
> > So that puts that timer into the hard expiry line of 100.1 sec from
> > now. So if the cpu is busy and is firing a lot of timers then your
> > timer could be delayed up to the hard expiry time, i.e. 100.1 seconds
> > from now, which has completely different semantics than the
> > deferrable timers.
>
> Erk. I didn't realize that. Is that really the desired behavior? I

It's the implemented behaviour for a reason.

> assumed that a timer with slack would fire at the earliest time after
> the soft timeout at which the system wasn't idle. The idea is to
> batch wakeups, right?

Correct. And that's why the slack thing was invented. Not the best
invention, but it solved a problem without creating a cast in stone
new user space ABI. And it was simple to do with the existing
RB-Tree. Otherwise you'd need a Priority Search Tree which handles
overlapping expiry ranges.
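
As a rough illustration of the slack scheme from the quoted pseudocode,
here is a minimal userspace model in C. It is a sketch only: the names
model_timer, model_start_timer, model_next_wakeup and model_run_expired
are hypothetical and are not kernel interfaces, and a plain array stands
in for the RB-tree. It shows the two roles of the expiry values: the
wakeup is programmed from the hard expiry, while expiry processing fires
anything whose soft expiry has already passed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a slacked timer; not the kernel's struct hrtimer. */
struct model_timer {
    uint64_t soft_expiry;   /* earliest time the timer may fire */
    uint64_t hard_expiry;   /* soft_expiry + slack; used as the queue key */
    int fired;
};

/* Mirrors start_timer(timer, expiry, slack) from the pseudocode. */
void model_start_timer(struct model_timer *t, uint64_t expiry, uint64_t slack)
{
    assert(expiry <= UINT64_MAX - slack);   /* keep the sum from wrapping */
    t->soft_expiry = expiry;
    t->hard_expiry = expiry + slack;
    t->fired = 0;
}

/* The next wakeup is derived from the earliest pending hard expiry only. */
uint64_t model_next_wakeup(const struct model_timer *timers, size_t n)
{
    uint64_t next = UINT64_MAX;   /* UINT64_MAX means "nothing pending" */

    for (size_t i = 0; i < n; i++)
        if (!timers[i].fired && timers[i].hard_expiry < next)
            next = timers[i].hard_expiry;
    return next;
}

/*
 * Expiry processing at time 'now': fire every pending timer whose soft
 * expiry has passed, even if its hard expiry is still in the future.
 * Returns the number of timers fired.
 */
size_t model_run_expired(struct model_timer *timers, size_t n, uint64_t now)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (!timers[i].fired && timers[i].soft_expiry <= now) {
            timers[i].fired = 1;
            count++;
        }
    }
    return count;
}
```

With time counted in milliseconds, model_start_timer(&t, 100, 100000)
corresponds to start_timer(timer, +100ms, 100s) and yields a hard expiry
of 100100. If nothing else triggers expiry processing before then, the
model does not fire the timer any earlier.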
> > The deferrable timer is guaranteed to expire (halfways) on time when
> > the system is active and does not affect the system from going idle,
> > but it expires right away when the system comes back out of idle.
> >
> > The slack timers are just a batching mechanism to align expiry times
> > of non deferrable timers to a common time.
> >
> > So how do you map those together?
>
> By thinking of what semantics are actually useful for userspace developers.
>
> I think that most userspace developers probably want the semantics
> that I thought that timer slack had: I want to do work between time A
> and time B. Before A is too early, but I'm willing to wait until time
> B if it improves power consumption.

Well, that's what slack actually does.

But your assumption that this is what most userspace developers
probably want is wrong. A lot of them want the following:

   Fire me on time when the CPU/system is busy, otherwise ignore me
   for a time X, where X might be infinite.

And you cannot map this to slack. See below.
> Presumably, if the kernel chooses *not* to fire the timer just after
> time A even if the system is awake, then it's risking an unnecessary
> wakeup at time B.
>
> (I admit that I don't really understand the hrtimer code. I guess
> that two indexes on the list of timers would be needed.)
The real problem is that we want to cover the following cases:

 1) Expire me no matter what at X

 2) Expire me no matter what at X + Slack (wakeup batching)

 3) Expire me close to X when the system/cpu is busy, otherwise expire
    me latest at X + Slack

 4) Expire me close to X when the system/cpu is busy, otherwise
    ignore me

#1 and #2 are handled today; #1 is #2 with Slack = 0.

#4 is what I implemented with the extra internal queues and the extra
flag. We can make the internal implementation handle #3 as well,
but we do not have a user space interface for that.
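
The four cases above can be summarised in a small C sketch. This is an
illustration under assumptions, not the patch's implementation: the enum
expiry_case, the NO_DEADLINE marker and both helpers are hypothetical
names. For each case it records two properties taken from the list: the
latest time at which the timer must expire even on an idle system, and
whether expiry close to X is promised while the system/cpu is busy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for cases 1) to 4); none of these exist in the kernel. */
enum expiry_case {
    CASE_AT_X = 1,              /* 1) no matter what at X */
    CASE_AT_X_PLUS_SLACK,       /* 2) no matter what at X + Slack */
    CASE_BUSY_ELSE_SLACK,       /* 3) close to X if busy, else latest X + Slack */
    CASE_BUSY_ELSE_IGNORE       /* 4) close to X if busy, else ignored */
};

/* Marker for "never forces expiry while the system is idle". */
#define NO_DEADLINE UINT64_MAX

/* Latest time by which the timer must expire even if the system is idle. */
uint64_t idle_deadline(enum expiry_case c, uint64_t x, uint64_t slack)
{
    switch (c) {
    case CASE_AT_X:
        return x;               /* same as case 2 with slack == 0 */
    case CASE_AT_X_PLUS_SLACK:
    case CASE_BUSY_ELSE_SLACK:
        return x + slack;
    case CASE_BUSY_ELSE_IGNORE:
        return NO_DEADLINE;
    }
    return NO_DEADLINE;
}

/* True if the case promises expiry close to X while the system/cpu is busy. */
bool close_to_x_when_busy(enum expiry_case c)
{
    /* Plain slack (case 2) only promises X + Slack, busy or not. */
    return c != CASE_AT_X_PLUS_SLACK;
}
```

In this model, cases 2 and 3 share the same idle deadline and differ only
in the busy-time promise, while case 4 is the only one with no deadline
at all.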
> >> Once we agree on a solution to the Y2038 issue on 32bit with a unified
> >> 32/64 bit syscall interface which simply gets rid of the timespec/val
> >> nonsense and takes a simple u64 nsec value we can add the slack
> >> property to that without any further inconvenience.
> >
> > > Ignoring this won't get you anywhere.
>
> I'm not entirely sure why per-timer slack can't be added without
> simultaneously fixing Y2038 (and presumably leap seconds, too) but a
> new flag can be.

The additional flag is fine as it does not introduce a completely new
ABI, it merely extends the existing ABI.

But adding a per call slack is going to introduce a new ABI and I
really don't want to go there as we need to introduce a new ABI for the
Y2038 issue anyway. And that's way more than the few direct timer
related syscalls. Basically we have to look at all syscalls which take
a timespec/timeval.

So no, we are not going to add an ad hoc intermediate ABI which we need
to support forever.
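
For context on the "simple u64 nsec value" mentioned in the quoted text,
the sketch below shows what flattening a seconds/nanoseconds pair into a
single 64-bit nanosecond count might look like. The helper to_u64_nsec
and the constant MODEL_NSEC_PER_SEC are hypothetical names for
illustration, not a proposed interface. A signed 32-bit seconds counter
overflows after 2147483647 seconds (January 2038), whereas a u64
nanosecond count only wraps after roughly 584 years.

```c
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: flatten seconds plus nanoseconds into one u64
 * nanosecond count. No overflow check is done; the result wraps once
 * the total exceeds UINT64_MAX nanoseconds.
 */
uint64_t to_u64_nsec(uint64_t sec, uint32_t nsec)
{
    return sec * MODEL_NSEC_PER_SEC + nsec;
}
```

The first second that no longer fits a signed 32-bit counter,
2147483648, still maps to a valid value here: 2147483648000000000 ns.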
Thanks,
tglx