linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/6] Deferrable timers support for hrtimers/timerfd API
@ 2014-02-20 16:23 Alexey Perevalov
  2014-02-20 16:23 ` [PATCH v4 1/6] Replace ternary operator to macro Alexey Perevalov
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Alexey Perevalov @ 2014-02-20 16:23 UTC (permalink / raw)
  To: linux-kernel, tglx, john.stultz
  Cc: Alexey Perevalov, anton, kyungmin.park, cw00.choi, akpm

Hello,

This patch set introduces deferrable functionality for hrtimers and add ability
to use it from timerfd_create syscall.
It means user-space processes are able to not wake up the system from cpuidle state
in case of periodic job. And it's the main goal of this patch set for us and for team
sent such patch set before.
I hope it will find users in kernel space too, cause it could avoid timer_list drawbacks,
such as worst case when many,many timers are gonna to be expired.


Kernel already has deferrable timer funcionality based on timer_list. But deferrable
functionality inside hrtimers makes timerfd implemenation trivial for such case.

It's based on github.com/torvalds/linux.git git repository on top of
e95003c3f9ccbfa7ab9d265e6eb703ee2fa4cfe7

Alexey Perevalov (3):
  Replace ternary operator to macro
  tracing/trivial: Add CLOCK_BOOTIME and CLOCK_TAI for human readable
    clockid trace
  tracing/trivial: Add CLOCK_*_DEFERRABLE for tracing clockids

Anton Vorontsov (2):
  timerfd: Move repeated logic into timerfd_rearm()
  timerfd: Add support for deferrable timers

Thomas Gleixner (1):
  hrtimer: Add support for deferrable timer into the hrtimer

 fs/timerfd.c                 |   47 ++++++++++++++++----------------
 include/linux/hrtimer.h      |    3 ++
 include/trace/events/timer.h |   25 ++++++++++++++---
 include/uapi/linux/time.h    |    3 ++
 kernel/hrtimer.c             |   62 +++++++++++++++++++++++++++++++++---------
 5 files changed, 100 insertions(+), 40 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 1/6] Replace ternary operator to macro
  2014-02-20 16:23 [PATCH v4 0/6] Deferrable timers support for hrtimers/timerfd API Alexey Perevalov
@ 2014-02-20 16:23 ` Alexey Perevalov
  2014-02-20 20:49   ` Thomas Gleixner
  2014-02-20 16:23 ` [PATCH v4 2/6] tracing/trivial: Add CLOCK_BOOTIME and CLOCK_TAI for human readable clockid trace Alexey Perevalov
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Alexey Perevalov @ 2014-02-20 16:23 UTC (permalink / raw)
  To: linux-kernel, tglx, john.stultz
  Cc: Alexey Perevalov, anton, kyungmin.park, cw00.choi, akpm

For extensibility purpose it's better to have _switch_ type mechanism
for representing human readable CLOCKID* and HRMODE.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 include/trace/events/timer.h |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 68c2c20..185b2c6 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -8,6 +8,16 @@
 #include <linux/hrtimer.h>
 #include <linux/timer.h>
 
+#define clockid_to_string(clockid)						\
+	__print_symbolic(clockid,						\
+			 { CLOCK_MONOTONIC,		"CLOCK_MONOTONIC" },	\
+			 { CLOCK_REALTIME,		"CLOCK_REALTIME" })
+
+#define hrmode_to_string(hrmode)						\
+	__print_symbolic(hrmode,						\
+			 { HRTIMER_MODE_ABS,		"HRTIMER_MODE_ABS" },	\
+			 { HRTIMER_MODE_REL,		"HRTIMER_MODE_REL" })
+
 DECLARE_EVENT_CLASS(timer_class,
 
 	TP_PROTO(struct timer_list *timer),
@@ -147,10 +157,8 @@ TRACE_EVENT(hrtimer_init,
 	),
 
 	TP_printk("hrtimer=%p clockid=%s mode=%s", __entry->hrtimer,
-		  __entry->clockid == CLOCK_REALTIME ?
-			"CLOCK_REALTIME" : "CLOCK_MONOTONIC",
-		  __entry->mode == HRTIMER_MODE_ABS ?
-			"HRTIMER_MODE_ABS" : "HRTIMER_MODE_REL")
+		  clockid_to_string(__entry->clockid),
+		  hrmode_to_string(__entry->mode))
 );
 
 /**
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 2/6] tracing/trivial: Add CLOCK_BOOTIME and CLOCK_TAI for human readable clockid trace
  2014-02-20 16:23 [PATCH v4 0/6] Deferrable timers support for hrtimers/timerfd API Alexey Perevalov
  2014-02-20 16:23 ` [PATCH v4 1/6] Replace ternary operator to macro Alexey Perevalov
@ 2014-02-20 16:23 ` Alexey Perevalov
  2014-02-20 20:49   ` Thomas Gleixner
  2014-02-20 16:23 ` [PATCH v4 3/6] hrtimer: Add support for deferrable timer into the hrtimer Alexey Perevalov
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Alexey Perevalov @ 2014-02-20 16:23 UTC (permalink / raw)
  To: linux-kernel, tglx, john.stultz
  Cc: Alexey Perevalov, anton, kyungmin.park, cw00.choi, akpm

These clockids is also used in current hrtimer implementation.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 include/trace/events/timer.h |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 185b2c6..547b79f 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -11,7 +11,10 @@
 #define clockid_to_string(clockid)						\
 	__print_symbolic(clockid,						\
 			 { CLOCK_MONOTONIC,		"CLOCK_MONOTONIC" },	\
-			 { CLOCK_REALTIME,		"CLOCK_REALTIME" })
+			 { CLOCK_REALTIME,		"CLOCK_REALTIME" },	\
+			 { CLOCK_BOOTTIME,		"CLOCK_BOOTTIME" },	\
+			 { CLOCK_TAI,			"CLOCK_TAI" }		\
+	)
 
 #define hrmode_to_string(hrmode)						\
 	__print_symbolic(hrmode,						\
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 3/6] hrtimer: Add support for deferrable timer into the hrtimer
  2014-02-20 16:23 [PATCH v4 0/6] Deferrable timers support for hrtimers/timerfd API Alexey Perevalov
  2014-02-20 16:23 ` [PATCH v4 1/6] Replace ternary operator to macro Alexey Perevalov
  2014-02-20 16:23 ` [PATCH v4 2/6] tracing/trivial: Add CLOCK_BOOTIME and CLOCK_TAI for human readable clockid trace Alexey Perevalov
@ 2014-02-20 16:23 ` Alexey Perevalov
  2014-02-20 16:23 ` [PATCH v4 4/6] timerfd: Move repeated logic into timerfd_rearm() Alexey Perevalov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Alexey Perevalov @ 2014-02-20 16:23 UTC (permalink / raw)
  To: linux-kernel, tglx, john.stultz
  Cc: anton, kyungmin.park, cw00.choi, akpm, Alexey Perevalov

From: Thomas Gleixner <tglx@linutronix.de>

This patch introduces new public CLOCKID constants for
user space API, such as timerfd. It extends hrtimer API and makes
possible to have unified interfaces where deferreble functionality is
used. In-kernel users such as device drivers could find benefits too.

High resolution timer now could work with CLOCK_REALTIME_DEFERRABLE,
CLOCK_MONOTONIC_DEFERRABLE, CLOCK_BOOTTIME_DEFERRABLE.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 include/linux/hrtimer.h   |    3 +++
 include/uapi/linux/time.h |    3 +++
 kernel/hrtimer.c          |   62 +++++++++++++++++++++++++++++++++++----------
 3 files changed, 55 insertions(+), 13 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index d19a5c2..fe1159c 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -158,6 +158,9 @@ enum  hrtimer_base_type {
 	HRTIMER_BASE_REALTIME,
 	HRTIMER_BASE_BOOTTIME,
 	HRTIMER_BASE_TAI,
+	HRTIMER_BASE_MONOTONIC_DEFERRABLE,
+	HRTIMER_BASE_REALTIME_DEFERRABLE,
+	HRTIMER_BASE_BOOTTIME_DEFERRABLE,
 	HRTIMER_MAX_CLOCK_BASES,
 };
 
diff --git a/include/uapi/linux/time.h b/include/uapi/linux/time.h
index e75e1b6..bb8dc60 100644
--- a/include/uapi/linux/time.h
+++ b/include/uapi/linux/time.h
@@ -56,6 +56,9 @@ struct itimerval {
 #define CLOCK_BOOTTIME_ALARM		9
 #define CLOCK_SGI_CYCLE			10	/* Hardware specific */
 #define CLOCK_TAI			11
+#define CLOCK_REALTIME_DEFERRABLE	12
+#define CLOCK_MONOTONIC_DEFERRABLE	13
+#define CLOCK_BOOTTIME_DEFERRABLE	14
 
 #define MAX_CLOCKS			16
 #define CLOCKS_MASK			(CLOCK_REALTIME | CLOCK_MONOTONIC)
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 0909436..d1478fc 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -92,14 +92,35 @@ DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =
 			.get_time = &ktime_get_clocktai,
 			.resolution = KTIME_LOW_RES,
 		},
+		{
+			.index = HRTIMER_BASE_MONOTONIC_DEFERRABLE,
+			.clockid = CLOCK_MONOTONIC_DEFERRABLE,
+			.get_time = &ktime_get,
+			.resolution = KTIME_LOW_RES,
+		},
+		{
+			.index = HRTIMER_BASE_REALTIME_DEFERRABLE,
+			.clockid = CLOCK_REALTIME_DEFERRABLE,
+			.get_time = &ktime_get_real,
+			.resolution = KTIME_LOW_RES,
+		},
+		{
+			.index = HRTIMER_BASE_BOOTTIME_DEFERRABLE,
+			.clockid = CLOCK_BOOTTIME_DEFERRABLE,
+			.get_time = &ktime_get_boottime,
+			.resolution = KTIME_LOW_RES,
+		},
 	}
 };
 
 static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
-	[CLOCK_REALTIME]	= HRTIMER_BASE_REALTIME,
-	[CLOCK_MONOTONIC]	= HRTIMER_BASE_MONOTONIC,
-	[CLOCK_BOOTTIME]	= HRTIMER_BASE_BOOTTIME,
-	[CLOCK_TAI]		= HRTIMER_BASE_TAI,
+	[CLOCK_REALTIME]		= HRTIMER_BASE_REALTIME,
+	[CLOCK_MONOTONIC]		= HRTIMER_BASE_MONOTONIC,
+	[CLOCK_BOOTTIME]		= HRTIMER_BASE_BOOTTIME,
+	[CLOCK_TAI]			= HRTIMER_BASE_TAI,
+	[CLOCK_REALTIME_DEFERRABLE]	= HRTIMER_BASE_REALTIME_DEFERRABLE,
+	[CLOCK_MONOTONIC_DEFERRABLE]	= HRTIMER_BASE_MONOTONIC_DEFERRABLE,
+	[CLOCK_BOOTTIME_DEFERRABLE]	= HRTIMER_BASE_BOOTTIME_DEFERRABLE,
 };
 
 static inline int hrtimer_clockid_to_base(clockid_t clock_id)
@@ -194,7 +215,8 @@ hrtimer_check_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base)
 #ifdef CONFIG_HIGH_RES_TIMERS
 	ktime_t expires;
 
-	if (!new_base->cpu_base->hres_active)
+	if (!new_base->cpu_base->hres_active ||
+	    new_base->index >= HRTIMER_BASE_MONOTONIC_DEFERRABLE)
 		return 0;
 
 	expires = ktime_sub(hrtimer_get_expires(timer), new_base->offset);
@@ -556,7 +578,7 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)
 
 	expires_next.tv64 = KTIME_MAX;
 
-	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++, base++) {
+	for (i = 0; i < HRTIMER_BASE_MONOTONIC_DEFERRABLE; i++, base++) {
 		struct hrtimer *timer;
 		struct timerqueue_node *next;
 
@@ -615,6 +637,13 @@ static int hrtimer_reprogram(struct hrtimer *timer,
 		return 0;
 
 	/*
+	 * Deferrable timers are not touching the underlying
+	 * hardware.
+	 */
+	if (base->index >= HRTIMER_BASE_MONOTONIC_DEFERRABLE)
+		return 0;
+
+	/*
 	 * CLOCK_REALTIME timer might be requested with an absolute
 	 * expiry time which is less than base->offset. Nothing wrong
 	 * about that, just avoid to call into the tick code, which
@@ -924,7 +953,10 @@ static void __remove_hrtimer(struct hrtimer *timer,
 
 			expires = ktime_sub(hrtimer_get_expires(timer),
 					    base->offset);
-			if (base->cpu_base->expires_next.tv64 == expires.tv64)
+
+			/* We only care about non deferrable timers here */
+			if (base->index <= HRTIMER_BASE_MONOTONIC_DEFERRABLE &&
+			    base->cpu_base->expires_next.tv64 == expires.tv64)
 				hrtimer_force_reprogram(base->cpu_base, 1);
 		}
 #endif
@@ -1152,7 +1184,7 @@ ktime_t hrtimer_get_next_event(void)
 	raw_spin_lock_irqsave(&cpu_base->lock, flags);
 
 	if (!hrtimer_hres_active()) {
-		for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++, base++) {
+		for (i = 0; i < HRTIMER_BASE_MONOTONIC_DEFERRABLE; i++, base++) {
 			struct hrtimer *timer;
 			struct timerqueue_node *next;
 
@@ -1284,7 +1316,8 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 {
 	struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
 	ktime_t expires_next, now, entry_time, delta;
-	int i, retries = 0;
+	unsigned long bases;
+	int retries = 0;
 
 	BUG_ON(!cpu_base->hres_active);
 	cpu_base->nr_events++;
@@ -1302,14 +1335,16 @@ retry:
 	 * this CPU.
 	 */
 	cpu_base->expires_next.tv64 = KTIME_MAX;
+	bases = cpu_base->active_bases;
 
-	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
+	while (bases) {
 		struct hrtimer_clock_base *base;
 		struct timerqueue_node *node;
 		ktime_t basenow;
+		int i;
 
-		if (!(cpu_base->active_bases & (1 << i)))
-			continue;
+		i = __ffs(bases);
+		bases &= ~(1 << i);
 
 		base = cpu_base->clock_base + i;
 		basenow = ktime_add(now, base->offset);
@@ -1339,7 +1374,8 @@ retry:
 						    base->offset);
 				if (expires.tv64 < 0)
 					expires.tv64 = KTIME_MAX;
-				if (expires.tv64 < expires_next.tv64)
+				if (expires.tv64 < expires_next.tv64 &&
+				    base->index <= HRTIMER_BASE_MONOTONIC_DEFERRABLE)
 					expires_next = expires;
 				break;
 			}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 4/6] timerfd: Move repeated logic into timerfd_rearm()
  2014-02-20 16:23 [PATCH v4 0/6] Deferrable timers support for hrtimers/timerfd API Alexey Perevalov
                   ` (2 preceding siblings ...)
  2014-02-20 16:23 ` [PATCH v4 3/6] hrtimer: Add support for deferrable timer into the hrtimer Alexey Perevalov
@ 2014-02-20 16:23 ` Alexey Perevalov
  2014-02-20 21:13   ` Thomas Gleixner
  2014-02-20 16:23 ` [PATCH v4 5/6] timerfd: Add support for deferrable timers Alexey Perevalov
  2014-02-20 16:23 ` [PATCH v4 6/6] tracing/trivial: Add CLOCK_*_DEFERRABLE for tracing clockids Alexey Perevalov
  5 siblings, 1 reply; 19+ messages in thread
From: Alexey Perevalov @ 2014-02-20 16:23 UTC (permalink / raw)
  To: linux-kernel, tglx, john.stultz
  Cc: Anton Vorontsov, kyungmin.park, cw00.choi, akpm, Anton Vorontsov,
	Alexey Perevalov

From: Anton Vorontsov <anton@enomsg.org>

This patch introduces timerfd_rearm(), this small helper is used to
forward and restart the hrtimer.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 fs/timerfd.c |   40 +++++++++++++++++++---------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index 9293121..4a349cb 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -229,6 +229,23 @@ static unsigned int timerfd_poll(struct file *file, poll_table *wait)
 	return events;
 }
 
+static u64 timerfd_rearm(struct timerfd_ctx *ctx)
+{
+	u64 orun = 0;
+
+	if (isalarm(ctx)) {
+		orun += alarm_forward_now(
+			&ctx->t.alarm, ctx->tintv) - 1;
+		alarm_restart(&ctx->t.alarm);
+	} else {
+		orun += hrtimer_forward_now(&ctx->t.tmr,
+		     ctx->tintv) - 1;
+		hrtimer_restart(&ctx->t.tmr);
+	}
+
+	return orun;
+}
+
 static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
 			    loff_t *ppos)
 {
@@ -265,15 +282,7 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
 			 * callback to avoid DoS attacks specifying a very
 			 * short timer period.
 			 */
-			if (isalarm(ctx)) {
-				ticks += alarm_forward_now(
-					&ctx->t.alarm, ctx->tintv) - 1;
-				alarm_restart(&ctx->t.alarm);
-			} else {
-				ticks += hrtimer_forward_now(&ctx->t.tmr,
-							     ctx->tintv) - 1;
-				hrtimer_restart(&ctx->t.tmr);
-			}
+			ticks += timerfd_rearm(ctx);
 		}
 		ctx->expired = 0;
 		ctx->ticks = 0;
@@ -421,18 +430,7 @@ static int do_timerfd_gettime(int ufd, struct itimerspec *t)
 	spin_lock_irq(&ctx->wqh.lock);
 	if (ctx->expired && ctx->tintv.tv64) {
 		ctx->expired = 0;
-
-		if (isalarm(ctx)) {
-			ctx->ticks +=
-				alarm_forward_now(
-					&ctx->t.alarm, ctx->tintv) - 1;
-			alarm_restart(&ctx->t.alarm);
-		} else {
-			ctx->ticks +=
-				hrtimer_forward_now(&ctx->t.tmr, ctx->tintv)
-				- 1;
-			hrtimer_restart(&ctx->t.tmr);
-		}
+		ctx->ticks += timerfd_rearm(ctx);
 	}
 	t->it_value = ktime_to_timespec(timerfd_get_remaining(ctx));
 	t->it_interval = ktime_to_timespec(ctx->tintv);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 5/6] timerfd: Add support for deferrable timers
  2014-02-20 16:23 [PATCH v4 0/6] Deferrable timers support for hrtimers/timerfd API Alexey Perevalov
                   ` (3 preceding siblings ...)
  2014-02-20 16:23 ` [PATCH v4 4/6] timerfd: Move repeated logic into timerfd_rearm() Alexey Perevalov
@ 2014-02-20 16:23 ` Alexey Perevalov
  2014-02-26  2:53   ` Andy Lutomirski
  2014-02-20 16:23 ` [PATCH v4 6/6] tracing/trivial: Add CLOCK_*_DEFERRABLE for tracing clockids Alexey Perevalov
  5 siblings, 1 reply; 19+ messages in thread
From: Alexey Perevalov @ 2014-02-20 16:23 UTC (permalink / raw)
  To: linux-kernel, tglx, john.stultz
  Cc: Anton Vorontsov, kyungmin.park, cw00.choi, akpm, Anton Vorontsov,
	Alexey Perevalov

From: Anton Vorontsov <anton@enomsg.org>

This patch implements a userland-side API for generic deferrable timers,
per linux/timer.h:

 * A deferrable timer will work normally when the system is busy, but
 * will not cause a CPU to come out of idle just to service it; instead,
 * the timer will be serviced when the CPU eventually wakes up with a
 * subsequent non-deferrable timer.

These timers are crucial for power saving, i.e. periodic tasks that want
to work in background when the system is under use, but don't want to
cause wakeups themselves.

Currently, the implementation is based on high resolution timers. It
provides the same overrun count as for none deferrable timers. What's why
it doesn't not bring any new functionality into timerfd implementation,
except clockid validation.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 fs/timerfd.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index 4a349cb..c132fd2 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -326,7 +326,10 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
 	    (clockid != CLOCK_MONOTONIC &&
 	     clockid != CLOCK_REALTIME &&
 	     clockid != CLOCK_REALTIME_ALARM &&
-	     clockid != CLOCK_BOOTTIME_ALARM))
+	     clockid != CLOCK_BOOTTIME_ALARM &&
+	     clockid != CLOCK_REALTIME_DEFERRABLE &&
+	     clockid != CLOCK_MONOTONIC_DEFERRABLE &&
+	     clockid != CLOCK_BOOTTIME_DEFERRABLE))
 		return -EINVAL;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -354,7 +357,7 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
 	return ufd;
 }
 
-static int do_timerfd_settime(int ufd, int flags, 
+static int do_timerfd_settime(int ufd, int flags,
 		const struct itimerspec *new,
 		struct itimerspec *old)
 {
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 6/6] tracing/trivial: Add CLOCK_*_DEFERRABLE for tracing clockids
  2014-02-20 16:23 [PATCH v4 0/6] Deferrable timers support for hrtimers/timerfd API Alexey Perevalov
                   ` (4 preceding siblings ...)
  2014-02-20 16:23 ` [PATCH v4 5/6] timerfd: Add support for deferrable timers Alexey Perevalov
@ 2014-02-20 16:23 ` Alexey Perevalov
  5 siblings, 0 replies; 19+ messages in thread
From: Alexey Perevalov @ 2014-02-20 16:23 UTC (permalink / raw)
  To: linux-kernel, tglx, john.stultz
  Cc: Alexey Perevalov, anton, kyungmin.park, cw00.choi, akpm

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 include/trace/events/timer.h |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 547b79f..ea3119a 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -13,7 +13,13 @@
 			 { CLOCK_MONOTONIC,		"CLOCK_MONOTONIC" },	\
 			 { CLOCK_REALTIME,		"CLOCK_REALTIME" },	\
 			 { CLOCK_BOOTTIME,		"CLOCK_BOOTTIME" },	\
-			 { CLOCK_TAI,			"CLOCK_TAI" }		\
+			 { CLOCK_TAI,			"CLOCK_TAI" },		\
+			 { CLOCK_MONOTONIC_DEFERRABLE,				\
+				"CLOCK_MONOTONIC_DEFERRABLE" },			\
+			 { CLOCK_REALTIME_DEFERRABLE,				\
+				"CLOCK_REALTIME_DEFERRABLE" },			\
+			 { CLOCK_BOOTTIME_DEFERRABLE,				\
+				"CLOCK_BOOTTIME_DEFERRABLE" }			\
 	)
 
 #define hrmode_to_string(hrmode)						\
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/6] Replace ternary operator to macro
  2014-02-20 16:23 ` [PATCH v4 1/6] Replace ternary operator to macro Alexey Perevalov
@ 2014-02-20 20:49   ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2014-02-20 20:49 UTC (permalink / raw)
  To: Alexey Perevalov
  Cc: linux-kernel, john.stultz, anton, kyungmin.park, cw00.choi, akpm

On Thu, 20 Feb 2014, Alexey Perevalov wrote:

The subject line lacks any information in which part of the kernel you
are doing this.

> For extensibility purpose it's better to have _switch_ type mechanism
> for representing human readable CLOCKID* and HRMODE.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  include/trace/events/timer.h |   16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
> index 68c2c20..185b2c6 100644
> --- a/include/trace/events/timer.h
> +++ b/include/trace/events/timer.h
> @@ -8,6 +8,16 @@
>  #include <linux/hrtimer.h>
>  #include <linux/timer.h>
>  
> +#define clockid_to_string(clockid)						\
> +	__print_symbolic(clockid,						\
> +			 { CLOCK_MONOTONIC,		"CLOCK_MONOTONIC" },	\
> +			 { CLOCK_REALTIME,		"CLOCK_REALTIME" })
> +
> +#define hrmode_to_string(hrmode)						\
> +	__print_symbolic(hrmode,						\
> +			 { HRTIMER_MODE_ABS,		"HRTIMER_MODE_ABS" },	\
> +			 { HRTIMER_MODE_REL,		"HRTIMER_MODE_REL" })
> +
>  DECLARE_EVENT_CLASS(timer_class,
>  
>  	TP_PROTO(struct timer_list *timer),
> @@ -147,10 +157,8 @@ TRACE_EVENT(hrtimer_init,
>  	),
>  
>  	TP_printk("hrtimer=%p clockid=%s mode=%s", __entry->hrtimer,
> -		  __entry->clockid == CLOCK_REALTIME ?
> -			"CLOCK_REALTIME" : "CLOCK_MONOTONIC",
> -		  __entry->mode == HRTIMER_MODE_ABS ?
> -			"HRTIMER_MODE_ABS" : "HRTIMER_MODE_REL")
> +		  clockid_to_string(__entry->clockid),
> +		  hrmode_to_string(__entry->mode))
>  );
>  
>  /**
> -- 
> 1.7.9.5
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/6] tracing/trivial: Add CLOCK_BOOTIME and CLOCK_TAI for human readable clockid trace
  2014-02-20 16:23 ` [PATCH v4 2/6] tracing/trivial: Add CLOCK_BOOTIME and CLOCK_TAI for human readable clockid trace Alexey Perevalov
@ 2014-02-20 20:49   ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2014-02-20 20:49 UTC (permalink / raw)
  To: Alexey Perevalov
  Cc: linux-kernel, john.stultz, anton, kyungmin.park, cw00.choi, akpm

On Thu, 20 Feb 2014, Alexey Perevalov wrote:

> These clockids is also used in current hrtimer implementation.

tracing/trivial ???

It's a functional patch related to tracing. 
 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  include/trace/events/timer.h |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
> index 185b2c6..547b79f 100644
> --- a/include/trace/events/timer.h
> +++ b/include/trace/events/timer.h
> @@ -11,7 +11,10 @@
>  #define clockid_to_string(clockid)						\
>  	__print_symbolic(clockid,						\
>  			 { CLOCK_MONOTONIC,		"CLOCK_MONOTONIC" },	\
> -			 { CLOCK_REALTIME,		"CLOCK_REALTIME" })
> +			 { CLOCK_REALTIME,		"CLOCK_REALTIME" },	\
> +			 { CLOCK_BOOTTIME,		"CLOCK_BOOTTIME" },	\
> +			 { CLOCK_TAI,			"CLOCK_TAI" }		\
> +	)
>  
>  #define hrmode_to_string(hrmode)						\
>  	__print_symbolic(hrmode,						\
> -- 
> 1.7.9.5
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 4/6] timerfd: Move repeated logic into timerfd_rearm()
  2014-02-20 16:23 ` [PATCH v4 4/6] timerfd: Move repeated logic into timerfd_rearm() Alexey Perevalov
@ 2014-02-20 21:13   ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2014-02-20 21:13 UTC (permalink / raw)
  To: Alexey Perevalov
  Cc: linux-kernel, john.stultz, Anton Vorontsov, kyungmin.park,
	cw00.choi, akpm, Anton Vorontsov

On Thu, 20 Feb 2014, Alexey Perevalov wrote:
> From: Anton Vorontsov <anton@enomsg.org>
> 
> This patch introduces timerfd_rearm(), this small helper is used to
> forward and restart the hrtimer.

And for what is this helper used for?
 
> Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  fs/timerfd.c |   40 +++++++++++++++++++---------------------
>  1 file changed, 19 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/timerfd.c b/fs/timerfd.c
> index 9293121..4a349cb 100644
> --- a/fs/timerfd.c
> +++ b/fs/timerfd.c
> @@ -229,6 +229,23 @@ static unsigned int timerfd_poll(struct file *file, poll_table *wait)
>  	return events;
>  }
>  
> +static u64 timerfd_rearm(struct timerfd_ctx *ctx)
> +{
> +	u64 orun = 0;
> +
> +	if (isalarm(ctx)) {
> +		orun += alarm_forward_now(
> +			&ctx->t.alarm, ctx->tintv) - 1;
> +		alarm_restart(&ctx->t.alarm);
> +	} else {
> +		orun += hrtimer_forward_now(&ctx->t.tmr,
> +		     ctx->tintv) - 1;
> +		hrtimer_restart(&ctx->t.tmr);
> +	}
> +
> +	return orun;

I really give up. You did not even try to read and understand my
review points.

No, you went and mechanicallly fixed the compiler warning in the most
idiotic way one can come up with.

What's wrong with writing it:

{
	u64 orun;

	if (isalarm(ctx)) {
		orun = alarm_forward_now(ctx->t.alarm, ctx->tintv) - 1;
		alarm_restart(&ctx->t.alarm);
	} else {
		orun = hrtimer_forward_now(&ctx->t.tmr, ctx->tintv) - 1;
		hrtimer_restart(&ctx->t.tmr);
	}
	return orun;
}

Can you spot the difference? Just for the record:

1) It fixes the issue with proper assignments instead of pointlessly
   initializing the variable to 0 and then add the return value.

2) It removes the useless line break which makes the code simpler to
   read.

I told you to do #2 explicitely, but you decided to ignore it.

I did not tell you how to do #1, but I did not expect at all that
anyone might come up with that horrible solution.

This is not some random homework assignment in high school where you
might get away with the "It compiles so what" mentality.

Seriously, if you come back with another set of half baken patches,
you're going to end up in the utter-waste-of-time section of my
.procmailrc.

I have quite some patience with people, but I really have better
things to do than wasting my scarce time with advisory resistant
wannabe kernel developers.

Thanks,

	tglx

Hint: Take your time. Read carefully what I asked for and let your
      coworkers look over the patches including the subject lines and
      the changelogs before you send another half baken crap out in a
      haste.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 5/6] timerfd: Add support for deferrable timers
  2014-02-20 16:23 ` [PATCH v4 5/6] timerfd: Add support for deferrable timers Alexey Perevalov
@ 2014-02-26  2:53   ` Andy Lutomirski
  2014-03-04 20:58     ` Thomas Gleixner
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2014-02-26  2:53 UTC (permalink / raw)
  To: Alexey Perevalov, linux-kernel, tglx, john.stultz
  Cc: Anton Vorontsov, kyungmin.park, cw00.choi, akpm, Anton Vorontsov

On 02/20/2014 08:23 AM, Alexey Perevalov wrote:
> From: Anton Vorontsov <anton@enomsg.org>
> 
> This patch implements a userland-side API for generic deferrable timers,
> per linux/timer.h:
> 
>  * A deferrable timer will work normally when the system is busy, but
>  * will not cause a CPU to come out of idle just to service it; instead,
>  * the timer will be serviced when the CPU eventually wakes up with a
>  * subsequent non-deferrable timer.
> 
> These timers are crucial for power saving, i.e. periodic tasks that want
> to work in background when the system is under use, but don't want to
> cause wakeups themselves.

Please don't.  This API sucks for all kinds of reasons:

 - Why is it a new kind of clock?
 - How deferrable is deferrable?
 - It adds new core code, which serves no purpose (the problem is
already solved).

On the other hand, if you added a fancier version of timerfd_settime
that could explicitly set the slack value (or, equivalently, the
earliest and latest allowable times), that could be quite useful.

It's often bugged me that timer slack is per-process.

--Andy


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 5/6] timerfd: Add support for deferrable timers
  2014-02-26  2:53   ` Andy Lutomirski
@ 2014-03-04 20:58     ` Thomas Gleixner
  2014-03-04 21:53       ` Andy Lutomirski
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2014-03-04 20:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexey Perevalov, linux-kernel, john.stultz, Anton Vorontsov,
	kyungmin.park, cw00.choi, akpm, Anton Vorontsov

On Tue, 25 Feb 2014, Andy Lutomirski wrote:
> On 02/20/2014 08:23 AM, Alexey Perevalov wrote:
> > From: Anton Vorontsov <anton@enomsg.org>
> > 
> > This patch implements a userland-side API for generic deferrable timers,
> > per linux/timer.h:
> > 
> >  * A deferrable timer will work normally when the system is busy, but
> >  * will not cause a CPU to come out of idle just to service it; instead,
> >  * the timer will be serviced when the CPU eventually wakes up with a
> >  * subsequent non-deferrable timer.
> > 
> > These timers are crucial for power saving, i.e. periodic tasks that want
> > to work in background when the system is under use, but don't want to
> > cause wakeups themselves.
> 
> Please don't.  This API sucks for all kinds of reasons:
> 
>  - Why is it a new kind of clock?

We made it a flag already.

>  - How deferrable is deferrable?

Deferrable is infinite.

>  - It adds new core code, which serves no purpose (the problem is
> already solved).

You wish and you're wrong.

    - Make this work with the timer_list stuff for arbitrary clock
      ids. No way we go back to the mess of CLOCK_REALTIME which we
      had before hrtimers

    - No way that we add a gazillion of extra code in timer related
      interfaces to distinguish between deferrable and non deferrable
      timers utilizing a different interface.

> On the other hand, if you added a fancier version of timerfd_settime
> that could explicitly set the slack value (or, equivalently, the
> earliest and latest allowable times), that could be quite useful.
> 
> It's often bugged me that timer slack is per-process.

That's a totally different issue. There is a world aside of timerfd
timers.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 5/6] timerfd: Add support for deferrable timers
  2014-03-04 20:58     ` Thomas Gleixner
@ 2014-03-04 21:53       ` Andy Lutomirski
  2014-03-04 22:11         ` Thomas Gleixner
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2014-03-04 21:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Alexey Perevalov, linux-kernel, John Stultz, Anton Vorontsov,
	Kyungmin Park, cw00.choi, Andrew Morton, Anton Vorontsov

On Tue, Mar 4, 2014 at 12:58 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 25 Feb 2014, Andy Lutomirski wrote:
>> On the other hand, if you added a fancier version of timerfd_settime
>> that could explicitly set the slack value (or, equivalently, the
>> earliest and latest allowable times), that could be quite useful.
>>
>> It's often bugged me that timer slack is per-process.
>
> That's a totally different issue. There is a world aside of timerfd
> timers.

This is a patch to add deferrable support *to timerfd*.  I'm asking
why this is a useful feature to add.  Some day (maybe soon -- systemd
is doing something awful here) people will want choices of slack
values other than the per-process value and infinity.  Why not just do
that and skip this intermediate API that will have to be supported
forever?

I admit that I don't really understand the difference between hrtimers
and the other kind of timer internally, but I don't see why this
should have any effect at all of the userspace API.  The kernel
supports slack, slack is strictly more useful than "deferrable", so
let's just expose it properly to userspace.

--Andy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 5/6] timerfd: Add support for deferrable timers
  2014-03-04 21:53       ` Andy Lutomirski
@ 2014-03-04 22:11         ` Thomas Gleixner
  2014-03-04 22:43           ` Andy Lutomirski
  2014-03-05  9:42           ` Richard Cochran
  0 siblings, 2 replies; 19+ messages in thread
From: Thomas Gleixner @ 2014-03-04 22:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexey Perevalov, linux-kernel, John Stultz, Anton Vorontsov,
	Kyungmin Park, cw00.choi, Andrew Morton, Anton Vorontsov

On Tue, 4 Mar 2014, Andy Lutomirski wrote:

> On Tue, Mar 4, 2014 at 12:58 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Tue, 25 Feb 2014, Andy Lutomirski wrote:
> >> On the other hand, if you added a fancier version of timerfd_settime
> >> that could explicitly set the slack value (or, equivalently, the
> >> earliest and latest allowable times), that could be quite useful.
> >>
> >> It's often bugged me that timer slack is per-process.
> >
> > That's a totally different issue. There is a world aside of timerfd
> > timers.
> 
> This is a patch to add deferrable support *to timerfd*.  I'm asking

There is a new patch series which adds deferrable support to all timer
related interfaces which have a flags field. And that's the only
sensible solution right now.

We do no add another random special case syscall for timerfd just
because timerfd is linux specific.

No, we want to support that stuff right now with the existing
interfaces as we have to revisit all of the timer related interfaces
in the near future anyway due to the Y2038 issue.

And your idea of per thread slack is completely bogus. If we want to
make the slack value usefull then it needs to be done per timer and
not per thread/process.

But we cannot do that right now as we cannot whip up severl dozen of
new syscalls just because we want to add slack/deferrable whatever
properties.

Once we agree on a solution to the Y2038 issue on 32bit with a unified
32/64 bit syscall interface which simply gets rid of the timespec/val
nonsense and takes a simple u64 nsec value we can add the slack
property to that without any further inconvenience.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 5/6] timerfd: Add support for deferrable timers
  2014-03-04 22:11         ` Thomas Gleixner
@ 2014-03-04 22:43           ` Andy Lutomirski
  2014-03-05  0:10             ` Thomas Gleixner
  2014-03-05  9:42           ` Richard Cochran
  1 sibling, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2014-03-04 22:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Alexey Perevalov, linux-kernel, John Stultz, Anton Vorontsov,
	Kyungmin Park, cw00.choi, Andrew Morton, Anton Vorontsov

On Tue, Mar 4, 2014 at 2:11 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 4 Mar 2014, Andy Lutomirski wrote:
>
>> On Tue, Mar 4, 2014 at 12:58 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > On Tue, 25 Feb 2014, Andy Lutomirski wrote:
>> >> On the other hand, if you added a fancier version of timerfd_settime
>> >> that could explicitly set the slack value (or, equivalently, the
>> >> earliest and latest allowable times), that could be quite useful.
>> >>
>> >> It's often bugged me that timer slack is per-process.
>> >
>> > That's a totally different issue. There is a world aside of timerfd
>> > timers.
>>
>> This is a patch to add deferrable support *to timerfd*.  I'm asking
>
> There is a new patch series which adds deferrable support to all timer
> related interfaces which have a flags field. And that's the only
> sensible solution right now.
>
> We do no add another random special case syscall for timerfd just
> because timerfd is linux specific.

What syscalls?  I can think of exactly two timer interfaces that
actually accept a clock id and flags: clock_nanosleep and
timerfd_settime.

>
> No, we want to support that stuff right now with the existing
> interfaces as we have to revisit all of the timer related interfaces
> in the near future anyway due to the Y2038 issue.
>
> And your idea of per thread slack is completely bogus. If we want to
> make the slack value usefull then it needs to be done per timer and
> not per thread/process.

This is *exactly* what I'm suggesting.

>
> But we cannot do that right now as we cannot whip up severl dozen of
> new syscalls just because we want to add slack/deferrable whatever
> properties.

Two syscalls, right?

>
> Once we agree on a solution to the Y2038 issue on 32bit with a unified
> 32/64 bit syscall interface which simply gets rid of the timespec/val
> nonsense and takes a simple u64 nsec value we can add the slack
> property to that without any further inconvenience.
>
> Thanks,
>
>         tglx
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 5/6] timerfd: Add support for deferrable timers
  2014-03-04 22:43           ` Andy Lutomirski
@ 2014-03-05  0:10             ` Thomas Gleixner
  2014-03-05  0:42               ` Andy Lutomirski
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2014-03-05  0:10 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexey Perevalov, linux-kernel, John Stultz, Anton Vorontsov,
	Kyungmin Park, cw00.choi, Andrew Morton, Anton Vorontsov

On Tue, 4 Mar 2014, Andy Lutomirski wrote:
> On Tue, Mar 4, 2014 at 2:11 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > We do no add another random special case syscall for timerfd just
> > because timerfd is linux specific.
> 
> What syscalls?  I can think of exactly two timer interfaces that
> actually accept a clock id and flags: clock_nanosleep and
> timerfd_settime.

Sure, and what you can think of is reality?

 sys_timer_settime() which relies on sys_timer_create() are outside
 your universe, right?

And no. We are not adding timer_list mess back to any of them.

Aside of that if you want to make the slack thing usefull on a per
call basis then you want to add it to a lot of other interfaces like
poll.

And you are completely ignoring the fact that the slack works
completely differrent:

A slacked timer still gets enqueued into the main timer queue. It just
relies on the fact that it gets batched with some other expiring
timer. But thats completely different to the deferrable approach.

       start_timer(timer, expiry, slack);

       	   timer.hard_expiry = expiry + slack;
	   timer.soft_expiry = expiry;
	   enqueue_timer(timer, timer.hard_expiry);

The enqueueing code puts it into the queue by looking at the
hard_expiry code. And the expiry code looks at the timer.soft_expiry
value to expire a timer early.

Now assume the following:

       start_timer(timer, +100ms, 100s);

So that puts that timer into the hard expiry line of 100.1 sec from
now. So if the cpu is busy and is firing a lot of timers then your
timer could be delayed up to the hard expiry time, i.e. 100.1 seconds
from now, which has completely differrent semantics than the
deferrrable timers.

The deferrable timer is guaranteed to expire (halfways) on time when
the system is active and does not affect the system from going idle,
but it expires right away when the system comes back out of idle.

The slack timers are just a batching mechanism to align expiry times
of non deferrable timers to a common time.

So how do you map those together?

I'm not saying that a per timer slack is useless, but it does not
solve the issue of deferrable timers.

Quite the contrary, it would be simpler to implement the slacked
timers as a special case of the deferrable timers. But hell no, we are
not going to go there.

> > But we cannot do that right now as we cannot whip up severl dozen of
> > new syscalls just because we want to add slack/deferrable whatever
> > properties.

> Two syscalls, right?

It does not matter at all how many syscalls this affects. We are not
adding any random new syscalls just because we can.

> Once we agree on a solution to the Y2038 issue on 32bit with a unified
> 32/64 bit syscall interface which simply gets rid of the timespec/val
> nonsense and takes a simple u64 nsec value we can add the slack
> property to that without any further inconvenience.

Ignoring this wont get you anywhere.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 5/6] timerfd: Add support for deferrable timers
  2014-03-05  0:10             ` Thomas Gleixner
@ 2014-03-05  0:42               ` Andy Lutomirski
  2014-03-05 11:40                 ` Thomas Gleixner
  0 siblings, 1 reply; 19+ messages in thread
From: Andy Lutomirski @ 2014-03-05  0:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Alexey Perevalov, linux-kernel, John Stultz, Anton Vorontsov,
	Kyungmin Park, cw00.choi, Andrew Morton, Anton Vorontsov

On Tue, Mar 4, 2014 at 4:10 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 4 Mar 2014, Andy Lutomirski wrote:
>> On Tue, Mar 4, 2014 at 2:11 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > We do no add another random special case syscall for timerfd just
>> > because timerfd is linux specific.
>>
>> What syscalls?  I can think of exactly two timer interfaces that
>> actually accept a clock id and flags: clock_nanosleep and
>> timerfd_settime.
>
> Sure, and what you can think of is reality?
>
>  sys_timer_settime() which relies on sys_timer_create() are outside
>  your universe, right?
>

Sigh, I forgot about those.  I would argue that there is no real
reason to make timer_create any fancier.  That kind of sucks.

> Aside of that if you want to make the slack thing usefull on a per
> call basis then you want to add it to a lot of other interfaces like
> poll.

Same with deferrable timers.  And things that want MONOTONIC *and*
REALTIME.  Etc.

>
> And you are completely ignoring the fact that the slack works
> completely differrent:
>
> A slacked timer still gets enqueued into the main timer queue. It just
> relies on the fact that it gets batched with some other expiring
> timer. But thats completely different to the deferrable approach.
>
>        start_timer(timer, expiry, slack);
>
>            timer.hard_expiry = expiry + slack;
>            timer.soft_expiry = expiry;
>            enqueue_timer(timer, timer.hard_expiry);
>
> The enqueueing code puts it into the queue by looking at the
> hard_expiry code. And the expiry code looks at the timer.soft_expiry
> value to expire a timer early.
>
> Now assume the following:
>
>        start_timer(timer, +100ms, 100s);
>
> So that puts that timer into the hard expiry line of 100.1 sec from
> now. So if the cpu is busy and is firing a lot of timers then your
> timer could be delayed up to the hard expiry time, i.e. 100.1 seconds
> from now, which has completely differrent semantics than the
> deferrrable timers.

Erk.  I didn't realize that.  Is that really the desired behavior?  I
assumed that a timer with slack would fire at the earliest time after
the soft timeout at which the system wasn't idle.  The idea is to
batch wakeups, right?

>
> The deferrable timer is guaranteed to expire (halfways) on time when
> the system is active and does not affect the system from going idle,
> but it expires right away when the system comes back out of idle.
>
> The slack timers are just a batching mechanism to align expiry times
> of non deferrable timers to a common time.
>
> So how do you map those together?

By thinking of what semantics are actually useful for userspace developers.

I think that most userspace developers probably want the semantics
that I thought that timer slack had: I want to do work between time A
and time B.  Before A is too early, but I'm willing to wait until time
B if it improves power consumption.

Presumably, if the kernel chooses *not* to fire the timer just after
time A even if the system is awake, then it's risking an unnecessary
wakeup at time B.

(I admit that I don't really understand the hrtimer code.  I guess
that two indexes on the list of timers would be needed.)

>> > But we cannot do that right now as we cannot whip up severl dozen of
>> > new syscalls just because we want to add slack/deferrable whatever
>> > properties.
>
>> Two syscalls, right?
>
> It does not matter at all how many syscalls this affects. We are not
> adding any random new syscalls just because we can.
>
>> Once we agree on a solution to the Y2038 issue on 32bit with a unified
>> 32/64 bit syscall interface which simply gets rid of the timespec/val
>> nonsense and takes a simple u64 nsec value we can add the slack
>> property to that without any further inconvenience.
>
> Ignoring this wont get you anywhere.

I'm not entirely sure why per-timer slack can't be added without
simultaneously fixing Y2038 (and presumably leap seconds, too) but a
new flag can be.

--Andy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 5/6] timerfd: Add support for deferrable timers
  2014-03-04 22:11         ` Thomas Gleixner
  2014-03-04 22:43           ` Andy Lutomirski
@ 2014-03-05  9:42           ` Richard Cochran
  1 sibling, 0 replies; 19+ messages in thread
From: Richard Cochran @ 2014-03-05  9:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Alexey Perevalov, linux-kernel, John Stultz,
	Anton Vorontsov, Kyungmin Park, cw00.choi, Andrew Morton,
	Anton Vorontsov

On Tue, Mar 04, 2014 at 11:11:21PM +0100, Thomas Gleixner wrote:
> 
> Once we agree on a solution to the Y2038 issue on 32bit with a unified
> 32/64 bit syscall interface which simply gets rid of the timespec/val
> nonsense and takes a simple u64 nsec value we can add the slack
> property to that without any further inconvenience.

Can you expand on this tangent a bit?

Who needs to agree, and where is this being debated?

Thanks,
Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 5/6] timerfd: Add support for deferrable timers
  2014-03-05  0:42               ` Andy Lutomirski
@ 2014-03-05 11:40                 ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2014-03-05 11:40 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexey Perevalov, linux-kernel, John Stultz, Anton Vorontsov,
	Kyungmin Park, cw00.choi, Andrew Morton, Anton Vorontsov

On Tue, 4 Mar 2014, Andy Lutomirski wrote:
> On Tue, Mar 4, 2014 at 4:10 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > A slacked timer still gets enqueued into the main timer queue. It just
> > relies on the fact that it gets batched with some other expiring
> > timer. But thats completely different to the deferrable approach.
> >
> >        start_timer(timer, expiry, slack);
> >
> >            timer.hard_expiry = expiry + slack;
> >            timer.soft_expiry = expiry;
> >            enqueue_timer(timer, timer.hard_expiry);
> >
> > The enqueueing code puts it into the queue by looking at the
> > hard_expiry code. And the expiry code looks at the timer.soft_expiry
> > value to expire a timer early.
> >
> > Now assume the following:
> >
> >        start_timer(timer, +100ms, 100s);
> >
> > So that puts that timer into the hard expiry line of 100.1 sec from
> > now. So if the cpu is busy and is firing a lot of timers then your
> > timer could be delayed up to the hard expiry time, i.e. 100.1 seconds
> > from now, which has completely differrent semantics than the
> > deferrrable timers.
> 
> Erk.  I didn't realize that.  Is that really the desired behavior?  I

It's the implemented behaviour for a reason.

> assumed that a timer with slack would fire at the earliest time after
> the soft timeout at which the system wasn't idle.  The idea is to
> batch wakeups, right?

Correct. And that's why the slack thing was invented. Not the best
invention, but it solved a problem without creating a cast in stone
new user space ABI. And it was simple to do with the existing
RB-Tree. Otherwise you'd need a Priority Search Tree which handles
overlapping expiry ranges.
 
> > The deferrable timer is guaranteed to expire (halfways) on time when
> > the system is active and does not affect the system from going idle,
> > but it expires right away when the system comes back out of idle.
> >
> > The slack timers are just a batching mechanism to align expiry times
> > of non deferrable timers to a common time.
> >
> > So how do you map those together?
> 
> By thinking of what semantics are actually useful for userspace developers.
> 
> I think that most userspace developers probably want the semantics
> that I thought that timer slack had: I want to do work between time A
> and time B.  Before A is too early, but I'm willing to wait until time
> B if it improves power consumption.

Well, that's what slack actually does.

But your assumption that this is what most userspace developers
probably want is wrong. A lot of them want the following:

   Fire me on time when the CPU/system is busy, otherwise ignore me
   for a time X, where X might be infinite.

And you cannot map this to slack. See below.
 
> Presumably, if the kernel chooses *not* to fire the timer just after
> time A even if the system is awake, then it's risking an unnecessary
> wakeup at time B.
> 
> (I admit that I don't really understand the hrtimer code.  I guess
> that two indexes on the list of timers would be needed.)

The real problem is that we want to cover the following cases:

    1) Expire me no matter what at X

    2) Expire me no matter what at X + Slack (wakeup batching)

    3) Expire me close to X when the system/cpu is busy otherwise expire me latest
        at X + Slack

    4) Expire me close to X when the system/cpu is busy otherwise
       ignore me

#1 and #2 are handled today #1 is #2 with Slack = 0

#4 is what I implemented with the extra internal queues and the extra
flag. We can make the internal implementation to handle #3 as well,
but we do not have a user space interface for that.

> >> Once we agree on a solution to the Y2038 issue on 32bit with a unified
> >> 32/64 bit syscall interface which simply gets rid of the timespec/val
> >> nonsense and takes a simple u64 nsec value we can add the slack
> >> property to that without any further inconvenience.
> >
> > Ignoring this wont get you anywhere.
> 
> I'm not entirely sure why per-timer slack can't be added without
> simultaneously fixing Y2038 (and presumably leap seconds, too) but a
> new flag can be.

The additional flag is fine as it does not introduce a completely new
ABI, it merily extends the existing ABI.

But adding a per call slack is going to introduce a new ABI and I
really dont want to go there as we need to introduce a new ABI for the
Y2038 issue anyway. And that's way more than the few direct timer
related syscalls. Basically we have to look at all syscalls which take
a timespec/timeval.

So no, we are not going to add an adhoc intermediate ABI which we need
to support forever.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2014-03-05 11:40 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-20 16:23 [PATCH v4 0/6] Deferrable timers support for hrtimers/timerfd API Alexey Perevalov
2014-02-20 16:23 ` [PATCH v4 1/6] Replace ternary operator to macro Alexey Perevalov
2014-02-20 20:49   ` Thomas Gleixner
2014-02-20 16:23 ` [PATCH v4 2/6] tracing/trivial: Add CLOCK_BOOTIME and CLOCK_TAI for human readable clockid trace Alexey Perevalov
2014-02-20 20:49   ` Thomas Gleixner
2014-02-20 16:23 ` [PATCH v4 3/6] hrtimer: Add support for deferrable timer into the hrtimer Alexey Perevalov
2014-02-20 16:23 ` [PATCH v4 4/6] timerfd: Move repeated logic into timerfd_rearm() Alexey Perevalov
2014-02-20 21:13   ` Thomas Gleixner
2014-02-20 16:23 ` [PATCH v4 5/6] timerfd: Add support for deferrable timers Alexey Perevalov
2014-02-26  2:53   ` Andy Lutomirski
2014-03-04 20:58     ` Thomas Gleixner
2014-03-04 21:53       ` Andy Lutomirski
2014-03-04 22:11         ` Thomas Gleixner
2014-03-04 22:43           ` Andy Lutomirski
2014-03-05  0:10             ` Thomas Gleixner
2014-03-05  0:42               ` Andy Lutomirski
2014-03-05 11:40                 ` Thomas Gleixner
2014-03-05  9:42           ` Richard Cochran
2014-02-20 16:23 ` [PATCH v4 6/6] tracing/trivial: Add CLOCK_*_DEFERRABLE for tracing clockids Alexey Perevalov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).