All of lore.kernel.org
 help / color / mirror / Atom feed
* (no subject)
@ 2012-08-08 18:10 Tejun Heo
  2012-08-08 18:10 ` [PATCH 1/4] timer: generalize timer->base flags handling Tejun Heo
                   ` (6 more replies)
  0 siblings, 7 replies; 31+ messages in thread
From: Tejun Heo @ 2012-08-08 18:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: torvalds, mingo, akpm, tglx, peterz

Timer internals are protected by irqsafe lock but the lock is
naturally dropped and irq enabled while a timer is executed.  This
makes dequeueing timer for execution and the actual execution
non-atomic against IRQs.  No matter what the timer function does, IRQs
can occur between timer dispatch and execution.  This means that an
IRQ handler could interrupt any timer in progress and it's impossible
for an IRQ handler to cancel and drain a timer.

This restriction manifests as ugly convolutions in workqueue
delayed_work interface.  A !idle delayed_work is either on timer,
being transferred from timer to worklist, on worklist, or executing.
There are interfaces which need to cancel a pending delayed_work -
cancel_delayed_work() and friends and mod_delayed_work().  They want
to cancel a work item in the first three states but it's impossible to
drain the second state from IRQ handlers which lead to the following
oddities.

* mod_delayed_work() can't be used from IRQ handlers.

* __cancel_delayed_work() can't use the usual try_to_grab_pending()
  which handles all three states but instead only deals with the first
  state using a separate implementation.  There's no way to make a
  delayed_work not pending from IRQ handlers.

* The context / behavior differences among cancel_delayed_work(),
  __cancel_delayed_work(), cancel_delayed_work_sync() are subtle and
  confusing (the first two are mostly historical tho).

This patchset implements irqsafe timers.  For an irqsafe timer, IRQ is
not enabled from dispatch till the end of its execution making it safe
to drain the timer regardless of context.  This will enable cleaning
up delayed_work interface.

This patchset contains the following four patches.

 0001-timer-generalize-timer-base-flags-handling.patch
 0002-timer-relocate-declarations-of-init_timer_on_stack_k.patch
 0003-timer-clean-up-timer-initializers.patch
 0004-timer-implement-TIMER_IRQSAFE.patch

0001 generalizes timer->base flags handling so that TIMER_IRQSAFE can
be added easily.

0002-0003 clean up initializers so that adding TIMER_IRQSAFE doesn't
need to duplicate init code multiple times.

0004 implements TIMER_IRQSAFE.

This patchset is also available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-timer-irqsafe

Will soon post workqueue patchset which makes use of this.  If this
goes in, it would be great if this either goes through wq/for-3.7 or
gets its own branch somewhere so that it can be pulled into
wq/for-3.7.

Thanks.

 include/linux/timer.h |  161 ++++++++++++++++++--------------------------------
 kernel/timer.c        |  108 +++++++++++++++------------------
 2 files changed, 110 insertions(+), 159 deletions(-)

--
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/4] timer: generalize timer->base flags handling
  2012-08-08 18:10 Tejun Heo
@ 2012-08-08 18:10 ` Tejun Heo
  2012-08-21 16:40   ` [tip:timers/core] timer: Generalize " tip-bot for Tejun Heo
  2012-08-08 18:10 ` [PATCH 2/4] timer: relocate declarations of init_timer_on_stack_key() Tejun Heo
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2012-08-08 18:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: torvalds, mingo, akpm, tglx, peterz, Tejun Heo

To prepare for addition of another flag, generalize timer->base flags
handling.

* Rename from TBASE_*_FLAG to TIMER_* and make them LU constants.

* Define and use TIMER_FLAG_MASK for flags masking so that multiple
  flags can be handled correctly.

* Don't dereference timer->base directly even if
  !tbase_get_deferrable().  All two such places are already passed in
  @base, so use it instead.

* Make sure tvec_base's alignment is large enough for timer->base
  flags using BUILD_BUG_ON().

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/timer.h |    6 ++++--
 kernel/timer.c        |   21 +++++++++++++--------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 6abd913..cbd32ec 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -58,7 +58,9 @@ extern struct tvec_base boot_tvec_bases;
  * the timer will be serviced when the CPU eventually wakes up with a
  * subsequent non-deferrable timer.
  */
-#define TBASE_DEFERRABLE_FLAG		(0x1)
+#define TIMER_DEFERRABLE		0x1LU
+
+#define TIMER_FLAG_MASK			0x1LU
 
 #define TIMER_INITIALIZER(_function, _expires, _data) {		\
 		.entry = { .prev = TIMER_ENTRY_STATIC },	\
@@ -72,7 +74,7 @@ extern struct tvec_base boot_tvec_bases;
 	}
 
 #define TBASE_MAKE_DEFERRED(ptr) ((struct tvec_base *)		\
-		  ((unsigned char *)(ptr) + TBASE_DEFERRABLE_FLAG))
+		  ((unsigned char *)(ptr) + TIMER_DEFERRABLE))
 
 #define TIMER_DEFERRED_INITIALIZER(_function, _expires, _data) {\
 		.entry = { .prev = TIMER_ENTRY_STATIC },	\
diff --git a/kernel/timer.c b/kernel/timer.c
index a61c093..cf7af56 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -92,12 +92,12 @@ static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases;
 /* Functions below help us manage 'deferrable' flag */
 static inline unsigned int tbase_get_deferrable(struct tvec_base *base)
 {
-	return ((unsigned int)(unsigned long)base & TBASE_DEFERRABLE_FLAG);
+	return ((unsigned int)(unsigned long)base & TIMER_DEFERRABLE);
 }
 
 static inline struct tvec_base *tbase_get_base(struct tvec_base *base)
 {
-	return ((struct tvec_base *)((unsigned long)base & ~TBASE_DEFERRABLE_FLAG));
+	return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK));
 }
 
 static inline void timer_set_deferrable(struct timer_list *timer)
@@ -108,8 +108,9 @@ static inline void timer_set_deferrable(struct timer_list *timer)
 static inline void
 timer_set_base(struct timer_list *timer, struct tvec_base *new_base)
 {
-	timer->base = (struct tvec_base *)((unsigned long)(new_base) |
-				      tbase_get_deferrable(timer->base));
+	unsigned long flags = (unsigned long)timer->base & TIMER_FLAG_MASK;
+
+	timer->base = (struct tvec_base *)((unsigned long)(new_base) | flags);
 }
 
 static unsigned long round_jiffies_common(unsigned long j, int cpu,
@@ -686,7 +687,7 @@ detach_expired_timer(struct timer_list *timer, struct tvec_base *base)
 {
 	detach_timer(timer, true);
 	if (!tbase_get_deferrable(timer->base))
-		timer->base->active_timers--;
+		base->active_timers--;
 }
 
 static int detach_if_pending(struct timer_list *timer, struct tvec_base *base,
@@ -697,7 +698,7 @@ static int detach_if_pending(struct timer_list *timer, struct tvec_base *base,
 
 	detach_timer(timer, clear_pending);
 	if (!tbase_get_deferrable(timer->base)) {
-		timer->base->active_timers--;
+		base->active_timers--;
 		if (timer->expires == base->next_timer)
 			base->next_timer = base->timer_jiffies;
 	}
@@ -1800,9 +1801,13 @@ static struct notifier_block __cpuinitdata timers_nb = {
 
 void __init init_timers(void)
 {
-	int err = timer_cpu_notify(&timers_nb, (unsigned long)CPU_UP_PREPARE,
-				(void *)(long)smp_processor_id());
+	int err;
+
+	/* ensure there are enough low bits for flags in timer->base pointer */
+	BUILD_BUG_ON(__alignof__(struct tvec_base) & TIMER_FLAG_MASK);
 
+	err = timer_cpu_notify(&timers_nb, (unsigned long)CPU_UP_PREPARE,
+			       (void *)(long)smp_processor_id());
 	init_timer_stats();
 
 	BUG_ON(err != NOTIFY_OK);
-- 
1.7.7.3


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 2/4] timer: relocate declarations of init_timer_on_stack_key()
  2012-08-08 18:10 Tejun Heo
  2012-08-08 18:10 ` [PATCH 1/4] timer: generalize timer->base flags handling Tejun Heo
@ 2012-08-08 18:10 ` Tejun Heo
  2012-08-21 16:41   ` [tip:timers/core] timer: Relocate " tip-bot for Tejun Heo
  2012-08-08 18:10 ` [PATCH 3/4] timer: clean up timer initializers Tejun Heo
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2012-08-08 18:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: torvalds, mingo, akpm, tglx, peterz, Tejun Heo

init_timer_on_stack_key() is used by init macro definitions.  Move
init_timer_on_stack_key() and destroy_timer_on_stack() declarations
above init macro defs.  This will make the next init cleanup patch
easier to read.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/timer.h |   30 +++++++++++++++---------------
 1 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index cbd32ec..1d364ae 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -97,6 +97,21 @@ void init_timer_deferrable_key(struct timer_list *timer,
 			       const char *name,
 			       struct lock_class_key *key);
 
+#ifdef CONFIG_DEBUG_OBJECTS_TIMERS
+extern void init_timer_on_stack_key(struct timer_list *timer,
+				    const char *name,
+				    struct lock_class_key *key);
+extern void destroy_timer_on_stack(struct timer_list *timer);
+#else
+static inline void destroy_timer_on_stack(struct timer_list *timer) { }
+static inline void init_timer_on_stack_key(struct timer_list *timer,
+					   const char *name,
+					   struct lock_class_key *key)
+{
+	init_timer_key(timer, name, key);
+}
+#endif
+
 #ifdef CONFIG_LOCKDEP
 #define init_timer(timer)						\
 	do {								\
@@ -150,21 +165,6 @@ void init_timer_deferrable_key(struct timer_list *timer,
 	setup_deferrable_timer_on_stack_key((timer), NULL, NULL, (fn), (data))
 #endif
 
-#ifdef CONFIG_DEBUG_OBJECTS_TIMERS
-extern void init_timer_on_stack_key(struct timer_list *timer,
-				    const char *name,
-				    struct lock_class_key *key);
-extern void destroy_timer_on_stack(struct timer_list *timer);
-#else
-static inline void destroy_timer_on_stack(struct timer_list *timer) { }
-static inline void init_timer_on_stack_key(struct timer_list *timer,
-					   const char *name,
-					   struct lock_class_key *key)
-{
-	init_timer_key(timer, name, key);
-}
-#endif
-
 static inline void setup_timer_key(struct timer_list * timer,
 				const char *name,
 				struct lock_class_key *key,
-- 
1.7.7.3


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 3/4] timer: clean up timer initializers
  2012-08-08 18:10 Tejun Heo
  2012-08-08 18:10 ` [PATCH 1/4] timer: generalize timer->base flags handling Tejun Heo
  2012-08-08 18:10 ` [PATCH 2/4] timer: relocate declarations of init_timer_on_stack_key() Tejun Heo
@ 2012-08-08 18:10 ` Tejun Heo
  2012-08-21 16:42   ` [tip:timers/core] timer: Clean " tip-bot for Tejun Heo
  2012-08-08 18:10 ` [PATCH 4/4] timer: implement TIMER_IRQSAFE Tejun Heo
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2012-08-08 18:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: torvalds, mingo, akpm, tglx, peterz, Tejun Heo

Over time, timer initializers became messy with unnecessarily
duplicated codes which are inconsistently spread across timer.h and
timer.c.

This patch cleans up timer initializers.

* timer.c::__init_timer() is renamed to do_init_timer().

* __TIMER_INITIALIZER() added.  It takes @flags and all initializers
  are wrappers around it.

* init_timer[_on_stack]_key() now take @flags.

* __init_timer[_on_stack]() added.  They take @flags and all init
  macros are wrappers around them.

* __setup_timer[_on_stack]() added.  It uses __init_timer() and takes
  @flags.  All setup macros are wrappers around the two.

Note that this patch doesn't add missing init/setup combinations -
e.g. init_timer_deferrable_on_stack().  Adding missing ones is
trivial.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/timer.h |  123 +++++++++++++++---------------------------------
 kernel/timer.c        |   56 ++++++-----------------
 2 files changed, 53 insertions(+), 126 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 1d364ae..3f95c1f 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -62,136 +62,91 @@ extern struct tvec_base boot_tvec_bases;
 
 #define TIMER_FLAG_MASK			0x1LU
 
-#define TIMER_INITIALIZER(_function, _expires, _data) {		\
+#define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
 		.entry = { .prev = TIMER_ENTRY_STATIC },	\
 		.function = (_function),			\
 		.expires = (_expires),				\
 		.data = (_data),				\
-		.base = &boot_tvec_bases,			\
+		.base = (void *)((unsigned long)&boot_tvec_bases + (_flags)), \
 		.slack = -1,					\
 		__TIMER_LOCKDEP_MAP_INITIALIZER(		\
 			__FILE__ ":" __stringify(__LINE__))	\
 	}
 
-#define TBASE_MAKE_DEFERRED(ptr) ((struct tvec_base *)		\
-		  ((unsigned char *)(ptr) + TIMER_DEFERRABLE))
+#define TIMER_INITIALIZER(_function, _expires, _data)		\
+	__TIMER_INITIALIZER((_function), (_expires), (_data), 0)
 
-#define TIMER_DEFERRED_INITIALIZER(_function, _expires, _data) {\
-		.entry = { .prev = TIMER_ENTRY_STATIC },	\
-		.function = (_function),			\
-		.expires = (_expires),				\
-		.data = (_data),				\
-		.base = TBASE_MAKE_DEFERRED(&boot_tvec_bases),	\
-		__TIMER_LOCKDEP_MAP_INITIALIZER(		\
-			__FILE__ ":" __stringify(__LINE__))	\
-	}
+#define TIMER_DEFERRED_INITIALIZER(_function, _expires, _data)	\
+	__TIMER_INITIALIZER((_function), (_expires), (_data), TIMER_DEFERRABLE)
 
 #define DEFINE_TIMER(_name, _function, _expires, _data)		\
 	struct timer_list _name =				\
 		TIMER_INITIALIZER(_function, _expires, _data)
 
-void init_timer_key(struct timer_list *timer,
-		    const char *name,
-		    struct lock_class_key *key);
-void init_timer_deferrable_key(struct timer_list *timer,
-			       const char *name,
-			       struct lock_class_key *key);
+void init_timer_key(struct timer_list *timer, unsigned int flags,
+		    const char *name, struct lock_class_key *key);
 
 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
 extern void init_timer_on_stack_key(struct timer_list *timer,
-				    const char *name,
+				    unsigned int flags, const char *name,
 				    struct lock_class_key *key);
 extern void destroy_timer_on_stack(struct timer_list *timer);
 #else
 static inline void destroy_timer_on_stack(struct timer_list *timer) { }
 static inline void init_timer_on_stack_key(struct timer_list *timer,
-					   const char *name,
+					   unsigned int flags, const char *name,
 					   struct lock_class_key *key)
 {
-	init_timer_key(timer, name, key);
+	init_timer_key(timer, flags, name, key);
 }
 #endif
 
 #ifdef CONFIG_LOCKDEP
-#define init_timer(timer)						\
+#define __init_timer(_timer, _flags)					\
 	do {								\
 		static struct lock_class_key __key;			\
-		init_timer_key((timer), #timer, &__key);		\
+		init_timer_key((_timer), (_flags), #_timer, &__key);	\
 	} while (0)
 
-#define init_timer_deferrable(timer)					\
+#define __init_timer_on_stack(_timer, _flags)				\
 	do {								\
 		static struct lock_class_key __key;			\
-		init_timer_deferrable_key((timer), #timer, &__key);	\
+		init_timer_on_stack_key((_timer), (_flags), #_timer, &__key); \
 	} while (0)
+#else
+#define __init_timer(_timer, _flags)					\
+	init_timer_key((_timer), (_flags), NULL, NULL)
+#define __init_timer_on_stack(_timer, _flags)				\
+	init_timer_on_stack_key((_timer), (_flags), NULL, NULL)
+#endif
 
+#define init_timer(timer)						\
+	__init_timer((timer), 0)
+#define init_timer_deferrable(timer)					\
+	__init_timer((timer), TIMER_DEFERRABLE)
 #define init_timer_on_stack(timer)					\
+	__init_timer_on_stack((timer), 0)
+
+#define __setup_timer(_timer, _fn, _data, _flags)			\
 	do {								\
-		static struct lock_class_key __key;			\
-		init_timer_on_stack_key((timer), #timer, &__key);	\
+		__init_timer((_timer), (_flags));			\
+		(_timer)->function = (_fn);				\
+		(_timer)->data = (_data);				\
 	} while (0)
 
-#define setup_timer(timer, fn, data)					\
+#define __setup_timer_on_stack(_timer, _fn, _data, _flags)		\
 	do {								\
-		static struct lock_class_key __key;			\
-		setup_timer_key((timer), #timer, &__key, (fn), (data));\
+		__init_timer_on_stack((_timer), (_flags));		\
+		(_timer)->function = (_fn);				\
+		(_timer)->data = (_data);				\
 	} while (0)
 
+#define setup_timer(timer, fn, data)					\
+	__setup_timer((timer), (fn), (data), 0)
 #define setup_timer_on_stack(timer, fn, data)				\
-	do {								\
-		static struct lock_class_key __key;			\
-		setup_timer_on_stack_key((timer), #timer, &__key,	\
-					 (fn), (data));			\
-	} while (0)
+	__setup_timer_on_stack((timer), (fn), (data), 0)
 #define setup_deferrable_timer_on_stack(timer, fn, data)		\
-	do {								\
-		static struct lock_class_key __key;			\
-		setup_deferrable_timer_on_stack_key((timer), #timer,	\
-						    &__key, (fn),	\
-						    (data));		\
-	} while (0)
-#else
-#define init_timer(timer)\
-	init_timer_key((timer), NULL, NULL)
-#define init_timer_deferrable(timer)\
-	init_timer_deferrable_key((timer), NULL, NULL)
-#define init_timer_on_stack(timer)\
-	init_timer_on_stack_key((timer), NULL, NULL)
-#define setup_timer(timer, fn, data)\
-	setup_timer_key((timer), NULL, NULL, (fn), (data))
-#define setup_timer_on_stack(timer, fn, data)\
-	setup_timer_on_stack_key((timer), NULL, NULL, (fn), (data))
-#define setup_deferrable_timer_on_stack(timer, fn, data)\
-	setup_deferrable_timer_on_stack_key((timer), NULL, NULL, (fn), (data))
-#endif
-
-static inline void setup_timer_key(struct timer_list * timer,
-				const char *name,
-				struct lock_class_key *key,
-				void (*function)(unsigned long),
-				unsigned long data)
-{
-	timer->function = function;
-	timer->data = data;
-	init_timer_key(timer, name, key);
-}
-
-static inline void setup_timer_on_stack_key(struct timer_list *timer,
-					const char *name,
-					struct lock_class_key *key,
-					void (*function)(unsigned long),
-					unsigned long data)
-{
-	timer->function = function;
-	timer->data = data;
-	init_timer_on_stack_key(timer, name, key);
-}
-
-extern void setup_deferrable_timer_on_stack_key(struct timer_list *timer,
-						const char *name,
-						struct lock_class_key *key,
-						void (*function)(unsigned long),
-						unsigned long data);
+	__setup_timer_on_stack((timer), (fn), (data), TIMER_DEFERRABLE)
 
 /**
  * timer_pending - is a timer pending?
diff --git a/kernel/timer.c b/kernel/timer.c
index cf7af56..8d185a1 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -100,11 +100,6 @@ static inline struct tvec_base *tbase_get_base(struct tvec_base *base)
 	return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK));
 }
 
-static inline void timer_set_deferrable(struct timer_list *timer)
-{
-	timer->base = TBASE_MAKE_DEFERRED(timer->base);
-}
-
 static inline void
 timer_set_base(struct timer_list *timer, struct tvec_base *new_base)
 {
@@ -564,16 +559,14 @@ static inline void debug_timer_assert_init(struct timer_list *timer)
 	debug_object_assert_init(timer, &timer_debug_descr);
 }
 
-static void __init_timer(struct timer_list *timer,
-			 const char *name,
-			 struct lock_class_key *key);
+static void do_init_timer(struct timer_list *timer, unsigned int flags,
+			  const char *name, struct lock_class_key *key);
 
-void init_timer_on_stack_key(struct timer_list *timer,
-			     const char *name,
-			     struct lock_class_key *key)
+void init_timer_on_stack_key(struct timer_list *timer, unsigned int flags,
+			     const char *name, struct lock_class_key *key)
 {
 	debug_object_init_on_stack(timer, &timer_debug_descr);
-	__init_timer(timer, name, key);
+	do_init_timer(timer, flags, name, key);
 }
 EXPORT_SYMBOL_GPL(init_timer_on_stack_key);
 
@@ -614,12 +607,13 @@ static inline void debug_assert_init(struct timer_list *timer)
 	debug_timer_assert_init(timer);
 }
 
-static void __init_timer(struct timer_list *timer,
-			 const char *name,
-			 struct lock_class_key *key)
+static void do_init_timer(struct timer_list *timer, unsigned int flags,
+			  const char *name, struct lock_class_key *key)
 {
+	struct tvec_base *base = __raw_get_cpu_var(tvec_bases);
+
 	timer->entry.next = NULL;
-	timer->base = __raw_get_cpu_var(tvec_bases);
+	timer->base = (void *)((unsigned long)base | flags);
 	timer->slack = -1;
 #ifdef CONFIG_TIMER_STATS
 	timer->start_site = NULL;
@@ -629,22 +623,10 @@ static void __init_timer(struct timer_list *timer,
 	lockdep_init_map(&timer->lockdep_map, name, key, 0);
 }
 
-void setup_deferrable_timer_on_stack_key(struct timer_list *timer,
-					 const char *name,
-					 struct lock_class_key *key,
-					 void (*function)(unsigned long),
-					 unsigned long data)
-{
-	timer->function = function;
-	timer->data = data;
-	init_timer_on_stack_key(timer, name, key);
-	timer_set_deferrable(timer);
-}
-EXPORT_SYMBOL_GPL(setup_deferrable_timer_on_stack_key);
-
 /**
  * init_timer_key - initialize a timer
  * @timer: the timer to be initialized
+ * @flags: timer flags
  * @name: name of the timer
  * @key: lockdep class key of the fake lock used for tracking timer
  *       sync lock dependencies
@@ -652,24 +634,14 @@ EXPORT_SYMBOL_GPL(setup_deferrable_timer_on_stack_key);
  * init_timer_key() must be done to a timer prior calling *any* of the
  * other timer functions.
  */
-void init_timer_key(struct timer_list *timer,
-		    const char *name,
-		    struct lock_class_key *key)
+void init_timer_key(struct timer_list *timer, unsigned int flags,
+		    const char *name, struct lock_class_key *key)
 {
 	debug_init(timer);
-	__init_timer(timer, name, key);
+	do_init_timer(timer, flags, name, key);
 }
 EXPORT_SYMBOL(init_timer_key);
 
-void init_timer_deferrable_key(struct timer_list *timer,
-			       const char *name,
-			       struct lock_class_key *key)
-{
-	init_timer_key(timer, name, key);
-	timer_set_deferrable(timer);
-}
-EXPORT_SYMBOL(init_timer_deferrable_key);
-
 static inline void detach_timer(struct timer_list *timer, bool clear_pending)
 {
 	struct list_head *entry = &timer->entry;
-- 
1.7.7.3


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 4/4] timer: implement TIMER_IRQSAFE
  2012-08-08 18:10 Tejun Heo
                   ` (2 preceding siblings ...)
  2012-08-08 18:10 ` [PATCH 3/4] timer: clean up timer initializers Tejun Heo
@ 2012-08-08 18:10 ` Tejun Heo
  2012-08-21 16:43   ` [tip:timers/core] timer: Implement TIMER_IRQSAFE tip-bot for Tejun Heo
  2012-08-08 18:13 ` $SUBJ should have been "[PATCHSET] timer: clean up initializers and implement irqsafe timers" Tejun Heo
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2012-08-08 18:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: torvalds, mingo, akpm, tglx, peterz, Tejun Heo

Timer internals are protected with irq-safe locks but timer execution
isn't, so a timer being dequeued for execution and its execution
aren't atomic against IRQs.  This makes it impossible to wait for its
completion from IRQ handlers and difficult to shoot down a timer from
IRQ handlers.

This issue caused some issues for delayed_work interface.  Because
there's no way to reliably shoot down delayed_work->timer from IRQ
handlers, __cancel_delayed_work() can't share the logic to steal the
target delayed_work with cancel_delayed_work_sync(), and can only
steal delayed_works which are on queued on timer.  Similarly, the
pending mod_delayed_work() can't be used from IRQ handlers.

This patch adds a new timer flag TIMER_IRQSAFE, which makes the timer
to be executed without enabling IRQ after dequeueing such that its
dequeueing and execution are atomic against IRQ handlers.

This makes it safe to wait for the timer's completion from IRQ
handlers, for example, using del_timer_sync().  It can never be
executing on the local CPU and if executing on other CPUs it won't be
interrupted until done.

This will enable simplifying delayed_work cancel/mod interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/timer.h |   12 ++++++++----
 kernel/timer.c        |   35 ++++++++++++++++++++++++-----------
 2 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 3f95c1f..726192e4 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -49,18 +49,22 @@ extern struct tvec_base boot_tvec_bases;
 #endif
 
 /*
- * Note that all tvec_bases are 2 byte aligned and lower bit of
- * base in timer_list is guaranteed to be zero. Use the LSB to
- * indicate whether the timer is deferrable.
+ * Note that all tvec_bases are at least 4 byte aligned and lower two bits
+ * of base in timer_list is guaranteed to be zero. Use them for flags.
  *
  * A deferrable timer will work normally when the system is busy, but
  * will not cause a CPU to come out of idle just to service it; instead,
  * the timer will be serviced when the CPU eventually wakes up with a
  * subsequent non-deferrable timer.
+ *
+ * An irqsafe timer is executed with IRQ disabled and it's safe to wait for
+ * the completion of the running instance from IRQ handlers, for example,
+ * by calling del_timer_sync().
  */
 #define TIMER_DEFERRABLE		0x1LU
+#define TIMER_IRQSAFE			0x2LU
 
-#define TIMER_FLAG_MASK			0x1LU
+#define TIMER_FLAG_MASK			0x3LU
 
 #define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
 		.entry = { .prev = TIMER_ENTRY_STATIC },	\
diff --git a/kernel/timer.c b/kernel/timer.c
index 8d185a1..706fe4c 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -95,6 +95,11 @@ static inline unsigned int tbase_get_deferrable(struct tvec_base *base)
 	return ((unsigned int)(unsigned long)base & TIMER_DEFERRABLE);
 }
 
+static inline unsigned int tbase_get_irqsafe(struct tvec_base *base)
+{
+	return ((unsigned int)(unsigned long)base & TIMER_IRQSAFE);
+}
+
 static inline struct tvec_base *tbase_get_base(struct tvec_base *base)
 {
 	return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK));
@@ -1002,14 +1007,14 @@ EXPORT_SYMBOL(try_to_del_timer_sync);
  *
  * Synchronization rules: Callers must prevent restarting of the timer,
  * otherwise this function is meaningless. It must not be called from
- * interrupt contexts. The caller must not hold locks which would prevent
- * completion of the timer's handler. The timer's handler must not call
- * add_timer_on(). Upon exit the timer is not queued and the handler is
- * not running on any CPU.
+ * interrupt contexts unless the timer is an irqsafe one. The caller must
+ * not hold locks which would prevent completion of the timer's
+ * handler. The timer's handler must not call add_timer_on(). Upon exit the
+ * timer is not queued and the handler is not running on any CPU.
  *
- * Note: You must not hold locks that are held in interrupt context
- *   while calling this function. Even if the lock has nothing to do
- *   with the timer in question.  Here's why:
+ * Note: For !irqsafe timers, you must not hold locks that are held in
+ *   interrupt context while calling this function. Even if the lock has
+ *   nothing to do with the timer in question.  Here's why:
  *
  *    CPU0                             CPU1
  *    ----                             ----
@@ -1046,7 +1051,7 @@ int del_timer_sync(struct timer_list *timer)
 	 * don't use it in hardirq context, because it
 	 * could lead to deadlock.
 	 */
-	WARN_ON(in_irq());
+	WARN_ON(in_irq() && !tbase_get_irqsafe(timer->base));
 	for (;;) {
 		int ret = try_to_del_timer_sync(timer);
 		if (ret >= 0)
@@ -1153,19 +1158,27 @@ static inline void __run_timers(struct tvec_base *base)
 		while (!list_empty(head)) {
 			void (*fn)(unsigned long);
 			unsigned long data;
+			bool irqsafe;
 
 			timer = list_first_entry(head, struct timer_list,entry);
 			fn = timer->function;
 			data = timer->data;
+			irqsafe = tbase_get_irqsafe(timer->base);
 
 			timer_stats_account_timer(timer);
 
 			base->running_timer = timer;
 			detach_expired_timer(timer, base);
 
-			spin_unlock_irq(&base->lock);
-			call_timer_fn(timer, fn, data);
-			spin_lock_irq(&base->lock);
+			if (irqsafe) {
+				spin_unlock(&base->lock);
+				call_timer_fn(timer, fn, data);
+				spin_lock(&base->lock);
+			} else {
+				spin_unlock_irq(&base->lock);
+				call_timer_fn(timer, fn, data);
+				spin_lock_irq(&base->lock);
+			}
 		}
 	}
 	base->running_timer = NULL;
-- 
1.7.7.3


^ permalink raw reply	[flat|nested] 31+ messages in thread

* $SUBJ should have been "[PATCHSET] timer: clean up initializers and implement irqsafe timers"
  2012-08-08 18:10 Tejun Heo
                   ` (3 preceding siblings ...)
  2012-08-08 18:10 ` [PATCH 4/4] timer: implement TIMER_IRQSAFE Tejun Heo
@ 2012-08-08 18:13 ` Tejun Heo
  2012-08-13 23:35 ` [PATCHSET] timer: clean up initializers and implement irqsafe timers Tejun Heo
  2012-08-14 18:55 ` Thomas Gleixner
  6 siblings, 0 replies; 31+ messages in thread
From: Tejun Heo @ 2012-08-08 18:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: torvalds, mingo, akpm, tglx, peterz

Either I'm keeping forgetting adding Subject: tag or git-send-mail is
somehow screwed up.  Sorry about that.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-08 18:10 Tejun Heo
                   ` (4 preceding siblings ...)
  2012-08-08 18:13 ` $SUBJ should have been "[PATCHSET] timer: clean up initializers and implement irqsafe timers" Tejun Heo
@ 2012-08-13 23:35 ` Tejun Heo
  2012-08-14  8:32   ` Thomas Gleixner
  2012-08-14 19:16   ` Thomas Gleixner
  2012-08-14 18:55 ` Thomas Gleixner
  6 siblings, 2 replies; 31+ messages in thread
From: Tejun Heo @ 2012-08-13 23:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: torvalds, mingo, akpm, tglx, peterz

Hello,

On Wed, Aug 08, 2012 at 11:10:24AM -0700, Tejun Heo wrote:
> Timer internals are protected by irqsafe lock but the lock is
> naturally dropped and irq enabled while a timer is executed.  This
> makes dequeueing timer for execution and the actual execution
> non-atomic against IRQs.  No matter what the timer function does, IRQs
> can occur between timer dispatch and execution.  This means that an
> IRQ handler could interrupt any timer in progress and it's impossible
> for an IRQ handler to cancel and drain a timer.

If nobody objects, I'll route this through wq/for-3.7 together with
"workqueue: use irqsafe timer in delayed_work" patchset.  If you
object, please scream.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-13 23:35 ` [PATCHSET] timer: clean up initializers and implement irqsafe timers Tejun Heo
@ 2012-08-14  8:32   ` Thomas Gleixner
  2012-08-14 19:16   ` Thomas Gleixner
  1 sibling, 0 replies; 31+ messages in thread
From: Thomas Gleixner @ 2012-08-14  8:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, torvalds, mingo, akpm, peterz

On Mon, 13 Aug 2012, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 08, 2012 at 11:10:24AM -0700, Tejun Heo wrote:
> > Timer internals are protected by irqsafe lock but the lock is
> > naturally dropped and irq enabled while a timer is executed.  This
> > makes dequeueing timer for execution and the actual execution
> > non-atomic against IRQs.  No matter what the timer function does, IRQs
> > can occur between timer dispatch and execution.  This means that an
> > IRQ handler could interrupt any timer in progress and it's impossible
> > for an IRQ handler to cancel and drain a timer.
> 
> If nobody objects, I'll route this through wq/for-3.7 together with
> "workqueue: use irqsafe timer in delayed_work" patchset.  If you
> object, please scream.

SCREAM!

I'll reply to the original posting later - still processing my
vacation backlog.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-08 18:10 Tejun Heo
                   ` (5 preceding siblings ...)
  2012-08-13 23:35 ` [PATCHSET] timer: clean up initializers and implement irqsafe timers Tejun Heo
@ 2012-08-14 18:55 ` Thomas Gleixner
  2012-08-14 19:15   ` Tejun Heo
  6 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2012-08-14 18:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, torvalds, mingo, akpm, peterz

On Wed, 8 Aug 2012, Tejun Heo wrote:
> Timer internals are protected by irqsafe lock but the lock is
> naturally dropped and irq enabled while a timer is executed.  This
> makes dequeueing timer for execution and the actual execution
> non-atomic against IRQs.  No matter what the timer function does, IRQs
> can occur between timer dispatch and execution.  This means that an
> IRQ handler could interrupt any timer in progress and it's impossible
> for an IRQ handler to cancel and drain a timer.
> 
> This restriction manifests as ugly convolutions in workqueue
> delayed_work interface.  A !idle delayed_work is either on timer,
> being transferred from timer to worklist, on worklist, or executing.
> There are interfaces which need to cancel a pending delayed_work -
> cancel_delayed_work() and friends and mod_delayed_work().  They want
> to cancel a work item in the first three states but it's impossible to
> drain the second state from IRQ handlers which lead to the following
> oddities.
> 
> * mod_delayed_work() can't be used from IRQ handlers.

This function does not exist. So what?
 
> * __cancel_delayed_work() can't use the usual try_to_grab_pending()
>   which handles all three states but instead only deals with the first
>   state using a separate implementation.  There's no way to make a
>   delayed_work not pending from IRQ handlers.

And why is that desired and justifies the mess you are trying to
create in the timer code?

> * The context / behavior differences among cancel_delayed_work(),
>   __cancel_delayed_work(), cancel_delayed_work_sync() are subtle and
>   confusing (the first two are mostly historical tho).

We have a lot of subtle differences in interfaces for similar
reasons.

> This patchset implements irqsafe timers.  For an irqsafe timer, IRQ is
> not enabled from dispatch till the end of its execution making it safe
> to drain the timer regardless of context.  This will enable cleaning
> up delayed_work interface.

By burdening crap on the timer code. We had a similar context case
handling in the original hrtimers code and Linus went berserk on it.
There is no real good reason to reinvent it in a different flavour.

Your general approach about workqueues seems to be adding hacks into
other sensitive code paths [ see __schedule() ]. Can we please stop
that? workqueues are not so special to justify that.

The whole delayed work handling is a fragile construct due to the
combination of the timer and the work struct itself. As long as the
timer is pending the work is detached from the workqueue code and
that's what causes the issues at hand.

So rather than hacking special cases into the timer code, why can't we
manage those delayed works in the workqueue core itself and avoid the
whole timer dance?

Right now delayed work arms a timer, whose callback enqueues the work
and wakes the worker thread, which then executes the work.

So what about changing delayed_work into:

struct delayed_work {
       struct work_struct work;
       unsigned long expires;
};

Now when delayed work gets scheduled it gets enqueued into a separate
list in the workqueue core with the proper worker lock held. Then
check the expiry time of the new work against the current expiry time
of a timer in the worker itself. If the new expiry time is after the
current expiry time, nothing to do. If the new expiry is before the
current expiry time or the timer is not armed, then (re)arm the timer.

When the timer expires it wakes the worker and that evaluates the
delayed list for expired works and executes them and rearms the timer
if necessary.

To cancel delayed work you don't have to worry about the timer
callback being executed at all, simply because the timer callback is
just a wakeup of the worker and not fiddling with the work itself. If
the work is removed before the worker thread runs, life goes on as
usual.

So all you have to do is to remove the work from the delayed list. If
the timer is armed, just leave it alone and let it fire. Canceling
delayed work is probably not a high frequency operation.

In fact that would make cancel_delayed_work and cancel_work basically
the same operation.

I have no idea how many concurrent delayed works are on the fly, so I
can't tell whether a simple ordered list is sufficient or if you need
a tree which is better suited for a large number of sorted items. But
that's a trivial to solve detail.

Thoughts?

	tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-14 18:55 ` Thomas Gleixner
@ 2012-08-14 19:15   ` Tejun Heo
  2012-08-14 20:43     ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2012-08-14 19:15 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, torvalds, mingo, akpm, peterz

Hello, Thomas.

On Tue, Aug 14, 2012 at 08:55:16PM +0200, Thomas Gleixner wrote:
> > * mod_delayed_work() can't be used from IRQ handlers.
> 
> This function does not exist. So what?

It makes the workqueue users messy.  It's difficult to get completely
correct and subtle errors are difficult to detect / verify.

> > * __cancel_delayed_work() can't use the usual try_to_grab_pending()
> >   which handles all three states but instead only deals with the first
> >   state using a separate implementation.  There's no way to make a
> >   delayed_work not pending from IRQ handlers.
> 
> And why is that desired and justifies the mess you are trying to
> create in the timer code?

Because API forcing its users to be messy is stupid.

> > * The context / behavior differences among cancel_delayed_work(),
> >   __cancel_delayed_work(), cancel_delayed_work_sync() are subtle and
> >   confusing (the first two are mostly historical tho).
> 
> We have a lot of subtle differences in interfaces for similar
> reasons.

And we should work to make them better.

> > This patchset implements irqsafe timers.  For an irqsafe timer, IRQ is
> > not enabled from dispatch till the end of its execution making it safe
> > to drain the timer regardless of context.  This will enable cleaning
> > up delayed_work interface.
> 
> By burdening crap on the timer code. We had a similar context case
> handling in the original hrtimers code and Linus went berserk on it.
> There is no real good reason to reinvent it in a different flavour.
> 
> Your general approach about workqueues seems to be adding hacks into
> other sensitive code paths [ see __schedule() ]. Can we please stop
> that? workqueues are not so special to justify that.

The schedule thing worked out pretty well, didn't it?  If it improves
the kernel in general, I don't see why timer shouldn't participate in
it.  Timer ain't that special either.  However, it does suck to add
one-off feature which isn't used by anyone else but I couldn't find a
better way.

So, if you can think of something better, sure.  Let's see.

> Right now delayed work arms a timer, whose callback enqueues the work
> and wakes the worker thread, which then executes the work.
> 
> So what about changing delayed_work into:
> 
> struct delayed_work {
>        struct work_struct work;
>        unsigned long expires;
> };
> 
> Now when delayed work gets scheduled it gets enqueued into a separate
> list in the workqueue core with the proper worker lock held. Then
> check the expiry time of the new work against the current expiry time
> of a timer in the worker itself.

Work items aren't assigned to worker on queue.  It's a shared worker
pool.  Workers take work items when they can.

> If the new expiry time is after the
> current expiry time, nothing to do. If the new expiry is before the
> current expiry time or the timer is not armed, then (re)arm the timer.
> 
> When the timer expires it wakes the worker and that evaluates the
> delayed list for expired works and executes them and rearms the timer
> if necessary.

How are you gonna decide which worker a delayed work item should be
queued on?  What if the work item before it takes a very long time to
finish?  Do we migrate those work items to a different worker?

> To cancel delayed work you don't have to worry about the timer
> callback being executed at all, simply because the timer callback is
> just a wakeup of the worker and not fiddling with the work itself. If
> the work is removed before the worker thread runs, life goes on as
> usual.
> 
> So all you have to do is to remove the work from the delayed list. If
> the timer is armed, just leave it alone and let it fire. Canceling
> delayed work is probably not a high frequency operation.
> 
> In fact that would make cancel_delayed_work and cancel_work basically
> the same operation.
> 
> I have no idea how many concurrent delayed works are on the fly, so I
> can't tell whether a simple ordered list is sufficient or if you need
> a tree which is better suited for a large number of sorted items. But
> that's a trivial to solve detail.

Aside from work <-> worker association confusion, you're basically
suggesting for workqueue to implement its own tvec_base in suboptimal
way.  Doesn't make much sense to me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-13 23:35 ` [PATCHSET] timer: clean up initializers and implement irqsafe timers Tejun Heo
  2012-08-14  8:32   ` Thomas Gleixner
@ 2012-08-14 19:16   ` Thomas Gleixner
  2012-08-14 19:22     ` Tejun Heo
  1 sibling, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2012-08-14 19:16 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, torvalds, mingo, akpm, peterz

On Mon, 13 Aug 2012, Tejun Heo wrote:
> On Wed, Aug 08, 2012 at 11:10:24AM -0700, Tejun Heo wrote:
> > Timer internals are protected by irqsafe lock but the lock is
> > naturally dropped and irq enabled while a timer is executed.  This
> > makes dequeueing timer for execution and the actual execution
> > non-atomic against IRQs.  No matter what the timer function does, IRQs
> > can occur between timer dispatch and execution.  This means that an
> > IRQ handler could interrupt any timer in progress and it's impossible
> > for an IRQ handler to cancel and drain a timer.
> 
> If nobody objects, I'll route this through wq/for-3.7 together with
> "workqueue: use irqsafe timer in delayed_work" patchset.  If you
> object, please scream.

Second thoughts.

Why the hell are you trying to rush stuff which affects a well
maintained part of the kernel through your own tree w/o having the
courtesy of contacting the maintainer politely instead of sending an
ultimatum?

You posted that series less than a week ago and there is no reason why
you need to push hard on inclusion into next after a few days.

If you really cared about my opinion you could have figured out that
I'm on vacation.

Yours grumpy

      tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-14 19:16   ` Thomas Gleixner
@ 2012-08-14 19:22     ` Tejun Heo
  2012-08-14 21:03       ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2012-08-14 19:22 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, torvalds, mingo, akpm, peterz

Hello,

On Tue, Aug 14, 2012 at 09:16:16PM +0200, Thomas Gleixner wrote:
> Why the hell are you trying to rush stuff which affects a well
> maintained part of the kernel through your own tree w/o having the
> courtesy of contacting the maintainer politely instead of sending an
> ultimatum?
>
> You posted that series less than a week ago and there is no reason why
> you need to push hard on inclusion into next after a few days.

So, it has been out for a while, and it's not the final inclusion into
mainline.  The change to timer is fairly localized and you have enough
opportunity to object before and after.  It's not like I'm gonna push
it in against oppositions.

> If you really cared about my opinion you could have figured out that
> I'm on vacation.

Ah... sorry but it's not like we date or anything.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-14 19:15   ` Tejun Heo
@ 2012-08-14 20:43     ` Thomas Gleixner
  2012-08-14 21:40       ` Tejun Heo
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2012-08-14 20:43 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, torvalds, mingo, akpm, peterz

On Tue, 14 Aug 2012, Tejun Heo wrote:
> On Tue, Aug 14, 2012 at 08:55:16PM +0200, Thomas Gleixner wrote:
> > > * mod_delayed_work() can't be used from IRQ handlers.
> > 
> > This function does not exist. So what?
> 
> It makes the workqueue users messy.  It's difficult to get completely
> correct and subtle errors are difficult to detect / verify.

Ah, the function which does not exist makes the users
messy. Interesting observation.
 
> > > * __cancel_delayed_work() can't use the usual try_to_grab_pending()
> > >   which handles all three states but instead only deals with the first
> > >   state using a separate implementation.  There's no way to make a
> > >   delayed_work not pending from IRQ handlers.
> > 
> > And why is that desired and justifies the mess you are trying to
> > create in the timer code?
> 
> Because API forcing its users to be messy is stupid.

That's not an answer to my question. Let me rephrase:

Why is it required to cancel delayed work from interupt handlers? Is
it just because people are doing stupid things or is there are real
requirement?

No API forces users to be messy. Most of the mess originates from
using the wrong mechanisms and then trying to work around that.

If you are trying to clean up mess which originates from there, then
you are just nurturing the abuse instead of fixing the underlying
design problems in the affected code pathes.
 
> > > * The context / behavior differences among cancel_delayed_work(),
> > >   __cancel_delayed_work(), cancel_delayed_work_sync() are subtle and
> > >   confusing (the first two are mostly historical tho).
> > 
> > We have a lot of subtle differences in interfaces for similar
> > reasons.
> 
> And we should work to make them better.

No objection, but not for the price of cluttering common and hot code
pathes with conditionals and special cases. There are always
situations where it's justified to have a special function which is
tailored for a particular context instead of making the common case
slow and ugly.

> > > This patchset implements irqsafe timers.  For an irqsafe timer, IRQ is
> > > not enabled from dispatch till the end of its execution making it safe
> > > to drain the timer regardless of context.  This will enable cleaning
> > > up delayed_work interface.
> > 
> > By burdening crap on the timer code. We had a similar context case
> > handling in the original hrtimers code and Linus went berserk on it.
> > There is no real good reason to reinvent it in a different flavour.
> > 
> > Your general approach about workqueues seems to be adding hacks into
> > other sensitive code paths [ see __schedule() ]. Can we please stop
> > that? workqueues are not so special to justify that.
> 
> The schedule thing worked out pretty well, didn't it?  If it improves

You know my opinion about it. It's still a fricking nightmare for RT
and aside of that I still don't see a proper justification for having
it in the guts of the scheduler.

> the kernel in general, I don't see why timer shouldn't participate in
> it.  Timer ain't that special either.  However, it does suck to add

I didn't say that timers are special and timers have been improved all
the time to make the life of their users simpler. Just there is a
subtle difference between making the life simpler and adding stuff
which is designed to work around shortcomings in other parts of the
kernel.

> one-off feature which isn't used by anyone else but I couldn't find a
> better way.
>  
> So, if you can think of something better, sure.  Let's see.
> 
> > Right now delayed work arms a timer, whose callback enqueues the work
> > and wakes the worker thread, which then executes the work.
> > 
> > So what about changing delayed_work into:
> > 
> > struct delayed_work {
> >        struct work_struct work;
> >        unsigned long expires;
> > };
> > 
> > Now when delayed work gets scheduled it gets enqueued into a separate
> > list in the workqueue core with the proper worker lock held. Then
> > check the expiry time of the new work against the current expiry time
> > of a timer in the worker itself.
> 
> Work items aren't assigned to worker on queue.  It's a shared worker
> pool.  Workers take work items when they can.

Then add it to the worker pool. It's an implementation detail not a
design problem.

> > If the new expiry time is after the
> > current expiry time, nothing to do. If the new expiry is before the
> > current expiry time or the timer is not armed, then (re)arm the timer.
> > 
> > When the timer expires it wakes the worker and that evaluates the
> > delayed list for expired works and executes them and rearms the timer
> > if necessary.
> 
> How are you gonna decide which worker a delayed work item should be
> queued on?  What if the work item before it takes a very long time to
> finish?  Do we migrate those work items to a different worker?

You're the expert on those worker details :)

> > To cancel delayed work you don't have to worry about the timer
> > callback being executed at all, simply because the timer callback is
> > just a wakeup of the worker and not fiddling with the work itself. If
> > the work is removed before the worker thread runs, life goes on as
> > usual.
> > 
> > So all you have to do is to remove the work from the delayed list. If
> > the timer is armed, just leave it alone and let it fire. Canceling
> > delayed work is probably not a high frequency operation.
> > 
> > In fact that would make cancel_delayed_work and cancel_work basically
> > the same operation.
> > 
> > I have no idea how many concurrent delayed works are on the fly, so I
> > can't tell whether a simple ordered list is sufficient or if you need
> > a tree which is better suited for a large number of sorted items. But
> > that's a trivial to solve detail.
> 
> Aside from work <-> worker association confusion, you're basically
> suggesting for workqueue to implement its own tvec_base in suboptimal
> way.  Doesn't make much sense to me.

Your approach of adding a special case timer for workqueues which
burdens all other users and opens another cans of worms of subtle bugs
in the timer code does not make any sense to me.

Further I did not say that you need to implement your own tvec_base
code into workqueues for that.

First of all you did not answer my question whether the delayed timer
based works are frequent enough to require something else than a
simple linked list which is evaluated by a worker thread for the next
expiry and the code which adds a new work just checks the cached next
expiry value. If you have the requirement of serving tons of those
delayed works, then you have the choice of using a set of ready to use
library function which just needs a litte tweakage to work with
jiffies instead of nsec based values.

Secondly and what's worse you are completely ignoring the fact that
the problem you are trying to solve is due to the disconnection of the
delayed work (pending on a timer) and the workqueue core.

The reason why you can't cancel/modify stuff is due to the locking
issues which arise due to del_timer_sync() in the current design.

So instead of thinking about the shortcomings of your own design you
want to force extra cases into the core timer code as you see fit. 

And looking at the use cases:

 block/blk-core.c                    |    6 ++----
 block/blk-throttle.c                |    7 +------
 drivers/block/floppy.c              |    3 +--
 drivers/infiniband/core/mad.c       |   14 +++++---------
 drivers/input/keyboard/qt2160.c     |    3 +--
 drivers/input/mouse/synaptics_i2c.c |    7 +------
 net/core/link_watch.c               |   21 ++++++---------------

It's not that impressive and I really want to see a detailed analysis
why all of this is necessary in the first place before we add special
cases to the core timer infrastucture.

You can strike drivers/block/floppy.c, drivers/input/keyboard/qt2160.c
and drivers/input/mouse/synaptics_i2c.c from that argument list as
they are really not interesting at all either due to being obsolete or
a complete mess to begin with.

For the others please bring up proper arguments why we need this
special casing and why those things can't do with other mechanisms.

Thanks,

	tglx




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-14 19:22     ` Tejun Heo
@ 2012-08-14 21:03       ` Thomas Gleixner
  2012-08-14 21:56         ` Tejun Heo
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2012-08-14 21:03 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, torvalds, mingo, akpm, peterz

On Tue, 14 Aug 2012, Tejun Heo wrote:
> On Tue, Aug 14, 2012 at 09:16:16PM +0200, Thomas Gleixner wrote:
> > Why the hell are you trying to rush stuff which affects a well
> > maintained part of the kernel through your own tree w/o having the
> > courtesy of contacting the maintainer politely instead of sending an
> > ultimatum?
> >
> > You posted that series less than a week ago and there is no reason why
> > you need to push hard on inclusion into next after a few days.
> 
> So, it has been out for a while, and it's not the final inclusion into

Out for a while? You must be kidding.

The original post is from Wed. Aug 8th and your ultimatum mail is
dated Mon Aug 13th. That's a mere 3 working days.

> mainline.  The change to timer is fairly localized and you have enough
> opportunity to object before and after.  It's not like I'm gonna push
> it in against oppositions.

You seem to misunderstand what pushing into -next means. -next is not
a playground for random changes. It's an integration tree for stuff
which is supposed to show up in the next merge window. And stuff which
touches other maintainers trees needs to orginate from there or at
least be acked/reviewed-by those maintainers.

Why should -next have different rules to mainline?

> > If you really cared about my opinion you could have figured out that
> > I'm on vacation.
> 
> Ah... sorry but it's not like we date or anything.

Fortunately not. I wouldn't touch such an insensitive clod with a ten
foot pole.

Though other people whom I'm not dating either are sensitive enough to
send a polite mail after a few days:

<cite>
 Thomas, I figured out on IRC that you are on vacation. Would
 you be so kind to look at the patches I've posted 10 days ago? ....
</cite>

Do you see the subtle difference between a polite reminder after 10
days and an ultimatum after 5 days ?????

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-14 20:43     ` Thomas Gleixner
@ 2012-08-14 21:40       ` Tejun Heo
  2012-08-14 23:12         ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2012-08-14 21:40 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, torvalds, mingo, akpm, peterz

Hello, Thomas.

On Tue, Aug 14, 2012 at 10:43:30PM +0200, Thomas Gleixner wrote:
> > It makes the workqueue users messy.  It's difficult to get completely
> > correct and subtle errors are difficult to detect / verify.
> 
> Ah, the function which does not exist makes the users
> messy. Interesting observation.

Can we get a little less snarky please?  It's tiring.

Lack of mod_delayed_work() has been an annoyance for quite a few
users.  People use either explicit cancel + queue or separate timer +
work_struct to work around, or just side-step it in some other way.
In any case, they end up implementing it ad-hoc in their code.  Nasty.

> That's not an answer to my question. Let me rephrase:
> 
> Why is it required to cancel delayed work from interupt handlers? Is
> it just because people are doing stupid things or is there are real
> requirement?

That's pretty much like asking why does del_timer() need to be called
from IRQ context.  Workqueue is pretty extensively used from IRQ
handlers.  Queueing a work item from IRQ handlers is a pretty common
operation and modding / canceling pending delayed work items isn't
difficult to reach from there.  Can people live without it?  Sure,
people can live without pretty much anything, but it's just not good.

> No API forces users to be messy. Most of the mess originates from
> using the wrong mechanisms and then trying to work around that.
> 
> If you are trying to clean up mess which originates from there, then
> you are just nurturing the abuse instead of fixing the underlying
> design problems in the affected code pathes.

There are good APIs and bad APIs.  delayed_work is widely used and
unfortunately has complications which stem from implementation details
which force its users to work around the shortcomings on their own.
It's usually better to go some extra miles (it's not even that much
here) in the common API if that saves cruft in its many users.

> > > > * The context / behavior differences among cancel_delayed_work(),
> > > >   __cancel_delayed_work(), cancel_delayed_work_sync() are subtle and
> > > >   confusing (the first two are mostly historical tho).
> > > 
> > > We have a lot of subtle differences in interfaces for similar
> > > reasons.
> > 
> > And we should work to make them better.
> 
> No objection, but not for the price of cluttering common and hot code
> pathes with conditionals and special cases. There are always
> situations where it's justified to have a special function which is
> tailored for a particular context instead of making the common case
> slow and ugly.

I agree the addition isn't the prettiest thing in the world.  I tried
to implement it inside workqueue proper by trying to detect the case
where try_to_grab_pending() is called from an IRQ handler which
interrupted delayed_work_timer_fn() but I couldn't make it work
reliably in any sane way.  So, if you can come up with something
better, be my guest.  Short of that, I think adding irqsafe timer is a
reasonable trade-off.

> > The schedule thing worked out pretty well, didn't it?  If it improves
> 
> You know my opinion about it. It's still a fricking nightmare for RT
> and aside of that I still don't see a proper justification for having
> it in the guts of the scheduler.

I can imagine it sucking for RT but for mainline, we now have
self-regulating worker pool without huge and messy heuristics trying
to guesstimate what's going on and we need some sort of worker pool
mechanism in kernel.  It might not suit your fancy but it has been
working pretty well.

> > the kernel in general, I don't see why timer shouldn't participate in
> > it.  Timer ain't that special either.  However, it does suck to add
> 
> I didn't say that timers are special and timers have been improved all
> the time to make the life of their users simpler. Just there is a
> subtle difference between making the life simpler and adding stuff
> which is designed to work around shortcomings in other parts of the
> kernel.

The problem here is that it's really difficult to make delayed_work
(or any other generic construct wrapping timer) work properly (people
naturally expect similar behavior to timer) using the interface timer
currently exposes and it's not caused by some random limitation in
delayed_work implementation.  Until now, the burden has been
transferred to delayed_work users, which is rather sad.

So, I really don't think there's any fundamental conflict regarding
the basic direction.

> > Aside from work <-> worker association confusion, you're basically
> > suggesting for workqueue to implement its own tvec_base in suboptimal
> > way.  Doesn't make much sense to me.
> 
> Your approach of adding a special case timer for workqueues which
> burdens all other users and opens another cans of worms of subtle bugs
> in the timer code does not make any sense to me.

What kind of cans of worms would it open?

> First of all you did not answer my question whether the delayed timer
> based works are frequent enough to require something else than a
> simple linked list which is evaluated by a worker thread for the next
> expiry and the code which adds a new work just checks the cached next
> expiry value. If you have the requirement of serving tons of those
> delayed works, then you have the choice of using a set of ready to use
> library function which just needs a litte tweakage to work with
> jiffies instead of nsec based values.

Grep for delayed_work.  It can be as frequent as any timer.  Some
drivers use it to delay certain processing from IRQ handlers.  Block
layer uses it a lot to manage plugging and other delayed processing.
They can generate a lot of timer modifications.  It isn't different
from any timer.

Custom ad-hoc unscalable timer mechanism vs. single if().

> Secondly and what's worse you are completely ignoring the fact that
> the problem you are trying to solve is due to the disconnection of the
> delayed work (pending on a timer) and the workqueue core.

Alternatively, it's addressing timer's deficiency of not allowing
building other generic constructs on top of it.  It's not some black
or white thing.

> The reason why you can't cancel/modify stuff is due to the locking
> issues which arise due to del_timer_sync() in the current design.
> 
> So instead of thinking about the shortcomings of your own design you
> want to force extra cases into the core timer code as you see fit. 

No, the shortcoming is fundamental in the *timer* interface if you
want to build something which incorporates it and be able to present
the same timer-like behavior to its users.  Really, try it with the
current interface.

Now, whether being able to wrap timer to build something like
delayed_work is necessary could be debated.  Maybe everyone should
just use timer + work_struct directly, but I think it's a bit too late
to have that discussion and even if we aren't I think having
delayed_work is a good idea.

> And looking at the use cases:
> 
>  block/blk-core.c                    |    6 ++----
>  block/blk-throttle.c                |    7 +------
>  drivers/block/floppy.c              |    3 +--
>  drivers/infiniband/core/mad.c       |   14 +++++---------
>  drivers/input/keyboard/qt2160.c     |    3 +--
>  drivers/input/mouse/synaptics_i2c.c |    7 +------
>  net/core/link_watch.c               |   21 ++++++---------------

Hmmm?  The diffstat is more like,

 arch/powerpc/platforms/cell/cpufreq_spudemand.c |    2
 block/blk-core.c                                |    8 -
 block/blk-throttle.c                            |    7 -
 drivers/block/floppy.c                          |    5
 drivers/cpufreq/cpufreq_conservative.c          |    2
 drivers/cpufreq/cpufreq_ondemand.c              |    2
 drivers/devfreq/devfreq.c                       |    2
 drivers/infiniband/core/mad.c                   |   16 --
 drivers/input/keyboard/qt2160.c                 |    3
 drivers/input/mouse/synaptics_i2c.c             |    7 -
 drivers/net/ethernet/mellanox/mlx4/sense.c      |    2
 drivers/power/ab8500_btemp.c                    |    2
 drivers/power/ab8500_charger.c                  |    8 -
 drivers/power/ab8500_fg.c                       |    8 -
 drivers/power/abx500_chargalg.c                 |    4
 drivers/power/max17040_battery.c                |    2
 drivers/video/omap2/displays/panel-taal.c       |    6
 drivers/video/omap2/dss/dsi.c                   |    6
 mm/slab.c                                       |    2
 mm/vmstat.c                                     |    2
 net/core/link_watch.c                           |   21 ---
 net/core/neighbour.c                            |    2
 net/ipv4/inetpeer.c                             |    2
 net/sunrpc/cache.c                              |    2

and these are only the obvious conversions and aren't simple
conversions.  Now they behave *properly*.  If the user calls cancel,
the delayed_work is reliably canceled and mod_delayed_work()
*reliably* schedule the work item at the specified time regardless of
the current state of delayed_work.

> It's not that impressive and I really want to see a detailed analysis
> why all of this is necessary in the first place before we add special
> cases to the core timer infrastucture.
> 
> You can strike drivers/block/floppy.c, drivers/input/keyboard/qt2160.c
> and drivers/input/mouse/synaptics_i2c.c from that argument list as
> they are really not interesting at all either due to being obsolete or
> a complete mess to begin with.
> 
> For the others please bring up proper arguments why we need this
> special casing and why those things can't do with other mechanisms.

I think I've written enough here and in the commit message but to sum
up,

A single if() in timer enables,

* Quite a few cleanups.  I did only cancel + queue ones here.  There
  are some which were using explicit timer + work_struct.

* Much cleaner API which behaves reliably as expected, which makes
  delayed_work users life easier and reduces the possibility of subtle
  bugs.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-14 21:03       ` Thomas Gleixner
@ 2012-08-14 21:56         ` Tejun Heo
  2012-08-14 22:45           ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2012-08-14 21:56 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, torvalds, mingo, akpm, peterz

Hello, Thomas.

On Tue, Aug 14, 2012 at 11:03:33PM +0200, Thomas Gleixner wrote:
> Why should -next have different rules to mainline?

It's faster paced and trees revert.  The message specifically was a
ping for objection and I was waiting for further response and would
have waited until early next week (and written another "applied"
message which would be another chance to veto).  And even if that
isn't enough for whatever reason and you or anyone else object it
afterwards, it'll get reverted / reouted differently / whatever.

As for subsystem boundary, at least I cross them and let others cross
if the changes aren't significant and the proposed changes are likely
to be be used only in that tree for the devel window.  The timer
change seems borderline to me.  It isn't trivial but doesn't seem all
that invasive to me.

I don't think any critical protocol is breached here.  If you're upset
about the style of the ping, I apologize.  I'll try to be more
sensitive when pinging you.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-14 21:56         ` Tejun Heo
@ 2012-08-14 22:45           ` Thomas Gleixner
  2012-08-14 23:01             ` Tejun Heo
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2012-08-14 22:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: LKML, Linus Torvalds, mingo, Andrew Morton, Peter Zijlstra,
	Stephen Rothwell

Tejun,

On Tue, 14 Aug 2012, Tejun Heo wrote:
> On Tue, Aug 14, 2012 at 11:03:33PM +0200, Thomas Gleixner wrote:
> > Why should -next have different rules to mainline?
> 
> It's faster paced and trees revert.  The message specifically was a

Nonsense. It's a tree which gets stuff which is cooked and not a tree
for random experimental crap. Period.

> ping for objection and I was waiting for further response and would
> have waited until early next week (and written another "applied"
> message which would be another chance to veto).  And even if that
> isn't enough for whatever reason and you or anyone else object it
> afterwards, it'll get reverted / reouted differently / whatever.
> 
> As for subsystem boundary, at least I cross them and let others cross
> if the changes aren't significant and the proposed changes are likely
> to be be used only in that tree for the devel window.  The timer
> change seems borderline to me.  It isn't trivial but doesn't seem all
> that invasive to me.

It does not matter at all whether you think it's invasive or
not. There is a reason why we have maintainers for different parts of
the kernel who are responsible for deciding what's invasive.

And we have very well worked out mechanisms regarding cross tree
changes, i.e. providing minimal trees to pull for other maintainers.

> I don't think any critical protocol is breached here.

May I recommend that you make yourself familiar with the way how this
community works?

> If you're upset about the style of the ping, I apologize.  I'll try
> to be more sensitive when pinging you.

It's not about me. You are trying to play the system.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-14 22:45           ` Thomas Gleixner
@ 2012-08-14 23:01             ` Tejun Heo
  2012-08-14 23:33               ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2012-08-14 23:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Linus Torvalds, mingo, Andrew Morton, Peter Zijlstra,
	Stephen Rothwell

Hello, Thomas.

On Wed, Aug 15, 2012 at 12:45:24AM +0200, Thomas Gleixner wrote:
> And we have very well worked out mechanisms regarding cross tree
> changes, i.e. providing minimal trees to pull for other maintainers.

If you look at the review branches, they're actually structured that
way so that the timer part can be pulled separately.  If the
maintainer wants to do that, sure.  If the maintainer thinks routing
through another tree is fine, that's okay too.  Subsystem boundaries
are all good and great but it's not some absolute barrier which can't
be crossed at any cost.

> > If you're upset about the style of the ping, I apologize.  I'll try
> > to be more sensitive when pinging you.
> 
> It's not about me. You are trying to play the system.

Thomas, I wasn't trying to get it through behind your back.  You have
been notified clearly multiple times and have ample opportunities to
object and suggest different ways if you don't like whatever is going
on.

I probably should have written "if the maintainer doesn't object, I
think it would be easier to route these through wq/for-3.7 as it will
be the only user for now, blah blah blah" instead and maybe I
misjudged the character of the changes or the subsystem.  That said, I
think you're inferring too much.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-14 21:40       ` Tejun Heo
@ 2012-08-14 23:12         ` Thomas Gleixner
  2012-08-14 23:27           ` Tejun Heo
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2012-08-14 23:12 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, torvalds, mingo, akpm, peterz

Tejun,

On Tue, 14 Aug 2012, Tejun Heo wrote:
> On Tue, Aug 14, 2012 at 10:43:30PM +0200, Thomas Gleixner wrote:
> > > It makes the workqueue users messy.  It's difficult to get completely
> > > correct and subtle errors are difficult to detect / verify.
> > 
> > Ah, the function which does not exist makes the users
> > messy. Interesting observation.
> 
> Can we get a little less snarky please?  It's tiring.

Can you please try to answer my questions instead of throwing random
blurb into my direction?

Just for the record. The thread evolved from here:

  <tj>    * mod_delayed_work() can't be used from IRQ handlers.

My answer was:

  <tglx>  This function does not exist. So what?

Which was completely appropriate as this function does not exist
though you used it as a primary argument for your patches.

Now your answer to my reply was:

  <tj>    It makes the workqueue users messy.  It's difficult to get
          completely correct and subtle errors are difficult to
          detect / verify.

Can you please point out any relevance to my question which would have
me prevented from writing the following?

  <tglx>  Ah, the function which does not exist makes the users
  	  messy. Interesting observation.

So instead of saying, that you wrote an utter nonsense reply you
accuse me of being obnoxious:

  <tj>    Can we get a little less snarky please?  It's tiring.

Can you please sit down for a little while and think about your own
snarkiness and your own tiring behaviour against other kernel
maintainers?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-14 23:12         ` Thomas Gleixner
@ 2012-08-14 23:27           ` Tejun Heo
  2012-08-14 23:46             ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2012-08-14 23:27 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, torvalds, mingo, akpm, peterz

Hello, Thomas.

On Wed, Aug 15, 2012 at 01:12:01AM +0200, Thomas Gleixner wrote:
> Just for the record. The thread evolved from here:
> 
>   <tj>    * mod_delayed_work() can't be used from IRQ handlers.
> 
> My answer was:
> 
>   <tglx>  This function does not exist. So what?
> 
> Which was completely appropriate as this function does not exist
> though you used it as a primary argument for your patches.

I read it as "so, what's wrong with not having mod_delayed_work()?",
so the response.

It exists in wq/for-3.7 and cancel_delayed_work() (the one without
preceding __) + queue() users have been already converted.

  http://thread.gmane.org/gmane.linux.kernel/1334546

> Can you please sit down for a little while and think about your own
> snarkiness and your own tiring behaviour against other kernel
> maintainers?

Believe it or not, I tend to work pretty well with other maintainers
and developers.  You start responding with words like "mess" and
"crap" with condescension sprinkled and expect the conversation to not
escalate?  Let's just stay technical, okay?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-14 23:01             ` Tejun Heo
@ 2012-08-14 23:33               ` Thomas Gleixner
  2012-08-15  0:18                 ` Tejun Heo
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2012-08-14 23:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: LKML, Linus Torvalds, mingo, Andrew Morton, Peter Zijlstra,
	Stephen Rothwell

On Tue, 14 Aug 2012, Tejun Heo wrote:
> On Wed, Aug 15, 2012 at 12:45:24AM +0200, Thomas Gleixner wrote:
> > And we have very well worked out mechanisms regarding cross tree
> > changes, i.e. providing minimal trees to pull for other maintainers.
> 
> If you look at the review branches, they're actually structured that
> way so that the timer part can be pulled separately.  If the
> maintainer wants to do that, sure.  If the maintainer thinks routing
> through another tree is fine, that's okay too.  Subsystem boundaries
> are all good and great but it's not some absolute barrier which can't
> be crossed at any cost.

That's not about any cost. You are trying to force stuff into -next
with a THREE workdays notice, just because you think that your stuff
is so important.

Have you ever tried to understand how the kernel development system
works?

> I probably should have written "if the maintainer doesn't object, I
> think it would be easier to route these through wq/for-3.7 as it will
> be the only user for now, blah blah blah" instead and maybe I
> misjudged the character of the changes or the subsystem.  That said, I
> think you're inferring too much.

No, I'm, not inferrring too much. It's not about what you should have
written.

It's about your general attitude. You really think that I accept all
that stuff just because you add some "blah, blah, blah" to some mail?

Far from it!

To convince me to accept your patches you should start answering my
questions and suggestions seriously in the first place and not
discarding them upfront as lunatic visions.

As long as you can't provide a proper counter argument against
maintaining the timer in the same context as the work, no matter what
the underlying mechanism to achieve this will be, I'm not going to
accept any of this hackery neither near next nor mainline.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-14 23:27           ` Tejun Heo
@ 2012-08-14 23:46             ` Thomas Gleixner
  2012-08-14 23:52               ` Tejun Heo
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2012-08-14 23:46 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, torvalds, mingo, akpm, peterz

Tejun,

On Tue, 14 Aug 2012, Tejun Heo wrote:
> On Wed, Aug 15, 2012 at 01:12:01AM +0200, Thomas Gleixner wrote:
> > Just for the record. The thread evolved from here:
> > 
> >   <tj>    * mod_delayed_work() can't be used from IRQ handlers.
> > 
> > My answer was:
> > 
> >   <tglx>  This function does not exist. So what?
> > 
> > Which was completely appropriate as this function does not exist
> > though you used it as a primary argument for your patches.
> 
> I read it as "so, what's wrong with not having mod_delayed_work()?",
> so the response.

Oh well. Your interpretation of "So what?" starts to stress my
patience.

> It exists in wq/for-3.7 and cancel_delayed_work() (the one without
> preceding __) + queue() users have been already converted.
> 
>   http://thread.gmane.org/gmane.linux.kernel/1334546

Do you really expect that I follow all of kernel dev posts within a
day of returning from a two weeks vacation?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-14 23:46             ` Thomas Gleixner
@ 2012-08-14 23:52               ` Tejun Heo
  0 siblings, 0 replies; 31+ messages in thread
From: Tejun Heo @ 2012-08-14 23:52 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, torvalds, mingo, akpm, peterz

Hello,

On Tue, Aug 14, 2012 at 4:46 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> Do you really expect that I follow all of kernel dev posts within a
> day of returning from a two weeks vacation?

The head message says on what it's based on and the git branch. I
can't read your mind or know your current state. You could have asked
using a proper sentence.

We can continue this, but I don't think this is leading any place
productive. I'm replying to your other reply about the suggestion
about implementing workqueue's own timerlist. Let's just talk about
that. Please.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-14 23:33               ` Thomas Gleixner
@ 2012-08-15  0:18                 ` Tejun Heo
  2012-08-15 10:58                   ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Tejun Heo @ 2012-08-15  0:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Linus Torvalds, mingo, Andrew Morton, Peter Zijlstra,
	Stephen Rothwell

Hello, Thomas.

On Wed, Aug 15, 2012 at 01:33:09AM +0200, Thomas Gleixner wrote:
> To convince me to accept your patches you should start answering my
> questions and suggestions seriously in the first place and not
> discarding them upfront as lunatic visions.
> 
> As long as you can't provide a proper counter argument against
> maintaining the timer in the same context as the work, no matter what
> the underlying mechanism to achieve this will be, I'm not going to
> accept any of this hackery neither near next nor mainline.

Sure, that's exactly why the patches are posted for review, so you're
suggesting for workqueue implement essentially its own timer list - be
that a simple sorted linked list or priority heap.  Am I understanding
you correctly?

If so, we're comparing the following two.

a. Adding IRQSAFE timer.  Runtime cost is one additional if() in timer
   execution path.

b. Implementing workqueue's own timer system which is driven by timer
   so that the timer part can also be protected by the usual workqueue
   synchronization.

To me, #a seems like the better choice here.

delayed_work is one of the more common constructs used widely in the
kernel.  It's often used in device drivers to defer processing to
process context and timer queueing (including modification) is a
frequent operation.

IRQ handlers schedule them, some drivers use it to poll the device,
block layer uses it for most of deferred handling - SCSI layer failing
to issue a command due to resource shortage reschedules delayed_work
repeatedly, and so on.

Essentially, delayed_work might be used for any purpose a timer is
used.  Timer users switch to delayed_work if process context becomes
necessary for whatever reason, so, I don't think we can get away with
simple sorted list implementation.  We might be okay under some
workloads but O(N^2) insertion complexity for something as commonly
used as delayed_work doesn't seem like a good idea to me.

We could go for more involved implementation, say a priority heap or
somewhat simplified version of tvec_base, but that seems like a bad
tradeoff to me.  We would be trading off fairly complex chunk of code
duplicating an existing capability to avoid adding fairly small
feature to the timer.  It will likely be worse than the proper timer
and we have to maintain two chunks of code doing about the same thing
to save single if() in the existing timer code.

Let's see if we can agree on the latter point first.  Do you agree
that it wouldn't be a good idea to implement relatively complex timer
subsystem inside workqueue?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-15  0:18                 ` Tejun Heo
@ 2012-08-15 10:58                   ` Peter Zijlstra
  2012-08-16 19:36                     ` Tejun Heo
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2012-08-15 10:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Thomas Gleixner, LKML, Linus Torvalds, mingo, Andrew Morton,
	Stephen Rothwell

On Tue, 2012-08-14 at 17:18 -0700, Tejun Heo wrote:
> Let's see if we can agree on the latter point first.  Do you agree
> that it wouldn't be a good idea to implement relatively complex timer
> subsystem inside workqueue? 

RB-trees are fairly trivial to use, but can we please get back to why
people want to do del/mod delayed work from IRQ context?

I can get the queueing part, but why do they need to cancel and or
modify stuff?

Trying to come up with a solution to a problem you don't understand is
kinda difficult.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers
  2012-08-15 10:58                   ` Peter Zijlstra
@ 2012-08-16 19:36                     ` Tejun Heo
  0 siblings, 0 replies; 31+ messages in thread
From: Tejun Heo @ 2012-08-16 19:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, Linus Torvalds, mingo, Andrew Morton,
	Stephen Rothwell

Hello, Peter.

On Wed, Aug 15, 2012 at 12:58:27PM +0200, Peter Zijlstra wrote:
> On Tue, 2012-08-14 at 17:18 -0700, Tejun Heo wrote:
> > Let's see if we can agree on the latter point first.  Do you agree
> > that it wouldn't be a good idea to implement relatively complex timer
> > subsystem inside workqueue? 
> 
> RB-trees are fairly trivial to use,

I'll get back to this later.

> but can we please get back to why
> people want to do del/mod delayed work from IRQ context?
> 
> I can get the queueing part, but why do they need to cancel and or
> modify stuff?

It isn't different from being able to use del_timer() or mod_timer()
from IRQ handlers.

For example, block layer uses delayed_work to restart queue processing
after a short delay for whatever reason.  Depending on the driver,
request issuing can happen from its IRQ handler chained from
completion.  If the command processing detects a condition which
indicates that the queue can't process any more requests until another
event happens, it shoots down the delayed_work.

Another example is drivers using polling and irq together.  If IRQ
happens, they shoot down the pending delayed_work and schedules it
immediately.  Another similar use is timeout handler.  Schedule
timeout handler when initiating an operation and cancel it on
completion IRQ.

Apart from having process context when executing the callback,
delayed_work's use case isn't different from timer.  It might as well
be named sleepable_timer and implemented as part of timer with
cooperation from workqueue.

As such, it's natural to expect interface and behavior similar to
those of timer and that's another reason why implementing workqueue's
own timerlist isn't a good idea.  e.g. What about deferrable
delayed_work?  We definitely better have that and I'm sure it can also
be implemented separately in workqueue too, but it seems quite silly
to me.  Let's say we implement it by having two timer_lists, one
deferrable and one not.  What if someone wants to adjust timer slack -
ie. what about users which can use much larger slack than allowed by
the default deferrable?

Yet another thing is that not being able to use
cancel/mod_delayed_work() from IRQ handlers is inherently bad API.
The restriction isn't inherent in what it does.  It rises purely from
implementation details.  For an API which is as widely used as
workqueue, especially by drivers, that just doesn't seem like a good
idea.

The downside being one additional if() in timer execution path, to me,
the tradeoff seems clear.  If exposing IRQSAFE to other users or
allowing del_timer_sync() from IRQ context is bothersome, those can be
dropped too.  The *only* thing which is necessary is for timer to not
enable IRQ between delayed_work timer being dequeued and its
execution.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip:timers/core] timer: Generalize timer->base flags handling
  2012-08-08 18:10 ` [PATCH 1/4] timer: generalize timer->base flags handling Tejun Heo
@ 2012-08-21 16:40   ` tip-bot for Tejun Heo
  0 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Tejun Heo @ 2012-08-21 16:40 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tj, tglx

Commit-ID:  e52b1db37b89b69ceb08b521a808bd2cf4724481
Gitweb:     http://git.kernel.org/tip/e52b1db37b89b69ceb08b521a808bd2cf4724481
Author:     Tejun Heo <tj@kernel.org>
AuthorDate: Wed, 8 Aug 2012 11:10:25 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 21 Aug 2012 16:28:30 +0200

timer: Generalize timer->base flags handling

To prepare for addition of another flag, generalize timer->base flags
handling.

* Rename from TBASE_*_FLAG to TIMER_* and make them LU constants.

* Define and use TIMER_FLAG_MASK for flags masking so that multiple
  flags can be handled correctly.

* Don't dereference timer->base directly even if
  !tbase_get_deferrable().  All two such places are already passed in
  @base, so use it instead.

* Make sure tvec_base's alignment is large enough for timer->base
  flags using BUILD_BUG_ON().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1344449428-24962-2-git-send-email-tj@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/timer.h |    6 ++++--
 kernel/timer.c        |   21 +++++++++++++--------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 6abd913..cbd32ec 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -58,7 +58,9 @@ extern struct tvec_base boot_tvec_bases;
  * the timer will be serviced when the CPU eventually wakes up with a
  * subsequent non-deferrable timer.
  */
-#define TBASE_DEFERRABLE_FLAG		(0x1)
+#define TIMER_DEFERRABLE		0x1LU
+
+#define TIMER_FLAG_MASK			0x1LU
 
 #define TIMER_INITIALIZER(_function, _expires, _data) {		\
 		.entry = { .prev = TIMER_ENTRY_STATIC },	\
@@ -72,7 +74,7 @@ extern struct tvec_base boot_tvec_bases;
 	}
 
 #define TBASE_MAKE_DEFERRED(ptr) ((struct tvec_base *)		\
-		  ((unsigned char *)(ptr) + TBASE_DEFERRABLE_FLAG))
+		  ((unsigned char *)(ptr) + TIMER_DEFERRABLE))
 
 #define TIMER_DEFERRED_INITIALIZER(_function, _expires, _data) {\
 		.entry = { .prev = TIMER_ENTRY_STATIC },	\
diff --git a/kernel/timer.c b/kernel/timer.c
index a61c093..cf7af56 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -92,12 +92,12 @@ static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases;
 /* Functions below help us manage 'deferrable' flag */
 static inline unsigned int tbase_get_deferrable(struct tvec_base *base)
 {
-	return ((unsigned int)(unsigned long)base & TBASE_DEFERRABLE_FLAG);
+	return ((unsigned int)(unsigned long)base & TIMER_DEFERRABLE);
 }
 
 static inline struct tvec_base *tbase_get_base(struct tvec_base *base)
 {
-	return ((struct tvec_base *)((unsigned long)base & ~TBASE_DEFERRABLE_FLAG));
+	return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK));
 }
 
 static inline void timer_set_deferrable(struct timer_list *timer)
@@ -108,8 +108,9 @@ static inline void timer_set_deferrable(struct timer_list *timer)
 static inline void
 timer_set_base(struct timer_list *timer, struct tvec_base *new_base)
 {
-	timer->base = (struct tvec_base *)((unsigned long)(new_base) |
-				      tbase_get_deferrable(timer->base));
+	unsigned long flags = (unsigned long)timer->base & TIMER_FLAG_MASK;
+
+	timer->base = (struct tvec_base *)((unsigned long)(new_base) | flags);
 }
 
 static unsigned long round_jiffies_common(unsigned long j, int cpu,
@@ -686,7 +687,7 @@ detach_expired_timer(struct timer_list *timer, struct tvec_base *base)
 {
 	detach_timer(timer, true);
 	if (!tbase_get_deferrable(timer->base))
-		timer->base->active_timers--;
+		base->active_timers--;
 }
 
 static int detach_if_pending(struct timer_list *timer, struct tvec_base *base,
@@ -697,7 +698,7 @@ static int detach_if_pending(struct timer_list *timer, struct tvec_base *base,
 
 	detach_timer(timer, clear_pending);
 	if (!tbase_get_deferrable(timer->base)) {
-		timer->base->active_timers--;
+		base->active_timers--;
 		if (timer->expires == base->next_timer)
 			base->next_timer = base->timer_jiffies;
 	}
@@ -1800,9 +1801,13 @@ static struct notifier_block __cpuinitdata timers_nb = {
 
 void __init init_timers(void)
 {
-	int err = timer_cpu_notify(&timers_nb, (unsigned long)CPU_UP_PREPARE,
-				(void *)(long)smp_processor_id());
+	int err;
+
+	/* ensure there are enough low bits for flags in timer->base pointer */
+	BUILD_BUG_ON(__alignof__(struct tvec_base) & TIMER_FLAG_MASK);
 
+	err = timer_cpu_notify(&timers_nb, (unsigned long)CPU_UP_PREPARE,
+			       (void *)(long)smp_processor_id());
 	init_timer_stats();
 
 	BUG_ON(err != NOTIFY_OK);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip:timers/core] timer: Relocate declarations of init_timer_on_stack_key()
  2012-08-08 18:10 ` [PATCH 2/4] timer: relocate declarations of init_timer_on_stack_key() Tejun Heo
@ 2012-08-21 16:41   ` tip-bot for Tejun Heo
  0 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Tejun Heo @ 2012-08-21 16:41 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tj, tglx

Commit-ID:  5a9af38d05f6a1bd0d3f1f69a074cdbe9c87e977
Gitweb:     http://git.kernel.org/tip/5a9af38d05f6a1bd0d3f1f69a074cdbe9c87e977
Author:     Tejun Heo <tj@kernel.org>
AuthorDate: Wed, 8 Aug 2012 11:10:26 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 21 Aug 2012 16:28:30 +0200

timer: Relocate declarations of init_timer_on_stack_key()

init_timer_on_stack_key() is used by init macro definitions.  Move
init_timer_on_stack_key() and destroy_timer_on_stack() declarations
above init macro defs.  This will make the next init cleanup patch
easier to read.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1344449428-24962-3-git-send-email-tj@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/timer.h |   30 +++++++++++++++---------------
 1 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index cbd32ec..1d364ae 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -97,6 +97,21 @@ void init_timer_deferrable_key(struct timer_list *timer,
 			       const char *name,
 			       struct lock_class_key *key);
 
+#ifdef CONFIG_DEBUG_OBJECTS_TIMERS
+extern void init_timer_on_stack_key(struct timer_list *timer,
+				    const char *name,
+				    struct lock_class_key *key);
+extern void destroy_timer_on_stack(struct timer_list *timer);
+#else
+static inline void destroy_timer_on_stack(struct timer_list *timer) { }
+static inline void init_timer_on_stack_key(struct timer_list *timer,
+					   const char *name,
+					   struct lock_class_key *key)
+{
+	init_timer_key(timer, name, key);
+}
+#endif
+
 #ifdef CONFIG_LOCKDEP
 #define init_timer(timer)						\
 	do {								\
@@ -150,21 +165,6 @@ void init_timer_deferrable_key(struct timer_list *timer,
 	setup_deferrable_timer_on_stack_key((timer), NULL, NULL, (fn), (data))
 #endif
 
-#ifdef CONFIG_DEBUG_OBJECTS_TIMERS
-extern void init_timer_on_stack_key(struct timer_list *timer,
-				    const char *name,
-				    struct lock_class_key *key);
-extern void destroy_timer_on_stack(struct timer_list *timer);
-#else
-static inline void destroy_timer_on_stack(struct timer_list *timer) { }
-static inline void init_timer_on_stack_key(struct timer_list *timer,
-					   const char *name,
-					   struct lock_class_key *key)
-{
-	init_timer_key(timer, name, key);
-}
-#endif
-
 static inline void setup_timer_key(struct timer_list * timer,
 				const char *name,
 				struct lock_class_key *key,

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip:timers/core] timer: Clean up timer initializers
  2012-08-08 18:10 ` [PATCH 3/4] timer: clean up timer initializers Tejun Heo
@ 2012-08-21 16:42   ` tip-bot for Tejun Heo
  0 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Tejun Heo @ 2012-08-21 16:42 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tj, tglx

Commit-ID:  fc683995a6c4e604d62ab9a488ac2c1ba94fa868
Gitweb:     http://git.kernel.org/tip/fc683995a6c4e604d62ab9a488ac2c1ba94fa868
Author:     Tejun Heo <tj@kernel.org>
AuthorDate: Wed, 8 Aug 2012 11:10:27 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 21 Aug 2012 16:28:30 +0200

timer: Clean up timer initializers

Over time, timer initializers became messy with unnecessarily
duplicated code which are inconsistently spread across timer.h and
timer.c.

This patch cleans up timer initializers.

* timer.c::__init_timer() is renamed to do_init_timer().

* __TIMER_INITIALIZER() added.  It takes @flags and all initializers
  are wrappers around it.

* init_timer[_on_stack]_key() now take @flags.

* __init_timer[_on_stack]() added.  They take @flags and all init
  macros are wrappers around them.

* __setup_timer[_on_stack]() added.  It uses __init_timer() and takes
  @flags.  All setup macros are wrappers around the two.

Note that this patch doesn't add missing init/setup combinations -
e.g. init_timer_deferrable_on_stack().  Adding missing ones is
trivial.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1344449428-24962-4-git-send-email-tj@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/timer.h |  123 +++++++++++++++---------------------------------
 kernel/timer.c        |   56 ++++++-----------------
 2 files changed, 53 insertions(+), 126 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 1d364ae..3f95c1f 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -62,136 +62,91 @@ extern struct tvec_base boot_tvec_bases;
 
 #define TIMER_FLAG_MASK			0x1LU
 
-#define TIMER_INITIALIZER(_function, _expires, _data) {		\
+#define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
 		.entry = { .prev = TIMER_ENTRY_STATIC },	\
 		.function = (_function),			\
 		.expires = (_expires),				\
 		.data = (_data),				\
-		.base = &boot_tvec_bases,			\
+		.base = (void *)((unsigned long)&boot_tvec_bases + (_flags)), \
 		.slack = -1,					\
 		__TIMER_LOCKDEP_MAP_INITIALIZER(		\
 			__FILE__ ":" __stringify(__LINE__))	\
 	}
 
-#define TBASE_MAKE_DEFERRED(ptr) ((struct tvec_base *)		\
-		  ((unsigned char *)(ptr) + TIMER_DEFERRABLE))
+#define TIMER_INITIALIZER(_function, _expires, _data)		\
+	__TIMER_INITIALIZER((_function), (_expires), (_data), 0)
 
-#define TIMER_DEFERRED_INITIALIZER(_function, _expires, _data) {\
-		.entry = { .prev = TIMER_ENTRY_STATIC },	\
-		.function = (_function),			\
-		.expires = (_expires),				\
-		.data = (_data),				\
-		.base = TBASE_MAKE_DEFERRED(&boot_tvec_bases),	\
-		__TIMER_LOCKDEP_MAP_INITIALIZER(		\
-			__FILE__ ":" __stringify(__LINE__))	\
-	}
+#define TIMER_DEFERRED_INITIALIZER(_function, _expires, _data)	\
+	__TIMER_INITIALIZER((_function), (_expires), (_data), TIMER_DEFERRABLE)
 
 #define DEFINE_TIMER(_name, _function, _expires, _data)		\
 	struct timer_list _name =				\
 		TIMER_INITIALIZER(_function, _expires, _data)
 
-void init_timer_key(struct timer_list *timer,
-		    const char *name,
-		    struct lock_class_key *key);
-void init_timer_deferrable_key(struct timer_list *timer,
-			       const char *name,
-			       struct lock_class_key *key);
+void init_timer_key(struct timer_list *timer, unsigned int flags,
+		    const char *name, struct lock_class_key *key);
 
 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
 extern void init_timer_on_stack_key(struct timer_list *timer,
-				    const char *name,
+				    unsigned int flags, const char *name,
 				    struct lock_class_key *key);
 extern void destroy_timer_on_stack(struct timer_list *timer);
 #else
 static inline void destroy_timer_on_stack(struct timer_list *timer) { }
 static inline void init_timer_on_stack_key(struct timer_list *timer,
-					   const char *name,
+					   unsigned int flags, const char *name,
 					   struct lock_class_key *key)
 {
-	init_timer_key(timer, name, key);
+	init_timer_key(timer, flags, name, key);
 }
 #endif
 
 #ifdef CONFIG_LOCKDEP
-#define init_timer(timer)						\
+#define __init_timer(_timer, _flags)					\
 	do {								\
 		static struct lock_class_key __key;			\
-		init_timer_key((timer), #timer, &__key);		\
+		init_timer_key((_timer), (_flags), #_timer, &__key);	\
 	} while (0)
 
-#define init_timer_deferrable(timer)					\
+#define __init_timer_on_stack(_timer, _flags)				\
 	do {								\
 		static struct lock_class_key __key;			\
-		init_timer_deferrable_key((timer), #timer, &__key);	\
+		init_timer_on_stack_key((_timer), (_flags), #_timer, &__key); \
 	} while (0)
+#else
+#define __init_timer(_timer, _flags)					\
+	init_timer_key((_timer), (_flags), NULL, NULL)
+#define __init_timer_on_stack(_timer, _flags)				\
+	init_timer_on_stack_key((_timer), (_flags), NULL, NULL)
+#endif
 
+#define init_timer(timer)						\
+	__init_timer((timer), 0)
+#define init_timer_deferrable(timer)					\
+	__init_timer((timer), TIMER_DEFERRABLE)
 #define init_timer_on_stack(timer)					\
+	__init_timer_on_stack((timer), 0)
+
+#define __setup_timer(_timer, _fn, _data, _flags)			\
 	do {								\
-		static struct lock_class_key __key;			\
-		init_timer_on_stack_key((timer), #timer, &__key);	\
+		__init_timer((_timer), (_flags));			\
+		(_timer)->function = (_fn);				\
+		(_timer)->data = (_data);				\
 	} while (0)
 
-#define setup_timer(timer, fn, data)					\
+#define __setup_timer_on_stack(_timer, _fn, _data, _flags)		\
 	do {								\
-		static struct lock_class_key __key;			\
-		setup_timer_key((timer), #timer, &__key, (fn), (data));\
+		__init_timer_on_stack((_timer), (_flags));		\
+		(_timer)->function = (_fn);				\
+		(_timer)->data = (_data);				\
 	} while (0)
 
+#define setup_timer(timer, fn, data)					\
+	__setup_timer((timer), (fn), (data), 0)
 #define setup_timer_on_stack(timer, fn, data)				\
-	do {								\
-		static struct lock_class_key __key;			\
-		setup_timer_on_stack_key((timer), #timer, &__key,	\
-					 (fn), (data));			\
-	} while (0)
+	__setup_timer_on_stack((timer), (fn), (data), 0)
 #define setup_deferrable_timer_on_stack(timer, fn, data)		\
-	do {								\
-		static struct lock_class_key __key;			\
-		setup_deferrable_timer_on_stack_key((timer), #timer,	\
-						    &__key, (fn),	\
-						    (data));		\
-	} while (0)
-#else
-#define init_timer(timer)\
-	init_timer_key((timer), NULL, NULL)
-#define init_timer_deferrable(timer)\
-	init_timer_deferrable_key((timer), NULL, NULL)
-#define init_timer_on_stack(timer)\
-	init_timer_on_stack_key((timer), NULL, NULL)
-#define setup_timer(timer, fn, data)\
-	setup_timer_key((timer), NULL, NULL, (fn), (data))
-#define setup_timer_on_stack(timer, fn, data)\
-	setup_timer_on_stack_key((timer), NULL, NULL, (fn), (data))
-#define setup_deferrable_timer_on_stack(timer, fn, data)\
-	setup_deferrable_timer_on_stack_key((timer), NULL, NULL, (fn), (data))
-#endif
-
-static inline void setup_timer_key(struct timer_list * timer,
-				const char *name,
-				struct lock_class_key *key,
-				void (*function)(unsigned long),
-				unsigned long data)
-{
-	timer->function = function;
-	timer->data = data;
-	init_timer_key(timer, name, key);
-}
-
-static inline void setup_timer_on_stack_key(struct timer_list *timer,
-					const char *name,
-					struct lock_class_key *key,
-					void (*function)(unsigned long),
-					unsigned long data)
-{
-	timer->function = function;
-	timer->data = data;
-	init_timer_on_stack_key(timer, name, key);
-}
-
-extern void setup_deferrable_timer_on_stack_key(struct timer_list *timer,
-						const char *name,
-						struct lock_class_key *key,
-						void (*function)(unsigned long),
-						unsigned long data);
+	__setup_timer_on_stack((timer), (fn), (data), TIMER_DEFERRABLE)
 
 /**
  * timer_pending - is a timer pending?
diff --git a/kernel/timer.c b/kernel/timer.c
index cf7af56..8d185a1 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -100,11 +100,6 @@ static inline struct tvec_base *tbase_get_base(struct tvec_base *base)
 	return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK));
 }
 
-static inline void timer_set_deferrable(struct timer_list *timer)
-{
-	timer->base = TBASE_MAKE_DEFERRED(timer->base);
-}
-
 static inline void
 timer_set_base(struct timer_list *timer, struct tvec_base *new_base)
 {
@@ -564,16 +559,14 @@ static inline void debug_timer_assert_init(struct timer_list *timer)
 	debug_object_assert_init(timer, &timer_debug_descr);
 }
 
-static void __init_timer(struct timer_list *timer,
-			 const char *name,
-			 struct lock_class_key *key);
+static void do_init_timer(struct timer_list *timer, unsigned int flags,
+			  const char *name, struct lock_class_key *key);
 
-void init_timer_on_stack_key(struct timer_list *timer,
-			     const char *name,
-			     struct lock_class_key *key)
+void init_timer_on_stack_key(struct timer_list *timer, unsigned int flags,
+			     const char *name, struct lock_class_key *key)
 {
 	debug_object_init_on_stack(timer, &timer_debug_descr);
-	__init_timer(timer, name, key);
+	do_init_timer(timer, flags, name, key);
 }
 EXPORT_SYMBOL_GPL(init_timer_on_stack_key);
 
@@ -614,12 +607,13 @@ static inline void debug_assert_init(struct timer_list *timer)
 	debug_timer_assert_init(timer);
 }
 
-static void __init_timer(struct timer_list *timer,
-			 const char *name,
-			 struct lock_class_key *key)
+static void do_init_timer(struct timer_list *timer, unsigned int flags,
+			  const char *name, struct lock_class_key *key)
 {
+	struct tvec_base *base = __raw_get_cpu_var(tvec_bases);
+
 	timer->entry.next = NULL;
-	timer->base = __raw_get_cpu_var(tvec_bases);
+	timer->base = (void *)((unsigned long)base | flags);
 	timer->slack = -1;
 #ifdef CONFIG_TIMER_STATS
 	timer->start_site = NULL;
@@ -629,22 +623,10 @@ static void __init_timer(struct timer_list *timer,
 	lockdep_init_map(&timer->lockdep_map, name, key, 0);
 }
 
-void setup_deferrable_timer_on_stack_key(struct timer_list *timer,
-					 const char *name,
-					 struct lock_class_key *key,
-					 void (*function)(unsigned long),
-					 unsigned long data)
-{
-	timer->function = function;
-	timer->data = data;
-	init_timer_on_stack_key(timer, name, key);
-	timer_set_deferrable(timer);
-}
-EXPORT_SYMBOL_GPL(setup_deferrable_timer_on_stack_key);
-
 /**
  * init_timer_key - initialize a timer
  * @timer: the timer to be initialized
+ * @flags: timer flags
  * @name: name of the timer
  * @key: lockdep class key of the fake lock used for tracking timer
  *       sync lock dependencies
@@ -652,24 +634,14 @@ EXPORT_SYMBOL_GPL(setup_deferrable_timer_on_stack_key);
  * init_timer_key() must be done to a timer prior calling *any* of the
  * other timer functions.
  */
-void init_timer_key(struct timer_list *timer,
-		    const char *name,
-		    struct lock_class_key *key)
+void init_timer_key(struct timer_list *timer, unsigned int flags,
+		    const char *name, struct lock_class_key *key)
 {
 	debug_init(timer);
-	__init_timer(timer, name, key);
+	do_init_timer(timer, flags, name, key);
 }
 EXPORT_SYMBOL(init_timer_key);
 
-void init_timer_deferrable_key(struct timer_list *timer,
-			       const char *name,
-			       struct lock_class_key *key)
-{
-	init_timer_key(timer, name, key);
-	timer_set_deferrable(timer);
-}
-EXPORT_SYMBOL(init_timer_deferrable_key);
-
 static inline void detach_timer(struct timer_list *timer, bool clear_pending)
 {
 	struct list_head *entry = &timer->entry;

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip:timers/core] timer: Implement TIMER_IRQSAFE
  2012-08-08 18:10 ` [PATCH 4/4] timer: implement TIMER_IRQSAFE Tejun Heo
@ 2012-08-21 16:43   ` tip-bot for Tejun Heo
  2012-08-21 19:26     ` Tejun Heo
  0 siblings, 1 reply; 31+ messages in thread
From: tip-bot for Tejun Heo @ 2012-08-21 16:43 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tj, tglx

Commit-ID:  c5f66e99b7cb091e3d51ae8e8156892e8feb7fa3
Gitweb:     http://git.kernel.org/tip/c5f66e99b7cb091e3d51ae8e8156892e8feb7fa3
Author:     Tejun Heo <tj@kernel.org>
AuthorDate: Wed, 8 Aug 2012 11:10:28 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 21 Aug 2012 16:28:31 +0200

timer: Implement TIMER_IRQSAFE

Timer internals are protected with irq-safe locks but timer execution
isn't, so a timer being dequeued for execution and its execution
aren't atomic against IRQs.  This makes it impossible to wait for its
completion from IRQ handlers and difficult to shoot down a timer from
IRQ handlers.

This issue caused some issues for delayed_work interface.  Because
there's no way to reliably shoot down delayed_work->timer from IRQ
handlers, __cancel_delayed_work() can't share the logic to steal the
target delayed_work with cancel_delayed_work_sync(), and can only
steal delayed_works which are on queued on timer.  Similarly, the
pending mod_delayed_work() can't be used from IRQ handlers.

This patch adds a new timer flag TIMER_IRQSAFE, which makes the timer
to be executed without enabling IRQ after dequeueing such that its
dequeueing and execution are atomic against IRQ handlers.

This makes it safe to wait for the timer's completion from IRQ
handlers, for example, using del_timer_sync().  It can never be
executing on the local CPU and if executing on other CPUs it won't be
interrupted until done.

This will enable simplifying delayed_work cancel/mod interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1344449428-24962-5-git-send-email-tj@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/timer.h |   16 ++++++++++++----
 kernel/timer.c        |   35 ++++++++++++++++++++++++-----------
 2 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 3f95c1f..8c5a197 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -49,18 +49,26 @@ extern struct tvec_base boot_tvec_bases;
 #endif
 
 /*
- * Note that all tvec_bases are 2 byte aligned and lower bit of
- * base in timer_list is guaranteed to be zero. Use the LSB to
- * indicate whether the timer is deferrable.
+ * Note that all tvec_bases are at least 4 byte aligned and lower two bits
+ * of base in timer_list is guaranteed to be zero. Use them for flags.
  *
  * A deferrable timer will work normally when the system is busy, but
  * will not cause a CPU to come out of idle just to service it; instead,
  * the timer will be serviced when the CPU eventually wakes up with a
  * subsequent non-deferrable timer.
+ *
+ * An irqsafe timer is executed with IRQ disabled and it's safe to wait for
+ * the completion of the running instance from IRQ handlers, for example,
+ * by calling del_timer_sync().
+ *
+ * Note: The irq disabled callback execution is a special case for
+ * workqueue locking issues. It's not meant for executing random crap
+ * with interrupts disabled. Abuse is monitored!
  */
 #define TIMER_DEFERRABLE		0x1LU
+#define TIMER_IRQSAFE			0x2LU
 
-#define TIMER_FLAG_MASK			0x1LU
+#define TIMER_FLAG_MASK			0x3LU
 
 #define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
 		.entry = { .prev = TIMER_ENTRY_STATIC },	\
diff --git a/kernel/timer.c b/kernel/timer.c
index 8d185a1..706fe4c 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -95,6 +95,11 @@ static inline unsigned int tbase_get_deferrable(struct tvec_base *base)
 	return ((unsigned int)(unsigned long)base & TIMER_DEFERRABLE);
 }
 
+static inline unsigned int tbase_get_irqsafe(struct tvec_base *base)
+{
+	return ((unsigned int)(unsigned long)base & TIMER_IRQSAFE);
+}
+
 static inline struct tvec_base *tbase_get_base(struct tvec_base *base)
 {
 	return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK));
@@ -1002,14 +1007,14 @@ EXPORT_SYMBOL(try_to_del_timer_sync);
  *
  * Synchronization rules: Callers must prevent restarting of the timer,
  * otherwise this function is meaningless. It must not be called from
- * interrupt contexts. The caller must not hold locks which would prevent
- * completion of the timer's handler. The timer's handler must not call
- * add_timer_on(). Upon exit the timer is not queued and the handler is
- * not running on any CPU.
+ * interrupt contexts unless the timer is an irqsafe one. The caller must
+ * not hold locks which would prevent completion of the timer's
+ * handler. The timer's handler must not call add_timer_on(). Upon exit the
+ * timer is not queued and the handler is not running on any CPU.
  *
- * Note: You must not hold locks that are held in interrupt context
- *   while calling this function. Even if the lock has nothing to do
- *   with the timer in question.  Here's why:
+ * Note: For !irqsafe timers, you must not hold locks that are held in
+ *   interrupt context while calling this function. Even if the lock has
+ *   nothing to do with the timer in question.  Here's why:
  *
  *    CPU0                             CPU1
  *    ----                             ----
@@ -1046,7 +1051,7 @@ int del_timer_sync(struct timer_list *timer)
 	 * don't use it in hardirq context, because it
 	 * could lead to deadlock.
 	 */
-	WARN_ON(in_irq());
+	WARN_ON(in_irq() && !tbase_get_irqsafe(timer->base));
 	for (;;) {
 		int ret = try_to_del_timer_sync(timer);
 		if (ret >= 0)
@@ -1153,19 +1158,27 @@ static inline void __run_timers(struct tvec_base *base)
 		while (!list_empty(head)) {
 			void (*fn)(unsigned long);
 			unsigned long data;
+			bool irqsafe;
 
 			timer = list_first_entry(head, struct timer_list,entry);
 			fn = timer->function;
 			data = timer->data;
+			irqsafe = tbase_get_irqsafe(timer->base);
 
 			timer_stats_account_timer(timer);
 
 			base->running_timer = timer;
 			detach_expired_timer(timer, base);
 
-			spin_unlock_irq(&base->lock);
-			call_timer_fn(timer, fn, data);
-			spin_lock_irq(&base->lock);
+			if (irqsafe) {
+				spin_unlock(&base->lock);
+				call_timer_fn(timer, fn, data);
+				spin_lock(&base->lock);
+			} else {
+				spin_unlock_irq(&base->lock);
+				call_timer_fn(timer, fn, data);
+				spin_lock_irq(&base->lock);
+			}
 		}
 	}
 	base->running_timer = NULL;

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [tip:timers/core] timer: Implement TIMER_IRQSAFE
  2012-08-21 16:43   ` [tip:timers/core] timer: Implement TIMER_IRQSAFE tip-bot for Tejun Heo
@ 2012-08-21 19:26     ` Tejun Heo
  0 siblings, 0 replies; 31+ messages in thread
From: Tejun Heo @ 2012-08-21 19:26 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tglx

On Tue, Aug 21, 2012 at 09:43:07AM -0700, tip-bot for Tejun Heo wrote:
> Commit-ID:  c5f66e99b7cb091e3d51ae8e8156892e8feb7fa3
> Gitweb:     http://git.kernel.org/tip/c5f66e99b7cb091e3d51ae8e8156892e8feb7fa3
> Author:     Tejun Heo <tj@kernel.org>
> AuthorDate: Wed, 8 Aug 2012 11:10:28 -0700
> Committer:  Thomas Gleixner <tglx@linutronix.de>
> CommitDate: Tue, 21 Aug 2012 16:28:31 +0200
> 
> timer: Implement TIMER_IRQSAFE
> 
> Timer internals are protected with irq-safe locks but timer execution
> isn't, so a timer being dequeued for execution and its execution
> aren't atomic against IRQs.  This makes it impossible to wait for its
> completion from IRQ handlers and difficult to shoot down a timer from
> IRQ handlers.
> 
> This issue caused some issues for delayed_work interface.  Because
> there's no way to reliably shoot down delayed_work->timer from IRQ
> handlers, __cancel_delayed_work() can't share the logic to steal the
> target delayed_work with cancel_delayed_work_sync(), and can only
> steal delayed_works which are on queued on timer.  Similarly, the
> pending mod_delayed_work() can't be used from IRQ handlers.
> 
> This patch adds a new timer flag TIMER_IRQSAFE, which makes the timer
> to be executed without enabling IRQ after dequeueing such that its
> dequeueing and execution are atomic against IRQ handlers.
> 
> This makes it safe to wait for the timer's completion from IRQ
> handlers, for example, using del_timer_sync().  It can never be
> executing on the local CPU and if executing on other CPUs it won't be
> interrupted until done.
> 
> This will enable simplifying delayed_work cancel/mod interface.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: torvalds@linux-foundation.org
> Cc: peterz@infradead.org
> Link: http://lkml.kernel.org/r/1344449428-24962-5-git-send-email-tj@kernel.org
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Will pull into wq/for-3.7 and put delay_work changes on top.  If
there's any objection, please scream.

Thanks a lot.

-- 
tejun

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2012-08-21 19:27 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-08 18:10 Tejun Heo
2012-08-08 18:10 ` [PATCH 1/4] timer: generalize timer->base flags handling Tejun Heo
2012-08-21 16:40   ` [tip:timers/core] timer: Generalize " tip-bot for Tejun Heo
2012-08-08 18:10 ` [PATCH 2/4] timer: relocate declarations of init_timer_on_stack_key() Tejun Heo
2012-08-21 16:41   ` [tip:timers/core] timer: Relocate " tip-bot for Tejun Heo
2012-08-08 18:10 ` [PATCH 3/4] timer: clean up timer initializers Tejun Heo
2012-08-21 16:42   ` [tip:timers/core] timer: Clean " tip-bot for Tejun Heo
2012-08-08 18:10 ` [PATCH 4/4] timer: implement TIMER_IRQSAFE Tejun Heo
2012-08-21 16:43   ` [tip:timers/core] timer: Implement TIMER_IRQSAFE tip-bot for Tejun Heo
2012-08-21 19:26     ` Tejun Heo
2012-08-08 18:13 ` $SUBJ should have been "[PATCHSET] timer: clean up initializers and implement irqsafe timers" Tejun Heo
2012-08-13 23:35 ` [PATCHSET] timer: clean up initializers and implement irqsafe timers Tejun Heo
2012-08-14  8:32   ` Thomas Gleixner
2012-08-14 19:16   ` Thomas Gleixner
2012-08-14 19:22     ` Tejun Heo
2012-08-14 21:03       ` Thomas Gleixner
2012-08-14 21:56         ` Tejun Heo
2012-08-14 22:45           ` Thomas Gleixner
2012-08-14 23:01             ` Tejun Heo
2012-08-14 23:33               ` Thomas Gleixner
2012-08-15  0:18                 ` Tejun Heo
2012-08-15 10:58                   ` Peter Zijlstra
2012-08-16 19:36                     ` Tejun Heo
2012-08-14 18:55 ` Thomas Gleixner
2012-08-14 19:15   ` Tejun Heo
2012-08-14 20:43     ` Thomas Gleixner
2012-08-14 21:40       ` Tejun Heo
2012-08-14 23:12         ` Thomas Gleixner
2012-08-14 23:27           ` Tejun Heo
2012-08-14 23:46             ` Thomas Gleixner
2012-08-14 23:52               ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.