* (no subject) @ 2012-08-08 18:10 Tejun Heo 2012-08-08 18:10 ` [PATCH 1/4] timer: generalize timer->base flags handling Tejun Heo ` (6 more replies) 0 siblings, 7 replies; 31+ messages in thread From: Tejun Heo @ 2012-08-08 18:10 UTC (permalink / raw) To: linux-kernel; +Cc: torvalds, mingo, akpm, tglx, peterz Timer internals are protected by irqsafe lock but the lock is naturally dropped and irq enabled while a timer is executed. This makes dequeueing timer for execution and the actual execution non-atomic against IRQs. No matter what the timer function does, IRQs can occur between timer dispatch and execution. This means that an IRQ handler could interrupt any timer in progress and it's impossible for an IRQ handler to cancel and drain a timer. This restriction manifests as ugly convolutions in workqueue delayed_work interface. A !idle delayed_work is either on timer, being transferred from timer to worklist, on worklist, or executing. There are interfaces which need to cancel a pending delayed_work - cancel_delayed_work() and friends and mod_delayed_work(). They want to cancel a work item in the first three states but it's impossible to drain the second state from IRQ handlers which lead to the following oddities. * mod_delayed_work() can't be used from IRQ handlers. * __cancel_delayed_work() can't use the usual try_to_grab_pending() which handles all three states but instead only deals with the first state using a separate implementation. There's no way to make a delayed_work not pending from IRQ handlers. * The context / behavior differences among cancel_delayed_work(), __cancel_delayed_work(), cancel_delayed_work_sync() are subtle and confusing (the first two are mostly historical tho). This patchset implements irqsafe timers. For an irqsafe timer, IRQ is not enabled from dispatch till the end of its execution making it safe to drain the timer regardless of context. This will enable cleaning up delayed_work interface. This patchset contains the following four patches. 0001-timer-generalize-timer-base-flags-handling.patch 0002-timer-relocate-declarations-of-init_timer_on_stack_k.patch 0003-timer-clean-up-timer-initializers.patch 0004-timer-implement-TIMER_IRQSAFE.patch 0001 generalizes timer->base flags handling so that TIMER_IRQSAFE can be added easily. 0002-0003 clean up initializers so that adding TIMER_IRQSAFE doesn't need to duplicate init code multiple times. 0004 implements TIMER_IRQSAFE. This patchset is also available in the following git branch. git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-timer-irqsafe Will soon post workqueue patchset which makes use of this. If this goes in, it would be great if this either goes through wq/for-3.7 or gets its own branch somewhere so that it can be pulled into wq/for-3.7. Thanks. include/linux/timer.h | 161 ++++++++++++++++++-------------------------------- kernel/timer.c | 108 +++++++++++++++------------------ 2 files changed, 110 insertions(+), 159 deletions(-) -- tejun ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 1/4] timer: generalize timer->base flags handling 2012-08-08 18:10 Tejun Heo @ 2012-08-08 18:10 ` Tejun Heo 2012-08-21 16:40 ` [tip:timers/core] timer: Generalize " tip-bot for Tejun Heo 2012-08-08 18:10 ` [PATCH 2/4] timer: relocate declarations of init_timer_on_stack_key() Tejun Heo ` (5 subsequent siblings) 6 siblings, 1 reply; 31+ messages in thread From: Tejun Heo @ 2012-08-08 18:10 UTC (permalink / raw) To: linux-kernel; +Cc: torvalds, mingo, akpm, tglx, peterz, Tejun Heo To prepare for addition of another flag, generalize timer->base flags handling. * Rename from TBASE_*_FLAG to TIMER_* and make them LU constants. * Define and use TIMER_FLAG_MASK for flags masking so that multiple flags can be handled correctly. * Don't dereference timer->base directly even if !tbase_get_deferrable(). All two such places are already passed in @base, so use it instead. * Make sure tvec_base's alignment is large enough for timer->base flags using BUILD_BUG_ON(). Signed-off-by: Tejun Heo <tj@kernel.org> --- include/linux/timer.h | 6 ++++-- kernel/timer.c | 21 +++++++++++++-------- 2 files changed, 17 insertions(+), 10 deletions(-) diff --git a/include/linux/timer.h b/include/linux/timer.h index 6abd913..cbd32ec 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -58,7 +58,9 @@ extern struct tvec_base boot_tvec_bases; * the timer will be serviced when the CPU eventually wakes up with a * subsequent non-deferrable timer. */ -#define TBASE_DEFERRABLE_FLAG (0x1) +#define TIMER_DEFERRABLE 0x1LU + +#define TIMER_FLAG_MASK 0x1LU #define TIMER_INITIALIZER(_function, _expires, _data) { \ .entry = { .prev = TIMER_ENTRY_STATIC }, \ @@ -72,7 +74,7 @@ extern struct tvec_base boot_tvec_bases; } #define TBASE_MAKE_DEFERRED(ptr) ((struct tvec_base *) \ - ((unsigned char *)(ptr) + TBASE_DEFERRABLE_FLAG)) + ((unsigned char *)(ptr) + TIMER_DEFERRABLE)) #define TIMER_DEFERRED_INITIALIZER(_function, _expires, _data) {\ .entry = { .prev = TIMER_ENTRY_STATIC }, \ diff --git a/kernel/timer.c b/kernel/timer.c index a61c093..cf7af56 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -92,12 +92,12 @@ static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases; /* Functions below help us manage 'deferrable' flag */ static inline unsigned int tbase_get_deferrable(struct tvec_base *base) { - return ((unsigned int)(unsigned long)base & TBASE_DEFERRABLE_FLAG); + return ((unsigned int)(unsigned long)base & TIMER_DEFERRABLE); } static inline struct tvec_base *tbase_get_base(struct tvec_base *base) { - return ((struct tvec_base *)((unsigned long)base & ~TBASE_DEFERRABLE_FLAG)); + return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK)); } static inline void timer_set_deferrable(struct timer_list *timer) @@ -108,8 +108,9 @@ static inline void timer_set_deferrable(struct timer_list *timer) static inline void timer_set_base(struct timer_list *timer, struct tvec_base *new_base) { - timer->base = (struct tvec_base *)((unsigned long)(new_base) | - tbase_get_deferrable(timer->base)); + unsigned long flags = (unsigned long)timer->base & TIMER_FLAG_MASK; + + timer->base = (struct tvec_base *)((unsigned long)(new_base) | flags); } static unsigned long round_jiffies_common(unsigned long j, int cpu, @@ -686,7 +687,7 @@ detach_expired_timer(struct timer_list *timer, struct tvec_base *base) { detach_timer(timer, true); if (!tbase_get_deferrable(timer->base)) - timer->base->active_timers--; + base->active_timers--; } static int detach_if_pending(struct timer_list *timer, struct tvec_base *base, @@ -697,7 +698,7 @@ static int detach_if_pending(struct timer_list *timer, struct tvec_base *base, detach_timer(timer, clear_pending); if (!tbase_get_deferrable(timer->base)) { - timer->base->active_timers--; + base->active_timers--; if (timer->expires == base->next_timer) base->next_timer = base->timer_jiffies; } @@ -1800,9 +1801,13 @@ static struct notifier_block __cpuinitdata timers_nb = { void __init init_timers(void) { - int err = timer_cpu_notify(&timers_nb, (unsigned long)CPU_UP_PREPARE, - (void *)(long)smp_processor_id()); + int err; + + /* ensure there are enough low bits for flags in timer->base pointer */ + BUILD_BUG_ON(__alignof__(struct tvec_base) & TIMER_FLAG_MASK); + err = timer_cpu_notify(&timers_nb, (unsigned long)CPU_UP_PREPARE, + (void *)(long)smp_processor_id()); init_timer_stats(); BUG_ON(err != NOTIFY_OK); -- 1.7.7.3 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [tip:timers/core] timer: Generalize timer->base flags handling 2012-08-08 18:10 ` [PATCH 1/4] timer: generalize timer->base flags handling Tejun Heo @ 2012-08-21 16:40 ` tip-bot for Tejun Heo 0 siblings, 0 replies; 31+ messages in thread From: tip-bot for Tejun Heo @ 2012-08-21 16:40 UTC (permalink / raw) To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tj, tglx Commit-ID: e52b1db37b89b69ceb08b521a808bd2cf4724481 Gitweb: http://git.kernel.org/tip/e52b1db37b89b69ceb08b521a808bd2cf4724481 Author: Tejun Heo <tj@kernel.org> AuthorDate: Wed, 8 Aug 2012 11:10:25 -0700 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Tue, 21 Aug 2012 16:28:30 +0200 timer: Generalize timer->base flags handling To prepare for addition of another flag, generalize timer->base flags handling. * Rename from TBASE_*_FLAG to TIMER_* and make them LU constants. * Define and use TIMER_FLAG_MASK for flags masking so that multiple flags can be handled correctly. * Don't dereference timer->base directly even if !tbase_get_deferrable(). All two such places are already passed in @base, so use it instead. * Make sure tvec_base's alignment is large enough for timer->base flags using BUILD_BUG_ON(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: torvalds@linux-foundation.org Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1344449428-24962-2-git-send-email-tj@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- include/linux/timer.h | 6 ++++-- kernel/timer.c | 21 +++++++++++++-------- 2 files changed, 17 insertions(+), 10 deletions(-) diff --git a/include/linux/timer.h b/include/linux/timer.h index 6abd913..cbd32ec 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -58,7 +58,9 @@ extern struct tvec_base boot_tvec_bases; * the timer will be serviced when the CPU eventually wakes up with a * subsequent non-deferrable timer. */ -#define TBASE_DEFERRABLE_FLAG (0x1) +#define TIMER_DEFERRABLE 0x1LU + +#define TIMER_FLAG_MASK 0x1LU #define TIMER_INITIALIZER(_function, _expires, _data) { \ .entry = { .prev = TIMER_ENTRY_STATIC }, \ @@ -72,7 +74,7 @@ extern struct tvec_base boot_tvec_bases; } #define TBASE_MAKE_DEFERRED(ptr) ((struct tvec_base *) \ - ((unsigned char *)(ptr) + TBASE_DEFERRABLE_FLAG)) + ((unsigned char *)(ptr) + TIMER_DEFERRABLE)) #define TIMER_DEFERRED_INITIALIZER(_function, _expires, _data) {\ .entry = { .prev = TIMER_ENTRY_STATIC }, \ diff --git a/kernel/timer.c b/kernel/timer.c index a61c093..cf7af56 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -92,12 +92,12 @@ static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases; /* Functions below help us manage 'deferrable' flag */ static inline unsigned int tbase_get_deferrable(struct tvec_base *base) { - return ((unsigned int)(unsigned long)base & TBASE_DEFERRABLE_FLAG); + return ((unsigned int)(unsigned long)base & TIMER_DEFERRABLE); } static inline struct tvec_base *tbase_get_base(struct tvec_base *base) { - return ((struct tvec_base *)((unsigned long)base & ~TBASE_DEFERRABLE_FLAG)); + return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK)); } static inline void timer_set_deferrable(struct timer_list *timer) @@ -108,8 +108,9 @@ static inline void timer_set_deferrable(struct timer_list *timer) static inline void timer_set_base(struct timer_list *timer, struct tvec_base *new_base) { - timer->base = (struct tvec_base *)((unsigned long)(new_base) | - tbase_get_deferrable(timer->base)); + unsigned long flags = (unsigned long)timer->base & TIMER_FLAG_MASK; + + timer->base = (struct tvec_base *)((unsigned long)(new_base) | flags); } static unsigned long round_jiffies_common(unsigned long j, int cpu, @@ -686,7 +687,7 @@ detach_expired_timer(struct timer_list *timer, struct tvec_base *base) { detach_timer(timer, true); if (!tbase_get_deferrable(timer->base)) - timer->base->active_timers--; + base->active_timers--; } static int detach_if_pending(struct timer_list *timer, struct tvec_base *base, @@ -697,7 +698,7 @@ static int detach_if_pending(struct timer_list *timer, struct tvec_base *base, detach_timer(timer, clear_pending); if (!tbase_get_deferrable(timer->base)) { - timer->base->active_timers--; + base->active_timers--; if (timer->expires == base->next_timer) base->next_timer = base->timer_jiffies; } @@ -1800,9 +1801,13 @@ static struct notifier_block __cpuinitdata timers_nb = { void __init init_timers(void) { - int err = timer_cpu_notify(&timers_nb, (unsigned long)CPU_UP_PREPARE, - (void *)(long)smp_processor_id()); + int err; + + /* ensure there are enough low bits for flags in timer->base pointer */ + BUILD_BUG_ON(__alignof__(struct tvec_base) & TIMER_FLAG_MASK); + err = timer_cpu_notify(&timers_nb, (unsigned long)CPU_UP_PREPARE, + (void *)(long)smp_processor_id()); init_timer_stats(); BUG_ON(err != NOTIFY_OK); ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 2/4] timer: relocate declarations of init_timer_on_stack_key() 2012-08-08 18:10 Tejun Heo 2012-08-08 18:10 ` [PATCH 1/4] timer: generalize timer->base flags handling Tejun Heo @ 2012-08-08 18:10 ` Tejun Heo 2012-08-21 16:41 ` [tip:timers/core] timer: Relocate " tip-bot for Tejun Heo 2012-08-08 18:10 ` [PATCH 3/4] timer: clean up timer initializers Tejun Heo ` (4 subsequent siblings) 6 siblings, 1 reply; 31+ messages in thread From: Tejun Heo @ 2012-08-08 18:10 UTC (permalink / raw) To: linux-kernel; +Cc: torvalds, mingo, akpm, tglx, peterz, Tejun Heo init_timer_on_stack_key() is used by init macro definitions. Move init_timer_on_stack_key() and destroy_timer_on_stack() declarations above init macro defs. This will make the next init cleanup patch easier to read. Signed-off-by: Tejun Heo <tj@kernel.org> --- include/linux/timer.h | 30 +++++++++++++++--------------- 1 files changed, 15 insertions(+), 15 deletions(-) diff --git a/include/linux/timer.h b/include/linux/timer.h index cbd32ec..1d364ae 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -97,6 +97,21 @@ void init_timer_deferrable_key(struct timer_list *timer, const char *name, struct lock_class_key *key); +#ifdef CONFIG_DEBUG_OBJECTS_TIMERS +extern void init_timer_on_stack_key(struct timer_list *timer, + const char *name, + struct lock_class_key *key); +extern void destroy_timer_on_stack(struct timer_list *timer); +#else +static inline void destroy_timer_on_stack(struct timer_list *timer) { } +static inline void init_timer_on_stack_key(struct timer_list *timer, + const char *name, + struct lock_class_key *key) +{ + init_timer_key(timer, name, key); +} +#endif + #ifdef CONFIG_LOCKDEP #define init_timer(timer) \ do { \ @@ -150,21 +165,6 @@ void init_timer_deferrable_key(struct timer_list *timer, setup_deferrable_timer_on_stack_key((timer), NULL, NULL, (fn), (data)) #endif -#ifdef CONFIG_DEBUG_OBJECTS_TIMERS -extern void init_timer_on_stack_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key); -extern void destroy_timer_on_stack(struct timer_list *timer); -#else -static inline void destroy_timer_on_stack(struct timer_list *timer) { } -static inline void init_timer_on_stack_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key) -{ - init_timer_key(timer, name, key); -} -#endif - static inline void setup_timer_key(struct timer_list * timer, const char *name, struct lock_class_key *key, -- 1.7.7.3 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [tip:timers/core] timer: Relocate declarations of init_timer_on_stack_key() 2012-08-08 18:10 ` [PATCH 2/4] timer: relocate declarations of init_timer_on_stack_key() Tejun Heo @ 2012-08-21 16:41 ` tip-bot for Tejun Heo 0 siblings, 0 replies; 31+ messages in thread From: tip-bot for Tejun Heo @ 2012-08-21 16:41 UTC (permalink / raw) To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tj, tglx Commit-ID: 5a9af38d05f6a1bd0d3f1f69a074cdbe9c87e977 Gitweb: http://git.kernel.org/tip/5a9af38d05f6a1bd0d3f1f69a074cdbe9c87e977 Author: Tejun Heo <tj@kernel.org> AuthorDate: Wed, 8 Aug 2012 11:10:26 -0700 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Tue, 21 Aug 2012 16:28:30 +0200 timer: Relocate declarations of init_timer_on_stack_key() init_timer_on_stack_key() is used by init macro definitions. Move init_timer_on_stack_key() and destroy_timer_on_stack() declarations above init macro defs. This will make the next init cleanup patch easier to read. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: torvalds@linux-foundation.org Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1344449428-24962-3-git-send-email-tj@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- include/linux/timer.h | 30 +++++++++++++++--------------- 1 files changed, 15 insertions(+), 15 deletions(-) diff --git a/include/linux/timer.h b/include/linux/timer.h index cbd32ec..1d364ae 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -97,6 +97,21 @@ void init_timer_deferrable_key(struct timer_list *timer, const char *name, struct lock_class_key *key); +#ifdef CONFIG_DEBUG_OBJECTS_TIMERS +extern void init_timer_on_stack_key(struct timer_list *timer, + const char *name, + struct lock_class_key *key); +extern void destroy_timer_on_stack(struct timer_list *timer); +#else +static inline void destroy_timer_on_stack(struct timer_list *timer) { } +static inline void init_timer_on_stack_key(struct timer_list *timer, + const char *name, + struct lock_class_key *key) +{ + init_timer_key(timer, name, key); +} +#endif + #ifdef CONFIG_LOCKDEP #define init_timer(timer) \ do { \ @@ -150,21 +165,6 @@ void init_timer_deferrable_key(struct timer_list *timer, setup_deferrable_timer_on_stack_key((timer), NULL, NULL, (fn), (data)) #endif -#ifdef CONFIG_DEBUG_OBJECTS_TIMERS -extern void init_timer_on_stack_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key); -extern void destroy_timer_on_stack(struct timer_list *timer); -#else -static inline void destroy_timer_on_stack(struct timer_list *timer) { } -static inline void init_timer_on_stack_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key) -{ - init_timer_key(timer, name, key); -} -#endif - static inline void setup_timer_key(struct timer_list * timer, const char *name, struct lock_class_key *key, ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 3/4] timer: clean up timer initializers 2012-08-08 18:10 Tejun Heo 2012-08-08 18:10 ` [PATCH 1/4] timer: generalize timer->base flags handling Tejun Heo 2012-08-08 18:10 ` [PATCH 2/4] timer: relocate declarations of init_timer_on_stack_key() Tejun Heo @ 2012-08-08 18:10 ` Tejun Heo 2012-08-21 16:42 ` [tip:timers/core] timer: Clean " tip-bot for Tejun Heo 2012-08-08 18:10 ` [PATCH 4/4] timer: implement TIMER_IRQSAFE Tejun Heo ` (3 subsequent siblings) 6 siblings, 1 reply; 31+ messages in thread From: Tejun Heo @ 2012-08-08 18:10 UTC (permalink / raw) To: linux-kernel; +Cc: torvalds, mingo, akpm, tglx, peterz, Tejun Heo Over time, timer initializers became messy with unnecessarily duplicated codes which are inconsistently spread across timer.h and timer.c. This patch cleans up timer initializers. * timer.c::__init_timer() is renamed to do_init_timer(). * __TIMER_INITIALIZER() added. It takes @flags and all initializers are wrappers around it. * init_timer[_on_stack]_key() now take @flags. * __init_timer[_on_stack]() added. They take @flags and all init macros are wrappers around them. * __setup_timer[_on_stack]() added. It uses __init_timer() and takes @flags. All setup macros are wrappers around the two. Note that this patch doesn't add missing init/setup combinations - e.g. init_timer_deferrable_on_stack(). Adding missing ones is trivial. Signed-off-by: Tejun Heo <tj@kernel.org> --- include/linux/timer.h | 123 +++++++++++++++--------------------------------- kernel/timer.c | 56 ++++++----------------- 2 files changed, 53 insertions(+), 126 deletions(-) diff --git a/include/linux/timer.h b/include/linux/timer.h index 1d364ae..3f95c1f 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -62,136 +62,91 @@ extern struct tvec_base boot_tvec_bases; #define TIMER_FLAG_MASK 0x1LU -#define TIMER_INITIALIZER(_function, _expires, _data) { \ +#define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \ .entry = { .prev = TIMER_ENTRY_STATIC }, \ .function = (_function), \ .expires = (_expires), \ .data = (_data), \ - .base = &boot_tvec_bases, \ + .base = (void *)((unsigned long)&boot_tvec_bases + (_flags)), \ .slack = -1, \ __TIMER_LOCKDEP_MAP_INITIALIZER( \ __FILE__ ":" __stringify(__LINE__)) \ } -#define TBASE_MAKE_DEFERRED(ptr) ((struct tvec_base *) \ - ((unsigned char *)(ptr) + TIMER_DEFERRABLE)) +#define TIMER_INITIALIZER(_function, _expires, _data) \ + __TIMER_INITIALIZER((_function), (_expires), (_data), 0) -#define TIMER_DEFERRED_INITIALIZER(_function, _expires, _data) {\ - .entry = { .prev = TIMER_ENTRY_STATIC }, \ - .function = (_function), \ - .expires = (_expires), \ - .data = (_data), \ - .base = TBASE_MAKE_DEFERRED(&boot_tvec_bases), \ - __TIMER_LOCKDEP_MAP_INITIALIZER( \ - __FILE__ ":" __stringify(__LINE__)) \ - } +#define TIMER_DEFERRED_INITIALIZER(_function, _expires, _data) \ + __TIMER_INITIALIZER((_function), (_expires), (_data), TIMER_DEFERRABLE) #define DEFINE_TIMER(_name, _function, _expires, _data) \ struct timer_list _name = \ TIMER_INITIALIZER(_function, _expires, _data) -void init_timer_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key); -void init_timer_deferrable_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key); +void init_timer_key(struct timer_list *timer, unsigned int flags, + const char *name, struct lock_class_key *key); #ifdef CONFIG_DEBUG_OBJECTS_TIMERS extern void init_timer_on_stack_key(struct timer_list *timer, - const char *name, + unsigned int flags, const char *name, struct lock_class_key *key); extern void destroy_timer_on_stack(struct timer_list *timer); #else static inline void destroy_timer_on_stack(struct timer_list *timer) { } static inline void init_timer_on_stack_key(struct timer_list *timer, - const char *name, + unsigned int flags, const char *name, struct lock_class_key *key) { - init_timer_key(timer, name, key); + init_timer_key(timer, flags, name, key); } #endif #ifdef CONFIG_LOCKDEP -#define init_timer(timer) \ +#define __init_timer(_timer, _flags) \ do { \ static struct lock_class_key __key; \ - init_timer_key((timer), #timer, &__key); \ + init_timer_key((_timer), (_flags), #_timer, &__key); \ } while (0) -#define init_timer_deferrable(timer) \ +#define __init_timer_on_stack(_timer, _flags) \ do { \ static struct lock_class_key __key; \ - init_timer_deferrable_key((timer), #timer, &__key); \ + init_timer_on_stack_key((_timer), (_flags), #_timer, &__key); \ } while (0) +#else +#define __init_timer(_timer, _flags) \ + init_timer_key((_timer), (_flags), NULL, NULL) +#define __init_timer_on_stack(_timer, _flags) \ + init_timer_on_stack_key((_timer), (_flags), NULL, NULL) +#endif +#define init_timer(timer) \ + __init_timer((timer), 0) +#define init_timer_deferrable(timer) \ + __init_timer((timer), TIMER_DEFERRABLE) #define init_timer_on_stack(timer) \ + __init_timer_on_stack((timer), 0) + +#define __setup_timer(_timer, _fn, _data, _flags) \ do { \ - static struct lock_class_key __key; \ - init_timer_on_stack_key((timer), #timer, &__key); \ + __init_timer((_timer), (_flags)); \ + (_timer)->function = (_fn); \ + (_timer)->data = (_data); \ } while (0) -#define setup_timer(timer, fn, data) \ +#define __setup_timer_on_stack(_timer, _fn, _data, _flags) \ do { \ - static struct lock_class_key __key; \ - setup_timer_key((timer), #timer, &__key, (fn), (data));\ + __init_timer_on_stack((_timer), (_flags)); \ + (_timer)->function = (_fn); \ + (_timer)->data = (_data); \ } while (0) +#define setup_timer(timer, fn, data) \ + __setup_timer((timer), (fn), (data), 0) #define setup_timer_on_stack(timer, fn, data) \ - do { \ - static struct lock_class_key __key; \ - setup_timer_on_stack_key((timer), #timer, &__key, \ - (fn), (data)); \ - } while (0) + __setup_timer_on_stack((timer), (fn), (data), 0) #define setup_deferrable_timer_on_stack(timer, fn, data) \ - do { \ - static struct lock_class_key __key; \ - setup_deferrable_timer_on_stack_key((timer), #timer, \ - &__key, (fn), \ - (data)); \ - } while (0) -#else -#define init_timer(timer)\ - init_timer_key((timer), NULL, NULL) -#define init_timer_deferrable(timer)\ - init_timer_deferrable_key((timer), NULL, NULL) -#define init_timer_on_stack(timer)\ - init_timer_on_stack_key((timer), NULL, NULL) -#define setup_timer(timer, fn, data)\ - setup_timer_key((timer), NULL, NULL, (fn), (data)) -#define setup_timer_on_stack(timer, fn, data)\ - setup_timer_on_stack_key((timer), NULL, NULL, (fn), (data)) -#define setup_deferrable_timer_on_stack(timer, fn, data)\ - setup_deferrable_timer_on_stack_key((timer), NULL, NULL, (fn), (data)) -#endif - -static inline void setup_timer_key(struct timer_list * timer, - const char *name, - struct lock_class_key *key, - void (*function)(unsigned long), - unsigned long data) -{ - timer->function = function; - timer->data = data; - init_timer_key(timer, name, key); -} - -static inline void setup_timer_on_stack_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key, - void (*function)(unsigned long), - unsigned long data) -{ - timer->function = function; - timer->data = data; - init_timer_on_stack_key(timer, name, key); -} - -extern void setup_deferrable_timer_on_stack_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key, - void (*function)(unsigned long), - unsigned long data); + __setup_timer_on_stack((timer), (fn), (data), TIMER_DEFERRABLE) /** * timer_pending - is a timer pending? diff --git a/kernel/timer.c b/kernel/timer.c index cf7af56..8d185a1 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -100,11 +100,6 @@ static inline struct tvec_base *tbase_get_base(struct tvec_base *base) return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK)); } -static inline void timer_set_deferrable(struct timer_list *timer) -{ - timer->base = TBASE_MAKE_DEFERRED(timer->base); -} - static inline void timer_set_base(struct timer_list *timer, struct tvec_base *new_base) { @@ -564,16 +559,14 @@ static inline void debug_timer_assert_init(struct timer_list *timer) debug_object_assert_init(timer, &timer_debug_descr); } -static void __init_timer(struct timer_list *timer, - const char *name, - struct lock_class_key *key); +static void do_init_timer(struct timer_list *timer, unsigned int flags, + const char *name, struct lock_class_key *key); -void init_timer_on_stack_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key) +void init_timer_on_stack_key(struct timer_list *timer, unsigned int flags, + const char *name, struct lock_class_key *key) { debug_object_init_on_stack(timer, &timer_debug_descr); - __init_timer(timer, name, key); + do_init_timer(timer, flags, name, key); } EXPORT_SYMBOL_GPL(init_timer_on_stack_key); @@ -614,12 +607,13 @@ static inline void debug_assert_init(struct timer_list *timer) debug_timer_assert_init(timer); } -static void __init_timer(struct timer_list *timer, - const char *name, - struct lock_class_key *key) +static void do_init_timer(struct timer_list *timer, unsigned int flags, + const char *name, struct lock_class_key *key) { + struct tvec_base *base = __raw_get_cpu_var(tvec_bases); + timer->entry.next = NULL; - timer->base = __raw_get_cpu_var(tvec_bases); + timer->base = (void *)((unsigned long)base | flags); timer->slack = -1; #ifdef CONFIG_TIMER_STATS timer->start_site = NULL; @@ -629,22 +623,10 @@ static void __init_timer(struct timer_list *timer, lockdep_init_map(&timer->lockdep_map, name, key, 0); } -void setup_deferrable_timer_on_stack_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key, - void (*function)(unsigned long), - unsigned long data) -{ - timer->function = function; - timer->data = data; - init_timer_on_stack_key(timer, name, key); - timer_set_deferrable(timer); -} -EXPORT_SYMBOL_GPL(setup_deferrable_timer_on_stack_key); - /** * init_timer_key - initialize a timer * @timer: the timer to be initialized + * @flags: timer flags * @name: name of the timer * @key: lockdep class key of the fake lock used for tracking timer * sync lock dependencies @@ -652,24 +634,14 @@ EXPORT_SYMBOL_GPL(setup_deferrable_timer_on_stack_key); * init_timer_key() must be done to a timer prior calling *any* of the * other timer functions. */ -void init_timer_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key) +void init_timer_key(struct timer_list *timer, unsigned int flags, + const char *name, struct lock_class_key *key) { debug_init(timer); - __init_timer(timer, name, key); + do_init_timer(timer, flags, name, key); } EXPORT_SYMBOL(init_timer_key); -void init_timer_deferrable_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key) -{ - init_timer_key(timer, name, key); - timer_set_deferrable(timer); -} -EXPORT_SYMBOL(init_timer_deferrable_key); - static inline void detach_timer(struct timer_list *timer, bool clear_pending) { struct list_head *entry = &timer->entry; -- 1.7.7.3 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [tip:timers/core] timer: Clean up timer initializers 2012-08-08 18:10 ` [PATCH 3/4] timer: clean up timer initializers Tejun Heo @ 2012-08-21 16:42 ` tip-bot for Tejun Heo 0 siblings, 0 replies; 31+ messages in thread From: tip-bot for Tejun Heo @ 2012-08-21 16:42 UTC (permalink / raw) To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tj, tglx Commit-ID: fc683995a6c4e604d62ab9a488ac2c1ba94fa868 Gitweb: http://git.kernel.org/tip/fc683995a6c4e604d62ab9a488ac2c1ba94fa868 Author: Tejun Heo <tj@kernel.org> AuthorDate: Wed, 8 Aug 2012 11:10:27 -0700 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Tue, 21 Aug 2012 16:28:30 +0200 timer: Clean up timer initializers Over time, timer initializers became messy with unnecessarily duplicated code which are inconsistently spread across timer.h and timer.c. This patch cleans up timer initializers. * timer.c::__init_timer() is renamed to do_init_timer(). * __TIMER_INITIALIZER() added. It takes @flags and all initializers are wrappers around it. * init_timer[_on_stack]_key() now take @flags. * __init_timer[_on_stack]() added. They take @flags and all init macros are wrappers around them. * __setup_timer[_on_stack]() added. It uses __init_timer() and takes @flags. All setup macros are wrappers around the two. Note that this patch doesn't add missing init/setup combinations - e.g. init_timer_deferrable_on_stack(). Adding missing ones is trivial. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: torvalds@linux-foundation.org Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1344449428-24962-4-git-send-email-tj@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- include/linux/timer.h | 123 +++++++++++++++--------------------------------- kernel/timer.c | 56 ++++++----------------- 2 files changed, 53 insertions(+), 126 deletions(-) diff --git a/include/linux/timer.h b/include/linux/timer.h index 1d364ae..3f95c1f 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -62,136 +62,91 @@ extern struct tvec_base boot_tvec_bases; #define TIMER_FLAG_MASK 0x1LU -#define TIMER_INITIALIZER(_function, _expires, _data) { \ +#define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \ .entry = { .prev = TIMER_ENTRY_STATIC }, \ .function = (_function), \ .expires = (_expires), \ .data = (_data), \ - .base = &boot_tvec_bases, \ + .base = (void *)((unsigned long)&boot_tvec_bases + (_flags)), \ .slack = -1, \ __TIMER_LOCKDEP_MAP_INITIALIZER( \ __FILE__ ":" __stringify(__LINE__)) \ } -#define TBASE_MAKE_DEFERRED(ptr) ((struct tvec_base *) \ - ((unsigned char *)(ptr) + TIMER_DEFERRABLE)) +#define TIMER_INITIALIZER(_function, _expires, _data) \ + __TIMER_INITIALIZER((_function), (_expires), (_data), 0) -#define TIMER_DEFERRED_INITIALIZER(_function, _expires, _data) {\ - .entry = { .prev = TIMER_ENTRY_STATIC }, \ - .function = (_function), \ - .expires = (_expires), \ - .data = (_data), \ - .base = TBASE_MAKE_DEFERRED(&boot_tvec_bases), \ - __TIMER_LOCKDEP_MAP_INITIALIZER( \ - __FILE__ ":" __stringify(__LINE__)) \ - } +#define TIMER_DEFERRED_INITIALIZER(_function, _expires, _data) \ + __TIMER_INITIALIZER((_function), (_expires), (_data), TIMER_DEFERRABLE) #define DEFINE_TIMER(_name, _function, _expires, _data) \ struct timer_list _name = \ TIMER_INITIALIZER(_function, _expires, _data) -void init_timer_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key); -void init_timer_deferrable_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key); +void init_timer_key(struct timer_list *timer, unsigned int flags, + const char *name, struct lock_class_key *key); #ifdef CONFIG_DEBUG_OBJECTS_TIMERS extern void init_timer_on_stack_key(struct timer_list *timer, - const char *name, + unsigned int flags, const char *name, struct lock_class_key *key); extern void destroy_timer_on_stack(struct timer_list *timer); #else static inline void destroy_timer_on_stack(struct timer_list *timer) { } static inline void init_timer_on_stack_key(struct timer_list *timer, - const char *name, + unsigned int flags, const char *name, struct lock_class_key *key) { - init_timer_key(timer, name, key); + init_timer_key(timer, flags, name, key); } #endif #ifdef CONFIG_LOCKDEP -#define init_timer(timer) \ +#define __init_timer(_timer, _flags) \ do { \ static struct lock_class_key __key; \ - init_timer_key((timer), #timer, &__key); \ + init_timer_key((_timer), (_flags), #_timer, &__key); \ } while (0) -#define init_timer_deferrable(timer) \ +#define __init_timer_on_stack(_timer, _flags) \ do { \ static struct lock_class_key __key; \ - init_timer_deferrable_key((timer), #timer, &__key); \ + init_timer_on_stack_key((_timer), (_flags), #_timer, &__key); \ } while (0) +#else +#define __init_timer(_timer, _flags) \ + init_timer_key((_timer), (_flags), NULL, NULL) +#define __init_timer_on_stack(_timer, _flags) \ + init_timer_on_stack_key((_timer), (_flags), NULL, NULL) +#endif +#define init_timer(timer) \ + __init_timer((timer), 0) +#define init_timer_deferrable(timer) \ + __init_timer((timer), TIMER_DEFERRABLE) #define init_timer_on_stack(timer) \ + __init_timer_on_stack((timer), 0) + +#define __setup_timer(_timer, _fn, _data, _flags) \ do { \ - static struct lock_class_key __key; \ - init_timer_on_stack_key((timer), #timer, &__key); \ + __init_timer((_timer), (_flags)); \ + (_timer)->function = (_fn); \ + (_timer)->data = (_data); \ } while (0) -#define setup_timer(timer, fn, data) \ +#define __setup_timer_on_stack(_timer, _fn, _data, _flags) \ do { \ - static struct lock_class_key __key; \ - setup_timer_key((timer), #timer, &__key, (fn), (data));\ + __init_timer_on_stack((_timer), (_flags)); \ + (_timer)->function = (_fn); \ + (_timer)->data = (_data); \ } while (0) +#define setup_timer(timer, fn, data) \ + __setup_timer((timer), (fn), (data), 0) #define setup_timer_on_stack(timer, fn, data) \ - do { \ - static struct lock_class_key __key; \ - setup_timer_on_stack_key((timer), #timer, &__key, \ - (fn), (data)); \ - } while (0) + __setup_timer_on_stack((timer), (fn), (data), 0) #define setup_deferrable_timer_on_stack(timer, fn, data) \ - do { \ - static struct lock_class_key __key; \ - setup_deferrable_timer_on_stack_key((timer), #timer, \ - &__key, (fn), \ - (data)); \ - } while (0) -#else -#define init_timer(timer)\ - init_timer_key((timer), NULL, NULL) -#define init_timer_deferrable(timer)\ - init_timer_deferrable_key((timer), NULL, NULL) -#define init_timer_on_stack(timer)\ - init_timer_on_stack_key((timer), NULL, NULL) -#define setup_timer(timer, fn, data)\ - setup_timer_key((timer), NULL, NULL, (fn), (data)) -#define setup_timer_on_stack(timer, fn, data)\ - setup_timer_on_stack_key((timer), NULL, NULL, (fn), (data)) -#define setup_deferrable_timer_on_stack(timer, fn, data)\ - setup_deferrable_timer_on_stack_key((timer), NULL, NULL, (fn), (data)) -#endif - -static inline void setup_timer_key(struct timer_list * timer, - const char *name, - struct lock_class_key *key, - void (*function)(unsigned long), - unsigned long data) -{ - timer->function = function; - timer->data = data; - init_timer_key(timer, name, key); -} - -static inline void setup_timer_on_stack_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key, - void (*function)(unsigned long), - unsigned long data) -{ - timer->function = function; - timer->data = data; - init_timer_on_stack_key(timer, name, key); -} - -extern void setup_deferrable_timer_on_stack_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key, - void (*function)(unsigned long), - unsigned long data); + __setup_timer_on_stack((timer), (fn), (data), TIMER_DEFERRABLE) /** * timer_pending - is a timer pending? diff --git a/kernel/timer.c b/kernel/timer.c index cf7af56..8d185a1 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -100,11 +100,6 @@ static inline struct tvec_base *tbase_get_base(struct tvec_base *base) return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK)); } -static inline void timer_set_deferrable(struct timer_list *timer) -{ - timer->base = TBASE_MAKE_DEFERRED(timer->base); -} - static inline void timer_set_base(struct timer_list *timer, struct tvec_base *new_base) { @@ -564,16 +559,14 @@ static inline void debug_timer_assert_init(struct timer_list *timer) debug_object_assert_init(timer, &timer_debug_descr); } -static void __init_timer(struct timer_list *timer, - const char *name, - struct lock_class_key *key); +static void do_init_timer(struct timer_list *timer, unsigned int flags, + const char *name, struct lock_class_key *key); -void init_timer_on_stack_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key) +void init_timer_on_stack_key(struct timer_list *timer, unsigned int flags, + const char *name, struct lock_class_key *key) { debug_object_init_on_stack(timer, &timer_debug_descr); - __init_timer(timer, name, key); + do_init_timer(timer, flags, name, key); } EXPORT_SYMBOL_GPL(init_timer_on_stack_key); @@ -614,12 +607,13 @@ static inline void debug_assert_init(struct timer_list *timer) debug_timer_assert_init(timer); } -static void __init_timer(struct timer_list *timer, - const char *name, - struct lock_class_key *key) +static void do_init_timer(struct timer_list *timer, unsigned int flags, + const char *name, struct lock_class_key *key) { + struct tvec_base *base = __raw_get_cpu_var(tvec_bases); + timer->entry.next = NULL; - timer->base = __raw_get_cpu_var(tvec_bases); + timer->base = (void *)((unsigned long)base | flags); timer->slack = -1; #ifdef CONFIG_TIMER_STATS timer->start_site = NULL; @@ -629,22 +623,10 @@ static void __init_timer(struct timer_list *timer, lockdep_init_map(&timer->lockdep_map, name, key, 0); } -void setup_deferrable_timer_on_stack_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key, - void (*function)(unsigned long), - unsigned long data) -{ - timer->function = function; - timer->data = data; - init_timer_on_stack_key(timer, name, key); - timer_set_deferrable(timer); -} -EXPORT_SYMBOL_GPL(setup_deferrable_timer_on_stack_key); - /** * init_timer_key - initialize a timer * @timer: the timer to be initialized + * @flags: timer flags * @name: name of the timer * @key: lockdep class key of the fake lock used for tracking timer * sync lock dependencies @@ -652,24 +634,14 @@ EXPORT_SYMBOL_GPL(setup_deferrable_timer_on_stack_key); * init_timer_key() must be done to a timer prior calling *any* of the * other timer functions. */ -void init_timer_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key) +void init_timer_key(struct timer_list *timer, unsigned int flags, + const char *name, struct lock_class_key *key) { debug_init(timer); - __init_timer(timer, name, key); + do_init_timer(timer, flags, name, key); } EXPORT_SYMBOL(init_timer_key); -void init_timer_deferrable_key(struct timer_list *timer, - const char *name, - struct lock_class_key *key) -{ - init_timer_key(timer, name, key); - timer_set_deferrable(timer); -} -EXPORT_SYMBOL(init_timer_deferrable_key); - static inline void detach_timer(struct timer_list *timer, bool clear_pending) { struct list_head *entry = &timer->entry; ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH 4/4] timer: implement TIMER_IRQSAFE 2012-08-08 18:10 Tejun Heo ` (2 preceding siblings ...) 2012-08-08 18:10 ` [PATCH 3/4] timer: clean up timer initializers Tejun Heo @ 2012-08-08 18:10 ` Tejun Heo 2012-08-21 16:43 ` [tip:timers/core] timer: Implement TIMER_IRQSAFE tip-bot for Tejun Heo 2012-08-08 18:13 ` $SUBJ should have been "[PATCHSET] timer: clean up initializers and implement irqsafe timers" Tejun Heo ` (2 subsequent siblings) 6 siblings, 1 reply; 31+ messages in thread From: Tejun Heo @ 2012-08-08 18:10 UTC (permalink / raw) To: linux-kernel; +Cc: torvalds, mingo, akpm, tglx, peterz, Tejun Heo Timer internals are protected with irq-safe locks but timer execution isn't, so a timer being dequeued for execution and its execution aren't atomic against IRQs. This makes it impossible to wait for its completion from IRQ handlers and difficult to shoot down a timer from IRQ handlers. This issue caused some issues for delayed_work interface. Because there's no way to reliably shoot down delayed_work->timer from IRQ handlers, __cancel_delayed_work() can't share the logic to steal the target delayed_work with cancel_delayed_work_sync(), and can only steal delayed_works which are on queued on timer. Similarly, the pending mod_delayed_work() can't be used from IRQ handlers. This patch adds a new timer flag TIMER_IRQSAFE, which makes the timer to be executed without enabling IRQ after dequeueing such that its dequeueing and execution are atomic against IRQ handlers. This makes it safe to wait for the timer's completion from IRQ handlers, for example, using del_timer_sync(). It can never be executing on the local CPU and if executing on other CPUs it won't be interrupted until done. This will enable simplifying delayed_work cancel/mod interface. Signed-off-by: Tejun Heo <tj@kernel.org> --- include/linux/timer.h | 12 ++++++++---- kernel/timer.c | 35 ++++++++++++++++++++++++----------- 2 files changed, 32 insertions(+), 15 deletions(-) diff --git a/include/linux/timer.h b/include/linux/timer.h index 3f95c1f..726192e4 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -49,18 +49,22 @@ extern struct tvec_base boot_tvec_bases; #endif /* - * Note that all tvec_bases are 2 byte aligned and lower bit of - * base in timer_list is guaranteed to be zero. Use the LSB to - * indicate whether the timer is deferrable. + * Note that all tvec_bases are at least 4 byte aligned and lower two bits + * of base in timer_list is guaranteed to be zero. Use them for flags. * * A deferrable timer will work normally when the system is busy, but * will not cause a CPU to come out of idle just to service it; instead, * the timer will be serviced when the CPU eventually wakes up with a * subsequent non-deferrable timer. + * + * An irqsafe timer is executed with IRQ disabled and it's safe to wait for + * the completion of the running instance from IRQ handlers, for example, + * by calling del_timer_sync(). */ #define TIMER_DEFERRABLE 0x1LU +#define TIMER_IRQSAFE 0x2LU -#define TIMER_FLAG_MASK 0x1LU +#define TIMER_FLAG_MASK 0x3LU #define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \ .entry = { .prev = TIMER_ENTRY_STATIC }, \ diff --git a/kernel/timer.c b/kernel/timer.c index 8d185a1..706fe4c 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -95,6 +95,11 @@ static inline unsigned int tbase_get_deferrable(struct tvec_base *base) return ((unsigned int)(unsigned long)base & TIMER_DEFERRABLE); } +static inline unsigned int tbase_get_irqsafe(struct tvec_base *base) +{ + return ((unsigned int)(unsigned long)base & TIMER_IRQSAFE); +} + static inline struct tvec_base *tbase_get_base(struct tvec_base *base) { return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK)); @@ -1002,14 +1007,14 @@ EXPORT_SYMBOL(try_to_del_timer_sync); * * Synchronization rules: Callers must prevent restarting of the timer, * otherwise this function is meaningless. It must not be called from - * interrupt contexts. The caller must not hold locks which would prevent - * completion of the timer's handler. The timer's handler must not call - * add_timer_on(). Upon exit the timer is not queued and the handler is - * not running on any CPU. + * interrupt contexts unless the timer is an irqsafe one. The caller must + * not hold locks which would prevent completion of the timer's + * handler. The timer's handler must not call add_timer_on(). Upon exit the + * timer is not queued and the handler is not running on any CPU. * - * Note: You must not hold locks that are held in interrupt context - * while calling this function. Even if the lock has nothing to do - * with the timer in question. Here's why: + * Note: For !irqsafe timers, you must not hold locks that are held in + * interrupt context while calling this function. Even if the lock has + * nothing to do with the timer in question. Here's why: * * CPU0 CPU1 * ---- ---- @@ -1046,7 +1051,7 @@ int del_timer_sync(struct timer_list *timer) * don't use it in hardirq context, because it * could lead to deadlock. */ - WARN_ON(in_irq()); + WARN_ON(in_irq() && !tbase_get_irqsafe(timer->base)); for (;;) { int ret = try_to_del_timer_sync(timer); if (ret >= 0) @@ -1153,19 +1158,27 @@ static inline void __run_timers(struct tvec_base *base) while (!list_empty(head)) { void (*fn)(unsigned long); unsigned long data; + bool irqsafe; timer = list_first_entry(head, struct timer_list,entry); fn = timer->function; data = timer->data; + irqsafe = tbase_get_irqsafe(timer->base); timer_stats_account_timer(timer); base->running_timer = timer; detach_expired_timer(timer, base); - spin_unlock_irq(&base->lock); - call_timer_fn(timer, fn, data); - spin_lock_irq(&base->lock); + if (irqsafe) { + spin_unlock(&base->lock); + call_timer_fn(timer, fn, data); + spin_lock(&base->lock); + } else { + spin_unlock_irq(&base->lock); + call_timer_fn(timer, fn, data); + spin_lock_irq(&base->lock); + } } } base->running_timer = NULL; -- 1.7.7.3 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [tip:timers/core] timer: Implement TIMER_IRQSAFE 2012-08-08 18:10 ` [PATCH 4/4] timer: implement TIMER_IRQSAFE Tejun Heo @ 2012-08-21 16:43 ` tip-bot for Tejun Heo 2012-08-21 19:26 ` Tejun Heo 0 siblings, 1 reply; 31+ messages in thread From: tip-bot for Tejun Heo @ 2012-08-21 16:43 UTC (permalink / raw) To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tj, tglx Commit-ID: c5f66e99b7cb091e3d51ae8e8156892e8feb7fa3 Gitweb: http://git.kernel.org/tip/c5f66e99b7cb091e3d51ae8e8156892e8feb7fa3 Author: Tejun Heo <tj@kernel.org> AuthorDate: Wed, 8 Aug 2012 11:10:28 -0700 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Tue, 21 Aug 2012 16:28:31 +0200 timer: Implement TIMER_IRQSAFE Timer internals are protected with irq-safe locks but timer execution isn't, so a timer being dequeued for execution and its execution aren't atomic against IRQs. This makes it impossible to wait for its completion from IRQ handlers and difficult to shoot down a timer from IRQ handlers. This issue caused some issues for delayed_work interface. Because there's no way to reliably shoot down delayed_work->timer from IRQ handlers, __cancel_delayed_work() can't share the logic to steal the target delayed_work with cancel_delayed_work_sync(), and can only steal delayed_works which are on queued on timer. Similarly, the pending mod_delayed_work() can't be used from IRQ handlers. This patch adds a new timer flag TIMER_IRQSAFE, which makes the timer to be executed without enabling IRQ after dequeueing such that its dequeueing and execution are atomic against IRQ handlers. This makes it safe to wait for the timer's completion from IRQ handlers, for example, using del_timer_sync(). It can never be executing on the local CPU and if executing on other CPUs it won't be interrupted until done. This will enable simplifying delayed_work cancel/mod interface. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: torvalds@linux-foundation.org Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1344449428-24962-5-git-send-email-tj@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- include/linux/timer.h | 16 ++++++++++++---- kernel/timer.c | 35 ++++++++++++++++++++++++----------- 2 files changed, 36 insertions(+), 15 deletions(-) diff --git a/include/linux/timer.h b/include/linux/timer.h index 3f95c1f..8c5a197 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -49,18 +49,26 @@ extern struct tvec_base boot_tvec_bases; #endif /* - * Note that all tvec_bases are 2 byte aligned and lower bit of - * base in timer_list is guaranteed to be zero. Use the LSB to - * indicate whether the timer is deferrable. + * Note that all tvec_bases are at least 4 byte aligned and lower two bits + * of base in timer_list is guaranteed to be zero. Use them for flags. * * A deferrable timer will work normally when the system is busy, but * will not cause a CPU to come out of idle just to service it; instead, * the timer will be serviced when the CPU eventually wakes up with a * subsequent non-deferrable timer. + * + * An irqsafe timer is executed with IRQ disabled and it's safe to wait for + * the completion of the running instance from IRQ handlers, for example, + * by calling del_timer_sync(). + * + * Note: The irq disabled callback execution is a special case for + * workqueue locking issues. It's not meant for executing random crap + * with interrupts disabled. Abuse is monitored! */ #define TIMER_DEFERRABLE 0x1LU +#define TIMER_IRQSAFE 0x2LU -#define TIMER_FLAG_MASK 0x1LU +#define TIMER_FLAG_MASK 0x3LU #define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \ .entry = { .prev = TIMER_ENTRY_STATIC }, \ diff --git a/kernel/timer.c b/kernel/timer.c index 8d185a1..706fe4c 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -95,6 +95,11 @@ static inline unsigned int tbase_get_deferrable(struct tvec_base *base) return ((unsigned int)(unsigned long)base & TIMER_DEFERRABLE); } +static inline unsigned int tbase_get_irqsafe(struct tvec_base *base) +{ + return ((unsigned int)(unsigned long)base & TIMER_IRQSAFE); +} + static inline struct tvec_base *tbase_get_base(struct tvec_base *base) { return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK)); @@ -1002,14 +1007,14 @@ EXPORT_SYMBOL(try_to_del_timer_sync); * * Synchronization rules: Callers must prevent restarting of the timer, * otherwise this function is meaningless. It must not be called from - * interrupt contexts. The caller must not hold locks which would prevent - * completion of the timer's handler. The timer's handler must not call - * add_timer_on(). Upon exit the timer is not queued and the handler is - * not running on any CPU. + * interrupt contexts unless the timer is an irqsafe one. The caller must + * not hold locks which would prevent completion of the timer's + * handler. The timer's handler must not call add_timer_on(). Upon exit the + * timer is not queued and the handler is not running on any CPU. * - * Note: You must not hold locks that are held in interrupt context - * while calling this function. Even if the lock has nothing to do - * with the timer in question. Here's why: + * Note: For !irqsafe timers, you must not hold locks that are held in + * interrupt context while calling this function. Even if the lock has + * nothing to do with the timer in question. Here's why: * * CPU0 CPU1 * ---- ---- @@ -1046,7 +1051,7 @@ int del_timer_sync(struct timer_list *timer) * don't use it in hardirq context, because it * could lead to deadlock. */ - WARN_ON(in_irq()); + WARN_ON(in_irq() && !tbase_get_irqsafe(timer->base)); for (;;) { int ret = try_to_del_timer_sync(timer); if (ret >= 0) @@ -1153,19 +1158,27 @@ static inline void __run_timers(struct tvec_base *base) while (!list_empty(head)) { void (*fn)(unsigned long); unsigned long data; + bool irqsafe; timer = list_first_entry(head, struct timer_list,entry); fn = timer->function; data = timer->data; + irqsafe = tbase_get_irqsafe(timer->base); timer_stats_account_timer(timer); base->running_timer = timer; detach_expired_timer(timer, base); - spin_unlock_irq(&base->lock); - call_timer_fn(timer, fn, data); - spin_lock_irq(&base->lock); + if (irqsafe) { + spin_unlock(&base->lock); + call_timer_fn(timer, fn, data); + spin_lock(&base->lock); + } else { + spin_unlock_irq(&base->lock); + call_timer_fn(timer, fn, data); + spin_lock_irq(&base->lock); + } } } base->running_timer = NULL; ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [tip:timers/core] timer: Implement TIMER_IRQSAFE 2012-08-21 16:43 ` [tip:timers/core] timer: Implement TIMER_IRQSAFE tip-bot for Tejun Heo @ 2012-08-21 19:26 ` Tejun Heo 0 siblings, 0 replies; 31+ messages in thread From: Tejun Heo @ 2012-08-21 19:26 UTC (permalink / raw) To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tglx On Tue, Aug 21, 2012 at 09:43:07AM -0700, tip-bot for Tejun Heo wrote: > Commit-ID: c5f66e99b7cb091e3d51ae8e8156892e8feb7fa3 > Gitweb: http://git.kernel.org/tip/c5f66e99b7cb091e3d51ae8e8156892e8feb7fa3 > Author: Tejun Heo <tj@kernel.org> > AuthorDate: Wed, 8 Aug 2012 11:10:28 -0700 > Committer: Thomas Gleixner <tglx@linutronix.de> > CommitDate: Tue, 21 Aug 2012 16:28:31 +0200 > > timer: Implement TIMER_IRQSAFE > > Timer internals are protected with irq-safe locks but timer execution > isn't, so a timer being dequeued for execution and its execution > aren't atomic against IRQs. This makes it impossible to wait for its > completion from IRQ handlers and difficult to shoot down a timer from > IRQ handlers. > > This issue caused some issues for delayed_work interface. Because > there's no way to reliably shoot down delayed_work->timer from IRQ > handlers, __cancel_delayed_work() can't share the logic to steal the > target delayed_work with cancel_delayed_work_sync(), and can only > steal delayed_works which are on queued on timer. Similarly, the > pending mod_delayed_work() can't be used from IRQ handlers. > > This patch adds a new timer flag TIMER_IRQSAFE, which makes the timer > to be executed without enabling IRQ after dequeueing such that its > dequeueing and execution are atomic against IRQ handlers. > > This makes it safe to wait for the timer's completion from IRQ > handlers, for example, using del_timer_sync(). It can never be > executing on the local CPU and if executing on other CPUs it won't be > interrupted until done. > > This will enable simplifying delayed_work cancel/mod interface. > > Signed-off-by: Tejun Heo <tj@kernel.org> > Cc: torvalds@linux-foundation.org > Cc: peterz@infradead.org > Link: http://lkml.kernel.org/r/1344449428-24962-5-git-send-email-tj@kernel.org > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Will pull into wq/for-3.7 and put delay_work changes on top. If there's any objection, please scream. Thanks a lot. -- tejun ^ permalink raw reply [flat|nested] 31+ messages in thread
* $SUBJ should have been "[PATCHSET] timer: clean up initializers and implement irqsafe timers" 2012-08-08 18:10 Tejun Heo ` (3 preceding siblings ...) 2012-08-08 18:10 ` [PATCH 4/4] timer: implement TIMER_IRQSAFE Tejun Heo @ 2012-08-08 18:13 ` Tejun Heo 2012-08-13 23:35 ` [PATCHSET] timer: clean up initializers and implement irqsafe timers Tejun Heo 2012-08-14 18:55 ` Thomas Gleixner 6 siblings, 0 replies; 31+ messages in thread From: Tejun Heo @ 2012-08-08 18:13 UTC (permalink / raw) To: linux-kernel; +Cc: torvalds, mingo, akpm, tglx, peterz Either I'm keeping forgetting adding Subject: tag or git-send-mail is somehow screwed up. Sorry about that. -- tejun ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-08 18:10 Tejun Heo ` (4 preceding siblings ...) 2012-08-08 18:13 ` $SUBJ should have been "[PATCHSET] timer: clean up initializers and implement irqsafe timers" Tejun Heo @ 2012-08-13 23:35 ` Tejun Heo 2012-08-14 8:32 ` Thomas Gleixner 2012-08-14 19:16 ` Thomas Gleixner 2012-08-14 18:55 ` Thomas Gleixner 6 siblings, 2 replies; 31+ messages in thread From: Tejun Heo @ 2012-08-13 23:35 UTC (permalink / raw) To: linux-kernel; +Cc: torvalds, mingo, akpm, tglx, peterz Hello, On Wed, Aug 08, 2012 at 11:10:24AM -0700, Tejun Heo wrote: > Timer internals are protected by irqsafe lock but the lock is > naturally dropped and irq enabled while a timer is executed. This > makes dequeueing timer for execution and the actual execution > non-atomic against IRQs. No matter what the timer function does, IRQs > can occur between timer dispatch and execution. This means that an > IRQ handler could interrupt any timer in progress and it's impossible > for an IRQ handler to cancel and drain a timer. If nobody objects, I'll route this through wq/for-3.7 together with "workqueue: use irqsafe timer in delayed_work" patchset. If you object, please scream. Thanks. -- tejun ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-13 23:35 ` [PATCHSET] timer: clean up initializers and implement irqsafe timers Tejun Heo @ 2012-08-14 8:32 ` Thomas Gleixner 2012-08-14 19:16 ` Thomas Gleixner 1 sibling, 0 replies; 31+ messages in thread From: Thomas Gleixner @ 2012-08-14 8:32 UTC (permalink / raw) To: Tejun Heo; +Cc: linux-kernel, torvalds, mingo, akpm, peterz On Mon, 13 Aug 2012, Tejun Heo wrote: > Hello, > > On Wed, Aug 08, 2012 at 11:10:24AM -0700, Tejun Heo wrote: > > Timer internals are protected by irqsafe lock but the lock is > > naturally dropped and irq enabled while a timer is executed. This > > makes dequeueing timer for execution and the actual execution > > non-atomic against IRQs. No matter what the timer function does, IRQs > > can occur between timer dispatch and execution. This means that an > > IRQ handler could interrupt any timer in progress and it's impossible > > for an IRQ handler to cancel and drain a timer. > > If nobody objects, I'll route this through wq/for-3.7 together with > "workqueue: use irqsafe timer in delayed_work" patchset. If you > object, please scream. SCREAM! I'll reply to the original posting later - still processing my vacation backlog. Thanks, tglx ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-13 23:35 ` [PATCHSET] timer: clean up initializers and implement irqsafe timers Tejun Heo 2012-08-14 8:32 ` Thomas Gleixner @ 2012-08-14 19:16 ` Thomas Gleixner 2012-08-14 19:22 ` Tejun Heo 1 sibling, 1 reply; 31+ messages in thread From: Thomas Gleixner @ 2012-08-14 19:16 UTC (permalink / raw) To: Tejun Heo; +Cc: linux-kernel, torvalds, mingo, akpm, peterz On Mon, 13 Aug 2012, Tejun Heo wrote: > On Wed, Aug 08, 2012 at 11:10:24AM -0700, Tejun Heo wrote: > > Timer internals are protected by irqsafe lock but the lock is > > naturally dropped and irq enabled while a timer is executed. This > > makes dequeueing timer for execution and the actual execution > > non-atomic against IRQs. No matter what the timer function does, IRQs > > can occur between timer dispatch and execution. This means that an > > IRQ handler could interrupt any timer in progress and it's impossible > > for an IRQ handler to cancel and drain a timer. > > If nobody objects, I'll route this through wq/for-3.7 together with > "workqueue: use irqsafe timer in delayed_work" patchset. If you > object, please scream. Second thoughts. Why the hell are you trying to rush stuff which affects a well maintained part of the kernel through your own tree w/o having the courtesy of contacting the maintainer politely instead of sending an ultimatum? You posted that series less than a week ago and there is no reason why you need to push hard on inclusion into next after a few days. If you really cared about my opinion you could have figured out that I'm on vacation. Yours grumpy tglx ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-14 19:16 ` Thomas Gleixner @ 2012-08-14 19:22 ` Tejun Heo 2012-08-14 21:03 ` Thomas Gleixner 0 siblings, 1 reply; 31+ messages in thread From: Tejun Heo @ 2012-08-14 19:22 UTC (permalink / raw) To: Thomas Gleixner; +Cc: linux-kernel, torvalds, mingo, akpm, peterz Hello, On Tue, Aug 14, 2012 at 09:16:16PM +0200, Thomas Gleixner wrote: > Why the hell are you trying to rush stuff which affects a well > maintained part of the kernel through your own tree w/o having the > courtesy of contacting the maintainer politely instead of sending an > ultimatum? > > You posted that series less than a week ago and there is no reason why > you need to push hard on inclusion into next after a few days. So, it has been out for a while, and it's not the final inclusion into mainline. The change to timer is fairly localized and you have enough opportunity to object before and after. It's not like I'm gonna push it in against oppositions. > If you really cared about my opinion you could have figured out that > I'm on vacation. Ah... sorry but it's not like we date or anything. Thanks. -- tejun ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-14 19:22 ` Tejun Heo @ 2012-08-14 21:03 ` Thomas Gleixner 2012-08-14 21:56 ` Tejun Heo 0 siblings, 1 reply; 31+ messages in thread From: Thomas Gleixner @ 2012-08-14 21:03 UTC (permalink / raw) To: Tejun Heo; +Cc: linux-kernel, torvalds, mingo, akpm, peterz On Tue, 14 Aug 2012, Tejun Heo wrote: > On Tue, Aug 14, 2012 at 09:16:16PM +0200, Thomas Gleixner wrote: > > Why the hell are you trying to rush stuff which affects a well > > maintained part of the kernel through your own tree w/o having the > > courtesy of contacting the maintainer politely instead of sending an > > ultimatum? > > > > You posted that series less than a week ago and there is no reason why > > you need to push hard on inclusion into next after a few days. > > So, it has been out for a while, and it's not the final inclusion into Out for a while? You must be kidding. The original post is from Wed. Aug 8th and your ultimatum mail is dated Mon Aug 13th. That's a mere 3 working days. > mainline. The change to timer is fairly localized and you have enough > opportunity to object before and after. It's not like I'm gonna push > it in against oppositions. You seem to misunderstand what pushing into -next means. -next is not a playground for random changes. It's an integration tree for stuff which is supposed to show up in the next merge window. And stuff which touches other maintainers trees needs to orginate from there or at least be acked/reviewed-by those maintainers. Why should -next have different rules to mainline? > > If you really cared about my opinion you could have figured out that > > I'm on vacation. > > Ah... sorry but it's not like we date or anything. Fortunately not. I wouldn't touch such an insensitive clod with a ten foot pole. Though other people whom I'm not dating either are sensitive enough to send a polite mail after a few days: <cite> Thomas, I figured out on IRC that you are on vacation. Would you be so kind to look at the patches I've posted 10 days ago? .... </cite> Do you see the subtle difference between a polite reminder after 10 days and an ultimatum after 5 days ????? Thanks, tglx ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-14 21:03 ` Thomas Gleixner @ 2012-08-14 21:56 ` Tejun Heo 2012-08-14 22:45 ` Thomas Gleixner 0 siblings, 1 reply; 31+ messages in thread From: Tejun Heo @ 2012-08-14 21:56 UTC (permalink / raw) To: Thomas Gleixner; +Cc: linux-kernel, torvalds, mingo, akpm, peterz Hello, Thomas. On Tue, Aug 14, 2012 at 11:03:33PM +0200, Thomas Gleixner wrote: > Why should -next have different rules to mainline? It's faster paced and trees revert. The message specifically was a ping for objection and I was waiting for further response and would have waited until early next week (and written another "applied" message which would be another chance to veto). And even if that isn't enough for whatever reason and you or anyone else object it afterwards, it'll get reverted / reouted differently / whatever. As for subsystem boundary, at least I cross them and let others cross if the changes aren't significant and the proposed changes are likely to be be used only in that tree for the devel window. The timer change seems borderline to me. It isn't trivial but doesn't seem all that invasive to me. I don't think any critical protocol is breached here. If you're upset about the style of the ping, I apologize. I'll try to be more sensitive when pinging you. Thanks. -- tejun ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-14 21:56 ` Tejun Heo @ 2012-08-14 22:45 ` Thomas Gleixner 2012-08-14 23:01 ` Tejun Heo 0 siblings, 1 reply; 31+ messages in thread From: Thomas Gleixner @ 2012-08-14 22:45 UTC (permalink / raw) To: Tejun Heo Cc: LKML, Linus Torvalds, mingo, Andrew Morton, Peter Zijlstra, Stephen Rothwell Tejun, On Tue, 14 Aug 2012, Tejun Heo wrote: > On Tue, Aug 14, 2012 at 11:03:33PM +0200, Thomas Gleixner wrote: > > Why should -next have different rules to mainline? > > It's faster paced and trees revert. The message specifically was a Nonsense. It's a tree which gets stuff which is cooked and not a tree for random experimental crap. Period. > ping for objection and I was waiting for further response and would > have waited until early next week (and written another "applied" > message which would be another chance to veto). And even if that > isn't enough for whatever reason and you or anyone else object it > afterwards, it'll get reverted / reouted differently / whatever. > > As for subsystem boundary, at least I cross them and let others cross > if the changes aren't significant and the proposed changes are likely > to be be used only in that tree for the devel window. The timer > change seems borderline to me. It isn't trivial but doesn't seem all > that invasive to me. It does not matter at all whether you think it's invasive or not. There is a reason why we have maintainers for different parts of the kernel who are responsible for deciding what's invasive. And we have very well worked out mechanisms regarding cross tree changes, i.e. providing minimal trees to pull for other maintainers. > I don't think any critical protocol is breached here. May I recommend that you make yourself familiar with the way how this community works? > If you're upset about the style of the ping, I apologize. I'll try > to be more sensitive when pinging you. It's not about me. You are trying to play the system. Thanks, tglx ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-14 22:45 ` Thomas Gleixner @ 2012-08-14 23:01 ` Tejun Heo 2012-08-14 23:33 ` Thomas Gleixner 0 siblings, 1 reply; 31+ messages in thread From: Tejun Heo @ 2012-08-14 23:01 UTC (permalink / raw) To: Thomas Gleixner Cc: LKML, Linus Torvalds, mingo, Andrew Morton, Peter Zijlstra, Stephen Rothwell Hello, Thomas. On Wed, Aug 15, 2012 at 12:45:24AM +0200, Thomas Gleixner wrote: > And we have very well worked out mechanisms regarding cross tree > changes, i.e. providing minimal trees to pull for other maintainers. If you look at the review branches, they're actually structured that way so that the timer part can be pulled separately. If the maintainer wants to do that, sure. If the maintainer thinks routing through another tree is fine, that's okay too. Subsystem boundaries are all good and great but it's not some absolute barrier which can't be crossed at any cost. > > If you're upset about the style of the ping, I apologize. I'll try > > to be more sensitive when pinging you. > > It's not about me. You are trying to play the system. Thomas, I wasn't trying to get it through behind your back. You have been notified clearly multiple times and have ample opportunities to object and suggest different ways if you don't like whatever is going on. I probably should have written "if the maintainer doesn't object, I think it would be easier to route these through wq/for-3.7 as it will be the only user for now, blah blah blah" instead and maybe I misjudged the character of the changes or the subsystem. That said, I think you're inferring too much. Thanks. -- tejun ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-14 23:01 ` Tejun Heo @ 2012-08-14 23:33 ` Thomas Gleixner 2012-08-15 0:18 ` Tejun Heo 0 siblings, 1 reply; 31+ messages in thread From: Thomas Gleixner @ 2012-08-14 23:33 UTC (permalink / raw) To: Tejun Heo Cc: LKML, Linus Torvalds, mingo, Andrew Morton, Peter Zijlstra, Stephen Rothwell On Tue, 14 Aug 2012, Tejun Heo wrote: > On Wed, Aug 15, 2012 at 12:45:24AM +0200, Thomas Gleixner wrote: > > And we have very well worked out mechanisms regarding cross tree > > changes, i.e. providing minimal trees to pull for other maintainers. > > If you look at the review branches, they're actually structured that > way so that the timer part can be pulled separately. If the > maintainer wants to do that, sure. If the maintainer thinks routing > through another tree is fine, that's okay too. Subsystem boundaries > are all good and great but it's not some absolute barrier which can't > be crossed at any cost. That's not about any cost. You are trying to force stuff into -next with a THREE workdays notice, just because you think that your stuff is so important. Have you ever tried to understand how the kernel development system works? > I probably should have written "if the maintainer doesn't object, I > think it would be easier to route these through wq/for-3.7 as it will > be the only user for now, blah blah blah" instead and maybe I > misjudged the character of the changes or the subsystem. That said, I > think you're inferring too much. No, I'm, not inferrring too much. It's not about what you should have written. It's about your general attitude. You really think that I accept all that stuff just because you add some "blah, blah, blah" to some mail? Far from it! To convince me to accept your patches you should start answering my questions and suggestions seriously in the first place and not discarding them upfront as lunatic visions. As long as you can't provide a proper counter argument against maintaining the timer in the same context as the work, no matter what the underlying mechanism to achieve this will be, I'm not going to accept any of this hackery neither near next nor mainline. Thanks, tglx ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-14 23:33 ` Thomas Gleixner @ 2012-08-15 0:18 ` Tejun Heo 2012-08-15 10:58 ` Peter Zijlstra 0 siblings, 1 reply; 31+ messages in thread From: Tejun Heo @ 2012-08-15 0:18 UTC (permalink / raw) To: Thomas Gleixner Cc: LKML, Linus Torvalds, mingo, Andrew Morton, Peter Zijlstra, Stephen Rothwell Hello, Thomas. On Wed, Aug 15, 2012 at 01:33:09AM +0200, Thomas Gleixner wrote: > To convince me to accept your patches you should start answering my > questions and suggestions seriously in the first place and not > discarding them upfront as lunatic visions. > > As long as you can't provide a proper counter argument against > maintaining the timer in the same context as the work, no matter what > the underlying mechanism to achieve this will be, I'm not going to > accept any of this hackery neither near next nor mainline. Sure, that's exactly why the patches are posted for review, so you're suggesting for workqueue implement essentially its own timer list - be that a simple sorted linked list or priority heap. Am I understanding you correctly? If so, we're comparing the following two. a. Adding IRQSAFE timer. Runtime cost is one additional if() in timer execution path. b. Implementing workqueue's own timer system which is driven by timer so that the timer part can also be protected by the usual workqueue synchronization. To me, #a seems like the better choice here. delayed_work is one of the more common constructs used widely in the kernel. It's often used in device drivers to defer processing to process context and timer queueing (including modification) is a frequent operation. IRQ handlers schedule them, some drivers use it to poll the device, block layer uses it for most of deferred handling - SCSI layer failing to issue a command due to resource shortage reschedules delayed_work repeatedly, and so on. Essentially, delayed_work might be used for any purpose a timer is used. Timer users switch to delayed_work if process context becomes necessary for whatever reason, so, I don't think we can get away with simple sorted list implementation. We might be okay under some workloads but O(N^2) insertion complexity for something as commonly used as delayed_work doesn't seem like a good idea to me. We could go for more involved implementation, say a priority heap or somewhat simplified version of tvec_base, but that seems like a bad tradeoff to me. We would be trading off fairly complex chunk of code duplicating an existing capability to avoid adding fairly small feature to the timer. It will likely be worse than the proper timer and we have to maintain two chunks of code doing about the same thing to save single if() in the existing timer code. Let's see if we can agree on the latter point first. Do you agree that it wouldn't be a good idea to implement relatively complex timer subsystem inside workqueue? Thanks. -- tejun ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-15 0:18 ` Tejun Heo @ 2012-08-15 10:58 ` Peter Zijlstra 2012-08-16 19:36 ` Tejun Heo 0 siblings, 1 reply; 31+ messages in thread From: Peter Zijlstra @ 2012-08-15 10:58 UTC (permalink / raw) To: Tejun Heo Cc: Thomas Gleixner, LKML, Linus Torvalds, mingo, Andrew Morton, Stephen Rothwell On Tue, 2012-08-14 at 17:18 -0700, Tejun Heo wrote: > Let's see if we can agree on the latter point first. Do you agree > that it wouldn't be a good idea to implement relatively complex timer > subsystem inside workqueue? RB-trees are fairly trivial to use, but can we please get back to why people want to do del/mod delayed work from IRQ context? I can get the queueing part, but why do they need to cancel and or modify stuff? Trying to come up with a solution to a problem you don't understand is kinda difficult. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-15 10:58 ` Peter Zijlstra @ 2012-08-16 19:36 ` Tejun Heo 0 siblings, 0 replies; 31+ messages in thread From: Tejun Heo @ 2012-08-16 19:36 UTC (permalink / raw) To: Peter Zijlstra Cc: Thomas Gleixner, LKML, Linus Torvalds, mingo, Andrew Morton, Stephen Rothwell Hello, Peter. On Wed, Aug 15, 2012 at 12:58:27PM +0200, Peter Zijlstra wrote: > On Tue, 2012-08-14 at 17:18 -0700, Tejun Heo wrote: > > Let's see if we can agree on the latter point first. Do you agree > > that it wouldn't be a good idea to implement relatively complex timer > > subsystem inside workqueue? > > RB-trees are fairly trivial to use, I'll get back to this later. > but can we please get back to why > people want to do del/mod delayed work from IRQ context? > > I can get the queueing part, but why do they need to cancel and or > modify stuff? It isn't different from being able to use del_timer() or mod_timer() from IRQ handlers. For example, block layer uses delayed_work to restart queue processing after a short delay for whatever reason. Depending on the driver, request issuing can happen from its IRQ handler chained from completion. If the command processing detects a condition which indicates that the queue can't process any more requests until another event happens, it shoots down the delayed_work. Another example is drivers using polling and irq together. If IRQ happens, they shoot down the pending delayed_work and schedules it immediately. Another similar use is timeout handler. Schedule timeout handler when initiating an operation and cancel it on completion IRQ. Apart from having process context when executing the callback, delayed_work's use case isn't different from timer. It might as well be named sleepable_timer and implemented as part of timer with cooperation from workqueue. As such, it's natural to expect interface and behavior similar to those of timer and that's another reason why implementing workqueue's own timerlist isn't a good idea. e.g. What about deferrable delayed_work? We definitely better have that and I'm sure it can also be implemented separately in workqueue too, but it seems quite silly to me. Let's say we implement it by having two timer_lists, one deferrable and one not. What if someone wants to adjust timer slack - ie. what about users which can use much larger slack than allowed by the default deferrable? Yet another thing is that not being able to use cancel/mod_delayed_work() from IRQ handlers is inherently bad API. The restriction isn't inherent in what it does. It rises purely from implementation details. For an API which is as widely used as workqueue, especially by drivers, that just doesn't seem like a good idea. The downside being one additional if() in timer execution path, to me, the tradeoff seems clear. If exposing IRQSAFE to other users or allowing del_timer_sync() from IRQ context is bothersome, those can be dropped too. The *only* thing which is necessary is for timer to not enable IRQ between delayed_work timer being dequeued and its execution. Thanks. -- tejun ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-08 18:10 Tejun Heo ` (5 preceding siblings ...) 2012-08-13 23:35 ` [PATCHSET] timer: clean up initializers and implement irqsafe timers Tejun Heo @ 2012-08-14 18:55 ` Thomas Gleixner 2012-08-14 19:15 ` Tejun Heo 6 siblings, 1 reply; 31+ messages in thread From: Thomas Gleixner @ 2012-08-14 18:55 UTC (permalink / raw) To: Tejun Heo; +Cc: linux-kernel, torvalds, mingo, akpm, peterz On Wed, 8 Aug 2012, Tejun Heo wrote: > Timer internals are protected by irqsafe lock but the lock is > naturally dropped and irq enabled while a timer is executed. This > makes dequeueing timer for execution and the actual execution > non-atomic against IRQs. No matter what the timer function does, IRQs > can occur between timer dispatch and execution. This means that an > IRQ handler could interrupt any timer in progress and it's impossible > for an IRQ handler to cancel and drain a timer. > > This restriction manifests as ugly convolutions in workqueue > delayed_work interface. A !idle delayed_work is either on timer, > being transferred from timer to worklist, on worklist, or executing. > There are interfaces which need to cancel a pending delayed_work - > cancel_delayed_work() and friends and mod_delayed_work(). They want > to cancel a work item in the first three states but it's impossible to > drain the second state from IRQ handlers which lead to the following > oddities. > > * mod_delayed_work() can't be used from IRQ handlers. This function does not exist. So what? > * __cancel_delayed_work() can't use the usual try_to_grab_pending() > which handles all three states but instead only deals with the first > state using a separate implementation. There's no way to make a > delayed_work not pending from IRQ handlers. And why is that desired and justifies the mess you are trying to create in the timer code? > * The context / behavior differences among cancel_delayed_work(), > __cancel_delayed_work(), cancel_delayed_work_sync() are subtle and > confusing (the first two are mostly historical tho). We have a lot of subtle differences in interfaces for similar reasons. > This patchset implements irqsafe timers. For an irqsafe timer, IRQ is > not enabled from dispatch till the end of its execution making it safe > to drain the timer regardless of context. This will enable cleaning > up delayed_work interface. By burdening crap on the timer code. We had a similar context case handling in the original hrtimers code and Linus went berserk on it. There is no real good reason to reinvent it in a different flavour. Your general approach about workqueues seems to be adding hacks into other sensitive code paths [ see __schedule() ]. Can we please stop that? workqueues are not so special to justify that. The whole delayed work handling is a fragile construct due to the combination of the timer and the work struct itself. As long as the timer is pending the work is detached from the workqueue code and that's what causes the issues at hand. So rather than hacking special cases into the timer code, why can't we manage those delayed works in the workqueue core itself and avoid the whole timer dance? Right now delayed work arms a timer, whose callback enqueues the work and wakes the worker thread, which then executes the work. So what about changing delayed_work into: struct delayed_work { struct work_struct work; unsigned long expires; }; Now when delayed work gets scheduled it gets enqueued into a separate list in the workqueue core with the proper worker lock held. Then check the expiry time of the new work against the current expiry time of a timer in the worker itself. If the new expiry time is after the current expiry time, nothing to do. If the new expiry is before the current expiry time or the timer is not armed, then (re)arm the timer. When the timer expires it wakes the worker and that evaluates the delayed list for expired works and executes them and rearms the timer if necessary. To cancel delayed work you don't have to worry about the timer callback being executed at all, simply because the timer callback is just a wakeup of the worker and not fiddling with the work itself. If the work is removed before the worker thread runs, life goes on as usual. So all you have to do is to remove the work from the delayed list. If the timer is armed, just leave it alone and let it fire. Canceling delayed work is probably not a high frequency operation. In fact that would make cancel_delayed_work and cancel_work basically the same operation. I have no idea how many concurrent delayed works are on the fly, so I can't tell whether a simple ordered list is sufficient or if you need a tree which is better suited for a large number of sorted items. But that's a trivial to solve detail. Thoughts? tglx ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-14 18:55 ` Thomas Gleixner @ 2012-08-14 19:15 ` Tejun Heo 2012-08-14 20:43 ` Thomas Gleixner 0 siblings, 1 reply; 31+ messages in thread From: Tejun Heo @ 2012-08-14 19:15 UTC (permalink / raw) To: Thomas Gleixner; +Cc: linux-kernel, torvalds, mingo, akpm, peterz Hello, Thomas. On Tue, Aug 14, 2012 at 08:55:16PM +0200, Thomas Gleixner wrote: > > * mod_delayed_work() can't be used from IRQ handlers. > > This function does not exist. So what? It makes the workqueue users messy. It's difficult to get completely correct and subtle errors are difficult to detect / verify. > > * __cancel_delayed_work() can't use the usual try_to_grab_pending() > > which handles all three states but instead only deals with the first > > state using a separate implementation. There's no way to make a > > delayed_work not pending from IRQ handlers. > > And why is that desired and justifies the mess you are trying to > create in the timer code? Because API forcing its users to be messy is stupid. > > * The context / behavior differences among cancel_delayed_work(), > > __cancel_delayed_work(), cancel_delayed_work_sync() are subtle and > > confusing (the first two are mostly historical tho). > > We have a lot of subtle differences in interfaces for similar > reasons. And we should work to make them better. > > This patchset implements irqsafe timers. For an irqsafe timer, IRQ is > > not enabled from dispatch till the end of its execution making it safe > > to drain the timer regardless of context. This will enable cleaning > > up delayed_work interface. > > By burdening crap on the timer code. We had a similar context case > handling in the original hrtimers code and Linus went berserk on it. > There is no real good reason to reinvent it in a different flavour. > > Your general approach about workqueues seems to be adding hacks into > other sensitive code paths [ see __schedule() ]. Can we please stop > that? workqueues are not so special to justify that. The schedule thing worked out pretty well, didn't it? If it improves the kernel in general, I don't see why timer shouldn't participate in it. Timer ain't that special either. However, it does suck to add one-off feature which isn't used by anyone else but I couldn't find a better way. So, if you can think of something better, sure. Let's see. > Right now delayed work arms a timer, whose callback enqueues the work > and wakes the worker thread, which then executes the work. > > So what about changing delayed_work into: > > struct delayed_work { > struct work_struct work; > unsigned long expires; > }; > > Now when delayed work gets scheduled it gets enqueued into a separate > list in the workqueue core with the proper worker lock held. Then > check the expiry time of the new work against the current expiry time > of a timer in the worker itself. Work items aren't assigned to worker on queue. It's a shared worker pool. Workers take work items when they can. > If the new expiry time is after the > current expiry time, nothing to do. If the new expiry is before the > current expiry time or the timer is not armed, then (re)arm the timer. > > When the timer expires it wakes the worker and that evaluates the > delayed list for expired works and executes them and rearms the timer > if necessary. How are you gonna decide which worker a delayed work item should be queued on? What if the work item before it takes a very long time to finish? Do we migrate those work items to a different worker? > To cancel delayed work you don't have to worry about the timer > callback being executed at all, simply because the timer callback is > just a wakeup of the worker and not fiddling with the work itself. If > the work is removed before the worker thread runs, life goes on as > usual. > > So all you have to do is to remove the work from the delayed list. If > the timer is armed, just leave it alone and let it fire. Canceling > delayed work is probably not a high frequency operation. > > In fact that would make cancel_delayed_work and cancel_work basically > the same operation. > > I have no idea how many concurrent delayed works are on the fly, so I > can't tell whether a simple ordered list is sufficient or if you need > a tree which is better suited for a large number of sorted items. But > that's a trivial to solve detail. Aside from work <-> worker association confusion, you're basically suggesting for workqueue to implement its own tvec_base in suboptimal way. Doesn't make much sense to me. Thanks. -- tejun ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-14 19:15 ` Tejun Heo @ 2012-08-14 20:43 ` Thomas Gleixner 2012-08-14 21:40 ` Tejun Heo 0 siblings, 1 reply; 31+ messages in thread From: Thomas Gleixner @ 2012-08-14 20:43 UTC (permalink / raw) To: Tejun Heo; +Cc: linux-kernel, torvalds, mingo, akpm, peterz On Tue, 14 Aug 2012, Tejun Heo wrote: > On Tue, Aug 14, 2012 at 08:55:16PM +0200, Thomas Gleixner wrote: > > > * mod_delayed_work() can't be used from IRQ handlers. > > > > This function does not exist. So what? > > It makes the workqueue users messy. It's difficult to get completely > correct and subtle errors are difficult to detect / verify. Ah, the function which does not exist makes the users messy. Interesting observation. > > > * __cancel_delayed_work() can't use the usual try_to_grab_pending() > > > which handles all three states but instead only deals with the first > > > state using a separate implementation. There's no way to make a > > > delayed_work not pending from IRQ handlers. > > > > And why is that desired and justifies the mess you are trying to > > create in the timer code? > > Because API forcing its users to be messy is stupid. That's not an answer to my question. Let me rephrase: Why is it required to cancel delayed work from interupt handlers? Is it just because people are doing stupid things or is there are real requirement? No API forces users to be messy. Most of the mess originates from using the wrong mechanisms and then trying to work around that. If you are trying to clean up mess which originates from there, then you are just nurturing the abuse instead of fixing the underlying design problems in the affected code pathes. > > > * The context / behavior differences among cancel_delayed_work(), > > > __cancel_delayed_work(), cancel_delayed_work_sync() are subtle and > > > confusing (the first two are mostly historical tho). > > > > We have a lot of subtle differences in interfaces for similar > > reasons. > > And we should work to make them better. No objection, but not for the price of cluttering common and hot code pathes with conditionals and special cases. There are always situations where it's justified to have a special function which is tailored for a particular context instead of making the common case slow and ugly. > > > This patchset implements irqsafe timers. For an irqsafe timer, IRQ is > > > not enabled from dispatch till the end of its execution making it safe > > > to drain the timer regardless of context. This will enable cleaning > > > up delayed_work interface. > > > > By burdening crap on the timer code. We had a similar context case > > handling in the original hrtimers code and Linus went berserk on it. > > There is no real good reason to reinvent it in a different flavour. > > > > Your general approach about workqueues seems to be adding hacks into > > other sensitive code paths [ see __schedule() ]. Can we please stop > > that? workqueues are not so special to justify that. > > The schedule thing worked out pretty well, didn't it? If it improves You know my opinion about it. It's still a fricking nightmare for RT and aside of that I still don't see a proper justification for having it in the guts of the scheduler. > the kernel in general, I don't see why timer shouldn't participate in > it. Timer ain't that special either. However, it does suck to add I didn't say that timers are special and timers have been improved all the time to make the life of their users simpler. Just there is a subtle difference between making the life simpler and adding stuff which is designed to work around shortcomings in other parts of the kernel. > one-off feature which isn't used by anyone else but I couldn't find a > better way. > > So, if you can think of something better, sure. Let's see. > > > Right now delayed work arms a timer, whose callback enqueues the work > > and wakes the worker thread, which then executes the work. > > > > So what about changing delayed_work into: > > > > struct delayed_work { > > struct work_struct work; > > unsigned long expires; > > }; > > > > Now when delayed work gets scheduled it gets enqueued into a separate > > list in the workqueue core with the proper worker lock held. Then > > check the expiry time of the new work against the current expiry time > > of a timer in the worker itself. > > Work items aren't assigned to worker on queue. It's a shared worker > pool. Workers take work items when they can. Then add it to the worker pool. It's an implementation detail not a design problem. > > If the new expiry time is after the > > current expiry time, nothing to do. If the new expiry is before the > > current expiry time or the timer is not armed, then (re)arm the timer. > > > > When the timer expires it wakes the worker and that evaluates the > > delayed list for expired works and executes them and rearms the timer > > if necessary. > > How are you gonna decide which worker a delayed work item should be > queued on? What if the work item before it takes a very long time to > finish? Do we migrate those work items to a different worker? You're the expert on those worker details :) > > To cancel delayed work you don't have to worry about the timer > > callback being executed at all, simply because the timer callback is > > just a wakeup of the worker and not fiddling with the work itself. If > > the work is removed before the worker thread runs, life goes on as > > usual. > > > > So all you have to do is to remove the work from the delayed list. If > > the timer is armed, just leave it alone and let it fire. Canceling > > delayed work is probably not a high frequency operation. > > > > In fact that would make cancel_delayed_work and cancel_work basically > > the same operation. > > > > I have no idea how many concurrent delayed works are on the fly, so I > > can't tell whether a simple ordered list is sufficient or if you need > > a tree which is better suited for a large number of sorted items. But > > that's a trivial to solve detail. > > Aside from work <-> worker association confusion, you're basically > suggesting for workqueue to implement its own tvec_base in suboptimal > way. Doesn't make much sense to me. Your approach of adding a special case timer for workqueues which burdens all other users and opens another cans of worms of subtle bugs in the timer code does not make any sense to me. Further I did not say that you need to implement your own tvec_base code into workqueues for that. First of all you did not answer my question whether the delayed timer based works are frequent enough to require something else than a simple linked list which is evaluated by a worker thread for the next expiry and the code which adds a new work just checks the cached next expiry value. If you have the requirement of serving tons of those delayed works, then you have the choice of using a set of ready to use library function which just needs a litte tweakage to work with jiffies instead of nsec based values. Secondly and what's worse you are completely ignoring the fact that the problem you are trying to solve is due to the disconnection of the delayed work (pending on a timer) and the workqueue core. The reason why you can't cancel/modify stuff is due to the locking issues which arise due to del_timer_sync() in the current design. So instead of thinking about the shortcomings of your own design you want to force extra cases into the core timer code as you see fit. And looking at the use cases: block/blk-core.c | 6 ++---- block/blk-throttle.c | 7 +------ drivers/block/floppy.c | 3 +-- drivers/infiniband/core/mad.c | 14 +++++--------- drivers/input/keyboard/qt2160.c | 3 +-- drivers/input/mouse/synaptics_i2c.c | 7 +------ net/core/link_watch.c | 21 ++++++--------------- It's not that impressive and I really want to see a detailed analysis why all of this is necessary in the first place before we add special cases to the core timer infrastucture. You can strike drivers/block/floppy.c, drivers/input/keyboard/qt2160.c and drivers/input/mouse/synaptics_i2c.c from that argument list as they are really not interesting at all either due to being obsolete or a complete mess to begin with. For the others please bring up proper arguments why we need this special casing and why those things can't do with other mechanisms. Thanks, tglx ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-14 20:43 ` Thomas Gleixner @ 2012-08-14 21:40 ` Tejun Heo 2012-08-14 23:12 ` Thomas Gleixner 0 siblings, 1 reply; 31+ messages in thread From: Tejun Heo @ 2012-08-14 21:40 UTC (permalink / raw) To: Thomas Gleixner; +Cc: linux-kernel, torvalds, mingo, akpm, peterz Hello, Thomas. On Tue, Aug 14, 2012 at 10:43:30PM +0200, Thomas Gleixner wrote: > > It makes the workqueue users messy. It's difficult to get completely > > correct and subtle errors are difficult to detect / verify. > > Ah, the function which does not exist makes the users > messy. Interesting observation. Can we get a little less snarky please? It's tiring. Lack of mod_delayed_work() has been an annoyance for quite a few users. People use either explicit cancel + queue or separate timer + work_struct to work around, or just side-step it in some other way. In any case, they end up implementing it ad-hoc in their code. Nasty. > That's not an answer to my question. Let me rephrase: > > Why is it required to cancel delayed work from interupt handlers? Is > it just because people are doing stupid things or is there are real > requirement? That's pretty much like asking why does del_timer() need to be called from IRQ context. Workqueue is pretty extensively used from IRQ handlers. Queueing a work item from IRQ handlers is a pretty common operation and modding / canceling pending delayed work items isn't difficult to reach from there. Can people live without it? Sure, people can live without pretty much anything, but it's just not good. > No API forces users to be messy. Most of the mess originates from > using the wrong mechanisms and then trying to work around that. > > If you are trying to clean up mess which originates from there, then > you are just nurturing the abuse instead of fixing the underlying > design problems in the affected code pathes. There are good APIs and bad APIs. delayed_work is widely used and unfortunately has complications which stem from implementation details which force its users to work around the shortcomings on their own. It's usually better to go some extra miles (it's not even that much here) in the common API if that saves cruft in its many users. > > > > * The context / behavior differences among cancel_delayed_work(), > > > > __cancel_delayed_work(), cancel_delayed_work_sync() are subtle and > > > > confusing (the first two are mostly historical tho). > > > > > > We have a lot of subtle differences in interfaces for similar > > > reasons. > > > > And we should work to make them better. > > No objection, but not for the price of cluttering common and hot code > pathes with conditionals and special cases. There are always > situations where it's justified to have a special function which is > tailored for a particular context instead of making the common case > slow and ugly. I agree the addition isn't the prettiest thing in the world. I tried to implement it inside workqueue proper by trying to detect the case where try_to_grab_pending() is called from an IRQ handler which interrupted delayed_work_timer_fn() but I couldn't make it work reliably in any sane way. So, if you can come up with something better, be my guest. Short of that, I think adding irqsafe timer is a reasonable trade-off. > > The schedule thing worked out pretty well, didn't it? If it improves > > You know my opinion about it. It's still a fricking nightmare for RT > and aside of that I still don't see a proper justification for having > it in the guts of the scheduler. I can imagine it sucking for RT but for mainline, we now have self-regulating worker pool without huge and messy heuristics trying to guesstimate what's going on and we need some sort of worker pool mechanism in kernel. It might not suit your fancy but it has been working pretty well. > > the kernel in general, I don't see why timer shouldn't participate in > > it. Timer ain't that special either. However, it does suck to add > > I didn't say that timers are special and timers have been improved all > the time to make the life of their users simpler. Just there is a > subtle difference between making the life simpler and adding stuff > which is designed to work around shortcomings in other parts of the > kernel. The problem here is that it's really difficult to make delayed_work (or any other generic construct wrapping timer) work properly (people naturally expect similar behavior to timer) using the interface timer currently exposes and it's not caused by some random limitation in delayed_work implementation. Until now, the burden has been transferred to delayed_work users, which is rather sad. So, I really don't think there's any fundamental conflict regarding the basic direction. > > Aside from work <-> worker association confusion, you're basically > > suggesting for workqueue to implement its own tvec_base in suboptimal > > way. Doesn't make much sense to me. > > Your approach of adding a special case timer for workqueues which > burdens all other users and opens another cans of worms of subtle bugs > in the timer code does not make any sense to me. What kind of cans of worms would it open? > First of all you did not answer my question whether the delayed timer > based works are frequent enough to require something else than a > simple linked list which is evaluated by a worker thread for the next > expiry and the code which adds a new work just checks the cached next > expiry value. If you have the requirement of serving tons of those > delayed works, then you have the choice of using a set of ready to use > library function which just needs a litte tweakage to work with > jiffies instead of nsec based values. Grep for delayed_work. It can be as frequent as any timer. Some drivers use it to delay certain processing from IRQ handlers. Block layer uses it a lot to manage plugging and other delayed processing. They can generate a lot of timer modifications. It isn't different from any timer. Custom ad-hoc unscalable timer mechanism vs. single if(). > Secondly and what's worse you are completely ignoring the fact that > the problem you are trying to solve is due to the disconnection of the > delayed work (pending on a timer) and the workqueue core. Alternatively, it's addressing timer's deficiency of not allowing building other generic constructs on top of it. It's not some black or white thing. > The reason why you can't cancel/modify stuff is due to the locking > issues which arise due to del_timer_sync() in the current design. > > So instead of thinking about the shortcomings of your own design you > want to force extra cases into the core timer code as you see fit. No, the shortcoming is fundamental in the *timer* interface if you want to build something which incorporates it and be able to present the same timer-like behavior to its users. Really, try it with the current interface. Now, whether being able to wrap timer to build something like delayed_work is necessary could be debated. Maybe everyone should just use timer + work_struct directly, but I think it's a bit too late to have that discussion and even if we aren't I think having delayed_work is a good idea. > And looking at the use cases: > > block/blk-core.c | 6 ++---- > block/blk-throttle.c | 7 +------ > drivers/block/floppy.c | 3 +-- > drivers/infiniband/core/mad.c | 14 +++++--------- > drivers/input/keyboard/qt2160.c | 3 +-- > drivers/input/mouse/synaptics_i2c.c | 7 +------ > net/core/link_watch.c | 21 ++++++--------------- Hmmm? The diffstat is more like, arch/powerpc/platforms/cell/cpufreq_spudemand.c | 2 block/blk-core.c | 8 - block/blk-throttle.c | 7 - drivers/block/floppy.c | 5 drivers/cpufreq/cpufreq_conservative.c | 2 drivers/cpufreq/cpufreq_ondemand.c | 2 drivers/devfreq/devfreq.c | 2 drivers/infiniband/core/mad.c | 16 -- drivers/input/keyboard/qt2160.c | 3 drivers/input/mouse/synaptics_i2c.c | 7 - drivers/net/ethernet/mellanox/mlx4/sense.c | 2 drivers/power/ab8500_btemp.c | 2 drivers/power/ab8500_charger.c | 8 - drivers/power/ab8500_fg.c | 8 - drivers/power/abx500_chargalg.c | 4 drivers/power/max17040_battery.c | 2 drivers/video/omap2/displays/panel-taal.c | 6 drivers/video/omap2/dss/dsi.c | 6 mm/slab.c | 2 mm/vmstat.c | 2 net/core/link_watch.c | 21 --- net/core/neighbour.c | 2 net/ipv4/inetpeer.c | 2 net/sunrpc/cache.c | 2 and these are only the obvious conversions and aren't simple conversions. Now they behave *properly*. If the user calls cancel, the delayed_work is reliably canceled and mod_delayed_work() *reliably* schedule the work item at the specified time regardless of the current state of delayed_work. > It's not that impressive and I really want to see a detailed analysis > why all of this is necessary in the first place before we add special > cases to the core timer infrastucture. > > You can strike drivers/block/floppy.c, drivers/input/keyboard/qt2160.c > and drivers/input/mouse/synaptics_i2c.c from that argument list as > they are really not interesting at all either due to being obsolete or > a complete mess to begin with. > > For the others please bring up proper arguments why we need this > special casing and why those things can't do with other mechanisms. I think I've written enough here and in the commit message but to sum up, A single if() in timer enables, * Quite a few cleanups. I did only cancel + queue ones here. There are some which were using explicit timer + work_struct. * Much cleaner API which behaves reliably as expected, which makes delayed_work users life easier and reduces the possibility of subtle bugs. Thanks. -- tejun ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-14 21:40 ` Tejun Heo @ 2012-08-14 23:12 ` Thomas Gleixner 2012-08-14 23:27 ` Tejun Heo 0 siblings, 1 reply; 31+ messages in thread From: Thomas Gleixner @ 2012-08-14 23:12 UTC (permalink / raw) To: Tejun Heo; +Cc: linux-kernel, torvalds, mingo, akpm, peterz Tejun, On Tue, 14 Aug 2012, Tejun Heo wrote: > On Tue, Aug 14, 2012 at 10:43:30PM +0200, Thomas Gleixner wrote: > > > It makes the workqueue users messy. It's difficult to get completely > > > correct and subtle errors are difficult to detect / verify. > > > > Ah, the function which does not exist makes the users > > messy. Interesting observation. > > Can we get a little less snarky please? It's tiring. Can you please try to answer my questions instead of throwing random blurb into my direction? Just for the record. The thread evolved from here: <tj> * mod_delayed_work() can't be used from IRQ handlers. My answer was: <tglx> This function does not exist. So what? Which was completely appropriate as this function does not exist though you used it as a primary argument for your patches. Now your answer to my reply was: <tj> It makes the workqueue users messy. It's difficult to get completely correct and subtle errors are difficult to detect / verify. Can you please point out any relevance to my question which would have me prevented from writing the following? <tglx> Ah, the function which does not exist makes the users messy. Interesting observation. So instead of saying, that you wrote an utter nonsense reply you accuse me of being obnoxious: <tj> Can we get a little less snarky please? It's tiring. Can you please sit down for a little while and think about your own snarkiness and your own tiring behaviour against other kernel maintainers? Thanks, tglx ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-14 23:12 ` Thomas Gleixner @ 2012-08-14 23:27 ` Tejun Heo 2012-08-14 23:46 ` Thomas Gleixner 0 siblings, 1 reply; 31+ messages in thread From: Tejun Heo @ 2012-08-14 23:27 UTC (permalink / raw) To: Thomas Gleixner; +Cc: linux-kernel, torvalds, mingo, akpm, peterz Hello, Thomas. On Wed, Aug 15, 2012 at 01:12:01AM +0200, Thomas Gleixner wrote: > Just for the record. The thread evolved from here: > > <tj> * mod_delayed_work() can't be used from IRQ handlers. > > My answer was: > > <tglx> This function does not exist. So what? > > Which was completely appropriate as this function does not exist > though you used it as a primary argument for your patches. I read it as "so, what's wrong with not having mod_delayed_work()?", so the response. It exists in wq/for-3.7 and cancel_delayed_work() (the one without preceding __) + queue() users have been already converted. http://thread.gmane.org/gmane.linux.kernel/1334546 > Can you please sit down for a little while and think about your own > snarkiness and your own tiring behaviour against other kernel > maintainers? Believe it or not, I tend to work pretty well with other maintainers and developers. You start responding with words like "mess" and "crap" with condescension sprinkled and expect the conversation to not escalate? Let's just stay technical, okay? Thanks. -- tejun ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-14 23:27 ` Tejun Heo @ 2012-08-14 23:46 ` Thomas Gleixner 2012-08-14 23:52 ` Tejun Heo 0 siblings, 1 reply; 31+ messages in thread From: Thomas Gleixner @ 2012-08-14 23:46 UTC (permalink / raw) To: Tejun Heo; +Cc: linux-kernel, torvalds, mingo, akpm, peterz Tejun, On Tue, 14 Aug 2012, Tejun Heo wrote: > On Wed, Aug 15, 2012 at 01:12:01AM +0200, Thomas Gleixner wrote: > > Just for the record. The thread evolved from here: > > > > <tj> * mod_delayed_work() can't be used from IRQ handlers. > > > > My answer was: > > > > <tglx> This function does not exist. So what? > > > > Which was completely appropriate as this function does not exist > > though you used it as a primary argument for your patches. > > I read it as "so, what's wrong with not having mod_delayed_work()?", > so the response. Oh well. Your interpretation of "So what?" starts to stress my patience. > It exists in wq/for-3.7 and cancel_delayed_work() (the one without > preceding __) + queue() users have been already converted. > > http://thread.gmane.org/gmane.linux.kernel/1334546 Do you really expect that I follow all of kernel dev posts within a day of returning from a two weeks vacation? Thanks, tglx ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers 2012-08-14 23:46 ` Thomas Gleixner @ 2012-08-14 23:52 ` Tejun Heo 0 siblings, 0 replies; 31+ messages in thread From: Tejun Heo @ 2012-08-14 23:52 UTC (permalink / raw) To: Thomas Gleixner; +Cc: linux-kernel, torvalds, mingo, akpm, peterz Hello, On Tue, Aug 14, 2012 at 4:46 PM, Thomas Gleixner <tglx@linutronix.de> wrote: > Do you really expect that I follow all of kernel dev posts within a > day of returning from a two weeks vacation? The head message says on what it's based on and the git branch. I can't read your mind or know your current state. You could have asked using a proper sentence. We can continue this, but I don't think this is leading any place productive. I'm replying to your other reply about the suggestion about implementing workqueue's own timerlist. Let's just talk about that. Please. Thanks. -- tejun ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2012-08-21 19:27 UTC | newest] Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-08-08 18:10 Tejun Heo 2012-08-08 18:10 ` [PATCH 1/4] timer: generalize timer->base flags handling Tejun Heo 2012-08-21 16:40 ` [tip:timers/core] timer: Generalize " tip-bot for Tejun Heo 2012-08-08 18:10 ` [PATCH 2/4] timer: relocate declarations of init_timer_on_stack_key() Tejun Heo 2012-08-21 16:41 ` [tip:timers/core] timer: Relocate " tip-bot for Tejun Heo 2012-08-08 18:10 ` [PATCH 3/4] timer: clean up timer initializers Tejun Heo 2012-08-21 16:42 ` [tip:timers/core] timer: Clean " tip-bot for Tejun Heo 2012-08-08 18:10 ` [PATCH 4/4] timer: implement TIMER_IRQSAFE Tejun Heo 2012-08-21 16:43 ` [tip:timers/core] timer: Implement TIMER_IRQSAFE tip-bot for Tejun Heo 2012-08-21 19:26 ` Tejun Heo 2012-08-08 18:13 ` $SUBJ should have been "[PATCHSET] timer: clean up initializers and implement irqsafe timers" Tejun Heo 2012-08-13 23:35 ` [PATCHSET] timer: clean up initializers and implement irqsafe timers Tejun Heo 2012-08-14 8:32 ` Thomas Gleixner 2012-08-14 19:16 ` Thomas Gleixner 2012-08-14 19:22 ` Tejun Heo 2012-08-14 21:03 ` Thomas Gleixner 2012-08-14 21:56 ` Tejun Heo 2012-08-14 22:45 ` Thomas Gleixner 2012-08-14 23:01 ` Tejun Heo 2012-08-14 23:33 ` Thomas Gleixner 2012-08-15 0:18 ` Tejun Heo 2012-08-15 10:58 ` Peter Zijlstra 2012-08-16 19:36 ` Tejun Heo 2012-08-14 18:55 ` Thomas Gleixner 2012-08-14 19:15 ` Tejun Heo 2012-08-14 20:43 ` Thomas Gleixner 2012-08-14 21:40 ` Tejun Heo 2012-08-14 23:12 ` Thomas Gleixner 2012-08-14 23:27 ` Tejun Heo 2012-08-14 23:46 ` Thomas Gleixner 2012-08-14 23:52 ` Tejun Heo
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.