* [RFC v2] timers: Don't wake ktimersoftd on every tick
@ 2016-12-13 21:44 Haris Okanovic
  2016-12-23 17:28 ` Sebastian Andrzej Siewior
  2017-02-03 16:51 ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 42+ messages in thread
From: Haris Okanovic @ 2016-12-13 21:44 UTC
  To: linux-rt-users
  Cc: linux-kernel, bigeasy, tglx, julia.cartwright, gratian.crisan,
	haris.okanovic

Changed the way timers are collected per Julia and Thomas'
recommendation: Expired timers are now collected in interrupt context
and fired in ktimersoftd to avoid double-walk of `pending_map`.

This is implemented by storing lists of expired timers in timer_base,
which carries a memory overhead of 9*sizeof(pointer) per CPU. The timer
system uses hlists, which don't have end-node references, making it
impossible to merge two hlists in constant time; i.e., merging requires
walking one list. I also considered switching `vectors` to regular
lists, which don't have this limitation, but that approach has the same
memory overhead: list_head is bigger than hlist_head by sizeof(pointer)
and is instantiated 9+ times per CPU as `vectors`. I believe the only
way to trim overhead is to spend more CPU cycles in interrupt context,
either on list merging (an unbounded operation) or on the original
double-walk implementation. Any suggestions/preferences?
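
To make the limitation concrete, here's a minimal userspace sketch of
an hlist-style merge (types and names are illustrative stand-ins, not
the kernel's). With no end-node reference in the head, splicing one
list onto another must first walk to the tail:

#include <stddef.h>

struct node {
	struct node *next;
	struct node **pprev;	/* points at the previous ->next (or head->first) */
};

struct head {
	struct node *first;	/* no tail pointer: the limitation above */
};

/* Append everything on 'from' to 'to'. O(length of 'to'): we have to
 * walk 'to' to find its last ->next slot before linking 'from' in. */
static void merge(struct head *to, struct head *from)
{
	struct node **tail = &to->first;

	while (*tail)			/* the unavoidable walk */
		tail = &(*tail)->next;
	*tail = from->first;
	if (from->first)
		from->first->pprev = tail;
	from->first = NULL;
}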

As before, a 6h run of cyclictest without CPU affinity shows a decrease
in the 22-70us latency range. No change in max jitter. No change when
`-S` is used.

[Before/after traces]

ftp://ftp.ni.com/outgoing/tp02-timer-peek-traces.tgz
(Email me if link dies. Server periodically purges old files.)

[Hardware/software/config]

 NI cRIO-9033
  2 core Intel Atom CPU

 Kernel 4.8.6-rt5
  CONFIG_HZ_PERIODIC=y

[Outstanding concerns/issues/questions]

I'm relatively new to the timer subsystem, so please feel free to poke
as many holes as possible in this change. A few things that concern me
at the moment are:

Can jiffies change while one or more CPUs are inside tick_sched_timer(),
 in interrupt context? I'm copying jiffies to a local variable in
 find_expired_timers() to ensure it doesn't run unbounded, but I'm not
 sure if that's necessary.

Any special considerations for testing NO_HZ builds? (Other than letting
it run idle for a while)

timers_dead_cpu() presently asserts no timer callback is actively
running, which suggests that timers must be canceled prior to disabling
CPUs; otherwise, there's a race between active timers and hotplug
which can crash the whole kernel. Is this a safe assumption to make and
are there any special considerations for CPU hotplug testing?

Any other tests/performance benchmarks I should run?

Source: https://github.com/harisokanovic/linux/tree/dev/hokanovi/timer-peek-v2

Thanks,
Haris
---
 kernel/time/timer.c | 97 ++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 67 insertions(+), 30 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index ba53447..b0cee45 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -206,6 +206,8 @@ struct timer_base {
 	bool			is_idle;
 	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
 	struct hlist_head	vectors[WHEEL_SIZE];
+	struct hlist_head	expired_lists[LVL_DEPTH];
+	int					expired_count;
 } ____cacheline_aligned;
 
 static DEFINE_PER_CPU(struct timer_base, timer_bases[NR_BASES]);
@@ -1343,7 +1345,8 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(unsigned long),
 	}
 }
 
-static void expire_timers(struct timer_base *base, struct hlist_head *head)
+static inline void __expire_timers(struct timer_base *base,
+							struct hlist_head *head)
 {
 	while (!hlist_empty(head)) {
 		struct timer_list *timer;
@@ -1360,7 +1363,7 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
 		data = timer->data;
 
 		if (!IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
-		    timer->flags & TIMER_IRQSAFE) {
+			timer->flags & TIMER_IRQSAFE) {
 			raw_spin_unlock(&base->lock);
 			call_timer_fn(timer, fn, data);
 			base->running_timer = NULL;
@@ -1374,21 +1377,37 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
 	}
 }
 
-static int __collect_expired_timers(struct timer_base *base,
-				    struct hlist_head *heads)
+static void expire_timers(struct timer_base *base)
+{
+	struct hlist_head *head;
+
+	while (base->expired_count--) {
+		head = base->expired_lists + base->expired_count;
+		__expire_timers(base, head);
+	}
+	base->expired_count = 0;
+}
+
+static void __collect_expired_timers(struct timer_base *base)
 {
 	unsigned long clk = base->clk;
 	struct hlist_head *vec;
-	int i, levels = 0;
+	int i;
 	unsigned int idx;
 
+	/* expire_timers() must be called at least once before we can
+	 * collect more timers
+	 */
+	if (WARN_ON(base->expired_count))
+		return;
+
 	for (i = 0; i < LVL_DEPTH; i++) {
 		idx = (clk & LVL_MASK) + i * LVL_SIZE;
 
 		if (__test_and_clear_bit(idx, base->pending_map)) {
 			vec = base->vectors + idx;
-			hlist_move_list(vec, heads++);
-			levels++;
+			hlist_move_list(vec,
+				&base->expired_lists[base->expired_count++]);
 		}
 		/* Is it time to look at the next level? */
 		if (clk & LVL_CLK_MASK)
@@ -1396,7 +1415,6 @@ static int __collect_expired_timers(struct timer_base *base,
 		/* Shift clock for the next level granularity */
 		clk >>= LVL_CLK_SHIFT;
 	}
-	return levels;
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
@@ -1585,8 +1603,7 @@ void timer_clear_idle(void)
 	base->is_idle = false;
 }
 
-static int collect_expired_timers(struct timer_base *base,
-				  struct hlist_head *heads)
+static void collect_expired_timers(struct timer_base *base)
 {
 	/*
 	 * NOHZ optimization. After a long idle sleep we need to forward the
@@ -1603,20 +1620,49 @@ static int collect_expired_timers(struct timer_base *base,
 		if (time_after(next, jiffies)) {
 			/* The call site will increment clock! */
 			base->clk = jiffies - 1;
-			return 0;
+			return;
 		}
 		base->clk = next;
 	}
-	return __collect_expired_timers(base, heads);
+	__collect_expired_timers(base);
 }
 #else
-static inline int collect_expired_timers(struct timer_base *base,
-					 struct hlist_head *heads)
+static inline void collect_expired_timers(struct timer_base *base)
 {
-	return __collect_expired_timers(base, heads);
+	__collect_expired_timers(base);
 }
 #endif
 
+static int find_expired_timers(struct timer_base *base)
+{
+	const unsigned long int end_clk = jiffies;
+
+	while (!base->expired_count && time_after_eq(end_clk, base->clk)) {
+		collect_expired_timers(base);
+		base->clk++;
+	}
+
+	return base->expired_count;
+}
+
+/* Called from cpu tick routine to quickly collect expired timers */
+static int tick_find_expired(struct timer_base *base)
+{
+	int count;
+
+	raw_spin_lock(&base->lock);
+
+	if (unlikely(time_after(jiffies, base->clk + HZ))) {
+		/* defer to ktimersoftd; don't spend too long in irq context */
+		count = -1;
+	} else
+		count = find_expired_timers(base);
+
+	raw_spin_unlock(&base->lock);
+
+	return count;
+}
+
 /*
  * Called from the timer interrupt handler to charge one tick to the current
  * process.  user_tick is 1 if the tick is user time, 0 for system.
@@ -1643,22 +1689,11 @@ void update_process_times(int user_tick)
  */
 static inline void __run_timers(struct timer_base *base)
 {
-	struct hlist_head heads[LVL_DEPTH];
-	int levels;
-
-	if (!time_after_eq(jiffies, base->clk))
-		return;
-
 	raw_spin_lock_irq(&base->lock);
 
-	while (time_after_eq(jiffies, base->clk)) {
-
-		levels = collect_expired_timers(base, heads);
-		base->clk++;
+	while (find_expired_timers(base))
+		expire_timers(base);
 
-		while (levels--)
-			expire_timers(base, heads + levels);
-	}
 	raw_spin_unlock_irq(&base->lock);
 	wakeup_timer_waiters(base);
 }
@@ -1686,12 +1721,12 @@ void run_local_timers(void)
 
 	hrtimer_run_queues();
 	/* Raise the softirq only if required. */
-	if (time_before(jiffies, base->clk)) {
+	if (time_before(jiffies, base->clk) || !tick_find_expired(base)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON) || !base->nohz_active)
 			return;
 		/* CPU is awake, so check the deferrable base. */
 		base++;
-		if (time_before(jiffies, base->clk))
+		if (time_before(jiffies, base->clk) || !tick_find_expired(base))
 			return;
 	}
 	raise_softirq(TIMER_SOFTIRQ);
@@ -1861,6 +1896,7 @@ int timers_dead_cpu(unsigned int cpu)
 		raw_spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);
 
 		BUG_ON(old_base->running_timer);
+		BUG_ON(old_base->expired_count);
 
 		for (i = 0; i < WHEEL_SIZE; i++)
 			migrate_timer_list(new_base, old_base->vectors + i);
@@ -1887,6 +1923,7 @@ static void __init init_timer_cpu(int cpu)
 #ifdef CONFIG_PREEMPT_RT_FULL
 		init_swait_queue_head(&base->wait_for_running_timer);
 #endif
+		base->expired_count = 0;
 	}
 }
 
-- 
2.10.1

* Re: [RFC v2] timers: Don't wake ktimersoftd on every tick
  2016-12-13 21:44 [RFC v2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
@ 2016-12-23 17:28 ` Sebastian Andrzej Siewior
  2016-12-28 18:06   ` Haris Okanovic
  2017-02-03 16:51 ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 42+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-12-23 17:28 UTC
  To: Haris Okanovic
  Cc: linux-rt-users, linux-kernel, tglx, julia.cartwright, gratian.crisan

On 2016-12-13 15:44:05 [-0600], Haris Okanovic wrote:
> Changed the way timers are collected per Julia and Thomas'

I can only see Julia's response to the initial thread.

> recommendation: Expired timers are now collected in interrupt context
> and fired in ktimersoftd to avoid double-walk of `pending_map`.
> 
> This is implemented by storing lists of expired timers in timer_base,
> which carries a memory overhead 9*sizeof(pointer) per CPU. The timer
> system uses hlist's which don't have end-node references, making it
> impossible to merge 2 hlist's in constant time. I.e. Merging requires
> walking one list. I also considered switching `vectors` to regular
> list's which don't have this limitations, but that approach has the same
> memory overhead. list_head is bigger than hlist_head by sizeof(pointer)
> and is instantiated 9+ times per CPU as `vectors`. I believe the only
> way to trim overhead is to spend more CPU cycles in interrupt context
> either in list merging (unbounded operation) or the original double-walk
> implementation. Any suggestions/preferences?
> 
> As before, a 6h run of cyclictest without CPU affinity shows decrease in
> 22-70us latency range. 
what does this mean? Your cyclictest runs on a random CPU with one thread
only?

> No change in max jitter. 
Does this mean your average latency went down 20-70us and your max is
the same?

> No change when `-S` is
> used.

-S gives you one thread per core, makes sure it stays on that core and
uses clock_nanosleep().

clock_nanosleep() should be used no matter what. 
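
A minimal sketch of that pattern, with absolute deadlines so time spent
awake doesn't shift the period (error handling omitted):

#include <time.h>

#define NSEC_PER_SEC 1000000000L

static void periodic_loop(long period_ns)
{
	struct timespec next;

	clock_gettime(CLOCK_MONOTONIC, &next);
	for (;;) {
		next.tv_nsec += period_ns;
		while (next.tv_nsec >= NSEC_PER_SEC) {
			next.tv_sec++;
			next.tv_nsec -= NSEC_PER_SEC;
		}
		/* absolute deadline: no drift from time spent awake */
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
		/* wakeup latency would be sampled here as now - next */
	}
}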


> [Before/after traces]
> 
> ftp://ftp.ni.com/outgoing/tp02-timer-peek-traces.tgz
> (Email me if link dies. Server periodically purges old files.)
> 
> [Hardware/software/config]
> 
>  NI cRIO-9033
>   2 core Intel Atom CPU
> 
>  Kernel 4.8.6-rt5
>   CONFIG_HZ_PERIODIC=y
> 
> [Outstanding concerns/issues/questions]
> 
> I'm relatively new to the timer subsystem, so please feel free to poke
> as many holes as possible in this change. A few things that concern me
> at the moment are:
> 
> Can jiffies change while one or more cpus is inside tick_sched_timer(),
>  in interrupt context? I'm copying jiffies to a local variable in
>  find_expired_timers() to ensure it doesn't run unbounded, but I'm not
>  sure if that's necessary.

It could change. Only the housekeeping CPU updates jiffies, in
tick_sched_do_timer().

> Any special considerations for testing NO_HZ builds? (Other than letting
> it run idle for a while)
> 
> timers_dead_cpu() presently asserts no timer callback is actively
> running, which suggests that timers must be canceled prior to disabling
> CPUs; otherwise, there's a race between active timers and hotplug
> which can crash the whole kernel. Is this a safe assumption to make and
> are there any special considerations for CPU hotplug testing?

timers_dead_cpu() and hrtimers_dead_cpu() migrate timers away. At that
point the CPU should be down already, so a timer can't run on that CPU.

> Other tests/performance benchmark I should run?
> 
> Source: https://github.com/harisokanovic/linux/tree/dev/hokanovi/timer-peek-v2
> 
> Thanks,
> Haris

Sebastian

* Re: [RFC v2] timers: Don't wake ktimersoftd on every tick
  2016-12-23 17:28 ` Sebastian Andrzej Siewior
@ 2016-12-28 18:06   ` Haris Okanovic
  0 siblings, 0 replies; 42+ messages in thread
From: Haris Okanovic @ 2016-12-28 18:06 UTC
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, linux-kernel, tglx, julia.cartwright, gratian.crisan

On 12/23/2016 11:28 AM, Sebastian Andrzej Siewior wrote:
> On 2016-12-13 15:44:05 [-0600], Haris Okanovic wrote:
>> Changed the way timers are collected per Julia and Thomas'
>
> I can only see Julia's response to the initial thread.
>

I should have been clearer. Thomas commented on IRC and recommended 
Julia's approach.

>> recommendation: Expired timers are now collected in interrupt context
>> and fired in ktimersoftd to avoid double-walk of `pending_map`.
>>
>> This is implemented by storing lists of expired timers in timer_base,
>> which carries a memory overhead 9*sizeof(pointer) per CPU. The timer
>> system uses hlist's which don't have end-node references, making it
>> impossible to merge 2 hlist's in constant time. I.e. Merging requires
>> walking one list. I also considered switching `vectors` to regular
>> list's which don't have this limitations, but that approach has the same
>> memory overhead. list_head is bigger than hlist_head by sizeof(pointer)
>> and is instantiated 9+ times per CPU as `vectors`. I believe the only
>> way to trim overhead is to spend more CPU cycles in interrupt context
>> either in list merging (unbounded operation) or the original double-walk
>> implementation. Any suggestions/preferences?
>>
>> As before, a 6h run of cyclictest without CPU affinity shows decrease in
>> 22-70us latency range.
> what does this mean? Your cyclictest runs on a random CPU with one thread
> only?
>

Yes. My point is that cyclictest only shows a significant difference 
(before and after this change) when `-S` is not used.

>> No change in max jitter.
> Does this mean your average latency went down 20-70us and your max is
> the same?
>

Yes. Average latency (20-70us range) goes down in a single-threaded run 
of cyclictest. Max jitter stays the same in both single- and 
multi-threaded runs.

>> No change when `-S` is
>> used.
>
> -S gives you one thread per core, makes sure it stays on that core and
> uses clock_nanosleep().
>
> clock_nanosleep() should be used no matter what.
>
>
>> [Before/after traces]
>>
>> ftp://ftp.ni.com/outgoing/tp02-timer-peek-traces.tgz
>> (Email me if link dies. Server periodically purges old files.)
>>
>> [Hardware/software/config]
>>
>>  NI cRIO-9033
>>   2 core Intel Atom CPU
>>
>>  Kernel 4.8.6-rt5
>>   CONFIG_HZ_PERIODIC=y
>>
>> [Outstanding concerns/issues/questions]
>>
>> I'm relatively new to the timer subsystem, so please feel free to poke
>> as many holes as possible in this change. A few things that concern me
>> at the moment are:
>>
>> Can jiffies change while one or more cpus is inside tick_sched_timer(),
>>  in interrupt context? I'm copying jiffies to a local variable in
>>  find_expired_timers() to ensure it doesn't run unbounded, but I'm not
>>  sure if that's necessary.
>
> It could change. Only the house keeping does update jiffies in
> tick_sched_do_timer().
>
>> Any special considerations for testing NO_HZ builds? (Other than letting
>> it run idle for a while)
>>
>> timers_dead_cpu() presently asserts no timer callback is actively
>> running, which suggests that timers must be canceled prior to disabling
>> CPUs; otherwise, there's a race between active timers and hotplug
>> which can crash the whole kernel. Is this a safe assumption to make and
>> are there any special considerations for CPU hotplug testing?
>
> timers_dead_cpu() and hrtimers_dead_cpu() migrate timer away. At that
> point the CPU should be down already so a timer can't run on that CPU.
>
>> Other tests/performance benchmark I should run?
>>
>> Source: https://github.com/harisokanovic/linux/tree/dev/hokanovi/timer-peek-v2
>>
>> Thanks,
>> Haris
>
> Sebastian
>

-- Haris

* Re: [RFC v2] timers: Don't wake ktimersoftd on every tick
  2016-12-13 21:44 [RFC v2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
  2016-12-23 17:28 ` Sebastian Andrzej Siewior
@ 2017-02-03 16:51 ` Sebastian Andrzej Siewior
  2017-02-03 18:21   ` [PATCH] " Haris Okanovic
  2017-02-03 18:27   ` [RFC v2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
  1 sibling, 2 replies; 42+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-02-03 16:51 UTC
  To: Haris Okanovic
  Cc: linux-rt-users, linux-kernel, tglx, julia.cartwright, gratian.crisan

On 2016-12-13 15:44:05 [-0600], Haris Okanovic wrote:
> Changed the way timers are collected per Julia and Thomas'
> recommendation: Expired timers are now collected in interrupt context
> and fired in ktimersoftd to avoid double-walk of `pending_map`.
> Thanks,
> Haris

I've been staring at it for a while now; I've now applied it locally
and will see how it goes.
I would need that patch signed as per
	https://kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin
if you want it applied.
> ---
>  kernel/time/timer.c | 97 ++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 67 insertions(+), 30 deletions(-)

Sebastian

* [PATCH] timers: Don't wake ktimersoftd on every tick
  2017-02-03 16:51 ` Sebastian Andrzej Siewior
@ 2017-02-03 18:21   ` Haris Okanovic
  2017-02-10 17:02     ` Sebastian Andrzej Siewior
  2017-02-03 18:27   ` [RFC v2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
  1 sibling, 1 reply; 42+ messages in thread
From: Haris Okanovic @ 2017-02-03 18:21 UTC
  To: linux-rt-users, linux-kernel, bigeasy
  Cc: haris.okanovic, tglx, julia.cartwright, gratian.crisan

Collect expired timers in interrupt context to avoid overhead of waking
ktimersoftd on every tick. ktimersoftd now wakes only when one or more
timers are ready, which yields a minor reduction in small latency spikes.

This is implemented by storing lists of expired timers in timer_base,
updated on each tick. Any addition to the lists wakes ktimersoftd
(softirq) to process those timers.

Please refer to the following RFC threads for more details:
https://www.spinics.net/lists/linux-rt-users/msg16095.html
https://www.spinics.net/lists/linux-rt-users/msg16113.html

Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
---
 kernel/time/timer.c | 97 ++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 67 insertions(+), 30 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 08a5ab7..5ed1484 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -206,6 +206,8 @@ struct timer_base {
 	bool			is_idle;
 	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
 	struct hlist_head	vectors[WHEEL_SIZE];
+	struct hlist_head	expired_lists[LVL_DEPTH];
+	int					expired_count;
 } ____cacheline_aligned;
 
 static DEFINE_PER_CPU(struct timer_base, timer_bases[NR_BASES]);
@@ -1353,7 +1355,8 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(unsigned long),
 	}
 }
 
-static void expire_timers(struct timer_base *base, struct hlist_head *head)
+static inline void __expire_timers(struct timer_base *base,
+							struct hlist_head *head)
 {
 	while (!hlist_empty(head)) {
 		struct timer_list *timer;
@@ -1370,7 +1373,7 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
 		data = timer->data;
 
 		if (!IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
-		    timer->flags & TIMER_IRQSAFE) {
+			timer->flags & TIMER_IRQSAFE) {
 			raw_spin_unlock(&base->lock);
 			call_timer_fn(timer, fn, data);
 			base->running_timer = NULL;
@@ -1384,21 +1387,37 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
 	}
 }
 
-static int __collect_expired_timers(struct timer_base *base,
-				    struct hlist_head *heads)
+static void expire_timers(struct timer_base *base)
+{
+	struct hlist_head *head;
+
+	while (base->expired_count--) {
+		head = base->expired_lists + base->expired_count;
+		__expire_timers(base, head);
+	}
+	base->expired_count = 0;
+}
+
+static void __collect_expired_timers(struct timer_base *base)
 {
 	unsigned long clk = base->clk;
 	struct hlist_head *vec;
-	int i, levels = 0;
+	int i;
 	unsigned int idx;
 
+	/* expire_timers() must be called at least once before we can
+	 * collect more timers
+	 */
+	if (WARN_ON(base->expired_count))
+		return;
+
 	for (i = 0; i < LVL_DEPTH; i++) {
 		idx = (clk & LVL_MASK) + i * LVL_SIZE;
 
 		if (__test_and_clear_bit(idx, base->pending_map)) {
 			vec = base->vectors + idx;
-			hlist_move_list(vec, heads++);
-			levels++;
+			hlist_move_list(vec,
+				&base->expired_lists[base->expired_count++]);
 		}
 		/* Is it time to look at the next level? */
 		if (clk & LVL_CLK_MASK)
@@ -1406,7 +1425,6 @@ static int __collect_expired_timers(struct timer_base *base,
 		/* Shift clock for the next level granularity */
 		clk >>= LVL_CLK_SHIFT;
 	}
-	return levels;
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
@@ -1599,8 +1617,7 @@ void timer_clear_idle(void)
 	base->is_idle = false;
 }
 
-static int collect_expired_timers(struct timer_base *base,
-				  struct hlist_head *heads)
+static void collect_expired_timers(struct timer_base *base)
 {
 	/*
 	 * NOHZ optimization. After a long idle sleep we need to forward the
@@ -1617,20 +1634,49 @@ static int collect_expired_timers(struct timer_base *base,
 		if (time_after(next, jiffies)) {
 			/* The call site will increment clock! */
 			base->clk = jiffies - 1;
-			return 0;
+			return;
 		}
 		base->clk = next;
 	}
-	return __collect_expired_timers(base, heads);
+	__collect_expired_timers(base);
 }
 #else
-static inline int collect_expired_timers(struct timer_base *base,
-					 struct hlist_head *heads)
+static inline void collect_expired_timers(struct timer_base *base)
 {
-	return __collect_expired_timers(base, heads);
+	__collect_expired_timers(base);
 }
 #endif
 
+static int find_expired_timers(struct timer_base *base)
+{
+	const unsigned long int end_clk = jiffies;
+
+	while (!base->expired_count && time_after_eq(end_clk, base->clk)) {
+		collect_expired_timers(base);
+		base->clk++;
+	}
+
+	return base->expired_count;
+}
+
+/* Called from cpu tick routine to quickly collect expired timers */
+static int tick_find_expired(struct timer_base *base)
+{
+	int count;
+
+	raw_spin_lock(&base->lock);
+
+	if (unlikely(time_after(jiffies, base->clk + HZ))) {
+		/* defer to ktimersoftd; don't spend too long in irq context */
+		count = -1;
+	} else
+		count = find_expired_timers(base);
+
+	raw_spin_unlock(&base->lock);
+
+	return count;
+}
+
 /*
  * Called from the timer interrupt handler to charge one tick to the current
  * process.  user_tick is 1 if the tick is user time, 0 for system.
@@ -1657,22 +1703,11 @@ void update_process_times(int user_tick)
  */
 static inline void __run_timers(struct timer_base *base)
 {
-	struct hlist_head heads[LVL_DEPTH];
-	int levels;
-
-	if (!time_after_eq(jiffies, base->clk))
-		return;
-
 	raw_spin_lock_irq(&base->lock);
 
-	while (time_after_eq(jiffies, base->clk)) {
-
-		levels = collect_expired_timers(base, heads);
-		base->clk++;
+	while (find_expired_timers(base))
+		expire_timers(base);
 
-		while (levels--)
-			expire_timers(base, heads + levels);
-	}
 	raw_spin_unlock_irq(&base->lock);
 	wakeup_timer_waiters(base);
 }
@@ -1700,12 +1735,12 @@ void run_local_timers(void)
 
 	hrtimer_run_queues();
 	/* Raise the softirq only if required. */
-	if (time_before(jiffies, base->clk)) {
+	if (time_before(jiffies, base->clk) || !tick_find_expired(base)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON) || !base->nohz_active)
 			return;
 		/* CPU is awake, so check the deferrable base. */
 		base++;
-		if (time_before(jiffies, base->clk))
+		if (time_before(jiffies, base->clk) || !tick_find_expired(base))
 			return;
 	}
 	raise_softirq(TIMER_SOFTIRQ);
@@ -1875,6 +1910,7 @@ int timers_dead_cpu(unsigned int cpu)
 		raw_spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);
 
 		BUG_ON(old_base->running_timer);
+		BUG_ON(old_base->expired_count);
 
 		for (i = 0; i < WHEEL_SIZE; i++)
 			migrate_timer_list(new_base, old_base->vectors + i);
@@ -1901,6 +1937,7 @@ static void __init init_timer_cpu(int cpu)
 #ifdef CONFIG_PREEMPT_RT_FULL
 		init_swait_queue_head(&base->wait_for_running_timer);
 #endif
+		base->expired_count = 0;
 	}
 }
 
-- 
2.10.1

* Re: [RFC v2] timers: Don't wake ktimersoftd on every tick
  2017-02-03 16:51 ` Sebastian Andrzej Siewior
  2017-02-03 18:21   ` [PATCH] " Haris Okanovic
@ 2017-02-03 18:27   ` Haris Okanovic
  1 sibling, 0 replies; 42+ messages in thread
From: Haris Okanovic @ 2017-02-03 18:27 UTC
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, linux-kernel, tglx, julia.cartwright, gratian.crisan



On 02/03/2017 10:51 AM, Sebastian Andrzej Siewior wrote:
> On 2016-12-13 15:44:05 [-0600], Haris Okanovic wrote:
>> Changed the way timers are collected per Julia and Thomas'
>> recommendation: Expired timers are now collected in interrupt context
>> and fired in ktimersoftd to avoid double-walk of `pending_map`.
> …
>> Thanks,
>> Haris
>
> I've been staring at it for a while now and now I applied it locally and
> look how it goes.
> I would need that patch signed as per
> 	https://kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin
> if you want it applied.

Sent a proper PATCH email with sign-off.

Built and booted with both CONFIG_HZ_PERIODIC=y and CONFIG_NO_HZ_IDLE=y 
configurations (100 HZ). Ran a short cyclictest to verify the numbers.

>> ---
>>  kernel/time/timer.c | 97 ++++++++++++++++++++++++++++++++++++-----------------
>>  1 file changed, 67 insertions(+), 30 deletions(-)
>
> Sebastian
>

* Re: [PATCH] timers: Don't wake ktimersoftd on every tick
  2017-02-03 18:21   ` [PATCH] " Haris Okanovic
@ 2017-02-10 17:02     ` Sebastian Andrzej Siewior
  2017-05-26 17:16       ` [PATCH] Revert "timers: Don't wake ktimersoftd on every tick" Anna-Maria Gleixner
  0 siblings, 1 reply; 42+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-02-10 17:02 UTC
  To: Haris Okanovic
  Cc: linux-rt-users, linux-kernel, tglx, julia.cartwright, gratian.crisan

On 2017-02-03 12:21:12 [-0600], Haris Okanovic wrote:
> Collect expired timers in interrupt context to avoid overhead of waking
> ktimersoftd on every tick. ktimersoftd now wakes only when one or more
> timers are ready, which yields a minor reduction in small latency spikes.
> 
> This is implemented by storing lists of expired timers in timer_base,
> updated on each tick. Any addition to the lists wakes ktimersoftd
> (softirq) to process those timers.
> 
> Please refer to the following RFC threads for more details:
> https://www.spinics.net/lists/linux-rt-users/msg16095.html
> https://www.spinics.net/lists/linux-rt-users/msg16113.html
> 
> Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>

applied.

Sebastian

* [PATCH] Revert "timers: Don't wake ktimersoftd on every tick"
  2017-02-10 17:02     ` Sebastian Andrzej Siewior
@ 2017-05-26 17:16       ` Anna-Maria Gleixner
  2017-05-26 19:27         ` Haris Okanovic
  2017-05-27  7:47         ` [PATCH] Revert "timers: Don't wake ktimersoftd on every tick" Sebastian Andrzej Siewior
  0 siblings, 2 replies; 42+ messages in thread
From: Anna-Maria Gleixner @ 2017-05-26 17:16 UTC
  To: Sebastian Andrzej Siewior
  Cc: Haris Okanovic, linux-rt-users, linux-kernel, tglx,
	julia.cartwright, gratian.crisan

This reverts commit 032f93cae150a.

The problem is that the look-ahead optimization from the tick timer
interrupt context can race with the softirq thread expiring timers. As
a consequence, the temporary hlist heads which hold the to-be-expired
timers are overwritten, and the timers which were already removed from
the wheel bucket for expiry are left dangling without a list head.

That means those timers never get expired. If one of those timers is
canceled, the removal operation results in hlist corruption.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
---
 kernel/time/timer.c | 96 ++++++++++++++++-------------------------------------
 1 file changed, 29 insertions(+), 67 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index cdff4411f8f6..08a5ab762495 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -206,8 +206,6 @@ struct timer_base {
 	bool			is_idle;
 	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
 	struct hlist_head	vectors[WHEEL_SIZE];
-	struct hlist_head	expired_lists[LVL_DEPTH];
-	int			expired_count;
 } ____cacheline_aligned;
 
 static DEFINE_PER_CPU(struct timer_base, timer_bases[NR_BASES]);
@@ -1355,8 +1353,7 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(unsigned long),
 	}
 }
 
-static inline void __expire_timers(struct timer_base *base,
-				   struct hlist_head *head)
+static void expire_timers(struct timer_base *base, struct hlist_head *head)
 {
 	while (!hlist_empty(head)) {
 		struct timer_list *timer;
@@ -1387,38 +1384,21 @@ static inline void __expire_timers(struct timer_base *base,
 	}
 }
 
-static void expire_timers(struct timer_base *base)
-{
-	struct hlist_head *head;
-
-	while (base->expired_count--) {
-		head = base->expired_lists + base->expired_count;
-		__expire_timers(base, head);
-	}
-	base->expired_count = 0;
-}
-
-static void __collect_expired_timers(struct timer_base *base)
+static int __collect_expired_timers(struct timer_base *base,
+				    struct hlist_head *heads)
 {
 	unsigned long clk = base->clk;
 	struct hlist_head *vec;
-	int i;
+	int i, levels = 0;
 	unsigned int idx;
 
-	/*
-	 * expire_timers() must be called at least once before we can
-	 * collect more timers
-	 */
-	if (WARN_ON(base->expired_count))
-		return;
-
 	for (i = 0; i < LVL_DEPTH; i++) {
 		idx = (clk & LVL_MASK) + i * LVL_SIZE;
 
 		if (__test_and_clear_bit(idx, base->pending_map)) {
 			vec = base->vectors + idx;
-			hlist_move_list(vec,
-				&base->expired_lists[base->expired_count++]);
+			hlist_move_list(vec, heads++);
+			levels++;
 		}
 		/* Is it time to look at the next level? */
 		if (clk & LVL_CLK_MASK)
@@ -1426,6 +1406,7 @@ static void __collect_expired_timers(struct timer_base *base)
 		/* Shift clock for the next level granularity */
 		clk >>= LVL_CLK_SHIFT;
 	}
+	return levels;
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
@@ -1618,7 +1599,8 @@ void timer_clear_idle(void)
 	base->is_idle = false;
 }
 
-static void collect_expired_timers(struct timer_base *base)
+static int collect_expired_timers(struct timer_base *base,
+				  struct hlist_head *heads)
 {
 	/*
 	 * NOHZ optimization. After a long idle sleep we need to forward the
@@ -1635,49 +1617,20 @@ static void collect_expired_timers(struct timer_base *base)
 		if (time_after(next, jiffies)) {
 			/* The call site will increment clock! */
 			base->clk = jiffies - 1;
-			return;
+			return 0;
 		}
 		base->clk = next;
 	}
-	__collect_expired_timers(base);
+	return __collect_expired_timers(base, heads);
 }
 #else
-static inline void collect_expired_timers(struct timer_base *base)
+static inline int collect_expired_timers(struct timer_base *base,
+					 struct hlist_head *heads)
 {
-	__collect_expired_timers(base);
+	return __collect_expired_timers(base, heads);
 }
 #endif
 
-static int find_expired_timers(struct timer_base *base)
-{
-	const unsigned long int end_clk = jiffies;
-
-	while (!base->expired_count && time_after_eq(end_clk, base->clk)) {
-		collect_expired_timers(base);
-		base->clk++;
-	}
-
-	return base->expired_count;
-}
-
-/* Called from CPU tick routine to quickly collect expired timers */
-static int tick_find_expired(struct timer_base *base)
-{
-	int count;
-
-	raw_spin_lock(&base->lock);
-
-	if (unlikely(time_after(jiffies, base->clk + HZ))) {
-		/* defer to ktimersoftd; don't spend too long in irq context */
-		count = -1;
-	} else
-		count = find_expired_timers(base);
-
-	raw_spin_unlock(&base->lock);
-
-	return count;
-}
-
 /*
  * Called from the timer interrupt handler to charge one tick to the current
  * process.  user_tick is 1 if the tick is user time, 0 for system.
@@ -1704,11 +1657,22 @@ void update_process_times(int user_tick)
  */
 static inline void __run_timers(struct timer_base *base)
 {
+	struct hlist_head heads[LVL_DEPTH];
+	int levels;
+
+	if (!time_after_eq(jiffies, base->clk))
+		return;
+
 	raw_spin_lock_irq(&base->lock);
 
-	while (find_expired_timers(base))
-		expire_timers(base);
+	while (time_after_eq(jiffies, base->clk)) {
+
+		levels = collect_expired_timers(base, heads);
+		base->clk++;
 
+		while (levels--)
+			expire_timers(base, heads + levels);
+	}
 	raw_spin_unlock_irq(&base->lock);
 	wakeup_timer_waiters(base);
 }
@@ -1736,12 +1700,12 @@ void run_local_timers(void)
 
 	hrtimer_run_queues();
 	/* Raise the softirq only if required. */
-	if (time_before(jiffies, base->clk) || !tick_find_expired(base)) {
+	if (time_before(jiffies, base->clk)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON) || !base->nohz_active)
 			return;
 		/* CPU is awake, so check the deferrable base. */
 		base++;
-		if (time_before(jiffies, base->clk) || !tick_find_expired(base))
+		if (time_before(jiffies, base->clk))
 			return;
 	}
 	raise_softirq(TIMER_SOFTIRQ);
@@ -1911,7 +1875,6 @@ int timers_dead_cpu(unsigned int cpu)
 		raw_spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);
 
 		BUG_ON(old_base->running_timer);
-		BUG_ON(old_base->expired_count);
 
 		for (i = 0; i < WHEEL_SIZE; i++)
 			migrate_timer_list(new_base, old_base->vectors + i);
@@ -1938,7 +1901,6 @@ static void __init init_timer_cpu(int cpu)
 #ifdef CONFIG_PREEMPT_RT_FULL
 		init_swait_queue_head(&base->wait_for_running_timer);
 #endif
-		base->expired_count = 0;
 	}
 }
 
-- 
2.11.0

* Re: [PATCH] Revert "timers: Don't wake ktimersoftd on every tick"
  2017-05-26 17:16       ` [PATCH] Revert "timers: Don't wake ktimersoftd on every tick" Anna-Maria Gleixner
@ 2017-05-26 19:27         ` Haris Okanovic
  2017-05-26 19:49           ` Thomas Gleixner
  2017-05-27  7:47         ` [PATCH] Revert "timers: Don't wake ktimersoftd on every tick" Sebastian Andrzej Siewior
  1 sibling, 1 reply; 42+ messages in thread
From: Haris Okanovic @ 2017-05-26 19:27 UTC
  To: Anna-Maria Gleixner
  Cc: Sebastian Andrzej Siewior, linux-rt-users, linux-kernel, tglx,
	julia.cartwright, gratian.crisan

Anna-Maria,

Look-ahead is implemented by tick_find_expired() and expiry by 
__run_timers(), both of which hold timer_base::lock (raw spin lock) 
while running. Those two routines shouldn't be able to run 
simultaneously on the same timer_base. Are you sure the race isn't in 
another code path?

Do you have a test which demonstrates this? I'd like to investigate. 
Your .config file would also be useful.

I'm looking at v4.9.27-rt17 source.

Thanks,
Haris


On 05/26/2017 12:16 PM, Anna-Maria Gleixner wrote:
> This reverts commit 032f93cae150a.
>
> The problem is that the look ahead optimization from the tick timer
> interrupt context can race with the softirq thread expiring timer. As
> a consequence the temporary hlist heads which hold the to expire
> timers are overwritten and the timers which are already removed from
> the wheel bucket for expiry are now dangling w/o a list head.
>
> That means those timers never get expired. If one of those timers is
> canceled the removal operation will result in a hlist corruption.
>
> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
> ---
>  kernel/time/timer.c | 96 ++++++++++++++++-------------------------------------
>  1 file changed, 29 insertions(+), 67 deletions(-)
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index cdff4411f8f6..08a5ab762495 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -206,8 +206,6 @@ struct timer_base {
>  	bool			is_idle;
>  	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
>  	struct hlist_head	vectors[WHEEL_SIZE];
> -	struct hlist_head	expired_lists[LVL_DEPTH];
> -	int			expired_count;
>  } ____cacheline_aligned;
>
>  static DEFINE_PER_CPU(struct timer_base, timer_bases[NR_BASES]);
> @@ -1355,8 +1353,7 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(unsigned long),
>  	}
>  }
>
> -static inline void __expire_timers(struct timer_base *base,
> -				   struct hlist_head *head)
> +static void expire_timers(struct timer_base *base, struct hlist_head *head)
>  {
>  	while (!hlist_empty(head)) {
>  		struct timer_list *timer;
> @@ -1387,38 +1384,21 @@ static inline void __expire_timers(struct timer_base *base,
>  	}
>  }
>
> -static void expire_timers(struct timer_base *base)
> -{
> -	struct hlist_head *head;
> -
> -	while (base->expired_count--) {
> -		head = base->expired_lists + base->expired_count;
> -		__expire_timers(base, head);
> -	}
> -	base->expired_count = 0;
> -}
> -
> -static void __collect_expired_timers(struct timer_base *base)
> +static int __collect_expired_timers(struct timer_base *base,
> +				    struct hlist_head *heads)
>  {
>  	unsigned long clk = base->clk;
>  	struct hlist_head *vec;
> -	int i;
> +	int i, levels = 0;
>  	unsigned int idx;
>
> -	/*
> -	 * expire_timers() must be called at least once before we can
> -	 * collect more timers
> -	 */
> -	if (WARN_ON(base->expired_count))
> -		return;
> -
>  	for (i = 0; i < LVL_DEPTH; i++) {
>  		idx = (clk & LVL_MASK) + i * LVL_SIZE;
>
>  		if (__test_and_clear_bit(idx, base->pending_map)) {
>  			vec = base->vectors + idx;
> -			hlist_move_list(vec,
> -				&base->expired_lists[base->expired_count++]);
> +			hlist_move_list(vec, heads++);
> +			levels++;
>  		}
>  		/* Is it time to look at the next level? */
>  		if (clk & LVL_CLK_MASK)
> @@ -1426,6 +1406,7 @@ static void __collect_expired_timers(struct timer_base *base)
>  		/* Shift clock for the next level granularity */
>  		clk >>= LVL_CLK_SHIFT;
>  	}
> +	return levels;
>  }
>
>  #ifdef CONFIG_NO_HZ_COMMON
> @@ -1618,7 +1599,8 @@ void timer_clear_idle(void)
>  	base->is_idle = false;
>  }
>
> -static void collect_expired_timers(struct timer_base *base)
> +static int collect_expired_timers(struct timer_base *base,
> +				  struct hlist_head *heads)
>  {
>  	/*
>  	 * NOHZ optimization. After a long idle sleep we need to forward the
> @@ -1635,49 +1617,20 @@ static void collect_expired_timers(struct timer_base *base)
>  		if (time_after(next, jiffies)) {
>  			/* The call site will increment clock! */
>  			base->clk = jiffies - 1;
> -			return;
> +			return 0;
>  		}
>  		base->clk = next;
>  	}
> -	__collect_expired_timers(base);
> +	return __collect_expired_timers(base, heads);
>  }
>  #else
> -static inline void collect_expired_timers(struct timer_base *base)
> +static inline int collect_expired_timers(struct timer_base *base,
> +					 struct hlist_head *heads)
>  {
> -	__collect_expired_timers(base);
> +	return __collect_expired_timers(base, heads);
>  }
>  #endif
>
> -static int find_expired_timers(struct timer_base *base)
> -{
> -	const unsigned long int end_clk = jiffies;
> -
> -	while (!base->expired_count && time_after_eq(end_clk, base->clk)) {
> -		collect_expired_timers(base);
> -		base->clk++;
> -	}
> -
> -	return base->expired_count;
> -}
> -
> -/* Called from CPU tick routine to quickly collect expired timers */
> -static int tick_find_expired(struct timer_base *base)
> -{
> -	int count;
> -
> -	raw_spin_lock(&base->lock);
> -
> -	if (unlikely(time_after(jiffies, base->clk + HZ))) {
> -		/* defer to ktimersoftd; don't spend too long in irq context */
> -		count = -1;
> -	} else
> -		count = find_expired_timers(base);
> -
> -	raw_spin_unlock(&base->lock);
> -
> -	return count;
> -}
> -
>  /*
>   * Called from the timer interrupt handler to charge one tick to the current
>   * process.  user_tick is 1 if the tick is user time, 0 for system.
> @@ -1704,11 +1657,22 @@ void update_process_times(int user_tick)
>   */
>  static inline void __run_timers(struct timer_base *base)
>  {
> +	struct hlist_head heads[LVL_DEPTH];
> +	int levels;
> +
> +	if (!time_after_eq(jiffies, base->clk))
> +		return;
> +
>  	raw_spin_lock_irq(&base->lock);
>
> -	while (find_expired_timers(base))
> -		expire_timers(base);
> +	while (time_after_eq(jiffies, base->clk)) {
> +
> +		levels = collect_expired_timers(base, heads);
> +		base->clk++;
>
> +		while (levels--)
> +			expire_timers(base, heads + levels);
> +	}
>  	raw_spin_unlock_irq(&base->lock);
>  	wakeup_timer_waiters(base);
>  }
> @@ -1736,12 +1700,12 @@ void run_local_timers(void)
>
>  	hrtimer_run_queues();
>  	/* Raise the softirq only if required. */
> -	if (time_before(jiffies, base->clk) || !tick_find_expired(base)) {
> +	if (time_before(jiffies, base->clk)) {
>  		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON) || !base->nohz_active)
>  			return;
>  		/* CPU is awake, so check the deferrable base. */
>  		base++;
> -		if (time_before(jiffies, base->clk) || !tick_find_expired(base))
> +		if (time_before(jiffies, base->clk))
>  			return;
>  	}
>  	raise_softirq(TIMER_SOFTIRQ);
> @@ -1911,7 +1875,6 @@ int timers_dead_cpu(unsigned int cpu)
>  		raw_spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);
>
>  		BUG_ON(old_base->running_timer);
> -		BUG_ON(old_base->expired_count);
>
>  		for (i = 0; i < WHEEL_SIZE; i++)
>  			migrate_timer_list(new_base, old_base->vectors + i);
> @@ -1938,7 +1901,6 @@ static void __init init_timer_cpu(int cpu)
>  #ifdef CONFIG_PREEMPT_RT_FULL
>  		init_swait_queue_head(&base->wait_for_running_timer);
>  #endif
> -		base->expired_count = 0;
>  	}
>  }
>
>

* Re: [PATCH] Revert "timers: Don't wake ktimersoftd on every tick"
  2017-05-26 19:27         ` Haris Okanovic
@ 2017-05-26 19:49           ` Thomas Gleixner
  2017-05-26 20:25             ` Haris Okanovic
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2017-05-26 19:49 UTC
  To: Haris Okanovic
  Cc: Anna-Maria Gleixner, Sebastian Andrzej Siewior, linux-rt-users,
	linux-kernel, julia.cartwright, gratian.crisan

On Fri, 26 May 2017, Haris Okanovic wrote:

> Anna-Maria,
> 
> Look-ahead is implemented by tick_find_expired() and expiry by __run_timers(),
> both of which hold timer_base::lock (raw spin lock) while running. Those two
> routines shouldn't be able to run simultaneously on the same timer_base. Are
> you sure the race isn't in another code path?

It happens when softirq runs and drops the spinlock to call the timer
function. And from there stuff goes down the drain.

Anna-Maria will send you the test case on monday.

Thanks,

	tglx

	

* Re: [PATCH] Revert "timers: Don't wake ktimersoftd on every tick"
  2017-05-26 19:49           ` Thomas Gleixner
@ 2017-05-26 20:25             ` Haris Okanovic
  2017-05-26 20:50               ` Thomas Gleixner
  0 siblings, 1 reply; 42+ messages in thread
From: Haris Okanovic @ 2017-05-26 20:25 UTC
  To: Thomas Gleixner
  Cc: Anna-Maria Gleixner, Sebastian Andrzej Siewior, linux-rt-users,
	linux-kernel, julia.cartwright, gratian.crisan

Oh crap. I think I see the problem. I decrement expired_count before 
processing the list. Dropping the lock permits another run of 
tick_find_expired()->find_expired_timers() in the middle of 
__expire_timers() since it uses expired_count==0 as a condition.

This should fix it, but I'll wait for Anna-Maria's test next week before 
submitting a patch.

>  static void expire_timers(struct timer_base *base)
>  {
>         struct hlist_head *head;
> +       int expCount = base->expired_count;
>
> -       while (base->expired_count--) {
> -               head = base->expired_lists + base->expired_count;
> +       while (expCount--) {
> +               head = base->expired_lists + expCount;
>                 __expire_timers(base, head);
>         }
>         base->expired_count = 0;
>  }

Thanks,
Haris


On 05/26/2017 02:49 PM, Thomas Gleixner wrote:
> On Fri, 26 May 2017, Haris Okanovic wrote:
>
>> Anna-Maria,
>>
>> Look-ahead is implemented by tick_find_expired() and expiry by __run_timers(),
>> both of which hold timer_base::lock (raw spin lock) while running. Those two
>> routines shouldn't be able to run simultaneously on the same timer_base. Are
>> you sure the race isn't in another code path?
>
> It happens when softirq runs and drops the spinlock to call the timer
> function. And from there stuff goes down the drain.
>
> Anna-Maria will send you the test case on monday.
>
> Thanks,
>
> 	tglx
>
> 	
>

* Re: [PATCH] Revert "timers: Don't wake ktimersoftd on every tick"
  2017-05-26 20:25             ` Haris Okanovic
@ 2017-05-26 20:50               ` Thomas Gleixner
  2017-06-02 14:37                 ` Haris Okanovic
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2017-05-26 20:50 UTC
  To: Haris Okanovic
  Cc: Anna-Maria Gleixner, Sebastian Andrzej Siewior, linux-rt-users,
	linux-kernel, julia.cartwright, gratian.crisan

On Fri, 26 May 2017, Haris Okanovic wrote:

> Oh crap. I think I see the problem. I decrement expired_count before
> processing the list. Dropping the lock permits another run of
> tick_find_expired()->find_expired_timers() in the middle of __expire_timers()
> since it uses expired_count==0 as a condition.
> 
> This should fix it, but I'll wait for Anna-Maria's test next week before
> submitting a patch.
> 
> >  static void expire_timers(struct timer_base *base)
> >  {
> >         struct hlist_head *head;
> > +       int expCount = base->expired_count;

No camel case for heaven's sake!

And this requires:

   	 cnt = READ_ONCE(base->expired_count);

> > -       while (base->expired_count--) {
> > -               head = base->expired_lists + base->expired_count;
> > +       while (expCount--) {
> > +               head = base->expired_lists + expCount;
> >                 __expire_timers(base, head);
> >         }

Plus a comment.

> >         base->expired_count = 0;

Anna-Maria spotted the same issue, but I voted for the revert right now
because I was worried about the consistency of base->clk under all
circumstances.

The other thing I noticed was this weird condition which does not do the
look ahead when base->clk is back for some time. Why don't you use the
existing optimization which uses the bitmap for fast forward?

The other issue I have is that this can race at all. If you raised the
softirq in the look ahead then you should not go into that function until
the softirq has actually completed. There is no point in wasting time in
the hrtimer interrupt if the softirq is running anyway.

Thanks,

	tglx

* Re: [PATCH] Revert "timers: Don't wake ktimersoftd on every tick"
  2017-05-26 17:16       ` [PATCH] Revert "timers: Don't wake ktimersoftd on every tick" Anna-Maria Gleixner
  2017-05-26 19:27         ` Haris Okanovic
@ 2017-05-27  7:47         ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 42+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-05-27  7:47 UTC
  To: Anna-Maria Gleixner
  Cc: Haris Okanovic, linux-rt-users, linux-kernel, tglx,
	julia.cartwright, gratian.crisan

On 2017-05-26 19:16:07 [+0200], Anna-Maria Gleixner wrote:
> This reverts commit 032f93cae150a.
> 
> The problem is that the look ahead optimization from the tick timer
> interrupt context can race with the softirq thread expiring timer. As
> a consequence the temporary hlist heads which hold the to expire
> timers are overwritten and the timers which are already removed from
> the wheel bucket for expiry are now dangling w/o a list head.
> 
> That means those timers never get expired. If one of those timers is
> canceled the removal operation will result in a hlist corruption.
> 
> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>

Applied

Sebastian

* Re: [PATCH] Revert "timers: Don't wake ktimersoftd on every tick"
  2017-05-26 20:50               ` Thomas Gleixner
@ 2017-06-02 14:37                 ` Haris Okanovic
  2017-06-04 14:17                   ` Thomas Gleixner
  0 siblings, 1 reply; 42+ messages in thread
From: Haris Okanovic @ 2017-06-02 14:37 UTC
  To: Thomas Gleixner
  Cc: Anna-Maria Gleixner, Sebastian Andrzej Siewior, linux-rt-users,
	linux-kernel, julia.cartwright, gratian.crisan

On 05/26/2017 03:50 PM, Thomas Gleixner wrote:
> On Fri, 26 May 2017, Haris Okanovic wrote:
>
>> Oh crap. I think I see the problem. I decrement expired_count before
>> processing the list. Dropping the lock permits another run of
>> tick_find_expired()->find_expired_timers() in the middle of __expire_timers()
>> since it uses expired_count==0 as a condition.
>>
>> This should fix it, but I'll wait for Anna-Maria's test next week before
>> submitting a patch.
>>
>>>  static void expire_timers(struct timer_base *base)
>>>  {
>>>         struct hlist_head *head;
>>> +       int expCount = base->expired_count;
>
> No camel case for heavens sake!
>
> And this requires:
>
>    	 cnt = READ_ONCE(base->expired_count);
>
>>> -       while (base->expired_count--) {
>>> -               head = base->expired_lists + base->expired_count;
>>> +       while (expCount--) {
>>> +               head = base->expired_lists + expCount;
>>>                 __expire_timers(base, head);
>>>         }
>
> Plus a comment.

Fixed, thanks.
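
For reference, the loop now reads roughly as follows (a sketch; the
exact change is in the timers-race branch linked below):

static void expire_timers(struct timer_base *base)
{
	struct hlist_head *head;
	/*
	 * Work on a snapshot: __expire_timers() drops base->lock around
	 * timer callbacks, and expired_count must not read as zero until
	 * expired_lists may safely be refilled by the tick look-ahead.
	 */
	int cnt = READ_ONCE(base->expired_count);

	while (cnt--) {
		head = base->expired_lists + cnt;
		__expire_timers(base, head);
	}
	base->expired_count = 0;
}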

Are you recommending READ_ONCE() purely for documentation purposes?
All reads and writes to base->expired_count happen while base->lock is 
held. It just can't reach zero until expired_lists is ready to be rewritten.

>
>>>         base->expired_count = 0;
>
> Anna-Maria spotted the same issue, but I voted for the revert right now
> because I was worried about the consistency of base->clk under all
> circumstances.
>
> The other thing I noticed was this weird condition which does not do the
> look ahead when base->clk is back for some time.

The soft interrupt fires unconditionally if base->clk hasn't advanced in 
some time, to limit how long the CPU spends in hard interrupt context.

> Why don't you use the
> existing optimization which uses the bitmap for fast forward?
>

Are you referring to forward_timer_base()/base->next_expiry? I think 
it's only updated in the nohz case. Can you share function name/line 
number(s) if you're thinking of something else.

> The other issue I have is that this can race at all. If you raised the
> softirq in the look ahead then you should not go into that function until
> the softirq has actually completed. There is no point in wasting time in
> the hrtimer interrupt if the softirq is running anyway.
>

Makes sense. I'll skip the large `if` block in run_local_timers() when 
`local_softirq_pending() & TIMER_SOFTIRQ` is set.

> Thanks,
>
> 	tglx
>
>
>
>

I also ran Anna-Maria's test for 12h without failure, i.e. no "Stalled" 
messages. It fails within 10-15m on my QEMU VM without the fix (4-core 
Nehalem, 1GB RAM).

You can view a diff at
https://github.com/harisokanovic/linux/tree/dev/hokanovi/timers-race

-- Haris

* Re: [PATCH] Revert "timers: Don't wake ktimersoftd on every tick"
  2017-06-02 14:37                 ` Haris Okanovic
@ 2017-06-04 14:17                   ` Thomas Gleixner
  2017-07-17 21:58                     ` Haris Okanovic
  2017-07-17 22:04                     ` [PATCH v2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
  0 siblings, 2 replies; 42+ messages in thread
From: Thomas Gleixner @ 2017-06-04 14:17 UTC
  To: Haris Okanovic
  Cc: Anna-Maria Gleixner, Sebastian Andrzej Siewior, linux-rt-users,
	linux-kernel, julia.cartwright, gratian.crisan

On Fri, 2 Jun 2017, Haris Okanovic wrote:
> On 05/26/2017 03:50 PM, Thomas Gleixner wrote:
> > > >  static void expire_timers(struct timer_base *base)
> > > >  {
> > > >         struct hlist_head *head;
> > > > +       int expCount = base->expired_count;
> > 
> > No camel case for heavens sake!
> > 
> > And this requires:
> > 
> >    	 cnt = READ_ONCE(base->expired_count);
> > 
> > > > -       while (base->expired_count--) {
> > > > -               head = base->expired_lists + base->expired_count;
> > > > +       while (expCount--) {
> > > > +               head = base->expired_lists + expCount;
> > > >                 __expire_timers(base, head);
> > > >         }
> > 
> > Plus a comment.
> 
> Fixed, thanks.
> 
> Are your recommending READ_ONCE() purely for documentation purposes?

Yes.

> > The other thing I noticed was this weird condition which does not do the
> > look ahead when base->clk is back for some time.
> 
> The soft interrupt fires unconditionally if base->clk hasn't advanced in some
> time to limit how long cpu spends in hard interrupt context.

That makes no sense.

> > Why don't you use the
> > existing optimization which uses the bitmap for fast forward?
> > 
> 
> Are you referring to forward_timer_base()/base->next_expiry? I think it's only
> updated in the nohz case. Can you share function name/line number(s) if you're
> thinking of something else.

I think just using collect_expired_timers() should be enough. In the !NOHZ
case the base shouldn't be that far back, right?

> > The other issue I have is that this can race at all. If you raised the
> > softirq in the look ahead then you should not go into that function until
> > the softirq has actually completed. There is no point in wasting time in
> > the hrtimer interrupt if the softirq is running anyway.
> > 
> 
> Makes sense. Skipping the large `if` block in run_local_timers() when
> `local_softirq_pending() & TIMER_SOFTIRQ`.

No. You need your own state tracking. The TIMER_SOFTIRQ bit is cleared when
the softirq is invoked, but that does not mean that it finished running.

run_local_timers()
{
	lock(base->lock);
	if (!base->softirq_activated)
		if (base_has_timers_to_expire()) {
			base->softirq_activated = true;
			raise_softirq(TIMER_SOFTIRQ);
		}
	}
	unlock(base->lock);
}

timer_softirq()
{
	lock(base->lock);
	expire_timers();
	base->softirq_activated = false;
	unlock(base->lock);
}

That way you avoid any operation in the tick interrupt as long as the soft
interrupt processing has not completed.

Thanks,

	tglx

* Re: [PATCH] Revert "timers: Don't wake ktimersoftd on every tick"
  2017-06-04 14:17                   ` Thomas Gleixner
@ 2017-07-17 21:58                     ` Haris Okanovic
  2017-07-17 22:04                     ` [PATCH v2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
  1 sibling, 0 replies; 42+ messages in thread
From: Haris Okanovic @ 2017-07-17 21:58 UTC
  To: Thomas Gleixner
  Cc: Anna-Maria Gleixner, Sebastian Andrzej Siewior, linux-rt-users,
	linux-kernel, julia.cartwright, gratian.crisan, harisokn

On 06/04/2017 09:17 AM, Thomas Gleixner wrote:
> On Fri, 2 Jun 2017, Haris Okanovic wrote:
>> On 05/26/2017 03:50 PM, Thomas Gleixner wrote:
>>>>>   static void expire_timers(struct timer_base *base)
>>>>>   {
>>>>>          struct hlist_head *head;
>>>>> +       int expCount = base->expired_count;
>>>
>>> No camel case for heavens sake!
>>>
>>> And this requires:
>>>
>>>     	 cnt = READ_ONCE(base->expired_count);
>>>
>>>>> -       while (base->expired_count--) {
>>>>> -               head = base->expired_lists + base->expired_count;
>>>>> +       while (expCount--) {
>>>>> +               head = base->expired_lists + expCount;
>>>>>                  __expire_timers(base, head);
>>>>>          }
>>>
>>> Plus a comment.
>>
>> Fixed, thanks.
>>
>> Are your recommending READ_ONCE() purely for documentation purposes?
> 
> Yes.
> 
>>> The other thing I noticed was this weird condition which does not do the
>>> look ahead when base->clk is back for some time.
>>
>> The soft interrupt fires unconditionally if base->clk hasn't advanced in some
>> time, to limit how long the CPU spends in hard interrupt context.
> 
> That makes no sense.
> 

I wrote this part out of an abundance of caution: I didn't want a 
potentially unbounded operation to run in hardirq context. This logic 
forces both the update to timer bases & firing of timers into softirq 
context if the clock advances by a lot (some arbitrary large number of 
ticks, HZ in this case).

However, I think you're right that this is unneeded since 
run_local_timers() is called per tick, and thus would never exercise 
this case.

Removed this case.

>>> Why don't you use the
>>> existing optimization which uses the bitmap for fast forward?
>>>
>>
>> Are you referring to forward_timer_base()/base->next_expiry? I think it's only
>> updated in the nohz case. Can you share the function name/line number(s) if
>> you're thinking of something else?
> 
> I think just using collect_expired_timers() should be enough. In the !NOHZ
> case the base shouldn't be that far back, right?
> 

Refactored.

>>> The other issue I have is that this can race at all. If you raised the
>>> softirq in the look ahead then you should not go into that function until
>>> the softirq has actually completed. There is no point in wasting time in
>>> the hrtimer interrupt if the softirq is running anyway.
>>>
>>
>> Makes sense. Skipping the large `if` block in run_local_timers() when
>> `local_softirq_pending() & TIMER_SOFTIRQ` is set.
> 
> No. You need your own state tracking. The TIMER_SOFTIRQ bit is cleared when
> the softirq is invoked, but that does not mean that it finished running.
> 
> run_local_timers()
> {
> 	lock(base->lock);
> 	if (!base->softirq_activated) {
> 		if (base_has_timers_to_expire()) {
> 			base->softirq_activated = true;
> 			raise_softirq(TIMER_SOFTIRQ);
> 		}
> 	}
> 	unlock(base->lock);
> }
> 
> timer_softirq()
> {
> 	lock(base->lock);
> 	expire_timers();
> 	base->softirq_activated = false;
> 	unlock(base->lock);
> }
> 
> That way you avoid any operation in the tick interrupt as long as the soft
> interrupt processing has not completed.

Added a per-cpu block_softirq boolean.

> 
> Thanks,
> 
> 	tglx
> 

I'll post a v2 patch shortly.

Thanks,
Haris

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] timers: Don't wake ktimersoftd on every tick
  2017-06-04 14:17                   ` Thomas Gleixner
  2017-07-17 21:58                     ` Haris Okanovic
@ 2017-07-17 22:04                     ` Haris Okanovic
  2017-07-18 21:33                       ` Thomas Gleixner
  1 sibling, 1 reply; 42+ messages in thread
From: Haris Okanovic @ 2017-07-17 22:04 UTC (permalink / raw)
  To: linux-rt-users, linux-kernel
  Cc: haris.okanovic, harisokn, bigeasy, tglx, julia.cartwright,
	gratian.crisan, anna-maria

We recently upgraded from 4.1 to 4.6 and noticed a minor latency
regression caused by an additional thread wakeup (ktimersoftd) in
interrupt context on every tick. The wakeups are from
run_local_timers() raising TIMER_SOFTIRQ. Both TIMER and SCHED softirqs
coalesced into one ksoftirqd wakeup prior to Sebastian's change to split
timers into their own thread.

There's already logic in run_local_timers() to avoid some unnecessary
wakeups of ksoftirqd, but it doesn't seem to catch them all. In
particular, I've seen many unnecessary wakeups when jiffies increments
prior to run_local_timers().

Change the way timers are collected per Julia and Thomas'
recommendation: Expired timers are now collected in interrupt context
and fired in ktimersoftd to avoid double-walk of `pending_map`.

Collect expired timers in interrupt context to avoid overhead of waking
ktimersoftd on every tick. ktimersoftd now wakes only when one or more
timers are ready, which yields a minor reduction in small latency spikes.

This is implemented by storing lists of expired timers in timer_base,
updated on each tick. Any addition to the lists wakes ktimersoftd
(softirq) to process those timers.

https://github.com/harisokanovic/linux/tree/dev/hokanovi/timer-peek-v4

Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
---
[PATCH v2] Applied Thomas Gleixner's suggestions:
 - Fix expired_count race
 - Remove unneeded base->clk lookahead
 - Return expired_count in collect_expired_timers()
 - Add block_softirq
 - Rebase to v4.11.8-rt5
---
 kernel/time/timer.c | 122 +++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 92 insertions(+), 30 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 5730d42bfd67..e5b537f2308c 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -209,9 +209,12 @@ struct timer_base {
 	bool			is_idle;
 	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
 	struct hlist_head	vectors[WHEEL_SIZE];
+	struct hlist_head	expired_lists[LVL_DEPTH];
+	int			expired_count;
 } ____cacheline_aligned;
 
 static DEFINE_PER_CPU(struct timer_base, timer_bases[NR_BASES]);
+static DEFINE_PER_CPU(int, block_softirqs);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
 unsigned int sysctl_timer_migration = 1;
@@ -1314,7 +1317,8 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(unsigned long),
 	}
 }
 
-static void expire_timers(struct timer_base *base, struct hlist_head *head)
+static inline void __expire_timers(struct timer_base *base,
+				   struct hlist_head *head)
 {
 	while (!hlist_empty(head)) {
 		struct timer_list *timer;
@@ -1344,21 +1348,45 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
 	}
 }
 
-static int __collect_expired_timers(struct timer_base *base,
-				    struct hlist_head *heads)
+static void expire_timers(struct timer_base *base)
 {
-	unsigned long clk = base->clk;
+	struct hlist_head *head;
+	int count = READ_ONCE(base->expired_count);
+
+	while (count--) {
+		head = base->expired_lists + count;
+		__expire_timers(base, head);
+	}
+
+	/* Zero base->expired_count after processing all base->expired_lists
+	 * to signal it's ready to get re-populated. Otherwise, we race with
+	 * tick_find_expired() when base->lock is temporarily dropped in
+	 * __expire_timers() */
+	base->expired_count = 0;
+}
+
+static int __collect_expired_timers(struct timer_base *base)
+{
+	unsigned long clk;
 	struct hlist_head *vec;
-	int i, levels = 0;
+	int i;
 	unsigned int idx;
 
+	/*
+	 * expire_timers() must be called at least once before we can
+	 * collect more timers
+	 */
+	if (base->expired_count)
+		goto end;
+
+	clk = base->clk;
 	for (i = 0; i < LVL_DEPTH; i++) {
 		idx = (clk & LVL_MASK) + i * LVL_SIZE;
 
 		if (__test_and_clear_bit(idx, base->pending_map)) {
 			vec = base->vectors + idx;
-			hlist_move_list(vec, heads++);
-			levels++;
+			hlist_move_list(vec,
+				&base->expired_lists[base->expired_count++]);
 		}
 		/* Is it time to look at the next level? */
 		if (clk & LVL_CLK_MASK)
@@ -1366,7 +1394,8 @@ static int __collect_expired_timers(struct timer_base *base,
 		/* Shift clock for the next level granularity */
 		clk >>= LVL_CLK_SHIFT;
 	}
-	return levels;
+
+	end: return base->expired_count;
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
@@ -1559,8 +1588,7 @@ void timer_clear_idle(void)
 	base->is_idle = false;
 }
 
-static int collect_expired_timers(struct timer_base *base,
-				  struct hlist_head *heads)
+static int collect_expired_timers(struct timer_base *base)
 {
 	/*
 	 * NOHZ optimization. After a long idle sleep we need to forward the
@@ -1581,16 +1609,41 @@ static int collect_expired_timers(struct timer_base *base,
 		}
 		base->clk = next;
 	}
-	return __collect_expired_timers(base, heads);
+	return __collect_expired_timers(base);
 }
 #else
-static inline int collect_expired_timers(struct timer_base *base,
-					 struct hlist_head *heads)
+static inline int collect_expired_timers(struct timer_base *base)
 {
-	return __collect_expired_timers(base, heads);
+	return __collect_expired_timers(base);
 }
 #endif
 
+/* Increments timer_base to current jiffies or until first expired
+ * timer is found. Return number of expired timers. */
+static int find_expired_timers(struct timer_base *base)
+{
+	const unsigned long int end_clk = jiffies;
+	int expired_count;
+
+	while ( !(expired_count = collect_expired_timers(base)) &&
+			time_after_eq(end_clk, base->clk) ) {
+		base->clk++;
+	}
+
+	return expired_count;
+}
+
+/* Called from CPU tick routine to collect expired timers up to current
+ * jiffies. Return number of expired timers. */
+static int tick_find_expired(struct timer_base *base)
+{
+	int count;
+	raw_spin_lock(&base->lock);
+	count = find_expired_timers(base);
+	raw_spin_unlock(&base->lock);
+	return count;
+}
+
 /*
  * Called from the timer interrupt handler to charge one tick to the current
  * process.  user_tick is 1 if the tick is user time, 0 for system.
@@ -1618,22 +1671,11 @@ void update_process_times(int user_tick)
  */
 static inline void __run_timers(struct timer_base *base)
 {
-	struct hlist_head heads[LVL_DEPTH];
-	int levels;
-
-	if (!time_after_eq(jiffies, base->clk))
-		return;
-
 	raw_spin_lock_irq(&base->lock);
 
-	while (time_after_eq(jiffies, base->clk)) {
+	while (find_expired_timers(base))
+		expire_timers(base);
 
-		levels = collect_expired_timers(base, heads);
-		base->clk++;
-
-		while (levels--)
-			expire_timers(base, heads + levels);
-	}
 	raw_spin_unlock_irq(&base->lock);
 	wakeup_timer_waiters(base);
 }
@@ -1644,12 +1686,16 @@ static inline void __run_timers(struct timer_base *base)
 static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 {
 	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
+	int* block_softirq = this_cpu_ptr(&block_softirqs);
 
 	irq_work_tick_soft();
 
 	__run_timers(base);
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
 		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
+
+	/* Allow new TIMER_SOFTIRQs to get scheduled by run_local_timers() */
+	WRITE_ONCE(*block_softirq, 0);
 }
 
 /*
@@ -1657,18 +1703,28 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
  */
 void run_local_timers(void)
 {
-	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
+	int* block_softirq = this_cpu_ptr(&block_softirqs);
+	struct timer_base *base;
 
 	hrtimer_run_queues();
+
+	/* Skip if TIMER_SOFTIRQ is already running for this CPU */
+	if (READ_ONCE(*block_softirq))
+		return;
+
+	base = this_cpu_ptr(&timer_bases[BASE_STD]);
+
 	/* Raise the softirq only if required. */
-	if (time_before(jiffies, base->clk)) {
+	if (time_before(jiffies, base->clk) || !tick_find_expired(base)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON) || !base->nohz_active)
 			return;
 		/* CPU is awake, so check the deferrable base. */
 		base++;
-		if (time_before(jiffies, base->clk))
+		if (time_before(jiffies, base->clk) || !tick_find_expired(base))
 			return;
 	}
+
+	WRITE_ONCE(*block_softirq, 1);
 	raise_softirq(TIMER_SOFTIRQ);
 }
 
@@ -1826,6 +1882,7 @@ int timers_dead_cpu(unsigned int cpu)
 		raw_spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);
 
 		BUG_ON(old_base->running_timer);
+		BUG_ON(old_base->expired_count);
 
 		for (i = 0; i < WHEEL_SIZE; i++)
 			migrate_timer_list(new_base, old_base->vectors + i);
@@ -1842,6 +1899,7 @@ int timers_dead_cpu(unsigned int cpu)
 static void __init init_timer_cpu(int cpu)
 {
 	struct timer_base *base;
+	int* block_softirq;
 	int i;
 
 	for (i = 0; i < NR_BASES; i++) {
@@ -1852,6 +1910,10 @@ static void __init init_timer_cpu(int cpu)
 #ifdef CONFIG_PREEMPT_RT_FULL
 		init_swait_queue_head(&base->wait_for_running_timer);
 #endif
+		base->expired_count = 0;
+
+		block_softirq = per_cpu_ptr(&block_softirqs, cpu);
+		*block_softirq = 0;
 	}
 }
 
-- 
2.13.2

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] timers: Don't wake ktimersoftd on every tick
  2017-07-17 22:04                     ` [PATCH v2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
@ 2017-07-18 21:33                       ` Thomas Gleixner
  2017-08-03 21:04                         ` Haris Okanovic
                                           ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Thomas Gleixner @ 2017-07-18 21:33 UTC (permalink / raw)
  To: Haris Okanovic
  Cc: linux-rt-users, linux-kernel, harisokn, bigeasy,
	julia.cartwright, gratian.crisan, anna-maria

On Mon, 17 Jul 2017, Haris Okanovic wrote:
> We recently upgraded from 4.1 to 4.6 and noticed a minor latency
> regression caused by an additional thread wakeup (ktimersoftd) in
> interrupt context on every tick. The wakeups are from
> run_local_timers() raising TIMER_SOFTIRQ. Both TIMER and SCHED softirqs
> coalesced into one ksoftirqd wakeup prior to Sebastian's change to split
> timers into their own thread.
> 
> There's already logic in run_local_timers() to avoid some unnecessary
> wakeups of ksoftirqd, but it doesn't seem to catch them all. In
> particular, I've seen many unnecessary wakeups when jiffies increments
> prior to run_local_timers().
> 
> Change the way timers are collected per Julia and Thomas'
> recommendation: Expired timers are now collected in interrupt context
> and fired in ktimersoftd to avoid double-walk of `pending_map`.
> 
> Collect expired timers in interrupt context to avoid overhead of waking
> ktimersoftd on every tick. ktimersoftd now wakes only when one or more
> timers are ready, which yields a minor reduction in small latency spikes.
> 
> This is implemented by storing lists of expired timers in timer_base,
> updated on each tick. Any addition to the lists wakes ktimersoftd
> (softirq) to process those timers.

One thing which would be really good to have in the changelog is the
overhead of that collection operation in hard irq context.

> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 5730d42bfd67..e5b537f2308c 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -209,9 +209,12 @@ struct timer_base {
>  	bool			is_idle;
>  	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
>  	struct hlist_head	vectors[WHEEL_SIZE];
> +	struct hlist_head	expired_lists[LVL_DEPTH];
> +	int			expired_count;

You need to look at the cache layout of that whole thing. My gut feeling
tells me that that count is at the wrong place.

>  } ____cacheline_aligned;
>  
>  static DEFINE_PER_CPU(struct timer_base, timer_bases[NR_BASES]);
> +static DEFINE_PER_CPU(int, block_softirqs);

Why are you putting that into a seperate per cpu variable instead of adding
a bool to the base struct as I suggested in my example:

       base->softirq_activated = false;

Having that separate makes no sense conceptually, and cache-wise it can
force touching yet another cacheline depending on the placement by
compiler/linker. Looking at your implementation it does in 100% of the
cases.

You can use the first base for that, as that is going to be touched anyway
and is cache hot in any case.

> -static void expire_timers(struct timer_base *base, struct hlist_head *head)
> +static inline void __expire_timers(struct timer_base *base,

What's the purpose of this change? If it makes sense to inline it, then the
compiler will do so.

> +				   struct hlist_head *head)
>  {
>  	while (!hlist_empty(head)) {
>  		struct timer_list *timer;
> @@ -1344,21 +1348,45 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
>  	}
>  }
>  
> -static int __collect_expired_timers(struct timer_base *base,
> -				    struct hlist_head *heads)
> +static void expire_timers(struct timer_base *base)
>  {
> -	unsigned long clk = base->clk;
> +	struct hlist_head *head;
> +	int count = READ_ONCE(base->expired_count);

Please keep the reverse fir tree ordering based on length for the variables
as we have it throughout that code.

> +
> +	while (count--) {

So this changed vs. the previous implementation and in this case the
READ_ONCE() is pointless as the compiler CANNOT reevaluate base->foo inside
that loop.

> +		head = base->expired_lists + count;
> +		__expire_timers(base, head);
> +	}
> +
> +	/* Zero base->expired_count after processing all base->expired_lists
> +	 * to signal it's ready to get re-populated. Otherwise, we race with
> +	 * tick_find_expired() when base->lock is temporarily dropped in
> +	 * __expire_timers() */

Please stick to the two sane comment styles:

       /* Single line comment */

       /*
        * Multi line comment
        * Multi line comment
        * Multi line comment
	*/

For further enlightment: http://marc.info/?l=linux-kernel&m=146799838429328&w=2

> +	base->expired_count = 0;
> +}
> +
> +static int __collect_expired_timers(struct timer_base *base)
> +{
> +	unsigned long clk;
>  	struct hlist_head *vec;
> -	int i, levels = 0;
> +	int i;
>  	unsigned int idx;

See above

> +	/*
> +	 * expire_timers() must be called at least once before we can
> +	 * collect more timers
> +	 */
> +	if (base->expired_count)
> +		goto end;
> +
> +	clk = base->clk;
>  	for (i = 0; i < LVL_DEPTH; i++) {
>  		idx = (clk & LVL_MASK) + i * LVL_SIZE;
>  
>  		if (__test_and_clear_bit(idx, base->pending_map)) {
>  			vec = base->vectors + idx;
> -			hlist_move_list(vec, heads++);
> -			levels++;
> +			hlist_move_list(vec,
> +				&base->expired_lists[base->expired_count++]);

Eew. What's wrong with local variables ?

      struct hlist_head *list = &base->expired_vectors;

at the top of this function and then do

			hlist_move_list(vec, list++);
			base->expired_levels++;

or have a local count and use it as index to list[]. The code generation
should be roughly the same, but I expect it to be better with the seperate
increments.

>  		}
>  		/* Is it time to look at the next level? */
>  		if (clk & LVL_CLK_MASK)
> @@ -1366,7 +1394,8 @@ static int __collect_expired_timers(struct timer_base *base,
>  		/* Shift clock for the next level granularity */
>  		clk >>= LVL_CLK_SHIFT;
>  	}
> -	return levels;
> +
> +	end: return base->expired_count;

More Eeew! Can you please look how labels are placed in the
kernel. Certainly not that way.

Aside of that the goto is silly. You can just return expired_count up at
that conditional, or move the conditional to the caller.

Actually I do not understand that conditional at the top at all. The call
site breaks out of the loop when the return value is > 0. So what's that
for? Paranoia? If that's the case then you want a WARN_ONCE there, because
that should never happen. Otherwise it's just pointless. If actually
something relies on that, then it's disgusting.
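
For reference, kernel style puts goto labels at column 0 and reserves them
for centralized exits. A minimal standalone sketch of that convention, with
illustrative names and a plain int rather than the patch's actual code:

#include <stdio.h>

/* Stand-in for the "collect only when nothing is pending" guard;
 * the label sits at column 0, per kernel coding style. */
static int collect(int *expired_levels, int pending)
{
	if (*expired_levels)
		goto out;	/* previous batch not expired yet */

	*expired_levels = pending;	/* stand-in for the bucket walk */
out:
	return *expired_levels;
}

int main(void)
{
	int levels = 0;

	printf("%d\n", collect(&levels, 3));	/* collects: prints 3 */
	printf("%d\n", collect(&levels, 5));	/* guarded: prints 3 again */
	return 0;
}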

>  }
>  
>  #ifdef CONFIG_NO_HZ_COMMON
> @@ -1559,8 +1588,7 @@ void timer_clear_idle(void)
>  	base->is_idle = false;
>  }
>  
> -static int collect_expired_timers(struct timer_base *base,
> -				  struct hlist_head *heads)
> +static int collect_expired_timers(struct timer_base *base)
>  {
>  	/*
>  	 * NOHZ optimization. After a long idle sleep we need to forward the
> @@ -1581,16 +1609,41 @@ static int collect_expired_timers(struct timer_base *base,
>  		}
>  		base->clk = next;
>  	}
> -	return __collect_expired_timers(base, heads);
> +	return __collect_expired_timers(base);
>  }
>  #else
> -static inline int collect_expired_timers(struct timer_base *base,
> -					 struct hlist_head *heads)
> +static inline int collect_expired_timers(struct timer_base *base)
>  {
> -	return __collect_expired_timers(base, heads);
> +	return __collect_expired_timers(base);
>  }
>  #endif
>  
> +/* Increments timer_base to current jiffies or until first expired
> + * timer is found. Return number of expired timers. */

Sigh.

> +static int find_expired_timers(struct timer_base *base)
> +{
> +	const unsigned long int end_clk = jiffies;

const ? unsigned long int ?

> +	int expired_count;
> +
> +	while ( !(expired_count = collect_expired_timers(base)) &&
> +			time_after_eq(end_clk, base->clk) ) {

These extra white spaces after ( and before ) are pointless and not kernel
coding style.

What's worse is the order of your conditionals. Just look at the original
code.

> +		base->clk++;
> +	}

Aside of that this loop is fricking hard to read.

	int levels = 0;

	while (!levels && time_after_eq(jiffies, base->clk)) {
	      levels = collect_expired_timers(base, heads);
	      base->clk++;
	}
	
	return levels;

Is all what you need here, right? That's what the original loop does as
well.

> +
> +	return expired_count;
> +}
> +
> +/* Called from CPU tick routine to collect expired timers up to current
> + * jiffies. Return number of expired timers. */

Wrong. It returns the number of levels which have expired timers. The
number of actual timers per level is unknown as we move the complete list.

> +static int tick_find_expired(struct timer_base *base)
> +{
> +	int count;

Missing new line between declaration and code. checkpatch.pl is wrong on a
lot of things, but it would have told you.

> +	raw_spin_lock(&base->lock);
> +	count = find_expired_timers(base);
> +	raw_spin_unlock(&base->lock);
> +	return count;

Please be consistent with the names. We use 'levels' throughout all the other
functions. Random variable names are just confusing.

> +}
> +
>  /*
>   * Called from the timer interrupt handler to charge one tick to the current
>   * process.  user_tick is 1 if the tick is user time, 0 for system.
> @@ -1618,22 +1671,11 @@ void update_process_times(int user_tick)
>   */
>  static inline void __run_timers(struct timer_base *base)
>  {
> -	struct hlist_head heads[LVL_DEPTH];
> -	int levels;
> -
> -	if (!time_after_eq(jiffies, base->clk))
> -		return;
> -
>  	raw_spin_lock_irq(&base->lock);
>  
> -	while (time_after_eq(jiffies, base->clk)) {
> +	while (find_expired_timers(base))
> +		expire_timers(base);

Now I understand that extra conditional above. That's crap, really. Two
ways to solve that:

	do {
       		expire_timers(base);
	} while (find_expired_timers(base));

	which requires a check for base->expired_levels inside of
	expire_timers().

or

       if (base->expired_levels)
       		expire_timers(base);

	while (find_expired_timers(base))
       		expire_timers(base);

>  	raw_spin_unlock_irq(&base->lock);
>  	wakeup_timer_waiters(base);

Errm. Please submit patches against mainline. This is RT only. On mainline
the overhead of raising the softirq is not that big, but the exercise is
the same.

> @@ -1644,12 +1686,16 @@ static inline void __run_timers(struct timer_base *base)
>  static __latent_entropy void run_timer_softirq(struct softirq_action *h)
>  {
>  	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
> +	int* block_softirq = this_cpu_ptr(&block_softirqs);

Sigh. A pointer is declared with:

	int *p;

and not

	int* p;

>  	irq_work_tick_soft();
>  
>  	__run_timers(base);
>  	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
>  		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
> +
> +	/* Allow new TIMER_SOFTIRQs to get scheduled by run_local_timers() */
> +	WRITE_ONCE(*block_softirq, 0);

You are in interrupt enabled code here. So you actually might miss a wakeup
and delay it to the next tick. If that's your intention then please
document it properly. If not, you need to disable interrupts around the write
and recheck stuff.

Also the WRITE_ONCE() is pointless. The compiler cannot reorder the
write. And it does not protect you from racing with the hard interrupt. So
for the sloppy variant a simple:

       base->softirq_activated = false;

is sufficient.

>  }
>  
>  /*
> @@ -1657,18 +1703,28 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
>   */
>  void run_local_timers(void)
>  {
> -	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
> +	int* block_softirq = this_cpu_ptr(&block_softirqs);
> +	struct timer_base *base;
>  
>  	hrtimer_run_queues();
> +
> +	/* Skip if TIMER_SOFTIRQ is already running for this CPU */
> +	if (READ_ONCE(*block_softirq))
> +		return;
> +
> +	base = this_cpu_ptr(&timer_bases[BASE_STD]);

And this here becomes:

    	if (base->softirq_activated)
    		return;

> +
>  	/* Raise the softirq only if required. */
> -	if (time_before(jiffies, base->clk)) {
> +	if (time_before(jiffies, base->clk) || !tick_find_expired(base)) {
>  		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON) || !base->nohz_active)
>  			return;
>  		/* CPU is awake, so check the deferrable base. */
>  		base++;
> -		if (time_before(jiffies, base->clk))
> +		if (time_before(jiffies, base->clk) || !tick_find_expired(base))
>  			return;

To make that work, all you need here is:

		base--;

>  	}

and
	base->softirq_activated = true;

>  static void __init init_timer_cpu(int cpu)
>  {
>  	struct timer_base *base;
> +	int* block_softirq;
>  	int i;
>  
>  	for (i = 0; i < NR_BASES; i++) {
> @@ -1852,6 +1910,10 @@ static void __init init_timer_cpu(int cpu)
>  #ifdef CONFIG_PREEMPT_RT_FULL
>  		init_swait_queue_head(&base->wait_for_running_timer);
>  #endif
> +		base->expired_count = 0;
> +
> +		block_softirq = per_cpu_ptr(&block_softirqs, cpu);
> +		*block_softirq = 0;

What kind of voodoo initialization is this? Do you not trust BSS? Or do you
not make sure that the stuff is brought into proper state when a CPU goes
offline?

Aside of the above, this patch wants to be split into two pieces:

  1) Embed the hlist heads for expired bucket collection into the base
     struct and adjust the code accordingly.

  2) Implement the conditional softirq raise machinery

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2] timers: Don't wake ktimersoftd on every tick
  2017-07-18 21:33                       ` Thomas Gleixner
@ 2017-08-03 21:04                         ` Haris Okanovic
  2017-08-03 21:06                         ` [PATCH v3 1/2] " Haris Okanovic
  2018-03-06 23:39                         ` [PATCH v4 1/2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
  2 siblings, 0 replies; 42+ messages in thread
From: Haris Okanovic @ 2017-08-03 21:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-rt-users, linux-kernel, harisokn, bigeasy,
	julia.cartwright, gratian.crisan, anna-maria

Thomas,

Apologies for the late response. I've been busy the last few weeks.

On 07/18/2017 04:33 PM, Thomas Gleixner wrote:
> On Mon, 17 Jul 2017, Haris Okanovic wrote:
>> We recently upgraded from 4.1 to 4.6 and noticed a minor latency
>> regression caused by an additional thread wakeup (ktimersoftd) in
>> interrupt context on every tick. The wakeups are from
>> run_local_timers() raising TIMER_SOFTIRQ. Both TIMER and SCHED softirqs
>> coalesced into one ksoftirqd wakeup prior to Sebastian's change to split
>> timers into their own thread.
>>
>> There's already logic in run_local_timers() to avoid some unnecessary
>> wakeups of ksoftirqd, but it doesn't seem to catch them all. In
>> particular, I've seen many unnecessary wakeups when jiffies increments
>> prior to run_local_timers().
>>
>> Change the way timers are collected per Julia and Thomas'
>> recommendation: Expired timers are now collected in interrupt context
>> and fired in ktimersoftd to avoid double-walk of `pending_map`.
>>
>> Collect expired timers in interrupt context to avoid overhead of waking
>> ktimersoftd on every tick. ktimersoftd now wakes only when one or more
>> timers are ready, which yields a minor reduction in small latency spikes.
>>
>> This is implemented by storing lists of expired timers in timer_base,
>> updated on each tick. Any addition to the lists wakes ktimersoftd
>> (softirq) to process those timers.
> 
> One thing which would be really good to have in the changelog is the
> overhead of that collection operation in hard irq context.
> 

Added testing note: Execution time of run_local_timers() increases by 
0.2us to 2.5us as measured by TSC on a 2core Intel Atom E3825 system.

I'm guessing the variance is spin lock contention caused by timers being 
added/removed by different threads.

I also verified the average case latency decrease in cyclictest and 
reran Anna-Maria's test on a qemu 4core Nehalem VM; latency decreases 
and no stalls.
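
The TSC numbers above can be gathered with a wrapper along these lines; this
is a hypothetical sketch for x86, not part of the posted patch.
rdtsc_ordered() is the serializing TSC read, and trace_printk() keeps the
logging off the console path:

#include <linux/kernel.h>
#include <linux/timer.h>
#include <asm/msr.h>

/* Hypothetical instrumentation: time one run_local_timers() call. */
static void timed_run_local_timers(void)
{
	u64 t0, t1;

	t0 = rdtsc_ordered();
	run_local_timers();
	t1 = rdtsc_ordered();

	trace_printk("run_local_timers: %llu cycles\n", t1 - t0);
}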

>> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
>> index 5730d42bfd67..e5b537f2308c 100644
>> --- a/kernel/time/timer.c
>> +++ b/kernel/time/timer.c
>> @@ -209,9 +209,12 @@ struct timer_base {
>>   	bool			is_idle;
>>   	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
>>   	struct hlist_head	vectors[WHEEL_SIZE];
>> +	struct hlist_head	expired_lists[LVL_DEPTH];
>> +	int			expired_count;
> 
> You need to look at the cache layout of that whole thing. My gut feeling
> tells me that that count is at the wrong place.
> 

You're right, there's a 4-byte hole after `lock` we can use. I'll move
`expired_count` there.

>>   } ____cacheline_aligned;
>>   
>>   static DEFINE_PER_CPU(struct timer_base, timer_bases[NR_BASES]);
>> +static DEFINE_PER_CPU(int, block_softirqs);
> 
> Why are you putting that into a seperate per cpu variable instead of adding
> a bool to the base struct as I suggested in my example:
> 
>         base->softirq_activated = false;
> 
> Having that separate makes no sense conceptually, and cache-wise it can
> force touching yet another cacheline depending on the placement by
> compiler/linker. Looking at your implementation it does in 100% of the
> cases.
> 
> You can use the first base for that, as that is going to be touched anyway
> and is cache hot in any case.
> 

I was trying to avoid using twice as much memory in the NOHZ case and
didn't consider cache implications. There's actually another 1-byte hole
after `timer_base.is_idle` which can fit this bool.
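
The holes are easy to confirm with pahole(1) on vmlinux. As a minimal
userspace illustration of the same effect, mocking only the first two fields
rather than the real timer_base:

#include <stdio.h>
#include <stddef.h>

struct timer_list;	/* opaque here; only used as a pointer */

/* A 4-byte lock stand-in followed by an 8-byte-aligned pointer
 * leaves a 4-byte padding hole on 64-bit... */
struct with_hole {
	int lock;
	struct timer_list *running_timer;
};

/* ...which an int slotted in between fills at no size cost. */
struct without_hole {
	int lock;
	int expired_levels;
	struct timer_list *running_timer;
};

int main(void)
{
	printf("with hole:    %zu bytes, pointer at offset %zu\n",
	       sizeof(struct with_hole),
	       offsetof(struct with_hole, running_timer));
	printf("without hole: %zu bytes, pointer at offset %zu\n",
	       sizeof(struct without_hole),
	       offsetof(struct without_hole, running_timer));
	return 0;	/* both print 16 bytes, pointer at offset 8 */
}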

>> -static void expire_timers(struct timer_base *base, struct hlist_head *head)
>> +static inline void __expire_timers(struct timer_base *base,
> 
> What's the purpose of this change? If it makes sense to inline it, then the
> compiler will do so.
> 

I inlined it because it only has one call site, but I'm sure the
compiler will figure that out as well. Dropped.

>> +				   struct hlist_head *head)
>>   {
>>   	while (!hlist_empty(head)) {
>>   		struct timer_list *timer;
>> @@ -1344,21 +1348,45 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
>>   	}
>>   }
>>   
>> -static int __collect_expired_timers(struct timer_base *base,
>> -				    struct hlist_head *heads)
>> +static void expire_timers(struct timer_base *base)
>>   {
>> -	unsigned long clk = base->clk;
>> +	struct hlist_head *head;
>> +	int count = READ_ONCE(base->expired_count);
> 
> Please keep the reverse fir tree ordering based on length for the variables
> as we have it throughout that code.
> 

Moved clk.

>> +
>> +	while (count--) {
> 
> So this changed vs. the previous implementation and in this case the
> READ_ONCE() is pointless as the compiler CANNOT reevaluate base->foo inside
> that loop.
> 

Removed.

>> +		head = base->expired_lists + count;
>> +		__expire_timers(base, head);
>> +	}
>> +
>> +	/* Zero base->expired_count after processing all base->expired_lists
>> +	 * to signal it's ready to get re-populated. Otherwise, we race with
>> +	 * tick_find_expired() when base->lock is temporarily dropped in
>> +	 * __expire_timers() */
> 
> Please stick to the two sane comment styles:
> 
>         /* Single line comment */
> 
>         /*
>          * Multi line comment
>          * Multi line comment
>          * Multi line comment
> 	*/
> 
> For further enlightment: http://marc.info/?l=linux-kernel&m=146799838429328&w=2
> 

Fixed.

>> +	base->expired_count = 0;
>> +}
>> +
>> +static int __collect_expired_timers(struct timer_base *base)
>> +{
>> +	unsigned long clk;
>>   	struct hlist_head *vec;
>> -	int i, levels = 0;
>> +	int i;
>>   	unsigned int idx;
> 
> See above
> 
>> +	/*
>> +	 * expire_timers() must be called at least once before we can
>> +	 * collect more timers
>> +	 */
>> +	if (base->expired_count)
>> +		goto end;
>> +
>> +	clk = base->clk;
>>   	for (i = 0; i < LVL_DEPTH; i++) {
>>   		idx = (clk & LVL_MASK) + i * LVL_SIZE;
>>   
>>   		if (__test_and_clear_bit(idx, base->pending_map)) {
>>   			vec = base->vectors + idx;
>> -			hlist_move_list(vec, heads++);
>> -			levels++;
>> +			hlist_move_list(vec,
>> +				&base->expired_lists[base->expired_count++]);
> 
> Eew. What's wrong with local variables ?
> 
>       struct hlist_head *list = &base->expired_vectors;
> 
> at the top of this function and then do
> 
> 			hlist_move_list(vec, list++);
> 			base->expired_levels++;
> 
> or have a local count and use it as index to list[]. The code generation
> should be roughly the same, but I expect it to be better with the separate
> increments.
> 

Done, looks nicer. I was trying to keep changes to existing code minimal.

>>   		}
>>   		/* Is it time to look at the next level? */
>>   		if (clk & LVL_CLK_MASK)
>> @@ -1366,7 +1394,8 @@ static int __collect_expired_timers(struct timer_base *base,
>>   		/* Shift clock for the next level granularity */
>>   		clk >>= LVL_CLK_SHIFT;
>>   	}
>> -	return levels;
>> +
>> +	end: return base->expired_count;
> 
> More Eeew! Can you please look how labels are placed in the
> kernel. Certainly not that way.
> 
> Aside of that the goto is silly. You can just return expired_count up at
> that conditional, or move the conditional to the caller.
> 

Replaced goto with simple return.

> Actually I do not understand that conditional at the top at all. The call
> site breaks out of the loop when the return value is > 0. So what's that
> for? Paranoia? If that's the case then you want a WARN_ONCE there, because
> that should never happen. Otherwise it's just pointless. If actually
> something relies on that, then it's disgusting.
> 
Paranoia. We should never hit this case unless TIMER_SOFTIRQ got raised
without expired timers. Added WARN_ONCE().

>>   }
>>   
>>   #ifdef CONFIG_NO_HZ_COMMON
>> @@ -1559,8 +1588,7 @@ void timer_clear_idle(void)
>>   	base->is_idle = false;
>>   }
>>   
>> -static int collect_expired_timers(struct timer_base *base,
>> -				  struct hlist_head *heads)
>> +static int collect_expired_timers(struct timer_base *base)
>>   {
>>   	/*
>>   	 * NOHZ optimization. After a long idle sleep we need to forward the
>> @@ -1581,16 +1609,41 @@ static int collect_expired_timers(struct timer_base *base,
>>   		}
>>   		base->clk = next;
>>   	}
>> -	return __collect_expired_timers(base, heads);
>> +	return __collect_expired_timers(base);
>>   }
>>   #else
>> -static inline int collect_expired_timers(struct timer_base *base,
>> -					 struct hlist_head *heads)
>> +static inline int collect_expired_timers(struct timer_base *base)
>>   {
>> -	return __collect_expired_timers(base, heads);
>> +	return __collect_expired_timers(base);
>>   }
>>   #endif
>>   
>> +/* Increments timer_base to current jiffies or until first expired
>> + * timer is found. Return number of expired timers. */
> 
> Sigh.
> 

Fixed comment formatting.

>> +static int find_expired_timers(struct timer_base *base)
>> +{
>> +	const unsigned long int end_clk = jiffies;
> 
> const ? unsigned long int ?
> 

Dropped the const. Didn't realize it violated a coding convention.

>> +	int expired_count;
>> +
>> +	while ( !(expired_count = collect_expired_timers(base)) &&
>> +			time_after_eq(end_clk, base->clk) ) {
> 
> These extra white spaces after ( and before ) are pointless and not kernel
> coding style.
> 
> What's worse is the order of your conditionals. Just look at the original
> code.
> 
>> +		base->clk++;
>> +	}
> 

Fixed.

> Aside of that this loop is fricking hard to read.
> 
> 	int levels = 0;
> 
> 	while (!levels && time_after_eq(jiffies, base->clk)) {
> 	      levels = collect_expired_timers(base, heads);
> 	      base->clk++;
> 	}
> 	
> 	return levels;
> 
> Is all what you need here, right? That's what the original loop does as
> well.
> 

Correct, but the original loop was in __run_timers() and this one is 
called from both __run_timers() and run_local_timers(), which is why I 
moved it to a separate function.

>> +
>> +	return expired_count;
>> +}
>> +
>> +/* Called from CPU tick routine to collect expired timers up to current
>> + * jiffies. Return number of expired timers. */
> 
> Wrong. It returns the number of levels which have expired timers. The
> number of actual timers per level is unknown as we move the complete list.
> 

Fixed comment.

>> +static int tick_find_expired(struct timer_base *base)
>> +{
>> +	int count;
> 
> Missing new line between declaration and code. checkpatch.pl is wrong on a
> lot of things, but it would have told you.
> 

Fixed.

>> +	raw_spin_lock(&base->lock);
>> +	count = find_expired_timers(base);
>> +	raw_spin_unlock(&base->lock);
>> +	return count;
> 
> Please be consistent with the names. We use 'levels' throughout all the other
> functions. Random variable names are just confusing.
> 

Renamed "count" to "levels" in timer_base and various functions.

>> +}
>> +
>>   /*
>>    * Called from the timer interrupt handler to charge one tick to the current
>>    * process.  user_tick is 1 if the tick is user time, 0 for system.
>> @@ -1618,22 +1671,11 @@ void update_process_times(int user_tick)
>>    */
>>   static inline void __run_timers(struct timer_base *base)
>>   {
>> -	struct hlist_head heads[LVL_DEPTH];
>> -	int levels;
>> -
>> -	if (!time_after_eq(jiffies, base->clk))
>> -		return;
>> -
>>   	raw_spin_lock_irq(&base->lock);
>>   
>> -	while (time_after_eq(jiffies, base->clk)) {
>> +	while (find_expired_timers(base))
>> +		expire_timers(base);
> 
> Now I understand that extra conditional above. That's crap, really. Two
> ways to solve that:
> 
> 	do {
>         		expire_timers(base);
> 	} while (find_expired_timers(base));
> 
> 	which requires a check for base->expired_levels inside of
> 	expire_timers().
> 
> or
> 
>         if (base->expired_levels)
>         		expire_timers(base);
> 
> 	while (find_expired_timers(base))
>         		expire_timers(base);
> 

The do-while approach works for me. expire_timers() already noops when
expired_levels is zero. However, I would like to keep the
WARN_ONCE(expired_levels) check in __collect_expired_timers() as a
sanity check.

>>   	raw_spin_unlock_irq(&base->lock);
>>   	wakeup_timer_waiters(base);
> 
> Errm. Please submit patches against mainline. This is RT only. On mainline
> the overhead of raising the softirq is not that big, but the exercise is
> the same.
> 

I have been submitting to both mailing lists simultaneously.

>> @@ -1644,12 +1686,16 @@ static inline void __run_timers(struct timer_base *base)
>>   static __latent_entropy void run_timer_softirq(struct softirq_action *h)
>>   {
>>   	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
>> +	int* block_softirq = this_cpu_ptr(&block_softirqs);
> 
> Sigh. A pointer is declared with:
> 
> 	int *p;
> 
> and not
> 
> 	int* p;
> 
>>   	irq_work_tick_soft();
>>   
>>   	__run_timers(base);
>>   	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
>>   		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
>> +
>> +	/* Allow new TIMER_SOFTIRQs to get scheduled by run_local_timers() */
>> +	WRITE_ONCE(*block_softirq, 0);
> 
> You are in interrupt enabled code here. So you actually might miss a wakeup
> and delay it to the next tick. If that's your intention then please
> document it properly. If not, you need to disable interrupts around the write
> and recheck stuff.
> 

I'm not sure what you mean exactly. My intention here is to only permit
new TIMER_SOFTIRQs to get raised by run_local_timers(). See updated 
commit message for details.

> Also the WRITE_ONCE() is pointless. The compiler cannot reorder the
> write. And it does not protect you from racing with the hard interrupt. So
> for the sloppy variant a simple:
> 
>         base->softirq_activated = false;
> 
> is sufficient.
> 
>>   }
>>   
>>   /*
>> @@ -1657,18 +1703,28 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
>>    */
>>   void run_local_timers(void)
>>   {
>> -	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
>> +	int* block_softirq = this_cpu_ptr(&block_softirqs);
>> +	struct timer_base *base;
>>   
>>   	hrtimer_run_queues();
>> +
>> +	/* Skip if TIMER_SOFTIRQ is already running for this CPU */
>> +	if (READ_ONCE(*block_softirq))
>> +		return;
>> +
>> +	base = this_cpu_ptr(&timer_bases[BASE_STD]);
> 
> And this here becomes:
> 
>      	if (base->softirq_activated)
>      		return;
> 
>> +
>>   	/* Raise the softirq only if required. */
>> -	if (time_before(jiffies, base->clk)) {
>> +	if (time_before(jiffies, base->clk) || !tick_find_expired(base)) {
>>   		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON) || !base->nohz_active)
>>   			return;
>>   		/* CPU is awake, so check the deferrable base. */
>>   		base++;
>> -		if (time_before(jiffies, base->clk))
>> +		if (time_before(jiffies, base->clk) || !tick_find_expired(base))
>>   			return;
> 
> To make that work, all you need here is:
> 
> 		base--;
> 
>>   	}
> 
> and
> 	base->softirq_activated = true;
> 

Done. Dropped WRITE_ONCE().

>>   static void __init init_timer_cpu(int cpu)
>>   {
>>   	struct timer_base *base;
>> +	int* block_softirq;
>>   	int i;
>>   
>>   	for (i = 0; i < NR_BASES; i++) {
>> @@ -1852,6 +1910,10 @@ static void __init init_timer_cpu(int cpu)
>>   #ifdef CONFIG_PREEMPT_RT_FULL
>>   		init_swait_queue_head(&base->wait_for_running_timer);
>>   #endif
>> +		base->expired_count = 0;
>> +
>> +		block_softirq = per_cpu_ptr(&block_softirqs, cpu);
>> +		*block_softirq = 0;
> 
> What kind of voodoo initialization is this? Do you not trust BSS? Or do you
> not make sure that the stuff is brought into proper state when a CPU goes
> offline?
> 

Yea, this is pointless. Not sure what I was thinking. Removed.

> Aside of the above, this patch wants to be split into two pieces:
> 
>    1) Embed the hlist heads for expired bucket collection into the base
>       struct and adjust the code accordingly.
> 
>    2) Implement the conditional softirq raise machinery
> 

I agree. I split it and will submit a PATCH v3 shortly.

> Thanks,
> 
> 	tglx
> 

Thanks,
Haris

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v3 1/2] timers: Don't wake ktimersoftd on every tick
  2017-07-18 21:33                       ` Thomas Gleixner
  2017-08-03 21:04                         ` Haris Okanovic
@ 2017-08-03 21:06                         ` Haris Okanovic
  2017-08-03 21:06                           ` [PATCH v3 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled Haris Okanovic
  2018-03-06 23:39                         ` [PATCH v4 1/2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
  2 siblings, 1 reply; 42+ messages in thread
From: Haris Okanovic @ 2017-08-03 21:06 UTC (permalink / raw)
  To: linux-rt-users, linux-kernel
  Cc: haris.okanovic, harisokn, bigeasy, tglx, julia.cartwright,
	gratian.crisan, anna-maria

We recently upgraded from 4.1 to 4.6 and noticed a minor latency
regression caused by an additional thread wakeup (ktimersoftd) in
interrupt context on every tick. The wakeups are from
run_local_timers() raising TIMER_SOFTIRQ. Both TIMER and SCHED softirqs
coalesced into one ksoftirqd wakeup prior to Sebastian's change to split
timers into their own thread.

There's already logic in run_local_timers() to avoid some unnecessary
wakeups of ksoftirqd, but it doesn't seem to catch them all. In
particular, I've seen many unnecessary wakeups when jiffies increments
prior to run_local_timers().

Change the way timers are collected per Julia and Thomas'
recommendation: Expired timers are now collected in interrupt context
and fired in ktimersoftd to avoid double-walk of `pending_map`.

Collect expired timers in interrupt context to avoid overhead of waking
ktimersoftd on every tick. ktimersoftd now wakes only when one or more
timers are ready, which yields a minor reduction in small latency
spikes measured by cyclictest.

Execution time of run_local_timers() increases by 0.2us to 2.5us as
measured by TSC on a 2core Intel Atom E3825 system.

This is implemented by storing lists of expired timers in timer_base,
updated on each tick. Any addition to the lists wakes ktimersoftd
(softirq) to process those timers.

Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
---
[PATCH v2] Applied Thomas Gleixner's suggestions:
 - Fix expired_count race
 - Remove unneeded base->clk lookahead
 - Return expired_count in collect_expired_timers()
 - Add block_softirq
 - Rebase to v4.11.8-rt5
[PATCH v3]
 - Fix cosmetic issues
 - Rename "count" to "levels" in timer_base and various functions
 - Move expired_levels and block_softirq to fill holes in timer_base
 - Remove READ_ONCE/WRITE_ONCE around block_softirq

https://github.com/harisokanovic/linux/tree/dev/hokanovi/timer-peek-v5
---
 kernel/time/timer.c | 111 ++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 82 insertions(+), 29 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 5730d42bfd67..078027d8a866 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -197,6 +197,7 @@ EXPORT_SYMBOL(jiffies_64);
 
 struct timer_base {
 	raw_spinlock_t		lock;
+	int					expired_levels;
 	struct timer_list	*running_timer;
 #ifdef CONFIG_PREEMPT_RT_FULL
 	struct swait_queue_head	wait_for_running_timer;
@@ -209,6 +210,7 @@ struct timer_base {
 	bool			is_idle;
 	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
 	struct hlist_head	vectors[WHEEL_SIZE];
+	struct hlist_head	expired_lists[LVL_DEPTH];
 } ____cacheline_aligned;
 
 static DEFINE_PER_CPU(struct timer_base, timer_bases[NR_BASES]);
@@ -1314,7 +1316,8 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(unsigned long),
 	}
 }
 
-static void expire_timers(struct timer_base *base, struct hlist_head *head)
+static void __expire_timers(struct timer_base *base,
+				   struct hlist_head *head)
 {
 	while (!hlist_empty(head)) {
 		struct timer_list *timer;
@@ -1344,21 +1347,49 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
 	}
 }
 
-static int __collect_expired_timers(struct timer_base *base,
-				    struct hlist_head *heads)
+static void expire_timers(struct timer_base *base)
+{
+	struct hlist_head *head;
+	int levels = base->expired_levels;
+
+	while (levels--) {
+		head = base->expired_lists + levels;
+		__expire_timers(base, head);
+	}
+
+	/*
+	 * Zero base->expired_levels after processing all base->expired_lists
+	 * to signal it's ready to get re-populated. Otherwise, we race with
+	 * tick_find_expired() when base->lock is temporarily dropped in
+	 * __expire_timers()
+	 */
+	base->expired_levels = 0;
+}
+
+static int __collect_expired_timers(struct timer_base *base)
 {
-	unsigned long clk = base->clk;
 	struct hlist_head *vec;
-	int i, levels = 0;
+	struct hlist_head *expired_list = base->expired_lists;
+	unsigned long clk;
+	int i;
 	unsigned int idx;
 
+	/*
+	 * expire_timers() must be called at least once before we can
+	 * collect more timers.
+	 */
+	if (base->expired_levels)
+		return base->expired_levels;
+
+	clk = base->clk;
 	for (i = 0; i < LVL_DEPTH; i++) {
 		idx = (clk & LVL_MASK) + i * LVL_SIZE;
 
 		if (__test_and_clear_bit(idx, base->pending_map)) {
 			vec = base->vectors + idx;
-			hlist_move_list(vec, heads++);
-			levels++;
+			hlist_move_list(vec, expired_list);
+			base->expired_levels++;
+			expired_list++;
 		}
 		/* Is it time to look at the next level? */
 		if (clk & LVL_CLK_MASK)
@@ -1366,7 +1397,8 @@ static int __collect_expired_timers(struct timer_base *base,
 		/* Shift clock for the next level granularity */
 		clk >>= LVL_CLK_SHIFT;
 	}
-	return levels;
+
+	return base->expired_levels;
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
@@ -1559,8 +1591,7 @@ void timer_clear_idle(void)
 	base->is_idle = false;
 }
 
-static int collect_expired_timers(struct timer_base *base,
-				  struct hlist_head *heads)
+static int collect_expired_timers(struct timer_base *base)
 {
 	/*
 	 * NOHZ optimization. After a long idle sleep we need to forward the
@@ -1581,17 +1612,48 @@ static int collect_expired_timers(struct timer_base *base,
 		}
 		base->clk = next;
 	}
-	return __collect_expired_timers(base, heads);
+	return __collect_expired_timers(base);
 }
 #else
-static inline int collect_expired_timers(struct timer_base *base,
-					 struct hlist_head *heads)
+static inline int collect_expired_timers(struct timer_base *base)
 {
-	return __collect_expired_timers(base, heads);
+	return __collect_expired_timers(base);
 }
 #endif
 
 /*
+ * Increments timer_base to current jiffies or until first expired
+ * timer is found. Return number of expired levels.
+ */
+static int find_expired_timers(struct timer_base *base)
+{
+	unsigned long int end_clk = jiffies;
+	int expired_levels;
+
+	while (!(expired_levels = collect_expired_timers(base)) &&
+			time_after_eq(end_clk, base->clk)) {
+		base->clk++;
+	}
+
+	return expired_levels;
+}
+
+/*
+ * Called from CPU tick routine to collect expired timers up to current
+ * jiffies. Return number of expired levels.
+ */
+static int tick_find_expired(struct timer_base *base)
+{
+	int levels;
+
+	raw_spin_lock(&base->lock);
+	levels = find_expired_timers(base);
+	raw_spin_unlock(&base->lock);
+
+	return levels;
+}
+
+/*
  * Called from the timer interrupt handler to charge one tick to the current
  * process.  user_tick is 1 if the tick is user time, 0 for system.
  */
@@ -1618,22 +1680,12 @@ void update_process_times(int user_tick)
  */
 static inline void __run_timers(struct timer_base *base)
 {
-	struct hlist_head heads[LVL_DEPTH];
-	int levels;
-
-	if (!time_after_eq(jiffies, base->clk))
-		return;
-
 	raw_spin_lock_irq(&base->lock);
 
-	while (time_after_eq(jiffies, base->clk)) {
+	do {
+		expire_timers(base);
+	} while (find_expired_timers(base));
 
-		levels = collect_expired_timers(base, heads);
-		base->clk++;
-
-		while (levels--)
-			expire_timers(base, heads + levels);
-	}
 	raw_spin_unlock_irq(&base->lock);
 	wakeup_timer_waiters(base);
 }
@@ -1661,12 +1713,12 @@ void run_local_timers(void)
 
 	hrtimer_run_queues();
 	/* Raise the softirq only if required. */
-	if (time_before(jiffies, base->clk)) {
+	if (time_before(jiffies, base->clk) || !tick_find_expired(base)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON) || !base->nohz_active)
 			return;
 		/* CPU is awake, so check the deferrable base. */
 		base++;
-		if (time_before(jiffies, base->clk))
+		if (time_before(jiffies, base->clk) || !tick_find_expired(base))
 			return;
 	}
 	raise_softirq(TIMER_SOFTIRQ);
@@ -1826,6 +1878,7 @@ int timers_dead_cpu(unsigned int cpu)
 		raw_spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);
 
 		BUG_ON(old_base->running_timer);
+		BUG_ON(old_base->expired_levels);
 
 		for (i = 0; i < WHEEL_SIZE; i++)
 			migrate_timer_list(new_base, old_base->vectors + i);
-- 
2.13.2

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled
  2017-08-03 21:06                         ` [PATCH v3 1/2] " Haris Okanovic
@ 2017-08-03 21:06                           ` Haris Okanovic
  2018-01-05 19:37                             ` Haris Okanovic
  0 siblings, 1 reply; 42+ messages in thread
From: Haris Okanovic @ 2017-08-03 21:06 UTC (permalink / raw)
  To: linux-rt-users, linux-kernel
  Cc: haris.okanovic, harisokn, bigeasy, tglx, julia.cartwright,
	gratian.crisan, anna-maria

This change avoids needlessly searching for more timers in
run_local_timers() (hard interrupt context) when they can't fire.
For example, when ktimersoftd/run_timer_softirq() is scheduled but
preempted due to CPU contention. When it runs, run_timer_softirq() will
discover newly expired timers up to current jiffies in addition to
firing previously expired timers.

However, this change also adds an edge case where non-hrtimer firing
is sometimes delayed by an additional tick. This is acceptable since we
don't make latency guarantees for non-hrtimers and would prefer to
minimize hard interrupt time instead.

Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
---
[PATCH v3]
 - Split block_softirq into separate commit

https://github.com/harisokanovic/linux/tree/dev/hokanovi/timer-peek-v5
---
 kernel/time/timer.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 078027d8a866..f0ef9675abdf 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -208,6 +208,7 @@ struct timer_base {
 	bool			migration_enabled;
 	bool			nohz_active;
 	bool			is_idle;
+	bool			block_softirq;
 	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
 	struct hlist_head	vectors[WHEEL_SIZE];
 	struct hlist_head	expired_lists[LVL_DEPTH];
@@ -1376,9 +1377,11 @@ static int __collect_expired_timers(struct timer_base *base)
 
 	/*
 	 * expire_timers() must be called at least once before we can
-	 * collect more timers.
+	 * collect more timers. We should never hit this case unless
+	 * TIMER_SOFTIRQ got raised without expired timers.
 	 */
-	if (base->expired_levels)
+	if (WARN_ONCE(base->expired_levels,
+			"Must expire collected timers before collecting more"))
 		return base->expired_levels;
 
 	clk = base->clk;
@@ -1702,6 +1705,9 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 	__run_timers(base);
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
 		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
+
+	/* Allow new TIMER_SOFTIRQs to get scheduled by run_local_timers() */
+	base->block_softirq = false;
 }
 
 /*
@@ -1712,6 +1718,14 @@ void run_local_timers(void)
 	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
 
 	hrtimer_run_queues();
+
+	/*
+	 * Skip if TIMER_SOFTIRQ is already running on this CPU, since it
+	 * will find and expire all timers up to current jiffies.
+	 */
+	if (base->block_softirq)
+		return;
+
 	/* Raise the softirq only if required. */
 	if (time_before(jiffies, base->clk) || !tick_find_expired(base)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON) || !base->nohz_active)
@@ -1720,7 +1734,10 @@ void run_local_timers(void)
 		base++;
 		if (time_before(jiffies, base->clk) || !tick_find_expired(base))
 			return;
+		base--;
 	}
+
+	base->block_softirq = true;
 	raise_softirq(TIMER_SOFTIRQ);
 }
 
-- 
2.13.2

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled
  2017-08-03 21:06                           ` [PATCH v3 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled Haris Okanovic
@ 2018-01-05 19:37                             ` Haris Okanovic
  2018-03-01 15:49                               ` Haris Okanovic
  0 siblings, 1 reply; 42+ messages in thread
From: Haris Okanovic @ 2018-01-05 19:37 UTC (permalink / raw)
  To: linux-rt-users, linux-kernel, bigeasy, tglx
  Cc: harisokn, julia.cartwright, gratian.crisan, anna-maria

It looks like an old version of this patch is included in v4.9*-rt* 
kernels -- E.g. commit 032f93ca in v4.9.68-rt60. There's nothing 
functionally wrong with the included version to the best of my 
knowledge. However, I posted a newer V3 [1][2] based on Thomas' feedback 
that's substantially cleaner and likely more efficient (haven't measured 
yet). I think we should include the latter version instead, if only for 
the cosmetic benefits. Thoughts?

[1] https://patchwork.kernel.org/patch/9879825/  [PATCH v3,1/2]
[2] https://patchwork.kernel.org/patch/9879827/  [PATCH v3,2/2]

-- Haris


On 08/03/2017 04:06 PM, Haris Okanovic wrote:
> This change avoids needlessly searching for more timers in
> run_local_timers() (hard interrupt context) when they can't fire.
> For example, when ktimersoftd/run_timer_softirq() is scheduled but
> preempted due to CPU contention. When it runs, run_timer_softirq() will
> discover newly expired timers up to current jiffies in addition to
> firing previously expired timers.
> 
> However, this change also adds an edge case where non-hrtimer firing
> is sometimes delayed by an additional tick. This is acceptable since we
> don't make latency guarantees for non-hrtimers and would prefer to
> minimize hard interrupt time instead.
> 
> Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
> ---
> [PATCH v3]
>   - Split block_softirq into separate commit
> 
> https://github.com/harisokanovic/linux/tree/dev/hokanovi/timer-peek-v5
> ---
>   kernel/time/timer.c | 21 +++++++++++++++++++--
>   1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 078027d8a866..f0ef9675abdf 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -208,6 +208,7 @@ struct timer_base {
>   	bool			migration_enabled;
>   	bool			nohz_active;
>   	bool			is_idle;
> +	bool			block_softirq;
>   	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
>   	struct hlist_head	vectors[WHEEL_SIZE];
>   	struct hlist_head	expired_lists[LVL_DEPTH];
> @@ -1376,9 +1377,11 @@ static int __collect_expired_timers(struct timer_base *base)
>   
>   	/*
>   	 * expire_timers() must be called at least once before we can
> -	 * collect more timers.
> +	 * collect more timers. We should never hit this case unless
> +	 * TIMER_SOFTIRQ got raised without expired timers.
>   	 */
> -	if (base->expired_levels)
> +	if (WARN_ONCE(base->expired_levels,
> +			"Must expire collected timers before collecting more"))
>   		return base->expired_levels;
>   
>   	clk = base->clk;
> @@ -1702,6 +1705,9 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
>   	__run_timers(base);
>   	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
>   		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
> +
> +	/* Allow new TIMER_SOFTIRQs to get scheduled by run_local_timers() */
> +	base->block_softirq = false;
>   }
>   
>   /*
> @@ -1712,6 +1718,14 @@ void run_local_timers(void)
>   	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
>   
>   	hrtimer_run_queues();
> +
> +	/*
> +	 * Skip if TIMER_SOFTIRQ is already running on this CPU, since it
> +	 * will find and expire all timers up to current jiffies.
> +	 */
> +	if (base->block_softirq)
> +		return;
> +
>   	/* Raise the softirq only if required. */
>   	if (time_before(jiffies, base->clk) || !tick_find_expired(base)) {
>   		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON) || !base->nohz_active)
> @@ -1720,7 +1734,10 @@ void run_local_timers(void)
>   		base++;
>   		if (time_before(jiffies, base->clk) || !tick_find_expired(base))
>   			return;
> +		base--;
>   	}
> +
> +	base->block_softirq = true;
>   	raise_softirq(TIMER_SOFTIRQ);
>   }
>   
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled
  2018-01-05 19:37                             ` Haris Okanovic
@ 2018-03-01 15:49                               ` Haris Okanovic
  2018-03-01 15:54                                 ` Thomas Gleixner
  2018-03-01 16:47                                 ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 42+ messages in thread
From: Haris Okanovic @ 2018-03-01 15:49 UTC (permalink / raw)
  To: linux-rt-users, linux-kernel, bigeasy, tglx
  Cc: Haris Okanovic, julia.cartwright, gratian.crisan, anna-maria

*bump* Has anyone looked into this?

On 01/05/2018 01:37 PM, Haris Okanovic wrote:
> It looks like an old version of this patch is included in v4.9*-rt* 
> kernels -- E.g. commit 032f93ca in v4.9.68-rt60. There's nothing 
> functionally wrong with the included version to the best of my 
> knowledge. However, I posted a newer V3 [1][2] based on Thomas' feedback 
> that's substantially cleaner and likely more efficient (haven't measured 
> yet). I think we should include the latter version instead, if only for 
> the cosmetic benefits. Thoughts?
> 
> [1] https://patchwork.kernel.org/patch/9879825/  [PATCH v3,1/2]
> [2] https://patchwork.kernel.org/patch/9879827/  [PATCH v3,2/2]
> 
> -- Haris
> 
> 
> On 08/03/2017 04:06 PM, Haris Okanovic wrote:
>> This change avoids needlessly searching for more timers in
>> run_local_timers() (hard interrupt context) when they can't fire.
>> For example, when ktimersoftd/run_timer_softirq() is scheduled but
>> preempted due to cpu contention. When it runs, run_timer_softirq() will
>> discover newly expired timers up to current jiffies in addition to
>> firing previously expired timers.
>>
>> However, this change also adds an edge case where non-hrtimer firing
>> is sometimes delayed by an additional tick. This is acceptable since we
>> don't make latency guarantees for non-hrtimers and would prefer to
>> minimize hard interrupt time instead.
>>
>> Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
>> ---
>> [PATCH v3]
>>   - Split block_softirq into separate commit
>>
>> https://github.com/harisokanovic/linux/tree/dev/hokanovi/timer-peek-v5
>> ---
>>   kernel/time/timer.c | 21 +++++++++++++++++++--
>>   1 file changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
>> index 078027d8a866..f0ef9675abdf 100644
>> --- a/kernel/time/timer.c
>> +++ b/kernel/time/timer.c
>> @@ -208,6 +208,7 @@ struct timer_base {
>>       bool            migration_enabled;
>>       bool            nohz_active;
>>       bool            is_idle;
>> +    bool            block_softirq;
>>       DECLARE_BITMAP(pending_map, WHEEL_SIZE);
>>       struct hlist_head    vectors[WHEEL_SIZE];
>>       struct hlist_head    expired_lists[LVL_DEPTH];
>> @@ -1376,9 +1377,11 @@ static int __collect_expired_timers(struct 
>> timer_base *base)
>>       /*
>>        * expire_timers() must be called at least once before we can
>> -     * collect more timers.
>> +     * collect more timers. We should never hit this case unless
>> +     * TIMER_SOFTIRQ got raised without expired timers.
>>        */
>> -    if (base->expired_levels)
>> +    if (WARN_ONCE(base->expired_levels,
>> +            "Must expire collected timers before collecting more"))
>>           return base->expired_levels;
>>       clk = base->clk;
>> @@ -1702,6 +1705,9 @@ static __latent_entropy void 
>> run_timer_softirq(struct softirq_action *h)
>>       __run_timers(base);
>>       if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
>>           __run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
>> +
>> +    /* Allow new TIMER_SOFTIRQs to get scheduled by 
>> run_local_timers() */
>> +    base->block_softirq = false;
>>   }
>>   /*
>> @@ -1712,6 +1718,14 @@ void run_local_timers(void)
>>       struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
>>       hrtimer_run_queues();
>> +
>> +    /*
>> +     * Skip if TIMER_SOFTIRQ is already running on this CPU, since it
>> +     * will find and expire all timers up to current jiffies.
>> +     */
>> +    if (base->block_softirq)
>> +        return;
>> +
>>       /* Raise the softirq only if required. */
>>       if (time_before(jiffies, base->clk) || !tick_find_expired(base)) {
>>           if (!IS_ENABLED(CONFIG_NO_HZ_COMMON) || !base->nohz_active)
>> @@ -1720,7 +1734,10 @@ void run_local_timers(void)
>>           base++;
>>           if (time_before(jiffies, base->clk) || 
>> !tick_find_expired(base))
>>               return;
>> +        base--;
>>       }
>> +
>> +    base->block_softirq = true;
>>       raise_softirq(TIMER_SOFTIRQ);
>>   }
>>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled
  2018-03-01 15:49                               ` Haris Okanovic
@ 2018-03-01 15:54                                 ` Thomas Gleixner
  2018-03-01 16:35                                   ` Haris Okanovic
  2018-03-01 16:47                                 ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 42+ messages in thread
From: Thomas Gleixner @ 2018-03-01 15:54 UTC (permalink / raw)
  To: Haris Okanovic
  Cc: linux-rt-users, linux-kernel, bigeasy, julia.cartwright,
	gratian.crisan, anna-maria

On Thu, 1 Mar 2018, Haris Okanovic wrote:

> *bump* Has anyone looked into this?

I have, but it's still in my melted spectrum induced backlog and it does
not apply anymore :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled
  2018-03-01 15:54                                 ` Thomas Gleixner
@ 2018-03-01 16:35                                   ` Haris Okanovic
  0 siblings, 0 replies; 42+ messages in thread
From: Haris Okanovic @ 2018-03-01 16:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Haris Okanovic, linux-rt-users, linux-kernel, bigeasy,
	julia.cartwright, gratian.crisan, anna-maria

No problem. I know you've been busy recently. I'll post an update.

-- Haris

On 03/01/2018 09:54 AM, Thomas Gleixner wrote:
> On Thu, 1 Mar 2018, Haris Okanovic wrote:
> 
>> *bump* Has anyone looked into this?
> 
> I have, but it's still in my melted spectrum induced backlog and it does
> not apply anymore :)
> 
> Thanks,
> 
> 	tglx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled
  2018-03-01 15:49                               ` Haris Okanovic
  2018-03-01 15:54                                 ` Thomas Gleixner
@ 2018-03-01 16:47                                 ` Sebastian Andrzej Siewior
  2018-03-01 18:37                                   ` Haris Okanovic
  1 sibling, 1 reply; 42+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-03-01 16:47 UTC (permalink / raw)
  To: Haris Okanovic
  Cc: linux-rt-users, linux-kernel, tglx, julia.cartwright,
	gratian.crisan, anna-maria

On 2018-03-01 09:49:59 [-0600], Haris Okanovic wrote:
> *bump* Has anyone looked into this?

I'm lost. 
It entered the kernel in v4.9.9-rt6 and left in v4.9.30-rt20 once we
figured out that there is something wrong with it.
Is it still in an RT tree somewhere?
Is there a newer patch pending on your side?

Sebastian

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled
  2018-03-01 16:47                                 ` Sebastian Andrzej Siewior
@ 2018-03-01 18:37                                   ` Haris Okanovic
  2018-03-01 19:06                                     ` Sebastian Andrzej Siewior
  2018-03-02 14:52                                     ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 42+ messages in thread
From: Haris Okanovic @ 2018-03-01 18:37 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, linux-kernel, tglx, julia.cartwright,
	gratian.crisan, anna-maria

On 03/01/2018 10:47 AM, Sebastian Andrzej Siewior wrote:
> On 2018-03-01 09:49:59 [-0600], Haris Okanovic wrote:
>> *bump* Has anyone looked into this?
> 
> I'm lost.
> It entered the kernel in v4.9.9-rt6 and left in v4.9.30-rt20 once we
> figured out that there is something wrong with it.

It was added back into 4.9 at some point after v4.9.30-rt20. I see an 
older version in v4.9.68-rt60, for example, hence my original email. It 
was dropped sometime thereafter, presumably because it no longer cleanly 
applies. I don't see it in v4.14.20-rt17, for example.

> Is it still in a RT tree somewhere?

I can't draw an exact timeline, since rt history is hard to follow. I 
don't think so.

> Is there a newer patch pending on your side?

Not yet. The latest version on patchwork is OK, just needs to be rebased 
post-4.9. I'll post a new version when I get a chance to build and 
retest it.

> 
> Sebastian
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled
  2018-03-01 18:37                                   ` Haris Okanovic
@ 2018-03-01 19:06                                     ` Sebastian Andrzej Siewior
  2018-03-02 14:52                                     ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 42+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-03-01 19:06 UTC (permalink / raw)
  To: Haris Okanovic
  Cc: linux-rt-users, linux-kernel, tglx, julia.cartwright,
	gratian.crisan, anna-maria

On 2018-03-01 12:37:49 [-0600], Haris Okanovic wrote:
> On 03/01/2018 10:47 AM, Sebastian Andrzej Siewior wrote:
> > On 2018-03-01 09:49:59 [-0600], Haris Okanovic wrote:
> > > *bump* Has anyone looked into this?
> > 
> > I'm lost.
> > It entered the kernel in v4.9.9-rt6 and left in v4.9.30-rt20 once we
> > figured out that there is something wrong with it.
> 
> It was added back into 4.9 at some point after v4.9.30-rt20. I see an older
> version in v4.9.68-rt60, for example, hence my original email. It was
let me check that tomorrow.

> dropped sometime thereafter, presumably because it no longer cleanly
> applies. I don't see it in v4.14.20-rt17, for example.
> 
> > Is there a newer patch pending on your side?
> 
> Not yet. The latest version on patchwork is OK, just needs to be rebased
> post-4.9. I'll post a new version when I get a chance to build and retest
> it.
okay.

Sebastian

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled
  2018-03-01 18:37                                   ` Haris Okanovic
  2018-03-01 19:06                                     ` Sebastian Andrzej Siewior
@ 2018-03-02 14:52                                     ` Sebastian Andrzej Siewior
  2018-03-02 16:29                                       ` Haris Okanovic
  1 sibling, 1 reply; 42+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-03-02 14:52 UTC (permalink / raw)
  To: Haris Okanovic
  Cc: linux-rt-users, linux-kernel, tglx, julia.cartwright,
	gratian.crisan, anna-maria

On 2018-03-01 12:37:49 [-0600], Haris Okanovic wrote:
> It was added back into 4.9 at some point after v4.9.30-rt20. I see an older
> version in v4.9.68-rt60, for example, hence my original email. It was
> dropped sometime thereafter, presumably because it no longer cleanly
> applies. I don't see it in v4.14.20-rt17, for example.

It was removed in v4.9.34-rt24 via 95d4a348841d ("Revert "timers: Don't
wake ktimersoftd on every tick""). I don't see any leftovers or an older
version.
Looking at the queue I see two patches from you and that is:
 timers: Don't wake ktimersoftd on every tick
 tpm_tis: fix stall after iowrite*()s

and the former is reverted. This was v4.9.84-rt62 that I've been looking
at. Could you please point me to the code/patches or something?

> > Is there a newer patch pending on your side?
> 
> Not yet. The latest version on patchwork is OK, just needs to be rebased
> post-4.9. I'll post a new version when I get a chance to build and retest
> it.
okay.

Sebastian

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled
  2018-03-02 14:52                                     ` Sebastian Andrzej Siewior
@ 2018-03-02 16:29                                       ` Haris Okanovic
  2018-03-02 16:39                                         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 42+ messages in thread
From: Haris Okanovic @ 2018-03-02 16:29 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, linux-kernel, tglx, julia.cartwright,
	gratian.crisan, anna-maria

> Could you please point me to the code/patches or something?

I've rebased onto v4.14.20-rt17 and am running some sanity tests before 
reposting to the ml (cyclictest & Anna's timertest). Will post V4 
sometime today (US Central Time) if everything goes well.

Are you also asking for a 4.9 version? I'm fine leaving it out of 4.9.

-- Haris


On 03/02/2018 08:52 AM, Sebastian Andrzej Siewior wrote:
> On 2018-03-01 12:37:49 [-0600], Haris Okanovic wrote:
>> It was added back into 4.9 at some point after v4.9.30-rt20. I see an older
>> version in v4.9.68-rt60, for example, hence my original email. It was
>> dropped sometime thereafter, presumably because it no longer cleanly
>> applies. I don't see it in v4.14.20-rt17, for example.
> 
> It was removed in v4.9.34-rt24 via 95d4a348841d ("Revert "timers: Don't
> wake ktimersoftd on every tick""). I don't see any leftovers or an older
> version.
> Looking at the queue I see two patches from you and that is:
>   timers: Don't wake ktimersoftd on every tick
>   tpm_tis: fix stall after iowrite*()s
> 
> and the former is reverted. This was v4.9.84-rt62 that I've been looking
> at. Could you please point me to the code/patches or something?
> 
>>> Is there a newer patch pending on your side?
>>
>> Not yet. The latest version on patchwork is OK, just needs to be rebased
>> post-4.9. I'll post a new version when I get a chance to build and retest
>> it.
> okay.
> 
> Sebastian
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled
  2018-03-02 16:29                                       ` Haris Okanovic
@ 2018-03-02 16:39                                         ` Sebastian Andrzej Siewior
  2018-03-02 17:19                                           ` Haris Okanovic
  0 siblings, 1 reply; 42+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-03-02 16:39 UTC (permalink / raw)
  To: Haris Okanovic
  Cc: linux-rt-users, linux-kernel, tglx, julia.cartwright,
	gratian.crisan, anna-maria

On 2018-03-02 10:29:56 [-0600], Haris Okanovic wrote:
> > Could you please point me to the code/patches or something?
> 
> I've rebased onto v4.14.20-rt17 and am running some sanity tests before
> reposting to the ml (cyclictest & Anna's timertest). Will post V4 sometime
> today (US Central Time) if everything goes well.
> 
> Are you also asking for a 4.9 version? I'm fine leaving it out of 4.9.

Hmmm. Maybe this is a form of miscommunication here :)
So my understanding is that you complain/ask why there is an older
version of the patch still in v4.9-RT:

|It was added back into 4.9 at some point after v4.9.30-rt20. I see an older
|version in v4.9.68-rt60, for example, hence my original email. It was dropped
|sometime thereafter, presumably because it no longer cleanly applies. I don't
|see it in v4.14.20-rt17, for example.

So I ask where you see the old version of your patch in v4.9-RT. Yes it
was added, then removed, and it never appeared back in. However, I don't
see it anymore in v4.9.68-rt60.

> -- Haris

Sebastian

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled
  2018-03-02 16:39                                         ` Sebastian Andrzej Siewior
@ 2018-03-02 17:19                                           ` Haris Okanovic
  0 siblings, 0 replies; 42+ messages in thread
From: Haris Okanovic @ 2018-03-02 17:19 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, linux-kernel, tglx, julia.cartwright,
	gratian.crisan, anna-maria



On 03/02/2018 10:39 AM, Sebastian Andrzej Siewior wrote:
> On 2018-03-02 10:29:56 [-0600], Haris Okanovic wrote:
>>> Could you please point me to the code/patches or something?
>>
>> I've rebased onto v4.14.20-rt17 and am running some sanity tests before
>> reposting to the ml (cyclictest & Anna's timertest). Will post V4 sometime
>> today (US Central Time) if everything goes well.
>>
>> Are you also asking for a 4.9 version? I'm fine leaving it out of 4.9.
> 
> Hmmm. Maybe this is a form of miscommunication here :)

Yea, I agree :) Let me try to summarize: When I originally asked this 
question back in January, rt was on 4.9. I was asking back then to pull 
the V3 revision of my timer patch (replacing the old V2). Given that RT 
already moved to 4.14 in the meantime, backed out V3, and V3 no longer 
applies to 4.14, I'm fine leaving everything as-is! I'll post a V4 (for 
4.14) when I finish retesting it.

> So my understanding is that you complain/ask why there is an older
> version of the patch still in v4.9-RT:
> 
> |It was added back into 4.9 at some point after v4.9.30-rt20. I see an older
> |version in v4.9.68-rt60, for example, hence my original email. It was dropped
> |sometime thereafter, presumably because it no longer cleanly applies. I don't
> |see it in v4.14.20-rt17, for example.
> 
> So I ask where you see the old version of your patch in v4.9-RT. Yes it
> was added, then removed and it never appeared back in. However, I don't
> see anymore in v4.9.68-rt60.
> 
>> -- Haris
> 
> Sebastian
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 1/2] timers: Don't wake ktimersoftd on every tick
  2017-07-18 21:33                       ` Thomas Gleixner
  2017-08-03 21:04                         ` Haris Okanovic
  2017-08-03 21:06                         ` [PATCH v3 1/2] " Haris Okanovic
@ 2018-03-06 23:39                         ` Haris Okanovic
  2018-03-06 23:39                           ` [PATCH v4 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled Haris Okanovic
       [not found]                           ` <1523332961.4827.1.camel@gmx.de>
  2 siblings, 2 replies; 42+ messages in thread
From: Haris Okanovic @ 2018-03-06 23:39 UTC (permalink / raw)
  To: linux-rt-users, linux-kernel
  Cc: haris.okanovic, harisokn, bigeasy, tglx, julia.cartwright,
	gratian.crisan, anna-maria, daniel

We recently upgraded from 4.1 to 4.6 and noticed a minor latency
regression caused by an additional thread wakeup (ktimersoftd) in
interrupt context on every tick. The wakeups are from
run_local_timers() raising TIMER_SOFTIRQ. Both TIMER and SCHED softirqs
coalesced into one ksoftirqd wakeup prior to Sebastian's change to split
timers into their own thread.

There's already logic in run_local_timers() to avoid some unnecessary
wakeups of ksoftirqd, but it doesn't seem to catch them all. In
particular, I've seen many unnecessary wakeups when jiffies increments
prior to run_local_timers().

Change the way timers are collected per Julia and Thomas'
recommendation: Expired timers are now collected in interrupt context
and fired in ktimersoftd to avoid double-walk of `pending_map`.

Collect expired timers in interrupt context to avoid overhead of waking
ktimersoftd on every tick. ktimersoftd now wakes only when one or more
timers are ready, which yields a minor reduction in small latency
spikes measured by cyclictest.

Execution time of run_local_timers() increases by 0.2us to 2.5us as
measured by TSC on a 2core Intel Atom E3825 system.

This is implemented by storing lists of expired timers in timer_base,
updated on each tick. Any addition to the lists wakes ktimersoftd
(softirq) to process those timers.

Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
---
[PATCH v2] Applied Thomas Gleixner's suggestions:
 - Fix expired_count race
 - Remove unneeded base->clk lookahead
 - Return expired_count in collect_expired_timers()
 - Add block_softirq
 - Rebase to v4.11.8-rt5
[PATCH v3]
 - Fix cosmetic issues
 - Rename "count" to "levels" in timer_base and various functions
 - Move expired_levels and block_softirq to fill holes in timer_base
 - Remove READ_ONCE/WRITE_ONCE around block_softirq
[PATCH v4]
 - Rebase onto v4.14.20-rt17

https://github.com/harisokanovic/linux/tree/dev/hokanovi/timer-peek-v6
---
 kernel/time/timer.c | 111 ++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 82 insertions(+), 29 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 96fd01c9f6b1..98e952a6428d 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -197,6 +197,7 @@ EXPORT_SYMBOL(jiffies_64);
 
 struct timer_base {
 	raw_spinlock_t		lock;
+	int					expired_levels;
 	struct timer_list	*running_timer;
 #ifdef CONFIG_PREEMPT_RT_FULL
 	struct swait_queue_head	wait_for_running_timer;
@@ -208,6 +209,7 @@ struct timer_base {
 	bool			must_forward_clk;
 	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
 	struct hlist_head	vectors[WHEEL_SIZE];
+	struct hlist_head	expired_lists[LVL_DEPTH];
 } ____cacheline_aligned;
 
 static DEFINE_PER_CPU(struct timer_base, timer_bases[NR_BASES]);
@@ -1342,7 +1344,8 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(unsigned long),
 	}
 }
 
-static void expire_timers(struct timer_base *base, struct hlist_head *head)
+static void __expire_timers(struct timer_base *base,
+				   struct hlist_head *head)
 {
 	while (!hlist_empty(head)) {
 		struct timer_list *timer;
@@ -1372,21 +1375,49 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
 	}
 }
 
-static int __collect_expired_timers(struct timer_base *base,
-				    struct hlist_head *heads)
+static void expire_timers(struct timer_base *base)
+{
+	struct hlist_head *head;
+	int levels = base->expired_levels;
+
+	while (levels--) {
+		head = base->expired_lists + levels;
+		__expire_timers(base, head);
+	}
+
+	/*
+	 * Zero base->expired_levels after processing all base->expired_lists
+	 * to signal it's ready to get re-populated. Otherwise, we race with
+	 * tick_find_expired() when base->lock is temporarily dropped in
+	 * __expire_timers()
+	 */
+	base->expired_levels = 0;
+}
+
+static int __collect_expired_timers(struct timer_base *base)
 {
-	unsigned long clk = base->clk;
 	struct hlist_head *vec;
-	int i, levels = 0;
+	struct hlist_head *expired_list = base->expired_lists;
+	unsigned long clk;
+	int i;
 	unsigned int idx;
 
+	/*
+	 * expire_timers() must be called at least once before we can
+	 * collect more timers.
+	 */
+	if (base->expired_levels)
+		return base->expired_levels;
+
+	clk = base->clk;
 	for (i = 0; i < LVL_DEPTH; i++) {
 		idx = (clk & LVL_MASK) + i * LVL_SIZE;
 
 		if (__test_and_clear_bit(idx, base->pending_map)) {
 			vec = base->vectors + idx;
-			hlist_move_list(vec, heads++);
-			levels++;
+			hlist_move_list(vec, expired_list);
+			base->expired_levels++;
+			expired_list++;
 		}
 		/* Is it time to look at the next level? */
 		if (clk & LVL_CLK_MASK)
@@ -1394,7 +1425,8 @@ static int __collect_expired_timers(struct timer_base *base,
 		/* Shift clock for the next level granularity */
 		clk >>= LVL_CLK_SHIFT;
 	}
-	return levels;
+
+	return base->expired_levels;
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
@@ -1593,8 +1625,7 @@ void timer_clear_idle(void)
 	base->is_idle = false;
 }
 
-static int collect_expired_timers(struct timer_base *base,
-				  struct hlist_head *heads)
+static int collect_expired_timers(struct timer_base *base)
 {
 	/*
 	 * NOHZ optimization. After a long idle sleep we need to forward the
@@ -1615,16 +1646,47 @@ static int collect_expired_timers(struct timer_base *base,
 		}
 		base->clk = next;
 	}
-	return __collect_expired_timers(base, heads);
+	return __collect_expired_timers(base);
 }
 #else
-static inline int collect_expired_timers(struct timer_base *base,
-					 struct hlist_head *heads)
+static inline int collect_expired_timers(struct timer_base *base)
 {
-	return __collect_expired_timers(base, heads);
+	return __collect_expired_timers(base);
 }
 #endif
 
+/*
+ * Increments timer_base to current jiffies or until first expired
+ * timer is found. Return number of expired levels.
+ */
+static int find_expired_timers(struct timer_base *base)
+{
+	unsigned long int end_clk = jiffies;
+	int expired_levels;
+
+	while (!(expired_levels = collect_expired_timers(base)) &&
+			time_after_eq(end_clk, base->clk)) {
+		base->clk++;
+	}
+
+	return expired_levels;
+}
+
+/*
+ * Called from CPU tick routine to collect expired timers up to current
+ * jiffies. Return number of expired levels.
+ */
+static int tick_find_expired(struct timer_base *base)
+{
+	int levels;
+
+	raw_spin_lock(&base->lock);
+	levels = find_expired_timers(base);
+	raw_spin_unlock(&base->lock);
+
+	return levels;
+}
+
 /*
  * Called from the timer interrupt handler to charge one tick to the current
  * process.  user_tick is 1 if the tick is user time, 0 for system.
@@ -1652,22 +1714,12 @@ void update_process_times(int user_tick)
  */
 static inline void __run_timers(struct timer_base *base)
 {
-	struct hlist_head heads[LVL_DEPTH];
-	int levels;
-
-	if (!time_after_eq(jiffies, base->clk))
-		return;
-
 	raw_spin_lock_irq(&base->lock);
 
-	while (time_after_eq(jiffies, base->clk)) {
+	do {
+		expire_timers(base);
+	} while (find_expired_timers(base));
 
-		levels = collect_expired_timers(base, heads);
-		base->clk++;
-
-		while (levels--)
-			expire_timers(base, heads + levels);
-	}
 	raw_spin_unlock_irq(&base->lock);
 	wakeup_timer_waiters(base);
 }
@@ -1707,12 +1759,12 @@ void run_local_timers(void)
 
 	hrtimer_run_queues();
 	/* Raise the softirq only if required. */
-	if (time_before(jiffies, base->clk)) {
+	if (time_before(jiffies, base->clk) || !tick_find_expired(base)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON))
 			return;
 		/* CPU is awake, so check the deferrable base. */
 		base++;
-		if (time_before(jiffies, base->clk))
+		if (time_before(jiffies, base->clk) || !tick_find_expired(base))
 			return;
 	}
 	raise_softirq(TIMER_SOFTIRQ);
@@ -1887,6 +1939,7 @@ int timers_dead_cpu(unsigned int cpu)
 		raw_spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);
 
 		BUG_ON(old_base->running_timer);
+		BUG_ON(old_base->expired_levels);
 
 		for (i = 0; i < WHEEL_SIZE; i++)
 			migrate_timer_list(new_base, old_base->vectors + i);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled
  2018-03-06 23:39                         ` [PATCH v4 1/2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
@ 2018-03-06 23:39                           ` Haris Okanovic
       [not found]                           ` <1523332961.4827.1.camel@gmx.de>
  1 sibling, 0 replies; 42+ messages in thread
From: Haris Okanovic @ 2018-03-06 23:39 UTC (permalink / raw)
  To: linux-rt-users, linux-kernel
  Cc: haris.okanovic, harisokn, bigeasy, tglx, julia.cartwright,
	gratian.crisan, anna-maria, daniel

This change avoids needlessly searching for more timers in
run_local_timers() (hard interrupt context) when they can't fire.
For example, when ktimersoftd/run_timer_softirq() is scheduled but
preempted due to cpu contention. When it runs, run_timer_softirq() will
discover newly expired timers up to current jiffies in addition to
firing previously expired timers.

However, this change also adds an edge case where non-hrtimer firing
is sometimes delayed by an additional tick. This is acceptable since we
don't make latency guarantees for non-hrtimers and would prefer to
minimize hard interrupt time instead.

Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
---
[PATCH v3]
 - Split block_softirq into separate commit
[PATCH v4]
 - Rebase onto v4.14.20-rt17

https://github.com/harisokanovic/linux/tree/dev/hokanovi/timer-peek-v6
---
 kernel/time/timer.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 98e952a6428d..5687e9bcf378 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -207,6 +207,7 @@ struct timer_base {
 	unsigned int		cpu;
 	bool			is_idle;
 	bool			must_forward_clk;
+	bool			block_softirq;
 	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
 	struct hlist_head	vectors[WHEEL_SIZE];
 	struct hlist_head	expired_lists[LVL_DEPTH];
@@ -1404,9 +1405,11 @@ static int __collect_expired_timers(struct timer_base *base)
 
 	/*
 	 * expire_timers() must be called at least once before we can
-	 * collect more timers.
+	 * collect more timers. We should never hit this case unless
+	 * TIMER_SOFTIRQ got raised without expired timers.
 	 */
-	if (base->expired_levels)
+	if (WARN_ONCE(base->expired_levels,
+			"Must expire collected timers before collecting more"))
 		return base->expired_levels;
 
 	clk = base->clk;
@@ -1748,6 +1751,9 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 	__run_timers(base);
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON))
 		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
+
+	/* Allow new TIMER_SOFTIRQs to get scheduled by run_local_timers() */
+	base->block_softirq = false;
 }
 
 /*
@@ -1758,6 +1764,14 @@ void run_local_timers(void)
 	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
 
 	hrtimer_run_queues();
+
+	/*
+	 * Skip if TIMER_SOFTIRQ is already running on this CPU, since it
+	 * will find and expire all timers up to current jiffies.
+	 */
+	if (base->block_softirq)
+		return;
+
 	/* Raise the softirq only if required. */
 	if (time_before(jiffies, base->clk) || !tick_find_expired(base)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON))
@@ -1766,7 +1780,10 @@ void run_local_timers(void)
 		base++;
 		if (time_before(jiffies, base->clk) || !tick_find_expired(base))
 			return;
+		base--;
 	}
+
+	base->block_softirq = true;
 	raise_softirq(TIMER_SOFTIRQ);
 }
 
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 1/2] timers: Don't wake ktimersoftd on every tick
       [not found]                           ` <1523332961.4827.1.camel@gmx.de>
@ 2018-04-12 15:00                             ` Haris Okanovic
  2018-06-19 12:43                               ` Daniel Bristot de Oliveira
  2018-06-28 16:36                               ` Haris Okanovic
  0 siblings, 2 replies; 42+ messages in thread
From: Haris Okanovic @ 2018-04-12 15:00 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-rt-users, linux-kernel

Hi Mike,

I haven't tested the patch with wireshark until now. My system also 
hangs shortly after it starts. I'm pretty sure I hit workqueues in my 
earlier tests via the block driver, but it's clearly not whatever 
wireshark is using. I'll look at it and try to post a fix. CCing the 
list so others avoid this patch until then.

Thanks,
Haris


On 04/09/2018 11:02 PM, Mike Galbraith wrote:
> Hi,
> 
> Are these patches dead, or are you planning another submission?  I ask
> because I discovered that with them applied, firing up wireshark hangs
> the box 100% repeatably, with wireshark never fully initializing.
> 
> Applying them to an otherwise virgin 4.14.20-rt17 to be sure...
> 
> crash> bt 6016
> PID: 6016   TASK: ffff95dd68572180  CPU: 2   COMMAND: "dumpcap"
>   #0 [ffffb490094f3bc0] __schedule at ffffffffa56d55b9
>   #1 [ffffb490094f3c40] schedule at ffffffffa56d5a03
>   #2 [ffffb490094f3c58] schedule_timeout at ffffffffa56d8467
>   #3 [ffffb490094f3cd8] wait_for_completion at ffffffffa56d6e34
>   #4 [ffffb490094f3d18] __wait_rcu_gp at ffffffffa50e59cd
>   #5 [ffffb490094f3d58] synchronize_rcu at ffffffffa50ec14e
>   #6 [ffffb490094f3d98] packet_set_ring at ffffffffc0c74da0 [af_packet]
>   #7 [ffffb490094f3e50] packet_setsockopt at ffffffffc0c75d23 [af_packet]
>   #8 [ffffb490094f3ef8] sys_setsockopt at ffffffffa558a5e2
>   #9 [ffffb490094f3f30] do_syscall_64 at ffffffffa5001b05
> #10 [ffffb490094f3f50] entry_SYSCALL_64_after_hwframe at ffffffffa5800065
>      RIP: 00007f3107a1cfaa  RSP: 00007ffc9745c2e8  RFLAGS: 00000246
>      RAX: ffffffffffffffda  RBX: 0000000000000001  RCX: 00007f3107a1cfaa
>      RDX: 0000000000000005  RSI: 0000000000000107  RDI: 0000000000000003
>      RBP: 000055ae1d8eb470   R8: 000000000000001c   R9: 0000000000000002
>      R10: 00007ffc9745c350  R11: 0000000000000246  R12: 00007ffc9745c350
>      R13: 0000000000000000  R14: 000055ae1d8eb200  R15: 000055ae1d8eb2d0
>      ORIG_RAX: 0000000000000036  CS: 0033  SS: 002b
> crash> dmesg
> ...
> [  483.808197] BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 52s!
> [  483.808204] Showing busy workqueues and worker pools:
> [  483.808206] workqueue events: flags=0x0
> [  483.808208]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [  483.808211]     pending: check_corruption
> [  492.695124] sysrq: SysRq : Trigger a crash
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 1/2] timers: Don't wake ktimersoftd on every tick
  2018-04-12 15:00                             ` [PATCH v4 1/2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
@ 2018-06-19 12:43                               ` Daniel Bristot de Oliveira
  2018-06-20 14:24                                 ` Haris Okanovic
  2018-06-28 16:36                               ` Haris Okanovic
  1 sibling, 1 reply; 42+ messages in thread
From: Daniel Bristot de Oliveira @ 2018-06-19 12:43 UTC (permalink / raw)
  To: Haris Okanovic, Mike Galbraith; +Cc: linux-rt-users, linux-kernel



On 04/12/2018 05:00 PM, Haris Okanovic wrote:
> Hi Mike,
> 
> I haven't tested the patch with wireshark until now. My system also
> hangs shortly after it starts. I'm pretty sure I hit workqueues in my
> earlier tests via the block driver, but it's clearly not whatever
> wireshark is using. I'll look at it and try to post a fix. CCing the
> list so others avoid this patch until then.

Hi Haris,

Do we have any update regarding a newer version of this patch?

-- Daniel

> Thanks,
> Haris
> 
> 
> On 04/09/2018 11:02 PM, Mike Galbraith wrote:
>> Hi,
>>
>> Are these patches dead, or are you planning another submission?  I ask
>> because I discovered that with them applied, firing up wireshark hangs
>> the box 100% repeatably, with wireshark never fully initializing.
>>
>> Applying them to an otherwise virgin 4.14.20-rt17 to be sure...
>>
>> crash> bt 6016
>> PID: 6016   TASK: ffff95dd68572180  CPU: 2   COMMAND: "dumpcap"
>>   #0 [ffffb490094f3bc0] __schedule at ffffffffa56d55b9
>>   #1 [ffffb490094f3c40] schedule at ffffffffa56d5a03
>>   #2 [ffffb490094f3c58] schedule_timeout at ffffffffa56d8467
>>   #3 [ffffb490094f3cd8] wait_for_completion at ffffffffa56d6e34
>>   #4 [ffffb490094f3d18] __wait_rcu_gp at ffffffffa50e59cd
>>   #5 [ffffb490094f3d58] synchronize_rcu at ffffffffa50ec14e
>>   #6 [ffffb490094f3d98] packet_set_ring at ffffffffc0c74da0 [af_packet]
>>   #7 [ffffb490094f3e50] packet_setsockopt at ffffffffc0c75d23 [af_packet]
>>   #8 [ffffb490094f3ef8] sys_setsockopt at ffffffffa558a5e2
>>   #9 [ffffb490094f3f30] do_syscall_64 at ffffffffa5001b05
>> #10 [ffffb490094f3f50] entry_SYSCALL_64_after_hwframe at ffffffffa5800065
>>      RIP: 00007f3107a1cfaa  RSP: 00007ffc9745c2e8  RFLAGS: 00000246
>>      RAX: ffffffffffffffda  RBX: 0000000000000001  RCX: 00007f3107a1cfaa
>>      RDX: 0000000000000005  RSI: 0000000000000107  RDI: 0000000000000003
>>      RBP: 000055ae1d8eb470   R8: 000000000000001c   R9: 0000000000000002
>>      R10: 00007ffc9745c350  R11: 0000000000000246  R12: 00007ffc9745c350
>>      R13: 0000000000000000  R14: 000055ae1d8eb200  R15: 000055ae1d8eb2d0
>>      ORIG_RAX: 0000000000000036  CS: 0033  SS: 002b
>> crash> dmesg
>> ...
>> [  483.808197] BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0
>> nice=0 stuck for 52s!
>> [  483.808204] Showing busy workqueues and worker pools:
>> [  483.808206] workqueue events: flags=0x0
>> [  483.808208]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
>> [  483.808211]     pending: check_corruption
>> [  492.695124] sysrq: SysRq : Trigger a crash
>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe
> linux-rt-users" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 1/2] timers: Don't wake ktimersoftd on every tick
  2018-06-19 12:43                               ` Daniel Bristot de Oliveira
@ 2018-06-20 14:24                                 ` Haris Okanovic
  0 siblings, 0 replies; 42+ messages in thread
From: Haris Okanovic @ 2018-06-20 14:24 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira; +Cc: Mike Galbraith, linux-rt-users, linux-kernel

Hi Daniel,

I didn't have time to look at it yet, aside from confirming Mike's issue 
reproduces on my machine. It's still on my backlog to investigate.

-- Haris


On 06/19/2018 07:43 AM, Daniel Bristot de Oliveira wrote:
> 
> 
> On 04/12/2018 05:00 PM, Haris Okanovic wrote:
>> Hi Mike,
>>
>> I haven't tested the patch with wireshark until now. My system also
>> hangs shortly after it starts. I'm pretty sure I hit workqueues in my
>> earlier tests via the block driver, but it's clearly not whatever
>> wireshark is using. I'll look at it and try to post a fix. CCing the
>> list so others avoid this patch until then.
> 
> Hi Haris,
> 
> Do we have any update regarding a newer version of this patch?
> 
> -- Daniel
> 
>> Thanks,
>> Haris
>>
>>
>> On 04/09/2018 11:02 PM, Mike Galbraith wrote:
>>> Hi,
>>>
>>> Are these patches dead, or are you planning another submission?  I ask
>>> because I discovered that with them applied, firing up wireshark hangs
>>> the box 100% repeatably, with wireshark never fully initializing.
>>>
>>> Applying them to an otherwise virgin 4.14.20-rt17 to be sure...
>>>
>>> crash> bt 6016
>>> PID: 6016   TASK: ffff95dd68572180  CPU: 2   COMMAND: "dumpcap"
>>>    #0 [ffffb490094f3bc0] __schedule at ffffffffa56d55b9
>>>    #1 [ffffb490094f3c40] schedule at ffffffffa56d5a03
>>>    #2 [ffffb490094f3c58] schedule_timeout at ffffffffa56d8467
>>>    #3 [ffffb490094f3cd8] wait_for_completion at ffffffffa56d6e34
>>>    #4 [ffffb490094f3d18] __wait_rcu_gp at ffffffffa50e59cd
>>>    #5 [ffffb490094f3d58] synchronize_rcu at ffffffffa50ec14e
>>>    #6 [ffffb490094f3d98] packet_set_ring at ffffffffc0c74da0 [af_packet]
>>>    #7 [ffffb490094f3e50] packet_setsockopt at ffffffffc0c75d23 [af_packet]
>>>    #8 [ffffb490094f3ef8] sys_setsockopt at ffffffffa558a5e2
>>>    #9 [ffffb490094f3f30] do_syscall_64 at ffffffffa5001b05
>>> #10 [ffffb490094f3f50] entry_SYSCALL_64_after_hwframe at ffffffffa5800065
>>>       RIP: 00007f3107a1cfaa  RSP: 00007ffc9745c2e8  RFLAGS: 00000246
>>>       RAX: ffffffffffffffda  RBX: 0000000000000001  RCX: 00007f3107a1cfaa
>>>       RDX: 0000000000000005  RSI: 0000000000000107  RDI: 0000000000000003
>>>       RBP: 000055ae1d8eb470   R8: 000000000000001c   R9: 0000000000000002
>>>       R10: 00007ffc9745c350  R11: 0000000000000246  R12: 00007ffc9745c350
>>>       R13: 0000000000000000  R14: 000055ae1d8eb200  R15: 000055ae1d8eb2d0
>>>       ORIG_RAX: 0000000000000036  CS: 0033  SS: 002b
>>> crash> dmesg
>>> ...
>>> [  483.808197] BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0
>>> nice=0 stuck for 52s!
>>> [  483.808204] Showing busy workqueues and worker pools:
>>> [  483.808206] workqueue events: flags=0x0
>>> [  483.808208]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
>>> [  483.808211]     pending: check_corruption
>>> [  492.695124] sysrq: SysRq : Trigger a crash
>>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe
>> linux-rt-users" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 1/2] timers: Don't wake ktimersoftd on every tick
  2018-04-12 15:00                             ` [PATCH v4 1/2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
  2018-06-19 12:43                               ` Daniel Bristot de Oliveira
@ 2018-06-28 16:36                               ` Haris Okanovic
  2018-06-28 16:40                                 ` [PATCH v5 " Haris Okanovic
  1 sibling, 1 reply; 42+ messages in thread
From: Haris Okanovic @ 2018-06-28 16:36 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-rt-users, linux-kernel, daniel, hokanovi

I found the problem: Running `dumpcap -D` (e.g. by wireshark) creates a 
timer that's sometimes re-armed with a 0 timeout in its callback function 
prb_retire_rx_blk_timer_expired(). My patch subtly changed 
__run_timers()'s stop condition, which causes ktimersoftd to spin 
when such a timer is present. This blocks all other timers, including 
those from workqueues.

ktimersoftd stack:
  __run_timers()
  [...]
  call_timer_fn()
  prb_retire_rx_blk_timer_expired()
  _prb_refresh_rx_retire_blk_timer()
  mod_timer(..., jiffies + 0)

The current implementation deals with this corner case by deferring 
0-timeout timers to the next tick, where they expire late. I'll submit 
a v5 shortly to match this behavior.
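
For reference, a minimal sketch of the self-rearming pattern -- my 
illustration with made-up names, not the actual af_packet code; it 
assumes the pre-4.15 timer API (setup_timer() with an unsigned long 
callback argument):

  #include <linux/timer.h>
  #include <linux/jiffies.h>

  static struct timer_list retire_timer;

  /* Callback immediately re-arms the timer for the current jiffy. */
  static void retire_timer_fn(unsigned long data)
  {
          /*
           * Under the v4 patch, __run_timers() loops "expire, then
           * collect up to jiffies", so this timer is re-collected on
           * every pass and ktimersoftd never leaves the loop.
           */
          mod_timer(&retire_timer, jiffies + 0);
  }

  static void retire_timer_init(void)
  {
          setup_timer(&retire_timer, retire_timer_fn, 0);
          mod_timer(&retire_timer, jiffies + 1);
  }

Deferring the zero-offset re-arm to the next tick, as mainline does, 
keeps the expiry loop bounded at the cost of firing such timers one 
tick late.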

Thanks again for reporting this issue, Mike. It's a good find! And my 
apologies for the late fix. I've been busy the last few months.

-- Haris


On 04/12/2018 10:00 AM, Haris Okanovic wrote:
> Hi Mike,
> 
> I haven't tested the patch with wireshark until now. My system also 
> hangs shortly after it starts. I'm pretty sure I hit workqueues in my 
> earlier tests via the block driver, but it's clearly not whatever 
> wireshark is using. I'll look at it and try to post a fix. CCing the
> list so others avoid this patch until then.
> 
> Thanks,
> Haris
> 
> 
> On 04/09/2018 11:02 PM, Mike Galbraith wrote:
>> Hi,
>>
>> Are these patches dead, or are you planning another submission?  I ask
>> because I discovered that with them applied, firing up wireshark hangs
>> the box 100% repeatably, with wireshark never fully initializing.
>>
>> Applying them to an otherwise virgin 4.14.20-rt17 to be sure...
>>
>> crash> bt 6016
>> PID: 6016   TASK: ffff95dd68572180  CPU: 2   COMMAND: "dumpcap"
>>   #0 [ffffb490094f3bc0] __schedule at ffffffffa56d55b9
>>   #1 [ffffb490094f3c40] schedule at ffffffffa56d5a03
>>   #2 [ffffb490094f3c58] schedule_timeout at ffffffffa56d8467
>>   #3 [ffffb490094f3cd8] wait_for_completion at ffffffffa56d6e34
>>   #4 [ffffb490094f3d18] __wait_rcu_gp at ffffffffa50e59cd
>>   #5 [ffffb490094f3d58] synchronize_rcu at ffffffffa50ec14e
>>   #6 [ffffb490094f3d98] packet_set_ring at ffffffffc0c74da0 [af_packet]
>>   #7 [ffffb490094f3e50] packet_setsockopt at ffffffffc0c75d23 [af_packet]
>>   #8 [ffffb490094f3ef8] sys_setsockopt at ffffffffa558a5e2
>>   #9 [ffffb490094f3f30] do_syscall_64 at ffffffffa5001b05
>> #10 [ffffb490094f3f50] entry_SYSCALL_64_after_hwframe at ffffffffa5800065
>>      RIP: 00007f3107a1cfaa  RSP: 00007ffc9745c2e8  RFLAGS: 00000246
>>      RAX: ffffffffffffffda  RBX: 0000000000000001  RCX: 00007f3107a1cfaa
>>      RDX: 0000000000000005  RSI: 0000000000000107  RDI: 0000000000000003
>>      RBP: 000055ae1d8eb470   R8: 000000000000001c   R9: 0000000000000002
>>      R10: 00007ffc9745c350  R11: 0000000000000246  R12: 00007ffc9745c350
>>      R13: 0000000000000000  R14: 000055ae1d8eb200  R15: 000055ae1d8eb2d0
>>      ORIG_RAX: 0000000000000036  CS: 0033  SS: 002b
>> crash> dmesg
>> ...
>> [  483.808197] BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 
>> nice=0 stuck for 52s!
>> [  483.808204] Showing busy workqueues and worker pools:
>> [  483.808206] workqueue events: flags=0x0
>> [  483.808208]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
>> [  483.808211]     pending: check_corruption
>> [  492.695124] sysrq: SysRq : Trigger a crash
>>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v5 1/2] timers: Don't wake ktimersoftd on every tick
  2018-06-28 16:36                               ` Haris Okanovic
@ 2018-06-28 16:40                                 ` Haris Okanovic
  2018-06-28 16:40                                   ` [PATCH v5 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled Haris Okanovic
  2018-07-13 12:01                                   ` [PATCH v5 1/2] timers: Don't wake ktimersoftd on every tick Anna-Maria Gleixner
  0 siblings, 2 replies; 42+ messages in thread
From: Haris Okanovic @ 2018-06-28 16:40 UTC (permalink / raw)
  To: linux-rt-users, linux-kernel
  Cc: haris.okanovic, harisokn, bigeasy, tglx, julia.cartwright,
	gratian.crisan, anna-maria, daniel, efault

Collect expired timers in interrupt context to avoid overhead of waking
ktimersoftd on every scheduler tick.

This is implemented by storing lists of expired timers in the timer_base
struct, which is updated by the interrupt routine on each tick in
run_local_timers(). TIMER softirq (ktimersoftd) is then raised only when
one or more expired timers are collected.

Performance impact on a 2core Intel Atom E3825 system:
 * reduction in small latency spikes measured by cyclictest
 * ~30% fewer context-switches measured by perf
 * run_local_timers() execution time increases by 0.2us as measured by TSC

Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>

---
[PATCH v2] Applied Thomas Gleixner's suggestions:
 - Fix expired_count race
 - Remove unneeded base->clk lookahead
 - Return expired_count in collect_expired_timers()
 - Add block_softirq
 - Rebase to v4.11.8-rt5
[PATCH v3]
 - Fix cosmetic issues
 - Rename "count" to "levels" in timer_base and various functions
 - Move expired_levels and block_softirq to fill holes in timer_base
 - Remove READ_ONCE/WRITE_ONCE around block_softirq
[PATCH v4]
 - Rebase onto v4.14.20-rt17
[PATCH v5]
 - Fix hang when timer is rearmed with 0 offset in its callback
 - Rewrite description

https://github.com/harisokanovic/linux/tree/dev/hokanovi/timer-peek-v7
---
 kernel/time/timer.c | 111 ++++++++++++++++++++++++++++++++------------
 1 file changed, 82 insertions(+), 29 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 96fd01c9f6b1..dd67c18c16d0 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -197,6 +197,7 @@ EXPORT_SYMBOL(jiffies_64);
 
 struct timer_base {
 	raw_spinlock_t		lock;
+	int					expired_levels;
 	struct timer_list	*running_timer;
 #ifdef CONFIG_PREEMPT_RT_FULL
 	struct swait_queue_head	wait_for_running_timer;
@@ -208,6 +209,7 @@ struct timer_base {
 	bool			must_forward_clk;
 	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
 	struct hlist_head	vectors[WHEEL_SIZE];
+	struct hlist_head	expired_lists[LVL_DEPTH];
 } ____cacheline_aligned;
 
 static DEFINE_PER_CPU(struct timer_base, timer_bases[NR_BASES]);
@@ -1342,7 +1344,8 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(unsigned long),
 	}
 }
 
-static void expire_timers(struct timer_base *base, struct hlist_head *head)
+static void __expire_timers(struct timer_base *base,
+				   struct hlist_head *head)
 {
 	while (!hlist_empty(head)) {
 		struct timer_list *timer;
@@ -1372,21 +1375,49 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
 	}
 }
 
-static int __collect_expired_timers(struct timer_base *base,
-				    struct hlist_head *heads)
+static void expire_timers(struct timer_base *base)
+{
+	struct hlist_head *head;
+	int levels = base->expired_levels;
+
+	while (levels--) {
+		head = base->expired_lists + levels;
+		__expire_timers(base, head);
+	}
+
+	/*
+	 * Zero base->expired_levels after processing all base->expired_lists
+	 * to signal it's ready to get re-populated. Otherwise, we race with
+	 * tick_find_expired() when base->lock is temporarily dropped in
+	 * __expire_timers()
+	 */
+	base->expired_levels = 0;
+}
+
+static int __collect_expired_timers(struct timer_base *base)
 {
-	unsigned long clk = base->clk;
 	struct hlist_head *vec;
-	int i, levels = 0;
+	struct hlist_head *expired_list = base->expired_lists;
+	unsigned long clk;
+	int i;
 	unsigned int idx;
 
+	/*
+	 * expire_timers() must be called at least once before we can
+	 * collect more timers.
+	 */
+	if (base->expired_levels)
+		return base->expired_levels;
+
+	clk = base->clk;
 	for (i = 0; i < LVL_DEPTH; i++) {
 		idx = (clk & LVL_MASK) + i * LVL_SIZE;
 
 		if (__test_and_clear_bit(idx, base->pending_map)) {
 			vec = base->vectors + idx;
-			hlist_move_list(vec, heads++);
-			levels++;
+			hlist_move_list(vec, expired_list);
+			base->expired_levels++;
+			expired_list++;
 		}
 		/* Is it time to look at the next level? */
 		if (clk & LVL_CLK_MASK)
@@ -1394,7 +1425,8 @@ static int __collect_expired_timers(struct timer_base *base,
 		/* Shift clock for the next level granularity */
 		clk >>= LVL_CLK_SHIFT;
 	}
-	return levels;
+
+	return base->expired_levels;
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
@@ -1593,8 +1625,7 @@ void timer_clear_idle(void)
 	base->is_idle = false;
 }
 
-static int collect_expired_timers(struct timer_base *base,
-				  struct hlist_head *heads)
+static int collect_expired_timers(struct timer_base *base)
 {
 	/*
 	 * NOHZ optimization. After a long idle sleep we need to forward the
@@ -1615,16 +1646,47 @@ static int collect_expired_timers(struct timer_base *base,
 		}
 		base->clk = next;
 	}
-	return __collect_expired_timers(base, heads);
+	return __collect_expired_timers(base);
 }
 #else
-static inline int collect_expired_timers(struct timer_base *base,
-					 struct hlist_head *heads)
+static inline int collect_expired_timers(struct timer_base *base)
 {
-	return __collect_expired_timers(base, heads);
+	return __collect_expired_timers(base);
 }
 #endif
 
+/*
+ * Increments timer_base to current jiffies or until first expired
+ * timer is found. Return number of expired levels.
+ */
+static int find_expired_timers(struct timer_base *base)
+{
+	unsigned long int end_clk = jiffies;
+	int expired_levels = 0;
+
+	while (time_after_eq(end_clk, base->clk) && !expired_levels) {
+		expired_levels = collect_expired_timers(base);
+		base->clk++;
+	}
+
+	return expired_levels;
+}
+
+/*
+ * Called from CPU tick routine to collect expired timers up to current
+ * jiffies. Return number of expired levels.
+ */
+static int tick_find_expired(struct timer_base *base)
+{
+	int levels;
+
+	raw_spin_lock(&base->lock);
+	levels = find_expired_timers(base);
+	raw_spin_unlock(&base->lock);
+
+	return levels;
+}
+
 /*
  * Called from the timer interrupt handler to charge one tick to the current
  * process.  user_tick is 1 if the tick is user time, 0 for system.
@@ -1652,22 +1714,12 @@ void update_process_times(int user_tick)
  */
 static inline void __run_timers(struct timer_base *base)
 {
-	struct hlist_head heads[LVL_DEPTH];
-	int levels;
-
-	if (!time_after_eq(jiffies, base->clk))
-		return;
-
 	raw_spin_lock_irq(&base->lock);
 
-	while (time_after_eq(jiffies, base->clk)) {
+	do {
+		expire_timers(base);
+	} while (find_expired_timers(base));
 
-		levels = collect_expired_timers(base, heads);
-		base->clk++;
-
-		while (levels--)
-			expire_timers(base, heads + levels);
-	}
 	raw_spin_unlock_irq(&base->lock);
 	wakeup_timer_waiters(base);
 }
@@ -1707,12 +1759,12 @@ void run_local_timers(void)
 
 	hrtimer_run_queues();
 	/* Raise the softirq only if required. */
-	if (time_before(jiffies, base->clk)) {
+	if (time_before(jiffies, base->clk) || !tick_find_expired(base)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON))
 			return;
 		/* CPU is awake, so check the deferrable base. */
 		base++;
-		if (time_before(jiffies, base->clk))
+		if (time_before(jiffies, base->clk) || !tick_find_expired(base))
 			return;
 	}
 	raise_softirq(TIMER_SOFTIRQ);
@@ -1887,6 +1939,7 @@ int timers_dead_cpu(unsigned int cpu)
 		raw_spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING);
 
 		BUG_ON(old_base->running_timer);
+		BUG_ON(old_base->expired_levels);
 
 		for (i = 0; i < WHEEL_SIZE; i++)
 			migrate_timer_list(new_base, old_base->vectors + i);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v5 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled
  2018-06-28 16:40                                 ` [PATCH v5 " Haris Okanovic
@ 2018-06-28 16:40                                   ` Haris Okanovic
  2018-07-13 12:01                                   ` [PATCH v5 1/2] timers: Don't wake ktimersoftd on every tick Anna-Maria Gleixner
  1 sibling, 0 replies; 42+ messages in thread
From: Haris Okanovic @ 2018-06-28 16:40 UTC (permalink / raw)
  To: linux-rt-users, linux-kernel
  Cc: haris.okanovic, harisokn, bigeasy, tglx, julia.cartwright,
	gratian.crisan, anna-maria, daniel, efault

This change avoids needlessly searching for more timers in
run_local_timers() (hard interrupt context) when they can't fire.
For example, when ktimersoftd/run_timer_softirq() is scheduled but
preempted due to cpu contention. When it runs, run_timer_softirq() will
discover newly expired timers up to current jiffies in addition to
firing previously expired timers.

However, this change also adds an edge case where non-hrtimer firing
is sometimes delayed by an additional tick. This is acceptable since we
don't make latency guarantees for non-hrtimers and would prefer to
minimize hard interrupt time instead.

Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
---
[PATCH v3]
 - Split block_softirq into separate commit
[PATCH v4]
 - Rebase onto v4.14.20-rt17
[PATCH v5]
   no change

https://github.com/harisokanovic/linux/tree/dev/hokanovi/timer-peek-v7
---
 kernel/time/timer.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index dd67c18c16d0..723c3667de2b 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -207,6 +207,7 @@ struct timer_base {
 	unsigned int		cpu;
 	bool			is_idle;
 	bool			must_forward_clk;
+	bool			block_softirq;
 	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
 	struct hlist_head	vectors[WHEEL_SIZE];
 	struct hlist_head	expired_lists[LVL_DEPTH];
@@ -1404,9 +1405,11 @@ static int __collect_expired_timers(struct timer_base *base)
 
 	/*
 	 * expire_timers() must be called at least once before we can
-	 * collect more timers.
+	 * collect more timers. We should never hit this case unless
+	 * TIMER_SOFTIRQ got raised without expired timers.
 	 */
-	if (base->expired_levels)
+	if (WARN_ONCE(base->expired_levels,
+			"Must expire collected timers before collecting more"))
 		return base->expired_levels;
 
 	clk = base->clk;
@@ -1748,6 +1751,9 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 	__run_timers(base);
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON))
 		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
+
+	/* Allow new TIMER_SOFTIRQs to get scheduled by run_local_timers() */
+	base->block_softirq = false;
 }
 
 /*
@@ -1758,6 +1764,14 @@ void run_local_timers(void)
 	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
 
 	hrtimer_run_queues();
+
+	/*
+	 * Skip if TIMER_SOFTIRQ is already running on this CPU, since it
+	 * will find and expire all timers up to current jiffies.
+	 */
+	if (base->block_softirq)
+		return;
+
 	/* Raise the softirq only if required. */
 	if (time_before(jiffies, base->clk) || !tick_find_expired(base)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON))
@@ -1766,7 +1780,10 @@ void run_local_timers(void)
 		base++;
 		if (time_before(jiffies, base->clk) || !tick_find_expired(base))
 			return;
+		base--;
 	}
+
+	base->block_softirq = true;
 	raise_softirq(TIMER_SOFTIRQ);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v5 1/2] timers: Don't wake ktimersoftd on every tick
  2018-06-28 16:40                                 ` [PATCH v5 " Haris Okanovic
  2018-06-28 16:40                                   ` [PATCH v5 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled Haris Okanovic
@ 2018-07-13 12:01                                   ` Anna-Maria Gleixner
  2018-07-13 14:37                                     ` Haris Okanovic
  1 sibling, 1 reply; 42+ messages in thread
From: Anna-Maria Gleixner @ 2018-07-13 12:01 UTC (permalink / raw)
  To: Haris Okanovic
  Cc: linux-rt-users, linux-kernel, harisokn, bigeasy, tglx,
	julia.cartwright, gratian.crisan, daniel, efault

Hi Haris,

On Thu, 28 Jun 2018, Haris Okanovic wrote:

> Collect expired timers in interrupt context to avoid overhead of waking
> ktimersoftd on every scheduler tick.
> 
> This is implemented by storing lists of expired timers in the timer_base
> struct, which is updated by the interrupt routine on each tick in
> run_local_timers(). TIMER softirq (ktimersoftd) is then raised only when
> one or more expired timers are collected.
> 
> Performance impact on a 2-core Intel Atom E3825 system:
>  * reduction in small latency spikes measured by cyclictest
>  * ~30% fewer context-switches measured by perf
>  * run_local_timers() execution time increases by 0.2 (measured by TSC)
> 

I'm also working on timer improvements at the moment. Once I have fixed
all the bugs in my implementation (there is one last horrible one), I'm
very interested in integrating your patches into my testing so that I
can give you a Tested-by.

Thanks,

	Anna-Maria

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v5 1/2] timers: Don't wake ktimersoftd on every tick
  2018-07-13 12:01                                   ` [PATCH v5 1/2] timers: Don't wake ktimersoftd on every tick Anna-Maria Gleixner
@ 2018-07-13 14:37                                     ` Haris Okanovic
  0 siblings, 0 replies; 42+ messages in thread
From: Haris Okanovic @ 2018-07-13 14:37 UTC (permalink / raw)
  To: Anna-Maria Gleixner
  Cc: linux-rt-users, linux-kernel, harisokn, bigeasy, tglx,
	julia.cartwright, gratian.crisan, daniel, efault

Sounds good. I'll keep an eye out for your patch set and try it on my 
boards as well. CC me if you can.

-- Haris


On 07/13/2018 07:01 AM, Anna-Maria Gleixner wrote:
> Hi Haris,
> 
> On Thu, 28 Jun 2018, Haris Okanovic wrote:
> 
>> Collect expired timers in interrupt context to avoid overhead of waking
>> ktimersoftd on every scheduler tick.
>>
>> This is implemented by storing lists of expired timers in the timer_base
>> struct, which is updated by the interrupt routine on each tick in
>> run_local_timers(). TIMER softirq (ktimersoftd) is then raised only when
>> one or more expired timers are collected.
>>
>> Performance impact on a 2-core Intel Atom E3825 system:
>>   * reduction in small latency spikes measured by cyclictest
>>   * ~30% fewer context-switches measured by perf
>>   * run_local_timers() execution time increases by 0.2 (measured by TSC)
>>
> 
> I'm also working on timer improvements at the moment. Once I have fixed
> all the bugs in my implementation (there is one last horrible one), I'm
> very interested in integrating your patches into my testing so that I
> can give you a Tested-by.
> 
> Thanks,
> 
> 	Anna-Maria
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2018-07-13 14:38 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-13 21:44 [RFC v2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
2016-12-23 17:28 ` Sebastian Andrzej Siewior
2016-12-28 18:06   ` Haris Okanovic
2017-02-03 16:51 ` Sebastian Andrzej Siewior
2017-02-03 18:21   ` [PATCH] " Haris Okanovic
2017-02-10 17:02     ` Sebastian Andrzej Siewior
2017-05-26 17:16       ` [PATCH] Revert "timers: Don't wake ktimersoftd on every tick" Anna-Maria Gleixner
2017-05-26 19:27         ` Haris Okanovic
2017-05-26 19:49           ` Thomas Gleixner
2017-05-26 20:25             ` Haris Okanovic
2017-05-26 20:50               ` Thomas Gleixner
2017-06-02 14:37                 ` Haris Okanovic
2017-06-04 14:17                   ` Thomas Gleixner
2017-07-17 21:58                     ` Haris Okanovic
2017-07-17 22:04                     ` [PATCH v2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
2017-07-18 21:33                       ` Thomas Gleixner
2017-08-03 21:04                         ` Haris Okanovic
2017-08-03 21:06                         ` [PATCH v3 1/2] " Haris Okanovic
2017-08-03 21:06                           ` [PATCH v3 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled Haris Okanovic
2018-01-05 19:37                             ` Haris Okanovic
2018-03-01 15:49                               ` Haris Okanovic
2018-03-01 15:54                                 ` Thomas Gleixner
2018-03-01 16:35                                   ` Haris Okanovic
2018-03-01 16:47                                 ` Sebastian Andrzej Siewior
2018-03-01 18:37                                   ` Haris Okanovic
2018-03-01 19:06                                     ` Sebastian Andrzej Siewior
2018-03-02 14:52                                     ` Sebastian Andrzej Siewior
2018-03-02 16:29                                       ` Haris Okanovic
2018-03-02 16:39                                         ` Sebastian Andrzej Siewior
2018-03-02 17:19                                           ` Haris Okanovic
2018-03-06 23:39                         ` [PATCH v4 1/2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
2018-03-06 23:39                           ` [PATCH v4 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled Haris Okanovic
     [not found]                           ` <1523332961.4827.1.camel@gmx.de>
2018-04-12 15:00                             ` [PATCH v4 1/2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
2018-06-19 12:43                               ` Daniel Bristot de Oliveira
2018-06-20 14:24                                 ` Haris Okanovic
2018-06-28 16:36                               ` Haris Okanovic
2018-06-28 16:40                                 ` [PATCH v5 " Haris Okanovic
2018-06-28 16:40                                   ` [PATCH v5 2/2] timers: Don't search for expired timers while TIMER_SOFTIRQ is scheduled Haris Okanovic
2018-07-13 12:01                                   ` [PATCH v5 1/2] timers: Don't wake ktimersoftd on every tick Anna-Maria Gleixner
2018-07-13 14:37                                     ` Haris Okanovic
2017-05-27  7:47         ` [PATCH] Revert "timers: Don't wake ktimersoftd on every tick" Sebastian Andrzej Siewior
2017-02-03 18:27   ` [RFC v2] timers: Don't wake ktimersoftd on every tick Haris Okanovic
