All of lore.kernel.org
 help / color / mirror / Atom feed
* 4.14-rt timer issues using PREEMPT_RT_FULL=y and NO_HZ_FULL_ALL=y
@ 2017-12-12 21:58 bert schulze
  2017-12-13 17:53 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 8+ messages in thread
From: bert schulze @ 2017-12-12 21:58 UTC (permalink / raw)
  To: linux-rt-users

Hi folks,

I'm having issues with v4.14-rt1 to v4.14.3-rt5 using NO_HZ_FULL_ALL=y
with PREEMPT_RT_FULL=y and kernel.timer_migration enabled (which seems
to be enabled by default).

Full config used: http://paste.debian.net/hidden/eb51a120/

The kernel either boots fine or may lock up on boot already (sysrq is
working still and boot continues after some seconds upto minutes).

If any hang occurred on boot dmesg will contain:
root@deb9:~# dmesg | grep hrtimer
[    1.507207] hrtimer: interrupt took 28740 ns

If the system booted up fine (-> no "interrupt took #### ns" message)
it behaves as expected as long as timer migration was disabled.

root@deb9:~# echo 0 > /proc/sys/kernel/timer_migration 

A simple sleep (or anything else using nanosleep() is sufficient to
reproduce this.


The expected behaviour with kernel.timer_migration = 0

root@deb9:~# grep LOC: /proc/interrupts 
LOC:     91968       801       775       590   Local timer interrupts

root@deb9:~# for cpu in {0..3} ;do time taskset -ac $cpu sleep 0.1 ;done 
real    0m0.104s  // CPU0 ok
real    0m0.104s  // CPU1 ok
real    0m0.104s  // CPU2 ok
real    0m0.105s  // CPU3 ok

root@deb9:~# grep LOC: /proc/interrupts 
LOC:    101069       824       782       599   Local timer interrupts

Roughly 10 seconds passed and the housekeeping cpu shows ~10.000 timer
interrupts (which matches up with CONFIG_HZ=1000).


Doing the same with kernel.timer_migration = 1

root@deb9:~# for cpu in {0..3} ;do time taskset -ac $cpu sleep 0.1 ;done 
real    0m0.104s  // CPU0 ok
[  125.282455] hrtimer: interrupt took 2230 ns  <-- 
real    0m28.023s // CPU1 not ok
real    0m9.129s  // CPU2 not ok
real    0m10.000s // CPU3 not ok

The hrtimer: "interrupt took #### ns" message appeared any sleep on the
adaptive-tick cpu are completely off and …

root@deb9:~# grep LOC: /proc/interrupts 
LOC:  12544410       874       828       638   Local timer interrupts

… timer interrupts on the housekeeping cpu advanced by ~12400000 after
roughly 60 seconds even though the system is up for 2 minutes.

root@deb9:~# uptime 
 21:37:14 up 2 min,  1 user,  load average: 0.17, 0.15, 0.06


To rule out my hardware I've successfully reproduced this on i7-6700,
i7-3517u, i7-2xxxHQ hardware as well as in QEMU itself.

Everything is back to normal by passing "nohz_full=" to the kernel to
disable adaptive-tick cpus.

I've furthermore tested v4.13.13-rt5 and WIP.timers branch of tip.git
and both of them are working as expected.


Thanks,
Bert

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 4.14-rt timer issues using PREEMPT_RT_FULL=y and NO_HZ_FULL_ALL=y
  2017-12-12 21:58 4.14-rt timer issues using PREEMPT_RT_FULL=y and NO_HZ_FULL_ALL=y bert schulze
@ 2017-12-13 17:53 ` Sebastian Andrzej Siewior
  2017-12-13 19:14   ` bert schulze
  0 siblings, 1 reply; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-12-13 17:53 UTC (permalink / raw)
  To: bert schulze; +Cc: linux-rt-users

On 2017-12-12 22:58:18 [+0100], bert schulze wrote:
> Hi folks,
Hi,

> I'm having issues with v4.14-rt1 to v4.14.3-rt5 using NO_HZ_FULL_ALL=y
> with PREEMPT_RT_FULL=y and kernel.timer_migration enabled (which seems
> to be enabled by default).
> 
> Full config used: http://paste.debian.net/hidden/eb51a120/
> 
> The kernel either boots fine or may lock up on boot already (sysrq is
> working still and boot continues after some seconds upto minutes).
> 
> If any hang occurred on boot dmesg will contain:
> root@deb9:~# dmesg | grep hrtimer
> [    1.507207] hrtimer: interrupt took 28740 ns

this pops up because for some reason the system setup a lot of timers
and it takes time process them…

> If the system booted up fine (-> no "interrupt took #### ns" message)
> it behaves as expected as long as timer migration was disabled.
> 
> root@deb9:~# echo 0 > /proc/sys/kernel/timer_migration 
> 
> A simple sleep (or anything else using nanosleep() is sufficient to
> reproduce this.
> 
> 
> The expected behaviour with kernel.timer_migration = 0
> 
> root@deb9:~# grep LOC: /proc/interrupts 
> LOC:     91968       801       775       590   Local timer interrupts
> 
> root@deb9:~# for cpu in {0..3} ;do time taskset -ac $cpu sleep 0.1 ;done 
> real    0m0.104s  // CPU0 ok
> real    0m0.104s  // CPU1 ok
> real    0m0.104s  // CPU2 ok
> real    0m0.105s  // CPU3 ok
> 
> root@deb9:~# grep LOC: /proc/interrupts 
> LOC:    101069       824       782       599   Local timer interrupts
> 
> Roughly 10 seconds passed and the housekeeping cpu shows ~10.000 timer
> interrupts (which matches up with CONFIG_HZ=1000).
> 
> 
> Doing the same with kernel.timer_migration = 1
> 
> root@deb9:~# for cpu in {0..3} ;do time taskset -ac $cpu sleep 0.1 ;done 
> real    0m0.104s  // CPU0 ok
> [  125.282455] hrtimer: interrupt took 2230 ns  <-- 
> real    0m28.023s // CPU1 not ok
> real    0m9.129s  // CPU2 not ok
> real    0m10.000s // CPU3 not ok

your timer takes way longer. __hrtimer_init_sleeper() set your timer to
expire in softirq context and this does not happen for cross-base. If
you switch this to hard ctx then they will expire properly. The
interrupt storm remains…

…

> I've furthermore tested v4.13.13-rt5 and WIP.timers branch of tip.git
> and both of them are working as expected.

you have to take into account that you have almost no timers that will
expire in the softirq context. I will check that tomorrow and I expect
that the soft-timer in WIP.timers will also fail to expire in time.

> 
> Thanks,
> Bert

Sebastian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 4.14-rt timer issues using PREEMPT_RT_FULL=y and NO_HZ_FULL_ALL=y
  2017-12-13 17:53 ` Sebastian Andrzej Siewior
@ 2017-12-13 19:14   ` bert schulze
  2017-12-14 16:25     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 8+ messages in thread
From: bert schulze @ 2017-12-13 19:14 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Sebastian Andrzej Siewior (Wed, Dec 13, 2017 at 06:53:39PM +0100):
> On 2017-12-12 22:58:18 [+0100], bert schulze wrote:
> > Hi folks,
> Hi,
> 
> > I'm having issues with v4.14-rt1 to v4.14.3-rt5 using NO_HZ_FULL_ALL=y
> > with PREEMPT_RT_FULL=y and kernel.timer_migration enabled (which seems
> > to be enabled by default).
> > 
> > Full config used: http://paste.debian.net/hidden/eb51a120/
> > 
> > The kernel either boots fine or may lock up on boot already (sysrq is
> > working still and boot continues after some seconds upto minutes).
> > 
> > If any hang occurred on boot dmesg will contain:
> > root@deb9:~# dmesg | grep hrtimer
> > [    1.507207] hrtimer: interrupt took 28740 ns
> 
> this pops up because for some reason the system setup a lot of timers
> and it takes time process them…

Yes as timer migration kicked in on boot causing the interrupt storm.
Disabling it all together there are no more hangs on boot.

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 40ac38728b7a..77acc35c8749 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -220,7 +220,11 @@ static DEFINE_MUTEX(timer_keys_mutex);
 static struct swork_event timer_update_swork;
 
 #ifdef CONFIG_SMP
+#ifdef CONFIG_NO_HZ_FULL_ALL
+unsigned int sysctl_timer_migration = 0;
+#else
 unsigned int sysctl_timer_migration = 1;
+#endif
 
 DEFINE_STATIC_KEY_FALSE(timers_migration_enabled);

--
> 
> > If the system booted up fine (-> no "interrupt took #### ns" message)
> > it behaves as expected as long as timer migration was disabled.
> > 
> > root@deb9:~# echo 0 > /proc/sys/kernel/timer_migration 
> > 
> > A simple sleep (or anything else using nanosleep() is sufficient to
> > reproduce this.
> > 
> > 
> > The expected behaviour with kernel.timer_migration = 0
> > 
> > root@deb9:~# grep LOC: /proc/interrupts 
> > LOC:     91968       801       775       590   Local timer interrupts
> > 
> > root@deb9:~# for cpu in {0..3} ;do time taskset -ac $cpu sleep 0.1 ;done 
> > real    0m0.104s  // CPU0 ok
> > real    0m0.104s  // CPU1 ok
> > real    0m0.104s  // CPU2 ok
> > real    0m0.105s  // CPU3 ok
> > 
> > root@deb9:~# grep LOC: /proc/interrupts 
> > LOC:    101069       824       782       599   Local timer interrupts
> > 
> > Roughly 10 seconds passed and the housekeeping cpu shows ~10.000 timer
> > interrupts (which matches up with CONFIG_HZ=1000).
> > 
> > 
> > Doing the same with kernel.timer_migration = 1
> > 
> > root@deb9:~# for cpu in {0..3} ;do time taskset -ac $cpu sleep 0.1 ;done 
> > real    0m0.104s  // CPU0 ok
> > [  125.282455] hrtimer: interrupt took 2230 ns  <-- 
> > real    0m28.023s // CPU1 not ok
> > real    0m9.129s  // CPU2 not ok
> > real    0m10.000s // CPU3 not ok
> 
> your timer takes way longer. __hrtimer_init_sleeper() set your timer to
> expire in softirq context and this does not happen for cross-base. If
> you switch this to hard ctx then they will expire properly. The
> interrupt storm remains…

This is sleep from coreutils, I'm not setting up anything here.
top, cyclictest whatever using nanosleep is broken.

> > I've furthermore tested v4.13.13-rt5 and WIP.timers branch of tip.git
> > and both of them are working as expected.
> 
> you have to take into account that you have almost no timers that will
> expire in the softirq context. I will check that tomorrow and I expect
> that the soft-timer in WIP.timers will also fail to expire in time.

Tried that as mentioned, no interrupt storm and nothing fails to expire.
(Add mainline 4.14 upto 4.14.5 to the list of working versions).

$ git log -1 | head -n1
commit fae02a10ca4b00ca614f8b3ea996cb419629d315 (HEAD -> WIP.timers, origin/WIP.timers)

root@deb9:~# bash test 
# uname -a
Linux deb9.virt 4.15.0-rc1-rt+ #2 SMP PREEMPT Wed Dec 13 19:10:01 CET 2017 x86_64 GNU/Linux
# cat /proc/sys/kernel/timer_migration
1
# dmesg | grep hrtimer
# date
Wed Dec 13 19:26:21 CET 2017
# grep LOC /proc/interrupts
LOC:      10409        651        495        404   Local timer interrupts
# for cpu in {0..3} ;do time taskset -ac  sleep 0.1 ;done
real    0m0.101s
real    0m0.101s
real    0m0.103s
real    0m0.103s
# date
Wed Dec 13 19:26:21 CET 2017
# grep LOC /proc/interrupts
LOC:      10825        704        500        405   Local timer interrupts


Bisect against linux-4.14.y-rt-rebase branch

--> last commit which is working 
git bisect good 3f2facf61992f4bb7a31463f6364e9a7cf1b9fcd

--> first commit which compiles but shows the expired timers
git bisect bad 33dc6fcdc040d1325b452cf7d6dbbd771931ad5b

# possible first bad commit: [33dc6fcdc040d1325b452cf7d6dbbd771931ad5b]
rtmutex: add ww_mutex addon for mutex-rt
# possible first bad commit: [82cf595179fb5d33c347ba1ce444e7db005abb3a]
rtmutex: wire up RT's locking
# possible first bad commit: [115407b319f59cdf2aac6723e3b007c9c22f64b7]
rtmutex: add rwlock implementation based on rtmutex
# possible first bad commit: [2d31dae80012b69945ec724060c9fc0ec13c8c6e]
rtmutex: add rwsem implementation based on rtmutex
# possible first bad commit: [2bee01da12a21f1b47f1c9e9044f7d923fce22cc]
rtmutex: add mutex implementation based on rtmutex
# possible first bad commit: [99f780e45d62cfb49eb9331fae5dc99757d8e3a1]
rtmutex: add sleeping lock implementation
# possible first bad commit: [71967ab31b3538fdfb801c44127f7ffc9bbc1e72]
rtmutex: export lockdep-less version of rt_mutex's lock, trylock and unlock

Do you need anything else that would be helpful?

> > 
> > Thanks,
> > Bert
> 
> Sebastian

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: 4.14-rt timer issues using PREEMPT_RT_FULL=y and NO_HZ_FULL_ALL=y
  2017-12-13 19:14   ` bert schulze
@ 2017-12-14 16:25     ` Sebastian Andrzej Siewior
  2017-12-14 18:01       ` bert schulze
  0 siblings, 1 reply; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-12-14 16:25 UTC (permalink / raw)
  To: bert schulze; +Cc: linux-rt-users

On 2017-12-13 20:14:54 [+0100], bert schulze wrote:
> Yes as timer migration kicked in on boot causing the interrupt storm.
> Disabling it all together there are no more hangs on boot.

and with the patch I Cced you one we have fix for that but there is
more…

> This is sleep from coreutils, I'm not setting up anything here.
> top, cyclictest whatever using nanosleep is broken.

that sleep should be also affected.

> Tried that as mentioned, no interrupt storm and nothing fails to expire.
> (Add mainline 4.14 upto 4.14.5 to the list of working versions).
> Do you need anything else that would be helpful?

bisect with RT is an exercise of its own.
It looks that if you set rcu_normal_after_boot to 0 in
kernel/rcu/update.c then other problem should go away and everything is
"normal" now, right?
Now I need to figure why that is broken…

Sebastian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 4.14-rt timer issues using PREEMPT_RT_FULL=y and NO_HZ_FULL_ALL=y
  2017-12-14 16:25     ` Sebastian Andrzej Siewior
@ 2017-12-14 18:01       ` bert schulze
  2017-12-14 18:27         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 8+ messages in thread
From: bert schulze @ 2017-12-14 18:01 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Sebastian Andrzej Siewior (Thu, Dec 14, 2017 at 05:25:03PM +0100):
> On 2017-12-13 20:14:54 [+0100], bert schulze wrote:
> > Yes as timer migration kicked in on boot causing the interrupt storm.
> > Disabling it all together there are no more hangs on boot.
> 
> and with the patch I Cced you one we have fix for that but there is
> more…
> 
> > This is sleep from coreutils, I'm not setting up anything here.
> > top, cyclictest whatever using nanosleep is broken.
> 
> that sleep should be also affected.
> 
> > Tried that as mentioned, no interrupt storm and nothing fails to expire.
> > (Add mainline 4.14 upto 4.14.5 to the list of working versions).
> …
> 
> > Do you need anything else that would be helpful?
> 
> bisect with RT is an exercise of its own.
> It looks that if you set rcu_normal_after_boot to 0 in
> kernel/rcu/update.c then other problem should go away and everything is
> "normal" now, right?

System boots reliable with mentioned patch applied, and nanosleep is
working as well! I do not have to set rcu_normal_after_boot to 0 or what
is its purpose?

> Now I need to figure why that is broken…

Guess you already did :)

root@deb9:~# bash test 
# uname -a
Linux deb9.virt 4.14.3-rt5-rt+ #5 SMP PREEMPT RT Thu Dec 14 18:39:27 CET 2017 x86_64 GNU/Linux
# cat /proc/sys/kernel/timer_migration
1
# dmesg | grep hrtimer
# date
Thu Dec 14 18:41:21 CET 2017
# grep LOC /proc/interrupts
LOC:       8002        208        138         68   Local timer interrupts
# for cpu in {0..3} ;do time taskset -ac  sleep 0.1 ;done
real    0m0.103s
real    0m0.105s
real    0m0.105s
real    0m0.105s
# date
Thu Dec 14 18:41:22 CET 2017
# grep LOC /proc/interrupts
LOC:       8437        209        139         70   Local timer interrupts

Tested on bare hardware and QEMU, top and cyclictest are working as well
no more interrupt storms can be observed!

> 
> Sebastian
Thank you!
Bert

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 4.14-rt timer issues using PREEMPT_RT_FULL=y and NO_HZ_FULL_ALL=y
  2017-12-14 18:01       ` bert schulze
@ 2017-12-14 18:27         ` Sebastian Andrzej Siewior
  2017-12-14 20:57           ` bert schulze
  0 siblings, 1 reply; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-12-14 18:27 UTC (permalink / raw)
  To: bert schulze; +Cc: linux-rt-users

On 2017-12-14 19:01:50 [+0100], bert schulze wrote:
> > It looks that if you set rcu_normal_after_boot to 0 in
> > kernel/rcu/update.c then other problem should go away and everything is
> > "normal" now, right?
> 
> System boots reliable with mentioned patch applied, and nanosleep is
> working as well! I do not have to set rcu_normal_after_boot to 0 or what
> is its purpose?

By default RT sets rcu_normal_after_boot to 1. And with it set to 1 the
system seems to freeze somehow. After switching it back to 0 the system
works more reliably.

> > Now I need to figure why that is broken…
> 
> Guess you already did :)
no, not yet.

> root@deb9:~# bash test 
> # uname -a
> Linux deb9.virt 4.14.3-rt5-rt+ #5 SMP PREEMPT RT Thu Dec 14 18:39:27 CET 2017 x86_64 GNU/Linux
> # cat /proc/sys/kernel/timer_migration
> 1
> # dmesg | grep hrtimer
> # date
> Thu Dec 14 18:41:21 CET 2017
> # grep LOC /proc/interrupts
> LOC:       8002        208        138         68   Local timer interrupts
> # for cpu in {0..3} ;do time taskset -ac  sleep 0.1 ;done
> real    0m0.103s
> real    0m0.105s
> real    0m0.105s
> real    0m0.105s
> # date
> Thu Dec 14 18:41:22 CET 2017
> # grep LOC /proc/interrupts
> LOC:       8437        209        139         70   Local timer interrupts
> 
> Tested on bare hardware and QEMU, top and cyclictest are working as well
> no more interrupt storms can be observed!

hmm. So just that patch and everything works for you? No hangups during
boot, no "perf: interrupt took too long" every 30-60 secs?

> > 
> Thank you!
> Bert

Sebastian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 4.14-rt timer issues using PREEMPT_RT_FULL=y and NO_HZ_FULL_ALL=y
  2017-12-14 18:27         ` Sebastian Andrzej Siewior
@ 2017-12-14 20:57           ` bert schulze
  2017-12-15 18:22             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 8+ messages in thread
From: bert schulze @ 2017-12-14 20:57 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Sebastian Andrzej Siewior (Thu, Dec 14, 2017 at 07:27:36PM +0100):
> On 2017-12-14 19:01:50 [+0100], bert schulze wrote:
> > > It looks that if you set rcu_normal_after_boot to 0 in
> > > kernel/rcu/update.c then other problem should go away and everything is
> > > "normal" now, right?
> > 
> > System boots reliable with mentioned patch applied, and nanosleep is
> > working as well! I do not have to set rcu_normal_after_boot to 0 or what
> > is its purpose?
> 
> By default RT sets rcu_normal_after_boot to 1. And with it set to 1 the
> system seems to freeze somehow. After switching it back to 0 the system
> works more reliably.

Running disk I/O and cpu workloads alongside cyclictest, no freeze
observed at all so far.

> 
> > > Now I need to figure why that is broken…
> > 
> > Guess you already did :)
> no, not yet.
> 
> > root@deb9:~# bash test 
> > # uname -a
> > Linux deb9.virt 4.14.3-rt5-rt+ #5 SMP PREEMPT RT Thu Dec 14 18:39:27 CET 2017 x86_64 GNU/Linux
> > # cat /proc/sys/kernel/timer_migration
> > 1
> > # dmesg | grep hrtimer
> > # date
> > Thu Dec 14 18:41:21 CET 2017
> > # grep LOC /proc/interrupts
> > LOC:       8002        208        138         68   Local timer interrupts
> > # for cpu in {0..3} ;do time taskset -ac  sleep 0.1 ;done
> > real    0m0.103s
> > real    0m0.105s
> > real    0m0.105s
> > real    0m0.105s
> > # date
> > Thu Dec 14 18:41:22 CET 2017
> > # grep LOC /proc/interrupts
> > LOC:       8437        209        139         70   Local timer interrupts
> > 
> > Tested on bare hardware and QEMU, top and cyclictest are working as well
> > no more interrupt storms can be observed!
> 
> hmm. So just that patch and everything works for you? No hangups during
> boot, no "perf: interrupt took too long" every 30-60 secs?

Yes no perf messages. However this is my "production" config stripped
off debugging. I can confirm that these messages appear when I use a
debugging config.

Disabling CONFIG_HARDLOCKUP_DETECTOR -> no more perf messages logged!

> 
> > > 
> > Thank you!
> > Bert
> 
> Sebastian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 4.14-rt timer issues using PREEMPT_RT_FULL=y and NO_HZ_FULL_ALL=y
  2017-12-14 20:57           ` bert schulze
@ 2017-12-15 18:22             ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-12-15 18:22 UTC (permalink / raw)
  To: bert schulze; +Cc: linux-rt-users

On 2017-12-14 21:57:50 [+0100], bert schulze wrote:
> Running disk I/O and cpu workloads alongside cyclictest, no freeze
> observed at all so far.

good. Case closed then.

Sebastian

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-12-15 18:22 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-12 21:58 4.14-rt timer issues using PREEMPT_RT_FULL=y and NO_HZ_FULL_ALL=y bert schulze
2017-12-13 17:53 ` Sebastian Andrzej Siewior
2017-12-13 19:14   ` bert schulze
2017-12-14 16:25     ` Sebastian Andrzej Siewior
2017-12-14 18:01       ` bert schulze
2017-12-14 18:27         ` Sebastian Andrzej Siewior
2017-12-14 20:57           ` bert schulze
2017-12-15 18:22             ` Sebastian Andrzej Siewior

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.