* [OSADL QA 3.18.9-rt5 #1]
@ 2015-04-07 22:52 Carsten Emde
  2015-04-09 12:37 ` Sebastian Andrzej Siewior
  2015-04-09 16:53 ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 11+ messages in thread
From: Carsten Emde @ 2015-04-07 22:52 UTC
  To: Sebastian Andrzej Siewior; +Cc: Linux RT Users

Hi Sebastian,

an Intel Bay Trail board (Intel(R) Celeron(R) CPU  J1900  @ 1.99GHz) at 
the OSADL QA Farm rack #b/slot #6 (https://www.osadl.org/?id=1894) stops 
working every 12 to 36 hours. The only way to get the board back to work 
is to power cycle it. Such crashes did not happen with any of the 
previously tested 3.12-rt kernels. About eight crashes have been 
observed so far - the kernel message obtained at the serial console (see 
below) was similar in all cases.

Thanks,
Carsten.


------------[ cut here ]------------
WARNING: CPU: 3 PID: 16574 at kernel/watchdog.c:298 
watchdog_overflow_callback+0x10f/0x16c()
Watchdog detected hard LOCKUP on cpu 3
Modules linked in: rpcsec_gss_krb5 nfsv4 eeprom nfs cpufreq_stats 
fscache bnep bluetooth ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 
nf_defrag_ipv6 ip6table_filter ip6_tables cfg80211 rfkill it87 hwmon_vid 
pl2303 usbserial cdc_acm r8169 mii iTCO_wdt iTCO_vendor_support ppdev 
coretemp kvm_intel kvm crc32c_intel snd_hda_codec_hdmi 
ghash_clmulni_intel cryptd microcode snd_hda_codec_realtek 
snd_hda_codec_generic serio_raw snd_hda_intel snd_hda_controller pcspkr 
snd_hda_codec snd_hwdep lpc_ich i2c_i801 snd_seq mfd_core snd_seq_device 
snd_pcm snd_timer snd xhci_pci shpchp soundcore xhci_hcd parport_pc 
parport nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd grace 
sunrpc i915 i2c_algo_bit drm_kms_helper drm i2c_core video ipv6 autofs4 
[last unloaded: hwlat_detector]
CPU: 3 PID: 16574 Comm: cyclictest Not tainted 3.18.9-rt5 #30
Hardware name: Gigabyte Technology Co., Ltd. To be filled by 
O.E.M./J1900N-D3V, BIOS F2 03/06/2014
  0000000000000009 ffff88013fd85ba8 ffffffff814f89a4 00000000000003f8
  ffff88013fd85bf8 ffff88013fd85be8 ffffffff8103a27b 0000000000000000
  ffffffff810b6e65 0000000000000003 0000000000000000 ffff88013fd85d38
Call Trace:
  <NMI>  [<ffffffff814f89a4>] dump_stack+0x4f/0x9e
  [<ffffffff8103a27b>] warn_slowpath_common+0x81/0x9b
  [<ffffffff810b6e65>] ? watchdog_overflow_callback+0x10f/0x16c
  [<ffffffff8103a2db>] warn_slowpath_fmt+0x46/0x48
  [<ffffffff810b6e65>] watchdog_overflow_callback+0x10f/0x16c
  [<ffffffff810dec3b>] __perf_event_overflow+0x15a/0x1e8
  [<ffffffff81013c48>] ? x86_perf_event_set_period+0xfa/0x10c
  [<ffffffff810df127>] perf_event_overflow+0x14/0x16
  [<ffffffff8101825a>] intel_pmu_handle_irq+0x2bc/0x341
  [<ffffffff81012de4>] perf_event_nmi_handler+0x25/0x3e
  [<ffffffff81006325>] nmi_handle+0x72/0x134
  [<ffffffff81028081>] ? cpumask_clear_cpu.constprop.4+0x11/0x11
  [<ffffffff814fc7c3>] ? _raw_spin_unlock_irqrestore+0xe/0x4d
  [<ffffffff81006649>] default_do_nmi+0x78/0x14e
  [<ffffffff81006782>] do_nmi+0x63/0xa4
  [<ffffffff814fec0a>] end_repeat_nmi+0x1e/0x2e
  [<ffffffff814fc7c3>] ? _raw_spin_unlock_irqrestore+0xe/0x4d
  [<ffffffff814fc7c3>] ? _raw_spin_unlock_irqrestore+0xe/0x4d
  [<ffffffff814fc7c3>] ? _raw_spin_unlock_irqrestore+0xe/0x4d
  <<EOE>>  <IRQ>  [<ffffffff81086b9b>] hrtimer_try_to_cancel+0x55/0x5f
  [<ffffffff81087017>] hrtimer_cancel+0x16/0x28
  [<ffffffff81092fdf>] tick_nohz_restart+0x17/0x72
  [<ffffffff810936fc>] __tick_nohz_full_check+0x8e/0x93
  [<ffffffff8109370f>] nohz_full_kick_work_func+0xe/0x10
  [<ffffffff810d6a37>] irq_work_run_list+0x39/0x57
  [<ffffffff810930ae>] ? tick_sched_do_timer+0x45/0x45
  [<ffffffff810d6d6d>] irq_work_tick+0x60/0x67
  [<ffffffff81086122>] update_process_times+0x57/0x67
  [<ffffffff81092df3>] tick_sched_handle+0x4a/0x59
  [<ffffffff810930e9>] tick_sched_timer+0x3b/0x64
  [<ffffffff81086a77>] __run_hrtimer+0x7a/0x149
  [<ffffffff81087435>] hrtimer_interrupt+0x1cc/0x2c5
  [<ffffffff81026e3d>] local_apic_timer_interrupt+0x54/0x58
  [<ffffffff81027193>] smp_apic_timer_interrupt+0x31/0x43
  [<ffffffff814fdd0a>] apic_timer_interrupt+0x6a/0x70
  <EOI>  [<ffffffff810e2419>] ? context_tracking_user_exit+0xa0/0xcd
  [<ffffffff8100ec59>] syscall_trace_leave+0xf9/0x134
  [<ffffffff814fd1a8>] int_check_syscall_exit_work+0x34/0x3d
---[ end trace 0000000000000002 ]---

* Re: [OSADL QA 3.18.9-rt5 #1]
  2015-04-07 22:52 [OSADL QA 3.18.9-rt5 #1] Carsten Emde
@ 2015-04-09 12:37 ` Sebastian Andrzej Siewior
  2015-04-09 16:53 ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-04-09 12:37 UTC
  To: Carsten Emde; +Cc: Linux RT Users

On 04/08/2015 12:52 AM, Carsten Emde wrote:
> Hi Sebastian,

Hi Carsten,

> an Intel Bay Trail board (Intel(R) Celeron(R) CPU  J1900  @ 1.99GHz) at
> the OSADL QA Farm rack #b/slot #6 (https://www.osadl.org/?id=1894) stops
> working every 12 to 36 hours. The only way to get the board back to work
> is to power cycle it. Such crashes did not happen with any of the
> previously tested 3.12-rt kernels. About eight crashes have been
> observed so far - the kernel message obtained at the serial console (see
> below) was similar in all cases.

This backtrace looks familiar. I think the problem is that
hrtimer_cancel() is called from hrtimer_interrupt() and deadlocks. Is
there anything special you do here except running FULL_NOHZ?

> 
> Thanks,
> Carsten.
Sebastian
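
To make the suspected failure mode concrete, here is a minimal user-space
sketch in C. It is a toy model, not kernel code: struct toy_timer,
toy_try_to_cancel(), toy_cancel_bounded(), toy_self_cancelling_handler()
and toy_run_timer() are hypothetical names, and the retry bound exists only
so that the demo terminates. It assumes, as the backtrace suggests, that
cancelling a timer has to wait until that timer's handler has finished, so
a cancel issued from inside the handler itself can never succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a timer; NOT the kernel's struct hrtimer. */
struct toy_timer {
	bool callback_running;            /* set while the handler executes */
	bool queued;                      /* timer is currently armed */
	int (*fn)(struct toy_timer *t);   /* handler; its result is passed back */
};

/* Models the "try" step: refuses (-1) while the handler is running. */
static int toy_try_to_cancel(struct toy_timer *t)
{
	if (t->callback_running)
		return -1;
	if (!t->queued)
		return 0;       /* nothing to cancel */
	t->queued = false;
	return 1;               /* timer was armed and is now cancelled */
}

/*
 * Models a cancel that retries until the handler has finished. A real
 * retry loop would be unbounded; max_tries is an assumption added here
 * so that the demo can give up and report -1 instead of hanging.
 */
static int toy_cancel_bounded(struct toy_timer *t, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		int ret = toy_try_to_cancel(t);

		if (ret >= 0)
			return ret;
	}
	return -1;
}

/* Handler that tries to cancel the very timer it is running for. */
static int toy_self_cancelling_handler(struct toy_timer *t)
{
	return toy_cancel_bounded(t, 1000);
}

/* Models the expiry path: mark the handler as running, then call it. */
static int toy_run_timer(struct toy_timer *t)
{
	int ret;

	assert(t != NULL && t->fn != NULL);
	t->callback_running = true;
	ret = t->fn(t);
	t->callback_running = false;
	return ret;
}
```

In this model, toy_cancel_bounded() succeeds on an armed timer whose handler
is not running, while toy_run_timer() on a timer whose handler is
toy_self_cancelling_handler() reports that the cancel gave up. Without the
bound, the second case would spin forever on the CPU that runs the handler.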


* Re: [OSADL QA 3.18.9-rt5 #1]
  2015-04-07 22:52 [OSADL QA 3.18.9-rt5 #1] Carsten Emde
  2015-04-09 12:37 ` Sebastian Andrzej Siewior
@ 2015-04-09 16:53 ` Sebastian Andrzej Siewior
  2015-04-10 12:36   ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-04-09 16:53 UTC
  To: Carsten Emde; +Cc: Linux RT Users

On 04/08/2015 12:52 AM, Carsten Emde wrote:
> Hi Sebastian,

Hi Carsten,

> an Intel Bay Trail board (Intel(R) Celeron(R) CPU  J1900  @ 1.99GHz) at
> the OSADL QA Farm rack #b/slot #6 (https://www.osadl.org/?id=1894) stops
> working every 12 to 36 hours. The only way to get the board back to work

I'm going to re-arm the IPI which should cure this. Tomorrow.

> 
> Thanks,
> Carsten.

Sebastian

* Re: [OSADL QA 3.18.9-rt5 #1]
  2015-04-09 16:53 ` Sebastian Andrzej Siewior
@ 2015-04-10 12:36   ` Sebastian Andrzej Siewior
  2015-04-11  1:35     ` Carsten Emde
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-04-10 12:36 UTC
  To: Carsten Emde; +Cc: Linux RT Users

* Sebastian Andrzej Siewior | 2015-04-09 18:53:26 [+0200]:

>On 04/08/2015 12:52 AM, Carsten Emde wrote:
>> Hi Sebastian,
>
Hi Carsten,

>> an Intel Bay Trail board (Intel(R) Celeron(R) CPU  J1900  @ 1.99GHz) at
>> the OSADL QA Farm rack #b/slot #6 (https://www.osadl.org/?id=1894) stops
>> working every 12 to 36 hours. The only way to get the board back to work
>
>I'm going to re-arm the IPI which should cure this. Tomorrow.

Could you try this:
--

Subject: [PATCH] kernel/irq_work: fix no_hz deadlock

Invoking NO_HZ's irq_work callback from timer irq is not working very
well if the callback decides to invoke hrtimer_cancel():

|hrtimer_try_to_cancel+0x55/0x5f
|hrtimer_cancel+0x16/0x28
|tick_nohz_restart+0x17/0x72
|__tick_nohz_full_check+0x8e/0x93
|nohz_full_kick_work_func+0xe/0x10
|irq_work_run_list+0x39/0x57
|irq_work_tick+0x60/0x67
|update_process_times+0x57/0x67
|tick_sched_handle+0x4a/0x59
|tick_sched_timer+0x3b/0x64
|__run_hrtimer+0x7a/0x149
|hrtimer_interrupt+0x1cc/0x2c5

and here we deadlock while waiting for the lock which we are holding.
To fix this I'm doing the same thing that upstream is doing: use the
dedicated irq_work IRQ, and use it only for what is marked as "hirq",
which should only be the FULL_NO_HZ related work.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/arm/kernel/smp.c      |    2 --
 arch/arm64/kernel/smp.c    |    2 --
 arch/powerpc/kernel/time.c |    2 +-
 arch/sparc/kernel/pcr.c    |    2 --
 arch/x86/kernel/irq_work.c |    2 --
 kernel/irq_work.c          |   33 +++++++++++----------------------
 kernel/time/tick-sched.c   |    5 +++++
 kernel/time/timer.c        |    6 +++---
 8 files changed, 20 insertions(+), 34 deletions(-)

--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -506,14 +506,12 @@ void arch_send_call_function_single_ipi(
 }
 
 #ifdef CONFIG_IRQ_WORK
-#ifndef CONFIG_PREEMPT_RT_FULL
 void arch_irq_work_raise(void)
 {
 	if (arch_irq_work_has_interrupt())
 		smp_cross_call(cpumask_of(smp_processor_id()), IPI_IRQ_WORK);
 }
 #endif
-#endif
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 void tick_broadcast(const struct cpumask *mask)
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -529,14 +529,12 @@ void arch_send_call_function_single_ipi(
 }
 
 #ifdef CONFIG_IRQ_WORK
-#ifndef CONFIG_PREEMPT_RT_FULL
 void arch_irq_work_raise(void)
 {
 	if (__smp_cross_call)
 		smp_cross_call(cpumask_of(smp_processor_id()), IPI_IRQ_WORK);
 }
 #endif
-#endif
 
 static DEFINE_RAW_SPINLOCK(stop_lock);
 
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -424,7 +424,7 @@ unsigned long profile_pc(struct pt_regs
 EXPORT_SYMBOL(profile_pc);
 #endif
 
-#if defined(CONFIG_IRQ_WORK) && !defined(CONFIG_PREEMPT_RT_FULL)
+#if defined(CONFIG_IRQ_WORK)
 
 /*
  * 64-bit uses a byte in the PACA, 32-bit uses a per-cpu variable...
--- a/arch/sparc/kernel/pcr.c
+++ b/arch/sparc/kernel/pcr.c
@@ -43,12 +43,10 @@ void __irq_entry deferred_pcr_work_irq(i
 	set_irq_regs(old_regs);
 }
 
-#ifndef CONFIG_PREEMPT_RT_FULL
 void arch_irq_work_raise(void)
 {
 	set_softint(1 << PIL_DEFERRED_PCR_WORK);
 }
-#endif
 
 const struct pcr_ops *pcr_ops;
 EXPORT_SYMBOL_GPL(pcr_ops);
--- a/arch/x86/kernel/irq_work.c
+++ b/arch/x86/kernel/irq_work.c
@@ -38,7 +38,6 @@ static inline void __smp_irq_work_interr
 	exiting_irq();
 }
 
-#ifndef CONFIG_PREEMPT_RT_FULL
 void arch_irq_work_raise(void)
 {
 #ifdef CONFIG_X86_LOCAL_APIC
@@ -49,4 +48,3 @@ void arch_irq_work_raise(void)
 	apic_wait_icr_idle();
 #endif
 }
-#endif
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -51,11 +51,7 @@ static bool irq_work_claim(struct irq_wo
 	return true;
 }
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-void arch_irq_work_raise(void)
-#else
 void __weak arch_irq_work_raise(void)
-#endif
 {
 	/*
 	 * Lame architectures will get the timer tick callback
@@ -117,10 +113,8 @@ bool irq_work_queue(struct irq_work *wor
 	if (work->flags & IRQ_WORK_HARD_IRQ) {
 		if (llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list)))
 			arch_irq_work_raise();
-	} else {
-		if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)))
-			arch_irq_work_raise();
-	}
+	} /* for lazy_list we have the timer irq */
+
 #else
 	if (work->flags & IRQ_WORK_LAZY) {
 		if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
@@ -203,30 +197,25 @@ static void irq_work_run_list(struct lli
 void irq_work_run(void)
 {
 #ifdef CONFIG_PREEMPT_RT_FULL
-	if (in_irq()) {
-		irq_work_run_list(this_cpu_ptr(&hirq_work_list));
-		return;
-	}
-#endif
+	irq_work_run_list(this_cpu_ptr(&hirq_work_list));
+#else
 	irq_work_run_list(this_cpu_ptr(&raised_list));
 	irq_work_run_list(this_cpu_ptr(&lazy_list));
+#endif
 }
 EXPORT_SYMBOL_GPL(irq_work_run);
 
 void irq_work_tick(void)
 {
-	struct llist_head *raised;
-
 #ifdef CONFIG_PREEMPT_RT_FULL
-	if (in_irq()) {
-		irq_work_run_list(this_cpu_ptr(&hirq_work_list));
-		return;
-	}
-#endif
-	raised = &__get_cpu_var(raised_list);
-	if (!llist_empty(raised))
+	irq_work_run_list(this_cpu_ptr(&lazy_list));
+#else
+	struct llist_head *raised = &__get_cpu_var(raised_list);
+
+	if (!llist_empty(raised) && !arch_irq_work_has_interrupt())
 		irq_work_run_list(raised);
 	irq_work_run_list(&__get_cpu_var(lazy_list));
+#endif
 }
 
 /*
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -181,6 +181,11 @@ static bool can_stop_full_tick(void)
 		return false;
 	}
 
+	if (!arch_irq_work_has_interrupt()) {
+		trace_tick_stop(0, "missing irq work interrupt\n");
+		return false;
+	}
+
 	/* sched_clock_tick() needs us? */
 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
 	/*
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1450,9 +1450,9 @@ void update_process_times(int user_tick)
 	scheduler_tick();
 	run_local_timers();
 	rcu_check_callbacks(cpu, user_tick);
-#ifdef CONFIG_IRQ_WORK
-	if (in_irq())
-		irq_work_tick();
+
+#if defined(CONFIG_IRQ_WORK) && !defined(CONFIG_PREEMPT_RT_FULL)
+	irq_work_tick();
 #endif
 	run_posix_cpu_timers(p);
 }

* Re: [OSADL QA 3.18.9-rt5 #1]
  2015-04-10 12:36   ` Sebastian Andrzej Siewior
@ 2015-04-11  1:35     ` Carsten Emde
  2015-04-20 21:22       ` [RESOLVED OSADL " Carsten Emde
  2015-05-12  0:15     ` [OSADL " Steven Rostedt
  2015-05-13 16:34     ` Steven Rostedt
  2 siblings, 1 reply; 11+ messages in thread
From: Carsten Emde @ 2015-04-11  1:35 UTC
  To: Sebastian Andrzej Siewior; +Cc: Linux RT Users

Hi Sebastian,

>>> an Intel Bay Trail board (Intel(R) Celeron(R) CPU  J1900  @ 1.99GHz) at
>>> the OSADL QA Farm rack #b/slot #6 (https://www.osadl.org/?id=1894) stops
>>> working every 12 to 36 hours. The only way to get the board back to work
>> [..]
> Could you try this:
> --
> Subject: [PATCH] kernel/irq_work: fix no_hz deadlock
>
> Invoking NO_HZ's irq_work callback from timer irq is not working very
> well if the callback decides to invoke hrtimer_cancel():
>
> |hrtimer_try_to_cancel+0x55/0x5f
> |hrtimer_cancel+0x16/0x28
> |tick_nohz_restart+0x17/0x72
> |__tick_nohz_full_check+0x8e/0x93
> |nohz_full_kick_work_func+0xe/0x10
> |irq_work_run_list+0x39/0x57
> |irq_work_tick+0x60/0x67
> |update_process_times+0x57/0x67
> |tick_sched_handle+0x4a/0x59
> |tick_sched_timer+0x3b/0x64
> |__run_hrtimer+0x7a/0x149
> |hrtimer_interrupt+0x1cc/0x2c5
>
> and here we deadlock while waiting for the lock which we are holding.
> To fix this I'm doing the same thing that upstream is doing: use the
> dedicated irq_work IRQ, and use it only for what is marked as "hirq",
> which should only be the FULL_NO_HZ related work.
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> [..]
Thanks a lot! Applied the patch and restarted the box. Given the fact 
that it took up to 36 hours until the board stopped, we unfortunately 
need to see at least one week of crash-free operation, before we may 
consider the bug as fixed.

	-Carsten.

* [RESOLVED OSADL QA 3.18.9-rt5 #1]
  2015-04-11  1:35     ` Carsten Emde
@ 2015-04-20 21:22       ` Carsten Emde
  0 siblings, 0 replies; 11+ messages in thread
From: Carsten Emde @ 2015-04-20 21:22 UTC
  To: Sebastian Andrzej Siewior; +Cc: Linux RT Users

Hi Sebastian,

>>>> an Intel Bay Trail board (Intel(R) Celeron(R) CPU  J1900  @ 1.99GHz) at
>>>> the OSADL QA Farm rack #b/slot #6 (https://www.osadl.org/?id=1894)
>>>> stops
>>>> working every 12 to 36 hours. The only way to get the board back to
>>>> work
>>> [..]
>> Could you try this:
>> --
>> Subject: [PATCH] kernel/irq_work: fix no_hz deadlock
>>
>> Invoking NO_HZ's irq_work callback from timer irq is not working very
>> well if the callback decides to invoke hrtimer_cancel():
>>
>> |hrtimer_try_to_cancel+0x55/0x5f
>> |hrtimer_cancel+0x16/0x28
>> |tick_nohz_restart+0x17/0x72
>> |__tick_nohz_full_check+0x8e/0x93
>> |nohz_full_kick_work_func+0xe/0x10
>> |irq_work_run_list+0x39/0x57
>> |irq_work_tick+0x60/0x67
>> |update_process_times+0x57/0x67
>> |tick_sched_handle+0x4a/0x59
>> |tick_sched_timer+0x3b/0x64
>> |__run_hrtimer+0x7a/0x149
>> |hrtimer_interrupt+0x1cc/0x2c5
>>
>> and here we deadlock while waiting for the lock which we are holding.
>> To fix this I'm doing the same thing that upstream is doing: use the
>> dedicated irq_work IRQ, and use it only for what is marked as "hirq",
>> which should only be the FULL_NO_HZ related work.
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>> [..]
> Thanks a lot! Applied the patch and restarted the box. Given the fact
> that it took up to 36 hours until the board stopped, we unfortunately
> need to see at least one week of crash-free operation, before we may
> consider the bug as fixed.
The board survived nine days without a crash -> RESOLVED.

Thanks,
	Carsten.

* Re: [OSADL QA 3.18.9-rt5 #1]
  2015-04-10 12:36   ` Sebastian Andrzej Siewior
  2015-04-11  1:35     ` Carsten Emde
@ 2015-05-12  0:15     ` Steven Rostedt
  2015-05-13  8:12       ` Sebastian Andrzej Siewior
  2015-05-13 16:34     ` Steven Rostedt
  2 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2015-05-12  0:15 UTC
  To: Sebastian Andrzej Siewior; +Cc: Carsten Emde, Linux RT Users

On Fri, 10 Apr 2015 14:36:34 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:


> Subject: [PATCH] kernel/irq_work: fix no_hz deadlock
> 
> Invoking NO_HZ's irq_work callback from timer irq is not working very
> well if the callback decides to invoke hrtimer_cancel():
> 
> |hrtimer_try_to_cancel+0x55/0x5f
> |hrtimer_cancel+0x16/0x28
> |tick_nohz_restart+0x17/0x72
> |__tick_nohz_full_check+0x8e/0x93
> |nohz_full_kick_work_func+0xe/0x10
> |irq_work_run_list+0x39/0x57
> |irq_work_tick+0x60/0x67
> |update_process_times+0x57/0x67
> |tick_sched_handle+0x4a/0x59
> |tick_sched_timer+0x3b/0x64
> |__run_hrtimer+0x7a/0x149
> |hrtimer_interrupt+0x1cc/0x2c5
> 
> and here we deadlock while waiting for the lock which we are holding.
> To fix this I'm doing the same thing that upstream is doing: use the
> dedicated irq_work IRQ, and use it only for what is marked as "hirq",
> which should only be the FULL_NO_HZ related work.

I'm backporting this to the stable releases, and I'm a bit worried
about the above comment. The new Scheduler IPI code uses work queues and
requires it to be done in a hard irq.

-- Steve

* Re: [OSADL QA 3.18.9-rt5 #1]
  2015-05-12  0:15     ` [OSADL " Steven Rostedt
@ 2015-05-13  8:12       ` Sebastian Andrzej Siewior
  2015-05-13 15:23         ` Steven Rostedt
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-05-13  8:12 UTC
  To: Steven Rostedt; +Cc: Carsten Emde, Linux RT Users

* Steven Rostedt | 2015-05-11 20:15:06 [-0400]:

>I'm backporting this to the stable releases, and I'm a bit worried
>about the above comment. The new Scheduler IPI code uses work queues and
>requires it to be done in a hard irq.

The IPI part is only done for the high-prio work, which is only used by
NOHZ. Everyone else gets in via softirq as we did it before.

>-- Steve

Sebastian
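
The split described in this reply can be pictured with a small user-space
model in C. This is only a sketch under assumptions: struct toy_cpu,
TOY_WORK_HARD_IRQ and the toy_*() functions are hypothetical names, plain
counters stand in for the per-CPU lists, and the "raise only when the list
was empty" rule mirrors the `if (llist_add(...)) arch_irq_work_raise();`
pattern visible in the patch. It does not claim to reproduce the RT tree's
actual queueing code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag, modelled on IRQ_WORK_HARD_IRQ from the patch. */
#define TOY_WORK_HARD_IRQ 0x1u

/* Per-CPU bookkeeping; counters stand in for the real lock-less lists. */
struct toy_cpu {
	int hirq_pending;   /* work waiting for the dedicated interrupt */
	int lazy_pending;   /* work waiting for the timer softirq */
	int ipis_raised;    /* how often the self-interrupt was requested */
};

/* Route one work item according to its flags. */
static void toy_queue(struct toy_cpu *cpu, unsigned int flags)
{
	assert(cpu != NULL);
	if (flags & TOY_WORK_HARD_IRQ) {
		/* Request the interrupt only on the empty -> non-empty transition. */
		if (cpu->hirq_pending++ == 0)
			cpu->ipis_raised++;
	} else {
		/* No interrupt: this work waits for the next timer softirq. */
		cpu->lazy_pending++;
	}
}

/* Dedicated irq_work interrupt: drains only the hard-irq work. */
static int toy_irq_work_interrupt(struct toy_cpu *cpu)
{
	int ran = cpu->hirq_pending;

	cpu->hirq_pending = 0;
	return ran;
}

/* Timer softirq: drains only the lazy work. */
static int toy_timer_softirq(struct toy_cpu *cpu)
{
	int ran = cpu->lazy_pending;

	cpu->lazy_pending = 0;
	return ran;
}
```

Queueing two hard-irq items and one ordinary item on a zeroed toy_cpu
requests a single interrupt; toy_timer_softirq() then runs one item and
toy_irq_work_interrupt() runs two. The routing is by flag, not by caller,
so any user that sets the hard-irq flag takes the interrupt path.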

* Re: [OSADL QA 3.18.9-rt5 #1]
  2015-05-13  8:12       ` Sebastian Andrzej Siewior
@ 2015-05-13 15:23         ` Steven Rostedt
  0 siblings, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2015-05-13 15:23 UTC
  To: Sebastian Andrzej Siewior; +Cc: Carsten Emde, Linux RT Users

On Wed, 13 May 2015 10:12:36 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> * Steven Rostedt | 2015-05-11 20:15:06 [-0400]:
> 
> >I'm backporting this to the stable releases, and I'm a bit worried
> >about the above comment. The new Scheduler IPI code uses work queues and
> >requires it to be done in a hard irq.
> 
> The IPI part is only done for the high-prio work, which is only used by
> NOHZ. Everyone else gets in via softirq as we did it before.
> 

My push/pull of RT tasks via IPI uses irq_work to do this. It has
nothing to do with NOHZ, and will cause serious latencies if it is done
by softirq.

-- Steve

* Re: [OSADL QA 3.18.9-rt5 #1]
  2015-04-10 12:36   ` Sebastian Andrzej Siewior
  2015-04-11  1:35     ` Carsten Emde
  2015-05-12  0:15     ` [OSADL " Steven Rostedt
@ 2015-05-13 16:34     ` Steven Rostedt
  2015-06-11 15:27       ` Sebastian Andrzej Siewior
  2 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2015-05-13 16:34 UTC
  To: Sebastian Andrzej Siewior; +Cc: Carsten Emde, Linux RT Users, Peter Zijlstra

On Fri, 10 Apr 2015 14:36:34 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1450,9 +1450,9 @@ void update_process_times(int user_tick)
>  	scheduler_tick();
>  	run_local_timers();
>  	rcu_check_callbacks(cpu, user_tick);
> -#ifdef CONFIG_IRQ_WORK
> -	if (in_irq())
> -		irq_work_tick();
> +
> +#if defined(CONFIG_IRQ_WORK) && !defined(CONFIG_PREEMPT_RT_FULL)
> +	irq_work_tick();
>  #endif

Found the bug. The above actually changes the code
for !CONFIG_PREEMPT_RT_FULL. You still need to keep that

	if (in_irq())

check, otherwise you can call irq_work_tick() from softirq in non RT
configs, which, after talking to Peter Zijlstra, is a no-no.

Note, my tests were failing on CONFIG_PREEMPT_LL (low latency).

-- Steve


>  	run_posix_cpu_timers(p);
>  }


* Re: [OSADL QA 3.18.9-rt5 #1]
  2015-05-13 16:34     ` Steven Rostedt
@ 2015-06-11 15:27       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-06-11 15:27 UTC
  To: Steven Rostedt; +Cc: Carsten Emde, Linux RT Users, Peter Zijlstra

* Steven Rostedt | 2015-05-13 12:34:27 [-0400]:

>Found the bug. The above actually changes the code
>for !CONFIG_PREEMPT_RT_FULL. You still need to keep that
>
>	if (in_irq())
>
>check, otherwise you can call irq_work_tick() from softirq in non RT
>configs, which, after talking to Peter Zijlstra, is a no-no.
>
>Note, my tests were failing on CONFIG_PREEMPT_LL (low latency).

This was removed by mistake. Obviously. I am adding the chunk back:

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1450,7 +1450,8 @@ void update_process_times(int user_tick)
 	run_local_timers();
 	rcu_check_callbacks(user_tick);
 #if defined(CONFIG_IRQ_WORK) && !defined(CONFIG_PREEMPT_RT_FULL)
-	irq_work_tick();
+	if (in_irq())
+		irq_work_tick();
 #endif
 	run_posix_cpu_timers(p);
 }

Sebastian
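
The effect of the restored check can be illustrated with a small user-space
model in C. This is a sketch under assumptions, not the kernel code: enum
toy_context, struct toy_tick_state and the toy_*() functions are
hypothetical, the rt_full field stands in for CONFIG_PREEMPT_RT_FULL, and a
counter stands in for the call to irq_work_tick().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Execution context the tick path is entered from (toy model only). */
enum toy_context { TOY_HARDIRQ, TOY_SOFTIRQ };

struct toy_tick_state {
	enum toy_context ctx;   /* where update_process_times() was called from */
	bool rt_full;           /* stands in for CONFIG_PREEMPT_RT_FULL */
	int irq_work_ticks;     /* how often the irq_work tick was run */
};

/* Stand-in for in_irq(): true only in hard interrupt context. */
static bool toy_in_irq(const struct toy_tick_state *s)
{
	return s->ctx == TOY_HARDIRQ;
}

/*
 * Models the guarded call site after the follow-up fix: the irq_work
 * tick runs from here only on non-RT configurations, and only when the
 * caller is in hard interrupt context.
 */
static void toy_update_process_times(struct toy_tick_state *s)
{
	assert(s != NULL);
	if (!s->rt_full && toy_in_irq(s))
		s->irq_work_ticks++;
}
```

With rt_full false, a call from TOY_HARDIRQ increments the counter and a
call from TOY_SOFTIRQ leaves it unchanged, which is the behaviour the
original code had before the first patch dropped the check. With rt_full
true, this call site never runs the tick.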
