* [ANNOUNCE] 3.18.7-rt1
@ 2015-02-16 11:18 Sebastian Andrzej Siewior
  2015-02-16 18:13 ` Sebastian Andrzej Siewior
                   ` (5 more replies)
  0 siblings, 6 replies; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-02-16 11:18 UTC (permalink / raw)
  To: linux-rt-users; +Cc: LKML, Thomas Gleixner, rostedt, John Kacur

Dear RT folks!

I'm pleased to announce the v3.18.7-rt1 patch set. It was running over
the weekend on my x86 box and was still alive this morning. However, it
is still the first release of the v3.18 -RT series.
I haven't followed the mailing list or commented on / applied any patches
from the list for -RT while I was busy getting this release done (except
one patch I needed anyway). This is about to change. I will try to
go through my RT inbox before doing the next release.

Changes since v3.14.25-rt22:

- rebased to v3.18

- Added "Work Simple" from Daniel Wagner. I needed it for cgroup's RCU
  handling.
  
- "timers: do not raise softirq unconditionally" has been reverted. It
  had been added to optimize the NOHZ_FULL case on CPUs that could avoid
  a softirq / timer wakeup, but it has been reverted because it breaks
  the switch to highres.

- added a few patches to get MQ-BLK and cgroups working.

Known issues:

      - bcache is disabled.

      - lazy preempt on x86_64 leads to a crash with some load.

      - CPU hotplug works in general. Steven's test script however
        deadlocks usually on the second invocation.

      - xor / raid_pq
        I had max latency jumping up to 67563us on one CPU while the next
        lower max was 58us. I tracked it down to the module init code of
        xor and raid_pq. Both disable preemption while measuring the
        performance of the individual implementations.
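
        For illustration, the pattern behind that init-time benchmark can
        be sketched in plain userspace C (a rough analogue, not the kernel
        code; the tick length and dummy workload are invented here):

#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Userspace sketch of the "count ops per tick" pattern used by the
 * xor / raid_pq module init code. In the kernel, the loop runs with
 * preempt_disable() held for a whole jiffy per implementation, which
 * is where the latency spike above comes from. */

static long ns_now(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000L + ts.tv_nsec;
}

static volatile long sink;	/* keeps the loop from being optimized out */

/* Count how many dummy "xor blocks" fit into one tick_ns window. */
static long count_ops_per_tick(long tick_ns)
{
	long start = ns_now(), count = 0;

	/* kernel: preempt_disable() -- nothing else runs on this CPU */
	while (ns_now() - start < tick_ns) {
		for (int i = 0; i < 1000; i++)
			sink += i;
		count++;
	}
	/* kernel: preempt_enable() */
	return count;
}

int main(void)
{
	/* pretend HZ=250, i.e. one jiffy = 4 ms */
	long count = count_ops_per_tick(4 * 1000 * 1000);

	printf("ops per 4ms tick: %ld\n", count);
	return 0;
}

        Each timed candidate keeps the CPU to itself for the full window,
        so several implementations benchmarked back to back add up to a
        multi-millisecond preemption-disabled stretch.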

The RT patch against 3.18.7 can be found here:

   https://www.kernel.org/pub/linux/kernel/projects/rt/3.18/patch-3.18.7-rt1.patch.xz

The split quilt queue is available at:

   https://www.kernel.org/pub/linux/kernel/projects/rt/3.18/patches-3.18.7-rt1.tar.xz

Sebastian

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [ANNOUNCE] 3.18.7-rt1
  2015-02-16 11:18 [ANNOUNCE] 3.18.7-rt1 Sebastian Andrzej Siewior
@ 2015-02-16 18:13 ` Sebastian Andrzej Siewior
  2015-02-16 20:12 ` Richard Cochran
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-02-16 18:13 UTC (permalink / raw)
  To: linux-rt-users
  Cc: LKML, Thomas Gleixner, rostedt, John Kacur, NeilBrown,
	Herbert Xu, linux-crypto, David S. Miller

* Sebastian Andrzej Siewior | 2015-02-16 12:18:22 [+0100]:

>Known issues:
>
>      - xor / raid_pq
>        I had max latency jumping up to 67563us on one CPU while the next
>        lower max was 58us. I tracked it down to the module init code of
>        xor and raid_pq. Both disable preemption while measuring the
>        performance of the individual implementations.

The patch at the bottom gets rid of it.

How important is this preempt_disable(), and how likely is it that we
could use a precomputed priority list of functions instead of this
runtime check? XOR already prefers AVX-based xor if available, so the
runtime numbers/test could be removed.
Is there a case where SSE, while worse on one CPU, is better than MMX
on another, and that is why we do it?
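
One possible shape for such a precomputed priority list (a hypothetical
userspace sketch; the feature flags, struct, and function names here are
invented, not the kernel's): order the implementations best-first and pick
the first one whose CPU-feature requirements are met, with no runtime
benchmark at all.

#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: choose an xor implementation from a static,
 * best-first priority table keyed on CPU feature bits, instead of
 * benchmarking each candidate with preemption disabled. */

#define FEAT_MMX  (1u << 0)
#define FEAT_SSE  (1u << 1)
#define FEAT_AVX  (1u << 2)

struct xor_impl {
	const char *name;
	unsigned int required;	/* feature bits this implementation needs */
};

/* Ordered best-first; the generic fallback requires nothing. */
static const struct xor_impl impl_priority[] = {
	{ "avx",     FEAT_AVX },
	{ "sse",     FEAT_SSE },
	{ "mmx",     FEAT_MMX },
	{ "generic", 0 },
};

static const char *choose_xor_impl(unsigned int cpu_features)
{
	for (size_t i = 0; i < sizeof(impl_priority) / sizeof(impl_priority[0]); i++)
		if ((impl_priority[i].required & cpu_features) ==
		    impl_priority[i].required)
			return impl_priority[i].name;
	return "generic";	/* unreachable: the table ends with generic */
}

int main(void)
{
	/* A CPU with MMX+SSE but no AVX picks "sse". */
	printf("%s\n", choose_xor_impl(FEAT_MMX | FEAT_SSE));
	return 0;
}

The downside, as the question above anticipates, is that a fixed order
cannot capture a CPU where a nominally lower-priority implementation
happens to be faster; that is exactly the case the runtime measurement
exists to catch.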

diff --git a/crypto/xor.c b/crypto/xor.c
index 35d6b3a..19e20f5 100644
--- a/crypto/xor.c
+++ b/crypto/xor.c
@@ -70,7 +70,7 @@ do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
 	tmpl->next = template_list;
 	template_list = tmpl;
 
-	preempt_disable();
+	preempt_disable_nort();
 
 	/*
 	 * Count the number of XORs done during a whole jiffy, and use
@@ -94,7 +94,7 @@ do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
 			max = count;
 	}
 
-	preempt_enable();
+	preempt_enable_nort();
 
 	speed = max * (HZ * BENCH_SIZE / 1024);
 	tmpl->speed = speed;
diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
index 7d0e5cd..e9920d4 100644
--- a/lib/raid6/algos.c
+++ b/lib/raid6/algos.c
@@ -142,7 +142,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
 
 			perf = 0;
 
-			preempt_disable();
+			preempt_disable_nort();
 			j0 = jiffies;
 			while ((j1 = jiffies) == j0)
 				cpu_relax();
@@ -151,7 +151,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
 				(*algo)->gen_syndrome(disks, PAGE_SIZE, *dptrs);
 				perf++;
 			}
-			preempt_enable();
+			preempt_enable_nort();
 
 			if (perf > bestperf) {
 				bestperf = perf;


Sebastian


* Re: [ANNOUNCE] 3.18.7-rt1
  2015-02-16 11:18 [ANNOUNCE] 3.18.7-rt1 Sebastian Andrzej Siewior
  2015-02-16 18:13 ` Sebastian Andrzej Siewior
@ 2015-02-16 20:12 ` Richard Cochran
  2015-02-16 20:21   ` Steven Rostedt
  2015-02-17  8:37 ` Daniel Wagner
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 39+ messages in thread
From: Richard Cochran @ 2015-02-16 20:12 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On Mon, Feb 16, 2015 at 12:18:22PM +0100, Sebastian Andrzej Siewior wrote:
>       - CPU hotplug works in general. Steven's test script however
>         deadlocks usually on the second invocation.

Where can I find Steve's hotplug test script?

Thanks,
Richard


* Re: [ANNOUNCE] 3.18.7-rt1
  2015-02-16 20:12 ` Richard Cochran
@ 2015-02-16 20:21   ` Steven Rostedt
  0 siblings, 0 replies; 39+ messages in thread
From: Steven Rostedt @ 2015-02-16 20:21 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Sebastian Andrzej Siewior, linux-rt-users, LKML, Thomas Gleixner,
	John Kacur

[-- Attachment #1: Type: text/plain, Size: 554 bytes --]

On Mon, 16 Feb 2015 21:12:40 +0100
Richard Cochran <richardcochran@gmail.com> wrote:

> On Mon, Feb 16, 2015 at 12:18:22PM +0100, Sebastian Andrzej Siewior wrote:
> >       - CPU hotplug works in general. Steven's test script however
> >         deadlocks usually on the second invocation.
> 
> Where can I find Steve's hotplug test script?

Here, I posted it before. Hmm, I wonder if we should post it on some
web site. I could add it to my own.

/me does that...

OK, it's now here too:

 http://rostedt.homelinux.com/code/stress-cpu-hotplug

-- Steve

[-- Attachment #2: stress-cpu-hotplug --]
[-- Type: application/octet-stream, Size: 906 bytes --]

#!/bin/bash

MAXCPUS=12

# find cpus

CPUS=`ls -d /sys/devices/system/cpu/cpu[1-9]*`
NR=`echo $CPUS | wc -w`

let x=0
for cpu in $CPUS; do
    file=$cpu/online
    CPUONLINE[$x]=$file
    ENB[$x]=`cat $file`
    CPU[$x]=`basename $cpu`
    let x=$x+1
done

let MAXCNT=$x

MSKCNT=$MAXCNT

if [ $MAXCNT -gt $MAXCPUS ]; then
    MSKCNT=$MAXCPUS
fi

let MSKCNT=2**$MSKCNT

hotplug() {
    MSK=$1
    ECHO="$MSK"
    CMD=""
    x=0
    while [ $MSK -gt 0 ]; do
	let bit=$MSK'&'1
	if [ $bit -eq 1 ]; then
	    if [ ${ENB[$x]} -eq 1 ]; then
		cmd="disabling"
		ENB[$x]=0
		num=0
	    else
		cmd="enabling"
		ENB[$x]=1
		num=1
	    fi
	    CMD="$CMD echo $num > ${CPUONLINE[$x]};"
	    ECHO="$ECHO $cmd ${CPU[$x]}"
	fi
	let x=$x+1
	let MSK=$MSK'>>'1
    done
    echo $ECHO
    eval $CMD
}

let MSKCNT=$MSKCNT-1
for i in `seq $MSKCNT`; do
    hotplug $i
done
for i in `seq $MSKCNT`; do
    hotplug $i
done

exit 0


* Re: [ANNOUNCE] 3.18.7-rt1
  2015-02-16 11:18 [ANNOUNCE] 3.18.7-rt1 Sebastian Andrzej Siewior
  2015-02-16 18:13 ` Sebastian Andrzej Siewior
  2015-02-16 20:12 ` Richard Cochran
@ 2015-02-17  8:37 ` Daniel Wagner
  2015-02-19  9:36   ` Joakim Hernberg
  2015-02-18 11:21 ` [patch-3.18.7-rt1] sched/context-tracking: fix PREEMPT_LAZY explosions Mike Galbraith
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 39+ messages in thread
From: Daniel Wagner @ 2015-02-17  8:37 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-rt-users
  Cc: LKML, Thomas Gleixner, rostedt, John Kacur

Hi Sebastian,

On 02/16/2015 12:18 PM, Sebastian Andrzej Siewior wrote:
> Dear RT folks!
> 
> I'm pleased to announce the v3.18.7-rt1 patch set. It was running over
> the weekend on my x86 box and was still alive this morning. However, it
> is still the first release of the v3.18 -RT series.
> I haven't followed the mailing list or commented on / applied any patches
> from the list for -RT while I was busy getting this release done (except
> one patch I needed anyway). This is about to change. I will try to
> go through my RT inbox before doing the next release.

I needed the patch below to get it running stable under load on my shiny box.

cheers,
daniel


>From c517743659575932d7b7c94a08276d0cee8a2fdd Mon Sep 17 00:00:00 2001
From: Daniel Wagner <daniel.wagner@bmw-carit.de>
Date: Fri, 11 Jul 2014 15:26:13 +0200
Subject: [PATCH] thermal: Defer thermal wakeups to threads

On RT the spin lock in pkg_temp_thermal_platform_thermal_notify will
call schedule() while we run in irq context.

[<ffffffff816850ac>] dump_stack+0x4e/0x8f
[<ffffffff81680f7d>] __schedule_bug+0xa6/0xb4
[<ffffffff816896b4>] __schedule+0x5b4/0x700
[<ffffffff8168982a>] schedule+0x2a/0x90
[<ffffffff8168a8b5>] rt_spin_lock_slowlock+0xe5/0x2d0
[<ffffffff8168afd5>] rt_spin_lock+0x25/0x30
[<ffffffffa03a7b75>] pkg_temp_thermal_platform_thermal_notify+0x45/0x134 [x86_pkg_temp_thermal]
[<ffffffff8103d4db>] ? therm_throt_process+0x1b/0x160
[<ffffffff8103d831>] intel_thermal_interrupt+0x211/0x250
[<ffffffff8103d8c1>] smp_thermal_interrupt+0x21/0x40
[<ffffffff8169415d>] thermal_interrupt+0x6d/0x80

Let's defer the work to a kthread.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/thermal/x86_pkg_temp_thermal.c | 49 ++++++++++++++++++++++++++++++++--
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/x86_pkg_temp_thermal.c b/drivers/thermal/x86_pkg_temp_thermal.c
index 9ea3d9d..001ba02 100644
--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -29,6 +29,7 @@
 #include <linux/pm.h>
 #include <linux/thermal.h>
 #include <linux/debugfs.h>
+#include <linux/work-simple.h>
 #include <asm/cpu_device_id.h>
 #include <asm/mce.h>
 
@@ -352,7 +353,7 @@ static void pkg_temp_thermal_threshold_work_fn(struct work_struct *work)
 	}
 }
 
-static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
+static void platform_thermal_notify_work(struct swork_event *event)
 {
 	unsigned long flags;
 	int cpu = smp_processor_id();
@@ -369,7 +370,7 @@ static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
 			pkg_work_scheduled[phy_id]) {
 		disable_pkg_thres_interrupt();
 		spin_unlock_irqrestore(&pkg_work_lock, flags);
-		return -EINVAL;
+		return;
 	}
 	pkg_work_scheduled[phy_id] = 1;
 	spin_unlock_irqrestore(&pkg_work_lock, flags);
@@ -378,9 +379,48 @@ static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
 	schedule_delayed_work_on(cpu,
 				&per_cpu(pkg_temp_thermal_threshold_work, cpu),
 				msecs_to_jiffies(notify_delay_ms));
+}
+
+#ifdef CONFIG_PREEMPT_RT_FULL
+static struct swork_event notify_work;
+
+static int thermal_notify_work_init(void)
+{
+	int err;
+
+	err = swork_get();
+	if (err)
+		return err;
+
+	INIT_SWORK(&notify_work, platform_thermal_notify_work);
+	return 0;
+}
+
+static void thermal_notify_work_cleanup(void)
+{
+	swork_put();
+}
+
+static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
+{
+	swork_queue(&notify_work);
 	return 0;
 }
 
+#else  /* !CONFIG_PREEMPT_RT_FULL */
+
+static int thermal_notify_work_init(void) { return 0; }
+
+static void thermal_notify_work_cleanup(void) { }
+
+static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
+{
+	platform_thermal_notify_work(NULL);
+
+	return 0;
+}
+#endif /* CONFIG_PREEMPT_RT_FULL */
+
 static int find_siblings_cpu(int cpu)
 {
 	int i;
@@ -594,6 +634,10 @@ static int __init pkg_temp_thermal_init(void)
 	for_each_online_cpu(i)
 		if (get_core_online(i))
 			goto err_ret;
+
+	if (thermal_notify_work_init())
+		goto err_ret;
+
 	__register_hotcpu_notifier(&pkg_temp_thermal_notifier);
 	cpu_notifier_register_done();
 
@@ -619,6 +663,7 @@ static void __exit pkg_temp_thermal_exit(void)
 
 	cpu_notifier_register_begin();
 	__unregister_hotcpu_notifier(&pkg_temp_thermal_notifier);
+	thermal_notify_work_cleanup();
 	mutex_lock(&phy_dev_list_mutex);
 	list_for_each_entry_safe(phdev, n, &phy_dev_list, list) {
 		/* Retore old MSR value for package thermal interrupt */
-- 
2.1.0




* [patch-3.18.7-rt1] sched/context-tracking: fix PREEMPT_LAZY explosions
  2015-02-16 11:18 [ANNOUNCE] 3.18.7-rt1 Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2015-02-17  8:37 ` Daniel Wagner
@ 2015-02-18 11:21 ` Mike Galbraith
  2015-02-18 11:27   ` [rfc patch-3.18.7-rt1] rt/nohz_full: Fix can_stop_full_tick() gripe within softirq Mike Galbraith
                     ` (2 more replies)
  2015-02-18 14:09 ` [patch-3.18.7-rt1] snd/pcm: fix snd_pcm_stream_lock*() irqs_disabled() splats Mike Galbraith
  2015-02-18 15:05 ` [patch-3.18.7-rt1]sunrpc: make svc_xprt_do_enqueue() use get_cpu_light() Mike Galbraith
  5 siblings, 3 replies; 39+ messages in thread
From: Mike Galbraith @ 2015-02-18 11:21 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On Mon, 2015-02-16 at 12:18 +0100, Sebastian Andrzej Siewior wrote:

> Known issues:
>
>       - lazy preempt on x86_64 leads to a crash with some load.

The below still works for me.  (it doesn't make nohz_full actually work
in rt, but at least folks who want to tinker with it can do so)


If context tracking is enabled, we can recurse, and explode violently.
Add missing checks to preempt_schedule_context().

Fix other inconsistencies spotted while searching for the little SOB.

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
---
 arch/x86/Kconfig                   |    2 +-
 arch/x86/include/asm/thread_info.h |    1 +
 include/linux/preempt.h            |    2 +-
 include/linux/preempt_mask.h       |   10 ++++++++--
 kernel/fork.c                      |    1 +
 kernel/sched/core.c                |   18 ++++++------------
 kernel/sched/fair.c                |    2 +-
 7 files changed, 19 insertions(+), 17 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -21,7 +21,7 @@ config X86_64
 ### Arch settings
 config X86
 	def_bool y
-	select HAVE_PREEMPT_LAZY if X86_32
+	select HAVE_PREEMPT_LAZY
 	select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
 	select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
 	select ARCH_HAS_FAST_MULTIPLIER
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -46,6 +46,7 @@ struct thread_info {
 	.flags		= 0,			\
 	.cpu		= 0,			\
 	.saved_preempt_count = INIT_PREEMPT_COUNT,	\
+	.preempt_lazy_count = 0,		\
 	.addr_limit	= KERNEL_DS,		\
 	.restart_block = {			\
 		.fn = do_no_restart_syscall,	\
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -91,8 +91,8 @@ do { \
 
 #define preempt_lazy_enable() \
 do { \
-	dec_preempt_lazy_count(); \
 	barrier(); \
+	dec_preempt_lazy_count(); \
 	preempt_check_resched(); \
 } while (0)
 
--- a/include/linux/preempt_mask.h
+++ b/include/linux/preempt_mask.h
@@ -118,9 +118,15 @@ extern int in_serving_softirq(void);
 		((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_CHECK_OFFSET)
 
 #ifdef CONFIG_PREEMPT_COUNT
-# define preemptible()	(preempt_count() == 0 && !irqs_disabled())
+# define preemptible()		(preempt_count() == 0 && !irqs_disabled())
+#ifdef CONFIG_PREEMPT_LAZY
+# define preemptible_lazy()	(preempt_lazy_count() == 0 || need_resched_now())
 #else
-# define preemptible()	0
+# define preemptible_lazy()	1
+#endif
+#else
+# define preemptible()		0
+# define preemptible_lazy()	0
 #endif
 
 #endif /* LINUX_PREEMPT_MASK_H */
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -348,6 +348,7 @@ static struct task_struct *dup_task_stru
 	setup_thread_stack(tsk, orig);
 	clear_user_return_notifier(tsk);
 	clear_tsk_need_resched(tsk);
+	clear_tsk_need_resched_lazy(tsk);
 	set_task_stack_end_magic(tsk);
 
 #ifdef CONFIG_CC_STACKPROTECTOR
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2861,8 +2861,8 @@ void migrate_enable(void)
 		p->migrate_disable = 0;
 
 	unpin_current_cpu();
-	preempt_enable();
 	preempt_lazy_enable();
+	preempt_enable();
 }
 EXPORT_SYMBOL(migrate_enable);
 #else
@@ -3099,19 +3099,13 @@ asmlinkage __visible void __sched notrac
 {
 	/*
 	 * If there is a non-zero preempt_count or interrupts are disabled,
-	 * we do not want to preempt the current task. Just return..
+	 * we do not want to preempt the current task. Just return.  For
+	 * lazy preemption we also check for non-zero preempt_count_lazy,
+	 * and bail if no immediate preemption is required.
 	 */
-	if (likely(!preemptible()))
+	if (likely(!preemptible() || !preemptible_lazy()))
 		return;
 
-#ifdef CONFIG_PREEMPT_LAZY
-	/*
-	 * Check for lazy preemption
-	 */
-	if (current_thread_info()->preempt_lazy_count &&
-			!test_thread_flag(TIF_NEED_RESCHED))
-		return;
-#endif
 	do {
 		__preempt_count_add(PREEMPT_ACTIVE);
 		/*
@@ -3155,7 +3149,7 @@ asmlinkage __visible void __sched notrac
 {
 	enum ctx_state prev_ctx;
 
-	if (likely(!preemptible()))
+	if (likely(!preemptible() || !preemptible_lazy()))
 		return;
 
 	do {
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4761,7 +4761,7 @@ static void check_preempt_wakeup(struct
 	 * prevents us from potentially nominating it as a false LAST_BUDDY
 	 * below.
 	 */
-	if (test_tsk_need_resched(curr))
+	if (test_tsk_need_resched(curr) || test_tsk_need_resched_lazy(curr))
 		return;
 
 	/* Idle tasks are by definition preempted by non-idle tasks. */




* [rfc patch-3.18.7-rt1] rt/nohz_full: Fix can_stop_full_tick() gripe within softirq
  2015-02-18 11:21 ` [patch-3.18.7-rt1] sched/context-tracking: fix PREEMPT_LAZY explosions Mike Galbraith
@ 2015-02-18 11:27   ` Mike Galbraith
  2015-02-19  4:50     ` Mike Galbraith
  2015-02-19  6:30   ` [patch-3.18.7-rt1] sched/context-tracking: fix PREEMPT_LAZY explosions Mike Galbraith
  2015-03-09 13:45   ` Sebastian Andrzej Siewior
  2 siblings, 1 reply; 39+ messages in thread
From: Mike Galbraith @ 2015-02-18 11:27 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

This shuts the gripe up.

[   57.170576] ------------[ cut here ]------------
[   57.170583] WARNING: CPU: 3 PID: 71 at kernel/time/tick-sched.c:167 can_stop_full_tick+0x1ae/0x260()
[   57.170617] CPU: 3 PID: 71 Comm: sirq-timer/3 Not tainted 3.18.7-rt1 #4
[   57.170617] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
[   57.170618]  0000000000000009 ffff88040b21fc48 ffffffff815bb722 0000000000000002
[   57.170619]  0000000000000000 ffff88040b21fc88 ffffffff810512f1 ffff88040b21fcc8
[   57.170620]  ffff88041eccd360 ffff88041ecc0000 000000000000000b 0000000000000000
[   57.170620] Call Trace:
[   57.170624]  [<ffffffff815bb722>] dump_stack+0x4f/0x9e
[   57.170626]  [<ffffffff810512f1>] warn_slowpath_common+0x81/0xc0
[   57.170628]  [<ffffffff810513ea>] warn_slowpath_null+0x1a/0x20
[   57.170628]  [<ffffffff810c1f2e>] can_stop_full_tick+0x1ae/0x260
[   57.170629]  [<ffffffff810c2071>] __tick_nohz_full_check+0x71/0xb0
[   57.170630]  [<ffffffff810c20be>] nohz_full_kick_work_func+0xe/0x10
[   57.170631]  [<ffffffff81115d3f>] irq_work_run_list+0x3f/0x60
[   57.170632]  [<ffffffff8111617e>] irq_work_tick+0x3e/0x90
[   57.170634]  [<ffffffff810b04c5>] run_timer_softirq+0x35/0x2f0
[   57.170635]  [<ffffffff8107f0a2>] ? __vtime_account_system+0x32/0x40
[   57.170637]  [<ffffffff81055794>] do_current_softirqs.isra.11+0x1c4/0x3c0
[   57.170638]  [<ffffffff81055a55>] run_ksoftirqd+0x25/0x40
[   57.170639]  [<ffffffff810723dd>] smpboot_thread_fn+0x1dd/0x340
[   57.170640]  [<ffffffff81072200>] ? smpboot_register_percpu_thread+0x100/0x100
[   57.170641]  [<ffffffff8106e4ab>] kthread+0xbb/0xe0
[   57.170642]  [<ffffffff8106e3f0>] ? kthread_worker_fn+0x190/0x190
[   57.170644]  [<ffffffff815c1cac>] ret_from_fork+0x7c/0xb0
[   57.170644]  [<ffffffff8106e3f0>] ? kthread_worker_fn+0x190/0x190
[   57.170645] ---[ end trace 0000000000000002 ]---

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
---
 kernel/time/tick-sched.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -222,6 +222,8 @@ void __tick_nohz_full_check(void)
 
 static void nohz_full_kick_work_func(struct irq_work *work)
 {
+	if (in_serving_softirq())
+		return;
 	__tick_nohz_full_check();
 }
 




* [patch-3.18.7-rt1] snd/pcm: fix snd_pcm_stream_lock*() irqs_disabled() splats
  2015-02-16 11:18 [ANNOUNCE] 3.18.7-rt1 Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2015-02-18 11:21 ` [patch-3.18.7-rt1] sched/context-tracking: fix PREEMPT_LAZY explosions Mike Galbraith
@ 2015-02-18 14:09 ` Mike Galbraith
  2015-02-25 14:08   ` Sebastian Andrzej Siewior
  2015-02-18 15:05 ` [patch-3.18.7-rt1]sunrpc: make svc_xprt_do_enqueue() use get_cpu_light() Mike Galbraith
  5 siblings, 1 reply; 39+ messages in thread
From: Mike Galbraith @ 2015-02-18 14:09 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

Locking functions previously using read_lock_irq()/read_lock_irqsave() were
changed to local_irq_disable/save(), leading to gripes.  Use nort variants.

[ 2423.966857] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
[ 2423.966858] in_atomic(): 0, irqs_disabled(): 1, pid: 5947, name: alsa-sink-ALC88
[ 2423.966860] CPU: 5 PID: 5947 Comm: alsa-sink-ALC88 Not tainted 3.18.7-rt1 #9
[ 2423.966860] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
[ 2423.966862]  ffff880409316240 ffff88040866fa38 ffffffff815bdeb5 0000000000000002
[ 2423.966863]  0000000000000000 ffff88040866fa58 ffffffff81073c86 ffffffffa03b2640
[ 2423.966864]  ffff88040239ec00 ffff88040866fa78 ffffffff815c3d34 ffffffffa03b2640
[ 2423.966864] Call Trace:
[ 2423.966868]  [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
[ 2423.966870]  [<ffffffff81073c86>] __might_sleep+0xe6/0x150
[ 2423.966880]  [<ffffffff815c3d34>] __rt_spin_lock+0x24/0x50
[ 2423.966883]  [<ffffffff815c4044>] rt_read_lock+0x34/0x40
[ 2423.966887]  [<ffffffffa03a2979>] snd_pcm_stream_lock+0x29/0x70 [snd_pcm]
[ 2423.966890]  [<ffffffffa03a355d>] snd_pcm_playback_poll+0x5d/0x120 [snd_pcm]
[ 2423.966892]  [<ffffffff811937a2>] do_sys_poll+0x322/0x5b0
[ 2423.966895]  [<ffffffff812bcfe7>] ? debug_smp_processor_id+0x17/0x20
[ 2423.966897]  [<ffffffffa03fa023>] ? azx_cc_read+0x23/0x30 [snd_hda_controller]
[ 2423.966899]  [<ffffffff810b9029>] ? timecounter_read+0x19/0x50
[ 2423.966901]  [<ffffffffa03fb187>] ? azx_get_wallclock_tstamp+0x97/0xc0 [snd_hda_controller]
[ 2423.966904]  [<ffffffffa03aaed1>] ? snd_pcm_update_hw_ptr0+0x1b1/0x470 [snd_pcm]
[ 2423.966906]  [<ffffffff812bcfe7>] ? debug_smp_processor_id+0x17/0x20
[ 2423.966907]  [<ffffffff81051d2a>] ? unpin_current_cpu+0x1a/0x70
[ 2423.966910]  [<ffffffff81079ec0>] ? migrate_enable+0xe0/0x1e0
[ 2423.966912]  [<ffffffff811923c0>] ? poll_select_copy_remaining+0x130/0x130
[ 2423.966914]  [<ffffffff811923c0>] ? poll_select_copy_remaining+0x130/0x130
[ 2423.966915]  [<ffffffff811923c0>] ? poll_select_copy_remaining+0x130/0x130
[ 2423.966916]  [<ffffffff811923c0>] ? poll_select_copy_remaining+0x130/0x130
[ 2423.966919]  [<ffffffffa03a5ba2>] ? snd_pcm_common_ioctl1+0x1c2/0xda0 [snd_pcm]
[ 2423.966920]  [<ffffffff812bcfe7>] ? debug_smp_processor_id+0x17/0x20
[ 2423.966923]  [<ffffffffa03a690b>] ? snd_pcm_playback_ioctl1+0x18b/0x2d0 [snd_pcm]
[ 2423.966924]  [<ffffffff811c6fa9>] ? eventfd_ctx_read+0x179/0x1e0
[ 2423.966926]  [<ffffffffa03a6a84>] ? snd_pcm_playback_ioctl+0x34/0x40 [snd_pcm]
[ 2423.966927]  [<ffffffff810b661e>] ? ktime_get_ts64+0x4e/0xf0
[ 2423.966928]  [<ffffffff81193d48>] SyS_ppoll+0x1a8/0x1c0
[ 2423.966930]  [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
---
 sound/core/pcm_native.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -104,7 +104,7 @@ EXPORT_SYMBOL_GPL(snd_pcm_stream_unlock)
 void snd_pcm_stream_lock_irq(struct snd_pcm_substream *substream)
 {
 	if (!substream->pcm->nonatomic)
-		local_irq_disable();
+		local_irq_disable_nort();
 	snd_pcm_stream_lock(substream);
 }
 EXPORT_SYMBOL_GPL(snd_pcm_stream_lock_irq);
@@ -113,7 +113,7 @@ void snd_pcm_stream_unlock_irq(struct sn
 {
 	snd_pcm_stream_unlock(substream);
 	if (!substream->pcm->nonatomic)
-		local_irq_enable();
+		local_irq_enable_nort();
 }
 EXPORT_SYMBOL_GPL(snd_pcm_stream_unlock_irq);
 
@@ -121,7 +121,7 @@ unsigned long _snd_pcm_stream_lock_irqsa
 {
 	unsigned long flags = 0;
 	if (!substream->pcm->nonatomic)
-		local_irq_save(flags);
+		local_irq_save_nort(flags);
 	snd_pcm_stream_lock(substream);
 	return flags;
 }
@@ -132,7 +132,7 @@ void snd_pcm_stream_unlock_irqrestore(st
 {
 	snd_pcm_stream_unlock(substream);
 	if (!substream->pcm->nonatomic)
-		local_irq_restore(flags);
+		local_irq_restore_nort(flags);
 }
 EXPORT_SYMBOL_GPL(snd_pcm_stream_unlock_irqrestore);
 




* [patch-3.18.7-rt1]sunrpc: make svc_xprt_do_enqueue() use get_cpu_light()
  2015-02-16 11:18 [ANNOUNCE] 3.18.7-rt1 Sebastian Andrzej Siewior
                   ` (4 preceding siblings ...)
  2015-02-18 14:09 ` [patch-3.18.7-rt1] snd/pcm: fix snd_pcm_stream_lock*() irqs_disabled() splats Mike Galbraith
@ 2015-02-18 15:05 ` Mike Galbraith
  2015-02-25 14:14   ` Sebastian Andrzej Siewior
  5 siblings, 1 reply; 39+ messages in thread
From: Mike Galbraith @ 2015-02-18 15:05 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

[   37.667792] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[   37.720307] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
[   37.720307] in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
[   37.720318] Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]

[   37.720320] CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9
[   37.720320] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
[   37.720321]  ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002
[   37.720322]  0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008
[   37.720322]  ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000
[   37.720323] Call Trace:
[   37.720326]  [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
[   37.720328]  [<ffffffff81073c86>] __might_sleep+0xe6/0x150
[   37.720329]  [<ffffffff815c3d84>] rt_spin_lock+0x24/0x50
[   37.720335]  [<ffffffffa06beec0>] svc_xprt_do_enqueue+0x80/0x230 [sunrpc]
[   37.720340]  [<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
[   37.720345]  [<ffffffffa06c03ed>] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc]
[   37.720349]  [<ffffffffa06b2693>] svc_addsock+0x143/0x200 [sunrpc]
[   37.720351]  [<ffffffff810b6f47>] ? __getnstimeofday64+0x37/0xd0
[   37.720352]  [<ffffffff810b6fee>] ? getnstimeofday64+0xe/0x30
[   37.720353]  [<ffffffff810b702a>] ? do_gettimeofday+0x1a/0x50
[   37.720356]  [<ffffffffa072e69c>] write_ports+0x28c/0x340 [nfsd]
[   37.720358]  [<ffffffff812bcfe7>] ? debug_smp_processor_id+0x17/0x20
[   37.720361]  [<ffffffff811512c7>] ? might_fault+0x47/0x50
[   37.720363]  [<ffffffffa072e410>] ? write_versions+0x320/0x320 [nfsd]
[   37.720365]  [<ffffffffa072d2ac>] nfsctl_transaction_write+0x4c/0x80 [nfsd]
[   37.720366]  [<ffffffff8117ee83>] vfs_write+0xb3/0x1d0
[   37.720368]  [<ffffffff814c5082>] ? release_sock+0x152/0x1a0
[   37.720369]  [<ffffffff8117f889>] SyS_write+0x49/0xb0
[   37.720371]  [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
---
 net/sunrpc/svc_xprt.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -357,7 +357,7 @@ static void svc_xprt_do_enqueue(struct s
 		return;
 	}
 
-	cpu = get_cpu();
+	cpu = get_cpu_light();
 	pool = svc_pool_for_cpu(xprt->xpt_server, cpu);
 	spin_lock_bh(&pool->sp_lock);
 
@@ -390,7 +390,7 @@ static void svc_xprt_do_enqueue(struct s
 	}
 
 	spin_unlock_bh(&pool->sp_lock);
-	put_cpu();
+	put_cpu_light();
 }
 
 /*




* Re: [rfc patch-3.18.7-rt1] rt/nohz_full: Fix can_stop_full_tick() gripe within softirq
  2015-02-18 11:27   ` [rfc patch-3.18.7-rt1] rt/nohz_full: Fix can_stop_full_tick() gripe within softirq Mike Galbraith
@ 2015-02-19  4:50     ` Mike Galbraith
  0 siblings, 0 replies; 39+ messages in thread
From: Mike Galbraith @ 2015-02-19  4:50 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On Wed, 2015-02-18 at 12:27 +0100, Mike Galbraith wrote:
> This shuts the gripe up.

Gee, that was a very particularly bad way to shut it up.  Never mind :)

	-Mike



* Re: [patch-3.18.7-rt1] sched/context-tracking: fix PREEMPT_LAZY explosions
  2015-02-18 11:21 ` [patch-3.18.7-rt1] sched/context-tracking: fix PREEMPT_LAZY explosions Mike Galbraith
  2015-02-18 11:27   ` [rfc patch-3.18.7-rt1] rt/nohz_full: Fix can_stop_full_tick() gripe within softirq Mike Galbraith
@ 2015-02-19  6:30   ` Mike Galbraith
  2015-03-09 13:45   ` Sebastian Andrzej Siewior
  2 siblings, 0 replies; 39+ messages in thread
From: Mike Galbraith @ 2015-02-19  6:30 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On Wed, 2015-02-18 at 12:21 +0100, Mike Galbraith wrote:
> On Mon, 2015-02-16 at 12:18 +0100, Sebastian Andrzej Siewior wrote:
> 
> > Known issues:
> >
> >       - lazy preempt on x86_64 leads to a crash with some load.
> 
> The below still works for me.  (it doesn't make nohz_full actually work
> in rt, but at least folks who want to tinker with it can do so)

Actually, with the below (and other local hacks obviously), it's looking
pretty darn good on my little desktop box.

---
 kernel/sched/core.c      |    2 +-
 kernel/time/tick-sched.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -810,7 +810,7 @@ bool sched_can_stop_tick(void)
 	 * nr_running update is assumed to be visible
 	 * after IPI is sent from wakers.
 	 */
-	if (this_rq()->nr_running > 1)
+	if (this_rq()->nr_running - task_is_softirqd(current) > 1)
 		return false;
 
 	return true;
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -164,7 +164,7 @@ bool tick_nohz_full_running;
 
 static bool can_stop_full_tick(void)
 {
-	WARN_ON_ONCE(!irqs_disabled());
+	WARN_ON_ONCE_NONRT(!irqs_disabled());  <== better way to shut warning up ;-)
 
 	if (!sched_can_stop_tick()) {
 		trace_tick_stop(0, "more than 1 task in runqueue\n");

With 2 shielded cored measuring perturbations (output 1 line/min)

homer:~ # cgexec -g cpuset:rtcpus taskset -c 2 pert 60
3591.74 MHZ CPU
perturbation threshold 0.013 usecs.
pert/s:       12 >7.76us:        0 min:  0.04 max:  0.78 avg:  0.09 sum/s:     1us overhead: 0.00%
pert/s:        7 >7.02us:        0 min:  0.04 max:  0.70 avg:  0.05 sum/s:     0us overhead: 0.00%
pert/s:       14 >8.11us:        1 min:  0.04 max: 35.72 avg:  0.09 sum/s:     1us overhead: 0.00%
pert/s:       39 >7.35us:        0 min:  0.04 max:  0.93 avg:  0.11 sum/s:     4us overhead: 0.00%
pert/s:       44 >6.66us:        0 min:  0.04 max:  0.84 avg:  0.11 sum/s:     5us overhead: 0.00%
pert/s:       46 >6.05us:        0 min:  0.04 max:  1.00 avg:  0.11 sum/s:     5us overhead: 0.00%
pert/s:       36 >5.49us:        0 min:  0.04 max:  0.71 avg:  0.10 sum/s:     4us overhead: 0.00%
pert/s:       42 >5.21us:        0 min:  0.04 max:  5.43 avg:  0.10 sum/s:     4us overhead: 0.00%
pert/s:       24 >4.73us:        0 min:  0.04 max:  0.74 avg:  0.10 sum/s:     2us overhead: 0.00%
pert/s:       33 >4.31us:        0 min:  0.04 max:  0.90 avg:  0.10 sum/s:     3us overhead: 0.00%
pert/s:       34 >3.92us:        0 min:  0.04 max:  0.70 avg:  0.10 sum/s:     4us overhead: 0.00%
pert/s:       64 >3.59us:        0 min:  0.04 max:  1.05 avg:  0.12 sum/s:     7us overhead: 0.00%
pert/s:       53 >3.27us:        0 min:  0.04 max:  0.66 avg:  0.11 sum/s:     6us overhead: 0.00%
pert/s:       55 >2.98us:        0 min:  0.04 max:  0.72 avg:  0.11 sum/s:     6us overhead: 0.00%
pert/s:       38 >2.70us:        0 min:  0.04 max:  0.35 avg:  0.10 sum/s:     4us overhead: 0.00%
pert/s:       35 >2.46us:        0 min:  0.04 max:  0.43 avg:  0.10 sum/s:     4us overhead: 0.00%
pert/s:       36 >2.25us:        0 min:  0.04 max:  0.58 avg:  0.10 sum/s:     4us overhead: 0.00%
pert/s:       44 >2.08us:        0 min:  0.04 max:  0.93 avg:  0.11 sum/s:     5us overhead: 0.00%
pert/s:       45 >1.91us:        0 min:  0.04 max:  0.76 avg:  0.11 sum/s:     5us overhead: 0.00%
pert/s:       33 >1.75us:        0 min:  0.04 max:  0.55 avg:  0.10 sum/s:     3us overhead: 0.00%
Terminated

homer:~ # cgexec -g cpuset:rtcpus taskset -c 3 pert 60
3591.73 MHZ CPU
perturbation threshold 0.013 usecs.
pert/s:       11 >6.26us:        0 min:  0.04 max:  0.77 avg:  0.08 sum/s:     1us overhead: 0.00%
pert/s:        7 >5.67us:        0 min:  0.04 max:  0.71 avg:  0.06 sum/s:     0us overhead: 0.00%
pert/s:        8 >5.14us:        0 min:  0.04 max:  0.57 avg:  0.05 sum/s:     0us overhead: 0.00%
pert/s:       39 >4.67us:        0 min:  0.04 max:  0.94 avg:  0.11 sum/s:     4us overhead: 0.00%
pert/s:       44 >4.25us:        0 min:  0.04 max:  0.83 avg:  0.11 sum/s:     5us overhead: 0.00%
pert/s:       46 >3.89us:        0 min:  0.04 max:  1.12 avg:  0.12 sum/s:     5us overhead: 0.00%
pert/s:       36 >3.54us:        0 min:  0.04 max:  0.70 avg:  0.11 sum/s:     4us overhead: 0.00%
pert/s:       40 >3.42us:        1 min:  0.04 max:  4.63 avg:  0.10 sum/s:     4us overhead: 0.00%
pert/s:       24 >3.12us:        0 min:  0.04 max:  0.60 avg:  0.10 sum/s:     2us overhead: 0.00%
pert/s:       34 >2.86us:        0 min:  0.04 max:  0.97 avg:  0.11 sum/s:     4us overhead: 0.00%
pert/s:       35 >2.60us:        0 min:  0.04 max:  0.39 avg:  0.10 sum/s:     4us overhead: 0.00%
pert/s:       64 >2.39us:        0 min:  0.04 max:  0.94 avg:  0.12 sum/s:     8us overhead: 0.00%
pert/s:       52 >2.19us:        0 min:  0.04 max:  0.73 avg:  0.12 sum/s:     6us overhead: 0.00%
pert/s:       55 >2.01us:        0 min:  0.04 max:  0.62 avg:  0.12 sum/s:     6us overhead: 0.00%
pert/s:       38 >1.83us:        0 min:  0.04 max:  0.35 avg:  0.11 sum/s:     4us overhead: 0.00%
pert/s:       35 >1.68us:        0 min:  0.04 max:  0.43 avg:  0.10 sum/s:     4us overhead: 0.00%
pert/s:       36 >1.54us:        0 min:  0.04 max:  0.61 avg:  0.11 sum/s:     4us overhead: 0.00%
pert/s:       45 >1.44us:        0 min:  0.04 max:  0.92 avg:  0.11 sum/s:     5us overhead: 0.00%
pert/s:       43 >1.33us:        0 min:  0.04 max:  0.64 avg:  0.11 sum/s:     5us overhead: 0.00%
pert/s:       34 >1.23us:        0 min:  0.04 max:  0.54 avg:  0.10 sum/s:     4us overhead: 0.00%
Terminated


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [ANNOUNCE] 3.18.7-rt1
  2015-02-17  8:37 ` Daniel Wagner
@ 2015-02-19  9:36   ` Joakim Hernberg
  2015-02-25 13:55     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 39+ messages in thread
From: Joakim Hernberg @ 2015-02-19  9:36 UTC (permalink / raw)
  To: Daniel Wagner, John Kacur
  Cc: Sebastian Andrzej Siewior, linux-rt-users, LKML, Thomas Gleixner,
	rostedt

On Tue, 17 Feb 2015 09:37:44 +0100
Daniel Wagner <wagi@monom.org> wrote:

> I needed the patch below to get it running stable under load on my
> shiny box.

FWIW, this patch makes 3.18-rt survive thermal events on my laptop.

> From c517743659575932d7b7c94a08276d0cee8a2fdd Mon Sep 17 00:00:00 2001
> From: Daniel Wagner <daniel.wagner@bmw-carit.de>
> Date: Fri, 11 Jul 2014 15:26:13 +0200
> Subject: [PATCH] thermal: Defer thermal wakeups to threads
> 
> On RT the spin lock in pkg_temp_thermal_platform_thermal_notify will
> call schedule while we run in irq context.
> 
> [<ffffffff816850ac>] dump_stack+0x4e/0x8f
> [<ffffffff81680f7d>] __schedule_bug+0xa6/0xb4
> [<ffffffff816896b4>] __schedule+0x5b4/0x700
> [<ffffffff8168982a>] schedule+0x2a/0x90
> [<ffffffff8168a8b5>] rt_spin_lock_slowlock+0xe5/0x2d0
> [<ffffffff8168afd5>] rt_spin_lock+0x25/0x30
> [<ffffffffa03a7b75>] pkg_temp_thermal_platform_thermal_notify+0x45/0x134 [x86_pkg_temp_thermal]
> [<ffffffff8103d4db>] ? therm_throt_process+0x1b/0x160
> [<ffffffff8103d831>] intel_thermal_interrupt+0x211/0x250
> [<ffffffff8103d8c1>] smp_thermal_interrupt+0x21/0x40
> [<ffffffff8169415d>] thermal_interrupt+0x6d/0x80
> 
> Let's defer the work to a kthread.
> 
> Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  drivers/thermal/x86_pkg_temp_thermal.c | 49 ++++++++++++++++++++++++++++++++--
>  1 file changed, 47 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/thermal/x86_pkg_temp_thermal.c b/drivers/thermal/x86_pkg_temp_thermal.c
> index 9ea3d9d..001ba02 100644
> --- a/drivers/thermal/x86_pkg_temp_thermal.c
> +++ b/drivers/thermal/x86_pkg_temp_thermal.c
> @@ -29,6 +29,7 @@
>  #include <linux/pm.h>
>  #include <linux/thermal.h>
>  #include <linux/debugfs.h>
> +#include <linux/work-simple.h>
>  #include <asm/cpu_device_id.h>
>  #include <asm/mce.h>
>  
> @@ -352,7 +353,7 @@ static void pkg_temp_thermal_threshold_work_fn(struct work_struct *work)
>  	}
>  }
>  
> -static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
> +static void platform_thermal_notify_work(struct swork_event *event)
>  {
>  	unsigned long flags;
>  	int cpu = smp_processor_id();
> @@ -369,7 +370,7 @@ static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
>  			pkg_work_scheduled[phy_id]) {
>  		disable_pkg_thres_interrupt();
>  		spin_unlock_irqrestore(&pkg_work_lock, flags);
> -		return -EINVAL;
> +		return;
>  	}
>  	pkg_work_scheduled[phy_id] = 1;
>  	spin_unlock_irqrestore(&pkg_work_lock, flags);
> @@ -378,9 +379,48 @@ static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
>  	schedule_delayed_work_on(cpu,
>  				&per_cpu(pkg_temp_thermal_threshold_work, cpu),
>  				msecs_to_jiffies(notify_delay_ms));
> +}
> +
> +#ifdef CONFIG_PREEMPT_RT_FULL
> +static struct swork_event notify_work;
> +
> +static int thermal_notify_work_init(void)
> +{
> +	int err;
> +
> +	err = swork_get();
> +	if (!err)
> +		return err;
> +
> +	INIT_SWORK(&notify_work, platform_thermal_notify_work);
> +	return 0;
> +}
> +
> +static void thermal_notify_work_cleanup(void)
> +{
> +	swork_put();
> +}
> +
> +static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
> +{
> +	swork_queue(&notify_work);
>  	return 0;
>  }
>  
> +#else  /* !CONFIG_PREEMPT_RT_FULL */
> +
> +static int thermal_notify_work_init(void) { return 0; }
> +
> +static int thermal_notify_work_cleanup(void) {  }
> +
> +static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
> +{
> +	platform_thermal_notify_work(NULL);
> +
> +	return 0;
> +}
> +#endif /* CONFIG_PREEMPT_RT_FULL */
> +
>  static int find_siblings_cpu(int cpu)
>  {
>  	int i;
> @@ -594,6 +634,10 @@ static int __init pkg_temp_thermal_init(void)
>  	for_each_online_cpu(i)
>  		if (get_core_online(i))
>  			goto err_ret;
> +
> +	if (!thermal_notify_work_init())
> +		goto err_ret;
> +
>  	__register_hotcpu_notifier(&pkg_temp_thermal_notifier);
>  	cpu_notifier_register_done();
>  
> @@ -619,6 +663,7 @@ static void __exit pkg_temp_thermal_exit(void)
>  
>  	cpu_notifier_register_begin();
>  	__unregister_hotcpu_notifier(&pkg_temp_thermal_notifier);
> +	thermal_notify_work_cleanup();
>  	mutex_lock(&phy_dev_list_mutex);
>  	list_for_each_entry_safe(phdev, n, &phy_dev_list, list) {
>  		/* Retore old MSR value for package thermal
> interrupt */



-- 

   Joakim

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [ANNOUNCE] 3.18.7-rt1
  2015-02-19  9:36   ` Joakim Hernberg
@ 2015-02-25 13:55     ` Sebastian Andrzej Siewior
  2015-02-26  8:48       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-02-25 13:55 UTC (permalink / raw)
  To: Joakim Hernberg
  Cc: Daniel Wagner, John Kacur, linux-rt-users, LKML, Thomas Gleixner,
	rostedt

* Joakim Hernberg | 2015-02-19 10:36:01 [+0100]:

>On Tue, 17 Feb 2015 09:37:44 +0100
>Daniel Wagner <wagi@monom.org> wrote:
>
>> I needed the patch below to get it running stable under load on my
>> shiny box.
>
>FWIW, this patch makes 3.18-rt survive thermal events on my laptop.

Okay. I applied a slightly modified version of it. The init() is now
invoked before the callback function is assigned.
On module exit there is one piece missing: after the callback is
removed we should flush the swork queue to ensure that no callback
is pending or executing. But this won't work yet because the
swait API lacks that feature, so it has to be postponed.

Sebastian

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch-3.18.7-rt1] snd/pcm: fix snd_pcm_stream_lock*() irqs_disabled() splats
  2015-02-18 14:09 ` [patch-3.18.7-rt1] snd/pcm: fix snd_pcm_stream_lock*() irqs_disabled() splats Mike Galbraith
@ 2015-02-25 14:08   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-02-25 14:08 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

* Mike Galbraith | 2015-02-18 15:09:23 [+0100]:

>Locking functions previously using read_lock_irq()/read_lock_irqsave() were
>changed to local_irq_disable/save(), leading to gripes.  Use nort variants.
applied

Sebastian

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch-3.18.7-rt1]sunrpc: make svc_xprt_do_enqueue() use get_cpu_light()
  2015-02-18 15:05 ` [patch-3.18.7-rt1]sunrpc: make svc_xprt_do_enqueue() use get_cpu_light() Mike Galbraith
@ 2015-02-25 14:14   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-02-25 14:14 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

* Mike Galbraith | 2015-02-18 16:05:28 [+0100]:

>[   37.667792] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
>[   37.720307] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
>[   37.720307] in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
>[   37.720318] Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]

applied

Sebastian

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [ANNOUNCE] 3.18.7-rt1
  2015-02-25 13:55     ` Sebastian Andrzej Siewior
@ 2015-02-26  8:48       ` Sebastian Andrzej Siewior
  2015-02-27  6:40         ` Daniel Wagner
  0 siblings, 1 reply; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-02-26  8:48 UTC (permalink / raw)
  To: Joakim Hernberg
  Cc: Daniel Wagner, John Kacur, linux-rt-users, LKML, Thomas Gleixner,
	rostedt

* Sebastian Andrzej Siewior | 2015-02-25 14:55:01 [+0100]:

>Okay. I applied a slightly modified version of it. That init() is

>From 8334ac498f104c00e5d93e3e83d3bcec1a993cec Mon Sep 17 00:00:00 2001
From: Daniel Wagner <wagi@monom.org>
Date: Tue, 17 Feb 2015 09:37:44 +0100
Subject: [PATCH] thermal: Defer thermal wakeups to threads

On RT the spin lock in pkg_temp_thermal_platform_thermal_notify will
call schedule while we run in irq context.

[<ffffffff816850ac>] dump_stack+0x4e/0x8f
[<ffffffff81680f7d>] __schedule_bug+0xa6/0xb4
[<ffffffff816896b4>] __schedule+0x5b4/0x700
[<ffffffff8168982a>] schedule+0x2a/0x90
[<ffffffff8168a8b5>] rt_spin_lock_slowlock+0xe5/0x2d0
[<ffffffff8168afd5>] rt_spin_lock+0x25/0x30
[<ffffffffa03a7b75>] pkg_temp_thermal_platform_thermal_notify+0x45/0x134 [x86_pkg_temp_thermal]
[<ffffffff8103d4db>] ? therm_throt_process+0x1b/0x160
[<ffffffff8103d831>] intel_thermal_interrupt+0x211/0x250
[<ffffffff8103d8c1>] smp_thermal_interrupt+0x21/0x40
[<ffffffff8169415d>] thermal_interrupt+0x6d/0x80

Let's defer the work to a kthread.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
[bigeasy: reorder init/deinit position. TODO: flush swork on exit]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/thermal/x86_pkg_temp_thermal.c | 51 ++++++++++++++++++++++++++++++++--
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/drivers/thermal/x86_pkg_temp_thermal.c b/drivers/thermal/x86_pkg_temp_thermal.c
index 9ea3d9d49ffc..ab7c400a9d5c 100644
--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -29,6 +29,7 @@
 #include <linux/pm.h>
 #include <linux/thermal.h>
 #include <linux/debugfs.h>
+#include <linux/work-simple.h>
 #include <asm/cpu_device_id.h>
 #include <asm/mce.h>
 
@@ -352,7 +353,7 @@ static void pkg_temp_thermal_threshold_work_fn(struct work_struct *work)
 	}
 }
 
-static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
+static void platform_thermal_notify_work(struct swork_event *event)
 {
 	unsigned long flags;
 	int cpu = smp_processor_id();
@@ -369,7 +370,7 @@ static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
 			pkg_work_scheduled[phy_id]) {
 		disable_pkg_thres_interrupt();
 		spin_unlock_irqrestore(&pkg_work_lock, flags);
-		return -EINVAL;
+		return;
 	}
 	pkg_work_scheduled[phy_id] = 1;
 	spin_unlock_irqrestore(&pkg_work_lock, flags);
@@ -378,9 +379,48 @@ static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
 	schedule_delayed_work_on(cpu,
 				&per_cpu(pkg_temp_thermal_threshold_work, cpu),
 				msecs_to_jiffies(notify_delay_ms));
+}
+
+#ifdef CONFIG_PREEMPT_RT_FULL
+static struct swork_event notify_work;
+
+static int thermal_notify_work_init(void)
+{
+	int err;
+
+	err = swork_get();
+	if (!err)
+		return err;
+
+	INIT_SWORK(&notify_work, platform_thermal_notify_work);
 	return 0;
 }
 
+static void thermal_notify_work_cleanup(void)
+{
+	swork_put();
+}
+
+static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
+{
+	swork_queue(&notify_work);
+	return 0;
+}
+
+#else  /* !CONFIG_PREEMPT_RT_FULL */
+
+static int thermal_notify_work_init(void) { return 0; }
+
+static int thermal_notify_work_cleanup(void) {  }
+
+static int pkg_temp_thermal_platform_thermal_notify(__u64 msr_val)
+{
+	platform_thermal_notify_work(NULL);
+
+	return 0;
+}
+#endif /* CONFIG_PREEMPT_RT_FULL */
+
 static int find_siblings_cpu(int cpu)
 {
 	int i;
@@ -584,6 +624,9 @@ static int __init pkg_temp_thermal_init(void)
 	if (!x86_match_cpu(pkg_temp_thermal_ids))
 		return -ENODEV;
 
+	if (!thermal_notify_work_init())
+		return -ENODEV;
+
 	spin_lock_init(&pkg_work_lock);
 	platform_thermal_package_notify =
 			pkg_temp_thermal_platform_thermal_notify;
@@ -594,6 +637,7 @@ static int __init pkg_temp_thermal_init(void)
 	for_each_online_cpu(i)
 		if (get_core_online(i))
 			goto err_ret;
+
 	__register_hotcpu_notifier(&pkg_temp_thermal_notifier);
 	cpu_notifier_register_done();
 
@@ -608,7 +652,7 @@ static int __init pkg_temp_thermal_init(void)
 	kfree(pkg_work_scheduled);
 	platform_thermal_package_notify = NULL;
 	platform_thermal_package_rate_control = NULL;
-
+	thermal_notify_work_cleanup();
 	return -ENODEV;
 }
 
@@ -633,6 +677,7 @@ static void __exit pkg_temp_thermal_exit(void)
 	mutex_unlock(&phy_dev_list_mutex);
 	platform_thermal_package_notify = NULL;
 	platform_thermal_package_rate_control = NULL;
+	thermal_notify_work_cleanup();
 	for_each_online_cpu(i)
 		cancel_delayed_work_sync(
 			&per_cpu(pkg_temp_thermal_threshold_work, i));
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [ANNOUNCE] 3.18.7-rt1
  2015-02-26  8:48       ` Sebastian Andrzej Siewior
@ 2015-02-27  6:40         ` Daniel Wagner
  2015-02-27  7:58           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 39+ messages in thread
From: Daniel Wagner @ 2015-02-27  6:40 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Joakim Hernberg
  Cc: John Kacur, linux-rt-users, LKML, Thomas Gleixner, rostedt

Hi Sebastian

On 02/26/2015 09:48 AM, Sebastian Andrzej Siewior wrote:
> * Sebastian Andrzej Siewior | 2015-02-25 14:55:01 [+0100]:
> +
> +static int thermal_notify_work_init(void)
> +{
> +	int err;
> +
> +	err = swork_get();
> +	if (!err)
> +		return err;

I think this should be:

	if (err)
		return err;


cheers,
daniel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [ANNOUNCE] 3.18.7-rt1
  2015-02-27  6:40         ` Daniel Wagner
@ 2015-02-27  7:58           ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-02-27  7:58 UTC (permalink / raw)
  To: Daniel Wagner, Joakim Hernberg
  Cc: John Kacur, linux-rt-users, LKML, Thomas Gleixner, rostedt

On 02/27/2015 07:40 AM, Daniel Wagner wrote:
> Hi Sebastian

Hi Daniel,

> On 02/26/2015 09:48 AM, Sebastian Andrzej Siewior wrote:
>> * Sebastian Andrzej Siewior | 2015-02-25 14:55:01 [+0100]:
>> +
>> +static int thermal_notify_work_init(void)
>> +{
>> +	int err;
>> +
>> +	err = swork_get();
>> +	if (!err)
>> +		return err;
> 
> I think this should be:
> 
> 	if (err)
> 		return err;

Makes sense.

> 
> cheers,
> daniel

Sebastian

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch-3.18.7-rt1] sched/context-tracking: fix PREEMPT_LAZY explosions
  2015-02-18 11:21 ` [patch-3.18.7-rt1] sched/context-tracking: fix PREEMPT_LAZY explosions Mike Galbraith
  2015-02-18 11:27   ` [rfc patch-3.18.7-rt1] rt/nohz_full: Fix can_stop_full_tick() gripe within softirq Mike Galbraith
  2015-02-19  6:30   ` [patch-3.18.7-rt1] sched/context-tracking: fix PREEMPT_LAZY explosions Mike Galbraith
@ 2015-03-09 13:45   ` Sebastian Andrzej Siewior
  2015-03-09 14:36     ` Mike Galbraith
  2 siblings, 1 reply; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-03-09 13:45 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

* Mike Galbraith | 2015-02-18 12:21:54 [+0100]:

>On Mon, 2015-02-16 at 12:18 +0100, Sebastian Andrzej Siewior wrote:
>
>> Known issues:
>>
>>       - lazy preempt on x86_64 leads to a crash with some load.
>
>The below still works for me.  (it doesn't make nohz_full actually work
>in rt, but at least folks who want to tinker with it can do so)
So your box still crashes without it?

The thing is I tried to reproduce it and it does not occur anymore.
Neither in KVM nor on real HW.

Sebastian

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch-3.18.7-rt1] sched/context-tracking: fix PREEMPT_LAZY explosions
  2015-03-09 13:45   ` Sebastian Andrzej Siewior
@ 2015-03-09 14:36     ` Mike Galbraith
  2015-03-09 14:45       ` Sebastian Andrzej Siewior
  2015-03-10  9:35       ` Mike Galbraith
  0 siblings, 2 replies; 39+ messages in thread
From: Mike Galbraith @ 2015-03-09 14:36 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On Mon, 2015-03-09 at 14:45 +0100, Sebastian Andrzej Siewior wrote:
> * Mike Galbraith | 2015-02-18 12:21:54 [+0100]:
> 
> >On Mon, 2015-02-16 at 12:18 +0100, Sebastian Andrzej Siewior wrote:
> >
> >> Known issues:
> >>
> >>       - lazy preempt on x86_64 leads to a crash with some load.
> >
> >The below still works for me.  (it doesn't make nohz_full actually work
> >in rt, but at least folks who want to tinker with it can do so)
> So your box still crashes without it?
> 
> The thing is I tried to reproduce it and it does not occur anymore.
> Neither in KVM nor on real HW.

Heh, well that's interesting, I just presumed it would still explode
because you said it would.  I'll beat on it and let you know if I'm
carrying a placebo patch or not :)

	-Mike


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch-3.18.7-rt1] sched/context-tracking: fix PREEMPT_LAZY explosions
  2015-03-09 14:36     ` Mike Galbraith
@ 2015-03-09 14:45       ` Sebastian Andrzej Siewior
  2015-03-10  9:35       ` Mike Galbraith
  1 sibling, 0 replies; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-03-09 14:45 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On 03/09/2015 03:36 PM, Mike Galbraith wrote:
> On Mon, 2015-03-09 at 14:45 +0100, Sebastian Andrzej Siewior wrote:
>> * Mike Galbraith | 2015-02-18 12:21:54 [+0100]:
>>
>>> On Mon, 2015-02-16 at 12:18 +0100, Sebastian Andrzej Siewior wrote:
>>>
>>>> Known issues:
>>>>
>>>>       - lazy preempt on x86_64 leads to a crash with some load.
>>>
>>> The below still works for me.  (it doesn't make nohz_full actually work
>>> in rt, but at least folks who want to tinker with it can do so)
>> So your box still crashes without it?
>>
>> The thing is I tried to reproduce it and it does not occur anymore.
>> Neither in KVM nor on real HW.
> 
> Heh, well that's interesting, I just presumed it would still explode
> because you said it would.  I'll beat on it and let you know if I'm
> carrying a placebo patch or not :)

Hehe. By the time I did 3.18-rt1 I was surprised that the disable-
lazy-preempt-on-x86_64 hack was still there because I assumed it had
been fixed in the v3.14 series. So I left it as-is (including the broken
message) and wanted to take a closer look at it later. And after you
posted a few patches to fix it I assumed (even more) that it was still
broken :)

> 
> 	-Mike
> 

Sebastian

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch-3.18.7-rt1] sched/context-tracking: fix PREEMPT_LAZY explosions
  2015-03-09 14:36     ` Mike Galbraith
  2015-03-09 14:45       ` Sebastian Andrzej Siewior
@ 2015-03-10  9:35       ` Mike Galbraith
  2015-03-11  6:18         ` Mike Galbraith
  2015-03-12  9:38         ` [rfc patch] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL Mike Galbraith
  1 sibling, 2 replies; 39+ messages in thread
From: Mike Galbraith @ 2015-03-10  9:35 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On Mon, 2015-03-09 at 15:36 +0100, Mike Galbraith wrote:
> On Mon, 2015-03-09 at 14:45 +0100, Sebastian Andrzej Siewior wrote:
> > * Mike Galbraith | 2015-02-18 12:21:54 [+0100]:
> > 
> > >On Mon, 2015-02-16 at 12:18 +0100, Sebastian Andrzej Siewior wrote:
> > >
> > >> Known issues:
> > >>
> > >>       - lazy preempt on x86_64 leads to a crash with some load.
> > >
> > >The below still works for me.  (it doesn't make nohz_full actually work
> > >in rt, but at least folks who want to tinker with it can do so)
> > So your box still crashes without it?
> > 
> > The thing is I tried to reproduce it and it does not occur anymore.
> > Neither in KVM nor on real HW.
> 
> Heh, well that's interesting, I just presumed it would still explode
> > because you said it would.  I'll beat on it and let you know if I'm
> carrying a placebo patch or not :)

Yup, someone made kaboom go away.  I was able to easily blow my box out
of the water by running tbench + kbuild with nohz_full active.  No more.

nohz_full doesn't work though, due to an otherwise solo task trying to
shut the tick down having just awakened ksoftirqd.. but no more kaboom.

	-Mike


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch-3.18.7-rt1] sched/context-tracking: fix PREEMPT_LAZY explosions
  2015-03-10  9:35       ` Mike Galbraith
@ 2015-03-11  6:18         ` Mike Galbraith
  2015-03-12  9:38         ` [rfc patch] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL Mike Galbraith
  1 sibling, 0 replies; 39+ messages in thread
From: Mike Galbraith @ 2015-03-11  6:18 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On Tue, 2015-03-10 at 10:35 +0100, Mike Galbraith wrote:
> On Mon, 2015-03-09 at 15:36 +0100, Mike Galbraith wrote:
> > On Mon, 2015-03-09 at 14:45 +0100, Sebastian Andrzej Siewior wrote:
> > > * Mike Galbraith | 2015-02-18 12:21:54 [+0100]:
> > > 
> > > >On Mon, 2015-02-16 at 12:18 +0100, Sebastian Andrzej Siewior wrote:
> > > >
> > > >> Known issues:
> > > >>
> > > >>       - lazy preempt on x86_64 leads to a crash with some load.
> > > >
> > > >The below still works for me.  (it doesn't make nohz_full actually work
> > > >in rt, but at least folks who want to tinker with it can do so)
> > > So your box still crashes without it?
> > > 
> > > The thing is I tried to reproduce it and it does not occur anymore.
> > > Neither in KVM nor on real HW.
> > 
> > Heh, well that's interesting, I just presumed it would still explode
> > because you said it would.  I'll beat on it any let you know if I'm
> > carrying a placebo patch or not :)
> 
> Yup, someone made kaboom go away.  I was able to easily blow my box out
> of the water by running tbench + kbuild with nohz_full active.  No more.
> 
> nohz_full doesn't work though, due to an otherwise solo task trying to
> shut the tick down having just awakened ksoftirqd.. but no more kaboom.

FWIW, the hack below made nohz_full functional in my trees, and it still
doesn't explode, so seems it really really did get fixed up.

---
 kernel/sched/core.c      |    2 +-
 kernel/softirq.c         |   25 +++++++++++++++++++------
 kernel/time/hrtimer.c    |   10 +++++++++-
 kernel/time/tick-sched.c |   15 ++++++++++++++-
 4 files changed, 43 insertions(+), 9 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -810,7 +810,7 @@ bool sched_can_stop_tick(void)
 	 * nr_running update is assumed to be visible
 	 * after IPI is sent from wakers.
 	 */
-	if (this_rq()->nr_running > 1)
+	if (this_rq()->nr_running - task_is_softirqd(current) > 1)
 		return false;
 
 	return true;
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -517,19 +517,32 @@ static void unlock_softirq(int which)
 
 static void do_single_softirq(int which, int need_rcu_bh_qs)
 {
-	unsigned long old_flags = current->flags;
+	struct task_struct *p = current;
+	unsigned long old_flags = p->flags;
 
-	current->flags &= ~PF_MEMALLOC;
+	p->flags &= ~PF_MEMALLOC;
 	vtime_account_irq_enter(current);
-	current->flags |= PF_IN_SOFTIRQ;
+	p->flags |= PF_IN_SOFTIRQ;
 	lockdep_softirq_enter();
 	local_irq_enable();
 	handle_softirq(which);
 	local_irq_disable();
 	lockdep_softirq_exit();
-	current->flags &= ~PF_IN_SOFTIRQ;
-	vtime_account_irq_enter(current);
-	tsk_restore_flags(current, old_flags, PF_MEMALLOC);
+	p->flags &= ~PF_IN_SOFTIRQ;
+	vtime_account_irq_enter(p);
+	tsk_restore_flags(p, old_flags, PF_MEMALLOC);
+#if defined(CONFIG_NO_HZ_FULL) && defined(CONFIG_PREEMPT_RT_FULL)
+	/*
+	 * If a task calls irq_exit()->invoke_softirq(), it won't
+	 * be able to shut the tick down because it just woke us.
+	 * Try to shut it down now.
+	 */
+	if (tick_nohz_full_cpu(raw_smp_processor_id()) &&
+	    task_is_softirqd(p) && which == TIMER_SOFTIRQ &&
+	    sched_can_stop_tick()) {
+		tick_nohz_irq_exit();
+	}
+#endif
 }
 
 /*
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1141,7 +1141,6 @@ hrtimer_start(struct hrtimer *timer, kti
 }
 EXPORT_SYMBOL_GPL(hrtimer_start);
 
-
 /**
  * hrtimer_try_to_cancel - try to deactivate a timer
  * @timer:	hrtimer to stop
@@ -1162,6 +1161,15 @@ int hrtimer_try_to_cancel(struct hrtimer
 
 	if (!hrtimer_callback_running(timer))
 		ret = remove_hrtimer(timer, base);
+#if defined(CONFIG_PREEMPT_RT_FULL) && defined(CONFIG_NO_HZ_FULL)
+	else {
+		extern enum hrtimer_restart tick_sched_timer(struct hrtimer *timer);
+		int cpu = raw_smp_processor_id();
+
+		if (tick_nohz_full_cpu(cpu) && timer->function == tick_sched_timer)
+			ret = HRTIMER_STATE_CALLBACK;
+	}
+#endif
 
 	unlock_hrtimer_base(timer, &flags);
 
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -222,7 +222,14 @@ void __tick_nohz_full_check(void)
 
 static void nohz_full_kick_work_func(struct irq_work *work)
 {
+	unsigned long flags;
+
+	local_save_flags(flags);
+	/* ksoftirqd processes softirqs with interrupts enabled */
+	if (task_is_softirqd(current))
+		local_irq_disable_rt();
 	__tick_nohz_full_check();
+	local_irq_restore(flags);
 }
 
 static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
@@ -879,6 +886,12 @@ ktime_t tick_nohz_get_sleep_length(void)
 
 static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
 {
+#if defined(CONFIG_PREEMPT_RT_FULL) && defined(CONFIG_NO_HZ_FULL)
+	if (tick_nohz_full_cpu(raw_smp_processor_id()) &&
+	    hrtimer_cancel(&ts->sched_timer) == HRTIMER_STATE_CALLBACK)
+		return;
+	else
+#endif
 	hrtimer_cancel(&ts->sched_timer);
 	hrtimer_set_expires(&ts->sched_timer, ts->last_tick);
 
@@ -1108,7 +1121,7 @@ void tick_irq_enter(void)
  * We rearm the timer until we get disabled by the idle code.
  * Called with interrupts disabled.
  */
-static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
+enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 {
 	struct tick_sched *ts =
 		container_of(timer, struct tick_sched, sched_timer);



^ permalink raw reply	[flat|nested] 39+ messages in thread

* [rfc patch] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL
  2015-03-10  9:35       ` Mike Galbraith
  2015-03-11  6:18         ` Mike Galbraith
@ 2015-03-12  9:38         ` Mike Galbraith
  2015-03-12 15:09           ` Steven Rostedt
  2015-03-13  4:53           ` [rfc patch v2] " Mike Galbraith
  1 sibling, 2 replies; 39+ messages in thread
From: Mike Galbraith @ 2015-03-12  9:38 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

This works, let's see if someone has a better idea...

First of all, a task being ticked and trying to shut the tick down will
fail to do so due to having just awakened ksoftirqd, so let ksoftirqd
try to do that after SOFTIRQ_TIMER processing.  Secondly, should the
tick be shut down, we may livelock in hrtimer_cancel() because in -rt
a callback may be running.  Break the loop, and let tick_nohz_restart()
know that the timer is busy so it can bail.

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
---
 kernel/sched/core.c      |    6 +++++-
 kernel/softirq.c         |   17 +++++++++++++++++
 kernel/time/hrtimer.c    |    9 +++++++++
 kernel/time/tick-sched.c |   16 +++++++++++++++-
 4 files changed, 46 insertions(+), 2 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -783,12 +783,16 @@ static inline bool got_nohz_idle_kick(vo
 #ifdef CONFIG_NO_HZ_FULL
 bool sched_can_stop_tick(void)
 {
+	int softirqd = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
+
+	softirqd &= current == this_cpu_ksoftirqd();
+
 	/*
 	 * More than one running task need preemption.
 	 * nr_running update is assumed to be visible
 	 * after IPI is sent from wakers.
 	 */
-	if (this_rq()->nr_running > 1)
+	if (this_rq()->nr_running - softirqd > 1)
 		return false;
 
 	return true;
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -498,6 +498,22 @@ static void unlock_softirq(int which)
 	local_unlock(local_softirq_locks[which]);
 }
 
+/*
+ * Let ksoftirqd try to shut down the tick when awakened via
+ * timer_interrupt->irq_exit()->invoke_softirq(), as the task
+ * then calling tick_nohz_irq_exit() WILL fail to do so due
+ * to that very wakeup having made rq->nr_running > 1.
+ */
+static void tick_nohz_sirq_timer_exit(int which)
+{
+	if (!IS_ENABLED(CONFIG_NO_HZ_FULL))
+		return;
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT_FULL))
+		return;
+	if (which != TIMER_SOFTIRQ || current != this_cpu_ksoftirqd())
+		return;
+	tick_nohz_irq_exit();
+}
 static void do_single_softirq(int which, int need_rcu_bh_qs)
 {
 	unsigned long old_flags = current->flags;
@@ -513,6 +529,7 @@ static void do_single_softirq(int which,
 	current->flags &= ~PF_IN_SOFTIRQ;
 	vtime_account_irq_enter(current);
 	tsk_restore_flags(current, old_flags, PF_MEMALLOC);
+	tick_nohz_sirq_timer_exit(which);
 }
 
 /*
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1162,6 +1162,15 @@ int hrtimer_try_to_cancel(struct hrtimer
 
 	if (!hrtimer_callback_running(timer))
 		ret = remove_hrtimer(timer, base);
+	else if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
+		 tick_nohz_full_cpu(raw_smp_processor_id())) {
+		/*
+		 * Let tick_nohz_restart() know that the timer is
+		 * active lest we spin forever in hrtimer_cancel().
+		 */
+		if (in_irq() && timer->irqsafe)
+			ret = HRTIMER_STATE_CALLBACK;
+	}
 
 	unlock_hrtimer_base(timer, &flags);
 
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -222,7 +222,14 @@ void __tick_nohz_full_check(void)
 
 static void nohz_full_kick_work_func(struct irq_work *work)
 {
+	unsigned long flags;
+
+	local_save_flags(flags);
+	/* ksoftirqd processes softirqs with interrupts enabled */
+	if (current == this_cpu_ksoftirqd())
+		local_irq_disable();
 	__tick_nohz_full_check();
+	local_irq_restore(flags);
 }
 
 static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
@@ -879,7 +886,14 @@ ktime_t tick_nohz_get_sleep_length(void)
 
 static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
 {
-	hrtimer_cancel(&ts->sched_timer);
+	int cpu = raw_smp_processor_id();
+
+	/* In an RT kernel, a callback may be running.  If so, bail */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && tick_nohz_full_cpu(cpu)) {
+		if (hrtimer_cancel(&ts->sched_timer) == HRTIMER_STATE_CALLBACK)
+			return;
+	} else
+		hrtimer_cancel(&ts->sched_timer);
 	hrtimer_set_expires(&ts->sched_timer, ts->last_tick);
 
 	while (1) {




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc patch] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL
  2015-03-12  9:38         ` [rfc patch] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL Mike Galbraith
@ 2015-03-12 15:09           ` Steven Rostedt
  2015-03-13  2:13             ` Mike Galbraith
  2015-03-13  4:53           ` [rfc patch v2] " Mike Galbraith
  1 sibling, 1 reply; 39+ messages in thread
From: Steven Rostedt @ 2015-03-12 15:09 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Sebastian Andrzej Siewior, linux-rt-users, LKML, Thomas Gleixner,
	John Kacur

On Thu, 12 Mar 2015 10:38:13 +0100
Mike Galbraith <umgwanakikbuti@gmail.com> wrote:

> This works, let's see is someone has a better idea...
> 
> First of all, a task being ticked and trying to shut the tick down will
> fail to do so due to having just awakened ksoftirqd, so let ksoftirqd
> try to do that after SOFTIRQ_TIMER processing.  Secondly, should the
> tick be shut down, we may livelock in hrtimer-cancel() because in -rt
> a callback may be running.  Break the loop, and let tick_nohz_restart()
> know that the timer is busy so it can bail.
> 
> Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
> ---
>  kernel/sched/core.c      |    6 +++++-
>  kernel/softirq.c         |   17 +++++++++++++++++
>  kernel/time/hrtimer.c    |    9 +++++++++
>  kernel/time/tick-sched.c |   16 +++++++++++++++-
>  4 files changed, 46 insertions(+), 2 deletions(-)
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -783,12 +783,16 @@ static inline bool got_nohz_idle_kick(vo
>  #ifdef CONFIG_NO_HZ_FULL
>  bool sched_can_stop_tick(void)
>  {
> +	int softirqd = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
> +
> +	softirqd &= current == this_cpu_ksoftirqd();

Ug, binary and logical ANDs should not be combined. Just looks nasty.
What about:

	softirqd = !!(IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
			current == this_cpu_ksoftirqd());

> +
>  	/*
>  	 * More than one running task need preemption.
>  	 * nr_running update is assumed to be visible
>  	 * after IPI is sent from wakers.
>  	 */
> -	if (this_rq()->nr_running > 1)
> +	if (this_rq()->nr_running - softirqd > 1)

There should also be a comment explaining this hack.

>  		return false;
>  
>  	return true;
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -498,6 +498,22 @@ static void unlock_softirq(int which)
>  	local_unlock(local_softirq_locks[which]);
>  }
>  
> +/*
> + * Let ksoftirqd try to shut down the tick when awakened via
> + * timer_interrupt->irq_exit()->invoke_softirq(), as the task
> + * then calling tick_nohz_irq_exit() WILL fail to do so due
> + * to that very wakeup having made rq->nr_running > 1.
> + */
> +static void tick_nohz_sirq_timer_exit(int which)
> +{
> +	if (!IS_ENABLED(CONFIG_NO_HZ_FULL))
> +		return;
> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT_FULL))
> +		return;
> +	if (which != TIMER_SOFTIRQ || current != this_cpu_ksoftirqd())
> +		return;
> +	tick_nohz_irq_exit();
> +}
>  static void do_single_softirq(int which, int need_rcu_bh_qs)
>  {
>  	unsigned long old_flags = current->flags;
> @@ -513,6 +529,7 @@ static void do_single_softirq(int which,
>  	current->flags &= ~PF_IN_SOFTIRQ;
>  	vtime_account_irq_enter(current);
>  	tsk_restore_flags(current, old_flags, PF_MEMALLOC);
> +	tick_nohz_sirq_timer_exit(which);
>  }
>  
>  /*
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1162,6 +1162,15 @@ int hrtimer_try_to_cancel(struct hrtimer
>  
>  	if (!hrtimer_callback_running(timer))
>  		ret = remove_hrtimer(timer, base);
> +	else if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
> +		 tick_nohz_full_cpu(raw_smp_processor_id())) {
> +		/*
> +		 * Let tick_nohz_restart() know that the timer is
> +		 * active lest we spin forever in hrtimer_cancel().
> +		 */
> +		if (in_irq() && timer->irqsafe)
> +			ret = HRTIMER_STATE_CALLBACK;
> +	}
>  
>  	unlock_hrtimer_base(timer, &flags);
>  
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -222,7 +222,14 @@ void __tick_nohz_full_check(void)
>  
>  static void nohz_full_kick_work_func(struct irq_work *work)
>  {
> +	unsigned long flags;
> +
> +	local_save_flags(flags);
> +	/* ksoftirqd processes softirqs with interrupts enabled */
> +	if (current == this_cpu_ksoftirqd())
> +		local_irq_disable();

Why the funkiness above? Why not just call local_irq_save()?

-- Steve

>  	__tick_nohz_full_check();
> +	local_irq_restore(flags);
>  }
>  
>  static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
> @@ -879,7 +886,14 @@ ktime_t tick_nohz_get_sleep_length(void)
>  
>  static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
>  {
> -	hrtimer_cancel(&ts->sched_timer);
> +	int cpu = raw_smp_processor_id();
> +
> +	/* In an RT kernel, a callback may be running.  If so, bail */
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && tick_nohz_full_cpu(cpu)) {
> +		if (hrtimer_cancel(&ts->sched_timer) == HRTIMER_STATE_CALLBACK)
> +			return;
> +	} else
> +		hrtimer_cancel(&ts->sched_timer);
>  	hrtimer_set_expires(&ts->sched_timer, ts->last_tick);
>  
>  	while (1) {
> 
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc patch] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL
  2015-03-12 15:09           ` Steven Rostedt
@ 2015-03-13  2:13             ` Mike Galbraith
  2015-03-13  3:03               ` Steven Rostedt
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Galbraith @ 2015-03-13  2:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sebastian Andrzej Siewior, linux-rt-users, LKML, Thomas Gleixner,
	John Kacur

On Thu, 2015-03-12 at 11:09 -0400, Steven Rostedt wrote:
> On Thu, 12 Mar 2015 10:38:13 +0100
> Mike Galbraith <umgwanakikbuti@gmail.com> wrote:
> 

> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -783,12 +783,16 @@ static inline bool got_nohz_idle_kick(vo
> >  #ifdef CONFIG_NO_HZ_FULL
> >  bool sched_can_stop_tick(void)
> >  {
> > +	int softirqd = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
> > +
> > +	softirqd &= current == this_cpu_ksoftirqd();
> 
> Ug, binary and logical ANDs should not be combined. Just looks nasty.
> What about:
> 
> 	softirqd = !!(IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
> 			current == this_cpu_ksoftirqd());

My way looks prettier to me, but I seem to be the only one who thinks so
(this is not the first time it has drawn a gripe), so I'll change it ;-)

> > +
> >  	/*
> >  	 * More than one running task need preemption.
> >  	 * nr_running update is assumed to be visible
> >  	 * after IPI is sent from wakers.
> >  	 */
> > -	if (this_rq()->nr_running > 1)
> > +	if (this_rq()->nr_running - softirqd > 1)
> 
> There should also be a comment explaining this hack.

Yeah.
  
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -222,7 +222,14 @@ void __tick_nohz_full_check(void)
> >  
> >  static void nohz_full_kick_work_func(struct irq_work *work)
> >  {
> > +	unsigned long flags;
> > +
> > +	local_save_flags(flags);
> > +	/* ksoftirqd processes softirqs with interrupts enabled */
> > +	if (current == this_cpu_ksoftirqd())
> > +		local_irq_disable();
> 
> Why the funkiness above? Why not just call local_irq_save()?

Gone.  It would be good to make the rest go away too; it ain't pretty, it
just works.

	-Mike


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc patch] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL
  2015-03-13  2:13             ` Mike Galbraith
@ 2015-03-13  3:03               ` Steven Rostedt
  0 siblings, 0 replies; 39+ messages in thread
From: Steven Rostedt @ 2015-03-13  3:03 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Sebastian Andrzej Siewior, linux-rt-users, LKML, Thomas Gleixner,
	John Kacur

On Fri, 13 Mar 2015 03:13:15 +0100
Mike Galbraith <umgwanakikbuti@gmail.com> wrote:

> > > +	int softirqd = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
> > > +
> > > +	softirqd &= current == this_cpu_ksoftirqd();
> > 
> > Ug, binary and logical ANDs should not be combined. Just looks nasty.
> > What about:
> > 
> > 	softirqd = !!(IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
> > 			current == this_cpu_ksoftirqd());
> 
> My way looks prettier to me, but I seem to be the only who thinks so
> (this is not the first time it got a gripe), so I'll change it ;-)

I shouldn't have said it looks nasty. It actually does look pretty, in
an artistic kind of way. It's just not very comprehensible.

-- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [rfc patch v2] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL
  2015-03-12  9:38         ` [rfc patch] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL Mike Galbraith
  2015-03-12 15:09           ` Steven Rostedt
@ 2015-03-13  4:53           ` Mike Galbraith
  2015-03-16 20:24             ` Sebastian Andrzej Siewior
  2015-04-10 14:15             ` Sebastian Andrzej Siewior
  1 sibling, 2 replies; 39+ messages in thread
From: Mike Galbraith @ 2015-03-13  4:53 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur


First of all, a task being ticked and trying to shut the tick down will
fail to do so due to having just awakened ksoftirqd, so let ksoftirqd
try to do that after SOFTIRQ_TIMER processing.  Secondly, should the
tick be shut down, we may livelock in hrtimer_cancel() because in -rt
a callback may be running.  Break the loop, and let tick_nohz_restart()
know that the timer is busy so it can bail.

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
---
 kernel/sched/core.c      |    8 +++++++-
 kernel/softirq.c         |   17 +++++++++++++++++
 kernel/time/hrtimer.c    |    9 +++++++++
 kernel/time/tick-sched.c |   14 +++++++++++++-
 4 files changed, 46 insertions(+), 2 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -783,12 +783,18 @@ static inline bool got_nohz_idle_kick(vo
 #ifdef CONFIG_NO_HZ_FULL
 bool sched_can_stop_tick(void)
 {
+	int ksoftirqd = !!(IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
+			   current == this_cpu_ksoftirqd());
+
 	/*
 	 * More than one running task need preemption.
 	 * nr_running update is assumed to be visible
 	 * after IPI is sent from wakers.
+	 *
+	 * NOTE, RT: ksoftirqd tries to stop the tick for
+	 * tasks as they exit irq, ergo subtracts itself.
 	 */
-	if (this_rq()->nr_running > 1)
+	if (this_rq()->nr_running - ksoftirqd > 1)
 		return false;
 
 	return true;
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -498,6 +498,22 @@ static void unlock_softirq(int which)
 	local_unlock(local_softirq_locks[which]);
 }
 
+/*
+ * Let ksoftirqd try to shut down the tick when awakened via
+ * timer_interrupt->irq_exit()->invoke_softirq(), as the task
+ * then calling tick_nohz_irq_exit() WILL fail to do so due
+ * to that very wakeup having made rq->nr_running > 1.
+ */
+static void tick_nohz_sirq_timer_exit(int which)
+{
+	if (!IS_ENABLED(CONFIG_NO_HZ_FULL))
+		return;
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT_FULL))
+		return;
+	if (which != TIMER_SOFTIRQ || current != this_cpu_ksoftirqd())
+		return;
+	tick_nohz_irq_exit();
+}
 static void do_single_softirq(int which, int need_rcu_bh_qs)
 {
 	unsigned long old_flags = current->flags;
@@ -513,6 +529,7 @@ static void do_single_softirq(int which,
 	current->flags &= ~PF_IN_SOFTIRQ;
 	vtime_account_irq_enter(current);
 	tsk_restore_flags(current, old_flags, PF_MEMALLOC);
+	tick_nohz_sirq_timer_exit(which);
 }
 
 /*
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1162,6 +1162,15 @@ int hrtimer_try_to_cancel(struct hrtimer
 
 	if (!hrtimer_callback_running(timer))
 		ret = remove_hrtimer(timer, base);
+	else if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
+		 tick_nohz_full_cpu(raw_smp_processor_id())) {
+		/*
+		 * Let tick_nohz_restart() know that the timer is
+		 * active lest we spin forever in hrtimer_cancel().
+		 */
+		if (in_irq() && timer->irqsafe)
+			ret = HRTIMER_STATE_CALLBACK;
+	}
 
 	unlock_hrtimer_base(timer, &flags);
 
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -222,7 +222,12 @@ void __tick_nohz_full_check(void)
 
 static void nohz_full_kick_work_func(struct irq_work *work)
 {
+	unsigned long flags;
+
+	/* ksoftirqd processes sirqs with interrupts enabled */
+	local_irq_save(flags);
 	__tick_nohz_full_check();
+	local_irq_restore(flags);
 }
 
 static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
@@ -879,7 +884,14 @@ ktime_t tick_nohz_get_sleep_length(void)
 
 static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
 {
-	hrtimer_cancel(&ts->sched_timer);
+	int cpu = raw_smp_processor_id();
+
+	/* In an RT kernel, a callback may be running.  If so, bail */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) && tick_nohz_full_cpu(cpu)) {
+		if (hrtimer_cancel(&ts->sched_timer) == HRTIMER_STATE_CALLBACK)
+			return;
+	} else
+		hrtimer_cancel(&ts->sched_timer);
 	hrtimer_set_expires(&ts->sched_timer, ts->last_tick);
 
 	while (1) {




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc patch v2] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL
  2015-03-13  4:53           ` [rfc patch v2] " Mike Galbraith
@ 2015-03-16 20:24             ` Sebastian Andrzej Siewior
  2015-03-17  1:53               ` Mike Galbraith
  2015-04-10 14:15             ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-03-16 20:24 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

* Mike Galbraith | 2015-03-13 05:53:25 [+0100]:

>First of all, a task being ticked and trying to shut the tick down will
>fail to do so due to having just awakened ksoftirqd, so let ksoftirqd
>try to do that after SOFTIRQ_TIMER processing.  Secondly, should the
>tick be shut down, we may livelock in hrtimer-cancel() because in -rt
>a callback may be running.  Break the loop, and let tick_nohz_restart()
>know that the timer is busy so it can bail.

I am a bit undecided on that one. I included it in the series but did
not enable it yet.
Just so we are on the same page here: you boot your machine with
something like
    "isolcpus=1-31 rcu_nocbs=1-31 nohz_full=1-31"
and pin all kernel threads to CPU0, right?

What you do is accept that the timer-softirq is scheduled for no
reason and then try to disable the tick from within the timer-softirq.
I assumed that it would somehow get the "expired timer" handled.

>Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>

Sebastian

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc patch v2] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL
  2015-03-16 20:24             ` Sebastian Andrzej Siewior
@ 2015-03-17  1:53               ` Mike Galbraith
  2015-03-17  4:45                 ` Mike Galbraith
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Galbraith @ 2015-03-17  1:53 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On Mon, 2015-03-16 at 21:24 +0100, Sebastian Andrzej Siewior wrote:
> * Mike Galbraith | 2015-03-13 05:53:25 [+0100]:
> 
> >First of all, a task being ticked and trying to shut the tick down will
> >fail to do so due to having just awakened ksoftirqd, so let ksoftirqd
> >try to do that after SOFTIRQ_TIMER processing.  Secondly, should the
> >tick be shut down, we may livelock in hrtimer-cancel() because in -rt
> >a callback may be running.  Break the loop, and let tick_nohz_restart()
> >know that the timer is busy so it can bail.
> 
> I am a bit undecided on that one. I included it in the series but did
> not enable it yet.
> Just so we are on the same page here: you boot your machine with
> something like
>     "isolcpus=1-31 rcu_nocbs=1-31 nohz_full=1-31"
> and pin all kernel threads to CPU0, right?

No, I only declare the nohz_full set, and do the isolation via cpusets.

> What you do is that you accept the fact that the timer-softirq is
> scheduled for no reason and then you try to disable the tick from within
> the timer-softirq. I assumed that it would work get the "expired timer"
> somehow.

Yup, it works around that otherwise crippling wakeup.  If I re-apply..

     timers-do-not-raise-softirq-unconditionally.patch

..the workaround is not needed of course, but the livelock fix still is.
I haven't yet tested that in 3.18-rt though, only 4.0-rt, but I presume
it'll be the same deal there when I do.

	-Mike


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc patch v2] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL
  2015-03-17  1:53               ` Mike Galbraith
@ 2015-03-17  4:45                 ` Mike Galbraith
  0 siblings, 0 replies; 39+ messages in thread
From: Mike Galbraith @ 2015-03-17  4:45 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On Tue, 2015-03-17 at 02:53 +0100, Mike Galbraith wrote:
> On Mon, 2015-03-16 at 21:24 +0100, Sebastian Andrzej Siewior wrote:

> > What you do is that you accept the fact that the timer-softirq is
> > scheduled for no reason and then you try to disable the tick from within
> > the timer-softirq. I assumed that it would work get the "expired timer"
> > somehow.
> 
> Yup, it works around that otherwise crippling wakeup.  If I re-apply..
> 
>      timers-do-not-raise-softirq-unconditionally.patch
> 
> ..the workaround is not needed of course, but the livelock fix still is.
> I haven't yet tested that in 3.18-rt though, only 4.0-rt, but I presume
> it'll be the same deal there when I do.

Did that, and it is.  Fire up tbench 4 + make -j4 on my i7-4790+smt box
booted with nohz_full=2,3,6,7, and the death spiral begins shortly thereafter.

	-Mike


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc patch v2] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL
  2015-03-13  4:53           ` [rfc patch v2] " Mike Galbraith
  2015-03-16 20:24             ` Sebastian Andrzej Siewior
@ 2015-04-10 14:15             ` Sebastian Andrzej Siewior
  2015-04-10 14:24               ` Mike Galbraith
                                 ` (2 more replies)
  1 sibling, 3 replies; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-04-10 14:15 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

* Mike Galbraith | 2015-03-13 05:53:25 [+0100]:

>First of all, a task being ticked and trying to shut the tick down will
>fail to do so due to having just awakened ksoftirqd, so let ksoftirqd
>try to do that after SOFTIRQ_TIMER processing.  Secondly, should the
>tick be shut down, we may livelock in hrtimer-cancel() because in -rt
>a callback may be running.  Break the loop, and let tick_nohz_restart()
>know that the timer is busy so it can bail.

So with this patch
    http://marc.info/?l=linux-rt-users&m=142866940327602&w=2

you could drop the hrtimer hacks. Is the remaining part below all that is
required, or can we shorten the following patch a little more?

>--- a/kernel/sched/core.c
>+++ b/kernel/sched/core.c
>@@ -783,12 +783,18 @@ static inline bool got_nohz_idle_kick(vo
> #ifdef CONFIG_NO_HZ_FULL
> bool sched_can_stop_tick(void)
> {
>+	int ksoftirqd = !!(IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
>+			   current == this_cpu_ksoftirqd());
>+
> 	/*
> 	 * More than one running task need preemption.
> 	 * nr_running update is assumed to be visible
> 	 * after IPI is sent from wakers.
>+	 *
>+	 * NOTE, RT: ksoftirqd tries to stop the tick for
>+	 * tasks as they exit irq, ergo subtracts itself.
> 	 */
>-	if (this_rq()->nr_running > 1)
>+	if (this_rq()->nr_running - ksoftirqd > 1)
> 		return false;
> 
> 	return true;
>--- a/kernel/softirq.c
>+++ b/kernel/softirq.c
>@@ -498,6 +498,22 @@ static void unlock_softirq(int which)
> 	local_unlock(local_softirq_locks[which]);
> }
> 
>+/*
>+ * Let ksoftirqd try to shut down the tick when awakened via
>+ * timer_interrupt->irq_exit()->invoke_softirq(), as the task
>+ * then calling tick_nohz_irq_exit() WILL fail to do so due
>+ * to that very wakeup having made rq->nr_running > 1.
>+ */
>+static void tick_nohz_sirq_timer_exit(int which)
>+{
>+	if (!IS_ENABLED(CONFIG_NO_HZ_FULL))
>+		return;
>+	if (!IS_ENABLED(CONFIG_PREEMPT_RT_FULL))
>+		return;
>+	if (which != TIMER_SOFTIRQ || current != this_cpu_ksoftirqd())
>+		return;
>+	tick_nohz_irq_exit();
>+}
> static void do_single_softirq(int which, int need_rcu_bh_qs)
> {
> 	unsigned long old_flags = current->flags;
>@@ -513,6 +529,7 @@ static void do_single_softirq(int which,
> 	current->flags &= ~PF_IN_SOFTIRQ;
> 	vtime_account_irq_enter(current);
> 	tsk_restore_flags(current, old_flags, PF_MEMALLOC);
>+	tick_nohz_sirq_timer_exit(which);
> }
> 
> /*
>--- a/kernel/time/tick-sched.c
>+++ b/kernel/time/tick-sched.c
>@@ -222,7 +222,12 @@ void __tick_nohz_full_check(void)
> 
> static void nohz_full_kick_work_func(struct irq_work *work)
> {
>+	unsigned long flags;
>+
>+	/* ksoftirqd processes sirqs with interrupts enabled */
>+	local_irq_save(flags);
> 	__tick_nohz_full_check();
>+	local_irq_restore(flags);
> }
> 
> static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc patch v2] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL
  2015-04-10 14:15             ` Sebastian Andrzej Siewior
@ 2015-04-10 14:24               ` Mike Galbraith
  2015-04-10 14:28               ` Mike Galbraith
  2015-04-11 13:15               ` Mike Galbraith
  2 siblings, 0 replies; 39+ messages in thread
From: Mike Galbraith @ 2015-04-10 14:24 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On Fri, 2015-04-10 at 16:15 +0200, Sebastian Andrzej Siewior wrote:
> * Mike Galbraith | 2015-03-13 05:53:25 [+0100]:
> 
> > First of all, a task being ticked and trying to shut the tick down will
> > fail to do so due to having just awakened ksoftirqd, so let ksoftirqd
> > try to do that after SOFTIRQ_TIMER processing.  Secondly, should the
> > tick be shut down, we may livelock in hrtimer-cancel() because in -rt
> > a callback may be running.  Break the loop, and let tick_nohz_restart()
> > know that the timer is busy so it can bail.
> 
> So with this patch
>     http://marc.info/?l=linux-rt-users&m=142866940327602&w=2
> 
> you could drop the hrtimer hacks. Is this the remaining part all 
> that is
> required or we shorten the following patch a little more?

It's better to let the task that awakened ksoftirqd do the subtracting
instead of waiting for ksoftirqd to try again after the waker fails, so it
wants to know that ksoftirqd is running due to that timer wakeup.

If you remove the hrtimer bit, you should quickly meet the livelock.

        -Mike

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc patch v2] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL
  2015-04-10 14:15             ` Sebastian Andrzej Siewior
  2015-04-10 14:24               ` Mike Galbraith
@ 2015-04-10 14:28               ` Mike Galbraith
  2015-04-11 13:15               ` Mike Galbraith
  2 siblings, 0 replies; 39+ messages in thread
From: Mike Galbraith @ 2015-04-10 14:28 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On Fri, 2015-04-10 at 16:15 +0200, Sebastian Andrzej Siewior wrote:
> * Mike Galbraith | 2015-03-13 05:53:25 [+0100]:
> 
> > First of all, a task being ticked and trying to shut the tick down will
> > fail to do so due to having just awakened ksoftirqd, so let ksoftirqd
> > try to do that after SOFTIRQ_TIMER processing.  Secondly, should the
> > tick be shut down, we may livelock in hrtimer-cancel() because in -rt
> > a callback may be running.  Break the loop, and let tick_nohz_restart()
> > know that the timer is busy so it can bail.
> 
> So with this patch
>     http://marc.info/?l=linux-rt-users&m=142866940327602&w=2

Oh duh, too much of a hurry, I'll give that a spin.

        -Mike 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc patch v2] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL
  2015-04-10 14:15             ` Sebastian Andrzej Siewior
  2015-04-10 14:24               ` Mike Galbraith
  2015-04-10 14:28               ` Mike Galbraith
@ 2015-04-11 13:15               ` Mike Galbraith
  2015-04-11 13:36                 ` Mike Galbraith
  2015-04-13  9:41                 ` Sebastian Andrzej Siewior
  2 siblings, 2 replies; 39+ messages in thread
From: Mike Galbraith @ 2015-04-11 13:15 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On Fri, 2015-04-10 at 16:15 +0200, Sebastian Andrzej Siewior wrote:
> * Mike Galbraith | 2015-03-13 05:53:25 [+0100]:
> 
> > First of all, a task being ticked and trying to shut the tick down will
> > fail to do so due to having just awakened ksoftirqd, so let ksoftirqd
> > try to do that after SOFTIRQ_TIMER processing.  Secondly, should the
> > tick be shut down, we may livelock in hrtimer-cancel() because in -rt
> > a callback may be running.  Break the loop, and let tick_nohz_restart()
> > know that the timer is busy so it can bail.
> 
> So with this patch
>     http://marc.info/?l=linux-rt-users&m=142866940327602&w=2


@@ -117,10 +113,8 @@ bool irq_work_queue(struct irq_work *wor
        if (work->flags & IRQ_WORK_HARD_IRQ) {
                if (llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list)))
                        arch_irq_work_raise();
-       } else {
-               if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)))
-                       arch_irq_work_raise();
-       }
+       } /* for lazy_list we have the timer irq */

If we don't queue work without IRQ_WORK_HARD_IRQ set, how does it run?

Anyway, box did not livelock, nor did it with that bit reverted.

> you could drop the hrtimer hacks. Is this the remaining part all 
> that is
> required or we shorten the following patch a little more?

rt, nohz_full: fix nohz_full for PREEMPT_RT_FULL

A task being ticked and trying to shut the tick down will fail due
to having just awakened ksoftirqd, so subtract it from nr_running.

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
---
 kernel/sched/core.c      |   17 ++++++++++++++++-
 kernel/time/tick-sched.c |    5 +++++
 2 files changed, 21 insertions(+), 1 deletion(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -781,14 +781,29 @@ static inline bool got_nohz_idle_kick(vo
 #endif /* CONFIG_NO_HZ_COMMON */
 
 #ifdef CONFIG_NO_HZ_FULL
+
+static int ksoftirqd_running(void)
+{
+       struct task_struct *softirqd;
+
+       if (!IS_ENABLED(CONFIG_PREEMPT_RT_FULL))
+               return 0;
+       softirqd = this_cpu_ksoftirqd();
+       if (softirqd && softirqd->on_rq)
+               return 1;
+       return 0;
+}
+
 bool sched_can_stop_tick(void)
 {
        /*
         * More than one running task need preemption.
         * nr_running update is assumed to be visible
         * after IPI is sent from wakers.
+        *
+        * NOTE, RT: if ksoftirqd is awake, subtract it.
         */
-       if (this_rq()->nr_running > 1)
+       if (this_rq()->nr_running - ksoftirqd_running() > 1)
                return false;
 
        return true;
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -227,7 +227,12 @@ void __tick_nohz_full_check(void)
 
 static void nohz_full_kick_work_func(struct irq_work *work)
 {
+       unsigned long flags;
+
+       /* ksoftirqd processes sirqs with interrupts enabled */
+       local_irq_save(flags);
        __tick_nohz_full_check();
+       local_irq_restore(flags);
 }
 
 static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc patch v2] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL
  2015-04-11 13:15               ` Mike Galbraith
@ 2015-04-11 13:36                 ` Mike Galbraith
  2015-04-11 14:22                   ` Mike Galbraith
  2015-04-13  9:43                   ` Sebastian Andrzej Siewior
  2015-04-13  9:41                 ` Sebastian Andrzej Siewior
  1 sibling, 2 replies; 39+ messages in thread
From: Mike Galbraith @ 2015-04-11 13:36 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On Sat, 2015-04-11 at 15:15 +0200, Mike Galbraith wrote:
> 
> @@ -117,10 +113,8 @@ bool irq_work_queue(struct irq_work *wor
>         if (work->flags & IRQ_WORK_HARD_IRQ) {
>                 if (llist_add(&work->llnode, 
> this_cpu_ptr(&hirq_work_list)))
>                         arch_irq_work_raise();
> -       } else {
> -               if (llist_add(&work->llnode, 
> this_cpu_ptr(&lazy_list)))
> -                       arch_irq_work_raise();
> -       }
> +       } /* for lazy_list we have the timer irq */
> 
> If we don't queue work without IRQ_WORK_HARD_IRQ set, how does it 
> run?
> 
> Anyway, box did not livelock, nor did it with that bit reverted.

Well, not reverted; I actually queued and maybe poked softirq, but
didn't check whether anything was being queued or not, only caring for
the nonce whether the box went catatonic or kept on working.

@@ -118,8 +115,9 @@ bool irq_work_queue(struct irq_work *wor
                if (llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list)))
                        arch_irq_work_raise();
        } else {
-               if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)))
-                       arch_irq_work_raise();
+               if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
+                   tick_nohz_tick_stopped())
+                       raise_softirq(TIMER_SOFTIRQ);



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [rfc patch v2] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL
  2015-04-11 13:36                 ` Mike Galbraith
@ 2015-04-11 14:22                   ` Mike Galbraith
  2015-04-13  9:43                   ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 39+ messages in thread
From: Mike Galbraith @ 2015-04-11 14:22 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On Sat, 2015-04-11 at 15:36 +0200, Mike Galbraith wrote:
> 
> Well, not reverted, I actually queued and maybe poked softirq, but 
> didn't check whether anything was being queued or not

(the checking: !IRQ_WORK_HARD_IRQ happens, box remains alive)

           mount-827   [002] ....211     9.781994: irq_work_queue: queue work: ffff88041ec8d7d0
           mount-827   [002] .N..211     9.781997: irq_work_queue: poke ksoftirqd
    sirq-timer/2-49    [002] .....11     9.781999: irq_work_run_list: execute work: ffff88041ec8d7d0
          swapon-852   [003] ....211    10.224358: irq_work_queue: queue work: ffff88041eccd7d0
    sirq-timer/3-67    [003] .....11    10.224501: irq_work_run_list: execute work: ffff88041eccd7d0
       plymouthd-476   [000] ....2..    10.227930: irq_work_queue: queue work: ffff88041ec0d7d0
    sirq-timer/0-4     [000] .....11    10.228311: irq_work_run_list: execute work: ffff88041ec0d7d0
   systemd-udevd-694   [001] ....211    10.316067: irq_work_queue: queue work: ffff88041ec4d7d0
    sirq-timer/1-34    [001] .....11    10.316340: irq_work_run_list: execute work: ffff88041ec4d7d0
   systemd-udevd-694   [001] ....211    10.316871: irq_work_queue: queue work: ffff88041ec4d7d0
    sirq-timer/1-34    [001] .....11    10.317341: irq_work_run_list: execute work: ffff88041ec4d7d0
   systemd-udevd-741   [005] ....211    10.515752: irq_work_queue: queue work: ffff88041ed4d7d0
    sirq-timer/5-100   [005] .....11    10.516522: irq_work_run_list: execute work: ffff88041ed4d7d0


* Re: [rfc patch v2] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL
  2015-04-11 13:15               ` Mike Galbraith
  2015-04-11 13:36                 ` Mike Galbraith
@ 2015-04-13  9:41                 ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-04-13  9:41 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On 04/11/2015 03:15 PM, Mike Galbraith wrote:
> @@ -117,10 +113,8 @@ bool irq_work_queue(struct irq_work *wor
>         if (work->flags & IRQ_WORK_HARD_IRQ) {
>                 if (llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list)))
>                         arch_irq_work_raise();
> -       } else {
> -               if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)))
> -                       arch_irq_work_raise();
> -       }
> +       } /* for lazy_list we have the timer irq */
> 
> If we don't queue work without IRQ_WORK_HARD_IRQ set, how does it run?

We schedule the timer softirq, which does the work in the
non-IRQ_WORK_HARD_IRQ case.

> Anyway, box did not livelock, nor did it with that bit reverted.
> 
>> you could drop the hrtimer hacks. Is this remaining part all that
>> is required, or can we shorten the following patch a little more?
> 
> rt, nohz_full: fix nohz_full for PREEMPT_RT_FULL
> 
> A task being ticked and trying to shut the tick down will fail due
> to having just awakened ksoftirqd; subtract it from nr_running.

thanks.

> 
> Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
> ---
>  kernel/sched/core.c      |   17 ++++++++++++++++-
>  kernel/time/tick-sched.c |    5 +++++
>  2 files changed, 21 insertions(+), 1 deletion(-)
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -781,14 +781,29 @@ static inline bool got_nohz_idle_kick(vo
>  #endif /* CONFIG_NO_HZ_COMMON */
>  
>  #ifdef CONFIG_NO_HZ_FULL
> +
> +static int ksoftirqd_running(void)
> +{
> +       struct task_struct *softirqd;
> +
> +       if (!IS_ENABLED(CONFIG_PREEMPT_RT_FULL))
> +               return 0;
> +       softirqd = this_cpu_ksoftirqd();
> +       if (softirqd && softirqd->on_rq)
> +               return 1;
> +       return 0;
> +}
> +
>  bool sched_can_stop_tick(void)
>  {
>         /*
>          * More than one running task need preemption.
>          * nr_running update is assumed to be visible
>          * after IPI is sent from wakers.
> +        *
> +        * NOTE, RT: if ksoftirqd is awake, subtract it.
>          */
> -       if (this_rq()->nr_running > 1)
> +       if (this_rq()->nr_running - ksoftirqd_running() > 1)
>                 return false;
>  
>         return true;
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -227,7 +227,12 @@ void __tick_nohz_full_check(void)
>  
>  static void nohz_full_kick_work_func(struct irq_work *work)
>  {
> +       unsigned long flags;
> +
> +       /* ksoftirqd processes sirqs with interrupts enabled */
> +       local_irq_save(flags);
>         __tick_nohz_full_check();
> +       local_irq_restore(flags);
>  }
>  
>  static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
> 
Sebastian


* Re: [rfc patch v2] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL
  2015-04-11 13:36                 ` Mike Galbraith
  2015-04-11 14:22                   ` Mike Galbraith
@ 2015-04-13  9:43                   ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-04-13  9:43 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-rt-users, LKML, Thomas Gleixner, rostedt, John Kacur

On 04/11/2015 03:36 PM, Mike Galbraith wrote:
> On Sat, 2015-04-11 at 15:15 +0200, Mike Galbraith wrote:
>>
>> @@ -117,10 +113,8 @@ bool irq_work_queue(struct irq_work *wor
>>         if (work->flags & IRQ_WORK_HARD_IRQ) {
>>                 if (llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list)))
>>                         arch_irq_work_raise();
>> -       } else {
>> -               if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)))
>> -                       arch_irq_work_raise();
>> -       }
>> +       } /* for lazy_list we have the timer irq */
>>
>> If we don't queue work without IRQ_WORK_HARD_IRQ set, how does it run?
>>
>> Anyway, box did not livelock, nor did it with that bit reverted.
> 
> Well, not reverted; I actually queued and maybe poked softirq, but 
> didn't check whether anything was being queued or not, only caring for 
> the nonce whether the box went catatonic or kept on working.
> 
> @@ -118,8 +115,9 @@ bool irq_work_queue(struct irq_work *wor
>                 if (llist_add(&work->llnode, this_cpu_ptr(&hirq_work_list)))
>                         arch_irq_work_raise();
>         } else {
> -               if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)))
> -                       arch_irq_work_raise();
> +               if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
> +                   tick_nohz_tick_stopped())
> +                       raise_softirq(TIMER_SOFTIRQ);

This sounds like a good idea actually. Thanks.

Sebastian


end of thread, other threads:[~2015-04-13  9:43 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-16 11:18 [ANNOUNCE] 3.18.7-rt1 Sebastian Andrzej Siewior
2015-02-16 18:13 ` Sebastian Andrzej Siewior
2015-02-16 20:12 ` Richard Cochran
2015-02-16 20:21   ` Steven Rostedt
2015-02-17  8:37 ` Daniel Wagner
2015-02-19  9:36   ` Joakim Hernberg
2015-02-25 13:55     ` Sebastian Andrzej Siewior
2015-02-26  8:48       ` Sebastian Andrzej Siewior
2015-02-27  6:40         ` Daniel Wagner
2015-02-27  7:58           ` Sebastian Andrzej Siewior
2015-02-18 11:21 ` [patch-3.18.7-rt1] sched/context-tracking: fix PREEMPT_LAZY explosions Mike Galbraith
2015-02-18 11:27   ` [rfc patch-3.18.7-rt1] rt/nohz_full: Fix can_stop_full_tick() gripe within softirq Mike Galbraith
2015-02-19  4:50     ` Mike Galbraith
2015-02-19  6:30   ` [patch-3.18.7-rt1] sched/context-tracking: fix PREEMPT_LAZY explosions Mike Galbraith
2015-03-09 13:45   ` Sebastian Andrzej Siewior
2015-03-09 14:36     ` Mike Galbraith
2015-03-09 14:45       ` Sebastian Andrzej Siewior
2015-03-10  9:35       ` Mike Galbraith
2015-03-11  6:18         ` Mike Galbraith
2015-03-12  9:38         ` [rfc patch] rt,nohz_full: fix nohz_full for PREEMPT_RT_FULL Mike Galbraith
2015-03-12 15:09           ` Steven Rostedt
2015-03-13  2:13             ` Mike Galbraith
2015-03-13  3:03               ` Steven Rostedt
2015-03-13  4:53           ` [rfc patch v2] " Mike Galbraith
2015-03-16 20:24             ` Sebastian Andrzej Siewior
2015-03-17  1:53               ` Mike Galbraith
2015-03-17  4:45                 ` Mike Galbraith
2015-04-10 14:15             ` Sebastian Andrzej Siewior
2015-04-10 14:24               ` Mike Galbraith
2015-04-10 14:28               ` Mike Galbraith
2015-04-11 13:15               ` Mike Galbraith
2015-04-11 13:36                 ` Mike Galbraith
2015-04-11 14:22                   ` Mike Galbraith
2015-04-13  9:43                   ` Sebastian Andrzej Siewior
2015-04-13  9:41                 ` Sebastian Andrzej Siewior
2015-02-18 14:09 ` [patch-3.18.7-rt1] snd/pcm: fix snd_pcm_stream_lock*() irqs_disabled() splats Mike Galbraith
2015-02-25 14:08   ` Sebastian Andrzej Siewior
2015-02-18 15:05 ` [patch-3.18.7-rt1]sunrpc: make svc_xprt_do_enqueue() use get_cpu_light() Mike Galbraith
2015-02-25 14:14   ` Sebastian Andrzej Siewior
