linux-kernel.vger.kernel.org archive mirror
* [ANNOUNCE] v4.11.5-rt1
@ 2017-06-16 10:56 Sebastian Andrzej Siewior
  2017-06-17  8:14 ` Mike Galbraith
  2017-06-18 17:01 ` [patch-rt] rtmutex: Fix lock stealing logic Mike Galbraith
  0 siblings, 2 replies; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-16 10:56 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, linux-rt-users, Steven Rostedt

Dear RT folks!

I'm pleased to announce the v4.11.5-rt1 patch set. 
The release has been delayed due to the hotplug rework that was started
before the final v4.11 release. However the new code has not been
stabilized yet and it was decided to bring back the old patches before
delaying the v4.11-RT release any longer. We will try to complete the
hotplug rework (for RT) in the v4.11 cycle.

Changes since v4.9.39-rt21:

  - rebase to v4.11.5

Known issues
	- CPU hotplug got a little better but can deadlock.

You can get this release via the git tree at:

    git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git v4.11.5-rt1

The RT patch against v4.11.5 can be found here:

    https://cdn.kernel.org/pub/linux/kernel/projects/rt/4.11/older/patch-4.11.5-rt1.patch.xz

The split quilt queue is available at:

    https://cdn.kernel.org/pub/linux/kernel/projects/rt/4.11/older/patches-4.11.5-rt1.tar.xz

Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-16 10:56 [ANNOUNCE] v4.11.5-rt1 Sebastian Andrzej Siewior
@ 2017-06-17  8:14 ` Mike Galbraith
  2017-06-18  6:46   ` Mike Galbraith
  2017-06-19  8:52   ` Sebastian Andrzej Siewior
  2017-06-18 17:01 ` [patch-rt] rtmutex: Fix lock stealing logic Mike Galbraith
  1 sibling, 2 replies; 32+ messages in thread
From: Mike Galbraith @ 2017-06-17  8:14 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Thomas Gleixner
  Cc: LKML, linux-rt-users, Steven Rostedt

On Fri, 2017-06-16 at 12:56 +0200, Sebastian Andrzej Siewior wrote:
> Dear RT folks!
> 
> I'm pleased to announce the v4.11.5-rt1 patch set. 
> The release has been delayed due to the hotplug rework that was started
> before the final v4.11 release. However the new code has not been
> stabilized yet and it was decided to bring back the old patches before
> delaying the v4.11-RT release any longer. We will try to complete the
> hotplug rework (for RT) in the v4.11 cycle.
> 
> Changes since v4.9.39-rt21:
> 
>   - rebase to v4.11.5

Hi Sebastian,

I noticed a couple things wrt migrate_disable() changes...

During that rebase, migrate_disable() was changed to no longer map to
preempt_disable() for nonrt, but some patches still assume it does.  It
now depends upon PREEMPT_COUNT; the RT workaround in futex.c induces
grumbling in nonrt builds with PREEMPT_COUNT enabled.

	-Mike


* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-17  8:14 ` Mike Galbraith
@ 2017-06-18  6:46   ` Mike Galbraith
  2017-06-19  8:52   ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 32+ messages in thread
From: Mike Galbraith @ 2017-06-18  6:46 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Thomas Gleixner
  Cc: LKML, linux-rt-users, Steven Rostedt

On Sat, 2017-06-17 at 10:14 +0200, Mike Galbraith wrote:
> 
>... the RT workaround in futex.c induces
> grumbling in nonrt builds with PREEMPT_COUNT enabled.

A trivial way to fix it up is to...

futex: Fix migrate_disable/enable workaround for !PREEMPT_RT_FULL

The imbalance fixed by aed0f50e58eb only exists for PREEMPT_RT_FULL,
but the workaround creates one for other PREEMPT_COUNT configs.
Create/use _rt variants
of migrate_disable/enable() which are compiled away when not needed.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Fixes: aed0f50e58eb ("futex: workaround migrate_disable/enable in different context")
---
 include/linux/preempt.h |   11 +++++++++++
 kernel/futex.c          |    8 ++++----
 2 files changed, 15 insertions(+), 4 deletions(-)

--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -227,12 +227,21 @@ do { \
 
 extern void migrate_disable(void);
 extern void migrate_enable(void);
+#ifdef CONFIG_PREEMPT_RT_FULL
+#define migrate_disable_rt()		migrate_disable()
+#define migrate_enable_rt()		migrate_enable()
+#else
+static inline void migrate_disable_rt(void) { }
+static inline void migrate_enable_rt(void) { }
+#endif
 
 int __migrate_disabled(struct task_struct *p);
 
 #else
 #define migrate_disable()		barrier()
 #define migrate_enable()		barrier()
+static inline void migrate_disable_rt(void) { }
+static inline void migrate_enable_rt(void) { }
 static inline int __migrate_disabled(struct task_struct *p)
 {
 	return 0;
@@ -323,6 +332,8 @@ do { \
 
 #define migrate_disable()			barrier()
 #define migrate_enable()			barrier()
+static inline void migrate_disable_rt(void) { }
+static inline void migrate_enable_rt(void) { }
 
 static inline int __migrate_disabled(struct task_struct *p)
 {
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2690,12 +2690,12 @@ static int futex_lock_pi(u32 __user *uad
 	 * one migrate_disable() pending in the slow-path which is reversed
 	 * after the raw_spin_unlock_irq() where we leave the atomic context.
 	 */
-	migrate_disable();
+	migrate_disable_rt();
 
 	spin_unlock(q.lock_ptr);
 	ret = __rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current);
 	raw_spin_unlock_irq(&q.pi_state->pi_mutex.wait_lock);
-	migrate_enable();
+	migrate_enable_rt();
 
 	if (ret) {
 		if (ret == 1)
@@ -2846,13 +2846,13 @@ static int futex_unlock_pi(u32 __user *u
 		 * won't undo the migrate_disable() which was issued when
 		 * locking hb->lock.
 		 */
-		migrate_disable();
+		migrate_disable_rt();
 		spin_unlock(&hb->lock);
 
 		/* Drops pi_state->pi_mutex.wait_lock */
 		ret = wake_futex_pi(uaddr, uval, pi_state);
 
-		migrate_enable();
+		migrate_enable_rt();
 
 		put_pi_state(pi_state);
 


* [patch-rt] rtmutex: Fix lock stealing logic
  2017-06-16 10:56 [ANNOUNCE] v4.11.5-rt1 Sebastian Andrzej Siewior
  2017-06-17  8:14 ` Mike Galbraith
@ 2017-06-18 17:01 ` Mike Galbraith
  2017-06-23  7:37   ` [patch-rt v2] " Mike Galbraith
  1 sibling, 1 reply; 32+ messages in thread
From: Mike Galbraith @ 2017-06-18 17:01 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Thomas Gleixner
  Cc: LKML, linux-rt-users, Steven Rostedt


1. When trying to acquire an rtmutex, we first try to grab it without
queueing the waiter, and explicitly check for that initial attempt
in the !waiter path of __try_to_take_rt_mutex().  Checking whether
the lock taker is top waiter before allowing a steal attempt in that
path is a thinko: the lock taker has not yet blocked.

2. It seems wrong to change the definition of rt_mutex_waiter_less()
to mean less or perhaps equal when we have an rt_mutex_waiter_equal().

Remove the thinko, and implement rt_mutex_stealable(steal_mode) using
a restored rt_mutex_waiter_less(), and rt_mutex_waiter_equal().

Signed-off-by: Mike Galbraith <efault@gmx.de>
Fixes: f34d077c6f28 ("rt: Add the preempt-rt lock replacement APIs")
---
 kernel/locking/rtmutex.c |   69 +++++++++++++++++++++++------------------------
 1 file changed, 35 insertions(+), 34 deletions(-)

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -236,26 +236,19 @@ static inline bool unlock_rt_mutex_safe(
 }
 #endif
 
-#define STEAL_NORMAL  0
-#define STEAL_LATERAL 1
-
 /*
  * Only use with rt_mutex_waiter_{less,equal}()
  */
-#define task_to_waiter(p)	\
-	&(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline }
+#define task_to_waiter(p) &(struct rt_mutex_waiter) \
+	{ .prio = (p)->prio, .deadline = (p)->dl.deadline, .task = (p) }
 
 static inline int
 rt_mutex_waiter_less(struct rt_mutex_waiter *left,
-		     struct rt_mutex_waiter *right, int mode)
+		     struct rt_mutex_waiter *right)
 {
-	if (mode == STEAL_NORMAL) {
-		if (left->prio < right->prio)
-			return 1;
-	} else {
-		if (left->prio <= right->prio)
-			return 1;
-	}
+	if (left->prio < right->prio)
+		return 1;
+
 	/*
 	 * If both waiters have dl_prio(), we check the deadlines of the
 	 * associated tasks.
@@ -287,6 +280,25 @@ rt_mutex_waiter_equal(struct rt_mutex_wa
 	return 1;
 }
 
+#define STEAL_NORMAL  0
+#define STEAL_LATERAL 1
+
+static inline int rt_mutex_stealable(struct rt_mutex_waiter *thief,
+				     struct rt_mutex_waiter *victim, int mode)
+{
+	if (rt_mutex_waiter_less(thief, victim))
+		return 1;
+
+	/*
+	 * Note that RT tasks are excluded from lateral-steals
+	 * to prevent the introduction of an unbounded latency.
+	 */
+	if (mode == STEAL_NORMAL || rt_task(thief->task))
+		return 0;
+
+	return rt_mutex_waiter_equal(thief, victim);
+}
+
 static void
 rt_mutex_enqueue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter)
 {
@@ -298,7 +310,7 @@ rt_mutex_enqueue(struct rt_mutex *lock,
 	while (*link) {
 		parent = *link;
 		entry = rb_entry(parent, struct rt_mutex_waiter, tree_entry);
-		if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) {
+		if (rt_mutex_waiter_less(waiter, entry)) {
 			link = &parent->rb_left;
 		} else {
 			link = &parent->rb_right;
@@ -337,7 +349,7 @@ rt_mutex_enqueue_pi(struct task_struct *
 	while (*link) {
 		parent = *link;
 		entry = rb_entry(parent, struct rt_mutex_waiter, pi_tree_entry);
-		if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) {
+		if (rt_mutex_waiter_less(waiter, entry)) {
 			link = &parent->rb_left;
 		} else {
 			link = &parent->rb_right;
@@ -847,6 +859,7 @@ static int rt_mutex_adjust_prio_chain(st
  * @task:   The task which wants to acquire the lock
  * @waiter: The waiter that is queued to the lock's wait tree if the
  *	    callsite called task_blocked_on_lock(), otherwise NULL
+ * @mode:   Lock steal mode (STEAL_NORMAL, STEAL_LATERAL)
  */
 static int __try_to_take_rt_mutex(struct rt_mutex *lock,
 				  struct task_struct *task,
@@ -890,10 +903,9 @@ static int __try_to_take_rt_mutex(struct
 		 * @lock, give up.
 		 */
 		if (waiter != rt_mutex_top_waiter(lock)) {
-			/* XXX rt_mutex_waiter_less() ? */
+			/* XXX rt_mutex_stealable() ? */
 			return 0;
 		}
-
 		/*
 		 * We can acquire the lock. Remove the waiter from the
 		 * lock waiters tree.
@@ -910,25 +922,14 @@ static int __try_to_take_rt_mutex(struct
 		 * not need to be dequeued.
 		 */
 		if (rt_mutex_has_waiters(lock)) {
-			struct task_struct *pown = rt_mutex_top_waiter(lock)->task;
-
-			if (task != pown)
-				return 0;
-
-			/*
-			 * Note that RT tasks are excluded from lateral-steals
-			 * to prevent the introduction of an unbounded latency.
-			 */
-			if (rt_task(task))
-				mode = STEAL_NORMAL;
 			/*
-			 * If @task->prio is greater than or equal to
-			 * the top waiter priority (kernel view),
-			 * @task lost.
+			 * If @task->prio is greater than the top waiter
+			 * priority (kernel view), or equal to it when
+			 * lateral steal is forbidden, @task lost.
 			 */
-			if (!rt_mutex_waiter_less(task_to_waiter(task),
-						  rt_mutex_top_waiter(lock),
-						  mode))
+			if (!rt_mutex_stealable(task_to_waiter(task),
+						rt_mutex_top_waiter(lock),
+						mode))
 				return 0;
 			/*
 			 * The current top waiter stays enqueued. We


* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-17  8:14 ` Mike Galbraith
  2017-06-18  6:46   ` Mike Galbraith
@ 2017-06-19  8:52   ` Sebastian Andrzej Siewior
  2017-06-19 10:14     ` Mike Galbraith
  1 sibling, 1 reply; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-19  8:52 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt

On 2017-06-17 10:14:37 [+0200], Mike Galbraith wrote:
> Hi Sebastian,
Hi Mike,

> 
> I noticed a couple things wrt migrate_disable() changes...
> 
> During that rebase, migrate_disable() was changed to no longer map to
> preempt_disable() for nonrt, but some patches still assume it does.  It
> now depends upon PREEMPT_COUNT; the RT workaround in futex.c induces
> grumbling in nonrt builds with PREEMPT_COUNT enabled.

argh, right. It was planned to get it merged upstream but due to
$reasons we never got that far. For that reason I would simply revert
that change and let migrate_disable() map to preempt_disable() as it did
earlier.

> 	-Mike

Sebastian


* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-19  8:52   ` Sebastian Andrzej Siewior
@ 2017-06-19 10:14     ` Mike Galbraith
  2017-06-19 10:44       ` Sebastian Andrzej Siewior
  2017-06-19 14:08       ` Steven Rostedt
  0 siblings, 2 replies; 32+ messages in thread
From: Mike Galbraith @ 2017-06-19 10:14 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt

On Mon, 2017-06-19 at 10:52 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-17 10:14:37 [+0200], Mike Galbraith wrote:
> 
> > During that rebase, migrate_disable() was changed to no longer map to
> > preempt_disable() for nonrt, but some patches still assume it does.  It
> > now depends upon PREEMPT_COUNT, the RT workaround in futex.c induces
> > grumbling in nonrt builds with PREEMPT_COUNT enabled.
> 
> argh, right. It was planned to get it merged upstream but due to
> $reasons we never got that far. For that reason I would simply revert
> that change and let migrate_disable() map to preempt_disable() as it did
> earlier.

Ok, doesn't matter for RT testing.  What does matter, is that...

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 30b24f774198..10e832da70b6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2284,7 +2284,7 @@ EXPORT_SYMBOL(wake_up_process);
  */
 int wake_up_lock_sleeper(struct task_struct *p)
 {
-       return try_to_wake_up(p, TASK_ALL, WF_LOCK_SLEEPER);
+       return try_to_wake_up(p, TASK_UNINTERRUPTIBLE, WF_LOCK_SLEEPER);
 }

...appears to be inducing lost futex wakeups.

Scratch that "appears", changing it to TASK_NORMAL just fixed my DL980
running otherwise absolutely pristine 4.9-rt21, after having double
verified that rt20 works fine.  Now to go back to 4.11/master/tip-rt,
make sure that the little bugger really really REALLY ain't fscking
with me for the sheer fun of it, futexes being made of pure evil :)

My testcase is to run futex_wait -n 4 in a modest sized loop.  Odd
thing is that it only reproduces on the DL980 if I let it use multiple
sockets, pin it to one, and all is peachy, (rather seems to be given)
whereas on desktop box, the hang is far more intermittent, but there.

	-Mike


* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-19 10:14     ` Mike Galbraith
@ 2017-06-19 10:44       ` Sebastian Andrzej Siewior
  2017-06-19 11:31         ` Mike Galbraith
  2017-06-19 14:08       ` Steven Rostedt
  1 sibling, 1 reply; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-19 10:44 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt

On 2017-06-19 12:14:51 [+0200], Mike Galbraith wrote:
> Ok, doesn't matter for RT testing.  What does matter, is that...
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 30b24f774198..10e832da70b6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2284,7 +2284,7 @@ EXPORT_SYMBOL(wake_up_process);
>   */
>  int wake_up_lock_sleeper(struct task_struct *p)
>  {
> -       return try_to_wake_up(p, TASK_ALL, WF_LOCK_SLEEPER);
> +       return try_to_wake_up(p, TASK_UNINTERRUPTIBLE, WF_LOCK_SLEEPER);
>  }
> 
> ...appears to be inducing lost futex wakeups.

has this something to do with "rtmutex: Fix lock stealing logic" ?

> Scratch that "appears", changing it to TASK_NORMAL just fixed my DL980
> running otherwise absolutely pristine 4.9-rt21, after having double
> verified that rt20 works fine.  Now to go back to 4.11/master/tip-rt,
> make sure that the little bugger really really REALLY ain't fscking
> with me for the sheer fun of it, futexes being made of pure evil :)

So v4.9-rt20 works fine but -rt21 starts to lose wakeups on DL980 in
general or just with "futex_wait -n 4" ?

> My testcase is to run futex_wait -n 4 in a modest sized loop.  Odd
> thing is that it only reproduces on the DL980 if I let it use multiple
> sockets, pin it to one, and all is peachy, (rather seems to be given)
> whereas on desktop box, the hang is far more intermittent, but there.

do I parse it right, as v4.9-rt21 (without the change above) works with
the testcase mentioned if you pin it to one socket but does not work if
you let it use multiple sockets.
And your desktop box hangs no matter what?

> 	-Mike

Sebastian


* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-19 10:44       ` Sebastian Andrzej Siewior
@ 2017-06-19 11:31         ` Mike Galbraith
  2017-06-19 11:50           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 32+ messages in thread
From: Mike Galbraith @ 2017-06-19 11:31 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt

On Mon, 2017-06-19 at 12:44 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-19 12:14:51 [+0200], Mike Galbraith wrote:
> > Ok, doesn't matter for RT testing.  What does matter, is that...
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 30b24f774198..10e832da70b6 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2284,7 +2284,7 @@ EXPORT_SYMBOL(wake_up_process);
> >   */
> >  int wake_up_lock_sleeper(struct task_struct *p)
> >  {
> > -       return try_to_wake_up(p, TASK_ALL, WF_LOCK_SLEEPER);
> > +       return try_to_wake_up(p, TASK_UNINTERRUPTIBLE, WF_LOCK_SLEEPER);
> >  }
> > 
> > ...appears to be inducing lost futex wakeups.
> 
> has this something to do with "rtmutex: Fix lock stealing logic" ?

Nope.  The above is fallout of me being inspired to stare, that
inspiration having come initially from seeing lost wakeup symptoms in
my desktop, telling me something has gone sour in rt-land, so a hunting
I did go.  I expected to find I had made a booboo in my trees, but
maybe not, as I found a suspiciously similar symptom to what I was
looking for in virgin source.

> > Scratch that "appears", changing it to TASK_NORMAL just fixed my DL980
> > running otherwise absolutely pristine 4.9-rt21, after having double
> > verified that rt20 works fine.  Now to go back to 4.11/master/tip-rt,
> > make sure that the little bugger really really REALLY ain't fscking
> > with me for the sheer fun of it, futexes being made of pure evil :)
> 
> So v4.9-rt20 works fine but -rt21 starts to lose wakeups on DL980 in
> general or just with "futex_wait -n 4" ?

-rt20 is verified to work fine, -rt21 starts hanging with futextest.
 The futex_wait -n 4 testcase was distilled out of seeing the full
futextest/run.sh hanging.  The only symptom I've _seen_ on the DL980 is
futextest hanging.  On the desktop, I've seen more, and may still, I'll
know when I see or don't see desktop gizmos occasionally go comatose.

> > My testcase is to run futex_wait -n 4 in a modest sized loop.  Odd
> > thing is that it only reproduces on the DL980 if I let it use multiple
> > sockets, pin it to one, and all is peachy, (rather seems to be given)
> > whereas on desktop box, the hang is far more intermittent, but there.
> 
> do I parse it right, as v4.9-rt21 (without the change above) works with
> the testcase mentioned if you pin it to one socket but does not work if
> you let it use multiple sockets.
> And your desktop box hangs no matter what?

No no, desktop box will reproduce, but not nearly as reliably as the 8
socket box does, but yes, it seems to work fine on the DL980 when
pinned to one socket.  I was testing 4.9-rt because the hunt was in
progress when 4.11-rt was born.

	-Mike


* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-19 11:31         ` Mike Galbraith
@ 2017-06-19 11:50           ` Sebastian Andrzej Siewior
  2017-06-19 12:55             ` Mike Galbraith
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-19 11:50 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt

On 2017-06-19 13:31:28 [+0200], Mike Galbraith wrote:
> > > Scratch that "appears", changing it to TASK_NORMAL just fixed my DL980
> > > running otherwise absolutely pristine 4.9-rt21, after having double
> > > verified that rt20 works fine.  Now to go back to 4.11/master/tip-rt,
> > > make sure that the little bugger really really REALLY ain't fscking
> > > with me for the sheer fun of it, futexes being made of pure evil :)
> > 
> > So v4.9-rt20 works fine but -rt21 starts to lose wakeups on DL980 in
> > general or just with "futex_wait -n 4" ?
> 
> -rt20 is verified to work fine, -rt21 starts hanging with futextest.
>  The futex_wait -n 4 testcase was distilled out of seeing the full
> futextest/run.sh hanging.  The only symptom I've _seen_ on the DL980 is
> futextest hanging.  On the desktop, I've seen more, and may still, I'll
> know when I see or don't see desktop gizmos occasionally go comatose.

rt20…rt21 is just
  https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/sched-Prevent-task-state-corruption-by-spurious-lock.patch?h=linux-4.9.y-rt-patches

Let me verify that here and maybe fire up the four socket box.

> > > My testcase is to run futex_wait -n 4 in a modest sized loop.  Odd
> > > thing is that it only reproduces on the DL980 if I let it use multiple
> > > sockets, pin it to one, and all is peachy, (rather seems to be given)
> > > whereas on desktop box, the hang is far more intermittent, but there.
> > 
> > do I parse it right, as v4.9-rt21 (without the change above) works with
> > the testcase mentioned if you pin it to one socket but does not work if
> > you let it use multiple sockets.
> > And your desktop box hangs no matter what?
> 
> No no, desktop box will reproduce, but not nearly as reliably as the 8
> socket box does, but yes, it seems to work fine on the DL980 when
> pinned to one socket.  I was testing 4.9-rt because hunt was in
> progress when 4.11-rt was born.

Let me try the futex-wait test on more boxes then…

> 	-Mike

Sebastian


* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-19 11:50           ` Sebastian Andrzej Siewior
@ 2017-06-19 12:55             ` Mike Galbraith
  2017-06-19 14:06               ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 32+ messages in thread
From: Mike Galbraith @ 2017-06-19 12:55 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt

On Mon, 2017-06-19 at 13:50 +0200, Sebastian Andrzej Siewior wrote:
> 
> rt20…rt21 is just
>   https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/sched-Prevent-task-state-corruption-by-spurious-lock.patch?h=linux-4.9.y-rt-patches

Yup.  I got there via git checkout v4.9.x-rtx, build/test.

> Let me verify that here and maybe fire up the four socket box.

One thing perhaps interesting about the DL980 is that it is 8 socket,
but with 7 memoryless nodes (poor thing).  I wouldn't be surprised if
you had to try a lot harder than I.
DL980 now seems happy with master-rt21 fwiw.  It's not virgin though,
not even considering continuous merge 4.9..today.

Hopefully desktop will cease with the comatose widget business too, I
really don't want to go there.  These things give tglx headaches, might
well blow the point clean off of my head. 

	-Mike


* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-19 12:55             ` Mike Galbraith
@ 2017-06-19 14:06               ` Sebastian Andrzej Siewior
  2017-06-19 14:36                 ` Mike Galbraith
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-19 14:06 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt

On 2017-06-19 14:55:35 [+0200], Mike Galbraith wrote:
> On Mon, 2017-06-19 at 13:50 +0200, Sebastian Andrzej Siewior wrote:
> > 
> > rt20…rt21 is just
> >   https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/sched-Prevent-task-state-corruption-by-spurious-lock.patch?h=linux-4.9.y-rt-patches
> 
> Yup.  I got there via git checkout v4.9.x-rtx, build/test.
> 
> > Let me verify that here and maybe fire up the four socket box.
> 
> One thing perhaps interesting about the DL980 is that it is 8 socket,
> but with 7 memoryless nodes (poor thing).  I wouldn't be surprised if
> you had to try a lot harder than I.
> DL980 now seems happy with master-rt21 fwtw.  It's not virgin though,
> not even considering continuous merge 4.9..today.
> 
> Hopefully desktop will cease with the comatose widget business too, I
> really don't want to go there.  These things give tglx headaches, might
> well blow the point clean off of my head. 

I am surprised that your desktop shows any symptoms on rt21. I tried my
smaller AMD box (A10), an Intel two sockets and a four socket box. Each
of them was fine with the run.sh and manual futex_wait invocation.
Could you please send the config of your desktop box?

> 	-Mike

Sebastian


* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-19 10:14     ` Mike Galbraith
  2017-06-19 10:44       ` Sebastian Andrzej Siewior
@ 2017-06-19 14:08       ` Steven Rostedt
  2017-06-19 14:13         ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 32+ messages in thread
From: Steven Rostedt @ 2017-06-19 14:08 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, LKML, linux-rt-users

On Mon, 19 Jun 2017 12:14:51 +0200
Mike Galbraith <efault@gmx.de> wrote:

> On Mon, 2017-06-19 at 10:52 +0200, Sebastian Andrzej Siewior wrote:
> > On 2017-06-17 10:14:37 [+0200], Mike Galbraith wrote:
> >   
> > > During that rebase, migrate_disable() was changed to no longer map to
> > > preempt_disable() for nonrt, but some patches still assume it does.  It
> > > now depends upon PREEMPT_COUNT; the RT workaround in futex.c induces
> > > grumbling in nonrt builds with PREEMPT_COUNT enabled.  
> > 
> > argh, right. It was planned to get it merged upstream but due to
> > $reasons we never got that far. For that reason I would simply revert
> > that change and let migrate_disable() map to preempt_disable() as it did
> > earlier.  
> 
> Ok, doesn't matter for RT testing.  What does matter, is that...
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 30b24f774198..10e832da70b6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2284,7 +2284,7 @@ EXPORT_SYMBOL(wake_up_process);
>   */
>  int wake_up_lock_sleeper(struct task_struct *p)
>  {
> -       return try_to_wake_up(p, TASK_ALL, WF_LOCK_SLEEPER);
> +       return try_to_wake_up(p, TASK_UNINTERRUPTIBLE, WF_LOCK_SLEEPER);
>  }
> 
> ...appears to be inducing lost futex wakeups.

Hmm, it shouldn't affect futexes, as it's only called by rtmutex when
waiter->savestate is true. And that should always be false for futex.

-- Steve

> 
> Scratch that "appears", changing it to TASK_NORMAL just fixed my DL980
> running otherwise absolutely pristine 4.9-rt21, after having double
> verified that rt20 works fine.  Now to go back to 4.11/master/tip-rt,
> make sure that the little bugger really really REALLY ain't fscking
> with me for the sheer fun of it, futexes being made of pure evil :)
> 
> My testcase is to run futex_wait -n 4 in a modest sized loop.  Odd
> thing is that it only reproduces on the DL980 if I let it use multiple
> sockets, pin it to one, and all is peachy, (rather seems to be given)
> whereas on desktop box, the hang is far more intermittent, but there.
> 
> 	-Mike


* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-19 14:08       ` Steven Rostedt
@ 2017-06-19 14:13         ` Sebastian Andrzej Siewior
  2017-06-19 14:41           ` Steven Rostedt
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-19 14:13 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Mike Galbraith, Thomas Gleixner, LKML, linux-rt-users

On 2017-06-19 10:08:38 [-0400], Steven Rostedt wrote:
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 30b24f774198..10e832da70b6 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2284,7 +2284,7 @@ EXPORT_SYMBOL(wake_up_process);
> >   */
> >  int wake_up_lock_sleeper(struct task_struct *p)
> >  {
> > -       return try_to_wake_up(p, TASK_ALL, WF_LOCK_SLEEPER);
> > +       return try_to_wake_up(p, TASK_UNINTERRUPTIBLE, WF_LOCK_SLEEPER);
> >  }
> > 
> > ...appears to be inducing lost futex wakeups.
> 
> Hmm, it shouldn't affect futexes, as it's only called by rtmutex when
> waiter->savestate is true. And that should always be false for futex.

you still have sleeping locks like the hb-lock (which might matter in
case the task blocks on the lock and does not continue for some reason).

> -- Steve

Sebastian


* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-19 14:06               ` Sebastian Andrzej Siewior
@ 2017-06-19 14:36                 ` Mike Galbraith
  2017-06-19 15:03                   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 32+ messages in thread
From: Mike Galbraith @ 2017-06-19 14:36 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt

On Mon, 2017-06-19 at 16:06 +0200, Sebastian Andrzej Siewior wrote:
> 
> I am surprised that your desktop shows any symptoms on rt21. I tried my
> smaller AMD box (A10), an Intel two sockets and a four socket box. Each
> of them was fine with the run.sh and manual futex_wait invocation.
> Could you please send the config of your desktop box?

(sent offline)


* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-19 14:13         ` Sebastian Andrzej Siewior
@ 2017-06-19 14:41           ` Steven Rostedt
  2017-06-19 16:29             ` Mike Galbraith
  0 siblings, 1 reply; 32+ messages in thread
From: Steven Rostedt @ 2017-06-19 14:41 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Mike Galbraith, Thomas Gleixner, LKML, linux-rt-users

On Mon, 19 Jun 2017 16:13:41 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:


> > Hmm, it shouldn't affect futexes, as it's only called by rtmutex when
> > waiter->savestate is true. And that should always be false for futex.  
> 
> you still have sleeping locks like the hb-lock (which might matter in
> case the task blocks on the lock and does not continue for some reason).

But then I'd expect this to be an issue with any spinlock in the kernel.

-- Steve


* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-19 14:36                 ` Mike Galbraith
@ 2017-06-19 15:03                   ` Sebastian Andrzej Siewior
  2017-06-19 16:14                     ` Mike Galbraith
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-19 15:03 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt

On 2017-06-19 16:36:22 [+0200], Mike Galbraith wrote:
> On Mon, 2017-06-19 at 16:06 +0200, Sebastian Andrzej Siewior wrote:
> > 
> > I am surprised that your desktop shows any symptoms on rt21. I tried my
> > smaller AMD box (A10), an Intel two sockets and a four socket box. Each
> > of them was fine with the run.sh and manual futex_wait invocation.
> > Could you please send the config of your desktop box?
> 
> (sent offline)

btw: I tried booting the two- and four-socket boxes with less memory so
that each box uses just one memory node, but nothing changed. Let me
try with your config.

Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-19 15:03                   ` Sebastian Andrzej Siewior
@ 2017-06-19 16:14                     ` Mike Galbraith
  2017-06-19 16:27                       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 32+ messages in thread
From: Mike Galbraith @ 2017-06-19 16:14 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt

On Mon, 2017-06-19 at 17:03 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-19 16:36:22 [+0200], Mike Galbraith wrote:
> > On Mon, 2017-06-19 at 16:06 +0200, Sebastian Andrzej Siewior wrote:
> > > 
> > > I am surprised that your desktop shows any symptoms on rt21. I tried my
> > > smaller AMD box (A10), an Intel two-socket and a four-socket box. Each
> > > of them was fine with the run.sh and manual futex_wait invocation.
> > > Could you please send the config of your desktop box?
> > 
> > (sent offline)
> 
> btw: I tried booting the two- and four-socket boxes with less memory so
> that each box uses just one memory node, but nothing changed. Let me
> try with your config.

BTW back, I reran virgin 4.9-rt21 on the desktop box while off doing the
have-something-resembling-a-life thing, and it did not stall in 50
iterations of performance/run.sh (bloody fickle thing).  Hohum, take it
for whatever you think it's worth.

	-Mike  

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-19 16:14                     ` Mike Galbraith
@ 2017-06-19 16:27                       ` Sebastian Andrzej Siewior
  2017-06-19 16:46                         ` Mike Galbraith
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-19 16:27 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt

On 2017-06-19 18:14:52 [+0200], Mike Galbraith wrote:
> 
> 
> BTW back, I reran virgin 4.9-rt21 on the desktop box while off doing the
> have-something-resembling-a-life thing, and it did not stall in 50
> iterations of performance/run.sh (bloody fickle thing).  Hohum, take it
> for whatever you think it's worth.

That sounds good, because I tried your config on three boxes with the
same result.
So the issue remains on the DLxxx box, which is an 8-socket NUMA box
with only one memory node, right?

I'll let run.sh run in a loop until tomorrow evening and check for results…

> 	-Mike  

Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-19 14:41           ` Steven Rostedt
@ 2017-06-19 16:29             ` Mike Galbraith
  2017-06-20  7:45               ` Mike Galbraith
  0 siblings, 1 reply; 32+ messages in thread
From: Mike Galbraith @ 2017-06-19 16:29 UTC (permalink / raw)
  To: Steven Rostedt, Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, LKML, linux-rt-users

On Mon, 2017-06-19 at 10:41 -0400, Steven Rostedt wrote:
> On Mon, 19 Jun 2017 16:13:41 +0200
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> 
> > > Hmm, it shouldn't affect futexes, as it's only called by rtmutex when
> > > waiter->savestate is true. And that should always be false for futex.  
> > 
> > you still have sleeping locks like the hb-lock (which might matter in
> > case the task blocks on the lock and does not continue for some reason).
> 
> But then I'd expect this to be an issue with any spinlock in the kernel.

Thing to do is set a trap for the bugger.  I'll give that a go after I
get some beans/biscuits work done.  For the nonce, what I've reported
is all I know.

	-Mike

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-19 16:27                       ` Sebastian Andrzej Siewior
@ 2017-06-19 16:46                         ` Mike Galbraith
  0 siblings, 0 replies; 32+ messages in thread
From: Mike Galbraith @ 2017-06-19 16:46 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt

On Mon, 2017-06-19 at 18:27 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-19 18:14:52 [+0200], Mike Galbraith wrote:
> > 
> > BTW back, I reran virgin 4.9-rt21 on the desktop box while off doing the
> > have-something-resembling-a-life thing, and it did not stall in 50
> > iterations of performance/run.sh (bloody fickle thing).  Hohum, take it
> > for whatever you think it's worth.
> 
> That sounds good, because I tried your config on three boxes with the
> same result.
> So the issue remains on the DLxxx box, which is an 8-socket NUMA box
> with only one memory node, right?

Yes.  Now _I_ think (having spent days chasing it) that it is in fact
generic, but yes, all else aside, my highly utilized and (up until now)
very reliable, if crippled, DL980 G7 very clearly states that we have a
bug lurking.

	-Mike 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-19 16:29             ` Mike Galbraith
@ 2017-06-20  7:45               ` Mike Galbraith
  2017-06-22 16:34                 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 32+ messages in thread
From: Mike Galbraith @ 2017-06-20  7:45 UTC (permalink / raw)
  To: Steven Rostedt, Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, LKML, linux-rt-users

On Mon, 2017-06-19 at 18:29 +0200, Mike Galbraith wrote:
> On Mon, 2017-06-19 at 10:41 -0400, Steven Rostedt wrote:
> > On Mon, 19 Jun 2017 16:13:41 +0200
> > Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > 
> > 
> > > > Hmm, it shouldn't affect futexes, as it's only called by rtmutex when
> > > > waiter->savestate is true. And that should always be false for futex.  
> > > 
> > > you still have sleeping locks like the hb-lock (which might matter in
> > > case the task blocks on the lock and does not continue for some reason).
> > 
> > But then I'd expect this to be an issue with any spinlock in the kernel.
> 
> Thing to do is set a trap for the bugger.

See ! and ?

vogelweide:~/:[1]# grep MIKE trace 
MIKE 913.728104: start
MIKE  futex_wait-7496  [031] .......   913.729160: sched_process_fork: comm=futex_wait pid=7496 child_comm=futex_wait child_pid=7497
MIKE  futex_wait-7496  [031] .......   913.729253: sched_process_fork: comm=futex_wait pid=7496 child_comm=futex_wait child_pid=7499
MIKE  futex_wait-7496  [031] .......   913.729307: sched_process_fork: comm=futex_wait pid=7496 child_comm=futex_wait child_pid=7500
MIKE  futex_wait-7496  [031] .......   913.729348: sched_process_fork: comm=futex_wait pid=7496 child_comm=futex_wait child_pid=7501
MIKE  futex_wait-7496  [031] d...2..   913.729430: sched_switch: prev_comm=futex_wait prev_pid=7496 prev_prio=120 prev_state=S ==> next_comm=swapper/31 next_pid=0 next_prio=120
MIKE--futex_wait-7500  [005] d...2..   920.040320: sched_switch: prev_comm=futex_wait prev_pid=7500 prev_prio=120 prev_state=S ==> next_comm=swapper/5 next_pid=0 next_prio=120
MIKE  futex_wait-7497  [058] d...211   920.060009: sched_waking: comm=futex_wait pid=7501 prio=120 target_cpu=044
MIKE  futex_wait-7497  [058] d...311   920.060012: sched_wake_idle_without_ipi: cpu=44
MIKE  futex_wait-7497  [058] d...311   920.060012: sched_wakeup: comm=futex_wait pid=7501 prio=120 target_cpu=044
MIKE--futex_wait-7497  [058] d...2..   920.060035: sched_switch: prev_comm=futex_wait prev_pid=7497 prev_prio=120 prev_state=S ==> next_comm=swapper/58 next_pid=0 next_prio=120
MIKE  futex_wait-7499  [043] dN..411   920.060035: sched_wakeup: comm=ktimersoftd/43 pid=438 prio=0 target_cpu=043
MIKE! futex_wait-7499  [043] .N..111   920.060037: wake_up_lock_sleeper: futex_wait:7497 state:1 saved_state:0
MIKE! futex_wait-7499  [043] dN..211   920.060037: try_to_wake_up: futex_wait:7497 WF_LOCK_SLEEPER !(p->state:1 & state:2) bailing
MIKE 920.060037: futex_wait-7497 is never awakened again until ^C cleanup, 7499/7501 still standing
MIKE  futex_wait-7499  [043] ....111   920.060066: wake_up_lock_sleeper: futex_wait:7501 state:2 saved_state:0
MIKE  futex_wait-7499  [043] d...211   920.060066: try_to_wake_up: futex_wait:7501 WF_LOCK_SLEEPER !(p->state:0 & state:2) bailing
MIKE  futex_wait-7499  [043] ....111   920.060606: wake_up_lock_sleeper: futex_wait:7501 state:2 saved_state:0
MIKE 920.355048: huh? wake_up_lock_sleeper sees state=2 but try_to_wake_up then sees state=0 and bails ?!?
MIKE  futex_wait-7501  [044] ....111   920.355048: wake_up_lock_sleeper: futex_wait:7499 state:2 saved_state:0
MIKE  futex_wait-7501  [044] d...211   920.355049: try_to_wake_up: futex_wait:7499 WF_LOCK_SLEEPER !(p->state:0 & state:2) bailing
MIKE  futex_wait-7499  [048] d..h311   920.355060: sched_stat_runtime: comm=futex_wait pid=7499 runtime=168537 [ns] vruntime=4850005377 [ns]
MIKE  futex_wait-7499  [048] d...311   920.355061: sched_waking: comm=ksoftirqd/48 pid=489 prio=120 target_cpu=048
MIKE  futex_wait-7499  [048] d...411   920.355062: sched_stat_runtime: comm=futex_wait pid=7499 runtime=2972 [ns] vruntime=4850008349 [ns]
MIKE  futex_wait-7499  [048] d.L.411   920.355063: sched_wakeup: comm=ksoftirqd/48 pid=489 prio=120 target_cpu=048
MIKE  futex_wait-7499  [048] d.L.311   920.355064: sched_waking: comm=ktimersoftd/48 pid=488 prio=0 target_cpu=048
MIKE  futex_wait-7499  [048] dNL.411   920.355065: sched_wakeup: comm=ktimersoftd/48 pid=488 prio=0 target_cpu=048
MIKE  futex_wait-7499  [048] .NL.111   920.355066: wake_up_lock_sleeper: futex_wait:7501 state:0 saved_state:0
MIKE  futex_wait-7499  [048] dNL.211   920.355067: try_to_wake_up: futex_wait:7501 WF_LOCK_SLEEPER !(p->state:0 & state:2) bailing
MIKE  futex_wait-7499  [048] dNL.211   920.355067: sched_stat_runtime: comm=futex_wait pid=7499 runtime=2011 [ns] vruntime=4850010360 [ns]
MIKE  futex_wait-7499  [048] d...211   920.355068: sched_switch: prev_comm=futex_wait prev_pid=7499 prev_prio=120 prev_state=R+ ==> next_comm=ktimersoftd/48 next_pid=488 next_prio=0
MIKE       <...>-488   [048] d...2..   920.355070: sched_switch: prev_comm=ktimersoftd/48 prev_pid=488 prev_prio=0 prev_state=S ==> next_comm=ksoftirqd/48 next_pid=489 next_prio=120
MIKE       <...>-489   [048] d...2..   920.355071: sched_stat_runtime: comm=ksoftirqd/48 pid=489 runtime=1925 [ns] vruntime=4838010274 [ns]
MIKE       <...>-489   [048] d...2..   920.355072: sched_switch: prev_comm=ksoftirqd/48 prev_pid=489 prev_prio=120 prev_state=S ==> next_comm=futex_wait next_pid=7499 next_prio=120
MIKE  futex_wait-7501  [044] d...2..   920.355072: sched_stat_runtime: comm=futex_wait pid=7501 runtime=32398 [ns] vruntime=6632181410 [ns]
MIKE--futex_wait-7501  [044] d...2..   920.355074: sched_switch: prev_comm=futex_wait prev_pid=7501 prev_prio=120 prev_state=S ==> next_comm=swapper/44 next_pid=0 next_prio=120
MIKE 920.355074: 7499 is last man standing, but bored, just waking ktimersoftd
MIKE  futex_wait-7499  [048] d..h1..   920.356060: sched_stat_runtime: comm=futex_wait pid=7499 runtime=988793 [ns] vruntime=4850999153 [ns]
MIKE  futex_wait-7499  [048] d...1..   920.356061: sched_waking: comm=ktimersoftd/48 pid=488 prio=0 target_cpu=048
MIKE  futex_wait-7499  [048] dN..2..   920.356063: sched_wakeup: comm=ktimersoftd/48 pid=488 prio=0 target_cpu=048
MIKE  futex_wait-7499  [048] dN..2..   920.356063: sched_stat_runtime: comm=futex_wait pid=7499 runtime=1854 [ns] vruntime=4851001007 [ns]
MIKE  futex_wait-7499  [048] d...2..   920.356064: sched_switch: prev_comm=futex_wait prev_pid=7499 prev_prio=120 prev_state=R ==> next_comm=ktimersoftd/48 next_pid=488 next_prio=0
MIKE       <...>-488   [048] d...2..   920.356065: sched_switch: prev_comm=ktimersoftd/48 prev_pid=488 prev_prio=0 prev_state=S ==> next_comm=futex_wait next_pid=7499 next_prio=120
MIKE  futex_wait-7499  [048] d..h1..   920.384055: sched_stat_runtime: comm=futex_wait pid=7499 runtime=997267 [ns] vruntime=4878950744 [ns]
MIKE  futex_wait-7499  [048] d...1..   920.384056: sched_waking: comm=ktimersoftd/48 pid=488 prio=0 target_cpu=048
MIKE  futex_wait-7499  [048] dN..2..   920.384056: sched_wakeup: comm=ktimersoftd/48 pid=488 prio=0 target_cpu=048
MIKE  futex_wait-7499  [048] dN..2..   920.384057: sched_stat_runtime: comm=futex_wait pid=7499 runtime=1161 [ns] vruntime=4878951905 [ns]
MIKE  futex_wait-7499  [048] d...2..   920.384057: sched_switch: prev_comm=futex_wait prev_pid=7499 prev_prio=120 prev_state=R ==> next_comm=ktimersoftd/48 next_pid=488 next_prio=0
MIKE           <...>-488   [048] d...2..   920.384058: sched_switch: prev_comm=ktimersoftd/48 prev_pid=488 prev_prio=0 prev_state=S ==> next_comm=futex_wait next_pid=7499 next_prio=120
MIKE  futex_wait-7499  [048] d...2..   920.384271: sched_stat_runtime: comm=futex_wait pid=7499 runtime=213243 [ns] vruntime=4879165148 [ns]
MIKE  futex_wait-7499  [048] d...2..   920.384271: sched_switch: prev_comm=futex_wait prev_pid=7499 prev_prio=120 prev_state=S ==> next_comm=swapper/48 next_pid=0 next_prio=120
MIKE 920.384271: we're stalled, all futex_wait have scheduled off, elide gap
MIKE 981.447247: nothing to see above /me poking ^C, elide to EOF

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-20  7:45               ` Mike Galbraith
@ 2017-06-22 16:34                 ` Sebastian Andrzej Siewior
  2017-06-22 17:30                   ` Mike Galbraith
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-22 16:34 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Steven Rostedt, Thomas Gleixner, LKML, linux-rt-users

On 2017-06-20 09:45:06 [+0200], Mike Galbraith wrote:
> See ! and ?

See see.
What about this:

diff --git a/include/linux/sched.h b/include/linux/sched.h
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1014,8 +1014,20 @@ struct wake_q_head {
 #define WAKE_Q(name)					\
 	struct wake_q_head name = { WAKE_Q_TAIL, &name.first }
 
-extern void wake_q_add(struct wake_q_head *head,
-			      struct task_struct *task);
+extern void __wake_q_add(struct wake_q_head *head,
+			 struct task_struct *task, bool sleeper);
+static inline void wake_q_add(struct wake_q_head *head,
+			      struct task_struct *task)
+{
+	__wake_q_add(head, task, false);
+}
+
+static inline void wake_q_add_sleeper(struct wake_q_head *head,
+				      struct task_struct *task)
+{
+	__wake_q_add(head, task, true);
+}
+
 extern void __wake_up_q(struct wake_q_head *head, bool sleeper);
 
 static inline void wake_up_q(struct wake_q_head *head)
@@ -1745,6 +1757,7 @@ struct task_struct {
 	raw_spinlock_t pi_lock;
 
 	struct wake_q_node wake_q;
+	struct wake_q_node wake_q_sleeper;
 
 #ifdef CONFIG_RT_MUTEXES
 	/* PI waiters blocked on a rt_mutex held by this task */
diff --git a/kernel/fork.c b/kernel/fork.c
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -558,6 +558,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	tsk->splice_pipe = NULL;
 	tsk->task_frag.page = NULL;
 	tsk->wake_q.next = NULL;
+	tsk->wake_q_sleeper.next = NULL;
 
 	account_kernel_stack(tsk, 1);
 
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1506,7 +1506,7 @@ static void mark_wakeup_next_waiter(struct wake_q_head *wake_q,
 	 */
 	preempt_disable();
 	if (waiter->savestate)
-		wake_q_add(wake_sleeper_q, waiter->task);
+		wake_q_add_sleeper(wake_sleeper_q, waiter->task);
 	else
 		wake_q_add(wake_q, waiter->task);
 	raw_spin_unlock(&current->pi_lock);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -430,9 +430,15 @@ static bool set_nr_if_polling(struct task_struct *p)
 #endif
 #endif
 
-void wake_q_add(struct wake_q_head *head, struct task_struct *task)
+void __wake_q_add(struct wake_q_head *head, struct task_struct *task,
+		  bool sleeper)
 {
-	struct wake_q_node *node = &task->wake_q;
+	struct wake_q_node *node;
+
+	if (sleeper)
+		node = &task->wake_q_sleeper;
+	else
+		node = &task->wake_q;
 
 	/*
 	 * Atomically grab the task, if ->wake_q is !nil already it means
@@ -461,11 +467,17 @@ void __wake_up_q(struct wake_q_head *head, bool sleeper)
 	while (node != WAKE_Q_TAIL) {
 		struct task_struct *task;
 
-		task = container_of(node, struct task_struct, wake_q);
+		if (sleeper)
+			task = container_of(node, struct task_struct, wake_q_sleeper);
+		else
+			task = container_of(node, struct task_struct, wake_q);
 		BUG_ON(!task);
 		/* task can safely be re-inserted now */
 		node = node->next;
-		task->wake_q.next = NULL;
+		if (sleeper)
+			task->wake_q_sleeper.next = NULL;
+		else
+			task->wake_q.next = NULL;
 
 		/*
 		 * wake_up_process() implies a wmb() to pair with the queueing

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-22 16:34                 ` Sebastian Andrzej Siewior
@ 2017-06-22 17:30                   ` Mike Galbraith
  2017-06-22 21:36                     ` Thomas Gleixner
                                       ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Mike Galbraith @ 2017-06-22 17:30 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, Thomas Gleixner, LKML, linux-rt-users

On Thu, 2017-06-22 at 18:34 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-20 09:45:06 [+0200], Mike Galbraith wrote:
> > See ! and ?
> 
> See see.
> What about this:

I'll give it a go, likely during the weekend.

I moved to 4.11-rt today (it also repros nicely) due to ftrace annoying
me.  After yet more staring at ever more huge traces (opposite of goal ;),
then taking a break to stare at the source again, I decided that the dual
wake_q business should die... and the stall died with it.

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1014,8 +1014,20 @@ struct wake_q_head {
>  #define WAKE_Q(name)					\
>  	struct wake_q_head name = { WAKE_Q_TAIL, &name.first }
>  
> -extern void wake_q_add(struct wake_q_head *head,
> -			      struct task_struct *task);
> +extern void __wake_q_add(struct wake_q_head *head,
> +			 struct task_struct *task, bool sleeper);
> +static inline void wake_q_add(struct wake_q_head *head,
> +			      struct task_struct *task)
> +{
> +	__wake_q_add(head, task, false);
> +}
> +
> +static inline void wake_q_add_sleeper(struct wake_q_head *head,
> +				      struct task_struct *task)
> +{
> +	__wake_q_add(head, task, true);
> +}
> +
>  extern void __wake_up_q(struct wake_q_head *head, bool sleeper);
>  
>  static inline void wake_up_q(struct wake_q_head *head)
> @@ -1745,6 +1757,7 @@ struct task_struct {
>  	raw_spinlock_t pi_lock;
>  
>  	struct wake_q_node wake_q;
> +	struct wake_q_node wake_q_sleeper;
>  
>  #ifdef CONFIG_RT_MUTEXES
>  	/* PI waiters blocked on a rt_mutex held by this task */
> diff --git a/kernel/fork.c b/kernel/fork.c
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -558,6 +558,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>  	tsk->splice_pipe = NULL;
>  	tsk->task_frag.page = NULL;
>  	tsk->wake_q.next = NULL;
> +	tsk->wake_q_sleeper.next = NULL;
>  
>  	account_kernel_stack(tsk, 1);
>  
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -1506,7 +1506,7 @@ static void mark_wakeup_next_waiter(struct wake_q_head *wake_q,
>  	 */
>  	preempt_disable();
>  	if (waiter->savestate)
> -		wake_q_add(wake_sleeper_q, waiter->task);
> +		wake_q_add_sleeper(wake_sleeper_q, waiter->task);
>  	else
>  		wake_q_add(wake_q, waiter->task);
>  	raw_spin_unlock(&current->pi_lock);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -430,9 +430,15 @@ static bool set_nr_if_polling(struct task_struct *p)
>  #endif
>  #endif
>  
> -void wake_q_add(struct wake_q_head *head, struct task_struct *task)
> +void __wake_q_add(struct wake_q_head *head, struct task_struct *task,
> +		  bool sleeper)
>  {
> -	struct wake_q_node *node = &task->wake_q;
> +	struct wake_q_node *node;
> +
> +	if (sleeper)
> +		node = &task->wake_q_sleeper;
> +	else
> +		node = &task->wake_q;
>  
>  	/*
>  	 * Atomically grab the task, if ->wake_q is !nil already it means
> @@ -461,11 +467,17 @@ void __wake_up_q(struct wake_q_head *head, bool sleeper)
>  	while (node != WAKE_Q_TAIL) {
>  		struct task_struct *task;
>  
> -		task = container_of(node, struct task_struct, wake_q);
> +		if (sleeper)
> +			task = container_of(node, struct task_struct, wake_q_sleeper);
> +		else
> +			task = container_of(node, struct task_struct, wake_q);
>  		BUG_ON(!task);
>  		/* task can safely be re-inserted now */
>  		node = node->next;
> -		task->wake_q.next = NULL;
> +		if (sleeper)
> +			task->wake_q_sleeper.next = NULL;
> +		else
> +			task->wake_q.next = NULL;
>  
>  		/*
>  		 * wake_up_process() implies a wmb() to pair with the queueing

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-22 17:30                   ` Mike Galbraith
@ 2017-06-22 21:36                     ` Thomas Gleixner
  2017-06-23  2:00                     ` Mike Galbraith
  2017-06-23 12:48                     ` Sebastian Andrzej Siewior
  2 siblings, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2017-06-22 21:36 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Sebastian Andrzej Siewior, Steven Rostedt, LKML, linux-rt-users

On Thu, 22 Jun 2017, Mike Galbraith wrote:
> On Thu, 2017-06-22 at 18:34 +0200, Sebastian Andrzej Siewior wrote:
> > On 2017-06-20 09:45:06 [+0200], Mike Galbraith wrote:
> > > See ! and ?
> > 
> > See see.
> > What about this:
> 
> I'll give it a go, likely during the weekend.
> 
> I moved to 4.11-rt today (it also repros nicely) due to ftrace annoying
> me.  After yet more staring at ever more huge traces (opposite of goal ;),
> then taking a break to stare at the source again, I decided that the dual
> wake_q business should die... and the stall died with it.

The dual wake_q business is the semantically correct separation. The evils
of 'sleeping spinlocks' require preserving the task state across the lock
sleep and acting properly upon the real wakeup (not the lock wakeup).

The TASK_ALL wakeup of lock sleepers was hiding the non-separation of
these two, except for the resulting wreckage of ptrace, which we simply
ignored for a long time because ....

Now that we figured out that TASK_ALL is wrong, we have exposed the
convolution of competing lock wakeups and regular wakeups. Separating
the wake_q for those is the only sane solution.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-22 17:30                   ` Mike Galbraith
  2017-06-22 21:36                     ` Thomas Gleixner
@ 2017-06-23  2:00                     ` Mike Galbraith
  2017-06-23 12:48                     ` Sebastian Andrzej Siewior
  2 siblings, 0 replies; 32+ messages in thread
From: Mike Galbraith @ 2017-06-23  2:00 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, Thomas Gleixner, LKML, linux-rt-users

On Thu, 2017-06-22 at 19:30 +0200, Mike Galbraith wrote:
> On Thu, 2017-06-22 at 18:34 +0200, Sebastian Andrzej Siewior wrote:
> > On 2017-06-20 09:45:06 [+0200], Mike Galbraith wrote:
> > > See ! and ?
> > 
> > See see.
> > What about this:
> 
> I'll give it a go, likely during the weekend.

What the heck, after so much time expended squabbling with this bugger,
beans/biscuits work will hardly notice a mere hour or so...

Yup, it's confirmed dead, and no doubt my GUI goblins will be too.

	-Mike

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch-rt v2] rtmutex: Fix lock stealing logic
  2017-06-18 17:01 ` [patch-rt] rtmutex: Fix lock stealing logic Mike Galbraith
@ 2017-06-23  7:37   ` Mike Galbraith
  2017-06-23 10:07     ` Mike Galbraith
  2017-06-23 13:33     ` Steven Rostedt
  0 siblings, 2 replies; 32+ messages in thread
From: Mike Galbraith @ 2017-06-23  7:37 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Thomas Gleixner
  Cc: LKML, linux-rt-users, Steven Rostedt

V2 changes:
  - beautification (ymmv)
  - enable lock stealing when waiter is queued

rtmutex: Fix lock stealing logic

1. When trying to acquire an rtmutex, we first try to grab it without
queueing the waiter, and explicitly check for that initial attempt
in the !waiter path of __try_to_take_rt_mutex().  Checking whether
the lock taker is top waiter before allowing a steal attempt in that
path is a thinko: the lock taker has not yet blocked.

2. It seems wrong to change the definition of rt_mutex_waiter_less()
to mean less or perhaps equal when we have an rt_mutex_waiter_equal().

Remove the thinko, restore rt_mutex_waiter_less(), implement and use
rt_mutex_steal() based upon rt_mutex_waiter_less/equal(), moving all
qualification criteria into the function itself.

Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 kernel/locking/rtmutex.c |   76 ++++++++++++++++++++++-------------------------
 1 file changed, 37 insertions(+), 39 deletions(-)

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -236,26 +236,19 @@ static inline bool unlock_rt_mutex_safe(
 }
 #endif
 
-#define STEAL_NORMAL  0
-#define STEAL_LATERAL 1
-
 /*
  * Only use with rt_mutex_waiter_{less,equal}()
  */
-#define task_to_waiter(p)	\
-	&(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline }
+#define task_to_waiter(p) &(struct rt_mutex_waiter) \
+	{ .prio = (p)->prio, .deadline = (p)->dl.deadline, .task = (p) }
 
 static inline int
 rt_mutex_waiter_less(struct rt_mutex_waiter *left,
-		     struct rt_mutex_waiter *right, int mode)
+		     struct rt_mutex_waiter *right)
 {
-	if (mode == STEAL_NORMAL) {
-		if (left->prio < right->prio)
-			return 1;
-	} else {
-		if (left->prio <= right->prio)
-			return 1;
-	}
+	if (left->prio < right->prio)
+		return 1;
+
 	/*
 	 * If both waiters have dl_prio(), we check the deadlines of the
 	 * associated tasks.
@@ -287,6 +280,27 @@ rt_mutex_waiter_equal(struct rt_mutex_wa
 	return 1;
 }
 
+#define STEAL_NORMAL  0
+#define STEAL_LATERAL 1
+
+static inline int
+rt_mutex_steal(struct rt_mutex *lock, struct rt_mutex_waiter *waiter, int mode)
+{
+	struct rt_mutex_waiter *top_waiter = rt_mutex_top_waiter(lock);
+
+	if (waiter == top_waiter || rt_mutex_waiter_less(waiter, top_waiter))
+		return 1;
+
+	/*
+	 * Note that RT tasks are excluded from lateral-steals
+	 * to prevent the introduction of an unbounded latency.
+	 */
+	if (mode == STEAL_NORMAL || rt_task(waiter->task))
+		return 0;
+
+	return rt_mutex_waiter_equal(waiter, top_waiter);
+}
+
 static void
 rt_mutex_enqueue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter)
 {
@@ -298,7 +312,7 @@ rt_mutex_enqueue(struct rt_mutex *lock,
 	while (*link) {
 		parent = *link;
 		entry = rb_entry(parent, struct rt_mutex_waiter, tree_entry);
-		if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) {
+		if (rt_mutex_waiter_less(waiter, entry)) {
 			link = &parent->rb_left;
 		} else {
 			link = &parent->rb_right;
@@ -337,7 +351,7 @@ rt_mutex_enqueue_pi(struct task_struct *
 	while (*link) {
 		parent = *link;
 		entry = rb_entry(parent, struct rt_mutex_waiter, pi_tree_entry);
-		if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) {
+		if (rt_mutex_waiter_less(waiter, entry)) {
 			link = &parent->rb_left;
 		} else {
 			link = &parent->rb_right;
@@ -847,6 +861,7 @@ static int rt_mutex_adjust_prio_chain(st
  * @task:   The task which wants to acquire the lock
  * @waiter: The waiter that is queued to the lock's wait tree if the
  *	    callsite called task_blocked_on_lock(), otherwise NULL
+ * @mode:   Lock steal mode (STEAL_NORMAL, STEAL_LATERAL)
  */
 static int __try_to_take_rt_mutex(struct rt_mutex *lock,
 				  struct task_struct *task,
@@ -886,20 +901,16 @@ static int __try_to_take_rt_mutex(struct
 	 */
 	if (waiter) {
 		/*
-		 * If waiter is not the highest priority waiter of
-		 * @lock, give up.
+		 * If waiter is not the highest priority waiter of @lock,
+		 * or its peer when lateral steal is allowed, give up.
 		 */
-		if (waiter != rt_mutex_top_waiter(lock)) {
-			/* XXX rt_mutex_waiter_less() ? */
+		if (!rt_mutex_steal(lock, waiter, mode))
 			return 0;
-		}
-
 		/*
 		 * We can acquire the lock. Remove the waiter from the
 		 * lock waiters tree.
 		 */
 		rt_mutex_dequeue(lock, waiter);
-
 	} else {
 		/*
 		 * If the lock has waiters already we check whether @task is
@@ -910,25 +921,12 @@ static int __try_to_take_rt_mutex(struct
 		 * not need to be dequeued.
 		 */
 		if (rt_mutex_has_waiters(lock)) {
-			struct task_struct *pown = rt_mutex_top_waiter(lock)->task;
-
-			if (task != pown)
-				return 0;
-
-			/*
-			 * Note that RT tasks are excluded from lateral-steals
-			 * to prevent the introduction of an unbounded latency.
-			 */
-			if (rt_task(task))
-				mode = STEAL_NORMAL;
 			/*
-			 * If @task->prio is greater than or equal to
-			 * the top waiter priority (kernel view),
-			 * @task lost.
+			 * If @task->prio is greater than the top waiter
+			 * priority (kernel view), or equal to it when a
+			 * lateral steal is forbidden, @task lost.
 			 */
-			if (!rt_mutex_waiter_less(task_to_waiter(task),
-						  rt_mutex_top_waiter(lock),
-						  mode))
+			if (!rt_mutex_steal(lock, task_to_waiter(task), mode))
 				return 0;
 			/*
 			 * The current top waiter stays enqueued. We

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch-rt v2] rtmutex: Fix lock stealing logic
  2017-06-23  7:37   ` [patch-rt v2] " Mike Galbraith
@ 2017-06-23 10:07     ` Mike Galbraith
  2017-06-26 13:47       ` Sebastian Andrzej Siewior
  2017-06-23 13:33     ` Steven Rostedt
  1 sibling, 1 reply; 32+ messages in thread
From: Mike Galbraith @ 2017-06-23 10:07 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Thomas Gleixner
  Cc: LKML, linux-rt-users, Steven Rostedt

On Fri, 2017-06-23 at 09:37 +0200, Mike Galbraith wrote:
> V2 changes:
>   - beautification (ymmv)
>   - enable lock stealing when waiter is queued

(V3 s/steal/claim)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [ANNOUNCE] v4.11.5-rt1
  2017-06-22 17:30                   ` Mike Galbraith
  2017-06-22 21:36                     ` Thomas Gleixner
  2017-06-23  2:00                     ` Mike Galbraith
@ 2017-06-23 12:48                     ` Sebastian Andrzej Siewior
  2 siblings, 0 replies; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-23 12:48 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Steven Rostedt, Thomas Gleixner, LKML, linux-rt-users

On 2017-06-22 19:30:07 [+0200], Mike Galbraith wrote:
> On Thu, 2017-06-22 at 18:34 +0200, Sebastian Andrzej Siewior wrote:
> > On 2017-06-20 09:45:06 [+0200], Mike Galbraith wrote:
> > > See ! and ?
> > 
> > See see.
> > What about this:
> 
> I'll give it a go, likely during the weekend.

It survived >1d on my AMD-A10 box. I am adding this description:

|If a task is queued as a sleeper for a wakeup and never goes to
|schedule() (because it just obtained the lock) then it will receive a
|spurious wake up, which is not "bad"; it is accounted for. Until that
|wake up happens this task can not be enqueued for any wake ups handled
|by the WAKE_Q infrastructure (because a task can only be enqueued
|once). This wouldn't be bad if we used the same wakeup mechanism for
|the wake up of sleepers as we do for "normal" wake ups. But we don't…
|
|So.
|   T1			T2		T3
|   spin_lock(x)				spin_unlock(x);
|   					wake_q_add(q1, T1)
|   spin_unlock(x)
|   set_state(TASK_INTERRUPTIBLE)
|   if (!condition)
|	schedule()
|			condition = true
|			wake_q_add(q2, T1)
|			// T1 not added, still enqueued
|			wake_up_q(q2)
|					wake_up_q_sleeper(q1)
|					// T1 not woken up, wrong task state
|
|In order to solve this race this patch adds a wake_q_node for the
|sleeper case.

and consider this closed unless I hear otherwise from you :)

Sebastian
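
The failure mode in the description above hinges on wake_q_add() refusing to enqueue a task whose wake_q node is already in use. The following is a minimal user-space model of that behaviour, using hypothetical simplified types rather than the kernel's API, to make the lost-wakeup step in the T1/T2/T3 diagram concrete:

```c
#include <stddef.h>

/*
 * Model of a task with a single wake_q slot, as with
 * struct task_struct's one wake_q node before the fix.
 * These types are illustrative, not the kernel's.
 */
struct task {
	struct task *wake_q_next;	/* NULL means "not enqueued" */
};

struct wake_q_head {
	struct task *first;
};

#define WAKE_Q_TAIL ((struct task *)1)	/* sentinel: enqueued, end of list */

/* Returns 1 if the task was added, 0 if it was already on some queue. */
int wake_q_add(struct wake_q_head *q, struct task *t)
{
	if (t->wake_q_next != NULL)
		return 0;	/* already enqueued elsewhere: this wakeup is lost */
	t->wake_q_next = q->first ? q->first : WAKE_Q_TAIL;
	q->first = t;
	return 1;
}
```

In the race, T1 is already on the sleeper queue q1 when T2 calls wake_q_add(q2, T1); the second add is a no-op, so the "normal" wake_up_q(q2) never sees T1, and wake_up_q_sleeper(q1) finds it in the wrong task state. A second wake_q node per task gives the two mechanisms independent slots.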

* Re: [patch-rt v2] rtmutex: Fix lock stealing logic
  2017-06-23  7:37   ` [patch-rt v2] " Mike Galbraith
  2017-06-23 10:07     ` Mike Galbraith
@ 2017-06-23 13:33     ` Steven Rostedt
  2017-06-23 14:07       ` Mike Galbraith
  1 sibling, 1 reply; 32+ messages in thread
From: Steven Rostedt @ 2017-06-23 13:33 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, LKML, linux-rt-users

On Fri, 23 Jun 2017 09:37:14 +0200
Mike Galbraith <efault@gmx.de> wrote:

> V2 changes:
>   - beautification (ymmv)
>   - enable lock stealing when waiter is queued
> 
> rtmutex: Fix lock stealing logic
> 
> 1. When trying to acquire an rtmutex, we first try to grab it without
> queueing the waiter, and explicitly check for that initial attempt
> in the !waiter path of __try_to_take_rt_mutex().  Checking whether
> the lock taker is top waiter before allowing a steal attempt in that
> path is a thinko: the lock taker has not yet blocked.
> 
> 2. It seems wrong to change the definition of rt_mutex_waiter_less()
> to mean less or perhaps equal when we have an rt_mutex_waiter_equal().
> 
> Remove the thinko, restore rt_mutex_waiter_less(), implement and use
> rt_mutex_steal() based upon rt_mutex_waiter_less/equal(), moving all
> qualification criteria into the function itself.
> 
> Signed-off-by: Mike Galbraith <efault@gmx.de>
> ---
>  kernel/locking/rtmutex.c |   76 ++++++++++++++++++++++-------------------------
>  1 file changed, 37 insertions(+), 39 deletions(-)
> 
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -236,26 +236,19 @@ static inline bool unlock_rt_mutex_safe(
>  }
>  #endif
>  
> -#define STEAL_NORMAL  0
> -#define STEAL_LATERAL 1
> -
>  /*
>   * Only use with rt_mutex_waiter_{less,equal}()
>   */
> -#define task_to_waiter(p)	\
> -	&(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline }
> +#define task_to_waiter(p) &(struct rt_mutex_waiter) \
> +	{ .prio = (p)->prio, .deadline = (p)->dl.deadline, .task = (p) }
>  
>  static inline int
>  rt_mutex_waiter_less(struct rt_mutex_waiter *left,
> -		     struct rt_mutex_waiter *right, int mode)
> +		     struct rt_mutex_waiter *right)
>  {
> -	if (mode == STEAL_NORMAL) {
> -		if (left->prio < right->prio)
> -			return 1;
> -	} else {
> -		if (left->prio <= right->prio)
> -			return 1;
> -	}
> +	if (left->prio < right->prio)
> +		return 1;
> +
>  	/*
>  	 * If both waiters have dl_prio(), we check the deadlines of the
>  	 * associated tasks.
> @@ -287,6 +280,27 @@ rt_mutex_waiter_equal(struct rt_mutex_wa
>  	return 1;
>  }
>  
> +#define STEAL_NORMAL  0
> +#define STEAL_LATERAL 1
> +
> +static inline int
> +rt_mutex_steal(struct rt_mutex *lock, struct rt_mutex_waiter *waiter, int mode)
> +{
> +	struct rt_mutex_waiter *top_waiter = rt_mutex_top_waiter(lock);
> +
> +	if (waiter == top_waiter || rt_mutex_waiter_less(waiter, top_waiter))
> +		return 1;
> +
> +	/*
> +	 * Note that RT tasks are excluded from lateral-steals
> +	 * to prevent the introduction of an unbounded latency.
> +	 */
> +	if (mode == STEAL_NORMAL || rt_task(waiter->task))
> +		return 0;
> +
> +	return rt_mutex_waiter_equal(waiter, top_waiter);
> +}
> +
>  static void
>  rt_mutex_enqueue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter)
>  {
> @@ -298,7 +312,7 @@ rt_mutex_enqueue(struct rt_mutex *lock,
>  	while (*link) {
>  		parent = *link;
>  		entry = rb_entry(parent, struct rt_mutex_waiter, tree_entry);
> -		if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) {
> +		if (rt_mutex_waiter_less(waiter, entry)) {
>  			link = &parent->rb_left;
>  		} else {
>  			link = &parent->rb_right;
> @@ -337,7 +351,7 @@ rt_mutex_enqueue_pi(struct task_struct *
>  	while (*link) {
>  		parent = *link;
>  		entry = rb_entry(parent, struct rt_mutex_waiter, pi_tree_entry);
> -		if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) {
> +		if (rt_mutex_waiter_less(waiter, entry)) {
>  			link = &parent->rb_left;
>  		} else {
>  			link = &parent->rb_right;
> @@ -847,6 +861,7 @@ static int rt_mutex_adjust_prio_chain(st
>   * @task:   The task which wants to acquire the lock
>   * @waiter: The waiter that is queued to the lock's wait tree if the
>   *	    callsite called task_blocked_on_lock(), otherwise NULL
> + * @mode:   Lock steal mode (STEAL_NORMAL, STEAL_LATERAL)
>   */
>  static int __try_to_take_rt_mutex(struct rt_mutex *lock,
>  				  struct task_struct *task,
> @@ -886,20 +901,16 @@ static int __try_to_take_rt_mutex(struct
>  	 */
>  	if (waiter) {
>  		/*
> -		 * If waiter is not the highest priority waiter of
> -		 * @lock, give up.
> +		 * If waiter is not the highest priority waiter of @lock,
> +		 * or its peer when lateral steal is allowed, give up.
>  		 */
> -		if (waiter != rt_mutex_top_waiter(lock)) {
> -			/* XXX rt_mutex_waiter_less() ? */
> +		if (!rt_mutex_steal(lock, waiter, mode))
>  			return 0;
> -		}
> -
>  		/*
>  		 * We can acquire the lock. Remove the waiter from the
>  		 * lock waiters tree.
>  		 */
>  		rt_mutex_dequeue(lock, waiter);
> -

I liked that space.

>  	} else {
>  		/*
>  		 * If the lock has waiters already we check whether @task is
> @@ -910,25 +921,12 @@ static int __try_to_take_rt_mutex(struct
>  		 * not need to be dequeued.
>  		 */
>  		if (rt_mutex_has_waiters(lock)) {
> -			struct task_struct *pown = rt_mutex_top_waiter(lock)->task;
> -
> -			if (task != pown)
> -				return 0;

OK, I see what happened here. Yeah, this check will always fire, because
when waiter == NULL the task isn't on the lock's wait list yet, so
pown != task always holds.


> -
> -			/*
> -			 * Note that RT tasks are excluded from lateral-steals
> -			 * to prevent the introduction of an unbounded latency.
> -			 */
> -			if (rt_task(task))
> -				mode = STEAL_NORMAL;
>  			/*
> -			 * If @task->prio is greater than or equal to
> -			 * the top waiter priority (kernel view),
> -			 * @task lost.
> +			 * If @task->prio is greater than the top waiter
> +			 * priority (kernel view), or equal to it when a
> +			 * lateral steal is forbidden, @task lost.
>  			 */
> -			if (!rt_mutex_waiter_less(task_to_waiter(task),
> -						  rt_mutex_top_waiter(lock),
> -						  mode))
> +			if (!rt_mutex_steal(lock, task_to_waiter(task), mode))
>  				return 0;
>  			/*
>  			 * The current top waiter stays enqueued. We

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

* Re: [patch-rt v2] rtmutex: Fix lock stealing logic
  2017-06-23 13:33     ` Steven Rostedt
@ 2017-06-23 14:07       ` Mike Galbraith
  2017-06-23 14:14         ` Steven Rostedt
  0 siblings, 1 reply; 32+ messages in thread
From: Mike Galbraith @ 2017-06-23 14:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, LKML, linux-rt-users

On Fri, 2017-06-23 at 09:33 -0400, Steven Rostedt wrote:
>  				  struct task_struct *task,
> > @@ -886,20 +901,16 @@ static int __try_to_take_rt_mutex(struct
> >  	 */
> >  	if (waiter) {
> >  		/*
> > -		 * If waiter is not the highest priority waiter of
> > -		 * @lock, give up.
> > +		 * If waiter is not the highest priority waiter of @lock,
> > +		 * or its peer when lateral steal is allowed, give up.
> >  		 */
> > -		if (waiter != rt_mutex_top_waiter(lock)) {
> > -			/* XXX rt_mutex_waiter_less() ? */
> > +		if (!rt_mutex_steal(lock, waiter, mode))
> >  			return 0;
> > -		}
> > -
> >  		/*
> >  		 * We can acquire the lock. Remove the waiter from the
> >  		 * lock waiters tree.
> >  		 */
> >  		rt_mutex_dequeue(lock, waiter);
> > -
> 
> I liked that space.

I like minus signs in diffstat, that one was a freebie.  Maintainers
can revive it if they like, or I can post a V3 with it revived as well
as s/rt_mutex_steal/rt_mutex_claim.

	-Mike

* Re: [patch-rt v2] rtmutex: Fix lock stealing logic
  2017-06-23 14:07       ` Mike Galbraith
@ 2017-06-23 14:14         ` Steven Rostedt
  0 siblings, 0 replies; 32+ messages in thread
From: Steven Rostedt @ 2017-06-23 14:14 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, LKML, linux-rt-users

On Fri, 23 Jun 2017 16:07:19 +0200
Mike Galbraith <efault@gmx.de> wrote:

> On Fri, 2017-06-23 at 09:33 -0400, Steven Rostedt wrote:
> >  				  struct task_struct *task,  
> > > @@ -886,20 +901,16 @@ static int __try_to_take_rt_mutex(struct
> > >  	 */
> > >  	if (waiter) {
> > >  		/*
> > > -		 * If waiter is not the highest priority waiter of
> > > -		 * @lock, give up.
> > > +		 * If waiter is not the highest priority waiter of @lock,
> > > +		 * or its peer when lateral steal is allowed, give up.
> > >  		 */
> > > -		if (waiter != rt_mutex_top_waiter(lock)) {
> > > -			/* XXX rt_mutex_waiter_less() ? */
> > > +		if (!rt_mutex_steal(lock, waiter, mode))
> > >  			return 0;
> > > -		}
> > > -
> > >  		/*
> > >  		 * We can acquire the lock. Remove the waiter from the
> > >  		 * lock waiters tree.
> > >  		 */
> > >  		rt_mutex_dequeue(lock, waiter);
> > > -  
> > 
> > I liked that space.  
> 
> I like minus signs in diffstat, that one was a freebee.  Maintainers
> can revive it if they like, or I can post a V3 with it revived as well
> as s/rt_mutex_steal/rt_mutex_claim.
>

It's not bigly to me.

-- Steve

* Re: [patch-rt v2] rtmutex: Fix lock stealing logic
  2017-06-23 10:07     ` Mike Galbraith
@ 2017-06-26 13:47       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-26 13:47 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Thomas Gleixner, LKML, linux-rt-users, Steven Rostedt

On 2017-06-23 12:07:38 [+0200], Mike Galbraith wrote:
> On Fri, 2017-06-23 at 09:37 +0200, Mike Galbraith wrote:
> > V2 changes:
> >   - beautification (ymmv)
> >   - enable lock stealing when waiter is queued

Applied.

> (V3 s/steal/claim)

Sebastian

end of thread, other threads:[~2017-06-26 13:47 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-16 10:56 [ANNOUNCE] v4.11.5-rt1 Sebastian Andrzej Siewior
2017-06-17  8:14 ` Mike Galbraith
2017-06-18  6:46   ` Mike Galbraith
2017-06-19  8:52   ` Sebastian Andrzej Siewior
2017-06-19 10:14     ` Mike Galbraith
2017-06-19 10:44       ` Sebastian Andrzej Siewior
2017-06-19 11:31         ` Mike Galbraith
2017-06-19 11:50           ` Sebastian Andrzej Siewior
2017-06-19 12:55             ` Mike Galbraith
2017-06-19 14:06               ` Sebastian Andrzej Siewior
2017-06-19 14:36                 ` Mike Galbraith
2017-06-19 15:03                   ` Sebastian Andrzej Siewior
2017-06-19 16:14                     ` Mike Galbraith
2017-06-19 16:27                       ` Sebastian Andrzej Siewior
2017-06-19 16:46                         ` Mike Galbraith
2017-06-19 14:08       ` Steven Rostedt
2017-06-19 14:13         ` Sebastian Andrzej Siewior
2017-06-19 14:41           ` Steven Rostedt
2017-06-19 16:29             ` Mike Galbraith
2017-06-20  7:45               ` Mike Galbraith
2017-06-22 16:34                 ` Sebastian Andrzej Siewior
2017-06-22 17:30                   ` Mike Galbraith
2017-06-22 21:36                     ` Thomas Gleixner
2017-06-23  2:00                     ` Mike Galbraith
2017-06-23 12:48                     ` Sebastian Andrzej Siewior
2017-06-18 17:01 ` [patch-rt] rtmutex: Fix lock stealing logic Mike Galbraith
2017-06-23  7:37   ` [patch-rt v2] " Mike Galbraith
2017-06-23 10:07     ` Mike Galbraith
2017-06-26 13:47       ` Sebastian Andrzej Siewior
2017-06-23 13:33     ` Steven Rostedt
2017-06-23 14:07       ` Mike Galbraith
2017-06-23 14:14         ` Steven Rostedt
