Subject: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
From: Rabin Vincent
Date: 2015-06-30 14:30 UTC
To: mingo, peterz
Cc: linux-kernel

Hi,

We're seeing a livelock in which two CPUs both loop with interrupts
disabled in pick_next_task_fair() / idle_balance() and continuously
pull all of each other's tasks.  This has been observed on a 3.18
kernel, but the relevant code paths appear to be the same in current
mainline.

The problem situation appears to be this:

cpu0                                        cpu1


pick_next_task
 take rq0->lock
 pick_next_task_fair
  running=0
  idle_balance()
   drop rq0->lock


                                        wakeup A,B on cpu0


                                        pick_next_task
                                         take rq1->lock
                                         pick_next_task_fair
                                          running=0
                                          idle_balance()
                                            drop rq1->lock
                                           load_balance()
                                           busiest=rq0
                                           take rq0->lock
                                           detach A,B
                                           drop rq0->lock
                                           take rq1->lock
                                           attach A,B
                                           pulled_task = 2
                                           drop rq1->lock

   load_balance()
   busiest=rq1
   take rq1->lock
   detach A,B
   drop rq1->lock
   take rq0->lock
   attach A,B
   pulled_task = 2
   drop rq0->lock

                                           running=0
                                          idle_balance()
                                          busiest=rq0, pull A,B, etc.


  running=0
  load_balance()
  busiest=rq1, pull A,B, etc.

And this goes on, with interrupts disabled on both CPUs, for at least
100 ms (which is when a hardware watchdog hits).

The conditions needed, apart from the right timing, are:

 - cgroups. One of the tasks, say A, needs to be in a CPU cgroup.  When
   the problem occurs, A's ->se has a zero load_avg_contrib and
   task_h_load(A) is zero.  However, A's se->parent->parent has a
   (relatively) high load_avg_contrib, so cpu0's cfs_rq has a
   relatively high runnable_load_avg.  find_busiest_group() therefore
   detects an imbalance, and detach_tasks() detaches all tasks (see the
   sketch after this list).

 - PREEMPT=n.  Otherwise, the code under #ifdef in detach_tasks() would
   ensure that we'd only ever pull a maximum of one task during idle
   balancing.

 - cpu0 and cpu1 are threads on the same core (cpus_share_cache() ==
   true).  Otherwise, cpu1 would not be able to wake up tasks on cpu0
   while cpu0 has interrupts disabled (since an IPI would be required).
   Turning off the default TTWU_QUEUE feature would have the same
   effect.
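
To make the accounting failure concrete, here is a minimal user-space
model of the detach_tasks() loop (a sketch with invented load values,
not the kernel code).  env->imbalance is derived from the group-level
runnable_load_avg, but each detached task only subtracts its own
task_h_load(), which is zero here, so the loop never converges on the
imbalance and stops only when the busiest runqueue is empty:

#include <stdio.h>

int main(void)
{
	long imbalance = 512;		/* from find_busiest_group(); invented */
	long h_load[2] = { 0, 0 };	/* task_h_load(A), task_h_load(B) */
	int i, detached = 0;

	for (i = 0; i < 2; i++) {
		long load = h_load[i];

		if (load / 2 > imbalance)	/* never true for load == 0 */
			continue;

		detached++;			/* detach_task(p, env) */
		imbalance -= load;		/* subtracting 0: no progress */

		if (imbalance <= 0)		/* the only load-based exit */
			break;
	}

	/* prints: detached 2 tasks, imbalance still 512 */
	printf("detached %d tasks, imbalance still %ld\n", detached, imbalance);
	return 0;
}

With PREEMPT=n there is no CPU_NEWLY_IDLE break either, so nothing
limits how many tasks one idle-balance pass can strip.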

I see two simple ways to prevent the livelock.  One is simply to remove
the #ifdef:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3d57cc0..74c94dc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5937,7 +5937,6 @@ static int detach_tasks(struct lb_env *env)
 		detached++;
 		env->imbalance -= load;
 
-#ifdef CONFIG_PREEMPT
 		/*
 		 * NEWIDLE balancing is a source of latency, so preemptible
 		 * kernels will stop after the first task is detached to minimize
@@ -5945,7 +5944,6 @@ static int detach_tasks(struct lb_env *env)
 		 */
 		if (env->idle == CPU_NEWLY_IDLE)
 			break;
-#endif
 
 		/*
 		 * We only want to steal up to the prescribed amount of

Or this should work too; it makes idle_balance() report no progress
when the tasks it pulled have already been stolen back, so the CPU goes
idle instead of retrying the pick:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3d57cc0..13358cf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7487,6 +7487,8 @@ static int idle_balance(struct rq *this_rq)
 	 */
 	if (this_rq->cfs.h_nr_running && !pulled_task)
 		pulled_task = 1;
+	else if (!this_rq->cfs.h_nr_running && pulled_task)
+		pulled_task = 0;
 
 out:
 	/* Move the next balance forward */
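
For context, the caller in pick_next_task_fair() looks roughly like
this (paraphrased from 3.18, so treat it as a sketch rather than a
quote).  A positive return from idle_balance() re-runs the task pick,
which is what keeps both CPUs in this path while they steal each
other's tasks; a return of 0 sends the CPU to idle and breaks the
cycle:

idle:
	new_tasks = idle_balance(rq);

	/*
	 * idle_balance() drops and retakes rq->lock, so a higher-priority
	 * task may have appeared; restart the whole pick in that case.
	 */
	if (new_tasks < 0)
		return RETRY_TASK;

	/* with the patch above this is 0 if the pulled tasks are gone */
	if (new_tasks > 0)
		goto again;

	return NULL;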

But it seems that the real problem is that detach_tasks() can detach
all tasks when cgroups are involved, as described above?
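
(For reference, 3.18's task_h_load() is roughly the following; quoted
from memory, so treat it as a paraphrase.  Because the numerator scales
with the task's own load_avg_contrib, a task that has recently
contributed no load reports an h_load of 0, no matter how large the
group-level load that produced the imbalance estimate is.)

static unsigned long task_h_load(struct task_struct *p)
{
	struct cfs_rq *cfs_rq = task_cfs_rq(p);

	/* propagate the hierarchical load down to this cfs_rq */
	update_cfs_rq_h_load(cfs_rq);
	return div64_ul(p->se.avg.load_avg_contrib * cfs_rq->h_load,
			cfs_rq->runnable_load_avg + 1);
}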

Thanks.

/Rabin


Thread overview: 31+ messages
2015-06-30 14:30 [PATCH?] Livelock in pick_next_task_fair() / idle_balance() Rabin Vincent
2015-07-01  5:36 ` Mike Galbraith
2015-07-01 14:55   ` Rabin Vincent
2015-07-01 15:47     ` Mike Galbraith
2015-07-01 20:44     ` Peter Zijlstra
2015-07-01 23:25       ` Yuyang Du
2015-07-02  8:05         ` Mike Galbraith
2015-07-02  1:05           ` Yuyang Du
2015-07-02 10:25             ` Mike Galbraith
2015-07-02 11:40             ` Morten Rasmussen
2015-07-02 19:37               ` Yuyang Du
2015-07-03  9:34                 ` Morten Rasmussen
2015-07-03 16:38                   ` Peter Zijlstra
2015-07-05 22:31                     ` Yuyang Du
2015-07-09 14:32                       ` Morten Rasmussen
2015-07-09 23:24                         ` Yuyang Du
2015-07-05 20:12                   ` Yuyang Du
2015-07-06 17:36                     ` Dietmar Eggemann
2015-07-07 11:17                       ` Rabin Vincent
2015-07-13 17:43                         ` Dietmar Eggemann
2015-07-09 13:53                     ` Morten Rasmussen
2015-07-09 22:34                       ` Yuyang Du
2015-07-02 10:53         ` Peter Zijlstra
2015-07-02 11:44           ` Morten Rasmussen
2015-07-02 18:42             ` Yuyang Du
2015-07-03  4:42               ` Mike Galbraith
2015-07-03 16:39         ` Peter Zijlstra
2015-07-05 22:11           ` Yuyang Du
2015-07-09  6:15             ` Stefan Ekenberg
2015-07-26 18:57             ` Yuyang Du
2015-08-03 17:05             ` [tip:sched/core] sched/fair: Avoid pulling all tasks in idle balancing tip-bot for Yuyang Du
