* [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
@ 2015-06-30 14:30 Rabin Vincent
  2015-07-01  5:36 ` Mike Galbraith
  0 siblings, 1 reply; 31+ messages in thread

From: Rabin Vincent @ 2015-06-30 14:30 UTC (permalink / raw)
To: mingo, peterz; +Cc: linux-kernel

Hi,

We're seeing a livelock where two CPUs both loop with interrupts
disabled in pick_next_task_fair() / idle_balance() and continuously
fetch all tasks from each other.  This has been observed on a 3.18
kernel, but the relevant code paths appear to be the same in current
mainline.

The problem situation appears to be this:

  cpu0                            cpu1

  pick_next_task
   take rq0->lock
   pick_next_task_fair
    running=0
    idle_balance()
     drop rq0->lock
                                  wakeup A,B on cpu0
                                  pick_next_task
                                   take rq1->lock
                                   pick_next_task_fair
                                    running=0
                                    idle_balance()
                                     drop rq1->lock
                                     load_balance()
                                      busiest=rq0
                                      take rq0->lock
                                      detach A,B
                                      drop rq0->lock
                                      take rq1->lock
                                      attach A,B
                                      pulled_task = 2
                                      drop rq1->lock
     load_balance()
      busiest=rq1
      take rq1->lock
      detach A,B
      drop rq1->lock
      take rq0->lock
      attach A,B
      pulled_task = 2
      drop rq0->lock
                                    running=0
                                    idle_balance()
                                     busiest=rq0, pull A,B, etc.
    running=0
    load_balance()
     busiest=rq1, pull A,B, etc.

And this goes on, with interrupts disabled on both CPUs, for at least
100 ms (which is when a hardware watchdog hits).

The conditions needed, apart from the right timing, are:

 - cgroups.  One of the tasks, say A, needs to be in a CPU cgroup.  When
   the problem occurs, A's ->se has zero load_avg_contrib and
   task_h_load(A) is zero.  However, the se->parent->parent of A has a
   (relatively) high load_avg_contrib.  cpu0's cfs_rq therefore has a
   relatively high runnable_load_avg.  find_busiest_group() therefore
   detects imbalance, and detach_tasks() detaches all tasks.

 - PREEMPT=n.  Otherwise, the code under #ifdef in detach_tasks() would
   ensure that we'd only ever pull a maximum of one task during idle
   balancing.

 - cpu0 and cpu1 are threads on the same core (cpus_share_cache() == true);
   otherwise, cpu1 will not be able to wake up tasks on cpu0 while cpu0
   has interrupts disabled (since an IPI would be required).  Turning
   off the default TTWU_QUEUE feature would also provide the same
   effect.

I see two simple ways to prevent the livelock.  One is to just remove
the #ifdef:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3d57cc0..74c94dc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5937,7 +5937,6 @@ static int detach_tasks(struct lb_env *env)
 		detached++;
 		env->imbalance -= load;
 
-#ifdef CONFIG_PREEMPT
 		/*
 		 * NEWIDLE balancing is a source of latency, so preemptible
 		 * kernels will stop after the first task is detached to minimize
@@ -5945,7 +5944,6 @@ static int detach_tasks(struct lb_env *env)
 		 */
 		if (env->idle == CPU_NEWLY_IDLE)
 			break;
-#endif
 
 		/*
 		 * We only want to steal up to the prescribed amount of

Or this should work too:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3d57cc0..13358cf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7487,6 +7487,8 @@ static int idle_balance(struct rq *this_rq)
 	 */
 	if (this_rq->cfs.h_nr_running && !pulled_task)
 		pulled_task = 1;
+	else if (!this_rq->cfs.h_nr_running && pulled_task)
+		pulled_task = 0;
 
 out:
 	/* Move the next balance forward */

But it seems that the real problem is that detach_tasks() can remove all
tasks, when cgroups are involved as described above?

Thanks.

/Rabin

^ permalink raw reply related	[flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
From: Mike Galbraith @ 2015-07-01 5:36 UTC
To: Rabin Vincent; +Cc: mingo, peterz, linux-kernel

On Tue, 2015-06-30 at 16:30 +0200, Rabin Vincent wrote:
> Hi,
>
> We're seeing a livelock where two CPUs both loop with interrupts
> disabled in pick_next_task_fair() / idle_balance() and continuously
> fetch all tasks from each other...

Hm.  Does the below help?  Looks to me like we are over-balancing.

homer:/sys/kernel/debug/tracing # echo > trace;massive_intr 12 10 > /dev/null;cat trace|grep OINK | wc -l;tail trace
480
 massive_intr-8967  [003] d...   404.285682: load_balance: OINK - imbal: 1 load: 83 run: 1 det: 1 sload_was: 180 sload_is: 83 dload: 83
 massive_intr-8960  [001] d.s.   404.293331: load_balance: OINK - imbal: 1 load: 84 run: 1 det: 1 sload_was: 180 sload_is: 99 dload: 174
 massive_intr-8962  [002] d.s.   404.317572: load_balance: OINK - imbal: 1 load: 83 run: 1 det: 1 sload_was: 180 sload_is: 83 dload: 161
 massive_intr-8967  [003] d...   404.318296: load_balance: OINK - imbal: 1 load: 83 run: 1 det: 1 sload_was: 180 sload_is: 83 dload: 83
 massive_intr-8960  [005] d...   404.341049: load_balance: OINK - imbal: 1 load: 84 run: 1 det: 1 sload_was: 180 sload_is: 83 dload: 84
 massive_intr-8962  [002] d.s.   404.381549: load_balance: OINK - imbal: 1 load: 83 run: 1 det: 1 sload_was: 180 sload_is: 83 dload: 161
 massive_intr-8971  [005] d...   404.417148: load_balance: OINK - imbal: 1 load: 84 run: 1 det: 1 sload_was: 180 sload_is: 83 dload: 84
 massive_intr-8964  [006] d.s.   404.418536: load_balance: OINK - imbal: 1 load: 72 run: 1 det: 1 sload_was: 144 sload_is: 83 dload: 149
 massive_intr-8968  [003] d...   404.437861: load_balance: OINK - imbal: 1 load: 83 run: 1 det: 1 sload_was: 180 sload_is: 83 dload: 83
 massive_intr-8970  [001] d.s.   404.485263: load_balance: OINK - imbal: 1 load: 71 run: 1 det: 1 sload_was: 154 sload_is: 83 dload: 148

homer:/sys/kernel/debug/tracing # echo > trace;hackbench > /dev/null;cat trace|grep OINK | wc -l;tail trace
51
    hackbench-9079  [006] d...   425.982722: load_balance: OINK - imbal: 54 load: 42 run: 4 det: 4 sload_was: 130 sload_is: 62 dload: 109
    hackbench-9231  [002] d...   425.982974: load_balance: OINK - imbal: 14 load: 23 run: 1 det: 1 sload_was: 30 sload_is: 15 dload: 23
    hackbench-9328  [006] d...   425.983037: load_balance: OINK - imbal: 16 load: 72 run: 2 det: 1 sload_was: 44 sload_is: 32 dload: 72
    hackbench-9197  [002] d...   425.984416: load_balance: OINK - imbal: 62 load: 21 run: 3 det: 8 sload_was: 232 sload_is: 78 dload: 119
    hackbench-9196  [004] d...   425.984507: load_balance: OINK - imbal: 45 load: 43 run: 1 det: 1 sload_was: 44 sload_is: 22 dload: 43
    hackbench-9201  [004] d...   425.984648: load_balance: OINK - imbal: 15 load: 44 run: 1 det: 2 sload_was: 71 sload_is: 25 dload: 73
    hackbench-9235  [002] d...   425.984789: load_balance: OINK - imbal: 5 load: 32 run: 2 det: 1 sload_was: 65 sload_is: 42 dload: 54
    hackbench-9327  [000] d...   425.985424: load_balance: OINK - imbal: 1 load: 95 run: 1 det: 1 sload_was: 49 sload_is: 25 dload: 95
    hackbench-9193  [003] d...   425.988701: load_balance: OINK - imbal: 22 load: 94 run: 1 det: 1 sload_was: 128 sload_is: 66 dload: 94
    hackbench-9197  [003] d...   425.988712: load_balance: OINK - imbal: 56 load: 92 run: 1 det: 1 sload_was: 118 sload_is: 66 dload: 92

homer:/sys/kernel/debug/tracing # tail trace
           kwin-4654  [002] d...   460.627311: load_balance: kglobalaccel:4597 is non-contributor - count as 37
        konsole-4707  [005] d...   460.627639: load_balance: kded4:4594 is non-contributor - count as 40
        konsole-4707  [005] d...   460.627640: load_balance: kactivitymanage:4611 is non-contributor - count as 40
           kwin-4654  [005] d...   460.627712: load_balance: kactivitymanage:4611 is non-contributor - count as 41
kactivitymanage-4611  [005] d...   460.627726: load_balance: kded4:4594 is non-contributor - count as 40
           kwin-4654  [005] d...   460.828628: load_balance: OINK - imbal: 3 load: 618 run: 1 det: 1 sload_was: 1024 sload_is: 0 dload: 618
 plasma-desktop-4665  [000] d...   461.886746: load_balance: baloo_file:4666 is non-contributor - count as 141
systemd-journal-397   [001] d...   466.209790: load_balance: pulseaudio:4718 is non-contributor - count as 5
        systemd-1     [007] d...   466.209868: load_balance: kmix:4704 is non-contributor - count as 13
gnome-keyring-d-9455  [002] d...   466.209902: load_balance: krunner:4702 is non-contributor - count as 8

---
 kernel/sched/fair.c |   19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5897,7 +5897,7 @@ static int detach_tasks(struct lb_env *e
 {
 	struct list_head *tasks = &env->src_rq->cfs_tasks;
 	struct task_struct *p;
-	unsigned long load;
+	unsigned long load, d_load = 0, s_load = env->src_rq->load.weight;
 	int detached = 0;
 
 	lockdep_assert_held(&env->src_rq->lock);
@@ -5936,6 +5936,11 @@ static int detach_tasks(struct lb_env *e
 
 		detached++;
 		env->imbalance -= load;
+		if (!load) {
+			load = min_t(unsigned long, env->imbalance, p->se.load.weight);
+			trace_printk("%s:%d is non-contributor - count as %ld\n", p->comm, p->pid, load);
+		}
+		d_load += load;
 
 #ifdef CONFIG_PREEMPT
 		/*
@@ -5954,6 +5959,18 @@ static int detach_tasks(struct lb_env *e
 		if (env->imbalance <= 0)
 			break;
 
+		/*
+		 * We don't want to bleed busiest_rq dry either.  Weighted load
+		 * and/or imbalance may be dinky, load contribution can even be
+		 * zero, perhaps causing us to over balance had we not assigned
+		 * it above.
+		 */
+		if (env->src_rq->load.weight <= env->dst_rq->load.weight + d_load) {
+			trace_printk("OINK - imbal: %ld load: %ld run: %d det: %d sload_was: %ld sload_is: %ld dload: %ld\n",
+				      env->imbalance, load, env->src_rq->nr_running, detached, s_load, env->src_rq->load.weight, env->dst_rq->load.weight+d_load);
+			break;
+		}
+
 		continue;
 next:
 		list_move_tail(&p->se.group_node, tasks);
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
From: Rabin Vincent @ 2015-07-01 14:55 UTC
To: Mike Galbraith; +Cc: mingo, peterz, linux-kernel

On Wed, Jul 01, 2015 at 07:36:35AM +0200, Mike Galbraith wrote:
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5897,7 +5897,7 @@ static int detach_tasks(struct lb_env *e
>  {
>  	struct list_head *tasks = &env->src_rq->cfs_tasks;
>  	struct task_struct *p;
> -	unsigned long load;
> +	unsigned long load, d_load = 0, s_load = env->src_rq->load.weight;
>  	int detached = 0;
>  
>  	lockdep_assert_held(&env->src_rq->lock);
> @@ -5936,6 +5936,11 @@ static int detach_tasks(struct lb_env *e
>  
>  		detached++;
>  		env->imbalance -= load;
> +		if (!load) {
> +			load = min_t(unsigned long, env->imbalance, p->se.load.weight);
> +			trace_printk("%s:%d is non-contributor - count as %ld\n", p->comm, p->pid, load);
> +		}
> +		d_load += load;
>  
>  #ifdef CONFIG_PREEMPT
>  		/*
> @@ -5954,6 +5959,18 @@ static int detach_tasks(struct lb_env *e
>  		if (env->imbalance <= 0)
>  			break;
>  
> +		/*
> +		 * We don't want to bleed busiest_rq dry either.  Weighted load
> +		 * and/or imbalance may be dinky, load contribution can even be
> +		 * zero, perhaps causing us to over balance had we not assigned
> +		 * it above.
> +		 */
> +		if (env->src_rq->load.weight <= env->dst_rq->load.weight + d_load) {
> +			trace_printk("OINK - imbal: %ld load: %ld run: %d det: %d sload_was: %ld sload_is: %ld dload: %ld\n",
> +				      env->imbalance, load, env->src_rq->nr_running, detached, s_load, env->src_rq->load.weight, env->dst_rq->load.weight+d_load);
> +			break;
> +		}
> +
>  		continue;
> next:
>  		list_move_tail(&p->se.group_node, tasks);

I've tried to analyse how your patch would affect the situation in one
of the crash dumps which I have of the problem.

In this dump, cpu0 is in the middle of dequeuing all tasks from cpu1.
rcu_sched has already been detached and there are two tasks left, one of
which is being processed by dequeue_entity_load_avg(), called from
dequeue_task(), at the time the watchdog hits.

lb_env is the following.  imbalance is, as you can see, 60.

crash> struct lb_env 8054fd50
struct lb_env {
  sd = 0x8fc13e00,
  src_rq = 0x81297200,
  src_cpu = 1,
  dst_cpu = 0,
  dst_rq = 0x8128e200,
  dst_grpmask = 0x0,
  new_dst_cpu = 0,
  idle = CPU_NEWLY_IDLE,
  imbalance = 60,
  cpus = 0x8128d238,
  flags = 0,
  loop = 2,
  loop_break = 32,
  loop_max = 3,
  fbq_type = all,
  tasks = {
    next = 0x8fc4c6ec,
    prev = 0x8fc4c6ec
  }
}

Weights of the runqueues:

crash> struct rq.load.weight runqueues:0,1
[0]: 8128e200
  load.weight = 0,
[1]: 81297200
  load.weight = 1935,

The only running tasks on the system are these three:

crash> foreach RU ps
   PID    PPID  CPU       TASK       ST  %MEM     VSZ    RSS  COMM
>    0       0   0   8056d8b0  RU   0.0       0      0  [swapper/0]
>    0       0   1   8fc5cd18  RU   0.0       0      0  [swapper/1]
>    0       0   2   8fc5c6b0  RU   0.0       0      0  [swapper/2]
>    0       0   3   8fc5c048  RU   0.0       0      0  [swapper/3]
     7       2   0   8fc4c690  RU   0.0       0      0  [rcu_sched]
    30       2   1   8fd26108  RU   0.0       0      0  [kswapd0]
   413       1   1   8edda408  RU   0.6    1900    416  rngd

And the load.weight and load_avg_contribs for them and their parent SEs:

crash> foreach 7 30 413 load
PID: 7      TASK: 8fc4c690  CPU: 0   COMMAND: "rcu_sched"
task_h_load(): 325 [ = (load_avg_contrib {   5} * cfs_rq->h_load {  65}) / (cfs_rq->runnable_load_avg {   0} + 1) ]
SE: 8fc4c6d8  load_avg_contrib:    5  load.weight: 1024  PARENT: 00000000  GROUPNAME: (null)

PID: 30     TASK: 8fd26108  CPU: 1   COMMAND: "kswapd0"
task_h_load(): 10 [ = (load_avg_contrib {  10} * cfs_rq->h_load { 133}) / (cfs_rq->runnable_load_avg { 128} + 1) ]
SE: 8fd26150  load_avg_contrib:   10  load.weight: 1024  PARENT: 00000000  GROUPNAME: (null)

PID: 413    TASK: 8edda408  CPU: 1   COMMAND: "rngd"
task_h_load(): 0 [ = (load_avg_contrib {   0} * cfs_rq->h_load {   0}) / (cfs_rq->runnable_load_avg {   0} + 1) ]
SE: 8edda450  load_avg_contrib:    0  load.weight: 1024  PARENT: 8fffbd00  GROUPNAME: (null)
SE: 8fffbd00  load_avg_contrib:    0  load.weight:    2  PARENT: 8f531f80  GROUPNAME: rngd@hwrng.service
SE: 8f531f80  load_avg_contrib:    0  load.weight: 1024  PARENT: 8f456e00  GROUPNAME: system-rngd.slice
SE: 8f456e00  load_avg_contrib:  118  load.weight:  911  PARENT: 00000000  GROUPNAME: system.slice

Given the above, we can see that with your patch:

 - dst_rq->load.weight is 0 and will not change in this loop.

 - src_rq->load.weight was 1935 + 1024 before the loop.  It will go down
   to 1935 (already has), 1024, and then 0.

 - d_load will be 325*, 335, and then 395.

   (* - probably not exactly, since rcu_sched has already had
   set_task_rq() called [cfs_rq switched] on it; I guess it's actually
   going to be much lower based on the other dumps I see where rcu_sched
   hasn't been switched yet.)

So we will not hit the "if (env->src_rq->load.weight <=
env->dst_rq->load.weight + d_load)" condition to break out of the loop
until we actually move all tasks.  So the patch will not have any effect
on this case.  Or am I missing something?

We'll set up a test anyway with the patch; the problem usually takes a
couple of days to reproduce.

/Rabin
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
From: Mike Galbraith @ 2015-07-01 15:47 UTC
To: Rabin Vincent; +Cc: mingo, peterz, linux-kernel

On Wed, 2015-07-01 at 16:55 +0200, Rabin Vincent wrote:
> So we will not hit the "if (env->src_rq->load.weight <=
> env->dst_rq->load.weight + d_load)" condition to break out of the loop
> until we actually move all tasks.  So the patch will not have any
> effect on this case.  Or am I missing something?

Probably not.  I did have it breaking if dst_rq would pass
src_rq->nr_running, which would certainly stop it, but thought I'd try
letting it watch weights.  Either way, task_h_load(p) returning 0 is
not very wonderful.

	-Mike
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
From: Peter Zijlstra @ 2015-07-01 20:44 UTC
To: Rabin Vincent
Cc: Mike Galbraith, mingo, peterz, linux-kernel, Paul Turner,
    Ben Segall, yuyang.du, Morten Rasmussen

On Wed, Jul 01, 2015 at 04:55:51PM +0200, Rabin Vincent wrote:
> PID: 413    TASK: 8edda408  CPU: 1   COMMAND: "rngd"
> task_h_load(): 0 [ = (load_avg_contrib {   0} * cfs_rq->h_load {   0}) / (cfs_rq->runnable_load_avg {   0} + 1) ]
> SE: 8edda450  load_avg_contrib:    0  load.weight: 1024  PARENT: 8fffbd00  GROUPNAME: (null)
> SE: 8fffbd00  load_avg_contrib:    0  load.weight:    2  PARENT: 8f531f80  GROUPNAME: rngd@hwrng.service
> SE: 8f531f80  load_avg_contrib:    0  load.weight: 1024  PARENT: 8f456e00  GROUPNAME: system-rngd.slice
> SE: 8f456e00  load_avg_contrib:  118  load.weight:  911  PARENT: 00000000  GROUPNAME: system.slice

So there's two problems there... the first we can (and should) fix, the
second I'm not sure there's anything we can do about.

Firstly, a group (parent) load_avg_contrib should never be less than
that of its constituent parts; therefore the top 3 SEs should have at
least 118 too.

Now it's been a while since I looked at the per entity load tracking
stuff, so some of the details have left me, but while it looks like we
add the se->avg.load_avg_contrib to its cfs->runnable_load, we do not
propagate that into the corresponding (group) se.

This means the se->avg.load_avg_contrib is accounted per cpu without
migration benefits.  So if our task just got migrated onto a cpu that
hasn't run the group in a while, the group will not have accumulated
runtime.

A quick fix would be something like the below, although I think we want
to do something else, like maybe propagate the load_avg_contrib up the
hierarchy etc..  But I need to think more about that.

The second problem is that your second SE has a weight of 2, which
gives task_h_load() a factor of 1/512, which will flatten pretty much
anything down to small.  This is per configuration, so there's really
not something we can or should do about that.

Untested, uncompiled hackery following, mostly for discussion.

---
 kernel/sched/fair.c  | 6 ++++--
 kernel/sched/sched.h | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3d57cc0ca0a6..95d0ba249c8b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6082,7 +6082,7 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
 	struct rq *rq = rq_of(cfs_rq);
 	struct sched_entity *se = cfs_rq->tg->se[cpu_of(rq)];
 	unsigned long now = jiffies;
-	unsigned long load;
+	unsigned long load, load_avg_contrib = 0;
 
 	if (cfs_rq->last_h_load_update == now)
 		return;
@@ -6090,6 +6090,8 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
 	cfs_rq->h_load_next = NULL;
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
+		cfs_rq->h_load_avg_contrib = load_avg_contrib =
+			max(load_avg_contrib, se->avg.load_avg_contrib);
 		cfs_rq->h_load_next = se;
 		if (cfs_rq->last_h_load_update == now)
 			break;
@@ -6102,7 +6104,7 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
 
 	while ((se = cfs_rq->h_load_next) != NULL) {
 		load = cfs_rq->h_load;
-		load = div64_ul(load * se->avg.load_avg_contrib,
+		load = div64_ul(load * cfs_rq->h_load_avg_contrib,
 				cfs_rq->runnable_load_avg + 1);
 		cfs_rq = group_cfs_rq(se);
 		cfs_rq->h_load = load;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 885889190a1f..7738e3b301b7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -394,6 +394,7 @@ struct cfs_rq {
 	 * this group.
 	 */
 	unsigned long h_load;
+	unsigned long h_load_avg_contrib;
 	u64 last_h_load_update;
 	struct sched_entity *h_load_next;
 #endif /* CONFIG_FAIR_GROUP_SCHED */
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
From: Yuyang Du @ 2015-07-01 23:25 UTC
To: Peter Zijlstra
Cc: Rabin Vincent, Mike Galbraith, mingo, linux-kernel, Paul Turner,
    Ben Segall, Morten Rasmussen

Hello,

We have two basic kinds of load balancing: idle balancing and periodic
balancing.  It takes a tick for periodic balancing to happen again, so
the livelock could not come from it.  On the contrary, idle balancing
occurs as needed at arbitrary times, so the livelock could happen
there.  And obviously, the idle balancing livelock SHOULD happen: one
CPU pulls tasks from the other, makes the other idle, and this
iterates...

That being said, it is also obvious how to prevent the livelock from
happening: do idle pulling only until the source rq's nr_running is 1,
because otherwise we just avoid idleness by creating another idleness.
Hopefully the patch at the end should work.

On Wed, Jul 01, 2015 at 10:44:04PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 01, 2015 at 04:55:51PM +0200, Rabin Vincent wrote:
> > PID: 413    TASK: 8edda408  CPU: 1   COMMAND: "rngd"
> > task_h_load(): 0 [ = (load_avg_contrib {   0} * cfs_rq->h_load {   0}) / (cfs_rq->runnable_load_avg {   0} + 1) ]
> > SE: 8edda450  load_avg_contrib:    0  load.weight: 1024  PARENT: 8fffbd00  GROUPNAME: (null)
> > SE: 8fffbd00  load_avg_contrib:    0  load.weight:    2  PARENT: 8f531f80  GROUPNAME: rngd@hwrng.service
> > SE: 8f531f80  load_avg_contrib:    0  load.weight: 1024  PARENT: 8f456e00  GROUPNAME: system-rngd.slice
> > SE: 8f456e00  load_avg_contrib:  118  load.weight:  911  PARENT: 00000000  GROUPNAME: system.slice
>
> Firstly, a group (parent) load_avg_contrib should never be less than
> that of its constituent parts, therefore the top 3 SEs should have at
> least 118 too.

I think in that listing the downward direction is toward the parent, so
in this case "parent is bigger than child" is ok.

But if the child is the only contributor to the parent's load (which is
probably the case here), then the load_avg_contrib should not jump
suddenly from 0 to 118.  So this might be due to
__update_group_entity_contrib().  I did not look into the details of
that complex function, but a glimpse suggests it is at least not
consistent, if not arbitrary, so it likely will not satisfy "parent is
bigger than child".

But this patch series should be very consistent in computing all SEs'
load averages, and thus satisfy the universal truth:

http://comments.gmane.org/gmane.linux.kernel/1981970

> Now its been a while since I looked at the per entity load tracking
> stuff so some of the details have left me, but while it looks like we
> add the se->avg.load_avg_contrib to its cfs->runnable_load, we do not
> propagate that into the corresponding (group) se.
>
> This means the se->avg.load_avg_contrib is accounted per cpu without
> migration benefits.  So if our task just got migrated onto a cpu that
> hasn't ran the group in a while, the group will not have accumulated
> runtime.

Yes, this is true; the migration has nothing to do with either the
source or the destination group SEs.  Down this road, if we want the
addition and subtraction after migration, we need to do it bottom-up
along the SE hierarchy, and do it without rq->lock (?).  Need to think
about it for a while.

Thanks,
Yuyang

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40a7fcb..f7cc1ef 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5898,6 +5898,10 @@ static int detach_tasks(struct lb_env *env)
 		return 0;
 
 	while (!list_empty(tasks)) {
+
+		if (env->idle == CPU_NEWLY_IDLE && env->src_rq->nr_running <= 1)
+			break;
+
 		p = list_first_entry(tasks, struct task_struct, se.group_node);
 
 		env->loop++;
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
From: Mike Galbraith @ 2015-07-02 8:05 UTC
To: Yuyang Du
Cc: Peter Zijlstra, Rabin Vincent, mingo, linux-kernel, Paul Turner,
    Ben Segall, Morten Rasmussen

On Thu, 2015-07-02 at 07:25 +0800, Yuyang Du wrote:

> That being said, it is also obvious how to prevent the livelock from
> happening: do idle pulling only until the source rq's nr_running is 1,
> because otherwise we just avoid idleness by creating another idleness.

Yeah, but that's just the symptom, not the disease.  Better for the
idle balance symptom may actually be to only pull one task when idle
balancing.  After all, the immediate goal is to find something better
to do than idle, not to achieve continual perfect (is the enemy of
good) balance.

	-Mike
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
From: Yuyang Du @ 2015-07-02 1:05 UTC
To: Mike Galbraith
Cc: Peter Zijlstra, Rabin Vincent, mingo, linux-kernel, Paul Turner,
    Ben Segall, Morten Rasmussen

Hi Mike,

On Thu, Jul 02, 2015 at 10:05:47AM +0200, Mike Galbraith wrote:
> On Thu, 2015-07-02 at 07:25 +0800, Yuyang Du wrote:
>
> > That being said, it is also obvious how to prevent the livelock from
> > happening: do idle pulling only until the source rq's nr_running is 1,
> > because otherwise we just avoid idleness by creating another idleness.
>
> Yeah, but that's just the symptom, not the disease.  Better for the idle
> balance symptom may actually be to only pull one task when idle balancing.
> After all, the immediate goal is to find something better to do than
> idle, not to achieve continual perfect (is the enemy of good) balance.

Symptom? :)

You mean "pull one and stop, can't be greedy"?  Right, but we still need
to ensure we don't make another idle CPU (meaning pull only until
nr_running == 1), which is the cure to the disease.

I am ok with at most "pull one", but we should probably stick to
load_balance() pulling a fair amount, assuming load_balance() magically
computes the right imbalance; otherwise you may have to do multiple
"pull one"s.

Thanks,
Yuyang
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
From: Mike Galbraith @ 2015-07-02 10:25 UTC
To: Yuyang Du
Cc: Peter Zijlstra, Rabin Vincent, mingo, linux-kernel, Paul Turner,
    Ben Segall, Morten Rasmussen

On Thu, 2015-07-02 at 09:05 +0800, Yuyang Du wrote:
>
> Symptom? :)

Yeah, my fingers have a collection of those :)

	-Mike
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
From: Morten Rasmussen @ 2015-07-02 11:40 UTC
To: Yuyang Du
Cc: Mike Galbraith, Peter Zijlstra, Rabin Vincent, mingo, linux-kernel,
    Paul Turner, Ben Segall

On Thu, Jul 02, 2015 at 09:05:39AM +0800, Yuyang Du wrote:
> On Thu, Jul 02, 2015 at 10:05:47AM +0200, Mike Galbraith wrote:
> > Yeah, but that's just the symptom, not the disease.  Better for the idle
> > balance symptom may actually be to only pull one task when idle balancing.
> > After all, the immediate goal is to find something better to do than
> > idle, not to achieve continual perfect (is the enemy of good) balance.
>
> You mean "pull one and stop, can't be greedy"?  Right, but we still need
> to ensure we don't make another idle CPU (meaning pull only until
> nr_running == 1), which is the cure to the disease.
>
> I am ok with at most "pull one", but we should probably stick to
> load_balance() pulling a fair amount, assuming load_balance() magically
> computes the right imbalance; otherwise you may have to do multiple
> "pull one"s.

Talking about the disease and looking at the debug data that Rabin has
provided, I think the problem is due to the way blocked load is handled
(or not handled) in calculate_imbalance().

We have three entities in the root cfs_rq on cpu1:

 1. Task entity pid 7, load_avg_contrib = 5.

 2. Task entity pid 30, load_avg_contrib = 10.

 3. Group entity, load_avg_contrib = 118, but it contains task entity
    pid 413 further down the hierarchy with task_h_load() = 0.  The 118
    comes from the blocked load contribution in the system.slice task
    group.

calculate_imbalance() figures out the average loads are:

  cpu0:   load/capacity = 0*1024/1024 = 0
  cpu1:   load/capacity = (5 + 10 + 118)*1024/1024 = 133
  domain: load/capacity = (0 + 133)*1024/(2*1024) = 62

  env->imbalance = 62

Rabin reported env->imbalance = 60 after pulling the rcu task with
load_avg_contrib = 5.  It doesn't match my numbers exactly, but it's
pretty close ;-)

detach_tasks() will attempt to pull 62 based on the tasks'
task_h_load(), but the task_h_load() sum is only 5 + 10 + 0, and hence
detach_tasks() will empty the src_rq.

IOW, since task groups include blocked load in load_avg_contrib (see
__update_group_entity_contrib() and __update_cfs_rq_tg_load_contrib()),
the imbalance includes blocked load and hence env->imbalance >=
sum(task_h_load(p)) for all tasks p on the rq.  This leads to
detach_tasks() emptying the rq completely in the reported scenario,
where blocked load > runnable load.

Whether emptying the src_rq is the right thing to do depends on your
point of view.  Does balanced load (runnable+blocked) take priority
over keeping cpus busy or not?  For idle_balance() it seems intuitively
correct not to empty the rq, and hence you could consider
env->imbalance to be too big.

I think we will see more of this kind of problem if we include
weighted_cpuload() as well.  Parts of the imbalance calculation code
are quite old and could use some attention first.

A short term fix could be what Yuyang proposes: stop pulling tasks when
there is only one left in detach_tasks().  It won't affect active load
balance, where we may want to migrate the last task, as active load
balance doesn't use detach_tasks().

Morten
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
From: Yuyang Du @ 2015-07-02 19:37 UTC
To: Morten Rasmussen
Cc: Mike Galbraith, Peter Zijlstra, Rabin Vincent, mingo, linux-kernel,
    Paul Turner, Ben Segall

Hi Morten,

On Thu, Jul 02, 2015 at 12:40:32PM +0100, Morten Rasmussen wrote:
> detach_tasks() will attempt to pull 62 based on the tasks'
> task_h_load(), but the task_h_load() sum is only 5 + 10 + 0, and hence
> detach_tasks() will empty the src_rq.
>
> IOW, since task groups include blocked load in load_avg_contrib (see
> __update_group_entity_contrib() and __update_cfs_rq_tg_load_contrib()),
> the imbalance includes blocked load and hence env->imbalance >=
> sum(task_h_load(p)) for all tasks p on the rq.  This leads to
> detach_tasks() emptying the rq completely in the reported scenario,
> where blocked load > runnable load.

Whenever I want to know the load avg concerning task groups, I need to
walk through the complete code again; I prefer not to do it this time.
But it should not be that simple to say "the 118 comes from the blocked
load".

Anyway, with blocked load, yes, we definitely can't move (or even find)
some amount of the imbalance if we only look at the tasks on the queue.
But this may or may not be a problem.

Firstly, the question comes down to whether we want blocked load
anywhere.  This is just a "now vs. average" question.

Secondly, if we stick to the average, we just need to treat the blocked
load consistently: not such that a group SE has it but a task SE does
not, or some places have it and others do not.

Thanks,
Yuyang

> Whether emptying the src_rq is the right thing to do depends on your
> point of view.  Does balanced load (runnable+blocked) take priority
> over keeping cpus busy or not?  For idle_balance() it seems intuitively
> correct not to empty the rq, and hence you could consider
> env->imbalance to be too big.
>
> I think we will see more of this kind of problem if we include
> weighted_cpuload() as well.  Parts of the imbalance calculation code
> are quite old and could use some attention first.
>
> A short term fix could be what Yuyang proposes: stop pulling tasks when
> there is only one left in detach_tasks().  It won't affect active load
> balance, where we may want to migrate the last task, as active load
> balance doesn't use detach_tasks().
>
> Morten
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-02 19:37 ` Yuyang Du @ 2015-07-03 9:34 ` Morten Rasmussen 2015-07-03 16:38 ` Peter Zijlstra 2015-07-05 20:12 ` Yuyang Du 0 siblings, 2 replies; 31+ messages in thread From: Morten Rasmussen @ 2015-07-03 9:34 UTC (permalink / raw) To: Yuyang Du Cc: Mike Galbraith, Peter Zijlstra, Rabin Vincent, mingo, linux-kernel, Paul Turner, Ben Segall On Fri, Jul 03, 2015 at 03:37:02AM +0800, Yuyang Du wrote: > Hi Morten, > > On Thu, Jul 02, 2015 at 12:40:32PM +0100, Morten Rasmussen wrote: > > detach_tasks() will attempts to pull 62 based on tasks task_h_load() but > > the task_h_load() sum is only 5 + 10 + 0 and hence detach_tasks() will > > empty the src_rq. > > > > IOW, since task groups include blocked load in the load_avg_contrib (see > > __update_group_entity_contrib() and __update_cfs_rq_tg_load_contrib()) the > > imbalance includes blocked load and hence env->imbalance >= > > sum(task_h_load(p)) for all tasks p on the rq. Which leads to > > detach_tasks() emptying the rq completely in the reported scenario where > > blocked load > runnable load. > > Whenever I want to know the load avg concerning task group, I need to > walk through the complete codes again, I prefer not to do it this time. > But it should not be that simply to say "the 118 comes from the blocked load". But the whole hierarchy of group entities is updated each time we enqueue or dequeue a task. I don't see how the group entity load_avg_contrib is not up to date? Why do you need to update it again? In any case, we have one task in the group hierarchy which has a load_avg_contrib of 0 and the grand-grand parent group entity has a load_avg_contrib of 118 and no additional tasks. That load contribution must be from tasks which are no longer around on the rq? No? > Anyway, with blocked load, yes, we definitely can't move (or even find) some > ammount of the imbalance if we only look at the tasks on the queue. 
But this > may or may not be a problem. > > Firstly, the question comes to whether we want blocked load anywhere. > This is just about a "now vs. average" question. That is what I meant in the paragraph below. It is a scheduling policy question. > Secondly, if we stick to average, we just need to treat the blocked load > consistently, not that group SE has it, but task SE does not, or somewhere > has it, others not. I agree that inconsistent use of blocked load will lead us into trouble. The problem is that none of the load-balance logic was designed for blocked load. It was written to deal with load that is currently on the rq and slightly biased by average cpu load, not dealing with load contribution of tasks which we can't migrate at the moment because they are blocked. The load-balance code has to be updated to deal with blocked load. We will run into all sorts of issues if we don't and roll out use of blocked load everywhere. However, before we can rework the load-balance code, we have to agree on the now vs average balance policy. Your proposed patch implements a policy somewhere in between. We try to balance based on average, but we don't allow idle_balance() to empty a cpu completely. A pure average balance policy would allow a cpu to go idle even if we do have tasks waiting on other rqs, if the blocked load indicates that other tasks will show up shortly on the cpu. A pure "now" balance would balance based on runnable_load_avg for all entities including groups, ignoring all blocked load, but that goes against the PELT group balancing design. I'm not against having a policy that sits somewhere in between, we just have to agree it is the right policy and clean up the load-balance code such that the implemented policy is clear. Morten > > Thanks, > Yuyang > > > Whether emptying the src_rq is the right thing to do depends on on your > > point of view. Does balanced load (runnable+blocked) take priority over > > keeping cpus busy or not? 
For idle_balance() it seems intuitively > > correct to not empty the rq and hence you could consider env->imbalance > > to be too big. > > > > I think we will see more of this kind of problems if we include > > weighted_cpuload() as well. Parts of the imbalance calculation code is > > quite old and could use some attention first. > > > > A short term fix could be what Yuyang propose, stop pulling tasks when > > there is only one left in detach_tasks(). It won't affect active load > > balance where we may want to migrate the last task as it active load > > balance doesn't use detach_tasks(). > > > > Morten > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-03 9:34 ` Morten Rasmussen @ 2015-07-03 16:38 ` Peter Zijlstra 2015-07-05 22:31 ` Yuyang Du 2015-07-05 20:12 ` Yuyang Du 1 sibling, 1 reply; 31+ messages in thread From: Peter Zijlstra @ 2015-07-03 16:38 UTC (permalink / raw) To: Morten Rasmussen Cc: Yuyang Du, Mike Galbraith, Rabin Vincent, mingo, linux-kernel, Paul Turner, Ben Segall On Fri, Jul 03, 2015 at 10:34:41AM +0100, Morten Rasmussen wrote: > > > IOW, since task groups include blocked load in the load_avg_contrib (see > > > __update_group_entity_contrib() and __update_cfs_rq_tg_load_contrib()) the > > > imbalance includes blocked load and hence env->imbalance >= > > > sum(task_h_load(p)) for all tasks p on the rq. Which leads to > > > detach_tasks() emptying the rq completely in the reported scenario where > > > blocked load > runnable load. So IIRC we need the blocked load for groups for computing the per-cpu slices of the total weight, avg works really well for that. > I'm not against having a policy that sits somewhere in between, we just > have to agree it is the right policy and clean up the load-balance code > such that the implemented policy is clear. Right, for balancing it's a tricky question, but mixing them without intent is, as you say, a bit of a mess. So clearly blocked load doesn't make sense for (new)idle balancing. OTOH it does make some sense for the regular periodic balancing, because there we really do care mostly about the averages, esp. so when we're overloaded -- but there are issues there too. Now we can't track them both (or rather we could, but overhead). I like Yuyang's load tracking rewrite, but it changes exactly this part, and I'm not sure I understand the full ramifications of that yet. One way out would be to split the load balancer into 3 distinct regions; 1) get a task on every CPU, screw everything else. 
2) get each CPU fully utilized, still ignoring 'load' 3) when everybody is fully utilized, consider load. If we make find_busiest_foo() select one of these 3, and make calculate_imbalance() invariant to the metric passed in, and have things like cpu_load() and task_load() return different, but coherent, numbers depending on which region we're in, this almost sounds 'simple'. The devil is in the details, and the balancer is a hairy nest of details which will make the above non-trivial. But for 1) we could simply 'balance' on nr_running, for 2) we can 'balance' on runnable_avg and for 3) we'll 'balance' on load_avg (which will then include blocked load). Let me go play outside for a bit so that it can sink in what kind of nonsense my heat addled brain has just sprouted :-) ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-03 16:38 ` Peter Zijlstra @ 2015-07-05 22:31 ` Yuyang Du 2015-07-09 14:32 ` Morten Rasmussen 0 siblings, 1 reply; 31+ messages in thread From: Yuyang Du @ 2015-07-05 22:31 UTC (permalink / raw) To: Peter Zijlstra Cc: Morten Rasmussen, Mike Galbraith, Rabin Vincent, mingo, linux-kernel, Paul Turner, Ben Segall On Fri, Jul 03, 2015 at 06:38:31PM +0200, Peter Zijlstra wrote: > > I'm not against having a policy that sits somewhere in between, we just > > have to agree it is the right policy and clean up the load-balance code > > such that the implemented policy is clear. > > Right, for balancing its a tricky question, but mixing them without > intent is, as you say, a bit of a mess. > > So clearly blocked load doesn't make sense for (new)idle balancing. OTOH > it does make some sense for the regular periodic balancing, because > there we really do care mostly about the averages, esp. so when we're > overloaded -- but there are issues there too. > > Now we can't track them both (or rather we could, but overhead). > > I like Yuyang's load tracking rewrite, but it changes exactly this part, > and I'm not sure I understand the full ramifications of that yet. Thanks. It would be a pure average policy, which is non-perfect like now, and certainly needs a mixing like now, but it is worth a starter, because it is simple and reasonable, and based on it, the other parts can be simple and reasonable. > One way out would be to split the load balancer into 3 distinct regions; > > 1) get a task on every CPU, screw everything else. > 2) get each CPU fully utilized, still ignoring 'load' > 3) when everybody is fully utilized, consider load. 
> > If we make find_busiest_foo() select one of these 3, and make > calculate_imbalance() invariant to the metric passed in, and have things > like cpu_load() and task_load() return different, but coherent, numbers > depending on which region we're in, this almost sounds 'simple'. > > The devil is in the details, and the balancer is a hairy nest of details > which will make the above non-trivial. > > But for 1) we could simply 'balance' on nr_running, for 2) we can > 'balance' on runnable_avg and for 3) we'll 'balance' on load_avg (which > will then include blocked load). > > Let me go play outside for a bit so that it can sink in what kind of > nonsense my heat addled brain has just sprouted :-) ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-05 22:31 ` Yuyang Du @ 2015-07-09 14:32 ` Morten Rasmussen 2015-07-09 23:24 ` Yuyang Du 0 siblings, 1 reply; 31+ messages in thread From: Morten Rasmussen @ 2015-07-09 14:32 UTC (permalink / raw) To: Yuyang Du Cc: Peter Zijlstra, Mike Galbraith, Rabin Vincent, mingo, linux-kernel, Paul Turner, Ben Segall On Mon, Jul 06, 2015 at 06:31:44AM +0800, Yuyang Du wrote: > On Fri, Jul 03, 2015 at 06:38:31PM +0200, Peter Zijlstra wrote: > > > I'm not against having a policy that sits somewhere in between, we just > > > have to agree it is the right policy and clean up the load-balance code > > > such that the implemented policy is clear. > > > > Right, for balancing its a tricky question, but mixing them without > > intent is, as you say, a bit of a mess. > > > > So clearly blocked load doesn't make sense for (new)idle balancing. OTOH > > it does make some sense for the regular periodic balancing, because > > there we really do care mostly about the averages, esp. so when we're > > overloaded -- but there are issues there too. > > > > Now we can't track them both (or rather we could, but overhead). > > > > I like Yuyang's load tracking rewrite, but it changes exactly this part, > > and I'm not sure I understand the full ramifications of that yet. I don't think anybody does ;-) But I think we should try to make it work. > Thanks. It would be a pure average policy, which is non-perfect like now, > and certainly needs a mixing like now, but it is worth a starter, because > it is simple and reasaonble, and based on it, the other parts can be simple > and reasonable. I think we all agree on the benefits of taking blocked load into account but also that there are some policy questions to be addressed. > > One way out would be to split the load balancer into 3 distinct regions; > > > > 1) get a task on every CPU, screw everything else. 
> > 2) get each CPU fully utilized, still ignoring 'load' > > 3) when everybody is fully utilized, consider load. Seems very reasonable to me. We more or less follow that idea in the energy-model driven scheduling patches, at least 2) and 3). The difficult bit is detecting when to transition between 2) and 3). If you want to enforce smp_nice you have to start worrying about task priority as soon as one cpu is fully utilized. For example, a fully utilized cpu has two high priority tasks while all other cpus are running low priority tasks and are not fully utilized. The utilization imbalance may be too small to cause any tasks to be migrated, so we end up giving fewer cycles to the high priority tasks. > > If we make find_busiest_foo() select one of these 3, and make > > calculate_imbalance() invariant to the metric passed in, and have things > > like cpu_load() and task_load() return different, but coherent, numbers > > depending on which region we're in, this almost sounds 'simple'. > > > > The devil is in the details, and the balancer is a hairy nest of details > > which will make the above non-trivial. Yes, but if we have an overall policy like the one you propose we can at least make it complicated and claim that we think we know what it is supposed to do ;-) I agree that there is some work to be done in find_busiest_*() and calculate_imbalance() + friends. Maybe step one should be to clean them up a bit. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-09 14:32 ` Morten Rasmussen @ 2015-07-09 23:24 ` Yuyang Du 0 siblings, 0 replies; 31+ messages in thread From: Yuyang Du @ 2015-07-09 23:24 UTC (permalink / raw) To: Morten Rasmussen Cc: Peter Zijlstra, Mike Galbraith, Rabin Vincent, mingo, linux-kernel, Paul Turner, Ben Segall Hi, On Thu, Jul 09, 2015 at 03:32:20PM +0100, Morten Rasmussen wrote: > On Mon, Jul 06, 2015 at 06:31:44AM +0800, Yuyang Du wrote: > > On Fri, Jul 03, 2015 at 06:38:31PM +0200, Peter Zijlstra wrote: > > > > I'm not against having a policy that sits somewhere in between, we just > > > > have to agree it is the right policy and clean up the load-balance code > > > > such that the implemented policy is clear. > > > > > > Right, for balancing its a tricky question, but mixing them without > > > intent is, as you say, a bit of a mess. > > > > > > So clearly blocked load doesn't make sense for (new)idle balancing. OTOH > > > it does make some sense for the regular periodic balancing, because > > > there we really do care mostly about the averages, esp. so when we're > > > overloaded -- but there are issues there too. > > > > > > Now we can't track them both (or rather we could, but overhead). > > > > > > I like Yuyang's load tracking rewrite, but it changes exactly this part, > > > and I'm not sure I understand the full ramifications of that yet. > > I don't think anybody does ;-) But I think we should try to make it > work. > > > Thanks. It would be a pure average policy, which is non-perfect like now, > > and certainly needs a mixing like now, but it is worth a starter, because > > it is simple and reasaonble, and based on it, the other parts can be simple > > and reasonable. > > I think we all agree on the benefits of taking blocked load into > account but also that there are some policy questions to be addressed. 
> > > > One way out would be to split the load balancer into 3 distinct regions; > > > > > > 1) get a task on every CPU, screw everything else. > > > 2) get each CPU fully utilized, still ignoring 'load' > > > 3) when everybody is fully utilized, consider load. > > Seems very reasonable to me. We more or less follow that idea in the > energy-model driven scheduling patches, at least 2) and 3). > > The difficult bit is detecting when to transition between 2) and 3). If > you want to enforce smp_nice you have to start worrying about task > priority as soon as one cpu is fully utilized. > > For example, a fully utilized cpu has two high priority tasks while all > other cpus are running low priority tasks and are not fully utilized. > The utilization imbalance may be too small to cause any tasks to be > migrated, so we end up giving fewer cycles to the high priority tasks. > > > If we make find_busiest_foo() select one of these 3, and make > > > calculate_imbalance() invariant to the metric passed in, and have things > > > like cpu_load() and task_load() return different, but coherent, numbers > > > depending on which region we're in, this almost sounds 'simple'. > > > > > > The devil is in the details, and the balancer is a hairy nest of details > > > which will make the above non-trivial. > > Yes, but if we have an overall policy like the one you propose we can at > least make it complicated and claim that we think we know what it is > supposed to do ;-) > > I agree that there is some work to be done in find_busiest_*() and > calcuate_imbalance() + friends. Maybe step one should be to clean them > up a bit. The consensus looks to be that we move step-by-step and start working right now: 1) Based on the "Rewrite" patch, let me add cfs_rq->runnable_load_avg. Then we will have everything up to date: load.weight, runnable_load_avg, and load_avg (including runnable + blocked), from pure now to pure average. The runnable_load_avg will be used the same as now. 
So we will not have a shred of ramification. As long as the code is made clearer and simpler, it is a win. 2) Let's clean up the load balancing code a bit, and if needed, make changes to the obvious things; otherwise leave it unchanged. 3) Polish/complicate the policies, :) What do you think? Thanks, Yuyang ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-03 9:34 ` Morten Rasmussen 2015-07-03 16:38 ` Peter Zijlstra @ 2015-07-05 20:12 ` Yuyang Du 2015-07-06 17:36 ` Dietmar Eggemann 2015-07-09 13:53 ` Morten Rasmussen 1 sibling, 2 replies; 31+ messages in thread From: Yuyang Du @ 2015-07-05 20:12 UTC (permalink / raw) To: Morten Rasmussen Cc: Mike Galbraith, Peter Zijlstra, Rabin Vincent, mingo, linux-kernel, Paul Turner, Ben Segall Hi Morten, On Fri, Jul 03, 2015 at 10:34:41AM +0100, Morten Rasmussen wrote: > > > IOW, since task groups include blocked load in the load_avg_contrib (see > > > __update_group_entity_contrib() and __update_cfs_rq_tg_load_contrib()) the > > > imbalance includes blocked load and hence env->imbalance >= > > > sum(task_h_load(p)) for all tasks p on the rq. Which leads to > > > detach_tasks() emptying the rq completely in the reported scenario where > > > blocked load > runnable load. > > > > Whenever I want to know the load avg concerning task group, I need to > > walk through the complete codes again, I prefer not to do it this time. > > But it should not be that simply to say "the 118 comes from the blocked load". > > But the whole hierarchy of group entities is updated each time we enqueue > or dequeue a task. I don't see how the group entity load_avg_contrib is > not up to date? Why do you need to update it again? > > In any case, we have one task in the group hierarchy which has a > load_avg_contrib of 0 and the grand-grand parent group entity has a > load_avg_contrib of 118 and no additional tasks. That load contribution > must be from tasks which are no longer around on the rq? No? load_avg_contrib has WEIGHT inside, so the most I can say is: SE: 8f456e00's load_avg_contrib 118 = (its cfs_rq's runnable + blocked) / (tg->load_avg + 1) * tg->shares The tg->shares is probably 1024 (at least 911). So we are just left with: cfs_rq / tg = 11.5% I myself did question the sudden jump from 0 to 118 (see a previous reply). 
But anyway, this really is irrelevant to the discussion. > > Anyway, with blocked load, yes, we definitely can't move (or even find) some > > ammount of the imbalance if we only look at the tasks on the queue. But this > > may or may not be a problem. > > > > Firstly, the question comes to whether we want blocked load anywhere. > > This is just about a "now vs. average" question. > > That is what I meant in the paragraph below. It is a scheduling policy > question. > > > Secondly, if we stick to average, we just need to treat the blocked load > > consistently, not that group SE has it, but task SE does not, or somewhere > > has it, others not. > > I agree that inconsistent use of blocked load will lead us into trouble. > The problem is that none of the load-balance logic was designed for > blocked load. It was written to deal load that is currently on the rq > and slightly biased by average cpu load, not dealing with load > contribution of tasks which we can't migrate at the moment because they > are blocked. The load-balance code has to be updated to deal with > blocked load. We will run into all sorts of issues if we don't and roll > out use of blocked load everywhere. > > However, before we can rework the load-balance code, we have to agree on > the now vs average balance policy. > > Your proposed patch implements a policy somewhere in between. We try to > balance based on average, but we don't allow idle_balance() to empty a > cpu completely. A pure average balance policy would allow a cpu to go > idle even if we could do have tasks waiting on other rqs if the blocked > load indicates that other task will show up shortly one the cpu. A pure > "now" balance would balance based on runnable_load_avg for all entities > including groups ignoring all blocked load, but that goes against the > PELT group balancing design. 
> > I'm not against having a policy that sits somewhere in between, we just > have to agree it is the right policy and clean up the load-balance code > such that the implemented policy is clear. The proposed patch sits in between? I agree, but would rather see it from another perspective. First, I don't think it merits a solution/policy. It is just a cheap "last guard" to protect the "king" - no crash. Second, a "pure average" policy is pretty fine in general, but it does not mean we would simply allow a CPU to be pulled empty; that is because we are making a bet from a prediction (average) here. Being a prediction, it basically means it sometimes fails. As the failure could lead to a disaster, without blaming the prediction, it is reasonable that we apply a sort of "damage control". Thanks, Yuyang ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-05 20:12 ` Yuyang Du @ 2015-07-06 17:36 ` Dietmar Eggemann 2015-07-07 11:17 ` Rabin Vincent 2015-07-09 13:53 ` Morten Rasmussen 1 sibling, 1 reply; 31+ messages in thread From: Dietmar Eggemann @ 2015-07-06 17:36 UTC (permalink / raw) To: Yuyang Du, Morten Rasmussen Cc: Mike Galbraith, Peter Zijlstra, Rabin Vincent, mingo, linux-kernel, Paul Turner, Ben Segall Hi Yuyang, On 05/07/15 21:12, Yuyang Du wrote: > Hi Morten, > > On Fri, Jul 03, 2015 at 10:34:41AM +0100, Morten Rasmussen wrote: >>>> IOW, since task groups include blocked load in the load_avg_contrib (see >>>> __update_group_entity_contrib() and __update_cfs_rq_tg_load_contrib()) the >>>> imbalance includes blocked load and hence env->imbalance >= >>>> sum(task_h_load(p)) for all tasks p on the rq. Which leads to >>>> detach_tasks() emptying the rq completely in the reported scenario where >>>> blocked load > runnable load. >>> >>> Whenever I want to know the load avg concerning task group, I need to >>> walk through the complete codes again, I prefer not to do it this time. >>> But it should not be that simply to say "the 118 comes from the blocked load". >> >> But the whole hierarchy of group entities is updated each time we enqueue >> or dequeue a task. I don't see how the group entity load_avg_contrib is >> not up to date? Why do you need to update it again? >> >> In any case, we have one task in the group hierarchy which has a >> load_avg_contrib of 0 and the grand-grand parent group entity has a >> load_avg_contrib of 118 and no additional tasks. That load contribution >> must be from tasks which are no longer around on the rq? No? > > load_avg_contrib has WEIGHT inside, so the most I can say is: > SE: 8f456e00's load_avg_contrib 118 = (its cfs_rq's runnable + blocked) / (tg->load_avg + 1) * tg->shares > > The tg->shares is probably 1024 (at least 911). 
So we are just left with: > > cfs_rq / tg = 11.5% > > I myself did question the sudden jump from 0 to 118 (see a previous reply). Do you mean the jump from system-rngd.slice (0) (tg.css.id=3) to system.slice (118) (tg.css.id=2)? Maybe, the 118 might come from another tg hierarchy (w/ tg.css.id >= 3) inside the system.slice group representing another service. Rabin, could you share the content of your /sys/fs/cgroup/cpu/system.slice directory and of /proc/cgroups ? Whether 118 comes from the cfs_rq->blocked_load_avg of one of the tg levels of one of the other system.slice tg hierarchies or it results from not updating the se.avg.load_avg_contrib values of se's representing tg's immediately is not that important I guess. Even if we're able to sync both things (task en/dequeue and tg se.avg.load_avg_contrib update) perfectly (by calling update_cfs_rq_blocked_load() always w/ force_update=1 and immediately after that update_entity_load_avg() for all tg se's in one hierarchy, we would still have to deal w/ the blocked load part if the tg se representing system.slice contributes to cpu_rq(cpu)->cfs.runnable_load_avg. -- Dietmar > > But anyway, this really is irrelevant to the discusstion. > [...] ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-06 17:36 ` Dietmar Eggemann @ 2015-07-07 11:17 ` Rabin Vincent 2015-07-13 17:43 ` Dietmar Eggemann 0 siblings, 1 reply; 31+ messages in thread From: Rabin Vincent @ 2015-07-07 11:17 UTC (permalink / raw) To: Dietmar Eggemann Cc: Yuyang Du, Morten Rasmussen, Mike Galbraith, Peter Zijlstra, mingo, linux-kernel, Paul Turner, Ben Segall On Mon, Jul 06, 2015 at 07:36:56PM +0200, Dietmar Eggemann wrote: > Rabin, could you share the content of your > /sys/fs/cgroup/cpu/system.slice directory and of /proc/cgroups ? Here's /proc/cgroups, # cat /proc/cgroups #subsys_name hierarchy num_cgroups enabled cpu 2 98 1 cpuacct 2 98 1 and the contents of /sys/fs/cgroup/cpu/system.slice are available here: https://drive.google.com/file/d/0B4tMLbMvJ-l6ZVBvZ09QOE15MU0/view /Rabin ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-07 11:17 ` Rabin Vincent @ 2015-07-13 17:43 ` Dietmar Eggemann 0 siblings, 0 replies; 31+ messages in thread From: Dietmar Eggemann @ 2015-07-13 17:43 UTC (permalink / raw) To: Rabin Vincent Cc: Yuyang Du, Morten Rasmussen, Mike Galbraith, Peter Zijlstra, mingo, linux-kernel, Paul Turner, Ben Segall On 07/07/15 12:17, Rabin Vincent wrote: > On Mon, Jul 06, 2015 at 07:36:56PM +0200, Dietmar Eggemann wrote: >> Rabin, could you share the content of your >> /sys/fs/cgroup/cpu/system.slice directory and of /proc/cgroups ? > > Here's /proc/cgroups, > > # cat /proc/cgroups > #subsys_name hierarchy num_cgroups enabled > cpu 2 98 1 > cpuacct 2 98 1 > > and the contents of /sys/fs/cgroup/cpu/system.slice are available here: > https://drive.google.com/file/d/0B4tMLbMvJ-l6ZVBvZ09QOE15MU0/view > > /Rabin > So why not maintain a runnable signal for the task group se's? At least to figure out if the 118 is coming from blocked load. -- >8 -- Subject: [PATCH] sched: Maintain a runnable version of tg->load_avg and cfs_rq->tg_load_contrib Including blocked load in the load average contribution of sched entities (se->avg.load_avg_contrib) representing task groups can lead to scenarios where the imbalance is greater than sum(task_h_load(p)) for all tasks p on the src rq. To avoid this use cfs_rq->runnable_tg_load_contrib and tg->runnable_load_avg to calculate se->avg.load_avg_contrib for sched entities representing task groups. Both runnable based values are updated in cadence with the existing values. The existing tg->load_avg and cfs_rq->tg_load_contrib are still used to calculate task group weight. 
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/fair.c  | 11 ++++++++---
 kernel/sched/sched.h |  2 ++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 587a2f67ceb1..f2cfbaaf5700 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2647,7 +2647,7 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
 						 int force_update)
 {
 	struct task_group *tg = cfs_rq->tg;
-	long tg_contrib;
+	long tg_contrib, runnable_tg_contrib;
 
 	tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
 	tg_contrib -= cfs_rq->tg_load_contrib;
@@ -2655,9 +2655,14 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
 	if (!tg_contrib)
 		return;
 
+	runnable_tg_contrib = cfs_rq->runnable_load_avg;
+	runnable_tg_contrib -= cfs_rq->runnable_tg_load_contrib;
+
 	if (force_update || abs(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
 		atomic_long_add(tg_contrib, &tg->load_avg);
 		cfs_rq->tg_load_contrib += tg_contrib;
+		atomic_long_add(runnable_tg_contrib, &tg->runnable_load_avg);
+		cfs_rq->runnable_tg_load_contrib += runnable_tg_contrib;
 	}
 }
 
@@ -2690,9 +2695,9 @@ static inline void __update_group_entity_contrib(struct sched_entity *se)
 	u64 contrib;
 
-	contrib = cfs_rq->tg_load_contrib * tg->shares;
+	contrib = cfs_rq->runnable_tg_load_contrib * tg->shares;
 	se->avg.load_avg_contrib = div_u64(contrib,
-				     atomic_long_read(&tg->load_avg) + 1);
+				     atomic_long_read(&tg->runnable_load_avg) + 1);
 
 	/*
 	 * For group entities we need to compute a correction term in the case
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 84d48790bb6d..eed74e5efe91 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -245,6 +245,7 @@ struct task_group {
 
 #ifdef CONFIG_SMP
 	atomic_long_t load_avg;
+	atomic_long_t runnable_load_avg;
 	atomic_t runnable_avg;
 #endif
 #endif
@@ -386,6 +387,7 @@ struct cfs_rq {
 	/* Required to track per-cpu representation of a task_group */
 	u32 tg_runnable_contrib;
 	unsigned long tg_load_contrib;
+	unsigned long runnable_tg_load_contrib;
 
 	/*
 	 * h_load = weight * f(tg)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-05 20:12 ` Yuyang Du 2015-07-06 17:36 ` Dietmar Eggemann @ 2015-07-09 13:53 ` Morten Rasmussen 2015-07-09 22:34 ` Yuyang Du 1 sibling, 1 reply; 31+ messages in thread From: Morten Rasmussen @ 2015-07-09 13:53 UTC (permalink / raw) To: Yuyang Du Cc: Mike Galbraith, Peter Zijlstra, Rabin Vincent, mingo, linux-kernel, Paul Turner, Ben Segall On Mon, Jul 06, 2015 at 04:12:41AM +0800, Yuyang Du wrote: > Hi Morten, > > On Fri, Jul 03, 2015 at 10:34:41AM +0100, Morten Rasmussen wrote: > > > > IOW, since task groups include blocked load in the load_avg_contrib (see > > > > __update_group_entity_contrib() and __update_cfs_rq_tg_load_contrib()) the > > > > imbalance includes blocked load and hence env->imbalance >= > > > > sum(task_h_load(p)) for all tasks p on the rq. Which leads to > > > > detach_tasks() emptying the rq completely in the reported scenario where > > > > blocked load > runnable load. > > > > > > Whenever I want to know the load avg concerning task group, I need to > > > walk through the complete codes again, I prefer not to do it this time. > > > But it should not be that simply to say "the 118 comes from the blocked load". > > > > But the whole hierarchy of group entities is updated each time we enqueue > > or dequeue a task. I don't see how the group entity load_avg_contrib is > > not up to date? Why do you need to update it again? > > > > In any case, we have one task in the group hierarchy which has a > > load_avg_contrib of 0 and the grand-grand parent group entity has a > > load_avg_contrib of 118 and no additional tasks. That load contribution > > must be from tasks which are no longer around on the rq? No? > > load_avg_contrib has WEIGHT inside, so the most I can say is: > SE: 8f456e00's load_avg_contrib 118 = (its cfs_rq's runnable + blocked) / (tg->load_avg + 1) * tg->shares > > The tg->shares is probably 1024 (at least 911). 
So we are just left with: > > cfs_rq / tg = 11.5% Yes, we also know that there is only one runnable task in the task group hierarchy and its contribution is 0. Hence the rest must be from non-runnable tasks belonging to some child group. > I myself did question the sudden jump from 0 to 118 (see a previous reply). > > But anyway, this really is irrelevant to the discussion. Agreed. > > > > Anyway, with blocked load, yes, we definitely can't move (or even find) some > > > amount of the imbalance if we only look at the tasks on the queue. But this > > > may or may not be a problem. > > > > > > Firstly, the question comes to whether we want blocked load anywhere. > > > This is just about a "now vs. average" question. > > > > That is what I meant in the paragraph below. It is a scheduling policy > > question. > > > > > Secondly, if we stick to average, we just need to treat the blocked load > > > consistently, not such that the group SE has it but the task SE does not, or some > > > places have it and others do not. > > > > I agree that inconsistent use of blocked load will lead us into trouble. > > The problem is that none of the load-balance logic was designed for > > blocked load. It was written to deal with load that is currently on the rq > > and slightly biased by average cpu load, not dealing with load > > contribution of tasks which we can't migrate at the moment because they > > are blocked. The load-balance code has to be updated to deal with > > blocked load. We will run into all sorts of issues if we don't and roll > > out use of blocked load everywhere. > > > > However, before we can rework the load-balance code, we have to agree on > > the now vs average balance policy. > > > > Your proposed patch implements a policy somewhere in between. We try to > > balance based on average, but we don't allow idle_balance() to empty a > > cpu completely.
A pure average balance policy would allow a cpu to go > > idle even if we have tasks waiting on other rqs, if the blocked > > load indicates that other tasks will show up shortly on the cpu. A pure > > "now" balance would balance based on runnable_load_avg for all entities > > including groups, ignoring all blocked load, but that goes against the > > PELT group balancing design. > > > > I'm not against having a policy that sits somewhere in between, we just > > have to agree it is the right policy and clean up the load-balance code > > such that the implemented policy is clear. > > The proposed patch sits in between? I agree, but would rather see it from > another perspective. > > First, I don't think it merits a solution/policy. It is just a cheap > "last guard" to protect the "king" - no crash. It's fine for me to have checks that make sure we don't crash if we hit some corner case in the load-balance code. I was more interested in figuring out why we ended up in this situation and how we can implement a more consistent policy. > Second, a "pure average" policy is pretty fine in general, but it does not > mean we would simply allow a CPU to be pulled empty; that is because we are > making a bet from a prediction (average) here. By prediction, it basically > means sometimes it fails. As the failure could lead to a disaster, without > blaming the prediction, it is reasonable that we make a sort of "damage control". I like the idea of balancing based on an average load/utilization that includes blocked load/utilization. It ties in very nicely with driving DVFS, for example. It is a fundamental change to how we perceive load for load-balancing decisions though. It basically requires an update to the load-balancing policy, as most of it doesn't consider blocked-load scenarios like this one. This particular idle-balance problem is easily solved by making sure not to pull the last task, and that makes a lot of sense. But I'm quite sure that we will face many more such cases.
For example, at task wake-up, do you put the task on a cpu which is idle in this instant with a high average load or on a cpu which is busy in this instant but has a low average load? Currently, we just put the task on an idle cpu ignoring any blocked load. If we implement a strict average view policy we should choose the cpu with the lowest load even if it is currently busy. I would suggest that we revise the load-balance code before/while adding blocked load so we get a clear(er) policy in the load-balance code so we don't need too many (and maybe not so straightforward) last guards to fix wrong balancing decisions. It doesn't have to be perfect, I just want to get rid of code that nobody around can explain so we can tell if we still make the right predictions when adding blocked load. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-09 13:53 ` Morten Rasmussen @ 2015-07-09 22:34 ` Yuyang Du 0 siblings, 0 replies; 31+ messages in thread From: Yuyang Du @ 2015-07-09 22:34 UTC (permalink / raw) To: Morten Rasmussen Cc: Mike Galbraith, Peter Zijlstra, Rabin Vincent, mingo, linux-kernel, Paul Turner, Ben Segall On Thu, Jul 09, 2015 at 02:53:14PM +0100, Morten Rasmussen wrote: > On Mon, Jul 06, 2015 at 04:12:41AM +0800, Yuyang Du wrote: > > Hi Morten, > > > > On Fri, Jul 03, 2015 at 10:34:41AM +0100, Morten Rasmussen wrote: > > > > > IOW, since task groups include blocked load in the load_avg_contrib (see > > > > > __update_group_entity_contrib() and __update_cfs_rq_tg_load_contrib()) the > > > > > imbalance includes blocked load and hence env->imbalance >= > > > > > sum(task_h_load(p)) for all tasks p on the rq. Which leads to > > > > > detach_tasks() emptying the rq completely in the reported scenario where > > > > > blocked load > runnable load. > > > > > > > > Whenever I want to know the load avg concerning task group, I need to > > > > walk through the complete codes again, I prefer not to do it this time. > > > > But it should not be that simply to say "the 118 comes from the blocked load". > > > > > > But the whole hierarchy of group entities is updated each time we enqueue > > > or dequeue a task. I don't see how the group entity load_avg_contrib is > > > not up to date? Why do you need to update it again? > > > > > > In any case, we have one task in the group hierarchy which has a > > > load_avg_contrib of 0 and the grand-grand parent group entity has a > > > load_avg_contrib of 118 and no additional tasks. That load contribution > > > must be from tasks which are no longer around on the rq? No? 
> > > > load_avg_contrib has WEIGHT inside, so the most I can say is: > > SE: 8f456e00's load_avg_contrib 118 = (its cfs_rq's runnable + blocked) / (tg->load_avg + 1) * tg->shares > > > > The tg->shares is probably 1024 (at least 911). So we are just left with: > > > > cfs_rq / tg = 11.5% > > Yes, we also know that there is only one runnable task in the task group > hierarchy and its contribution is 0. Hence the rest must be from > non-runnable tasks belonging to some child group. Agreed. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-01 23:25 ` Yuyang Du 2015-07-02 8:05 ` Mike Galbraith @ 2015-07-02 10:53 ` Peter Zijlstra 2015-07-02 11:44 ` Morten Rasmussen 2015-07-03 16:39 ` Peter Zijlstra 2 siblings, 1 reply; 31+ messages in thread From: Peter Zijlstra @ 2015-07-02 10:53 UTC (permalink / raw) To: Yuyang Du Cc: Rabin Vincent, Mike Galbraith, mingo, linux-kernel, Paul Turner, Ben Segall, Morten Rasmussen On Thu, Jul 02, 2015 at 07:25:11AM +0800, Yuyang Du wrote: > And obviously, the idle balancing livelock SHOULD happen: one CPU pulls > tasks from the other, makes the other idle, and this iterates... > > That being said, it is also obvious how to prevent the livelock from happening: > idle pulling until the source rq's nr_running is 1, because otherwise we > just avoid idleness by creating another idleness. Well, ideally the imbalance calculation would be such that it would avoid this from happening in the first place. It's a 'balance' operation, not a 'steal everything'. We want to take work -- as we have none -- but we want to ensure that afterwards we have equal work, ie we're balanced. So clearly that all is hosed. Now Morten was looking into simplifying calculate_imbalance() recently.
> On Wed, Jul 01, 2015 at 10:44:04PM +0200, Peter Zijlstra wrote: > > On Wed, Jul 01, 2015 at 04:55:51PM +0200, Rabin Vincent wrote: > > > PID: 413 TASK: 8edda408 CPU: 1 COMMAND: "rngd" > > > task_h_load(): 0 [ = (load_avg_contrib { 0} * cfs_rq->h_load { 0}) / (cfs_rq->runnable_load_avg { 0} + 1) ] > > > SE: 8edda450 load_avg_contrib: 0 load.weight: 1024 PARENT: 8fffbd00 GROUPNAME: (null) > > > SE: 8fffbd00 load_avg_contrib: 0 load.weight: 2 PARENT: 8f531f80 GROUPNAME: rngd@hwrng.service > > > SE: 8f531f80 load_avg_contrib: 0 load.weight: 1024 PARENT: 8f456e00 GROUPNAME: system-rngd.slice > > > SE: 8f456e00 load_avg_contrib: 118 load.weight: 911 PARENT: 00000000 GROUPNAME: system.slice > > > > Firstly, a group (parent) load_avg_contrib should never be less than > > that of its constituent parts, therefore the top 3 SEs should have at > > least 118 too. > > I think the downward is parent, Ugh, I cannot read. Let me blame it on the heat. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-02 10:53 ` Peter Zijlstra @ 2015-07-02 11:44 ` Morten Rasmussen 2015-07-02 18:42 ` Yuyang Du 0 siblings, 1 reply; 31+ messages in thread From: Morten Rasmussen @ 2015-07-02 11:44 UTC (permalink / raw) To: Peter Zijlstra Cc: Yuyang Du, Rabin Vincent, Mike Galbraith, mingo, linux-kernel, Paul Turner, Ben Segall On Thu, Jul 02, 2015 at 12:53:59PM +0200, Peter Zijlstra wrote: > On Thu, Jul 02, 2015 at 07:25:11AM +0800, Yuyang Du wrote: > > And obviously, the idle balancing livelock SHOULD happen: one CPU pulls > > tasks from the other, makes the other idle, and this iterates... > > > > That being said, it is also obvious to prevent the livelock from happening: > > idle pulling until the source rq's nr_running is 1, becuase otherwise we > > just avoid idleness by making another idleness. > > Well, ideally the imbalance calculation would be so that it would avoid > this from happening in the first place. Its a 'balance' operation, not a > 'steal everything'. > > We want to take work -- as we have none -- but we want to ensure that > afterwards we have equal work, ie we're balanced. Agreed, I think this is the true problem. See my other reply. > > So clearly that all is hosed. Now Morten was looking into simplifying > calculate_imbalance() recently. Yes. I'm held up doing other stuff at the moment, but I think calculate_imbalance() needs some attention and I'm planning on looking at that next. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-02 11:44 ` Morten Rasmussen @ 2015-07-02 18:42 ` Yuyang Du 2015-07-03 4:42 ` Mike Galbraith 0 siblings, 1 reply; 31+ messages in thread From: Yuyang Du @ 2015-07-02 18:42 UTC (permalink / raw) To: Morten Rasmussen Cc: Peter Zijlstra, Rabin Vincent, Mike Galbraith, mingo, linux-kernel, Paul Turner, Ben Segall On Thu, Jul 02, 2015 at 12:44:55PM +0100, Morten Rasmussen wrote: > On Thu, Jul 02, 2015 at 12:53:59PM +0200, Peter Zijlstra wrote: > > On Thu, Jul 02, 2015 at 07:25:11AM +0800, Yuyang Du wrote: > > > And obviously, the idle balancing livelock SHOULD happen: one CPU pulls > > > tasks from the other, makes the other idle, and this iterates... > > > > > > That being said, it is also obvious how to prevent the livelock from happening: > > > idle pulling until the source rq's nr_running is 1, because otherwise we > > > just avoid idleness by creating another idleness. > > > > Well, ideally the imbalance calculation would be such that it would avoid > > this from happening in the first place. It's a 'balance' operation, not a > > 'steal everything'. > > > > We want to take work -- as we have none -- but we want to ensure that > > afterwards we have equal work, ie we're balanced. > > Agreed, I think this is the true problem. See my other reply. Yes, this is agreed at all times. Like I said, load_balance() (for idle balancing) should compute the right imbalance and move a fair amount, so that we end up balanced. Whatever is wrong in how that "right imbalance" is computed and moved should be fixed anyway. But still, I think, even with the above, in idle balancing, pulling until the source rq's nr_running == 1 is not just "a short term fix"; it should be there permanently, acting like a last guard with no overhead - why not.
I'm held up doing other stuff at the moment, but I think > calculate_imbalance() needs some attention and I'm planning on looking at > that next. Thanks, Yuyang ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-02 18:42 ` Yuyang Du @ 2015-07-03 4:42 ` Mike Galbraith 0 siblings, 0 replies; 31+ messages in thread From: Mike Galbraith @ 2015-07-03 4:42 UTC (permalink / raw) To: Yuyang Du Cc: Morten Rasmussen, Peter Zijlstra, Rabin Vincent, mingo, linux-kernel, Paul Turner, Ben Segall On Fri, 2015-07-03 at 02:42 +0800, Yuyang Du wrote: > But still, I think, even with the above, in idle balancing, pulling until the source > rq's nr_running == 1 is not just "a short term fix", but should be there permanently > acting like a last guard with no overhead, why not. Yeah, seems so. Searching for steal all samples... (this is all with autogroup) load_balance: idle - s_run: 0 d_run: 2 s_load: 0 d_load: 3 imb: 23 det_tasks: 2 det_load: 3 zeros: 1 load_balance: idle - s_run: 0 d_run: 2 s_load: 0 d_load: 0 imb: 32 det_tasks: 2 det_load: 0 zeros: 2 load_balance: idle - s_run: 0 d_run: 2 s_load: 0 d_load: 1 imb: 17 det_tasks: 2 det_load: 1 zeros: 1 load_balance: idle - s_run: 0 d_run: 2 s_load: 0 d_load: 37 imb: 22 det_tasks: 2 det_load: 37 zeros: 1 load_balance: idle - s_run: 0 d_run: 2 s_load: 0 d_load: 0 imb: 102 det_tasks: 2 det_load: 0 zeros: 2 load_balance: idle - s_run: 0 d_run: 1 s_load: 0 d_load: 93 imb: 47 det_tasks: 1 det_load: 93 zeros: 0 load_balance: idle - s_run: 0 d_run: 2 s_load: 0 d_load: 202 imb: 125 det_tasks: 2 det_load: 202 zeros: 0 load_balance: idle - s_run: 0 d_run: 2 s_load: 0 d_load: 243 imb: 188 det_tasks: 2 det_load: 243 zeros: 0 load_balance: idle - s_run: 0 d_run: 1 s_load: 0 d_load: 145 imb: 73 det_tasks: 1 det_load: 145 zeros: 0 load_balance: idle - s_run: 0 d_run: 1 s_load: 0 d_load: 46 imb: 24 det_tasks: 1 det_load: 46 zeros: 0 Both varieties of total pilferage (w/wo 0 load tasks involved) seem to happen only during idle balance, never periodic (yet). Oddity: make -j8 occasionally stacks/pulls piles of load=dinky. 
homer:/sys/kernel/debug/tracing # for i in `seq 1 10`; do cat trace|grep "s_run: 1.*det_tasks: $i.*zeros: 0"|wc -l; done 71634 1567 79 15 1 3 0 2 3 0 homer:/sys/kernel/debug/tracing # cat trace|grep "s_run: 1.*det_tasks: 8.*zeros: 0" <idle>-0 [002] dNs. 594.973783: load_balance: norm - s_run: 1 d_run: 9 s_load: 67 d_load: 1110 imb: 86 det_tasks: 8 det_load: 86 zeros: 0 <...>-10367 [007] d... 1456.477281: load_balance: idle - s_run: 1 d_run: 8 s_load: 805 d_load: 22 imb: 45 det_tasks: 8 det_load: 22 zeros: 0 homer:/sys/kernel/debug/tracing # cat trace|grep "s_run: 1.*det_tasks: 9.*zeros: 0" <...>-23317 [004] d... 486.677925: load_balance: idle - s_run: 1 d_run: 9 s_load: 888 d_load: 27 imb: 47 det_tasks: 9 det_load: 27 zeros: 0 <...>-11485 [002] d... 573.411095: load_balance: idle - s_run: 1 d_run: 9 s_load: 124 d_load: 78 imb: 82 det_tasks: 9 det_load: 78 zeros: 0 <...>-23286 [000] d... 1510.378740: load_balance: idle - s_run: 1 d_run: 9 s_load: 102 d_load: 58 imb: 63 det_tasks: 9 det_load: 58 zeros: 0 ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-01 23:25 ` Yuyang Du 2015-07-02 8:05 ` Mike Galbraith 2015-07-02 10:53 ` Peter Zijlstra @ 2015-07-03 16:39 ` Peter Zijlstra 2015-07-05 22:11 ` Yuyang Du 2 siblings, 1 reply; 31+ messages in thread From: Peter Zijlstra @ 2015-07-03 16:39 UTC (permalink / raw) To: Yuyang Du Cc: Rabin Vincent, Mike Galbraith, mingo, linux-kernel, Paul Turner, Ben Segall, Morten Rasmussen On Thu, Jul 02, 2015 at 07:25:11AM +0800, Yuyang Du wrote: > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 40a7fcb..f7cc1ef 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -5898,6 +5898,10 @@ static int detach_tasks(struct lb_env *env) > return 0; > > while (!list_empty(tasks)) { > + > + if (env->idle == CPU_NEWLY_IDLE && env->src_rq->nr_running <= 1) Should we make that ->idle != CPU_NOT_IDLE ? > + break; > + > p = list_first_entry(tasks, struct task_struct, se.group_node); > > env->loop++; ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-03 16:39 ` Peter Zijlstra @ 2015-07-05 22:11 ` Yuyang Du 2015-07-09 6:15 ` Stefan Ekenberg ` (2 more replies) 0 siblings, 3 replies; 31+ messages in thread From: Yuyang Du @ 2015-07-05 22:11 UTC (permalink / raw) To: Peter Zijlstra Cc: Rabin Vincent, Mike Galbraith, mingo, linux-kernel, Paul Turner, Ben Segall, Morten Rasmussen On Fri, Jul 03, 2015 at 06:39:28PM +0200, Peter Zijlstra wrote: > On Thu, Jul 02, 2015 at 07:25:11AM +0800, Yuyang Du wrote: > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index 40a7fcb..f7cc1ef 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -5898,6 +5898,10 @@ static int detach_tasks(struct lb_env *env) > > return 0; > > > > while (!list_empty(tasks)) { > > + > > + if (env->idle == CPU_NEWLY_IDLE && env->src_rq->nr_running <= 1) > > Should we make that ->idle != CPU_NOT_IDLE ? I think including CPU_IDLE is good. -- Subject: [PATCH] sched: Avoid pulling all tasks in idle balancing In idle balancing where a CPU going idle pulls tasks from another CPU, a livelock may happen if the CPU pulls all tasks from another, makes it idle, and this iterates. So just avoid this. Reported-by: Rabin Vincent <rabin.vincent@axis.com> Signed-off-by: Yuyang Du <yuyang.du@intel.com> --- kernel/sched/fair.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 40a7fcb..769d591 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5898,6 +5898,13 @@ static int detach_tasks(struct lb_env *env) return 0; while (!list_empty(tasks)) { + /* + * We don't want to steal all, otherwise we may be treated likewise, + * which could at worst lead to a livelock crash. + */ + if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1) + break; + p = list_first_entry(tasks, struct task_struct, se.group_node); env->loop++; ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-05 22:11 ` Yuyang Du @ 2015-07-09 6:15 ` Stefan Ekenberg 2015-07-26 18:57 ` Yuyang Du 2015-08-03 17:05 ` [tip:sched/core] sched/fair: Avoid pulling all tasks in idle balancing tip-bot for Yuyang Du 2 siblings, 0 replies; 31+ messages in thread From: Stefan Ekenberg @ 2015-07-09 6:15 UTC (permalink / raw) To: Yuyang Du Cc: Peter Zijlstra, Rabin Vincent, Mike Galbraith, mingo, linux-kernel, Paul Turner, Ben Segall, Morten Rasmussen Hi, I tested the patch on a setup with 7 devices, all running the same troublesome use-case in parallel (same use-case as we used to produce the crash dumps). This use-case was previously able to reproduce the problem about 21 times during 24 hours. After including the patch the setup ran perfectly for 48 hours. So to summarize, patch tested OK. Tested-by: Stefan Ekenberg <stefan.ekenberg@axis.com> On Mon, Jul 06, 2015 at 12:11:51AM +0200, Yuyang Du wrote: > On Fri, Jul 03, 2015 at 06:39:28PM +0200, Peter Zijlstra wrote: > > On Thu, Jul 02, 2015 at 07:25:11AM +0800, Yuyang Du wrote: > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > > index 40a7fcb..f7cc1ef 100644 > > > --- a/kernel/sched/fair.c > > > +++ b/kernel/sched/fair.c > > > @@ -5898,6 +5898,10 @@ static int detach_tasks(struct lb_env *env) > > > return 0; > > > > > > while (!list_empty(tasks)) { > > > + > > > + if (env->idle == CPU_NEWLY_IDLE && env->src_rq->nr_running <= 1) > > > > Should we make that ->idle != CPU_NOT_IDLE ? > > I think including CPU_IDLE is good. > > -- > Subject: [PATCH] sched: Avoid pulling all tasks in idle balancing > > In idle balancing where a CPU going idle pulls tasks from another CPU, > a livelock may happen if the CPU pulls all tasks from another, makes > it idle, and this iterates. So just avoid this. 
> > Reported-by: Rabin Vincent <rabin.vincent@axis.com> > Signed-off-by: Yuyang Du <yuyang.du@intel.com> > --- > kernel/sched/fair.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 40a7fcb..769d591 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -5898,6 +5898,13 @@ static int detach_tasks(struct lb_env *env) > return 0; > > while (!list_empty(tasks)) { > + /* > + * We don't want to steal all, otherwise we may be treated likewise, > + * which could at worst lead to a livelock crash. > + */ > + if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1) > + break; > + > p = list_first_entry(tasks, struct task_struct, se.group_node); > > env->loop++; ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance() 2015-07-05 22:11 ` Yuyang Du 2015-07-09 6:15 ` Stefan Ekenberg @ 2015-07-26 18:57 ` Yuyang Du 2015-08-03 17:05 ` [tip:sched/core] sched/fair: Avoid pulling all tasks in idle balancing tip-bot for Yuyang Du 2 siblings, 0 replies; 31+ messages in thread From: Yuyang Du @ 2015-07-26 18:57 UTC (permalink / raw) To: Peter Zijlstra Cc: Rabin Vincent, Mike Galbraith, mingo, linux-kernel, Paul Turner, Ben Segall, Morten Rasmussen Hi Peter, What about this patch? Adding: Tested-by: Stefan Ekenberg <stefan.ekenberg@axis.com> Thanks, Yuyang On Mon, Jul 06, 2015 at 06:11:51AM +0800, Yuyang Du wrote: > On Fri, Jul 03, 2015 at 06:39:28PM +0200, Peter Zijlstra wrote: > > On Thu, Jul 02, 2015 at 07:25:11AM +0800, Yuyang Du wrote: > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > > index 40a7fcb..f7cc1ef 100644 > > > --- a/kernel/sched/fair.c > > > +++ b/kernel/sched/fair.c > > > @@ -5898,6 +5898,10 @@ static int detach_tasks(struct lb_env *env) > > > return 0; > > > > > > while (!list_empty(tasks)) { > > > + > > > + if (env->idle == CPU_NEWLY_IDLE && env->src_rq->nr_running <= 1) > > > > Should we make that ->idle != CPU_NOT_IDLE ? > > I think including CPU_IDLE is good. > > -- > Subject: [PATCH] sched: Avoid pulling all tasks in idle balancing > > In idle balancing where a CPU going idle pulls tasks from another CPU, > a livelock may happen if the CPU pulls all tasks from another, makes > it idle, and this iterates. So just avoid this. 
> > Reported-by: Rabin Vincent <rabin.vincent@axis.com> > Signed-off-by: Yuyang Du <yuyang.du@intel.com> > --- > kernel/sched/fair.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 40a7fcb..769d591 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -5898,6 +5898,13 @@ static int detach_tasks(struct lb_env *env) > return 0; > > while (!list_empty(tasks)) { > + /* > + * We don't want to steal all, otherwise we may be treated likewise, > + * which could at worst lead to a livelock crash. > + */ > + if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1) > + break; > + > p = list_first_entry(tasks, struct task_struct, se.group_node); > > env->loop++; > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 31+ messages in thread
* [tip:sched/core] sched/fair: Avoid pulling all tasks in idle balancing 2015-07-05 22:11 ` Yuyang Du 2015-07-09 6:15 ` Stefan Ekenberg 2015-07-26 18:57 ` Yuyang Du @ 2015-08-03 17:05 ` tip-bot for Yuyang Du 2 siblings, 0 replies; 31+ messages in thread From: tip-bot for Yuyang Du @ 2015-08-03 17:05 UTC (permalink / raw) To: linux-tip-commits Cc: umgwanakikbuti, mingo, linux-kernel, peterz, morten.rasmussen, tglx, efault, pjt, torvalds, yuyang.du, bsegall, hpa, rabin.vincent Commit-ID: 985d3a4c11cd28251bcc7925aa2d7a9038910384 Gitweb: http://git.kernel.org/tip/985d3a4c11cd28251bcc7925aa2d7a9038910384 Author: Yuyang Du <yuyang.du@intel.com> AuthorDate: Mon, 6 Jul 2015 06:11:51 +0800 Committer: Ingo Molnar <mingo@kernel.org> CommitDate: Mon, 3 Aug 2015 12:21:19 +0200 sched/fair: Avoid pulling all tasks in idle balancing In idle balancing where a CPU going idle pulls tasks from another CPU, a livelock may happen if the CPU pulls all tasks from another, makes it idle, and this iterates. So just avoid this. 
Reported-by: Rabin Vincent <rabin.vincent@axis.com> Signed-off-by: Yuyang Du <yuyang.du@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150705221151.GF5197@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> --- kernel/sched/fair.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 587a2f6..8b384b8d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5863,6 +5863,13 @@ static int detach_tasks(struct lb_env *env) return 0; while (!list_empty(tasks)) { + /* + * We don't want to steal all, otherwise we may be treated likewise, + * which could at worst lead to a livelock crash. + */ + if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1) + break; + p = list_first_entry(tasks, struct task_struct, se.group_node); env->loop++; ^ permalink raw reply related [flat|nested] 31+ messages in thread
end of thread, other threads:[~2015-08-03 17:06 UTC | newest] Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-06-30 14:30 [PATCH?] Livelock in pick_next_task_fair() / idle_balance() Rabin Vincent 2015-07-01 5:36 ` Mike Galbraith 2015-07-01 14:55 ` Rabin Vincent 2015-07-01 15:47 ` Mike Galbraith 2015-07-01 20:44 ` Peter Zijlstra 2015-07-01 23:25 ` Yuyang Du 2015-07-02 8:05 ` Mike Galbraith 2015-07-02 1:05 ` Yuyang Du 2015-07-02 10:25 ` Mike Galbraith 2015-07-02 11:40 ` Morten Rasmussen 2015-07-02 19:37 ` Yuyang Du 2015-07-03 9:34 ` Morten Rasmussen 2015-07-03 16:38 ` Peter Zijlstra 2015-07-05 22:31 ` Yuyang Du 2015-07-09 14:32 ` Morten Rasmussen 2015-07-09 23:24 ` Yuyang Du 2015-07-05 20:12 ` Yuyang Du 2015-07-06 17:36 ` Dietmar Eggemann 2015-07-07 11:17 ` Rabin Vincent 2015-07-13 17:43 ` Dietmar Eggemann 2015-07-09 13:53 ` Morten Rasmussen 2015-07-09 22:34 ` Yuyang Du 2015-07-02 10:53 ` Peter Zijlstra 2015-07-02 11:44 ` Morten Rasmussen 2015-07-02 18:42 ` Yuyang Du 2015-07-03 4:42 ` Mike Galbraith 2015-07-03 16:39 ` Peter Zijlstra 2015-07-05 22:11 ` Yuyang Du 2015-07-09 6:15 ` Stefan Ekenberg 2015-07-26 18:57 ` Yuyang Du 2015-08-03 17:05 ` [tip:sched/core] sched/fair: Avoid pulling all tasks in idle balancing tip-bot for Yuyang Du