LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Julien Desfossez <jdesfossez@digitalocean.com>
Cc: mingo@kernel.org, tglx@linutronix.de, pjt@google.com,
	tim.c.chen@linux.intel.com, torvalds@linux-foundation.org,
	linux-kernel@vger.kernel.org, subhra.mazumdar@oracle.com,
	fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com,
	Vineeth Pillai <vpillai@digitalocean.com>,
	Nishanth Aravamudan <naravamudan@digitalocean.com>,
	Aaron Lu <aaron.lu@linux.alibaba.com>
Subject: Re: [RFC][PATCH 13/16] sched: Add core wide task selection and scheduling.
Date: Wed, 10 Apr 2019 17:01:16 +0200
Message-ID: <20190410150116.GI2490@worktop.programming.kicks-ass.net> (raw)
In-Reply-To: <1554835135-11814-1-git-send-email-jdesfossez@digitalocean.com>

On Tue, Apr 09, 2019 at 02:38:55PM -0400, Julien Desfossez wrote:
> We found the source of the major performance regression we discussed
> previously. It turns out there was a pattern where a task (a kworker in this
> case) could be woken up, but the core could still end up idle before that
> task had a chance to run.
> 
> Example sequence, cpu0 and cpu1 and siblings on the same core, task1 and
> task2 are in the same cgroup with the tag enabled (each following line
> happens in the increasing order of time):
> - task1 running on cpu0, task2 running on cpu1
> - sched_waking(kworker/0, target_cpu=cpu0)
> - task1 scheduled out of cpu0
> - kworker/0 cannot run on cpu0 because of task2 is still running on cpu1
>   cpu0 is idle
> - task2 scheduled out of cpu1

But at this point core_cookie is still set; we don't clear it when the
last task goes away.

> - cpu1 doesn’t select kworker/0 for cpu0, because the optimization path ends
>   the task selection if core_cookie is NULL for currently selected process
>   and the cpu1’s runqueue.

But at this point core_cookie is still set, we only (re)set it later to
p->core_cookie.

What I suspect happens is that you hit the 'again' clause due to a
higher prio @max on the second sibling. And at that point we've
destroyed core_cookie.

> - cpu1 is idle
> --> both siblings are idle but kworker/0 is still in the run queue of cpu0.
>     Cpu0 may stay idle for longer if it goes deep idle.
> 
> With the fix below, we ensure to send an IPI to the sibling if it is idle
> and has tasks waiting in its runqueue.
> This fixes the performance issue we were seeing.
> 
> Now here is what we can measure with a disk write-intensive benchmark:
> - no performance impact with enabling core scheduling without any tagged
>   task,
> - 5% overhead if one tagged task is competing with an untagged task,
> - 10% overhead if 2 tasks tagged with a different tag are competing
>   against each other.
> 
> We are starting more scaling tests, but this is very encouraging !
> 
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e1fa10561279..02c862a5e973 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3779,7 +3779,22 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>  
>  				trace_printk("unconstrained pick: %s/%d %lx\n",
>  						next->comm, next->pid, next->core_cookie);
> +				rq->core_pick = NULL;
>  
> +				/*
> +				 * If the sibling is idling, we might want to wake it
> +				 * so that it can check for any runnable but blocked tasks 
> +				 * due to previous task matching.
> +				 */
> +				for_each_cpu(j, smt_mask) {
> +					struct rq *rq_j = cpu_rq(j);
> +					rq_j->core_pick = NULL;
> +					if (j != cpu && is_idle_task(rq_j->curr) && rq_j->nr_running) {
> +						resched_curr(rq_j);
> +						trace_printk("IPI(%d->%d[%d]) idle preempt\n",
> +							     cpu, j, rq_j->nr_running);
> +					}
> +				}
>  				goto done;
>  			}

I'm thinking there is a more elegant solution hiding in there; possibly
saving/restoring that core_cookie on the again loop should do, but I've
always had the nagging suspicion that whole selection loop could be done
better.

  reply index

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-18 16:56 [RFC][PATCH 00/16] sched: Core scheduling Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 01/16] stop_machine: Fix stop_cpus_in_progress ordering Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 02/16] sched: Fix kerneldoc comment for ia64_set_curr_task Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 03/16] sched: Wrap rq::lock access Peter Zijlstra
2019-02-19 16:13   ` Phil Auld
2019-02-19 16:22     ` Peter Zijlstra
2019-02-19 16:37       ` Phil Auld
2019-03-18 15:41   ` Julien Desfossez
2019-03-20  2:29     ` Subhra Mazumdar
2019-03-21 21:20       ` Julien Desfossez
2019-03-22 13:34         ` Peter Zijlstra
2019-03-22 20:59           ` Julien Desfossez
2019-03-23  0:06         ` Subhra Mazumdar
2019-03-27  1:02           ` Subhra Mazumdar
2019-03-29 13:35           ` Julien Desfossez
2019-03-29 22:23             ` Subhra Mazumdar
2019-04-01 21:35               ` Subhra Mazumdar
2019-04-03 20:16                 ` Julien Desfossez
2019-04-05  1:30                   ` Subhra Mazumdar
2019-04-02  7:42               ` Peter Zijlstra
2019-03-22 23:28       ` Tim Chen
2019-03-22 23:44         ` Tim Chen
2019-02-18 16:56 ` [RFC][PATCH 04/16] sched/{rt,deadline}: Fix set_next_task vs pick_next_task Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 05/16] sched: Add task_struct pointer to sched_class::set_curr_task Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 06/16] sched/fair: Export newidle_balance() Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 07/16] sched: Allow put_prev_task() to drop rq->lock Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 08/16] sched: Rework pick_next_task() slow-path Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 09/16] sched: Introduce sched_class::pick_task() Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 10/16] sched: Core-wide rq->lock Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 11/16] sched: Basic tracking of matching tasks Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 12/16] sched: A quick and dirty cgroup tagging interface Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 13/16] sched: Add core wide task selection and scheduling Peter Zijlstra
     [not found]   ` <20190402064612.GA46500@aaronlu>
2019-04-02  8:28     ` Peter Zijlstra
2019-04-02 13:20       ` Aaron Lu
2019-04-05 14:55       ` Aaron Lu
2019-04-09 18:09         ` Tim Chen
2019-04-10  4:36           ` Aaron Lu
2019-04-10 14:18             ` Aubrey Li
2019-04-11  2:11               ` Aaron Lu
2019-04-10 14:44             ` Peter Zijlstra
2019-04-11  3:05               ` Aaron Lu
2019-04-11  9:19                 ` Peter Zijlstra
2019-04-10  8:06           ` Peter Zijlstra
2019-04-10 19:58             ` Vineeth Remanan Pillai
2019-04-15 16:59             ` Julien Desfossez
2019-04-16 13:43       ` Aaron Lu
2019-04-09 18:38   ` Julien Desfossez
2019-04-10 15:01     ` Peter Zijlstra [this message]
2019-04-11  0:11     ` Subhra Mazumdar
2019-04-19  8:40       ` Ingo Molnar
2019-04-19 23:16         ` Subhra Mazumdar
2019-02-18 16:56 ` [RFC][PATCH 14/16] sched/fair: Add a few assertions Peter Zijlstra
2019-02-18 16:56 ` [RFC][PATCH 15/16] sched: Trivial forced-newidle balancer Peter Zijlstra
2019-02-21 16:19   ` Valentin Schneider
2019-02-21 16:41     ` Peter Zijlstra
2019-02-21 16:47       ` Peter Zijlstra
2019-02-21 18:28         ` Valentin Schneider
2019-04-04  8:31       ` Aubrey Li
2019-04-06  1:36         ` Aubrey Li
2019-02-18 16:56 ` [RFC][PATCH 16/16] sched: Debug bits Peter Zijlstra
2019-02-18 17:49 ` [RFC][PATCH 00/16] sched: Core scheduling Linus Torvalds
2019-02-18 20:40   ` Peter Zijlstra
2019-02-19  0:29     ` Linus Torvalds
2019-02-19 15:15       ` Ingo Molnar
2019-02-22 12:17     ` Paolo Bonzini
2019-02-22 14:20       ` Peter Zijlstra
2019-02-22 19:26         ` Tim Chen
2019-02-26  8:26           ` Aubrey Li
2019-02-27  7:54             ` Aubrey Li
2019-02-21  2:53   ` Subhra Mazumdar
2019-02-21 14:03     ` Peter Zijlstra
2019-02-21 18:44       ` Subhra Mazumdar
2019-02-22  0:34       ` Subhra Mazumdar
2019-02-22 12:45   ` Mel Gorman
2019-02-22 16:10     ` Mel Gorman
2019-03-08 19:44     ` Subhra Mazumdar
2019-03-11  4:23       ` Aubrey Li
2019-03-11 18:34         ` Subhra Mazumdar
2019-03-11 23:33           ` Subhra Mazumdar
2019-03-12  0:20             ` Greg Kerr
2019-03-12  0:47               ` Subhra Mazumdar
2019-03-12  7:33               ` Aaron Lu
2019-03-12  7:45             ` Aubrey Li
2019-03-13  5:55               ` Aubrey Li
2019-03-14  0:35                 ` Tim Chen
2019-03-14  5:30                   ` Aubrey Li
2019-03-14  6:07                     ` Li, Aubrey
2019-03-18  6:56             ` Aubrey Li
2019-03-12 19:07           ` Pawan Gupta
2019-03-26  7:32       ` Aaron Lu
2019-03-26  7:56         ` Aaron Lu
2019-02-19 22:07 ` Greg Kerr
2019-02-20  9:42   ` Peter Zijlstra
2019-02-20 18:33     ` Greg Kerr
2019-02-22 14:10       ` Peter Zijlstra
2019-03-07 22:06         ` Paolo Bonzini
2019-02-20 18:43     ` Subhra Mazumdar
2019-03-01  2:54 ` Subhra Mazumdar
2019-03-14 15:28 ` Julien Desfossez

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190410150116.GI2490@worktop.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=aaron.lu@linux.alibaba.com \
    --cc=fweisbec@gmail.com \
    --cc=jdesfossez@digitalocean.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pjt@google.com \
    --cc=subhra.mazumdar@oracle.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vpillai@digitalocean.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox