linux-kernel.vger.kernel.org archive mirror
From: Joel Fernandes <joel@joelfernandes.org>
To: "Li, Aubrey" <aubrey.li@linux.intel.com>
Cc: viremana@linux.microsoft.com,
	"Nishanth Aravamudan" <naravamudan@digitalocean.com>,
	"Julien Desfossez" <jdesfossez@digitalocean.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Tim Chen" <tim.c.chen@linux.intel.com>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Paul Turner" <pjt@google.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Subhra Mazumdar" <subhra.mazumdar@oracle.com>,
	"Frederic Weisbecker" <fweisbec@gmail.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Greg Kerr" <kerrnel@google.com>, "Phil Auld" <pauld@redhat.com>,
	"Aaron Lu" <aaron.lwe@gmail.com>,
	"Aubrey Li" <aubrey.intel@gmail.com>,
	"Valentin Schneider" <valentin.schneider@arm.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Vineeth Pillai" <vineethrp@gmail.com>,
	"Chen Yu" <yu.c.chen@intel.com>,
	"Christian Brauner" <christian.brauner@ubuntu.com>,
	"Ning, Hongyu" <hongyu.ning@linux.intel.com>,
	"benbjiang(蒋彪)" <benbjiang@tencent.com>
Subject: Re: [RFC PATCH 00/16] Core scheduling v6
Date: Wed, 12 Aug 2020 19:08:50 -0400
Message-ID: <20200812230850.GA3511387@google.com>
In-Reply-To: <162a03cc-66c3-1999-83a2-deaad5aa04c8@linux.intel.com>

On Wed, Aug 12, 2020 at 10:01:24AM +0800, Li, Aubrey wrote:
> Hi Joel,
> 
> On 2020/8/10 0:44, Joel Fernandes wrote:
> > Hi Aubrey,
> > 
> > Apologies for replying late as I was still looking into the details.
> > 
> > On Wed, Aug 05, 2020 at 11:57:20AM +0800, Li, Aubrey wrote:
> > [...]
> >> +/*
> >> + * Core scheduling policy:
> >> + * - CORE_SCHED_DISABLED: core scheduling is disabled.
> >> + * - CORE_SCHED_COOKIE_MATCH: tasks with the same cookie can run
> >> + *                     on the same core concurrently.
> >> + * - CORE_SCHED_COOKIE_TRUST: a trusted task can run with a kernel
> >> + *                     thread on the same core concurrently.
> >> + * - CORE_SCHED_COOKIE_LONELY: tasks with a cookie can run only
> >> + *                     with the idle thread on the same core.
> >> + */
> >> +enum coresched_policy {
> >> +       CORE_SCHED_DISABLED,
> >> +       CORE_SCHED_COOKIE_MATCH,
> >> +       CORE_SCHED_COOKIE_TRUST,
> >> +       CORE_SCHED_COOKIE_LONELY,
> >> +};
> >>
> >> We can set the uperf cgroup's policy to CORE_SCHED_COOKIE_TRUST and fix this
> >> kind of performance regression. Not sure if this sounds attractive?
> > 
> > Instead of this, I think it can be something simpler IMHO:
> > 
> > 1. Consider all cookie-0 tasks as trusted. (Even right now, if you apply the
> >    core-scheduling patchset, such tasks will share a core and sniff on each
> >    other. So let us not pretend that such tasks are not trusted).
> > 
> > 2. All kernel threads and idle task would have a cookie 0 (so that will cover
> >    ksoftirqd reported in your original issue).
> > 
> > 3. Add a config option (CONFIG_SCHED_CORE_DEFAULT_TASKS_UNTRUSTED). Enable
> >    it by default. Setting this option would tag all tasks that are forked
> >    from a cookie-0 task with their own cookie. Later on, such tasks can be
> >    added to a group. This covers PeterZ's ask about having 'default
> >    untrusted'. (Users like ChromeOS that don't want userspace system
> >    processes to be tagged can disable this option so such tasks will be
> >    cookie-0.)
> > 
> > 4. Allow prctl/cgroup interfaces to create groups of tasks and override the
> >    above behaviors.
> 
> How does uperf in a cgroup work with ksoftirqd? Are you suggesting I set uperf's
> cookie to be cookie-0 via prctl?

Yes, but let me try to understand better. There are 2 problems here, I think:

1. ksoftirqd getting idled when HT is turned on, because uperf is sharing a
core with it: This should not be any worse than SMT OFF, because SMT OFF
would also reduce ksoftirqd's CPU time, just as core sched is doing. Sure,
core scheduling adds some overhead with IPIs, but such a huge drop in perf is
strange. Peter, any thoughts on that?

2. Interface: To solve the performance problem, you are saying you want uperf
to share a core with ksoftirqd so that it is not forced into idle. Why not
just keep uperf out of the cgroup? Then it will have cookie 0 and be able to
share a core with kernel threads. As for the user-user isolation that you
need, if you tag any "untrusted" threads by adding them to a cgroup, then they
will automatically be isolated from uperf, while uperf can still share a core
with kernel threads.
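
To make the cookie semantics discussed above concrete, here is a minimal
userspace sketch. It is only a model under stated assumptions, not code from
the patchset: struct task, TRUSTED_COOKIE, can_share_core() and fork_cookie()
are hypothetical names invented for illustration. It assumes that two tasks
may run on sibling hyperthreads only when their cookies are equal, that
kernel threads and untagged tasks carry cookie 0, and that the proposed
"default untrusted" option gives each child of a cookie-0 task a fresh cookie.

```c
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the kernel patchset. */
typedef unsigned long cookie_t;

/* Cookie 0: kernel threads, the idle task, and untagged user tasks. */
#define TRUSTED_COOKIE 0UL

struct task {
	const char *name;
	cookie_t cookie;
};

/* Assumed rule: siblings of a core may only run tasks with equal cookies. */
static bool can_share_core(const struct task *a, const struct task *b)
{
	return a->cookie == b->cookie;
}

/* Next unused cookie value for the "default untrusted" model below. */
static cookie_t next_cookie = 1;

/*
 * Models the proposed "default untrusted" behaviour: when enabled, a child
 * forked from a cookie-0 parent gets its own fresh cookie. In every other
 * case the child simply inherits the parent's cookie.
 */
static cookie_t fork_cookie(const struct task *parent, bool default_untrusted)
{
	if (default_untrusted && parent->cookie == TRUSTED_COOKIE)
		return next_cookie++;
	return parent->cookie;
}
```

In this model, an uperf task left out of the cgroup keeps cookie 0, so
can_share_core() is true for uperf and ksoftirqd, and false for uperf and any
task that was tagged with a nonzero cookie.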

Please let me know your thoughts and thanks,

 - Joel

> 
> Thanks,
> -Aubrey
> > 
> > 5. Document everything clearly so the semantics are clear both to the
> >    developers of core scheduling and to system administrators.
> > 
> > Note that, with the concept of "system trusted cookie", we can also do
> > optimizations like:
> > 1. Disable STIBP when switching into trusted tasks.
> > 2. Disable L1D flushing / verw stuff for L1TF/MDS issues, when switching into
> >    trusted tasks.
> > 
> > At least #1 seems to be biting enabling HT on ChromeOS right now, and one
> > other engineer requested I do something like #2 already.
> > 
> > Once we get full-syscall isolation working, threads belonging to a process
> > can also share a core so those can just share a core with the task-group
> > leader.
> > 
> >>> Is the uperf throughput worse with SMT+core-scheduling versus no-SMT ?
> >>
> >> This is a good question, from the data we measured by uperf,
> >> SMT+core-scheduling is 28.2% worse than no-SMT, :(
> > 
> > This is worrying for sure. :-(. We ought to debug/profile it more to see what
> > is causing the overhead. Me/Vineeth added it as a topic for LPC as well.
> > 
> > Any other thoughts from others on this?
> > 
> > thanks,
> > 
> >  - Joel
> > 
> > 
> >>> thanks,
> >>>
> >>>  - Joel
> >>> PS: I am planning to write a patch behind a CONFIG option that tags
> >>> all processes (default untrusted) so everything gets a cookie which
> >>> some folks said was how they wanted (have a whitelist instead of
> >>> blacklist).
> >>>
> >>
> 
