linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: Julien Desfossez <jdesfossez@digitalocean.com>,
	Peter Zijlstra <peterz@infradead.org>
Cc: "Vineeth Remanan Pillai" <vpillai@digitalocean.com>,
	"Aubrey Li" <aubrey.intel@gmail.com>,
	"Tim Chen" <tim.c.chen@linux.intel.com>,
	"Nishanth Aravamudan" <naravamudan@digitalocean.com>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Paul Turner" <pjt@google.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Linux List Kernel Mailing" <linux-kernel@vger.kernel.org>,
	"Dario Faggioli" <dfaggioli@suse.com>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Greg Kerr" <kerrnel@google.com>, "Phil Auld" <pauld@redhat.com>,
	"Aaron Lu" <aaron.lwe@gmail.com>,
	"Valentin Schneider" <valentin.schneider@arm.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [RFC PATCH v4 00/19] Core scheduling v4
Date: Mon, 16 Mar 2020 20:55:21 -0400	[thread overview]
Message-ID: <20200317005521.GA8244@google.com> (raw)
In-Reply-To: <20200221232057.GA19671@sinkpad>

Hi Julien, Peter, all,

On Fri, Feb 21, 2020 at 06:20:57PM -0500, Julien Desfossez wrote:
> On 18-Feb-2020 04:58:02 PM, Vineeth Remanan Pillai wrote:
> > > Yes, this makes sense, patch updated at here, I put your name there if
> > > you don't mind.
> > > https://github.com/aubreyli/linux/tree/coresched_v4-v5.5.2-rc2
> > >
> > > Thanks Aubrey!
> 
> Just a quick note, I ran a very cpu-intensive benchmark (9x12 vcpus VMs
> running linpack), all affined to an 18 cores NUMA node (36 hardware
> threads). Each VM is running in its own cgroup/tag with core scheduling
> enabled. We know it already performed much better than nosmt, so for
> this case, I measured various co-scheduling statistics:
> - how much time the process spends co-scheduled with idle, a compatible
>   or an incompatible task
> - how long does the process spends running in a inefficient
>   configuration (more than 1 thread running alone on a core)
> 
> And I am very happy to report than even though the 9 VMs were configured
> to float on the whole NUMA node, the scheduler / load-balancer did a
> very good job at keeping an efficient configuration:
> 
> Process 10667 (qemu-system-x86), 10 seconds trace:
>   - total runtime: 46451472309 ns,
>   - local neighbors (total: 45713285084 ns, 98.411 % of process runtime):
>   - idle neighbors (total: 484054061 ns, 1.042 % of process runtime):
>   - foreign neighbors (total: 4191002 ns, 0.009 % of process runtime):
>   - unknown neighbors  (total: 92042503 ns, 0.198 % of process runtime)
>   - inefficient periods (total: 464832 ns, 0.001 % of process runtime):
>     - number of periods: 48
>     - min period duration: 1424 ns
>     - max period duration: 116988 ns
>     - average period duration: 9684.000 ns
>     - stdev: 19282.130
> 
> I thought you would enjoy seeing this :-)

Looks quite interesting. We are trying apply this work to ChromeOS. What we
want to do is selectively marking tasks, instead of grouping sets of trusted
tasks. I have a patch that adds a prctl which a task can call, and it works
well (task calls prctl and gets a cookie which gives it a dedicated core).

However, I have the following questions, in particular there are 4 scenarios
where I feel the current patches do not resolve MDS/L1TF, would you guys
please share your thoughts?

1. HT1 is running either hostile guest or host code.
   HT2 is running an interrupt handler (victim).

   In this case I see there is a possible MDS issue between HT1 and HT2.

2. HT1 is executing hostile host code, and gets interrupted by a victim
   interrupt. HT2 is idle.

   In this case, I see there is a possible MDS issue between interrupt and
   the host code on the same HT1.

3. HT1 is executing hostile guest code, HT2 is executing a victim interrupt
   handler on the host.

   In this case, I see there is a possible L1TF issue between HT1 and HT2.
   This issue does not happen if HT1 is running host code, since the host
   kernel takes care of inverting PTE bits.

4. HT1 is idle, and HT2 is running a victim process. Now HT1 starts running
   hostile code on guest or host. HT2 is being forced idle. However, there is
   an overlap between HT1 starting to execute hostile code and HT2's victim
   process getting scheduled out.
   Speaking to Vineeth, we discussed an idea to monitor the core_sched_seq
   counter of the sibling being idled to detect that it is now idle.
   However we discussed today that looking at this data, it is not really an
   issue since it is such a small window.

My concern is now cases 1, 2 to which there does not seem a good solution,
short of disabling interrupts. For 3, we could still possibly do something on
the guest side, such as using shadow page tables. Any thoughts on all this?

thanks,

- Joel



  reply	other threads:[~2020-03-17  0:55 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-30 18:33 [RFC PATCH v4 00/19] Core scheduling v4 Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 01/19] stop_machine: Fix stop_cpus_in_progress ordering Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 02/19] sched: Fix kerneldoc comment for ia64_set_curr_task Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 03/19] sched: Wrap rq::lock access Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 04/19] sched/{rt,deadline}: Fix set_next_task vs pick_next_task Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 05/19] sched: Add task_struct pointer to sched_class::set_curr_task Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 06/19] sched/fair: Export newidle_balance() Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 07/19] sched: Allow put_prev_task() to drop rq->lock Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 08/19] sched: Rework pick_next_task() slow-path Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 09/19] sched: Introduce sched_class::pick_task() Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 10/19] sched: Core-wide rq->lock Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 11/19] sched: Basic tracking of matching tasks Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 12/19] sched: A quick and dirty cgroup tagging interface Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 13/19] sched: Add core wide task selection and scheduling Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 14/19] sched/fair: Add a few assertions Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 15/19] sched: Trivial forced-newidle balancer Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 16/19] sched: Debug bits Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 17/19] sched/fair: wrapper for cfs_rq->min_vruntime Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 18/19] sched/fair: core wide vruntime comparison Vineeth Remanan Pillai
2019-10-30 18:33 ` [RFC PATCH v4 19/19] sched/fair : Wake up forced idle siblings if needed Vineeth Remanan Pillai
2019-10-31 11:42 ` [RFC PATCH v4 00/19] Core scheduling v4 Li, Aubrey
2019-11-01 11:33   ` Li, Aubrey
2019-11-08  3:20     ` Li, Aubrey
2019-10-31 18:42 ` Phil Auld
2019-11-01 14:03   ` Vineeth Remanan Pillai
2019-11-01 16:35     ` Greg Kerr
2019-11-01 18:07       ` Dario Faggioli
2019-11-12  1:45     ` Dario Faggioli
2019-11-13 17:16       ` Tim Chen
2020-01-02  2:28       ` Aubrey Li
2020-01-10 23:19         ` Tim Chen
2019-11-11 19:10 ` Tim Chen
2020-01-14  1:12   ` Tim Chen
2020-01-14 15:40     ` Vineeth Remanan Pillai
2020-01-15  3:43       ` Li, Aubrey
2020-01-15 19:33         ` Tim Chen
2020-01-16  1:45           ` Aubrey Li
2020-01-17 16:00             ` Vineeth Remanan Pillai
2020-01-22 18:04               ` Gruza, Agata
2020-01-28  2:40       ` Dario Faggioli
     [not found]         ` <CANaguZDDpzrzdTmvjXvCmV2c+wBt6mXWSz4Vn-LJ-onc_Oj=yw@mail.gmail.com>
2020-02-01 15:31           ` Dario Faggioli
2020-02-06  0:28       ` Tim Chen
2020-02-06 22:37         ` Julien Desfossez
2020-02-12 23:07         ` Julien Desfossez
2020-02-13 18:37           ` Tim Chen
2020-02-14  6:10             ` Aubrey Li
     [not found]               ` <CANaguZC40mDHfL1H_9AA7H8cyd028t9PQVRqQ3kB4ha8R7hhqg@mail.gmail.com>
2020-02-15  6:01                 ` Aubrey Li
     [not found]                   ` <CANaguZBj_x_2+9KwbHCQScsmraC_mHdQB6uRqMTYMmvhBYfv2Q@mail.gmail.com>
2020-02-21 23:20                     ` Julien Desfossez
2020-03-17  0:55                       ` Joel Fernandes [this message]
2020-03-17 19:07                         ` Tim Chen
2020-03-17 20:18                           ` Tim Chen
2020-03-18  1:10                             ` Joel Fernandes
2020-03-17 21:17                           ` Thomas Gleixner
2020-03-17 21:58                             ` Tim Chen
2020-03-18  1:03                             ` Joel Fernandes
2020-03-18  2:30                               ` Joel Fernandes
2020-03-18  0:52                           ` Joel Fernandes
2020-03-18 11:53                             ` Thomas Gleixner
2020-03-19  1:54                               ` Joel Fernandes
2020-02-25  3:44               ` Aaron Lu
2020-02-25  5:32                 ` Aubrey Li
2020-02-25  7:34                   ` Aaron Lu
2020-02-25 10:40                     ` Aubrey Li
2020-02-25 11:21                       ` Aaron Lu
2020-02-25 13:41                         ` Aubrey Li
     [not found]                 ` <CANaguZD205ccu1V_2W-QuMRrJA9SjJ5ng1do4NCdLy8NDKKrbA@mail.gmail.com>
2020-02-26  3:13                   ` Aaron Lu
2020-02-26  7:21                   ` Aubrey Li
     [not found]                     ` <CANaguZDQZg-Z6aNpeLcjQ-cGm3X8CQOkZ_hnJNUyqDRM=yVDFQ@mail.gmail.com>
2020-02-27  4:45                       ` Aubrey Li
2020-02-28 23:55                       ` Tim Chen
2020-03-03 14:59                         ` Li, Aubrey
2020-03-03 23:54                           ` Li, Aubrey
2020-03-05  4:33                             ` Aaron Lu
2020-03-05  6:10                               ` Li, Aubrey
2020-03-05  8:52                                 ` Aaron Lu
2020-02-27  2:04                   ` Aaron Lu
2020-02-27 14:10                     ` Phil Auld
2020-02-27 14:37                       ` Aubrey Li
2020-02-28  2:54                       ` Aaron Lu
2020-03-05 13:45                         ` Aubrey Li
2020-03-06  2:41                           ` Aaron Lu
2020-03-06 18:06                             ` Tim Chen
2020-03-06 18:33                               ` Phil Auld
2020-03-06 21:44                                 ` Tim Chen
2020-03-07  3:13                                   ` Aaron Lu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200317005521.GA8244@google.com \
    --to=joel@joelfernandes.org \
    --cc=aaron.lwe@gmail.com \
    --cc=aubrey.intel@gmail.com \
    --cc=dfaggioli@suse.com \
    --cc=fweisbec@gmail.com \
    --cc=jdesfossez@digitalocean.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pauld@redhat.com \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=valentin.schneider@arm.com \
    --cc=vpillai@digitalocean.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).