linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dario Faggioli <dfaggioli@suse.com>
To: Peter Zijlstra <peterz@infradead.org>,
	vpillai <vpillai@digitalocean.com>
Cc: Nishanth Aravamudan <naravamudan@digitalocean.com>,
	Julien Desfossez <jdesfossez@digitalocean.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	mingo@kernel.org, tglx@linutronix.de, pjt@google.com,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com,
	Phil Auld <pauld@redhat.com>, Aaron Lu <aaron.lwe@gmail.com>,
	Aubrey Li <aubrey.intel@gmail.com>,
	aubrey.li@linux.intel.com,
	Valentin Schneider <valentin.schneider@arm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Joel Fernandes <joelaf@google.com>,
	joel@joelfernandes.org, Alexander Graf <graf@amazon.de>
Subject: Re: [RFC PATCH 00/13] Core scheduling v5
Date: Sat, 09 May 2020 16:35:34 +0200	[thread overview]
Message-ID: <b540686733f52135b6153a3b4480a25f7f552386.camel@suse.com> (raw)
In-Reply-To: <20200414142152.GV20730@hirez.programming.kicks-ass.net>

[-- Attachment #1: Type: text/plain, Size: 5824 bytes --]

On Tue, 2020-04-14 at 16:21 +0200, Peter Zijlstra wrote:
> On Wed, Mar 04, 2020 at 04:59:50PM +0000, vpillai wrote:
> > 
> > - Investigate the source of the overhead even when no tasks are
> > tagged:
> >   https://lkml.org/lkml/2019/10/29/242
> 
>  - explain why we're all still doing this ....
> 
> Seriously, what actual problems does it solve? The patch-set still
> isn't
> L1TF complete and afaict it does exactly nothing for MDS.
> 
Hey Peter! Late to the party, I know...

But I'm replying anyway. At least, you'll have the chance to yell at me
for this during OSPM. ;-P

> Like I've written many times now, back when the world was simpler and
> all we had to worry about was L1TF, core-scheduling made some sense,
> but
> how does it make sense today?
> 
Indeed core-scheduling alone doesn't even completely solve L1TF. There
are the interrupts and the VMEXITs issues. Both are being discussed in
this thread and, FWIW, my personal opinion is that the way to go is
what Alex says here:

<79529592-5d60-2a41-fbb6-4a5f8279f998@amazon.com>

(E.g., when he mentions solution 4 "Create a "safe" page table which
runs with HT enabled", etc).

But let's stick to your point: if it were only for L1TF, then fine, but
it's all pointless because of MDS. My answer to this is very much
focused on my usecase, which is virtualization. I know you hate us, and
you surely have your good reasons, but you know... :-)

Correct me if I'm wrong, but I think that the "nice" thing of L1TF is
that it allows a VM to spy on another VM or on the host, but it does
not allow a regular task to spy on another task or on the kernel (well,
it would, but it's easily mitigated).

The bad thing about MDS is that it instead allow *all* of that.

Now, one thing that we absolutely want to avoid in virt is that a VM is
able to spy on other VMs or on the host. Sure, we also care about tasks
running in our VMs to be safe, but, really, inter-VM and VM-to-host
isolation is the primary concern of an hypervisor.

And how a VM (or stuff running inside a VM) can spy on another VM or on
the host, via L1TF or MDS? Well, if the attacker VM and the victim VM
--or if the attacker VM and the host-- are running on the same core. If
they're not, it can't... which is basically an L1TF-only looking
scenario.

So, in virt, core-scheduling:
1) is the *only* way (aside from no-EPT) to prevent attacker VM to spy 
   on victim VM, if they're running concurrently, both in guest mode,
   on the same core (and that's, of course, because with
   core-scheduling they just won't be doing that :-) )
2) interrupts and VMEXITs needs being taken care of --which was the 
   case already when, as you said "we had only L1TF". Once that is done
   we will effectively prevent all VM to VM and VM to host attack
   scenarios.

Sure, it will still be possible, for instance, for task_A in VM1 to spy
on task_B, also in VM1. This seems to be, AFAIUI, Joel's usecase, so
I'm happy to leave it to him to defend that, as he's doing already (but
indeed I'm very happy to see that it is also getting attention).

Now, of course saying anything like "works for my own usecase so let's
go for it" does not fly. But since you were asking whether and how this
feature could make sense today, suppose that:
1) we get core-scheduling,
2) we find a solution for irqs and VMEXITs, as we would have to if 
   there was only L1TF,
3) we manage to make the overhead of core-scheduling close to zero 
   when it's there (I mean, enabled at compile time) but not used (I
   mean, no tagging of tasks, or whatever).

That would mean that virt people can enable core-scheduling, and
achieve good inter-VM and VM-to-host isolation, without imposing
overhead to other use cases, that would leave core-scheduling disabled.

And this is something that I would think it makes sense.

Of course, we're not there... because even when this series will give
us point 1, we will also need 2 and we need to make sure we also
satisfy 3 (and we weren't, last time I checked ;-P).

But I think it's worth keeping trying.

I'd also add a couple of more ideas, still about core-scheduling in
virt, but from a different standpoint than security:
- if I tag vcpu0 and vcpu1 together[*], then vcpu2 and vcpu3 together,
  then vcpu4 and vcpu5 together, then I'm sure that each pair will
  always be scheduled on the same core. At which point I can define
  an SMT virtual topology, for the VM, that will make sense, even
  without pinning the vcpus;
- if I run VMs from different customers, when vcpu2 of VM1 and vcpu1
  of VM2 run on the same core, they influence each others' performance.
  If, e.g., I bill basing on time spent on CPUs, it means customer
  A's workload, running in VM1, may influence the billing of customer
  B, who owns VM2. With core scheduling, if I tag all the vcpus of each
  VM together, I won't have this any longer.

[*] with "tag together" I mean let them have the same tag which, ATM
would be "put them in the same cgroup and enable cpu.tag".

Whether or not these make sense, e.g., performance wise, it's a bid
hard to tell, with the feature not-yet finalized... But I've started
doing some preliminary measurements already. Hopefully, they'll be
ready by Monday.  

So that's it. I hope this gives you enough material to complain about
during OSPM. At least, given the event is virtual, I won't get any
microphone box (or, worse, frozen sharks!) thrown at me in anger! :-D

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2020-05-09 14:35 UTC|newest]

Thread overview: 110+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-04 16:59 [RFC PATCH 00/13] Core scheduling v5 vpillai
2020-03-04 16:59 ` [RFC PATCH 01/13] sched: Wrap rq::lock access vpillai
2020-03-04 16:59 ` [RFC PATCH 02/13] sched: Introduce sched_class::pick_task() vpillai
2020-03-04 16:59 ` [RFC PATCH 03/13] sched: Core-wide rq->lock vpillai
2020-04-01 11:42   ` [PATCH] sched/arm64: store cpu topology before notify_cpu_starting Cheng Jian
2020-04-01 13:23     ` Valentin Schneider
2020-04-06  8:00       ` chengjian (D)
2020-04-09  9:59       ` Sudeep Holla
2020-04-09 10:32         ` Valentin Schneider
2020-04-09 11:08           ` Sudeep Holla
2020-04-09 17:54     ` Joel Fernandes
2020-04-10 13:49       ` chengjian (D)
2020-04-14 11:36   ` [RFC PATCH 03/13] sched: Core-wide rq->lock Peter Zijlstra
2020-04-14 21:35     ` Vineeth Remanan Pillai
2020-04-15 10:55       ` Peter Zijlstra
2020-04-14 14:32   ` Peter Zijlstra
2020-03-04 16:59 ` [RFC PATCH 04/13] sched/fair: Add a few assertions vpillai
2020-03-04 16:59 ` [RFC PATCH 05/13] sched: Basic tracking of matching tasks vpillai
2020-03-04 16:59 ` [RFC PATCH 06/13] sched: Update core scheduler queue when taking cpu online/offline vpillai
2020-03-04 16:59 ` [RFC PATCH 07/13] sched: Add core wide task selection and scheduling vpillai
2020-04-14 13:35   ` Peter Zijlstra
2020-04-16 23:32     ` Tim Chen
2020-04-17 10:57       ` Peter Zijlstra
2020-04-16  3:39   ` Chen Yu
2020-04-16 19:59     ` Vineeth Remanan Pillai
2020-04-17 11:18     ` Peter Zijlstra
2020-04-19 15:31       ` Chen Yu
2020-05-21 23:14   ` Joel Fernandes
2020-05-21 23:16     ` Joel Fernandes
2020-05-22  2:35     ` Joel Fernandes
2020-05-22  3:44       ` Aaron Lu
2020-05-22 20:13         ` Joel Fernandes
2020-03-04 16:59 ` [RFC PATCH 08/13] sched/fair: wrapper for cfs_rq->min_vruntime vpillai
2020-03-04 16:59 ` [RFC PATCH 09/13] sched/fair: core wide vruntime comparison vpillai
2020-04-14 13:56   ` Peter Zijlstra
2020-04-15  3:34     ` Aaron Lu
2020-04-15  4:07       ` Aaron Lu
2020-04-15 21:24         ` Vineeth Remanan Pillai
2020-04-17  9:40           ` Aaron Lu
2020-04-20  8:07             ` [PATCH updated] sched/fair: core wide cfs task priority comparison Aaron Lu
2020-04-20 22:26               ` Vineeth Remanan Pillai
2020-04-21  2:51                 ` Aaron Lu
2020-04-24 14:24                   ` [PATCH updated v2] " Aaron Lu
2020-05-06 14:35                     ` Peter Zijlstra
2020-05-08  8:44                       ` Aaron Lu
2020-05-08  9:09                         ` Peter Zijlstra
2020-05-08 12:34                           ` Aaron Lu
2020-05-14 13:02                             ` Peter Zijlstra
2020-05-14 22:51                               ` Vineeth Remanan Pillai
2020-05-15 10:38                                 ` Peter Zijlstra
2020-05-15 10:43                                   ` Peter Zijlstra
2020-05-15 14:24                                   ` Vineeth Remanan Pillai
2020-05-16  3:42                               ` Aaron Lu
2020-05-22  9:40                                 ` Aaron Lu
2020-06-08  1:41                               ` Ning, Hongyu
2020-03-04 17:00 ` [RFC PATCH 10/13] sched: Trivial forced-newidle balancer vpillai
2020-03-04 17:00 ` [RFC PATCH 11/13] sched: migration changes for core scheduling vpillai
2020-06-12 13:21   ` Joel Fernandes
2020-06-12 21:32     ` Vineeth Remanan Pillai
2020-06-13  2:25       ` Joel Fernandes
2020-06-13 18:59         ` Vineeth Remanan Pillai
2020-06-15  2:05           ` Li, Aubrey
2020-03-04 17:00 ` [RFC PATCH 12/13] sched: cgroup tagging interface " vpillai
2020-06-26 15:06   ` Vineeth Remanan Pillai
2020-03-04 17:00 ` [RFC PATCH 13/13] sched: Debug bits vpillai
2020-03-04 17:36 ` [RFC PATCH 00/13] Core scheduling v5 Tim Chen
2020-03-04 17:42   ` Vineeth Remanan Pillai
2020-04-14 14:21 ` Peter Zijlstra
2020-04-15 16:32   ` Joel Fernandes
2020-04-17 11:12     ` Peter Zijlstra
2020-04-17 12:35       ` Alexander Graf
2020-04-17 13:08         ` Peter Zijlstra
2020-04-18  2:25       ` Joel Fernandes
2020-05-09 14:35   ` Dario Faggioli [this message]
     [not found] ` <38805656-2e2f-222a-c083-692f4b113313@linux.intel.com>
2020-05-09  3:39   ` Ning, Hongyu
2020-05-14 20:51     ` FW: " Gruza, Agata
2020-05-10 23:46 ` [PATCH RFC] Add support for core-wide protection of IRQ and softirq Joel Fernandes (Google)
2020-05-11 13:49   ` Peter Zijlstra
2020-05-11 14:54     ` Joel Fernandes
2020-05-20 22:26 ` [PATCH RFC] sched: Add a per-thread core scheduling interface Joel Fernandes (Google)
2020-05-21  4:09   ` [PATCH RFC] sched: Add a per-thread core scheduling interface(Internet mail) benbjiang(蒋彪)
2020-05-21 13:49     ` Joel Fernandes
2020-05-21  8:51   ` [PATCH RFC] sched: Add a per-thread core scheduling interface Peter Zijlstra
2020-05-21 13:47     ` Joel Fernandes
2020-05-21 20:20       ` Vineeth Remanan Pillai
2020-05-22 12:59       ` Peter Zijlstra
2020-05-22 21:35         ` Joel Fernandes
2020-05-24 14:00           ` Phil Auld
2020-05-28 14:51             ` Joel Fernandes
2020-05-28 17:01             ` Peter Zijlstra
2020-05-28 18:17               ` Phil Auld
2020-05-28 18:34                 ` Phil Auld
2020-05-28 18:23               ` Joel Fernandes
2020-05-21 18:31   ` Linus Torvalds
2020-05-21 20:40     ` Joel Fernandes
2020-05-21 21:58       ` Jesse Barnes
2020-05-22 16:33         ` Linus Torvalds
2020-05-20 22:37 ` [PATCH RFC v2] Add support for core-wide protection of IRQ and softirq Joel Fernandes (Google)
2020-05-20 22:48 ` [PATCH RFC] sched: Use sched-RCU in core-scheduling balancing logic Joel Fernandes (Google)
2020-05-21 22:52   ` Paul E. McKenney
2020-05-22  1:26     ` Joel Fernandes
2020-06-25 20:12 ` [RFC PATCH 00/13] Core scheduling v5 Vineeth Remanan Pillai
2020-06-26  1:47   ` Joel Fernandes
2020-06-26 14:36     ` Vineeth Remanan Pillai
2020-06-26 15:10       ` Joel Fernandes
2020-06-26 15:12         ` Joel Fernandes
2020-06-27 16:21         ` Joel Fernandes
2020-06-30 14:11         ` Phil Auld
2020-06-29 12:33   ` Li, Aubrey
2020-06-29 19:41     ` Vineeth Remanan Pillai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b540686733f52135b6153a3b4480a25f7f552386.camel@suse.com \
    --to=dfaggioli@suse.com \
    --cc=aaron.lwe@gmail.com \
    --cc=aubrey.intel@gmail.com \
    --cc=aubrey.li@linux.intel.com \
    --cc=fweisbec@gmail.com \
    --cc=graf@amazon.de \
    --cc=jdesfossez@digitalocean.com \
    --cc=joel@joelfernandes.org \
    --cc=joelaf@google.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pauld@redhat.com \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=valentin.schneider@arm.com \
    --cc=vpillai@digitalocean.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).