From: Thomas Gleixner <tglx@linutronix.de>
To: Tim Chen <tim.c.chen@linux.intel.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Julien Desfossez <jdesfossez@digitalocean.com>,
	Peter Zijlstra <peterz@infradead.org>
Cc: "Vineeth Remanan Pillai" <vpillai@digitalocean.com>,
	"Aubrey Li" <aubrey.intel@gmail.com>,
	"Nishanth Aravamudan" <naravamudan@digitalocean.com>,
	"Ingo Molnar" <mingo@kernel.org>, "Paul Turner" <pjt@google.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Linux List Kernel Mailing" <linux-kernel@vger.kernel.org>,
	"Dario Faggioli" <dfaggioli@suse.com>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Greg Kerr" <kerrnel@google.com>, "Phil Auld" <pauld@redhat.com>,
	"Aaron Lu" <aaron.lwe@gmail.com>,
	"Valentin Schneider" <valentin.schneider@arm.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Luck, Tony" <tony.luck@intel.com>
Subject: Re: [RFC PATCH v4 00/19] Core scheduling v4
Date: Tue, 17 Mar 2020 22:17:47 +0100
Message-ID: <87imj2bs04.fsf@nanos.tec.linutronix.de>
In-Reply-To: <ee268494-c35e-422f-1aaf-baab12191c38@linux.intel.com>

Tim,

Tim Chen <tim.c.chen@linux.intel.com> writes:
>> However, I have the following questions; in particular, there are 4
>> scenarios where I feel the current patches do not resolve MDS/L1TF.
>> Would you guys please share your thoughts?
>> 
>> 1. HT1 is running either hostile guest or host code.
>>    HT2 is running an interrupt handler (victim).
>> 
>>    In this case I see there is a possible MDS issue between HT1 and HT2.
>
> Core scheduling mitigates the userspace-to-userspace attacks via MDS
> between the HT siblings.  It does not prevent the userspace-to-kernel-space
> attack.  That will have to be mitigated via other means, e.g. redirecting
> interrupts to a core that doesn't run potentially unsafe code.

Which is in some cases simply impossible. Think multiqueue devices with
managed interrupts. You can't change the affinity of those. Neither can
you do that for the per-CPU timer interrupt.
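
Where redirection is possible at all, it amounts to writing a CPU mask
to /proc/irq/<N>/smp_affinity. A minimal userspace sketch follows; the
IRQ number 42 and the mask are made up purely for illustration, and
managed interrupts simply refuse the write:

  /* Sketch only: steer a (non-managed) interrupt to CPUs 0-1 by
   * writing a hex CPU mask to procfs. IRQ number and mask are
   * illustrative, not taken from any real system. */
  #include <stdio.h>

  int main(void)
  {
          FILE *f = fopen("/proc/irq/42/smp_affinity", "w");

          if (!f) {
                  perror("fopen");
                  return 1;
          }
          /* Hex CPU mask: 0x3 = CPU0 | CPU1 */
          if (fprintf(f, "3\n") < 0)
                  perror("fprintf");
          fclose(f);
          return 0;
  }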

>> 2. HT1 is executing hostile host code, and gets interrupted by a victim
>>    interrupt. HT2 is idle.
>
> Similar to above.

No. It's the same HT, so not similar at all.

>>    In this case, I see there is a possible MDS issue between interrupt and
>>    the host code on the same HT1.
>
> The CPU buffers are cleared before returning to the hostile host code.  So
> MDS shouldn't be an issue if the interrupt handler and the hostile code
> run on the same HT thread.

OTOH, that's mostly correct. Aside from the "shouldn't" wording:

  MDS is _not_ an issue in this case when the full mitigation is enabled.

Assuming that I have no less information about MDS than you have :)
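
For reference, the full mitigation in question is the VERW-based CPU
buffer clear on the exit-to-userspace path. A standalone sketch, modeled
on the kernel's mds_clear_cpu_buffers() (the 0x2b selector is the x86_64
__USER_DS so the snippet works from userspace; the kernel variant uses
__KERNEL_DS):

  /* Sketch of the VERW-based buffer clear used by the MDS mitigation,
   * modeled on mds_clear_cpu_buffers(). On CPUs enumerating MD_CLEAR,
   * VERW with any valid selector also flushes the store buffers, fill
   * buffers and load ports as a side effect. */
  #include <stdint.h>

  static inline void clear_cpu_buffers(void)
  {
          /* 0x2b = x86_64 user data segment selector (__USER_DS). */
          static const uint16_t ds = 0x2b;

          asm volatile("verw %[ds]" : : [ds] "m" (ds) : "cc");
  }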

>> 3. HT1 is executing hostile guest code, HT2 is executing a victim interrupt
>>    handler on the host.
>> 
>>    In this case, I see there is a possible L1TF issue between HT1 and HT2.
>>    This issue does not happen if HT1 is running host code, since the host
>>    kernel takes care of inverting PTE bits.
>
> The interrupt handler will run with PTEs inverted.  So I don't think
> there's a leak via L1TF in this scenario.

How so?

Host memory is attackable when one of the sibling SMT threads runs in
host OS (hypervisor) context and the other in guest context.

HT1 is in guest mode and attacking (has control over PTEs). HT2 is
running in host mode and executes an interrupt handler. The host PTE
inversion does not matter in this scenario at all.

So HT1 can very well see data which is brought into the shared L1 by
HT2.

The only way to mitigate that, aside from disabling HT, is disabling EPT.
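
For clarity, this is what the host-side PTE inversion actually does, and
why it is irrelevant here: the PFN bits of a not-present PTE are
inverted so a speculative L1D lookup resolves outside populated memory,
but a guest controls its own page tables, so nothing the host does to
its PTEs constrains a guest-side attacker. A conceptual sketch only; the
real code lives in arch/x86/include/asm/pgtable-invert.h and differs in
detail:

  /* Conceptual sketch of L1TF PTE inversion; simplified, not the
   * verbatim kernel code. */
  #include <stdint.h>

  #define _PAGE_PRESENT   0x001ULL
  #define PTE_PFN_MASK    0x000ffffffffff000ULL

  /* Only PTEs which are not present are exploitable via L1TF. */
  static inline int pte_needs_invert(uint64_t val)
  {
          return val && !(val & _PAGE_PRESENT);
  }

  /* Invert the PFN bits of a not-present PTE so a speculative L1D
   * lookup points at an address outside populated memory. */
  static inline uint64_t pte_invert(uint64_t val)
  {
          if (pte_needs_invert(val))
                  val ^= PTE_PFN_MASK;
          return val;
  }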

>> 4. HT1 is idle, and HT2 is running a victim process. Now HT1 starts running
>>    hostile code in guest or host mode. HT2 is being forced idle. However,
>>    there is an overlap between HT1 starting to execute hostile code and
>>    HT2's victim process getting scheduled out.
>>    Speaking to Vineeth, we discussed an idea to monitor the core_sched_seq
>>    counter of the sibling being idled to detect that it is now idle.
>>    However, we discussed today that, looking at this data, it is not really
>>    an issue since the window is so small.

If the victim HT is kicked out of execution with an IPI, then the overlap
depends on the contexts:

        HT1 (attack)		HT2 (victim)

 A      idle -> user space      user space -> idle

 B      idle -> user space      guest -> idle

 C      idle -> guest           user space -> idle

 D      idle -> guest           guest -> idle

The IPI from HT1 brings HT2 immediately into the kernel when HT2 is in
host user mode, or forces an immediate VMEXIT when HT2 is in guest mode
(see the sketch after the list below).

#A On return from handling the IPI HT2 immediately reschedules to idle.
   To have an overlap the return to user space on HT1 must be faster.

#B Coming back from VMEXIT into schedule/idle might take slightly longer
   than #A.

#C Similar to #A, but reentering guest mode in HT1 after sending the IPI
   will probably take longer.

#D Similar to #C if you make the assumption that VMEXIT on HT2 and
   rescheduling into idle is not significantly slower than reaching
   VMENTER after sending the IPI.
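
As a sketch of the kick itself: smp_send_reschedule() is the existing
kernel API, but the wiring around it here is hypothetical and not the
patch-set code; the real patches also set TIF_NEED_RESCHED on the
sibling's current task, e.g. via resched_curr(), so the IPI is not
spurious:

  /* Kernel-side sketch, not the actual core-scheduling code: the
   * forced-idle kick is ultimately a rescheduling IPI to the sibling.
   * Host user space takes it as an interrupt, a guest as a VMEXIT. */
  #include <linux/smp.h>

  static void kick_sibling(int sibling_cpu)
  {
          smp_send_reschedule(sibling_cpu);
  }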

In all cases the data exposed by a potential overlap shouldn't be that
interesting (e.g. scheduler state), but that obviously depends on what
the attacker is looking for.

But all of them are still problematic vs. interrupts/soft interrupts
which can happen on HT2 on the way to idle or while idling, i.e. #3 of
the original case list. #A and #B are only affected by MDS, #C and #D by
both MDS and L1TF (if EPT is in use).

>> My concern is now cases 1 and 2, for which there does not seem to be a
>> good solution short of disabling interrupts. For 3, we could still
>> possibly do something on the guest side, such as using shadow page
>> tables. Any thoughts on all this?

#1 can be partially mitigated by changing interrupt affinities, which is
   not always possible and, in the case of the local timer interrupt,
   completely impossible. It's not only the timer interrupt itself; the
   timer callbacks, which can run in the softirq on return from interrupt,
   might be valuable attack surface depending on the nature of the
   callbacks, the random entropy timer being just a random example.

#2 is a non-issue if the MDS mitigation is on, i.e. buffers are flushed
   before returning to user space. It's pretty much a non-SMT case,
   i.e. a same-CPU user-to-kernel attack.

#3 can only be fully mitigated by disabling EPT.

#4 Assuming that my assumptions about the transition times are correct,
   which I think they are, #4 pretty much reduces to #1.

Hope that helps.

Thanks,

        tglx

