linux-kernel.vger.kernel.org archive mirror
From: Liran Alon <liran.alon@oracle.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	juerg.haefliger@hpe.com, deepa.srinivasan@oracle.com,
	Jim Mattson <jmattson@google.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	linux-mm <linux-mm@kvack.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	joao.m.martins@oracle.com, pradeep.vincent@oracle.com,
	Andi Kleen <ak@linux.intel.com>,
	Khalid Aziz <khalid.aziz@oracle.com>,
	kanth.ghatraju@oracle.com, Kees Cook <keescook@google.com>,
	jsteckli@os.inf.tu-dresden.de,
	Kernel Hardening <kernel-hardening@lists.openwall.com>,
	chris.hyser@oracle.com, Tyler Hicks <tyhicks@canonical.com>,
	John Haxby <john.haxby@oracle.com>, Jon Masters <jcm@redhat.com>
Subject: Re: Redoing eXclusive Page Frame Ownership (XPFO) with isolated CPUs in mind (for KVM to isolate its guests per CPU)
Date: Tue, 21 Aug 2018 17:01:57 +0300
Message-ID: <ED24D811-C740-417F-A443-B7A249F4FF4C@oracle.com>
In-Reply-To: <1534845423.10027.44.camel@infradead.org>


> On 21 Aug 2018, at 12:57, David Woodhouse <dwmw2@infradead.org> wrote:
> 
> Another alternative... I'm told POWER8 does an interesting thing with
> hyperthreading and gang scheduling for KVM. The host kernel doesn't
> actually *see* the hyperthreads at all, and KVM just launches the full
> set of siblings when it enters a guest, and gathers them again when any
> of them exits. That's definitely worth investigating as an option for
> x86, too.

I actually think that such a scheduling mechanism, which prevents leaking cache entries to sibling hyperthreads, should co-exist with the KVM address space isolation to fully mitigate L1TF and other similar vulnerabilities. The address space isolation should prevent code gadgets in VMExit handlers from loading arbitrary host memory into the cache. Once the VMExit code path switches to the full host address space, we should also make sure that no sibling hyperthread is running in the guest.

Focusing on the scheduling mechanism, we must make sure that when a logical processor runs guest code, all sibling logical processors run only code which does not populate the L1D cache with information unrelated to this VM. This includes forbidding one logical processor from running guest code while its sibling is running a host task such as a NIC interrupt handler.
Thus, when a vCPU thread exits the guest into the host and the VMExit handler reaches a code flow which could populate the L1D cache with such information, we should force the sibling logical processors to exit the guest, such that they will be allowed to resume only on a core for which we can promise that the L1D cache is free of information unrelated to this VM.
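
To make the rule above concrete, here is a minimal user-space sketch in C. It is not KVM code: the names (lp_mode, struct lp, siblings_to_kick, may_enter_guest) and the two-threads-per-core value are assumptions made up for illustration. LP_HOST_ISOLATED stands for a VMExit handler still running in the isolated address space, and LP_HOST_FULL for host code which may load unrelated data into L1D.

```c
#include <stdint.h>

#define THREADS_PER_CORE 2 /* assumption: two hyperthreads per core */

/* Hypothetical per-logical-processor modes, for illustration only. */
enum lp_mode {
	LP_IDLE,          /* nothing running */
	LP_GUEST,         /* running guest code */
	LP_HOST_ISOLATED, /* VMExit handler in the isolated address space */
	LP_HOST_FULL      /* host code that may fill L1D with unrelated data */
};

struct lp {
	enum lp_mode mode;
	unsigned int vm_id; /* meaningful only in LP_GUEST */
};

/*
 * Logical processor 'self' is about to switch to LP_HOST_FULL.
 * Return a bitmask of the siblings that are in guest mode and
 * therefore have to be forced out of the guest first.
 */
uint32_t siblings_to_kick(const struct lp *core, unsigned int self)
{
	uint32_t mask = 0;
	unsigned int i;

	for (i = 0; i < THREADS_PER_CORE; i++)
		if (i != self && core[i].mode == LP_GUEST)
			mask |= UINT32_C(1) << i;
	return mask;
}

/*
 * May logical processor 'self' (re)enter guest 'vm_id' on this core?
 * Not while a sibling runs full host code or a different VM.
 */
int may_enter_guest(const struct lp *core, unsigned int self,
		    unsigned int vm_id)
{
	unsigned int i;

	for (i = 0; i < THREADS_PER_CORE; i++) {
		if (i == self)
			continue;
		if (core[i].mode == LP_HOST_FULL)
			return 0;
		if (core[i].mode == LP_GUEST && core[i].vm_id != vm_id)
			return 0;
	}
	return 1;
}
```

A real implementation would also need to handle the races between the check and the actual guest entry, which this sketch ignores.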

At first, I created a patch series which attempts to implement such a mechanism in KVM. However, it became clear to me that this may need to be implemented in the scheduler itself. This is because:
1. It is difficult to handle all the new scheduling constraints only in KVM.
2. This mechanism should be relevant for any Type-2 hypervisor which runs inside Linux besides KVM (such as VMware Workstation or VirtualBox).
3. This mechanism could also be used to prevent future “core-cache-leaking” vulnerabilities from being exploited between processes of different security domains which run as siblings on the same core.

The main idea is a mechanism very similar to the "core scheduler" which Microsoft implemented to mitigate this vulnerability. The mechanism should work as follows:
1. Each CPU core will now be tagged with a "security domain id".
2. The scheduler will provide a mechanism to tag a task with a security domain id.
3. Tasks will inherit their security domain id from their parent task.
    3.1. The first task in the system will have a security domain id of 0. Thus, if nothing special is done, all tasks will be assigned a security domain id of 0.
4. Tasks will be able to allocate a new security domain id from the scheduler and assign it to another task dynamically.
5. The Linux scheduler will prevent scheduling tasks on a core with a different security domain id:
    5.0. A CPU core's security domain id will be set to the security domain id of the tasks which currently run on it.
    5.1. The scheduler will first attempt to schedule a task on a core with the required security domain id, if such a core exists.
    5.2. Otherwise, the scheduler will need to decide whether it wishes to kick all tasks running on some core in order to run the task with a different security domain id on that core.
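
The steps above could be modelled roughly as follows. This is only a user-space sketch in C, not scheduler code: every name in it (struct task, struct core, sd_alloc, sd_fork, sd_pick_core, sd_enqueue) is hypothetical, as is the assumption of two hardware threads per core. Treating an idle core as acceptable for any domain is also my assumption; the list above does not spell that case out.

```c
#include <assert.h>
#include <stddef.h>

#define THREADS_PER_CORE 2 /* assumption: two hyperthreads per core */
#define HOST_DOMAIN      0 /* step 3.1: id of the first task */

struct task { unsigned int domain_id; };            /* step 2 */
struct core { unsigned int domain_id;               /* step 1 */
	      unsigned int nr_running; };

static unsigned int next_domain_id = HOST_DOMAIN + 1;

/* Step 4: hand out a fresh security domain id. */
unsigned int sd_alloc(void)
{
	return next_domain_id++;
}

/* Step 3: a child inherits its parent's security domain id. */
void sd_fork(const struct task *parent, struct task *child)
{
	child->domain_id = parent->domain_id;
}

/*
 * Step 5.1: prefer a core already running the task's domain and
 * having a free hardware thread; otherwise take an idle core.
 * Returns the core index, or -1 when only step 5.2 (kicking the
 * tasks of some core) can make room.
 */
int sd_pick_core(const struct core *cores, size_t n, const struct task *t)
{
	int idle = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (cores[i].nr_running == 0) {
			if (idle < 0)
				idle = (int)i;
			continue;
		}
		if (cores[i].domain_id == t->domain_id &&
		    cores[i].nr_running < THREADS_PER_CORE)
			return (int)i;
	}
	return idle;
}

/* Step 5.0: the core takes the domain id of the tasks running on it. */
void sd_enqueue(struct core *c, const struct task *t)
{
	/* Step 5: never mix security domains on one core. */
	assert(c->nr_running == 0 || c->domain_id == t->domain_id);
	assert(c->nr_running < THREADS_PER_CORE);
	c->domain_id = t->domain_id;
	c->nr_running++;
}
```

With two cores, a host task would land on the first core, a vCPU task tagged with a freshly allocated id on the second one, and a vCPU task of yet another VM would get -1 from sd_pick_core(), which is where the policy decision of 5.2 comes in.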

The above mechanism can be used to mitigate the L1TF HT variant by simply assigning vCPU tasks a security domain id which is unique per VM and also different from the security domain id of the host, which is 0.

I would be glad to hear feedback on the above suggestion.
If this would be better discussed in a separate email thread, please say so and I will open a new thread.

Thanks,
-Liran


