From: Alexandre Chartre <alexandre.chartre@oracle.com>
To: pbonzini@redhat.com, rkrcmar@redhat.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
	dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, kvm@vger.kernel.org, x86@kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: konrad.wilk@oracle.com, jan.setjeeilers@oracle.com,
	liran.alon@oracle.com, jwadams@google.com
Subject: Re: [RFC KVM 00/27] KVM Address Space Isolation
Date: Wed, 15 May 2019 14:52:50 +0200
Message-ID: <b5ebe77f-14f5-5f87-a4bd-8befb71a9969@oracle.com>
In-Reply-To: <1557758315-12667-1-git-send-email-alexandre.chartre@oracle.com>


Thanks all for your replies and comments. I have tried to summarize the main
feedback below and to define next steps.

But first, let me clarify what should happen when exiting the KVM isolated
address space (i.e. when we need to access the full kernel). There was
some confusion because this was not clearly described in the cover letter.
Thanks to Liran for this better explanation:

   When a hyperthread needs to switch from the KVM isolated address space to
   the full kernel address space, it should first kick all sibling hyperthreads
   out of the guest and only then safely switch to the full kernel address
   space. Only once all sibling hyperthreads are running with the KVM isolated
   address space is it safe to enter the guest again.

   The main point of this address space is to avoid kicking all sibling
   hyperthreads on *every* VM-Exit from the guest, and instead to kick them
   only when switching address space. The assumption is that the vast majority
   of exits can be handled in the KVM isolated address space and therefore do
   not require kicking the sibling hyperthreads out of the guest.

   “kick” in this context means sending an IPI to all sibling hyperthreads.
   This IPI will cause these sibling hyperthreads to exit from guest to host
   on EXTERNAL_INTERRUPT and wait for a condition that again allows entering
   the guest. This condition is met once all hyperthreads of the CPU core are
   again running only within the KVM isolated address space of this VM.
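
Purely as an illustration of the flow above (this is not code from the
series), a minimal sketch of the "kick the siblings, then only re-enter the
guest once the whole core is isolated again" logic could look like the
following. The helper names and the per-core cpumask are assumptions, not
existing interfaces:

#include <linux/smp.h>
#include <linux/topology.h>
#include <linux/cpumask.h>

/* The IPI handler has nothing to do: delivering the IPI already forces
 * the sibling hyperthread out of the guest on EXTERNAL_INTERRUPT. */
static void kvm_isolation_kick(void *info)
{
}

/* Called, with preemption disabled, before switching from the KVM
 * isolated address space back to the full kernel address space. */
static void kvm_kick_sibling_hyperthreads(void)
{
	const struct cpumask *siblings =
		topology_sibling_cpumask(smp_processor_id());

	/* smp_call_function_many() skips the calling CPU and, with
	 * wait=true, returns only once every sibling has handled the
	 * IPI, i.e. has exited the guest. */
	smp_call_function_many(siblings, kvm_isolation_kick, NULL, true);
}

/* Re-entering the guest is allowed only once every hyperthread of the
 * core runs with the KVM isolated address space of this VM again.
 * kvm_isolated_cpus would be maintained on isolation enter/exit. */
static bool kvm_can_enter_guest(const struct cpumask *kvm_isolated_cpus)
{
	return cpumask_equal(kvm_isolated_cpus,
			     topology_sibling_cpumask(smp_processor_id()));
}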


Feedback
========

Page-table Management

- Need to clean up the terminology: mm vs. page-table. It looks like we just
   need a KVM page-table, not a KVM mm.

- Interfaces for creating and managing the page-table should be provided by
   the kernel, not implemented in KVM. KVM shouldn't call low-level kernel
   memory management functions directly (one possible interface shape is
   sketched below).
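
To make that point concrete, here is one possible, entirely hypothetical,
shape for such a kernel-provided API. None of these functions exist today
and the names are made up:

struct isolated_pgtable;	/* opaque to KVM, owned by mm/ */

/* Create/destroy a minimal page-table seeded from the kernel PGD. */
struct isolated_pgtable *isolated_pgtable_create(void);
void isolated_pgtable_destroy(struct isolated_pgtable *pgt);

/* Mirror (or remove) an existing kernel mapping for [addr, addr + size). */
int isolated_pgtable_map(struct isolated_pgtable *pgt, void *addr,
			 size_t size);
void isolated_pgtable_unmap(struct isolated_pgtable *pgt, void *addr,
			    size_t size);

/* Load the isolated CR3 / restore the full kernel CR3. */
void isolated_pgtable_enter(struct isolated_pgtable *pgt);
void isolated_pgtable_exit(void);

With something along these lines, KVM would never have to touch
PGD/P4D/PUD/PMD/PTE entries itself.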

KVM Isolation Enter/Exit

- Changing CR3 in #PF could be a natural extension since #PF can already
   change page-tables, but this needs a very coherent design and strong
   rules.

- Reduce kernel code running without the whole kernel mapping to the
   minimum.

- Avoid using current and task_struct while running with KVM page table.

- Ensure KVM page-table is not used with vmalloc.

- Try to avoid copying parts of the vmalloc page tables. This interacts
   unpleasantly with having the kernel stack mapped (with CONFIG_VMAP_STACK,
   task stacks are allocated from the vmalloc area). We can freely use a
   different stack (the IRQ stack, for example) as long as we don't schedule,
   but that means we can't run preemptible code.

- Potential issues with tracing, kprobes... A solution would be to
   compile the isolated code with tracing off.

- Better centralize KVM isolation exit on IRQ, NMI, MCE, faults... Switch
   back to the full kernel before switching to the IRQ stack, or shortly
   after.

- Can we disable IRQs while running with the KVM page-table?

   For IRQs it's somewhat feasible, but not for NMIs, since NMIs are
   unblocked on VMX immediately after VM-Exit.

   Exits due to INTR, NMI and #MC are considered high priority and are
   serviced before re-enabling IRQs and preemption[1].  All other exits
   are handled after IRQs and preemption are re-enabled.

   A decent number of exit handlers are quite short, but many exit
   handlers require significantly longer flows. In short, leaving
   IRQs disabled across all exits is not practical.

   It makes sense to pinpoint exactly which exits are:
   a) in the hot path for the use case (configuration), and
   b) able to be handled fast enough to run with IRQs disabled.

   Generating that list might allow us to tightly bound the contents
   of kvm_mm and sidestep many of the corner cases, i.e. select VM-Exits
   are handled with IRQs disabled using KVM's mm, while "slow" VM-Exits
   go through the full context switch (as sketched below).
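
As an illustration of that last point (again, not code from the series), the
exit dispatch could look roughly like the sketch below. handle_fast_exit(),
handle_slow_exit() and kvm_isolation_exit() are hypothetical helpers, and
the exit reasons on the fast list are only placeholders; which exits belong
there is exactly what needs to be measured:

#include <asm/vmx.h>		/* EXIT_REASON_* */
#include <linux/kvm_host.h>

static int vcpu_handle_exit(struct kvm_vcpu *vcpu, u32 exit_reason)
{
	switch (exit_reason) {
	case EXIT_REASON_CPUID:
	case EXIT_REASON_MSR_READ:
	case EXIT_REASON_MSR_WRITE:
		/* Fast path: short handlers which don't need kernel
		 * secrets, run with IRQs disabled on the KVM page-table. */
		return handle_fast_exit(vcpu, exit_reason);
	default:
		/* Slow path: kick the sibling hyperthreads, switch CR3
		 * back to the full kernel page-table, then re-enable
		 * IRQs and preemption before calling the regular handler. */
		kvm_isolation_exit(vcpu);
		local_irq_enable();
		preempt_enable();
		return handle_slow_exit(vcpu, exit_reason);
	}
}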


KVM Page Table Content

- Check and reduce core mappings (kernel text size, cpu_entry_area,
   espfix64, IRQ stack...)

- Check and reduce percpu mappings; percpu memory can contain secrets (e.g.
   the percpu random pool). See the sketch below.
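
For the percpu item, the idea would be to copy selected per-cpu variables
rather than the whole per-cpu area. In the sketch below, kvm_isolation_map()
is a hypothetical helper standing in for "mirror this kernel mapping into
the KVM page-table", and the two variables used are only placeholders; which
per-cpu data actually needs to be mapped is precisely what has to be audited:

#include <asm/desc.h>		/* gdt_page */
#include <asm/processor.h>	/* cpu_tss_rw */
#include <linux/percpu.h>

static int kvm_isolation_map_percpu(int cpu)
{
	int err;

	/* Map only what the entry/exit path needs, not the whole
	 * per-cpu area with its secrets. */
	err = kvm_isolation_map(per_cpu_ptr(&cpu_tss_rw, cpu),
				sizeof(struct tss_struct));
	if (err)
		return err;

	return kvm_isolation_map(per_cpu_ptr(&gdt_page, cpu),
				 sizeof(struct gdt_page));
}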


Next Steps
==========

I will investigate Sean's suggestion to see which VM-Exits can be handled
fast enough so that they can run with IRQs disabled (fast VM-Exits),
and which slow VM-Exits are in the hot path.

So I will work on a new POC which just handles fast VM-Exits with IRQs
disabled. This should greatly reduce the mappings required in the KVM page
table. I will also try to have just a KVM page-table and not a KVM mm.

After this new POC, we should be able to evaluate the need for handling
slow VM-Exits. And if there's an actual need, we can investigate how
to handle them with IRQs enabled.


Thanks,

alex.


