* [MODERATED] KVM L1TF options
@ 2018-07-02 21:41 Alexander Graf
  2018-07-02 21:44 ` [MODERATED] " Dave Hansen
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Alexander Graf @ 2018-07-02 21:41 UTC (permalink / raw)
  To: speck


Howdy,

I've been wondering for a while what options we (realistically) have to
mitigate L1TF with SMT in the KVM case. So far I gathered:

1) Disable SMT
1.1) on boot
1.2) dynamically, magically

I guess we all agree that while it's an easy way out, it's not
necessarily the best user experience for mixed environments, where both
host Linux applications and guests are running. This typically happens
in any "secure" container environment, hyper-converged cloud, etc etc.


2) Gang scheduling for vcpus, disallow non-kvm processes to run with kvm

This seems to be quite tricky to get right from what I gather. I
remember IBM had similar approaches for HV KVM on PowerPC initially, but
eventually went with SMT1 on the host and SMTx in the guest, with KVM
doing the ganging.


3) Shadow page tables

We lose quite some performance, but it would be 100% safe. The biggest
problem is that we degrade single-threaded performance rather than
overall system performance.

Also with KPTI, shadow page tables started to perform much worse.


--


So I've been wondering whether there is any other option: We really
don't need the big shadow page table hammer. Instead, we can happily
allow the guest to use nested pages during runtime as long as the page
tables it is using are sanitized.

4) EPT with PT sanitization

We could map additional guest read only RAM in an unused window of the
guest physical address space as a page table cache. Then trap CR3
writes. On CR3 writes, check if we have a counter-page to the original
CR3 entry, point to our sanitized clone instead. Otherwise create one.

Then we can slowly propagate that clone similar to shadow page tables,
but we would never have to do the GPA->HVA->GPA dance. Instead, all we
need to do is check whether a PTE has the P bit set or not. If it does,
copy it. If it doesn't, just set the whole clone PTE to 0.
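In rough C, the sanitization rule for one page-table page boils down to something like the following sketch (the names and layout are made up for illustration, not taken from any existing KVM code):

```c
#include <stdint.h>
#include <stddef.h>

#define PTE_PRESENT   0x1ULL
#define PTES_PER_PAGE 512

/*
 * Sketch: build the sanitized clone of one guest page-table page.
 * A present entry is copied verbatim; a not-present entry is cleared
 * entirely, so none of its address bits survive to act as cache tag
 * hints for speculative loads.
 */
static void sanitize_pt_page(const uint64_t *guest_pt, uint64_t *clone_pt)
{
	for (size_t i = 0; i < PTES_PER_PAGE; i++)
		clone_pt[i] = (guest_pt[i] & PTE_PRESENT) ? guest_pt[i] : 0;
}
```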

The most annoying part about this approach is that we would need to
implement A and D bits by hand, because a) the HW walker can not write
to the clone PTE because it's mapped read-only and b) we should
propagate the bits immediately into the real PTE anyway.

Another idea Jörg came up with was that we could try to optimize the
sanitization even more using PML (Page Modification Log). Using that we
know which pages the guest modified, so if we find a page in there that
we have in our sanitized cache, we can just redo that whole page in one go.
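A sketch of how such a PML scan could look on a CR3-load/INVLPG exit (the tracking helpers here are hypothetical; the index convention follows the hardware's fill-downward scheme as I understand it):

```c
#include <stdint.h>
#include <stddef.h>

#define PML_ENTRIES  512
#define PAGE_MASK_4K (~0xfffULL)

/* Hypothetical tracking hooks: look up the sanitized clone for a guest
 * PT page (NULL if the GPA is not a cached page-table page), and redo
 * the whole clone page when the guest dirtied it. */
uint64_t *find_pt_clone(uint64_t gpa);
void resanitize_pt_page(uint64_t gpa, uint64_t *clone);

/*
 * Lazy propagation sketch: instead of write-protecting page-table
 * pages, scan the PML buffer on CR3-load/INVLPG exits and re-sanitize
 * only the cached PT pages the guest actually modified.  Hardware
 * fills the buffer downward, so valid entries sit above pml_index.
 */
static void scan_pml_on_exit(const uint64_t *pml, int pml_index)
{
	for (int i = PML_ENTRIES - 1; i > pml_index; i--) {
		uint64_t gpa = pml[i] & PAGE_MASK_4K;
		uint64_t *clone = find_pt_clone(gpa);

		if (clone)
			resanitize_pt_page(gpa, clone);
	}
}
```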

What I'm not quite sure of yet are page invalidations. Is there
something we can trap on for those? Otherwise we'd have to set all page
table pages to read-only in the EPT and that would end up very expensive.

So what do people think about the idea? Does it sound doable?


Alex



* [MODERATED] Re: KVM L1TF options
  2018-07-02 21:41 [MODERATED] KVM L1TF options Alexander Graf
@ 2018-07-02 21:44 ` Dave Hansen
  2018-07-02 22:02 ` Anthony Liguori
  2018-07-05 19:22 ` [MODERATED] " Jon Masters
  2 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2018-07-02 21:44 UTC (permalink / raw)
  To: speck


On 07/02/2018 02:41 PM, speck for Alexander Graf wrote:
> 1) Disable SMT

... plus the L1 flushes before entering guests.
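For reference, that flush is a single write to the architectural flush-command MSR right before VMENTER. A kernel-context sketch, with constant values as I understand the interface (calling this from userspace would fault; it is shown here only to make the mechanism concrete):

```c
#include <stdint.h>

/* Architectural MSR for the L1D flush mitigation (values assumed
 * from the documented interface; verify against the SDM). */
#define MSR_IA32_FLUSH_CMD 0x10bU
#define L1D_FLUSH          (1ULL << 0)

/*
 * Flush the entire L1 data cache just before entering the guest, so
 * that no host/other-guest secrets are left behind for an L1TF leak.
 * Privileged instruction: only valid in ring 0.
 */
static inline void l1d_flush(void)
{
	uint32_t lo = (uint32_t)L1D_FLUSH, hi = 0;

	__asm__ volatile("wrmsr" :: "c"(MSR_IA32_FLUSH_CMD),
				    "a"(lo), "d"(hi));
}
```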




* [MODERATED] Re: KVM L1TF options
  2018-07-02 21:41 [MODERATED] KVM L1TF options Alexander Graf
  2018-07-02 21:44 ` [MODERATED] " Dave Hansen
@ 2018-07-02 22:02 ` Anthony Liguori
  2018-07-02 22:28   ` [MODERATED] Re: ***UNCHECKED*** " Alexander Graf
                     ` (2 more replies)
  2018-07-05 19:22 ` [MODERATED] " Jon Masters
  2 siblings, 3 replies; 11+ messages in thread
From: Anthony Liguori @ 2018-07-02 22:02 UTC (permalink / raw)
  To: speck

speck for Alexander Graf <speck@linutronix.de> writes:
> 
> Howdy,
> 
> I've been wondering for a while what options we (realistically) have to
> mitigate L1TF with SMT in the KVM case. So far I gathered:
> 
> 1) Disable SMT
> 1.1) on boot
> 1.2) dynamically, magically
> 
> I guess we all agree that while it's an easy way out, it's not
> necessarily the best user experience for mixed environments, where both
> host Linux applications and guests are running. This typically happens
> in any "secure" container environment, hyper-converged cloud, etc etc.
> 
> 
> 2) Gang scheduling for vcpus, disallow non-kvm processes to run with kvm
> 
> This seems to be quite tricky to get right from what I gather. I
> remember IBM had similar approaches for HV KVM on PowerPC initially, but
> eventually went with SMT1 on the host and SMTx in the guest, with KVM
> doing the ganging.

There are at least a couple implementations of this already but my
understanding is that mingo/tglx have pushed back on doing this.
Regardless, we plan on posting an implementation once the embargo lifts.

> 3) Shadow page tables
> 
> We lose quite some performance, but would be 100% safe. The biggest
> problem is that we degrade on single-core performance rather than
> overall system performance.
> 
> Also with KPTI, shadow page tables started to perform much worse.
> 
> 
> --
> 
> 
> So I've been wondering whether there is any other option: We really
> don't need the big shadow page table hammer. Instead, we can happily
> allow the guest to use nested pages during runtime as long as the page
> tables it is using are sanitized.
> 
> 4) EPT with PT sanitization
> 
> We could map additional guest read only RAM in an unused window of the
> guest physical address space as a page table cache. Then trap CR3
> writes. On CR3 writes, check if we have a counter-page to the original
> CR3 entry, point to our sanitized clone instead. Otherwise create one.
> 
> Then we can slowly propagate that clone similar to shadow page tables,
> but we would never have to do the GPA->HVA->GPA dance. Instead, all we
> need to do is check whether a PTE has the P bit set or not. If it does,
> copy it. If it doesn't, just set the whole clone PTE to 0.
> 
> The most annoying part about this approach is that we would need to
> implement A and D bits by hand, because a) the HW walker can not write
> to the clone PTE because it's mapped read-only and b) we should
> propagate the bits immediately into the real PTE anyway.
> 
> Another idea Jörg came up with was that we could try to optimize the
> sanitization even more using PML (Page Modification Log). Using that we
> know which pages the guest modified, so if we find a page in there that
> we have in our sanitized cache, we can just redo that whole page in one go.
> 
> What I'm not quite sure of yet are page invalidations. Is there
> something we can trap on for those? Otherwise we'd have to set all page
> table pages to read-only in the EPT and that would end up very expensive.
> 
> 
> So what do people think about the idea? Does it sound doable?

You need to write-protect all levels of the page table, so you really do
need to do full-blown shadow paging, no?

I'm not able to get my head around what gets simplified with this
approach.

I think there is also another thing needed:

5) Secret hiding

Data cannot end up in the L1 cache unless it's in the virtual address
space.  We can attempt to ensure that any guest secret is only present
in its virtual address space and never in another guest's address
space.

This requires not carrying a kernel virtual address mapping for any
physical memory used by the guest but also some care in placing guest
register state and potentially device model state into a userspace va
instead of kernel memory as it is today.

It also requires stack scrubbing and a few other things.  The advantage
is that it avoids the L1 cache clearing.

Regards,

Anthony Liguori

> 
> 
> Alex


* [MODERATED] Re: ***UNCHECKED*** Re: KVM L1TF options
  2018-07-02 22:02 ` Anthony Liguori
@ 2018-07-02 22:28   ` Alexander Graf
  2018-07-02 22:33   ` Thomas Gleixner
  2018-07-03  8:36   ` Paolo Bonzini
  2 siblings, 0 replies; 11+ messages in thread
From: Alexander Graf @ 2018-07-02 22:28 UTC (permalink / raw)
  To: speck




On 03.07.18 00:02, speck for Anthony Liguori wrote:
> speck for Alexander Graf <speck@linutronix.de> writes:
>>
>> Howdy,
>>
>> I've been wondering for a while what options we (realistically) have to
>> mitigate L1TF with SMT in the KVM case. So far I gathered:
>>
>> 1) Disable SMT
>> 1.1) on boot
>> 1.2) dynamically, magically
>>
>> I guess we all agree that while it's an easy way out, it's not
>> necessarily the best user experience for mixed environments, where both
>> host Linux applications and guests are running. This typically happens
>> in any "secure" container environment, hyper-converged cloud, etc etc.
>>
>>
>> 2) Gang scheduling for vcpus, disallow non-kvm processes to run with kvm
>>
>> This seems to be quite tricky to get right from what I gather. I
>> remember IBM had similar approaches for HV KVM on PowerPC initially, but
>> eventually went with SMT1 on the host and SMTx in the guest, with KVM
>> doing the ganging.
> 
> There are at least a couple implementations of this already but my
> understanding is that mingo/tglx have pushed back on doing this.
> Regardless, we plan on posting an implementation once the embargo lifts.
> 
>> 3) Shadow page tables
>>
>> We lose quite some performance, but would be 100% safe. The biggest
>> problem is that we degrade on single-core performance rather than
>> overall system performance.
>>
>> Also with KPTI, shadow page tables started to perform much worse.
>>
>>
>> --
>>
>>
>> So I've been wondering whether there is any other option: We really
>> don't need the big shadow page table hammer. Instead, we can happily
>> allow the guest to use nested pages during runtime as long as the page
>> tables it is using are sanitized.
>>
>> 4) EPT with PT sanitization
>>
>> We could map additional guest read only RAM in an unused window of the
>> guest physical address space as a page table cache. Then trap CR3
>> writes. On CR3 writes, check if we have a counter-page to the original
>> CR3 entry, point to our sanitized clone instead. Otherwise create one.
>>
>> Then we can slowly propagate that clone similar to shadow page tables,
>> but we would never have to do the GPA->HVA->GPA dance. Instead, all we
>> need to do is check whether a PTE has the P bit set or not. If it does,
>> copy it. If it doesn't, just set the whole clone PTE to 0.
>>
>> The most annoying part about this approach is that we would need to
>> implement A and D bits by hand, because a) the HW walker can not write
>> to the clone PTE because it's mapped read-only and b) we should
>> propagate the bits immediately into the real PTE anyway.
>>
>> Another idea Jörg came up with was that we could try to optimize the
>> sanitization even more using PML (Page Modification Log). Using that we
>> know which pages the guest modified, so if we find a page in there that
>> we have in our sanitized cache, we can just redo that whole page in one go.
>>
>> What I'm not quite sure of yet are page invalidations. Is there
>> something we can trap on for those? Otherwise we'd have to set all page
>> table pages to read-only in the EPT and that would end up very expensive.
>>
>>
>> So what do people think about the idea? Does it sound doable?
> 
> You need to write protect all levels of the page table so you really do
> need to do full blown shadow paging, no?

Well my hope was that we could get away without write protecting the
levels and instead do things lazily with PML. Basically we could just
scan the PML log on invlpg and mov cr3 for any page table changes and
propagate them into the shadow copy automatically.

> I'm not able to get my head around what gets simplified with this
> approach.

I simply think we can use techniques which people didn't consider useful
to combine before, because EPT enabled CPUs didn't need to bother with
shadowing pages.

I also seem to remember that the memslot code showed up quite a bit in
my profiles back in the day. We would only need it to the same extent
as with EPT, because in the shadowing case we no longer need to find
the HVA.

> I think there is also another thing needed:
> 
> 5) Secret hiding
> 
> Data cannot end up in the L1 cache unless it's in the virtual address
> space.  We can attempt to ensure that any guest secret is only present
> in its virtual address space and never in another guest's address
> space.
> 
> This requires not carrying a kernel virtual address mapping for any
> physical memory used by the guest but also some care in placing guest
> register state and potentially device model state into a userspace va
> instead of kernel memory as it is today.
> 
> It also requires stack scrubbing and a few other things.  The advantage
> is that it avoids the L1 cache clearing.

Doesn't that break when you have an in-kernel crypto driver like a LUKS
encrypted hard drive?

Also, I don't know if I understand all details of the vulnerability, but
from what I gathered, speculation on !P guest PTEs actually ends up
probing the physically tagged cache, using the address bits of the GPTE
as the tag. So if another process happened to access a secret on that
same core 2 seconds ago, you would potentially be able to see it inside
your guest.

I would love to have someone tell me that this is not true :).


Alex



* Re: KVM L1TF options
  2018-07-02 22:02 ` Anthony Liguori
  2018-07-02 22:28   ` [MODERATED] Re: ***UNCHECKED*** " Alexander Graf
@ 2018-07-02 22:33   ` Thomas Gleixner
  2018-07-02 22:41     ` [MODERATED] " Anthony Liguori
  2018-07-03  8:36   ` Paolo Bonzini
  2 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2018-07-02 22:33 UTC (permalink / raw)
  To: speck

On Mon, 2 Jul 2018, speck for Anthony Liguori wrote:
> speck for Alexander Graf <speck@linutronix.de> writes:
> > 2) Gang scheduling for vcpus, disallow non-kvm processes to run with kvm
> > 
> > This seems to be quite tricky to get right from what I gather. I
> > remember IBM had similar approaches for HV KVM on PowerPC initially, but
> > eventually went with SMT1 on the host and SMTx in the guest, with KVM
> > doing the ganging.
> 
> There are at least a couple implementations of this already but my
> understanding is that mingo/tglx have pushed back on doing this.

You forgot Linus :)

For my part I pushed back on this being the stuff which magically solves
all problems and concentrating on it _before_ having the most obvious
mitigation in place. I didn't want to end up in a situation where the mess
goes public and we have our pants down (again).

Aside from that, I really want to see something which is fully functional
and does not immediately cause gastric ulcers when looking at it. What
I've seen so far is horrible proof-of-concept hackery: tinkering in
regular fast paths, obvious bugs, interesting assumptions, and
suitability for only a subset of workloads.

I surely can see it work for particular workloads where the CPU resources
are partitioned and exits are really rare, but nobody so far has explained
how this could ever work for general workloads with overcommitment, lots
of single-VCPU guests, vmexit-heavy scenarios, etc. without creating
horrible overhead, wrecking load balancing, and causing other subtle
issues.

I'm open for being educated on that, but I definitely won't put my money on
it.

> Regardless, we plan on posting an implementation once the embargo lifts.

If you have stuff ready, feel free to post it here once the minimal
mitigation has been settled in the next couple of days.

Thanks,

	tglx


* [MODERATED] Re: KVM L1TF options
  2018-07-02 22:33   ` Thomas Gleixner
@ 2018-07-02 22:41     ` Anthony Liguori
  0 siblings, 0 replies; 11+ messages in thread
From: Anthony Liguori @ 2018-07-02 22:41 UTC (permalink / raw)
  To: speck

speck for Thomas Gleixner <speck@linutronix.de> writes:
> On Mon, 2 Jul 2018, speck for Anthony Liguori wrote:
> > speck for Alexander Graf <speck@linutronix.de> writes:
> > > 2) Gang scheduling for vcpus, disallow non-kvm processes to run with kvm
> > > 
> > > This seems to be quite tricky to get right from what I gather. I
> > > remember IBM had similar approaches for HV KVM on PowerPC initially, but
> > > eventually went with SMT1 on the host and SMTx in the guest, with KVM
> > > doing the ganging.
> > 
> > There are at least a couple implementations of this already but my
> > understanding is that mingo/tglx have pushed back on doing this.
> 
> You forgot Linus :)
> 
> For my part I pushed back on this being the stuff which magically solves
> all problems and concentrating on it _before_ having the most obvious
> mitigation in place. I didn't want to end up in a situation where the mess
> goes public and we have our pants down (again).
> 
> Aside from that I really want to see something which is fully functional and
> does not immediately cause gastric ulcer when looking at it. What I've seen
> so far is horrible proof of concept hackery full of tinkering in regular
> fast paths, obvious bugs, interesting assumptions and suitable for a subset
> of workloads.
> 
> I surely can see it work for particular workloads where the CPU resources
> are partitioned and exits are really rare, but nobody so far explained how
> this could ever work for general workloads with overcommitment, lots of
> single VCPU guests, vmexit heavy scenarios etc. without creating a horrible
> overhead, wrecking load balancing and other subtle issues.
> 
> I'm open for being educated on that, but I definitely won't put my money on
> it.
> 
> > Regardless, we plan on posting an implementation once the embargo lifts.
> 
> If you have stuff ready, feel free to post it here once the minimal
> mitigation has been settled in the next couple of days.

I think we're close enough to send.  I'll ask the engineer to send it
out here.

Regards,

Anthony Liguori

> 
> Thanks,
> 
> 	tglx


* [MODERATED] Re: KVM L1TF options
  2018-07-02 22:02 ` Anthony Liguori
  2018-07-02 22:28   ` [MODERATED] Re: ***UNCHECKED*** " Alexander Graf
  2018-07-02 22:33   ` Thomas Gleixner
@ 2018-07-03  8:36   ` Paolo Bonzini
  2018-07-03 12:50     ` [MODERATED] Re: ***UNCHECKED*** " Alexander Graf
  2018-07-04 14:33     ` [MODERATED] " Jiri Kosina
  2 siblings, 2 replies; 11+ messages in thread
From: Paolo Bonzini @ 2018-07-03  8:36 UTC (permalink / raw)
  To: speck


On 03/07/2018 00:02, speck for Anthony Liguori wrote:
>> We could map additional guest read only RAM in an unused window of the
>> guest physical address space as a page table cache. Then trap CR3
>> writes. On CR3 writes, check if we have a counter-page to the original
>> CR3 entry, point to our sanitized clone instead. Otherwise create one.
>> So what do people think about the idea? Does it sound doable?
>
> You need to write protect all levels of the page table so you really do
> need to do full blown shadow paging, no?
> 
> I'm not able to get my head around what gets simplified with this
> approach.

I think it's possible indeed to do the WP lazily, because you get
vmexits for both CR3 load and INVLPG which are the only events that
invalidate the TLB.

However, we can forgo the lazy write protection, and probably get better
by paravirtualizing all the things.  Let the guest know about the
private, read-only-mapped copy of the CR3; then use the "CR3 targets"
feature so that the guest can write the most recent values without
vmexits.  The increased cost of shadow paging with KPTI is really coming
from the CR3 exits, while write protection and A/D emulation have pretty
good performance (early EPT didn't have A/D bits, so it's optimized a lot).
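A sketch of what programming that CR3-target list could look like, using the VMCS field encodings as I understand them and a hypothetical vmcs_write() helper (both are assumptions for illustration):

```c
#include <stdint.h>

/* VMCS field encodings (as I read the SDM; treat as assumptions). */
#define CR3_TARGET_COUNT  0x400aU
#define CR3_TARGET_VALUE0 0x6008U /* VALUE1..VALUE3 follow at +2 each */
#define CR3_TARGETS_MAX   4

/* Hypothetical hypervisor helper, assumed to exist. */
void vmcs_write(uint32_t field, uint64_t value);

/*
 * Register up to four recently used sanitized-clone CR3 values as
 * "CR3 targets": a guest MOV to CR3 loading one of these values then
 * no longer causes a vmexit.
 */
static void set_cr3_targets(const uint64_t *clone_cr3s, unsigned int n)
{
	if (n > CR3_TARGETS_MAX)
		n = CR3_TARGETS_MAX;
	for (unsigned int i = 0; i < n; i++)
		vmcs_write(CR3_TARGET_VALUE0 + 2 * i, clone_cr3s[i]);
	vmcs_write(CR3_TARGET_COUNT, n);
}
```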

Paolo



* [MODERATED] Re: ***UNCHECKED*** Re: KVM L1TF options
  2018-07-03  8:36   ` Paolo Bonzini
@ 2018-07-03 12:50     ` Alexander Graf
  2018-07-04 14:33     ` [MODERATED] " Jiri Kosina
  1 sibling, 0 replies; 11+ messages in thread
From: Alexander Graf @ 2018-07-03 12:50 UTC (permalink / raw)
  To: speck




On 03.07.18 10:36, speck for Paolo Bonzini wrote:
> On 03/07/2018 00:02, speck for Anthony Liguori wrote:
>>> We could map additional guest read only RAM in an unused window of the
>>> guest physical address space as a page table cache. Then trap CR3
>>> writes. On CR3 writes, check if we have a counter-page to the original
>>> CR3 entry, point to our sanitized clone instead. Otherwise create one.
>>> So what do people think about the idea? Does it sound doable?
>>
>> You need to write protect all levels of the page table so you really do
>> need to do full blown shadow paging, no?
>>
>> I'm not able to get my head around what gets simplified with this
>> approach.
> 
> I think it's possible indeed to do the WP lazily, because you get
> vmexits for both CR3 load and INVLPG which are the only events that
> invalidate the TLB.
> 
> However, we can forgo the lazy write protection, and probably get better
> by paravirtualizing all the things.  Let the guest know about the
> private, read-only-mapped copy of the CR3; then use the "CR3 targets"
> feature so that the guest can write the most recent values without
> vmexits.  The increased cost of shadow paging with KPTI is really coming
> from the CR3 exits, while write protection and A/D emulation have pretty
> good performance (early EPT didn't have A/D bits, so it's optimized a lot).

So I think it's worth a shot. Given that this would also mean we don't
need to flush the L1 cache anymore, doing shadowed EPT may at the end of
the day give us the best performance balance.

I won't be able to get to it soonish though, so I'd be happy to see
someone beat me to it :)


Alex



* [MODERATED] Re: KVM L1TF options
  2018-07-03  8:36   ` Paolo Bonzini
  2018-07-03 12:50     ` [MODERATED] Re: ***UNCHECKED*** " Alexander Graf
@ 2018-07-04 14:33     ` Jiri Kosina
  2018-07-04 14:51       ` [MODERATED] Re: ***UNCHECKED*** " Alexander Graf
  1 sibling, 1 reply; 11+ messages in thread
From: Jiri Kosina @ 2018-07-04 14:33 UTC (permalink / raw)
  To: speck

On Tue, 3 Jul 2018, speck for Paolo Bonzini wrote:

> I think it's possible indeed to do the WP lazily, because you get 
> vmexits for both CR3 load and INVLPG which are the only events that 
> invalidate the TLB.

I am still having issues understanding the concept here.

What prevents a malicious guest from *not* doing invlpg after tinkering
with a !present PTE?

Sure, it'll eventually lead to memory corruption in the guest, but it
would still allow exploiting L1TF, no?

-- 
Jiri Kosina
SUSE Labs


* [MODERATED] Re: ***UNCHECKED*** Re: KVM L1TF options
  2018-07-04 14:33     ` [MODERATED] " Jiri Kosina
@ 2018-07-04 14:51       ` Alexander Graf
  0 siblings, 0 replies; 11+ messages in thread
From: Alexander Graf @ 2018-07-04 14:51 UTC (permalink / raw)
  To: speck




On 04.07.18 16:33, speck for Jiri Kosina wrote:
> On Tue, 3 Jul 2018, speck for Paolo Bonzini wrote:
> 
>> I think it's possible indeed to do the WP lazily, because you get 
>> vmexits for both CR3 load and INVLPG which are the only events that 
>> invalidate the TLB.
> 
> I am still having issues understanding the concept here.
> 
> What prevents a malicious guest from *not* doing invlpg after tinkering
> with a !present PTE?
> 
> Sure, it'll eventually lead to memory corruption in the guest, but it
> would still allow exploiting L1TF, no?

How so? The guest will never have the ability to write to anything that
goes into cr3. It only has access to its own copy of page tables, never
the ones that hardware uses.


Alex



* [MODERATED] Re: KVM L1TF options
  2018-07-02 21:41 [MODERATED] KVM L1TF options Alexander Graf
  2018-07-02 21:44 ` [MODERATED] " Dave Hansen
  2018-07-02 22:02 ` Anthony Liguori
@ 2018-07-05 19:22 ` Jon Masters
  2 siblings, 0 replies; 11+ messages in thread
From: Jon Masters @ 2018-07-05 19:22 UTC (permalink / raw)
  To: speck


On 07/02/2018 05:41 PM, speck for Alexander Graf wrote:

> 4) EPT with PT sanitization

> We could map additional guest read only RAM in an unused window of the
> guest physical address space as a page table cache. Then trap CR3
> writes. On CR3 writes, check if we have a counter-page to the original
> CR3 entry, point to our sanitized clone instead. Otherwise create one.
> Then we can slowly propagate that clone similar to shadow page tables,
> but we would never have to do the GPA->HVA->GPA dance. Instead, all we
> need to do is check whether a PTE has the P bit set or not. If it does,
> copy it. If it doesn't, just set the whole clone PTE to 0.

I think this would be safe. It took a few minutes thinking about it, but
what you're saying is that the guest *never* has control over the page
tables the hardware walks. It's like shadow page tables, but leveraging
the hardware assist from EPT. So for anyone else who wondered: it's not
just CR3, it's actually every page table page that is "shadowed" in this
approach. So a guest can't just toggle its !P bit (even without doing an
invalidation, forcing a TLB eviction, and waiting for the walker to see
the change) and attack the host. It would be perfectly safe. Slower than
EPT, maybe faster than shadow paging.

> The most annoying part about this approach is that we would need to
> implement A and D bits by hand, because a) the HW walker can not write
> to the clone PTE because it's mapped read-only and b) we should
> propagate the bits immediately into the real PTE anyway.

You could do this periodically, though, as some kind of background task,
as long as you don't then race with the guest also scanning. This is
what will kill performance, and it will differ between Linux and Windows.

> What I'm not quite sure of yet are page invalidations. Is there
> something we can trap on for those? Otherwise we'd have to set all page
> table pages to read-only in the EPT and that would end up very expensive.
> 
> So what do people think about the idea? Does it sound doable?

Depends on the cost of emulating A/D bits and the above question.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop

