* [MODERATED] KVM L1TF options
@ 2018-07-02 21:41 Alexander Graf
  2018-07-02 21:44 ` [MODERATED] " Dave Hansen
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Alexander Graf @ 2018-07-02 21:41 UTC (permalink / raw)
  To: speck


Howdy,

I've been wondering for a while what options we (realistically) have to
mitigate L1TF with SMT in the KVM case. So far I gathered:

1) Disable SMT
1.1) on boot
1.2) dynamically, magically

I guess we all agree that while it's an easy way out, it's not
necessarily the best user experience for mixed environments, where both
host Linux applications and guests are running. This typically happens
in any "secure" container environment, hyper-converged cloud, etc etc.


2) Gang scheduling for vcpus, disallow non-kvm processes to run with kvm

This seems to be quite tricky to get right from what I gather. I
remember IBM had similar approaches for HV KVM on PowerPC initially, but
eventually went with SMT1 on the host and SMTx in the guest, with KVM
doing the ganging.


3) Shadow page tables

We lose quite some performance, but it would be 100% safe. The biggest
problem is that we degrade single-threaded performance rather than
overall system performance.

Also with KPTI, shadow page tables started to perform much worse.


--


So I've been wondering whether there is any other option: We really
don't need the big shadow page table hammer. Instead, we can happily
allow the guest to use nested pages during runtime as long as the page
tables it is using are sanitized.

4) EPT with PT sanitization

We could map additional guest read only RAM in an unused window of the
guest physical address space as a page table cache. Then trap CR3
writes. On CR3 writes, check if we have a counter-page to the original
CR3 entry, point to our sanitized clone instead. Otherwise create one.

Then we can slowly propagate that clone similar to shadow page tables,
but we would never have to do the GPA->HVA->GPA dance. Instead, all we
need to do is check whether a PTE has the P bit set or not. If it does,
copy it. If it doesn't, just set the whole clone PTE to 0.
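In rough C, the sanitization rule for one page-table page boils down to something like the following sketch (the names and layout are made up for illustration, not taken from any existing KVM code):

```c
#include <stdint.h>
#include <stddef.h>

#define PTE_PRESENT   0x1ULL
#define PTES_PER_PAGE 512

/*
 * Sketch: build the sanitized clone of one guest page-table page.
 * A present entry is copied verbatim; a not-present entry is cleared
 * entirely, so none of its address bits survive to act as cache tag
 * hints for speculative loads.
 */
static void sanitize_pt_page(const uint64_t *guest_pt, uint64_t *clone_pt)
{
	for (size_t i = 0; i < PTES_PER_PAGE; i++)
		clone_pt[i] = (guest_pt[i] & PTE_PRESENT) ? guest_pt[i] : 0;
}
```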

The most annoying part about this approach is that we would need to
implement A and D bits by hand, because a) the HW walker can not write
to the clone PTE because it's mapped read-only and b) we should
propagate the bits immediately into the real PTE anyway.

Another idea Jörg came up with was that we could try to optimize the
sanitization even more using PML (Page Modification Log). Using that we
know which pages the guest modified, so if we find a page in there that
we have in our sanitized cache, we can just redo that whole page in one go.
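A sketch of how such a PML scan could look on a CR3-load/INVLPG exit (the tracking helpers here are hypothetical; the index convention follows the hardware's fill-downward scheme as I understand it):

```c
#include <stdint.h>
#include <stddef.h>

#define PML_ENTRIES  512
#define PAGE_MASK_4K (~0xfffULL)

/* Hypothetical tracking hooks: look up the sanitized clone for a guest
 * PT page (NULL if the GPA is not a cached page-table page), and redo
 * the whole clone page when the guest dirtied it. */
uint64_t *find_pt_clone(uint64_t gpa);
void resanitize_pt_page(uint64_t gpa, uint64_t *clone);

/*
 * Lazy propagation sketch: instead of write-protecting page-table
 * pages, scan the PML buffer on CR3-load/INVLPG exits and re-sanitize
 * only the cached PT pages the guest actually modified.  Hardware
 * fills the buffer downward, so valid entries sit above pml_index.
 */
static void scan_pml_on_exit(const uint64_t *pml, int pml_index)
{
	for (int i = PML_ENTRIES - 1; i > pml_index; i--) {
		uint64_t gpa = pml[i] & PAGE_MASK_4K;
		uint64_t *clone = find_pt_clone(gpa);

		if (clone)
			resanitize_pt_page(gpa, clone);
	}
}
```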

What I'm not quite sure of yet are page invalidations. Is there
something we can trap on for those? Otherwise we'd have to set all page
table pages to read-only in the EPT and that would end up very expensive.

So what do people think about the idea? Does it sound doable?


Alex



* [MODERATED] Re: KVM L1TF options
  2018-07-02 21:41 [MODERATED] KVM L1TF options Alexander Graf
@ 2018-07-02 21:44 ` Dave Hansen
  2018-07-02 22:02 ` Anthony Liguori
  2018-07-05 19:22 ` [MODERATED] " Jon Masters
  2 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2018-07-02 21:44 UTC (permalink / raw)
  To: speck


On 07/02/2018 02:41 PM, speck for Alexander Graf wrote:
> 1) Disable SMT

... plus the L1 flushes before entering guests.
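For reference, that flush is a single write to the architectural flush-command MSR right before VMENTER. A kernel-context sketch, with constant values as I understand the interface (calling this from userspace would fault; it is shown here only to make the mechanism concrete):

```c
#include <stdint.h>

/* Architectural MSR for the L1D flush mitigation (values assumed
 * from the documented interface; verify against the SDM). */
#define MSR_IA32_FLUSH_CMD 0x10bU
#define L1D_FLUSH          (1ULL << 0)

/*
 * Flush the entire L1 data cache just before entering the guest, so
 * that no host/other-guest secrets are left behind for an L1TF leak.
 * Privileged instruction: only valid in ring 0.
 */
static inline void l1d_flush(void)
{
	uint32_t lo = (uint32_t)L1D_FLUSH, hi = 0;

	__asm__ volatile("wrmsr" :: "c"(MSR_IA32_FLUSH_CMD),
				    "a"(lo), "d"(hi));
}
```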




* [MODERATED] Re: KVM L1TF options
  2018-07-02 21:41 [MODERATED] KVM L1TF options Alexander Graf
  2018-07-02 21:44 ` [MODERATED] " Dave Hansen
@ 2018-07-02 22:02 ` Anthony Liguori
  2018-07-02 22:28   ` [MODERATED] Re: ***UNCHECKED*** " Alexander Graf
                     ` (2 more replies)
  2018-07-05 19:22 ` [MODERATED] " Jon Masters
  2 siblings, 3 replies; 11+ messages in thread
From: Anthony Liguori @ 2018-07-02 22:02 UTC (permalink / raw)
  To: speck

speck for Alexander Graf <speck@linutronix.de> writes:
> 
> Howdy,
> 
> I've been wondering for a while what options we (realistically) have to
> mitigate L1TF with SMT in the KVM case. So far I gathered:
> 
> 1) Disable SMT
> 1.1) on boot
> 1.2) dynamically, magically
> 
> I guess we all agree that while it's an easy way out, it's not
> necessarily the best user experience for mixed environments, where both
> host Linux applications and guests are running. This typically happens
> in any "secure" container environment, hyper-converged cloud, etc etc.
> 
> 
> 2) Gang scheduling for vcpus, disallow non-kvm processes to run with kvm
> 
> This seems to be quite tricky to get right from what I gather. I
> remember IBM had similar approaches for HV KVM on PowerPC initially, but
> eventually went with SMT1 on the host and SMTx in the guest, with KVM
> doing the ganging.

There are at least a couple implementations of this already but my
understanding is that mingo/tglx have pushed back on doing this.
Regardless, we plan on posting an implementation once the embargo lifts.

> 3) Shadow page tables
> 
> We lose quite some performance, but would be 100% safe. The biggest
> problem is that we degrade on single-core performance rather than
> overall system performance.
> 
> Also with KPTI, shadow page tables started to perform much worse.
> 
> 
> --
> 
> 
> So I've been wondering whether there is any other option: We really
> don't need the big shadow page table hammer. Instead, we can happily
> allow the guest to use nested pages during runtime as long as the page
> tables it is using are sanitized.
> 
> 4) EPT with PT sanitization
> 
> We could map additional guest read only RAM in an unused window of the
> guest physical address space as a page table cache. Then trap CR3
> writes. On CR3 writes, check if we have a counter-page to the original
> CR3 entry, point to our sanitized clone instead. Otherwise create one.
> 
> Then we can slowly propagate that clone similar to shadow page tables,
> but we would never have to do the GPA->HVA->GPA dance. Instead, all we
> need to do is check whether a PTE has the P bit set or not. If it does,
> copy it. If it doesn't, just set the whole clone PTE to 0.
> 
> The most annoying part about this approach is that we would need to
> implement A and D bits by hand, because a) the HW walker can not write
> to the clone PTE because it's mapped read-only and b) we should
> propagate the bits immediately into the real PTE anyway.
> 
> Another idea Jörg came up with was that we could try to optimize the
> sanitization even more using PML (Page Modification Log). Using that we
> know which pages the guest modified, so if we find a page in there that
> we have in our sanitized cache, we can just redo that whole page in one go.
> 
> What I'm not quite sure of yet are page invalidations. Is there
> something we can trap on for those? Otherwise we'd have to set all page
> table pages to read-only in the EPT and that would end up very expensive.
> 
> 
> So what do people think about the idea? Does it sound doable?

You need to write-protect all levels of the page table, so you really do
need to do full-blown shadow paging, no?

I'm not able to get my head around what gets simplified with this
approach.

I think there is also another thing needed:

5) Secret hiding

Data cannot end up in the L1 cache unless it's in the virtual address
space.  We can attempt to ensure that any guest secret is only present
in its virtual address space and never in another guest's address
space.

This requires not carrying a kernel virtual address mapping for any
physical memory used by the guest but also some care in placing guest
register state and potentially device model state into a userspace va
instead of kernel memory as it is today.

It also requires stack scrubbing and a few other things.  The advantage
is that it avoids the L1 cache clearing.

Regards,

Anthony Liguori

> 
> 
> Alex


* [MODERATED] Re: ***UNCHECKED*** Re: KVM L1TF options
  2018-07-02 22:02 ` Anthony Liguori
@ 2018-07-02 22:28   ` Alexander Graf
  2018-07-02 22:33   ` Thomas Gleixner
  2018-07-03  8:36   ` Paolo Bonzini
  2 siblings, 0 replies; 11+ messages in thread
From: Alexander Graf @ 2018-07-02 22:28 UTC (permalink / raw)
  To: speck




On 03.07.18 00:02, speck for Anthony Liguori wrote:
> speck for Alexander Graf <speck@linutronix.de> writes:
>>
>> Howdy,
>>
>> I've been wondering for a while what options we (realistically) have to
>> mitigate L1TF with SMT in the KVM case. So far I gathered:
>>
>> 1) Disable SMT
>> 1.1) on boot
>> 1.2) dynamically, magically
>>
>> I guess we all agree that while it's an easy way out, it's not
>> necessarily the best user experience for mixed environments, where both
>> host Linux applications and guests are running. This typically happens
>> in any "secure" container environment, hyper-converged cloud, etc etc.
>>
>>
>> 2) Gang scheduling for vcpus, disallow non-kvm processes to run with kvm
>>
>> This seems to be quite tricky to get right from what I gather. I
>> remember IBM had similar approaches for HV KVM on PowerPC initially, but
>> eventually went with SMT1 on the host and SMTx in the guest, with KVM
>> doing the ganging.
> 
> There are at least a couple implementations of this already but my
> understanding is that mingo/tglx have pushed back on doing this.
> Regardless, we plan on posting an implementation once the embargo lifts.
> 
>> 3) Shadow page tables
>>
>> We lose quite some performance, but would be 100% safe. The biggest
>> problem is that we degrade on single-core performance rather than
>> overall system performance.
>>
>> Also with KPTI, shadow page tables started to perform much worse.
>>
>>
>> --
>>
>>
>> So I've been wondering whether there is any other option: We really
>> don't need the big shadow page table hammer. Instead, we can happily
>> allow the guest to use nested pages during runtime as long as the page
>> tables it is using are sanitized.
>>
>> 4) EPT with PT sanitization
>>
>> We could map additional guest read only RAM in an unused window of the
>> guest physical address space as a page table cache. Then trap CR3
>> writes. On CR3 writes, check if we have a counter-page to the original
>> CR3 entry, point to our sanitized clone instead. Otherwise create one.
>>
>> Then we can slowly propagate that clone similar to shadow page tables,
>> but we would never have to do the GPA->HVA->GPA dance. Instead, all we
>> need to do is check whether a PTE has the P bit set or not. If it does,
>> copy it. If it doesn't, just set the whole clone PTE to 0.
>>
>> The most annoying part about this approach is that we would need to
>> implement A and D bits by hand, because a) the HW walker can not write
>> to the clone PTE because it's mapped read-only and b) we should
>> propagate the bits immediately into the real PTE anyway.
>>
>> Another idea Jörg came up with was that we could try to optimize the
>> sanitization even more using PML (Page Modification Log). Using that we
>> know which pages the guest modified, so if we find a page in there that
>> we have in our sanitized cache, we can just redo that whole page in one go.
>>
>> What I'm not quite sure of yet are page invalidations. Is there
>> something we can trap on for those? Otherwise we'd have to set all page
>> table pages to read-only in the EPT and that would end up very expensive.
>>
>>
>> So what do people think about the idea? Does it sound doable?
> 
> You need to write protect all levels of the page table so you really do
> need to do full blown shadow paging, no?

Well my hope was that we could get away without write protecting the
levels and instead do things lazily with PML. Basically we could just
scan the PML log on invlpg and mov cr3 for any page table changes and
propagate them into the shadow copy automatically.

> I'm not able to get my head around what gets simplified with this
> approach.

I simply think we can use techniques which people didn't consider useful
to combine before, because EPT enabled CPUs didn't need to bother with
shadowing pages.

I also seem to remember that the memslot code showed up quite a bit in
my profiles back in the day. We would only need it to the same extent
as with EPT, because in the shadowing case we no longer need to find
the HVA.

> I think there is also another thing needed:
> 
> 5) Secret hiding
> 
> Data cannot end up in the L1 cache unless it's in the virtual address
> space.  We can attempt to ensure that any guest secret is only present
> in its virtual address space and never in another guest's address
> space.
> 
> This requires not carrying a kernel virtual address mapping for any
> physical memory used by the guest but also some care in placing guest
> register state and potentially device model state into a userspace va
> instead of kernel memory as it is today.
> 
> It also requires stack scrubbing and a few other things.  The advantage
> is that it avoids the L1 cache clearing.

Doesn't that break when you have an in-kernel crypto driver like a LUKS
encrypted hard drive?

Also, I don't know if I understand all details of the vulnerability, but
from what I gathered, speculation on !P guest PTEs actually ends up
probing the physically tagged cache, using the address bits of the GPTE
as the tag. So if another process happened to access a secret on that
same core 2 seconds ago, you would potentially be able to see it inside
your guest.

I would love to have someone tell me that this is not true :).


Alex



* Re: KVM L1TF options
  2018-07-02 22:02 ` Anthony Liguori
  2018-07-02 22:28   ` [MODERATED] Re: ***UNCHECKED*** " Alexander Graf
@ 2018-07-02 22:33   ` Thomas Gleixner
  2018-07-02 22:41     ` [MODERATED] " Anthony Liguori
  2018-07-03  8:36   ` Paolo Bonzini
  2 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2018-07-02 22:33 UTC (permalink / raw)
  To: speck

On Mon, 2 Jul 2018, speck for Anthony Liguori wrote:
> speck for Alexander Graf <speck@linutronix.de> writes:
> > 2) Gang scheduling for vcpus, disallow non-kvm processes to run with kvm
> > 
> > This seems to be quite tricky to get right from what I gather. I
> > remember IBM had similar approaches for HV KVM on PowerPC initially, but
> > eventually went with SMT1 on the host and SMTx in the guest, with KVM
> > doing the ganging.
> 
> There are at least a couple implementations of this already but my
> understanding is that mingo/tglx have pushed back on doing this.

You forgot Linus :)

For my part I pushed back on this being the stuff which magically solves
all problems and concentrating on it _before_ having the most obvious
mitigation in place. I didn't want to end up in a situation where the mess
goes public and we have our pants down (again).

Aside from that, I really want to see something which is fully functional
and does not immediately cause gastric ulcers when looking at it. What
I've seen so far is horrible proof-of-concept hackery: tinkering in
regular fast paths, obvious bugs, interesting assumptions, and
suitability for only a subset of workloads.

I surely can see it work for particular workloads where the CPU resources
are partitioned and exits are really rare, but nobody so far has explained
how this could ever work for general workloads with overcommitment, lots
of single-VCPU guests, vmexit-heavy scenarios, etc. without creating
horrible overhead, wrecking load balancing, and causing other subtle
issues.

I'm open for being educated on that, but I definitely won't put my money on
it.

> Regardless, we plan on posting an implementation once the embargo lifts.

If you have stuff ready, feel free to post it here once the minimal
mitigation has been settled in the next couple of days.

Thanks,

	tglx


* [MODERATED] Re: KVM L1TF options
  2018-07-02 22:33   ` Thomas Gleixner
@ 2018-07-02 22:41     ` Anthony Liguori
  0 siblings, 0 replies; 11+ messages in thread
From: Anthony Liguori @ 2018-07-02 22:41 UTC (permalink / raw)
  To: speck

speck for Thomas Gleixner <speck@linutronix.de> writes:
> On Mon, 2 Jul 2018, speck for Anthony Liguori wrote:
> > speck for Alexander Graf <speck@linutronix.de> writes:
> > > 2) Gang scheduling for vcpus, disallow non-kvm processes to run with kvm
> > > 
> > > This seems to be quite tricky to get right from what I gather. I
> > > remember IBM had similar approaches for HV KVM on PowerPC initially, but
> > > eventually went with SMT1 on the host and SMTx in the guest, with KVM
> > > doing the ganging.
> > 
> > There are at least a couple implementations of this already but my
> > understanding is that mingo/tglx have pushed back on doing this.
> 
> You forgot Linus :)
> 
> For my part I pushed back on this being the stuff which magically solves
> all problems and concentrating on it _before_ having the most obvious
> mitigation in place. I didn't want to end up in a situation where the mess
> goes public and we have our pants down (again).
> 
> Aside from that I really want to see something which is fully functional and
> does not immediately cause gastric ulcer when looking at it. What I've seen
> so far is horrible proof of concept hackery full of tinkering in regular
> fast paths, obvious bugs, interesting assumptions and suitable for a subset
> of workloads.
> 
> I surely can see it work for particular workloads where the CPU resources
> are partitioned and exits are really rare, but nobody so far explained how
> this could ever work for general workloads with overcommitment, lots of
> single VCPU guests, vmexit heavy scenarios etc. without creating a horrible
> overhead, wrecking load balancing and other subtle issues.
> 
> I'm open for being educated on that, but I definitely won't put my money on
> it.
> 
> > Regardless, we plan on posting an implementation once the embargo lifts.
> 
> If you have stuff ready, feel free to post it here once the minimal
> mitigation has been settled in the next couple of days.

I think we're close enough to send.  I'll ask the engineer to send it
out here.

Regards,

Anthony Liguori

> 
> Thanks,
> 
> 	tglx


* [MODERATED] Re: KVM L1TF options
  2018-07-02 22:02 ` Anthony Liguori
  2018-07-02 22:28   ` [MODERATED] Re: ***UNCHECKED*** " Alexander Graf
  2018-07-02 22:33   ` Thomas Gleixner
@ 2018-07-03  8:36   ` Paolo Bonzini
  2018-07-03 12:50     ` [MODERATED] Re: ***UNCHECKED*** " Alexander Graf
  2018-07-04 14:33     ` [MODERATED] " Jiri Kosina
  2 siblings, 2 replies; 11+ messages in thread
From: Paolo Bonzini @ 2018-07-03  8:36 UTC (permalink / raw)
  To: speck


On 03/07/2018 00:02, speck for Anthony Liguori wrote:
>> We could map additional guest read only RAM in an unused window of the
>> guest physical address space as a page table cache. Then trap CR3
>> writes. On CR3 writes, check if we have a counter-page to the original
>> CR3 entry, point to our sanitized clone instead. Otherwise create one.
>> So what do people think about the idea? Does it sound doable?
>
> You need to write protect all levels of the page table so you really do
> need to do full blown shadow paging, no?
> 
> I'm not able to get my head around what gets simplified with this
> approach.

I think it's possible indeed to do the WP lazily, because you get
vmexits for both CR3 load and INVLPG which are the only events that
invalidate the TLB.

However, we can forgo the lazy write protection, and probably get better
by paravirtualizing all the things.  Let the guest know about the
private, read-only-mapped copy of the CR3; then use the "CR3 targets"
feature so that the guest can write the most recent values without
vmexits.  The increased cost of shadow paging with KPTI is really coming
from the CR3 exits, while write protection and A/D emulation have pretty
good performance (early EPT didn't have A/D bits, so it's optimized a lot).
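A sketch of what programming that CR3-target list could look like, using the VMCS field encodings as I understand them and a hypothetical vmcs_write() helper (both are assumptions for illustration):

```c
#include <stdint.h>

/* VMCS field encodings (as I read the SDM; treat as assumptions). */
#define CR3_TARGET_COUNT  0x400aU
#define CR3_TARGET_VALUE0 0x6008U /* VALUE1..VALUE3 follow at +2 each */
#define CR3_TARGETS_MAX   4

/* Hypothetical hypervisor helper, assumed to exist. */
void vmcs_write(uint32_t field, uint64_t value);

/*
 * Register up to four recently used sanitized-clone CR3 values as
 * "CR3 targets": a guest MOV to CR3 loading one of these values then
 * no longer causes a vmexit.
 */
static void set_cr3_targets(const uint64_t *clone_cr3s, unsigned int n)
{
	if (n > CR3_TARGETS_MAX)
		n = CR3_TARGETS_MAX;
	for (unsigned int i = 0; i < n; i++)
		vmcs_write(CR3_TARGET_VALUE0 + 2 * i, clone_cr3s[i]);
	vmcs_write(CR3_TARGET_COUNT, n);
}
```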

Paolo



* [MODERATED] Re: ***UNCHECKED*** Re: KVM L1TF options
  2018-07-03  8:36   ` Paolo Bonzini
@ 2018-07-03 12:50     ` Alexander Graf
  2018-07-04 14:33     ` [MODERATED] " Jiri Kosina
  1 sibling, 0 replies; 11+ messages in thread
From: Alexander Graf @ 2018-07-03 12:50 UTC (permalink / raw)
  To: speck




On 03.07.18 10:36, speck for Paolo Bonzini wrote:
> On 03/07/2018 00:02, speck for Anthony Liguori wrote:
>>> We could map additional guest read only RAM in an unused window of the
>>> guest physical address space as a page table cache. Then trap CR3
>>> writes. On CR3 writes, check if we have a counter-page to the original
>>> CR3 entry, point to our sanitized clone instead. Otherwise create one.
>>> So what do people think about the idea? Does it sound doable?
>>
>> You need to write protect all levels of the page table so you really do
>> need to do full blown shadow paging, no?
>>
>> I'm not able to get my head around what gets simplified with this
>> approach.
> 
> I think it's possible indeed to do the WP lazily, because you get
> vmexits for both CR3 load and INVLPG which are the only events that
> invalidate the TLB.
> 
> However, we can forgo the lazy write protection, and probably get better
> by paravirtualizing all the things.  Let the guest know about the
> private, read-only-mapped copy of the CR3; then use the "CR3 targets"
> feature so that the guest can write the most recent values without
> vmexits.  The increased cost of shadow paging with KPTI is really coming
> from the CR3 exits, while write protection and A/D emulation have pretty
> good performance (early EPT didn't have A/D bits, so it's optimized a lot).

So I think it's worth a shot. Given that this would also mean we don't
need to flush the L1 cache anymore, doing shadowed EPT may at the end of
the day give us the best performance balance.

I won't be able to get to it soonish though, so I'd be happy to see
someone beat me to it :)


Alex



* [MODERATED] Re: KVM L1TF options
  2018-07-03  8:36   ` Paolo Bonzini
  2018-07-03 12:50     ` [MODERATED] Re: ***UNCHECKED*** " Alexander Graf
@ 2018-07-04 14:33     ` Jiri Kosina
  2018-07-04 14:51       ` [MODERATED] Re: ***UNCHECKED*** " Alexander Graf
  1 sibling, 1 reply; 11+ messages in thread
From: Jiri Kosina @ 2018-07-04 14:33 UTC (permalink / raw)
  To: speck

On Tue, 3 Jul 2018, speck for Paolo Bonzini wrote:

> I think it's possible indeed to do the WP lazily, because you get 
> vmexits for both CR3 load and INVLPG which are the only events that 
> invalidate the TLB.

I am still having issues understanding the concept here.

What prevents a malicious guest from *not* doing invlpg after tinkering
with a !present PTE?

Sure, it'll eventually lead to memory corruption in the guest, but it
would still allow exploiting L1TF, no?

-- 
Jiri Kosina
SUSE Labs


* [MODERATED] Re: ***UNCHECKED*** Re: KVM L1TF options
  2018-07-04 14:33     ` [MODERATED] " Jiri Kosina
@ 2018-07-04 14:51       ` Alexander Graf
  0 siblings, 0 replies; 11+ messages in thread
From: Alexander Graf @ 2018-07-04 14:51 UTC (permalink / raw)
  To: speck




On 04.07.18 16:33, speck for Jiri Kosina wrote:
> On Tue, 3 Jul 2018, speck for Paolo Bonzini wrote:
> 
>> I think it's possible indeed to do the WP lazily, because you get 
>> vmexits for both CR3 load and INVLPG which are the only events that 
>> invalidate the TLB.
> 
> I am still having issues understanding the concept here.
> 
> What prevents a malicious guest from *not* doing invlpg after tinkering
> with a !present PTE?
> 
> Sure, it'll eventually lead to memory corruption in the guest, but it
> would still allow exploiting L1TF, no?

How so? The guest will never have the ability to write to anything that
goes into cr3. It only has access to its own copy of page tables, never
the ones that hardware uses.


Alex



* [MODERATED] Re: KVM L1TF options
  2018-07-02 21:41 [MODERATED] KVM L1TF options Alexander Graf
  2018-07-02 21:44 ` [MODERATED] " Dave Hansen
  2018-07-02 22:02 ` Anthony Liguori
@ 2018-07-05 19:22 ` Jon Masters
  2 siblings, 0 replies; 11+ messages in thread
From: Jon Masters @ 2018-07-05 19:22 UTC (permalink / raw)
  To: speck


On 07/02/2018 05:41 PM, speck for Alexander Graf wrote:

> 4) EPT with PT sanitization

> We could map additional guest read only RAM in an unused window of the
> guest physical address space as a page table cache. Then trap CR3
> writes. On CR3 writes, check if we have a counter-page to the original
> CR3 entry, point to our sanitized clone instead. Otherwise create one.
> Then we can slowly propagate that clone similar to shadow page tables,
> but we would never have to do the GPA->HVA->GPA dance. Instead, all we
> need to do is check whether a PTE has the P bit set or not. If it does,
> copy it. If it doesn't, just set the whole clone PTE to 0.

I think this would be safe. It took a few minutes thinking about it, but
what you're saying is that the guest *never* has control over the page
tables the hardware walks. It's like shadow page tables, but leveraging
the hardware assist from EPT. So for anyone else who wondered: it's not
just CR3, it's actually every page table page that is "shadowed" in this
approach. So a guest can't just toggle its !P bit (even without doing an
invalidation, forcing a TLB eviction, and waiting for the walker to see
the change) and attack the host. It would be perfectly safe. Slower than
EPT, maybe faster than shadow paging.

> The most annoying part about this approach is that we would need to
> implement A and D bits by hand, because a) the HW walker can not write
> to the clone PTE because it's mapped read-only and b) we should
> propagate the bits immediately into the real PTE anyway.

You could do this periodically, though, as some kind of background task,
as long as you don't then race with the guest also scanning. This is
what will kill performance, and it will differ between Linux and Windows.

> What I'm not quite sure of yet are page invalidations. Is there
> something we can trap on for those? Otherwise we'd have to set all page
> table pages to read-only in the EPT and that would end up very expensive.
> 
> So what do people think about the idea? Does it sound doable?

Depends on the cost of emulating A/D bits and the above question.

Jon.

-- 
Computer Architect | Sent from my Fedora powered laptop

