* further post-Meltdown-bad-aid performance thoughts
@ 2018-01-19 14:37 Jan Beulich
  2018-01-19 15:43 ` George Dunlap
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2018-01-19 14:37 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Andrew Cooper

All,

along the lines of the relatively easy first step submitted yesterday,
I've had some further thoughts in that direction. A fundamental
thing for this is of course to first of all establish what kind of
information we consider safe to expose (in the long run) to guests.

The current state of things is deemed incomplete, yet despite my
earlier inquiries I haven't heard back any concrete example of
information, exposure of which does any harm. While it seems to be
generally believed that large parts of the Xen image should not be
exposed, it's not all that clear to me why that would be. I could
agree with better hiding writable data parts of it, just to be on the
safe side (I'm unaware of statically allocated data though which
might carry any secrets), but what would be the point of hiding
code and r/o data? Anyone wanting to know their contents can
simply obtain the Xen binary for their platform.

Similar considerations apply to the other data we currently keep
mapped while running 64-bit PV guests.

The reason I bring this up is because further steps in the direction
of recovering performance would likely require as a prerequisite
exposure of further data, first and foremost struct vcpu and
struct domain for the currently active vCPU. Once again I'm not
aware of any secrets living there. Another item might need to be
the local CPU's per-CPU data.

Additionally this would require leaving interrupts turned off for
longer periods of time on the entry paths.

Feedback appreciated, thanks, Jan



* Re: further post-Meltdown-bad-aid performance thoughts
  2018-01-19 14:37 further post-Meltdown-bad-aid performance thoughts Jan Beulich
@ 2018-01-19 15:43 ` George Dunlap
  2018-01-19 16:36   ` Jan Beulich
  2018-01-22 17:44   ` Matt Wilson
  0 siblings, 2 replies; 11+ messages in thread
From: George Dunlap @ 2018-01-19 15:43 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: George Dunlap, Andrew Cooper

On 01/19/2018 02:37 PM, Jan Beulich wrote:
> All,
> 
> along the lines of the relatively easy first step submitted yesterday,
> I've had some further thoughts in that direction. A fundamental
> thing for this is of course to first of all establish what kind of
> information we consider safe to expose (in the long run) to guests.
> 
> The current state of things is deemed incomplete, yet despite my
> earlier inquiries I haven't heard back any concrete example of
> information, exposure of which does any harm. While it seems to be
> generally believed that large parts of the Xen image should not be
> exposed, it's not all that clear to me why that would be. I could
> agree with better hiding writable data parts of it, just to be on the
> safe side (I'm unaware of statically allocated data though which
> might carry any secrets), but what would be the point of hiding
> code and r/o data? Anyone wanting to know their contents can
> simply obtain the Xen binary for their platform.

This tails into a discussion I think we should have about dealing with
SP1, and also future-proofing against future speculative execution attacks.

Right now there are "windows" through which people can look using SP1-3,
which we are trying to close.  SP1's "window" is the guest -> hypervisor
virtual address space (hence XPTI, separating the address spaces).
SP2's "window" is branch-target-poisoned gadgets (hence using retpoline
and various techniques to prevent branch target poisoning).  SP1's
"window" is array boundary privilege checks, hence Linux's attempts to
prevent speculation over privilege checks by using lfence or other
tricks[1].
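
To make the SP1 mitigation above concrete, here is a rough sketch of
the lfence idiom -- the function and array names are made up for
illustration; this is not actual Linux or Xen code:

    #include <stdint.h>

    #define NR_ENTRIES 128
    static uint64_t table[NR_ENTRIES];

    /*
     * Hypothetical bounds-checked read.  Without the barrier the CPU may
     * speculatively issue the load with an out-of-range idx before the
     * comparison has retired; the lfence forces the check to complete
     * before the array access can happen.
     */
    static uint64_t read_entry(unsigned int idx)
    {
        if ( idx >= NR_ENTRIES )
            return 0;
        asm volatile ( "lfence" ::: "memory" );  /* speculation barrier */
        return table[idx];
    }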

But there will surely be more attacks like this (in fact, there may
already be some in the works[2]).

So what if instead of trying to close the "windows", we made it so that
there was nothing through the windows to see?  If no matter what the
hypervisor speculatively executed, nothing sensitive was visible except
what a vcpu was already allowed to see,

At a first cut, there are two kinds of data inside the hypervisor which
might be interesting to an attacker:

1) Guest data: private encryption keys, secret data, &c
 1a. Direct copies of guest data
 1b. Data from which an attacker can infer guest data

2) Hypervisor data that makes it easier to perform other exploits.  For
instance, the layout of memory, the exact address of certain dynamic
data structures,  &c.

Personally I don't think we should worry too much about #2.  The main
thing we should be focusing on is 1a and 1b.

Another potential consideration is information about what monitoring
tools might be deployed against the attacker; an attacker might act
differently if she knew that VMI was being used than otherwise.  But I
doubt that the presence of VMI is really going to be able to be kept
secret very well; if I had a choice between obfuscating VMI and
recovering performance lost to SP* mitigations, I think I'd go for
performance.

> The reason I bring this up is because further steps in the direction
> of recovering performance would likely require as a prerequisite
> exposure of further data, first and foremost struct vcpu and
> struct domain for the currently active vCPU. Once again I'm not
> aware of any secrets living there. Another item might need to be
> the local CPU's per-CPU data.

A quick glance through struct vcpu doesn't turn up anything obvious.  If
we were worried about RowHammer, the MFNs of various structures
might be worth hiding.

Maybe it would be better to start "whitelisting" state that was believed
to be safe, rather than blacklisting state known to be dangerous.

On the whole I agree with Jan's approach, to start exposing, for
performance reasons, bits of state we believe to be safe, and then deal
with attacks as they come up.

 -George

[1] https://lwn.net/SubscriberLink/744287/02dd9bc503409ca3/
[2] skyfallattack.com




* Re: further post-Meltdown-bad-aid performance thoughts
  2018-01-19 15:43 ` George Dunlap
@ 2018-01-19 16:36   ` Jan Beulich
  2018-01-19 17:00     ` George Dunlap
  2018-01-22 17:44   ` Matt Wilson
  1 sibling, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2018-01-19 16:36 UTC (permalink / raw)
  To: George Dunlap; +Cc: George Dunlap, Andrew Cooper, xen-devel

>>> On 19.01.18 at 16:43, <george.dunlap@citrix.com> wrote:
> On 01/19/2018 02:37 PM, Jan Beulich wrote:
>> All,
>> 
>> along the lines of the relatively easy first step submitted yesterday,
>> I've had some further thoughts in that direction. A fundamental
>> thing for this is of course to first of all establish what kind of
>> information we consider safe to expose (in the long run) to guests.
>> 
>> The current state of things is deemed incomplete, yet despite my
>> earlier inquiries I haven't heard back any concrete example of
>> information, exposure of which does any harm. While it seems to be
>> generally believed that large parts of the Xen image should not be
>> exposed, it's not all that clear to me why that would be. I could
>> agree with better hiding writable data parts of it, just to be on the
>> safe side (I'm unaware of statically allocated data though which
>> might carry any secrets), but what would be the point of hiding
>> code and r/o data? Anyone wanting to know their contents can
>> simply obtain the Xen binary for their platform.
> 
> This tails into a discussion I think we should have about dealing with
> SP1, and also future-proofing against future speculative execution attacks.
> 
> Right now there are "windows" through which people can look using SP1-3,
> which we are trying to close.  SP1's "window" is the guest -> hypervisor

I think you mean SP3 here.

> virtual address space (hence XPTI, separating the address spaces).
> SP2's "window" is branch-target-poisoned gadgets (hence using retpoline
> and various techniques to prevent branch target poisoning).  SP1's
> "window" is array boundary privilege checks, hence Linux's attempts to
> prevent speculation over privilege checks by using lfence or other
> tricks[1].
> 
> But there will surely be more attacks like this (in fact, there may
> already be some in the works[2]).
> 
> So what if instead of trying to close the "windows", we made it so that
> there was nothing through the windows to see?  If no matter what the
> hypervisor speculatively executed, nothing sensitive was visible except
> what a vcpu was already allowed to see,

I think you didn't finish your sentence here, but I also think I
can guess the missing part. There's a price to pay for such an
approach though - iterating over domains, or vCPU-s of a
domain (just as an example) wouldn't be simple list walks
anymore. There are certainly other things. IOW - yes, an
approach like this seems possible, but with all the lost
performance I think we shouldn't go overboard with further
hiding.

Jan



* Re: further post-Meltdown-bad-aid performance thoughts
  2018-01-19 16:36   ` Jan Beulich
@ 2018-01-19 17:00     ` George Dunlap
  2018-01-22  9:25       ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: George Dunlap @ 2018-01-19 17:00 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, xen-devel

On 01/19/2018 04:36 PM, Jan Beulich wrote:
>>>> On 19.01.18 at 16:43, <george.dunlap@citrix.com> wrote:
>> On 01/19/2018 02:37 PM, Jan Beulich wrote:
>>> All,
>>>
>>> along the lines of the relatively easy first step submitted yesterday,
>>> I've had some further thoughts in that direction. A fundamental
>>> thing for this is of course to first of all establish what kind of
>>> information we consider safe to expose (in the long run) to guests.
>>>
>>> The current state of things is deemed incomplete, yet despite my
>>> earlier inquiries I haven't heard back any concrete example of
>>> information, exposure of which does any harm. While it seems to be
>>> generally believed that large parts of the Xen image should not be
>>> exposed, it's not all that clear to me why that would be. I could
>>> agree with better hiding writable data parts of it, just to be on the
>>> safe side (I'm unaware of statically allocated data though which
>>> might carry any secrets), but what would be the point of hiding
>>> code and r/o data? Anyone wanting to know their contents can
>>> simply obtain the Xen binary for their platform.
>>
>> This tails into a discussion I think we should have about dealing with
>> SP1, and also future-proofing against future speculative execution attacks.
>>
>> Right now there are "windows" through which people can look using SP1-3,
>> which we are trying to close.  SP1's "window" is the guest -> hypervisor
> 
> I think you mean SP3 here.
> 
>> virtual address space (hence XPTI, separating the address spaces).
>> SP2's "window" is branch-target-poisoned gadgets (hence using retpoline
>> and various techniques to prevent branch target poisoning).  SP1's
>> "window" is array boundary privilege checks, hence Linux's attempts to
>> prevent speculation over privilege checks by using lfence or other
>> tricks[1].
>>
>> But there will surely be more attacks like this (in fact, there may
>> already be some in the works[2]).
>>
>> So what if instead of trying to close the "windows", we made it so that
>> there was nothing through the windows to see?  If no matter what the
>> hypervisor speculatively executed, nothing sensitive was visible except
>> what a vcpu was already allowed to see,
> 
> I think you didn't finish your sentence here, but I also think I
> can guess the missing part. There's a price to pay for such an
> approach though - iterating over domains, or vCPU-s of a
> domain (just as an example) wouldn't be simple list walks
> anymore. There are certainly other things. IOW - yes, an
> approach like this seems possible, but with all the lost
> performance I think we shouldn't go overboard with further
> hiding.

Right, so the next question: what information *from other guests* is
sensitive?

Obviously the guest registers are sensitive.  But how much of the
information in vcpu struct that we actually need to have "to hand" is
actually sensitive information that we need to hide from other VMs?

 -George


* Re: further post-Meltdown-bad-aid performance thoughts
  2018-01-19 17:00     ` George Dunlap
@ 2018-01-22  9:25       ` Jan Beulich
  2018-01-22 12:33         ` George Dunlap
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2018-01-22  9:25 UTC (permalink / raw)
  To: George Dunlap; +Cc: George Dunlap, Andrew Cooper, xen-devel

>>> On 19.01.18 at 18:00, <george.dunlap@citrix.com> wrote:
> On 01/19/2018 04:36 PM, Jan Beulich wrote:
>>>>> On 19.01.18 at 16:43, <george.dunlap@citrix.com> wrote:
>>> So what if instead of trying to close the "windows", we made it so that
>>> there was nothing through the windows to see?  If no matter what the
>>> hypervisor speculatively executed, nothing sensitive was visible except
>>> what a vcpu was already allowed to see,
>> 
>> I think you didn't finish your sentence here, but I also think I
>> can guess the missing part. There's a price to pay for such an
>> approach though - iterating over domains, or vCPU-s of a
>> domain (just as an example) wouldn't be simple list walks
>> anymore. There are certainly other things. IOW - yes, an
>> approach like this seems possible, but with all the lost
>> performance I think we shouldn't go overboard with further
>> hiding.
> 
> Right, so the next question: what information *from other guests* is
> sensitive?
> 
> Obviously the guest registers are sensitive.  But how much of the
> information in vcpu struct that we actually need to have "to hand" is
> actually sensitive information that we need to hide from other VMs?

None, I think. But that's not the main aspect here. struct vcpu
instances come and go, which would mean we'd have to
permanently update what is or is not being exposed in the page
tables used. This, while solvable, is going to be a significant
burden in terms of synchronizing page tables (if we continue to
use per-CPU ones) and/or TLB shootdown. Whereas if only the
running vCPU's structure (and its struct domain) are exposed,
no such synchronization is needed (things would simply be
updated during context switch).
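
To illustrate what I mean by "updated during context switch" -- a rough
sketch only, with invented helper names and addresses, not actual code:

    struct vcpu;
    struct domain;

    /* Invented helpers standing in for the real page table machinery. */
    void remap_fixed_slot(unsigned long va, const void *target);
    void flush_local_tlb(void);

    /* Made-up fixed virtual addresses at which the *currently running*
     * vCPU's struct vcpu / struct domain would always be visible. */
    #define CURR_VCPU_SLOT   0xffff830000000000UL
    #define CURR_DOMAIN_SLOT 0xffff830000010000UL

    /*
     * Called from the context switch path: repoint the local CPU's page
     * tables so the fixed slots cover the incoming vCPU and its domain.
     * Only the local TLB needs flushing - no cross-CPU shootdown, since
     * no other CPU ever uses this CPU's slots.
     */
    static void expose_current_vcpu(struct vcpu *next, struct domain *d)
    {
        remap_fixed_slot(CURR_VCPU_SLOT, next);
        remap_fixed_slot(CURR_DOMAIN_SLOT, d);
        flush_local_tlb();
    }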

Jan



* Re: further post-Meltdown-bad-aid performance thoughts
  2018-01-22  9:25       ` Jan Beulich
@ 2018-01-22 12:33         ` George Dunlap
  2018-01-22 13:30           ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: George Dunlap @ 2018-01-22 12:33 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, xen-devel

On 01/22/2018 09:25 AM, Jan Beulich wrote:
>>>> On 19.01.18 at 18:00, <george.dunlap@citrix.com> wrote:
>> On 01/19/2018 04:36 PM, Jan Beulich wrote:
>>>>>> On 19.01.18 at 16:43, <george.dunlap@citrix.com> wrote:
>>>> So what if instead of trying to close the "windows", we made it so that
>>>> there was nothing through the windows to see?  If no matter what the
>>>> hypervisor speculatively executed, nothing sensitive was visible except
>>>> what a vcpu was already allowed to see,
>>>
>>> I think you didn't finish your sentence here, but I also think I
>>> can guess the missing part. There's a price to pay for such an
>>> approach though - iterating over domains, or vCPU-s of a
>>> domain (just as an example) wouldn't be simple list walks
>>> anymore. There are certainly other things. IOW - yes, an
>>> approach like this seems possible, but with all the lost
>>> performance I think we shouldn't go overboard with further
>>> hiding.
>>
>> Right, so the next question: what information *from other guests* is
>> sensitive?
>>
>> Obviously the guest registers are sensitive.  But how much of the
>> information in vcpu struct that we actually need to have "to hand" is
>> actually sensitive information that we need to hide from other VMs?
> 
> None, I think. But that's not the main aspect here. struct vcpu
> instances come and go, which would mean we'd have to
> permanently update what is or is not being exposed in the page
> tables used. This, while solvable, is going to be a significant
> burden in terms of synchronizing page tables (if we continue to
> use per-CPU ones) and/or TLB shootdown. Whereas if only the
> running vCPU's structure (and its struct domain) are exposed,
> no such synchronization is needed (things would simply be
> updated during context switch).

I'm not sure we're actually communicating.

Correct me if I'm wrong; at the moment, under XPTI, hypercalls running
under Xen still have access to all of host memory.  To protect against
SP3, we remove almost all Xen memory from the address space before
switching to the guest.

What I'm proposing is something like this:

* We have a "global" region of Xen memory that is mapped by all
processors.  This will contain everything we consider not sensitive;
including Xen text segments, and most domain and vcpu data.  But it will
*not* map all of host memory, nor have access to sensitive data, such as
vcpu register state.

* We have per-cpu "local" regions.  In this region we will map,
on-demand, guest memory which is needed to perform current operations.
(We can consider how strictly we need to unmap memory after using it.)
We will also map the current vcpu's registers.

* On entry to a 64-bit PV guest, we don't change the mapping at all.

Now, no matter what the speculative attack -- SP1, SP2, or SP3 -- a vcpu
can only access its own RAM and registers.  There's no extra overhead to
context switching into or out of the hypervisor.
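
To make the per-cpu "local" region a bit more concrete, here is a
minimal sketch of the on-demand mapping it would provide -- all names
and the slot management are invented for illustration, not a proposed
interface:

    #include <stdbool.h>
    #include <stddef.h>

    #define LOCAL_SLOTS 32

    /* One of these per physical CPU, living inside the local region. */
    struct local_window {
        unsigned long slot_va[LOCAL_SLOTS]; /* fixed per-CPU virtual addresses */
        bool          in_use[LOCAL_SLOTS];
    };

    /* Invented helper: install a mapping of the given machine frame at a
     * per-CPU virtual address.  Local page tables only, so no shootdown. */
    void set_local_pte(unsigned long va, unsigned long mfn);

    /*
     * Map one guest frame into the local region on demand and return a
     * pointer usable for the current operation.  How eagerly the caller
     * unmaps it again is exactly the "how strictly" question above.
     */
    void *map_guest_mfn_local(struct local_window *w, unsigned long mfn)
    {
        for ( unsigned int i = 0; i < LOCAL_SLOTS; i++ )
        {
            if ( !w->in_use[i] )
            {
                w->in_use[i] = true;
                set_local_pte(w->slot_va[i], mfn);
                return (void *)w->slot_va[i];
            }
        }
        return NULL; /* all slots busy; a real implementation would wait */
    }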

Given that, I don't understand what the following comments mean:

"There's a price to pay for such an approach though - iterating over
domains, or vCPU-s of a domain (just as an example) wouldn't be simple
list walks anymore."

If we remove sensitive information from the domain and vcpu structs,
then any bit of hypervisor code can iterate over domain and vcpu structs
at will; only if they actually need to read or write sensitive data will
they have to perform an expensive map/unmap operation.  But in general,
to read another vcpu's registers you already need to do a vcpu_pause() /
vcpu_unpause(), which involves at least two IPIs (with one
spin-and-wait), so it doesn't seem like that should add a lot of extra
overhead.

"struct vcpu instances come and go, which would mean we'd have to
permanently update what is or is not being exposed in the page tables
used. This, while solvable, is going to be a significant burden in terms
of synchronizing page tables (if we continue to use per-CPU ones) and/or
TLB shootdown."

I don't understand what this is referring to in my proposed plan above.

 -George


* Re: further post-Meltdown-bad-aid performance thoughts
  2018-01-22 12:33         ` George Dunlap
@ 2018-01-22 13:30           ` Jan Beulich
  2018-01-22 15:15             ` George Dunlap
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2018-01-22 13:30 UTC (permalink / raw)
  To: George Dunlap; +Cc: George Dunlap, Andrew Cooper, xen-devel

>>> On 22.01.18 at 13:33, <george.dunlap@citrix.com> wrote:
> On 01/22/2018 09:25 AM, Jan Beulich wrote:
>>>>> On 19.01.18 at 18:00, <george.dunlap@citrix.com> wrote:
>>> On 01/19/2018 04:36 PM, Jan Beulich wrote:
>>>>>>> On 19.01.18 at 16:43, <george.dunlap@citrix.com> wrote:
>>>>> So what if instead of trying to close the "windows", we made it so that
>>>>> there was nothing through the windows to see?  If no matter what the
>>>>> hypervisor speculatively executed, nothing sensitive was visible except
>>>>> what a vcpu was already allowed to see,
>>>>
>>>> I think you didn't finish your sentence here, but I also think I
>>>> can guess the missing part. There's a price to pay for such an
>>>> approach though - iterating over domains, or vCPU-s of a
>>>> domain (just as an example) wouldn't be simple list walks
>>>> anymore. There are certainly other things. IOW - yes, an
>>>> approach like this seems possible, but with all the lost
>>>> performance I think we shouldn't go overboard with further
>>>> hiding.
>>>
>>> Right, so the next question: what information *from other guests* is
>>> sensitive?
>>>
>>> Obviously the guest registers are sensitive.  But how much of the
>>> information in vcpu struct that we actually need to have "to hand" is
>>> actually sensitive information that we need to hide from other VMs?
>> 
>> None, I think. But that's not the main aspect here. struct vcpu
>> instances come and go, which would mean we'd have to
>> permanently update what is or is not being exposed in the page
>> tables used. This, while solvable, is going to be a significant
>> burden in terms of synchronizing page tables (if we continue to
>> use per-CPU ones) and/or TLB shootdown. Whereas if only the
>> running vCPU's structure (and its struct domain) are exposed,
>> no such synchronization is needed (things would simply be
>> updated during context switch).
> 
> I'm not sure we're actually communicating.
> 
> Correct me if I'm wrong; at the moment, under XPTI, hypercalls running
> under Xen still have access to all of host memory.  To protect against
> SP3, we remove almost all Xen memory from the address space before
> switching to the guest.
> 
> What I'm proposing is something like this:
> 
> * We have a "global" region of Xen memory that is mapped by all
> processors.  This will contain everything we consider not sensitive;
> including Xen text segments, and most domain and vcpu data.  But it will
> *not* map all of host memory, nor have access to sensitive data, such as
> vcpu register state.
> 
> * We have per-cpu "local" regions.  In this region we will map,
> on-demand, guest memory which is needed to perform current operations.
> (We can consider how strictly we need to unmap memory after using it.)
> We will also map the current vcpu's registers.
> 
> * On entry to a 64-bit PV guest, we don't change the mapping at all.
> 
> Now, no matter what the speculative attack -- SP1, SP2, or SP3 -- a vcpu
> can only access its own RAM and registers.  There's no extra overhead to
> context switching into or out of the hypervisor.

And we would open back up the SP3 variant of guest user mode
attacking its own kernel by going through the Xen mappings. I
can't exclude that variants of SP1 (less likely SP2) allowing indirect
guest-user -> guest-kernel attacks could be found.

> Given that, I don't understand what the following comments mean:
> 
> "There's a price to pay for such an approach though - iterating over
> domains, or vCPU-s of a domain (just as an example) wouldn't be simple
> list walks anymore."
> 
> If we remove sensitive information from the domain and vcpu structs,
> then any bit of hypervisor code can iterate over domain and vcpu structs
> at will; only if they actually need to read or write sensitive data will
> they have to perform an expensive map/unmap operation.  But in general,
> to read another vcpu's registers you already need to do a vcpu_pause() /
> vcpu_unpause(), which involves at least two IPIs (with one
> spin-and-wait), so it doesn't seem like that should add a lot of extra
> overhead.

Reading another vCPU's registers can't be compared with e.g.
wanting to deliver an interrupt to other than the currently running
vCPU.

> "struct vcpu instances come and go, which would mean we'd have to
> permanently update what is or is not being exposed in the page tables
> used. This, while solvable, is going to be a significant burden in terms
> of synchronizing page tables (if we continue to use per-CPU ones) and/or
> TLB shootdown."
> 
> I don't understand what this is referring to in my proposed plan above.

I had specifically said these were just examples (ones coming to
mind immediately). Of course splitting such structures in two parts
is an option, but I'm not sure it's a reasonable one (which perhaps
depends on details of how you would envision the implementation).
If the split off piece(s) was/were being referred to by pointers out
of the main structure, there would be a meaningful risk of some
perhaps rarely executed piece of code de-referencing it in the
wrong context. Otoh entirely independent structures (without
pointers in either direction) would need careful management of
their life times, so one doesn't go away without the other.
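
Purely to make the pointer-based split concrete - the field and type
names below are entirely made up:

    struct domain;

    /* "Public" piece: lives in the always-mapped region; no secrets. */
    struct vcpu_public {
        unsigned int         vcpu_id;
        struct domain       *domain;
        /* scheduling state, pause flags, etc. */
        struct vcpu_private *priv;  /* mapped only for the currently
                                     * running vCPU - de-referencing it
                                     * for any other vCPU is exactly the
                                     * wrong-context risk mentioned above */
    };

    /* "Private" piece: register state etc., mapped on demand / per-CPU. */
    struct vcpu_private {
        unsigned long gprs[16];     /* stand-in for the real register block */
        /* other sensitive state */
    };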

You mention the possibility of on demand mapping - if data
structures aren't used frequently, that's certainly an option.
In the end there's a lot of uncertainty here whether the in theory
nice outline could actually live up to the requirements of an
actual implementation. Yet considering the (presumably)
fundamental re-structuring of data which would be required
here calls for at least some of this uncertainty to be addressed
before actually making an attempt to switch over to such a
model.

Jan


* Re: further post-Meltdown-bad-aid performance thoughts
  2018-01-22 13:30           ` Jan Beulich
@ 2018-01-22 15:15             ` George Dunlap
  2018-01-22 17:04               ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: George Dunlap @ 2018-01-22 15:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, xen-devel

On 01/22/2018 01:30 PM, Jan Beulich wrote:
>>>> On 22.01.18 at 13:33, <george.dunlap@citrix.com> wrote:
>> On 01/22/2018 09:25 AM, Jan Beulich wrote:
>>>>>> On 19.01.18 at 18:00, <george.dunlap@citrix.com> wrote:
>>>> On 01/19/2018 04:36 PM, Jan Beulich wrote:
>>>>>>>> On 19.01.18 at 16:43, <george.dunlap@citrix.com> wrote:
>>>>>> So what if instead of trying to close the "windows", we made it so that
>>>>>> there was nothing through the windows to see?  If no matter what the
>>>>>> hypervisor speculatively executed, nothing sensitive was visible except
>>>>>> what a vcpu was already allowed to see,
>>>>>
>>>>> I think you didn't finish your sentence here, but I also think I
>>>>> can guess the missing part. There's a price to pay for such an
>>>>> approach though - iterating over domains, or vCPU-s of a
>>>>> domain (just as an example) wouldn't be simple list walks
>>>>> anymore. There are certainly other things. IOW - yes, an
>>>>> approach like this seems possible, but with all the lost
>>>>> performance I think we shouldn't go overboard with further
>>>>> hiding.
>>>>
>>>> Right, so the next question: what information *from other guests* is
>>>> sensitive?
>>>>
>>>> Obviously the guest registers are sensitive.  But how much of the
>>>> information in vcpu struct that we actually need to have "to hand" is
>>>> actually sensitive information that we need to hide from other VMs?
>>>
>>> None, I think. But that's not the main aspect here. struct vcpu
>>> instances come and go, which would mean we'd have to
>>> permanently update what is or is not being exposed in the page
>>> tables used. This, while solvable, is going to be a significant
>>> burden in terms of synchronizing page tables (if we continue to
>>> use per-CPU ones) and/or TLB shootdown. Whereas if only the
>>> running vCPU's structure (and its struct domain) are exposed,
>>> no such synchronization is needed (things would simply be
>>> updated during context switch).
>>
>> I'm not sure we're actually communicating.
>>
>> Correct me if I'm wrong; at the moment, under XPTI, hypercalls running
>> under Xen still have access to all of host memory.  To protect against
>> SP3, we remove almost all Xen memory from the address space before
>> switching to the guest.
>>
>> What I'm proposing is something like this:
>>
>> * We have a "global" region of Xen memory that is mapped by all
>> processors.  This will contain everything we consider not sensitive;
>> including Xen text segments, and most domain and vcpu data.  But it will
>> *not* map all of host memory, nor have access to sensitive data, such as
>> vcpu register state.
>>
>> * We have per-cpu "local" regions.  In this region we will map,
>> on-demand, guest memory which is needed to perform current operations.
>> (We can consider how strictly we need to unmap memory after using it.)
>> We will also map the current vcpu's registers.
>>
>> * On entry to a 64-bit PV guest, we don't change the mapping at all.
>>
>> Now, no matter what the speculative attack -- SP1, SP2, or SP3 -- a vcpu
>> can only access its own RAM and registers.  There's no extra overhead to
>> context switching into or out of the hypervisor.
> 
> And we would open back up the SP3 variant of guest user mode
> attacking its own kernel by going through the Xen mappings. I
> can't exclude that variants of SP1 (less likely SP2) allowing indirect
> guest-user -> guest-kernel attacks could be found.

How?  Xen doesn't have the guest kernel memory mapped when it's not
using it.

>> Given that, I don't understand what the following comments mean:
>>
>> "There's a price to pay for such an approach though - iterating over
>> domains, or vCPU-s of a domain (just as an example) wouldn't be simple
>> list walks anymore."
>>
>> If we remove sensitive information from the domain and vcpu structs,
>> then any bit of hypervisor code can iterate over domain and vcpu structs
>> at will; only if they actually need to read or write sensitive data will
>> they have to perform an expensive map/unmap operation.  But in general,
>> to read another vcpu's registers you already need to do a vcpu_pause() /
>> vcpu_unpause(), which involves at least two IPIs (with one
>> spin-and-wait), so it doesn't seem like that should add a lot of extra
>> overhead.
> 
> Reading another vCPU's registers can't be compared with e.g.
> wanting to deliver an interrupt to other than the currently running
> vCPU.

I'm not sure what this has to do with what I said.  Your original claim
was that "iterating over domains wouldn't be simple list walks anymore",
and I said it would be.

If you want to make some other claim about the cost of delivering an
interrupt to another vcpu then please actually make a claim and justify it.

>> "struct vcpu instances come and go, which would mean we'd have to
>> permanently update what is or is not being exposed in the page tables
>> used. This, while solvable, is going to be a significant burden in terms
>> of synchronizing page tables (if we continue to use per-CPU ones) and/or
>> TLB shootdown."
>>
>> I don't understand what this is referring to in my proposed plan above.
> 
> I had specifically said these were just examples (ones coming to
> mind immediately).

And what I'm saying is that I haven't been able to infer any examples
here.  I can't tell whether there's some misunderstanding of yours I can
correct, or if there's some misunderstanding of mine that I can take on
(either to solve or dissuade me from pursuing this idea further),
because I don't know what you're talking about.

> Of course splitting such structures in two parts
> is an option, but I'm not sure it's a reasonable one (which perhaps
> depends on details of how you would envision the implementation).
> If the split off piece(s) was/were being referred to by pointers out
> of the main structure, there would be a meaningful risk of some
> perhaps rarely executed piece of code de-referencing it in the
> wrong context. Otoh entirely independent structures (without
> pointers in either direction) would need careful management of
> their life times, so one doesn't go away without the other.

Well the obvious thing to do would be to change all accesses of
"sensitive" data to go through an accessor function.  The accessor
function could determine if the data was already mapped or if it needed
to be mapped before returning it.
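
A minimal sketch of such an accessor, with invented names -- the
map/unmap machinery and the fixed "current" slot are assumed to exist:

    struct vcpu;
    struct vcpu_regs;                       /* the sensitive register block */

    /* Invented helpers assumed to be provided elsewhere. */
    struct vcpu *current_vcpu(void);
    struct vcpu_regs *current_regs_slot(void);              /* fixed mapping */
    struct vcpu_regs *map_vcpu_regs_local(struct vcpu *v);  /* on-demand map */

    /*
     * Accessor for a vCPU's register state.  Code that merely walks the
     * domain/vcpu lists never calls this and so never pays for a mapping;
     * only code that actually touches sensitive data does.
     */
    static struct vcpu_regs *vcpu_regs(struct vcpu *v)
    {
        if ( v == current_vcpu() )
            return current_regs_slot();     /* fast path: already mapped */

        return map_vcpu_regs_local(v);      /* slow path: map on demand */
    }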

> You mention the possibility of on demand mapping - if data
> structures aren't used frequently, that's certainly an option.
> In the end there's a lot of uncertainty here whether the in theory
> nice outline could actually live up to the requirements of an
> actual implementation. Yet considering the (presumably)
> fundamental re-structuring of data which would be required
> here calls for at least some of this uncertainty to be addressed
> before actually making an attempt to switch over to such a
> model.

Of course, and that's what I'm proposing we do -- explore the
possibility of a "panopticon"* Xen.  The question of exactly what bits
of hypervisor state we should consider 'sensitive' needs answering both
for your purposes (short-term XPTI performance improvements) and for
mine (long-term restructuring to potentially mitigate all information
leaks).

 -George

* ...where Xen assumes that its mapped memory is observed by a running vcpu
at all times, a la [ https://en.wikipedia.org/wiki/Panopticon ]


* Re: further post-Meltdown-bad-aid performance thoughts
  2018-01-22 15:15             ` George Dunlap
@ 2018-01-22 17:04               ` Jan Beulich
  2018-01-22 17:11                 ` George Dunlap
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2018-01-22 17:04 UTC (permalink / raw)
  To: George Dunlap; +Cc: George Dunlap, Andrew Cooper, xen-devel

>>> On 22.01.18 at 16:15, <george.dunlap@citrix.com> wrote:
> On 01/22/2018 01:30 PM, Jan Beulich wrote:
>>>>> On 22.01.18 at 13:33, <george.dunlap@citrix.com> wrote:
>>> What I'm proposing is something like this:
>>>
>>> * We have a "global" region of Xen memory that is mapped by all
>>> processors.  This will contain everything we consider not sensitive;
>>> including Xen text segments, and most domain and vcpu data.  But it will
>>> *not* map all of host memory, nor have access to sensitive data, such as
>>> vcpu register state.
>>>
>>> * We have per-cpu "local" regions.  In this region we will map,
>>> on-demand, guest memory which is needed to perform current operations.
>>> (We can consider how strictly we need to unmap memory after using it.)
>>> We will also map the current vcpu's registers.
>>>
>>> * On entry to a 64-bit PV guest, we don't change the mapping at all.
>>>
>>> Now, no matter what the speculative attack -- SP1, SP2, or SP3 -- a vcpu
>>> can only access its own RAM and registers.  There's no extra overhead to
>>> context switching into or out of the hypervisor.
>> 
>> And we would open back up the SP3 variant of guest user mode
>> attacking its own kernel by going through the Xen mappings. I
>> can't exclude that variants of SP1 (less likely SP2) allowing indirect
>> guest-user -> guest-kernel attacks could be found.
> 
> How?  Xen doesn't have the guest kernel memory mapped when it's not
> using it.

Oh, so you mean to do away with the direct map altogether?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: further post-Meltdown-bad-aid performance thoughts
  2018-01-22 17:04               ` Jan Beulich
@ 2018-01-22 17:11                 ` George Dunlap
  0 siblings, 0 replies; 11+ messages in thread
From: George Dunlap @ 2018-01-22 17:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, xen-devel

On 01/22/2018 05:04 PM, Jan Beulich wrote:
>>>> On 22.01.18 at 16:15, <george.dunlap@citrix.com> wrote:
>> On 01/22/2018 01:30 PM, Jan Beulich wrote:
>>>>>> On 22.01.18 at 13:33, <george.dunlap@citrix.com> wrote:
>>>> What I'm proposing is something like this:
>>>>
>>>> * We have a "global" region of Xen memory that is mapped by all
>>>> processors.  This will contain everything we consider not sensitive;
>>>> including Xen text segments, and most domain and vcpu data.  But it will
>>>> *not* map all of host memory, nor have access to sensitive data, such as
>>>> vcpu register state.
>>>>
>>>> * We have per-cpu "local" regions.  In this region we will map,
>>>> on-demand, guest memory which is needed to perform current operations.
>>>> (We can consider how strictly we need to unmap memory after using it.)
>>>> We will also map the current vcpu's registers.
>>>>
>>>> * On entry to a 64-bit PV guest, we don't change the mapping at all.
>>>>
>>>> Now, no matter what the speculative attack -- SP1, SP2, or SP3 -- a vcpu
>>>> can only access its own RAM and registers.  There's no extra overhead to
>>>> context switching into or out of the hypervisor.
>>>
>>> And we would open back up the SP3 variant of guest user mode
>>> attacking its own kernel by going through the Xen mappings. I
>>> can't exclude that variants of SP1 (less likely SP2) allowing indirect
>>> guest-user -> guest-kernel attacks could be found.
>>
>> How?  Xen doesn't have the guest kernel memory mapped when it's not
>> using it.
> 
> Oh, so you mean to do away with the direct map altogether?

Yes. :-)  The direct map is *the* core reason why the SP*
vulnerabilities are so dangerous.  If the *only* thing we did was get
rid of the direct map, without doing *anything* else, we would almost
entirely mitigate the effect of all of the attacks.

 -George


* Re: further post-Meltdown-bad-aid performance thoughts
  2018-01-19 15:43 ` George Dunlap
  2018-01-19 16:36   ` Jan Beulich
@ 2018-01-22 17:44   ` Matt Wilson
  1 sibling, 0 replies; 11+ messages in thread
From: Matt Wilson @ 2018-01-22 17:44 UTC (permalink / raw)
  To: George Dunlap; +Cc: George Dunlap, xen-devel, Jan Beulich, Andrew Cooper

On Fri, Jan 19, 2018 at 03:43:26PM +0000, George Dunlap wrote:
[...] 

> But there will surely be more attacks like this (in fact, there may
> already be some in the works[2]).

[...]
 
>  -George
> 
> [1] https://lwn.net/SubscriberLink/744287/02dd9bc503409ca3/
> [2] skyfallattack.com

In case anyone missed it, [2] is an under-informed hoax.

--msw

