From: George Dunlap <george.dunlap@citrix.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
	Jan Beulich <JBeulich@suse.com>, Juergen Gross <jgross@suse.com>
Cc: wei.liu2@citrix.com, George.Dunlap@eu.citrix.com,
	ian.jackson@eu.citrix.com, Dario Faggioli <dfaggioli@suse.com>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH RFC v2 00/12] xen/x86: use per-vcpu stacks for 64 bit pv domains
Date: Tue, 23 Jan 2018 11:06:46 +0000
Message-ID: <f6cd2e22-da88-7c09-6474-19a4c47bc61a@citrix.com>
In-Reply-To: <8c922b5c-0708-8b6b-4c4e-e6e3c41ec385@citrix.com>

On 01/22/2018 07:02 PM, Andrew Cooper wrote:
> On 22/01/18 18:48, George Dunlap wrote:
>> On 01/22/2018 06:39 PM, Andrew Cooper wrote:
>>> On 22/01/18 16:51, Jan Beulich wrote:
>>>>>>> On 22.01.18 at 16:00, <jgross@suse.com> wrote:
>>>>> On 22/01/18 15:48, Jan Beulich wrote:
>>>>>>>>> On 22.01.18 at 15:38, <jgross@suse.com> wrote:
>>>>>>> On 22/01/18 15:22, Jan Beulich wrote:
>>>>>>>>>>> On 22.01.18 at 15:18, <jgross@suse.com> wrote:
>>>>>>>>> On 22/01/18 13:50, Jan Beulich wrote:
>>>>>>>>>>>>> On 22.01.18 at 13:32, <jgross@suse.com> wrote:
>>>>>>>>>>> As preparation for doing page table isolation in the Xen hypervisor
>>>>>>>>>>> in order to mitigate "Meltdown", use dedicated stacks, GDT and TSS for
>>>>>>>>>>> 64 bit PV domains, mapped to the per-domain virtual area.
>>>>>>>>>>>
>>>>>>>>>>> The per-vcpu stacks are used for early interrupt handling only. After
>>>>>>>>>>> saving the domain's registers, stacks are switched back to the normal
>>>>>>>>>>> per physical cpu ones in order to be able to address on-stack data
>>>>>>>>>>> from other cpus, e.g. while handling IPIs.
>>>>>>>>>>>
>>>>>>>>>>> Adding a %cr3 switch between saving the registers and switching
>>>>>>>>>>> the stacks will make it possible to run guest code without any
>>>>>>>>>>> per physical cpu mapping, i.e. avoiding the threat of a guest being
>>>>>>>>>>> able to access other domains' data.
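
For illustration, a minimal C-level sketch of the sequence described
above (the real code is assembly in arch/x86/x86_64/entry.S, and every
identifier here is hypothetical):

struct cpu_user_regs { unsigned long rip, rsp, rflags; /* ... */ };

struct pv_vcpu_area {                 /* lives in the per-domain area */
    struct cpu_user_regs saved_regs;  /* guest registers saved on entry */
    unsigned long xen_cr3;            /* full Xen page tables */
    unsigned long percpu_stack;       /* normal per physical cpu stack */
};

/* Illustrative stubs -- these operations are really done in assembly. */
static void save_guest_regs(struct cpu_user_regs *regs) { (void)regs; }
static void write_cr3(unsigned long cr3) { (void)cr3; }
static void switch_stack_and_dispatch(unsigned long sp) { (void)sp; }

void pv_interrupt_entry(struct pv_vcpu_area *area)
{
    /* 1. We arrive on the per-vcpu entry stack; only the per-domain
     *    area needs to be mapped at this point. */
    save_guest_regs(&area->saved_regs);

    /* 2. Switching %cr3 here, after the register save, is what would
     *    allow guest code to run with no per physical cpu mappings. */
    write_cr3(area->xen_cr3);

    /* 3. Move to the normal per physical cpu stack so that on-stack
     *    data of other cpus (e.g. for IPIs) becomes addressable. */
    switch_stack_and_dispatch(area->percpu_stack);
}
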
>>>>>>>>>>>
>>>>>>>>>>> Without any further measures it will still be possible for e.g. a
>>>>>>>>>>> guest's user program to read stack data of another vcpu of the same
>>>>>>>>>>> domain, but this can be easily avoided by a little PV-ABI modification
>>>>>>>>>>> introducing per-cpu user address spaces.
>>>>>>>>>>>
>>>>>>>>>>> This series is meant as a replacement for Andrew's patch series:
>>>>>>>>>>> "x86: Prerequisite work for a Xen KAISER solution".
>>>>>>>>>> Considering in particular the two reverts, what I'm missing here
>>>>>>>>>> is a clear description of the meaningful additional protection this
>>>>>>>>>> approach provides over the band-aid. For context see also
>>>>>>>>>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg01735.html 
>>>>>>>>> My approach supports mapping only the following data while the guest is
>>>>>>>>> running (apart from the guest's own data, of course):
>>>>>>>>>
>>>>>>>>> - the per-vcpu entry stacks of the domain which will contain only the
>>>>>>>>>   guest's registers saved when an interrupt occurs
>>>>>>>>> - the per-vcpu GDTs and TSSs of the domain
>>>>>>>>> - the IDT
>>>>>>>>> - the interrupt handler code (arch/x86/x86_64/[compat/]entry.S)
>>>>>>>>>
>>>>>>>>> All other hypervisor data and code can be completely hidden from the
>>>>>>>>> guests.
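
A hedged sketch of assembling that minimal mapping set (map_vcpu_area()
and every symbol below are made up for illustration; Xen's real
interfaces differ):

#include <stddef.h>

enum map_perm { MAP_RO, MAP_RW, MAP_RX };

struct vcpu;  /* opaque for this sketch */

/* hypothetical helper: map a range into the area that stays visible
 * while the guest runs */
extern void map_vcpu_area(struct vcpu *v, const void *start, size_t len,
                          enum map_perm perm);

/* hypothetical symbols for the objects listed above */
extern char entry_stack[], gdt[], tss[], idt[];
extern char entry_text_start[], entry_text_end[];

void build_guest_visible_mappings(struct vcpu *v)
{
    map_vcpu_area(v, entry_stack, 4096, MAP_RW); /* saved guest regs only */
    map_vcpu_area(v, gdt, 4096, MAP_RW);         /* per-vcpu GDT */
    map_vcpu_area(v, tss, 4096, MAP_RW);         /* per-vcpu TSS */
    map_vcpu_area(v, idt, 4096, MAP_RO);         /* the IDT */
    map_vcpu_area(v, entry_text_start,           /* [compat/]entry.S code */
                  (size_t)(entry_text_end - entry_text_start), MAP_RX);
    /* Everything else -- the 1:1 physmap, other guests' state, the rest
     * of .text/.data -- stays unmapped while the guest executes. */
}
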
>>>>>>>> I understand that. What I'm not clear about is: Which parts of
>>>>>>>> the additionally hidden data are actually necessary (or at least
>>>>>>>> very desirable) to hide?
>>>>>>> Necessary:
>>>>>>> - other guests' memory (e.g. physical memory 1:1 mapping)
>>>>>>> - data from other guests, e.g. in stack pages, debug buffers, I/O buffers,
>>>>>>>   code emulator buffers
>>>>>>> - other guests' register values e.g. in vcpu structure
>>>>>> All of this is already being made invisible by the band-aid (with the
>>>>>> exception of leftovers on the hypervisor stacks across context
>>>>>> switches, which we've already said could be taken care of by
>>>>>> memset()ing that area). I'm asking about the _additional_ benefits
>>>>>> of your approach.
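
As an aside, the memset() mitigation mentioned above might look roughly
like this (a sketch only; cpu_stack_bottom and the helper are
hypothetical):

#include <string.h>

extern char cpu_stack_bottom[];  /* base of this cpu's hypervisor stack */

static inline unsigned long current_sp(void)
{
    unsigned long sp;
    __asm__ volatile ("mov %%rsp, %0" : "=r" (sp));
    return sp;
}

void scrub_stack_residue(void)
{
    /* Clear everything below the live frames, so the next vcpu cannot
     * find the previous one's data on the shared per-cpu stack. */
    memset(cpu_stack_bottom, 0,
           current_sp() - (unsigned long)cpu_stack_bottom);
}
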
>>>>> I'm quite sure the performance will be much better as it doesn't require
>>>>> per physical cpu L4 page tables, but just a shadow L4 table for each
>>>>> guest L4 table, similar to the Linux kernel KPTI approach.
>>>> But doesn't that model have the same synchronization issues upon
>>>> guest L4 updates that Andrew was fighting with?
>>> (Condensing a lot of threads down into one)
>>>
>>> All the methods have L4 synchronisation update issues, until we have a
>>> PV ABI which guarantees that L4s don't get reused.  Any improvements to
>>> the shadowing/synchronisation algorithm will benefit all approaches.
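
To make the shadow-L4 model and its synchronisation problem concrete,
a hedged sketch (the 256/256 slot split and all names are purely
illustrative):

#define L4_ENTRIES      512
#define GUEST_L4_SLOTS  256   /* lower half: guest-controlled */

typedef unsigned long l4_pgentry_t;

void sync_shadow_l4(const l4_pgentry_t *guest_l4, l4_pgentry_t *shadow_l4,
                    const l4_pgentry_t *entry_area_l4)
{
    unsigned int i;

    /* Mirror the guest-controlled slots.  Keeping this copy coherent
     * whenever the guest rewrites its L4 is exactly the
     * synchronisation issue discussed here. */
    for ( i = 0; i < GUEST_L4_SLOTS; i++ )
        shadow_l4[i] = guest_l4[i];

    /* The hypervisor slots map only the per-domain entry area (stacks,
     * GDT/TSS, IDT, entry code), not the full Xen address space. */
    for ( ; i < L4_ENTRIES; i++ )
        shadow_l4[i] = entry_area_l4[i];
}
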
>>>
>>> Juergen: you're now adding an LTR into the context switch path, which
>>> tends to be very slow.  I.e. as currently presented, this series
>>> necessarily has a higher runtime overhead than Jan's XPTI.
>>>
>>> One of my concerns is that this patch series moves further away from the
>>> secondary goal of my KAISER series, which was to have the IDT and GDT
>>> mapped at the same linear addresses on every CPU, so that a) SIDT/SGDT
>>> don't leak to PV guests which CPU you're currently scheduled on, and b) the
>>> context switch code can drop a load of its slow instructions like LGDT
>>> and the VMWRITEs to update the VMCS.
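
The SIDT leak in a) is easy to demonstrate from unprivileged code,
since SIDT is not a privileged instruction on pre-UMIP hardware; a
minimal sketch:

#include <stdint.h>
#include <stdio.h>

struct __attribute__((packed)) desc_ptr {
    uint16_t limit;
    uint64_t base;
};

int main(void)
{
    struct desc_ptr idtr;

    /* SIDT works at CPL3 unless UMIP is enabled. */
    __asm__ volatile ("sidt %0" : "=m" (idtr));

    /* If every CPU's IDT were mapped at one fixed linear address, this
     * value would be constant and leak nothing about scheduling. */
    printf("IDT base: %#llx\n", (unsigned long long)idtr.base);
    return 0;
}
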
>>>
>>> Jan: As to the things not covered by the current XPTI, hiding most of
>>> the .text section is important to prevent fingerprinting or ROP
>>> scanning.  This is a defence-in-depth argument, but a guest being easily
>>> able to identify whether certain XSAs are fixed or not is quite bad. 
>> I'm afraid we have a fairly different opinion of what is "quite bad".
> 
> I suggest you try talking to some real users then.
> 
>> Suppose we handed users a knob and said, "If you flip this switch,
>> attackers won't be able to tell if you've fixed XSAs or not without
>> trying them; but it will slow down your guests 20%."  How many do you
>> think would flip it, and how many would reckon that an attacker could
>> probably find out that information anyway?
> 
> Nonsense.  The performance hit is already taken. 

You just said:

"Juergen: you're now adding a LTR into the context switch path which
tends to be very slow.  I.e. As currently presented, this series
necessarily has a higher runtime overhead than Jan's XPTI."

And:

"As to the things not covered by the current XPTI, hiding most of
the .text section is important..."

You've previously said that the overhead for your KAISER series was much
higher than Jan's "band-aid" XPTI series, and implied that Juergen's
approach would suffer the same fate.

This led me to infer:

1. The .text segment is not hidden in XPTI, but would be under your and
Juergen's approaches

2. The cost of hiding the .text segment, over and above XPTI stage 1,
according to our current best efforts, is significant (making up 20% as
a reasonable strawman).

In which case the performance hit is most certainly *not* already taken.

> The argument is "do
> you want an attacker able to trivially evaluate security weaknesses in
> your hypervisor", a process which usually has to be done by guesswork
> and knowing the exact binary under attack.  Having .text fully readable
> lowers the barrier to entry substantially.
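
For concreteness, the kind of fingerprinting meant here could be
sketched as follows, assuming the attacker already has some read
primitive over mapped hypervisor .text (read_hv_byte() is purely
hypothetical):

#include <stdint.h>
#include <stddef.h>

extern uint8_t read_hv_byte(uintptr_t addr);  /* hypothetical primitive */

/* FNV-1a over a window of hypervisor .text; comparing the result with
 * hashes of known builds tells the attacker which binary -- and hence
 * which XSA fixes -- it is facing. */
uint64_t fingerprint_text(uintptr_t start, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ull;
    size_t i;

    for ( i = 0; i < len; i++ )
    {
        h ^= read_hv_byte(start + i);
        h *= 0x100000001b3ull;
    }
    return h;
}
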

And I can certainly see that some users would want to protect against
that.  But faced with an even higher performance hit, a significant
number of users would probably pass.

 -George


