All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Ben Catterall <Ben.Catterall@citrix.com>, xen-devel@lists.xensource.com
Cc: george.dunlap@eu.citrix.com, tim@xen.org, keir@xen.org,
	ian.campbell@citrix.com, jbeulich@suse.com
Subject: Re: [RFC 2/4] HVM x86 deprivileged mode: Create deprivileged page tables
Date: Fri, 7 Aug 2015 16:20:29 +0100	[thread overview]
Message-ID: <55C4CCBD.80704@citrix.com> (raw)
In-Reply-To: <55C4B07E.1080102@citrix.com>

On 07/08/15 14:19, Ben Catterall wrote:
> On 06/08/15 20:52, Andrew Cooper wrote:
>> On 06/08/15 17:45, Ben Catterall wrote:
>>> The paging structure mappings for the deprivileged mode are added
>>> to the monitor page table for HVM guests. The entries are generated by
>>> walking the page tables and mapping in new pages. If a higher-level
>>> page table
>>> mapping exists, we attempt to traverse all the way down to a leaf
>>> page and add
>>> the page there. Access bits are flipped as needed.
>>>
>>> The page entries are generated for deprivileged .text, .data and a
>>> stack. The
>>> data is copied from sections allocated by the linker. The mappings
>>> are setup in
>>> an unused portion of the Xen virtual address space. The pages are
>>> mapped in as
>>> user mode accessible, with NX bits set for the data and stack
>>> regions and the
>>> code region is set to be executable and read-only.
>>>
>>> The needed pages are allocated on the HAP page heap and are
>>> deallocated when
>>> those heap pages are deallocated (on domain destruction).
>>>
>>> Signed-off-by: Ben Catterall <Ben.Catterall@citrix.com>
>>> ---
>>>   xen/arch/x86/hvm/Makefile          |   1 +
>>>   xen/arch/x86/hvm/deprivileged.c    | 441
>>> +++++++++++++++++++++++++++++++++++++
>>>   xen/arch/x86/mm/hap/hap.c          |  10 +-
>>>   xen/arch/x86/xen.lds.S             |  19 ++
>>>   xen/include/asm-x86/config.h       |  10 +-
>>>   xen/include/xen/hvm/deprivileged.h |  94 ++++++++
>>>   6 files changed, 573 insertions(+), 2 deletions(-)
>>>   create mode 100644 xen/arch/x86/hvm/deprivileged.c
>>>   create mode 100644 xen/include/xen/hvm/deprivileged.h
>>>
>>> diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
>>> index 794e793..bd83ba3 100644
>>> --- a/xen/arch/x86/hvm/Makefile
>>> +++ b/xen/arch/x86/hvm/Makefile
>>> @@ -16,6 +16,7 @@ obj-y += pmtimer.o
>>>   obj-y += quirks.o
>>>   obj-y += rtc.o
>>>   obj-y += save.o
>>> +obj-y += deprivileged.o
>>
>> We attempt to keep objects linked in alphabetical order, where possible.
>>
> apologies
>>>   obj-y += stdvga.o
>>>   obj-y += vioapic.o
>>>   obj-y += viridian.o
>>> diff --git a/xen/arch/x86/hvm/deprivileged.c
>>> b/xen/arch/x86/hvm/deprivileged.c
>>> new file mode 100644
>>> index 0000000..071d900
>>> --- /dev/null
>>> +++ b/xen/arch/x86/hvm/deprivileged.c
>>> @@ -0,0 +1,441 @@
>>> +/*
>>> + * HVM deprivileged mode to provide support for running operations in
>>> + * user mode from Xen
>>> + */
>>> +#include <xen/lib.h>
>>> +#include <xen/mm.h>
>>> +#include <xen/domain_page.h>
>>> +#include <xen/config.h>
>>> +#include <xen/types.h>
>>> +#include <xen/sched.h>
>>> +#include <asm/paging.h>
>>> +#include <xen/compiler.h>
>>> +#include <asm/hap.h>
>>> +#include <asm/paging.h>
>>> +#include <asm-x86/page.h>
>>> +#include <public/domctl.h>
>>> +#include <xen/domain_page.h>
>>> +#include <asm/hvm/vmx/vmx.h>
>>> +#include <xen/hvm/deprivileged.h>
>>> +
>>> +void hvm_deprivileged_init(struct domain *d, l4_pgentry_t *l4e)
>>> +{
>>> +    int ret;
>>> +    void *p;
>>> +    unsigned long int size;
>>> +
>>> +    /* Copy the .text segment for deprivileged code */
>>
>> Why do you need to copy the deprivileged text section at all?  It is
>> read only and completely common.  All you should need to do is make
>> userspace mappings to it where it resides in the Xen linked area.
>>
> tbh: At the time, I was unsure if it could open a hole if, when Xen
> switched over to superpages, and the deprivileged code was not aligned
> on a 4KB page boundary, then mapping it in might prove problematic.
> Mainly as we'd map in some of Xen's ring 0 code by accident, which
> could prove useful to an attacker as they could then call this from
> userspace.

The monitor tables are only used in the context of HVM guests.  PV
guests run in their own pagetables (including a subset of the idle
tables), but have no chance of accidentally gaining these mappings. (A
64bit PV kernel will get legitimately irate if part of it was replaced
with some hvm depriv code.)

Creating user mappings of Xen's .text section does allow a PV guest to
execute the code, but only at the current privilege of the PV guest. 
Therefore, while a bug, doesn't present an increased attack surface. 
(It might be interesting as a side channel to analyse for RoP purposes,
but not as a direct attack surface.)

>
> Though, now I've thought about it more, I can ensure in the linker
> that it's aligned properly  and I think map_pages_to_xen() will do
> what I need.

Playing with alignment is easy.

>
>>
>>> +           (unsigned long int)&__hvm_deprivileged_text_start;
>>> +
>>> +    ret = hvm_deprivileged_copy_pages(d, l4e,
>>> +                              (unsigned long
>>> int)&__hvm_deprivileged_text_start,
>>> +                              (unsigned long
>>> int)HVM_DEPRIVILEGED_TEXT_ADDR,
>>> +                              size, 0 /* No write */);
>>> +
>>> +    if( ret )
>>> +    {
>>> +        printk("HVM: Error when initialising depriv .text. Code:
>>> %d", ret);
>>> +        domain_crash(d);
>>> +        return;
>>> +    }
>>> +
>>> +    /* Copy the .data segment for ring3 code */
>>> +    size = (unsigned long int)&__hvm_deprivileged_data_end -
>>> +           (unsigned long int)&__hvm_deprivileged_data_start;
>>> +
>>> +    ret = hvm_deprivileged_copy_pages(d, l4e,
>>> +                              (unsigned long
>>> int)&__hvm_deprivileged_data_start,
>>> +                              (unsigned long
>>> int)HVM_DEPRIVILEGED_DATA_ADDR,
>>> +                              size, _PAGE_NX | _PAGE_RW);
>>
>> What do you expect to end up in a data section like this?
>>
> It's for passing in configuration data at the start when we run the
> device (reduce the number of system calls needed) and for local data
> which the emulated devices may need. It can also be used to put the
> results of any system calls that the deprivileged code may call.

You can achieve the same thing by making the "stacks" larger and
segmenting them.  The API could pass a pointer to the data in one of the
registers, and it just happens to be sheer coincidence that the "stack"
and "data area" happen to be adjacent.

>
>>> +}
>>> +
>>> +/* Create a copy of the data at the specified virtual address.
>>> + * When writing to the source, it will walk the page table
>>> hierarchy, creating
>>> + * new levels as needed.
>>> + *
>>> + * If we find a leaf entry in a page table (one which holds the
>>> + * mfn of a 4KB, 2MB, etc. page frame) which has already been
>>> + * mapped in, then we bail as we have a collision and this likely
>>> + * means a bug or the memory configuration has been changed.
>>> + *
>>> + * Pages have PAGE_USER, PAGE_GLOBAL (if supported) and
>>> PAGE_PRESENT set by
>>> + * default. The extra l1_flags are used for extra control e.g.
>>> PAGE_RW.
>>> + * The PAGE_RW flag will be enabled for all page structure entries
>>> + * above the leaf page if that leaf page has PAGE_RW set. This is
>>> needed to
>>> + * permit the writes on the leaf pages. See the Intel manual 3A
>>> section 4.6.
>>> + *
>>> + * TODO: We proceed down to L1 4KB pages and then map these in. We
>>> should
>>> + * stop the recursion on L3/L2 for a 1GB or 2MB page which would
>>> mean faster
>>> + * page access. When we stop would depend on size (e.g. use 2MB
>>> pages for a
>>> + * few MBs)
>>> + */
>>> +int hvm_deprivileged_copy_pages(struct domain *d,
>>> +                            l4_pgentry_t *l4t_base,
>>> +                            unsigned long int src_start,
>>> +                            unsigned long int dst_start,
>>> +                            unsigned long int size,
>>> +                            unsigned int l1_flags)
>>
>> <large snip>
>>
>> Does map_pages_to_xen() not do everything you need here?  It certainly
>> should be fine for the .text section, and will keep the idle and monitor
>> tables up to date for you.
>>
> As it updates the idle tables I haven't used it. From my
> understanding: those idle tables are also used by the PV guests when
> the guest is created as the idle table is copied from FIRST_XEN_SLOT
> up to the end of the table. In config.h, the region where I'm mapping
> in my new mode has been allocated for PV guest-defined usage. I can
> edit init_guest_l4_table so that it doesn't copy this last slot
> across. I don't know if this would have further repercussions.
>
> What would you suggest? Thanks!

The idle pagetables are Xens primary set of pagetables.  Globally common
stuff such the depriv .text should reside in them.

PV guest pagetables are constructed by memcpy()ing certain slots from
the idle pagetables, the vast majority of the virtual address space is
owned by the guest, so is untouched by Xen.

In this case, your new depriv extensions live in an area which unused
for HVM monitor tables, but would belong to a PV guest kernel.


However, map_pages_to_xen() isn't quite appropriate for making a set of
Xen pagetables for mappings to be hooked into monitor tables, so you are
going to have to keep the current allocation functions.

~Andrew

  reply	other threads:[~2015-08-07 15:20 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-06 16:45 [RFC 0/4] HVM x86 enhancements to run Xen deprivileged mode operations Ben Catterall
2015-08-06 16:45 ` [RFC 1/4] HVM x86 deprivileged mode: Page allocation helper Ben Catterall
2015-08-06 19:22   ` Andrew Cooper
2015-08-07  9:57     ` Ben Catterall
2015-08-07 13:14       ` Andrew Cooper
2015-08-10  8:50       ` Tim Deegan
2015-08-10  8:52         ` Tim Deegan
2015-08-10  8:55           ` Andrew Cooper
2015-08-10 10:08             ` Tim Deegan
2015-08-06 16:45 ` [RFC 2/4] HVM x86 deprivileged mode: Create deprivileged page tables Ben Catterall
2015-08-06 19:52   ` Andrew Cooper
2015-08-07 13:19     ` Ben Catterall
2015-08-07 15:20       ` Andrew Cooper [this message]
2015-08-06 16:45 ` [RFC 3/4] HVM x86 deprivileged mode: Code for switching into/out of deprivileged mode Ben Catterall
2015-08-06 20:55   ` Andrew Cooper
2015-08-07 12:51     ` Ben Catterall
2015-08-07 13:08       ` David Vrabel
2015-08-07 14:24       ` Andrew Cooper
2015-08-11  9:45     ` Ian Campbell
2015-08-10  9:49   ` Tim Deegan
2015-08-10 10:14     ` Andrew Cooper
2015-08-11  9:55       ` Tim Deegan
2015-08-11 16:51         ` Ben Catterall
2015-08-11 17:05           ` Tim Deegan
2015-08-11 17:19             ` Andrew Cooper
2015-08-11 18:29               ` Boris Ostrovsky
2015-08-12 13:29                 ` Andrew Cooper
2015-08-12 13:33                   ` Andrew Cooper
2015-08-17 13:53                     ` Ben Catterall
2015-08-17 15:07                       ` Tim Deegan
2015-08-17 15:17                         ` Jan Beulich
2015-08-18 10:25                           ` Ben Catterall
2015-08-18 10:26                             ` Ben Catterall
2015-08-18 14:22                               ` Jan Beulich
2015-08-18 16:55                         ` Andrew Cooper
2015-08-19 10:36                           ` Ben Catterall
2015-08-12 10:10               ` Jan Beulich
2015-08-12 13:22             ` Ben Catterall
2015-08-12 13:26               ` Tim Deegan
2015-08-20 14:42       ` Ben Catterall
2015-08-11 10:35     ` Ben Catterall
2015-08-06 16:45 ` [RFC 4/4] HVM x86 deprivileged mode: Trap handlers for " Ben Catterall
2015-08-06 21:24   ` Andrew Cooper
2015-08-07 12:32     ` Ben Catterall
2015-08-07 13:19       ` Andrew Cooper
2015-08-07 13:26         ` Ben Catterall
2015-08-10 10:07   ` Tim Deegan
2015-08-11 10:33     ` Ben Catterall
2015-08-17 13:59       ` Ben Catterall
2015-08-17 14:58         ` Tim Deegan
2015-08-17 15:14           ` Jan Beulich
2015-08-12  9:50 ` [RFC 0/4] HVM x86 enhancements to run Xen deprivileged mode operations Jan Beulich
2015-08-12 11:27   ` Ben Catterall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55C4CCBD.80704@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=Ben.Catterall@citrix.com \
    --cc=george.dunlap@eu.citrix.com \
    --cc=ian.campbell@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=keir@xen.org \
    --cc=tim@xen.org \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.