xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: "Michał Leszczyński" <michal.leszczynski@cert.pl>
To: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
	Julien Grall <julien@xen.org>, Wei Liu <wl@xen.org>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	George Dunlap <george.dunlap@citrix.com>,
	Jan Beulich <jbeulich@suse.com>,
	Xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op
Date: Wed, 17 Jun 2020 21:13:05 +0200 (CEST)	[thread overview]
Message-ID: <998292451.9258672.1592421185726.JavaMail.zimbra@cert.pl> (raw)
In-Reply-To: <20200616172352.GT735@Air-de-Roger>

----- 16 cze 2020 o 19:23, Roger Pau Monné roger.pau@citrix.com napisał(a):

> On Tue, Jun 16, 2020 at 05:22:06PM +0200, Michał Leszczyński wrote:
>> Provide an interface for privileged domains to manage
>> external IPT monitoring.
>> 
>> Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
> 
> Thanks for the patch! I have some questions below which require your
> input.
> 
>> ---
>>  xen/arch/x86/hvm/hvm.c          | 170 ++++++++++++++++++++++++++++++++
>>  xen/include/public/hvm/hvm_op.h |  27 +++++
>>  2 files changed, 197 insertions(+)
>> 
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 5bb47583b3..9292caebe0 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -4949,6 +4949,172 @@ static int compat_altp2m_op(
>>      return rc;
>>  }
>>  
>> +static int do_vmtrace_op(
>> +    XEN_GUEST_HANDLE_PARAM(void) arg)
> 
> No need for the newline, this can fit on a single line.
> 
>> +{
>> +    struct xen_hvm_vmtrace_op a;
>> +    struct domain *d = NULL;
> 
> I don't think you need to init d to NULL (at least by looking at the
> current code below).
> 
>> +    int rc = -EFAULT;
> 
> No need to init rc.
> 
>> +    int i;
> 
> unsigned since it's used as a loop counter.
> 
>> +    struct vcpu *v;
>> +    void* buf;
> 
> Nit: '*' should be prepended to the variable name.
> 
>> +    uint32_t buf_size;
> 
> size_t
> 
>> +    uint32_t buf_order;
> 
> Order is generally fine using unsigned int, no need to use a
> specifically sized type.
> 
>> +    uint64_t buf_mfn;
> 
> Could this use the mfn type?
> 
>> +    struct page_info *pg;
>> +
>> +    if ( !hvm_ipt_supported() )
>> +        return -EOPNOTSUPP;
>> +
>> +    if ( copy_from_guest(&a, arg, 1) )
>> +        return -EFAULT;
>> +
>> +    if ( a.version != HVMOP_VMTRACE_INTERFACE_VERSION )
>> +        return -EINVAL;
>> +
>> +    switch ( a.cmd )
>> +    {
>> +    case HVMOP_vmtrace_ipt_enable:
>> +    case HVMOP_vmtrace_ipt_disable:
>> +    case HVMOP_vmtrace_ipt_get_buf:
>> +    case HVMOP_vmtrace_ipt_get_offset:
>> +        break;
>> +
>> +    default:
>> +        return -EOPNOTSUPP;
>> +    }
>> +
>> +    d = rcu_lock_domain_by_any_id(a.domain);
>> +
>> +    if ( d == NULL )
>> +        return -ESRCH;
>> +
>> +    if ( !is_hvm_domain(d) )
>> +    {
>> +        rc = -EOPNOTSUPP;
>> +        goto out;
>> +    }
>> +
>> +    domain_pause(d);
>> +
>> +    if ( a.vcpu >= d->max_vcpus )
>> +    {
>> +        rc = -EINVAL;
>> +        goto out;
>> +    }
>> +
>> +    v = d->vcpu[a.vcpu];
>> +
>> +    if ( a.cmd == HVMOP_vmtrace_ipt_enable )
> 
> Please use a switch here, you might even consider re-using the switch
> from above and moving the domain checks before actually checking the
> command field, so that you don't need to perform two switches against
> a.cmd.
> 
>> +    {
>> +        if ( v->arch.hvm.vmx.ipt_state ) {
> 
> Coding style, brace should be on newline (there are more below which
> I'm not going to comment on).
> 
>> +            // already enabled
> 
> Comments should use /* ... */, there multiple instances of this below
> which I'm not going to comment on, please check CODING_STYLE.
> 
> Also, the interface looks racy, I think you are missing a lock to
> protect v->arch.hvm.vmx.ipt_state from being freed under your feet if
> you issue concurrent calls to the interface.
> 
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        if ( a.size < PAGE_SIZE || a.size > 1000000 * PAGE_SIZE ) {
> 
> You can use GB(4) which is easier to read. Should the size also be a
> multiple of a PAGE_SIZE?
> 
>> +            // we don't accept trace buffer size smaller than single page
>> +            // and the upper bound is defined as 4GB in the specification
>> +            rc = -EINVAL;
>> +            goto out;
>> +	}
> 
> Stray tab.
> 
>> +
>> +        buf_order = get_order_from_bytes(a.size);
>> +
>> +        if ( (a.size >> PAGE_SHIFT) != (1 << buf_order) ) {
> 
> Oh here is the check. I think you can move this with the checks above
> by doing a.size & ~PAGE_MASK.


I belive it's more strict than a.size & ~PAGE_MASK. I think that CPU expects that the buffer size is a power of 2, so you can have 64 MB or 128 MB, but not 96 MB buffer.


> 
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        buf = page_to_virt(alloc_domheap_pages(d, buf_order,
>> MEMF_no_refcount));
> 
> What if alloc_domheap_pages return NULL?
> 
> Since I think you only what the linear address of the page to zero it
> I would suggest using clear_domain_page.
> 

Hmm. This was fixed already. Most probably I did something strange with git and this change was not stored. I will correct this with patch v2.


>> +        buf_size = a.size;
>> +
>> +        if ( !buf ) {
>> +            rc = -EFAULT;
>> +            goto out;
>> +        }
>> +
>> +        memset(buf, 0, buf_size);
>> +
>> +        for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ ) {
>> +            share_xen_page_with_privileged_guests(virt_to_page(buf) + i,
>> SHARE_ro);
> 
> This line (and some more below) exceeds 80 characters, please split
> it.
> 
>> +        }
>> +
>> +        v->arch.hvm.vmx.ipt_state = xmalloc(struct ipt_state);
> 
> You should check that xmalloc has succeeds before trying to access
> ipt_state.
> 
>> +        v->arch.hvm.vmx.ipt_state->output_base = virt_to_mfn(buf) <<
>> PAGE_SHIFT;
>> +        v->arch.hvm.vmx.ipt_state->output_mask = buf_size - 1;
>> +        v->arch.hvm.vmx.ipt_state->status = 0;
>> +        v->arch.hvm.vmx.ipt_state->ctl = RTIT_CTL_TRACEEN | RTIT_CTL_OS |
>> RTIT_CTL_USR | RTIT_CTL_BRANCH_EN;
> 
> Shouldn't the user be able to select what tracing should be enabled?
> 
>> +    }
>> +    else if ( a.cmd == HVMOP_vmtrace_ipt_disable )
>> +    {
>> +        if ( !v->arch.hvm.vmx.ipt_state ) {
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        buf_mfn = v->arch.hvm.vmx.ipt_state->output_base >> PAGE_SHIFT;
>> +        buf_size = ( v->arch.hvm.vmx.ipt_state->output_mask + 1 ) &
>> 0xFFFFFFFFUL;
>> +
>> +        for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
>> +        {
>> +            if ( (mfn_to_page(_mfn(buf_mfn + i))->count_info & PGC_count_mask)
>> != 1 )
>> +            {
>> +                rc = -EBUSY;
>> +                goto out;
>> +            }
>> +        }
>> +
>> +        xfree(v->arch.hvm.vmx.ipt_state);
>> +	v->arch.hvm.vmx.ipt_state = NULL;
>> +
>> +        for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ )
>> +        {
>> +            pg = mfn_to_page(_mfn(buf_mfn + i));
>> +            put_page_alloc_ref(pg);
>> +            if ( !test_and_clear_bit(_PGC_xen_heap, &pg->count_info) )
>> +                ASSERT_UNREACHABLE();
>> +            pg->u.inuse.type_info = 0;
>> +            page_set_owner(pg, NULL);
>> +            free_domheap_page(pg);
> 
> Hm, this seems fairly dangerous, what guarantees that the caller is
> not going to map the buffer while you are trying to tear it down?
> 
> You perform a check before freeing ipt_state, but between the check
> and the actual tearing down the domain might have setup mappings to
> them.
> 
> I wonder, could you expand a bit on why trace buffers are allocated
> from domheap memory by Xen?


In general, I thought it would be good to account trace buffers for particular DomUs, so it would be easier to troubleshoot the memory usage.


> 
> There are a couple of options here, maybe the caller could provide
> it's own buffer, then Xen would take an extra reference to those pages
> and setup them to be used as buffers.
> 
> Another alternative would be to use domhep memory but not let the
> caller map it directly, and instead introduce a hypercall to copy
> from the internal Xen buffer into a user-provided one.
> 
> How much memory is used on average by those buffers? That would help
> decide a model that would best fit the usage.


From 4 kB to 4 GB. Right now I use 128 MB buffers and it takes just a few seconds to fill them up completely.

I think I've just copied the pattern which is already present in Xen's code, e.g. interfaces used by xenbaked/xentrace tools.


> 
>> +        }
>> +    }
>> +    else if ( a.cmd == HVMOP_vmtrace_ipt_get_buf )
>> +    {
>> +        if ( !v->arch.hvm.vmx.ipt_state ) {
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        a.mfn = v->arch.hvm.vmx.ipt_state->output_base >> PAGE_SHIFT;
> 
> This will not work for translated domains, ie: a PVH or HVM domain
> won't be able to use this interface since it has no way to request the
> mapping of a specific mfn into it's physmap. I think we need to take
> this into account when deciding how the interface should be, so that
> we don't corner ourselves with a PV only interface.

Please be aware that this is only going to be used by Dom0. Is is well-supported case that somebody is using PVH/HVM Dom0?

I think that all Virtual Machine Introspection stuff currently requires to have Dom0 PV. Our main goal is to have this working well in combo with VMI.


> 
>> +        a.size = (v->arch.hvm.vmx.ipt_state->output_mask + 1) & 0xFFFFFFFFUL;
> 
> You can truncate it easier by casting to uint32_t I think.
> 
> Or even better, you could put output_mask in a union like:
> 
> union {
>    uint64_t raw;
>    struct {
>        uint32_t size;
>	uint32_t offset;
>    }
> }
> 
> Then you can avoid the shifting and the castings.
> 
>> +    }
>> +    else if ( a.cmd == HVMOP_vmtrace_ipt_get_offset )
>> +    {
>> +        if ( !v->arch.hvm.vmx.ipt_state ) {
>> +            rc = -EINVAL;
>> +            goto out;
>> +        }
>> +
>> +        a.offset = v->arch.hvm.vmx.ipt_state->output_mask >> 32;
>> +    }
>> +
>> +    rc = -EFAULT;
>> +    if ( __copy_to_guest(arg, &a, 1) )
>> +      goto out;
>> +    rc = 0;
>> +
>> + out:
>> +    smp_wmb();
> 
> Why do you need a barrier here?
> 
>> +    domain_unpause(d);
>> +    rcu_unlock_domain(d);
>> +
>> +    return rc;
>> +}
>> +
>> +DEFINE_XEN_GUEST_HANDLE(compat_hvm_vmtrace_op_t);
>> +
>>  static int hvmop_get_mem_type(
>>      XEN_GUEST_HANDLE_PARAM(xen_hvm_get_mem_type_t) arg)
>>  {
>> @@ -5101,6 +5267,10 @@ long do_hvm_op(unsigned long op,
>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>          rc = current->hcall_compat ? compat_altp2m_op(arg) : do_altp2m_op(arg);
>>          break;
>>  
>> +    case HVMOP_vmtrace:
>> +        rc = do_vmtrace_op(arg);
>> +        break;
>> +
>>      default:
>>      {
>>          gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op);
>> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
>> index 870ec52060..3bbcd54c96 100644
>> --- a/xen/include/public/hvm/hvm_op.h
>> +++ b/xen/include/public/hvm/hvm_op.h
>> @@ -382,6 +382,33 @@ struct xen_hvm_altp2m_op {
>>  typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
>>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
>>  
>> +/* HVMOP_vmtrace: Perform VM tracing related operation */
>> +#define HVMOP_vmtrace 26
>> +
>> +#define HVMOP_VMTRACE_INTERFACE_VERSION 0x00000001
>> +
>> +struct xen_hvm_vmtrace_op {
>> +    /* IN variable */
>> +    uint32_t version;   /* HVMOP_VMTRACE_INTERFACE_VERSION */
>> +    uint32_t cmd;
>> +/* Enable/disable external vmtrace for given domain */
>> +#define HVMOP_vmtrace_ipt_enable      1
>> +#define HVMOP_vmtrace_ipt_disable     2
>> +#define HVMOP_vmtrace_ipt_get_buf     3
>> +#define HVMOP_vmtrace_ipt_get_offset  4
>> +    domid_t domain;
> 
> You are missing a padding field here AFAICT.
> 
> Roger.


Thanks for your feedback, I will apply all the remaining suggestions in patch v2.

Best regards,
Michał Leszczyński
CERT Polska


  reply	other threads:[~2020-06-17 19:14 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-16 15:16 [PATCH v1 0/7] Implement support for external IPT monitoring Michał Leszczyński
2020-06-16 15:19 ` [PATCH v1 1/7] x86/vmx: add Intel PT MSR definitions Michał Leszczyński
2020-06-18 13:31   ` Jan Beulich
2020-06-16 15:20 ` [PATCH v1 2/7] x86/vmx: add IPT cpu feature Michał Leszczyński
2020-06-16 16:30   ` Roger Pau Monné
2020-06-17 11:34     ` Jan Beulich
2020-06-16 15:21 ` [PATCH v1 3/7] x86/vmx: add ipt_state as part of vCPU state Michał Leszczyński
2020-06-16 16:33   ` Roger Pau Monné
2020-06-16 15:22 ` [PATCH v1 4/7] x86/vmx: add do_vmtrace_op Michał Leszczyński
2020-06-16 17:23   ` Roger Pau Monné
2020-06-17 19:13     ` Michał Leszczyński [this message]
2020-06-18  3:20       ` Tamas K Lengyel
2020-06-18 11:01         ` Michał Leszczyński
2020-06-18 11:55           ` Roger Pau Monné
2020-06-18 12:51             ` Jan Beulich
2020-06-18 13:09               ` Michał Leszczyński
2020-06-18 13:24                 ` Jan Beulich
2020-06-18 13:40                 ` Roger Pau Monné
2020-06-18  8:46       ` Roger Pau Monné
2020-06-18 15:25     ` Michał Leszczyński
2020-06-18 15:39       ` Jan Beulich
2020-06-18 15:47         ` Tamas K Lengyel
2020-06-18 15:49           ` Tamas K Lengyel
2020-06-16 15:22 ` [PATCH v1 5/7] tools/libxc: add xc_ptbuf_* functions Michał Leszczyński
2020-06-16 15:23 ` [PATCH v1 6/7] tools/proctrace: add proctrace tool Michał Leszczyński
2020-06-16 15:24 ` [PATCH v1 7/7] x86/vmx: switch IPT MSRs on vmentry/vmexit Michał Leszczyński
2020-06-16 17:38   ` Roger Pau Monné
2020-06-16 17:47     ` Michał Leszczyński
2020-06-17  9:09       ` Roger Pau Monné
2020-06-17 11:54         ` Michał Leszczyński
2020-06-17 12:51           ` Roger Pau Monné
2020-06-17 15:14             ` Andrew Cooper
2020-06-17 18:56               ` Michał Leszczyński
2020-06-18  8:52                 ` Roger Pau Monné
2020-06-18 11:07                   ` Michał Leszczyński
2020-06-18 11:49                     ` Roger Pau Monné
2020-06-17 23:30               ` Kang, Luwei
2020-06-18 10:02                 ` Andrew Cooper
2020-06-18 17:38   ` Andrew Cooper
2020-06-16 18:17 ` [PATCH v1 0/7] Implement support for external IPT monitoring Andrew Cooper
2020-06-16 18:47   ` Michał Leszczyński
2020-06-16 20:16     ` Andrew Cooper
2020-06-17  3:02       ` Tamas K Lengyel
2020-06-17 16:19         ` Andrew Cooper
2020-06-17 16:27           ` Tamas K Lengyel
2020-06-17 17:23             ` Andrew Cooper
2020-06-17 19:31               ` Tamas K Lengyel
2020-06-17 19:30             ` Michał Leszczyński
2020-06-17 20:20           ` Michał Leszczyński
2020-06-18  8:25             ` Roger Pau Monné
2020-06-18 14:59           ` Michał Leszczyński
2020-06-17  1:35     ` Tian, Kevin
2020-06-17  6:45       ` Kang, Luwei
2020-06-17  9:21         ` Roger Pau Monné
2020-06-17 12:37           ` Kang, Luwei
2020-06-17 12:53             ` Roger Pau Monné
2020-06-17 23:29               ` Kang, Luwei
2020-06-18  0:56                 ` Michał Leszczyński
2020-06-18  7:00                   ` Roger Pau Monné

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=998292451.9258672.1592421185726.JavaMail.zimbra@cert.pl \
    --to=michal.leszczynski@cert.pl \
    --cc=andrew.cooper3@citrix.com \
    --cc=george.dunlap@citrix.com \
    --cc=ian.jackson@eu.citrix.com \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=roger.pau@citrix.com \
    --cc=sstabellini@kernel.org \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).