All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Xen-devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH 7/8] xen/x86: Rework CR4 handling for PV guests
Date: Thu, 25 Jun 2015 17:22:10 +0100	[thread overview]
Message-ID: <558C2AB2.1070001@citrix.com> (raw)
In-Reply-To: <558C2F3E0200007800089BA9@mail.emea.novell.com>

On 25/06/15 15:41, Jan Beulich wrote:
>>>> On 24.06.15 at 18:31, <andrew.cooper3@citrix.com> wrote:
>> PV CR4 settings are now based on mmu_cr4_features, rather than the current
>> contents of CR4.  This causes dom0 to be consistent with domUs, despite 
>> being
>> constructed in a region with CR4.SMAP purposefully disabled.
> That'll be fine as long as we're keeping CR4 stable (except for
> guest controlled bits). Agreed, once that's no longer the case,
> mmu_cr4_features would no longer be very useful either.
> Still it would seem more reasonable to me to base things on
> current values than on some not guaranteed to be up to date.

mmu_cr4_features is supposed to be the runtime value in cr4.

At any point, SMAP/SMEP/PGE might be temporarily disabled, which is why
I avoided using the current value of cr4.

>
>> The major change here is that v->arch.pv_vcpu.ctrlreg[4] now contains only CR4
>> bits which Xen wishes to shadow, rather than containing a mix of host and
>> guest bits.  This is likely to elicit a warning when migrating a PV guest 
>> from
>> an older version of Xen, although no function problems as a result.
> For one, I'm having a hard time guessing what the part of the sentence
> after the comma is meant to tell me.

Before this patch, v->arch.pv_vcpu.ctrlreg[4] contained an unqualified
combination of host and guest controlled bits.

After this patch, it strictly contains the bits Xen chooses to shadow.

> And then, from a support
> perspective, such warnings aren't helpful, as they incur questions.

It was useful for debugging, but can see your point.  I could reword it
to "ignoring $FOO's attempt to update $BAR's CR4".  I would hesitate
from removing it completely however.

> Plus
> finally, storing neither the current guest view nor the current hypervisor
> view in that field seems prone to cause headaches in the future. Otoh
> this gets it in line with CR0 handling as it looks.

My logic was that the set of shadowed bits will never decrease on newer
versions of Xen, but Xen's choice of CR4 bits may change.  As a result,
neither "guests view" nor "hypervisors view" is appropriate in an
upgrade case.

>
>> A second change is that visible set of CR4 bits is different.  In 
>> particular,
>> VMXE and MCE are not leaked into a guest, while SMAP and SMEP are exposed.
> Not leaking VMXE into the guest is fine. Not exposing MCE even to
> Dom0 is questionable.

I would prefer not to special case the hardware domain if possible, and
I suppose MCE is not too much of an issue to leak.  We already are,
after all.

> And exposing SMEP and SMAP without
> exposing their CPUID flags seems bogus.

I was hoping not to get bogged down in CPUID given my upcoming work to
fix feature levelling.

After recent consideration, I am still not sure if we want to support
SMAP in 32bit PV guests or not.  The trapping of stac/clac would be a
high overhead, although the guest could modify EFLAGS.AC itself using popf.

>
>> In addition, add PGE to the set of bits we don't care about a guest 
>> attempting to modify.
> I agree from a functionality pov this is okay, but do we really want
> to allow the guest to turn this off from a performance pov?

This change doesn't shadow PGE.  It just prevents the warning from
firing if the guest attempts to clear PGE.

>
>> @@ -683,6 +683,26 @@ void arch_domain_unpause(struct domain *d)
>>  }
>>  
>>  /*
>> + * CR4 bits Xen will shadow on behalf of the guest.
>> + *  - TSD for vtsc
>> + *  - DE for %%dr4/5 emulation
> DR4/5 handling is surely secondary here: I/O breakpoints are the
> main feature to be controlled by this bit.

True - I will expand the comment.

>
>> @@ -699,6 +719,8 @@ static int __init init_pv_cr4_masks(void)
>>          common_mask &= ~X86_CR4_DE;
>>      if ( cpu_has_xsave )
>>          common_mask &= ~X86_CR4_OSXSAVE;
>> +    if ( cpu_has_pge )
>> +        common_mask &= ~X86_CR4_PGE;
> Following the earlier comment, shouldn't this bit then also be added
> to PV_CR4_SHADOW?

I don't think so.  A guest has to defer to Xen to perform TLB flushes. 
I just wished to avoid the warning.

>
>> @@ -708,21 +730,61 @@ static int __init init_pv_cr4_masks(void)
>>      if ( cpu_has_fsgsbase )
>>          pv_cr4_mask &= ~X86_CR4_FSGSBASE;
>>  
>> +    BUILD_BUG_ON(PV_CR4_SHADOW & PV_CR4_READ);
>> +
>>      return 0;
>>  }
>>  __initcall(init_pv_cr4_masks);
>>  
>> -unsigned long pv_guest_cr4_fixup(const struct vcpu *v, unsigned long guest_cr4)
>> +void pv_cr4_write(struct vcpu *v, unsigned long guest_cr4)
>>  {
>> -    unsigned long hv_cr4 = real_cr4_to_pv_guest_cr4(read_cr4());
>> +    unsigned long host_cr4 = mmu_cr4_features & (PV_CR4_READ | PV_CR4_SHADOW);
>>      unsigned long mask = is_pv_32bit_vcpu(v) ? compat_pv_cr4_mask : pv_cr4_mask;
>>  
>> -    if ( (guest_cr4 & mask) != (hv_cr4 & mask) )
>> +    if ( guest_cr4 == 0 )
>> +    {
>> +        /* Default - all shadowed bits to 0. */
>> +        guest_cr4 = mmu_cr4_features & ~PV_CR4_SHADOW;
>> +    }
>> +    else if ( (guest_cr4 & ~PV_CR4_SHADOW) == 0 )
>> +    {
>> +        /*
>> +         * Only setting shadowed bits.  This will be the case on restore, so
>> +         * avoid warning about losing readable host bits.
>> +         */
>> +    }
>> +    else if ( (guest_cr4 & mask) != (host_cr4 & mask) )
>> +    {
>>          printk(XENLOG_G_WARNING
>> -               "d%d attempted to change %pv's CR4 flags %08lx -> %08lx\n",
>> -               current->domain->domain_id, v, hv_cr4, guest_cr4);
>> +               "d%d attempted to change %pv's CR4 flags %08lx -> %08lx "
>> +               "(bad +%#lx, -%#lx)\n",
> No mixture of 0x-prefixed and not-0x-prefixed hex numbers please.

Oops yes.

> I anyway wonder whether printing four numbers here is necessary
> when printing just three (host_cr4, guest_cr4, and mask) would
> suffice.

It is substantially harder to read.

>
>> +               current->domain->domain_id, v, host_cr4, guest_cr4,
>> +               (host_cr4 ^ guest_cr4) & guest_cr4 & mask,
>> +               (host_cr4 ^ guest_cr4) & ~guest_cr4 & mask);
>> +    }
>> +
>> +    v->arch.pv_vcpu.ctrlreg[4] =
>> +        ((mmu_cr4_features & mask) | (guest_cr4 & ~mask)) & PV_CR4_SHADOW;
> With FSGSBASE not in PV_CR4_SHADOW, how would (as the comment
> in init_pv_cr4_masks() says) the guest manage to observe the flag in
> its desired state after a migration?

The guest does not get any control of FSGSBASE.  Xen unilaterally has it
enabled, and uses them on vcpu context switch so it is not safe to ever
disable.

>
>> +unsigned long pv_cr4_read(const struct vcpu *v)
>> +{
>> +    unsigned long cr4 = read_cr4();
>> +
>> +    return (cr4 & PV_CR4_READ & ~PV_CR4_SHADOW) |
>> +        (v->arch.pv_vcpu.ctrlreg[4] & PV_CR4_SHADOW);
> I don't understand either of the two uses of PV_CR4_SHADOW here:
> The first seems pointless as earlier an you already assert that it
> and PV_CR4_READ are disjoint, i.e. the negation of the former
> surely has no less bits set than the latter. And the second seems
> pointless since you only ever store bits in PV_CR4_SHADOW into the
> field.

This bit of code has failed to change over the development of the
series.  I will adjust it.

~Andrew

  reply	other threads:[~2015-06-25 16:22 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-24 16:31 [PATCH 0/8] [RFC] SMAP handling improvements for 32bit PV guests on Andrew Cooper
2015-06-24 16:31 ` [PATCH 1/8] common/vsprintf: Special-case DOMID_IDLE handling for %pv Andrew Cooper
2015-06-24 16:31 ` [PATCH 2/8] x86/traps: Avoid using current too early on boot Andrew Cooper
2015-06-24 16:31 ` [PATCH 3/8] x86/setup: Initialise CR4 before creating idle_vcpu[0] Andrew Cooper
2015-06-24 16:31 ` [PATCH 4/8] xen/x86: Clean up CR4 definitions Andrew Cooper
2015-06-24 16:31 ` [PATCH 5/8] xen/x86: Drop PSE from XEN_MINIMAL_CR4 Andrew Cooper
2015-06-24 16:31 ` [PATCH 6/8] xen/x86: Calculate PV CR4 masks at boot Andrew Cooper
2015-06-25 13:08   ` Jan Beulich
2015-06-25 13:31     ` Andrew Cooper
2015-06-25 13:40       ` Jan Beulich
2015-06-25 13:43         ` Andrew Cooper
2015-06-24 16:31 ` [PATCH 7/8] xen/x86: Rework CR4 handling for PV guests Andrew Cooper
2015-06-25 14:41   ` Jan Beulich
2015-06-25 16:22     ` Andrew Cooper [this message]
2015-06-26  6:22       ` Jan Beulich
2015-06-26  8:10         ` Andrew Cooper
2015-06-26  8:50           ` Jan Beulich
2015-06-24 16:31 ` [PATCH 8/8] xen/x86: Additional SMAP modes to work around buggy 32bit " Andrew Cooper
2015-06-25 11:18   ` David Vrabel
2015-06-25 11:53     ` Andrew Cooper
2015-06-25 15:12   ` Jan Beulich
2015-06-25 16:31     ` Andrew Cooper
2015-06-26  6:33   ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=558C2AB2.1070001@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.