From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	Keir Fraser <keir@xen.org>, Feng Wu <feng.wu@intel.com>
Subject: Re: [PATCH 2/4] x86: suppress SMAP and SMEP while running 32-bit PV guest code
Date: Wed, 9 Mar 2016 11:19:39 +0000
Message-ID: <56E006CB.7090907@citrix.com>
In-Reply-To: <56DE93E902000078000DA3D1@prv-mh.provo.novell.com>

On 08/03/16 07:57, Jan Beulich wrote:
>
>>> @@ -174,10 +174,43 @@ compat_bad_hypercall:
>>>  /* %rbx: struct vcpu, interrupts disabled */
>>>  ENTRY(compat_restore_all_guest)
>>>          ASSERT_INTERRUPTS_DISABLED
>>> +.Lcr4_orig:
>>> +        ASM_NOP3 /* mov   %cr4, %rax */
>>> +        ASM_NOP6 /* and   $..., %rax */
>>> +        ASM_NOP3 /* mov   %rax, %cr4 */
>>> +        .pushsection .altinstr_replacement, "ax"
>>> +.Lcr4_alt:
>>> +        mov   %cr4, %rax
>>> +        and   $~(X86_CR4_SMEP|X86_CR4_SMAP), %rax
>>> +        mov   %rax, %cr4
>>> +.Lcr4_alt_end:
>>> +        .section .altinstructions, "a"
>>> +        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, 12, \
>>> +                             (.Lcr4_alt_end - .Lcr4_alt)
>>> +        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMAP, 12, \
>>> +                             (.Lcr4_alt_end - .Lcr4_alt)
>> These 12's look as if they should be (.Lcr4_alt - .Lcr4_orig).
> Well, the NOPs that get put there make 12 (= 3 + 6 + 3) a
> pretty obvious (shorter and hence more readable) option. But
> yes, if you're of the strong opinion that we should use the
> longer alternative, I can switch these around.
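
Spelled out, the symbolic form would look something like this (sketch
only; the .Lcr4_orig_end label is hypothetical and not in the patch):

.Lcr4_orig:
        ASM_NOP3 /* mov   %cr4, %rax */
        ASM_NOP6 /* and   $..., %rax */
        ASM_NOP3 /* mov   %rax, %cr4 */
.Lcr4_orig_end:
        ...
        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_SMEP, \
                             (.Lcr4_orig_end - .Lcr4_orig), \
                             (.Lcr4_alt_end - .Lcr4_alt)

which keeps the original-length operand tied to the code it measures
instead of a hand-counted 12.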

I have to admit that I prefer the Linux ALTERNATIVE macro for assembly,
which takes care of calculations like this.  It is slightly unfortunate
that it generally requires its assembly blocks in strings, and is
unsuitable for larger blocks.  Perhaps we can see about a variant in due
course.
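
For reference, the shape I mean is roughly the following (written from
memory, so treat the exact macro spelling and the empty-original form as
assumptions rather than the real Linux interface):

        ALTERNATIVE "", "mov %cr4, %rax; and $~(X86_CR4_SMEP|X86_CR4_SMAP), %rax; mov %rax, %cr4", X86_FEATURE_SMEP

with the NOP padding of the original site and both lengths derived by
the macro itself, at the cost of cramming the replacement into a string.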

>
>>> +/* This mustn't modify registers other than %rax. */
>>> +ENTRY(cr4_smep_smap_restore)
>>> +        mov   %cr4, %rax
>>> +        test  $X86_CR4_SMEP|X86_CR4_SMAP,%eax
>>> +        jnz   0f
>>> +        or    cr4_smep_smap_mask(%rip), %rax
>>> +        mov   %rax, %cr4
>>> +        ret
>>> +0:
>>> +        and   cr4_smep_smap_mask(%rip), %eax
>>> +        cmp   cr4_smep_smap_mask(%rip), %eax
>>> +        je    1f
>>> +        BUG
>> What is the purpose of this bugcheck? It looks like it is catching a
>> mismatch of masked options, but I am not completely sure.
> This aims at detecting that some of the CR4 bits which are
> expected to be set really aren't (other than the case when all
> of the ones of interest here are clear).
>
>> For all other ASM level BUG's, I put a short comment on the same line,
>> to aid people who hit the bug.
> Will do. Question: Should this check perhaps become !NDEBUG
> dependent?

It probably should do.
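
Something like the following is what I would expect (only a sketch: the
#ifndef guard relies on the file going through the C preprocessor, which
.S files do, and where exactly the 1: label sits is an assumption since
the tail of the function isn't quoted above):

0:
#ifndef NDEBUG
        and   cr4_smep_smap_mask(%rip), %eax
        cmp   cr4_smep_smap_mask(%rip), %eax
        je    1f
        BUG   /* Some expected CR4.SMEP/SMAP bits are unexpectedly clear. */
1:
#endif
        /* ... remainder of the 0: path as in the patch ... */

That would also fold in the same-line comment for the BUG.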

>
>>> @@ -454,13 +455,64 @@ ENTRY(page_fault)
>>>  GLOBAL(handle_exception)
>>>          SAVE_ALL CLAC
>>>  handle_exception_saved:
>>> +        GET_CURRENT(%rbx)
>>>          testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
>>>          jz    exception_with_ints_disabled
>>> -        sti
>>> +
>>> +.Lsmep_smap_orig:
>>> +        jmp   0f
>>> +        .if 0 // GAS bug (affecting at least 2.22 ... 2.26)
>>> +        .org .Lsmep_smap_orig + (.Lsmep_smap_alt_end - .Lsmep_smap_alt), 0xcc
>>> +        .else
>>> +        // worst case: rex + opcode + modrm + 4-byte displacement
>>> +        .skip (1 + 1 + 1 + 4) - 2, 0xcc
>>> +        .endif
>> Which bug is this?  How does it manifest?  More generally, what is this
>> alternative trying to achieve?
> The .org gets a warning (.Lsmep_smap_orig supposedly being
> undefined, and hence getting assumed to be zero) followed by
> an error (attempt to move the current location backwards). The
> fix https://sourceware.org/ml/binutils/2016-03/msg00030.html
> is pending approval.

I presume this is down to the documented restriction about crossing
sections, i.e. there is no .Lsmep_smap_orig in .altinstr_replacement,
which is where it found the first two symbols.
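
If so, a stand-alone reproducer would presumably look something like
this (an untested guess at the failing pattern, based purely on the
description above; the replacement is deliberately longer than the jmp
so that a fixed assembler has somewhere forward to pad to):

        .text
orig:
        jmp   0f
        .pushsection .altinstr_replacement, "ax"
alt:
        nop
        nop
        nop
        nop
alt_end:
        .popsection
        /* Reportedly warns that 'orig' is undefined and then errors out
         * moving the location counter backwards; a fixed assembler would
         * just pad with 0xcc up to the replacement length. */
        .org orig + (alt_end - alt), 0xcc
0: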

>>> +        .pushsection .altinstr_replacement, "ax"
>>> +.Lsmep_smap_alt:
>>> +        mov   VCPU_domain(%rbx),%rax
>>> +.Lsmep_smap_alt_end:
>>> +        .section .altinstructions, "a"
>>> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
>>> +                             X86_FEATURE_SMEP, \
>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
>>> +        altinstruction_entry .Lsmep_smap_orig, .Lsmep_smap_alt, \
>>> +                             X86_FEATURE_SMAP, \
>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt), \
>>> +                             (.Lsmep_smap_alt_end - .Lsmep_smap_alt)
>>> +        .popsection
>>> +
>>> +        testb $3,UREGS_cs(%rsp)
>>> +        jz    0f
>>> +        cmpb  $0,DOMAIN_is_32bit_pv(%rax)
>> This comparison is wrong on hardware lacking SMEP and SMAP, as the "mov
>> VCPU_domain(%rbx),%rax" won't have happened.
> That mov indeed won't have happened, but the original instruction
> is a branch past all of this code, so the above is correct (and I did
> test on older hardware).

Oh, so it won't.  It is moderately subtle that this entire code block is
logically contained in the alternative.

It would be far clearer, and would work around your .org bug, if this
were a single alternative which patched the jump into a nop.

At the very least, a label such as .Lcr4_pv32_fixup_done would be an
improvement over 0.
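
I.e. something shaped roughly like this (hypothetical labels and only a
sketch of the idea, not a drop-in replacement; a second
altinstruction_entry for X86_FEATURE_SMAP would be needed just as in the
patch, and I haven't checked that the framework is happy with a
zero-length replacement):

.Lcr4_pv32_orig:
        jmp   .Lcr4_pv32_fixup_done
.Lcr4_pv32_orig_end:
        .pushsection .altinstructions, "a"
        /* Zero-length replacement: applying the alternative simply NOPs
         * out the jmp, exposing the code below on SMEP/SMAP hardware. */
        altinstruction_entry .Lcr4_pv32_orig, .Lcr4_pv32_orig_end, \
                             X86_FEATURE_SMEP, \
                             (.Lcr4_pv32_orig_end - .Lcr4_pv32_orig), 0
        .popsection
        /* ... GET_CURRENT / is_32bit_pv / CR4 restore logic ... */
.Lcr4_pv32_fixup_done: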

>
>>> +        je    0f
>>> +        call  cr4_smep_smap_restore
>>> +        /*
>>> +         * An NMI or #MC may occur between clearing CR4.SMEP and CR4.SMAP in
>>> +         * compat_restore_all_guest and it actually returning to guest
>>> +         * context, in which case the guest would run with the two features
>>> +         * enabled. The only bad that can happen from this is a kernel mode
>>> +         * #PF which the guest doesn't expect. Rather than trying to make the
>>> +         * NMI/#MC exit path honor the intended CR4 setting, simply check
>>> +         * whether the wrong CR4 was in use when the #PF occurred, and exit
>>> +         * back to the guest (which will in turn clear the two CR4 bits) to
>>> +         * re-execute the instruction. If we get back here, the CR4 bits
>>> +         * should then be found clear (unless another NMI/#MC occurred at
>>> +         * exactly the right time), and we'll continue processing the
>>> +         * exception as normal.
>>> +         */
>>> +        test  %rax,%rax
>>> +        jnz   0f
>>> +        mov   $PFEC_page_present,%al
>>> +        cmpb  $TRAP_page_fault,UREGS_entry_vector(%rsp)
>>> +        jne   0f
>>> +        xor   UREGS_error_code(%rsp),%eax
>>> +        test  $~(PFEC_write_access|PFEC_insn_fetch),%eax
>>> +        jz    compat_test_all_events
>>> +0:      sti
>> It's code like this which makes me even more certain that we have far too
>> much code written in assembly which doesn't need to be.  Maybe not this
>> specific sample, but it has taken me 15 minutes and a pad of paper to
>> try to work out how this conditional works, and I am still not certain
>> it's correct.  In particular, PFEC_prot_key looks like it could fool
>> the test into believing a non-smap/smep fault was a smap/smep fault.
> Not sure how you come to think of PFEC_prot_key here: That's
> a bit which can be set only together with PFEC_user_mode, yet
> we care about kernel mode faults only here.

I would not make that assumption.  Assumptions about the valid set of
#PF flags are precisely the reason that older Linux falls into an
infinite loop when encountering a SMAP page fault, rather than crashing
cleanly.

~Andrew

>
>> Can you at least provide some C in a comment with the intended
>> conditional, to aid clarity?
> Sure, if you think that helps beyond the (I think) pretty extensive
> comment:
>
> +        test  %rax,%rax
> +        jnz   0f
> +        /*
> +         * The below effectively is
> +         * if ( regs->entry_vector == TRAP_page_fault &&
> +         *      (regs->error_code & PFEC_page_present) &&
> +         *      !(regs->error_code & ~(PFEC_write_access|PFEC_insn_fetch)) )
> +         *     goto compat_test_all_events;
> +         */
> +        mov   $PFEC_page_present,%al
> +        cmpb  $TRAP_page_fault,UREGS_entry_vector(%rsp)
> +        jne   0f
> +        xor   UREGS_error_code(%rsp),%eax
> +        test  $~(PFEC_write_access|PFEC_insn_fetch),%eax
> +        jz    compat_test_all_events
> +0:
>
> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
