From: Jan Beulich <jbeulich@suse.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Xen-devel <xen-devel@lists.xenproject.org>,
"Roger Pau Monné" <roger.pau@citrix.com>, "Wei Liu" <wl@xen.org>,
"Andy Lutomirski" <luto@kernel.org>,
"Manuel Bouyer" <bouyer@antioche.eu.org>
Subject: Re: [PATCH 3/3] x86/pv: Inject #UD for missing SYSCALL callbacks
Date: Thu, 24 Sep 2020 16:56:13 +0200
Message-ID: <ed891c94-63a4-496c-6817-1b88ac4c004b@suse.com>
In-Reply-To: <20200923101848.29049-4-andrew.cooper3@citrix.com>

On 23.09.2020 12:18, Andrew Cooper wrote:
> Despite appearing to be a deliberate design choice of early PV64, the
> resulting behaviour for unregistered SYSCALL callbacks creates an untenable
> testability problem for Xen. Furthermore, the behaviour is undocumented,
> bizarre, and inconsistent with related behaviour in Xen, and very liable to
> introduce a security vulnerability into a PV guest if the author hasn't
> studied Xen's assembly code in detail.
>
> There are two different bugs here.
>
> 1) The current logic confuses the registered entrypoints, and may deliver a
> SYSCALL from 32bit userspace to the 64bit entry, when only a 64bit
> entrypoint is registered.
>
> This has been the case ever since 2007 (c/s cd75d47348b) but up until
> 2018 (c/s dba899de14) the wrong selectors would be handed to the guest for
> a 32bit SYSCALL entry, making it appear as if it were a 64bit entry all along.
I'm not sure what you derive the last half sentence from. To a 32-bit
PV guest, nothing can make things look 64-bit. And as you
did say in your 2018 change, FLAT_KERNEL_SS == FLAT_USER_SS32.
As to the "confusion" of entry points - before the compat mode entry
path was introduced, a 64-bit guest could only register a single
entry point. Hence guests at the time had to multiplex 32- and 64-bit
user mode entry from this one code path. In order to avoid regressing
any such guest, falling back to the 64-bit entry point was
chosen. Effectively, what you propose is to regress such guests now,
rather than back then.
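
To make the two behaviours under discussion concrete, here is a minimal C sketch of the entry-point selection before and after the patch, modelled on the assembly in the hunk further down. The struct and function names are hypothetical and exist only for illustration; only the selection logic is taken from the patch, and 0 stands for "callback not registered".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-vCPU callback fields (not Xen's struct). */
struct pv_syscall_callbacks {
    uint64_t syscall_addr;   /* 64-bit entry point, 0 if unregistered */
    uint64_t syscall32_addr; /* 32-bit entry point, 0 if unregistered */
};

/*
 * Existing logic:
 *   TB_eip = (32-bit syscall && syscall32_addr) ? syscall32_addr : syscall_addr
 * A 32-bit SYSCALL falls back to the 64-bit entry point when no 32-bit
 * callback is registered.
 */
static uint64_t select_entry_old(const struct pv_syscall_callbacks *cb,
                                 bool from_32bit_user)
{
    uint64_t eip = from_32bit_user ? cb->syscall32_addr : 0;

    if (eip == 0)
        eip = cb->syscall_addr;

    return eip;
}

/*
 * Proposed logic:
 *   TB_eip = 32-bit syscall ? syscall32_addr : syscall_addr
 * No fallback. A result of 0 is the case where the patch injects #UD.
 */
static uint64_t select_entry_new(const struct pv_syscall_callbacks *cb,
                                 bool from_32bit_user)
{
    return from_32bit_user ? cb->syscall32_addr : cb->syscall_addr;
}
```

A guest that registers only the 64-bit callback and multiplexes both user modes through it gets its handler address from select_entry_old() for a 32-bit SYSCALL, but 0 from select_entry_new(). That is the regression described above.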
> Xen would malfunction under these circumstances, if it were a PV guest.
> Linux would as well, but PVOps has always registered both entrypoints and
> discarded the Xen-provided selectors. NetBSD really does malfunction as a
> consequence (benignly now, but a VM DoS before the 2018 Xen selector fix).
>
> 2) In the case that neither SYSCALL callback is registered, the guest will
> be crashed when userspace executes a SYSCALL instruction, which is a
> userspace => kernel DoS.
>
> This has been the case ever since the introduction of 64bit PV support, but
> behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which yield
> #GP/#UD in userspace before the callback is registered, and are therefore
> safe by default.
I agree this part is an improvement.
> This change does constitute a change in the PV ABI, for corner cases of a PV
> guest kernel registering neither callback, or not registering the 32bit
> callback when running on AMD/Hygon hardware.
>
> It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
> SYSENTER (safe by default, until explicitly enabled), as well as native
> hardware (always delivered to the single applicable callback).
Albeit an OS running natively and setting EFER.SCE is obliged to set both
entry points; they can't have one without the other (and not be vulnerable).
Since it's unclear what the PV equivalent of EFER.SCE is, I don't think
comparing this particular aspect of the behavior makes a lot of sense.
> Most importantly however, and the primary reason for the change, is that it
> lets us actually test the PV entrypoints to prove correct behaviour.
You mean "test the absence of PV entry points" here?
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -26,18 +26,30 @@
> /* %rbx: struct vcpu */
> ENTRY(switch_to_kernel)
> leaq VCPU_trap_bounce(%rbx),%rdx
> - /* TB_eip = (32-bit syscall && syscall32_addr) ?
> - * syscall32_addr : syscall_addr */
> - xor %eax,%eax
> +
> + /* TB_eip = 32-bit syscall ? syscall32_addr : syscall_addr */
> + mov VCPU_syscall32_addr(%rbx), %ecx
> + mov VCPU_syscall_addr(%rbx), %rax
> cmpw $FLAT_USER_CS32,UREGS_cs(%rsp)
> - cmoveq VCPU_syscall32_addr(%rbx),%rax
> - testq %rax,%rax
> - cmovzq VCPU_syscall_addr(%rbx),%rax
> - movq %rax,TRAPBOUNCE_eip(%rdx)
> + cmove %rcx, %rax
> +
> /* TB_flags = VGCF_syscall_disables_events ? TBF_INTERRUPT : 0 */
> btl $_VGCF_syscall_disables_events,VCPU_guest_context_flags(%rbx)
> setc %cl
> leal (,%rcx,TBF_INTERRUPT),%ecx
> +
> + test %rax, %rax
> +UNLIKELY_START(z, syscall_no_callback) /* TB_eip == 0 => #UD */
> + movq VCPU_trap_ctxt(%rbx), %rdi
> + movl $X86_EXC_UD, UREGS_entry_vector(%rsp)
> + subl $2, UREGS_rip(%rsp)
> + movl X86_EXC_UD * TRAPINFO_sizeof + TRAPINFO_eip(%rdi), %eax
I guess you mean "movq ..., %rax"? IIRC 32-bit guests don't even get through
here.
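
The concern with the 32-bit load can be sketched in C. This is only a model of the operand-size semantics: a movl into %eax reads the low 32 bits of the field and zero-extends them into %rax, so a 64-bit handler address loses its upper half. The helper names and any address passed in are made up for illustration.

```c
#include <stdint.h>

/* Models "movl field, %eax": low 32 bits loaded, upper half of %rax zeroed. */
static uint64_t load_with_movl(uint64_t field)
{
    return (uint32_t)field;
}

/* Models "movq field, %rax": the full 64-bit value is loaded. */
static uint64_t load_with_movq(uint64_t field)
{
    return field;
}
```

For a hypothetical #UD handler at 0xffffffff80001000, load_with_movl() yields 0x80001000, which is not the registered address, while load_with_movq() returns the address unchanged.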
Jan