* [PATCH 00/22] Vixen: A PV-in-HVM shim
@ 2018-01-06 22:54 Anthony Liguori
  2018-01-06 22:54 ` [PATCH 01/22] ---- x86/Kconfig: Options for Xen and PVH support Anthony Liguori
                   ` (24 more replies)
  0 siblings, 25 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Anthony Liguori <aliguori@amazon.com>

CVE-2017-5754 is problematic for paravirtualized x86 domUs because it
appears to be very difficult to isolate the hypervisor's page tables
from PV domUs while maintaining ABI compatibility.  Instead of trying
to make a KPTI-like approach work for Xen PV, it seems reasonable to
run a copy of Xen within an HVM (or PVH) domU to provide backwards
compatibility with guests as mentioned in XSA-254 [1].

This patch series adds a new mode to Xen called Vixen (Virtualized
Xen) which provides a PV-compatible interface while gaining
CVE-2017-5754 protection for the host provided by hardware
virtualization.  Vixen supports running a single unprivileged PV
domain (a dom1) that is constructed by the dom0 domain builder.

Please note that the Xen page table configuration fundamental to the
current PV ABI makes it impossible for an operating system to mitigate
CVE-2017-5754 through mechanisms like Kernel Page Table Isolation
(KPTI).  In order for an operating system to mitigate CVE-2017-5754 it
must run directly in an HVM or PVH domU.

This series is very similar to the PVH series posted by Wei and we
have been discussing how to merge efforts.  We were hoping to have
more time to work this out.  I am posting this because I'm fairly
confident that this series is complete (all PV instances in EC2 are
using this) and others might find it useful.  I also wanted to have
more of a discussion about the best way to merge and some of the
differences in designs.

This series is also available at:

 git clone https://github.com/aliguori/xen.git vixen-upstream-v1

Regards,

Anthony Liguori

[1] https://xenbits.xen.org/xsa/advisory-254.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [PATCH 01/22] ---- x86/Kconfig: Options for Xen and PVH support
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-06 22:54 ` [PATCH 02/22] x86/entry: Probe for Xen early during boot Anthony Liguori
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Andrew Cooper <andrew.cooper3@citrix.com>

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
 xen/arch/x86/Kconfig | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 7c45829..07530bf 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -117,6 +117,23 @@ config TBOOT
 	  Technology (TXT)
 
 	  If unsure, say Y.
+
+config XEN_GUEST
+	def_bool y
+	prompt "Xen Guest"
+	---help---
+	  Support for Xen detecting when it is running under Xen.
+
+	  If unsure, say Y.
+
+config PVH_GUEST
+	def_bool n
+	prompt "PVH Guest"
+	depends on XEN_GUEST
+	---help---
+	  Support booting using the PVH ABI.
+
+	  If unsure, say N.
 endmenu
 
 source "common/Kconfig"
-- 
1.9.1



* [PATCH 02/22] x86/entry: Probe for Xen early during boot
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
  2018-01-06 22:54 ` [PATCH 01/22] ---- x86/Kconfig: Options for Xen and PVH support Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-06 22:54 ` [PATCH 03/22] x86/guest: Hypercall support Anthony Liguori
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Andrew Cooper <andrew.cooper3@citrix.com>

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
 xen/arch/x86/Makefile           |  1 +
 xen/arch/x86/guest/Makefile     |  1 +
 xen/arch/x86/guest/xen.c        | 75 +++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/setup.c            |  4 +++
 xen/include/asm-x86/guest.h     | 34 +++++++++++++++++++
 xen/include/asm-x86/guest/xen.h | 47 ++++++++++++++++++++++++++
 6 files changed, 162 insertions(+)
 create mode 100644 xen/arch/x86/guest/Makefile
 create mode 100644 xen/arch/x86/guest/xen.c
 create mode 100644 xen/include/asm-x86/guest.h
 create mode 100644 xen/include/asm-x86/guest/xen.h

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index d5d58a2..c1977d1 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -1,6 +1,7 @@
 subdir-y += acpi
 subdir-y += cpu
 subdir-y += genapic
+subdir-$(CONFIG_XEN_GUEST) += guest
 subdir-$(CONFIG_HVM) += hvm
 subdir-y += mm
 subdir-$(CONFIG_XENOPROF) += oprofile
diff --git a/xen/arch/x86/guest/Makefile b/xen/arch/x86/guest/Makefile
new file mode 100644
index 0000000..7f67396
--- /dev/null
+++ b/xen/arch/x86/guest/Makefile
@@ -0,0 +1 @@
+obj-y += xen.o
diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
new file mode 100644
index 0000000..9446a46
--- /dev/null
+++ b/xen/arch/x86/guest/xen.c
@@ -0,0 +1,75 @@
+/******************************************************************************
+ * arch/x86/guest/xen.c
+ *
+ * Support for detecting and running under Xen.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ */
+#include <xen/init.h>
+#include <xen/types.h>
+
+#include <asm/guest.h>
+#include <asm/processor.h>
+
+#include <public/arch-x86/cpuid.h>
+
+bool xen_guest;
+
+static uint32_t xen_cpuid_base;
+
+static void __init find_xen_leaves(void)
+{
+    uint32_t eax, ebx, ecx, edx, base;
+
+    for ( base = XEN_CPUID_FIRST_LEAF;
+          base < XEN_CPUID_FIRST_LEAF + 0x10000; base += 0x100 )
+    {
+        cpuid(base, &eax, &ebx, &ecx, &edx);
+
+        if ( (ebx == XEN_CPUID_SIGNATURE_EBX) &&
+             (ecx == XEN_CPUID_SIGNATURE_ECX) &&
+             (edx == XEN_CPUID_SIGNATURE_EDX) &&
+             ((eax - base) >= 2) )
+        {
+            xen_cpuid_base = base;
+            break;
+        }
+    }
+}
+
+void __init probe_hypervisor(void)
+{
+    /* Too early to use cpu_has_hypervisor */
+    if ( !(cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR)) )
+        return;
+
+    find_xen_leaves();
+
+    if ( !xen_cpuid_base )
+        return;
+
+    xen_guest = true;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 2e10c6b..7627c3f 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -51,6 +51,8 @@
 #include <asm/alternative.h>
 #include <asm/mc146818rtc.h>
 #include <asm/cpuid.h>
+#include <asm/guest.h>
+#include <public/arch-x86/cpuid.h>
 
 /* opt_nosmp: If true, secondary processors are ignored. */
 static bool __initdata opt_nosmp;
@@ -704,6 +706,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
      * allocing any xenheap structures wanted in lower memory. */
     kexec_early_calculations();
 
+    probe_hypervisor();
+
     parse_video_info();
 
     rdmsrl(MSR_EFER, this_cpu(efer));
diff --git a/xen/include/asm-x86/guest.h b/xen/include/asm-x86/guest.h
new file mode 100644
index 0000000..eb08434
--- /dev/null
+++ b/xen/include/asm-x86/guest.h
@@ -0,0 +1,34 @@
+/******************************************************************************
+ * asm-x86/guest.h
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ */
+
+#ifndef __X86_GUEST_H__
+#define __X86_GUEST_H__
+
+#include <asm/guest/xen.h>
+
+#endif /* __X86_GUEST_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/guest/xen.h b/xen/include/asm-x86/guest/xen.h
new file mode 100644
index 0000000..97a7c8d
--- /dev/null
+++ b/xen/include/asm-x86/guest/xen.h
@@ -0,0 +1,47 @@
+/******************************************************************************
+ * asm-x86/guest/xen.h
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ */
+
+#ifndef __X86_GUEST_XEN_H__
+#define __X86_GUEST_XEN_H__
+
+#include <xen/types.h>
+
+#ifdef CONFIG_XEN_GUEST
+
+extern bool xen_guest;
+
+void probe_hypervisor(void);
+
+#else
+
+#define xen_guest 0
+
+static inline void probe_hypervisor(void) {};
+
+#endif /* CONFIG_XEN_GUEST */
+#endif /* __X86_GUEST_XEN_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.9.1
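
Editorial sketch (not part of the patch): the probe in patch 02 walks the
hypervisor CPUID range in steps of 0x100 until it finds the Xen signature.
The stand-alone C below mirrors that loop against a mock CPUID so it can be
compiled and run outside Xen. The names cpuid_fn, find_xen_base,
mock_cpuid_nested and mock_cpuid_bare are hypothetical; the signature
constants are the values from Xen's public/arch-x86/cpuid.h, and the
non-Xen signature placed at the first leaf is only an illustration of
another hypervisor interface occupying that slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in Xen's public/arch-x86/cpuid.h. */
#define XEN_CPUID_FIRST_LEAF    0x40000000u
#define XEN_CPUID_SIGNATURE_EBX 0x566e6558u /* "XenV" */
#define XEN_CPUID_SIGNATURE_ECX 0x65584d4du /* "MMXe" */
#define XEN_CPUID_SIGNATURE_EDX 0x4d4d566eu /* "nVMM" */

/* Hypothetical stand-in for the CPUID instruction: fills eax..edx. */
typedef void (*cpuid_fn)(uint32_t leaf, uint32_t regs[4]);

/* Returns the base of the Xen CPUID leaves, or 0 if none is found. */
static uint32_t find_xen_base(cpuid_fn cpuid)
{
    uint32_t base;

    assert(cpuid != NULL);

    for ( base = XEN_CPUID_FIRST_LEAF;
          base < XEN_CPUID_FIRST_LEAF + 0x10000; base += 0x100 )
    {
        uint32_t r[4]; /* eax, ebx, ecx, edx */

        cpuid(base, r);

        /* The signature must match and at least base + 2 must exist. */
        if ( r[1] == XEN_CPUID_SIGNATURE_EBX &&
             r[2] == XEN_CPUID_SIGNATURE_ECX &&
             r[3] == XEN_CPUID_SIGNATURE_EDX &&
             (r[0] - base) >= 2 )
            return base;
    }

    return 0;
}

/* Mock: another signature at the first leaf, Xen leaves at +0x100. */
static void mock_cpuid_nested(uint32_t leaf, uint32_t regs[4])
{
    regs[0] = regs[1] = regs[2] = regs[3] = 0;

    if ( leaf == 0x40000000u )
    {
        regs[0] = 0x40000006u;
        regs[1] = 0x7263694du; /* "Micr" */
        regs[2] = 0x666f736fu; /* "osof" */
        regs[3] = 0x76482074u; /* "t Hv" */
    }
    else if ( leaf == 0x40000100u )
    {
        regs[0] = 0x40000104u; /* highest Xen leaf in this mock */
        regs[1] = XEN_CPUID_SIGNATURE_EBX;
        regs[2] = XEN_CPUID_SIGNATURE_ECX;
        regs[3] = XEN_CPUID_SIGNATURE_EDX;
    }
}

/* Mock: no hypervisor leaves at all. */
static void mock_cpuid_bare(uint32_t leaf, uint32_t regs[4])
{
    (void)leaf;
    regs[0] = regs[1] = regs[2] = regs[3] = 0;
}
```

Calling find_xen_base(mock_cpuid_nested) yields 0x40000100, and
find_xen_base(mock_cpuid_bare) yields 0, which corresponds to the case
where probe_hypervisor() returns without setting xen_guest.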



* [PATCH 03/22] x86/guest: Hypercall support
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
  2018-01-06 22:54 ` [PATCH 01/22] ---- x86/Kconfig: Options for Xen and PVH support Anthony Liguori
  2018-01-06 22:54 ` [PATCH 02/22] x86/entry: Probe for Xen early during boot Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-06 22:54 ` [PATCH 04/22] x86: Don't use potentially incorrect CPUID values for topology information Anthony Liguori
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Andrew Cooper <andrew.cooper3@citrix.com>

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
 xen/arch/x86/guest/Makefile           |  1 +
 xen/arch/x86/guest/hypercall_page.S   | 79 ++++++++++++++++++++++++++++++
 xen/arch/x86/guest/xen.c              |  5 ++
 xen/arch/x86/xen.lds.S                |  1 +
 xen/include/asm-x86/guest.h           |  1 +
 xen/include/asm-x86/guest/hypercall.h | 92 +++++++++++++++++++++++++++++++++++
 6 files changed, 179 insertions(+)
 create mode 100644 xen/arch/x86/guest/hypercall_page.S
 create mode 100644 xen/include/asm-x86/guest/hypercall.h

diff --git a/xen/arch/x86/guest/Makefile b/xen/arch/x86/guest/Makefile
index 7f67396..c5d5188 100644
--- a/xen/arch/x86/guest/Makefile
+++ b/xen/arch/x86/guest/Makefile
@@ -1 +1,2 @@
+obj-y += hypercall_page.o
 obj-y += xen.o
diff --git a/xen/arch/x86/guest/hypercall_page.S b/xen/arch/x86/guest/hypercall_page.S
new file mode 100644
index 0000000..fdd2e72
--- /dev/null
+++ b/xen/arch/x86/guest/hypercall_page.S
@@ -0,0 +1,79 @@
+#include <asm/page.h>
+#include <asm/asm_defns.h>
+#include <public/xen.h>
+
+        .section ".text.page_aligned", "ax", @progbits
+        .p2align PAGE_SHIFT
+
+GLOBAL(hypercall_page)
+         /* Poisoned with `ret` for safety before hypercalls are set up. */
+        .fill PAGE_SIZE, 1, 0xc3
+        .type hypercall_page, STT_OBJECT
+        .size hypercall_page, PAGE_SIZE
+
+/*
+ * Identify a specific hypercall in the hypercall page
+ * @param name Hypercall name.
+ */
+#define DECLARE_HYPERCALL(name)                                                 \
+        .globl HYPERCALL_ ## name;                                              \
+        .set   HYPERCALL_ ## name, hypercall_page + __HYPERVISOR_ ## name * 32; \
+        .type  HYPERCALL_ ## name, STT_FUNC;                                    \
+        .size  HYPERCALL_ ## name, 32
+
+DECLARE_HYPERCALL(set_trap_table)
+DECLARE_HYPERCALL(mmu_update)
+DECLARE_HYPERCALL(set_gdt)
+DECLARE_HYPERCALL(stack_switch)
+DECLARE_HYPERCALL(set_callbacks)
+DECLARE_HYPERCALL(fpu_taskswitch)
+DECLARE_HYPERCALL(sched_op_compat)
+DECLARE_HYPERCALL(platform_op)
+DECLARE_HYPERCALL(set_debugreg)
+DECLARE_HYPERCALL(get_debugreg)
+DECLARE_HYPERCALL(update_descriptor)
+DECLARE_HYPERCALL(memory_op)
+DECLARE_HYPERCALL(multicall)
+DECLARE_HYPERCALL(update_va_mapping)
+DECLARE_HYPERCALL(set_timer_op)
+DECLARE_HYPERCALL(event_channel_op_compat)
+DECLARE_HYPERCALL(xen_version)
+DECLARE_HYPERCALL(console_io)
+DECLARE_HYPERCALL(physdev_op_compat)
+DECLARE_HYPERCALL(grant_table_op)
+DECLARE_HYPERCALL(vm_assist)
+DECLARE_HYPERCALL(update_va_mapping_otherdomain)
+DECLARE_HYPERCALL(iret)
+DECLARE_HYPERCALL(vcpu_op)
+DECLARE_HYPERCALL(set_segment_base)
+DECLARE_HYPERCALL(mmuext_op)
+DECLARE_HYPERCALL(xsm_op)
+DECLARE_HYPERCALL(nmi_op)
+DECLARE_HYPERCALL(sched_op)
+DECLARE_HYPERCALL(callback_op)
+DECLARE_HYPERCALL(xenoprof_op)
+DECLARE_HYPERCALL(event_channel_op)
+DECLARE_HYPERCALL(physdev_op)
+DECLARE_HYPERCALL(hvm_op)
+DECLARE_HYPERCALL(sysctl)
+DECLARE_HYPERCALL(domctl)
+DECLARE_HYPERCALL(kexec_op)
+DECLARE_HYPERCALL(tmem_op)
+DECLARE_HYPERCALL(xc_reserved_op)
+DECLARE_HYPERCALL(xenpmu_op)
+
+DECLARE_HYPERCALL(arch_0)
+DECLARE_HYPERCALL(arch_1)
+DECLARE_HYPERCALL(arch_2)
+DECLARE_HYPERCALL(arch_3)
+DECLARE_HYPERCALL(arch_4)
+DECLARE_HYPERCALL(arch_5)
+DECLARE_HYPERCALL(arch_6)
+DECLARE_HYPERCALL(arch_7)
+
+/*
+ * Local variables:
+ * tab-width: 8
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index 9446a46..c5b4341 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -22,6 +22,7 @@
 #include <xen/types.h>
 
 #include <asm/guest.h>
+#include <asm/msr.h>
 #include <asm/processor.h>
 
 #include <public/arch-x86/cpuid.h>
@@ -29,6 +30,7 @@
 bool xen_guest;
 
 static uint32_t xen_cpuid_base;
+extern char hypercall_page[];
 
 static void __init find_xen_leaves(void)
 {
@@ -61,6 +63,9 @@ void __init probe_hypervisor(void)
     if ( !xen_cpuid_base )
         return;
 
+    /* Fill the hypercall page. */
+    wrmsrl(cpuid_ebx(xen_cpuid_base + 2), __pa(hypercall_page));
+
     xen_guest = true;
 }
 
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index d5e8821..dd0e1c5 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -59,6 +59,7 @@ SECTIONS
   .text : {
         _stext = .;            /* Text and read-only data */
        *(.text)
+       *(.text.page_aligned)
        *(.text.cold)
        *(.text.unlikely)
        *(.fixup)
diff --git a/xen/include/asm-x86/guest.h b/xen/include/asm-x86/guest.h
index eb08434..70250b7 100644
--- a/xen/include/asm-x86/guest.h
+++ b/xen/include/asm-x86/guest.h
@@ -19,6 +19,7 @@
 #ifndef __X86_GUEST_H__
 #define __X86_GUEST_H__
 
+#include <asm/guest/hypercall.h>
 #include <asm/guest/xen.h>
 
 #endif /* __X86_GUEST_H__ */
diff --git a/xen/include/asm-x86/guest/hypercall.h b/xen/include/asm-x86/guest/hypercall.h
new file mode 100644
index 0000000..c460f59
--- /dev/null
+++ b/xen/include/asm-x86/guest/hypercall.h
@@ -0,0 +1,92 @@
+/******************************************************************************
+ * asm-x86/guest/hypercall.h
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ */
+
+#ifndef __X86_XEN_HYPERCALL_H__
+#define __X86_XEN_HYPERCALL_H__
+
+#ifdef CONFIG_XEN_GUEST
+
+/*
+ * Hypercall primitives for 64bit
+ *
+ * Inputs: %rdi, %rsi, %rdx, %r10, %r8, %r9 (arguments 1-6)
+ */
+
+#define _hypercall64_1(type, hcall, a1)                                 \
+    ({                                                                  \
+        long res, tmp;                                                  \
+        asm volatile (                                                  \
+            "call hypercall_page + %c[offset]"                          \
+            : "=a" (res), "=D" (tmp)                                    \
+            : [offset] "i" (hcall * 32),                                \
+              "1" ((long)(a1))                                          \
+            : "memory" );                                               \
+        (type)res;                                                      \
+    })
+
+#define _hypercall64_2(type, hcall, a1, a2)                             \
+    ({                                                                  \
+        long res, tmp;                                                  \
+        asm volatile (                                                  \
+            "call hypercall_page + %c[offset]"                          \
+            : "=a" (res), "=D" (tmp), "=S" (tmp)                        \
+            : [offset] "i" (hcall * 32),                                \
+              "1" ((long)(a1)), "2" ((long)(a2))                        \
+            : "memory" );                                               \
+        (type)res;                                                      \
+    })
+
+#define _hypercall64_3(type, hcall, a1, a2, a3)                         \
+    ({                                                                  \
+        long res, tmp;                                                  \
+        asm volatile (                                                  \
+            "call hypercall_page + %c[offset]"                          \
+            : "=a" (res), "=D" (tmp), "=S" (tmp), "=d" (tmp)            \
+            : [offset] "i" (hcall * 32),                                \
+              "1" ((long)(a1)), "2" ((long)(a2)), "3" ((long)(a3))      \
+            : "memory" );                                               \
+        (type)res;                                                      \
+    })
+
+#define _hypercall64_4(type, hcall, a1, a2, a3, a4)                     \
+    ({                                                                  \
+        long res, tmp;                                                  \
+        register long _a4 asm ("r10") = ((long)(a4));                   \
+        asm volatile (                                                  \
+            "call hypercall_page + %c[offset]"                          \
+            : "=a" (res), "=D" (tmp), "=S" (tmp), "=d" (tmp),           \
+              "=&r" (tmp)                                               \
+            : [offset] "i" (hcall * 32),                                \
+              "1" ((long)(a1)), "2" ((long)(a2)), "3" ((long)(a3)),     \
+              "4" (_a4)                                                 \
+            : "memory" );                                               \
+        (type)res;                                                      \
+    })
+
+#endif /* CONFIG_XEN_GUEST */
+#endif /* __X86_XEN_HYPERCALL_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.9.1
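
Editorial sketch (not part of the patch): both DECLARE_HYPERCALL() in
hypercall_page.S and the _hypercall64_N() macros locate a hypercall stub at
hypercall_page + number * 32. The small C helper below only illustrates that
arithmetic. hypercall_slot_offset and the HYPERCALL_* constant names are
hypothetical; the two hypercall numbers shown are the values I understand
public/xen.h to assign, so treat them as assumptions and check the header.

```c
#include <assert.h>
#include <stdint.h>

#define HYPERCALL_PAGE_SIZE 4096u /* one x86 page */
#define HYPERCALL_SLOT_SIZE 32u   /* bytes reserved per hypercall stub */
#define HYPERCALL_NR_SLOTS  (HYPERCALL_PAGE_SIZE / HYPERCALL_SLOT_SIZE)

/* Assumed to match __HYPERVISOR_xen_version and __HYPERVISOR_hvm_op. */
#define EXAMPLE_HYPERVISOR_xen_version 17u
#define EXAMPLE_HYPERVISOR_hvm_op      34u

/* Byte offset of hypercall number nr within the hypercall page. */
static inline uint32_t hypercall_slot_offset(unsigned int nr)
{
    /* A 4 KiB page holds 128 slots of 32 bytes each. */
    assert(nr < HYPERCALL_NR_SLOTS);
    return nr * HYPERCALL_SLOT_SIZE;
}
```

Under those assumptions, xen_version lands at byte offset 544 and hvm_op at
byte offset 1088 within the page. Until the wrmsrl() in probe_hypervisor()
has run, every byte of the page is 0xc3, so a call into any slot simply
returns.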



* [PATCH 04/22] x86: Don't use potentially incorrect CPUID values for topology information
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (2 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 03/22] x86/guest: Hypercall support Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-06 22:54 ` [PATCH 05/22] char: optionally redirect {, g}printk output to QEMU debug log Anthony Liguori
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Jan H. Schönherr <jschoenh@amazon.de>

Intel says for CPUID leaf 0Bh:

  "Software must not use EBX[15:0] to enumerate processor
   topology of the system. This value in this field
   (EBX[15:0]) is only intended for display/diagnostic
   purposes. The actual number of logical processors
   available to BIOS/OS/Applications may be different from
   the value of EBX[15:0], depending on software and platform
   hardware configurations."

And yet, we're using them to derive the number of cores in a package
and the number of siblings in a core.

Derive the number of siblings and cores from EAX instead, which is
intended for that.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
 xen/arch/x86/cpu/common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index e9588b3..22f392f 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -479,8 +479,8 @@ void detect_extended_topology(struct cpuinfo_x86 *c)
 	initial_apicid = edx;
 
 	/* Populate HT related information from sub-leaf level 0 */
-	core_level_siblings = c->x86_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
 	core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+	core_level_siblings = c->x86_num_siblings = 1 << ht_mask_width;
 
 	sub_index = 1;
 	do {
@@ -488,8 +488,8 @@ void detect_extended_topology(struct cpuinfo_x86 *c)
 
 		/* Check for the Core type in the implemented sub leaves */
 		if ( LEAFB_SUBTYPE(ecx) == CORE_TYPE ) {
-			core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
 			core_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+			core_level_siblings = 1 << core_plus_mask_width;
 			break;
 		}
 
-- 
1.9.1
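
Editorial sketch (not part of the patch): patch 04 replaces the EBX[15:0]
count with a value derived from the shift width in EAX[4:0] of CPUID leaf
0Bh. The C below shows the two extractions side by side. The helper names
are hypothetical, and the register values in the notes are made up for
illustration. Note that 1 << width is the size of the APIC ID space at that
level, so it can be larger than the number of processors actually present.

```c
#include <assert.h>
#include <stdint.h>

/* EAX[4:0] of leaf 0Bh: bits to shift the x2APIC ID to reach the next level. */
static inline unsigned int bits_shift_next_level(uint32_t eax)
{
    return eax & 0x1f;
}

/* EBX[15:0] of leaf 0Bh: the display-only count the patch stops using. */
static inline unsigned int level_max_siblings(uint32_t ebx)
{
    return ebx & 0xffff;
}

/* Sibling count derived from the shift width, as the patch does. */
static inline unsigned int siblings_from_shift(unsigned int shift_width)
{
    assert(shift_width < 32);
    return 1u << shift_width;
}
```

For example, with a hypothetical sub-leaf 0 reporting EAX = 1, the thread
level has siblings_from_shift(1) = 2 siblings. With a hypothetical core-type
sub-leaf reporting EAX = 4 and EBX = 12, the patch would use
siblings_from_shift(4) = 16 where the old code used
level_max_siblings(12) = 12.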



* [PATCH 05/22] char: optionally redirect {, g}printk output to QEMU debug log
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (3 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 04/22] x86: Don't use potentially incorrect CPUID values for topology information Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-07  0:18   ` Anthony Liguori
  2018-01-06 22:54 ` [PATCH 06/22] console: do not print banner if below info log threshold Anthony Liguori
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Matt Wilson <msw@amazon.com>

When using Vixen, it is helpful to get the Xen messages on a
separate channel from the console output.  Add an option to
output to the QEMU backdoor logging port.

Signed-off-by: Matt Wilson <msw@amazon.com>
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
 xen/drivers/char/console.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index 19d0e74..b9412c5 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -85,6 +85,11 @@ static int __read_mostly sercon_handle = -1;
 
 static DEFINE_SPINLOCK(console_lock);
 
+/* send all printk output to QEMU debug log. Input does not change,
+ * nor does dom0 output.
+ */
+static bool_t __read_mostly qemu_debug = false;
+
 /*
  * To control the amount of printing, thresholds are added.
  * These thresholds correspond to the XENLOG logging levels.
@@ -564,10 +569,21 @@ static void __putstr(const char *str)
 {
     ASSERT(spin_is_locked(&console_lock));
 
-    sercon_puts(str);
-    video_puts(str);
+    if ( qemu_debug )
+    {
+        char c;
+        while ( (c = *str++) != '\0' )
+        {
+            outb(c, 0x12);
+        }
+    }
+    else
+    {
+        sercon_puts(str);
+        video_puts(str);
 
-    conring_puts(str);
+        conring_puts(str);
+    }
 
     if ( !console_locks_busted )
         tasklet_schedule(&notify_dom0_con_ring_tasklet);
@@ -762,6 +778,8 @@ void __init console_init_preirq(void)
             p++;
         if ( !strncmp(p, "vga", 3) )
             video_init();
+        else if ( !strncmp(p, "qemu", 4) )
+            qemu_debug = true;
         else if ( !strncmp(p, "none", 4) )
             continue;
         else if ( (sh = serial_parse_handle(p)) >= 0 )
-- 
1.9.1



* [PATCH 06/22] console: do not print banner if below info log threshold
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (4 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 05/22] char: optionally redirect {, g}printk output to QEMU debug log Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-06 22:54 ` [PATCH 07/22] vixen: introduce is_vixen() to allow altering behavior Anthony Liguori
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Anthony Liguori <aliguori@amazon.com>

Only print the banner if the log threshold is at least info.

For Vixen guests, we want the console output to be exactly what the
PV guest would show on its own.  That means the inner Xen banner
can potentially break automation that assumes a specific type of
console output.

Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
 xen/drivers/char/console.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index b9412c5..a07343d 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -801,9 +801,12 @@ void __init console_init_preirq(void)
     serial_set_rx_handler(sercon_handle, serial_rx);
 
     /* HELLO WORLD --- start-of-day banner text. */
-    spin_lock(&console_lock);
-    __putstr(xen_banner());
-    spin_unlock(&console_lock);
+    if ( 2 < xenlog_lower_thresh ) {
+        /* Only display at XENLOG_INFO level */
+        spin_lock(&console_lock);
+        __putstr(xen_banner());
+        spin_unlock(&console_lock);
+    }
     printk("Xen version %d.%d%s (%s@%s) (%s) debug=%c " gcov_string " %s\n",
            xen_major_version(), xen_minor_version(), xen_extra_version(),
            xen_compile_by(), xen_compile_domain(),
-- 
1.9.1



* [PATCH 07/22] vixen: introduce is_vixen() to allow altering behavior
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (5 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 06/22] console: do not print banner if below info log threshold Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-07  0:06   ` Matt Wilson
  2018-01-06 22:54 ` [PATCH 08/22] vixen: allow dom0 to be created with a domid != 0 Anthony Liguori
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Anthony Liguori <aliguori@amazon.com>

Vixen (Virtualized Xen) is a paravirtual mode of Xen where
paravirtual I/O is passed through from the parent hypervisor
all the way through the dom0 guest.  The dom0 guest is also
deprivileged and renumbered to give the appearance that it
is running as a normal PV guest.

Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
 xen/arch/x86/guest/Makefile       |  1 +
 xen/arch/x86/guest/vixen.c        | 30 ++++++++++++++++
 xen/include/asm-x86/guest/vixen.h | 73 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 104 insertions(+)
 create mode 100644 xen/arch/x86/guest/vixen.c
 create mode 100644 xen/include/asm-x86/guest/vixen.h

diff --git a/xen/arch/x86/guest/Makefile b/xen/arch/x86/guest/Makefile
index c5d5188..1c9cd7d 100644
--- a/xen/arch/x86/guest/Makefile
+++ b/xen/arch/x86/guest/Makefile
@@ -1,2 +1,3 @@
 obj-y += hypercall_page.o
 obj-y += xen.o
+obj-y += vixen.o
diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
new file mode 100644
index 0000000..d82e68f
--- /dev/null
+++ b/xen/arch/x86/guest/vixen.c
@@ -0,0 +1,30 @@
+/******************************************************************************
+ * arch/x86/guest/vixen.c
+ *
+ * Support for detecting and running under Xen HVM.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright 2017-2018 Amazon.com, Inc. or its affiliates.
+ */
+
+#include <asm/guest/vixen.h>
+
+static int in_vixen;
+
+bool is_vixen(void)
+{
+    return in_vixen > 0;
+}
+
diff --git a/xen/include/asm-x86/guest/vixen.h b/xen/include/asm-x86/guest/vixen.h
new file mode 100644
index 0000000..be90c46
--- /dev/null
+++ b/xen/include/asm-x86/guest/vixen.h
@@ -0,0 +1,73 @@
+/******************************************************************************
+ * include/asm-x86/guest/vixen.h
+ *
+ * Support for detecting and running under Xen HVM.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright 2017-2018 Amazon.com, Inc. or its affiliates.
+ */
+
+#ifndef XEN_VIXEN_H
+#define XEN_VIXEN_H
+
+#include <asm/guest.h>
+#include <public/xen.h>
+#include <xen/sched.h>
+
+static inline int
+HYPERVISOR_xen_version(int cmd, void *arg)
+{
+    return _hypercall64_2(int, __HYPERVISOR_xen_version, cmd, arg);
+}
+
+static inline unsigned long
+HYPERVISOR_hvm_op(int op, void *arg)
+{
+    return _hypercall64_2(unsigned long, __HYPERVISOR_hvm_op, op, arg);
+}
+
+static inline int
+HYPERVISOR_grant_table_op(unsigned int cmd, void *uop, unsigned int count)
+{
+    return _hypercall64_3(int, __HYPERVISOR_grant_table_op, cmd, uop, count);
+}
+
+static inline long
+HYPERVISOR_memory_op(unsigned int cmd, void *arg)
+{
+    return _hypercall64_2(long, __HYPERVISOR_memory_op, cmd, arg);
+}
+
+static inline int
+HYPERVISOR_event_channel_op(int cmd, void *arg)
+{
+    return _hypercall64_2(int, __HYPERVISOR_event_channel_op, cmd, arg);
+}
+
+static inline int
+HYPERVISOR_sched_op(int cmd, void *arg)
+{
+    return _hypercall64_2(int, __HYPERVISOR_sched_op, cmd, arg);
+}
+
+static inline int
+HYPERVISOR_vcpu_op(int cmd, int vcpuid, void *extra_args)
+{
+    return _hypercall64_3(int, __HYPERVISOR_vcpu_op, cmd, vcpuid, extra_args);
+}
+
+bool is_vixen(void);
+
+#endif
-- 
1.9.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 08/22] vixen: allow dom0 to be created with a domid != 0
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (6 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 07/22] vixen: introduce is_vixen() to allow altering behavior Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-06 22:54 ` [PATCH 09/22] vixen: modify the e820 table to advertise HVM special pages as RAM Anthony Liguori
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Anthony Liguori <aliguori@amazon.com>

Some older guests special-case domid 0 instead of checking the
shared info flags, so in order to get PV drivers loaded properly
we need to make the guest always appear with a domid != 0.

While the Vixen domain is the hardware domain, we don't want it
to behave that way so we also modify the is_hardware_domain()
check.

Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
 xen/arch/x86/dom0_build.c | 2 +-
 xen/arch/x86/setup.c      | 2 +-
 xen/common/domain.c       | 4 ++--
 xen/include/xen/sched.h   | 6 +++++-
 4 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index bf992fe..88810db 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -469,7 +469,7 @@ int __init construct_dom0(struct domain *d, const module_t *image,
     int rc;
 
     /* Sanity! */
-    BUG_ON(d->domain_id != 0);
+    BUG_ON(d->domain_id != dom0_domid);
     BUG_ON(d->vcpu[0] == NULL);
     BUG_ON(d->vcpu[0]->is_initialised);
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 7627c3f..f9d087e 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1591,7 +1591,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     }
 
     /* Create initial domain 0. */
-    dom0 = domain_create(0, domcr_flags, 0, &config);
+    dom0 = domain_create(dom0_domid, domcr_flags, 0, &config);
     if ( IS_ERR(dom0) || (alloc_dom0_vcpu0(dom0) == NULL) )
         panic("Error creating domain 0");
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 7af8d12..b4d679e 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -202,7 +202,7 @@ static int late_hwdom_init(struct domain *d)
     struct domain *dom0;
     int rv;
 
-    if ( d != hardware_domain || d->domain_id == 0 )
+    if ( d != hardware_domain || d->domain_id == dom0_domid )
         return 0;
 
     rv = xsm_init_hardware_domain(XSM_HOOK, d);
@@ -310,7 +310,7 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
     else
         d->guest_type = guest_type_pv;
 
-    if ( domid == 0 || domid == hardware_domid )
+    if ( domid == dom0_domid || domid == hardware_domid )
     {
         if ( hardware_domid < 0 || hardware_domid >= DOMID_FIRST_RESERVED )
             panic("The value of hardware_dom must be a valid domain ID");
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 002ba29..f6c6fff 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -27,6 +27,8 @@
 #include <public/vcpu.h>
 #include <public/vm_event.h>
 #include <public/event_channel.h>
+#include <asm/guest.h>
+#include <asm/guest/vixen.h>
 
 #ifdef CONFIG_COMPAT
 #include <compat/vcpu.h>
@@ -54,6 +56,8 @@ extern domid_t hardware_domid;
 #define hardware_domid 0
 #endif
 
+#define dom0_domid (is_vixen() ? 1 : 0)
+
 #ifndef CONFIG_COMPAT
 #define BITS_PER_EVTCHN_WORD(d) BITS_PER_XEN_ULONG
 #else
@@ -873,7 +877,7 @@ void watchdog_domain_destroy(struct domain *d);
  *    (that is, this would not be suitable for a driver domain)
  *  - There is never a reason to deny the hardware domain access to this
  */
-#define is_hardware_domain(_d) ((_d) == hardware_domain)
+#define is_hardware_domain(_d) (!is_vixen() && ((_d) == hardware_domain))
 
 /* This check is for functionality specific to a control domain */
 #define is_control_domain(_d) ((_d)->is_privileged)
-- 
1.9.1


* [PATCH 09/22] vixen: modify the e820 table to advertise HVM special pages as RAM
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (7 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 08/22] vixen: allow dom0 to be created with a domid != 0 Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-07  8:16   ` Roger Pau Monné
  2018-01-06 22:54 ` [PATCH 10/22] vixen: do not permit access to physical IRQs if in Vixen mode Anthony Liguori
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Anthony Liguori <aliguori@amazon.com>

In order to be able to assign the Xenstore page into the Vixen guest,
we need struct page_info's to exist.  We do this by modifying the
e820 table early in boot and then using the badpages handling to
prevent these pages from being added to the xenheap.

Since these pages exist in a somewhat weird state in Xen, we need
to relax permission checking too in order to be able to assign them
to the guest.

Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
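Not part of the patch: a minimal standalone sketch of the idea, for
readers following along.  It assumes a simplified e820-like table in
which the special-page range is a single entry; struct region,
retype_exact() and list_pages() are hypothetical names, not Xen
functions, and the exact-match lookup is a simplification.  It also
shows that the range 0xfeffc000-0xff000000 covers exactly the four
4 KiB pages named in the bad-pages list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000u

enum region_type { REGION_RAM = 1, REGION_RESERVED = 2 };

/* Hypothetical, simplified stand-in for one e820 entry. */
struct region {
    uint64_t start;
    uint64_t end;   /* exclusive */
    enum region_type type;
};

/*
 * Retype the entry that covers exactly [start, end) and currently has
 * type 'from'.  Returns 1 if an entry was changed, 0 otherwise.
 */
static int retype_exact(struct region *map, size_t nr, uint64_t start,
                        uint64_t end, enum region_type from,
                        enum region_type to)
{
    size_t i;

    assert(start < end);

    for ( i = 0; i < nr; i++ )
    {
        if ( map[i].start == start && map[i].end == end &&
             map[i].type == from )
        {
            map[i].type = to;
            return 1;
        }
    }

    return 0;
}

/* Write the page-aligned addresses in [start, end) to out; return the count. */
static size_t list_pages(uint64_t start, uint64_t end, uint64_t *out,
                         size_t max)
{
    size_t n = 0;
    uint64_t addr;

    for ( addr = start; addr < end && n < max; addr += SKETCH_PAGE_SIZE )
        out[n++] = addr;

    return n;
}
```

With a table holding one reserved entry for 0xfeffc000-0xff000000,
retype_exact() flips that entry to RAM, and list_pages() over the same
range yields 0xfeffc000, 0xfeffd000, 0xfeffe000 and 0xfefff000.
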
 xen/arch/x86/e820.c |  6 ++++++
 xen/arch/x86/mm.c   | 21 ++++++++++++++++++++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c
index 7c572ba..9ee147f 100644
--- a/xen/arch/x86/e820.c
+++ b/xen/arch/x86/e820.c
@@ -9,6 +9,7 @@
 #include <asm/processor.h>
 #include <asm/mtrr.h>
 #include <asm/msr.h>
+#include <asm/guest/vixen.h>
 
 /*
  * opt_mem: Limit maximum address of physical RAM.
@@ -698,6 +699,11 @@ unsigned long __init init_e820(const char *str, struct e820map *raw)
         print_e820_memory_map(raw->map, raw->nr_map);
     }
 
+    if ( is_vixen() )
+    {
+        /* Pretend that passed through special pages are RAM */
+        e820_change_range_type(raw, 0xfeffc000, 0xff000000, E820_RESERVED, E820_RAM);
+    }
     machine_specific_memory_setup(raw);
 
     printk("%s RAM map:\n", str);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index a56f875..935901b 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -122,6 +122,7 @@
 #include <asm/fixmap.h>
 #include <asm/io_apic.h>
 #include <asm/pci.h>
+#include <asm/guest.h>
 
 #include <asm/hvm/grant_table.h>
 #include <asm/pv/grant_table.h>
@@ -945,7 +946,7 @@ get_page_from_l1e(
             case 0:
                 break;
             case 1:
-                if ( !is_hardware_domain(l1e_owner) )
+                if ( !is_vixen() && !is_hardware_domain(l1e_owner) )
                     break;
                 /* fallthrough */
             case -1:
@@ -5536,6 +5537,21 @@ void arch_dump_shared_mem_info(void)
             mem_sharing_get_nr_saved_mfns());
 }
 
+const unsigned long *__init
+vixen_get_platform_badpages(unsigned int *array_size)
+{
+    static unsigned long __initdata bad_pages[] = {
+        0xfeffc000,
+        0xfeffd000,
+        0xfeffe000,
+        0xfefff000,
+    };
+
+    *array_size = ARRAY_SIZE(bad_pages);
+
+    return bad_pages;
+}
+
 const unsigned long *__init get_platform_badpages(unsigned int *array_size)
 {
     u32 igd_id;
@@ -5547,6 +5563,9 @@ const unsigned long *__init get_platform_badpages(unsigned int *array_size)
         0x40004000,
     };
 
+    if ( is_vixen() )
+        return vixen_get_platform_badpages(array_size);
+
     *array_size = ARRAY_SIZE(bad_pages);
     igd_id = pci_conf_read32(0, 0, 2, 0, 0);
     if ( !IS_SNB_GFX(igd_id) )
-- 
1.9.1


* [PATCH 10/22] vixen: do not permit access to physical IRQs if in Vixen mode
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (8 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 09/22] vixen: modify the e820 table to advertise HVM special pages as RAM Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-07  8:18   ` Roger Pau Monné
  2018-01-06 22:54 ` [PATCH 11/22] vixen: early initialization of Vixen including shared_info mapping Anthony Liguori
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Anthony Liguori <aliguori@amazon.com>

Our intention is for the Vixen guest to be deprivileged so we need
to avoid permitting access to each IRQ even though it is technically
the hardware domain.

Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
 xen/arch/x86/irq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 87ef2e8..bd75108 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -25,6 +25,7 @@
 #include <asm/flushtlb.h>
 #include <asm/mach-generic/mach_apic.h>
 #include <public/physdev.h>
+#include <asm/guest/vixen.h>
 
 static int parse_irq_vector_map_param(const char *s);
 
@@ -190,7 +191,7 @@ int create_irq(nodeid_t node)
         desc->arch.used = IRQ_UNUSED;
         irq = ret;
     }
-    else if ( hardware_domain )
+    else if ( !is_vixen() && hardware_domain )
     {
         ret = irq_permit_access(hardware_domain, irq);
         if ( ret )
-- 
1.9.1


* [PATCH 11/22] vixen: early initialization of Vixen including shared_info mapping
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (9 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 10/22] vixen: do not permit access to physical IRQs if in Vixen mode Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-07  8:23   ` Roger Pau Monné
  2018-01-06 22:54 ` [PATCH 12/22] vixen: paravirtualization TSC frequency calculation Anthony Liguori
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Anthony Liguori <aliguori@amazon.com>

We split initialization of Vixen into two parts.  The first part
just detects the presence of an HVM hypervisor so that we can
figure out whether to modify the e820 table.

The later initialization is used to actually map the shared_info
structure from the parent hypervisor into Xen.

Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
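Not part of the patch: XENVER_version packs the major version into the
upper 16 bits of its return value and the minor version into the lower
16 bits, which is what init_vixen() unpacks before printing.  A
standalone sketch of that decoding follows; struct xen_ver and
decode_xen_version() are hypothetical names, and treating a negative
value as an error code rather than a version is an assumption of this
sketch.

```c
#include <assert.h>

/* Hypothetical holder for a decoded version number. */
struct xen_ver {
    unsigned int major;
    unsigned int minor;
};

/* Split a packed (major << 16) | minor value into its two halves. */
static struct xen_ver decode_xen_version(int version)
{
    struct xen_ver v;

    /* Assumption: a negative value is an error code, not a version. */
    assert(version >= 0);

    v.major = (unsigned int)version >> 16;
    v.minor = (unsigned int)version & 0xffff;

    return v;
}
```

For example, decode_xen_version(0x0004000a) gives major 4 and minor 10.
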
 xen/arch/x86/guest/vixen.c        | 45 +++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/setup.c              |  5 +++++
 xen/include/asm-x86/guest/vixen.h |  4 ++++
 3 files changed, 54 insertions(+)

diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
index d82e68f..d8466ba 100644
--- a/xen/arch/x86/guest/vixen.c
+++ b/xen/arch/x86/guest/vixen.c
@@ -20,8 +20,53 @@
  */
 
 #include <asm/guest/vixen.h>
+#include <public/version.h>
 
 static int in_vixen;
+static uint8_t global_si_data[4 << 10] __attribute__((aligned(4096)));
+static shared_info_t *global_si = (void *)global_si_data;
+
+void __init init_vixen(void)
+{
+    int major, minor, version;
+
+    if ( !xen_guest )
+    {
+        printk("Disabling Vixen because we are not running under Xen\n");
+        in_vixen = -1;
+        return;
+    }
+
+    version = HYPERVISOR_xen_version(XENVER_version, NULL);
+    major = version >> 16;
+    minor = version & 0xffff;
+
+    printk("Vixen running under Xen %d.%d\n", major, minor);
+
+    in_vixen = 1;
+}
+
+void __init early_vixen_init(void)
+{
+    struct xen_add_to_physmap xatp;
+    long rc;
+
+    if ( !is_vixen() )
+        return;
+
+    /* Setup our own shared info area */
+    xatp.domid = DOMID_SELF;
+    xatp.idx = 0;
+    xatp.space = XENMAPSPACE_shared_info;
+    xatp.gpfn = virt_to_mfn(global_si);
+
+    rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
+    if ( rc < 0 )
+        printk("Setting shared info page failed: %ld\n", rc);
+
+    memset(&global_si->native.evtchn_mask[0], 0x00,
+           sizeof(global_si->native.evtchn_mask));
+}
 
 bool is_vixen(void)
 {
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index f9d087e..07239c0 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -869,6 +869,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     else
         panic("Bootloader provided no memory information.");
 
+    /* Vixen must be initialized before init_e820() */
+    init_vixen();
+
     /* Sanitise the raw E820 map to produce a final clean version. */
     max_page = raw_max_page = init_e820(memmap_type, &e820_raw);
 
@@ -1516,6 +1519,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     rcu_init();
 
+    early_vixen_init();
+
     early_time_init();
 
     arch_init_memory();
diff --git a/xen/include/asm-x86/guest/vixen.h b/xen/include/asm-x86/guest/vixen.h
index be90c46..5bfa59d 100644
--- a/xen/include/asm-x86/guest/vixen.h
+++ b/xen/include/asm-x86/guest/vixen.h
@@ -70,4 +70,8 @@ HYPERVISOR_vcpu_op(int cmd, int vcpuid, void *extra_args)
 
 bool is_vixen(void);
 
+void __init init_vixen(void);
+
+void __init early_vixen_init(void);
+
 #endif
-- 
1.9.1


* [PATCH 12/22] vixen: paravirtualization TSC frequency calculation
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (10 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 11/22] vixen: early initialization of Vixen including shared_info mapping Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-06 22:54 ` [PATCH 13/22] vixen: Use SCHEDOP_shutdown to shutdown the machine Anthony Liguori
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Anthony Liguori <aliguori@amazon.com>

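Read the TSC frequency from the time info that the parent hypervisor
publishes in shared_info rather than relying on calibration.
Otherwise, when time sharing a physical CPU, the calibrated value can
be bogus, resulting in time drift for the guest due to an improper
frequency within pvclock.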

Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
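Not part of the patch: a standalone sketch of the arithmetic behind
vixen_get_cpu_freq().  It assumes the usual pvclock relation

    ns = ((ticks << tsc_shift) * tsc_to_system_mul) >> 32

with a negative tsc_shift meaning a right shift, so the frequency in Hz
is (10^9 * 2^32) / tsc_to_system_mul, scaled back by the shift.
sketch_tsc_freq_hz() is a hypothetical name, and the range checks are
additions of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Recover the TSC frequency in Hz from pvclock scaling parameters.
 * One tick lasts (mul * 2^shift) / 2^32 nanoseconds, so the frequency
 * is 10^9 * 2^32 / (mul * 2^shift).
 */
static uint64_t sketch_tsc_freq_hz(uint32_t tsc_to_system_mul,
                                   int8_t tsc_shift)
{
    uint64_t hz;

    /* A zero multiplier would mean the time info was never populated. */
    assert(tsc_to_system_mul != 0);
    /* Keep the shift well inside the width of a 64-bit value. */
    assert(tsc_shift > -32 && tsc_shift < 32);

    /* 10^9 << 32 is about 4.3e18, which still fits in 64 bits. */
    hz = (UINT64_C(1000000000) << 32) / tsc_to_system_mul;

    return (tsc_shift < 0) ? hz << -tsc_shift : hz >> tsc_shift;
}
```

For example, a multiplier of 0x80000000 with a shift of 0 corresponds
to 2 GHz; the same multiplier with a shift of -1 corresponds to 4 GHz,
and with a shift of 1 to 1 GHz.
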
 xen/arch/x86/guest/vixen.c        | 21 +++++++++++++++++++++
 xen/arch/x86/time.c               |  9 ++++++++-
 xen/include/asm-x86/guest/vixen.h |  2 ++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
index d8466ba..1816ece 100644
--- a/xen/arch/x86/guest/vixen.c
+++ b/xen/arch/x86/guest/vixen.c
@@ -73,3 +73,24 @@ bool is_vixen(void)
     return in_vixen > 0;
 }
 
+u64 vixen_get_cpu_freq(void)
+{
+    volatile vcpu_time_info_t *timep = &global_si->native.vcpu_info[0].time;
+    vcpu_time_info_t time;
+    uint32_t version;
+    u64 imm;
+
+    do {
+        version = timep->version;
+        rmb();
+        time = *timep;
+    } while ((version & 1) || version != time.version);
+
+    imm = (1000000000ULL << 32) / time.tsc_to_system_mul;
+
+    if (time.tsc_shift < 0) {
+        return imm << -time.tsc_shift;
+    } else {
+        return imm >> time.tsc_shift;
+    }
+}
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 2a87950..04c0fbb 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -36,6 +36,7 @@
 #include <io_ports.h>
 #include <asm/setup.h> /* for early_time_init */
 #include <public/arch-x86/cpuid.h>
+#include <asm/guest/vixen.h>
 
 /* opt_clocksource: Force clocksource to one of: pit, hpet, acpi. */
 static char __initdata opt_clocksource[10];
@@ -1687,6 +1688,12 @@ void __init early_time_init(void)
 
     preinit_pit();
     tmp = init_platform_timer();
+
+    /* We cannot trust calibrated values when running under
+     * a hypervisor. */
+    if ( is_vixen() )
+        tmp = vixen_get_cpu_freq();
+
     plt_tsc.frequency = tmp;
 
     set_time_scale(&t->tsc_scale, tmp);
@@ -2014,7 +2021,7 @@ void tsc_set_info(struct domain *d,
                   uint32_t tsc_mode, uint64_t elapsed_nsec,
                   uint32_t gtsc_khz, uint32_t incarnation)
 {
-    if ( is_idle_domain(d) || is_hardware_domain(d) )
+    if ( is_idle_domain(d) || is_vixen() || is_hardware_domain(d) )
     {
         d->arch.vtsc = 0;
         return;
diff --git a/xen/include/asm-x86/guest/vixen.h b/xen/include/asm-x86/guest/vixen.h
index 5bfa59d..28c4337 100644
--- a/xen/include/asm-x86/guest/vixen.h
+++ b/xen/include/asm-x86/guest/vixen.h
@@ -74,4 +74,6 @@ void __init init_vixen(void);
 
 void __init early_vixen_init(void);
 
+u64 vixen_get_cpu_freq(void);
+
 #endif
-- 
1.9.1


* [PATCH 13/22] vixen: Use SCHEDOP_shutdown to shutdown the machine
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (11 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 12/22] vixen: paravirtualization TSC frequency calculation Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-07  8:27   ` Roger Pau Monné
  2018-01-06 22:54 ` [PATCH 14/22] vixen: forward VCPUOP_register_runstate_memory_area to outer Xen Anthony Liguori
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Jan H. Schönherr <jschoenh@amazon.de>

While hwdom_shutdown() is able to reboot the system, it fails to
properly power it off.  With SCHEDOP_shutdown, we delegate the problem
to the parent hypervisor.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
 xen/common/domain.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index b4d679e..ede377c 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -42,6 +42,7 @@
 #include <xen/trace.h>
 #include <xen/tmem.h>
 #include <asm/setup.h>
+#include <asm/guest/vixen.h>
 
 /* Linux config option: propageted to domain0 */
 /* xen_processor_pmbits: xen control Cx, Px, ... */
@@ -693,6 +694,17 @@ void __domain_crash_synchronous(void)
 }
 
 
+static void vixen_shutdown(u8 reason)
+{
+    struct sched_shutdown sched_shutdown = { .reason = reason };
+
+    if (!opt_noreboot)
+        HYPERVISOR_sched_op(SCHEDOP_shutdown, &sched_shutdown);
+
+    /* Fallback, in case the hypercall fails */
+    hwdom_shutdown(reason);
+}
+
 void domain_shutdown(struct domain *d, u8 reason)
 {
     struct vcpu *v;
@@ -703,6 +715,8 @@ void domain_shutdown(struct domain *d, u8 reason)
         d->shutdown_code = reason;
     reason = d->shutdown_code;
 
+    if ( is_vixen() )
+        vixen_shutdown(reason);
     if ( is_hardware_domain(d) )
         hwdom_shutdown(reason);
 
-- 
1.9.1


* [PATCH 14/22] vixen: forward VCPUOP_register_runstate_memory_area to outer Xen
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (12 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 13/22] vixen: Use SCHEDOP_shutdown to shutdown the machine Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-06 22:54 ` [PATCH 15/22] vixen: pass through version hypercalls to parent Xen Anthony Liguori
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Anthony Liguori <aliguori@amazon.com>

This allows for proper accounting of steal time within the guest.

Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
 xen/common/domain.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index ede377c..780f8ff 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1414,6 +1414,12 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( !guest_handle_okay(area.addr.h, 1) )
             break;
 
+        if ( is_vixen() ) {
+            rc = HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area,
+                                    vcpuid, &area);
+            break;
+        }
+
         rc = 0;
         runstate_guest(v) = area.addr.h;
 
-- 
1.9.1


* [PATCH 15/22] vixen: pass through version hypercalls to parent Xen
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (13 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 14/22] vixen: forward VCPUOP_register_runstate_memory_area to outer Xen Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-07  8:31   ` Roger Pau Monné
  2018-01-06 22:54 ` [PATCH 16/22] vixen: pass grant table operations through to the outer Xen Anthony Liguori
                   ` (9 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Anthony Liguori <aliguori@amazon.com>

This is necessary to trigger event channel upcalls, but it is also
useful to pass through the full version information so that the
guest believes it is running on the parent Xen.

Signed-off-by: Matt Wilson <msw@amazon.com>
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
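Not part of the patch: a standalone model of the decision made for the
string-valued subcommands (XENVER_extraversion, XENVER_changeset).
When access is denied the local denial string is returned; otherwise
the query is forwarded to the parent and any error is propagated
unchanged.  Everything here is hypothetical scaffolding:
parent_query_fn stands in for the hypercall to the parent Xen, the
fake_parent_*() helpers exist only to exercise the sketch, and
"<denied>" is a placeholder for whatever xen_deny() returns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_EXTRAVERSION_LEN 16

/* Hypothetical callback standing in for a hypercall to the parent Xen. */
typedef int (*parent_query_fn)(char *buf, size_t len);

/* Fill 'out' with the string the guest should see; 0 on success. */
static int sketch_extraversion(int deny, parent_query_fn parent,
                               char *out, size_t len)
{
    assert(out != NULL && len > 0);

    /* Start from a zeroed buffer so the result is always terminated. */
    memset(out, 0, len);

    if ( deny )
    {
        /* Placeholder denial string, copied locally without forwarding. */
        strncpy(out, "<denied>", len - 1);
        return 0;
    }

    /* Forward to the parent; a non-zero result is returned as is. */
    return parent(out, len);
}

/* Example parent that answers successfully. */
static int fake_parent_ok(char *buf, size_t len)
{
    strncpy(buf, ".10-example", len - 1);
    return 0;
}

/* Example parent that fails without touching the buffer. */
static int fake_parent_fail(char *buf, size_t len)
{
    (void)buf;
    (void)len;
    return -1;
}
```

With deny set, the result is the placeholder string regardless of the
parent; with deny clear, the caller sees either the parent's string or
the parent's error code.
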
 xen/common/kernel.c | 82 +++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 70 insertions(+), 12 deletions(-)

diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 8d137c5..b9885c8 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -15,6 +15,7 @@
 #include <xsm/xsm.h>
 #include <asm/current.h>
 #include <public/version.h>
+#include <asm/guest/vixen.h>
 
 #ifndef COMPAT
 
@@ -311,14 +312,32 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     switch ( cmd )
     {
     case XENVER_version:
-        return (xen_major_version() << 16) | xen_minor_version();
+        if ( is_vixen() )
+            return HYPERVISOR_xen_version(XENVER_version, NULL);
+        else
+            return (xen_major_version() << 16) | xen_minor_version();
 
     case XENVER_extraversion:
     {
         xen_extraversion_t extraversion;
+        int rc;
 
         memset(extraversion, 0, sizeof(extraversion));
-        safe_strcpy(extraversion, deny ? xen_deny() : xen_extra_version());
+        if ( is_vixen() )
+        {
+            if ( deny )
+                safe_strcpy(extraversion, xen_deny());
+            else
+            {
+                rc = HYPERVISOR_xen_version(XENVER_extraversion, &extraversion);
+                if ( rc )
+                    return rc;
+            }
+        }
+        else
+        {
+            safe_strcpy(extraversion, deny ? xen_deny() : xen_extra_version());
+        }
         if ( copy_to_guest(arg, extraversion, ARRAY_SIZE(extraversion)) )
             return -EFAULT;
         return 0;
@@ -327,12 +346,22 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case XENVER_compile_info:
     {
         xen_compile_info_t info;
+        int rc;
 
         memset(&info, 0, sizeof(info));
-        safe_strcpy(info.compiler,       deny ? xen_deny() : xen_compiler());
-        safe_strcpy(info.compile_by,     deny ? xen_deny() : xen_compile_by());
-        safe_strcpy(info.compile_domain, deny ? xen_deny() : xen_compile_domain());
-        safe_strcpy(info.compile_date,   deny ? xen_deny() : xen_compile_date());
+        if ( is_vixen() )
+        {
+            rc = HYPERVISOR_xen_version(XENVER_compile_info, &info);
+            if ( rc )
+                return rc;
+        }
+        else
+        {
+            safe_strcpy(info.compiler,       deny ? xen_deny() : xen_compiler());
+            safe_strcpy(info.compile_by,     deny ? xen_deny() : xen_compile_by());
+            safe_strcpy(info.compile_domain, deny ? xen_deny() : xen_compile_domain());
+            safe_strcpy(info.compile_date,   deny ? xen_deny() : xen_compile_date());
+        }
         if ( copy_to_guest(arg, &info, 1) )
             return -EFAULT;
         return 0;
@@ -366,9 +395,24 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case XENVER_changeset:
     {
         xen_changeset_info_t chgset;
+        int rc;
 
         memset(chgset, 0, sizeof(chgset));
-        safe_strcpy(chgset, deny ? xen_deny() : xen_changeset());
+        if ( is_vixen() )
+        {
+            if ( deny )
+                safe_strcpy(chgset, xen_deny());
+            else
+            {
+                rc = HYPERVISOR_xen_version(XENVER_changeset, &chgset);
+                if ( rc )
+                    return rc;
+            }
+        }
+        else
+        {
+            safe_strcpy(chgset, deny ? xen_deny() : xen_changeset());
+        }
         if ( copy_to_guest(arg, chgset, ARRAY_SIZE(chgset)) )
             return -EFAULT;
         return 0;
@@ -430,15 +474,29 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case XENVER_guest_handle:
     {
         xen_domain_handle_t hdl;
+        int rc;
 
-        if ( deny )
-            memset(&hdl, 0, ARRAY_SIZE(hdl));
+        memset(&hdl, 0, ARRAY_SIZE(hdl));
 
         BUILD_BUG_ON(ARRAY_SIZE(current->domain->handle) != ARRAY_SIZE(hdl));
 
-        if ( copy_to_guest(arg, deny ? hdl : current->domain->handle,
-                           ARRAY_SIZE(hdl) ) )
-            return -EFAULT;
+        if ( is_vixen() )
+        {
+            if ( !deny )
+            {
+                rc = HYPERVISOR_xen_version(XENVER_guest_handle, &hdl);
+                if ( rc )
+                    return rc;
+            }
+            if ( copy_to_guest(arg, hdl, ARRAY_SIZE(hdl) ) )
+                return -EFAULT;
+        }
+        else
+        {
+            if ( copy_to_guest(arg, deny ? hdl : current->domain->handle,
+                               ARRAY_SIZE(hdl) ) )
+                return -EFAULT;
+        }
         return 0;
     }
 
-- 
1.9.1


* [PATCH 16/22] vixen: pass grant table operations through to the outer Xen
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (14 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 15/22] vixen: pass through version hypercalls to parent Xen Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-07  8:36   ` Roger Pau Monné
  2018-01-06 22:54 ` [PATCH 17/22] vixen: setup infrastructure to receive event channel notifications Anthony Liguori
                   ` (8 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Anthony Liguori <aliguori@amazon.com>

The grant table is a region of guest memory that contains GMFNs,
which are MFNs in PV but PFNs in HVM.  Since a Vixen guest MFN is an
HVM PFN, we can pass this table directly through to the outer Xen,
which cuts down considerably on overhead.

We only forward GNTTABOP_setup_table and GNTTABOP_query_size.  The
remaining grant table hypercalls return -ENOSYS, since we only intend
Vixen to be used for normal guests, not driver domains.

Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
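As a rough illustration of the bounce-buffer flow used by
vixen_gnttab_setup_table(), here is a standalone sketch.  It is not the
patch code: struct setup_table_op, outer_setup_table() and
shim_setup_table() are hypothetical stand-ins, and the "outer Xen" is
simulated by a function that fills in made-up frame numbers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t pfn_t;

/* Hypothetical, simplified stand-in for struct gnttab_setup_table. */
struct setup_table_op {
    uint32_t nr_frames;
    int16_t  status;
    pfn_t   *frame_list;
};

/* Simulated outer hypervisor: fills the list with made-up frames. */
static long outer_setup_table(struct setup_table_op *op)
{
    for (uint32_t i = 0; i < op->nr_frames; i++)
        op->frame_list[i] = 0x1000 + i;
    op->status = 0;
    return 0;
}

/*
 * Shim side: swap the guest's list pointer for a shim-owned buffer,
 * forward the request, restore the pointer, then copy results back.
 */
static long shim_setup_table(struct setup_table_op *guest_op)
{
    pfn_t *guest_list;
    pfn_t *bounce = NULL;
    long rc;

    assert(guest_op != NULL);
    guest_list = guest_op->frame_list;

    if (guest_op->nr_frames > 0) {
        bounce = calloc(guest_op->nr_frames, sizeof(*bounce));
        if (bounce == NULL)
            return -ENOMEM;
    }

    guest_op->frame_list = bounce;
    rc = outer_setup_table(guest_op);
    guest_op->frame_list = guest_list;

    /* Only copy the frames back when the outer call reported success. */
    if (rc >= 0 && guest_op->status == 0 && guest_op->nr_frames > 0)
        memcpy(guest_list, bounce, guest_op->nr_frames * sizeof(*bounce));

    free(bounce);
    return rc;
}
```

The pointer swap matters because the outer hypervisor cannot
dereference a pointer into the inner guest's address space; it only
sees memory owned by the shim.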
 xen/common/grant_table.c | 131 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 131 insertions(+)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 250450b..b302fd0 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -39,6 +39,7 @@
 #include <xen/vmap.h>
 #include <xsm/xsm.h>
 #include <asm/flushtlb.h>
+#include <asm/guest.h>
 
 /* Per-domain grant information. */
 struct grant_table {
@@ -1199,6 +1200,9 @@ gnttab_map_grant_ref(
     int i;
     struct gnttab_map_grant_ref op;
 
+    if ( is_vixen() )
+        return -ENOSYS;
+
     for ( i = 0; i < count; i++ )
     {
         if ( i && hypercall_preempt_check() )
@@ -1502,6 +1506,9 @@ gnttab_unmap_grant_ref(
     struct gnttab_unmap_grant_ref op;
     struct gnttab_unmap_common common[GNTTAB_UNMAP_BATCH_SIZE];
 
+    if ( is_vixen() )
+        return -ENOSYS;
+
     while ( count != 0 )
     {
         c = min(count, (unsigned int)GNTTAB_UNMAP_BATCH_SIZE);
@@ -1567,6 +1574,9 @@ gnttab_unmap_and_replace(
     struct gnttab_unmap_and_replace op;
     struct gnttab_unmap_common common[GNTTAB_UNMAP_BATCH_SIZE];
 
+    if ( is_vixen() )
+        return -ENOSYS;
+
     while ( count != 0 )
     {
         c = min(count, (unsigned int)GNTTAB_UNMAP_BATCH_SIZE);
@@ -1801,6 +1811,80 @@ grant_table_init(struct domain *d, struct grant_table *gt,
 }
 
 static long
+vixen_gnttab_setup_table(
+    XEN_GUEST_HANDLE_PARAM(gnttab_setup_table_t) uop, unsigned int count)
+{
+    long rc;
+
+    struct gnttab_setup_table op;
+    xen_pfn_t *frame_list = NULL;
+    static void *grant_table;
+    XEN_GUEST_HANDLE(xen_pfn_t) old_frame_list;
+
+    if ( count != 1 )
+        return -EINVAL;
+
+    if ( unlikely(copy_from_guest(&op, uop, 1) != 0) )
+    {
+        gdprintk(XENLOG_INFO, "Fault while reading gnttab_setup_table_t.\n");
+        return -EFAULT;
+    }
+
+    if ( grant_table == NULL ) {
+        struct xen_add_to_physmap xatp;
+        struct domain *d;
+        int i;
+
+        for ( i = 0; i < max_grant_frames; i++ )
+        {
+             grant_table = alloc_xenheap_page();
+             BUG_ON(grant_table == NULL);
+             xatp.domid = DOMID_SELF;
+             xatp.idx = i;
+             xatp.space = XENMAPSPACE_grant_table;
+             xatp.gpfn = virt_to_mfn(grant_table);
+             rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
+             if ( rc != 0 )
+                 printk("Add to physmap failed! %ld\n", rc);
+
+             d = rcu_lock_current_domain();
+             share_xen_page_with_guest(mfn_to_page(xatp.gpfn), d, XENSHARE_writable);
+             rcu_unlock_domain(d);
+        }
+    }
+
+    if ( op.nr_frames > 0 ) {
+        frame_list = xzalloc_array(xen_pfn_t, op.nr_frames);
+        if ( frame_list == NULL )
+            return -ENOMEM;
+    }
+
+    old_frame_list = op.frame_list;
+    op.frame_list.p = frame_list;
+
+    rc = HYPERVISOR_grant_table_op(GNTTABOP_setup_table, &op, count);
+    op.frame_list = old_frame_list;
+
+    if ( rc >= 0 ) {
+        if ( op.status == 0 && op.nr_frames &&
+             copy_to_guest(old_frame_list, frame_list, op.nr_frames) != 0 ) {
+            rc = -EFAULT;
+            goto out;
+        }
+
+        if ( unlikely(copy_to_guest(uop, &op, 1) != 0) ) {
+            rc = -EFAULT;
+            goto out;
+        }
+    }
+
+ out:
+    xfree(frame_list);
+
+    return rc;
+}
+
+static long
 gnttab_setup_table(
     XEN_GUEST_HANDLE_PARAM(gnttab_setup_table_t) uop, unsigned int count,
     unsigned int limit_max)
@@ -1811,6 +1895,9 @@ gnttab_setup_table(
     struct grant_table *gt;
     unsigned int i;
 
+    if ( is_vixen() )
+        return vixen_gnttab_setup_table(uop, count);
+
     if ( count != 1 )
         return -EINVAL;
 
@@ -1892,6 +1979,26 @@ gnttab_setup_table(
 }
 
 static long
+vixen_gnttab_query_size(
+    XEN_GUEST_HANDLE_PARAM(gnttab_query_size_t) uop, unsigned int count)
+{
+    struct gnttab_query_size op;
+    int rc;
+
+    if ( count != 1 )
+        return -EINVAL;
+
+    if ( unlikely(copy_from_guest(&op, uop, 1) != 0) )
+        return -EFAULT;
+
+    rc = HYPERVISOR_grant_table_op(GNTTABOP_query_size, &op, count);
+    if (rc == 0 && unlikely(__copy_to_guest(uop, &op, 1)) )
+        rc = -EFAULT;
+
+    return rc;
+}
+
+static long
 gnttab_query_size(
     XEN_GUEST_HANDLE_PARAM(gnttab_query_size_t) uop, unsigned int count)
 {
@@ -1902,6 +2009,9 @@ gnttab_query_size(
     if ( count != 1 )
         return -EINVAL;
 
+    if ( is_vixen() )
+        return vixen_gnttab_query_size(uop, count);
+
     if ( unlikely(copy_from_guest(&op, uop, 1)) )
         return -EFAULT;
 
@@ -2015,6 +2125,9 @@ gnttab_transfer(
     unsigned int max_bitsize;
     struct active_grant_entry *act;
 
+    if ( is_vixen() )
+        return -ENOSYS;
+
     for ( i = 0; i < count; i++ )
     {
         bool_t okay;
@@ -2816,6 +2929,9 @@ static long gnttab_copy(
     struct gnttab_copy_buf dest = {};
     long rc = 0;
 
+    if ( is_vixen() )
+        return -ENOSYS;
+
     for ( i = 0; i < count; i++ )
     {
         if ( i && hypercall_preempt_check() )
@@ -2869,6 +2985,9 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
     int res;
     unsigned int i;
 
+    if ( is_vixen() )
+        return -ENOSYS;
+
     if ( copy_from_guest(&op, uop, 1) )
         return -EFAULT;
 
@@ -3021,6 +3140,9 @@ gnttab_get_status_frames(XEN_GUEST_HANDLE_PARAM(gnttab_get_status_frames_t) uop,
     if ( count != 1 )
         return -EINVAL;
 
+    if ( is_vixen() )
+        return -ENOSYS;
+
     if ( unlikely(copy_from_guest(&op, uop, 1) != 0) )
     {
         gdprintk(XENLOG_INFO,
@@ -3091,6 +3213,9 @@ gnttab_get_version(XEN_GUEST_HANDLE_PARAM(gnttab_get_version_t) uop)
     struct domain *d;
     int rc;
 
+    if ( is_vixen() )
+        return -ENOSYS;
+
     if ( copy_from_guest(&op, uop, 1) )
         return -EFAULT;
 
@@ -3186,6 +3311,9 @@ gnttab_swap_grant_ref(XEN_GUEST_HANDLE_PARAM(gnttab_swap_grant_ref_t) uop,
     int i;
     gnttab_swap_grant_ref_t op;
 
+    if ( is_vixen() )
+        return -ENOSYS;
+
     for ( i = 0; i < count; i++ )
     {
         if ( i && hypercall_preempt_check() )
@@ -3285,6 +3413,9 @@ gnttab_cache_flush(XEN_GUEST_HANDLE_PARAM(gnttab_cache_flush_t) uop,
     unsigned int i;
     gnttab_cache_flush_t op;
 
+    if ( is_vixen() )
+        return -ENOSYS;
+
     for ( i = 0; i < count; i++ )
     {
         if ( i && hypercall_preempt_check() )
-- 
1.9.1



* [PATCH 17/22] vixen: setup infrastructure to receive event channel notifications
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (15 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 16/22] vixen: pass grant table operations through to the outer Xen Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-07  8:42   ` Roger Pau Monné
  2018-01-06 22:54 ` [PATCH 18/22] vixen: Introduce ECS_PROXY for event channel proxying Anthony Liguori
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Anthony Liguori <aliguori@amazon.com>

This patch registers an interrupt handler using either an INTx
interrupt from the platform PCI device, CALLBACK_IRQ vector
delivery, or evtchn_upcall_vector depending on what the parent
hypervisor supports.

The event channel polling code comes from Linux but uses the
internal infrastructure for delivery.

Finally, this infrastructure has to be initialized per-VCPU, so hook
arch_set_info_guest() to call vixen_vcpu_initialize() for each VCPU.

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
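The two HVM_PARAM_CALLBACK_IRQ values built in vixen_vcpu_initialize()
can be read as a type tag in the top byte plus a type-specific payload.
A minimal sketch of that encoding follows.  The helper names are
hypothetical, the PCI INTx layout mirrors the expression in this patch,
and the numeric value 2 for the vector type is an assumption taken from
memory of the public Xen headers, so check it against params.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed type tags for bits 63:56 of HVM_PARAM_CALLBACK_IRQ. */
enum {
    CB_TYPE_PCI_INTX = 1,   /* matches the (1ULL << 56) in the patch */
    CB_TYPE_VECTOR   = 2    /* assumption: HVM_PARAM_CALLBACK_TYPE_VECTOR */
};

/* Deliver event channel upcalls on a directly injected vector. */
static uint64_t callback_via_vector(uint8_t vector)
{
    return ((uint64_t)CB_TYPE_VECTOR << 56) | vector;
}

/*
 * Deliver upcalls through the INTx pin of a device on bus 0.
 * 'pin' is the PCI_INTERRUPT_PIN register value (1 = INTA .. 4 = INTD).
 */
static uint64_t callback_via_intx(unsigned int slot, unsigned int pin)
{
    assert(pin >= 1 && pin <= 4);
    return ((uint64_t)CB_TYPE_PCI_INTX << 56) |
           ((uint64_t)slot << 11) | (pin - 1);
}
```

For example, a platform device in slot 3 using INTA would encode as
type 1 with a payload of 3 << 11.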
 xen/arch/x86/domain.c             |   3 +
 xen/arch/x86/guest/vixen.c        | 264 ++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/setup.c              |   3 +
 xen/include/asm-x86/guest/vixen.h |   6 +
 4 files changed, 276 insertions(+)
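The polling loop in vixen_evtchn_poll_one() is a two-level bitmap scan:
a per-VCPU selector word says which words of the pending array are
worth looking at, and each of those words is filtered by the mask
array.  The sketch below shows only that core idea.  It is a
simplification under stated assumptions: struct evtchn_state and
scan_pending() are hypothetical names, the selector is cleared with a
plain store instead of an atomic xchg, the scan always starts at bit 0
instead of resuming after the last processed port, and
__builtin_ctzll() is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 64

/* Hypothetical, flattened view of the shared-info event state. */
struct evtchn_state {
    uint64_t pending_sel;           /* bit w set: pending[w] has events */
    uint64_t pending[WORD_BITS];
    uint64_t mask[WORD_BITS];
};

/*
 * Collect every pending, unmasked port into 'ports' and clear its
 * pending bit.  'max' must be large enough for all collected ports.
 * Returns the number of ports written.
 */
static int scan_pending(struct evtchn_state *s, int *ports, int max)
{
    uint64_t sel = s->pending_sel;
    int n = 0;

    s->pending_sel = 0;             /* the real code uses xchg() here */

    while (sel != 0) {
        int w = __builtin_ctzll(sel);
        uint64_t bits = s->pending[w] & ~s->mask[w];

        sel &= sel - 1;             /* drop the lowest set selector bit */

        while (bits != 0) {
            int b = __builtin_ctzll(bits);

            bits &= bits - 1;
            assert(n < max);
            s->pending[w] &= ~(1ULL << b);
            ports[n++] = w * WORD_BITS + b;
        }
    }
    return n;
}
```

Masked events stay pending and are picked up by a later scan once they
are unmasked and the selector bit is set again.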

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index da1bf1a..3e9c5be 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1147,6 +1147,9 @@ int arch_set_info_guest(
 
     update_cr3(v);
 
+    if ( is_vixen() )
+        vixen_vcpu_initialize(v);
+
  out:
     if ( flags & VGCF_online )
         clear_bit(_VPF_down, &v->pause_flags);
diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
index 1816ece..76d9638 100644
--- a/xen/arch/x86/guest/vixen.c
+++ b/xen/arch/x86/guest/vixen.c
@@ -21,10 +21,16 @@
 
 #include <asm/guest/vixen.h>
 #include <public/version.h>
+#include <xen/event.h>
+#include <asm/apic.h>
 
 static int in_vixen;
 static uint8_t global_si_data[4 << 10] __attribute__((aligned(4096)));
 static shared_info_t *global_si = (void *)global_si_data;
+static bool vixen_per_cpu_notifications = true;
+static uint8_t vixen_evtchn_vector;
+static bool vixen_needs_apic_ack = true;
+struct irqaction vixen_irqaction;
 
 void __init init_vixen(void)
 {
@@ -94,3 +100,261 @@ u64 vixen_get_cpu_freq(void)
 	return imm >> time.tsc_shift;
     }
 }
+
+/*
+ * Make a bitmask (i.e. unsigned long *) of a xen_ulong_t
+ * array. Primarily to avoid long lines (hence the terse name).
+ */
+#define BM(x) (unsigned long *)(x)
+/* Find the first set bit in a evtchn mask */
+#define EVTCHN_FIRST_BIT(w) find_first_bit(BM(&(w)), BITS_PER_XEN_ULONG)
+
+/*
+ * Mask out the i least significant bits of w
+ */
+#define MASK_LSBS(w, i) (w & ((~((xen_ulong_t)0UL)) << i))
+
+static DEFINE_PER_CPU(unsigned int, current_word_idx);
+static DEFINE_PER_CPU(unsigned int, current_bit_idx);
+
+static inline xen_ulong_t active_evtchns(unsigned int cpu,
+                                         shared_info_t *sh,
+                                         unsigned int idx)
+{
+    return sh->native.evtchn_pending[idx] &
+           ~sh->native.evtchn_mask[idx];
+}
+
+static void vixen_evtchn_poll_one(size_t cpu)
+{
+    shared_info_t *s = global_si;
+    struct vcpu_info *vcpu_info = &s->native.vcpu_info[cpu];
+    xen_ulong_t pending_words;
+    xen_ulong_t pending_bits;
+    int start_word_idx, start_bit_idx;
+    int word_idx, bit_idx, i;
+
+    /*
+     * Master flag must be cleared /before/ clearing
+     * selector flag. xchg_xen_ulong must contain an
+     * appropriate barrier.
+     */
+    pending_words = xchg(&vcpu_info->evtchn_pending_sel, 0);
+
+    start_word_idx = this_cpu(current_word_idx);
+    start_bit_idx = this_cpu(current_bit_idx);
+
+    word_idx = start_word_idx;
+
+    for (i = 0; pending_words != 0; i++) {
+        xen_ulong_t words;
+
+        words = MASK_LSBS(pending_words, word_idx);
+
+        /*
+         * If we masked out all events, wrap to beginning.
+         */
+        if (words == 0) {
+            word_idx = 0;
+            bit_idx = 0;
+            continue;
+        }
+        word_idx = EVTCHN_FIRST_BIT(words);
+
+        pending_bits = active_evtchns(cpu, s, word_idx);
+        bit_idx = 0; /* usually scan entire word from start */
+        /*
+         * We scan the starting word in two parts.
+         *
+         * 1st time: start in the middle, scanning the
+         * upper bits.
+         *
+         * 2nd time: scan the whole word (not just the
+         * parts skipped in the first pass) -- if an
+         * event in the previously scanned bits is
+         * pending again it would just be scanned on
+         * the next loop anyway.
+         */
+        if (word_idx == start_word_idx) {
+            if (i == 0)
+                bit_idx = start_bit_idx;
+        }
+
+        do {
+            struct evtchn *chn;
+            xen_ulong_t bits;
+            int port;
+
+            bits = MASK_LSBS(pending_bits, bit_idx);
+
+            /* If we masked out all events, move on. */
+            if (bits == 0)
+                break;
+
+            bit_idx = EVTCHN_FIRST_BIT(bits);
+
+            /* Process port. */
+            port = (word_idx * BITS_PER_XEN_ULONG) + bit_idx;
+
+            chn = evtchn_from_port(hardware_domain, port);
+            clear_bit(port, s->native.evtchn_pending);
+            evtchn_port_set_pending(hardware_domain, chn->notify_vcpu_id, chn);
+
+            bit_idx = (bit_idx + 1) % BITS_PER_XEN_ULONG;
+
+            /* Next caller starts at last processed + 1 */
+            this_cpu(current_word_idx) = bit_idx ? word_idx : (word_idx+1) % BITS_PER_XEN_ULONG;
+            this_cpu(current_bit_idx) = bit_idx;
+        } while (bit_idx != 0);
+
+        /* Scan start_word_idx twice; all others once. */
+        if ((word_idx != start_word_idx) || (i != 0))
+            pending_words &= ~(1UL << word_idx);
+
+        word_idx = (word_idx + 1) % BITS_PER_XEN_ULONG;
+    }
+}
+
+static void vixen_upcall(int cpu)
+{
+    shared_info_t *s = global_si;
+    struct vcpu_info *vcpu_info = &s->native.vcpu_info[cpu];
+
+    do {
+        vcpu_info->evtchn_upcall_pending = 0;
+        vixen_evtchn_poll_one(cpu);
+    } while (vcpu_info->evtchn_upcall_pending);
+}
+
+static void vixen_evtchn_notify(struct cpu_user_regs *regs)
+{
+    if (vixen_needs_apic_ack)
+        ack_APIC_irq();
+
+    vixen_upcall(smp_processor_id());
+}
+
+static void vixen_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
+{
+    vixen_upcall(smp_processor_id());
+}
+
+static int hvm_set_parameter(int idx, uint64_t value)
+{
+    struct xen_hvm_param xhv;
+    int r;
+
+    xhv.domid = DOMID_SELF;
+    xhv.index = idx;
+    xhv.value = value;
+    r = HYPERVISOR_hvm_op(HVMOP_set_param, &xhv);
+    if (r < 0) {
+        printk("Cannot set hvm parameter %d: %d!\n",
+               idx, r);
+        return r;
+    }
+    return r;
+}
+
+void vixen_vcpu_initialize(struct vcpu *v)
+{
+    struct xen_hvm_evtchn_upcall_vector upcall;
+    long rc;
+
+    printk("VIXEN vcpu init VCPU%d\n", v->vcpu_id);
+
+    vcpu_pin_override(v, v->vcpu_id);
+
+    if (!vixen_needs_apic_ack)
+        return;
+
+    printk("VIXEN vcpu init VCPU%d -- trying evtchn_upcall_vector\n", v->vcpu_id);
+
+    upcall.vcpu = v->vcpu_id;
+    upcall.vector = vixen_evtchn_vector;
+    rc = HYPERVISOR_hvm_op(HVMOP_set_evtchn_upcall_vector, &upcall);
+    if ( rc )
+    {
+        struct xen_feature_info fi;
+
+        printk("VIXEN vcpu init VCPU%d -- trying hvm_callback_vector\n", v->vcpu_id);
+
+        fi.submap_idx = 0;
+        rc = HYPERVISOR_xen_version(XENVER_get_features, &fi);
+        if ( !rc )
+        {
+            rc = -EINVAL;
+            if ( fi.submap & (1 << XENFEAT_hvm_callback_vector) )
+            {
+                rc = hvm_set_parameter(HVM_PARAM_CALLBACK_IRQ,
+                                       ((uint64_t)HVM_PARAM_CALLBACK_TYPE_VECTOR << 56) | vixen_evtchn_vector);
+            }
+            if ( !rc )
+                vixen_needs_apic_ack = false;
+        }
+    }
+
+    if ( rc )
+    {
+        int slot;
+
+        vixen_per_cpu_notifications = false;
+
+        printk("VIXEN vcpu init VCPU%d -- trying pci_intx_callback\n", v->vcpu_id);
+        for (slot = 2; slot < 32; slot++) {
+            uint16_t vendor, device;
+
+            vendor = pci_conf_read16(0, 0, slot, 0, PCI_VENDOR_ID);
+            device = pci_conf_read16(0, 0, slot, 0, PCI_DEVICE_ID);
+
+            if (vendor == 0x5853 && device == 0x0001) {
+                break;
+            }
+        }
+
+        if (slot != 32) {
+            int pin, line;
+
+            printk("Found Xen platform device at 0000:00:%02d.0\n", slot);
+            pin = pci_conf_read8(0, 0, slot, 0, PCI_INTERRUPT_PIN);
+            if (pin) {
+                line = pci_conf_read8(0, 0, slot, 0, PCI_INTERRUPT_LINE);
+                rc = hvm_set_parameter(HVM_PARAM_CALLBACK_IRQ,
+                                       (1ULL << 56) | (slot << 11) | (pin - 1));
+
+                if (rc) {
+                    printk("Failed to setup IRQ callback\n");
+                } else {
+                    vixen_irqaction.handler = vixen_interrupt;
+                    vixen_irqaction.name = "vixen";
+                    vixen_irqaction.dev_id = 0;
+                    rc = setup_irq(line, 0, &vixen_irqaction);
+                    if (rc) {
+                        printk("Setup IRQ failed!\n");
+                    } else {
+                        printk("Xen platform LNK mapped to line %d\n", line);
+                        vixen_needs_apic_ack = false;
+                    }
+                }
+            }
+        } else {
+            printk("Cannot find Platform device\n");
+        }
+    }
+}
+
+bool vixen_has_per_cpu_notifications(void)
+{
+    return vixen_per_cpu_notifications;
+}
+
+void __init
+vixen_transform(struct domain *dom0)
+{
+    /* Setup event channel forwarding */
+    alloc_direct_apic_vector(&vixen_evtchn_vector, vixen_evtchn_notify);
+    printk("Vixen evtchn vector is %d\n", vixen_evtchn_vector);
+
+    /* Initialize the first vCPU */
+    vixen_vcpu_initialize(dom0->vcpu[0]);
+}
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 07239c0..1b89844 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1603,6 +1603,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     dom0->is_privileged = 1;
     dom0->target = NULL;
 
+    if ( is_vixen() )
+        vixen_transform(dom0);
+
     /* Grab the DOM0 command line. */
     cmdline = (char *)(mod[0].string ? __va(mod[0].string) : NULL);
     if ( (cmdline != NULL) || (kextra != NULL) )
diff --git a/xen/include/asm-x86/guest/vixen.h b/xen/include/asm-x86/guest/vixen.h
index 28c4337..e486cc3 100644
--- a/xen/include/asm-x86/guest/vixen.h
+++ b/xen/include/asm-x86/guest/vixen.h
@@ -76,4 +76,10 @@ void __init early_vixen_init(void);
 
 u64 vixen_get_cpu_freq(void);
 
+bool vixen_has_per_cpu_notifications(void);
+
+void vixen_vcpu_initialize(struct vcpu *v);
+
+void __init vixen_transform(struct domain *dom0);
+
 #endif
-- 
1.9.1



* [PATCH 18/22] vixen: Introduce ECS_PROXY for event channel proxying
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (16 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 17/22] vixen: setup infrastructure to receive event channel notifications Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-07  8:44   ` Roger Pau Monné
  2018-01-06 22:54 ` [PATCH 19/22] vixen: Fix Vixen adaptation of send_global_virq() Anthony Liguori
                   ` (6 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Jan H. Schönherr <jschoenh@amazon.de>

Previously, we would keep proxied event channels around as
ECS_INTERDOMAIN channels.  This works for most things, but it breaks
EVTCHNOP_status, and EVTCHNOP_close does not mark the event channel
as free.

Introduce a separate ECS_PROXY to denote event channels that are
forwarded to the hypervisor we're running under.

This makes the code more readable in many places.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
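The refactoring splits "find a free port" from "make this specific port
usable", so that a port number chosen by the parent hypervisor can be
reserved locally.  The following standalone sketch models that split
with a fixed-size table.  It is illustrative only: struct port_table,
the tiny bucket size and alloc_proxy() are hypothetical, and locking,
group allocation and the max_evtchn_port checks from the real code are
left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum { ST_FREE = 0, ST_PROXY = 7 };

#define PER_BUCKET 4
#define MAX_PORTS  16

struct port_table {
    unsigned int valid;             /* ports currently backed by a bucket */
    uint8_t state[MAX_PORTS];
};

/* Make one specific port usable, growing the table bucket by bucket. */
static int allocate_port(struct port_table *t, int port)
{
    if (port < 0)
        return -EINVAL;
    if (port >= MAX_PORTS)
        return -ENOSPC;

    /* Already backed by a bucket: usable only if it is still free. */
    if ((unsigned int)port < t->valid)
        return t->state[port] == ST_FREE ? port : -EINVAL;

    while (t->valid <= (unsigned int)port)
        t->valid += PER_BUCKET;
    assert(t->valid <= MAX_PORTS);

    return port;
}

/* Old behaviour, now expressed on top of allocate_port(). */
static int get_free_port(struct port_table *t)
{
    unsigned int port;

    for (port = 0; port < t->valid; port++)
        if (t->state[port] == ST_FREE)
            break;

    return allocate_port(t, (int)port);
}

/* Reserve a port number dictated by someone else and mark it proxied. */
static int alloc_proxy(struct port_table *t, int port)
{
    int rc = allocate_port(t, port);

    if (rc < 0)
        return rc;
    t->state[port] = ST_PROXY;
    return 0;
}
```

Reserving port 9 in a table that starts with one bucket grows it to
three buckets, and a second attempt to reserve port 9 fails with
-EINVAL because the slot is no longer free.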
 xen/common/event_channel.c | 87 ++++++++++++++++++++++++++++++++++++++++------
 xen/include/xen/event.h    |  3 ++
 xen/include/xen/sched.h    |  1 +
 3 files changed, 81 insertions(+), 10 deletions(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index c69f9db..85ff7e0 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -30,6 +30,7 @@
 #include <public/xen.h>
 #include <public/event_channel.h>
 #include <xsm/xsm.h>
+#include <asm/guest/vixen.h>
 
 #define ERROR_EXIT(_errno)                                          \
     do {                                                            \
@@ -156,25 +157,25 @@ static void free_evtchn_bucket(struct domain *d, struct evtchn *bucket)
     xfree(bucket);
 }
 
-static int get_free_port(struct domain *d)
+static int allocate_port(struct domain *d, int port)
 {
     struct evtchn *chn;
     struct evtchn **grp;
-    int            port;
 
     if ( d->is_dying )
         return -EINVAL;
 
-    for ( port = 0; port_is_valid(d, port); port++ )
+    if ( port_is_valid(d, port) )
     {
         if ( port > d->max_evtchn_port )
             return -ENOSPC;
         if ( evtchn_from_port(d, port)->state == ECS_FREE
              && !evtchn_port_is_busy(d, port) )
             return port;
+        return -EINVAL;
     }
 
-    if ( port == d->max_evtchns || port > d->max_evtchn_port )
+    if ( port >= d->max_evtchns || port > d->max_evtchn_port )
         return -ENOSPC;
 
     if ( !group_from_port(d, port) )
@@ -185,16 +186,59 @@ static int get_free_port(struct domain *d)
         group_from_port(d, port) = grp;
     }
 
-    chn = alloc_evtchn_bucket(d, port);
-    if ( !chn )
-        return -ENOMEM;
-    bucket_from_port(d, port) = chn;
+    while ( d->valid_evtchns <= port )
+    {
+        chn = alloc_evtchn_bucket(d, d->valid_evtchns);
+        if ( !chn )
+            return -ENOMEM;
+        bucket_from_port(d, d->valid_evtchns) = chn;
 
-    write_atomic(&d->valid_evtchns, d->valid_evtchns + EVTCHNS_PER_BUCKET);
+        write_atomic(&d->valid_evtchns, d->valid_evtchns + EVTCHNS_PER_BUCKET);
+    }
 
     return port;
 }
 
+static int get_free_port(struct domain *d)
+{
+    int port;
+
+    for ( port = 0; port_is_valid(d, port); port++ )
+    {
+        if ( port > d->max_evtchn_port )
+            return -ENOSPC;
+        if ( evtchn_from_port(d, port)->state == ECS_FREE
+             && !evtchn_port_is_busy(d, port) )
+            break;
+    }
+
+    return allocate_port(d, port);
+}
+
+int evtchn_alloc_proxy(struct domain *d, int port, u8 ecs)
+{
+    struct evtchn *chn;
+    int rc;
+
+    if ( !is_vixen() )
+        return -ENOSYS;
+
+    rc = allocate_port(d, port);
+    if ( rc < 0 )
+        return rc;
+
+    chn = evtchn_from_port(d, port);
+    spin_lock(&chn->lock);
+    chn->state = ECS_PROXY;
+    evtchn_port_init(d, chn);
+
+    if ( ecs == ECS_INTERDOMAIN )
+        evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
+    spin_unlock(&chn->lock);
+
+    return 0;
+}
+
 static void free_evtchn(struct domain *d, struct evtchn *chn)
 {
     /* Clear pending event to avoid unexpected behavior on re-bind. */
@@ -628,6 +672,9 @@ static long evtchn_close(struct domain *d1, int port1, bool_t guest)
 
         goto out;
 
+    case ECS_PROXY:
+        break;
+
     default:
         BUG();
     }
@@ -690,6 +737,14 @@ int evtchn_send(struct domain *ld, unsigned int lport)
     case ECS_UNBOUND:
         /* silently drop the notification */
         break;
+    case ECS_PROXY:
+        ret = -EINVAL;
+        if ( is_vixen() )
+        {
+            struct evtchn_send send = { .port = lport };
+            ret = HYPERVISOR_event_channel_op(EVTCHNOP_send, &send);
+        }
+        break;
     default:
         ret = -EINVAL;
     }
@@ -892,6 +947,10 @@ static long evtchn_status(evtchn_status_t *status)
     case ECS_IPI:
         status->status = EVTCHNSTAT_ipi;
         break;
+    case ECS_PROXY:
+        BUG_ON(!is_vixen());
+        rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, status);
+        break;
     default:
         BUG();
     }
@@ -944,6 +1003,14 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id)
     case ECS_INTERDOMAIN:
         chn->notify_vcpu_id = vcpu_id;
         break;
+    case ECS_PROXY:
+        if ( is_vixen() && vixen_has_per_cpu_notifications() )
+        {
+            struct evtchn_bind_vcpu bind = { .port = port, .vcpu = vcpu_id };
+            HYPERVISOR_event_channel_op(EVTCHNOP_bind_vcpu, &bind);
+        }
+        chn->notify_vcpu_id = vcpu_id;
+        break;
     case ECS_PIRQ:
         if ( chn->notify_vcpu_id == vcpu_id )
             break;
@@ -1276,7 +1343,7 @@ int evtchn_init(struct domain *d)
     d->valid_evtchns = EVTCHNS_PER_BUCKET;
 
     spin_lock_init_prof(d, event_lock);
-    if ( get_free_port(d) != 0 )
+    if ( allocate_port(d, 0) != 0 )
     {
         free_evtchn_bucket(d, d->evtchn);
         return -EINVAL;
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 87915ea..f3febe6 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -71,6 +71,9 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
 /* Inject an event channel notification into the guest */
 void arch_evtchn_inject(struct vcpu *v);
 
+/* Allocate a specific event channel as proxy. */
+int evtchn_alloc_proxy(struct domain *d, int port, u8 ecs);
+
 /*
  * Internal event channel object storage.
  *
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index f6c6fff..eb5a989 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -93,6 +93,7 @@ struct evtchn
 #define ECS_PIRQ         4 /* Channel is bound to a physical IRQ line.       */
 #define ECS_VIRQ         5 /* Channel is bound to a virtual IRQ line.        */
 #define ECS_IPI          6 /* Channel is bound to a virtual IPI line.        */
+#define ECS_PROXY        7 /* Channel is proxied to parent hypervisor.       */
     u8  state;             /* ECS_* */
     u8  xen_consumer:XEN_CONSUMER_BITS; /* Consumer in Xen if nonzero */
     u8  pending:1;
-- 
1.9.1



* [PATCH 19/22] vixen: Fix Vixen adaptation of send_global_virq()
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (17 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 18/22] vixen: Introduce ECS_PROXY for event channel proxying Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-06 22:54 ` [PATCH 20/22] vixen: event channel passthrough support Anthony Liguori
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Jan H. Schönherr <jschoenh@amazon.de>

The function originally did the following unconditionally:

   send_guest_global_virq(global_virq_handlers[virq] ?: hardware_domain, virq);

The new variant should reflect the non-Vixen case correctly.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
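The routing decision made by the new send_global_virq() can be written
as a small pure function.  This is a sketch for illustration, with a
hypothetical virq_target() helper and a stub struct domain; a NULL
result stands for "do not deliver the VIRQ".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct domain { int id; };          /* stub for illustration only */

/*
 * Pick the domain that should receive a global VIRQ.
 * A registered handler always wins.  Without one, the hardware domain
 * is used, except under Vixen, where the VIRQ is dropped.
 */
static const struct domain *virq_target(const struct domain *handler,
                                        const struct domain *hardware_domain,
                                        bool vixen)
{
    assert(hardware_domain != NULL);

    if (handler != NULL)
        return handler;

    return vixen ? NULL : hardware_domain;
}
```

With vixen set to false this reduces to the original
"handler ?: hardware_domain" expression.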
 xen/common/event_channel.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 85ff7e0..3dee73b 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -840,7 +840,10 @@ void send_global_virq(uint32_t virq)
     ASSERT(virq < NR_VIRQS);
     ASSERT(virq_is_global(virq));
 
-    send_guest_global_virq(global_virq_handlers[virq] ?: hardware_domain, virq);
+    if ( global_virq_handlers[virq] )
+        send_guest_global_virq(global_virq_handlers[virq], virq);
+    else if ( !is_vixen() )
+        send_guest_global_virq(hardware_domain, virq);
 }
 
 int set_global_virq_handler(struct domain *d, uint32_t virq)
-- 
1.9.1



* [PATCH 20/22] vixen: event channel passthrough support
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (18 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 19/22] vixen: Fix Vixen adaptation of send_global_virq() Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-06 22:54 ` [PATCH 21/22] vixen: provide Xencons implementation Anthony Liguori
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Anthony Liguori <aliguori@amazon.com>

For Vixen, we do not want to pass through all event channel
operations.  HVM guests do not have nearly as many event channel
interactions as PV guests, and on older versions of Xen there is no
reliable way to wake up an event channel on VCPU != 0, which leads to
a variety of deadlocks.

By only forwarding interdomain and unbound event channel operations,
we can avoid this problem since these can always be bound to VCPU 0
on older versions of Xen HVM.  On newer versions of Xen, we allow the
event channels to be bound to the VCPU requested by the inner guest.

To ensure that we keep everything in sync, all event channels end up
allocating an unbound event channel in the parent Xen, and we rely on
the parent Xen to own the event channel address space.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
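The recurring pattern in this patch is "allocate in the parent first,
then reserve the same port locally, and close the parent port again if
the local step fails".  Below is a standalone sketch of that pattern.
Everything in it is a hypothetical model: the parent hypervisor is two
small functions over a boolean array, and the port number is returned
directly instead of through a hypercall argument structure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NPORTS 8

static bool parent_in_use[NPORTS];  /* simulated parent port space */
static bool local_in_use[NPORTS];   /* shim's own view of the ports */

/* Simulated EVTCHNOP_alloc_unbound: lowest free port, port 0 reserved. */
static int parent_alloc_unbound(void)
{
    for (int p = 1; p < NPORTS; p++) {
        if (!parent_in_use[p]) {
            parent_in_use[p] = true;
            return p;
        }
    }
    return -ENOSPC;
}

/* Simulated EVTCHNOP_close. */
static void parent_close(int port)
{
    assert(port >= 0 && port < NPORTS);
    parent_in_use[port] = false;
}

/* Local bookkeeping: fails if the shim already tracks this port. */
static int local_reserve(int port)
{
    if (local_in_use[port])
        return -EINVAL;
    local_in_use[port] = true;
    return port;
}

/* The parent owns the port number space; the shim mirrors its choice. */
static int shim_get_free_port(void)
{
    int port = parent_alloc_unbound();
    int rc;

    if (port < 0)
        return port;

    rc = local_reserve(port);
    if (rc < 0)
        parent_close(port);         /* roll back so the views stay in sync */

    return rc;
}
```

The rollback is what keeps the inner and outer port spaces consistent
when the local reservation fails.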
 xen/common/event_channel.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 3dee73b..54ea720 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -199,10 +199,34 @@ static int allocate_port(struct domain *d, int port)
     return port;
 }
 
+static int vixen_get_free_port(struct domain *d)
+{
+    int rc;
+    struct evtchn_alloc_unbound unbound = { .dom = DOMID_SELF,
+                                            .remote_dom = DOMID_SELF };
+
+    rc = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound, &unbound);
+    if ( rc )
+        return rc;
+
+    rc = allocate_port(d, unbound.port);
+    if ( rc < 0 )
+    {
+        struct evtchn_close close = { .port = unbound.port };
+        HYPERVISOR_event_channel_op(EVTCHNOP_close, &close);
+        printk("Vixen: failed to allocate event channel %d => %d\n",
+               unbound.port, rc);
+    }
+    return rc;
+}
+
 static int get_free_port(struct domain *d)
 {
     int port;
 
+    if ( is_vixen() )
+        return vixen_get_free_port(d);
+
     for ( port = 0; port_is_valid(d, port); port++ )
     {
         if ( port > d->max_evtchn_port )
@@ -252,6 +276,11 @@ static void free_evtchn(struct domain *d, struct evtchn *chn)
     xsm_evtchn_close_post(chn);
 }
 
+static bool is_loopback(domid_t ldom, domid_t rdom)
+{
+    return ldom == DOMID_SELF && rdom == DOMID_SELF;
+}
+
 static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc)
 {
     struct evtchn *chn;
@@ -266,6 +295,23 @@ static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc)
 
     spin_lock(&d->event_lock);
 
+    if ( is_vixen() && !is_loopback(alloc->dom, alloc->remote_dom) ) {
+        rc = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound, alloc);
+        if ( rc )
+            goto out;
+
+        rc = evtchn_alloc_proxy(d, alloc->port, ECS_UNBOUND);
+        if ( rc )
+        {
+            struct evtchn_close close = { .port = alloc->port };
+            HYPERVISOR_event_channel_op(EVTCHNOP_close, &close);
+            printk("Vixen: failed to reserve unbound event channel %d => %ld\n",
+                   alloc->port, rc);
+        }
+
+        goto out;
+    }
+
     if ( (port = get_free_port(d)) < 0 )
         ERROR_EXIT_DOM(port, d);
     chn = evtchn_from_port(d, port);
@@ -315,6 +361,27 @@ static void double_evtchn_unlock(struct evtchn *lchn, struct evtchn *rchn)
         spin_unlock(&rchn->lock);
 }
 
+static long vixen_evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
+{
+    struct domain *d = current->domain;
+    long rc;
+
+    rc = HYPERVISOR_event_channel_op(EVTCHNOP_bind_interdomain, bind);
+    if ( rc )
+        return rc;
+
+    rc = evtchn_alloc_proxy(d, bind->local_port, ECS_INTERDOMAIN);
+    if ( rc )
+    {
+        struct evtchn_close close = { .port = bind->local_port };
+        HYPERVISOR_event_channel_op(EVTCHNOP_close, &close);
+        printk("Vixen: failed to reserve inter-domain event channel %d => %ld\n",
+               bind->local_port, rc);
+    }
+
+    return rc;
+}
+
 static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
 {
     struct evtchn *lchn, *rchn;
@@ -323,6 +390,9 @@ static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
     domid_t        rdom = bind->remote_dom;
     long           rc;
 
+    if ( is_vixen() && !is_loopback(DOMID_SELF, bind->remote_dom) )
+        return vixen_evtchn_bind_interdomain(bind);
+
     if ( rdom == DOMID_SELF )
         rdom = current->domain->domain_id;
 
@@ -581,6 +651,13 @@ static long evtchn_close(struct domain *d1, int port1, bool_t guest)
         goto out;
     }
 
+    if ( is_vixen() ) {
+        struct evtchn_close close = { .port = port1 };
+        rc = HYPERVISOR_event_channel_op(EVTCHNOP_close, &close);
+        if (rc != 0)
+            goto out;
+    }
+
     switch ( chn1->state )
     {
     case ECS_FREE:
@@ -1215,6 +1292,10 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case EVTCHNOP_init_control: {
         struct evtchn_init_control init_control;
+
+        if ( is_vixen() )
+            return -ENOSYS;
+
         if ( copy_from_guest(&init_control, arg, 1) != 0 )
             return -EFAULT;
         rc = evtchn_fifo_init_control(&init_control);
-- 
1.9.1
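
The allocate-then-reserve pattern used by vixen_get_free_port() in the
patch above can be illustrated in isolation.  This is only a minimal
sketch under stated assumptions: parent_alloc(), parent_close(),
local_reserve() and the fixed-size port tables are hypothetical
stand-ins for the parent Xen hypercalls and the local port bookkeeping,
not the real Xen interfaces.  The point it shows is the rollback: if
the port handed out by the parent cannot be reserved locally, it is
closed again in the parent so that it does not leak.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PORTS 8

/* Hypothetical stand-ins for the parent's and the shim's port tables. */
static bool parent_in_use[MAX_PORTS];
static bool local_in_use[MAX_PORTS];
static int  local_limit = MAX_PORTS; /* ports >= this cannot be reserved */

/* Hypothetical parent-side allocation: lowest free port, or -1. */
static int parent_alloc(void)
{
    for (int p = 1; p < MAX_PORTS; p++) {
        if (!parent_in_use[p]) {
            parent_in_use[p] = true;
            return p;
        }
    }
    return -1;
}

/* Hypothetical parent-side close. */
static void parent_close(int port)
{
    assert(port > 0 && port < MAX_PORTS);
    parent_in_use[port] = false;
}

/* Hypothetical local reservation of a port number chosen by the parent. */
static int local_reserve(int port)
{
    if (port >= local_limit || local_in_use[port])
        return -1;
    local_in_use[port] = true;
    return port;
}

/* The parent owns the port number space; the shim only mirrors it. */
static int get_free_port_via_parent(void)
{
    int port = parent_alloc();

    if (port < 0)
        return port;

    int rc = local_reserve(port);

    if (rc < 0)
        parent_close(port); /* roll back so the parent port is not leaked */

    return rc;
}
```

Because the parent picks the number, the shim never searches its own
port space; it either mirrors the parent's choice or undoes it.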


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* [PATCH 21/22] vixen: provide Xencons implementation
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (19 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 20/22] vixen: event channel passthrough support Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-06 22:54 ` [PATCH 22/22] vixen: dom0 builder support Anthony Liguori
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Anthony Liguori <aliguori@amazon.com>

Our initial approach exposed the console ring directly to guests
which worked well except for the fact that very old versions of Xen
did not support console ring for HVM guests.  It also proved to
be complicated from a management tool perspective since both the
serial console and the paravirt console for HVM guests produced
output.

Having a simple xencons implementation simplifies using Vixen, as a
management tool no longer needs to care whether or not this mode is
enabled.

In order to output to the console without the '(Xen)' adornment,
we introduce a new entry point into the console code too.

Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
 xen/arch/x86/guest/vixen.c        | 41 +++++++++++++++++++++++++++++++++++++++
 xen/common/event_channel.c        |  5 ++++-
 xen/drivers/char/console.c        | 16 +++++++++++++++
 xen/include/asm-x86/guest/vixen.h |  2 ++
 xen/include/xen/lib.h             |  1 +
 5 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
index 76d9638..eeffafa 100644
--- a/xen/arch/x86/guest/vixen.c
+++ b/xen/arch/x86/guest/vixen.c
@@ -23,6 +23,7 @@
 #include <public/version.h>
 #include <xen/event.h>
 #include <asm/apic.h>
+#include <public/io/console.h>
 
 static int in_vixen;
 static uint8_t global_si_data[4 << 10] __attribute__((aligned(4096)));
@@ -31,6 +32,9 @@ static bool vixen_per_cpu_notifications = true;
 static uint8_t vixen_evtchn_vector;
 static bool vixen_needs_apic_ack = true;
 struct irqaction vixen_irqaction;
+static volatile struct xencons_interface *vixen_xencons_iface;
+static uint16_t vixen_xencons_port;
+static spinlock_t vixen_xencons_lock;
 
 void __init init_vixen(void)
 {
@@ -49,6 +53,8 @@ void __init init_vixen(void)
 
     printk("Vixen running under Xen %d.%d\n", major, minor);
 
+    spin_lock_init(&vixen_xencons_lock);
+
     in_vixen = 1;
 }
 
@@ -239,6 +245,41 @@ static void vixen_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
     vixen_upcall(smp_processor_id());
 }
 
+bool vixen_ring_process(uint16_t port)
+{
+    volatile struct xencons_interface *r = vixen_xencons_iface;
+    char buffer[128];
+    size_t n;
+
+    if (r == NULL || port != vixen_xencons_port) {
+        return false;
+    }
+
+    spin_lock(&vixen_xencons_lock);
+
+    n = 0;
+    while (r->out_prod != r->out_cons) {
+        char ch = r->out[MASK_XENCONS_IDX(r->out_cons, r->out)];
+        if (n == sizeof(buffer) - 1) {
+            buffer[n] = 0;
+            guest_puts(hardware_domain, buffer);
+            n = 0;
+        }
+        buffer[n++] = ch;
+        rmb();
+        r->out_cons++;
+    }
+
+    if (n) {
+        buffer[n] = 0;
+        guest_puts(hardware_domain, buffer);
+    }
+
+    spin_unlock(&vixen_xencons_lock);
+
+    return true;
+}
+
 static int hvm_set_parameter(int idx, uint64_t value)
 {
     struct xen_hvm_param xhv;
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 54ea720..6d060a5 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1241,7 +1241,10 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         struct evtchn_send send;
         if ( copy_from_guest(&send, arg, 1) != 0 )
             return -EFAULT;
-        rc = evtchn_send(current->domain, send.port);
+        if ( vixen_ring_process(send.port) )
+            rc = 0;
+        else
+            rc = evtchn_send(current->domain, send.port);
         break;
     }
 
diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index a07343d..a83aeb2 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -764,6 +764,22 @@ void guest_printk(const struct domain *d, const char *fmt, ...)
     va_end(args);
 }
 
+void guest_puts(const struct domain *d, const char *kbuf)
+{
+    spin_lock_irq(&console_lock);
+
+    sercon_puts(kbuf);
+    video_puts(kbuf);
+
+    if ( opt_console_to_ring )
+    {
+        conring_puts(kbuf);
+        tasklet_schedule(&notify_dom0_con_ring_tasklet);
+    }
+
+    spin_unlock_irq(&console_lock);
+}
+
 void __init console_init_preirq(void)
 {
     char *p;
diff --git a/xen/include/asm-x86/guest/vixen.h b/xen/include/asm-x86/guest/vixen.h
index e486cc3..f46c6ed 100644
--- a/xen/include/asm-x86/guest/vixen.h
+++ b/xen/include/asm-x86/guest/vixen.h
@@ -82,4 +82,6 @@ void vixen_vcpu_initialize(struct vcpu *v);
 
 void __init vixen_transform(struct domain *dom0);
 
+bool vixen_ring_process(uint16_t port);
+
 #endif
diff --git a/xen/include/xen/lib.h b/xen/include/xen/lib.h
index ed00ae1..de84638 100644
--- a/xen/include/xen/lib.h
+++ b/xen/include/xen/lib.h
@@ -92,6 +92,7 @@ extern void printk(const char *format, ...)
     __attribute__ ((format (printf, 1, 2)));
 extern void guest_printk(const struct domain *d, const char *format, ...)
     __attribute__ ((format (printf, 2, 3)));
+extern void guest_puts(const struct domain *d, const char *message);
 extern void noreturn panic(const char *format, ...)
     __attribute__ ((format (printf, 1, 2)));
 extern long vm_assist(struct domain *, unsigned int cmd, unsigned int type,
-- 
1.9.1
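
The draining loop in vixen_ring_process() above can be sketched
outside of Xen as follows.  This is a simplified illustration, not the
patch's code: struct out_ring, RING_SIZE, CHUNK and sink_puts() are
assumptions standing in for struct xencons_interface, its output ring
size, the 128-byte stack buffer and guest_puts().  The sketch also
leaves out the locking and memory barriers that a ring shared with a
guest needs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 2048u /* power of two, so masking acts as modulo */
#define CHUNK     128u  /* bounce buffer size, including the NUL */

/* Illustrative stand-in for the output half of a console ring. */
struct out_ring {
    char     out[RING_SIZE];
    uint32_t out_cons; /* free-running consumer index */
    uint32_t out_prod; /* free-running producer index */
};

/* Hypothetical sink that records what was printed and how often. */
static char     sink[8192];
static size_t   sink_len;
static unsigned flushes;

static void sink_puts(const char *s)
{
    size_t n = strlen(s);

    if (n > sizeof(sink) - 1 - sink_len)
        n = sizeof(sink) - 1 - sink_len; /* truncate instead of overflowing */
    memcpy(sink + sink_len, s, n);
    sink_len += n;
    sink[sink_len] = '\0';
    flushes++;
}

/* Copy everything between out_cons and out_prod to the sink in chunks. */
static size_t ring_drain(struct out_ring *r)
{
    char buffer[CHUNK];
    size_t n = 0, total = 0;

    assert(r != NULL);

    while (r->out_cons != r->out_prod) {
        if (n == sizeof(buffer) - 1) {
            /* Buffer full: terminate, flush, and start over. */
            buffer[n] = '\0';
            sink_puts(buffer);
            n = 0;
        }
        buffer[n++] = r->out[r->out_cons & (RING_SIZE - 1)];
        r->out_cons++;
        total++;
    }

    if (n) {
        /* Flush the final partial chunk. */
        buffer[n] = '\0';
        sink_puts(buffer);
    }

    return total;
}
```

With these sizes, 300 queued characters are emitted as three strings of
127, 127 and 46 characters, since one byte of each chunk is reserved
for the terminating NUL.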


* [PATCH 22/22] vixen: dom0 builder support
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (20 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 21/22] vixen: provide Xencons implementation Anthony Liguori
@ 2018-01-06 22:54 ` Anthony Liguori
  2018-01-07  0:24   ` Matt Wilson
                     ` (2 more replies)
  2018-01-06 23:29 ` [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (2 subsequent siblings)
  24 siblings, 3 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 22:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Matt Wilson, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori

From: Anthony Liguori <aliguori@amazon.com>

The dom0 builder requires a number of modifications in order to be
able to launch unprivileged guests.  The console and store pages
must be mapped in a specific location within the guest's initial
page table.

We also have to set up the start info to be what's expected for
unprivileged guests and suppress the normal logic to give dom0
increased permissions.

We have to pass around the console and store pages which involves
touching a number of places including the PVH builder.

Signed-off-by: Anthony Liguori <aliguori@amazon.com>
---
 xen/arch/x86/dom0_build.c         |  7 +++-
 xen/arch/x86/guest/vixen.c        | 65 +++++++++++++++++++++++++++++++-
 xen/arch/x86/hvm/dom0_build.c     |  4 +-
 xen/arch/x86/pv/dom0_build.c      | 79 ++++++++++++++++++++++++++++++++++-----
 xen/arch/x86/setup.c              | 12 +++++-
 xen/include/asm-x86/dom0_build.h  |  8 +++-
 xen/include/asm-x86/guest/vixen.h |  5 ++-
 xen/include/asm-x86/setup.h       |  4 +-
 8 files changed, 164 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 88810db..df9d3f8 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -464,7 +464,9 @@ int __init dom0_setup_permissions(struct domain *d)
 int __init construct_dom0(struct domain *d, const module_t *image,
                           unsigned long image_headroom, module_t *initrd,
                           void *(*bootstrap_map)(const module_t *),
-                          char *cmdline)
+                          char *cmdline,
+                          xen_pfn_t store_mfn, uint32_t store_evtchn,
+                          xen_pfn_t console_mfn, uint32_t console_evtchn)
 {
     int rc;
 
@@ -484,7 +486,8 @@ int __init construct_dom0(struct domain *d, const module_t *image,
 #endif
 
     rc = (is_hvm_domain(d) ? dom0_construct_pvh : dom0_construct_pv)
-         (d, image, image_headroom, initrd, bootstrap_map, cmdline);
+         (d, image, image_headroom, initrd, bootstrap_map, cmdline,
+          store_mfn, store_evtchn, console_mfn, console_evtchn);
     if ( rc )
         return rc;
 
diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
index eeffafa..08619d1 100644
--- a/xen/arch/x86/guest/vixen.c
+++ b/xen/arch/x86/guest/vixen.c
@@ -280,6 +280,23 @@ bool vixen_ring_process(uint16_t port)
     return true;
 }
 
+static int hvm_get_parameter(int idx, uint64_t *value)
+{
+    struct xen_hvm_param xhv;
+    int r;
+
+    xhv.domid = DOMID_SELF;
+    xhv.index = idx;
+    r = HYPERVISOR_hvm_op(HVMOP_get_param, &xhv);
+    if (r < 0) {
+        printk("Cannot get hvm parameter %d: %d!\n",
+               idx, r);
+        return r;
+    }
+    *value = xhv.value;
+    return r;
+}
+
 static int hvm_set_parameter(int idx, uint64_t value)
 {
     struct xen_hvm_param xhv;
@@ -390,8 +407,54 @@ bool vixen_has_per_cpu_notifications(void)
 }
 
 void __init
-vixen_transform(struct domain *dom0)
+vixen_transform(struct domain *dom0,
+                xen_pfn_t *pstore_mfn, uint32_t *pstore_evtchn,
+                xen_pfn_t *pconsole_mfn, uint32_t *pconsole_evtchn)
 {
+    uint64_t v = 0;
+    long rc;
+    struct evtchn_unmask unmask;
+    struct evtchn_alloc_unbound alloc;
+
+    /* Setup Xenstore */
+    hvm_get_parameter(HVM_PARAM_STORE_EVTCHN, &v);
+    *pstore_evtchn = unmask.port = v;
+    HYPERVISOR_event_channel_op(EVTCHNOP_unmask, &unmask);
+
+    hvm_get_parameter(HVM_PARAM_STORE_PFN, &v);
+    *pstore_mfn = v;
+
+    printk("Vixen Xenstore evtchn is %d, pfn is 0x%" PRIx64 "\n",
+           *pstore_evtchn, *pstore_mfn);
+
+    /* Setup Xencons */
+    alloc.dom = DOMID_SELF;
+    alloc.remote_dom = DOMID_SELF;
+
+    rc = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound, &alloc);
+    if ( rc )
+    {
+        printk("Failed to alloc unbound event channel: %ld\n", rc);
+        *pconsole_evtchn = 0;
+        *pconsole_mfn = 0;
+    }
+    else
+    {
+        void *console_data;
+
+        console_data = alloc_xenheap_page();
+
+        *pconsole_evtchn = alloc.port;
+        *pconsole_mfn = virt_to_mfn(console_data);
+
+        memset(console_data, 0, 4096);
+        vixen_xencons_iface = console_data;
+        vixen_xencons_port = alloc.port;
+    }
+
+    printk("Vixen Xencons evtchn is %d, pfn is 0x%" PRIx64 "\n",
+           *pconsole_evtchn, *pconsole_mfn);
+
     /* Setup event channel forwarding */
     alloc_direct_apic_vector(&vixen_evtchn_vector, vixen_evtchn_notify);
     printk("Vixen evtchn vector is %d\n", vixen_evtchn_vector);
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 4338965..b2ca64f 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -1064,7 +1064,9 @@ int __init dom0_construct_pvh(struct domain *d, const module_t *image,
                               unsigned long image_headroom,
                               module_t *initrd,
                               void *(*bootstrap_map)(const module_t *),
-                              char *cmdline)
+                              char *cmdline,
+                              xen_pfn_t store_mfn, uint32_t store_evtchn,
+                              xen_pfn_t console_mfn, uint32_t console_evtchn)
 {
     paddr_t entry, start_info;
     int rc;
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 09c765a..c69f573 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -7,6 +7,7 @@
 #include <xen/console.h>
 #include <xen/domain.h>
 #include <xen/domain_page.h>
+#include <xen/event.h>
 #include <xen/init.h>
 #include <xen/libelf.h>
 #include <xen/multiboot.h>
@@ -276,7 +277,9 @@ int __init dom0_construct_pv(struct domain *d,
                              unsigned long image_headroom,
                              module_t *initrd,
                              void *(*bootstrap_map)(const module_t *),
-                             char *cmdline)
+                             char *cmdline,
+                             xen_pfn_t store_mfn, uint32_t store_evtchn,
+                             xen_pfn_t console_mfn, uint32_t console_evtchn)
 {
     int i, cpu, rc, compatible, compat32, order, machine;
     struct cpu_user_regs *regs;
@@ -299,6 +302,7 @@ int __init dom0_construct_pv(struct domain *d,
     l3_pgentry_t *l3tab = NULL, *l3start = NULL;
     l2_pgentry_t *l2tab = NULL, *l2start = NULL;
     l1_pgentry_t *l1tab = NULL, *l1start = NULL;
+    xen_pfn_t saved_pfn = ~0UL;
 
     /*
      * This fully describes the memory layout of the initial domain. All
@@ -441,8 +445,24 @@ int __init dom0_construct_pv(struct domain *d,
         vphysmap_end = vphysmap_start;
     vstartinfo_start = round_pgup(vphysmap_end);
     vstartinfo_end   = (vstartinfo_start +
-                        sizeof(struct start_info) +
-                        sizeof(struct dom0_vga_console_info));
+                        sizeof(struct start_info));
+    if ( !is_vixen() )
+        vstartinfo_end += sizeof(struct dom0_vga_console_info);
+    vstartinfo_end   = round_pgup(vstartinfo_end);
+
+    if ( is_vixen() ) {
+        struct page_info *pg;
+
+        saved_pfn = (vstartinfo_end - v_start) / PAGE_SIZE;
+
+        pg = mfn_to_page(store_mfn);
+        share_xen_page_with_guest(pg, d, XENSHARE_writable);
+        vstartinfo_end   += PAGE_SIZE;
+
+        pg = mfn_to_page(console_mfn);
+        share_xen_page_with_guest(pg, d, XENSHARE_writable);
+        vstartinfo_end   += PAGE_SIZE;
+    }
 
     vpt_start        = round_pgup(vstartinfo_end);
     for ( nr_pt_pages = 2; ; nr_pt_pages++ )
@@ -634,7 +654,13 @@ int __init dom0_construct_pv(struct domain *d,
             *l2tab = l2e_from_paddr(__pa(l1start), L2_PROT);
             l2tab++;
         }
-        if ( count < initrd_pfn || count >= initrd_pfn + PFN_UP(initrd_len) )
+        if ( count == saved_pfn ) {
+            mfn = store_mfn;
+            pfn++;
+        } else if ( count == saved_pfn + 1 ) {
+            mfn = console_mfn;
+            pfn++;
+        } else if ( count < initrd_pfn || count >= initrd_pfn + PFN_UP(initrd_len) )
             mfn = pfn++;
         else
             mfn = initrd_mfn++;
@@ -737,7 +763,8 @@ int __init dom0_construct_pv(struct domain *d,
 
     si->shared_info = virt_to_maddr(d->shared_info);
 
-    si->flags        = SIF_PRIVILEGED | SIF_INITDOMAIN;
+    si->flags        = is_vixen() ? 0 : (SIF_PRIVILEGED | SIF_INITDOMAIN);
+
     if ( !vinitrd_start && initrd_len )
         si->flags   |= SIF_MOD_START_PFN;
     si->flags       |= (xen_processor_pmbits << 8) & SIF_PM_MASK;
@@ -818,6 +845,32 @@ int __init dom0_construct_pv(struct domain *d,
         }
     }
 
+    if ( is_vixen() )
+    {
+        dom0_update_physmap(d, saved_pfn, store_mfn, vphysmap_start);
+        dom0_update_physmap(d, saved_pfn + 1, console_mfn, vphysmap_start);
+
+        rc = evtchn_alloc_proxy(d, store_evtchn, ECS_INTERDOMAIN);
+        if ( rc )
+        {
+            printk("Vixen: failed to reserve Xenstore event channel %d => %d\n",
+                   store_evtchn, rc);
+            goto out;
+        }
+        rc = evtchn_alloc_proxy(d, console_evtchn, ECS_INTERDOMAIN);
+        if ( rc )
+        {
+            printk("Vixen: failed to reserve Console event channel %d => %d\n",
+                   console_evtchn, rc);
+            goto out;
+        }
+
+        si->store_mfn = store_mfn;
+        si->store_evtchn = store_evtchn;
+        si->console.domU.mfn = console_mfn;
+        si->console.domU.evtchn = console_evtchn;
+    }
+
     if ( initrd_len != 0 )
     {
         si->mod_start = vinitrd_start ?: initrd_pfn;
@@ -828,14 +881,15 @@ int __init dom0_construct_pv(struct domain *d,
     if ( cmdline != NULL )
         strlcpy((char *)si->cmd_line, cmdline, sizeof(si->cmd_line));
 
-    if ( fill_console_start_info((void *)(si + 1)) )
+    if ( !is_vixen() && fill_console_start_info((void *)(si + 1)) )
     {
         si->console.dom0.info_off  = sizeof(struct start_info);
         si->console.dom0.info_size = sizeof(struct dom0_vga_console_info);
     }
 
     if ( is_pv_32bit_domain(d) )
-        xlat_start_info(si, XLAT_start_info_console_dom0);
+        xlat_start_info(si, is_vixen() ? XLAT_start_info_console_domU :
+                                         XLAT_start_info_console_dom0);
 
     /* Return to idle domain's page tables. */
     mapcache_override_current(NULL);
@@ -873,9 +927,11 @@ int __init dom0_construct_pv(struct domain *d,
     if ( test_bit(XENFEAT_supervisor_mode_kernel, parms.f_required) )
         panic("Dom0 requires supervisor-mode execution");
 
-    rc = dom0_setup_permissions(d);
-    BUG_ON(rc != 0);
-
+    if ( !is_vixen() )
+    {
+        rc = dom0_setup_permissions(d);
+        BUG_ON(rc != 0);
+    }
     if ( elf_check_broken(&elf) )
         printk(" Xen warning: dom0 kernel broken ELF: %s\n",
                elf_check_broken(&elf));
@@ -886,6 +942,9 @@ int __init dom0_construct_pv(struct domain *d,
     v->is_initialised = 1;
     clear_bit(_VPF_down, &v->pause_flags);
 
+    if ( is_vixen() )
+        d->max_pages = d->tot_pages;
+
     return 0;
 
 out:
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 1b89844..c49eeea 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -663,6 +663,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         .stop_bits = 1
     };
     struct xen_arch_domainconfig config = { .emulation_flags = 0 };
+    xen_pfn_t store_mfn = 0, console_mfn = 0;
+    uint32_t store_evtchn = 0, console_evtchn = 0;
 
     /* Critical region without IDT or TSS.  Any fault is deadly! */
 
@@ -1595,6 +1597,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC;
     }
 
+    if ( is_vixen() )
+        config.emulation_flags = XEN_X86_EMU_PIT;
+
     /* Create initial domain 0. */
     dom0 = domain_create(dom0_domid, domcr_flags, 0, &config);
     if ( IS_ERR(dom0) || (alloc_dom0_vcpu0(dom0) == NULL) )
@@ -1604,7 +1609,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     dom0->target = NULL;
 
     if ( is_vixen() )
-        vixen_transform(dom0);
+        vixen_transform(dom0, &store_mfn, &store_evtchn,
+                        &console_mfn, &console_evtchn);
 
     /* Grab the DOM0 command line. */
     cmdline = (char *)(mod[0].string ? __va(mod[0].string) : NULL);
@@ -1667,7 +1673,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     if ( construct_dom0(dom0, mod, modules_headroom,
                         (initrdidx > 0) && (initrdidx < mbi->mods_count)
                         ? mod + initrdidx : NULL,
-                        bootstrap_map, cmdline) != 0)
+                        bootstrap_map, cmdline,
+                        store_mfn, store_evtchn,
+                        console_mfn, console_evtchn) != 0)
         panic("Could not set up DOM0 guest OS");
 
     if ( cpu_has_smap )
diff --git a/xen/include/asm-x86/dom0_build.h b/xen/include/asm-x86/dom0_build.h
index d83d2b4..459211c 100644
--- a/xen/include/asm-x86/dom0_build.h
+++ b/xen/include/asm-x86/dom0_build.h
@@ -18,13 +18,17 @@ int dom0_construct_pv(struct domain *d, const module_t *image,
                       unsigned long image_headroom,
                       module_t *initrd,
                       void *(*bootstrap_map)(const module_t *),
-                      char *cmdline);
+                      char *cmdline,
+                      xen_pfn_t store_mfn, uint32_t store_evtchn,
+                      xen_pfn_t console_mfn, uint32_t console_evtchn);
 
 int dom0_construct_pvh(struct domain *d, const module_t *image,
                        unsigned long image_headroom,
                        module_t *initrd,
                        void *(*bootstrap_map)(const module_t *),
-                       char *cmdline);
+                       char *cmdline,
+                       xen_pfn_t store_mfn, uint32_t store_evtchn,
+                       xen_pfn_t console_mfn, uint32_t console_evtchn);
 
 unsigned long dom0_paging_pages(const struct domain *d,
                                 unsigned long nr_pages);
diff --git a/xen/include/asm-x86/guest/vixen.h b/xen/include/asm-x86/guest/vixen.h
index f46c6ed..eca263a 100644
--- a/xen/include/asm-x86/guest/vixen.h
+++ b/xen/include/asm-x86/guest/vixen.h
@@ -80,7 +80,10 @@ bool vixen_has_per_cpu_notifications(void);
 
 void vixen_vcpu_initialize(struct vcpu *v);
 
-void __init vixen_transform(struct domain *dom0);
+void __init
+vixen_transform(struct domain *dom0,
+                xen_pfn_t *pstore_mfn, uint32_t *pstore_evtchn,
+                xen_pfn_t *pconsole_mfn, uint32_t *pconsole_evtchn);
 
 bool vixen_ring_process(uint16_t port);
 
diff --git a/xen/include/asm-x86/setup.h b/xen/include/asm-x86/setup.h
index c5b3d4e..51b207b 100644
--- a/xen/include/asm-x86/setup.h
+++ b/xen/include/asm-x86/setup.h
@@ -39,7 +39,9 @@ int construct_dom0(
     const module_t *kernel, unsigned long kernel_headroom,
     module_t *initrd,
     void *(*bootstrap_map)(const module_t *),
-    char *cmdline);
+    char *cmdline,
+    xen_pfn_t store_mfn, uint32_t store_evtchn,
+    xen_pfn_t console_mfn, uint32_t console_evtchn);
 void setup_io_bitmap(struct domain *d);
 
 unsigned long initial_images_nrpages(nodeid_t node);
-- 
1.9.1
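
The placement of the store and console pages in dom0_construct_pv()
above comes down to a small piece of address arithmetic, sketched
below.  This is only an illustration under assumptions: struct layout
and vixen_layout() are hypothetical names, and start_info_size stands
in for sizeof(struct start_info).  As in the patch, the start info is
rounded up to a page boundary and the two special pages occupy the two
guest page frames directly after it.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL /* x86 page size */

/* Round an address up to the next page boundary. */
static unsigned long round_pgup(unsigned long addr)
{
    return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/* Hypothetical summary of the region computed by the builder. */
struct layout {
    unsigned long vstartinfo_start;
    unsigned long vstartinfo_end; /* end, including the two extra pages */
    unsigned long store_pfn;      /* page index relative to v_start */
    unsigned long console_pfn;    /* always store_pfn + 1 */
};

static struct layout vixen_layout(unsigned long v_start,
                                  unsigned long vphysmap_end,
                                  unsigned long start_info_size)
{
    struct layout l;

    assert(v_start % PAGE_SIZE == 0 && vphysmap_end >= v_start);

    l.vstartinfo_start = round_pgup(vphysmap_end);
    /* No VGA console info is appended in the Vixen case. */
    l.vstartinfo_end = round_pgup(l.vstartinfo_start + start_info_size);

    /* First page after the start info holds the store ring... */
    l.store_pfn = (l.vstartinfo_end - v_start) / PAGE_SIZE;
    /* ...and the page after that holds the console ring. */
    l.console_pfn = l.store_pfn + 1;

    l.vstartinfo_end += 2 * PAGE_SIZE;
    return l;
}
```

For example, with the physmap ending 0x2345 bytes past v_start and a
start info smaller than one page, the start info lands on page 3, the
store page on page 4 and the console page on page 5.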


* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (21 preceding siblings ...)
  2018-01-06 22:54 ` [PATCH 22/22] vixen: dom0 builder support Anthony Liguori
@ 2018-01-06 23:29 ` Anthony Liguori
  2018-01-06 23:50 ` Andrew Cooper
  2018-01-08 11:54 ` Wei Liu
  24 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-06 23:29 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Wei Liu, Anthony Liguori, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Matt Wilson, xen-devel

On Sat, Jan 6, 2018 at 2:54 PM, Anthony Liguori <aliguori@amzn.com> wrote:
> From: Anthony Liguori <aliguori@amazon.com>
>
> CVE-2017-5754 is problematic for paravirtualized x86 domUs because it
> appears to be very difficult to isolate the hypervisor's page tables
> from PV domUs while maintaining ABI compatibility.  Instead of trying
> to make a KPTI-like approach work for Xen PV, it seems reasonable to
> run a copy of Xen within an HVM (or PVH) domU to provide backwards
> compatibility with guests as mentioned in XSA-254 [1].

I also posted a branch with a backport to 4.9 stable.

https://github.com/aliguori/xen/tree/vixen-stable-4.9

While this is a bit more than what goes into a typical stable release, given
that it is addressing a security issue and is relatively well contained, I think
it would be worth considering for addition to stable.

Regards,

Anthony Liguori

* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (22 preceding siblings ...)
  2018-01-06 23:29 ` [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
@ 2018-01-06 23:50 ` Andrew Cooper
  2018-01-06 23:59   ` Matt Wilson
  2018-01-07  0:05   ` Anthony Liguori
  2018-01-08 11:54 ` Wei Liu
  24 siblings, 2 replies; 80+ messages in thread
From: Andrew Cooper @ 2018-01-06 23:50 UTC (permalink / raw)
  To: Anthony Liguori, xen-devel
  Cc: KarimAllah Ahmed, Jan H. Schönherr, Wei Liu,
	Anthony Liguori, Matt Wilson

On 06/01/2018 22:54, Anthony Liguori wrote:
> From: Anthony Liguori <aliguori@amazon.com>
>
> CVE-2017-5754 is problematic for paravirtualized x86 domUs because it
> appears to be very difficult to isolate the hypervisor's page tables
> from PV domUs while maintaining ABI compatibility.  Instead of trying
> to make a KPTI-like approach work for Xen PV, it seems reasonable to
> run a copy of Xen within an HVM (or PVH) domU to provide backwards
> compatibility with guests as mentioned in XSA-254 [1].
>
> This patch series adds a new mode to Xen called Vixen (Virtualized
> Xen)

It is quite telling that through all of this, I never even considered
asking if vixen stood for anything!

> which provides a PV-compatible interface while gaining
> CVE-2017-5754 protection for the host provided by hardware
> virtualization.  Vixen supports running a single unprivileged PV
> domain (a dom1) that is constructed by the dom0 domain builder.
>
> Please note the Xen page table configuration fundamental to the
> current PV ABI makes it impossible for an operating system to mitigate
> CVE-2017-5754 through mechanisms like Kernel Page Table Isolation
> (KPTI).  In order for an operating system to mitigate CVE-2017-5754 it
> must run directly in a HVM or PVH domU.

It's a little more complicated than this, but I suppose it is worth
pointing out.

A 64bit PV guest kernel cannot, of its own accord, protect itself
against SP3/Meltdown.  This is due to the shared nature/responsibility
of pagetables between the PV guest kernel and Xen.

What the Vixen/PV-shim plan does is isolate the guest sufficiently that
any SP3 attacks can't read data belonging to other guests on the host.

An SP3/Meltdown mitigation can only come from having Xen change the way
it uses pagetables, and my 44-patch prerequisite series serves to
demonstrate that this seems impractical with the existing ABI.

> This series is very similar to the PVH series posted by Wei and we
> have been discussing how to merge efforts.  We were hoping to have
> more time to work this out.  I am posting this because I'm fairly
> confident that this series is complete (all PV instances in EC2 are
> using this) and others might find it useful.  I also wanted to have
> more of a discussion about the best way to merge and some of the
> differences in designs.

Some ad hoc thoughts so far:

* Upstream, we need to take the PV-Shim side of domid handling. 
Unilaterally using dom1 is fine for server-virt infrastructure where
guests only ever talk to dom0, but isn't fine if you've got domains
which are communicating directly (e.g. with libvchan).  This is very
minor in the grand scheme of things though.

* I do prefer the Vixen side of startup, where we describe rather more
clearly what is going on.  I never got around to stea^W borrowing this
for PV-shim.

* Whatever eventual version gets in upstream, it is important that it is
HVM and PVH capable for backwards and forwards compatibility.  Again,
this doesn't appear to be too complicated to arrange in practice.  For
reference, what is the oldest version of Xen you need to target here? 
(The pre-console-ring observation puts it quite old)

* For PV-shim, we took the approach of making the domU neither
privileged nor the hardware domain.  While I expect this throws up a
different set of issues, I think it is a cleaner approach overall.

I'm sure there are areas I've missed, but this is hopefully a start.

~Andrew

* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-06 23:50 ` Andrew Cooper
@ 2018-01-06 23:59   ` Matt Wilson
  2018-01-07  0:05   ` Anthony Liguori
  1 sibling, 0 replies; 80+ messages in thread
From: Matt Wilson @ 2018-01-06 23:59 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Anthony Liguori, Wei Liu, KarimAllah Ahmed,
	Jan H. Schönherr, Anthony Liguori, xen-devel

On Sat, Jan 06, 2018 at 11:50:46PM +0000, Andrew Cooper wrote:
> On 06/01/2018 22:54, Anthony Liguori wrote:
> > Please note the Xen page table configuration fundamental to the
> > current PV ABI makes it impossible for an operating system to mitigate
> > CVE-2017-5754 through mechanisms like Kernel Page Table Isolation
> > (KPTI).  In order for an operating system to mitigate CVE-2017-5754 it
> > must run directly in a HVM or PVH domU.
> 
> It's a little more complicated than this, but I suppose it is worth pointing
> out.
> 
> A 64bit PV guest kernel cannot, of its own accord, protect itself
> against SP3/Meltdown.  This is due to the shared nature/responsibility
> of pagetables between the PV guest kernel and Xen.
> 
> What the Vixen/PV-shim plan does is isolate the guest sufficiently that
> any SP3 attacks can't read data belonging to other guests on the host.
> 
> An SP3/Meltdown mitigation can only come from having Xen change the way
> it uses pagetables, and my 44-patch prerequisite series serves to
> demonstrate that this seems impractical with the existing ABI.

I'm not sure we're saying anything different than you are.

> > This series is very similar to the PVH series posted by Wei and we
> > have been discussing how to merge efforts.  We were hoping to have
> > more time to work this out.  I am posting this because I'm fairly
> > confident that this series is complete (all PV instances in EC2 are
> > using this) and others might find it useful.  I also wanted to have
> > more of a discussion about the best way to merge and some of the
> > differences in designs.
> 
> Some ad hoc thoughts so far:
> 
> * Upstream, we need to take the PV-Shim side of domid handling. 
> Unilaterally using dom1 is fine for server-virt infrastructure where
> guests only ever talk to dom0, but isn't fine if you've got domains
> which are communicating directly (e.g. with libvchan).  This is very
> minor in the grand scheme of things though.

Agreed. Handling domU to domU communication will be more
complicated. Passing through a different domid isn't too hard, and
that should handle most of it. We were attempting to make this as
simple as possible...

> * I do prefer the Vixen side of startup, where we describe rather more
> clearly what is going on.  I never got around to stea^W borrowing this
> for PV-shim.
> 
> * Whatever eventual version gets in upstream, it is important that it
> is HVM and PVH capable for backwards and forwards compatibility.  Again,
> this doesn't appear to be too complicated to arrange in practice.  For
> reference, what is the oldest version of Xen you need to target here? 
> (The pre-console-ring observation puts it quite old)

3.4.mumble.

> * For PV-shim, we took the approach of making the domU neither
> privileged nor the hardware domain.  While I expect this throws up a
> different set of issues, I think it is a cleaner approach overall.
> 
> I'm sure there are areas I've missed, but this is hopefully a start.

Thanks for the quick feedback, and for all the help along the way.

--msw

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-06 23:50 ` Andrew Cooper
  2018-01-06 23:59   ` Matt Wilson
@ 2018-01-07  0:05   ` Anthony Liguori
  2018-01-07 20:29     ` Anthony Liguori
  1 sibling, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-07  0:05 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Jan H. Schönherr, Anthony Liguori, xen-devel

On Sat, Jan 6, 2018 at 3:50 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 06/01/2018 22:54, Anthony Liguori wrote:
>> From: Anthony Liguori <aliguori@amazon.com>
>>
>> CVE-2017-5754 is problematic for paravirtualized x86 domUs because it
>> appears to be very difficult to isolate the hypervisor's page tables
>> from PV domUs while maintaining ABI compatibility.  Instead of trying
>> to make a KPTI-like approach work for Xen PV, it seems reasonable to
>> run a copy of Xen within an HVM (or PVH) domU to provide backwards
>> compatibility with guests as mentioned in XSA-254 [1].
>>
>> This patch series adds a new mode to Xen called Vixen (Virtualized
>> Xen)
>
> It is quite telling that through all of this, I never even considered
> asking if vixen stood for anything!

Also, topical for the season:
https://www.youtube.com/watch?v=78c7vDFt6G8&feature=youtu.be&t=7

>> which provides a PV-compatible interface while gaining
>> CVE-2017-5754 protection for the host provided by hardware
>> virtualization.  Vixen supports running a single unprivileged PV
>> domain (a dom1) that is constructed by the dom0 domain builder.
>>
>> Please note the Xen page table configuration fundamental to the
>> current PV ABI makes it impossible for an operating system to mitigate
>> CVE-2017-5754 through mechanisms like Kernel Page Table Isolation
>> (KPTI).  In order for an operating system to mitigate CVE-2017-5754 it
>> must run directly in a HVM or PVH domU.
>
> It's a little more complicated than this, but I suppose it is worth pointing
> out.
>
> A 64bit PV guest kernel cannot, of its own accord, protect itself
> against SP3/Meltdown.  This is due to the shared nature/responsibility
> of pagetables between the PV guest kernel and Xen.
>
> What the Vixen/PV-shim plan does is isolate the guest sufficiently that
> any SP3 attacks can't read data belonging to other guests on the host.
>
> An SP3/Meltdown mitigation can only come from having Xen change the way
> it uses pagetables, and my 44-patch prerequisite series serves to
> demonstrate that this seems impractical with the existing ABI.

Correct.  You can get close but getting 100% of the way seems unlikely.

>> This series is very similar to the PVH series posted by Wei and we
>> have been discussing how to merge efforts.  We were hoping to have
>> more time to work this out.  I am posting this because I'm fairly
>> confident that this series is complete (all PV instances in EC2 are
>> using this) and others might find it useful.  I also wanted to have
>> more of a discussion about the best way to merge and some of the
>> differences in designs.
>
> Some ad hoc thoughts so far:
>
> * Upstream, we need to take the PV-Shim side of domid handling.
> Unilaterally using dom1 is fine for server-virt infrastructure where
> guests only ever talk to dom0, but isn't fine if you've got domains
> which are communicating directly (e.g. with libvchan).  This is very
> minor in the grand scheme of things though.

That's fine.  I think we should try to focus on merging some common
infrastructure because I don't think 75+ patch series are going to be
easy to get agreement on.

I'm not a huge fan of passing the domid via CPUID.  That's going to
be messy over time.  I do, however, like the idea of passing it as a
command line argument.  I'm happy to add support for that if that's
agreeable.
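
For illustration only, a minimal sketch of what picking the domid up
from the command line could look like.  The option name "vixen_domid="
and the helper parse_vixen_domid() are made up for this example and do
not come from either series; 0x7FF0 is assumed to mirror
DOMID_FIRST_RESERVED from the public headers, and domid 0 is rejected
on the assumption that the shim's guest should never claim to be dom0.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: mirrors DOMID_FIRST_RESERVED in the public headers. */
#define EXAMPLE_DOMID_FIRST_RESERVED 0x7FF0u

/* Hypothetical option name, used only for this sketch. */
#define EXAMPLE_DOMID_OPT "vixen_domid="

/*
 * Scan a space-separated command line for "vixen_domid=N".
 * Returns 0 and stores N in *domid on success, -ENOENT if the option is
 * absent, -EINVAL if the value is malformed or not an ordinary domid.
 */
static int parse_vixen_domid(const char *cmdline, uint16_t *domid)
{
    const size_t optlen = strlen(EXAMPLE_DOMID_OPT);
    const char *p = cmdline;

    while ( *p != '\0' )
    {
        /* Skip separators between tokens. */
        while ( *p == ' ' )
            p++;

        if ( strncmp(p, EXAMPLE_DOMID_OPT, optlen) == 0 )
        {
            char *end;
            unsigned long val;

            p += optlen;

            /* strtoul() would accept a sign or whitespace; we do not. */
            if ( *p < '0' || *p > '9' )
                return -EINVAL;

            val = strtoul(p, &end, 0);
            if ( *end != ' ' && *end != '\0' )
                return -EINVAL;

            /* Reject dom0 and the reserved range. */
            if ( val == 0 || val >= EXAMPLE_DOMID_FIRST_RESERVED )
                return -EINVAL;

            *domid = (uint16_t)val;
            return 0;
        }

        /* Not our option: skip to the end of this token. */
        while ( *p != '\0' && *p != ' ' )
            p++;
    }

    return -ENOENT;
}
```

A caller would presumably fall back to a fixed default (dom1 in the
current series) when the helper returns -ENOENT.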

> * I do prefer the Vixen side of startup, where we describe rather more
> clearly what is going on.  I never got around to stea^W borrowing this
> for PV-shim.

I think no matter what, we should try to get the first few patches merged
to add basic guest detection and hypercall support.

> * Whatever eventual version gets in upstream, it is important that it
> is HVM and PVH capable for backwards and forwards compatibility.  Again,
> this doesn't appear to be too complicated to arrange in practice.  For
> reference, what is the oldest version of Xen you need to target here?
> (The pre-console-ring observation puts it quite old)

3.4.x is what we're targeting.  That is indeed old, but since this
is a security issue, supporting a wide range of environments seems
like the right thing to do.

> * For PV-shim, we took the approach of making the domU neither
> privileged nor the hardware domain.  While I expect this throws up a
> different set of issues, I think it is a cleaner approach overall.

I never got a chance to try this out and see what breaks.

The one argument I'd make against it is that over time, I'd like to add
privileges to the domU in an attempt to improve performance.  We found
a lot of weird compatibility issues on older versions of Linux so I didn't
attempt to do any of this up front but in the long term, I would like to steal
some of the tricks from Xenner.

> I'm sure there are areas I've missed, but this is hopefully a start.

Thanks Andrew!

Regards,

Anthony Liguori

> ~Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/22] vixen: introduce is_vixen() to allow altering behavior
  2018-01-06 22:54 ` [PATCH 07/22] vixen: introduce is_vixen() to allow altering behavior Anthony Liguori
@ 2018-01-07  0:06   ` Matt Wilson
  2018-01-07  0:26     ` Anthony Liguori
  0 siblings, 1 reply; 80+ messages in thread
From: Matt Wilson @ 2018-01-07  0:06 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Wei Liu, Anthony Liguori, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Matt Wilson, xen-devel

On Sat, Jan 06, 2018 at 02:54:22PM -0800, Anthony Liguori wrote:
> From: Anthony Liguori <aliguori@amazon.com>
> 
> Vixen (Virtualized Xen) is a paravirtual mode of Xen where
> paravirtual I/O is passed through from the parent hypervisor
> all the way through the dom0 guest.  The dom0 guest is also
> deprivileged and renumbered to give the appearance that it
> is running as a normal PV guest.
> 
> Signed-off-by: Anthony Liguori <aliguori@amazon.com>
> ---
>  xen/arch/x86/guest/Makefile       |  1 +
>  xen/arch/x86/guest/vixen.c        | 30 ++++++++++++++++
>  xen/include/asm-x86/guest/vixen.h | 73 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 104 insertions(+)
>  create mode 100644 xen/arch/x86/guest/vixen.c
>  create mode 100644 xen/include/asm-x86/guest/vixen.h

This will break ARM builds in future patches that use is_vixen().

--msw

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 05/22] char: optionally redirect {, g}printk output to QEMU debug log
  2018-01-06 22:54 ` [PATCH 05/22] char: optionally redirect {, g}printk output to QEMU debug log Anthony Liguori
@ 2018-01-07  0:18   ` Anthony Liguori
  2018-01-07  0:35     ` Matt Wilson
  0 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-07  0:18 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Wei Liu, Anthony Liguori, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Matt Wilson, xen-devel

On Sat, Jan 6, 2018 at 2:54 PM, Anthony Liguori <aliguori@amzn.com> wrote:
> From: Matt Wilson <msw@amazon.com>
>
> When using Vixen, it is helpful to get the Xen messages in a
> separate channel than the console output.  Add an option to
> output to the QEMU backdoor logging port.
>
> Signed-off-by: Matt Wilson <msw@amazon.com>
> Signed-off-by: Anthony Liguori <aliguori@amazon.com>
> ---
>  xen/drivers/char/console.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
> index 19d0e74..b9412c5 100644
> --- a/xen/drivers/char/console.c
> +++ b/xen/drivers/char/console.c
> @@ -85,6 +85,11 @@ static int __read_mostly sercon_handle = -1;
>
>  static DEFINE_SPINLOCK(console_lock);
>
> +/* send all printk output to QEMU debug log. Input does not change,
> + * nor does dom0 output.
> + */
> +static bool_t __read_mostly qemu_debug = false;
> +
>  /*
>   * To control the amount of printing, thresholds are added.
>   * These thresholds correspond to the XENLOG logging levels.
> @@ -564,10 +569,21 @@ static void __putstr(const char *str)
>  {
>      ASSERT(spin_is_locked(&console_lock));
>
> -    sercon_puts(str);
> -    video_puts(str);
> +    if ( qemu_debug )
> +    {
> +        char c;
> +        while ( (c = *str++) != '\0' )
> +        {
> +            outb(c, 0x12);
> +        }

Yeah, this has no hope of working on ARM Matt.  Shame on you ;-P

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 22/22] vixen: dom0 builder support
  2018-01-06 22:54 ` [PATCH 22/22] vixen: dom0 builder support Anthony Liguori
@ 2018-01-07  0:24   ` Matt Wilson
  2018-01-07  9:02   ` Roger Pau Monné
  2018-01-08 18:22   ` Konrad Rzeszutek Wilk
  2 siblings, 0 replies; 80+ messages in thread
From: Matt Wilson @ 2018-01-07  0:24 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Wei Liu, Anthony Liguori, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Matt Wilson, xen-devel

On Sat, Jan 06, 2018 at 02:54:37PM -0800, Anthony Liguori wrote:
> From: Anthony Liguori <aliguori@amazon.com>
> 
> The dom0 builder requires a number of modifications in order to be
> able to launch unprivileged guests.  The console and store pages
> must be mapped in a specific location within the guest's initial
> page table.
> 
> We also have to set up the start info to be what's expected for
> unprivileged guests and suppress the normal logic to give dom0
> increased permissions.
> 
> We have to pass around the console and store pages which involves
> touching a number of places including the PVH builder.

There are some unresolved issues that are introduced by apparent
differences in how the libxc domain builder and the dom0 domain
builder arrange things, in particular the bootstrap page tables and
32-on-64 compatibility page table frames. As is, this seems to
primarily cause an incompatibility with 32-bit PAE mini-os that makes
assumptions about how page tables are allocated at start of day
(already fixed by [1]).

Practically speaking, this means that existing buggy PV-GRUB images
die early in MM setup when loaded as the dom1 kernel for Vixen.

--msw

[1] http://xenbits.xen.org/gitweb/?p=mini-os.git;a=commitdiff;h=8d84345a20d8a46ea26379c9f19961f6aa3e6e83


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/22] vixen: introduce is_vixen() to allow altering behavior
  2018-01-07  0:06   ` Matt Wilson
@ 2018-01-07  0:26     ` Anthony Liguori
  0 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-07  0:26 UTC (permalink / raw)
  To: Matt Wilson
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori, xen-devel

On Sat, Jan 6, 2018 at 4:06 PM, Matt Wilson <msw@amzn.com> wrote:
> On Sat, Jan 06, 2018 at 02:54:22PM -0800, Anthony Liguori wrote:
>> From: Anthony Liguori <aliguori@amazon.com>
>>
>> Vixen (Virtualized Xen) is a paravirtual mode of Xen where
>> paravirtual I/O is passed through from the parent hypervisor
>> all the way through the dom0 guest.  The dom0 guest is also
>> deprivileged and renumbered to give the appearance that it
>> is running as a normal PV guest.
>>
>> Signed-off-by: Anthony Liguori <aliguori@amazon.com>
>> ---
>>  xen/arch/x86/guest/Makefile       |  1 +
>>  xen/arch/x86/guest/vixen.c        | 30 ++++++++++++++++
>>  xen/include/asm-x86/guest/vixen.h | 73 +++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 104 insertions(+)
>>  create mode 100644 xen/arch/x86/guest/vixen.c
>>  create mode 100644 xen/include/asm-x86/guest/vixen.h
>
> This will break ARM builds in future patches that use is_vixen().

https://github.com/aliguori/xen/tree/vixen-upstream.next

Fixes the ARM build and will be in v2 of this series.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 05/22] char: optionally redirect {, g}printk output to QEMU debug log
  2018-01-07  0:18   ` Anthony Liguori
@ 2018-01-07  0:35     ` Matt Wilson
  0 siblings, 0 replies; 80+ messages in thread
From: Matt Wilson @ 2018-01-07  0:35 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, Wei Liu, Anthony Liguori, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Matt Wilson, xen-devel

On Sat, Jan 06, 2018 at 04:18:46PM -0800, Anthony Liguori wrote:
> On Sat, Jan 6, 2018 at 2:54 PM, Anthony Liguori <aliguori@amzn.com> wrote:
> > From: Matt Wilson <msw@amazon.com>
> 
> Yeah, this has no hope of working on ARM Matt.  Shame on you ;-P

It's almost like you put this patch in front of the one introducing
is_vixen() just so you could say I broke it first. ;-)

--msw

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 09/22] vixen: modify the e820 table to advertise HVM special pages as RAM
  2018-01-06 22:54 ` [PATCH 09/22] vixen: modify the e820 table to advertise HVM special pages as RAM Anthony Liguori
@ 2018-01-07  8:16   ` Roger Pau Monné
  2018-01-07 15:27     ` Anthony Liguori
  0 siblings, 1 reply; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-07  8:16 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Wei Liu, Anthony Liguori, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Matt Wilson, xen-devel

On Sat, Jan 06, 2018 at 02:54:24PM -0800, Anthony Liguori wrote:
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index a56f875..935901b 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -122,6 +122,7 @@
>  #include <asm/fixmap.h>
>  #include <asm/io_apic.h>
>  #include <asm/pci.h>
> +#include <asm/guest.h>
>  
>  #include <asm/hvm/grant_table.h>
>  #include <asm/pv/grant_table.h>
> @@ -945,7 +946,7 @@ get_page_from_l1e(
>              case 0:
>                  break;
>              case 1:
> -                if ( !is_hardware_domain(l1e_owner) )
> +                if ( !is_vixen() && !is_hardware_domain(l1e_owner) )
>                      break;
>                  /* fallthrough */
>              case -1:
> @@ -5536,6 +5537,21 @@ void arch_dump_shared_mem_info(void)
>              mem_sharing_get_nr_saved_mfns());
>  }
>  
> +const unsigned long *__init
> +vixen_get_platform_badpages(unsigned int *array_size)
> +{
> +    static unsigned long __initdata bad_pages[] = {
> +        0xfeffc000,
> +        0xfeffd000,
> +        0xfeffe000,
> +        0xfefff000,

These values shouldn't be hardcoded. IMHO it would also be good to
place all the vixen_ helpers in a single file.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/22] vixen: do not permit access to physical IRQs if in Vixen mode
  2018-01-06 22:54 ` [PATCH 10/22] vixen: do not permit access to physical IRQs if in Vixen mode Anthony Liguori
@ 2018-01-07  8:18   ` Roger Pau Monné
  2018-01-07 15:28     ` Anthony Liguori
  0 siblings, 1 reply; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-07  8:18 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Wei Liu, Anthony Liguori, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Matt Wilson, xen-devel

On Sat, Jan 06, 2018 at 02:54:25PM -0800, Anthony Liguori wrote:
> From: Anthony Liguori <aliguori@amazon.com>
> 
> Our intention is for the Vixen guest to be deprivileged so we need
> to avoid permitting access to each IRQ even though it is technically
> the hardware domain.

I'm still not sure I see why you need the vixen guest to be the
hardware_domain. On the pv-shim work we managed to make the domu !=
hardware_domain, and that seems to work just fine.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 11/22] vixen: early initialization of Vixen including shared_info mapping
  2018-01-06 22:54 ` [PATCH 11/22] vixen: early initialization of Vixen including shared_info mapping Anthony Liguori
@ 2018-01-07  8:23   ` Roger Pau Monné
  2018-01-07 15:33     ` Anthony Liguori
  0 siblings, 1 reply; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-07  8:23 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Wei Liu, Anthony Liguori, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Matt Wilson, xen-devel

On Sat, Jan 06, 2018 at 02:54:26PM -0800, Anthony Liguori wrote:
> From: Anthony Liguori <aliguori@amazon.com>
> 
> We split initialization of Vixen into two parts.  The first part
> just detects the presence of an HVM hypervisor so that we can
> figure out whether to modify the e820 table.
> 
> The later initialization is used to actually map the shared_info
> structure from the parent hypervisor into Xen.
> 
> Signed-off-by: Anthony Liguori <aliguori@amazon.com>
> ---
>  xen/arch/x86/guest/vixen.c        | 45 +++++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/setup.c              |  5 +++++
>  xen/include/asm-x86/guest/vixen.h |  4 ++++
>  3 files changed, 54 insertions(+)
> 
> diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
> index d82e68f..d8466ba 100644
> --- a/xen/arch/x86/guest/vixen.c
> +++ b/xen/arch/x86/guest/vixen.c
> @@ -20,8 +20,53 @@
>   */
>  
>  #include <asm/guest/vixen.h>
> +#include <public/version.h>
>  
>  static int in_vixen;
> +static uint8_t global_si_data[4 << 10] __attribute__((aligned(4096)));

The shared_info memory page gfn doesn't need to be populated, by doing
it like this you are wasting a domain's memory page.

> +static shared_info_t *global_si = (void *)global_si_data;
> +
> +void __init init_vixen(void)
> +{
> +    int major, minor, version;
> +
> +    if ( !xen_guest )
> +    {
> +        printk("Disabling Vixen because we are not running under Xen\n");
> +        in_vixen = -1;
> +        return;
> +    }
> +
> +    version = HYPERVISOR_xen_version(XENVER_version, NULL);
> +    major = version >> 16;
> +    minor = version & 0xffff;
> +
> +    printk("Vixen running under Xen %d.%d\n", major, minor);
> +
> +    in_vixen = 1;
> +}
> +
> +void __init early_vixen_init(void)
> +{
> +    struct xen_add_to_physmap xatp;
> +    long rc;
> +
> +    if ( !is_vixen() )
> +	return;
> +
> +    /* Setup our own shared info area */
> +    xatp.domid = DOMID_SELF;
> +    xatp.idx = 0;
> +    xatp.space = XENMAPSPACE_shared_info;
> +    xatp.gpfn = virt_to_mfn(global_si);
> +
> +    rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
> +    if ( rc < 0 )
> +        printk("Setting shared info page failed: %ld\n", rc);
> +
> +    memset(&global_si->native.evtchn_mask[0], 0x00,
> +           sizeof(global_si->native.evtchn_mask));

Hm, I'm not sure I like the approach of unmasking everything. IMHO I
would rather mask everything and unmask them when the guest actually
binds the event channel. That makes sure that an interrupt will get
injected when the event channel is unmasked (if there's an event
pending).
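
A minimal sketch of that ordering, not taken from either series: start
with every port masked, and only clear a port's mask bit at bind time,
reporting whether an event was already pending so the caller knows an
upcall has to be injected.  The struct and function names are
hypothetical, the state is a single 64-bit word per bitmap, and the
plain (non-atomic) bit operations are an assumption made to keep the
example short; a real implementation would need atomic accesses to the
shared page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_NR_PORTS 64u

/* Hypothetical stand-in for the pending/mask bitmaps in shared_info. */
struct example_evtchn_state {
    uint64_t pending;
    uint64_t mask;
};

/* Start of day: nothing pending, every port masked. */
static void example_evtchn_init(struct example_evtchn_state *s)
{
    s->pending = 0;
    s->mask = ~UINT64_C(0);
}

/*
 * An event arrives on 'port'.  It is always recorded as pending; the
 * return value says whether it may be delivered right now (port unmasked).
 */
static bool example_evtchn_raise(struct example_evtchn_state *s,
                                 unsigned int port)
{
    uint64_t bit;

    if ( port >= EXAMPLE_NR_PORTS )
        return false;

    bit = UINT64_C(1) << port;
    s->pending |= bit;

    return !(s->mask & bit);
}

/*
 * The guest binds 'port': unmask it, and report whether an event was
 * already pending so the caller can inject the upcall immediately.
 */
static bool example_evtchn_bind(struct example_evtchn_state *s,
                                unsigned int port)
{
    uint64_t bit;

    if ( port >= EXAMPLE_NR_PORTS )
        return false;

    bit = UINT64_C(1) << port;
    s->mask &= ~bit;

    return (s->pending & bit) != 0;
}
```

With this shape an event raised before the bind is held back rather than
lost, and it is signalled as soon as the port is unmasked.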

Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/22] vixen: Use SCHEDOP_shutdown to shutdown the machine
  2018-01-06 22:54 ` [PATCH 13/22] vixen: Use SCHEDOP_shutdown to shutdown the machine Anthony Liguori
@ 2018-01-07  8:27   ` Roger Pau Monné
  2018-01-07 15:35     ` Anthony Liguori
  0 siblings, 1 reply; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-07  8:27 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Wei Liu, Anthony Liguori, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Matt Wilson, xen-devel

On Sat, Jan 06, 2018 at 02:54:28PM -0800, Anthony Liguori wrote:
> From: Jan H. Schönherr <jschoenh@amazon.de>
> 
> While the hwdom_shutdown() is able to reboot the system, it fails to
> properly power it off. With SCHEDOP_shutdown, we delegate the problem.
> 
> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
> Signed-off-by: Anthony Liguori <aliguori@amazon.com>
> ---
>  xen/common/domain.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index b4d679e..ede377c 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -42,6 +42,7 @@
>  #include <xen/trace.h>
>  #include <xen/tmem.h>
>  #include <asm/setup.h>
> +#include <asm/guest/vixen.h>
>  
>  /* Linux config option: propageted to domain0 */
>  /* xen_processor_pmbits: xen control Cx, Px, ... */
> @@ -693,6 +694,17 @@ void __domain_crash_synchronous(void)
>  }
>  
>  
> +static void vixen_shutdown(u8 reason)
> +{
> +    struct sched_shutdown sched_shutdown = { .reason = reason };
> +
> +    if (!opt_noreboot)
> +        HYPERVISOR_sched_op(SCHEDOP_shutdown, &sched_shutdown);
> +
> +    /* Fallback, in case the hypercall fails */
> +    hwdom_shutdown(reason);
> +}
> + 
>  void domain_shutdown(struct domain *d, u8 reason)
>  {
>      struct vcpu *v;
> @@ -703,6 +715,8 @@ void domain_shutdown(struct domain *d, u8 reason)
>          d->shutdown_code = reason;
>      reason = d->shutdown_code;
>  
> +    if ( is_vixen() )
> +        vixen_shutdown(reason);

What happens with hypervisor triggered shutdowns? For pv-shim we
translated all hypervisor initiated shutdowns to crash requests, since
AFAICT they can only come from panics/BUGs/ASSERTs...
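
As a rough sketch of the translation described here (not the actual
pv-shim code): guest-initiated shutdowns keep their reason, while
anything the inner hypervisor triggers on its own is reported to L0 as
a crash.  The helper name and the guest_initiated flag are hypothetical;
the numeric values are assumed to mirror the SHUTDOWN_* codes in
public/sched.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: values mirror SHUTDOWN_* in public/sched.h. */
enum {
    EXAMPLE_SHUTDOWN_poweroff = 0,
    EXAMPLE_SHUTDOWN_reboot   = 1,
    EXAMPLE_SHUTDOWN_suspend  = 2,
    EXAMPLE_SHUTDOWN_crash    = 3,
};

/*
 * Pick the reason code to hand to the outer hypervisor.
 *
 * guest_initiated: true when the PV guest asked for the shutdown,
 *                  false when the inner Xen itself is going down
 *                  (panic, BUG, ASSERT, ...).
 */
static uint8_t example_l0_shutdown_reason(uint8_t reason, bool guest_initiated)
{
    return guest_initiated ? reason : (uint8_t)EXAMPLE_SHUTDOWN_crash;
}
```

The point of the split is that the toolstack on the host sees a crash,
rather than a clean reboot or poweroff, whenever the shim itself fails.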

Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 15/22] vixen: pass through version hypercalls to parent Xen
  2018-01-06 22:54 ` [PATCH 15/22] vixen: pass through version hypercalls to parent Xen Anthony Liguori
@ 2018-01-07  8:31   ` Roger Pau Monné
  2018-01-07 15:40     ` Anthony Liguori
  0 siblings, 1 reply; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-07  8:31 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Wei Liu, Anthony Liguori, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Matt Wilson, xen-devel

On Sat, Jan 06, 2018 at 02:54:30PM -0800, Anthony Liguori wrote:
> From: Anthony Liguori <aliguori@amazon.com>
> 
> This is necessary to trigger event channel upcalls but it is also

I'm lost here, what does version have to do with upcalls?

> useful to passthrough the full version information such that the
> guest believes it is running on the parent Xen.

In any case, I think this is wrong. The interface the guest sees is
the interface from vixen, not the interface of the L0. Hence reporting
the L0 version is not appropriate.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 16/22] vixen: pass grant table operations through to the outer Xen
  2018-01-06 22:54 ` [PATCH 16/22] vixen: pass grant table operations through to the outer Xen Anthony Liguori
@ 2018-01-07  8:36   ` Roger Pau Monné
  2018-01-07 15:42     ` Anthony Liguori
  0 siblings, 1 reply; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-07  8:36 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Wei Liu, Anthony Liguori, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Matt Wilson, xen-devel

On Sat, Jan 06, 2018 at 02:54:31PM -0800, Anthony Liguori wrote:
> From: Anthony Liguori <aliguori@amazon.com>
> 
> The grant table is a region of guest memory that contains GMFNs
> which in PV are MFNs but are PFNs in HVM.  Since a Vixen guest MFN
> is an HVM PFN, we can pass this table directly through to the outer
> Xen which cuts down considerably on overhead.
> 
> We do not forward most of the hypercalls since we only intend on
> Vixen to be used for normal guests, not driver domains.
> 
> Signed-off-by: Anthony Liguori <aliguori@amazon.com>
> ---
>  xen/common/grant_table.c | 131 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 131 insertions(+)
> 
> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> index 250450b..b302fd0 100644
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -39,6 +39,7 @@
>  #include <xen/vmap.h>
>  #include <xsm/xsm.h>
>  #include <asm/flushtlb.h>
> +#include <asm/guest.h>
>  
>  /* Per-domain grant information. */
>  struct grant_table {
> @@ -1199,6 +1200,9 @@ gnttab_map_grant_ref(
>      int i;
>      struct gnttab_map_grant_ref op;
>  
> +    if ( is_vixen() )
> +        return -ENOSYS;

Here and below: instead of adding all those is_vixen calls in a bunch
of gnttab functions, why don't you just replace the whole
do_grant_table_op function? That's cleaner and less intrusive.
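
To make the suggestion concrete, a small stand-alone sketch of the
shape being proposed: one check at the top-level entry point that
diverts to a Vixen-specific dispatcher, instead of an is_vixen() test
in each sub-op.  Everything here is hypothetical (the enum, the
placeholder return values, the example_ names); it models only the
control flow, not the real hypercall arguments.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of grant-table sub-ops, numbered for this sketch. */
enum example_gnttab_cmd {
    EXAMPLE_GNTTABOP_map_grant_ref,
    EXAMPLE_GNTTABOP_setup_table,
    EXAMPLE_GNTTABOP_query_size,
};

/* Placeholder results so the two paths can be told apart. */
#define EXAMPLE_NATIVE    0L
#define EXAMPLE_FORWARDED 1L

/* Stand-in for is_vixen(). */
static bool example_in_vixen;

/* Stand-in for the existing, unmodified sub-op handlers. */
static long example_native_op(enum example_gnttab_cmd cmd)
{
    (void)cmd;
    return EXAMPLE_NATIVE;
}

/* Stand-in for issuing the hypercall to the outer Xen. */
static long example_forward_to_l0(enum example_gnttab_cmd cmd)
{
    (void)cmd;
    return EXAMPLE_FORWARDED;
}

/* All Vixen policy lives here: forward a few ops, refuse the rest. */
static long example_vixen_grant_table_op(enum example_gnttab_cmd cmd)
{
    switch ( cmd )
    {
    case EXAMPLE_GNTTABOP_setup_table:
    case EXAMPLE_GNTTABOP_query_size:
        return example_forward_to_l0(cmd);

    default:
        return -ENOSYS;
    }
}

/* Single branch at the entry point; native handlers stay untouched. */
static long example_do_grant_table_op(enum example_gnttab_cmd cmd)
{
    if ( example_in_vixen )
        return example_vixen_grant_table_op(cmd);

    return example_native_op(cmd);
}
```

The list of forwarded versus refused operations then sits in one switch
statement, which is also easier to audit than scattered early returns.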

>  static long
> +vixen_gnttab_setup_table(
> +    XEN_GUEST_HANDLE_PARAM(gnttab_setup_table_t) uop, unsigned int count)
> +{
> +    long rc;
> +
> +    struct gnttab_setup_table op;
> +    xen_pfn_t *frame_list = NULL;
> +    static void *grant_table;
> +    XEN_GUEST_HANDLE(xen_pfn_t) old_frame_list;
> +
> +    if ( count != 1 )
> +        return -EINVAL;
> +
> +    if ( unlikely(copy_from_guest(&op, uop, 1) != 0) )
> +    {
> +        gdprintk(XENLOG_INFO, "Fault while reading gnttab_setup_table_t.\n");
> +        return -EFAULT;
> +    }
> +
> +    if ( grant_table == NULL ) {
> +        struct xen_add_to_physmap xatp;
> +        struct domain *d;
> +        int i;
> +
> +        for ( i = 0; i < max_grant_frames; i++ )
> +        {
> +             grant_table = alloc_xenheap_page();

This is wasting one memory page, grant table frames don't need to be
populated.

> +             BUG_ON(grant_table == NULL);
> +             xatp.domid = DOMID_SELF;
> +             xatp.idx = i;
> +             xatp.space = XENMAPSPACE_grant_table;
> +             xatp.gpfn = virt_to_mfn(grant_table);
> +             rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
> +             if ( rc != 0 )
> +                 printk("Add to physmap failed! %ld\n", rc);
> +
> +             d = rcu_lock_current_domain();
> +             share_xen_page_with_guest(mfn_to_page(xatp.gpfn), d, XENSHARE_writable);
> +             rcu_unlock_domain(d);
> +        }
> +    }
> +
> +    if ( op.nr_frames > 0 ) {
> +        frame_list = xzalloc_array(xen_pfn_t, op.nr_frames);
> +        if ( frame_list == NULL )
> +            return -ENOMEM;
> +    }
> +
> +    old_frame_list = op.frame_list;
> +    op.frame_list.p = frame_list;
> +
> +    rc = HYPERVISOR_grant_table_op(GNTTABOP_setup_table, &op, count);

On HVM you don't need to use the GNTTABOP_setup_table hypercall,
XENMEM_add_to_physmap already does all the needed setup AFAICT.

> +    op.frame_list = old_frame_list;
> +
> +    if ( rc >= 0 ) {
> +        if ( op.status == 0 && op.nr_frames &&
> +             copy_to_guest(old_frame_list, frame_list, op.nr_frames) != 0 ) {
> +            rc = -EFAULT;
> +            goto out;
> +        }
> +
> +        if ( unlikely(copy_to_guest(uop, &op, 1)) != 0 ) {
> +            rc = -EFAULT;
> +            goto out;
> +        }
> +    }
> +
> + out:
> +    xfree(frame_list);
> +
> +    return rc;
> +}
> +
> +static long
>  gnttab_setup_table(
>      XEN_GUEST_HANDLE_PARAM(gnttab_setup_table_t) uop, unsigned int count,
>      unsigned int limit_max)
> @@ -1811,6 +1895,9 @@ gnttab_setup_table(
>      struct grant_table *gt;
>      unsigned int i;
>  
> +    if ( is_vixen() )
> +        return vixen_gnttab_setup_table(uop, count);
> +
>      if ( count != 1 )
>          return -EINVAL;
>  
> @@ -1892,6 +1979,26 @@ gnttab_setup_table(
>  }
>  
>  static long
> +vixen_gnttab_query_size(
> +    XEN_GUEST_HANDLE_PARAM(gnttab_query_size_t) uop, unsigned int count)
> +{
> +    struct gnttab_query_size op;
> +    int rc;
> +
> +    if ( count != 1 )
> +        return -EINVAL;
> +
> +    if ( unlikely(copy_from_guest(&op, uop, 1)) != 0)
> +        return -EFAULT;
> +
> +    rc = HYPERVISOR_grant_table_op(GNTTABOP_query_size, &op, count);
> +    if (rc == 0 && unlikely(__copy_to_guest(uop, &op, 1)) )
           ^ nit: missing space

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 17/22] vixen: setup infrastructure to receive event channel notifications
  2018-01-06 22:54 ` [PATCH 17/22] vixen: setup infrastructure to receive event channel notifications Anthony Liguori
@ 2018-01-07  8:42   ` Roger Pau Monné
  2018-01-07 15:45     ` Anthony Liguori
  0 siblings, 1 reply; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-07  8:42 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Wei Liu, Anthony Liguori, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Matt Wilson, xen-devel

On Sat, Jan 06, 2018 at 02:54:32PM -0800, Anthony Liguori wrote:
> From: Anthony Liguori <aliguori@amazon.com>
> 
> This patch registers an interrupt handler using either an INTx
> interrupt from the platform PCI device, CALLBACK_IRQ vector
> delivery, or evtchn_upcall_vector depending on what the parent
> hypervisor supports.
> 
> The event channel polling code comes from Linux but uses the
> internal infrastructure for delivery.
> 
> Finally, this infrastructure has to be initialized per-VCPU so
> hook the appropriate place for that.
> 
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
> Signed-off-by: Anthony Liguori <aliguori@amazon.com>
> ---
>  xen/arch/x86/domain.c             |   3 +
>  xen/arch/x86/guest/vixen.c        | 264 ++++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/setup.c              |   3 +
>  xen/include/asm-x86/guest/vixen.h |   6 +
>  4 files changed, 276 insertions(+)
> 
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index da1bf1a..3e9c5be 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1147,6 +1147,9 @@ int arch_set_info_guest(
>  
>      update_cr3(v);
>  
> +    if ( is_vixen() )
> +        vixen_vcpu_initialize(v);
> +
>   out:
>      if ( flags & VGCF_online )
>          clear_bit(_VPF_down, &v->pause_flags);
> diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
> index 1816ece..76d9638 100644
> --- a/xen/arch/x86/guest/vixen.c
> +++ b/xen/arch/x86/guest/vixen.c
> @@ -21,10 +21,16 @@
>  
>  #include <asm/guest/vixen.h>
>  #include <public/version.h>
> +#include <xen/event.h>
> +#include <asm/apic.h>
>  
>  static int in_vixen;
>  static uint8_t global_si_data[4 << 10] __attribute__((aligned(4096)));
>  static shared_info_t *global_si = (void *)global_si_data;
> +static bool vixen_per_cpu_notifications = true;
> +static uint8_t vixen_evtchn_vector;
> +static bool vixen_needs_apic_ack = true;
> +struct irqaction vixen_irqaction;
>  
>  void __init init_vixen(void)
>  {
> @@ -94,3 +100,261 @@ u64 vixen_get_cpu_freq(void)
>  	return imm >> time.tsc_shift;
>      }
>  }
> +
> +/*
> + * Make a bitmask (i.e. unsigned long *) of a xen_ulong_t
> + * array. Primarily to avoid long lines (hence the terse name).
> + */
> +#define BM(x) (unsigned long *)(x)
> +/* Find the first set bit in a evtchn mask */
> +#define EVTCHN_FIRST_BIT(w) find_first_bit(BM(&(w)), BITS_PER_XEN_ULONG)
> +
> +/*
> + * Mask out the i least significant bits of w
> + */
> +#define MASK_LSBS(w, i) (w & ((~((xen_ulong_t)0UL)) << i))
> +
> +static DEFINE_PER_CPU(unsigned int, current_word_idx);
> +static DEFINE_PER_CPU(unsigned int, current_bit_idx);
> +
> +static inline xen_ulong_t active_evtchns(unsigned int cpu,
> +                                         shared_info_t *sh,
> +                                         unsigned int idx)
> +{
> +    return sh->native.evtchn_pending[idx] &
> +           ~sh->native.evtchn_mask[idx];
> +}
> +
> +static void vixen_evtchn_poll_one(size_t cpu)
> +{

All this seems overly complicated, especially taking into account that
vixen itself executes almost no code for each interrupt, since
they are forwarded to the guest. IMHO you could have a simpler event
channel loop without losing much performance or fairness (but I
haven't done much testing on that, so I could be wrong).
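
To make the suggestion concrete, here is a rough standalone sketch of
a flat two-level scan. It is a userspace model, not Xen code: struct
evtchn_model, simple_evtchn_scan() and WORD_BITS are names invented
for illustration, the output array stands in for the call to
evtchn_port_set_pending(), and __builtin_ctzl assumes GCC or Clang.
The real thing would need an atomic xchg on the selector and an atomic
clear_bit on the pending word.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits per selector/pending word in this model. */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical model of the 2-level event channel bitmaps. */
struct evtchn_model {
    unsigned long pending_sel;         /* one bit per word with pending ports */
    unsigned long pending[WORD_BITS];  /* one bit per port */
    unsigned long mask[WORD_BITS];     /* set bit = port is masked */
};

/*
 * Scan each selected word once, lowest port first, with no per-CPU
 * word/bit cursors. Unmasked pending ports are cleared and written to
 * ports[]. Words that could not be finished because ports[] filled up
 * stay selected. Returns the number of ports written.
 */
static size_t simple_evtchn_scan(struct evtchn_model *m,
                                 unsigned int *ports, size_t max_ports)
{
    unsigned long sel;
    size_t n = 0;

    assert(m != NULL && ports != NULL);

    sel = m->pending_sel;
    m->pending_sel = 0;                /* real code: atomic xchg */

    while (sel != 0 && n < max_ports) {
        unsigned int w = (unsigned int)__builtin_ctzl(sel);
        unsigned long bits = m->pending[w] & ~m->mask[w];

        while (bits != 0 && n < max_ports) {
            unsigned int b = (unsigned int)__builtin_ctzl(bits);

            bits &= bits - 1;                  /* drop lowest set bit */
            m->pending[w] &= ~(1UL << b);      /* real code: clear_bit() */
            ports[n++] = (unsigned int)(w * WORD_BITS + b);
        }

        if (bits == 0)
            sel &= sel - 1;                    /* word fully handled */
    }

    m->pending_sel |= sel;             /* unfinished words stay selected */
    return n;
}
```

This always starts from the lowest port, so it gives up the
round-robin fairness of the Linux-derived loop. Whether that matters
when every event is simply forwarded to the guest is the part I have
not measured.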

> +    shared_info_t *s = global_si;
> +    struct vcpu_info *vcpu_info = &s->native.vcpu_info[cpu];
> +    xen_ulong_t pending_words;
> +    xen_ulong_t pending_bits;
> +    int start_word_idx, start_bit_idx;
> +    int word_idx, bit_idx, i;
> +
> +    /*
> +     * Master flag must be cleared /before/ clearing
> +     * selector flag. xchg_xen_ulong must contain an
> +     * appropriate barrier.
> +     */
> +    pending_words = xchg(&vcpu_info->evtchn_pending_sel, 0);
> +
> +    start_word_idx = this_cpu(current_word_idx);
> +    start_bit_idx = this_cpu(current_bit_idx);
> +
> +    word_idx = start_word_idx;
> +
> +    for (i = 0; pending_words != 0; i++) {
> +        xen_ulong_t words;
> +
> +        words = MASK_LSBS(pending_words, word_idx);
> +
> +        /*
> +         * If we masked out all events, wrap to beginning.
> +         */
> +        if (words == 0) {
> +            word_idx = 0;
> +            bit_idx = 0;
> +            continue;
> +        }
> +        word_idx = EVTCHN_FIRST_BIT(words);
> +
> +        pending_bits = active_evtchns(cpu, s, word_idx);
> +        bit_idx = 0; /* usually scan entire word from start */
> +        /*
> +         * We scan the starting word in two parts.
> +         *
> +         * 1st time: start in the middle, scanning the
> +         * upper bits.
> +         *
> +         * 2nd time: scan the whole word (not just the
> +         * parts skipped in the first pass) -- if an
> +         * event in the previously scanned bits is
> +         * pending again it would just be scanned on
> +         * the next loop anyway.
> +         */
> +        if (word_idx == start_word_idx) {
> +            if (i == 0)
> +                bit_idx = start_bit_idx;
> +        }
> +
> +        do {
> +            struct evtchn *chn;
> +            xen_ulong_t bits;
> +            int port;
> +
> +            bits = MASK_LSBS(pending_bits, bit_idx);
> +
> +            /* If we masked out all events, move on. */
> +            if (bits == 0)
> +                break;
> +
> +            bit_idx = EVTCHN_FIRST_BIT(bits);
> +
> +            /* Process port. */
> +            port = (word_idx * BITS_PER_XEN_ULONG) + bit_idx;
> +
> +            chn = evtchn_from_port(hardware_domain, port);
> +            clear_bit(port, s->native.evtchn_pending);
> +            evtchn_port_set_pending(hardware_domain, chn->notify_vcpu_id, chn);
> +
> +            bit_idx = (bit_idx + 1) % BITS_PER_XEN_ULONG;
> +
> +            /* Next caller starts at last processed + 1 */
> +            this_cpu(current_word_idx) = bit_idx ? word_idx : (word_idx+1) % BITS_PER_XEN_ULONG;
> +            this_cpu(current_bit_idx) = bit_idx;
> +        } while (bit_idx != 0);
> +
> +        /* Scan start_l1i twice; all others once. */
> +        if ((word_idx != start_word_idx) || (i != 0))
> +            pending_words &= ~(1UL << word_idx);
> +
> +        word_idx = (word_idx + 1) % BITS_PER_XEN_ULONG;
> +    }
> +}
> +
> +static void vixen_upcall(int cpu)
> +{
> +    shared_info_t *s = global_si;
> +    struct vcpu_info *vcpu_info = &s->native.vcpu_info[cpu];
> +
> +    do {
> +        vcpu_info->evtchn_upcall_pending = 0;
> +        vixen_evtchn_poll_one(cpu);
> +    } while (vcpu_info->evtchn_upcall_pending);
> +}
> +
> +static void vixen_evtchn_notify(struct cpu_user_regs *regs)
> +{
> +    if (vixen_needs_apic_ack)
> +        ack_APIC_irq();
> +
> +    vixen_upcall(smp_processor_id());
> +}
> +
> +static void vixen_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
> +{
> +    vixen_upcall(smp_processor_id());
> +}
> +
> +static int hvm_set_parameter(int idx, uint64_t value)
> +{
> +    struct xen_hvm_param xhv;
> +    int r;
> +
> +    xhv.domid = DOMID_SELF;
> +    xhv.index = idx;
> +    xhv.value = value;
> +    r = HYPERVISOR_hvm_op(HVMOP_set_param, &xhv);
> +    if (r < 0) {
> +        printk("Cannot set hvm parameter %d: %d!\n",
> +               idx, r);
> +        return r;
> +    }
> +    return r;
> +}
> +
> +void vixen_vcpu_initialize(struct vcpu *v)
> +{
> +    struct xen_hvm_evtchn_upcall_vector upcall;
> +    long rc;
> +
> +    printk("VIXEN vcpu init VCPU%d\n", v->vcpu_id);
> +
> +    vcpu_pin_override(v, v->vcpu_id);
> +
> +    if (!vixen_needs_apic_ack)
> +        return;
> +
> +    printk("VIXEN vcpu init VCPU%d -- trying evtchn_upcall_vector\n", v->vcpu_id);
> +
> +    upcall.vcpu = v->vcpu_id;
> +    upcall.vector = vixen_evtchn_vector;
> +    rc = HYPERVISOR_hvm_op(HVMOP_set_evtchn_upcall_vector, &upcall);
> +    if ( rc )
> +    {
> +        struct xen_feature_info fi;
> +
> +        printk("VIXEN vcpu init VCPU%d -- trying hvm_callback_vector\n", v->vcpu_id);
> +
> +        fi.submap_idx = 0;
> +        rc = HYPERVISOR_xen_version(XENVER_get_features, &fi);
> +        if ( !rc )
> +        {
> +            rc = -EINVAL;
> +            if ( fi.submap & (1 << XENFEAT_hvm_callback_vector) )
> +            {
> +                rc = hvm_set_parameter(HVM_PARAM_CALLBACK_IRQ,
> +                                       ((uint64_t)HVM_PARAM_CALLBACK_TYPE_VECTOR << 56) | vixen_evtchn_vector);
> +            }
> +            if ( !rc )
> +                vixen_needs_apic_ack = false;
> +        }
> +    }
> +
> +    if ( rc )
> +    {
> +        int slot;
> +
> +        vixen_per_cpu_notifications = false;
> +
> +        printk("VIXEN vcpu init VCPU%d -- trying pci_intx_callback\n", v->vcpu_id);
> +        for (slot = 2; slot < 32; slot++) {

Coding style for braces and missing spaces in the condition, here and
below.

> +            uint16_t vendor, device;
> +
> +            vendor = pci_conf_read16(0, 0, slot, 0, PCI_VENDOR_ID);
> +            device = pci_conf_read16(0, 0, slot, 0, PCI_DEVICE_ID);
> +
> +            if (vendor == 0x5853 && device == 0x0001) {

Those values should be made defines and documented somewhere.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 18/22] vixen: Introduce ECS_PROXY for event channel proxying
  2018-01-06 22:54 ` [PATCH 18/22] vixen: Introduce ECS_PROXY for event channel proxying Anthony Liguori
@ 2018-01-07  8:44   ` Roger Pau Monné
  2018-01-07 15:46     ` Anthony Liguori
  0 siblings, 1 reply; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-07  8:44 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Wei Liu, Anthony Liguori, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Matt Wilson, xen-devel

On Sat, Jan 06, 2018 at 02:54:33PM -0800, Anthony Liguori wrote:
> From: Jan H. Schönherr <jschoenh@amazon.de>
> 
> Previously, we would keep proxied event channels as ECS_INTERDOMAIN
> channel around. This works for most things, but has the problem
> that EVTCHNOP_status is broken, and that EVTCHNOP_close does not
> mark an event channel as free.

Why not use ECS_RESERVED for event channels that are forwarded to L0?

You could easily see whether an event channel is forwarded or not just
by checking if it's ECS_RESERVED, and then decide whether to forward
the hypercall to L0 or handle it in vixen.
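
Roughly what I have in mind, as a standalone sketch rather than a
patch: the enum values below only mirror the idea of the ECS_* states,
and struct chn, enum route and route_evtchn_op() are hypothetical
names, not existing Xen interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel states, modelled loosely on ECS_*. */
enum chn_state { ST_FREE, ST_RESERVED, ST_INTERDOMAIN };

struct chn {
    enum chn_state state;
};

/* Where an event channel hypercall on a given port should go. */
enum route { ROUTE_LOCAL, ROUTE_FORWARD_L0 };

/*
 * Ports marked reserved are the ones proxied to L0, so operations on
 * them are forwarded. Everything else, including out-of-range ports,
 * is left to the local code, which can reject it as usual.
 */
static enum route route_evtchn_op(const struct chn *table,
                                  unsigned int nr_ports,
                                  unsigned int port)
{
    assert(table != NULL);

    if (port >= nr_ports)
        return ROUTE_LOCAL;

    return table[port].state == ST_RESERVED ? ROUTE_FORWARD_L0
                                            : ROUTE_LOCAL;
}
```

The point is only that the existing state field would be enough to
make the decision, without adding a new state.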

Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 22/22] vixen: dom0 builder support
  2018-01-06 22:54 ` [PATCH 22/22] vixen: dom0 builder support Anthony Liguori
  2018-01-07  0:24   ` Matt Wilson
@ 2018-01-07  9:02   ` Roger Pau Monné
  2018-01-07 15:52     ` Anthony Liguori
  2018-01-08 18:22   ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-07  9:02 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Wei Liu, Anthony Liguori, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Matt Wilson, xen-devel

On Sat, Jan 06, 2018 at 02:54:37PM -0800, Anthony Liguori wrote:
> From: Anthony Liguori <aliguori@amazon.com>
> 
> The dom0 builder requires a number of modifications in order to be
> able to launch unprivileged guests.  The console and store pages
> must be mapped in a specific location within the guest's initial
> page table.
> 
> We also have to setup the start info to be what's expected for
> unprivileged guests and supress the normal logic to give dom0
> increased permissions.
> 
> We have to pass around the console and store pages which involves
> touching a number of places including the PVH builder.

AFAICT you are missing a fix for the position of the m2p mapping in
the hypervisor virtual memory hole for 32bit PV guests [0].

The 32bit DomU ABI mandates that the m2p always be mapped at
virt_hv_start_low. Without this fix that ABI is broken, and some early
Linux pvops kernels will fail to boot (IIRC from 2.6.32-2.6.36, because
they don't have XENMEM_machphys_mapping implemented).

[0] http://xenbits.xen.org/gitweb/?p=people/liuw/xen.git;a=commit;h=28b2108b362e8976676a96c90eee058605427b57

> @@ -276,7 +277,9 @@ int __init dom0_construct_pv(struct domain *d,
>                               unsigned long image_headroom,
>                               module_t *initrd,
>                               void *(*bootstrap_map)(const module_t *),
> -                             char *cmdline)
> +                             char *cmdline,
> +                             xen_pfn_t store_mfn, uint32_t store_evtchn,
> +                             xen_pfn_t console_mfn, uint32_t console_evtchn)
>  {
>      int i, cpu, rc, compatible, compat32, order, machine;
>      struct cpu_user_regs *regs;
> @@ -299,6 +302,7 @@ int __init dom0_construct_pv(struct domain *d,
>      l3_pgentry_t *l3tab = NULL, *l3start = NULL;
>      l2_pgentry_t *l2tab = NULL, *l2start = NULL;
>      l1_pgentry_t *l1tab = NULL, *l1start = NULL;
> +    xen_pfn_t saved_pfn = ~0UL;
>  
>      /*
>       * This fully describes the memory layout of the initial domain. All
> @@ -441,8 +445,24 @@ int __init dom0_construct_pv(struct domain *d,
>          vphysmap_end = vphysmap_start;
>      vstartinfo_start = round_pgup(vphysmap_end);
>      vstartinfo_end   = (vstartinfo_start +
> -                        sizeof(struct start_info) +
> -                        sizeof(struct dom0_vga_console_info));
> +                        sizeof(struct start_info));
> +    if ( !is_vixen() )
> +        vstartinfo_end += sizeof(struct dom0_vga_console_info);
> +    vstartinfo_end   = round_pgup(vstartinfo_end);
> +
> +    if ( is_vixen() ) {
> +        struct page_info *pg;
> +
> +        saved_pfn = (vstartinfo_end - v_start) / PAGE_SIZE;
> +
> +        pg = mfn_to_page(store_mfn);
> +        share_xen_page_with_guest(pg, d, XENSHARE_writable);
> +        vstartinfo_end   += PAGE_SIZE;
> +
> +        pg = mfn_to_page(console_mfn);
> +        share_xen_page_with_guest(pg, d, XENSHARE_writable);
> +        vstartinfo_end   += PAGE_SIZE;
> +    }
>  
>      vpt_start        = round_pgup(vstartinfo_end);
>      for ( nr_pt_pages = 2; ; nr_pt_pages++ )
> @@ -634,7 +654,13 @@ int __init dom0_construct_pv(struct domain *d,
>              *l2tab = l2e_from_paddr(__pa(l1start), L2_PROT);
>              l2tab++;
>          }
> -        if ( count < initrd_pfn || count >= initrd_pfn + PFN_UP(initrd_len) )
> +        if ( count == saved_pfn ) {
> +            mfn = store_mfn;
> +            pfn++;
> +        } else if ( count == saved_pfn + 1 ) {
> +            mfn = console_mfn;
> +            pfn++;
> +        } else if ( count < initrd_pfn || count >= initrd_pfn + PFN_UP(initrd_len) )
>              mfn = pfn++;

IMHO it's easier to do this fixup afterwards [1] instead of having to
modify the Dom0 build process in different places (the Dom0 PV
building code is already messy enough).

[1] http://xenbits.xen.org/gitweb/?p=people/liuw/xen.git;a=commit;h=a38ce82113223e3c5119590c520cc30c8462e709

>          else
>              mfn = initrd_mfn++;
> @@ -737,7 +763,8 @@ int __init dom0_construct_pv(struct domain *d,
>  
>      si->shared_info = virt_to_maddr(d->shared_info);
>  
> -    si->flags        = SIF_PRIVILEGED | SIF_INITDOMAIN;
> +    si->flags        = is_vixen() ? 0 : (SIF_PRIVILEGED | SIF_INITDOMAIN);
> +
>      if ( !vinitrd_start && initrd_len )
>          si->flags   |= SIF_MOD_START_PFN;
>      si->flags       |= (xen_processor_pmbits << 8) & SIF_PM_MASK;
> @@ -818,6 +845,32 @@ int __init dom0_construct_pv(struct domain *d,
>          }
>      }
>  
> +    if ( is_vixen() )
> +    {
> +        dom0_update_physmap(d, saved_pfn, store_mfn, vphysmap_start);
> +        dom0_update_physmap(d, saved_pfn + 1, console_mfn, vphysmap_start);
> +
> +        rc = evtchn_alloc_proxy(d, store_evtchn, ECS_INTERDOMAIN);
> +        if ( rc )
> +        {
> +            printk("Vixen: failed to reserve Xenstore event channel %d => %d\n",
> +                   store_evtchn, rc);
> +            goto out;
> +        }
> +        rc = evtchn_alloc_proxy(d, console_evtchn, ECS_INTERDOMAIN);
> +        if ( rc )
> +        {
> +            printk("Vixen: failed to reserve Console event channel %d => %d\n",
> +                   console_evtchn, rc);
> +            goto out;
> +        }

IMHO you could just panic here. Nothing useful is going to happen
after dom0_construct_pv fails, and you avoid the goto.

> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index 1b89844..c49eeea 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -663,6 +663,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>          .stop_bits = 1
>      };
>      struct xen_arch_domainconfig config = { .emulation_flags = 0 };
> +    xen_pfn_t store_mfn = 0, console_mfn = 0;
> +    uint32_t store_evtchn = 0, console_evtchn = 0;
>  
>      /* Critical region without IDT or TSS.  Any fault is deadly! */
>  
> @@ -1595,6 +1597,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>          config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC;
>      }
>  
> +    if ( is_vixen() )
> +        config.emulation_flags = XEN_X86_EMU_PIT;

DomUs should not have an emulated PIT; that's only for Dom0 PV.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 09/22] vixen: modify the e820 table to advertise HVM special pages as RAM
  2018-01-07  8:16   ` Roger Pau Monné
@ 2018-01-07 15:27     ` Anthony Liguori
  2018-01-08  9:51       ` Roger Pau Monné
  0 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-07 15:27 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori, xen-devel

On Sun, Jan 7, 2018 at 12:16 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> On Sat, Jan 06, 2018 at 02:54:24PM -0800, Anthony Liguori wrote:
>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>> index a56f875..935901b 100644
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -122,6 +122,7 @@
>>  #include <asm/fixmap.h>
>>  #include <asm/io_apic.h>
>>  #include <asm/pci.h>
>> +#include <asm/guest.h>
>>
>>  #include <asm/hvm/grant_table.h>
>>  #include <asm/pv/grant_table.h>
>> @@ -945,7 +946,7 @@ get_page_from_l1e(
>>              case 0:
>>                  break;
>>              case 1:
>> -                if ( !is_hardware_domain(l1e_owner) )
>> +                if ( !is_vixen() && !is_hardware_domain(l1e_owner) )
>>                      break;
>>                  /* fallthrough */
>>              case -1:
>> @@ -5536,6 +5537,21 @@ void arch_dump_shared_mem_info(void)
>>              mem_sharing_get_nr_saved_mfns());
>>  }
>>
>> +const unsigned long *__init
>> +vixen_get_platform_badpages(unsigned int *array_size)
>> +{
>> +    static unsigned long __initdata bad_pages[] = {
>> +        0xfeffc000,
>> +        0xfeffd000,
>> +        0xfeffe000,
>> +        0xfefff000,
>
> This values shouldn't be hardcoded. IMHO it would also be good to
> place all the vixen_ helpers in a single file.

Ack on moving to a helper.

I don't know of a way to call the hypervisor to ask "what's the
special page range?".  I can find special pages via the hvm get
parameters calls but there's no guarantee they are contiguous, so the
resulting code to punch holes in the e820 becomes fairly complex.  Any
ideas how to do this nicely?
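
For reference, the bookkeeping half of the problem might look like the
untested userspace sketch below: collect the page frames reported by
the parameters, sort them, and merge neighbours into ranges that could
then be applied to the e820 one at a time. struct pfn_range and
coalesce_pfns() are made-up names, and qsort() is only available
because this is a standalone model; inside Xen a different sort helper
would be needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Half-open range [start, end) of page frame numbers. */
struct pfn_range {
    uint64_t start;
    uint64_t end;
};

/* qsort comparator for uint64_t values, ascending. */
static int cmp_pfn(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;

    return (x > y) - (x < y);
}

/*
 * Sort pfns[] in place and merge duplicate or adjacent frames into
 * ranges. Writes at most max ranges to out[] and returns how many
 * were written.
 */
static size_t coalesce_pfns(uint64_t *pfns, size_t n,
                            struct pfn_range *out, size_t max)
{
    size_t i, nr = 0;

    assert(pfns != NULL && out != NULL);

    qsort(pfns, n, sizeof(*pfns), cmp_pfn);

    for (i = 0; i < n; i++) {
        if (nr > 0 && pfns[i] <= out[nr - 1].end) {
            /* Adjacent frame extends the range; duplicates are skipped. */
            if (pfns[i] == out[nr - 1].end)
                out[nr - 1].end++;
            continue;
        }

        if (nr == max)
            break;                     /* no room for another range */

        out[nr].start = pfns[i];
        out[nr].end = pfns[i] + 1;
        nr++;
    }

    return nr;
}
```

With the four frames hardcoded in the patch (0xfeffc to 0xfefff) this
yields a single range, so the complexity would only show up if a
parent hypervisor scattered the pages.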

Regards,

Anthony Liguori

> Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/22] vixen: do not permit access to physical IRQs if in Vixen mode
  2018-01-07  8:18   ` Roger Pau Monné
@ 2018-01-07 15:28     ` Anthony Liguori
  0 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-07 15:28 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori, xen-devel

On Sun, Jan 7, 2018 at 12:18 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> On Sat, Jan 06, 2018 at 02:54:25PM -0800, Anthony Liguori wrote:
>> From: Anthony Liguori <aliguori@amazon.com>
>>
>> Our intention is for the Vixen guest to be deprivileged so we need
>> to avoid permitting access to each IRQ even though it is technically
>> the hardware domain.
>
> I'm still not sure I see why you need the vixen guest to be the
> hardware_domain. On the pv-shim work we managed to make the domu !=
> hardware_domain, and that seems to work just fine.

I just haven't tried it yet.  I'll take a shot today and see how hard it is.

Regards,

Anthony Liguori

> Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 11/22] vixen: early initialization of Vixen including shared_info mapping
  2018-01-07  8:23   ` Roger Pau Monné
@ 2018-01-07 15:33     ` Anthony Liguori
  2018-01-08  9:55       ` Roger Pau Monné
  0 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-07 15:33 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori, xen-devel

On Sun, Jan 7, 2018 at 12:23 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> On Sat, Jan 06, 2018 at 02:54:26PM -0800, Anthony Liguori wrote:
>> From: Anthony Liguori <aliguori@amazon.com>
>>
>> We split initialization of Vixen into two parts.  The first part
>> just detects the presence of an HVM hypervisor so that we can
>> figure out whether to modify the e820 table.
>>
>> The later initialization is used to actually map the shared_info
>> structure from the parent hypervisor into Xen.
>>
>> Signed-off-by: Anthony Liguori <aliguori@amazon.com>
>> ---
>>  xen/arch/x86/guest/vixen.c        | 45 +++++++++++++++++++++++++++++++++++++++
>>  xen/arch/x86/setup.c              |  5 +++++
>>  xen/include/asm-x86/guest/vixen.h |  4 ++++
>>  3 files changed, 54 insertions(+)
>>
>> diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
>> index d82e68f..d8466ba 100644
>> --- a/xen/arch/x86/guest/vixen.c
>> +++ b/xen/arch/x86/guest/vixen.c
>> @@ -20,8 +20,53 @@
>>   */
>>
>>  #include <asm/guest/vixen.h>
>> +#include <public/version.h>
>>
>>  static int in_vixen;
>> +static uint8_t global_si_data[4 << 10] __attribute__((aligned(4096)));
>
> The shared_info memory page gfn doesn't need to be populated, by doing
> it like this you are wasting a domain's memory page.

Right, Andy previously gave me this feedback.  It's been on my TODO
list but I just haven't gotten there.  I'll take a look at the PVShim
tree and see if there's something I can cherry pick for better shared
info handling.

>> +    rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
>> +    if ( rc < 0 )
>> +        printk("Setting shared info page failed: %ld\n", rc);
>> +
>> +    memset(&global_si->native.evtchn_mask[0], 0x00,
>> +           sizeof(global_si->native.evtchn_mask));
>
> Hm, I'm not sure I like to approach of unmasking everything. IMHO I
> would rather mask everything and unmask them when the guest actually
> binds the event channel. That makes sure that an interrupt will get
> injected when the event channel is unmasked (if there's an event
> pending).

This is done in hvmloader and we discovered that guests rely on it.
See hvmloader/xenbus.c:xenbus_shutdown().

Regards,

Anthony Liguori

> Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/22] vixen: Use SCHEDOP_shutdown to shutdown the machine
  2018-01-07  8:27   ` Roger Pau Monné
@ 2018-01-07 15:35     ` Anthony Liguori
  0 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-07 15:35 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori, xen-devel

On Sun, Jan 7, 2018 at 12:27 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> On Sat, Jan 06, 2018 at 02:54:28PM -0800, Anthony Liguori wrote:
>> From: Jan H. Schönherr <jschoenh@amazon.de>
>>
>> While the hwdom_shutdown() is able to reboot the system, it fails to
>> properly power it off. With SCHEDOP_shutdown, we delegate the problem.
>>
>> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
>> Signed-off-by: Anthony Liguori <aliguori@amazon.com>
>> ---
>>  xen/common/domain.c | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index b4d679e..ede377c 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -42,6 +42,7 @@
>>  #include <xen/trace.h>
>>  #include <xen/tmem.h>
>>  #include <asm/setup.h>
>> +#include <asm/guest/vixen.h>
>>
>>  /* Linux config option: propageted to domain0 */
>>  /* xen_processor_pmbits: xen control Cx, Px, ... */
>> @@ -693,6 +694,17 @@ void __domain_crash_synchronous(void)
>>  }
>>
>>
>> +static void vixen_shutdown(u8 reason)
>> +{
>> +    struct sched_shutdown sched_shutdown = { .reason = reason };
>> +
>> +    if (!opt_noreboot)
>> +        HYPERVISOR_sched_op(SCHEDOP_shutdown, &sched_shutdown);
>> +
>> +    /* Fallback, in case the hypercall fails */
>> +    hwdom_shutdown(reason);
>> +}
>> +
>>  void domain_shutdown(struct domain *d, u8 reason)
>>  {
>>      struct vcpu *v;
>> @@ -703,6 +715,8 @@ void domain_shutdown(struct domain *d, u8 reason)
>>          d->shutdown_code = reason;
>>      reason = d->shutdown_code;
>>
>> +    if ( is_vixen() )
>> +        vixen_shutdown(reason);
>
> What happens with hypervisor triggered shutdowns? For pv-shim we
> translated all hypervisor initiated shutdowns to crash requests, since
> AFAICT they can only come from panics/BUGs/ASSERTs...

If a guest attempts to gracefully shut down (shutdown -h now), then
without this change, the vixen domain will shut down but the
hypervisor will sit in the idle domain.

With this change, the hypervisor powers off (or restarts depending on
the reason).

An internal BUG() will reset the hypervisor.

Regards,

Anthony Liguori

> Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 15/22] vixen: pass through version hypercalls to parent Xen
  2018-01-07  8:31   ` Roger Pau Monné
@ 2018-01-07 15:40     ` Anthony Liguori
  2018-01-07 15:55       ` Andrew Cooper
  2018-01-08  9:36       ` Jan Beulich
  0 siblings, 2 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-07 15:40 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori, xen-devel

On Sun, Jan 7, 2018 at 12:31 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> On Sat, Jan 06, 2018 at 02:54:30PM -0800, Anthony Liguori wrote:
>> From: Anthony Liguori <aliguori@amazon.com>
>>
>> This is necessary to trigger event channel upcalls but it is also
>
> I'm lost here, what does version have to do with upcalls?

In Linux, xen_force_evtchn_callback() does HYPERVISOR_xen_version(0,
NULL).  This is done when IRQs are re-enabled after being disabled, to
trigger a check for pending events.

I'm not 100% confident that it's necessary to pass this all the way
through to the parent Xen but it seemed like the right thing to do
since we need the parent to update pending events in order for the
events in Vixen to get updated.

>> useful to passthrough the full version information such that the
>> guest believes it is running on the parent Xen.
>
> In any case, I think this is wrong. The interface the guest sees is
> the interface from vixen, not the interface of the L0. Hence reporting
> the L0 version is not appropriate.

I think it depends on what you want.  We were aiming for maximum
compatibility and many users trigger behavior from Xen version for
better or worse.

Happy to make this optional if this isn't universally desired.

Regards,

Anthony Liguori

> Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 16/22] vixen: pass grant table operations through to the outer Xen
  2018-01-07  8:36   ` Roger Pau Monné
@ 2018-01-07 15:42     ` Anthony Liguori
  2018-01-07 16:45       ` Andrew Cooper
  2018-01-08 10:05       ` Roger Pau Monné
  0 siblings, 2 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-07 15:42 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori, xen-devel

On Sun, Jan 7, 2018 at 12:36 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> On Sat, Jan 06, 2018 at 02:54:31PM -0800, Anthony Liguori wrote:
>> From: Anthony Liguori <aliguori@amazon.com>
>>
>> The grant table is a region of guest memory that contains GMFNs
>> which in PV are MFNs but are PFNs in HVM.  Since a Vixen guest MFN
>> is an HVM PFN, we can pass this table directly through to the outer
>> Xen which cuts down considerably on overhead.
>>
>> We do not forward most of the hypercalls since we only intend on
>> Vixen to be used for normal guests, not driver domains.
>>
>> Signed-off-by: Anthony Liguori <aliguori@amazon.com>
>> ---
>>  xen/common/grant_table.c | 131 +++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 131 insertions(+)
>>
>> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
>> index 250450b..b302fd0 100644
>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
>> @@ -39,6 +39,7 @@
>>  #include <xen/vmap.h>
>>  #include <xsm/xsm.h>
>>  #include <asm/flushtlb.h>
>> +#include <asm/guest.h>
>>
>>  /* Per-domain grant information. */
>>  struct grant_table {
>> @@ -1199,6 +1200,9 @@ gnttab_map_grant_ref(
>>      int i;
>>      struct gnttab_map_grant_ref op;
>>
>> +    if ( is_vixen() )
>> +        return -ENOSYS;
>
> Here and below: instead of adding all those is_vixen calls in a bunch
> of gnttab functions, why don't you just replace the whole
> do_grant_table_op function? That's cleaner and less intrusive.

Ack.  That's what we did for event channels and I like it better too.
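
Roughly the shape this would take, as a sketch only: a single check at
the hypercall entry point instead of one per sub-operation.  Every name
below (running_as_shim, shim_grant_table_op, the CMD_* values) is
hypothetical and the numbering is not the real ABI; the stubs just model
the return codes.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Grant table sub-operations; numbering is illustrative, not the ABI's. */
enum gnttab_cmd {
    CMD_MAP_GRANT_REF,
    CMD_UNMAP_GRANT_REF,
    CMD_SETUP_TABLE,
    CMD_QUERY_SIZE,
};

/* Hypothetical flag standing in for is_vixen(). */
static bool running_as_shim;

/* Stub for "forward this operation to the outer hypervisor". */
static long shim_forward(enum gnttab_cmd cmd, void *uop, unsigned int count)
{
    (void)cmd;
    (void)uop;
    return count == 1 ? 0 : -EINVAL;
}

/* Stub for the unmodified native implementation. */
static long native_grant_table_op(enum gnttab_cmd cmd, void *uop,
                                  unsigned int count)
{
    (void)cmd;
    (void)uop;
    (void)count;
    return 0;
}

/* Shim path: forward the two operations a normal guest needs. */
static long shim_grant_table_op(enum gnttab_cmd cmd, void *uop,
                                unsigned int count)
{
    switch (cmd) {
    case CMD_SETUP_TABLE:
    case CMD_QUERY_SIZE:
        return shim_forward(cmd, uop, count);
    default:
        return -ENOSYS;   /* driver-domain operations are not offered */
    }
}

/* Single decision point at the hypercall entry. */
static long grant_table_op(enum gnttab_cmd cmd, void *uop, unsigned int count)
{
    if (running_as_shim)
        return shim_grant_table_op(cmd, uop, count);
    return native_grant_table_op(cmd, uop, count);
}
```

That keeps the native gnttab_* functions untouched and makes the set of
forwarded operations visible in one switch statement.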

>>  static long
>> +vixen_gnttab_setup_table(
>> +    XEN_GUEST_HANDLE_PARAM(gnttab_setup_table_t) uop, unsigned int count)
>> +{
>> +    long rc;
>> +
>> +    struct gnttab_setup_table op;
>> +    xen_pfn_t *frame_list = NULL;
>> +    static void *grant_table;
>> +    XEN_GUEST_HANDLE(xen_pfn_t) old_frame_list;
>> +
>> +    if ( count != 1 )
>> +        return -EINVAL;
>> +
>> +    if ( unlikely(copy_from_guest(&op, uop, 1) != 0) )
>> +    {
>> +        gdprintk(XENLOG_INFO, "Fault while reading gnttab_setup_table_t.\n");
>> +        return -EFAULT;
>> +    }
>> +
>> +    if ( grant_table == NULL ) {
>> +        struct xen_add_to_physmap xatp;
>> +        struct domain *d;
>> +        int i;
>> +
>> +        for ( i = 0; i < max_grant_frames; i++ )
>> +        {
>> +             grant_table = alloc_xenheap_page();
>
> This is wasting one memory page, grant table frames don't need to be
> populated.

Well they have to have a valid struct page_info in order for the guest
to map it within its address space.

Or did you have something else in mind?

>> +             BUG_ON(grant_table == NULL);
>> +             xatp.domid = DOMID_SELF;
>> +             xatp.idx = i;
>> +             xatp.space = XENMAPSPACE_grant_table;
>> +             xatp.gpfn = virt_to_mfn(grant_table);
>> +             rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
>> +             if ( rc != 0 )
>> +                 printk("Add to physmap failed! %ld\n", rc);
>> +
>> +             d = rcu_lock_current_domain();
>> +             share_xen_page_with_guest(mfn_to_page(xatp.gpfn), d, XENSHARE_writable);
>> +             rcu_unlock_domain(d);
>> +        }
>> +    }
>> +
>> +    if ( op.nr_frames > 0 ) {
>> +        frame_list = xzalloc_array(xen_pfn_t, op.nr_frames);
>> +        if ( frame_list == NULL )
>> +            return -ENOMEM;
>> +    }
>> +
>> +    old_frame_list = op.frame_list;
>> +    op.frame_list.p = frame_list;
>> +
>> +    rc = HYPERVISOR_grant_table_op(GNTTABOP_setup_table, &op, count);
>
> On HVM you don't need to use the GNTTABOP_setup_table hypercall,
> XENMEM_add_to_physmap already does all the needed setup AFAICT.

I'll double check this, thanks.

>> +    op.frame_list = old_frame_list;
>> +
>> +    if ( rc >= 0 ) {
>> +        if ( op.status == 0 && op.nr_frames &&
>> +             copy_to_guest(old_frame_list, frame_list, op.nr_frames) != 0 ) {
>> +            rc = -EFAULT;
>> +            goto out;
>> +        }
>> +
>> +        if ( unlikely(copy_to_guest(uop, &op, 1)) != 0 ) {
>> +            rc = -EFAULT;
>> +            goto out;
>> +        }
>> +    }
>> +
>> + out:
>> +    xfree(frame_list);
>> +
>> +    return rc;
>> +}
>> +
>> +static long
>>  gnttab_setup_table(
>>      XEN_GUEST_HANDLE_PARAM(gnttab_setup_table_t) uop, unsigned int count,
>>      unsigned int limit_max)
>> @@ -1811,6 +1895,9 @@ gnttab_setup_table(
>>      struct grant_table *gt;
>>      unsigned int i;
>>
>> +    if ( is_vixen() )
>> +        return vixen_gnttab_setup_table(uop, count);
>> +
>>      if ( count != 1 )
>>          return -EINVAL;
>>
>> @@ -1892,6 +1979,26 @@ gnttab_setup_table(
>>  }
>>
>>  static long
>> +vixen_gnttab_query_size(
>> +    XEN_GUEST_HANDLE_PARAM(gnttab_query_size_t) uop, unsigned int count)
>> +{
>> +    struct gnttab_query_size op;
>> +    int rc;
>> +
>> +    if ( count != 1 )
>> +        return -EINVAL;
>> +
>> +    if ( unlikely(copy_from_guest(&op, uop, 1)) != 0)
>> +        return -EFAULT;
>> +
>> +    rc = HYPERVISOR_grant_table_op(GNTTABOP_query_size, &op, count);
>> +    if (rc == 0 && unlikely(__copy_to_guest(uop, &op, 1)) )
>            ^ nit: missing space

Ack.

Regards,

Anthony Liguori

> Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 17/22] vixen: setup infrastructure to receive event channel notifications
  2018-01-07  8:42   ` Roger Pau Monné
@ 2018-01-07 15:45     ` Anthony Liguori
  0 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-07 15:45 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori, xen-devel

On Sun, Jan 7, 2018 at 12:42 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> On Sat, Jan 06, 2018 at 02:54:32PM -0800, Anthony Liguori wrote:
>> From: Anthony Liguori <aliguori@amazon.com>
>>
>> This patch registers an interrupt handler using either an INTx
>> interrupt from the platform PCI device, CALLBACK_IRQ vector
>> delivery, or evtchn_upcall_vector depending on what the parent
>> hypervisor supports.
>>
>> The event channel polling code comes from Linux but uses the
>> internal infrastructure for delivery.
>>
>> Finally, this infrastructure has to be initialized per-VCPU so
>> hook the appropriate place for that.
>>
>> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
>> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
>> Signed-off-by: Anthony Liguori <aliguori@amazon.com>
>> ---
>>  xen/arch/x86/domain.c             |   3 +
>>  xen/arch/x86/guest/vixen.c        | 264 ++++++++++++++++++++++++++++++++++++++
>>  xen/arch/x86/setup.c              |   3 +
>>  xen/include/asm-x86/guest/vixen.h |   6 +
>>  4 files changed, 276 insertions(+)
>>
>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>> index da1bf1a..3e9c5be 100644
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1147,6 +1147,9 @@ int arch_set_info_guest(
>>
>>      update_cr3(v);
>>
>> +    if ( is_vixen() )
>> +        vixen_vcpu_initialize(v);
>> +
>>   out:
>>      if ( flags & VGCF_online )
>>          clear_bit(_VPF_down, &v->pause_flags);
>> diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
>> index 1816ece..76d9638 100644
>> --- a/xen/arch/x86/guest/vixen.c
>> +++ b/xen/arch/x86/guest/vixen.c
>> @@ -21,10 +21,16 @@
>>
>>  #include <asm/guest/vixen.h>
>>  #include <public/version.h>
>> +#include <xen/event.h>
>> +#include <asm/apic.h>
>>
>>  static int in_vixen;
>>  static uint8_t global_si_data[4 << 10] __attribute__((aligned(4096)));
>>  static shared_info_t *global_si = (void *)global_si_data;
>> +static bool vixen_per_cpu_notifications = true;
>> +static uint8_t vixen_evtchn_vector;
>> +static bool vixen_needs_apic_ack = true;
>> +struct irqaction vixen_irqaction;
>>
>>  void __init init_vixen(void)
>>  {
>> @@ -94,3 +100,261 @@ u64 vixen_get_cpu_freq(void)
>>       return imm >> time.tsc_shift;
>>      }
>>  }
>> +
>> +/*
>> + * Make a bitmask (i.e. unsigned long *) of a xen_ulong_t
>> + * array. Primarily to avoid long lines (hence the terse name).
>> + */
>> +#define BM(x) (unsigned long *)(x)
>> +/* Find the first set bit in a evtchn mask */
>> +#define EVTCHN_FIRST_BIT(w) find_first_bit(BM(&(w)), BITS_PER_XEN_ULONG)
>> +
>> +/*
>> + * Mask out the i least significant bits of w
>> + */
>> +#define MASK_LSBS(w, i) (w & ((~((xen_ulong_t)0UL)) << i))
>> +
>> +static DEFINE_PER_CPU(unsigned int, current_word_idx);
>> +static DEFINE_PER_CPU(unsigned int, current_bit_idx);
>> +
>> +static inline xen_ulong_t active_evtchns(unsigned int cpu,
>> +                                         shared_info_t *sh,
>> +                                         unsigned int idx)
>> +{
>> +    return sh->native.evtchn_pending[idx] &
>> +           ~sh->native.evtchn_mask[idx];
>> +}
>> +
>> +static void vixen_evtchn_poll_one(size_t cpu)
>> +{
>
> All this seems overly complicated, especially taking into account that
> vixen itself doesn't execute almost any code for each interrupt, since
> they are forwarded to the guest. IMHO you could have a simpler event
> channel loop without losing much performance or fairness (but I
> haven't done much testing regarding that, so I could be wrong).

We started with something much simpler.  You may have seen earlier
versions of that.  We really struggled with delivery of events for !VCPU0
particularly when using INTx callbacks.  We never quite got it working
100% reliably.

Ultimately, when we switched to this implementation, we were able to
make it work reliably.  If there are concrete suggestions for simplifying
this logic, I'm happy to try it out and test it, but I'm writing this from
scratch again because it's proven to be tricky.
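
For reference, the bare two-level scan that any version of this loop has
to perform looks roughly like the sketch below.  It is a simplified
model, not the code in the patch: it has none of the per-CPU
current_word_idx/current_bit_idx rotation that provides fairness, it
uses a plain store where real code needs an atomic exchange, and
evtchn_state, record_port() and scan_events() are hypothetical names.

```c
#include <stdint.h>

#define WORD_BITS 64u   /* bits per selector word and per pending word */
#define MAX_SEEN  8u

/* Simplified stand-in for the event channel parts of shared_info. */
struct evtchn_state {
    uint64_t pending_sel;            /* bit i set => pending[i] has work */
    uint64_t pending[WORD_BITS];
    uint64_t mask[WORD_BITS];
};

/* Delivery hook type: called once per unmasked pending port. */
typedef void (*deliver_fn)(unsigned int port);

/* Example hook that just records the ports it was handed. */
static unsigned int seen[MAX_SEEN];
static unsigned int nr_seen;

static void record_port(unsigned int port)
{
    if (nr_seen < MAX_SEEN)
        seen[nr_seen++] = port;
}

/* Index of the lowest set bit; w must be non-zero. */
static unsigned int lowest_bit(uint64_t w)
{
    unsigned int i = 0;

    while (!(w & 1)) {
        w >>= 1;
        i++;
    }
    return i;
}

/* Scan selector -> word -> bit, lowest first; returns events delivered. */
static unsigned int scan_events(struct evtchn_state *s, deliver_fn deliver)
{
    unsigned int delivered = 0;
    uint64_t words = s->pending_sel;

    s->pending_sel = 0;              /* real code: atomic xchg with barrier */

    while (words) {
        unsigned int w = lowest_bit(words);
        uint64_t bits = s->pending[w] & ~s->mask[w];

        words &= words - 1;          /* drop the word we are handling */
        while (bits) {
            unsigned int b = lowest_bit(bits);

            bits &= bits - 1;
            s->pending[w] &= ~(UINT64_C(1) << b);
            deliver(w * WORD_BITS + b);
            delivered++;
        }
    }
    return delivered;
}
```

Because this always starts from word 0 and bit 0, low-numbered ports can
starve high-numbered ones under load, which is the property the rotating
start index in the Linux-derived loop is meant to avoid.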

>> +    shared_info_t *s = global_si;
>> +    struct vcpu_info *vcpu_info = &s->native.vcpu_info[cpu];
>> +    xen_ulong_t pending_words;
>> +    xen_ulong_t pending_bits;
>> +    int start_word_idx, start_bit_idx;
>> +    int word_idx, bit_idx, i;
>> +
>> +    /*
>> +     * Master flag must be cleared /before/ clearing
>> +     * selector flag. xchg_xen_ulong must contain an
>> +     * appropriate barrier.
>> +     */
>> +    pending_words = xchg(&vcpu_info->evtchn_pending_sel, 0);
>> +
>> +    start_word_idx = this_cpu(current_word_idx);
>> +    start_bit_idx = this_cpu(current_bit_idx);
>> +
>> +    word_idx = start_word_idx;
>> +
>> +    for (i = 0; pending_words != 0; i++) {
>> +        xen_ulong_t words;
>> +
>> +        words = MASK_LSBS(pending_words, word_idx);
>> +
>> +        /*
>> +         * If we masked out all events, wrap to beginning.
>> +         */
>> +        if (words == 0) {
>> +            word_idx = 0;
>> +            bit_idx = 0;
>> +            continue;
>> +        }
>> +        word_idx = EVTCHN_FIRST_BIT(words);
>> +
>> +        pending_bits = active_evtchns(cpu, s, word_idx);
>> +        bit_idx = 0; /* usually scan entire word from start */
>> +        /*
>> +         * We scan the starting word in two parts.
>> +         *
>> +         * 1st time: start in the middle, scanning the
>> +         * upper bits.
>> +         *
>> +         * 2nd time: scan the whole word (not just the
>> +         * parts skipped in the first pass) -- if an
>> +         * event in the previously scanned bits is
>> +         * pending again it would just be scanned on
>> +         * the next loop anyway.
>> +         */
>> +        if (word_idx == start_word_idx) {
>> +            if (i == 0)
>> +                bit_idx = start_bit_idx;
>> +        }
>> +
>> +        do {
>> +            struct evtchn *chn;
>> +            xen_ulong_t bits;
>> +            int port;
>> +
>> +            bits = MASK_LSBS(pending_bits, bit_idx);
>> +
>> +            /* If we masked out all events, move on. */
>> +            if (bits == 0)
>> +                break;
>> +
>> +            bit_idx = EVTCHN_FIRST_BIT(bits);
>> +
>> +            /* Process port. */
>> +            port = (word_idx * BITS_PER_XEN_ULONG) + bit_idx;
>> +
>> +            chn = evtchn_from_port(hardware_domain, port);
>> +            clear_bit(port, s->native.evtchn_pending);
>> +            evtchn_port_set_pending(hardware_domain, chn->notify_vcpu_id, chn);
>> +
>> +            bit_idx = (bit_idx + 1) % BITS_PER_XEN_ULONG;
>> +
>> +            /* Next caller starts at last processed + 1 */
>> +            this_cpu(current_word_idx) = bit_idx ? word_idx : (word_idx+1) % BITS_PER_XEN_ULONG;
>> +            this_cpu(current_bit_idx) = bit_idx;
>> +        } while (bit_idx != 0);
>> +
>> +        /* Scan start_l1i twice; all others once. */
>> +        if ((word_idx != start_word_idx) || (i != 0))
>> +            pending_words &= ~(1UL << word_idx);
>> +
>> +        word_idx = (word_idx + 1) % BITS_PER_XEN_ULONG;
>> +    }
>> +}
>> +
>> +static void vixen_upcall(int cpu)
>> +{
>> +    shared_info_t *s = global_si;
>> +    struct vcpu_info *vcpu_info = &s->native.vcpu_info[cpu];
>> +
>> +    do {
>> +        vcpu_info->evtchn_upcall_pending = 0;
>> +        vixen_evtchn_poll_one(cpu);
>> +    } while (vcpu_info->evtchn_upcall_pending);
>> +}
>> +
>> +static void vixen_evtchn_notify(struct cpu_user_regs *regs)
>> +{
>> +    if (vixen_needs_apic_ack)
>> +        ack_APIC_irq();
>> +
>> +    vixen_upcall(smp_processor_id());
>> +}
>> +
>> +static void vixen_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
>> +{
>> +    vixen_upcall(smp_processor_id());
>> +}
>> +
>> +static int hvm_set_parameter(int idx, uint64_t value)
>> +{
>> +    struct xen_hvm_param xhv;
>> +    int r;
>> +
>> +    xhv.domid = DOMID_SELF;
>> +    xhv.index = idx;
>> +    xhv.value = value;
>> +    r = HYPERVISOR_hvm_op(HVMOP_set_param, &xhv);
>> +    if (r < 0) {
>> +        printk("Cannot set hvm parameter %d: %d!\n",
>> +               idx, r);
>> +        return r;
>> +    }
>> +    return r;
>> +}
>> +
>> +void vixen_vcpu_initialize(struct vcpu *v)
>> +{
>> +    struct xen_hvm_evtchn_upcall_vector upcall;
>> +    long rc;
>> +
>> +    printk("VIXEN vcpu init VCPU%d\n", v->vcpu_id);
>> +
>> +    vcpu_pin_override(v, v->vcpu_id);
>> +
>> +    if (!vixen_needs_apic_ack)
>> +        return;
>> +
>> +    printk("VIXEN vcpu init VCPU%d -- trying evtchn_upcall_vector\n", v->vcpu_id);
>> +
>> +    upcall.vcpu = v->vcpu_id;
>> +    upcall.vector = vixen_evtchn_vector;
>> +    rc = HYPERVISOR_hvm_op(HVMOP_set_evtchn_upcall_vector, &upcall);
>> +    if ( rc )
>> +    {
>> +        struct xen_feature_info fi;
>> +
>> +        printk("VIXEN vcpu init VCPU%d -- trying hvm_callback_vector\n", v->vcpu_id);
>> +
>> +        fi.submap_idx = 0;
>> +        rc = HYPERVISOR_xen_version(XENVER_get_features, &fi);
>> +        if ( !rc )
>> +        {
>> +            rc = -EINVAL;
>> +            if ( fi.submap & (1 << XENFEAT_hvm_callback_vector) )
>> +            {
>> +                rc = hvm_set_parameter(HVM_PARAM_CALLBACK_IRQ,
>> +                                       ((uint64_t)HVM_PARAM_CALLBACK_TYPE_VECTOR << 56) | vixen_evtchn_vector);
>> +            }
>> +            if ( !rc )
>> +                vixen_needs_apic_ack = false;
>> +        }
>> +    }
>> +
>> +    if ( rc )
>> +    {
>> +        int slot;
>> +
>> +        vixen_per_cpu_notifications = false;
>> +
>> +        printk("VIXEN vcpu init VCPU%d -- trying pci_intx_callback\n", v->vcpu_id);
>> +        for (slot = 2; slot < 32; slot++) {
>
> Coding style for braces and missing spaces in the condition, here and
> below.

Ack.

>
>> +            uint16_t vendor, device;
>> +
>> +            vendor = pci_conf_read16(0, 0, slot, 0, PCI_VENDOR_ID);
>> +            device = pci_conf_read16(0, 0, slot, 0, PCI_DEVICE_ID);
>> +
>> +            if (vendor == 0x5853 && device == 0x0001) {
>
> Those values should be made defines and documented somewhere.

Ack.

Regards,

Anthony Liguori

> Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 18/22] vixen: Introduce ECS_PROXY for event channel proxying
  2018-01-07  8:44   ` Roger Pau Monné
@ 2018-01-07 15:46     ` Anthony Liguori
  2018-01-08 10:04       ` Jan H. Schönherr
  0 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-07 15:46 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori, xen-devel

On Sun, Jan 7, 2018 at 12:44 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> On Sat, Jan 06, 2018 at 02:54:33PM -0800, Anthony Liguori wrote:
>> From: Jan H. Schönherr <jschoenh@amazon.de>
>>
>> Previously, we would keep proxied event channels as ECS_INTERDOMAIN
>> channel around. This works for most things, but has the problem
>> that EVTCHNOP_status is broken, and that EVTCHNOP_close does not
>> mark an event channel as free.
>
> Why not use ECS_RESERVED for event channels that are forwarded to L0?
>
> You could easily see whether an event channel is forwarded or not just
> by checking if it's ECS_RESERVED, and then decide whether to forward
> the hypercall to L0 or handle it in vixen.

Jan?

Regards,

Anthony Liguori

>
> Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 22/22] vixen: dom0 builder support
  2018-01-07  9:02   ` Roger Pau Monné
@ 2018-01-07 15:52     ` Anthony Liguori
  2018-01-08 10:03       ` Roger Pau Monné
  0 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-07 15:52 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori, xen-devel

On Sun, Jan 7, 2018 at 1:02 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> On Sat, Jan 06, 2018 at 02:54:37PM -0800, Anthony Liguori wrote:
>> From: Anthony Liguori <aliguori@amazon.com>
>>
>> The dom0 builder requires a number of modifications in order to be
>> able to launch unprivileged guests.  The console and store pages
>> must be mapped in a specific location within the guest's initial
>> page table.
>>
>> We also have to set up the start info to be what's expected for
>> unprivileged guests and suppress the normal logic to give dom0
>> increased permissions.
>>
>> We have to pass around the console and store pages which involves
>> touching a number of places including the PVH builder.
>
> AFAICT you are missing a fix for the positions of the p2m mapping in
> the hypervisor virtual memory hole for 32bit PV guests [0].
>
> Without this fix the 32bit DomU ABI is broken, which mandates the m2p
> to always be mapped at virt_hv_start_low, and some early Linux pvops
> kernels will fail to boot (IIRC from 2.6.32-2.6.36, because they don't
> have XENMEM_machphys_mapping implemented).
>
> [0] http://xenbits.xen.org/gitweb/?p=people/liuw/xen.git;a=commit;h=28b2108b362e8976676a96c90eee058605427b57

Thanks!  Will pick this up and do a bit of testing.

>
>> @@ -276,7 +277,9 @@ int __init dom0_construct_pv(struct domain *d,
>>                               unsigned long image_headroom,
>>                               module_t *initrd,
>>                               void *(*bootstrap_map)(const module_t *),
>> -                             char *cmdline)
>> +                             char *cmdline,
>> +                             xen_pfn_t store_mfn, uint32_t store_evtchn,
>> +                             xen_pfn_t console_mfn, uint32_t console_evtchn)
>>  {
>>      int i, cpu, rc, compatible, compat32, order, machine;
>>      struct cpu_user_regs *regs;
>> @@ -299,6 +302,7 @@ int __init dom0_construct_pv(struct domain *d,
>>      l3_pgentry_t *l3tab = NULL, *l3start = NULL;
>>      l2_pgentry_t *l2tab = NULL, *l2start = NULL;
>>      l1_pgentry_t *l1tab = NULL, *l1start = NULL;
>> +    xen_pfn_t saved_pfn = ~0UL;
>>
>>      /*
>>       * This fully describes the memory layout of the initial domain. All
>> @@ -441,8 +445,24 @@ int __init dom0_construct_pv(struct domain *d,
>>          vphysmap_end = vphysmap_start;
>>      vstartinfo_start = round_pgup(vphysmap_end);
>>      vstartinfo_end   = (vstartinfo_start +
>> -                        sizeof(struct start_info) +
>> -                        sizeof(struct dom0_vga_console_info));
>> +                        sizeof(struct start_info));
>> +    if ( !is_vixen() )
>> +        vstartinfo_end += sizeof(struct dom0_vga_console_info);
>> +    vstartinfo_end   = round_pgup(vstartinfo_end);
>> +
>> +    if ( is_vixen() ) {
>> +        struct page_info *pg;
>> +
>> +        saved_pfn = (vstartinfo_end - v_start) / PAGE_SIZE;
>> +
>> +        pg = mfn_to_page(store_mfn);
>> +        share_xen_page_with_guest(pg, d, XENSHARE_writable);
>> +        vstartinfo_end   += PAGE_SIZE;
>> +
>> +        pg = mfn_to_page(console_mfn);
>> +        share_xen_page_with_guest(pg, d, XENSHARE_writable);
>> +        vstartinfo_end   += PAGE_SIZE;
>> +    }
>>
>>      vpt_start        = round_pgup(vstartinfo_end);
>>      for ( nr_pt_pages = 2; ; nr_pt_pages++ )
>> @@ -634,7 +654,13 @@ int __init dom0_construct_pv(struct domain *d,
>>              *l2tab = l2e_from_paddr(__pa(l1start), L2_PROT);
>>              l2tab++;
>>          }
>> -        if ( count < initrd_pfn || count >= initrd_pfn + PFN_UP(initrd_len) )
>> +        if ( count == saved_pfn ) {
>> +            mfn = store_mfn;
>> +            pfn++;
>> +        } else if ( count == saved_pfn + 1 ) {
>> +            mfn = console_mfn;
>> +            pfn++;
>> +        } else if ( count < initrd_pfn || count >= initrd_pfn + PFN_UP(initrd_len) )
>>              mfn = pfn++;
>
> IMHO it's easier to do this fixup afterwards [1] instead of having to
> modify the Dom0 build process in different places (the Dom0 PV
> building code is already messy enough).
>
> [1] http://xenbits.xen.org/gitweb/?p=people/liuw/xen.git;a=commit;h=a38ce82113223e3c5119590c520cc30c8462e709

I'm indifferent on approach but agree that the code is a mess :-D

I'll look at that commit a bit more.

>>          else
>>              mfn = initrd_mfn++;
>> @@ -737,7 +763,8 @@ int __init dom0_construct_pv(struct domain *d,
>>
>>      si->shared_info = virt_to_maddr(d->shared_info);
>>
>> -    si->flags        = SIF_PRIVILEGED | SIF_INITDOMAIN;
>> +    si->flags        = is_vixen() ? 0 : (SIF_PRIVILEGED | SIF_INITDOMAIN);
>> +
>>      if ( !vinitrd_start && initrd_len )
>>          si->flags   |= SIF_MOD_START_PFN;
>>      si->flags       |= (xen_processor_pmbits << 8) & SIF_PM_MASK;
>> @@ -818,6 +845,32 @@ int __init dom0_construct_pv(struct domain *d,
>>          }
>>      }
>>
>> +    if ( is_vixen() )
>> +    {
>> +        dom0_update_physmap(d, saved_pfn, store_mfn, vphysmap_start);
>> +        dom0_update_physmap(d, saved_pfn + 1, console_mfn, vphysmap_start);
>> +
>> +        rc = evtchn_alloc_proxy(d, store_evtchn, ECS_INTERDOMAIN);
>> +        if ( rc )
>> +        {
>> +            printk("Vixen: failed to reserve Xenstore event channel %d => %d\n",
>> +                   store_evtchn, rc);
>> +            goto out;
>> +        }
>> +        rc = evtchn_alloc_proxy(d, console_evtchn, ECS_INTERDOMAIN);
>> +        if ( rc )
>> +        {
>> +            printk("Vixen: failed to reserve Console event channel %d => %d\n",
>> +                   console_evtchn, rc);
>> +            goto out;
>> +        }
>
> IMHO you could just panic here. Nothing useful is going to happen
> after dom0_construct_pv failing and you avoid the goto.

Ack.

>> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
>> index 1b89844..c49eeea 100644
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -663,6 +663,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>          .stop_bits = 1
>>      };
>>      struct xen_arch_domainconfig config = { .emulation_flags = 0 };
>> +    xen_pfn_t store_mfn = 0, console_mfn = 0;
>> +    uint32_t store_evtchn = 0, console_evtchn = 0;
>>
>>      /* Critical region without IDT or TSS.  Any fault is deadly! */
>>
>> @@ -1595,6 +1597,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>          config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC;
>>      }
>>
>> +    if ( is_vixen() )
>> +        config.emulation_flags = XEN_X86_EMU_PIT;
>
> DomUs should not have an emulated PIT, that's only for Dom0 PV.

Unfortunately, they do need an emulated PIT.  Until PVH got merged, an
emulated PIT was always present for DomUs and 2.6.21 era kernels use the
PIT for TSC calibration.  If a PIT isn't present, then they hang during
early boot.
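
The arithmetic behind PIT-based calibration is simple, which is why old
kernels rely on it.  The sketch below is an illustration under
assumptions, not the 2.6.21 code: tsc_khz_from_pit() is a hypothetical
helper, and the only hard fact in it is the PIT input clock of
1193182 Hz.

```c
#include <stdint.h>

#define PIT_HZ 1193182u   /* input clock of the i8254 PIT, in Hz */

/*
 * Given the number of TSC cycles that elapsed while the PIT counted
 * pit_ticks ticks, return the TSC frequency in kHz (0 on bad input).
 *
 * The multiplication can overflow for very long measurement windows;
 * calibration windows of a few tens of milliseconds are far below that.
 */
static uint64_t tsc_khz_from_pit(uint64_t tsc_delta, uint32_t pit_ticks)
{
    if (pit_ticks == 0)
        return 0;

    /* cycles/tick * ticks/second = cycles/second; /1000 gives kHz. */
    return tsc_delta * PIT_HZ / pit_ticks / 1000;
}
```

A calibration loop of this kind typically spins until the PIT has counted
down a fixed number of ticks.  With no PIT behind the I/O ports the count
never advances, which would be consistent with the early-boot hang.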

Regards,

Anthony Liguori

> Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 15/22] vixen: pass through version hypercalls to parent Xen
  2018-01-07 15:40     ` Anthony Liguori
@ 2018-01-07 15:55       ` Andrew Cooper
  2018-01-08  9:36       ` Jan Beulich
  1 sibling, 0 replies; 80+ messages in thread
From: Andrew Cooper @ 2018-01-07 15:55 UTC (permalink / raw)
  To: Anthony Liguori, Roger Pau Monné
  Cc: Anthony Liguori, Wei Liu, Anthony Liguori, KarimAllah Ahmed,
	Jan H. Schönherr, Matt Wilson, xen-devel

On 07/01/2018 15:40, Anthony Liguori wrote:
> On Sun, Jan 7, 2018 at 12:31 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
>> On Sat, Jan 06, 2018 at 02:54:30PM -0800, Anthony Liguori wrote:
>>> From: Anthony Liguori <aliguori@amazon.com>
>>>
>>> This is necessary to trigger event channel upcalls but it is also
>> I'm lost here, what does version have to do with upcalls?
> In Linux, xen_force_evtchn_callback() does HYPERVISOR_xen_version(0,
> NULL).  This is done when IRQs are re-enabled after being disabled to
> trigger checking pending.
>
> I'm not 100% confident that it's necessary to pass this all the way
> through to the parent Xen but it seemed like the right thing to do
> since we need the parent to update pending events in order for the
> events in Vixen to get updated.
>
>>> useful to passthrough the full version information such that the
>>> guest believes it is running on the parent Xen.
>> In any case, I think this is wrong. The interface the guest sees is
>> the interface from vixen, not the interface of the L0. Hence reporting
>> the L0 version is not appropriate.
> I think it depends on what you want.  We were aiming for maximum
> compatibility and many users trigger behavior from Xen version for
> better or worse.
>
> Happy to make this optional if this isn't universally desired.

It will be subtle either way.

My gut feeling is that it will be worse to pretend that Xen 4.10 isn't
4.10, than having PV guests suddenly find themselves on a newer
hypervisor.  The PV ABI hasn't changed much at all.

I don't have any evidence to back up this feeling though.

~Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 16/22] vixen: pass grant table operations through to the outer Xen
  2018-01-07 15:42     ` Anthony Liguori
@ 2018-01-07 16:45       ` Andrew Cooper
  2018-01-07 17:09         ` Anthony Liguori
  2018-01-08 10:12         ` Roger Pau Monné
  2018-01-08 10:05       ` Roger Pau Monné
  1 sibling, 2 replies; 80+ messages in thread
From: Andrew Cooper @ 2018-01-07 16:45 UTC (permalink / raw)
  To: Anthony Liguori, Roger Pau Monné
  Cc: Juergen Gross, Anthony Liguori, Wei Liu, Anthony Liguori,
	KarimAllah Ahmed, Jan H. Schönherr, Jan Beulich,
	Paul Durrant, Matt Wilson, xen-devel

On 07/01/2018 15:42, Anthony Liguori wrote:
> On Sun, Jan 7, 2018 at 12:36 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
>> On Sat, Jan 06, 2018 at 02:54:31PM -0800, Anthony Liguori wrote:
>>>  static long
>>> +vixen_gnttab_setup_table(
>>> +    XEN_GUEST_HANDLE_PARAM(gnttab_setup_table_t) uop, unsigned int count)
>>> +{
>>> +    long rc;
>>> +
>>> +    struct gnttab_setup_table op;
>>> +    xen_pfn_t *frame_list = NULL;
>>> +    static void *grant_table;
>>> +    XEN_GUEST_HANDLE(xen_pfn_t) old_frame_list;
>>> +
>>> +    if ( count != 1 )
>>> +        return -EINVAL;
>>> +
>>> +    if ( unlikely(copy_from_guest(&op, uop, 1) != 0) )
>>> +    {
>>> +        gdprintk(XENLOG_INFO, "Fault while reading gnttab_setup_table_t.\n");
>>> +        return -EFAULT;
>>> +    }
>>> +
>>> +    if ( grant_table == NULL ) {
>>> +        struct xen_add_to_physmap xatp;
>>> +        struct domain *d;
>>> +        int i;
>>> +
>>> +        for ( i = 0; i < max_grant_frames; i++ )
>>> +        {
>>> +             grant_table = alloc_xenheap_page();
>> This is wasting one memory page, grant table frames don't need to be
>> populated.
> Well they have to have a valid struct page_info in order for the guest
> to map it within its address space.
>
> Or did you have something else in mind?

Mapping of L0 frames into L1 is a giant mess.

First of all, some technical facts:
1) Frames which we map from L0 into L1 do not need to replace existing
RAM.  We can use any GFNs up to maxphysaddr.
2) Mapped frames should not replace RAM, and particularly not frames in
.data or .bss, because of the performance hit from shattered host
superpages.
3) Ideally, we'd want to map into entirely unused GFNs, because then we
don't have to interfere with what was there before.

In Xen, to allow a frame to be used by a guest, we need to set up domain
ownership for it.  This requires a struct page_info to exist, which by
default only occurs for pages L1 Xen thinks are RAM.

There is a completely gross way of dealing with this by faking up L1's
E820 map to include a range as RAM, and adding every entry in that range
into the badpages list.  This causes L1 Xen to put together page_info's
for them, but otherwise ignore their existence.

Off the top of my head, frames needing special attention are:
* The special pages, including Xenstore and Console rings.  These are
real frames (as opposed to mappings), but live inside an E820 hole from
L1's point of view.
* Shared info
* Grant table/status frames
* Vcpuinfo frames
* Event_fifo (if we care to wire that up, but perhaps it's not worth it).

What I started doing in PV-shim (before switching to the SP2 side of
things fully) was to hard code these mapping frames immediately after
the special pages, which is a horrible but safe (as far as I can tell)
way of doing things.

Ideally, L1 could work out a safe place to use for mappings (which
ideally, would be a block of GFNs immediately above the last used
frame), but this cannot be done with the toolstack-provided E820 alone,
because it is insufficiently descriptive as it deliberately omits
information which can be found in the DSDT (e.g. ACPI hotplug regions).

The only reasonable option is for L0 to fully understand the guest
physical address layout, and be able to report the details fully to L1,
probably in an E820-like way but with our own type identifiers to cover
the options which aren't in the E820 spec.

This allows L1 to be positively told information such as "This range is
safe for mapping into", without having to go and parse all the secondary
layout information which is derived from this information in the first
place.  Having said that, this will require hypervisor and toolstack
changes, so isn't reasonable to retrofit.

Overall I want to ensure that, whatever plan we come up with for the
shim, it doesn't further tangle things up and make them harder to untangle.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 16/22] vixen: pass grant table operations through to the outer Xen
  2018-01-07 16:45       ` Andrew Cooper
@ 2018-01-07 17:09         ` Anthony Liguori
  2018-01-07 18:45           ` Anthony Liguori
  2018-01-08 10:12         ` Roger Pau Monné
  1 sibling, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-07 17:09 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Anthony Liguori, Wei Liu, Matt Wilson,
	KarimAllah Ahmed, Jan H. Schönherr, Jan Beulich,
	Paul Durrant, Anthony Liguori, xen-devel, Roger Pau Monné

On Sun, Jan 7, 2018 at 8:45 AM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 07/01/2018 15:42, Anthony Liguori wrote:
>> On Sun, Jan 7, 2018 at 12:36 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>> On Sat, Jan 06, 2018 at 02:54:31PM -0800, Anthony Liguori wrote:
>>>>  static long
>>>> +vixen_gnttab_setup_table(
>>>> +    XEN_GUEST_HANDLE_PARAM(gnttab_setup_table_t) uop, unsigned int count)
>>>> +{
>>>> +    long rc;
>>>> +
>>>> +    struct gnttab_setup_table op;
>>>> +    xen_pfn_t *frame_list = NULL;
>>>> +    static void *grant_table;
>>>> +    XEN_GUEST_HANDLE(xen_pfn_t) old_frame_list;
>>>> +
>>>> +    if ( count != 1 )
>>>> +        return -EINVAL;
>>>> +
>>>> +    if ( unlikely(copy_from_guest(&op, uop, 1) != 0) )
>>>> +    {
>>>> +        gdprintk(XENLOG_INFO, "Fault while reading gnttab_setup_table_t.\n");
>>>> +        return -EFAULT;
>>>> +    }
>>>> +
>>>> +    if ( grant_table == NULL ) {
>>>> +        struct xen_add_to_physmap xatp;
>>>> +        struct domain *d;
>>>> +        int i;
>>>> +
>>>> +        for ( i = 0; i < max_grant_frames; i++ )
>>>> +        {
>>>> +             grant_table = alloc_xenheap_page();
>>> This is wasting one memory page, grant table frames don't need to be
>>> populated.
>> Well they have to have a valid struct page_info in order for the guest
>> to map it within its address space.
>>
>> Or did you have something else in mind?
>
> Mapping of L0 frames into L1 is a giant mess.
>
> First of all, some technical facts:
> 1) Frames which we map from L0 into L1 do not need to replace existing
> RAM.  We can use any GFNs up to maxphysaddr.
> 2) Mapped frames should not replace RAM, and particularly not frames in
> .data or .bss, because of the performance hit from shattered host
> superpages.
> 3) Ideally, we'd want to map into entirely unused GFNs, because then we
> don't have to interfere with what was there before.
>
> In Xen, to allow a frame to be used by a guest, we need to set up domain
> ownership for it.  This requires a struct page_info to exist, which by
> default only occurs for pages L1 Xen thinks is RAM.
>
> There is a completely gross way of dealing with this by faking up L1's
> E820 map to include a range as RAM, and adding every entry in that range
> into the badpages list.  This causes L1 Xen to put together page_info's
> for them, but otherwise ignore their existence.

I'll look at this.  I know it's gross but it's pretty straightforward.

> Off the top of my head, frames needing special attention are:
> * The special pages, including Xenstore and Console rings.  These are
> real frames (as opposed to mappings), but live inside an E820 hole from
> L1's point of view.
> * Shared info
> * Grant table/status frames
> * Vcpuinfo frames

I think you mean the runstate area.  We punch this through in Vixen so
it doesn't need special handling.

> * Event_fifo (if we care to wire that up, but perhaps it's not worth it).

I don't think it's worth it TBH.

> What I started doing in PV-shim (before switching to the SP2 side of
> things fully) was to hard code these mapping frames immediately after
> the special pages, which is a horrible but safe (as far as I can tell)
> way of doing things.

I'll take this path after checking myself.

> Ideally, L1 could work out a safe place to use for mappings (which
> ideally, would be a block of GFNs immediately above the last used
> frame), but this cannot be done with the toolstack-provided E820 alone,
> because it is insufficiently descriptive as it deliberately omits
> information which can be found in the DSDT (e.g. ACPI hotplug regions).
>
> The only reasonable option is for L0 to fully understand the guest
> physical address, and be able to report the details fully to L1,
> probably in an E820-like way but with our own type identifiers to cover
> the options which aren't in the E820 spec.
>
> This allows L1 to be positively told information such as "This range is
> safe for mapping into", without having to go and parse all the secondary
> layout information which is derived from this information in the first
> place.  Having said that, this will require hypervisor and toolstack
> changes, so isn't reasonable to retrofit.

Right.  With the current series, no changes are needed to the hypervisor or
toolstack which is pretty powerful.  Maintaining that property is pretty useful.

> Overall I want to ensure that, whatever plan we come up with for the
> shim, it doesn't further tangle things up and make them harder to untangle.

For sure.  I think perhaps codifying the HVM/PVH ABI to say that the special
pages region is, well, special and describing it a bit more is a nice way to
keep things simple but also make it less of a hack.

Regards,

Anthony Liguori

>
> ~Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 16/22] vixen: pass grant table operations through to the outer Xen
  2018-01-07 17:09         ` Anthony Liguori
@ 2018-01-07 18:45           ` Anthony Liguori
  0 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-07 18:45 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Anthony Liguori, Wei Liu, Matt Wilson,
	KarimAllah Ahmed, Jan H. Schönherr, Jan Beulich,
	Paul Durrant, Anthony Liguori, xen-devel, Roger Pau Monné

On Sun, Jan 7, 2018 at 9:09 AM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> On Sun, Jan 7, 2018 at 8:45 AM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 07/01/2018 15:42, Anthony Liguori wrote:
>>> On Sun, Jan 7, 2018 at 12:36 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>> On Sat, Jan 06, 2018 at 02:54:31PM -0800, Anthony Liguori wrote:
>>>>>  static long
>>>>> +vixen_gnttab_setup_table(
>>>>> +    XEN_GUEST_HANDLE_PARAM(gnttab_setup_table_t) uop, unsigned int count)
>>>>> +{
>>>>> +    long rc;
>>>>> +
>>>>> +    struct gnttab_setup_table op;
>>>>> +    xen_pfn_t *frame_list = NULL;
>>>>> +    static void *grant_table;
>>>>> +    XEN_GUEST_HANDLE(xen_pfn_t) old_frame_list;
>>>>> +
>>>>> +    if ( count != 1 )
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    if ( unlikely(copy_from_guest(&op, uop, 1) != 0) )
>>>>> +    {
>>>>> +        gdprintk(XENLOG_INFO, "Fault while reading gnttab_setup_table_t.\n");
>>>>> +        return -EFAULT;
>>>>> +    }
>>>>> +
>>>>> +    if ( grant_table == NULL ) {
>>>>> +        struct xen_add_to_physmap xatp;
>>>>> +        struct domain *d;
>>>>> +        int i;
>>>>> +
>>>>> +        for ( i = 0; i < max_grant_frames; i++ )
>>>>> +        {
>>>>> +             grant_table = alloc_xenheap_page();
>>>> This is wasting one memory page, grant table frames don't need to be
>>>> populated.
>>> Well they have to have a valid struct page_info in order for the guest
>>> to map it within its address space.
>>>
>>> Or did you have something else in mind?
>>
>> Mapping of L0 frames into L1 is a giant mess.
>>
>> First of all, some technical facts:
>> 1) Frames which we map from L0 into L1 do not need to replace existing
>> RAM.  We can use any GFNs up to maxphysaddr.
>> 2) Mapped frames should not replace RAM, and particularly not frames in
>> .data or .bss, because of the performance hit from shattered host
>> superpages.
>> 3) Ideally, we'd want to map into entirely unused GFNs, because then we
>> don't have to interfere with what was there before.
>>
>> In Xen, to allow a frame to be used by a guest, we need to set up domain
>> ownership for it.  This requires a struct page_info to exist, which by
>> default only occurs for pages L1 Xen thinks is RAM.
>>
>> There is a completely gross way of dealing with this by faking up L1's
>> E820 map to include a range as RAM, and adding every entry in that range
>> into the badpages list.  This causes L1 Xen to put together page_info's
>> for them, but otherwise ignore their existence.
>
> I'll look at this.  I know it's gross but it's pretty straightforward.
>
>> Off the top of my head, frames needing special attention are:
>> * The special pages, including Xenstore and Console rings.  These are
>> real frames (as opposed to mappings), but live inside an E820 hole from
>> L1's point of view.
>> * Shared info
>> * Grant table/status frames
>> * Vcpuinfo frames
>
> I think you mean the runstate area.  We punch this through in Vixen so
> it doesn't need special handling.
>
>> * Event_fifo (if we care to wire that up, but perhaps it's not worth it).
>
> I don't think it's worth it TBH.
>
>> What I started doing in PV-shim (before switching to the SP2 side of
>> things fully) was to hard code these mapping frames immediately after
>> the special pages, which is a horrible but safe (as far as I can tell)
>> way of doing things.
>
> I'll take this path after checking myself.
>
>> Ideally, L1 could work out a safe place to use for mappings (which
>> ideally, would be a block of GFNs immediately above the last used
>> frame), but this cannot be done with the toolstack-provided E820 alone,
>> because it is insufficiently descriptive as it deliberately omits
>> information which can be found in the DSDT (e.g. ACPI hotplug regions).
>>
>> The only reasonable option is for L0 to fully understand the guest
>> physical address, and be able to report the details fully to L1,
>> probably in an E820-like way but with our own type identifiers to cover
>> the options which aren't in the E820 spec.
>>
>> This allows L1 to be positively told information such as "This range is
>> safe for mapping into", without having to go and parse all the secondary
>> layout information which is derived from this information in the first
>> place.  Having said that, this will require hypervisor and toolstack
>> changes, so isn't reasonable to retrofit.
>
> Right.  With the current series, no changes are needed to the hypervisor or
> toolstack which is pretty powerful.  Maintaining that property is pretty useful.
>
>> Overall I want to ensure that, whatever plan we come up with for the
>> shim, it doesn't further tangle things up and make them harder to untangle.
>
> For sure.  I think perhaps codifying the HVM/PVH ABI to say that the special
> pages region is, well, special and describing it a bit more is a nice way to
> keep things simple but also make it less of a hack.

Silly me, it already is.  It's the reserved_mem_pgstart field in
hvm_info_table in ACPI.

Regards,

Anthony Liguori

> Regards,
>
> Anthony Liguori
>
>>
>> ~Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-07  0:05   ` Anthony Liguori
@ 2018-01-07 20:29     ` Anthony Liguori
  0 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-07 20:29 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Jan H. Schönherr, Anthony Liguori, xen-devel

I sent a v2 out with most of the changes discussed in this thread.
The only things missing are getting rid of hardware_domain and
ECS_RESERVED vs. ECS_PROXY.

Regards,

Anthony Liguori

On Sat, Jan 6, 2018 at 4:05 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> On Sat, Jan 6, 2018 at 3:50 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 06/01/2018 22:54, Anthony Liguori wrote:
>>> From: Anthony Liguori <aliguori@amazon.com>
>>>
>>> CVE-2017-5754 is problematic for paravirtualized x86 domUs because it
>>> appears to be very difficult to isolate the hypervisor's page tables
>>> from PV domUs while maintaining ABI compatibility.  Instead of trying
>>> to make a KPTI-like approach work for Xen PV, it seems reasonable to
>>> run a copy of Xen within an HVM (or PVH) domU to provide backwards
>>> compatibility with guests as mentioned in XSA-254 [1].
>>>
>>> This patch series adds a new mode to Xen called Vixen (Virtualized
>>> Xen)
>>
>> It is quite telling that through all of this, I never even considered
>> asking if vixen stood for anything!
>
> Also, topical for the season:
> https://www.youtube.com/watch?v=78c7vDFt6G8&feature=youtu.be&t=7
>
>>> which provides a PV-compatible interface while gaining
>>> CVE-2017-5754 protection for the host provided by hardware
>>> virtualization.  Vixen supports running a single unprivileged PV
>>> domain (a dom1) that is constructed by the dom0 domain builder.
>>>
>>> Please note the Xen page table configuration fundamental to the
>>> current PV ABI makes it impossible for an operating system to mitigate
>>> CVE-2017-5754 through mechanisms like Kernel Page Table Isolation
>>> (KPTI).  In order for an operating system to mitigate CVE-2017-5754 it
>>> must run directly in a HVM or PVH domU.
>>
>> It's a little more complicated than this, but I suppose it is worth pointing
>> out.
>>
>> A 64bit PV guest kernel cannot, of its own accord, protect itself
>> against SP3/Meltdown.  This is due to the shared nature/responsibility
>> of pagetables between the PV guest kernel and Xen.
>>
>> What the Vixen/PV-shim plan does is isolate the guest sufficiently that
>> any SP3 attacks can't read data belonging to other guests on the host.
>>
>> An SP3/Meltdown mitigation can only come from having Xen change the way
>> it uses pagetables, and my 44-patch prerequisite series serves to
>> demonstrate that this seems impractical with the existing ABI.
>
> Correct.  You can get close but getting 100% of the way seems unlikely.
>
>>> This series is very similar to the PVH series posted by Wei and we
>>> have been discussing how to merge efforts.  We were hoping to have
>>> more time to work this out.  I am posting this because I'm fairly
>>> confident that this series is complete (all PV instances in EC2 are
>>> using this) and others might find it useful.  I also wanted to have
>>> more of a discussion about the best way to merge and some of the
>>> differences in designs.
>>
>> Some ad hoc thoughts so far:
>>
>> * Upstream, we need to take the PV-Shim side of domid handling.
>> Unilaterally using dom1 is fine for server-virt infrastructure where
>> guests only ever talk to dom0, but isn't fine if you've got domains
>> which are communicating directly (e.g. with libvchan).  This is very
>> minor in the grand scheme of things though.
>
> That's fine.  I think we should try to focus on merging some common
> infrastructure because I don't think 75+ patch series are going to be
> easy to get agreement on.
>
> I'm not a huge fan of passing the domid via CPUID.  That's going to
> be messy over time.  I do, however, like the idea of passing it as a
> command line argument.  I'm happy to add support for that if that's
> agreeable.
>
>> * I do prefer the Vixen side of startup, where we describe rather more
>> clearly what is going on.  I never got around to stea^W borrowing this
>> for PV-shim.
>
> I think no matter what, we should try to get the first few patches merged
> to add basic guest detection and hypercall support.
>
>> * Whatever eventual version gets in upstream, it is important that it is
>> HVM and PVH capable for backwards and forwards compatibility.  Again,
>> this doesn't appear to be too complicated to arrange in practice.  For
>> reference, what is the oldest version of Xen you need to target here?
>> (The pre-console-ring observation puts it quite old)
>
> 3.4.x is what we're targeting.  That is indeed old but since this
> is a security issue, supporting a wide range of environments seems
> like the right thing to do.
>
>> * For PV-shim, we took the approach of making the domU neither
>> privileged nor the hardware domain.  While I expect this throws up a
>> different set of issues, I think it is a cleaner approach overall.
>
> I never got a chance to try this out and see what breaks.
>
> The one argument I'd make against it is that over time, I'd like to add
> privileges to the domU in an attempt to improve performance.  We found
> a lot of weird compatibility issues on older versions of Linux so I didn't
> attempt to do any of this up front but in the long term, I would like to steal
> some of the tricks from Xenner.
>
>> I'm sure there are areas I've missed, but this is hopefully a start.
>
> Thanks Andrew!
>
> Regards,
>
> Anthony Liguori
>
>> ~Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 15/22] vixen: pass through version hypercalls to parent Xen
  2018-01-07 15:40     ` Anthony Liguori
  2018-01-07 15:55       ` Andrew Cooper
@ 2018-01-08  9:36       ` Jan Beulich
  1 sibling, 0 replies; 80+ messages in thread
From: Jan Beulich @ 2018-01-08  9:36 UTC (permalink / raw)
  To: Roger Pau Monné, Anthony Liguori
  Cc: Anthony Liguori, Wei Liu, Anthony Liguori, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Matt Wilson, xen-devel

>>> On 07.01.18 at 16:40, <anthony@codemonkey.ws> wrote:
> On Sun, Jan 7, 2018 at 12:31 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
>> On Sat, Jan 06, 2018 at 02:54:30PM -0800, Anthony Liguori wrote:
>>> From: Anthony Liguori <aliguori@amazon.com>
>>>
>>> This is necessary to trigger event channel upcalls but it is also
>>
>> I'm lost here, what does version have to do with upcalls?
> 
> In Linux, xen_force_evtchn_callback() does HYPERVISOR_xen_version(0,
> NULL).  This is done when IRQs are re-enabled after being disabled to
> trigger a check for pending events.
> 
> I'm not 100% confident that it's necessary to pass this all the way
> through to the parent Xen but it seemed like the right thing to do
> since we need the parent to update pending events in order for the
> events in Vixen to get updated.
> 
>>> useful to passthrough the full version information such that the
>>> guest believes it is running on the parent Xen.
>>
>> In any case, I think this is wrong. The interface the guest sees is
>> the interface from vixen, not the interface of the L0. Hence reporting
>> the L0 version is not appropriate.
> 
> I think it depends on what you want.  We were aiming for maximum
> compatibility and many users trigger behavior from Xen version for
> better or worse.

At the example of the plain version number, I think what is being
reported back to the guest may need to be the lower of shim and
actual hypervisor versions. XENVER_get_features may want ANDing
both values (perhaps with some customization in case there are
bits exclusively affected by either party). I didn't think through other
sub-ops yet.
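
To make that concrete, a minimal sketch of the two merges, assuming both
values have already been fetched (the function names and the two "owned by
one party" masks are invented for illustration, nothing like this exists in
the series):

```c
#include <stdint.h>

/*
 * XENVER_version encodes the version as (major << 16) | minor, so the
 * numerically smaller value is also the older version.
 */
static uint32_t sketch_merge_version(uint32_t shim, uint32_t l0)
{
    return shim < l0 ? shim : l0;
}

/*
 * AND the two feature words, then let bits which are decided exclusively
 * by one party (shim_only / l0_only, hypothetical masks) take that party's
 * value regardless of what the other one reports.
 */
static uint32_t sketch_merge_features(uint32_t shim, uint32_t l0,
                                      uint32_t shim_only, uint32_t l0_only)
{
    uint32_t merged = shim & l0;

    merged |= shim & shim_only;
    merged |= l0 & l0_only;

    return merged;
}
```

With both masks zero this degenerates to the plain AND.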

Jan

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 09/22] vixen: modify the e820 table to advertise HVM special pages as RAM
  2018-01-07 15:27     ` Anthony Liguori
@ 2018-01-08  9:51       ` Roger Pau Monné
  2018-01-08  9:54         ` Andrew Cooper
  0 siblings, 1 reply; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-08  9:51 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori, xen-devel

On Sun, Jan 07, 2018 at 07:27:48AM -0800, Anthony Liguori wrote:
> On Sun, Jan 7, 2018 at 12:16 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > On Sat, Jan 06, 2018 at 02:54:24PM -0800, Anthony Liguori wrote:
> >> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> >> index a56f875..935901b 100644
> >> --- a/xen/arch/x86/mm.c
> >> +++ b/xen/arch/x86/mm.c
> >> @@ -122,6 +122,7 @@
> >>  #include <asm/fixmap.h>
> >>  #include <asm/io_apic.h>
> >>  #include <asm/pci.h>
> >> +#include <asm/guest.h>
> >>
> >>  #include <asm/hvm/grant_table.h>
> >>  #include <asm/pv/grant_table.h>
> >> @@ -945,7 +946,7 @@ get_page_from_l1e(
> >>              case 0:
> >>                  break;
> >>              case 1:
> >> -                if ( !is_hardware_domain(l1e_owner) )
> >> +                if ( !is_vixen() && !is_hardware_domain(l1e_owner) )
> >>                      break;
> >>                  /* fallthrough */
> >>              case -1:
> >> @@ -5536,6 +5537,21 @@ void arch_dump_shared_mem_info(void)
> >>              mem_sharing_get_nr_saved_mfns());
> >>  }
> >>
> >> +const unsigned long *__init
> >> +vixen_get_platform_badpages(unsigned int *array_size)
> >> +{
> >> +    static unsigned long __initdata bad_pages[] = {
> >> +        0xfeffc000,
> >> +        0xfeffd000,
> >> +        0xfeffe000,
> >> +        0xfefff000,
> >
> > These values shouldn't be hardcoded. IMHO it would also be good to
> > place all the vixen_ helpers in a single file.
> 
> Ack on moving to a helper.
> 
> I don't know of a way to call the hypervisor to ask "what's the
> special page range?".  I can find special pages via the hvm get
> parameters calls but there's no guarantee they are contiguous so the
> resulting code to punch holes in the e820 becomes fairly complex.  Any
> ideas how to do this nicely?

I've done something similar for the shim, but the values in the
bad_pages array are dynamic:

http://xenbits.xen.org/gitweb/?p=people/liuw/xen.git;a=commit;h=d5a72acaa2ced1bd66a1ef1ef7a4a1bda43a9df3

Also, why do you need to add 4 GFNs to the list of bad pages? Just
adding the console/xenstore pages to the e820 and to the list of bad
pages should be enough.

It's a nit at this stage, but I again think vixen related code should
live in a separate file instead of polluting x86/mm.c

Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 09/22] vixen: modify the e820 table to advertise HVM special pages as RAM
  2018-01-08  9:51       ` Roger Pau Monné
@ 2018-01-08  9:54         ` Andrew Cooper
  2018-01-08 10:23           ` Roger Pau Monné
  0 siblings, 1 reply; 80+ messages in thread
From: Andrew Cooper @ 2018-01-08  9:54 UTC (permalink / raw)
  To: Roger Pau Monné, Anthony Liguori
  Cc: Anthony Liguori, Wei Liu, Anthony Liguori, KarimAllah Ahmed,
	Jan H. Schönherr, Matt Wilson, xen-devel

On 08/01/2018 09:51, Roger Pau Monné wrote:
> On Sun, Jan 07, 2018 at 07:27:48AM -0800, Anthony Liguori wrote:
>> On Sun, Jan 7, 2018 at 12:16 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>> On Sat, Jan 06, 2018 at 02:54:24PM -0800, Anthony Liguori wrote:
>>>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>>>> index a56f875..935901b 100644
>>>> --- a/xen/arch/x86/mm.c
>>>> +++ b/xen/arch/x86/mm.c
>>>> @@ -122,6 +122,7 @@
>>>>  #include <asm/fixmap.h>
>>>>  #include <asm/io_apic.h>
>>>>  #include <asm/pci.h>
>>>> +#include <asm/guest.h>
>>>>
>>>>  #include <asm/hvm/grant_table.h>
>>>>  #include <asm/pv/grant_table.h>
>>>> @@ -945,7 +946,7 @@ get_page_from_l1e(
>>>>              case 0:
>>>>                  break;
>>>>              case 1:
>>>> -                if ( !is_hardware_domain(l1e_owner) )
>>>> +                if ( !is_vixen() && !is_hardware_domain(l1e_owner) )
>>>>                      break;
>>>>                  /* fallthrough */
>>>>              case -1:
>>>> @@ -5536,6 +5537,21 @@ void arch_dump_shared_mem_info(void)
>>>>              mem_sharing_get_nr_saved_mfns());
>>>>  }
>>>>
>>>> +const unsigned long *__init
>>>> +vixen_get_platform_badpages(unsigned int *array_size)
>>>> +{
>>>> +    static unsigned long __initdata bad_pages[] = {
>>>> +        0xfeffc000,
>>>> +        0xfeffd000,
>>>> +        0xfeffe000,
>>>> +        0xfefff000,
>>> These values shouldn't be hardcoded. IMHO it would also be good to
>>> place all the vixen_ helpers in a single file.
>> Ack on moving to a helper.
>>
>> I don't know of a way to call the hypervisor to ask "what's the
>> special page range?".  I can find special pages via the hvm get
>> parameters calls but there's no guarantee they are contiguous so the
>> resulting code to punch holes in the e820 becomes fairly complex.  Any
>> ideas how to do this nicely?
> I've done something similar for the shim, but the values in the
> bad_pages array are dynamic:
>
> http://xenbits.xen.org/gitweb/?p=people/liuw/xen.git;a=commit;h=d5a72acaa2ced1bd66a1ef1ef7a4a1bda43a9df3
>
> Also, why do you need to add 4 GFNs to the list of bad pages? Just
> adding the console/xenstore pages to the e820 and to the list of bad
> pages should be enough.

You've got to be careful not to have the bootscrub zero the IDENT_PT. 
For safety, I put all of the special pages through this E820/bad cycle.

~Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 11/22] vixen: early initialization of Vixen including shared_info mapping
  2018-01-07 15:33     ` Anthony Liguori
@ 2018-01-08  9:55       ` Roger Pau Monné
  0 siblings, 0 replies; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-08  9:55 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori, xen-devel

On Sun, Jan 07, 2018 at 07:33:06AM -0800, Anthony Liguori wrote:
> On Sun, Jan 7, 2018 at 12:23 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > On Sat, Jan 06, 2018 at 02:54:26PM -0800, Anthony Liguori wrote:
> >> From: Anthony Liguori <aliguori@amazon.com>
> >> +    rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
> >> +    if ( rc < 0 )
> >> +        printk("Setting shared info page failed: %ld\n", rc);
> >> +
> >> +    memset(&global_si->native.evtchn_mask[0], 0x00,
> >> +           sizeof(global_si->native.evtchn_mask));
> >
> > Hm, I'm not sure I like the approach of unmasking everything. IMHO I
> > would rather mask everything and unmask them when the guest actually
> > binds the event channel. That makes sure that an interrupt will get
> > injected when the event channel is unmasked (if there's an event
> > pending).
> 
> This is done in hvmloader and we discovered that guests rely on it.
> See hvmloader/xenbus.c:xenbus_shutdown().

But that's something completely different. There hvmloader is
resetting everything so the guest finds it in a proper state
(hvmloader is handing the shared_info page to the guest kernel). Here
vixen is not sharing the shared_info page with the guest (this is
just used by vixen), and hence I would do the opposite: mask
everything and unmask the event channels that the guest is actually
using.
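
Roughly what I have in mind, as a standalone sketch rather than a patch
against the series (the array, its size and the helper names are all
placeholders, and real code would use the atomic bit helpers on the
shared_info mask instead of plain C bit operations):

```c
#include <limits.h>
#include <string.h>

#define SKETCH_NR_PORTS      1024
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MASK_WORDS    (SKETCH_NR_PORTS / SKETCH_BITS_PER_LONG)

/* Stand-in for the evtchn_mask bitmap in the shared_info page. */
static unsigned long sketch_evtchn_mask[SKETCH_MASK_WORDS];

/* At init: mask every port instead of clearing the whole mask. */
static void sketch_mask_all(void)
{
    memset(sketch_evtchn_mask, 0xff, sizeof(sketch_evtchn_mask));
}

/* When the guest binds a port: unmask just that one port. */
static void sketch_unmask_port(unsigned int port)
{
    if ( port >= SKETCH_NR_PORTS )
        return;

    sketch_evtchn_mask[port / SKETCH_BITS_PER_LONG] &=
        ~(1UL << (port % SKETCH_BITS_PER_LONG));
}

/* Returns 1 if the port is currently masked, 0 otherwise. */
static int sketch_port_masked(unsigned int port)
{
    return (sketch_evtchn_mask[port / SKETCH_BITS_PER_LONG] >>
            (port % SKETCH_BITS_PER_LONG)) & 1;
}
```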

Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 22/22] vixen: dom0 builder support
  2018-01-07 15:52     ` Anthony Liguori
@ 2018-01-08 10:03       ` Roger Pau Monné
  0 siblings, 0 replies; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-08 10:03 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori, xen-devel

On Sun, Jan 07, 2018 at 07:52:26AM -0800, Anthony Liguori wrote:
> On Sun, Jan 7, 2018 at 1:02 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > On Sat, Jan 06, 2018 at 02:54:37PM -0800, Anthony Liguori wrote:
> >> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> >> index 1b89844..c49eeea 100644
> >> --- a/xen/arch/x86/setup.c
> >> +++ b/xen/arch/x86/setup.c
> >> @@ -663,6 +663,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >>          .stop_bits = 1
> >>      };
> >>      struct xen_arch_domainconfig config = { .emulation_flags = 0 };
> >> +    xen_pfn_t store_mfn = 0, console_mfn = 0;
> >> +    uint32_t store_evtchn = 0, console_evtchn = 0;
> >>
> >>      /* Critical region without IDT or TSS.  Any fault is deadly! */
> >>
> >> @@ -1595,6 +1597,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >>          config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC;
> >>      }
> >>
> >> +    if ( is_vixen() )
> >> +        config.emulation_flags = XEN_X86_EMU_PIT;
> >
> > DomUs should not have an emulated PIT, that's only for Dom0 PV.
> 
> Unfortunately, they do need an emulated PIT.  Until PVH got merged, an
> emulated PIT
> was always present for DomUs and 2.6.21 era kernels use the PIT for
> TSC calibration.
> If a PIT isn't present, then they hang during early boot.

Is this something that only affects 2.6.21? I've certainly tested
different old distros that use 2.6.18 and they all work just fine
without a PIT.

Do you know where I can find one of those kernels?

Thanks, Roger.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 18/22] vixen: Introduce ECS_PROXY for event channel proxying
  2018-01-07 15:46     ` Anthony Liguori
@ 2018-01-08 10:04       ` Jan H. Schönherr
  0 siblings, 0 replies; 80+ messages in thread
From: Jan H. Schönherr @ 2018-01-08 10:04 UTC (permalink / raw)
  To: Anthony Liguori, Roger Pau Monné
  Cc: Anthony Liguori, Wei Liu, Anthony Liguori, KarimAllah Ahmed,
	Andrew Cooper, Matt Wilson, xen-devel

On 01/07/2018 04:46 PM, Anthony Liguori wrote:
> On Sun, Jan 7, 2018 at 12:44 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
>> On Sat, Jan 06, 2018 at 02:54:33PM -0800, Anthony Liguori wrote:
>>> From: Jan H. Schönherr <jschoenh@amazon.de>
>>>
>>> Previously, we would keep proxied event channels as ECS_INTERDOMAIN
>>> channel around. This works for most things, but has the problem
>>> that EVTCHNOP_status is broken, and that EVTCHNOP_close does not
>>> mark an event channel as free.
>>
>> Why not use ECS_RESERVED for event channels that are forwarded to L0?
>>
>> You could easily see whether an event channel is forwarded or not just
>> by checking if it's ECS_RESERVED, and then decide whether to forward
>> the hypercall to L0 or handle it in vixen.
> 
> Jan?

I didn't go for RESERVED, because of potential confusion with other RESERVED
ports. AFAIK, there's only port 0 that's reserved, currently. If you don't
want to distinguish between "reserved" and "proxied", you'd have to forward
operations on port 0 as well, and then rely on port 0 being reserved in L0
as well.

Using a separate ECS_PROXY seemed cleaner to me. Less chance to get it wrong
or for it to accidentally get broken by a future change. :)
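
As a rough illustration of that distinction, here is a toy dispatcher. It
is a self-contained model and not the code from the patch: the numeric
enum values, struct port, enum route and route_op() are hypothetical names
made up for this sketch. Only the state names ECS_RESERVED,
ECS_INTERDOMAIN and ECS_PROXY come from the discussion above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of event channel states; the numeric values are illustrative. */
enum ecs_state { ECS_FREE, ECS_RESERVED, ECS_INTERDOMAIN, ECS_PROXY };

/* Where an operation on a port ends up (hypothetical, for this sketch). */
enum route { ROUTE_LOCAL, ROUTE_FORWARD_L0, ROUTE_REJECT };

struct port { enum ecs_state state; };

/* Decide how an operation on 'port' is handled in this toy model. */
static enum route route_op(const struct port *ports, size_t nr, size_t port)
{
    assert(ports != NULL);

    if ( port >= nr )
        return ROUTE_REJECT;

    switch ( ports[port].state )
    {
    case ECS_PROXY:
        /* Proxied channel: the outer (L0) Xen owns the real state. */
        return ROUTE_FORWARD_L0;
    case ECS_RESERVED:
        /* Reserved ports such as port 0 are refused locally, never forwarded. */
        return ROUTE_REJECT;
    default:
        return ROUTE_LOCAL;
    }
}
```

With a dedicated state, the decision for port 0 does not depend on what L0
does with its own port 0, which is the "less chance to get it wrong" part.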

Regards
Jan


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 16/22] vixen: pass grant table operations through to the outer Xen
  2018-01-07 15:42     ` Anthony Liguori
  2018-01-07 16:45       ` Andrew Cooper
@ 2018-01-08 10:05       ` Roger Pau Monné
  1 sibling, 0 replies; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-08 10:05 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori, xen-devel

On Sun, Jan 07, 2018 at 07:42:55AM -0800, Anthony Liguori wrote:
> On Sun, Jan 7, 2018 at 12:36 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > On Sat, Jan 06, 2018 at 02:54:31PM -0800, Anthony Liguori wrote:
> >> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> >> index 250450b..b302fd0 100644
> >> --- a/xen/common/grant_table.c
> >> +++ b/xen/common/grant_table.c
> >>  static long
> >> +vixen_gnttab_setup_table(
> >> +    XEN_GUEST_HANDLE_PARAM(gnttab_setup_table_t) uop, unsigned int count)
> >> +{
> >> +    long rc;
> >> +
> >> +    struct gnttab_setup_table op;
> >> +    xen_pfn_t *frame_list = NULL;
> >> +    static void *grant_table;
> >> +    XEN_GUEST_HANDLE(xen_pfn_t) old_frame_list;
> >> +
> >> +    if ( count != 1 )
> >> +        return -EINVAL;
> >> +
> >> +    if ( unlikely(copy_from_guest(&op, uop, 1) != 0) )
> >> +    {
> >> +        gdprintk(XENLOG_INFO, "Fault while reading gnttab_setup_table_t.\n");
> >> +        return -EFAULT;
> >> +    }
> >> +
> >> +    if ( grant_table == NULL ) {
> >> +        struct xen_add_to_physmap xatp;
> >> +        struct domain *d;
> >> +        int i;
> >> +
> >> +        for ( i = 0; i < max_grant_frames; i++ )
> >> +        {
> >> +             grant_table = alloc_xenheap_page();
> >
> > This is wasting one memory page, grant table frames don't need to be
> > populated.
> 
> Well they have to have a valid struct page_info in order for the guest
> to map it within its address space.
> 
> Or did you have something else in mind?

You can map it in some unpopulated memory region and then add it to
the list of iomem regions for the guest (iomem_permit_access). Grant
table frames AFAICT don't require a struct page.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 16/22] vixen: pass grant table operations through to the outer Xen
  2018-01-07 16:45       ` Andrew Cooper
  2018-01-07 17:09         ` Anthony Liguori
@ 2018-01-08 10:12         ` Roger Pau Monné
  1 sibling, 0 replies; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-08 10:12 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Anthony Liguori, Wei Liu, Anthony Liguori,
	KarimAllah Ahmed, Jan H. Schönherr, Jan Beulich,
	Paul Durrant, Anthony Liguori, xen-devel, Matt Wilson

On Sun, Jan 07, 2018 at 04:45:21PM +0000, Andrew Cooper wrote:
> On 07/01/2018 15:42, Anthony Liguori wrote:
> > On Sun, Jan 7, 2018 at 12:36 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> >> On Sat, Jan 06, 2018 at 02:54:31PM -0800, Anthony Liguori wrote:
> >>>  static long
> >>> +vixen_gnttab_setup_table(
> >>> +    XEN_GUEST_HANDLE_PARAM(gnttab_setup_table_t) uop, unsigned int count)
> >>> +{
> >>> +    long rc;
> >>> +
> >>> +    struct gnttab_setup_table op;
> >>> +    xen_pfn_t *frame_list = NULL;
> >>> +    static void *grant_table;
> >>> +    XEN_GUEST_HANDLE(xen_pfn_t) old_frame_list;
> >>> +
> >>> +    if ( count != 1 )
> >>> +        return -EINVAL;
> >>> +
> >>> +    if ( unlikely(copy_from_guest(&op, uop, 1) != 0) )
> >>> +    {
> >>> +        gdprintk(XENLOG_INFO, "Fault while reading gnttab_setup_table_t.\n");
> >>> +        return -EFAULT;
> >>> +    }
> >>> +
> >>> +    if ( grant_table == NULL ) {
> >>> +        struct xen_add_to_physmap xatp;
> >>> +        struct domain *d;
> >>> +        int i;
> >>> +
> >>> +        for ( i = 0; i < max_grant_frames; i++ )
> >>> +        {
> >>> +             grant_table = alloc_xenheap_page();
> >> This is wasting one memory page, grant table frames don't need to be
> >> populated.
> > Well they have to have a valid struct page_info in order for the guest
> > to map it within its address space.
> >
> > Or did you have something else in mind?
> 
> Mapping of L0 frames into L1 is a giant mess.
> 
> First of all, some technical facts:
> 1) Frames which we map from L0 into L1 do not need to replace existing
> RAM.  We can use any GFNs up to maxphysaddr.
> 2) Mapped frames should not replace RAM, and particularly not frames in
> .data or .bss, because of the performance hit from shattered host
> superpages.
> 3) Ideally, we'd want to map into entirely unused GFNs, because then we
> don't have to interfere with what was there before.
> 
> In Xen, to allow a frame to be used by a guest, we need to set up domain
> ownership for it.  This requires a struct page_info to exist, which by
> default only occurs for pages L1 Xen thinks is RAM.

There's at least one exception to this, iomem ranges can be mapped by
guests and they don't require a struct page_info to exist. This is
what I'm using on the pv-shim to map the shared_info page and the
grant table frames.

> There is a completely gross way of dealing with this by faking up L1's
> E820 map to include a range as RAM, and adding every entry in that range
> into the badpages list.  This causes L1 Xen to put together page_info's
> for them, but otherwise ignore their existence.
> 
> Off the top of my head, frames needing special attention are:
> * The special pages, including Xenstore and Console rings.  These are
> real frames (as opposed to mappings), but live inside an E820 hole from
> L1's point of view.
> * Shared info
> * Grant table/status frames
> * Vcpuinfo frames

vcpu_info areas need to be populated, the memory in that case is
provided by the L1.

> * Event_fifo (if we care to wire that up, but perhaps it's not worth it).
> 
> What I started doing in PV-shim (before switching to the SP2 side of
> things fully) was to hard code these mapping frames immediately after
> the special pages, which is a horrible but safe (as far as I can tell)
> way of doing things.
> 
> Ideally, L1 could work out a safe place to use for mappings (which
> ideally, would be a block of GFNs immediately above the last used
> frame), but this cannot be done with the toolstack-provided E820 alone,
> because it is insufficiently descriptive as it deliberately omits
> information which can be found in the DSDT (e.g. ACPI hotplug regions).

In the shim I'm using a rangeset to keep track of unused memory
regions that can be used to map pages from L0, it's still kind of
hacky, but IMHO it's the best option ATM.

http://xenbits.xen.org/gitweb/?p=people/liuw/xen.git;a=patch;h=aa4a9eac9d9f669bc9315e9e569c0335e2cb2a74
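
The idea of tracking unused GFN ranges and handing them out for L0
mappings can be sketched roughly as below. This is a toy, fixed-size
stand-in and not Xen's rangeset API or the code in the commit linked
above: struct gfn_pool, pool_add(), pool_alloc(), MAX_RANGES and
INVALID_GFN are all hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RANGES  8
#define INVALID_GFN (~0UL)

struct gfn_range { unsigned long start, end; };   /* inclusive bounds */

struct gfn_pool {
    struct gfn_range r[MAX_RANGES];
    size_t nr;
};

/* Record [start, end] as unused guest frame numbers. Returns 0 on success. */
static int pool_add(struct gfn_pool *p, unsigned long start, unsigned long end)
{
    if ( start > end || p->nr == MAX_RANGES )
        return -1;

    p->r[p->nr].start = start;
    p->r[p->nr].end = end;
    p->nr++;
    return 0;
}

/* Take 'count' contiguous frames from the first range that can hold them. */
static unsigned long pool_alloc(struct gfn_pool *p, unsigned long count)
{
    size_t i;

    assert(count > 0);

    for ( i = 0; i < p->nr; i++ )
    {
        unsigned long avail = p->r[i].end - p->r[i].start + 1;
        unsigned long gfn = p->r[i].start;

        if ( avail < count )
            continue;

        if ( avail == count )
            p->r[i] = p->r[--p->nr];   /* range used up: drop it */
        else
            p->r[i].start += count;    /* shrink the range from the front */

        return gfn;
    }

    return INVALID_GFN;
}
```

A caller would seed the pool with the holes it considers safe and then
ask for one frame per shared_info or grant table page it needs to map.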

Roger.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 09/22] vixen: modify the e820 table to advertise HVM special pages as RAM
  2018-01-08  9:54         ` Andrew Cooper
@ 2018-01-08 10:23           ` Roger Pau Monné
  0 siblings, 0 replies; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-08 10:23 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Anthony Liguori, Wei Liu, Anthony Liguori, KarimAllah Ahmed,
	Jan H. Schönherr, Anthony Liguori, xen-devel, Matt Wilson

On Mon, Jan 08, 2018 at 09:54:59AM +0000, Andrew Cooper wrote:
> On 08/01/2018 09:51, Roger Pau Monné wrote:
> > On Sun, Jan 07, 2018 at 07:27:48AM -0800, Anthony Liguori wrote:
> >> On Sun, Jan 7, 2018 at 12:16 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>> On Sat, Jan 06, 2018 at 02:54:24PM -0800, Anthony Liguori wrote:
> >>>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> >>>> index a56f875..935901b 100644
> >>>> --- a/xen/arch/x86/mm.c
> >>>> +++ b/xen/arch/x86/mm.c
> >>>> @@ -122,6 +122,7 @@
> >>>>  #include <asm/fixmap.h>
> >>>>  #include <asm/io_apic.h>
> >>>>  #include <asm/pci.h>
> >>>> +#include <asm/guest.h>
> >>>>
> >>>>  #include <asm/hvm/grant_table.h>
> >>>>  #include <asm/pv/grant_table.h>
> >>>> @@ -945,7 +946,7 @@ get_page_from_l1e(
> >>>>              case 0:
> >>>>                  break;
> >>>>              case 1:
> >>>> -                if ( !is_hardware_domain(l1e_owner) )
> >>>> +                if ( !is_vixen() && !is_hardware_domain(l1e_owner) )
> >>>>                      break;
> >>>>                  /* fallthrough */
> >>>>              case -1:
> >>>> @@ -5536,6 +5537,21 @@ void arch_dump_shared_mem_info(void)
> >>>>              mem_sharing_get_nr_saved_mfns());
> >>>>  }
> >>>>
> >>>> +const unsigned long *__init
> >>>> +vixen_get_platform_badpages(unsigned int *array_size)
> >>>> +{
> >>>> +    static unsigned long __initdata bad_pages[] = {
> >>>> +        0xfeffc000,
> >>>> +        0xfeffd000,
> >>>> +        0xfeffe000,
> >>>> +        0xfefff000,
> >>> These values shouldn't be hardcoded. IMHO it would also be good to
> >>> place all the vixen_ helpers in a single file.
> >> Ack on moving to a helper.
> >>
> >> I don't know of a way to call the hypervisor to ask "what's the
> >> special page range?".  I can find special pages via the hvm get
> >> parameters calls but there's no guarantee they are contiguous so the
> >> resulting code to punch holes in the e820 becomes fairly complex.  Any
> >> ideas how to do this nicely?
> > I've done something similar for the shim, but the values in the
> > bad_pages array are dynamic:
> >
> > http://xenbits.xen.org/gitweb/?p=people/liuw/xen.git;a=commit;h=d5a72acaa2ced1bd66a1ef1ef7a4a1bda43a9df3
> >
> > Also, why do you need to add 4 GFNs to the list of bad pages? Just
> > adding the console/xenstore pages to the e820 and to the list of bad
> > pages should be enough.
> 
> You've got to be careful not to have the bootscrub zero the IDENT_PT. 
> For safety, I put all of the special pages through this E820/bad cycle.

IDENT_PT is not marked as a RAM region in the e820, so Xen will not
scrub it.
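
The reasoning can be modelled with a small check, assuming (as stated
above) that the boot scrub only visits pages covered by RAM-typed e820
entries. This is a toy model, not Xen's scrubbing code: page_is_ram() and
the example layout used with it are hypothetical. E820_RAM and
E820_RESERVED use the conventional type values 1 and 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE     0x1000ULL
#define E820_RAM      1
#define E820_RESERVED 2

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* True if the whole page starting at 'paddr' lies inside a RAM entry. */
static bool page_is_ram(const struct e820entry *map, size_t nr, uint64_t paddr)
{
    size_t i;

    assert(map != NULL || nr == 0);

    for ( i = 0; i < nr; i++ )
    {
        const struct e820entry *e = &map[i];

        if ( e->type != E820_RAM || e->size < PAGE_SIZE )
            continue;

        /* Written to avoid overflow in addr + size. */
        if ( paddr >= e->addr && paddr - e->addr <= e->size - PAGE_SIZE )
            return true;
    }

    return false;
}
```

In this model a page that sits in a reserved entry, or in no entry at
all, is never a scrub candidate, so it only needs the bad-pages treatment
if it has been advertised as RAM.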

Roger.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
                   ` (23 preceding siblings ...)
  2018-01-06 23:50 ` Andrew Cooper
@ 2018-01-08 11:54 ` Wei Liu
  2018-01-08 12:11   ` Roger Pau Monné
  24 siblings, 1 reply; 80+ messages in thread
From: Wei Liu @ 2018-01-08 11:54 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Wei Liu, Anthony Liguori, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Matt Wilson, xen-devel

Hi Anthony

On Sat, Jan 06, 2018 at 02:54:15PM -0800, Anthony Liguori wrote:
> From: Anthony Liguori <aliguori@amazon.com>
> 
> CVE-2017-5754 is problematic for paravirtualized x86 domUs because it
> appears to be very difficult to isolate the hypervisor's page tables
> from PV domUs while maintaining ABI compatibility.  Instead of trying
> to make a KPTI-like approach work for Xen PV, it seems reasonable to
> run a copy of Xen within an HVM (or PVH) domU to provide backwards
> compatibility with guests as mentioned in XSA-254 [1].
> 
> This patch series adds a new mode to Xen called Vixen (Virtualized
> Xen) which provides a PV-compatible interface while gaining
> CVE-2017-5754 protection for the host provided by hardware
> virtualization.  Vixen supports running a single unprivileged PV
> domain (a dom1) that is constructed by the dom0 domain builder.
> 
> Please note the Xen page table configuration fundamental to the
> current PV ABI makes it impossible for an operating system to mitigate
> CVE-2017-5754 through mechanisms like Kernel Page Table Isolation
> (KPTI).  In order for an operating system to mitigate CVE-2017-5754 it
> must run directly in a HVM or PVH domU.
> 
> This series is very similar to the PVH series posted by Wei and we
> have been discussing how to merge efforts.  We were hoping to have
> more time to work this out.  I am posting this because I'm fairly
> confident that this series is complete (all PV instances in EC2 are
> using this) and others might find it useful.  I also wanted to have
> more of a discussion about the best way to merge and some of the
> differences in designs.
> 
> This series is also available at:
> 
>  git clone https://github.com/aliguori/xen.git vixen-upstream-v1

I do want to make the shim be able to run in both pvh and hvm mode
(which doesn't seem to be too hard in practice).

I suppose we need to:

1. Agree on the kconfig options.
2. Figure out what is needed for each mode and guard them accordingly.
3. Unify the implementation of hypercall forwarding and other internal
   code.

I was sick last week so I'm a bit behind on everything (including the
pvshim series, which has a lot of feedback now).  I will read your
series (v1, v2 and comments) shortly and hopefully I can figure out
things by myself.

Wei.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-08 11:54 ` Wei Liu
@ 2018-01-08 12:11   ` Roger Pau Monné
  2018-01-08 12:14     ` Wei Liu
  2018-01-08 16:02     ` Anthony Liguori
  0 siblings, 2 replies; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-08 12:11 UTC (permalink / raw)
  To: Wei Liu
  Cc: Anthony Liguori, Anthony Liguori, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Matt Wilson, xen-devel

On Mon, Jan 08, 2018 at 11:54:57AM +0000, Wei Liu wrote:
> Hi Anthony
> 
> On Sat, Jan 06, 2018 at 02:54:15PM -0800, Anthony Liguori wrote:
> > From: Anthony Liguori <aliguori@amazon.com>
> > 
> > CVE-2017-5754 is problematic for paravirtualized x86 domUs because it
> > appears to be very difficult to isolate the hypervisor's page tables
> > from PV domUs while maintaining ABI compatibility.  Instead of trying
> > to make a KPTI-like approach work for Xen PV, it seems reasonable to
> > run a copy of Xen within an HVM (or PVH) domU to provide backwards
> > compatibility with guests as mentioned in XSA-254 [1].
> > 
> > This patch series adds a new mode to Xen called Vixen (Virtualized
> > Xen) which provides a PV-compatible interface while gaining
> > CVE-2017-5754 protection for the host provided by hardware
> > virtualization.  Vixen supports running a single unprivileged PV
> > domain (a dom1) that is constructed by the dom0 domain builder.
> > 
> > Please note the Xen page table configuration fundamental to the
> > current PV ABI makes it impossible for an operating system to mitigate
> > CVE-2017-5754 through mechanisms like Kernel Page Table Isolation
> > (KPTI).  In order for an operating system to mitigate CVE-2017-5754 it
> > must run directly in a HVM or PVH domU.
> > 
> > This series is very similar to the PVH series posted by Wei and we
> > have been discussing how to merge efforts.  We were hoping to have
> > more time to work this out.  I am posting this because I'm fairly
> > confident that this series is complete (all PV instances in EC2 are
> > using this) and others might find it useful.  I also wanted to have
> > more of a discussion about the best way to merge and some of the
> > differences in designs.
> > 
> > This series is also available at:
> > 
> >  git clone https://github.com/aliguori/xen.git vixen-upstream-v1
> 
> I do want to make the shim be able to run in both pvh and hvm mode
> (which doesn't seem to be too hard in practice).

AFAIK the pv-shim code will already work in HVM mode. It's just that
booting the pv-shim in HVM mode requires that you install the shim
inside of the guest and then boot it using grub or a similar loader
that can do multiboot.

Roger.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-08 12:11   ` Roger Pau Monné
@ 2018-01-08 12:14     ` Wei Liu
  2018-01-08 16:02     ` Anthony Liguori
  1 sibling, 0 replies; 80+ messages in thread
From: Wei Liu @ 2018-01-08 12:14 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori, xen-devel

On Mon, Jan 08, 2018 at 12:11:55PM +0000, Roger Pau Monné wrote:
> On Mon, Jan 08, 2018 at 11:54:57AM +0000, Wei Liu wrote:
> > Hi Anthony
> > 
> > On Sat, Jan 06, 2018 at 02:54:15PM -0800, Anthony Liguori wrote:
> > > From: Anthony Liguori <aliguori@amazon.com>
> > > 
> > > CVE-2017-5754 is problematic for paravirtualized x86 domUs because it
> > > appears to be very difficult to isolate the hypervisor's page tables
> > > from PV domUs while maintaining ABI compatibility.  Instead of trying
> > > to make a KPTI-like approach work for Xen PV, it seems reasonable to
> > > run a copy of Xen within an HVM (or PVH) domU to provide backwards
> > > compatibility with guests as mentioned in XSA-254 [1].
> > > 
> > > This patch series adds a new mode to Xen called Vixen (Virtualized
> > > Xen) which provides a PV-compatible interface while gaining
> > > CVE-2017-5754 protection for the host provided by hardware
> > > virtualization.  Vixen supports running a single unprivileged PV
> > > domain (a dom1) that is constructed by the dom0 domain builder.
> > > 
> > > Please note the Xen page table configuration fundamental to the
> > > current PV ABI makes it impossible for an operating system to mitigate
> > > CVE-2017-5754 through mechanisms like Kernel Page Table Isolation
> > > (KPTI).  In order for an operating system to mitigate CVE-2017-5754 it
> > > must run directly in a HVM or PVH domU.
> > > 
> > > This series is very similar to the PVH series posted by Wei and we
> > > have been discussing how to merge efforts.  We were hoping to have
> > > more time to work this out.  I am posting this because I'm fairly
> > > confident that this series is complete (all PV instances in EC2 are
> > > using this) and others might find it useful.  I also wanted to have
> > > more of a discussion about the best way to merge and some of the
> > > differences in designs.
> > > 
> > > This series is also available at:
> > > 
> > >  git clone https://github.com/aliguori/xen.git vixen-upstream-v1
> > 
> > I do want to make the shim be able to run in both pvh and hvm mode
> > (which doesn't seem to be too hard in practice).
> 
> AFAIK the pv-shim code will already work in HVM mode. It's just that
> booting the pv-shim in HVM mode requires that you install the shim
> inside of the guest and then boot it using grub or a similar loader
> that can do multiboot.
> 

I'm thinking more along the lines of using the shim in place of
hvmloader, so that we don't need to install it inside the guest.

Wei.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-08 12:11   ` Roger Pau Monné
  2018-01-08 12:14     ` Wei Liu
@ 2018-01-08 16:02     ` Anthony Liguori
  2018-01-08 16:28       ` George Dunlap
                         ` (2 more replies)
  1 sibling, 3 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-08 16:02 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Jan Beulich,
	Anthony Liguori, xen-devel

On Mon, Jan 8, 2018 at 4:11 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> On Mon, Jan 08, 2018 at 11:54:57AM +0000, Wei Liu wrote:
>> Hi Anthony
>>
>> On Sat, Jan 06, 2018 at 02:54:15PM -0800, Anthony Liguori wrote:
>> > From: Anthony Liguori <aliguori@amazon.com>
>> >
>> > CVE-2017-5754 is problematic for paravirtualized x86 domUs because it
>> > appears to be very difficult to isolate the hypervisor's page tables
>> > from PV domUs while maintaining ABI compatibility.  Instead of trying
>> > to make a KPTI-like approach work for Xen PV, it seems reasonable to
>> > run a copy of Xen within an HVM (or PVH) domU to provide backwards
>> > compatibility with guests as mentioned in XSA-254 [1].
>> >
>> > This patch series adds a new mode to Xen called Vixen (Virtualized
>> > Xen) which provides a PV-compatible interface while gaining
>> > CVE-2017-5754 protection for the host provided by hardware
>> > virtualization.  Vixen supports running a single unprivileged PV
>> > domain (a dom1) that is constructed by the dom0 domain builder.
>> >
>> > Please note the Xen page table configuration fundamental to the
>> > current PV ABI makes it impossible for an operating system to mitigate
>> > CVE-2017-5754 through mechanisms like Kernel Page Table Isolation
>> > (KPTI).  In order for an operating system to mitigate CVE-2017-5754 it
>> > must run directly in a HVM or PVH domU.
>> >
>> > This series is very similar to the PVH series posted by Wei and we
>> > have been discussing how to merge efforts.  We were hoping to have
>> > more time to work this out.  I am posting this because I'm fairly
>> > confident that this series is complete (all PV instances in EC2 are
>> > using this) and others might find it useful.  I also wanted to have
>> > more of a discussion about the best way to merge and some of the
>> > differences in designs.
>> >
>> > This series is also available at:
>> >
>> >  git clone https://github.com/aliguori/xen.git vixen-upstream-v1
>>
>> I do want to make the shim be able to run in both pvh and hvm mode
>> (which doesn't seem to be too hard in practice).
>
> AFAIK the pv-shim code will already work in HVM mode. It's just that
> booting the pv-shim in HVM mode requires that you install the shim
> inside of the guest and then boot it using grub or a similar loader
> that can do multiboot.

I'm happy to work on either approach.  I just want to get something
merged to have an upstream solution to this issue.  I think this
particular CVE for Xen PV is the worst of this batch of issues so I'm
super eager to get a solution straightened out.  I'd really like to
hear from others on what the right approach should be and I'll work on
whatever the consensus is.

I think PVH is a good long term solution but I think it's a poor short
term solution.  PVH isn't widely deployed so it's asking people to
upgrade their infrastructure to a very new version of Xen.  It also
requires tools changes which means that even if you are on a newer
version of Xen, you still have to upgrade.  The patch series is also
pretty big which means I suspect people will need to wait until 4.11
at best.

OTOH, the HVM version of the series requires no tools changes and works
on Xen versions going back to 3.4 (at least).  What this means
practically speaking is that if it were merged, we can tell people that
they can solve this problem by building the HVM shim and modifying
their launch config to boot from an ISO or something similar.

This gives people an immediate solution that does not require major
changes to their underlying infrastructure.

The series now is also reasonably contained and small enough that IMHO,
it could go into the stable tree.  That means that once merged, we
could cut a stable release giving people an official release that could
be used for this purpose.

If it was entirely my call, I would work on merging HVM shim first, get
a 4.10 stable release cut with it, and then focus on getting PVH shim
in place for the 4.11 release.  I think this is the right balance of
addressing the short term needs while also having the best long term
solution.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-08 16:02     ` Anthony Liguori
@ 2018-01-08 16:28       ` George Dunlap
       [not found]         ` <CA+aC4kt5zbymFbHqCMV-oB80cw2dXWTcTztpa4EnqOKELKs7qg@mail.gmail.com>
  2018-01-08 16:30       ` Wei Liu
  2018-01-08 16:38       ` Roger Pau Monné
  2 siblings, 1 reply; 80+ messages in thread
From: George Dunlap @ 2018-01-08 16:28 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, Wei Liu, Jan Beulich, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori,
	Matt Wilson, xen-devel, Roger Pau Monné

On Mon, Jan 8, 2018 at 4:02 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
>>> I do want to make the shim be able to run in both pvh and hvm mode
>>> (which doesn't seem to be too hard in practice).
>>
>> AFAIK the pv-shim code will already work in HVM mode. It's just that
>> booting the pv-shim in HVM mode requires that you install the shim
>> inside of the guest and then boot it using grub or a similar loader
>> that can do multiboot.
>
> I'm happy to work on either approach.  I just want to get something
> merged to have an upstream solution to this issue.  I think this
> particular CVE for Xen PV is the worst of this batch of issues so I'm
> super eager to get a solution straightened out.  I'd really like to
> hear from others on what the right approach should be and I'll work on
> whatever the consensus is.
>
> I think PVH is a good long term solution but I think it's a poor short
> term solution.  PVH isn't widely deployed so it's asking people to
> upgrade their infrastructure to a very new version of Xen.  It also
> requires tools changes which means that even if you are on a newer
> version of Xen, you still have to upgrade.  The patch series is also
> pretty big which means I suspect people will need to wait until 4.11
> at best.
>
> OTOH, the HVM version of the series requires no tools changes and works
> on Xen versions going back to 3.4 (at least).  What this means
> practically speaking is that if it were merged, we can tell people that
> they can solve this problem by building the HVM shim and modifying
> their launch config to boot from an ISO or something similar.
>
> This gives people an immediate solution that does not require major
> changes to their underlying infrastructure.

Solving the "how do we boot the shim" question is the main reason that
we decided to start with PVH-only back to 4.8.

We didn't consider working around it by having a special boot disk
(ISO or otherwise); it's hard to know how well that will work for most
people.  You don't think that "having to create and boot from a custom
ISO" would count as "major changes to underlying infrastructure"?

> The series now is also reasonably contained and small enough that IMHO,
> it could go into the stable tree.  That means that once merged, we
> could cut a stable release giving people an official release that could
> be used for this purpose.
>
> If it was entirely my call, I would work on merging HVM shim first, get
> a 4.10 stable release cut with it, and then focus on getting PVH shim
> in place for the 4.11 release.  I think this is the right balance of
> addressing the short term needs while also having the best long term
> solution.

If I understand correctly, this series is missing a number of features
from the other series -- migration being the key one, but perhaps
others (vcpu hot-plug? ballooning?).

In either case, it sounds like the "additional boot disk" approach
should work for older versions.

 -George


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-08 16:02     ` Anthony Liguori
  2018-01-08 16:28       ` George Dunlap
@ 2018-01-08 16:30       ` Wei Liu
  2018-01-08 16:39         ` Ian Jackson
  2018-01-08 17:11         ` Anthony Liguori
  2018-01-08 16:38       ` Roger Pau Monné
  2 siblings, 2 replies; 80+ messages in thread
From: Wei Liu @ 2018-01-08 16:30 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, Wei Liu, Jan Beulich, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori,
	Matt Wilson, security, xen-devel, Ian Jackson,
	Roger Pau Monné

On Mon, Jan 08, 2018 at 08:02:07AM -0800, Anthony Liguori wrote:
> On Mon, Jan 8, 2018 at 4:11 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > On Mon, Jan 08, 2018 at 11:54:57AM +0000, Wei Liu wrote:
> >> Hi Anthony
> >>
> >> On Sat, Jan 06, 2018 at 02:54:15PM -0800, Anthony Liguori wrote:
> >> > From: Anthony Liguori <aliguori@amazon.com>
> >> >
> >> > CVE-2017-5754 is problematic for paravirtualized x86 domUs because it
> >> > appears to be very difficult to isolate the hypervisor's page tables
> >> > from PV domUs while maintaining ABI compatibility.  Instead of trying
> >> > to make a KPTI-like approach work for Xen PV, it seems reasonable to
> >> > run a copy of Xen within an HVM (or PVH) domU to provide backwards
> >> > compatibility with guests as mentioned in XSA-254 [1].
> >> >
> >> > This patch series adds a new mode to Xen called Vixen (Virtualized
> >> > Xen) which provides a PV-compatible interface while gaining
> >> > CVE-2017-5754 protection for the host provided by hardware
> >> > virtualization.  Vixen supports running a single unprivileged PV
> >> > domain (a dom1) that is constructed by the dom0 domain builder.
> >> >
> >> > Please note the Xen page table configuration fundamental to the
> >> > current PV ABI makes it impossible for an operating system to mitigate
> >> > CVE-2017-5754 through mechanisms like Kernel Page Table Isolation
> >> > (KPTI).  In order for an operating system to mitigate CVE-2017-5754 it
> >> > must run directly in a HVM or PVH domU.
> >> >
> >> > This series is very similar to the PVH series posted by Wei and we
> >> > have been discussing how to merge efforts.  We were hoping to have
> >> > more time to work this out.  I am posting this because I'm fairly
> >> > confident that this series is complete (all PV instances in EC2 are
> >> > using this) and others might find it useful.  I also wanted to have
> >> > more of a discussion about the best way to merge and some of the
> >> > differences in designs.
> >> >
> >> > This series is also available at:
> >> >
> >> >  git clone https://github.com/aliguori/xen.git vixen-upstream-v1
> >>
> >> I do want to make the shim be able to run in both pvh and hvm mode
> >> (which doesn't seem to be too hard in practice).
> >
> > AFAIK the pv-shim code will already work in HVM mode. It's just that
> > booting the pv-shim in HVM mode requires that you install the shim
> > inside of the guest and then boot it using grub or a similar loader
> > that can do multiboot.
> 
> I'm happy to work on either approach.  I just want to get something
> merged to have an upstream solution to this issue.  I think this
> particular CVE for Xen PV is the worst of this batch of issues so I'm
> super eager to get a solution straightened out.  I'd really like to
> hear from others on what the right approach should be and I'll work on
> whatever the consensus is.
> 
> I think PVH is a good long term solution but I think it's a poor short
> term solution.  PVH isn't widely deployed so it's asking people to
> upgrade their infrastructure to a very new version of Xen.  It also
> requires tools changes which means that even if you are on a newer
> version of Xen, you still have to upgrade.  The patch series is also
> pretty big which means I suspect people will need to wait until 4.11
> at best.
> 
> OTOH, the HVM version of the series requires no tools changes and works
> on Xen versions going back to 3.4 (at least).  What this means
> practically speaking is that if it were merged, we can tell people that
> they can solve this problem by building the HVM shim and modifying
> their launch config to boot from an ISO or something similar.
> 

This is fair enough. And it is the major reason why I want to make the
shim work for both hvm and pvh in the first place.

I'm more than happy to work with you to make PV-in-HVM work.

> This gives people an immediate solution that does not require major
> changes to their
> underlying infrastructure.
> 

What is your assessment of the completeness of this series? I think
listing what works and what doesn't will help upstream make a better
decision. For example, does migration work? It doesn't mean everything
needs to be complete before we can start merging it but we do want
upstream users to understand what would be broken.

> The series now is also reasonably contained and small enough that
> IMHO, it could go
> into the stable tree.  That means that once merged, we could cut a
> stable release giving
> people an official release that could be used for this purpose.
> 
> If it was entirely my call, I would work on merging HVM shim first,
> get a 4.10 stable release
> cut with it, and then focus on getting PVH shim in place for the 4.11
> release.  I think
> this is the right balance of addressing the short term needs while
> also having the best long
> term solution.

Not my call either. I will wait for security team member and stable tree
maintainers to weigh in.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-08 16:02     ` Anthony Liguori
  2018-01-08 16:28       ` George Dunlap
  2018-01-08 16:30       ` Wei Liu
@ 2018-01-08 16:38       ` Roger Pau Monné
  2 siblings, 0 replies; 80+ messages in thread
From: Roger Pau Monné @ 2018-01-08 16:38 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, Wei Liu, Jan Beulich, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Matt Wilson, xen-devel,
	Anthony Liguori

On Mon, Jan 08, 2018 at 08:02:07AM -0800, Anthony Liguori wrote:
> On Mon, Jan 8, 2018 at 4:11 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > On Mon, Jan 08, 2018 at 11:54:57AM +0000, Wei Liu wrote:
> >> Hi Anthony
> >>
> >> On Sat, Jan 06, 2018 at 02:54:15PM -0800, Anthony Liguori wrote:
> >> > From: Anthony Liguori <aliguori@amazon.com>
> >> >
> >> > CVE-2017-5754 is problematic for paravirtualized x86 domUs because it
> >> > appears to be very difficult to isolate the hypervisor's page tables
> >> > from PV domUs while maintaining ABI compatibility.  Instead of trying
> >> > to make a KPTI-like approach work for Xen PV, it seems reasonable to
> >> > run a copy of Xen within an HVM (or PVH) domU to provide backwards
> >> > compatibility with guests as mentioned in XSA-254 [1].
> >> >
> >> > This patch series adds a new mode to Xen called Vixen (Virtualized
> >> > Xen) which provides a PV-compatible interface while gaining
> >> > CVE-2017-5754 protection for the host provided by hardware
> >> > virtualization.  Vixen supports running a single unprivileged PV
> >> > domain (a dom1) that is constructed by the dom0 domain builder.
> >> >
> >> > Please note the Xen page table configuration fundamental to the
> >> > current PV ABI makes it impossible for an operating system to mitigate
> >> > CVE-2017-5754 through mechanisms like Kernel Page Table Isolation
> >> > (KPTI).  In order for an operating system to mitigate CVE-2017-5754 it
> >> > must run directly in a HVM or PVH domU.
> >> >
> >> > This series is very similar to the PVH series posted by Wei and we
> >> > have been discussing how to merge efforts.  We were hoping to have
> >> > more time to work this out.  I am posting this because I'm fairly
> >> > confident that this series is complete (all PV instances in EC2 are
> >> > using this) and others might find it useful.  I also wanted to have
> >> > more of a discussion about the best way to merge and some of the
> >> > differences in designs.
> >> >
> >> > This series is also available at:
> >> >
> >> >  git clone https://github.com/aliguori/xen.git vixen-upstream-v1
> >>
> >> I do want to make the shim be able to run in both pvh and hvm mode
> >> (which doesn't seem to be too hard in practice).
> >
> > AFAIK the pv-shim code will already work in HVM mode. It's just that
> > booting the pv-shim in HVM mode requires that you install the shim
> > inside of the guest and then boot it using grub or a similar loader
> > that can do multiboot.
> 
> I'm happy to work on either approach.  I just want to get something
> merged to have
> an upstream solution to this issue.  I think this particular CVE for
> Xen PV is the worst
> of this batch of issues so I'm super eager on getting a solution
> straightened out.  I'd
> really like to hear from others on what the right approach should be
> and I'll work on
> whatever the consensus is.

I agree it's important to get something merged or in a decent shape in
order to solve/mitigate this issue, and likely ASAP.

> I think PVH is a good long term solution but I think it's a poor short
> term solution.
> PVH isn't widely deployed so it's asking people to upgrade their
> infrastructure to a
> very new version of Xen.  It also requires tools changes which means
> that even if
> you are on a newer version of Xen, you still have to upgrade.  The
> patch series is
> also pretty big which means I suspect people will need to wait to 4.11 at best.
> 
> OTOH, the HVM version of the series requires no tools changes and works on Xen
> versions going back to 3.4 (at least).  What this means practically
> speaking is that
> if it were merged, we can tell people that they can solve this problem
> by building the
> HVM shim and modifying their launch config to boot from an ISO or
> something similar.

The only difference here is that vixen is capable of using more event
channel injection mechanisms, whereas the pv-shim is limited to
HVMOP_set_evtchn_upcall_vector ATM. Apart from that, the pv-shim code
should work fine inside of an HVM container.

> This gives people an immediate solution that does not require major
> changes to their
> underlying infrastructure.
> 
> The series now is also reasonably contained and small enough that
> IMHO, it could go
> into the stable tree.  That means that once merged, we could cut a
> stable release giving
> people an official release that could be used for this purpose.

It's also important to note that the vixen series is smaller because
it supports a much more limited set of features. The pv-shim code
supports migration, vcpu hotplug/unplug (the vcpu is actually
plugged/unplugged from the shim itself) and memory ballooning.

IMHO merging a subset of the pv-shim work in order to get a set of
functionality similar to the one offered by vixen should prove
easier.

Roger.


* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-08 16:30       ` Wei Liu
@ 2018-01-08 16:39         ` Ian Jackson
  2018-01-08 17:03           ` Anthony Liguori
  2018-01-08 17:11         ` Anthony Liguori
  1 sibling, 1 reply; 80+ messages in thread
From: Ian Jackson @ 2018-01-08 16:39 UTC (permalink / raw)
  To: Wei Liu
  Cc: Anthony Liguori, Anthony Liguori, KarimAllah Ahmed,
	Andrew Cooper, Jan H.Schönherr, Anthony Liguori,
	Jan Beulich, Matt Wilson, security, xen-devel,
	Roger Pau Monné

Wei Liu writes ("Re: [Xen-devel] [PATCH 00/22] Vixen: A PV-in-HVM shim"):
> On Mon, Jan 08, 2018 at 08:02:07AM -0800, Anthony Liguori wrote:
> > OTOH, the HVM version of the series requires no tools changes and
> > works on Xen versions going back to 3.4 (at least).

That depends, I think, on how you are selecting the guest kernel.

libxl (at least, older libxls) doesn't support direct kernel boot in HVM
mode.  So if you were using kernel= in your config file that won't
work without libxl changes, which are really hard to do while also
maintaining ABI compatibility.

Likewise bootloader= (eg bootloader="pygrub").

> > If it was entirely my call, I would work on merging HVM shim
> > first, get a 4.10 stable release cut with it, and then focus on
> > getting PVH shim in place for the 4.11 release.  I think this is
> > the right balance of addressing the short term needs while also
> > having the best long term solution.
> 
> Not my call either. I will wait for security team member and stable tree
> maintainers to weigh in.

Since shim users are going to be using unstable/4.10 as the shim
anyway, I think a good priority is indeed getting a good solution for
4.10.

Personally I am not doing any Xen review work or commit work right now
that is not related to Meltdown/Spectre.  Everything else has to wait.

Furthermore I think we should avoid committing anything to
xen-unstable that will complicate our efforts on the shim.

Ian.


* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
       [not found]               ` <CA+aC4kujGxWQzSfWP=8qP2SWd0G+qBod8HCLuosPg9SzS-22Vw@mail.gmail.com>
@ 2018-01-08 16:41                 ` Anthony Liguori
  0 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-08 16:41 UTC (permalink / raw)
  To: George Dunlap
  Cc: Anthony Liguori, Wei Liu, Jan Beulich, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori,
	Matt Wilson, xen-devel, Roger Pau Monné



On Jan 8, 2018 8:28 AM, "George Dunlap" <dunlapg@umich.edu> wrote:

On Mon, Jan 8, 2018 at 4:02 PM, Anthony Liguori <anthony@codemonkey.ws>
wrote:
>>> I do want to make the shim be able to run in both pvh and hvm mode
>>> (which doesn't seem to be too hard in practice).
>>
>> AFAIK the pv-shim code will already work in HVM mode. It's just that
>> booting the pv-shim in HVM mode requires that you install the shim
>> inside of the guest and then boot it using grub or a similar loader
>> that can do multiboot.
>
> I'm happy to work on either approach.  I just want to get something
> merged to have
> an upstream solution to this issue.  I think this particular CVE for
> Xen PV is the worst
> of this batch of issues so I'm super eager on getting a solution
> straightened out.  I'd
> really like to hear from others on what the right approach should be
> and I'll work on
> whatever the consensus is.
>
> I think PVH is a good long term solution but I think it's a poor short
> term solution.
> PVH isn't widely deployed so it's asking people to upgrade their
> infrastructure to a
> very new version of Xen.  It also requires tools changes which means
> that even if
> you are on a newer version of Xen, you still have to upgrade.  The
> patch series is
> also pretty big which means I suspect people will need to wait to 4.11 at
best.
>
> OTOH, the HVM version of the series requires no tools changes and works
on Xen
> versions going back to 3.4 (at least).  What this means practically
> speaking is that
> if it were merged, we can tell people that they can solve this problem
> by building the
> HVM shim and modifying their launch config to boot from an ISO or
> something similar.
>
> This gives people an immediate solution that does not require major
> changes to their
> underlying infrastructure.

Solving the "how do we boot the shim" question is the main reason that
we decided to start with PVH-only back to 4.8.

We didn't consider working around it by having a special boot disk
(ISO or otherwise); it's hard to know how well that will work for most
people.  You don't think that "having to create and boot from a custom
ISO" would count as "major changes to underlying infrastructure"?


If you are a user of xl, then it's a one time config file conversion plus
iso generation.  If you use pvgrub, then the same iso can be reused widely.

You can imagine a script to automate the conversion too.

It may be harder if you are using management tools but it's still easier
than a major version upgrade.

> The series now is also reasonably contained and small enough that
> IMHO, it could go
> into the stable tree.  That means that once merged, we could cut a
> stable release giving
> people an official release that could be used for this purpose.
>
> If it was entirely my call, I would work on merging HVM shim first,
> get a 4.10 stable release
> cut with it, and then focus on getting PVH shim in place for the 4.11
> release.  I think
> this is the right balance of addressing the short term needs while
> also having the best long
> term solution.

If I understand correctly, this series is missing a number of features
from the other series -- migration being the key one, but perhaps
others (vcpu hot-plug? ballooning?).


I haven't tested vcpu hotplug.  The xenstore communication works but we
would have to also pass through the reservations calls.  Not hard at all to
do.

In terms of completeness, as I mentioned, all PV instances in EC2 are using
this today.  We don't make use of all Xen features so some others may be
missing.


In either case, it sounds like "additional boot disk" should work for
older versions.


Right.

Regards,

Anthony Liguori


 -George


* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-08 16:39         ` Ian Jackson
@ 2018-01-08 17:03           ` Anthony Liguori
  2018-01-08 17:34             ` Wei Liu
  0 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-08 17:03 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Anthony Liguori, Wei Liu, Jan Beulich, KarimAllah Ahmed,
	Andrew Cooper, Jan H.Schönherr, Anthony Liguori,
	Matt Wilson, security, xen-devel, Roger Pau Monné

On Mon, Jan 8, 2018 at 8:39 AM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
> Wei Liu writes ("Re: [Xen-devel] [PATCH 00/22] Vixen: A PV-in-HVM shim"):
>> On Mon, Jan 08, 2018 at 08:02:07AM -0800, Anthony Liguori wrote:
>> > OTOH, the HVM version of the series requires no tools changes and
>> > works on Xen versions going back to 3.4 (at least).
>
> That depends, I think, on how you are selecting the guest kernel.
>
> libxl (at least, older libxls) don't support direct kernel boot in HVM
> mode.  So if you were using kernel= in your config file that won't
> work without libxl changes which are really hard to do and also
> maintain ABI compatibility.
>
> Likewise bootloader= (eg bootloader="pygrub").

I think pvgrub is a pretty reasonable alternative to pygrub for most people.

What we specifically did was take the kernel/etc arguments and used them
to generate an ISO with isolinux with the shim embedded in the ISO.

While it does work to set boot="d" and add the ISO to the disk=[] option, we
preferred to use a wrapper around qemu to directly add a -cdrom option so
that the ISO would not be exposed as a blkback device.
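
A minimal sketch of the config conversion described above, assuming an
xl-style config held as a Python dict.  The keys builder, boot and disk
are ordinary xl config options; the helper names, guest name and ISO
path are hypothetical and are not part of this series.  It only covers
the boot="d" plus disk=[] variant, not the qemu -cdrom wrapper.

```python
# Keys that only make sense for a direct PV boot.  In the approach
# described above the kernel/etc arguments are baked into the ISO, so
# they are dropped from the HVM config.
PV_ONLY_KEYS = ("kernel", "ramdisk", "extra", "root", "bootloader")


def pv_to_hvm_shim_config(pv_cfg, shim_iso):
    """Return a new HVM-style config dict that boots from shim_iso.

    pv_cfg is left untouched; shim_iso is a hypothetical path to an ISO
    that already contains the shim plus the guest kernel and initrd.
    """
    hvm_cfg = {k: v for k, v in pv_cfg.items() if k not in PV_ONLY_KEYS}
    hvm_cfg["builder"] = "hvm"   # run the domain as HVM
    hvm_cfg["boot"] = "d"        # boot from the CD-ROM device first
    disks = list(pv_cfg.get("disk", []))
    disks.append("file:%s,hdc:cdrom,r" % shim_iso)
    hvm_cfg["disk"] = disks
    return hvm_cfg


def render(cfg):
    """Render a config dict as 'key = value' lines, sorted by key."""
    return "".join("%s = %r\n" % (key, cfg[key]) for key in sorted(cfg))


if __name__ == "__main__":
    pv = {
        "name": "guest1",
        "memory": 1024,
        "kernel": "/boot/vmlinuz",
        "ramdisk": "/boot/initrd.img",
        "extra": "root=/dev/xvda1",
        "disk": ["phy:/dev/vg/guest1,xvda,w"],
    }
    print(render(pv_to_hvm_shim_config(pv, "/var/lib/xen/guest1-shim.iso")))
```

The existing disk list is preserved, so the guest still sees its
original block devices; only the boot path changes.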

It's not effort free, but it's also a change that I would think most
administrators can make.

Regards,

Anthony Liguori

>> > If it was entirely my call, I would work on merging HVM shim
>> > first, get a 4.10 stable release cut with it, and then focus on
>> > getting PVH shim in place for the 4.11 release.  I think this is
>> > the right balance of addressing the short term needs while also
>> > having the best long term solution.
>>
>> Not my call either. I will wait for security team member and stable tree
>> maintainers to weigh in.
>
> Since shim users are going to be using unstable/4.10 as the shim
> anyway, I think a good priority is indeed getting a good solution for
> 4.10.
>
> Personally I am not doing any Xen review work or commit work right now
> that is not related to Meltdown/Spectre.  Everything else has to wait.
>
> Furthermore I think we should avoid committing anything to
> xen-unstable that will complicate our efforts on the shim.
>
> Ian.


* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-08 16:30       ` Wei Liu
  2018-01-08 16:39         ` Ian Jackson
@ 2018-01-08 17:11         ` Anthony Liguori
  1 sibling, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-08 17:11 UTC (permalink / raw)
  To: Wei Liu
  Cc: Anthony Liguori, Jan Beulich, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Anthony Liguori, Matt Wilson, security,
	xen-devel, Ian Jackson, Roger Pau Monné

On Mon, Jan 8, 2018 at 8:30 AM, Wei Liu <wei.liu2@citrix.com> wrote:
> On Mon, Jan 08, 2018 at 08:02:07AM -0800, Anthony Liguori wrote:
>> On Mon, Jan 8, 2018 at 4:11 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
>> > On Mon, Jan 08, 2018 at 11:54:57AM +0000, Wei Liu wrote:
>> >> Hi Anthony
>> >>
>> >> On Sat, Jan 06, 2018 at 02:54:15PM -0800, Anthony Liguori wrote:
>> >> > From: Anthony Liguori <aliguori@amazon.com>
>> >> >
>> >> > CVE-2017-5754 is problematic for paravirtualized x86 domUs because it
>> >> > appears to be very difficult to isolate the hypervisor's page tables
>> >> > from PV domUs while maintaining ABI compatibility.  Instead of trying
>> >> > to make a KPTI-like approach work for Xen PV, it seems reasonable to
>> >> > run a copy of Xen within an HVM (or PVH) domU to provide backwards
>> >> > compatibility with guests as mentioned in XSA-254 [1].
>> >> >
>> >> > This patch series adds a new mode to Xen called Vixen (Virtualized
>> >> > Xen) which provides a PV-compatible interface while gaining
>> >> > CVE-2017-5754 protection for the host provided by hardware
>> >> > virtualization.  Vixen supports running a single unprivileged PV
>> >> > domain (a dom1) that is constructed by the dom0 domain builder.
>> >> >
>> >> > Please note the Xen page table configuration fundamental to the
>> >> > current PV ABI makes it impossible for an operating system to mitigate
>> >> > CVE-2017-5754 through mechanisms like Kernel Page Table Isolation
>> >> > (KPTI).  In order for an operating system to mitigate CVE-2017-5754 it
>> >> > must run directly in a HVM or PVH domU.
>> >> >
>> >> > This series is very similar to the PVH series posted by Wei and we
>> >> > have been discussing how to merge efforts.  We were hoping to have
>> >> > more time to work this out.  I am posting this because I'm fairly
>> >> > confident that this series is complete (all PV instances in EC2 are
>> >> > using this) and others might find it useful.  I also wanted to have
>> >> > more of a discussion about the best way to merge and some of the
>> >> > differences in designs.
>> >> >
>> >> > This series is also available at:
>> >> >
>> >> >  git clone https://github.com/aliguori/xen.git vixen-upstream-v1
>> >>
>> >> I do want to make the shim be able to run in both pvh and hvm mode
>> >> (which doesn't seem to be too hard in practice).
>> >
>> > AFAIK the pv-shim code will already work in HVM mode. It's just that
>> > booting the pv-shim in HVM mode requires that you install the shim
>> > inside of the guest and then boot it using grub or a similar loader
>> > that can do multiboot.
>>
>> I'm happy to work on either approach.  I just want to get something
>> merged to have
>> an upstream solution to this issue.  I think this particular CVE for
>> Xen PV is the worst
>> of this batch of issues so I'm super eager on getting a solution
>> straightened out.  I'd
>> really like to hear from others on what the right approach should be
>> and I'll work on
>> whatever the consensus is.
>>
>> I think PVH is a good long term solution but I think it's a poor short
>> term solution.
>> PVH isn't widely deployed so it's asking people to upgrade their
>> infrastructure to a
>> very new version of Xen.  It also requires tools changes which means
>> that even if
>> you are on a newer version of Xen, you still have to upgrade.  The
>> patch series is
>> also pretty big which means I suspect people will need to wait to 4.11 at best.
>>
>> OTOH, the HVM version of the series requires no tools changes and works on Xen
>> versions going back to 3.4 (at least).  What this means practically
>> speaking is that
>> if it were merged, we can tell people that they can solve this problem
>> by building the
>> HVM shim and modifying their launch config to boot from an ISO or
>> something similar.
>>
>
> This is fair enough. And it is the major reason why I want to make the
> shim work for both hvm and pvh in the first place.
>
> I'm more than happy to work with you to make PV-in-HVM work.
>
>> This gives people an immediate solution that does not require major
>> changes to their
>> underlying infrastructure.
>>
>
> What is your assessment of the completeness of this series? I think
> listing what works and what doesn't will help upstream make a better
> decision. For example, does migration work? It doesn't mean everything
> needs to be complete before we can start merging it but we do want
> upstream users to understand what would be broken.

I don't know if migration works.

I know that a wide range of guests work going back to pretty old kernel versions
including XenoLinux.

Dynamic attach/detach of devices work, API driven reboot/shutdown, CPU capping,
SMP, etc.

I think the major features we haven't tested are probably migration,
CPU hotplug, and ballooning.

Regards,

Anthony Liguori


* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-08 17:03           ` Anthony Liguori
@ 2018-01-08 17:34             ` Wei Liu
  2018-01-08 17:47               ` Anthony Liguori
  0 siblings, 1 reply; 80+ messages in thread
From: Wei Liu @ 2018-01-08 17:34 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, Wei Liu, Jan Beulich, KarimAllah Ahmed,
	Andrew Cooper, Jan H.Schönherr, Ian Jackson,
	Anthony Liguori, Matt Wilson, security, xen-devel,
	Roger Pau Monné

On Mon, Jan 08, 2018 at 09:03:44AM -0800, Anthony Liguori wrote:
> On Mon, Jan 8, 2018 at 8:39 AM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
> > Wei Liu writes ("Re: [Xen-devel] [PATCH 00/22] Vixen: A PV-in-HVM shim"):
> >> On Mon, Jan 08, 2018 at 08:02:07AM -0800, Anthony Liguori wrote:
> >> > OTOH, the HVM version of the series requires no tools changes and
> >> > works on Xen versions going back to 3.4 (at least).
> >
> > That depends, I think, on how you are selecting the guest kernel.
> >
> > libxl (at least, older libxls) don't support direct kernel boot in HVM
> > mode.  So if you were using kernel= in your config file that won't
> > work without libxl changes which are really hard to do and also
> > maintain ABI compatibility.
> >
> > Likewise bootloader= (eg bootloader="pygrub").
> 
> I think pvgrub is a pretty reasonable alternative to pygrub for most people.
> 
> What we specifically did was take the kernel/etc arguments and used them
> to generate an ISO with isolinux with the shim embedded in the ISO.
> 
> While it does work to set boot="d" and add the ISO to the disk=[] option, we
> preferred to use a wrapper around qemu to directly add a -cdrom option so
> that the ISO would not be exposed as a blkback device.
> 

If you use an ISO which boots isolinux, when and where do you get
hvm_start_info?

Wei.


* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-08 17:34             ` Wei Liu
@ 2018-01-08 17:47               ` Anthony Liguori
  2018-01-08 17:53                 ` Ian Jackson
  0 siblings, 1 reply; 80+ messages in thread
From: Anthony Liguori @ 2018-01-08 17:47 UTC (permalink / raw)
  To: Wei Liu
  Cc: Anthony Liguori, Jan Beulich, KarimAllah Ahmed, Andrew Cooper,
	Jan H.Schönherr, Ian Jackson, Anthony Liguori, Matt Wilson,
	security, xen-devel, Roger Pau Monné

On Mon, Jan 8, 2018 at 9:34 AM, Wei Liu <wei.liu2@citrix.com> wrote:
> On Mon, Jan 08, 2018 at 09:03:44AM -0800, Anthony Liguori wrote:
>> On Mon, Jan 8, 2018 at 8:39 AM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
>> > Wei Liu writes ("Re: [Xen-devel] [PATCH 00/22] Vixen: A PV-in-HVM shim"):
>> >> On Mon, Jan 08, 2018 at 08:02:07AM -0800, Anthony Liguori wrote:
>> >> > OTOH, the HVM version of the series requires no tools changes and
>> >> > works on Xen versions going back to 3.4 (at least).
>> >
>> > That depends, I think, on how you are selecting the guest kernel.
>> >
>> > libxl (at least, older libxls) don't support direct kernel boot in HVM
>> > mode.  So if you were using kernel= in your config file that won't
>> > work without libxl changes which are really hard to do and also
>> > maintain ABI compatibility.
>> >
>> > Likewise bootloader= (eg bootloader="pygrub").
>>
>> I think pvgrub is a pretty reasonable alternative to pygrub for most people.
>>
>> What we specifically did was take the kernel/etc arguments and used them
>> to generate an ISO with isolinux with the shim embedded in the ISO.
>>
>> While it does work to set boot="d" and add the ISO to the disk=[] option, we
>> preferred to use a wrapper around qemu to directly add a -cdrom option so
>> that the ISO would not be exposed as a blkback device.
>>
>
> If you use an ISO which boots isolinux, when and where do you get
> hvm_start_info?

hvmloader is still used.   The full HVM boot stack is intact so it's
hvmloader -> {pcbios,seabios} -> boot loader.

For testing, I've been using grub as the boot loader but isolinux
works fine too.

The shim is booted as a multiboot kernel and the original
kernel/initrd are passed as multiboot modules.
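
A minimal sketch of what the boot loader entry for such an ISO could
look like when grub is the boot loader, assuming GRUB 2 syntax
(multiboot for the shim, one module line each for the original kernel
and initrd).  The helper name, file paths, menu title and command line
arguments are hypothetical; nothing here is taken from the series.

```python
def shim_grub_entry(shim_path, kernel_path, kernel_args="",
                    initrd_path=None, shim_args=""):
    """Build a GRUB 2 menuentry that boots the shim via multiboot.

    The shim is the multiboot kernel; the guest kernel (with its
    command line) is the first module and the optional initrd is the
    second module.
    """
    lines = ["menuentry 'PV guest via shim' {"]
    lines.append(("    multiboot %s %s" % (shim_path, shim_args)).rstrip())
    lines.append(("    module %s %s" % (kernel_path, kernel_args)).rstrip())
    if initrd_path:
        lines.append("    module %s" % initrd_path)
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All paths and arguments below are placeholders.
    print(shim_grub_entry("/boot/xen-shim.gz",
                          "/boot/vmlinuz",
                          kernel_args="root=/dev/xvda1 ro",
                          initrd_path="/boot/initrd.img",
                          shim_args="loglvl=none"))
```

The module order matters here: the kernel has to come before the
initrd, since the shim can only tell them apart by position.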

Regards,

Anthony Liguori

> Wei.


* Re: [PATCH 00/22] Vixen: A PV-in-HVM shim
  2018-01-08 17:47               ` Anthony Liguori
@ 2018-01-08 17:53                 ` Ian Jackson
  0 siblings, 0 replies; 80+ messages in thread
From: Ian Jackson @ 2018-01-08 17:53 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, Wei Liu, Jan Beulich, KarimAllah Ahmed,
	Andrew Cooper, Jan H.Schönherr, Anthony Liguori,
	Matt Wilson, security, xen-devel, Roger Pau Monné

Anthony Liguori writes ("Re: [Xen-devel] [PATCH 00/22] Vixen: A PV-in-HVM shim"):
> hvmloader is still used.   The full HVM boot stack is intact so it's
> hvmloader -> {pcbios,seabios} -> boot loader.
> 
> For testing, I've been using grub as the boot loader but isolinux
> works fine too.
> 
> The shim is booted as a multiboot kernel and the original
> kernel/initrd are passed as multiboot modules.

My coworkers just explained this to me.  I think it sounds brilliant.
We have been calling the constructed cdrom a "sidecar".

Ian.


* Re: [PATCH 22/22] vixen: dom0 builder support
  2018-01-06 22:54 ` [PATCH 22/22] vixen: dom0 builder support Anthony Liguori
  2018-01-07  0:24   ` Matt Wilson
  2018-01-07  9:02   ` Roger Pau Monné
@ 2018-01-08 18:22   ` Konrad Rzeszutek Wilk
  2018-01-08 18:26     ` Anthony Liguori
  2 siblings, 1 reply; 80+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-01-08 18:22 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Wei Liu, Anthony Liguori, KarimAllah Ahmed, Andrew Cooper,
	Jan H. Schönherr, Matt Wilson, xen-devel

.snip..
> +    printk("Vixen Xenstore evtchn is %d, pfn is 0x%" PRIx64 "\n",
> +           *pstore_evtchn, *pstore_mfn);

So.. patch " console: do not print banner if below info log threshold"
speaks about having the printk be as close to what the PV guest would be
but here you are providing the printks.
?
And the other patches too?


* Re: [PATCH 22/22] vixen: dom0 builder support
  2018-01-08 18:22   ` Konrad Rzeszutek Wilk
@ 2018-01-08 18:26     ` Anthony Liguori
  0 siblings, 0 replies; 80+ messages in thread
From: Anthony Liguori @ 2018-01-08 18:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Anthony Liguori, Wei Liu, Matt Wilson, KarimAllah Ahmed,
	Andrew Cooper, Jan H. Schönherr, Anthony Liguori, xen-devel

On Mon, Jan 8, 2018 at 10:22 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> .snip..
>> +    printk("Vixen Xenstore evtchn is %d, pfn is 0x%" PRIx64 "\n",
>> +           *pstore_evtchn, *pstore_mfn);
>
> So.. patch " console: do not print banner if below info log threshold"
> speaks about having the printk be as close to what the PV guest would be
> but here you are providing the printks.
> ?
> And the other patches too?

When using loglvl=none, this printk will be suppressed.  It's useful
for debugging though.

The banner patch was needed because even with loglvl=none, the banner
is still printed out.

The QEMU console patch is yet another option which sends the Xen
logging output to the QEMU log.

Regards,

Anthony Liguori



end of thread, other threads:[~2018-01-08 18:26 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-06 22:54 [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
2018-01-06 22:54 ` [PATCH 01/22] ---- x86/Kconfig: Options for Xen and PVH support Anthony Liguori
2018-01-06 22:54 ` [PATCH 02/22] x86/entry: Probe for Xen early during boot Anthony Liguori
2018-01-06 22:54 ` [PATCH 03/22] x86/guest: Hypercall support Anthony Liguori
2018-01-06 22:54 ` [PATCH 04/22] x86: Don't use potentially incorrect CPUID values for topology information Anthony Liguori
2018-01-06 22:54 ` [PATCH 05/22] char: optionally redirect {, g}printk output to QEMU debug log Anthony Liguori
2018-01-07  0:18   ` Anthony Liguori
2018-01-07  0:35     ` Matt Wilson
2018-01-06 22:54 ` [PATCH 06/22] console: do not print banner if below info log threshold Anthony Liguori
2018-01-06 22:54 ` [PATCH 07/22] vixen: introduce is_vixen() to allow altering behavior Anthony Liguori
2018-01-07  0:06   ` Matt Wilson
2018-01-07  0:26     ` Anthony Liguori
2018-01-06 22:54 ` [PATCH 08/22] vixen: allow dom0 to be created with a domid != 0 Anthony Liguori
2018-01-06 22:54 ` [PATCH 09/22] vixen: modify the e820 table to advertise HVM special pages as RAM Anthony Liguori
2018-01-07  8:16   ` Roger Pau Monné
2018-01-07 15:27     ` Anthony Liguori
2018-01-08  9:51       ` Roger Pau Monné
2018-01-08  9:54         ` Andrew Cooper
2018-01-08 10:23           ` Roger Pau Monné
2018-01-06 22:54 ` [PATCH 10/22] vixen: do not permit access to physical IRQs if in Vixen mode Anthony Liguori
2018-01-07  8:18   ` Roger Pau Monné
2018-01-07 15:28     ` Anthony Liguori
2018-01-06 22:54 ` [PATCH 11/22] vixen: early initialization of Vixen including shared_info mapping Anthony Liguori
2018-01-07  8:23   ` Roger Pau Monné
2018-01-07 15:33     ` Anthony Liguori
2018-01-08  9:55       ` Roger Pau Monné
2018-01-06 22:54 ` [PATCH 12/22] vixen: paravirtualization TSC frequency calculation Anthony Liguori
2018-01-06 22:54 ` [PATCH 13/22] vixen: Use SCHEDOP_shutdown to shutdown the machine Anthony Liguori
2018-01-07  8:27   ` Roger Pau Monné
2018-01-07 15:35     ` Anthony Liguori
2018-01-06 22:54 ` [PATCH 14/22] vixen: forward VCPUOP_register_runstate_memory_area to outer Xen Anthony Liguori
2018-01-06 22:54 ` [PATCH 15/22] vixen: pass through version hypercalls to parent Xen Anthony Liguori
2018-01-07  8:31   ` Roger Pau Monné
2018-01-07 15:40     ` Anthony Liguori
2018-01-07 15:55       ` Andrew Cooper
2018-01-08  9:36       ` Jan Beulich
2018-01-06 22:54 ` [PATCH 16/22] vixen: pass grant table operations through to the outer Xen Anthony Liguori
2018-01-07  8:36   ` Roger Pau Monné
2018-01-07 15:42     ` Anthony Liguori
2018-01-07 16:45       ` Andrew Cooper
2018-01-07 17:09         ` Anthony Liguori
2018-01-07 18:45           ` Anthony Liguori
2018-01-08 10:12         ` Roger Pau Monné
2018-01-08 10:05       ` Roger Pau Monné
2018-01-06 22:54 ` [PATCH 17/22] vixen: setup infrastructure to receive event channel notifications Anthony Liguori
2018-01-07  8:42   ` Roger Pau Monné
2018-01-07 15:45     ` Anthony Liguori
2018-01-06 22:54 ` [PATCH 18/22] vixen: Introduce ECS_PROXY for event channel proxying Anthony Liguori
2018-01-07  8:44   ` Roger Pau Monné
2018-01-07 15:46     ` Anthony Liguori
2018-01-08 10:04       ` Jan H. Schönherr
2018-01-06 22:54 ` [PATCH 19/22] vixen: Fix Vixen adaptation of send_global_virq() Anthony Liguori
2018-01-06 22:54 ` [PATCH 20/22] vixen: event channel passthrough support Anthony Liguori
2018-01-06 22:54 ` [PATCH 21/22] vixen: provide Xencons implementation Anthony Liguori
2018-01-06 22:54 ` [PATCH 22/22] vixen: dom0 builder support Anthony Liguori
2018-01-07  0:24   ` Matt Wilson
2018-01-07  9:02   ` Roger Pau Monné
2018-01-07 15:52     ` Anthony Liguori
2018-01-08 10:03       ` Roger Pau Monné
2018-01-08 18:22   ` Konrad Rzeszutek Wilk
2018-01-08 18:26     ` Anthony Liguori
2018-01-06 23:29 ` [PATCH 00/22] Vixen: A PV-in-HVM shim Anthony Liguori
2018-01-06 23:50 ` Andrew Cooper
2018-01-06 23:59   ` Matt Wilson
2018-01-07  0:05   ` Anthony Liguori
2018-01-07 20:29     ` Anthony Liguori
2018-01-08 11:54 ` Wei Liu
2018-01-08 12:11   ` Roger Pau Monné
2018-01-08 12:14     ` Wei Liu
2018-01-08 16:02     ` Anthony Liguori
2018-01-08 16:28       ` George Dunlap
     [not found]         ` <CA+aC4kt5zbymFbHqCMV-oB80cw2dXWTcTztpa4EnqOKELKs7qg@mail.gmail.com>
     [not found]           ` <CA+aC4ku_0MB34=Y=yF3XwADyXttYd8t3Dw7XcOhOr+8aS9nONA@mail.gmail.com>
     [not found]             ` <CA+aC4kvsVXDFTHrcRK76868zmqCVuKtrjpSLJk=dYNdxL0PSCw@mail.gmail.com>
     [not found]               ` <CA+aC4kujGxWQzSfWP=8qP2SWd0G+qBod8HCLuosPg9SzS-22Vw@mail.gmail.com>
2018-01-08 16:41                 ` Anthony Liguori
2018-01-08 16:30       ` Wei Liu
2018-01-08 16:39         ` Ian Jackson
2018-01-08 17:03           ` Anthony Liguori
2018-01-08 17:34             ` Wei Liu
2018-01-08 17:47               ` Anthony Liguori
2018-01-08 17:53                 ` Ian Jackson
2018-01-08 17:11         ` Anthony Liguori
2018-01-08 16:38       ` Roger Pau Monné
