xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/11] HVMlite domU support
@ 2016-02-01 15:38 Boris Ostrovsky
  2016-02-01 15:38 ` [PATCH v2 01/11] xen/hvmlite: Import hvmlite-related Xen public interfaces Boris Ostrovsky
                   ` (11 more replies)
  0 siblings, 12 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-01 15:38 UTC (permalink / raw)
  To: david.vrabel, konrad.wilk
  Cc: Boris Ostrovsky, xen-devel, mcgrof, linux-kernel, roger.pau

This series introduces HVMlite support for unprivileged guests.

It has been tested on Intel/AMD, both 32- and 64-bit, including CPU on- and
offlining and save/restore. (Restore will result in APIC write warnings
which exist now for 32-bit PV guests as well so I didn't address this in
this series)

Compile-tested on ARM

v2:
* Dropped first patch (don't use initial_page_table/initial_pg_pmd and start_secondary)
* Much simplified kernel init code (after realizing that xen_hvm_guest_init()/
  init_hvm_pv_info() already do most of it). As result, factoring ot of xen_start_kernel()
  is no longer necessary. (patch 3)
* Dropped pcifront use until we decide that we *are* using it. (patch 5)


Boris Ostrovsky (11):
  xen/hvmlite: Import hvmlite-related Xen public interfaces
  xen/hvmlite: Bootstrap HVMlite guest
  xen/hvmlite: Initialize HVMlite kernel
  xen/hvmlite: Allow HVMlite guests delay initializing grant table
  xen/hvmlite: HVMlite guests always have PV devices
  xen/hvmlite: Prepare cpu_initialize_context() routine for HVMlite SMP
  xen/hvmlite: Initialize context for secondary VCPUs
  xen/hvmlite: Extend APIC operations for HVMlite guests
  xen/hvmlite: Use x86's default timer init for HVMlite guests
  xen/hvmlite: Boot secondary CPUs
  xen/hvmlite: Enable CPU on-/offlining

 arch/x86/xen/Makefile                |    1 +
 arch/x86/xen/apic.c                  |   39 +++++-
 arch/x86/xen/enlighten.c             |  106 +++++++++++++-
 arch/x86/xen/grant-table.c           |    4 +-
 arch/x86/xen/platform-pci-unplug.c   |    4 +-
 arch/x86/xen/pmu.c                   |    4 +-
 arch/x86/xen/smp.c                   |  252 ++++++++++++++++++++++++----------
 arch/x86/xen/smp.h                   |    4 +
 arch/x86/xen/time.c                  |    5 +-
 arch/x86/xen/xen-hvmlite.S           |  190 +++++++++++++++++++++++++
 drivers/xen/grant-table.c            |    8 +-
 include/xen/interface/elfnote.h      |   12 ++-
 include/xen/interface/hvm/hvm_vcpu.h |  143 +++++++++++++++++++
 include/xen/interface/xen.h          |   24 ++++
 include/xen/xen.h                    |    6 +
 15 files changed, 706 insertions(+), 96 deletions(-)
 create mode 100644 arch/x86/xen/xen-hvmlite.S
 create mode 100644 include/xen/interface/hvm/hvm_vcpu.h

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 01/11] xen/hvmlite: Import hvmlite-related Xen public interfaces
  2016-02-01 15:38 [PATCH v2 00/11] HVMlite domU support Boris Ostrovsky
@ 2016-02-01 15:38 ` Boris Ostrovsky
  2016-02-02 16:06   ` David Vrabel
  2016-02-01 15:38 ` [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest Boris Ostrovsky
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-01 15:38 UTC (permalink / raw)
  To: david.vrabel, konrad.wilk
  Cc: Boris Ostrovsky, xen-devel, mcgrof, linux-kernel, roger.pau

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 include/xen/interface/elfnote.h      |   12 +++-
 include/xen/interface/hvm/hvm_vcpu.h |  143 ++++++++++++++++++++++++++++++++++
 include/xen/interface/xen.h          |   24 ++++++
 3 files changed, 178 insertions(+), 1 deletions(-)
 create mode 100644 include/xen/interface/hvm/hvm_vcpu.h

diff --git a/include/xen/interface/elfnote.h b/include/xen/interface/elfnote.h
index f90b034..9e9f9bf 100644
--- a/include/xen/interface/elfnote.h
+++ b/include/xen/interface/elfnote.h
@@ -193,9 +193,19 @@
 #define XEN_ELFNOTE_SUPPORTED_FEATURES 17
 
 /*
+ * Physical entry point into the kernel.
+ *
+ * 32bit entry point into the kernel. When requested to launch the
+ * guest kernel in a HVM container, Xen will use this entry point to
+ * launch the guest in 32bit protected mode with paging disabled.
+ * Ignored otherwise.
+ */
+#define XEN_ELFNOTE_PHYS32_ENTRY 18
+
+/*
  * The number of the highest elfnote defined.
  */
-#define XEN_ELFNOTE_MAX XEN_ELFNOTE_SUPPORTED_FEATURES
+#define XEN_ELFNOTE_MAX XEN_ELFNOTE_PHYS32_ENTRY
 
 #endif /* __XEN_PUBLIC_ELFNOTE_H__ */
 
diff --git a/include/xen/interface/hvm/hvm_vcpu.h b/include/xen/interface/hvm/hvm_vcpu.h
new file mode 100644
index 0000000..32ca83e
--- /dev/null
+++ b/include/xen/interface/hvm/hvm_vcpu.h
@@ -0,0 +1,143 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2015, Roger Pau Monne <roger.pau@citrix.com>
+ */
+
+#ifndef __XEN_PUBLIC_HVM_HVM_VCPU_H__
+#define __XEN_PUBLIC_HVM_HVM_VCPU_H__
+
+#include "../xen.h"
+
+struct vcpu_hvm_x86_32 {
+    uint32_t eax;
+    uint32_t ecx;
+    uint32_t edx;
+    uint32_t ebx;
+    uint32_t esp;
+    uint32_t ebp;
+    uint32_t esi;
+    uint32_t edi;
+    uint32_t eip;
+    uint32_t eflags;
+
+    uint32_t cr0;
+    uint32_t cr3;
+    uint32_t cr4;
+
+    uint32_t pad1;
+
+    /*
+     * EFER should only be used to set the NXE bit (if required)
+     * when starting a vCPU in 32bit mode with paging enabled or
+     * to set the LME/LMA bits in order to start the vCPU in
+     * compatibility mode.
+     */
+    uint64_t efer;
+
+    uint32_t cs_base;
+    uint32_t ds_base;
+    uint32_t ss_base;
+    uint32_t es_base;
+    uint32_t tr_base;
+    uint32_t cs_limit;
+    uint32_t ds_limit;
+    uint32_t ss_limit;
+    uint32_t es_limit;
+    uint32_t tr_limit;
+    uint16_t cs_ar;
+    uint16_t ds_ar;
+    uint16_t ss_ar;
+    uint16_t es_ar;
+    uint16_t tr_ar;
+
+    uint16_t pad2[3];
+};
+
+/*
+ * The layout of the _ar fields of the segment registers is the
+ * following:
+ *
+ * Bits   [0,3]: type (bits 40-43).
+ * Bit        4: s    (descriptor type, bit 44).
+ * Bit    [5,6]: dpl  (descriptor privilege level, bits 45-46).
+ * Bit        7: p    (segment-present, bit 47).
+ * Bit        8: avl  (available for system software, bit 52).
+ * Bit        9: l    (64-bit code segment, bit 53).
+ * Bit       10: db   (meaning depends on the segment, bit 54).
+ * Bit       11: g    (granularity, bit 55)
+ * Bits [12,15]: unused, must be blank.
+ *
+ * A more complete description of the meaning of this fields can be
+ * obtained from the Intel SDM, Volume 3, section 3.4.5.
+ */
+
+struct vcpu_hvm_x86_64 {
+    uint64_t rax;
+    uint64_t rcx;
+    uint64_t rdx;
+    uint64_t rbx;
+    uint64_t rsp;
+    uint64_t rbp;
+    uint64_t rsi;
+    uint64_t rdi;
+    uint64_t rip;
+    uint64_t rflags;
+
+    uint64_t cr0;
+    uint64_t cr3;
+    uint64_t cr4;
+    uint64_t efer;
+
+    /*
+     * Using VCPU_HVM_MODE_64B implies that the vCPU is launched
+     * directly in long mode, so the cached parts of the segment
+     * registers get set to match that environment.
+     *
+     * If the user wants to launch the vCPU in compatibility mode
+     * the 32-bit structure should be used instead.
+     */
+};
+
+struct vcpu_hvm_context {
+#define VCPU_HVM_MODE_32B 0  /* 32bit fields of the structure will be used. */
+#define VCPU_HVM_MODE_64B 1  /* 64bit fields of the structure will be used. */
+    uint32_t mode;
+
+    uint32_t pad;
+
+    /* CPU registers. */
+    union {
+        struct vcpu_hvm_x86_32 x86_32;
+        struct vcpu_hvm_x86_64 x86_64;
+    } cpu_regs;
+};
+typedef struct vcpu_hvm_context vcpu_hvm_context_t;
+
+#endif /* __XEN_PUBLIC_HVM_HVM_VCPU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h
index 167071c..ead7afd 100644
--- a/include/xen/interface/xen.h
+++ b/include/xen/interface/xen.h
@@ -645,6 +645,30 @@ struct start_info {
 	unsigned long nr_p2m_frames;/* # of pfns forming initial P->M table.  */
 };
 
+/*
+ * Start of day structure passed to PVH guests in %ebx.
+ *
+ * NOTE: nothing will be loaded at physical address 0, so
+ * a 0 value in any of the address fields should be treated
+ * as not present.
+ */
+struct hvm_start_info {
+#define HVM_START_MAGIC_VALUE 0x336ec578
+    uint32_t magic;             /* Contains the magic value 0x336ec578       */
+                                /* ("xEn3" with the 0x80 bit of the "E" set).*/
+    uint32_t flags;             /* SIF_xxx flags.                            */
+    uint32_t cmdline_paddr;     /* Physical address of the command line.     */
+    uint32_t nr_modules;        /* Number of modules passed to the kernel.   */
+    uint32_t modlist_paddr;     /* Physical address of an array of           */
+                                /* hvm_modlist_entry.                        */
+};
+
+struct hvm_modlist_entry {
+    uint32_t paddr;             /* Physical address of the module.           */
+    uint32_t size;              /* Size of the module in bytes.              */
+};
+
+
 /* These flags are passed in the 'flags' field of start_info_t. */
 #define SIF_PRIVILEGED      (1<<0)  /* Is the domain privileged? */
 #define SIF_INITDOMAIN      (1<<1)  /* Is this the initial control domain? */
-- 
1.7.1

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
  2016-02-01 15:38 [PATCH v2 00/11] HVMlite domU support Boris Ostrovsky
  2016-02-01 15:38 ` [PATCH v2 01/11] xen/hvmlite: Import hvmlite-related Xen public interfaces Boris Ostrovsky
@ 2016-02-01 15:38 ` Boris Ostrovsky
  2016-02-02 16:39   ` David Vrabel
                     ` (5 more replies)
  2016-02-01 15:38 ` [PATCH v2 03/11] xen/hvmlite: Initialize HVMlite kernel Boris Ostrovsky
                   ` (9 subsequent siblings)
  11 siblings, 6 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-01 15:38 UTC (permalink / raw)
  To: david.vrabel, konrad.wilk
  Cc: Boris Ostrovsky, xen-devel, mcgrof, linux-kernel, roger.pau

Start HVMlite guest at XEN_ELFNOTE_PHYS32_ENTRY address. Setup hypercall
page, initialize boot_params, enable early page tables.

Since this stub is executed before kernel entry point we cannot use
variables in .bss which is cleared by kernel. We explicitly place
variables that are initialized here into .data.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 arch/x86/xen/Makefile      |    1 +
 arch/x86/xen/enlighten.c   |   86 +++++++++++++++++++++-
 arch/x86/xen/xen-hvmlite.S |  175 ++++++++++++++++++++++++++++++++++++++++++++
 include/xen/xen.h          |    6 ++
 4 files changed, 267 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/xen/xen-hvmlite.S

diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index e47e527..1d913d7 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -23,3 +23,4 @@ obj-$(CONFIG_XEN_DEBUG_FS)	+= debugfs.o
 obj-$(CONFIG_XEN_DOM0)		+= vga.o
 obj-$(CONFIG_SWIOTLB_XEN)	+= pci-swiotlb-xen.o
 obj-$(CONFIG_XEN_EFI)		+= efi.o
+obj-$(CONFIG_XEN_PVHVM) 	+= xen-hvmlite.o
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 5774800..5f05fa2 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -118,7 +118,8 @@ DEFINE_PER_CPU(struct vcpu_info *, xen_vcpu);
  */
 DEFINE_PER_CPU(struct vcpu_info, xen_vcpu_info);
 
-enum xen_domain_type xen_domain_type = XEN_NATIVE;
+enum xen_domain_type xen_domain_type
+	__attribute__((section(".data"))) = XEN_NATIVE;
 EXPORT_SYMBOL_GPL(xen_domain_type);
 
 unsigned long *machine_to_phys_mapping = (void *)MACH2PHYS_VIRT_START;
@@ -171,6 +172,17 @@ struct tls_descs {
  */
 static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
 
+#ifdef CONFIG_XEN_PVHVM
+/*
+ * HVMlite variables. These need to live in data segment since they are
+ * initialized before startup_{32|64}, which clear .bss, are invoked.
+ */
+int xen_hvmlite __attribute__((section(".data"))) = 0;
+struct hvm_start_info hvmlite_start_info __attribute__((section(".data")));
+uint hvmlite_start_info_sz = sizeof(hvmlite_start_info);
+struct boot_params xen_hvmlite_boot_params __attribute__((section(".data")));
+#endif
+
 static void clamp_max_cpus(void)
 {
 #ifdef CONFIG_SMP
@@ -1731,6 +1743,78 @@ asmlinkage __visible void __init xen_start_kernel(void)
 #endif
 }
 
+#ifdef CONFIG_XEN_PVHVM
+static void __init hvmlite_bootparams(void)
+{
+	struct xen_memory_map memmap;
+	int i;
+
+	memset(&xen_hvmlite_boot_params, 0, sizeof(xen_hvmlite_boot_params));
+
+	memmap.nr_entries = ARRAY_SIZE(xen_hvmlite_boot_params.e820_map);
+	set_xen_guest_handle(memmap.buffer, xen_hvmlite_boot_params.e820_map);
+	if (HYPERVISOR_memory_op(XENMEM_memory_map, &memmap)) {
+		xen_raw_console_write("XENMEM_memory_map failed\n");
+		BUG();
+	}
+
+	xen_hvmlite_boot_params.e820_map[memmap.nr_entries].addr =
+		ISA_START_ADDRESS;
+	xen_hvmlite_boot_params.e820_map[memmap.nr_entries].size =
+		ISA_END_ADDRESS - ISA_START_ADDRESS;
+	xen_hvmlite_boot_params.e820_map[memmap.nr_entries++].type =
+		E820_RESERVED;
+
+	sanitize_e820_map(xen_hvmlite_boot_params.e820_map,
+			  ARRAY_SIZE(xen_hvmlite_boot_params.e820_map),
+			  &memmap.nr_entries);
+
+	xen_hvmlite_boot_params.e820_entries = memmap.nr_entries;
+	for (i = 0; i < xen_hvmlite_boot_params.e820_entries; i++)
+		e820_add_region(xen_hvmlite_boot_params.e820_map[i].addr,
+				xen_hvmlite_boot_params.e820_map[i].size,
+				xen_hvmlite_boot_params.e820_map[i].type);
+
+	xen_hvmlite_boot_params.hdr.cmd_line_ptr =
+		hvmlite_start_info.cmdline_paddr;
+
+	/* The first module is always ramdisk */
+	if (hvmlite_start_info.nr_modules) {
+		struct hvm_modlist_entry *modaddr =
+			__va(hvmlite_start_info.modlist_paddr);
+		xen_hvmlite_boot_params.hdr.ramdisk_image = modaddr->paddr;
+		xen_hvmlite_boot_params.hdr.ramdisk_size = modaddr->size;
+	}
+
+	/*
+	 * See Documentation/x86/boot.txt.
+	 *
+	 * Version 2.12 supports Xen entry point but we will use default x86/PC
+	 * environment (i.e. hardware_subarch 0).
+	 */
+	xen_hvmlite_boot_params.hdr.version = 0x212;
+	xen_hvmlite_boot_params.hdr.type_of_loader = 9; /* Xen loader */
+}
+
+/*
+ * This routine (and those that it might call) should not use
+ * anything that lives in .bss since that segment will be cleared later
+ */
+void __init xen_prepare_hvmlite(void)
+{
+	u32 eax, ecx, edx, msr;
+	u64 pfn;
+
+	xen_hvmlite = 1;
+
+	cpuid(xen_cpuid_base() + 2, &eax, &msr, &ecx, &edx);
+	pfn = __pa(hypercall_page);
+	wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
+
+	hvmlite_bootparams();
+}
+#endif
+
 void __ref xen_hvm_init_shared_info(void)
 {
 	int cpu;
diff --git a/arch/x86/xen/xen-hvmlite.S b/arch/x86/xen/xen-hvmlite.S
new file mode 100644
index 0000000..fc7c08c
--- /dev/null
+++ b/arch/x86/xen/xen-hvmlite.S
@@ -0,0 +1,175 @@
+/*
+ * Copyright C 2016, Oracle and/or its affiliates. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+	.code32
+	.text
+#define _pa(x)          ((x) - __START_KERNEL_map)
+
+#include <linux/elfnote.h>
+#include <linux/init.h>
+#include <linux/linkage.h>
+#include <asm/segment.h>
+#include <asm/asm.h>
+#include <asm/boot.h>
+#include <asm/processor-flags.h>
+#include <asm/msr.h>
+#include <xen/interface/elfnote.h>
+
+	__HEAD
+	.code32
+
+/* Entry point for HVMlite guests */
+ENTRY(hvmlite_start_xen)
+	cli
+	cld
+
+	mov $_pa(gdt), %eax
+	lgdt (%eax)
+
+	movl $(__BOOT_DS),%eax
+	movl %eax,%ds
+	movl %eax,%es
+	movl %eax,%ss
+
+	/* Stash hvm_start_info */
+	mov $_pa(hvmlite_start_info), %edi
+	mov %ebx, %esi
+	mov $_pa(hvmlite_start_info_sz), %ecx
+	mov (%ecx), %ecx
+	rep
+	movsb
+
+	movl $_pa(early_stack_end), %eax
+	movl %eax, %esp
+
+	/* Enable PAE mode */
+	movl %cr4, %eax
+	orl $X86_CR4_PAE, %eax
+	movl %eax, %cr4
+
+#ifdef CONFIG_X86_64
+	/* Enable Long mode */
+	movl $MSR_EFER, %ecx
+	rdmsr
+	btsl $_EFER_LME, %eax
+	wrmsr
+
+	/* Enable pre-constructed page tables */
+	mov $_pa(init_level4_pgt), %eax
+	movl %eax, %cr3
+	movl $(X86_CR0_PG | X86_CR0_PE), %eax
+	movl %eax, %cr0
+
+	/* Jump to 64-bit mode. */
+	pushl $__KERNEL_CS
+	leal _pa(1f), %eax
+	pushl %eax
+	lret
+
+	/* 64-bit entry point */
+	.code64
+1:
+	call xen_prepare_hvmlite
+
+	/* startup_64 expects boot_params in %rsi */
+	mov $_pa(xen_hvmlite_boot_params), %rsi
+	movq $_pa(startup_64), %rax
+	jmp *%rax
+
+#else /* CONFIG_X86_64 */
+
+	/* Clear boot page tables */
+	movl $_pa(early_pgtable), %edi
+	xorl %eax, %eax
+	movl $((PAGE_SIZE*5)/4), %ecx
+	rep stosl
+
+	/* Level 3 */
+	movl $_pa(early_pgtable), %edi
+	leal (PAGE_SIZE +_PAGE_PRESENT)(%edi), %eax
+	movl $4, %ecx
+1:
+	movl %eax, 0x00(%edi)
+	addl $8, %edi
+	decl %ecx
+	jnz 1b
+
+	/* Level 2 (2M entries) */
+	movl $(_pa(early_pgtable) + PAGE_SIZE), %edi
+	movl $(_PAGE_PSE | _PAGE_RW | _PAGE_PRESENT), %eax
+	movl $2048, %ecx
+2:
+	movl %eax, 0(%edi)
+	addl $0x00200000, %eax
+	addl $8, %edi
+	decl %ecx
+	jnz 2b
+
+	/* Enable the boot paging */
+	movl $_pa(early_pgtable), %eax
+	movl %eax, %cr3
+	movl %cr0, %eax
+	orl $(X86_CR0_PG | X86_CR0_PE), %eax
+	movl %eax, %cr0
+
+	ljmp $__BOOT_CS,$3f
+3:
+	call xen_prepare_hvmlite
+	mov $_pa(xen_hvmlite_boot_params), %esi
+
+	/* startup_32 doesn't expect paging and PAE to be on */
+	ljmp $__BOOT_CS,$_pa(4f)
+4:
+	movl %cr0, %eax
+	andl $~X86_CR0_PG, %eax
+	movl %eax, %cr0
+	movl %cr4, %eax
+	andl $~X86_CR4_PAE, %eax
+	movl %eax, %cr4
+
+	ljmp    $0x10, $_pa(startup_32)
+#endif
+
+	.data
+gdt:
+	.word	gdt_end - gdt
+	.long	_pa(gdt)
+	.word	0
+	.quad	0x0000000000000000 /* NULL descriptor */
+#ifdef CONFIG_X86_64
+	.quad	0x00af9a000000ffff /* __KERNEL_CS */
+#else
+	.quad	0x00cf9a000000ffff /* __KERNEL_CS */
+#endif
+	.quad	0x00cf92000000ffff /* __KERNEL_DS */
+gdt_end:
+
+	.bss
+	.balign 4
+early_stack:
+	.fill 16, 1, 0
+early_stack_end:
+
+#ifdef CONFIG_X86_32
+	.section ".pgtable","a",@nobits
+	.balign 4096
+early_pgtable:
+	.fill 5*4096, 1, 0
+#endif
+
+	ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY,
+	             _ASM_PTR (hvmlite_start_xen - __START_KERNEL_map))
diff --git a/include/xen/xen.h b/include/xen/xen.h
index 0c0e3ef..6a0d3f3 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -29,6 +29,12 @@ extern enum xen_domain_type xen_domain_type;
 #define xen_initial_domain()	(0)
 #endif	/* CONFIG_XEN_DOM0 */
 
+#ifdef CONFIG_XEN_PVHVM
+extern int xen_hvmlite;
+#else
+#define xen_hvmlite		(0)
+#endif
+
 #ifdef CONFIG_XEN_PVH
 /* This functionality exists only for x86. The XEN_PVHVM support exists
  * only in x86 world - hence on ARM it will be always disabled.
-- 
1.7.1

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 03/11] xen/hvmlite: Initialize HVMlite kernel
  2016-02-01 15:38 [PATCH v2 00/11] HVMlite domU support Boris Ostrovsky
  2016-02-01 15:38 ` [PATCH v2 01/11] xen/hvmlite: Import hvmlite-related Xen public interfaces Boris Ostrovsky
  2016-02-01 15:38 ` [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest Boris Ostrovsky
@ 2016-02-01 15:38 ` Boris Ostrovsky
  2016-02-17 20:08   ` Luis R. Rodriguez
  2016-02-01 15:38 ` [PATCH v2 04/11] xen/hvmlite: Allow HVMlite guests delay initializing grant table Boris Ostrovsky
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-01 15:38 UTC (permalink / raw)
  To: david.vrabel, konrad.wilk
  Cc: Boris Ostrovsky, xen-devel, mcgrof, linux-kernel, roger.pau

HVMlite guests need to make a few additional initialization calls
compared to regular HVM guests.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 arch/x86/xen/enlighten.c |   18 ++++++++++++------
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 5f05fa2..1409de6 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1863,15 +1863,21 @@ static void __init init_hvm_pv_info(void)
 	minor = eax & 0xffff;
 	printk(KERN_INFO "Xen version %d.%d.\n", major, minor);
 
-	cpuid(base + 2, &pages, &msr, &ecx, &edx);
-
-	pfn = __pa(hypercall_page);
-	wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
+	/* HVMlite set up hypercall page earlier in xen_prepare_hvmlite() */
+	if (xen_hvmlite) {
+		pv_info.name = "Xen HVMlite";
+		pv_info.paravirt_enabled = 1;
+		xen_init_apic();
+		machine_ops = xen_machine_ops;
+	} else {
+		pv_info.name = "Xen HVM";
+		cpuid(base + 2, &pages, &msr, &ecx, &edx);
+		pfn = __pa(hypercall_page);
+		wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
+	}
 
 	xen_setup_features();
 
-	pv_info.name = "Xen HVM";
-
 	xen_domain_type = XEN_HVM_DOMAIN;
 }
 
-- 
1.7.1

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 04/11] xen/hvmlite: Allow HVMlite guests delay initializing grant table
  2016-02-01 15:38 [PATCH v2 00/11] HVMlite domU support Boris Ostrovsky
                   ` (2 preceding siblings ...)
  2016-02-01 15:38 ` [PATCH v2 03/11] xen/hvmlite: Initialize HVMlite kernel Boris Ostrovsky
@ 2016-02-01 15:38 ` Boris Ostrovsky
  2016-02-01 15:38 ` [PATCH v2 05/11] xen/hvmlite: HVMlite guests always have PV devices Boris Ostrovsky
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-01 15:38 UTC (permalink / raw)
  To: david.vrabel, konrad.wilk
  Cc: Boris Ostrovsky, xen-devel, mcgrof, linux-kernel, roger.pau

.. just like we currently do for PVH guests

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 arch/x86/xen/grant-table.c |    4 ++--
 drivers/xen/grant-table.c  |    8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
index e079500..40ad9c2 100644
--- a/arch/x86/xen/grant-table.c
+++ b/arch/x86/xen/grant-table.c
@@ -110,7 +110,7 @@ int arch_gnttab_init(unsigned long nr_shared)
 	return arch_gnttab_valloc(&gnttab_shared_vm_area, nr_shared);
 }
 
-#ifdef CONFIG_XEN_PVH
+#ifdef CONFIG_XEN_PVHVM
 #include <xen/balloon.h>
 #include <xen/events.h>
 #include <linux/slab.h>
@@ -164,7 +164,7 @@ static int __init xlated_setup_gnttab_pages(void)
 
 static int __init xen_pvh_gnttab_setup(void)
 {
-	if (!xen_pvh_domain())
+	if (!xen_pvh_domain() && !xen_hvmlite)
 		return -ENODEV;
 
 	return xlated_setup_gnttab_pages();
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index c49f79e..9a239d5 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -1147,13 +1147,13 @@ EXPORT_SYMBOL_GPL(gnttab_init);
 
 static int __gnttab_init(void)
 {
+	if (!xen_domain())
+		return -ENODEV;
+
 	/* Delay grant-table initialization in the PV on HVM case */
-	if (xen_hvm_domain())
+	if (xen_hvm_domain() && !xen_hvmlite)
 		return 0;
 
-	if (!xen_pv_domain())
-		return -ENODEV;
-
 	return gnttab_init();
 }
 /* Starts after core_initcall so that xen_pvh_gnttab_setup can be called
-- 
1.7.1

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 05/11] xen/hvmlite: HVMlite guests always have PV devices
  2016-02-01 15:38 [PATCH v2 00/11] HVMlite domU support Boris Ostrovsky
                   ` (3 preceding siblings ...)
  2016-02-01 15:38 ` [PATCH v2 04/11] xen/hvmlite: Allow HVMlite guests delay initializing grant table Boris Ostrovsky
@ 2016-02-01 15:38 ` Boris Ostrovsky
  2016-02-01 15:38 ` [PATCH v2 06/11] xen/hvmlite: Prepare cpu_initialize_context() routine for HVMlite SMP Boris Ostrovsky
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-01 15:38 UTC (permalink / raw)
  To: david.vrabel, konrad.wilk
  Cc: Boris Ostrovsky, xen-devel, mcgrof, linux-kernel, roger.pau

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 arch/x86/xen/platform-pci-unplug.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/platform-pci-unplug.c b/arch/x86/xen/platform-pci-unplug.c
index 9586ff3..802ec90 100644
--- a/arch/x86/xen/platform-pci-unplug.c
+++ b/arch/x86/xen/platform-pci-unplug.c
@@ -73,8 +73,8 @@ bool xen_has_pv_devices(void)
 	if (!xen_domain())
 		return false;
 
-	/* PV domains always have them. */
-	if (xen_pv_domain())
+	/* PV and HVMlite domains always have them. */
+	if (xen_pv_domain() || xen_hvmlite)
 		return true;
 
 	/* And user has xen_platform_pci=0 set in guest config as
-- 
1.7.1

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 06/11] xen/hvmlite: Prepare cpu_initialize_context() routine for HVMlite SMP
  2016-02-01 15:38 [PATCH v2 00/11] HVMlite domU support Boris Ostrovsky
                   ` (4 preceding siblings ...)
  2016-02-01 15:38 ` [PATCH v2 05/11] xen/hvmlite: HVMlite guests always have PV devices Boris Ostrovsky
@ 2016-02-01 15:38 ` Boris Ostrovsky
  2016-02-02 16:16   ` David Vrabel
  2016-02-01 15:38 ` [PATCH v2 07/11] xen/hvmlite: Initialize context for secondary VCPUs Boris Ostrovsky
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-01 15:38 UTC (permalink / raw)
  To: david.vrabel, konrad.wilk
  Cc: Boris Ostrovsky, xen-devel, mcgrof, linux-kernel, roger.pau

Subsequent patch will add support for starting secondary VCPUs in
HVMlite guest. This patch exists to simmplify code review.

No functional changes (except for introduction of 'if (!xen_hvmlite)').

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 arch/x86/xen/smp.c |  104 ++++++++++++++++++++++++++++-----------------------
 1 files changed, 57 insertions(+), 47 deletions(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 3f4ebf0..5fc4afb 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -390,70 +390,80 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 	if (cpumask_test_and_set_cpu(cpu, xen_cpu_initialized_map))
 		return 0;
 
-	ctxt = kzalloc(sizeof(*ctxt), GFP_KERNEL);
-	if (ctxt == NULL)
-		return -ENOMEM;
+	if (!xen_hvmlite) {
 
-	gdt = get_cpu_gdt_table(cpu);
+		ctxt = kzalloc(sizeof(*ctxt), GFP_KERNEL);
+		if (ctxt == NULL)
+			return -ENOMEM;
+
+		gdt = get_cpu_gdt_table(cpu);
 
 #ifdef CONFIG_X86_32
-	/* Note: PVH is not yet supported on x86_32. */
-	ctxt->user_regs.fs = __KERNEL_PERCPU;
-	ctxt->user_regs.gs = __KERNEL_STACK_CANARY;
+		/* Note: PVH is not yet supported on x86_32. */
+		ctxt->user_regs.fs = __KERNEL_PERCPU;
+		ctxt->user_regs.gs = __KERNEL_STACK_CANARY;
 #endif
-	memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
+		memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
-	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-		ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
-		ctxt->flags = VGCF_IN_KERNEL;
-		ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
-		ctxt->user_regs.ds = __USER_DS;
-		ctxt->user_regs.es = __USER_DS;
-		ctxt->user_regs.ss = __KERNEL_DS;
+		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+			ctxt->user_regs.eip =
+				(unsigned long)cpu_bringup_and_idle;
+			ctxt->flags = VGCF_IN_KERNEL;
+			ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
+			ctxt->user_regs.ds = __USER_DS;
+			ctxt->user_regs.es = __USER_DS;
+			ctxt->user_regs.ss = __KERNEL_DS;
 
-		xen_copy_trap_info(ctxt->trap_ctxt);
+			xen_copy_trap_info(ctxt->trap_ctxt);
 
-		ctxt->ldt_ents = 0;
+			ctxt->ldt_ents = 0;
 
-		BUG_ON((unsigned long)gdt & ~PAGE_MASK);
+			BUG_ON((unsigned long)gdt & ~PAGE_MASK);
 
-		gdt_mfn = arbitrary_virt_to_mfn(gdt);
-		make_lowmem_page_readonly(gdt);
-		make_lowmem_page_readonly(mfn_to_virt(gdt_mfn));
+			gdt_mfn = arbitrary_virt_to_mfn(gdt);
+			make_lowmem_page_readonly(gdt);
+			make_lowmem_page_readonly(mfn_to_virt(gdt_mfn));
 
-		ctxt->gdt_frames[0] = gdt_mfn;
-		ctxt->gdt_ents      = GDT_ENTRIES;
+			ctxt->gdt_frames[0] = gdt_mfn;
+			ctxt->gdt_ents      = GDT_ENTRIES;
 
-		ctxt->kernel_ss = __KERNEL_DS;
-		ctxt->kernel_sp = idle->thread.sp0;
+			ctxt->kernel_ss = __KERNEL_DS;
+			ctxt->kernel_sp = idle->thread.sp0;
 
 #ifdef CONFIG_X86_32
-		ctxt->event_callback_cs     = __KERNEL_CS;
-		ctxt->failsafe_callback_cs  = __KERNEL_CS;
+			ctxt->event_callback_cs     = __KERNEL_CS;
+			ctxt->failsafe_callback_cs  = __KERNEL_CS;
 #else
-		ctxt->gs_base_kernel = per_cpu_offset(cpu);
+			ctxt->gs_base_kernel = per_cpu_offset(cpu);
 #endif
-		ctxt->event_callback_eip    =
-					(unsigned long)xen_hypervisor_callback;
-		ctxt->failsafe_callback_eip =
-					(unsigned long)xen_failsafe_callback;
-		ctxt->user_regs.cs = __KERNEL_CS;
-		per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
-	}
+			ctxt->event_callback_eip    =
+				(unsigned long)xen_hypervisor_callback;
+			ctxt->failsafe_callback_eip =
+				(unsigned long)xen_failsafe_callback;
+			ctxt->user_regs.cs = __KERNEL_CS;
+			per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
+		}
 #ifdef CONFIG_XEN_PVH
-	else {
-		/*
-		 * The vcpu comes on kernel page tables which have the NX pte
-		 * bit set. This means before DS/SS is touched, NX in
-		 * EFER must be set. Hence the following assembly glue code.
-		 */
-		ctxt->user_regs.eip = (unsigned long)xen_pvh_early_cpu_init;
-		ctxt->user_regs.rdi = cpu;
-		ctxt->user_regs.rsi = true;  /* entry == true */
-	}
+		else {
+			/*
+			 * The vcpu comes on kernel page tables which have the
+			 * NX pte bit set. This means before DS/SS is touched,
+			 * NX in EFER must be set. Hence the following assembly
+			 * glue code.
+			 */
+			ctxt->user_regs.eip =
+				(unsigned long)xen_pvh_early_cpu_init;
+			ctxt->user_regs.rdi = cpu;
+			ctxt->user_regs.rsi = true;  /* entry == true */
+		}
 #endif
-	ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
-	ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_gfn(swapper_pg_dir));
+		ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
+		ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_gfn(swapper_pg_dir));
+	} else {
+		ctxt = NULL; /* To quiet down compiler */
+		BUG();
+	}
+
 	if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt))
 		BUG();
 
-- 
1.7.1

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 07/11] xen/hvmlite: Initialize context for secondary VCPUs
  2016-02-01 15:38 [PATCH v2 00/11] HVMlite domU support Boris Ostrovsky
                   ` (5 preceding siblings ...)
  2016-02-01 15:38 ` [PATCH v2 06/11] xen/hvmlite: Prepare cpu_initialize_context() routine for HVMlite SMP Boris Ostrovsky
@ 2016-02-01 15:38 ` Boris Ostrovsky
  2016-02-02 16:21   ` David Vrabel
       [not found]   ` <56B0D786.7000002@citrix.com>
  2016-02-01 15:38 ` [PATCH v2 08/11] xen/hvmlite: Extend APIC operations for HVMlite guests Boris Ostrovsky
                   ` (4 subsequent siblings)
  11 siblings, 2 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-01 15:38 UTC (permalink / raw)
  To: david.vrabel, konrad.wilk
  Cc: Boris Ostrovsky, xen-devel, mcgrof, linux-kernel, roger.pau

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 arch/x86/xen/smp.c         |   57 ++++++++++++++++++++++++++++++++++++++++----
 arch/x86/xen/smp.h         |    4 +++
 arch/x86/xen/xen-hvmlite.S |    7 +++++
 3 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 5fc4afb..b265c4f 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -27,6 +27,7 @@
 #include <xen/interface/xen.h>
 #include <xen/interface/vcpu.h>
 #include <xen/interface/xenpmu.h>
+#include <xen/interface/hvm/hvm_vcpu.h>
 
 #include <asm/xen/interface.h>
 #include <asm/xen/hypercall.h>
@@ -384,6 +385,7 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 	struct vcpu_guest_context *ctxt;
 	struct desc_struct *gdt;
 	unsigned long gdt_mfn;
+	void *ctxt_arg;
 
 	/* used to tell cpu_init() that it can proceed with initialization */
 	cpumask_set_cpu(cpu, cpu_callout_mask);
@@ -392,7 +394,7 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 
 	if (!xen_hvmlite) {
 
-		ctxt = kzalloc(sizeof(*ctxt), GFP_KERNEL);
+		ctxt_arg = ctxt = kzalloc(sizeof(*ctxt), GFP_KERNEL);
 		if (ctxt == NULL)
 			return -ENOMEM;
 
@@ -460,14 +462,59 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 		ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
 		ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_gfn(swapper_pg_dir));
 	} else {
-		ctxt = NULL; /* To quiet down compiler */
-		BUG();
+#ifdef CONFIG_XEN_PVHVM
+		struct vcpu_hvm_context *hctxt;
+
+		ctxt_arg = hctxt = kzalloc(sizeof(*hctxt), GFP_KERNEL);
+		if (hctxt == NULL)
+			return -ENOMEM;
+
+#ifdef CONFIG_X86_64
+		hctxt->mode = VCPU_HVM_MODE_64B;
+		hctxt->cpu_regs.x86_64.rip =
+			(unsigned long)secondary_startup_64;
+		hctxt->cpu_regs.x86_64.rsp = stack_start;
+
+		hctxt->cpu_regs.x86_64.cr0 =
+			X86_CR0_PG | X86_CR0_WP | X86_CR0_PE;
+		hctxt->cpu_regs.x86_64.cr4 = X86_CR4_PAE;
+		hctxt->cpu_regs.x86_64.cr3 =
+			xen_pfn_to_cr3(virt_to_mfn(init_level4_pgt));
+		hctxt->cpu_regs.x86_64.efer = EFER_LME | EFER_NX;
+#else
+		hctxt->mode = VCPU_HVM_MODE_32B;
+		/*
+		 * startup_32_smp expects GDT loaded so we can't jump
+		 * there directly.
+		 */
+		hctxt->cpu_regs.x86_32.eip =
+			(unsigned long)hvmlite_smp_32 - __START_KERNEL_map;
+
+		hctxt->cpu_regs.x86_32.cr0 = X86_CR0_PE;
+
+		hctxt->cpu_regs.x86_32.cs_base = 0;
+		hctxt->cpu_regs.x86_32.cs_limit = ~0u;
+		hctxt->cpu_regs.x86_32.cs_ar = 0xc9b;
+		hctxt->cpu_regs.x86_32.ds_base = 0;
+		hctxt->cpu_regs.x86_32.ds_limit = ~0u;
+		hctxt->cpu_regs.x86_32.ds_ar = 0xc93;
+		hctxt->cpu_regs.x86_32.es_base = 0;
+		hctxt->cpu_regs.x86_32.es_limit = ~0u;
+		hctxt->cpu_regs.x86_32.es_ar = 0xc93;
+		hctxt->cpu_regs.x86_32.ss_base = 0;
+		hctxt->cpu_regs.x86_32.ss_limit = ~0u;
+		hctxt->cpu_regs.x86_32.ss_ar = 0xc93;
+		hctxt->cpu_regs.x86_32.tr_base = 0;
+		hctxt->cpu_regs.x86_32.tr_limit = 0xff;
+		hctxt->cpu_regs.x86_32.tr_ar = 0x8b;
+#endif
+#endif
 	}
 
-	if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt))
+	if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt_arg))
 		BUG();
 
-	kfree(ctxt);
+	kfree(ctxt_arg);
 	return 0;
 }
 
diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
index 963d62a..b4a833c 100644
--- a/arch/x86/xen/smp.h
+++ b/arch/x86/xen/smp.h
@@ -8,6 +8,10 @@ extern void xen_send_IPI_allbutself(int vector);
 extern void xen_send_IPI_all(int vector);
 extern void xen_send_IPI_self(int vector);
 
+#ifdef CONFIG_X86_32
+extern void hvmlite_smp_32(void);
+#endif
+
 #ifdef CONFIG_XEN_PVH
 extern void xen_pvh_early_cpu_init(int cpu, bool entry);
 #else
diff --git a/arch/x86/xen/xen-hvmlite.S b/arch/x86/xen/xen-hvmlite.S
index fc7c08c..805e6a0 100644
--- a/arch/x86/xen/xen-hvmlite.S
+++ b/arch/x86/xen/xen-hvmlite.S
@@ -144,6 +144,13 @@ ENTRY(hvmlite_start_xen)
 	ljmp    $0x10, $_pa(startup_32)
 #endif
 
+#ifdef CONFIG_X86_32
+ENTRY(hvmlite_smp_32)
+        mov $_pa(boot_gdt_descr), %eax
+        lgdt (%eax)
+        jmp startup_32_smp
+#endif
+
 	.data
 gdt:
 	.word	gdt_end - gdt
-- 
1.7.1

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 08/11] xen/hvmlite: Extend APIC operations for HVMlite guests
  2016-02-01 15:38 [PATCH v2 00/11] HVMlite domU support Boris Ostrovsky
                   ` (6 preceding siblings ...)
  2016-02-01 15:38 ` [PATCH v2 07/11] xen/hvmlite: Initialize context for secondary VCPUs Boris Ostrovsky
@ 2016-02-01 15:38 ` Boris Ostrovsky
  2016-02-04 10:04   ` David Vrabel
       [not found]   ` <56B32220.4040505@citrix.com>
  2016-02-01 15:38 ` [PATCH v2 09/11] xen/hvmlite: Use x86's default timer init " Boris Ostrovsky
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-01 15:38 UTC (permalink / raw)
  To: david.vrabel, konrad.wilk
  Cc: Boris Ostrovsky, xen-devel, mcgrof, linux-kernel, roger.pau

HVMlite guests need to be viewed as having APIC, otherwise smpboot code,
for example, will complain.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---

Not sure about xen_cpu_present_to_apicid() being an identity function, given
xen_x86_32_early_logical_apicid().

Suspend/resume will cause xen_apic_write() warnings.

 arch/x86/xen/apic.c |   39 +++++++++++++++++++++++++++++++++++++--
 arch/x86/xen/smp.c  |    2 +-
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index acda713..9bf27f2 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -6,6 +6,7 @@
 
 #include <xen/xen.h>
 #include <xen/interface/physdev.h>
+#include <xen/interface/vcpu.h>
 #include "xen-ops.h"
 #include "pmu.h"
 #include "smp.h"
@@ -78,6 +79,21 @@ static void xen_apic_write(u32 reg, u32 val)
 		return;
 	}
 
+	if (xen_hvmlite) {
+		switch (reg) {
+		case APIC_TASKPRI:
+		case APIC_SPIV:
+		case APIC_ESR:
+		case APIC_LVTT:
+		case APIC_LVT0:
+		case APIC_LVT1:
+		case APIC_LVTERR:
+			pr_debug("Unimplemented APIC register %x,"
+				 " value: %x\n", reg, val);
+			return;
+		}
+	}
+
 	/* Warn to see if there's any stray references */
 	WARN(1,"register: %x, value: %x\n", reg, val);
 }
@@ -100,7 +116,7 @@ static u32 xen_safe_apic_wait_icr_idle(void)
 
 static int xen_apic_probe_pv(void)
 {
-	if (xen_pv_domain())
+	if (xen_pv_domain() || xen_hvmlite)
 		return 1;
 
 	return 0;
@@ -142,6 +158,19 @@ static void xen_silent_inquire(int apicid)
 {
 }
 
+static int xen_cpu_present_to_apicid(int cpu)
+{
+	return cpu;
+}
+
+static int xen_wakeup_secondary_cpu(int cpu, unsigned long start_eip)
+{
+	if (!xen_hvmlite)
+		return -EINVAL;
+
+	return HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL);
+}
+
 static struct apic xen_pv_apic = {
 	.name 				= "Xen PV",
 	.probe 				= xen_apic_probe_pv,
@@ -162,7 +191,7 @@ static struct apic xen_pv_apic = {
 
 	.ioapic_phys_id_map		= default_ioapic_phys_id_map, /* Used on 32-bit */
 	.setup_apic_routing		= NULL,
-	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
+	.cpu_present_to_apicid		= xen_cpu_present_to_apicid,
 	.apicid_to_cpu_present		= physid_set_mask_of_physid, /* Used on 32-bit */
 	.check_phys_apicid_present	= default_check_phys_apicid_present, /* smp_sanity_check needs it */
 	.phys_pkg_id			= xen_phys_pkg_id, /* detect_ht */
@@ -180,6 +209,9 @@ static struct apic xen_pv_apic = {
 	.send_IPI_all 			= xen_send_IPI_all,
 	.send_IPI_self 			= xen_send_IPI_self,
 #endif
+
+	.wakeup_secondary_cpu	= xen_wakeup_secondary_cpu,
+
 	/* .wait_for_init_deassert- used  by AP bootup - smp_callin which we don't use */
 	.inquire_remote_apic		= xen_silent_inquire,
 
@@ -216,5 +248,8 @@ void __init xen_init_apic(void)
 		apic = &xen_pv_apic;
 
 	x86_platform.apic_post_init = xen_apic_check;
+
+	if (xen_hvmlite)
+		setup_force_cpu_cap(X86_FEATURE_APIC);
 }
 apic_driver(xen_pv_apic);
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index b265c4f..fb085ef 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -214,7 +214,7 @@ static int xen_smp_intr_init(unsigned int cpu)
 	 * The IRQ worker on PVHVM goes through the native path and uses the
 	 * IPI mechanism.
 	 */
-	if (xen_hvm_domain())
+	if (xen_hvm_domain() && !xen_hvmlite)
 		return 0;
 
 	callfunc_name = kasprintf(GFP_KERNEL, "irqwork%d", cpu);
-- 
1.7.1

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 09/11] xen/hvmlite: Use x86's default timer init for HVMlite guests
  2016-02-01 15:38 [PATCH v2 00/11] HVMlite domU support Boris Ostrovsky
                   ` (7 preceding siblings ...)
  2016-02-01 15:38 ` [PATCH v2 08/11] xen/hvmlite: Extend APIC operations for HVMlite guests Boris Ostrovsky
@ 2016-02-01 15:38 ` Boris Ostrovsky
  2016-02-02 16:27   ` David Vrabel
       [not found]   ` <56B0D8DD.5010206@citrix.com>
  2016-02-01 15:38 ` [PATCH v2 10/11] xen/hvmlite: Boot secondary CPUs Boris Ostrovsky
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-01 15:38 UTC (permalink / raw)
  To: david.vrabel, konrad.wilk
  Cc: Boris Ostrovsky, xen-devel, mcgrof, linux-kernel, roger.pau

xen_timer_init() will be called from apic_bsp_setup().

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 arch/x86/xen/time.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index f1ba6a0..d77b398 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -492,7 +492,10 @@ void __init xen_init_time_ops(void)
 {
 	pv_time_ops = xen_time_ops;
 
-	x86_init.timers.timer_init = xen_time_init;
+	if (!xen_hvmlite)
+		x86_init.timers.timer_init = xen_time_init;
+	else
+		x86_init.timers.timer_init = x86_init_noop;
 	x86_init.timers.setup_percpu_clockev = x86_init_noop;
 	x86_cpuinit.setup_percpu_clockev = x86_init_noop;
 
-- 
1.7.1

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 10/11] xen/hvmlite: Boot secondary CPUs
  2016-02-01 15:38 [PATCH v2 00/11] HVMlite domU support Boris Ostrovsky
                   ` (8 preceding siblings ...)
  2016-02-01 15:38 ` [PATCH v2 09/11] xen/hvmlite: Use x86's default timer init " Boris Ostrovsky
@ 2016-02-01 15:38 ` Boris Ostrovsky
  2016-02-04 10:38   ` David Vrabel
       [not found]   ` <56B32A29.6050406@citrix.com>
  2016-02-01 15:38 ` [PATCH v2 11/11] xen/hvmlite: Enable CPU on-/offlining Boris Ostrovsky
       [not found] ` <1454341137-14110-5-git-send-email-boris.ostrovsky@oracle.com>
  11 siblings, 2 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-01 15:38 UTC (permalink / raw)
  To: david.vrabel, konrad.wilk
  Cc: Boris Ostrovsky, xen-devel, mcgrof, linux-kernel, roger.pau

HVMlite secondary VCPUs use baremetal bringup path (i.e. native_*
smp_ops) but need to do some preparation in PV code.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 arch/x86/xen/enlighten.c |    2 +
 arch/x86/xen/pmu.c       |    4 +-
 arch/x86/xen/smp.c       |   64 +++++++++++++++++++++++++++++++++------------
 3 files changed, 51 insertions(+), 19 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 1409de6..76fb0b2 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1933,6 +1933,8 @@ static void __init xen_hvm_guest_init(void)
 		xen_have_vector_callback = 1;
 	xen_hvm_smp_init();
 	register_cpu_notifier(&xen_hvm_cpu_notifier);
+	if (xen_hvmlite)
+		smp_found_config = 1;
 	xen_unplug_emulated_devices();
 	x86_init.irqs.intr_init = xen_init_IRQ;
 	xen_hvm_init_time_ops();
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index 724a087..7bc209b 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -518,7 +518,7 @@ void xen_pmu_init(int cpu)
 
 	BUILD_BUG_ON(sizeof(struct xen_pmu_data) > PAGE_SIZE);
 
-	if (xen_hvm_domain())
+	if (xen_hvm_domain() && !xen_hvmlite)
 		return;
 
 	xenpmu_data = (struct xen_pmu_data *)get_zeroed_page(GFP_KERNEL);
@@ -556,7 +556,7 @@ void xen_pmu_finish(int cpu)
 {
 	struct xen_pmu_params xp;
 
-	if (xen_hvm_domain())
+	if (xen_hvm_domain() && !xen_hvmlite)
 		return;
 
 	xp.vcpu = cpu;
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index fb085ef..a22cae2 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -348,26 +348,31 @@ static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
 	}
 	xen_init_lock_cpu(0);
 
-	smp_store_boot_cpu_info();
-	cpu_data(0).x86_max_cores = 1;
+	if (!xen_hvmlite) {
+		smp_store_boot_cpu_info();
+		cpu_data(0).x86_max_cores = 1;
+
+		for_each_possible_cpu(i) {
+			zalloc_cpumask_var(&per_cpu(cpu_sibling_map, i),
+					   GFP_KERNEL);
+			zalloc_cpumask_var(&per_cpu(cpu_core_map, i),
+					   GFP_KERNEL);
+			zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i),
+					   GFP_KERNEL);
+		}
+		set_cpu_sibling_map(0);
 
-	for_each_possible_cpu(i) {
-		zalloc_cpumask_var(&per_cpu(cpu_sibling_map, i), GFP_KERNEL);
-		zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
-		zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
+		if (!alloc_cpumask_var(&xen_cpu_initialized_map, GFP_KERNEL))
+			panic("could not allocate xen_cpu_initialized_map\n");
+
+		cpumask_copy(xen_cpu_initialized_map, cpumask_of(0));
 	}
-	set_cpu_sibling_map(0);
 
 	xen_pmu_init(0);
 
 	if (xen_smp_intr_init(0))
 		BUG();
 
-	if (!alloc_cpumask_var(&xen_cpu_initialized_map, GFP_KERNEL))
-		panic("could not allocate xen_cpu_initialized_map\n");
-
-	cpumask_copy(xen_cpu_initialized_map, cpumask_of(0));
-
 	/* Restrict the possible_map according to max_cpus. */
 	while ((num_possible_cpus() > 1) && (num_possible_cpus() > max_cpus)) {
 		for (cpu = nr_cpu_ids - 1; !cpu_possible(cpu); cpu--)
@@ -375,8 +380,11 @@ static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
 		set_cpu_possible(cpu, false);
 	}
 
-	for_each_possible_cpu(cpu)
+	for_each_possible_cpu(cpu) {
 		set_cpu_present(cpu, true);
+		if (xen_hvmlite)
+			physid_set(cpu, phys_cpu_present_map);
+	}
 }
 
 static int
@@ -810,10 +818,15 @@ void __init xen_smp_init(void)
 
 static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus)
 {
+	if (xen_hvmlite)
+		xen_smp_prepare_cpus(max_cpus);
+
 	native_smp_prepare_cpus(max_cpus);
-	WARN_ON(xen_smp_intr_init(0));
 
-	xen_init_lock_cpu(0);
+	if (!xen_hvmlite) {
+		WARN_ON(xen_smp_intr_init(0));
+		xen_init_lock_cpu(0);
+	}
 }
 
 static int xen_hvm_cpu_up(unsigned int cpu, struct task_struct *tidle)
@@ -836,8 +849,21 @@ static int xen_hvm_cpu_up(unsigned int cpu, struct task_struct *tidle)
 	*/
 	rc = xen_smp_intr_init(cpu);
 	WARN_ON(rc);
-	if (!rc)
-		rc =  native_cpu_up(cpu, tidle);
+
+	if (xen_hvmlite) {
+		rc = cpu_initialize_context(cpu, tidle);
+		if (rc) {
+			xen_smp_intr_free(cpu);
+			return rc;
+		}
+		xen_pmu_init(cpu);
+	}
+
+	if (!rc) {
+		rc = native_cpu_up(cpu, tidle);
+		if (rc && xen_hvmlite)
+			xen_pmu_finish(cpu);
+	}
 
 	/*
 	 * We must initialize the slowpath CPU kicker _after_ the native
@@ -861,4 +887,8 @@ void __init xen_hvm_smp_init(void)
 	smp_ops.send_call_func_ipi = xen_smp_send_call_function_ipi;
 	smp_ops.send_call_func_single_ipi = xen_smp_send_call_function_single_ipi;
 	smp_ops.smp_prepare_boot_cpu = xen_smp_prepare_boot_cpu;
+	if (xen_hvmlite) {
+		smp_ops.play_dead = xen_play_dead;
+		xen_fill_possible_map();
+	}
 }
-- 
1.7.1

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 11/11] xen/hvmlite: Enable CPU on-/offlining
  2016-02-01 15:38 [PATCH v2 00/11] HVMlite domU support Boris Ostrovsky
                   ` (9 preceding siblings ...)
  2016-02-01 15:38 ` [PATCH v2 10/11] xen/hvmlite: Boot secondary CPUs Boris Ostrovsky
@ 2016-02-01 15:38 ` Boris Ostrovsky
       [not found] ` <1454341137-14110-5-git-send-email-boris.ostrovsky@oracle.com>
  11 siblings, 0 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-01 15:38 UTC (permalink / raw)
  To: david.vrabel, konrad.wilk
  Cc: Boris Ostrovsky, xen-devel, mcgrof, linux-kernel, roger.pau

When offlining, we should properly clean up interrupts and wait
until hypervisor declares VCPU as down before cleaning up.

After VCPU that was previously offlined is brought back to life
we want to jump back to bare-metal entry points. It's a simple
jump on 64-bit but requires minor tweaking for 32-bit case.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 arch/x86/xen/smp.c         |   35 +++++++++++++++++++++++++----------
 arch/x86/xen/xen-hvmlite.S |    8 ++++++++
 2 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index a22cae2..d768c4e 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -143,7 +143,7 @@ static void xen_smp_intr_free(unsigned int cpu)
 		kfree(per_cpu(xen_callfuncsingle_irq, cpu).name);
 		per_cpu(xen_callfuncsingle_irq, cpu).name = NULL;
 	}
-	if (xen_hvm_domain())
+	if (xen_hvm_domain() && !xen_hvmlite)
 		return;
 
 	if (per_cpu(xen_irq_work, cpu).irq >= 0) {
@@ -585,7 +585,8 @@ static int xen_cpu_disable(void)
 
 static void xen_cpu_die(unsigned int cpu)
 {
-	while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL)) {
+	while ((xen_pv_domain() || xen_hvmlite) &&
+	       HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL)) {
 		__set_current_state(TASK_UNINTERRUPTIBLE);
 		schedule_timeout(HZ/10);
 	}
@@ -602,14 +603,28 @@ static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
 {
 	play_dead_common();
 	HYPERVISOR_vcpu_op(VCPUOP_down, smp_processor_id(), NULL);
-	cpu_bringup();
-	/*
-	 * commit 4b0c0f294 (tick: Cleanup NOHZ per cpu data on cpu down)
-	 * clears certain data that the cpu_idle loop (which called us
-	 * and that we return from) expects. The only way to get that
-	 * data back is to call:
-	 */
-	tick_nohz_idle_enter();
+
+	if (!xen_hvm_domain()) {
+		cpu_bringup();
+		/*
+		 * commit 4b0c0f294 (tick: Cleanup NOHZ per cpu data on cpu
+		 * down) clears certain data that the cpu_idle loop (which
+		 * called us and that we return from) expects. The only way to
+		 * get that data back is to call:
+		 */
+		tick_nohz_idle_enter();
+	} else {
+		/*
+		 * For 64-bit we can jump directly to SMP entry point but for
+		 * 32-bit we need to disable paging and load boot GDT (just
+		 * like in cpu_initialize_context()).
+		 */
+#ifdef CONFIG_X86_64
+		asm("jmp secondary_startup_64");
+#else
+		asm("jmp hvmlite_smp_32_hp");
+#endif
+	}
 }
 
 #else /* !CONFIG_HOTPLUG_CPU */
diff --git a/arch/x86/xen/xen-hvmlite.S b/arch/x86/xen/xen-hvmlite.S
index 805e6a0..1db2e96 100644
--- a/arch/x86/xen/xen-hvmlite.S
+++ b/arch/x86/xen/xen-hvmlite.S
@@ -145,6 +145,14 @@ ENTRY(hvmlite_start_xen)
 #endif
 
 #ifdef CONFIG_X86_32
+ENTRY(hvmlite_smp_32_hp)
+	movl $_pa(initial_page_table), %eax
+	movl %eax, %cr3
+	ljmp $__KERNEL_CS,$_pa(5f)
+5:
+	movl $X86_CR0_PE, %eax
+	movl %eax, %cr0
+
 ENTRY(hvmlite_smp_32)
         mov $_pa(boot_gdt_descr), %eax
         lgdt (%eax)
-- 
1.7.1

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 01/11] xen/hvmlite: Import hvmlite-related Xen public interfaces
  2016-02-01 15:38 ` [PATCH v2 01/11] xen/hvmlite: Import hvmlite-related Xen public interfaces Boris Ostrovsky
@ 2016-02-02 16:06   ` David Vrabel
  0 siblings, 0 replies; 54+ messages in thread
From: David Vrabel @ 2016-02-02 16:06 UTC (permalink / raw)
  To: Boris Ostrovsky, david.vrabel, konrad.wilk
  Cc: xen-devel, mcgrof, linux-kernel, roger.pau

On 01/02/16 15:38, Boris Ostrovsky wrote:
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Acked-by: David Vrabel <david.vrabel@citrix.com>

David

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 04/11] xen/hvmlite: Allow HVMlite guests delay initializing grant table
       [not found] ` <1454341137-14110-5-git-send-email-boris.ostrovsky@oracle.com>
@ 2016-02-02 16:13   ` David Vrabel
       [not found]   ` <56B0D596.4050301@citrix.com>
  2016-02-03 18:59   ` Luis R. Rodriguez
  2 siblings, 0 replies; 54+ messages in thread
From: David Vrabel @ 2016-02-02 16:13 UTC (permalink / raw)
  To: Boris Ostrovsky, david.vrabel, konrad.wilk
  Cc: xen-devel, mcgrof, linux-kernel, roger.pau

On 01/02/16 15:38, Boris Ostrovsky wrote:
> .. just like we currently do for PVH guests

I think this description is wrong.  In the HVM guess the grant table
initialization is delayed, but here we want to do it immediately (since
we may have no platform PCI device to trigger it otherwise).

Otherwise,

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

David

> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
>  arch/x86/xen/grant-table.c |    4 ++--
>  drivers/xen/grant-table.c  |    8 ++++----
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
> index e079500..40ad9c2 100644
> --- a/arch/x86/xen/grant-table.c
> +++ b/arch/x86/xen/grant-table.c
> @@ -110,7 +110,7 @@ int arch_gnttab_init(unsigned long nr_shared)
>  	return arch_gnttab_valloc(&gnttab_shared_vm_area, nr_shared);
>  }
>  
> -#ifdef CONFIG_XEN_PVH
> +#ifdef CONFIG_XEN_PVHVM
>  #include <xen/balloon.h>
>  #include <xen/events.h>
>  #include <linux/slab.h>
> @@ -164,7 +164,7 @@ static int __init xlated_setup_gnttab_pages(void)
>  
>  static int __init xen_pvh_gnttab_setup(void)
>  {
> -	if (!xen_pvh_domain())
> +	if (!xen_pvh_domain() && !xen_hvmlite)
>  		return -ENODEV;
>  
>  	return xlated_setup_gnttab_pages();
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index c49f79e..9a239d5 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -1147,13 +1147,13 @@ EXPORT_SYMBOL_GPL(gnttab_init);
>  
>  static int __gnttab_init(void)
>  {
> +	if (!xen_domain())
> +		return -ENODEV;
> +
>  	/* Delay grant-table initialization in the PV on HVM case */
> -	if (xen_hvm_domain())
> +	if (xen_hvm_domain() && !xen_hvmlite)
>  		return 0;
>  
> -	if (!xen_pv_domain())
> -		return -ENODEV;
> -
>  	return gnttab_init();
>  }
>  /* Starts after core_initcall so that xen_pvh_gnttab_setup can be called
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 06/11] xen/hvmlite: Prepare cpu_initialize_context() routine for HVMlite SMP
  2016-02-01 15:38 ` [PATCH v2 06/11] xen/hvmlite: Prepare cpu_initialize_context() routine for HVMlite SMP Boris Ostrovsky
@ 2016-02-02 16:16   ` David Vrabel
  0 siblings, 0 replies; 54+ messages in thread
From: David Vrabel @ 2016-02-02 16:16 UTC (permalink / raw)
  To: Boris Ostrovsky, david.vrabel, konrad.wilk
  Cc: xen-devel, mcgrof, linux-kernel, roger.pau

On 01/02/16 15:38, Boris Ostrovsky wrote:
> Subsequent patch will add support for starting secondary VCPUs in
> HVMlite guest. This patch exists to simmplify code review.
> 
> No functional changes (except for introduction of 'if (!xen_hvmlite)').

Acked-by: David Vrabel <david.vrabel@citrix.com>

David

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 07/11] xen/hvmlite: Initialize context for secondary VCPUs
  2016-02-01 15:38 ` [PATCH v2 07/11] xen/hvmlite: Initialize context for secondary VCPUs Boris Ostrovsky
@ 2016-02-02 16:21   ` David Vrabel
       [not found]   ` <56B0D786.7000002@citrix.com>
  1 sibling, 0 replies; 54+ messages in thread
From: David Vrabel @ 2016-02-02 16:21 UTC (permalink / raw)
  To: Boris Ostrovsky, david.vrabel, konrad.wilk
  Cc: xen-devel, mcgrof, linux-kernel, roger.pau

This needs some more description in the commit message.

> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
[...]
> +		hctxt->cpu_regs.x86_32.cs_base = 0;
> +		hctxt->cpu_regs.x86_32.cs_limit = ~0u;
> +		hctxt->cpu_regs.x86_32.cs_ar = 0xc9b;
> +		hctxt->cpu_regs.x86_32.ds_base = 0;
> +		hctxt->cpu_regs.x86_32.ds_limit = ~0u;
> +		hctxt->cpu_regs.x86_32.ds_ar = 0xc93;
> +		hctxt->cpu_regs.x86_32.es_base = 0;
> +		hctxt->cpu_regs.x86_32.es_limit = ~0u;
> +		hctxt->cpu_regs.x86_32.es_ar = 0xc93;
> +		hctxt->cpu_regs.x86_32.ss_base = 0;
> +		hctxt->cpu_regs.x86_32.ss_limit = ~0u;
> +		hctxt->cpu_regs.x86_32.ss_ar = 0xc93;
> +		hctxt->cpu_regs.x86_32.tr_base = 0;
> +		hctxt->cpu_regs.x86_32.tr_limit = 0xff;
> +		hctxt->cpu_regs.x86_32.tr_ar = 0x8b;

Lots of hard-coded values here.  Should this be #defined somewhere?

David

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 09/11] xen/hvmlite: Use x86's default timer init for HVMlite guests
  2016-02-01 15:38 ` [PATCH v2 09/11] xen/hvmlite: Use x86's default timer init " Boris Ostrovsky
@ 2016-02-02 16:27   ` David Vrabel
       [not found]   ` <56B0D8DD.5010206@citrix.com>
  1 sibling, 0 replies; 54+ messages in thread
From: David Vrabel @ 2016-02-02 16:27 UTC (permalink / raw)
  To: Boris Ostrovsky, david.vrabel, konrad.wilk
  Cc: xen-devel, mcgrof, linux-kernel, roger.pau

On 01/02/16 15:38, Boris Ostrovsky wrote:
> xen_timer_init() will be called from apic_bsp_setup().

I must be missing something because xen_init_time_ops() is only called
from the PV-only xen_start_kernel()?

David

> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -492,7 +492,10 @@ void __init xen_init_time_ops(void)
>  {
>  	pv_time_ops = xen_time_ops;
>  
> -	x86_init.timers.timer_init = xen_time_init;
> +	if (!xen_hvmlite)
> +		x86_init.timers.timer_init = xen_time_init;
> +	else
> +		x86_init.timers.timer_init = x86_init_noop;
>  	x86_init.timers.setup_percpu_clockev = x86_init_noop;
>  	x86_cpuinit.setup_percpu_clockev = x86_init_noop;
>  
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
  2016-02-01 15:38 ` [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest Boris Ostrovsky
@ 2016-02-02 16:39   ` David Vrabel
       [not found]   ` <56B0DBD6.3060205@citrix.com>
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 54+ messages in thread
From: David Vrabel @ 2016-02-02 16:39 UTC (permalink / raw)
  To: Boris Ostrovsky, david.vrabel, konrad.wilk
  Cc: xen-devel, mcgrof, linux-kernel, roger.pau

On 01/02/16 15:38, Boris Ostrovsky wrote:
> --- a/include/xen/xen.h
> +++ b/include/xen/xen.h
> @@ -29,6 +29,12 @@ extern enum xen_domain_type xen_domain_type;
>  #define xen_initial_domain()	(0)
>  #endif	/* CONFIG_XEN_DOM0 */
>  
> +#ifdef CONFIG_XEN_PVHVM
> +extern int xen_hvmlite;
> +#else
> +#define xen_hvmlite		(0)
> +#endif

I think we want a feature set and not a single boolean here.  There's
going to HVMlite variants (e.g., those with an emulated APIC to support
PCI passthrough and those without).

So we test e.g., xen_hvm_feature(XEN_HVM_FEATURE_APIC), which would be
clear for a HVMlite guest without an (emulated) APIC.

David

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 04/11] xen/hvmlite: Allow HVMlite guests delay initializing grant table
       [not found]   ` <56B0D596.4050301@citrix.com>
@ 2016-02-02 16:49     ` Boris Ostrovsky
  0 siblings, 0 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-02 16:49 UTC (permalink / raw)
  To: David Vrabel, konrad.wilk; +Cc: xen-devel, mcgrof, linux-kernel, roger.pau

On 02/02/2016 11:13 AM, David Vrabel wrote:
> On 01/02/16 15:38, Boris Ostrovsky wrote:
>> .. just like we currently do for PVH guests
> I think this description is wrong.  In the HVM guess the grant table
> initialization is delayed, but here we want to do it immediately (since
> we may have no platform PCI device to trigger it otherwise).

Yes, it's wrong. I'll re-word it.

-boris

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 07/11] xen/hvmlite: Initialize context for secondary VCPUs
       [not found]   ` <56B0D786.7000002@citrix.com>
@ 2016-02-02 16:58     ` Boris Ostrovsky
       [not found]     ` <56B0E046.6050900@oracle.com>
  1 sibling, 0 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-02 16:58 UTC (permalink / raw)
  To: David Vrabel, konrad.wilk; +Cc: xen-devel, mcgrof, linux-kernel, roger.pau

On 02/02/2016 11:21 AM, David Vrabel wrote:
> This needs some more description in the commit message.
>
>> --- a/arch/x86/xen/smp.c
>> +++ b/arch/x86/xen/smp.c
> [...]
>> +		hctxt->cpu_regs.x86_32.cs_base = 0;
>> +		hctxt->cpu_regs.x86_32.cs_limit = ~0u;
>> +		hctxt->cpu_regs.x86_32.cs_ar = 0xc9b;
>> +		hctxt->cpu_regs.x86_32.ds_base = 0;
>> +		hctxt->cpu_regs.x86_32.ds_limit = ~0u;
>> +		hctxt->cpu_regs.x86_32.ds_ar = 0xc93;
>> +		hctxt->cpu_regs.x86_32.es_base = 0;
>> +		hctxt->cpu_regs.x86_32.es_limit = ~0u;
>> +		hctxt->cpu_regs.x86_32.es_ar = 0xc93;
>> +		hctxt->cpu_regs.x86_32.ss_base = 0;
>> +		hctxt->cpu_regs.x86_32.ss_limit = ~0u;
>> +		hctxt->cpu_regs.x86_32.ss_ar = 0xc93;
>> +		hctxt->cpu_regs.x86_32.tr_base = 0;
>> +		hctxt->cpu_regs.x86_32.tr_limit = 0xff;
>> +		hctxt->cpu_regs.x86_32.tr_ar = 0x8b;
> Lots of hard-coded values here.  Should this be #defined somewhere?

We also don't need to set bases to zero since hctxt is kzalloc'd. I'll 
remove that and add a comment.

As for macros --- I couldn't find the bits defined symbolically anywhere 
and since this is the only place this is used the macros would be local 
here.

-boris

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 09/11] xen/hvmlite: Use x86's default timer init for HVMlite guests
       [not found]   ` <56B0D8DD.5010206@citrix.com>
@ 2016-02-02 17:01     ` Boris Ostrovsky
  0 siblings, 0 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-02 17:01 UTC (permalink / raw)
  To: David Vrabel, konrad.wilk; +Cc: xen-devel, mcgrof, linux-kernel, roger.pau

On 02/02/2016 11:27 AM, David Vrabel wrote:
> On 01/02/16 15:38, Boris Ostrovsky wrote:
>> xen_timer_init() will be called from apic_bsp_setup().
> I must be missing something because xen_init_time_ops() is only called
> from the PV-only xen_start_kernel()?

This is leftover from the earlier series where common code called 
xen_init_time_ops() (which, of course, it could easily not have done for 
HVMlite guests).

So it's not needed anymore, I'll drop this patch.

-boris

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]   ` <56B0DBD6.3060205@citrix.com>
@ 2016-02-02 17:19     ` Boris Ostrovsky
  0 siblings, 0 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-02 17:19 UTC (permalink / raw)
  To: David Vrabel, konrad.wilk; +Cc: xen-devel, mcgrof, linux-kernel, roger.pau

On 02/02/2016 11:39 AM, David Vrabel wrote:
> On 01/02/16 15:38, Boris Ostrovsky wrote:
>> --- a/include/xen/xen.h
>> +++ b/include/xen/xen.h
>> @@ -29,6 +29,12 @@ extern enum xen_domain_type xen_domain_type;
>>   #define xen_initial_domain()	(0)
>>   #endif	/* CONFIG_XEN_DOM0 */
>>   
>> +#ifdef CONFIG_XEN_PVHVM
>> +extern int xen_hvmlite;
>> +#else
>> +#define xen_hvmlite		(0)
>> +#endif
> I think we want a feature set and not a single boolean here.  There's
> going to HVMlite variants (e.g., those with an emulated APIC to support
> PCI passthrough and those without).
>
> So we test e.g., xen_hvm_feature(XEN_HVM_FEATURE_APIC), which would be
> clear for a HVMlite guest without an (emulated) APIC.

I was thinking of eventually 's/xen_hvmlite/xen_pvh_domain/g' but this 
is better.

Then we need to export features via XENVER_get_features. Alternatively, 
we can provide them in the hypervisor CPUID leaf.

-boris

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
  2016-02-01 15:38 ` [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest Boris Ostrovsky
  2016-02-02 16:39   ` David Vrabel
       [not found]   ` <56B0DBD6.3060205@citrix.com>
@ 2016-02-03 18:55   ` Luis R. Rodriguez
       [not found]   ` <20160203185525.GV20964@wotan.suse.de>
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 54+ messages in thread
From: Luis R. Rodriguez @ 2016-02-03 18:55 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: pmonclus, GLin, mcgrof, x86, linux-kernel, David Vrabel, hpa,
	xen-devel, bp, bblanco, roger.pau

I saw no considerations for the recommendations I had made last on your v1:

https://lkml.kernel.org/r/CAB=NE6XPA0YzbnM8=rspkKai6d3GkXXO00Gr0VZUYoyzNy6thw@mail.gmail.com

Of importance:

1) Using pv_info.paravirt_enabled = 1 is wrong unless you mean to say this
   is for legacy x86:

Your patch #3 keeps on setting pv_info.paravirt_enabled = 1 and as discussed
this is wrong. It will be renamed to x86_legacy_free() to align with what folks
are pushing for a BIOS flag to annotate if a system requires legacy x86 stuff.
This also means re-thinking all use cases and ensuring subarch is used then
instead when the goal was to avoid Xen from entering that code. Today Xen does
not use this but with my work it does and it helps clean and brush up a lot of
these checks with future prospects to even help unify entry points.

2) We should avoid more hypervisor type hacks, and just consider a new 
   hypervisor type to close the gap:

Using x86_legacy_free() and friends in a unified way for all systems means it
should only be used after init_hypervisor_platform() which is called during
setup_arch().  This means we have a semantic gap for checks on "are we on
hypervisor type and which one?". There are drivers now using these sorts of
checks as well, for instance snd_intel8x0_inside_vm(). We should avoid having
these hacks but that also means cleaning up a well define grammar here for what
we want.  I'm doing work to help with this by streamlining use of the subarch
type, that should help with PV code, but your use case seems different but yet
related, what I had suggested last was to consider we add a new hypervisor type
to the x86 boot protocol which would be available early on. This would have a
few purposes, one of which deserves its own section below on dead code:

    a) clean up hacks as with snd_intel8x0_inside_vm()
    b) enable a generic way and clean way to distinguish what hypervisor
       type you're on
    c) since it would be set early and if we can ensure its accessible
       early on boot it would mean avoiding having to add yet-another
       asm entry point for Linux, you could just use startup_32() and
       the hypervisor type could easily just have an early branch call
       and post branch call very similar to how we deal with the subarch
       currently on 32-bit. Your calls then just become early stubs and
       we'd have a solution for other PV types that want a similar solution
       later

3) Dead code concerns and unifying entry points:

Addressing the semantics for the gray areas I am highlighting are critical for
ensuring one does not run code or even exposes code as a available for the type
of run time system booted, some folks call this "dead code". This is critical
for Linux distributions which need to rely on the flexibility of having one
kernel work for different use cases. The resolution to this problem was pvops
but pvops has shortcoming for dead code, it didn't address the problem likely as
it was not considered serious. It also didn't address the issue of different
hypervisors wanting different entry points and that this fact alone also contributes
to more dead code concerns, case in point the regressions introduced by cr4 shadow
and the latest one is Kasan which to this day breaks Xen! Dead code topics are
not easy to grasp, its why I've started on my own crusade to talk to people and
write about it [0], and as of late propose some changes to avoid these in a
clean way without extending pvops. Adding yet another entry point will not help
here *specially* if we do not take semantics seriously over the different hypervisors
and hypervisor types.

[0] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html
[1] http://www.do-not-panic.com/2015/12/xen-and-x86-linux-zero-page.html

My recommendation then:

We add new hypervisor type to close the semantic gap for hypervisor types, and
much like subarch enable also a subarch_data to let you pass and use your
hvmlite_start_info. This would not only help with the semantics but also help
avoid yet-another-entry point and force us to provide a well define structure
for considering code that should not run by pegging it as required or supported
for different early x86 code stubs.

I had hinted perhaps we might be able to piggy back on top of the ELF loader
protocol as well, and since that's standard do wonder if that could instead
be extended to help unify a mechanism for different OSes instead of making
this just a solution for Linux.

Code review below.

On Mon, Feb 01, 2016 at 10:38:48AM -0500, Boris Ostrovsky wrote:
> Start HVMlite guest at XEN_ELFNOTE_PHYS32_ENTRY address. Setup hypercall
> page, initialize boot_params, enable early page tables.
> 
> Since this stub is executed before kernel entry point we cannot use
> variables in .bss which is cleared by kernel. We explicitly place
> variables that are initialized here into .data.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 5774800..5f05fa2 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -171,6 +172,17 @@ struct tls_descs {
>   */
>  static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
>  
> +#ifdef CONFIG_XEN_PVHVM
> +/*
> + * HVMlite variables. These need to live in data segment since they are
> + * initialized before startup_{32|64}, which clear .bss, are invoked.
> + */
> +int xen_hvmlite __attribute__((section(".data"))) = 0;
> +struct hvm_start_info hvmlite_start_info __attribute__((section(".data")));
> +uint hvmlite_start_info_sz = sizeof(hvmlite_start_info);
> +struct boot_params xen_hvmlite_boot_params __attribute__((section(".data")));
> +#endif
> +

The section annotations seems very special use case but likely worth documenting
and defining a new macro for in include/linux/compiler.h. This would make it
easier to change should we want to change the section used here later and
enable others to easily look for the reason for these annotations in a
single place.

  Luis

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 04/11] xen/hvmlite: Allow HVMlite guests delay initializing grant table
       [not found] ` <1454341137-14110-5-git-send-email-boris.ostrovsky@oracle.com>
  2016-02-02 16:13   ` [PATCH v2 04/11] xen/hvmlite: Allow HVMlite guests delay initializing grant table David Vrabel
       [not found]   ` <56B0D596.4050301@citrix.com>
@ 2016-02-03 18:59   ` Luis R. Rodriguez
  2 siblings, 0 replies; 54+ messages in thread
From: Luis R. Rodriguez @ 2016-02-03 18:59 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: xen-devel, roger.pau, david.vrabel, linux-kernel

On Mon, Feb 01, 2016 at 10:38:50AM -0500, Boris Ostrovsky wrote:
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index c49f79e..9a239d5 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -1147,13 +1147,13 @@ EXPORT_SYMBOL_GPL(gnttab_init);
>  
>  static int __gnttab_init(void)
>  {
> +	if (!xen_domain())
> +		return -ENODEV;
> +
>  	/* Delay grant-table initialization in the PV on HVM case */
> -	if (xen_hvm_domain())
> +	if (xen_hvm_domain() && !xen_hvmlite)
>  		return 0;
>  
> -	if (!xen_pv_domain())
> -		return -ENODEV;
> -
>  	return gnttab_init();
>  }

For instance if we had pv types generalized then this would just be a one line
addition for __gnttab_init() to support hvmlite. Much cleaner, generalizes the
extensions and gets folks thinking about the types on all calls Xen init call
sites.

  Luis

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]   ` <20160203185525.GV20964@wotan.suse.de>
@ 2016-02-03 20:11     ` Boris Ostrovsky
  2016-02-03 20:52     ` Andrew Cooper
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-03 20:11 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: pmonclus, GLin, mcgrof, x86, linux-kernel, David Vrabel, hpa,
	xen-devel, bp, bblanco, roger.pau

On 02/03/2016 01:55 PM, Luis R. Rodriguez wrote:
> I saw no considerations for the recommendations I had made last on your v1:
>
> https://lkml.kernel.org/r/CAB=NE6XPA0YzbnM8=rspkKai6d3GkXXO00Gr0VZUYoyzNy6thw@mail.gmail.com
>
> Of importance:
>
> 1) Using pv_info.paravirt_enabled = 1 is wrong unless you mean to say this
>     is for legacy x86:
>
> Your patch #3 keeps on setting pv_info.paravirt_enabled = 1 and as discussed
> this is wrong. It will be renamed to x86_legacy_free() to align with what folks
> are pushing for a BIOS flag to annotate if a system requires legacy x86 stuff.
> This also means re-thinking all use cases and ensuring subarch is used then
> instead when the goal was to avoid Xen from entering that code. Today Xen does
> not use this but with my work it does and it helps clean and brush up a lot of
> these checks with future prospects to even help unify entry points.

As I said earlier, I am not sure I understand what subarch buys us for 
HVMlite guests.

As for using paravirt_enabled -- this is really only used to 
differentiate HVM from HVMlite and I think (although I'd need to check) 
is only needed by Xen-specific code in a couple of places. So if/when it 
is removed we will switch to something else. Since your work is WIP I 
decided to keep using it until it's clear what other options may be 
available.

>
> 2) We should avoid more hypervisor type hacks, and just consider a new
>     hypervisor type to close the gap:
>
> Using x86_legacy_free() and friends in a unified way for all systems means it
> should only be used after init_hypervisor_platform() which is called during
> setup_arch().  This means we have a semantic gap for checks on "are we on
> hypervisor type and which one?".

In this particular case we don't need any information about hypervisor 
until init_hypervisor_platform().

> There are drivers now using these sorts of
> checks as well, for instance snd_intel8x0_inside_vm(). We should avoid having
> these hacks but that also means cleaning up a well define grammar here for what
> we want.  I'm doing work to help with this by streamlining use of the subarch
> type, that should help with PV code, but your use case seems different but yet
> related, what I had suggested last was to consider we add a new hypervisor type
> to the x86 boot protocol which would be available early on. This would have a
> few purposes, one of which deserves its own section below on dead code:
>
>      a) clean up hacks as with snd_intel8x0_inside_vm()
>      b) enable a generic way and clean way to distinguish what hypervisor
>         type you're on
>      c) since it would be set early and if we can ensure its accessible
>         early on boot it would mean avoiding having to add yet-another
>         asm entry point for Linux, you could just use startup_32() and
>         the hypervisor type could easily just have an early branch call
>         and post branch call very similar to how we deal with the subarch
>         currently on 32-bit. Your calls then just become early stubs and
>         we'd have a solution for other PV types that want a similar solution
>         later

Which calls?

If you are referring to xen_prepare_hvmlite/hvmlite_bootparams then 
these are needed to prepare boot_params. And we should not enter 
startup_32() without them ready.


>
> 3) Dead code concerns and unifying entry points:
>
> Addressing the semantics for the gray areas I am highlighting are critical for
> ensuring one does not run code or even exposes code as a available for the type
> of run time system booted, some folks call this "dead code". This is critical
> for Linux distributions which need to rely on the flexibility of having one
> kernel work for different use cases. The resolution to this problem was pvops
> but pvops has shortcoming for dead code, it didn't address the problem likely as
> it was not considered serious. It also didn't address the issue of different
> hypervisors wanting different entry points and that this fact alone also contributes
> to more dead code concerns, case in point the regressions introduced by cr4 shadow
> and the latest one is Kasan which to this day breaks Xen! Dead code topics are
> not easy to grasp, its why I've started on my own crusade to talk to people and
> write about it [0], and as of late propose some changes to avoid these in a
> clean way without extending pvops. Adding yet another entry point will not help
> here *specially* if we do not take semantics seriously over the different hypervisors
> and hypervisor types.
>
> [0] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html
> [1] http://www.do-not-panic.com/2015/12/xen-and-x86-linux-zero-page.html

I don't understand what this has to do with HVMlite guests.

>
> My recommendation then:
>
> We add new hypervisor type to close the semantic gap for hypervisor types, and
> much like subarch enable also a subarch_data to let you pass and use your
> hvmlite_start_info. This would not only help with the semantics but also help
> avoid yet-another-entry point and force us to provide a well define structure
> for considering code that should not run by pegging it as required or supported
> for different early x86 code stubs.

As I said before, I don't see how we can avoid having another entry 
point without making Xen change its load procedure. Which is highly 
unlikely to happen.

>
> I had hinted perhaps we might be able to piggy back on top of the ELF loader
> protocol as well, and since that's standard do wonder if that could instead
> be extended to help unify a mechanism for different OSes instead of making
> this just a solution for Linux.
>
> Code review below.
>
> On Mon, Feb 01, 2016 at 10:38:48AM -0500, Boris Ostrovsky wrote:
>> Start HVMlite guest at XEN_ELFNOTE_PHYS32_ENTRY address. Setup hypercall
>> page, initialize boot_params, enable early page tables.
>>
>> Since this stub is executed before kernel entry point we cannot use
>> variables in .bss which is cleared by kernel. We explicitly place
>> variables that are initialized here into .data.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> ---
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index 5774800..5f05fa2 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -171,6 +172,17 @@ struct tls_descs {
>>    */
>>   static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
>>   
>> +#ifdef CONFIG_XEN_PVHVM
>> +/*
>> + * HVMlite variables. These need to live in data segment since they are
>> + * initialized before startup_{32|64}, which clear .bss, are invoked.
>> + */
>> +int xen_hvmlite __attribute__((section(".data"))) = 0;
>> +struct hvm_start_info hvmlite_start_info __attribute__((section(".data")));
>> +uint hvmlite_start_info_sz = sizeof(hvmlite_start_info);
>> +struct boot_params xen_hvmlite_boot_params __attribute__((section(".data")));
>> +#endif
>> +
> The section annotations seems very special use case but likely worth documenting
> and defining a new macro for in include/linux/compiler.h. This would make it
> easier to change should we want to change the section used here later and
> enable others to easily look for the reason for these annotations in a
> single place.

I wonder whether __initdata would be a good attribute. We only need this 
early in the boot.

And xen_hvmlite is gone now so we don't need to worry about it.

-boris

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]   ` <20160203185525.GV20964@wotan.suse.de>
  2016-02-03 20:11     ` Boris Ostrovsky
@ 2016-02-03 20:52     ` Andrew Cooper
       [not found]     ` <56B25F0C.2050808@oracle.com>
       [not found]     ` <56B268A2.5000704@citrix.com>
  3 siblings, 0 replies; 54+ messages in thread
From: Andrew Cooper @ 2016-02-03 20:52 UTC (permalink / raw)
  To: Luis R. Rodriguez, Boris Ostrovsky
  Cc: pmonclus, GLin, mcgrof, x86, linux-kernel, David Vrabel, hpa,
	xen-devel, bp, bblanco, roger.pau

On 03/02/16 18:55, Luis R. Rodriguez wrote:
> We add new hypervisor type to close the semantic gap for hypervisor types, and
> much like subarch enable also a subarch_data to let you pass and use your
> hvmlite_start_info. This would not only help with the semantics but also help
> avoid yet-another-entry point and force us to provide a well define structure
> for considering code that should not run by pegging it as required or supported
> for different early x86 code stubs.

Was I unclear last time?  Xen *will not* be introducing Linux-specifics
into the HVMLite starting ABI.

Your perceived problem with multiple entry points is not a problem with
multiple entry points; It is a problem with multiple different paths
performing the same initialisation.

The Linux entry for PV guests is indeed completely horrible.  I am not
trying to defend it in the slightest.

However, the HVMLite entry which is a very short stub that sets up a
zeropage and hands off to the native start routine is fine.  There is
still just routine performing native x86 startup.

If you still desperately want to avoid multiple entry points, then just
insist on using grub for the VM.  I expect that that is how most people
will end up using HVMLite VMs anyway.

~Andrew

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]     ` <56B25F0C.2050808@oracle.com>
@ 2016-02-03 23:40       ` Luis R. Rodriguez
       [not found]       ` <20160203234026.GS20964@wotan.suse.de>
  1 sibling, 0 replies; 54+ messages in thread
From: Luis R. Rodriguez @ 2016-02-03 23:40 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: pmonclus, GLin, bblanco, x86, linux-kernel, Luis R. Rodriguez,
	David Vrabel, hpa, xen-devel, bp, roger.pau

On Wed, Feb 03, 2016 at 03:11:56PM -0500, Boris Ostrovsky wrote:
> On 02/03/2016 01:55 PM, Luis R. Rodriguez wrote:
> >I saw no considerations for the recommendations I had made last on your v1:
> >
> >https://lkml.kernel.org/r/CAB=NE6XPA0YzbnM8=rspkKai6d3GkXXO00Gr0VZUYoyzNy6thw@mail.gmail.com
> >
> >Of importance:
> >
> >1) Using pv_info.paravirt_enabled = 1 is wrong unless you mean to say this
> >    is for legacy x86:
> >
> >Your patch #3 keeps on setting pv_info.paravirt_enabled = 1 and as discussed
> >this is wrong. It will be renamed to x86_legacy_free() to align with what folks
> >are pushing for a BIOS flag to annotate if a system requires legacy x86 stuff.
> >This also means re-thinking all use cases and ensuring subarch is used then
> >instead when the goal was to avoid Xen from entering that code. Today Xen does
> >not use this but with my work it does and it helps clean and brush up a lot of
> >these checks with future prospects to even help unify entry points.
> 
> As I said earlier, I am not sure I understand what subarch buys us
> for HVMlite guests.

I accepted subarch may not be the right thing, so proposed a hypervisor type.
What it buys you is a strong semantics association between code designed
for a purpose.

> As for using paravirt_enabled -- this is really only used to
> differentiate HVM from HVMlite and I think (although I'd need to
> check) is only needed by Xen-specific code in a couple of places.

That sounds like a Xen specific use case as such an interface that is
pointed out as going to renamed to reflect its actual use case should not
be abused for that purpose.

> So if/when it is removed we will switch to something else. Since your work is
> WIP I decided to keep using it until it's clear what other options may be
> available.

And your work is not WIP? I'll be splitting my patches up and the rename
will be atomic, it likely can go in first than yours, so not sure why you
are simply brushing this off.

> >2) We should avoid more hypervisor type hacks, and just consider a new
> >    hypervisor type to close the gap:
> >
> >Using x86_legacy_free() and friends in a unified way for all systems means it
> >should only be used after init_hypervisor_platform() which is called during
> >setup_arch().  This means we have a semantic gap for checks on "are we on
> >hypervisor type and which one?".
> 
> In this particular case we don't need any information about
> hypervisor until init_hypervisor_platform().

I pointed out in your v1 patchset how microcode loading was not blocked, you
then asked how KVM does it, and that was explained as well, and that they
don't enable it as well. You need a solution for this.

> >There are drivers now using these sorts of
> >checks as well, for instance snd_intel8x0_inside_vm(). We should avoid having
> >these hacks but that also means cleaning up a well define grammar here for what
> >we want.  I'm doing work to help with this by streamlining use of the subarch
> >type, that should help with PV code, but your use case seems different but yet
> >related, what I had suggested last was to consider we add a new hypervisor type
> >to the x86 boot protocol which would be available early on. This would have a
> >few purposes, one of which deserves its own section below on dead code:
> >
> >     a) clean up hacks as with snd_intel8x0_inside_vm()
> >     b) enable a generic way and clean way to distinguish what hypervisor
> >        type you're on
> >     c) since it would be set early and if we can ensure its accessible
> >        early on boot it would mean avoiding having to add yet-another
> >        asm entry point for Linux, you could just use startup_32() and
> >        the hypervisor type could easily just have an early branch call
> >        and post branch call very similar to how we deal with the subarch
> >        currently on 32-bit. Your calls then just become early stubs and
> >        we'd have a solution for other PV types that want a similar solution
> >        later
> 
> Which calls?
> 
> If you are referring to xen_prepare_hvmlite/hvmlite_bootparams

Even before, hvmlite_start_xen(). 

> then these are needed to prepare boot_params. And we should not enter
> startup_32() without them ready.

As-is the x86 boot protocol would not allow an easy way for this, I'm
suggesting we consider extending the boot protocol to add a hypervisor
type and data pointer much as with subarch and subarch_data for the
particular purpose of both enabling entry into the same startup_32()
but also a clean way for modifications of stubs both at the beginning
and at the end of startup_32().

Pseudo code:

startup_32()                         startup_64()
       |                                  |
       |                                  |
       V                                  V
pre_hypervisor_stub_32()	pre_hypervisor_stub_64()
       |                                  |
       |                                  |
       V                                  V
 [existing startup_32()]       [existing startup_64()]
       |                                  |
       |                                  |
       V                                  V
post_hypervisor_stub_32()	post_hypervisor_stub_64()

The pre_hypervisor_stub_32() would have much of the code in
hvmlite_start_xen() but for 32-bit, pre_hypervisor_stub_64()
would have the 64-bits.

> >3) Dead code concerns and unifying entry points:
> >
> >Addressing the semantics for the gray areas I am highlighting are critical for
> >ensuring one does not run code or even exposes code as a available for the type
> >of run time system booted, some folks call this "dead code". This is critical
> >for Linux distributions which need to rely on the flexibility of having one
> >kernel work for different use cases. The resolution to this problem was pvops
> >but pvops has shortcoming for dead code, it didn't address the problem likely as
> >it was not considered serious. It also didn't address the issue of different
> >hypervisors wanting different entry points and that this fact alone also contributes
> >to more dead code concerns, case in point the regressions introduced by cr4 shadow
> >and the latest one is Kasan which to this day breaks Xen! Dead code topics are
> >not easy to grasp, its why I've started on my own crusade to talk to people and
> >write about it [0], and as of late propose some changes to avoid these in a
> >clean way without extending pvops. Adding yet another entry point will not help
> >here *specially* if we do not take semantics seriously over the different hypervisors
> >and hypervisor types.
> >
> >[0] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html
> >[1] http://www.do-not-panic.com/2015/12/xen-and-x86-linux-zero-page.html
> 
> I don't understand what this has to do with HVMlite guests.

We have no formal semantics to ensure correctness and avoid dead code for HVMLite,
as with PV. The microcode loader is one example of code that should not run. Sure,
an adhoc solution might be possible, but I'm advocating for a generic solution.

> >My recommendation then:
> >
> >We add new hypervisor type to close the semantic gap for hypervisor types, and
> >much like subarch enable also a subarch_data to let you pass and use your
> >hvmlite_start_info. This would not only help with the semantics but also help
> >avoid yet-another-entry point and force us to provide a well define structure
> >for considering code that should not run by pegging it as required or supported
> >for different early x86 code stubs.
> 
> As I said before, I don't see how we can avoid having another entry
> point without making Xen change its load procedure. Which is highly
> unlikely to happen.

Why would it not be possible if this is about a new guest type? Surely there
are good reasons to consider extending the load procedure protocol?

> >I had hinted perhaps we might be able to piggy back on top of the ELF loader
> >protocol as well, and since that's standard do wonder if that could instead
> >be extended to help unify a mechanism for different OSes instead of making
> >this just a solution for Linux.

If it was possible to extend it, would this be a reasonable venue to consider?

> >Code review below.
> >
> >On Mon, Feb 01, 2016 at 10:38:48AM -0500, Boris Ostrovsky wrote:
> >>Start HVMlite guest at XEN_ELFNOTE_PHYS32_ENTRY address. Setup hypercall
> >>page, initialize boot_params, enable early page tables.
> >>
> >>Since this stub is executed before kernel entry point we cannot use
> >>variables in .bss which is cleared by kernel. We explicitly place
> >>variables that are initialized here into .data.
> >>
> >>Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> >>---
> >>diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> >>index 5774800..5f05fa2 100644
> >>--- a/arch/x86/xen/enlighten.c
> >>+++ b/arch/x86/xen/enlighten.c
> >>@@ -171,6 +172,17 @@ struct tls_descs {
> >>   */
> >>  static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
> >>+#ifdef CONFIG_XEN_PVHVM
> >>+/*
> >>+ * HVMlite variables. These need to live in data segment since they are
> >>+ * initialized before startup_{32|64}, which clear .bss, are invoked.
> >>+ */
> >>+int xen_hvmlite __attribute__((section(".data"))) = 0;
> >>+struct hvm_start_info hvmlite_start_info __attribute__((section(".data")));
> >>+uint hvmlite_start_info_sz = sizeof(hvmlite_start_info);
> >>+struct boot_params xen_hvmlite_boot_params __attribute__((section(".data")));
> >>+#endif
> >>+
> >The section annotations seems very special use case but likely worth documenting
> >and defining a new macro for in include/linux/compiler.h. This would make it
> >easier to change should we want to change the section used here later and
> >enable others to easily look for the reason for these annotations in a
> >single place.
> 
> I wonder whether __initdata would be a good attribute. We only need
> this early in the boot.

I could not find other users of .data other than some specific driver.
Using anything with *init* alludes you can free the data later but if we
want to keep it I suggest a different prefix, up to you.

  Luis

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]     ` <56B268A2.5000704@citrix.com>
@ 2016-02-03 23:59       ` Luis R. Rodriguez
       [not found]       ` <20160203235908.GT20964@wotan.suse.de>
  1 sibling, 0 replies; 54+ messages in thread
From: Luis R. Rodriguez @ 2016-02-03 23:59 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: pmonclus, GLin, bblanco, x86, linux-kernel, Luis R. Rodriguez,
	David Vrabel, hpa, xen-devel, Boris Ostrovsky, bp, roger.pau

On Wed, Feb 03, 2016 at 08:52:50PM +0000, Andrew Cooper wrote:
> On 03/02/16 18:55, Luis R. Rodriguez wrote:
> > We add new hypervisor type to close the semantic gap for hypervisor types, and
> > much like subarch enable also a subarch_data to let you pass and use your
> > hvmlite_start_info. This would not only help with the semantics but also help
> > avoid yet-another-entry point and force us to provide a well define structure
> > for considering code that should not run by pegging it as required or supported
> > for different early x86 code stubs.
> 
> Was I unclear last time?  Xen *will not* be introducing Linux-specifics
> into the HVMLite starting ABI.

This does not have to be "Linux specifics" but rather a light way to enable
a hypervisor to clue in *any* OS of its hypervisor type, guest type, and
custom hypervisor data that can be used to populate needed OS specifics
about the guest. Perhaps Xen's own loader mechanism could be extended just
slightly to become *that* standard, its just right now it doesn't seem to
enable for generalizing this in a very useful way for OSes. Its all
custom stubs.

> Your perceived problem with multiple entry points is not a problem with
> multiple entry points; It is a problem with multiple different paths
> performing the same initialisation.

Its actually more of an issue with the lack of strong general semantics
available for different hypervisors and guest types and requirements for x86's
init path. What you end up with as collateral is multiple entry points, and
these can be sloppy and as you note can perform the same initialisation.
Another issue is the inability to proactively ensure new x86 init code
addresses different x86 requirements (cr4 shadow regression and Kasan still
being broken on Xen are two examples) and it just so happens that the lack of
semantics for the different guest types required to be evaluated is one issue
for x86.

We can do better.

> The Linux entry for PV guests is indeed completely horrible.  I am not
> trying to defend it in the slightest.
> 
> However, the HVMLite entry which is a very short stub that sets up a
> zeropage and hands off to the native start routine is fine.

Its alright, and a huge stepping stone towards good architecture. We
however can do better.

> There is still just routine performing native x86 startup.
> 
> If you still desperately want to avoid multiple entry points, then just
> insist on using grub for the VM.  I expect that that is how most people
> will end up using HVMLite VMs anyway.

Are you saying Grub can do some of this heavy lifting that I am trying
to avoid? If so that'd be great news.

  Luis

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]       ` <20160203235908.GT20964@wotan.suse.de>
@ 2016-02-04  0:08         ` Luis R. Rodriguez
  2016-02-04  0:51         ` Andrew Cooper
       [not found]         ` <56B2A09A.70404@citrix.com>
  2 siblings, 0 replies; 54+ messages in thread
From: Luis R. Rodriguez @ 2016-02-04  0:08 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Pere Monclus, GLin, Brenden Blanco, X86 ML, linux-kernel,
	Luis R. Rodriguez, David Vrabel, H. Peter Anvin, xen-devel,
	Boris Ostrovsky, Borislav Petkov, Roger Pau Monné

On Wed, Feb 3, 2016 at 3:59 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> If you still desperately want to avoid multiple entry points, then just
>> insist on using grub for the VM.  I expect that that is how most people
>> will end up using HVMLite VMs anyway.
>
> Are you saying Grub can do some of this heavy lifting that I am trying
> to avoid? If so that'd be great news.

Come to think of it, it would still leave the same semantic gap issue,
which is one of the reasons I'm pushing for a solution for that now.

 Luis

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]       ` <20160203235908.GT20964@wotan.suse.de>
  2016-02-04  0:08         ` Luis R. Rodriguez
@ 2016-02-04  0:51         ` Andrew Cooper
       [not found]         ` <56B2A09A.70404@citrix.com>
  2 siblings, 0 replies; 54+ messages in thread
From: Andrew Cooper @ 2016-02-04  0:51 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: pmonclus, GLin, bblanco, x86, linux-kernel, Luis R. Rodriguez,
	David Vrabel, hpa, xen-devel, Boris Ostrovsky, bp, roger.pau

On 03/02/2016 23:59, Luis R. Rodriguez wrote:
> On Wed, Feb 03, 2016 at 08:52:50PM +0000, Andrew Cooper wrote:
>> On 03/02/16 18:55, Luis R. Rodriguez wrote:
>>> We add new hypervisor type to close the semantic gap for hypervisor types, and
>>> much like subarch enable also a subarch_data to let you pass and use your
>>> hvmlite_start_info. This would not only help with the semantics but also help
>>> avoid yet-another-entry point and force us to provide a well define structure
>>> for considering code that should not run by pegging it as required or supported
>>> for different early x86 code stubs.
>> Was I unclear last time?  Xen *will not* be introducing Linux-specifics
>> into the HVMLite starting ABI.
> This does not have to be "Linux specifics" but rather a light way to enable
> a hypervisor to clue in *any* OS of its hypervisor type, guest type, and
> custom hypervisor data that can be used to populate needed OS specifics
> about the guest. Perhaps Xen's own loader mechanism could be extended just
> slightly to become *that* standard, its just right now it doesn't seem to
> enable for generalizing this in a very useful way for OSes. Its all
> custom stubs.

There are already standard x86 ways of doing this, via the hypervisor
cpuid bits.  Xen presents itself normally in this regard, as do all the
other hypervisors.

It is completely backwards to expect a hypervisor (or toolstack in our
case) to deliberately prod what it suspects might be a Linux binary in a
way which it things a Linux binary might like to be prodded.

>
>> Your perceived problem with multiple entry points is not a problem with
>> multiple entry points; It is a problem with multiple different paths
>> performing the same initialisation.
> Its actually more of an issue with the lack of strong general semantics
> available for different hypervisors and guest types and requirements for x86's
> init path. What you end up with as collateral is multiple entry points, and
> these can be sloppy and as you note can perform the same initialisation.
> Another issue is the inability to proactively ensure new x86 init code
> addresses different x86 requirements (cr4 shadow regression and Kasan still
> being broken on Xen are two examples) and it just so happens that the lack of
> semantics for the different guest types required to be evaluated is one issue
> for x86.
>
> We can do better.

Even with a perfect startup() routine which caters for all runtime
usecases, you cannot avoid having multiple entry stubs to cater for the
different ways the binary might be started.

Unless you are volunteering to write a single stub which can first
evaluate whether it is in 16/32/64bit mode, then create a safe stack to
use, then evaluate how it was started (multiboot, legacy BIOS, EFI,
etc.) and turn all this information into a zeropage.

I don't know that would be possible, but the point is moot as it
definitely wouldn't be maintainable if it were possible.

>
>> The Linux entry for PV guests is indeed completely horrible.  I am not
>> trying to defend it in the slightest.
>>
>> However, the HVMLite entry which is a very short stub that sets up a
>> zeropage and hands off to the native start routine is fine.
> Its alright, and a huge stepping stone towards good architecture. We
> however can do better.

Then we are generally in agreement.  However, at the risk of sounding
like a grump old sod,  take this win first and then work on the next
stepping stone.

Review comments identifying "I am working on implementing a new $X which
will make this area better/easier/more shiny in the future" are fine. 
Review comments complaining that "you haven't used this shiny new $X
which doesn't exist yet" are a waste of time; time which you would be
better spent implementing said $X.

Noone is disagreeing that improvements can be made, but don't try to do
them all at once, or nothing will get done.

>
>> There is still just routine performing native x86 startup.
>>
>> If you still desperately want to avoid multiple entry points, then just
>> insist on using grub for the VM.  I expect that that is how most people
>> will end up using HVMLite VMs anyway.
> Are you saying Grub can do some of this heavy lifting that I am trying
> to avoid? If so that'd be great news.

There are two different ways of running your VM, depending on your usecase.

The more traditional approach of a full OS will want to load a kernel
out of the guests filesystem, according to the guests bootloader
configuration.  At this point it doesn't matter for Linux as it will be
booted by some already-existing bootloader which already knows how to do
the job.

The more container/unikernel oriented approach is to boot an image
directly from dom0, skipping the middle layers of firmware and
filesystems.  For this to work, Linux needs to be able to be started via
the hypervisor ABI of choice, which in this case is something which
looks very much like multiboot.

~Andrew

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 08/11] xen/hvmlite: Extend APIC operations for HVMlite guests
  2016-02-01 15:38 ` [PATCH v2 08/11] xen/hvmlite: Extend APIC operations for HVMlite guests Boris Ostrovsky
@ 2016-02-04 10:04   ` David Vrabel
       [not found]   ` <56B32220.4040505@citrix.com>
  1 sibling, 0 replies; 54+ messages in thread
From: David Vrabel @ 2016-02-04 10:04 UTC (permalink / raw)
  To: Boris Ostrovsky, david.vrabel, konrad.wilk
  Cc: xen-devel, mcgrof, linux-kernel, roger.pau

On 01/02/16 15:38, Boris Ostrovsky wrote:
> HVMlite guests need to be viewed as having APIC, otherwise smpboot code,
> for example, will complain.

I think we should consider always giving HVMlite guests an emulated
APIC.  I think this eliminates one of the biggest differences between
HVMlite and native/KVM guests and will reduce the risk of future
breakage in this area.

David

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 07/11] xen/hvmlite: Initialize context for secondary VCPUs
       [not found]     ` <56B0E046.6050900@oracle.com>
@ 2016-02-04 10:07       ` David Vrabel
  2016-02-04 12:58       ` Doug Goldstein
       [not found]       ` <56B34B01.50000@cardoe.com>
  2 siblings, 0 replies; 54+ messages in thread
From: David Vrabel @ 2016-02-04 10:07 UTC (permalink / raw)
  To: Boris Ostrovsky, David Vrabel, konrad.wilk
  Cc: xen-devel, mcgrof, linux-kernel, roger.pau

On 02/02/16 16:58, Boris Ostrovsky wrote:
> On 02/02/2016 11:21 AM, David Vrabel wrote:
>> This needs some more description in the commit message.
>>
>>> --- a/arch/x86/xen/smp.c
>>> +++ b/arch/x86/xen/smp.c
>> [...]
>>> +        hctxt->cpu_regs.x86_32.cs_base = 0;
>>> +        hctxt->cpu_regs.x86_32.cs_limit = ~0u;
>>> +        hctxt->cpu_regs.x86_32.cs_ar = 0xc9b;
>>> +        hctxt->cpu_regs.x86_32.ds_base = 0;
>>> +        hctxt->cpu_regs.x86_32.ds_limit = ~0u;
>>> +        hctxt->cpu_regs.x86_32.ds_ar = 0xc93;
>>> +        hctxt->cpu_regs.x86_32.es_base = 0;
>>> +        hctxt->cpu_regs.x86_32.es_limit = ~0u;
>>> +        hctxt->cpu_regs.x86_32.es_ar = 0xc93;
>>> +        hctxt->cpu_regs.x86_32.ss_base = 0;
>>> +        hctxt->cpu_regs.x86_32.ss_limit = ~0u;
>>> +        hctxt->cpu_regs.x86_32.ss_ar = 0xc93;
>>> +        hctxt->cpu_regs.x86_32.tr_base = 0;
>>> +        hctxt->cpu_regs.x86_32.tr_limit = 0xff;
>>> +        hctxt->cpu_regs.x86_32.tr_ar = 0x8b;
>> Lots of hard-coded values here.  Should this be #defined somewhere?
> 
> We also don't need to set bases to zero since hctxt is kzalloc'd. I'll
> remove that and add a comment.
> 
> As for macros --- I couldn't find the bits defined symbolically anywhere
> and since this is the only place this is used the macros would be local
> here.

Ok.

David

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 10/11] xen/hvmlite: Boot secondary CPUs
  2016-02-01 15:38 ` [PATCH v2 10/11] xen/hvmlite: Boot secondary CPUs Boris Ostrovsky
@ 2016-02-04 10:38   ` David Vrabel
       [not found]   ` <56B32A29.6050406@citrix.com>
  1 sibling, 0 replies; 54+ messages in thread
From: David Vrabel @ 2016-02-04 10:38 UTC (permalink / raw)
  To: Boris Ostrovsky, david.vrabel, konrad.wilk
  Cc: xen-devel, mcgrof, linux-kernel, roger.pau

On 01/02/16 15:38, Boris Ostrovsky wrote:
> HVMlite secondary VCPUs use baremetal bringup path (i.e. native_*
> smp_ops) but need to do some preparation in PV code.

If we always provided an emulated APIC could we use the native SMP
bring-up instead?

David

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 08/11] xen/hvmlite: Extend APIC operations for HVMlite guests
       [not found]   ` <56B32220.4040505@citrix.com>
@ 2016-02-04 12:14     ` Roger Pau Monné
       [not found]     ` <56B340B4.3050406@citrix.com>
  1 sibling, 0 replies; 54+ messages in thread
From: Roger Pau Monné @ 2016-02-04 12:14 UTC (permalink / raw)
  To: David Vrabel, Boris Ostrovsky, konrad.wilk
  Cc: xen-devel, mcgrof, linux-kernel

El 4/2/16 a les 11:04, David Vrabel ha escrit:
> On 01/02/16 15:38, Boris Ostrovsky wrote:
>> HVMlite guests need to be viewed as having APIC, otherwise smpboot code,
>> for example, will complain.
> 
> I think we should consider always giving HVMlite guests an emulated
> APIC.  I think this eliminates one of the biggest differences between
> HVMlite and native/KVM guests and will reduce the risk of future
> breakage in this area.

Right, I'm not opposed to that, but I think we should keep the hypercall
interface in order to bring up vCPUs. IMHO it's useful for unikernels
for example (do those support SMP?), and in general allows for
easier/faster CPU-bringup as compared to bare metal.

Then if we indeed decide to provide and emulated lapic, should we also
at least provide the ACPI MADT table in order to enumerate them?

Roger.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 07/11] xen/hvmlite: Initialize context for secondary VCPUs
       [not found]     ` <56B0E046.6050900@oracle.com>
  2016-02-04 10:07       ` David Vrabel
@ 2016-02-04 12:58       ` Doug Goldstein
       [not found]       ` <56B34B01.50000@cardoe.com>
  2 siblings, 0 replies; 54+ messages in thread
From: Doug Goldstein @ 2016-02-04 12:58 UTC (permalink / raw)
  To: Boris Ostrovsky, David Vrabel, konrad.wilk
  Cc: xen-devel, mcgrof, linux-kernel, roger.pau


[-- Attachment #1.1: Type: text/plain, Size: 1533 bytes --]

On 2/2/16 10:58 AM, Boris Ostrovsky wrote:
> On 02/02/2016 11:21 AM, David Vrabel wrote:
>> This needs some more description in the commit message.
>>
>>> --- a/arch/x86/xen/smp.c
>>> +++ b/arch/x86/xen/smp.c
>> [...]
>>> +        hctxt->cpu_regs.x86_32.cs_base = 0;
>>> +        hctxt->cpu_regs.x86_32.cs_limit = ~0u;
>>> +        hctxt->cpu_regs.x86_32.cs_ar = 0xc9b;
>>> +        hctxt->cpu_regs.x86_32.ds_base = 0;
>>> +        hctxt->cpu_regs.x86_32.ds_limit = ~0u;
>>> +        hctxt->cpu_regs.x86_32.ds_ar = 0xc93;
>>> +        hctxt->cpu_regs.x86_32.es_base = 0;
>>> +        hctxt->cpu_regs.x86_32.es_limit = ~0u;
>>> +        hctxt->cpu_regs.x86_32.es_ar = 0xc93;
>>> +        hctxt->cpu_regs.x86_32.ss_base = 0;
>>> +        hctxt->cpu_regs.x86_32.ss_limit = ~0u;
>>> +        hctxt->cpu_regs.x86_32.ss_ar = 0xc93;
>>> +        hctxt->cpu_regs.x86_32.tr_base = 0;
>>> +        hctxt->cpu_regs.x86_32.tr_limit = 0xff;
>>> +        hctxt->cpu_regs.x86_32.tr_ar = 0x8b;
>> Lots of hard-coded values here.  Should this be #defined somewhere?
> 
> We also don't need to set bases to zero since hctxt is kzalloc'd. I'll
> remove that and add a comment.
> 
> As for macros --- I couldn't find the bits defined symbolically anywhere
> and since this is the only place this is used the macros would be local
> here.
> 
> -boris
> 

It could be useful to have them defined locally if only to give them
some more meaning by having a name rather than 0x8b. Just a thought.

-- 
Doug Goldstein


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 959 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 10/11] xen/hvmlite: Boot secondary CPUs
       [not found]   ` <56B32A29.6050406@citrix.com>
@ 2016-02-04 13:51     ` Boris Ostrovsky
  0 siblings, 0 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-04 13:51 UTC (permalink / raw)
  To: David Vrabel, konrad.wilk; +Cc: xen-devel, mcgrof, linux-kernel, roger.pau

On 02/04/2016 05:38 AM, David Vrabel wrote:
> On 01/02/16 15:38, Boris Ostrovsky wrote:
>> HVMlite secondary VCPUs use baremetal bringup path (i.e. native_*
>> smp_ops) but need to do some preparation in PV code.
> If we always provided an emulated APIC could we use the native SMP
> bring-up instead?

We do use native SMP bringup now (wrapped in 
xen_hvm_cpu_up/xen_cpu_die/xen_play_dead)), I added a couple of PV APIC 
ops to help with that, most notably xen_wakeup_secondary_cpu().

-boris

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 08/11] xen/hvmlite: Extend APIC operations for HVMlite guests
       [not found]     ` <56B340B4.3050406@citrix.com>
@ 2016-02-04 14:01       ` Boris Ostrovsky
  0 siblings, 0 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-04 14:01 UTC (permalink / raw)
  To: Roger Pau Monné, David Vrabel, konrad.wilk
  Cc: xen-devel, mcgrof, linux-kernel

On 02/04/2016 07:14 AM, Roger Pau Monné wrote:
> El 4/2/16 a les 11:04, David Vrabel ha escrit:
>> On 01/02/16 15:38, Boris Ostrovsky wrote:
>>> HVMlite guests need to be viewed as having APIC, otherwise smpboot code,
>>> for example, will complain.
>> I think we should consider always giving HVMlite guests an emulated
>> APIC.  I think this eliminates one of the biggest differences between
>> HVMlite and native/KVM guests and will reduce the risk of future
>> breakage in this area.
> Right, I'm not opposed to that, but I think we should keep the hypercall
> interface in order to bring up vCPUs. IMHO it's useful for unikernels
> for example (do those support SMP?), and in general allows for
> easier/faster CPU-bringup as compared to bare metal.

As I mentioned in another reply, we can manage without emulated apic by 
slightly extending PV APIC. However, it is rather fragile and I think 
not having to do this would be a great improvement from code reliability 
POV. And with Intel's vAPIC (and AMD's equivalent eventually, I suppose) 
we should get performance improvement as well.

What was the reason for not providing it? ACPI?

>
> Then if we indeed decide to provide and emulated lapic, should we also
> at least provide the ACPI MADT table in order to enumerate them?

This brings another question that I was going to ask. In the NVDIMM 
thread there is a discussion about where to implement ACPI tables and I 
think people are leaning toward doing in it qemu. With HVMlite we can 
only (??) implement MADT (and whatever we add later) in the toolstack 
and so we will have 3 places (qemu, hvmloader, toolstack) that build 
ACPI tables.

Can we have all this done in one place? Perhaps keep this code as a library?

-boris

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 07/11] xen/hvmlite: Initialize context for secondary VCPUs
       [not found]       ` <56B34B01.50000@cardoe.com>
@ 2016-02-04 19:08         ` Boris Ostrovsky
  0 siblings, 0 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-04 19:08 UTC (permalink / raw)
  To: Doug Goldstein, David Vrabel, konrad.wilk
  Cc: xen-devel, mcgrof, linux-kernel, roger.pau

On 02/04/2016 07:58 AM, Doug Goldstein wrote:
> On 2/2/16 10:58 AM, Boris Ostrovsky wrote:
>> On 02/02/2016 11:21 AM, David Vrabel wrote:
>>> This needs some more description in the commit message.
>>>
>>>> --- a/arch/x86/xen/smp.c
>>>> +++ b/arch/x86/xen/smp.c
>>> [...]
>>>> +        hctxt->cpu_regs.x86_32.cs_base = 0;
>>>> +        hctxt->cpu_regs.x86_32.cs_limit = ~0u;
>>>> +        hctxt->cpu_regs.x86_32.cs_ar = 0xc9b;
>>>> +        hctxt->cpu_regs.x86_32.ds_base = 0;
>>>> +        hctxt->cpu_regs.x86_32.ds_limit = ~0u;
>>>> +        hctxt->cpu_regs.x86_32.ds_ar = 0xc93;
>>>> +        hctxt->cpu_regs.x86_32.es_base = 0;
>>>> +        hctxt->cpu_regs.x86_32.es_limit = ~0u;
>>>> +        hctxt->cpu_regs.x86_32.es_ar = 0xc93;
>>>> +        hctxt->cpu_regs.x86_32.ss_base = 0;
>>>> +        hctxt->cpu_regs.x86_32.ss_limit = ~0u;
>>>> +        hctxt->cpu_regs.x86_32.ss_ar = 0xc93;
>>>> +        hctxt->cpu_regs.x86_32.tr_base = 0;
>>>> +        hctxt->cpu_regs.x86_32.tr_limit = 0xff;
>>>> +        hctxt->cpu_regs.x86_32.tr_ar = 0x8b;
>>> Lots of hard-coded values here.  Should this be #defined somewhere?
>> We also don't need to set bases to zero since hctxt is kzalloc'd. I'll
>> remove that and add a comment.
>>
>> As for macros --- I couldn't find the bits defined symbolically anywhere
>> and since this is the only place this is used the macros would be local
>> here.
>>
>> -boris
>>
> It could be useful to have them defined locally if only to give them
> some more meaning by having a name rather than 0x8b. Just a thought.

Yes, I'll do that (or at least have a comment explaining the bits).

Looks like this we should wait with this series though until we figure 
out APIC emulation status.

-boris

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]       ` <20160203234026.GS20964@wotan.suse.de>
@ 2016-02-04 19:54         ` Boris Ostrovsky
       [not found]         ` <56B3AC67.7080704@oracle.com>
  1 sibling, 0 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-04 19:54 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: pmonclus, GLin, bblanco, x86, linux-kernel, Luis R. Rodriguez,
	David Vrabel, hpa, xen-devel, bp, roger.pau

On 02/03/2016 06:40 PM, Luis R. Rodriguez wrote:
> On Wed, Feb 03, 2016 at 03:11:56PM -0500, Boris Ostrovsky wrote:
>> On 02/03/2016 01:55 PM, Luis R. Rodriguez wrote:
>>> I saw no considerations for the recommendations I had made last on your v1:
>>>
>>> https://lkml.kernel.org/r/CAB=NE6XPA0YzbnM8=rspkKai6d3GkXXO00Gr0VZUYoyzNy6thw@mail.gmail.com
>>>
>>> Of importance:
>>>
>>> 1) Using pv_info.paravirt_enabled = 1 is wrong unless you mean to say this
>>>     is for legacy x86:
>>>
>>> Your patch #3 keeps on setting pv_info.paravirt_enabled = 1 and as discussed
>>> this is wrong. It will be renamed to x86_legacy_free() to align with what folks
>>> are pushing for a BIOS flag to annotate if a system requires legacy x86 stuff.
>>> This also means re-thinking all use cases and ensuring subarch is used then
>>> instead when the goal was to avoid Xen from entering that code. Today Xen does
>>> not use this but with my work it does and it helps clean and brush up a lot of
>>> these checks with future prospects to even help unify entry points.
>> As I said earlier, I am not sure I understand what subarch buys us
>> for HVMlite guests.
> I accepted subarch may not be the right thing, so proposed a hypervisor type.

I don't see much difference between having an HV-specific subarch and a 
hypervisor type.

> What it buys you is a strong semantics association between code designed
> for a purpose.
>
>> As for using paravirt_enabled -- this is really only used to
>> differentiate HVM from HVMlite and I think (although I'd need to
>> check) is only needed by Xen-specific code in a couple of places.
> That sounds like a Xen specific use case as such an interface that is
> pointed out as going to renamed to reflect its actual use case should not
> be abused for that purpose.
>
>> So if/when it is removed we will switch to something else. Since your work is
>> WIP I decided to keep using it until it's clear what other options may be
>> available.
> And your work is not WIP? I'll be splitting my patches up and the rename
> will be atomic, it likely can go in first than yours, so not sure why you
> are simply brushing this off.

I didn't mean to imply anything by saying that your patches are a WIP. 
It's just that I can only write and test my patches against existing 
code, not the future one.

I am sorry if you felt I was trying to say something else, it certainly 
was not my intent.

>
>>> 2) We should avoid more hypervisor type hacks, and just consider a new
>>>     hypervisor type to close the gap:
>>>
>>> Using x86_legacy_free() and friends in a unified way for all systems means it
>>> should only be used after init_hypervisor_platform() which is called during
>>> setup_arch().  This means we have a semantic gap for checks on "are we on
>>> hypervisor type and which one?".
>> In this particular case we don't need any information about
>> hypervisor until init_hypervisor_platform().
> I pointed out in your v1 patchset how microcode loading was not blocked, you
> then asked how KVM does it, and that was explained as well, and that they
> don't enable it as well. You need a solution for this.

Not really. Xen will ignore writes to microcode-specific MSRs, just like 
KVM.

This is exact same behavior we have now with regular HVM guests.


> As-is the x86 boot protocol would not allow an easy way for this, I'm 
> suggesting we consider extending the boot protocol to add a hypervisor 
> type and data pointer much as with subarch and subarch_data for the

Who will set hypervisor type and where? It won't be Xen as Andrew 
mentioned in another email.

> particular purpose of both enabling entry into the same startup_32()
> but also a clean way for modifications of stubs both at the beginning
> and at the end of startup_32().
>
> Pseudo code:
>
> startup_32()                         startup_64()
>         |                                  |
>         |                                  |
>         V                                  V
> pre_hypervisor_stub_32()	pre_hypervisor_stub_64()
>         |                                  |
>         |                                  |
>         V                                  V
>   [existing startup_32()]       [existing startup_64()]
>         |                                  |
>         |                                  |
>         V                                  V
> post_hypervisor_stub_32()	post_hypervisor_stub_64()
>
> The pre_hypervisor_stub_32() would have much of the code in
> hvmlite_start_xen() but for 32-bit, pre_hypervisor_stub_64()
> would have the 64-bits.


Sure. When the protocol is agreed upon and this code is written we will 
just move hvmlite_start_xen() to pre_hypervisor_stub_32().



> +int xen_hvmlite __attribute__((section(".data"))) = 0;
> +struct hvm_start_info hvmlite_start_info __attribute__((section(".data")));
> +uint hvmlite_start_info_sz = sizeof(hvmlite_start_info);
> +struct boot_params xen_hvmlite_boot_params __attribute__((section(".data")));
> +#endif
> +
>>> The section annotations seems very special use case but likely worth documenting
>>> and defining a new macro for in include/linux/compiler.h. This would make it
>>> easier to change should we want to change the section used here later and
>>> enable others to easily look for the reason for these annotations in a
>>> single place.
>> I wonder whether __initdata would be a good attribute. We only need
>> this early in the boot.
> I could not find other users of .data other than some specific driver.
> Using anything with *init* alludes you can free the data later but if we
> want to keep it I suggest a different prefix, up to you.

That's why I said that we only need this info early in the boot.

-boris

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]         ` <56B3AC67.7080704@oracle.com>
@ 2016-02-04 20:57           ` Luis R. Rodriguez
       [not found]           ` <20160204205721.GJ12481@wotan.suse.de>
  1 sibling, 0 replies; 54+ messages in thread
From: Luis R. Rodriguez @ 2016-02-04 20:57 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: pmonclus, GLin, bblanco, x86, linux-kernel, Luis R. Rodriguez,
	David Vrabel, hpa, xen-devel, bp, roger.pau

On Thu, Feb 04, 2016 at 02:54:15PM -0500, Boris Ostrovsky wrote:
> On 02/03/2016 06:40 PM, Luis R. Rodriguez wrote:
> >On Wed, Feb 03, 2016 at 03:11:56PM -0500, Boris Ostrovsky wrote:
> >>On 02/03/2016 01:55 PM, Luis R. Rodriguez wrote:
> >>>I saw no considerations for the recommendations I had made last on your v1:
> >>>
> >>>https://lkml.kernel.org/r/CAB=NE6XPA0YzbnM8=rspkKai6d3GkXXO00Gr0VZUYoyzNy6thw@mail.gmail.com
> >>>
> >>>Of importance:
> >>>
> >>>1) Using pv_info.paravirt_enabled = 1 is wrong unless you mean to say this
> >>>    is for legacy x86:
> >>>
> >>>Your patch #3 keeps on setting pv_info.paravirt_enabled = 1 and as discussed
> >>>this is wrong. It will be renamed to x86_legacy_free() to align with what folks
> >>>are pushing for a BIOS flag to annotate if a system requires legacy x86 stuff.
> >>>This also means re-thinking all use cases and ensuring subarch is used then
> >>>instead when the goal was to avoid Xen from entering that code. Today Xen does
> >>>not use this but with my work it does and it helps clean and brush up a lot of
> >>>these checks with future prospects to even help unify entry points.
> >>As I said earlier, I am not sure I understand what subarch buys us
> >>for HVMlite guests.
> >I accepted subarch may not be the right thing, so proposed a hypervisor type.
> 
> I don't see much difference between having an HV-specific subarch
> and a hypervisor type.

Ah, well here lies the issue. As per hpa subarch was not designed for defining
a hypervisor, but rather at least subarch PC (0) [should be used if the
hardware is] "enumerable using standard PC mechanisms (PCI, ACPI) and doesn't
need a special boot flow". Does that follow the definition of HVMlite?

I was pointing out to hpa how paravirt_enabled() has limitations in that it is
set late and as such only logically be available for all users after
setup_arch(), so I figured we could repurpose subarch for a hypervisor type.
He noted:

  "If you have a genuine need for a "hypervisor type" then that is a
   separate thing and should be treated separately from subarch.  However,
   you need to consider that some hypervisors can emulate other hypervisors
   and you may have more than one hypervisor API available."

> >What it buys you is a strong semantics association between code designed
> >for a purpose.
> >
> >>As for using paravirt_enabled -- this is really only used to
> >>differentiate HVM from HVMlite and I think (although I'd need to
> >>check) is only needed by Xen-specific code in a couple of places.
> >That sounds like a Xen specific use case as such an interface that is
> >pointed out as going to renamed to reflect its actual use case should not
> >be abused for that purpose.
> >
> >>So if/when it is removed we will switch to something else. Since your work is
> >>WIP I decided to keep using it until it's clear what other options may be
> >>available.
> >And your work is not WIP? I'll be splitting my patches up and the rename
> >will be atomic, it likely can go in first than yours, so not sure why you
> >are simply brushing this off.
> 
> I didn't mean to imply anything by saying that your patches are a
> WIP. It's just that I can only write and test my patches against
> existing code, not the future one.
> 
> I am sorry if you felt I was trying to say something else, it
> certainly was not my intent.

I don't really care about that, my point was that we both are working on
similar areas right now and both efforts are helping us clean up the init
path and give us better semantics, we should take both patch series into
consideration as they *both* are being reviewed now. The definition and use
of subarch at least of importance here for HVMLite in consideration for
future cleanup.

> >>>2) We should avoid more hypervisor type hacks, and just consider a new
> >>>    hypervisor type to close the gap:
> >>>
> >>>Using x86_legacy_free() and friends in a unified way for all systems means it
> >>>should only be used after init_hypervisor_platform() which is called during
> >>>setup_arch().  This means we have a semantic gap for checks on "are we on
> >>>hypervisor type and which one?".
> >>In this particular case we don't need any information about
> >>hypervisor until init_hypervisor_platform().
> >I pointed out in your v1 patchset how microcode loading was not blocked, you
> >then asked how KVM does it, and that was explained as well, and that they
> >don't enable it as well. You need a solution for this.
> 
> Not really. Xen will ignore writes to microcode-specific MSRs, just
> like KVM.
>
> This is exact same behavior we have now with regular HVM guests.

OK great. That still means the code will run, and if we can avoid that
why not. I am fine with annotating this as future work to help. Let me
then ask as well, how about the rest of the code during and after
startup_32() and startup_64() -- are we sure that's all safe ?

> >As-is the x86 boot protocol would not allow an easy way for this,
> >I'm suggesting we consider extending the boot protocol to add a
> >hypervisor type and data pointer much as with subarch and
> >subarch_data for the
> 
> Who will set hypervisor type and where? It won't be Xen as Andrew
> mentioned in another email.

Andrew seems to think I'm after some senseless prodding, there are
good reasons to consider setting at least a type and custom data
pointer, and in fact I think there are gains for this not only for
Linux but other OSes; so I'll keep working on my arguments there.

> >particular purpose of both enabling entry into the same startup_32()
> >but also a clean way for modifications of stubs both at the beginning
> >and at the end of startup_32().
> >
> >Pseudo code:
> >
> >startup_32()                         startup_64()
> >        |                                  |
> >        |                                  |
> >        V                                  V
> >pre_hypervisor_stub_32()	pre_hypervisor_stub_64()
> >        |                                  |
> >        |                                  |
> >        V                                  V
> >  [existing startup_32()]       [existing startup_64()]
> >        |                                  |
> >        |                                  |
> >        V                                  V
> >post_hypervisor_stub_32()	post_hypervisor_stub_64()
> >
> >The pre_hypervisor_stub_32() would have much of the code in
> >hvmlite_start_xen() but for 32-bit, pre_hypervisor_stub_64()
> >would have the 64-bits.
> 
> 
> Sure. When the protocol is agreed upon and this code is written we
> will just move hvmlite_start_xen() to pre_hypervisor_stub_32().

OK fair enough.

> >+int xen_hvmlite __attribute__((section(".data"))) = 0;
> >+struct hvm_start_info hvmlite_start_info __attribute__((section(".data")));
> >+uint hvmlite_start_info_sz = sizeof(hvmlite_start_info);
> >+struct boot_params xen_hvmlite_boot_params __attribute__((section(".data")));
> >+#endif
> >+
> >>>The section annotations seems very special use case but likely worth documenting
> >>>and defining a new macro for in include/linux/compiler.h. This would make it
> >>>easier to change should we want to change the section used here later and
> >>>enable others to easily look for the reason for these annotations in a
> >>>single place.
> >>I wonder whether __initdata would be a good attribute. We only need
> >>this early in the boot.
> >I could not find other users of .data other than some specific driver.
> >Using anything with *init* alludes you can free the data later but if we
> >want to keep it I suggest a different prefix, up to you.
> 
> That's why I said that we only need this info early in the boot.

Still -- better just document and add a shared macro for it.

  Luis

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]           ` <20160204205721.GJ12481@wotan.suse.de>
@ 2016-02-04 22:28             ` Boris Ostrovsky
  0 siblings, 0 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-02-04 22:28 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: pmonclus, GLin, bblanco, x86, linux-kernel, Luis R. Rodriguez,
	David Vrabel, hpa, xen-devel, bp, roger.pau

On 02/04/2016 03:57 PM, Luis R. Rodriguez wrote:
>
> Ah, well here lies the issue. As per hpa subarch was not designed for defining
> a hypervisor, but rather at least subarch PC (0) [should be used if the
> hardware is] "enumerable using standard PC mechanisms (PCI, ACPI) and doesn't
> need a special boot flow". Does that follow the definition of HVMlite?

Yes. HVMlite is going to use baremetal boot flow.

> OK great. That still means the code will run, and if we can avoid that
> why not. I am fine with annotating this as future work to help. Let me
> then ask as well, how about the rest of the code during and after
> startup_32() and startup_64() -- are we sure that's all safe ?

I can't be sure, all I can say is that so far I haven't seen any problems.

-boris

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]         ` <56B2A09A.70404@citrix.com>
@ 2016-02-04 23:10           ` Luis R. Rodriguez
       [not found]           ` <20160204231031.GM12481@wotan.suse.de>
  1 sibling, 0 replies; 54+ messages in thread
From: Luis R. Rodriguez @ 2016-02-04 23:10 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: pmonclus, bblanco, x86, linux-kernel, Luis R. Rodriguez, GLin,
	David Vrabel, hpa, xen-devel, Boris Ostrovsky, bp,
	Linus Torvalds, roger.pau

On Thu, Feb 04, 2016 at 12:51:38AM +0000, Andrew Cooper wrote:
> On 03/02/2016 23:59, Luis R. Rodriguez wrote:
> > On Wed, Feb 03, 2016 at 08:52:50PM +0000, Andrew Cooper wrote:
> >> On 03/02/16 18:55, Luis R. Rodriguez wrote:
> >>> We add new hypervisor type to close the semantic gap for hypervisor types, and
> >>> much like subarch enable also a subarch_data to let you pass and use your
> >>> hvmlite_start_info. This would not only help with the semantics but also help
> >>> avoid yet-another-entry point and force us to provide a well define structure
> >>> for considering code that should not run by pegging it as required or supported
> >>> for different early x86 code stubs.
> >> Was I unclear last time?  Xen *will not* be introducing Linux-specifics
> >> into the HVMLite starting ABI.
> > This does not have to be "Linux specifics" but rather a light way to enable
> > a hypervisor to clue in *any* OS of its hypervisor type, guest type, and
> > custom hypervisor data that can be used to populate needed OS specifics
> > about the guest. Perhaps Xen's own loader mechanism could be extended just
> > slightly to become *that* standard, its just right now it doesn't seem to
> > enable for generalizing this in a very useful way for OSes. Its all
> > custom stubs.
> 
> There are already standard x86 ways of doing this, via the hypervisor
> cpuid bits.  Xen presents itself normally in this regard, as do all the
> other hypervisors.

I don't think this is availably early in asm boot? Its why I think the
zero page is convenient. The boot loader should in theory know these
things, as well as if its in 32-bit, 64-bit, etc.

> It is completely backwards to expect a hypervisor (or toolstack in our
> case) to deliberately prod what it suspects might be a Linux binary in a
> way which it things a Linux binary might like to be prodded.

Perhaps prodding tons of info seems ludicrous, however prodding at least a
loader type and custom data pointer to interpret that so that then your stub
can interpret seems sensible for many reasons and I don't think prodding two
things is much to ask for, given the possible gains on clean architecture.
Its why I am suggesting perhaps this should just be standardized.

We need flexibility on both sides.

> >> Your perceived problem with multiple entry points is not a problem with
> >> multiple entry points; It is a problem with multiple different paths
> >> performing the same initialisation.
> > Its actually more of an issue with the lack of strong general semantics
> > available for different hypervisors and guest types and requirements for x86's
> > init path. What you end up with as collateral is multiple entry points, and
> > these can be sloppy and as you note can perform the same initialisation.
> > Another issue is the inability to proactively ensure new x86 init code
> > addresses different x86 requirements (cr4 shadow regression and Kasan still
> > being broken on Xen are two examples) and it just so happens that the lack of
> > semantics for the different guest types required to be evaluated is one issue
> > for x86.
> >
> > We can do better.
> 
> Even with a perfect startup() routine which caters for all runtime
> usecases, you cannot avoid having multiple entry stubs to cater for the
> different ways the binary might be started.
> 
> Unless you are volunteering to write a single stub which can first
> evaluate whether it is in 16/32/64bit mode, then create a safe stack to
> use, then evaluate how it was started (multiboot, legacy BIOS, EFI,
> etc.) and turn all this information into a zeropage.
> 
> I don't know that would be possible, but the point is moot as it
> definitely wouldn't be maintainable if it were possible.

I think some folks have hope at least some of it might be. I can't do this,
otherwise I would have done it already. Given my review of the commit logs on
different entry points, and code I do think its sensible to desire this to help
with semantics on startup and this should in turn help duplication, bugs, but I
obviously do not doubt its difficulty.

Its at least sensible in my mind to strive towards the best possible semantics
and code sharing from x86-64 bit onwards and if I can help with that I'll do
what I can.

> >> The Linux entry for PV guests is indeed completely horrible.  I am not
> >> trying to defend it in the slightest.
> >>
> >> However, the HVMLite entry which is a very short stub that sets up a
> >> zeropage and hands off to the native start routine is fine.
> > Its alright, and a huge stepping stone towards good architecture. We
> > however can do better.
> 
> Then we are generally in agreement.  However, at the risk of sounding
> like a grump old sod,  take this win first and then work on the next
> stepping stone.

I think this is fair.

> Review comments identifying "I am working on implementing a new $X which
> will make this area better/easier/more shiny in the future" are fine. 
> Review comments complaining that "you haven't used this shiny new $X
> which doesn't exist yet" are a waste of time; time which you would be
> better spent implementing said $X.
> 
> No one is disagreeing that improvements can be made, but don't try to do
> them all at once, or nothing will get done.

Its a fair point, the only contending issue here is the use of
paravirt_enabled() and how I'm changing this to paravirt_legacy(),
I think that's it. Other than we can coordinate on both fronts
to later help clean things up further.

> >> There is still just routine performing native x86 startup.
> >>
> >> If you still desperately want to avoid multiple entry points, then just
> >> insist on using grub for the VM.  I expect that that is how most people
> >> will end up using HVMLite VMs anyway.
> > Are you saying Grub can do some of this heavy lifting that I am trying
> > to avoid? If so that'd be great news.
> 
> There are two different ways of running your VM, depending on your usecase.
> 
> The more traditional approach of a full OS will want to load a kernel
> out of the guests filesystem, according to the guests bootloader
> configuration.  At this point it doesn't matter for Linux as it will be
> booted by some already-existing bootloader which already knows how to do
> the job.

The grub instance would be on the guest filesystem right?  So grub would know
it got kicked by Xen, and grub could then prod what we think is right? In
other words you pass on the custom xen struct to grub and then grub does
the zero page filling.

> The more container/unikernel oriented approach is to boot an image
> directly from dom0, skipping the middle layers of firmware and
> filesystems.  For this to work, Linux needs to be able to be started via
> the hypervisor ABI of choice, which in this case is something which
> looks very much like multiboot.

I see thanks. And the hypervisor ABI in no way would ever consider an option to
let OSes have 2 bits of info set, the hypervisor guest type and a pointer to
custom data structure for the guest type, in order to help with a cleaner
startup on OSes?

  Luis

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 03/11] xen/hvmlite: Initialize HVMlite kernel
  2016-02-01 15:38 ` [PATCH v2 03/11] xen/hvmlite: Initialize HVMlite kernel Boris Ostrovsky
@ 2016-02-17 20:08   ` Luis R. Rodriguez
  0 siblings, 0 replies; 54+ messages in thread
From: Luis R. Rodriguez @ 2016-02-17 20:08 UTC (permalink / raw)
  To: Boris Ostrovsky, Andy Lutomirski
  Cc: xen-devel, Roger Pau Monné, David Vrabel, linux-kernel

On Mon, Feb 1, 2016 at 7:38 AM, Boris Ostrovsky
<boris.ostrovsky@oracle.com> wrote:
> +               pv_info.paravirt_enabled = 1;

As its being discussed we want to remove paravirt_enabled so this
series should not be merged with this:

http://kernelnewbies.org/KernelProjects/remove-paravirt-enabled

 Luis

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]           ` <20160204231031.GM12481@wotan.suse.de>
@ 2016-04-05 22:02             ` Luis R. Rodriguez
  0 siblings, 0 replies; 54+ messages in thread
From: Luis R. Rodriguez @ 2016-04-05 22:02 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Pere Monclus, Matt Fleming, Brenden Blanco, X86 ML, linux-kernel,
	Andy Lutomirski, Luis R. Rodriguez, Gary Lin, David Vrabel,
	H. Peter Anvin, xen-devel, Boris Ostrovsky, Borislav Petkov,
	Linus Torvalds, Roger Pau Monné

On Thu, Feb 4, 2016 at 3:10 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Thu, Feb 04, 2016 at 12:51:38AM +0000, Andrew Cooper wrote:
>> On 03/02/2016 23:59, Luis R. Rodriguez wrote:
>> > On Wed, Feb 03, 2016 at 08:52:50PM +0000, Andrew Cooper wrote:
>> >> On 03/02/16 18:55, Luis R. Rodriguez wrote:
>> >>> We add new hypervisor type to close the semantic gap for hypervisor types, and
>> >>> much like subarch enable also a subarch_data to let you pass and use your
>> >>> hvmlite_start_info. This would not only help with the semantics but also help
>> >>> avoid yet-another-entry point and force us to provide a well define structure
>> >>> for considering code that should not run by pegging it as required or supported
>> >>> for different early x86 code stubs.
>> >> Was I unclear last time?  Xen *will not* be introducing Linux-specifics
>> >> into the HVMLite starting ABI.
>> > This does not have to be "Linux specifics" but rather a light way to enable
>> > a hypervisor to clue in *any* OS of its hypervisor type, guest type, and
>> > custom hypervisor data that can be used to populate needed OS specifics
>> > about the guest. Perhaps Xen's own loader mechanism could be extended just
>> > slightly to become *that* standard, its just right now it doesn't seem to
>> > enable for generalizing this in a very useful way for OSes. Its all
>> > custom stubs.
>>
>> There are already standard x86 ways of doing this, via the hypervisor
>> cpuid bits.  Xen presents itself normally in this regard, as do all the
>> other hypervisors.
>
> I don't think this is availably early in asm boot? Its why I think the
> zero page is convenient. The boot loader should in theory know these
> things, as well as if its in 32-bit, 64-bit, etc.
>
>> It is completely backwards to expect a hypervisor (or toolstack in our
>> case) to deliberately prod what it suspects might be a Linux binary in a
>> way which it things a Linux binary might like to be prodded.
>
> Perhaps prodding tons of info seems ludicrous, however prodding at least a
> loader type and custom data pointer to interpret that so that then your stub
> can interpret seems sensible for many reasons and I don't think prodding two
> things is much to ask for, given the possible gains on clean architecture.
> Its why I am suggesting perhaps this should just be standardized.
>
> We need flexibility on both sides.

And... it seems EFI boot already does this!

A few of us have been discussing now the EFI boot prospect as a
complete alternative to this and avoiding adding
yet-another-boot-entry (TM) to x86. That deserves a full discussion on
its own so will send notes on that next on a separate new thread.

>> >> Your perceived problem with multiple entry points is not a problem with
>> >> multiple entry points; It is a problem with multiple different paths
>> >> performing the same initialisation.
>> > Its actually more of an issue with the lack of strong general semantics
>> > available for different hypervisors and guest types and requirements for x86's
>> > init path. What you end up with as collateral is multiple entry points, and
>> > these can be sloppy and as you note can perform the same initialisation.
>> > Another issue is the inability to proactively ensure new x86 init code
>> > addresses different x86 requirements (cr4 shadow regression and Kasan still
>> > being broken on Xen are two examples) and it just so happens that the lack of
>> > semantics for the different guest types required to be evaluated is one issue
>> > for x86.
>> >
>> > We can do better.
>>
>> Even with a perfect startup() routine which caters for all runtime
>> usecases, you cannot avoid having multiple entry stubs to cater for the
>> different ways the binary might be started.
>>
>> Unless you are volunteering to write a single stub which can first
>> evaluate whether it is in 16/32/64bit mode, then create a safe stack to
>> use, then evaluate how it was started (multiboot, legacy BIOS, EFI,
>> etc.) and turn all this information into a zeropage.
>>
>> I don't know that would be possible, but the point is moot as it
>> definitely wouldn't be maintainable if it were possible.
>
> I think some folks have hope at least some of it might be. I can't do this,
> otherwise I would have done it already. Given my review of the commit logs on
> different entry points, and code I do think its sensible to desire this to help
> with semantics on startup and this should in turn help duplication, bugs, but I
> obviously do not doubt its difficulty.

It would seem streamlining off of an EFI boot entry could boot strap
this effort. To this end I'll be sending out a new thread in which we
review that a bit more.

> Its at least sensible in my mind to strive towards the best possible semantics
> and code sharing from x86-64 bit onwards and if I can help with that I'll do
> what I can.

Me and Andy Lutomirski have been recently fending off some ill
conceived virtualization semantics in the kernel, that work seems to
have proven that the lack of appropriate semantics has only incurred
hacks on the kernel which can be replaced by properly thought out
solutions -- which ultimately should help with other guest types, and
avoid further virtualization hackery in the kernel. If we need to
extend semantics for another guest type I'd like for us to identify
that now and address it.

  Luis

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
  2016-02-01 15:38 ` [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest Boris Ostrovsky
                     ` (3 preceding siblings ...)
       [not found]   ` <20160203185525.GV20964@wotan.suse.de>
@ 2016-04-24 20:23   ` Borislav Petkov
       [not found]   ` <20160424202314.GA3973@pd.tnic>
  5 siblings, 0 replies; 54+ messages in thread
From: Borislav Petkov @ 2016-04-24 20:23 UTC (permalink / raw)
  To: Boris Ostrovsky, H. Peter Anvin, Ingo Molnar, Thomas Gleixner
  Cc: mcgrof, linux-kernel, david.vrabel, xen-devel, roger.pau

On Mon, Feb 01, 2016 at 10:38:48AM -0500, Boris Ostrovsky wrote:
> Start HVMlite guest at XEN_ELFNOTE_PHYS32_ENTRY address. Setup hypercall
> page, initialize boot_params, enable early page tables.
> 
> Since this stub is executed before kernel entry point we cannot use
> variables in .bss which is cleared by kernel. We explicitly place
> variables that are initialized here into .data.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
>  arch/x86/xen/Makefile      |    1 +
>  arch/x86/xen/enlighten.c   |   86 +++++++++++++++++++++-
>  arch/x86/xen/xen-hvmlite.S |  175 ++++++++++++++++++++++++++++++++++++++++++++
>  include/xen/xen.h          |    6 ++
>  4 files changed, 267 insertions(+), 1 deletions(-)
>  create mode 100644 arch/x86/xen/xen-hvmlite.S
> 
> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
> index e47e527..1d913d7 100644
> --- a/arch/x86/xen/Makefile
> +++ b/arch/x86/xen/Makefile
> @@ -23,3 +23,4 @@ obj-$(CONFIG_XEN_DEBUG_FS)	+= debugfs.o
>  obj-$(CONFIG_XEN_DOM0)		+= vga.o
>  obj-$(CONFIG_SWIOTLB_XEN)	+= pci-swiotlb-xen.o
>  obj-$(CONFIG_XEN_EFI)		+= efi.o
> +obj-$(CONFIG_XEN_PVHVM) 	+= xen-hvmlite.o
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 5774800..5f05fa2 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -118,7 +118,8 @@ DEFINE_PER_CPU(struct vcpu_info *, xen_vcpu);
>   */
>  DEFINE_PER_CPU(struct vcpu_info, xen_vcpu_info);
>  
> -enum xen_domain_type xen_domain_type = XEN_NATIVE;
> +enum xen_domain_type xen_domain_type
> +	__attribute__((section(".data"))) = XEN_NATIVE;
>  EXPORT_SYMBOL_GPL(xen_domain_type);
>  
>  unsigned long *machine_to_phys_mapping = (void *)MACH2PHYS_VIRT_START;
> @@ -171,6 +172,17 @@ struct tls_descs {
>   */
>  static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
>  
> +#ifdef CONFIG_XEN_PVHVM
> +/*
> + * HVMlite variables. These need to live in data segment since they are
> + * initialized before startup_{32|64}, which clear .bss, are invoked.

So this jumps into startup_32/64 and I don't think we have talked about
it yet, have we? I'm not aware of any threads about it. Are we fine with
it, are we not?

I think we need to agree on API where xen guests should jump into
arch/x86/ and adhere to it. Otherwise, we will break xen again if we change
stuff in x86 and we do like to change stuff in x86 all the time.

Adding tip guys and leaving in the rest for reference.

...



> + */
> +int xen_hvmlite __attribute__((section(".data"))) = 0;
> +struct hvm_start_info hvmlite_start_info __attribute__((section(".data")));
> +uint hvmlite_start_info_sz = sizeof(hvmlite_start_info);
> +struct boot_params xen_hvmlite_boot_params __attribute__((section(".data")));
> +#endif
> +
>  static void clamp_max_cpus(void)
>  {
>  #ifdef CONFIG_SMP
> @@ -1731,6 +1743,78 @@ asmlinkage __visible void __init xen_start_kernel(void)
>  #endif
>  }
>  
> +#ifdef CONFIG_XEN_PVHVM
> +static void __init hvmlite_bootparams(void)
> +{
> +	struct xen_memory_map memmap;
> +	int i;
> +
> +	memset(&xen_hvmlite_boot_params, 0, sizeof(xen_hvmlite_boot_params));
> +
> +	memmap.nr_entries = ARRAY_SIZE(xen_hvmlite_boot_params.e820_map);
> +	set_xen_guest_handle(memmap.buffer, xen_hvmlite_boot_params.e820_map);
> +	if (HYPERVISOR_memory_op(XENMEM_memory_map, &memmap)) {
> +		xen_raw_console_write("XENMEM_memory_map failed\n");
> +		BUG();
> +	}
> +
> +	xen_hvmlite_boot_params.e820_map[memmap.nr_entries].addr =
> +		ISA_START_ADDRESS;
> +	xen_hvmlite_boot_params.e820_map[memmap.nr_entries].size =
> +		ISA_END_ADDRESS - ISA_START_ADDRESS;
> +	xen_hvmlite_boot_params.e820_map[memmap.nr_entries++].type =
> +		E820_RESERVED;
> +
> +	sanitize_e820_map(xen_hvmlite_boot_params.e820_map,
> +			  ARRAY_SIZE(xen_hvmlite_boot_params.e820_map),
> +			  &memmap.nr_entries);
> +
> +	xen_hvmlite_boot_params.e820_entries = memmap.nr_entries;
> +	for (i = 0; i < xen_hvmlite_boot_params.e820_entries; i++)
> +		e820_add_region(xen_hvmlite_boot_params.e820_map[i].addr,
> +				xen_hvmlite_boot_params.e820_map[i].size,
> +				xen_hvmlite_boot_params.e820_map[i].type);
> +
> +	xen_hvmlite_boot_params.hdr.cmd_line_ptr =
> +		hvmlite_start_info.cmdline_paddr;
> +
> +	/* The first module is always ramdisk */
> +	if (hvmlite_start_info.nr_modules) {
> +		struct hvm_modlist_entry *modaddr =
> +			__va(hvmlite_start_info.modlist_paddr);
> +		xen_hvmlite_boot_params.hdr.ramdisk_image = modaddr->paddr;
> +		xen_hvmlite_boot_params.hdr.ramdisk_size = modaddr->size;
> +	}
> +
> +	/*
> +	 * See Documentation/x86/boot.txt.
> +	 *
> +	 * Version 2.12 supports Xen entry point but we will use default x86/PC
> +	 * environment (i.e. hardware_subarch 0).
> +	 */
> +	xen_hvmlite_boot_params.hdr.version = 0x212;
> +	xen_hvmlite_boot_params.hdr.type_of_loader = 9; /* Xen loader */
> +}
> +
> +/*
> + * This routine (and those that it might call) should not use
> + * anything that lives in .bss since that segment will be cleared later
> + */
> +void __init xen_prepare_hvmlite(void)
> +{
> +	u32 eax, ecx, edx, msr;
> +	u64 pfn;
> +
> +	xen_hvmlite = 1;
> +
> +	cpuid(xen_cpuid_base() + 2, &eax, &msr, &ecx, &edx);
> +	pfn = __pa(hypercall_page);
> +	wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
> +
> +	hvmlite_bootparams();
> +}
> +#endif
> +
>  void __ref xen_hvm_init_shared_info(void)
>  {
>  	int cpu;
> diff --git a/arch/x86/xen/xen-hvmlite.S b/arch/x86/xen/xen-hvmlite.S
> new file mode 100644
> index 0000000..fc7c08c
> --- /dev/null
> +++ b/arch/x86/xen/xen-hvmlite.S
> @@ -0,0 +1,175 @@
> +/*
> + * Copyright C 2016, Oracle and/or its affiliates. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +	.code32
> +	.text
> +#define _pa(x)          ((x) - __START_KERNEL_map)
> +
> +#include <linux/elfnote.h>
> +#include <linux/init.h>
> +#include <linux/linkage.h>
> +#include <asm/segment.h>
> +#include <asm/asm.h>
> +#include <asm/boot.h>
> +#include <asm/processor-flags.h>
> +#include <asm/msr.h>
> +#include <xen/interface/elfnote.h>
> +
> +	__HEAD
> +	.code32
> +
> +/* Entry point for HVMlite guests */
> +ENTRY(hvmlite_start_xen)
> +	cli
> +	cld
> +
> +	mov $_pa(gdt), %eax
> +	lgdt (%eax)
> +
> +	movl $(__BOOT_DS),%eax
> +	movl %eax,%ds
> +	movl %eax,%es
> +	movl %eax,%ss
> +
> +	/* Stash hvm_start_info */
> +	mov $_pa(hvmlite_start_info), %edi
> +	mov %ebx, %esi
> +	mov $_pa(hvmlite_start_info_sz), %ecx
> +	mov (%ecx), %ecx
> +	rep
> +	movsb
> +
> +	movl $_pa(early_stack_end), %eax
> +	movl %eax, %esp
> +
> +	/* Enable PAE mode */
> +	movl %cr4, %eax
> +	orl $X86_CR4_PAE, %eax
> +	movl %eax, %cr4
> +
> +#ifdef CONFIG_X86_64
> +	/* Enable Long mode */
> +	movl $MSR_EFER, %ecx
> +	rdmsr
> +	btsl $_EFER_LME, %eax
> +	wrmsr
> +
> +	/* Enable pre-constructed page tables */
> +	mov $_pa(init_level4_pgt), %eax
> +	movl %eax, %cr3
> +	movl $(X86_CR0_PG | X86_CR0_PE), %eax
> +	movl %eax, %cr0
> +
> +	/* Jump to 64-bit mode. */
> +	pushl $__KERNEL_CS
> +	leal _pa(1f), %eax
> +	pushl %eax
> +	lret
> +
> +	/* 64-bit entry point */
> +	.code64
> +1:
> +	call xen_prepare_hvmlite
> +
> +	/* startup_64 expects boot_params in %rsi */
> +	mov $_pa(xen_hvmlite_boot_params), %rsi
> +	movq $_pa(startup_64), %rax
> +	jmp *%rax
> +
> +#else /* CONFIG_X86_64 */
> +
> +	/* Clear boot page tables */
> +	movl $_pa(early_pgtable), %edi
> +	xorl %eax, %eax
> +	movl $((PAGE_SIZE*5)/4), %ecx
> +	rep stosl
> +
> +	/* Level 3 */
> +	movl $_pa(early_pgtable), %edi
> +	leal (PAGE_SIZE +_PAGE_PRESENT)(%edi), %eax
> +	movl $4, %ecx
> +1:
> +	movl %eax, 0x00(%edi)
> +	addl $8, %edi
> +	decl %ecx
> +	jnz 1b
> +
> +	/* Level 2 (2M entries) */
> +	movl $(_pa(early_pgtable) + PAGE_SIZE), %edi
> +	movl $(_PAGE_PSE | _PAGE_RW | _PAGE_PRESENT), %eax
> +	movl $2048, %ecx
> +2:
> +	movl %eax, 0(%edi)
> +	addl $0x00200000, %eax
> +	addl $8, %edi
> +	decl %ecx
> +	jnz 2b
> +
> +	/* Enable the boot paging */
> +	movl $_pa(early_pgtable), %eax
> +	movl %eax, %cr3
> +	movl %cr0, %eax
> +	orl $(X86_CR0_PG | X86_CR0_PE), %eax
> +	movl %eax, %cr0
> +
> +	ljmp $__BOOT_CS,$3f
> +3:
> +	call xen_prepare_hvmlite
> +	mov $_pa(xen_hvmlite_boot_params), %esi
> +
> +	/* startup_32 doesn't expect paging and PAE to be on */
> +	ljmp $__BOOT_CS,$_pa(4f)
> +4:
> +	movl %cr0, %eax
> +	andl $~X86_CR0_PG, %eax
> +	movl %eax, %cr0
> +	movl %cr4, %eax
> +	andl $~X86_CR4_PAE, %eax
> +	movl %eax, %cr4
> +
> +	ljmp    $0x10, $_pa(startup_32)
> +#endif
> +
> +	.data
> +gdt:
> +	.word	gdt_end - gdt
> +	.long	_pa(gdt)
> +	.word	0
> +	.quad	0x0000000000000000 /* NULL descriptor */
> +#ifdef CONFIG_X86_64
> +	.quad	0x00af9a000000ffff /* __KERNEL_CS */
> +#else
> +	.quad	0x00cf9a000000ffff /* __KERNEL_CS */
> +#endif
> +	.quad	0x00cf92000000ffff /* __KERNEL_DS */
> +gdt_end:
> +
> +	.bss
> +	.balign 4
> +early_stack:
> +	.fill 16, 1, 0
> +early_stack_end:
> +
> +#ifdef CONFIG_X86_32
> +	.section ".pgtable","a",@nobits
> +	.balign 4096
> +early_pgtable:
> +	.fill 5*4096, 1, 0
> +#endif
> +
> +	ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY,
> +	             _ASM_PTR (hvmlite_start_xen - __START_KERNEL_map))
> diff --git a/include/xen/xen.h b/include/xen/xen.h
> index 0c0e3ef..6a0d3f3 100644
> --- a/include/xen/xen.h
> +++ b/include/xen/xen.h
> @@ -29,6 +29,12 @@ extern enum xen_domain_type xen_domain_type;
>  #define xen_initial_domain()	(0)
>  #endif	/* CONFIG_XEN_DOM0 */
>  
> +#ifdef CONFIG_XEN_PVHVM
> +extern int xen_hvmlite;
> +#else
> +#define xen_hvmlite		(0)
> +#endif
> +
>  #ifdef CONFIG_XEN_PVH
>  /* This functionality exists only for x86. The XEN_PVHVM support exists
>   * only in x86 world - hence on ARM it will be always disabled.
> -- 
> 1.7.1
> 
> 

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]   ` <20160424202314.GA3973@pd.tnic>
@ 2016-04-25 13:21     ` Boris Ostrovsky
       [not found]     ` <571E19D7.1080301@oracle.com>
  2016-04-25 17:24     ` David Vrabel
  2 siblings, 0 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-04-25 13:21 UTC (permalink / raw)
  To: Borislav Petkov, H. Peter Anvin, Ingo Molnar, Thomas Gleixner
  Cc: mcgrof, linux-kernel, david.vrabel, xen-devel, roger.pau

On 04/24/2016 04:23 PM, Borislav Petkov wrote:
> On Mon, Feb 01, 2016 at 10:38:48AM -0500, Boris Ostrovsky wrote:
>> Start HVMlite guest at XEN_ELFNOTE_PHYS32_ENTRY address. Setup hypercall
>> page, initialize boot_params, enable early page tables.
>>
>> Since this stub is executed before kernel entry point we cannot use
>> variables in .bss which is cleared by kernel. We explicitly place
>> variables that are initialized here into .data.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> ---
>>   arch/x86/xen/Makefile      |    1 +
>>   arch/x86/xen/enlighten.c   |   86 +++++++++++++++++++++-
>>   arch/x86/xen/xen-hvmlite.S |  175 ++++++++++++++++++++++++++++++++++++++++++++
>>   include/xen/xen.h          |    6 ++
>>   4 files changed, 267 insertions(+), 1 deletions(-)
>>   create mode 100644 arch/x86/xen/xen-hvmlite.S
>>
>> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
>> index e47e527..1d913d7 100644
>> --- a/arch/x86/xen/Makefile
>> +++ b/arch/x86/xen/Makefile
>> @@ -23,3 +23,4 @@ obj-$(CONFIG_XEN_DEBUG_FS)	+= debugfs.o
>>   obj-$(CONFIG_XEN_DOM0)		+= vga.o
>>   obj-$(CONFIG_SWIOTLB_XEN)	+= pci-swiotlb-xen.o
>>   obj-$(CONFIG_XEN_EFI)		+= efi.o
>> +obj-$(CONFIG_XEN_PVHVM) 	+= xen-hvmlite.o
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index 5774800..5f05fa2 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -118,7 +118,8 @@ DEFINE_PER_CPU(struct vcpu_info *, xen_vcpu);
>>    */
>>   DEFINE_PER_CPU(struct vcpu_info, xen_vcpu_info);
>>   
>> -enum xen_domain_type xen_domain_type = XEN_NATIVE;
>> +enum xen_domain_type xen_domain_type
>> +	__attribute__((section(".data"))) = XEN_NATIVE;
>>   EXPORT_SYMBOL_GPL(xen_domain_type);
>>   
>>   unsigned long *machine_to_phys_mapping = (void *)MACH2PHYS_VIRT_START;
>> @@ -171,6 +172,17 @@ struct tls_descs {
>>    */
>>   static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
>>   
>> +#ifdef CONFIG_XEN_PVHVM
>> +/*
>> + * HVMlite variables. These need to live in data segment since they are
>> + * initialized before startup_{32|64}, which clear .bss, are invoked.
> So this jumps into startup_32/64 and I don't think we have talked about
> it yet, have we? I'm not aware of any threads about it. Are we fine with
> it, are we not?
>
> I think we need to agree on API where xen guests should jump into
> arch/x86/ and adhere to it. Otherwise, we will break xen again if we change
> stuff in x86 and we do like to change stuff in x86 all the time.
>
> Adding tip guys and leaving in the rest for reference.

I was following  Documentation/x86/boot.txt (plus comments in code 
preceding those two routines) which I considered to be the API.

We are supposed to come to startup_32 with paging off and %esi pointing 
to boot_params. For 64-bit paging is on, %rsi points to zero-page. Plus 
certain requirements on segment registers and interrupt being disabled. 
(I just noticed that for 32-bit %ebp, %edi and %ebx are supposed to be 
zero, so I'll need to do that)

-boris

>
> ...
>
>
>
>> + */
>> +int xen_hvmlite __attribute__((section(".data"))) = 0;
>> +struct hvm_start_info hvmlite_start_info __attribute__((section(".data")));
>> +uint hvmlite_start_info_sz = sizeof(hvmlite_start_info);
>> +struct boot_params xen_hvmlite_boot_params __attribute__((section(".data")));
>> +#endif
>> +
>>   static void clamp_max_cpus(void)
>>   {
>>   #ifdef CONFIG_SMP
>> @@ -1731,6 +1743,78 @@ asmlinkage __visible void __init xen_start_kernel(void)
>>   #endif
>>   }
>>   
>> +#ifdef CONFIG_XEN_PVHVM
>> +static void __init hvmlite_bootparams(void)
>> +{
>> +	struct xen_memory_map memmap;
>> +	int i;
>> +
>> +	memset(&xen_hvmlite_boot_params, 0, sizeof(xen_hvmlite_boot_params));
>> +
>> +	memmap.nr_entries = ARRAY_SIZE(xen_hvmlite_boot_params.e820_map);
>> +	set_xen_guest_handle(memmap.buffer, xen_hvmlite_boot_params.e820_map);
>> +	if (HYPERVISOR_memory_op(XENMEM_memory_map, &memmap)) {
>> +		xen_raw_console_write("XENMEM_memory_map failed\n");
>> +		BUG();
>> +	}
>> +
>> +	xen_hvmlite_boot_params.e820_map[memmap.nr_entries].addr =
>> +		ISA_START_ADDRESS;
>> +	xen_hvmlite_boot_params.e820_map[memmap.nr_entries].size =
>> +		ISA_END_ADDRESS - ISA_START_ADDRESS;
>> +	xen_hvmlite_boot_params.e820_map[memmap.nr_entries++].type =
>> +		E820_RESERVED;
>> +
>> +	sanitize_e820_map(xen_hvmlite_boot_params.e820_map,
>> +			  ARRAY_SIZE(xen_hvmlite_boot_params.e820_map),
>> +			  &memmap.nr_entries);
>> +
>> +	xen_hvmlite_boot_params.e820_entries = memmap.nr_entries;
>> +	for (i = 0; i < xen_hvmlite_boot_params.e820_entries; i++)
>> +		e820_add_region(xen_hvmlite_boot_params.e820_map[i].addr,
>> +				xen_hvmlite_boot_params.e820_map[i].size,
>> +				xen_hvmlite_boot_params.e820_map[i].type);
>> +
>> +	xen_hvmlite_boot_params.hdr.cmd_line_ptr =
>> +		hvmlite_start_info.cmdline_paddr;
>> +
>> +	/* The first module is always ramdisk */
>> +	if (hvmlite_start_info.nr_modules) {
>> +		struct hvm_modlist_entry *modaddr =
>> +			__va(hvmlite_start_info.modlist_paddr);
>> +		xen_hvmlite_boot_params.hdr.ramdisk_image = modaddr->paddr;
>> +		xen_hvmlite_boot_params.hdr.ramdisk_size = modaddr->size;
>> +	}
>> +
>> +	/*
>> +	 * See Documentation/x86/boot.txt.
>> +	 *
>> +	 * Version 2.12 supports Xen entry point but we will use default x86/PC
>> +	 * environment (i.e. hardware_subarch 0).
>> +	 */
>> +	xen_hvmlite_boot_params.hdr.version = 0x212;
>> +	xen_hvmlite_boot_params.hdr.type_of_loader = 9; /* Xen loader */
>> +}
>> +
>> +/*
>> + * This routine (and those that it might call) should not use
>> + * anything that lives in .bss since that segment will be cleared later
>> + */
>> +void __init xen_prepare_hvmlite(void)
>> +{
>> +	u32 eax, ecx, edx, msr;
>> +	u64 pfn;
>> +
>> +	xen_hvmlite = 1;
>> +
>> +	cpuid(xen_cpuid_base() + 2, &eax, &msr, &ecx, &edx);
>> +	pfn = __pa(hypercall_page);
>> +	wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
>> +
>> +	hvmlite_bootparams();
>> +}
>> +#endif
>> +
>>   void __ref xen_hvm_init_shared_info(void)
>>   {
>>   	int cpu;
>> diff --git a/arch/x86/xen/xen-hvmlite.S b/arch/x86/xen/xen-hvmlite.S
>> new file mode 100644
>> index 0000000..fc7c08c
>> --- /dev/null
>> +++ b/arch/x86/xen/xen-hvmlite.S
>> @@ -0,0 +1,175 @@
>> +/*
>> + * Copyright C 2016, Oracle and/or its affiliates. All rights reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along
>> + * with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +	.code32
>> +	.text
>> +#define _pa(x)          ((x) - __START_KERNEL_map)
>> +
>> +#include <linux/elfnote.h>
>> +#include <linux/init.h>
>> +#include <linux/linkage.h>
>> +#include <asm/segment.h>
>> +#include <asm/asm.h>
>> +#include <asm/boot.h>
>> +#include <asm/processor-flags.h>
>> +#include <asm/msr.h>
>> +#include <xen/interface/elfnote.h>
>> +
>> +	__HEAD
>> +	.code32
>> +
>> +/* Entry point for HVMlite guests */
>> +ENTRY(hvmlite_start_xen)
>> +	cli
>> +	cld
>> +
>> +	mov $_pa(gdt), %eax
>> +	lgdt (%eax)
>> +
>> +	movl $(__BOOT_DS),%eax
>> +	movl %eax,%ds
>> +	movl %eax,%es
>> +	movl %eax,%ss
>> +
>> +	/* Stash hvm_start_info */
>> +	mov $_pa(hvmlite_start_info), %edi
>> +	mov %ebx, %esi
>> +	mov $_pa(hvmlite_start_info_sz), %ecx
>> +	mov (%ecx), %ecx
>> +	rep
>> +	movsb
>> +
>> +	movl $_pa(early_stack_end), %eax
>> +	movl %eax, %esp
>> +
>> +	/* Enable PAE mode */
>> +	movl %cr4, %eax
>> +	orl $X86_CR4_PAE, %eax
>> +	movl %eax, %cr4
>> +
>> +#ifdef CONFIG_X86_64
>> +	/* Enable Long mode */
>> +	movl $MSR_EFER, %ecx
>> +	rdmsr
>> +	btsl $_EFER_LME, %eax
>> +	wrmsr
>> +
>> +	/* Enable pre-constructed page tables */
>> +	mov $_pa(init_level4_pgt), %eax
>> +	movl %eax, %cr3
>> +	movl $(X86_CR0_PG | X86_CR0_PE), %eax
>> +	movl %eax, %cr0
>> +
>> +	/* Jump to 64-bit mode. */
>> +	pushl $__KERNEL_CS
>> +	leal _pa(1f), %eax
>> +	pushl %eax
>> +	lret
>> +
>> +	/* 64-bit entry point */
>> +	.code64
>> +1:
>> +	call xen_prepare_hvmlite
>> +
>> +	/* startup_64 expects boot_params in %rsi */
>> +	mov $_pa(xen_hvmlite_boot_params), %rsi
>> +	movq $_pa(startup_64), %rax
>> +	jmp *%rax
>> +
>> +#else /* CONFIG_X86_64 */
>> +
>> +	/* Clear boot page tables */
>> +	movl $_pa(early_pgtable), %edi
>> +	xorl %eax, %eax
>> +	movl $((PAGE_SIZE*5)/4), %ecx
>> +	rep stosl
>> +
>> +	/* Level 3 */
>> +	movl $_pa(early_pgtable), %edi
>> +	leal (PAGE_SIZE +_PAGE_PRESENT)(%edi), %eax
>> +	movl $4, %ecx
>> +1:
>> +	movl %eax, 0x00(%edi)
>> +	addl $8, %edi
>> +	decl %ecx
>> +	jnz 1b
>> +
>> +	/* Level 2 (2M entries) */
>> +	movl $(_pa(early_pgtable) + PAGE_SIZE), %edi
>> +	movl $(_PAGE_PSE | _PAGE_RW | _PAGE_PRESENT), %eax
>> +	movl $2048, %ecx
>> +2:
>> +	movl %eax, 0(%edi)
>> +	addl $0x00200000, %eax
>> +	addl $8, %edi
>> +	decl %ecx
>> +	jnz 2b
>> +
>> +	/* Enable the boot paging */
>> +	movl $_pa(early_pgtable), %eax
>> +	movl %eax, %cr3
>> +	movl %cr0, %eax
>> +	orl $(X86_CR0_PG | X86_CR0_PE), %eax
>> +	movl %eax, %cr0
>> +
>> +	ljmp $__BOOT_CS,$3f
>> +3:
>> +	call xen_prepare_hvmlite
>> +	mov $_pa(xen_hvmlite_boot_params), %esi
>> +
>> +	/* startup_32 doesn't expect paging and PAE to be on */
>> +	ljmp $__BOOT_CS,$_pa(4f)
>> +4:
>> +	movl %cr0, %eax
>> +	andl $~X86_CR0_PG, %eax
>> +	movl %eax, %cr0
>> +	movl %cr4, %eax
>> +	andl $~X86_CR4_PAE, %eax
>> +	movl %eax, %cr4
>> +
>> +	ljmp    $0x10, $_pa(startup_32)
>> +#endif
>> +
>> +	.data
>> +gdt:
>> +	.word	gdt_end - gdt
>> +	.long	_pa(gdt)
>> +	.word	0
>> +	.quad	0x0000000000000000 /* NULL descriptor */
>> +#ifdef CONFIG_X86_64
>> +	.quad	0x00af9a000000ffff /* __KERNEL_CS */
>> +#else
>> +	.quad	0x00cf9a000000ffff /* __KERNEL_CS */
>> +#endif
>> +	.quad	0x00cf92000000ffff /* __KERNEL_DS */
>> +gdt_end:
>> +
>> +	.bss
>> +	.balign 4
>> +early_stack:
>> +	.fill 16, 1, 0
>> +early_stack_end:
>> +
>> +#ifdef CONFIG_X86_32
>> +	.section ".pgtable","a",@nobits
>> +	.balign 4096
>> +early_pgtable:
>> +	.fill 5*4096, 1, 0
>> +#endif
>> +
>> +	ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY,
>> +	             _ASM_PTR (hvmlite_start_xen - __START_KERNEL_map))
>> diff --git a/include/xen/xen.h b/include/xen/xen.h
>> index 0c0e3ef..6a0d3f3 100644
>> --- a/include/xen/xen.h
>> +++ b/include/xen/xen.h
>> @@ -29,6 +29,12 @@ extern enum xen_domain_type xen_domain_type;
>>   #define xen_initial_domain()	(0)
>>   #endif	/* CONFIG_XEN_DOM0 */
>>   
>> +#ifdef CONFIG_XEN_PVHVM
>> +extern int xen_hvmlite;
>> +#else
>> +#define xen_hvmlite		(0)
>> +#endif
>> +
>>   #ifdef CONFIG_XEN_PVH
>>   /* This functionality exists only for x86. The XEN_PVHVM support exists
>>    * only in x86 world - hence on ARM it will be always disabled.
>> -- 
>> 1.7.1
>>
>>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]     ` <571E19D7.1080301@oracle.com>
@ 2016-04-25 13:47       ` Borislav Petkov
       [not found]       ` <20160425134749.GB28454@pd.tnic>
  1 sibling, 0 replies; 54+ messages in thread
From: Borislav Petkov @ 2016-04-25 13:47 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: mcgrof, linux-kernel, david.vrabel, H. Peter Anvin, xen-devel,
	Thomas Gleixner, Ingo Molnar, roger.pau

On Mon, Apr 25, 2016 at 09:21:27AM -0400, Boris Ostrovsky wrote:
> I was following  Documentation/x86/boot.txt (plus comments in code preceding
> those two routines) which I considered to be the API.
> 
> We are supposed to come to startup_32 with paging off and %esi pointing to
> boot_params. For 64-bit paging is on, %rsi points to zero-page.

So the entry points which are ABI and the ones I believe you're talking
about are in arch/x86/boot/compressed/head_64.S. But you have all this
stuff laid out in arch/x86/xen/ and I don't see you using the entry
points in boot/compressed/. It looks more to me that you're using the
ones in arch/x86/kernel/head_{32,64}.S after decompression.

Or am I missing something?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]       ` <20160425134749.GB28454@pd.tnic>
@ 2016-04-25 13:54         ` Boris Ostrovsky
       [not found]         ` <571E219D.2090308@oracle.com>
  1 sibling, 0 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-04-25 13:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: mcgrof, linux-kernel, david.vrabel, H. Peter Anvin, xen-devel,
	Thomas Gleixner, Ingo Molnar, roger.pau

On 04/25/2016 09:47 AM, Borislav Petkov wrote:
> On Mon, Apr 25, 2016 at 09:21:27AM -0400, Boris Ostrovsky wrote:
>> I was following  Documentation/x86/boot.txt (plus comments in code preceding
>> those two routines) which I considered to be the API.
>>
>> We are supposed to come to startup_32 with paging off and %esi pointing to
>> boot_params. For 64-bit paging is on, %rsi points to zero-page.
> So the entry points which are ABI and the ones I believe you're talking
> about are in arch/x86/boot/compressed/head_64.S. But you have all this
> stuff laid out in arch/x86/xen/ and I don't see you using the entry
> points in boot/compressed/. It looks more to me that you're using the
> ones in arch/x86/kernel/head_{32,64}.S after decompression.

Yes, those. We don't do anything in arch/x86/boot/compressed, hypervisor 
loads vmlinuX at entry point specified by XEN_ELFNOTE_PHYS32_ENTRY 
(which is hvmlite_start_xen).


-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]         ` <571E219D.2090308@oracle.com>
@ 2016-04-25 14:11           ` Borislav Petkov
       [not found]           ` <20160425141145.GE28454@pd.tnic>
  1 sibling, 0 replies; 54+ messages in thread
From: Borislav Petkov @ 2016-04-25 14:11 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: mcgrof, linux-kernel, david.vrabel, H. Peter Anvin, xen-devel,
	Thomas Gleixner, Ingo Molnar, roger.pau

On Mon, Apr 25, 2016 at 09:54:37AM -0400, Boris Ostrovsky wrote:
> Yes, those.

I don't think the ones in arch/x86/kernel/head_{32,64}.S are ABI.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]           ` <20160425141145.GE28454@pd.tnic>
@ 2016-04-25 14:42             ` Boris Ostrovsky
       [not found]             ` <571E2CC7.7080907@oracle.com>
  1 sibling, 0 replies; 54+ messages in thread
From: Boris Ostrovsky @ 2016-04-25 14:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: mcgrof, linux-kernel, david.vrabel, H. Peter Anvin, xen-devel,
	Thomas Gleixner, Ingo Molnar, roger.pau

On 04/25/2016 10:11 AM, Borislav Petkov wrote:
> On Mon, Apr 25, 2016 at 09:54:37AM -0400, Boris Ostrovsky wrote:
>> Yes, those.
> I don't think the ones in arch/x86/kernel/head_{32,64}.S are ABI.
>

Hmm... I thought that everything specified in boot.txt was ABI.

I don't think we can jump to compressed code from vmlinux.

-boris



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]             ` <571E2CC7.7080907@oracle.com>
@ 2016-04-25 15:22               ` Borislav Petkov
       [not found]               ` <20160425152209.GH28454@pd.tnic>
  1 sibling, 0 replies; 54+ messages in thread
From: Borislav Petkov @ 2016-04-25 15:22 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: mcgrof, linux-kernel, david.vrabel, H. Peter Anvin, xen-devel,
	Thomas Gleixner, Ingo Molnar, roger.pau

On Mon, Apr 25, 2016 at 10:42:15AM -0400, Boris Ostrovsky wrote:
> Hmm... I thought that everything specified in boot.txt was ABI.

But those are not there.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]               ` <20160425152209.GH28454@pd.tnic>
@ 2016-04-25 15:48                 ` Boris Ostrovsky
  2016-04-26 10:53                   ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Boris Ostrovsky @ 2016-04-25 15:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: mcgrof, linux-kernel, david.vrabel, H. Peter Anvin, xen-devel,
	Thomas Gleixner, Ingo Molnar, roger.pau

On 04/25/2016 11:22 AM, Borislav Petkov wrote:
> On Mon, Apr 25, 2016 at 10:42:15AM -0400, Boris Ostrovsky wrote:
>> Hmm... I thought that everything specified in boot.txt was ABI.
> But those are not there.
>


https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/x86/boot.txt#n1060

and

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/x86/boot.txt#n1096

is what I was referring to.

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
       [not found]   ` <20160424202314.GA3973@pd.tnic>
  2016-04-25 13:21     ` Boris Ostrovsky
       [not found]     ` <571E19D7.1080301@oracle.com>
@ 2016-04-25 17:24     ` David Vrabel
  2 siblings, 0 replies; 54+ messages in thread
From: David Vrabel @ 2016-04-25 17:24 UTC (permalink / raw)
  To: Borislav Petkov, Boris Ostrovsky, H. Peter Anvin, Ingo Molnar,
	Thomas Gleixner
  Cc: xen-devel, mcgrof, roger.pau, linux-kernel, david.vrabel

On 24/04/16 21:23, Borislav Petkov wrote:
> On Mon, Feb 01, 2016 at 10:38:48AM -0500, Boris Ostrovsky wrote:
>> Start HVMlite guest at XEN_ELFNOTE_PHYS32_ENTRY address. Setup hypercall
>> page, initialize boot_params, enable early page tables.
>>
>> Since this stub is executed before kernel entry point we cannot use
>> variables in .bss which is cleared by kernel. We explicitly place
>> variables that are initialized here into .data.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> ---
>>  arch/x86/xen/Makefile      |    1 +
>>  arch/x86/xen/enlighten.c   |   86 +++++++++++++++++++++-
>>  arch/x86/xen/xen-hvmlite.S |  175 ++++++++++++++++++++++++++++++++++++++++++++
>>  include/xen/xen.h          |    6 ++
>>  4 files changed, 267 insertions(+), 1 deletions(-)
>>  create mode 100644 arch/x86/xen/xen-hvmlite.S
>>
>> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
>> index e47e527..1d913d7 100644
>> --- a/arch/x86/xen/Makefile
>> +++ b/arch/x86/xen/Makefile
>> @@ -23,3 +23,4 @@ obj-$(CONFIG_XEN_DEBUG_FS)	+= debugfs.o
>>  obj-$(CONFIG_XEN_DOM0)		+= vga.o
>>  obj-$(CONFIG_SWIOTLB_XEN)	+= pci-swiotlb-xen.o
>>  obj-$(CONFIG_XEN_EFI)		+= efi.o
>> +obj-$(CONFIG_XEN_PVHVM) 	+= xen-hvmlite.o
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index 5774800..5f05fa2 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -118,7 +118,8 @@ DEFINE_PER_CPU(struct vcpu_info *, xen_vcpu);
>>   */
>>  DEFINE_PER_CPU(struct vcpu_info, xen_vcpu_info);
>>  
>> -enum xen_domain_type xen_domain_type = XEN_NATIVE;
>> +enum xen_domain_type xen_domain_type
>> +	__attribute__((section(".data"))) = XEN_NATIVE;
>>  EXPORT_SYMBOL_GPL(xen_domain_type);
>>  
>>  unsigned long *machine_to_phys_mapping = (void *)MACH2PHYS_VIRT_START;
>> @@ -171,6 +172,17 @@ struct tls_descs {
>>   */
>>  static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
>>  
>> +#ifdef CONFIG_XEN_PVHVM
>> +/*
>> + * HVMlite variables. These need to live in data segment since they are
>> + * initialized before startup_{32|64}, which clear .bss, are invoked.
> 
> So this jumps into startup_32/64 and I don't think we have talked about
> it yet, have we? I'm not aware of any threads about it. Are we fine with
> it, are we not?
> 
> I think we need to agree on API where xen guests should jump into
> arch/x86/ and adhere to it. Otherwise, we will break xen again if we change
> stuff in x86 and we do like to change stuff in x86 all the time.
> 
> Adding tip guys and leaving in the rest for reference.

It would be good if we could start in the decompresser, but we would
need to be able to:

a) identify that the image is PVH-capable.
b) call a PVH specific entry point that builds the expected struct
boot_params.

I don't see any scope in the existing boot protocol to allow this. Hence
we get Xen to decompress the image and look at ELF notes etc.

We want PVH to be a drop-in replacement for PV as much as possible so
this excludes using a bootloader or post-processing the bzImage into a
PVH-compatible ELF image.

I'm open to other suggestions but what's proposed here seems the least
intrusive and with minimal risk for future breakage.  I don't think the
decompressor to kernel ABI changes often does it?

David


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest
  2016-04-25 15:48                 ` Boris Ostrovsky
@ 2016-04-26 10:53                   ` Borislav Petkov
  0 siblings, 0 replies; 54+ messages in thread
From: Borislav Petkov @ 2016-04-26 10:53 UTC (permalink / raw)
  To: Boris Ostrovsky, H. Peter Anvin
  Cc: mcgrof, linux-kernel, david.vrabel, xen-devel, Thomas Gleixner,
	Ingo Molnar, roger.pau

On Mon, Apr 25, 2016 at 11:48:19AM -0400, Boris Ostrovsky wrote:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/x86/boot.txt#n1096
> 
> is what I was referring to.

Right, so reportedly those two weren't meant to be entry points
initially but stuff is using them (think of boot loaders and kexec, for
example) which makes them effectively such.

So I guess having one more user wouldn't change anything.

However, I'd like to document that fact and make them explicit, see
below.

Btw, that boot.txt file could use some serious scrubbing, but that's
for another day.

  (Btw 2, that "start address of loaded 64-bit kernel plus 0x200" is
  simply wrong. The 0x200 offset is for the boot/compressed/ version of
  startup_64:

arch/x86/boot/compressed/head_64.S:
	...

        .code64
        .org 0x200
ENTRY(startup_64)

---
diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index 9da6f3512249..69ed95784085 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -1053,9 +1053,9 @@ described in zero-page.txt.
 After setting up the struct boot_params, the boot loader can load the
 32/64-bit kernel in the same way as that of 16-bit boot protocol.
 
-In 32-bit boot protocol, the kernel is started by jumping to the
-32-bit kernel entry point, which is the start address of loaded
-32/64-bit kernel.
+In 32-bit boot protocol, the kernel is started by jumping to the 32-bit
+kernel entry point (arch/x86/kernel/head_32.S::startup_32), which is the
+start address of loaded 32/64-bit kernel.
 
 At entry, the CPU must be in 32-bit protected mode with paging
 disabled; a GDT must be loaded with the descriptors for selectors
@@ -1089,9 +1089,9 @@ After setting up the struct boot_params, the boot loader can load
 64-bit kernel in the same way as that of 16-bit boot protocol, but
 kernel could be loaded above 4G.
 
-In 64-bit boot protocol, the kernel is started by jumping to the
-64-bit kernel entry point, which is the start address of loaded
-64-bit kernel plus 0x200.
+In 64-bit boot protocol, the kernel is started by jumping to the 64-bit
+kernel entry point (arch/x86/kernel/head_64.S::startup_64), which is the
+start address of loaded 64-bit kernel.
 
 At entry, the CPU must be in 64-bit mode with paging enabled.
 The range with setup_header.init_size from start address of loaded

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2016-04-26 10:53 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-01 15:38 [PATCH v2 00/11] HVMlite domU support Boris Ostrovsky
2016-02-01 15:38 ` [PATCH v2 01/11] xen/hvmlite: Import hvmlite-related Xen public interfaces Boris Ostrovsky
2016-02-02 16:06   ` David Vrabel
2016-02-01 15:38 ` [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest Boris Ostrovsky
2016-02-02 16:39   ` David Vrabel
     [not found]   ` <56B0DBD6.3060205@citrix.com>
2016-02-02 17:19     ` Boris Ostrovsky
2016-02-03 18:55   ` Luis R. Rodriguez
     [not found]   ` <20160203185525.GV20964@wotan.suse.de>
2016-02-03 20:11     ` Boris Ostrovsky
2016-02-03 20:52     ` Andrew Cooper
     [not found]     ` <56B25F0C.2050808@oracle.com>
2016-02-03 23:40       ` Luis R. Rodriguez
     [not found]       ` <20160203234026.GS20964@wotan.suse.de>
2016-02-04 19:54         ` Boris Ostrovsky
     [not found]         ` <56B3AC67.7080704@oracle.com>
2016-02-04 20:57           ` Luis R. Rodriguez
     [not found]           ` <20160204205721.GJ12481@wotan.suse.de>
2016-02-04 22:28             ` Boris Ostrovsky
     [not found]     ` <56B268A2.5000704@citrix.com>
2016-02-03 23:59       ` Luis R. Rodriguez
     [not found]       ` <20160203235908.GT20964@wotan.suse.de>
2016-02-04  0:08         ` Luis R. Rodriguez
2016-02-04  0:51         ` Andrew Cooper
     [not found]         ` <56B2A09A.70404@citrix.com>
2016-02-04 23:10           ` Luis R. Rodriguez
     [not found]           ` <20160204231031.GM12481@wotan.suse.de>
2016-04-05 22:02             ` Luis R. Rodriguez
2016-04-24 20:23   ` Borislav Petkov
     [not found]   ` <20160424202314.GA3973@pd.tnic>
2016-04-25 13:21     ` Boris Ostrovsky
     [not found]     ` <571E19D7.1080301@oracle.com>
2016-04-25 13:47       ` Borislav Petkov
     [not found]       ` <20160425134749.GB28454@pd.tnic>
2016-04-25 13:54         ` Boris Ostrovsky
     [not found]         ` <571E219D.2090308@oracle.com>
2016-04-25 14:11           ` Borislav Petkov
     [not found]           ` <20160425141145.GE28454@pd.tnic>
2016-04-25 14:42             ` Boris Ostrovsky
     [not found]             ` <571E2CC7.7080907@oracle.com>
2016-04-25 15:22               ` Borislav Petkov
     [not found]               ` <20160425152209.GH28454@pd.tnic>
2016-04-25 15:48                 ` Boris Ostrovsky
2016-04-26 10:53                   ` Borislav Petkov
2016-04-25 17:24     ` David Vrabel
2016-02-01 15:38 ` [PATCH v2 03/11] xen/hvmlite: Initialize HVMlite kernel Boris Ostrovsky
2016-02-17 20:08   ` Luis R. Rodriguez
2016-02-01 15:38 ` [PATCH v2 04/11] xen/hvmlite: Allow HVMlite guests delay initializing grant table Boris Ostrovsky
2016-02-01 15:38 ` [PATCH v2 05/11] xen/hvmlite: HVMlite guests always have PV devices Boris Ostrovsky
2016-02-01 15:38 ` [PATCH v2 06/11] xen/hvmlite: Prepare cpu_initialize_context() routine for HVMlite SMP Boris Ostrovsky
2016-02-02 16:16   ` David Vrabel
2016-02-01 15:38 ` [PATCH v2 07/11] xen/hvmlite: Initialize context for secondary VCPUs Boris Ostrovsky
2016-02-02 16:21   ` David Vrabel
     [not found]   ` <56B0D786.7000002@citrix.com>
2016-02-02 16:58     ` Boris Ostrovsky
     [not found]     ` <56B0E046.6050900@oracle.com>
2016-02-04 10:07       ` David Vrabel
2016-02-04 12:58       ` Doug Goldstein
     [not found]       ` <56B34B01.50000@cardoe.com>
2016-02-04 19:08         ` Boris Ostrovsky
2016-02-01 15:38 ` [PATCH v2 08/11] xen/hvmlite: Extend APIC operations for HVMlite guests Boris Ostrovsky
2016-02-04 10:04   ` David Vrabel
     [not found]   ` <56B32220.4040505@citrix.com>
2016-02-04 12:14     ` Roger Pau Monné
     [not found]     ` <56B340B4.3050406@citrix.com>
2016-02-04 14:01       ` Boris Ostrovsky
2016-02-01 15:38 ` [PATCH v2 09/11] xen/hvmlite: Use x86's default timer init " Boris Ostrovsky
2016-02-02 16:27   ` David Vrabel
     [not found]   ` <56B0D8DD.5010206@citrix.com>
2016-02-02 17:01     ` Boris Ostrovsky
2016-02-01 15:38 ` [PATCH v2 10/11] xen/hvmlite: Boot secondary CPUs Boris Ostrovsky
2016-02-04 10:38   ` David Vrabel
     [not found]   ` <56B32A29.6050406@citrix.com>
2016-02-04 13:51     ` Boris Ostrovsky
2016-02-01 15:38 ` [PATCH v2 11/11] xen/hvmlite: Enable CPU on-/offlining Boris Ostrovsky
     [not found] ` <1454341137-14110-5-git-send-email-boris.ostrovsky@oracle.com>
2016-02-02 16:13   ` [PATCH v2 04/11] xen/hvmlite: Allow HVMlite guests delay initializing grant table David Vrabel
     [not found]   ` <56B0D596.4050301@citrix.com>
2016-02-02 16:49     ` Boris Ostrovsky
2016-02-03 18:59   ` Luis R. Rodriguez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).