linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/9] PVH v2 support (domU)
@ 2017-01-26 19:41 Boris Ostrovsky
  2017-01-26 19:41 ` [PATCH v2 1/9] x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C Boris Ostrovsky
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2017-01-26 19:41 UTC
  To: JGross; +Cc: roger.pau, xen-devel, linux-kernel, boris.ostrovsky

PVH v2 support for unprivileged guests.

Now that we have decided to defer ACPI CPU hotplug until we better understand
what to do about it in dom0, I am sending v2 with PV-style CPU hotplug and
with the v1 comments addressed.


Boris Ostrovsky (9):
  x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C
  xen/x86: Remove PVH support
  xen/pvh: Import PVH-related Xen public interfaces
  xen/pvh: Bootstrap PVH guest
  xen/pvh: Prevent PVH guests from using PIC, RTC and IOAPIC
  xen/pvh: Initialize grant table for PVH guests
  xen/pvh: PVH guests always have PV devices
  xen/pvh: Enable CPU hotplug
  xen/pvh: Use Xen's emergency_restart op for PVH guests

 arch/x86/include/asm/pgtable_32.h      |  32 ++++
 arch/x86/kernel/head32.c               |  62 ++++++++
 arch/x86/kernel/head_32.S              | 121 +--------------
 arch/x86/xen/Kconfig                   |   2 +-
 arch/x86/xen/Makefile                  |   1 +
 arch/x86/xen/enlighten.c               | 272 +++++++++++++++++----------------
 arch/x86/xen/mmu.c                     |  21 +--
 arch/x86/xen/platform-pci-unplug.c     |   4 +-
 arch/x86/xen/setup.c                   |  37 +----
 arch/x86/xen/smp.c                     |  78 ++++------
 arch/x86/xen/smp.h                     |   8 -
 arch/x86/xen/xen-head.S                |  62 +-------
 arch/x86/xen/xen-ops.h                 |   1 -
 arch/x86/xen/xen-pvh.S                 | 137 +++++++++++++++++
 drivers/xen/cpu_hotplug.c              |   2 +-
 drivers/xen/events/events_base.c       |   1 -
 drivers/xen/grant-table.c              |   8 +-
 include/xen/interface/elfnote.h        |  12 +-
 include/xen/interface/hvm/hvm_vcpu.h   | 143 +++++++++++++++++
 include/xen/interface/hvm/start_info.h |  98 ++++++++++++
 include/xen/xen.h                      |  12 +-
 21 files changed, 676 insertions(+), 438 deletions(-)
 create mode 100644 arch/x86/xen/xen-pvh.S
 create mode 100644 include/xen/interface/hvm/hvm_vcpu.h
 create mode 100644 include/xen/interface/hvm/start_info.h

-- 
1.8.3.1


* [PATCH v2 1/9] x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C
  2017-01-26 19:41 [PATCH v2 0/9] PVH v2 support (domU) Boris Ostrovsky
@ 2017-01-26 19:41 ` Boris Ostrovsky
  2017-01-26 19:41 ` [PATCH v2 2/9] xen/x86: Remove PVH support Boris Ostrovsky
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2017-01-26 19:41 UTC
  To: JGross; +Cc: roger.pau, xen-devel, linux-kernel, boris.ostrovsky

The new Xen PVH entry point requires page tables to be set up by the
kernel since it is entered with paging disabled.

Pull the common code out of head_32.S so that mk_early_pgtbl_32() can be
invoked from both the new Xen entry point and the existing startup_32()
code.

Convert the resulting common code to C.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: matt@codeblueprint.co.uk
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1481215471-9639-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 1e620f9b23e598ab936ece12233e98e97930b692)
---
This patch should go into mainline from the x86 tree in the 4.10 timeframe.
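
For readers following the sizing comment and the loop in mk_early_pgtbl_32(),
here is a rough user-space model of the non-PAE case. It is a sketch under
stated assumptions, not the kernel code: it assumes a 3G/1G split
(PAGE_OFFSET = 0xC0000000), uses 0x003 as a stand-in attribute value for both
PTE_IDENT_ATTR and PDE_IDENT_ATTR, and the names struct early_map and
model_mk_early_pgtbl() are hypothetical. Physical addresses are plain numbers
and the page tables live in a heap buffer instead of at __brk_base.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed configuration: non-PAE i386 with a 3G/1G split. */
#define PAGE_SHIFT    12
#define PAGE_SIZE     (1UL << PAGE_SHIFT)
#define PTRS_PER_PTE  1024
#define PTRS_PER_PGD  1024
#define PGDIR_SHIFT   22
#define PAGE_OFFSET   0xC0000000UL
#define PFN_MASK      (~(PAGE_SIZE - 1))
#define MODEL_ATTR    0x003UL  /* stand-in for PTE_IDENT_ATTR/PDE_IDENT_ATTR */

/* Same formulas as the patch for the PTRS_PER_PMD == 1 case. */
#define PAGE_TABLE_SIZE(pages) ((pages) / PTRS_PER_PGD)
#define LOWMEM_PAGES ((((2ULL << 31) - PAGE_OFFSET) >> PAGE_SHIFT))

struct early_map {
	uint32_t pgd[PTRS_PER_PGD];   /* models initial_page_table */
	uint32_t *ptes;               /* models the tables placed at __brk_base */
	unsigned long nr_tables;      /* page tables filled by the loop */
	unsigned long max_pfn_mapped; /* first PFN past the mapped range */
	unsigned long brk_end;        /* virtual address just past the tables */
};

/*
 * end_pa and brk_pa model __pa(_end) and __pa(__brk_base).
 * Returns 0 on success, -1 if the host allocation fails.
 */
static int model_mk_early_pgtbl(struct early_map *m, unsigned long end_pa,
				unsigned long brk_pa)
{
	const unsigned long limit = end_pa +
		(unsigned long)(PAGE_TABLE_SIZE(LOWMEM_PAGES) << PAGE_SHIFT);
	const unsigned long span = PTRS_PER_PTE * PAGE_SIZE; /* 4 MiB per table */
	const unsigned long tables = (limit + span - 1) / span;
	const unsigned long kernel_idx = PAGE_OFFSET >> PGDIR_SHIFT;
	unsigned long pte = MODEL_ATTR, ptep_pa = brk_pa, idx = 0;
	uint32_t *ptep;
	int i;

	assert((brk_pa & ~PFN_MASK) == 0);  /* tables must be page aligned */
	assert(tables > 0 && tables <= PTRS_PER_PGD - kernel_idx);

	memset(m, 0, sizeof(*m));
	m->ptes = calloc(tables * PTRS_PER_PTE, sizeof(*m->ptes));
	if (!m->ptes)
		return -1;
	ptep = m->ptes;

	while ((pte & PFN_MASK) < limit) {
		uint32_t pde = (uint32_t)(ptep_pa | MODEL_ATTR);

		m->pgd[idx] = pde;               /* identity mapping */
		m->pgd[idx + kernel_idx] = pde;  /* same table at PAGE_OFFSET */
		for (i = 0; i < PTRS_PER_PTE; i++) {
			*ptep++ = (uint32_t)pte;
			pte += PAGE_SIZE;
		}
		ptep_pa += PAGE_SIZE;
		idx++;
	}

	m->nr_tables = idx;
	m->max_pfn_mapped = (pte & PFN_MASK) >> PAGE_SHIFT;
	m->brk_end = ptep_pa + PAGE_OFFSET;
	return 0;
}
```

Under these assumptions LOWMEM_PAGES is 262144 and PAGE_TABLE_SIZE(LOWMEM_PAGES)
is 256 pages, i.e. 1 MiB of extra mapping beyond _end. With _end modelled at
16 MiB the limit is 17 MiB, so the loop fills five page tables (20 MiB mapped)
and installs each one at both PGD index n and index n + 768.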

 arch/x86/include/asm/pgtable_32.h |  32 ++++++++++
 arch/x86/kernel/head32.c          |  62 +++++++++++++++++++
 arch/x86/kernel/head_32.S         | 121 +++-----------------------------------
 3 files changed, 101 insertions(+), 114 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index b6c0b40..fbc7336 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -27,6 +27,7 @@
 
 extern pgd_t swapper_pg_dir[1024];
 extern pgd_t initial_page_table[1024];
+extern pmd_t initial_pg_pmd[];
 
 static inline void pgtable_cache_init(void) { }
 static inline void check_pgt_cache(void) { }
@@ -75,4 +76,35 @@ static inline void check_pgt_cache(void) { }
 #define kern_addr_valid(kaddr)	(0)
 #endif
 
+/*
+ * This is how much memory in addition to the memory covered up to
+ * and including _end we need mapped initially.
+ * We need:
+ *     (KERNEL_IMAGE_SIZE/4096) / 1024 pages (worst case, non PAE)
+ *     (KERNEL_IMAGE_SIZE/4096) / 512 + 4 pages (worst case for PAE)
+ *
+ * Modulo rounding, each megabyte assigned here requires a kilobyte of
+ * memory, which is currently unreclaimed.
+ *
+ * This should be a multiple of a page.
+ *
+ * KERNEL_IMAGE_SIZE should be greater than pa(_end)
+ * and small than max_low_pfn, otherwise will waste some page table entries
+ */
+#if PTRS_PER_PMD > 1
+#define PAGE_TABLE_SIZE(pages) (((pages) / PTRS_PER_PMD) + PTRS_PER_PGD)
+#else
+#define PAGE_TABLE_SIZE(pages) ((pages) / PTRS_PER_PGD)
+#endif
+
+/*
+ * Number of possible pages in the lowmem region.
+ *
+ * We shift 2 by 31 instead of 1 by 32 to the left in order to avoid a
+ * gas warning about overflowing shift count when gas has been compiled
+ * with only a host target support using a 32-bit type for internal
+ * representation.
+ */
+#define LOWMEM_PAGES ((((2<<31) - __PAGE_OFFSET) >> PAGE_SHIFT))
+
 #endif /* _ASM_X86_PGTABLE_32_H */
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index f16c55b..e5fb436 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -49,3 +49,65 @@ asmlinkage __visible void __init i386_start_kernel(void)
 
 	start_kernel();
 }
+
+/*
+ * Initialize page tables.  This creates a PDE and a set of page
+ * tables, which are located immediately beyond __brk_base.  The variable
+ * _brk_end is set up to point to the first "safe" location.
+ * Mappings are created both at virtual address 0 (identity mapping)
+ * and PAGE_OFFSET for up to _end.
+ *
+ * In PAE mode initial_page_table is statically defined to contain
+ * enough entries to cover the VMSPLIT option (that is the top 1, 2 or 3
+ * entries). The identity mapping is handled by pointing two PGD entries
+ * to the first kernel PMD. Note the upper half of each PMD or PTE are
+ * always zero at this stage.
+ */
+void __init mk_early_pgtbl_32(void)
+{
+#ifdef __pa
+#undef __pa
+#endif
+#define __pa(x)  ((unsigned long)(x) - PAGE_OFFSET)
+	pte_t pte, *ptep;
+	int i;
+	unsigned long *ptr;
+	/* Enough space to fit pagetables for the low memory linear map */
+	const unsigned long limit = __pa(_end) +
+		(PAGE_TABLE_SIZE(LOWMEM_PAGES) << PAGE_SHIFT);
+#ifdef CONFIG_X86_PAE
+	pmd_t pl2, *pl2p = (pmd_t *)__pa(initial_pg_pmd);
+#define SET_PL2(pl2, val)    { (pl2).pmd = (val); }
+#else
+	pgd_t pl2, *pl2p = (pgd_t *)__pa(initial_page_table);
+#define SET_PL2(pl2, val)   { (pl2).pgd = (val); }
+#endif
+
+	ptep = (pte_t *)__pa(__brk_base);
+	pte.pte = PTE_IDENT_ATTR;
+
+	while ((pte.pte & PTE_PFN_MASK) < limit) {
+
+		SET_PL2(pl2, (unsigned long)ptep | PDE_IDENT_ATTR);
+		*pl2p = pl2;
+#ifndef CONFIG_X86_PAE
+		/* Kernel PDE entry */
+		*(pl2p +  ((PAGE_OFFSET >> PGDIR_SHIFT))) = pl2;
+#endif
+		for (i = 0; i < PTRS_PER_PTE; i++) {
+			*ptep = pte;
+			pte.pte += PAGE_SIZE;
+			ptep++;
+		}
+
+		pl2p++;
+	}
+
+	ptr = (unsigned long *)__pa(&max_pfn_mapped);
+	/* Can't use pte_pfn() since it's a call with CONFIG_PARAVIRT */
+	*ptr = (pte.pte & PTE_PFN_MASK) >> PAGE_SHIFT;
+
+	ptr = (unsigned long *)__pa(&_brk_end);
+	*ptr = (unsigned long)ptep + PAGE_OFFSET;
+}
+
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 4e8577d..1f85ee8 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -24,6 +24,7 @@
 #include <asm/nops.h>
 #include <asm/bootparam.h>
 #include <asm/export.h>
+#include <asm/pgtable_32.h>
 
 /* Physical address */
 #define pa(X) ((X) - __PAGE_OFFSET)
@@ -41,44 +42,10 @@
 #define X86_CAPABILITY	new_cpu_data+CPUINFO_x86_capability
 #define X86_VENDOR_ID	new_cpu_data+CPUINFO_x86_vendor_id
 
-/*
- * This is how much memory in addition to the memory covered up to
- * and including _end we need mapped initially.
- * We need:
- *     (KERNEL_IMAGE_SIZE/4096) / 1024 pages (worst case, non PAE)
- *     (KERNEL_IMAGE_SIZE/4096) / 512 + 4 pages (worst case for PAE)
- *
- * Modulo rounding, each megabyte assigned here requires a kilobyte of
- * memory, which is currently unreclaimed.
- *
- * This should be a multiple of a page.
- *
- * KERNEL_IMAGE_SIZE should be greater than pa(_end)
- * and small than max_low_pfn, otherwise will waste some page table entries
- */
-
-#if PTRS_PER_PMD > 1
-#define PAGE_TABLE_SIZE(pages) (((pages) / PTRS_PER_PMD) + PTRS_PER_PGD)
-#else
-#define PAGE_TABLE_SIZE(pages) ((pages) / PTRS_PER_PGD)
-#endif
 
 #define SIZEOF_PTREGS 17*4
 
 /*
- * Number of possible pages in the lowmem region.
- *
- * We shift 2 by 31 instead of 1 by 32 to the left in order to avoid a
- * gas warning about overflowing shift count when gas has been compiled
- * with only a host target support using a 32-bit type for internal
- * representation.
- */
-LOWMEM_PAGES = (((2<<31) - __PAGE_OFFSET) >> PAGE_SHIFT)
-
-/* Enough space to fit pagetables for the low memory linear map */
-MAPPING_BEYOND_END = PAGE_TABLE_SIZE(LOWMEM_PAGES) << PAGE_SHIFT
-
-/*
  * Worst-case size of the kernel mapping we need to make:
  * a relocatable kernel can live anywhere in lowmem, so we need to be able
  * to map all of lowmem.
@@ -160,90 +127,15 @@ ENTRY(startup_32)
 	call load_ucode_bsp
 #endif
 
-/*
- * Initialize page tables.  This creates a PDE and a set of page
- * tables, which are located immediately beyond __brk_base.  The variable
- * _brk_end is set up to point to the first "safe" location.
- * Mappings are created both at virtual address 0 (identity mapping)
- * and PAGE_OFFSET for up to _end.
- */
-#ifdef CONFIG_X86_PAE
-
-	/*
-	 * In PAE mode initial_page_table is statically defined to contain
-	 * enough entries to cover the VMSPLIT option (that is the top 1, 2 or 3
-	 * entries). The identity mapping is handled by pointing two PGD entries
-	 * to the first kernel PMD.
-	 *
-	 * Note the upper half of each PMD or PTE are always zero at this stage.
-	 */
-
-#define KPMDS (((-__PAGE_OFFSET) >> 30) & 3) /* Number of kernel PMDs */
-
-	xorl %ebx,%ebx				/* %ebx is kept at zero */
-
-	movl $pa(__brk_base), %edi
-	movl $pa(initial_pg_pmd), %edx
-	movl $PTE_IDENT_ATTR, %eax
-10:
-	leal PDE_IDENT_ATTR(%edi),%ecx		/* Create PMD entry */
-	movl %ecx,(%edx)			/* Store PMD entry */
-						/* Upper half already zero */
-	addl $8,%edx
-	movl $512,%ecx
-11:
-	stosl
-	xchgl %eax,%ebx
-	stosl
-	xchgl %eax,%ebx
-	addl $0x1000,%eax
-	loop 11b
-
-	/*
-	 * End condition: we must map up to the end + MAPPING_BEYOND_END.
-	 */
-	movl $pa(_end) + MAPPING_BEYOND_END + PTE_IDENT_ATTR, %ebp
-	cmpl %ebp,%eax
-	jb 10b
-1:
-	addl $__PAGE_OFFSET, %edi
-	movl %edi, pa(_brk_end)
-	shrl $12, %eax
-	movl %eax, pa(max_pfn_mapped)
+	/* Create early pagetables. */
+	call  mk_early_pgtbl_32
 
 	/* Do early initialization of the fixmap area */
 	movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
+#ifdef  CONFIG_X86_PAE
+#define KPMDS (((-__PAGE_OFFSET) >> 30) & 3) /* Number of kernel PMDs */
 	movl %eax,pa(initial_pg_pmd+0x1000*KPMDS-8)
-#else	/* Not PAE */
-
-page_pde_offset = (__PAGE_OFFSET >> 20);
-
-	movl $pa(__brk_base), %edi
-	movl $pa(initial_page_table), %edx
-	movl $PTE_IDENT_ATTR, %eax
-10:
-	leal PDE_IDENT_ATTR(%edi),%ecx		/* Create PDE entry */
-	movl %ecx,(%edx)			/* Store identity PDE entry */
-	movl %ecx,page_pde_offset(%edx)		/* Store kernel PDE entry */
-	addl $4,%edx
-	movl $1024, %ecx
-11:
-	stosl
-	addl $0x1000,%eax
-	loop 11b
-	/*
-	 * End condition: we must map up to the end + MAPPING_BEYOND_END.
-	 */
-	movl $pa(_end) + MAPPING_BEYOND_END + PTE_IDENT_ATTR, %ebp
-	cmpl %ebp,%eax
-	jb 10b
-	addl $__PAGE_OFFSET, %edi
-	movl %edi, pa(_brk_end)
-	shrl $12, %eax
-	movl %eax, pa(max_pfn_mapped)
-
-	/* Do early initialization of the fixmap area */
-	movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
+#else
 	movl %eax,pa(initial_page_table+0xffc)
 #endif
 
@@ -666,6 +558,7 @@ ENTRY(setup_once_ref)
 __PAGE_ALIGNED_BSS
 	.align PAGE_SIZE
 #ifdef CONFIG_X86_PAE
+.globl initial_pg_pmd
 initial_pg_pmd:
 	.fill 1024*KPMDS,4,0
 #else
-- 
1.8.3.1


* [PATCH v2 2/9] xen/x86: Remove PVH support
  2017-01-26 19:41 [PATCH v2 0/9] PVH v2 support (domU) Boris Ostrovsky
  2017-01-26 19:41 ` [PATCH v2 1/9] x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C Boris Ostrovsky
@ 2017-01-26 19:41 ` Boris Ostrovsky
  2017-01-26 19:41 ` [PATCH v2 3/9] xen/pvh: Import PVH-related Xen public interfaces Boris Ostrovsky
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2017-01-26 19:41 UTC
  To: JGross; +Cc: roger.pau, xen-devel, linux-kernel, boris.ostrovsky

We are replacing the existing PVH guest support with a new implementation.

We are keeping the xen_pvh_domain() macro (for now set to zero) because
when we introduce the new PVH implementation later in this series we will
reuse the current PVH-specific code (xen_pvh_gnttab_setup()), and that
code is conditioned by 'if (xen_pvh_domain())'. (We will also need
a no-op xen_pvh_domain() for !CONFIG_XEN_PVH.)

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
Changes in v2:
* Added comment to commit message clarifying why xen_pvh_domain()
  is kept.
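
To illustrate why the constant-zero stub is enough to keep the guarded code
building, here is a small stand-alone sketch. Only the macro definition is
taken from this patch; struct gnttab_model and model_gnttab_setup() are
hypothetical stand-ins and do not reproduce xen_pvh_gnttab_setup() or its
callers.

```c
#include <assert.h>
#include <stddef.h>

/* Stub as left by this patch: PVH is never detected. */
#define xen_pvh_domain()	(0)

/* Hypothetical bookkeeping, used only to observe which path ran. */
struct gnttab_model {
	int pvh_setup_calls;
	int pv_setup_calls;
};

static int model_gnttab_setup(struct gnttab_model *g)
{
	assert(g != NULL);

	if (xen_pvh_domain()) {
		/*
		 * Still parsed and type-checked, so it cannot bit-rot,
		 * but unreachable while the macro expands to 0.
		 */
		g->pvh_setup_calls++;
		return 0;
	}

	g->pv_setup_calls++;
	return 0;
}
```

Once a later patch makes xen_pvh_domain() a real test, the guarded branch
becomes live without any change to its callers.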


 arch/x86/xen/enlighten.c         | 140 ++++++---------------------------------
 arch/x86/xen/mmu.c               |  21 +-----
 arch/x86/xen/setup.c             |  37 +----------
 arch/x86/xen/smp.c               |  78 ++++++++--------------
 arch/x86/xen/smp.h               |   8 ---
 arch/x86/xen/xen-head.S          |  62 ++---------------
 arch/x86/xen/xen-ops.h           |   1 -
 drivers/xen/events/events_base.c |   1 -
 include/xen/xen.h                |  13 +---
 9 files changed, 54 insertions(+), 307 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 51ef952..828f1b2 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1138,10 +1138,11 @@ void xen_setup_vcpu_info_placement(void)
 		xen_vcpu_setup(cpu);
 	}
 
-	/* xen_vcpu_setup managed to place the vcpu_info within the
-	 * percpu area for all cpus, so make use of it. Note that for
-	 * PVH we want to use native IRQ mechanism. */
-	if (have_vcpu_info_placement && !xen_pvh_domain()) {
+	/*
+	 * xen_vcpu_setup managed to place the vcpu_info within the
+	 * percpu area for all cpus, so make use of it.
+	 */
+	if (have_vcpu_info_placement) {
 		pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
 		pv_irq_ops.restore_fl = __PV_IS_CALLEE_SAVE(xen_restore_fl_direct);
 		pv_irq_ops.irq_disable = __PV_IS_CALLEE_SAVE(xen_irq_disable_direct);
@@ -1413,49 +1414,9 @@ static void __init xen_boot_params_init_edd(void)
  * Set up the GDT and segment registers for -fstack-protector.  Until
  * we do this, we have to be careful not to call any stack-protected
  * function, which is most of the kernel.
- *
- * Note, that it is __ref because the only caller of this after init
- * is PVH which is not going to use xen_load_gdt_boot or other
- * __init functions.
  */
-static void __ref xen_setup_gdt(int cpu)
+static void xen_setup_gdt(int cpu)
 {
-	if (xen_feature(XENFEAT_auto_translated_physmap)) {
-#ifdef CONFIG_X86_64
-		unsigned long dummy;
-
-		load_percpu_segment(cpu); /* We need to access per-cpu area */
-		switch_to_new_gdt(cpu); /* GDT and GS set */
-
-		/* We are switching of the Xen provided GDT to our HVM mode
-		 * GDT. The new GDT has  __KERNEL_CS with CS.L = 1
-		 * and we are jumping to reload it.
-		 */
-		asm volatile ("pushq %0\n"
-			      "leaq 1f(%%rip),%0\n"
-			      "pushq %0\n"
-			      "lretq\n"
-			      "1:\n"
-			      : "=&r" (dummy) : "0" (__KERNEL_CS));
-
-		/*
-		 * While not needed, we also set the %es, %ds, and %fs
-		 * to zero. We don't care about %ss as it is NULL.
-		 * Strictly speaking this is not needed as Xen zeros those
-		 * out (and also MSR_FS_BASE, MSR_GS_BASE, MSR_KERNEL_GS_BASE)
-		 *
-		 * Linux zeros them in cpu_init() and in secondary_startup_64
-		 * (for BSP).
-		 */
-		loadsegment(es, 0);
-		loadsegment(ds, 0);
-		loadsegment(fs, 0);
-#else
-		/* PVH: TODO Implement. */
-		BUG();
-#endif
-		return; /* PVH does not need any PV GDT ops. */
-	}
 	pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
 	pv_cpu_ops.load_gdt = xen_load_gdt_boot;
 
@@ -1466,59 +1427,6 @@ static void __ref xen_setup_gdt(int cpu)
 	pv_cpu_ops.load_gdt = xen_load_gdt;
 }
 
-#ifdef CONFIG_XEN_PVH
-/*
- * A PV guest starts with default flags that are not set for PVH, set them
- * here asap.
- */
-static void xen_pvh_set_cr_flags(int cpu)
-{
-
-	/* Some of these are setup in 'secondary_startup_64'. The others:
-	 * X86_CR0_TS, X86_CR0_PE, X86_CR0_ET are set by Xen for HVM guests
-	 * (which PVH shared codepaths), while X86_CR0_PG is for PVH. */
-	write_cr0(read_cr0() | X86_CR0_MP | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM);
-
-	if (!cpu)
-		return;
-	/*
-	 * For BSP, PSE PGE are set in probe_page_size_mask(), for APs
-	 * set them here. For all, OSFXSR OSXMMEXCPT are set in fpu__init_cpu().
-	*/
-	if (boot_cpu_has(X86_FEATURE_PSE))
-		cr4_set_bits_and_update_boot(X86_CR4_PSE);
-
-	if (boot_cpu_has(X86_FEATURE_PGE))
-		cr4_set_bits_and_update_boot(X86_CR4_PGE);
-}
-
-/*
- * Note, that it is ref - because the only caller of this after init
- * is PVH which is not going to use xen_load_gdt_boot or other
- * __init functions.
- */
-void __ref xen_pvh_secondary_vcpu_init(int cpu)
-{
-	xen_setup_gdt(cpu);
-	xen_pvh_set_cr_flags(cpu);
-}
-
-static void __init xen_pvh_early_guest_init(void)
-{
-	if (!xen_feature(XENFEAT_auto_translated_physmap))
-		return;
-
-	BUG_ON(!xen_feature(XENFEAT_hvm_callback_vector));
-
-	xen_pvh_early_cpu_init(0, false);
-	xen_pvh_set_cr_flags(0);
-
-#ifdef CONFIG_X86_32
-	BUG(); /* PVH: Implement proper support. */
-#endif
-}
-#endif    /* CONFIG_XEN_PVH */
-
 static void __init xen_dom0_set_legacy_features(void)
 {
 	x86_platform.legacy.rtc = 1;
@@ -1555,24 +1463,17 @@ asmlinkage __visible void __init xen_start_kernel(void)
 	xen_domain_type = XEN_PV_DOMAIN;
 
 	xen_setup_features();
-#ifdef CONFIG_XEN_PVH
-	xen_pvh_early_guest_init();
-#endif
+
 	xen_setup_machphys_mapping();
 
 	/* Install Xen paravirt ops */
 	pv_info = xen_info;
 	pv_init_ops = xen_init_ops;
-	if (!xen_pvh_domain()) {
-		pv_cpu_ops = xen_cpu_ops;
+	pv_cpu_ops = xen_cpu_ops;
 
-		x86_platform.get_nmi_reason = xen_get_nmi_reason;
-	}
+	x86_platform.get_nmi_reason = xen_get_nmi_reason;
 
-	if (xen_feature(XENFEAT_auto_translated_physmap))
-		x86_init.resources.memory_setup = xen_auto_xlated_memory_setup;
-	else
-		x86_init.resources.memory_setup = xen_memory_setup;
+	x86_init.resources.memory_setup = xen_memory_setup;
 	x86_init.oem.arch_setup = xen_arch_setup;
 	x86_init.oem.banner = xen_banner;
 
@@ -1665,18 +1566,15 @@ asmlinkage __visible void __init xen_start_kernel(void)
 	/* set the limit of our address space */
 	xen_reserve_top();
 
-	/* PVH: runs at default kernel iopl of 0 */
-	if (!xen_pvh_domain()) {
-		/*
-		 * We used to do this in xen_arch_setup, but that is too late
-		 * on AMD were early_cpu_init (run before ->arch_setup()) calls
-		 * early_amd_init which pokes 0xcf8 port.
-		 */
-		set_iopl.iopl = 1;
-		rc = HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
-		if (rc != 0)
-			xen_raw_printk("physdev_op failed %d\n", rc);
-	}
+	/*
+	 * We used to do this in xen_arch_setup, but that is too late
+	 * on AMD were early_cpu_init (run before ->arch_setup()) calls
+	 * early_amd_init which pokes 0xcf8 port.
+	 */
+	set_iopl.iopl = 1;
+	rc = HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
+	if (rc != 0)
+		xen_raw_printk("physdev_op failed %d\n", rc);
 
 #ifdef CONFIG_X86_32
 	/* set up basic CPUID stuff */
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 7d5afdb..f6740b5 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1792,10 +1792,6 @@ static void __init set_page_prot_flags(void *addr, pgprot_t prot,
 	unsigned long pfn = __pa(addr) >> PAGE_SHIFT;
 	pte_t pte = pfn_pte(pfn, prot);
 
-	/* For PVH no need to set R/O or R/W to pin them or unpin them. */
-	if (xen_feature(XENFEAT_auto_translated_physmap))
-		return;
-
 	if (HYPERVISOR_update_va_mapping((unsigned long)addr, pte, flags))
 		BUG();
 }
@@ -1902,8 +1898,7 @@ static void __init check_pt_base(unsigned long *pt_base, unsigned long *pt_end,
  * level2_ident_pgt, and level2_kernel_pgt.  This means that only the
  * kernel has a physical mapping to start with - but that's enough to
  * get __va working.  We need to fill in the rest of the physical
- * mapping once some sort of allocator has been set up.  NOTE: for
- * PVH, the page tables are native.
+ * mapping once some sort of allocator has been set up.
  */
 void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 {
@@ -2812,16 +2807,6 @@ static int do_remap_gfn(struct vm_area_struct *vma,
 
 	BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
-	if (xen_feature(XENFEAT_auto_translated_physmap)) {
-#ifdef CONFIG_XEN_PVH
-		/* We need to update the local page tables and the xen HAP */
-		return xen_xlate_remap_gfn_array(vma, addr, gfn, nr, err_ptr,
-						 prot, domid, pages);
-#else
-		return -EINVAL;
-#endif
-        }
-
 	rmd.mfn = gfn;
 	rmd.prot = prot;
 	/* We use the err_ptr to indicate if there we are doing a contiguous
@@ -2915,10 +2900,6 @@ int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
 	if (!pages || !xen_feature(XENFEAT_auto_translated_physmap))
 		return 0;
 
-#ifdef CONFIG_XEN_PVH
-	return xen_xlate_unmap_gfn_range(vma, numpgs, pages);
-#else
 	return -EINVAL;
-#endif
 }
 EXPORT_SYMBOL_GPL(xen_unmap_domain_gfn_range);
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index f3f7b41..a8c306c 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -915,39 +915,6 @@ char * __init xen_memory_setup(void)
 }
 
 /*
- * Machine specific memory setup for auto-translated guests.
- */
-char * __init xen_auto_xlated_memory_setup(void)
-{
-	struct xen_memory_map memmap;
-	int i;
-	int rc;
-
-	memmap.nr_entries = ARRAY_SIZE(xen_e820_map);
-	set_xen_guest_handle(memmap.buffer, xen_e820_map);
-
-	rc = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap);
-	if (rc < 0)
-		panic("No memory map (%d)\n", rc);
-
-	xen_e820_map_entries = memmap.nr_entries;
-
-	sanitize_e820_map(xen_e820_map, ARRAY_SIZE(xen_e820_map),
-			  &xen_e820_map_entries);
-
-	for (i = 0; i < xen_e820_map_entries; i++)
-		e820_add_region(xen_e820_map[i].addr, xen_e820_map[i].size,
-				xen_e820_map[i].type);
-
-	/* Remove p2m info, it is not needed. */
-	xen_start_info->mfn_list = 0;
-	xen_start_info->first_p2m_pfn = 0;
-	xen_start_info->nr_p2m_frames = 0;
-
-	return "Xen";
-}
-
-/*
  * Set the bit indicating "nosegneg" library variants should be used.
  * We only need to bother in pure 32-bit mode; compat 32-bit processes
  * can have un-truncated segments, so wrapping around is allowed.
@@ -1032,8 +999,8 @@ void __init xen_pvmmu_arch_setup(void)
 void __init xen_arch_setup(void)
 {
 	xen_panic_handler_init();
-	if (!xen_feature(XENFEAT_auto_translated_physmap))
-		xen_pvmmu_arch_setup();
+
+	xen_pvmmu_arch_setup();
 
 #ifdef CONFIG_ACPI
 	if (!(xen_start_info->flags & SIF_INITDOMAIN)) {
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 311acad..0dee6f5 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -99,18 +99,8 @@ static void cpu_bringup(void)
 	local_irq_enable();
 }
 
-/*
- * Note: cpu parameter is only relevant for PVH. The reason for passing it
- * is we can't do smp_processor_id until the percpu segments are loaded, for
- * which we need the cpu number! So we pass it in rdi as first parameter.
- */
-asmlinkage __visible void cpu_bringup_and_idle(int cpu)
+asmlinkage __visible void cpu_bringup_and_idle(void)
 {
-#ifdef CONFIG_XEN_PVH
-	if (xen_feature(XENFEAT_auto_translated_physmap) &&
-	    xen_feature(XENFEAT_supervisor_mode_kernel))
-		xen_pvh_secondary_vcpu_init(cpu);
-#endif
 	cpu_bringup();
 	cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
 }
@@ -404,61 +394,47 @@ static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
 	gdt = get_cpu_gdt_table(cpu);
 
 #ifdef CONFIG_X86_32
-	/* Note: PVH is not yet supported on x86_32. */
 	ctxt->user_regs.fs = __KERNEL_PERCPU;
 	ctxt->user_regs.gs = __KERNEL_STACK_CANARY;
 #endif
 	memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
-	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-		ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
-		ctxt->flags = VGCF_IN_KERNEL;
-		ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
-		ctxt->user_regs.ds = __USER_DS;
-		ctxt->user_regs.es = __USER_DS;
-		ctxt->user_regs.ss = __KERNEL_DS;
+	ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
+	ctxt->flags = VGCF_IN_KERNEL;
+	ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
+	ctxt->user_regs.ds = __USER_DS;
+	ctxt->user_regs.es = __USER_DS;
+	ctxt->user_regs.ss = __KERNEL_DS;
 
-		xen_copy_trap_info(ctxt->trap_ctxt);
+	xen_copy_trap_info(ctxt->trap_ctxt);
 
-		ctxt->ldt_ents = 0;
+	ctxt->ldt_ents = 0;
 
-		BUG_ON((unsigned long)gdt & ~PAGE_MASK);
+	BUG_ON((unsigned long)gdt & ~PAGE_MASK);
 
-		gdt_mfn = arbitrary_virt_to_mfn(gdt);
-		make_lowmem_page_readonly(gdt);
-		make_lowmem_page_readonly(mfn_to_virt(gdt_mfn));
+	gdt_mfn = arbitrary_virt_to_mfn(gdt);
+	make_lowmem_page_readonly(gdt);
+	make_lowmem_page_readonly(mfn_to_virt(gdt_mfn));
 
-		ctxt->gdt_frames[0] = gdt_mfn;
-		ctxt->gdt_ents      = GDT_ENTRIES;
+	ctxt->gdt_frames[0] = gdt_mfn;
+	ctxt->gdt_ents      = GDT_ENTRIES;
 
-		ctxt->kernel_ss = __KERNEL_DS;
-		ctxt->kernel_sp = idle->thread.sp0;
+	ctxt->kernel_ss = __KERNEL_DS;
+	ctxt->kernel_sp = idle->thread.sp0;
 
 #ifdef CONFIG_X86_32
-		ctxt->event_callback_cs     = __KERNEL_CS;
-		ctxt->failsafe_callback_cs  = __KERNEL_CS;
+	ctxt->event_callback_cs     = __KERNEL_CS;
+	ctxt->failsafe_callback_cs  = __KERNEL_CS;
 #else
-		ctxt->gs_base_kernel = per_cpu_offset(cpu);
-#endif
-		ctxt->event_callback_eip    =
-					(unsigned long)xen_hypervisor_callback;
-		ctxt->failsafe_callback_eip =
-					(unsigned long)xen_failsafe_callback;
-		ctxt->user_regs.cs = __KERNEL_CS;
-		per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
-	}
-#ifdef CONFIG_XEN_PVH
-	else {
-		/*
-		 * The vcpu comes on kernel page tables which have the NX pte
-		 * bit set. This means before DS/SS is touched, NX in
-		 * EFER must be set. Hence the following assembly glue code.
-		 */
-		ctxt->user_regs.eip = (unsigned long)xen_pvh_early_cpu_init;
-		ctxt->user_regs.rdi = cpu;
-		ctxt->user_regs.rsi = true;  /* entry == true */
-	}
+	ctxt->gs_base_kernel = per_cpu_offset(cpu);
 #endif
+	ctxt->event_callback_eip    =
+		(unsigned long)xen_hypervisor_callback;
+	ctxt->failsafe_callback_eip =
+		(unsigned long)xen_failsafe_callback;
+	ctxt->user_regs.cs = __KERNEL_CS;
+	per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
+
 	ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
 	ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_gfn(swapper_pg_dir));
 	if (HYPERVISOR_vcpu_op(VCPUOP_initialise, xen_vcpu_nr(cpu), ctxt))
diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
index c5c16dc..9beef33 100644
--- a/arch/x86/xen/smp.h
+++ b/arch/x86/xen/smp.h
@@ -21,12 +21,4 @@ static inline int xen_smp_intr_init(unsigned int cpu)
 static inline void xen_smp_intr_free(unsigned int cpu) {}
 #endif /* CONFIG_SMP */
 
-#ifdef CONFIG_XEN_PVH
-extern void xen_pvh_early_cpu_init(int cpu, bool entry);
-#else
-static inline void xen_pvh_early_cpu_init(int cpu, bool entry)
-{
-}
-#endif
-
 #endif
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 7f8d8ab..37794e4 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -16,25 +16,6 @@
 #include <xen/interface/xen-mca.h>
 #include <asm/xen/interface.h>
 
-#ifdef CONFIG_XEN_PVH
-#define PVH_FEATURES_STR  "|writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel"
-/* Note the lack of 'hvm_callback_vector'. Older hypervisor will
- * balk at this being part of XEN_ELFNOTE_FEATURES, so we put it in
- * XEN_ELFNOTE_SUPPORTED_FEATURES which older hypervisors will ignore.
- */
-#define PVH_FEATURES ((1 << XENFEAT_writable_page_tables) | \
-		      (1 << XENFEAT_auto_translated_physmap) | \
-		      (1 << XENFEAT_supervisor_mode_kernel) | \
-		      (1 << XENFEAT_hvm_callback_vector))
-/* The XENFEAT_writable_page_tables is not stricly necessary as we set that
- * up regardless whether this CONFIG option is enabled or not, but it
- * clarifies what the right flags need to be.
- */
-#else
-#define PVH_FEATURES_STR  ""
-#define PVH_FEATURES (0)
-#endif
-
 	__INIT
 ENTRY(startup_xen)
 	cld
@@ -54,41 +35,6 @@ ENTRY(startup_xen)
 
 	__FINIT
 
-#ifdef CONFIG_XEN_PVH
-/*
- * xen_pvh_early_cpu_init() - early PVH VCPU initialization
- * @cpu:   this cpu number (%rdi)
- * @entry: true if this is a secondary vcpu coming up on this entry
- *         point, false if this is the boot CPU being initialized for
- *         the first time (%rsi)
- *
- * Note: This is called as a function on the boot CPU, and is the entry point
- *       on the secondary CPU.
- */
-ENTRY(xen_pvh_early_cpu_init)
-	mov     %rsi, %r11
-
-	/* Gather features to see if NX implemented. */
-	mov     $0x80000001, %eax
-	cpuid
-	mov     %edx, %esi
-
-	mov     $MSR_EFER, %ecx
-	rdmsr
-	bts     $_EFER_SCE, %eax
-
-	bt      $20, %esi
-	jnc     1f      	/* No NX, skip setting it */
-	bts     $_EFER_NX, %eax
-1:	wrmsr
-#ifdef CONFIG_SMP
-	cmp     $0, %r11b
-	jne     cpu_bringup_and_idle
-#endif
-	ret
-
-#endif /* CONFIG_XEN_PVH */
-
 .pushsection .text
 	.balign PAGE_SIZE
 ENTRY(hypercall_page)
@@ -114,10 +60,10 @@ ENTRY(hypercall_page)
 #endif
 	ELFNOTE(Xen, XEN_ELFNOTE_ENTRY,          _ASM_PTR startup_xen)
 	ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, _ASM_PTR hypercall_page)
-	ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,       .ascii "!writable_page_tables|pae_pgdir_above_4gb"; .asciz PVH_FEATURES_STR)
-	ELFNOTE(Xen, XEN_ELFNOTE_SUPPORTED_FEATURES, .long (PVH_FEATURES) |
-						(1 << XENFEAT_writable_page_tables) |
-						(1 << XENFEAT_dom0))
+	ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,
+		.ascii "!writable_page_tables|pae_pgdir_above_4gb")
+	ELFNOTE(Xen, XEN_ELFNOTE_SUPPORTED_FEATURES,
+		.long (1 << XENFEAT_writable_page_tables) | (1 << XENFEAT_dom0))
 	ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE,       .asciz "yes")
 	ELFNOTE(Xen, XEN_ELFNOTE_LOADER,         .asciz "generic")
 	ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID,
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index ac0a2b0..f6a41c4 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -146,5 +146,4 @@ static inline void __init xen_efi_init(void)
 
 extern int xen_panic_handler_init(void);
 
-void xen_pvh_secondary_vcpu_init(int cpu);
 #endif /* XEN_OPS_H */
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index fd8e872..6a53577 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -1704,7 +1704,6 @@ void __init xen_init_IRQ(void)
 		pirq_eoi_map = (void *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
 		eoi_gmfn.gmfn = virt_to_gfn(pirq_eoi_map);
 		rc = HYPERVISOR_physdev_op(PHYSDEVOP_pirq_eoi_gmfn_v2, &eoi_gmfn);
-		/* TODO: No PVH support for PIRQ EOI */
 		if (rc != 0) {
 			free_page((unsigned long) pirq_eoi_map);
 			pirq_eoi_map = NULL;
diff --git a/include/xen/xen.h b/include/xen/xen.h
index f0f0252..d0f9684 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -29,17 +29,6 @@ enum xen_domain_type {
 #define xen_initial_domain()	(0)
 #endif	/* CONFIG_XEN_DOM0 */
 
-#ifdef CONFIG_XEN_PVH
-/* This functionality exists only for x86. The XEN_PVHVM support exists
- * only in x86 world - hence on ARM it will be always disabled.
- * N.B. ARM guests are neither PV nor HVM nor PVHVM.
- * It's a bit like PVH but is different also (it's further towards the H
- * end of the spectrum than even PVH).
- */
-#include <xen/features.h>
-#define xen_pvh_domain() (xen_pv_domain() && \
-			  xen_feature(XENFEAT_auto_translated_physmap))
-#else
 #define xen_pvh_domain()	(0)
-#endif
+
 #endif	/* _XEN_XEN_H */
-- 
1.8.3.1

* [PATCH v2 3/9] xen/pvh: Import PVH-related Xen public interfaces
  2017-01-26 19:41 [PATCH v2 0/9] PVH v2 support (domU) Boris Ostrovsky
  2017-01-26 19:41 ` [PATCH v2 1/9] x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C Boris Ostrovsky
  2017-01-26 19:41 ` [PATCH v2 2/9] xen/x86: Remove PVH support Boris Ostrovsky
@ 2017-01-26 19:41 ` Boris Ostrovsky
  2017-01-26 19:41 ` [PATCH v2 4/9] xen/pvh: Bootstrap PVH guest Boris Ostrovsky
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2017-01-26 19:41 UTC (permalink / raw)
  To: JGross; +Cc: roger.pau, xen-devel, linux-kernel, boris.ostrovsky

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
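Reader's note (illustrative, not part of the patch): the byte offsets in
the start_info.h diagram added below can be cross-checked against the C
structures with a small user-space program. This is only a sketch. It
copies the two structures from the header and assumes a C11 compiler;
hvm_magic_from_ascii() is a hypothetical helper, not a Xen or Linux
interface.

```c
#include <stddef.h>
#include <stdint.h>

#define XEN_HVM_START_MAGIC_VALUE 0x336ec578

/* Copied from the start_info.h added by this patch. */
struct hvm_start_info {
    uint32_t magic;
    uint32_t version;
    uint32_t flags;
    uint32_t nr_modules;
    uint64_t modlist_paddr;
    uint64_t cmdline_paddr;
    uint64_t rsdp_paddr;
};

struct hvm_modlist_entry {
    uint64_t paddr;
    uint64_t size;
    uint64_t cmdline_paddr;
    uint64_t reserved;
};

/* Offsets and sizes as drawn in the header's layout diagram. */
_Static_assert(offsetof(struct hvm_start_info, magic) == 0, "magic");
_Static_assert(offsetof(struct hvm_start_info, version) == 4, "version");
_Static_assert(offsetof(struct hvm_start_info, flags) == 8, "flags");
_Static_assert(offsetof(struct hvm_start_info, nr_modules) == 12, "nr_modules");
_Static_assert(offsetof(struct hvm_start_info, modlist_paddr) == 16, "modlist");
_Static_assert(offsetof(struct hvm_start_info, cmdline_paddr) == 24, "cmdline");
_Static_assert(offsetof(struct hvm_start_info, rsdp_paddr) == 32, "rsdp");
_Static_assert(sizeof(struct hvm_start_info) == 40, "start_info size");
_Static_assert(sizeof(struct hvm_modlist_entry) == 32, "modlist entry size");

/*
 * Hypothetical helper: rebuild the magic from the ASCII string "xEn3",
 * stored little-endian, with the 0x80 bit of the 'E' set.
 */
static uint32_t hvm_magic_from_ascii(void)
{
    const uint8_t b[4] = { 'x', 'E' | 0x80, 'n', '3' };

    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

The static assertions fail at compile time if a field is moved or
resized, which makes a sketch like this a cheap guard when re-importing
the header.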
 include/xen/interface/elfnote.h        |  12 ++-
 include/xen/interface/hvm/hvm_vcpu.h   | 143 +++++++++++++++++++++++++++++++++
 include/xen/interface/hvm/start_info.h |  98 ++++++++++++++++++++++
 3 files changed, 252 insertions(+), 1 deletion(-)
 create mode 100644 include/xen/interface/hvm/hvm_vcpu.h
 create mode 100644 include/xen/interface/hvm/start_info.h
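
Reader's note (illustrative, not part of the patch): the comment in
hvm_vcpu.h describes the _ar fields as the attribute bits of a segment
descriptor (bits 40-47 and 52-55) packed into 12 bits. A minimal sketch
of that packing, assuming the input is a standard 8-byte GDT descriptor;
ar_from_descriptor() is a hypothetical name, not something Xen or Linux
provides.

```c
#include <stdint.h>

/*
 * Pack the attribute bits of a 64-bit segment descriptor into the
 * 16-bit _ar layout described in hvm_vcpu.h:
 *   ar bits 0-7   <- descriptor bits 40-47 (type, s, dpl, p)
 *   ar bits 8-11  <- descriptor bits 52-55 (avl, l, db, g)
 *   ar bits 12-15 <- left as zero
 */
static uint16_t ar_from_descriptor(uint64_t desc)
{
    uint16_t low = (uint16_t)((desc >> 40) & 0xff);
    uint16_t high = (uint16_t)((desc >> 52) & 0x0f);

    return (uint16_t)(low | (high << 8));
}
```

For example, a flat 32-bit code segment descriptor such as
0x00cf9a000000ffff packs to 0xc9a under this layout.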

diff --git a/include/xen/interface/elfnote.h b/include/xen/interface/elfnote.h
index f90b034..9e9f9bf 100644
--- a/include/xen/interface/elfnote.h
+++ b/include/xen/interface/elfnote.h
@@ -193,9 +193,19 @@
 #define XEN_ELFNOTE_SUPPORTED_FEATURES 17
 
 /*
+ * Physical entry point into the kernel.
+ *
+ * 32bit entry point into the kernel. When requested to launch the
+ * guest kernel in a HVM container, Xen will use this entry point to
+ * launch the guest in 32bit protected mode with paging disabled.
+ * Ignored otherwise.
+ */
+#define XEN_ELFNOTE_PHYS32_ENTRY 18
+
+/*
  * The number of the highest elfnote defined.
  */
-#define XEN_ELFNOTE_MAX XEN_ELFNOTE_SUPPORTED_FEATURES
+#define XEN_ELFNOTE_MAX XEN_ELFNOTE_PHYS32_ENTRY
 
 #endif /* __XEN_PUBLIC_ELFNOTE_H__ */
 
diff --git a/include/xen/interface/hvm/hvm_vcpu.h b/include/xen/interface/hvm/hvm_vcpu.h
new file mode 100644
index 0000000..32ca83e
--- /dev/null
+++ b/include/xen/interface/hvm/hvm_vcpu.h
@@ -0,0 +1,143 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2015, Roger Pau Monne <roger.pau@citrix.com>
+ */
+
+#ifndef __XEN_PUBLIC_HVM_HVM_VCPU_H__
+#define __XEN_PUBLIC_HVM_HVM_VCPU_H__
+
+#include "../xen.h"
+
+struct vcpu_hvm_x86_32 {
+    uint32_t eax;
+    uint32_t ecx;
+    uint32_t edx;
+    uint32_t ebx;
+    uint32_t esp;
+    uint32_t ebp;
+    uint32_t esi;
+    uint32_t edi;
+    uint32_t eip;
+    uint32_t eflags;
+
+    uint32_t cr0;
+    uint32_t cr3;
+    uint32_t cr4;
+
+    uint32_t pad1;
+
+    /*
+     * EFER should only be used to set the NXE bit (if required)
+     * when starting a vCPU in 32bit mode with paging enabled or
+     * to set the LME/LMA bits in order to start the vCPU in
+     * compatibility mode.
+     */
+    uint64_t efer;
+
+    uint32_t cs_base;
+    uint32_t ds_base;
+    uint32_t ss_base;
+    uint32_t es_base;
+    uint32_t tr_base;
+    uint32_t cs_limit;
+    uint32_t ds_limit;
+    uint32_t ss_limit;
+    uint32_t es_limit;
+    uint32_t tr_limit;
+    uint16_t cs_ar;
+    uint16_t ds_ar;
+    uint16_t ss_ar;
+    uint16_t es_ar;
+    uint16_t tr_ar;
+
+    uint16_t pad2[3];
+};
+
+/*
+ * The layout of the _ar fields of the segment registers is the
+ * following:
+ *
+ * Bits   [0,3]: type (bits 40-43).
+ * Bit        4: s    (descriptor type, bit 44).
+ * Bit    [5,6]: dpl  (descriptor privilege level, bits 45-46).
+ * Bit        7: p    (segment-present, bit 47).
+ * Bit        8: avl  (available for system software, bit 52).
+ * Bit        9: l    (64-bit code segment, bit 53).
+ * Bit       10: db   (meaning depends on the segment, bit 54).
+ * Bit       11: g    (granularity, bit 55)
+ * Bits [12,15]: unused, must be blank.
+ *
+ * A more complete description of the meaning of these fields can be
+ * obtained from the Intel SDM, Volume 3, section 3.4.5.
+ */
+
+struct vcpu_hvm_x86_64 {
+    uint64_t rax;
+    uint64_t rcx;
+    uint64_t rdx;
+    uint64_t rbx;
+    uint64_t rsp;
+    uint64_t rbp;
+    uint64_t rsi;
+    uint64_t rdi;
+    uint64_t rip;
+    uint64_t rflags;
+
+    uint64_t cr0;
+    uint64_t cr3;
+    uint64_t cr4;
+    uint64_t efer;
+
+    /*
+     * Using VCPU_HVM_MODE_64B implies that the vCPU is launched
+     * directly in long mode, so the cached parts of the segment
+     * registers get set to match that environment.
+     *
+     * If the user wants to launch the vCPU in compatibility mode
+     * the 32-bit structure should be used instead.
+     */
+};
+
+struct vcpu_hvm_context {
+#define VCPU_HVM_MODE_32B 0  /* 32bit fields of the structure will be used. */
+#define VCPU_HVM_MODE_64B 1  /* 64bit fields of the structure will be used. */
+    uint32_t mode;
+
+    uint32_t pad;
+
+    /* CPU registers. */
+    union {
+        struct vcpu_hvm_x86_32 x86_32;
+        struct vcpu_hvm_x86_64 x86_64;
+    } cpu_regs;
+};
+typedef struct vcpu_hvm_context vcpu_hvm_context_t;
+
+#endif /* __XEN_PUBLIC_HVM_HVM_VCPU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/include/xen/interface/hvm/start_info.h b/include/xen/interface/hvm/start_info.h
new file mode 100644
index 0000000..6484159
--- /dev/null
+++ b/include/xen/interface/hvm/start_info.h
@@ -0,0 +1,98 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2016, Citrix Systems, Inc.
+ */
+
+#ifndef __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__
+#define __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__
+
+/*
+ * Start of day structure passed to PVH guests and to HVM guests in %ebx.
+ *
+ * NOTE: nothing will be loaded at physical address 0, so a 0 value in any
+ * of the address fields should be treated as not present.
+ *
+ *  0 +----------------+
+ *    | magic          | Contains the magic value XEN_HVM_START_MAGIC_VALUE
+ *    |                | ("xEn3" with the 0x80 bit of the "E" set).
+ *  4 +----------------+
+ *    | version        | Version of this structure. Current version is 0. New
+ *    |                | versions are guaranteed to be backwards-compatible.
+ *  8 +----------------+
+ *    | flags          | SIF_xxx flags.
+ * 12 +----------------+
+ *    | nr_modules     | Number of modules passed to the kernel.
+ * 16 +----------------+
+ *    | modlist_paddr  | Physical address of an array of modules
+ *    |                | (layout of the structure below).
+ * 24 +----------------+
+ *    | cmdline_paddr  | Physical address of the command line,
+ *    |                | a zero-terminated ASCII string.
+ * 32 +----------------+
+ *    | rsdp_paddr     | Physical address of the RSDP ACPI data structure.
+ * 40 +----------------+
+ *
+ * The layout of each entry in the module structure is the following:
+ *
+ *  0 +----------------+
+ *    | paddr          | Physical address of the module.
+ *  8 +----------------+
+ *    | size           | Size of the module in bytes.
+ * 16 +----------------+
+ *    | cmdline_paddr  | Physical address of the command line,
+ *    |                | a zero-terminated ASCII string.
+ * 24 +----------------+
+ *    | reserved       |
+ * 32 +----------------+
+ *
+ * The addresses and sizes are always 64bit little endian unsigned integers.
+ *
+ * NB: Xen on x86 will always try to place all the data below the 4GiB
+ * boundary.
+ */
+#define XEN_HVM_START_MAGIC_VALUE 0x336ec578
+
+/*
+ * C representation of the x86/HVM start info layout.
+ *
+ * The canonical definition of this layout is above, this is just a way to
+ * represent the layout described there using C types.
+ */
+struct hvm_start_info {
+    uint32_t magic;             /* Contains the magic value 0x336ec578       */
+                                /* ("xEn3" with the 0x80 bit of the "E" set).*/
+    uint32_t version;           /* Version of this structure.                */
+    uint32_t flags;             /* SIF_xxx flags.                            */
+    uint32_t nr_modules;        /* Number of modules passed to the kernel.   */
+    uint64_t modlist_paddr;     /* Physical address of an array of           */
+                                /* hvm_modlist_entry.                        */
+    uint64_t cmdline_paddr;     /* Physical address of the command line.     */
+    uint64_t rsdp_paddr;        /* Physical address of the RSDP ACPI data    */
+                                /* structure.                                */
+};
+
+struct hvm_modlist_entry {
+    uint64_t paddr;             /* Physical address of the module.           */
+    uint64_t size;              /* Size of the module in bytes.              */
+    uint64_t cmdline_paddr;     /* Physical address of the command line.     */
+    uint64_t reserved;
+};
+
+#endif /* __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__ */
-- 
1.8.3.1

* [PATCH v2 4/9] xen/pvh: Bootstrap PVH guest
  2017-01-26 19:41 [PATCH v2 0/9] PVH v2 support (domU) Boris Ostrovsky
                   ` (2 preceding siblings ...)
  2017-01-26 19:41 ` [PATCH v2 3/9] xen/pvh: Import PVH-related Xen public interfaces Boris Ostrovsky
@ 2017-01-26 19:41 ` Boris Ostrovsky
  2017-02-03  7:24   ` Juergen Gross
  2017-01-26 19:41 ` [PATCH v2 5/9] xen/pvh: Prevent PVH guests from using PIC, RTC and IOAPIC Boris Ostrovsky
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Boris Ostrovsky @ 2017-01-26 19:41 UTC (permalink / raw)
  To: JGross; +Cc: roger.pau, xen-devel, linux-kernel, boris.ostrovsky

Start PVH guests at the XEN_ELFNOTE_PHYS32_ENTRY address. Set up the
hypercall page, initialize boot_params and enable early page tables.

Since this stub is executed before the kernel entry point, we cannot use
variables in .bss, which is cleared by the kernel. We explicitly place
variables that are initialized here into .data.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v2:
* Assembly cleanup
* Check for e820 size in init_pvh_bootparams()
* Check XEN_HVM_START_MAGIC_VALUE in start_info
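
Reader's note (illustrative, not part of the patch): the e820 size check
mentioned above guards the append of the ISA range to the map returned by
Xen. A user-space sketch of that bounds-checked append follows. The
constants are assumptions mirroring the kernel's E820MAX, E820_RESERVED,
ISA_START_ADDRESS and ISA_END_ADDRESS at the time of this series, and
e820_append_isa() is a hypothetical helper.

```c
#include <stdint.h>

/* Assumed values, mirroring the kernel definitions of the time. */
#define E820_MAX_ENTRIES   128          /* E820MAX */
#define E820_TYPE_RESERVED 2            /* E820_RESERVED */
#define ISA_START          0xa0000ULL   /* ISA_START_ADDRESS */
#define ISA_END            0x100000ULL  /* ISA_END_ADDRESS */

struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Append the ISA range as a reserved region.
 * Returns 0 on success, -1 if the table is already full (the entry
 * count is then left untouched).
 */
static int e820_append_isa(struct e820_entry *map, uint32_t *nr_entries)
{
    if (*nr_entries >= E820_MAX_ENTRIES)
        return -1;

    map[*nr_entries].addr = ISA_START;
    map[*nr_entries].size = ISA_END - ISA_START;
    map[*nr_entries].type = E820_TYPE_RESERVED;
    (*nr_entries)++;

    return 0;
}
```

In the patch the full-table case only prints a warning and carries on;
returning an error here is a choice made for the sketch.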


 arch/x86/xen/Kconfig     |   2 +-
 arch/x86/xen/Makefile    |   1 +
 arch/x86/xen/enlighten.c |  98 ++++++++++++++++++++++++++++++++-
 arch/x86/xen/xen-pvh.S   | 137 +++++++++++++++++++++++++++++++++++++++++++++++
 include/xen/xen.h        |   5 ++
 5 files changed, 241 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/xen/xen-pvh.S
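
Reader's note (illustrative, not part of the patch): the entry stub
stashes hvm_start_info with "shr $2, %ecx; rep movsl", i.e. it copies
size / 4 dwords. A C sketch of the same arithmetic shows why the
structure size has to be a multiple of 4 for the copy to be complete
(sizeof(struct hvm_start_info) is 40, so it is). copy_dwords() is a
hypothetical helper used only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy size_bytes rounded down to a whole number of 32-bit words,
 * mirroring "shr $2, %ecx; rep movsl". Returns the number of bytes
 * actually copied.
 */
static size_t copy_dwords(void *dst, const void *src, size_t size_bytes)
{
    size_t ndwords = size_bytes >> 2;  /* shr $2, %ecx */

    memcpy(dst, src, ndwords * 4);     /* rep movsl */
    return ndwords * 4;
}
```

With a 40-byte structure all 40 bytes are copied; a hypothetical 42-byte
structure would lose its last two bytes.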

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index c7b15f3..76b6dbd 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -53,5 +53,5 @@ config XEN_DEBUG_FS
 
 config XEN_PVH
 	bool "Support for running as a PVH guest"
-	depends on X86_64 && XEN && XEN_PVHVM
+	depends on XEN && XEN_PVHVM && ACPI
 	def_bool n
diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index e47e527..cb0164a 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -23,3 +23,4 @@ obj-$(CONFIG_XEN_DEBUG_FS)	+= debugfs.o
 obj-$(CONFIG_XEN_DOM0)		+= vga.o
 obj-$(CONFIG_SWIOTLB_XEN)	+= pci-swiotlb-xen.o
 obj-$(CONFIG_XEN_EFI)		+= efi.o
+obj-$(CONFIG_XEN_PVH)		+= xen-pvh.o
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 828f1b2..c82fe14 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -45,6 +45,7 @@
 #include <xen/interface/memory.h>
 #include <xen/interface/nmi.h>
 #include <xen/interface/xen-mca.h>
+#include <xen/interface/hvm/start_info.h>
 #include <xen/features.h>
 #include <xen/page.h>
 #include <xen/hvm.h>
@@ -121,7 +122,8 @@
 DEFINE_PER_CPU(uint32_t, xen_vcpu_id);
 EXPORT_PER_CPU_SYMBOL(xen_vcpu_id);
 
-enum xen_domain_type xen_domain_type = XEN_NATIVE;
+enum xen_domain_type xen_domain_type
+	__attribute__((section(".data"))) = XEN_NATIVE;
 EXPORT_SYMBOL_GPL(xen_domain_type);
 
 unsigned long *machine_to_phys_mapping = (void *)MACH2PHYS_VIRT_START;
@@ -176,6 +178,17 @@ struct tls_descs {
  */
 static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
 
+#ifdef CONFIG_XEN_PVH
+/*
+ * PVH variables. These need to live in the data segment since they are
+ * initialized before startup_{32|64} (which clear .bss) are invoked.
+ */
+bool xen_pvh __attribute__((section(".data"))) = 0;
+struct hvm_start_info pvh_start_info __attribute__((section(".data")));
+unsigned int pvh_start_info_sz = sizeof(pvh_start_info);
+struct boot_params pvh_bootparams __attribute__((section(".data")));
+#endif
+
 static void clamp_max_cpus(void)
 {
 #ifdef CONFIG_SMP
@@ -1656,6 +1669,89 @@ asmlinkage __visible void __init xen_start_kernel(void)
 #endif
 }
 
+#ifdef CONFIG_XEN_PVH
+static void __init init_pvh_bootparams(void)
+{
+	struct xen_memory_map memmap;
+	unsigned int i;
+	int rc;
+
+	memset(&pvh_bootparams, 0, sizeof(pvh_bootparams));
+
+	memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_map);
+	set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_map);
+	rc = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap);
+	if (rc) {
+		xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc);
+		BUG();
+	}
+
+	if (memmap.nr_entries < E820MAX) {
+		pvh_bootparams.e820_map[memmap.nr_entries].addr =
+			ISA_START_ADDRESS;
+		pvh_bootparams.e820_map[memmap.nr_entries].size =
+			ISA_END_ADDRESS - ISA_START_ADDRESS;
+		pvh_bootparams.e820_map[memmap.nr_entries++].type =
+			E820_RESERVED;
+	} else
+		xen_raw_printk("Warning: Can't fit ISA range into e820\n");
+
+	sanitize_e820_map(pvh_bootparams.e820_map,
+			  ARRAY_SIZE(pvh_bootparams.e820_map),
+			  &memmap.nr_entries);
+
+	pvh_bootparams.e820_entries = memmap.nr_entries;
+	for (i = 0; i < pvh_bootparams.e820_entries; i++)
+		e820_add_region(pvh_bootparams.e820_map[i].addr,
+				pvh_bootparams.e820_map[i].size,
+				pvh_bootparams.e820_map[i].type);
+
+	pvh_bootparams.hdr.cmd_line_ptr =
+		pvh_start_info.cmdline_paddr;
+
+	/* The first module is always ramdisk. */
+	if (pvh_start_info.nr_modules) {
+		struct hvm_modlist_entry *modaddr =
+			__va(pvh_start_info.modlist_paddr);
+		pvh_bootparams.hdr.ramdisk_image = modaddr->paddr;
+		pvh_bootparams.hdr.ramdisk_size = modaddr->size;
+	}
+
+	/*
+	 * See Documentation/x86/boot.txt.
+	 *
+	 * Version 2.12 supports Xen entry point but we will use default x86/PC
+	 * environment (i.e. hardware_subarch 0).
+	 */
+	pvh_bootparams.hdr.version = 0x212;
+	pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */
+}
+
+/*
+ * This routine (and those that it might call) should not use
+ * anything that lives in .bss since that segment will be cleared later.
+ */
+void __init xen_prepare_pvh(void)
+{
+	u32 msr;
+	u64 pfn;
+
+	if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) {
+		xen_raw_printk("Error: Unexpected magic value (0x%08x)\n",
+				pvh_start_info.magic);
+		BUG();
+	}
+
+	xen_pvh = 1;
+
+	msr = cpuid_ebx(xen_cpuid_base() + 2);
+	pfn = __pa(hypercall_page);
+	wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
+
+	init_pvh_bootparams();
+}
+#endif
+
 void __ref xen_hvm_init_shared_info(void)
 {
 	int cpu;
diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/xen/xen-pvh.S
new file mode 100644
index 0000000..410036a
--- /dev/null
+++ b/arch/x86/xen/xen-pvh.S
@@ -0,0 +1,137 @@
+/*
+ * Copyright C 2016, Oracle and/or its affiliates. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+	.code32
+	.text
+#define _pa(x)          ((x) - __START_KERNEL_map)
+
+#include <linux/elfnote.h>
+#include <linux/init.h>
+#include <linux/linkage.h>
+#include <asm/segment.h>
+#include <asm/asm.h>
+#include <asm/boot.h>
+#include <asm/processor-flags.h>
+#include <asm/msr.h>
+#include <xen/interface/elfnote.h>
+
+	__HEAD
+
+/* Entry point for PVH guests. */
+ENTRY(pvh_start_xen)
+	cld
+
+	lgdt (_pa(gdt))
+
+	mov $(__BOOT_DS),%eax
+	mov %eax,%ds
+	mov %eax,%es
+	mov %eax,%ss
+
+	/* Stash hvm_start_info. */
+	mov $_pa(pvh_start_info), %edi
+	mov %ebx, %esi
+	mov _pa(pvh_start_info_sz), %ecx
+	shr $2,%ecx
+	rep
+	movsl
+
+	mov $_pa(early_stack_end), %esp
+
+	/* Enable PAE mode. */
+	mov %cr4, %eax
+	orl $X86_CR4_PAE, %eax
+	mov %eax, %cr4
+
+#ifdef CONFIG_X86_64
+	/* Enable Long mode. */
+	mov $MSR_EFER, %ecx
+	rdmsr
+	btsl $_EFER_LME, %eax
+	wrmsr
+
+	/* Enable pre-constructed page tables. */
+	mov $_pa(init_level4_pgt), %eax
+	mov %eax, %cr3
+	mov $(X86_CR0_PG | X86_CR0_PE), %eax
+	mov %eax, %cr0
+
+	/* Jump to 64-bit mode. */
+	ljmp $__KERNEL_CS, $_pa(1f)
+
+	/* 64-bit entry point. */
+	.code64
+1:
+	call xen_prepare_pvh
+
+	/* startup_64 expects boot_params in %rsi. */
+	mov $_pa(pvh_bootparams), %rsi
+	mov $_pa(startup_64), %rax
+	jmp *%rax
+
+#else /* CONFIG_X86_64 */
+
+	call mk_early_pgtbl_32
+
+	mov $_pa(initial_page_table), %eax
+	mov %eax, %cr3
+
+	mov %cr0, %eax
+	or $(X86_CR0_PG | X86_CR0_PE), %eax
+	mov %eax, %cr0
+
+	ljmp $__BOOT_CS, $1f
+1:
+	call xen_prepare_pvh
+	mov $_pa(pvh_bootparams), %esi
+
+	/* startup_32 doesn't expect paging and PAE to be on. */
+	ljmp $__BOOT_CS, $_pa(2f)
+2:
+	mov %cr0, %eax
+	and $~X86_CR0_PG, %eax
+	mov %eax, %cr0
+	mov %cr4, %eax
+	and $~X86_CR4_PAE, %eax
+	mov %eax, %cr4
+
+	ljmp    $0x10, $_pa(startup_32)
+#endif
+ENDPROC(pvh_start_xen)
+
+	.data
+gdt:
+	.word	gdt_end - gdt
+	.long	_pa(gdt)
+	.word	0
+	.quad	0x0000000000000000 /* NULL descriptor */
+#ifdef CONFIG_X86_64
+	.quad	0x00af9a000000ffff /* __KERNEL_CS */
+#else
+	.quad	0x00cf9a000000ffff /* __KERNEL_CS */
+#endif
+	.quad	0x00cf92000000ffff /* __KERNEL_DS */
+gdt_end:
+
+	.bss
+	.balign 4
+early_stack:
+	.fill 16, 1, 0
+early_stack_end:
+
+	ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY,
+	             _ASM_PTR (pvh_start_xen - __START_KERNEL_map))
diff --git a/include/xen/xen.h b/include/xen/xen.h
index d0f9684..6e8b7fc 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -29,6 +29,11 @@ enum xen_domain_type {
 #define xen_initial_domain()	(0)
 #endif	/* CONFIG_XEN_DOM0 */
 
+#ifdef CONFIG_XEN_PVH
+extern bool xen_pvh;
+#define xen_pvh_domain()	(xen_hvm_domain() && xen_pvh)
+#else
 #define xen_pvh_domain()	(0)
+#endif
 
 #endif	/* _XEN_XEN_H */
-- 
1.8.3.1

* [PATCH v2 5/9] xen/pvh: Prevent PVH guests from using PIC, RTC and IOAPIC
  2017-01-26 19:41 [PATCH v2 0/9] PVH v2 support (domU) Boris Ostrovsky
                   ` (3 preceding siblings ...)
  2017-01-26 19:41 ` [PATCH v2 4/9] xen/pvh: Bootstrap PVH guest Boris Ostrovsky
@ 2017-01-26 19:41 ` Boris Ostrovsky
  2017-02-02 15:23   ` Juergen Gross
  2017-02-02 15:35   ` Roger Pau Monné
  2017-01-26 19:41 ` [PATCH v2 6/9] xen/pvh: Initialize grant table for PVH guests Boris Ostrovsky
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2017-01-26 19:41 UTC (permalink / raw)
  To: JGross; +Cc: roger.pau, xen-devel, linux-kernel, boris.ostrovsky

Make sure PVH guests don't use these devices since they are not emulated
for unprivileged PVH guests.

Also don't initialize the hypercall page for them in init_hvm_pv_info()
since this has already been done.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v2:
* Use cpuid_ebx() instead of cpuid()
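
Reader's note (illustrative, not part of the patch): init_hvm_pv_info()
reads EAX of the Xen CPUID leaf base + 1 and splits it into a major and a
minor version number. A stand-alone sketch of that decoding;
decode_xen_version() and struct xen_version are hypothetical names
introduced for the example.

```c
#include <stdint.h>

struct xen_version {
    unsigned int major;
    unsigned int minor;
};

/* Upper 16 bits of EAX hold the major version, lower 16 the minor. */
static struct xen_version decode_xen_version(uint32_t eax)
{
    struct xen_version v;

    v.major = eax >> 16;
    v.minor = eax & 0xffff;
    return v;
}
```

An EAX value of 0x00040008 would therefore be reported as Xen version 4.8.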

 arch/x86/xen/enlighten.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c82fe14..6463382 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1791,20 +1791,32 @@ void __ref xen_hvm_init_shared_info(void)
 static void __init init_hvm_pv_info(void)
 {
 	int major, minor;
-	uint32_t eax, ebx, ecx, edx, pages, msr, base;
-	u64 pfn;
+	uint32_t eax, ebx, ecx, edx, msr, base;
 
 	base = xen_cpuid_base();
-	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
+	eax = cpuid_eax(base + 1);
 
 	major = eax >> 16;
 	minor = eax & 0xffff;
 	printk(KERN_INFO "Xen version %d.%d.\n", major, minor);
 
-	cpuid(base + 2, &pages, &msr, &ecx, &edx);
+	xen_domain_type = XEN_HVM_DOMAIN;
 
-	pfn = __pa(hypercall_page);
-	wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
+	/* PVH set up hypercall page earlier in xen_prepare_pvh(). */
+	if (xen_pvh_domain()) {
+		pv_info.name = "Xen PVH";
+#ifdef CONFIG_ACPI
+		/* No PIC or IOAPIC */
+		acpi_irq_model = ACPI_IRQ_MODEL_PLATFORM;
+#endif
+	} else {
+		u64 pfn;
+
+		pv_info.name = "Xen HVM";
+		msr = cpuid_ebx(base + 2);
+		pfn = __pa(hypercall_page);
+		wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
+	}
 
 	xen_setup_features();
 
@@ -1813,10 +1825,6 @@ static void __init init_hvm_pv_info(void)
 		this_cpu_write(xen_vcpu_id, ebx);
 	else
 		this_cpu_write(xen_vcpu_id, smp_processor_id());
-
-	pv_info.name = "Xen HVM";
-
-	xen_domain_type = XEN_HVM_DOMAIN;
 }
 #endif
 
@@ -1892,6 +1900,9 @@ static void __init xen_hvm_guest_init(void)
 
 	init_hvm_pv_info();
 
+	if (xen_pvh_domain())
+		x86_platform.legacy.rtc = 0;
+
 	xen_hvm_init_shared_info();
 
 	xen_panic_handler_init();
-- 
1.8.3.1

* [PATCH v2 6/9] xen/pvh: Initialize grant table for PVH guests
  2017-01-26 19:41 [PATCH v2 0/9] PVH v2 support (domU) Boris Ostrovsky
                   ` (4 preceding siblings ...)
  2017-01-26 19:41 ` [PATCH v2 5/9] xen/pvh: Prevent PVH guests from using PIC, RTC and IOAPIC Boris Ostrovsky
@ 2017-01-26 19:41 ` Boris Ostrovsky
  2017-01-27 15:38   ` Juergen Gross
  2017-01-26 19:41 ` [PATCH v2 7/9] xen/pvh: PVH guests always have PV devices Boris Ostrovsky
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Boris Ostrovsky @ 2017-01-26 19:41 UTC (permalink / raw)
  To: JGross; +Cc: roger.pau, xen-devel, linux-kernel, boris.ostrovsky

Like PV guests, PVH does not have PCI devices and therefore cannot
use MMIO space to store grants. Instead it balloons out memory and
keeps grants there.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v2:
* Updated commit message
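
Reader's note (illustrative, not part of the patch): the reordered checks
in __gnttab_init() can be modelled as a small decision function. In this
sketch PVH is treated as an HVM domain with the xen_pvh flag set, as in
the xen_pvh_domain() definition from patch 4. The enum and the function
names are hypothetical, and the return value 1 stands in for "call
gnttab_init() now".

```c
#include <errno.h>
#include <stdbool.h>

enum guest_kind { GUEST_NATIVE, GUEST_PV, GUEST_HVM, GUEST_PVH };

static bool is_xen(enum guest_kind k) { return k != GUEST_NATIVE; }
static bool is_hvm(enum guest_kind k) { return k == GUEST_HVM || k == GUEST_PVH; }
static bool is_pvh(enum guest_kind k) { return k == GUEST_PVH; }

/*
 * Returns -ENODEV when not running on Xen, 0 when initialization is
 * deferred (PV-on-HVM), 1 when the grant table is initialized early.
 */
static int gnttab_early_decision(enum guest_kind k)
{
    if (!is_xen(k))
        return -ENODEV;

    /* Delay grant-table initialization in the PV-on-HVM case. */
    if (is_hvm(k) && !is_pvh(k))
        return 0;

    return 1;
}
```

With the old ordering a PVH guest would have taken the xen_hvm_domain()
branch and returned 0 without initializing the grant table early.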

 drivers/xen/grant-table.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index bb36b1e..d6786b8 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -1146,13 +1146,13 @@ int gnttab_init(void)
 
 static int __gnttab_init(void)
 {
+	if (!xen_domain())
+		return -ENODEV;
+
 	/* Delay grant-table initialization in the PV on HVM case */
-	if (xen_hvm_domain())
+	if (xen_hvm_domain() && !xen_pvh_domain())
 		return 0;
 
-	if (!xen_pv_domain())
-		return -ENODEV;
-
 	return gnttab_init();
 }
 /* Starts after core_initcall so that xen_pvh_gnttab_setup can be called
-- 
1.8.3.1

* [PATCH v2 7/9] xen/pvh: PVH guests always have PV devices
  2017-01-26 19:41 [PATCH v2 0/9] PVH v2 support (domU) Boris Ostrovsky
                   ` (5 preceding siblings ...)
  2017-01-26 19:41 ` [PATCH v2 6/9] xen/pvh: Initialize grant table for PVH guests Boris Ostrovsky
@ 2017-01-26 19:41 ` Boris Ostrovsky
  2017-01-26 19:41 ` [PATCH v2 8/9] xen/pvh: Enable CPU hotplug Boris Ostrovsky
  2017-01-26 19:41 ` [PATCH v2 9/9] xen/pvh: Use Xen's emergency_restart op for PVH guests Boris Ostrovsky
  8 siblings, 0 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2017-01-26 19:41 UTC (permalink / raw)
  To: JGross; +Cc: roger.pau, xen-devel, linux-kernel, boris.ostrovsky

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/platform-pci-unplug.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/platform-pci-unplug.c b/arch/x86/xen/platform-pci-unplug.c
index 90d1b83..33a783c 100644
--- a/arch/x86/xen/platform-pci-unplug.c
+++ b/arch/x86/xen/platform-pci-unplug.c
@@ -73,8 +73,8 @@ bool xen_has_pv_devices(void)
 	if (!xen_domain())
 		return false;
 
-	/* PV domains always have them. */
-	if (xen_pv_domain())
+	/* PV and PVH domains always have them. */
+	if (xen_pv_domain() || xen_pvh_domain())
 		return true;
 
 	/* And user has xen_platform_pci=0 set in guest config as
-- 
1.8.3.1

* [PATCH v2 8/9] xen/pvh: Enable CPU hotplug
  2017-01-26 19:41 [PATCH v2 0/9] PVH v2 support (domU) Boris Ostrovsky
                   ` (6 preceding siblings ...)
  2017-01-26 19:41 ` [PATCH v2 7/9] xen/pvh: PVH guests always have PV devices Boris Ostrovsky
@ 2017-01-26 19:41 ` Boris Ostrovsky
  2017-01-27 15:36   ` Juergen Gross
  2017-01-26 19:41 ` [PATCH v2 9/9] xen/pvh: Use Xen's emergency_restart op for PVH guests Boris Ostrovsky
  8 siblings, 1 reply; 22+ messages in thread
From: Boris Ostrovsky @ 2017-01-26 19:41 UTC (permalink / raw)
  To: JGross; +Cc: roger.pau, xen-devel, linux-kernel, boris.ostrovsky

PVH guests don't (yet) receive ACPI hotplug interrupts and therefore
need to monitor xenstore for CPU hotplug events.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 drivers/xen/cpu_hotplug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c
index 5676aef..0bab60a3 100644
--- a/drivers/xen/cpu_hotplug.c
+++ b/drivers/xen/cpu_hotplug.c
@@ -107,7 +107,7 @@ static int __init setup_vcpu_hotplug_event(void)
 		.notifier_call = setup_cpu_watcher };
 
 #ifdef CONFIG_X86
-	if (!xen_pv_domain())
+	if (!xen_pv_domain() && !xen_pvh_domain())
 #else
 	if (!xen_domain())
 #endif
-- 
1.8.3.1

* [PATCH v2 9/9] xen/pvh: Use Xen's emergency_restart op for PVH guests
  2017-01-26 19:41 [PATCH v2 0/9] PVH v2 support (domU) Boris Ostrovsky
                   ` (7 preceding siblings ...)
  2017-01-26 19:41 ` [PATCH v2 8/9] xen/pvh: Enable CPU hotplug Boris Ostrovsky
@ 2017-01-26 19:41 ` Boris Ostrovsky
  2017-01-27 15:37   ` Juergen Gross
  8 siblings, 1 reply; 22+ messages in thread
From: Boris Ostrovsky @ 2017-01-26 19:41 UTC (permalink / raw)
  To: JGross; +Cc: roger.pau, xen-devel, linux-kernel, boris.ostrovsky

Using native_machine_emergency_restart (called during reboot) will
lead PVH guests to machine_real_restart(), where we try to use
real_mode_header, which is not initialized.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
New in v2

 arch/x86/xen/enlighten.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 6463382..20ae5d9d 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1915,6 +1915,9 @@ static void __init xen_hvm_guest_init(void)
 	x86_init.irqs.intr_init = xen_init_IRQ;
 	xen_hvm_init_time_ops();
 	xen_hvm_init_mmu_ops();
+
+	if (xen_pvh_domain())
+		machine_ops.emergency_restart = xen_emergency_restart;
 #ifdef CONFIG_KEXEC_CORE
 	machine_ops.shutdown = xen_hvm_shutdown;
 	machine_ops.crash_shutdown = xen_hvm_crash_shutdown;
-- 
1.8.3.1

* Re: [PATCH v2 8/9] xen/pvh: Enable CPU hotplug
  2017-01-26 19:41 ` [PATCH v2 8/9] xen/pvh: Enable CPU hotplug Boris Ostrovsky
@ 2017-01-27 15:36   ` Juergen Gross
  0 siblings, 0 replies; 22+ messages in thread
From: Juergen Gross @ 2017-01-27 15:36 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: roger.pau, xen-devel, linux-kernel

On 26/01/17 20:41, Boris Ostrovsky wrote:
> PVH guests don't (yet) receive ACPI hotplug interrupts and therefore
> need to monitor xenstore for CPU hotplug events.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

* Re: [PATCH v2 9/9] xen/pvh: Use Xen's emergency_restart op for PVH guests
  2017-01-26 19:41 ` [PATCH v2 9/9] xen/pvh: Use Xen's emergency_restart op for PVH guests Boris Ostrovsky
@ 2017-01-27 15:37   ` Juergen Gross
  0 siblings, 0 replies; 22+ messages in thread
From: Juergen Gross @ 2017-01-27 15:37 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: roger.pau, xen-devel, linux-kernel

On 26/01/17 20:41, Boris Ostrovsky wrote:
> Using native_machine_emergency_restart (called during reboot) will
> lead PVH guests to machine_real_restart()  where we try to use
> real_mode_header which is not initialized.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 6/9] xen/pvh: Initialize grant table for PVH guests
  2017-01-26 19:41 ` [PATCH v2 6/9] xen/pvh: Initialize grant table for PVH guests Boris Ostrovsky
@ 2017-01-27 15:38   ` Juergen Gross
  0 siblings, 0 replies; 22+ messages in thread
From: Juergen Gross @ 2017-01-27 15:38 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: roger.pau, xen-devel, linux-kernel

On 26/01/17 20:41, Boris Ostrovsky wrote:
> Like PV guests, PVH does not have PCI devices and therefore cannot
> use MMIO space to store grants. Instead it balloons out memory and
> keeps grants there.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/9] xen/pvh: Prevent PVH guests from using PIC, RTC and IOAPIC
  2017-01-26 19:41 ` [PATCH v2 5/9] xen/pvh: Prevent PVH guests from using PIC, RTC and IOAPIC Boris Ostrovsky
@ 2017-02-02 15:23   ` Juergen Gross
  2017-02-02 15:35   ` Roger Pau Monné
  1 sibling, 0 replies; 22+ messages in thread
From: Juergen Gross @ 2017-02-02 15:23 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: roger.pau, xen-devel, linux-kernel

On 26/01/17 20:41, Boris Ostrovsky wrote:
> Make sure they don't use these devices since they are not emulated
> for unprivileged PVH guest.
> 
> Also don't initialize hypercall page for them in init_hvm_pv_info()
> since this has already been done.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/9] xen/pvh: Prevent PVH guests from using PIC, RTC and IOAPIC
  2017-01-26 19:41 ` [PATCH v2 5/9] xen/pvh: Prevent PVH guests from using PIC, RTC and IOAPIC Boris Ostrovsky
  2017-02-02 15:23   ` Juergen Gross
@ 2017-02-02 15:35   ` Roger Pau Monné
  2017-02-02 16:30     ` Boris Ostrovsky
  1 sibling, 1 reply; 22+ messages in thread
From: Roger Pau Monné @ 2017-02-02 15:35 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: JGross, xen-devel, linux-kernel

On Thu, Jan 26, 2017 at 02:41:28PM -0500, Boris Ostrovsky wrote:
> Make sure they don't use these devices since they are not emulated
> for unprivileged PVH guest.

This description seems weird for what is actually done. AFAICT you are not
really preventing the guest from using the PIC or the IO APIC, because this is
fetched from the MADT table (or should be fetched from there in any case).

See below for the RTC...

[...]
> @@ -1892,6 +1900,9 @@ static void __init xen_hvm_guest_init(void)
>  
>  	init_hvm_pv_info();
>  
> +	if (xen_pvh_domain())
> +		x86_platform.legacy.rtc = 0;

Can't you fetch that from the FADT boot flags field? (See "5.2.9.3 IA-PC Boot
Architecture Flags" in ACPI 6.1 spec).

Roger.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/9] xen/pvh: Prevent PVH guests from using PIC, RTC and IOAPIC
  2017-02-02 15:35   ` Roger Pau Monné
@ 2017-02-02 16:30     ` Boris Ostrovsky
  2017-02-02 16:40       ` Roger Pau Monné
  0 siblings, 1 reply; 22+ messages in thread
From: Boris Ostrovsky @ 2017-02-02 16:30 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: JGross, xen-devel, linux-kernel

On 02/02/2017 10:35 AM, Roger Pau Monné wrote:
> On Thu, Jan 26, 2017 at 02:41:28PM -0500, Boris Ostrovsky wrote:
>> Make sure they don't use these devices since they are not emulated
>> for unprivileged PVH guest.
> This description seems weird for what is actually done. AFAICT you are not
> really preventing the guest from using the PIC or the IO APIC, because this is
> fetched from the MADT table (or should be fetched from there in any case).

This was meant to say that we don't want to use ACPI_IRQ_MODEL_[IOA]PIC
since we don't support SCI (which is expected on x86 to be one of the two).

I'll re-word it.

>
> See below for the RTC...
>
> [...]
>> @@ -1892,6 +1900,9 @@ static void __init xen_hvm_guest_init(void)
>>  
>>  	init_hvm_pv_info();
>>  
>> +	if (xen_pvh_domain())
>> +		x86_platform.legacy.rtc = 0;
> Can't you fetch that from the FADT boot flags field? (See "5.2.9.3 IA-PC Boot
> Architecture Flags" in ACPI 6.1 spec).

Good point. In fact, I can drop this altogether because
acpi_parse_fadt() will do this for us.

-boris

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/9] xen/pvh: Prevent PVH guests from using PIC, RTC and IOAPIC
  2017-02-02 16:30     ` Boris Ostrovsky
@ 2017-02-02 16:40       ` Roger Pau Monné
  2017-02-02 17:47         ` Boris Ostrovsky
  0 siblings, 1 reply; 22+ messages in thread
From: Roger Pau Monné @ 2017-02-02 16:40 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: JGross, xen-devel, linux-kernel

On Thu, Feb 02, 2017 at 11:30:19AM -0500, Boris Ostrovsky wrote:
> On 02/02/2017 10:35 AM, Roger Pau Monné wrote:
> > On Thu, Jan 26, 2017 at 02:41:28PM -0500, Boris Ostrovsky wrote:
> >> Make sure they don't use these devices since they are not emulated
> >> for unprivileged PVH guest.
> > This description seems weird for what is actually done. AFAICT you are not
> > really preventing the guest from using the PIC or the IO APIC, because this is
> > fetched from the MADT table (or should be fetched from there in any case).
> 
> This was meant to say that we don't want to use ACPI_IRQ_MODEL_[IOA]PIC
> since we don't support SCI (which is expected on x86 to be one of the two).

Hm, right. At some point (ie: when PCI-passthrough is implemented) we will be
providing an IO APIC and a SCI through it. Or would we rather always use the
event channel SCI?

Roger.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/9] xen/pvh: Prevent PVH guests from using PIC, RTC and IOAPIC
  2017-02-02 16:40       ` Roger Pau Monné
@ 2017-02-02 17:47         ` Boris Ostrovsky
  0 siblings, 0 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2017-02-02 17:47 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: JGross, xen-devel, linux-kernel

On 02/02/2017 11:40 AM, Roger Pau Monné wrote:
> On Thu, Feb 02, 2017 at 11:30:19AM -0500, Boris Ostrovsky wrote:
>> On 02/02/2017 10:35 AM, Roger Pau Monné wrote:
>>> On Thu, Jan 26, 2017 at 02:41:28PM -0500, Boris Ostrovsky wrote:
>>>> Make sure they don't use these devices since they are not emulated
>>>> for unprivileged PVH guest.
>>> This description seems weird for what is actually done. AFAICT you are not
>>> really preventing the guest from using the PIC or the IO APIC, because this is
>>> fetched from the MADT table (or should be fetched from there in any case).
>> This was meant to say that we don't want to use ACPI_IRQ_MODEL_[IOA]PIC
>> since we don't support SCI (which is expected on x86 to be one of the two).
> Hm, right. At some point (ie: when PCI-passthrough is implemented) we will be
> providing an IO APIC and a SCI through it. Or would we rather always use the
> event channel SCI?

Staying closer to bare-metal (i.e. doing it via IOAPIC) might be better
but then this would always require IOAPIC for PVH (because we'd want to
use SCI for hotplug too, if we ever get there).

-boris

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 4/9] xen/pvh: Bootstrap PVH guest
  2017-01-26 19:41 ` [PATCH v2 4/9] xen/pvh: Bootstrap PVH guest Boris Ostrovsky
@ 2017-02-03  7:24   ` Juergen Gross
  2017-02-03 16:20     ` Boris Ostrovsky
  0 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2017-02-03  7:24 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: roger.pau, xen-devel, linux-kernel

On 26/01/17 20:41, Boris Ostrovsky wrote:
> Start PVH guest at XEN_ELFNOTE_PHYS32_ENTRY address. Setup hypercall
> page, initialize boot_params, enable early page tables.
> 
> Since this stub is executed before kernel entry point we cannot use
> variables in .bss which is cleared by kernel. We explicitly place
> variables that are initialized here into .data.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
> Changes in v2:
> * Assembly cleanup
> * Check for e820 size in init_pvh_bootparams()
> * Check XEN_HVM_START_MAGIC_VALUE in start_info
> 
> 
>  arch/x86/xen/Kconfig     |   2 +-
>  arch/x86/xen/Makefile    |   1 +
>  arch/x86/xen/enlighten.c |  98 ++++++++++++++++++++++++++++++++-
>  arch/x86/xen/xen-pvh.S   | 137 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/xen/xen.h        |   5 ++
>  5 files changed, 241 insertions(+), 2 deletions(-)
>  create mode 100644 arch/x86/xen/xen-pvh.S
> 
> diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
> index c7b15f3..76b6dbd 100644
> --- a/arch/x86/xen/Kconfig
> +++ b/arch/x86/xen/Kconfig
> @@ -53,5 +53,5 @@ config XEN_DEBUG_FS
>  
>  config XEN_PVH
>  	bool "Support for running as a PVH guest"
> -	depends on X86_64 && XEN && XEN_PVHVM
> +	depends on XEN && XEN_PVHVM && ACPI
>  	def_bool n
> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
> index e47e527..cb0164a 100644
> --- a/arch/x86/xen/Makefile
> +++ b/arch/x86/xen/Makefile
> @@ -23,3 +23,4 @@ obj-$(CONFIG_XEN_DEBUG_FS)	+= debugfs.o
>  obj-$(CONFIG_XEN_DOM0)		+= vga.o
>  obj-$(CONFIG_SWIOTLB_XEN)	+= pci-swiotlb-xen.o
>  obj-$(CONFIG_XEN_EFI)		+= efi.o
> +obj-$(CONFIG_XEN_PVH)	 	+= xen-pvh.o
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 828f1b2..c82fe14 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -45,6 +45,7 @@
>  #include <xen/interface/memory.h>
>  #include <xen/interface/nmi.h>
>  #include <xen/interface/xen-mca.h>
> +#include <xen/interface/hvm/start_info.h>
>  #include <xen/features.h>
>  #include <xen/page.h>
>  #include <xen/hvm.h>
> @@ -121,7 +122,8 @@
>  DEFINE_PER_CPU(uint32_t, xen_vcpu_id);
>  EXPORT_PER_CPU_SYMBOL(xen_vcpu_id);
>  
> -enum xen_domain_type xen_domain_type = XEN_NATIVE;
> +enum xen_domain_type xen_domain_type
> +	__attribute__((section(".data"))) = XEN_NATIVE;
>  EXPORT_SYMBOL_GPL(xen_domain_type);
>  
>  unsigned long *machine_to_phys_mapping = (void *)MACH2PHYS_VIRT_START;
> @@ -176,6 +178,17 @@ struct tls_descs {
>   */
>  static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
>  
> +#ifdef CONFIG_XEN_PVH
> +/*
> + * PVH variables. These need to live in data segment since they are
> + * initialized before startup_{32|64}, which clear .bss, are invoked.
> + */
> +bool xen_pvh __attribute__((section(".data"))) = 0;
> +struct hvm_start_info pvh_start_info __attribute__((section(".data")));
> +unsigned int pvh_start_info_sz = sizeof(pvh_start_info);

While I believe this can live in .bss, as it isn't used after .bss is
cleared, there should either be a comment explaining why this is safe or
you should attribute it as .data, too.

> +struct boot_params pvh_bootparams __attribute__((section(".data")));
> +#endif
> +
>  static void clamp_max_cpus(void)
>  {
>  #ifdef CONFIG_SMP
> @@ -1656,6 +1669,89 @@ asmlinkage __visible void __init xen_start_kernel(void)
>  #endif
>  }
>  
> +#ifdef CONFIG_XEN_PVH
> +static void __init init_pvh_bootparams(void)
> +{
> +	struct xen_memory_map memmap;
> +	unsigned int i;
> +	int rc;
> +
> +	memset(&pvh_bootparams, 0, sizeof(pvh_bootparams));
> +
> +	memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_map);
> +	set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_map);
> +	rc = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap);
> +	if (rc) {
> +		xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc);
> +		BUG();
> +	}
> +
> +	if (memmap.nr_entries < E820MAX) {

Shouldn't this be E820MAX - 1?
What happens if memmap.nr_entries is already
ARRAY_SIZE(pvh_bootparams.e820_map) ?

> +		pvh_bootparams.e820_map[memmap.nr_entries].addr =
> +			ISA_START_ADDRESS;
> +		pvh_bootparams.e820_map[memmap.nr_entries].size =
> +			ISA_END_ADDRESS - ISA_START_ADDRESS;
> +		pvh_bootparams.e820_map[memmap.nr_entries++].type =
> +			E820_RESERVED;

I'd rather split out the '++' to a separate statement.

> +	} else
> +		xen_raw_printk("Warning: Can't fit ISA range into e820\n");
> +
> +	sanitize_e820_map(pvh_bootparams.e820_map,
> +			  ARRAY_SIZE(pvh_bootparams.e820_map),
> +			  &memmap.nr_entries);
> +
> +	pvh_bootparams.e820_entries = memmap.nr_entries;
> +	for (i = 0; i < pvh_bootparams.e820_entries; i++)
> +		e820_add_region(pvh_bootparams.e820_map[i].addr,
> +				pvh_bootparams.e820_map[i].size,
> +				pvh_bootparams.e820_map[i].type);
> +
> +	pvh_bootparams.hdr.cmd_line_ptr =
> +		pvh_start_info.cmdline_paddr;
> +
> +	/* The first module is always ramdisk. */
> +	if (pvh_start_info.nr_modules) {
> +		struct hvm_modlist_entry *modaddr =
> +			__va(pvh_start_info.modlist_paddr);
> +		pvh_bootparams.hdr.ramdisk_image = modaddr->paddr;
> +		pvh_bootparams.hdr.ramdisk_size = modaddr->size;
> +	}
> +
> +	/*
> +	 * See Documentation/x86/boot.txt.
> +	 *
> +	 * Version 2.12 supports Xen entry point but we will use default x86/PC
> +	 * environment (i.e. hardware_subarch 0).
> +	 */
> +	pvh_bootparams.hdr.version = 0x212;
> +	pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */
> +}
> +
> +/*
> + * This routine (and those that it might call) should not use
> + * anything that lives in .bss since that segment will be cleared later.
> + */
> +void __init xen_prepare_pvh(void)
> +{
> +	u32 msr;
> +	u64 pfn;
> +
> +	if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) {
> +		xen_raw_printk("Error: Unexpected magic value (0x%08x)\n",
> +				pvh_start_info.magic);
> +		BUG();
> +	}
> +
> +	xen_pvh = 1;
> +
> +	msr = cpuid_ebx(xen_cpuid_base() + 2);
> +	pfn = __pa(hypercall_page);
> +	wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
> +
> +	init_pvh_bootparams();
> +}
> +#endif
> +
>  void __ref xen_hvm_init_shared_info(void)
>  {
>  	int cpu;
> diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/xen/xen-pvh.S
> new file mode 100644
> index 0000000..410036a
> --- /dev/null
> +++ b/arch/x86/xen/xen-pvh.S
> @@ -0,0 +1,137 @@
> +/*
> + * Copyright C 2016, Oracle and/or its affiliates. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +	.code32
> +	.text
> +#define _pa(x)          ((x) - __START_KERNEL_map)
> +
> +#include <linux/elfnote.h>
> +#include <linux/init.h>
> +#include <linux/linkage.h>
> +#include <asm/segment.h>
> +#include <asm/asm.h>
> +#include <asm/boot.h>
> +#include <asm/processor-flags.h>
> +#include <asm/msr.h>
> +#include <xen/interface/elfnote.h>
> +
> +	__HEAD
> +
> +/* Entry point for PVH guests. */

Could you add some comments about register contents at entry?

> +ENTRY(pvh_start_xen)
> +	cld
> +
> +	lgdt (_pa(gdt))
> +
> +	mov $(__BOOT_DS),%eax
> +	mov %eax,%ds
> +	mov %eax,%es
> +	mov %eax,%ss
> +
> +	/* Stash hvm_start_info. */
> +	mov $_pa(pvh_start_info), %edi
> +	mov %ebx, %esi
> +	mov _pa(pvh_start_info_sz), %ecx
> +	shr $2,%ecx
> +	rep
> +	movsl
> +
> +	mov $_pa(early_stack_end), %esp
> +
> +	/* Enable PAE mode. */
> +	mov %cr4, %eax
> +	orl $X86_CR4_PAE, %eax
> +	mov %eax, %cr4
> +
> +#ifdef CONFIG_X86_64
> +	/* Enable Long mode. */
> +	mov $MSR_EFER, %ecx
> +	rdmsr
> +	btsl $_EFER_LME, %eax
> +	wrmsr
> +
> +	/* Enable pre-constructed page tables. */
> +	mov $_pa(init_level4_pgt), %eax
> +	mov %eax, %cr3
> +	mov $(X86_CR0_PG | X86_CR0_PE), %eax
> +	mov %eax, %cr0
> +
> +	/* Jump to 64-bit mode. */
> +        ljmp $__KERNEL_CS, $_pa(1f)

Indentation

> +
> +	/* 64-bit entry point. */
> +	.code64
> +1:
> +	call xen_prepare_pvh
> +
> +	/* startup_64 expects boot_params in %rsi. */
> +	mov $_pa(pvh_bootparams), %rsi
> +	mov $_pa(startup_64), %rax
> +	jmp *%rax
> +
> +#else /* CONFIG_X86_64 */
> +
> +	call mk_early_pgtbl_32
> +
> +	mov $_pa(initial_page_table), %eax
> +	mov %eax, %cr3
> +
> +	mov %cr0, %eax
> +	or $(X86_CR0_PG | X86_CR0_PE), %eax
> +	mov %eax, %cr0
> +
> +	ljmp $__BOOT_CS, $1f
> +1:
> +	call xen_prepare_pvh
> +	mov $_pa(pvh_bootparams), %esi
> +
> +	/* startup_32 doesn't expect paging and PAE to be on. */
> +	ljmp $__BOOT_CS, $_pa(2f)
> +2:
> +	mov %cr0, %eax
> +	and $~X86_CR0_PG, %eax
> +	mov %eax, %cr0
> +	mov %cr4, %eax
> +	and $~X86_CR4_PAE, %eax
> +	mov %eax, %cr4
> +
> +	ljmp    $0x10, $_pa(startup_32)

Any reason to use 0x10 instead of __BOOT_CS?

> +#endif
> +ENDPROC(pvh_start_xen)
> +
> +	.data

Alignment?

> +gdt:
> +	.word	gdt_end - gdt
> +	.long	_pa(gdt)

This is a rather strange construct: the NULL descriptor of the
GDT being used as space for lgdt operand.

> +	.word	0
> +	.quad	0x0000000000000000 /* NULL descriptor */

And this comment is wrong: the NULL descriptor is at "gdt:".

> +#ifdef CONFIG_X86_64
> +	.quad	0x00af9a000000ffff /* __KERNEL_CS */

Mind adding comments about the semantics of those constants?
Or use GDT_ENTRY() macro?

> +#else
> +	.quad	0x00cf9a000000ffff /* __KERNEL_CS */
> +#endif
> +	.quad	0x00cf92000000ffff /* __KERNEL_DS */
> +gdt_end:
> +
> +	.bss
> +	.balign 4
> +early_stack:
> +	.fill 16, 1, 0

Is the stack size large enough? With a hypercall being executed in
xen_prepare_pvh() I doubt this will be okay.

> +early_stack_end:
> +
> +	ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY,
> +	             _ASM_PTR (pvh_start_xen - __START_KERNEL_map))
> diff --git a/include/xen/xen.h b/include/xen/xen.h
> index d0f9684..6e8b7fc 100644
> --- a/include/xen/xen.h
> +++ b/include/xen/xen.h
> @@ -29,6 +29,11 @@ enum xen_domain_type {
>  #define xen_initial_domain()	(0)
>  #endif	/* CONFIG_XEN_DOM0 */
>  
> +#ifdef CONFIG_XEN_PVH
> +extern bool xen_pvh;
> +#define xen_pvh_domain()	(xen_hvm_domain() && xen_pvh)
> +#else
>  #define xen_pvh_domain()	(0)
> +#endif
>  
>  #endif	/* _XEN_XEN_H */
> 


Juergen

^ permalink raw reply	[flat|nested] 22+ messages in thread
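
The two review remarks on the e820 append (the bound in the check and the
separate increment) can be illustrated with a small stand-alone sketch. It
assumes only that the table is a fixed-size array, so index nr is writable
exactly when nr is below the capacity. MAP_MAX, struct map_entry and
append_entry() are hypothetical names, not the patch's code; the address,
size and type values mirror ISA_START_ADDRESS, the ISA range length and
E820_RESERVED.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_MAX 4	/* hypothetical capacity, stands in for E820MAX */

struct map_entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

/* Append one entry; returns 1 on success, 0 if the table is full. */
static int append_entry(struct map_entry *map, unsigned int *nr,
			struct map_entry e)
{
	if (*nr >= MAP_MAX)
		return 0;	/* no free slot, leave the table untouched */

	map[*nr] = e;
	(*nr)++;		/* increment kept as its own statement */
	return 1;
}

/* Usage: the last free slot is usable, a full table is rejected. */
static void append_demo(void)
{
	struct map_entry map[MAP_MAX] = { { 0, 0, 0 } };
	struct map_entry isa = { 0xa0000, 0x60000, 2 };
	unsigned int nr = MAP_MAX - 1;

	assert(append_entry(map, &nr, isa) == 1);
	assert(nr == MAP_MAX);
	assert(append_entry(map, &nr, isa) == 0);
	assert(nr == MAP_MAX);
}
```

With this shape a table that is already full takes the failure branch, and
the entry count never exceeds the array size.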

* Re: [PATCH v2 4/9] xen/pvh: Bootstrap PVH guest
  2017-02-03  7:24   ` Juergen Gross
@ 2017-02-03 16:20     ` Boris Ostrovsky
  2017-02-03 16:40       ` Juergen Gross
  0 siblings, 1 reply; 22+ messages in thread
From: Boris Ostrovsky @ 2017-02-03 16:20 UTC (permalink / raw)
  To: Juergen Gross; +Cc: roger.pau, xen-devel, linux-kernel


>> +
>> +	__HEAD
>> +
>> +/* Entry point for PVH guests. */
> Could you add some comments about register contents at entry?

Would a reference to Xen's docs/misc/hvmlite.markdown be sufficient?




>> +gdt:
>> +	.word	gdt_end - gdt
>> +	.long	_pa(gdt)
> This is a rather strange construct: the NULL descriptor of the
> GDT being used as space for lgdt operand.
>
>> +	.word	0
>> +	.quad	0x0000000000000000 /* NULL descriptor */
> And this comment is wrong: the NULL descriptor is at "gdt:".

I'll change it to:

gdt:
        .word   gdt_end - gdt_start
        .long   _pa(gdt_start)
        .word   0
gdt_start:
        .quad   0x0000000000000000 /* NULL descriptor */
        .quad   0x0000000000000000 /* reserved */
#ifdef CONFIG_X86_64
        .quad   0x00af9a000000ffff /* __KERNEL_CS */
#else
        .quad   0x00cf9a000000ffff /* __KERNEL_CS */
#endif
        .quad   0x00cf92000000ffff /* __KERNEL_DS */
gdt_end:


>
>> +#ifdef CONFIG_X86_64
>> +	.quad	0x00af9a000000ffff /* __KERNEL_CS */
> Mind adding comments about the semantics of those constants?
> Or use GDT_ENTRY() macro?
>
>> +#else
>> +	.quad	0x00cf9a000000ffff /* __KERNEL_CS */
>> +#endif
>> +	.quad	0x00cf92000000ffff /* __KERNEL_DS */
>> +gdt_end:
>> +
>> +	.bss
>> +	.balign 4
>> +early_stack:
>> +	.fill 16, 1, 0
> Is the stack size large enough? With a hypercall being executed in
> xen_prepare_pvh() I doubt this will be okay.

What do you think it should be then?

-boris

^ permalink raw reply	[flat|nested] 22+ messages in thread
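
On the request to document the descriptor constants: the fields packed
into each .quad can be pulled apart mechanically. The sketch below uses
the standard x86 segment-descriptor layout (limit in bits 0..15 and
48..51, base in bits 16..39 and 56..63, access byte in bits 40..47, flags
in bits 52..55). struct seg_desc and decode_desc() are hypothetical
helpers written for illustration, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the fields of one 8-byte descriptor. */
struct seg_desc {
	uint32_t base;		/* 32-bit segment base */
	uint32_t limit;		/* raw 20-bit limit, before G scaling */
	uint8_t  access;	/* bits 40..47: P, DPL, S, type */
	uint8_t  flags;		/* bits 52..55: G, D/B, L, AVL */
};

static struct seg_desc decode_desc(uint64_t d)
{
	struct seg_desc s;

	s.limit  = (uint32_t)(d & 0xffff) |
		   (uint32_t)((d >> 32) & 0xf0000);
	s.base   = (uint32_t)((d >> 16) & 0xffffff) |
		   (uint32_t)((d >> 32) & 0xff000000);
	s.access = (uint8_t)((d >> 40) & 0xff);
	s.flags  = (uint8_t)((d >> 52) & 0xf);
	return s;
}

/* Decode the three constants quoted in the thread. */
static void decode_demo(void)
{
	struct seg_desc cs64 = decode_desc(0x00af9a000000ffffULL);
	struct seg_desc cs32 = decode_desc(0x00cf9a000000ffffULL);
	struct seg_desc ds   = decode_desc(0x00cf92000000ffffULL);

	/* All three are flat: base 0, limit 0xfffff in 4 KiB units. */
	assert(cs64.base == 0 && cs64.limit == 0xfffff);
	assert(cs32.base == 0 && cs32.limit == 0xfffff);
	assert(ds.base == 0 && ds.limit == 0xfffff);

	/* 0x9a: present, DPL 0, code, execute/read. 0x92: data, read/write. */
	assert(cs64.access == 0x9a && cs32.access == 0x9a);
	assert(ds.access == 0x92);

	/* 0xa: G=1, L=1 (64-bit code). 0xc: G=1, D/B=1 (32-bit). */
	assert(cs64.flags == 0xa);
	assert(cs32.flags == 0xc && ds.flags == 0xc);
}
```

The only difference between the two __KERNEL_CS values is the flags
nibble: 0xa sets L for a 64-bit code segment, 0xc sets D/B for a 32-bit
one.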

* Re: [PATCH v2 4/9] xen/pvh: Bootstrap PVH guest
  2017-02-03 16:20     ` Boris Ostrovsky
@ 2017-02-03 16:40       ` Juergen Gross
  2017-02-03 18:05         ` [Xen-devel] " Andrew Cooper
  0 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2017-02-03 16:40 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: roger.pau, xen-devel, linux-kernel

On 03/02/17 17:20, Boris Ostrovsky wrote:
> 
>>> +
>>> +	__HEAD
>>> +
>>> +/* Entry point for PVH guests. */
>> Could you add some comments about register contents at entry?
> 
> Would a reference to Xen's docs/misc/hvmlite.markdown be sufficient?

I think the corresponding lines should be copied to this source
file. It is inconvenient to have to get the Xen repository for
this information.

>>> +gdt:
>>> +	.word	gdt_end - gdt
>>> +	.long	_pa(gdt)
>> This is a rather strange construct: the NULL descriptor of the
>> GDT being used as space for lgdt operand.
>>
>>> +	.word	0
>>> +	.quad	0x0000000000000000 /* NULL descriptor */
>> And this comment is wrong: the NULL descriptor is at "gdt:".
> 
> I'll change it to:
> 
> gdt:
>         .word   gdt_end - gdt_start
>         .long   _pa(gdt_start)
>         .word   0
> gdt_start:
>         .quad   0x0000000000000000 /* NULL descriptor */
>         .quad   0x0000000000000000 /* reserved */

Much better. :-)

> #ifdef CONFIG_X86_64
>         .quad   0x00af9a000000ffff /* __KERNEL_CS */
> #else
>         .quad   0x00cf9a000000ffff /* __KERNEL_CS */
> #endif
>         .quad   0x00cf92000000ffff /* __KERNEL_DS */
> gdt_end:
> 
> 
>>
>>> +#ifdef CONFIG_X86_64
>>> +	.quad	0x00af9a000000ffff /* __KERNEL_CS */
>> Mind adding comments about the semantics of those constants?
>> Or use GDT_ENTRY() macro?
>>
>>> +#else
>>> +	.quad	0x00cf9a000000ffff /* __KERNEL_CS */
>>> +#endif
>>> +	.quad	0x00cf92000000ffff /* __KERNEL_DS */
>>> +gdt_end:
>>> +
>>> +	.bss
>>> +	.balign 4
>>> +early_stack:
>>> +	.fill 16, 1, 0
>> Is the stack size large enough? With a hypercall being executed in
>> xen_prepare_pvh() I doubt this will be okay.
> 
> What do you think it should be then?

I didn't check the disassembly, but even if it is okay right now
the needed stack size will depend on the compiler used. I'd rather
use a larger size (e.g. 256 bytes).

Maybe it's even possible to reuse initial_stack, but I haven't
verified that.


Juergen

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Xen-devel] [PATCH v2 4/9] xen/pvh: Bootstrap PVH guest
  2017-02-03 16:40       ` Juergen Gross
@ 2017-02-03 18:05         ` Andrew Cooper
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Cooper @ 2017-02-03 18:05 UTC (permalink / raw)
  To: Juergen Gross, Boris Ostrovsky; +Cc: xen-devel, linux-kernel, roger.pau

On 03/02/17 16:40, Juergen Gross wrote:
> On 03/02/17 17:20, Boris Ostrovsky wrote:
>>>> +
>>>> +	__HEAD
>>>> +
>>>> +/* Entry point for PVH guests. */
>>> Could you add some comments about register contents at entry?
>> Would a reference to Xen's docs/misc/hvmlite.markdown be sufficient?
> I think the corresponding lines should be copied to this source
> file. It is inconvenient to have to get the Xen repository for
> this information.
>
>>>> +gdt:
>>>> +	.word	gdt_end - gdt
>>>> +	.long	_pa(gdt)
>>> This is a rather strange construct: the NULL descriptor of the
>>> GDT being used as space for lgdt operand.
>>>
>>>> +	.word	0
>>>> +	.quad	0x0000000000000000 /* NULL descriptor */
>>> And this comment is wrong: the NULL descriptor is at "gdt:".
>> I'll change it to:
>>
>> gdt:
>>         .word   gdt_end - gdt_start
>>         .long   _pa(gdt_start)
>>         .word   0
>> gdt_start:
>>         .quad   0x0000000000000000 /* NULL descriptor */
>>         .quad   0x0000000000000000 /* reserved */
> Much better. :-)
>
>> #ifdef CONFIG_X86_64
>>         .quad   0x00af9a000000ffff /* __KERNEL_CS */
>> #else
>>         .quad   0x00cf9a000000ffff /* __KERNEL_CS */
>> #endif
>>         .quad   0x00cf92000000ffff /* __KERNEL_DS */
>> gdt_end:
>>
>>
>>>> +#ifdef CONFIG_X86_64
>>>> +	.quad	0x00af9a000000ffff /* __KERNEL_CS */
>>> Mind adding comments about the semantics of those constants?
>>> Or use GDT_ENTRY() macro?
>>>
>>>> +#else
>>>> +	.quad	0x00cf9a000000ffff /* __KERNEL_CS */
>>>> +#endif
>>>> +	.quad	0x00cf92000000ffff /* __KERNEL_DS */
>>>> +gdt_end:
>>>> +
>>>> +	.bss
>>>> +	.balign 4
>>>> +early_stack:
>>>> +	.fill 16, 1, 0
>>> Is the stack size large enough? With a hypercall being executed in
>>> xen_prepare_pvh() I doubt this will be okay.
>> What do you think it should be then?
> I didn't check the disassembly, but even if it is okay right now
> the needed stack size will depend on the compiler used. I'd rather
> use a larger size (e.g. 256 bytes).
>
> Maybe it's even possible to reuse initial_stack, but I haven't
> verified that.

Hypercalls in HVM guests don't use the stack at all in the hypercall
page, and while this is unlikely to ever change, we make no guarantee to
maintain this property.  (64bit PV guests use 2 words of stack in the
hypercall page.)

However, you must `call` at the hypercall page entry stub which uses 1
word, and the compiler needs to perform register scheduling for all 6
hypercall arguments which might involve spilling them to the stack.

2 words of stack doesn't seem large enough, irrespective of hypercalls.

~Andrew

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-02-03 18:06 UTC | newest]

Thread overview: 22+ messages
2017-01-26 19:41 [PATCH v2 0/9] PVH v2 support (domU) Boris Ostrovsky
2017-01-26 19:41 ` [PATCH v2 1/9] x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C Boris Ostrovsky
2017-01-26 19:41 ` [PATCH v2 2/9] xen/x86: Remove PVH support Boris Ostrovsky
2017-01-26 19:41 ` [PATCH v2 3/9] xen/pvh: Import PVH-related Xen public interfaces Boris Ostrovsky
2017-01-26 19:41 ` [PATCH v2 4/9] xen/pvh: Bootstrap PVH guest Boris Ostrovsky
2017-02-03  7:24   ` Juergen Gross
2017-02-03 16:20     ` Boris Ostrovsky
2017-02-03 16:40       ` Juergen Gross
2017-02-03 18:05         ` [Xen-devel] " Andrew Cooper
2017-01-26 19:41 ` [PATCH v2 5/9] xen/pvh: Prevent PVH guests from using PIC, RTC and IOAPIC Boris Ostrovsky
2017-02-02 15:23   ` Juergen Gross
2017-02-02 15:35   ` Roger Pau Monné
2017-02-02 16:30     ` Boris Ostrovsky
2017-02-02 16:40       ` Roger Pau Monné
2017-02-02 17:47         ` Boris Ostrovsky
2017-01-26 19:41 ` [PATCH v2 6/9] xen/pvh: Initialize grant table for PVH guests Boris Ostrovsky
2017-01-27 15:38   ` Juergen Gross
2017-01-26 19:41 ` [PATCH v2 7/9] xen/pvh: PVH guests always have PV devices Boris Ostrovsky
2017-01-26 19:41 ` [PATCH v2 8/9] xen/pvh: Enable CPU hotplug Boris Ostrovsky
2017-01-27 15:36   ` Juergen Gross
2017-01-26 19:41 ` [PATCH v2 9/9] xen/pvh: Use Xen's emergency_restart op for PVH guests Boris Ostrovsky
2017-01-27 15:37   ` Juergen Gross
