* [PATCH RFC 0/4] xen/x86: use per-vcpu stacks for 64 bit pv domains
From: Juergen Gross @ 2018-01-09 14:26 UTC
  To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, ian.jackson, jbeulich

In preparation for doing page table isolation in the Xen hypervisor to
mitigate "Meltdown", use dedicated stacks, mapped into the per-domain
virtual area, for 64 bit PV domains. The TSS is moved into that area,
too, and the GDT is no longer a remapped version of the per physical
cpu one.
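
For orientation, a rough sketch of the per-vcpu layout patch 4 puts
into the fourth per-domain slot (names as in the config.h hunk of that
patch; proportions illustrative, not to scale):

  STACKS_START(v):  IST stacks (MCE/NMI/#DF), guard page and primary
                    stack - STACK_SIZE bytes in total
  TSS_START(v):     STACKS_START(v) + STACK_SIZE, one page holding the
                    vcpu's TSS

Consecutive vcpus are spaced 1 << TSS_STACKS_VA_SHIFT bytes apart, so
the stacks and TSS of different vcpus never share a mapping window.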

This will make it possible to run guest code without any per physical
cpu mappings, i.e. it removes the threat of a guest being able to
access other domains' data.

Without further measures it will still be possible for e.g. a guest's
user program to read stack data of another vcpu of the same domain, but
this can easily be avoided by a small PV-ABI modification introducing
per-cpu user address spaces.

This series is meant as a replacement for Andrew's patch series:
"x86: Prerequisite work for a Xen KAISER solution".

What needs to be done:
- add livepatch support (should be rather easy)
- debug-keys "d" needs some adaptations
- performance evaluation
- some optimizations?


Juergen Gross (4):
  xen/x86: use dedicated function for tss initialization
  xen/x86: add helper for stack guard
  xen/x86: split context_switch()
  xen: use per-vcpu TSS and stacks for pv domains

 xen/arch/x86/cpu/common.c    |  56 +++++++++++++----------
 xen/arch/x86/domain.c        | 106 +++++++++++++++++++++++++++++--------------
 xen/arch/x86/mm.c            |   8 +---
 xen/arch/x86/pv/domain.c     |  72 +++++++++++++++++++++++++++--
 xen/arch/x86/x86_64/entry.S  |   4 ++
 xen/include/asm-x86/config.h |   9 +++-
 xen/include/asm-x86/mm.h     |  11 +++++
 xen/include/asm-x86/system.h |   1 +
 8 files changed, 198 insertions(+), 69 deletions(-)

-- 
2.13.6



* [PATCH RFC 1/4] xen/x86: use dedicated function for tss initialization
From: Juergen Gross @ 2018-01-09 14:26 UTC
  To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, ian.jackson, jbeulich

Carve out the TSS initialization from load_system_tables().

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/cpu/common.c    | 56 ++++++++++++++++++++++++--------------------
 xen/include/asm-x86/system.h |  1 +
 2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index e9588b3c0d..8c0d3181d0 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -634,6 +634,35 @@ void __init early_cpu_init(void)
 	early_cpu_detect();
 }
 
+void tss_init(struct tss_struct *tss, unsigned long stack_bottom)
+{
+	unsigned long stack_top = stack_bottom & ~(STACK_SIZE - 1);
+
+	*tss = (struct tss_struct){
+		/* Main stack for interrupts/exceptions. */
+		.rsp0 = stack_bottom,
+
+		/* Ring 1 and 2 stacks poisoned. */
+		.rsp1 = 0x8600111111111111ul,
+		.rsp2 = 0x8600111111111111ul,
+
+		/*
+		 * MCE, NMI and Double Fault handlers get their own stacks.
+		 * All others poisoned.
+		 */
+		.ist = {
+			[IST_MCE - 1] = stack_top + IST_MCE * PAGE_SIZE,
+			[IST_DF  - 1] = stack_top + IST_DF  * PAGE_SIZE,
+			[IST_NMI - 1] = stack_top + IST_NMI * PAGE_SIZE,
+
+			[IST_MAX ... ARRAY_SIZE(tss->ist) - 1] =
+				0x8600111111111111ul,
+		},
+
+		.bitmap = IOBMP_INVALID_OFFSET,
+	};
+}
+
 /*
  * Sets up system tables and descriptors.
  *
@@ -645,8 +674,7 @@ void __init early_cpu_init(void)
 void load_system_tables(void)
 {
 	unsigned int cpu = smp_processor_id();
-	unsigned long stack_bottom = get_stack_bottom(),
-		stack_top = stack_bottom & ~(STACK_SIZE - 1);
+	unsigned long stack_bottom = get_stack_bottom();
 
 	struct tss_struct *tss = &this_cpu(init_tss);
 	struct desc_struct *gdt =
@@ -663,29 +691,7 @@ void load_system_tables(void)
 		.limit = (IDT_ENTRIES * sizeof(idt_entry_t)) - 1,
 	};
 
-	*tss = (struct tss_struct){
-		/* Main stack for interrupts/exceptions. */
-		.rsp0 = stack_bottom,
-
-		/* Ring 1 and 2 stacks poisoned. */
-		.rsp1 = 0x8600111111111111ul,
-		.rsp2 = 0x8600111111111111ul,
-
-		/*
-		 * MCE, NMI and Double Fault handlers get their own stacks.
-		 * All others poisoned.
-		 */
-		.ist = {
-			[IST_MCE - 1] = stack_top + IST_MCE * PAGE_SIZE,
-			[IST_DF  - 1] = stack_top + IST_DF  * PAGE_SIZE,
-			[IST_NMI - 1] = stack_top + IST_NMI * PAGE_SIZE,
-
-			[IST_MAX ... ARRAY_SIZE(tss->ist) - 1] =
-				0x8600111111111111ul,
-		},
-
-		.bitmap = IOBMP_INVALID_OFFSET,
-	};
+	tss_init(tss, stack_bottom);
 
 	_set_tssldt_desc(
 		gdt + TSS_ENTRY,
diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index 8ac170371b..2cf50d1d49 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -230,6 +230,7 @@ static inline int local_irq_is_enabled(void)
 
 void trap_init(void);
 void init_idt_traps(void);
+void tss_init(struct tss_struct *tss, unsigned long stack_bottom);
 void load_system_tables(void);
 void percpu_traps_init(void);
 void subarch_percpu_traps_init(void);
-- 
2.13.6



* [PATCH RFC 2/4] xen/x86: add helper for stack guard
From: Juergen Gross @ 2018-01-09 14:26 UTC
  To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, ian.jackson, jbeulich

Instead of open coding the calculation of the stack guard page address
multiple times, add a helper to do it.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/mm.c        | 8 ++------
 xen/include/asm-x86/mm.h | 6 ++++++
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index a56f875d45..b60e79e82e 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5517,16 +5517,12 @@ void memguard_unguard_range(void *p, unsigned long l)
 void memguard_guard_stack(void *p)
 {
     BUILD_BUG_ON((PRIMARY_STACK_SIZE + PAGE_SIZE) > STACK_SIZE);
-    p = (void *)((unsigned long)p + STACK_SIZE -
-                 PRIMARY_STACK_SIZE - PAGE_SIZE);
-    memguard_guard_range(p, PAGE_SIZE);
+    memguard_guard_range(memguard_get_guard_page(p), PAGE_SIZE);
 }
 
 void memguard_unguard_stack(void *p)
 {
-    p = (void *)((unsigned long)p + STACK_SIZE -
-                 PRIMARY_STACK_SIZE - PAGE_SIZE);
-    memguard_unguard_range(p, PAGE_SIZE);
+    memguard_unguard_range(memguard_get_guard_page(p), PAGE_SIZE);
 }
 
 void arch_dump_shared_mem_info(void)
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 4af6b2341a..84e112b830 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -517,6 +517,12 @@ void memguard_unguard_range(void *p, unsigned long l);
 #define memguard_unguard_range(_p,_l)  ((void)0)
 #endif
 
+static inline void *memguard_get_guard_page(void *p)
+{
+    return (void *)((unsigned long)p + STACK_SIZE -
+                    PRIMARY_STACK_SIZE - PAGE_SIZE);
+}
+
 void memguard_guard_stack(void *p);
 void memguard_unguard_stack(void *p);
 
-- 
2.13.6



* [PATCH RFC 3/4] xen/x86: split context_switch()
From: Juergen Gross @ 2018-01-09 14:26 UTC
  To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, ian.jackson, jbeulich

Split up context_switch() to prepare for switching the stack in use.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/domain.c | 67 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 37 insertions(+), 30 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index da1bf1a97b..c0cb2cae64 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1673,38 +1673,10 @@ static void __context_switch(void)
     per_cpu(curr_vcpu, cpu) = n;
 }
 
-
-void context_switch(struct vcpu *prev, struct vcpu *next)
+static void context_switch_irqoff(struct vcpu *prev, struct vcpu *next,
+                                  unsigned int cpu)
 {
-    unsigned int cpu = smp_processor_id();
     const struct domain *prevd = prev->domain, *nextd = next->domain;
-    cpumask_t dirty_mask;
-
-    ASSERT(local_irq_is_enabled());
-
-    cpumask_copy(&dirty_mask, next->vcpu_dirty_cpumask);
-    /* Allow at most one CPU at a time to be dirty. */
-    ASSERT(cpumask_weight(&dirty_mask) <= 1);
-    if ( unlikely(!cpumask_test_cpu(cpu, &dirty_mask) &&
-                  !cpumask_empty(&dirty_mask)) )
-    {
-        /* Other cpus call __sync_local_execstate from flush ipi handler. */
-        flush_tlb_mask(&dirty_mask);
-    }
-
-    if ( prev != next )
-    {
-        _update_runstate_area(prev);
-        vpmu_switch_from(prev);
-        np2m_schedule(NP2M_SCHEDLE_OUT);
-    }
-
-    if ( is_hvm_domain(prevd) && !list_empty(&prev->arch.hvm_vcpu.tm_list) )
-        pt_save_timer(prev);
-
-    local_irq_disable();
-
-    set_current(next);
 
     if ( (per_cpu(curr_vcpu, cpu) == next) ||
          (is_idle_domain(nextd) && cpu_online(cpu)) )
@@ -1760,6 +1732,41 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     BUG();
 }
 
+void context_switch(struct vcpu *prev, struct vcpu *next)
+{
+    unsigned int cpu = smp_processor_id();
+    const struct domain *prevd = prev->domain;
+    cpumask_t dirty_mask;
+
+    ASSERT(local_irq_is_enabled());
+
+    cpumask_copy(&dirty_mask, next->vcpu_dirty_cpumask);
+    /* Allow at most one CPU at a time to be dirty. */
+    ASSERT(cpumask_weight(&dirty_mask) <= 1);
+    if ( unlikely(!cpumask_test_cpu(cpu, &dirty_mask) &&
+                  !cpumask_empty(&dirty_mask)) )
+    {
+        /* Other cpus call __sync_local_execstate from flush ipi handler. */
+        flush_tlb_mask(&dirty_mask);
+    }
+
+    if ( prev != next )
+    {
+        _update_runstate_area(prev);
+        vpmu_switch_from(prev);
+        np2m_schedule(NP2M_SCHEDLE_OUT);
+    }
+
+    if ( is_hvm_domain(prevd) && !list_empty(&prev->arch.hvm_vcpu.tm_list) )
+        pt_save_timer(prev);
+
+    local_irq_disable();
+
+    set_current(next);
+
+    context_switch_irqoff(prev, next, cpu);
+}
+
 void continue_running(struct vcpu *same)
 {
     /* See the comment above. */
-- 
2.13.6



* [PATCH RFC 4/4] xen: use per-vcpu TSS and stacks for pv domains
From: Juergen Gross @ 2018-01-09 14:27 UTC
  To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, ian.jackson, jbeulich

Instead of using the TSS and stacks of the physical processor allocate
them per vcpu, map them in the per domain area, and use those.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/domain.c        | 45 +++++++++++++++++++++++----
 xen/arch/x86/pv/domain.c     | 72 +++++++++++++++++++++++++++++++++++++++++---
 xen/arch/x86/x86_64/entry.S  |  4 +++
 xen/include/asm-x86/config.h |  9 +++++-
 xen/include/asm-x86/mm.h     |  5 +++
 5 files changed, 124 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index c0cb2cae64..952ed7e121 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1582,7 +1582,12 @@ static void _update_runstate_area(struct vcpu *v)
 
 static inline bool need_full_gdt(const struct domain *d)
 {
-    return is_pv_domain(d) && !is_idle_domain(d);
+    return is_pv_32bit_domain(d);
+}
+
+static inline bool need_per_vcpu_data(const struct domain *d)
+{
+    return is_pv_domain(d) && !is_idle_domain(d) && !is_pv_32bit_domain(d);
 }
 
 static void __context_switch(void)
@@ -1657,8 +1662,19 @@ static void __context_switch(void)
 
     write_ptbase(n);
 
-    if ( need_full_gdt(nd) &&
-         ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(pd)) )
+    if ( need_per_vcpu_data(nd) )
+    {
+        gdt = (struct desc_struct *)GDT_VIRT_START(n);
+        gdt[PER_CPU_GDT_ENTRY].a = cpu;
+
+        gdt_desc.limit = LAST_RESERVED_GDT_BYTE;
+        gdt_desc.base = GDT_VIRT_START(n);
+
+        lgdt(&gdt_desc);
+        ltr(TSS_ENTRY << 3);
+    }
+    else if ( need_full_gdt(nd) &&
+              ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(pd)) )
     {
         gdt_desc.limit = LAST_RESERVED_GDT_BYTE;
         gdt_desc.base = GDT_VIRT_START(n);
@@ -1673,8 +1689,8 @@ static void __context_switch(void)
     per_cpu(curr_vcpu, cpu) = n;
 }
 
-static void context_switch_irqoff(struct vcpu *prev, struct vcpu *next,
-                                  unsigned int cpu)
+void context_switch_irqoff(struct vcpu *prev, struct vcpu *next,
+                           unsigned int cpu)
 {
     const struct domain *prevd = prev->domain, *nextd = next->domain;
 
@@ -1764,7 +1780,24 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
     set_current(next);
 
-    context_switch_irqoff(prev, next, cpu);
+    if ( is_pv_domain(prevd) && !is_pv_32bit_domain(prevd) )
+    {
+        struct desc_struct *gdt = this_cpu(compat_gdt_table) -
+                                  FIRST_RESERVED_GDT_ENTRY;
+        const struct desc_ptr gdtr = {
+            .base = (unsigned long)gdt,
+            .limit = LAST_RESERVED_GDT_BYTE,
+        };
+        void *stack = (struct cpu_info *)(stack_base[cpu] + STACK_SIZE) - 1;
+
+        /* Switch to global accessible gdt and tss. */
+        lgdt(&gdtr);
+        ltr(TSS_ENTRY << 3);
+
+        context_switch_irqoff_stack(prev, next, cpu, stack);
+    }
+    else
+        context_switch_irqoff(prev, next, cpu);
 }
 
 void continue_running(struct vcpu *same)
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index 74e9e667d2..6692aa6922 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -96,10 +96,32 @@ int switch_compat(struct domain *d)
 
 static int pv_create_gdt_ldt_l1tab(struct vcpu *v)
 {
-    return create_perdomain_mapping(v->domain, GDT_VIRT_START(v),
-                                    1U << GDT_LDT_VCPU_SHIFT,
-                                    v->domain->arch.pv_domain.gdt_ldt_l1tab,
-                                    NULL);
+    int rc;
+
+    rc = create_perdomain_mapping(v->domain, GDT_VIRT_START(v),
+                                  1U << GDT_LDT_VCPU_SHIFT,
+                                  v->domain->arch.pv_domain.gdt_ldt_l1tab,
+                                  NULL);
+    if ( !rc && !is_pv_32bit_vcpu(v) )
+    {
+        struct desc_struct *gdt;
+
+        gdt = (struct desc_struct *)GDT_VIRT_START(v) +
+              FIRST_RESERVED_GDT_ENTRY;
+        rc = create_perdomain_mapping(v->domain, (unsigned long)gdt,
+                                      NR_RESERVED_GDT_BYTES,
+                                      NULL, NIL(struct page_info *));
+        if ( !rc )
+        {
+            memcpy(gdt, boot_cpu_gdt_table, NR_RESERVED_GDT_BYTES);
+            _set_tssldt_desc(gdt + TSS_ENTRY - FIRST_RESERVED_GDT_ENTRY,
+                         TSS_START(v),
+                         offsetof(struct tss_struct, __cacheline_filler) - 1,
+                         SYS_DESC_tss_avail);
+        }
+    }
+
+    return rc;
 }
 
 static void pv_destroy_gdt_ldt_l1tab(struct vcpu *v)
@@ -119,6 +141,46 @@ void pv_vcpu_destroy(struct vcpu *v)
     pv_destroy_gdt_ldt_l1tab(v);
     xfree(v->arch.pv_vcpu.trap_ctxt);
     v->arch.pv_vcpu.trap_ctxt = NULL;
+
+    if ( !is_pv_32bit_vcpu(v) )
+        destroy_perdomain_mapping(v->domain, STACKS_START(v),
+                                  STACK_SIZE + PAGE_SIZE);
+}
+
+static int pv_vcpu_init_tss_stacks(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    void *stacks;
+    int rc;
+
+    /* Populate page tables. */
+    rc = create_perdomain_mapping(d, STACKS_START(v), STACK_SIZE + PAGE_SIZE,
+                                  NIL(l1_pgentry_t *), NULL);
+    if ( rc )
+        goto done;
+
+    /* Map TSS. */
+    rc = create_perdomain_mapping(d, TSS_START(v), PAGE_SIZE,
+                                  NULL, NIL(struct page_info *));
+    if ( rc )
+        goto done;
+
+    /* Map stacks. */
+    stacks = (void *)STACKS_START(v);
+    rc = create_perdomain_mapping(d, STACKS_START(v), STACK_SIZE,
+                                  NULL, NIL(struct page_info *));
+    if ( rc )
+        goto done;
+#ifdef MEMORY_GUARD
+    /* Remove guard page. */
+    destroy_perdomain_mapping(d, (unsigned long)memguard_get_guard_page(stacks),
+                              PAGE_SIZE);
+#endif
+
+    tss_init((struct tss_struct *)TSS_START(v), STACKS_START(v));
+
+ done:
+    return rc;
 }
 
 int pv_vcpu_initialise(struct vcpu *v)
@@ -157,6 +219,8 @@ int pv_vcpu_initialise(struct vcpu *v)
         if ( (rc = setup_compat_l4(v)) )
             goto done;
     }
+    else
+        rc = pv_vcpu_init_tss_stacks(v);
 
  done:
     if ( rc )
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 1dd9ccf6a2..997b75167c 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -742,3 +742,7 @@ autogen_stubs: /* Automatically generated stubs. */
 
         .section .init.rodata
         .size autogen_entrypoints, . - autogen_entrypoints
+
+ENTRY(context_switch_irqoff_stack)
+        mov   %rcx, %rsp
+        jmp   context_switch_irqoff
diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h
index 9ef9d03ca7..46096cc666 100644
--- a/xen/include/asm-x86/config.h
+++ b/xen/include/asm-x86/config.h
@@ -202,7 +202,7 @@ extern unsigned char boot_edid_info[128];
 /* Slot 260: per-domain mappings (including map cache). */
 #define PERDOMAIN_VIRT_START    (PML4_ADDR(260))
 #define PERDOMAIN_SLOT_MBYTES   (PML4_ENTRY_BYTES >> (20 + PAGETABLE_ORDER))
-#define PERDOMAIN_SLOTS         3
+#define PERDOMAIN_SLOTS         4
 #define PERDOMAIN_VIRT_SLOT(s)  (PERDOMAIN_VIRT_START + (s) * \
                                  (PERDOMAIN_SLOT_MBYTES << 20))
 /* Slot 261: machine-to-phys conversion table (256GB). */
@@ -310,6 +310,13 @@ extern unsigned long xen_phys_start;
 #define ARG_XLAT_START(v)        \
     (ARG_XLAT_VIRT_START + ((v)->vcpu_id << ARG_XLAT_VA_SHIFT))
 
+/* per-vcpu Xen stacks and TSS. The fourth per-domain-mapping sub-area. */
+#define TSS_STACKS_VIRT_START    PERDOMAIN_VIRT_SLOT(3)
+#define TSS_STACKS_VA_SHIFT      (PAGE_SHIFT + STACK_ORDER + 1)
+#define STACKS_START(v)          (TSS_STACKS_VIRT_START +                    \
+                                  ((v)->vcpu_id << TSS_STACKS_VA_SHIFT))
+#define TSS_START(v)             (STACKS_START(v) + STACK_SIZE)
+
 #define NATIVE_VM_ASSIST_VALID   ((1UL << VMASST_TYPE_4gb_segments)        | \
                                   (1UL << VMASST_TYPE_4gb_segments_notify) | \
                                   (1UL << VMASST_TYPE_writable_pagetables) | \
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 84e112b830..6678bf04f5 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -636,4 +636,9 @@ static inline bool arch_mfn_in_directmap(unsigned long mfn)
     return mfn <= (virt_to_mfn(eva - 1) + 1);
 }
 
+void context_switch_irqoff(struct vcpu *prev, struct vcpu *next,
+                           unsigned int cpu);
+void context_switch_irqoff_stack(struct vcpu *prev, struct vcpu *next,
+                                 unsigned int cpu, void *stack);
+
 #endif /* __ASM_X86_MM_H__ */
-- 
2.13.6



* Re: [PATCH RFC 4/4] xen: use per-vcpu TSS and stacks for pv domains
From: Andrew Cooper @ 2018-01-09 17:01 UTC
  To: Juergen Gross, xen-devel; +Cc: ian.jackson, jbeulich

On 09/01/18 14:27, Juergen Gross wrote:
> Instead of using the TSS and stacks of the physical processor allocate
> them per vcpu, map them in the per domain area, and use those.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>

I don't see anything here which updates the fields in the TSS across
context switch.  Without it, you'll be taking NMIs/MCEs/DF's on the
wrong stack.

I still don't see how your plan is viable in the first place, and it
is adding substantially more complexity to an answer which doesn't
need it.

I'm afraid I'm on the verge of a nack unless you can explain how it is
intended to be safe, and better than what we currently have.

~Andrew


* Re: [PATCH RFC 4/4] xen: use per-vcpu TSS and stacks for pv domains
From: Juergen Gross @ 2018-01-09 17:40 UTC
  To: Andrew Cooper, xen-devel; +Cc: ian.jackson, jbeulich

On 09/01/18 18:01, Andrew Cooper wrote:
> On 09/01/18 14:27, Juergen Gross wrote:
>> Instead of using the TSS and stacks of the physical processor allocate
>> them per vcpu, map them in the per domain area, and use those.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> I don't see anything here which updates the fields in the TSS across
> context switch.  Without it, you'll be taking NMIs/MCEs/DF's on the
> wrong stack.

No, I'm doing ltr() with a TSS referencing the per-vcpu stacks. The
TSS is per vcpu, too.

> I still don't see how your plan is viable in the first place, and it
> is adding substantially more complexity to an answer which doesn't
> need it.
> 
> I'm afraid I'm on the verge of a nack unless you can explain how it is
> intended to be safe, and better than what we currently have.

It is laying the groundwork for a KAISER solution needing no mappings
of per physical cpu areas in the user guest tables, thus isolating the
guests from each other.


Juergen


* Re: [PATCH RFC 4/4] xen: use per-vcpu TSS and stacks for pv domains
From: Andrew Cooper @ 2018-01-09 19:13 UTC
  To: Juergen Gross, xen-devel; +Cc: Ian Jackson, jbeulich

(sorry for the top-post. I'm on my phone) 

I can see you are using ltr, but I don't see anywhere where you are changing the content of the TSS, or the top-of-stack content.

It is very complicated to safely switch IST stacks when you might be taking interrupts. 

~Andrew 

* Re: [PATCH RFC 4/4] xen: use per-vcpu TSS and stacks for pv domains
From: Juergen Gross @ 2018-01-09 19:39 UTC
  To: Andrew Cooper, xen-devel; +Cc: Ian Jackson, jbeulich

On 09/01/18 20:13, Andrew Cooper wrote:
> (sorry for the top-post. I'm on my phone) 
> 
> I can see you are using ltr, but I don't see anywhere where you are changing the content of the TSS, or the top-of-stack content.

The per-vcpu TSS is already initialized with the correct stack
addresses, so it doesn't have to be modified later.

> It is very complicated to safely switch IST stacks when you might be taking interrupts.

Using LTR with a new TSS with both stack areas mapped (old and new)
should work, right?


Juergen


* Re: [PATCH RFC 4/4] xen: use per-vcpu TSS and stacks for pv domains
From: Andrew Cooper @ 2018-01-10 10:40 UTC
  To: Juergen Gross, xen-devel; +Cc: Ian Jackson, jbeulich

On 09/01/18 19:39, Juergen Gross wrote:
> On 09/01/18 20:13, Andrew Cooper wrote:
>> (sorry for the top-post. I'm on my phone) 
>>
>> I can see you are using ltr, but I don't see anywhere where you are changing the content of the TSS, or the top-of-stack content.
> The per-vcpu TSS is already initialized with the correct stack
> addresses, so it doesn't have to be modified later.
>
>> It is very complicated to safely switch IST stacks when you might be taking interrupts.
> Using LTR with a new TSS with both stack areas mapped (old and new)
> should work, right?

The top-of-stack block has pcpu information on it, including
smp_processor_id() and per_cpu_offset.  Switching the cr4 shadow without
hitting an assert is tricky, and was left with a rather large RFC/TODO
in my pre-kaiser series.
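
(For reference, Xen derives that block purely from the stack pointer;
roughly, per xen/include/asm-x86/current.h:

    static inline struct cpu_info *get_cpu_info(void)
    {
        /* cpu_info lives at the top of the currently active stack. */
        register unsigned long sp asm("rsp");

        return (struct cpu_info *)((sp | (STACK_SIZE - 1)) + 1) - 1;
    }

so smp_processor_id() and the per-cpu accessors return whatever sits
at the top of the stack currently in use.)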

The syscall stubs contain absolute stack references in them, so at a
minimum they also need rewriting on context switch.
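
(Roughly, the per-cpu stub emitted by write_stub_trampoline() in
xen/arch/x86/x86_64/traps.c is equivalent to the following; the exact
encoding here is illustrative:

    movabs %rax, stack_bottom - 8    # stash guest %rax in the %ss slot
    mov    %rsp, %rax                # remember the guest %rsp
    movabs $stack_bottom - 8, %rsp   # absolute stack address baked in
    push   %rax                      # guest %rsp into the frame
    jmp    lstar_enter

i.e. stack_bottom is a literal 64-bit immediate in the stub, so the
stubs need regenerating whenever the stack location changes.)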

~Andrew


* Re: [PATCH RFC 4/4] xen: use per-vcpu TSS and stacks for pv domains
From: Juergen Gross @ 2018-01-10 10:53 UTC

On 10/01/18 11:40, Andrew Cooper wrote:
> On 09/01/18 19:39, Juergen Gross wrote:
>> On 09/01/18 20:13, Andrew Cooper wrote:
>>> (sorry for the top-post. I'm on my phone) 
>>>
>>> I can see you are using ltr, but I don't see anywhere where you are changing the content of the TSS, or the top-of-stack content.
>> The per-vcpu TSS is already initialized with the correct stack
>> addresses, so it doesn't have to be modified later.
>>
>>> It is very complicated to safely switch IST stacks when you might be taking interrupts.
>> Using LTR with a new TSS with both stack areas mapped (old and new)
>> should work, right?
> 
> The top-of-stack block has pcpu information on it, including
> smp_processor_id() and per_cpu_offset.  Switching the cr4 shadow without
> hitting an assert is tricky, and was left with a rather large RFC/TODO
> in my pre-kaiser series.
> 
> The syscall stubs contain absolute stack references in them, so at a
> minimum they also need rewriting on context switch.

Aah, okay. This is the information I was after.

So I need to take care of struct cpu_info and the syscall stubs next.


Juergen

