All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/6] x86/PVH: Dom0 building adjustments
@ 2021-09-02  8:30 Jan Beulich
  2021-09-02  8:32 ` [PATCH v2 1/6] x86/P2M: relax guarding of MMIO entries Jan Beulich
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Jan Beulich @ 2021-09-02  8:30 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

The code building PVH Dom0 made use of sequences of P2M changes
which are disallowed as of XSA-378. First of all population of the
first Mb of memory needs to be redone. Then, largely as a
workaround, checking introduced by XSA-378 needs to be slightly
relaxed.

Note that with these adjustments I get Dom0 to start booting on my
development system, but the Dom0 kernel then gets stuck. Since it
was the first time for me to try PVH Dom0 in this context (see
below for why I was hesitant), I cannot tell yet whether this is
due further fallout from the XSA, or some further unrelated
problem. Dom0's BSP makes it all the way through to entering the
idle loop while all APs are still in VPF_down.

[And there was another rather basic issue to fight first (patch
 was submitted separately as RFC): vPCI wasn't aware of hidden PCI
 devices, hitting an ASSERT(). Obviously I couldn't afford not
 having a functioning serial console.]

In the course I ran into an oom condition while populating Dom0's
RAM. Hence next some re-work of dom0_compute_nr_pages(). In turn
in the course of putting that together I did notice that PV Dom0,
when run in shadow mode, wouldn't have its shadow allocation
properly set.

Finally make debug key '0' work at least partially for PVH Dom0
and limit the amount of log output for offline (down) vCPU-s.

1: P2M: relax guarding of MMIO entries
2: P2M: relax permissions of PVH Dom0's MMIO entries
3: PVH: improve Dom0 memory size calculation
4: PV: properly set shadow allocation for Dom0
5: PVH: actually show Dom0's register state from debug key '0'
6: HVM: skip offline vCPU-s when dumping VMCBs/VMCSes

Jan



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/6] x86/P2M: relax guarding of MMIO entries
  2021-09-02  8:30 [PATCH v2 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
@ 2021-09-02  8:32 ` Jan Beulich
  2021-09-06 15:54   ` Andrew Cooper
  2021-09-02  8:33 ` [PATCH v2 2/6] x86/P2M: relax permissions of PVH Dom0's " Jan Beulich
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2021-09-02  8:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, George Dunlap

One of the changes comprising the fixes for XSA-378 disallows replacing
MMIO mappings by code paths not intended for this purpose. At least in
the case of PVH Dom0 hitting an RMRR covered by an E820 ACPI region,
this is too strict. Generally short-circuit requests establishing the
same kind of mapping that's already in place, while otherwise adjusting
permissions without - as before - allowing MFN or type to change.

While there, also add a log message to the other domain_crash()
invocation that did prevent PVH Dom0 from coming up after the XSA-378
changes.

Fixes: 753cb68e6530 ("x86/p2m: guard (in particular) identity mapping entries")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I may have gone too far by allowing "access" to change for all special
types now.
---
v2: Format string and comment adjustments. Split off access
    accumulation.

--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -958,9 +958,13 @@ guest_physmap_add_entry(struct domain *d
         if ( p2m_is_special(ot) )
         {
             /* Don't permit unmapping grant/foreign/direct-MMIO this way. */
-            domain_crash(d);
             p2m_unlock(p2m);
-            
+            printk(XENLOG_G_ERR
+                   "%pd: GFN %#lx (%#lx,%u,%u) -> (%#lx,%u,%u) not permitted\n",
+                   d, gfn_x(gfn) + i,
+                   mfn_x(omfn), ot, a,
+                   mfn_x(mfn) + i, t, p2m->default_access);
+            domain_crash(d);
             return -EPERM;
         }
         else if ( p2m_is_ram(ot) && !p2m_is_paged(ot) )
@@ -1302,9 +1306,24 @@ static int set_typed_p2m_entry(struct do
     }
     if ( p2m_is_special(ot) )
     {
-        gfn_unlock(p2m, gfn, order);
-        domain_crash(d);
-        return -EPERM;
+        /* Special-case (almost) identical mappings. */
+        if ( !mfn_eq(mfn, omfn) || gfn_p2mt != ot )
+        {
+            gfn_unlock(p2m, gfn, order);
+            printk(XENLOG_G_ERR
+                   "%pd: GFN %#lx (%#lx,%u,%u,%u) -> (%#lx,%u,%u,%u) not permitted\n",
+                   d, gfn_l,
+                   mfn_x(omfn), cur_order, ot, a,
+                   mfn_x(mfn), order, gfn_p2mt, access);
+            domain_crash(d);
+            return -EPERM;
+        }
+
+        if ( access == a )
+        {
+            gfn_unlock(p2m, gfn, order);
+            return 0;
+        }
     }
     else if ( p2m_is_ram(ot) )
     {



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 2/6] x86/P2M: relax permissions of PVH Dom0's MMIO entries
  2021-09-02  8:30 [PATCH v2 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
  2021-09-02  8:32 ` [PATCH v2 1/6] x86/P2M: relax guarding of MMIO entries Jan Beulich
@ 2021-09-02  8:33 ` Jan Beulich
  2021-09-06 15:48   ` Andrew Cooper
  2021-09-02  8:33 ` [PATCH v2 3/6] x86/PVH: improve Dom0 memory size calculation Jan Beulich
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2021-09-02  8:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, George Dunlap

To become independent of the sequence of mapping operations, permit
"access" to accumulate for Dom0, noting that there's not going to be an
introspection agent for it which this might interfere with. While e.g.
ideally only ROM regions would get mapped with X set, getting there is
quite a bit of work. Plus the use of p2m_access_* here is abusive in the
first place.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Split off from original patch.

--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1319,6 +1319,18 @@ static int set_typed_p2m_entry(struct do
             return -EPERM;
         }
 
+        /*
+         * Gross bodge, to go away again rather sooner than later:
+         *
+         * For MMIO allow access permissions to accumulate, but only for Dom0.
+         * Since set_identity_p2m_entry() and set_mmio_p2m_entry() differ in
+         * the way they specify "access", this will allow the ultimate result
+         * be independent of the sequence of operations.
+         */
+        if ( is_hardware_domain(d) && gfn_p2mt == p2m_mmio_direct &&
+             access <= p2m_access_rwx && a <= p2m_access_rwx )
+            access |= a;
+
         if ( access == a )
         {
             gfn_unlock(p2m, gfn, order);



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 3/6] x86/PVH: improve Dom0 memory size calculation
  2021-09-02  8:30 [PATCH v2 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
  2021-09-02  8:32 ` [PATCH v2 1/6] x86/P2M: relax guarding of MMIO entries Jan Beulich
  2021-09-02  8:33 ` [PATCH v2 2/6] x86/P2M: relax permissions of PVH Dom0's " Jan Beulich
@ 2021-09-02  8:33 ` Jan Beulich
  2021-09-02  8:34 ` [PATCH v2 4/6] x86/PV: properly set shadow allocation for Dom0 Jan Beulich
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2021-09-02  8:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Assuming that the accounting for IOMMU page tables will also take care
of the P2M needs was wrong: dom0_paging_pages() can determine a far
higher value, high enough for the system to run out of memory while
setting up Dom0. Hence in the case of shared page tables the larger of
the two values needs to be used (without shared page tables the sum of
both continues to be applicable).

While there also account for two further aspects in the PV case: With
"iommu=dom0-passthrough" no IOMMU page tables would get allocated, so
none need accounting for. And if shadow mode is to be enabled, setting
aside a suitable amount for the P2M pool to get populated is also
necessary (i.e. similar to the non-shared-page-tables case of PVH).

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -318,7 +318,7 @@ unsigned long __init dom0_compute_nr_pag
     struct domain *d, struct elf_dom_parms *parms, unsigned long initrd_len)
 {
     nodeid_t node;
-    unsigned long avail = 0, nr_pages, min_pages, max_pages;
+    unsigned long avail = 0, nr_pages, min_pages, max_pages, iommu_pages = 0;
     bool need_paging;
 
     /* The ordering of operands is to work around a clang5 issue. */
@@ -337,18 +337,23 @@ unsigned long __init dom0_compute_nr_pag
         avail -= d->max_vcpus - 1;
 
     /* Reserve memory for iommu_dom0_init() (rough estimate). */
-    if ( is_iommu_enabled(d) )
+    if ( is_iommu_enabled(d) && !iommu_hwdom_passthrough )
     {
         unsigned int s;
 
         for ( s = 9; s < BITS_PER_LONG; s += 9 )
-            avail -= max_pdx >> s;
+            iommu_pages += max_pdx >> s;
+
+        avail -= iommu_pages;
     }
 
-    need_paging = is_hvm_domain(d) &&
-        (!iommu_use_hap_pt(d) || !paging_mode_hap(d));
+    need_paging = is_hvm_domain(d)
+                  ? !iommu_use_hap_pt(d) || !paging_mode_hap(d)
+                  : opt_dom0_shadow;
     for ( ; ; need_paging = false )
     {
+        unsigned long paging_pages;
+
         nr_pages = get_memsize(&dom0_size, avail);
         min_pages = get_memsize(&dom0_min_size, avail);
         max_pages = get_memsize(&dom0_max_size, avail);
@@ -377,11 +382,20 @@ unsigned long __init dom0_compute_nr_pag
         nr_pages = min(nr_pages, max_pages);
         nr_pages = min(nr_pages, avail);
 
-        if ( !need_paging )
-            break;
+        paging_pages = paging_mode_enabled(d) || need_paging
+                       ? dom0_paging_pages(d, nr_pages) : 0;
 
         /* Reserve memory for shadow or HAP. */
-        avail -= dom0_paging_pages(d, nr_pages);
+        if ( !need_paging )
+        {
+            if ( paging_pages <= iommu_pages )
+                break;
+
+            avail -= paging_pages - iommu_pages;
+        }
+        else
+            avail -= paging_pages;
+        iommu_pages = paging_pages;
     }
 
     if ( is_pv_domain(d) &&



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 4/6] x86/PV: properly set shadow allocation for Dom0
  2021-09-02  8:30 [PATCH v2 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
                   ` (2 preceding siblings ...)
  2021-09-02  8:33 ` [PATCH v2 3/6] x86/PVH: improve Dom0 memory size calculation Jan Beulich
@ 2021-09-02  8:34 ` Jan Beulich
  2021-09-02  8:35 ` [PATCH v2 5/6] x86/PVH: actually show Dom0's register state from debug key '0' Jan Beulich
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2021-09-02  8:34 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Tim Deegan

Leaving shadow setup just to the L1TF tasklet means running Dom0 on a
minimally acceptable shadow memory pool, rather than what normally
would be used (also, for example, for PVH). Populate the pool before
triggering the tasklet, on a best effort basis (again like done for
PVH).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
---
v2: Latch dom0_paging_pages() result.

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1298,7 +1298,7 @@ int shadow_set_allocation(struct domain
 {
     struct page_info *sp;
 
-    ASSERT(paging_locked_by_me(d));
+    ASSERT(paging_locked_by_me(d) || system_state < SYS_STATE_active);
 
     if ( pages > 0 )
     {
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -21,6 +21,7 @@
 #include <asm/page.h>
 #include <asm/pv/mm.h>
 #include <asm/setup.h>
+#include <asm/shadow.h>
 
 /* Allow ring-3 access in long mode as guest cannot use ring 1 ... */
 #define BASE_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_USER)
@@ -933,7 +934,18 @@ int __init dom0_construct_pv(struct doma
 #ifdef CONFIG_SHADOW_PAGING
     if ( opt_dom0_shadow )
     {
+        bool preempted;
+
         printk("Switching dom0 to using shadow paging\n");
+
+        nr_pt_pages = dom0_paging_pages(d, nr_pages);
+
+        do {
+            preempted = false;
+            shadow_set_allocation(d, nr_pt_pages, &preempted);
+            process_pending_softirqs();
+        } while ( preempted );
+
         tasklet_schedule(&d->arch.paging.shadow.pv_l1tf_tasklet);
     }
 #endif



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 5/6] x86/PVH: actually show Dom0's register state from debug key '0'
  2021-09-02  8:30 [PATCH v2 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
                   ` (3 preceding siblings ...)
  2021-09-02  8:34 ` [PATCH v2 4/6] x86/PV: properly set shadow allocation for Dom0 Jan Beulich
@ 2021-09-02  8:35 ` Jan Beulich
  2021-09-02  8:36 ` [PATCH v2 6/6] x86/HVM: skip offline vCPU-s when dumping VMCBs/VMCSes Jan Beulich
  2021-09-02 11:42 ` [PATCH v2 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
  6 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2021-09-02  8:35 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Kevin Tian, Jun Nakajima

vcpu_show_registers() didn't do anything for HVM so far. Note though
that some extra hackery is needed for VMX - see the code comment.

Note further that the show_guest_stack() invocation is left alone here:
While strictly speaking guest_kernel_mode() should be predicated by a
PV / !HVM check, show_guest_stack() itself will bail immediately for
HVM.

While there and despite not being PVH-specific, take the opportunity and
filter offline vCPU-s: There's not really any register state associated
with them, so avoid spamming the log with useless information while
still leaving an indication of the fact.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I was pondering whether to also have the VMCS/VMCB dumped for every
vCPU, to present full state. The downside is that for larger systems
this would be a lot of output.
---
v2: New.

--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -49,6 +49,39 @@ static void read_registers(struct cpu_us
     crs[7] = read_gs_shadow();
 }
 
+static void get_hvm_registers(struct vcpu *v, struct cpu_user_regs *regs,
+                              unsigned long crs[8])
+{
+    struct segment_register sreg;
+
+    crs[0] = v->arch.hvm.guest_cr[0];
+    crs[2] = v->arch.hvm.guest_cr[2];
+    crs[3] = v->arch.hvm.guest_cr[3];
+    crs[4] = v->arch.hvm.guest_cr[4];
+
+    hvm_get_segment_register(v, x86_seg_cs, &sreg);
+    regs->cs = sreg.sel;
+
+    hvm_get_segment_register(v, x86_seg_ds, &sreg);
+    regs->ds = sreg.sel;
+
+    hvm_get_segment_register(v, x86_seg_es, &sreg);
+    regs->es = sreg.sel;
+
+    hvm_get_segment_register(v, x86_seg_fs, &sreg);
+    regs->fs = sreg.sel;
+    crs[5] = sreg.base;
+
+    hvm_get_segment_register(v, x86_seg_gs, &sreg);
+    regs->gs = sreg.sel;
+    crs[6] = sreg.base;
+
+    hvm_get_segment_register(v, x86_seg_ss, &sreg);
+    regs->ss = sreg.sel;
+
+    crs[7] = hvm_get_shadow_gs_base(v);
+}
+
 static void _show_registers(
     const struct cpu_user_regs *regs, unsigned long crs[8],
     enum context context, const struct vcpu *v)
@@ -99,27 +132,8 @@ void show_registers(const struct cpu_use
 
     if ( guest_mode(regs) && is_hvm_vcpu(v) )
     {
-        struct segment_register sreg;
+        get_hvm_registers(v, &fault_regs, fault_crs);
         context = CTXT_hvm_guest;
-        fault_crs[0] = v->arch.hvm.guest_cr[0];
-        fault_crs[2] = v->arch.hvm.guest_cr[2];
-        fault_crs[3] = v->arch.hvm.guest_cr[3];
-        fault_crs[4] = v->arch.hvm.guest_cr[4];
-        hvm_get_segment_register(v, x86_seg_cs, &sreg);
-        fault_regs.cs = sreg.sel;
-        hvm_get_segment_register(v, x86_seg_ds, &sreg);
-        fault_regs.ds = sreg.sel;
-        hvm_get_segment_register(v, x86_seg_es, &sreg);
-        fault_regs.es = sreg.sel;
-        hvm_get_segment_register(v, x86_seg_fs, &sreg);
-        fault_regs.fs = sreg.sel;
-        fault_crs[5] = sreg.base;
-        hvm_get_segment_register(v, x86_seg_gs, &sreg);
-        fault_regs.gs = sreg.sel;
-        fault_crs[6] = sreg.base;
-        hvm_get_segment_register(v, x86_seg_ss, &sreg);
-        fault_regs.ss = sreg.sel;
-        fault_crs[7] = hvm_get_shadow_gs_base(v);
     }
     else
     {
@@ -159,24 +173,35 @@ void show_registers(const struct cpu_use
 void vcpu_show_registers(const struct vcpu *v)
 {
     const struct cpu_user_regs *regs = &v->arch.user_regs;
-    bool kernel = guest_kernel_mode(v, regs);
+    struct cpu_user_regs aux_regs;
+    enum context context;
     unsigned long crs[8];
 
-    /* Only handle PV guests for now */
-    if ( !is_pv_vcpu(v) )
-        return;
-
-    crs[0] = v->arch.pv.ctrlreg[0];
-    crs[2] = arch_get_cr2(v);
-    crs[3] = pagetable_get_paddr(kernel ?
-                                 v->arch.guest_table :
-                                 v->arch.guest_table_user);
-    crs[4] = v->arch.pv.ctrlreg[4];
-    crs[5] = v->arch.pv.fs_base;
-    crs[6 + !kernel] = v->arch.pv.gs_base_kernel;
-    crs[7 - !kernel] = v->arch.pv.gs_base_user;
+    if ( is_hvm_vcpu(v) )
+    {
+        aux_regs = *regs;
+        get_hvm_registers(v->domain->vcpu[v->vcpu_id], &aux_regs, crs);
+        regs = &aux_regs;
+        context = CTXT_hvm_guest;
+    }
+    else
+    {
+        bool kernel = guest_kernel_mode(v, regs);
+
+        crs[0] = v->arch.pv.ctrlreg[0];
+        crs[2] = arch_get_cr2(v);
+        crs[3] = pagetable_get_paddr(kernel ?
+                                     v->arch.guest_table :
+                                     v->arch.guest_table_user);
+        crs[4] = v->arch.pv.ctrlreg[4];
+        crs[5] = v->arch.pv.fs_base;
+        crs[6 + !kernel] = v->arch.pv.gs_base_kernel;
+        crs[7 - !kernel] = v->arch.pv.gs_base_user;
+
+        context = CTXT_pv_guest;
+    }
 
-    _show_registers(regs, crs, CTXT_pv_guest, v);
+    _show_registers(regs, crs, context, v);
 }
 
 void show_page_walk(unsigned long addr)
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -631,6 +631,12 @@ void vcpu_show_execution_state(struct vc
 {
     unsigned long flags;
 
+    if ( test_bit(_VPF_down, &v->pause_flags) )
+    {
+        printk("*** %pv is offline ***\n", v);
+        return;
+    }
+
     printk("*** Dumping Dom%d vcpu#%d state: ***\n",
            v->domain->domain_id, v->vcpu_id);
 
@@ -642,6 +648,21 @@ void vcpu_show_execution_state(struct vc
 
     vcpu_pause(v); /* acceptably dangerous */
 
+#ifdef CONFIG_HVM
+    /*
+     * For VMX special care is needed: Reading some of the register state will
+     * require VMCS accesses. Engaging foreign VMCSes involves acquiring of a
+     * lock, which check_lock() would object to when done from an IRQs-disabled
+     * region. Despite this being a layering violation, engage the VMCS right
+     * here. This then also avoids doing so several times in close succession.
+     */
+    if ( cpu_has_vmx && is_hvm_vcpu(v) )
+    {
+        ASSERT(!in_irq());
+        vmx_vmcs_enter(v);
+    }
+#endif
+
     /* Prevent interleaving of output. */
     flags = console_lock_recursive_irqsave();
 
@@ -651,6 +672,11 @@ void vcpu_show_execution_state(struct vc
 
     console_unlock_recursive_irqrestore(flags);
 
+#ifdef CONFIG_HVM
+    if ( cpu_has_vmx && is_hvm_vcpu(v) )
+        vmx_vmcs_exit(v);
+#endif
+
     vcpu_unpause(v);
 }
 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 6/6] x86/HVM: skip offline vCPU-s when dumping VMCBs/VMCSes
  2021-09-02  8:30 [PATCH v2 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
                   ` (4 preceding siblings ...)
  2021-09-02  8:35 ` [PATCH v2 5/6] x86/PVH: actually show Dom0's register state from debug key '0' Jan Beulich
@ 2021-09-02  8:36 ` Jan Beulich
  2021-09-02 11:42 ` [PATCH v2 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
  6 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2021-09-02  8:36 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Kevin Tian, Jun Nakajima

There's not really any register state associated with offline vCPU-s, so
avoid spamming the log with largely useless information while still
leaving an indication of the fact.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: New.

--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -241,6 +241,11 @@ static void vmcb_dump(unsigned char ch)
         printk("\n>>> Domain %d <<<\n", d->domain_id);
         for_each_vcpu ( d, v )
         {
+            if ( test_bit(_VPF_down, &v->pause_flags) )
+            {
+                printk("\tVCPU %u: offline\n", v->vcpu_id);
+                continue;
+            }
             printk("\tVCPU %d\n", v->vcpu_id);
             svm_vmcb_dump("key_handler", v->arch.hvm.svm.vmcb);
         }
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -2133,6 +2133,11 @@ static void vmcs_dump(unsigned char ch)
         printk("\n>>> Domain %d <<<\n", d->domain_id);
         for_each_vcpu ( d, v )
         {
+            if ( test_bit(_VPF_down, &v->pause_flags) )
+            {
+                printk("\tVCPU %u: offline\n", v->vcpu_id);
+                continue;
+            }
             printk("\tVCPU %d\n", v->vcpu_id);
             vmcs_dump_vcpu(v);
         }



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/6] x86/PVH: Dom0 building adjustments
  2021-09-02  8:30 [PATCH v2 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
                   ` (5 preceding siblings ...)
  2021-09-02  8:36 ` [PATCH v2 6/6] x86/HVM: skip offline vCPU-s when dumping VMCBs/VMCSes Jan Beulich
@ 2021-09-02 11:42 ` Jan Beulich
  2021-09-02 12:15   ` Juergen Gross
  6 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2021-09-02 11:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

On 02.09.2021 10:30, Jan Beulich wrote:
> The code building PVH Dom0 made use of sequences of P2M changes
> which are disallowed as of XSA-378. First of all population of the
> first Mb of memory needs to be redone. Then, largely as a
> workaround, checking introduced by XSA-378 needs to be slightly
> relaxed.
> 
> Note that with these adjustments I get Dom0 to start booting on my
> development system, but the Dom0 kernel then gets stuck. Since it
> was the first time for me to try PVH Dom0 in this context (see
> below for why I was hesitant), I cannot tell yet whether this is
> due further fallout from the XSA, or some further unrelated
> problem. Dom0's BSP makes it all the way through to entering the
> idle loop while all APs are still in VPF_down.

This last part of the mystery is now solved: By cloning from my PV
.config, I've inherited the X86_X2APIC=n that I have there. Yet
ACPI's MADT gets populated with only x2APIC entries when building
Dom0, which such a kernel would mostly ignore (except for logging).
IOW in a way this was indeed a missing select, except that what's
needed wouldn't quite work yet:

--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -83,6 +83,6 @@ config XEN_PVH
 	bool "Xen PVH guest support"
 	depends on XEN && XEN_PVHVM && ACPI
 	select PVH
-	def_bool n
+	select X86_X2APIC if XEN_DOM0
 	help
 	  Support for running as a Xen PVH guest.

This is because, as mentioned, XEN_DOM0 gets turned off when XEN_PV
is off. Whereas x2APIC isn't strictly needed for DomU afaics
(MADT gets populated with LAPIC entries only), so the "select" also
shouldn't be unconditional.

While likely no-one will really care, I'd like to note that this
effectively means a 32-bit Linux PVH Dom0 would be impossible, as
X86_X2APIC depends on X86_64. This may want reflecting in the
eventual adjustment to the XEN_DOM0 dependencies.

Btw, I've meanwhile also checked timers: They're active and get
updated while Dom0 is in this funny idle state.

Jan



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/6] x86/PVH: Dom0 building adjustments
  2021-09-02 11:42 ` [PATCH v2 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
@ 2021-09-02 12:15   ` Juergen Gross
  2021-09-02 12:39     ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Juergen Gross @ 2021-09-02 12:15 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné


[-- Attachment #1.1.1: Type: text/plain, Size: 2124 bytes --]

On 02.09.21 13:42, Jan Beulich wrote:
> On 02.09.2021 10:30, Jan Beulich wrote:
>> The code building PVH Dom0 made use of sequences of P2M changes
>> which are disallowed as of XSA-378. First of all population of the
>> first Mb of memory needs to be redone. Then, largely as a
>> workaround, checking introduced by XSA-378 needs to be slightly
>> relaxed.
>>
>> Note that with these adjustments I get Dom0 to start booting on my
>> development system, but the Dom0 kernel then gets stuck. Since it
>> was the first time for me to try PVH Dom0 in this context (see
>> below for why I was hesitant), I cannot tell yet whether this is
>> due further fallout from the XSA, or some further unrelated
>> problem. Dom0's BSP makes it all the way through to entering the
>> idle loop while all APs are still in VPF_down.
> 
> This last part of the mystery is now solved: By cloning from my PV
> .config, I've inherited the X86_X2APIC=n that I have there. Yet
> ACPI's MADT gets populated with only x2APIC entries when building
> Dom0, which such a kernel would mostly ignore (except for logging).
> IOW in a way this was indeed a missing select, except that what's
> needed wouldn't quite work yet:
> 
> --- a/arch/x86/xen/Kconfig
> +++ b/arch/x86/xen/Kconfig
> @@ -83,6 +83,6 @@ config XEN_PVH
>   	bool "Xen PVH guest support"
>   	depends on XEN && XEN_PVHVM && ACPI
>   	select PVH
> -	def_bool n
> +	select X86_X2APIC if XEN_DOM0
>   	help
>   	  Support for running as a Xen PVH guest.
> 
> This is because, as mentioned, XEN_DOM0 gets turned off when XEN_PV
> is off. Whereas x2APIC isn't strictly needed for DomU afaics
> (MADT gets populated with LAPIC entries only), so the "select" also
> shouldn't be unconditional.

Correct.

We should rename XEN_DOM0 to XEN_DOM0_PV, and add a "real" XEN_DOM0.

> While likely no-one will really care, I'd like to note that this
> effectively means a 32-bit Linux PVH Dom0 would be impossible, as
> X86_X2APIC depends on X86_64. This may want reflecting in the
> eventual adjustment to the XEN_DOM0 dependencies.

Indeed.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/6] x86/PVH: Dom0 building adjustments
  2021-09-02 12:15   ` Juergen Gross
@ 2021-09-02 12:39     ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2021-09-02 12:39 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, xen-devel

On 02.09.2021 14:15, Juergen Gross wrote:
> On 02.09.21 13:42, Jan Beulich wrote:
>> On 02.09.2021 10:30, Jan Beulich wrote:
>>> The code building PVH Dom0 made use of sequences of P2M changes
>>> which are disallowed as of XSA-378. First of all population of the
>>> first Mb of memory needs to be redone. Then, largely as a
>>> workaround, checking introduced by XSA-378 needs to be slightly
>>> relaxed.
>>>
>>> Note that with these adjustments I get Dom0 to start booting on my
>>> development system, but the Dom0 kernel then gets stuck. Since it
>>> was the first time for me to try PVH Dom0 in this context (see
>>> below for why I was hesitant), I cannot tell yet whether this is
>>> due further fallout from the XSA, or some further unrelated
>>> problem. Dom0's BSP makes it all the way through to entering the
>>> idle loop while all APs are still in VPF_down.
>>
>> This last part of the mystery is now solved: By cloning from my PV
>> .config, I've inherited the X86_X2APIC=n that I have there. Yet
>> ACPI's MADT gets populated with only x2APIC entries when building
>> Dom0, which such a kernel would mostly ignore (except for logging).
>> IOW in a way this was indeed a missing select, except that what's
>> needed wouldn't quite work yet:
>>
>> --- a/arch/x86/xen/Kconfig
>> +++ b/arch/x86/xen/Kconfig
>> @@ -83,6 +83,6 @@ config XEN_PVH
>>   	bool "Xen PVH guest support"
>>   	depends on XEN && XEN_PVHVM && ACPI
>>   	select PVH
>> -	def_bool n
>> +	select X86_X2APIC if XEN_DOM0
>>   	help
>>   	  Support for running as a Xen PVH guest.
>>
>> This is because, as mentioned, XEN_DOM0 gets turned off when XEN_PV
>> is off. Whereas x2APIC isn't strictly needed for DomU afaics
>> (MADT gets populated with LAPIC entries only), so the "select" also
>> shouldn't be unconditional.
> 
> Correct.
> 
> We should rename XEN_DOM0 to XEN_DOM0_PV, and add a "real" XEN_DOM0.

Actually, as I have just found, the lack of XEN_DOM0 _is_ a problem:
xen_initial_domain() gets hardcoded to 0 without it. I'll have to
make a(nother) patch along the lines of what you suggest; hvc-xen
and "earlyprintk=xen" also don't really work together, because the
first thing xenboot_write_console() does is a !xen_pv_domain() check.

Jan



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/6] x86/P2M: relax permissions of PVH Dom0's MMIO entries
  2021-09-02  8:33 ` [PATCH v2 2/6] x86/P2M: relax permissions of PVH Dom0's " Jan Beulich
@ 2021-09-06 15:48   ` Andrew Cooper
  2021-09-06 15:55     ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Cooper @ 2021-09-06 15:48 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné, George Dunlap

On 02/09/2021 09:33, Jan Beulich wrote:
> To become independent of the sequence of mapping operations, permit
> "access" to accumulate for Dom0, noting that there's not going to be an
> introspection agent for it which this might interfere with. While e.g.
> ideally only ROM regions would get mapped with X set, getting there is
> quite a bit of work.

?

That's literally the opposite of what needs to happen to fix this bug. 
Introspection is the only interface which should be restricting X
permissions.

>  Plus the use of p2m_access_* here is abusive in the
> first place.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Split off from original patch.
>
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1319,6 +1319,18 @@ static int set_typed_p2m_entry(struct do
>              return -EPERM;
>          }
>  
> +        /*
> +         * Gross bodge, to go away again rather sooner than later:
> +         *
> +         * For MMIO allow access permissions to accumulate, but only for Dom0.
> +         * Since set_identity_p2m_entry() and set_mmio_p2m_entry() differ in
> +         * the way they specify "access", this will allow the ultimate result
> +         * be independent of the sequence of operations.

"result to be"

~Andrew

> +         */
> +        if ( is_hardware_domain(d) && gfn_p2mt == p2m_mmio_direct &&
> +             access <= p2m_access_rwx && a <= p2m_access_rwx )
> +            access |= a;
> +
>          if ( access == a )
>          {
>              gfn_unlock(p2m, gfn, order);
>
>



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/6] x86/P2M: relax guarding of MMIO entries
  2021-09-02  8:32 ` [PATCH v2 1/6] x86/P2M: relax guarding of MMIO entries Jan Beulich
@ 2021-09-06 15:54   ` Andrew Cooper
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew Cooper @ 2021-09-06 15:54 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné, George Dunlap

On 02/09/2021 09:32, Jan Beulich wrote:
> One of the changes comprising the fixes for XSA-378 disallows replacing
> MMIO mappings by code paths not intended for this purpose. At least in
> the case of PVH Dom0 hitting an RMRR covered by an E820 ACPI region,
> this is too strict. Generally short-circuit requests establishing the
> same kind of mapping that's already in place, while otherwise adjusting
> permissions without - as before - allowing MFN or type to change.

"Generally short-circuit requests establishing the same kind of mapping
(mfn, type) but allow the permissions to differ".

> While there, also add a log message to the other domain_crash()
> invocation that did prevent PVH Dom0 from coming up after the XSA-378
> changes.
>
> Fixes: 753cb68e6530 ("x86/p2m: guard (in particular) identity mapping entries")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> I may have gone too far by allowing "access" to change for all special
> types now.

I think this is appropriate.  After all, it is the pre-existing
behaviour, and the type change is the important thing to restrict.

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/6] x86/P2M: relax permissions of PVH Dom0's MMIO entries
  2021-09-06 15:48   ` Andrew Cooper
@ 2021-09-06 15:55     ` Jan Beulich
  2021-09-06 19:23       ` Andrew Cooper
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2021-09-06 15:55 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, George Dunlap, xen-devel

On 06.09.2021 17:48, Andrew Cooper wrote:
> On 02/09/2021 09:33, Jan Beulich wrote:
>> To become independent of the sequence of mapping operations, permit
>> "access" to accumulate for Dom0, noting that there's not going to be an
>> introspection agent for it which this might interfere with. While e.g.
>> ideally only ROM regions would get mapped with X set, getting there is
>> quite a bit of work.
> 
> ?
> 
> That's literally the opposite of what needs to happen to fix this bug. 
> Introspection is the only interface which should be restricting X
> permissions.

What agent would be handling access violations in Dom0? The description
(now) focuses on entirely Dom0; I agree that DomU wants things the way
you describe (provided all p2m_access_t abuses are gone).

Jan



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/6] x86/P2M: relax permissions of PVH Dom0's MMIO entries
  2021-09-06 15:55     ` Jan Beulich
@ 2021-09-06 19:23       ` Andrew Cooper
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew Cooper @ 2021-09-06 19:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Wei Liu, Roger Pau Monné, George Dunlap, xen-devel

On 06/09/2021 16:55, Jan Beulich wrote:
> On 06.09.2021 17:48, Andrew Cooper wrote:
>> On 02/09/2021 09:33, Jan Beulich wrote:
>>> To become independent of the sequence of mapping operations, permit
>>> "access" to accumulate for Dom0, noting that there's not going to be an
>>> introspection agent for it which this might interfere with. While e.g.
>>> ideally only ROM regions would get mapped with X set, getting there is
>>> quite a bit of work.
>> ?
>>
>> That's literally the opposite of what needs to happen to fix this bug. 
>> Introspection is the only interface which should be restricting X
>> permissions.
> What agent would be handling access violations in Dom0?

None.  dom0 really shouldn't have any NX mappings in the first place.

~Andrew


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-09-06 19:23 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-02  8:30 [PATCH v2 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
2021-09-02  8:32 ` [PATCH v2 1/6] x86/P2M: relax guarding of MMIO entries Jan Beulich
2021-09-06 15:54   ` Andrew Cooper
2021-09-02  8:33 ` [PATCH v2 2/6] x86/P2M: relax permissions of PVH Dom0's " Jan Beulich
2021-09-06 15:48   ` Andrew Cooper
2021-09-06 15:55     ` Jan Beulich
2021-09-06 19:23       ` Andrew Cooper
2021-09-02  8:33 ` [PATCH v2 3/6] x86/PVH: improve Dom0 memory size calculation Jan Beulich
2021-09-02  8:34 ` [PATCH v2 4/6] x86/PV: properly set shadow allocation for Dom0 Jan Beulich
2021-09-02  8:35 ` [PATCH v2 5/6] x86/PVH: actually show Dom0's register state from debug key '0' Jan Beulich
2021-09-02  8:36 ` [PATCH v2 6/6] x86/HVM: skip offline vCPU-s when dumping VMCBs/VMCSes Jan Beulich
2021-09-02 11:42 ` [PATCH v2 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
2021-09-02 12:15   ` Juergen Gross
2021-09-02 12:39     ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.