* [PATCH v4 0/6] x86/PVH: Dom0 building adjustments
@ 2021-09-29 13:11 Jan Beulich
  2021-09-29 13:13 ` [PATCH v4 1/6] x86/PVH: improve Dom0 memory size calculation Jan Beulich
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Jan Beulich @ 2021-09-29 13:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

In the course of fixing XSA-378 fallout I ran into a number of
issues; this series tries to deal with the remaining ones. The
majority of the changes are largely independent of one another.

There was another rather basic issue to fight first (patch
was submitted separately as RFC): vPCI wasn't aware of hidden PCI
devices, hitting an ASSERT(). Obviously I couldn't afford not
having a functioning serial console.

Compared to v3 there is one new patch here, and the video mode
propagation patch has been dropped, as it was effectively
rejected. I realize that the discussion on what exactly to include
in patch 3 has not settled yet; that patch is deliberately
unchanged. See individual patches for details.

1: PVH: improve Dom0 memory size calculation
2: PV: properly set shadow allocation for Dom0
3: PVH: permit more physdevop-s to be used by Dom0
4: HVM: also dump stacks from show_execution_state()
5: HVM: skip offline vCPU-s when dumping VMCBs/VMCSes
6: P2M: relax permissions of PVH Dom0's MMIO entries

Jan



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4 1/6] x86/PVH: improve Dom0 memory size calculation
  2021-09-29 13:11 [PATCH v4 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
@ 2021-09-29 13:13 ` Jan Beulich
  2021-10-21 10:54   ` Jan Beulich
  2021-10-22  9:55   ` Roger Pau Monné
  2021-09-29 13:13 ` [PATCH v4 2/6] x86/PV: properly set shadow allocation for Dom0 Jan Beulich
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 14+ messages in thread
From: Jan Beulich @ 2021-09-29 13:13 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Assuming that the accounting for IOMMU page tables will also take care
of the P2M needs was wrong: dom0_paging_pages() can determine a far
higher value, high enough for the system to run out of memory while
setting up Dom0. Hence in the case of shared page tables the larger of
the two values needs to be used (without shared page tables the sum of
both continues to be applicable).

To avoid further complicating the logic, eliminate the
up-to-2-iteration loop in favor of doing a few calculations twice
(before and after calling dom0_paging_pages()). While this will lead
to slightly too high a value in "cpu_pages", it is deemed better to
account for a few pages too many than a few too few.
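
The "clamp according to min/max limits and available memory" step that
now runs twice (preliminarily and finally) can be sketched as a small
standalone helper. This is a simplified model for illustration only,
not the actual Xen code; the function name is made up:

```c
#include <assert.h>

/* Simplified model of the clamping in dom0_compute_nr_pages():
 * apply the configured minimum, then the configured maximum, then
 * cap at what is actually available.  In the patch this runs once
 * before dom0_paging_pages() (preliminary) and once after, with a
 * reduced "avail" (final). */
static unsigned long clamp_nr_pages(unsigned long nr_pages,
                                    unsigned long min_pages,
                                    unsigned long max_pages,
                                    unsigned long avail)
{
    if ( nr_pages < min_pages )
        nr_pages = min_pages;
    if ( nr_pages > max_pages )
        nr_pages = max_pages;
    if ( nr_pages > avail )
        nr_pages = avail;
    return nr_pages;
}
```

Note the ordering: the availability cap is applied last, so even a
configured minimum cannot push the allocation past what remains.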

Also uniformly use paging_mode_enabled(), not is_hvm_domain().

While there, also account for two further aspects in the PV case: With
"iommu=dom0-passthrough" no IOMMU page tables would get allocated, so
none need accounting for. And if shadow mode is to be enabled, setting
aside a suitable amount for the P2M pool to get populated is also
necessary (i.e. similar to the non-shared-page-tables case of PVH).
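
The "rough estimate" loop for IOMMU page tables, which the patch turns
into a separate iommu_pages count, sums one table page per 512 (2^9)
entries at each paging level. A standalone sketch of that arithmetic,
assuming 64-bit longs (the function name is illustrative):

```c
#include <assert.h>

#define BITS_PER_LONG 64

/* Rough upper bound on the number of IOMMU page-table pages needed
 * to map max_pdx pages: one table page covers 512 (2^9) entries, so
 * sum max_pdx >> 9, >> 18, >> 27, ... over all levels.  Mirrors the
 * estimate loop in dom0_compute_nr_pages(). */
static unsigned long iommu_pt_estimate(unsigned long max_pdx)
{
    unsigned long pages = 0;
    unsigned int s;

    for ( s = 9; s < BITS_PER_LONG; s += 9 )
        pages += max_pdx >> s;

    return pages;
}
```

For example, a 4TB direct map (max_pdx = 2^30 pages) yields roughly
2^21 + 2^12 + 2^3 table pages, i.e. about 8GB of page tables at most.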

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I wonder whether this isn't enough to drop the "PVH dom0 without
dom0_mem" warning.

--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -318,8 +318,7 @@ unsigned long __init dom0_compute_nr_pag
     struct domain *d, struct elf_dom_parms *parms, unsigned long initrd_len)
 {
     nodeid_t node;
-    unsigned long avail = 0, nr_pages, min_pages, max_pages;
-    bool need_paging;
+    unsigned long avail = 0, nr_pages, min_pages, max_pages, iommu_pages = 0;
 
     /* The ordering of operands is to work around a clang5 issue. */
     if ( CONFIG_DOM0_MEM[0] && !dom0_mem_set )
@@ -337,53 +336,65 @@ unsigned long __init dom0_compute_nr_pag
         avail -= d->max_vcpus - 1;
 
     /* Reserve memory for iommu_dom0_init() (rough estimate). */
-    if ( is_iommu_enabled(d) )
+    if ( is_iommu_enabled(d) && !iommu_hwdom_passthrough )
     {
         unsigned int s;
 
         for ( s = 9; s < BITS_PER_LONG; s += 9 )
-            avail -= max_pdx >> s;
+            iommu_pages += max_pdx >> s;
+
+        avail -= iommu_pages;
+    }
+
+    nr_pages = get_memsize(&dom0_size, avail);
+
+    /*
+     * If allocation isn't specified, reserve 1/16th of available memory for
+     * things like DMA buffers. This reservation is clamped to a maximum of
+     * 128MB.
+     */
+    if ( !nr_pages )
+    {
+        nr_pages = avail - (pv_shim ? pv_shim_mem(avail)
+                            : min(avail / 16, 128UL << (20 - PAGE_SHIFT)));
+        if ( paging_mode_enabled(d) )
+            /*
+             * Temporary workaround message until internal (paging) memory
+             * accounting required to build a pvh dom0 is improved.
+             */
+            printk("WARNING: PVH dom0 without dom0_mem set is still unstable. "
+                   "If you get crashes during boot, try adding a dom0_mem parameter\n");
     }
 
-    need_paging = is_hvm_domain(d) &&
-        (!iommu_use_hap_pt(d) || !paging_mode_hap(d));
-    for ( ; ; need_paging = false )
+    if ( paging_mode_enabled(d) || opt_dom0_shadow )
     {
-        nr_pages = get_memsize(&dom0_size, avail);
-        min_pages = get_memsize(&dom0_min_size, avail);
-        max_pages = get_memsize(&dom0_max_size, avail);
+        unsigned long cpu_pages;
 
         /*
-         * If allocation isn't specified, reserve 1/16th of available memory
-         * for things like DMA buffers. This reservation is clamped to a
-         * maximum of 128MB.
+         * Clamp according to min/max limits and available memory
+         * (preliminary).
          */
-        if ( !nr_pages )
-        {
-            nr_pages = avail - (pv_shim ? pv_shim_mem(avail)
-                                 : min(avail / 16, 128UL << (20 - PAGE_SHIFT)));
-            if ( is_hvm_domain(d) && !need_paging )
-                /*
-                 * Temporary workaround message until internal (paging) memory
-                 * accounting required to build a pvh dom0 is improved.
-                 */
-                printk("WARNING: PVH dom0 without dom0_mem set is still unstable. "
-                       "If you get crashes during boot, try adding a dom0_mem parameter\n");
-        }
-
-
-        /* Clamp according to min/max limits and available memory. */
-        nr_pages = max(nr_pages, min_pages);
-        nr_pages = min(nr_pages, max_pages);
+        nr_pages = max(nr_pages, get_memsize(&dom0_min_size, avail));
+        nr_pages = min(nr_pages, get_memsize(&dom0_max_size, avail));
         nr_pages = min(nr_pages, avail);
 
-        if ( !need_paging )
-            break;
+        cpu_pages = dom0_paging_pages(d, nr_pages);
 
-        /* Reserve memory for shadow or HAP. */
-        avail -= dom0_paging_pages(d, nr_pages);
+        if ( !iommu_use_hap_pt(d) )
+            avail -= cpu_pages;
+        else if ( cpu_pages > iommu_pages )
+            avail -= cpu_pages - iommu_pages;
     }
 
+    nr_pages = get_memsize(&dom0_size, avail);
+    min_pages = get_memsize(&dom0_min_size, avail);
+    max_pages = get_memsize(&dom0_max_size, avail);
+
+    /* Clamp according to min/max limits and available memory (final). */
+    nr_pages = max(nr_pages, min_pages);
+    nr_pages = min(nr_pages, max_pages);
+    nr_pages = min(nr_pages, avail);
+
     if ( is_pv_domain(d) &&
          (parms->p2m_base == UNSET_ADDR) && !memsize_gt_zero(&dom0_size) &&
          (!memsize_gt_zero(&dom0_min_size) || (nr_pages > min_pages)) )




* [PATCH v4 2/6] x86/PV: properly set shadow allocation for Dom0
  2021-09-29 13:11 [PATCH v4 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
  2021-09-29 13:13 ` [PATCH v4 1/6] x86/PVH: improve Dom0 memory size calculation Jan Beulich
@ 2021-09-29 13:13 ` Jan Beulich
  2021-09-29 13:14 ` [PATCH v4 3/6] x86/PVH: permit more physdevop-s to be used by Dom0 Jan Beulich
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2021-09-29 13:13 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Tim Deegan

Leaving shadow setup just to the L1TF tasklet means running Dom0 on a
minimally acceptable shadow memory pool, rather than what would
normally be used (as is done, for example, for PVH). Populate the pool
before triggering the tasklet (or in preparation for the L1TF checking
logic to trigger it), on a best-effort basis (again as done for PVH).
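
The best-effort population in the patch follows a common Xen pattern:
call the preemptible allocation routine in a loop, processing pending
softirqs between attempts, until it completes without being preempted.
A minimal standalone model of that pattern (try_fill_pool() here is a
stand-in, not shadow_set_allocation()):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a preemptible operation: pretends to need several
 * attempts before completing, the way shadow_set_allocation() may
 * set *preempted and return before all pages are allocated. */
static int attempts_left = 3;

static void try_fill_pool(bool *preempted)
{
    if ( --attempts_left > 0 )
        *preempted = true;   /* more work remains */
}

/* Retry-until-done loop, as in the patch; returns the number of
 * iterations taken, purely for illustration. */
static unsigned int fill_pool(void)
{
    bool preempted;
    unsigned int iterations = 0;

    do {
        preempted = false;
        try_fill_pool(&preempted);
        /* process_pending_softirqs() would go here in Xen */
        ++iterations;
    } while ( preempted );

    return iterations;
}
```

Resetting the flag to false at the top of each iteration matters: the
callee only ever sets it, so a stale true would loop forever.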

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
v4: Also fill pool when opt_pv_l1tf_hwdom is set.
v2: Latch dom0_paging_pages() result.

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1298,7 +1298,7 @@ int shadow_set_allocation(struct domain
 {
     struct page_info *sp;
 
-    ASSERT(paging_locked_by_me(d));
+    ASSERT(paging_locked_by_me(d) || system_state < SYS_STATE_active);
 
     if ( pages > 0 )
     {
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -21,6 +21,7 @@
 #include <asm/page.h>
 #include <asm/pv/mm.h>
 #include <asm/setup.h>
+#include <asm/shadow.h>
 
 /* Allow ring-3 access in long mode as guest cannot use ring 1 ... */
 #define BASE_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_USER)
@@ -928,8 +929,22 @@ int __init dom0_construct_pv(struct doma
     if ( d->domain_id == hardware_domid )
         iommu_hwdom_init(d);
 
-    /* Activate shadow mode, if requested.  Reuse the pv_l1tf tasklet. */
 #ifdef CONFIG_SHADOW_PAGING
+    /* Fill the shadow pool if necessary. */
+    if ( opt_dom0_shadow || opt_pv_l1tf_hwdom )
+    {
+        bool preempted;
+
+        nr_pt_pages = dom0_paging_pages(d, nr_pages);
+
+        do {
+            preempted = false;
+            shadow_set_allocation(d, nr_pt_pages, &preempted);
+            process_pending_softirqs();
+        } while ( preempted );
+    }
+
+    /* Activate shadow mode, if requested.  Reuse the pv_l1tf tasklet. */
     if ( opt_dom0_shadow )
     {
         printk("Switching dom0 to using shadow paging\n");




* [PATCH v4 3/6] x86/PVH: permit more physdevop-s to be used by Dom0
  2021-09-29 13:11 [PATCH v4 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
  2021-09-29 13:13 ` [PATCH v4 1/6] x86/PVH: improve Dom0 memory size calculation Jan Beulich
  2021-09-29 13:13 ` [PATCH v4 2/6] x86/PV: properly set shadow allocation for Dom0 Jan Beulich
@ 2021-09-29 13:14 ` Jan Beulich
  2021-10-22 10:17   ` Roger Pau Monné
  2021-09-29 13:14 ` [PATCH v4 4/6] x86/HVM: also dump stacks from show_execution_state() Jan Beulich
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2021-09-29 13:14 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Certain notifications from Dom0 to Xen are independent of the mode Dom0
is running in. Permit further PCI-related ones (only their modern
forms). Also include the USB2 debug port operation on this occasion.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I'm uncertain about the has_vpci() part of the check: I would think
is_hardware_domain() is both sufficient and concise. Without vPCI a PVH
Dom0 won't see any PCI devices in the first place (and hence would
effectively be non-functioning). Dropping this would in particular make
PHYSDEVOP_dbgp_op better fit in the mix.
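
The gating being discussed is just a switch over the op number with a
shared permission check. A standalone model of that shape (the enum
values and helper here are illustrative, not Xen's actual definitions):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative subset of physdevop commands (values are made up). */
enum physdevop {
    PHYSDEVOP_pci_mmcfg_reserved,
    PHYSDEVOP_pci_device_add,
    PHYSDEVOP_pci_device_remove,
    PHYSDEVOP_other,   /* stand-in for an op not whitelisted here */
};

/* Model of the check in hvm_physdev_op(): the listed PCI notification
 * ops are permitted only for a hardware domain that has vPCI - the
 * cover letter questions whether the vPCI half is needed at all. */
static long physdev_op_allowed(enum physdevop cmd, bool has_vpci,
                               bool is_hwdom)
{
    switch ( cmd )
    {
    case PHYSDEVOP_pci_mmcfg_reserved:
    case PHYSDEVOP_pci_device_add:
    case PHYSDEVOP_pci_device_remove:
        if ( !has_vpci || !is_hwdom )
            return -ENOSYS;
        return 0;

    default:
        return -ENOSYS;
    }
}
```

Dropping the has_vpci() half would amount to deleting the `!has_vpci`
operand, leaving is_hardware_domain() as the sole gate.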
---
v3: New.

--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -94,6 +94,12 @@ static long hvm_physdev_op(int cmd, XEN_
         break;
 
     case PHYSDEVOP_pci_mmcfg_reserved:
+    case PHYSDEVOP_pci_device_add:
+    case PHYSDEVOP_pci_device_remove:
+    case PHYSDEVOP_restore_msi_ext:
+    case PHYSDEVOP_dbgp_op:
+    case PHYSDEVOP_prepare_msix:
+    case PHYSDEVOP_release_msix:
         if ( !has_vpci(currd) || !is_hardware_domain(currd) )
             return -ENOSYS;
         break;




* [PATCH v4 4/6] x86/HVM: also dump stacks from show_execution_state()
  2021-09-29 13:11 [PATCH v4 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
                   ` (2 preceding siblings ...)
  2021-09-29 13:14 ` [PATCH v4 3/6] x86/PVH: permit more physdevop-s to be used by Dom0 Jan Beulich
@ 2021-09-29 13:14 ` Jan Beulich
  2021-10-22 10:30   ` Roger Pau Monné
  2021-09-29 13:15 ` [PATCH v4 5/6] x86/HVM: skip offline vCPU-s when dumping VMCBs/VMCSes Jan Beulich
  2021-09-29 13:15 ` [PATCH v4 6/6] x86/P2M: relax permissions of PVH Dom0's MMIO entries Jan Beulich
  5 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2021-09-29 13:14 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Wire up show_hvm_stack() also on this path. Move the show_guest_stack()
invocation out of show_stack(), rendering dead the is-HVM check there.

While separating guest and host paths, also move the show_code()
invocation - the function bails immediately when guest_mode() returns
"true".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v4: New.

--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -303,10 +303,6 @@ static void show_guest_stack(struct vcpu
     unsigned long mask = STACK_SIZE;
     void *stack_page = NULL;
 
-    /* Avoid HVM as we don't know what the stack looks like. */
-    if ( is_hvm_vcpu(v) )
-        return;
-
     if ( is_pv_32bit_vcpu(v) )
     {
         compat_show_guest_stack(v, regs, debug_stack_lines);
@@ -611,14 +607,11 @@ static void show_trace(const struct cpu_
     printk("\n");
 }
 
-void show_stack(const struct cpu_user_regs *regs)
+static void show_stack(const struct cpu_user_regs *regs)
 {
     unsigned long *stack = ESP_BEFORE_EXCEPTION(regs), *stack_bottom, addr;
     int i;
 
-    if ( guest_mode(regs) )
-        return show_guest_stack(current, regs);
-
     printk("Xen stack trace from "__OP"sp=%p:\n  ", stack);
 
     stack_bottom = _p(get_stack_dump_bottom(regs->rsp));
@@ -687,8 +680,30 @@ void show_execution_state(const struct c
     unsigned long flags = console_lock_recursive_irqsave();
 
     show_registers(regs);
-    show_code(regs);
-    show_stack(regs);
+
+    if ( guest_mode(regs) )
+    {
+        struct vcpu *curr = current;
+
+        if ( is_hvm_vcpu(curr) )
+        {
+            /*
+             * Stop interleaving prevention: The necessary P2M lookups
+             * involve locking, which has to occur with IRQs enabled.
+             */
+            console_unlock_recursive_irqrestore(flags);
+
+            show_hvm_stack(curr, regs);
+            return;
+        }
+
+        show_guest_stack(curr, regs);
+    }
+    else
+    {
+        show_code(regs);
+        show_stack(regs);
+    }
 
     console_unlock_recursive_irqrestore(flags);
 }
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -493,7 +493,6 @@ static always_inline void rep_nop(void)
 #define cpu_relax() rep_nop()
 
 void show_code(const struct cpu_user_regs *regs);
-void show_stack(const struct cpu_user_regs *regs);
 void show_stack_overflow(unsigned int cpu, const struct cpu_user_regs *regs);
 void show_registers(const struct cpu_user_regs *regs);
 void show_execution_state(const struct cpu_user_regs *regs);




* [PATCH v4 5/6] x86/HVM: skip offline vCPU-s when dumping VMCBs/VMCSes
  2021-09-29 13:11 [PATCH v4 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
                   ` (3 preceding siblings ...)
  2021-09-29 13:14 ` [PATCH v4 4/6] x86/HVM: also dump stacks from show_execution_state() Jan Beulich
@ 2021-09-29 13:15 ` Jan Beulich
  2021-09-29 13:15 ` [PATCH v4 6/6] x86/P2M: relax permissions of PVH Dom0's MMIO entries Jan Beulich
  5 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2021-09-29 13:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Kevin Tian, Jun Nakajima

There's not really any register state associated with vCPU-s that
haven't been initialized yet, so avoid spamming the log with largely
useless information while still leaving an indication of the fact.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
v4: Key off of v->is_initialised.
v2: New.

--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -242,6 +242,11 @@ static void vmcb_dump(unsigned char ch)
         printk("\n>>> Domain %d <<<\n", d->domain_id);
         for_each_vcpu ( d, v )
         {
+            if ( !v->is_initialised )
+            {
+                printk("\tVCPU %u: not initialized\n", v->vcpu_id);
+                continue;
+            }
             printk("\tVCPU %d\n", v->vcpu_id);
             svm_vmcb_dump("key_handler", v->arch.hvm.svm.vmcb);
         }
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -2133,6 +2133,11 @@ static void vmcs_dump(unsigned char ch)
         printk("\n>>> Domain %d <<<\n", d->domain_id);
         for_each_vcpu ( d, v )
         {
+            if ( !v->is_initialised )
+            {
+                printk("\tVCPU %u: not initialized\n", v->vcpu_id);
+                continue;
+            }
             printk("\tVCPU %d\n", v->vcpu_id);
             vmcs_dump_vcpu(v);
         }




* [PATCH v4 6/6] x86/P2M: relax permissions of PVH Dom0's MMIO entries
  2021-09-29 13:11 [PATCH v4 0/6] x86/PVH: Dom0 building adjustments Jan Beulich
                   ` (4 preceding siblings ...)
  2021-09-29 13:15 ` [PATCH v4 5/6] x86/HVM: skip offline vCPU-s when dumping VMCBs/VMCSes Jan Beulich
@ 2021-09-29 13:15 ` Jan Beulich
  2021-10-22 13:25   ` Roger Pau Monné
  5 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2021-09-29 13:15 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, George Dunlap

To become independent of the sequence of mapping operations, permit
"access" to accumulate for Dom0, noting that there's not going to be an
introspection agent for it that this might interfere with. While
ideally, e.g., only ROM regions would get mapped with X set, getting
there is quite a bit of work. Plus the use of p2m_access_* here is
abusive in the first place.
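
The accumulation works because, for values up to p2m_access_rwx, the
p2m_access_t enumeration behaves as a plain R/W/X bitmap, so OR-ing two
values yields their union - which is what the `access |= a` in the diff
relies on (and why both operands are checked against p2m_access_rwx
first). A standalone sketch (the enum values mirror Xen's low
p2m_access_t range; the helper itself is illustrative):

```c
#include <assert.h>

/* Low range of p2m_access_t: R = bit 0, W = bit 1, X = bit 2, making
 * these values a plain permission bitmap. */
enum p2m_access {
    p2m_access_n   = 0,
    p2m_access_r   = 1,
    p2m_access_w   = 2,
    p2m_access_rw  = 3,
    p2m_access_x   = 4,
    p2m_access_rx  = 5,
    p2m_access_wx  = 6,
    p2m_access_rwx = 7,
};

/* Illustrative helper: accumulate an incoming access with the existing
 * one, as the patch does for Dom0's MMIO entries.  Only valid for
 * values in the bitmap range (<= p2m_access_rwx); higher enumerators
 * such as rx2rw are not bitmaps, hence the patch's range checks. */
static enum p2m_access accumulate_access(enum p2m_access existing,
                                         enum p2m_access incoming)
{
    return (enum p2m_access)(existing | incoming);
}
```

With this, mapping a region first R-only and then X (or vice versa)
converges on the same rx result regardless of ordering.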

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Move last in series, for being controversial.
v2: Split off from original patch. Accumulate all of R, W, and X.

--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1319,6 +1319,18 @@ static int set_typed_p2m_entry(struct do
             return -EPERM;
         }
 
+        /*
+         * Gross bodge, to go away again rather sooner than later:
+         *
+         * For MMIO allow access permissions to accumulate, but only for Dom0.
+         * Since set_identity_p2m_entry() and set_mmio_p2m_entry() differ in
+         * the way they specify "access", this will allow the ultimate result
+         * to be independent of the sequence of operations.
+         */
+        if ( is_hardware_domain(d) && gfn_p2mt == p2m_mmio_direct &&
+             access <= p2m_access_rwx && a <= p2m_access_rwx )
+            access |= a;
+
         if ( access == a )
         {
             gfn_unlock(p2m, gfn, order);




* Re: [PATCH v4 1/6] x86/PVH: improve Dom0 memory size calculation
  2021-09-29 13:13 ` [PATCH v4 1/6] x86/PVH: improve Dom0 memory size calculation Jan Beulich
@ 2021-10-21 10:54   ` Jan Beulich
  2021-10-22  9:55   ` Roger Pau Monné
  1 sibling, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2021-10-21 10:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

On 29.09.2021 15:13, Jan Beulich wrote:
> @@ -337,53 +336,65 @@ unsigned long __init dom0_compute_nr_pag
>          avail -= d->max_vcpus - 1;
>  
>      /* Reserve memory for iommu_dom0_init() (rough estimate). */
> -    if ( is_iommu_enabled(d) )
> +    if ( is_iommu_enabled(d) && !iommu_hwdom_passthrough )
>      {
>          unsigned int s;
>  
>          for ( s = 9; s < BITS_PER_LONG; s += 9 )
> -            avail -= max_pdx >> s;
> +            iommu_pages += max_pdx >> s;
> +
> +        avail -= iommu_pages;
> +    }
> +
> +    nr_pages = get_memsize(&dom0_size, avail);
> +
> +    /*
> +     * If allocation isn't specified, reserve 1/16th of available memory for
> +     * things like DMA buffers. This reservation is clamped to a maximum of
> +     * 128MB.
> +     */
> +    if ( !nr_pages )
> +    {
> +        nr_pages = avail - (pv_shim ? pv_shim_mem(avail)
> +                            : min(avail / 16, 128UL << (20 - PAGE_SHIFT)));
> +        if ( paging_mode_enabled(d) )
> +            /*
> +             * Temporary workaround message until internal (paging) memory
> +             * accounting required to build a pvh dom0 is improved.
> +             */
> +            printk("WARNING: PVH dom0 without dom0_mem set is still unstable. "
> +                   "If you get crashes during boot, try adding a dom0_mem parameter\n");
>      }
>  
> -    need_paging = is_hvm_domain(d) &&
> -        (!iommu_use_hap_pt(d) || !paging_mode_hap(d));
> -    for ( ; ; need_paging = false )
> +    if ( paging_mode_enabled(d) || opt_dom0_shadow )
>      {
> -        nr_pages = get_memsize(&dom0_size, avail);
> -        min_pages = get_memsize(&dom0_min_size, avail);
> -        max_pages = get_memsize(&dom0_max_size, avail);
> +        unsigned long cpu_pages;
>  
>          /*
> -         * If allocation isn't specified, reserve 1/16th of available memory
> -         * for things like DMA buffers. This reservation is clamped to a
> -         * maximum of 128MB.
> +         * Clamp according to min/max limits and available memory
> +         * (preliminary).
>           */
> -        if ( !nr_pages )
> -        {
> -            nr_pages = avail - (pv_shim ? pv_shim_mem(avail)
> -                                 : min(avail / 16, 128UL << (20 - PAGE_SHIFT)));

Just FYI that I've noticed only now that moving this only up is
not enough; the same also ...

> -            if ( is_hvm_domain(d) && !need_paging )
> -                /*
> -                 * Temporary workaround message until internal (paging) memory
> -                 * accounting required to build a pvh dom0 is improved.
> -                 */
> -                printk("WARNING: PVH dom0 without dom0_mem set is still unstable. "
> -                       "If you get crashes during boot, try adding a dom0_mem parameter\n");
> -        }
> -
> -
> -        /* Clamp according to min/max limits and available memory. */
> -        nr_pages = max(nr_pages, min_pages);
> -        nr_pages = min(nr_pages, max_pages);
> +        nr_pages = max(nr_pages, get_memsize(&dom0_min_size, avail));
> +        nr_pages = min(nr_pages, get_memsize(&dom0_max_size, avail));
>          nr_pages = min(nr_pages, avail);
>  
> -        if ( !need_paging )
> -            break;
> +        cpu_pages = dom0_paging_pages(d, nr_pages);
>  
> -        /* Reserve memory for shadow or HAP. */
> -        avail -= dom0_paging_pages(d, nr_pages);
> +        if ( !iommu_use_hap_pt(d) )
> +            avail -= cpu_pages;
> +        else if ( cpu_pages > iommu_pages )
> +            avail -= cpu_pages - iommu_pages;
>      }
>  
> +    nr_pages = get_memsize(&dom0_size, avail);

... is needed here, or else things won't work e.g. without any "dom0_mem=".
I'll introduce a helper function ...

Jan




* Re: [PATCH v4 1/6] x86/PVH: improve Dom0 memory size calculation
  2021-09-29 13:13 ` [PATCH v4 1/6] x86/PVH: improve Dom0 memory size calculation Jan Beulich
  2021-10-21 10:54   ` Jan Beulich
@ 2021-10-22  9:55   ` Roger Pau Monné
  2021-10-22  9:58     ` Jan Beulich
  1 sibling, 1 reply; 14+ messages in thread
From: Roger Pau Monné @ 2021-10-22  9:55 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Wed, Sep 29, 2021 at 03:13:24PM +0200, Jan Beulich wrote:
> Assuming that the accounting for IOMMU page tables will also take care
> of the P2M needs was wrong: dom0_paging_pages() can determine a far
> higher value, high enough for the system to run out of memory while
> setting up Dom0. Hence in the case of shared page tables the larger of
> the two values needs to be used (without shared page tables the sum of
> both continues to be applicable).
> 
> To not further complicate the logic, eliminate the up-to-2-iteration
> loop in favor of doing a few calculations twice (before and after
> calling dom0_paging_pages()). While this will lead to slightly too high
> a value in "cpu_pages", it is deemed better to account a few too many
> than a few too little.
> 
> Also uniformly use paging_mode_enabled(), not is_hvm_domain().
> 
> While there also account for two further aspects in the PV case: With
> "iommu=dom0-passthrough" no IOMMU page tables would get allocated, so
> none need accounting for. And if shadow mode is to be enabled, setting
> aside a suitable amount for the P2M pool to get populated is also
> necessary (i.e. similar to the non-shared-page-tables case of PVH).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> I wonder whether this isn't enough to drop the "PVH dom0 without
> dom0_mem" warning.
> 
> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -318,8 +318,7 @@ unsigned long __init dom0_compute_nr_pag
>      struct domain *d, struct elf_dom_parms *parms, unsigned long initrd_len)
>  {
>      nodeid_t node;
> -    unsigned long avail = 0, nr_pages, min_pages, max_pages;
> -    bool need_paging;
> +    unsigned long avail = 0, nr_pages, min_pages, max_pages, iommu_pages = 0;
>  
>      /* The ordering of operands is to work around a clang5 issue. */
>      if ( CONFIG_DOM0_MEM[0] && !dom0_mem_set )
> @@ -337,53 +336,65 @@ unsigned long __init dom0_compute_nr_pag
>          avail -= d->max_vcpus - 1;
>  
>      /* Reserve memory for iommu_dom0_init() (rough estimate). */
> -    if ( is_iommu_enabled(d) )
> +    if ( is_iommu_enabled(d) && !iommu_hwdom_passthrough )
>      {
>          unsigned int s;
>  
>          for ( s = 9; s < BITS_PER_LONG; s += 9 )
> -            avail -= max_pdx >> s;
> +            iommu_pages += max_pdx >> s;
> +
> +        avail -= iommu_pages;
> +    }
> +
> +    nr_pages = get_memsize(&dom0_size, avail);
> +
> +    /*
> +     * If allocation isn't specified, reserve 1/16th of available memory for
> +     * things like DMA buffers. This reservation is clamped to a maximum of
> +     * 128MB.
> +     */
> +    if ( !nr_pages )
> +    {
> +        nr_pages = avail - (pv_shim ? pv_shim_mem(avail)
> +                            : min(avail / 16, 128UL << (20 - PAGE_SHIFT)));
> +        if ( paging_mode_enabled(d) )
> +            /*
> +             * Temporary workaround message until internal (paging) memory
> +             * accounting required to build a pvh dom0 is improved.
> +             */
> +            printk("WARNING: PVH dom0 without dom0_mem set is still unstable. "
> +                   "If you get crashes during boot, try adding a dom0_mem parameter\n");
>      }
>  
> -    need_paging = is_hvm_domain(d) &&
> -        (!iommu_use_hap_pt(d) || !paging_mode_hap(d));
> -    for ( ; ; need_paging = false )
> +    if ( paging_mode_enabled(d) || opt_dom0_shadow )

Do we also need to account for opt_pv_l1tf_hwdom in case dom0 gets
shadowing enabled during runtime?

The rest LGTM, so:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

I'm also fine if you want to remove the warning message at this time.

Thanks, Roger.



* Re: [PATCH v4 1/6] x86/PVH: improve Dom0 memory size calculation
  2021-10-22  9:55   ` Roger Pau Monné
@ 2021-10-22  9:58     ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2021-10-22  9:58 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu

On 22.10.2021 11:55, Roger Pau Monné wrote:
> On Wed, Sep 29, 2021 at 03:13:24PM +0200, Jan Beulich wrote:
>> Assuming that the accounting for IOMMU page tables will also take care
>> of the P2M needs was wrong: dom0_paging_pages() can determine a far
>> higher value, high enough for the system to run out of memory while
>> setting up Dom0. Hence in the case of shared page tables the larger of
>> the two values needs to be used (without shared page tables the sum of
>> both continues to be applicable).
>>
>> To not further complicate the logic, eliminate the up-to-2-iteration
>> loop in favor of doing a few calculations twice (before and after
>> calling dom0_paging_pages()). While this will lead to slightly too high
>> a value in "cpu_pages", it is deemed better to account a few too many
>> than a few too little.
>>
>> Also uniformly use paging_mode_enabled(), not is_hvm_domain().
>>
>> While there also account for two further aspects in the PV case: With
>> "iommu=dom0-passthrough" no IOMMU page tables would get allocated, so
>> none need accounting for. And if shadow mode is to be enabled, setting
>> aside a suitable amount for the P2M pool to get populated is also
>> necessary (i.e. similar to the non-shared-page-tables case of PVH).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> I wonder whether this isn't enough to drop the "PVH dom0 without
>> dom0_mem" warning.
>>
>> --- a/xen/arch/x86/dom0_build.c
>> +++ b/xen/arch/x86/dom0_build.c
>> @@ -318,8 +318,7 @@ unsigned long __init dom0_compute_nr_pag
>>      struct domain *d, struct elf_dom_parms *parms, unsigned long initrd_len)
>>  {
>>      nodeid_t node;
>> -    unsigned long avail = 0, nr_pages, min_pages, max_pages;
>> -    bool need_paging;
>> +    unsigned long avail = 0, nr_pages, min_pages, max_pages, iommu_pages = 0;
>>  
>>      /* The ordering of operands is to work around a clang5 issue. */
>>      if ( CONFIG_DOM0_MEM[0] && !dom0_mem_set )
>> @@ -337,53 +336,65 @@ unsigned long __init dom0_compute_nr_pag
>>          avail -= d->max_vcpus - 1;
>>  
>>      /* Reserve memory for iommu_dom0_init() (rough estimate). */
>> -    if ( is_iommu_enabled(d) )
>> +    if ( is_iommu_enabled(d) && !iommu_hwdom_passthrough )
>>      {
>>          unsigned int s;
>>  
>>          for ( s = 9; s < BITS_PER_LONG; s += 9 )
>> -            avail -= max_pdx >> s;
>> +            iommu_pages += max_pdx >> s;
>> +
>> +        avail -= iommu_pages;
>> +    }
>> +
>> +    nr_pages = get_memsize(&dom0_size, avail);
>> +
>> +    /*
>> +     * If allocation isn't specified, reserve 1/16th of available memory for
>> +     * things like DMA buffers. This reservation is clamped to a maximum of
>> +     * 128MB.
>> +     */
>> +    if ( !nr_pages )
>> +    {
>> +        nr_pages = avail - (pv_shim ? pv_shim_mem(avail)
>> +                            : min(avail / 16, 128UL << (20 - PAGE_SHIFT)));
>> +        if ( paging_mode_enabled(d) )
>> +            /*
>> +             * Temporary workaround message until internal (paging) memory
>> +             * accounting required to build a pvh dom0 is improved.
>> +             */
>> +            printk("WARNING: PVH dom0 without dom0_mem set is still unstable. "
>> +                   "If you get crashes during boot, try adding a dom0_mem parameter\n");
>>      }
>>  
>> -    need_paging = is_hvm_domain(d) &&
>> -        (!iommu_use_hap_pt(d) || !paging_mode_hap(d));
>> -    for ( ; ; need_paging = false )
>> +    if ( paging_mode_enabled(d) || opt_dom0_shadow )
> 
> Do we also need to account for opt_pv_l1tf_hwdom in case dom0 gets
> shadowing enabled during runtime?

Yes, we do, and I've added that to the check for v5 already.

> The rest LGTM, so:
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, but as said in a reply to this just yesterday, this is buggy,
and a v5 is going to be needed anyway.

> I'm also fine if you want to remove the warning message at this time.

Okay, will do.

Jan




* Re: [PATCH v4 3/6] x86/PVH: permit more physdevop-s to be used by Dom0
  2021-09-29 13:14 ` [PATCH v4 3/6] x86/PVH: permit more physdevop-s to be used by Dom0 Jan Beulich
@ 2021-10-22 10:17   ` Roger Pau Monné
  0 siblings, 0 replies; 14+ messages in thread
From: Roger Pau Monné @ 2021-10-22 10:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Wed, Sep 29, 2021 at 03:14:13PM +0200, Jan Beulich wrote:
> Certain notifications of Dom0 to Xen are independent of the mode Dom0 is
> running in. Permit further PCI related ones (only their modern forms).
> Also include the USB2 debug port operation at this occasion.

Sorry, I realize now that I failed to provide a reply on the previous
patch.

> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> I'm uncertain about the has_vpci() part of the check: I would think
> is_hardware_domain() is both sufficient and concise. Without vPCI a PVH
> Dom0 won't see any PCI devices in the first place (and hence would
> effectively be non-functioning). Dropping this would in particular make
> PHYSDEVOP_dbgp_op better fit in the mix.

I agree, it's not an option to have a non-vPCI PVH dom0 anyway, and
the important restriction for those operations is the hardware domain
part.

> ---
> v3: New.
> 
> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -94,6 +94,12 @@ static long hvm_physdev_op(int cmd, XEN_
>          break;
>  
>      case PHYSDEVOP_pci_mmcfg_reserved:
> +    case PHYSDEVOP_pci_device_add:
> +    case PHYSDEVOP_pci_device_remove:

Those are indeed fine.

> +    case PHYSDEVOP_restore_msi_ext:

While I understand the current need for this one in order to possibly
restore the MSI interrupt for the serial console, it might be a
cleaner approach to let dom0 restore the PCI state on its own using
the native methods, as is done for the initial setup.

I realize we are missing a bunch of stuff here; for one, vPCI state
should be reset and brought back in sync with the hardware values
after a restore from suspension.

And then we likely need to either limit PHYSDEVOP_restore_msi_ext to
just restoring Xen-used MSI vectors, or, if technically possible, it
would be best for Xen to set up its own MSI vectors when restoring,
without relying on dom0.

> +    case PHYSDEVOP_dbgp_op:

This one is also fine.

> +    case PHYSDEVOP_prepare_msix:
> +    case PHYSDEVOP_release_msix:

Those two, again, I'm not sure should be added right now, as we still
haven't decided how PCI passthrough will work from the hardware domain
PoV. I'm explicitly concerned about those two because they will mess
with the MSI-X state behind the back of vPCI, and nothing good will
come out of that.

If we really need those two in the future, code should be added so
that vPCI state is properly updated.

Thanks, Roger.



* Re: [PATCH v4 4/6] x86/HVM: also dump stacks from show_execution_state()
  2021-09-29 13:14 ` [PATCH v4 4/6] x86/HVM: also dump stacks from show_execution_state() Jan Beulich
@ 2021-10-22 10:30   ` Roger Pau Monné
  0 siblings, 0 replies; 14+ messages in thread
From: Roger Pau Monné @ 2021-10-22 10:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Wed, Sep 29, 2021 at 03:14:46PM +0200, Jan Beulich wrote:
> Wire up show_hvm_stack() also on this path. Move the show_guest_stack()
> invocation out of show_stack(), rendering dead the is-HVM check there.
> 
> While separating guest and host paths, also move the show_code()
> invocation - the function bails immediately when guest_mode() returns
> "true".
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v4: New.
> 
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -303,10 +303,6 @@ static void show_guest_stack(struct vcpu
>      unsigned long mask = STACK_SIZE;
>      void *stack_page = NULL;
>  
> -    /* Avoid HVM as we don't know what the stack looks like. */
> -    if ( is_hvm_vcpu(v) )
> -        return;

Might be good to keep it as an ASSERT_UNREACHABLE now.

Regardless:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.



* Re: [PATCH v4 6/6] x86/P2M: relax permissions of PVH Dom0's MMIO entries
  2021-09-29 13:15 ` [PATCH v4 6/6] x86/P2M: relax permissions of PVH Dom0's MMIO entries Jan Beulich
@ 2021-10-22 13:25   ` Roger Pau Monné
  2021-11-02  9:36     ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Roger Pau Monné @ 2021-10-22 13:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap

On Wed, Sep 29, 2021 at 03:15:48PM +0200, Jan Beulich wrote:
> To become independent of the sequence of mapping operations, permit
> "access" to accumulate for Dom0, noting that there's not going to be an
> introspection agent for it which this might interfere with. While e.g.
> ideally only ROM regions would get mapped with X set, getting there is
> quite a bit of work. Plus the use of p2m_access_* here is abusive in the
> first place.

While doing this might be fine on Intel hardware, AMD hardware can
specify strict mapping access requirements from the IVMD flags, and
hence we should enforce those.

I think a better solution would be to not return error if the only
divergence between the current mapping and the requested one is the
access flag. We could log a message in that case about being unable to
change the access for the gfn.

This relies on the RMRR/IVMD regions being set up before any other
MMIO regions, or else Xen would have to clear existing entries in that
case.

Thanks, Roger.



* Re: [PATCH v4 6/6] x86/P2M: relax permissions of PVH Dom0's MMIO entries
  2021-10-22 13:25   ` Roger Pau Monné
@ 2021-11-02  9:36     ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2021-11-02  9:36 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap

On 22.10.2021 15:25, Roger Pau Monné wrote:
> On Wed, Sep 29, 2021 at 03:15:48PM +0200, Jan Beulich wrote:
>> To become independent of the sequence of mapping operations, permit
>> "access" to accumulate for Dom0, noting that there's not going to be an
>> introspection agent for it which this might interfere with. While e.g.
>> ideally only ROM regions would get mapped with X set, getting there is
>> quite a bit of work. Plus the use of p2m_access_* here is abusive in the
>> first place.
> 
> While doing this might be fine on Intel hardware, AMD hardware can
> specify strict mapping access requirements from the IVMD flags, and
> hence we should enforce those.
> 
> I think a better solution would be to not return error if the only
> divergence between the current mapping and the requested one is the
> access flag. We could log a message in that case about being unable to
> change the access for the gfn.
> 
> This relies on the RMRR/IVMD regions being setup before any other MMIO
> region, or else Xen would have to clear existing entries on that case.

I guess I'd rather withdraw this change until such time as we actually
run into an issue here. It was meant as a proactive measure only
anyway ...

Jan


