* [PATCH v3 0/7] x86/pvh: fix fixes for PVH Dom0
@ 2018-10-30 15:41 Roger Pau Monne
  2018-10-30 15:41 ` [PATCH v3 1/7] x86/pvh: fix TSC mode setup " Roger Pau Monne
                   ` (6 more replies)
  0 siblings, 7 replies; 37+ messages in thread
From: Roger Pau Monne @ 2018-10-30 15:41 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne

Hello,

The following series contains miscellaneous fixes for a PVH Dom0. I
found these issues while trying to boot on an AMD EPYC box.

The series can be found on my git repo:

git://xenbits.xen.org/people/royger/xen.git fixes-pvh-v3

Roger Pau Monne (7):
  x86/pvh: fix TSC mode setup for PVH Dom0
  x86/hvm: introduce a define for the debug output IO port
  x86/pvh: allow PVH Dom0 to use the debug IO port console
  vpci: fix updating the command register
  vpci: fix execution of long running operations
  vpci/msix: carve p2m hole for MSIX MMIO regions
  amd/pvh: enable ACPI C1E disable quirk on PVH Dom0

 xen/arch/x86/cpu/amd.c            | 13 +++--
 xen/arch/x86/dom0_build.c         | 11 ++++
 xen/arch/x86/hvm/hvm.c            |  4 +-
 xen/arch/x86/hvm/ioreq.c          |  3 --
 xen/arch/x86/hvm/svm/svm.c        | 21 ++++++++
 xen/arch/x86/time.c               |  2 +-
 xen/common/domain.c               |  2 +
 xen/drivers/char/console.c        |  2 +-
 xen/drivers/vpci/header.c         | 84 +++++++++++++++++++------------
 xen/drivers/vpci/msix.c           | 40 +++++++++++++++
 xen/include/asm-x86/amd.h         |  3 ++
 xen/include/public/arch-x86/xen.h |  7 +++
 xen/include/xen/vpci.h            | 18 +++----
 13 files changed, 158 insertions(+), 52 deletions(-)

-- 
2.19.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 1/7] x86/pvh: fix TSC mode setup for PVH Dom0
  2018-10-30 15:41 [PATCH v3 0/7] x86/pvh: fix fixes for PVH Dom0 Roger Pau Monne
@ 2018-10-30 15:41 ` Roger Pau Monne
  2018-10-30 15:41 ` [PATCH v3 2/7] x86/hvm: introduce a define for the debug output IO port Roger Pau Monne
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 37+ messages in thread
From: Roger Pau Monne @ 2018-10-30 15:41 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

A PVH Dom0 might use TSC scaling or other HVM-specific TSC
adjustments, so only short-circuit the TSC setup for a classic PV
Dom0.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/time.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 553698d4ab..03f792e7e5 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2125,7 +2125,7 @@ void tsc_set_info(struct domain *d,
 {
     ASSERT(!is_system_domain(d));
 
-    if ( is_hardware_domain(d) )
+    if ( is_pv_domain(d) && is_hardware_domain(d) )
     {
         d->arch.vtsc = 0;
         return;
-- 
2.19.1



* [PATCH v3 2/7] x86/hvm: introduce a define for the debug output IO port
  2018-10-30 15:41 [PATCH v3 0/7] x86/pvh: fix fixes for PVH Dom0 Roger Pau Monne
  2018-10-30 15:41 ` [PATCH v3 1/7] x86/pvh: fix TSC mode setup " Roger Pau Monne
@ 2018-10-30 15:41 ` Roger Pau Monne
  2018-10-31 16:36   ` Wei Liu
  2018-10-30 15:41 ` [PATCH v3 3/7] x86/pvh: allow PVH Dom0 to use the debug IO port console Roger Pau Monne
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monne @ 2018-10-30 15:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Jan Beulich, Roger Pau Monne

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/hvm/hvm.c            | 4 ++--
 xen/drivers/char/console.c        | 2 +-
 xen/include/public/arch-x86/xen.h | 7 +++++++
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 42d7a9bd1b..0e9d316b40 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -545,7 +545,7 @@ static int hvm_print_line(
     struct domain *cd = current->domain;
     char c = *val;
 
-    ASSERT(bytes == 1 && port == 0xe9);
+    ASSERT(bytes == 1 && port == XEN_HVM_DEBUGCONS_IOPORT);
 
     /* Deny any input requests. */
     if ( dir != IOREQ_WRITE )
@@ -654,7 +654,7 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
-    register_portio_handler(d, 0xe9, 1, hvm_print_line);
+    register_portio_handler(d, XEN_HVM_DEBUGCONS_IOPORT, 1, hvm_print_line);
 
     if ( hvm_tsc_scaling_supported )
         d->arch.hvm.tsc_scaling_ratio = hvm_default_tsc_scaling_ratio;
diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index 3b75f7a472..734abd2552 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -458,7 +458,7 @@ static inline void xen_console_write_debug_port(const char *buf, size_t len)
     unsigned long tmp;
     asm volatile ( "rep outsb;"
                    : "=&S" (tmp), "=&c" (tmp)
-                   : "0" (buf), "1" (len), "d" (0xe9) );
+                   : "0" (buf), "1" (len), "d" (XEN_HVM_DEBUGCONS_IOPORT) );
 }
 #endif
 
diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
index 69ee4bc40d..c76622654a 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -346,6 +346,13 @@ struct xen_arch_domainconfig {
 #define XEN_CPUID          XEN_EMULATE_PREFIX "cpuid"
 #endif
 
+/*
+ * Debug console IO port, also called "port E9 hack". Each character written
+ * to this IO port will be printed on the hypervisor console, subject to log
+ * level restrictions.
+ */
+#define XEN_HVM_DEBUGCONS_IOPORT 0xe9
+
 #endif /* __XEN_PUBLIC_ARCH_X86_XEN_H__ */
 
 /*
-- 
2.19.1



* [PATCH v3 3/7] x86/pvh: allow PVH Dom0 to use the debug IO port console
  2018-10-30 15:41 [PATCH v3 0/7] x86/pvh: fix fixes for PVH Dom0 Roger Pau Monne
  2018-10-30 15:41 ` [PATCH v3 1/7] x86/pvh: fix TSC mode setup " Roger Pau Monne
  2018-10-30 15:41 ` [PATCH v3 2/7] x86/hvm: introduce a define for the debug output IO port Roger Pau Monne
@ 2018-10-30 15:41 ` Roger Pau Monne
  2018-10-30 16:27   ` Wei Liu
  2018-10-30 15:41 ` [PATCH v3 4/7] vpci: fix updating the command register Roger Pau Monne
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monne @ 2018-10-30 15:41 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

Force trapping accesses to IO port 0xe9 for a PVH Dom0, so it can
print to the HVM debug console.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v2:
 - Always enable the E9 debug console.

Changes since v1:
 - Use a define for 0xe9.
 - Expand 'List of' in the Xen command doc.
---
 xen/arch/x86/dom0_build.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index fe73cef899..038e37132a 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -433,6 +433,12 @@ int __init dom0_setup_permissions(struct domain *d)
         rc |= ioports_deny_access(d, pmtmr_ioport, pmtmr_ioport + 3);
     /* PCI configuration space (NB. 0xcf8 has special treatment). */
     rc |= ioports_deny_access(d, 0xcfc, 0xcff);
+#ifdef CONFIG_HVM
+    if ( is_hvm_domain(d) )
+        /* HVM debug console IO port. */
+        rc |= ioports_deny_access(d, XEN_HVM_DEBUGCONS_IOPORT,
+                                  XEN_HVM_DEBUGCONS_IOPORT);
+#endif
     /* Command-line I/O ranges. */
     process_dom0_ioports_disable(d);
 
-- 
2.19.1



* [PATCH v3 4/7] vpci: fix updating the command register
  2018-10-30 15:41 [PATCH v3 0/7] x86/pvh: fix fixes for PVH Dom0 Roger Pau Monne
                   ` (2 preceding siblings ...)
  2018-10-30 15:41 ` [PATCH v3 3/7] x86/pvh: allow PVH Dom0 to use the debug IO port console Roger Pau Monne
@ 2018-10-30 15:41 ` Roger Pau Monne
  2018-11-05 16:46   ` Jan Beulich
  2018-10-30 15:41 ` [PATCH v3 5/7] vpci: fix execution of long running operations Roger Pau Monne
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monne @ 2018-10-30 15:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Jan Beulich, Roger Pau Monne

When switching the memory decoding bit in the command register, the
rest of the changes were dropped, leading to only the memory decoding
bit being updated.

Fix this by unconditionally writing the guest-requested command except
for the memory decoding bit, which will be updated once the p2m
changes are performed.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/drivers/vpci/header.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 0ec4c082a6..9234de9b26 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -333,8 +333,10 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
          * hoping the guest will realize and try again.
          */
         modify_bars(pdev, cmd & PCI_COMMAND_MEMORY, false);
-    else
-        pci_conf_write16(pdev->seg, pdev->bus, slot, func, reg, cmd);
+
+    /* Write the new command without updating the memory decoding bit. */
+    cmd = (cmd & ~PCI_COMMAND_MEMORY) | (current_cmd & PCI_COMMAND_MEMORY);
+    pci_conf_write16(pdev->seg, pdev->bus, slot, func, reg, cmd);
 }
 
 static void bar_write(const struct pci_dev *pdev, unsigned int reg,
-- 
2.19.1



* [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-10-30 15:41 [PATCH v3 0/7] x86/pvh: fix fixes for PVH Dom0 Roger Pau Monne
                   ` (3 preceding siblings ...)
  2018-10-30 15:41 ` [PATCH v3 4/7] vpci: fix updating the command register Roger Pau Monne
@ 2018-10-30 15:41 ` Roger Pau Monne
  2018-11-05 16:56   ` Jan Beulich
  2018-10-30 15:41 ` [PATCH v3 6/7] vpci/msix: carve p2m hole for MSIX MMIO regions Roger Pau Monne
  2018-10-30 15:41 ` [PATCH v3 7/7] amd/pvh: enable ACPI C1E disable quirk on PVH Dom0 Roger Pau Monne
  6 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monne @ 2018-10-30 15:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Paul Durrant, Jan Beulich, Roger Pau Monne

BAR map/unmap is a long running operation that needs to be preempted
in order to avoid overrunning the assigned vCPU time (or even
triggering the watchdog).

The current logic for this preemption is wrong, and won't work at all
on AMD, since only Intel makes use of hvm_io_pending (and even in that
case the current code is wrong).

Instead move the code that performs the mapping/unmapping to
a tasklet and pause the guest vCPU until the {un}mapping is done and
the memory decoding bit has been toggled.
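
The preemption pattern above can be modeled with a small sketch
(hypothetical C, not Xen code; -1 stands in for -ERESTART and looping
again stands in for tasklet_schedule()):

```c
/*
 * Consume up to `budget` units of pending work per pass; return -1
 * (modeling -ERESTART) while work remains, 0 once everything is done.
 */
int consume_ranges(int *remaining, int budget)
{
    *remaining -= (*remaining < budget) ? *remaining : budget;
    return *remaining ? -1 : 0;
}

/*
 * Model of the tasklet body: reschedule itself (here: loop again)
 * while a restart is requested, then conceptually unpause the vCPU.
 */
int process_pending(int remaining, int budget)
{
    int reschedules = 0;

    while ( consume_ranges(&remaining, budget) == -1 )
        reschedules++; /* tasklet_schedule() in the real code */

    return reschedules;
}
```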

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
[ioreq parts]
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/hvm/ioreq.c  |  3 --
 xen/common/domain.c       |  2 ++
 xen/drivers/vpci/header.c | 70 +++++++++++++++++++++++----------------
 xen/include/xen/vpci.h    | 15 +++------
 4 files changed, 48 insertions(+), 42 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index e2e755a8a1..58a86f9225 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -85,9 +85,6 @@ bool hvm_io_pending(struct vcpu *v)
     struct hvm_ioreq_server *s;
     unsigned int id;
 
-    if ( has_vpci(d) && vpci_process_pending(v) )
-        return true;
-
     FOR_EACH_IOREQ_SERVER(d, id, s)
     {
         struct hvm_ioreq_vcpu *sv;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index b8d4848970..10bfb55909 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -153,6 +153,8 @@ struct vcpu *vcpu_create(
 
     grant_table_init_vcpu(v);
 
+    vpci_init_vcpu(v);
+
     if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
          !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
          !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 9234de9b26..4af85d3c02 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -118,39 +118,48 @@ static void modify_decoding(const struct pci_dev *pdev, bool map, bool rom_only)
                      cmd);
 }
 
-bool vpci_process_pending(struct vcpu *v)
+static void vpci_process_pending(unsigned long data)
 {
-    if ( v->vpci.mem )
+    struct vcpu *v = (void *)data;
+    struct map_data map_data = {
+        .d = v->domain,
+        .map = v->vpci.map,
+    };
+    int rc;
+
+    ASSERT(v->vpci.mem && atomic_read(&v->pause_count));
+
+    while ( (rc = rangeset_consume_ranges(v->vpci.mem, map_range, &map_data)) ==
+            -ERESTART )
     {
-        struct map_data data = {
-            .d = v->domain,
-            .map = v->vpci.map,
-        };
-        int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
-
-        if ( rc == -ERESTART )
-            return true;
-
-        spin_lock(&v->vpci.pdev->vpci->lock);
-        /* Disable memory decoding unconditionally on failure. */
-        modify_decoding(v->vpci.pdev, !rc && v->vpci.map,
-                        !rc && v->vpci.rom_only);
-        spin_unlock(&v->vpci.pdev->vpci->lock);
-
-        rangeset_destroy(v->vpci.mem);
-        v->vpci.mem = NULL;
-        if ( rc )
-            /*
-             * FIXME: in case of failure remove the device from the domain.
-             * Note that there might still be leftover mappings. While this is
-             * safe for Dom0, for DomUs the domain will likely need to be
-             * killed in order to avoid leaking stale p2m mappings on
-             * failure.
-             */
-            vpci_remove_device(v->vpci.pdev);
+        tasklet_schedule(&v->vpci.task);
+        return;
     }
 
-    return false;
+    spin_lock(&v->vpci.pdev->vpci->lock);
+    /* Disable memory decoding unconditionally on failure. */
+    modify_decoding(v->vpci.pdev, !rc && v->vpci.map,
+                    !rc && v->vpci.rom_only);
+    spin_unlock(&v->vpci.pdev->vpci->lock);
+
+    rangeset_destroy(v->vpci.mem);
+    v->vpci.mem = NULL;
+    if ( rc )
+        /*
+         * FIXME: in case of failure remove the device from the domain.
+         * Note that there might still be leftover mappings. While this is
+         * safe for Dom0, for DomUs the domain will likely need to be
+         * killed in order to avoid leaking stale p2m mappings on
+         * failure.
+         */
+        vpci_remove_device(v->vpci.pdev);
+
+    vcpu_unpause(v);
+}
+
+void vpci_init_vcpu(struct vcpu *v)
+{
+    tasklet_init(&v->vpci.task, vpci_process_pending, (unsigned long)v);
 }
 
 static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
@@ -183,6 +192,9 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
     curr->vpci.mem = mem;
     curr->vpci.map = map;
     curr->vpci.rom_only = rom_only;
+    /* Pause the vCPU and schedule the tasklet that will perform the mapping. */
+    vcpu_pause_nosync(curr);
+    tasklet_schedule(&curr->vpci.task);
 }
 
 static int modify_bars(const struct pci_dev *pdev, bool map, bool rom_only)
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index af2b8580ee..f97c48a8f1 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -49,11 +49,8 @@ uint32_t vpci_hw_read16(const struct pci_dev *pdev, unsigned int reg,
 uint32_t vpci_hw_read32(const struct pci_dev *pdev, unsigned int reg,
                         void *data);
 
-/*
- * Check for pending vPCI operations on this vcpu. Returns true if the vcpu
- * should not run.
- */
-bool __must_check vpci_process_pending(struct vcpu *v);
+/* Init per-vCPU vPCI data. */
+void vpci_init_vcpu(struct vcpu *v);
 
 struct vpci {
     /* List of vPCI handlers for a device. */
@@ -142,6 +139,8 @@ struct vpci {
 };
 
 struct vpci_vcpu {
+    /* Per-vCPU tasklet to enqueue pending operations. */
+    struct tasklet task;
     /* Per-vcpu structure to store state while {un}mapping of PCI BARs. */
     struct rangeset *mem;
     struct pci_dev *pdev;
@@ -233,11 +232,7 @@ static inline void vpci_write(pci_sbdf_t sbdf, unsigned int reg,
     ASSERT_UNREACHABLE();
 }
 
-static inline bool vpci_process_pending(struct vcpu *v)
-{
-    ASSERT_UNREACHABLE();
-    return false;
-}
+static inline void vpci_init_vcpu(struct vcpu *v) { }
 #endif
 
 #endif
-- 
2.19.1



* [PATCH v3 6/7] vpci/msix: carve p2m hole for MSIX MMIO regions
  2018-10-30 15:41 [PATCH v3 0/7] x86/pvh: fix fixes for PVH Dom0 Roger Pau Monne
                   ` (4 preceding siblings ...)
  2018-10-30 15:41 ` [PATCH v3 5/7] vpci: fix execution of long running operations Roger Pau Monne
@ 2018-10-30 15:41 ` Roger Pau Monne
  2018-11-05 17:07   ` Jan Beulich
  2018-10-30 15:41 ` [PATCH v3 7/7] amd/pvh: enable ACPI C1E disable quirk on PVH Dom0 Roger Pau Monne
  6 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monne @ 2018-10-30 15:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Jan Beulich, Roger Pau Monne

Make sure the MSIX MMIO regions don't have p2m entries set up, so that
accesses to them trap into the hypervisor and can be handled by vpci.

This is a side-effect of commit 042678762 for PVH Dom0, which added
mappings for all the reserved regions into the Dom0 p2m.
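
The page range that has to be punched out of the p2m for each table can
be computed as in this sketch (illustrative C mirroring the PFN_DOWN
arithmetic in the patch, assuming 4 KiB pages):

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assuming 4 KiB pages */

/*
 * Compute the inclusive [start, end] PFN range covering an MSI-X
 * table/PBA of `size` bytes starting at physical address `addr`.
 */
void msix_table_pfns(uint64_t addr, uint64_t size,
                     uint64_t *start, uint64_t *end)
{
    *start = addr >> PAGE_SHIFT;
    *end = (addr + size - 1) >> PAGE_SHIFT;
}
```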

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/drivers/vpci/header.c |  8 ++++++++
 xen/drivers/vpci/msix.c   | 40 +++++++++++++++++++++++++++++++++++++++
 xen/include/xen/vpci.h    |  3 +++
 3 files changed, 51 insertions(+)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 4af85d3c02..455dd4fc90 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -88,6 +88,14 @@ static void modify_decoding(const struct pci_dev *pdev, bool map, bool rom_only)
     uint16_t cmd;
     unsigned int i;
 
+    /*
+     * Make sure there are no mappings in the MSIX MMIO areas, so that accesses
+     * can be trapped (and emulated) by Xen when the memory decoding bit is
+     * enabled.
+     */
+    if ( map && !rom_only && vpci_make_msix_hole(pdev) )
+        return;
+
     for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
     {
         if ( !MAPPABLE_BAR(&header->bars[i]) )
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 1960dae123..5759551724 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -21,6 +21,7 @@
 #include <xen/vpci.h>
 
 #include <asm/msi.h>
+#include <asm/p2m.h>
 
 #define VMSIX_SIZE(num) offsetof(struct vpci_msix, entries[num])
 
@@ -395,6 +396,45 @@ static const struct hvm_mmio_ops vpci_msix_table_ops = {
     .write = msix_write,
 };
 
+int vpci_make_msix_hole(const struct pci_dev *pdev)
+{
+    struct domain *d = pdev->domain;
+    unsigned int i;
+
+    if ( !pdev->vpci->msix )
+        return 0;
+
+    /* Make sure there's a hole for the MSIX table/PBA in the p2m. */
+    for ( i = 0; i < ARRAY_SIZE(pdev->vpci->msix->tables); i++ )
+    {
+        unsigned long start = PFN_DOWN(vmsix_table_addr(pdev->vpci, i));
+        unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
+                                     vmsix_table_size(pdev->vpci, i) - 1);
+
+        for ( ; start <= end; start++ )
+        {
+            p2m_type_t t;
+            mfn_t mfn = get_gfn_query(d, start, &t);
+
+            if ( t == p2m_mmio_direct && mfn_x(mfn) == start )
+                    clear_identity_p2m_entry(d, start);
+            else if ( t != p2m_mmio_dm )
+            {
+                put_gfn(d, start);
+                gprintk(XENLOG_WARNING,
+                        "%04x:%02x:%02x.%u: existing mapping (mfn: %" PRI_mfn
+                        "type: %d) at %#lx clobbers MSIX MMIO area\n",
+                        pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                        PCI_FUNC(pdev->devfn), mfn_x(mfn), t, start);
+                return -EEXIST;
+            }
+            put_gfn(d, start);
+        }
+    }
+
+    return 0;
+}
+
 static int init_msix(struct pci_dev *pdev)
 {
     struct domain *d = pdev->domain;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index f97c48a8f1..e0c22ad81c 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -151,6 +151,9 @@ struct vpci_vcpu {
 #ifdef __XEN__
 void vpci_dump_msi(void);
 
+/* Make sure there's a hole in the p2m for the MSIX mmio areas. */
+int vpci_make_msix_hole(const struct pci_dev *pdev);
+
 /* Arch-specific vPCI MSI helpers. */
 void vpci_msi_arch_mask(struct vpci_msi *msi, const struct pci_dev *pdev,
                         unsigned int entry, bool mask);
-- 
2.19.1



* [PATCH v3 7/7] amd/pvh: enable ACPI C1E disable quirk on PVH Dom0
  2018-10-30 15:41 [PATCH v3 0/7] x86/pvh: fix fixes for PVH Dom0 Roger Pau Monne
                   ` (5 preceding siblings ...)
  2018-10-30 15:41 ` [PATCH v3 6/7] vpci/msix: carve p2m hole for MSIX MMIO regions Roger Pau Monne
@ 2018-10-30 15:41 ` Roger Pau Monne
  2018-10-30 16:28   ` Wei Liu
                     ` (2 more replies)
  6 siblings, 3 replies; 37+ messages in thread
From: Roger Pau Monne @ 2018-10-30 15:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Jan Beulich, Andrew Cooper, Suravee Suthikulpanit,
	Boris Ostrovsky, Brian Woods, Roger Pau Monne

PV Dom0 has a quirk for some AMD processors, where enabling ACPI can
also enable C1E mode. Apply the same workaround as done on PV for a
PVH Dom0, which consists of trapping accesses to the SMI command IO
port and disabling C1E if ACPI is enabled.
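
The trap handler's decision can be modeled as follows (hypothetical C
sketch; the port and command values here are illustrative only, the real
ones come from the FADT):

```c
#include <stdbool.h>
#include <stdint.h>

#define ACPI_SMI_CMD      0xb2 /* illustrative; taken from the FADT */
#define ACPI_ENABLE_VALUE 0xa1 /* illustrative; taken from the FADT */

/*
 * True when a write to the SMI command port should be followed by
 * disabling C1E on every CPU, i.e. the point at which the patch calls
 * on_each_cpu(amd_disable_c1e, NULL, 1).
 */
bool triggers_c1e_disable(unsigned int port, uint8_t value)
{
    return port == ACPI_SMI_CMD && value == ACPI_ENABLE_VALUE;
}
```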

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Brian Woods <brian.woods@amd.com>
---
Changes since v2:
 - Only register the IO port handler for the hardware domain.
---
 xen/arch/x86/cpu/amd.c     | 13 +++++++++----
 xen/arch/x86/dom0_build.c  |  5 +++++
 xen/arch/x86/hvm/svm/svm.c | 21 +++++++++++++++++++++
 xen/include/asm-x86/amd.h  |  3 +++
 4 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index c394c1c2ec..18a9e92b3c 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -44,6 +44,9 @@ integer_param("cpuid_mask_thermal_ecx", opt_cpuid_mask_thermal_ecx);
 s8 __read_mostly opt_allow_unsafe;
 boolean_param("allow_unsafe", opt_allow_unsafe);
 
+/* Signal whether the ACPI C1E quirk is required. */
+bool amd_acpi_c1e_quirk;
+
 static inline int rdmsr_amd_safe(unsigned int msr, unsigned int *lo,
 				 unsigned int *hi)
 {
@@ -427,7 +430,7 @@ static void disable_c1_ramping(void)
 	}
 }
 
-static void disable_c1e(void *unused)
+void amd_disable_c1e(void *unused)
 {
 	uint64_t msr_content;
 
@@ -447,7 +450,7 @@ static void check_disable_c1e(unsigned int port, u8 value)
 {
 	/* C1E is sometimes enabled during entry to ACPI mode. */
 	if ((port == acpi_smi_cmd) && (value == acpi_enable_value))
-		on_each_cpu(disable_c1e, NULL, 1);
+		on_each_cpu(amd_disable_c1e, NULL, 1);
 }
 
 /*
@@ -626,9 +629,11 @@ static void init_amd(struct cpuinfo_x86 *c)
 	switch(c->x86)
 	{
 	case 0xf ... 0x17:
-		disable_c1e(NULL);
-		if (acpi_smi_cmd && (acpi_enable_value | acpi_disable_value))
+		amd_disable_c1e(NULL);
+		if (acpi_smi_cmd && (acpi_enable_value | acpi_disable_value)) {
 			pv_post_outb_hook = check_disable_c1e;
+			amd_acpi_c1e_quirk = true;
+		}
 		break;
 	}
 
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 038e37132a..5e2ad4bd56 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -12,6 +12,7 @@
 #include <xen/sched-if.h>
 #include <xen/softirq.h>
 
+#include <asm/amd.h>
 #include <asm/dom0_build.h>
 #include <asm/guest.h>
 #include <asm/hpet.h>
@@ -435,9 +436,13 @@ int __init dom0_setup_permissions(struct domain *d)
     rc |= ioports_deny_access(d, 0xcfc, 0xcff);
 #ifdef CONFIG_HVM
     if ( is_hvm_domain(d) )
+    {
         /* HVM debug console IO port. */
         rc |= ioports_deny_access(d, XEN_HVM_DEBUGCONS_IOPORT,
                                   XEN_HVM_DEBUGCONS_IOPORT);
+        if ( amd_acpi_c1e_quirk )
+            rc |= ioports_deny_access(d, acpi_smi_cmd, acpi_smi_cmd);
+    }
 #endif
     /* Command-line I/O ranges. */
     process_dom0_ioports_disable(d);
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 41427e7b9b..828ac5beed 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1273,6 +1273,24 @@ void svm_host_osvw_init()
     spin_unlock(&osvw_lock);
 }
 
+static int acpi_c1e_quirk(int dir, unsigned int port, unsigned int bytes,
+                          uint32_t *val)
+{
+    ASSERT(bytes == 1 && port == acpi_smi_cmd);
+
+    if ( dir == IOREQ_READ )
+    {
+        *val = inb(port);
+        return X86EMUL_OKAY;
+    }
+
+    outb(*val, port);
+    if ( *val == acpi_enable_value )
+       on_each_cpu(amd_disable_c1e, NULL, 1);
+
+    return X86EMUL_OKAY;
+}
+
 static int svm_domain_initialise(struct domain *d)
 {
     static const struct arch_csw csw = {
@@ -1285,6 +1303,9 @@ static int svm_domain_initialise(struct domain *d)
 
     svm_guest_osvw_init(d);
 
+    if ( is_hardware_domain(d) && amd_acpi_c1e_quirk )
+        register_portio_handler(d, acpi_smi_cmd, 1, acpi_c1e_quirk);
+
     return 0;
 }
 
diff --git a/xen/include/asm-x86/amd.h b/xen/include/asm-x86/amd.h
index e9867c7823..71fc52924e 100644
--- a/xen/include/asm-x86/amd.h
+++ b/xen/include/asm-x86/amd.h
@@ -148,4 +148,7 @@ extern s8 opt_allow_unsafe;
 void fam10h_check_enable_mmcfg(void);
 void check_enable_amd_mmconf_dmi(void);
 
+extern bool amd_acpi_c1e_quirk;
+void amd_disable_c1e(void *);
+
 #endif /* __AMD_H__ */
-- 
2.19.1



* Re: [PATCH v3 3/7] x86/pvh: allow PVH Dom0 to use the debug IO port console
  2018-10-30 15:41 ` [PATCH v3 3/7] x86/pvh: allow PVH Dom0 to use the debug IO port console Roger Pau Monne
@ 2018-10-30 16:27   ` Wei Liu
  0 siblings, 0 replies; 37+ messages in thread
From: Wei Liu @ 2018-10-30 16:27 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Wei Liu, Jan Beulich, Andrew Cooper

On Tue, Oct 30, 2018 at 04:41:19PM +0100, Roger Pau Monne wrote:
> Force trapping accesses to IO port 0xe9 for a PVH Dom0, so it can
> print to the HVM debug console.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH v3 7/7] amd/pvh: enable ACPI C1E disable quirk on PVH Dom0
  2018-10-30 15:41 ` [PATCH v3 7/7] amd/pvh: enable ACPI C1E disable quirk on PVH Dom0 Roger Pau Monne
@ 2018-10-30 16:28   ` Wei Liu
  2018-10-31 17:44   ` Boris Ostrovsky
  2018-11-02  9:06   ` Jan Beulich
  2 siblings, 0 replies; 37+ messages in thread
From: Wei Liu @ 2018-10-30 16:28 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, Jan Beulich, Andrew Cooper, Suravee Suthikulpanit,
	xen-devel, Boris Ostrovsky, Brian Woods

On Tue, Oct 30, 2018 at 04:41:23PM +0100, Roger Pau Monne wrote:
> PV Dom0 has a quirk for some AMD processors, where enabling ACPI can
> also enable C1E mode. Apply the same workaround as done on PV for a
> PVH Dom0, which consists of trapping accesses to the SMI command IO
> port and disabling C1E if ACPI is enabled.
> 
> Reported-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH v3 2/7] x86/hvm: introduce a define for the debug output IO port
  2018-10-30 15:41 ` [PATCH v3 2/7] x86/hvm: introduce a define for the debug output IO port Roger Pau Monne
@ 2018-10-31 16:36   ` Wei Liu
  0 siblings, 0 replies; 37+ messages in thread
From: Wei Liu @ 2018-10-31 16:36 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Jan Beulich, xen-devel

On Tue, Oct 30, 2018 at 04:41:18PM +0100, Roger Pau Monne wrote:
> No functional change intended.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH v3 7/7] amd/pvh: enable ACPI C1E disable quirk on PVH Dom0
  2018-10-30 15:41 ` [PATCH v3 7/7] amd/pvh: enable ACPI C1E disable quirk on PVH Dom0 Roger Pau Monne
  2018-10-30 16:28   ` Wei Liu
@ 2018-10-31 17:44   ` Boris Ostrovsky
  2018-11-02  9:06   ` Jan Beulich
  2 siblings, 0 replies; 37+ messages in thread
From: Boris Ostrovsky @ 2018-10-31 17:44 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel
  Cc: Andrew Cooper, Brian Woods, Wei Liu, Suravee Suthikulpanit, Jan Beulich

On 10/30/18 11:41 AM, Roger Pau Monne wrote:
> PV Dom0 has a quirk for some AMD processors, where enabling ACPI can
> also enable C1E mode. Apply the same workaround as done on PV for a
> PVH Dom0, which consists of trapping accesses to the SMI command IO
> port and disabling C1E if ACPI is enabled.
>
> Reported-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Cc: Brian Woods <brian.woods@amd.com>


Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>


* Re: [PATCH v3 7/7] amd/pvh: enable ACPI C1E disable quirk on PVH Dom0
  2018-10-30 15:41 ` [PATCH v3 7/7] amd/pvh: enable ACPI C1E disable quirk on PVH Dom0 Roger Pau Monne
  2018-10-30 16:28   ` Wei Liu
  2018-10-31 17:44   ` Boris Ostrovsky
@ 2018-11-02  9:06   ` Jan Beulich
  2018-11-07 10:24     ` Roger Pau Monné
  2 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2018-11-02  9:06 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, Andrew Cooper, Suravee Suthikulpanit, xen-devel,
	Boris Ostrovsky, Brian Woods

>>> On 30.10.18 at 16:41, <roger.pau@citrix.com> wrote:
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -44,6 +44,9 @@ integer_param("cpuid_mask_thermal_ecx", opt_cpuid_mask_thermal_ecx);
>  s8 __read_mostly opt_allow_unsafe;
>  boolean_param("allow_unsafe", opt_allow_unsafe);
>  
> +/* Signal whether the ACPI C1E quirk is required. */
> +bool amd_acpi_c1e_quirk;

__read_mostly

> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -1273,6 +1273,24 @@ void svm_host_osvw_init()
>      spin_unlock(&osvw_lock);
>  }
>  
> +static int acpi_c1e_quirk(int dir, unsigned int port, unsigned int bytes,
> +                          uint32_t *val)
> +{
> +    ASSERT(bytes == 1 && port == acpi_smi_cmd);
> +
> +    if ( dir == IOREQ_READ )
> +    {
> +        *val = inb(port);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    outb(*val, port);
> +    if ( *val == acpi_enable_value )
> +       on_each_cpu(amd_disable_c1e, NULL, 1);
> +
> +    return X86EMUL_OKAY;
> +}

I think you could do with even less logic duplication if instead you
called check_disable_c1e() (then given an amd_ prefix of course)
here. I'd then further consider to do away with
pv_post_outb_hook in a follow-up patch, getting the PV side
closer to the model used here.

Not being the maintainer of this code, I'd nevertheless like to also
suggest to use if/else here, making the function get away with
just a single return point.

Jan




* Re: [PATCH v3 4/7] vpci: fix updating the command register
  2018-10-30 15:41 ` [PATCH v3 4/7] vpci: fix updating the command register Roger Pau Monne
@ 2018-11-05 16:46   ` Jan Beulich
  2018-11-07 10:47     ` Roger Pau Monné
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2018-11-05 16:46 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

>>> On 30.10.18 at 16:41, <roger.pau@citrix.com> wrote:
> When switching the memory decoding bit in the command register the
> rest of the changes were dropped, leading to only the memory decoding
> bit being updated.
> 
> Fix this by unconditionally writing the guest-requested command except
> for the memory decoding bit, which will be updated once the p2m
> changes are performed.

Are you convinced there are no devices (or drivers) with errata
requiring the write to happen in a single step? Remember that
the register value immediately becomes visible to other (v)CPUs.

> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -333,8 +333,10 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>           * hoping the guest will realize and try again.
>           */
>          modify_bars(pdev, cmd & PCI_COMMAND_MEMORY, false);
> -    else
> -        pci_conf_write16(pdev->seg, pdev->bus, slot, func, reg, cmd);
> +
> +    /* Write the new command without updating the memory decoding bit. */
> +    cmd = (cmd & ~PCI_COMMAND_MEMORY) | (current_cmd & PCI_COMMAND_MEMORY);
> +    pci_conf_write16(pdev->seg, pdev->bus, slot, func, reg, cmd);

Furthermore, if the mapping didn't get deferred, aren't you
discarding the new decode bit value here, as written by
modify_bars() -> apply_map() -> modify_decoding()?

Jan




* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-10-30 15:41 ` [PATCH v3 5/7] vpci: fix execution of long running operations Roger Pau Monne
@ 2018-11-05 16:56   ` Jan Beulich
  2018-11-07 11:11     ` Roger Pau Monné
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2018-11-05 16:56 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Paul Durrant, xen-devel

>>> On 30.10.18 at 16:41, <roger.pau@citrix.com> wrote:
> BAR map/unmap is a long running operation that needs to be preempted
> in order to avoid overrunning the assigned vCPU time (or even
> triggering the watchdog).
> 
> Current logic for this preemption is wrong, and won't work at all for
> AMD since only Intel makes use of hvm_io_pending (and even in that
> case the current code is wrong).

I'm having trouble understanding this, both for the AMD aspect
(it is only vvmx.c which has a function call not mirrored on the
AMD side) and for the supposed general brokenness. Without
some clarification I can't judge whether re-implementing via
tasklet is actually the best approach.

> +void vpci_init_vcpu(struct vcpu *v)
> +{
> +    tasklet_init(&v->vpci.task, vpci_process_pending, (unsigned long)v);
>  }

Since there's no respective cleanup code added afaics - what
if the domain gets cleaned up behind the back of the (long
running) tasklet? Don't you want to acquire (and then release)
an extra domain reference somewhere?

Jan




* Re: [PATCH v3 6/7] vpci/msix: carve p2m hole for MSIX MMIO regions
  2018-10-30 15:41 ` [PATCH v3 6/7] vpci/msix: carve p2m hole for MSIX MMIO regions Roger Pau Monne
@ 2018-11-05 17:07   ` Jan Beulich
  2018-11-07 11:33     ` Roger Pau Monné
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2018-11-05 17:07 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

>>> On 30.10.18 at 16:41, <roger.pau@citrix.com> wrote:
> Make sure the MSIX MMIO regions don't have p2m entries setup, so that
> accesses to them trap into the hypervisor and can be handled by vpci.
> 
> This is a side-effect of commit 042678762 for PVH Dom0, which added
> mappings for all the reserved regions into the Dom0 p2m.

I'm afraid the description is ambiguous or misleading, as I don't suppose
you want to state that what the patch here does is a side effect of the
mentioned commit. Instead I assume you mean that p2m entries we
don't want get set up without the change here.

> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -88,6 +88,14 @@ static void modify_decoding(const struct pci_dev *pdev, bool map, bool rom_only)
>      uint16_t cmd;
>      unsigned int i;
>  
> +    /*
> +     * Make sure there are no mappings in the MSIX MMIO areas, so that accesses
> +     * can be trapped (and emulated) by Xen when the memory decoding bit is
> +     * enabled.
> +     */
> +    if ( map && !rom_only && vpci_make_msix_hole(pdev) )
> +        return;

If I'm not mistaken, you punch holes after having set up p2m entries.
This may be fine for Dom0, but looks racy for (future) DomU use of
this code. If so, please add a respective fixme annotation.

> +int vpci_make_msix_hole(const struct pci_dev *pdev)
> +{
> +    struct domain *d = pdev->domain;
> +    unsigned int i;
> +
> +    if ( !pdev->vpci->msix )
> +        return 0;
> +
> +    /* Make sure there's a hole for the MSIX table/PBA in the p2m. */
> +    for ( i = 0; i < ARRAY_SIZE(pdev->vpci->msix->tables); i++ )
> +    {
> +        unsigned long start = PFN_DOWN(vmsix_table_addr(pdev->vpci, i));
> +        unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
> +                                     vmsix_table_size(pdev->vpci, i) - 1);
> +
> +        for ( ; start <= end; start++ )
> +        {
> +            p2m_type_t t;
> +            mfn_t mfn = get_gfn_query(d, start, &t);
> +
> +            if ( t == p2m_mmio_direct && mfn_x(mfn) == start )
> +                    clear_identity_p2m_entry(d, start);

Indentation.

> +            else if ( t != p2m_mmio_dm )

Can you please also permit p2m_invalid right away, as the long term
plan is to default to that type instead of p2m_mmio_dm for unpopulated
p2m entries? And perhaps using switch() then produces easier to read
code.

Jan




* Re: [PATCH v3 7/7] amd/pvh: enable ACPI C1E disable quirk on PVH Dom0
  2018-11-02  9:06   ` Jan Beulich
@ 2018-11-07 10:24     ` Roger Pau Monné
  0 siblings, 0 replies; 37+ messages in thread
From: Roger Pau Monné @ 2018-11-07 10:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Andrew Cooper, Suravee Suthikulpanit, xen-devel,
	Boris Ostrovsky, Brian Woods

On Fri, Nov 02, 2018 at 03:06:58AM -0600, Jan Beulich wrote:
> >>> On 30.10.18 at 16:41, <roger.pau@citrix.com> wrote:
> > --- a/xen/arch/x86/cpu/amd.c
> > +++ b/xen/arch/x86/cpu/amd.c
> > @@ -44,6 +44,9 @@ integer_param("cpuid_mask_thermal_ecx", opt_cpuid_mask_thermal_ecx);
> >  s8 __read_mostly opt_allow_unsafe;
> >  boolean_param("allow_unsafe", opt_allow_unsafe);
> >  
> > +/* Signal whether the ACPI C1E quirk is required. */
> > +bool amd_acpi_c1e_quirk;
> 
> __read_mostly
> 
> > --- a/xen/arch/x86/hvm/svm/svm.c
> > +++ b/xen/arch/x86/hvm/svm/svm.c
> > @@ -1273,6 +1273,24 @@ void svm_host_osvw_init()
> >      spin_unlock(&osvw_lock);
> >  }
> >  
> > +static int acpi_c1e_quirk(int dir, unsigned int port, unsigned int bytes,
> > +                          uint32_t *val)
> > +{
> > +    ASSERT(bytes == 1 && port == acpi_smi_cmd);
> > +
> > +    if ( dir == IOREQ_READ )
> > +    {
> > +        *val = inb(port);
> > +        return X86EMUL_OKAY;
> > +    }
> > +
> > +    outb(*val, port);
> > +    if ( *val == acpi_enable_value )
> > +       on_each_cpu(amd_disable_c1e, NULL, 1);
> > +
> > +    return X86EMUL_OKAY;
> > +}
> 
> I think you could do with even less logic duplication if instead you
> called check_disable_c1e() (then given an amd_ prefix of course)
> here. I'd then further consider to do away with
> pv_post_outb_hook in a follow-up patch, getting the PV side
> closer to the model used here.

Using check_disable_c1e incurs one extra check that's not needed
here, since the port will always be acpi_smi_cmd in acpi_c1e_quirk.
In any case I agree it's better to use check_disable_c1e so there's
slightly less code repetition.
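For illustration, here is a standalone sketch of the deduplicated handler being discussed: acpi_c1e_quirk() delegating the write-side check to an amd_-prefixed helper, as Jan suggests. The port I/O, the IOREQ constants and the on_each_cpu() call are stubbed below (stub names and values are assumptions, not the real Xen API), so only the control flow is meaningful:

```c
/* Stubs standing in for Xen internals -- illustrative only. */
#include <assert.h>
#include <stdint.h>

#define IOREQ_READ   1
#define IOREQ_WRITE  0
#define X86EMUL_OKAY 0

static unsigned int acpi_smi_cmd = 0xb2;  /* typical SMI command port */
static uint8_t acpi_enable_value = 0xa1;  /* firmware-specific value */
static uint8_t fake_port_reg;             /* stands in for the real port */
static unsigned int c1e_disables;         /* counts on_each_cpu() calls */

static uint8_t inb(unsigned int port) { (void)port; return fake_port_reg; }
static void outb(uint8_t val, unsigned int port) { (void)port; fake_port_reg = val; }

/* The PV-side check_disable_c1e(), given the amd_ prefix as suggested. */
static void amd_check_disable_c1e(unsigned int port, uint8_t value)
{
    if ( port == acpi_smi_cmd && value == acpi_enable_value )
        c1e_disables++;  /* would be on_each_cpu(amd_disable_c1e, NULL, 1) */
}

static int acpi_c1e_quirk(int dir, unsigned int port, unsigned int bytes,
                          uint32_t *val)
{
    assert(bytes == 1 && port == acpi_smi_cmd);

    if ( dir == IOREQ_READ )
        *val = inb(port);
    else
    {
        outb(*val, port);
        amd_check_disable_c1e(port, *val);
    }

    return X86EMUL_OKAY;
}
```

The real handler would of course keep Xen's inb()/outb() and trigger amd_disable_c1e() on each CPU from the helper; this sketch only shows the if/else single-return shape with the shared check factored out.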

> 
> Not being the maintainer of this code, I'd nevertheless like to also
> suggest to use if/else here, making the function get away with
> just a single return point.

Ack, will resend with the fixes.

Thanks, Roger.


* Re: [PATCH v3 4/7] vpci: fix updating the command register
  2018-11-05 16:46   ` Jan Beulich
@ 2018-11-07 10:47     ` Roger Pau Monné
  2018-11-07 15:00       ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monné @ 2018-11-07 10:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

On Mon, Nov 05, 2018 at 09:46:22AM -0700, Jan Beulich wrote:
> >>> On 30.10.18 at 16:41, <roger.pau@citrix.com> wrote:
> > When switching the memory decoding bit in the command register the
> > rest of the changes were dropped, leading to only the memory decoding
> > bit being updated.
> > 
> > Fix this by unconditionally writing the guest-requested command except
> > for the memory decoding bit, which will be updated once the p2m
> > changes are performed.
> 
> Are you convinced there are no devices (or drivers) with errata
> requiring the write to happen in a single step?

That I certainly don't know. On Xen we already toggle the memory
decoding bit if required when sizing the BARs during initialization,
which will likely also trigger such a bug if it exists.

> Remember that
> the register value immediately becomes visible to other (v)CPUs.

But the vCPU that performed the write would be blocked and only
released once all the bits have been updated. While other vCPUs could
indeed see a partly updated command value without the memory decode
bit set, I'm not sure this is harmful; well-behaved OSes should wait
for the write to complete in any case.
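The bit-splitting the patch performs can be shown with plain C: the guest-requested command is written immediately except for the memory decoding bit, which keeps its current value until the p2m work completes. PCI_COMMAND_MEMORY is bit 1 (value 0x2) of the PCI command register; the helper name and example values below are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define PCI_COMMAND_MEMORY 0x2  /* memory space enable, bit 1 */

/* Merge the guest-requested command with the currently-effective
 * memory decoding bit, as done in cmd_write(). */
static uint16_t merge_cmd(uint16_t guest_cmd, uint16_t current_cmd)
{
    return (guest_cmd & ~PCI_COMMAND_MEMORY) |
           (current_cmd & PCI_COMMAND_MEMORY);
}
```

This is exactly the intermediate value other vCPUs may observe while the p2m update is in flight: everything the guest asked for, minus any change to the decode bit.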

> > --- a/xen/drivers/vpci/header.c
> > +++ b/xen/drivers/vpci/header.c
> > @@ -333,8 +333,10 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
> >           * hoping the guest will realize and try again.
> >           */
> >          modify_bars(pdev, cmd & PCI_COMMAND_MEMORY, false);
> > -    else
> > -        pci_conf_write16(pdev->seg, pdev->bus, slot, func, reg, cmd);
> > +
> > +    /* Write the new command without updating the memory decoding bit. */
> > +    cmd = (cmd & ~PCI_COMMAND_MEMORY) | (current_cmd & PCI_COMMAND_MEMORY);
> > +    pci_conf_write16(pdev->seg, pdev->bus, slot, func, reg, cmd);
> 
> Furthermore, if the mapping didn't get deferred, aren't you
> discarding the new decode bit value here, as written by
> modify_bars() -> apply_map() -> modify_decoding()?

The map should always be deferred when called from cmd_write because
this handler will be accessed exclusively by the guest, in which case
system_state >= SYS_STATE_active.

Thanks, Roger.


* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-11-05 16:56   ` Jan Beulich
@ 2018-11-07 11:11     ` Roger Pau Monné
  2018-11-07 15:06       ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monné @ 2018-11-07 11:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Paul Durrant, xen-devel

On Mon, Nov 05, 2018 at 09:56:13AM -0700, Jan Beulich wrote:
> >>> On 30.10.18 at 16:41, <roger.pau@citrix.com> wrote:
> > BAR map/unmap is a long running operation that needs to be preempted
> > in order to avoid overrunning the assigned vCPU time (or even
> > triggering the watchdog).
> > 
> > Current logic for this preemption is wrong, and won't work at all for
> > AMD since only Intel makes use of hvm_io_pending (and even in that
> > case the current code is wrong).
> 
> I'm having trouble understanding this, both for the AMD aspect
> (it is only vvmx.c which has a function call not mirrored on the
> AMD side) and for the supposed general brokenness. Without
> some clarification I can't judge whether re-implementing via
> tasklet is actually the best approach.

hvm_io_pending itself cannot block the vCPU from executing; it's used
by nvmx_switch_guest in order to prevent changing the nested VMCS if
there's pending IO emulation work AFAICT.

The only way I could find to actually prevent a vCPU from running
while doing some work on its behalf in a preemptive way is by
blocking it and using a tasklet. What's done with IOREQs is not
suitable here, since Xen needs to do some work instead of just waiting
on an external event (an event channel from the IOREQ).
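As a rough standalone model of that approach (all names below are illustrative stubs, not the actual Xen tasklet/scheduler API): the vCPU is paused, the tasklet chips away at the BAR p2m work in bounded chunks, rescheduling itself when "preempted", and the vCPU is unpaused only once the work completes:

```c
#include <assert.h>
#include <stdbool.h>

struct fake_vcpu {
    unsigned int pause_count;
    unsigned long pages_left;   /* pending BAR map/unmap work */
};

#define CHUNK 64                /* "preempt" after this many pages */

static void vcpu_pause(struct fake_vcpu *v)   { v->pause_count++; }
static void vcpu_unpause(struct fake_vcpu *v) { v->pause_count--; }

/* One tasklet invocation; returns true if it rescheduled itself. */
static bool vpci_process_pending(struct fake_vcpu *v)
{
    unsigned long done = v->pages_left < CHUNK ? v->pages_left : CHUNK;

    v->pages_left -= done;
    if ( v->pages_left )
        return true;            /* tasklet_schedule(&v->vpci.task) */

    vcpu_unpause(v);            /* work complete, let the vCPU run */
    return false;
}

/* Defer the long-running operation: pause the vCPU, kick the tasklet. */
static void vpci_defer(struct fake_vcpu *v, unsigned long pages)
{
    v->pages_left = pages;
    vcpu_pause(v);
    while ( vpci_process_pending(v) )
        ;                       /* the softirq would re-run the tasklet */
}
```

The key property is that the vCPU cannot re-enter the guest while its pause count is non-zero, so no blocked_in_xen flag or recursive do_softirq() is needed.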

> > +void vpci_init_vcpu(struct vcpu *v)
> > +{
> > +    tasklet_init(&v->vpci.task, vpci_process_pending, (unsigned long)v);
> >  }
> 
> Since there's no respective cleanup code added afaics - what
> if the domain gets cleaned up behind the back of the (long
> running) tasklet? Don't you want to acquire (and then release)
> an extra domain reference somewhere?

Yes, that's correct. Isn't just doing a tasklet_kill at domain
destruction enough?

Thanks, Roger.


* Re: [PATCH v3 6/7] vpci/msix: carve p2m hole for MSIX MMIO regions
  2018-11-05 17:07   ` Jan Beulich
@ 2018-11-07 11:33     ` Roger Pau Monné
  0 siblings, 0 replies; 37+ messages in thread
From: Roger Pau Monné @ 2018-11-07 11:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

On Mon, Nov 05, 2018 at 10:07:24AM -0700, Jan Beulich wrote:
> >>> On 30.10.18 at 16:41, <roger.pau@citrix.com> wrote:
> > Make sure the MSIX MMIO regions don't have p2m entries setup, so that
> > accesses to them trap into the hypervisor and can be handled by vpci.
> > 
> > This is a side-effect of commit 042678762 for PVH Dom0, which added
> > mappings for all the reserved regions into the Dom0 p2m.
> 
> I'm afraid the description is ambiguous or misleading, as I don't suppose
> you want to state that what the patch here does is a side effect of the
> mentioned commit. Instead I assume you mean that p2m entries we
> don't want get set up without the change here.

Yes, arch_iommu_hwdom_init will set up such entries. What's done here
is just to make sure there are no mappings established for the MSIX
MMIO regions, or else no trapping would happen. I will reword the
commit message to make it clearer.

> > --- a/xen/drivers/vpci/header.c
> > +++ b/xen/drivers/vpci/header.c
> > @@ -88,6 +88,14 @@ static void modify_decoding(const struct pci_dev *pdev, bool map, bool rom_only)
> >      uint16_t cmd;
> >      unsigned int i;
> >  
> > +    /*
> > +     * Make sure there are no mappings in the MSIX MMIO areas, so that accesses
> > +     * can be trapped (and emulated) by Xen when the memory decoding bit is
> > +     * enabled.
> > +     */
> > +    if ( map && !rom_only && vpci_make_msix_hole(pdev) )
> > +        return;
> 
> If I'm not mistaken, you punch holes after having set up p2m entries.
> This may be fine for Dom0, but looks racy for (future) DomU use of
> this code. If so, please add a respective fixme annotation.

Ack. All the BAR handling/mapping must be much more strict for DomU.

> > +int vpci_make_msix_hole(const struct pci_dev *pdev)
> > +{
> > +    struct domain *d = pdev->domain;
> > +    unsigned int i;
> > +
> > +    if ( !pdev->vpci->msix )
> > +        return 0;
> > +
> > +    /* Make sure there's a hole for the MSIX table/PBA in the p2m. */
> > +    for ( i = 0; i < ARRAY_SIZE(pdev->vpci->msix->tables); i++ )
> > +    {
> > +        unsigned long start = PFN_DOWN(vmsix_table_addr(pdev->vpci, i));
> > +        unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
> > +                                     vmsix_table_size(pdev->vpci, i) - 1);
> > +
> > +        for ( ; start <= end; start++ )
> > +        {
> > +            p2m_type_t t;
> > +            mfn_t mfn = get_gfn_query(d, start, &t);
> > +
> > +            if ( t == p2m_mmio_direct && mfn_x(mfn) == start )
> > +                    clear_identity_p2m_entry(d, start);
> 
> Indentation.
> 
> > +            else if ( t != p2m_mmio_dm )
> 
> Can you please also permit p2m_invalid right away, as the long term
> plan is to default to that type instead of p2m_mmio_dm for unpopulated
> p2m entries? And perhaps using switch() then produces easier to read
> code.

Sure.
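A possible shape for the reworked loop body, using switch() and also permitting p2m_invalid as requested. The p2m types and helpers are stubbed here so the logic can be exercised standalone, and the return-value convention (0 = hole is fine, -EPERM = unexpected mapping) is an assumption for illustration:

```c
#include <assert.h>
#include <errno.h>

typedef enum {
    p2m_mmio_dm,
    p2m_mmio_direct,
    p2m_invalid,
    p2m_ram_rw,
} p2m_type_t;

static unsigned int holes_punched;

static void clear_identity_p2m_entry(unsigned long gfn)
{
    (void)gfn;
    holes_punched++;
}

/* Handle one gfn of the MSI-X table/PBA range. */
static int msix_hole_one(unsigned long gfn, p2m_type_t t, unsigned long mfn)
{
    switch ( t )
    {
    case p2m_mmio_dm:
    case p2m_invalid:
        /* Already unpopulated: accesses will trap, nothing to do. */
        return 0;

    case p2m_mmio_direct:
        if ( mfn == gfn )
        {
            clear_identity_p2m_entry(gfn);
            return 0;
        }
        /* fall through */
    default:
        return -EPERM;      /* unexpected mapping over the MSI-X table */
    }
}
```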

Thanks, Roger.


* Re: [PATCH v3 4/7] vpci: fix updating the command register
  2018-11-07 10:47     ` Roger Pau Monné
@ 2018-11-07 15:00       ` Jan Beulich
  0 siblings, 0 replies; 37+ messages in thread
From: Jan Beulich @ 2018-11-07 15:00 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

>>> On 07.11.18 at 11:47, <roger.pau@citrix.com> wrote:
> On Mon, Nov 05, 2018 at 09:46:22AM -0700, Jan Beulich wrote:
>> >>> On 30.10.18 at 16:41, <roger.pau@citrix.com> wrote:
>> > When switching the memory decoding bit in the command register the
>> > rest of the changes were dropped, leading to only the memory decoding
>> > bit being updated.
>> > 
>> > Fix this by unconditionally writing the guest-requested command except
>> > for the memory decoding bit, which will be updated once the p2m
>> > changes are performed.
>> 
>> Are you convinced there are no devices (or drivers) with errata
>> requiring the write to happen in a single step?
> 
> That I certainly don't know. On Xen we already toggle the memory
> decoding bit if required when sizing the BARs during initialization,
> which will likely also trigger such a bug if it exist.

Toggling a single bit is still slightly different from taking off a
certain bit from a value to be written. Only the second write
(toggling just that one bit) matches behavior we already have.

>> Remember that
>> the register value immediately becomes visible to other (v)CPUs.
> 
> But the vCPU that performed the write would be blocked and only
> released once all the bits have been updated. While other vCPUs could
> indeed see a partly updated command value without the memory decode
> bit set, I'm not sure this is harmful; well-behaved OSes should wait
> for the write to complete in any case.

They should, yes. Anyway, this isn't an argument against the patch
in its current shape, just something which came to mind while reviewing.

>> > --- a/xen/drivers/vpci/header.c
>> > +++ b/xen/drivers/vpci/header.c
>> > @@ -333,8 +333,10 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>> >           * hoping the guest will realize and try again.
>> >           */
>> >          modify_bars(pdev, cmd & PCI_COMMAND_MEMORY, false);
>> > -    else
>> > -        pci_conf_write16(pdev->seg, pdev->bus, slot, func, reg, cmd);
>> > +
>> > +    /* Write the new command without updating the memory decoding bit. */
>> > +    cmd = (cmd & ~PCI_COMMAND_MEMORY) | (current_cmd & PCI_COMMAND_MEMORY);
>> > +    pci_conf_write16(pdev->seg, pdev->bus, slot, func, reg, cmd);
>> 
>> Furthermore, if the mapping didn't get deferred, aren't you
>> discarding the new decode bit value here, as written by
>> modify_bars() -> apply_map() -> modify_decoding()?
> 
> The map should always be deferred when called from cmd_write because
> this handler will be accessed exclusively by the guest, in which case
> system_state >= SYS_STATE_active.

Would you mind clarifying this in a comment, or by way of an added
ASSERT()?
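One possible form of the requested clarification, sketched standalone (SYS_STATE_active and system_state exist in the Xen tree; the exact placement inside cmd_write() is an assumption):

```c
#include <assert.h>

/* Reduced stand-in for Xen's system_state enumeration. */
enum sys_state { SYS_STATE_early_boot, SYS_STATE_boot, SYS_STATE_active };
static enum sys_state system_state = SYS_STATE_active;

static void cmd_write_check(void)
{
    /* Guest accesses can only happen once the system is fully active,
     * hence modify_bars() will always have deferred the p2m work by the
     * time we get here. */
    assert(system_state >= SYS_STATE_active);
}
```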

Jan




* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-11-07 11:11     ` Roger Pau Monné
@ 2018-11-07 15:06       ` Jan Beulich
  2018-11-07 17:15         ` Roger Pau Monné
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2018-11-07 15:06 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Paul Durrant, xen-devel

>>> On 07.11.18 at 12:11, <roger.pau@citrix.com> wrote:
> On Mon, Nov 05, 2018 at 09:56:13AM -0700, Jan Beulich wrote:
>> >>> On 30.10.18 at 16:41, <roger.pau@citrix.com> wrote:
>> > BAR map/unmap is a long running operation that needs to be preempted
>> > in order to avoid overrunning the assigned vCPU time (or even
>> > triggering the watchdog).
>> > 
>> > Current logic for this preemption is wrong, and won't work at all for
>> > AMD since only Intel makes use of hvm_io_pending (and even in that
>> > case the current code is wrong).
>> 
>> I'm having trouble understanding this, both for the AMD aspect
>> (it is only vvmx.c which has a function call not mirrored on the
>> AMD side) and for the supposed general brokenness. Without
>> some clarification I can't judge whether re-implementing via
>> tasklet is actually the best approach.
> 
> > hvm_io_pending itself cannot block the vCPU from executing; it's used
> by nvmx_switch_guest in order to prevent changing the nested VMCS if
> there's pending IO emulation work AFAICT.
> 
> The only way I could find to actually prevent a vCPU from running
> > while doing some work on its behalf in a preemptive way is by
> blocking it and using a tasklet. What's done with IOREQs is not
> suitable here since Xen needs to do some work instead of just wait on
> an external event (an event channel from the IOREQ).

No, there is a second means; I've just confused the functions. The
question is whether your vpci_process_pending() invocation
perhaps sits in the wrong function. handle_hvm_io_completion() is
what hvm_do_resume() calls, and what can prevent a guest from
resuming execution. The hvm_io_pending() invocation just sits on
a special case path down from there (through handle_pio()).

>> > +void vpci_init_vcpu(struct vcpu *v)
>> > +{
>> > +    tasklet_init(&v->vpci.task, vpci_process_pending, (unsigned long)v);
>> >  }
>> 
>> Since there's no respective cleanup code added afaics - what
>> if the domain gets cleaned up behind the back of the (long
>> running) tasklet? Don't you want to acquire (and then release)
>> an extra domain reference somewhere?
> 
> Yes, that's correct. Isn't just doing a tasklet_kill at domain
> destruction enough?

Yes, that should do it.

Jan




* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-11-07 15:06       ` Jan Beulich
@ 2018-11-07 17:15         ` Roger Pau Monné
  2018-11-08  9:55           ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monné @ 2018-11-07 17:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Paul Durrant, xen-devel

On Wed, Nov 07, 2018 at 08:06:00AM -0700, Jan Beulich wrote:
> >>> On 07.11.18 at 12:11, <roger.pau@citrix.com> wrote:
> > On Mon, Nov 05, 2018 at 09:56:13AM -0700, Jan Beulich wrote:
> >> >>> On 30.10.18 at 16:41, <roger.pau@citrix.com> wrote:
> >> > BAR map/unmap is a long running operation that needs to be preempted
> >> > in order to avoid overrunning the assigned vCPU time (or even
> >> > triggering the watchdog).
> >> > 
> >> > Current logic for this preemption is wrong, and won't work at all for
> >> > AMD since only Intel makes use of hvm_io_pending (and even in that
> >> > case the current code is wrong).
> >> 
> >> I'm having trouble understanding this, both for the AMD aspect
> >> (it is only vvmx.c which has a function call not mirrored on the
> >> AMD side) and for the supposed general brokenness. Without
> >> some clarification I can't judge whether re-implementing via
> >> tasklet is actually the best approach.
> > 
> > > hvm_io_pending itself cannot block the vCPU from executing; it's used
> > by nvmx_switch_guest in order to prevent changing the nested VMCS if
> > there's pending IO emulation work AFAICT.
> > 
> > The only way I could find to actually prevent a vCPU from running
> > while doing some work on its behalf in a preemptive way is by
> > blocking it and using a tasklet. What's done with IOREQs is not
> > suitable here since Xen needs to do some work instead of just wait on
> > an external event (an event channel from the IOREQ).
> 
> No, there is a second means, I've just confused the functions. The
> question is whether your vpci_process_pending() invocation
> perhaps sits in the wrong function. handle_hvm_io_completion() is
> what hvm_do_resume() calls, and what can prevent a guest from
> resuming execution. The hvm_io_pending() invocation just sits on
> a special case path down from there (through handle_pio()).

Yes, handle_hvm_io_completion is the function that actually blocks the
vCPU and waits for an event channel from the ioreq. This is however
not suitable because it uses the following code (simplified):

set_bit(_VPF_blocked_in_xen, &current->pause_flags);
raise_softirq(SCHEDULE_SOFTIRQ);
do_softirq();

In the vPCI case Xen cannot set the vCPU as blocked_in_xen: Xen needs
the scheduler to schedule the vCPU so the pending work can be
processed. And if the blocked bit is not set, the call to do_softirq
would recurse, probably leading to a stack overflow if there's
enough pending work, i.e.:

<process work>
	<do_softirq>
		<process work>
			<do_softirq>
				<...>

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-11-07 17:15         ` Roger Pau Monné
@ 2018-11-08  9:55           ` Jan Beulich
  2018-11-08 11:29             ` Roger Pau Monné
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2018-11-08  9:55 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Paul Durrant, xen-devel

>>> On 07.11.18 at 18:15, <roger.pau@citrix.com> wrote:
> On Wed, Nov 07, 2018 at 08:06:00AM -0700, Jan Beulich wrote:
>> >>> On 07.11.18 at 12:11, <roger.pau@citrix.com> wrote:
>> > On Mon, Nov 05, 2018 at 09:56:13AM -0700, Jan Beulich wrote:
>> >> >>> On 30.10.18 at 16:41, <roger.pau@citrix.com> wrote:
>> >> > BAR map/unmap is a long running operation that needs to be preempted
>> >> > in order to avoid overrunning the assigned vCPU time (or even
>> >> > triggering the watchdog).
>> >> > 
>> >> > Current logic for this preemption is wrong, and won't work at all for
>> >> > AMD since only Intel makes use of hvm_io_pending (and even in that
>> >> > case the current code is wrong).
>> >> 
>> >> I'm having trouble understanding this, both for the AMD aspect
>> >> (it is only vvmx.c which has a function call not mirrored on the
>> >> AMD side) and for the supposed general brokenness. Without
>> >> some clarification I can't judge whether re-implementing via
>> >> tasklet is actually the best approach.
>> > 
>> > hvm_io_pending itself cannot block the vCPU from executing, it's used
>> > by nvmx_switch_guest in order to prevent changing the nested VMCS if
>> > there's pending IO emulation work AFAICT.
>> > 
>> > The only way I could find to actually prevent a vCPU from running
>> > while doing some work on it's behalf in a preemptive way is by
>> > blocking it and using a tasklet. What's done with IOREQs is not
>> > suitable here since Xen needs to do some work instead of just wait on
>> > an external event (an event channel from the IOREQ).
>> 
>> No, there is a second means, I've just confused the functions. The
>> question is whether your vpci_process_pending() invocation
>> perhaps sits in the wrong function. handle_hvm_io_completion() is
>> what hvm_do_resume() calls, and what can prevent a guest from
>> resuming execution. The hvm_io_pending() invocation just sits on
>> a special case path down from there (through handle_pio()).
> 
> Yes, handle_hvm_io_completion is the function that actually blocks the
> vCPU and waits for an event channel from the ioreq. This is however
> not suitable because it uses the following code (simplified):
> 
> set_bit(_VPF_blocked_in_xen, &current->pause_flags);
> raise_softirq(SCHEDULE_SOFTIRQ);
> do_softirq();
> 
> In the vPCI case Xen cannot set the vCPU as blocked_in_xen, Xen needs
> the scheduler to schedule the vCPU so the pending work can be
> processed.

Right, and I didn't say you should set the vCPU to blocked. What
I've pointed out is that the mere fact of handle_hvm_io_completion()
returning false makes hvm_do_resume() bail, resulting in another
round through softirq processing (from entry.S code) as long as
_some_ softirq is pending (here: the scheduler one).

>  Then if the blocked bit is not set the call to do_softirq
> would be recurred, thus probably leading to a stack overflow if
> there's enough pending work. ie:
> 
> <process work>
> 	<do_softirq>
> 		<process work>
> 			<do_softirq>
> 				<...>

Why would that be? The do_softirq() invocation sits on the exit-
to-guest path, explicitly avoiding any such nesting unless there
was a do_softirq() invocation somewhere in a softirq handler.

Jan




* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-11-08  9:55           ` Jan Beulich
@ 2018-11-08 11:29             ` Roger Pau Monné
  2018-11-08 11:42               ` Julien Grall
  2018-11-08 12:32               ` Jan Beulich
  0 siblings, 2 replies; 37+ messages in thread
From: Roger Pau Monné @ 2018-11-08 11:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Paul Durrant, xen-devel

On Thu, Nov 08, 2018 at 02:55:56AM -0700, Jan Beulich wrote:
> >>> On 07.11.18 at 18:15, <roger.pau@citrix.com> wrote:
> > On Wed, Nov 07, 2018 at 08:06:00AM -0700, Jan Beulich wrote:
> >> >>> On 07.11.18 at 12:11, <roger.pau@citrix.com> wrote:
> >> > On Mon, Nov 05, 2018 at 09:56:13AM -0700, Jan Beulich wrote:
> >> >> >>> On 30.10.18 at 16:41, <roger.pau@citrix.com> wrote:
> >> >> > BAR map/unmap is a long running operation that needs to be preempted
> >> >> > in order to avoid overrunning the assigned vCPU time (or even
> >> >> > triggering the watchdog).
> >> >> > 
> >> >> > Current logic for this preemption is wrong, and won't work at all for
> >> >> > AMD since only Intel makes use of hvm_io_pending (and even in that
> >> >> > case the current code is wrong).
> >> >> 
> >> >> I'm having trouble understanding this, both for the AMD aspect
> >> >> (it is only vvmx.c which has a function call not mirrored on the
> >> >> AMD side) and for the supposed general brokenness. Without
> >> >> some clarification I can't judge whether re-implementing via
> >> >> tasklet is actually the best approach.
> >> > 
> >> > hvm_io_pending itself cannot block the vCPU from executing, it's used
> >> > by nvmx_switch_guest in order to prevent changing the nested VMCS if
> >> > there's pending IO emulation work AFAICT.
> >> > 
> >> > The only way I could find to actually prevent a vCPU from running
> >> > while doing some work on it's behalf in a preemptive way is by
> >> > blocking it and using a tasklet. What's done with IOREQs is not
> >> > suitable here since Xen needs to do some work instead of just wait on
> >> > an external event (an event channel from the IOREQ).
> >> 
> >> No, there is a second means, I've just confused the functions. The
> >> question is whether your vpci_process_pending() invocation
> >> perhaps sits in the wrong function. handle_hvm_io_completion() is
> >> what hvm_do_resume() calls, and what can prevent a guest from
> >> resuming execution. The hvm_io_pending() invocation just sits on
> >> a special case path down from there (through handle_pio()).
> > 
> > Yes, handle_hvm_io_completion is the function that actually blocks the
> > vCPU and waits for an event channel from the ioreq. This is however
> > not suitable because it uses the following code (simplified):
> > 
> > set_bit(_VPF_blocked_in_xen, &current->pause_flags);
> > raise_softirq(SCHEDULE_SOFTIRQ);
> > do_softirq();
> > 
> > In the vPCI case Xen cannot set the vCPU as blocked_in_xen, Xen needs
> > the scheduler to schedule the vCPU so the pending work can be
> > processed.
> 
> Right, and I didn't say you should set the vCPU to blocked. What
> I've pointed out is that the mere fact of handle_hvm_io_completion()
> returning false makes hvm_do_resume() bail, resulting in another
> round through softirq processing (from entry.S code) as long as
> _some_ softirq is pending (here: the scheduler one).

No, hvm_do_resume bailing doesn't prevent the VM from being scheduled.
Both vmx_do_resume and svm_do_resume (the callers of hvm_do_resume)
will run the guest regardless of whether hvm_do_resume has bailed
early.

Note that at the point where hvm_do_resume is called there's no
turning back, both {vmx/svm}_do_resume functions are annotated as
noreturn. The only way to prevent guest execution at that point is to
raise a scheduler softirq and process it, like it's done in
wait_on_xen_event_channel.

> 
>  Then if the blocked bit is not set the call to do_softirq
> > would be recurred, thus probably leading to a stack overflow if
> > there's enough pending work. ie:
> > 
> > <process work>
> > 	<do_softirq>
> > 		<process work>
> > 			<do_softirq>
> > 				<...>
> 
> Why would that be? The do_softirq() invocation sits on the exit-
> to-guest path, explicitly avoiding any such nesting unless there
> was a do_softirq() invocation somewhere in a softirq handler.

It sits on an exit-to-guest path, but the following chunk:

raise_softirq(SCHEDULE_SOFTIRQ);
do_softirq();

Would prevent the path from ever reaching the exit-to-guest point,
nesting on itself instead, unless the vCPU is marked as blocked, which
prevents it from being scheduled and thus avoids this recursion.

Thanks, Roger.


* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-11-08 11:29             ` Roger Pau Monné
@ 2018-11-08 11:42               ` Julien Grall
  2018-11-08 11:44                 ` Roger Pau Monné
  2018-11-08 12:32               ` Jan Beulich
  1 sibling, 1 reply; 37+ messages in thread
From: Julien Grall @ 2018-11-08 11:42 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Paul Durrant, xen-devel

Hi,

Sorry to jump in the conversation late.

On 11/8/18 11:29 AM, Roger Pau Monné wrote:
>> Why would that be? The do_softirq() invocation sits on the exit-
>> to-guest path, explicitly avoiding any such nesting unless there
>> was a do_softirq() invocation somewhere in a softirq handler.
> 
> It sits on an exit-to-guest path, but the following chunk:
> 
> raise_softirq(SCHEDULE_SOFTIRQ);
> do_softirq();
> 
> Would prevent the path from ever reaching the exit-to-guest and
> nesting on itself, unless the vCPU is marked as blocked, which
> prevents it from being scheduled thus avoiding this recursion.

I can't see how the recursion could happen on Arm. So is it an x86 issue?

Cheers,

-- 
Julien Grall


* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-11-08 11:42               ` Julien Grall
@ 2018-11-08 11:44                 ` Roger Pau Monné
  2018-11-08 11:52                   ` Julien Grall
  0 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monné @ 2018-11-08 11:44 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Paul Durrant, Jan Beulich, xen-devel

On Thu, Nov 08, 2018 at 11:42:35AM +0000, Julien Grall wrote:
> Hi,
> 
> Sorry to jump in the conversation late.
> 
> On 11/8/18 11:29 AM, Roger Pau Monné wrote:
> > > Why would that be? The do_softirq() invocation sits on the exit-
> > > to-guest path, explicitly avoiding any such nesting unless there
> > > was a do_softirq() invocation somewhere in a softirq handler.
> > 
> > It sits on an exit-to-guest path, but the following chunk:
> > 
> > raise_softirq(SCHEDULE_SOFTIRQ);
> > do_softirq();
> > 
> > Would prevent the path from ever reaching the exit-to-guest and
> > nesting on itself, unless the vCPU is marked as blocked, which
> > prevents it from being scheduled thus avoiding this recursion.
> 
> I can't see how the recursion could happen on Arm. So is it an x86 issue?

This is not an issue with the current code; I was just discussing with
Jan how to properly implement vPCI long-running operations that need
to be preempted.

Roger.


* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-11-08 11:44                 ` Roger Pau Monné
@ 2018-11-08 11:52                   ` Julien Grall
  2018-11-08 12:20                     ` Roger Pau Monné
  0 siblings, 1 reply; 37+ messages in thread
From: Julien Grall @ 2018-11-08 11:52 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Paul Durrant, Jan Beulich, xen-devel

Hi Roger,

On 11/8/18 11:44 AM, Roger Pau Monné wrote:
> On Thu, Nov 08, 2018 at 11:42:35AM +0000, Julien Grall wrote:
>> Hi,
>>
>> Sorry to jump in the conversation late.
>>
>> On 11/8/18 11:29 AM, Roger Pau Monné wrote:
>>>> Why would that be? The do_softirq() invocation sits on the exit-
>>>> to-guest path, explicitly avoiding any such nesting unless there
>>>> was a do_softirq() invocation somewhere in a softirq handler.
>>>
>>> It sits on an exit-to-guest path, but the following chunk:
>>>
>>> raise_softirq(SCHEDULE_SOFTIRQ);
>>> do_softirq();
>>>
>>> Would prevent the path from ever reaching the exit-to-guest and
>>> nesting on itself, unless the vCPU is marked as blocked, which
>>> prevents it from being scheduled thus avoiding this recursion.
>>
>> I can't see how the recursion could happen on Arm. So is it an x86 issue?
> 
> This is not an issue with the current code, I was just discussing with
> Jan how to properly implement vPCI long running operations that need
> to be preempted.

To give more context to my question: we are looking at handling 
preemption on Arm for some long-running operations (e.g. cache flush) 
without having to worry about returning to the guest.

I am thinking of something along the following lines on Arm, in a loop:

for ( .... )
{
    do_action
    if ( try_reschedule )
    {
	raise_softirq(SCHEDULE_SOFTIRQ);
	do_softirq();
    }
}

This would require that no locks be held, but I think it would work on 
Arm for any long-running operation. So I am quite interested in the 
outcome of the discussion here.

Cheers,

-- 
Julien Grall


* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-11-08 11:52                   ` Julien Grall
@ 2018-11-08 12:20                     ` Roger Pau Monné
  2018-11-08 12:38                       ` Julien Grall
  0 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monné @ 2018-11-08 12:20 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Paul Durrant, Jan Beulich, xen-devel

On Thu, Nov 08, 2018 at 11:52:57AM +0000, Julien Grall wrote:
> Hi Roger,
> 
> On 11/8/18 11:44 AM, Roger Pau Monné wrote:
> > On Thu, Nov 08, 2018 at 11:42:35AM +0000, Julien Grall wrote:
> > > Hi,
> > > 
> > > Sorry to jump in the conversation late.
> > > 
> > > On 11/8/18 11:29 AM, Roger Pau Monné wrote:
> > > > > Why would that be? The do_softirq() invocation sits on the exit-
> > > > > to-guest path, explicitly avoiding any such nesting unless there
> > > > > was a do_softirq() invocation somewhere in a softirq handler.
> > > > 
> > > > It sits on an exit-to-guest path, but the following chunk:
> > > > 
> > > > raise_softirq(SCHEDULE_SOFTIRQ);
> > > > do_softirq();
> > > > 
> > > > Would prevent the path from ever reaching the exit-to-guest and
> > > > nesting on itself, unless the vCPU is marked as blocked, which
> > > > prevents it from being scheduled thus avoiding this recursion.
> > > 
> > > I can't see how the recursion could happen on Arm. So is it an x86 issue?
> > 
> > This is not an issue with the current code, I was just discussing with
> > Jan how to properly implement vPCI long running operations that need
> > to be preempted.
> 
> To give more context on my question, we are looking at handling preemption
> on Arm in some long running operations (e.g cache flush) without having to
> worry about returning to guest.
> 
> I am thinking something along the following on Arm in a loop.
> 
> for ( .... )
> {
>    do_action
>    if ( try_reschedule )
>    {
> 	raise_softirq(SCHEDULE_SOFTIRQ);
> 	do_softirq();
>    }
> }
> 
> This would require to have no lock taken but I think it would work on Arm
> for any long operations. So I am quite interested on the result on the
> discussions here.

As said to Jan, I don't think this is viable because you could end up
recursing in do_softirq if there are no other guests to run and enough
reschedules.

Let's imagine that there's only 1 vCPU to run, and that it has a
long-running operation pending. I assume you will somehow hook the code
to perform such an operation into the guest resume path:

do_softirq()
    do_action()
-> preempt
        raise_softirq(SCHEDULE);
        do_softirq();
            do_action();
-> preempt
                raise_softirq(SCHEDULE);
                do_softirq();
                    do_action();
-> preempt
...

As you can see, this could overflow the stack if there are enough
preemptions.

IMO the only viable way to do this is to use tasklet and block the
vCPU until the tasklet has finished processing the pending work.
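
In rough pseudocode, the shape being proposed looks like the following
(hypothetical helper names; Xen's real tasklet and vCPU-pause interfaces
differ in detail, so treat this as a sketch of the approach rather than
the actual patch):

```c
/* Sketch only: defer the long-running BAR map/unmap to a tasklet and
 * keep the vCPU from running until the tasklet has drained the work. */
void vpci_defer_map(struct vcpu *v)
{
    pause_vcpu(v);                      /* hypothetical: stop scheduling v */
    schedule_tasklet(&v->vpci_tasklet); /* runs outside guest context      */
}

void vpci_tasklet_fn(void *data)
{
    struct vcpu *v = data;

    while ( more_bar_work(v) )          /* hypothetical work iterator      */
    {
        process_one_chunk(v);
        if ( need_preempt() )           /* tasklet can be rescheduled...   */
        {
            schedule_tasklet(&v->vpci_tasklet);
            return;                     /* ...without nesting on the stack */
        }
    }
    unpause_vcpu(v);                    /* work done: let the guest run    */
}
```

The key property is that preemption is a plain return from the tasklet
function rather than a recursive do_softirq() call, so the stack depth
stays constant regardless of how much work is pending.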

Roger.


* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-11-08 11:29             ` Roger Pau Monné
  2018-11-08 11:42               ` Julien Grall
@ 2018-11-08 12:32               ` Jan Beulich
  2018-11-08 12:47                 ` Roger Pau Monné
  1 sibling, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2018-11-08 12:32 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Paul Durrant, xen-devel

>>> On 08.11.18 at 12:29, <roger.pau@citrix.com> wrote:
> On Thu, Nov 08, 2018 at 02:55:56AM -0700, Jan Beulich wrote:
>> >>> On 07.11.18 at 18:15, <roger.pau@citrix.com> wrote:
>> > On Wed, Nov 07, 2018 at 08:06:00AM -0700, Jan Beulich wrote:
>> >> >>> On 07.11.18 at 12:11, <roger.pau@citrix.com> wrote:
>> >> > On Mon, Nov 05, 2018 at 09:56:13AM -0700, Jan Beulich wrote:
>> >> >> >>> On 30.10.18 at 16:41, <roger.pau@citrix.com> wrote:
>> >> >> > BAR map/unmap is a long running operation that needs to be preempted
>> >> >> > in order to avoid overrunning the assigned vCPU time (or even
>> >> >> > triggering the watchdog).
>> >> >> > 
>> >> >> > Current logic for this preemption is wrong, and won't work at all for
>> >> >> > AMD since only Intel makes use of hvm_io_pending (and even in that
>> >> >> > case the current code is wrong).
>> >> >> 
>> >> >> I'm having trouble understanding this, both for the AMD aspect
>> >> >> (it is only vvmx.c which has a function call not mirrored on the
>> >> >> AMD side) and for the supposed general brokenness. Without
>> >> >> some clarification I can't judge whether re-implementing via
>> >> >> tasklet is actually the best approach.
>> >> > 
>> >> > hvm_io_pending itself cannot block the vCPU from executing, it's used
>> >> > by nvmx_switch_guest in order to prevent changing the nested VMCS if
>> >> > there's pending IO emulation work AFAICT.
>> >> > 
>> >> > The only way I could find to actually prevent a vCPU from running
>> >> > while doing some work on it's behalf in a preemptive way is by
>> >> > blocking it and using a tasklet. What's done with IOREQs is not
>> >> > suitable here since Xen needs to do some work instead of just wait on
>> >> > an external event (an event channel from the IOREQ).
>> >> 
>> >> No, there is a second means, I've just confused the functions. The
>> >> question is whether your vpci_process_pending() invocation
>> >> perhaps sits in the wrong function. handle_hvm_io_completion() is
>> >> what hvm_do_resume() calls, and what can prevent a guest from
>> >> resuming execution. The hvm_io_pending() invocation just sits on
>> >> a special case path down from there (through handle_pio()).
>> > 
>> > Yes, handle_hvm_io_completion is the function that actually blocks the
>> > vCPU and waits for an event channel from the ioreq. This is however
>> > not suitable because it uses the following code (simplified):
>> > 
>> > set_bit(_VPF_blocked_in_xen, &current->pause_flags);
>> > raise_softirq(SCHEDULE_SOFTIRQ);
>> > do_softirq();
>> > 
>> > In the vPCI case Xen cannot set the vCPU as blocked_in_xen, Xen needs
>> > the scheduler to schedule the vCPU so the pending work can be
>> > processed.
>> 
>> Right, and I didn't say you should set the vCPU to blocked. What
>> I've pointed out is that the mere fact of handle_hvm_io_completion()
>> returning false makes hvm_do_resume() bail, resulting in another
>> round through softirq processing (from entry.S code) as long as
>> _some_ softirq is pending (here: the scheduler one).
> 
> No, hvm_do_resume bailing doesn't prevent the VM from being scheduled.
> Both vmx_do_resume and svm_do_resume (the callers of hvm_do_resume)
> will run the guest regardless of whether hvm_do_resume has bailed
> early.
> 
> Note that at the point where hvm_do_resume is called there's no
> turning back, both {vmx/svm}_do_resume functions are annotated as
> noreturn. The only way to prevent guest execution at that point is to
> raise a scheduler softirq and process it, like it's done in
> wait_on_xen_event_channel.

That's what I've said. You don't need to mark the vCPU as blocked
in Xen for this, though. And that's what I understood was your
original concern.

>>  Then if the blocked bit is not set the call to do_softirq
>> > would be recurred, thus probably leading to a stack overflow if
>> > there's enough pending work. ie:
>> > 
>> > <process work>
>> > 	<do_softirq>
>> > 		<process work>
>> > 			<do_softirq>
>> > 				<...>
>> 
>> Why would that be? The do_softirq() invocation sits on the exit-
>> to-guest path, explicitly avoiding any such nesting unless there
>> was a do_softirq() invocation somewhere in a softirq handler.
> 
> It sits on an exit-to-guest path, but the following chunk:
> 
> raise_softirq(SCHEDULE_SOFTIRQ);
> do_softirq();
> 
> Would prevent the path from ever reaching the exit-to-guest and
> nesting on itself, unless the vCPU is marked as blocked, which
> prevents it from being scheduled thus avoiding this recursion.

It would, indeed, but the emphasis is on the subjunctive, unless
you have an example of such malformed code. I'm not aware of any,
and my earlier comments also didn't suggest introducing such.
Perhaps I'm simply missing part of what you
think you see? It is important to note though that
vmx_do_resume() has

    reset_stack_and_jump(vmx_asm_do_vmentry);

as its last step, precluding any nesting.

Jan



* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-11-08 12:20                     ` Roger Pau Monné
@ 2018-11-08 12:38                       ` Julien Grall
  0 siblings, 0 replies; 37+ messages in thread
From: Julien Grall @ 2018-11-08 12:38 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Paul Durrant, Jan Beulich, xen-devel

Hi Roger,

On 11/8/18 12:20 PM, Roger Pau Monné wrote:
> On Thu, Nov 08, 2018 at 11:52:57AM +0000, Julien Grall wrote:
>> Hi Roger,
>>
>> On 11/8/18 11:44 AM, Roger Pau Monné wrote:
>>> On Thu, Nov 08, 2018 at 11:42:35AM +0000, Julien Grall wrote:
>>>> Hi,
>>>>
>>>> Sorry to jump in the conversation late.
>>>>
>>>> On 11/8/18 11:29 AM, Roger Pau Monné wrote:
>>>>>> Why would that be? The do_softirq() invocation sits on the exit-
>>>>>> to-guest path, explicitly avoiding any such nesting unless there
>>>>>> was a do_softirq() invocation somewhere in a softirq handler.
>>>>>
>>>>> It sits on an exit-to-guest path, but the following chunk:
>>>>>
>>>>> raise_softirq(SCHEDULE_SOFTIRQ);
>>>>> do_softirq();
>>>>>
>>>>> Would prevent the path from ever reaching the exit-to-guest and
>>>>> nesting on itself, unless the vCPU is marked as blocked, which
>>>>> prevents it from being scheduled thus avoiding this recursion.
>>>>
>>>> I can't see how the recursion could happen on Arm. So is it an x86 issue?
>>>
>>> This is not an issue with the current code, I was just discussing with
>>> Jan how to properly implement vPCI long running operations that need
>>> to be preempted.
>>
>> To give more context on my question, we are looking at handling preemption
>> on Arm in some long running operations (e.g cache flush) without having to
>> worry about returning to guest.
>>
>> I am thinking something along the following on Arm in a loop.
>>
>> for ( .... )
>> {
>>     do_action
>>     if ( try_reschedule )
>>     {
>> 	raise_softirq(SCHEDULE_SOFTIRQ);
>> 	do_softirq();
>>     }
>> }
>>
>> This would require to have no lock taken but I think it would work on Arm
>> for any long operations. So I am quite interested on the result on the
>> discussions here.
> 
> As said to Jan, I don't think this is viable because you could end up
> recursing in do_softirq if there are no other guests to run and enough
> reschedules.
> 
> Let's image that there's only 1 vCPU to run, and that it has a long
> running operation pending. I assume you will somehow hook the code to
> perform such operation in the guest resume path:
> 
> do_softirq()
>      do_action()
> -> preempt
>          raise_softirq(SCHEDULE);
>          do_softirq();
>              do_action();
> -> preempt
>                  raise_softirq(SCHEDULE);
>                  do_softirq();
>                      do_action();
> -> preempt
> ...
> 
> As you can see this could overflow the stack if the are enough
> preemptions.

This sounds like an x86-specific issue. In the case of Arm, the 
context_switch() function will return, so we will come back to the 
loop above.

We can do this because the hypervisor stack is per-vCPU, so there are 
no stack overflows involved here.

Cheers,

-- 
Julien Grall


* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-11-08 12:32               ` Jan Beulich
@ 2018-11-08 12:47                 ` Roger Pau Monné
  2018-11-08 13:04                   ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monné @ 2018-11-08 12:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Paul Durrant, xen-devel

On Thu, Nov 08, 2018 at 05:32:02AM -0700, Jan Beulich wrote:
> >>> On 08.11.18 at 12:29, <roger.pau@citrix.com> wrote:
> > On Thu, Nov 08, 2018 at 02:55:56AM -0700, Jan Beulich wrote:
> >> >>> On 07.11.18 at 18:15, <roger.pau@citrix.com> wrote:
> >>  Then if the blocked bit is not set the call to do_softirq
> >> > would be recurred, thus probably leading to a stack overflow if
> >> > there's enough pending work. ie:
> >> > 
> >> > <process work>
> >> > 	<do_softirq>
> >> > 		<process work>
> >> > 			<do_softirq>
> >> > 				<...>
> >> 
> >> Why would that be? The do_softirq() invocation sits on the exit-
> >> to-guest path, explicitly avoiding any such nesting unless there
> >> was a do_softirq() invocation somewhere in a softirq handler.
> > 
> > It sits on an exit-to-guest path, but the following chunk:
> > 
> > raise_softirq(SCHEDULE_SOFTIRQ);
> > do_softirq();
> > 
> > Would prevent the path from ever reaching the exit-to-guest and
> > nesting on itself, unless the vCPU is marked as blocked, which
> > prevents it from being scheduled thus avoiding this recursion.
> 
> It would, indeed, but emphasis is on the subjunctive unless
> you have an example of such malformed code. I'm not
> aware of any, and my earlier comments also didn't suggest to
> introduce such. Perhaps I'm simply missing part of what you
> think you see?

Maybe; I think I'm also lost in this discussion because we seem to
agree.

My point would be that on x86 I think the only way to have preemptible
long-running operations inside of Xen is to block the guest vCPU and
run them in a tasklet; at least this seems the least invasive option.

Do you still have objections to this patch/approach?

Thanks, Roger.


* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-11-08 12:47                 ` Roger Pau Monné
@ 2018-11-08 13:04                   ` Jan Beulich
  2018-11-08 13:20                     ` Roger Pau Monné
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2018-11-08 13:04 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Paul Durrant, xen-devel

>>> On 08.11.18 at 13:47, <roger.pau@citrix.com> wrote:
> My point would be that on x86 I think the only way to have preemptible
> long-running operations inside of Xen is to block the guest vCPU and
> run them in a tasklet, or at least this seems the less invasive one.
> 
> Do you still have objections to this patch/approach?

Well, I still don't understand why you think you need to introduce
a tasklet in the first place. That's because I still don't understand
what you think is wrong with the current approach (leaving aside
the exact placement of where the vpci hook needs to be called).

Jan




* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-11-08 13:04                   ` Jan Beulich
@ 2018-11-08 13:20                     ` Roger Pau Monné
       [not found]                       ` <791E55F8020000889527FA34@prv1-mh.provo.novell.com>
  0 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monné @ 2018-11-08 13:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Paul Durrant, xen-devel

On Thu, Nov 08, 2018 at 06:04:11AM -0700, Jan Beulich wrote:
> >>> On 08.11.18 at 13:47, <roger.pau@citrix.com> wrote:
> > My point would be that on x86 I think the only way to have preemptible
> > long-running operations inside of Xen is to block the guest vCPU and
> > run them in a tasklet, or at least this seems the less invasive one.
> > 
> > Do you still have objections to this patch/approach?
> 
> Well, I still don't understand why you think you need to introduce
> a tasklet in the first place. That's because I still don't understand
> what you think is wrong with the current approach (leaving aside
> the exact placement of where the vpci hook needs to be called).

The current approach doesn't prevent the vCPU from returning to guest
context with pending work.

Placing the pending work hook (vpci_process_pending) in hvm_do_resume
is not going to solve this unless we raise and process a scheduler
softirq, and then this leads to the recursion problem.

So overall I don't see how it's possible to have these long-running
preemptible operations inside the hypervisor without using a
tasklet.
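For illustration, the block-the-vCPU-and-defer-to-a-tasklet pattern described above can be modelled with a tiny self-contained C mock-up. All names here (mock_vcpu, mock_tasklet_run, mock_defer_to_tasklet) are hypothetical stand-ins invented for this sketch, not Xen's real internals:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of the scheme under discussion: the guest vCPU is
 * blocked while a long-running operation is chopped into bounded
 * chunks run from a tasklet; the tasklet unblocks the vCPU once all
 * pending work is done.  Purely illustrative, not Xen code. */

struct mock_vcpu {
    bool paused;
    unsigned int work_left;   /* pending chunks, e.g. p2m map/unmap work */
};

/* One tasklet invocation: process at most 'budget' chunks. */
static void mock_tasklet_run(struct mock_vcpu *v, unsigned int budget)
{
    while ( v->work_left && budget-- )
        v->work_left--;
    if ( !v->work_left )
        v->paused = false;    /* all done: let the vCPU run again */
}

/* Block the vCPU and drive the tasklet until the work completes.
 * Returns how many tasklet invocations were needed. */
static unsigned int mock_defer_to_tasklet(struct mock_vcpu *v,
                                          unsigned int budget)
{
    unsigned int invocations = 0;

    v->paused = true;
    while ( v->paused )
    {
        mock_tasklet_run(v, budget);
        invocations++;
    }
    return invocations;
}
```

The point of the sketch is only the shape of the control flow: the guest never re-enters with work pending, at the cost of keeping the vCPU blocked for the whole operation.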

Sorry for making this discussion so long; I hope this is clearer than
my previous replies.

Thanks, Roger.


* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
       [not found]                       ` <791E55F8020000889527FA34@prv1-mh.provo.novell.com>
@ 2018-11-08 16:25                         ` Jan Beulich
  2018-11-08 16:59                           ` Roger Pau Monné
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2018-11-08 16:25 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Paul Durrant, xen-devel

>>> On 08.11.18 at 14:20, <roger.pau@citrix.com> wrote:
> On Thu, Nov 08, 2018 at 06:04:11AM -0700, Jan Beulich wrote:
>> >>> On 08.11.18 at 13:47, <roger.pau@citrix.com> wrote:
>> > My point would be that on x86 I think the only way to have preemptible
>> > long-running operations inside of Xen is to block the guest vCPU and
>> > run them in a tasklet, or at least this seems the less invasive one.
>> > 
>> > Do you still have objections to this patch/approach?
>> 
>> Well, I still don't understand why you think you need to introduce
>> a tasklet in the first place. That's because I still don't understand
>> what you think is wrong with the current approach (leaving aside
>> the exact placement of where the vpci hook needs to be called).
> 
> The current approach doesn't prevent the vCPU from returning to guest
> context with pending work.
> 
> Placing the pending work hook (vpci_process_pending) in hvm_do_resume
> is not going to solve this unless we raise and process a scheduler
> softirq, and then this leads to the recursion problem.

Which recursion problem? I still haven't seen an outline taking into
account what I have written in earlier replies. In particular I don't
see a softirq handler itself calling do_softirq() anywhere.

Jan




* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
  2018-11-08 16:25                         ` Jan Beulich
@ 2018-11-08 16:59                           ` Roger Pau Monné
       [not found]                             ` <E720D0C40200003B9527FA34@prv1-mh.provo.novell.com>
  0 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monné @ 2018-11-08 16:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Paul Durrant, xen-devel

On Thu, Nov 08, 2018 at 09:25:26AM -0700, Jan Beulich wrote:
> >>> On 08.11.18 at 14:20, <roger.pau@citrix.com> wrote:
> > On Thu, Nov 08, 2018 at 06:04:11AM -0700, Jan Beulich wrote:
> >> >>> On 08.11.18 at 13:47, <roger.pau@citrix.com> wrote:
> >> > My point would be that on x86 I think the only way to have preemptible
> >> > long-running operations inside of Xen is to block the guest vCPU and
> >> > run them in a tasklet, or at least this seems the less invasive one.
> >> > 
> >> > Do you still have objections to this patch/approach?
> >> 
> >> Well, I still don't understand why you think you need to introduce
> >> a tasklet in the first place. That's because I still don't understand
> >> what you think is wrong with the current approach (leaving aside
> >> the exact placement of where the vpci hook needs to be called).
> > 
> > The current approach doesn't prevent the vCPU from returning to guest
> > context with pending work.
> > 
> > Placing the pending work hook (vpci_process_pending) in hvm_do_resume
> > is not going to solve this unless we raise and process a scheduler
> > softirq, and then this leads to the recursion problem.
> 
> Which recursion problem? I still haven't seen an outline taking into
> account what I have written in earlier replies. In particular I don't
> see a softirq handler itself calling do_softirq() anywhere.

I could place the vPCI pending work hook in hvm_do_resume or
handle_hvm_io_completion and then simply do a
raise_softirq(SCHEDULER_SOFTIRQ) and return in order to preempt. The
do_softirq() call in the svm/vmx guest entry path is called with a
clean stack so there's no stack overflow issue there.
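The alternative sketched in the preceding paragraph, a pending-work hook that handles a bounded amount of work and requests a reschedule via a softirq rather than finishing everything in one go, can also be modelled in a few lines of self-contained C. The names (mock_raise_softirq, mock_process_pending) are illustrative stand-ins for raise_softirq(SCHEDULER_SOFTIRQ) and a vpci_process_pending-style hook, not Xen's actual API:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a raised SCHEDULER_SOFTIRQ flag. */
static bool softirq_raised;

static void mock_raise_softirq(void)
{
    softirq_raised = true;    /* models raise_softirq(SCHEDULER_SOFTIRQ) */
}

/* Pending-work hook in the style of vpci_process_pending called from
 * hvm_do_resume: process up to 'budget' chunks, and if work remains,
 * raise the softirq so the guest entry path re-invokes us later.
 * Returns true if the hook preempted itself with work still pending. */
static bool mock_process_pending(unsigned int *work_left,
                                 unsigned int budget)
{
    while ( *work_left && budget-- )
        (*work_left)--;
    if ( *work_left )
        mock_raise_softirq(); /* come back round via do_softirq() */
    return *work_left != 0;
}
```

Compared with the tasklet variant, the vCPU is never blocked here; the operation simply resumes from the guest entry path each time the softirq is processed.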

Do you think this would be better than blocking the vCPU and using a
tasklet?

Thanks, Roger.


* Re: [PATCH v3 5/7] vpci: fix execution of long running operations
       [not found]                             ` <E720D0C40200003B9527FA34@prv1-mh.provo.novell.com>
@ 2018-11-09  8:02                               ` Jan Beulich
  0 siblings, 0 replies; 37+ messages in thread
From: Jan Beulich @ 2018-11-09  8:02 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Paul Durrant, xen-devel

>>> On 08.11.18 at 17:59, <roger.pau@citrix.com> wrote:
> On Thu, Nov 08, 2018 at 09:25:26AM -0700, Jan Beulich wrote:
>> >>> On 08.11.18 at 14:20, <roger.pau@citrix.com> wrote:
>> > On Thu, Nov 08, 2018 at 06:04:11AM -0700, Jan Beulich wrote:
>> >> >>> On 08.11.18 at 13:47, <roger.pau@citrix.com> wrote:
>> >> > My point would be that on x86 I think the only way to have preemptible
>> >> > long-running operations inside of Xen is to block the guest vCPU and
>> >> > run them in a tasklet, or at least this seems the less invasive one.
>> >> > 
>> >> > Do you still have objections to this patch/approach?
>> >> 
>> >> Well, I still don't understand why you think you need to introduce
>> >> a tasklet in the first place. That's because I still don't understand
>> >> what you think is wrong with the current approach (leaving aside
>> >> the exact placement of where the vpci hook needs to be called).
>> > 
>> > The current approach doesn't prevent the vCPU from returning to guest
>> > context with pending work.
>> > 
>> > Placing the pending work hook (vpci_process_pending) in hvm_do_resume
>> > is not going to solve this unless we raise and process a scheduler
>> > softirq, and then this leads to the recursion problem.
>> 
>> Which recursion problem? I still haven't seen an outline taking into
>> account what I have written in earlier replies. In particular I don't
>> see a softirq handler itself calling do_softirq() anywhere.
> 
> I could place the vPCI pending work hook in hvm_do_resume or
> handle_hvm_io_completion and then simply do a
> raise_softirq(SCHEDULER_SOFTIRQ) and return in order to preempt. The
> do_softirq() call in the svm/vmx guest entry path is called with a
> clean stack so there's no stack overflow issue there.
> 
> Do you think this would be better than blocking the vCPU and using a
> tasklet?

At least it would better fit with the rest of how things currently
get done.

Jan




end of thread, other threads:[~2018-11-09  8:02 UTC | newest]

Thread overview: 37+ messages
2018-10-30 15:41 [PATCH v3 0/7] x86/pvh: fix fixes for PVH Dom0 Roger Pau Monne
2018-10-30 15:41 ` [PATCH v3 1/7] x86/pvh: fix TSC mode setup " Roger Pau Monne
2018-10-30 15:41 ` [PATCH v3 2/7] x86/hvm: introduce a define for the debug output IO port Roger Pau Monne
2018-10-31 16:36   ` Wei Liu
2018-10-30 15:41 ` [PATCH v3 3/7] x86/pvh: allow PVH Dom0 to use the debug IO port console Roger Pau Monne
2018-10-30 16:27   ` Wei Liu
2018-10-30 15:41 ` [PATCH v3 4/7] vpci: fix updating the command register Roger Pau Monne
2018-11-05 16:46   ` Jan Beulich
2018-11-07 10:47     ` Roger Pau Monné
2018-11-07 15:00       ` Jan Beulich
2018-10-30 15:41 ` [PATCH v3 5/7] vpci: fix execution of long running operations Roger Pau Monne
2018-11-05 16:56   ` Jan Beulich
2018-11-07 11:11     ` Roger Pau Monné
2018-11-07 15:06       ` Jan Beulich
2018-11-07 17:15         ` Roger Pau Monné
2018-11-08  9:55           ` Jan Beulich
2018-11-08 11:29             ` Roger Pau Monné
2018-11-08 11:42               ` Julien Grall
2018-11-08 11:44                 ` Roger Pau Monné
2018-11-08 11:52                   ` Julien Grall
2018-11-08 12:20                     ` Roger Pau Monné
2018-11-08 12:38                       ` Julien Grall
2018-11-08 12:32               ` Jan Beulich
2018-11-08 12:47                 ` Roger Pau Monné
2018-11-08 13:04                   ` Jan Beulich
2018-11-08 13:20                     ` Roger Pau Monné
     [not found]                       ` <791E55F8020000889527FA34@prv1-mh.provo.novell.com>
2018-11-08 16:25                         ` Jan Beulich
2018-11-08 16:59                           ` Roger Pau Monné
     [not found]                             ` <E720D0C40200003B9527FA34@prv1-mh.provo.novell.com>
2018-11-09  8:02                               ` Jan Beulich
2018-10-30 15:41 ` [PATCH v3 6/7] vpci/msix: carve p2m hole for MSIX MMIO regions Roger Pau Monne
2018-11-05 17:07   ` Jan Beulich
2018-11-07 11:33     ` Roger Pau Monné
2018-10-30 15:41 ` [PATCH v3 7/7] amd/pvh: enable ACPI C1E disable quirk on PVH Dom0 Roger Pau Monne
2018-10-30 16:28   ` Wei Liu
2018-10-31 17:44   ` Boris Ostrovsky
2018-11-02  9:06   ` Jan Beulich
2018-11-07 10:24     ` Roger Pau Monné
