xen-devel.lists.xenproject.org archive mirror
* [Xen-devel] [PATCH v3 0/3] Xen on Hyper-V: Implement L0 assisted TLB flush
@ 2020-02-17 13:55 Wei Liu
  2020-02-17 13:55 ` [Xen-devel] [PATCH v3 1/3] x86/hypervisor: pass flags to hypervisor_flush_tlb Wei Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Wei Liu @ 2020-02-17 13:55 UTC (permalink / raw)
  To: Xen Development List
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Roger Pau Monné

Hi all,

This series is based on Roger's L0 assisted flush series.

I have done some testing against Linux on Hyper-V in a 32-vCPU VM.
All builds were done with -j32.



Building Xen on Linux:
real    0m45.376s
user    2m28.156s
sys     0m51.672s

Building Xen on Linux on Xen on Hyper-V, no assisted flush:
real    3m8.762s
user    10m46.787s
sys     30m14.492s

Building Xen on Linux on Xen on Hyper-V, with assisted flush:
real    0m44.369s
user    3m16.231s
sys     3m3.330s



Building Linux x86_64_defconfig on Linux:
real    0m59.698s
user    21m14.014s
sys     2m58.742s

Building Linux x86_64_defconfig on Linux on Xen on Hyper-V, no assisted
flush:
real    2m6.284s
user    31m18.706s
sys     20m31.106s

Building Linux x86_64_defconfig on Linux on Xen on Hyper-V, with assisted
flush:
real    1m38.968s
user    28m40.398s
sys     11m20.151s



There are various degrees of improvement depending on the workload. Xen
can perhaps be optimised a bit more: it currently doesn't pass the
address space identifier (CR3) to Hyper-V, but doing so requires
reworking the TLB flush APIs within Xen.

Wei.

Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Paul Durrant <pdurrant@amazon.com>

Wei Liu (3):
  x86/hypervisor: pass flags to hypervisor_flush_tlb
  x86/hyperv: skeleton for L0 assisted TLB flush
  x86/hyperv: L0 assisted TLB flush

 xen/arch/x86/guest/hyperv/Makefile     |   2 +
 xen/arch/x86/guest/hyperv/hyperv.c     |  17 ++
 xen/arch/x86/guest/hyperv/private.h    |  13 ++
 xen/arch/x86/guest/hyperv/tlb.c        | 212 +++++++++++++++++++++++++
 xen/arch/x86/guest/hyperv/util.c       |  74 +++++++++
 xen/arch/x86/guest/hypervisor.c        |   7 +-
 xen/arch/x86/guest/xen/xen.c           |   2 +-
 xen/arch/x86/smp.c                     |   5 +-
 xen/include/asm-x86/flushtlb.h         |   3 +
 xen/include/asm-x86/guest/hypervisor.h |  10 +-
 10 files changed, 334 insertions(+), 11 deletions(-)
 create mode 100644 xen/arch/x86/guest/hyperv/tlb.c
 create mode 100644 xen/arch/x86/guest/hyperv/util.c

-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [Xen-devel] [PATCH v3 1/3] x86/hypervisor: pass flags to hypervisor_flush_tlb
  2020-02-17 13:55 [Xen-devel] [PATCH v3 0/3] Xen on Hyper-V: Implement L0 assisted TLB flush Wei Liu
@ 2020-02-17 13:55 ` Wei Liu
  2020-02-17 13:55 ` [Xen-devel] [PATCH v3 2/3] x86/hyperv: skeleton for L0 assisted TLB flush Wei Liu
  2020-02-17 13:55 ` [Xen-devel] [PATCH v3 3/3] x86/hyperv: " Wei Liu
  2 siblings, 0 replies; 10+ messages in thread
From: Wei Liu @ 2020-02-17 13:55 UTC (permalink / raw)
  To: Xen Development List
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Roger Pau Monné

Hyper-V's L0 assisted flush has fine-grained control over what gets
flushed. We need all the flags available to make the best decisions
possible.

No functional change because Xen's implementation doesn't care about
what is passed to it.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
---
v2:
1. Introduce FLUSH_TLB_FLAGS_MASK
---
 xen/arch/x86/guest/hypervisor.c        |  7 +++++--
 xen/arch/x86/guest/xen/xen.c           |  2 +-
 xen/arch/x86/smp.c                     |  5 ++---
 xen/include/asm-x86/flushtlb.h         |  3 +++
 xen/include/asm-x86/guest/hypervisor.h | 10 +++++-----
 5 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/guest/hypervisor.c b/xen/arch/x86/guest/hypervisor.c
index 47e938e287..6ee28c9df1 100644
--- a/xen/arch/x86/guest/hypervisor.c
+++ b/xen/arch/x86/guest/hypervisor.c
@@ -75,10 +75,13 @@ void __init hypervisor_e820_fixup(struct e820map *e820)
 }
 
 int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
-                         unsigned int order)
+                         unsigned int flags)
 {
+    if ( flags & ~FLUSH_TLB_FLAGS_MASK )
+        return -EINVAL;
+
     if ( ops.flush_tlb )
-        return alternative_call(ops.flush_tlb, mask, va, order);
+        return alternative_call(ops.flush_tlb, mask, va, flags);
 
     return -ENOSYS;
 }
diff --git a/xen/arch/x86/guest/xen/xen.c b/xen/arch/x86/guest/xen/xen.c
index 5d3427a713..0eb1115c4d 100644
--- a/xen/arch/x86/guest/xen/xen.c
+++ b/xen/arch/x86/guest/xen/xen.c
@@ -324,7 +324,7 @@ static void __init e820_fixup(struct e820map *e820)
         pv_shim_fixup_e820(e820);
 }
 
-static int flush_tlb(const cpumask_t *mask, const void *va, unsigned int order)
+static int flush_tlb(const cpumask_t *mask, const void *va, unsigned int flags)
 {
     return xen_hypercall_hvm_op(HVMOP_flush_tlbs, NULL);
 }
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index c7caf5bc26..4dab74c0d5 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -258,9 +258,8 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
          !cpumask_subset(mask, cpumask_of(cpu)) )
     {
         if ( cpu_has_hypervisor &&
-             !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
-                         FLUSH_ORDER_MASK)) &&
-             !hypervisor_flush_tlb(mask, va, flags & FLUSH_ORDER_MASK) )
+             !(flags & ~FLUSH_TLB_FLAGS_MASK) &&
+             !hypervisor_flush_tlb(mask, va, flags) )
         {
             if ( tlb_clk_enabled )
                 tlb_clk_enabled = false;
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 9773014320..a4de317452 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -123,6 +123,9 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
  /* Flush all HVM guests linear TLB (using ASID/VPID) */
 #define FLUSH_GUESTS_TLB 0x4000
 
+#define FLUSH_TLB_FLAGS_MASK (FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID | \
+                              FLUSH_ORDER_MASK)
+
 /* Flush local TLBs/caches. */
 unsigned int flush_area_local(const void *va, unsigned int flags);
 #define flush_local(flags) flush_area_local(NULL, flags)
diff --git a/xen/include/asm-x86/guest/hypervisor.h b/xen/include/asm-x86/guest/hypervisor.h
index 432e57c2a0..48d54735d2 100644
--- a/xen/include/asm-x86/guest/hypervisor.h
+++ b/xen/include/asm-x86/guest/hypervisor.h
@@ -35,7 +35,7 @@ struct hypervisor_ops {
     /* Fix up e820 map */
     void (*e820_fixup)(struct e820map *e820);
     /* L0 assisted TLB flush */
-    int (*flush_tlb)(const cpumask_t *mask, const void *va, unsigned int order);
+    int (*flush_tlb)(const cpumask_t *mask, const void *va, unsigned int flags);
 };
 
 #ifdef CONFIG_GUEST
@@ -48,11 +48,11 @@ void hypervisor_e820_fixup(struct e820map *e820);
 /*
  * L0 assisted TLB flush.
  * mask: cpumask of the dirty vCPUs that should be flushed.
- * va: linear address to flush, or NULL for global flushes.
- * order: order of the linear address pointed by va.
+ * va: linear address to flush, or NULL for entire address space.
+ * flags: flags for flushing, including the order of va.
  */
 int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
-                         unsigned int order);
+                         unsigned int flags);
 
 #else
 
@@ -65,7 +65,7 @@ static inline int hypervisor_ap_setup(void) { return 0; }
 static inline void hypervisor_resume(void) { ASSERT_UNREACHABLE(); }
 static inline void hypervisor_e820_fixup(struct e820map *e820) {}
 static inline int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
-                                       unsigned int order)
+                                       unsigned int flags)
 {
     return -ENOSYS;
 }
-- 
2.20.1




* [Xen-devel] [PATCH v3 2/3] x86/hyperv: skeleton for L0 assisted TLB flush
  2020-02-17 13:55 [Xen-devel] [PATCH v3 0/3] Xen on Hyper-V: Implement L0 assisted TLB flush Wei Liu
  2020-02-17 13:55 ` [Xen-devel] [PATCH v3 1/3] x86/hypervisor: pass flags to hypervisor_flush_tlb Wei Liu
@ 2020-02-17 13:55 ` Wei Liu
  2020-02-17 14:06   ` Durrant, Paul
  2020-02-17 13:55 ` [Xen-devel] [PATCH v3 3/3] x86/hyperv: " Wei Liu
  2 siblings, 1 reply; 10+ messages in thread
From: Wei Liu @ 2020-02-17 13:55 UTC (permalink / raw)
  To: Xen Development List
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Roger Pau Monné

Implement a basic hook for L0 assisted TLB flush. The hook checks
whether the prerequisites are met; if they are not, it returns an error
code so that the caller falls back to native flushes.

Introduce a new variable to indicate whether the hypercall page is ready.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
v3:
1. Change hv_hcall_page_ready to hcall_page_ready
---
 xen/arch/x86/guest/hyperv/Makefile  |  1 +
 xen/arch/x86/guest/hyperv/hyperv.c  | 17 ++++++++++++
 xen/arch/x86/guest/hyperv/private.h |  4 +++
 xen/arch/x86/guest/hyperv/tlb.c     | 41 +++++++++++++++++++++++++++++
 4 files changed, 63 insertions(+)
 create mode 100644 xen/arch/x86/guest/hyperv/tlb.c

diff --git a/xen/arch/x86/guest/hyperv/Makefile b/xen/arch/x86/guest/hyperv/Makefile
index 68170109a9..18902c33e9 100644
--- a/xen/arch/x86/guest/hyperv/Makefile
+++ b/xen/arch/x86/guest/hyperv/Makefile
@@ -1 +1,2 @@
 obj-y += hyperv.o
+obj-y += tlb.o
diff --git a/xen/arch/x86/guest/hyperv/hyperv.c b/xen/arch/x86/guest/hyperv/hyperv.c
index 70f4cd5ae0..f1b3073712 100644
--- a/xen/arch/x86/guest/hyperv/hyperv.c
+++ b/xen/arch/x86/guest/hyperv/hyperv.c
@@ -33,6 +33,8 @@ DEFINE_PER_CPU_READ_MOSTLY(void *, hv_input_page);
 DEFINE_PER_CPU_READ_MOSTLY(void *, hv_vp_assist);
 DEFINE_PER_CPU_READ_MOSTLY(unsigned int, hv_vp_index);
 
+static bool __read_mostly hcall_page_ready;
+
 static uint64_t generate_guest_id(void)
 {
     union hv_guest_os_id id = {};
@@ -119,6 +121,8 @@ static void __init setup_hypercall_page(void)
     BUG_ON(!hypercall_msr.enable);
 
     set_fixmap_x(FIX_X_HYPERV_HCALL, mfn << PAGE_SHIFT);
+
+    hcall_page_ready = true;
 }
 
 static int setup_hypercall_pcpu_arg(void)
@@ -199,11 +203,24 @@ static void __init e820_fixup(struct e820map *e820)
         panic("Unable to reserve Hyper-V hypercall range\n");
 }
 
+static int flush_tlb(const cpumask_t *mask, const void *va,
+                     unsigned int flags)
+{
+    if ( !(ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED) )
+        return -EOPNOTSUPP;
+
+    if ( !hcall_page_ready || !this_cpu(hv_input_page) )
+        return -ENXIO;
+
+    return hyperv_flush_tlb(mask, va, flags);
+}
+
 static const struct hypervisor_ops __initdata ops = {
     .name = "Hyper-V",
     .setup = setup,
     .ap_setup = ap_setup,
     .e820_fixup = e820_fixup,
+    .flush_tlb = flush_tlb,
 };
 
 /*
diff --git a/xen/arch/x86/guest/hyperv/private.h b/xen/arch/x86/guest/hyperv/private.h
index 956eff831f..509bedaafa 100644
--- a/xen/arch/x86/guest/hyperv/private.h
+++ b/xen/arch/x86/guest/hyperv/private.h
@@ -22,10 +22,14 @@
 #ifndef __XEN_HYPERV_PRIVIATE_H__
 #define __XEN_HYPERV_PRIVIATE_H__
 
+#include <xen/cpumask.h>
 #include <xen/percpu.h>
 
 DECLARE_PER_CPU(void *, hv_input_page);
 DECLARE_PER_CPU(void *, hv_vp_assist);
 DECLARE_PER_CPU(unsigned int, hv_vp_index);
 
+int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
+                     unsigned int flags);
+
 #endif /* __XEN_HYPERV_PRIVIATE_H__  */
diff --git a/xen/arch/x86/guest/hyperv/tlb.c b/xen/arch/x86/guest/hyperv/tlb.c
new file mode 100644
index 0000000000..48f527229e
--- /dev/null
+++ b/xen/arch/x86/guest/hyperv/tlb.c
@@ -0,0 +1,41 @@
+/******************************************************************************
+ * arch/x86/guest/hyperv/tlb.c
+ *
+ * Support for TLB management using hypercalls
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2020 Microsoft.
+ */
+
+#include <xen/cpumask.h>
+#include <xen/errno.h>
+
+#include "private.h"
+
+int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
+                     unsigned int flags)
+{
+    return -EOPNOTSUPP;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.20.1




* [Xen-devel] [PATCH v3 3/3] x86/hyperv: L0 assisted TLB flush
  2020-02-17 13:55 [Xen-devel] [PATCH v3 0/3] Xen on Hyper-V: Implement L0 assisted TLB flush Wei Liu
  2020-02-17 13:55 ` [Xen-devel] [PATCH v3 1/3] x86/hypervisor: pass flags to hypervisor_flush_tlb Wei Liu
  2020-02-17 13:55 ` [Xen-devel] [PATCH v3 2/3] x86/hyperv: skeleton for L0 assisted TLB flush Wei Liu
@ 2020-02-17 13:55 ` Wei Liu
  2020-02-17 17:34   ` Roger Pau Monné
  2020-02-17 19:51   ` Michael Kelley
  2 siblings, 2 replies; 10+ messages in thread
From: Wei Liu @ 2020-02-17 13:55 UTC (permalink / raw)
  To: Xen Development List
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Roger Pau Monné

Implement L0 assisted TLB flush for Xen on Hyper-V. It takes advantage
of several hypercalls:

 * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST
 * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX
 * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE
 * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX

Pick the most efficient hypercall available for each request.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
---
v3:
1. Address more comments.
2. Fix usage of max_vp_index.
3. Use the fill_gva_list algorithm from Linux.

v2:
1. Address Roger and Jan's comments re types etc.
2. Fix pointer arithmetic.
3. Misc improvement to code.
---
 xen/arch/x86/guest/hyperv/Makefile  |   1 +
 xen/arch/x86/guest/hyperv/private.h |   9 ++
 xen/arch/x86/guest/hyperv/tlb.c     | 173 +++++++++++++++++++++++++++-
 xen/arch/x86/guest/hyperv/util.c    |  74 ++++++++++++
 4 files changed, 256 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/x86/guest/hyperv/util.c

diff --git a/xen/arch/x86/guest/hyperv/Makefile b/xen/arch/x86/guest/hyperv/Makefile
index 18902c33e9..0e39410968 100644
--- a/xen/arch/x86/guest/hyperv/Makefile
+++ b/xen/arch/x86/guest/hyperv/Makefile
@@ -1,2 +1,3 @@
 obj-y += hyperv.o
 obj-y += tlb.o
+obj-y += util.o
diff --git a/xen/arch/x86/guest/hyperv/private.h b/xen/arch/x86/guest/hyperv/private.h
index 509bedaafa..79a77930a0 100644
--- a/xen/arch/x86/guest/hyperv/private.h
+++ b/xen/arch/x86/guest/hyperv/private.h
@@ -24,12 +24,21 @@
 
 #include <xen/cpumask.h>
 #include <xen/percpu.h>
+#include <xen/types.h>
 
 DECLARE_PER_CPU(void *, hv_input_page);
 DECLARE_PER_CPU(void *, hv_vp_assist);
 DECLARE_PER_CPU(unsigned int, hv_vp_index);
 
+static inline unsigned int hv_vp_index(unsigned int cpu)
+{
+    return per_cpu(hv_vp_index, cpu);
+}
+
 int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
                      unsigned int flags);
 
+/* Returns number of banks, -ev if error */
+int cpumask_to_vpset(struct hv_vpset *vpset, const cpumask_t *mask);
+
 #endif /* __XEN_HYPERV_PRIVIATE_H__  */
diff --git a/xen/arch/x86/guest/hyperv/tlb.c b/xen/arch/x86/guest/hyperv/tlb.c
index 48f527229e..8cd1c6f0ed 100644
--- a/xen/arch/x86/guest/hyperv/tlb.c
+++ b/xen/arch/x86/guest/hyperv/tlb.c
@@ -19,17 +19,188 @@
  * Copyright (c) 2020 Microsoft.
  */
 
+#include <xen/cpu.h>
 #include <xen/cpumask.h>
 #include <xen/errno.h>
 
+#include <asm/guest/hyperv.h>
+#include <asm/guest/hyperv-hcall.h>
+#include <asm/guest/hyperv-tlfs.h>
+
 #include "private.h"
 
+/*
+ * It is possible to encode up to 4096 pages using the lower 12 bits
+ * in an element of gva_list
+ */
+#define HV_TLB_FLUSH_UNIT (4096 * PAGE_SIZE)
+
+static unsigned int fill_gva_list(uint64_t *gva_list, const void *va,
+                                  unsigned int order)
+{
+    unsigned long cur = (unsigned long)va;
+    /* end is 1 past the range to be flushed */
+    unsigned long end = cur + (PAGE_SIZE << order);
+    unsigned int n = 0;
+
+    do {
+        unsigned long diff = end - cur;
+
+        gva_list[n] = cur & PAGE_MASK;
+
+        /*
+         * Use lower 12 bits to encode the number of additional pages
+         * to flush
+         */
+        if ( diff >= HV_TLB_FLUSH_UNIT )
+        {
+            gva_list[n] |= ~PAGE_MASK;
+            cur += HV_TLB_FLUSH_UNIT;
+        }
+        else
+        {
+            gva_list[n] |= (diff - 1) >> PAGE_SHIFT;
+            cur = end;
+        }
+
+        n++;
+    } while ( cur < end );
+
+    return n;
+}
+
+static uint64_t flush_tlb_ex(const cpumask_t *mask, const void *va,
+                             unsigned int flags)
+{
+    struct hv_tlb_flush_ex *flush = this_cpu(hv_input_page);
+    int nr_banks;
+    unsigned int max_gvas, order = flags & FLUSH_ORDER_MASK;
+    uint64_t *gva_list;
+
+    if ( !flush || local_irq_is_enabled() )
+    {
+        ASSERT_UNREACHABLE();
+        return ~0ULL;
+    }
+
+    if ( !(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED) )
+        return ~0ULL;
+
+    flush->address_space = 0;
+    flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+    if ( !(flags & FLUSH_TLB_GLOBAL) )
+        flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
+
+    nr_banks = cpumask_to_vpset(&flush->hv_vp_set, mask);
+    if ( nr_banks < 0 )
+        return ~0ULL;
+
+    max_gvas =
+        (PAGE_SIZE - sizeof(*flush) - nr_banks *
+         sizeof(flush->hv_vp_set.bank_contents[0])) /
+        sizeof(uint64_t);       /* gva is represented as uint64_t */
+
+    /*
+     * Flush the entire address space if va is NULL or if there is not
+     * enough space for gva_list.
+     */
+    if ( !va || (PAGE_SIZE << order) / HV_TLB_FLUSH_UNIT > max_gvas )
+        return hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX, 0,
+                                   nr_banks, virt_to_maddr(flush), 0);
+
+    /*
+     * The calculation of gva_list address requires the structure to
+     * be 64 bits aligned.
+     */
+    BUILD_BUG_ON(sizeof(*flush) % sizeof(uint64_t));
+    gva_list = (uint64_t *)flush + sizeof(*flush) / sizeof(uint64_t) + nr_banks;
+
+    return hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX,
+                               fill_gva_list(gva_list, va, order),
+                               nr_banks, virt_to_maddr(flush), 0);
+}
+
+/* Maximum number of gvas for hv_tlb_flush */
+#define MAX_GVAS ((PAGE_SIZE - sizeof(struct hv_tlb_flush)) / sizeof(uint64_t))
+
 int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
                      unsigned int flags)
 {
-    return -EOPNOTSUPP;
+    unsigned long irq_flags;
+    struct hv_tlb_flush *flush = this_cpu(hv_input_page);
+    unsigned int order = flags & FLUSH_ORDER_MASK;
+    uint64_t ret;
+
+    if ( !flush || cpumask_empty(mask) )
+    {
+        ASSERT_UNREACHABLE();
+        return -EINVAL;
+    }
+
+    local_irq_save(irq_flags);
+
+    flush->address_space = 0;
+    flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+    flush->processor_mask = 0;
+    if ( !(flags & FLUSH_TLB_GLOBAL) )
+        flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
+
+    if ( cpumask_equal(mask, &cpu_online_map) )
+        flush->flags |= HV_FLUSH_ALL_PROCESSORS;
+    else
+    {
+        unsigned int cpu;
+
+        /*
+         * Normally VP indices are in ascending order and match Xen's
+         * idea of CPU ids. Check the last index to see if VP index is
+         * >= 64. If so, we can skip setting up parameters for
+         * non-applicable hypercalls without looking further.
+         */
+        if ( hv_vp_index(cpumask_last(mask)) >= 64 )
+            goto do_ex_hypercall;
+
+        for_each_cpu ( cpu, mask )
+        {
+            unsigned int vpid = hv_vp_index(cpu);
+
+            if ( vpid >= ms_hyperv.max_vp_index )
+            {
+                local_irq_restore(irq_flags);
+                return -ENXIO;
+            }
+
+            if ( vpid >= 64 )
+                goto do_ex_hypercall;
+
+            __set_bit(vpid, &flush->processor_mask);
+        }
+    }
+
+    /*
+     * Flush the entire address space if va is NULL or if there is not
+     * enough space for gva_list.
+     */
+    if ( !va || (PAGE_SIZE << order) / HV_TLB_FLUSH_UNIT > MAX_GVAS )
+        ret = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
+                              virt_to_maddr(flush), 0);
+    else
+        ret = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST,
+                                  fill_gva_list(flush->gva_list, va, order),
+                                  0, virt_to_maddr(flush), 0);
+    goto done;
+
+ do_ex_hypercall:
+    ret = flush_tlb_ex(mask, va, flags);
+
+ done:
+    local_irq_restore(irq_flags);
+
+    return ret & HV_HYPERCALL_RESULT_MASK ? -ENXIO : 0;
 }
 
+#undef MAX_GVAS
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/guest/hyperv/util.c b/xen/arch/x86/guest/hyperv/util.c
new file mode 100644
index 0000000000..0abb37b05f
--- /dev/null
+++ b/xen/arch/x86/guest/hyperv/util.c
@@ -0,0 +1,74 @@
+/******************************************************************************
+ * arch/x86/guest/hyperv/util.c
+ *
+ * Hyper-V utility functions
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2020 Microsoft.
+ */
+
+#include <xen/cpu.h>
+#include <xen/cpumask.h>
+#include <xen/errno.h>
+
+#include <asm/guest/hyperv.h>
+#include <asm/guest/hyperv-tlfs.h>
+
+#include "private.h"
+
+int cpumask_to_vpset(struct hv_vpset *vpset,
+                     const cpumask_t *mask)
+{
+    int nr = 1;
+    unsigned int cpu, vcpu_bank, vcpu_offset;
+    unsigned int max_banks = ms_hyperv.max_vp_index / 64;
+
+    /* Up to 64 banks can be represented by valid_bank_mask */
+    if ( max_banks > 64 )
+        return -E2BIG;
+
+    /* Clear all banks to avoid flushing unwanted CPUs */
+    for ( vcpu_bank = 0; vcpu_bank < max_banks; vcpu_bank++ )
+        vpset->bank_contents[vcpu_bank] = 0;
+
+    vpset->valid_bank_mask = 0;
+    vpset->format = HV_GENERIC_SET_SPARSE_4K;
+
+    for_each_cpu ( cpu, mask )
+    {
+        unsigned int vcpu = hv_vp_index(cpu);
+
+        vcpu_bank = vcpu / 64;
+        vcpu_offset = vcpu % 64;
+
+        __set_bit(vcpu_offset, &vpset->bank_contents[vcpu_bank]);
+        __set_bit(vcpu_bank, &vpset->valid_bank_mask);
+
+        if ( vcpu_bank >= nr )
+            nr = vcpu_bank + 1;
+    }
+
+    return nr;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.20.1




* Re: [Xen-devel] [PATCH v3 2/3] x86/hyperv: skeleton for L0 assisted TLB flush
  2020-02-17 13:55 ` [Xen-devel] [PATCH v3 2/3] x86/hyperv: skeleton for L0 assisted TLB flush Wei Liu
@ 2020-02-17 14:06   ` Durrant, Paul
  0 siblings, 0 replies; 10+ messages in thread
From: Durrant, Paul @ 2020-02-17 14:06 UTC (permalink / raw)
  To: Wei Liu, Xen Development List
  Cc: Andrew Cooper, Roger Pau Monné,
	Wei Liu, Jan Beulich, Michael Kelley

> -----Original Message-----
> From: Wei Liu <wei.liu.xen@gmail.com> On Behalf Of Wei Liu
> Sent: 17 February 2020 13:55
> To: Xen Development List <xen-devel@lists.xenproject.org>
> Cc: Michael Kelley <mikelley@microsoft.com>; Durrant, Paul
> <pdurrant@amazon.co.uk>; Wei Liu <liuwe@microsoft.com>; Roger Pau Monné
> <roger.pau@citrix.com>; Wei Liu <wl@xen.org>; Jan Beulich
> <jbeulich@suse.com>; Andrew Cooper <andrew.cooper3@citrix.com>
> Subject: [PATCH v3 2/3] x86/hyperv: skeleton for L0 assisted TLB flush
> 
> Implement a basic hook for L0 assisted TLB flush. The hook needs to
> check if prerequisites are met. If they are not met, it returns an error
> number to fall back to native flushes.
> 
> Introduce a new variable to indicate if hypercall page is ready.
> 
> Signed-off-by: Wei Liu <liuwe@microsoft.com>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Paul Durrant <pdurrant@amazon.com>



* Re: [Xen-devel] [PATCH v3 3/3] x86/hyperv: L0 assisted TLB flush
  2020-02-17 13:55 ` [Xen-devel] [PATCH v3 3/3] x86/hyperv: " Wei Liu
@ 2020-02-17 17:34   ` Roger Pau Monné
  2020-02-18 10:40     ` Wei Liu
  2020-02-17 19:51   ` Michael Kelley
  1 sibling, 1 reply; 10+ messages in thread
From: Roger Pau Monné @ 2020-02-17 17:34 UTC (permalink / raw)
  To: Wei Liu
  Cc: Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Xen Development List

On Mon, Feb 17, 2020 at 01:55:17PM +0000, Wei Liu wrote:
> Implement L0 assisted TLB flush for Xen on Hyper-V. It takes advantage
> of several hypercalls:
> 
>  * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST
>  * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX
>  * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE
>  * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX
> 
> Pick the most efficient hypercalls available.
> 
> Signed-off-by: Wei Liu <liuwe@microsoft.com>

Just two comments below.

> ---
> v3:
> 1. Address more comments.
> 2. Fix usage of max_vp_index.
> 3. Use the fill_gva_list algorithm from Linux.
> 
> v2:
> 1. Address Roger and Jan's comments re types etc.
> 2. Fix pointer arithmetic.
> 3. Misc improvement to code.
> ---
>  xen/arch/x86/guest/hyperv/Makefile  |   1 +
>  xen/arch/x86/guest/hyperv/private.h |   9 ++
>  xen/arch/x86/guest/hyperv/tlb.c     | 173 +++++++++++++++++++++++++++-
>  xen/arch/x86/guest/hyperv/util.c    |  74 ++++++++++++
>  4 files changed, 256 insertions(+), 1 deletion(-)
>  create mode 100644 xen/arch/x86/guest/hyperv/util.c
> 
> diff --git a/xen/arch/x86/guest/hyperv/Makefile b/xen/arch/x86/guest/hyperv/Makefile
> index 18902c33e9..0e39410968 100644
> --- a/xen/arch/x86/guest/hyperv/Makefile
> +++ b/xen/arch/x86/guest/hyperv/Makefile
> @@ -1,2 +1,3 @@
>  obj-y += hyperv.o
>  obj-y += tlb.o
> +obj-y += util.o
> diff --git a/xen/arch/x86/guest/hyperv/private.h b/xen/arch/x86/guest/hyperv/private.h
> index 509bedaafa..79a77930a0 100644
> --- a/xen/arch/x86/guest/hyperv/private.h
> +++ b/xen/arch/x86/guest/hyperv/private.h
> @@ -24,12 +24,21 @@
>  
>  #include <xen/cpumask.h>
>  #include <xen/percpu.h>
> +#include <xen/types.h>

Do you still need to include types.h?

None of the additions to this header done in this patch seems to
require it AFAICT.

>  
>  DECLARE_PER_CPU(void *, hv_input_page);
>  DECLARE_PER_CPU(void *, hv_vp_assist);
>  DECLARE_PER_CPU(unsigned int, hv_vp_index);
>  
> +static inline unsigned int hv_vp_index(unsigned int cpu)
> +{
> +    return per_cpu(hv_vp_index, cpu);
> +}
> +
>  int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
>                       unsigned int flags);
>  
> +/* Returns number of banks, -ev if error */
> +int cpumask_to_vpset(struct hv_vpset *vpset, const cpumask_t *mask);
> +
>  #endif /* __XEN_HYPERV_PRIVIATE_H__  */
> diff --git a/xen/arch/x86/guest/hyperv/tlb.c b/xen/arch/x86/guest/hyperv/tlb.c
> index 48f527229e..8cd1c6f0ed 100644
> --- a/xen/arch/x86/guest/hyperv/tlb.c
> +++ b/xen/arch/x86/guest/hyperv/tlb.c
> @@ -19,17 +19,188 @@
>   * Copyright (c) 2020 Microsoft.
>   */
>  
> +#include <xen/cpu.h>
>  #include <xen/cpumask.h>
>  #include <xen/errno.h>
>  
> +#include <asm/guest/hyperv.h>
> +#include <asm/guest/hyperv-hcall.h>
> +#include <asm/guest/hyperv-tlfs.h>
> +
>  #include "private.h"
>  
> +/*
> + * It is possible to encode up to 4096 pages using the lower 12 bits
> + * in an element of gva_list
> + */
> +#define HV_TLB_FLUSH_UNIT (4096 * PAGE_SIZE)
> +
> +static unsigned int fill_gva_list(uint64_t *gva_list, const void *va,
> +                                  unsigned int order)
> +{
> +    unsigned long cur = (unsigned long)va;
> +    /* end is 1 past the range to be flushed */
> +    unsigned long end = cur + (PAGE_SIZE << order);
> +    unsigned int n = 0;
> +
> +    do {
> +        unsigned long diff = end - cur;
> +
> +        gva_list[n] = cur & PAGE_MASK;
> +
> +        /*
> +         * Use lower 12 bits to encode the number of additional pages
> +         * to flush
> +         */
> +        if ( diff >= HV_TLB_FLUSH_UNIT )
> +        {
> +            gva_list[n] |= ~PAGE_MASK;
> +            cur += HV_TLB_FLUSH_UNIT;
> +        }
> +        else
> +        {
> +            gva_list[n] |= (diff - 1) >> PAGE_SHIFT;
> +            cur = end;
> +        }
> +
> +        n++;
> +    } while ( cur < end );
> +
> +    return n;
> +}
> +
> +static uint64_t flush_tlb_ex(const cpumask_t *mask, const void *va,
> +                             unsigned int flags)
> +{
> +    struct hv_tlb_flush_ex *flush = this_cpu(hv_input_page);
> +    int nr_banks;
> +    unsigned int max_gvas, order = flags & FLUSH_ORDER_MASK;
> +    uint64_t *gva_list;
> +
> +    if ( !flush || local_irq_is_enabled() )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return ~0ULL;
> +    }
> +
> +    if ( !(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED) )
> +        return ~0ULL;
> +
> +    flush->address_space = 0;
> +    flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> +    if ( !(flags & FLUSH_TLB_GLOBAL) )
> +        flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
> +
> +    nr_banks = cpumask_to_vpset(&flush->hv_vp_set, mask);
> +    if ( nr_banks < 0 )
> +        return ~0ULL;
> +
> +    max_gvas =
> +        (PAGE_SIZE - sizeof(*flush) - nr_banks *
> +         sizeof(flush->hv_vp_set.bank_contents[0])) /
> +        sizeof(uint64_t);       /* gva is represented as uint64_t */
> +
> +    /*
> +     * Flush the entire address space if va is NULL or if there is not
> +     * enough space for gva_list.
> +     */
> +    if ( !va || (PAGE_SIZE << order) / HV_TLB_FLUSH_UNIT > max_gvas )
> +        return hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX, 0,
> +                                   nr_banks, virt_to_maddr(flush), 0);
> +
> +    /*
> +     * The calculation of gva_list address requires the structure to
> +     * be 64 bits aligned.
> +     */
> +    BUILD_BUG_ON(sizeof(*flush) % sizeof(uint64_t));
> +    gva_list = (uint64_t *)flush + sizeof(*flush) / sizeof(uint64_t) + nr_banks;
> +
> +    return hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX,
> +                               fill_gva_list(gva_list, va, order),
> +                               nr_banks, virt_to_maddr(flush), 0);
> +}
> +
> +/* Maximum number of gvas for hv_tlb_flush */
> +#define MAX_GVAS ((PAGE_SIZE - sizeof(struct hv_tlb_flush)) / sizeof(uint64_t))
> +
>  int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
>                       unsigned int flags)
>  {
> -    return -EOPNOTSUPP;
> +    unsigned long irq_flags;
> +    struct hv_tlb_flush *flush = this_cpu(hv_input_page);
> +    unsigned int order = flags & FLUSH_ORDER_MASK;

I think you need a - 1 here, as FLUSH_ORDER(x) is defined as ((x)+1).
So if a user has specified order 0 here you would get order 1 instead.

unsigned int order = (flags - 1) & FLUSH_ORDER_MASK;

Sorry for not noticing this earlier.

> +    uint64_t ret;
> +
> +    if ( !flush || cpumask_empty(mask) )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return -EINVAL;
> +    }
> +
> +    local_irq_save(irq_flags);

I think you disable interrupts in order to prevent re-entering this
function, and hence avoid an interrupt from triggering in the middle
and also attempting to do a TLB flush using the same per-CPU input
page.

As pointed out to me by Jan, we can also get #MC and #NMI which will
still happen despite interrupts being disabled, and hence you might
want to assert that you are not in #MC or #NMI context before
accessing the per-CPU hv_input_page (or else just return an error
and avoid using the assisted flush). I have a patch that will
hopefully be able to signal when in #MC or #NMI context.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xen-devel] [PATCH v3 3/3] x86/hyperv: L0 assisted TLB flush
  2020-02-17 13:55 ` [Xen-devel] [PATCH v3 3/3] x86/hyperv: " Wei Liu
  2020-02-17 17:34   ` Roger Pau Monné
@ 2020-02-17 19:51   ` Michael Kelley
  2020-02-17 23:05     ` Wei Liu
  1 sibling, 1 reply; 10+ messages in thread
From: Michael Kelley @ 2020-02-17 19:51 UTC (permalink / raw)
  To: Wei Liu, Xen Development List
  Cc: Andrew Cooper, Paul Durrant, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu.xen@gmail.com> On Behalf Of Wei Liu

[snip]

> diff --git a/xen/arch/x86/guest/hyperv/util.c b/xen/arch/x86/guest/hyperv/util.c
> new file mode 100644
> index 0000000000..0abb37b05f
> --- /dev/null
> +++ b/xen/arch/x86/guest/hyperv/util.c
> @@ -0,0 +1,74 @@
> +/**************************************************************************
> ****
> + * arch/x86/guest/hyperv/util.c
> + *
> + * Hyper-V utility functions
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see https://www.gnu.org/licenses/.
> + *
> + * Copyright (c) 2020 Microsoft.
> + */
> +
> +#include <xen/cpu.h>
> +#include <xen/cpumask.h>
> +#include <xen/errno.h>
> +
> +#include <asm/guest/hyperv.h>
> +#include <asm/guest/hyperv-tlfs.h>
> +
> +#include "private.h"
> +
> +int cpumask_to_vpset(struct hv_vpset *vpset,
> +                     const cpumask_t *mask)
> +{
> +    int nr = 1;
> +    unsigned int cpu, vcpu_bank, vcpu_offset;
> +    unsigned int max_banks = ms_hyperv.max_vp_index / 64;
> +
> +    /* Up to 64 banks can be represented by valid_bank_mask */
> +    if ( max_banks > 64 )
> +        return -E2BIG;
> +
> +    /* Clear all banks to avoid flushing unwanted CPUs */
> +    for ( vcpu_bank = 0; vcpu_bank < max_banks; vcpu_bank++ )
> +        vpset->bank_contents[vcpu_bank] = 0;
> +
> +    vpset->valid_bank_mask = 0;
> +    vpset->format = HV_GENERIC_SET_SPARSE_4K;
> +
> +    for_each_cpu ( cpu, mask )
> +    {
> +        unsigned int vcpu = hv_vp_index(cpu);
> +
> +        vcpu_bank = vcpu / 64;
> +        vcpu_offset = vcpu % 64;
> +
> +        __set_bit(vcpu_offset, &vpset->bank_contents[vcpu_bank]);
> +        __set_bit(vcpu_bank, &vpset->valid_bank_mask);

This approach to setting the bits in the valid_bank_mask causes a bug.
If an entire 64-bit word in the bank_contents array is zero because there
are no CPUs in that range, the corresponding bit in valid_bank_mask still
must be set to tell Hyper-V that the 64-bit word is present in the array
and should be processed, even though the content is zero.  A zero bit
in valid_bank_mask indicates that the corresponding 64-bit word in the
array is not present, and every 64-bit word above it has been shifted down.
That's why the similar Linux function sets valid_bank_mask the way that
it does.

Michael

> +
> +        if ( vcpu_bank >= nr )
> +            nr = vcpu_bank + 1;
> +    }
> +
> +    return nr;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> --
> 2.20.1



* Re: [Xen-devel] [PATCH v3 3/3] x86/hyperv: L0 assisted TLB flush
  2020-02-17 19:51   ` Michael Kelley
@ 2020-02-17 23:05     ` Wei Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Wei Liu @ 2020-02-17 23:05 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Jan Beulich,
	Xen Development List, Roger Pau Monné

On Mon, Feb 17, 2020 at 07:51:42PM +0000, Michael Kelley wrote:
> From: Wei Liu <wei.liu.xen@gmail.com> On Behalf Of Wei Liu
> 
> [snip]
> 
> > diff --git a/xen/arch/x86/guest/hyperv/util.c b/xen/arch/x86/guest/hyperv/util.c
> > new file mode 100644
> > index 0000000000..0abb37b05f
> > --- /dev/null
> > +++ b/xen/arch/x86/guest/hyperv/util.c
> > @@ -0,0 +1,74 @@
> > +/**************************************************************************
> > ****
> > + * arch/x86/guest/hyperv/util.c
> > + *
> > + * Hyper-V utility functions
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; If not, see https://www.gnu.org/licenses/.
> > + *
> > + * Copyright (c) 2020 Microsoft.
> > + */
> > +
> > +#include <xen/cpu.h>
> > +#include <xen/cpumask.h>
> > +#include <xen/errno.h>
> > +
> > +#include <asm/guest/hyperv.h>
> > +#include <asm/guest/hyperv-tlfs.h>
> > +
> > +#include "private.h"
> > +
> > +int cpumask_to_vpset(struct hv_vpset *vpset,
> > +                     const cpumask_t *mask)
> > +{
> > +    int nr = 1;
> > +    unsigned int cpu, vcpu_bank, vcpu_offset;
> > +    unsigned int max_banks = ms_hyperv.max_vp_index / 64;
> > +
> > +    /* Up to 64 banks can be represented by valid_bank_mask */
> > +    if ( max_banks > 64 )
> > +        return -E2BIG;
> > +
> > +    /* Clear all banks to avoid flushing unwanted CPUs */
> > +    for ( vcpu_bank = 0; vcpu_bank < max_banks; vcpu_bank++ )
> > +        vpset->bank_contents[vcpu_bank] = 0;
> > +
> > +    vpset->valid_bank_mask = 0;
> > +    vpset->format = HV_GENERIC_SET_SPARSE_4K;
> > +
> > +    for_each_cpu ( cpu, mask )
> > +    {
> > +        unsigned int vcpu = hv_vp_index(cpu);
> > +
> > +        vcpu_bank = vcpu / 64;
> > +        vcpu_offset = vcpu % 64;
> > +
> > +        __set_bit(vcpu_offset, &vpset->bank_contents[vcpu_bank]);
> > +        __set_bit(vcpu_bank, &vpset->valid_bank_mask);
> 
> This approach to setting the bits in the valid_bank_mask causes a bug.
> If an entire 64-bit word in the bank_contents array is zero because there
> are no CPUs in that range, the corresponding bit in valid_bank_mask still
> must be set to tell Hyper-V that the 64-bit word is present in the array
> and should be processed, even though the content is zero.  A zero bit
> in valid_bank_mask indicates that the corresponding 64-bit word in the
> array is not present, and every 64-bit word above it has been shifted down.
> That's why the similar Linux function sets valid_bank_mask the way that
> it does.

Oh, interesting. The TLFS isn't explicit about this "shift-down"
property. I thought it presented the example as a compact array simply
to avoid writing out dozens of zeros.

I will fix this in the next version.

Wei.

> 
> Michael
> 
> > +
> > +        if ( vcpu_bank >= nr )
> > +            nr = vcpu_bank + 1;
> > +    }
> > +
> > +    return nr;
> > +}
> > +
> > +/*
> > + * Local variables:
> > + * mode: C
> > + * c-file-style: "BSD"
> > + * c-basic-offset: 4
> > + * tab-width: 4
> > + * indent-tabs-mode: nil
> > + * End:
> > + */
> > --
> > 2.20.1
> 


* Re: [Xen-devel] [PATCH v3 3/3] x86/hyperv: L0 assisted TLB flush
  2020-02-17 17:34   ` Roger Pau Monné
@ 2020-02-18 10:40     ` Wei Liu
  2020-02-18 12:52       ` Wei Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Wei Liu @ 2020-02-18 10:40 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Xen Development List

On Mon, Feb 17, 2020 at 06:34:12PM +0100, Roger Pau Monné wrote:
> On Mon, Feb 17, 2020 at 01:55:17PM +0000, Wei Liu wrote:
> > Implement L0 assisted TLB flush for Xen on Hyper-V. It takes advantage
> > of several hypercalls:
> > 
> >  * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST
> >  * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX
> >  * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE
> >  * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX
> > 
> > Pick the most efficient hypercalls available.
> > 
> > Signed-off-by: Wei Liu <liuwe@microsoft.com>
> 
> Just two comments below.
> 
> > ---
> > v3:
> > 1. Address more comments.
> > 2. Fix usage of max_vp_index.
> > 3. Use the fill_gva_list algorithm from Linux.
> > 
> > v2:
> > 1. Address Roger and Jan's comments re types etc.
> > 2. Fix pointer arithmetic.
> > 3. Misc improvement to code.
> > ---
> >  xen/arch/x86/guest/hyperv/Makefile  |   1 +
> >  xen/arch/x86/guest/hyperv/private.h |   9 ++
> >  xen/arch/x86/guest/hyperv/tlb.c     | 173 +++++++++++++++++++++++++++-
> >  xen/arch/x86/guest/hyperv/util.c    |  74 ++++++++++++
> >  4 files changed, 256 insertions(+), 1 deletion(-)
> >  create mode 100644 xen/arch/x86/guest/hyperv/util.c
> > 
> > diff --git a/xen/arch/x86/guest/hyperv/Makefile b/xen/arch/x86/guest/hyperv/Makefile
> > index 18902c33e9..0e39410968 100644
> > --- a/xen/arch/x86/guest/hyperv/Makefile
> > +++ b/xen/arch/x86/guest/hyperv/Makefile
> > @@ -1,2 +1,3 @@
> >  obj-y += hyperv.o
> >  obj-y += tlb.o
> > +obj-y += util.o
> > diff --git a/xen/arch/x86/guest/hyperv/private.h b/xen/arch/x86/guest/hyperv/private.h
> > index 509bedaafa..79a77930a0 100644
> > --- a/xen/arch/x86/guest/hyperv/private.h
> > +++ b/xen/arch/x86/guest/hyperv/private.h
> > @@ -24,12 +24,21 @@
> >  
> >  #include <xen/cpumask.h>
> >  #include <xen/percpu.h>
> > +#include <xen/types.h>
> 
> Do you still need to include types.h?
> 

Not anymore.

> None of the additions to this header done in this patch seems to
> require it AFAICT.
> 
> >  
> >  DECLARE_PER_CPU(void *, hv_input_page);
> >  DECLARE_PER_CPU(void *, hv_vp_assist);
> >  DECLARE_PER_CPU(unsigned int, hv_vp_index);
> >  
> > +static inline unsigned int hv_vp_index(unsigned int cpu)
> > +{
> > +    return per_cpu(hv_vp_index, cpu);
> > +}
> > +
> >  int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
> >                       unsigned int flags);
> >  
> > +/* Returns number of banks, -ev if error */
> > +int cpumask_to_vpset(struct hv_vpset *vpset, const cpumask_t *mask);
> > +
> >  #endif /* __XEN_HYPERV_PRIVIATE_H__  */
> > diff --git a/xen/arch/x86/guest/hyperv/tlb.c b/xen/arch/x86/guest/hyperv/tlb.c
> > index 48f527229e..8cd1c6f0ed 100644
> > --- a/xen/arch/x86/guest/hyperv/tlb.c
> > +++ b/xen/arch/x86/guest/hyperv/tlb.c
> > @@ -19,17 +19,188 @@
> >   * Copyright (c) 2020 Microsoft.
> >   */
> >  
> > +#include <xen/cpu.h>
> >  #include <xen/cpumask.h>
> >  #include <xen/errno.h>
> >  
> > +#include <asm/guest/hyperv.h>
> > +#include <asm/guest/hyperv-hcall.h>
> > +#include <asm/guest/hyperv-tlfs.h>
> > +
> >  #include "private.h"
> >  
> > +/*
> > + * It is possible to encode up to 4096 pages using the lower 12 bits
> > + * in an element of gva_list
> > + */
> > +#define HV_TLB_FLUSH_UNIT (4096 * PAGE_SIZE)
> > +
> > +static unsigned int fill_gva_list(uint64_t *gva_list, const void *va,
> > +                                  unsigned int order)
> > +{
> > +    unsigned long cur = (unsigned long)va;
> > +    /* end is 1 past the range to be flushed */
> > +    unsigned long end = cur + (PAGE_SIZE << order);
> > +    unsigned int n = 0;
> > +
> > +    do {
> > +        unsigned long diff = end - cur;
> > +
> > +        gva_list[n] = cur & PAGE_MASK;
> > +
> > +        /*
> > +         * Use lower 12 bits to encode the number of additional pages
> > +         * to flush
> > +         */
> > +        if ( diff >= HV_TLB_FLUSH_UNIT )
> > +        {
> > +            gva_list[n] |= ~PAGE_MASK;
> > +            cur += HV_TLB_FLUSH_UNIT;
> > +        }
> > +        else
> > +        {
> > +            gva_list[n] |= (diff - 1) >> PAGE_SHIFT;
> > +            cur = end;
> > +        }
> > +
> > +        n++;
> > +    } while ( cur < end );
> > +
> > +    return n;
> > +}
> > +
> > +static uint64_t flush_tlb_ex(const cpumask_t *mask, const void *va,
> > +                             unsigned int flags)
> > +{
> > +    struct hv_tlb_flush_ex *flush = this_cpu(hv_input_page);
> > +    int nr_banks;
> > +    unsigned int max_gvas, order = flags & FLUSH_ORDER_MASK;
> > +    uint64_t *gva_list;
> > +
> > +    if ( !flush || local_irq_is_enabled() )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return ~0ULL;
> > +    }
> > +
> > +    if ( !(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED) )
> > +        return ~0ULL;
> > +
> > +    flush->address_space = 0;
> > +    flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> > +    if ( !(flags & FLUSH_TLB_GLOBAL) )
> > +        flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
> > +
> > +    nr_banks = cpumask_to_vpset(&flush->hv_vp_set, mask);
> > +    if ( nr_banks < 0 )
> > +        return ~0ULL;
> > +
> > +    max_gvas =
> > +        (PAGE_SIZE - sizeof(*flush) - nr_banks *
> > +         sizeof(flush->hv_vp_set.bank_contents[0])) /
> > +        sizeof(uint64_t);       /* gva is represented as uint64_t */
> > +
> > +    /*
> > +     * Flush the entire address space if va is NULL or if there is not
> > +     * enough space for gva_list.
> > +     */
> > +    if ( !va || (PAGE_SIZE << order) / HV_TLB_FLUSH_UNIT > max_gvas )
> > +        return hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX, 0,
> > +                                   nr_banks, virt_to_maddr(flush), 0);
> > +
> > +    /*
> > +     * The calculation of gva_list address requires the structure to
> > +     * be 64 bits aligned.
> > +     */
> > +    BUILD_BUG_ON(sizeof(*flush) % sizeof(uint64_t));
> > +    gva_list = (uint64_t *)flush + sizeof(*flush) / sizeof(uint64_t) + nr_banks;
> > +
> > +    return hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX,
> > +                               fill_gva_list(gva_list, va, order),
> > +                               nr_banks, virt_to_maddr(flush), 0);
> > +}
> > +
> > +/* Maximum number of gvas for hv_tlb_flush */
> > +#define MAX_GVAS ((PAGE_SIZE - sizeof(struct hv_tlb_flush)) / sizeof(uint64_t))
> > +
> >  int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
> >                       unsigned int flags)
> >  {
> > -    return -EOPNOTSUPP;
> > +    unsigned long irq_flags;
> > +    struct hv_tlb_flush *flush = this_cpu(hv_input_page);
> > +    unsigned int order = flags & FLUSH_ORDER_MASK;
> 
> I think you need a - 1 here, as FLUSH_ORDER(x) is defined as ((x)+1).
> So if a user has specified order 0 here you would get order 1 instead.
> 
> unsigned int order = (flags - 1) & FLUSH_ORDER_MASK;

Yes, indeed. That's what flush_area_local does. I will fix this.

BTW, I think your series also needs fixing, specifically the patch that
introduced the hypervisor_flush_tlb hook. I took the snippet from that
patch directly.
> 
> Sorry for not noticing this earlier.

Thanks for noticing this. :-)

> 
> > +    uint64_t ret;
> > +
> > +    if ( !flush || cpumask_empty(mask) )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return -EINVAL;
> > +    }
> > +
> > +    local_irq_save(irq_flags);
> 
> I think you disable interrupts in order to prevent re-entering this
> function, and hence avoid an interrupt from triggering in the middle
> and also attempting to do a TLB flush using the same per-CPU input
> page.
> 
> As pointed out to me by Jan, we can also get #MC and #NMI which will
> still happen despite interrupts being disabled, and hence you might
> want to assert that you are not in #MC or #NMI context before
> accessing the per-CPU hv_input_page (or else just return an error
> and avoid using the assisted flush). I have a patch that will
> hopefully be able to signal when in #MC or #NMI context.
> 

This function should return an error in that case. It is better to fall
back to the native path than to crash.

Wei.


> Thanks, Roger.


* Re: [Xen-devel] [PATCH v3 3/3] x86/hyperv: L0 assisted TLB flush
  2020-02-18 10:40     ` Wei Liu
@ 2020-02-18 12:52       ` Wei Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Wei Liu @ 2020-02-18 12:52 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Xen Development List

On Tue, Feb 18, 2020 at 10:40:29AM +0000, Wei Liu wrote:
> > 
> > > +    uint64_t ret;
> > > +
> > > +    if ( !flush || cpumask_empty(mask) )
> > > +    {
> > > +        ASSERT_UNREACHABLE();
> > > +        return -EINVAL;
> > > +    }
> > > +
> > > +    local_irq_save(irq_flags);
> > 
> > I think you disable interrupts in order to prevent re-entering this
> > function, and hence avoid an interrupt from triggering in the middle
> > and also attempting to do a TLB flush using the same per-CPU input
> > page.
> > 
> > As pointed out to me by Jan, we can also get #MC and #NMI which will
> > still happen despite interrupts being disabled, and hence you might
> > want to assert that you are not in #MC or #NMI context before
> > accessing the per-CPU hv_input_page (or else just return an error
> > and avoid using the assisted flush). I have a patch that will
> > hopefully be able to signal when in #MC or #NMI context.
> > 
> 
> This function should return an error in that case. It is better to fall
> back to native path than crashing.
> 

I briefly read through the other thread about what is allowed in #NMI or
#MC context. The discussion centred on whether some operations should be
allowed to happen in those contexts in the first place.

For now I will just add a comment in the Hyper-V code. Once that
discussion is resolved, the Hyper-V code can follow suit where applicable.

Wei.


end of thread, other threads:[~2020-02-18 12:52 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-17 13:55 [Xen-devel] [PATCH v3 0/3] Xen on Hyper-V: Implement L0 assisted TLB flush Wei Liu
2020-02-17 13:55 ` [Xen-devel] [PATCH v3 1/3] x86/hypervisor: pass flags to hypervisor_flush_tlb Wei Liu
2020-02-17 13:55 ` [Xen-devel] [PATCH v3 2/3] x86/hyperv: skeleton for L0 assisted TLB flush Wei Liu
2020-02-17 14:06   ` Durrant, Paul
2020-02-17 13:55 ` [Xen-devel] [PATCH v3 3/3] x86/hyperv: " Wei Liu
2020-02-17 17:34   ` Roger Pau Monné
2020-02-18 10:40     ` Wei Liu
2020-02-18 12:52       ` Wei Liu
2020-02-17 19:51   ` Michael Kelley
2020-02-17 23:05     ` Wei Liu
