Xen-Devel Archive on lore.kernel.org
 help / color / Atom feed
* [Xen-devel] [PATCH 0/4] Xen on Hyper-V: Implement L0 assisted TLB flush
@ 2020-02-12 16:09 Wei Liu
  2020-02-12 16:09 ` [Xen-devel] [PATCH 1/4] x86/hyperv: misc cleanup Wei Liu
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Wei Liu @ 2020-02-12 16:09 UTC (permalink / raw)
  To: Xen Development List
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Roger Pau Monné

Hi all

This seris is based on Roger's L0 assisted flush series.

I have done some testing against a Linux on Hyper-V in a 32-vcpu VM.
All builds were done with -j32.



Building Xen on Linux:
real    0m45.376s
user    2m28.156s
sys     0m51.672s

Building Xen on Linux on Xen on Hyper-V, no assisted flush:
real    3m8.762s
user    10m46.787s
sys     30m14.492s

Building Xen on Linux on Xen on Hyper-V, with assisted flush:
real    0m44.369s
user    3m16.231s
sys     3m3.330s



Building Linux x86_64_defconfig on Linux:
real    0m59.698s
user    21m14.014s
sys     2m58.742s

Building Linux x86_64_defconfig on Linux on Xen on Hyper-V, no assisted
flush:
real    2m6.284s
user    31m18.706s
sys     20m31.106s

Building Linux x86_64_defconfig on Linux on Xen on Hyper-V, with assisted
flush:
real    1m38.968s
user    28m40.398s
sys     11m20.151s



There are various degrees of improvement depending on the workload. Xen
can perhaps be optmised a bit more because it currently doesn't pass the
address space id (cr3) to Hyper-V, but that requires reworking TLB flush
APIs within Xen.

Wei.

Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Paul Durrant <pdurrant@amazon.com>

Wei Liu (4):
  x86/hyperv: misc cleanup
  x86/hypervisor: pass flags to hypervisor_flush_tlb
  x86/hyperv: skeleton for L0 assisted TLB flush
  x86/hyperv: L0 assisted TLB flush

 xen/arch/x86/guest/hyperv/Makefile       |   2 +
 xen/arch/x86/guest/hyperv/hyperv.c       |  19 +-
 xen/arch/x86/guest/hyperv/private.h      |  15 +-
 xen/arch/x86/guest/hyperv/tlb.c          | 211 +++++++++++++++++++++++
 xen/arch/x86/guest/hyperv/util.c         |  72 ++++++++
 xen/arch/x86/guest/hypervisor.c          |   4 +-
 xen/arch/x86/guest/xen/xen.c             |   2 +-
 xen/arch/x86/smp.c                       |   2 +-
 xen/include/asm-x86/guest/hyperv-hcall.h |   5 +-
 xen/include/asm-x86/guest/hypervisor.h   |  10 +-
 10 files changed, 329 insertions(+), 13 deletions(-)
 create mode 100644 xen/arch/x86/guest/hyperv/tlb.c
 create mode 100644 xen/arch/x86/guest/hyperv/util.c

-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Xen-devel] [PATCH 1/4] x86/hyperv: misc cleanup
  2020-02-12 16:09 [Xen-devel] [PATCH 0/4] Xen on Hyper-V: Implement L0 assisted TLB flush Wei Liu
@ 2020-02-12 16:09 ` Wei Liu
  2020-02-12 16:53   ` Roger Pau Monné
                     ` (2 more replies)
  2020-02-12 16:09 ` [Xen-devel] [PATCH 2/4] x86/hypervisor: pass flags to hypervisor_flush_tlb Wei Liu
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 23+ messages in thread
From: Wei Liu @ 2020-02-12 16:09 UTC (permalink / raw)
  To: Xen Development List
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Roger Pau Monné

Change hv_vp_index to use uint32_t because that is what is defined in TLFS.

Add "_addr" suffix to hv_do_rep_hypercall's parameters.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
---
 xen/arch/x86/guest/hyperv/hyperv.c       | 2 +-
 xen/arch/x86/guest/hyperv/private.h      | 2 +-
 xen/include/asm-x86/guest/hyperv-hcall.h | 5 +++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/guest/hyperv/hyperv.c b/xen/arch/x86/guest/hyperv/hyperv.c
index 70f4cd5ae0..b7044f7193 100644
--- a/xen/arch/x86/guest/hyperv/hyperv.c
+++ b/xen/arch/x86/guest/hyperv/hyperv.c
@@ -31,7 +31,7 @@
 struct ms_hyperv_info __read_mostly ms_hyperv;
 DEFINE_PER_CPU_READ_MOSTLY(void *, hv_input_page);
 DEFINE_PER_CPU_READ_MOSTLY(void *, hv_vp_assist);
-DEFINE_PER_CPU_READ_MOSTLY(unsigned int, hv_vp_index);
+DEFINE_PER_CPU_READ_MOSTLY(uint32_t, hv_vp_index);
 
 static uint64_t generate_guest_id(void)
 {
diff --git a/xen/arch/x86/guest/hyperv/private.h b/xen/arch/x86/guest/hyperv/private.h
index 956eff831f..eb14ea43e5 100644
--- a/xen/arch/x86/guest/hyperv/private.h
+++ b/xen/arch/x86/guest/hyperv/private.h
@@ -26,6 +26,6 @@
 
 DECLARE_PER_CPU(void *, hv_input_page);
 DECLARE_PER_CPU(void *, hv_vp_assist);
-DECLARE_PER_CPU(unsigned int, hv_vp_index);
+DECLARE_PER_CPU(uint32_t, hv_vp_index);
 
 #endif /* __XEN_HYPERV_PRIVIATE_H__  */
diff --git a/xen/include/asm-x86/guest/hyperv-hcall.h b/xen/include/asm-x86/guest/hyperv-hcall.h
index 4d3b131b3a..3396caccdd 100644
--- a/xen/include/asm-x86/guest/hyperv-hcall.h
+++ b/xen/include/asm-x86/guest/hyperv-hcall.h
@@ -61,7 +61,8 @@ static inline uint64_t hv_do_fast_hypercall(uint16_t code,
 
 static inline uint64_t hv_do_rep_hypercall(uint16_t code, uint16_t rep_count,
                                            uint16_t varhead_size,
-                                           paddr_t input, paddr_t output)
+                                           paddr_t input_addr,
+                                           paddr_t output_addr)
 {
     uint64_t control = code;
     uint64_t status;
@@ -71,7 +72,7 @@ static inline uint64_t hv_do_rep_hypercall(uint16_t code, uint16_t rep_count,
     control |= (uint64_t)rep_count << HV_HYPERCALL_REP_COMP_OFFSET;
 
     do {
-        status = hv_do_hypercall(control, input, output);
+        status = hv_do_hypercall(control, input_addr, output_addr);
         if ( (status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS )
             break;
 
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Xen-devel] [PATCH 2/4] x86/hypervisor: pass flags to hypervisor_flush_tlb
  2020-02-12 16:09 [Xen-devel] [PATCH 0/4] Xen on Hyper-V: Implement L0 assisted TLB flush Wei Liu
  2020-02-12 16:09 ` [Xen-devel] [PATCH 1/4] x86/hyperv: misc cleanup Wei Liu
@ 2020-02-12 16:09 ` Wei Liu
  2020-02-12 17:00   ` Roger Pau Monné
  2020-02-12 16:09 ` [Xen-devel] [PATCH 3/4] x86/hyperv: skeleton for L0 assisted TLB flush Wei Liu
  2020-02-12 16:09 ` [Xen-devel] [PATCH 4/4] x86/hyperv: " Wei Liu
  3 siblings, 1 reply; 23+ messages in thread
From: Wei Liu @ 2020-02-12 16:09 UTC (permalink / raw)
  To: Xen Development List
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Roger Pau Monné

Hyper-V's L0 assisted flush has fine-grained control over what gets
flushed. We need all the flags available to make the best decisions
possible.

No functional change because Xen's implementation doesn't care about
what is passed to it.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
---
 xen/arch/x86/guest/hypervisor.c        |  4 ++--
 xen/arch/x86/guest/xen/xen.c           |  2 +-
 xen/arch/x86/smp.c                     |  2 +-
 xen/include/asm-x86/guest/hypervisor.h | 10 +++++-----
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/guest/hypervisor.c b/xen/arch/x86/guest/hypervisor.c
index 47e938e287..2724fd9bad 100644
--- a/xen/arch/x86/guest/hypervisor.c
+++ b/xen/arch/x86/guest/hypervisor.c
@@ -75,10 +75,10 @@ void __init hypervisor_e820_fixup(struct e820map *e820)
 }
 
 int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
-                         unsigned int order)
+                         unsigned int flags)
 {
     if ( ops.flush_tlb )
-        return alternative_call(ops.flush_tlb, mask, va, order);
+        return alternative_call(ops.flush_tlb, mask, va, flags);
 
     return -ENOSYS;
 }
diff --git a/xen/arch/x86/guest/xen/xen.c b/xen/arch/x86/guest/xen/xen.c
index 5d3427a713..0eb1115c4d 100644
--- a/xen/arch/x86/guest/xen/xen.c
+++ b/xen/arch/x86/guest/xen/xen.c
@@ -324,7 +324,7 @@ static void __init e820_fixup(struct e820map *e820)
         pv_shim_fixup_e820(e820);
 }
 
-static int flush_tlb(const cpumask_t *mask, const void *va, unsigned int order)
+static int flush_tlb(const cpumask_t *mask, const void *va, unsigned int flags)
 {
     return xen_hypercall_hvm_op(HVMOP_flush_tlbs, NULL);
 }
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 9bc925616a..2df21e396a 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -260,7 +260,7 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
         if ( cpu_has_hypervisor &&
              !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
                          FLUSH_ORDER_MASK)) &&
-             !hypervisor_flush_tlb(mask, va, flags & FLUSH_ORDER_MASK) )
+             !hypervisor_flush_tlb(mask, va, flags) )
         {
             if ( tlb_clk_enabled )
                 tlb_clk_enabled = false;
diff --git a/xen/include/asm-x86/guest/hypervisor.h b/xen/include/asm-x86/guest/hypervisor.h
index 432e57c2a0..48d54735d2 100644
--- a/xen/include/asm-x86/guest/hypervisor.h
+++ b/xen/include/asm-x86/guest/hypervisor.h
@@ -35,7 +35,7 @@ struct hypervisor_ops {
     /* Fix up e820 map */
     void (*e820_fixup)(struct e820map *e820);
     /* L0 assisted TLB flush */
-    int (*flush_tlb)(const cpumask_t *mask, const void *va, unsigned int order);
+    int (*flush_tlb)(const cpumask_t *mask, const void *va, unsigned int flags);
 };
 
 #ifdef CONFIG_GUEST
@@ -48,11 +48,11 @@ void hypervisor_e820_fixup(struct e820map *e820);
 /*
  * L0 assisted TLB flush.
  * mask: cpumask of the dirty vCPUs that should be flushed.
- * va: linear address to flush, or NULL for global flushes.
- * order: order of the linear address pointed by va.
+ * va: linear address to flush, or NULL for entire address space.
+ * flags: flags for flushing, including the order of va.
  */
 int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
-                         unsigned int order);
+                         unsigned int flags);
 
 #else
 
@@ -65,7 +65,7 @@ static inline int hypervisor_ap_setup(void) { return 0; }
 static inline void hypervisor_resume(void) { ASSERT_UNREACHABLE(); }
 static inline void hypervisor_e820_fixup(struct e820map *e820) {}
 static inline int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
-                                       unsigned int order)
+                                       unsigned int flags)
 {
     return -ENOSYS;
 }
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Xen-devel] [PATCH 3/4] x86/hyperv: skeleton for L0 assisted TLB flush
  2020-02-12 16:09 [Xen-devel] [PATCH 0/4] Xen on Hyper-V: Implement L0 assisted TLB flush Wei Liu
  2020-02-12 16:09 ` [Xen-devel] [PATCH 1/4] x86/hyperv: misc cleanup Wei Liu
  2020-02-12 16:09 ` [Xen-devel] [PATCH 2/4] x86/hypervisor: pass flags to hypervisor_flush_tlb Wei Liu
@ 2020-02-12 16:09 ` Wei Liu
  2020-02-12 17:09   ` Roger Pau Monné
  2020-02-12 16:09 ` [Xen-devel] [PATCH 4/4] x86/hyperv: " Wei Liu
  3 siblings, 1 reply; 23+ messages in thread
From: Wei Liu @ 2020-02-12 16:09 UTC (permalink / raw)
  To: Xen Development List
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Roger Pau Monné

Implement a basic hook for L0 assisted TLB flush. The hook needs to
check if prerequisites are met. If they are not met, it returns an error
number to fall back to native flushes.

Introduce a new variable to indicate if hypercall page is ready.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
---
 xen/arch/x86/guest/hyperv/Makefile  |  1 +
 xen/arch/x86/guest/hyperv/hyperv.c  | 17 ++++++++++++
 xen/arch/x86/guest/hyperv/private.h |  4 +++
 xen/arch/x86/guest/hyperv/tlb.c     | 41 +++++++++++++++++++++++++++++
 4 files changed, 63 insertions(+)
 create mode 100644 xen/arch/x86/guest/hyperv/tlb.c

diff --git a/xen/arch/x86/guest/hyperv/Makefile b/xen/arch/x86/guest/hyperv/Makefile
index 68170109a9..18902c33e9 100644
--- a/xen/arch/x86/guest/hyperv/Makefile
+++ b/xen/arch/x86/guest/hyperv/Makefile
@@ -1 +1,2 @@
 obj-y += hyperv.o
+obj-y += tlb.o
diff --git a/xen/arch/x86/guest/hyperv/hyperv.c b/xen/arch/x86/guest/hyperv/hyperv.c
index b7044f7193..1cdc88e27c 100644
--- a/xen/arch/x86/guest/hyperv/hyperv.c
+++ b/xen/arch/x86/guest/hyperv/hyperv.c
@@ -33,6 +33,8 @@ DEFINE_PER_CPU_READ_MOSTLY(void *, hv_input_page);
 DEFINE_PER_CPU_READ_MOSTLY(void *, hv_vp_assist);
 DEFINE_PER_CPU_READ_MOSTLY(uint32_t, hv_vp_index);
 
+static bool __read_mostly hv_hcall_page_ready;
+
 static uint64_t generate_guest_id(void)
 {
     union hv_guest_os_id id = {};
@@ -119,6 +121,8 @@ static void __init setup_hypercall_page(void)
     BUG_ON(!hypercall_msr.enable);
 
     set_fixmap_x(FIX_X_HYPERV_HCALL, mfn << PAGE_SHIFT);
+
+    hv_hcall_page_ready = true;
 }
 
 static int setup_hypercall_pcpu_arg(void)
@@ -199,11 +203,24 @@ static void __init e820_fixup(struct e820map *e820)
         panic("Unable to reserve Hyper-V hypercall range\n");
 }
 
+static int flush_tlb(const cpumask_t *mask, const void *va,
+                     unsigned int flags)
+{
+    if ( !(ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED) )
+        return -EOPNOTSUPP;
+
+    if ( !hv_hcall_page_ready || !this_cpu(hv_input_page) )
+        return -ENXIO;
+
+    return hyperv_flush_tlb(mask, va, flags);
+}
+
 static const struct hypervisor_ops __initdata ops = {
     .name = "Hyper-V",
     .setup = setup,
     .ap_setup = ap_setup,
     .e820_fixup = e820_fixup,
+    .flush_tlb = flush_tlb,
 };
 
 /*
diff --git a/xen/arch/x86/guest/hyperv/private.h b/xen/arch/x86/guest/hyperv/private.h
index eb14ea43e5..78e52f74ce 100644
--- a/xen/arch/x86/guest/hyperv/private.h
+++ b/xen/arch/x86/guest/hyperv/private.h
@@ -22,10 +22,14 @@
 #ifndef __XEN_HYPERV_PRIVIATE_H__
 #define __XEN_HYPERV_PRIVIATE_H__
 
+#include <xen/cpumask.h>
 #include <xen/percpu.h>
 
 DECLARE_PER_CPU(void *, hv_input_page);
 DECLARE_PER_CPU(void *, hv_vp_assist);
 DECLARE_PER_CPU(uint32_t, hv_vp_index);
 
+int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
+                     unsigned int flags);
+
 #endif /* __XEN_HYPERV_PRIVIATE_H__  */
diff --git a/xen/arch/x86/guest/hyperv/tlb.c b/xen/arch/x86/guest/hyperv/tlb.c
new file mode 100644
index 0000000000..48f527229e
--- /dev/null
+++ b/xen/arch/x86/guest/hyperv/tlb.c
@@ -0,0 +1,41 @@
+/******************************************************************************
+ * arch/x86/guest/hyperv/tlb.c
+ *
+ * Support for TLB management using hypercalls
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2020 Microsoft.
+ */
+
+#include <xen/cpumask.h>
+#include <xen/errno.h>
+
+#include "private.h"
+
+int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
+                     unsigned int flags)
+{
+    return -EOPNOTSUPP;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Xen-devel] [PATCH 4/4] x86/hyperv: L0 assisted TLB flush
  2020-02-12 16:09 [Xen-devel] [PATCH 0/4] Xen on Hyper-V: Implement L0 assisted TLB flush Wei Liu
                   ` (2 preceding siblings ...)
  2020-02-12 16:09 ` [Xen-devel] [PATCH 3/4] x86/hyperv: skeleton for L0 assisted TLB flush Wei Liu
@ 2020-02-12 16:09 ` " Wei Liu
  2020-02-12 17:43   ` Roger Pau Monné
  3 siblings, 1 reply; 23+ messages in thread
From: Wei Liu @ 2020-02-12 16:09 UTC (permalink / raw)
  To: Xen Development List
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Roger Pau Monné

Implement L0 assisted TLB flush for Xen on Hyper-V. It takes advantage
of several hypercalls:

 * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST
 * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX
 * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE
 * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX

Pick the most efficient hypercalls available.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
---
 xen/arch/x86/guest/hyperv/Makefile  |   1 +
 xen/arch/x86/guest/hyperv/private.h |   9 ++
 xen/arch/x86/guest/hyperv/tlb.c     | 172 +++++++++++++++++++++++++++-
 xen/arch/x86/guest/hyperv/util.c    |  72 ++++++++++++
 4 files changed, 253 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/x86/guest/hyperv/util.c

diff --git a/xen/arch/x86/guest/hyperv/Makefile b/xen/arch/x86/guest/hyperv/Makefile
index 18902c33e9..0e39410968 100644
--- a/xen/arch/x86/guest/hyperv/Makefile
+++ b/xen/arch/x86/guest/hyperv/Makefile
@@ -1,2 +1,3 @@
 obj-y += hyperv.o
 obj-y += tlb.o
+obj-y += util.o
diff --git a/xen/arch/x86/guest/hyperv/private.h b/xen/arch/x86/guest/hyperv/private.h
index 78e52f74ce..311f060495 100644
--- a/xen/arch/x86/guest/hyperv/private.h
+++ b/xen/arch/x86/guest/hyperv/private.h
@@ -24,12 +24,21 @@
 
 #include <xen/cpumask.h>
 #include <xen/percpu.h>
+#include <xen/types.h>
 
 DECLARE_PER_CPU(void *, hv_input_page);
 DECLARE_PER_CPU(void *, hv_vp_assist);
 DECLARE_PER_CPU(uint32_t, hv_vp_index);
 
+static inline uint32_t hv_vp_index(int cpu)
+{
+    return per_cpu(hv_vp_index, cpu);
+}
+
 int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
                      unsigned int flags);
 
+/* Returns number of banks, -ev if error */
+int cpumask_to_vpset(struct hv_vpset *vpset, const cpumask_t *mask);
+
 #endif /* __XEN_HYPERV_PRIVIATE_H__  */
diff --git a/xen/arch/x86/guest/hyperv/tlb.c b/xen/arch/x86/guest/hyperv/tlb.c
index 48f527229e..99b789d9e9 100644
--- a/xen/arch/x86/guest/hyperv/tlb.c
+++ b/xen/arch/x86/guest/hyperv/tlb.c
@@ -19,15 +19,185 @@
  * Copyright (c) 2020 Microsoft.
  */
 
+#include <xen/cpu.h>
 #include <xen/cpumask.h>
 #include <xen/errno.h>
 
+#include <asm/guest/hyperv.h>
+#include <asm/guest/hyperv-hcall.h>
+#include <asm/guest/hyperv-tlfs.h>
+
 #include "private.h"
 
+/*
+ * It is possible to encode up to 4096 pages using the lower 12 bits
+ * in an element of gva_list
+ */
+#define HV_TLB_FLUSH_UNIT (4096 * PAGE_SIZE)
+#define ORDER_TO_BYTES(order) ((1ul << (order)) * PAGE_SIZE)
+
+static unsigned int fill_gva_list(uint64_t *gva_list, const void *va,
+                                  unsigned int order)
+{
+    unsigned long start = (unsigned long)va;
+    unsigned long end = start + ORDER_TO_BYTES(order) - 1;
+    unsigned int n = 0;
+
+    do {
+        unsigned long remain = end > start ? end - start : 0;
+
+        gva_list[n] = start & PAGE_MASK;
+
+        /*
+         * Use lower 12 bits to encode the number of additional pages
+         * to flush
+         */
+        if ( remain >= HV_TLB_FLUSH_UNIT )
+        {
+            gva_list[n] |= ~PAGE_MASK;
+            start += HV_TLB_FLUSH_UNIT;
+        }
+        else if ( remain )
+        {
+            gva_list[n] |= (remain - 1) >> PAGE_SHIFT;
+            start = end;
+        }
+
+        n++;
+    } while ( start < end );
+
+    return n;
+}
+
+static uint64_t flush_tlb_ex(const cpumask_t *mask, const void *va,
+                             unsigned int flags)
+{
+    struct hv_tlb_flush_ex *flush = this_cpu(hv_input_page);
+    int nr_banks;
+    unsigned int max_gvas;
+    unsigned int order = flags & FLUSH_ORDER_MASK;
+    uint64_t ret;
+
+    ASSERT(flush);
+    ASSERT(!local_irq_is_enabled());
+
+    if ( !(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED) )
+        return ~0ULL;
+
+    flush->address_space = 0;
+    flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+    if ( !(flags & FLUSH_TLB_GLOBAL) )
+        flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
+
+    flush->hv_vp_set.valid_bank_mask = 0;
+    flush->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+
+    nr_banks = cpumask_to_vpset(&flush->hv_vp_set, mask);
+    if ( nr_banks < 0 )
+        return ~0ULL;
+
+    max_gvas =
+        (PAGE_SIZE - sizeof(*flush) - nr_banks *
+         sizeof(flush->hv_vp_set.bank_contents[0])) /
+        sizeof(uint64_t);       /* gva is represented as uint64_t */
+
+    /*
+     * Flush the entire address space if va is NULL or if there is not
+     * enough space for gva_list.
+     */
+    if ( !va || (ORDER_TO_BYTES(order) / HV_TLB_FLUSH_UNIT) > max_gvas )
+        ret = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX, 0,
+                                  nr_banks, virt_to_maddr(flush), 0);
+    else
+    {
+        uint64_t *gva_list = (uint64_t *)flush + sizeof(*flush) + nr_banks;
+        unsigned int gvas = fill_gva_list(gva_list, va, order);
+
+        ret = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX,
+                                  gvas, nr_banks, virt_to_maddr(flush), 0);
+    }
+
+    return ret;
+}
+
 int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
                      unsigned int flags)
 {
-    return -EOPNOTSUPP;
+    unsigned long irq_flags;
+    struct hv_tlb_flush *flush = this_cpu(hv_input_page);
+    uint64_t ret;
+    unsigned int order = flags & FLUSH_ORDER_MASK;
+    unsigned int max_gvas;
+
+    ASSERT(flush);
+    ASSERT(!cpumask_empty(mask));
+
+    local_irq_save(irq_flags);
+
+    flush->address_space = 0;
+    flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+    flush->processor_mask = 0;
+    if ( !(flags & FLUSH_TLB_GLOBAL) )
+        flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
+
+    if ( cpumask_equal(mask, &cpu_online_map) )
+        flush->flags |= HV_FLUSH_ALL_PROCESSORS;
+    else
+    {
+        int cpu;
+
+        /*
+         * Normally VP indices are in ascending order and match Xen's
+         * idea of CPU ids. Check the last index to see if VP index is
+         * >= 64. If so, we can skip setting up parameters for
+         * non-applicable hypercalls without looking further.
+         */
+        if ( hv_vp_index(cpumask_last(mask)) >= 64 )
+            goto do_ex_hypercall;
+
+        for_each_cpu ( cpu, mask )
+        {
+            uint32_t vpid = hv_vp_index(cpu);
+
+            if ( vpid > ms_hyperv.max_vp_index )
+            {
+                local_irq_restore(irq_flags);
+                return -ENXIO;
+            }
+
+            if ( vpid >= 64 )
+                goto do_ex_hypercall;
+
+            __set_bit(vpid, &flush->processor_mask);
+        }
+    }
+
+    max_gvas = (PAGE_SIZE - sizeof(*flush)) / sizeof(flush->gva_list[0]);
+
+    /*
+     * Flush the entire address space if va is NULL or if there is not
+     * enough space for gva_list.
+     */
+    if ( !va || (ORDER_TO_BYTES(order) / HV_TLB_FLUSH_UNIT) > max_gvas )
+        ret = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
+                              virt_to_maddr(flush), 0);
+    else
+    {
+        unsigned int gvas = fill_gva_list(flush->gva_list, va, order);
+
+        ret = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST, gvas, 0,
+                                  virt_to_maddr(flush), 0);
+    }
+
+    goto done;
+
+ do_ex_hypercall:
+    ret = flush_tlb_ex(mask, va, flags);
+
+ done:
+    local_irq_restore(irq_flags);
+
+    return ret & HV_HYPERCALL_RESULT_MASK;
 }
 
 /*
diff --git a/xen/arch/x86/guest/hyperv/util.c b/xen/arch/x86/guest/hyperv/util.c
new file mode 100644
index 0000000000..9d0b5f4a46
--- /dev/null
+++ b/xen/arch/x86/guest/hyperv/util.c
@@ -0,0 +1,72 @@
+/******************************************************************************
+ * arch/x86/guest/hyperv/util.c
+ *
+ * Hyper-V utility functions
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2020 Microsoft.
+ */
+
+#include <xen/cpu.h>
+#include <xen/cpumask.h>
+#include <xen/errno.h>
+
+#include <asm/guest/hyperv.h>
+#include <asm/guest/hyperv-tlfs.h>
+
+#include "private.h"
+
+int cpumask_to_vpset(struct hv_vpset *vpset,
+                     const cpumask_t *mask)
+{
+    int nr = 1, cpu, vcpu_bank, vcpu_offset;
+    int max_banks = ms_hyperv.max_vp_index / 64;
+
+    /* Up to 64 banks can be represented by valid_bank_mask */
+    if ( max_banks >= 64 )
+        return -1;
+
+    /* Clear all banks to avoid flushing unwanted CPUs */
+    for ( vcpu_bank = 0; vcpu_bank <= max_banks; vcpu_bank++ )
+        vpset->bank_contents[vcpu_bank] = 0;
+
+    vpset->valid_bank_mask = 0;
+
+    for_each_cpu ( cpu, mask )
+    {
+        int vcpu = hv_vp_index(cpu);
+
+        vcpu_bank = vcpu / 64;
+        vcpu_offset = vcpu % 64;
+
+        __set_bit(vcpu_offset, &vpset->bank_contents[vcpu_bank]);
+        __set_bit(vcpu_bank, &vpset->valid_bank_mask);
+
+        if ( vcpu_bank >= nr )
+            nr = vcpu_bank + 1;
+    }
+
+    return nr;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 1/4] x86/hyperv: misc cleanup
  2020-02-12 16:09 ` [Xen-devel] [PATCH 1/4] x86/hyperv: misc cleanup Wei Liu
@ 2020-02-12 16:53   ` Roger Pau Monné
  2020-02-13 10:30     ` Wei Liu
  2020-02-13  8:43   ` Durrant, Paul
  2020-02-13  9:46   ` Jan Beulich
  2 siblings, 1 reply; 23+ messages in thread
From: Roger Pau Monné @ 2020-02-12 16:53 UTC (permalink / raw)
  To: Wei Liu
  Cc: Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Xen Development List

On Wed, Feb 12, 2020 at 04:09:15PM +0000, Wei Liu wrote:
> Change hv_vp_index to use uint32_t because that is what is defined in TLFS.
> 
> Add "_addr" suffix to hv_do_rep_hypercall's parameters.

Being of type paddr_t I'm unsure the _addr suffix adds any value to
the name.

> 
> Signed-off-by: Wei Liu <liuwe@microsoft.com>

LGTM (and I'm certainly not going to oppose to the _addr suffix):

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 2/4] x86/hypervisor: pass flags to hypervisor_flush_tlb
  2020-02-12 16:09 ` [Xen-devel] [PATCH 2/4] x86/hypervisor: pass flags to hypervisor_flush_tlb Wei Liu
@ 2020-02-12 17:00   ` Roger Pau Monné
  2020-02-13  8:48     ` Durrant, Paul
  0 siblings, 1 reply; 23+ messages in thread
From: Roger Pau Monné @ 2020-02-12 17:00 UTC (permalink / raw)
  To: Wei Liu
  Cc: Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Xen Development List

On Wed, Feb 12, 2020 at 04:09:16PM +0000, Wei Liu wrote:
> Hyper-V's L0 assisted flush has fine-grained control over what gets
> flushed. We need all the flags available to make the best decisions
> possible.
> 
> No functional change because Xen's implementation doesn't care about
> what is passed to it.

While it's certainly fine to pass a flags field with more information,
the flush flags for Xen can also contain FLUSH_CACHE, FLUSH_VCPU_STATE
or FLUSH_ROOT_PGTBL, can you add an assert that those never get passed
to the flush hook?

IMO we should define a mask with FLUSH_TLB, FLUSH_TLB_GLOBAL,
FLUSH_VA_VALID and FLUSH_ORDER_MASK and assert that those are the only
valid flags to be used for the hypervisor assisted flush hook.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 3/4] x86/hyperv: skeleton for L0 assisted TLB flush
  2020-02-12 16:09 ` [Xen-devel] [PATCH 3/4] x86/hyperv: skeleton for L0 assisted TLB flush Wei Liu
@ 2020-02-12 17:09   ` Roger Pau Monné
  2020-02-12 23:01     ` Wei Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Roger Pau Monné @ 2020-02-12 17:09 UTC (permalink / raw)
  To: Wei Liu
  Cc: Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Xen Development List

On Wed, Feb 12, 2020 at 04:09:17PM +0000, Wei Liu wrote:
> Implement a basic hook for L0 assisted TLB flush. The hook needs to
> check if prerequisites are met. If they are not met, it returns an error
> number to fall back to native flushes.
> 
> Introduce a new variable to indicate if hypercall page is ready.
> 
> Signed-off-by: Wei Liu <liuwe@microsoft.com>
> ---
>  xen/arch/x86/guest/hyperv/Makefile  |  1 +
>  xen/arch/x86/guest/hyperv/hyperv.c  | 17 ++++++++++++
>  xen/arch/x86/guest/hyperv/private.h |  4 +++
>  xen/arch/x86/guest/hyperv/tlb.c     | 41 +++++++++++++++++++++++++++++
>  4 files changed, 63 insertions(+)
>  create mode 100644 xen/arch/x86/guest/hyperv/tlb.c
> 
> diff --git a/xen/arch/x86/guest/hyperv/Makefile b/xen/arch/x86/guest/hyperv/Makefile
> index 68170109a9..18902c33e9 100644
> --- a/xen/arch/x86/guest/hyperv/Makefile
> +++ b/xen/arch/x86/guest/hyperv/Makefile
> @@ -1 +1,2 @@
>  obj-y += hyperv.o
> +obj-y += tlb.o
> diff --git a/xen/arch/x86/guest/hyperv/hyperv.c b/xen/arch/x86/guest/hyperv/hyperv.c
> index b7044f7193..1cdc88e27c 100644
> --- a/xen/arch/x86/guest/hyperv/hyperv.c
> +++ b/xen/arch/x86/guest/hyperv/hyperv.c
> @@ -33,6 +33,8 @@ DEFINE_PER_CPU_READ_MOSTLY(void *, hv_input_page);
>  DEFINE_PER_CPU_READ_MOSTLY(void *, hv_vp_assist);
>  DEFINE_PER_CPU_READ_MOSTLY(uint32_t, hv_vp_index);
>  
> +static bool __read_mostly hv_hcall_page_ready;
> +
>  static uint64_t generate_guest_id(void)
>  {
>      union hv_guest_os_id id = {};
> @@ -119,6 +121,8 @@ static void __init setup_hypercall_page(void)
>      BUG_ON(!hypercall_msr.enable);
>  
>      set_fixmap_x(FIX_X_HYPERV_HCALL, mfn << PAGE_SHIFT);
> +
> +    hv_hcall_page_ready = true;

I guess filling the hypercall page in the probe function like it's
done for Xen is too early for HyperV, and hence you need this
safeguard?

TBH, maybe it would be best (and safer) to prevent using any hooks
until setup has been called, and hence this check could be pulled into
the generic hook?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 4/4] x86/hyperv: L0 assisted TLB flush
  2020-02-12 16:09 ` [Xen-devel] [PATCH 4/4] x86/hyperv: " Wei Liu
@ 2020-02-12 17:43   ` Roger Pau Monné
  2020-02-13  9:49     ` Jan Beulich
  2020-02-13 12:20     ` Wei Liu
  0 siblings, 2 replies; 23+ messages in thread
From: Roger Pau Monné @ 2020-02-12 17:43 UTC (permalink / raw)
  To: Wei Liu
  Cc: Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Xen Development List

On Wed, Feb 12, 2020 at 04:09:18PM +0000, Wei Liu wrote:
> Implement L0 assisted TLB flush for Xen on Hyper-V. It takes advantage
> of several hypercalls:
> 
>  * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST
>  * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX
>  * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE
>  * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX
> 
> Pick the most efficient hypercalls available.
> 
> Signed-off-by: Wei Liu <liuwe@microsoft.com>
> ---
>  xen/arch/x86/guest/hyperv/Makefile  |   1 +
>  xen/arch/x86/guest/hyperv/private.h |   9 ++
>  xen/arch/x86/guest/hyperv/tlb.c     | 172 +++++++++++++++++++++++++++-
>  xen/arch/x86/guest/hyperv/util.c    |  72 ++++++++++++
>  4 files changed, 253 insertions(+), 1 deletion(-)
>  create mode 100644 xen/arch/x86/guest/hyperv/util.c
> 
> diff --git a/xen/arch/x86/guest/hyperv/Makefile b/xen/arch/x86/guest/hyperv/Makefile
> index 18902c33e9..0e39410968 100644
> --- a/xen/arch/x86/guest/hyperv/Makefile
> +++ b/xen/arch/x86/guest/hyperv/Makefile
> @@ -1,2 +1,3 @@
>  obj-y += hyperv.o
>  obj-y += tlb.o
> +obj-y += util.o
> diff --git a/xen/arch/x86/guest/hyperv/private.h b/xen/arch/x86/guest/hyperv/private.h
> index 78e52f74ce..311f060495 100644
> --- a/xen/arch/x86/guest/hyperv/private.h
> +++ b/xen/arch/x86/guest/hyperv/private.h
> @@ -24,12 +24,21 @@
>  
>  #include <xen/cpumask.h>
>  #include <xen/percpu.h>
> +#include <xen/types.h>
>  
>  DECLARE_PER_CPU(void *, hv_input_page);
>  DECLARE_PER_CPU(void *, hv_vp_assist);
>  DECLARE_PER_CPU(uint32_t, hv_vp_index);
>  
> +static inline uint32_t hv_vp_index(int cpu)

unsigned int for cpu.

> +{
> +    return per_cpu(hv_vp_index, cpu);
> +}
> +
>  int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
>                       unsigned int flags);
>  
> +/* Returns number of banks, -ev if error */
> +int cpumask_to_vpset(struct hv_vpset *vpset, const cpumask_t *mask);
> +
>  #endif /* __XEN_HYPERV_PRIVIATE_H__  */
> diff --git a/xen/arch/x86/guest/hyperv/tlb.c b/xen/arch/x86/guest/hyperv/tlb.c
> index 48f527229e..99b789d9e9 100644
> --- a/xen/arch/x86/guest/hyperv/tlb.c
> +++ b/xen/arch/x86/guest/hyperv/tlb.c
> @@ -19,15 +19,185 @@
>   * Copyright (c) 2020 Microsoft.
>   */
>  
> +#include <xen/cpu.h>
>  #include <xen/cpumask.h>
>  #include <xen/errno.h>
>  
> +#include <asm/guest/hyperv.h>
> +#include <asm/guest/hyperv-hcall.h>
> +#include <asm/guest/hyperv-tlfs.h>
> +
>  #include "private.h"
>  
> +/*
> + * It is possible to encode up to 4096 pages using the lower 12 bits
> + * in an element of gva_list
> + */
> +#define HV_TLB_FLUSH_UNIT (4096 * PAGE_SIZE)
> +#define ORDER_TO_BYTES(order) ((1ul << (order)) * PAGE_SIZE)

There are already some conversion functions in xen/mm.h
(get_order_from_{bytes/pages}), maybe you could add a
get_bytes_from_order helper there?

> +
> +static unsigned int fill_gva_list(uint64_t *gva_list, const void *va,
> +                                  unsigned int order)
> +{
> +    unsigned long start = (unsigned long)va;
> +    unsigned long end = start + ORDER_TO_BYTES(order) - 1;
> +    unsigned int n = 0;
> +
> +    do {
> +        unsigned long remain = end > start ? end - start : 0;

I don't think you can get here with end == start?

As that's the condition of the loop, and order 0 is going to set
end = start + 4096 - 1.

> +
> +        gva_list[n] = start & PAGE_MASK;
> +
> +        /*
> +         * Use lower 12 bits to encode the number of additional pages
> +         * to flush
> +         */
> +        if ( remain >= HV_TLB_FLUSH_UNIT )
> +        {
> +            gva_list[n] |= ~PAGE_MASK;
> +            start += HV_TLB_FLUSH_UNIT;
> +        }
> +        else if ( remain )
> +        {
> +            gva_list[n] |= (remain - 1) >> PAGE_SHIFT;
> +            start = end;
> +        }
> +
> +        n++;
> +    } while ( start < end );
> +
> +    return n;
> +}
> +
> +static uint64_t flush_tlb_ex(const cpumask_t *mask, const void *va,
> +                             unsigned int flags)
> +{
> +    struct hv_tlb_flush_ex *flush = this_cpu(hv_input_page);
> +    int nr_banks;
> +    unsigned int max_gvas;
> +    unsigned int order = flags & FLUSH_ORDER_MASK;
> +    uint64_t ret;
> +
> +    ASSERT(flush);
> +    ASSERT(!local_irq_is_enabled());

Can you turn this into an if condition with ASSERT_UNREACHABLE and
return ~0ULL? (as I think that signals an error).

> +
> +    if ( !(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED) )
> +        return ~0ULL;
> +
> +    flush->address_space = 0;
> +    flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> +    if ( !(flags & FLUSH_TLB_GLOBAL) )
> +        flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
> +
> +    flush->hv_vp_set.valid_bank_mask = 0;
> +    flush->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> +
> +    nr_banks = cpumask_to_vpset(&flush->hv_vp_set, mask);
> +    if ( nr_banks < 0 )
> +        return ~0ULL;
> +
> +    max_gvas =
> +        (PAGE_SIZE - sizeof(*flush) - nr_banks *
> +         sizeof(flush->hv_vp_set.bank_contents[0])) /
> +        sizeof(uint64_t);       /* gva is represented as uint64_t */
> +
> +    /*
> +     * Flush the entire address space if va is NULL or if there is not
> +     * enough space for gva_list.
> +     */
> +    if ( !va || (ORDER_TO_BYTES(order) / HV_TLB_FLUSH_UNIT) > max_gvas )
> +        ret = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX, 0,
> +                                  nr_banks, virt_to_maddr(flush), 0);
> +    else
> +    {
> +        uint64_t *gva_list = (uint64_t *)flush + sizeof(*flush) + nr_banks;

Don't you need nr_banks * sizeof(flush->hv_vp_set.bank_contents) in
order to calculate the position of the gva_list?

> +        unsigned int gvas = fill_gva_list(gva_list, va, order);
> +
> +        ret = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX,
> +                                  gvas, nr_banks, virt_to_maddr(flush), 0);
> +    }
> +
> +    return ret;
> +}
> +
>  int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
>                       unsigned int flags)
>  {
> -    return -EOPNOTSUPP;
> +    unsigned long irq_flags;
> +    struct hv_tlb_flush *flush = this_cpu(hv_input_page);
> +    uint64_t ret;
> +    unsigned int order = flags & FLUSH_ORDER_MASK;
> +    unsigned int max_gvas;
> +
> +    ASSERT(flush);
> +    ASSERT(!cpumask_empty(mask));
> +
> +    local_irq_save(irq_flags);
> +
> +    flush->address_space = 0;
> +    flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> +    flush->processor_mask = 0;
> +    if ( !(flags & FLUSH_TLB_GLOBAL) )
> +        flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
> +
> +    if ( cpumask_equal(mask, &cpu_online_map) )
> +        flush->flags |= HV_FLUSH_ALL_PROCESSORS;
> +    else
> +    {
> +        int cpu;

unsigned int.

> +
> +        /*
> +         * Normally VP indices are in ascending order and match Xen's
> +         * idea of CPU ids. Check the last index to see if VP index is
> +         * >= 64. If so, we can skip setting up parameters for
> +         * non-applicable hypercalls without looking further.
> +         */
> +        if ( hv_vp_index(cpumask_last(mask)) >= 64 )
> +            goto do_ex_hypercall;
> +
> +        for_each_cpu ( cpu, mask )
> +        {
> +            uint32_t vpid = hv_vp_index(cpu);
> +
> +            if ( vpid > ms_hyperv.max_vp_index )
> +            {
> +                local_irq_restore(irq_flags);
> +                return -ENXIO;
> +            }
> +
> +            if ( vpid >= 64 )
> +                goto do_ex_hypercall;
> +
> +            __set_bit(vpid, &flush->processor_mask);
> +        }
> +    }
> +
> +    max_gvas = (PAGE_SIZE - sizeof(*flush)) / sizeof(flush->gva_list[0]);
> +
> +    /*
> +     * Flush the entire address space if va is NULL or if there is not
> +     * enough space for gva_list.
> +     */
> +    if ( !va || (ORDER_TO_BYTES(order) / HV_TLB_FLUSH_UNIT) > max_gvas )
> +        ret = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
> +                              virt_to_maddr(flush), 0);
> +    else
> +    {
> +        unsigned int gvas = fill_gva_list(flush->gva_list, va, order);
> +
> +        ret = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST, gvas, 0,
> +                                  virt_to_maddr(flush), 0);
> +    }
> +
> +    goto done;
> +
> + do_ex_hypercall:
> +    ret = flush_tlb_ex(mask, va, flags);
> +
> + done:
> +    local_irq_restore(irq_flags);
> +
> +    return ret & HV_HYPERCALL_RESULT_MASK;

Will this return an error code that uses the same space as Xen's errno
values?

>  }
>  
>  /*
> diff --git a/xen/arch/x86/guest/hyperv/util.c b/xen/arch/x86/guest/hyperv/util.c
> new file mode 100644
> index 0000000000..9d0b5f4a46
> --- /dev/null
> +++ b/xen/arch/x86/guest/hyperv/util.c
> @@ -0,0 +1,72 @@
> +/******************************************************************************
> + * arch/x86/guest/hyperv/util.c
> + *
> + * Hyper-V utility functions
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Copyright (c) 2020 Microsoft.
> + */
> +
> +#include <xen/cpu.h>
> +#include <xen/cpumask.h>
> +#include <xen/errno.h>
> +
> +#include <asm/guest/hyperv.h>
> +#include <asm/guest/hyperv-tlfs.h>
> +
> +#include "private.h"
> +
> +int cpumask_to_vpset(struct hv_vpset *vpset,
> +                     const cpumask_t *mask)
> +{
> +    int nr = 1, cpu, vcpu_bank, vcpu_offset;
> +    int max_banks = ms_hyperv.max_vp_index / 64;

I think nr whats to be int (to match the function return type), but
the rest should be unsigned ints, specially because they are used as
array indexes.

> +
> +    /* Up to 64 banks can be represented by valid_bank_mask */
> +    if ( max_banks >= 64 )
> +        return -1;

E2BIG or some such?

> +
> +    /* Clear all banks to avoid flushing unwanted CPUs */
> +    for ( vcpu_bank = 0; vcpu_bank <= max_banks; vcpu_bank++ )
> +        vpset->bank_contents[vcpu_bank] = 0;
> +
> +    vpset->valid_bank_mask = 0;
> +
> +    for_each_cpu ( cpu, mask )
> +    {
> +        int vcpu = hv_vp_index(cpu);

unsigned int or uint32_t (which is the tyupe that hv_vp_index
returns).

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 3/4] x86/hyperv: skeleton for L0 assisted TLB flush
  2020-02-12 17:09   ` Roger Pau Monné
@ 2020-02-12 23:01     ` Wei Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Liu @ 2020-02-12 23:01 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Xen Development List

On Wed, Feb 12, 2020 at 06:09:24PM +0100, Roger Pau Monné wrote:
> On Wed, Feb 12, 2020 at 04:09:17PM +0000, Wei Liu wrote:
> > Implement a basic hook for L0 assisted TLB flush. The hook needs to
> > check if prerequisites are met. If they are not met, it returns an error
> > number to fall back to native flushes.
> > 
> > Introduce a new variable to indicate if hypercall page is ready.
> > 
> > Signed-off-by: Wei Liu <liuwe@microsoft.com>
> > ---
> >  xen/arch/x86/guest/hyperv/Makefile  |  1 +
> >  xen/arch/x86/guest/hyperv/hyperv.c  | 17 ++++++++++++
> >  xen/arch/x86/guest/hyperv/private.h |  4 +++
> >  xen/arch/x86/guest/hyperv/tlb.c     | 41 +++++++++++++++++++++++++++++
> >  4 files changed, 63 insertions(+)
> >  create mode 100644 xen/arch/x86/guest/hyperv/tlb.c
> > 
> > diff --git a/xen/arch/x86/guest/hyperv/Makefile b/xen/arch/x86/guest/hyperv/Makefile
> > index 68170109a9..18902c33e9 100644
> > --- a/xen/arch/x86/guest/hyperv/Makefile
> > +++ b/xen/arch/x86/guest/hyperv/Makefile
> > @@ -1 +1,2 @@
> >  obj-y += hyperv.o
> > +obj-y += tlb.o
> > diff --git a/xen/arch/x86/guest/hyperv/hyperv.c b/xen/arch/x86/guest/hyperv/hyperv.c
> > index b7044f7193..1cdc88e27c 100644
> > --- a/xen/arch/x86/guest/hyperv/hyperv.c
> > +++ b/xen/arch/x86/guest/hyperv/hyperv.c
> > @@ -33,6 +33,8 @@ DEFINE_PER_CPU_READ_MOSTLY(void *, hv_input_page);
> >  DEFINE_PER_CPU_READ_MOSTLY(void *, hv_vp_assist);
> >  DEFINE_PER_CPU_READ_MOSTLY(uint32_t, hv_vp_index);
> >  
> > +static bool __read_mostly hv_hcall_page_ready;
> > +
> >  static uint64_t generate_guest_id(void)
> >  {
> >      union hv_guest_os_id id = {};
> > @@ -119,6 +121,8 @@ static void __init setup_hypercall_page(void)
> >      BUG_ON(!hypercall_msr.enable);
> >  
> >      set_fixmap_x(FIX_X_HYPERV_HCALL, mfn << PAGE_SHIFT);
> > +
> > +    hv_hcall_page_ready = true;
> 
> I guess filling the hypercall page in the probe function like it's
> done for Xen is too early for HyperV, and hence you need this
> safeguard?

Yes that's too early.

Keep in mind that Hyper-V's hypercall page is an overlay page which is
not backed by real memory from Xen's PoV. Xen can't fill in a stub
there. Xen needs to wait until other infrastructures to go live before
setting up a hypercall page, while in the mean time, it will / may need
to flush TLB.

> 
> TBH, maybe it would be best (and safer) to prevent using any hooks
> until setup has been called, and hence this check could be pulled into
> the generic hook?

"Prevent" is too vague a term here. We can't stop code from executing in
parallel. In some situation there is no way to fail gracefully.

There is only this hook at the moment that requires such special care
afaict, and this is largely due to Hyper-V's idiosyncrasy. Others are
called only in setup / teardown path which can easily be reasoned about.

Wei.

> 
> Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 1/4] x86/hyperv: misc cleanup
  2020-02-12 16:09 ` [Xen-devel] [PATCH 1/4] x86/hyperv: misc cleanup Wei Liu
  2020-02-12 16:53   ` Roger Pau Monné
@ 2020-02-13  8:43   ` Durrant, Paul
  2020-02-13  9:46   ` Jan Beulich
  2 siblings, 0 replies; 23+ messages in thread
From: Durrant, Paul @ 2020-02-13  8:43 UTC (permalink / raw)
  To: Wei Liu, Xen Development List
  Cc: Andrew Cooper, Roger Pau Monné,
	Wei Liu, Jan Beulich, Michael Kelley

> -----Original Message-----
> From: Wei Liu <wei.liu.xen@gmail.com> On Behalf Of Wei Liu
> Sent: 12 February 2020 17:09
> To: Xen Development List <xen-devel@lists.xenproject.org>
> Cc: Durrant, Paul <pdurrant@amazon.co.uk>; Michael Kelley
> <mikelley@microsoft.com>; Wei Liu <liuwe@microsoft.com>; Wei Liu
> <wl@xen.org>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>
> Subject: [PATCH 1/4] x86/hyperv: misc cleanup
> 
> Change hv_vp_index to use uint32_t because that is what is defined in
> TLFS.
> 
> Add "_addr" suffix to hv_do_rep_hypercall's parameters.
> 
> Signed-off-by: Wei Liu <liuwe@microsoft.com>

Reviewed-by: Paul Durrant <pdurrant@amazon.com>

> ---
>  xen/arch/x86/guest/hyperv/hyperv.c       | 2 +-
>  xen/arch/x86/guest/hyperv/private.h      | 2 +-
>  xen/include/asm-x86/guest/hyperv-hcall.h | 5 +++--
>  3 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/guest/hyperv/hyperv.c
> b/xen/arch/x86/guest/hyperv/hyperv.c
> index 70f4cd5ae0..b7044f7193 100644
> --- a/xen/arch/x86/guest/hyperv/hyperv.c
> +++ b/xen/arch/x86/guest/hyperv/hyperv.c
> @@ -31,7 +31,7 @@
>  struct ms_hyperv_info __read_mostly ms_hyperv;
>  DEFINE_PER_CPU_READ_MOSTLY(void *, hv_input_page);
>  DEFINE_PER_CPU_READ_MOSTLY(void *, hv_vp_assist);
> -DEFINE_PER_CPU_READ_MOSTLY(unsigned int, hv_vp_index);
> +DEFINE_PER_CPU_READ_MOSTLY(uint32_t, hv_vp_index);
> 
>  static uint64_t generate_guest_id(void)
>  {
> diff --git a/xen/arch/x86/guest/hyperv/private.h
> b/xen/arch/x86/guest/hyperv/private.h
> index 956eff831f..eb14ea43e5 100644
> --- a/xen/arch/x86/guest/hyperv/private.h
> +++ b/xen/arch/x86/guest/hyperv/private.h
> @@ -26,6 +26,6 @@
> 
>  DECLARE_PER_CPU(void *, hv_input_page);
>  DECLARE_PER_CPU(void *, hv_vp_assist);
> -DECLARE_PER_CPU(unsigned int, hv_vp_index);
> +DECLARE_PER_CPU(uint32_t, hv_vp_index);
> 
>  #endif /* __XEN_HYPERV_PRIVIATE_H__  */
> diff --git a/xen/include/asm-x86/guest/hyperv-hcall.h b/xen/include/asm-
> x86/guest/hyperv-hcall.h
> index 4d3b131b3a..3396caccdd 100644
> --- a/xen/include/asm-x86/guest/hyperv-hcall.h
> +++ b/xen/include/asm-x86/guest/hyperv-hcall.h
> @@ -61,7 +61,8 @@ static inline uint64_t hv_do_fast_hypercall(uint16_t
> code,
> 
>  static inline uint64_t hv_do_rep_hypercall(uint16_t code, uint16_t
> rep_count,
>                                             uint16_t varhead_size,
> -                                           paddr_t input, paddr_t output)
> +                                           paddr_t input_addr,
> +                                           paddr_t output_addr)
>  {
>      uint64_t control = code;
>      uint64_t status;
> @@ -71,7 +72,7 @@ static inline uint64_t hv_do_rep_hypercall(uint16_t
> code, uint16_t rep_count,
>      control |= (uint64_t)rep_count << HV_HYPERCALL_REP_COMP_OFFSET;
> 
>      do {
> -        status = hv_do_hypercall(control, input, output);
> +        status = hv_do_hypercall(control, input_addr, output_addr);
>          if ( (status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS )
>              break;
> 
> --
> 2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 2/4] x86/hypervisor: pass flags to hypervisor_flush_tlb
  2020-02-12 17:00   ` Roger Pau Monné
@ 2020-02-13  8:48     ` Durrant, Paul
  2020-02-13 10:29       ` Wei Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Durrant, Paul @ 2020-02-13  8:48 UTC (permalink / raw)
  To: Roger Pau Monné, Wei Liu
  Cc: Xen Development List, Andrew Cooper, Wei Liu, Jan Beulich,
	Michael Kelley

> -----Original Message-----
> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: 12 February 2020 18:01
> To: Wei Liu <wl@xen.org>
> Cc: Xen Development List <xen-devel@lists.xenproject.org>; Durrant, Paul
> <pdurrant@amazon.co.uk>; Michael Kelley <mikelley@microsoft.com>; Wei Liu
> <liuwe@microsoft.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>
> Subject: Re: [PATCH 2/4] x86/hypervisor: pass flags to
> hypervisor_flush_tlb
> 
> On Wed, Feb 12, 2020 at 04:09:16PM +0000, Wei Liu wrote:
> > Hyper-V's L0 assisted flush has fine-grained control over what gets
> > flushed. We need all the flags available to make the best decisions
> > possible.
> >
> > No functional change because Xen's implementation doesn't care about
> > what is passed to it.
> 
> While it's certainly fine to pass a flags field with more information,
> the flush flags for Xen can also contain FLUSH_CACHE, FLUSH_VCPU_STATE
> or FLUSH_ROOT_PGTBL, can you add an assert that those never get passed
> to the flush hook?
> 
> IMO we should define a mask with FLUSH_TLB, FLUSH_TLB_GLOBAL,
> FLUSH_VA_VALID and FLUSH_ORDER_MASK and assert that those are the only
> valid flags to be used for the hypervisor assisted flush hook.
>

Agreed that this should be abstracted; we certainly don't want to have various bits of Xen needing to know what hypervisor it is running on top of.

  Paul

 
> Thanks, Roger.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 1/4] x86/hyperv: misc cleanup
  2020-02-12 16:09 ` [Xen-devel] [PATCH 1/4] x86/hyperv: misc cleanup Wei Liu
  2020-02-12 16:53   ` Roger Pau Monné
  2020-02-13  8:43   ` Durrant, Paul
@ 2020-02-13  9:46   ` Jan Beulich
  2020-02-13 12:24     ` Wei Liu
  2 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2020-02-13  9:46 UTC (permalink / raw)
  To: Wei Liu
  Cc: Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Xen Development List, Roger Pau Monné

On 12.02.2020 17:09, Wei Liu wrote:
> --- a/xen/arch/x86/guest/hyperv/private.h
> +++ b/xen/arch/x86/guest/hyperv/private.h
> @@ -26,6 +26,6 @@
>  
>  DECLARE_PER_CPU(void *, hv_input_page);
>  DECLARE_PER_CPU(void *, hv_vp_assist);
> -DECLARE_PER_CPU(unsigned int, hv_vp_index);
> +DECLARE_PER_CPU(uint32_t, hv_vp_index);

You've got a co-maintainer ack, i.e. so be it, but FTR this is
against what CodingStyle says, afaict: "Fixed width types should
only be used when a fixed width quantity is meant (which for
example may be a value read from or to be written to a register)."
If you handed the address (perhaps indirectly, e.g. by converting
to a physical one first) of this variable to Hyper-V, then things
would be different. But

    this_cpu(hv_vp_index) = vp_index_msr;

would, if unsigned int was wider than 32 bits, not cause any
issues. And this is the only place the variable currently gets
accessed, and I expect future uses will just be reads of it (as
can be seen later in the series).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 4/4] x86/hyperv: L0 assisted TLB flush
  2020-02-12 17:43   ` Roger Pau Monné
@ 2020-02-13  9:49     ` Jan Beulich
  2020-02-13 12:25       ` Wei Liu
  2020-02-13 12:20     ` Wei Liu
  1 sibling, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2020-02-13  9:49 UTC (permalink / raw)
  To: Roger Pau Monné, Wei Liu
  Cc: Xen Development List, Paul Durrant, Andrew Cooper, Wei Liu,
	Michael Kelley

On 12.02.2020 18:43, Roger Pau Monné wrote:
> On Wed, Feb 12, 2020 at 04:09:18PM +0000, Wei Liu wrote:
>> Implement L0 assisted TLB flush for Xen on Hyper-V. It takes advantage
>> of several hypercalls:
>>
>>  * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST
>>  * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX
>>  * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE
>>  * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX
>>
>> Pick the most efficient hypercalls available.
>>
>> Signed-off-by: Wei Liu <liuwe@microsoft.com>
>> ---
>>  xen/arch/x86/guest/hyperv/Makefile  |   1 +
>>  xen/arch/x86/guest/hyperv/private.h |   9 ++
>>  xen/arch/x86/guest/hyperv/tlb.c     | 172 +++++++++++++++++++++++++++-
>>  xen/arch/x86/guest/hyperv/util.c    |  72 ++++++++++++
>>  4 files changed, 253 insertions(+), 1 deletion(-)
>>  create mode 100644 xen/arch/x86/guest/hyperv/util.c
>>
>> diff --git a/xen/arch/x86/guest/hyperv/Makefile b/xen/arch/x86/guest/hyperv/Makefile
>> index 18902c33e9..0e39410968 100644
>> --- a/xen/arch/x86/guest/hyperv/Makefile
>> +++ b/xen/arch/x86/guest/hyperv/Makefile
>> @@ -1,2 +1,3 @@
>>  obj-y += hyperv.o
>>  obj-y += tlb.o
>> +obj-y += util.o
>> diff --git a/xen/arch/x86/guest/hyperv/private.h b/xen/arch/x86/guest/hyperv/private.h
>> index 78e52f74ce..311f060495 100644
>> --- a/xen/arch/x86/guest/hyperv/private.h
>> +++ b/xen/arch/x86/guest/hyperv/private.h
>> @@ -24,12 +24,21 @@
>>  
>>  #include <xen/cpumask.h>
>>  #include <xen/percpu.h>
>> +#include <xen/types.h>
>>  
>>  DECLARE_PER_CPU(void *, hv_input_page);
>>  DECLARE_PER_CPU(void *, hv_vp_assist);
>>  DECLARE_PER_CPU(uint32_t, hv_vp_index);
>>  
>> +static inline uint32_t hv_vp_index(int cpu)
> 
> unsigned int for cpu.

And also for the return type, as per my comment on patch 1.

>> --- a/xen/arch/x86/guest/hyperv/tlb.c
>> +++ b/xen/arch/x86/guest/hyperv/tlb.c
>> @@ -19,15 +19,185 @@
>>   * Copyright (c) 2020 Microsoft.
>>   */
>>  
>> +#include <xen/cpu.h>
>>  #include <xen/cpumask.h>
>>  #include <xen/errno.h>
>>  
>> +#include <asm/guest/hyperv.h>
>> +#include <asm/guest/hyperv-hcall.h>
>> +#include <asm/guest/hyperv-tlfs.h>
>> +
>>  #include "private.h"
>>  
>> +/*
>> + * It is possible to encode up to 4096 pages using the lower 12 bits
>> + * in an element of gva_list
>> + */
>> +#define HV_TLB_FLUSH_UNIT (4096 * PAGE_SIZE)
>> +#define ORDER_TO_BYTES(order) ((1ul << (order)) * PAGE_SIZE)
> 
> There are already some conversion functions in xen/mm.h
> (get_order_from_{bytes/pages}), maybe you could add a
> get_bytes_from_order helper there?

I don't think a macro (or helper function) is worthwhile here - we
don't have any in the various other places that do the same. The
above should be used inline, preferably in the simpler form of
PAGE_SIZE << order.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 2/4] x86/hypervisor: pass flags to hypervisor_flush_tlb
  2020-02-13  8:48     ` Durrant, Paul
@ 2020-02-13 10:29       ` Wei Liu
  2020-02-13 11:56         ` Wei Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Wei Liu @ 2020-02-13 10:29 UTC (permalink / raw)
  To: Durrant, Paul
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Michael Kelley, Jan Beulich,
	Xen Development List, Roger Pau Monné

On Thu, Feb 13, 2020 at 08:48:39AM +0000, Durrant, Paul wrote:
> > -----Original Message-----
> > From: Roger Pau Monné <roger.pau@citrix.com>
> > Sent: 12 February 2020 18:01
> > To: Wei Liu <wl@xen.org>
> > Cc: Xen Development List <xen-devel@lists.xenproject.org>; Durrant, Paul
> > <pdurrant@amazon.co.uk>; Michael Kelley <mikelley@microsoft.com>; Wei Liu
> > <liuwe@microsoft.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> > <andrew.cooper3@citrix.com>
> > Subject: Re: [PATCH 2/4] x86/hypervisor: pass flags to
> > hypervisor_flush_tlb
> > 
> > On Wed, Feb 12, 2020 at 04:09:16PM +0000, Wei Liu wrote:
> > > Hyper-V's L0 assisted flush has fine-grained control over what gets
> > > flushed. We need all the flags available to make the best decisions
> > > possible.
> > >
> > > No functional change because Xen's implementation doesn't care about
> > > what is passed to it.
> > 
> > While it's certainly fine to pass a flags field with more information,
> > the flush flags for Xen can also contain FLUSH_CACHE, FLUSH_VCPU_STATE
> > or FLUSH_ROOT_PGTBL, can you add an assert that those never get passed
> > to the flush hook?
> > 
> > IMO we should define a mask with FLUSH_TLB, FLUSH_TLB_GLOBAL,
> > FLUSH_VA_VALID and FLUSH_ORDER_MASK and assert that those are the only
> > valid flags to be used for the hypervisor assisted flush hook.
> >
> 
> Agreed that this should be abstracted; we certainly don't want to have
> various bits of Xen needing to know what hypervisor it is running on
> top of.

OK. I can introduce a FLUSH_TLB_FLAGS for all things pertaining to TLB
-- the four things mentioned in Roger's reply.

Wei.

> 
>   Paul
> 
>  
> > Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 1/4] x86/hyperv: misc cleanup
  2020-02-12 16:53   ` Roger Pau Monné
@ 2020-02-13 10:30     ` Wei Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Liu @ 2020-02-13 10:30 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Xen Development List

On Wed, Feb 12, 2020 at 05:53:54PM +0100, Roger Pau Monné wrote:
> On Wed, Feb 12, 2020 at 04:09:15PM +0000, Wei Liu wrote:
> > Change hv_vp_index to use uint32_t because that is what is defined in TLFS.
> > 
> > Add "_addr" suffix to hv_do_rep_hypercall's parameters.
> 
> Being of type paddr_t I'm unsure the _addr suffix adds any value to
> the name.
> 

Andrew asked us to add that prefix for hv_do_hypercall. I discovered the
same thing should've been done for hv_do_rep_hypercall as well.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 2/4] x86/hypervisor: pass flags to hypervisor_flush_tlb
  2020-02-13 10:29       ` Wei Liu
@ 2020-02-13 11:56         ` Wei Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Liu @ 2020-02-13 11:56 UTC (permalink / raw)
  To: Durrant, Paul
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Michael Kelley, Jan Beulich,
	Xen Development List, Roger Pau Monné

On Thu, Feb 13, 2020 at 10:29:16AM +0000, Wei Liu wrote:
> On Thu, Feb 13, 2020 at 08:48:39AM +0000, Durrant, Paul wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monné <roger.pau@citrix.com>
> > > Sent: 12 February 2020 18:01
> > > To: Wei Liu <wl@xen.org>
> > > Cc: Xen Development List <xen-devel@lists.xenproject.org>; Durrant, Paul
> > > <pdurrant@amazon.co.uk>; Michael Kelley <mikelley@microsoft.com>; Wei Liu
> > > <liuwe@microsoft.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> > > <andrew.cooper3@citrix.com>
> > > Subject: Re: [PATCH 2/4] x86/hypervisor: pass flags to
> > > hypervisor_flush_tlb
> > > 
> > > On Wed, Feb 12, 2020 at 04:09:16PM +0000, Wei Liu wrote:
> > > > Hyper-V's L0 assisted flush has fine-grained control over what gets
> > > > flushed. We need all the flags available to make the best decisions
> > > > possible.
> > > >
> > > > No functional change because Xen's implementation doesn't care about
> > > > what is passed to it.
> > > 
> > > While it's certainly fine to pass a flags field with more information,
> > > the flush flags for Xen can also contain FLUSH_CACHE, FLUSH_VCPU_STATE
> > > or FLUSH_ROOT_PGTBL, can you add an assert that those never get passed
> > > to the flush hook?
> > > 
> > > IMO we should define a mask with FLUSH_TLB, FLUSH_TLB_GLOBAL,
> > > FLUSH_VA_VALID and FLUSH_ORDER_MASK and assert that those are the only
> > > valid flags to be used for the hypervisor assisted flush hook.
> > >
> > 
> > Agreed that this should be abstracted; we certainly don't want to have
> > various bits of Xen needing to know what hypervisor it is running on
> > top of.
> 
> OK. I can introduce a FLUSH_TLB_FLAGS for all things pertaining to TLB
> -- the four things mentioned in Roger's reply.

If I'm not mistaken flush_area_mask in Roger's patch already filters out
all the unwanted flags, so my code is correct as-is.

I can add

#define FLUSH_TLB_FLAGS_MASK (FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID | \
                              FLUSH_ORDER_MASK)

and use it in flush_area_mask to replace the longer string there.

Wei.


> 
> Wei.
> 
> > 
> >   Paul
> > 
> >  
> > > Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 4/4] x86/hyperv: L0 assisted TLB flush
  2020-02-12 17:43   ` Roger Pau Monné
  2020-02-13  9:49     ` Jan Beulich
@ 2020-02-13 12:20     ` Wei Liu
  2020-02-13 12:41       ` Roger Pau Monné
  1 sibling, 1 reply; 23+ messages in thread
From: Wei Liu @ 2020-02-13 12:20 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Xen Development List

On Wed, Feb 12, 2020 at 06:43:47PM +0100, Roger Pau Monné wrote:
> On Wed, Feb 12, 2020 at 04:09:18PM +0000, Wei Liu wrote:
> > Implement L0 assisted TLB flush for Xen on Hyper-V. It takes advantage
> > of several hypercalls:
> > 
> >  * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST
> >  * HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX
> >  * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE
> >  * HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX
> > 
> > Pick the most efficient hypercalls available.
> > 
> > Signed-off-by: Wei Liu <liuwe@microsoft.com>
> > ---
> >  xen/arch/x86/guest/hyperv/Makefile  |   1 +
> >  xen/arch/x86/guest/hyperv/private.h |   9 ++
> >  xen/arch/x86/guest/hyperv/tlb.c     | 172 +++++++++++++++++++++++++++-
> >  xen/arch/x86/guest/hyperv/util.c    |  72 ++++++++++++
> >  4 files changed, 253 insertions(+), 1 deletion(-)
> >  create mode 100644 xen/arch/x86/guest/hyperv/util.c
> > 
> > diff --git a/xen/arch/x86/guest/hyperv/Makefile b/xen/arch/x86/guest/hyperv/Makefile
> > index 18902c33e9..0e39410968 100644
> > --- a/xen/arch/x86/guest/hyperv/Makefile
> > +++ b/xen/arch/x86/guest/hyperv/Makefile
> > @@ -1,2 +1,3 @@
> >  obj-y += hyperv.o
> >  obj-y += tlb.o
> > +obj-y += util.o
> > diff --git a/xen/arch/x86/guest/hyperv/private.h b/xen/arch/x86/guest/hyperv/private.h
> > index 78e52f74ce..311f060495 100644
> > --- a/xen/arch/x86/guest/hyperv/private.h
> > +++ b/xen/arch/x86/guest/hyperv/private.h
> > @@ -24,12 +24,21 @@
> >  
> >  #include <xen/cpumask.h>
> >  #include <xen/percpu.h>
> > +#include <xen/types.h>
> >  
> >  DECLARE_PER_CPU(void *, hv_input_page);
> >  DECLARE_PER_CPU(void *, hv_vp_assist);
> >  DECLARE_PER_CPU(uint32_t, hv_vp_index);
> >  
> > +static inline uint32_t hv_vp_index(int cpu)
> 
> unsigned int for cpu.
> 
> > +{
> > +    return per_cpu(hv_vp_index, cpu);
> > +}
> > +
> >  int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
> >                       unsigned int flags);
> >  
> > +/* Returns number of banks, -ev if error */
> > +int cpumask_to_vpset(struct hv_vpset *vpset, const cpumask_t *mask);
> > +
> >  #endif /* __XEN_HYPERV_PRIVIATE_H__  */
> > diff --git a/xen/arch/x86/guest/hyperv/tlb.c b/xen/arch/x86/guest/hyperv/tlb.c
> > index 48f527229e..99b789d9e9 100644
> > --- a/xen/arch/x86/guest/hyperv/tlb.c
> > +++ b/xen/arch/x86/guest/hyperv/tlb.c
> > @@ -19,15 +19,185 @@
> >   * Copyright (c) 2020 Microsoft.
> >   */
> >  
> > +#include <xen/cpu.h>
> >  #include <xen/cpumask.h>
> >  #include <xen/errno.h>
> >  
> > +#include <asm/guest/hyperv.h>
> > +#include <asm/guest/hyperv-hcall.h>
> > +#include <asm/guest/hyperv-tlfs.h>
> > +
> >  #include "private.h"
> >  
> > +/*
> > + * It is possible to encode up to 4096 pages using the lower 12 bits
> > + * in an element of gva_list
> > + */
> > +#define HV_TLB_FLUSH_UNIT (4096 * PAGE_SIZE)
> > +#define ORDER_TO_BYTES(order) ((1ul << (order)) * PAGE_SIZE)
> 
> There are already some conversion functions in xen/mm.h
> (get_order_from_{bytes/pages}), maybe you could add a
> get_bytes_from_order helper there?
> 
> > +
> > +static unsigned int fill_gva_list(uint64_t *gva_list, const void *va,
> > +                                  unsigned int order)
> > +{
> > +    unsigned long start = (unsigned long)va;
> > +    unsigned long end = start + ORDER_TO_BYTES(order) - 1;
> > +    unsigned int n = 0;
> > +
> > +    do {
> > +        unsigned long remain = end > start ? end - start : 0;
> 
> I don't think you can get here with end == start?
> 
> As that's the condition of the loop, and order 0 is going to set
> end = start + 4096 - 1.

Correct. This can be simplified as remain = end - start .

> 
> > +
> > +        gva_list[n] = start & PAGE_MASK;
> > +
> > +        /*
> > +         * Use lower 12 bits to encode the number of additional pages
> > +         * to flush
> > +         */
> > +        if ( remain >= HV_TLB_FLUSH_UNIT )
> > +        {
> > +            gva_list[n] |= ~PAGE_MASK;
> > +            start += HV_TLB_FLUSH_UNIT;
> > +        }
> > +        else if ( remain )
> > +        {
> > +            gva_list[n] |= (remain - 1) >> PAGE_SHIFT;
> > +            start = end;
> > +        }
> > +
> > +        n++;
> > +    } while ( start < end );
> > +
> > +    return n;
> > +}
> > +
> > +static uint64_t flush_tlb_ex(const cpumask_t *mask, const void *va,
> > +                             unsigned int flags)
> > +{
> > +    struct hv_tlb_flush_ex *flush = this_cpu(hv_input_page);
> > +    int nr_banks;
> > +    unsigned int max_gvas;
> > +    unsigned int order = flags & FLUSH_ORDER_MASK;
> > +    uint64_t ret;
> > +
> > +    ASSERT(flush);
> > +    ASSERT(!local_irq_is_enabled());
> 
> Can you turn this into an if condition with ASSERT_UNREACHABLE and
> return ~0ULL? (as I think that signals an error).
> 

There is no need for that. This function will always be internal to
Hyper-V in the foreseeable future. If it is ever called with IRQ enabled
something is wrong with the code.

> > +
> > +    if ( !(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED) )
> > +        return ~0ULL;
> > +
> > +    flush->address_space = 0;
> > +    flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> > +    if ( !(flags & FLUSH_TLB_GLOBAL) )
> > +        flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
> > +
> > +    flush->hv_vp_set.valid_bank_mask = 0;
> > +    flush->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> > +
> > +    nr_banks = cpumask_to_vpset(&flush->hv_vp_set, mask);
> > +    if ( nr_banks < 0 )
> > +        return ~0ULL;
> > +
> > +    max_gvas =
> > +        (PAGE_SIZE - sizeof(*flush) - nr_banks *
> > +         sizeof(flush->hv_vp_set.bank_contents[0])) /
> > +        sizeof(uint64_t);       /* gva is represented as uint64_t */
> > +
> > +    /*
> > +     * Flush the entire address space if va is NULL or if there is not
> > +     * enough space for gva_list.
> > +     */
> > +    if ( !va || (ORDER_TO_BYTES(order) / HV_TLB_FLUSH_UNIT) > max_gvas )
> > +        ret = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX, 0,
> > +                                  nr_banks, virt_to_maddr(flush), 0);
> > +    else
> > +    {
> > +        uint64_t *gva_list = (uint64_t *)flush + sizeof(*flush) + nr_banks;
> 
> Don't you need nr_banks * sizeof(flush->hv_vp_set.bank_contents) in
> order to calculate the position of the gva_list?
> 

The pointer arithmetic is done on uint64_t pointers so it already takes
into account sizeof(bank_contents[0]).

> > +        unsigned int gvas = fill_gva_list(gva_list, va, order);
> > +
> > +        ret = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX,
> > +                                  gvas, nr_banks, virt_to_maddr(flush), 0);
> > +    }
> > +
> > +    return ret;
> > +}
> > +
> >  int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
> >                       unsigned int flags)
> >  {
> > -    return -EOPNOTSUPP;
> > +    unsigned long irq_flags;
> > +    struct hv_tlb_flush *flush = this_cpu(hv_input_page);
> > +    uint64_t ret;
> > +    unsigned int order = flags & FLUSH_ORDER_MASK;
> > +    unsigned int max_gvas;
> > +
> > +    ASSERT(flush);
> > +    ASSERT(!cpumask_empty(mask));
> > +
> > +    local_irq_save(irq_flags);
> > +
> > +    flush->address_space = 0;
> > +    flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> > +    flush->processor_mask = 0;
> > +    if ( !(flags & FLUSH_TLB_GLOBAL) )
> > +        flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
> > +
> > +    if ( cpumask_equal(mask, &cpu_online_map) )
> > +        flush->flags |= HV_FLUSH_ALL_PROCESSORS;
> > +    else
> > +    {
> > +        int cpu;
> 
> unsigned int.
> 

I picked int here and above because all the cpumask functions return
int. I don't mind changing it to unsigned int -- it makes no practical
difference.

> > +
> > +        /*
> > +         * Normally VP indices are in ascending order and match Xen's
> > +         * idea of CPU ids. Check the last index to see if VP index is
> > +         * >= 64. If so, we can skip setting up parameters for
> > +         * non-applicable hypercalls without looking further.
> > +         */
> > +        if ( hv_vp_index(cpumask_last(mask)) >= 64 )
> > +            goto do_ex_hypercall;
> > +
> > +        for_each_cpu ( cpu, mask )
> > +        {
> > +            uint32_t vpid = hv_vp_index(cpu);
> > +
> > +            if ( vpid > ms_hyperv.max_vp_index )
> > +            {
> > +                local_irq_restore(irq_flags);
> > +                return -ENXIO;
> > +            }
> > +
> > +            if ( vpid >= 64 )
> > +                goto do_ex_hypercall;
> > +
> > +            __set_bit(vpid, &flush->processor_mask);
> > +        }
> > +    }
> > +
> > +    max_gvas = (PAGE_SIZE - sizeof(*flush)) / sizeof(flush->gva_list[0]);
> > +
> > +    /*
> > +     * Flush the entire address space if va is NULL or if there is not
> > +     * enough space for gva_list.
> > +     */
> > +    if ( !va || (ORDER_TO_BYTES(order) / HV_TLB_FLUSH_UNIT) > max_gvas )
> > +        ret = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
> > +                              virt_to_maddr(flush), 0);
> > +    else
> > +    {
> > +        unsigned int gvas = fill_gva_list(flush->gva_list, va, order);
> > +
> > +        ret = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST, gvas, 0,
> > +                                  virt_to_maddr(flush), 0);
> > +    }
> > +
> > +    goto done;
> > +
> > + do_ex_hypercall:
> > +    ret = flush_tlb_ex(mask, va, flags);
> > +
> > + done:
> > +    local_irq_restore(irq_flags);
> > +
> > +    return ret & HV_HYPERCALL_RESULT_MASK;
> 
> Will this return an error code that uses the same space as Xen's errno
> values?
> 

No, it won't. It returns Hyper-V's status code (0 still means success).

I didn't think that was a big deal because non-zero values meant errors.
And the upper layer didn't care about the exact error values (yet).

> >  }
> >  
> >  /*
> > diff --git a/xen/arch/x86/guest/hyperv/util.c b/xen/arch/x86/guest/hyperv/util.c
> > new file mode 100644
> > index 0000000000..9d0b5f4a46
> > --- /dev/null
> > +++ b/xen/arch/x86/guest/hyperv/util.c
> > @@ -0,0 +1,72 @@
> > +/******************************************************************************
> > + * arch/x86/guest/hyperv/util.c
> > + *
> > + * Hyper-V utility functions
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> > + *
> > + * Copyright (c) 2020 Microsoft.
> > + */
> > +
> > +#include <xen/cpu.h>
> > +#include <xen/cpumask.h>
> > +#include <xen/errno.h>
> > +
> > +#include <asm/guest/hyperv.h>
> > +#include <asm/guest/hyperv-tlfs.h>
> > +
> > +#include "private.h"
> > +
> > +int cpumask_to_vpset(struct hv_vpset *vpset,
> > +                     const cpumask_t *mask)
> > +{
> > +    int nr = 1, cpu, vcpu_bank, vcpu_offset;
> > +    int max_banks = ms_hyperv.max_vp_index / 64;
> 
> I think nr whats to be int (to match the function return type), but
> the rest should be unsigned ints, specially because they are used as
> array indexes.
> 

OK.

> > +
> > +    /* Up to 64 banks can be represented by valid_bank_mask */
> > +    if ( max_banks >= 64 )
> > +        return -1;
> 
> E2BIG or some such?
> 

Right. That's better than -1.

> > +
> > +    /* Clear all banks to avoid flushing unwanted CPUs */
> > +    for ( vcpu_bank = 0; vcpu_bank <= max_banks; vcpu_bank++ )
> > +        vpset->bank_contents[vcpu_bank] = 0;
> > +
> > +    vpset->valid_bank_mask = 0;
> > +
> > +    for_each_cpu ( cpu, mask )
> > +    {
> > +        int vcpu = hv_vp_index(cpu);
> 
> unsigned int or uint32_t (which is the tyupe that hv_vp_index
> returns).
> 
> Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 1/4] x86/hyperv: misc cleanup
  2020-02-13  9:46   ` Jan Beulich
@ 2020-02-13 12:24     ` Wei Liu
  2020-02-13 13:32       ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Wei Liu @ 2020-02-13 12:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Xen Development List, Roger Pau Monné

On Thu, Feb 13, 2020 at 10:46:56AM +0100, Jan Beulich wrote:
> On 12.02.2020 17:09, Wei Liu wrote:
> > --- a/xen/arch/x86/guest/hyperv/private.h
> > +++ b/xen/arch/x86/guest/hyperv/private.h
> > @@ -26,6 +26,6 @@
> >  
> >  DECLARE_PER_CPU(void *, hv_input_page);
> >  DECLARE_PER_CPU(void *, hv_vp_assist);
> > -DECLARE_PER_CPU(unsigned int, hv_vp_index);
> > +DECLARE_PER_CPU(uint32_t, hv_vp_index);
> 
> You've got a co-maintainer ack, i.e. so be it, but FTR this is
> against what CodingStyle says, afaict: "Fixed width types should
> only be used when a fixed width quantity is meant (which for
> example may be a value read from or to be written to a register)."
> If you handed the address (perhaps indirectly, e.g. by converting
> to a physical one first) of this variable to Hyper-V, then things
> would be different. But
> 
>     this_cpu(hv_vp_index) = vp_index_msr;
> 
> would, if unsigned int was wider than 32 bits, not cause any

Did you mean "wouldn't" here?

> issues. And this is the only place the variable currently gets
> accessed, and I expect future uses will just be reads of it (as
> can be seen later in the series).
> 

Yes.

Wei.

> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 4/4] x86/hyperv: L0 assisted TLB flush
  2020-02-13  9:49     ` Jan Beulich
@ 2020-02-13 12:25       ` Wei Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Liu @ 2020-02-13 12:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Xen Development List, Roger Pau Monné

On Thu, Feb 13, 2020 at 10:49:39AM +0100, Jan Beulich wrote:
> >> diff --git a/xen/arch/x86/guest/hyperv/Makefile b/xen/arch/x86/guest/hyperv/Makefile
> >> index 18902c33e9..0e39410968 100644
> >> --- a/xen/arch/x86/guest/hyperv/Makefile
> >> +++ b/xen/arch/x86/guest/hyperv/Makefile
> >> @@ -1,2 +1,3 @@
> >>  obj-y += hyperv.o
> >>  obj-y += tlb.o
> >> +obj-y += util.o
> >> diff --git a/xen/arch/x86/guest/hyperv/private.h b/xen/arch/x86/guest/hyperv/private.h
> >> index 78e52f74ce..311f060495 100644
> >> --- a/xen/arch/x86/guest/hyperv/private.h
> >> +++ b/xen/arch/x86/guest/hyperv/private.h
> >> @@ -24,12 +24,21 @@
> >>  
> >>  #include <xen/cpumask.h>
> >>  #include <xen/percpu.h>
> >> +#include <xen/types.h>
> >>  
> >>  DECLARE_PER_CPU(void *, hv_input_page);
> >>  DECLARE_PER_CPU(void *, hv_vp_assist);
> >>  DECLARE_PER_CPU(uint32_t, hv_vp_index);
> >>  
> >> +static inline uint32_t hv_vp_index(int cpu)
> > 
> > unsigned int for cpu.
> 
> And also for the return type, as per my comment on patch 1.

Ack.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 4/4] x86/hyperv: L0 assisted TLB flush
  2020-02-13 12:20     ` Wei Liu
@ 2020-02-13 12:41       ` Roger Pau Monné
  2020-02-14 10:47         ` Wei Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Roger Pau Monné @ 2020-02-13 12:41 UTC (permalink / raw)
  To: Wei Liu
  Cc: Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Xen Development List

On Thu, Feb 13, 2020 at 12:20:33PM +0000, Wei Liu wrote:
> On Wed, Feb 12, 2020 at 06:43:47PM +0100, Roger Pau Monné wrote:
> > On Wed, Feb 12, 2020 at 04:09:18PM +0000, Wei Liu wrote:
> > > +static uint64_t flush_tlb_ex(const cpumask_t *mask, const void *va,
> > > +                             unsigned int flags)
> > > +{
> > > +    struct hv_tlb_flush_ex *flush = this_cpu(hv_input_page);
> > > +    int nr_banks;
> > > +    unsigned int max_gvas;
> > > +    unsigned int order = flags & FLUSH_ORDER_MASK;
> > > +    uint64_t ret;
> > > +
> > > +    ASSERT(flush);
> > > +    ASSERT(!local_irq_is_enabled());
> > 
> > Can you turn this into an if condition with ASSERT_UNREACHABLE and
> > return ~0ULL? (as I think that signals an error).
> > 
> 
> There is no need for that. This function will always be internal to
> Hyper-V in the foreseeable future. If it is ever called with IRQ enabled
> something is wrong with the code.

But iff it ever manages to be called violating one of those conditions
things will go badly I assume?

It would be better to stay on the safe side and simply return an error
when the conditions are no meet, and assert in the debug build.

> 
> > > +
> > > +    if ( !(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED) )
> > > +        return ~0ULL;
> > > +
> > > +    flush->address_space = 0;
> > > +    flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> > > +    if ( !(flags & FLUSH_TLB_GLOBAL) )
> > > +        flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
> > > +
> > > +    flush->hv_vp_set.valid_bank_mask = 0;
> > > +    flush->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> > > +
> > > +    nr_banks = cpumask_to_vpset(&flush->hv_vp_set, mask);
> > > +    if ( nr_banks < 0 )
> > > +        return ~0ULL;
> > > +
> > > +    max_gvas =
> > > +        (PAGE_SIZE - sizeof(*flush) - nr_banks *
> > > +         sizeof(flush->hv_vp_set.bank_contents[0])) /
> > > +        sizeof(uint64_t);       /* gva is represented as uint64_t */
> > > +
> > > +    /*
> > > +     * Flush the entire address space if va is NULL or if there is not
> > > +     * enough space for gva_list.
> > > +     */
> > > +    if ( !va || (ORDER_TO_BYTES(order) / HV_TLB_FLUSH_UNIT) > max_gvas )
> > > +        ret = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX, 0,
> > > +                                  nr_banks, virt_to_maddr(flush), 0);
> > > +    else
> > > +    {
> > > +        uint64_t *gva_list = (uint64_t *)flush + sizeof(*flush) + nr_banks;
> > 
> > Don't you need nr_banks * sizeof(flush->hv_vp_set.bank_contents) in
> > order to calculate the position of the gva_list?
> > 
> 
> The pointer arithmetic is done on uint64_t pointers so it already takes
> into account sizeof(bank_contents[0]).

Oh, then the sizeof(*flush) should be divided by sizeof(uint64_t)?

> > > +        unsigned int gvas = fill_gva_list(gva_list, va, order);
> > > +
> > > +        ret = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX,
> > > +                                  gvas, nr_banks, virt_to_maddr(flush), 0);
> > > +    }
> > > +
> > > +    return ret;
> > > +}
> > > +
> > >  int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
> > >                       unsigned int flags)
> > >  {
> > > -    return -EOPNOTSUPP;
> > > +    unsigned long irq_flags;
> > > +    struct hv_tlb_flush *flush = this_cpu(hv_input_page);
> > > +    uint64_t ret;
> > > +    unsigned int order = flags & FLUSH_ORDER_MASK;
> > > +    unsigned int max_gvas;
> > > +
> > > +    ASSERT(flush);
> > > +    ASSERT(!cpumask_empty(mask));
> > > +
> > > +    local_irq_save(irq_flags);
> > > +
> > > +    flush->address_space = 0;
> > > +    flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> > > +    flush->processor_mask = 0;
> > > +    if ( !(flags & FLUSH_TLB_GLOBAL) )
> > > +        flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
> > > +
> > > +    if ( cpumask_equal(mask, &cpu_online_map) )
> > > +        flush->flags |= HV_FLUSH_ALL_PROCESSORS;
> > > +    else
> > > +    {
> > > +        int cpu;
> > 
> > unsigned int.
> > 
> 
> I picked int here and above because all the cpumask functions return
> int. I don't mind changing it to unsigned int -- it makes no practical
> difference.

Those should likely return unsigned ints also, as I don't think
cpumask can return errors. I prefer unsigned int, since negative cpu
values make no sense.

> > > +
> > > +        /*
> > > +         * Normally VP indices are in ascending order and match Xen's
> > > +         * idea of CPU ids. Check the last index to see if VP index is
> > > +         * >= 64. If so, we can skip setting up parameters for
> > > +         * non-applicable hypercalls without looking further.
> > > +         */
> > > +        if ( hv_vp_index(cpumask_last(mask)) >= 64 )
> > > +            goto do_ex_hypercall;
> > > +
> > > +        for_each_cpu ( cpu, mask )
> > > +        {
> > > +            uint32_t vpid = hv_vp_index(cpu);
> > > +
> > > +            if ( vpid > ms_hyperv.max_vp_index )
> > > +            {
> > > +                local_irq_restore(irq_flags);
> > > +                return -ENXIO;
> > > +            }
> > > +
> > > +            if ( vpid >= 64 )
> > > +                goto do_ex_hypercall;
> > > +
> > > +            __set_bit(vpid, &flush->processor_mask);
> > > +        }
> > > +    }
> > > +
> > > +    max_gvas = (PAGE_SIZE - sizeof(*flush)) / sizeof(flush->gva_list[0]);
> > > +
> > > +    /*
> > > +     * Flush the entire address space if va is NULL or if there is not
> > > +     * enough space for gva_list.
> > > +     */
> > > +    if ( !va || (ORDER_TO_BYTES(order) / HV_TLB_FLUSH_UNIT) > max_gvas )
> > > +        ret = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
> > > +                              virt_to_maddr(flush), 0);
> > > +    else
> > > +    {
> > > +        unsigned int gvas = fill_gva_list(flush->gva_list, va, order);
> > > +
> > > +        ret = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST, gvas, 0,
> > > +                                  virt_to_maddr(flush), 0);
> > > +    }
> > > +
> > > +    goto done;
> > > +
> > > + do_ex_hypercall:
> > > +    ret = flush_tlb_ex(mask, va, flags);
> > > +
> > > + done:
> > > +    local_irq_restore(irq_flags);
> > > +
> > > +    return ret & HV_HYPERCALL_RESULT_MASK;
> > 
> > Will this return an error code that uses the same space as Xen's errno
> > values?
> > 
> 
> No, it won't. It returns Hyper-V's status code (0 still means success).
> 
> I didn't think that was a big deal because non-zero values meant errors.
> And the upper layer didn't care about the exact error values (yet).

Hm, I would rather have this return an error value in the errno.h
range. ie:

return ret & HV_HYPERCALL_RESULT_MASK ? -EINVAL : 0;

Or something along this lines, but long term you will need some kind
of mapping between HyperV and Xen error codes IMO.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 1/4] x86/hyperv: misc cleanup
  2020-02-13 12:24     ` Wei Liu
@ 2020-02-13 13:32       ` Jan Beulich
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2020-02-13 13:32 UTC (permalink / raw)
  To: Wei Liu
  Cc: Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Xen Development List, Roger Pau Monné

On 13.02.2020 13:24, Wei Liu wrote:
> On Thu, Feb 13, 2020 at 10:46:56AM +0100, Jan Beulich wrote:
>> On 12.02.2020 17:09, Wei Liu wrote:
>>> --- a/xen/arch/x86/guest/hyperv/private.h
>>> +++ b/xen/arch/x86/guest/hyperv/private.h
>>> @@ -26,6 +26,6 @@
>>>  
>>>  DECLARE_PER_CPU(void *, hv_input_page);
>>>  DECLARE_PER_CPU(void *, hv_vp_assist);
>>> -DECLARE_PER_CPU(unsigned int, hv_vp_index);
>>> +DECLARE_PER_CPU(uint32_t, hv_vp_index);
>>
>> You've got a co-maintainer ack, i.e. so be it, but FTR this is
>> against what CodingStyle says, afaict: "Fixed width types should
>> only be used when a fixed width quantity is meant (which for
>> example may be a value read from or to be written to a register)."
>> If you handed the address (perhaps indirectly, e.g. by converting
>> to a physical one first) of this variable to Hyper-V, then things
>> would be different. But
>>
>>     this_cpu(hv_vp_index) = vp_index_msr;
>>
>> would, if unsigned int was wider than 32 bits, not cause any
> 
> Did you mean "wouldn't" here?

I don't think so - the not comes after the second comma.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xen-devel] [PATCH 4/4] x86/hyperv: L0 assisted TLB flush
  2020-02-13 12:41       ` Roger Pau Monné
@ 2020-02-14 10:47         ` Wei Liu
  0 siblings, 0 replies; 23+ messages in thread
From: Wei Liu @ 2020-02-14 10:47 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Wei Liu, Wei Liu, Andrew Cooper, Paul Durrant, Michael Kelley,
	Jan Beulich, Xen Development List

On Thu, Feb 13, 2020 at 01:41:27PM +0100, Roger Pau Monné wrote:
> On Thu, Feb 13, 2020 at 12:20:33PM +0000, Wei Liu wrote:
> > On Wed, Feb 12, 2020 at 06:43:47PM +0100, Roger Pau Monné wrote:
> > > On Wed, Feb 12, 2020 at 04:09:18PM +0000, Wei Liu wrote:
> > > > +static uint64_t flush_tlb_ex(const cpumask_t *mask, const void *va,
> > > > +                             unsigned int flags)
> > > > +{
> > > > +    struct hv_tlb_flush_ex *flush = this_cpu(hv_input_page);
> > > > +    int nr_banks;
> > > > +    unsigned int max_gvas;
> > > > +    unsigned int order = flags & FLUSH_ORDER_MASK;
> > > > +    uint64_t ret;
> > > > +
> > > > +    ASSERT(flush);
> > > > +    ASSERT(!local_irq_is_enabled());
> > > 
> > > Can you turn this into an if condition with ASSERT_UNREACHABLE and
> > > return ~0ULL? (as I think that signals an error).
> > > 
> > 
> > There is no need for that. This function will always be internal to
> > Hyper-V in the foreseeable future. If it is ever called with IRQ enabled
> > something is wrong with the code.
> 
> But iff it ever manages to be called violating one of those conditions
> things will go badly I assume?
> 
> It would be better to stay on the safe side and simply return an error
> when the conditions are no meet, and assert in the debug build.

OK.

> 
> > 
> > > > +
> > > > +    if ( !(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED) )
> > > > +        return ~0ULL;
> > > > +
> > > > +    flush->address_space = 0;
> > > > +    flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
> > > > +    if ( !(flags & FLUSH_TLB_GLOBAL) )
> > > > +        flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
> > > > +
> > > > +    flush->hv_vp_set.valid_bank_mask = 0;
> > > > +    flush->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> > > > +
> > > > +    nr_banks = cpumask_to_vpset(&flush->hv_vp_set, mask);
> > > > +    if ( nr_banks < 0 )
> > > > +        return ~0ULL;
> > > > +
> > > > +    max_gvas =
> > > > +        (PAGE_SIZE - sizeof(*flush) - nr_banks *
> > > > +         sizeof(flush->hv_vp_set.bank_contents[0])) /
> > > > +        sizeof(uint64_t);       /* gva is represented as uint64_t */
> > > > +
> > > > +    /*
> > > > +     * Flush the entire address space if va is NULL or if there is not
> > > > +     * enough space for gva_list.
> > > > +     */
> > > > +    if ( !va || (ORDER_TO_BYTES(order) / HV_TLB_FLUSH_UNIT) > max_gvas )
> > > > +        ret = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX, 0,
> > > > +                                  nr_banks, virt_to_maddr(flush), 0);
> > > > +    else
> > > > +    {
> > > > +        uint64_t *gva_list = (uint64_t *)flush + sizeof(*flush) + nr_banks;
> > > 
> > > Don't you need nr_banks * sizeof(flush->hv_vp_set.bank_contents) in
> > > order to calculate the position of the gva_list?
> > > 
> > 
> > The pointer arithmetic is done on uint64_t pointers so it already takes
> > into account sizeof(bank_contents[0]).
> 
> Oh, then the sizeof(*flush) should be divided by sizeof(uint64_t)?
> 

Yes. I think so. Thanks for catching this.

[...]
> > > > + do_ex_hypercall:
> > > > +    ret = flush_tlb_ex(mask, va, flags);
> > > > +
> > > > + done:
> > > > +    local_irq_restore(irq_flags);
> > > > +
> > > > +    return ret & HV_HYPERCALL_RESULT_MASK;
> > > 
> > > Will this return an error code that uses the same space as Xen's errno
> > > values?
> > > 
> > 
> > No, it won't. It returns Hyper-V's status code (0 still means success).
> > 
> > I didn't think that was a big deal because non-zero values meant errors.
> > And the upper layer didn't care about the exact error values (yet).
> 
> Hm, I would rather have this return an error value in the errno.h
> range. ie:
> 
> return ret & HV_HYPERCALL_RESULT_MASK ? -EINVAL : 0;
> 

Sure this can be done. I would use ENXIO rather than EINVAL though.

> Or something along this lines, but long term you will need some kind
> of mapping between HyperV and Xen error codes IMO.
> 

Yes. When we need more sophisticated handling of error codes.

Wei.

> Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, back to index

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-12 16:09 [Xen-devel] [PATCH 0/4] Xen on Hyper-V: Implement L0 assisted TLB flush Wei Liu
2020-02-12 16:09 ` [Xen-devel] [PATCH 1/4] x86/hyperv: misc cleanup Wei Liu
2020-02-12 16:53   ` Roger Pau Monné
2020-02-13 10:30     ` Wei Liu
2020-02-13  8:43   ` Durrant, Paul
2020-02-13  9:46   ` Jan Beulich
2020-02-13 12:24     ` Wei Liu
2020-02-13 13:32       ` Jan Beulich
2020-02-12 16:09 ` [Xen-devel] [PATCH 2/4] x86/hypervisor: pass flags to hypervisor_flush_tlb Wei Liu
2020-02-12 17:00   ` Roger Pau Monné
2020-02-13  8:48     ` Durrant, Paul
2020-02-13 10:29       ` Wei Liu
2020-02-13 11:56         ` Wei Liu
2020-02-12 16:09 ` [Xen-devel] [PATCH 3/4] x86/hyperv: skeleton for L0 assisted TLB flush Wei Liu
2020-02-12 17:09   ` Roger Pau Monné
2020-02-12 23:01     ` Wei Liu
2020-02-12 16:09 ` [Xen-devel] [PATCH 4/4] x86/hyperv: " Wei Liu
2020-02-12 17:43   ` Roger Pau Monné
2020-02-13  9:49     ` Jan Beulich
2020-02-13 12:25       ` Wei Liu
2020-02-13 12:20     ` Wei Liu
2020-02-13 12:41       ` Roger Pau Monné
2020-02-14 10:47         ` Wei Liu

Xen-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/xen-devel/0 xen-devel/git/0.git
	git clone --mirror https://lore.kernel.org/xen-devel/1 xen-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 xen-devel xen-devel/ https://lore.kernel.org/xen-devel \
		xen-devel@lists.xenproject.org xen-devel@lists.xen.org
	public-inbox-index xen-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.xenproject.lists.xen-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git