[Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode
@ 2020-01-27 18:11 Roger Pau Monne
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 1/7] x86/tlb: fix NEED_FLUSH return type Roger Pau Monne
                   ` (7 more replies)
  0 siblings, 8 replies; 28+ messages in thread
From: Roger Pau Monne @ 2020-01-27 18:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan, Roger Pau Monne

Hello,

The following series aims to improve the TLB flush times when running
nested Xen, and it's specially beneficial when running in shim mode.

Only the HAP guest TLB flush is improved, the shadow paging TLB flush is
left as-is, and can be improved later if there's interest.

For a reference on the performance improvement see patch #7, as it's a
huge increase which can benefit other guests using assisted TLB flushes,
and also the ones using the viridian TLB flush assist (ie: Windows).

Thanks, Roger.

Roger Pau Monne (7):
  x86/tlb: fix NEED_FLUSH return type
  x86/hvm: allow ASID flush when v != current
  x86/paging: add TLB flush hooks
  x86/hap: improve hypervisor assisted guest TLB flush
  x86/tlb: introduce a flush guests TLB flag
  x86/tlb: allow disabling the TLB clock
  x86/tlb: use Xen L0 assisted TLB flush when available

 xen/arch/x86/flushtlb.c                | 24 ++++++---
 xen/arch/x86/guest/hypervisor.c        | 10 ++++
 xen/arch/x86/guest/xen/xen.c           |  6 +++
 xen/arch/x86/hvm/asid.c                |  6 +--
 xen/arch/x86/hvm/hvm.c                 | 51 ++----------------
 xen/arch/x86/mm/hap/hap.c              | 48 +++++++++++++++++
 xen/arch/x86/mm/shadow/common.c        | 71 +++++++++++++++++++++++---
 xen/arch/x86/mm/shadow/hvm.c           |  2 +-
 xen/arch/x86/mm/shadow/multi.c         | 17 +++---
 xen/arch/x86/smp.c                     | 11 ++++
 xen/include/asm-x86/flushtlb.h         | 21 +++++++-
 xen/include/asm-x86/guest/hypervisor.h | 17 ++++++
 xen/include/asm-x86/hap.h              |  3 ++
 xen/include/asm-x86/shadow.h           | 12 +++++
 14 files changed, 220 insertions(+), 79 deletions(-)

-- 
2.25.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v3 1/7] x86/tlb: fix NEED_FLUSH return type
  2020-01-27 18:11 [Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
@ 2020-01-27 18:11 ` Roger Pau Monne
  2020-01-28 14:17   ` Wei Liu
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 2/7] x86/hvm: allow ASID flush when v != current Roger Pau Monne
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monne @ 2020-01-27 18:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monne

The returned type wants to be bool instead of int.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/include/asm-x86/flushtlb.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 434821aaf3..2cfe4e6e97 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -42,7 +42,7 @@ static inline void page_set_tlbflush_timestamp(struct page_info *page)
  * @lastuse_stamp is a timestamp taken when the PFN we are testing was last 
  * used for a purpose that may have caused the CPU's TLB to become tainted.
  */
-static inline int NEED_FLUSH(u32 cpu_stamp, u32 lastuse_stamp)
+static inline bool NEED_FLUSH(u32 cpu_stamp, u32 lastuse_stamp)
 {
     u32 curr_time = tlbflush_current_time();
     /*
-- 
2.25.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v3 2/7] x86/hvm: allow ASID flush when v != current
  2020-01-27 18:11 [Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 1/7] x86/tlb: fix NEED_FLUSH return type Roger Pau Monne
@ 2020-01-27 18:11 ` Roger Pau Monne
  2020-02-06 11:15   ` Wei Liu
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 3/7] x86/paging: add TLB flush hooks Roger Pau Monne
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monne @ 2020-01-27 18:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monne

Current implementation of hvm_asid_flush_vcpu is not safe to use
unless the target vCPU is either paused or the currently running one,
as it modifies the generation without any locking.

Fix this by using atomic operations when accessing the generation
field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This
allows to safely flush the current ASID generation. Note that for the
flush to take effect if the vCPU is currently running a vmexit is
required.

Note the same could be achieved by introducing an extra field to
hvm_vcpu_asid that signals hvm_asid_handle_vmenter the need to call
hvm_asid_flush_vcpu on the given vCPU before vmentry, this however
seems unnecessary as hvm_asid_flush_vcpu itself only sets two vCPU
fields to 0, so there's no need to delay this to the vmentry ASID
helper.

This is not a bugfix as no callers that would violate the assumptions
listed in the first paragraph have been found, but a preparatory
change in order to allow remote flushing of HVM vCPUs.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/hvm/asid.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/asid.c b/xen/arch/x86/hvm/asid.c
index 9d3c671a5f..80b73da89b 100644
--- a/xen/arch/x86/hvm/asid.c
+++ b/xen/arch/x86/hvm/asid.c
@@ -82,7 +82,7 @@ void hvm_asid_init(int nasids)

 void hvm_asid_flush_vcpu_asid(struct hvm_vcpu_asid *asid)
 {
-    asid->generation = 0;
+    write_atomic(&asid->generation, 0);
 }

 void hvm_asid_flush_vcpu(struct vcpu *v)
@@ -120,7 +120,7 @@ bool_t hvm_asid_handle_vmenter(struct hvm_vcpu_asid *asid)
         goto disabled;

     /* Test if VCPU has valid ASID. */
-    if ( asid->generation == data->core_asid_generation )
+    if ( read_atomic(&asid->generation) == data->core_asid_generation )
         return 0;

     /* If there are no free ASIDs, need to go to a new generation */
@@ -134,7 +134,7 @@ bool_t hvm_asid_handle_vmenter(struct hvm_vcpu_asid *asid)

     /* Now guaranteed to be a free ASID. */
     asid->asid = data->next_asid++;
-    asid->generation = data->core_asid_generation;
+    write_atomic(&asid->generation, data->core_asid_generation);

     /*
      * When we assign ASID 1, flush all TLB entries as we are starting a new
-- 
2.25.0

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v3 3/7] x86/paging: add TLB flush hooks
  2020-01-27 18:11 [Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 1/7] x86/tlb: fix NEED_FLUSH return type Roger Pau Monne
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 2/7] x86/hvm: allow ASID flush when v != current Roger Pau Monne
@ 2020-01-27 18:11 ` Roger Pau Monne
  2020-02-05 20:52   ` Wei Liu
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 4/7] x86/hap: improve hypervisor assisted guest TLB flush Roger Pau Monne
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monne @ 2020-01-27 18:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan, Roger Pau Monne

Add shadow and hap implementation specific helpers to perform guest
TLB flushes. Note that the code for both is exactly the same at the
moment, and is copied from hvm_flush_vcpu_tlb. This will be changed by
further patches that will add implementation specific optimizations to
them.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/hvm/hvm.c          | 51 ++----------------------------
 xen/arch/x86/mm/hap/hap.c       | 54 ++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/shadow/common.c | 55 +++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/shadow/multi.c  |  1 -
 xen/include/asm-x86/hap.h       |  3 ++
 xen/include/asm-x86/shadow.h    | 12 +++++++
 6 files changed, 127 insertions(+), 49 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0b93609a82..96c419f0ef 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3986,55 +3986,10 @@ static void hvm_s3_resume(struct domain *d)
 bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
                         void *ctxt)
 {
-    static DEFINE_PER_CPU(cpumask_t, flush_cpumask);
-    cpumask_t *mask = &this_cpu(flush_cpumask);
-    struct domain *d = current->domain;
-    struct vcpu *v;
-
-    /* Avoid deadlock if more than one vcpu tries this at the same time. */
-    if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
-        return false;
-
-    /* Pause all other vcpus. */
-    for_each_vcpu ( d, v )
-        if ( v != current && flush_vcpu(ctxt, v) )
-            vcpu_pause_nosync(v);
-
-    /* Now that all VCPUs are signalled to deschedule, we wait... */
-    for_each_vcpu ( d, v )
-        if ( v != current && flush_vcpu(ctxt, v) )
-            while ( !vcpu_runnable(v) && v->is_running )
-                cpu_relax();
-
-    /* All other vcpus are paused, safe to unlock now. */
-    spin_unlock(&d->hypercall_deadlock_mutex);
-
-    cpumask_clear(mask);
-
-    /* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). */
-    for_each_vcpu ( d, v )
-    {
-        unsigned int cpu;
-
-        if ( !flush_vcpu(ctxt, v) )
-            continue;
-
-        paging_update_cr3(v, false);
+    struct domain *currd = current->domain;
 
-        cpu = read_atomic(&v->dirty_cpu);
-        if ( is_vcpu_dirty_cpu(cpu) )
-            __cpumask_set_cpu(cpu, mask);
-    }
-
-    /* Flush TLBs on all CPUs with dirty vcpu state. */
-    flush_tlb_mask(mask);
-
-    /* Done. */
-    for_each_vcpu ( d, v )
-        if ( v != current && flush_vcpu(ctxt, v) )
-            vcpu_unpause(v);
-
-    return true;
+    return shadow_mode_enabled(currd) ? shadow_flush_tlb(flush_vcpu, ctxt)
+                                      : hap_flush_tlb(flush_vcpu, ctxt);
 }
 
 static bool always_flush(void *ctxt, struct vcpu *v)
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 3d93f3451c..6894c1aa38 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -669,6 +669,60 @@ static void hap_update_cr3(struct vcpu *v, int do_locking, bool noflush)
     hvm_update_guest_cr3(v, noflush);
 }
 
+bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
+                   void *ctxt)
+{
+    static DEFINE_PER_CPU(cpumask_t, flush_cpumask);
+    cpumask_t *mask = &this_cpu(flush_cpumask);
+    struct domain *d = current->domain;
+    struct vcpu *v;
+
+    /* Avoid deadlock if more than one vcpu tries this at the same time. */
+    if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
+        return false;
+
+    /* Pause all other vcpus. */
+    for_each_vcpu ( d, v )
+        if ( v != current && flush_vcpu(ctxt, v) )
+            vcpu_pause_nosync(v);
+
+    /* Now that all VCPUs are signalled to deschedule, we wait... */
+    for_each_vcpu ( d, v )
+        if ( v != current && flush_vcpu(ctxt, v) )
+            while ( !vcpu_runnable(v) && v->is_running )
+                cpu_relax();
+
+    /* All other vcpus are paused, safe to unlock now. */
+    spin_unlock(&d->hypercall_deadlock_mutex);
+
+    cpumask_clear(mask);
+
+    /* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). */
+    for_each_vcpu ( d, v )
+    {
+        unsigned int cpu;
+
+        if ( !flush_vcpu(ctxt, v) )
+            continue;
+
+        paging_update_cr3(v, false);
+
+        cpu = read_atomic(&v->dirty_cpu);
+        if ( is_vcpu_dirty_cpu(cpu) )
+            __cpumask_set_cpu(cpu, mask);
+    }
+
+    /* Flush TLBs on all CPUs with dirty vcpu state. */
+    flush_tlb_mask(mask);
+
+    /* Done. */
+    for_each_vcpu ( d, v )
+        if ( v != current && flush_vcpu(ctxt, v) )
+            vcpu_unpause(v);
+
+    return true;
+}
+
 const struct paging_mode *
 hap_paging_get_mode(struct vcpu *v)
 {
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 6212ec2c4a..ee90e55b41 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -3357,6 +3357,61 @@ out:
     return rc;
 }
 
+/* Fluhs TLB of selected vCPUs. */
+bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
+                      void *ctxt)
+{
+    static DEFINE_PER_CPU(cpumask_t, flush_cpumask);
+    cpumask_t *mask = &this_cpu(flush_cpumask);
+    struct domain *d = current->domain;
+    struct vcpu *v;
+
+    /* Avoid deadlock if more than one vcpu tries this at the same time. */
+    if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
+        return false;
+
+    /* Pause all other vcpus. */
+    for_each_vcpu ( d, v )
+        if ( v != current && flush_vcpu(ctxt, v) )
+            vcpu_pause_nosync(v);
+
+    /* Now that all VCPUs are signalled to deschedule, we wait... */
+    for_each_vcpu ( d, v )
+        if ( v != current && flush_vcpu(ctxt, v) )
+            while ( !vcpu_runnable(v) && v->is_running )
+                cpu_relax();
+
+    /* All other vcpus are paused, safe to unlock now. */
+    spin_unlock(&d->hypercall_deadlock_mutex);
+
+    cpumask_clear(mask);
+
+    /* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). */
+    for_each_vcpu ( d, v )
+    {
+        unsigned int cpu;
+
+        if ( !flush_vcpu(ctxt, v) )
+            continue;
+
+        paging_update_cr3(v, false);
+
+        cpu = read_atomic(&v->dirty_cpu);
+        if ( is_vcpu_dirty_cpu(cpu) )
+            __cpumask_set_cpu(cpu, mask);
+    }
+
+    /* Flush TLBs on all CPUs with dirty vcpu state. */
+    flush_tlb_mask(mask);
+
+    /* Done. */
+    for_each_vcpu ( d, v )
+        if ( v != current && flush_vcpu(ctxt, v) )
+            vcpu_unpause(v);
+
+    return true;
+}
+
 /**************************************************************************/
 /* Shadow-control XEN_DOMCTL dispatcher */
 
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 26798b317c..dfe264cf83 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -4157,7 +4157,6 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
     if ( do_locking ) paging_unlock(v->domain);
 }
 
-
 /**************************************************************************/
 /* Functions to revoke guest rights */
 
diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h
index b94bfb4ed0..0c6aa26b9b 100644
--- a/xen/include/asm-x86/hap.h
+++ b/xen/include/asm-x86/hap.h
@@ -46,6 +46,9 @@ int   hap_track_dirty_vram(struct domain *d,
 extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
 int hap_set_allocation(struct domain *d, unsigned int pages, bool *preempted);
 
+bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
+                   void *ctxt);
+
 #endif /* XEN_HAP_H */
 
 /*
diff --git a/xen/include/asm-x86/shadow.h b/xen/include/asm-x86/shadow.h
index 907c71f497..3c1f6df478 100644
--- a/xen/include/asm-x86/shadow.h
+++ b/xen/include/asm-x86/shadow.h
@@ -95,6 +95,10 @@ void shadow_blow_tables_per_domain(struct domain *d);
 int shadow_set_allocation(struct domain *d, unsigned int pages,
                           bool *preempted);
 
+/* Flush the TLB of the selected vCPUs. */
+bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
+                      void *ctxt);
+
 #else /* !CONFIG_SHADOW_PAGING */
 
 #define shadow_teardown(d, p) ASSERT(is_pv_domain(d))
@@ -106,6 +110,14 @@ int shadow_set_allocation(struct domain *d, unsigned int pages,
 #define shadow_set_allocation(d, pages, preempted) \
     ({ ASSERT_UNREACHABLE(); -EOPNOTSUPP; })
 
+static inline bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt,
+                                                       struct vcpu *v),
+                                    void *ctxt)
+{
+    ASSERT_UNREACHABLE();
+    return -EOPNOTSUPP;
+}
+
 static inline void sh_remove_shadows(struct domain *d, mfn_t gmfn,
                                      int fast, int all) {}
 
-- 
2.25.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v3 4/7] x86/hap: improve hypervisor assisted guest TLB flush
  2020-01-27 18:11 [Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
                   ` (2 preceding siblings ...)
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 3/7] x86/paging: add TLB flush hooks Roger Pau Monne
@ 2020-01-27 18:11 ` Roger Pau Monne
  2020-02-06 11:23   ` Wei Liu
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 5/7] x86/tlb: introduce a flush guests TLB flag Roger Pau Monne
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monne @ 2020-01-27 18:11 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Roger Pau Monne

The current implementation of the hypervisor assisted flush for HAP is
extremely inefficient.

First of all there's no need to call paging_update_cr3, as the only
relevant part of that function when doing a flush is the ASID vCPU
flush, so just call that function directly.

Since hvm_asid_flush_vcpu is protected against concurrent callers by
using atomic operations there's no need anymore to pause the affected
vCPUs.

Finally the global TLB flush performed by flush_tlb_mask is also not
necessary, since we only want to flush the guest TLB state it's enough
to trigger a vmexit on the pCPUs currently holding any vCPU state, as
such vmexit will already perform an ASID/VPID update, and thus clear
the guest TLB.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/mm/hap/hap.c | 48 +++++++++++++++++----------------------
 1 file changed, 21 insertions(+), 27 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 6894c1aa38..401eaf8026 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -669,32 +669,24 @@ static void hap_update_cr3(struct vcpu *v, int do_locking, bool noflush)
     hvm_update_guest_cr3(v, noflush);
 }
 
+static void do_flush(void *data)
+{
+    cpumask_t *mask = data;
+    unsigned int cpu = smp_processor_id();
+
+    ASSERT(cpumask_test_cpu(cpu, mask));
+    cpumask_clear_cpu(cpu, mask);
+}
+
 bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
                    void *ctxt)
 {
     static DEFINE_PER_CPU(cpumask_t, flush_cpumask);
     cpumask_t *mask = &this_cpu(flush_cpumask);
     struct domain *d = current->domain;
+    unsigned int this_cpu = smp_processor_id();
     struct vcpu *v;
 
-    /* Avoid deadlock if more than one vcpu tries this at the same time. */
-    if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
-        return false;
-
-    /* Pause all other vcpus. */
-    for_each_vcpu ( d, v )
-        if ( v != current && flush_vcpu(ctxt, v) )
-            vcpu_pause_nosync(v);
-
-    /* Now that all VCPUs are signalled to deschedule, we wait... */
-    for_each_vcpu ( d, v )
-        if ( v != current && flush_vcpu(ctxt, v) )
-            while ( !vcpu_runnable(v) && v->is_running )
-                cpu_relax();
-
-    /* All other vcpus are paused, safe to unlock now. */
-    spin_unlock(&d->hypercall_deadlock_mutex);
-
     cpumask_clear(mask);
 
     /* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). */
@@ -705,20 +697,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
         if ( !flush_vcpu(ctxt, v) )
             continue;
 
-        paging_update_cr3(v, false);
+        hvm_asid_flush_vcpu(v);
 
         cpu = read_atomic(&v->dirty_cpu);
-        if ( is_vcpu_dirty_cpu(cpu) )
+        if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) )
             __cpumask_set_cpu(cpu, mask);
     }
 
-    /* Flush TLBs on all CPUs with dirty vcpu state. */
-    flush_tlb_mask(mask);
-
-    /* Done. */
-    for_each_vcpu ( d, v )
-        if ( v != current && flush_vcpu(ctxt, v) )
-            vcpu_unpause(v);
+    /*
+     * Trigger a vmexit on all pCPUs with dirty vCPU state in order to force an
+     * ASID/VPIT change and hence accomplish a guest TLB flush. Note that vCPUs
+     * not currently running will already be flushed when scheduled because of
+     * the ASID tickle done in the loop above.
+     */
+    on_selected_cpus(mask, do_flush, mask, 0);
+    while ( !cpumask_empty(mask) )
+        cpu_relax();
 
     return true;
 }
-- 
2.25.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v3 5/7] x86/tlb: introduce a flush guests TLB flag
  2020-01-27 18:11 [Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
                   ` (3 preceding siblings ...)
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 4/7] x86/hap: improve hypervisor assisted guest TLB flush Roger Pau Monne
@ 2020-01-27 18:11 ` Roger Pau Monne
  2020-02-06 12:59   ` Wei Liu
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 6/7] x86/tlb: allow disabling the TLB clock Roger Pau Monne
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monne @ 2020-01-27 18:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan, Roger Pau Monne

Introduce a specific flag to request a HVM guest TLB flush, which is
an ASID/VPID tickle that forces a linear TLB flush for all HVM guests.

This was previously unconditionally done in each pre_flush call, but
that's not required: HVM guests not using shadow don't require linear
TLB flushes as Xen doesn't modify the guest page tables in that case
(ie: when using HAP).

Modify all shadow code TLB flushes to also flush the guest TLB, in
order to keep the previous behavior. I haven't looked at each specific
shadow code TLB flush in order to figure out whether it actually
requires a guest TLB flush or not, so there might be room for
improvement in that regard.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/flushtlb.c         |  5 +++--
 xen/arch/x86/mm/shadow/common.c | 18 +++++++++---------
 xen/arch/x86/mm/shadow/hvm.c    |  2 +-
 xen/arch/x86/mm/shadow/multi.c  | 16 ++++++++--------
 xen/include/asm-x86/flushtlb.h  |  2 ++
 5 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 03f92c23dc..e7ccd4ec7b 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -59,8 +59,6 @@ static u32 pre_flush(void)
         raise_softirq(NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ);
 
  skip_clocktick:
-    hvm_flush_guest_tlbs();
-
     return t2;
 }
 
@@ -221,6 +219,9 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
             do_tlb_flush();
     }
 
+    if ( flags & FLUSH_GUESTS_TLB )
+        hvm_flush_guest_tlbs();
+
     if ( flags & FLUSH_CACHE )
     {
         const struct cpuinfo_x86 *c = &current_cpu_data;
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index ee90e55b41..2a85e10112 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -363,7 +363,7 @@ static int oos_remove_write_access(struct vcpu *v, mfn_t gmfn,
     }
 
     if ( ftlb )
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
     return 0;
 }
@@ -939,7 +939,7 @@ static void _shadow_prealloc(struct domain *d, unsigned int pages)
                 /* See if that freed up enough space */
                 if ( d->arch.paging.shadow.free_pages >= pages )
                 {
-                    flush_tlb_mask(d->dirty_cpumask);
+                    flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
                     return;
                 }
             }
@@ -993,7 +993,7 @@ static void shadow_blow_tables(struct domain *d)
                                pagetable_get_mfn(v->arch.shadow_table[i]), 0);
 
     /* Make sure everyone sees the unshadowings */
-    flush_tlb_mask(d->dirty_cpumask);
+    flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 }
 
 void shadow_blow_tables_per_domain(struct domain *d)
@@ -1102,7 +1102,7 @@ mfn_t shadow_alloc(struct domain *d,
         if ( unlikely(!cpumask_empty(&mask)) )
         {
             perfc_incr(shadow_alloc_tlbflush);
-            flush_tlb_mask(&mask);
+            flush_mask(&mask, FLUSH_TLB | FLUSH_GUESTS_TLB);
         }
         /* Now safe to clear the page for reuse */
         clear_domain_page(page_to_mfn(sp));
@@ -2290,7 +2290,7 @@ void sh_remove_shadows(struct domain *d, mfn_t gmfn, int fast, int all)
 
     /* Need to flush TLBs now, so that linear maps are safe next time we
      * take a fault. */
-    flush_tlb_mask(d->dirty_cpumask);
+    flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
     paging_unlock(d);
 }
@@ -3005,7 +3005,7 @@ static void sh_unshadow_for_p2m_change(struct domain *d, unsigned long gfn,
         {
             sh_remove_all_shadows_and_parents(d, mfn);
             if ( sh_remove_all_mappings(d, mfn, _gfn(gfn)) )
-                flush_tlb_mask(d->dirty_cpumask);
+                flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
         }
     }
 
@@ -3045,7 +3045,7 @@ static void sh_unshadow_for_p2m_change(struct domain *d, unsigned long gfn,
                 }
                 omfn = mfn_add(omfn, 1);
             }
-            flush_tlb_mask(&flushmask);
+            flush_mask(&flushmask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
             if ( npte )
                 unmap_domain_page(npte);
@@ -3332,7 +3332,7 @@ int shadow_track_dirty_vram(struct domain *d,
         }
     }
     if ( flush_tlb )
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
     goto out;
 
 out_sl1ma:
@@ -3402,7 +3402,7 @@ bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
     }
 
     /* Flush TLBs on all CPUs with dirty vcpu state. */
-    flush_tlb_mask(mask);
+    flush_mask(mask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
     /* Done. */
     for_each_vcpu ( d, v )
diff --git a/xen/arch/x86/mm/shadow/hvm.c b/xen/arch/x86/mm/shadow/hvm.c
index a219266fa2..64077d181b 100644
--- a/xen/arch/x86/mm/shadow/hvm.c
+++ b/xen/arch/x86/mm/shadow/hvm.c
@@ -590,7 +590,7 @@ static void validate_guest_pt_write(struct vcpu *v, mfn_t gmfn,
 
     if ( rc & SHADOW_SET_FLUSH )
         /* Need to flush TLBs to pick up shadow PT changes */
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
     if ( rc & SHADOW_SET_ERROR )
     {
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index dfe264cf83..7efe32c35b 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3066,7 +3066,7 @@ static int sh_page_fault(struct vcpu *v,
         perfc_incr(shadow_rm_write_flush_tlb);
         smp_wmb();
         atomic_inc(&d->arch.paging.shadow.gtable_dirty_version);
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
     }
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
@@ -3575,7 +3575,7 @@ static bool sh_invlpg(struct vcpu *v, unsigned long linear)
     if ( mfn_to_page(sl1mfn)->u.sh.type
          == SH_type_fl1_shadow )
     {
-        flush_tlb_local();
+        flush_local(FLUSH_TLB | FLUSH_GUESTS_TLB);
         return false;
     }
 
@@ -3810,7 +3810,7 @@ sh_update_linear_entries(struct vcpu *v)
          * table entry. But, without this change, it would fetch the wrong
          * value due to a stale TLB.
          */
-        flush_tlb_local();
+        flush_local(FLUSH_TLB | FLUSH_GUESTS_TLB);
     }
 }
 
@@ -4011,7 +4011,7 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
      * (old) shadow linear maps in the writeable mapping heuristics. */
 #if GUEST_PAGING_LEVELS == 2
     if ( sh_remove_write_access(d, gmfn, 2, 0) != 0 )
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
     sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow);
 #elif GUEST_PAGING_LEVELS == 3
     /* PAE guests have four shadow_table entries, based on the
@@ -4035,7 +4035,7 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
             }
         }
         if ( flush )
-            flush_tlb_mask(d->dirty_cpumask);
+            flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
         /* Now install the new shadows. */
         for ( i = 0; i < 4; i++ )
         {
@@ -4056,7 +4056,7 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
     }
 #elif GUEST_PAGING_LEVELS == 4
     if ( sh_remove_write_access(d, gmfn, 4, 0) != 0 )
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
     sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow);
     if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
     {
@@ -4501,7 +4501,7 @@ static void sh_pagetable_dying(paddr_t gpa)
         }
     }
     if ( flush )
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
     /* Remember that we've seen the guest use this interface, so we
      * can rely on it using it in future, instead of guessing at
@@ -4538,7 +4538,7 @@ static void sh_pagetable_dying(paddr_t gpa)
         mfn_to_page(gmfn)->pagetable_dying = true;
         shadow_unhook_mappings(d, smfn, 1/* user pages only */);
         /* Now flush the TLB: we removed toplevel mappings. */
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
     }
 
     /* Remember that we've seen the guest use this interface, so we
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 2cfe4e6e97..07f9bc6103 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -105,6 +105,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
 #define FLUSH_VCPU_STATE 0x1000
  /* Flush the per-cpu root page table */
 #define FLUSH_ROOT_PGTBL 0x2000
+ /* Flush all HVM guests linear TLB (using ASID/VPID) */
+#define FLUSH_GUESTS_TLB 0x4000
 
 /* Flush local TLBs/caches. */
 unsigned int flush_area_local(const void *va, unsigned int flags);
-- 
2.25.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v3 6/7] x86/tlb: allow disabling the TLB clock
  2020-01-27 18:11 [Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
                   ` (4 preceding siblings ...)
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 5/7] x86/tlb: introduce a flush guests TLB flag Roger Pau Monne
@ 2020-01-27 18:11 ` Roger Pau Monne
  2020-02-06 13:31   ` Wei Liu
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 7/7] x86/tlb: use Xen L0 assisted TLB flush when available Roger Pau Monne
  2020-02-05 16:14 ` [Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monné
  7 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monne @ 2020-01-27 18:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monne

The TLB clock is helpful when running Xen on bare metal because when
doing a TLB flush each CPU is IPI'ed and can keep a timestamp of the
last flush.

This is not the case however when Xen is running virtualized, and the
underlying hypervisor provides mechanism to assist in performing TLB
flushes: Xen itself for example offers a HVMOP_flush_tlbs hypercall in
order to perform a TLB flush without having to IPI each CPU. When
using such mechanisms it's no longer possible to keep a timestamp of
the flushes on each CPU, as they are performed by the underlying
hypervisor.

Offer a boolean in order to signal Xen that the timestamped TLB
shouldn't be used. This avoids keeping the timestamps of the flushes,
and also forces NEED_FLUSH to always return true.

No functional change intended, as this change doesn't introduce any
user that disables the timestamped TLB.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/flushtlb.c        | 19 +++++++++++++------
 xen/include/asm-x86/flushtlb.h | 17 ++++++++++++++++-
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index e7ccd4ec7b..3649900793 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -32,6 +32,9 @@
 u32 tlbflush_clock = 1U;
 DEFINE_PER_CPU(u32, tlbflush_time);
 
+/* Signals whether the TLB flush clock is in use. */
+bool __read_mostly tlb_clk_enabled = true;
+
 /*
  * pre_flush(): Increment the virtual TLB-flush clock. Returns new clock value.
  * 
@@ -82,12 +85,13 @@ static void post_flush(u32 t)
 static void do_tlb_flush(void)
 {
     unsigned long flags, cr4;
-    u32 t;
+    u32 t = 0;
 
     /* This non-reentrant function is sometimes called in interrupt context. */
     local_irq_save(flags);
 
-    t = pre_flush();
+    if ( tlb_clk_enabled )
+        t = pre_flush();
 
     if ( use_invpcid )
         invpcid_flush_all();
@@ -99,7 +103,8 @@ static void do_tlb_flush(void)
     else
         write_cr3(read_cr3());
 
-    post_flush(t);
+    if ( tlb_clk_enabled )
+        post_flush(t);
 
     local_irq_restore(flags);
 }
@@ -107,7 +112,7 @@ static void do_tlb_flush(void)
 void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
 {
     unsigned long flags, old_cr4;
-    u32 t;
+    u32 t = 0;
 
     /* Throughout this function we make this assumption: */
     ASSERT(!(cr4 & X86_CR4_PCIDE) || !(cr4 & X86_CR4_PGE));
@@ -115,7 +120,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
     /* This non-reentrant function is sometimes called in interrupt context. */
     local_irq_save(flags);
 
-    t = pre_flush();
+    if ( tlb_clk_enabled )
+        t = pre_flush();
 
     old_cr4 = read_cr4();
     ASSERT(!(old_cr4 & X86_CR4_PCIDE) || !(old_cr4 & X86_CR4_PGE));
@@ -167,7 +173,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
     if ( cr4 & X86_CR4_PCIDE )
         invpcid_flush_all_nonglobals();
 
-    post_flush(t);
+    if ( tlb_clk_enabled )
+        post_flush(t);
 
     local_irq_restore(flags);
 }
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 07f9bc6103..9773014320 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -21,10 +21,21 @@ extern u32 tlbflush_clock;
 /* Time at which each CPU's TLB was last flushed. */
 DECLARE_PER_CPU(u32, tlbflush_time);
 
-#define tlbflush_current_time() tlbflush_clock
+/* TLB clock is in use. */
+extern bool tlb_clk_enabled;
+
+static inline uint32_t tlbflush_current_time(void)
+{
+    /* Returning 0 from tlbflush_current_time will always force a flush. */
+    return tlb_clk_enabled ? tlbflush_clock : 0;
+}
 
 static inline void page_set_tlbflush_timestamp(struct page_info *page)
 {
+    /* Avoid the write if the TLB clock is disabled. */
+    if ( !tlb_clk_enabled )
+        return;
+
     /*
      * Prevent storing a stale time stamp, which could happen if an update
      * to tlbflush_clock plus a subsequent flush IPI happen between the
@@ -67,6 +78,10 @@ static inline void tlbflush_filter(cpumask_t *mask, uint32_t page_timestamp)
 {
     unsigned int cpu;
 
+    /* Short-circuit: there's no need to iterate if the clock is disabled. */
+    if ( !tlb_clk_enabled )
+        return;
+
     for_each_cpu ( cpu, mask )
         if ( !NEED_FLUSH(per_cpu(tlbflush_time, cpu), page_timestamp) )
             __cpumask_clear_cpu(cpu, mask);
-- 
2.25.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Xen-devel] [PATCH v3 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
  2020-01-27 18:11 [Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
                   ` (5 preceding siblings ...)
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 6/7] x86/tlb: allow disabling the TLB clock Roger Pau Monne
@ 2020-01-27 18:11 ` Roger Pau Monne
  2020-01-28 14:17   ` Wei Liu
  2020-02-06 13:49   ` Wei Liu
  2020-02-05 16:14 ` [Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monné
  7 siblings, 2 replies; 28+ messages in thread
From: Roger Pau Monne @ 2020-01-27 18:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monne

Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
This greatly increases the performance of TLB flushes when running
with a high amount of vCPUs as a Xen guest, and is specially important
when running in shim mode.

The following figures are from a PV guest running `make -j32 xen` in
shim mode with 32 vCPUs and HAP.

Using x2APIC and ALLBUT shorthand:
real	4m35.973s
user	4m35.110s
sys	36m24.117s

Using L0 assisted flush:
real    1m2.596s
user    4m34.818s
sys     5m16.374s

The implementation adds a new hook to hypervisor_ops so other
enlightenments can also implement such assisted flush just by filling
the hook. Note that the Xen implementation completely ignores the
dirty CPU mask and the linear address passed in, and always performs a
global TLB flush on all vCPUs.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Add a L0 assisted hook to hypervisor ops.
---
 xen/arch/x86/guest/hypervisor.c        | 10 ++++++++++
 xen/arch/x86/guest/xen/xen.c           |  6 ++++++
 xen/arch/x86/smp.c                     | 11 +++++++++++
 xen/include/asm-x86/guest/hypervisor.h | 17 +++++++++++++++++
 4 files changed, 44 insertions(+)

diff --git a/xen/arch/x86/guest/hypervisor.c b/xen/arch/x86/guest/hypervisor.c
index 4f27b98740..4085b19734 100644
--- a/xen/arch/x86/guest/hypervisor.c
+++ b/xen/arch/x86/guest/hypervisor.c
@@ -18,6 +18,7 @@
  *
  * Copyright (c) 2019 Microsoft.
  */
+#include <xen/cpumask.h>
 #include <xen/init.h>
 #include <xen/types.h>
 
@@ -64,6 +65,15 @@ void hypervisor_resume(void)
         ops->resume();
 }
 
+int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
+                         unsigned int order)
+{
+    if ( ops && ops->flush_tlb )
+        return ops->flush_tlb(mask, va, order);
+
+    return -ENOSYS;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/guest/xen/xen.c b/xen/arch/x86/guest/xen/xen.c
index 6dbc5f953f..639a2a4b32 100644
--- a/xen/arch/x86/guest/xen/xen.c
+++ b/xen/arch/x86/guest/xen/xen.c
@@ -310,11 +310,17 @@ static void resume(void)
         pv_console_init();
 }
 
+static int flush_tlb(const cpumask_t *mask, const void *va, unsigned int order)
+{
+    return xen_hypercall_hvm_op(HVMOP_flush_tlbs, NULL);
+}
+
 static const struct hypervisor_ops ops = {
     .name = "Xen",
     .setup = setup,
     .ap_setup = ap_setup,
     .resume = resume,
+    .flush_tlb = flush_tlb,
 };
 
 const struct hypervisor_ops *__init xg_probe(void)
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 65eb7cbda8..9bc925616a 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -15,6 +15,7 @@
 #include <xen/perfc.h>
 #include <xen/spinlock.h>
 #include <asm/current.h>
+#include <asm/guest.h>
 #include <asm/smp.h>
 #include <asm/mc146818rtc.h>
 #include <asm/flushtlb.h>
@@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
     if ( (flags & ~FLUSH_ORDER_MASK) &&
          !cpumask_subset(mask, cpumask_of(cpu)) )
     {
+        if ( cpu_has_hypervisor &&
+             !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
+                         FLUSH_ORDER_MASK)) &&
+             !hypervisor_flush_tlb(mask, va, flags & FLUSH_ORDER_MASK) )
+        {
+            if ( tlb_clk_enabled )
+                tlb_clk_enabled = false;
+            return;
+        }
+
         spin_lock(&flush_lock);
         cpumask_and(&flush_cpumask, mask, &cpu_online_map);
         cpumask_clear_cpu(cpu, &flush_cpumask);
diff --git a/xen/include/asm-x86/guest/hypervisor.h b/xen/include/asm-x86/guest/hypervisor.h
index 392f4b90ae..e230a5d065 100644
--- a/xen/include/asm-x86/guest/hypervisor.h
+++ b/xen/include/asm-x86/guest/hypervisor.h
@@ -19,6 +19,8 @@
 #ifndef __X86_HYPERVISOR_H__
 #define __X86_HYPERVISOR_H__
 
+#include <xen/cpumask.h>
+
 struct hypervisor_ops {
     /* Name of the hypervisor */
     const char *name;
@@ -28,6 +30,8 @@ struct hypervisor_ops {
     void (*ap_setup)(void);
     /* Resume from suspension */
     void (*resume)(void);
+    /* L0 assisted TLB flush */
+    int (*flush_tlb)(const cpumask_t *mask, const void *va, unsigned int order);
 };
 
 #ifdef CONFIG_GUEST
@@ -36,6 +40,14 @@ const char *hypervisor_probe(void);
 void hypervisor_setup(void);
 void hypervisor_ap_setup(void);
 void hypervisor_resume(void);
+/*
+ * L0 assisted TLB flush.
+ * mask: cpumask of the dirty vCPUs that should be flushed.
+ * va: linear address to flush, or NULL for global flushes.
+ * order: order of the linear address pointed by va.
+ */
+int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
+                         unsigned int order);
 
 #else
 
@@ -46,6 +58,11 @@ static inline const char *hypervisor_probe(void) { return NULL; }
 static inline void hypervisor_setup(void) { ASSERT_UNREACHABLE(); }
 static inline void hypervisor_ap_setup(void) { ASSERT_UNREACHABLE(); }
 static inline void hypervisor_resume(void) { ASSERT_UNREACHABLE(); }
+static inline int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
+                                       unsigned int order)
+{
+    return -ENOSYS;
+}
 
 #endif  /* CONFIG_GUEST */
 
-- 
2.25.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 7/7] x86/tlb: use Xen L0 assisted TLB flush when available Roger Pau Monne
@ 2020-01-28 14:17   ` Wei Liu
  2020-01-28 14:57     ` Roger Pau Monné
  2020-02-06 13:49   ` Wei Liu
  1 sibling, 1 reply; 28+ messages in thread
From: Wei Liu @ 2020-01-28 14:17 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Wei Liu, Andrew Cooper

On Mon, Jan 27, 2020 at 07:11:15PM +0100, Roger Pau Monne wrote:
[...]
>  
>  const struct hypervisor_ops *__init xg_probe(void)
> diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
> index 65eb7cbda8..9bc925616a 100644
> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -15,6 +15,7 @@
>  #include <xen/perfc.h>
>  #include <xen/spinlock.h>
>  #include <asm/current.h>
> +#include <asm/guest.h>
>  #include <asm/smp.h>
>  #include <asm/mc146818rtc.h>
>  #include <asm/flushtlb.h>
> @@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
>      if ( (flags & ~FLUSH_ORDER_MASK) &&
>           !cpumask_subset(mask, cpumask_of(cpu)) )
>      {
> +        if ( cpu_has_hypervisor &&
> +             !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
> +                         FLUSH_ORDER_MASK)) &&
> +             !hypervisor_flush_tlb(mask, va, flags & FLUSH_ORDER_MASK) )
> +        {
> +            if ( tlb_clk_enabled )
> +                tlb_clk_enabled = false;

You may delete the if here to make the generated machine code shorter.

OOI why isn't tlb_clk_enabled set to false when Xen determines to use L0
assisted flush?

(Sorry I haven't read previous patches in detail)

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 1/7] x86/tlb: fix NEED_FLUSH return type
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 1/7] x86/tlb: fix NEED_FLUSH return type Roger Pau Monne
@ 2020-01-28 14:17   ` Wei Liu
  2020-02-03 11:48     ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Wei Liu @ 2020-01-28 14:17 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Wei Liu, Andrew Cooper

On Mon, Jan 27, 2020 at 07:11:09PM +0100, Roger Pau Monne wrote:
> The returned type wants to be bool instead of int.
> 
> No functional change intended.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Wei Liu <wl@xen.org>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
  2020-01-28 14:17   ` Wei Liu
@ 2020-01-28 14:57     ` Roger Pau Monné
  2020-01-28 16:24       ` Wei Liu
  0 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monné @ 2020-01-28 14:57 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Andrew Cooper

On Tue, Jan 28, 2020 at 02:17:36PM +0000, Wei Liu wrote:
> On Mon, Jan 27, 2020 at 07:11:15PM +0100, Roger Pau Monne wrote:
> [...]
> >  
> >  const struct hypervisor_ops *__init xg_probe(void)
> > diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
> > index 65eb7cbda8..9bc925616a 100644
> > --- a/xen/arch/x86/smp.c
> > +++ b/xen/arch/x86/smp.c
> > @@ -15,6 +15,7 @@
> >  #include <xen/perfc.h>
> >  #include <xen/spinlock.h>
> >  #include <asm/current.h>
> > +#include <asm/guest.h>
> >  #include <asm/smp.h>
> >  #include <asm/mc146818rtc.h>
> >  #include <asm/flushtlb.h>
> > @@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
> >      if ( (flags & ~FLUSH_ORDER_MASK) &&
> >           !cpumask_subset(mask, cpumask_of(cpu)) )
> >      {
> > +        if ( cpu_has_hypervisor &&
> > +             !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
> > +                         FLUSH_ORDER_MASK)) &&
> > +             !hypervisor_flush_tlb(mask, va, flags & FLUSH_ORDER_MASK) )
> > +        {
> > +            if ( tlb_clk_enabled )
> > +                tlb_clk_enabled = false;
> 
> You may delete the if here to make the generated machine code shorter.

Hm, but tlb_clk_enabled is marked as read_mostly, which won't be true
then, and would likely have a performance impact.

> OOI why isn't tlb_clk_enabled set to false when Xen determines to use L0
> assisted flush?

L0 assisted flush can fail (ie: return an error), and in that case Xen
would be better to continue using the timestamped tlb, as it could
avoid some flushes.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
  2020-01-28 14:57     ` Roger Pau Monné
@ 2020-01-28 16:24       ` Wei Liu
  2020-01-28 17:16         ` Roger Pau Monné
  0 siblings, 1 reply; 28+ messages in thread
From: Wei Liu @ 2020-01-28 16:24 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Wei Liu, Andrew Cooper

On Tue, Jan 28, 2020 at 03:57:04PM +0100, Roger Pau Monné wrote:
> On Tue, Jan 28, 2020 at 02:17:36PM +0000, Wei Liu wrote:
> > On Mon, Jan 27, 2020 at 07:11:15PM +0100, Roger Pau Monne wrote:
> > [...]
> > >  
> > >  const struct hypervisor_ops *__init xg_probe(void)
> > > diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
> > > index 65eb7cbda8..9bc925616a 100644
> > > --- a/xen/arch/x86/smp.c
> > > +++ b/xen/arch/x86/smp.c
> > > @@ -15,6 +15,7 @@
> > >  #include <xen/perfc.h>
> > >  #include <xen/spinlock.h>
> > >  #include <asm/current.h>
> > > +#include <asm/guest.h>
> > >  #include <asm/smp.h>
> > >  #include <asm/mc146818rtc.h>
> > >  #include <asm/flushtlb.h>
> > > @@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
> > >      if ( (flags & ~FLUSH_ORDER_MASK) &&
> > >           !cpumask_subset(mask, cpumask_of(cpu)) )
> > >      {
> > > +        if ( cpu_has_hypervisor &&
> > > +             !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
> > > +                         FLUSH_ORDER_MASK)) &&
> > > +             !hypervisor_flush_tlb(mask, va, flags & FLUSH_ORDER_MASK) )
> > > +        {
> > > +            if ( tlb_clk_enabled )
> > > +                tlb_clk_enabled = false;
> > 
> > You may delete the if here to make the generated machine code shorter.
> 
> Hm, but tlb_clk_enabled is marked as read_mostly, which won't be true
> then, and would likely have a performance impact.

OK. Fair enough.

> 
> > OOI why isn't tlb_clk_enabled set to false when Xen determines to use L0
> > assisted flush?
> 
> L0 assisted flush can fail (ie: return an error), and in that case Xen
> would be better to continue using the timestamped tlb, as it could
> avoid some flushes.

Do you need to set tlb_clk_enabled in that case?

Wei.

> 
> Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
  2020-01-28 16:24       ` Wei Liu
@ 2020-01-28 17:16         ` Roger Pau Monné
  2020-01-28 17:20           ` Andrew Cooper
  0 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monné @ 2020-01-28 17:16 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Andrew Cooper

On Tue, Jan 28, 2020 at 04:24:24PM +0000, Wei Liu wrote:
> On Tue, Jan 28, 2020 at 03:57:04PM +0100, Roger Pau Monné wrote:
> > On Tue, Jan 28, 2020 at 02:17:36PM +0000, Wei Liu wrote:
> > > On Mon, Jan 27, 2020 at 07:11:15PM +0100, Roger Pau Monne wrote:
> > > [...]
> > > >  
> > > >  const struct hypervisor_ops *__init xg_probe(void)
> > > > diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
> > > > index 65eb7cbda8..9bc925616a 100644
> > > > --- a/xen/arch/x86/smp.c
> > > > +++ b/xen/arch/x86/smp.c
> > > > @@ -15,6 +15,7 @@
> > > >  #include <xen/perfc.h>
> > > >  #include <xen/spinlock.h>
> > > >  #include <asm/current.h>
> > > > +#include <asm/guest.h>
> > > >  #include <asm/smp.h>
> > > >  #include <asm/mc146818rtc.h>
> > > >  #include <asm/flushtlb.h>
> > > > @@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
> > > >      if ( (flags & ~FLUSH_ORDER_MASK) &&
> > > >           !cpumask_subset(mask, cpumask_of(cpu)) )
> > > >      {
> > > > +        if ( cpu_has_hypervisor &&
> > > > +             !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
> > > > +                         FLUSH_ORDER_MASK)) &&
> > > > +             !hypervisor_flush_tlb(mask, va, flags & FLUSH_ORDER_MASK) )
> > > > +        {
> > > > +            if ( tlb_clk_enabled )
> > > > +                tlb_clk_enabled = false;
> > > 
> > > You may delete the if here to make the generated machine code shorter.
> > 
> > Hm, but tlb_clk_enabled is marked as read_mostly, which won't be true
> > then, and would likely have a performance impact.
> 
> OK. Fair enough.
> 
> > 
> > > OOI why isn't tlb_clk_enabled set to false when Xen determines to use L0
> > > assisted flush?
> > 
> > L0 assisted flush can fail (ie: return an error), and in that case Xen
> > would be better to continue using the timestamped tlb, as it could
> > avoid some flushes.
> 
> Do you need to set tlb_clk_enabled in that case?

AFAICT it's safe to enable the TLB timestamps after being disabled,
but hypervisor_flush_tlb could fail intermittently with EBUSY in the
Xen implementation, and I don't really want that to cause spurious
enabling of the timestamps periodically.

My expectation would be that such assistance could fail sporadically,
but it shouldn't re-enable the timestamped TLB as failure would be
infrequent.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
  2020-01-28 17:16         ` Roger Pau Monné
@ 2020-01-28 17:20           ` Andrew Cooper
  2020-01-28 17:38             ` Roger Pau Monné
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Cooper @ 2020-01-28 17:20 UTC (permalink / raw)
  To: Roger Pau Monné, Wei Liu; +Cc: xen-devel

On 28/01/2020 17:16, Roger Pau Monné wrote:
>>>> OOI why isn't tlb_clk_enabled set to false when Xen determines to use L0
>>>> assisted flush?
>>> L0 assisted flush can fail (ie: return an error), and in that case Xen
>>> would be better to continue using the timestamped tlb, as it could
>>> avoid some flushes.
>> Do you need to set tlb_clk_enabled in that case?
> AFAICT it's safe to enable the TLB timestamps after being disabled,
> but hypervisor_flush_tlb could fail intermittently with EBUSY in the
> Xen implementation, and I don't really want that to cause spurious
> enabling of the timestamps periodically.

What causes -EBUSY?

I don't think it is reasonable for the hypercall to fail like that. 
There is no circumstance ever where a TLB flush is wanted, and it is
able to be deferred.

Forcing all callers to loop while busy is a terrible interface.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
  2020-01-28 17:20           ` Andrew Cooper
@ 2020-01-28 17:38             ` Roger Pau Monné
  0 siblings, 0 replies; 28+ messages in thread
From: Roger Pau Monné @ 2020-01-28 17:38 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Wei Liu

On Tue, Jan 28, 2020 at 05:20:25PM +0000, Andrew Cooper wrote:
> On 28/01/2020 17:16, Roger Pau Monné wrote:
> >>>> OOI why isn't tlb_clk_enabled set to false when Xen determines to use L0
> >>>> assisted flush?
> >>> L0 assisted flush can fail (ie: return an error), and in that case Xen
> >>> would be better to continue using the timestamped tlb, as it could
> >>> avoid some flushes.
> >> Do you need to set tlb_clk_enabled in that case?
> > AFAICT it's safe to enable the TLB timestamps after being disabled,
> > but hypervisor_flush_tlb could fail intermittently with EBUSY in the
> > Xen implementation, and I don't really want that to cause spurious
> > enabling of the timestamps periodically.
> 
> What causes -EBUSY?

My bad, it's not EBUSY, it's ERESTART but that won't be returned to
the caller.

> I don't think it is reasonable for the hypercall to fail like that. 
> There is no circumstance ever where a TLB flush is wanted, and it is
> able to be deferred.

After this series ERESTART is only used by the shadow flush
implementation, and I'm not sure I'm confident enough to try to change
the shadow code in order to not do that, also because I think the
interest is likely much lower than for the HAP case.

> Forcing all callers to loop while busy is a terrible interface.

Well, the whole implementation of hvm_flush_vcpu_tlb is quite clumsy
because it attempts to pause each vCPU to be flushed, which for guests
with a high number of vCPUs makes it likely slower than just using a
shorthand IPI inside the guest.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 1/7] x86/tlb: fix NEED_FLUSH return type
  2020-01-28 14:17   ` Wei Liu
@ 2020-02-03 11:48     ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2020-02-03 11:48 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Andrew Cooper, Roger Pau Monne

On 28.01.2020 15:17, Wei Liu wrote:
> On Mon, Jan 27, 2020 at 07:11:09PM +0100, Roger Pau Monne wrote:
>> The returned type wants to be bool instead of int.
>>
>> No functional change intended.
>>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Reviewed-by: Wei Liu <wl@xen.org>

Acked-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode
  2020-01-27 18:11 [Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
                   ` (6 preceding siblings ...)
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 7/7] x86/tlb: use Xen L0 assisted TLB flush when available Roger Pau Monne
@ 2020-02-05 16:14 ` Roger Pau Monné
  2020-02-05 16:44   ` Jan Beulich
  7 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monné @ 2020-02-05 16:14 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

Ping?

Just want to make sure this doesn't go under the radar.

Thanks, Roger.

On Mon, Jan 27, 2020 at 07:11:08PM +0100, Roger Pau Monne wrote:
> Hello,
> 
> The following series aims to improve the TLB flush times when running
> nested Xen, and it's specially beneficial when running in shim mode.
> 
> Only the HAP guest TLB flush is improved, the shadow paging TLB flush is
> left as-is, and can be improved later if there's interest.
> 
> For a reference on the performance improvement see patch #7, as it's a
> huge increase which can benefit other guests using assisted TLB flushes,
> and also the ones using the viridian TLB flush assist (ie: Windows).
> 
> Thanks, Roger.
> 
> Roger Pau Monne (7):
>   x86/tlb: fix NEED_FLUSH return type
>   x86/hvm: allow ASID flush when v != current
>   x86/paging: add TLB flush hooks
>   x86/hap: improve hypervisor assisted guest TLB flush
>   x86/tlb: introduce a flush guests TLB flag
>   x86/tlb: allow disabling the TLB clock
>   x86/tlb: use Xen L0 assisted TLB flush when available
> 
>  xen/arch/x86/flushtlb.c                | 24 ++++++---
>  xen/arch/x86/guest/hypervisor.c        | 10 ++++
>  xen/arch/x86/guest/xen/xen.c           |  6 +++
>  xen/arch/x86/hvm/asid.c                |  6 +--
>  xen/arch/x86/hvm/hvm.c                 | 51 ++----------------
>  xen/arch/x86/mm/hap/hap.c              | 48 +++++++++++++++++
>  xen/arch/x86/mm/shadow/common.c        | 71 +++++++++++++++++++++++---
>  xen/arch/x86/mm/shadow/hvm.c           |  2 +-
>  xen/arch/x86/mm/shadow/multi.c         | 17 +++---
>  xen/arch/x86/smp.c                     | 11 ++++
>  xen/include/asm-x86/flushtlb.h         | 21 +++++++-
>  xen/include/asm-x86/guest/hypervisor.h | 17 ++++++
>  xen/include/asm-x86/hap.h              |  3 ++
>  xen/include/asm-x86/shadow.h           | 12 +++++
>  14 files changed, 220 insertions(+), 79 deletions(-)
> 
> -- 
> 2.25.0
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode
  2020-02-05 16:14 ` [Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monné
@ 2020-02-05 16:44   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2020-02-05 16:44 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: George Dunlap, xen-devel, Tim Deegan, Wei Liu, Andrew Cooper

On 05.02.2020 17:14, Roger Pau Monné wrote:
> Ping?
> 
> Just want to make sure this doesn't go under the radar.

It's safely sitting in my to-be-looked-at folder, but there's
way too much else to look after to get to it, yet.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 3/7] x86/paging: add TLB flush hooks
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 3/7] x86/paging: add TLB flush hooks Roger Pau Monne
@ 2020-02-05 20:52   ` Wei Liu
  0 siblings, 0 replies; 28+ messages in thread
From: Wei Liu @ 2020-02-05 20:52 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich,
	xen-devel

On Mon, Jan 27, 2020 at 07:11:11PM +0100, Roger Pau Monne wrote:
[...]
>  
> diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
> index 26798b317c..dfe264cf83 100644
> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -4157,7 +4157,6 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
>      if ( do_locking ) paging_unlock(v->domain);
>  }
>  
> -

Stray change.

>  /**************************************************************************/
>  /* Functions to revoke guest rights */
>  
> diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h
> index b94bfb4ed0..0c6aa26b9b 100644
> --- a/xen/include/asm-x86/hap.h
> +++ b/xen/include/asm-x86/hap.h
> @@ -46,6 +46,9 @@ int   hap_track_dirty_vram(struct domain *d,
>  extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
>  int hap_set_allocation(struct domain *d, unsigned int pages, bool *preempted);
>  
> +bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
> +                   void *ctxt);
> +
>  #endif /* XEN_HAP_H */
>  
>  /*
> diff --git a/xen/include/asm-x86/shadow.h b/xen/include/asm-x86/shadow.h
> index 907c71f497..3c1f6df478 100644
> --- a/xen/include/asm-x86/shadow.h
> +++ b/xen/include/asm-x86/shadow.h
> @@ -95,6 +95,10 @@ void shadow_blow_tables_per_domain(struct domain *d);
>  int shadow_set_allocation(struct domain *d, unsigned int pages,
>                            bool *preempted);
>  
> +/* Flush the TLB of the selected vCPUs. */
> +bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
> +                      void *ctxt);
> +
>  #else /* !CONFIG_SHADOW_PAGING */
>  
>  #define shadow_teardown(d, p) ASSERT(is_pv_domain(d))
> @@ -106,6 +110,14 @@ int shadow_set_allocation(struct domain *d, unsigned int pages,
>  #define shadow_set_allocation(d, pages, preempted) \
>      ({ ASSERT_UNREACHABLE(); -EOPNOTSUPP; })
>  
> +static inline bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt,
> +                                                       struct vcpu *v),
> +                                    void *ctxt)
> +{
> +    ASSERT_UNREACHABLE();
> +    return -EOPNOTSUPP;

This function needs to return true/false per its signature.

With this fixed:

Reviewed-by: Wei Liu <wl@xen.org>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 2/7] x86/hvm: allow ASID flush when v != current
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 2/7] x86/hvm: allow ASID flush when v != current Roger Pau Monne
@ 2020-02-06 11:15   ` Wei Liu
  0 siblings, 0 replies; 28+ messages in thread
From: Wei Liu @ 2020-02-06 11:15 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Wei Liu, Jan Beulich, Andrew Cooper

On Mon, Jan 27, 2020 at 07:11:10PM +0100, Roger Pau Monne wrote:
> Current implementation of hvm_asid_flush_vcpu is not safe to use
> unless the target vCPU is either paused or the currently running one,
> as it modifies the generation without any locking.
> 
> Fix this by using atomic operations when accessing the generation
> field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This
> allows to safely flush the current ASID generation. Note that for the
> flush to take effect if the vCPU is currently running a vmexit is
> required.
> 
> Note the same could be achieved by introducing an extra field to
> hvm_vcpu_asid that signals hvm_asid_handle_vmenter the need to call
> hvm_asid_flush_vcpu on the given vCPU before vmentry, this however
> seems unnecessary as hvm_asid_flush_vcpu itself only sets two vCPU
> fields to 0, so there's no need to delay this to the vmentry ASID
> helper.
> 
> This is not a bugfix as no callers that would violate the assumptions
> listed in the first paragraph have been found, but a preparatory
> change in order to allow remote flushing of HVM vCPUs.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Took me some time to go over ASID code but as far as I can tell, this
modification is correct:

Reviewed-by: Wei Liu <wl@xen.org>

> ---
>  xen/arch/x86/hvm/asid.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/asid.c b/xen/arch/x86/hvm/asid.c
> index 9d3c671a5f..80b73da89b 100644
> --- a/xen/arch/x86/hvm/asid.c
> +++ b/xen/arch/x86/hvm/asid.c
> @@ -82,7 +82,7 @@ void hvm_asid_init(int nasids)
>  
>  void hvm_asid_flush_vcpu_asid(struct hvm_vcpu_asid *asid)
>  {
> -    asid->generation = 0;
> +    write_atomic(&asid->generation, 0);
>  }
>  
>  void hvm_asid_flush_vcpu(struct vcpu *v)
> @@ -120,7 +120,7 @@ bool_t hvm_asid_handle_vmenter(struct hvm_vcpu_asid *asid)
>          goto disabled;
>  
>      /* Test if VCPU has valid ASID. */
> -    if ( asid->generation == data->core_asid_generation )
> +    if ( read_atomic(&asid->generation) == data->core_asid_generation )
>          return 0;
>  
>      /* If there are no free ASIDs, need to go to a new generation */
> @@ -134,7 +134,7 @@ bool_t hvm_asid_handle_vmenter(struct hvm_vcpu_asid *asid)
>  
>      /* Now guaranteed to be a free ASID. */
>      asid->asid = data->next_asid++;
> -    asid->generation = data->core_asid_generation;
> +    write_atomic(&asid->generation, data->core_asid_generation);
>  
>      /*
>       * When we assign ASID 1, flush all TLB entries as we are starting a new
> -- 
> 2.25.0
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 4/7] x86/hap: improve hypervisor assisted guest TLB flush
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 4/7] x86/hap: improve hypervisor assisted guest TLB flush Roger Pau Monne
@ 2020-02-06 11:23   ` Wei Liu
  0 siblings, 0 replies; 28+ messages in thread
From: Wei Liu @ 2020-02-06 11:23 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: George Dunlap, xen-devel, Wei Liu, Jan Beulich, Andrew Cooper

On Mon, Jan 27, 2020 at 07:11:12PM +0100, Roger Pau Monne wrote:
> The current implementation of the hypervisor assisted flush for HAP is
> extremely inefficient.
> 
> First of all there's no need to call paging_update_cr3, as the only
> relevant part of that function when doing a flush is the ASID vCPU
> flush, so just call that function directly.
> 
> Since hvm_asid_flush_vcpu is protected against concurrent callers by
> using atomic operations there's no need anymore to pause the affected
> vCPUs.
> 
> Finally the global TLB flush performed by flush_tlb_mask is also not
> necessary, since we only want to flush the guest TLB state it's enough
> to trigger a vmexit on the pCPUs currently holding any vCPU state, as
> such vmexit will already perform an ASID/VPID update, and thus clear
> the guest TLB.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  xen/arch/x86/mm/hap/hap.c | 48 +++++++++++++++++----------------------
>  1 file changed, 21 insertions(+), 27 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> index 6894c1aa38..401eaf8026 100644
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -669,32 +669,24 @@ static void hap_update_cr3(struct vcpu *v, int do_locking, bool noflush)
>      hvm_update_guest_cr3(v, noflush);
>  }
>  
> +static void do_flush(void *data)

I think the name is misleading, because this function doesn't flush the
TLB itself. We're relying on the side effect of vmexit to flush.

I don't have suggestion for a good name, though, so this comment is not
blocking.

> +{
> +    cpumask_t *mask = data;
> +    unsigned int cpu = smp_processor_id();
> +
> +    ASSERT(cpumask_test_cpu(cpu, mask));
> +    cpumask_clear_cpu(cpu, mask);
> +}
> +
>  bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
>                     void *ctxt)
>  {
>      static DEFINE_PER_CPU(cpumask_t, flush_cpumask);
>      cpumask_t *mask = &this_cpu(flush_cpumask);
>      struct domain *d = current->domain;
> +    unsigned int this_cpu = smp_processor_id();
>      struct vcpu *v;
>  
> -    /* Avoid deadlock if more than one vcpu tries this at the same time. */
> -    if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
> -        return false;
> -
> -    /* Pause all other vcpus. */
> -    for_each_vcpu ( d, v )
> -        if ( v != current && flush_vcpu(ctxt, v) )
> -            vcpu_pause_nosync(v);
> -
> -    /* Now that all VCPUs are signalled to deschedule, we wait... */
> -    for_each_vcpu ( d, v )
> -        if ( v != current && flush_vcpu(ctxt, v) )
> -            while ( !vcpu_runnable(v) && v->is_running )
> -                cpu_relax();
> -
> -    /* All other vcpus are paused, safe to unlock now. */
> -    spin_unlock(&d->hypercall_deadlock_mutex);
> -
>      cpumask_clear(mask);
>  
>      /* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). */
> @@ -705,20 +697,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
>          if ( !flush_vcpu(ctxt, v) )
>              continue;
>  
> -        paging_update_cr3(v, false);
> +        hvm_asid_flush_vcpu(v);
>  
>          cpu = read_atomic(&v->dirty_cpu);
> -        if ( is_vcpu_dirty_cpu(cpu) )
> +        if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) )
>              __cpumask_set_cpu(cpu, mask);
>      }
>  
> -    /* Flush TLBs on all CPUs with dirty vcpu state. */
> -    flush_tlb_mask(mask);
> -
> -    /* Done. */
> -    for_each_vcpu ( d, v )
> -        if ( v != current && flush_vcpu(ctxt, v) )
> -            vcpu_unpause(v);
> +    /*
> +     * Trigger a vmexit on all pCPUs with dirty vCPU state in order to force an
> +     * ASID/VPIT change and hence accomplish a guest TLB flush. Note that vCPUs
> +     * not currently running will already be flushed when scheduled because of
> +     * the ASID tickle done in the loop above.
> +     */

VPIT -> VPID

Reviewed-by: Wei Liu <wl@xen.org>

> +    on_selected_cpus(mask, do_flush, mask, 0);
> +    while ( !cpumask_empty(mask) )
> +        cpu_relax();
>  
>      return true;
>  }
> -- 
> 2.25.0
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 5/7] x86/tlb: introduce a flush guests TLB flag
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 5/7] x86/tlb: introduce a flush guests TLB flag Roger Pau Monne
@ 2020-02-06 12:59   ` Wei Liu
  0 siblings, 0 replies; 28+ messages in thread
From: Wei Liu @ 2020-02-06 12:59 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich,
	xen-devel

On Mon, Jan 27, 2020 at 07:11:13PM +0100, Roger Pau Monne wrote:
> Introduce a specific flag to request a HVM guest TLB flush, which is
> an ASID/VPID tickle that forces a linear TLB flush for all HVM guests.
> 
> This was previously unconditionally done in each pre_flush call, but
> that's not required: HVM guests not using shadow don't require linear
> TLB flushes as Xen doesn't modify the guest page tables in that case
> (ie: when using HAP).
> 
> Modify all shadow code TLB flushes to also flush the guest TLB, in
> order to keep the previous behavior. I haven't looked at each specific
> shadow code TLB flush in order to figure out whether it actually
> requires a guest TLB flush or not, so there might be room for
> improvement in that regard.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Wei Liu <wl@xen.org>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 6/7] x86/tlb: allow disabling the TLB clock
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 6/7] x86/tlb: allow disabling the TLB clock Roger Pau Monne
@ 2020-02-06 13:31   ` Wei Liu
  2020-02-06 13:46     ` Roger Pau Monné
  0 siblings, 1 reply; 28+ messages in thread
From: Wei Liu @ 2020-02-06 13:31 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Wei Liu, Jan Beulich, Andrew Cooper

On Mon, Jan 27, 2020 at 07:11:14PM +0100, Roger Pau Monne wrote:
> The TLB clock is helpful when running Xen on bare metal because when
> doing a TLB flush each CPU is IPI'ed and can keep a timestamp of the
> last flush.
> 
> This is not the case however when Xen is running virtualized, and the
> underlying hypervisor provides mechanism to assist in performing TLB
> flushes: Xen itself for example offers a HVMOP_flush_tlbs hypercall in
> order to perform a TLB flush without having to IPI each CPU. When
> using such mechanisms it's no longer possible to keep a timestamp of
> the flushes on each CPU, as they are performed by the underlying
> hypervisor.
> 
> Offer a boolean in order to signal Xen that the timestamped TLB
> shouldn't be used. This avoids keeping the timestamps of the flushes,
> and also forces NEED_FLUSH to always return true.
> 
> No functional change intended, as this change doesn't introduce any
> user that disables the timestamped TLB.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  xen/arch/x86/flushtlb.c        | 19 +++++++++++++------
>  xen/include/asm-x86/flushtlb.h | 17 ++++++++++++++++-
>  2 files changed, 29 insertions(+), 7 deletions(-)
> 
> diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
> index e7ccd4ec7b..3649900793 100644
> --- a/xen/arch/x86/flushtlb.c
> +++ b/xen/arch/x86/flushtlb.c
> @@ -32,6 +32,9 @@
>  u32 tlbflush_clock = 1U;
>  DEFINE_PER_CPU(u32, tlbflush_time);
>  
> +/* Signals whether the TLB flush clock is in use. */
> +bool __read_mostly tlb_clk_enabled = true;
> +
>  /*
>   * pre_flush(): Increment the virtual TLB-flush clock. Returns new clock value.
>   * 
> @@ -82,12 +85,13 @@ static void post_flush(u32 t)
>  static void do_tlb_flush(void)
>  {
>      unsigned long flags, cr4;
> -    u32 t;
> +    u32 t = 0;
>  
>      /* This non-reentrant function is sometimes called in interrupt context. */
>      local_irq_save(flags);
>  
> -    t = pre_flush();
> +    if ( tlb_clk_enabled )
> +        t = pre_flush();

I think it makes more sense to push the check to pre_flush and
post_flush -- they are the ones that care about the clock, after all.

This also has the effect of making this patch a bit shorter, I think.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 6/7] x86/tlb: allow disabling the TLB clock
  2020-02-06 13:31   ` Wei Liu
@ 2020-02-06 13:46     ` Roger Pau Monné
  0 siblings, 0 replies; 28+ messages in thread
From: Roger Pau Monné @ 2020-02-06 13:46 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Jan Beulich, Andrew Cooper

On Thu, Feb 06, 2020 at 01:31:34PM +0000, Wei Liu wrote:
> On Mon, Jan 27, 2020 at 07:11:14PM +0100, Roger Pau Monne wrote:
> > The TLB clock is helpful when running Xen on bare metal because when
> > doing a TLB flush each CPU is IPI'ed and can keep a timestamp of the
> > last flush.
> > 
> > This is not the case however when Xen is running virtualized, and the
> > underlying hypervisor provides mechanism to assist in performing TLB
> > flushes: Xen itself for example offers a HVMOP_flush_tlbs hypercall in
> > order to perform a TLB flush without having to IPI each CPU. When
> > using such mechanisms it's no longer possible to keep a timestamp of
> > the flushes on each CPU, as they are performed by the underlying
> > hypervisor.
> > 
> > Offer a boolean in order to signal Xen that the timestamped TLB
> > shouldn't be used. This avoids keeping the timestamps of the flushes,
> > and also forces NEED_FLUSH to always return true.
> > 
> > No functional change intended, as this change doesn't introduce any
> > user that disables the timestamped TLB.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> >  xen/arch/x86/flushtlb.c        | 19 +++++++++++++------
> >  xen/include/asm-x86/flushtlb.h | 17 ++++++++++++++++-
> >  2 files changed, 29 insertions(+), 7 deletions(-)
> > 
> > diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
> > index e7ccd4ec7b..3649900793 100644
> > --- a/xen/arch/x86/flushtlb.c
> > +++ b/xen/arch/x86/flushtlb.c
> > @@ -32,6 +32,9 @@
> >  u32 tlbflush_clock = 1U;
> >  DEFINE_PER_CPU(u32, tlbflush_time);
> >  
> > +/* Signals whether the TLB flush clock is in use. */
> > +bool __read_mostly tlb_clk_enabled = true;
> > +
> >  /*
> >   * pre_flush(): Increment the virtual TLB-flush clock. Returns new clock value.
> >   * 
> > @@ -82,12 +85,13 @@ static void post_flush(u32 t)
> >  static void do_tlb_flush(void)
> >  {
> >      unsigned long flags, cr4;
> > -    u32 t;
> > +    u32 t = 0;
> >  
> >      /* This non-reentrant function is sometimes called in interrupt context. */
> >      local_irq_save(flags);
> >  
> > -    t = pre_flush();
> > +    if ( tlb_clk_enabled )
> > +        t = pre_flush();
> 
> I think it makes more sense to push the check to pre_flush and
> post_flush -- they are the ones that care about the clock, after all.
> 
> This also has the effect of making this patch a bit shorter, I think.

I added the check here in order to avoid the call just to return 0.
This is a hot path, so I thought that a check here would be less
expensive that a function call when the TLB stamps are disabled.

I don't mind moving it inside {pre/post}_flush functions if that's the
consensus.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
  2020-01-27 18:11 ` [Xen-devel] [PATCH v3 7/7] x86/tlb: use Xen L0 assisted TLB flush when available Roger Pau Monne
  2020-01-28 14:17   ` Wei Liu
@ 2020-02-06 13:49   ` Wei Liu
  2020-02-06 14:09     ` Roger Pau Monné
  1 sibling, 1 reply; 28+ messages in thread
From: Wei Liu @ 2020-02-06 13:49 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Wei Liu, Jan Beulich, Andrew Cooper

On Mon, Jan 27, 2020 at 07:11:15PM +0100, Roger Pau Monne wrote:
> Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
> This greatly increases the performance of TLB flushes when running
> with a high amount of vCPUs as a Xen guest, and is specially important
> when running in shim mode.
> 
> The following figures are from a PV guest running `make -j32 xen` in
> shim mode with 32 vCPUs and HAP.
> 
> Using x2APIC and ALLBUT shorthand:
> real	4m35.973s
> user	4m35.110s
> sys	36m24.117s
> 
> Using L0 assisted flush:
> real    1m2.596s
> user    4m34.818s
> sys     5m16.374s
> 
> The implementation adds a new hook to hypervisor_ops so other
> enlightenments can also implement such assisted flush just by filling
> the hook. Note that the Xen implementation completely ignores the
> dirty CPU mask and the linear address passed in, and always performs a
> global TLB flush on all vCPUs.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Changes since v1:
>  - Add a L0 assisted hook to hypervisor ops.
> ---
>  xen/arch/x86/guest/hypervisor.c        | 10 ++++++++++
>  xen/arch/x86/guest/xen/xen.c           |  6 ++++++
>  xen/arch/x86/smp.c                     | 11 +++++++++++
>  xen/include/asm-x86/guest/hypervisor.h | 17 +++++++++++++++++
>  4 files changed, 44 insertions(+)
> 
> diff --git a/xen/arch/x86/guest/hypervisor.c b/xen/arch/x86/guest/hypervisor.c
> index 4f27b98740..4085b19734 100644
> --- a/xen/arch/x86/guest/hypervisor.c
> +++ b/xen/arch/x86/guest/hypervisor.c
> @@ -18,6 +18,7 @@
>   *
>   * Copyright (c) 2019 Microsoft.
>   */
> +#include <xen/cpumask.h>
>  #include <xen/init.h>
>  #include <xen/types.h>
>  
> @@ -64,6 +65,15 @@ void hypervisor_resume(void)
>          ops->resume();
>  }
>  
> +int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
> +                         unsigned int order)
> +{
> +    if ( ops && ops->flush_tlb )
> +        return ops->flush_tlb(mask, va, order);
> +

Is there a way to make this an alternative call? I consider tlb flush a
frequent operation which can use some optimisation.

This can be done as a later improvement if it is too difficult though.
This patch already has some substantial improvement.

> +    return -ENOSYS;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/guest/xen/xen.c b/xen/arch/x86/guest/xen/xen.c
> index 6dbc5f953f..639a2a4b32 100644
> --- a/xen/arch/x86/guest/xen/xen.c
> +++ b/xen/arch/x86/guest/xen/xen.c
> @@ -310,11 +310,17 @@ static void resume(void)
>          pv_console_init();
>  }
>  
> +static int flush_tlb(const cpumask_t *mask, const void *va, unsigned int order)
> +{
> +    return xen_hypercall_hvm_op(HVMOP_flush_tlbs, NULL);
> +}
> +
>  static const struct hypervisor_ops ops = {
>      .name = "Xen",
>      .setup = setup,
>      .ap_setup = ap_setup,
>      .resume = resume,
> +    .flush_tlb = flush_tlb,
>  };
>  
>  const struct hypervisor_ops *__init xg_probe(void)
> diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
> index 65eb7cbda8..9bc925616a 100644
> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -15,6 +15,7 @@
>  #include <xen/perfc.h>
>  #include <xen/spinlock.h>
>  #include <asm/current.h>
> +#include <asm/guest.h>
>  #include <asm/smp.h>
>  #include <asm/mc146818rtc.h>
>  #include <asm/flushtlb.h>
> @@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
>      if ( (flags & ~FLUSH_ORDER_MASK) &&
>           !cpumask_subset(mask, cpumask_of(cpu)) )
>      {
> +        if ( cpu_has_hypervisor &&
> +             !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
> +                         FLUSH_ORDER_MASK)) &&
> +             !hypervisor_flush_tlb(mask, va, flags & FLUSH_ORDER_MASK) )
> +        {
> +            if ( tlb_clk_enabled )
> +                tlb_clk_enabled = false;
> +            return;
> +        }
> +

Per my understanding, not turning tlb_clk_enabled back to true after an
assisted flush fails is okay, because the effect of tlb_clk_enabled
being false is to always make NEED_FLUSH return true. That causes
excessive flushing, but it is okay in terms of correctness. Do I
understand it correctly?

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
  2020-02-06 13:49   ` Wei Liu
@ 2020-02-06 14:09     ` Roger Pau Monné
  2020-02-06 14:46       ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monné @ 2020-02-06 14:09 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Jan Beulich, Andrew Cooper

On Thu, Feb 06, 2020 at 01:49:35PM +0000, Wei Liu wrote:
> On Mon, Jan 27, 2020 at 07:11:15PM +0100, Roger Pau Monne wrote:
> > Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
> > This greatly increases the performance of TLB flushes when running
> > with a high amount of vCPUs as a Xen guest, and is specially important
> > when running in shim mode.
> > 
> > The following figures are from a PV guest running `make -j32 xen` in
> > shim mode with 32 vCPUs and HAP.
> > 
> > Using x2APIC and ALLBUT shorthand:
> > real	4m35.973s
> > user	4m35.110s
> > sys	36m24.117s
> > 
> > Using L0 assisted flush:
> > real    1m2.596s
> > user    4m34.818s
> > sys     5m16.374s
> > 
> > The implementation adds a new hook to hypervisor_ops so other
> > enlightenments can also implement such assisted flush just by filling
> > the hook. Note that the Xen implementation completely ignores the
> > dirty CPU mask and the linear address passed in, and always performs a
> > global TLB flush on all vCPUs.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Changes since v1:
> >  - Add a L0 assisted hook to hypervisor ops.
> > ---
> >  xen/arch/x86/guest/hypervisor.c        | 10 ++++++++++
> >  xen/arch/x86/guest/xen/xen.c           |  6 ++++++
> >  xen/arch/x86/smp.c                     | 11 +++++++++++
> >  xen/include/asm-x86/guest/hypervisor.h | 17 +++++++++++++++++
> >  4 files changed, 44 insertions(+)
> > 
> > diff --git a/xen/arch/x86/guest/hypervisor.c b/xen/arch/x86/guest/hypervisor.c
> > index 4f27b98740..4085b19734 100644
> > --- a/xen/arch/x86/guest/hypervisor.c
> > +++ b/xen/arch/x86/guest/hypervisor.c
> > @@ -18,6 +18,7 @@
> >   *
> >   * Copyright (c) 2019 Microsoft.
> >   */
> > +#include <xen/cpumask.h>
> >  #include <xen/init.h>
> >  #include <xen/types.h>
> >  
> > @@ -64,6 +65,15 @@ void hypervisor_resume(void)
> >          ops->resume();
> >  }
> >  
> > +int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
> > +                         unsigned int order)
> > +{
> > +    if ( ops && ops->flush_tlb )
> > +        return ops->flush_tlb(mask, va, order);
> > +
> 
> Is there a way to make this an alternative call? I consider tlb flush a
> frequent operation which can use some optimisation.
> 
> This can be done as a later improvement if it is too difficult though.
> This patch already has some substantial improvement.

I can look into making this an alternative call, if it turn out to be
too complex I will leave it out for a separate patch.

> > +    return -ENOSYS;
> > +}
> > +
> >  /*
> >   * Local variables:
> >   * mode: C
> > diff --git a/xen/arch/x86/guest/xen/xen.c b/xen/arch/x86/guest/xen/xen.c
> > index 6dbc5f953f..639a2a4b32 100644
> > --- a/xen/arch/x86/guest/xen/xen.c
> > +++ b/xen/arch/x86/guest/xen/xen.c
> > @@ -310,11 +310,17 @@ static void resume(void)
> >          pv_console_init();
> >  }
> >  
> > +static int flush_tlb(const cpumask_t *mask, const void *va, unsigned int order)
> > +{
> > +    return xen_hypercall_hvm_op(HVMOP_flush_tlbs, NULL);
> > +}
> > +
> >  static const struct hypervisor_ops ops = {
> >      .name = "Xen",
> >      .setup = setup,
> >      .ap_setup = ap_setup,
> >      .resume = resume,
> > +    .flush_tlb = flush_tlb,
> >  };
> >  
> >  const struct hypervisor_ops *__init xg_probe(void)
> > diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
> > index 65eb7cbda8..9bc925616a 100644
> > --- a/xen/arch/x86/smp.c
> > +++ b/xen/arch/x86/smp.c
> > @@ -15,6 +15,7 @@
> >  #include <xen/perfc.h>
> >  #include <xen/spinlock.h>
> >  #include <asm/current.h>
> > +#include <asm/guest.h>
> >  #include <asm/smp.h>
> >  #include <asm/mc146818rtc.h>
> >  #include <asm/flushtlb.h>
> > @@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
> >      if ( (flags & ~FLUSH_ORDER_MASK) &&
> >           !cpumask_subset(mask, cpumask_of(cpu)) )
> >      {
> > +        if ( cpu_has_hypervisor &&
> > +             !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
> > +                         FLUSH_ORDER_MASK)) &&
> > +             !hypervisor_flush_tlb(mask, va, flags & FLUSH_ORDER_MASK) )
> > +        {
> > +            if ( tlb_clk_enabled )
> > +                tlb_clk_enabled = false;
> > +            return;
> > +        }
> > +
> 
> Per my understanding, not turning tlb_clk_enabled back to true after an
> assisted flush fails is okay, because the effect of tlb_clk_enabled
> being false is to always make NEED_FLUSH return true. That causes
> excessive flushing, but it is okay in terms of correctness. Do I
> understand it correctly?

Yes, that's my understanding. The stamped TLB is an optimization in
order to avoid flushes. Keeping it always off is safe, OTOH having it
on without properly tracking the flushes can indeed cause issues.

For the time being I would rather leave it off when it's been
disabled (ie: not turn it on if the assisted flush fails), as that's
safe.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
  2020-02-06 14:09     ` Roger Pau Monné
@ 2020-02-06 14:46       ` Jan Beulich
  2020-02-06 15:37         ` Roger Pau Monné
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2020-02-06 14:46 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Wei Liu, Andrew Cooper

On 06.02.2020 15:09, Roger Pau Monné wrote:
> On Thu, Feb 06, 2020 at 01:49:35PM +0000, Wei Liu wrote:
>> On Mon, Jan 27, 2020 at 07:11:15PM +0100, Roger Pau Monne wrote:
>>> Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
>>> This greatly increases the performance of TLB flushes when running
>>> with a high amount of vCPUs as a Xen guest, and is specially important
>>> when running in shim mode.
>>>
>>> The following figures are from a PV guest running `make -j32 xen` in
>>> shim mode with 32 vCPUs and HAP.
>>>
>>> Using x2APIC and ALLBUT shorthand:
>>> real	4m35.973s
>>> user	4m35.110s
>>> sys	36m24.117s
>>>
>>> Using L0 assisted flush:
>>> real    1m2.596s
>>> user    4m34.818s
>>> sys     5m16.374s
>>>
>>> The implementation adds a new hook to hypervisor_ops so other
>>> enlightenments can also implement such assisted flush just by filling
>>> the hook. Note that the Xen implementation completely ignores the
>>> dirty CPU mask and the linear address passed in, and always performs a
>>> global TLB flush on all vCPUs.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> Changes since v1:
>>>  - Add a L0 assisted hook to hypervisor ops.
>>> ---
>>>  xen/arch/x86/guest/hypervisor.c        | 10 ++++++++++
>>>  xen/arch/x86/guest/xen/xen.c           |  6 ++++++
>>>  xen/arch/x86/smp.c                     | 11 +++++++++++
>>>  xen/include/asm-x86/guest/hypervisor.h | 17 +++++++++++++++++
>>>  4 files changed, 44 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/guest/hypervisor.c b/xen/arch/x86/guest/hypervisor.c
>>> index 4f27b98740..4085b19734 100644
>>> --- a/xen/arch/x86/guest/hypervisor.c
>>> +++ b/xen/arch/x86/guest/hypervisor.c
>>> @@ -18,6 +18,7 @@
>>>   *
>>>   * Copyright (c) 2019 Microsoft.
>>>   */
>>> +#include <xen/cpumask.h>
>>>  #include <xen/init.h>
>>>  #include <xen/types.h>
>>>  
>>> @@ -64,6 +65,15 @@ void hypervisor_resume(void)
>>>          ops->resume();
>>>  }
>>>  
>>> +int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
>>> +                         unsigned int order)
>>> +{
>>> +    if ( ops && ops->flush_tlb )
>>> +        return ops->flush_tlb(mask, va, order);
>>> +
>>
>> Is there a way to make this an alternative call? I consider tlb flush a
>> frequent operation which can use some optimisation.
>>
>> This can be done as a later improvement if it is too difficult though.
>> This patch already has some substantial improvement.
> 
> I can look into making this an alternative call, if it turn out to be
> too complex I will leave it out for a separate patch.

It'll be two steps - make a global struct hypervisor_ops instance
which the per-hypervisor instances get _copied_ into upon boot
(at that point all of those can go into .init.* sections), and
then switch the call(s) of interest. I.e. while the 2nd step can
of course be done right here, the first will want to be in a
prereq patch.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] [PATCH v3 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
  2020-02-06 14:46       ` Jan Beulich
@ 2020-02-06 15:37         ` Roger Pau Monné
  0 siblings, 0 replies; 28+ messages in thread
From: Roger Pau Monné @ 2020-02-06 15:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Wei Liu, Andrew Cooper

On Thu, Feb 06, 2020 at 03:46:56PM +0100, Jan Beulich wrote:
> On 06.02.2020 15:09, Roger Pau Monné wrote:
> > On Thu, Feb 06, 2020 at 01:49:35PM +0000, Wei Liu wrote:
> >> On Mon, Jan 27, 2020 at 07:11:15PM +0100, Roger Pau Monne wrote:
> >>> Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
> >>> This greatly increases the performance of TLB flushes when running
> >>> with a high amount of vCPUs as a Xen guest, and is specially important
> >>> when running in shim mode.
> >>>
> >>> The following figures are from a PV guest running `make -j32 xen` in
> >>> shim mode with 32 vCPUs and HAP.
> >>>
> >>> Using x2APIC and ALLBUT shorthand:
> >>> real	4m35.973s
> >>> user	4m35.110s
> >>> sys	36m24.117s
> >>>
> >>> Using L0 assisted flush:
> >>> real    1m2.596s
> >>> user    4m34.818s
> >>> sys     5m16.374s
> >>>
> >>> The implementation adds a new hook to hypervisor_ops so other
> >>> enlightenments can also implement such assisted flush just by filling
> >>> the hook. Note that the Xen implementation completely ignores the
> >>> dirty CPU mask and the linear address passed in, and always performs a
> >>> global TLB flush on all vCPUs.
> >>>
> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>> ---
> >>> Changes since v1:
> >>>  - Add a L0 assisted hook to hypervisor ops.
> >>> ---
> >>>  xen/arch/x86/guest/hypervisor.c        | 10 ++++++++++
> >>>  xen/arch/x86/guest/xen/xen.c           |  6 ++++++
> >>>  xen/arch/x86/smp.c                     | 11 +++++++++++
> >>>  xen/include/asm-x86/guest/hypervisor.h | 17 +++++++++++++++++
> >>>  4 files changed, 44 insertions(+)
> >>>
> >>> diff --git a/xen/arch/x86/guest/hypervisor.c b/xen/arch/x86/guest/hypervisor.c
> >>> index 4f27b98740..4085b19734 100644
> >>> --- a/xen/arch/x86/guest/hypervisor.c
> >>> +++ b/xen/arch/x86/guest/hypervisor.c
> >>> @@ -18,6 +18,7 @@
> >>>   *
> >>>   * Copyright (c) 2019 Microsoft.
> >>>   */
> >>> +#include <xen/cpumask.h>
> >>>  #include <xen/init.h>
> >>>  #include <xen/types.h>
> >>>  
> >>> @@ -64,6 +65,15 @@ void hypervisor_resume(void)
> >>>          ops->resume();
> >>>  }
> >>>  
> >>> +int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
> >>> +                         unsigned int order)
> >>> +{
> >>> +    if ( ops && ops->flush_tlb )
> >>> +        return ops->flush_tlb(mask, va, order);
> >>> +
> >>
> >> Is there a way to make this an alternative call? I consider tlb flush a
> >> frequent operation which can use some optimisation.
> >>
> >> This can be done as a later improvement if it is too difficult though.
> >> This patch already has some substantial improvement.
> > 
> > I can look into making this an alternative call, if it turn out to be
> > too complex I will leave it out for a separate patch.
> 
> It'll be two steps - make a global struct hypervisor_ops instance
> which the per-hypervisor instances get _copied_ into upon boot
> (at that point all of those can go into .init.* sections), and
> then switch the call(s) of interest. I.e. while the 2nd step can
> of course be done right here, the first will want to be in a
> prereq patch.

Done. I've only switched the flush_tlb operation, since the rest are
not relevant from a performance PoV.

Will wait until next week before posting a new version.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2020-02-06 15:38 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-27 18:11 [Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
2020-01-27 18:11 ` [Xen-devel] [PATCH v3 1/7] x86/tlb: fix NEED_FLUSH return type Roger Pau Monne
2020-01-28 14:17   ` Wei Liu
2020-02-03 11:48     ` Jan Beulich
2020-01-27 18:11 ` [Xen-devel] [PATCH v3 2/7] x86/hvm: allow ASID flush when v != current Roger Pau Monne
2020-02-06 11:15   ` Wei Liu
2020-01-27 18:11 ` [Xen-devel] [PATCH v3 3/7] x86/paging: add TLB flush hooks Roger Pau Monne
2020-02-05 20:52   ` Wei Liu
2020-01-27 18:11 ` [Xen-devel] [PATCH v3 4/7] x86/hap: improve hypervisor assisted guest TLB flush Roger Pau Monne
2020-02-06 11:23   ` Wei Liu
2020-01-27 18:11 ` [Xen-devel] [PATCH v3 5/7] x86/tlb: introduce a flush guests TLB flag Roger Pau Monne
2020-02-06 12:59   ` Wei Liu
2020-01-27 18:11 ` [Xen-devel] [PATCH v3 6/7] x86/tlb: allow disabling the TLB clock Roger Pau Monne
2020-02-06 13:31   ` Wei Liu
2020-02-06 13:46     ` Roger Pau Monné
2020-01-27 18:11 ` [Xen-devel] [PATCH v3 7/7] x86/tlb: use Xen L0 assisted TLB flush when available Roger Pau Monne
2020-01-28 14:17   ` Wei Liu
2020-01-28 14:57     ` Roger Pau Monné
2020-01-28 16:24       ` Wei Liu
2020-01-28 17:16         ` Roger Pau Monné
2020-01-28 17:20           ` Andrew Cooper
2020-01-28 17:38             ` Roger Pau Monné
2020-02-06 13:49   ` Wei Liu
2020-02-06 14:09     ` Roger Pau Monné
2020-02-06 14:46       ` Jan Beulich
2020-02-06 15:37         ` Roger Pau Monné
2020-02-05 16:14 ` [Xen-devel] [PATCH v3 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monné
2020-02-05 16:44   ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.