All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] new config option vtsc_khz_tolerance to avoid TSC emulation
@ 2018-03-05 11:35 Olaf Hering
  2018-03-05 13:18 ` Jan Beulich
  0 siblings, 1 reply; 6+ messages in thread
From: Olaf Hering @ 2018-03-05 11:35 UTC (permalink / raw)
  To: xen-devel
  Cc: Olaf Hering, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Marek Marczykowski-Górecki,
	Tim Deegan, Jan Beulich, Daniel De Graaf

Add an option to control when vTSC emulation will be activated for a
domU with tsc_mode=default. Without such option each TSC access from
domU will be emulated, which causes a significant perfomance drop for
workloads that make use of rdtsc.

Add a new domctl XEN_DOMCTL_set_vtsc_khz_tolerance to adjust the
tolerance value of a running domU that is supposed to be migrated.

One option to avoid the TSC option is to run domUs with tsc_mode=native.
This has the drawback that migrating a domU from a "2.3GHz" class host
to a "2.4GHz" class host may change the rate at wich the TSC counter
increases, the domU may not be prepared for that.

With this option the host admin can decide how a domU should behave when
it is migrated across systems of the same class. Since there is always
some jitter when Xen calibrates the cpu_khz value, all hosts of the same
class will most likely have slightly different values. As a result vTSC
emulation is unavoidable.

The new option allows jitter up to 65535 KHz, which is more than enough
for the intended use. Data collected during the incident which triggered
this change showed a jitter of up to 200 KHz across systems of the same
class.

A new utility is added which allows to adjust the vtsc_khz_tolerance
value for running domUs. This is useful to avoid emulation for domUs
that are already running and which can not be restarted.

Existing padding fields are reused to store vtsc_khz_tolerance as u16.

In case the migration stream should get a new record type, the ordering
of records sent during migration becomes important. The value of
vtsc_khz_tolerance must be known by the receiving host before
configuring TSC, because this is the place where the decision of vTSC
emulation is made.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
---

Followup to earlier attempt with global knob:
https://lists.xenproject.org/archives/html/xen-devel/2017-04/msg01589.html

 .gitignore                               |  1 +
 docs/man/xl.cfg.pod.5.in                 | 10 ++++
 docs/specs/libxc-migration-stream.pandoc |  6 ++-
 tools/libxc/include/xenctrl.h            |  6 +++
 tools/libxc/xc_domain.c                  | 15 ++++++
 tools/libxc/xc_sr_common_x86.c           |  6 ++-
 tools/libxc/xc_sr_stream_format.h        |  3 +-
 tools/libxl/libxl_types.idl              |  1 +
 tools/libxl/libxl_x86.c                  |  3 +-
 tools/misc/Makefile                      |  4 ++
 tools/misc/xen-vtsc.c                    | 92 ++++++++++++++++++++++++++++++++
 tools/python/xen/lowlevel/xc/xc.c        |  2 +-
 tools/xl/xl_parse.c                      |  3 ++
 xen/arch/x86/domain.c                    |  2 +-
 xen/arch/x86/domctl.c                    | 11 ++++
 xen/arch/x86/time.c                      | 29 ++++++++--
 xen/include/asm-x86/domain.h             |  1 +
 xen/include/asm-x86/time.h               |  4 +-
 xen/include/public/domctl.h              |  5 +-
 xen/xsm/flask/hooks.c                    |  1 +
 20 files changed, 191 insertions(+), 14 deletions(-)
 create mode 100644 tools/misc/xen-vtsc.c

diff --git a/.gitignore b/.gitignore
index c566e3cb64..da2a54eb90 100644
--- a/.gitignore
+++ b/.gitignore
@@ -221,6 +221,7 @@ tools/misc/xen-detect
 tools/misc/xen-diag
 tools/misc/xen-tmem-list-parse
 tools/misc/xen-livepatch
+tools/misc/xen-vtsc
 tools/misc/xenperf
 tools/misc/xenpm
 tools/misc/xen-hvmctx
diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index a699367779..0d67525192 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -1891,6 +1891,16 @@ determined in a similar way to that of B<default> TSC mode.
 
 Please see B<xen-tscmode(7)> for more information on this option.
 
+=item B<vtsc_khz_tolerance="KHZ">
+
+B<(x86 only, relevant only for tsc_mode=default)>
+When a domU is started, the CPU frequency of the host is used by the domU for
+TSC related time measurement. Once the domU is either migrated or
+saved/restored on another host that CPU frequency has to be emulated to avoid
+timedrift. To avoid the performance penalty of the TSC emulation, allow a
+certain amount of jitter of the measured CPU frequency on the hosts the domU
+is supposed to run on.
+
 =item B<localtime=BOOLEAN>
 
 Set the real time clock to local time or to UTC. False (0) by default,
diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc
index 73421ff393..e69b88e526 100644
--- a/docs/specs/libxc-migration-stream.pandoc
+++ b/docs/specs/libxc-migration-stream.pandoc
@@ -3,7 +3,7 @@
   Andrew Cooper <<andrew.cooper3@citrix.com>>
   Wen Congyang <<wency@cn.fujitsu.com>>
   Yang Hongyang <<hongyang.yang@easystack.cn>>
-% Revision 2
+% Revision 3
 
 Introduction
 ============
@@ -472,7 +472,7 @@ XEN\_DOMCTL\_{get,set}tscinfo hypercall sub-ops.
     +------------------------+------------------------+
     | nsec                                            |
     +------------------------+------------------------+
-    | incarnation            | (reserved)             |
+    | incarnation            | tolerance | (reserved) |
     +------------------------+------------------------+
 
 --------------------------------------------------------------------
@@ -485,6 +485,8 @@ khz              TSC frequency, in kHz.
 nsec             Elapsed time, in nanoseconds.
 
 incarnation      Incarnation.
+
+vtsc_tolerance   Amount of Jitter the domU can handle after migration.
 --------------------------------------------------------------------
 
 \clearpage
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 543abfcb34..7159a836ed 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1355,11 +1355,16 @@ int xc_domain_set_time_offset(xc_interface *xch,
                               uint32_t domid,
                               int32_t time_offset_seconds);
 
+int xc_domain_set_vtsc_khz_tolerance(xc_interface *xch,
+                                     uint32_t domid,
+                                     uint16_t vtsc_khz_tolerance);
+
 int xc_domain_set_tsc_info(xc_interface *xch,
                            uint32_t domid,
                            uint32_t tsc_mode,
                            uint64_t elapsed_nsec,
                            uint32_t gtsc_khz,
+                           uint16_t vtsc_khz_tolerance,
                            uint32_t incarnation);
 
 int xc_domain_get_tsc_info(xc_interface *xch,
@@ -1367,6 +1372,7 @@ int xc_domain_get_tsc_info(xc_interface *xch,
                            uint32_t *tsc_mode,
                            uint64_t *elapsed_nsec,
                            uint32_t *gtsc_khz,
+                           uint16_t *vtsc_khz_tolerance,
                            uint32_t *incarnation);
 
 int xc_domain_disable_migrate(xc_interface *xch, uint32_t domid);
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index ea3df1ef31..5ce78c4fe0 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -847,11 +847,23 @@ int xc_domain_disable_migrate(xc_interface *xch, uint32_t domid)
     return do_domctl(xch, &domctl);
 }
 
+int xc_domain_set_vtsc_khz_tolerance(xc_interface *xch,
+                                     uint32_t domid,
+                                     uint16_t vtsc_khz_tolerance)
+{
+    DECLARE_DOMCTL;
+    domctl.cmd = XEN_DOMCTL_set_vtsc_khz_tolerance;
+    domctl.domain = domid;
+    domctl.u.tsc_info.vtsc_khz_tolerance = vtsc_khz_tolerance;
+    return do_domctl(xch, &domctl);
+}
+
 int xc_domain_set_tsc_info(xc_interface *xch,
                            uint32_t domid,
                            uint32_t tsc_mode,
                            uint64_t elapsed_nsec,
                            uint32_t gtsc_khz,
+                           uint16_t vtsc_khz_tolerance,
                            uint32_t incarnation)
 {
     DECLARE_DOMCTL;
@@ -860,6 +872,7 @@ int xc_domain_set_tsc_info(xc_interface *xch,
     domctl.u.tsc_info.tsc_mode = tsc_mode;
     domctl.u.tsc_info.elapsed_nsec = elapsed_nsec;
     domctl.u.tsc_info.gtsc_khz = gtsc_khz;
+    domctl.u.tsc_info.vtsc_khz_tolerance = vtsc_khz_tolerance;
     domctl.u.tsc_info.incarnation = incarnation;
     return do_domctl(xch, &domctl);
 }
@@ -869,6 +882,7 @@ int xc_domain_get_tsc_info(xc_interface *xch,
                            uint32_t *tsc_mode,
                            uint64_t *elapsed_nsec,
                            uint32_t *gtsc_khz,
+                           uint16_t *vtsc_khz_tolerance,
                            uint32_t *incarnation)
 {
     int rc;
@@ -882,6 +896,7 @@ int xc_domain_get_tsc_info(xc_interface *xch,
         *tsc_mode = domctl.u.tsc_info.tsc_mode;
         *elapsed_nsec = domctl.u.tsc_info.elapsed_nsec;
         *gtsc_khz = domctl.u.tsc_info.gtsc_khz;
+        *vtsc_khz_tolerance = domctl.u.tsc_info.vtsc_khz_tolerance;
         *incarnation = domctl.u.tsc_info.incarnation;
     }
     return rc;
diff --git a/tools/libxc/xc_sr_common_x86.c b/tools/libxc/xc_sr_common_x86.c
index 98f1cef30f..ea3e551a83 100644
--- a/tools/libxc/xc_sr_common_x86.c
+++ b/tools/libxc/xc_sr_common_x86.c
@@ -12,7 +12,8 @@ int write_tsc_info(struct xc_sr_context *ctx)
     };
 
     if ( xc_domain_get_tsc_info(xch, ctx->domid, &tsc.mode,
-                                &tsc.nsec, &tsc.khz, &tsc.incarnation) < 0 )
+                                &tsc.nsec, &tsc.khz, &tsc.vtsc_tolerance,
+                                &tsc.incarnation) < 0 )
     {
         PERROR("Unable to obtain TSC information");
         return -1;
@@ -34,7 +35,8 @@ int handle_tsc_info(struct xc_sr_context *ctx, struct xc_sr_record *rec)
     }
 
     if ( xc_domain_set_tsc_info(xch, ctx->domid, tsc->mode,
-                                tsc->nsec, tsc->khz, tsc->incarnation) )
+                                tsc->nsec, tsc->khz, tsc->vtsc_tolerance,
+                                tsc->incarnation) )
     {
         PERROR("Unable to set TSC information");
         return -1;
diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h
index 15ff1c7efb..9b52f6ace6 100644
--- a/tools/libxc/xc_sr_stream_format.h
+++ b/tools/libxc/xc_sr_stream_format.h
@@ -121,7 +121,8 @@ struct xc_sr_rec_tsc_info
     uint32_t khz;
     uint64_t nsec;
     uint32_t incarnation;
-    uint32_t _res1;
+    uint16_t vtsc_tolerance;
+    uint16_t _res1;
 };
 
 /* HVM_PARAMS */
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 35038120ca..cce4f3eff1 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -466,6 +466,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("vcpu_soft_affinity", Array(libxl_bitmap, "num_vcpu_soft_affinity")),
     ("numa_placement",  libxl_defbool),
     ("tsc_mode",        libxl_tsc_mode),
+    ("vtsc_khz_tolerance", uint32),
     ("max_memkb",       MemKB),
     ("target_memkb",    MemKB),
     ("video_memkb",     MemKB),
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index d82013f6ed..1114ac0865 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -317,7 +317,8 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
     default:
         abort();
     }
-    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0);
+    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0,
+                           d_config->b_info.vtsc_khz_tolerance, 0);
     if (libxl_defbool_val(d_config->b_info.disable_migrate))
         xc_domain_disable_migrate(ctx->xch, domid);
     rtc_timeoffset = d_config->b_info.rtc_timeoffset;
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index eaa28793ef..2b086e5758 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -36,6 +36,7 @@ INSTALL_SBIN += $(INSTALL_SBIN-y)
 
 # Everything to be installed in a private bin/
 INSTALL_PRIVBIN                += xenpvnetboot
+INSTALL_PRIVBIN                += xen-vtsc
 
 # Everything to be installed
 TARGETS_ALL := $(INSTALL_BIN) $(INSTALL_SBIN) $(INSTALL_PRIVBIN)
@@ -112,6 +113,9 @@ xen-livepatch: xen-livepatch.o
 xen-diag: xen-diag.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xen-vtsc: xen-vtsc.o
+	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
+
 xen-lowmemd: xen-lowmemd.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenevtchn) $(LDLIBS_libxenctrl) $(LDLIBS_libxenstore) $(APPEND_LDFLAGS)
 
diff --git a/tools/misc/xen-vtsc.c b/tools/misc/xen-vtsc.c
new file mode 100644
index 0000000000..07649c6397
--- /dev/null
+++ b/tools/misc/xen-vtsc.c
@@ -0,0 +1,92 @@
+/* Could be merged into xen-diag.c? */
+
+#include <inttypes.h>
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <xenctrl.h>
+
+static uint32_t domid;
+static uint32_t tsc_mode;
+static uint64_t elapsed_nsec;
+static uint32_t gtsc_khz;
+static uint16_t vtsc_khz_tolerance;
+static uint32_t incarnation;
+static uint16_t new_vtsc_khz_tolerance;
+
+static void show_help(void)
+{
+    fprintf(stderr, "Usage: xen-vtsc <domid> [vtsc_tolerance]\n");
+}
+
+int main(int argc, char *argv[])
+{
+    struct xc_interface_core *xch;
+    int mode_set = 0;
+    int rc;
+
+    if (argc < 2 || argc > 3 || strcmp("-h", argv[1]) == 0)
+    {
+        show_help();
+        return 0;
+    }
+    domid = atol(argv[1]);
+    if (argc == 3)
+    {
+        unsigned long val;
+
+        val = atol(argv[2]);
+        if ( val > UINT16_MAX )
+        {
+            fprintf(stderr,
+                    "Error: value for vtsc_tolerance must between 0 and %u\n", USHRT_MAX);
+            return 1;
+        }
+        new_vtsc_khz_tolerance = val;
+        if ( domid )
+            mode_set = 1;
+    }
+
+    xch = xc_interface_open(0,0,0);
+    if ( !xch )
+    {
+        fprintf(stderr, "failed to get xch handler\n");
+        return 1;
+    }
+
+
+
+    if (mode_set)
+    {
+        rc = xc_domain_set_vtsc_khz_tolerance(xch, domid, new_vtsc_khz_tolerance);
+        if ( rc )
+        {
+            perror("xc_domain_set_vtsc_khz_tolerance");
+            goto err;
+        }
+    }
+    else
+    {
+        rc = xc_domain_get_tsc_info(xch, domid, &tsc_mode, &elapsed_nsec, &gtsc_khz,
+                                    &vtsc_khz_tolerance, &incarnation);
+        if ( rc )
+        {
+            perror("xc_domain_get_tsc_info");
+            goto err;
+        }
+
+        printf("domid: %" PRIu32 "\n"
+               "tsc_mode: %" PRIu32 "\n"
+               "elapsed_nsec: %" PRIu64 "\n"
+               "gtsc_khz: %" PRIu32 "\n"
+               "vtsc_khz_tolerance: %" PRIu16 "\n"
+               "incarnation: %" PRIu32 "\n",
+               domid, tsc_mode, elapsed_nsec, gtsc_khz, vtsc_khz_tolerance, incarnation);
+    }
+
+err:
+    xc_interface_close(xch);
+
+    return !!rc;
+}
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index f501764100..e73e2cafc7 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1522,7 +1522,7 @@ static PyObject *pyxc_domain_set_tsc_info(XcObject *self, PyObject *args)
     if (!PyArg_ParseTuple(args, "ii", &dom, &tsc_mode))
         return NULL;
 
-    if (xc_domain_set_tsc_info(self->xc_handle, dom, tsc_mode, 0, 0, 0) != 0)
+    if (xc_domain_set_tsc_info(self->xc_handle, dom, tsc_mode, 0, 0, 0, 0) != 0)
         return pyxc_error_to_exception(self->xc_handle);
 
     Py_INCREF(zero);
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index f6842540ca..0388416011 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1126,6 +1126,9 @@ void parse_config_data(const char *config_source,
         }
     }
 
+    if (!xlu_cfg_get_long(config, "vtsc_khz_tolerance", &l, 0))
+        b_info->vtsc_khz_tolerance = l < 0 || l > UINT16_MAX ? UINT16_MAX : l;
+
     if (!xlu_cfg_get_long(config, "rtc_timeoffset", &l, 0))
         b_info->rtc_timeoffset = l;
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 1f8b08ef02..2ee8755d13 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -558,7 +558,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
     }
 
     /* initialize default tsc behavior in case tools don't */
-    tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0);
+    tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0, 0);
     spin_lock_init(&d->arch.vtsc_lock);
 
     /* PV/PVH guests get an emulated PIT too for video BIOSes to use. */
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 8fbbf3aeb3..7d11957a80 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -939,6 +939,7 @@ long arch_do_domctl(
             tsc_get_info(d, &domctl->u.tsc_info.tsc_mode,
                          &domctl->u.tsc_info.elapsed_nsec,
                          &domctl->u.tsc_info.gtsc_khz,
+                         &domctl->u.tsc_info.vtsc_khz_tolerance,
                          &domctl->u.tsc_info.incarnation);
             domain_unpause(d);
             copyback = true;
@@ -954,11 +955,21 @@ long arch_do_domctl(
             tsc_set_info(d, domctl->u.tsc_info.tsc_mode,
                          domctl->u.tsc_info.elapsed_nsec,
                          domctl->u.tsc_info.gtsc_khz,
+                         domctl->u.tsc_info.vtsc_khz_tolerance,
                          domctl->u.tsc_info.incarnation);
             domain_unpause(d);
         }
         break;
 
+    case XEN_DOMCTL_set_vtsc_khz_tolerance:
+        if ( d == currd )
+            ret = -EINVAL;
+        else
+        {
+            d->arch.vtsc_khz_tolerance = domctl->u.tsc_info.vtsc_khz_tolerance;
+        }
+        break;
+
     case XEN_DOMCTL_suppress_spurious_page_faults:
         d->arch.suppress_spurious_page_faults = 1;
         break;
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 1a6fde65dd..bc078e4434 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2064,13 +2064,14 @@ int host_tsc_is_safe(void)
  */
 void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
                   uint64_t *elapsed_nsec, uint32_t *gtsc_khz,
-                  uint32_t *incarnation)
+                  uint16_t *vtsc_khz_tolerance, uint32_t *incarnation)
 {
     bool enable_tsc_scaling = is_hvm_domain(d) &&
                               hvm_tsc_scaling_supported && !d->arch.vtsc;
 
     *incarnation = d->arch.incarnation;
     *tsc_mode = d->arch.tsc_mode;
+    *vtsc_khz_tolerance = d->arch.vtsc_khz_tolerance;
 
     switch ( *tsc_mode )
     {
@@ -2122,7 +2123,8 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
  */
 void tsc_set_info(struct domain *d,
                   uint32_t tsc_mode, uint64_t elapsed_nsec,
-                  uint32_t gtsc_khz, uint32_t incarnation)
+                  uint32_t gtsc_khz, uint16_t vtsc_khz_tolerance,
+                  uint32_t incarnation)
 {
     if ( is_idle_domain(d) || is_hardware_domain(d) )
     {
@@ -2132,9 +2134,12 @@ void tsc_set_info(struct domain *d,
 
     switch ( d->arch.tsc_mode = tsc_mode )
     {
+        bool diff_tolerated;
         bool enable_tsc_scaling;
 
     case TSC_MODE_DEFAULT:
+        d->arch.vtsc_khz_tolerance = vtsc_khz_tolerance;
+        /* Fallthrough. */
     case TSC_MODE_ALWAYS_EMULATE:
         d->arch.vtsc_offset = get_s_time() - elapsed_nsec;
         d->arch.tsc_khz = gtsc_khz ?: cpu_khz;
@@ -2147,8 +2152,24 @@ void tsc_set_info(struct domain *d,
          * When a guest is created, gtsc_khz is passed in as zero, making
          * d->arch.tsc_khz == cpu_khz. Thus no need to check incarnation.
          */
+        diff_tolerated = d->arch.tsc_khz == cpu_khz;
+
+        if ( tsc_mode == TSC_MODE_DEFAULT && gtsc_khz ) {
+            uint32_t khz_diff;
+
+            khz_diff = cpu_khz > gtsc_khz ?
+                       cpu_khz - gtsc_khz : gtsc_khz - cpu_khz;
+            if (vtsc_khz_tolerance)
+                diff_tolerated = khz_diff <= vtsc_khz_tolerance;
+
+            printk(XENLOG_WARNING "%s: d%u: host has %lu kHz,"
+                   " domU expects %u kHz,"
+                   " difference of %u is %s tolerance of %u\n",
+                   __func__, d->domain_id, cpu_khz, gtsc_khz, khz_diff,
+                   diff_tolerated ? "within" : "outside", vtsc_khz_tolerance);
+        }
         if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
-             (d->arch.tsc_khz == cpu_khz ||
+             (diff_tolerated ||
               (is_hvm_domain(d) &&
                hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) )
         {
@@ -2237,6 +2258,8 @@ static void dump_softtsc(unsigned char key)
             printk(",ofs=%#"PRIx64, d->arch.vtsc_offset);
         if ( d->arch.tsc_khz )
             printk(",khz=%"PRIu32, d->arch.tsc_khz);
+        if ( d->arch.vtsc_khz_tolerance )
+            printk(",tol=%"PRIu16, d->arch.vtsc_khz_tolerance);
         if ( d->arch.incarnation )
             printk(",inc=%"PRIu32, d->arch.incarnation);
 #if !defined(NDEBUG) || defined(CONFIG_PERF_COUNTERS)
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 4679d5477d..67d9c66c11 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -369,6 +369,7 @@ struct arch_domain
     /* TSC management (emulation, pv, scaling, stats) */
     int tsc_mode;            /* see include/asm-x86/time.h */
     bool_t vtsc;             /* tsc is emulated (may change after migrate) */
+    uint16_t vtsc_khz_tolerance; /* domU handles that much jitter in khz value */
     s_time_t vtsc_last;      /* previous TSC value (guarantee monotonicity) */
     spinlock_t vtsc_lock;
     uint64_t vtsc_offset;    /* adjustment for save/restore/migrate */
diff --git a/xen/include/asm-x86/time.h b/xen/include/asm-x86/time.h
index b3ae832df4..439b47d93e 100644
--- a/xen/include/asm-x86/time.h
+++ b/xen/include/asm-x86/time.h
@@ -61,10 +61,10 @@ u64 gtime_to_gtsc(struct domain *d, u64 time);
 u64 gtsc_to_gtime(struct domain *d, u64 tsc);
 
 void tsc_set_info(struct domain *d, uint32_t tsc_mode, uint64_t elapsed_nsec,
-                  uint32_t gtsc_khz, uint32_t incarnation);
+                  uint32_t gtsc_khz, uint16_t vtsc_khz_tolerance, uint32_t incarnation);
    
 void tsc_get_info(struct domain *d, uint32_t *tsc_mode, uint64_t *elapsed_nsec,
-                  uint32_t *gtsc_khz, uint32_t *incarnation);
+                  uint32_t *gtsc_khz, uint16_t *vtsc_khz_tolerance, uint32_t *incarnation);
    
 
 void force_update_vcpu_system_time(struct vcpu *v);
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index ec7a860afc..f405336d91 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -697,12 +697,14 @@ struct xen_domctl_disable_migrate {
 
 /* XEN_DOMCTL_gettscinfo */
 /* XEN_DOMCTL_settscinfo */
+/* XEN_DOMCTL_set_vtsc_khz_tolerance */
 struct xen_domctl_tsc_info {
     /* IN/OUT */
     uint32_t tsc_mode;
     uint32_t gtsc_khz;
     uint32_t incarnation;
-    uint32_t pad;
+    uint16_t vtsc_khz_tolerance;
+    uint16_t pad;
     uint64_aligned_t elapsed_nsec;
 };
 
@@ -1172,6 +1174,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_soft_reset                    79
 #define XEN_DOMCTL_set_gnttab_limits             80
 #define XEN_DOMCTL_vuart_op                      81
+#define XEN_DOMCTL_set_vtsc_khz_tolerance        82
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 1802d8dfe6..2faa9c87e1 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -721,6 +721,7 @@ static int flask_domctl(struct domain *d, int cmd)
         return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__GETTSC);
 
     case XEN_DOMCTL_settscinfo:
+    case XEN_DOMCTL_set_vtsc_khz_tolerance:
         return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__SETTSC);
 
     case XEN_DOMCTL_audit_p2m:

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] new config option vtsc_khz_tolerance to avoid TSC emulation
  2018-03-05 11:35 [PATCH v2] new config option vtsc_khz_tolerance to avoid TSC emulation Olaf Hering
@ 2018-03-05 13:18 ` Jan Beulich
  2018-03-05 14:18   ` Olaf Hering
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2018-03-05 13:18 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Marek Marczykowski-Górecki,
	xen-devel, Daniel De Graaf

>>> On 05.03.18 at 12:35, <olaf@aepfle.de> wrote:

One thing I'm missing in the description (or the added documentation)
is a discussion of the conditions under which it is safe to make use of
the new setting.

> @@ -954,11 +955,21 @@ long arch_do_domctl(
>              tsc_set_info(d, domctl->u.tsc_info.tsc_mode,
>                           domctl->u.tsc_info.elapsed_nsec,
>                           domctl->u.tsc_info.gtsc_khz,
> +                         domctl->u.tsc_info.vtsc_khz_tolerance,
>                           domctl->u.tsc_info.incarnation);
>              domain_unpause(d);
>          }
>          break;
>  
> +    case XEN_DOMCTL_set_vtsc_khz_tolerance:
> +        if ( d == currd )
> +            ret = -EINVAL;

Why? There's e.g. no domain_pause() involved here. And without
that there's no difference between changing the value behind the
back of a foreign domain, of for oneself. Granted Dom0 isn't likely
to want to do that, but such checks should have a reason.

> +        else
> +        {
> +            d->arch.vtsc_khz_tolerance = domctl->u.tsc_info.vtsc_khz_tolerance;
> +        }

Stray braces (the more that the if() side doesn't have them).

Also throughout the patch I wonder if it wasn't more natural to
put the unit last in the parameter / field names.

> @@ -2122,7 +2123,8 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
>   */
>  void tsc_set_info(struct domain *d,
>                    uint32_t tsc_mode, uint64_t elapsed_nsec,
> -                  uint32_t gtsc_khz, uint32_t incarnation)
> +                  uint32_t gtsc_khz, uint16_t vtsc_khz_tolerance,
> +                  uint32_t incarnation)

For the sake of consistency with the other types here I'm not
going to demand to replace uint16_t here, but it's really not
necessary to use that type here (other than on the read path).

> @@ -2147,8 +2152,24 @@ void tsc_set_info(struct domain *d,
>           * When a guest is created, gtsc_khz is passed in as zero, making
>           * d->arch.tsc_khz == cpu_khz. Thus no need to check incarnation.
>           */
> +        diff_tolerated = d->arch.tsc_khz == cpu_khz;
> +
> +        if ( tsc_mode == TSC_MODE_DEFAULT && gtsc_khz ) {

Style.

> +            uint32_t khz_diff;
> +
> +            khz_diff = cpu_khz > gtsc_khz ?
> +                       cpu_khz - gtsc_khz : gtsc_khz - cpu_khz;
> +            if (vtsc_khz_tolerance)

Again.

> +                diff_tolerated = khz_diff <= vtsc_khz_tolerance;
> +
> +            printk(XENLOG_WARNING "%s: d%u: host has %lu kHz,"
> +                   " domU expects %u kHz,"
> +                   " difference of %u is %s tolerance of %u\n",
> +                   __func__, d->domain_id, cpu_khz, gtsc_khz, khz_diff,
> +                   diff_tolerated ? "within" : "outside", vtsc_khz_tolerance);

Leftover debugging message?

> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -697,12 +697,14 @@ struct xen_domctl_disable_migrate {
>  
>  /* XEN_DOMCTL_gettscinfo */
>  /* XEN_DOMCTL_settscinfo */
> +/* XEN_DOMCTL_set_vtsc_khz_tolerance */
>  struct xen_domctl_tsc_info {
>      /* IN/OUT */
>      uint32_t tsc_mode;
>      uint32_t gtsc_khz;
>      uint32_t incarnation;
> -    uint32_t pad;
> +    uint16_t vtsc_khz_tolerance;
> +    uint16_t pad;

Generally with the prior pad field not being checked anywhere this
isn't a valid extension. However, this being a domctl (and the
interface version suitably bumped already) plus looking at how the
libxc code works, I think you can get away doing it like this.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] new config option vtsc_khz_tolerance to avoid TSC emulation
  2018-03-05 13:18 ` Jan Beulich
@ 2018-03-05 14:18   ` Olaf Hering
  2018-03-05 14:45     ` Jan Beulich
  0 siblings, 1 reply; 6+ messages in thread
From: Olaf Hering @ 2018-03-05 14:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Marek Marczykowski-Górecki,
	xen-devel, Daniel De Graaf


[-- Attachment #1.1: Type: text/plain, Size: 1489 bytes --]

Am Mon, 05 Mar 2018 06:18:17 -0700
schrieb "Jan Beulich" <JBeulich@suse.com>:

> >>> On 05.03.18 at 12:35, <olaf@aepfle.de> wrote:  
> 
> One thing I'm missing in the description (or the added documentation)
> is a discussion of the conditions under which it is safe to make use of
> the new setting.

The same rules as tsc_mode=native apply, I think.
xen-tscmode(7) has to be updated as well to cover the new knob.


> > +    case XEN_DOMCTL_set_vtsc_khz_tolerance:
> > +        if ( d == currd )
> > +            ret = -EINVAL;  
> Why? There's e.g. no domain_pause() involved here.

I have thought about that. Now I think that part can be even simpler,
just like the following XEN_DOMCTL_suppress_spurious_page_faults:
just change that read-only value unconditionally. There is no obvious
harm in changing that value.

> Also throughout the patch I wonder if it wasn't more natural to
> put the unit last in the parameter / field names.

That was just to keep the diff slightly smaller.


> > +            printk(XENLOG_WARNING "%s: d%u: host has %lu kHz,"
> > +                   " domU expects %u kHz,"
> > +                   " difference of %u is %s tolerance of %u\n",
> > +                   __func__, d->domain_id, cpu_khz, gtsc_khz, khz_diff,
> > +                   diff_tolerated ? "within" : "outside", vtsc_khz_tolerance);  
> Leftover debugging message?

I think it is worth to log that event, perhaps not with WARNING level.


Olaf

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] new config option vtsc_khz_tolerance to avoid TSC emulation
  2018-03-05 14:18   ` Olaf Hering
@ 2018-03-05 14:45     ` Jan Beulich
  2018-03-05 14:46       ` George Dunlap
  2018-03-05 17:05       ` Olaf Hering
  0 siblings, 2 replies; 6+ messages in thread
From: Jan Beulich @ 2018-03-05 14:45 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Tim Deegan, StefanoStabellini, Wei Liu, George Dunlap,
	Andrew Cooper, IanJackson, Marek Marczykowski-Górecki,
	xen-devel, Daniel De Graaf

>>> On 05.03.18 at 15:18, <olaf@aepfle.de> wrote:
> Am Mon, 05 Mar 2018 06:18:17 -0700
> schrieb "Jan Beulich" <JBeulich@suse.com>:
> 
>> >>> On 05.03.18 at 12:35, <olaf@aepfle.de> wrote:  
>> > +    case XEN_DOMCTL_set_vtsc_khz_tolerance:
>> > +        if ( d == currd )
>> > +            ret = -EINVAL;  
>> Why? There's e.g. no domain_pause() involved here.
> 
> I have thought about that. Now I think that part can be even simpler,
> just like the following XEN_DOMCTL_suppress_spurious_page_faults:
> just change that read-only value unconditionally. There is no obvious
> harm in changing that value.
> 
>> Also throughout the patch I wonder if it wasn't more natural to
>> put the unit last in the parameter / field names.
> 
> That was just to keep the diff slightly smaller.

I'm only talking about additions you make, and I don't see a size
difference between "vtsc_khz_tolerance" and "vtsc_tolerance_khz".

>> > +            printk(XENLOG_WARNING "%s: d%u: host has %lu kHz,"
>> > +                   " domU expects %u kHz,"
>> > +                   " difference of %u is %s tolerance of %u\n",
>> > +                   __func__, d->domain_id, cpu_khz, gtsc_khz, khz_diff,
>> > +                   diff_tolerated ? "within" : "outside", vtsc_khz_tolerance);  
>> Leftover debugging message?
> 
> I think it is worth to log that event, perhaps not with WARNING level.

XENLOG_G_INFO at most, I would say, and perhaps only when
the setting is non-zero.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] new config option vtsc_khz_tolerance to avoid TSC emulation
  2018-03-05 14:45     ` Jan Beulich
@ 2018-03-05 14:46       ` George Dunlap
  2018-03-05 17:05       ` Olaf Hering
  1 sibling, 0 replies; 6+ messages in thread
From: George Dunlap @ 2018-03-05 14:46 UTC (permalink / raw)
  To: Jan Beulich, Olaf Hering
  Cc: Tim Deegan, StefanoStabellini, Wei Liu, George Dunlap,
	Andrew Cooper, IanJackson, Marek Marczykowski-Górecki,
	xen-devel, Daniel De Graaf

On 03/05/2018 02:45 PM, Jan Beulich wrote:
>>>> On 05.03.18 at 15:18, <olaf@aepfle.de> wrote:
>> Am Mon, 05 Mar 2018 06:18:17 -0700
>> schrieb "Jan Beulich" <JBeulich@suse.com>:
>>
>>>>>> On 05.03.18 at 12:35, <olaf@aepfle.de> wrote:  
>>>> +    case XEN_DOMCTL_set_vtsc_khz_tolerance:
>>>> +        if ( d == currd )
>>>> +            ret = -EINVAL;  
>>> Why? There's e.g. no domain_pause() involved here.
>>
>> I have thought about that. Now I think that part can be even simpler,
>> just like the following XEN_DOMCTL_suppress_spurious_page_faults:
>> just change that read-only value unconditionally. There is no obvious
>> harm in changing that value.
>>
>>> Also throughout the patch I wonder if it wasn't more natural to
>>> put the unit last in the parameter / field names.
>>
>> That was just to keep the diff slightly smaller.
> 
> I'm only talking about additions you make, and I don't see a size
> difference between "vtsc_khz_tolerance" and "vtsc_tolerance_khz".

+1 to unit at the end.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] new config option vtsc_khz_tolerance to avoid TSC emulation
  2018-03-05 14:45     ` Jan Beulich
  2018-03-05 14:46       ` George Dunlap
@ 2018-03-05 17:05       ` Olaf Hering
  1 sibling, 0 replies; 6+ messages in thread
From: Olaf Hering @ 2018-03-05 17:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, StefanoStabellini, Wei Liu, George Dunlap,
	Andrew Cooper, IanJackson, Marek Marczykowski-Górecki,
	xen-devel, Daniel De Graaf


[-- Attachment #1.1: Type: text/plain, Size: 303 bytes --]

Am Mon, 05 Mar 2018 07:45:24 -0700
schrieb "Jan Beulich" <JBeulich@suse.com>:

> I'm only talking about additions you make, and I don't see a size
> difference between "vtsc_khz_tolerance" and "vtsc_tolerance_khz".

Oh, I misunderstood that sentence. I will change the name as suggested.

Olaf

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-03-05 17:05 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-05 11:35 [PATCH v2] new config option vtsc_khz_tolerance to avoid TSC emulation Olaf Hering
2018-03-05 13:18 ` Jan Beulich
2018-03-05 14:18   ` Olaf Hering
2018-03-05 14:45     ` Jan Beulich
2018-03-05 14:46       ` George Dunlap
2018-03-05 17:05       ` Olaf Hering

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.