All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
@ 2018-06-07 13:08 Olaf Hering
  2018-06-07 13:31 ` Jan Beulich
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Olaf Hering @ 2018-06-07 13:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Olaf Hering, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Marek Marczykowski-Górecki,
	Tim Deegan, Julien Grall, Jan Beulich

Add an option to control when vTSC emulation will be activated for a
domU with tsc_mode=default. Without such option each TSC access from
domU will be emulated, which causes a significant perfomance drop for
workloads that make use of rdtsc.

One option to avoid the TSC option is to run domUs with tsc_mode=native.
This has the drawback that migrating a domU from a "2.3GHz" class host
to a "2.4GHz" class host may change the rate at wich the TSC counter
increases, the domU may not be prepared for that.

With the new option the host admin can decide how a domU should behave
when it is migrated across systems of the same class. Since there is
always some jitter when Xen calibrates the cpu_khz value, all hosts of
the same class will most likely have slightly different values. As a
result vTSC emulation is unavoidable. Data collected during the incident
which triggered this change showed a jitter of up to 200 KHz across
systems of the same class.

Existing padding fields are reused to store vtsc_khz_tolerance as u16.
The padding is sent as zero in write_tsc_info to the receving host.
The padding is undefined if the changed code runs as receiver.
handle_tsc_info has no code to verify that padding is indeed zero. Due
to the lack of a version field it is impossible to know if the sender
already has the newly introduced vtsc_tolerance field. In the worst
case the receiving domU will get an unemulated TSC.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Wei Liu <wei.liu2@citrix.com> (v07/v08)
Reviewed-by: Jan Beulich <jbeulich@suse.com> (v08)
--

v9:
 - extend commit msg, mention potential issues with xc_sr_rec_tsc_info._res1
v8:
 - adjust also python stream checker for added tolerance member
v7:
 - use uint16 in libxl_types.idl to match type used elsewhere in the patch
v6:
 - mention default value in xl.cfg
 - tsc_set_info: remove usage of __func__, use %d for domid
 - tsc_set_info: use ABS to calculate khz_diff
v5:
 - reduce functionality to allow setting of the tolerance value
   only at initial domU startup
v4:
 - add missing copyback in XEN_DOMCTL_set_vtsc_tolerance_khz
v3:
 - rename vtsc_khz_tolerance to vtsc_tolerance_khz
 - separate domctls to adjust values
 - more docs
 - update libxl.h
 - update python tests
 - flask check bound to tsc permissions
 - not runtime tested due to dlsym() build errors in staging
---
 docs/man/xen-tscmode.pod.7               | 16 +++++++++++++
 docs/man/xl.cfg.pod.5.in                 | 10 ++++++++
 docs/specs/libxc-migration-stream.pandoc |  6 +++--
 tools/libxc/include/xenctrl.h            |  2 ++
 tools/libxc/xc_domain.c                  |  4 ++++
 tools/libxc/xc_sr_common_x86.c           |  6 +++--
 tools/libxc/xc_sr_stream_format.h        |  3 ++-
 tools/libxl/libxl.h                      |  6 +++++
 tools/libxl/libxl_types.idl              |  1 +
 tools/libxl/libxl_x86.c                  |  3 ++-
 tools/python/xen/lowlevel/xc/xc.c        |  2 +-
 tools/python/xen/migration/libxc.py      |  8 +++----
 tools/xl/xl_parse.c                      |  3 +++
 xen/arch/x86/domain.c                    |  2 +-
 xen/arch/x86/domctl.c                    |  2 ++
 xen/arch/x86/time.c                      | 30 +++++++++++++++++++++---
 xen/include/asm-x86/domain.h             |  1 +
 xen/include/asm-x86/time.h               |  6 +++--
 xen/include/public/domctl.h              |  3 ++-
 19 files changed, 96 insertions(+), 18 deletions(-)

diff --git a/docs/man/xen-tscmode.pod.7 b/docs/man/xen-tscmode.pod.7
index 3bbc96f201..122ae36679 100644
--- a/docs/man/xen-tscmode.pod.7
+++ b/docs/man/xen-tscmode.pod.7
@@ -99,6 +99,9 @@ whether or not the VM has been saved/restored/migrated
 
 =back
 
+If the tsc_mode is set to "default" the decision to emulate TSC can be
+tweaked further with the "vtsc_tolerance_khz" option.
+
 To understand this in more detail, the rest of this document must
 be read.
 
@@ -211,6 +214,19 @@ is emulated.  Note that, though emulated, the "apparent" TSC frequency
 will be the TSC frequency of the initial physical machine, even after
 migration.
 
+Since the calibration of the TSC frequency may not be 100% accurate, the
+exact value of the frequency can change even across reboots. This means
+also several otherwise identical systems can have a slightly different
+TSC frequency. As a result TSC access will be emulated if a domU is
+migrated from one host to another, identical host. To avoid the
+performance impact of TSC emulation a certain tolerance of the measured
+host TSC frequency can be specified with "vtsc_tolerance_khz". If the
+measured "cpu_khz" value is within the tolerance range, TSC access
+remains native. Otherwise it will be emulated. This allows to migrate
+domUs between identical hardware. If the domU will be migrated to a
+different kind of hardware, say from a "2.3GHz" to a "2.5GHz" system,
+TSC will be emualted to maintain the TSC frequency expected by the domU.
+
 For environments where both TSC-safeness AND highest performance
 even across migration is a requirement, application code can be specially
 modified to use an algorithm explicitly designed into Xen for this purpose.
diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index 47d88243b1..995277794f 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -1898,6 +1898,16 @@ determined in a similar way to that of B<default> TSC mode.
 
 Please see B<xen-tscmode(7)> for more information on this option.
 
+=item B<vtsc_tolerance_khz="KHZ">
+
+B<(x86 only, relevant only for tsc_mode=default)>
+When a domU is started, the CPU frequency of the host is used by the domU for
+TSC related time measurement. Once the domU is either migrated or
+saved/restored on another host that CPU frequency has to be emulated to avoid
+timedrift. To avoid the performance penalty of the TSC emulation, allow a
+certain amount of jitter of the measured CPU frequency on the hosts the domU
+is supposed to run on. Default value is 0, i.e. no tolerance.
+
 =item B<localtime=BOOLEAN>
 
 Set the real time clock to local time or to UTC. False (0) by default,
diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc
index 73421ff393..0d0f17edb1 100644
--- a/docs/specs/libxc-migration-stream.pandoc
+++ b/docs/specs/libxc-migration-stream.pandoc
@@ -3,7 +3,7 @@
   Andrew Cooper <<andrew.cooper3@citrix.com>>
   Wen Congyang <<wency@cn.fujitsu.com>>
   Yang Hongyang <<hongyang.yang@easystack.cn>>
-% Revision 2
+% Revision 3
 
 Introduction
 ============
@@ -472,7 +472,7 @@ XEN\_DOMCTL\_{get,set}tscinfo hypercall sub-ops.
     +------------------------+------------------------+
     | nsec                                            |
     +------------------------+------------------------+
-    | incarnation            | (reserved)             |
+    | incarnation            | tolerance | (reserved) |
     +------------------------+------------------------+
 
 --------------------------------------------------------------------
@@ -485,6 +485,8 @@ khz              TSC frequency, in kHz.
 nsec             Elapsed time, in nanoseconds.
 
 incarnation      Incarnation.
+
+tolerance        Amount of Jitter the domU can handle after migration
 --------------------------------------------------------------------
 
 \clearpage
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 408fa1c6a4..e74c480ae2 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1360,6 +1360,7 @@ int xc_domain_set_tsc_info(xc_interface *xch,
                            uint32_t tsc_mode,
                            uint64_t elapsed_nsec,
                            uint32_t gtsc_khz,
+                           uint16_t vtsc_tolerance_khz,
                            uint32_t incarnation);
 
 int xc_domain_get_tsc_info(xc_interface *xch,
@@ -1367,6 +1368,7 @@ int xc_domain_get_tsc_info(xc_interface *xch,
                            uint32_t *tsc_mode,
                            uint64_t *elapsed_nsec,
                            uint32_t *gtsc_khz,
+                           uint16_t *vtsc_tolerance_khz,
                            uint32_t *incarnation);
 
 int xc_domain_disable_migrate(xc_interface *xch, uint32_t domid);
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 57e18ee227..ec111989ee 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -852,6 +852,7 @@ int xc_domain_set_tsc_info(xc_interface *xch,
                            uint32_t tsc_mode,
                            uint64_t elapsed_nsec,
                            uint32_t gtsc_khz,
+                           uint16_t vtsc_tolerance_khz,
                            uint32_t incarnation)
 {
     DECLARE_DOMCTL;
@@ -860,6 +861,7 @@ int xc_domain_set_tsc_info(xc_interface *xch,
     domctl.u.tsc_info.tsc_mode = tsc_mode;
     domctl.u.tsc_info.elapsed_nsec = elapsed_nsec;
     domctl.u.tsc_info.gtsc_khz = gtsc_khz;
+    domctl.u.tsc_info.vtsc_tolerance_khz = vtsc_tolerance_khz;
     domctl.u.tsc_info.incarnation = incarnation;
     return do_domctl(xch, &domctl);
 }
@@ -869,6 +871,7 @@ int xc_domain_get_tsc_info(xc_interface *xch,
                            uint32_t *tsc_mode,
                            uint64_t *elapsed_nsec,
                            uint32_t *gtsc_khz,
+                           uint16_t *vtsc_tolerance_khz,
                            uint32_t *incarnation)
 {
     int rc;
@@ -882,6 +885,7 @@ int xc_domain_get_tsc_info(xc_interface *xch,
         *tsc_mode = domctl.u.tsc_info.tsc_mode;
         *elapsed_nsec = domctl.u.tsc_info.elapsed_nsec;
         *gtsc_khz = domctl.u.tsc_info.gtsc_khz;
+        *vtsc_tolerance_khz = domctl.u.tsc_info.vtsc_tolerance_khz;
         *incarnation = domctl.u.tsc_info.incarnation;
     }
     return rc;
diff --git a/tools/libxc/xc_sr_common_x86.c b/tools/libxc/xc_sr_common_x86.c
index 98f1cef30f..ea3e551a83 100644
--- a/tools/libxc/xc_sr_common_x86.c
+++ b/tools/libxc/xc_sr_common_x86.c
@@ -12,7 +12,8 @@ int write_tsc_info(struct xc_sr_context *ctx)
     };
 
     if ( xc_domain_get_tsc_info(xch, ctx->domid, &tsc.mode,
-                                &tsc.nsec, &tsc.khz, &tsc.incarnation) < 0 )
+                                &tsc.nsec, &tsc.khz, &tsc.vtsc_tolerance,
+                                &tsc.incarnation) < 0 )
     {
         PERROR("Unable to obtain TSC information");
         return -1;
@@ -34,7 +35,8 @@ int handle_tsc_info(struct xc_sr_context *ctx, struct xc_sr_record *rec)
     }
 
     if ( xc_domain_set_tsc_info(xch, ctx->domid, tsc->mode,
-                                tsc->nsec, tsc->khz, tsc->incarnation) )
+                                tsc->nsec, tsc->khz, tsc->vtsc_tolerance,
+                                tsc->incarnation) )
     {
         PERROR("Unable to set TSC information");
         return -1;
diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h
index 15ff1c7efb..9b52f6ace6 100644
--- a/tools/libxc/xc_sr_stream_format.h
+++ b/tools/libxc/xc_sr_stream_format.h
@@ -121,7 +121,8 @@ struct xc_sr_rec_tsc_info
     uint32_t khz;
     uint64_t nsec;
     uint32_t incarnation;
-    uint32_t _res1;
+    uint16_t vtsc_tolerance;
+    uint16_t _res1;
 };
 
 /* HVM_PARAMS */
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index a09d069358..2247f04648 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -354,6 +354,12 @@
 #define LIBXL_HAVE_BUILDINFO_BOOTLOADER 1
 #define LIBXL_HAVE_BUILDINFO_BOOTLOADER_ARGS 1
 
+/*
+ * LIBXL_HAVE_VTSC_TOLERANCE_KHZ indicates that libxl_domain_build_info
+ * has the vtsc_tolerance_khz field.
+ */
+#define LIBXL_HAVE_VTSC_TOLERANCE_KHZ 1
+
 /*
  * libxl ABI compatibility
  *
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 01ec1d1afa..bb99776401 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -466,6 +466,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("vcpu_soft_affinity", Array(libxl_bitmap, "num_vcpu_soft_affinity")),
     ("numa_placement",  libxl_defbool),
     ("tsc_mode",        libxl_tsc_mode),
+    ("vtsc_tolerance_khz", uint16),
     ("max_memkb",       MemKB),
     ("target_memkb",    MemKB),
     ("video_memkb",     MemKB),
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ab88562619..d9747cc45a 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -314,7 +314,8 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
     default:
         abort();
     }
-    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0);
+    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0,
+                           d_config->b_info.vtsc_tolerance_khz, 0);
     if (libxl_defbool_val(d_config->b_info.disable_migrate))
         xc_domain_disable_migrate(ctx->xch, domid);
     rtc_timeoffset = d_config->b_info.rtc_timeoffset;
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 694bfa0642..d781589886 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1522,7 +1522,7 @@ static PyObject *pyxc_domain_set_tsc_info(XcObject *self, PyObject *args)
     if (!PyArg_ParseTuple(args, "ii", &dom, &tsc_mode))
         return NULL;
 
-    if (xc_domain_set_tsc_info(self->xc_handle, dom, tsc_mode, 0, 0, 0) != 0)
+    if (xc_domain_set_tsc_info(self->xc_handle, dom, tsc_mode, 0, 0, 0, 0) != 0)
         return pyxc_error_to_exception(self->xc_handle);
 
     Py_INCREF(zero);
diff --git a/tools/python/xen/migration/libxc.py b/tools/python/xen/migration/libxc.py
index f24448a9ef..abcda617e4 100644
--- a/tools/python/xen/migration/libxc.py
+++ b/tools/python/xen/migration/libxc.py
@@ -114,7 +114,7 @@ X86_PV_P2M_FRAMES_FORMAT  = "II"
 X86_PV_VCPU_HDR_FORMAT    = "II"
 
 # tsc_info
-TSC_INFO_FORMAT           = "IIQII"
+TSC_INFO_FORMAT           = "IIQIHH"
 
 # hvm_params
 HVM_PARAMS_ENTRY_FORMAT   = "QQ"
@@ -363,14 +363,14 @@ class VerifyLibxc(VerifyBase):
         if len(content) != sz:
             raise RecordError("Length should be %u bytes" % (sz, ))
 
-        mode, khz, nsec, incarn, res1 = unpack(TSC_INFO_FORMAT, content)
+        mode, khz, nsec, incarn, tolerance, res1 = unpack(TSC_INFO_FORMAT, content)
 
         if res1 != 0:
             raise StreamError("Reserved bits set in TSC_INFO: 0x%08x"
                               % (res1, ))
 
-        self.info("  Mode %u, %u kHz, %u ns, incarnation %d"
-                  % (mode, khz, nsec, incarn))
+        self.info("  Mode %u, %u kHz, %u ns, incarnation %d, tolerance %u kHz"
+                  % (mode, khz, nsec, incarn, tolerance))
 
 
     def verify_record_hvm_context(self, content):
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index e6c54483e0..1915640d64 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1126,6 +1126,9 @@ void parse_config_data(const char *config_source,
         }
     }
 
+    if (!xlu_cfg_get_long(config, "vtsc_tolerance_khz", &l, 0))
+        b_info->vtsc_tolerance_khz = l < 0 || l > UINT16_MAX ? UINT16_MAX : l;
+
     if (!xlu_cfg_get_long(config, "rtc_timeoffset", &l, 0))
         b_info->rtc_timeoffset = l;
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 0ca820a00a..3ae13b8f78 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -557,7 +557,7 @@ int arch_domain_create(struct domain *d,
         ASSERT_UNREACHABLE(); /* Not HVM and not PV? */
 
     /* initialize default tsc behavior in case tools don't */
-    tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0);
+    tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0, 0);
 
     /* PV/PVH guests get an emulated PIT too for video BIOSes to use. */
     pit_init(d, cpu_khz);
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 8fbbf3aeb3..d86ff58482 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -939,6 +939,7 @@ long arch_do_domctl(
             tsc_get_info(d, &domctl->u.tsc_info.tsc_mode,
                          &domctl->u.tsc_info.elapsed_nsec,
                          &domctl->u.tsc_info.gtsc_khz,
+                         &domctl->u.tsc_info.vtsc_tolerance_khz,
                          &domctl->u.tsc_info.incarnation);
             domain_unpause(d);
             copyback = true;
@@ -954,6 +955,7 @@ long arch_do_domctl(
             tsc_set_info(d, domctl->u.tsc_info.tsc_mode,
                          domctl->u.tsc_info.elapsed_nsec,
                          domctl->u.tsc_info.gtsc_khz,
+                         domctl->u.tsc_info.vtsc_tolerance_khz,
                          domctl->u.tsc_info.incarnation);
             domain_unpause(d);
         }
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index c342d00732..4a9c43b718 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2063,7 +2063,7 @@ int host_tsc_is_safe(void)
  */
 void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
                   uint64_t *elapsed_nsec, uint32_t *gtsc_khz,
-                  uint32_t *incarnation)
+                  uint16_t *vtsc_tolerance_khz, uint32_t *incarnation)
 {
     bool enable_tsc_scaling = is_hvm_domain(d) &&
                               hvm_tsc_scaling_supported && !d->arch.vtsc;
@@ -2079,6 +2079,7 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
         *elapsed_nsec = *gtsc_khz = 0;
         break;
     case TSC_MODE_DEFAULT:
+        *vtsc_tolerance_khz = d->arch.vtsc_tolerance_khz;
         if ( d->arch.vtsc )
         {
     case TSC_MODE_ALWAYS_EMULATE:
@@ -2121,7 +2122,8 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
  */
 void tsc_set_info(struct domain *d,
                   uint32_t tsc_mode, uint64_t elapsed_nsec,
-                  uint32_t gtsc_khz, uint32_t incarnation)
+                  uint32_t gtsc_khz, uint16_t vtsc_tolerance_khz,
+                  uint32_t incarnation)
 {
     ASSERT(!is_system_domain(d));
 
@@ -2133,9 +2135,12 @@ void tsc_set_info(struct domain *d,
 
     switch ( d->arch.tsc_mode = tsc_mode )
     {
+        bool disable_vtsc;
         bool enable_tsc_scaling;
 
     case TSC_MODE_DEFAULT:
+        d->arch.vtsc_tolerance_khz = vtsc_tolerance_khz;
+        /* Fallthrough. */
     case TSC_MODE_ALWAYS_EMULATE:
         d->arch.vtsc_offset = get_s_time() - elapsed_nsec;
         d->arch.tsc_khz = gtsc_khz ?: cpu_khz;
@@ -2148,8 +2153,25 @@ void tsc_set_info(struct domain *d,
          * When a guest is created, gtsc_khz is passed in as zero, making
          * d->arch.tsc_khz == cpu_khz. Thus no need to check incarnation.
          */
+        disable_vtsc = d->arch.tsc_khz == cpu_khz;
+
+        if ( tsc_mode == TSC_MODE_DEFAULT && gtsc_khz &&
+             d->arch.vtsc_tolerance_khz )
+        {
+            long khz_diff;
+
+            khz_diff = ABS((long)(cpu_khz - gtsc_khz));
+            disable_vtsc = khz_diff <= d->arch.vtsc_tolerance_khz;
+
+            printk(XENLOG_G_INFO "d%d: host has %lu kHz,"
+                   " domU expects %u kHz,"
+                   " difference of %ld is %s tolerance of %u\n",
+                   d->domain_id, cpu_khz, gtsc_khz, khz_diff,
+                   disable_vtsc ? "within" : "outside",
+                   d->arch.vtsc_tolerance_khz);
+        }
         if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
-             (d->arch.tsc_khz == cpu_khz ||
+             (disable_vtsc ||
               (is_hvm_domain(d) &&
                hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) )
         {
@@ -2238,6 +2260,8 @@ static void dump_softtsc(unsigned char key)
             printk(",ofs=%#"PRIx64, d->arch.vtsc_offset);
         if ( d->arch.tsc_khz )
             printk(",khz=%"PRIu32, d->arch.tsc_khz);
+        if ( d->arch.vtsc_tolerance_khz )
+            printk(",tol=%"PRIu16, d->arch.vtsc_tolerance_khz);
         if ( d->arch.incarnation )
             printk(",inc=%"PRIu32, d->arch.incarnation);
 #if !defined(NDEBUG) || defined(CONFIG_PERF_COUNTERS)
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 197f8d62be..be3265aa7f 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -379,6 +379,7 @@ struct arch_domain
     uint64_t vtsc_offset;    /* adjustment for save/restore/migrate */
     uint32_t tsc_khz;        /* cached guest khz for certain emulated or
                                 hardware TSC scaling cases */
+    uint32_t vtsc_tolerance_khz; /* domU handles that much jitter in cpu_khz */
     struct time_scale vtsc_to_ns; /* scaling for certain emulated or
                                      hardware TSC scaling cases */
     struct time_scale ns_to_vtsc; /* scaling for certain emulated or
diff --git a/xen/include/asm-x86/time.h b/xen/include/asm-x86/time.h
index b3ae832df4..ef9be7a701 100644
--- a/xen/include/asm-x86/time.h
+++ b/xen/include/asm-x86/time.h
@@ -61,10 +61,12 @@ u64 gtime_to_gtsc(struct domain *d, u64 time);
 u64 gtsc_to_gtime(struct domain *d, u64 tsc);
 
 void tsc_set_info(struct domain *d, uint32_t tsc_mode, uint64_t elapsed_nsec,
-                  uint32_t gtsc_khz, uint32_t incarnation);
+                  uint32_t gtsc_khz, uint16_t vtsc_tolerance_khz,
+                  uint32_t incarnation);
    
 void tsc_get_info(struct domain *d, uint32_t *tsc_mode, uint64_t *elapsed_nsec,
-                  uint32_t *gtsc_khz, uint32_t *incarnation);
+                  uint32_t *gtsc_khz, uint16_t *vtsc_tolerance_khz,
+                  uint32_t *incarnation);
    
 
 void force_update_vcpu_system_time(struct vcpu *v);
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 0535da81c6..b2a10ff04d 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -702,7 +702,8 @@ struct xen_domctl_tsc_info {
     uint32_t tsc_mode;
     uint32_t gtsc_khz;
     uint32_t incarnation;
-    uint32_t pad;
+    uint16_t vtsc_tolerance_khz;
+    uint16_t pad;
     uint64_aligned_t elapsed_nsec;
 };
 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-06-07 13:08 [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation Olaf Hering
@ 2018-06-07 13:31 ` Jan Beulich
  2018-06-07 13:47   ` Olaf Hering
  2018-06-26 17:11 ` Olaf Hering
  2018-10-01 12:39 ` Andrew Cooper
  2 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2018-06-07 13:31 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Marek Marczykowski, xen-devel,
	Julien Grall

>>> On 07.06.18 at 15:08, <olaf@aepfle.de> wrote:
> Add an option to control when vTSC emulation will be activated for a
> domU with tsc_mode=default. Without such option each TSC access from
> domU will be emulated, which causes a significant perfomance drop for
> workloads that make use of rdtsc.
> 
> One option to avoid the TSC option is to run domUs with tsc_mode=native.
> This has the drawback that migrating a domU from a "2.3GHz" class host
> to a "2.4GHz" class host may change the rate at wich the TSC counter
> increases, the domU may not be prepared for that.
> 
> With the new option the host admin can decide how a domU should behave
> when it is migrated across systems of the same class. Since there is
> always some jitter when Xen calibrates the cpu_khz value, all hosts of
> the same class will most likely have slightly different values. As a
> result vTSC emulation is unavoidable. Data collected during the incident
> which triggered this change showed a jitter of up to 200 KHz across
> systems of the same class.
> 
> Existing padding fields are reused to store vtsc_khz_tolerance as u16.
> The padding is sent as zero in write_tsc_info to the receving host.
> The padding is undefined if the changed code runs as receiver.
> handle_tsc_info has no code to verify that padding is indeed zero. Due
> to the lack of a version field it is impossible to know if the sender
> already has the newly introduced vtsc_tolerance field. In the worst
> case the receiving domU will get an unemulated TSC.

Hmm, I find this description concerning. Is the field reliably zero when
coming from older Xen, or is it not?

> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> Reviewed-by: Wei Liu <wei.liu2@citrix.com> (v07/v08)
> Reviewed-by: Jan Beulich <jbeulich@suse.com> (v08)

To avoid this getting committed prematurely: I gave this R-b pending
Andrew finding his prior concerns addressed.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-06-07 13:31 ` Jan Beulich
@ 2018-06-07 13:47   ` Olaf Hering
  2018-06-07 14:44     ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Olaf Hering @ 2018-06-07 13:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Marek Marczykowski, xen-devel,
	Julien Grall


[-- Attachment #1.1: Type: text/plain, Size: 594 bytes --]

Am Thu, 07 Jun 2018 07:31:26 -0600
schrieb "Jan Beulich" <JBeulich@suse.com>:

> Hmm, I find this description concerning. Is the field reliably zero when
> coming from older Xen, or is it not?

It depends on how old.
If the other sending side has write_tsc_info from 4.6+, then _res1 is zero.
If it is older, the XC_SAVE_ID_TSC_INFO is eventually not handled anymore.
At least I could not find the relevant code, even in staging-4.6.
But then, it is unlikely that migration from 4.5 to 4.6 did not work.

Also, in general the input comes from "remote". So it is untrusted.

Olaf

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-06-07 13:47   ` Olaf Hering
@ 2018-06-07 14:44     ` Jan Beulich
  2018-06-07 14:49       ` Olaf Hering
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2018-06-07 14:44 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Marek Marczykowski, xen-devel,
	Julien Grall

>>> On 07.06.18 at 15:47, <olaf@aepfle.de> wrote:
> Am Thu, 07 Jun 2018 07:31:26 -0600
> schrieb "Jan Beulich" <JBeulich@suse.com>:
> 
>> Hmm, I find this description concerning. Is the field reliably zero when
>> coming from older Xen, or is it not?
> 
> It depends on how old.
> If the other sending side has write_tsc_info from 4.6+, then _res1 is zero.
> If it is older, the XC_SAVE_ID_TSC_INFO is eventually not handled anymore.
> At least I could not find the relevant code, even in staging-4.6.
> But then, it is unlikely that migration from 4.5 to 4.6 did not work.
> 
> Also, in general the input comes from "remote". So it is untrusted.

But isn't this exactly the concern Andrew had voiced? If so, we're not
any closer to this having a chance to go in. The re-use of the field is
acceptable only if all existing senders reliably fill zero in there.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-06-07 14:44     ` Jan Beulich
@ 2018-06-07 14:49       ` Olaf Hering
  2018-06-07 14:55         ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Olaf Hering @ 2018-06-07 14:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Marek Marczykowski, xen-devel,
	Julien Grall


[-- Attachment #1.1: Type: text/plain, Size: 268 bytes --]

Am Thu, 07 Jun 2018 08:44:41 -0600
schrieb "Jan Beulich" <JBeulich@suse.com>:

> The re-use of the field is acceptable only if all existing senders reliably fill zero in there.

How do we know all senders? I just know about write_tsc_info from xen-4.6+.

Olaf

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-06-07 14:49       ` Olaf Hering
@ 2018-06-07 14:55         ` Jan Beulich
  2018-06-07 21:05           ` Olaf Hering
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2018-06-07 14:55 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Marek Marczykowski, xen-devel,
	Julien Grall

>>> On 07.06.18 at 16:49, <olaf@aepfle.de> wrote:
> Am Thu, 07 Jun 2018 08:44:41 -0600
> schrieb "Jan Beulich" <JBeulich@suse.com>:
> 
>> The re-use of the field is acceptable only if all existing senders reliably
>> fill zero in there.
> 
> How do we know all senders? I just know about write_tsc_info from xen-4.6+.

I don't think we care about senders other than ones using libxc, so
by knowing whether all libxc versions conform to the requirement,
we could declare we're fine.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-06-07 14:55         ` Jan Beulich
@ 2018-06-07 21:05           ` Olaf Hering
  0 siblings, 0 replies; 25+ messages in thread
From: Olaf Hering @ 2018-06-07 21:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Marek Marczykowski, xen-devel,
	Julien Grall


[-- Attachment #1.1: Type: text/plain, Size: 663 bytes --]

On Thu, Jun 07, Jan Beulich wrote:

> >>> On 07.06.18 at 16:49, <olaf@aepfle.de> wrote:
> > Am Thu, 07 Jun 2018 08:44:41 -0600
> > schrieb "Jan Beulich" <JBeulich@suse.com>:
> >> The re-use of the field is acceptable only if all existing senders reliably
> >> fill zero in there.
> > How do we know all senders? I just know about write_tsc_info from xen-4.6+.
> I don't think we care about senders other than ones using libxc, so
> by knowing whether all libxc versions conform to the requirement,
> we could declare we're fine.

Yes, I think we are fine. And if migration from pre-4.6 is supported
anyway is another question. Most likely the answer is no.

Olaf

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-06-07 13:08 [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation Olaf Hering
  2018-06-07 13:31 ` Jan Beulich
@ 2018-06-26 17:11 ` Olaf Hering
  2018-08-01 16:17   ` Olaf Hering
  2018-10-01 12:39 ` Andrew Cooper
  2 siblings, 1 reply; 25+ messages in thread
From: Olaf Hering @ 2018-06-26 17:11 UTC (permalink / raw)
  To: xen-devel, Andrew Cooper
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
	Ian Jackson, Marek Marczykowski-Górecki, Julien Grall,
	Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 402 bytes --]

Am Thu,  7 Jun 2018 15:08:29 +0200
schrieb Olaf Hering <olaf@aepfle.de>:

> Add an option to control when vTSC emulation will be activated for a
> domU with tsc_mode=default. Without such option each TSC access from
> domU will be emulated, which causes a significant perfomance drop for
> workloads that make use of rdtsc.

Andrew,

do you have any comments on this patch/approach?

Olaf

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-06-26 17:11 ` Olaf Hering
@ 2018-08-01 16:17   ` Olaf Hering
  2018-09-13  7:39     ` Olaf Hering
  0 siblings, 1 reply; 25+ messages in thread
From: Olaf Hering @ 2018-08-01 16:17 UTC (permalink / raw)
  To: xen-devel, Andrew Cooper
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
	Ian Jackson, Marek Marczykowski-Górecki, Julien Grall,
	Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 606 bytes --]

Am Tue, 26 Jun 2018 19:11:13 +0200
schrieb Olaf Hering <olaf@aepfle.de>:

> Am Thu,  7 Jun 2018 15:08:29 +0200
> schrieb Olaf Hering <olaf@aepfle.de>:
> > Add an option to control when vTSC emulation will be activated for a
> > domU with tsc_mode=default. Without such option each TSC access from
> > domU will be emulated, which causes a significant perfomance drop for
> > workloads that make use of rdtsc.  
> Andrew,
> do you have any comments on this patch/approach?


Andrew,

any comments on this patch?

https://lists.xen.org/archives/html/xen-devel/2018-06/msg00398.html

Olaf

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-08-01 16:17   ` Olaf Hering
@ 2018-09-13  7:39     ` Olaf Hering
  2018-09-28 13:50       ` Olaf Hering
  0 siblings, 1 reply; 25+ messages in thread
From: Olaf Hering @ 2018-09-13  7:39 UTC (permalink / raw)
  To: xen-devel, Andrew Cooper, Lars Kurth
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson,
	Marek Marczykowski-Górecki, Julien Grall, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 955 bytes --]

Andrew, Lars,

this patch was not applied yet, even after a few "pings".

I think it needs an approval from Andrew before it can go in, unless the
bug has to be fixed in some other way.

Please take some time to review that change.

Thanks,
Olaf

Am Wed, 1 Aug 2018 18:17:47 +0200
schrieb Olaf Hering <olaf@aepfle.de>:

> Am Tue, 26 Jun 2018 19:11:13 +0200
> schrieb Olaf Hering <olaf@aepfle.de>:
> > Am Thu,  7 Jun 2018 15:08:29 +0200
> > schrieb Olaf Hering <olaf@aepfle.de>:  
> > > Add an option to control when vTSC emulation will be activated for a
> > > domU with tsc_mode=default. Without such option each TSC access from
> > > domU will be emulated, which causes a significant perfomance drop for
> > > workloads that make use of rdtsc.    
> > Andrew,
> > do you have any comments on this patch/approach?  
> Andrew,
> any comments on this patch?
> https://lists.xen.org/archives/html/xen-devel/2018-06/msg00398.html

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-09-13  7:39     ` Olaf Hering
@ 2018-09-28 13:50       ` Olaf Hering
  2018-10-01 10:52         ` Ian Jackson
  0 siblings, 1 reply; 25+ messages in thread
From: Olaf Hering @ 2018-09-28 13:50 UTC (permalink / raw)
  To: xen-devel, Andrew Cooper, Lars Kurth
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson,
	Marek Marczykowski-Górecki, Julien Grall, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 262 bytes --]

Am Thu, 13 Sep 2018 09:39:13 +0200
schrieb Olaf Hering <olaf@aepfle.de>:

> this patch was not applied yet, even after a few "pings".

No reaction since months.

So scrap that patch, just in case it is still part of someones to-consider queue.


Olaf

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-09-28 13:50       ` Olaf Hering
@ 2018-10-01 10:52         ` Ian Jackson
  2018-10-01 11:25           ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Jackson @ 2018-10-01 10:52 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Lars Kurth, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan,
	Marek Marczykowski-Górecki, xen-devel, Julien Grall,
	Jan Beulich

Olaf Hering writes ("Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation"):
> Am Thu, 13 Sep 2018 09:39:13 +0200
> schrieb Olaf Hering <olaf@aepfle.de>:
> > this patch was not applied yet, even after a few "pings".
> 
> No reaction since months.
> So scrap that patch, just in case it is still part of someones to-consider queue.

I think it would be worth exploring whether this patch could be
applied without an explicit ack from Andrew.

I confess I ignored all the previous mails because they all started
   Andrew,
or
   Andrew, Lars,
so I assumed that you didn't want attention from other
maintainers/committers.

Now that I look at the thread it is difficult for me to see the wood
for the trees but I don't see unanswered concerns.  I guess the patch
should maybe therefore be committed.

As committer, before I did that despite a missing ack, I would want to
(i) try to understand if possibly why the ack was missing (ii) do my
own review to satisfy any doubts (iii) give the maintainer a clear
tike interval to say "nack".


As for (i) (reasons for lack of ack), see above, and it's not clear to
me; but AFAICT there are two difficulties.  One (see (a) below) is
with the principle of the feature.  One (see (b) below) is about a
detail of the implementation.

As for (ii) (own review), see below.  That addresses (a) and (b).

As for (iii): If anyone actually objects to this patch for some
different reason, or to my proposed approach, please comment in the
next 7 days.


(a) Principle

I read the documentation for the new feature and it seems to make
sense.  But it would be nice for the documentation to explain what is
considered a likely safe and sufficient value.

For example, one might say something like this:

  Typical TSC speed variation between supposedly identical hosts is
  about X%.  A Unix guest running NTP for time synchronisation can
  cope with clock drift rates of up to about Y%.  So a
  vtsc_tolerance_khz setting between these two values is likely to be
  effective for migration between "identical" hosts, and not
  disruptive.

Olaf, please could you fill in the blanks in that text, and consider
what the exact wording should be (depending on what research you
conducted etc.), and then consider whether to put it in the
documentation, commit meesage, or just on the mailing list ?

Personally I think documentation would be ideal if we have firm
information but if firm information is hard to come by and all we have
is empirical data, then maybe a commit message describing those
experiences would be sufficient.


(b) Protocol compatibility

  Olaf:
  > > On 07.06.18 at 16:49, <olaf@aepfle.de> wrote:
  > > > Am Thu, 07 Jun 2018 08:44:41 -0600
  > > > schrieb "Jan Beulich" <JBeulich@suse.com>:
  > > >> The re-use of the field is acceptable only if all existing
  > > >> senders reliably fill zero in there.
  > > >
  > > > How do we know all senders? I just know about write_tsc_info
  > > > from xen-4.6+.
  > >
  > > I don't think we care about senders other than ones using libxc, so
  > > by knowing whether all libxc versions conform to the requirement,
  > > we could declare we're fine.
  > 
  > Yes, I think we are fine. And if migration from pre-4.6 is supported
  > anyway is another question. Most likely the answer is no.

This comment from Olaf "Yes, I think we are fine" is not particularly
reassuring.  Olaf, can you please say something more definite about
how you verified whether your assertion is true.

And yes, we do mean also for versions from pre-4.6.  It is not
supported, but it must not go wrong in unexpected ways.  You can use
the git history to see older implementations in libxc.


Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-10-01 10:52         ` Ian Jackson
@ 2018-10-01 11:25           ` Jan Beulich
  2018-10-01 13:24             ` [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation [and 1 more messages] Ian Jackson
  2018-10-01 13:38             ` [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation George Dunlap
  0 siblings, 2 replies; 25+ messages in thread
From: Jan Beulich @ 2018-10-01 11:25 UTC (permalink / raw)
  To: Andrew Cooper, Ian Jackson
  Cc: Lars Kurth, Olaf Hering, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan,
	Marek Marczykowski, xen-devel, Julien Grall

>>> On 01.10.18 at 12:52, <ian.jackson@citrix.com> wrote:
> Olaf Hering writes ("Re: [PATCH v9] new config option vtsc_tolerance_khz to 
> avoid TSC emulation"):
>> Am Thu, 13 Sep 2018 09:39:13 +0200
>> schrieb Olaf Hering <olaf@aepfle.de>:
>> > this patch was not applied yet, even after a few "pings".
>> 
>> No reaction since months.
>> So scrap that patch, just in case it is still part of someones to-consider 
> queue.
> 
> I think it would be worth exploring whether this patch could be
> applied without an explicit ack from Andrew.
> 
> I confess I ignored all the previous mails because they all started
>    Andrew,
> or
>    Andrew, Lars,
> so I assumed that you didn't want attention from other
> maintainers/committers.
> 
> Now that I look at the thread it is difficult for me to see the wood
> for the trees but I don't see unanswered concerns.

Problem is - discussion around this was (iirc) happening not only on
the list, but also on irc (including perhaps private chats). It was for
that reason that I made my R-b conditional upon Andrew at least
giving an informal go-ahead (otherwise, together with Wei's R-b,
the patch could have gone in).

Besides the question of correctness from the perspective of guests
(which imo is not a problem as the feature needs to be actively
enabled, and I think we have no reason to keep admins from
breaking their guests if they really mean to), I think the main concern
was with the way migration of the new value was implemented. But I
really have to defer to Andrew for that, irrespective of him not
having responded (on the list) to prior pings.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-06-07 13:08 [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation Olaf Hering
  2018-06-07 13:31 ` Jan Beulich
  2018-06-26 17:11 ` Olaf Hering
@ 2018-10-01 12:39 ` Andrew Cooper
  2018-10-01 13:29   ` Jan Beulich
                     ` (2 more replies)
  2 siblings, 3 replies; 25+ messages in thread
From: Andrew Cooper @ 2018-10-01 12:39 UTC (permalink / raw)
  To: Olaf Hering, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson,
	Marek Marczykowski-Górecki, Julien Grall, Jan Beulich

On 07/06/18 14:08, Olaf Hering wrote:
> Add an option to control when vTSC emulation will be activated for a
> domU with tsc_mode=default. Without such option each TSC access from
> domU will be emulated, which causes a significant perfomance drop for
> workloads that make use of rdtsc.
>
> One option to avoid the TSC option is to run domUs with tsc_mode=native.
> This has the drawback that migrating a domU from a "2.3GHz" class host
> to a "2.4GHz" class host may change the rate at wich the TSC counter
> increases, the domU may not be prepared for that.
>
> With the new option the host admin can decide how a domU should behave
> when it is migrated across systems of the same class. Since there is
> always some jitter when Xen calibrates the cpu_khz value, all hosts of
> the same class will most likely have slightly different values. As a
> result vTSC emulation is unavoidable. Data collected during the incident
> which triggered this change showed a jitter of up to 200 KHz across
> systems of the same class.

Do you have any further details of the systems involved?  If they are
identical systems, they should all have the same real TSC frequency, and
its a known issue that Xen isn't very good at working out the
frequency.  TBH, fixing that would be far better overall.

>
> Existing padding fields are reused to store vtsc_khz_tolerance as u16.
> The padding is sent as zero in write_tsc_info to the receving host.
> The padding is undefined if the changed code runs as receiver.

I'm not sure what you mean by this final sentence.

> handle_tsc_info has no code to verify that padding is indeed zero. Due
> to the lack of a version field it is impossible to know if the sender
> already has the newly introduced vtsc_tolerance field. In the worst
> case the receiving domU will get an unemulated TSC.

The lack of padding verification is deliberate, for forwards
compatibility.  Why does the sending code matter?  One way or another,
if the field is 0, the option wasn't present or wasn't configured. 
Neither of these situations affect the decision-making that the
receiving side needs to perform.

>
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> Reviewed-by: Wei Liu <wei.liu2@citrix.com> (v07/v08)
> Reviewed-by: Jan Beulich <jbeulich@suse.com> (v08)

I'm still -0.5 for this patch.  I can appreciate why you want it, but it
is a gross hack which only works when you don't skew time more than NTP
in the guest can cope with.  My gut feeling is that there will be other
more subtle fallout.

As for the implementation itself, a few trivial comments.

> diff --git a/docs/man/xen-tscmode.pod.7 b/docs/man/xen-tscmode.pod.7
> index 3bbc96f201..122ae36679 100644
> --- a/docs/man/xen-tscmode.pod.7
> +++ b/docs/man/xen-tscmode.pod.7
> @@ -99,6 +99,9 @@ whether or not the VM has been saved/restored/migrated
>  
>  =back
>  
> +If the tsc_mode is set to "default" the decision to emulate TSC can be
> +tweaked further with the "vtsc_tolerance_khz" option.
> +
>  To understand this in more detail, the rest of this document must
>  be read.
>  
> @@ -211,6 +214,19 @@ is emulated.  Note that, though emulated, the "apparent" TSC frequency
>  will be the TSC frequency of the initial physical machine, even after
>  migration.
>  
> +Since the calibration of the TSC frequency may not be 100% accurate, the
> +exact value of the frequency can change even across reboots.

It can change across reboots for other reasons, e.g. firmware settings.

I'd phrase this as "Since the calibration of the TSC frequency isn't
100% accurate, the value measured by Xen can vary across reboots".

>  This means
> +also several otherwise identical systems can have a slightly different
> +TSC frequency. As a result TSC access will be emulated if a domU is
> +migrated from one host to another, identical host. To avoid the
> +performance impact of TSC emulation a certain tolerance of the measured
> +host TSC frequency can be specified with "vtsc_tolerance_khz". If the
> +measured "cpu_khz" value is within the tolerance range, TSC access
> +remains native. Otherwise it will be emulated. This allows to migrate
> +domUs between identical hardware. If the domU will be migrated to a
> +different kind of hardware, say from a "2.3GHz" to a "2.5GHz" system,
> +TSC will be emualted to maintain the TSC frequency expected by the domU.
> +
>  For environments where both TSC-safeness AND highest performance
>  even across migration is a requirement, application code can be specially
>  modified to use an algorithm explicitly designed into Xen for this purpose.
> diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
> index 47d88243b1..995277794f 100644
> --- a/docs/man/xl.cfg.pod.5.in
> +++ b/docs/man/xl.cfg.pod.5.in
> @@ -1898,6 +1898,16 @@ determined in a similar way to that of B<default> TSC mode.
>  
>  Please see B<xen-tscmode(7)> for more information on this option.
>  
> +=item B<vtsc_tolerance_khz="KHZ">
> +
> +B<(x86 only, relevant only for tsc_mode=default)>
> +When a domU is started, the CPU frequency of the host is used by the domU for
> +TSC related time measurement. Once the domU is either migrated or
> +saved/restored on another host that CPU frequency has to be emulated to avoid
> +timedrift. To avoid the performance penalty of the TSC emulation, allow a
> +certain amount of jitter of the measured CPU frequency on the hosts the domU
> +is supposed to run on. Default value is 0, i.e. no tolerance.

In one of these two paragraphs, I think there needs to be a warning
about clock drift in the guest.

> +
>  =item B<localtime=BOOLEAN>
>  
>  Set the real time clock to local time or to UTC. False (0) by default,
> diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc
> index 73421ff393..0d0f17edb1 100644
> --- a/docs/specs/libxc-migration-stream.pandoc
> +++ b/docs/specs/libxc-migration-stream.pandoc
> @@ -3,7 +3,7 @@
>    Andrew Cooper <<andrew.cooper3@citrix.com>>
>    Wen Congyang <<wency@cn.fujitsu.com>>
>    Yang Hongyang <<hongyang.yang@easystack.cn>>
> -% Revision 2
> +% Revision 3
>  
>  Introduction
>  ============
> @@ -472,7 +472,7 @@ XEN\_DOMCTL\_{get,set}tscinfo hypercall sub-ops.
>      +------------------------+------------------------+
>      | nsec                                            |
>      +------------------------+------------------------+
> -    | incarnation            | (reserved)             |
> +    | incarnation            | tolerance | (reserved) |
>      +------------------------+------------------------+
>  
>  --------------------------------------------------------------------
> @@ -485,6 +485,8 @@ khz              TSC frequency, in kHz.
>  nsec             Elapsed time, in nanoseconds.
>  
>  incarnation      Incarnation.
> +
> +tolerance        Amount of Jitter the domU can handle after migration

Measurement units?

>  --------------------------------------------------------------------
>  
>  \clearpage
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index c342d00732..4a9c43b718 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -2148,8 +2153,25 @@ void tsc_set_info(struct domain *d,
>           * When a guest is created, gtsc_khz is passed in as zero, making
>           * d->arch.tsc_khz == cpu_khz. Thus no need to check incarnation.
>           */
> +        disable_vtsc = d->arch.tsc_khz == cpu_khz;
> +
> +        if ( tsc_mode == TSC_MODE_DEFAULT && gtsc_khz &&
> +             d->arch.vtsc_tolerance_khz )
> +        {
> +            long khz_diff;
> +
> +            khz_diff = ABS((long)(cpu_khz - gtsc_khz));
> +            disable_vtsc = khz_diff <= d->arch.vtsc_tolerance_khz;
> +
> +            printk(XENLOG_G_INFO "d%d: host has %lu kHz,"
> +                   " domU expects %u kHz,"
> +                   " difference of %ld is %s tolerance of %u\n",
> +                   d->domain_id, cpu_khz, gtsc_khz, khz_diff,
> +                   disable_vtsc ? "within" : "outside",
> +                   d->arch.vtsc_tolerance_khz);
> +        }

Newline here.

>          if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
> -             (d->arch.tsc_khz == cpu_khz ||
> +             (disable_vtsc ||
>                (is_hvm_domain(d) &&
>                 hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) )

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation [and 1 more messages]
  2018-10-01 11:25           ` Jan Beulich
@ 2018-10-01 13:24             ` Ian Jackson
  2018-10-01 13:58               ` Andrew Cooper
  2018-10-01 13:38             ` [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation George Dunlap
  1 sibling, 1 reply; 25+ messages in thread
From: Ian Jackson @ 2018-10-01 13:24 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: Lars Kurth, Olaf Hering, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan,
	Marek Marczykowski, xen-devel, Julien Grall

Jan Beulich writes ("Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation"):
> Problem is - discussion around this was (iirc) happening not only on
> the list, but also on irc (including perhaps private chats).

Hrm.  Well, if it didn't happen on the list, it didn't happen.  It is
often useful and productive, of course, to thrash something out on
irc, or ping there, or whatever but:

Conclusions from irc (and from in-person conversations, phone calls,
or other kinds of un-minuted discussions) must be transferred to email
as otherwise they are lost (and the effort of having them is often
wasted as the conversation has to be had again).

If properly writing up the conclusion from an irc conversation is too
hard, pasting a transcript into an email seems a bare minimum.

> It was for that reason that I made my R-b conditional upon Andrew at
> least giving an informal go-ahead (otherwise, together with Wei's
> R-b, the patch could have gone in).

Right.  Thanks for the clarification.


Andrew Cooper writes ("Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation"):
> [lots of stuff]

Thanks for the review.  I hope Olaf will be able to address most of
it, but:

Andrew:
> [Olaf:]
> > Existing padding fields are reused to store vtsc_khz_tolerance as u16.
> > The padding is sent as zero in write_tsc_info to the receving host.
> > The padding is undefined if the changed code runs as receiver.
> 
> I'm not sure what you mean by this final sentence.

Me neither.

> > handle_tsc_info has no code to verify that padding is indeed zero. Due
> > to the lack of a version field it is impossible to know if the sender
> > already has the newly introduced vtsc_tolerance field. In the worst
> > case the receiving domU will get an unemulated TSC.
> 
> The lack of padding verification is deliberate, for forwards
> compatibility.  Why does the sending code matter?  One way or another,
> if the field is 0, the option wasn't present or wasn't configured. 
> Neither of these situations affect the decision-making that the
> receiving side needs to perform.

We are talking here about an unused field that is (supposedly?)
always sent as zero ?

AIUI:

In the new code the semantics of zero is "do not allow any tolerance".
The old code handles this correctly.  The issue is with migrations
from new code to old: the tolerance value will silently ignored.

Presumably this is considered preferable to the alternative, which is
to extend the migration stream in a deliberately incompatible way so
that the migration fails.  I think this is what Olaf is trying to
say ?

> > Signed-off-by: Olaf Hering <olaf@aepfle.de>
> > Reviewed-by: Wei Liu <wei.liu2@citrix.com> (v07/v08)
> > Reviewed-by: Jan Beulich <jbeulich@suse.com> (v08)
> 
> I'm still -0.5 for this patch.  I can appreciate why you want it, but it
> is a gross hack which only works when you don't skew time more than NTP
> in the guest can cope with.

That surely is why there is a limit to the tolerance.  I've asked Olaf
to try to quantify an appropriate limit.

>   My gut feeling is that there will be other
> more subtle fallout.

That's not particularly convincing to me.  But if you could try to be
more specific I think this could be usefull addressed in the
documentation for the feature ?


Olaf, I think the ball is now in your court.  I hope you can address
Andrew's comments, and maybe mine too.

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-10-01 12:39 ` Andrew Cooper
@ 2018-10-01 13:29   ` Jan Beulich
  2018-10-01 15:17     ` Andrew Cooper
  2018-10-09 16:42   ` Olaf Hering
  2018-11-29 13:21   ` Olaf Hering
  2 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2018-10-01 13:29 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Olaf Hering, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Marek Marczykowski,
	xen-devel, Julien Grall

>>> On 01.10.18 at 14:39, <andrew.cooper3@citrix.com> wrote:
> On 07/06/18 14:08, Olaf Hering wrote:
>> Add an option to control when vTSC emulation will be activated for a
>> domU with tsc_mode=default. Without such option each TSC access from
>> domU will be emulated, which causes a significant perfomance drop for
>> workloads that make use of rdtsc.
>>
>> One option to avoid the TSC option is to run domUs with tsc_mode=native.
>> This has the drawback that migrating a domU from a "2.3GHz" class host
>> to a "2.4GHz" class host may change the rate at wich the TSC counter
>> increases, the domU may not be prepared for that.
>>
>> With the new option the host admin can decide how a domU should behave
>> when it is migrated across systems of the same class. Since there is
>> always some jitter when Xen calibrates the cpu_khz value, all hosts of
>> the same class will most likely have slightly different values. As a
>> result vTSC emulation is unavoidable. Data collected during the incident
>> which triggered this change showed a jitter of up to 200 KHz across
>> systems of the same class.
> 
> Do you have any further details of the systems involved?  If they are
> identical systems, they should all have the same real TSC frequency, and
> its a known issue that Xen isn't very good at working out the
> frequency.  TBH, fixing that would be far better overall.

Are you convinced all parts match their nominal frequency without
_any_ deviation? If that was the case, we could indeed use CPUID
leaves 0x15 / 0x16 output, if available. But I very much doubt this.
As an example, here's what bare metal Linux says on my newest
system:

tsc: Detected 2600.000 MHz processor
tsc: Refined TSC clocksource calibration: 2591.990 MHz

Xen figures:

(XEN) Detected 2592.107 MHz processor.

And then after another re-boot bare metal Linux again

tsc: Refined TSC clocksource calibration: 2592.008 MHz

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-10-01 11:25           ` Jan Beulich
  2018-10-01 13:24             ` [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation [and 1 more messages] Ian Jackson
@ 2018-10-01 13:38             ` George Dunlap
  2018-10-01 14:00               ` Jan Beulich
  1 sibling, 1 reply; 25+ messages in thread
From: George Dunlap @ 2018-10-01 13:38 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper, Ian Jackson
  Cc: Lars Kurth, Olaf Hering, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan,
	Marek Marczykowski, xen-devel, Julien Grall

On 10/01/2018 12:25 PM, Jan Beulich wrote:
>>>> On 01.10.18 at 12:52, <ian.jackson@citrix.com> wrote:
>> Olaf Hering writes ("Re: [PATCH v9] new config option vtsc_tolerance_khz to 
>> avoid TSC emulation"):
>>> Am Thu, 13 Sep 2018 09:39:13 +0200
>>> schrieb Olaf Hering <olaf@aepfle.de>:
>>>> this patch was not applied yet, even after a few "pings".
>>>
>>> No reaction since months.
>>> So scrap that patch, just in case it is still part of someones to-consider 
>> queue.
>>
>> I think it would be worth exploring whether this patch could be
>> applied without an explicit ack from Andrew.
>>
>> I confess I ignored all the previous mails because they all started
>>    Andrew,
>> or
>>    Andrew, Lars,
>> so I assumed that you didn't want attention from other
>> maintainers/committers.
>>
>> Now that I look at the thread it is difficult for me to see the wood
>> for the trees but I don't see unanswered concerns.
> 
> Problem is - discussion around this was (iirc) happening not only on
> the list, but also on irc (including perhaps private chats). It was for
> that reason that I made my R-b conditional upon Andrew at least
> giving an informal go-ahead (otherwise, together with Wei's R-b,
> the patch could have gone in).
> 
> Besides the question of correctness from the perspective of guests
> (which imo is not a problem as the feature needs to be actively
> enabled, and I think we have no reason to keep admins from
> breaking their guests if they really mean to), 

I agree with this, BTW.

> I think the main concern
> was with the way migration of the new value was implemented. But I
> really have to defer to Andrew for that, irrespective of him not
> having responded (on the list) to prior pings.

Is Andrew really the only person who knows enough about migration to
give this the thumbs-up?  Technically migration is in the toolstack, and
so Wei should have double-checked that before giving his review; and
when a question was raised, Wei (as the relevant maintainer who had
given it an R-b) should either have asserted that the code was indeed
correct, or withdrawn his R-b and given Olaf feedback to allow him to
get it into shape.

(This is all based on the history sketched out by Jan above; if there is
more to it then of course this analysis may not be correct.)

I was only skimming the thread, and intended to weigh in at some point;
but I didn't really understand why it was blocked on Andrew in the first
place.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation [and 1 more messages]
  2018-10-01 13:24             ` [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation [and 1 more messages] Ian Jackson
@ 2018-10-01 13:58               ` Andrew Cooper
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Cooper @ 2018-10-01 13:58 UTC (permalink / raw)
  To: Ian Jackson, Jan Beulich
  Cc: Lars Kurth, Olaf Hering, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan,
	Marek Marczykowski, xen-devel, Julien Grall

On 01/10/18 14:24, Ian Jackson wrote:
>
>>> handle_tsc_info has no code to verify that padding is indeed zero. Due
>>> to the lack of a version field it is impossible to know if the sender
>>> already has the newly introduced vtsc_tolerance field. In the worst
>>> case the receiving domU will get an unemulated TSC.
>> The lack of padding verification is deliberate, for forwards
>> compatibility.  Why does the sending code matter?  One way or another,
>> if the field is 0, the option wasn't present or wasn't configured. 
>> Neither of these situations affect the decision-making that the
>> receiving side needs to perform.
> We are talking here about an unused field that is (supposedly?)
> always sent as zero ?

The spec requires the padding to always be sent as zero, and
verify-stream-v2 does check the padding and object to non-zero values.

libxc's write_tsc_info() has always fully zeroed the structure, and
convert-legacy-stream also behaves the same way.  However, I notice this
change will break migration from pre 4.6 as write_libxc_tsc_info() now
has a mismatched number of parameters to pack and throw an exception.

>
> AIUI:
>
> In the new code the semantics of zero is "do not allow any tolerance".
> The old code handles this correctly.  The issue is with migrations
> from new code to old: the tolerance value will silently ignored.

Migrating from new to old doesn't remotely work, and whether things
explode or not is very context dependent.

I'm not sure its worth caring about this case.  We never have before. 
(Getting backwards migration working for emergency cases is on my TODO
list, but it is dependent on rewriting the hypervisor interfaces for
getting/setting guest state.)

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-10-01 13:38             ` [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation George Dunlap
@ 2018-10-01 14:00               ` Jan Beulich
  2018-10-01 14:25                 ` George Dunlap
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2018-10-01 14:00 UTC (permalink / raw)
  To: george.dunlap
  Cc: Lars Kurth, Olaf Hering, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Tim Deegan,
	Marek Marczykowski, xen-devel, Julien Grall, Ian Jackson

>>> On 01.10.18 at 15:38, <george.dunlap@citrix.com> wrote:
> On 10/01/2018 12:25 PM, Jan Beulich wrote:
>> I think the main concern
>> was with the way migration of the new value was implemented. But I
>> really have to defer to Andrew for that, irrespective of him not
>> having responded (on the list) to prior pings.
> 
> Is Andrew really the only person who knows enough about migration to
> give this the thumbs-up?

That's not the point here, at least afaic: He had voiced _some_
concern on an earlier version. In such a case it is, I think, only
appropriate to wait with committing until there was indication
that the concerns were sufficiently addressed (verbally or by
adjustments to the code). It is instead my incomplete
knowledge on the various parts of migration that made it
undesirable to try to judge in his place, after pings lead no-
where.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-10-01 14:00               ` Jan Beulich
@ 2018-10-01 14:25                 ` George Dunlap
  2018-10-01 16:01                   ` Wei Liu
  0 siblings, 1 reply; 25+ messages in thread
From: George Dunlap @ 2018-10-01 14:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Lars Kurth, Olaf Hering, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Tim Deegan,
	Marek Marczykowski, xen-devel, Julien Grall, Ian Jackson

On 10/01/2018 03:00 PM, Jan Beulich wrote:
>>>> On 01.10.18 at 15:38, <george.dunlap@citrix.com> wrote:
>> On 10/01/2018 12:25 PM, Jan Beulich wrote:
>>> I think the main concern
>>> was with the way migration of the new value was implemented. But I
>>> really have to defer to Andrew for that, irrespective of him not
>>> having responded (on the list) to prior pings.
>>
>> Is Andrew really the only person who knows enough about migration to
>> give this the thumbs-up?
> 
> That's not the point here, at least afaic: He had voiced _some_
> concern on an earlier version. In such a case it is, I think, only
> appropriate to wait with committing until there was indication
> that the concerns were sufficiently addressed (verbally or by
> adjustments to the code).

Right -- but it's not your job to make sure the migration stuff is
properly addressed; it's Wei and Ian's job.  Wei's R-b was a statement
from him that the code was good; when Andy questioned that, I think it
was then *Wei's* job to address the question, not yours or Andy's (or
even Olaf's).  If Wei says, "I've considered Andy's objections and I
think the patch is fine as-is", then it can be checked in (given a
reasonable amount of time for Andy to respond); and Wei can own whatever
consequences there are.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-10-01 13:29   ` Jan Beulich
@ 2018-10-01 15:17     ` Andrew Cooper
  2018-10-01 15:28       ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2018-10-01 15:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Olaf Hering, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Marek Marczykowski,
	xen-devel, Julien Grall

On 01/10/18 14:29, Jan Beulich wrote:
>>>> On 01.10.18 at 14:39, <andrew.cooper3@citrix.com> wrote:
>> On 07/06/18 14:08, Olaf Hering wrote:
>>> Add an option to control when vTSC emulation will be activated for a
>>> domU with tsc_mode=default. Without such option each TSC access from
>>> domU will be emulated, which causes a significant perfomance drop for
>>> workloads that make use of rdtsc.
>>>
>>> One option to avoid the TSC option is to run domUs with tsc_mode=native.
>>> This has the drawback that migrating a domU from a "2.3GHz" class host
>>> to a "2.4GHz" class host may change the rate at wich the TSC counter
>>> increases, the domU may not be prepared for that.
>>>
>>> With the new option the host admin can decide how a domU should behave
>>> when it is migrated across systems of the same class. Since there is
>>> always some jitter when Xen calibrates the cpu_khz value, all hosts of
>>> the same class will most likely have slightly different values. As a
>>> result vTSC emulation is unavoidable. Data collected during the incident
>>> which triggered this change showed a jitter of up to 200 KHz across
>>> systems of the same class.
>> Do you have any further details of the systems involved?  If they are
>> identical systems, they should all have the same real TSC frequency, and
>> its a known issue that Xen isn't very good at working out the
>> frequency.  TBH, fixing that would be far better overall.
> Are you convinced all parts match their nominal frequency without
> _any_ deviation?

That is the intent of publishing the numbers, yes.

> If that was the case, we could indeed use CPUID
> leaves 0x15 / 0x16 output, if available.

We very much should be doing this.  There are also model-specific ways
of getting the same data on older processors.

> But I very much doubt this.
> As an example, here's what bare metal Linux says on my newest
> system:
>
> tsc: Detected 2600.000 MHz processor
> tsc: Refined TSC clocksource calibration: 2591.990 MHz
>
> Xen figures:
>
> (XEN) Detected 2592.107 MHz processor.
>
> And then after another re-boot bare metal Linux again
>
> tsc: Refined TSC clocksource calibration: 2592.008 MHz

What is surprising here?  The calibration loop is not 100% accurate and
cannot be made to be perfect.

The fact that Linux and Xen agree is because they basically share the
same calibration algorithm - not that the processor is really running at
2592MHz.  For one, all calibration options will read slow by the amount
of time it takes an interrupt to propagate through the system fabric,
and there is basically nothing software can do to account for this.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-10-01 15:17     ` Andrew Cooper
@ 2018-10-01 15:28       ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2018-10-01 15:28 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Olaf Hering, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson, Marek Marczykowski,
	xen-devel, Julien Grall

>>> On 01.10.18 at 17:17, <andrew.cooper3@citrix.com> wrote:
> On 01/10/18 14:29, Jan Beulich wrote:
>>>>> On 01.10.18 at 14:39, <andrew.cooper3@citrix.com> wrote:
>>> On 07/06/18 14:08, Olaf Hering wrote:
>>>> Add an option to control when vTSC emulation will be activated for a
>>>> domU with tsc_mode=default. Without such option each TSC access from
>>>> domU will be emulated, which causes a significant perfomance drop for
>>>> workloads that make use of rdtsc.
>>>>
>>>> One option to avoid the TSC option is to run domUs with tsc_mode=native.
>>>> This has the drawback that migrating a domU from a "2.3GHz" class host
>>>> to a "2.4GHz" class host may change the rate at wich the TSC counter
>>>> increases, the domU may not be prepared for that.
>>>>
>>>> With the new option the host admin can decide how a domU should behave
>>>> when it is migrated across systems of the same class. Since there is
>>>> always some jitter when Xen calibrates the cpu_khz value, all hosts of
>>>> the same class will most likely have slightly different values. As a
>>>> result vTSC emulation is unavoidable. Data collected during the incident
>>>> which triggered this change showed a jitter of up to 200 KHz across
>>>> systems of the same class.
>>> Do you have any further details of the systems involved?  If they are
>>> identical systems, they should all have the same real TSC frequency, and
>>> its a known issue that Xen isn't very good at working out the
>>> frequency.  TBH, fixing that would be far better overall.
>> Are you convinced all parts match their nominal frequency without
>> _any_ deviation?
> 
> That is the intent of publishing the numbers, yes.
> 
>> If that was the case, we could indeed use CPUID
>> leaves 0x15 / 0x16 output, if available.
> 
> We very much should be doing this.  There are also model-specific ways
> of getting the same data on older processors.
> 
>> But I very much doubt this.
>> As an example, here's what bare metal Linux says on my newest
>> system:
>>
>> tsc: Detected 2600.000 MHz processor
>> tsc: Refined TSC clocksource calibration: 2591.990 MHz
>>
>> Xen figures:
>>
>> (XEN) Detected 2592.107 MHz processor.
>>
>> And then after another re-boot bare metal Linux again
>>
>> tsc: Refined TSC clocksource calibration: 2592.008 MHz
> 
> What is surprising here?  The calibration loop is not 100% accurate and
> cannot be made to be perfect.
> 
> The fact that Linux and Xen agree is because they basically share the
> same calibration algorithm - not that the processor is really running at
> 2592MHz.

And I'm not claiming it is. I'm merely voicing my doubt that the
processor is running at exactly the announced 2600.000 MHz.
In which case it is simply unknown whether calibrated or
nominal values come closer to the truth.

>  For one, all calibration options will read slow by the amount
> of time it takes an interrupt to propagate through the system fabric,
> and there is basically nothing software can do to account for this.

Well, if you measure under otherwise identical conditions twice
the arrival of instances of the same signal, then its time to travel
through the fabric doesn't matter for the distance in time between
the arrival of the two instances. But of course this is as idealized
as is an assumption that humans would be able to manufacture
clocks ticking at exactly their nominal frequency.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-10-01 14:25                 ` George Dunlap
@ 2018-10-01 16:01                   ` Wei Liu
  0 siblings, 0 replies; 25+ messages in thread
From: Wei Liu @ 2018-10-01 16:01 UTC (permalink / raw)
  To: George Dunlap
  Cc: Lars Kurth, Olaf Hering, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Tim Deegan,
	Marek Marczykowski, xen-devel, Julien Grall, Jan Beulich,
	Ian Jackson

On Mon, Oct 01, 2018 at 03:25:36PM +0100, George Dunlap wrote:
> On 10/01/2018 03:00 PM, Jan Beulich wrote:
> >>>> On 01.10.18 at 15:38, <george.dunlap@citrix.com> wrote:
> >> On 10/01/2018 12:25 PM, Jan Beulich wrote:
> >>> I think the main concern
> >>> was with the way migration of the new value was implemented. But I
> >>> really have to defer to Andrew for that, irrespective of him not
> >>> having responded (on the list) to prior pings.
> >>
> >> Is Andrew really the only person who knows enough about migration to
> >> give this the thumbs-up?
> > 
> > That's not the point here, at least afaic: He had voiced _some_
> > concern on an earlier version. In such a case it is, I think, only
> > appropriate to wait with committing until there was indication
> > that the concerns were sufficiently addressed (verbally or by
> > adjustments to the code).
> 
> Right -- but it's not your job to make sure the migration stuff is
> properly addressed; it's Wei and Ian's job.  Wei's R-b was a statement
> from him that the code was good; when Andy questioned that, I think it
> was then *Wei's* job to address the question, not yours or Andy's (or
> even Olaf's).  If Wei says, "I've considered Andy's objections and I
> think the patch is fine as-is", then it can be checked in (given a
> reasonable amount of time for Andy to respond); and Wei can own whatever
> consequences there are.

This patch touched more than toolstack code, that's why Jan gave his R-b
in the first place.

The contention is not on the correctness of the code, but on if this
mechanism had unintended consequences. Both Jan and I thought the code
was correct, but we didn't feel comfortable enough to ignore objections.

Sorry Olaf.

Wei.

> 
>  -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-10-01 12:39 ` Andrew Cooper
  2018-10-01 13:29   ` Jan Beulich
@ 2018-10-09 16:42   ` Olaf Hering
  2018-11-29 13:21   ` Olaf Hering
  2 siblings, 0 replies; 25+ messages in thread
From: Olaf Hering @ 2018-10-09 16:42 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Ian Jackson,
	Marek Marczykowski-Górecki, xen-devel, Julien Grall,
	Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 1872 bytes --]

Thanks for all the replies. I will work through them.
One remark now:

Am Mon, 1 Oct 2018 13:39:51 +0100
schrieb Andrew Cooper <andrew.cooper3@citrix.com>:

> > With the new option the host admin can decide how a domU should behave
> > when it is migrated across systems of the same class. Since there is
> > always some jitter when Xen calibrates the cpu_khz value, all hosts of
> > the same class will most likely have slightly different values. As a
> > result vTSC emulation is unavoidable. Data collected during the incident
> > which triggered this change showed a jitter of up to 200 KHz across
> > systems of the same class.  
> Do you have any further details of the systems involved?  If they are
> identical systems, they should all have the same real TSC frequency, and
> its a known issue that Xen isn't very good at working out the
> frequency.

If "identical" systems really have an identical frequency, but Xen is
unable to figure out the exact value of that frequency that it is going
to use for its calculations, why would it be a good idea in the first place
to rely on such an estimated value? If there is drift expected, shouldn't
that be handled by running ntpd inside dom0 and the domUs?

If I did that math correctly: a domU that is started with the estimated
cpu_khz value of 2666766, and is migrated to another host with the
estimated cpu_khz value of 2666755, would see 950400K fewer ticks during
a day. How many ticks actually happen, noone knows. But since that amount
of ticks represents a time span of ca. 0.3 seconds I would assume that
ntpd can handle such drift.

With larger differences, like the 200 KHz mentioned above, the drift will
be larger.


What I mean to say is: should the comparison of both hosts be based
on different metrics, instead of a plain estimated value in 'cpu_khz'?


Olaf

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation
  2018-10-01 12:39 ` Andrew Cooper
  2018-10-01 13:29   ` Jan Beulich
  2018-10-09 16:42   ` Olaf Hering
@ 2018-11-29 13:21   ` Olaf Hering
  2 siblings, 0 replies; 25+ messages in thread
From: Olaf Hering @ 2018-11-29 13:21 UTC (permalink / raw)
  To: Andrew Cooper, Ian Jackson
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Tim Deegan, Marek Marczykowski-Górecki,
	xen-devel, Julien Grall, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 5150 bytes --]

Thanks Andrew and Ian for taking the time to look at this change.
In turn it took me some time to get back to this topic.

Am Mon, 1 Oct 2018 13:39:51 +0100
schrieb Andrew Cooper <andrew.cooper3@citrix.com>:

> On 07/06/18 14:08, Olaf Hering wrote:
> > Add an option to control when vTSC emulation will be activated for a
> > domU with tsc_mode=default. Without such option each TSC access from
> > domU will be emulated, which causes a significant perfomance drop for
> > workloads that make use of rdtsc.
> >
> > One option to avoid the TSC option is to run domUs with tsc_mode=native.
> > This has the drawback that migrating a domU from a "2.3GHz" class host
> > to a "2.4GHz" class host may change the rate at wich the TSC counter
> > increases, the domU may not be prepared for that.
> >
> > With the new option the host admin can decide how a domU should behave
> > when it is migrated across systems of the same class. Since there is
> > always some jitter when Xen calibrates the cpu_khz value, all hosts of
> > the same class will most likely have slightly different values. As a
> > result vTSC emulation is unavoidable. Data collected during the incident
> > which triggered this change showed a jitter of up to 200 KHz across
> > systems of the same class.  
> 
> Do you have any further details of the systems involved?  If they are
> identical systems, they should all have the same real TSC frequency, and
> its a known issue that Xen isn't very good at working out the
> frequency.  TBH, fixing that would be far better overall.

My test hosts have a E5504 cpu. The ones where the issue was reported
use "E7-8880 v3" today, the used hardware two years ago was likely older.

From what I understand the TSC frequency stored in "cpu_khz" is just an
estimated value, not the real hardware frequency. Still, it is used to
decide if two hosts tick at the same speed. The domU kernel may use the
estimated value for its timekeeping, I think it does the same estimation
as Xen itself. But, I have to dig into that.

To me it looks like domUs should run ntpd themselves if there is a plan
to migrate them at some point in the future. At least if they use TSC
for timekeeping. With ntpd the domU would detect time skew, even if it
was not yet migrated to another host. I do not know much about timekeeping,
so this is just a guess on my side.


> > Existing padding fields are reused to store vtsc_khz_tolerance as u16.
> > The padding is sent as zero in write_tsc_info to the receving host.
> > The padding is undefined if the changed code runs as receiver.  
> I'm not sure what you mean by this final sentence.

I have removed that part, since incoming padding is in practice always zero.

> > handle_tsc_info has no code to verify that padding is indeed zero. Due
> > to the lack of a version field it is impossible to know if the sender
> > already has the newly introduced vtsc_tolerance field. In the worst
> > case the receiving domU will get an unemulated TSC.  
> 
> The lack of padding verification is deliberate, for forwards
> compatibility.  Why does the sending code matter?  One way or another,
> if the field is 0, the option wasn't present or wasn't configured. 
> Neither of these situations affect the decision-making that the
> receiving side needs to perform.

> > Signed-off-by: Olaf Hering <olaf@aepfle.de>
> > Reviewed-by: Wei Liu <wei.liu2@citrix.com> (v07/v08)
> > Reviewed-by: Jan Beulich <jbeulich@suse.com> (v08)  
> 
> I'm still -0.5 for this patch.  I can appreciate why you want it, but it
> is a gross hack which only works when you don't skew time more than NTP
> in the guest can cope with.  My gut feeling is that there will be other
> more subtle fallout.


I will do some research regarding how much skewing a domU can handle.
As said in another reply, the expected time drift with a difference of
just 11 kHz in the cpu_khz variable on a 2.6GHz system is about 0.3 seconds
during a day.

IanJ requested clarification for how much time skew a system can handle.
Perhaps this should have been part of the initial submission for
tsc_mode=native already. I will do some research. Also some of the concerns
about missing documentation are already covered in paragraph #3 of the
commit message.

I will do some more testing with staging and send v10 next week.
The accumulated changes for a v10, so far:
v10:
 - rebase to 402411ec40
 - update write_libxc_tsc_info to handle the new parameter vtsc_tolerance_khz
   without this change, migration from xen-4.5 will fail (Andrew)
 - add newline to tsc_set_info (Andrew)
 - add measurment unit to libxc-migration-stream.pandoc (Andrew)
 - add pointer to xen-tscmode(7) in xl.cfg(5)/vtsc_tolerance_khz (Andrew)
 - reword the newly added paragraph in xen-tscmode(7) (Andrew),
   and also mention that it is about the measured/estimated TSC value
   rather than the real value. The latter is simply unknown.
 - simplify wording regarding the value of padding field in old Xen
   versions, the previous one turned out to be confusing and not helpful


Olaf

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2018-11-29 13:21 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-07 13:08 [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation Olaf Hering
2018-06-07 13:31 ` Jan Beulich
2018-06-07 13:47   ` Olaf Hering
2018-06-07 14:44     ` Jan Beulich
2018-06-07 14:49       ` Olaf Hering
2018-06-07 14:55         ` Jan Beulich
2018-06-07 21:05           ` Olaf Hering
2018-06-26 17:11 ` Olaf Hering
2018-08-01 16:17   ` Olaf Hering
2018-09-13  7:39     ` Olaf Hering
2018-09-28 13:50       ` Olaf Hering
2018-10-01 10:52         ` Ian Jackson
2018-10-01 11:25           ` Jan Beulich
2018-10-01 13:24             ` [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation [and 1 more messages] Ian Jackson
2018-10-01 13:58               ` Andrew Cooper
2018-10-01 13:38             ` [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation George Dunlap
2018-10-01 14:00               ` Jan Beulich
2018-10-01 14:25                 ` George Dunlap
2018-10-01 16:01                   ` Wei Liu
2018-10-01 12:39 ` Andrew Cooper
2018-10-01 13:29   ` Jan Beulich
2018-10-01 15:17     ` Andrew Cooper
2018-10-01 15:28       ` Jan Beulich
2018-10-09 16:42   ` Olaf Hering
2018-11-29 13:21   ` Olaf Hering

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.